ambitious-guitar-89068
02/17/2022, 5:07 AM
brave-market-65632
02/17/2022, 5:44 AM
[2022-02-17 10:54:02,004] ERROR {datahub.entrypoints:119} - Stackprinter failed while formatting <FrameInfo /usr/local/lib/python3.9/site-packages/datahub/ingestion/source/sql/sql_common.py, line 221, scope SQLAlchemyConfig>:
File "/usr/local/lib/python3.9/site-packages/stackprinter/frame_formatting.py", line 224, in select_scope
raise Exception("Picked an invalid source context: %s" % info)
Exception: Picked an invalid source context: [221], [192], dict_keys([192, 193])
So here is your original traceback at least:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/datahub/cli/ingest_cli.py", line 77, in run
pipeline = Pipeline.create(pipeline_config, dry_run, preview)
File "/usr/local/lib/python3.9/site-packages/datahub/ingestion/run/pipeline.py", line 175, in create
return cls(config, dry_run=dry_run, preview_mode=preview_mode)
File "/usr/local/lib/python3.9/site-packages/datahub/ingestion/run/pipeline.py", line 120, in __init__
source_class = source_registry.get(source_type)
File "/usr/local/lib/python3.9/site-packages/datahub/ingestion/api/registry.py", line 126, in get
tp = self._ensure_not_lazy(key)
File "/usr/local/lib/python3.9/site-packages/datahub/ingestion/api/registry.py", line 84, in _ensure_not_lazy
plugin_class = import_path(path)
File "/usr/local/lib/python3.9/site-packages/datahub/ingestion/api/registry.py", line 32, in import_path
item = importlib.import_module(module_name)
File "/usr/local/Cellar/python@3.9/3.9.9/Frameworks/Python.framework/Versions/3.9/lib/python3.9/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 850, in exec_module
File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
File "/usr/local/lib/python3.9/site-packages/datahub/ingestion/source/sql/snowflake.py", line 29, in <module>
from datahub.ingestion.source.sql.sql_common import (
File "/usr/local/lib/python3.9/site-packages/datahub/ingestion/source/sql/sql_common.py", line 206, in <module>
class SQLAlchemyConfig(StatefulIngestionConfigBase):
File "/usr/local/lib/python3.9/site-packages/datahub/ingestion/source/sql/sql_common.py", line 221, in SQLAlchemyConfig
from datahub.ingestion.source.ge_data_profiler import GEProfilingConfig
File "/usr/local/lib/python3.9/site-packages/datahub/ingestion/source/ge_data_profiler.py", line 27, in <module>
from great_expectations.core.util import convert_to_json_serializable
File "/usr/local/lib/python3.9/site-packages/great_expectations/__init__.py", line 7, in <module>
from great_expectations.data_context import DataContext
File "/usr/local/lib/python3.9/site-packages/great_expectations/data_context/__init__.py", line 1, in <module>
from .data_context import BaseDataContext, DataContext, ExplorerDataContext
File "/usr/local/lib/python3.9/site-packages/great_expectations/data_context/data_context.py", line 23, in <module>
from great_expectations.rule_based_profiler.config.base import (
File "/usr/local/lib/python3.9/site-packages/great_expectations/rule_based_profiler/__init__.py", line 1, in <module>
from .rule_based_profiler import RuleBasedProfiler
File "/usr/local/lib/python3.9/site-packages/great_expectations/rule_based_profiler/rule_based_profiler.py", line 16, in <module>
from great_expectations.rule_based_profiler.domain_builder.domain_builder import (
File "/usr/local/lib/python3.9/site-packages/great_expectations/rule_based_profiler/domain_builder/__init__.py", line 1, in <module>
from .domain_builder import DomainBuilder # isort:skip
File "/usr/local/lib/python3.9/site-packages/great_expectations/rule_based_profiler/domain_builder/domain_builder.py", line 6, in <module>
from great_expectations.rule_based_profiler.types import (
File "/usr/local/lib/python3.9/site-packages/great_expectations/rule_based_profiler/types/__init__.py", line 3, in <module>
from .domain import ( # isort:skip
File "/usr/local/lib/python3.9/site-packages/great_expectations/rule_based_profiler/types/domain.py", line 8, in <module>
from great_expectations.execution_engine.execution_engine import MetricDomainTypes
File "/usr/local/lib/python3.9/site-packages/great_expectations/execution_engine/__init__.py", line 4, in <module>
from .sqlalchemy_execution_engine import SqlAlchemyExecutionEngine
File "/usr/local/lib/python3.9/site-packages/great_expectations/execution_engine/sqlalchemy_execution_engine.py", line 97, in <module>
import pybigquery.sqlalchemy_bigquery
File "/usr/local/lib/python3.9/site-packages/pybigquery/sqlalchemy_bigquery.py", line 32, in <module>
from google.cloud.bigquery import dbapi
File "/usr/local/lib/python3.9/site-packages/google/cloud/bigquery/__init__.py", line 35, in <module>
from google.cloud.bigquery.client import Client
File "/usr/local/lib/python3.9/site-packages/google/cloud/bigquery/client.py", line 70, in <module>
from google.cloud.bigquery import _pandas_helpers
File "/usr/local/lib/python3.9/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 67, in <module>
from google.cloud.bigquery import schema
File "/usr/local/lib/python3.9/site-packages/google/cloud/bigquery/schema.py", line 20, in <module>
from google.cloud.bigquery_v2 import types
File "/usr/local/lib/python3.9/site-packages/google/cloud/bigquery_v2/__init__.py", line 18, in <module>
from .types.encryption_config import EncryptionConfiguration
File "/usr/local/lib/python3.9/site-packages/google/cloud/bigquery_v2/types/__init__.py", line 16, in <module>
from .encryption_config import EncryptionConfiguration
File "/usr/local/lib/python3.9/site-packages/google/cloud/bigquery_v2/types/encryption_config.py", line 26, in <module>
class EncryptionConfiguration(proto.Message):
File "/usr/local/lib/python3.9/site-packages/proto/message.py", line 200, in __new__
file_info = _file_info._FileInfo.maybe_add_descriptor(filename, package)
File "/usr/local/lib/python3.9/site-packages/proto/_file_info.py", line 42, in maybe_add_descriptor
descriptor=descriptor_pb2.FileDescriptorProto(
TypeError: descriptor to field 'google.protobuf.FileDescriptorProto.name' doesn't apply to 'FileDescriptorProto' object
witty-butcher-82399
02/17/2022, 9:52 AM
table_pattern
does not prevent them from being profiled unless they are also filtered out (denied) in the profile_pattern
too. Is that the expected behaviour? Shouldn’t tables denied for ingestion also be denied for profiling?
If I’m not wrong, this is noted here https://github.com/linkedin/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/sql/sql_common.py#L1089-L1092 where table pattern is not included.
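To illustrate, a recipe sketch of the behaviour described above, where the same deny rule has to be repeated under profile_pattern for those tables to actually be skipped by profiling (the source type and table regex below are just for illustration):
source:
  type: redshift                 # hypothetical; any SQL source with profiling enabled
  config:
    table_pattern:
      deny:
        - '.*\.tmp_.*'           # hypothetical temp tables, excluded from ingestion
    profile_pattern:
      deny:
        - '.*\.tmp_.*'           # has to be repeated here, otherwise the tables are still profiled
    profiling:
      enabled: true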
Thanks!
abundant-lizard-52842
02/17/2022, 10:28 AM
brave-secretary-27487
02/17/2022, 4:40 PM
datahub-frontend:
extraEnvs:
- name: METADATA_SERVICE_AUTH_ENABLED
value: "true"
datahub:
metadata_service_authentication:
enabled: true
datahub-gms:
extraEnvs:
- name: METADATA_SERVICE_AUTH_ENABLED
value: "true"
datahub:
metadata_service_authentication:
enabled: true
global:
datahub:
metadata_service_authentication:
enabled: true
systemClientId: "__datahub_system"
systemClientSecret:
secretRef: "datahub-auth-secrets"
secretKey: "token_service_signing_key"
tokenService:
signingKey:
secretRef: "datahub-auth-secrets"
secretKey: "token_service_signing_key"
# Set to false if you'd like to provide your own auth secrets
provisionSecrets: true
I'm not sure which ENV variable to overwrite, as there is a reference in global and in both the front-end and GMS.
I'm also unsure how to retrieve systemClientId under the global key.
I have deployed this with helm but I was still able to make unauthenticated requests.
I have no clue what I am missing.
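For reference, a minimal sketch of the global-only block being discussed here; the assumption (worth verifying against your chart version) is that the subcharts derive METADATA_SERVICE_AUTH_ENABLED from this global value, which would make the per-component extraEnvs and datahub blocks above redundant:
global:
  datahub:
    metadata_service_authentication:
      enabled: true
      systemClientId: "__datahub_system"     # default system client id, as in the values pasted above
      systemClientSecret:
        secretRef: "datahub-auth-secrets"
        secretKey: "token_service_signing_key"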
numerous-eve-42142
02/17/2022, 7:20 PM
source:
  type: redshift
  config:
    # Coordinates
    host_port: *******
    database: *******
    # Credentials
    username: ******
    password: ******
    # Options
    include_tables: True
    include_table_lineage: True
    include_views: False
    # Authorization
    schema_pattern:
      allow:
        - "wine_stg"
    profiling:
      enabled: true
      include_field_null_count: true
      include_field_min_value: true
      include_field_max_value: true
      include_field_mean_value: true
      include_field_median_value: false
      include_field_histogram: false
sink:
  # sink configs
  type: "datahub-rest"
  config:
    server: "******"
glamorous-house-64036
02/18/2022, 12:07 AM
ERROR: for mysql Cannot start service mysql: error while creating mount source path '/mysql/init.sql': mkdir /mysql: read-only file system
ERROR: Encountered errors while bringing up the project.
damp-minister-31834
02/18/2022, 3:40 AM
few-air-56117
02/18/2022, 9:19 AM
Runs at 02:20 am (America/New_York)
Now, after 2 hours, the UI makes it look like it is still running, so I checked the logs and they say it's done:
(acryl-datahub-actions log, 2022-02-18 09:35:26.496 EET)
'tables_scanned': 323,
'views_scanned': 337,
'entities_profiled': 327,
'filtered': [],
'soft_deleted_stale_entities': [],
'query_combiner': {'total_queries': 10107,
'uncombined_queries_issued': 4765,
'combined_queries_issued': 603,
'queries_combined': 6298,
'query_exceptions': 11}}
Sink (datahub-rest) report:
{'records_written': 775,
'warnings': [],
'failures': [],
'downstream_start_time': datetime.datetime(2022, 2, 18, 7, 22, 37, 552032),
'downstream_end_time': datetime.datetime(2022, 2, 18, 7, 35, 23, 470652),
'downstream_total_latency_in_seconds': 765.91862}
{}
Pipeline finished with failures
brave-secretary-27487
02/18/2022, 2:36 PM
Failed to install PyPI packages. black 22.1.0 has requirement click>=8.0.0, but you have click 7.1.2.
Check the Cloud Build log at https://console.cloud.google.com/cloud-build/builds/0xxxxxxxx?project=xxxxxxxx for details. For detailed instructions see https://cloud.google.com/composer/docs/troubleshooting-package-installation
breezy-portugal-43538
02/18/2022, 2:53 PM
[2022-02-18 14:30:10,541] ERROR {logger:26} - Please set env variable SPARK_VERSION
JAVA_HOME is not set
[2022-02-18 14:30:10,896] ERROR {datahub.entrypoints:119} - File "/usr/local/lib/python3.8/site-packages/datahub/cli/ingest_cli.py", line 77, in run
67 def run(config: str, dry_run: bool, preview: bool, strict_warnings: bool) -> None:
(...)
73 pipeline_config = load_config_file(config_file)
74
75 try:
76 logger.debug(f"Using config: {pipeline_config}")
--> 77 pipeline = Pipeline.create(pipeline_config, dry_run, preview)
78 except ValidationError as e:
[...]
Because of this exception I am unable to use the ingestion process properly for my own metadata setup. Could you help resolve this issue?
1. In order to reproduce, please use the following yml (example.yml) file:
source:
  type: data-lake
  config:
    env: "PROD"
    platform: "dataLake"
    base_path: "load"
    profiling:
      enabled: true
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
2. Then simply run the following command from within the metadata-ingestion folder:
./metadata-ingestion/scripts/datahub_docker.sh ingest -c example.yml
If needed, I can also paste the whole error log from the failing operation.
numerous-application-54063
02/18/2022, 3:19 PM
protoPayload.serviceName="bigquery.googleapis.com"
I do get logs.
But if I add the second condition taken from the connector, no logs are found:
protoPayload.methodName="jobservice.jobcompleted"
Instead in my logs the method name is formatted like this:
methodName: "google.cloud.bigquery.v2.JobService.InsertJob"
Any advice on this one?
Thanks!
cuddly-engine-66252
02/20/2022, 11:44 AM
prompt=select_account
authentication URL parameter in docker.env for the frontend-react container? So that if a non-company account is selected, it would be possible to choose it and not face a 403. Thank you.
cuddly-engine-66252
02/20/2022, 12:11 PM
https://{frontend_link}/user/urn:li:corpuser:{user}/assets
(new ones do not appear, deleted ones are not deleted)
v0.8.26
UPD: After 10-15 minutes, the data is finally updated. But not immediately.
damp-minister-31834
02/21/2022, 7:01 AM
gifted-piano-21322
02/21/2022, 8:30 AM
high-hospital-85984
02/21/2022, 9:33 AM
high-hospital-85984
02/21/2022, 9:55 AM
07:10:21.041 [qtp544724190-10] INFO c.l.metadata.entity.EntityService:681 - INGEST urn urn:li:chart:(looker,dashboard_elements.21845) with system metadata {lastObserved=1645427421035, runId=looker-2022_02_21-07_10_08}
07:10:21.053 [qtp544724190-10] INFO c.l.m.filter.RestliLoggingFilter:56 - POST /entities?action=ingest - ingest - 500 - 12ms
07:10:21.053 [qtp544724190-10] ERROR c.l.m.filter.RestliLoggingFilter:38 - com.datahub.util.exception.RetryLimitReached: Failed to add after 3 retries
Is this a problem with the GMS database?
boundless-student-48844
02/21/2022, 12:10 PM
MLFeatureV2
to support our ML discoverability use cases). I’ve done the changes on Pegasus, GraphQL, Java and React and it has successfully built with ./gradlew build.
I tried to ingest to gms and it worked. The aspect values are successfully stored in MySQL (as seen in screenshot).
However, when I navigate to the UI, the getSearchResults
GraphQL request (second screenshot) for the new entity returns the error below.
{
"errors": [
{
"message": "The field at path '/search/searchResults[0]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value. The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'",
"path": [
"search",
"searchResults",
0,
"entity"
],
"extensions": {
"classification": "NullValueInNonNullableField"
}
}
],
"data": {
"search": null
}
}
Do you know what’s missing here?
strong-iron-17184
02/21/2022, 2:46 PM
alert-teacher-6920
02/21/2022, 9:34 PM
bland-orange-13353
02/22/2022, 11:04 AM
hallowed-gpu-49827
02/22/2022, 1:44 PM
METADATA_SERVICE_AUTH_ENABLED=true
to frontend AND gms. I restarted them and logged out/in again.
I receive this error in gms logs showing that it’s not sending the token over as it should:
com.datahub.authentication.AuthenticationException: Failed to authenticate inbound request: Authorization header is missing 'Basic' prefix.
strong-iron-17184
02/22/2022, 2:50 PM
alert-teacher-6920
02/22/2022, 9:53 PM
ancient-pillow-45716
02/23/2022, 7:35 AM
few-air-56117
02/23/2022, 7:53 AM
helm install datahub datahub/datahub -f helm_custom_settings_custom_helm.yaml
but I have this error:
Error: INSTALLATION FAILED: failed pre-install: timed out waiting for the condition
salmon-area-51650
02/23/2022, 12:24 PM
values.yaml?
Thanks in advance!
mysterious-butcher-86719
02/23/2022, 2:25 PM
mysterious-butcher-86719
02/23/2022, 2:51 PM
.
. e.g. relational datasets are usually named as <db>.<schema>.<table>
, except for platforms like MySQL which do not have the concept of a `schema`; as a result MySQL datasets are named <db>.<table>
. In cases where the specific platform can have multiple instances (e.g. there are multiple different instances of MySQL databases that have different data assets in them), names can also include instance ids, making the general pattern for a name <platform_instance>.<db>.<schema>.<table>
."