fresh-battery-23937
05/03/2022, 10:16 PM

best-umbrella-24804
05/03/2022, 11:56 PM

best-umbrella-24804
05/04/2022, 12:40 AM

billowy-refrigerator-34936
05/04/2022, 1:20 AM

best-umbrella-24804
05/04/2022, 2:04 AM

mammoth-fountain-32989
05/04/2022, 7:57 AM

dazzling-queen-76396
05/04/2022, 8:10 AM
1 validation error for BigQueryUsageConfig
dataset_pattern
  extra fields not permitted (type=value_error.extra)
Here is my recipe:
source:
  type: bigquery-usage
  config:
    projects:
      - project1
      - project2
    credential:
      project_id: ''
      private_key_id: ''
      private_key: ''
      client_email: ''
      client_id: ''
    top_n_queries: 10
    include_operational_stats: true
    max_query_duration: 45
    dataset_pattern:
      deny:
        - '.*_raw'
        - '.*_dev'
        - '.*_staging.*'
sink:
  type: datahub-rest
  config:
    server: ''
DataHub version is 0.8.32. Could you help figure out the problem?
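
A minimal sketch of what produces this message, assuming the pydantic v1 validation style DataHub configs use: models that forbid extra fields reject any key they don't declare, which suggests dataset_pattern is not a recognized option for bigquery-usage in this version. The model below is a hypothetical stand-in, not DataHub's actual class.

from pydantic import BaseModel, Extra, ValidationError

class UsageConfig(BaseModel):
    # Hypothetical stand-in for BigQueryUsageConfig.
    top_n_queries: int = 10

    class Config:
        extra = Extra.forbid  # unknown keys raise value_error.extra

try:
    UsageConfig(top_n_queries=10, dataset_pattern={"deny": [".*_raw"]})
except ValidationError as e:
    print(e)  # dataset_pattern: extra fields not permitted (type=value_error.extra)

cold-hydrogen-10513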
05/04/2022, 11:49 AM
[2022-05-04 11:34:53,096] {sql_common.py:496} WARNING - lineage => Extracting lineage from Snowflake failed. Please check your permissions. Continuing...

agreeable-army-26750
05/04/2022, 1:15 PM

lemon-terabyte-66903
05/04/2022, 1:41 PM

mammoth-fall-12031
05/04/2022, 2:24 PM
source:
  type: mssql
  config:
    username: username
    password: password
    database: db_name
    host_port: host_with_port
    profiling:
      enabled: true

# see https://datahubproject.io/docs/metadata-ingestion/sink_docs/datahub for complete documentation
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
Anything I need to add to explicitly enable lineage?
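
As far as I can tell, the mssql source at this point extracts schemas and profiles but does not compute lineage itself; if that holds, lineage can still be pushed separately with the documented emitter helpers. A hedged sketch with placeholder table names:

import datahub.emitter.mce_builder as builder
from datahub.emitter.rest_emitter import DatahubRestEmitter

# Placeholder upstream/downstream tables -- replace with real ones.
lineage_mce = builder.make_lineage_mce(
    [builder.make_dataset_urn("mssql", "db_name.dbo.upstream_table")],
    builder.make_dataset_urn("mssql", "db_name.dbo.downstream_table"),
)

DatahubRestEmitter("http://localhost:8080").emit_mce(lineage_mce)

bland-orange-13353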
05/04/2022, 7:47 PM

nutritious-bird-77396
05/04/2022, 8:11 PM
createIngestionExecutionRequest, but it's not clear how it would fetch the ingestion recipe. Any thoughts?
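
For context, createIngestionExecutionRequest is the GraphQL mutation the UI uses to trigger a run of a stored ingestion source; as far as I understand, the recipe is stored on the ingestion source entity and the backend packages it into the execution request for the executor. A hedged sketch of calling it directly; the endpoint path and input field name below are assumptions, not verified against this DataHub version:

import requests

GMS_GRAPHQL = "http://localhost:8080/api/graphql"  # assumed endpoint
mutation = """
mutation run($input: CreateIngestionExecutionRequestInput!) {
  createIngestionExecutionRequest(input: $input)
}
"""
variables = {"input": {"ingestionSourceUrn": "urn:li:dataHubIngestionSource:example"}}

resp = requests.post(
    GMS_GRAPHQL,
    json={"query": mutation, "variables": variables},
    headers={"Authorization": "Bearer <access-token>"},
)
print(resp.json())

millions-waiter-49836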
05/04/2022, 9:19 PM
base_folder, as the LookML repo could be quite messy in production… I wonder if we can adjust the failure report to a warning so the LookML ingestion job won't show a 'fail' status every time.

orange-coat-2879
05/04/2022, 11:25 PM

rich-policeman-92383
05/05/2022, 12:49 PM
File "/IngestionRecipies/dhubv08_19/lib64/python3.6/site-packages/sqlalchemy/engine/default.py", line 508, in connect
return self.dbapi.connect(*cargs, **cparams)
File "/IngestionRecipies/dhubv08_19/lib64/python3.6/site-packages/pyhive/hive.py", line 126, in connect
return Connection(*args, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'tez.queue.name'
[2022-05-05 18:14:40,880] INFO {datahub.entrypoints:162} - DataHub CLI version: 0.8.31 at /IngestionRecipies/dhubv08_19/lib64/python3.6/site-packages/datahub/__init__.py
[2022-05-05 18:14:40,880] INFO {datahub.entrypoints:165} - Python version: 3.6.8 (default, Aug 13 2020, 07:46:32)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-39)] at /IngestionRecipies/dhubv08_19/bin/python3 on Linux-3.10.0-1160.25.1.el7.x86_64-x86_64-with-redhat-7.9-Maipo
[2022-05-05 18:14:40,880] INFO {datahub.entrypoints:167} - GMS config {'models': {}, 'versions': {'linkedin/datahub': {'version': 'v0.8.31', 'commit': '2f078c981c86b72145eebf621230ffd445948ef6'}}, 'managedIngestion': {'defaultCliVersion': '0.8.26.6', 'enabled': True}, 'statefulIngestionCapable': True, 'supportsImpactAnalysis': True, 'telemetry': {'enabledCli': True, 'enabledIngestion': False}, 'retention': 'true', 'noCode': 'true'}
YML
---
source:
  type: hive
  config:
    host_port: hive:10000
    env: "PROD"
    database: databaseName
    table_pattern:
      allow:
        - datasetname$
    options:
      connect_args: {'auth': 'KERBEROS', 'kerberos_service_name': 'hive', 'tez.queue.name': 'root.myqueue'}
    profiling:
      enabled: true
    profile_pattern:
      allow:
        - datasetname
sink:
  type: "datahub-rest"
  config:
    server: "https://didatahub.airtel.com:8080"
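
The TypeError above suggests 'tez.queue.name' is being passed straight through to PyHive's Connection, which doesn't accept it as a top-level keyword; PyHive exposes Hive session settings through its configuration dict instead. A hedged sketch of the equivalent direct connection, assuming PyHive 0.6.x:

from pyhive import hive

# Session-level Hive settings such as tez.queue.name belong in
# `configuration`, not alongside auth as top-level keyword arguments.
conn = hive.connect(
    host="hive",
    port=10000,
    auth="KERBEROS",
    kerberos_service_name="hive",
    configuration={"tez.queue.name": "root.myqueue"},
)

In the recipe that would mean nesting the setting under a configuration key inside connect_args rather than beside auth, which is worth verifying against the PyHive version in use.
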
straight-telephone-84434
05/05/2022, 2:51 PM
source:
  type: bigquery
  config:
    project_id: "internal-project-bigquery"
    options:
      credentials_path: "./key_file.json"
    table_pattern:
      allow:
        # Allow only one table
        - "bigquery-public-data.chicago_crime.crime"
sink:
  # sink data
  type: "datahub-rest"
  config:
    server: # "sink_server"
This is the error I am getting:
AttributeError: module 'pybigquery.sqlalchemy_bigquery' has no attribute 'dialect'
acryl-datahub, version 0.8.28.1
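
This AttributeError usually indicates a mismatch between the installed pybigquery and what the bigquery source expects, rather than a recipe problem. A quick, hedged way to see what is actually installed before pinning or upgrading:

from importlib.metadata import version  # use the importlib_metadata backport on Python < 3.8

for pkg in ("acryl-datahub", "pybigquery", "sqlalchemy"):
    try:
        print(pkg, version(pkg))
    except Exception:
        print(pkg, "not installed")

worried-motherboard-80036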
05/05/2022, 5:00 PM
import ssl

from elasticsearch import Elasticsearch
from elasticsearch.connection import create_ssl_context

# Disable certificate verification (e.g. self-signed ES cluster).
ssl_context = create_ssl_context()
ssl_context.check_hostname = False
ssl_context.verify_mode = ssl.CERT_NONE

es_client = Elasticsearch(
    source_config.host,
    http_auth=source_config.http_auth,
    url_prefix=source_config.url_prefix,
    ssl_context=ssl_context,
)

red-pizza-28006
05/05/2022, 5:00 PM
AttributeError: 'DataHubGraph' object has no attribute 'get_aspect_v2'
and I cannot see EditableSchemaMetadataClass either. I am on version 0.8.33.
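
For reference, on CLI releases where the method exists, the call looks roughly like the sketch below; on 0.8.33 both get_aspect_v2 and EditableSchemaMetadataClass may simply postdate the installed package, so upgrading acryl-datahub is the first thing to check. Hedged sketch, not verified against 0.8.33; the dataset URN is a placeholder:

from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph
from datahub.metadata.schema_classes import EditableSchemaMetadataClass

graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))

# Fetch the editable schema aspect for a dataset.
aspect = graph.get_aspect_v2(
    entity_urn="urn:li:dataset:(urn:li:dataPlatform:hive,db.table,PROD)",
    aspect="editableSchemaMetadata",
    aspect_type=EditableSchemaMetadataClass,
)
print(aspect)

lemon-terabyte-66903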
05/05/2022, 8:12 PM
datahub.configuration.common.OperationalError: ('Unable to emit metadata to DataHub GMS', {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:500]: org.apache.kafka.common.errors.SerializationException: Error serializing Avro message
But in the UI, I can see the correct change.

billowy-refrigerator-34936
05/06/2022, 8:42 AM

agreeable-army-26750
05/06/2022, 11:06 AM

acoustic-quill-54426
05/06/2022, 11:16 AM

gifted-bird-57147
05/06/2022, 12:43 PM

millions-sundown-65420
05/06/2022, 3:39 PM

millions-sundown-65420
05/06/2022, 6:41 PM

cuddly-arm-8412
05/07/2022, 12:45 AM

millions-sundown-65420
05/08/2022, 9:52 AM

sticky-dawn-95000
05/08/2022, 12:01 PM
[mjlee@tkgkdc01:datahub-ingestion]$ datahub ingest -c ./oracle_to_datahub.yml --preview --preview-workunits=1000 --dry-run
[2022-05-04 16:48:49,925] INFO {datahub.cli.ingest_cli:88} - DataHub CLI version: 0.8.32.2
[2022-05-04 16:48:49,937] INFO {datahub.ingestion.sink.datahub_rest:60} - Setting gms config
[2022-05-04 16:48:53,814] INFO {datahub.cli.ingest_cli:104} - Starting metadata ingestion
/usr/local/lib64/python3.6/site-packages/sqlalchemy/dialects/oracle/base.py:1421: SAWarning: Oracle version (19, 12, 0, 0, 0) is known to have a maximum identifier length of 128, rather than the historical default of 30. SQLAlchemy 1.4 will use 128 for this database; please set max_identifier_length=128 in create_engine() in order to test the application with this new length, or set to 30 in order to assure that 30 continues to be used. In particular, pay close attention to the behavior of database migrations as dynamically generated names may change. See the section 'Max Identifier Lengths' in the SQLAlchemy Oracle dialect documentation for background.
% ((self.server_version_info,))
/usr/local/lib64/python3.6/site-packages/sqlalchemy/dialects/oracle/base.py:1776: SAWarning: Did not recognize type 'ROWID' of column 'head_rowid'
% (coltype, colname)
/usr/local/lib64/python3.6/site-packages/sqlalchemy/dialects/oracle/base.py:1776: SAWarning: Did not recognize type 'UROWID' of column 'head_rowid'
% (coltype, colname)
[2022-05-04 16:48:57,984] INFO {datahub.cli.ingest_cli:106} - Finished metadata ingestion
Source (oracle) report:
{'cli_entry_location': '/fshome/mjlee/.local/lib/python3.6/site-packages/datahub/__init__.py',
'cli_version': '0.8.32.2',
'entities_profiled': 0,
'failures': {},
'filtered': [],
'os_details': 'Linux-4.18.0-193.el8.x86_64-x86_64-with-redhat-8.2-Ootpa',
'py_exec_path': '/usr/bin/python3',
'py_version': '3.6.8 (default, Dec 5 2019, 15:45:45) \n[GCC 8.3.1 20191121 (Red Hat 8.3.1-5)]',
'query_combiner': None,
'soft_deleted_stale_entities': [],
'tables_scanned': 155,
'views_scanned': 43,
'warnings': {'audsys.aud$unified': ['missing column information'],
             'sys.aq$_alert_qt_g': ['missing column information'],
             'sys.aq$_alert_qt_h': ['missing column information'],
             'sys.aq$_alert_qt_i': ['missing column information'],
             'sys.aq$_alert_qt_t': ['missing column information'],
             …
             'container-urn:li:container:0c832d6cfb642ca7e447946a0f1d88b6-to-urn:li:dataset:(urn:li:dataPlatform:oracle,sys.aq$_kupc$datapump_quetab_1_i,PROD)',
             'sys.aq$_kupc$datapump_quetab_1_i'],
'workunits_produced': 759}
Sink (datahub-rest) report:
{'downstream_end_time': None,
'downstream_start_time': None,
'downstream_total_latency_in_seconds': None,
'failures': [],
'gms_version': 'v0.8.29',
'records_written': 0,
'warnings': []}
Pipeline finished with warnings
Looking at the result in my DataHub web service, there is no information about the table schemas.
I only have permission to read the Oracle system tables (all_check_constraints, all_col_comments, all_cons_columns, all_constraints, all_ind_columns, all_indexes, all_sequences, all_synonyms, all_tab_cols, all_tab_columns, all_tab_comments, all_tab_identity_cols, all_tables, all_users, all_views, all_sequences, dual, nls_session_parameters, user_db_links).
Do I need permissions to read more system tables?
How do I figure it out?
Thanks for the help in advance.
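
One hedged way to narrow this down without DataHub in the loop: ask SQLAlchemy directly whether column metadata is visible under the current grants, since the oracle source can only emit what the dialect's reflection sees. The DSN and driver below are placeholders; use the same credentials as the ingestion recipe.

from sqlalchemy import create_engine, inspect

# Placeholder DSN -- same account the recipe uses.
engine = create_engine("oracle+cx_oracle://user:pass@host:1521/?service_name=svc")
inspector = inspect(engine)

for schema in inspector.get_schema_names():
    for table in inspector.get_table_names(schema=schema):
        if not inspector.get_columns(table, schema=schema):
            print(f"no column info visible for {schema}.{table}")

cool-architect-34612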
05/08/2022, 11:39 PM