alert-fall-82501
11/28/2022, 10:23 AM
[2022-11-28 15:48:50,975] INFO {datahub.cli.ingest_cli:165} - DataHub CLI version: 0.9.2.4
[2022-11-28 15:48:51,351] ERROR {datahub.entrypoints:206} - Command failed: while scanning for the next token
found character '\t' that cannot start any token
in "<file>", line 13, column 32
Traceback (most recent call last):
return loader.get_single_data()
File "/usr/lib/python3/dist-packages/yaml/constructor.py", line 49, in get_single_data
node = self.get_single_node()
File "/usr/lib/python3/dist-packages/yaml/composer.py", line 36, in get_single_node
document = self.compose_document()
File "/usr/lib/python3/dist-packages/yaml/composer.py", line 55, in compose_document
node = self.compose_node(None, None)
File "/usr/lib/python3/dist-packages/yaml/composer.py", line 84, in compose_node
node = self.compose_mapping_node(anchor)
File "/usr/lib/python3/dist-packages/yaml/composer.py", line 133, in compose_mapping_node
item_value = self.compose_node(node, item_key)
File "/usr/lib/python3/dist-packages/yaml/composer.py", line 84, in compose_node
node = self.compose_mapping_node(anchor)
File "/usr/lib/python3/dist-packages/yaml/composer.py", line 133, in compose_mapping_node
item_value = self.compose_node(node, item_key)
File "/usr/lib/python3/dist-packages/yaml/composer.py", line 84, in compose_node
node = self.compose_mapping_node(anchor)
File "/usr/lib/python3/dist-packages/yaml/composer.py", line 127, in compose_mapping_node
while not self.check_event(MappingEndEvent):
File "/usr/lib/python3/dist-packages/yaml/parser.py", line 98, in check_event
self.current_event = self.state()
File "/usr/lib/python3/dist-packages/yaml/parser.py", line 428, in parse_block_mapping_key
if self.check_token(KeyToken):
File "/usr/lib/python3/dist-packages/yaml/scanner.py", line 116, in check_token
self.fetch_more_tokens()
File "/usr/lib/python3/dist-packages/yaml/scanner.py", line 258, in fetch_more_tokens
raise ScannerError("while scanning for the next token", None,
yaml.scanner.ScannerError: while scanning for the next token
found character '\t' that cannot start any token
in "<file>", line 13, column 32
alert-fall-82501
11/28/2022, 10:25 AM
source:
  type: redshift
  config:
    # Coordinates
    host_port: xxxxxxxxxxxx
    database: xxx
    database_alias: xx
    # Credentials
    username: xxxx
    password: xxxxxxxx
    include_views: True # whether to include views, defaults to True
    include_tables: True # whether to include tables, defaults to True
    include_table_lineage: True
    schema_pattern:
      allow: ['rawdata']
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
alert-fall-82501
11/28/2022, 11:45 AM
alert-fall-82501
11/28/2022, 11:46 AM
/usr/lib/python3/dist-packages/paramiko/transport.py:219: CryptographyDeprecationWarning: Blowfish has been deprecated
"class": algorithms.Blowfish,
[2022-11-28 17:12:43,199] INFO {datahub.cli.ingest_cli:165} - DataHub CLI version: 0.9.2.4
[2022-11-28 17:12:43,386] INFO {datahub.ingestion.run.pipeline:174} - Sink configured successfully. DataHubRestEmitter: configured to talk to <http://localhost:8080>
/home/kiranto@cybage.com/.local/lib/python3.8/site-packages/pkg_resources/__init__.py:123: PkgResourcesDeprecationWarning: 0.1.36ubuntu1 is an invalid version and will not be supported in a future release
warnings.warn(
/home/kiranto@cybage.com/.local/lib/python3.8/site-packages/pkg_resources/__init__.py:123: PkgResourcesDeprecationWarning: 0.23ubuntu1 is an invalid version and will not be supported in a future release
warnings.warn(
/home/kiranto@cybage.com/.local/lib/python3.8/site-packages/pkg_resources/__init__.py:123: PkgResourcesDeprecationWarning: 1.13.1-unknown is an invalid version and will not be supported in a future release
warnings.warn(
[2022-11-28 17:12:58,883] WARNING {root:99} - project_id_pattern is not set but project_id is set, setting project_id as project_id_pattern. project_id will be deprecated, please use project_id_pattern instead.
[2022-11-28 17:12:59,244] ERROR {datahub.entrypoints:182} - Failed to configure source (bigquery): 1 validation error for BigQueryV2Config
credential -> include_table_lineage
extra fields not permitted (type=value_error.extra)
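The validation error means include_table_lineage ended up nested under credential in the recipe; a minimal sketch of the intended layout, with placeholder credential values:

source:
  type: bigquery
  config:
    # Feature flags such as lineage belong at the config level...
    include_table_lineage: true
    # ...while credential only takes the service-account fields (placeholders below).
    credential:
      project_id: my-gcp-project
      private_key_id: "..."
      private_key: "..."
      client_email: sa@my-gcp-project.iam.gserviceaccount.com
      client_id: "..."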
brave-pencil-21289
11/28/2022, 12:16 PM
lively-dusk-19162
11/28/2022, 5:57 PM
lively-dusk-19162
11/28/2022, 5:57 PM
breezy-controller-54597
11/29/2022, 2:25 AM
average-baker-96343
11/29/2022, 3:14 AM
average-baker-96343
11/29/2022, 3:15 AM
'(trino.exceptions.FailedToObtainAddedPrepareHeader) \\n[SQL: SELECT \\"table_name\\"\\nFROM '
'\\"information_schema\\".\\"views\\"\\nWHERE \\"table_schema\\" = ?]\\n[parameters: (\'trino_cd_test\',)]\\n(Background on '
'this error at: <http://sqlalche.me/e/13/dbapi>)"], "xxl_job_2.1.0": ["Tables error: '
'(trino.exceptions.FailedToObtainAddedPrepareHeader) \\n[SQL: SELECT \\"table_name\\"\\nFROM '
'\\"information_schema\\".\\"tables\\"\\nWHERE \\"table_schema\\" = ? and \\"table_type\\" != \'VIEW\']\\n[parameters: '
'(\'xxl_job_2.1.0\',)]\\n(Background on this error at: <http://sqlalche.me/e/13/dbapi>)", "Views error: '
'(trino.exceptions.FailedToObtainAddedPrepareHeader) \\n[SQL: SELECT \\"table_name\\"\\nFROM '
'\\"information_schema\\".\\"views\\"\\nWHERE \\"table_schema\\" = ?]\\n[parameters: (\'xxl_job_2.1.0\',)]\\n(Background on '
'this error at: <http://sqlalche.me/e/13/dbapi>)"]}, "tables_scanned": "0", "views_scanned": "0", "entities_profiled": "0", '
'"filtered": [], "soft_deleted_stale_entities": [], "start_time": "2022-11-29 03:10:56.524470", "running_time_in_seconds": '
'"1"}}, "sink": {"type": "datahub-rest", "report": {"total_records_written": "55", "records_written_per_second": "11", '
'"warnings": [], "failures": [], "start_time": "2022-11-29 03:10:53.114169", "current_time": "2022-11-29 03:10:57.986499", '
'"total_duration_in_seconds": "4.87", "gms_version": "v0.9.2", "pending_requests": "0"}}}'}
Execution finished with errors.
loud-journalist-47725
11/29/2022, 6:12 AM
Has anyone else had the chance to change the `removed` state from 'true' to 'false'?
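A minimal sketch of one way to flip the flag back, emitting a Status aspect with removed=False through the Python SDK (the URN and GMS address below are placeholders):

from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import ChangeTypeClass, StatusClass

dataset_urn = "urn:li:dataset:(urn:li:dataPlatform:redshift,mydb.myschema.mytable,PROD)"  # placeholder
emitter = DatahubRestEmitter(gms_server="http://localhost:8080")  # placeholder GMS address

# removed=False makes a soft-deleted entity visible again.
mcp = MetadataChangeProposalWrapper(
    entityType="dataset",
    changeType=ChangeTypeClass.UPSERT,
    entityUrn=dataset_urn,
    aspectName="status",
    aspect=StatusClass(removed=False),
)
emitter.emit(mcp)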
ancient-policeman-73437
11/29/2022, 10:22 AM
aloof-iron-76856
11/29/2022, 6:37 PM
freezing-cat-19219
11/30/2022, 1:27 AM
late-ability-59580
11/30/2022, 7:34 AM
future-iron-16086
11/30/2022, 12:35 PM
{
  "entity": {
    "value": {
      "com.linkedin.metadata.snapshot.DatasetSnapshot": {
        "urn": "urn:li:dataset:(urn:li:dataPlatform:bigquery,project.schema.table,QA)",
        "aspects": [
          {
            "com.linkedin.schema.EditableSchemaMetadata": {
              "editableSchemaFieldInfo": [
                {
                  "fieldPath": "IND_STATUS",
                  "globalTags": {
                    "tags": [
                      { "tag": "urn:li:tag:Engineering_03" },
                      { "tag": "urn:li:tag:Felipe" }
                    ]
                  }
                }
              ]
            }
          }
        ]
      }
    }
  }
}
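For reference, a payload shaped like the one above can be sent straight to the GMS Rest.li ingest endpoint; a minimal sketch, assuming a local GMS at http://localhost:8080 and the JSON saved as tag_payload.json (file name is illustrative):

import json
import requests

# Hypothetical file containing the DatasetSnapshot payload shown above.
with open("tag_payload.json") as f:
    payload = json.load(f)

# Post to the GMS ingest endpoint; adjust host/port for your deployment and
# add an Authorization: Bearer <token> header if metadata auth is enabled.
resp = requests.post(
    "http://localhost:8080/entities?action=ingest",
    headers={
        "Content-Type": "application/json",
        "X-RestLi-Protocol-Version": "2.0.0",
    },
    data=json.dumps(payload),
)
resp.raise_for_status()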
Is it possible to do?
calm-psychiatrist-98577
11/30/2022, 9:24 PM
square-solstice-69079
12/01/2022, 8:24 AM
ancient-jordan-41401
12/01/2022, 10:03 AM
ancient-apartment-23316
12/01/2022, 1:33 PM
datahub ingest -c myrecipe.dhub.yaml
and I am getting a lot of errors:
Warning - Read timed out
Error - Failed to fetch the large result set
[2022-12-01 15:23:19,646] WARNING {snowflake.connector.vendored.urllib3.connectionpool:780} - Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='myhostname.s3.amazonaws.com', port=443): Read timed out. (read timeout=7)")': /5bdk-s-v2st8093/results/01a8ad7b-0402-c842-0021-fd031e5452d2_0/main/data_0_4_10?x-amz-server-side-encryption-customer-algorithm=AES256&response-content-encoding=gzip&AWSAccessKeyId=qweqwe&Expires=1669922461&Signature=qweqwe
[2022-12-01 15:23:19,843] ERROR {snowflake.connector.result_batch:342} - Failed to fetch the large result set batch data_0_4_7 for the 1 th time, backing off for 3s for the reason: 'HTTPSConnectionPool(host='myhostname.s3.amazonaws.com', port=443): Read timed out.'
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/snowflake/connector/vendored/urllib3/contrib/pyopenssl.py", line 319, in recv_into
return self.connection.recv_into(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/OpenSSL/SSL.py", line 1800, in recv_into
self._raise_ssl_error(self._ssl, result)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/OpenSSL/SSL.py", line 1607, in _raise_ssl_error
raise WantReadError()
OpenSSL.SSL.WantReadError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/snowflake/connector/vendored/urllib3/contrib/pyopenssl.py", line 319, in recv_into
return self.connection.recv_into(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/OpenSSL/SSL.py", line 1800, in recv_into
self._raise_ssl_error(self._ssl, result)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/OpenSSL/SSL.py", line 1607, in _raise_ssl_error
raise WantReadError()
OpenSSL.SSL.WantReadError
bumpy-pharmacist-66525
12/01/2022, 2:43 PM
Is there a way to supply the superset source with an OAuth token for Superset? At the moment it seems like you can only supply a username and password for a Superset account, but being able to supply an OAuth token would be a great feature. Unless I am misunderstanding how it works, once you enable OAuth in Superset there is no longer a way to log in with a local username and password; you must go the OAuth route. This means that the superset source can no longer work as soon as you enable OAuth (on the Superset end).
quiet-wolf-56299
12/01/2022, 4:26 PM
ancient-apartment-23316
12/01/2022, 8:03 PM
invalid-dataset-pattern
I don't see this data loaded into DataHub. I made a recipe for only 2 tables, but they didn't get into DataHub:
'warnings': {'invalid-dataset-pattern': ["Found ['MY_QQQ_PROD', 'WWW'] of type Schema", "Found ['MY_QQQ_PROD', 'WWW'] of type Schema", "Found ['MY_QQQ_PROD', 'WWW'] of type Schema", "Found ['MY_QQQ_PROD', 'WWW'] of type Schema", "Found ['MY_QQQ_DEV', 'JJ_KK_WW'] of type Schema", "Found ['MY_QQQ_PROD', 'WWW'] of type Schema", "Found ['MY_QQQ_DEV', 'JJ_KK_WW'] of type Schema", "Found ['MY_QQQ_DEV', 'QWE'] of type Schema", "Found ['MY_QQQ_DEV', 'KKK'] of type Schema", "Found ['MY_QQQ_DEV', 'QWE'] of type Schema", '... sampled of 12 total elements']},
...
'total_records_written': '9',
...
Pipeline finished with at least 12 warnings; produced 9 events in 4 minutes and 37.43 seconds.
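If the goal is to restrict the run to just those two tables, the usual approach (assuming the Snowflake source; all names below are placeholders) is a fully qualified table_pattern, matched against database.schema.table:

source:
  type: snowflake
  config:
    # table_pattern matches the fully qualified database.schema.table name.
    table_pattern:
      allow:
        - '^MY_QQQ_PROD\.WWW\.MY_TABLE_1$'
        - '^MY_QQQ_PROD\.WWW\.MY_TABLE_2$'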
future-iron-16086
12/01/2022, 8:14 PM
rhythmic-stone-77840
12/01/2022, 10:57 PM
blue-fall-10754
12/01/2022, 11:04 PM
I am using datahub.emitter.rest_emitter.DatahubRestEmitter against the gms endpoint, and I am being met with a 401 Unauthorized:
ConfigurationError: Unable to connect to https://{MY_COMPANIES_DATAHUB_HOST}/api/gms/config with status_code: 401. Maybe you need to set up authentication? Please check your configuration and make sure you are talking to the DataHub GMS (usually <datahub-gms-host>:8080) or Frontend GMS API (usually <frontend>:9002/api/gms).
I know the team has not _opted in_ for authn (I can confirm this because the root user cannot create access tokens), so is it expected behavior to run into this issue when hitting GMS via the rest emitter API?
What's adding confusion is that when I hit the same GMS link in my browser, I can see the JSON config returned (so that rules out hitting the wrong URL for GMS). Also, this issue is not seen by a teammate who ingests smaller datasets using file ingestion via the UI.
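For what it's worth, a minimal sketch of pointing the emitter directly at GMS and optionally passing a personal access token (host and token below are placeholders):

from datahub.emitter.rest_emitter import DatahubRestEmitter

# Point at the GMS host directly (usually port 8080) or, if going through the
# frontend proxy at /api/gms, pass a personal access token.
emitter = DatahubRestEmitter(
    gms_server="http://datahub-gms.mycompany.internal:8080",  # placeholder
    token="<personal-access-token, if auth is enabled>",       # placeholder
)

# Raises if the endpoint cannot be reached or rejects the credentials.
emitter.test_connection()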
square-solstice-69079
12/02/2022, 8:39 AM
'[2022-12-02 08:23:27,442] INFO {datahub.cli.ingest_cli:177} - DataHub CLI version: 0.8.43.5\n'
'[2022-12-02 08:23:27,467] INFO {datahub.ingestion.run.pipeline:163} - Sink configured successfully. DataHubRestEmitter: configured '
I'm on version 0.9.2, and I use only quickstart to upgrade, but I apply a docker-compose file after the upgrade to add some variables to the containers for OIDC.
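For reference, a minimal sketch of the kind of compose override that layers OIDC variables onto the quickstart stack (the service name matches the quickstart compose; the values are placeholders):

# docker-compose.override.yml (hypothetical file), layered on top of the
# quickstart compose file with a second -f flag.
services:
  datahub-frontend-react:
    environment:
      - AUTH_OIDC_ENABLED=true
      - AUTH_OIDC_CLIENT_ID=<client-id>
      - AUTH_OIDC_CLIENT_SECRET=<client-secret>
      - AUTH_OIDC_DISCOVERY_URI=https://idp.example.com/.well-known/openid-configuration
      - AUTH_OIDC_BASE_URL=http://localhost:9002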
limited-forest-73733
12/02/2022, 9:48 AM
lemon-cat-72045
12/05/2022, 5:51 AM
'DatahubIngestionCheckpointingProvider. Commit policy = CommitPolicy.ON_NO_ERRORS, has_errors=True, has_warnings=False\n'
kind-sunset-55628
12/05/2022, 6:32 AM
'(cx_Oracle.DatabaseError) ORA-00942: table or view does not exist\n'
'[SQL: SELECT username FROM dba_users ORDER BY username]