full-chef-85630
11/23/2022, 5:56 AM
silly-finland-62382
11/23/2022, 7:17 AM
better-fireman-33387
11/23/2022, 10:48 AM
alert-fall-82501
11/23/2022, 11:41 AM
Can anybody help me with this error? I am working on Airflow DAG job lineage and have installed the required plugins and the docker-compose file.
airflow-init_1 | ....................
airflow-init_1 | ERROR! Maximum number of retries (20) reached.
airflow-init_1 |
airflow-init_1 | Last check result:
airflow-init_1 | $ airflow db check
airflow-init_1 | Unable to load the config, contains a configuration error.
airflow-init_1 | Traceback (most recent call last):
airflow-init_1 | File "/usr/local/lib/python3.9/pathlib.py", line 1323, in mkdir
airflow-init_1 | self._accessor.mkdir(self, mode)
airflow-init_1 | FileNotFoundError: [Errno 2] No such file or directory: '/opt/airflow/logs/scheduler/2022-11-23'
airflow-init_1 |
airflow-init_1 | During handling of the above exception, another exception occurred:
airflow-init_1 |
airflow-init_1 | Traceback (most recent call last):
airflow-init_1 | File "/usr/local/lib/python3.9/logging/config.py", line 564, in configure
airflow-init_1 | handler = self.configure_handler(handlers[name])
airflow-init_1 | File "/usr/local/lib/python3.9/logging/config.py", line 745, in configure_handler
airflow-init_1 | result = factory(**kwargs)
airflow-init_1 | File "/home/airflow/.local/lib/python3.9/site-packages/airflow/utils/log/file_processor_handler.py", line 46, in __init__
airflow-init_1 | Path(self._get_log_directory()).mkdir(parents=True, exist_ok=True)
airflow-init_1 | File "/usr/local/lib/python3.9/pathlib.py", line 1327, in mkdir
airflow-init_1 | self.parent.mkdir(parents=True, exist_ok=True)
airflow-init_1 | File "/usr/local/lib/python3.9/pathlib.py", line 1323, in mkdir
airflow-init_1 | self._accessor.mkdir(self, mode)
airflow-init_1 | PermissionError: [Errno 13] Permission denied: '/opt/airflow/logs/scheduler'
airflow-init_1 |
airflow-init_1 | The above exception was the direct cause of the following exception:
airflow-init_1 |
airflow-init_1 | Traceback (most recent call last):
airflow-init_1 | File "/home/airflow/.local/bin/airflow", line 5, in <module>
airflow-init_1 | from airflow.__main__ import main
airflow-init_1 | File "/home/airflow/.local/lib/python3.9/site-packages/airflow/__init__.py", line 46, in <module>
airflow-init_1 | settings.initialize()
airflow-init_1 | File "/home/airflow/.local/lib/python3.9/site-packages/airflow/settings.py", line 444, in initialize
airflow-init_1 | LOGGING_CLASS_PATH = configure_logging()
airflow-init_1 | File "/home/airflow/.local/lib/python3.9/site-packages/airflow/logging_config.py", line 73, in configure_logging
airflow-init_1 | raise e
airflow-init_1 | File "/home/airflow/.local/lib/python3.9/site-packages/airflow/logging_config.py", line 68, in configure_logging
airflow-init_1 | dictConfig(logging_config)
airflow-init_1 | File "/usr/local/lib/python3.9/logging/config.py", line 809, in dictConfig
airflow-init_1 | dictConfigClass(config).configure()
airflow-init_1 | File "/usr/local/lib/python3.9/logging/config.py", line 571, in configure
airflow-init_1 | raise ValueError('Unable to configure handler '
airflow-init_1 | ValueError: Unable to configure handler 'processor'
airflow-init_1 |
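For anyone hitting the same trace: the PermissionError on /opt/airflow/logs/scheduler usually means the host-mounted ./logs directory does not exist or is not writable by the container user, so logging setup fails before `airflow db check` can run. A minimal sketch of the relevant part of a compose file, assuming the stock apache/airflow docker-compose layout (the actual compose file is not shown in the thread); the usual prerequisite is to create ./dags, ./logs and ./plugins on the host and put AIRFLOW_UID=$(id -u) into .env so the user line matches the directory owner:

# sketch only - assumes the stock apache/airflow docker-compose.yaml structure
x-airflow-common:
  &airflow-common
  # AIRFLOW_UID comes from .env and should match the host owner of ./logs,
  # otherwise mkdir /opt/airflow/logs/scheduler fails with Permission denied
  user: "${AIRFLOW_UID:-50000}:0"
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs        # must exist on the host and be writable
    - ./plugins:/opt/airflow/plugins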
silly-intern-25190
11/23/2022, 12:40 PM
future-iron-16086
11/23/2022, 1:51 PM
mammoth-gigabyte-6392
11/23/2022, 3:06 PM
quaint-barista-82836
11/23/2022, 4:33 PM
quaint-barista-82836
11/23/2022, 7:19 PM
quaint-barista-82836
11/23/2022, 7:22 PM
lemon-lock-89160
11/23/2022, 8:25 PM
famous-quill-82626
11/23/2022, 10:15 PM
datahub ingest -c {recipe-filename.yaml}... I do not seem to be able to do this with Domains, and these must be manually entered via the UI - is this correct?
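For context: creating a Domain entity is typically done in the UI (or via the API), but assigning ingested assets to an existing Domain can be driven from a recipe on most sources through the `domain` mapping. A hedged sketch follows; the source type, connection details, domain URN and pattern are all placeholders (none of them come from the thread), and it assumes the Domain already exists:

source:
  type: mysql                      # placeholder source type
  config:
    host_port: localhost:3306      # placeholder connection details
    domain:
      "urn:li:domain:marketing":   # hypothetical, pre-existing domain URN
        allow:
          - "marketing_db\\..*"    # datasets matching this pattern get the domain
sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"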
polite-alarm-98901
11/23/2022, 11:11 PM
abundant-napkin-12120
11/24/2022, 3:27 AM
ancient-policeman-73437
11/24/2022, 7:38 AM
acoustic-ghost-64885
11/24/2022, 9:16 AM
witty-microphone-40893
11/24/2022, 9:48 AM
source:
  type: dbcat.datahub.CatalogSource
  config:
    database: main
    path: '/Users/user/Documents/datascience-experiments/piiscan'
    source_names:
      - prod_cat
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
I run it with
datahub ingest -c ./export.dhub.yml
And the resulting run contains errors like these: (snipped for conciseness)
[2022-11-24 09:40:47,109] INFO {datahub.cli.ingest_cli:167} - DataHub CLI version: 0.9.2.1
[2022-11-24 09:40:47,171] INFO {datahub.ingestion.run.pipeline:174} - Sink configured successfully. DataHubRestEmitter: configured to talk to <http://localhost:8080>
[2022-11-24 09:40:53,745] INFO {datahub.ingestion.run.pipeline:197} - Source configured successfully.
[2022-11-24 09:40:53,746] INFO {datahub.cli.ingest_cli:120} - Starting metadata ingestion
-[2022-11-24 09:40:53,985] ERROR {datahub.ingestion.run.pipeline:57} - failed to write record with workunit loan_management.account_holder with ('Unable to emit metadata to DataHub GMS', {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:422]: com.linkedin.metadata.entity.validation.ValidationException: Failed to validate record with class com.linkedin.entity.Entity: ERROR :: /value/com.linkedin.metadata.snapshot.DatasetSnapshot/aspects/0/com.linkedin.schema.SchemaMetadata/fields/4/globalTags/tags/1/tag :: "Provided urn urn.li.tag.ADDRESS" is invalid\nERROR :: /value/com.linkedin.metadata.snapshot.DatasetSnapshot/aspects/0/com.linkedin.schema.SchemaMetadata/fields/5/globalTags/tags/1/tag :: "Provided urn urn.li.tag.PERSON" is invalid\n', 'message': 'com.linkedin.metadata.entity.validation.ValidationException: Failed to validate record with class com.linkedin.entity.Entity: ERROR :: /value/com.linkedin.metadata.snapshot.DatasetSnapshot/aspects/0/c', 'status': 422, 'id': 'urn:li:dataset:(urn:li:dataPlatform:mysql,loan_management.account_holder,PROD)'}) and info {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:422]: com.linkedin.metadata.entity.validation.ValidationException:
....
....
{'error': 'Unable to emit metadata to DataHub GMS',
'info': {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException',
'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:422]: '
'com.linkedin.metadata.entity.validation.ValidationException: Failed to validate record with class '
'com.linkedin.entity.Entity: ERROR :: '
'/value/com.linkedin.metadata.snapshot.DatasetSnapshot/aspects/0/com.linkedin.schema.SchemaMetadata/fields/3/globalTags/tags/1/tag '
':: "Provided urn urn.li.tag.PERSON" is invalid\n'
'\n'
'\tat com.linkedin.metadata.resources.entity.EntityResource.ingest(EntityResource.java:213)',
'message': 'com.linkedin.metadata.entity.validation.ValidationException: Failed to validate record with class '
'com.linkedin.entity.Entity: ERROR :: /value/com.linkedin.metadata.snapshot.DatasetSnapshot/aspects/0/c',
'status': 422,
'id': 'urn:li:dataset:(urn:li:dataPlatform:mysql,document_templates.editions,PROD)'}},
'... sampled of 87 total elements'],
'start_time': '2022-11-24 09:40:47.165725 (11.66 seconds ago).',
'current_time': '2022-11-24 09:40:58.824433 (now).',
'total_duration_in_seconds': '11.66',
'gms_version': 'v0.9.2',
'pending_requests': '0'}
Pipeline finished with at least 87 failures ; produced 181 events in 5.08 seconds.
It seems the errors are similar to: "Provided urn urn.li.tag.ADDRESS" is invalid
What am I missing to get this to ingest?
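For reference, GMS expects tag URNs in the colon-separated form urn:li:tag:<name>; the dotted urn.li.tag.ADDRESS in the 422 errors is rejected as invalid, which suggests the dbcat CatalogSource (or a version mismatch between dbcat and the datahub CLI, which is an assumption, not confirmed here) is emitting malformed URNs. Purely as an illustration of the shape GMS accepts, and not a fix for the source itself, a recipe transformer that attaches well-formed tag URNs looks like this:

transformers:
  - type: "simple_add_dataset_tags"
    config:
      tag_urns:
        - "urn:li:tag:ADDRESS"   # colon-separated: accepted
        - "urn:li:tag:PERSON"    # colon-separated: accepted
# "urn.li.tag.ADDRESS" (dot-separated) is the form being rejected with HTTP 422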
rich-van-74931
11/24/2022, 10:47 AM
datahub ingest -c tableau.yml
[2022-11-24 10:38:28,123] INFO {datahub.cli.ingest_cli:167} - DataHub CLI version: 0.9.2.2
[2022-11-24 10:38:28,152] INFO {datahub.ingestion.run.pipeline:174} - Sink configured successfully. DataHubRestEmitter: configured to talk to <http://localhost:8080>
==================
[2022-11-24 10:38:28,546] INFO {datahub.ingestion.run.pipeline:197} - Source configured successfully.
[2022-11-24 10:38:28,549] INFO {datahub.cli.ingest_cli:120} - Starting metadata ingestion
-[2022-11-24 10:38:28,582] INFO {datahub.cli.ingest_cli:135} - Finished metadata ingestion
/
Cli report:
{'cli_entry_location': '/home/ec2-user/.local/lib/python3.7/site-packages/datahub/__init__.py',
'cli_version': '0.9.2.2',
'mem_info': '68.09 MB',
'os_details': 'Linux-5.10.149-133.644.amzn2.x86_64-x86_64-with-glibc2.2.5',
'py_exec_path': '/usr/bin/python3',
'py_version': '3.7.10 (default, Jun 3 2021, 00:02:01) \n[GCC 7.3.1 20180712 (Red Hat 7.3.1-13)]'}
Source (tableau) report:
{'event_ids': [],
'events_produced': '0',
'events_produced_per_sec': '0',
'failures': {'tableau-login': ['Unable to login with credentials provided: \n\n\t401001: Signin Error\n\t\tError signing in to Tableau Server']},
'running_time': '0.55 seconds',
'soft_deleted_stale_entities': [],
'start_time': '2022-11-24 10:38:28.255411 (now).',
'warnings': {}}
Sink (datahub-rest) report:
{'current_time': '2022-11-24 10:38:28.805863 (now).',
'failures': [],
'gms_version': 'v0.9.2',
'pending_requests': '0',
'records_written_per_second': '0',
'start_time': '2022-11-24 10:38:28.148111 (now).',
'total_duration_in_seconds': '0.66',
'total_records_written': '0',
'warnings': []}
Pipeline finished with at least 1 failures ; produced 0 events in 0.55 seconds.
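The 401001 Signin Error is returned by Tableau itself, so the credentials, site name, and server URL in tableau.yml are the first things to verify. A hedged sketch of a tableau recipe with placeholder values (the real tableau.yml is not shown in the thread; the server URL, site value, and environment-variable names below are assumptions):

source:
  type: tableau
  config:
    connect_uri: "https://tableau.example.com"   # placeholder server URL
    site: ""                                     # "" for the default site on Tableau Server
    # either username/password ...
    username: "${TABLEAU_USER}"
    password: "${TABLEAU_PASSWORD}"
    # ... or a personal access token instead:
    # token_name: "${TABLEAU_TOKEN_NAME}"
    # token_value: "${TABLEAU_TOKEN_VALUE}"
sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"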
busy-computer-98970
11/24/2022, 1:29 PM
'"systemMetadata": {"lastObserved": 1669295089809, "runId": "aeed3a9a-9dcb-4b89-91c3-8c78c068dc88"}}}\' '
"'<http://datahub-datahub-gms:8080/aspects?action=ingestProposal>'\n"
'[2022-11-24 13:04:49,851] DEBUG {datahub.ingestion.run.pipeline:47} - sink wrote workunit '
'container-urn:li:container:d517cb430c984c29a927ccf609be7dcf-to-urn:li:dataset:(urn:li:dataPlatform:athena,api_silver_db.superlogica_faturas,PROD)\n'
'[2022-11-24 13:04:49,884] DEBUG {datahub.emitter.rest_emitter:236} - Attempting to emit to DataHub GMS; using curl equivalent to:\n',
'2022-11-24 13:04:49.888314 [exec_id=aeed3a9a-9dcb-4b89-91c3-8c78c068dc88] INFO: Caught exception EXECUTING '
'task_id=aeed3a9a-9dcb-4b89-91c3-8c78c068dc88, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
' File "/usr/local/lib/python3.10/asyncio/streams.py", line 525, in readline\n'
' line = await self.readuntil(sep)\n'
' File "/usr/local/lib/python3.10/asyncio/streams.py", line 620, in readuntil\n'
' raise exceptions.LimitOverrunError(\n'
'asyncio.exceptions.LimitOverrunError: Separator is found, but chunk is longer than limit\n'
'\n'
'During handling of the above exception, another exception occurred:\n'
'\n'
'Traceback (most recent call last):\n'
' File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 123, in execute_task\n'
' task_event_loop.run_until_complete(task_future)\n'
' File "/usr/local/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete\n'
' return future.result()\n'
' File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 147, in execute\n'
' await tasks.gather(_read_output_lines(), _report_progress(), _process_waiter())\n'
' File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 99, in _read_output_lines\n'
' line_bytes = await ingest_process.stdout.readline()\n'
' File "/usr/local/lib/python3.10/asyncio/streams.py", line 534, in readline\n'
' raise ValueError(e.args[0])\n'
'ValueError: Separator is found, but chunk is longer than limit\n']}
Execution finished with errors.
As a reminder, this error only occurs when profiling is enabled.
My recipe:
source:
  type: athena
  config:
    aws_region: us-west-2
    s3_staging_dir: '------------------------------'
    profiling:
      enabled: true
      include_field_sample_values: false
    work_group: primary
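The LimitOverrunError is raised by the UI executor while reading the ingestion subprocess's stdout, so an over-long log line (the DEBUG-level curl dumps above are a plausible culprit) rather than Athena itself appears to be the trigger; that reading is an assumption, not something confirmed in the thread. One mitigation worth trying is to shrink what profiling emits, for example with profile_table_level_only; a sketch, keeping aws_region, s3_staging_dir, and work_group as in the original recipe:

source:
  type: athena
  config:
    aws_region: us-west-2
    s3_staging_dir: '------------------------------'
    work_group: primary
    profiling:
      enabled: true
      include_field_sample_values: false
      profile_table_level_only: true   # row/column counts only; keeps emitted
                                       # records and log lines much smaller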
enough-mouse-67490
11/24/2022, 3:06 PM
source:
  type: snowflake
  config:
    include_table_lineage: true
    account_id: pagaya-luigi
    profiling:
      enabled: true
    include_view_lineage: true
    warehouse: datahub_wh
    stateful_ingestion:
      enabled: false
    username: '${snowflake_prod_username}'
    password: '${snowflake_prod_password}'
    table_pattern:
      deny:
        - '.*TMP$'
    role: datahub_test
pipeline_name: 'urn:li:dataHubIngestionSource:_______'
I can see all the tables, but I can't see the Snowpipes, streams, and tasks.
Does anyone know how to connect them, or what is wrong with my recipe?
Thanks in advance 🙂
colossal-smartphone-90274
11/24/2022, 4:26 PM
white-xylophone-3944
11/25/2022, 5:25 AM
flaky-soccer-57765
11/25/2022, 9:48 AM
full-planet-19427
11/25/2022, 12:57 PM
source:
  type: delta-lake
  config:
    env: "DEV"
    base_path: "s3://dl-gold-zone-dev/"
    s3:
      aws_config:
        aws_region: "us-east-1"
        aws_access_key_id: {{ $aws_access_key_id }}
        aws_secret_access_key: {{ $aws_secret_access_key }}
sink:
  type: "datahub-rest"
  config:
    server: "{{ $serverUrl }}"
But when my job starts, it fails with this error:
deltalake.PyDeltaTableError: Failed to load checkpoint: Failed to read checkpoint content: Failed to read S3 object content: Request ID: None Body: <?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AuthorizationHeaderMalformed</Code><Message>The authorization header is malformed; the region 'custom' is wrong; expecting 'us-east-1'</Message><Region>us-east-1</Region><RequestId>AEW6HDPFQ0P65Z4C</RequestId><HostId>YlU8KVjy7UmSYF6tOM7iZmSJshn1tTKpCzF/mKogmz8lEPkB+ZWhcoNce4Laj/kNYmHTMiqWIRc=</HostId></Error>
I can't figure out where I need to configure this region, given that I've already set the "aws_region" parameter. Could someone help me understand this problem?
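The "region 'custom'" in that message comes from the underlying delta-rs reader rather than from DataHub's own S3 client, and delta-rs may resolve its region from environment variables instead of the recipe's aws_config; that is an assumption, not something confirmed in the thread. A hypothetical environment block for the container or Job that runs `datahub ingest`, worth trying alongside the recipe:

# hypothetical environment for the container/Job that runs `datahub ingest`;
# the delta-rs reader can pick the region up from these variables
environment:
  - AWS_REGION=us-east-1
  - AWS_DEFAULT_REGION=us-east-1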
future-iron-16086
11/25/2022, 6:23 PM
eager-cpu-59593
11/26/2022, 2:29 PM
source:
  type: elasticsearch
  config:
    host: <our_host>:9200
    use_ssl: false
    verify_certs: false
    url_prefix: ""
    index_pattern:
      allow: [".*"]
      deny: ["^_."]
sink:
  type: "datahub-rest"
  config:
    server: "http://<datahub-service-name>:8080"
• The Elasticsearch cluster we're trying to ingest is version 6.6.2, and the resulting logs after the ingestion are the following:
Cli report:
{'cli_version': '0.9.2+docker',
'cli_entry_location': '/usr/local/lib/python3.10/site-packages/datahub/__init__.py',
'py_version': '3.10.7 (main, Oct 5 2022, 14:33:54) [GCC 10.2.1 20210110]',
'py_exec_path': '/usr/local/bin/python',
'os_details': 'Linux-5.4.219-126.411.amzn2.x86_64-x86_64-with-glibc2.31',
'mem_info': '128.81 MB'}
Source (elasticsearch) report:
{'events_produced': '0',
'events_produced_per_sec': '0',
'event_ids': [],
'warnings': {},
'failures': {},
'index_scanned': '55',
'filtered': [],
'start_time': '2022-11-26 12:10:03.720786 (now).',
'running_time': '0.46 seconds'}
Sink (datahub-rest) report:
{'total_records_written': '0',
'records_written_per_second': '0',
'warnings': [],
'failures': [],
'start_time': '2022-11-26 12:10:03.494297 (now).',
'current_time': '2022-11-26 12:10:04.179827 (now).',
'total_duration_in_seconds': '0.69',
'gms_version': 'v0.9.2',
'pending_requests': '0'}
Pipeline finished successfully; produced 0 events in 0.46 seconds.
• Some WARNINGS that appear during the ingestion are similar to this one:
[2022-11-26 12:10:03,932] WARNING {datahub.ingestion.source.elastic_search:172} - Missing 'properties' in elastic search mappings={"status": {"properties": {"indexing_status": {"type": "keyword", "index": false}, "version": {"type": "long"}}}}!
Does anyone have an idea of what might be happening, or whether we're doing something wrong? Thank you very much!
thankful-kite-1198
11/27/2022, 12:21 PM
quaint-rainbow-60164
11/28/2022, 3:38 AM
lineage in PostgreSQL ingestion?
full-chef-85630
11/28/2022, 10:10 AM
alert-fall-82501
11/28/2022, 10:22 AM