refined-energy-76018
12/17/2022, 1:55 AM
Are there plans to add new attributes to the customProperties of dataProcessInstanceProperties, or should new attributes be added by extending the entity? I noticed that some properties available in the Airflow API are missing when compared with what airflow_generator.py emits. More specifically, comparing this with the attached screenshot.
great-fall-93268
12/17/2022, 4:37 PM
brave-lunch-64773
12/19/2022, 4:25 AM
(oracle): 1 validation error for OracleConfig
schema_pattern.allow
  extra fields not permitted (type=value_error.extra)
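For context, this pydantic "extra fields not permitted" error usually means the allow key ended up somewhere the config model does not expect it (often an indentation issue, or a dotted key such as schema_pattern.allow written literally), or the installed CLI version does not know the field. A minimal sketch of where schema_pattern.allow normally sits in an Oracle recipe, with hypothetical connection values:
source:
  type: oracle
  config:
    host_port: "oracle-host:1521"     # hypothetical host
    service_name: "ORCLPDB1"          # hypothetical service name
    username: "datahub_reader"        # hypothetical user
    password: "${ORACLE_PASSWORD}"
    schema_pattern:
      allow:
        - "MY_SCHEMA"                 # hypothetical schema name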
limited-forest-73733
12/19/2022, 7:49 AM
damp-ambulance-34232
12/19/2022, 8:04 AM
aloof-energy-17918
12/19/2022, 9:05 AM
faint-actor-78390
12/19/2022, 9:37 AM
bitter-park-52601
12/19/2022, 10:15 AM
1. Set the METADATA_SERVICE_AUTH_ENABLED environment variable to "true" for the datahub-gms AND datahub-frontend containers / pods.
2. Granted the privileges Generate Personal Access Tokens or `Manage All Access Tokens` to my user.
3. Generated an Access Token without an expiration date.
4. Tried the following request:
curl -X 'GET' \
  '<myserverurl>/openapi/entities/v1/latest?urns=<myurn>' \
  -H 'Authorization: Bearer <my token>' \
  -H 'accept: application/json'
Any ideas? 🙂
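For reference, a minimal sketch of step 1 in docker-compose form (the service names are assumptions; on Kubernetes the same variable is set on the gms and frontend pods):
services:
  datahub-gms:
    environment:
      - METADATA_SERVICE_AUTH_ENABLED=true   # from step 1 above
  datahub-frontend-react:
    environment:
      - METADATA_SERVICE_AUTH_ENABLED=true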
stocky-truck-96371
12/19/2022, 2:02 PM
rhythmic-church-10210
12/19/2022, 3:00 PM
aloof-lamp-5537
12/19/2022, 3:27 PM
1. get_schema_metadata() is given a Kafka topic and then fetches a single schema. How would I handle topics that hold messages with more than one schema?
2. Do I need to fork the source code, or can I just copy src/datahub/metadata/schema_classes.py to my project in order to get the SchemaMetadata class? Or is there a better way?
future-florist-65080
12/19/2022, 9:00 PM
lively-dusk-19162
12/19/2022, 11:25 PM
swift-evening-68463
12/20/2022, 6:59 AM
alert-fall-82501
12/20/2022, 11:29 AM
alert-fall-82501
12/20/2022, 11:30 AM
[2022-12-20, 06:00:12 UTC] {{subprocess.py:74}} INFO - Running command: ['bash', '-c', 'python3 -m datahub ingest -c /usr/local/airflow/dags/dt_datahub/recipes/prod/Hive/hive.yaml']
[2022-12-20, 06:00:12 UTC] {{subprocess.py:85}} INFO - Output:
[2022-12-20, 06:00:16 UTC] {{subprocess.py:89}} INFO - [2022-12-20, 06:00:16 UTC] INFO {datahub.cli.ingest_cli:179} - DataHub CLI version: 0.8.44
[2022-12-20, 06:00:16 UTC] {{subprocess.py:89}} INFO - [2022-12-20, 06:00:16 UTC] INFO {datahub.ingestion.run.pipeline:165} - Sink configured successfully. DataHubRestEmitter: configured to talk to <https://datahub-gms.digitalturbine.com:8080>
[2022-12-20, 06:00:21 UTC] {{subprocess.py:89}} INFO - [2022-12-20, 06:00:21 UTC] INFO {datahub.ingestion.run.pipeline:190} - Source configured successfully.
[2022-12-20, 06:00:21 UTC] {{subprocess.py:89}} INFO - [2022-12-20, 06:00:21 UTC] INFO {datahub.cli.ingest_cli:126} - Starting metadata ingestion
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - [2022-12-20, 06:00:22 UTC] INFO {datahub.cli.ingest_cli:134} - Source (hive) report:
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - {'entities_profiled': '0',
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - 'event_ids': [],
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - 'events_produced': '0',
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - 'events_produced_per_sec': '0',
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - 'failures': {},
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - 'filtered': [],
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - 'read_rate': '0',
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - 'running_time_in_seconds': '0',
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - 'soft_deleted_stale_entities': [],
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - 'start_time': '2022-12-20 06:00:21.431859',
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - 'tables_scanned': '0',
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - 'views_scanned': '0',
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - 'warnings': {}}
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - [2022-12-20, 06:00:22 UTC] INFO {datahub.cli.ingest_cli:137} - Sink (datahub-rest) report:
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - {'current_time': '2022-12-20 06:00:22.094502',
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - 'failures': [],
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - 'gms_version': 'v0.8.45',
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - 'pending_requests': '0',
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - 'records_written_per_second': '0',
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - 'start_time': '2022-12-20 06:00:14.207811',
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - 'total_duration_in_seconds': '7.89',
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - 'total_records_written': '0',
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - 'warnings': []}
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - [2022-12-20, 06:00:22 UTC] ERROR {datahub.entrypoints:192} -
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - Traceback (most recent call last):
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - File "/usr/local/airflow/.local/lib/python3.7/site-packages/datahub/entrypoints.py", line 149, in main
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - sys.exit(datahub(standalone_mode=False, **kwargs))
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1128, in __call__
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - return self.main(*args, **kwargs)
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1053, in main
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - rv = self.invoke(ctx)
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1659, in invoke
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - return _process_result(sub_ctx.command.invoke(sub_ctx))
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1659, in invoke
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - return _process_result(sub_ctx.command.invoke(sub_ctx))
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1395, in invoke
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - return ctx.invoke(self.callback, **ctx.params)
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - File "/usr/local/lib/python3.7/site-packages/click/core.py", line 754, in invoke
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - return __callback(*args, **kwargs)
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - File "/usr/local/lib/python3.7/site-packages/click/decorators.py", line 26, in new_func
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - return f(get_current_context(), *args, **kwargs)
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - File "/usr/local/airflow/.local/lib/python3.7/site-packages/datahub/telemetry/telemetry.py", line 347, in wrapper
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - raise e
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - File "/usr/local/airflow/.local/lib/python3.7/site-packages/datahub/telemetry/telemetry.py", line 299, in wrapper
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - res = func(*args, **kwargs)
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - File "/usr/local/airflow/.local/lib/python3.7/site-packages/datahub/utilities/memory_leak_detector.py", line 102, in wrapper
[2022-12-20, 06:00:22 UTC] {{subprocess.py:89}} INFO - return func(*args, **kwargs)
packages/airflow/operators/bash.py", line 188, in execute
f'Bash command failed. The command returned a non-zero exit code {result.exit_code}.'
airflow.exceptions.AirflowException: Bash command failed. The command returned a non-zero exit code 1.
[2022-12-20, 06:00:23 UTC] {{taskinstance.py:1280}} INFO - Marking task as UP_FOR_RETRY. dag_id=datahub_hive_ingest, task_id=hive_ingest, execution_date=20221219T060000, start_date=20221220T060011, end_date=20221220T060023
[2022-12-20, 06:00:23 UTC] {{standard_task_runner.py:91}} ERROR - Failed to execute job 85663 for task hive_ingest
chilly-spring-43918
12/20/2022, 12:37 PM
source:
  type: bigquery
  config:
    credential:
      private_key_id: #####key_id#####
      project_id: #####project_id#####
      client_email: #####client_email#####
      private_key: '${stg_pvt_key}'
      client_id: '#####client_d#####'
    project_id_pattern:
      allow:
        - #####bigquery_project#####
and here is the error
⏳ Pipeline running successfully so far; produced 19 events in 7.76 seconds.
/usr/local/bin/run_ingest.sh: line 40: 376 Killed ( datahub ${debug_option} ingest run -c "${recipe_file}" ${report_option} )
2022-12-20 12:18:14.874076 [exec_id=1d5677d6-5b68-4652-87df-9842306804aa] INFO: Failed to execute 'datahub ingest'
2022-12-20 12:18:14.874434 [exec_id=1d5677d6-5b68-4652-87df-9842306804aa] INFO: Caught exception EXECUTING task_id=1d5677d6-5b68-4652-87df-9842306804aa, name=RUN_INGEST, stacktrace=Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 123, in execute_task
task_event_loop.run_until_complete(task_future)
File "/usr/local/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete
return future.result()
File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 168, in execute
raise TaskError("Failed to execute 'datahub ingest'")
acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'
~~~~ Execution Summary ~~~~
RUN_INGEST - {'errors': [],
'exec_id': '1d5677d6-5b68-4652-87df-9842306804aa',
I am using DataHub version v0.9.3, deployed with helm chart version 0.2.120.
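A bare "Killed" on the run_ingest.sh line above usually means the ingestion subprocess was terminated by the out-of-memory killer. A minimal sketch, assuming the standard datahub helm chart layout where UI ingestion runs inside the acryl-datahub-actions pod, of raising its memory limit in values.yaml (the key layout is an assumption; check your chart version):
acryl-datahub-actions:
  resources:
    requests:
      memory: 1Gi
    limits:
      memory: 2Gi   # assumption: raise above the default and size to the BigQuery project being ingested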
best-wire-59738
12/20/2022, 12:49 PM
best-wire-59738
12/20/2022, 1:41 PM
purple-terabyte-64712
12/20/2022, 1:45 PM
limited-forest-73733
12/20/2022, 2:10 PM
microscopic-machine-90437
12/20/2022, 2:49 PM
microscopic-mechanic-13766
12/20/2022, 3:55 PM
faint-tiger-13525
12/21/2022, 11:02 AM
late-ability-59580
12/21/2022, 12:07 PM
late-ability-59580
12/21/2022, 1:16 PM
platform_instance can be used to differentiate between 2 Snowflake accounts, and allows for identical resource (<db.schema.table>) names in different accounts.
My question is about shared databases and tables:
Is there a way to automatically identify shared entities and provide lineage between them?
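For illustration, a minimal sketch of how platform_instance keeps two Snowflake accounts apart, with hypothetical account identifiers and credentials:
# Recipe for the first account (all values hypothetical)
source:
  type: snowflake
  config:
    account_id: "acct_a-xy12345"
    platform_instance: "account_a"
    username: "${SNOWFLAKE_USER_A}"
    password: "${SNOWFLAKE_PASSWORD_A}"

# Recipe for the second account; identical <db.schema.table> names stay distinct
source:
  type: snowflake
  config:
    account_id: "acct_b-ab67890"
    platform_instance: "account_b"
    username: "${SNOWFLAKE_USER_B}"
    password: "${SNOWFLAKE_PASSWORD_B}"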
bumpy-egg-8563
12/21/2022, 3:20 PM
dbt and bigquery entities have been wrapped into one, user-friendly looking dataset. So I could expect to see the results of BQ SQL profiling (Stats) present next to dbt test results (Validation), assuming ingestion was performed using two separate recipes, am I right? If not, could you please give me a hint about what kind of action I should take to make both of these tabs available?
P.S. I'm using v0.8.44 atm.
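For illustration, a minimal sketch of the two-recipe setup described above, with hypothetical paths and project names; the dbt source is pointed at the same BigQuery tables via target_platform, BigQuery profiling feeds the Stats tab and dbt test results feed the Validation tab:
# BigQuery recipe (hypothetical project)
source:
  type: bigquery
  config:
    project_id: "my-bq-project"
    profiling:
      enabled: true

# dbt recipe (hypothetical artifact paths)
source:
  type: dbt
  config:
    manifest_path: "/dbt/target/manifest.json"
    catalog_path: "/dbt/target/catalog.json"
    test_results_path: "/dbt/target/run_results.json"
    target_platform: bigquery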
abundant-airport-72599
12/21/2022, 6:54 PM
!(deprecated:true) is the only way I can figure out how to do it, I guess because the deprecated property doesn’t exist at all until it’s first set as true?
• Lineage graph visuals give no indication that something downstream is deprecated, unless you click on the deprecated thing.
• The language around deprecation makes it sound like it’s meant only to be a first step in removal. For example, if you put a deprecation date that’s in the past the UI only states that it is planned to be decommissioned on that past date, not that it is. E.g. “Scheduled to be decommissioned on 16/Nov/2022”.
Ideally I’d want deprecated items to be A) excluded by default but toggle-able and/or B) visually indicated as deprecated in all contexts. Should I be soft-deleting instead? Is there a way to explicitly ask to see soft-deleted items in the UI?
helpful-greece-26038
12/21/2022, 7:04 PM
lively-dusk-19162
12/21/2022, 9:56 PM