few-air-56117
12/17/2021, 7:57 AM
source:
  type: bigquery
  config:
    project_id: <project_id>
    include_table_lineage: True
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
The tables/views are in DataHub but the lineage button is not available. Am I missing something? Thx a lot 😄
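A hedged debugging step for this, assuming a datahub CLI of this era that supports the bigquery option upstream_lineage_in_report: setting it dumps the lineage map parsed from the BigQuery audit logs into the ingestion report, which shows whether any lineage was extracted at all before looking at the UI.

source:
  type: bigquery
  config:
    project_id: <project_id>
    include_table_lineage: True
    # debugging aid (assumed available in this version): include the parsed
    # lineage map in the ingestion report
    upstream_lineage_in_report: True

If the report shows an empty lineage map, the problem is on the extraction side (audit-log access and permissions) rather than in the UI.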
nice-planet-17111
12/17/2021, 8:29 AM
bigquery udfs ingestion? I tried to do it, but it really returns nothing (even if I set include_views: true)
few-air-56117
12/17/2021, 8:58 AM
green-football-48146
12/17/2021, 10:03 AM
When ingesting from hive, if it encounters an abnormality in some tables, the ingestion is interrupted. Is there any way to skip these abnormal tables when errors occur?
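A hedged workaround sketch, assuming the abnormal tables can be identified by name: SQLAlchemy-based sources such as hive accept table_pattern deny regexes, so the known-bad tables can be excluded and the rest of the run completes (the host and table names below are placeholders):

source:
  type: hive
  config:
    host_port: hiveserver:10000
    table_pattern:
      deny:
        # regexes matching the tables that break ingestion (placeholders)
        - "myschema\\.bad_table"
        - "myschema\\.corrupt_.*"

This does not skip tables automatically on error, but it lets a run finish once the offending tables are known.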
proud-accountant-49377
12/17/2021, 11:42 AM
best-planet-6756
12/17/2021, 7:32 PM
busy-zebra-64439
12/20/2021, 9:01 AM
witty-butcher-82399
12/20/2021, 9:28 AM
few-air-56117
12/20/2021, 2:37 PM
red-pizza-28006
12/20/2021, 2:51 PM
CREATE TEMP TABLE temp.temp_stone AS
WITH upd_stone_response_messages AS
(
SELECT DISTINCT srm.id AS stone_response_message_id
FROM src_payment.stone_response_messages srm
WHERE updated_at BETWEEN $date_start AND $date_end
AND request_type = 'authorize'
AND transaction_id IS NOT NULL
)
Here you can see I have a dependency on src_payment.stone_response_messages, but when I look at the lineage UI, I only see that the dataset is built using temp_stone, and nothing more than that. The SQL inside the CTE is not captured in the lineage.
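A hedged aside for debugging this: the open-source sqllineage package shows what a standalone SQL parser can recover from such a statement (it is not necessarily the parser DataHub uses internally). TEMP and the $date parameters are dropped and a final SELECT added so the snippet parses on its own:

# pip install sqllineage -- illustration only, not necessarily DataHub's parser
from sqllineage.runner import LineageRunner

sql = """
CREATE TABLE temp.temp_stone AS
WITH upd_stone_response_messages AS (
    SELECT DISTINCT srm.id AS stone_response_message_id
    FROM src_payment.stone_response_messages srm
    WHERE request_type = 'authorize'
      AND transaction_id IS NOT NULL
)
SELECT * FROM upd_stone_response_messages
"""

runner = LineageRunner(sql)
print(runner.source_tables())  # should include src_payment.stone_response_messages
print(runner.target_tables())  # should include temp.temp_stone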
modern-monitor-81461
12/20/2021, 3:09 PM
I'm ingesting a MetadataChangeEvent with a SchemaMetadata aspect. It looks like something is rejected by the Avro validator, but it doesn't tell me what. Is there a trick to figure out what exactly is incompatible with the schema?
File "/datahub/metadata-ingestion/src/datahub/cli/ingest_cli.py", line 82, in run
pipeline.run()
File "/datahub/metadata-ingestion/src/datahub/ingestion/run/pipeline.py", line 157, in run
for record_envelope in self.transform(record_envelopes):
File "/datahub/metadata-ingestion/src/datahub/ingestion/extractor/mce_extractor.py", line 46, in get_records
raise ValueError(
ValueError: source produced an invalid metadata work unit: MetadataChangeEventClass(...
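A hedged sketch for narrowing this down, assuming the failing MetadataChangeEventClass can be rebuilt in a Python shell: the extractor only calls mce.validate(), which returns a bare bool, but the generated aspect classes expose the same validate() helper, so checking aspects one at a time points at the offender:

# hypothetical helper: validate each aspect of the snapshot individually,
# since mce.validate() only reports a single True/False for the whole event
from datahub.metadata.schema_classes import MetadataChangeEventClass

def find_invalid_aspects(mce: MetadataChangeEventClass) -> None:
    for aspect in mce.proposedSnapshot.aspects:
        if not aspect.validate():
            print(f"invalid aspect: {type(aspect).__name__}")
            print(aspect)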
microscopic-elephant-47912
12/20/2021, 8:40 PM
mysterious-lamp-91034
12/23/2021, 5:07 AM
datahub docker check shows no issue. The context is I am running docker-compose.quickstart.yml on my dev machine.
abundant-photographer-45796
12/24/2021, 6:25 AM
source:
  type: superset
  config:
    # Coordinates
    connect_uri: http://localhost:8088
    # Credentials
    username: xxx
    password: xxx
    provider: db
sink:
  # sink configs
  type: "datahub-rest"
  config:
    server: "http://192.168.229.4:8080"
Then I carried out the ingestion command
datahub ingest -c superset.yml
and I get this hint. But on my DataHub homepage, I can't see the charts. Can someone tell me why? Thank you
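A hedged debugging sketch: swapping the sink for DataHub's file sink shows whether the superset source emitted any chart workunits at all, which separates a source-side problem from a server/UI one (the filename is an arbitrary choice):

source:
  type: superset
  config:
    connect_uri: http://localhost:8088
    username: xxx
    password: xxx
    provider: db
sink:
  type: file
  config:
    filename: ./superset_mces.json

If ./superset_mces.json contains chart snapshots, the metadata reached the sink and the issue is on the server side; if not, the source never extracted them.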
busy-zebra-64439
12/27/2021, 11:26 AM
rich-policeman-92383
12/27/2021, 12:32 PM
beeline -e "set tez.queue.name='myqueue'; describe formatted myschema.mytable;"
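A hedged sketch of the recipe-side equivalent, assuming the hive source forwards options.connect_args to the PyHive connection (configuration is PyHive's mechanism for session settings such as the Tez queue; the host and queue names are placeholders):

source:
  type: hive
  config:
    host_port: hiveserver:10000
    options:
      connect_args:
        # assumption: passed through to pyhive.hive.connect(configuration=...)
        configuration:
          tez.queue.name: myqueue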
agreeable-river-32119
12/28/2021, 6:49 AM
busy-zebra-64439
12/28/2021, 1:55 PM
nice-autumn-10105
12/28/2021, 5:35 PM
curved-magazine-23582
12/28/2021, 8:38 PM
lemon-cartoon-14299
12/29/2021, 12:13 AM
better-orange-49102
12/30/2021, 7:34 AM
nice-country-99675
12/30/2021, 12:13 PM
# import added; conn, source, alias, deny_schemas, transformers and datahub
# are defined earlier in the DAG (Airflow connection objects / task parameters)
from datahub.ingestion.run.pipeline import Pipeline

pipeline = Pipeline.create(
{
"source": {
"type": source,
"config": {
"username": f"{conn.login}",
"password": f"{conn.password}",
"database": f"{conn.schema}",
"host_port": f"{conn.host}:{conn.port}",
"database_alias": alias,
"env": "PROD",
"schema_pattern": {
"deny": deny_schemas
},
},
},
"transformers": transformers,
"sink": {
"type": "datahub-rest",
"config": {"server": f"{datahub.host}"},
},
}
)
pipeline.run()
pipeline.raise_from_status()
The thing is the DAG ends with
{local_task_job.py:154} INFO - Task exited with return code Negsignal.SIGKILL
It's the only thing that is actually logged... it seems to fail as soon as the process starts. At first I thought it was a memory issue; we increased the pod's memory, and now we are pretty far from the memory limit. It even fails when I run dry_run. Locally it's working fine.
Locally I'm using Airflow 2.2.2 while in production I'm using Airflow 2.2.1. I would really appreciate any suggestion...
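A hedged observation plus sketch: SIGKILL cannot be raised or caught by Python itself, so it almost always comes from outside the process, typically the kernel OOM killer or the container runtime enforcing a cgroup limit. One way to rule memory in or out is to log the cgroup counters from inside the task just before the pipeline runs (cgroup v1 paths, an assumption about the pod's runtime):

# hypothetical pre-flight check: print the container's memory limit and peak
# usage (cgroup v1 paths) before pipeline.run(), to rule the OOM killer in/out
def log_cgroup_memory() -> None:
    for name in ("memory.limit_in_bytes", "memory.max_usage_in_bytes"):
        try:
            with open(f"/sys/fs/cgroup/memory/{name}") as f:
                print(name, "=", f.read().strip())
        except OSError as exc:
            print(name, "unavailable:", exc)

log_cgroup_memory()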
gentle-florist-49869
12/30/2021, 2:42 PM
damp-ambulance-34232
01/03/2022, 4:25 AM
red-pizza-28006
01/03/2022, 12:52 PM
[2022-01-03 13:44:47,679] INFO {datahub.cli.ingest_cli:81} - Starting metadata ingestion
[2022-01-03 13:47:19,428] INFO {datahub.ingestion.run.pipeline:77} - sink wrote workunit kafka-<topic1>
[2022-01-03 13:49:50,867] INFO {datahub.ingestion.run.pipeline:77} - sink wrote workunit kafka-<topic2>
better-orange-49102
01/03/2022, 1:27 PM
gentle-florist-49869
01/03/2022, 6:14 PM
adventurous-apple-98365
01/04/2022, 2:14 AM
Our ingestion attaches tags to datasets with a GlobalTags aspect.
The tag itself isn't ingested (it's not in the elastic tag index, so we can't search!) but it does properly appear on the dataset and in the list of 'filter checkboxes' when viewing datasets.
Is there any way to have the tag also created, other than ingesting the tag separately before ingesting the dataset? Not sure if it makes sense to try to solve it in one place (within DataHub itself) versus in each of our ingestion plugins.
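A hedged sketch of the "create the tag too" option, using the Python REST emitter and the generated TagSnapshot/TagProperties classes (the tag name pii and the server URL are stand-ins); emitting this alongside the dataset would land the tag entity in the tag index as well:

from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    MetadataChangeEventClass,
    TagPropertiesClass,
    TagSnapshotClass,
)

emitter = DatahubRestEmitter("http://localhost:8080")
tag_mce = MetadataChangeEventClass(
    proposedSnapshot=TagSnapshotClass(
        urn="urn:li:tag:pii",  # stand-in tag urn
        aspects=[TagPropertiesClass(name="pii", description="created during ingestion")],
    )
)
emitter.emit_mce(tag_mce)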