fresh-coat-71059
07/07/2022, 7:48 AM
entities and transform the schemaMetadata aspect. But when I try to run it in a data ingestion procedure (MySQL datasource), it recognizes all datasets but can't get the schemaMetadata aspect correctly: the aspect parameter is always None.
How can I change this transformer to meet my requirement?
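In case it helps frame the question, here is a minimal sketch of a schemaMetadata transformer that tolerates aspect being None. It assumes the single-aspect transform_aspect hook and a DatasetSchemaMetadataTransformer base class are available in the installed datahub package; those names, plus the config and class below, are assumptions to verify against your version, not the official recipe.

from typing import cast

from datahub.configuration.common import ConfigModel
from datahub.ingestion.api.common import PipelineContext
from datahub.ingestion.transformer.dataset_transformer import (
    DatasetSchemaMetadataTransformer,
)
from datahub.metadata.schema_classes import SchemaMetadataClass


class UppercaseSchemaFieldsConfig(ConfigModel):
    pass


class UppercaseSchemaFields(DatasetSchemaMetadataTransformer):
    """Hypothetical example: upper-cases every fieldPath in schemaMetadata."""

    def __init__(self, config: UppercaseSchemaFieldsConfig, ctx: PipelineContext):
        super().__init__()
        self.config = config
        self.ctx = ctx

    @classmethod
    def create(cls, config_dict: dict, ctx: PipelineContext) -> "UppercaseSchemaFields":
        return cls(UppercaseSchemaFieldsConfig.parse_obj(config_dict), ctx)

    def transform_aspect(self, entity_urn: str, aspect_name: str, aspect):
        # As I understand the framework, aspect arrives as None for datasets whose
        # schemaMetadata was not seen in this run, to give the transformer a chance
        # to create the aspect from scratch; returning None leaves those untouched.
        if aspect is None:
            return None
        schema = cast(SchemaMetadataClass, aspect)
        for field in schema.fields:
            field.fieldPath = field.fieldPath.upper()
        return schema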
echoing-farmer-38304
07/07/2022, 7:55 AM
early-librarian-13786
07/07/2022, 8:49 AM
cuddly-arm-8412
07/07/2022, 9:40 AM
from typing import List

def make_glossary_node_urn(path: List[str]) -> str:
    return "urn:li:glossaryNode:" + ".".join(path)

def make_glossary_term_urn(path: List[str]) -> str:
    return "urn:li:glossaryTerm:" + ".".join(path)
I think the official examples use a unique identifier. Is there a suitable way to convert my path-based names into that form?
official -> https://demo.datahubproject.io/glossaryTerm/urn:li:glossaryTerm:62a8cfcf-109d-442d-a06c-bf9ece8bbc14/Documentation?is_lineage_mode=false
mine -> urn:li:glossaryNode:主题.车辆主题域
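If the goal is a stable, GUID-style URN like the demo's instead of the readable path, one option (just a sketch, not something the glossary source requires) is to derive a deterministic UUID from the path, so the same path always maps to the same id. The namespace string below is an arbitrary choice of mine:

import uuid
from typing import List

# Arbitrary fixed namespace so the path -> GUID mapping is reproducible across runs.
GLOSSARY_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "my-glossary")


def make_glossary_term_guid_urn(path: List[str]) -> str:
    guid = uuid.uuid5(GLOSSARY_NAMESPACE, ".".join(path))
    return f"urn:li:glossaryTerm:{guid}"


# e.g. make_glossary_term_guid_urn(["主题", "车辆主题域"])
# -> urn:li:glossaryTerm:<stable uuid>, similar in shape to the demo URN above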
fancy-artist-67223
07/07/2022, 10:18 AM
AIRFLOW_CONN_DATAHUB_REST_DEFAULT: datahub-rest://http%3A%2F%2Fdatahub-gms%3A8080
AIRFLOW__LINEAGE__BACKEND: datahub_provider.lineage.datahub.DatahubLineageBackend
AIRFLOW__LINEAGE__DATAHUB_KWARGS: '{"datahub_conn_id": "datahub_rest_default",
"capture_ownership_info": true,
"capture_tags_info": true,
"graceful_exceptions": true }'
Once I add this to my docker compose file (it is the one Airflow provides, with no changes), I can't get Airflow to start. Could you please help me? Thank you
kind-helicopter-53206
07/07/2022, 12:24 PM
plain-guitar-45103
07/07/2022, 4:42 PM
source:
  type: delta-lake
  config:
    base_path: 's3://mybucketpath'
    s3:
      aws_config:
        aws_access_key_id: XXXXXX
        aws_secret_access_key: XXXXXXX
sink:
  type: console
I get this error:
'"/tmp/datahub/ingest/venv-ffbf74a0-cc21-4052-b01e-9e37f43cf20d/lib/python3.9/site-packages/datahub/ingestion/source/delta_lake/config.py", '
'line 79, in validate_config\n'
' 75 @pydantic.root_validator()\n'
' 76 def validate_config(cls, values: Dict) -> Dict[str, Any]:\n'
' 77 values["_is_s3"] = is_s3_uri(values["base_path"])\n'
' 78 if values["_is_s3"]:\n'
'--> 79 if values["s3"] is None:\n'
' 80 raise ValueError("s3 config must be set for s3 path")\n'
'\n'
'---- (full traceback above) ----\n'
'File "/tmp/datahub/ingest/venv-ffbf74a0-cc21-4052-b01e-9e37f43cf20d/lib/python3.9/site-packages/datahub/cli/ingest_cli.py", line 106, in '
'run\n'
' pipeline = Pipeline.create(pipeline_config, dry_run, preview, preview_workunits)\n'
'File "/tmp/datahub/ingest/venv-ffbf74a0-cc21-4052-b01e-9e37f43cf20d/lib/python3.9/site-packages/datahub/ingestion/run/pipeline.py", line '
'204, in create\n'
' return cls(\n'
'File "/tmp/datahub/ingest/venv-ffbf74a0-cc21-4052-b01e-9e37f43cf20d/lib/python3.9/site-packages/datahub/ingestion/run/pipeline.py", line '
'152, in __init__\n'
' self.source: Source = source_class.create(\n'
'File '
'"/tmp/datahub/ingest/venv-ffbf74a0-cc21-4052-b01e-9e37f43cf20d/lib/python3.9/site-packages/datahub/ingestion/source/delta_lake/source.py", '
'line 99, in create\n'
' config = DeltaLakeSourceConfig.parse_obj(config_dict)\n'
'File "pydantic/main.py", line 521, in pydantic.main.BaseModel.parse_obj\n'
'File "pydantic/main.py", line 339, in pydantic.main.BaseModel.__init__\n'
'File "pydantic/main.py", line 1064, in pydantic.main.validate_model\n'
'File '
'"/tmp/datahub/ingest/venv-ffbf74a0-cc21-4052-b01e-9e37f43cf20d/lib/python3.9/site-packages/datahub/ingestion/source/delta_lake/config.py", '
'line 79, in validate_config\n'
' if values["s3"] is None:\n'
'\n'
"KeyError: 's3'\n"
Full log is attached.
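For what it's worth, here is a self-contained sketch of how that validator pattern can produce KeyError: 's3' under pydantic v1 semantics: in a non-pre root_validator, values only contains fields that validated successfully, so if the nested s3 section fails its own validation it never appears in values. The Demo* model names and the failing input are made up for the illustration and only mirror the shape of the traceback above:

from typing import Any, Dict, Optional

import pydantic


class DemoS3Config(pydantic.BaseModel):
    aws_access_key_id: str
    aws_secret_access_key: str


class DemoDeltaLakeConfig(pydantic.BaseModel):
    base_path: str
    s3: Optional[DemoS3Config] = None

    @pydantic.root_validator()
    def validate_config(cls, values: Dict[str, Any]) -> Dict[str, Any]:
        if values["base_path"].startswith("s3://"):
            # Raises KeyError: 's3' when the s3 field itself failed validation,
            # because failed fields are dropped from `values`.
            if values["s3"] is None:
                raise ValueError("s3 config must be set for s3 path")
        return values


try:
    # The nested s3 block is invalid (secret key missing), so "s3" never reaches
    # `values` and the root validator dies with KeyError instead of the intended
    # ValueError.
    DemoDeltaLakeConfig.parse_obj(
        {"base_path": "s3://mybucketpath", "s3": {"aws_access_key_id": "XXXXXX"}}
    )
except KeyError as e:
    print("root validator raised", repr(e))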
mysterious-lamp-91034
07/08/2022, 12:34 AM
curl "http://localhost:8080/entities?action=delete" -X POST --data '{"urn": "urn:li:glossaryTerm:AccountBalance"}'
It deleted the data in MySQL but not in the UI. I realized it may still exist in Elasticsearch, so I deleted all data in Elasticsearch:
curl -s -X DELETE https://vpc-schema-registry-XXXXXX.us-east-1.es.amazonaws.com/*
Then I restarted the server.
It looks like the server is rebuilding the index and backfilling data, but the speed is very slow.
Is there a quick way to rebuild Elasticsearch?
Or: what is the right way to delete an entity in both MySQL and Elasticsearch?
Thanks!
bright-cpu-56427
07/08/2022, 3:45 AM
bland-orange-13353
07/08/2022, 8:34 AM
silly-ice-4153
07/08/2022, 9:51 AM
sparse-raincoat-42898
07/08/2022, 11:09 AM
plain-beach-61128
07/08/2022, 3:13 PM
table-pattern:
  allow:
    - "^my_table_name$"
But this does not ingest the desired table; instead, it ingests many other tables and views. Can you please help me configure this correctly?
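A small sketch of how I understand the allow patterns to be evaluated, assuming the AllowDenyPattern helper from datahub.configuration.common: each allow entry is applied as a regex, and, as far as I know, the SQL sources match it against the fully qualified name (e.g. database.table) rather than the bare table name, so an anchored pattern may need the database/schema prefix. The my_database name below is a placeholder:

from datahub.configuration.common import AllowDenyPattern

pattern = AllowDenyPattern(allow=["^my_table_name$"])

# The bare table name matches...
print(pattern.allowed("my_table_name"))                # True
# ...but a fully qualified name does not, because of the ^...$ anchors.
print(pattern.allowed("my_database.my_table_name"))    # False

# A pattern anchored on the qualified name instead:
qualified = AllowDenyPattern(allow=[r"^my_database\.my_table_name$"])
print(qualified.allowed("my_database.my_table_name"))  # True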
sparse-barista-40860
07/08/2022, 3:58 PM
sparse-barista-40860
07/08/2022, 3:58 PM
sparse-barista-40860
07/08/2022, 3:58 PM
sparse-barista-40860
07/08/2022, 3:59 PM
./gradlew :metadata-ingestion-examples:kafka-etl:bootRun
sparse-barista-40860
07/08/2022, 4:21 PM
sparse-barista-40860
07/08/2022, 4:21 PM
sparse-barista-40860
07/08/2022, 4:21 PM
silly-ice-4153
07/08/2022, 4:35 PM
big-plumber-87113
07/08/2022, 7:45 PM
--header 'Authorization: Bearer <...>'
My current hack has been to store a token generated from the UI and use it to generate a new token before expiration, replacing the old one 🥴
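For reference, a hedged sketch of that rotation idea in Python: use the still-valid token to call GMS's GraphQL endpoint and mint a replacement personal access token. The createAccessToken mutation and its input fields are written from memory here, and the server URL, actor URN, and token are placeholders, so verify everything against your DataHub version's GraphQL schema before relying on it:

import requests

GMS = "http://localhost:8080"                    # placeholder GMS address
CURRENT_TOKEN = "<paste the still-valid token here>"

# GraphQL mutation to mint a new personal access token; field names are an
# assumption to double-check against your deployment's schema.
MUTATION = """
mutation {
  createAccessToken(
    input: {
      type: PERSONAL
      actorUrn: "urn:li:corpuser:datahub"
      duration: ONE_MONTH
      name: "rotated token"
    }
  ) {
    accessToken
  }
}
"""

resp = requests.post(
    f"{GMS}/api/graphql",
    headers={"Authorization": f"Bearer {CURRENT_TOKEN}"},
    json={"query": MUTATION},
)
resp.raise_for_status()
new_token = resp.json()["data"]["createAccessToken"]["accessToken"]
print("new token:", new_token)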
faint-television-78785
07/11/2022, 1:44 AM
loud-kite-94877
07/11/2022, 3:42 AM
'failures': [{'error': 'Unable to emit metadata to DataHub GMS',
              'info': {'message': '401 Client Error: Unauthorized for url: http://datahub-datahub-gms:8080/aspects?action=ingestProposal'}},
             {'error': 'Unable to emit metadata to DataHub GMS',
              'info': {'message': '401 Client Error: Unauthorized for url: http://datahub-datahub-gms:8080/aspects?action=ingestProposal'}}],
lemon-zoo-63387
07/11/2022, 5:58 AM
salmon-angle-92685
07/11/2022, 7:23 AM
yes | datahub delete --env PROD --entity_type glossaryTerm --hard ; yes | datahub delete --entity_type glossaryNode --hard
Thank you guys in advance :)
late-bear-87552
07/11/2022, 7:23 AM
microscopic-mechanic-13766
07/11/2022, 7:26 AM
cuddly-arm-8412
07/11/2022, 8:42 AM
busy-wolf-34537
07/11/2022, 9:43 AM