stocky-energy-24880
06/17/2022, 10:58 AM
# see https://datahubproject.io/docs/generated/ingestion/sources/mysql for complete documentation
source:
  type: "postgres"
  config:
    host_port: localhost:54320
    database: test
    stateful_ingestion:
      enabled: True
    table_pattern:
      deny:
        - '.*company$'
pipeline_name: "my_postgres_pipeline_2"
transformers:
  - type: "simple_add_dataset_ownership"
    config:
      owner_urns:
        - "urn:li:corpGroup:mobilede@DataHub"
datahub_api:
  server: "http://localhost:8080"
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
Here we are trying to soft-delete a table using a deny pattern. With stateful ingestion, the soft-deleted table should no longer be displayed in the UI, but it is still visible there.
Looking at the debug logs, we found that when a transformer is used, the deleted entity is upserted again.
Please see the logs below:
[2022-06-17 11:35:20,773] DEBUG {datahub.emitter.rest_emitter:229} - Attempting to emit to DataHub GMS; using curl equivalent to:
curl -X POST -H 'User-Agent: python-requests/2.27.1' -H 'Accept-Encoding: gzip, deflate' -H 'Accept: */*' -H 'Connection: keep-alive' -H 'X-RestLi-Protocol-Version: 2.0.0' -H 'Content-Type: application/json' --data '{"proposal": {"entityType": "dataset", "entityUrn": "urn:li:dataset:(urn:li:dataPlatform:postgres,test.public.company,PROD)", "changeType": "UPSERT", "aspectName": "status", "aspect": {"value": "{\"removed\": true}", "contentType": "application/json"}, "systemMetadata": {"lastObserved": 1655458520711, "runId": "postgres-2022_06_17-11_35_18"}}}' 'http://localhost:8080/aspects?action=ingestProposal'
[2022-06-17 11:35:20,794] INFO {datahub.ingestion.run.pipeline:84} - sink wrote workunit soft-delete-table-urn:li:dataset:(urn:li:dataPlatform:postgres,test.public.company,PROD)
[2022-06-17 11:35:20,795] DEBUG {datahub.emitter.rest_emitter:229} - Attempting to emit to DataHub GMS; using curl equivalent to:
curl -X POST -H 'User-Agent: python-requests/2.27.1' -H 'Accept-Encoding: gzip, deflate' -H 'Accept: */*' -H 'Connection: keep-alive' -H 'X-RestLi-Protocol-Version: 2.0.0' -H 'Content-Type: application/json' --data '{"proposal": {"entityType": "dataset", "entityUrn": "urn:li:dataset:(urn:li:dataPlatform:postgres,test.public.company,PROD)", "changeType": "UPSERT", "aspectName": "ownership", "aspect": {"value": "{\"owners\": [{\"owner\": \"urn:li:corpGroup:mobilede@DataHub\", \"type\": \"DATAOWNER\"}], \"lastModified\": {\"time\": 0, \"actor\": \"urn:li:corpuser:unknown\"}}", "contentType": "application/json"}, "systemMetadata": {"lastObserved": 1655458520711, "runId": "postgres-2022_06_17-11_35_18"}}}' 'http://localhost:8080/aspects?action=ingestProposal'
[2022-06-17 11:35:20,830] INFO {datahub.ingestion.run.pipeline:84} - sink wrote workunit txform-urn:li:dataPlatform:postgres-test.public.company-PROD-ownership
Is this expected? Is stateful ingestion not supported together with transformers?
Or is there any configuration for transformers to check for soft-deleted entities?
bulky-soccer-26729
06/17/2022, 2:47 PM
Setting removed: true in your transformer should soft delete it and hide it from the UI: https://datahubproject.io/docs/metadata-ingestion/transformers/#mark-dataset-status
bulky-soccer-26729
06/17/2022, 2:47 PM
removed: true
status
bulky-soccer-26729
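(Editorial note, not part of the thread.) A minimal sketch of what the 2:47 PM reply seems to be pointing at, assuming the "Mark Dataset Status" transformer from the linked docs is registered under the type name mark_dataset_status and takes a single removed flag. As far as I understand it, this transformer marks every dataset that passes through it, so it would probably need to sit in a recipe scoped to just the tables that should be hidden, rather than in the recipe shown in the question.

```yaml
# Sketch only: transformer section of a recipe, per the linked transformer docs.
# Assumption: this applies to ALL datasets the recipe emits, so scope the
# source (for example with table_pattern) to the tables you want hidden.
transformers:
  - type: "mark_dataset_status"
    config:
      removed: true
```

Whether this resolves the interaction between stateful ingestion and simple_add_dataset_ownership described in the question is not confirmed anywhere in the thread.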
06/17/2022, 3:24 PM