# ingestion
Hello Team, stateful ingestion with transformers is not working. Please find the config below:
# see https://datahubproject.io/docs/generated/ingestion/sources/postgres for complete documentation
source:
  type: "postgres"
  config:
    host_port: localhost:54320
    database: test
    stateful_ingestion:
      enabled: True
    table_pattern:
      deny:
        - '.*company$'
pipeline_name: "my_postgres_pipeline_2"
transformers:
  - type: "simple_add_dataset_ownership"
    config:
      owner_urns:
        - "urn:li:corpGroup:mobilede@DataHub"

datahub_api:
  server: "http://localhost:8080"

sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
Here we are trying to soft-delete a table using a deny pattern. With stateful ingestion, the soft-deleted table should no longer be displayed in the UI, but it is still visible. Looking at the debug logs, we found that when a transformer is configured, an aspect for the deleted entity is upserted again right after the soft-delete. Please see the logs below:
[2022-06-17 11:35:20,773] DEBUG {datahub.emitter.rest_emitter:229} - Attempting to emit to DataHub GMS; using curl equivalent to: curl -X POST -H 'User-Agent: python-requests/2.27.1' -H 'Accept-Encoding: gzip, deflate' -H 'Accept: */*' -H 'Connection: keep-alive' -H 'X-RestLi-Protocol-Version: 2.0.0' -H 'Content-Type: application/json' --data '{"proposal": {"entityType": "dataset", "entityUrn": "urn:li:dataset:(urn:li:dataPlatform:postgres,test.public.company,PROD)", "changeType": "UPSERT", "aspectName": "status", "aspect": {"value": "{\"removed\": true}", "contentType": "application/json"}, "systemMetadata": {"lastObserved": 1655458520711, "runId": "postgres-2022_06_17-11_35_18"}}}' 'http://localhost:8080/aspects?action=ingestProposal'
[2022-06-17 11:35:20,794] INFO {datahub.ingestion.run.pipeline:84} - sink wrote workunit soft-delete-table-urn:li:dataset:(urn:li:dataPlatform:postgres,test.public.company,PROD)
[2022-06-17 11:35:20,795] DEBUG {datahub.emitter.rest_emitter:229} - Attempting to emit to DataHub GMS; using curl equivalent to: curl -X POST -H 'User-Agent: python-requests/2.27.1' -H 'Accept-Encoding: gzip, deflate' -H 'Accept: */*' -H 'Connection: keep-alive' -H 'X-RestLi-Protocol-Version: 2.0.0' -H 'Content-Type: application/json' --data '{"proposal": {"entityType": "dataset", "entityUrn": "urn:li:dataset:(urn:li:dataPlatform:postgres,test.public.company,PROD)", "changeType": "UPSERT", "aspectName": "ownership", "aspect": {"value": "{\"owners\": [{\"owner\": \"urn:li:corpGroup:mobilede@DataHub\", \"type\": \"DATAOWNER\"}], \"lastModified\": {\"time\": 0, \"actor\": \"urn:li:corpuser:unknown\"}}", "contentType": "application/json"}, "systemMetadata": {"lastObserved": 1655458520711, "runId": "postgres-2022_06_17-11_35_18"}}}' 'http://localhost:8080/aspects?action=ingestProposal'
[2022-06-17 11:35:20,830] INFO {datahub.ingestion.run.pipeline:84} - sink wrote workunit txform-urn:li:dataPlatform:postgres-test.public.company-PROD-ownership
Is this expected? That is, is stateful ingestion not supported together with transformers? Or is there a configuration option that makes transformers skip soft-deleted entities?
Hey there! I think this may help: setting an entity's status to `removed: true` in your transformer should soft-delete it and hide it from the UI: https://datahubproject.io/docs/metadata-ingestion/transformers/#mark-dataset-status
You can, of course, filter so that only the entities you want get the `removed: true` status.
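A minimal sketch of that suggestion, assuming a separate one-off recipe that re-ingests only the denied table and applies the `mark_dataset_status` transformer from the linked docs. The `pipeline_name` value and the allow pattern are illustrative assumptions, not taken from the thread:

```yaml
# Hypothetical one-off recipe: ingest only the table that should be hidden
# and mark it as removed. Name and pattern below are illustrative.
source:
  type: "postgres"
  config:
    host_port: localhost:54320
    database: test
    table_pattern:
      allow:
        - '.*company$'   # restrict the run to the table(s) to hide
pipeline_name: "hide_company_table"   # hypothetical pipeline name

transformers:
  # Sets the status aspect of every dataset in this run to removed: true
  - type: "mark_dataset_status"
    config:
      removed: true

sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
```

Because `mark_dataset_status` applies to every dataset the run emits, the allow pattern is what limits its scope. Stateful ingestion is deliberately left out of this sketch so the one-off run keeps no checkpoint of its own and stays separate from the main pipeline.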
Also, a quick reminder to post larger blocks of code and logs as a thread under the main post to keep the channel cleaner and easier to navigate! It makes finding things much easier for the team and others.