# ingestion
  • f

    fresh-coat-71059

    07/07/2022, 7:48 AM
    Hi, I want to write a custom transformer to get and analyze the latest schema of datasets. I wrote a transformer which processes dataset entities and transforms the schemaMetadata aspect. But when I try to run it in a data ingestion procedure (MySQL datasource), it can recognize all datasets but can't get the schemaMetadata aspect correctly: the parameter aspect is always a None value. How can I change this transformer to meet my requirement?
    plus1 2
    • 1
    • 3
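    A rough sketch of a transformer for the question above, assuming the BaseTransformer / SingleAspectTransformer interface from the custom-transformer docs; the class name is made up, and the key point is that the aspect argument can legitimately be None and has to be guarded:

    from typing import List, Optional

    from datahub.ingestion.api.common import PipelineContext
    from datahub.ingestion.transformer.base_transformer import (
        BaseTransformer,
        SingleAspectTransformer,
    )
    from datahub.metadata.schema_classes import SchemaMetadataClass


    class AnalyzeSchemaTransformer(BaseTransformer, SingleAspectTransformer):
        """Hypothetical transformer that inspects the schemaMetadata aspect."""

        def __init__(self, config: dict, ctx: PipelineContext):
            super().__init__()
            self.ctx = ctx
            self.config = config

        @classmethod
        def create(cls, config_dict: dict, ctx: PipelineContext) -> "AnalyzeSchemaTransformer":
            return cls(config_dict, ctx)

        def entity_types(self) -> List[str]:
            return ["dataset"]

        def aspect_name(self) -> str:
            return "schemaMetadata"

        def transform_aspect(
            self, entity_urn: str, aspect_name: str, aspect: Optional[SchemaMetadataClass]
        ) -> Optional[SchemaMetadataClass]:
            # aspect is None when no schemaMetadata was produced for this entity
            # in the current stream, so always guard before touching it
            if aspect is None:
                return None
            for field in aspect.fields:
                print(entity_urn, field.fieldPath, field.nativeDataType)
            return aspect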
  • e

    echoing-farmer-38304

    07/07/2022, 7:55 AM
    Hello, I run ingestion with transformers to add dataset field tags, but the problem is that dataset field tags are not editable (fieldTags). Is it possible to add editable tags (editedFieldTags) via transformers?
  • e

    early-librarian-13786

    07/07/2022, 8:49 AM
    Hello! I'm trying to set up alerting on failed Great Expectations assertions with the Actions Framework. I expect the Kafka Event Source to receive an UPSERT event after an assertion status change, but it doesn't happen. Do you have any ideas why this might be happening? Here's my action config; I'm using DataHub v0.8.39.
    datahub-actions.yaml
    b
    • 2
    • 1
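    For reference, a minimal datahub-actions config sketch in the shape of the hello_world quickstart; the broker and schema registry addresses below are placeholders. Running something like this and watching for printed events is a quick way to confirm whether anything is arriving on the Kafka Event Source at all:

    # minimal action pipeline: print every event received from Kafka
    name: "debug_events"
    source:
      type: "kafka"
      config:
        connection:
          bootstrap: "localhost:9092"                    # placeholder
          schema_registry_url: "http://localhost:8081"   # placeholder
    action:
      type: "hello_world"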
  • c

    cuddly-arm-8412

    07/07/2022, 9:40 AM
    hi team, when I build my own glossary node/term, I found that my urn was built from my node name:
    Copy code
    def make_glossary_node_urn(path: List[str]) -> str:
        return "urn:li:glossaryNode:" + ".".join(path)
    
    
    def make_glossary_term_urn(path: List[str]) -> str:
        return "urn:li:glossaryTerm:" + ".".join(path)
    I think the official examples use a unique identifier. Is there any suitable conversion method? Official -> https://demo.datahubproject.io/glossaryTerm/urn:li:glossaryTerm:62a8cfcf-109d-442d-a06c-bf9ece8bbc14/Documentation?is_lineage_mode=false Mine -> urn:li:glossaryNode:主题.车辆主题域
    b
    • 2
    • 6
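    If the goal is an opaque, GUID-style urn like the demo instance shows, while still deriving it from the glossary path, one possible approach (plain standard library, not an official DataHub helper) is a deterministic UUID:

    import uuid
    from typing import List

    # Hypothetical fixed namespace for glossary urns; any stable UUID works,
    # as long as it never changes between runs.
    GLOSSARY_NAMESPACE = uuid.UUID("12345678-1234-5678-1234-567812345678")


    def make_glossary_term_urn(path: List[str]) -> str:
        # uuid5 is deterministic: the same path always yields the same urn,
        # but the urn no longer exposes the raw node name.
        guid = uuid.uuid5(GLOSSARY_NAMESPACE, ".".join(path))
        return f"urn:li:glossaryTerm:{guid}"


    print(make_glossary_term_urn(["主题", "车辆主题域"]))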
  • f

    fancy-artist-67223

    07/07/2022, 10:18 AM
    Hello everybody. So, I'm trying to ingest information from Airflow, not only the pipelines but mainly to create lineage. I've followed this guide https://datahubproject.io/docs/lineage/airflow/. I do have the plugin installed and the connection is set. I've also used this lineage demo https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub_provider/example_dags/lineage_backend_demo.py to try to test it. I thought this was enough, but then I read that the DAG itself does nothing; it needs the connection to be established, which should be exposed in the docker compose file. So I added this, which was in another question:
    Copy code
    AIRFLOW_CONN_DATAHUB_REST_DEFAULT: datahub-rest://http%3A%2F%2Fdatahub-gms%3A8080
    AIRFLOW__LINEAGE__BACKEND: datahub_provider.lineage.datahub.DatahubLineageBackend
    AIRFLOW__LINEAGE__DATAHUB_KWARGS: '{"datahub_conn_id": "datahub_rest_default",
                                     "capture_ownership_info": true,
                                     "capture_tags_info": true,
                                     "graceful_exceptions": true }'
    Once I add this to my docker compose file (it's the one Airflow provides, no changes) I can't get Airflow to start. Could you please help me? Thank you
    d
    • 2
    • 9
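    As an alternative to encoding the connection in an environment variable, the Airflow lineage docs also register it through the Airflow CLI; a sketch, assuming GMS is reachable at http://datahub-gms:8080 from the Airflow containers:

    airflow connections add --conn-type 'datahub_rest' 'datahub_rest_default' \
        --conn-host 'http://datahub-gms:8080'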
  • k

    kind-helicopter-53206

    07/07/2022, 12:24 PM
    Hi, I'm trying to use the Python emitter like this example https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/library/lineage_emitter_mcpw_rest.py. I need to configure the emitter connection correctly, as I have a domain name for the DataHub server and a token for authentication. I already tried emitter = DatahubRestEmitter("https://example.com/api/gms", "extra_headers={Authorization: Bearer xxxxxxxxxxxxxxxxxxxxxx}") but it did not manage to connect to the DataHub server.
    b
    a
    r
    • 4
    • 12
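    For the emitter question above: DatahubRestEmitter accepts the token as its own keyword argument rather than as a string of extra headers; a minimal sketch, assuming https://example.com/api/gms really is the GMS endpoint behind that domain:

    from datahub.emitter.rest_emitter import DatahubRestEmitter

    emitter = DatahubRestEmitter(
        gms_server="https://example.com/api/gms",  # placeholder GMS URL
        token="xxxxxxxxxxxxxxxxxxxxxx",            # personal access token
    )
    emitter.test_connection()  # raises if the server address or token is wrong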
  • p

    plain-guitar-45103

    07/07/2022, 4:42 PM
    Hi, I am trying to perform a very simple ingestion with Delta Lake using the following recipe via the UI
    Copy code
    source:
        type: delta-lake
        config:
            base_path: 's3://mybucketpath'
            s3:
                aws_config:
                    aws_access_key_id: XXXXXX
                    aws_secret_access_key: XXXXXXX
    sink:
        type: console
    I get this error:
    Copy code
    '"/tmp/datahub/ingest/venv-ffbf74a0-cc21-4052-b01e-9e37f43cf20d/lib/python3.9/site-packages/datahub/ingestion/source/delta_lake/config.py", '
               'line 79, in validate_config\n'
               '    75   @pydantic.root_validator()\n'
               '    76   def validate_config(cls, values: Dict) -> Dict[str, Any]:\n'
               '    77       values["_is_s3"] = is_s3_uri(values["base_path"])\n'
               '    78       if values["_is_s3"]:\n'
               '--> 79           if values["s3"] is None:\n'
               '    80               raise ValueError("s3 config must be set for s3 path")\n'
               '\n'
               '---- (full traceback above) ----\n'
               'File "/tmp/datahub/ingest/venv-ffbf74a0-cc21-4052-b01e-9e37f43cf20d/lib/python3.9/site-packages/datahub/cli/ingest_cli.py", line 106, in '
               'run\n'
               '    pipeline = Pipeline.create(pipeline_config, dry_run, preview, preview_workunits)\n'
               'File "/tmp/datahub/ingest/venv-ffbf74a0-cc21-4052-b01e-9e37f43cf20d/lib/python3.9/site-packages/datahub/ingestion/run/pipeline.py", line '
               '204, in create\n'
               '    return cls(\n'
               'File "/tmp/datahub/ingest/venv-ffbf74a0-cc21-4052-b01e-9e37f43cf20d/lib/python3.9/site-packages/datahub/ingestion/run/pipeline.py", line '
               '152, in __init__\n'
               '    self.source: Source = source_class.create(\n'
               'File '
               '"/tmp/datahub/ingest/venv-ffbf74a0-cc21-4052-b01e-9e37f43cf20d/lib/python3.9/site-packages/datahub/ingestion/source/delta_lake/source.py", '
               'line 99, in create\n'
               '    config = DeltaLakeSourceConfig.parse_obj(config_dict)\n'
               'File "pydantic/main.py", line 521, in pydantic.main.BaseModel.parse_obj\n'
               'File "pydantic/main.py", line 339, in pydantic.main.BaseModel.__init__\n'
               'File "pydantic/main.py", line 1064, in pydantic.main.validate_model\n'
               'File '
               '"/tmp/datahub/ingest/venv-ffbf74a0-cc21-4052-b01e-9e37f43cf20d/lib/python3.9/site-packages/datahub/ingestion/source/delta_lake/config.py", '
               'line 79, in validate_config\n'
               '    if values["s3"] is None:\n'
               '\n'
               "KeyError: 's3'\n"
    Full log is attached
    log.txt
    m
    c
    • 3
    • 7
  • m

    mysterious-lamp-91034

    07/08/2022, 12:34 AM
    Hi, I am trying to delete some glossaryTerm entities, so I ran
    Copy code
    curl "<http://localhost:8080/entities?action=delete>" -X POST --data '{"urn": "urn:li:glossaryTerm:AccountBalance"}'
    It deleted the data in MySQL but not in the UI. I realized it may still exist in Elasticsearch, so I then deleted all data in Elasticsearch
    Copy code
    curl -s -X DELETE https://vpc-schema-registry-XXXXXX.us-east-1.es.amazonaws.com/*
    Then I restarted the server. It looks like the server is rebuilding the index and backfilling data, but the speed is very slow. Is there a quick way to rebuild the Elasticsearch index? Or what is the right way to delete an entity in both MySQL and Elasticsearch? Thanks!
    m
    i
    • 3
    • 47
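    For a single entity, the CLI hard delete is intended to remove the record from the SQL store as well as the search and graph indices in one step, so wiping Elasticsearch by hand shouldn't be necessary; and if the indices were already wiped, there is a RestoreIndices upgrade job that rebuilds them from the SQL store. A sketch of the delete command:

    # hard delete removes the entity from the database and from the search/graph indices
    datahub delete --urn "urn:li:glossaryTerm:AccountBalance" --hard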
  • b

    bright-cpu-56427

    07/08/2022, 3:45 AM
    Hi team, after deleting with the datahub delete command, an entity created again with the same name is not visible in the DataHub UI. How should I handle this?
    b
    • 2
    • 1
  • b

    bland-orange-13353

    07/08/2022, 8:34 AM
    This message was deleted.
  • s

    silly-ice-4153

    07/08/2022, 9:51 AM
    Hello, I have a basic question about the Python CLI ingestion: can the passwords in the recipe.yml file be hidden, or can the recipe be connected to a vault, so that the passwords are not stored there in cleartext?
    m
    • 2
    • 2
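    Recipes support environment-variable expansion, so one common pattern is to keep the password out of the YAML and export it (from a vault, CI secret store, etc.) just before running datahub ingest; a sketch with a made-up MYSQL_PASSWORD variable:

    source:
      type: mysql
      config:
        host_port: "localhost:3306"      # placeholder
        username: datahub_reader         # placeholder
        password: "${MYSQL_PASSWORD}"    # resolved from the environment at run time
    sink:
      type: console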
  • s

    sparse-raincoat-42898

    07/08/2022, 11:09 AM
    Hi all, basic question: my data flow is SFTP --> ADLS --> Delta Table (Databricks). I have CSV files hosted on an SFTP server; is there any way to ingest and enable profiling from SFTP? Thanks.
  • p

    plain-beach-61128

    07/08/2022, 3:13 PM
    Hi all, I want to test ingestion from a Snowflake data source with only a single table. Following the docs at https://datahubproject.io/docs/generated/ingestion/sources/snowflake, I'm trying
    table-pattern:
    allow:
    - "^my_table_name$"
    But this does not ingest the desired table; instead it ingests many other tables and views. Can you please help me configure this correctly?
    plus1 1
    h
    • 2
    • 1
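    One thing that commonly trips this up: the allow regexes are matched against the fully qualified name (database.schema.table), not the bare table name, and the config key is table_pattern with an underscore. A sketch of the relevant fragment, with placeholder database and schema names:

    source:
      type: snowflake
      config:
        # ...connection settings omitted...
        table_pattern:
          allow:
            - "^my_db\\.my_schema\\.my_table_name$"   # fully qualified, dots escaped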
  • s

    sparse-barista-40860

    07/08/2022, 3:58 PM
    hi all
  • s

    sparse-barista-40860

    07/08/2022, 3:58 PM
    https://gist.github.com/430bed738bca50b5ccf321591ebd162d
  • s

    sparse-barista-40860

    07/08/2022, 3:58 PM
    how can I solve that error?
  • s

    sparse-barista-40860

    07/08/2022, 3:59 PM
    Copy code
    ./gradlew :metadata-ingestion-examples:kafka-etl:bootRun
  • s

    sparse-barista-40860

    07/08/2022, 4:21 PM
    I downgraded Gradle to 6.9.2
  • s

    sparse-barista-40860

    07/08/2022, 4:21 PM
    https://gist.github.com/f72ae75a4ed414dbbd095c4e409c0767
  • s

    sparse-barista-40860

    07/08/2022, 4:21 PM
    and it still shows me the same error
  • s

    silly-ice-4153

    07/08/2022, 4:35 PM
    Hello, I have a problem with Looker ingestion: "Failed to initialize Looker client. Please check your configuration." I have defined the looker.ini and installed looker-sdk. Maybe I'm missing something? I work with .cloud.looker.com
    h
    • 2
    • 3
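    For what it's worth, the Looker source recipe carries the credentials itself (base_url, client_id, client_secret), so it's worth double-checking that those are set in the recipe and point at the right host; a sketch with placeholder values for an instance on .cloud.looker.com:

    source:
      type: looker
      config:
        base_url: "https://yourcompany.cloud.looker.com"   # placeholder
        client_id: "${LOOKER_CLIENT_ID}"
        client_secret: "${LOOKER_CLIENT_SECRET}"
    sink:
      type: console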
  • b

    big-plumber-87113

    07/08/2022, 7:45 PM
    hi team, is there a way to either (a) generate tokens that last longer than 6 months or (b) programmatically generate tokens without needing another token for validation? For example, when querying GraphQL with curl, I still need to provide an access token for programmatic token generation with
    --header 'Authorization: Bearer <...>'
    My current hack has been to store a token generated from the UI and use it to generate a new token before it expires, then replace the old one 🥴
    i
    l
    +2
    • 5
    • 6
  • f

    faint-television-78785

    07/11/2022, 1:44 AM
    hey all, this is Brian! Does anybody know of example code for a custom ingestion source that sends tag updates for paths? I've looked in the docs but can't find any. I'm aiming to update an existing DataHub-synced Postgres source. Any pointers appreciated!
  • l

    loud-kite-94877

    07/11/2022, 3:42 AM
    Hi all, I get this failure sometimes, especially when I execute several ingestions simultaneously. Thanks for the help. Version 0.8.38 in k8s, GMS_AUTHENTICATION_ENABLE: TRUE, ingestion executed in the action container.
    Copy code
    " 'failures': [{'error': 'Unable to emit metadata to DataHub GMS',\n"
               "               'info': {'message': '401 Client Error: Unauthorized for url: "
               "<http://datahub-datahub-gms:8080/aspects?action=ingestProposal'}}>,\n"
               "              {'error': 'Unable to emit metadata to DataHub GMS',\n"
               "               'info': {'message': '401 Client Error: Unauthorized for url: "
               "<http://datahub-datahub-gms:8080/aspects?action=ingestProposal'}}>],\n"
    plus1 2
    l
    • 2
    • 1
  • l

    lemon-zoo-63387

    07/11/2022, 5:58 AM
    Hello everyone, the following link relates to other systems: how can I carry my own lineage along when ingesting metadata? Thanks in advance for your help. https://datahubproject.io/docs/lineage/sample_code
    l
    • 2
    • 1
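    A minimal sketch of emitting lineage directly with the Python SDK, along the lines of the linked sample code; the mysql platform and table names are placeholders:

    import datahub.emitter.mce_builder as builder
    from datahub.emitter.rest_emitter import DatahubRestEmitter

    # declare upstream -> downstream lineage between two existing datasets
    lineage_mce = builder.make_lineage_mce(
        [builder.make_dataset_urn("mysql", "db.upstream_table")],   # upstreams
        builder.make_dataset_urn("mysql", "db.downstream_table"),   # downstream
    )

    emitter = DatahubRestEmitter("http://localhost:8080")  # placeholder GMS address
    emitter.emit_mce(lineage_mce)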
  • s

    salmon-angle-92685

    07/11/2022, 7:23 AM
    Hello everyone, is there a way of deleting all the Glossary Terms using the same commands found in this doc: https://datahubproject.io/docs/how/delete-metadata/ ? I tried replacing the entity_type with glossaryTerm or glossaryNode, but it didn't work:
    Copy code
    yes | datahub delete --env PROD --entity_type glossaryTerm --hard ; yes | datahub delete --entity_type glossaryNode --hard
    Thank you guys in advance :)
    l
    • 2
    • 1
  • l

    late-bear-87552

    07/11/2022, 7:23 AM
    Hello everyone, I just wanted to understand lineage for Airflow on DataHub. If any Airflow task fails, will there be any effect on lineage in DataHub? I can see the task failure in the runs list in the DataHub UI, but the lineage does not show any change.
    d
    • 2
    • 2
  • m

    microscopic-mechanic-13766

    07/11/2022, 7:26 AM
    Good morning, one quick question: where can I see the CLI versions that currently exist? I am trying to update to v0.8.40 but don't know if the latest CLI version is 0.8.40.0 or 0.8.39.x
    d
    • 2
    • 1
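    A quick way to check, assuming the CLI is installed from PyPI as acryl-datahub:

    datahub version                    # shows the locally installed CLI version
    pip index versions acryl-datahub   # lists released versions (needs pip >= 21.2)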
  • c

    cuddly-arm-8412

    07/11/2022, 8:42 AM
    hi team, I want to know about the task model. How can I quickly understand pipeline, dataFlow, and task? I want to evaluate whether I can hook our company's internal task system into it; our internal system is based on the Dolphin Scheduler.
  • b

    busy-wolf-34537

    07/11/2022, 9:43 AM
    Hello everyone, good day! Is there any way to change the database/schema name case (from the current lower case to upper case) in how it's represented in the UI after/during metadata ingestion, for example by passing some kind of configuration option?
    l
    • 2
    • 1