# ingestion
  • b

    blue-beach-27940

    07/01/2022, 2:26 AM
Hi all, how do I use profiling in the ingestion action? I want to get the column statistics, but I got an error with the recipe.yml below.
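    For reference, a minimal sketch of enabling profiling in a recipe (the snowflake source type and the profile_table_level_only flag are assumptions standing in for whatever source the recipe above uses; profiling sits under source.config):

    source:
      type: snowflake
      config:
        # ...connection settings for your source...
        profiling:
          enabled: true                     # collect statistics during ingestion
          profile_table_level_only: false   # false = also compute column-level stats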
  • b

    blue-beach-27940

    07/01/2022, 2:27 AM
    image.png
  • b

    brave-tomato-16287

    07/01/2022, 11:15 AM
Hello all. Could you help me understand what happened with the DBT tests? run_results.json contains several records with successful tests.
  • p

    proud-baker-56489

    07/01/2022, 11:58 AM
    image.png
  • g

    gray-river-37120

    07/02/2022, 1:36 AM
    Sorry to unbury old threads, but wondering @numerous-holiday-52504 if you ever got this solved? I’m having the same issues. No matter how I specify the options for the Snowflake connection, I get a 403 error. Here’s the recipe:

    source:
      type: snowflake
      config:
        check_role_grants: true
        provision_role:
          enabled: false
          dry_run: true
          run_ingestion: false
          admin_username: '${SNOWFLAKE_USER}'
          admin_password: '${SNOWFLAKE_PASS}'
        account_id: ak43980
        warehouse: COMPUTE_WH
        username: '${SNOWFLAKE_USER}'
        password: '${SNOWFLAKE_PASS}'
        role: ROLENAME
        ignore_start_time_lineage: true

    And, from the logs:
    [2022-07-02 00:59:43,674] INFO     {datahub.cli.ingest_cli:99} - DataHub CLI version: 0.8.40
    [2022-07-02 00:59:44,598] INFO     {datahub.ingestion.source_config.sql.snowflake:236} - using authenticator type 'DEFAULT_AUTHENTICATOR'
    [2022-07-02 00:59:44,742] INFO     {datahub.cli.ingest_cli:115} - Starting metadata ingestion
    [2022-07-02 00:59:44,743] INFO     {datahub.ingestion.source.sql.snowflake:114} - Checking current version
    [2022-07-02 01:02:06,031] ERROR    {snowflake.connector.network:920} - 000403: HTTP 403: Forbidden
  • p

    proud-baker-56489

    07/06/2022, 3:42 AM
Hi, I’m trying to ingest some Hive tables into DataHub using the DataHub CLI, but I get an error telling me the queue is full. How can I change it?
  • t

    tall-butcher-30509

    07/07/2022, 7:55 AM
Regarding encoding: within our Java application, we take data from BigQuery, which returns it in UTF-8, use the MetadataChangeProposalWrapper to build a request, and then send it with RestEmitter. We don’t apply any particular encoding configuration to these classes and currently assume they use UTF-8 by default. The error only occurs when we change a property value from ASCII-only to include Japanese characters (a business requirement).
    Sample Code:
    ------------
    import com.linkedin.dataset.DatasetProperties;
    import datahub.event.MetadataChangeProposalWrapper;

    // Build an upsert proposal for the datasetProperties aspect; the
    // description contains non-ASCII (Japanese) characters.
    MetadataChangeProposalWrapper mcpw = MetadataChangeProposalWrapper.builder()
                    .entityType("dataset")
                    .entityUrn("urn:li:dataset:(urn:li:dataPlatform:bigquery,<REMOVED>,TEST)")
                    .upsert()
                    .aspect(new DatasetProperties().setDescription("Sample Data - 商品ブランドコード"))
                    .aspectName("datasetProperties")
                    .build();
    emitAspectsToDataHub(mcpw); // internal helper that sends the proposal via RestEmitter


    Error Log:
    ------------

     Ingestion failed: EMIT_METADATA_ERROR_RESPONSE : Failed to emit entity type: dataset, entity urn: urn:li:dataset:(urn:li:dataPlatform:bigquery,<REMOVED>,TEST), aspect: datasetProperties with status code: 400 retry started...
  • l

    lemon-zoo-63387

    07/11/2022, 1:32 AM
Hello everyone, I have a very strange problem: can only 10,000 entities be deleted at a time?
  • t

    tall-butcher-30509

    07/11/2022, 11:36 PM
FYI, a GitHub issue was raised: https://github.com/datahub-project/datahub/issues/5367
  • b

    bumpy-camera-96689

    07/14/2022, 6:56 AM
Hi! Can anyone help me understand which ingestion sources support population of the "Features" tab for an MLModel entity? In the screenshot below, I created this model programmatically, for testing. But I'm interested in any sources that would link models and features together. Thanks!
  • s

    sparse-barista-40860

    07/18/2022, 6:16 PM
    SNAG-0058.png
  • c

    cool-vr-73109

    07/21/2022, 10:22 AM
    IMG_20220721_155142.jpg
  • c

    chilly-carpet-99599

    07/21/2022, 7:23 PM
Thanks, Swaroop. Will give that a shot. @creamy-van-28626, FYI.
  • c

    cool-vr-73109

    07/26/2022, 9:29 AM
    IMG_20220726_145827.jpg
  • c

    cool-vr-73109

    07/28/2022, 8:26 AM
    IMG_20220728_135220.jpg,IMG_20220728_135101.jpg
  • g

    gifted-kite-59905

    08/01/2022, 6:24 AM
Will the new Bulk Edits via the UI help to solve this? Or is it at the table level only, @orange-night-91387?
  • d

    dazzling-insurance-83303

    08/05/2022, 7:31 PM
So at this time I am not sure if it is a YAML specification error or a bug in the allow_deny_patterns for profiling. A quick confirmation would really help. Thanks! CC @little-megabyte-1074, @mammoth-bear-12532
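    For context, a minimal sketch of how allow/deny filters for profiling are commonly expressed in SQL-source recipes (the snowflake type and the regexes here are placeholders, not taken from the thread):

    source:
      type: snowflake
      config:
        profiling:
          enabled: true
        profile_pattern:
          allow:
            - 'mydb\.myschema\..*'   # placeholder allow regex
          deny:
            - '.*\.tmp_.*'           # placeholder deny regex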
  • g

    gifted-knife-16120

    08/08/2022, 9:31 AM
Hi, can anyone here help?
  • r

    rapid-house-76230

    08/12/2022, 5:29 PM
    Hi team, I would appreciate some help on this if you have a chance
  • f

    famous-florist-7218

    08/17/2022, 7:24 AM
Hi @dazzling-judge-80093, I have a question: my S3 bucket has a thousand files. If I ingest them all at the file level and build a lineage graph between them, it will be very messy. So how do I combine them in the lineage graph? I know that if I specify the {table} placeholder, all these files will be counted as one dataset with the actual table name (see the sketch after the list below). But I need to trace the log of each file. Do you have any potential approaches? Thanks 🙂 For example:
    s3://bucket/foo/bar/folder/table_name/year=2022/month=08/day=04/hour=09/file1.json.gz
    s3://bucket/foo/bar/folder/table_name/year=2022/month=08/day=04/hour=09/file2.json.gz
    s3://bucket/foo/bar/folder/table_name/year=2022/month=08/day=04/hour=09/file3.json.gz
    s3://bucket/foo/bar/folder/table_name/year=2022/month=08/day=04/hour=09/file4.json.gz
    s3://bucket/foo/bar/folder/table_name/year=2022/month=08/day=04/hour=09/file5.json.gz
    s3://bucket/foo/bar/folder/table_name/year=2022/month=08/day=04/hour=09/file6.json.gz
    s3://bucket/foo/bar/folder/table_name/year=2022/month=08/day=04/hour=09/file7.json.gz
    ...
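    A minimal sketch of how the {table} placeholder could roll these files up into one dataset (the partition-token syntax is an assumption based on the s3 source's documented path_spec conventions; adjust it to the actual layout):

    source:
      type: s3
      config:
        path_specs:
          - include: 's3://bucket/foo/bar/folder/{table}/{partition_key[0]}={partition[0]}/{partition_key[1]}={partition[1]}/{partition_key[2]}={partition[2]}/{partition_key[3]}={partition[3]}/*.json.gz'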
  • n

    nutritious-bird-77396

    08/17/2022, 4:11 PM
@big-carpet-38439 I have been trying to get off the custom image for datahub-frontend and datahub-actions, but I couldn't find an example of mounting a custom file. Would you be able to share an example? We use kustomize, but I can try to adapt Helm examples as well. Thanks!
  • s

    sparse-forest-98608

    08/18/2022, 7:32 AM
I have some metadata from CSV files that I want to ingest into DataHub.
  • s

    sparse-forest-98608

    08/18/2022, 7:34 AM
    image.png
  • c

    curved-magazine-23582

    08/19/2022, 6:57 PM
Hello, coming back to this PowerBI ingestion: I have registered an Azure app and given it application-type permission of Tenant.Read.All, but I am still getting a 401 error during ingestion. Any suggestion on what to look at next?
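    For comparison, a minimal sketch of the credential fields a powerbi recipe usually needs (values are placeholders; a 401 typically means one of these does not match the registered Azure app):

    source:
      type: powerbi
      config:
        tenant_id: '<azure-tenant-id>'
        client_id: '<app-registration-client-id>'
        client_secret: '<app-registration-client-secret>'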
  • b

    bright-motherboard-35257

    08/20/2022, 10:01 PM
    I'm using the Docker quickstart container and trying to perform an S3 ingest. I installed pyspark and openjdk-8, and set the env variable SPARK_VERSION=3.2.0 on the datahub-actions image to get past the ingest error "Please set env variable SPARK_VERSION". The ingest now runs without error, yet no information is retrieved from my S3 bucket. Recipe:

    sink:
      type: datahub-rest
      config:
        server: 'http://<redacted>/api/gms'
    source:
      type: s3
      config:
        profiling:
          enabled: true
        path_specs:
          - include: 'https://<redacted>.s3.amazonaws.com/branch-data/*.*'
        env: PROD
        aws_config:
          aws_access_key_id: <redacted>
          aws_region: us-east-1
          aws_secret_access_key: <redacted>
    pipeline_name: 'urn:li:dataHubIngestionSource:<redacted>'
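    One thing worth checking (an assumption based on the s3 source's documented path format, not a confirmed diagnosis): path_specs.include normally takes an s3:// URI rather than an https URL, along the lines of:

    source:
      type: s3
      config:
        path_specs:
          - include: 's3://<redacted>/branch-data/*.*'   # s3:// URI instead of the https form above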
  • g

    gifted-bird-57147

    08/22/2022, 6:15 AM
As per the documentation (https://datahubproject.io/docs/metadata-ingestion#recipes), the YAML plugin for VS Code does syntax validation and type checking. The ingestion itself works, but the validation/auto-completion is a nice feature to help you author recipes...
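    For reference, the Red Hat YAML extension can also pick up a schema from a modeline comment at the top of a recipe; a sketch, with the schema location left as a placeholder to be filled in from the DataHub docs:

    # yaml-language-server: $schema=<path-or-url-to-the-datahub-recipe-json-schema>
    source:
      type: snowflake
      config: {}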
  • h

    happy-twilight-44865

    08/25/2022, 1:03 PM
I am looking for the same connection.
  • g

    great-account-95406

    08/29/2022, 5:47 AM
    Could anyone give me advice here, please?
  • b

    brave-nail-85388

    08/30/2022, 8:18 PM
    ingestion error.PNG,ingestionerror1.PNG
  • b

    brave-nail-85388

    08/30/2022, 8:41 PM
    image.png