# ingestion

    orange-flag-48535

    10/03/2022, 3:32 PM
    Datahub UI isn't picking up my entity when I re-ingest it after a soft delete. The soft delete sets '{"removed":true}' on the "status" aspectName, as I can see by looking into the MySQL database. But the re-ingest isn't touching the removed:true entry at all. My re-ingest sends an UPSERT with aspectName "schemaMetadata" and the same urn value as the original ingest.
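    A likely explanation (hedged): the soft-delete flag lives in its own status aspect, and an UPSERT of schemaMetadata only replaces that one aspect, so removed:true survives. A minimal Python sketch of the ingestProposal payload that flips the flag back, assuming a locally reachable GMS; the host and urn are placeholders:

```python
import json

# Assumptions: GMS reachable at this host, placeholder dataset urn.
GMS = "http://localhost:8080"
urn = "urn:li:dataset:(urn:li:dataPlatform:mysql,db.schema.table,PROD)"

proposal = {
    "proposal": {
        "entityType": "dataset",
        "entityUrn": urn,
        "changeType": "UPSERT",
        "aspectName": "status",
        # the aspect value is itself JSON-serialized
        "aspect": {
            "contentType": "application/json",
            "value": json.dumps({"removed": False}),
        },
    }
}

body = json.dumps(proposal)
# POST this body to f"{GMS}/aspects?action=ingestProposal" with header
# X-RestLi-Protocol-Version: 2.0.0 to overwrite the soft-delete flag.
print(body)
```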

    brainy-table-99728

    10/03/2022, 4:19 PM
    When ingesting data from Snowflake, is a table uniquely identified by database.schema.table_name? I'm asking because we tend to drop and recreate tables during our ETL processes, and I want to know if I add descriptions to columns in DataHub, will those descriptions stay put or will it drop that "table" from DataHub and add a new one?
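    For context (hedged): DataHub identifies a dataset by its URN, which for Snowflake is built from the platform, the lower-cased database.schema.table_name, and the environment. If the recreated table keeps the same three-part name, the URN is unchanged, so descriptions attached to it in DataHub should stay put. A sketch of the default URN construction; the table names are placeholders:

```python
def make_snowflake_dataset_urn(database: str, schema: str, table: str,
                               env: str = "PROD") -> str:
    # Snowflake names are lower-cased in DataHub's default URN scheme
    name = f"{database}.{schema}.{table}".lower()
    return f"urn:li:dataset:(urn:li:dataPlatform:snowflake,{name},{env})"

print(make_snowflake_dataset_urn("ANALYTICS", "PUBLIC", "ORDERS"))
# urn:li:dataset:(urn:li:dataPlatform:snowflake,analytics.public.orders,PROD)
```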

    sparse-holiday-36471

    10/03/2022, 5:35 PM
    Hello! In our company we are thinking about using DataHub to store meta-information for all the databases we have. We currently have 5+ PostgreSQL clusters, and some of them have the same database names. We see that dev/qa/prod are hardcoded environment values. Is it possible to add a new environment, or otherwise organize multiple “containers” with the same names? (Ideally we would have a group equal to the business unit.)
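    For reference (hedged): the set of environments is a fixed enum (FabricType) in DataHub, so arbitrary new values can't be added via a recipe; platform_instance is the usual way to distinguish same-named databases across clusters. A recipe sketch, where the host, database, and instance names are placeholders:

```yaml
source:
  type: postgres
  config:
    host_port: pg-cluster-a.internal:5432   # placeholder
    database: appdb                         # placeholder
    env: QA                                 # must be one of the built-in fabric types
    platform_instance: cluster_a            # e.g. the business unit / cluster name
```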

    green-honey-91903

    10/03/2022, 6:48 PM
    I have datahub deployed via the latest helm charts and have set up the Snowflake integration (via the UI). The lineage features seem to be inaccessible/greyed out despite datahub having access to the ACCESS_HISTORY view. Am I missing a step in this integration?

    wonderful-notebook-20086

    10/03/2022, 11:17 PM
    I'm trying to figure out why the redshift lineage is not picking anything up ... recipe:
```yaml
source:
    type: redshift
    config:
        platform_instance: etl2_prod
        table_lineage_mode: mixed
        include_table_lineage: true
        database: insightsetl
        password: '${etl2test}'
        include_copy_lineage: false
        profiling:
            enabled: false
        host_port: 'pi-redshift-etl-2.ccvpgkqogsrc.us-east-1.redshift.amazonaws.com:8192'
        stateful_ingestion:
            enabled: false
        username: datahub_ingestion
        capture_lineage_query_parser_failures: true
```
    Where do the lineage query parser failures actually get recorded? How can I review these?

    flat-painter-78331

    10/04/2022, 1:04 AM
    Hi guys. Is it possible to show lineage as a combination of tasks in two DAGs? For example, if DAG_1 has two tasks for extracting and loading data into tables, and DAG_2 has a task for aggregating those tables, can the lineage for the tasks in DAG_1 and DAG_2 be shown together?
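    One hedged option: if the tasks in both DAGs declare the same datasets as outputs/inputs (e.g. via the Airflow plugin's inlets/outlets), the lineage graph joins across DAGs automatically through those datasets; otherwise the file-based lineage source can stitch the tables together manually. A sketch of such a lineage file, where all names are placeholders and field names follow the lineage-file source docs (verify against your CLI version):

```yaml
version: 1
lineage:
  - entity:
      name: analytics.public.agg_table      # produced by DAG_2's task
      type: dataset
      env: PROD
      platform: snowflake
    upstream:
      - entity:
          name: raw.public.table_a          # loaded by DAG_1's tasks
          type: dataset
          env: PROD
          platform: snowflake
      - entity:
          name: raw.public.table_b
          type: dataset
          env: PROD
          platform: snowflake
```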

    alert-fall-82501

    10/04/2022, 6:07 AM
```
datahub.ingestion.run.pipeline.PipelineInitError: Failed to set up framework context
[2022-10-04, 06:00:32 UTC] {{subprocess.py:89}} INFO - [2022-10-04, 06:00:32 UTC] ERROR    {datahub.entrypoints:196} - Command failed:
[2022-10-04, 06:00:32 UTC] {{subprocess.py:89}} INFO - 	Failed to set up framework context due to
[2022-10-04, 06:00:32 UTC] {{subprocess.py:89}} INFO - 		'Failed to connect to DataHub' due to
[2022-10-04, 06:00:32 UTC] {{subprocess.py:89}} INFO - 			'HTTPSConnectionPool(host='datahub-gms.amer-prod.xxxx.com', port=8080): Max retries exceeded with url: /config (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f07678c0c90>: Failed to establish a new connection: [Errno -2] Name or service not known'))'
```

    alert-fall-82501

    10/04/2022, 6:08 AM
    Can anybody advise on this error? I am ingesting Redshift source metadata to a private DataHub server. Thanks!
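    For what it's worth (hedged): "Name or service not known" is a DNS failure, and the traceback shows extra markup embedded in the GMS host string, which suggests the sink's server value is malformed. A sketch of the expected shape, with the hostname taken from the log above as a placeholder:

```yaml
sink:
  type: datahub-rest
  config:
    # the scheme belongs in the URL, not inside the hostname
    server: http://datahub-gms.amer-prod.xxxx.com:8080
```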

    famous-florist-7218

    10/04/2022, 10:18 AM
    Hi guys, I've just run into a weird problem. When I moved both the MySQL DB and Elasticsearch to new instances, the Ingestion UI page shows up empty as a blank page. Although I've triggered the migration job to restore indices, the result is still the same. Steps to reproduce: • Go to the Ingestion page. • The UI loads a skeleton page, then shows a “Loading ingestion sources” popup. • After the popup fades away, the UI becomes a blank page.

    microscopic-mechanic-13766

    10/04/2022, 11:21 AM
    Good day everyone, I am facing a “problem” with PostgreSQL ingestion. I am trying to ingest from all the databases that exist in my PostgreSQL instance using the postgres user, but although the ingestion succeeds, no data is ingested. Yet if I ingest one database at a time with the same user, it is ingested correctly. Does anybody know why this might be happening? (I will upload both the recipe that I use for the all-databases ingestion and the resulting log in the thread.)

    ancient-policeman-73437

    10/04/2022, 1:31 PM
    Dear all, when I drop a platform, DataHub doesn't remove its containers. Do you know why, and how to clean it up completely? Thanks for your help!

    numerous-account-62719

    10/04/2022, 1:43 PM
    Hi Team, I want to add the data from ArangoDB into datahub. Can anyone please tell me how to do that and if ArangoDB is supported in datahub?

    tall-lighter-95403

    10/04/2022, 2:58 PM
    Hi everyone, can I confirm whether the S3 ingestion mechanism in DataHub supports KMS-encrypted buckets?

    ancient-policeman-73437

    10/04/2022, 3:09 PM
    Dear all, I have another question, which I couldn't find in the documentation. I ingested domains via CSV ingestion and they appeared in the wrong format. I want to remove those domains, but they are not in the domain list and don't have URNs... What should I do in that case?

    steep-family-13549

    10/04/2022, 3:13 PM
    Hello team, I want to create lineage with Java code. Is it possible?
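    For context (hedged): lineage is just another aspect (upstreamLineage) on the downstream dataset, so any language that can issue an HTTP POST to GMS works, including Java (the io.acryl:datahub-client RestEmitter wraps this same call). A Python sketch of the payload shape, with placeholder URNs:

```python
import json

def upstream_lineage_proposal(downstream_urn, upstream_urns):
    # upstreamLineage aspect: a list of upstream datasets with a lineage type
    aspect = {"upstreams": [{"dataset": u, "type": "TRANSFORMED"}
                            for u in upstream_urns]}
    return {"proposal": {
        "entityType": "dataset",
        "entityUrn": downstream_urn,
        "changeType": "UPSERT",
        "aspectName": "upstreamLineage",
        "aspect": {"contentType": "application/json",
                   "value": json.dumps(aspect)},
    }}

# placeholder URNs; POST the result to {gms}/aspects?action=ingestProposal
down = "urn:li:dataset:(urn:li:dataPlatform:hive,db.agg,PROD)"
up = ["urn:li:dataset:(urn:li:dataPlatform:hive,db.raw,PROD)"]
print(json.dumps(upstream_lineage_proposal(down, up)))
```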

    ancient-policeman-73437

    10/04/2022, 3:40 PM
    Dear all, I would like to ask whether I can get a list of PII fields from DataHub. I use the search query fieldTags:"PII" OR editedFieldTags:"PII", but DataHub shows results only at the database/table level...

    adamant-rain-51672

    10/04/2022, 3:07 PM
    Hey, a question about creating dataset stats (for a source that doesn't support it automatically). I'd like to create them programmatically using the Python client. Is there a way to do so (an interface to submit stats attached to a dataset)? Here's what I want to end up with:
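    A hedged sketch: the Stats tab is backed by the datasetProfile timeseries aspect, which the Python SDK exposes as DatasetProfileClass in datahub.metadata.schema_classes. Building the equivalent payload with only the standard library; the field names follow the DatasetProfile schema, but verify against your server version, and the counts are made-up placeholders:

```python
import json
import time

# placeholder stats for a hypothetical two-column dataset
profile = {
    "timestampMillis": int(time.time() * 1000),  # profiles are timeseries
    "rowCount": 1000,
    "columnCount": 2,
    "fieldProfiles": [
        {"fieldPath": "id", "uniqueCount": 1000, "nullCount": 0},
        {"fieldPath": "name", "nullCount": 10, "nullProportion": 0.01},
    ],
}
# emit this as the "datasetProfile" aspect for the dataset's urn
print(json.dumps(profile))
```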

    mysterious-lizard-42579

    10/04/2022, 6:01 PM
    Hi! How do I remove wrongly ingested lineage? I wrote the lineage file with a .yml extension, but I added some wrong upstreams by mistake and now I can't undo or delete them.

    damp-queen-61493

    10/04/2022, 7:24 PM
    Hi team! I'm trying to ingest protobuf metadata but I can't figure out how. What I'm trying to do: 1. Clone DataHub. 2. Build metadata-integration. 3. Copy the generated datahub-protobuf-0.8.42-SNAPSHOT.jar. 4. Execute the command:
```
$ java -jar utils/datahub-protobuf/datahub-protobuf-0.8.42-SNAPSHOT.jar --descriptor descriptor_set_out.desc --directory events/products
Exception in thread "main" java.lang.NoClassDefFoundError: datahub/shaded/org/apache/http/ssl/TrustStrategy
        at datahub.protobuf.Proto2DataHub.main(Proto2DataHub.java:275)
Caused by: java.lang.ClassNotFoundException: datahub.shaded.org.apache.http.ssl.TrustStrategy
        at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
        at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
        ... 1 more
```

    chilly-ability-77706

    10/05/2022, 1:28 AM
    Hi, is there a plan to add support for Azure Data Explorer and ADLS Gen2? I asked this last week as well; just thought of bringing it back up.

    rapid-potato-4736

    10/05/2022, 7:42 AM
    If ingestion from Druid is done, will the metadata entered in Druid also be ingested into DataHub? If that's not the case, is it just fetching the lists of tables and column frames from Druid into DataHub? I'm curious about the case of Superset ingestion as well.

    strong-australia-51849

    10/05/2022, 8:40 AM
    Hi all, I'm not quite sure about DataHub lineage. While exploring DataHub I found that some of the data lineage displayed in the UI is not what I expected. What did I do wrong? Please guide me. Thanks!

    worried-zebra-47870

    10/05/2022, 1:50 PM
    Hi all, quick question about the dbt ingestion: how do I get the compiled view definition? When I look at my dbt models in the UI, on the View Definition tab I see elements like ref {{}}, and my teammates would like to see our Snowflake table name (database.schema.table_name). Can you help me with this?

    ancient-policeman-73437

    10/05/2022, 2:32 PM
    Dear Support, if we enter something via the Edit buttons in the UI, can we extract that data from DataHub somehow, so as not to lose it?

    polite-application-51650

    10/06/2022, 6:45 AM
    Hi team, I'm using the profiler configuration in my ingestion properties for BQ, but I can't find anything under STATS for any of my tables. Can someone please help? @gray-shoe-75895
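    For reference, a hedged sketch of the profiling section that populates the Stats tab (flag names per the bigquery source docs; verify against your CLI version). Also worth checking that the tables aren't filtered out by a profile_pattern:

```yaml
source:
  type: bigquery
  config:
    # ... credentials / project_id as in your recipe ...
    profiling:
      enabled: true
      profile_table_level_only: false  # false = also compute column-level stats
```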

    chilly-spring-43918

    10/06/2022, 10:44 AM
    Hi, I got this error when doing ingestion via the UI.
```
'  File "pydantic/main.py", line 521, in pydantic.main.BaseModel.parse_obj\n'
           '  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__\n'
           'pydantic.error_wrappers.ValidationError: 1 validation error for SubProcessIngestionTaskArgs\n'
           'debug_mode\n'
           '  extra fields not permitted (type=value_error.extra)\n']}
Execution finished with errors.
```
    The recipe is just like this:
```yaml
source:
    type: bigquery
    config:
        credential:
            private_key_id: xxxxxxxxx
            project_id: xxxxxxxxxxxxxxxxxxx
            client_email: xxxxxxxxxxxxxxxxxxxxxxx
            private_key: xxxxxxxxxxxxxxxxxxxxxxx
            client_id: xxxxxxxxxxxxxxxx
        project_id: xxxxxxxxxxxxxxx
pipeline_name: 'urn:li:dataHubIngestionSource:7552c3ac-6533-4f5a-9dc2-7d50abc3be5d'
```
    DataHub version: 0.8.44.1

    hallowed-spring-18709

    10/06/2022, 1:23 PM
    Hi everyone, first of all, thank you all for developing such a great tool. I have been trying to ingest a simple recipe using DataHub's UI, but I want this recipe to ingest properties as well. I get a success message on ingest, but no properties get added to my ingested files. I will post my recipe in the thread. I've seen there is already a post suggesting to start with the MCE files from the examples, but I am asking specifically about how to achieve this ingestion via the UI, in accordance with the documentation here: https://datahubproject.io/docs/metadata-ingestion/docs/transformer/dataset_transformer#simple-add-dataset-datasetproperties

    tall-lighter-95403

    10/06/2022, 1:58 PM
    Hi everyone, I'm running into an issue setting up a Microsoft SQL Server ingestion: the password for the database (stored in secrets) has special characters like @+\{ and I'm getting JSON errors while building the job.
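    One hedged workaround: SQL sources build a SQLAlchemy connection URI, so reserved characters in the password often need URL-encoding before being pasted into the secret (and YAML values containing { or \ are safest single-quoted). The encoded form can be computed with the standard library; the password below is a made-up placeholder illustrating the characters mentioned:

```python
from urllib.parse import quote_plus

# placeholder password containing the problematic characters
password = r"p@ss+w\{ord"
print(quote_plus(password))  # p%40ss%2Bw%5C%7Bord
```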

    alert-fall-82501

    10/07/2022, 11:11 AM
    Hi Team - I wanted to configure the CSV source; does anybody have an example of this? How can we add a glossary term with it?
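    For reference, a hedged sketch of a csv-enricher recipe. The documented CSV header is resource,subresource,glossary_terms,tags,owners,ownership_type,description,domain, with glossary terms given as URNs like [urn:li:glossaryTerm:Classification.Sensitive]; the filename below is a placeholder, and option names should be checked against your CLI version:

```yaml
source:
  type: csv-enricher
  config:
    filename: ./enriched.csv   # placeholder path (local or S3)
    write_semantics: PATCH     # PATCH adds to existing terms; OVERRIDE replaces them
```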

    happy-twilight-44865

    10/07/2022, 10:17 AM
    The domain is appearing with the URN instead of only the value in the latest DataHub release when I try to insert metadata using #csv-enricher. The CSV file has the URN mentioned as in the image. The expectation is only “Caspian”.