# ingestion
  • loud-account-57875 (06/08/2023, 6:12 AM)

    Hi, is there any way to connect to Google Dataform other than through dbt? Alternatively, can tags, terms, and dataset descriptions applied in DataHub be automatically applied to BigQuery?
  • adventurous-apple-52621 (06/08/2023, 8:32 AM)

    Hello, the schedule status for our ingestion service hasn't updated for about 7 days. Could anyone help check this problem?
  • brief-evening-58385 (06/08/2023, 1:46 PM)

    Hi team, I am a new DataHub user. I am trying to add a Kafka topic as upstream for an S3 path. The ingestion command succeeds, but the lineage is not reflected in the UI. Could you please help me debug and fix this?

    lineage:
      - entity:
          env: prod
          name: s3://bucketname/folder/folder1
          platform: s3
          type: dataset
        upstream:
          - entity:
              env: prod
              name: topicname
              platform: kafka
              type: dataset
    version: 1

    Pipeline finished successfully; produced 1 events in 0.28 seconds.
    ❗ Client-Server Incompatible ❗ Your client version 0.10.2.3 is older than your server version 0.10.3. Upgrading the CLI to 0.10.3 is recommended. ➡️ Upgrade via "pip install 'acryl-datahub==0.10.3'"
  • wonderful-tomato-83083 (06/08/2023, 5:09 PM)

    Hello! I found something that said you can set a different auth header in a recipe via extraHeaders, but I haven't found an example of what that would look like. Can anyone help?
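A sketch of what the question above is asking for, assuming it refers to the datahub-rest sink, which accepts an extra_headers mapping in its config; the server address, header name, and token below are placeholders, not a verified example from the docs:

```yaml
# Hypothetical sketch only: pass custom HTTP headers to the REST sink.
sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"
    extra_headers:
      Proxy-Authorization: "Bearer <token>"   # placeholder header/value
```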
  • adventurous-apple-52621 (06/09/2023, 6:50 AM)

    Hi all, when we ingest data, some Elasticsearch exceptions are thrown. Can anyone help with this?
  • cuddly-garden-9148 (06/09/2023, 8:12 AM)

    Hi all, when I create a new source, is there an option to exclude specific tables, please?
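A sketch of the common pattern for the question above: most SQL-based DataHub sources accept a table_pattern with allow/deny regex lists. The source type, connection details, and table names here are placeholders:

```yaml
# Hypothetical sketch: exclude tables matching deny-list regexes.
source:
  type: postgres          # placeholder source type
  config:
    host_port: "localhost:5432"
    database: mydb
    table_pattern:
      deny:
        - ".*\\.audit_log"   # exclude a specific table
        - ".*\\.tmp_.*"      # exclude tables by prefix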
  • billions-rose-75566 (06/09/2023, 11:00 AM)

    Hi all, I am trying to run a recipe with datahub-kafka as a sink. With datahub-rest it works fine, but with datahub-kafka I get this error: {datahub.entrypoints:199} - Command failed: Failed to configure the source (postgres): Missing provider configuration. I only changed the sink:

    pipeline_name: DatabaseNameToBeIngested
    source:
      type: postgres
      config:
        host_port: postgres:5432
        database: db
        username: db
        password: password
        profiling:
          enabled: true
        stateful_ingestion:
          enabled: true
    sink:
      type: "datahub-kafka"
      config:
        connection:
          bootstrap: "broker:29092"
          schema_registry_url: "http://schema-registry:8081"
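One possible direction for the error above, offered as an unverified sketch: stateful ingestion stores its checkpoints via the DataHub server, which the datahub-rest sink provides implicitly; with a kafka sink, the pipeline may need to be told where GMS lives via a top-level datahub_api section. The server address is a placeholder:

```yaml
# Hypothetical sketch: point stateful ingestion at GMS explicitly
# when the sink is not datahub-rest.
pipeline_name: DatabaseNameToBeIngested
datahub_api:
  server: "http://datahub-gms:8080"   # placeholder GMS address
```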
  • swift-agency-2567 (06/09/2023, 1:29 PM)

    Hi there! I've set up a local DataHub with quickstart to try the lineage features on my data warehouse. I could import the data model of our database (all tables DataHub has access to have been listed as datasets), but the lineage feature doesn't seem to cover all of it: only a portion of lineage has been processed (2 schemas out of 12). I tried to rerun ingestion several times with different scope filtering, but it doesn't seem to evolve any more. Should I purge everything and retry? I'm on Snowflake, and the role I use has access to snowflake.account_usage.access_history.
  • cuddly-garden-9148 (06/09/2023, 2:13 PM)

    Hello, can we use the JDBC URL for the creation of a new source?
  • wonderful-tomato-83083 (06/09/2023, 3:08 PM)

    Hi again, it looks like ssl_verify isn't supported in the openapi recipe. Is there a way around that?
  • limited-forest-73733 (06/09/2023, 3:10 PM)

    Hey team, I integrated Airflow with DataHub using datahub-kafka and DataHub version 0.10.3. I can see the Airflow metadata in the DataHub UI, but I have one question: in the Airflow DAG, task lineage is not coming up. Thanks!
  • dazzling-london-20492 (06/10/2023, 1:57 AM)

    Hi team, I wanted to ingest Iceberg built on S3, but the documentation only mentions Azure. Are you going to support S3 in the future?
  • quiet-scientist-40341 (06/10/2023, 8:23 AM)

    Hi everyone, has anyone run into this question? What if there are different versions?
  • loud-account-57875 (06/11/2023, 4:27 PM)

    Hi, I want to give permissions to each user group. Where is the documentation about linking user groups when using Google auth?
  • wonderful-tomato-83083 (06/12/2023, 5:06 PM)

    OK, I've worked through a few of the openapi issues I've been having, but I hacked some of the DataHub Python lib to do so. There are a couple of things it doesn't seem to handle on its own, unless I just missed it. 🧵
  • swift-painter-68980 (06/12/2023, 7:40 PM)

    Hi team, we are having an issue where DataHub is not able to see some of our Looker dashboards and explores. It isn't clear why. Details in the thread if someone can help 🧵
  • numerous-refrigerator-15664 (06/13/2023, 1:45 AM)

    Hi team, according to the File Based Lineage doc, the only available entity type is dataset. Is that still true? I have an external MySQL DB that has dataset-datajob-dataset metadata, and I'm looking for a way to ingest it into DataHub. I already checked out the pipeline lineage too, but since I need to export my MySQL data in the desired format, the YAML file based approach seems more doable, so I wish I could use the datajob or dataflow entity types in file based lineage too. Thanks!
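For reference, a minimal file-based lineage sketch using only the dataset entity type the message above mentions; the table names and platform are placeholders:

```yaml
# Hypothetical sketch of a file-based lineage YAML (dataset-to-dataset only).
version: 1
lineage:
  - entity:
      name: mydb.schema.downstream_table   # placeholder
      type: dataset
      env: PROD
      platform: mysql
    upstream:
      - entity:
          name: mydb.schema.upstream_table   # placeholder
          type: dataset
          env: PROD
          platform: mysql
```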
  • hallowed-farmer-50988 (06/13/2023, 7:34 AM)

    Hi all, could someone please help me understand this new behaviour I'm seeing in DataHub after upgrading to 0.10.3 (from 0.10.0)? I'm ingesting dbt metadata with Athena as the target platform, and I noticed that the browse structure as well as the dataset URN now have the catalog name in them. For instance:
    Browse structure:
    • then: dataset/{ENV}/dbt/{platform_instance}/{database}/{table}
    • now: dataset/{ENV}/dbt/{platform_instance}/{catalog_name}/{database}/{table}
    URN:
    • then: urn:li:dataPlatform:dbt,{platform_instance}.{database}.{table},{ENV}
    • now: urn:li:dataPlatform:dbt,{platform_instance}.{catalog_name}.{database}.{table},{ENV}
    I searched the code for what could have caused that change and couldn't find anything. The problem is that the entity created by the dbt ingestion for the target platform (Athena in my case) no longer matches the existing entity for that platform, as the latter doesn't have the catalog_name in the URN. Any help will be much appreciated.
  • cool-tiger-42613 (06/13/2023, 7:45 AM)

    Hello, I have DataHub set up with a custom ingestion source. The structure is multiple projects -> buckets -> datasets. I have a few datasets that are used across projects, and lineage does not make the connection because I define the URN as project_name.bucket_name.dataset_name. Is there a way to link the URNs, or to add a URN as an alias rather than an actual dataset?
  • adventurous-apple-52621 (06/13/2023, 11:04 AM)

    Hi, why is there no Ingestion tab on my homepage when I log in to DataHub?
  • ripe-eye-60209 (06/13/2023, 2:04 PM)

    Hello team, for the powerbi ingestor, when it runs for a long time the token expires. It seems there is no logic to renew the token while the pipeline is running. Could you check this?
  • orange-gpu-90973 (06/13/2023, 3:18 PM)

    Hi, is there any way to ignore exceptions while ingesting from a data source? E.g. the data source might have some missing columns, but DataHub could ignore them while ingesting the data.
  • rich-restaurant-61261 (06/13/2023, 5:53 PM)

    Hi team, I am trying to ingest Superset data into DataHub, and based on the documentation (https://datahubproject.io/docs/generated/ingestion/sources/superset), I need to run the following command. Does anyone know where I should run it?
    pip install 'acryl-datahub[superset]'
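For context on the question above: the pip command is typically run in the same terminal/virtualenv where the datahub CLI is installed, after which a recipe like the hypothetical sketch below could be run with `datahub ingest -c superset.yml`. The connect_uri, credentials, and sink address are placeholders:

```yaml
# Hypothetical Superset recipe sketch, not a verified example.
source:
  type: superset
  config:
    connect_uri: "http://localhost:8088"   # placeholder Superset URL
    username: admin                         # placeholder credentials
    password: admin
sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"         # placeholder GMS address
```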
  • creamy-battery-20182 (06/13/2023, 6:07 PM)

    Hi! I was running into these exceptions on a dbt ingestion job:

    2023-06-12 22:40:55,684 [qtp944427387-17466] INFO  c.l.m.r.entity.AspectResource:166 - INGEST PROPOSAL proposal: {aspectName=assertionInfo, systemMetadata={lastObserved=1686609651915, runId=dbt-2023_06_12-22_40_42}, entityUrn=urn:li:assertion:d8691f1c759e159221940a3696e48cf8, entityType=assertion, aspect={contentType=application/json, value=ByteString(length=1375,bytes=7b226375...6e227d7d)}, changeType=UPSERT}

    2023-06-12 22:40:55,687 [qtp944427387-17421] ERROR c.l.m.filter.RestliLoggingFilter:38 - Rest.li error:
    com.linkedin.restli.server.RestLiServiceException: com.datahub.util.exception.RetryLimitReached: Failed to add after 3 retries

    But these are the underlying exceptions (logs are from the GMS pod):

    Caused by: io.ebean.DuplicateKeyException: Error when batch flush on sql: insert into metadata_aspect_v2 (urn, aspect, version, metadata, createdOn, createdBy, createdFor, systemmetadata) values (?,?,?,?,?,?,?,?)

    Caused by: java.sql.BatchUpdateException: Duplicate entry 'urn:li:assertion:04063f0fbcbe627b390598a883fb0272-assertionInfo-' for key 'PRIMARY'

    Caused by: java.sql.SQLIntegrityConstraintViolationException: Duplicate entry 'urn:li:assertion:04063f0fbcbe627b390598a883fb0272-assertionInfo-' for key 'PRIMARY'

    Has anyone seen these before? What could be the underlying issue here? Is there an issue with the data itself?
  • elegant-river-39160 (06/14/2023, 12:21 AM)

    Hi, I want to use the Confluent S3 Sink Connector as a source in DataHub. From this, it seems like it is not supported yet. I also came across this thread, but it seems like nothing happened there. Any suggestions?
  • purple-terabyte-64712 (06/14/2023, 9:31 AM)

    Hi, can anyone help me with this issue? https://datahubspace.slack.com/archives/CUMUWQU66/p1683950289060239
  • billions-lawyer-94523 (06/14/2023, 8:13 PM)

    Hi Team DataHub, greetings! I'm here to request some quick help: we are not able to establish a successful connection from DataHub to Unity Catalog, and the connection status is stuck in "Pending". Any help or pointers would be great. We much appreciate your time and input.
  • limited-forest-73733 (06/15/2023, 12:17 PM)

    Hey team, I am running a dbt recipe from the DataHub UI but getting a source configuration error.
  • wonderful-book-58712 (06/16/2023, 1:38 AM)

    Do we have an option to ingest CouchDB metadata into DataHub?
  • creamy-pizza-80433 (06/16/2023, 8:32 AM)

    Hello there, I'm confused about how DataHub ingests data from my Hive database. Why are the tables ingested into separate containers, database and schema? In the first picture, some tables are ingested into the prd_db Schema container inside the Database container, while other tables are ingested directly into the Database container. Do you have any insights into why this might be happening? Could there be any issues with how I'm ingesting the data? Thank you!