# ingestion
  • a

    alert-football-80212

    04/28/2022, 2:31 PM
    Hi, is there a recipe to ingest one specific Kafka topic with its schema into DataHub? Thank you!
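A minimal recipe sketch for this case, assuming the standard kafka source and its topic_patterns filter (the bootstrap server, schema registry URL, topic name, and sink address are all placeholders, not values from this thread):

```yaml
source:
  type: kafka
  config:
    connection:
      bootstrap: "localhost:9092"
      schema_registry_url: "http://localhost:8081"
    # Restrict ingestion to a single topic; entries are regexes
    topic_patterns:
      allow:
        - "^my_topic$"
sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"
```

The schema registry connection is what lets the source attach the topic's schema.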
  • g

    gorgeous-napkin-73659

    04/28/2022, 6:34 PM
    Hi 👋 I’ve been playing around with the Airflow lineage backend (super cool!) and had a question about inlets and outlets. In production we have hundreds of Airflow DAGs, many of which carry important lineage metadata. Relying on users to manually add inlets and outlets for each DAG will likely not scale and will be prone to errors. Has anyone had any experience with programmatically adding these fields, maybe for specific operators? Thanks!
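One pattern for this is a convention-based hook that decorates tasks by operator type, so the inlets/outlets the lineage backend reads get filled in automatically. A sketch, not a verified recipe: `make_dataset` stands in for a real entity factory such as `datahub_provider.entities.Dataset`, and the task_id-to-table naming rule is purely illustrative.

```python
# Sketch: rule-based lineage instead of hand-annotating every task.
# `make_dataset` is a stand-in for a real dataset-entity factory.
def attach_lineage(dag, make_dataset):
    """Set outlets on tasks by convention so a lineage backend can pick them up."""
    for task in dag.tasks:
        if task.task_type == "PostgresOperator":
            # Assumed convention: each PostgresOperator writes a table
            # named after its task_id
            task.outlets = [make_dataset("postgres", f"mydb.public.{task.task_id}")]
```

Calling this once per DAG (for example from a shared module every DAG file imports) keeps the rule in one place instead of hundreds of DAG definitions.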
  • m

    modern-belgium-81337

    04/28/2022, 7:51 PM
    Hi team, I’ve been trying to ingest data from Databricks but haven’t been able to connect (connection refused). Here’s my recipe:
    source:
        type: hive
        config:
            host_port: 'https://my-databricks-workspace.com:443'
            database: null
            username: myemail@gmail.com
            password: databricks-user-access-token
    sink:
        type: datahub-rest
        config:
            server: 'http://localhost:9002/api/gms'
    is there anything obvious that I am missing here?
  • b

    broad-tomato-45373

    04/29/2022, 11:27 AM
    Hi Team, need your help. 1. We want to add some links related to a dataset in the UI while ingesting the dataset, i.e. programmatically (highlighted in the screenshot for better clarity). Is it possible to add these via a transformer in the recipe, or via the Python datahub package? Any help would be much appreciated.
  • c

    creamy-van-28626

    04/29/2022, 1:01 PM
    How can we look up the URN to delete things from DataHub?
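For reference, an entity's URN appears in its UI page URL (URL-decoded), and the CLI can delete by URN or by filter. A sketch with a made-up URN, assuming a reasonably recent datahub CLI:

```shell
# Soft-delete one dataset by URN (the URN below is a made-up example)
datahub delete --urn "urn:li:dataset:(urn:li:dataPlatform:hive,mydb.my_table,PROD)" --soft
# Or delete by filters instead of an explicit URN
datahub delete --entity_type dataset --platform mysql
```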
  • r

    rapid-book-98432

    04/29/2022, 1:19 PM
    Hi there 🙂 The Superset ingestion only covers databases/datasets in Superset, but how do you create lineage between datasets and a Superset dashboard? I thought that lineage was defined at ingestion as well 😕 bad understanding on my part. So to have lineage you must use this approach: https://datahubproject.io/docs/lineage/sample_code ? Thanks for your help 🙂
  • m

    millions-waiter-49836

    04/29/2022, 3:28 PM
    Hey guys, I pushed some quantiles stats to my local DataHub, but I can’t see them in the UI. I don’t see quantiles in the demo either. I also can’t see partitionSpec in the UI after pushing it.
  • g

    gorgeous-telephone-63628

    04/29/2022, 5:02 PM
    Hello, I am working on setting up DataHub on a Kubernetes cluster using Helm. I would like to enable ingestion sources as part of the Helm charts; is that possible? If there is documentation that touches on this, could you point me towards it?
  • c

    cuddly-arm-8412

    04/30/2022, 9:43 AM
    hi, when I run datahub ingest -c /github/datahub/metadata-ingestion/mysql_recipe.yml an error is prompted (mysql is disabled due to an error in initialization):
    datahub ingest -c /Users/wangdongkun/chj/github/datahub/metadata-ingestion/mysql_recipe.yml
    [2022-04-30 17:41:48,662] INFO {datahub.cli.ingest_cli:96} - DataHub CLI version: unavailable (installed in develop mode)
    [2022-04-30 17:41:48,984] ERROR {datahub.entrypoints:165} - mysql is disabled due to an error in initialization
    [2022-04-30 17:41:48,984] INFO {datahub.entrypoints:176} - DataHub CLI version: 0.0.0.dev0 at /Users/wangdongkun/chj/github/datahub/metadata-ingestion/src/datahub/__init__.py
    [2022-04-30 17:41:48,984] INFO {datahub.entrypoints:179} - Python version: 3.8.9 (default, Oct 26 2021, 07:25:53) [Clang 13.0.0 (clang-1300.0.29.30)] at /Users/wangdongkun/chj/github/datahub/metadata-ingestion/venv/bin/python on macOS-12.1-arm64-arm-64bit
    [2022-04-30 17:41:48,984] INFO {datahub.entrypoints:182} - GMS config {'models': {}, 'versions': {'linkedin/datahub': {'version': 'v0.8.33', 'commit': 'b1b1898752be8d6a7d613b4adc0c579ee3c2b97c'}}, 'managedIngestion': {'defaultCliVersion': '0.8.32.1', 'enabled': True}, 'statefulIngestionCapable': True, 'supportsImpactAnalysis': True, 'telemetry': {'enabledCli': True, 'enabledIngestion': False}, 'datasetUrnNameCasing': False, 'retention': 'true', 'noCode': 'true'}
    Can I get more details about the mysql errors?
  • a

    alert-football-80212

    05/01/2022, 8:13 AM
    Hi all, I'm trying to create an s3 ingestion recipe. From the datahub docs I see that one recipe covers one table in s3. Is there a way to write a recipe that ingests multiple tables stored under the same bucket?
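One pattern that may help here, assuming the s3 data-lake source: its path spec supports a {table} placeholder, so a single recipe can pick up every table under one bucket prefix. The bucket name, prefix, and region below are placeholders, and depending on CLI version the key may be path_spec (singular) or path_specs (a list):

```yaml
source:
  type: s3
  config:
    path_spec:
      # {table} groups the files under each folder into one dataset
      include: "s3://my-bucket/data/{table}/*.parquet"
    aws_config:
      aws_region: us-east-1
sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"
```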
  • b

    best-umbrella-24804

    05/02/2022, 1:39 AM
    When executing this I get the following error:
    1 validation error for GlueSourceConfig
    catalog_id
      extra fields not permitted (type=value_error.extra)
  • w

    wonderful-egg-79350

    05/02/2022, 5:27 AM
    Hello everyone. I have a question about lineage: how do I ingest data lineage from MS SQL (SQL Server) into DataHub?
  • m

    microscopic-mechanic-13766

    05/02/2022, 7:09 AM
    Hi, I have deployed v0.8.33 of DataHub in Docker. I have also managed to ingest information from both Hive and Trino, but haven't been able to get their lineage (although some datasets DO have lineage). Is there anything I could do to make that lineage discoverable to DataHub? Thanks in advance!
  • w

    wonderful-egg-79350

    05/02/2022, 8:14 AM
    Example: mstest.PNG
  • m

    microscopic-umbrella-94716

    05/02/2022, 9:27 AM
    Hello! We had a wild idea to try to ingest metadata from GCS using the "S3 Datalake" module. So far the ingestions haven't been successful, although the interface claims they are: the ingestion goes through, but no metadata comes in. Is there even a theoretical chance that this could work? S3 and GCS claim to be interoperable at least to some level, so we thought we'd give it a try.
  • c

    chilly-potato-57465

    05/02/2022, 10:12 AM
    Hello everyone! I am trying to understand whether data collection flows for IoT data relying on MQTT -> InfluxDB (or another timeseries DB) -> Grafana can be ingested into DataHub. I have looked into the existing source plugins on the DataHub page but can't seem to find any timeseries- or Grafana-related ones. Is there any other place I can find DataHub source plugins? How should I approach a custom implementation, for instance extracting the timeseries DB schema into a file and ingesting the file? Many thanks in advance for the advice!
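For the custom-implementation route mentioned above, one low-effort option is exactly that: serialize the extracted metadata (metadata change events) to JSON and load it with the generic file source. A sketch with placeholder paths:

```yaml
source:
  type: file
  config:
    # JSON file of metadata change events produced by your custom extractor
    filename: ./iot_timeseries_metadata.json
sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"
```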
  • s

    shy-parrot-64120

    05/02/2022, 12:11 PM
    Hi folks, I need a small consultation. I'm using a custom Python ingestion module to generate Airflow flows/jobs over our own metadata-driven config. I'm trying to add a Link but can't find which aspect to use in
    DataJobSnapshotClass.aspects
  • a

    able-painting-61389

    05/02/2022, 1:33 PM
    Hi everyone, I'm trying to ingest metadata about an OpenAPI-based API, but the API is set up to always return an object with this shape:
    {
      "data": [<Data goes here>],
      "error": "if error - message goes here"
    }
    Is there a way for me to use the ingester, or write a transformer, that lets me grab the "data" key from the result?
  • o

    orange-coat-2879

    05/02/2022, 4:14 PM
    Hi folks, I ingested data from dbt but all of the models became tables; there is no Model sub-type. How can I fix it? Thanks!
  • m

    modern-belgium-81337

    05/02/2022, 5:03 PM
    is there an issue with our docs?
  • n

    nutritious-bird-77396

    05/02/2022, 10:39 PM
    Team... Looking for some help with
    datahub-actions
    env vars:
    KAFKA_PROPERTIES_SECURITY_PROTOCOL=SASL_SSL
    KAFKA_PROPERTIES_SASL_MECHANISM=SCRAM-SHA-512
    KAFKA_BOOTSTRAP_SERVER=<bootstrap-server>
    SCHEMA_REGISTRY_URL=<schema-registry-server>
    KAFKA_PROPERTIES_SASL_JAAS_CONFIG=org.apache.kafka.common.security.scram.ScramLoginModule required username="$(MSK_USERNAME)" password="$(MSK_PASSWORD)";
    I am getting the below error when running datahub actions:
    KafkaException: KafkaError{code=_INVALID_ARG,val=-186,str="Failed to create consumer: No provider for SASL mechanism GSSAPI: recompile librdkafka with libsasl2 or openssl support. Current build options: PLAIN SASL_SCRAM OAUTHBEARER"}
    Are the env vars correct? Any other additional vars needed? It worked in other modules (ingestion/frontend)...
  • s

    steep-soccer-91284

    05/03/2022, 2:13 AM
    Hi folks, I’m wondering what is the result of ingesting Okta.
  • l

    lemon-terabyte-66903

    05/03/2022, 3:53 AM
    Hi, how do I create a custom orchestrator like Airflow? I am trying to use Databricks as the orchestrator.
  • l

    lemon-terabyte-66903

    05/03/2022, 6:43 AM
    I am trying to create lineage using the Python emitter for the datasets ingested with the s3 connector, but the resulting datajob (task) looks empty, like this.
  • a

    alert-football-80212

    05/03/2022, 12:23 PM
    Hi all, I want to ingest a specific Kafka topic and its schema, and I am not sure how to write that kind of recipe. Is there any chance someone already has a recipe like that as an example? Thank you!
  • d

    dry-zoo-35797

    05/03/2022, 4:48 PM
    Hello all, it seems like the 'acryl-datahub[mssql]' connector only supports local database authentication (userid/pass). Does this connector also support Windows Authentication? If so, what would the configuration be? I appreciate your response.
  • q

    quaint-lighter-81058

    05/03/2022, 5:01 PM
    Hi all, I have ingested a database with a couple of tables that have primary key and foreign key relationships... I don't see any lineage between the related tables from the MySQL source. Am I missing any configuration?
  • o

    orange-coat-2879

    05/03/2022, 5:53 PM
    Hi folks, I use
    datahub docker quickstart
    to start DataHub but get the error below. My colleague is using mysql port 3306 on the VM. My boss created a security group to allow us access to port 9002 via a link, without having to create an ssh tunnel; is that the reason the frontend-react container is not present? How can I start DataHub properly? Thanks!
  • m

    millions-waiter-49836

    05/03/2022, 9:05 PM
    Hi everyone, I am developing LookML ingestion locally. I ran
    pip install -e '.[dev]'
    and tried to ingest LookML via the CLI, but it returned
    ConfigurationError: lookml is disabled; try running: pip install 'acryl-datahub[lookml]'
    Any thoughts?