# getting-started
  • mammoth-fountain-32989

    04/22/2022, 3:31 PM
    Hi, is there a way to see from the UI when a dataset's metadata was last refreshed in DataHub? Thanks
  • modern-monitor-81461

    04/22/2022, 4:32 PM
    How to best leverage DBT with DataHub: data engineers in my org are starting to use DBT. It seems to be working fine for them, so now I'm trying to get metadata from DBT into DataHub. I have tested the DBT source twice (once with disable_dbt_node_creation set to true and once with false), and we feel that having the DBT nodes is better, since they tell the user how the resulting dataset was created. But I can see the lineage becoming messy once DBT is used all over your pipelines, and I recall that DataHub's roadmap had a planned feature for hiding elements from the lineage, which could be used to show or hide DBT intermediate nodes. I tried to use the meta_mapping feature of the DBT source to create a glossary term from a meta value (our DBT models have a meta field describing the data tier: Bronze/Silver/Gold), but I want that term applied to the "platform dataset" and not only to the DBT node. I don't want to go into DataHub and manually add those terms to the resulting dataset; I want everything driven by DBT and the source model. What I would like to know is: how are you using DBT with DataHub? Is what I'm asking for impossible or silly? Looking for advice on how to best leverage DBT with DataHub. Thanks!
  • fierce-city-89572

    04/22/2022, 5:32 PM
    Question about policy: it looks like policies only apply to the DataHub platform and the metadata inside DataHub, right? Can I define a policy that enforces access control (authorization) on a database? E.g., can I define a policy such as "user A can read/write table B in Postgres" and have it enforced (implemented) on the Postgres side?
  • some-minister-22606

    04/23/2022, 3:23 PM
    Hi! I’m looking to start using DataHub for our corporate Oracle and MSSQL databases. Previous database engineers did not set up all possible foreign keys. Does DataHub provide a way to add these relationships in the metadata, or do they need to be added at the source? Adding them at the source is ideal in most cases, but I know some prefer not to add foreign keys in the DB since it can make testing and loading data more difficult. Thanks, and I’m excited to get started!
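    For what it's worth, the SchemaMetadata aspect does carry foreign-key constraints, so a sketch like the one below could patch the relationship in after ingestion. URNs and field names here are made up, and method names vary across acryl-datahub versions (older releases call get_aspect something like get_aspect_v2), so treat this as a starting point only.

    from datahub.emitter.mce_builder import make_dataset_urn, make_schema_field_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig
    from datahub.metadata.schema_classes import (
        ForeignKeyConstraintClass,
        SchemaMetadataClass,
    )

    graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))

    orders = make_dataset_urn("oracle", "mydb.orders", "PROD")        # hypothetical
    customers = make_dataset_urn("oracle", "mydb.customers", "PROD")  # hypothetical

    # Fetch the ingested schema, append the missing relationship, write it back.
    schema = graph.get_aspect(orders, SchemaMetadataClass)
    assert schema is not None, "dataset must already have a schema ingested"
    schema.foreignKeys = (schema.foreignKeys or []) + [
        ForeignKeyConstraintClass(
            name="fk_orders_customer",
            sourceFields=[make_schema_field_urn(orders, "customer_id")],
            foreignFields=[make_schema_field_urn(customers, "id")],
            foreignDataset=customers,
        )
    ]
    graph.emit(MetadataChangeProposalWrapper(entityUrn=orders, aspect=schema))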
  • blue-hair-87908

    04/24/2022, 12:55 AM
    Hi. I'm 14 years old and I want to use DataHub for my Snowflake DB. I'm looking for some assistance, as I don't know much about this.
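    A minimal sketch of what a Snowflake ingestion run looks like from Python, in case it helps get started. All connection values are placeholders, and config field names (e.g. account_id) have changed between acryl-datahub versions, so check the Snowflake source docs for your release.

    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(
        {
            "source": {
                "type": "snowflake",
                "config": {
                    "account_id": "my_account",      # placeholder values
                    "warehouse": "COMPUTE_WH",
                    "username": "datahub_reader",
                    "password": "********",
                    "role": "datahub_role",
                },
            },
            "sink": {
                "type": "datahub-rest",
                "config": {"server": "http://localhost:8080"},
            },
        }
    )
    pipeline.run()
    pipeline.raise_from_status()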
  • alert-football-80212

    04/24/2022, 9:51 AM
    Hi all, I am trying to use the Rest.li API. Has anyone succeeded in using it and can recommend how to get started?
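    A small sketch of one of the documented Rest.li calls: fetching an entity from GMS via the /entities endpoint. The URN below is from the DataHub sample data and must be URL-encoded.

    import urllib.parse

    import requests

    urn = "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)"
    resp = requests.get(
        "http://localhost:8080/entities/" + urllib.parse.quote(urn, safe="")
    )
    resp.raise_for_status()
    print(resp.json())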
  • straight-telephone-84434

    04/25/2022, 2:16 PM
    Good evening, I am trying to run DataHub in minikube. Everything is working fine except for the datahub-datahub-upgrade-job. What happens if I disable it? Is it very important?
  • modern-belgium-81337

    04/25/2022, 11:37 PM
    I’m trying to set up a local version of DataHub and am running into this error:
    ./gradlew build
    To honour the JVM settings for this build a single-use Daemon process will be forked. See https://docs.gradle.org/6.9.2/userguide/gradle_daemon.html#sec:disabling_the_daemon.
    Daemon will be stopped at the end of the build
    
    FAILURE: Build failed with an exception.
    
    * Where:
    Settings file '/Users/thnguyen/dev/rivian/datahub/datahub/settings.gradle'
    
    * What went wrong:
    Could not compile settings file '/Users/thnguyen/dev/rivian/datahub/datahub/settings.gradle'.
    > startup failed:
      General error during semantic analysis: Unsupported class file major version 61
    
      java.lang.IllegalArgumentException: Unsupported class file major version 61
    I followed every step in the documentation here: https://datahubproject.io/docs/developers/ Is there anything I’m missing?
  • cool-architect-34612

    04/26/2022, 2:36 AM
    Hi, I got these errors while running 'docker pull acryldata/datahub-upgrade:head && docker run --env-file docker_env/datahub-upgrade.env acryldata/datahub-upgrade:head -u RestoreIndices'. How can I solve this?
    Starting upgrade with id RestoreIndices...
    Cleanup has not been requested.
    Skipping Step 1/3: ClearSearchServiceStep...
    Cleanup has not been requested.
    Skipping Step 2/3: ClearGraphServiceStep...
    Executing Step 3/3: SendMAEStep...
    Sending MAE from local DB...
    Found 38084 latest aspects in aspects table
    Reading rows 0 through 1000 from the aspects table.
    2022-04-26 02:34:24.372  WARN 1 --- [ad | producer-1] org.apache.kafka.clients.NetworkClient   : [Producer clientId=producer-1] Connection to node -1 (localhost/127.0.0.1:9092) could not be established. Broker may not be available.
    2022-04-26 02:34:24.372  WARN 1 --- [ad | producer-1] org.apache.kafka.clients.NetworkClient   : [Producer clientId=producer-1] Bootstrap broker localhost:9092 (id: -1 rack: null) disconnected
    2022-04-26 02:34:25.577  WARN 1 --- [ad | producer-1] org.apache.kafka.clients.NetworkClient   : [Producer clientId=producer-1] Connection to node -1 (localhost/127.0.0.1:9092) could not be established. Broker may not be available.
    2022-04-26 02:34:25.577  WARN 1 --- [ad | producer-1] org.apache.kafka.clients.NetworkClient   : [Producer clientId=producer-1] Bootstrap broker localhost:9092 (id: -1 rack: null) disconnected
    2022-04-26 02:34:26.481  WARN 1 --- [ad | producer-1] org.apache.kafka.clients.NetworkClient   : [Producer clientId=producer-1] Connection to node -1 (localhost/127.0.0.1:9092) could not be established. Broker may not be available.
    2022-04-26 02:34:26.481  WARN 1 --- [ad | producer-1] org.apache.kafka.clients.NetworkClient   : [Producer clientId=producer-1] Bootstrap broker localhost:9092 (id: -1 rack: null) disconnected
    2022-04-26 02:34:27.585  WARN 1 --- [ad | producer-1] org.apache.kafka.clients.NetworkClient   : [Producer clientId=producer-1] Connection to node -1 (localhost/127.0.0.1:9092) could not be established. Broker may not be available.
    2022-04-26 02:34:27.586  WARN 1 --- [ad | producer-1] org.apache.kafka.clients.NetworkClient   : [Producer clientId=producer-1] Bootstrap broker localhost:9092 (id: -1 rack: null) disconnected
  • mammoth-fountain-32989

    04/26/2022, 9:32 AM
    Hi, how do I ingest pipeline information into DataHub? From the sample data, I see some Airflow DAG data being ingested. In general, how do I create an entity of a specific type with the quickstart Docker setup? Thanks
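    One way to create entities directly, sketched under the assumption that the quickstart GMS is on localhost:8080: pipelines are DataFlow entities and their tasks are DataJobs, and both can be emitted from Python. The orchestrator/DAG/task names below are placeholders, and the exact MetadataChangeProposalWrapper constructor requirements vary by SDK version.

    from datahub.emitter.mce_builder import make_data_flow_urn, make_data_job_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import DataFlowInfoClass, DataJobInfoClass

    emitter = DatahubRestEmitter("http://localhost:8080")

    # Create the pipeline (DataFlow) entity.
    flow_urn = make_data_flow_urn("airflow", "my_dag", "prod")
    emitter.emit(
        MetadataChangeProposalWrapper(
            entityUrn=flow_urn,
            aspect=DataFlowInfoClass(name="my_dag", description="Example pipeline"),
        )
    )

    # Create a task (DataJob) entity inside that pipeline.
    job_urn = make_data_job_urn("airflow", "my_dag", "my_task", "prod")
    emitter.emit(
        MetadataChangeProposalWrapper(
            entityUrn=job_urn,
            aspect=DataJobInfoClass(name="my_task", type="COMMAND"),
        )
    )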
  • lemon-terabyte-66903

    04/26/2022, 3:06 PM
    Does DataHub handle schema changes? Let’s say a column type changes or a new column gets added in an S3 dataset. How do I capture those?
  • powerful-easter-94567

    04/27/2022, 8:02 AM
    Hi all, can the DataHub GMS service be debugged locally in IDEA rather than in Docker mode? How can I do it? Thanks!
  • bitter-dusk-52400

    04/28/2022, 7:47 AM
    Where can I find the Python acryl-datahub code on GitHub, and is there documentation for that repo? Please share some links.
  • astonishing-guitar-79208

    04/28/2022, 9:17 AM
    Hi all. Can someone point me to DataHub Python and Java SDKs that have querying capability? AFAIK the Java and Python SDKs only support emitting change events.
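    On the Python side, the acryl-datahub package does include a read path via DataHubGraph, which can fetch aspects back out of GMS. Method names differ across versions (older releases use get_aspect_v2), so this is a sketch, not a definitive API reference.

    from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig
    from datahub.metadata.schema_classes import DatasetPropertiesClass

    graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))

    # Read one aspect of a dataset (URN from the sample data).
    props = graph.get_aspect(
        "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",
        DatasetPropertiesClass,
    )
    print(props)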
  • rhythmic-beard-86717

    04/28/2022, 12:44 PM
    I am trying to build datahub-frontend on my local system, but it's failing during configuration. Can you tell me how to build it locally?
  • curved-football-28924

    04/28/2022, 4:48 PM
    When I try datahub docker quickstart, I get this error. Kindly guide me on how to resolve it.
  • proud-chef-44773

    04/29/2022, 1:04 PM
    I’ve got a good picture of how DataHub can skim off metadata as individual groups in a company model and process their data. That lets the company know what data exists and where it is. I also see how individual tables can be queried using DAOs. What I’m trying to understand is how this metadata provides for querying data across the company. If a group’s data is on S3, it can be queried directly. But if it is in a database like MySQL, that requires something like a federated query. What do companies do to support an enterprise data warehouse built from company-wide data? If groups are required to push their data, and not just instrument what they do and have for metadata sharing, would publishing consist of streaming it to S3? And would that make it available for the data warehouse to consume and then publish using reverse ETL?
  • many-guitar-67205

    05/02/2022, 8:18 AM
    Two questions on policies. The documentation says: "Policies can be managed under the /policies page, or accessed inside the Control Center, a slide-out menu appearing on the left side of the DataHub UI." 1. What is the Control Center and this slide-out menu? Or is that documentation out of date? 2. Can you use a regex in the condition? The JSON has "condition": "EQUALS", hinting at other options, but the code only seems to support EQUALS.
  • bitter-dusk-52400

    05/02/2022, 8:55 AM
    Can anyone share documentation for creating domains programmatically, or for ingesting new domains into DataHub programmatically?
  • powerful-librarian-82760

    05/02/2022, 3:42 PM
    Trying to develop a metadata ingestion source for internal use first. I am following the document https://datahubproject.io/docs/metadata-ingestion/adding-source/ but I cannot find how to register the new class with the datahub utility (following Step 3). #getting-started
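    For reference, DataHub discovers ingestion sources through the "datahub.ingestion.source.plugins" setuptools entry-point group, so a custom source packaged like the sketch below becomes visible to the datahub CLI once installed. The package and class names here are hypothetical.

    # setup.py for the plugin package
    from setuptools import find_packages, setup

    setup(
        name="my-datahub-source",
        version="0.1.0",
        packages=find_packages(),
        install_requires=["acryl-datahub"],
        entry_points={
            "datahub.ingestion.source.plugins": [
                # recipe `source.type: my-source` will resolve to this class
                "my-source = my_plugin.source:MySource",
            ],
        },
    )

    After pip-installing the package, a recipe can reference the source as type: my-source; recent versions also accept a fully qualified class path (e.g. my_plugin.source.MySource) as the type, which skips the entry-point registration during development.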
  • orange-tailor-45265

    05/03/2022, 2:32 AM
    Hello, I want to create a Python class named ThriftSchema, but I don't know how to compile the ThriftSchema.pdl file into a Python class. The file is this:
    namespace com.linkedin.schema

    /**
     * Schema holder for Thrift schema types.
     */
    record ThriftSchema {
      /**
       * The native schema in the dataset's platform.
       */
      rawSchema: string
      fields: array[ThriftField]
      annotations: optional array[Annotation]
      namespace_: optional map[string, string]
    }
    I expected that I could generate the Python code from this by executing
    ./gradle :metadata-ingestion:codegen
    and I expected the generated Python code to appear in this path:
    datahub/metadata-ingestion/src/datahub/metadata/com/linkedin/pegasus2avro/schema/__init__.py
    as well as
    /home/waylee/datahub/metadata-ingestion/src/datahub/metadata/schema_classes.py
    because OtherSchema and many other Python classes generated from PDL files appear there. My ThriftSchema.pdl is located at
    datahub/metadata-models/src/main/pegasus/com/linkedin/schema/ThriftSchema.pdl
    . Is there anything wrong in my process? My context is creating a schema to ingest Thrift files.
  • bitter-dusk-52400

    05/03/2022, 7:11 AM
    Hi, I need a suggestion for creating domains programmatically using Python. Below is the GraphQL query for creating domains in DataHub:
    '{"query" : "mutation createDomain { createDomain(input: { name:\"det_test\",description: \"from graphql api for test\"})}"}'
    If I execute the above query via curl, I can create domains, but when I tried using Python packages like gql I got an error: “__init__() got an unexpected keyword argument ‘allowed_methods’”. Please refer to the Python code below:
    from gql import Client, gql
    from gql.transport.requests import RequestsHTTPTransport
    
    transport = RequestsHTTPTransport(
    url='http://localhost:8080/api/graphql',
        headers= {
            'X-DataHub-Actor': 'urn:li:corpuser:datahub',
            'Content-Type': 'application/json'
        },
        verify=True,
        retries=3,
    )
    
    client = Client(transport=transport, fetch_schema_from_transport=True)
    
    query = gql(
        '''
         mutation createDomain 
            { createDomain(input: 
                { name:"det_test1",description: "from graphql api for test"})
                }
            '''
    )
    
    result = client.execute(query)
    print(result)
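    That allowed_methods error usually points at an older urllib3: the Retry option was named method_whitelist before urllib3 1.26, and gql passes allowed_methods, so upgrading urllib3 (or requests) often unblocks it. As a lighter-weight alternative, the same mutation can be issued with plain requests, sketched below.

    import requests

    # Same createDomain mutation, without the gql transport (and its Retry
    # incompatibility with old urllib3 versions).
    resp = requests.post(
        "http://localhost:8080/api/graphql",
        headers={"X-DataHub-Actor": "urn:li:corpuser:datahub"},
        json={
            "query": 'mutation createDomain { createDomain(input: '
                     '{ name: "det_test1", description: "from graphql api for test" }) }'
        },
    )
    resp.raise_for_status()
    print(resp.json())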
  • bright-furniture-98709

    05/03/2022, 2:03 PM
    Hi, a couple of questions from a newbie: • What is the algorithm behind search results? That is, by what rules are search results sorted, and can this behavior be configured/customized? • We are working with Snowflake and Tableau ingestion, but it seems like data coming from Tableau and data coming from Snowflake are not merged into a single lineage mapping. Namely, the same table has two lineages, one for Snowflake and another for Tableau (see attached pictures). I would expect them to be consolidated into a single lineage. Is this a known issue?
  • bored-dress-52175

    05/03/2022, 2:11 PM
    I am trying to run actions on file ingestion, but it is showing this kind of error. Can anybody help me?
  • bored-dress-52175

    05/03/2022, 8:03 PM
    I am just trying to ingest this file into DataHub and it is showing this type of error. Please help me!
  • wonderful-egg-79350

    05/04/2022, 3:10 AM
    How do I change the folder? For example, I want to insert an upper folder as in the second path below: from ‘Datasets > prod > s3 > project > root > events > logging_events_bckp’ to ‘Datasets > prod > s3 > test > project > root > events > logging_events_bckp’.
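    The folder hierarchy shown there is the dataset's browse path, which can be overridden by emitting a BrowsePaths aspect. A sketch, with the URN and target path guessed from the question:

    from datahub.emitter.mce_builder import make_dataset_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import BrowsePathsClass

    emitter = DatahubRestEmitter("http://localhost:8080")

    # Hypothetical URN for the dataset in question.
    urn = make_dataset_urn("s3", "project/root/events/logging_events_bckp", "PROD")
    emitter.emit(
        MetadataChangeProposalWrapper(
            entityUrn=urn,
            # Places the dataset under prod > s3 > test > project > root > events.
            aspect=BrowsePathsClass(paths=["/prod/s3/test/project/root/events"]),
        )
    )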
  • curved-crayon-1929

    05/04/2022, 7:50 AM
    Hi all, I have a few questions on DataHub (asked by management in my demo). Can someone help me with the following? 1. Where is this metadata stored? How is security handled? 2. Is automatic tagging enabled or on the roadmap? 3. Is customization of analytics possible? Can we create a dashboard from the existing metadata? Thanks, Nagendra.
  • fresh-napkin-5247

    05/04/2022, 12:22 PM
    Hello, how can I install the current DataHub version from GitHub? I think there is a PR that changed something in the Glue connector that might solve an error I was having, but I am not sure how to install the current development version from GitHub. Any help? Also, when will the new version be released? Thanks!
  • shy-kitchen-7972

    05/04/2022, 1:03 PM
    #getting-started Hi everyone, does anyone have a good example dataset in the demo environment that has Great Expectations validation results linked to it?
  • able-optician-93924

    05/04/2022, 4:10 PM
    Hey folks, playing around with some GraphQL. I'm trying to return a list of fields that have a certain glossary term attached. So far, I have it returning only at the table level, but the fields are returning nulls, even though the fields have both tags and terms.
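    One likely cause, offered as a guess: field-level tags and terms added through the UI live on the editableSchemaMetadata aspect rather than schemaMetadata, so a query against schemaMetadata fields alone comes back null. A sketch of querying the editable aspect instead; the GraphQL field names are from memory, so verify them against GraphiQL on your instance.

    import requests

    query = """
    query {
      dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)") {
        editableSchemaMetadata {
          editableSchemaFieldInfo {
            fieldPath
            glossaryTerms { terms { term { urn } } }
          }
        }
      }
    }
    """
    resp = requests.post("http://localhost:8080/api/graphql", json={"query": query})
    print(resp.json())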