# getting-started
  • full-area-6720

    10/13/2021, 9:43 AM
    Hi everyone, I am Amit, and I am getting familiar with DataHub. I wanted to know if there's table lineage available for Redshift. Can somebody guide me on this? Thanks!
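    For context, a Redshift ingestion recipe looks roughly like the sketch below. The host, database, and credentials are placeholders, and the `include_table_lineage` flag depends on the DataHub version in use, so check the Redshift source docs before relying on it:

    ```yaml
    # recipe.yml - Redshift ingestion sketch (placeholder connection details)
    source:
      type: redshift
      config:
        host_port: my-cluster.example.us-east-1.redshift.amazonaws.com:5439
        database: dev
        username: datahub_reader
        password: "${REDSHIFT_PASSWORD}"
        # availability of this option varies by version
        include_table_lineage: true
    sink:
      type: datahub-rest
      config:
        server: http://localhost:8080
    ```

    Run it with `datahub ingest -c recipe.yml`.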
  • full-area-6720

    10/14/2021, 11:43 AM
    Hi, I was looking to set up a demo for our team using our own data, and I wanted to create new usernames and passwords. Is that possible without having to set up SSO? Just for demo purposes, maybe a manual way.
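    One option for this (from memory, so verify the path and format against the authentication docs) is the JAAS plaintext credentials file that datahub-frontend reads, typically `datahub-frontend/conf/user.props`, with one `username:password` pair per line:

    ```
    datahub:datahub
    demo_user:demo_password
    ```

    This is only suitable for demos; the passwords are stored in plain text.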
  • fierce-action-87313

    10/14/2021, 4:08 PM
    Hello all. Quick question: a lot of settings can be set via env vars, which is very useful of course. However, some parameters (e.g. the SQL URL) would be worth setting programmatically, or via configuration templates rendered with a more capable engine than plain env vars (e.g. Jinja2). Where can one find the config files for this (not the K8s Helm values)?
  • brief-lock-26227

    10/17/2021, 3:33 PM
    I heard about Datahub on the latest episode of the Data Engineering Podcast and decided to check it out. When I run the quickstart on my MacBook Pro with the M1 chip, it fails on `mysql`:
    ```
    % datahub docker quickstart
    Fetching docker-compose file from GitHub
    No Datahub Neo4j volume found, starting with elasticsearch as graph service.
    To use neo4j as a graph backend, run
    `datahub docker quickstart --quickstart-compose-file ./docker/quickstart/docker-compose.quickstart.yml`
    from the root of the datahub repo

    Pulling elasticsearch          ... done
    Pulling elasticsearch-setup    ... done
    Pulling mysql                  ... pulling from library/mysql
    Pulling datahub-gms            ... done
    Pulling datahub-frontend-react ... done
    Pulling mysql-setup            ... done
    Pulling zookeeper              ... done
    Pulling broker                 ... done
    Pulling schema-registry        ... done
    Pulling kafka-setup            ... done

    ERROR: for mysql  no matching manifest for linux/arm64/v8 in the manifest list entries
    ERROR: no matching manifest for linux/arm64/v8 in the manifest list entries
    [2021-10-17 09:24:33,254] ERROR    {datahub.entrypoints:99} - File "/opt/homebrew/lib/python3.9/site-packages/datahub/entrypoints.py", line 91, in main
    ...
    ```
    I found a page saying it might help to have a `platform:` specified in the Dockerfile, but the only Dockerfile I can find is transient, and I haven't found a good way to edit the one that the quickstart script executes. Any suggestions?
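    One workaround on Apple Silicon (an assumption here, not an official fix) is a Compose override file next to the quickstart compose file that forces the `mysql` service to run under amd64 emulation:

    ```yaml
    # docker-compose.override.yml - run the mysql image under emulation on arm64 hosts
    version: "2.3"
    services:
      mysql:
        platform: linux/amd64
    ```

    `platform:` is standard Compose syntax; whether the quickstart script picks up an override file depends on how it invokes docker-compose, so this may require running docker-compose manually.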
  • elegant-machine-39016

    10/17/2021, 11:50 PM
    Hi - I am trying to explore DataHub locally to see if it is a good option for us for Kafka metadata. I started the quickstart and ingested the sample data. This shows me the SampleKafkaDataset (screenshot), but I am unsure where it is being pulled from: when I list the topics in the Kafka broker on Docker with kafkacat (`kcat -L -b localhost:9092`), I don't see a topic named `SampleKafkaDataset`. Can someone explain where the metadata about the sample Kafka dataset is stored and where it comes from?
  • elegant-machine-39016

    10/18/2021, 12:03 AM
    Additionally, when I add a new topic and add sample data to it using kafkacat like this
    ```
    kcat -P -b localhost:9092 -t topic1 -K :
    mykey1:mymessage1
    mykey2:mymessage2
    ```
    I don't see this show up in datahub. Do I need to run an ingestion job after this for this to be picked up by datahub?
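    For reference, new topics only appear in DataHub after an ingestion run against the broker; a minimal Kafka recipe sketch (the connection details here are assumptions based on the quickstart defaults):

    ```yaml
    # kafka_recipe.yml - pull topic metadata from the local quickstart broker
    source:
      type: kafka
      config:
        connection:
          bootstrap: "localhost:9092"
    sink:
      type: datahub-rest
      config:
        server: "http://localhost:8080"
    ```

    Run it with `datahub ingest -c kafka_recipe.yml` after creating the topic.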
  • brave-businessperson-3969

    10/18/2021, 10:11 AM
    I've just started playing around with DataHub. It looks very powerful and might be a good fit for our data catalog needs. I have a question concerning table deletes: we have some RDBMSs (mainly Oracle and MS) where whole tables get created and deleted now and then. New tables and new or deleted columns are picked up by DataHub during ingestion, but when a table gets deleted in the database it remains visible in DataHub. I understand that "delete table" is a dedicated command which needs to be sent to the backend, but are there any best practices or suggestions for how we can detect deleted tables during scanning/ingestion and then remove them from the catalog? Maybe some kind of comparison/diff?
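    Depending on the DataHub version, the stateful-ingestion feature can compare runs and mark entities that disappeared as removed; a sketch of what the recipe addition might look like (the option names and source coverage should be checked against the current ingestion docs):

    ```yaml
    # sketch: stateful ingestion so deleted tables are soft-deleted on the next run
    pipeline_name: oracle_nightly   # required so state can be tracked across runs
    source:
      type: oracle
      config:
        # ... connection settings ...
        stateful_ingestion:
          enabled: true
          remove_stale_metadata: true
    sink:
      type: datahub-rest
      config:
        server: "http://localhost:8080"
    ```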
  • witty-keyboard-20400

    10/18/2021, 11:49 AM
    When I try the following GraphQL query:
    ```graphql
    {
      dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:cg,kv_entity,PROD)") {
        schemaMetadata {
          fields {
            fieldPath
          }
        }
      }
    }
    ```
    The response is an error message:
    ```json
    {
      "errors": [
        {
          "message": "An unknown error occurred.",
          "locations": [
            {
              "line": 5,
              "column": 5
            }
          ],
          "path": [
            "dataset",
            "schemaMetadata"
          ],
          "extensions": {
            "code": 500,
            "type": "SERVER_ERROR",
            "classification": "DataFetchingException"
          }
        }
      ],
      "data": {
        "dataset": {
          "schemaMetadata": null
        }
      }
    }
    ```
    However, my GMS is running fine, which I verified using the same dataset URN but querying ownership info instead; that worked. Could anyone help me understand what is wrong in my query? Basically, I'm looking for GraphQL queries where:
    1. given a dataset URN, the response gives the list of all field names;
    2. a field name is specified in the query, and the response returns the names of all datasets that have a field with the matching name;
    3. a glossary term is specified in the query, and the response returns the names of all fields (and datasets) to which the glossary term is attached.
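    For the second use case, the `search` query is the usual entry point; a sketch only (the exact input and result field names vary by version, so check the schema in GraphiQL):

    ```graphql
    # Sketch: find datasets matching a field name; verify field names in GraphiQL.
    {
      search(input: { type: DATASET, query: "user_id", start: 0, count: 10 }) {
        searchResults {
          entity {
            urn
          }
        }
      }
    }
    ```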
  • busy-dusk-4970

    10/18/2021, 4:18 PM
    Has anyone gotten DataHub running on a M1 mac?
  • agreeable-hamburger-38305

    10/19/2021, 3:57 AM
    Hi all! I am deploying DataHub on GKE and I want to put it behind GCP IAP. If I then also set up Google Authentication like the DataHub Doc detailed, the user would have to log in twice. So I want to skip OAuth, and simply use the JWT token added by IAP to identify individual users. I am wondering if DataHub has something that works with IAP, maybe similar to this https://github.com/GoogleCloudPlatform/jupyterhub-gcp-proxies-authenticator/blob/c[…]6b70e9005c52/gcpproxiesauthenticator/gcpproxiesauthenticator.py
  • full-area-6720

    10/19/2021, 11:42 AM
    Hi, is it possible to restrict some tables from being viewed by a certain set of users? And columns too, such as PII columns
  • blue-megabyte-68048

    10/20/2021, 10:14 PM
    Is there any way (through the API) to search for a dataset within a specific browse path?
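    The GraphQL `browse` query is one candidate for this; a sketch under the assumption that the input shape matches the UI's browse behavior (check the exact schema in GraphiQL before relying on it):

    ```graphql
    # Sketch: list datasets under a browse path; path segments here are examples.
    {
      browse(input: { type: DATASET, path: ["prod", "hive"], start: 0, count: 10 }) {
        entities {
          urn
        }
      }
    }
    ```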
  • cuddly-house-13470

    10/21/2021, 9:10 PM
    Hi everyone. Great tool and support. I have one question: do DataHub's source extractors support deleting previously ingested metadata for deleted objects? I just ran a Snowflake extraction recipe and noticed that no-longer-existing objects still appear in the catalog. Thanks!
  • silly-translator-73123

    10/22/2021, 8:46 AM
    Can I use DataHub to manage a column data dictionary, especially for enum data types (e.g. male/female)?
  • blue-animal-80464

    10/22/2021, 1:07 PM
    Hi, we are introducing DataHub at the moment, and I saw in the latest Medium post (improvements-to-user-group-management) that groups and users can be synced from Azure AD. Is it also possible to do this syncing from LDAP, so that we can create users, groups, and group memberships? Logging in with LDAP is already possible (via the JAAS config), but the logged-in users are then not part of any group.
  • nice-country-99675

    10/22/2021, 2:40 PM
    Hi all! Just a quick question... I'm running `datahub` locally (using Docker images) and I ran an ingestion to load some metadata from Postgres. Everything seems to have executed properly: no failures or warnings in the sink, and about 116 records written. But I cannot see any datasets in the frontend. Do I need to run something else beyond `datahub ingest run`? I'm using the default `datahub` user, so I don't know how this user is related to the ingested data...
  • kind-dawn-17532

    10/22/2021, 7:35 PM
    Hi all! On a non-Neo4j deployment, I saw that DataHub stores lineage in MySQL tables as the upstreamLineage aspect. Are there other places where lineage is also stored? I deleted all upstreamLineage rows from the backend database, but I still see lineage in the DataHub UI...
  • elegant-machine-39016

    10/23/2021, 10:16 PM
    Hi - is it possible to use DataHub to visualize the downstream consumers of a Kafka topic? For example, in the demo https://demo.datahubproject.io/dataset/urn:li:dataset:(urn:li:dataPlatform:kafka,SampleKafkaDataset,PROD)/Lineage?is_lineage_mode=true, how does DataHub know that Baz Chart 1 and SampleHDFSDatabase are built using SampleKafkaDataset?
  • cuddly-family-62352

    10/25/2021, 2:28 AM
    I started with the DataHub Docker quickstart, and then I quit. How do I restart it?
  • silly-translator-73123

    10/25/2021, 3:15 AM
    Can I use the data preview function now?
  • freezing-teacher-87574

    10/25/2021, 8:00 AM
    Hello. How can I connect a recipe through SSL to Superset? And how do I set up the frontend to navigate to Superset? Thanks!
  • future-hamburger-62563

    10/25/2021, 2:04 PM
    I've been trying to build DataHub, but the Gradle job is failing at the `docs-website:yarnLint` task. Does the error look familiar to anyone?
  • abundant-flag-19546

    10/26/2021, 10:18 AM
    Hi all, I'm trying to access the GraphQL API with datahub-frontend. I want to use the GraphQL API from the CLI, but I can't find how to get the auth cookie without using a browser. I tried adding an additional OIDC callback URI to Okta, `localhost:<PORT>/callback/oidc`, and made a Flask server that can get <STATE> and <CODE> (reference: https://developer.okta.com/blog/2018/07/16/oauth-2-command-line). But when I make a GET request to `http://<DATAHUB_URL>/callback/oidc?code=<CODE>&state=<STATE>`, it produces a redirect-uri mismatch error (`Bad token response, error=invalid_grant`). Is there a good way to get the auth cookie without using a browser? (I'm using Okta OIDC.)
  • bland-wolf-37286

    10/27/2021, 10:01 AM
    Hi all, I'm at the stage of doing a proof of concept to learn how DataHub can satisfy our data catalogue needs. So far I've spun up the quickstart deployment and have ingested metadata from Hive tables. That was all quite straightforward thanks to good documentation and examples. I'm now looking at taking this further to explore how we can associate extended information about schema fields over and above `name`, `type` and `description`. For example, we might want to add information about the `format` (perhaps 'UUID', 'ISO8601 date' or some other free text), `source` (where does data in the field originate from) and perhaps other attributes we might define. This extended information will need to be editable from within the UI as well as via the API. I've been looking at doing this by extending the metadata model, adding attributes to `SchemaField.pdl` and `EditableSchemaFieldInfo.pdl` then chasing the changes through, but it looks like I need to make changes in quite a lot of other places (so far I have edits in 10 different `pdl`, `graphql`, `json` and `java` files). I thought it best to pause at this point and ask the community on here whether this is the right way to go about this or if there's a better way that I have overlooked?
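    For illustration, the kind of model change being described might look like this abridged sketch of `EditableSchemaFieldInfo.pdl`; the `format` and `source` fields are the hypothetical additions, and the real record contains more fields and annotations:

    ```
    namespace com.linkedin.schema

    record EditableSchemaFieldInfo {
      fieldPath: string
      description: optional string
      // hypothetical custom attributes added for the PoC
      format: optional string
      source: optional string
    }
    ```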
  • agreeable-hamburger-38305

    10/28/2021, 12:48 AM
    Hi all, is there a way to set things up so that the metadata would be stored in BigQuery?
  • acceptable-honey-21072

    10/28/2021, 4:35 AM
    Hi, can anyone help me use DataHub with ClickHouse in a Docker environment? Here is something I found: https://githubmemory.com/repo/linkedin/datahub/issues/2381
  • nutritious-agent-76783

    10/28/2021, 12:21 PM
    Hi to all, I have a question regarding dashboards and charts. If I add, for example, multiple instances of a Redash source, I noticed that the URN for dashboards is created like `urn:li:dashboard:(<tool>,<id>)`, and for charts it is similar. Does this mean that if I add multiple Redash instances, some dashboards and charts will be overwritten? I suppose that if I don't want that to happen, I need to change the existing entity as well, to make sure the other parts work with the modified entity. For some additional distinction, I suppose that maybe transformers would work. Do you plan to change this behavior in a future release?
  • damp-minister-31834

    10/29/2021, 5:39 AM
    Hi all, I'm using DataHub by following the "Quickstart" on the official website, and the service is running in local mode. I wonder if there is a way to deploy DataHub in a distributed mode, and how?
  • fierce-action-87313

    11/01/2021, 2:54 PM
    Hello. Testing the startup again with the quickstart script, but now I'm getting errors in the broker: it receives requests to create new topics with replication factor = 3, while there is only 1 broker, so the replication factor must be 1 😕
  • damp-minister-31834

    11/04/2021, 2:07 AM
    Hello, I am new to DataHub. I want to ask about the GraphQL API and the Rest.li API. Both APIs are for interacting with the entities and relationships comprising your metadata graph. I want to know exactly how the two APIs differ functionally.