# getting-started
  • little-spring-72943

    06/28/2022, 9:05 AM
What are the best practices for putting DataHub into production? 1) Have 3 (or more) installations of DataHub (Dev, Test, and Prod) and ingest each environment's metadata separately. 2) Have one installation of DataHub, ingest all environments' metadata into it, and restrict which environments users can view in DataHub (e.g. only developers can see the DEV environment's metadata). Is #2 even possible?
  • bland-orange-13353

    06/29/2022, 7:27 AM
    This message was deleted.
  • happy-helmet-66366

    06/29/2022, 2:45 PM
What do people use for managing reference data within their systems? Even better if it integrates with DataHub.
  • hallowed-machine-2603

    06/30/2022, 7:38 AM
Hi guys, I have one question about the DataHub interface. What is the Queries tab on the table page, and how can I activate it? And what is the Validation tab on the table page, and how can I activate that? Thx 🙂
  • gray-architect-29447

    06/30/2022, 9:45 AM
Hi all, I'm trying to get the list of login events from the Elasticsearch database and observed that new events are not being pushed into Elasticsearch. Do you know when new events get put into the Elasticsearch DB? Maybe the next day, or a few hours later?
  • tall-magician-303

    06/30/2022, 2:15 PM
Hi everyone! I'm attempting to ingest some data from the quickstart guide. I have a simple CSV file and I am using the data lake files source. I am trying to use the profiling feature to infer the schema and profile the data. It needs PySpark and PyDeequ. Where I am running into trouble is getting all the dependencies to versions that work together. I am on a MacBook Pro 2020 (M1 chip). I can install everything via Homebrew and PySpark works well. However, it only installs apache-spark version 3.3.0, which doesn't currently work with PyDeequ. I have tried to install an older version of Spark manually, but I'm having lots of trouble getting everything to work together. Has anyone done this already and would be able to share the detailed steps? Thank you!
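One combination that has been reported to work is pinning PySpark to a release line PyDeequ supports (3.3 was not yet supported at the time) and telling PyDeequ which Spark it is talking to via its SPARK_VERSION environment variable. A minimal sketch; the exact version numbers are assumptions, so check the PyDeequ compatibility table:

```python
import os

# PyDeequ reads the SPARK_VERSION environment variable to select the matching
# Deequ jar; it must agree with the installed PySpark version.
os.environ["SPARK_VERSION"] = "3.2"

# Then pin matching versions via pip instead of the Homebrew apache-spark formula, e.g.:
#   pip install "pyspark==3.2.1" pydeequ
```

Setting the env var before importing pydeequ avoids the jar-mismatch errors that show up when Homebrew's newer Spark is on the path.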
  • late-zoo-31017

    06/30/2022, 3:52 PM
Can I have my own custom entities if I use DataHub-as-a-service (after "ingesting" them)? Or do I have to use only the "pre-packaged" ones in that case?
  • square-hair-99480

    06/30/2022, 4:40 PM
Hi friends, I have just started exploring DataHub, so sorry if the question is very basic or repetitive. I was able to get DataHub up and running, but I am not really sure where to start reading about connecting it. I would like to connect it especially to Snowflake, but Airflow will also be needed. Any tips on where to start in the docs, or a straightforward tutorial?
  • cuddly-butcher-39945

    06/30/2022, 5:32 PM
    I just reset my local docker environment and trying the connection again. I’ll let you know.
  • adorable-addition-82413

    06/30/2022, 5:34 PM
Hello people! In a Kubernetes deployment the default Kafka has only 1 pod; is this the best practice? Or is it better to have 1 Kafka pod on every host?
  • few-rainbow-57094

    06/30/2022, 7:38 PM
Good day, I'm trying to perform my first ingestion of dbt data via DataHub, but I keep getting this error:
    [2022-06-30 19:34:16,095] WARNING  {urllib3.connectionpool:810} - Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f09ce5c2100>: Failed to establish a new connection: [Errno 111] Connection refused')': //config
Has this ever happened to anyone? Any fixes?
  • purple-analyst-83660

    06/30/2022, 8:40 PM
Hello everybody, I wanted to use datahub.cli.delete_cli from the acryl-datahub library to delete some of the records for URNs. Can anybody suggest some sample code or a method for authenticating it against my DataHub setup? How do we normally authenticate datahub.cli library methods? Thanks!
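For reference, a sketch of the usual authentication setup: the acryl-datahub CLI (including its delete commands) resolves the server location and token from environment variables, or from `~/.datahubenv` written by `datahub init`. The URL and token below are placeholders; the token itself is generated in the UI under Settings → Access Tokens:

```python
import os

# The acryl-datahub CLI picks up its connection settings from these variables
# (or from ~/.datahubenv, created interactively by `datahub init`):
os.environ["DATAHUB_GMS_URL"] = "http://localhost:8080"          # placeholder URL
os.environ["DATAHUB_GMS_TOKEN"] = "<personal-access-token>"      # placeholder token

# With those set, deletes can be issued via the CLI, e.g. (URN is hypothetical):
#   datahub delete --urn "urn:li:dataset:(urn:li:dataPlatform:hive,db.table,PROD)" --soft
```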
  • adventurous-apple-98365

    07/01/2022, 12:53 PM
    Does anyone know of a demo dataset on demo.datahubproject.io with queries metadata?
  • handsome-alarm-6227

    07/04/2022, 10:00 AM
Hi! On the dbt source page I see that it's possible to set up AWS credentials, but I don't see how to provide credentials for GCP, for instance. How are we supposed to give it access to our dbt manifest files (e.g. if they're stored on Google Cloud Storage or on a VM instance volume)?
  • millions-sundown-65420

    07/04/2022, 3:22 PM
Hi team. Is there any page that shows sample GraphQL queries I could use to get metadata? Thanks.
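For reference, a minimal example of the kind of query DataHub's `/api/graphql` endpoint accepts. The URN and field selection here are illustrative only; the GraphiQL explorer on a running instance shows the full schema:

```python
import json

# Illustrative GraphQL query for a dataset's schema metadata; the URN is made up.
query = """
query getDataset {
  dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:snowflake,db.schema.table,PROD)") {
    name
    platform { name }
    schemaMetadata { fields { fieldPath description } }
  }
}
"""
# POST this as JSON to http://<gms-host>:8080/api/graphql
# with an "Authorization: Bearer <token>" header.
payload = json.dumps({"query": query})
```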
  • square-hair-99480

    07/04/2022, 3:26 PM
Hi friends from DataHub, I just ingested all my Snowflake data into DataHub. Very nice; however, for some reason I cannot access any lineage information. I have the feeling the Snowflake user/role I used to ingest the metadata did not have enough privileges. Is there a minimum set of privileges for a Snowflake user/role ingesting data so that lineage data can be brought in?
  • abundant-breakfast-43931

    07/05/2022, 4:07 PM
    Hey! I am a beginner trying to understand how DataHub works. Can anyone give me a rough idea of how much it costs to operate a data catalog solution based on DataHub (sorry if there is any mistake in the wording of my question)?
  • ripe-breakfast-83546

    07/06/2022, 8:19 AM
Hi everyone. Can you help me get started with DataHub? After installation I cannot log in to the UI with the default credentials (datahub:datahub). I get the error:
Failed to log in! SyntaxError: JSON.parse: unexpected character at line 2 column 1 of the JSON data.
I tried to define
user.props
(with the line datahub:datahub) and restarted Docker, but I still have the same problem.
  • colossal-sandwich-50049

    07/06/2022, 1:16 PM
Hi everyone, beginner question: can DataHub function as a schema registry in a production environment? I.e., does it have feature parity with something like Confluent Schema Registry?
  • brief-church-81973

    07/06/2022, 4:45 PM
hello folks, I'm using the DataHub APIs to feed DataHub from our various engines. While doing this, I create a DataJobInfo instance and add some custom properties to it. I was wondering if there is a way to show these properties in a formatted way in the DataHub UI (HTML format, for example). Currently all properties are rendered as plain text, which looks ugly for our use case. The piece of code is like:
    new DataJobInfo().setName(jobName).
    ...
    ...
    setCustomProperties(customProps); // customProps is a StringMap here
  • rhythmic-stone-77840

    07/06/2022, 6:05 PM
Hey all - we've populated DataHub with table lineage, and I want to be able to make a call to DataHub to get the downstream lineage of a specific table. I can't seem to find anything that would do that through the OpenAPI or GraphQL API setup. Is there something I'm missing here?
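One version-dependent option is the entity-level lineage field in the GraphQL API. Treat this as a sketch only: the exact field names vary across DataHub releases, and the URN below is a placeholder:

```python
import json

# Sketch of a downstream-lineage query; verify the field names against the
# GraphiQL explorer on your DataHub version before using.
query = """
query downstream {
  dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:hive,db.table,PROD)") {
    lineage(input: { direction: DOWNSTREAM, start: 0, count: 100 }) {
      total
      relationships { entity { urn type } }
    }
  }
}
"""
payload = json.dumps({"query": query})
```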
  • white-notebook-12278

    07/07/2022, 12:37 PM
Hi everyone! I'm trying to use the GraphQL API for adding users to a group. The body of my mutation request:
    mutation addGroupMembers {
      addGroupMembers(
        input: {
          groupUrn: "urn:li:corpgroup:Admins",
          userUrns: ["urn:li:corpuser:username"]
        }
      )
    }
The response is the following:
    {
      "data": {
        "addGroupMembers": true
      },
      "extensions": {}
    }
But in fact the group in DataHub is not changed. The DataHub version is 0.8.40. Also, there are no error messages in either the frontend or GMS services. Can you please tell me what might be wrong in this case?
  • few-rainbow-57094

    07/07/2022, 9:01 PM
Good day everyone, I'm looking into implementing DataHub in our architecture, but was wondering about modularity. Even though I'm fairly set on the product, there is still the possibility that we may want to migrate in the future. Is it possible to export descriptions/owners/etc. to another service? Where is this data stored? Would it be possible to re-ingest dbt descriptions into our codebase?
  • helpful-librarian-40144

    07/08/2022, 2:25 AM
How do we back up the metadata in DataHub, so that we don't lose it if we delete it by mistake?
  • curved-musician-2321

    07/08/2022, 8:07 AM
Hi Team. I am investigating incremental updates for aspects. Where can I find the corresponding technical documents? I only found a brief description on the official website (https://datahubproject.io/docs/what/delta/).
  • square-hair-99480

    07/08/2022, 12:56 PM
Hello friends, in my company we did an initial POC with DataHub and we are very excited. Now we are moving to a more ambitious test, so we want to set it up on an EC2 instance in AWS and start using it more broadly across the company. Is there any specification for an ideal minimum setup for production use? What is the ideal size for an EC2 instance? Is the
datahub docker quickstart
setup enough, or should I consider other ways?
  • busy-airport-23391

    07/08/2022, 7:49 PM
    Hi all, I wasn't sure which channel to post this on, but I was wondering when datahub-client v0.8.40 will be available from the Maven Repository? Super excited to try the new kafka capabilities for the java emitter 🙂
  • wooden-london-40203

    07/09/2022, 10:53 AM
Hi all, I want to add a new database entity to DataHub that has a "contains" relationship with dataset (like dashboard and chart), and I want a Tables tab showing tables and some of the tables' features (like dataset with schemas). I'm new to DataHub and I need help with adding this new entity as simply as possible. Is there a good solution to my problem?
  • fancy-alligator-33404

    07/11/2022, 6:59 AM
Hello, everyone! I have a question about lineage in DataHub. I searched for information on lineage and found how to draw lineage automatically with the Airflow integration. But I'd like to know more about how to set lineage automatically from logs etc. Especially, if data is flowing through EAI, ETL, etc., how can I draw the lineages all at once? Is there a way to draw lineage non-manually? Thanks in advance for your help!
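For non-Airflow pipelines, lineage can also be pushed programmatically: an ETL/EAI job (or a parser over its logs) can emit an upstream-lineage aspect for each output table. A rough stdlib-only sketch of the payload shape; all URNs are made up, and in practice the acryl-datahub emitter classes build and send this for you:

```python
import json

# Rough shape of a lineage aspect linking one output table to its input;
# every URN here is hypothetical.
lineage_aspect = {
    "entityUrn": "urn:li:dataset:(urn:li:dataPlatform:hive,db.target_table,PROD)",
    "aspectName": "upstreamLineage",
    "aspect": {
        "upstreams": [
            {
                "dataset": "urn:li:dataset:(urn:li:dataPlatform:hive,db.source_table,PROD)",
                "type": "TRANSFORMED",
            }
        ]
    },
}
print(json.dumps(lineage_aspect, indent=2))
```

A small wrapper that walks your ETL job definitions (or parses their logs) and emits one such aspect per output table gives you lineage for everything at once, without manual edits.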
  • square-hair-99480

    07/11/2022, 1:37 PM
Just a quick question: which docker-compose file should I use for a POC-to-production deployment? I found this one https://github.com/datahub-project/datahub/blob/master/docker/docker-compose.yml but I am not sure. (I am aware K8s is the best option, but I do not have access to a K8s cluster.)