• r


    11/16/2022, 1:17 PM
    Hello Team, any docs or examples on how to customize datahub frontend e.g., add new information tabs to a dataset or domain programmatically?
  • m


    11/17/2022, 12:39 AM
    Hi team. Is there a sample Athena ingestion that I can reference? I followed the documentation and set up an ingestion, but 0 assets were ingested. I did not find any samples in
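    (Not an official sample — a minimal sketch of what an Athena recipe might look like, written as the Python dict that the YAML recipe deserializes to; the config field names here are from memory and may differ between versions:)

    ```python
    # Hypothetical Athena ingestion recipe expressed as a Python dict.
    # Field names (aws_region, work_group, query_result_location) are
    # assumptions, not verified against the current source docs.
    recipe = {
        "source": {
            "type": "athena",
            "config": {
                "aws_region": "us-east-1",                # region Athena runs in
                "work_group": "primary",                  # Athena workgroup to use
                "query_result_location": "s3://my-athena-results/",  # scratch bucket
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://localhost:8080"},
        },
    }

    # With acryl-datahub installed, this could be run programmatically:
    #   from datahub.ingestion.run.pipeline import Pipeline
    #   Pipeline.create(recipe).run()
    ```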
  • m


    11/17/2022, 12:42 AM
    we have data in s3 and use glue for transformation. will datahub show lineage in this case?
  • s


    11/17/2022, 6:51 AM
    how do you guys manage your data dictionary?
  • t


    11/17/2022, 8:00 AM
    hello, everyone! It seems that for some ingestion sources DataHub fills in Containers, while for others it creates only Datasets. Could someone explain why that is? As a result, the UI shows different levels of data representation for Containers vs. Datasets, and we are considering whether this should be changed.
  • a


    11/17/2022, 8:06 PM
    Hello team, I am trying to add different aspects at the dataset and field level, but I couldn’t find the Python SDK calls to add a column-level description and a domain for a dataset. Can someone help me with this?
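    (A rough sketch of the shapes involved, not verified against the current SDK: column descriptions live in the editableSchemaMetadata aspect, keyed by fieldPath. The dict below mirrors that aspect's JSON; the class names in comments are the ones I believe exist in acryl-datahub:)

    ```python
    # Build a dataset URN and an editableSchemaMetadata payload that sets a
    # column-level description. Platform/table names are placeholders.
    platform, table, env = "snowflake", "db.schema.my_table", "PROD"
    dataset_urn = f"urn:li:dataset:(urn:li:dataPlatform:{platform},{table},{env})"

    edit_schema = {
        "editableSchemaFieldInfo": [
            {"fieldPath": "customer_id", "description": "Primary key of the customer"},
        ]
    }

    # With acryl-datahub installed, roughly (names from memory, unverified):
    #   from datahub.emitter.mcp import MetadataChangeProposalWrapper
    #   from datahub.emitter.rest_emitter import DatahubRestEmitter
    #   from datahub.metadata.schema_classes import (
    #       EditableSchemaMetadataClass, EditableSchemaFieldInfoClass)
    #   DatahubRestEmitter("http://localhost:8080").emit(
    #       MetadataChangeProposalWrapper(entityUrn=dataset_urn, aspect=...))
    ```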
  • a


    11/18/2022, 1:32 AM
    Hello. I'm wondering about the process for metadata ingested from a recipe, as well as column descriptions, table descriptions, and documentation typed in by users with the 'editor' role. This question is about backups: when the server running the datahub quickstart goes down, or Docker is stopped to change configuration, I guess that metadata and those written descriptions may be lost. Backup is therefore essential, and for that it is important to know where they are stored. Can they be stored on my local machine? Or in other containers?
  • d


    11/18/2022, 4:56 AM
    Hello, I saw the following in the docs: “To capture lineage across Glue jobs and databases, a requirement must be met – otherwise the AWS API is unable to report any lineage. The job must be created in Glue Studio with the “Generate classic script” option turned on (this option can be accessed in the “Script” tab). Any custom scripts that do not have the proper annotations will not have reported lineage.” If I have a Glue job with custom scripts, how can I get the lineage manually?
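    (A sketch of declaring lineage by hand, assuming the upstreamLineage aspect shape — the URNs and names below are illustrative placeholders, not real datasets:)

    ```python
    # Declare lineage manually by emitting an upstreamLineage aspect for the
    # downstream dataset, instead of relying on the Glue API.
    def make_dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
        # DataHub's dataset URN convention
        return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"

    upstream = make_dataset_urn("s3", "my-bucket/raw/events")
    downstream = make_dataset_urn("glue", "analytics_db.events_clean")

    upstream_lineage = {
        "upstreams": [
            {"dataset": upstream, "type": "TRANSFORMED"},
        ]
    }

    # With acryl-datahub installed, there is (if memory serves) a helper along
    # the lines of datahub.emitter.mce_builder.make_lineage_mce(...) that can
    # be emitted through DatahubRestEmitter against the downstream dataset.
    ```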
  • j


    11/18/2022, 2:25 PM
    hi folks, i just set datahub up locally and am trying to ingest from BigQuery. the ingest job has been running all night and the logs say:
    WARNING: These logs appear to be stale. No new logs have been received since 2022-11-18 04:21:17.792411 (36251 seconds ago). However, the ingestion process still appears to be running and may complete normally.
    is this normal?
  • w


    11/18/2022, 3:13 PM
    Hello everyone. I've set up a demo example of DataHub by following the Docker quickstart and configured a couple of sources in the UI. When I reach the table detail page, I see the following screen. How can I enable the grayed-out functionalities? Where is this info in the official documentation? Thanks 😃
  • f


    11/18/2022, 4:06 PM
    Hi everyone, I'm pretty new to this, but I'm trying to get the quickstart going on a Windows 10 Pro laptop... can you please take a look at this case?
  • a


    11/18/2022, 5:57 PM
    Hi everyone, I’m a DataHub newbie. I was just wondering if it’s possible to add a domain through the Python SDK? Can someone share any reference code for adding a domain? Thanks in advance!🙂
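    (Not official reference code — a minimal sketch assuming the "domains" aspect shape; the domain id and dataset names are placeholders:)

    ```python
    # Attach a dataset to a domain by writing the "domains" aspect, which is
    # simply a list of domain URNs on the dataset entity.
    dataset_urn = (
        "urn:li:dataset:(urn:li:dataPlatform:bigquery,project.dataset.table,PROD)"
    )
    domain_urn = "urn:li:domain:marketing"  # hypothetical domain id

    domains_aspect = {"domains": [domain_urn]}

    # With acryl-datahub installed, roughly (names from memory, unverified):
    #   from datahub.emitter.mce_builder import make_domain_urn
    #   ...then emit a MetadataChangeProposalWrapper carrying a DomainsClass
    #   aspect via DatahubRestEmitter.
    ```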
  • e


    11/19/2022, 6:11 AM
    Hi I have signed up for an account, but I am unable to login. I received an email a while ago saying, “Hello there, we’ve just received your DataHub onboarding request. Please expect to hear from us shortly. Our customer success team will look at it to match your needs with a perfect recipe to bring you on board. We’re excited to see your interest in using the DataHub APIs!”
  • w


    11/20/2022, 7:22 PM
    Any help?
  • b


    11/22/2022, 1:54 PM
    Hi there, I was wondering how one can customize the type of metadata to be collected (like extending the 'stats')? Let's say I want to store the top 100 unique values of each column, or a quantile sketch of a column.
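    (One hedged approach, rather than extending the built-in profiler: compute the custom stats yourself and push them as a dataset profile. The aspect field names below mirror my recollection of the datasetProfile JSON and may differ from the actual schema:)

    ```python
    # Compute custom column stats (unique count, quartiles) with the stdlib
    # and shape them like a datasetProfile aspect payload.
    import statistics
    import time

    values = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]  # sample column values

    quartiles = statistics.quantiles(values, n=4)  # three cut points: Q1, Q2, Q3

    profile = {
        "timestampMillis": int(time.time() * 1000),
        "fieldProfiles": [
            {
                "fieldPath": "latency_ms",           # hypothetical column name
                "uniqueCount": len(set(values)),
                "quantiles": [
                    {"quantile": "0.25", "value": str(quartiles[0])},
                    {"quantile": "0.5",  "value": str(quartiles[1])},
                    {"quantile": "0.75", "value": str(quartiles[2])},
                ],
            }
        ],
    }
    # This dict would then be emitted as a datasetProfile aspect through the
    # REST emitter; for top-N values, a collections.Counter(values).most_common(100)
    # could feed a sampleValues-style field the same way.
    ```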
  • f


    11/23/2022, 6:54 AM
    Hello everyone. My apologies for posting this query again in this channel. We are trying to build the datahub-frontend image using GitHub workflows, and every time the connection times out during "Task :datahub-web-react:yarnBuild" with the error message below:
    "The build failed because the process exited too early. This probably means the system ran out of memory or someone called
    kill -9
    on the process. info Visit for documentation about this command. error Command failed with exit code 1."
    One thing to note: we added a proxy setting in the file, since the build was unable to download Gradle v6.9.2 in our environment. Any suggestion would be very helpful, since we have been stuck on this issue for a few days. Attached are the Docker build logs, along with the GitHub workflow.
  • f


    11/23/2022, 7:15 AM
    Hi guys, quick question: is there any way to identify whether elasticsearch or neo4j is being used as the graph service? I ran the quickstart and I see both elasticsearch and neo4j containers running. (As I understand it, we only need one?)
  • a


    11/23/2022, 10:16 AM
    Hi guys, I installed DataHub on my infrastructure and connected an mssql-server to it. Is it possible to load all the SSIS packages into DataHub and visualize them?
  • g


    11/23/2022, 2:04 PM
    Hi everyone, when I execute
    python3 -m datahub docker quickstart
    an error occurs. I checked that my docker-compose version is v2, but that didn't help. I would appreciate it if you could give me some advice. The error info:
  • q


    11/23/2022, 4:21 PM
    Hi Team, I am exploring DataHub and set it up on my local machine. I am trying to connect it to BQ with a service account, but I get the error below:
    [2022-11-23 16:16:59,218] ERROR    {datahub.entrypoints:206} - Command failed: ('Failed to load service account credentials from /tmp/tmpzoqmj48z', ValueError('Could not deserialize key data. The data may be in an incorrect format, it may be encrypted with an unsupported algorithm, or it may be an unsupported key type (e.g. EC curves with explicit parameters).', [_OpenSSLErrorWithText(code=503841036, lib=60, reason=524556, reason_text=b'error:1E08010C:DECODER routines::unsupported')]))
    I am passing the credentials in the format below. Could you advise: if I have to pass the SA key from the CLI, how can I pass it with the local setup?
  • a


    11/23/2022, 6:35 PM
    Hello. I have deployed DataHub on AWS following the guide provided on the website. However, datahub-gms fails for some reason: datahub-frontend cannot reach it because the connection is refused, and while inspecting the datahub-gms logs I found this. For context, I am using a managed RDS instance, but from the logs it appears that GMS can successfully connect to all of its dependencies. What could be the cause of this error? I am attaching the log file, which contains only the erroneous part (it is quite long); if more context is needed, I am happy to provide the full log. Thank you.
  • f


    11/24/2022, 4:04 AM
    Good day guys! quick question: is there anywhere we can check the date/plan of the next releases (and the changes if any)? Thank you!
  • h


    11/24/2022, 9:09 PM
    Hey all! I really like the DataHub project, but it seems that Spark is not yet well supported. I expected to see Hive tables with their schemas and lineage, but I only see the Spark platform with Pipelines and Tasks. I can visualize lineage for separate Tasks, but I don't see their relations. From the visualization I can find the Hive and File datasets, but not from search. Those datasets don't have schemas, but the lineage shows the related upstream Task. Am I doing something wrong here, or are there still a lot of limitations in the Spark integration? I also tried using local parquet files instead of Hive, but the result was very similar to the above. It would be great to have even column lineage for Spark, but I understand that is still quite a new feature. Code used for testing:
    spark.createDataFrame([
        {'col_1': 1, 'col_2': 2},
        {'col_1': 3, 'col_2': 4}
    ]).write.mode("overwrite").saveAsTable('dataset_a')
    spark.sql("SELECT col_1 + col_2 AS col_3 FROM dataset_a").write.mode("overwrite").saveAsTable('dataset_b')
    spark.sql("SELECT col_3 AS col_4 FROM dataset_b").write.mode("overwrite").saveAsTable('dataset_c')
    spark.sql("SELECT col_3 AS col_5 FROM dataset_b").write.mode("overwrite").saveAsTable('dataset_d')
    I tested with:
    • jupyter/pyspark-notebook:spark-2 and :spark3.1.2
    • also tried running Spark in master mode instead of local
    • spark.jars.packages=io.acryl:datahub-spark-lineage:0.8.23 and :0.8.45 and :0.9.2
    • datahub docker quickstart --version v0.8.23 and v0.8.45 and v0.9.2
  • g


    11/28/2022, 6:00 AM
    Hello everyone. The column-level lineage feature is amazing! Are only Snowflake and Looker supported right now?
  • t


    11/28/2022, 1:32 PM
    hi everyone! Perhaps someone can advise on the following: I want to add relations between columns and glossary terms using a bulk upload of columns into DataHub. Is there any out-of-the-box mechanism for this?
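    (One possible shape for the bulk path, sketched under assumptions: the CSV layout is invented, and the per-field glossaryTerms structure mirrors my recollection of the editableSchemaMetadata aspect. I believe DataHub also has a "csv-enricher" ingestion source that may cover this out of the box:)

    ```python
    # Turn a CSV of (dataset, field, glossary term) rows into one
    # editableSchemaMetadata payload per dataset.
    import csv
    import io

    csv_text = """dataset_urn,field_path,term_urn
    urn:li:dataset:(urn:li:dataPlatform:hive,db.users,PROD),email,urn:li:glossaryTerm:PII.Email
    urn:li:dataset:(urn:li:dataPlatform:hive,db.users,PROD),ssn,urn:li:glossaryTerm:PII.SSN
    """

    aspects: dict[str, dict] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        aspect = aspects.setdefault(
            row["dataset_urn"].strip(), {"editableSchemaFieldInfo": []}
        )
        aspect["editableSchemaFieldInfo"].append(
            {
                "fieldPath": row["field_path"],
                "glossaryTerms": {"terms": [{"urn": row["term_urn"]}]},
            }
        )
    # Each entry in `aspects` would then be emitted as one
    # editableSchemaMetadata aspect through the REST emitter.
    ```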
  • e


    11/28/2022, 2:06 PM
    Hi everyone, I need some help here. How can I change the MySQL storage backend host in datahub-gms?
  • c


    11/28/2022, 3:33 PM
    Hi! We want to use an already-established Postgres database instead of the one provisioned by the prerequisites. Is it just a matter of pointing GMS at this DB and running the init scripts found here?