# integrate-databricks-datahub
  • l

    little-megabyte-1074

    02/11/2022, 11:08 PM
    set the channel description: Central channel to collaborate on Databricks integration
  • b

    bumpy-furniture-4631

    02/14/2022, 3:01 AM
    Hey Maggie, thanks for setting this up. Can you help me understand the current status of the integration? Is there anything specific needed from the Databricks team for this? (I can ask folks at Databricks to prioritize it on our end.)
  • c

    careful-pilot-86309

    02/14/2022, 12:11 PM
    @bumpy-furniture-4631 Though the setup is a little complicated, we are currently able to push lineage from Databricks to DataHub. We would like to try this on a real-time setup and see how usable the current mappings (like pipeline name) are.
  • l

    little-megabyte-1074

    02/14/2022, 3:00 PM
    <!channel> Hello, folks! Hope you all had a wonderful weekend 🙂 Excited to announce that @careful-pilot-86309 & @elegant-doctor-86344 are working on a Databricks <> DataHub integration. We’re eager for folks in the Community to test out the integration & to provide feedback early on! If you’re able to help us out, please let us know!
    🙌 1
  • p

    prehistoric-room-17640

    02/14/2022, 3:00 PM
    WOOOO HOOO!
  • p

    prehistoric-room-17640

    02/14/2022, 3:02 PM
    Absolutely. Let me know how.
  • q

    quiet-kilobyte-82304

    02/14/2022, 3:02 PM
    Let me know as well. Is there a writeup on how this might work?
  • b

    bumpy-furniture-4631

    02/14/2022, 3:15 PM
    @careful-pilot-86309 @little-megabyte-1074 I can help with the integration testing. Please point me to the docs.
  • l

    loud-island-88694

    02/14/2022, 3:24 PM
    Hello All - to clarify the initial scope of work, @careful-pilot-86309 has been working on Databricks Spark lineage. We will update the documentation soon. Support for Delta Lake, notebooks, etc. will come in the future.
    👍 2
  • c

    careful-pilot-86309

    02/16/2022, 6:29 AM
    Hello All - I appreciate your enthusiastic response. I am attaching the usage instructions and the jar. Please try it and let us know your feedback. Please note that this is a basic beta version.
    DATABRICKS_README.pdf, datahub-spark-lineage-databricks.jar
    teamwork 1
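The README PDF is not reproduced in this channel. As a rough sketch of what registering the agent involves, the open-source DataHub spark-lineage docs configure it through Spark properties. The snippet below assumes the property names `spark.extraListeners` (set to `datahub.spark.DatahubSparkListener`), `spark.datahub.rest.server` and `spark.datahub.rest.token` from those docs; the beta Databricks jar may expect different ones. It renders them in the one "key value" pair per line format of a cluster's Spark config box. Both helper functions are hypothetical.

```python
from typing import Dict, Optional


def datahub_spark_conf(server_url: str, token: Optional[str] = None) -> Dict[str, str]:
    """Build Spark properties that register the DataHub lineage listener.

    Property names are assumptions taken from the open-source spark-lineage docs;
    compare them with the README shipped alongside the jar you are using.
    """
    conf = {
        "spark.extraListeners": "datahub.spark.DatahubSparkListener",
        "spark.datahub.rest.server": server_url,
    }
    if token:
        # Only needed when the DataHub GMS endpoint requires authentication (assumed key).
        conf["spark.datahub.rest.token"] = token
    return conf


def render_spark_conf(conf: Dict[str, str]) -> str:
    """Render properties as 'key value' lines, one per line, as the cluster UI expects."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())


if __name__ == "__main__":
    print(render_spark_conf(datahub_spark_conf("http://datahub-gms:8080")))
```

The jar itself still has to be on the cluster classpath as the README describes; these properties only tell Spark to load the listener and where to send lineage.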
  • p

    prehistoric-room-17640

    03/09/2022, 3:18 AM
    Are there any updates on Delta Lake support?
    plus one 1
  • l

    lemon-terabyte-66903

    03/10/2022, 4:11 PM
    Hello, is there support for the databricks platform in lineage? I would like to have custom lineage with S3 datasets and Databricks jobs.
  • c

    careful-pilot-86309

    03/10/2022, 6:40 PM
    Right now, we support HDFS and a few JDBC sources (Hive, Oracle, MySQL, etc.) on Databricks.
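To make the scope in the reply above concrete: a read or write location either resolves to one of the supported platforms or is not captured (S3 paths, asked about in the previous message, would fall in the second group). The sketch below is a hypothetical classifier, not code from the lineage jar. The JDBC subprotocol-to-platform mapping is an illustrative assumption limited to the sources named in the reply.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical mapping from JDBC subprotocol to a DataHub platform name,
# covering only the sources mentioned in the reply above.
JDBC_PLATFORMS = {
    "hive2": "hive",
    "oracle": "oracle",
    "mysql": "mysql",
}


def source_platform(location: str) -> Optional[str]:
    """Return the platform for a supported location, or None if it is not covered."""
    if location.startswith("jdbc:"):
        # "jdbc:mysql://host:3306/db" -> subprotocol "mysql"
        subprotocol = location[len("jdbc:"):].split(":", 1)[0]
        return JDBC_PLATFORMS.get(subprotocol)
    if urlparse(location).scheme == "hdfs":
        return "hdfs"
    return None
```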
  • c

    careful-pilot-86309

    03/10/2022, 6:42 PM
    Below are the usage instructions for datahub-databricks: https://datahubspace.slack.com/files/U02HE6R3F5L/F0339SXFSJF/databricks_readme.pdf https://files.slack.com/files-pri/TUMKD5EGJ-F033NFEFR97/download/datahub-spark-lineage-databricks.jar Let me know if you are trying this. I can help with the setup.
    datahub-spark-lineage-databricks.jar, DATABRICKS_README.pdf
  • m

    modern-belgium-81337

    04/27/2022, 10:32 PM
    databricks fs --overwrite datahub-spark-lineage*.jar dbfs:/datahub
    Usage: databricks fs [OPTIONS] COMMAND [ARGS]...
    Try 'databricks fs -h' for help.
    
    Error: No such option: --overwrite Did you mean --version?
    Hi, I’m trying to follow the doc here but it seems like the command hasn’t been updated?
  • c

    careful-pilot-86309

    04/28/2022, 3:39 PM
    --overwrite is just an option to overwrite the jar if it is already present. If it is not supported in your environment, you can skip it.
    teamwork 1
  • c

    careful-pilot-86309

    04/28/2022, 3:43 PM
    I created the document with Databricks CLI version 0.16.3.
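A likely cause of the "No such option: --overwrite" error above is that the command never names a subcommand. In the legacy databricks-cli (0.16.x, the version the README targets), `--overwrite` is, to our understanding, an option of `databricks fs cp`. The newer unified Databricks CLI may use different syntax, so treat this as an assumption. The sketch below builds the argv for that form and expands the jar glob in Python, because `subprocess` here does not go through a shell. The destination `dbfs:/datahub` mirrors the README command. The helper names are hypothetical.

```python
import glob
import subprocess
from typing import List


def upload_command(jar_path: str, dest: str = "dbfs:/datahub", overwrite: bool = True) -> List[str]:
    """Build argv for copying a jar to DBFS, assuming legacy `databricks fs cp [--overwrite] SRC DST`."""
    cmd = ["databricks", "fs", "cp"]
    if overwrite:
        cmd.append("--overwrite")
    return cmd + [jar_path, dest]


def upload_jars(pattern: str = "datahub-spark-lineage*.jar") -> None:
    """Upload every matching jar; raises CalledProcessError if the CLI fails."""
    # No shell is involved, so expand the wildcard ourselves.
    for jar in sorted(glob.glob(pattern)):
        subprocess.run(upload_command(jar), check=True)
```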
  • c

    creamy-tent-10151

    07/29/2022, 5:33 PM
    Hi all, is there a way to change the Spark task name? Right now it's just picking up my queries and using them as the name.
  • s

    silly-finland-62382

    08/26/2022, 9:19 AM
    Hey, can someone help me build Spark lineage on Databricks?
  • b

    bumpy-furniture-4631

    09/04/2022, 10:19 PM
    Hey guys, are there any plans to ingest data lineage from Databricks Unity Catalog? The feature is currently in Private Preview, and they have APIs to export the lineage info.
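For anyone exploring the lineage APIs mentioned above, here is a minimal sketch of the request side. It assumes the Databricks table-lineage endpoint `GET /api/2.0/lineage-tracking/table-lineage` with a JSON body containing `table_name` and `include_entity_lineage`, as described in Databricks' lineage documentation at the time. Check the current docs before relying on it, since the feature was still in preview. The function only builds the request and does not send it. The function name and example values are hypothetical.

```python
import json
import urllib.request


def table_lineage_request(workspace_url: str, token: str, table_name: str) -> urllib.request.Request:
    """Build (but do not send) a lineage request for one Unity Catalog table.

    Endpoint path and body fields are assumptions based on Databricks' lineage API docs.
    """
    body = {"table_name": table_name, "include_entity_lineage": True}
    return urllib.request.Request(
        url=f"{workspace_url.rstrip('/')}/api/2.0/lineage-tracking/table-lineage",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # personal access token
            "Content-Type": "application/json",
        },
        method="GET",
    )
```

Sending it with `urllib.request.urlopen(req)` and `json.load` on the response gives the raw lineage JSON. Mapping that into DataHub upstream aspects would be the actual integration work.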
  • l

    loud-island-88694

    09/05/2022, 8:15 PM
    @bumpy-furniture-4631 this is on our roadmap. Contributions are welcome if you have the bandwidth.
    ❤️ 1
  • c

    careful-action-61962

    09/30/2022, 9:31 AM
    You can do this. Create a cluster in Single User mode; it connects to Unity Catalog. Create a personal access token for that user and configure it in DataHub using the Hive connector.
    spark.databricks.sql.initial.catalog.name <unity catalog name>
    Add this to your Spark cluster config and you're good to go. Please make sure your user has SELECT permission on the tables. If not, run this:
    # Run in a Databricks notebook, where `spark` is predefined.
    # Grants the `datahub` principal USAGE on every schema and SELECT on the
    # tables outside the temp schemas, so the Hive connector can read metadata.
    SKIP_CATALOGS = {"default", "samples"}
    SKIP_DATABASES = {"temp_notebooks", "temp"}
    PRINCIPAL = "`datahub`"

    for catalog_row in spark.sql("SHOW CATALOGS").collect():
      catalog = catalog_row["catalog"]
      if catalog in SKIP_CATALOGS:
        continue
      print(f"USE CATALOG {catalog}")
      spark.sql(f"USE CATALOG `{catalog}`")
      for db_row in spark.sql("SHOW DATABASES").collect():
        db = db_row["databaseName"]
        spark.sql(f"GRANT USAGE ON DATABASE `{db}` TO {PRINCIPAL}")
        if db in SKIP_DATABASES:
          continue
        for table_row in spark.sql(f"SHOW TABLES IN `{db}`").collect():
          if table_row["isTemporary"]:
            continue  # skip session-scoped temporary views
          table = f"`{table_row['database']}`.`{table_row['tableName']}`"
          grant_query = f"GRANT SELECT ON TABLE {table} TO {PRINCIPAL}"
          print(grant_query)
          spark.sql(grant_query)
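On the DataHub side, "configure it using the Hive connector" corresponds to an ingestion recipe for the `hive` source with the `databricks+pyhive` scheme. The sketch below builds that recipe as a Python dict. The field names (`host_port`, `username: token`, `scheme`, `options.connect_args.http_path`) follow DataHub's Hive source docs as we understand them and should be checked against your DataHub version. The function name and every argument value are hypothetical placeholders. `http_path` is the cluster's JDBC/ODBC HTTP path from the Databricks UI.

```python
from typing import Any, Dict


def databricks_hive_recipe(workspace_host: str, http_path: str, token: str,
                           datahub_server: str = "http://localhost:8080") -> Dict[str, Any]:
    """Sketch of a DataHub ingestion recipe pointing the `hive` source at a Databricks cluster."""
    return {
        "source": {
            "type": "hive",
            "config": {
                "host_port": f"{workspace_host}:443",
                "username": "token",  # token auth: the literal user name "token"
                "password": token,  # the personal access token created above
                "scheme": "databricks+pyhive",
                "options": {"connect_args": {"http_path": http_path}},
            },
        },
        "sink": {"type": "datahub-rest", "config": {"server": datahub_server}},
    }
```

With `acryl-datahub` installed, the dict can be passed to `Pipeline.create(recipe)` from `datahub.ingestion.run.pipeline` and executed with `.run()`. The same structure can also be saved as YAML and run with `datahub ingest -c`.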
  • n

    numerous-yak-58823

    10/03/2022, 2:38 PM
    Hello, I was reading the documentation: https://datahubproject.io/docs/metadata-integration/java/spark-lineage/ And it says: Note that testing for other environments such as Databricks is planned in near future. Do you happen to know when Databricks will be officially supported?
  • h

    hallowed-shampoo-52722

    02/09/2023, 6:02 PM
    Hi Team, we have integrated Databricks with DataHub. I have created a recipe using databricks+pyhive, and the Spark agent is installed, but I don't see pipeline tasks. Any idea why that's happening?
  • h

    hallowed-shampoo-52722

    02/13/2023, 9:27 PM
    Hi guys, could you please help me with the lineage here?
  • f

    fierce-animal-98957

    04/25/2023, 5:47 AM
    Hi, we are validating data using Great Expectations inside a Databricks notebook, and we are now trying to integrate DataHub in the same notebook. Is it even possible to connect to DataHub from within Great Expectations, with everything running inside a single Databricks notebook?
  • f

    fierce-animal-98957

    05/02/2023, 4:25 PM
    Hi Team, we are using “DataHubValidationAction” to send assertion metadata to DataHub. We are running this from inside Databricks, which uses the Spark engine. According to the documentation, this currently works only with “SqlAlchemyExecutionEngine”. Does anyone know when this class will be enhanced to support the Spark engine? Is anything on the roadmap? https://datahubproject.io/docs/metadata-ingestion/integration_docs/great-expectations/#capabilities https://docs.greatexpectations.io/docs/integrations/integration_datahub/
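For context on the two questions above, `DataHubValidationAction` is wired in as an entry in a Great Expectations checkpoint's `action_list`. The sketch below builds that entry as a Python dict. The module path `datahub.integrations.great_expectations.action` and the `server_url` parameter are taken from DataHub's Great Expectations integration docs. The function name and server URL are placeholders. Per the linked capabilities section, validations run on Spark in Databricks may not produce DataHub assertions even with this entry configured.

```python
from typing import Any, Dict


def datahub_validation_action(server_url: str) -> Dict[str, Any]:
    """Checkpoint action_list entry that forwards validation results to DataHub."""
    return {
        "name": "datahub_action",
        "action": {
            "module_name": "datahub.integrations.great_expectations.action",
            "class_name": "DataHubValidationAction",
            "server_url": server_url,  # DataHub GMS endpoint
        },
    }
```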
  • g

    gentle-arm-6777

    06/29/2023, 3:58 PM
    Hi guys! I set up Databricks Spark lineage, but I don't see any pipeline in DataHub after executing a notebook on Databricks. The logs show that the listener is registered, but no DataHub emitter messages appear in the logs. Any help?
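When the listener registers but nothing is emitted, a first check is whether the cluster's Spark config actually carries the DataHub settings. The sketch below assumes the property names `spark.extraListeners` (containing `datahub.spark.DatahubSparkListener`) and `spark.datahub.rest.server` from the open-source spark-lineage docs, and reports anything missing or malformed. It is a hypothetical diagnostic, not part of the DataHub agent, and it cannot detect network or authentication problems between the cluster and DataHub.

```python
from typing import Dict, List

# Assumed listener class name, per the open-source spark-lineage docs.
LISTENER_CLASS = "datahub.spark.DatahubSparkListener"


def datahub_conf_problems(conf: Dict[str, str]) -> List[str]:
    """Return human-readable problems with a cluster's DataHub listener settings."""
    problems = []
    listeners = [c.strip() for c in conf.get("spark.extraListeners", "").split(",") if c.strip()]
    if LISTENER_CLASS not in listeners:
        problems.append(f"spark.extraListeners does not include {LISTENER_CLASS}")
    server = conf.get("spark.datahub.rest.server", "")
    if not server.startswith(("http://", "https://")):
        problems.append("spark.datahub.rest.server is missing or not an http(s) URL")
    return problems
```

In a notebook, `datahub_conf_problems(dict(spark.sparkContext.getConf().getAll()))` runs the check against the live cluster configuration.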
  • b

    bulky-shoe-65107

    10/16/2023, 12:38 AM
    has renamed the channel from "integration-databricks-datahub" to "integrate-databricks-datahub"
  • f

    few-piano-98292

    03/06/2024, 4:28 PM
    Hello, would appreciate any feedback/help to move forward!