# ingestion

    limited-forest-73733

    09/06/2022, 8:10 AM
    Hi everyone! I am getting a compatibility issue while integrating great_expectations with Airflow. Can someone please help me out? @dazzling-judge-80093

    enough-monitor-24292

    09/06/2022, 11:48 AM
    Hi,

    enough-monitor-24292

    09/06/2022, 11:48 AM
    I need help fetching all users from DataHub using the API. Can anyone please help? Thanks!
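For fetching all users, DataHub's GraphQL endpoint (`/api/graphql`) exposes a `listUsers` query. A minimal stdlib sketch of the request, assuming a local GMS at `localhost:8080` and a personal access token (both placeholders to adjust for your deployment):

```python
import json
import urllib.request

GMS = "http://localhost:8080"          # assumption: local GMS endpoint
TOKEN = "<personal-access-token>"       # generate one under Settings > Access Tokens

LIST_USERS_QUERY = """
query listUsers($start: Int!, $count: Int!) {
  listUsers(input: { start: $start, count: $count }) {
    total
    users { urn username }
  }
}
"""

def build_payload(start: int = 0, count: int = 100) -> dict:
    """Build the GraphQL request body for one page of users."""
    return {"query": LIST_USERS_QUERY, "variables": {"start": start, "count": count}}

def fetch_users_page(start: int = 0, count: int = 100) -> dict:
    """POST one page of the listUsers query to the GMS GraphQL endpoint."""
    req = urllib.request.Request(
        f"{GMS}/api/graphql",
        data=json.dumps(build_payload(start, count)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {TOKEN}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Page through with increasing `start` until you have collected `total` users.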

    microscopic-mechanic-13766

    09/06/2022, 12:05 PM
    Hello again, I have ingested data from PostgreSQL with profiling enabled. I was expecting the "Queries" tab to be enabled as well as "Stats", but the former wasn't. Is anything else needed to enable that tab? Thanks in advance!

    helpful-london-56362

    09/06/2022, 12:18 PM
    Hello, I'm getting these errors when I try to ingest some data. I'm using the UI form. This is on version 0.8.44.

    agreeable-army-26750

    09/06/2022, 12:48 PM
    Hi everyone! I am trying to extend the metadata entity model by forking the repository, following this guide: https://datahubproject.io/docs/metadata-modeling/extending-the-metadata-model/ (For testing I wanted to create a GlossaryTerm-like entity, called DatasetMetadata, with the same fields and aspects; only the Key and the Info aspect were recreated as separate files with the same body.) I have done the following steps:
    1. Created the Key aspect pdl based on GlossaryTerm
    2. Created the new Info pdl based on GlossaryTerm
    3. Created the entity in the registry yml
    4. Ran ./gradlew build
    5. Redeployed the gms service docker image
    I tried to call the OpenAPI endpoint to create a DatasetMetadata object with Postman (it works with GlossaryTerm aspects), but it fails with a 400 error code, and the docker image logs a detailed error. How should I resolve this issue? Is there any step I am missing in order to achieve my goal? Thank you very much for helping!

    full-chef-85630

    09/06/2022, 1:14 PM
    Hi, when I ingest BigQuery and add sharded_table_pattern with the default value, an error is reported. Has anyone encountered this? @dazzling-judge-80093
    source:
      type: bigquery
      config:
        project_id: ${SOCIAL_INSIGHTS_BIGQUERY_ID}
        credential:
          project_id: ${SOCIAL_INSIGHTS_BIGQUERY_ID}
          private_key_id: ${SOCIAL_INSIGHTS_PRIVATE_KEY_ID}
          private_key: ${SOCIAL_INSIGHTS_PRIVATE_KEY}
          client_email: ${SOCIAL_INSIGHTS_CLIENT_EMAIL}
          client_id: ${SOCIAL_INSIGHTS_CLIENT_ID}
        sharded_table_pattern: ((.+)[_$])?(\d{4,10})$
        
    sink: 
      type: datahub-rest
      config:
        server: ${DATAHUB_GMS}
        token: ${TOKEN}

    alert-fall-82501

    09/06/2022, 6:17 PM
    Hi team - getting this error message when ingesting atlas and olympus metadata from Redshift to DataHub:

    alert-fall-82501

    09/06/2022, 6:17 PM
    psycopg2.errors.InsufficientPrivilege: permission denied for relation svv_table_info

    alert-fall-82501

    09/06/2022, 6:17 PM
    Can anybody advise on this?

    some-hairdresser-53679

    09/06/2022, 6:50 PM
    Hello, do you have an example of a Python emitter for data lineage?
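The acryl-datahub SDK is the usual route for this: `datahub.emitter.rest_emitter.DatahubRestEmitter` plus the `mce_builder` helpers (see the lineage emitter examples in the metadata-ingestion repo). As a stdlib-only sketch, the payload below builds an `upstreamLineage` proposal of the kind POSTed to GMS; the exact REST envelope shape and the TRANSFORMED lineage type are assumptions to verify against your DataHub version:

```python
import json

def make_dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    # Mirrors the SDK helper datahub.emitter.mce_builder.make_dataset_urn.
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"

def make_lineage_proposal(upstream_urns: list, downstream_urn: str) -> dict:
    """Build an upstreamLineage change proposal: downstream <- upstreams."""
    aspect = {
        "upstreams": [
            {"dataset": urn, "type": "TRANSFORMED"} for urn in upstream_urns
        ]
    }
    return {
        "proposal": {
            "entityType": "dataset",
            "entityUrn": downstream_urn,
            "changeType": "UPSERT",
            "aspectName": "upstreamLineage",
            # The aspect body is sent as a JSON string inside the proposal.
            "aspect": {
                "contentType": "application/json",
                "value": json.dumps(aspect),
            },
        }
    }
```

With the SDK installed, the equivalent is roughly `DatahubRestEmitter("http://localhost:8080").emit_mce(builder.make_lineage_mce([upstream_urn], downstream_urn))`.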

    delightful-barista-90363

    09/06/2022, 7:56 PM
    Hello, I am currently implementing the DatahubSparkListener. We have a token set up, but it gets logged in the spark-submit command. Is there any way to hide the token or configure it through environment variables?
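One way to keep the token out of the spark-submit command line is to put the listener's configuration in spark-defaults.conf with restricted file permissions, rather than passing it via --conf. A sketch, assuming the spark.datahub.rest.* keys from the DataHub Spark lineage integration (verify the exact key names against your listener version):

```
# spark-defaults.conf -- chmod 600, owned by the job user
spark.extraListeners        datahub.spark.DatahubSparkListener
spark.datahub.rest.server   http://datahub-gms:8080
spark.datahub.rest.token    <paste-token-here>
```

Note that spark-defaults.conf does not expand environment variables, so the token value has to be written into the file (or templated in at deploy time).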

    faint-translator-23365

    06/07/2022, 7:14 AM
    Hi, I am trying to ingest LDAP as a source and I'm getting this error. Attached are the logs and recipe.yaml.

    miniature-journalist-76345

    09/07/2022, 6:58 AM
    Hi, team. Is there a way to replace the queries section for a dataset when ingesting it? When you ingest a new query, it doesn't replace the existing queries but appends to them.

    bumpy-journalist-41369

    09/07/2022, 7:10 AM
    Hi, team. Is there a way to ingest data from all databases in an S3 data lake with one recipe, instead of making a recipe for each DB?

    bland-orange-13353

    09/07/2022, 7:50 AM
    This message was deleted.

    rich-policeman-92383

    09/07/2022, 8:03 AM
    Hello, datahub version: v0.8.41. While ingesting glossary terms (https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/bootstrap_data/business_glossary.yml) using the command
    # datahub ingest -c business_glossary.yml
    we are getting the below error:
    source
      value is not a valid dict (type=type_error.dict)
    nodes
      extra fields not permitted (type=value_error.extra)
    owners
      extra fields not permitted (type=value_error.extra)
    url
      extra fields not permitted (type=value_error.extra)
    version
      extra fields not permitted (type=value_error.extra)
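The "extra fields not permitted" errors for nodes/owners/url/version suggest the glossary file itself was passed to `datahub ingest` as the recipe. The recipe should instead reference the glossary through the datahub-business-glossary source; a minimal sketch, with the file path and sink server as placeholders:

```yaml
# glossary_recipe.yml -- pass this file (not the glossary itself) to `datahub ingest -c`
source:
  type: datahub-business-glossary
  config:
    file: ./business_glossary.yml

sink:
  type: datahub-rest
  config:
    server: http://localhost:8080
```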

    melodic-beach-18239

    09/07/2022, 9:30 AM
    Hi, another question about MySQL ingestion. When I set include_views to true, why does the ingestion pick up information_schema as my database's views?

    rhythmic-nest-54679

    09/07/2022, 9:37 AM
    Why does UI-based ingestion need to install Python dependencies when executing? Is there any way to install those packages manually? I just tested the latest version with the quickstart compose file (the one without neo4j).

    rich-policeman-92383

    09/07/2022, 11:11 AM
    Hello, does DataHub support ES 8 or above? While trying to ingest glossary terms with ES 8.2.3 as the graph backend, we are getting the below error on MAE: "unable to feed bulk request. no retries left. unable to parse response body for response." datahub version: v0.8.41

    plain-farmer-27314

    09/07/2022, 2:13 PM
    Hey all - recently updated to a newer version (0.8.41) and am seeing the following error when trying to ingest LookML now (this wasn't happening on a previous version):
    ValueError: '/Users/zachary.bluhm/Dev/datahub/discord-looker-data-reporting/views/my_view.view.lkml' does not start with 'discord-looker-data-reporting'
    Any pro tips to get my config updated so that we can carry on processing LookML? EDIT: Looks like our Looker ingestion is broken as well. We were previously on version 0.8.39.

    modern-monitor-68945

    09/07/2022, 4:49 PM
    Hi! Does anybody know if there is a way to set datahub.cluster in Bitnami Airflow? It is configured via env vars, and it seems to ignore the datahub options.

    rich-policeman-92383

    09/08/2022, 8:25 AM
    Hello, how can we add glossary term groups using business_glossary.yml?

    prehistoric-dream-67257

    09/08/2022, 8:51 AM
    Does DataHub support ingesting Kafka topic tags, like Lenses describes here (we don't use Lenses)? https://lenses.io/blog/2021/04/apache-kafka-metadata-management/

    many-hairdresser-79517

    09/08/2022, 10:09 AM
    Hello team, I get this error when ingesting metadata from ClickHouse:
    'default.exchange_rate2eur': ["Ingestion error: Orig exception: Code: 47, e.displayText() = DB::Exception: Missing columns: 'comment' while processing query: 'SELECT database, name AS table_name, comment, formatRow('JSONEachRow', engine, partition_key, sorting_key, primary_key, sampling_key, storage_policy, metadata_modification_time, total_rows, total_bytes, data_paths, metadata_path) AS properties FROM system.tables WHERE name NOT LIKE '.inner%'', required columns: 'comment' 'primary_key' 'engine' 'data_paths' 'name' 'metadata_modification_time' 'metadata_path' 'partition_key' 'sampling_key' 'storage_policy' 'total_bytes' 'sorting_key' 'database' 'total_rows', maybe you meant: ['primary_key','engine','data_paths','name','metadata_modification_time','metadata_path','partition_key','sampling_key','storage_policy','total_bytes','sorting_key','database','total_rows'] (version 21.3.2.5 (official build))\n"]
    My yml file:
    source:
      type: clickhouse
      config:
        host_port: "xxxxxxxxxxxxxxxxx"
        username: xxxxxxxxxxxxxxxxxxx
        password: xxxxxxxxxxxxxxxx
    Hope you can take a look, thank you so much!

    chilly-potato-57465

    09/08/2022, 11:07 AM
    Hello! I am trying to ingest a .csv from a bucket on a local S3 with DataHub deployed on K8s. I am running the ingestion from the CLI (v0.8.43.4) and can ping the S3 server from the machine running the CLI. I have a very simple recipe:
    source:
      type: s3
      config:
        path_specs:
          - include: "{s3-server}/{bucket-name}/*.csv"
        env: "PROD"
        profiling:
          enabled: false
    # sink configs
    sink:
      type: "datahub-rest"
      config:
        server: "http://localhost:8080"
    The pipeline finishes successfully but generates no events. The source and sink reports from the ingest are empty except for the timestamps. Even though profiling is disabled in my recipe, I only see this error when I run with --debug:
    ERROR {logger:26} - Please set env variable SPARK_VERSION
    Any advice on how to resolve this would be greatly appreciated! Thank you in advance!
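A common cause of "pipeline finishes but no events" with the s3 source is a path_spec that never matches: path_specs expect s3:// bucket URIs, while a custom/local S3 endpoint goes into aws_config instead of the path. A sketch under those assumptions (bucket, endpoint, and credentials are placeholders; verify aws_endpoint_url against your CLI version):

```yaml
source:
  type: s3
  config:
    path_specs:
      - include: "s3://my-bucket/*.csv"   # bucket path, not the server hostname
    aws_config:
      aws_endpoint_url: "http://my-s3-host:9000"   # local / non-AWS S3 endpoint
      aws_access_key_id: "..."
      aws_secret_access_key: "..."
      aws_region: "us-east-1"
    env: "PROD"
    profiling:
      enabled: false

sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"
```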

    lemon-engine-23512

    09/08/2022, 11:26 AM
    Hello, does anyone have a reference for a dashboard push model?

    chilly-potato-57465

    09/08/2022, 11:34 AM
    Hello! I am ingesting from a MySQL source where I have a table used in several views. After the ingestion, the lineage tab for the table and views is empty, while I would expect it to show them. How can I ingest/show the lineage?

    bland-balloon-48379

    09/08/2022, 12:27 PM
    Hi everyone! I have a few ingestion runs I'd like to roll back and had a couple of quick questions. I read through the documentation on rolling back ingestion batch runs and it's pretty easy; I was able to roll back two ingestions last week using those commands. However, it seems to only do a soft delete, since the rows are still present in MySQL and the --hard option does not appear to be compatible with the rollback command. Is there a way to do a hard delete when rolling back an ingestion run? If not, is this something we could expect in a future release? Thanks!

    chilly-potato-57465

    09/08/2022, 12:29 PM
    Hello again! Another question when ingesting from MySQL. I am interested in dataset schema evolution and read about the Timeline API. I modified a table's schema and ingested it again; I can see the modified schema in the UI, but not the history of changes as shown here: https://demo.datahubproject.io/dataset/urn:li:dataset:(urn:li:dataPlatform:hive,testTimelineDataset,PROD)/Schema?is_lineage_mode=false I can see the changes for the table via the OpenAPI, so how do I show them in the UI? Am I missing some plugin?