# ingestion
  • b

    brainy-tailor-93048

    02/09/2023, 11:58 AM
    Hey 👋 Wanted to bump this thread here. Regarding the Environment field (which currently is constrained to a fixed enum set), the docs say that:
    Copy code
    Note that this field will soon be deprecated in favor of a more standardized concept of Environment
    Is there anything that could be shared, RFC or discussion, about what the plans are for this field, even if tentative? Will be very excited to take advantage of a more general environment implementation when it arrives!
  • k

    kind-kite-29761

    02/09/2023, 12:56 PM
    Hi team, one query here! Once I ingest data from a source like Athena, Postgres, or any other DB system and schedule it, is DataHub able to show version changes of a table between scheduled runs, i.e. schema changes such as columns being updated, added, or deleted? Or will it just show us the final version?
    ✅ 1
  • l

    lemon-daybreak-58504

    02/09/2023, 1:11 PM
    Is there a way to connect the ingestion source using SSL credentials on Cloud SQL Postgres?
  • l

    lemon-daybreak-58504

    02/09/2023, 1:17 PM
    And I'm having problems with the recipe for the MySQL ingestion.
  • l

    lemon-daybreak-58504

    02/09/2023, 1:18 PM
    DataHub can't find the directory for the SSL credentials.
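    A minimal sketch of pointing a Postgres recipe at SSL credentials, driven from Python; everything here (host, paths, file names) is a placeholder, and the certificate files must exist at a path readable by whatever process actually runs the ingestion, which is the usual cause of "can't find the directory" errors. The options block is passed straight to SQLAlchemy, so the exact connect_args keys depend on the driver (psycopg2 shown):
    Copy code
    from datahub.ingestion.run.pipeline import Pipeline

    # Hypothetical recipe: Postgres over SSL, with certs mounted at /config/certs.
    pipeline = Pipeline.create(
        {
            "source": {
                "type": "postgres",
                "config": {
                    "host_port": "10.0.0.5:5432",  # Cloud SQL instance (placeholder)
                    "database": "mydb",
                    "username": "datahub",
                    "password": "example",
                    # Forwarded to SQLAlchemy create_engine(); psycopg2 SSL parameters.
                    "options": {
                        "connect_args": {
                            "sslmode": "verify-ca",
                            "sslrootcert": "/config/certs/server-ca.pem",
                            "sslcert": "/config/certs/client-cert.pem",
                            "sslkey": "/config/certs/client-key.pem",
                        }
                    },
                },
            },
            "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
        }
    )
    pipeline.run()
    pipeline.raise_from_status()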
  • a

    alert-fall-82501

    02/09/2023, 1:54 PM
    Hi team - I have integrated Okta with DataHub and am facing an issue: some people are seeing an infinite browser-refresh loop when logging in via Okta. Can anybody advise on this?
  • t

    tall-pizza-132

    02/09/2023, 3:31 PM
    Hello everyone, just wondering if anyone has tried to connect to Exasol using SQLAlchemy in the past? I'm working on the recipe and I'm getting this error:
    ["Tables error: 'pyodbc.Row' object has no attribute 'table_name'"]}
    Any advice or help?
    Thanks
  • v

    victorious-evening-88418

    02/09/2023, 5:32 PM
    Dear all, I'm trying to ingest metadata from one of our Storage Accounts using the following recipe:
    Copy code
    ---
    source:
      type: azure
      config:
        env: TEST
        base_path: #############
        container_name: #############
        account_name: #############
        sas_token: '#############'
    sink:
      type: "datahub-rest"
      config:
        server: "http://localhost:8080"
    ...but I receive the following error: "PipelineInitError: Failed to find a registered source for type azure: 'Did not find a registered class for azure'". I tried to understand more by reading https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/azure/azure_common.py, but unfortunately the error persists even after many attempts. Thanks in advance for your support.
    ✅ 1
  • s

    salmon-spring-51500

    02/09/2023, 6:53 PM
    Hi, is there any reference for ingesting MySQL metadata through the Python emitter?
    ✅ 1
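    There is no MySQL-specific emitter, but the same ingestion a recipe performs can be driven from Python and pushed through the REST sink. A minimal sketch with placeholder connection details:
    Copy code
    from datahub.ingestion.run.pipeline import Pipeline

    # Programmatic equivalent of a mysql recipe; results are pushed to GMS
    # through the REST sink, the same path the DatahubRestEmitter uses.
    pipeline = Pipeline.create(
        {
            "source": {
                "type": "mysql",
                "config": {
                    "host_port": "localhost:3306",  # placeholder
                    "database": "dbname",
                    "username": "root",
                    "password": "example",
                },
            },
            "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
        }
    )
    pipeline.run()
    pipeline.pretty_print_summary()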
  • c

    chilly-ability-77706

    02/09/2023, 9:49 PM
    Hello all, I am getting an error with Power BI ingestion; attaching the logs here and looking for some guidance. Thanks in advance!
    powerbi.log
  • b

    bland-lighter-26751

    02/10/2023, 12:02 AM
    Hey, having a weird issue. I wanted to start from scratch with Metabase ingestion. I deleted everything with:
    Copy code
    datahub delete --entity_type dashboard --platform metabase --hard
    datahub delete --entity_type chart --platform metabase --hard
    Then I reingested, watched the logs pick up everything, but the UI doesn't show Metabase assets anywhere. Help?
  • m

    many-helicopter-71451

    02/10/2023, 12:24 AM
    Hi all, we use dbt-core with AWS MWAA (Managed Airflow). I've ingested our Glue jobs/tables as well as Redshift. Now I'd like to add dbt and I'm not finding how to do this. Can anyone please help? TIA, Chris
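    dbt ingestion reads the artifacts a dbt run produces (manifest.json, catalog.json) rather than talking to MWAA directly. A minimal sketch, with placeholder paths and the warehouse that dbt materializes into as target_platform:
    Copy code
    from datahub.ingestion.run.pipeline import Pipeline

    # The paths below are placeholders; they would point at the dbt target/
    # artifacts your MWAA runs produce (e.g. synced to a shared location).
    pipeline = Pipeline.create(
        {
            "source": {
                "type": "dbt",
                "config": {
                    "manifest_path": "/path/to/target/manifest.json",
                    "catalog_path": "/path/to/target/catalog.json",
                    "target_platform": "redshift",  # warehouse dbt runs against
                },
            },
            "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
        }
    )
    pipeline.run()
    pipeline.raise_from_status()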
  • s

    salmon-spring-51500

    02/10/2023, 1:08 AM
    Hi, how does the Kafka Connect source connector work?
    ✅ 1
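    The kafka-connect source works by querying the Kafka Connect REST API for connector configurations and mapping them to DataHub pipelines and lineage. A minimal sketch, with the Connect URL and credentials as placeholders:
    Copy code
    from datahub.ingestion.run.pipeline import Pipeline

    # Pulls connector definitions from the Kafka Connect REST API and emits
    # the corresponding data jobs/lineage to DataHub.
    pipeline = Pipeline.create(
        {
            "source": {
                "type": "kafka-connect",
                "config": {
                    "connect_uri": "http://localhost:8083",  # Connect REST endpoint (placeholder)
                    "username": "admin",
                    "password": "example",
                },
            },
            "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
        }
    )
    pipeline.run()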
  • b

    best-planet-6756

    02/10/2023, 1:31 AM
    Hi All, I am ingesting and profiling an Oracle DB and get this error after some time:
    Copy code
    Task exited with return code Negsignal.SIGSEGV
    Searching for the error, the usual advice is to increase resources on the cluster, but monitoring the resources on GKE I don't see anything out of the ordinary. Any advice?
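    A SIGSEGV during profiling often reflects memory pressure inside the profiler process rather than overall cluster limits; one mitigation is to dial the profiler back. A hedged sketch of the relevant knobs (connection details are placeholders, and whether this resolves the crash depends on the real cause):
    Copy code
    from datahub.ingestion.run.pipeline import Pipeline

    # Reduce profiler load: table-level stats only, skip expensive metrics,
    # fewer parallel workers. Connection details are placeholders.
    pipeline = Pipeline.create(
        {
            "source": {
                "type": "oracle",
                "config": {
                    "host_port": "oracle-host:1521",
                    "service_name": "ORCLPDB1",
                    "username": "datahub",
                    "password": "example",
                    "profiling": {
                        "enabled": True,
                        "profile_table_level_only": True,
                        "turn_off_expensive_profiling_metrics": True,
                        "max_workers": 2,
                    },
                },
            },
            "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
        }
    )
    pipeline.run()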
  • f

    fierce-baker-1392

    02/10/2023, 2:19 AM
    Hi team, I want to use UI ingestion to ingest a business glossary. How do I upload and configure the glossary file? (We use Kubernetes to deploy DataHub.) Is there any usage example to share? Thanks.
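    Glossary ingestion goes through the datahub-business-glossary file source, which reads a glossary YAML from a path visible to the executor; on Kubernetes that usually means mounting the file (for example via a ConfigMap) into the actions/executor pod. A minimal sketch with a placeholder mount path:
    Copy code
    from datahub.ingestion.run.pipeline import Pipeline

    # The glossary file follows the business glossary YAML format from the
    # DataHub docs; /mnt/glossary/business_glossary.yml is a placeholder path.
    pipeline = Pipeline.create(
        {
            "source": {
                "type": "datahub-business-glossary",
                "config": {"file": "/mnt/glossary/business_glossary.yml"},
            },
            "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
        }
    )
    pipeline.run()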
  • g

    great-notebook-53658

    02/10/2023, 6:31 AM
    Hi, DataHub newbie here. I have managed to get table lineage between S3 and Snowflake but unfortunately, I do not see column level lineage between S3 and Snowflake. I have already set “upstream_lineage_in_report” and “include_column_lineage” to true using the snowflake recipe. May I know if it is possible at all to get this S3-Snowflake lineage using the snowflake ingestion? Thanks!
    ✅ 1
  • b

    billions-family-12217

    02/10/2023, 7:08 AM
    Copy code
    datahub-ingestion-cron:
      enabled: true
      crons:
        mysql:
          schedule: "0 * * * *" # Every hour
          recipe:
            configmapName: recipe-config
            fileName: mysql_recipe.yml
    This is not working. Can anyone help me out?
    ✅ 1
  • r

    ripe-eye-60209

    02/10/2023, 8:51 AM
    https://github.com/datahub-project/datahub/issues/7307
    ✅ 1
  • s

    square-football-37770

    02/10/2023, 8:57 AM
    Hi! Loving datahub! Surely this has come up before, is there a way to set SQLAlchemy's
    timeout
    ? If not, I can use filters to ingest databases from a server in batches, but by increasing the timeout I could ingest all the DBs on a server in one go. Thanks!
    ✅ 2
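    The SQLAlchemy-based sources accept an options block that is passed straight to create_engine, so driver-level timeouts can be set there. A hedged sketch (the connect_timeout / pool_timeout keys below are driver- and pool-dependent assumptions, not DataHub-specific settings):
    Copy code
    from datahub.ingestion.run.pipeline import Pipeline

    # "options" is forwarded to SQLAlchemy create_engine(); "connect_args"
    # goes to the underlying DBAPI driver (pymysql accepts connect_timeout).
    pipeline = Pipeline.create(
        {
            "source": {
                "type": "mysql",  # any SQLAlchemy-based source works the same way
                "config": {
                    "host_port": "dbserver:3306",  # placeholder
                    "username": "datahub",
                    "password": "example",
                    "options": {
                        "connect_args": {"connect_timeout": 300},  # driver-dependent key
                        "pool_timeout": 300,                       # SQLAlchemy pool setting
                    },
                },
            },
            "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
        }
    )
    pipeline.run()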
  • a

    agreeable-cricket-61480

    02/10/2023, 10:04 AM
    Hi, I am trying to schedule the metadata ingestion from Snowflake to DataHub through the CLI. This is the code I have seen for scheduling using Kubernetes, from https://datahubproject.io/docs/metadata-ingestion/schedule_docs/kubernetes:
    Copy code
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: recipe-config
    data:
      mysql_recipe.yml: |-
        source:
          type: mysql
          config:
            # Coordinates
            host_port: <MYSQL HOST>:3306
            database: dbname
            # Credentials
            username: root
            password: example
        sink:
          type: datahub-rest
          config:
            server: http://<GMS HOST>:8080
    In the above code, what should I place in the place of the "|-" after data: mysql_recipe.yml:?
    Copy code
    datahub-ingestion-cron:
      enabled: true
      crons:
        mysql:
          schedule: "0 * * * *" # Every hour
          recipe:
            configmapName: recipe-config
            fileName: snowflake.dhub.yml
    Also, in the above values.yaml they have used mysql:; as I am ingesting data from Snowflake, should I rename it to snowflake: or not? Help me schedule the metadata ingestion from Snowflake to DataHub through the CLI. I am using Windows.
    ✅ 1
    👀 1
  • q

    quick-megabyte-61846

    02/10/2023, 11:44 AM
    Hello, sorry if this is the wrong channel. I've stumbled upon this in an actions recipe:
    Authorization: "Basic ${DATAHUB_SYSTEM_CLIENT_ID:-__datahub_system}:${DATAHUB_SYSTEM_CLIENT_SECRET:-JohnSnowKnowsNothing}"
    And this in the
    acryl-datahub-action
    pod deployed by helm chart
    DATAHUB_SYSTEM_CLIENT_ID
    Copy code
    The internal system id that is used to communicate with DataHub GMS. Required if metadata_service_authentication is 'true'.
    Where should I find this internal system id?
    ✅ 1
  • b

    bitter-evening-61050

    02/10/2023, 12:36 PM
    Hi, I tried to connect DataHub to Databricks with Unity Catalog. It shows the connection is successful, but I am not able to get any events or tables. I have followed all the rules while implementing Unity Catalog.
    Copy code
    source:
      type: unity-catalog
      config:
        workspace_url: 'https://xxxx.azuredatabricks.net/'
        include_table_lineage: true
        include_column_lineage: true
        token: xxxx
    sink:
      type: datahub-rest
      config:
        server: http://xxx
        token: xxxx
    Error: {datahub.ingestion.source.unity.proxy:140} - Metastores not found
    Cli report: {'cli_version': '0.9.5', 'cli_entry_location': 'xxxxxx', 'py_version': 'xxx', 'py_exec_path': 'xxxx', 'os_details': 'xxxx', 'mem_info': '70.18 MB'}
    Source (unity-catalog) report: {'events_produced': '0', 'events_produced_per_sec': '0', 'event_ids': [], 'warnings': {}, 'failures': {}, 'soft_deleted_stale_entities': [], 'scanned_metastore': '0', 'scanned_catalog': '0', 'scanned_schema': '0', 'scanned_table': '0', 'num_catalogs_to_scan': {}, 'num_schemas_to_scan': {}, 'num_tables_to_scan': {}, 'tables_scanned': '0', 'views_scanned': '0', 'filtered': [], 'start_time': '2023-02-10 180402.472903 (2.26 seconds ago).', 'running_time': '2.26 seconds'}
    Sink (datahub-rest) report: {'total_records_written': '0', 'records_written_per_second': '0', 'warnings': [], 'failures': [], 'start_time': '2023-02-10 180400.948255 (3.79 seconds ago).', 'current_time': '2023-02-10 180404.736336 (now).', 'total_duration_in_seconds': '3.79', 'gms_version': 'v0.9.5', 'pending_requests': '0'}
    Pipeline finished successfully; produced 0 events in 2.26 seconds.
  • l

    lemon-scooter-69730

    02/10/2023, 4:18 PM
    Hi, I'm trying to reach the DataHub GMS REST endpoint so that I can emit lineage data from Airflow, but I keep getting "not found"... it looks something like this:
    default backend - 404
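    A "default backend - 404" response typically means the request is reaching an ingress default backend rather than GMS itself. A small sketch for checking the GMS URL from Python before wiring it into the Airflow lineage setup (URL and token are placeholders; depending on the chart setup GMS may be exposed directly on port 8080 or proxied under /api/gms on the frontend):
    Copy code
    from datahub.emitter.rest_emitter import DatahubRestEmitter

    # Point this at the GMS service; adjust host/path for your ingress.
    emitter = DatahubRestEmitter(
        gms_server="http://datahub-gms:8080",  # placeholder URL
        token=None,  # set if metadata service authentication is enabled
    )
    # Fails loudly if the endpoint is not actually GMS.
    print(emitter.test_connection())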
  • q

    quiet-beach-32846

    02/10/2023, 6:56 PM
    Hi - I'm getting started with DataHub and Snowflake and getting the ingest up and running. I've found that some initial testing databases we created in Snowflake have table names that are invalid and cause the entire DataHub ingest to fail. While I can use the Filter section to allow/deny ingestion of content, is there a flag for the ingest in general to ignore objects that cannot be accessed/ingested instead of failing the entire ingest run?
    ✅ 1
  • s

    salmon-spring-51500

    02/10/2023, 7:14 PM
    Hi - while configuring kafka-connect, how do I provide a JWT token instead of a username and password?
    thank you 1
  • w

    white-horse-97256

    02/10/2023, 10:12 PM
    Hi, is there column-level lineage for a MySQL database?
  • a

    agreeable-cricket-61480

    02/13/2023, 5:32 AM
    Hi, I have ingested metadata from Snowflake to DataHub through the CLI. I have deleted a few tables in Snowflake but can still see them in DataHub. I re-ran the command python3 -m datahub ingest -c Desktop/example.dhub.yml, but the tables are not updating after the changes I have made. Let me know what I am doing wrong.
    Copy code
    use POC_DATAHUB_GX_DB;
    use warehouse POC_DATAHUB_GX_WH;
    grant role datahub_role to user IPSERVICE;
    grant operate, usage on warehouse POC_DATAHUB_GX_WH to role datahub_role;
    grant usage on DATABASE POC_DATAHUB_GX_DB to role datahub_role;
    grant usage on DATABASE ADQT_METADATA to role datahub_role;
    grant usage on all schemas in database POC_DATAHUB_GX_DB to role datahub_role;
    grant usage on future schemas in database POC_DATAHUB_GX_DB to role datahub_role;
    grant references on all tables in database POC_DATAHUB_GX_DB to role datahub_role;
    grant references on future tables in database POC_DATAHUB_GX_DB to role datahub_role;
    grant references on all external tables in database POC_DATAHUB_GX_DB to role datahub_role;
    grant references on future external tables in database POC_DATAHUB_GX_DB to role datahub_role;
    grant references on all views in database POC_DATAHUB_GX_DB to role datahub_role;
    grant references on future views in database POC_DATAHUB_GX_DB to role datahub_role;
    // If you ARE using Snowflake Profiling or Classification feature: Grant select privileges to your tables
    grant select on all tables in database POC_DATAHUB_GX_DB to role datahub_role;
    grant select on future tables in database POC_DATAHUB_GX_DB to role datahub_role;
    grant select on all external tables in database POC_DATAHUB_GX_DB to role datahub_role;
    grant select on future external tables in database POC_DATAHUB_GX_DB to role datahub_role;
    // Create a new DataHub user and assign the DataHub role to it
    create or replace user datahub_user display_name = 'DataHub' password='' default_role = datahub_role default_warehouse = 'POC_DATAHUB_GX_WH';
    // Grant the datahub_role to the new DataHub user.
    grant role datahub_role to user datahub_user;
    grant usage on schema POC_DATAHUB_GX_DB.RAW to role datahub_role;
    grant all on user ipservice to role datahub_role;
    Copy code
    source:
      type: snowflake
      config:
        # This option is recommended to be used to ingest all lineage
        ignore_start_time_lineage: true
        # Coordinates
        account_id: "xyz.east-us-2.azure"
        warehouse: "POC_DATAHUB_GX_WH"
        # Credentials
        username: "xyzxyz"
        password: "xyzxyz"
        role: "datahub_role"
        profiling:
          # Change to false to disable profiling
          enabled: false
          # This option is recommended to reduce profiling time and costs.
          turn_off_expensive_profiling_metrics: false
    # Default sink is datahub-rest and doesn't need to be configured
    sink:
      type: datahub-rest
      config:
        server: :xyz
        token: token
    👀 1
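    Re-running an ingest does not remove entities for tables that were dropped in the source unless stale-entity removal is enabled via stateful ingestion. A minimal sketch of the relevant flags (credentials are placeholders; stateful ingestion also needs a pipeline_name so the state is keyed per pipeline):
    Copy code
    from datahub.ingestion.run.pipeline import Pipeline

    # With stateful ingestion on, entities seen in the previous run but missing
    # from the current one are soft-deleted ("stale metadata removal").
    pipeline = Pipeline.create(
        {
            "source": {
                "type": "snowflake",
                "config": {
                    "account_id": "xyz.east-us-2.azure",
                    "warehouse": "POC_DATAHUB_GX_WH",
                    "username": "user",      # placeholder
                    "password": "password",  # placeholder
                    "role": "datahub_role",
                    "stateful_ingestion": {
                        "enabled": True,
                        "remove_stale_metadata": True,
                    },
                },
            },
            "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
            "pipeline_name": "snowflake_poc",  # required for stateful ingestion
        }
    )
    pipeline.run()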
  • i

    incalculable-manchester-41314

    02/13/2023, 10:05 AM
    Hi, is it possible to split (partition) a file into smaller parts before saving?
  • b

    bitter-evening-61050

    02/13/2023, 10:23 AM
    Hi , I tried to connect datahub to databricks with unity catalog .It is showing the connection is successful but not able to get any event or tables. I have followed all the rules while implementing the unity catalog . source: type: unity-catalog config: workspace_url: 'https://xxxx.azuredatabricks.net/' include_table_lineage: true include_column_lineage: true token: xxxx sink: type: datahub-rest config: server: http://xxx token: xxxx Error: {datahub.ingestion.source.unity.proxy:140} - Metastores not found Cli report: {'cli_version': '0.9.5', 'cli_entry_location': 'xxxxxx', 'py_version': 'xxx', 'py_exec_path': 'xxxx', 'os_details': 'xxxx', 'mem_info': '70.18 MB'} Source (unity-catalog) report: {'events_produced': '0', 'events_produced_per_sec': '0', 'event_ids': [], 'warnings': {}, 'failures': {}, 'soft_deleted_stale_entities': [], 'scanned_metastore': '0', 'scanned_catalog': '0', 'scanned_schema': '0', 'scanned_table': '0', 'num_catalogs_to_scan': {}, 'num_schemas_to_scan': {}, 'num_tables_to_scan': {}, 'tables_scanned': '0', 'views_scanned': '0', 'filtered': [], 'start_time': '2023-02-10 180402.472903 (2.26 seconds ago).', 'running_time': '2.26 seconds'} Sink (datahub-rest) report: {'total_records_written': '0', 'records_written_per_second': '0', 'warnings': [], 'failures': [], 'start_time': '2023-02-10 180400.948255 (3.79 seconds ago).', 'current_time': '2023-02-10 180404.736336 (now).', 'total_duration_in_seconds': '3.79', 'gms_version': 'v0.9.5', 'pending_requests': '0'} Pipeline finished successfully; produced 0 events in 2.26 seconds. (edited) Debug logs: 2023-02-10 184727,737] DEBUG {datahub.telemetry.telemetry:208} - Sending init Telemetry [2023-02-10 184728,964] DEBUG {datahub.telemetry.telemetry:241} - Sending Telemetry [2023-02-10 184729,486] INFO {datahub.cli.ingest_cli:165} - DataHub CLI version: 0.9.5 [2023-02-10 184731,161] DEBUG {datahub.ingestion.sink.datahub_rest:116} - Setting env variables to override config [2023-02-10 184731,162] DEBUG {datahub.ingestion.sink.datahub_rest:118} - Setting gms config [2023-02-10 184731,166] DEBUG {datahub.ingestion.run.pipeline:178} - Sink type datahub-rest (<class 'datahub.ingestion.sink.datahub_rest.DatahubRestSink'>) configured [2023-02-10 184731,177] INFO {datahub.ingestion.run.pipeline:179} - Sink configured successfully. 
DataHubRestEmitter: configured to talk to http://xxxx with token: xxxxx [2023-02-10 184731,749] DEBUG {datahub.ingestion.sink.datahub_rest:116} - Setting env variables to override config [2023-02-10 184731,749] DEBUG {datahub.ingestion.sink.datahub_rest:118} - Setting gms config [2023-02-10 184731,750] DEBUG {datahub.ingestion.reporting.datahub_ingestion_run_summary_provider:123} - Ingestion source urn = urnlidataHubIngestionSource:cli-xxxxx [2023-02-10 184731,751] DEBUG {datahub.emitter.rest_emitter:250} - Attempting to emit to DataHub GMS; using curl equivalent to: curl -X POST -H 'User-Agent: python-requests/2.28.2' -H 'Accept-Encoding: gzip, deflate' -H 'Accept: */*' -H 'Connection: keep-alive' -H 'X-RestLi-Protocol-Version: 2.0.0' -H 'Content-Type: application/json' -H 'Authorization: Bearer xxxx' --data '{"proposal": {"entityType": "dataHubIngestionSource", "entityUrn": "urnlidataHubIngestionSource:cli-xxxxx", "changeType": "UPSERT", "aspectName": "dataHubIngestionSourceInfo", "aspect": {"value": "{\"name\": \"[CLI] unity-catalog\", \"type\": \"unity-catalog\", \"platform\": \"urnlidataPlatform:unknown\", \"config\": {\"recipe\": \"{\\\"source\\\": {\\\"type\\\": \\\"unity-catalog\\\", \\\"config\\\": {\\\"workspace_url\\\": \\\"https://xxxx/\\\", \\\"token\\\": \\\"********\\\", \\\"include_table_lineage\\\": true, \\\"include_column_lineage\\\": true, \\\"schema_pattern\\\": {\\\"deny\\\": [\\\"information_schema\\\"]}}}, \\\"sink\\\": {\\\"type\\\": \\\"datahub-rest\\\", \\\"config\\\": {\\\"server\\\": \\\"http://xxxx\\\", \\\"token\\\": \\\"********\\\"}}}\", \"version\": \"0.9.5\", \"executorId\": \"__datahub_cli_\"}}", "contentType": "application/json"}}}' 'http://xxxx/aspects?action=ingestProposal' [2023-02-10 184732,047] DEBUG {datahub.ingestion.run.pipeline:253} - Reporter type:datahub,<class 'datahub.ingestion.reporting.datahub_ingestion_run_summary_provider.DatahubIngestionRunSummaryProvider'> configured. [2023-02-10 184732,154] DEBUG {datahub.ingestion.run.pipeline:195} - Source type unity-catalog (<class 'datahub.ingestion.source.unity.source.UnityCatalogSource'>) configured [2023-02-10 184732,155] INFO {datahub.ingestion.run.pipeline:196} - Source configured successfully. 
[2023-02-10 184732,160] INFO {datahub.cli.ingest_cli:120} - Starting metadata ingestion -[2023-02-10 184732,176] DEBUG {datahub.emitter.rest_emitter:250} - Attempting to emit to DataHub GMS; using curl equivalent to: curl -X POST -H 'User-Agent: python-requests/2.28.2' -H 'Accept-Encoding: gzip, deflate' -H 'Accept: */*' -H 'Connection: keep-alive' -H 'X-RestLi-Protocol-Version: 2.0.0' -H 'Content-Type: application/json' -H 'Authorization: Bearer xxxx' --data '{"proposal": {"entityType": "dataHubExecutionRequest", "entityUrn": "urnlidataHubExecutionRequest:unity-catalog-2023_02_10-18_47_29", "changeType": "UPSERT", "aspectName": "dataHubExecutionRequestInput", "aspect": {"value": "{\"task\": \"CLI Ingestion\", \"args\": {\"recipe\": \"{\\\"source\\\": {\\\"type\\\": \\\"unity-catalog\\\", \\\"config\\\": {\\\"workspace_url\\\": \\\"https://xxxx/\\\", \\\"token\\\": \\\"********\\\", \\\"include_table_lineage\\\": true, \\\"include_column_lineage\\\": true, \\\"schema_pattern\\\": {\\\"deny\\\": [\\\"information_schema\\\"]}}}, \\\"sink\\\": {\\\"type\\\": \\\"datahub-rest\\\", \\\"config\\\": {\\\"server\\\": \\\"http://xxxx\\\", \\\"token\\\": \\\"********\\\"}}}\", \"version\": \"0.9.5\"}, \"executorId\": \"__datahub_cli_\", \"source\": {\"type\": \"CLI_INGESTION_SOURCE\", \"ingestionSource\": \"urnlidataHubIngestionSource:cli-xxx\"}, \"requestedAt\": 1676035052176}", "contentType": "application/json"}}}' 'http://xxxx/aspects?action=ingestProposal' /[2023-02-10 184733,497] INFO {datahub.ingestion.source.unity.proxy:140} - Metastores not found [2023-02-10 184733,503] DEBUG {datahub.emitter.rest_emitter:250} - Attempting to emit to DataHub GMS; using curl equivalent to: curl -X POST -H 'User-Agent: python-requests/2.28.2' -H 'Accept-Encoding: gzip, deflate' -H 'Accept: */*' -H 'Connection: keep-alive' -H 'X-RestLi-Protocol-Version: 2.0.0' -H 'Content-Type: application/json' -H 'Authorization: Bearer xxxx' --data '{"proposal": {"entityType": "dataHubExecutionRequest", "entityUrn": "urnlidataHubExecutionRequest:unity-catalog-2023_02_10-18_47_29", "changeType": "UPSERT", "aspectName": "dataHubExecutionRequestResult", "aspect": {"value": "{\"status\": \"SUCCESS\", \"report\": \"{\\n \\\"cli\\\": {\\n \\\"cli_version\\\": \\\"0.9.5\\\",\\n \\\"cli_entry_location\\\": \\\"C:\\\\\\\\Users\\\\\\\\xxxxx\\\\\\\\AppData\\\\\\\\Local\\\\\\\\Programs\\\\\\\\Python\\\\\\\\Python311\\\\\\\\Lib\\\\\\\\site-packages\\\\\\\\datahub\\\\\\\\__init__.py\\\",\\n \\\"py_version\\\": \\\"3.11.2 xxx]\\\",\\n \\\"py_exec_path\\\": \\\"C:\\\\\\\\Users\\\\\\\\xxx\\\\\\\\AppData\\\\\\\\Local\\\\\\\\Programs\\\\\\\\Python\\\\\\\\Python311\\\\\\\\python.exe\\\",\\n \\\"os_details\\\": \\\"xxxx\\\",\\n \\\"mem_info\\\": \\\"69.22 MB\\\"\\n },\\n \\\"source\\\": {\\n \\\"type\\\": \\\"unity-catalog\\\",\\n \\\"report\\\": {\\n \\\"events_produced\\\": \\\"0\\\",\\n \\\"events_produced_per_sec\\\": \\\"0\\\",\\n \\\"event_ids\\\": [],\\n \\\"warnings\\\": {},\\n \\\"failures\\\": {},\\n \\\"soft_deleted_stale_entities\\\": [],\\n \\\"scanned_metastore\\\": \\\"0\\\",\\n \\\"scanned_catalog\\\": \\\"0\\\",\\n \\\"scanned_schema\\\": \\\"0\\\",\\n \\\"scanned_table\\\": \\\"0\\\",\\n \\\"num_catalogs_to_scan\\\": {},\\n \\\"num_schemas_to_scan\\\": {},\\n \\\"num_tables_to_scan\\\": {},\\n \\\"tables_scanned\\\": \\\"0\\\",\\n \\\"views_scanned\\\": \\\"0\\\",\\n \\\"filtered\\\": [],\\n \\\"start_time\\\": \\\"2023-02-10 184732.153634 (1.35 seconds ago).\\\",\\n \\\"running_time\\\": \\\"1.35 seconds\\\"\\n }\\n },\\n 
\\\"sink\\\": {\\n \\\"type\\\": \\\"datahub-rest\\\",\\n \\\"report\\\": {\\n \\\"total_records_written\\\": \\\"0\\\",\\n \\\"records_written_per_second\\\": \\\"0\\\",\\n \\\"warnings\\\": [],\\n \\\"failures\\\": [],\\n \\\"start_time\\\": \\\"2023-02-10 184730.637389 (2.86 seconds ago).\\\",\\n \\\"current_time\\\": \\\"2023-02-10 184733.500759 (now).\\\",\\n \\\"total_duration_in_seconds\\\": \\\"2.86\\\",\\n \\\"gms_version\\\": \\\"v0.9.5\\\",\\n \\\"pending_requests\\\": \\\"0\\\"\\n }\\n }\\n}\", \"startTimeMs\": 1676035051751, \"durationMs\": 1749}", "contentType": "application/json"}}}' 'http://xxxx/aspects?action=ingestProposal' |[2023-02-10 184733,777] INFO {datahub.cli.ingest_cli:133} - Finished metadata ingestion [2023-02-10 184733,778] DEBUG {datahub.telemetry.telemetry:241} - Sending Telemetry - Cli report: {'cli_version': '0.9.5', 'cli_entry_location': 'xxx', 'py_version': '3.11.2 (xxx', 'py_exec_path': 'xxx', 'os_details': 'xxx', 'mem_info': '69.35 MB'} Source (unity-catalog) report: {'events_produced': '0', 'events_produced_per_sec': '0', 'event_ids': [], 'warnings': {}, 'failures': {}, 'soft_deleted_stale_entities': [], 'scanned_metastore': '0', 'scanned_catalog': '0', 'scanned_schema': '0', 'scanned_table': '0', 'num_catalogs_to_scan': {}, 'num_schemas_to_scan': {}, 'num_tables_to_scan': {}, 'tables_scanned': '0', 'views_scanned': '0', 'filtered': [], 'start_time': '2023-02-10 184732.153634 (2.1 seconds ago).', 'running_time': '2.1 seconds'} Sink (datahub-rest) report: {'total_records_written': '0', 'records_written_per_second': '0', 'warnings': [], 'failures': [], 'start_time': '2023-02-10 184730.637389 (3.62 seconds ago).', 'current_time': '2023-02-10 184734.260004 (now).', 'total_duration_in_seconds': '3.62', 'gms_version': 'v0.9.5', 'pending_requests': '0'} Pipeline finished successfully; produced 0 events in 2.1 seconds. [2023-02-10 184734,440] DEBUG {datahub.upgrade.upgrade:264} - Version stats found: server=ServerVersionStats(current=VersionStats(version=<Version('0.9.5')>, release_date=None), latest=VersionStats(version=<Version('0.10.0')>, release_date=datetime.datetime(2023, 2, 7, 21, 16, 15, tzinfo=datetime.timezone.utc)), current_server_type='prod') client=ClientVersionStats(current=VersionStats(version=<Version('0.9.5')>, release_date=datetime.datetime(2022, 12, 23, 20, 59, 55)), latest=VersionStats(version=<Version('0.10.0')>, release_date=datetime.datetime(2023, 2, 8, 15, 40, 32))) [2023-02-10 184734,445] DEBUG {datahub.telemetry.telemetry:241} - Sending Telemetry (edited)
  • c

    clean-doctor-27061

    02/13/2023, 3:31 PM
    Hi team, I am using the DatahubRestEmitter to emit DataFlow and DataJob entities to our local DataHub from Airflow, but I keep getting the error below.
    Copy code
    datahub.configuration.common.OperationalError: ('Unable to emit metadata to DataHub GMS', {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:500]: org.apache.kafka.common.errors.SerializationException: Error registering Avro schema: {"type":"record","name":"MetadataChangeLog","namespace":"com.linkedin.pegasus2avro.mxe","doc":"Kafka event for capturing update made to an entity\'s metadata.","fields":[{"name":"auditHeader","type":["null",{"type":"record","name":"KafkaAuditHeader","namespace":"com.linkedin.events","doc":"This header records information about the context of an event as it is emitted into kafka and is intended to be used by the kafka audit application.
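    The "Error registering Avro schema" in that stack trace is raised inside GMS when it forwards the change event to Kafka, so it usually points at GMS's schema-registry connection rather than at the emitter call itself. For reference, a minimal DataFlow emission sketch looks like the following (server, orchestrator, and flow id are placeholders):
    Copy code
    import datahub.emitter.mce_builder as builder
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import DataFlowInfoClass

    # Minimal DataFlow emission via the REST emitter.
    emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

    flow_urn = builder.make_data_flow_urn(
        orchestrator="airflow", flow_id="my_dag", cluster="prod"
    )
    mcp = MetadataChangeProposalWrapper(
        entityUrn=flow_urn,
        aspect=DataFlowInfoClass(name="my_dag"),
    )
    emitter.emit(mcp)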