# ingestion
  • j

    jolly-book-3043

    11/18/2022, 9:38 PM
    i’m kicking off manual ingestions in the UI using a local quickstart, and it says
    Successfully submitted ingestion execution request!
    but nothing is triggered. i’ve restarted all containers and still no dice. any tips?
    g
    • 2
    • 7
  • r

    red-waitress-53338

    11/20/2022, 12:13 AM
    Can someone please help me out with this? I have been stuck on it for quite a long time now. I am running datahub-gms and the frontend locally inside Docker containers, and exposed them using port forwarding to ports 8080 and 9002 respectively. I am able to curl the frontend but not GMS. I ran the following command:
    curl --location --request POST 'http://localhost:8080/entities?action=search' \
    --header 'X-RestLi-Protocol-Version: 2.0.0' \
    --header 'Content-Type: application/json' \
    --data-raw '{
        "input": "*",
        "entity": "dataset",
        "start": 0,
        "count": 1000
    }'
    Getting the error:
    {
      "exceptionClass": "com.linkedin.restli.server.RestLiServiceException",
      "stackTrace": "com.linkedin.restli.server.RestLiServiceException [HTTP Status:404]: No root resource defined for path '/entities'\n\tat com.linkedin.restli.server.RestLiServiceException.fromThrowable(RestLiServiceException.java:315)\n\tat com.linkedin.restli.server.BaseRestLiServer.buildPreRoutingError(BaseRestLiServer.java:158)\n\tat com.linkedin.restli.server.RestRestLiServer.buildPreRoutingRestException(RestRestLiServer.java:203)\n\tat com.linkedin.restli.server.RestRestLiServer.handleResourceRequest(RestRestLiServer.java:177)\n\tat com.linkedin.restli.server.RestRestLiServer.doHandleRequest(RestRestLiServer.java:164)\n\tat com.linkedin.restli.server.RestRestLiServer.handleRequest(RestRestLiServer.java:120)\n\tat com.linkedin.restli.server.RestLiServer.handleRequest(RestLiServer.java:132)\n\tat com.linkedin.restli.server.DelegatingTransportDispatcher.handleRestRequest(DelegatingTransportDispatcher.java:70)\n\tat com.linkedin.r2.filter.transport.DispatcherRequestFilter.onRestRequest(DispatcherRequestFilter.java:70)\n\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:72)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n\tat com.linkedin.r2.filter.TimedNextFilter.onRequest(TimedNextFilter.java:55)\n\tat com.linkedin.r2.filter.transport.ServerQueryTunnelFilter.onRestRequest(ServerQueryTunnelFilter.java:58)\n\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:72)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n\tat com.linkedin.r2.filter.TimedNextFilter.onRequest(TimedNextFilter.java:55)\n\tat com.linkedin.r2.filter.message.rest.RestFilter.onRestRequest(RestFilter.java:50)\n\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:72)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n\tat com.linkedin.r2.filter.FilterChainImpl.onRestRequest(FilterChainImpl.java:96)\n\tat com.linkedin.r2.filter.transport.FilterChainDispatcher.handleRestRequest(FilterChainDispatcher.java:75)\n\tat com.linkedin.r2.util.finalizer.RequestFinalizerDispatcher.handleRestRequest(RequestFinalizerDispatcher.java:61)\n\tat com.linkedin.r2.transport.http.server.HttpDispatcher.handleRequest(HttpDispatcher.java:101)\n\tat com.linkedin.r2.transport.http.server.AbstractR2Servlet.service(AbstractR2Servlet.java:105)\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n\tat com.linkedin.restli.server.spring.ParallelRestliHttpRequestHandler.handleRequest(ParallelRestliHttpRequestHandler.java:61)\n\tat org.springframework.web.context.support.HttpRequestHandlerServlet.service(HttpRequestHandlerServlet.java:73)\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n\tat 
org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:852)\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:544)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:536)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1581)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1307)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:482)\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1549)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1204)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)\n\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.Server.handle(Server.java:494)\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:374)\n\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:268)\n\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)\n\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)\n\tat org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)\n\tat org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:367)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:782)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:918)\n\tat java.lang.Thread.run(Thread.java:748)\nCaused by: com.linkedin.restli.server.RoutingException: No root resource defined for path '/entities'\n\tat com.linkedin.restli.internal.server.RestLiRouter.process(RestLiRouter.java:139)\n\tat com.linkedin.restli.server.BaseRestLiServer.getRoutingResult(BaseRestLiServer.java:139)\n\tat com.linkedin.restli.server.RestRestLiServer.handleResourceRequest(RestRestLiServer.java:173)\n\t... 62 more\n",
      "message": "No root resource defined for path '/entities'",
      "status": 404
    }
    m
    b
    • 3
    • 74
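    A minimal sketch for checking whether port 8080 actually reaches GMS; a 404 with "No root resource defined for path '/entities'" usually means the request landed on something other than datahub-gms. The localhost URLs and the /config health endpoint below are assumptions based on a default quickstart-style setup:

    import json
    import requests

    GMS = "http://localhost:8080"  # assumed GMS address, per the port forwarding described above

    # If this does not return GMS build/config info, port 8080 is probably not
    # forwarded to datahub-gms (e.g. another service is bound to it).
    print(requests.get(f"{GMS}/config", timeout=10).json())

    # The same search request as the curl above, against the Rest.li /entities resource.
    resp = requests.post(
        f"{GMS}/entities?action=search",
        headers={
            "X-RestLi-Protocol-Version": "2.0.0",
            "Content-Type": "application/json",
        },
        data=json.dumps({"input": "*", "entity": "dataset", "start": 0, "count": 1000}),
        timeout=30,
    )
    print(resp.status_code, resp.text[:500])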
  • l

    late-ability-59580

    11/20/2022, 2:22 PM
    Hi all! My ingested S3 metadata appears in the UI as multiple datasets, one per file. How can I specify a prefix (part of the path) that should be treated as a single dataset containing all of those files?
    d
    • 2
    • 10
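    A minimal recipe sketch for the S3 question above, run through the Python Pipeline API; the bucket name and layout are placeholders. The key idea is the {table} marker in path_specs.include, which the s3 source uses to fold every file under that prefix into a single dataset:

    from datahub.ingestion.run.pipeline import Pipeline

    # Hypothetical layout: s3://my-bucket/data/<dataset_name>/<files>.parquet
    pipeline = Pipeline.create(
        {
            "source": {
                "type": "s3",
                "config": {
                    "path_specs": [
                        # Everything under data/<table>/ becomes one dataset named <table>.
                        {"include": "s3://my-bucket/data/{table}/*.parquet"}
                    ],
                    "aws_config": {"aws_region": "us-east-1"},
                },
            },
            "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
        }
    )
    pipeline.run()
    pipeline.raise_from_status()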
  • b

    billowy-pilot-93812

    11/21/2022, 4:58 AM
    Hi team, do DataHub's policies support letting a group of users see only the information in a specific domain? For example, the marketing team would only be allowed to see datasets and dashboards that belong to the marketing domain. If yes, how can I do this?
    m
    • 2
    • 2
  • r

    rich-policeman-92383

    11/21/2022, 6:37 AM
    Hi, please help me with this ingestion problem. We have two ingestion recipes. Recipe A covers the complete Oracle DB with stateful ingestion; Recipe B covers a specific schema in the same Oracle DB. When Recipe A runs, it overwrites the columns for tables ingested by Recipe B. DataHub version: v0.8.45. DataHub CLI version: 0.9.1.
    m
    g
    • 3
    • 6
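    Not a fix for the overwrite behaviour itself, but a common workaround is to make the two recipes non-overlapping via schema patterns and to give each one its own pipeline_name for stateful ingestion. A sketch with placeholder connection details and schema names (assumptions, not taken from the actual recipes above):

    from datahub.ingestion.run.pipeline import Pipeline

    # Recipe A: the full Oracle DB, excluding the schema that Recipe B owns.
    recipe_a = {
        "pipeline_name": "oracle_full",  # stateful ingestion keys its state off this name
        "source": {
            "type": "oracle",
            "config": {
                "host_port": "oracle-host:1521",
                "service_name": "ORCLPDB1",  # placeholder
                "username": "datahub",
                "password": "...",
                "schema_pattern": {"deny": ["^SCHEMA_B$"]},  # leave SCHEMA_B to Recipe B
                "stateful_ingestion": {"enabled": True},
            },
        },
        "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
    }

    Pipeline.create(recipe_a).run()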
  • m

    modern-artist-55754

    11/21/2022, 7:49 AM
    Question regarding stateful ingestion: where is the state stored? If my ingestion job runs in an ephemeral container, would that affect the state file?
    d
    g
    • 3
    • 3
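    On the question above: by default the stateful-ingestion checkpoint is not a local file; it is written back to the DataHub backend (via the datahub checkpointing provider) keyed by pipeline_name, so an ephemeral container is fine as long as pipeline_name stays stable between runs. A hedged sketch with placeholder source details:

    from datahub.ingestion.run.pipeline import Pipeline

    recipe = {
        # The checkpoint state is stored server-side under this name, so keep it
        # constant across runs even if the container running the job is ephemeral.
        "pipeline_name": "my_oracle_ingestion",
        "source": {
            "type": "oracle",  # placeholder source and connection details
            "config": {
                "host_port": "oracle-host:1521",
                "username": "datahub",
                "password": "...",
                "stateful_ingestion": {"enabled": True, "remove_stale_metadata": True},
            },
        },
        "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
    }

    Pipeline.create(recipe).run()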
  • m

    millions-carpet-50697

    11/21/2022, 8:03 AM
    Hello everyone. I have been struggling with ingesting the data catalog from Spark for a while. We use a Spark session to run processing in AWS Glue 3.0 (Python). When I tried to ingest the Spark catalog following https://datahubspace.slack.com/archives/CUMUWQU66/p1651750160522829?thread_ts=1651624835.071429&cid=CUMUWQU66, two errors were raised, as below. 1. NullPointerException
    2022-11-21 03:10:16,113 ERROR [spark-listener-group-shared] spark.DatahubSparkListener (DatahubSparkListener.java:onOtherEvent(273)): java.lang.NullPointerException
    	at datahub.spark.DatasetExtractor.lambda$static$6(DatasetExtractor.java:147)
    	at datahub.spark.DatasetExtractor.asDataset(DatasetExtractor.java:237)
    	at datahub.spark.DatahubSparkListener$SqlStartTask.run(DatahubSparkListener.java:114)
    	at datahub.spark.DatahubSparkListener.processExecution(DatahubSparkListener.java:350)
    	at datahub.spark.DatahubSparkListener.onOtherEvent(DatahubSparkListener.java:262)
    	at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:100)
    	at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
    	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
    	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
    	at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
    	at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
    	at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
    	at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
    	at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:12)
    	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
    	at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
    	at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
    	at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1381)
    	at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)
    2. UnsatisfiedLinkError
    Exception in thread "map-output-dispatcher-0" java.lang.UnsatisfiedLinkError: com.github.luben.zstd.Zstd.setCompressionLevel(JI)I
    	at com.github.luben.zstd.Zstd.setCompressionLevel(Native Method)
    	at com.github.luben.zstd.ZstdOutputStream.<init>(ZstdOutputStream.java:67)
    	at org.apache.spark.io.ZStdCompressionCodec.compressedOutputStream(CompressionCodec.scala:223)
    	at org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:903)
    	at org.apache.spark.ShuffleStatus.$anonfun$serializedMapStatus$2(MapOutputTracker.scala:233)
    	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
    	at org.apache.spark.ShuffleStatus.withWriteLock(MapOutputTracker.scala:72)
    	at org.apache.spark.ShuffleStatus.serializedMapStatus(MapOutputTracker.scala:230)
    	at org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:466)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    	at java.lang.Thread.run(Thread.java:750)
    And I found a GitHub issue that exactly matches the error I hit, but it seems it will not be fixed: https://github.com/datahub-project/datahub/issues/5979. Could anyone help? Thanks a lot.
    ✅ 1
    a
    • 2
    • 4
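    For reference on the Spark thread above, the lineage listener is normally wired onto the SparkSession roughly like this (a sketch; the jar version and GMS URL are placeholders, and this does not by itself address the NullPointerException in DatasetExtractor that the linked GitHub issue tracks):

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("glue-datahub-lineage")
        # Placeholder coordinates; match the jar version to your DataHub release.
        .config("spark.jars.packages", "io.acryl:datahub-spark-lineage:0.9.3")
        .config("spark.extraListeners", "datahub.spark.DatahubSparkListener")
        .config("spark.datahub.rest.server", "http://datahub-gms:8080")
        .getOrCreate()
    )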
  • c

    cool-tiger-42613

    11/21/2022, 8:43 AM
    Hello, is there example code for pushing historical job runs like this example here? https://demo.datahubproject.io/tasks/urn:li:dataJob:(urn:li:dataFlow:(airflow,datahub_an[…]s_refresh,prod),run_data_task)/Runs?is_lineage_mode=false I found this example, but when I run it the Time is set to the current datetime rather than the value I give as input. Can I get help on this?
    ✅ 1
    g
    • 2
    • 5
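    A rough sketch of emitting a historical run with explicit timestamps using the SDK's DataProcessInstance helper; the import paths, the env/cluster parameter, and the flow/job ids below are assumptions (my best recollection of the library example), so verify them against the example you found. The point is to pass start_timestamp_millis / end_timestamp_millis yourself instead of letting them default to "now":

    from datahub.api.entities.datajob import DataFlow, DataJob
    from datahub.api.entities.dataprocess.dataprocess_instance import (
        DataProcessInstance,
        InstanceRunResult,
    )
    from datahub.emitter.rest_emitter import DatahubRestEmitter

    emitter = DatahubRestEmitter("http://localhost:8080")

    # Hypothetical flow/job ids; depending on the SDK version the DataFlow argument
    # may be named cluster instead of env.
    flow = DataFlow(orchestrator="airflow", id="my_flow", env="prod")
    job = DataJob(id="run_data_task", flow_urn=flow.urn)

    # Historical run: explicit epoch-millis timestamps rather than the current time.
    start_ms = 1667293200000  # 2022-11-01 09:00:00 UTC
    end_ms = start_ms + 15 * 60 * 1000

    dpi = DataProcessInstance.from_datajob(datajob=job, id="backfill_2022_11_01")
    dpi.emit_process_start(emitter, start_timestamp_millis=start_ms)
    dpi.emit_process_end(emitter, end_timestamp_millis=end_ms, result=InstanceRunResult.SUCCESS)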
  • a

    average-baker-96343

    11/21/2022, 8:48 AM
    Hello, after setting up ingestion in the UI and configuring a schedule, error messages appear when it runs.
    exec-urn_li_dataHubExecutionRequest_d44e3334-45b5-4e8c-81bc-1c6fce16823c.log
    ✅ 1
    m
    g
    • 3
    • 6
  • a

    average-baker-96343

    11/21/2022, 8:53 AM
    This is what my UI shows.
  • c

    creamy-pizza-80433

    11/21/2022, 11:44 AM
    Hello everyone, I have tried to ingest Hive with profiling enabled, but I got these error messages. Does anyone know what the problem might be?
    ✅ 1
    g
    l
    • 3
    • 10
  • m

    mammoth-gigabyte-6392

    11/21/2022, 12:48 PM
    Hello everyone, can anyone please help with data profiling for the S3 source?
    ✅ 1
    h
    • 2
    • 7
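    A sketch of enabling profiling on the s3 source (bucket and path are placeholders); note that data-lake profiling runs through Spark/PyDeequ, so the environment executing the recipe needs Spark available:

    from datahub.ingestion.run.pipeline import Pipeline

    recipe = {
        "source": {
            "type": "s3",
            "config": {
                "path_specs": [{"include": "s3://my-bucket/data/{table}/*.parquet"}],
                "aws_config": {"aws_region": "us-east-1"},
                # Data-lake profiling is Spark/PyDeequ based; this flag turns it on.
                "profiling": {"enabled": True},
            },
        },
        "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
    }

    Pipeline.create(recipe).run()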
  • r

    rich-policeman-92383

    11/21/2022, 1:09 PM
    Hi, is there a way to increase ingestion performance? With the current configuration it takes a day or more to ingest 14K tables from Hive.
    ✅ 1
    d
    m
    • 3
    • 2
  • h

    high-hospital-85984

    11/21/2022, 5:33 PM
    I want to ingest schema data with a pipeline, but enrich it afterwards with, say, a glossary term via the REST API. What is the best way of doing this? One way is to use editableSchemaMetadata, but then we basically remove the ability to make UI edits (as they might get overwritten). It is not a huge deal for us, but the approach feels like a hack. Any better ideas?
    ✅ 1
    g
    • 2
    • 1
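    For the enrichment question above, the usual alternative to writing editableSchemaMetadata is to emit the glossaryTerms aspect from a small script after the pipeline run, along the lines of the SDK's add-term example; the dataset and term URNs below are placeholders:

    from datahub.emitter.mce_builder import make_dataset_urn, make_term_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        AuditStampClass,
        GlossaryTermAssociationClass,
        GlossaryTermsClass,
    )

    emitter = DatahubRestEmitter("http://localhost:8080")

    dataset_urn = make_dataset_urn(platform="snowflake", name="db.schema.table", env="PROD")

    # Dataset-level glossary terms. Emitting this aspect overwrites it, so fetch and
    # merge any existing terms first if they need to be preserved.
    terms_aspect = GlossaryTermsClass(
        terms=[GlossaryTermAssociationClass(urn=make_term_urn("Classification.Sensitive"))],
        auditStamp=AuditStampClass(time=0, actor="urn:li:corpuser:ingestion"),
    )

    emitter.emit(MetadataChangeProposalWrapper(entityUrn=dataset_urn, aspect=terms_aspect))

    Field-level terms, on the other hand, do go through schema metadata, which is why they collide with UI edits as described above.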
  • l

    lively-dusk-19162

    11/21/2022, 6:09 PM
    Hello, I am facing the following issue when doing hard delete.
    ✅ 1
    g
    g
    • 3
    • 5
  • l

    lively-dusk-19162

    11/21/2022, 6:16 PM
    Even though there is no metadata, it is still showing me 370 empty records.
    g
    • 2
    • 1
  • l

    little-breakfast-38102

    11/21/2022, 6:33 PM
    Hi Team, we are using a custom datahub-actions image for MSSQL ingestion. We were able to ingest metadata successfully, but starting today we are running into a crashloop error (we did not make any recent changes to the image). Attaching the error from the actions pod as well as the cmd and entrypoint sections from the Dockerfile. Appreciate any help.
    ✅ 1
    g
    b
    • 3
    • 8
  • l

    lemon-musician-50603

    11/21/2022, 8:04 PM
    Hi Team, I have an Excel file with all the metadata details. Can you please help with how to ingest it into DataHub? Which source do I need to use?
    👍 1
    ✅ 1
    a
    g
    • 3
    • 19
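    One hedged option for the Excel question above: there is no Excel source, but if the spreadsheet can be exported to CSV, the csv-enricher source can apply descriptions, owners, tags, and glossary terms from it to entities that already exist in DataHub (it does not create brand-new datasets). The file path and write semantics below are placeholders; check the csv-enricher docs for the exact column layout:

    from datahub.ingestion.run.pipeline import Pipeline

    recipe = {
        "source": {
            "type": "csv-enricher",
            "config": {
                # Export the Excel sheet to CSV first; the expected columns include the
                # entity resource urn plus description/owners/tags/glossary_terms fields.
                "filename": "./metadata_details.csv",
                "write_semantics": "PATCH",
            },
        },
        "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
    }

    Pipeline.create(recipe).run()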
  • m

    miniature-painting-28571

    11/21/2022, 10:19 PM
    hi - what scanner or scanner functionality is being used to find PII data with DataHub? Or is that something on the roadmap for the future?
    ✅ 1
    m
    m
    • 3
    • 28
  • m

    miniature-painting-28571

    11/21/2022, 10:19 PM
    Thanks for your replies
  • m

    modern-artist-55754

    11/22/2022, 1:05 AM
    Question regarding column-level lineage: We use dbt for Snowflake transformations. Some of the dbt models require temp table creation, which means the derived model is not a direct child of the source table, so no lineage is shown; I can only see table-level lineage between the _dbt_tmp table and the destination table. Is there any way to get around that? Before column-level lineage, we just suppressed the _dbt_tmp tables completely and used dbt for the table lineage, but now with column-level lineage the _dbt_tmp tables show up again, and column-level lineage is not working because dbt is not supported yet.
    ✅ 1
    g
    • 2
    • 18
  • a

    ancient-policeman-73437

    11/22/2022, 7:23 AM
    Just a reminder: we have a bug in Looker ingestion 0.9.x. Could somebody help, please?
    a
    • 2
    • 9
  • l

    limited-forest-73733

    11/22/2022, 7:45 AM
    Hey team, I am not able to see any table-level lineage. This is the recipe that I am using. Can anyone please help me out? Thanks in advance.
    ✅ 1
    g
    • 2
    • 5
  • s

    steep-family-13549

    11/22/2022, 10:16 AM
    I have integrated Great Expectations with DataHub, but I get this error: "Datasource my_postgres_db is not present in platform_instance_map". Can anyone help me out? "my_postgres_db" is the name of the data source in great_expectations.yml. Please let me know if you find something.
    ✅ 1
    h
    • 2
    • 20
  • k

    kind-scientist-44426

    11/22/2022, 10:57 AM
    Hi All, I’m trying to ingest the metadata from BQ, but I’m getting the error below:
    [2022-11-22 10:46:51,215] ERROR    {datahub.ingestion.source.bigquery_v2.bigquery:557} - Traceback (most recent call last):
      File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/bigquery_v2/bigquery.py", line 554, in _process_project
        yield from self._process_schema(conn, project_id, bigquery_dataset)
      File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/bigquery_v2/bigquery.py", line 604, in _process_schema
        yield from self._process_table(conn, table, project_id, dataset_name)
      File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/bigquery_v2/bigquery.py", line 650, in _process_table
        for wu in table_workunits:
      File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/bigquery_v2/bigquery.py", line 720, in gen_table_dataset_workunits
        custom_properties["time_partitioning"] = str(str(table.time_partitioning))
      File "/usr/local/lib/python3.10/site-packages/google/cloud/bigquery/table.py", line 2659, in __repr__
        key_vals = ["{}={}".format(key, val) for key, val in self._key()]
      File "/usr/local/lib/python3.10/site-packages/google/cloud/bigquery/table.py", line 2635, in _key
        properties["type_"] = repr(properties.pop("type"))
    KeyError: 'type'
    
    [2022-11-22 10:46:51,215] ERROR    {datahub.ingestion.source.bigquery_v2.bigquery:558} - Unable to get tables for dataset DB in project project51, skipping. The error was: 'type'
    Can someone help with this?
    ✅ 1
    a
    • 2
    • 1
  • d

    dazzling-park-96517

    11/22/2022, 11:57 AM
    Hi all, I have an issue with the Trino ingestion process, using the latest version of DataHub. For the recipe I followed the docs. The ingestion ends in “succeeded” status, but an error is raised and only the schemas are imported, not the table metadata. Here is part of the error:
    '--- Logging error ---\n'
               'Traceback (most recent call last):\n'
               '  File "/tmp/datahub/ingest/venv-trino-0.9.1/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 362, in run\n'
               '    for record_envelope in self.transform(record_envelopes):\n'
               '  File "/tmp/datahub/ingest/venv-trino-0.9.1/lib/python3.10/site-packages/datahub/ingestion/extractor/mce_extractor.py", line 76, in '
               'get_records\n'
               '    raise ValueError(\n'
               "ValueError: source produced an invalid metadata work unit: MetadataChangeEventClass({'auditHeader': None, 'proposedSnapshot': "
               "DatasetSnapshotClass({'urn': 'urn:li:dataset:(urn:li:dataPlatform:trino,catalog.schema
    .test_table
    ,PROD)', 'aspects': "
               "[StatusClass({'removed': False}), DatasetPropertiesClass({'customProperties': {'table_name': 'test_table', 'comment': None}, "
               "'externalUrl': None, 'name': 'test_table
    ', 'qualifiedName': None, 'description': None, 'uri': None, 'tags': []}), "
               "SchemaMetadataClass({'schemaName': 'catalog
    .schema.test_table', 'platform': 'urn:li:dataPlatform:trino', 'version': 0, "
               "'created': AuditStampClass({'time': 0, 'actor': 'urn:li:corpuser:unknown', 'impersonator': None, 'message': None}), 'lastModified': "
               "AuditStampClass({'time': 0, 'actor': 'urn:li:corpuser:unknown', 'impersonator': None, 'message': None}), 'deleted': None, 'dataset': "
               "None, 'cluster': None, 'hash': '', 'platformSchema': MySqlDDLClass({'tableSchema': ''}), 'fields': [SchemaFieldClass({'fieldPath': "
               "'name', 'jsonPath': None, 'nullable': True, 'description': None, 'label': None, 'created': None, 'lastModified': None, 'type': "
               "SchemaFieldDataTypeClass({'type': StringTypeClass({})}), 'nativeDataType': 'VARCHAR()', 'recursive': False, 'globalTags': None, "
               "'glossaryTerms': None, 'isPartOfKey': False, 'isPartitioningKey': None, 'jsonProps': None}), SchemaFieldClass({'fieldPath': 'lastname', "
               "'jsonPath': None, 'nullable': True, 'description': None, 'label': None, 'created': None, 'lastModified': None, 'type': "
    
    …………
    '  File "/usr/local/lib/python3.10/logging/__init__.py", line 368, in getMessage\n'
               '    msg = msg % self.args\n'
               'TypeError: not all arguments converted during string formatting\n'
               'Call stack:\n'
    Seems like the data cannot be read correctly… Thanks in advance for answers and tips.
    ✅ 1
    h
    • 2
    • 8
  • a

    alert-fall-82501

    11/22/2022, 12:12 PM
    Hi Team - I am working with the DataHub custom actions framework. My question is: is it possible to see info about metadata changes in the DataHub UI itself? There should be a tab/button where we can see the changes. If yes, what would be the process? TIA!
    ✅ 1
    f
    • 2
    • 5
  • c

    colossal-smartphone-90274

    11/22/2022, 3:42 PM
    Hello everyone, I would like to add on-premise Active Directory data to DataHub; is this a planned feature? Currently, the only option is to use online AD; however, I cannot use that option as I don't have access to a tenant or a client ID. Thanks 🙂
    ✅ 1
    a
    • 2
    • 3
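    For the on-prem Active Directory question above, the generic ldap source may already cover this, since AD speaks LDAP and needs no Azure tenant or client ID; the connection values below are placeholders, so check the ldap source docs for the exact fields:

    from datahub.ingestion.run.pipeline import Pipeline

    recipe = {
        "source": {
            "type": "ldap",
            "config": {
                # Placeholder values for an on-prem AD domain controller.
                "ldap_server": "ldap://ad.corp.example.com",
                "ldap_user": "CN=datahub,OU=ServiceAccounts,DC=corp,DC=example,DC=com",
                "ldap_password": "...",
                "base_dn": "DC=corp,DC=example,DC=com",
                "filter": "(objectClass=*)",
            },
        },
        "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
    }

    Pipeline.create(recipe).run()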
  • h

    happy-notebook-43808

    11/22/2022, 9:29 PM
    Hello everyone! Just started using DataHub Self Hosted and ingested a table from MS SQL Server using ODBC 17 (pyodbc). I installed acryl-datahub[mssql] before the ingest. I have enabled many of the profiling settings as shown below in my recipe.yml. Distinct count, distinct %, and standard deviation are the only ones that never get populated. Please let me know if you may be able to help with this issue.
    source:
      type: mssql
      config:
        
    ...
    
        # Options
        use_odbc: "True"
        uri_args:
          driver: "ODBC Driver 17 for SQL Server"
          Encrypt: "yes"
          TrustServerCertificate: "Yes"
          ssl: "True"
    
        profiling:
          enabled: true
          limit: 100000
          report_dropped_profiles: false
          profile_table_level_only: false  
          include_field_null_count: true   
          include_field_min_value: true
          include_field_max_value: true
          include_field_mean_value: true
          include_field_median_value: true
          include_field_stddev_value: true
          include_field_quantiles: true
          include_field_distinct_value_frequencies: true
          include_field_sample_values: true
          turn_off_expensive_profiling_metrics: false
          include_field_histogram: true
          catch_exceptions: false
          max_workers: 4
          query_combiner_enabled: true
          max_number_of_fields_to_profile: 100
          profile_if_updated_since_days: null
          partition_profiling_enabled: false
    👍 1
    ✅ 1
    h
    • 2
    • 1
  • b

    bland-lighter-26751

    11/22/2022, 11:53 PM
    Hi everyone! I'm beginning to trial DataHub for my org and need some help with Metabase ingestion. I see that the plugin is in beta, but has anyone gotten it to work when the backend DB is MySQL? There are some optional configuration options that might help me connect, but the documentation is pretty bare. The only fields I am using are: connect_uri, password, username. Here is the error I am getting:
    exec-urn_li_dataHubExecutionRequest_564473b4-eb76-4334-9c7f-79f1a300698f.log
    ✅ 1
    m
    • 2
    • 3