# ingestion
  • j

    jolly-book-3043

    11/18/2022, 9:38 PM
    i’m kicking off manual ingestions in the UI using a local quickstart, and it says
    Successfully submitted ingestion execution request!
    but nothing is triggered. i’ve restarted all containers and still no dice. any tips?
    g
    • 2
    • 7
  • r

    red-waitress-53338

    11/20/2022, 12:13 AM
    Can someone please help me out with this? I have been stuck on it for quite a long time now. I am running datahub-gms and the frontend locally inside Docker containers, and exposed them using port forwarding to ports 8080 and 9002 respectively. I am able to curl the frontend but not GMS. I ran the following command:
    curl --location --request POST 'http://localhost:8080/entities?action=search' \
    --header 'X-RestLi-Protocol-Version: 2.0.0' \
    --header 'Content-Type: application/json' \
    --data-raw '{
        "input": "*",
        "entity": "dataset",
        "start": 0,
        "count": 1000
    }'
    Getting the error:
    {
      "exceptionClass": "com.linkedin.restli.server.RestLiServiceException",
      "stackTrace": "com.linkedin.restli.server.RestLiServiceException [HTTP Status:404]: No root resource defined for path '/entities'\n\tat com.linkedin.restli.server.RestLiServiceException.fromThrowable(RestLiServiceException.java:315)\n\tat com.linkedin.restli.server.BaseRestLiServer.buildPreRoutingError(BaseRestLiServer.java:158)\n\tat com.linkedin.restli.server.RestRestLiServer.buildPreRoutingRestException(RestRestLiServer.java:203)\n\tat com.linkedin.restli.server.RestRestLiServer.handleResourceRequest(RestRestLiServer.java:177)\n\tat com.linkedin.restli.server.RestRestLiServer.doHandleRequest(RestRestLiServer.java:164)\n\tat com.linkedin.restli.server.RestRestLiServer.handleRequest(RestRestLiServer.java:120)\n\tat com.linkedin.restli.server.RestLiServer.handleRequest(RestLiServer.java:132)\n\tat com.linkedin.restli.server.DelegatingTransportDispatcher.handleRestRequest(DelegatingTransportDispatcher.java:70)\n\tat com.linkedin.r2.filter.transport.DispatcherRequestFilter.onRestRequest(DispatcherRequestFilter.java:70)\n\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:72)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n\tat com.linkedin.r2.filter.TimedNextFilter.onRequest(TimedNextFilter.java:55)\n\tat com.linkedin.r2.filter.transport.ServerQueryTunnelFilter.onRestRequest(ServerQueryTunnelFilter.java:58)\n\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:72)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n\tat com.linkedin.r2.filter.TimedNextFilter.onRequest(TimedNextFilter.java:55)\n\tat com.linkedin.r2.filter.message.rest.RestFilter.onRestRequest(RestFilter.java:50)\n\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:72)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n\tat com.linkedin.r2.filter.FilterChainImpl.onRestRequest(FilterChainImpl.java:96)\n\tat com.linkedin.r2.filter.transport.FilterChainDispatcher.handleRestRequest(FilterChainDispatcher.java:75)\n\tat com.linkedin.r2.util.finalizer.RequestFinalizerDispatcher.handleRestRequest(RequestFinalizerDispatcher.java:61)\n\tat com.linkedin.r2.transport.http.server.HttpDispatcher.handleRequest(HttpDispatcher.java:101)\n\tat com.linkedin.r2.transport.http.server.AbstractR2Servlet.service(AbstractR2Servlet.java:105)\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n\tat com.linkedin.restli.server.spring.ParallelRestliHttpRequestHandler.handleRequest(ParallelRestliHttpRequestHandler.java:61)\n\tat org.springframework.web.context.support.HttpRequestHandlerServlet.service(HttpRequestHandlerServlet.java:73)\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n\tat 
org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:852)\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:544)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:536)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1581)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1307)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:482)\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1549)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1204)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)\n\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.Server.handle(Server.java:494)\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:374)\n\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:268)\n\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)\n\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)\n\tat org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)\n\tat org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:367)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:782)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:918)\n\tat java.lang.Thread.run(Thread.java:748)\nCaused by: com.linkedin.restli.server.RoutingException: No root resource defined for path '/entities'\n\tat com.linkedin.restli.internal.server.RestLiRouter.process(RestLiRouter.java:139)\n\tat com.linkedin.restli.server.BaseRestLiServer.getRoutingResult(BaseRestLiServer.java:139)\n\tat com.linkedin.restli.server.RestRestLiServer.handleResourceRequest(RestRestLiServer.java:173)\n\t... 62 more\n",
      "message": "No root resource defined for path '/entities'",
      "status": 404
    }
    m
    b
    • 3
    • 74
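    A minimal sketch for checking whether port 8080 actually reaches GMS; a 404 with "No root resource defined for path '/entities'" usually means the request landed on something other than datahub-gms. The localhost URLs and the /config health endpoint below are assumptions based on a default quickstart-style setup:

    import json
    import requests

    GMS = "http://localhost:8080"  # assumed GMS address, per the port forwarding described above

    # If this does not return GMS build/config info, port 8080 is probably not
    # forwarded to datahub-gms (e.g. another service is bound to it).
    print(requests.get(f"{GMS}/config", timeout=10).json())

    # The same search request as the curl above, against the Rest.li /entities resource.
    resp = requests.post(
        f"{GMS}/entities?action=search",
        headers={
            "X-RestLi-Protocol-Version": "2.0.0",
            "Content-Type": "application/json",
        },
        data=json.dumps({"input": "*", "entity": "dataset", "start": 0, "count": 1000}),
        timeout=30,
    )
    print(resp.status_code, resp.text[:500])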
  • l

    late-ability-59580

    11/20/2022, 2:22 PM
    Hi all! My ingested S3 metadata appears in the UI as multiple datasets, one per file. How can I specify a prefix (part of the path) that should be treated as a single dataset containing all of those files?
    d
    • 2
    • 10
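    A minimal recipe sketch for the S3 question above, run through the Python Pipeline API; the bucket name and layout are placeholders. The key idea is the {table} marker in path_specs.include, which the s3 source uses to fold every file under that prefix into a single dataset:

    from datahub.ingestion.run.pipeline import Pipeline

    # Hypothetical layout: s3://my-bucket/data/<dataset_name>/<files>.parquet
    pipeline = Pipeline.create(
        {
            "source": {
                "type": "s3",
                "config": {
                    "path_specs": [
                        # Everything under data/<table>/ becomes one dataset named <table>.
                        {"include": "s3://my-bucket/data/{table}/*.parquet"}
                    ],
                    "aws_config": {"aws_region": "us-east-1"},
                },
            },
            "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
        }
    )
    pipeline.run()
    pipeline.raise_from_status()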
  • b

    billowy-pilot-93812

    11/21/2022, 4:58 AM
    Hi team, do DataHub's policies support letting a group of users see only the information in a specific domain? For example, the marketing team would only be allowed to see datasets and dashboards that belong to the marketing domain. If yes, how can I do this?
    m
    • 2
    • 2
  • r

    rich-policeman-92383

    11/21/2022, 6:37 AM
    Hi, please help me with this ingestion problem. We have two ingestion recipes. Recipe A covers the complete Oracle DB with stateful ingestion; Recipe B covers a specific schema in the same Oracle DB. When Recipe A runs, it overwrites the columns for tables ingested by Recipe B. DataHub version: v0.8.45. DataHub CLI version: 0.9.1.
    m
    g
    • 3
    • 6
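    Not a fix for the overwrite behaviour itself, but a common workaround is to make the two recipes non-overlapping via schema patterns and to give each one its own pipeline_name for stateful ingestion. A sketch with placeholder connection details and schema names (assumptions, not taken from the actual recipes above):

    from datahub.ingestion.run.pipeline import Pipeline

    # Recipe A: the full Oracle DB, excluding the schema that Recipe B owns.
    recipe_a = {
        "pipeline_name": "oracle_full",  # stateful ingestion keys its state off this name
        "source": {
            "type": "oracle",
            "config": {
                "host_port": "oracle-host:1521",
                "service_name": "ORCLPDB1",  # placeholder
                "username": "datahub",
                "password": "...",
                "schema_pattern": {"deny": ["^SCHEMA_B$"]},  # leave SCHEMA_B to Recipe B
                "stateful_ingestion": {"enabled": True},
            },
        },
        "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
    }

    Pipeline.create(recipe_a).run()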
  • m

    modern-artist-55754

    11/21/2022, 7:49 AM
    Question regarding stateful ingestion: where is the state stored? If my ingestion job runs in an ephemeral container, would that affect the state file?
    d
    g
    • 3
    • 3
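    On the question above: by default the stateful-ingestion checkpoint is not a local file; it is written back to the DataHub backend (via the datahub checkpointing provider) keyed by pipeline_name, so an ephemeral container is fine as long as pipeline_name stays stable between runs. A hedged sketch with placeholder source details:

    from datahub.ingestion.run.pipeline import Pipeline

    recipe = {
        # The checkpoint state is stored server-side under this name, so keep it
        # constant across runs even if the container running the job is ephemeral.
        "pipeline_name": "my_oracle_ingestion",
        "source": {
            "type": "oracle",  # placeholder source and connection details
            "config": {
                "host_port": "oracle-host:1521",
                "username": "datahub",
                "password": "...",
                "stateful_ingestion": {"enabled": True, "remove_stale_metadata": True},
            },
        },
        "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
    }

    Pipeline.create(recipe).run()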
  • m

    millions-carpet-50697

    11/21/2022, 8:03 AM
    Hello everyone. I have been struggling with ingesting the data catalog from Spark for a while. We use a Spark session to run processing in AWS Glue 3.0 (Python). When I tried to ingest the Spark catalog following https://datahubspace.slack.com/archives/CUMUWQU66/p1651750160522829?thread_ts=1651624835.071429&cid=CUMUWQU66, two errors were raised, as below. 1. NullPointerException
    2022-11-21 03:10:16,113 ERROR [spark-listener-group-shared] spark.DatahubSparkListener (DatahubSparkListener.java:onOtherEvent(273)): java.lang.NullPointerException
    	at datahub.spark.DatasetExtractor.lambda$static$6(DatasetExtractor.java:147)
    	at datahub.spark.DatasetExtractor.asDataset(DatasetExtractor.java:237)
    	at datahub.spark.DatahubSparkListener$SqlStartTask.run(DatahubSparkListener.java:114)
    	at datahub.spark.DatahubSparkListener.processExecution(DatahubSparkListener.java:350)
    	at datahub.spark.DatahubSparkListener.onOtherEvent(DatahubSparkListener.java:262)
    	at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:100)
    	at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
    	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
    	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
    	at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
    	at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
    	at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
    	at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
    	at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:12)
    	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
    	at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
    	at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
    	at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1381)
    	at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)
    2. UnsatisfiedLinkError
    Exception in thread "map-output-dispatcher-0" java.lang.UnsatisfiedLinkError: com.github.luben.zstd.Zstd.setCompressionLevel(JI)I
    	at com.github.luben.zstd.Zstd.setCompressionLevel(Native Method)
    	at com.github.luben.zstd.ZstdOutputStream.<init>(ZstdOutputStream.java:67)
    	at org.apache.spark.io.ZStdCompressionCodec.compressedOutputStream(CompressionCodec.scala:223)
    	at org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:903)
    	at org.apache.spark.ShuffleStatus.$anonfun$serializedMapStatus$2(MapOutputTracker.scala:233)
    	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
    	at org.apache.spark.ShuffleStatus.withWriteLock(MapOutputTracker.scala:72)
    	at org.apache.spark.ShuffleStatus.serializedMapStatus(MapOutputTracker.scala:230)
    	at org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:466)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    	at java.lang.Thread.run(Thread.java:750)
    And I found a GitHub issue that exactly matches the error I hit, but it seems it will not be fixed: https://github.com/datahub-project/datahub/issues/5979. Could anyone help? Thanks a lot.
    ✅ 1
    a
    • 2
    • 4
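    For reference on the Spark thread above, the lineage listener is normally wired onto the SparkSession roughly like this (a sketch; the jar version and GMS URL are placeholders, and this does not by itself address the NullPointerException in DatasetExtractor that the linked GitHub issue tracks):

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("glue-datahub-lineage")
        # Placeholder coordinates; match the jar version to your DataHub release.
        .config("spark.jars.packages", "io.acryl:datahub-spark-lineage:0.9.3")
        .config("spark.extraListeners", "datahub.spark.DatahubSparkListener")
        .config("spark.datahub.rest.server", "http://datahub-gms:8080")
        .getOrCreate()
    )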
  • c

    cool-tiger-42613

    11/21/2022, 8:43 AM
    Hello, is there example code for pushing historical job runs like this example here? https://demo.datahubproject.io/tasks/urn:li:dataJob:(urn:li:dataFlow:(airflow,datahub_an[…]s_refresh,prod),run_data_task)/Runs?is_lineage_mode=false I found this example, but when I run it the Time is set to the current datetime rather than the value I give as input. Can I get help on this?
    ✅ 1
    g
    • 2
    • 5
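    A rough sketch of emitting a historical run with explicit timestamps using the SDK's DataProcessInstance helper; the import paths, the env/cluster parameter, and the flow/job ids below are assumptions (my best recollection of the library example), so verify them against the example you found. The point is to pass start_timestamp_millis / end_timestamp_millis yourself instead of letting them default to "now":

    from datahub.api.entities.datajob import DataFlow, DataJob
    from datahub.api.entities.dataprocess.dataprocess_instance import (
        DataProcessInstance,
        InstanceRunResult,
    )
    from datahub.emitter.rest_emitter import DatahubRestEmitter

    emitter = DatahubRestEmitter("http://localhost:8080")

    # Hypothetical flow/job ids; depending on the SDK version the DataFlow argument
    # may be named cluster instead of env.
    flow = DataFlow(orchestrator="airflow", id="my_flow", env="prod")
    job = DataJob(id="run_data_task", flow_urn=flow.urn)

    # Historical run: explicit epoch-millis timestamps rather than the current time.
    start_ms = 1667293200000  # 2022-11-01 09:00:00 UTC
    end_ms = start_ms + 15 * 60 * 1000

    dpi = DataProcessInstance.from_datajob(datajob=job, id="backfill_2022_11_01")
    dpi.emit_process_start(emitter, start_timestamp_millis=start_ms)
    dpi.emit_process_end(emitter, end_timestamp_millis=end_ms, result=InstanceRunResult.SUCCESS)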
  • a

    average-baker-96343

    11/21/2022, 8:48 AM
    Hello, after setting up ingestion in the UI and configuring a schedule, error messages appear when it runs.
    exec-urn_li_dataHubExecutionRequest_d44e3334-45b5-4e8c-81bc-1c6fce16823c.log
    ✅ 1
    m
    g
    • 3
    • 6
  • a

    average-baker-96343

    11/21/2022, 8:53 AM
    This is what my UI shows.
  • c

    creamy-pizza-80433

    11/21/2022, 11:44 AM
    Hello everyone, I have tried to ingest Hive with profiling enabled, but I got these error messages. Does anyone know what the problem might be?
    ✅ 1
    g
    l
    • 3
    • 10
  • m

    mammoth-gigabyte-6392

    11/21/2022, 12:48 PM
    Hello everyone, can anyone please help with data profiling for the S3 source?
    ✅ 1
    h
    • 2
    • 7
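    A sketch of enabling profiling on the s3 source (bucket and path are placeholders); note that data-lake profiling runs through Spark/PyDeequ, so the environment executing the recipe needs Spark available:

    from datahub.ingestion.run.pipeline import Pipeline

    recipe = {
        "source": {
            "type": "s3",
            "config": {
                "path_specs": [{"include": "s3://my-bucket/data/{table}/*.parquet"}],
                "aws_config": {"aws_region": "us-east-1"},
                # Data-lake profiling is Spark/PyDeequ based; this flag turns it on.
                "profiling": {"enabled": True},
            },
        },
        "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
    }

    Pipeline.create(recipe).run()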
  • r

    rich-policeman-92383

    11/21/2022, 1:09 PM
    Hi, is there a way to increase ingestion performance? With the current configuration it takes a day or more to ingest 14K tables from Hive.
    ✅ 1
    d
    m
    • 3
    • 2
  • h

    high-hospital-85984

    11/21/2022, 5:33 PM
    I want to ingest schema data with a pipeline, but enrich it afterwards with, say, a glossary term via the REST API. What is the best way of doing this? One way is to use editableSchemaMetadata, but then we basically remove the ability to make UI edits (as they might get overwritten). It is not a huge deal for us, but the approach feels like a hack. Any better ideas?
    ✅ 1
    g
    • 2
    • 1
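    For the enrichment question above, the usual alternative to writing editableSchemaMetadata is to emit the glossaryTerms aspect from a small script after the pipeline run, along the lines of the SDK's add-term example; the dataset and term URNs below are placeholders:

    from datahub.emitter.mce_builder import make_dataset_urn, make_term_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        AuditStampClass,
        GlossaryTermAssociationClass,
        GlossaryTermsClass,
    )

    emitter = DatahubRestEmitter("http://localhost:8080")

    dataset_urn = make_dataset_urn(platform="snowflake", name="db.schema.table", env="PROD")

    # Dataset-level glossary terms. Emitting this aspect overwrites it, so fetch and
    # merge any existing terms first if they need to be preserved.
    terms_aspect = GlossaryTermsClass(
        terms=[GlossaryTermAssociationClass(urn=make_term_urn("Classification.Sensitive"))],
        auditStamp=AuditStampClass(time=0, actor="urn:li:corpuser:ingestion"),
    )

    emitter.emit(MetadataChangeProposalWrapper(entityUrn=dataset_urn, aspect=terms_aspect))

    Field-level terms, on the other hand, do go through schema metadata, which is why they collide with UI edits as described above.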
  • l

    lively-dusk-19162

    11/21/2022, 6:09 PM
    Hello, I am facing the following issue when doing hard delete.
    ✅ 1
    g
    g
    • 3
    • 5
  • l

    lively-dusk-19162

    11/21/2022, 6:16 PM
    Even though there is no metadata, it is still showing me 370 empty records.
    g
    • 2
    • 1
  • l

    little-breakfast-38102

    11/21/2022, 6:33 PM
    Hi Team, we are using a custom datahub-actions image for MSSQL ingestion. We were able to ingest metadata successfully, but starting today we are running into a crashloop error (we did not make any recent changes to the image). Attaching the error from the actions pod as well as the cmd and entrypoint sections from the Dockerfile. Appreciate any help.
    ✅ 1
    g
    b
    • 3
    • 8
  • l

    lemon-musician-50603

    11/21/2022, 8:04 PM
    Hi Team, I have an Excel file with all the metadata details. Can you please help with how to ingest it into DataHub? Which source do I need to use?
    👍 1
    ✅ 1
    a
    g
    • 3
    • 19
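    One hedged option for the Excel question above: there is no Excel source, but if the spreadsheet can be exported to CSV, the csv-enricher source can apply descriptions, owners, tags, and glossary terms from it to entities that already exist in DataHub (it does not create brand-new datasets). The file path and write semantics below are placeholders; check the csv-enricher docs for the exact column layout:

    from datahub.ingestion.run.pipeline import Pipeline

    recipe = {
        "source": {
            "type": "csv-enricher",
            "config": {
                # Export the Excel sheet to CSV first; the expected columns include the
                # entity resource urn plus description/owners/tags/glossary_terms fields.
                "filename": "./metadata_details.csv",
                "write_semantics": "PATCH",
            },
        },
        "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
    }

    Pipeline.create(recipe).run()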
  • m

    miniature-painting-28571

    11/21/2022, 10:19 PM
    hi - what scanner or scanner functionality is being used to find PII data with DataHub? Or is that something on the roadmap for the future?
    ✅ 1
    m
    m
    • 3
    • 28
  • m

    miniature-painting-28571

    11/21/2022, 10:19 PM
    Thanks for your replies
  • m

    modern-artist-55754

    11/22/2022, 1:05 AM
    Question regarding column-level lineage: We use dbt for Snowflake transformations. Some of the dbt models require temp table creation, which means the derived model is not a direct child of the source table, so no lineage is shown; I can only see table-level lineage between the _dbt_tmp table and the destination table. Is there any way to get around that? Before column-level lineage, we just suppressed the _dbt_tmp tables completely and used dbt for the table lineage, but now with column-level lineage the _dbt_tmp tables show up again, and column-level lineage is not working because dbt is not supported yet.
    ✅ 1
    g
    • 2
    • 18
  • a

    ancient-policeman-73437

    11/22/2022, 7:23 AM
    Just a reminder: we have a bug in Looker ingestion 0.9.x. Could somebody help, please?
    a
    • 2
    • 9
  • l

    limited-forest-73733

    11/22/2022, 7:45 AM
    Hey team, I am not able to see any table-level lineage. This is the recipe that I am using. Can anyone please help me out? Thanks in advance.
    ✅ 1
    g
    • 2
    • 5
  • s

    steep-family-13549

    11/22/2022, 10:16 AM
    I have integrated Great Expectations with DataHub, but I get this error: "Datasource my_postgres_db is not present in platform_instance_map". Can anyone help me out? "my_postgres_db" is the name of the data source in great_expectations.yml. Please let me know if you find something.
    ✅ 1
    h
    • 2
    • 20
  • k

    kind-scientist-44426

    11/22/2022, 10:57 AM
    Hi All, I’m trying to ingest the metadata from BQ, but I’m getting the error below:
    [2022-11-22 10:46:51,215] ERROR    {datahub.ingestion.source.bigquery_v2.bigquery:557} - Traceback (most recent call last):
      File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/bigquery_v2/bigquery.py", line 554, in _process_project
        yield from self._process_schema(conn, project_id, bigquery_dataset)
      File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/bigquery_v2/bigquery.py", line 604, in _process_schema
        yield from self._process_table(conn, table, project_id, dataset_name)
      File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/bigquery_v2/bigquery.py", line 650, in _process_table
        for wu in table_workunits:
      File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/bigquery_v2/bigquery.py", line 720, in gen_table_dataset_workunits
        custom_properties["time_partitioning"] = str(str(table.time_partitioning))
      File "/usr/local/lib/python3.10/site-packages/google/cloud/bigquery/table.py", line 2659, in __repr__
        key_vals = ["{}={}".format(key, val) for key, val in self._key()]
      File "/usr/local/lib/python3.10/site-packages/google/cloud/bigquery/table.py", line 2635, in _key
        properties["type_"] = repr(properties.pop("type"))
    KeyError: 'type'
    
    [2022-11-22 10:46:51,215] ERROR    {datahub.ingestion.source.bigquery_v2.bigquery:558} - Unable to get tables for dataset DB in project project51, skipping. The error was: 'type'
    Can someone help with this?
    ✅ 1
    a
    • 2
    • 1
  • d

    dazzling-park-96517

    11/22/2022, 11:57 AM
    Hi all, I have an issue with the Trino ingestion process, using the latest version of DataHub. For the recipe I followed the docs. The ingestion ends in “succeeded” status, but an error is raised and only the schemas are imported, not the table metadata. Here is part of the error:
    '--- Logging error ---\n'
               'Traceback (most recent call last):\n'
               '  File "/tmp/datahub/ingest/venv-trino-0.9.1/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 362, in run\n'
               '    for record_envelope in self.transform(record_envelopes):\n'
               '  File "/tmp/datahub/ingest/venv-trino-0.9.1/lib/python3.10/site-packages/datahub/ingestion/extractor/mce_extractor.py", line 76, in '
               'get_records\n'
               '    raise ValueError(\n'
               "ValueError: source produced an invalid metadata work unit: MetadataChangeEventClass({'auditHeader': None, 'proposedSnapshot': "
               "DatasetSnapshotClass({'urn': 'urn:li:dataset:(urn:li:dataPlatform:trino,catalog.schema
    .test_table
    ,PROD)', 'aspects': "
               "[StatusClass({'removed': False}), DatasetPropertiesClass({'customProperties': {'table_name': 'test_table', 'comment': None}, "
               "'externalUrl': None, 'name': 'test_table
    ', 'qualifiedName': None, 'description': None, 'uri': None, 'tags': []}), "
               "SchemaMetadataClass({'schemaName': 'catalog
    .schema.test_table', 'platform': 'urn:li:dataPlatform:trino', 'version': 0, "
               "'created': AuditStampClass({'time': 0, 'actor': 'urn:li:corpuser:unknown', 'impersonator': None, 'message': None}), 'lastModified': "
               "AuditStampClass({'time': 0, 'actor': 'urn:li:corpuser:unknown', 'impersonator': None, 'message': None}), 'deleted': None, 'dataset': "
               "None, 'cluster': None, 'hash': '', 'platformSchema': MySqlDDLClass({'tableSchema': ''}), 'fields': [SchemaFieldClass({'fieldPath': "
               "'name', 'jsonPath': None, 'nullable': True, 'description': None, 'label': None, 'created': None, 'lastModified': None, 'type': "
               "SchemaFieldDataTypeClass({'type': StringTypeClass({})}), 'nativeDataType': 'VARCHAR()', 'recursive': False, 'globalTags': None, "
               "'glossaryTerms': None, 'isPartOfKey': False, 'isPartitioningKey': None, 'jsonProps': None}), SchemaFieldClass({'fieldPath': 'lastname', "
               "'jsonPath': None, 'nullable': True, 'description': None, 'label': None, 'created': None, 'lastModified': None, 'type': "
    
    …………
    '  File "/usr/local/lib/python3.10/logging/__init__.py", line 368, in getMessage\n'
               '    msg = msg % self.args\n'
               'TypeError: not all arguments converted during string formatting\n'
               'Call stack:\n'
    Seems like the data cannot be read correctly… Thanks in advance for answers and tips.
    ✅ 1
    h
    • 2
    • 8
  • a

    alert-fall-82501

    11/22/2022, 12:12 PM
    Hi Team - I am working with the DataHub custom actions framework. My question is: is it possible to see info about metadata changes in the DataHub UI itself? There should be a tab/button where we can see the changes. If yes, what would be the process? TIA!
    ✅ 1
    f
    • 2
    • 5
  • c

    colossal-smartphone-90274

    11/22/2022, 3:42 PM
    Hello everyone, I would like to add on-premise Active Directory data to DataHub; is this a planned feature? Currently, the only option is to use online AD; however, I cannot use that option as I don't have access to a tenant or a client ID. Thanks 🙂
    ✅ 1
    a
    • 2
    • 3
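    For the on-prem Active Directory question above, the generic ldap source may already cover this, since AD speaks LDAP and needs no Azure tenant or client ID; the connection values below are placeholders, so check the ldap source docs for the exact fields:

    from datahub.ingestion.run.pipeline import Pipeline

    recipe = {
        "source": {
            "type": "ldap",
            "config": {
                # Placeholder values for an on-prem AD domain controller.
                "ldap_server": "ldap://ad.corp.example.com",
                "ldap_user": "CN=datahub,OU=ServiceAccounts,DC=corp,DC=example,DC=com",
                "ldap_password": "...",
                "base_dn": "DC=corp,DC=example,DC=com",
                "filter": "(objectClass=*)",
            },
        },
        "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
    }

    Pipeline.create(recipe).run()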
  • h

    happy-notebook-43808

    11/22/2022, 9:29 PM
    Hello everyone! Just started using DataHub Self Hosted and ingested a table from MS SQL Server using ODBC 17 (pyodbc). I installed acryl-datahub[mssql] before the ingest. I have enabled many of the profiling settings as shown below in my recipe.yml. Distinct count, distinct %, and standard deviation are the only ones that never get populated. Please let me know if you may be able to help with this issue.
    source:
      type: mssql
      config:
        
    ...
    
        # Options
        use_odbc: "True"
        uri_args:
          driver: "ODBC Driver 17 for SQL Server"
          Encrypt: "yes"
          TrustServerCertificate: "Yes"
          ssl: "True"
    
        profiling:
          enabled: true
          limit: 100000
          report_dropped_profiles: false
          profile_table_level_only: false  
          include_field_null_count: true   
          include_field_min_value: true
          include_field_max_value: true
          include_field_mean_value: true
          include_field_median_value: true
          include_field_stddev_value: true
          include_field_quantiles: true
          include_field_distinct_value_frequencies: true
          include_field_sample_values: true
          turn_off_expensive_profiling_metrics: false
          include_field_histogram: true
          catch_exceptions: false
          max_workers: 4
          query_combiner_enabled: true
          max_number_of_fields_to_profile: 100
          profile_if_updated_since_days: null
          partition_profiling_enabled: false
    👍 1
    ✅ 1
    h
    • 2
    • 1
  • b

    bland-lighter-26751

    11/22/2022, 11:53 PM
    Hi everyone! I'm beginning to trial DataHub for my org and need some help with Metabase ingestion. I see that the plugin is in beta, but has anyone gotten it to work when the backend DB is MySQL? There are some optional configuration options that might help me connect, but the documentation is pretty bare. The only fields I am using are: connect_uri, password, username. Here is the error I am getting:
    exec-urn_li_dataHubExecutionRequest_564473b4-eb76-4334-9c7f-79f1a300698f.log
    ✅ 1
    m
    • 2
    • 3