# ingestion
  • miniature-airport-96424

    06/17/2021, 1:30 PM
    datahub-gms is restarting
  • important-bird-56181

    06/23/2021, 10:11 AM
    @important-bird-56181 has left the channel
  • fancy-helmet-32669

    06/24/2021, 7:36 PM
    Hi there, here’s a bit of information that may be helpful for other new users of DataHub metadata ingestion. After a successful deployment of DataHub on AWS EKS, I tried the example data ingestion with Docker:
    ./scripts/datahub_docker.sh ingest -c ./examples/recipes/example_to_datahub_rest.yml
    The result was a failure.
  • fancy-helmet-32669

    06/24/2021, 7:36 PM
    But with the CLI everything works smoothly.
    pip install 'acryl-datahub[datahub-rest]' # install the required plugin
    datahub ingest -c ./examples/recipes/example_to_datahub_rest.yml
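For readers hitting the same split (Docker helper script fails, plain CLI works), a stdlib-only sanity check is to confirm the `datahub` binary is actually on PATH in the environment that runs the ingest. This is a hedged sketch; `build_ingest_command` and `datahub_cli_available` are hypothetical helpers, not DataHub APIs:

```python
import shutil

def build_ingest_command(recipe_path: str) -> list:
    """Build the `datahub ingest` invocation for a recipe file."""
    return ["datahub", "ingest", "-c", recipe_path]

def datahub_cli_available() -> bool:
    """True if the `datahub` CLI (installed by acryl-datahub) is on PATH."""
    return shutil.which("datahub") is not None

cmd = build_ingest_command("./examples/recipes/example_to_datahub_rest.yml")
```

If `datahub_cli_available()` is False inside the container but True on the host, that would explain why only the Docker route fails.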
  • loud-island-88694

    06/29/2021, 11:22 AM
    @gray-shoe-75895 @chilly-holiday-80781 ^
  • future-waitress-970

    06/29/2021, 1:52 PM
    Hey, I'm trying to set up DataHub and I'm having the following problem:
  • future-waitress-970

    06/29/2021, 1:53 PM
    ERROR  {datahub.ingestion.run.pipeline:53} - failed to write record with workunit dev.public.tmp with Expecting value: line 1 column 1 (char 0) and info {}
  • future-waitress-970

    06/29/2021, 1:53 PM
    And this happens for 9 different tables.
  • future-waitress-970

    06/29/2021, 1:53 PM
    This is for a Redshift-to-DataHub ingestion.
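For context on this error: "Expecting value: line 1 column 1 (char 0)" is what Python's `json` module raises when asked to parse something that is not JSON, typically an empty body or an HTML/plain-text error page returned in place of the expected GMS response. A minimal illustration (`diagnose_body` is a hypothetical helper, not part of DataHub):

```python
import json

def diagnose_body(body: str) -> str:
    """Classify a raw HTTP response body the way a JSON parse would see it."""
    if not body:
        return "empty body (server returned no content)"
    try:
        json.loads(body)
        return "valid JSON"
    except json.JSONDecodeError as e:
        return "not JSON ({})".format(e)

# An HTML error page from a proxy or ingress reproduces the reported message:
result = diagnose_body("<html>502 Bad Gateway</html>")
```

So the record itself is likely fine; it is the response from the sink endpoint that is worth inspecting.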
  • loud-island-88694

    07/01/2021, 4:48 AM
    @early-lamp-41924 ^
  • cool-iron-6335

    07/02/2021, 9:21 AM
    Hi, I got this error
    JSONDecodeError('Expecting value: line 1 column 1 (char 0)')
    when I tried to ingest metadata into DataHub. I am using the repository from git at tag v8.5.0, but the acryl-datahub Python library has just been updated to version 0.8.4.0. Is that okay?
  • rich-policeman-92383

    07/12/2021, 12:58 PM
    '\t... 62 more\n',
                            'status': 404}},
                  {'error': 'Unable to emit metadata to DataHub GMS',
                   'info': {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException',
                            'message': "No root resource defined for path '/entities'",
                            'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:404]: No root resource defined for path '
                                          "'/entities'\n"
                                          '\tat com.linkedin.restli.server.RestLiServiceException.fromThrowable(RestLiServiceException.java:315)\n'
                                          '\tat com.linkedin.restli.server.BaseRestLiServer.buildPreRoutingError(BaseRestLiServer.java:158)\n'
                                          '\tat com.linkedin.restli.server.RestRestLiServer.buildPreRoutingRestException(RestRestLiServer.java:203)\n'
                                          '\tat com.linkedin.restli.server.RestRestLiServer.handleResourceRequest(RestRestLiServer.java:177)\n'
                                          '\tat com.linkedin.restli.server.RestRestLiServer.doHandleRequest(RestRestLiServer.java:164)\n'
                                          '\tat com.linkedin.restli.server.RestRestLiServer.handleRequest(RestRestLiServer.java:120)\n'
                                          '\tat com.linkedin.restli.server.RestLiServer.handleRequest(RestLiServer.java:132)\n'
                                          '\tat '
                                          'com.linkedin.restli.server.DelegatingTransportDispatcher.handleRestRequest(DelegatingTransportDispatcher.java:70)\n'
                                          '\tat com.linkedin.r2.filter.transport.DispatcherRequestFilter.onRestRequest(DispatcherRequestFilter.java:70)\n'
                                          '\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:72)\n'
                                          '\tat '
                                          'com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n'
                                          '\tat '
                                          'com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n'
                                          '\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n'
                                          '\tat com.linkedin.r2.filter.TimedNextFilter.onRequest(TimedNextFilter.java:55)\n'
                                          '\tat com.linkedin.r2.filter.transport.ServerQueryTunnelFilter.onRestRequest(ServerQueryTunnelFilter.java:58)\n'
                                          '\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:72)\n'
                                          '\tat '
                                          'com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n'
                                          '\tat '
                                          'com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n'
                                          '\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n'
                                          '\tat com.linkedin.r2.filter.TimedNextFilter.onRequest(TimedNextFilter.java:55)\n'
                                          '\tat com.linkedin.r2.filter.message.rest.RestFilter.onRestRequest(RestFilter.java:50)\n'
                                          '\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:72)\n'
                                          '\tat '
                                          'com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n'
                                          '\tat '
                                          'com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n'
                                          '\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n'
                                          '\tat com.linkedin.r2.filter.FilterChainImpl.onRestRequest(FilterChainImpl.java:96)\n'
                                          '\tat com.linkedin.r2.filter.transport.FilterChainDispatcher.handleRestRequest(FilterChainDispatcher.java:75)\n'
                                          '\tat '
                                          'com.linkedin.r2.util.finalizer.RequestFinalizerDispatcher.handleRestRequest(RequestFinalizerDispatcher.java:61)\n'
                                          '\tat com.linkedin.r2.transport.http.server.HttpDispatcher.handleRequest(HttpDispatcher.java:101)\n'
                                          '\tat com.linkedin.r2.transport.http.server.AbstractR2Servlet.service(AbstractR2Servlet.java:105)\n'
                                          '\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n'
                                          '\tat '
                                          'com.linkedin.restli.server.spring.ParallelRestliHttpRequestHandler.handleRequest(ParallelRestliHttpRequestHandler.java:61)\n'
                                          '\tat '
                                          'org.springframework.web.context.support.HttpRequestHandlerServlet.service(HttpRequestHandlerServlet.java:73)\n'
                                          '\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n'
                                          '\tat org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:852)\n'
                                          '\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:544)\n'
                                          '\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n'
                                          '\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:536)\n'
                                          '\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n'
                                          '\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n'
                                          '\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1581)\n'
                                          '\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n'
                                          '\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1307)\n'
                                          '\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n'
                                          '\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:482)\n'
                                          '\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1549)\n'
                                          '\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)\n'
                                          '\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1204)\n'
                                          '\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n'
                                          '\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)\n'
                                          '\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)\n'
                                          '\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n'
                                          '\tat org.eclipse.jetty.server.Server.handle(Server.java:494)\n'
                                          '\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:374)\n'
                                          '\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:268)\n'
                                          '\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)\n'
                                          '\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)\n'
                                          '\tat org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)\n'
                                          '\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)\n'
                                          '\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)\n'
                                          '\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)\n'
                                          '\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)\n'
                                          '\tat '
                                          'org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:367)\n'
                                          '\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:782)\n'
                                          '\tat org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:918)\n'
                                          '\tat java.lang.Thread.run(Thread.java:748)\n'
                                          "Caused by: com.linkedin.restli.server.RoutingException: No root resource defined for path '/entities'\n"
                                          '\tat com.linkedin.restli.internal.server.RestLiRouter.process(RestLiRouter.java:139)\n'
                                          '\tat com.linkedin.restli.server.BaseRestLiServer.getRoutingResult(BaseRestLiServer.java:139)\n'
                                          '\tat com.linkedin.restli.server.RestRestLiServer.handleResourceRequest(RestRestLiServer.java:173)\n'
                                          '\t... 62 more\n',
                            'status': 404}}],
     'records_written': 0,
     'warnings': []}
    Pipeline finished with failures
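The 404 "No root resource defined for path '/entities'" above typically indicates the client and GMS are on different version lines: the /entities Rest.li resource does not exist on older servers, so a newer CLI gets a routing 404. A hedged sketch of the manual version check one might do; both helpers are hypothetical, not DataHub APIs:

```python
def parse_version(v: str) -> tuple:
    """Parse 'v0.8.5' or '0.8.5' into a comparable tuple of ints."""
    return tuple(int(p) for p in v.lstrip("v").split("."))

def versions_compatible(cli: str, gms: str) -> bool:
    """Rough heuristic: CLI and GMS should share the same major.minor line."""
    return parse_version(cli)[:2] == parse_version(gms)[:2]

ok = versions_compatible("v0.8.4", "v0.8.5")   # same 0.8.x line
bad = versions_compatible("v0.8.4", "v0.3.0")  # drifted apart
```

Upgrading GMS to the same release line as the `acryl-datahub` package is the usual fix for this symptom.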
  • adventurous-scooter-52064

    07/19/2021, 3:08 AM
    Hi, is anyone working on data ingestion from Metabase? We want to use Metabase with DataHub, just as Superset works with DataHub.
  • witty-butcher-82399

    07/19/2021, 8:30 AM
    For this I was thinking of a sort of garbage-collector job traversing all entities and setting removed=true (or maybe a new status, stale=true) for those entities which have not been recently updated. I would like to hear how the community is solving this, too.
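The staleness decision in such a garbage-collector job can be sketched as pure logic (a hypothetical helper and example URNs; the actual soft delete in DataHub would be done by writing a status aspect for each stale entity, which is omitted here):

```python
from datetime import datetime, timedelta

def find_stale(last_seen: dict, now: datetime, max_age: timedelta) -> set:
    """Return URNs whose most recent ingestion update is older than max_age."""
    return {urn for urn, ts in last_seen.items() if now - ts > max_age}

now = datetime(2021, 7, 19)
seen = {
    "urn:li:dataset:a": datetime(2021, 7, 18),  # updated yesterday: fresh
    "urn:li:dataset:b": datetime(2021, 6, 1),   # untouched for weeks: stale
}
stale = find_stale(seen, now, timedelta(days=30))
```

The `last_seen` map would come from each entity's last-modified audit stamp; the window is a policy choice.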
  • brave-forest-92595

    07/20/2021, 5:37 PM
    Hello,
  • silly-state-21367

    07/21/2021, 7:16 AM
    We have column-level and table-level descriptions in flat files, and we are looking to bring them into the DataHub datasets that were already created as part of data ingestion.
  • silly-state-21367

    07/21/2021, 7:16 AM
    If anyone can shed some light on this topic, it would be very helpful.
  • silly-state-21367

    07/21/2021, 7:17 AM
    Instead of asking the dataset owners to key in the description for each and every column in their datasets on DataHub, we are looking to leverage the information available in the flat file from the backend itself.
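One possible approach, sketched with the standard library only: parse the flat file into a table → column → description mapping, then push the descriptions per dataset through the ingestion framework (the emitting step is omitted here; the CSV layout below is an assumption, not a DataHub format):

```python
import csv
import io
from collections import defaultdict

def load_descriptions(fobj) -> dict:
    """Parse rows of (table, column, description) into a nested mapping."""
    out = defaultdict(dict)
    for row in csv.DictReader(fobj):
        out[row["table"]][row["column"]] = row["description"]
    return dict(out)

# Hypothetical flat file contents:
flat_file = io.StringIO(
    "table,column,description\n"
    "public.orders,order_id,Primary key of the order\n"
    "public.orders,amount,Order total in USD\n"
)
descriptions = load_descriptions(flat_file)
```

From here, each table key maps to a dataset URN and each inner dict supplies the field descriptions for that dataset's editable schema metadata.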
  • mammoth-bear-12532

    07/22/2021, 4:23 AM
    @prehistoric-yak-75049: this is a useful thread related to SSL config for Postgres. You can probably use the host_port option described here; it should work for pymysql as well. There is a Stack Overflow thread I found that describes the variables needed (ssl_key, ssl_cert, etc.). Let us know how that goes!
  • future-waitress-970

    07/22/2021, 3:00 PM
    I'm running it in the right place, I think: in the Docker CLI for Airflow.
  • cool-iron-6335

    07/26/2021, 1:58 PM
    Can I ask how I can ingest all tables from a specific Hive database into DataHub? I'd already specified the desired database in the recipe YAML file, but it didn't seem to work; it ended up ingesting all the databases and the tables belonging to them. I also tried the table pattern and schema pattern options, and they didn't work either.
  • cool-iron-6335

    07/26/2021, 2:01 PM
    source:
      type: hive
      config:
        host_port: localhost:10000
        database: db1
    
    
    sink:
      type: "datahub-rest"
      config:
        server: "http://localhost:8080"
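One likely culprit for the patterns not filtering: they are regular expressions, and an unanchored pattern is effectively a prefix match, so db1 also admits db10. A quick demonstration (the `allowed` helper merely mimics a regex allow filter; it is not DataHub's implementation):

```python
import re

def allowed(name: str, allow_pattern: str) -> bool:
    """Mimic an allow filter: keep names whose start matches the pattern."""
    return re.match(allow_pattern, name) is not None

databases = ["db1", "db10", "db2"]

# Unanchored pattern: 'db1' also admits 'db10'
loose = [db for db in databases if allowed(db, "db1")]

# Fully anchored pattern: '^db1$' admits only the exact database
strict = [db for db in databases if allowed(db, "^db1$")]
```

So an allow pattern written as `^db1$` (rather than a bare database name) is worth trying in the schema pattern config.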
  • cool-iron-6335

    07/26/2021, 2:01 PM
    [2021-07-26 20:51:15,631] INFO     {datahub.ingestion.run.pipeline:44} - sink wrote workunit db1.db1.t1
    [2021-07-26 20:51:15,783] INFO     {datahub.ingestion.run.pipeline:44} - sink wrote workunit db1.db2.t2
  • cool-iron-6335

    07/30/2021, 8:52 AM
    Screenshot from 2021-07-30 15-52-16.png
  • mammoth-bear-12532

    08/03/2021, 11:44 PM
    Hey @quiet-kilobyte-82304 I understand you are working on the Tableau connector. Wanted to mention it here to see if someone else is also interested and would like to join forces with you.
  • faint-hair-91313

    08/04/2021, 2:11 PM
    image.png
  • bland-orange-95847

    08/06/2021, 9:51 AM
    I’m having similar thoughts on that topic, as we have multiple BigQuery datasets in different projects and want to always keep the metadata/tables up to date. So there is no other way than having another Airflow DAG with one task per YAML file, loading the recipes and executing them? Do you plan to offer some service where you can register a recipe and have it executed on a schedule automatically?
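Absent a built-in scheduler, one workable pattern is exactly that: a DAG or cron job that loops over recipe files and shells out to the CLI. A stdlib-only sketch (the directory layout and file names are hypothetical; the subprocess call is left commented so the snippet stays inert):

```python
import tempfile
from pathlib import Path
# import subprocess  # uncomment to actually run the ingestions

def ingest_commands(recipe_dir) -> list:
    """Build one `datahub ingest` invocation per recipe file in a directory."""
    return [["datahub", "ingest", "-c", str(p)]
            for p in sorted(Path(recipe_dir).glob("*.yml"))]

# Demo on a temporary directory holding two hypothetical BigQuery recipes:
with tempfile.TemporaryDirectory() as d:
    for name in ("bigquery_a.yml", "bigquery_b.yml"):
        (Path(d) / name).touch()
    cmds = ingest_commands(d)

# for cmd in cmds:
#     subprocess.run(cmd, check=True)  # e.g. one Airflow/cron task per recipe
```

Each command can then be wrapped in its own Airflow task (or a single loop task) on whatever schedule keeps the metadata fresh.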
  • bland-easter-53873

    08/06/2021, 2:36 PM
    Hi folks, I am having trouble connecting DataHub to Snowflake. It ignores all the schemas if I add a schema_pattern allow with something like ^my_schema.
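One thing worth checking here: Snowflake typically reports schema names in upper case, so a lower-case pattern like ^my_schema can miss everything under a case-sensitive regex match (whether the pattern matcher is case-insensitive may depend on the DataHub version). A quick illustration of the mismatch:

```python
import re

# Schema names as a Snowflake server might report them (assumed example):
snowflake_schemas = ["MY_SCHEMA", "PUBLIC"]

case_sensitive = [s for s in snowflake_schemas
                  if re.match("^my_schema", s)]
case_insensitive = [s for s in snowflake_schemas
                    if re.match("^my_schema", s, re.IGNORECASE)]
```

If the case-sensitive list comes back empty while the case-insensitive one matches, writing the pattern in upper case (^MY_SCHEMA) should fix the recipe.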
  • future-waitress-970

    08/06/2021, 4:06 PM
    Hey everyone, I still haven't figured out a solution to the problem above; any help is appreciated.
  • bland-easter-53873

    08/10/2021, 10:13 AM
    The ingestion kind of gets stuck for me when I enable profiling; is there any way to debug the cause?