# ingestion
  • w

    witty-butcher-82399

    07/16/2021, 7:53 AM
    Hi! With some scheduling, connectors keep the metadata for a given source system up-to-date in DataHub. However, what if a dataset is removed in the source system? How are you managing this scenario to set `Status.removed=true` in particular, and to prevent stale metadata in general?
    ➕ 1
    b
    • 2
    • 7
  • s

    square-activity-64562

    07/16/2021, 10:56 AM
    https://github.com/linkedin/datahub/pull/2898 @gray-shoe-75895 please review. I tested it locally and it works. This is for adding this feature: https://datahubspace.slack.com/archives/CUMUWQU66/p1625566182481100
    g
    • 2
    • 3
  • a

    adamant-pharmacist-61996

    07/18/2021, 12:25 AM
    Hi everyone, I'm curious to hear if anyone has managed to ingest metadata from Redash? We use Redash internally as one of our data consumption tools, and it would be great to ingest its metadata in a similar way to Superset.
    m
    s
    • 3
    • 10
  • s

    square-activity-64562

    07/19/2021, 7:14 AM
    @gray-shoe-75895 https://github.com/linkedin/datahub/blob/master/metadata-ingestion/examples/recipes/secured_kafka.yml#L43 needs to be quoted. Otherwise we get:
    ScannerError: mapping values are not allowed here
    g
    • 2
    • 2
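    As an illustration of the quoting issue above, a hedged sketch: in YAML, a value containing a colon followed by more text (for example a key:secret pair) can be parsed as a nested mapping unless it is quoted, which is exactly what produces the ScannerError shown. The keys below mirror the schema-registry auth settings seen elsewhere in this channel; the exact line in secured_kafka.yml may differ.
    connection:
      schema_registry_config:
        # Unquoted, KEY:SECRET can look like a mapping to the YAML parser and fail.
        # Quoting keeps the whole value as a single string.
        basic.auth.user.info: "${SCHEMA_REGISTRY_KEY_ID}:${SCHEMA_REGISTRY_KEY_PASSWORD}"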
  • s

    square-activity-64562

    07/19/2021, 7:40 AM
    I used this ingestion file and was able to get the data into Kafka. It shows up in Confluent Cloud, but DataHub's `datahub-mce-consumer` is unable to consume it:
    source:
      type: postgres
      config:
        username: ${DB_USERNAME}
        password: ${DB_PASSWORD}
        host_port: ${DB_HOST}
        database: ${DB_database}
        table_pattern:
          allow:
            - "superset.public.logs"
    
        schema_pattern:
          deny:
            - "information_schema"
    
    sink:
      type: "datahub-kafka"
      config:
        connection:
          bootstrap: ${BOOTSTRAP_URL}
          producer_config:
            security.protocol: sasl_ssl
            sasl.mechanism: PLAIN
            sasl.username: ${KAFKA_KEY_ID}
            sasl.password: ${KAFKA_KEY_SECRET}
          schema_registry_url: https://${SCHEMA_REGISTRY_URL}
          schema_registry_config:
            basic.auth.user.info: "${SCHEMA_REGISTRY_KEY_ID}:${SCHEMA_REGISTRY_KEY_PASSWORD}"
    b
    • 2
    • 18
  • f

    faint-hair-91313

    07/19/2021, 9:40 AM
    Hey guys, any plans to support ingestion of Oracle spatial columns? I get this warning when ingesting with the Oracle connector:
    [2021-07-19 09:39:57,010] INFO     {datahub.ingestion.run.pipeline:44} - sink wrote workunit edw.excluded_airspace_volume_bb
    /home/mmmstz013/gmarin/.local/lib/python3.8/site-packages/datahub/ingestion/source/sql_common.py:256: SAWarning: Did not recognize type 'SDO_GEOMETRY' of column 'flight_sector_geom'
      columns = inspector.get_columns(table, schema)
    g
    • 2
    • 1
  • a

    adamant-pharmacist-61996

    07/20/2021, 9:16 AM
    Hi everyone, I'm getting this error whilst using the Athena recipe with Airflow:
    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 984, in _run_raw_task
        result = task_copy.execute(context=context)
      File "/usr/local/lib/python3.6/site-packages/airflow/operators/python_operator.py", line 113, in execute
        return_value = self.execute_callable()
      File "/usr/local/lib/python3.6/site-packages/airflow/operators/python_operator.py", line 118, in execute_callable
        return self.python_callable(*self.op_args, **self.op_kwargs)
      File "/usr/local/airflow/dags/datahub_ingestion_athena.py", line 50, in ingest_from_athena
        pipeline.run()
      File "/usr/local/lib/python3.6/site-packages/datahub/ingestion/run/pipeline.py", line 108, in run
        for wu in self.source.get_workunits():
      File "/usr/local/lib/python3.6/site-packages/datahub/ingestion/source/sql_common.py", line 283, in get_workunits
        yield from self.loop_views(inspector, schema, sql_config)
      File "/usr/local/lib/python3.6/site-packages/datahub/ingestion/source/sql_common.py", line 344, in loop_views
        for view in inspector.get_view_names(schema):
      File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/reflection.py", line 326, in get_view_names
        self.bind, schema, info_cache=self.info_cache
      File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/interfaces.py", line 345, in get_view_names
        raise NotImplementedError()
    NotImplementedError
    s
    g
    • 3
    • 7
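    A possible workaround for the NotImplementedError above (the Athena SQLAlchemy dialect does not implement get_view_names, which loop_views calls): skip view enumeration in the recipe. This is a hedged sketch and assumes the SQL sources expose an include_views flag; the sink and connection details are placeholders.
    source:
      type: athena
      config:
        # ... existing Athena connection settings ...
        include_views: false   # assumption: prevents loop_views from running at all
    sink:
      type: datahub-rest
      config:
        server: "http://localhost:8080"   # placeholder GMS address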
  • s

    square-activity-64562

    07/20/2021, 10:49 AM
    I was able to connect to Confluent Cloud when running DataHub v0.8.6 on GCP GKE, thanks to @big-carpet-38439. Notes here in case it helps. The dataset is not showing in search results, but the data does show when going directly to the URL, so the data is there.
    b
    • 2
    • 3
  • f

    faint-hair-91313

    07/20/2021, 3:56 PM
    Hi guys, I am ingesting Oracle metadata and for a specific view I am getting the error in the debug log file. The view is quite complex and I assume it is because of that. I've also attached the view definition. I've tried to skip it during ingestion, but that isn't working; maybe the pattern only applies to tables.
    include_views: True
    table_pattern:
      deny:
        - "^(sco_sector_configuration_bbs).*"
    Attachments: view_definition.txt, ingest.debug.zip
    g
    • 2
    • 7
  • b

    brave-forest-92595

    07/20/2021, 5:37 PM
    I was trying to inject data from mongodb and I got the following error and was hoping for some assistance, [2021-07-20 133334,439] ERROR  {datahub.ingestion.run.pipeline:53} - failed to write record with workunit links.109403932 with ('Unable to emit metadata to DataHub GMS', {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:404]\n\tat com.linkedin.restli.server.RestLiServiceException.fromThrowable(RestLiServiceException.java:315)\n\tat com.linkedin.restli.server.BaseRestLiServer.buildPreRoutingError(BaseRestLiServer.java:158)\n\tat com.linkedin.restli.server.RestRestLiServer.buildPreRoutingRestException(RestRestLiServer.java:203)\n\tat com.linkedin.restli.server.RestRestLiServer.handleResourceRequest(RestRestLiServer.java:177)\n\tat com.linkedin.restli.server.RestRestLiServer.doHandleRequest(RestRestLiServer.java:164)\n\tat com.linkedin.restli.server.RestRestLiServer.handleRequest(RestRestLiServer.java:120)\n\tat com.linkedin.restli.server.RestLiServer.handleRequest(RestLiServer.java:132)\n\tat com.linkedin.restli.server.DelegatingTransportDispatcher.handleRestRequest(DelegatingTransportDispatcher.java:70)\n\tat com.linkedin.r2.filter.transport.DispatcherRequestFilter.onRestRequest(DispatcherRequestFilter.java:70)\n\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:72)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n\tat com.linkedin.r2.filter.TimedNextFilter.onRequest(TimedNextFilter.java:55)\n\tat com.linkedin.r2.filter.transport.ServerQueryTunnelFilter.onRestRequest(ServerQueryTunnelFilter.java:58)\n\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:72)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n\tat com.linkedin.r2.filter.TimedNextFilter.onRequest(TimedNextFilter.java:55)\n\tat com.linkedin.r2.filter.message.rest.RestFilter.onRestRequest(RestFilter.java:50)\n\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:72)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n\tat com.linkedin.r2.filter.FilterChainImpl.onRestRequest(FilterChainImpl.java:96)\n\tat com.linkedin.r2.filter.transport.FilterChainDispatcher.handleRestRequest(FilterChainDispatcher.java:75)\n\tat com.linkedin.r2.util.finalizer.RequestFinalizerDispatcher.handleRestRequest(RequestFinalizerDispatcher.java:61)\n\tat com.linkedin.r2.transport.http.server.HttpDispatcher.handleRequest(HttpDispatcher.java:101)\n\tat com.linkedin.r2.transport.http.server.AbstractR2Servlet.service(AbstractR2Servlet.java:105)\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n\tat 
com.linkedin.restli.server.spring.ParallelRestliHttpRequestHandler.handleRequest(ParallelRestliHttpRequestHandler.java:63)\n\tat org.springframework.web.context.support.HttpRequestHandlerServlet.service(HttpRequestHandlerServlet.java:73)\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n\tat org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:852)\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:544)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:536)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1581)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1307)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:482)\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1549)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1204)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)\n\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.Server.handle(Server.java:494)\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:374)\n\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:268)\n\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)\n\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)\n\tat org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)\n\tat org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:367)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:782)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:918)\n\tat java.lang.Thread.run(Thread.java:748)\nCaused by: com.linkedin.restli.server.RoutingException\n\tat com.linkedin.restli.internal.server.RestLiRouter.process(RestLiRouter.java:111)\n\tat com.linkedin.restli.server.BaseRestLiServer.getRoutingResult(BaseRestLiServer.java:139)\n\tat com.linkedin.restli.server.RestRestLiServer.handleResourceRequest(RestRestLiServer.java:173)\n\t... 
62 more\n', 'status': 404}) and info {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:404]\n\tat com.linkedin.restli.server.RestLiServiceException.fromThrowable(RestLiServiceException.java:315)\n\tat com.linkedin.restli.server.BaseRestLiServer.buildPreRoutingError(BaseRestLiServer.java:158)\n\tat com.linkedin.restli.server.RestRestLiServer.buildPreRoutingRestException(RestRestLiServer.java:203)\n\tat com.linkedin.restli.server.RestRestLiServer.handleResourceRequest(RestRestLiServer.java:177)\n\tat com.linkedin.restli.server.RestRestLiServer.doHandleRequest(RestRestLiServer.java:164)\n\tat com.linkedin.restli.server.RestRestLiServer.handleRequest(RestRestLiServer.java:120)\n\tat com.linkedin.restli.server.RestLiServer.handleRequest(RestLiServer.java:132)\n\tat com.linkedin.restli.server.DelegatingTransportDispatcher.handleRestRequest(DelegatingTransportDispatcher.java:70)\n\tat com.linkedin.r2.filter.transport.DispatcherRequestFilter.onRestRequest(DispatcherRequestFilter.java:70)\n\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:72)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n\tat com.linkedin.r2.filter.TimedNextFilter.onRequest(TimedNextFilter.java:55)\n\tat com.linkedin.r2.filter.transport.ServerQueryTunnelFilter.onRestRequest(ServerQueryTunnelFilter.java:58)\n\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:72)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n\tat com.linkedin.r2.filter.TimedNextFilter.onRequest(TimedNextFilter.java:55)\n\tat com.linkedin.r2.filter.message.rest.RestFilter.onRestRequest(RestFilter.java:50)\n\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:72)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n\tat com.linkedin.r2.filter.FilterChainImpl.onRestRequest(FilterChainImpl.java:96)\n\tat com.linkedin.r2.filter.transport.FilterChainDispatcher.handleRestRequest(FilterChainDispatcher.java:75)\n\tat com.linkedin.r2.util.finalizer.RequestFinalizerDispatcher.handleRestRequest(RequestFinalizerDispatcher.java:61)\n\tat com.linkedin.r2.transport.http.server.HttpDispatcher.handleRequest(HttpDispatcher.java:101)\n\tat com.linkedin.r2.transport.http.server.AbstractR2Servlet.service(AbstractR2Servlet.java:105)\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n\tat com.linkedin.restli.server.spring.ParallelRestliHttpRequestHandler.handleRequest(ParallelRestliHttpRequestHandler.java:63)\n\tat org.springframework.web.context.support.HttpRequestHandlerServlet.service(HttpRequestHandlerServlet.java:73)\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n\tat 
org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:852)\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:544)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:536)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1581)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1307)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:482)\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1549)\n
    g
    • 2
    • 7
  • p

    proud-church-91494

    07/20/2021, 8:30 PM
    Hi everyone, where can I find documentation for all the parameters of each source type for the ingestion config file? For example, I'd like to ingest an Avro file with the File source; how can I do that?
    l
    g
    • 3
    • 3
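    On the documentation question above: each source's options are documented with the metadata-ingestion sources in the DataHub repo. As a hedged sketch, a recipe for the `file` source (which replays previously serialized MCE JSON rather than raw Avro/CSV data) looks roughly like this; the paths are placeholders and the config key name may differ by version.
    source:
      type: file
      config:
        filename: "./path/to/mce_output.json"   # placeholder: a file of serialized MetadataChangeEvents
    sink:
      type: datahub-rest
      config:
        server: "http://localhost:8080"          # placeholder GMS address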
  • f

    full-balloon-75621

    07/21/2021, 2:07 AM
    Hi, I'm seeing an Authorization Failure with MongoDB when using a read-only account on a non-default database. Note: it works using "super" credentials.
    source:
      type: "mongodb"
      config:
        connect_uri: "mongodb://my.hostname:27017/mydb"
        username: "readonly"
        password: "readonly"
        env: "DEV"
        authMechanism: "DEFAULT"
        options:
          tls: True
          tlsCAFile: "/path/to/my.pem"
    I suspect the account can't access the "default" database, but putting the database in the URI didn't help. Any suggestions?
    g
    • 2
    • 7
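    One guess for the authorization failure above, offered as an assumption rather than a confirmed fix: a read-only user created on a non-default database typically needs authSource pointed at that database, since the driver otherwise authenticates against the default database. The options block appears to be passed through to the MongoDB client, so something along these lines might help (the authSource value is hypothetical).
    source:
      type: "mongodb"
      config:
        connect_uri: "mongodb://my.hostname:27017/mydb"
        username: "readonly"
        password: "readonly"
        env: "DEV"
        authMechanism: "DEFAULT"
        options:
          tls: true
          tlsCAFile: "/path/to/my.pem"
          authSource: "mydb"   # assumption: authenticate against the database where the user is defined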
  • s

    silly-state-21367

    07/21/2021, 7:14 AM
    Hello Everyone,
    s
    • 2
    • 2
  • b

    better-orange-49102

    07/21/2021, 2:15 PM
    I have a question on the browsePaths aspect. I see that it accepts a list, and I assumed that meant I could potentially have multiple path references pointing to the same dataset, for instance browsePaths=[/A/C_dataset, /A/B/C_dataset] for C_dataset. However, what I got in the end was that /A/C_dataset became an empty folder in the UI and the dataset is only found at /A/B/C_dataset, which is kind of weird behaviour. Is it possible to have multiple paths point to the same dataset? I was hoping to arrange the data from several perspectives: by default DataHub arranges datasets by source, but another way is to arrange them by business process, etc. The goal is to let users navigate to the datasets more easily according to their domain knowledge.
    e
    b
    • 3
    • 8
  • p

    proud-jelly-46237

    07/22/2021, 12:07 AM
    Has anybody tried to create an ingress on datahub-gms and tried ingestion using the internal ALB URL in AWS?
    l
    e
    • 3
    • 3
  • p

    prehistoric-yak-75049

    07/22/2021, 1:31 AM
    Hi, I am trying out the Airflow DAG below: https://github.com/linkedin/datahub/blob/4958febed52ced82c02f62c15a930779201583ef/[…]ingestion/src/datahub_provider/example_dags/mysql_sample_dag.py In my case MySQL has SSL enabled and the cert is installed on the VM. What extra parameter do I need to use to enable an SSL connection? I tried using uri_opts but the BaseModel validation is failing.
    g
    • 2
    • 2
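    For the SSL question above, a hedged sketch expressed as a recipe (the same keys can go into the DAG's pipeline config dict): rather than uri_opts, the SQLAlchemy-based sources take an options block that is forwarded to create_engine, so TLS settings can usually be supplied as connect_args for the MySQL driver. Paths and key names are placeholders and depend on the driver (pymysql shown).
    source:
      type: mysql
      config:
        host_port: "my-mysql-host:3306"
        username: "${MYSQL_USER}"
        password: "${MYSQL_PASSWORD}"
        database: "mydb"
        options:
          connect_args:
            ssl:
              ca: "/path/to/ca-cert.pem"   # placeholder: CA cert installed on the VM
    sink:
      type: datahub-rest
      config:
        server: "http://localhost:8080"    # placeholder GMS address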
  • s

    square-activity-64562

    07/22/2021, 4:25 AM
    There is some issue with data ingestion from Postgres. It has added some odd descriptions (they seem like error messages) to the tables. Problem in table descriptions
    m
    • 2
    • 17
  • s

    square-activity-64562

    07/22/2021, 11:05 AM
    Is there any difference between the output produced by the `glue` and `athena` sources? We have Athena tables which are managed by the `glue` catalog. The ingestion plugin for Athena does not support views, so I was thinking of doing `glue` -> `file`, replacing `glue` with `athena` in the file, and then running `athena` ingestion from that file. That would bypass needing to add support for Athena views.
    m
    • 2
    • 6
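    A hedged sketch of the first step of the plan above: run the glue source with a file sink so the serialized metadata lands in a local JSON file, which can then be edited (e.g. swapping the platform) and re-ingested with the file source. The region and path are placeholders.
    source:
      type: glue
      config:
        aws_region: "us-east-1"          # placeholder region
    sink:
      type: file
      config:
        filename: "./glue_output.json"   # edit this file, then re-ingest it via the file source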
  • m

    mysterious-laptop-65928

    07/22/2021, 1:36 PM
    Hi all, I am exploring DataHub. So far I was successful in checking the lineage, which was great. Now we would like to ingest tags and a short description using an input flat file, without having to do it individually for every dataset. Is there any solution for this already? Please share any useful threads or links, or feel free to DM for more clarity. Thank you!
    👍 1
    g
    • 2
    • 1
  • c

    careful-insurance-60247

    07/22/2021, 2:18 PM
    I am trying to use a transformer when ingesting data from MSSQL. I am getting an error with the following details:
    source:
      type: mssql
      config:
        host_port: host:1433
        username: <user>
        password: <password>
        database: <db>
        table_pattern:
          deny:
            - "^.*\\.sys_.*" # deny all tables that start with sys_
            - "^.*\\.cdc.*"
    transformer:
      type: "simple_add_dataset_tags"
      config:
        tag_urns:
          - "urn:li:tag:NeedsDocumentation"
    sink:
      type: "datahub-rest"
      config:
        server: "http://<IP>:8080"
    
    Error: 
    
     datahub ingest -c ./mssql_poc.yml
    1 validation error for PipelineConfig
    transformers
      value is not a valid list (type=type_error.list)
    g
    • 2
    • 2
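    The validation error above ("value is not a valid list") indicates that the transformer section must be the plural transformers key holding a list, with each transformer as its own list entry, roughly like this:
    transformers:
      - type: "simple_add_dataset_tags"
        config:
          tag_urns:
            - "urn:li:tag:NeedsDocumentation"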
  • f

    future-waitress-970

    07/22/2021, 3:00 PM
    Hey, I'm trying to connect Airflow and DataHub, but when I run:
    airflow connections add --conn-type 'datahub_rest' 'datahub_rest_default' --conn-host 'http://172.17.0.1:9002'
    I get:
    sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
    (Background on this error at: http://sqlalche.me/e/e3q8)
    g
    • 2
    • 8
  • p

    prehistoric-yak-75049

    07/22/2021, 7:32 PM
    Hey, I am planning to set up Kafka metadata ingestion but I don't have a Schema Registry service. We have 1000+ topics, mostly with JSON and log (text) content. What is the best way to set up Kafka metadata ingestion?
    m
    • 2
    • 3
  • s

    strong-restaurant-35629

    07/23/2021, 11:16 AM
    Hi, I'm trying to set up ingestion from BigQuery. I copied the example from the documentation but I'm receiving an error: "1 validation error for PipelineConfig: sink field required (type=value_error.missing)". Any advice appreciated.
    👍 1
    g
    • 2
    • 1
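    The "sink field required" error above means the recipe defines a source but no sink. A minimal hedged sketch; the project id, other BigQuery options, and GMS address are placeholders.
    source:
      type: bigquery
      config:
        project_id: "my-gcp-project"     # placeholder
    sink:
      type: datahub-rest
      config:
        server: "http://localhost:8080"  # placeholder GMS endpoint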
  • m

    mysterious-lamp-73086

    07/24/2021, 5:32 PM
    Hi! Did anyone add PostgreSQL procedures?
    g
    • 2
    • 5
  • t

    thankful-family-51777

    07/26/2021, 6:10 AM
    Hi all, I'm getting this error while loading a dataset. The error only occurs for some datasets. Does anyone know what I should do?
    datahub-frontend-react    | Caused by: com.linkedin.r2.RemoteInvocationException: Received error 414 from server for URI http://datahub-gms:8080/datasets
    datahub-frontend-react    |     at com.linkedin.restli.internal.client.ExceptionUtil.exceptionForThrowable(ExceptionUtil.java:98)
    datahub-frontend-react    |     at com.linkedin.restli.client.RestLiCallbackAdapter.convertError(RestLiCallbackAdapter.java:66)
    datahub-frontend-react    |     at com.linkedin.common.callback.CallbackAdapter.onError(CallbackAdapter.java:86)
    datahub-frontend-react    |     at com.linkedin.r2.message.timing.TimingCallback.onError(TimingCallback.java:81)
    datahub-frontend-react    |     at com.linkedin.r2.transport.common.bridge.client.TransportCallbackAdapter.onResponse(TransportCallbackAdapter.java:47)
    datahub-frontend-react    |     at com.linkedin.r2.filter.transport.FilterChainClient.lambda$createWrappedClientTimingCallback$0(FilterChainClient.java:113)
    datahub-frontend-react    |     at com.linkedin.r2.filter.transport.ResponseFilter.onRestError(ResponseFilter.java:79)
    datahub-frontend-react    |     at com.linkedin.r2.filter.TimedRestFilter.onRestError(TimedRestFilter.java:92)
    datahub-frontend-react    |     at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:166)
    datahub-frontend-react    |     at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:132)
    datahub-frontend-react    |     at com.linkedin.r2.filter.FilterChainIterator.onError(FilterChainIterator.java:101)
    datahub-frontend-react    |     at com.linkedin.r2.filter.TimedNextFilter.onError(TimedNextFilter.java:48)
    datahub-frontend-react    |     at com.linkedin.r2.filter.message.rest.RestFilter.onRestError(RestFilter.java:84)
    datahub-frontend-react    |     at com.linkedin.r2.filter.TimedRestFilter.onRestError(TimedRestFilter.java:92)
    datahub-frontend-react    |     at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:166)
    datahub-frontend-react    |     at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:132)
    datahub-frontend-react    |     at com.linkedin.r2.filter.FilterChainIterator.onError(FilterChainIterator.java:101)
    datahub-frontend-react    |     at com.linkedin.r2.filter.TimedNextFilter.onError(TimedNextFilter.java:48)
    datahub-frontend-react    |     at com.linkedin.r2.filter.message.rest.RestFilter.onRestError(RestFilter.java:84)
    datahub-frontend-react    |     at com.linkedin.r2.filter.TimedRestFilter.onRestError(TimedRestFilter.java:92)
    datahub-frontend-react    |     at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:166)
    datahub-frontend-react    |     at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:132)
    datahub-frontend-react    |     at com.linkedin.r2.filter.FilterChainIterator.onError(FilterChainIterator.java:101)
    datahub-frontend-react    |     at com.linkedin.r2.filter.TimedNextFilter.onError(TimedNextFilter.java:48)
    datahub-frontend-react    |     at com.linkedin.r2.filter.message.rest.RestFilter.onRestError(RestFilter.java:84)
    datahub-frontend-react    |     at com.linkedin.r2.filter.TimedRestFilter.onRestError(TimedRestFilter.java:92)
    datahub-frontend-react    |     at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:166)
    datahub-frontend-react    |     at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:132)
    datahub-frontend-react    |     at com.linkedin.r2.filter.FilterChainIterator.onError(FilterChainIterator.java:101)
    datahub-frontend-react    |     at com.linkedin.r2.filter.TimedNextFilter.onError(TimedNextFilter.java:48)
    datahub-frontend-react    |     at com.linkedin.r2.filter.transport.ClientRequestFilter.lambda$createCallback$0(ClientRequestFilter.java:102)
    datahub-frontend-react    |     at com.linkedin.r2.transport.http.common.HttpBridge$1.onResponse(HttpBridge.java:82)
    datahub-frontend-react    |     at com.linkedin.r2.transport.http.client.rest.ExecutionCallback.lambda$onResponse$0(ExecutionCallback.java:64)
    datahub-frontend-react    |     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    datahub-frontend-react    |     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    datahub-frontend-react    |     at java.lang.Thread.run(Thread.java:748)
    e
    • 2
    • 2
  • a

    adamant-pharmacist-61996

    07/26/2021, 8:49 AM
    Hi all, when ingesting using the Airflow plugin, is there a way to define which environment the task is operating in? It's possible to define the dataset environment as part of the Dataset definition, but is it possible in the pipeline too?
    g
    • 2
    • 1
  • c

    cool-iron-6335

    07/26/2021, 2:02 PM
    The table `t2` that belongs to `db2` is not supposed to be ingested into DataHub. I have only `db1` and `db2` in my database.
    m
    • 2
    • 3
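    Without the full recipe it is hard to say why `t2` slips through, but as a rough sketch, and assuming a SQL source where databases surface as schemas (MySQL shown purely as a hypothetical), a deny pattern anchored on the database name is the usual way to exclude everything under db2.
    source:
      type: mysql                # hypothetical; whichever SQL source is actually in use
      config:
        # ... connection settings ...
        schema_pattern:
          deny:
            - "^db2$"            # assumption: excludes every table under db2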
  • f

    fancy-controller-77815

    07/26/2021, 6:44 PM
    Is there an escape character, or another method, to handle values in your YAML ingest recipes that contain # signs? Specifically, I'm looking at passwords for a test, or at how to parameterize this so it is prompted for when run.
    g
    • 2
    • 1
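    On the # question above: in YAML a # starts a comment only when it is preceded by whitespace (or begins the value), so quoting the value is the reliable escape; environment-variable substitution in the recipe also keeps the secret out of the file so it can be supplied when the run is launched. A small hedged sketch with made-up values:
    source:
      type: mssql                        # any source; shown only for illustration
      config:
        username: "svc_user"
        password: "p@ss#word"            # quoted, so the # stays part of the string rather than starting a comment
        # or keep it out of the recipe entirely and export PASSWORD before running:
        # password: ${PASSWORD}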
  • g

    gifted-queen-61023

    07/27/2021, 10:52 AM
    Hey guys 👋 I was trying out DataHub and it is really cool and easy to set up. I have a requirement but I'm not sure if DataHub allows me to easily do so. I need to be able to ingest a customised `.json` or `.csv` with information regarding some dashboards that we use. We would like to extend the data discovery capabilities of DataHub with not only automatic discovery (awesome experience so far) but also manually introduced metadata. In this way we can easily add and tweak metadata for the most used reports, spread across all platforms (metadata like `title`, `description` (with a link to the tool), `tags`, etc.). I'm aware of the source type `file`, but it seems too verbose due to being "from a previously generated file". Is it easy to develop a `.json` with the correct syntax to feed DataHub? I also noticed that `demo_data.json` is generated from a `.csv` (directives) with the help of the `enrich.py` script (source). Is it easy to tweak it to choose whether it should fall under Dashboards instead of Datasets? Or even make it a feature? 😊 Thanks in advance 🙂
    g
    • 2
    • 7
  • c

    colossal-furniture-76714

    07/27/2021, 3:52 PM
    Hi @gray-shoe-75895, I would like to test the Airflow integration; in other words, I want to produce lineage based on Airflow DAGs. I have followed the doc thoroughly, but I'm not sure if the connection is already working. Is there a way to check it? I have a DAG that runs fine, but I do not see any data in DataHub.
    g
    • 2
    • 4