# ingestion
  • w

    witty-butcher-82399

    07/16/2021, 7:53 AM
    Hi! With some scheduling, connectors keep the metadata for a given source system up-to-date in DataHub. However, what if a dataset is removed in the source system? How are you managing this scenario to set `Status.removed=true` in particular, and to prevent stale metadata in general?
    ➕ 1
    b
    • 2
    • 7
  • s

    square-activity-64562

    07/16/2021, 10:56 AM
    https://github.com/linkedin/datahub/pull/2898 @gray-shoe-75895 please review. I tested it locally and it works. This is for adding this feature: https://datahubspace.slack.com/archives/CUMUWQU66/p1625566182481100
    g
    • 2
    • 3
  • a

    adamant-pharmacist-61996

    07/18/2021, 12:25 AM
    Hi everyone, I'm curious to hear if anyone has managed to ingest metadata from Redash? We use Redash internally as one of our data consumption tools, and it would be great to ingest its metadata in a similar way to Superset.
    m
    s
    • 3
    • 10
  • s

    square-activity-64562

    07/19/2021, 7:14 AM
    @gray-shoe-75895 https://github.com/linkedin/datahub/blob/master/metadata-ingestion/examples/recipes/secured_kafka.yml#L43 needs to be quoted. Otherwise we get:
    ScannerError: mapping values are not allowed here
    g
    • 2
    • 2
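    As an illustration of the quoting issue above, a hedged sketch: in YAML, a value containing a colon followed by more text (for example a key:secret pair) can be parsed as a nested mapping unless it is quoted, which is exactly what produces the ScannerError shown. The keys below mirror the schema-registry auth settings seen elsewhere in this channel; the exact line in secured_kafka.yml may differ.
    connection:
      schema_registry_config:
        # Unquoted, KEY:SECRET can look like a mapping to the YAML parser and fail.
        # Quoting keeps the whole value as a single string.
        basic.auth.user.info: "${SCHEMA_REGISTRY_KEY_ID}:${SCHEMA_REGISTRY_KEY_PASSWORD}"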
  • s

    square-activity-64562

    07/19/2021, 7:40 AM
    I used this ingestion file and was able to get the data into Kafka. It shows up in Confluent Cloud, but DataHub's `datahub-mce-consumer` is unable to consume it:
    source:
      type: postgres
      config:
        username: ${DB_USERNAME}
        password: ${DB_PASSWORD}
        host_port: ${DB_HOST}
        database: ${DB_database}
        table_pattern:
          allow:
            - "superset.public.logs"
    
        schema_pattern:
          deny:
            - "information_schema"
    
    sink:
      type: "datahub-kafka"
      config:
        connection:
          bootstrap: ${BOOTSTRAP_URL}
          producer_config:
            security.protocol: sasl_ssl
            sasl.mechanism: PLAIN
            sasl.username: ${KAFKA_KEY_ID}
            sasl.password: ${KAFKA_KEY_SECRET}
          schema_registry_url: https://${SCHEMA_REGISTRY_URL}
          schema_registry_config:
            basic.auth.user.info: "${SCHEMA_REGISTRY_KEY_ID}:${SCHEMA_REGISTRY_KEY_PASSWORD}"
    b
    • 2
    • 18
  • f

    faint-hair-91313

    07/19/2021, 9:40 AM
    Hey guys, any plans to support ingestion of Oracle spatial columns? I get this warning when ingesting with the Oracle connector:
    [2021-07-19 09:39:57,010] INFO     {datahub.ingestion.run.pipeline:44} - sink wrote workunit edw.excluded_airspace_volume_bb
    /home/mmmstz013/gmarin/.local/lib/python3.8/site-packages/datahub/ingestion/source/sql_common.py:256: SAWarning: Did not recognize type 'SDO_GEOMETRY' of column 'flight_sector_geom'
      columns = inspector.get_columns(table, schema)
    g
    • 2
    • 1
  • a

    adamant-pharmacist-61996

    07/20/2021, 9:16 AM
    Hi everyone, I'm getting this error whilst using the Athena recipe with Airflow:
    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 984, in _run_raw_task
        result = task_copy.execute(context=context)
      File "/usr/local/lib/python3.6/site-packages/airflow/operators/python_operator.py", line 113, in execute
        return_value = self.execute_callable()
      File "/usr/local/lib/python3.6/site-packages/airflow/operators/python_operator.py", line 118, in execute_callable
        return self.python_callable(*self.op_args, **self.op_kwargs)
      File "/usr/local/airflow/dags/datahub_ingestion_athena.py", line 50, in ingest_from_athena
        pipeline.run()
      File "/usr/local/lib/python3.6/site-packages/datahub/ingestion/run/pipeline.py", line 108, in run
        for wu in self.source.get_workunits():
      File "/usr/local/lib/python3.6/site-packages/datahub/ingestion/source/sql_common.py", line 283, in get_workunits
        yield from self.loop_views(inspector, schema, sql_config)
      File "/usr/local/lib/python3.6/site-packages/datahub/ingestion/source/sql_common.py", line 344, in loop_views
        for view in inspector.get_view_names(schema):
      File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/reflection.py", line 326, in get_view_names
        self.bind, schema, info_cache=self.info_cache
      File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/interfaces.py", line 345, in get_view_names
        raise NotImplementedError()
    NotImplementedError
    s
    g
    • 3
    • 7
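    A possible workaround for the NotImplementedError above (the Athena SQLAlchemy dialect does not implement get_view_names, which loop_views calls): skip view enumeration in the recipe. This is a hedged sketch and assumes the SQL sources expose an include_views flag; the sink and connection details are placeholders.
    source:
      type: athena
      config:
        # ... existing Athena connection settings ...
        include_views: false   # assumption: prevents loop_views from running at all
    sink:
      type: datahub-rest
      config:
        server: "http://localhost:8080"   # placeholder GMS address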
  • s

    square-activity-64562

    07/20/2021, 10:49 AM
    I was able to connect to Confluent Cloud when running DataHub v0.8.6 on GCP GKE, thanks to @big-carpet-38439. Notes here in case it helps. The dataset is not showing in search results, but the data does show when going directly to the URL, so the data is there.
    b
    • 2
    • 3
  • f

    faint-hair-91313

    07/20/2021, 3:56 PM
    Hi guys, I am ingesting Oracle metadata and for a specific view I am getting the error in the debug log file. The view is quite complex and I assume it is because of that. I've also attached the view definition. I've tried to skip it during ingestion, but that isn't working; maybe the pattern only applies to tables.
    include_views: True
    table_pattern:
      deny:
        - "^(sco_sector_configuration_bbs).*"
    Attachments: view_definition.txt, ingest.debug.zip
    g
    • 2
    • 7
  • b

    brave-forest-92595

    07/20/2021, 5:37 PM
    I was trying to inject data from mongodb and I got the following error and was hoping for some assistance, [2021-07-20 133334,439] ERROR  {datahub.ingestion.run.pipeline:53} - failed to write record with workunit links.109403932 with ('Unable to emit metadata to DataHub GMS', {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:404]\n\tat com.linkedin.restli.server.RestLiServiceException.fromThrowable(RestLiServiceException.java:315)\n\tat com.linkedin.restli.server.BaseRestLiServer.buildPreRoutingError(BaseRestLiServer.java:158)\n\tat com.linkedin.restli.server.RestRestLiServer.buildPreRoutingRestException(RestRestLiServer.java:203)\n\tat com.linkedin.restli.server.RestRestLiServer.handleResourceRequest(RestRestLiServer.java:177)\n\tat com.linkedin.restli.server.RestRestLiServer.doHandleRequest(RestRestLiServer.java:164)\n\tat com.linkedin.restli.server.RestRestLiServer.handleRequest(RestRestLiServer.java:120)\n\tat com.linkedin.restli.server.RestLiServer.handleRequest(RestLiServer.java:132)\n\tat com.linkedin.restli.server.DelegatingTransportDispatcher.handleRestRequest(DelegatingTransportDispatcher.java:70)\n\tat com.linkedin.r2.filter.transport.DispatcherRequestFilter.onRestRequest(DispatcherRequestFilter.java:70)\n\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:72)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n\tat com.linkedin.r2.filter.TimedNextFilter.onRequest(TimedNextFilter.java:55)\n\tat com.linkedin.r2.filter.transport.ServerQueryTunnelFilter.onRestRequest(ServerQueryTunnelFilter.java:58)\n\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:72)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n\tat com.linkedin.r2.filter.TimedNextFilter.onRequest(TimedNextFilter.java:55)\n\tat com.linkedin.r2.filter.message.rest.RestFilter.onRestRequest(RestFilter.java:50)\n\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:72)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n\tat com.linkedin.r2.filter.FilterChainImpl.onRestRequest(FilterChainImpl.java:96)\n\tat com.linkedin.r2.filter.transport.FilterChainDispatcher.handleRestRequest(FilterChainDispatcher.java:75)\n\tat com.linkedin.r2.util.finalizer.RequestFinalizerDispatcher.handleRestRequest(RequestFinalizerDispatcher.java:61)\n\tat com.linkedin.r2.transport.http.server.HttpDispatcher.handleRequest(HttpDispatcher.java:101)\n\tat com.linkedin.r2.transport.http.server.AbstractR2Servlet.service(AbstractR2Servlet.java:105)\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n\tat 
com.linkedin.restli.server.spring.ParallelRestliHttpRequestHandler.handleRequest(ParallelRestliHttpRequestHandler.java:63)\n\tat org.springframework.web.context.support.HttpRequestHandlerServlet.service(HttpRequestHandlerServlet.java:73)\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n\tat org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:852)\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:544)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:536)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1581)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1307)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:482)\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1549)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1204)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)\n\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.Server.handle(Server.java:494)\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:374)\n\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:268)\n\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)\n\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)\n\tat org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)\n\tat org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:367)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:782)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:918)\n\tat java.lang.Thread.run(Thread.java:748)\nCaused by: com.linkedin.restli.server.RoutingException\n\tat com.linkedin.restli.internal.server.RestLiRouter.process(RestLiRouter.java:111)\n\tat com.linkedin.restli.server.BaseRestLiServer.getRoutingResult(BaseRestLiServer.java:139)\n\tat com.linkedin.restli.server.RestRestLiServer.handleResourceRequest(RestRestLiServer.java:173)\n\t... 
62 more\n', 'status': 404}) and info {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:404]\n\tat com.linkedin.restli.server.RestLiServiceException.fromThrowable(RestLiServiceException.java:315)\n\tat com.linkedin.restli.server.BaseRestLiServer.buildPreRoutingError(BaseRestLiServer.java:158)\n\tat com.linkedin.restli.server.RestRestLiServer.buildPreRoutingRestException(RestRestLiServer.java:203)\n\tat com.linkedin.restli.server.RestRestLiServer.handleResourceRequest(RestRestLiServer.java:177)\n\tat com.linkedin.restli.server.RestRestLiServer.doHandleRequest(RestRestLiServer.java:164)\n\tat com.linkedin.restli.server.RestRestLiServer.handleRequest(RestRestLiServer.java:120)\n\tat com.linkedin.restli.server.RestLiServer.handleRequest(RestLiServer.java:132)\n\tat com.linkedin.restli.server.DelegatingTransportDispatcher.handleRestRequest(DelegatingTransportDispatcher.java:70)\n\tat com.linkedin.r2.filter.transport.DispatcherRequestFilter.onRestRequest(DispatcherRequestFilter.java:70)\n\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:72)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n\tat com.linkedin.r2.filter.TimedNextFilter.onRequest(TimedNextFilter.java:55)\n\tat com.linkedin.r2.filter.transport.ServerQueryTunnelFilter.onRestRequest(ServerQueryTunnelFilter.java:58)\n\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:72)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n\tat com.linkedin.r2.filter.TimedNextFilter.onRequest(TimedNextFilter.java:55)\n\tat com.linkedin.r2.filter.message.rest.RestFilter.onRestRequest(RestFilter.java:50)\n\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:72)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n\tat com.linkedin.r2.filter.FilterChainImpl.onRestRequest(FilterChainImpl.java:96)\n\tat com.linkedin.r2.filter.transport.FilterChainDispatcher.handleRestRequest(FilterChainDispatcher.java:75)\n\tat com.linkedin.r2.util.finalizer.RequestFinalizerDispatcher.handleRestRequest(RequestFinalizerDispatcher.java:61)\n\tat com.linkedin.r2.transport.http.server.HttpDispatcher.handleRequest(HttpDispatcher.java:101)\n\tat com.linkedin.r2.transport.http.server.AbstractR2Servlet.service(AbstractR2Servlet.java:105)\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n\tat com.linkedin.restli.server.spring.ParallelRestliHttpRequestHandler.handleRequest(ParallelRestliHttpRequestHandler.java:63)\n\tat org.springframework.web.context.support.HttpRequestHandlerServlet.service(HttpRequestHandlerServlet.java:73)\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n\tat 
org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:852)\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:544)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:536)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1581)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1307)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:482)\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1549)\n
    g
    • 2
    • 7
  • p

    proud-church-91494

    07/20/2021, 8:30 PM
    Hi everyone, where can I find documentation for all the parameters of each source type for the ingestion config file? For example, I'd like to ingest an Avro file with the File source; how can I do that?
    l
    g
    • 3
    • 3
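    On the documentation question above: each source's options are documented with the metadata-ingestion sources in the DataHub repo. As a hedged sketch, a recipe for the `file` source (which replays previously serialized MCE JSON rather than raw Avro/CSV data) looks roughly like this; the paths are placeholders and the config key name may differ by version.
    source:
      type: file
      config:
        filename: "./path/to/mce_output.json"   # placeholder: a file of serialized MetadataChangeEvents
    sink:
      type: datahub-rest
      config:
        server: "http://localhost:8080"          # placeholder GMS address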
  • f

    full-balloon-75621

    07/21/2021, 2:07 AM
    Hi, I'm seeing an Authorization Failure with MongoDB when using a read-only account on a non-default database. Note: it works using "super" credentials.
    source:
      type: "mongodb"
      config:
        connect_uri: "mongodb://my.hostname:27017/mydb"
        username: "readonly"
        password: "readonly"
        env: "DEV"
        authMechanism: "DEFAULT"
        options:
          tls: True
          tlsCAFile: "/path/to/my.pem"
    I suspect the account can't access the "default" database, but putting the database in the URI didn't help. Any suggestions?
    g
    • 2
    • 7
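    One guess for the authorization failure above, offered as an assumption rather than a confirmed fix: a read-only user created on a non-default database typically needs authSource pointed at that database, since the driver otherwise authenticates against the default database. The options block appears to be passed through to the MongoDB client, so something along these lines might help (the authSource value is hypothetical).
    source:
      type: "mongodb"
      config:
        connect_uri: "mongodb://my.hostname:27017/mydb"
        username: "readonly"
        password: "readonly"
        env: "DEV"
        authMechanism: "DEFAULT"
        options:
          tls: true
          tlsCAFile: "/path/to/my.pem"
          authSource: "mydb"   # assumption: authenticate against the database where the user is defined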
  • s

    silly-state-21367

    07/21/2021, 7:14 AM
    Hello Everyone,
    s
    • 2
    • 2
  • b

    better-orange-49102

    07/21/2021, 2:15 PM
    I have a question on the browsePaths aspect. I see that it accepts a list, and I assumed that meant I could potentially have multiple path references pointing to the same dataset, for instance browsePaths=[/A/C_dataset, /A/B/C_dataset] for C_dataset. However, what I got in the end was that /A/C_dataset became an empty folder in the UI and the dataset is only found at /A/B/C_dataset, which is kind of weird behaviour. Is it possible to have multiple paths point to the same dataset? I was hoping to arrange the data from several perspectives: by default DataHub arranges datasets by source, but another way is to arrange them by business process, etc. The goal is to let users navigate to the datasets more easily according to their domain knowledge.
    e
    b
    • 3
    • 8
  • p

    proud-jelly-46237

    07/22/2021, 12:07 AM
    Has anybody tried to create an ingress on datahub-gms and tried ingestion using the internal ALB URL in AWS?
    l
    e
    • 3
    • 3
  • p

    prehistoric-yak-75049

    07/22/2021, 1:31 AM
    Hi, I am trying out the Airflow DAG below: https://github.com/linkedin/datahub/blob/4958febed52ced82c02f62c15a930779201583ef/[…]ingestion/src/datahub_provider/example_dags/mysql_sample_dag.py In my case MySQL has SSL enabled and the cert is installed on the VM. What extra parameter do I need to use to enable an SSL connection? I tried using uri_opts but the BaseModel validation is failing.
    g
    • 2
    • 2
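    For the SSL question above, a hedged sketch expressed as a recipe (the same keys can go into the DAG's pipeline config dict): rather than uri_opts, the SQLAlchemy-based sources take an options block that is forwarded to create_engine, so TLS settings can usually be supplied as connect_args for the MySQL driver. Paths and key names are placeholders and depend on the driver (pymysql shown).
    source:
      type: mysql
      config:
        host_port: "my-mysql-host:3306"
        username: "${MYSQL_USER}"
        password: "${MYSQL_PASSWORD}"
        database: "mydb"
        options:
          connect_args:
            ssl:
              ca: "/path/to/ca-cert.pem"   # placeholder: CA cert installed on the VM
    sink:
      type: datahub-rest
      config:
        server: "http://localhost:8080"    # placeholder GMS address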
  • s

    square-activity-64562

    07/22/2021, 4:25 AM
    There is some issue with data ingestion from Postgres. It has added some odd descriptions (they seem like error messages) to the tables. Problem in table descriptions
    m
    • 2
    • 17
  • s

    square-activity-64562

    07/22/2021, 11:05 AM
    Is there any difference between the output produced by the `glue` and `athena` sources? We have Athena tables which are managed by the `glue` catalog. The ingestion plugin for Athena does not support views, so I was thinking of doing `glue` -> `file`, replacing `glue` with `athena` in the file, and then running `athena` ingestion from that file. That would bypass needing to add support for Athena views.
    m
    • 2
    • 6
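    A hedged sketch of the first step of the plan above: run the glue source with a file sink so the serialized metadata lands in a local JSON file, which can then be edited (e.g. swapping the platform) and re-ingested with the file source. The region and path are placeholders.
    source:
      type: glue
      config:
        aws_region: "us-east-1"          # placeholder region
    sink:
      type: file
      config:
        filename: "./glue_output.json"   # edit this file, then re-ingest it via the file source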
  • m

    mysterious-laptop-65928

    07/22/2021, 1:36 PM
    Hi all, I am exploring DataHub. So far I was successful in checking the lineage, which was great. Now we would like to ingest tags and a short description using an input flat file, without having to do it individually for every dataset. Is there any solution for this already? Please share any useful threads or links, or feel free to DM for more clarity. Thank you!
    👍 1
    g
    • 2
    • 1
  • c

    careful-insurance-60247

    07/22/2021, 2:18 PM
    I am trying to use a transformer when ingesting data from MSSQL. I am getting an error with the following details:
    source:
      type: mssql
      config:
        host_port: host:1433
        username: <user>
        password: <password>
        database: <db>
        table_pattern:
          deny:
            - "^.*\\.sys_.*" # deny all tables that start with sys_
            - "^.*\\.cdc.*"
    transformer:
      type: "simple_add_dataset_tags"
      config:
        tag_urns:
          - "urn:li:tag:NeedsDocumentation"
    sink:
      type: "datahub-rest"
      config:
        server: "http://<IP>:8080"
    
    Error: 
    
     datahub ingest -c ./mssql_poc.yml
    1 validation error for PipelineConfig
    transformers
      value is not a valid list (type=type_error.list)
    g
    • 2
    • 2
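    The validation error above ("value is not a valid list") indicates that the transformer section must be the plural transformers key holding a list, with each transformer as its own list entry, roughly like this:
    transformers:
      - type: "simple_add_dataset_tags"
        config:
          tag_urns:
            - "urn:li:tag:NeedsDocumentation"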
  • f

    future-waitress-970

    07/22/2021, 3:00 PM
    Hey, I'm trying to connect Airflow and DataHub, but when I run:
    airflow connections add --conn-type 'datahub_rest' 'datahub_rest_default' --conn-host 'http://172.17.0.1:9002'
    I get:
    sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
    (Background on this error at: http://sqlalche.me/e/e3q8)
    g
    • 2
    • 8
  • p

    prehistoric-yak-75049

    07/22/2021, 7:32 PM
    Hey, I am planning to set up Kafka metadata ingestion but I don't have a Schema Registry service. We have 1000+ topics, mostly with JSON and log (text) content. What is the best way to set up Kafka metadata ingestion?
    m
    • 2
    • 3
  • s

    strong-restaurant-35629

    07/23/2021, 11:16 AM
    Hi, I'm trying to set up ingestion from BigQuery. I copied the example from the documentation but I'm receiving an error: "1 validation error for PipelineConfig: sink field required (type=value_error.missing)". Any advice appreciated.
    👍 1
    g
    • 2
    • 1
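    The "sink field required" error above means the recipe defines a source but no sink. A minimal hedged sketch; the project id, other BigQuery options, and GMS address are placeholders.
    source:
      type: bigquery
      config:
        project_id: "my-gcp-project"     # placeholder
    sink:
      type: datahub-rest
      config:
        server: "http://localhost:8080"  # placeholder GMS endpoint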
  • m

    mysterious-lamp-73086

    07/24/2021, 5:32 PM
    Hi! Did anyone add PostgreSQL procedures?
    g
    • 2
    • 5
  • t

    thankful-family-51777

    07/26/2021, 6:10 AM
    Hi all, I'm getting this error while loading a dataset. The error only occurs for some datasets. Does anyone know what I should do?
    datahub-frontend-react    | Caused by: com.linkedin.r2.RemoteInvocationException: Received error 414 from server for URI http://datahub-gms:8080/datasets
    datahub-frontend-react    |     at com.linkedin.restli.internal.client.ExceptionUtil.exceptionForThrowable(ExceptionUtil.java:98)
    datahub-frontend-react    |     at com.linkedin.restli.client.RestLiCallbackAdapter.convertError(RestLiCallbackAdapter.java:66)
    datahub-frontend-react    |     at com.linkedin.common.callback.CallbackAdapter.onError(CallbackAdapter.java:86)
    datahub-frontend-react    |     at com.linkedin.r2.message.timing.TimingCallback.onError(TimingCallback.java:81)
    datahub-frontend-react    |     at com.linkedin.r2.transport.common.bridge.client.TransportCallbackAdapter.onResponse(TransportCallbackAdapter.java:47)
    datahub-frontend-react    |     at com.linkedin.r2.filter.transport.FilterChainClient.lambda$createWrappedClientTimingCallback$0(FilterChainClient.java:113)
    datahub-frontend-react    |     at com.linkedin.r2.filter.transport.ResponseFilter.onRestError(ResponseFilter.java:79)
    datahub-frontend-react    |     at com.linkedin.r2.filter.TimedRestFilter.onRestError(TimedRestFilter.java:92)
    datahub-frontend-react    |     at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:166)
    datahub-frontend-react    |     at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:132)
    datahub-frontend-react    |     at com.linkedin.r2.filter.FilterChainIterator.onError(FilterChainIterator.java:101)
    datahub-frontend-react    |     at com.linkedin.r2.filter.TimedNextFilter.onError(TimedNextFilter.java:48)
    datahub-frontend-react    |     at com.linkedin.r2.filter.message.rest.RestFilter.onRestError(RestFilter.java:84)
    datahub-frontend-react    |     at com.linkedin.r2.filter.TimedRestFilter.onRestError(TimedRestFilter.java:92)
    datahub-frontend-react    |     at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:166)
    datahub-frontend-react    |     at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:132)
    datahub-frontend-react    |     at com.linkedin.r2.filter.FilterChainIterator.onError(FilterChainIterator.java:101)
    datahub-frontend-react    |     at com.linkedin.r2.filter.TimedNextFilter.onError(TimedNextFilter.java:48)
    datahub-frontend-react    |     at com.linkedin.r2.filter.message.rest.RestFilter.onRestError(RestFilter.java:84)
    datahub-frontend-react    |     at com.linkedin.r2.filter.TimedRestFilter.onRestError(TimedRestFilter.java:92)
    datahub-frontend-react    |     at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:166)
    datahub-frontend-react    |     at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:132)
    datahub-frontend-react    |     at com.linkedin.r2.filter.FilterChainIterator.onError(FilterChainIterator.java:101)
    datahub-frontend-react    |     at com.linkedin.r2.filter.TimedNextFilter.onError(TimedNextFilter.java:48)
    datahub-frontend-react    |     at com.linkedin.r2.filter.message.rest.RestFilter.onRestError(RestFilter.java:84)
    datahub-frontend-react    |     at com.linkedin.r2.filter.TimedRestFilter.onRestError(TimedRestFilter.java:92)
    datahub-frontend-react    |     at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:166)
    datahub-frontend-react    |     at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:132)
    datahub-frontend-react    |     at com.linkedin.r2.filter.FilterChainIterator.onError(FilterChainIterator.java:101)
    datahub-frontend-react    |     at com.linkedin.r2.filter.TimedNextFilter.onError(TimedNextFilter.java:48)
    datahub-frontend-react    |     at com.linkedin.r2.filter.transport.ClientRequestFilter.lambda$createCallback$0(ClientRequestFilter.java:102)
    datahub-frontend-react    |     at com.linkedin.r2.transport.http.common.HttpBridge$1.onResponse(HttpBridge.java:82)
    datahub-frontend-react    |     at com.linkedin.r2.transport.http.client.rest.ExecutionCallback.lambda$onResponse$0(ExecutionCallback.java:64)
    datahub-frontend-react    |     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    datahub-frontend-react    |     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    datahub-frontend-react    |     at java.lang.Thread.run(Thread.java:748)
    e
    • 2
    • 2
  • a

    adamant-pharmacist-61996

    07/26/2021, 8:49 AM
    Hi all, when ingesting using the Airflow plugin, is there a way to define which environment the task is operating in? It's possible to define the dataset environment as part of the Dataset definition, but is it possible in the pipeline too?
    g
    • 2
    • 1
  • c

    cool-iron-6335

    07/26/2021, 2:02 PM
    The table `t2` that belongs to `db2` is not supposed to be ingested into DataHub. I have only `db1` and `db2` in my database.
    m
    • 2
    • 3
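    Without the full recipe it is hard to say why `t2` slips through, but as a rough sketch, and assuming a SQL source where databases surface as schemas (MySQL shown purely as a hypothetical), a deny pattern anchored on the database name is the usual way to exclude everything under db2.
    source:
      type: mysql                # hypothetical; whichever SQL source is actually in use
      config:
        # ... connection settings ...
        schema_pattern:
          deny:
            - "^db2$"            # assumption: excludes every table under db2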
  • f

    fancy-controller-77815

    07/26/2021, 6:44 PM
    Is there an escape character, or another method, to handle values in your YAML ingest recipes that contain # signs? Specifically, I'm looking at passwords for a test, or at how to parameterize this so it is prompted for when run.
    g
    • 2
    • 1
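    On the # question above: in YAML a # starts a comment only when it is preceded by whitespace (or begins the value), so quoting the value is the reliable escape; environment-variable substitution in the recipe also keeps the secret out of the file so it can be supplied when the run is launched. A small hedged sketch with made-up values:
    source:
      type: mssql                        # any source; shown only for illustration
      config:
        username: "svc_user"
        password: "p@ss#word"            # quoted, so the # stays part of the string rather than starting a comment
        # or keep it out of the recipe entirely and export PASSWORD before running:
        # password: ${PASSWORD}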
  • g

    gifted-queen-61023

    07/27/2021, 10:52 AM
    Hey guys 👋 I was trying out DataHub and it is really cool and easy to set up. I have a requirement but I'm not sure if DataHub allows me to easily do so. I need to be able to ingest a customised `.json` or `.csv` with information regarding some dashboards that we use. We would like to extend the data discovery capabilities of DataHub with not only automatic discovery (awesome experience so far) but also manually introduced metadata. In this way we can easily add and tweak metadata for the most used reports, spread across all platforms (metadata like `title`, `description` (with a link to the tool), `tags`, etc.). I'm aware of the source type `file`, but it seems too verbose due to being "from a previously generated file". Is it easy to develop a `.json` with the correct syntax to feed DataHub? I also noticed that `demo_data.json` is generated from a `.csv` (directives) with the help of the `enrich.py` script (source). Is it easy to tweak it to choose whether it should fall under Dashboards instead of Datasets? Or even make it a feature? 😊 Thanks in advance 🙂
    g
    • 2
    • 7
  • c

    colossal-furniture-76714

    07/27/2021, 3:52 PM
    Hi @gray-shoe-75895, I would like to test the Airflow integration; in other words, I want to produce lineage based on Airflow DAGs. I have followed the doc thoroughly, but I'm not sure if the connection is already working. Is there a way to check it? I have a DAG that runs fine, but I do not see any data in DataHub.
    g
    • 2
    • 4