witty-butcher-82399 (07/16/2021, 7:53 AM):
Is there a way to set Status.removed=true in particular, and to prevent stale metadata in general?

square-activity-64562 (07/16/2021, 10:56 AM):

adamant-pharmacist-61996 (07/18/2021, 12:25 AM):

square-activity-64562 (07/19/2021, 7:14 AM):
ScannerError: mapping values are not allowed here
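
For context, a minimal way to trigger that ScannerError (the keys below are made up): PyYAML rejects a second ": " inside an unquoted value, which is a common symptom of a mis-quoted value or flattened indentation in a recipe.

good: "a value with a colon: is fine when quoted"
bad: a value with a colon: breaks   # ScannerError: mapping values are not allowed here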

square-activity-64562 (07/19/2021, 7:40 AM):
datahub-mce-consumer is unable to consume it

source:
  type: postgres
  config:
    username: ${DB_USERNAME}
    password: ${DB_PASSWORD}
    host_port: ${DB_HOST}
    database: ${DB_database}
    table_pattern:
      allow:
        - "superset.public.logs"
    schema_pattern:
      deny:
        - "information_schema"
sink:
  type: "datahub-kafka"
  config:
    connection:
      bootstrap: ${BOOTSTARP_URL}
      producer_config:
        security.protocol: sasl_ssl
        sasl.mechanism: PLAIN
        sasl.username: ${KAFKA_KEY_ID}
        sasl.password: ${KAFKA_KEY_SECRET}
      schema_registry_url: https://${SCHEMA_REGISTRY_URL}
      schema_registry_config:
        basic.auth.user.info: "${SCHEMA_REGISTRY_KEY_ID}:${SCHEMA_REGISTRY_KEY_PASSWORD}"

faint-hair-91313 (07/19/2021, 9:40 AM):
[2021-07-19 09:39:57,010] INFO {datahub.ingestion.run.pipeline:44} - sink wrote workunit edw.excluded_airspace_volume_bb
/home/mmmstz013/gmarin/.local/lib/python3.8/site-packages/datahub/ingestion/source/sql_common.py:256: SAWarning: Did not recognize type 'SDO_GEOMETRY' of column 'flight_sector_geom'
columns = inspector.get_columns(table, schema)

adamant-pharmacist-61996 (07/20/2021, 9:16 AM):
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 984, in _run_raw_task
result = task_copy.execute(context=context)
File "/usr/local/lib/python3.6/site-packages/airflow/operators/python_operator.py", line 113, in execute
return_value = self.execute_callable()
File "/usr/local/lib/python3.6/site-packages/airflow/operators/python_operator.py", line 118, in execute_callable
return self.python_callable(*self.op_args, **self.op_kwargs)
File "/usr/local/airflow/dags/datahub_ingestion_athena.py", line 50, in ingest_from_athena
pipeline.run()
File "/usr/local/lib/python3.6/site-packages/datahub/ingestion/run/pipeline.py", line 108, in run
for wu in self.source.get_workunits():
File "/usr/local/lib/python3.6/site-packages/datahub/ingestion/source/sql_common.py", line 283, in get_workunits
yield from self.loop_views(inspector, schema, sql_config)
File "/usr/local/lib/python3.6/site-packages/datahub/ingestion/source/sql_common.py", line 344, in loop_views
for view in inspector.get_view_names(schema):
File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/reflection.py", line 326, in get_view_names
self.bind, schema, info_cache=self.info_cache
File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/interfaces.py", line 345, in get_view_names
raise NotImplementedError()
NotImplementedError
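
One possible workaround (a sketch, not verified): the traceback ends in SQLAlchemy's default get_view_names(), which the Athena dialect does not implement, and the source only reaches it when view ingestion is enabled, so disabling views in the source config should avoid the call.

source:
  type: athena
  config:
    include_views: False   # skip inspector.get_view_names() entirely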

square-activity-64562 (07/20/2021, 10:49 AM):

faint-hair-91313 (07/20/2021, 3:56 PM):
include_views: True
table_pattern:
  deny:
    - "^(sco_sector_configuration_bbs).*"

brave-forest-92595 (07/20/2021, 5:37 PM):

proud-church-91494 (07/20/2021, 8:30 PM):

full-balloon-75621 (07/21/2021, 2:07 AM):
source:
  type: "mongodb"
  config:
    connect_uri: "mongodb://my.hostname:27017/mydb"
    username: "readonly"
    password: "readonly"
    env: "DEV"
    authMechanism: "DEFAULT"
    options:
      tls: True
      tlsCAFile: "/path/to/my.pem"

I suspect the account can't access the "default" database, but putting the database on the URI didn't help. Any suggestions?
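
If the authentication database is the problem, one thing to try (a sketch, not verified against this setup): pymongo selects the auth database via authSource, which can go on the URI query string or into the client options.

source:
  type: "mongodb"
  config:
    connect_uri: "mongodb://my.hostname:27017/?authSource=mydb"   # hypothetical
    username: "readonly"
    password: "readonly"
    options:
      tls: True
      tlsCAFile: "/path/to/my.pem"
      authSource: "mydb"   # equivalent alternative to the URI parameter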

silly-state-21367 (07/21/2021, 7:14 AM):

better-orange-49102 (07/21/2021, 2:15 PM):

proud-jelly-46237 (07/22/2021, 12:07 AM):
internal ALB URL in AWS?

prehistoric-yak-75049 (07/22/2021, 1:31 AM):

square-activity-64562 (07/22/2021, 4:25 AM):

square-activity-64562 (07/22/2021, 11:05 AM):
glue and athena sources? We have athena tables which are managed by the glue catalog. The ingestion plugin for athena does not support views, so I was thinking: glue -> file -> replace glue with athena in the file -> athena ingestion. That will bypass needing to add support for athena views.
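
A sketch of that two-step flow (the region, file paths, and sink address are hypothetical):

# Step 1: dump glue metadata to a file.
source:
  type: glue
  config:
    aws_region: "us-east-1"
sink:
  type: file
  config:
    filename: "./glue_dump.json"

# Step 2: after replacing "glue" with "athena" in the dump,
# ingest the rewritten file.
source:
  type: file
  config:
    filename: "./glue_dump_as_athena.json"
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"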

mysterious-laptop-65928 (07/22/2021, 1:36 PM):

careful-insurance-60247 (07/22/2021, 2:18 PM):
source:
  type: mssql
  config:
    host_port: host:1433
    username: <user>
    password: <password>
    database: <db>
    table_pattern:
      deny:
        - "^.*\\.sys_.*" # deny all tables that start with sys_
        - "^.*\\.cdc.*"
transformer:
  type: "simple_add_dataset_tags"
  config:
    tag_urns:
      - "urn:li:tag:NeedsDocumentation"
sink:
  type: "datahub-rest"
  config:
    server: "http://<IP>:8080"
Error:
datahub ingest -c ./mssql_poc.yml
1 validation error for PipelineConfig
transformers
value is not a valid list (type=type_error.list)
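
The pydantic message points at the shape of the transformer section: in a recipe the key is transformers (plural), and it takes a list of entries. A sketch of the corrected block:

transformers:
  - type: "simple_add_dataset_tags"
    config:
      tag_urns:
        - "urn:li:tag:NeedsDocumentation"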

future-waitress-970 (07/22/2021, 3:00 PM):
airflow connections add --conn-type 'datahub_rest' 'datahub_rest_default' --conn-host 'http://172.17.0.1:9002'
I get:
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
(Background on this error at: http://sqlalche.me/e/e3q8)

prehistoric-yak-75049 (07/22/2021, 7:32 PM):

strong-restaurant-35629 (07/23/2021, 11:16 AM):

mysterious-lamp-73086 (07/24/2021, 5:32 PM):

thankful-family-51777 (07/26/2021, 6:10 AM):
datahub-frontend-react | Caused by: com.linkedin.r2.RemoteInvocationException: Received error 414 from server for URI http://datahub-gms:8080/datasets
datahub-frontend-react | at com.linkedin.restli.internal.client.ExceptionUtil.exceptionForThrowable(ExceptionUtil.java:98)
datahub-frontend-react | at com.linkedin.restli.client.RestLiCallbackAdapter.convertError(RestLiCallbackAdapter.java:66)
datahub-frontend-react | at com.linkedin.common.callback.CallbackAdapter.onError(CallbackAdapter.java:86)
datahub-frontend-react | at com.linkedin.r2.message.timing.TimingCallback.onError(TimingCallback.java:81)
datahub-frontend-react | at com.linkedin.r2.transport.common.bridge.client.TransportCallbackAdapter.onResponse(TransportCallbackAdapter.java:47)
datahub-frontend-react | at com.linkedin.r2.filter.transport.FilterChainClient.lambda$createWrappedClientTimingCallback$0(FilterChainClient.java:113)
datahub-frontend-react | at com.linkedin.r2.filter.transport.ResponseFilter.onRestError(ResponseFilter.java:79)
datahub-frontend-react | at com.linkedin.r2.filter.TimedRestFilter.onRestError(TimedRestFilter.java:92)
datahub-frontend-react | at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:166)
datahub-frontend-react | at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:132)
datahub-frontend-react | at com.linkedin.r2.filter.FilterChainIterator.onError(FilterChainIterator.java:101)
datahub-frontend-react | at com.linkedin.r2.filter.TimedNextFilter.onError(TimedNextFilter.java:48)
datahub-frontend-react | at com.linkedin.r2.filter.message.rest.RestFilter.onRestError(RestFilter.java:84)
datahub-frontend-react | at com.linkedin.r2.filter.TimedRestFilter.onRestError(TimedRestFilter.java:92)
datahub-frontend-react | at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:166)
datahub-frontend-react | at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:132)
datahub-frontend-react | at com.linkedin.r2.filter.FilterChainIterator.onError(FilterChainIterator.java:101)
datahub-frontend-react | at com.linkedin.r2.filter.TimedNextFilter.onError(TimedNextFilter.java:48)
datahub-frontend-react | at com.linkedin.r2.filter.message.rest.RestFilter.onRestError(RestFilter.java:84)
datahub-frontend-react | at com.linkedin.r2.filter.TimedRestFilter.onRestError(TimedRestFilter.java:92)
datahub-frontend-react | at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:166)
datahub-frontend-react | at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:132)
datahub-frontend-react | at com.linkedin.r2.filter.FilterChainIterator.onError(FilterChainIterator.java:101)
datahub-frontend-react | at com.linkedin.r2.filter.TimedNextFilter.onError(TimedNextFilter.java:48)
datahub-frontend-react | at com.linkedin.r2.filter.message.rest.RestFilter.onRestError(RestFilter.java:84)
datahub-frontend-react | at com.linkedin.r2.filter.TimedRestFilter.onRestError(TimedRestFilter.java:92)
datahub-frontend-react | at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:166)
datahub-frontend-react | at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:132)
datahub-frontend-react | at com.linkedin.r2.filter.FilterChainIterator.onError(FilterChainIterator.java:101)
datahub-frontend-react | at com.linkedin.r2.filter.TimedNextFilter.onError(TimedNextFilter.java:48)
datahub-frontend-react | at com.linkedin.r2.filter.transport.ClientRequestFilter.lambda$createCallback$0(ClientRequestFilter.java:102)
datahub-frontend-react | at com.linkedin.r2.transport.http.common.HttpBridge$1.onResponse(HttpBridge.java:82)
datahub-frontend-react | at com.linkedin.r2.transport.http.client.rest.ExecutionCallback.lambda$onResponse$0(ExecutionCallback.java:64)
datahub-frontend-react | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
datahub-frontend-react | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
datahub-frontend-react | at java.lang.Thread.run(Thread.java:748)

adamant-pharmacist-61996 (07/26/2021, 8:49 AM):

cool-iron-6335 (07/26/2021, 2:02 PM):
t2 that belongs to db2 is not supposed to be ingested into DataHub. I have only db1 and db2 in my database.
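
If the intent is to keep db2 out entirely, a sketch of a deny filter (the mysql source type and the pattern scope are assumptions; in sql_common-based sources these patterns are regexes):

source:
  type: mysql       # assumed
  config:
    schema_pattern:
      deny:
        - "^db2$"   # skip the db2 database entirely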

fancy-controller-77815 (07/26/2021, 6:44 PM):

gifted-queen-61023 (07/27/2021, 10:52 AM):
.json or .csv with information regarding some dashboards that we use. We would like to extend the data discovery capabilities of DataHub with not only automatic discovery (an awesome experience so far) but also manually introduced metadata.
In this way we can easily add and tweak metadata for the most-used reports, spread across all platforms (metadata like title, description (with a link to the tool), tags, etc.).
I'm aware of the source type file, but it seems too verbose due to being "from a previously generated file". Is it easy to develop a .json with the correct syntax to feed DataHub?
I also noticed that demo_data.json is generated from a .csv (directives) with the help of the enrich.py script (source). Is it easy to tweak it to choose whether entries should fall under Dashboards instead of Datasets? Or even make it a feature? 😊
Thanks in advance 🙂
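
For what it's worth, a sketch of loading hand-authored metadata (the filename is hypothetical): the file source reads a JSON list of MetadataChangeEvents, so a hand-written .json in that shape can describe dashboards as well as datasets.

source:
  type: file
  config:
    filename: "./manual_dashboards.json"   # hypothetical hand-written MCE file
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"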

colossal-furniture-76714 (07/27/2021, 3:52 PM):