salmon-rose-54694
02/25/2022, 5:36 AM
curved-carpenter-44858
02/25/2022, 6:42 AM
source:
  type: hive
  config:
    scheme: hive+http
    host_port: 'hive-metastore.hive.svc.cluster.local:9083'
    database: null
    username: null
    password: null
sink:
  type: datahub-rest
  config:
    server: 'http://datahub-datahub-gms.datahub.svc.cluster.local:8080'
When I ran it from the datahub frontend I got the below error. (pasted partial logs)
......
version, status, reason = self._read_status()\n'
'File "/usr/local/lib/python3.9/http/client.py", line 289, in _read_status\n'
' raise RemoteDisconnected("Remote end closed connection without"\n'
'\n'
'RemoteDisconnected: Remote end closed connection without response\n',
"2022-02-25 06:21:18.926125 [exec_id=a071f153-5777-419f-9511-37214e1429b6] INFO: Failed to execute 'datahub ingest'",
'2022-02-25 06:21:18.926532 [exec_id=a071f153-5777-419f-9511-37214e1429b6] INFO: Caught exception EXECUTING '
'task_id=a071f153-5777-419f-9511-37214e1429b6, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
' File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/default_executor.py", line 119, in execute_task\n'
' self.event_loop.run_until_complete(task_future)\n'
' File "/usr/local/lib/python3.9/site-packages/nest_asyncio.py", line 81, in run_until_complete\n'
' return f.result()\n'
' File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result\n'
.......
In the metastore logs I found this. Am I missing anything? What could be the reason?
2022-02-25T06:19:46,599 ERROR [pool-6-thread-200] server.TThreadPoolServer: Thrift error occurred during processing of message.
org.apache.thrift.protocol.TProtocolException: Missing version in readMessageBegin, old client?
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:228) ~[libthrift-0.9.3.jar:0.9.3]
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:76) ~[hive-standalone-metastore-3.1.2.jar:3.1.2]
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) [libthrift-0.9.3.jar:0.9.3]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_322]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_322]
at java.lang.Thread.run(Thread.java:750) [?:1.8.0_322]
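For reference, the metastore error above ("Missing version in readMessageBegin, old client?") is what a binary Thrift server typically logs when it receives traffic in a different protocol, for example HTTP. Since the recipe uses scheme: hive+http against the metastore's Thrift port 9083, a sketch of the alternative worth checking is below, with the default binary scheme pointed at HiveServer2 (the hostname and port 10000 are illustrative assumptions, not values from this thread):

source:
  type: hive
  config:
    # the default scheme ('hive', binary Thrift via PyHive) is assumed here;
    # hive+http/https is only for HiveServer2 running in HTTP transport mode
    host_port: 'hiveserver2.hive.svc.cluster.local:10000'  # hypothetical HiveServer2 endpoint
    database: null
    username: null
    password: null
sink:
  type: datahub-rest
  config:
    server: 'http://datahub-datahub-gms.datahub.svc.cluster.local:8080'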
few-air-56117
02/25/2022, 10:18 AM
few-air-56117
02/25/2022, 11:10 AM
source:
  type: bigquery
  config:
    project_id: <project>
    credential:
    include_table_lineage: true
    stateful_ingestion.enabled: true
sink:
  type: datahub-rest
  config:
    server: 'http://localhost:8080'
but I got this:
stateful_ingestion.enabled
  extra fields not permitted (type=value_error.extra)
and with this recipe:
source:
  type: bigquery
  config:
    project_id: am-dwh-t1
    credential:
    include_table_lineage: true
    stateful_ingestion:
      enabled: true
sink:
  type: datahub-rest
  config:
    server: 'http://localhost:8080'
but I got:
'BigQuerySource' object has no attribute 'config'
Thx 😄
dazzling-judge-80093
02/25/2022, 11:13 AM
pipeline_name -> https://datahubproject.io/docs/metadata-ingestion/source_docs/stateful_ingestion/
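A minimal sketch of how that might look, following the stateful ingestion doc linked above: pipeline_name at the top level of the recipe and stateful_ingestion nested under the source config (the pipeline name and project id are placeholders):

pipeline_name: my_bigquery_pipeline  # required for stateful ingestion
source:
  type: bigquery
  config:
    project_id: <project>
    include_table_lineage: true
    stateful_ingestion:
      enabled: true
sink:
  type: datahub-rest
  config:
    server: 'http://localhost:8080'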
some-crayon-90964
02/25/2022, 3:44 PM
shy-island-99768
02/25/2022, 4:06 PM
Source (lookml) report:
{'workunits_produced': 0,
'workunit_ids': [],
'warnings': {},
'failures': {'/models/google_ads.model.lkml': ['cannot resolve include /views/vm_datawarehouse/sales/sales_orderline.view.lkml']},
'models_discovered': 9,
'models_dropped': [...],
'views_discovered': 0,
'views_dropped': []}
Sink (datahub-rest) report:
{'records_written': 0,
'warnings': [],
'failures': [],
'downstream_start_time': None,
'downstream_end_time': None,
'downstream_total_latency_in_seconds': None}
Pipeline finished with failures
Note that I'm running this in Docker:
FROM python:3.8-slim-bullseye
WORKDIR /app
COPY ./models /models
COPY ./views /views
COPY ./receipt.yml /app/
RUN pip install acryl-datahub[lookml]
RUN ls /views/vm_datawarehouse/sales/
CMD ["datahub", "ingest", "-c", "receipt.yml"]
With the recipe (receipt.yml):
source:
  type: "lookml"
  config:
    # Coordinates
    base_folder: /models/
    # Options
    api:
      # Coordinates for your looker instance
      base_url: https://host
      client_id: ID
      client_secret: SECRET
    github_info:
      repo: VanMoof/looker
    model_pattern:
      allow:
        - "google_ads"
sink:
  type: "datahub-rest"
  config:
    server: "http://HOST:8080"
    token: "TOKEN"
Note that when running the 'ls' I can see the view in the correct folder. Any ideas why this is failing? Is it something related to the absolute paths?
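One thing that may be worth ruling out, as a sketch only: LookML includes such as /views/... are project-root-relative, and the lookml source appears to resolve them under base_folder, so with base_folder: /models/ it would look for /models/views/... rather than /views/.... A hypothetical layout where both folders live under one project root (the /app/looker path is an assumption, not from this thread):

source:
  type: lookml
  config:
    # assumption: base_folder points at the project root containing both the
    # models/ and views/ directories, so root-relative includes resolve beneath it
    base_folder: /app/looker
    model_pattern:
      allow:
        - "google_ads"
sink:
  type: datahub-rest
  config:
    server: "http://HOST:8080"
    token: "TOKEN"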
red-accountant-48681
02/25/2022, 4:32 PM
gifted-queen-80042
02/25/2022, 5:55 PM
rapid-article-86196
02/27/2022, 10:43 AM
.**email.**
should be tagged as email
rough-van-26693
02/28/2022, 4:30 AM
rough-van-26693
02/28/2022, 5:10 AM
The field at path '/searchAcrossEntities/searchResults[0]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value. The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'
The field at path '/searchAcrossEntities/searchResults[1]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value. The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'
The field at path '/searchAcrossEntities/searchResults[2]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value. The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'
rough-van-26693
02/28/2022, 6:16 AM
witty-dream-29576
02/28/2022, 1:23 PM
numerous-camera-74294
02/28/2022, 1:57 PM
silly-beach-19296
02/28/2022, 5:43 PM
millions-waiter-49836
02/28/2022, 7:44 PM
We have postgres data sources with the same db and table names, so naturally we want to use platform_instance to customize the URNs... only to find out the postgres recipe doesn't support platform_instance as MySQL and MSSQL do. Can I ask if there is any special consideration for this?
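For context, a sketch of how platform_instance is set in recipes that do support it (mysql shown purely as an illustration; the host and instance name are placeholders):

source:
  type: mysql
  config:
    host_port: 'mysql-host:3306'       # placeholder
    platform_instance: warehouse_eu1   # placeholder instance name reflected in dataset URNs
sink:
  type: datahub-rest
  config:
    server: 'http://localhost:8080'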
rough-van-26693
03/01/2022, 2:13 AM
square-solstice-69079
03/01/2022, 9:22 AM
'DatabaseError: (cx_Oracle.DatabaseError) DPI-1047: Cannot locate a 64-bit Oracle Client library: "libclntsh.so: cannot open shared '
'object file: No such file or directory". See https://cx-oracle.readthedocs.io/en/latest/user_guide/installation.html for help\n'
'(Background on this error at: http://sqlalche.me/e/13/4xp6)\n',
"2022-03-01 09:16:17.371091 [exec_id=9a7d99e0-a12e-433e-ba3e-c4d384602ff6] INFO: Failed to execute 'datahub ingest'",
I did install the oracle package with pip install 'acryl-datahub[oracle]'.
Any thoughts on where the error could be?
numerous-camera-74294
03/02/2022, 12:29 PM
alert-hydrogen-52567
03/02/2022, 2:50 PM
numerous-holiday-52504
03/02/2022, 5:31 PM
I suspect it's the host_port piece, but I can't figure out what is wrong. Whenever I connect with Python I seem to have to provide Azure credentials as part of the connection string.
source:
  type: snowflake
  config:
    host_port: [snowflakeaccount].[azure-region].azure.snowflakecomputing.com
    warehouse: *****
    username: *****
    password: *****
    role: ****
sink:
  type: datahub-rest
  config:
    server: 'http://localhost:9002/api/gms'
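In case it helps, I believe host_port for the snowflake source is usually just the Snowflake account identifier, with no http:// scheme. A sketch with placeholder values (the account, region, and credentials below are illustrative, not from this thread):

source:
  type: snowflake
  config:
    # placeholder account identifier; the full
    # '<account>.<region>.azure.snowflakecomputing.com' form should also work
    host_port: 'xy12345.west-europe.azure'
    warehouse: MY_WH
    username: MY_USER
    password: MY_PASSWORD
    role: MY_ROLE
sink:
  type: datahub-rest
  config:
    # the recipe above targets the frontend proxy (9002/api/gms); pointing
    # straight at GMS on port 8080 is another common setup
    server: 'http://localhost:8080'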
nutritious-bird-77396
03/02/2022, 6:46 PM
Does Kafka Connect ingestion support username and password config parameters?
Documentation states it supports it - https://datahubproject.io/docs/metadata-ingestion/source_docs/kafka-connect
But in the code I don't see the creds passed for connection - https://github.com/linkedin/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/kafka_connect.py#L728
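For reference, a sketch of where those parameters would sit per the kafka-connect source docs linked above (the URI and credentials are placeholders):

source:
  type: kafka-connect
  config:
    connect_uri: 'http://localhost:8083'  # placeholder Kafka Connect REST endpoint
    username: connect_user                # placeholder
    password: connect_password            # placeholder
sink:
  type: datahub-rest
  config:
    server: 'http://localhost:8080'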
nutritious-bird-77396
03/02/2022, 9:26 PM
How does the platform_instance_map parameter work in the Kafka Connect ingestion connector?
If there are 2 different postgres instances, each with its own platform_instance name such as instance1 and instance2, how will the map parameter look?
Not sure how it would work for the same platform having multiple instances with the example - https://datahubproject.io/docs/metadata-ingestion/source_docs/kafka-connect#config-details
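As far as I can tell from that page, platform_instance_map is keyed by platform name, which is also why the two-instances case above is awkward: one platform key maps to a single instance. A sketch (all names are placeholders):

source:
  type: kafka-connect
  config:
    connect_uri: 'http://localhost:8083'
    # assumption: one entry per platform, so two postgres instances cannot be
    # told apart through this map alone
    platform_instance_map:
      postgres: instance1
sink:
  type: datahub-rest
  config:
    server: 'http://localhost:8080'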
powerful-nest-24866
03/03/2022, 2:53 AM
narrow-finland-38723
03/03/2022, 12:35 PM
high-toothbrush-90528
03/03/2022, 2:25 PM
[
  {
    "auditHeader": null,
    "entityType": "container",
    "entityUrn": "urn:li:container:DATAPR",
    "changeType": "UPSERT",
    "aspectName": "containerProperties",
    "aspect": {
      "value": "{\"name\": \"datahub_db\", \"description\": \"DPROD\" }",
      "contentType": "application/json"
    },
    "systemMetadata": null
  },
  {
    "auditHeader": null,
    "entityType": "container",
    "entityUrn": "urn:li:container:DATAPR",
    "changeType": "UPSERT",
    "aspectName": "domains",
    "aspect": {
      "value": "{\"domains\": [\"urn:li:domain:marketing\"] }",
      "contentType": "application/json"
    },
    "systemMetadata": null
  },
  {
    "auditHeader": null,
    "entityType": "dataset",
    "entityUrn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",
    "changeType": "UPSERT",
    "aspectName": "container",
    "aspect": {
      "value": "{\"container\": \"urn:li:container:DATAPR\" }",
      "contentType": "application/json"
    },
    "systemMetadata": null
  }
]
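If it helps, a sketch of a recipe that would push a JSON file like the one above through the CLI, assuming it is saved as mcps.json and that the file source accepts these MCP-style objects (both the filename and that assumption are mine, not from the thread):

source:
  type: file
  config:
    filename: ./mcps.json  # hypothetical path to the JSON array above
sink:
  type: datahub-rest
  config:
    server: 'http://localhost:8080'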
rich-policeman-92383
03/03/2022, 6:28 PM
Caused by: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "pk_metadata_aspect_v2"
gifted-queen-80042
03/03/2022, 7:43 PM
I set profiling.enabled: True in my recipe. I can see the datasetProfile aspect in the output, along with the values in the respective aspects being produced, in terms of rowCount, columnCount, fieldProfiles, etc. However, when I ingest onto the UI, with sink type set to datahub-kafka, I don't see the data. The Stats tab remains disabled. Any idea why this might be happening?
mysterious-nail-70388
03/04/2022, 5:30 AM