broad-crowd-13788
11/12/2021, 10:35 PM
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Schema being registered is incompatible with an earlier schema; error code: 409
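This 409 comes from the Confluent Schema Registry: the schema being registered fails the compatibility check configured for its subject. A hedged way to investigate over the registry's REST API (port and subject name below are placeholders; substitute the ones from your logs):

# check the global and per-subject compatibility level
curl -s http://localhost:8081/config
curl -s http://localhost:8081/config/MetadataChangeEvent_v4-value

# test a candidate schema (wrapped as {"schema": "..."} in new-schema.json) against the latest registered version
curl -s -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data @new-schema.json \
  http://localhost:8081/compatibility/subjects/MetadataChangeEvent_v4-value/versions/latest

# if the breaking change is intentional, the subject's compatibility level can be relaxed
curl -s -X PUT -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{"compatibility": "NONE"}' \
  http://localhost:8081/config/MetadataChangeEvent_v4-value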
stocky-guitar-68560
11/13/2021, 4:46 PM
nice-planet-17111
11/15/2021, 1:17 AM
clean-crayon-15379
11/15/2021, 8:55 AM
wooden-arm-26381
11/15/2021, 1:11 PM
Failed to remove term: An unknown error occurred.
Only a re-ingestion of those terms and then removing them seems to work for me.
handsome-belgium-11927
11/15/2021, 2:05 PM
The field at path '/listRecommendations/modules[4]/content[0]/entity/glossaryTermInfo' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value. The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'GlossaryTermInfo' within parent type 'GlossaryTerm' (code undefined)
Appears everywhere after glossary ingestion.
brief-lizard-77958
11/15/2021, 2:11 PM
future-hamburger-62563
11/16/2021, 2:39 AM
docker compose v2 is throwing an error when the environment variables have whitespace in them. The command also appears to throw an error when there is a dot (.) in the key of a shell variable. Shirshanka mentioned there was just one file to change, but I can't figure out how to do it. I thought about putting escaped quotes around the values in/around line 50 of generate_docker_quickstart.py, but I don't know how that would affect numerical values like IP addresses or port numbers.
Another area I'm unsure of is the Actions build & test. It's failing at quickstart-compose-validation, but I'm not really following what's happening. Is the script running quickstart_docker_quickstart.sh in docker/quickstart to generate temp.quickstart.yml and then comparing it to a fresh copy generated by generate_docker_quickstart.py? Or is something else happening (like temp.quickstart.yml getting pulled from some preconfigured folder out of sight)?
PR: https://github.com/linkedin/datahub/pull/3522
Any thoughts would be appreciated. If you want to test it on your system: load Docker, turn on the v2 docker compose option in the settings, and try running docker/dev.sh. In my case, I got this error: https://datahubproject.io/docs/docker/development#unexpected-character
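For the quoting idea, a rough sketch of what that could look like; this is not the actual generate_docker_quickstart.py code, and the helper name and the environment-dict shape are assumptions for illustration. The idea is to quote values containing whitespace so docker compose v2 reads them as a single token, while leaving purely numeric values (ports, IP addresses) untouched.

import re

def quote_env_value(value):
    # hypothetical helper: leave ports / IP-like values untouched
    text = str(value)
    if re.fullmatch(r"[\d.:]+", text):
        return text
    # wrap values containing whitespace in double quotes (escaping any embedded quotes)
    if any(ch.isspace() for ch in text) and not (text.startswith('"') and text.endswith('"')):
        return '"' + text.replace('"', '\\"') + '"'
    return text

# assumed usage while merging the compose files into quickstart.yml:
# for key, value in service.get("environment", {}).items():
#     service["environment"][key] = quote_env_value(value)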
11/17/2021, 5:55 AM
Does database_alias work when ingesting from mysql? (Mine does not work; the data just gets ingested under the original database name, and even in the urn it's still the original name.)
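For reference, a minimal recipe sketch of how database_alias is usually set, assuming the mysql source honours it when building dataset urns; every value below is a placeholder:

source:
  type: "mysql"
  config:
    host_port: "localhost:3306"
    username: "user"
    password: "pass"
    database: "original_db"
    database_alias: "aliased_db"  # expected to replace original_db in dataset urns
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"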
melodic-helmet-78607
11/17/2021, 6:51 AM
handsome-football-66174
11/17/2021, 1:37 PM
better-orange-49102
11/17/2021, 1:46 PM
javax.servlet.ServletException: org.springframework.web.util.NestedServletException: Request processing failed; nested exception is java.lang.UnsupportedOperationException: GraphQL gets not supported.
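"GraphQL gets not supported" is what the servlet returns when the GraphQL endpoint receives an HTTP GET; it only accepts POST. A hedged check against GMS (path and port assumed from the default quickstart):

# a plain GET reproduces the "GraphQL gets not supported" error
curl http://localhost:8080/api/graphql

# the same endpoint accepts queries as a POST with a JSON body
curl -X POST http://localhost:8080/api/graphql \
  -H 'Content-Type: application/json' \
  --data '{"query": "<your GraphQL query here>"}'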
aloof-london-98698
11/17/2021, 5:28 PM
1 validation error for SnowflakeConfig
database_pattern -> allow
value is not a valid list (type=type_error.list)
source:
  type: "snowflake"
  config:
    # Coordinates
    host_port: "xxxxxx"
    warehouse: "xxxxxxx"
    # Credentials
    username: "username"
    password: "password"
    role: "role"
    include_table_lineage: "True"
    database_pattern:
      allow: "database_name"
      ignoreCase: "True"
sink:
  type: "datahub-rest"
  config:
    server: xxxxxx
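The validation error is saying that database_pattern.allow must be a list rather than a single string (the entries are treated as regular expressions). A hedged correction of that part of the recipe, keeping database_name as a placeholder:

    database_pattern:
      allow:
        - "database_name"
      ignoreCase: "True"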
agreeable-thailand-43234
11/17/2021, 11:27 PM
I'm on acryl-datahub, version 0.8.16.11 with the docker quickstart (the image tag says head). I'm trying to ingest data using the linkedin/datahub-ingestion docker image with the following command:
docker run -v /Desktop/test:/datahub-ingestion linkedin/datahub-ingestion ingest -c ./datahub-ingestion/config.yaml
The config.yaml looks like this:
source:
  type: "athena"
  config:
    # Coordinates
    aws_region: "xxx"
    work_group: "xxx"
    # Credentials
    username: "xxx"
    password: "xxx"
    database: "xxx"
    # Options
    s3_staging_dir: "s3://xxx/"
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"  # also tried "http://datahub-gms:8080"
Then I got this error:
ERROR {datahub.ingestion.run.pipeline:52} - failed to write record with workunit admincube.cubeprod with ('Unable to emit metadata to DataHub GMS', {'message': "HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: /datasets?
I tried linkedin/datahub-ingestion:latest as well as linkedin/datahub-ingestion:head.
Any idea?
Cheers!
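Since the ingestion runs inside its own container, localhost:8080 points at the container itself, not at GMS. A hedged workaround is to attach the container to the quickstart's network and address GMS by its service name (the network and service names below assume the default docker quickstart):

# run the ingestion container on the same docker network as the quickstart containers
docker run --network datahub_network \
  -v /Desktop/test:/datahub-ingestion \
  linkedin/datahub-ingestion ingest -c ./datahub-ingestion/config.yaml

# and point the sink at the GMS service name instead of localhost:
# sink:
#   type: "datahub-rest"
#   config:
#     server: "http://datahub-gms:8080"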
handsome-belgium-11927
11/18/2021, 1:29 PM
The Related Entities tab is empty. What may be the problem?
thousands-intern-95970
11/18/2021, 2:45 PM
mysterious-park-53124
11/19/2021, 4:07 AM
I get java.net.UnknownHostException: schema-registry after running docker-compose up. I customized the user.props file, then ran docker-compose up and ingested data. What may be the problem?
datahub-gms | 04:05:22.429 [qtp544724190-23] ERROR i.c.k.s.client.rest.RestService - Failed to send HTTP request to endpoint: http://schema-registry:8081/subjects/MetadataAuditEvent_v4-value/versions
datahub-gms | java.net.UnknownHostException: schema-registry
datahub-gms | at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
datahub-gms | at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
datahub-gms | at java.net.Socket.connect(Socket.java:607)
datahub-gms | at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
datahub-gms | at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
datahub-gms | at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
datahub-gms | at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
datahub-gms | at sun.net.www.http.HttpClient.New(HttpClient.java:339)
datahub-gms | at sun.net.www.http.HttpClient.New(HttpClient.java:357)
datahub-gms | at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1226)
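The UnknownHostException means the datahub-gms container cannot resolve the schema-registry hostname, which usually comes down to the schema-registry container not running or the two containers not sharing a network. A few hedged checks, assuming the default quickstart container and network names:

# is the schema-registry container up?
docker ps --filter name=schema-registry

# are datahub-gms and schema-registry attached to the same network?
docker network inspect datahub_network --format '{{range .Containers}}{{.Name}} {{end}}'

# can gms reach it by name? (only works if curl is available inside the gms image)
docker exec datahub-gms curl -s http://schema-registry:8081/subjects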
some-microphone-33485
11/19/2021, 5:14 PM
nutritious-bird-77396
11/19/2021, 11:03 PM
I'm running datahub-frontend using docker and I am able to browse through datasets successfully.
When clicking on Analytics I get "An unknown error occurred. (code 500)".
I have DATAHUB_ANALYTICS_ENABLED=true. To connect to ES I have these env variables:
ELASTIC_CLIENT_HOST=zzzzzzzzz.us-east-1.es.amazonaws.com
ELASTIC_CLIENT_PORT=443
ELASTIC_CLIENT_USERNAME=username
ELASTIC_CLIENT_PASSWORD=password
ELASTIC_CLIENT_USE_SSL=true
USE_AWS_ELASTICSEARCH=true
Any idea if this is because of the ES/Kafka connection or something else? I don't see much in the logs though...
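One hedged thing to check: the Analytics screen reads from the usage-event index in Elasticsearch (datahub_usage_event in a default setup), so a 500 there often means that index, or the pipeline feeding it, does not exist yet. With the credentials above:

# list indices on the AWS ES domain and look for the usage-event index
curl -s -u username:password \
  "https://zzzzzzzzz.us-east-1.es.amazonaws.com:443/_cat/indices?v" | grep -i usage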
nutritious-bird-77396
11/19/2021, 11:06 PM
In GMS the env var is ELASTICSEARCH_HOST, whereas in the frontend it's ELASTIC_CLIENT_HOST. I think it's not a bad idea to make the env vars have the same name across apps...
red-pizza-28006
11/22/2021, 12:27 PM
red-pizza-28006
11/22/2021, 7:23 PM
breezy-guitar-97226
11/23/2021, 11:31 AM
Caused by: org.pac4j.core.exception.TechnicalException: State parameter is different from the one sent in authentication request. Session expired or possible threat of cross-site request forgery
wonderful-quill-11255
11/23/2021, 12:59 PM
It talks http with the gms, not https. In our setup, all datahub components talk to each other over SSL. However, if I change the scheme here to https, I get a 400 Bad Request response back from the GMS. I was wondering if I'm missing something else that might have to be configured to make the connection work over SSL. I see that, by coincidence, some support for https was committed to master 12 hours ago, but we prefer to stay a few releases behind latest. Perhaps @big-carpet-38439 you have a tip?
brief-wolf-70822
11/23/2021, 8:10 PM
❯ kubectl exec -n xxxxxx datahub-datahub-gms-7bfb87d7cd-7sksf -- env | grep METADATA
METADATA_CHANGE_EVENT_NAME=xxx.MetadataChangeEvent_v4
METADATA_AUDIT_EVENT_NAME=xxx.MetadataAuditEvent_v4
FAILED_METADATA_CHANGE_EVENT_NAME=xxx.FailedMetadataChangeEvent
However, GMS startup fails with:
java.lang.IllegalStateException: Topic(s) [MetadataChangeEvent_v4] is/are not present and missingTopicsFatal is true
I also tried setting SPRING_KAFKA_LISTENER_MISSING_TOPICS_FATAL=false, but that didn't seem to do anything. Any advice?
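Worth noting that the failing check is looking for the default, unprefixed topic name even though the env vars carry a prefix, so it helps to see what the broker actually has. A hedged diagnostic (broker address is a placeholder):

# compare the topics that exist with the one GMS says is missing
kafka-topics.sh --bootstrap-server <broker-host>:9092 --list | grep -i metadata

# if the prefixed topic exists but GMS still wants the default name, the component doing the
# check is probably not picking up the *_EVENT_NAME vars; creating the default-named topic is
# only a blunt stopgap to unblock startup, not a fix for the naming mismatch
kafka-topics.sh --bootstrap-server <broker-host>:9092 --create \
  --topic MetadataChangeEvent_v4 --partitions 1 --replication-factor 1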
lively-jackal-83760
11/24/2021, 12:23 PM
nice-country-99675
11/24/2021, 11:36 PM
I ran datahub delete -query AuM. There's one match for the query but it's not deleted... I was able to delete everything else, but for some reason there are two datasets that refuse to be deleted 🤷 ... do you want me to provide more debug information before I nuke the DB?
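If delete-by-query keeps skipping those two datasets, a hedged fallback is deleting them individually by urn; the urn below is a placeholder, and whether --hard is available and appropriate depends on your CLI version and whether you want the rows purged from the backing store:

# soft-delete a stubborn dataset explicitly by urn
datahub delete --urn "urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,PROD)"

# hard delete, if supported by your CLI version, also removes the underlying records
datahub delete --urn "urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,PROD)" --hard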
lemon-receptionist-90470
11/25/2021, 1:04 PM
datahub-gms:
  enabled: true
  image:
    repository: xxxxx/datahub-gms
    tag: "v0.8.14"
  service:
    type: ClusterIP
  ingress:
    enabled: true
    annotations:
      cert-manager.io/cluster-issuer: vault
    hosts:
      - host: "datahub-gms-api.xxxx.xxxx"
        paths: ["/"]
    tls:
      - secretName: datahub-gms-tls
        hosts:
          - "datahub-gms-api.xxxx.xxxx"
My file custom-ingestion.yml:
source:
  type: file
  config:
    # Coordinates
    filename: output.json
sink:
  type: "datahub-rest"
  config:
    server: "http://datahub-gms-api.xxxx.xxxx"
Error:
When I execute datahub ingest -c custom-ingestion.yml --dry-run, I get the following error:
HTTPError: 404 Client Error: Not Found for url: http://datahub-gms-api.xxxx.xxxx/config
Am I missing something?
Thanks!
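The CLI probes <server>/config before emitting anything, so the 404 suggests the ingress is not routing that path to GMS (or the hostname resolves to something other than GMS). A hedged way to compare the two paths; the namespace and service name are assumptions based on typical helm chart naming:

# hit GMS directly through a port-forward; it should return a small JSON config payload
kubectl port-forward -n <namespace> svc/datahub-datahub-gms 8080:8080 &
curl -s http://localhost:8080/config

# then compare with what the ingress host returns for the same path
curl -s -i http://datahub-gms-api.xxxx.xxxx/config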
abundant-flag-19546
11/26/2021, 8:32 AM
import datahub.emitter.mce_builder as builder
from datahub.emitter.rest_emitter import DatahubRestEmitter
# Construct a lineage object.
lineage_mce = builder.make_lineage_mce(
[], # Empty upstream dataset to delete the lineage.
builder.make_dataset_urn("bigquery", "test.TEST_DATASET.dev", "PROD"),
)
# Create an emitter to the GMS REST API.
emitter = DatahubRestEmitter("http://localhost:8080")
# Emit metadata!
emitter.emit_mce(lineage_mce)
How can I delete the lineage with python REST Emitter?
I'm using the latest (v0.8.17) version.
red-pizza-28006
11/29/2021, 3:44 PM
After upgrading to 0.8.17.2, I suddenly started seeing this error in Airflow DAGs. The only change in the config is adding a simple transformer that adds dataset owners.
Traceback (most recent call last):
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/great_expectations/data_context/data_context.py", line 1869, in _instantiate_datasource_from_config
    ] = self._build_datasource_from_config(name=name, config=config)
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/great_expectations/data_context/data_context.py", line 1938, in _build_datasource_from_config
    config_defaults={"module_name": module_name},
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/great_expectations/data_context/util.py", line 121, in instantiate_class_from_config
    class_instance = class_(**config_with_defaults)
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/datahub/ingestion/source/ge_data_profiler.py", line 64, in sqlalchemy_datasource_init
    underlying_datasource_init(self, *args, **kwargs, engine=conn)
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/great_expectations/datasource/sqlalchemy_datasource.py", line 217, in __init__
    name, "ModuleNotFoundError: No module named 'sqlalchemy'"
great_expectations.exceptions.exceptions.DatasourceInitializationError: Cannot initialize datasource my_sqlalchemy_datasource-a18b60ef-52a5-481c-a73f-769ff10a8ffe, error: ModuleNotFoundError: No module named 'sqlalchemy'
During handling of the above exception, another exception occurred: