numerous-address-22061
08/31/2023, 6:40 PMable-library-93578
08/31/2023, 11:21 PMfierce-doctor-85079
09/01/2023, 7:15 AMfierce-doctor-85079
09/01/2023, 7:16 AMfierce-doctor-85079
09/01/2023, 7:17 AMbland-orange-13353
09/01/2023, 7:50 AMlate-addition-48515
09/01/2023, 8:27 AMdef _post_lineage(self, parents, child):
# Implement the API call here
lineage_mce = builder.make_lineage_mce(
[builder.make_dataset_urn("sbx-ml", 'dataset_1'),
builder.make_dataset_urn("sbx-ml", 'dataset_2')], # upstream
builder.make_dataset_urn("sbx-ml", 'dataset_3), # downstream
)
# Create an emitter to the GMS REST API.
emitter = DatahubRestEmitter("<http://34:8080>")
# Emit metadata!
emitter.emit_mce(lineage_mce)
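A minimal sketch of the same call, assuming the method should build URNs from its parents and child arguments instead of the hard-coded dataset names (the "sbx-ml" platform and the GMS address are placeholders carried over from the snippet above):
import datahub.emitter.mce_builder as builder
from datahub.emitter.rest_emitter import DatahubRestEmitter

def _post_lineage(self, parents, child):
    # Upstream URNs built from the parent dataset names passed in (placeholder platform).
    upstream_urns = [builder.make_dataset_urn("sbx-ml", name) for name in parents]
    # Downstream URN for the child dataset.
    downstream_urn = builder.make_dataset_urn("sbx-ml", child)
    lineage_mce = builder.make_lineage_mce(upstream_urns, downstream_urn)
    # Emit to the GMS REST endpoint (placeholder address).
    DatahubRestEmitter("http://<gms-host>:8080").emit_mce(lineage_mce)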
purple-refrigerator-27989
09/04/2023, 6:12 AMpurple-refrigerator-27989
09/04/2023, 7:35 AMfuture-yak-13169
09/05/2023, 2:18 AMdatahub-frontend:
image:
repository:
imagePullSecrets:
- name:
resources:
requests:
memory: 1Gi
cpu: 500m
limits:
memory: 1Gi
cpu: 500m
datahub-gms:
image:
repository:
imagePullSecrets:
- name:
resources:
requests:
cpu: 1000m
memory: 2Gi
limits:
cpu: 1000m
memory: 4Gi
livenessProbe:
initialDelaySeconds: 120
readinessProbe:
initialDelaySeconds: 120
extraEnvs:
- name: DATAHUB_TELEMETRY_ENABLED
value: "false"
- name: EBEAN_MAX_CONNECTIONS
value: "400"
- name: EBEAN_WAIT_TIMEOUT_MILLIS
value: "9000"
elasticsearchSetupJob:
image:
repository:
resources:
limits:
cpu: 250m
memory: 512Mi
requests:
cpu: 250m
memory: 512Mi
kafkaSetupJob:
image:
repository:
resources:
limits:
cpu: 1000m
memory: 1024Mi
requests:
cpu: 1000m
memory: 1024Mi
datahubUpgrade:
enabled: true
image:
repository:
imagePullSecrets:
- name:
resources:
limits:
cpu: 250m
memory: 256Mi
requests:
cpu: 250m
memory: 256Mi
restoreIndices:
resources:
limits:
cpu: 800m
memory: 3Gi
requests:
cpu: 500m
memory: 2Gi
esJavaOpts: "-Xmx2048m -Xms2048m"
datahubSystemUpdate:
image:
repository:
podSecurityContext: {}
securityContext: {}
podAnnotations: {}
resources:
limits:
cpu: 2000m
memory: 2048Mi
requests:
cpu: 1000m
memory: 1024Mi
global:
graph_service_impl: elasticsearch
sql:
datasource:
host:
hostForMysqlClient:
url:
username:
password:
secretRef: mysql-secrets
secretKey: mysql-root-password
kafka:
schemaregistry:
url: "http://prerequisites-cp-schema-registry:8081"
type: KAFKA
datahub:
version: v0.10.4
metadata_service_authentication:
enabled: true
-------------------------------------------------------------------
elasticsearch:
image:
imagePullSecrets:
- name:
sysInitContainer:
enabled: false
sysctlInitContainer:
enabled: false
esJavaOpts: "-Xmx2048m -Xms2048m"
replicas: 3
resources:
requests:
cpu: 100m
memory: 2Gi
limits:
cpu: 200m
memory: 4Gi
livenessProbe:
initialDelaySeconds: 120
readinessProbe:
initialDelaySeconds: 120
kafka:
global:
imageRegistry:
imagePullSecrets:
-
image:
registry:
repository: bitnami/kafka
pullSecrets:
resources:
requests:
cpu: 500m
memory: 512Mi
limits:
cpu: 1000m
memory: 1Gi
livenessProbe:
initialDelaySeconds: 120
readinessProbe:
initialDelaySeconds: 120
persistence:
enabled: true
storageClass: "nas"
accessModes:
- ReadWriteOnce
size: 200Gi
busy-analyst-35820
09/05/2023, 4:59 AMfierce-doctor-85079
09/05/2023, 5:17 AMfierce-doctor-85079
09/05/2023, 6:02 AMbland-orange-95847
09/05/2023, 6:11 AMMETADATA_SERVICE_AUTH
enabled and group ownership policies. It's hard to describe, but if you have some ideas in that area please have a look at this GitHub issue: https://github.com/datahub-project/datahub/issues/8781
Appreciate any help.
For me it looks like too much data is fetched somewhere and one indirection is not resolved correctly, but maybe I am missing something 🙂
future-yak-13169
09/05/2023, 9:51 AMCaused by: java.lang.RuntimeException: Request cannot be executed; I/O reactor status: STOPPED
at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:887)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:283)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:270)
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1632)
at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1602)
at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1572)
at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:1088)
at com.linkedin.metadata.search.elasticsearch.query.ESSearchDAO.executeAndExtract(ESSearchDAO.java:87)
... 13 common frames omitted
Caused by: java.lang.IllegalStateException: Request cannot be executed; I/O reactor status: STOPPED
at org.apache.http.util.Asserts.check(Asserts.java:46)
at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase.ensureRunning(CloseableHttpAsyncClientBase.java:90)
at org.apache.http.impl.nio.client.InternalHttpAsyncClient.execute(InternalHttpAsyncClient.java:123)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:279)
... 19 common frames omitted
Any advice on why this could be happening? Something to do with Elasticsearch?
quick-pizza-8906
09/05/2023, 11:19 AM0.10.x
version of DataHub. I was recently bitten by the searchAcrossEntities facet (bucket) count being limited to 20. I found two settings related to this:
1. The environment variable ELASTICSEARCH_QUERY_MAX_TERM_BUCKET_SIZE
2. The searchFlags.maxAggValues input parameter of the query
What I have noticed is that changing the value of ELASTICSEARCH_QUERY_MAX_TERM_BUCKET_SIZE does not change the actual limit on the returned bucket count, while changing searchFlags.maxAggValues (at least in version 0.10.5) actually does change the bucket count limit. I am a bit confused about what the intended relation between the env variable and the query input parameter is.
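For reference, a rough sketch of sending the searchFlags.maxAggValues input mentioned above to searchAcrossEntities through the GraphQL API; the endpoint path, auth header, and response field names here are assumptions for illustration, not verified against the schema:
import requests

# Illustration only: endpoint path, auth header and field names are assumptions.
GRAPHQL_URL = "http://localhost:8080/api/graphql"  # assumed GMS GraphQL endpoint
QUERY = """
query {
  searchAcrossEntities(
    input: {query: "*", start: 0, count: 10, searchFlags: {maxAggValues: 100}}
  ) {
    facets {
      field
      aggregations { value count }
    }
  }
}
"""
resp = requests.post(
    GRAPHQL_URL,
    json={"query": QUERY},
    headers={"Authorization": "Bearer <token>"},  # needed when metadata service auth is enabled
)
print(resp.json())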
This puzzles me especially considering that the query builder,
https://github.com/datahub-project/datahub/blob/master/metadata-io/src/main/java/c[…]ta/search/elasticsearch/query/request/SearchRequestHandler.java
does not use finalSearchFlags when building aggregations, and does not seem to use the parameters coming from ELASTICSEARCH_QUERY_MAX_TERM_BUCKET_SIZE either. What am I missing here?
mysterious-advantage-78411
09/05/2023, 1:19 PMchilly-potato-57465
09/05/2023, 1:38 PMbig-nightfall-99541
09/05/2023, 1:55 PM'Unable to emit metadata to DataHub GMS: java.lang.RuntimeException: Unknown aspect upstreamLineage for entity mlmodelgroup'
What am I doing wrong?
[The full script and traceback are in the thread.]
Thank you!
gentle-gold-63488
09/05/2023, 3:53 PMgentle-gold-63488
09/05/2023, 3:53 PMcolossal-football-58924
09/05/2023, 5:51 PMable-library-93578
09/05/2023, 7:28 PMbroad-grass-53166
09/05/2023, 9:24 PMTraceback (most recent call last):
File "/home/asaniya/code/datahub-master/metadata-ingestion/src/datahub/entrypoints.py", line 10, in <module>
from datahub.cli.check_cli import check
File "/home/asaniya/code/datahub-master/metadata-ingestion/src/datahub/cli/check_cli.py", line 13, in <module>
from datahub.ingestion.run.pipeline import Pipeline
File "/home/asaniya/code/datahub-master/metadata-ingestion/src/datahub/ingestion/run/pipeline.py", line 29, in <module>
from datahub.ingestion.extractor.extractor_registry import extractor_registry
File "/home/asaniya/code/datahub-master/metadata-ingestion/src/datahub/ingestion/extractor/extractor_registry.py", line 1, in <module>
from datahub.ingestion.api.registry import PluginRegistry
File "/home/asaniya/code/datahub-master/metadata-ingestion/src/datahub/ingestion/api/registry.py", line 18, in <module>
import entrypoints
File "/home/asaniya/code/datahub-master/metadata-ingestion/src/datahub/entrypoints.py", line 10, in <module>
from datahub.cli.check_cli import check
ImportError: cannot import name 'check' from partially initialized module 'datahub.cli.check_cli' (most likely due to a circular import) (/home/asaniya/code/datahub-master/metadata-ingestion/src/datahub/cli/check_cli.py)
I have already tried following the steps noted in the documentation below: https://datahubproject.io/docs/metadata-ingestion/developing/#requirements
I am at a point where I can build the code but am unable to run it. Ideally, I would like to run this in IntelliJ. Could you please help resolve the above issue?
CC: @hundreds-photographer-13496
clever-dinner-20353
09/06/2023, 4:02 AMinlets
are not showing up in DataHub. Here is the code:
# Imports assumed from Airflow and the DataHub Airflow plugin (not shown in the original snippet).
from airflow.operators.bash import BashOperator
from datahub_airflow_plugin.entities import Dataset, Urn

task1 = BashOperator(
    task_id="run_data_task",
    dag=dag,
    bash_command="echo 'This is where you might run your data tooling.'",
    inlets=[
        Dataset(platform="snowflake", name="mydb.schema.tableA"),
        Dataset(platform="snowflake", name="mydb.schema.tableB", env="DEV"),
        Dataset(
            platform="snowflake",
            name="mydb.schema.tableC",
            platform_instance="cloud",
        ),
        # You can also put dataset URNs in the inlets/outlets lists.
        Urn(
            "urn:li:dataset:(urn:li:dataPlatform:snowflake,mydb.schema.tableC,PROD)"
        ),
    ],
    outlets=[Dataset("snowflake", "mydb.schema.tableD")],
)
and here is the lineage. It should show all the previous Snowflake datasets.
able-library-93578
09/06/2023, 10:21 PMMETADATA_SERVICE_AUTH_ENABLED
active as well. Below is my yaml for the action:
# hello_world.yaml
name: "hello_world"
source:
type: "kafka"
config:
connection:
bootstrap: ${KAFKA_BOOTSTRAP_SERVER:-prerequisites-kafka:9092}
schema_registry_url: ${SCHEMA_REGISTRY_URL:-http://prerequisites-cp-schema-registry:8081}
filter:
event_type: "EntityChangeEvent_v1"
event:
category: "TAG"
operation: [ "ADD", "REMOVE" ]
modifier: "urn:li:tag:SourcesSDP"
action:
type: "hello_world"
datahub:
server: "https://my-datahub-domain.com/api/gms"
token: "my-token"
Here are my logs from the CLI:
datahub actions -c hello_world.yaml
[2023-09-06 15:15:33,421] INFO {datahub_actions.cli.actions:76} - DataHub Actions version: 0.0.13
[2023-09-06 15:15:34,298] INFO {datahub_actions.cli.actions:119} - Action Pipeline with name 'hello_world' is now running.
%3|1694038534.460|FAIL|rdkafka#consumer-1| [thrd:prerequisites-kafka:9092/bootstrap]: prerequisites-kafka:9092/bootstrap: Failed to resolve 'prerequisites-kafka:9092': nodename nor servname provided, or not known (after 179ms in state CONNECT)
%3|1694038536.289|FAIL|rdkafka#consumer-1| [thrd:prerequisites-kafka:9092/bootstrap]: prerequisites-kafka:9092/bootstrap: Failed to resolve 'prerequisites-kafka:9092': nodename nor servname provided, or not known (after 3ms in state CONNECT, 1 identical error(s) suppressed)
%3|1694038567.357|FAIL|rdkafka#consumer-1| [thrd:prerequisites-kafka:9092/bootstrap]: prerequisites-kafka:9092/bootstrap: Failed to resolve 'prerequisites-kafka:9092': nodename nor servname provided, or not known (after 3ms in state CONNECT, 16 identical error(s) suppressed)
%3|1694038597.424|FAIL|rdkafka#consumer-1| [thrd:prerequisites-kafka:9092/bootstrap]: prerequisites-kafka:9092/bootstrap: Failed to resolve 'prerequisites-kafka:9092': nodename nor servname provided, or not known (after 3ms in state CONNECT, 15 identical error(s) suppressed)
%3|1694038627.493|FAIL|rdkafka#consumer-1| [thrd:prerequisites-kafka:9092/bootstrap]: prerequisites-kafka:9092/bootstrap: Failed to resolve 'prerequisites-kafka:9092': nodename nor servname provided, or not known (after 3ms in state CONNECT, 15 identical error(s) suppressed)
%3|1694038657.561|FAIL|rdkafka#consumer-1| [thrd:prerequisites-kafka:9092/bootstrap]: prerequisites-kafka:9092/bootstrap: Failed to resolve 'prerequisites-kafka:9092': nodename nor servname provided, or not known (after 3ms in state CONNECT, 15 identical error(s) suppressed)
%3|1694038687.640|FAIL|rdkafka#consumer-1| [thrd:prerequisites-kafka:9092/bootstrap]: prerequisites-kafka:9092/bootstrap: Failed to resolve 'prerequisites-kafka:9092': nodename nor servname provided, or not known (after 3ms in state CONNECT, 15 identical error(s) suppressed)
%3|1694038717.702|FAIL|rdkafka#consumer-1| [thrd:prerequisites-kafka:9092/bootstrap]: prerequisites-kafka:9092/bootstrap: Failed to resolve 'prerequisites-kafka:9092': nodename nor servname provided, or not known (after 3ms in state CONNECT, 15 identical error(s) suppressed)
%3|1694038747.775|FAIL|rdkafka#consumer-1| [thrd:prerequisites-kafka:9092/bootstrap]: prerequisites-kafka:9092/bootstrap: Failed to resolve 'prerequisites-kafka:9092': nodename nor servname provided, or not known (after 3ms in state CONNECT, 15 identical error(s) suppressed)
%3|1694038778.840|FAIL|rdkafka#consumer-1| [thrd:prerequisites-kafka:9092/bootstrap]: prerequisites-kafka:9092/bootstrap: Failed to resolve 'prerequisites-kafka:9092': nodename nor servname provided, or not known (after 3ms in state CONNECT, 16 identical error(s) suppressed)
%3|1694038809.898|FAIL|rdkafka#consumer-1| [thrd:prerequisites-kafka:9092/bootstrap]: prerequisites-kafka:9092/bootstrap: Failed to resolve 'prerequisites-kafka:9092': nodename nor servname provided, or not known (after 3ms in state CONNECT, 16 identical error(s) suppressed)
^C[2023-09-06 15:20:31,393] INFO {datahub_actions.cli.actions:137} - Stopping all running Action Pipelines...
[2023-09-06 15:20:32,803] INFO {datahub_actions.plugin.source.kafka.kafka_event_source:178} - Kafka consumer exiting main loop
[2023-09-06 15:20:32,804] INFO {datahub_actions.pipeline.pipeline_manager:81} - Actions Pipeline with name 'hello_world' has been stopped.
Pipeline Report for hello_world
Started at: 2023-09-06 15:15:34.297000 (Local Time)
Duration: 298.508s
Pipeline statistics
{
"started_at": 1694038534297
}
Action statistics
{}
Any advice on what to tweak is greatly appreciated.
best-laptop-39921
09/07/2023, 2:00 AM\q fieldPaths: column_name
), as it doesn't work.
Only \q name:
works.
Any advice would be greatly appreciated. Thank you. :)
(I used the helm chart --> any settings for advanced queries...?)
quiet-arm-91745
09/07/2023, 8:09 AMmetadata:
annotations:
gke-gcsfuse/volumes: "true"
otherwise I can't mount a GCS bucket as a volume.
Thanks in advance!
bitter-florist-92385
09/07/2023, 8:31 AMfrom datahub import DataHubClient, MetadataChangeEvent
I get an ImportError. Is the package incomplete, or am I missing something else?
mysterious-advantage-78411
09/07/2023, 9:34 AM