# troubleshoot
  • prehistoric-room-17640
    02/14/2022, 2:10 PM
    (through search)
  • prehistoric-room-17640
    02/14/2022, 2:14 PM
    It must be related to the Elasticsearch index, but I don't see any exceptions in the GMS pod, just this warning.
    Copy code
    14:10:14.862 [Thread-3020] WARN  org.elasticsearch.client.RestClient:65 - request [POST http://elasticsearch-master:9200/*index_v2/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true] returned 2 warnings: [299 Elasticsearch-7.16.2-2b937c44140b6559905130a8650c64dbd0879cfb "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.16/security-minimal-setup.html to enable security."],[299 Elasticsearch-7.16.2-2b937c44140b6559905130a8650c64dbd0879cfb "[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices."]
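    [Editor's note: both entries above are warnings, not errors, so they are unlikely to be the root cause of a search problem. If you want to silence the first one on a test cluster, security can be enabled via Elasticsearch config; a minimal sketch, assuming the Elastic Helm chart and its esConfig override (DataHub components would then also need matching credentials configured):]
    Copy code
    # values.yaml for the Elastic Helm chart (sketch)
    esConfig:
      elasticsearch.yml: |
        xpack.security.enabled: true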
  • brave-businessperson-3969
    02/14/2022, 8:37 PM
    Hi everyone, I have a question concerning ingestion: we deployed DataHub on an OpenShift cluster for testing purposes. Pods look fine as far as I can tell, and the frontend is accessible via web browser. However, it is somehow not possible to ingest data. From the pod which performs the ingestion (self-built), the GMS service should be reachable (wget http://datahub-gms:8080/config returns a JSON file), but when running datahub ingest, after 30 or 40 seconds I get the following warning a few times and then datahub ingest just exits: WARNING {urllib3.connectionpool:810} - Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ProxyError('Cannot connect to proxy.', NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fc756d78400>: Failed to establish a new connection: [Errno 110] Connection timed out'))': http://datahub-gms:8080/config Any idea what could cause this error? (From the ingestion pod/container, currently only the GMS pod is reachable; as a sink we use datahub-rest.)
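    [Editor's note: a ProxyError from urllib3 usually means HTTP(S)_PROXY is set in the pod's environment and the request to datahub-gms is being routed through a proxy that cannot reach it; wget may behave differently if it ignores those variables. A sketch, assuming the standard proxy environment variables (recipe.yml is a placeholder):]
    Copy code
    # Exclude the in-cluster GMS host from proxying before ingesting
    export NO_PROXY=datahub-gms,localhost,127.0.0.1
    export no_proxy=$NO_PROXY
    datahub ingest -c recipe.yml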
  • rich-policeman-92383
    02/17/2022, 7:29 AM
    Hi, using the Rest.li API, how do I define ownership of a group? What would be the exact payload to define group ownership? https://datahubproject.io/docs/metadata-service/#get-a-corpgroup
    👀 1
    plus1 1
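    [Editor's note: ownership is attached as an Ownership aspect whose owner urn points at the group. A sketch based on the documented /entities?action=ingest examples; the dataset and group urns are placeholders:]
    Copy code
    curl 'http://localhost:8080/entities?action=ingest' -X POST \
      -H 'X-RestLi-Protocol-Version: 2.0.0' --data '{
      "entity": {
        "value": {
          "com.linkedin.metadata.snapshot.DatasetSnapshot": {
            "urn": "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)",
            "aspects": [{
              "com.linkedin.common.Ownership": {
                "owners": [{ "owner": "urn:li:corpGroup:bfoo", "type": "DATAOWNER" }],
                "lastModified": { "time": 0, "actor": "urn:li:corpuser:datahub" }
              }
            }]
          }
        }
      }
    }'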
  • damp-minister-31834
    02/18/2022, 3:40 AM
    What REST API should I call?
  • gifted-piano-21322
    02/21/2022, 10:13 AM
    Awesome, thanks!
  • broad-thailand-41358
    02/22/2022, 6:49 PM
    No ideas? This is getting quite frustrating.
  • able-rain-74449
    03/01/2022, 2:08 PM
    Hi all, I am getting an error when I deploy
    prerequisites-cp-schema-registry
    and I'm not sure if it's Kafka failing to connect.
    Copy code
    ➜  01pre-req kubectl logs datahub-prerequisites-cp-schema-registry-65d8777cc8-m88mn cp-schema-registry-server
    ===> User
    uid=1000(appuser) gid=1000(appuser) groups=1000(appuser)
    ===> Configuring ...
    ===> Running preflight checks ... 
    ===> Check if Kafka is healthy ...
    [main] INFO org.apache.kafka.clients.admin.AdminClientConfig - AdminClientConfig values: 
            bootstrap.servers = [z-1.datahub-demo-cluster-......................OMITTED:9092]
            client.dns.lookup = use_all_dns_ips
            client.id = 
            connections.max.idle.ms = 300000
            default.api.timeout.ms = 60000
            metadata.max.age.ms = 300000
            metric.reporters = []
            metrics.num.samples = 2
            metrics.recording.level = INFO
            metrics.sample.window.ms = 30000
            receive.buffer.bytes = 65536
            reconnect.backoff.max.ms = 1000
            reconnect.backoff.ms = 50
            request.timeout.ms = 30000
            retries = 2147483647
            retry.backoff.ms = 100
            sasl.client.callback.handler.class = null
            sasl.jaas.config = null
            sasl.kerberos.kinit.cmd = /usr/bin/kinit
            sasl.kerberos.min.time.before.relogin = 60000
            sasl.kerberos.service.name = null
            sasl.kerberos.ticket.renew.jitter = 0.05
            sasl.kerberos.ticket.renew.window.factor = 0.8
            sasl.login.callback.handler.class = null
            sasl.login.class = null
            sasl.login.refresh.buffer.seconds = 300
            sasl.login.refresh.min.period.seconds = 60
            sasl.login.refresh.window.factor = 0.8
            sasl.login.refresh.window.jitter = 0.05
            sasl.mechanism = GSSAPI
            security.protocol = PLAINTEXT
            security.providers = null
            send.buffer.bytes = 131072
            socket.connection.setup.timeout.max.ms = 127000
            socket.connection.setup.timeout.ms = 10000
            ssl.cipher.suites = null
            ssl.enabled.protocols = [TLSv1.2, TLSv1.3]
            ssl.endpoint.identification.algorithm = https
            ssl.engine.factory.class = null
            ssl.key.password = null
            ssl.keymanager.algorithm = SunX509
            ssl.keystore.certificate.chain = null
            ssl.keystore.key = null
            ssl.keystore.location = null
            ssl.keystore.password = null
            ssl.keystore.type = JKS
            ssl.protocol = TLSv1.3
            ssl.provider = null
            ssl.secure.random.implementation = null
            ssl.trustmanager.algorithm = PKIX
            ssl.truststore.certificates = null
            ssl.truststore.location = null
            ssl.truststore.password = null
            ssl.truststore.type = JKS
    
    [main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka version: 6.1.0-ccs
    [main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka commitId: 5496d92defc9bbe4
    [main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka startTimeMs: 1646143502378
    [kafka-admin-client-thread | adminclient-1] INFO org.apache.kafka.clients.admin.internals.AdminMetadataManager - [AdminClient clientId=adminclient-1] Metadata update failed
    org.apache.kafka.common.errors.TimeoutException: Call(callName=fetchMetadata, deadlineMs=1646143532389, tries=1, nextAllowedTryMs=1646143532490) timed out at 1646143532390 after 1 attempt(s)
    Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call. Call: fetchMetadata
    [main] ERROR io.confluent.admin.utils.ClusterStatus - Error while getting broker list.
    java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Call(callName=listNodes, deadlineMs=1646143542388, tries=1, nextAllowedTryMs=1646143542489) timed out at 1646143542389 after 1 attempt(s)
            at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
            at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
            at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
            at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260)
            at io.confluent.admin.utils.ClusterStatus.isKafkaReady(ClusterStatus.java:149)
            at io.confluent.admin.utils.cli.KafkaReadyCommand.main(KafkaReadyCommand.java:150)
    Caused by: org.apache.kafka.common.errors.TimeoutException: Call(callName=listNodes, deadlineMs=1646143542388, tries=1, nextAllowedTryMs=1646143542489) timed out at 1646143542389 after 1 attempt(s)
    Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: listNodes
    [main] INFO io.confluent.admin.utils.ClusterStatus - Expected 1 brokers but found only 0. Trying to query Kafka for metadata again ...
    [main] ERROR io.confluent.admin.utils.ClusterStatus - Expected 1 brokers but found only 0. Brokers found [].
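    [Editor's note: the preflight check is timing out before it can list brokers, which points at reachability or a protocol mismatch rather than a DataHub problem. Also worth checking: the z-1. hostname prefix usually denotes an MSK ZooKeeper node (broker endpoints start with b-), and ZooKeeper does not listen on 9092, which would produce exactly this timeout; the bootstrap setting needs the broker bootstrap string. A connectivity sketch from inside the namespace (<broker-bootstrap> is a placeholder):]
    Copy code
    kubectl -n datahub run kafka-net-test --rm -it --restart=Never \
      --image=confluentinc/cp-kafka:6.1.0 -- \
      kafka-broker-api-versions --bootstrap-server <broker-bootstrap>:9092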
  • able-rain-74449
    03/01/2022, 2:09 PM
    Any help would be great. BTW: I have converted the Helm charts into plain YAML.
  • miniature-account-72792
    03/01/2022, 2:39 PM
    Have you set the correct bootstrap server in the
    values.yaml
    of the prerequisites?
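    [Editor's note: for the Helm path, the setting lives in the cp-schema-registry subchart of cp-helm-charts; a sketch, with the key path as in that chart's values (verify against your chart version) and a placeholder endpoint:]
    Copy code
    # prerequisites values.yaml (sketch)
    cp-helm-charts:
      cp-schema-registry:
        kafka:
          bootstrapServers: "PLAINTEXT://b-1.example.kafka.amazonaws.com:9092"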
  • able-rain-74449
    03/01/2022, 2:42 PM
    I am not using Helm.
  • able-rain-74449
    03/01/2022, 2:43 PM
    So my deployment looks like:
    Copy code
    ---
    # Source: datahub-prerequisites/charts/cp-helm-charts/charts/cp-schema-registry/templates/deployment.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: datahub-prerequisites-cp-schema-registry
      namespace: datahub
      labels:
        app: cp-schema-registry
        chart: cp-schema-registry-0.1.0
        release: datahub-prerequisites
        heritage: Helm
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: cp-schema-registry
          release: datahub-prerequisites
      template:
        metadata:
          labels:
            app: cp-schema-registry
            release: datahub-prerequisites
          annotations:
            prometheus.io/scrape: "true"
            prometheus.io/port: "5556"
        spec:
          containers:
            - name: prometheus-jmx-exporter
              image: "solsson/kafka-prometheus-jmx-exporter@sha256:6f82e2b0464f50da8104acd7363fb9b995001ddff77d248379f8788e78946143"
              imagePullPolicy: "IfNotPresent"
              command:
              - java
              - -XX:+UnlockExperimentalVMOptions
              - -XX:+UseCGroupMemoryLimitForHeap
              - -XX:MaxRAMFraction=1
              - -XshowSettings:vm
              - -jar
              - jmx_prometheus_httpserver.jar
              - "5556"
              - /etc/jmx-schema-registry/jmx-schema-registry-prometheus.yml
              ports:
              - containerPort: 5556
              resources:
                {}
              volumeMounts:
              - name: jmx-config
                mountPath: /etc/jmx-schema-registry
            - name: cp-schema-registry-server
              image: "confluentinc/cp-schema-registry:6.1.0"
              imagePullPolicy: "IfNotPresent"
              ports:
                - name: schema-registry
                  containerPort: 8081
                  protocol: TCP
                - containerPort: 5555
                  name: jmx
              resources:
                {}
              env:
              - name: SCHEMA_REGISTRY_HOST_NAME
                valueFrom:
                  fieldRef:
                    fieldPath: status.podIP
              - name: SCHEMA_REGISTRY_LISTENERS
                value: http://0.0.0.0:8081
              - name: SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS
                value: z-1.datahub-demo-cluster-1..............OMITTED..........:9092 #:9092
              - name: SCHEMA_REGISTRY_KAFKASTORE_GROUP_ID
                value: datahub-prerequisites
              - name: SCHEMA_REGISTRY_MASTER_ELIGIBILITY
                value: "true"
              - name: SCHEMA_REGISTRY_HEAP_OPTS
                value: "-Xms512M -Xmx512M"
              - name: JMX_PORT
                value: "5555"
          volumes:
          - name: jmx-config
            configMap:
              name: datahub-prerequisites-cp-schema-registry-jmx-configmap
  • able-rain-74449
    03/01/2022, 2:44 PM
    The configmap.yaml:
    Copy code
    ---
    # Source: datahub-prerequisites/charts/cp-helm-charts/charts/cp-schema-registry/templates/jmx-configmap.yaml
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: datahub-prerequisites-cp-schema-registry-jmx-configmap
      namespace: datahub
      labels:
        app: cp-schema-registry
        chart: cp-schema-registry-0.1.0
        release: datahub-prerequisites
        heritage: Helm
    data:
      jmx-schema-registry-prometheus.yml: |+
        jmxUrl: service:jmx:rmi:///jndi/rmi://localhost:5555/jmxrmi
        lowercaseOutputName: true
        lowercaseOutputLabelNames: true
        ssl: false
        whitelistObjectNames:
        - kafka.schema.registry:type=jetty-metrics
        - kafka.schema.registry:type=master-slave-role
        - kafka.schema.registry:type=jersey-metrics
        rules:
        - pattern : 'kafka.schema.registry<type=jetty-metrics>([^:]+):'
          name: "cp_kafka_schema_registry_jetty_metrics_$1"
        - pattern : 'kafka.schema.registry<type=master-slave-role>([^:]+):'
          name: "cp_kafka_schema_registry_master_slave_role"
        - pattern : 'kafka.schema.registry<type=jersey-metrics>([^:]+):'
          name: "cp_kafka_schema_registry_jersey_metrics_$1"
  • able-rain-74449
    03/01/2022, 2:45 PM
    And the service.yaml:
    Copy code
    ---
    # Source: datahub-prerequisites/charts/cp-helm-charts/charts/cp-schema-registry/templates/service.yaml
    apiVersion: v1
    kind: Service
    metadata:
      name: datahub-prerequisites-cp-schema-registry
      namespace: datahub
      labels:
        app: cp-schema-registry
        chart: cp-schema-registry-0.1.0
        release: datahub-prerequisites
        heritage: Helm
    spec:
      ports:
        - name: schema-registry
          port: 8081
        - name: metrics
          port: 5556
      selector:
        app: cp-schema-registry
        release: datahub-prerequisites
  • able-rain-74449
    03/01/2022, 2:49 PM
    Also,
    datahub-elasticsearch-master-2
    is not ready 🤔
  • red-napkin-59945
    03/01/2022, 9:30 PM
    Hey team, I would like to check the status of the "Long Term" items described here.
  • miniature-account-72792
    03/02/2022, 7:12 AM
    I also saw that my
    datahub-upgrade-job
    is failing with the following error
    Copy code
    Cannot connect to GMSat host datahub-datahub-gms port 8080. Make sure GMS is on the latest version and is running at that host before starting the migration.
    Is this also related to the fact that I use certificates?
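    [Editor's note: the upgrade job resolves GMS from its environment (host/port), so the first thing to rule out is plain reachability; if GMS is served over TLS, the job also needs the truststore. A sketch using a throwaway curl pod (the namespace is a placeholder):]
    Copy code
    kubectl -n <namespace> run gms-check --rm -it --restart=Never \
      --image=curlimages/curl -- curl -v http://datahub-datahub-gms:8080/config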
  • bland-orange-95847
    03/02/2022, 9:50 AM
    Just found this thread; I have the same issue as @numerous-application-54063 with BigQuery. The first run works and the checkpoint gets created, but the second run cannot read the checkpoint and fails with
    Message: "Failed to construct checkpoint's config from checkpoint aspect."
    Arguments: (ConfigurationError('BigQuery project ids are globally unique. You do not need to specify a platform instance.'),)
    I think something is off with platform instances, as they are not supported by the BigQuery source.
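    [Editor's note: the error suggests a platform_instance made it into the first run's checkpoint even though the BigQuery source rejects it. A recipe sketch (the project id is a placeholder; field names per the ingestion framework):]
    Copy code
    source:
      type: bigquery
      config:
        project_id: my-project      # placeholder
        # platform_instance: ...    # omit for bigquery; project ids are globally unique
        stateful_ingestion:
          enabled: true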
  • red-napkin-59945
    03/03/2022, 5:25 PM
    I would like to know what is
    FACET_FIELDS
  • rhythmic-bear-20384
    03/04/2022, 5:17 AM
    The DataHub actions container seems to get killed after a while when using the quickstart, leading to ingestion being non-responsive. The logs from the actions container show that the health check URL is unreachable. The datahub-gms container is up and running, and I verified that the actions container is part of the datahub-network. Any ideas on what is happening, and suggestions for fixes?
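    [Editor's note: one way to narrow this down is to probe the GMS health endpoint from inside the same Docker network the actions container uses; a sketch, with network and container names assumed from the quickstart setup described above:]
    Copy code
    docker run --rm --network datahub_network curlimages/curl -sS http://datahub-gms:8080/health
    docker logs --tail 100 datahub-actions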
  • gorgeous-dinner-4055
    03/16/2022, 6:00 AM
    Sorry to revive this old thread, but could you clarify 2, John? In the GraphQLEntityResolver I am seeing: https://github.com/datahub-project/datahub/blob/55357783f330950408e4624b3f1421594c[…]rc/main/java/com/linkedin/datahub/graphql/GmsGraphQLEngine.java Which is used for the autocomplete feature: https://github.com/datahub-project/datahub/blob/55357783f330950408e4624b3f1421594c[…]rc/main/java/com/linkedin/datahub/graphql/GmsGraphQLEngine.java Without that turned on, I'm unable to make a new entity show up in search. In the UI below, the GraphQL call for autocomplete invokes the
    getAutoCompleteMultipleResults
    function, and the searchable types are registered for autocomplete:
    Copy code
    .dataFetcher("autoCompleteForMultiple", new AuthenticatedResolver<>(
                        new AutoCompleteForMultipleResolver(searchableTypes)))
  • early-midnight-66457
    03/16/2022, 7:58 AM
    I am facing this error after running the app for almost an hour.
  • early-midnight-66457
    03/16/2022, 7:59 AM
    The app is trying to create a new thread and is unable to do so.
  • early-midnight-66457
    03/16/2022, 7:59 AM
    Any suggestions would be helpful.
  • fierce-author-36990
    03/16/2022, 10:10 AM
    [two image attachments]
  • high-family-71209
    03/18/2022, 12:15 PM
    This seems like an unsolved issue for quickstart: there appears to be a race condition where ZooKeeper doesn't come up in time for kafka-setup.
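    [Editor's note: if it is an ordering problem, one workaround is to gate kafka-setup on the broker being healthy rather than merely started; a compose-override sketch, assuming the quickstart service names and that the broker service defines a healthcheck:]
    Copy code
    # docker-compose.override.yml (sketch)
    services:
      kafka-setup:
        depends_on:
          broker:
            condition: service_healthy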
  • little-salesmen-55578
    03/23/2022, 4:59 PM
    Can anyone help debug this? I am out of ideas now 🙂
    👀 1
  • bulky-intern-2942
    03/30/2022, 7:40 PM
    Hi Pedro, okay, I've just deleted the message posted in the other channel. I'm gonna downgrade the cluster version and retry the installation process. Thanks.
  • sticky-dawn-95000
    04/01/2022, 7:22 AM
    I tried to run DataHub using the CLI command 'datahub docker quickstart', but I got an error like the one below:
  • brief-businessperson-12356
    04/04/2022, 11:12 AM
    Finally managed to get this working! I made two small changes which seemed to do the trick: 1. Created a new Java truststore that contained just the CA for mkcert. 2. Created a ConfigMap from that new truststore:
    Copy code
    kubectl create configmap truststore-configmap --from-file=newTruststore
    🎉 3
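    [Editor's note: for completeness, step 1 above can be done with keytool; a sketch assuming mkcert's root CA has been exported to rootCA.pem:]
    Copy code
    # Build a truststore containing only the mkcert root CA
    keytool -importcert -noprompt -alias mkcert-root \
      -file rootCA.pem -keystore newTruststore -storepass changeit
    # Then create the ConfigMap from it
    kubectl create configmap truststore-configmap --from-file=newTruststore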