# troubleshoot
  • g

    gifted-knife-16120

    11/16/2022, 2:44 AM
    hi all, do we have a solution to remove the
    datahub
    user? Or can we disable it? It does not pass our security requirements.
    i
    • 2
    • 8
  • g

    gifted-knife-16120

    11/16/2022, 3:45 AM
    Can I remove it here?
  • g

    green-hamburger-3800

    11/16/2022, 8:41 AM
    Hey folks, is it possible to disable the creation of some of the initial policies? Thanks (=
    a
    e
    • 3
    • 6
  • s

    steep-fountain-54482

    11/16/2022, 8:48 AM
    Hello, I'm trying to emit an UpstreamLineage to DataHub.
  • s

    steep-fountain-54482

    11/16/2022, 8:48 AM
    After the upsert operation I only see upstreams in the UI, but no downstreams (fine-grained).
  • s

    steep-fountain-54482

    11/16/2022, 8:49 AM
    My UpstreamLineage contains both...
  • s

    steep-fountain-54482

    11/16/2022, 8:49 AM
    Copy code
    val lineage = new UpstreamLineage()
    lineage.setFineGrainedLineages(new FineGrainedLineageArray(lineages.asJava))
    lineage.setUpstreams(new UpstreamArray(upstreams.asJava))
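    (For anyone comparing notes: below is a minimal Python sketch of the same upsert using the DataHub Python emitter. The GMS address, dataset URNs, and column names are made-up placeholders, and whichever emitter you use, the upstreamLineage aspect is attached to the downstream dataset's URN.)
    Copy code
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        ChangeTypeClass,
        DatasetLineageTypeClass,
        FineGrainedLineageClass,
        FineGrainedLineageDownstreamTypeClass,
        FineGrainedLineageUpstreamTypeClass,
        UpstreamClass,
        UpstreamLineageClass,
    )

    # Placeholder URNs -- replace with the real upstream/downstream datasets.
    upstream_urn = "urn:li:dataset:(urn:li:dataPlatform:hive,db.upstream_table,PROD)"
    downstream_urn = "urn:li:dataset:(urn:li:dataPlatform:hive,db.downstream_table,PROD)"

    lineage = UpstreamLineageClass(
        # Table-level upstream edges (these show up as "Upstreams" in the UI).
        upstreams=[
            UpstreamClass(dataset=upstream_urn, type=DatasetLineageTypeClass.TRANSFORMED)
        ],
        # Column-level (fine-grained) edges.
        fineGrainedLineages=[
            FineGrainedLineageClass(
                upstreamType=FineGrainedLineageUpstreamTypeClass.FIELD_SET,
                upstreams=[f"urn:li:schemaField:({upstream_urn},col_a)"],
                downstreamType=FineGrainedLineageDownstreamTypeClass.FIELD,
                downstreams=[f"urn:li:schemaField:({downstream_urn},col_a)"],
            )
        ],
    )

    # The upstreamLineage aspect is emitted against the *downstream* dataset.
    mcp = MetadataChangeProposalWrapper(
        entityType="dataset",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn=downstream_urn,
        aspectName="upstreamLineage",
        aspect=lineage,
    )
    DatahubRestEmitter(gms_server="http://localhost:8080").emit_mcp(mcp)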
  • g

    green-hamburger-3800

    11/16/2022, 9:05 AM
    On another question... is it possible to create the first Personal Access Token programmatically? Thanks a lot!
    b
    • 2
    • 4
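    (Not an official answer, but one way we've bootstrapped this is to log in as the root user and call the createAccessToken GraphQL mutation through the frontend. A rough sketch follows; the frontend URL, credentials, and the exact input fields and enum values are assumptions to verify against your DataHub version, and metadata service authentication / token generation must be enabled for it to work.)
    Copy code
    import requests

    FRONTEND = "http://localhost:9002"  # assumed datahub-frontend address

    session = requests.Session()

    # Log in as the root user to obtain a session cookie (default credentials assumed).
    session.post(
        f"{FRONTEND}/logIn", json={"username": "datahub", "password": "datahub"}
    ).raise_for_status()

    # Ask for a personal access token via the GraphQL API proxied by the frontend.
    # The input shape below is an assumption -- check it against your GraphiQL schema.
    mutation = """
    mutation {
      createAccessToken(input: {
        type: PERSONAL,
        actorUrn: "urn:li:corpuser:datahub",
        duration: ONE_MONTH,
        name: "bootstrap-token"
      }) {
        accessToken
      }
    }
    """
    resp = session.post(f"{FRONTEND}/api/v2/graphql", json={"query": mutation})
    resp.raise_for_status()
    print(resp.json())  # if it works, the token is under data.createAccessToken.accessToken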
  • f

    fresh-cricket-75926

    11/16/2022, 9:32 AM
    Hi everyone, we are trying to deploy a Java keystore in datahub-frontend using Helm charts, but when we create the config and mount it via extraVolumes and extraEnvs in values.yml, the pod simply terminates with an "oops, can't start the server" message. Has anyone faced this type of issue? Any suggestion or solution would be helpful.
    a
    • 2
    • 1
  • b

    billowy-pilot-93812

    11/16/2022, 10:22 AM
    Hi all, I'm getting this error when trying to ingest Superset metadata. Any idea on this problem? Thank you.
    Copy code
    [2022-11-16 10:16:23,427] ERROR {datahub.entrypoints:185} -
    File "/tmp/datahub/ingest/venv-superset-0.9.2/lib/python3.10/site-packages/datahub/entrypoints.py", line 164, in main
    --> 164     sys.exit(datahub(standalone_mode=False, **kwargs))
    File "/tmp/datahub/ingest/venv-superset-0.9.2/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    --> 1130    return self.main(*args, **kwargs)
    File "/tmp/datahub/ingest/venv-superset-0.9.2/lib/python3.10/site-packages/click/core.py", line 1055, in main
        rv = self.invoke(ctx)
    File "/tmp/datahub/ingest/venv-superset-0.9.2/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
    File "/tmp/datahub/ingest/venv-superset-0.9.2/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
    File "/tmp/datahub/ingest/venv-superset-0.9.2/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
        return ctx.invoke(self.callback, **ctx.params)
    File "/tmp/datahub/ingest/venv-superset-0.9.2/lib/python3.10/site-packages/click/core.py", line 760, in invoke
        return __callback(*args, **kwargs)
    File "/tmp/datahub/ingest/venv-superset-0.9.2/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
        return f(get_current_context(), *args, **kwargs)
    File "/tmp/datahub/ingest/venv-superset-0.9.2/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 347, in wrapper
    --> 347     raise e
    File "/tmp/datahub/ingest/venv-superset-0.9.2/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 299, in wrapper
    --> 299     res = func(*args, **kwargs)
    File "/tmp/datahub/ingest/venv-superset-0.9.2/lib/python3.10/site-packages/datahub/utilities/memory_leak_detector.py", line 95, in wrapper
    --> 95      return func(ctx, *args, **kwargs)
    File "/tmp/datahub/ingest/venv-superset-0.9.2/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 192, in run
    --> 192     loop.run_until_complete(run_func_check_upgrade(pipeline))
    File "/usr/local/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete
    --> 646     return future.result()
    File "/tmp/datahub/ingest/venv-superset-0.9.2/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 151, in run_func_check_upgrade
    --> 151     ret = await the_one_future
    File "/tmp/datahub/ingest/venv-superset-0.9.2/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 142, in run_pipeline_async
    --> 142     return await loop.run_in_executor(
        143         None, functools.partial(run_pipeline_to_completion, pipeline)
    File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    --> 58      result = self.fn(*self.args, **self.kwargs)
    File "/tmp/datahub/ingest/venv-superset-0.9.2/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 133, in run_pipeline_to_completion
    --> 133     raise e
    File "/tmp/datahub/ingest/venv-superset-0.9.2/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 125, in run_pipeline_to_completion
    --> 125     pipeline.run()
    File "/tmp/datahub/ingest/venv-superset-0.9.2/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 344, in run
    --> 344     for wu in itertools.islice(
        345         self.source.get_workunits(),
    File "/tmp/datahub/ingest/venv-superset-0.9.2/lib/python3.10/site-packages/datahub/ingestion/source/superset.py", line 354, in get_workunits
    --> 354     yield from self.emit_dashboard_mces()
        355         yield from self.emit_chart_mces()
    File "/tmp/datahub/ingest/venv-superset-0.9.2/lib/python3.10/site-packages/datahub/ingestion/source/superset.py", line 263, in emit_dashboard_mces
    --> 263     for dashboard_data in payload["result"]:
        264         dashboard_snapshot = self.construct_dashboard_from_api_data(
    a
    • 2
    • 1
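    (The failing line is payload["result"] in the Superset source, which usually means the dashboard API call didn't return the JSON the connector expects, e.g. a login/permission problem or a proxy error page. A quick way to see what the API actually returns outside of DataHub is sketched below; the URL and credentials are placeholders.)
    Copy code
    import requests

    SUPERSET = "https://superset.example.com"  # placeholder: use your connect_uri

    # Log in against the Superset REST API, roughly as the connector does.
    login = requests.post(
        f"{SUPERSET}/api/v1/security/login",
        json={"username": "admin", "password": "admin", "provider": "db", "refresh": True},
    )
    login.raise_for_status()
    token = login.json()["access_token"]

    # Fetch the first page of dashboards and inspect the raw payload.
    resp = requests.get(
        f"{SUPERSET}/api/v1/dashboard/",
        headers={"Authorization": f"Bearer {token}"},
    )
    print(resp.status_code)
    print(list(resp.json().keys()))  # the connector expects a "result" key here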
  • f

    full-salesclerk-85947

    11/16/2022, 11:14 AM
    Hello everyone! I cannot start the project from quickstart and get this error. Could anyone help me?
    Copy code
    Fetching docker-compose file https://raw.githubusercontent.com/datahub-project/datahub/master/docker/quickstart/docker-compose-without-neo4j.quickstart.yml from GitHub
    Pulling docker images...
    Finished pulling docker images!

    [+] Running 7/7
     ⠿ Network datahub_network        Created    4.1s
     ⠿ Container elasticsearch        Created    0.1s
     ⠿ Container zookeeper            Created    0.1s
     ⠹ Container mysql                Creating   0.2s
     ⠿ Container elasticsearch-setup  Created    0.0s
     ⠿ Container broker               Created    0.1s
     ⠿ Container schema-registry      Created    0.0s
     ⠿ Container kafka-setup          Created    0.0s
    Error response from daemon: invalid mount config for type "bind": bind source path does not exist: /Users/<user>/.datahub/mysql/init.sql
    .............
    [+] Running 0/0
     ⠋ Container mysql  Creating   0.0s
    Error response from daemon: invalid mount config for type "bind": bind source path does not exist: /Users/<user>/.datahub/mysql/init.sql
    ..............
    a
    b
    • 3
    • 3
  • g

    green-hamburger-3800

    11/16/2022, 11:45 AM
    Hello folks! On policies... I've got two policies for `allUsers`: METADATA: View Entity Page, View Dataset Usage, View Dataset Profile; PLATFORM: View Analytics, Generate Personal Access Tokens. But users can't load the Analytics page correctly because they aren't able to load the Domains, and they can't see the defined Domains in the Govern menu either. Any ideas?
    a
    e
    +3
    • 6
    • 18
  • m

    microscopic-mechanic-13766

    11/16/2022, 12:47 PM
    Hello, I am modifying the spark-lineage plugin to try some improvements, but I keep getting this error and I don't really know its source:
    Exception in thread "map-output-dispatcher-3" java.lang.UnsatisfiedLinkError: com.github.luben.zstd.Zstd.setCompressionLevel(JI)I
    I am building the jar with the following command
    ./gradlew metadata-integration:java:spark-lineage:buildDependents
    , and the source code I started from is the v0.9.1 tag. My modifications don't change the base functionality yet; they just add more log prints. The submitted application worked with v0.9.2 of the spark-lineage plugin, so I'm guessing this error could come from a step I unintentionally skipped in the build process. (I skipped the tests because I was stuck with another error, and since it was a check of the Docker deployment, I didn't give it much importance. Could that be it?)
    d
    b
    • 3
    • 12
  • m

    mysterious-motorcycle-80650

    11/16/2022, 11:00 PM
    I have this error --- [3:44 PM]
    Copy code
    [main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka version: 6.1.4-ccs
    [main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka commitId: c9124241a6ff43bc
    [main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka startTimeMs: 1667849230200
    WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
    Created topic MetadataAuditEvent_v4.
    WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
    Created topic MetadataChangeEvent_v4.
    WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
    Created topic FailedMetadataChangeEvent_v4.
    WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
    Created topic MetadataChangeLog_Versioned_v1.
    WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
    Created topic MetadataChangeLog_Timeseries_v1.
    WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
    Created topic MetadataChangeProposal_v1.
    WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
    Created topic FailedMetadataChangeProposal_v1.
    WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
    Created topic PlatformEvent_v1.
    WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
    Created topic DataHubUsageEvent_v1.
    Error while executing config command with args '--command-config /tmp/connection.properties --bootstrap-server b-1.prodproddatahubekspro.5w4ibt.c17.kafka.us-east-1.amazonaws.com:9092,b-2.prodproddatahubekspro.5w4ibt.c17.kafka.us-east-1.amazonaws.com:9092 --entity-type topics --entity-name _schemas --alter --add-config cleanup.policy=compact'
    java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.UnknownTopicOrPartitionException:
        at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
        at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
        at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:104)
        at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:272)
        at kafka.admin.ConfigCommand$.getResourceConfig(ConfigCommand.scala:552)
        at kafka.admin.ConfigCommand$.alterConfig(ConfigCommand.scala:322)
        at kafka.admin.ConfigCommand$.processCommand(ConfigCommand.scala:302)
        at kafka.admin.ConfigCommand$.main(ConfigCommand.scala:97)
        at kafka.admin.ConfigCommand.main(ConfigCommand.scala)
    Caused by: org.apache.kafka.common.errors.UnknownTopicOrPartitionException:
    [3:45 PM] This happens when I try to use Kafka on AWS.
    i
    • 2
    • 22
  • m

    microscopic-room-90690

    11/17/2022, 10:52 AM
    Hi guys, I'm having trouble ingesting metadata from S3. The recipe I use is attached. The CLI and GMS versions are both 0.8.43, and the error is:
    ERROR  {datahub.ingestion.run.pipeline:112} - failed to write record with workunit s3://path/data-lake-dbt/cdm_dim/users_snapshot9996 with Expecting value: line 1 column 1 (char 0) and info {}
    Any help would be appreciated. Thank you!
    d
    • 2
    • 3
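    (That "Expecting value: line 1 column 1 (char 0)" is a JSON decode error, i.e. the datahub-rest sink got back something that isn't JSON; a wrong sink URL, a proxy or login page, or a CLI/GMS mismatch are common causes. Below is a quick sanity check of the GMS endpoint the recipe's sink points at; the URL is a placeholder.)
    Copy code
    import requests

    GMS = "http://localhost:8080"  # placeholder: the "server" from your datahub-rest sink

    resp = requests.get(f"{GMS}/config")  # GMS normally answers with a small JSON document
    print(resp.status_code)
    print(resp.text[:500])  # if this isn't JSON, ingestion will hit the same decode error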
  • l

    late-ability-59580

    11/17/2022, 1:47 PM
    Hey everyone! A question in the area of AWS S3 (it's my first day trying out DataHub). Is it possible to ingest files that don't have an extension? I see the ingestion is "successful", but 0 events are created. I suspect it's because I specify something like
    s3://bucket/pref/pref/*/*
    in the
    source.config.path_specs.include
    I understand it expects something like
    s3://.../*.*
    , but this won't match the pattern of my files. Am I missing something?
    d
    • 2
    • 6
  • g

    green-hamburger-3800

    11/17/2022, 2:10 PM
    Hello folks! Is it possible to allow a group to manage part of the glossary but not everything? (Maybe everything within a GlossaryNode) Thanks a lot (=
    e
    • 2
    • 4
  • m

    microscopic-mechanic-13766

    11/17/2022, 3:15 PM
    Good afternoon everyone, quick question: does anyone know if connecting to a Hive with hive.server2.transport.mode set to HTTP is supported? The default value of that property is binary, but due to external causes we had to change it to HTTP, and now the Hive-DataHub connection is giving problems.
    h
    i
    • 3
    • 21
  • b

    bright-motherboard-35257

    11/17/2022, 4:16 PM
    Getting this error when trying to ingest from Hive. I tried with and without a password, and with and without the KERBEROS references, with no change in the result.
    Copy code
    ERROR    {datahub.entrypoints:206} - Command failed: Password should be set if and only if in LDAP or CUSTOM
    Recipe:
    Copy code
    source:
        type: hive
        config:
            database: hc_orders
            profiling:
                enabled: true
            host_port: '<redacted>:10000'
            stateful_ingestion:
                enabled: true
            username: <redacted>
            password: <redacted>
            options:
                connect_args:
                    auth: KERBEROS
                    kerberos_service_name: hive
    h
    • 2
    • 4
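    (FWIW, that message comes from the underlying PyHive connect call, which, as far as I know, only accepts a password for LDAP or CUSTOM auth, so with auth: KERBEROS the password (and usually the username) should be dropped from connect_args. A standalone check against PyHive directly, with placeholder host values:)
    Copy code
    from pyhive import hive

    # With KERBEROS auth, do not pass a password; PyHive only allows one for LDAP/CUSTOM.
    conn = hive.connect(
        host="hive-host.example.com",  # placeholder
        port=10000,
        auth="KERBEROS",
        kerberos_service_name="hive",
    )
    cursor = conn.cursor()
    cursor.execute("SHOW DATABASES")
    print(cursor.fetchall())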
  • a

    ancient-apartment-23316

    11/17/2022, 9:26 PM
    Hello, I installed DataHub via Helm, and these tabs are not available. Please help me: how can I enable them?
    e
    • 2
    • 2
  • f

    few-sunset-43876

    11/18/2022, 3:20 AM
    Hi everyone! I just upgraded DataHub to v0.9.2 using this docker-compose.yml:
    Copy code
    version: '3.8'
    services:
    ...  
      neo4j:
        image: neo4j:4.4.9-community
        env_file: neo4j/env/docker.env
        hostname: neo4j
        container_name: neo4j
        ports:
          - ${DATAHUB_MAPPED_NEO4J_HTTP_PORT:-7474}:7474
          - ${DATAHUB_MAPPED_NEO4J_BOLT_PORT:-7687}:7687
        volumes:
          - neo4jdata:/data
    ...
    networks:
      default:
        name: datahub_network
    
    volumes:
      esdata:
      neo4jdata:
      zkdata:
      broker:
    neo4j/env/docker.env
    Copy code
    NEO4J_AUTH=neo4j/datahub
    NEO4J_dbms_default__database=graph.db
    NEO4J_dbms_allow__upgrade=true
    But the neo4j container could not start due to the message:
    Copy code
    Changed password for user 'neo4j'. IMPORTANT: this change will only take effect if performed before the database is started for the first time.
    Could anyone help? Thank you so much in advance!
    i
    • 2
    • 7
  • f

    famous-florist-7218

    11/18/2022, 8:49 AM
    Hi folks, is it possible to add custom labels to the ServiceMonitor? I was about to enable the Prometheus exporter, but the original chart doesn't have a place to add a label that we use to match with our data-agent. cc @bulky-electrician-72362
    Copy code
    {{- if and .Values.serviceMonitor.create .Values.global.datahub.monitoring.enablePrometheus -}}
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: {{ printf "%s-%s" .Release.Name "datahub-gms" }}
      labels:
        {{- include "datahub-gms.labels" . | nindent 4 }}
      {{- with .Values.serviceMonitor.annotations }}
      annotations:
        {{- toYaml . | nindent 4 }}
      {{- end }}
    spec:
      endpoints:
      - port: jmx
        relabelings:
        - separator: /
          sourceLabels:
          - namespace
          - pod
          targetLabel: instance
      selector:
        matchLabels:
          app.kubernetes.io/managed-by: Helm
          app.kubernetes.io/name: datahub-gms
    {{- end -}}
    b
    • 2
    • 23
  • h

    happy-baker-8735

    11/18/2022, 11:17 AM
    Hi everyone, I have some questions about access policies. I tried to connect as a Reader, and I see in the UI that: • I can go to the Analytics page and the Ingestion page (delete a source, see the parameters used for DB connections, and secrets!) • I can create and modify domains • I can see and use the edit action buttons (e.g. Documentation) and the WYSIWYG editor, and when I finally try to save, I get this error message: "Update Failed! Unauthorized to perform this action. Please contact your Datahub administrator." When I say "connect as a Reader": • I first created a user with the role Reader • I created another user without a role but with a policy granting only the "View Entity Page" privilege. Is it normal that as a Reader we can see all these things? Or is it a configuration issue? I saw in some comments that showing edit buttons to readers is something that could be changed one day. But what about the Ingestion page? This page is kind of confidential for some users.
    b
    c
    • 3
    • 5
  • p

    polite-egg-47560

    11/18/2022, 12:41 PM
    Hi everyone! I have a question about Pipelines entities on the UI. We've implemented the lineage export from Airflow to Datahub, and we can see the tasks on the tables lineage tab. However, it's not possible to perform an Impact Analysis through the lineage when looking into the Pipeline itself, only through the individual tasks. Due to this, for some DAGs that have many tasks, we can't check who will be affected, and we'd need to go to each task to see the lineage. Is there any more straightforward way? One use case: A user wants to check if she can disable a given DAG and who will be impacted, but the DAG has 30+ tasks, so she needs to go to each task to see the lineage.
    d
    • 2
    • 5
  • s

    swift-farmer-36942

    11/18/2022, 5:25 PM
    Hi again all, after talking with my manager about the latest vulnerability assessment we had done on our DH instance: the company said they were able to read the users' secret access tokens, but it turns out it was just the tokens' metadata. Crisis averted 😅
    i
    • 2
    • 2
  • l

    little-breakfast-38102

    11/18/2022, 7:46 PM
    Hi team, I am trying to ingest Airflow metadata into DataHub and need help resolving the following error (attached screenshot). We are on DataHub 0.8.45 and installed acryl-datahub-airflow-plugin==0.9.2.2. Attached are screenshots of the task, the DataHub Airflow operator import, the DataHub connector, and the Airflow config additions. Also added [core] lazy_load_plugins = False. @dazzling-judge-80093 @gray-shoe-75895
    g
    d
    • 3
    • 15
  • b

    better-spoon-77762

    11/18/2022, 7:51 PM
    Hello everyone, has anyone faced issues loading the Postgres driver since version 0.9.0? I keep getting the following exception:
    Copy code
    java.util.concurrent.CompletionException: java.lang.IllegalStateException: Problem loading Database Driver [org.postgresql.Driver]: org.postgresql.Driver
            at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314)
            at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:319)
            at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1702)
            at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1692)
            at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
            at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
            at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
            at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
            at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
    Caused by: java.lang.IllegalStateException: Problem loading Database Driver [org.postgresql.Driver]: org.postgresql.Driver
            at io.ebean.datasource.pool.ConnectionPool.initialise(ConnectionPool.java:281)
            at io.ebean.datasource.pool.ConnectionPool.<init>(ConnectionPool.java:246)
            at io.ebean.datasource.core.Factory.createPool(Factory.java:15)
            at io.ebeaninternal.server.core.DefaultContainer.getDataSourceFromConfig(DefaultContainer.java:273)
            at io.ebeaninternal.server.core.DefaultContainer.setDataSource(DefaultContainer.java:217)
            at io.ebeaninternal.server.core.DefaultContainer.createServer(DefaultContainer.java:103)
            at io.ebeaninternal.server.core.DefaultContainer.createServer(DefaultContainer.java:35)
            at io.ebean.EbeanServerFactory.createInternal(EbeanServerFactory.java:109)
            at io.ebean.EbeanServerFactory.create(EbeanServerFactory.java:70)
            at com.linkedin.metadata.entity.ebean.EbeanTenantDaoManager.getTenantDao(EbeanTenantDaoManager.java:29)
            at com.linkedin.metadata.entity.EntityService.getEntityDao(EntityService.java:193)
            at com.linkedin.metadata.entity.EntityService.getEnvelopedAspects(EntityService.java:1867)
            at com.linkedin.metadata.entity.EntityService.getCorrespondingAspects(EntityService.java:403)
            at com.linkedin.metadata.entity.EntityService.getLatestEnvelopedAspects(EntityService.java:356)
            at com.linkedin.metadata.entity.EntityService.getEntitiesV2(EntityService.java:310)
            at com.linkedin.metadata.client.JavaEntityClient.batchGetV2(JavaEntityClient.java:114)
            at com.linkedin.datahub.graphql.resolvers.MeResolver.lambda$get$0(MeResolver.java:57)
            at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
            ... 6 common frames omitted
    Caused by: java.lang.ClassNotFoundException: org.postgresql.Driver
            at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
            at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
            at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
            at java.base/java.lang.Class.forName0(Native Method)
            at java.base/java.lang.Class.forName(Class.java:398)
            at io.ebean.datasource.pool.ConnectionPool.initialise(ConnectionPool.java:276)
    I wonder if it's related to having both OpenJDK 8 and OpenJDK 11 in the GMS build? I have verified that the Postgres driver exists within the container.
    a
    • 2
    • 2
  • m

    miniature-eve-21984

    11/18/2022, 8:24 PM
    Has anyone tried to use the new "Manage Children" metadata privilege released in v0.9.2? If so, I would be curious about your results, as I am not having luck setting up a policy that allows a user to only create and delete Glossary Terms and Glossary Nodes within a specific Glossary Node. Setup: 1. The user does not have the "Manage Glossaries" platform privilege a. I don't want them to have global rights on glossaries. 2. The user is granted rights in a GlossaryNode-specific policy with the following configuration: a. Type: Metadata b. Asset Types: Glossary Terms & Glossary Term Groups c. Assets: GlossaryNode X, where X is the only node I want the user to have access to manage. d. Domains: All e. Privilege: Manage Glossary Children
    b
    b
    • 3
    • 15
  • s

    straight-mouse-85445

    11/21/2022, 3:10 AM
    I have one question, if anyone can answer please: when trying to ingest recipe.yaml, I am getting an error that the resource is not defined. I ran the command with the --debug option and realized that the resource is being prefixed with \ufeff. Has anyone seen this issue?
    g
    b
    d
    • 4
    • 12
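    (\ufeff is a UTF-8 byte order mark at the start of the file, which makes the first key in the recipe look to the parser like it starts with \ufeff. Re-saving the recipe without a BOM should fix it; a small sketch, with the file name as a placeholder:)
    Copy code
    from pathlib import Path

    path = Path("recipe.yaml")  # placeholder path to your recipe
    # 'utf-8-sig' transparently drops a leading BOM when reading.
    text = path.read_text(encoding="utf-8-sig")
    path.write_text(text, encoding="utf-8")  # rewrite without the BOM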
  • f

    few-sunset-43876

    11/21/2022, 3:27 AM
    Hi everyone! After upgrading to the new version v0.9.2, I get an OOM issue when I search the lineage of a dataset. It keeps loading and ends with a timeout (pic below). The logs from datahub-gms:
    Copy code
    03:16:05.440 [I/O dispatcher 1] INFO  c.l.m.s.e.update.BulkListener:47 - Successfully fed bulk request. Number of events: 2 Took time ms: -1
    03:16:05.663 [ThreadPoolTaskExecutor-1] INFO  c.l.m.k.t.DataHubUsageEventTransformer:74 - Invalid event type: SearchAcrossLineageResultsViewEvent
    03:16:05.663 [ThreadPoolTaskExecutor-1] WARN  c.l.m.k.DataHubUsageEventsProcessor:56 - Failed to apply usage events transform to record: {"type":"SearchAcrossLineageResultsViewEvent","query":"","total":10,"actorUrn":"urn:li:corpuser:datahub","timestamp":1669000565516,"date":"Mon Nov 21 2022 10:16:05 GMT+0700 (Indochina Time)","userAgent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36","browserId":"57f357cc-cdf7-4104-a7fa-30d8eda4f486"}
    03:16:06.447 [I/O dispatcher 1] INFO  c.l.m.s.e.update.BulkListener:47 - Successfully fed bulk request. Number of events: 1 Took time ms: -1
    
    Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "I/O dispatcher 1"
    
    Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "ThreadPoolTaskScheduler-1"
    
    Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "kafka-producer-network-thread | producer-1"
    and the logs from datahub-frontend-react:
    Copy code
    2022-11-21 03:17:04,148 [application-akka.actor.default-dispatcher-13] ERROR application - 
    
    ! @7pkjpoecp - Internal server error, for (POST) [/api/v2/graphql] ->
     
    play.api.UnexpectedException: Unexpected exception[CompletionException: java.util.concurrent.TimeoutException: Read timeout to datahub-gms/172.18.0.3:8080 after 60000 ms]
    	at play.api.http.HttpErrorHandlerExceptions$.throwableToUsefulException(HttpErrorHandler.scala:340)
    	at play.api.http.DefaultHttpErrorHandler.onServerError(HttpErrorHandler.scala:263)
    	at play.core.server.AkkaHttpServer$$anonfun$1.applyOrElse(AkkaHttpServer.scala:443)
    	at play.core.server.AkkaHttpServer$$anonfun$1.applyOrElse(AkkaHttpServer.scala:441)
    	at scala.concurrent.Future.$anonfun$recoverWith$1(Future.scala:417)
    	at scala.concurrent.impl.Promise.$anonfun$transformWith$1(Promise.scala:41)
    	at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
    	at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
    	at akka.dispatch.BatchingExecutor$BlockableBatch.$anonfun$run$1(BatchingExecutor.scala:92)
    	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    	at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:85)
    	at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:92)
    	at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
    	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:49)
    	at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    	at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    	at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    	at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
    Caused by: java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException: Read timeout to datahub-gms/172.18.0.3:8080 after 60000 ms
    	at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
    	at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346)
    	at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:632)
    	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
    	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
    	at scala.concurrent.java8.FuturesConvertersImpl$CF.apply(FutureConvertersImpl.scala:21)
    	at scala.concurrent.java8.FuturesConvertersImpl$CF.apply(FutureConvertersImpl.scala:18)
    	at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
    	at scala.concurrent.BatchingExecutor$Batch.processBatch$1(BatchingExecutor.scala:67)
    	at scala.concurrent.BatchingExecutor$Batch.$anonfun$run$1(BatchingExecutor.scala:82)
    	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    	at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:85)
    	at scala.concurrent.BatchingExecutor$Batch.run(BatchingExecutor.scala:59)
    	at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:875)
    	at scala.concurrent.BatchingExecutor.execute(BatchingExecutor.scala:110)
    	at scala.concurrent.BatchingExecutor.execute$(BatchingExecutor.scala:107)
    	at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:873)
    	at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:72)
    	at scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:288)
    	at scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:288)
    	at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:288)
    	at scala.concurrent.Promise.complete(Promise.scala:53)
    	at scala.concurrent.Promise.complete$(Promise.scala:52)
    	at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:187)
    	at scala.concurrent.Promise.failure(Promise.scala:104)
    	at scala.concurrent.Promise.failure$(Promise.scala:104)
    	at scala.concurrent.impl.Promise$DefaultPromise.failure(Promise.scala:187)
    	at play.libs.ws.ahc.StandaloneAhcWSClient$ResponseAsyncCompletionHandler.onThrowable(StandaloneAhcWSClient.java:227)
    	at play.shaded.ahc.org.asynchttpclient.netty.NettyResponseFuture.abort(NettyResponseFuture.java:278)
    	at play.shaded.ahc.org.asynchttpclient.netty.request.NettyRequestSender.abort(NettyRequestSender.java:473)
    	at play.shaded.ahc.org.asynchttpclient.netty.timeout.TimeoutTimerTask.expire(TimeoutTimerTask.java:43)
    	at play.shaded.ahc.org.asynchttpclient.netty.timeout.ReadTimeoutTimerTask.run(ReadTimeoutTimerTask.java:56)
    	at play.shaded.ahc.io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:670)
    	at play.shaded.ahc.io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:745)
    	at play.shaded.ahc.io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:473)
    	at play.shaded.ahc.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    	at java.base/java.lang.Thread.run(Thread.java:829)
    Caused by: java.util.concurrent.TimeoutException: Read timeout to datahub-gms/172.18.0.3:8080 after 60000 ms
    	... 7 common frames omitted
    The stats of the containers:
    Copy code
    CONTAINER ID   NAME                        CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O         PIDS
    52616fa99479   datahub-frontend-react      0.59%     523.7MiB / 31.26GiB   1.64%     598kB / 619kB     0B / 0B           52
    d72c1d91089c   datahub_datahub-actions_1   0.06%     50.66MiB / 31.26GiB   0.16%     295MB / 181MB     5.46MB / 0B       24
    805489e4533c   datahub-gms                 748.15%   1.754GiB / 31.26GiB   5.61%     316MB / 3.25MB    0B / 0B           127
    69761ab51fcc   schema-registry             0.21%     520.5MiB / 31.26GiB   1.63%     104MB / 99.2MB    6.14MB / 12.3kB   49
    34814372e50d   broker                      0.88%     508.4MiB / 31.26GiB   1.59%     957MB / 977MB     13.3MB / 801MB    89
    30a6648fdbd5   elasticsearch               0.98%     932.2MiB / 31.26GiB   2.91%     26.5MB / 27.6MB   34.1MB / 178MB    134
    bbef225eadba   zookeeper                   0.22%     358MiB / 31.26GiB     1.12%     20MB / 12MB       451kB / 188kB     67
    9a83d87163a1   mysql                       0.06%     348MiB / 31.26GiB     1.09%     63.7MB / 301MB    14.9MB / 26.1MB   33
    e0d367b11df2   neo4j                       0.59%     1.609GiB / 31.26GiB   5.15%     17.3MB / 926MB    1.47GB / 26.1MB   78
    The Java heap size of datahub-gms:
    Copy code
    bash-5.1$ java -XX:+PrintFlagsFinal -version | grep HeapSize
       size_t ErgoHeapSizeLimit                        = 0                                         {product} {default}
       size_t HeapSizePerGCThread                      = 43620760                                  {product} {default}
       size_t InitialHeapSize                          = 526385152                                 {product} {ergonomic}
       size_t LargePageHeapSizeThreshold               = 134217728                                 {product} {default}
       size_t MaxHeapSize                              = 8392802304                                {product} {ergonomic}
        uintx NonNMethodCodeHeapSize                   = 5836300                                {pd product} {ergonomic}
        uintx NonProfiledCodeHeapSize                  = 122910970                              {pd product} {ergonomic}
        uintx ProfiledCodeHeapSize                     = 122910970                              {pd product} {ergonomic}
    openjdk version "11.0.17" 2022-10-18
    OpenJDK Runtime Environment (build 11.0.17+8-alpine-r3)
    OpenJDK 64-Bit Server VM (build 11.0.17+8-alpine-r3, mixed mode)
    The datahub-gms container with the free command:
    Copy code
    docker exec -it datahub-gms bash
    bash-5.1$ free
                  total        used        free      shared  buff/cache   available
    Mem:       32776400     8052724      417880           0    24305796    24294940
    Swap:       4194300        3584     4190716
    The application is deployed in GCP; the stats of the VM:
    Copy code
    cat /proc/meminfo
    MemTotal:       32776400 kB
    MemFree:          306556 kB
    MemAvailable:   24412316 kB
    Buffers:            2212 kB
    Cached:         23913504 kB
    SwapCached:          124 kB
    Active:         15746384 kB
    Inactive:       15120120 kB
    Active(anon):    5049788 kB
    Inactive(anon):  1926800 kB
    Active(file):   10696596 kB
    Inactive(file): 13193320 kB
    Unevictable:           0 kB
    Mlocked:               0 kB
    SwapTotal:       4194300 kB
    SwapFree:        4191228 kB
    Dirty:                84 kB
    Writeback:             0 kB
    AnonPages:       6950912 kB
    Mapped:           309100 kB
    Shmem:             25800 kB
    Slab:             885396 kB
    SReclaimable:     618596 kB
    SUnreclaim:       266800 kB
    KernelStack:       18816 kB
    PageTables:        30292 kB
    NFS_Unstable:          0 kB
    Bounce:                0 kB
    WritebackTmp:          0 kB
    CommitLimit:    20582500 kB
    Committed_AS:   13568028 kB
    VmallocTotal:   34359738367 kB
    VmallocUsed:       63820 kB
    VmallocChunk:   34359661428 kB
    Percpu:             5760 kB
    HardwareCorrupted:     0 kB
    AnonHugePages:   2617344 kB
    CmaTotal:              0 kB
    CmaFree:               0 kB
    HugePages_Total:       0
    HugePages_Free:        0
    HugePages_Rsvd:        0
    HugePages_Surp:        0
    Hugepagesize:       2048 kB
    DirectMap4k:      103232 kB
    DirectMap2M:     5136384 kB
    DirectMap1G:    30408704 kB
    Production with the older version v0.8.24 didn't have this OOM issue; it started happening after upgrading to v0.9.2. I upgraded using this command with the docker-compose.yml of version v0.9.2:
    Copy code
    docker-compose down --remove-orphans && docker-compose pull && docker-compose -p datahub up --force-recreate
    Is there anything I need to check or adjust (reindexing or something...)? Any help would be appreciated.
    b
    b
    • 3
    • 27