jolly-pager-18761 (07/05/2023, 2:51 PM)

busy-analyst-35820 (07/06/2023, 3:58 AM)

kind-whale-9577 (07/06/2023, 4:19 AM)
23/07/06 04:16:52 ERROR DatahubSparkListener: java.lang.NullPointerException
at datahub.spark.DatahubSparkListener.processExecution(DatahubSparkListener.java:296)
at datahub.spark.DatahubSparkListener.onOtherEvent(DatahubSparkListener.java:241)
at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:100)

kind-whale-9577 (07/06/2023, 4:19 AM)

kind-whale-9577 (07/06/2023, 4:40 AM)
23/07/06 04:40:11 ERROR DatahubSparkListener: java.lang.NullPointerException
at datahub.spark.DatahubSparkListener$3.apply(DatahubSparkListener.java:262)
at datahub.spark.DatahubSparkListener$3.apply(DatahubSparkListener.java:258)
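For context on where this listener comes from: it is registered through the Spark configuration, along the lines of the following sketch (property names as documented for the DataHub spark-lineage integration; the jar version and GMS URL here are placeholders):

```properties
# spark-defaults.conf (or the equivalent --conf flags on spark-submit)
spark.jars.packages        io.acryl:datahub-spark-lineage:<version>
spark.extraListeners       datahub.spark.DatahubSparkListener
spark.datahub.rest.server  http://datahub-gms:8080
```

When NullPointerExceptions show up inside processExecution/onOtherEvent, one common first step is to confirm the listener jar version matches the DataHub server version and that the Spark version in use is supported by the integration.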
worried-rocket-84695 (07/06/2023, 5:31 AM)

adventurous-lighter-8945 (07/06/2023, 6:20 AM)

powerful-tent-14193 (07/06/2023, 8:02 AM)

powerful-tent-14193 (07/06/2023, 8:09 AM)

acceptable-computer-51491 (07/06/2023, 9:55 AM)
[2023-07-06 09:16:10,979] DEBUG {datahub.emitter.rest_emitter:247} - Attempting to emit to DataHub GMS; using curl equivalent to:
2023-07-06 09:16:11.149010 [exec_id=280a9dbb-5208-4212-95ee-d28a9e4d4afc] INFO: Caught exception EXECUTING task_id=280a9dbb-5208-4212-95ee-d28a9e4d4afc, name=RUN_INGEST, stacktrace=Traceback (most recent call last):
  File "/usr/local/lib/python3.10/asyncio/streams.py", line 525, in readline
    line = await self.readuntil(sep)
  File "/usr/local/lib/python3.10/asyncio/streams.py", line 603, in readuntil
    raise exceptions.LimitOverrunError(
asyncio.exceptions.LimitOverrunError: Separator is not found, and chunk exceed the limit

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 123, in execute_task
    task_event_loop.run_until_complete(task_future)
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 147, in execute
    await tasks.gather(_read_output_lines(), _report_progress(), _process_waiter())
  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 99, in _read_output_lines
    line_bytes = await ingest_process.stdout.readline()
  File "/usr/local/lib/python3.10/asyncio/streams.py", line 534, in readline
    raise ValueError(e.args[0])
ValueError: Separator is not found, and chunk exceed the limit
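The inner LimitOverrunError in the traceback above is asyncio's StreamReader hitting its per-line buffer limit: the ingest subprocess emitted a single output line longer than the reader's limit (64 KiB by default). A minimal sketch of the mechanism (illustrative only, not the executor's code):

```python
import asyncio

async def main() -> str:
    # A StreamReader with a tiny 16-byte limit, to provoke the error quickly.
    reader = asyncio.StreamReader(limit=16)
    reader.feed_data(b"x" * 64)  # one long "line" with no newline in sight
    try:
        await reader.readline()
    except ValueError as exc:  # readline() re-raises LimitOverrunError as ValueError
        return str(exc)
    return "no error"

print(asyncio.run(main()))  # Separator is not found, and chunk exceed the limit
```

When you control the subprocess creation, `asyncio.create_subprocess_exec(..., limit=...)` raises this ceiling; with the managed executor, a common workaround is to reduce per-line log volume of the ingestion run (e.g. by lowering debug logging) so no single output line grows that large.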
limited-dentist-50437 (07/06/2023, 12:08 PM)

broad-pager-68280 (07/06/2023, 12:54 PM)

kind-whale-9577 (07/06/2023, 4:13 PM)
23/07/06 04:16:52 ERROR DatahubSparkListener: java.lang.NullPointerException
at datahub.spark.DatahubSparkListener.processExecution(DatahubSparkListener.java:296)
at datahub.spark.DatahubSparkListener.onOtherEvent(DatahubSparkListener.java:241)
at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:100)

kind-whale-9577 (07/06/2023, 4:15 PM)

delightful-beard-43126 (07/06/2023, 6:34 PM)
2023-07-06 15:20:20,856 [R2 Nio Event Loop-1-1] WARN c.l.r.t.h.c.c.ChannelPoolLifecycle:139 - Failed to create channel, remote=localhost/127.0.0.1:8080
io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:8080
Caused by: java.net.ConnectException: Connection refused
at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:777)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:337)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:776)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at java.base/java.lang.Thread.run(Thread.java:829)
2023-07-06 15:20:29,258 [R2 Nio Event Loop-1-2] WARN c.l.r.t.h.c.c.ChannelPoolLifecycle:139 - Failed to create channel, remote=localhost/127.0.0.1:8080
io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:8080
Caused by: java.net.ConnectException: Connection refused
    (same stack trace as above, omitted)
2023-07-06 15:20:29,259 [ThreadPoolTaskExecutor-1] ERROR c.l.m.kafka.hydrator.EntityHydrator:49 - Error while calling GMS to hydrate entity for urn urn:li:corpuser:${company_email}
2023-07-06 15:20:29,259 [ThreadPoolTaskExecutor-1] INFO c.l.m.k.t.DataHubUsageEventTransformer:128 - No matches for urn urn:li:corpuser:${company_email}
2023-07-06 15:20:29,548 [I/O dispatcher 1] ERROR c.l.m.s.e.update.BulkListener:44 - Failed to feed bulk request. Number of events: 1 Took time ms: -1 Message: failure in bulk execution:
[0]: index [datahub_usage_event-000059], type [_doc], id [PageViewEvent_urn%3Ali%3Acorpuser%3A${company_email}_1688570490840_11171], message [[datahub_usage_event-000059/v6wPC48TQBOr5UQ9q9kygw][[datahub_usage_event-000059][0]] ElasticsearchException[Elasticsearch exception [type=version_conflict_engine_exception, reason=[PageViewEvent_urn%3Ali%3Acorpuser%3A${company_email}_1688570490840_11171]: version conflict, document already exists (current version [1])]]]
2023-07-06 15:20:30,065 [R2 Nio Event Loop-1-1] WARN c.l.r.t.h.c.c.ChannelPoolLifecycle:139 - Failed to create channel, remote=localhost/127.0.0.1:8080
io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:8080
What could be happening here?
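On the question above: `Connection refused: localhost/127.0.0.1:8080` means nothing is accepting TCP connections at that address, i.e. this component is trying to reach GMS on localhost rather than wherever GMS actually runs. A quick probe to verify reachability (a sketch; the host and port are the values from the log, adjust for your deployment):

```python
import socket

def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False

# The failing address from the log:
print(can_connect("localhost", 8080))
```

In a Kubernetes/Helm deployment the GMS service name (for example datahub-datahub-gms, as it appears elsewhere in this thread) is typically what belongs in the consumer's GMS host configuration, not localhost.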
victorious-monkey-86128 (07/06/2023, 7:11 PM)
gradlew/quickstart in the datahub repo but I keep getting the error:
Execution failed for task ':docker:kafka-setup:docker'.
> Process 'command 'docker'' finished with non-zero exit value 1

nice-rocket-26538 (07/06/2023, 7:38 PM)

important-pager-98358 (07/06/2023, 8:39 PM)

green-monitor-16572 (07/07/2023, 7:17 AM)
^
symbol: method schema()
location: variable event of type PlatformEvent
/Users/r0b0d1h/open source code/datahub/metadata-events/mxe-utils-avro-1.7/src/main/java/com/linkedin/metadata/EventUtils.java:314: error: cannot find symbol
DataTranslator.dataMapToGenericRecord(event.data(), event.schema(), ORIGINAL_DUHE_AVRO_SCHEMA);
^
symbol: method data()
location: variable event of type DataHubUpgradeHistoryEvent
/Users/r0b0d1h/open source code/datahub/metadata-events/mxe-utils-avro-1.7/src/main/java/com/linkedin/metadata/EventUtils.java:314: error: cannot find symbol
DataTranslator.dataMapToGenericRecord(event.data(), event.schema(), ORIGINAL_DUHE_AVRO_SCHEMA);
^
symbol: method schema()
location: variable event of type DataHubUpgradeHistoryEvent
Note: /Users/r0b0d1h/open source code/datahub/metadata-events/mxe-utils-avro-1.7/src/main/java/com/linkedin/metadata/EventUtils.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: /Users/r0b0d1h/open source code/datahub/metadata-events/mxe-utils-avro-1.7/src/main/java/com/linkedin/metadata/EventUtils.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
36 errors
Kindly help me understand why I am facing the above compiler error.

green-autumn-94171 (07/07/2023, 7:28 AM)

adorable-lawyer-88494 (07/07/2023, 7:31 AM)
[13:00] Sejal Jain
RUN_INGEST - errors: none, exec_id: 2663b6f4-59b2-4b55-a5e8-9b7d89f1be66
2023-07-07 06:32:09.122532 [exec_id=2663b6f4-59b2-4b55-a5e8-9b7d89f1be66] INFO: Starting execution for task with name=RUN_INGEST
2023-07-07 06:32:31.297837 [exec_id=2663b6f4-59b2-4b55-a5e8-9b7d89f1be66] INFO: stdout=venv setup time = 0
This version of datahub supports report-to functionality
datahub ingest run -c /tmp/datahub/ingest/2663b6f4-59b2-4b55-a5e8-9b7d89f1be66/recipe.yml --report-to /tmp/datahub/ingest/2663b6f4-59b2-4b55-a5e8-9b7d89f1be66/ingestion_report.json
[2023-07-07 06:32:10,812] INFO {datahub.cli.ingest_cli:173} - DataHub CLI version: 0.10.3
[2023-07-07 06:32:10,848] INFO {datahub.ingestion.run.pipeline:210} - Sink configured successfully. DataHubRestEmitter: configured to talk to http://datahub-datahub-gms:8080
[2023-07-07 06:32:10,862] INFO {datahub.ingestion.run.pipeline:227} - Source configured successfully.
[2023-07-07 06:32:10,863] INFO {datahub.cli.ingest_cli:129} - Starting metadata ingestion
[2023-07-07 06:32:29,861] INFO {datahub.ingestion.reporting.file_reporter:52} - Wrote SUCCESS report successfully to <_io.TextIOWrapper name='/tmp/datahub/ingest/2663b6f4-59b2-4b55-a5e8-9b7d89f1be66/ingestion_report.json' mode='w' encoding='UTF-8'>
[2023-07-07 06:32:29,861] INFO {datahub.cli.ingest_cli:142} - Finished metadata ingestion

Cli report:
{'cli_version': '0.10.3',
 'cli_entry_location': '/tmp/datahub/ingest/venv-openapi-0.10.3/lib/python3.10/site-packages/datahub/__init__.py',
 'py_version': '3.10.7 (main, Sep 13 2022, 14:31:33) [GCC 10.2.1 20210110]',
 'py_exec_path': '/tmp/datahub/ingest/venv-openapi-0.10.3/bin/python3',
 'os_details': 'Linux-5.10.179-168.710.amzn2.x86_64-x86_64-with-glibc2.31',
 'peak_memory_usage': '66.3 MB',
 'mem_info': '66.3 MB',
 'peak_disk_usage': '17.64 GB',
 'disk_info': {'total': '21.46 GB', 'used': '17.64 GB', 'free': '3.82 GB'}}
Source (openapi) report:
{'events_produced': 0,
 'events_produced_per_sec': 0,
 'entities': {},
 'aspects': {},
 'warnings': {'/api/health': ['Unable to find an example for endpoint. Please add it to the list of forced examples.'],
              '/resources/agents/CommPayable/{commPayableID}': ['Unable to find an example for endpoint. Please add it to the list of forced examples.'],
              '/resources/agents/Commissions': ['Unable to find an example for endpoint. Please add it to the list of forced examples.'],
              '/resources/agents/{agentID}/carrierAppointments': ['Unable to find an example for endpoint. Please add it to the list of forced examples.'],
              '/resources/agents/{agentID}/leads': ['Unable to find an example for endpoint. Please add it to the list of forced examples.'],
              '/resources/customers/{ssn}/GetCustomersBySSN': ['Unable to find an example for endpoint. Please add it to the list of forced examples.'],
              '/resources/eapp/{applicationNumber}/covarageMatchedParties': ['Unable to find an example for endpoint. Please add it to the list of forced examples.'],
              '/resources/policies/{policyNumber}/notes': ['Unable to find an example for endpoint. Please add it to the list of forced examples.'],
              '/resources/portalForms/{formCode}': ['Unable to find an example for endpoint. Please add it to the list of forced examples.'],
              '/resources/portalForms/{formCode}/localizeConfig': ['Unable to find an example for endpoint. Please add it to the list of forced examples.'],
              'sampled': '10 sampled of at most 120 entries.'},
 'failures': {},
 'start_time': '2023-07-07 06:32:10.862463 (19.17 seconds ago)',
 'running_time': '19.17 seconds'}
Sink (datahub-rest) report:
{'total_records_written': 0,
 'records_written_per_second': 0,
 'warnings': [],
 'failures': [],
 'start_time': '2023-07-07 06:32:10.844377 (19.19 seconds ago)',
 'current_time': '2023-07-07 06:32:30.029331 (now)',
 'total_duration_in_seconds': 19.18,
 'gms_version': 'v0.9.6',
 'pending_requests': 0}

 Pipeline finished with at least 120 warnings; produced 0 events in 19.17 seconds.
❗Client-Server Incompatible❗ Your client version 0.10.3 is newer than your server version 0.9.6. Downgrading the cli to 0.9.6 is recommended.
 ➡️ Downgrade via `pip install 'acryl-datahub==0.9.6'`
2023-07-07 06:32:31.298067 [exec_id=2663b6f4-59b2-4b55-a5e8-9b7d89f1be66] INFO: Successfully executed 'datahub ingest'
structured_report: (the same cli/source/sink report serialized as JSON, omitted)
Execution finished successfully!

broad-pager-68280 (07/07/2023, 11:30 AM)
nice-waiter-58576 (07/07/2023, 12:25 PM)

some-crowd-4662 (07/07/2023, 1:47 PM)

chilly-elephant-51826 (07/08/2023, 11:17 AM)
2023-07-08 08:56:12,569 [application-akka.actor.default-dispatcher-15] WARN p.api.mvc.LegacySessionCookieBaker - Cookie failed message authentication check
2023-07-08 08:56:12,684 [application-akka.actor.default-dispatcher-15] WARN p.api.mvc.LegacySessionCookieBaker - Cookie failed message authentication check
2023-07-08 08:56:12,684 [application-akka.actor.default-dispatcher-9] WARN p.api.mvc.LegacySessionCookieBaker - Cookie failed message authentication check
2023-07-08 08:56:12,685 [application-akka.actor.default-dispatcher-13] WARN p.api.mvc.LegacySessionCookieBaker - Cookie failed message authentication check
2023-07-08 08:56:12,692 [application-akka.actor.default-dispatcher-15] WARN p.api.mvc.LegacySessionCookieBaker - Cookie failed message authentication check
2023-07-08 08:56:12,692 [application-akka.actor.default-dispatcher-9] WARN p.api.mvc.LegacySessionCookieBaker - Cookie failed message authentication check
2023-07-08 08:56:13,725 [application-akka.actor.default-dispatcher-10] WARN p.api.mvc.LegacySessionCookieBaker - Cookie failed message authentication check
2023-07-08 08:56:13,726 [application-akka.actor.default-dispatcher-15] ERROR controllers.SsoCallbackController - Caught exception while attempting to handle SSO callback! It's likely that SSO integration is mis-configured.
java.util.concurrent.CompletionException: org.pac4j.core.exception.TechnicalException: State cannot be determined
at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314)
at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:319)
at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1702)
at play.core.j.HttpExecutionContext.$anonfun$execute$1(HttpExecutionContext.scala:64)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:49)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:48)
at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
Caused by: org.pac4j.core.exception.TechnicalException: State cannot be determined
at org.pac4j.oidc.credentials.extractor.OidcExtractor.lambda$extract$0(OidcExtractor.java:100)
at java.base/java.util.Optional.orElseThrow(Optional.java:408)
at org.pac4j.oidc.credentials.extractor.OidcExtractor.extract(OidcExtractor.java:100)
at org.pac4j.core.client.BaseClient.retrieveCredentials(BaseClient.java:66)
at org.pac4j.core.client.IndirectClient.getCredentials(IndirectClient.java:143)
at org.pac4j.core.engine.DefaultCallbackLogic.perform(DefaultCallbackLogic.java:85)
at auth.sso.oidc.OidcCallbackLogic.perform(OidcCallbackLogic.java:100)
at controllers.SsoCallbackController$SsoCallbackLogic.perform(SsoCallbackController.java:91)
at controllers.SsoCallbackController$SsoCallbackLogic.perform(SsoCallbackController.java:77)
at org.pac4j.play.CallbackController.lambda$callback$0(CallbackController.java:54)
at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
... 8 common frames omitted
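The root cause `State cannot be determined` means pac4j could not recover the OIDC `state` value it stored before redirecting to the identity provider. Together with the repeated `Cookie failed message authentication check` warnings, this points at the session cookie carrying that state being rejected (for instance after the Play secret changed across a redeploy), not necessarily at the IdP settings themselves. A conceptual sketch of the state round trip (illustrative Python; names are hypothetical, this is not pac4j's code):

```python
import secrets

session = {}  # stands in for the cookie-backed Play session

def start_login() -> str:
    # Before redirecting to the IdP, a random state is stored in the session.
    state = secrets.token_urlsafe(16)
    session["oidc_state"] = state
    return state

def handle_callback(returned_state: str) -> str:
    # On the callback, the stored state must match what the IdP sent back.
    expected = session.pop("oidc_state", None)
    if expected is None or expected != returned_state:
        # If the session cookie was lost or rejected, this is the failure
        # path that pac4j reports as "State cannot be determined".
        raise RuntimeError("State cannot be determined")
    return "login ok"

state = start_login()
print(handle_callback(state))  # login ok
```

If the session cookie is invalidated between the redirect and the callback, the stored state is simply gone, and the callback fails exactly like a mismatched state would.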
There is an indication that the OIDC config is misconfigured, but can that be the issue? After redeploying the application, everything looks fine.

fancy-crayon-39356 (07/09/2023, 5:01 PM)
The Tableau Chrome extension talks to datahub-frontend, but it just doesn't find the Tableau assets in DataHub: "Sorry, we are unable to find this entity in DataHub".
Digging into the problem, I see that the extension uses the useGetSearchResultsForMultipleQuery function (https://github.com/datahub-project/datahub/pull/8033) that basically calls the searchAcrossEntities query using externalUrl, chartUrl, dashboardUrl fields as filters. So I figured out that the problem is with this query - on my side, the filter simply doesn't work. I've managed to replicate the query that the plugin does on GraphiQL, and the server is not able to filter for externalUrl, chartUrl or dashboardUrl. The strange thing is that externalUrl is now a searchable field: https://github.com/datahub-project/datahub/pull/7953
Example of query I'm running:
{
  searchAcrossEntities(input: {query: "*", start: 0, count: 2, orFilters: [{and: [{field: "externalUrl", values: ["<https://my-tableau-server-url>..."]}]}]}) {
    searchResults {
      entity {
        urn
        type
      }
    }
  }
}
searchResults is empty here - and I'm providing the correct externalUrl, checked many times.
Does anyone have an idea why I can't search using the externalUrl field on the filters? My thinking is that this is the root cause behind the Chrome Extension for Tableau not working.
Datahub version: v0.10.3
Deployment method: Helm charts
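One way to take both the extension and the UI out of the picture is to POST the same query directly to the GraphQL endpoint (`/api/graphql` on GMS or datahub-frontend) and inspect the raw response. A small helper that only builds the request payload (the Tableau URL below is a placeholder; token handling is up to you):

```python
import json

def build_search_payload(external_url: str, count: int = 2) -> str:
    """Build the GraphQL payload for testing the externalUrl filter."""
    query = """
    query test($input: SearchAcrossEntitiesInput!) {
      searchAcrossEntities(input: $input) {
        searchResults { entity { urn type } }
      }
    }"""
    variables = {
        "input": {
            "query": "*",
            "start": 0,
            "count": count,
            "orFilters": [{"and": [{"field": "externalUrl",
                                    "values": [external_url]}]}],
        }
    }
    return json.dumps({"query": query, "variables": variables})

payload = build_search_payload("https://my-tableau-server/placeholder")
print(payload[:40])
```

POST the payload with `Content-Type: application/json` and an `Authorization: Bearer <token>` header; if `searchResults` comes back empty there as well, the filter itself, rather than the extension, is the problem.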
Help would be much appreciated 🙏 @big-carpet-38439

numerous-account-62719 (07/10/2023, 11:07 AM)