nutritious-salesclerk-57675
02/22/2023, 4:06 PM
purple-oil-61897
02/22/2023, 4:24 PM
datahub docker quickstart: a few of the containers keep restarting and exiting... I don't see any obvious issues and no errors in mysql-setup, but gms has this error:
2023-02-22 17:21:43 2023/02/22 16:21:43 Problem with dial: dial tcp 172.18.0.8:29092: connect: connection refused. Sleeping 1s
2023-02-22 17:21:44 2023/02/22 16:21:44 Timeout after 4m0s waiting on dependencies to become available: [http://elasticsearch:9200 tcp://mysql:3306 tcp://broker:29092]
ZooKeeper looks like it is not running:
2023-02-22 17:18:48 [2023-02-22 16:18:48,129] INFO binding to port 0.0.0.0/0.0.0.0:2181 (org.apache.zookeeper.server.NIOServerCnxnFactory)
2023-02-22 17:18:48 [2023-02-22 16:18:48,369] WARN Close of session 0x0 (org.apache.zookeeper.server.NIOServerCnxn)
2023-02-22 17:18:48 java.io.IOException: ZooKeeperServer not running
2023-02-22 17:18:48 at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:544)
2023-02-22 17:18:48 at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:332)
2023-02-22 17:18:48 at org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:522)
2023-02-22 17:18:48 at org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
2023-02-22 17:18:48 at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
2023-02-22 17:18:48 at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
2023-02-22 17:18:48 at java.base/java.lang.Thread.run(Thread.java:829)
2023-02-22 17:18:48 [2023-02-22 16:18:48,434] WARN Unexpected exception (org.apache.zookeeper.server.WorkerService)
2023-02-22 17:18:48 java.lang.NullPointerException
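The GMS timeout means Kafka (broker:29092) never became reachable, and the broker in turn depends on ZooKeeper, which is crashing above — often stale state in the quickstart volumes. A few diagnostic steps (a sketch; container names assume the default quickstart compose setup, and note that datahub docker nuke deletes all local DataHub data):

```shell
# See which quickstart containers are restart-looping
docker ps -a

# ZooKeeper's log usually shows why it won't start (e.g. a corrupted snapshot/volume)
docker logs zookeeper --tail 100

# Built-in health check of the local quickstart deployment
datahub docker check

# Last resort: wipe the quickstart containers and volumes, then start fresh (destroys data)
datahub docker nuke
datahub docker quickstart
```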
I don't really know what to do with this...
witty-motorcycle-52108
02/22/2023, 6:58 PM
ip_address in a table: searching address shows results, but searching ip has an empty result set. We have a large number of columns (and tables) that should match, so I'm unsure what's going on.
bland-barista-59197
02/22/2023, 11:57 PM
1. With datahub-gms.replicaCount: 2 in values.yaml, the workload shows containers with unready status: [datahub-gms], and the datahub-gms container restarts multiple times. NOTE: datahub-gms works well when replicaCount is 1.
2. Same message when disabling system update (global.datahub.systemUpdate.enabled: false) with datahub-gms.replicaCount: 1 in values.yaml.
cuddly-butcher-39945
02/23/2023, 4:44 AM
polite-actor-701
02/23/2023, 9:08 AM
{'errors': [{'message': 'An unknown error occurred.', 'locations': [{'line': 2, 'column': 3}], 'path': ['search'], 'extensions': {'code': 500, 'type': 'SERVER_ERROR', 'classification': 'DataFetchingException'}}], 'data': {'search': None}}
but it sometimes prints the correct result:
{'data': {'search': {'searchResults': [{'entity': {'urn': 'urn:li:tag:AUM추출', 'properties': {'name': 'AUM추출'}}}]}}}
Could you please advise which part I should correct to prevent this error?
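One common cause of an intermittent 500 from the search endpoint is interpolating raw strings (especially non-ASCII ones like 'AUM추출') directly into the GraphQL query text; passing them via GraphQL variables sidesteps quoting and encoding problems. A minimal sketch of building such a request body (this is not the attached test.py; endpoint and token handling are omitted):

```python
import json

# GraphQL search query using a variable instead of string interpolation
SEARCH_QUERY = """
query search($input: SearchInput!) {
  search(input: $input) {
    searchResults {
      entity {
        urn
      }
    }
  }
}
"""

def build_search_payload(query_text: str, entity_type: str = "TAG") -> str:
    """Build the JSON body for a DataHub GraphQL search request."""
    payload = {
        "query": SEARCH_QUERY,
        "variables": {
            "input": {"type": entity_type, "query": query_text, "start": 0, "count": 10}
        },
    }
    # json.dumps escapes non-ASCII characters safely by default (ensure_ascii=True)
    return json.dumps(payload)

body = build_search_payload("AUM추출")
```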
I attached the 'test.py' file and the 'datahub-gms-log.txt' file from when there was an error.
fresh-postman-88589
02/23/2023, 4:01 PM
gentle-lifeguard-88494
02/24/2023, 1:15 AM
namespace com.mycompany.dq
record distinctColValues {
  column: string
  distinctValues: array[string]
}
File: distinctValues.pdl
namespace com.mycompany.dq

@Aspect = {
  "name": "distinctValues",
  "autoRender": true,
  "renderSpec": {
    "displayType": "tabular", // or "properties"
    "key": "distinct_key",
    "displayName": "Distinct Values"
  }
}
record distinctValues {
  distinct_key: array[distinctColValues]
}
I put some screenshots below. I see the data in GraphQL, just not in the UI
Any help would be appreciated, thanks!
rich-policeman-92383
02/24/2023, 5:50 AM
bored-dentist-25467
02/24/2023, 6:17 AM
glamorous-elephant-17130
02/24/2023, 9:06 AM
datahub-frontend:
  enabled: true
  image:
    repository: linkedin/datahub-frontend-react
    tag: "v0.10.0"
  # Set up ingress to expose react front-end
  extraVolumes:
    - name: user-props
      secret:
        secretName: datahub-pass-secret
  extraVolumeMounts:
    - name: user-props
      mountPath: /datahub-frontend/conf/user.props
      subPath: token
      readOnly: true
  ingress:
    enabled: false
  resources:
    limits:
      memory: 1400Mi
    requests:
      cpu: 100m
      memory: 512Mi
I tried to upgrade the default password using this; I had created the secret beforehand.
Now my frontend is stuck in ContainerCreating.
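ContainerCreating with an extraVolumes secret mount usually means the secret can't be resolved (wrong name, wrong namespace, or missing key). A couple of checks (a sketch; the label selector and secret key are assumptions based on the values above):

```shell
# The Events section at the bottom usually names the exact mount failure
kubectl describe pod -l app.kubernetes.io/name=datahub-frontend

# Confirm the secret exists in the same namespace and contains the 'token' key
kubectl get secret datahub-pass-secret -o jsonpath='{.data}'
```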
Any clues, guys?
agreeable-belgium-70840
02/24/2023, 11:25 AM
2023-02-24 11:24:01,525 [ThreadPoolTaskExecutor-1] WARN o.apache.kafka.clients.NetworkClient - [Consumer clientId=consumer-generic-duhe-consumer-job-client-1, groupId=generic-duhe-consumer-job-client] Error while fetching metadata with correlation id 1434 : {DataHubUpgradeHistory_v1=UNKNOWN_TOPIC_OR_PARTITION}
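The UNKNOWN_TOPIC_OR_PARTITION warning above means the DataHubUpgradeHistory_v1 topic does not exist yet; it is normally created by the datahub-upgrade (system-update) job. A quick check (a sketch; the broker address and the kafka-topics tool name/path vary by deployment and distribution):

```shell
# List topics and look for the upgrade-history topic (broker address is an assumption)
kafka-topics.sh --bootstrap-server broker:29092 --list | grep DataHubUpgradeHistory_v1

# If it is missing and topic auto-creation is disabled, create it manually
kafka-topics.sh --bootstrap-server broker:29092 --create \
  --topic DataHubUpgradeHistory_v1 --partitions 1 --replication-factor 1
```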
Any idea why?
lively-jackal-83760
02/24/2023, 2:23 PM
agreeable-belgium-70840
02/24/2023, 3:02 PM
2023-02-24 14:43:11,561 [ThreadPoolTaskExecutor-1] INFO o.s.k.l.KafkaMessageListenerContainer:292 - mce-consumer-job-client: partitions revoked: []
2023-02-24 14:43:11,561 [ThreadPoolTaskExecutor-1] INFO o.a.k.c.c.i.AbstractCoordinator:552 - [Consumer clientId=consumer-mce-consumer-job-client-1, groupId=mce-consumer-job-client] (Re-)joining group
2023-02-24 14:43:11,592 [ThreadPoolTaskExecutor-1] INFO o.a.k.c.c.i.AbstractCoordinator:503 - [Consumer clientId=consumer-mce-consumer-job-client-1, groupId=mce-consumer-job-client] Successfully joined group with generation 1837
2023-02-24 14:43:11,592 [ThreadPoolTaskExecutor-1] INFO o.a.k.c.c.i.ConsumerCoordinator:273 - [Consumer clientId=consumer-mce-consumer-job-client-1, groupId=mce-consumer-job-client] Adding newly assigned partitions:
2023-02-24 14:43:11,592 [ThreadPoolTaskExecutor-1] INFO o.s.k.l.KafkaMessageListenerContainer:292 - mce-consumer-job-client: partitions assigned: []
Any idea why?
glamorous-elephant-17130
02/24/2023, 9:08 PM
Development-Technology-Developer:~/environment/deploy-datahub-using-aws-managed-services-ingest-metadata (main) $ datahub actions -c actions/slack_integration.yaml
[2023-02-24 20:55:05,426] INFO {datahub_actions.cli.actions:77} - DataHub Actions version: 0.0.11
[2023-02-24 20:55:05,542] INFO {datahub_actions.plugin.action.slack.slack:96} - Slack notification action configured with bot_token=SecretStr('**********') signing_secret=SecretStr('**********') default_channel='C04QT4JEYSK' base_url='http://datahub.dev.creditsaison.xyz:9002' suppress_system_activity=True
/home/ec2-user/.local/lib/python3.7/site-packages/slack_sdk/web/internal_utils.py:290: UserWarning: The top-level `text` argument is missing in the request payload for a chat.postMessage call - It's a best practice to always provide a `text` argument when posting a message. The `text` argument is used in places where content cannot be rendered such as: system push notifications, assistive technology such as screen readers, etc.
warnings.warn(missing_text_message, UserWarning)
[2023-02-24 20:55:06,130] INFO {datahub_actions.cli.actions:119} - Action Pipeline with name 'datahub_slack_action' is now running.
%4|1677272135.501|FAIL|rdkafka#consumer-1| [thrd:b-2.mskdatahub.0c8aba.c9.kafka.us-east-1.amazonaws.com:9092/boo]: b-2.mskdatahub.0c8aba.c9.kafka.us-east-1.amazonaws.com:9092/bootstrap: Connection setup timed out in state CONNECT (after 30033ms in state CONNECT)
%4|1677272136.167|FAIL|rdkafka#consumer-1| [thrd:b-1.mskdatahub.0c8aba.c9.kafka.us-east-1.amazonaws.com:9092/boo]: b-1.mskdatahub.0c8aba.c9.kafka.us-east-1.amazonaws.com:9092/bootstrap: Connection setup timed out in state CONNECT (after 30037ms in state CONNECT)
glamorous-elephant-17130
02/24/2023, 9:09 PM
glamorous-elephant-17130
02/24/2023, 9:09 PM
gentle-lifeguard-88494
02/25/2023, 2:32 PM
datahub ingest list-runs
from this thread here: https://datahubspace.slack.com/archives/C029A3M079U/p1675534582989809
I figured it out, but I was wondering why I had to manually add my DataHub token to the list-runs function to get it to work. Am I possibly doing something wrong with my setup, or is there something in the code that needs to be updated? Thanks!
powerful-cat-68806
02/26/2023, 4:11 PM
502 bad gateway nginx error. Checking the ingress-nginx pod logs, I found the following errors:
2023/02/26 12:47:47 [error] 2331#2331: *16878988 upstream sent too big header while reading response header from upstream, client: xx.xx.xxx.xxx, server: datahub-xxx-xx.xxx.xxx, request: "GET /sso HTTP/1.1", upstream: "http://datahub-xxx-xx.xxx.xxx:9002/sso", host: "datahub-xxx-xx.xxx.xxx", referrer: "https://datahub-xxx-xx.xxx.xxx/login"
I0226 14:43:16.410437 66544 request.go:665] Waited for 1.000965907s due to client-side throttling, not priority and fairness, request: GET:https://xxxxxxxxxxxxx.xxx.us-east-1.eks.amazonaws.com/api/v1/namespaces/ingress-nginx/pods/ingress-nginx-controller-xxxxxxx-xxxx/log?container=controller&follow=true&tailLines=10
I’ve validated:
• DH pods are running
• Ingress is configured correctly
• Services are configured with the right DNS & ports
• env | grep AUTH is configured with the correct values for the datahub-gms pod
When trying to log in with user + password, all works fine.
I’ve also checked with our IT team whether the blocker is from our VPN.
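The "upstream sent too big header" error from ingress-nginx during SSO redirects is commonly addressed by enlarging the proxy buffers on the Ingress resource; a sketch of the annotations (the sizes are assumptions to tune, not required values):

```yaml
metadata:
  annotations:
    # SSO callbacks often carry large Set-Cookie/response headers
    nginx.ingress.kubernetes.io/proxy-buffer-size: "16k"
    nginx.ingress.kubernetes.io/proxy-buffers-number: "4"
```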
Please advise.
best-umbrella-88325
02/27/2023, 10:18 AM
Failed to compile.
./node_modules/react-syntax-highlighter/dist/esm/async-languages/prism.js
Module not found: Can't resolve 'refractor/lang/asmatmel.js' in '/mnt/c/XXX/XXX/datahub/datahub-web-react/node_modules/react-syntax-highlighter/dist/esm/async-languages'
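A missing refractor/lang/*.js module usually means the installed refractor version doesn't match what react-syntax-highlighter expects (a stale lockfile resolution). A clean reinstall of the web-react dependencies often resolves it (a sketch, assuming yarn as used by the datahub-web-react project):

```shell
cd datahub-web-react
rm -rf node_modules
yarn install
```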
Can someone help me out with this please?
breezy-boots-97651
02/27/2023, 11:45 AM
"2023-02-27 08:59:21.828147 [exec_id=dd1bff69-e3ef-432f-af66-5ad29328d529] INFO: Failed to execute 'datahub ingest'",
'2023-02-27 08:59:21.828429 [exec_id=dd1bff69-e3ef-432f-af66-5ad29328d529] INFO: Caught exception EXECUTING '
'task_id=dd1bff69-e3ef-432f-af66-5ad29328d529, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
' File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 123, in execute_task\n'
' task_event_loop.run_until_complete(task_future)\n'
' File "/usr/local/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete\n'
' return future.result()\n'
' File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 168, in execute\n'
' raise TaskError("Failed to execute \'datahub ingest\'")\n'
"acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"]}
Execution finished with errors.
lively-jackal-83760
02/27/2023, 2:16 PM
{datahub_actions.plugin.action.teams.teams:60} - Teams notification action configured with webhook_url
but I see these
2023/02/27 13:25:30 Received 200 from http://datahub-datahub-gms:8080/health
[2023-02-27 13:25:32,685] DEBUG {datahub.telemetry.telemetry:210} - Sending init Telemetry
[2023-02-27 13:25:33,292] DEBUG {datahub.telemetry.telemetry:243} - Sending Telemetry
[2023-02-27 13:25:33,545] INFO {datahub.cli.ingest_cli:182} - DataHub CLI version: 0.9.0.5rc2
[2023-02-27 13:25:33,551] DEBUG {datahub.cli.ingest_cli:196} - Using config: ...
[2023-02-27 13:25:34,190] DEBUG {datahub.ingestion.run.pipeline:174} - Sink type:console,<class 'datahub.ingestion.sink.console.ConsoleSink'> configured
[2023-02-27 13:25:34,190] INFO {datahub.ingestion.run.pipeline:175} - Sink configured successfully.
[2023-02-27 13:25:34,190] WARNING {datahub.ingestion.run.pipeline:276} - Failed to configure reporter: datahub
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 264, in _configure_reporting
reporter_class.create(
File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/reporting/datahub_ingestion_run_summary_provider.py", line 92, in create
raise ValueError(
ValueError: Datahub ingestion reporter will be disabled because sink type console is not supported
[2023-02-27 13:25:34,486] INFO {acryl_action_fwk.source.datahub_streaming:176} - Action executor:ExecutionRequestAction: configured
[2023-02-27 13:25:34,486] DEBUG {datahub.ingestion.run.pipeline:199} - Source type:datahub-stream,<class 'acryl_action_fwk.source.datahub_streaming.DataHubStreamSource'> configured
[2023-02-27 13:25:34,486] INFO {datahub.ingestion.run.pipeline:200} - Source configured successfully.
[2023-02-27 13:25:34,488] INFO {datahub.cli.ingest_cli:129} - Starting metadata ingestion
[2023-02-27 13:25:34,488] INFO {acryl_action_fwk.source.datahub_streaming:196} - Will subscribe to MetadataAuditEvent_v4, MetadataChangeLog_Versioned_v1
[2023-02-27 13:25:34,489] INFO {acryl_action_fwk.source.datahub_streaming:199} - Action framework started
[2023-02-27 13:26:06,968] INFO {acryl_action_fwk.source.datahub_streaming:206} - Msg received: MetadataChangeLog_Versioned_v1, 1, 888957
[2023-02-27 13:26:06,968] INFO {acryl_action_fwk.source.datahub_streaming:89} - Calling act of ExecutionRequestAction
This looks totally different. We updated our helm charts to the latest version.
Or does it not look like the correct version of the datahub-actions pod?
quiet-television-68466
02/27/2023, 2:20 PM
transformers:
  - type: add_dataset_properties
    config:
      add_properties_resolver_class: 'cdp-datahub-actions.snowflake_properties:SnowflakePropertiesResolver'
and I can see that cdp-datahub-actions is installed in the datahub-actions pod using pip freeze. Despite this I’m getting the error Failed to configure transformers: No module named 'cdp-datahub-actions'
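Note that a Python import path cannot contain hyphens: pip package names with hyphens normally install an importable module with underscores instead. If the importable name here is cdp_datahub_actions (an assumption; check with pip show -f or python -c "import cdp_datahub_actions"), the resolver reference would look like:

```yaml
# hypothetical corrected reference, assuming the importable module is cdp_datahub_actions
add_properties_resolver_class: 'cdp_datahub_actions.snowflake_properties:SnowflakePropertiesResolver'
```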
. Have I installed the module in the wrong place? Any advice would be super appreciated!
handsome-flag-16272
02/27/2023, 5:33 PM
blue-agency-87812
02/27/2023, 5:39 PM
gray-airplane-39227
02/27/2023, 6:23 PM
Regarding platform_instance: the documentation shows it's supported by default, but in code, metadata-ingestion/src/datahub/ingestion/source_config/sql/bigquery.py has a validator that says bigquery_doesnt_need_platform_instance. I'm wondering how these two align.
narrow-queen-90189
02/27/2023, 10:56 PM
narrow-queen-90189
02/27/2023, 10:58 PM
~~~~ Ingestion Logs ~~~~
Obtaining venv creation lock...
Acquired venv creation lock
venv setup time = 0
This version of datahub supports report-to functionality
datahub ingest run -c /tmp/datahub/ingest/c3d726e3-088f-4574-af3c-89d4831fb7f9/recipe.yml --report-to /tmp/datahub/ingest/c3d726e3-088f-4574-af3c-89d4831fb7f9/ingestion_report.json
[2023-02-27 22:25:55,940] INFO {datahub.cli.ingest_cli:165} - DataHub CLI version: 0.10.0
[2023-02-27 22:25:55,965] INFO {datahub.ingestion.run.pipeline:179} - Sink configured successfully. DataHubRestEmitter: configured to talk to http://datahub-gms:8080
/tmp/datahub/ingest/venv-powerbi-0.10.0/lib/python3.10/site-packages/datahub/ingestion/source/powerbi/powerbi.py:867: ConfigurationWarning: env is deprecated and will be removed in a future release. Please use platform_instance instead.
config = PowerBiDashboardSourceConfig.parse_obj(config_dict)
[2023-02-27 22:25:56,235] INFO {datahub.ingestion.source.powerbi.proxy:231} - Trying to connect to https://login.microsoftonline.com/{tenant-id}
[2023-02-27 22:25:56,235] INFO {datahub.ingestion.source.powerbi.proxy:349} - Generating PowerBi access token
[2023-02-27 22:25:56,375] INFO {datahub.ingestion.source.powerbi.proxy:363} - Generated PowerBi access token
[2023-02-27 22:25:56,375] INFO {datahub.ingestion.source.powerbi.proxy:233} - Able to connect to https://login.microsoftonline.com/{tenant-id}
[2023-02-27 22:25:56,587] INFO {datahub.ingestion.source.powerbi.proxy:231} - Trying to connect to https://login.microsoftonline.com/{tenant-id}
[2023-02-27 22:25:56,588] INFO {datahub.ingestion.source.powerbi.proxy:349} - Generating PowerBi access token
[2023-02-27 22:25:56,679] INFO {datahub.ingestion.source.powerbi.proxy:363} - Generated PowerBi access token
[2023-02-27 22:25:56,679] INFO {datahub.ingestion.source.powerbi.proxy:233} - Able to connect to https://login.microsoftonline.com/{tenant-id}
[2023-02-27 22:25:56,679] INFO {datahub.ingestion.run.pipeline:196} - Source configured successfully.
[2023-02-27 22:25:56,681] INFO {datahub.cli.ingest_cli:120} - Starting metadata ingestion
[2023-02-27 22:25:56,682] INFO {datahub.ingestion.source.powerbi.powerbi:892} - PowerBi plugin execution is started
[2023-02-27 22:25:56,682] INFO {datahub.ingestion.source.powerbi.proxy:757} - Request to get groups endpoint URL=https://api.powerbi.com/v1.0/myorg/groups
[2023-02-27 22:25:56,852] INFO {datahub.ingestion.reporting.file_reporter:52} - Wrote SUCCESS report successfully to <_io.TextIOWrapper name='/tmp/datahub/ingest/c3d726e3-088f-4574-af3c-89d4831fb7f9/ingestion_report.json' mode='w' encoding='UTF-8'>
[2023-02-27 22:25:56,852] INFO {datahub.cli.ingest_cli:133} - Finished metadata ingestion
Cli report:
{'cli_version': '0.10.0',
'cli_entry_location': '/tmp/datahub/ingest/venv-powerbi-0.10.0/lib/python3.10/site-packages/datahub/__init__.py',
'py_version': '3.10.9 (main, Jan 23 2023, 22:32:48) [GCC 10.2.1 20210110]',
'py_exec_path': '/tmp/datahub/ingest/venv-powerbi-0.10.0/bin/python3',
'os_details': 'Linux-5.15.0-1026-aws-x86_64-with-glibc2.31',
'mem_info': '70.89 MB'}
Source (powerbi) report:
{'events_produced': 0,
'events_produced_per_sec': 0,
'entities': {},
'aspects': {},
'warnings': {},
'failures': {},
'dashboards_scanned': 0,
'charts_scanned': 0,
'filtered_dashboards': [],
'filtered_charts': [],
'start_time': '2023-02-27 22:25:56.071050 (now)',
'running_time': '0.94 seconds'}
Sink (datahub-rest) report:
{'total_records_written': 0,
'records_written_per_second': 0,
'warnings': [],
'failures': [],
'start_time': '2023-02-27 22:25:55.962587 (1.04 seconds ago)',
'current_time': '2023-02-27 22:25:57.006657 (now)',
'total_duration_in_seconds': 1.04,
'gms_version': 'v0.10.0',
'pending_requests': 0}
Pipeline finished successfully; produced 0 events in 0.94 seconds.
bland-appointment-45659
02/28/2023, 4:41 AM
cool-yacht-59889
02/28/2023, 7:42 AM