proud-waitress-17589
05/16/2023, 5:08 PM
rich-state-73859
05/17/2023, 4:32 PM
astonishing-father-13229
05/18/2023, 7:14 PM
adamant-furniture-37835
05/23/2023, 7:56 AM
future-analyst-98466
05/31/2023, 6:42 AM
helpful-dream-67192
06/02/2023, 8:27 AM
2023-06-02 08:21:55,927 [ThreadPoolTaskExecutor-1] WARN c.l.m.b.k.DataHubUpgradeKafkaListener:99 - System version is not up to date: v0.10.3-0. Waiting for datahub-upgrade to complete...
2023-06-02 08:21:56,093 [pool-20-thread-1] WARN org.elasticsearch.client.RestClient:65 - request [POST <http://elasticsearch-master:9200/datahub_usage_event/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true>] returned 2 warnings: [299 Elasticsearch-7.17.3-5ad023604c8d7416c9eb6c0eadb62b14e766caff "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See <https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html> to enable security."],[299 Elasticsearch-7.17.3-5ad023604c8d7416c9eb6c0eadb62b14e766caff "[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices."]
2023-06-02 08:21:56,112 [pool-20-thread-1] WARN org.elasticsearch.client.RestClient:65 - request [POST <http://elasticsearch-master:9200/datahub_usage_event/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true>] returned 2 warnings: [299 Elasticsearch-7.17.3-5ad023604c8d7416c9eb6c0eadb62b14e766caff "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See <https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html> to enable security."],[299 Elasticsearch-7.17.3-5ad023604c8d7416c9eb6c0eadb62b14e766caff "[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices."]
2023-06-02 08:21:56,117 [pool-20-thread-1] WARN org.elasticsearch.client.RestClient:65 - request [POST <http://elasticsearch-master:9200/datahub_usage_event/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true>] returned 2 warnings: [299 Elasticsearch-7.17.3-5ad023604c8d7416c9eb6c0eadb62b14e766caff "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See <https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html> to enable security."],[299 Elasticsearch-7.17.3-5ad023604c8d7416c9eb6c0eadb62b14e766caff "[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices."]
2023-06-02 08:21:56,394 [I/O dispatcher 1] WARN org.elasticsearch.client.RestClient:65 - request [POST <http://elasticsearch-master:9200/_bulk?timeout=1m>] returned 1 warnings: [299 Elasticsearch-7.17.3-5ad023604c8d7416c9eb6c0eadb62b14e766caff "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See <https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html> to enable security."]
2023-06-02 08:21:56,402 [I/O dispatcher 1] INFO c.l.m.s.e.update.BulkListener:47 - Successfully fed bulk request. Number of events: 1 Took time ms: -1
2023-06-02 08:22:02,937 [pool-12-thread-1] WARN org.elasticsearch.client.RestClient:65 - request [POST <http://elasticsearch-master:9200/datahubpolicyindex_v2/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true>] returned 2 warnings: [299 Elasticsearch-7.17.3-5ad023604c8d7416c9eb6c0eadb62b14e766caff "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See <https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html> to enable security."],[299 Elasticsearch-7.17.3-5ad023604c8d7416c9eb6c0eadb62b14e766caff "[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices."]
2023-06-02 08:22:38,508 [R2 Nio Event Loop-1-1] WARN c.l.r.t.h.c.c.ChannelPoolLifecycle:139 - Failed to create channel, remote=localhost/127.0.0.1:8080
io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:8080
Caused by: java.net.ConnectException: Connection refused
at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:777)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:337)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:776)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at java.base/java.lang.Thread.run(Thread.java:829)
2023-06-02 08:22:40,615 [R2 Nio Event Loop-1-2] WARN c.l.r.t.h.c.c.ChannelPoolLifecycle:139 - Failed to create channel, remote=localhost/127.0.0.1:8080
io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:8080
Can someone help here? Thanks in advance.
cc: @proud-dusk-671 @millions-football-58938 @brainy-beach-58125
fast-vegetable-81275
06/02/2023, 2:45 PM
cuddly-butcher-39945
06/06/2023, 3:57 PM
bland-gigabyte-28270
06/10/2023, 5:42 AM
bland-gigabyte-28270
06/12/2023, 1:00 AM
elegant-article-21703
06/13/2023, 8:28 AM
elegant-salesmen-99143
06/13/2023, 12:57 PM
thankful-morning-85093
06/14/2023, 10:56 PM
elegant-guitar-28442
06/15/2023, 6:05 AM
adorable-lawyer-88494
06/16/2023, 7:51 AM
incalculable-portugal-45517
06/19/2023, 5:04 PM
bland-gigabyte-28270
06/22/2023, 1:17 AM
max_threads fix. This is 0.10.3, using Snowflake:
Config:
source:
  type: snowflake
  config:
    account_id: <account-id>
    include_table_lineage: true
    include_view_lineage: true
    include_tables: true
    include_views: true
    profiling:
      enabled: true
      profile_table_level_only: true
    stateful_ingestion:
      enabled: true
    warehouse: DATAHUB_WH
    username: datahub_user
    role: DATAHUB_READER
    database_pattern:
      allow:
        - PATTERN
    password: '${SNOWFLAKE_DATAHUB_USER_PASSWORD}'
sink:
  type: datahub-rest
  config:
    server: 'http://datahub-datahub-gms:8080/'
    max_threads: 1
Logs:
{
"error": "Unable to emit metadata to DataHub GMS: javax.persistence.PersistenceException: Error when batch flush on sql: update metadata_aspect_v2 set metadata=?, createdOn=?, createdBy=?, createdFor=?, systemmetadata=? where urn=? and aspect=? and version=?",
"info": {
"exceptionClass": "com.linkedin.restli.server.RestLiServiceException",
"message": "javax.persistence.PersistenceException: Error when batch flush on sql: update metadata_aspect_v2 set metadata=?, createdOn=?, createdBy=?, createdFor=?, systemmetadata=? where urn=? and aspect=? and version=?",
"status": 500,
"id": "urn:li:dataset:(urn:li:dataPlatform:snowflake,arene.aha.kfk_aha_feature,PROD)"
}
},
{
"error": "Unable to emit metadata to DataHub GMS: javax.persistence.PersistenceException: Error when batch flush on sql: update metadata_aspect_v2 set metadata=?, createdOn=?, createdBy=?, createdFor=?, systemmetadata=? where urn=? and aspect=? and version=?",
"info": {
"exceptionClass": "com.linkedin.restli.server.RestLiServiceException",
"message": "javax.persistence.PersistenceException: Error when batch flush on sql: update metadata_aspect_v2 set metadata=?, createdOn=?, createdBy=?, createdFor=?, systemmetadata=? where urn=? and aspect=? and version=?",
"status": 500,
"id": "urn:li:dataset:(urn:li:dataPlatform:snowflake,arene.aha.kfk_aha_release,PROD)"
}
},
great-car-44033
07/03/2023, 12:02 PM
proud-intern-59151
07/11/2023, 6:31 AM
Datasource my_datasource is not present in platform_instance_map.
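The `platform_instance_map` warning above generally means the source found a datasource whose name has no entry in that map in the recipe. A minimal sketch of the relevant config section, assuming a source type that supports `platform_instance_map` (e.g. Metabase or Superset); the source type, datasource name, and instance name here are all placeholders:

```yaml
source:
  type: metabase          # placeholder: use your actual source type
  config:
    # ... connection settings ...
    platform_instance_map:
      # map each datasource name reported in the warning to the
      # platform instance it should resolve to in DataHub
      my_datasource: my_platform_instance
```

The exact key semantics can differ per connector, so verify against that source's documentation.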
rich-restaurant-61261
07/11/2023, 6:44 AM
[2023-07-11 06:32:40,593] ERROR {datahub.entrypoints:199} - Command failed: Failed to find a registered source for type s3: SPARK_VERSION environment variable is required. Supported values are: dict_keys(['3.3', '3.2', '3.1', '3.0', '2.4'])
Traceback (most recent call last):
File "/tmp/datahub/ingest/venv-s3-0.10.4/lib/python3.10/site-packages/pydeequ/configs.py", line 26, in _get_spark_version
spark_version = os.environ["SPARK_VERSION"]
File "/usr/local/lib/python3.10/os.py", line 680, in __getitem__
raise KeyError(key) from None
KeyError: 'SPARK_VERSION'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/tmp/datahub/ingest/venv-s3-0.10.4/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 120, in _add_init_error_context
yield
File "/tmp/datahub/ingest/venv-s3-0.10.4/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 220, in __init__
source_class = source_registry.get(source_type)
File "/tmp/datahub/ingest/venv-s3-0.10.4/lib/python3.10/site-packages/datahub/ingestion/api/registry.py", line 183, in get
tp = self._ensure_not_lazy(key)
File "/tmp/datahub/ingest/venv-s3-0.10.4/lib/python3.10/site-packages/datahub/ingestion/api/registry.py", line 127, in _ensure_not_lazy
plugin_class = import_path(path)
File "/tmp/datahub/ingest/venv-s3-0.10.4/lib/python3.10/site-packages/datahub/ingestion/api/registry.py", line 57, in import_path
item = importlib.import_module(module_name)
File "/usr/local/lib/python3.10/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/tmp/datahub/ingest/venv-s3-0.10.4/lib/python3.10/site-packages/datahub/ingestion/source/s3/__init__.py", line 1, in <module>
from datahub.ingestion.source.s3.source import S3Source
File "/tmp/datahub/ingest/venv-s3-0.10.4/lib/python3.10/site-packages/datahub/ingestion/source/s3/source.py", line 12, in <module>
import pydeequ
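The traceback above shows the s3 source importing `pydeequ`, which requires a `SPARK_VERSION` environment variable. A minimal workaround sketch, assuming CLI-based ingestion (the version value must match the Spark you actually have installed):

```shell
# pydeequ reads the Spark version from the environment; per the error
# message, supported values are 2.4, 3.0, 3.1, 3.2 and 3.3.
export SPARK_VERSION=3.3
```

Then run `datahub ingest -c <recipe>` from the same shell. For UI-based ingestion the variable would instead need to be set in the environment of the pod that executes the ingestion (the `/tmp/datahub/ingest/venv-s3-...` path above suggests the actions container).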
victorious-monkey-86128
07/11/2023, 4:47 PM
> Task :docker:kafka-setup:docker
#12 ERROR: process "/bin/sh -c mkdir -p /opt && mirror=$(curl --stderr /dev/null <https://www.apache.org/dyn/closer.cgi>\\?as_json\\=1 | jq -r '.preferred') && curl -sSL \"${mirror}kafka/${KAFKA_VERSION}/kafka_${SCALA_VERSION}-${KAFKA_VERSION}.tgz\" | tar -xzf - -C /opt && mv /opt/kafka_${SCALA_VERSION}-${KAFKA_VERSION} /opt/kafka && adduser -DH -s /sbin/nologin kafka && chown -R kafka: /opt/kafka && echo \"===> Installing python packages ...\" && pip install --no-cache-dir jinja2 requests && pip install --prefer-binary --prefix=/usr/local --upgrade \"${PYTHON_CONFLUENT_DOCKER_UTILS_INSTALL_SPEC}\" && rm -rf /tmp/* && apk del --purge .build-deps" did not complete successfully: exit code: 1
------
> [stage-1 5/15] RUN mkdir -p /opt && mirror=$(curl --stderr /dev/null <https://www.apache.org/dyn/closer.cgi?as_json=1> | jq -r '.preferred') && curl -sSL "${mirror}kafka/3.4.0/kafka_2.13-3.4.0.tgz" | tar -xzf - -C /opt && mv /opt/kafka_2.13-3.4.0 /opt/kafka && adduser -DH -s /sbin/nologin kafka && chown -R kafka: /opt/kafka && echo "===> Installing python packages ..." && pip install --no-cache-dir jinja2 requests && pip install --prefer-binary --prefix=/usr/local --upgrade "git+<https://github.com/confluentinc/confluent-docker-utils@v0.0.58>" && rm -rf /tmp/* && apk del --purge .build-deps:
#12 1.144 tar: invalid magic
#12 1.144 tar: short read
------ Dockerfile:31
--------------------
30 | RUN apk add --no-cache -t .build-deps git curl ca-certificates jq gcc musl-dev libffi-dev zip
31 | >>> RUN mkdir -p /opt \
32 | >>> && mirror=$(curl --stderr /dev/null <https://www.apache.org/dyn/closer.cgi>\?as_json\=1 | jq -r '.preferred') \
33 | >>> && curl -sSL "${mirror}kafka/${KAFKA_VERSION}/kafka_${SCALA_VERSION}-${KAFKA_VERSION}.tgz" \
34 | >>> | tar -xzf - -C /opt \
35 | >>> && mv /opt/kafka_${SCALA_VERSION}-${KAFKA_VERSION} /opt/kafka \
36 | >>> && adduser -DH -s /sbin/nologin kafka \
37 | >>> && chown -R kafka: /opt/kafka \
38 | >>> && echo "===> Installing python packages ..." \
39 | >>> && pip install --no-cache-dir jinja2 requests \
40 | >>> && pip install --prefer-binary --prefix=/usr/local --upgrade "${PYTHON_CONFLUENT_DOCKER_UTILS_INSTALL_SPEC}" \
41 | >>> && rm -rf /tmp/* \
42 | >>> && apk del --purge .build-deps
43 |
--------------------
ERROR: failed to solve: process "/bin/sh -c mkdir -p /opt && mirror=$(curl --stderr /dev/null <https://www.apache.org/dyn/closer.cgi>\\?as_json\\=1 | jq -r '.preferred') && curl -sSL \"${mirror}kafka/${KAFKA_VERSION}/kafka_${SCALA_VERSION}-${KAFKA_VERSION}.tgz\" | tar -xzf - -C /opt && mv /opt/kafka_${SCALA_VERSION}-${KAFKA_VERSION} /opt/kafka && adduser -DH -s /sbin/nologin kafka && chown -R kafka: /opt/kafka && echo \"===> Installing python packages ...\" && pip install --no-cache-dir jinja2 requests && pip install --prefer-binary --prefix=/usr/local --upgrade \"${PYTHON_CONFLUENT_DOCKER_UTILS_INSTALL_SPEC}\" && rm -rf /tmp/* && apk del --purge .build-deps" did not complete successfully: exit code: 1
> Task :docker:kafka-setup:docker FAILED
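The `tar: invalid magic` / `tar: short read` errors above usually mean curl received something other than a gzip tarball, typically an HTML error page from whichever mirror `closer.cgi` selected. One possible workaround (an assumption, not a confirmed fix for this build) is to fetch from the stable Apache archive host instead of a dynamic mirror:

```dockerfile
# Hypothetical variant of the failing step: skip the closer.cgi mirror
# lookup and download straight from archive.apache.org.
RUN mkdir -p /opt \
    && curl -sSL "https://archive.apache.org/dist/kafka/${KAFKA_VERSION}/kafka_${SCALA_VERSION}-${KAFKA_VERSION}.tgz" \
    | tar -xzf - -C /opt \
    && mv /opt/kafka_${SCALA_VERSION}-${KAFKA_VERSION} /opt/kafka
```

Running `tar -tzf` on the downloaded file before extracting is a quick way to confirm whether the mirror served a real archive.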
bitter-wire-42401
07/12/2023, 12:00 PM
datahub docker ingest-sample-data
does not work
ERROR {datahub.ingestion.run.pipeline:68} - failed to write record with workunit file
some-crowd-4662
07/14/2023, 7:10 PM
some-crowd-4662
07/17/2023, 3:18 AM
brave-engine-32813
07/19/2023, 4:47 AM
nutritious-bird-77396
07/19/2023, 3:39 PM
some-crowd-4662
07/19/2023, 6:52 PM
bland-barista-59197
07/25/2023, 7:17 PM
/q browsePaths: /datasets/prod/hive*
? I'm getting an error: 500 Server_error.
eager-nest-72774
08/02/2023, 4:39 PM
s3_resource = boto3.resource('s3', aws_access_key_id=access_key, aws_secret_access_key=secret_key, aws_session_token=token)
The credentials work in boto3, but when I pass the same credentials in the Delta Lake ingestion recipe, it does not work on the pod of the Kubernetes cluster.
bland-barista-59197
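If the keys work from `boto3` directly, the usual suspect is how they are passed in the recipe. A hedged sketch of a delta-lake recipe with explicit AWS credentials — the field names follow the connector's AWS config as I understand it, so verify them against the delta-lake source docs; the bucket path and region are placeholders:

```yaml
source:
  type: delta-lake
  config:
    base_path: "s3://my-bucket/my-table/"   # placeholder
    s3:
      aws_config:
        aws_access_key_id: "${AWS_ACCESS_KEY_ID}"
        aws_secret_access_key: "${AWS_SECRET_ACCESS_KEY}"
        aws_session_token: "${AWS_SESSION_TOKEN}"
        aws_region: "us-east-1"             # placeholder
```

On Kubernetes it is also worth checking that these environment variables are actually present inside the ingestion pod, and remember that temporary STS session tokens expire, which can look like bad credentials.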
08/03/2023, 4:10 PM