# troubleshoot
  • p

    proud-waitress-17589

    05/16/2023, 5:08 PM
    Reviving an old thread - is it possible to delete based on glossaryTermGroup? i.e. I would like to remove a large branch of my glossary that was populated via the Glossary ingestion, so I can rerun ingestion for that sub-tree, but I do not want to delete the whole Glossary.
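    A hedged sketch of one way to do this with the datahub CLI: soft-delete each term in the sub-tree, then the node itself, by URN. The URNs below are placeholders and flag spellings can differ between CLI versions, so check datahub delete --help first.
    # soft-delete one term from the branch (placeholder URN)
    datahub delete --urn "urn:li:glossaryTerm:<child-term-id>"
    # then soft-delete the containing term group / node (placeholder URN)
    datahub delete --urn "urn:li:glossaryNode:<sub-tree-node-id>"
    After the soft deletes, rerunning the Glossary ingestion for that sub-tree should repopulate it.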
  • r

    rich-state-73859

    05/17/2023, 4:32 PM
    Is there any update for this issue?
  • a

    astonishing-father-13229

    05/18/2023, 7:14 PM
    Can someone help me ?
  • a

    adamant-furniture-37835

    05/23/2023, 7:56 AM
    Hi @astonishing-answer-96712, apologies for the delayed response. I didn't notice any error message in the dev tools on the UI. Maybe I haven't understood the feature or my expectations are different. Here is the scenario and my questions:
    1. I created a View with the filter "platform of type Vertica or Tableau" and made it my default view.
        a. When I log in to the homepage, it shows me everything, i.e. all entity types and all platforms. I can see in dev tools that a GraphQL call is made to fetch the View details, but the results aren't filtered. Shouldn't the landing page only show what the default view allows?
    2. On the home page, if I click on any other platform type, say Snowflake, it shows the message: No results found for "". This is good, but the unwanted platforms shouldn't be shown in the first place, right?
    3. Under "Explore your data", I am able to navigate to all entity types and look at their details even though the View is selected in the top panel. Our expectation is that nothing should be shown that falls outside the view definition.
    Please share your opinion on whether this is a bug or part of the feature itself. Thanks, Mahesh
  • f

    future-analyst-98466

    05/31/2023, 6:42 AM
    @few-air-34037 how do I pin/lock sqlparse to version 0.4.3? Thanks!
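    For reference, a minimal sketch of pinning that version at install time; the acryl-datahub extra shown is only an example, swap in whatever sources you actually use:
    pip install "sqlparse==0.4.3"
    # or pin it alongside the DataHub CLI install, e.g.
    pip install "acryl-datahub[snowflake]" "sqlparse==0.4.3"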
  • h

    helpful-dream-67192

    06/02/2023, 8:27 AM
    We are trying to deploy the latest version of DataHub via helm and are getting the same error in the datahub-gms pod:
    Copy code
    2023-06-02 08:21:55,927 [ThreadPoolTaskExecutor-1] WARN  c.l.m.b.k.DataHubUpgradeKafkaListener:99 - System version is not up to date: v0.10.3-0. Waiting for datahub-upgrade to complete...
    2023-06-02 08:21:56,093 [pool-20-thread-1] WARN  org.elasticsearch.client.RestClient:65 - request [POST <http://elasticsearch-master:9200/datahub_usage_event/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true>] returned 2 warnings: [299 Elasticsearch-7.17.3-5ad023604c8d7416c9eb6c0eadb62b14e766caff "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See <https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html> to enable security."],[299 Elasticsearch-7.17.3-5ad023604c8d7416c9eb6c0eadb62b14e766caff "[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices."]
    2023-06-02 08:21:56,112 [pool-20-thread-1] WARN  org.elasticsearch.client.RestClient:65 - request [POST <http://elasticsearch-master:9200/datahub_usage_event/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true>] returned 2 warnings: [299 Elasticsearch-7.17.3-5ad023604c8d7416c9eb6c0eadb62b14e766caff "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See <https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html> to enable security."],[299 Elasticsearch-7.17.3-5ad023604c8d7416c9eb6c0eadb62b14e766caff "[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices."]
    2023-06-02 08:21:56,117 [pool-20-thread-1] WARN  org.elasticsearch.client.RestClient:65 - request [POST <http://elasticsearch-master:9200/datahub_usage_event/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true>] returned 2 warnings: [299 Elasticsearch-7.17.3-5ad023604c8d7416c9eb6c0eadb62b14e766caff "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See <https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html> to enable security."],[299 Elasticsearch-7.17.3-5ad023604c8d7416c9eb6c0eadb62b14e766caff "[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices."]
    2023-06-02 08:21:56,394 [I/O dispatcher 1] WARN  org.elasticsearch.client.RestClient:65 - request [POST <http://elasticsearch-master:9200/_bulk?timeout=1m>] returned 1 warnings: [299 Elasticsearch-7.17.3-5ad023604c8d7416c9eb6c0eadb62b14e766caff "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See <https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html> to enable security."]
    2023-06-02 08:21:56,402 [I/O dispatcher 1] INFO  c.l.m.s.e.update.BulkListener:47 - Successfully fed bulk request. Number of events: 1 Took time ms: -1
    2023-06-02 08:22:02,937 [pool-12-thread-1] WARN  org.elasticsearch.client.RestClient:65 - request [POST <http://elasticsearch-master:9200/datahubpolicyindex_v2/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true>] returned 2 warnings: [299 Elasticsearch-7.17.3-5ad023604c8d7416c9eb6c0eadb62b14e766caff "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See <https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html> to enable security."],[299 Elasticsearch-7.17.3-5ad023604c8d7416c9eb6c0eadb62b14e766caff "[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices."]
    2023-06-02 08:22:38,508 [R2 Nio Event Loop-1-1] WARN  c.l.r.t.h.c.c.ChannelPoolLifecycle:139 - Failed to create channel, remote=localhost/127.0.0.1:8080
    io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:8080
    Caused by: java.net.ConnectException: Connection refused
    	at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    	at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:777)
    	at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:337)
    	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
    	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:776)
    	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
    	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
    	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
    	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
    	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    	at java.base/java.lang.Thread.run(Thread.java:829)
    2023-06-02 08:22:40,615 [R2 Nio Event Loop-1-2] WARN  c.l.r.t.h.c.c.ChannelPoolLifecycle:139 - Failed to create channel, remote=localhost/127.0.0.1:8080
    io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:8080
    Can someone help here? Thanks in advance. cc: @proud-dusk-671 @millions-football-58938 @brainy-beach-58125
    plus1 1
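    The first WARN line says GMS is waiting for the datahub-upgrade (system update) job to complete, so a reasonable first check is that job's status and logs. A hedged sketch, assuming kubectl is pointed at the release's namespace; the job name below is an assumption based on common chart defaults, substitute whatever the first command actually shows:
    kubectl get jobs | grep datahub
    # job name is an assumption; use the name listed by the command above
    kubectl logs job/datahub-datahub-system-update-job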
  • f

    fast-vegetable-81275

    06/02/2023, 2:45 PM
    I tried this approach following the docs, but it didn't work. Does this S3 method require a Spark and Hadoop setup on my machine?
  • c

    cuddly-butcher-39945

    06/06/2023, 3:57 PM
    I am also having issues with a GMS deployment not finishing with helm. I am deploying helm chart release 2.161 onto EKS on Fargate, with AWS RDS/OpenSearch services and standalone Kafka on Fargate as well.
    gms-deploy-logs.zip
  • b

    bland-gigabyte-28270

    06/10/2023, 5:42 AM
    Same issue, can someone help?
  • b

    bland-gigabyte-28270

    06/12/2023, 1:00 AM
    We are encountering the same problem. Can someone help?
  • e

    elegant-article-21703

    06/13/2023, 8:28 AM
    Hi again, I've been playing with some combinations and I realised that:
    • If I don't apply a role to the new user I'm creating, I get an error on the login page when logging in/out.
    • When I apply a role to a new user, if that user belongs to a group, the privilege restrictions I applied are overridden by the role's privileges (Reader in this case).
    • If I remove the role from that user, it cannot access any of the assets (regardless of the policy applied to the user's group).
    Has anyone faced something similar? Thanks everyone in advance (more info on my context in the thread)
  • e

    elegant-salesmen-99143

    06/13/2023, 12:57 PM
    Hi Team, sorry for repeating, but it's been a few weeks since I started trying to get help with the Analytics tab issue, where the getHighlights and getAnalyticsChart queries return empty from the backend and Analytics doesn't work. I don't know what we can do in this situation and really need help from the Team, please🙏🙏🙏 The logs have been provided as requested, but no answer so far; I don't know if anyone saw them.
  • t

    thankful-morning-85093

    06/14/2023, 10:56 PM
    Hi All, I tried to upgrade our DataHub deployment from 0.8.45 to 0.10.4. I am still getting "Unauthorized" while the GMS pod does not throw any error.
  • e

    elegant-guitar-28442

    06/15/2023, 6:05 AM
    Thank you very much! I solved the problem following your hints. I will contribute a PR to fix this bug.
    thank you 1
  • a

    adorable-lawyer-88494

    06/16/2023, 7:51 AM
    FYI @best-umbrella-88325
  • i

    incalculable-portugal-45517

    06/19/2023, 5:04 PM
    bump 🙂
  • b

    bland-gigabyte-28270

    06/22/2023, 1:17 AM
    Resurfacing it here: I'm still having problems even with the max_threads fix. This is 0.10.3 using Snowflake. Config:
    Copy code
    source:
        type: snowflake
        config:
            account_id: <account-id>
            include_table_lineage: true
            include_view_lineage: true
            include_tables: true
            include_views: true
            profiling:
                enabled: true
                profile_table_level_only: true
            stateful_ingestion:
                enabled: true
            warehouse: DATAHUB_WH
            username: datahub_user
            role: DATAHUB_READER
            database_pattern:
                allow:
                    - PATTERN
            password: '${SNOWFLAKE_DATAHUB_USER_PASSWORD}'
    sink:
        type: datahub-rest
        config:
            server: '<http://datahub-datahub-gms:8080/>'
            max_threads: 1
    Logs:
    Copy code
    {
              "error": "Unable to emit metadata to DataHub GMS: javax.persistence.PersistenceException: Error when batch flush on sql: update metadata_aspect_v2 set metadata=?, createdOn=?, createdBy=?, createdFor=?, systemmetadata=? where urn=? and aspect=? and version=?",
              "info": {
                "exceptionClass": "com.linkedin.restli.server.RestLiServiceException",
                "message": "javax.persistence.PersistenceException: Error when batch flush on sql: update metadata_aspect_v2 set metadata=?, createdOn=?, createdBy=?, createdFor=?, systemmetadata=? where urn=? and aspect=? and version=?",
                "status": 500,
                "id": "urn:li:dataset:(urn:li:dataPlatform:snowflake,arene.aha.kfk_aha_feature,PROD)"
              }
            },
            {
              "error": "Unable to emit metadata to DataHub GMS: javax.persistence.PersistenceException: Error when batch flush on sql: update metadata_aspect_v2 set metadata=?, createdOn=?, createdBy=?, createdFor=?, systemmetadata=? where urn=? and aspect=? and version=?",
              "info": {
                "exceptionClass": "com.linkedin.restli.server.RestLiServiceException",
                "message": "javax.persistence.PersistenceException: Error when batch flush on sql: update metadata_aspect_v2 set metadata=?, createdOn=?, createdBy=?, createdFor=?, systemmetadata=? where urn=? and aspect=? and version=?",
                "status": 500,
                "id": "urn:li:dataset:(urn:li:dataPlatform:snowflake,arene.aha.kfk_aha_release,PROD)"
              }
            },
    plus1 2
  • g

    great-car-44033

    07/03/2023, 12:02 PM
    I too have the same issue that was reported by @salmon-exabyte-77928. Is there any plan to fix this in upcoming releases?
  • p

    proud-intern-59151

    07/11/2023, 6:31 AM
    Hi @hundreds-photographer-13496, thank you for your reply. I am just curious whether it is necessary to ingest the Athena dataset (in my case) into DataHub, given that I am only submitting Great Expectations' validation results into DataHub. Do I really need to ingest my entire data into DataHub first? I have followed the document linked below, and it doesn't mention the need to pre-populate the entire datasets into DataHub before submitting the respective metadata. https://datahubproject.io/docs/metadata-ingestion/integration_docs/great-expectations/ In my case, the logs say that my data source name (my_datasource) is not present in "platform_instance_map", which I don't quite understand.
    Datasource my_datasource is not present in platform_instance_map.
    🩺 1
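    For reference, a hedged sketch of the DataHubValidationAction block in a Great Expectations checkpoint, showing where platform_instance_map maps the GE datasource name to a DataHub platform instance; the server URL and instance name are placeholders:
    action:
        module_name: datahub.integrations.great_expectations.action
        class_name: DataHubValidationAction
        server_url: http://localhost:8080  # your GMS endpoint
        platform_instance_map:
            my_datasource: my_athena_instance  # placeholder platform instance name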
  • r

    rich-restaurant-61261

    07/11/2023, 6:44 AM
    Hi Team, I know this used to be blocked by awslabs/python-deequ#106, but I saw that deequ just got a new release, which should unblock this issue? I receive the following error when trying to ingest data from S3, and I am assuming we need a SPARK_VERSION environment variable to solve it? Supported values are: dict_keys(['3.3', '3.2', '3.1', '3.0', '2.4']) @gray-shoe-75895 @big-carpet-38439
    Copy code
    [2023-07-11 06:32:40,593] ERROR    {datahub.entrypoints:199} - Command failed: Failed to find a registered source for type s3: SPARK_VERSION environment variable is required. Supported values are: dict_keys(['3.3', '3.2', '3.1', '3.0', '2.4'])
    Traceback (most recent call last):
      File "/tmp/datahub/ingest/venv-s3-0.10.4/lib/python3.10/site-packages/pydeequ/configs.py", line 26, in _get_spark_version
        spark_version = os.environ["SPARK_VERSION"]
      File "/usr/local/lib/python3.10/os.py", line 680, in __getitem__
        raise KeyError(key) from None
    KeyError: 'SPARK_VERSION'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/tmp/datahub/ingest/venv-s3-0.10.4/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 120, in _add_init_error_context
        yield
      File "/tmp/datahub/ingest/venv-s3-0.10.4/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 220, in __init__
        source_class = source_registry.get(source_type)
      File "/tmp/datahub/ingest/venv-s3-0.10.4/lib/python3.10/site-packages/datahub/ingestion/api/registry.py", line 183, in get
        tp = self._ensure_not_lazy(key)
      File "/tmp/datahub/ingest/venv-s3-0.10.4/lib/python3.10/site-packages/datahub/ingestion/api/registry.py", line 127, in _ensure_not_lazy
        plugin_class = import_path(path)
      File "/tmp/datahub/ingest/venv-s3-0.10.4/lib/python3.10/site-packages/datahub/ingestion/api/registry.py", line 57, in import_path
        item = importlib.import_module(module_name)
      File "/usr/local/lib/python3.10/importlib/__init__.py", line 126, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
      File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
      File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
      File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
      File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
      File "<frozen importlib._bootstrap_external>", line 883, in exec_module
      File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
      File "/tmp/datahub/ingest/venv-s3-0.10.4/lib/python3.10/site-packages/datahub/ingestion/source/s3/__init__.py", line 1, in <module>
        from datahub.ingestion.source.s3.source import S3Source
      File "/tmp/datahub/ingest/venv-s3-0.10.4/lib/python3.10/site-packages/datahub/ingestion/source/s3/source.py", line 12, in <module>
        import pydeequ
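    Based on that traceback, pydeequ reads a SPARK_VERSION environment variable before the S3 source can even load. A minimal sketch, assuming CLI-based ingestion and a placeholder recipe file name:
    export SPARK_VERSION=3.3          # pick the value matching your Spark installation
    datahub ingest -c s3_recipe.yml   # placeholder recipe file name
    For UI-based ingestion the variable would have to be set in the environment of the executor/actions container instead.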
  • v

    victorious-monkey-86128

    07/11/2023, 4:47 PM
    Hi, here is also some more info from the build process:
    > Task :docker:kafka-setup:docker
    #12 ERROR: process "/bin/sh -c mkdir -p /opt   && mirror=$(curl --stderr /dev/null <https://www.apache.org/dyn/closer.cgi>\\?as_json\\=1 | jq -r '.preferred')   && curl -sSL \"${mirror}kafka/${KAFKA_VERSION}/kafka_${SCALA_VERSION}-${KAFKA_VERSION}.tgz\"   | tar -xzf - -C /opt   && mv /opt/kafka_${SCALA_VERSION}-${KAFKA_VERSION} /opt/kafka   && adduser -DH -s /sbin/nologin kafka   && chown -R kafka: /opt/kafka   && echo \"===> Installing python packages ...\"    && pip install --no-cache-dir jinja2 requests   && pip install --prefer-binary --prefix=/usr/local --upgrade \"${PYTHON_CONFLUENT_DOCKER_UTILS_INSTALL_SPEC}\"   && rm -rf /tmp/*   && apk del --purge .build-deps" did not complete successfully: exit code: 1
    ------
    > [stage-1  5/15] RUN mkdir -p /opt   && mirror=$(curl --stderr /dev/null <https://www.apache.org/dyn/closer.cgi?as_json=1> | jq -r '.preferred')   && curl -sSL "${mirror}kafka/3.4.0/kafka_2.13-3.4.0.tgz"   | tar -xzf - -C /opt   && mv /opt/kafka_2.13-3.4.0 /opt/kafka   && adduser -DH -s /sbin/nologin kafka   && chown -R kafka: /opt/kafka   && echo "===> Installing python packages ..."    && pip install --no-cache-dir jinja2 requests   && pip install --prefer-binary --prefix=/usr/local --upgrade "git+<https://github.com/confluentinc/confluent-docker-utils@v0.0.58>"   && rm -rf /tmp/*   && apk del --purge .build-deps:
    #12 1.144 tar: invalid magic
    #12 1.144 tar: short read
    ------                                                                                                                                                                                                                                                                       Dockerfile:31
    --------------------
    30 |     RUN apk add --no-cache -t .build-deps git curl ca-certificates jq gcc musl-dev libffi-dev zip
    31 | >>> RUN mkdir -p /opt \
    32 | >>>   && mirror=$(curl --stderr /dev/null <https://www.apache.org/dyn/closer.cgi>\?as_json\=1 | jq -r '.preferred') \
    33 | >>>   && curl -sSL "${mirror}kafka/${KAFKA_VERSION}/kafka_${SCALA_VERSION}-${KAFKA_VERSION}.tgz" \
    34 | >>>   | tar -xzf - -C /opt \
    35 | >>>   && mv /opt/kafka_${SCALA_VERSION}-${KAFKA_VERSION} /opt/kafka \
    36 | >>>   && adduser -DH -s /sbin/nologin kafka \
    37 | >>>   && chown -R kafka: /opt/kafka \
    38 | >>>   && echo "===> Installing python packages ..."  \
    39 | >>>   && pip install --no-cache-dir jinja2 requests \
    40 | >>>   && pip install --prefer-binary --prefix=/usr/local --upgrade "${PYTHON_CONFLUENT_DOCKER_UTILS_INSTALL_SPEC}" \
    41 | >>>   && rm -rf /tmp/* \
    42 | >>>   && apk del --purge .build-deps
    43 |
    --------------------
    ERROR: failed to solve: process "/bin/sh -c mkdir -p /opt   && mirror=$(curl --stderr /dev/null <https://www.apache.org/dyn/closer.cgi>\\?as_json\\=1 | jq -r '.preferred')   && curl -sSL \"${mirror}kafka/${KAFKA_VERSION}/kafka_${SCALA_VERSION}-${KAFKA_VERSION}.tgz\"   | tar -xzf - -C /opt   && mv /opt/kafka_${SCALA_VERSION}-${KAFKA_VERSION} /opt/kafka   && adduser -DH -s /sbin/nologin kafka   && chown -R kafka: /opt/kafka   && echo \"===> Installing python packages ...\"    && pip install --no-cache-dir jinja2 requests   && pip install --prefer-binary --prefix=/usr/local --upgrade \"${PYTHON_CONFLUENT_DOCKER_UTILS_INSTALL_SPEC}\"   && rm -rf /tmp/*   && apk del --purge .build-deps" did not complete successfully: exit code: 1
    > Task :docker:kafka-setup:docker FAILED
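    The "tar: invalid magic / short read" lines suggest whatever was piped into tar was not a valid .tgz (for example, a mirror error page). A hedged sketch for re-running the same download step outside the build, using the versions shown above:
    mirror=$(curl --stderr /dev/null "https://www.apache.org/dyn/closer.cgi?as_json=1" | jq -r '.preferred')
    echo "$mirror"
    curl -sSL "${mirror}kafka/3.4.0/kafka_2.13-3.4.0.tgz" -o /tmp/kafka.tgz
    file /tmp/kafka.tgz   # should report gzip compressed data, not HTML or ASCII text
    If the preferred mirror no longer carries that Kafka release, https://archive.apache.org/dist/kafka/ is the usual fallback location.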
  • b

    bitter-wire-42401

    07/12/2023, 12:00 PM
    I was able to resolve the issue by following the steps here: https://datahubspace.slack.com/archives/C029A3M079U/p1680557916230119?thread_ts=1680516752.955739&cid=C029A3M079U
    But now datahub docker ingest-sample-data does not work:
    ERROR {datahub.ingestion.run.pipeline:68} - failed to write record with workunit file
  • s

    some-crowd-4662

    07/14/2023, 7:10 PM
    Ingest Log
  • s

    some-crowd-4662

    07/17/2023, 3:18 AM
    @hundreds-photographer-13496 Hi, I turned on debug mode and then I saw the following error
  • b

    brave-engine-32813

    07/19/2023, 4:47 AM
    Hi everyone, is anyone facing issues connecting to SSL-enabled S3 or MinIO in DataHub UI ingestion? If you are connecting using the S3 delta lake source config, is the verify_ssl parameter working as expected? Thanks
  • n

    nutritious-bird-77396

    07/19/2023, 3:39 PM
    @delightful-ram-75848 Let me rephrase the question: I am able to ingest Redshift tables, schemas, and views, but for views the schema is not pulled. Is that currently supported in DataHub?
  • s

    some-crowd-4662

    07/19/2023, 6:52 PM
    Yes, I can hit this URL in the browser.
  • b

    bland-barista-59197

    07/25/2023, 7:17 PM
    Hi @delightful-ram-75848, is it possible to run this query: /q browsePaths: /datasets/prod/hive* ? I'm getting an error: 500 Server_error.
  • e

    eager-nest-72774

    08/02/2023, 4:39 PM
    @hundreds-photographer-13496 On the Kubernetes cluster I generated credentials using boto3 and passed them like this:
    s3_resource = boto3.resource('s3', aws_access_key_id=access_key, aws_secret_access_key=secret_key, aws_session_token=token)
    The credentials work in boto3, but when I pass the same credentials in the delta lake ingestion recipe, it does not work from the pod on the Kubernetes cluster.
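    For comparison, a hedged sketch of how those credentials are typically laid out in a delta-lake recipe; the base_path and region are placeholders, and field names can differ between DataHub versions, so check the delta-lake source docs:
    source:
        type: delta-lake
        config:
            base_path: "s3://my-bucket/path/to/delta-table/"   # placeholder
            s3:
                aws_config:
                    aws_access_key_id: "${AWS_ACCESS_KEY_ID}"
                    aws_secret_access_key: "${AWS_SECRET_ACCESS_KEY}"
                    aws_session_token: "${AWS_SESSION_TOKEN}"
                    aws_region: "us-east-1"                    # placeholder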
  • b

    bland-barista-59197

    08/03/2023, 4:10 PM
    Hi @delightful-ram-75848 any update?