# troubleshoot

    cuddly-butcher-39945

    07/28/2022, 7:35 PM
Hi Everyone! I am running the quickstart on an Amazon Linux 2 T3 Large instance.
• DataHub CLI version: 0.8.41.2
• Python version: 3.7.10 (default, Jun 3 2021, 00:02:01) [GCC 7.3.1 20180712 (Red Hat 7.3.1-13)]
I've confirmed my Snowflake credentials are correct, but I keep getting this error on a manual ingestion.
    Copy code
    'Failed to configure source (snowflake) due to pipeline_name must be provided if stateful ingestion is enabled.\n',
    I've never seen anything about a pipeline_name being needed. Please let me know if there is something you can think of to help me get my first ingestion to complete 🙂 More context around this error:
    Copy code
[2022-07-28 19:30:04,041] INFO     {datahub.cli.ingest_cli:99} - DataHub CLI version: 0.8.41
[2022-07-28 19:30:04,091] INFO     {datahub.ingestion.run.pipeline:160} - Sink configured successfully. DataHubRestEmitter: configured to talk to http://datahub-gms:8080
[2022-07-28 19:30:06,244] INFO     {datahub.ingestion.source_config.sql.snowflake:231} - using authenticator type 'DEFAULT_AUTHENTICATOR'
[2022-07-28 19:30:06,244] ERROR    {datahub.ingestion.run.pipeline:126} - pipeline_name must be provided if stateful ingestion is enabled.
[2022-07-28 19:30:06,244] INFO     {datahub.cli.ingest_cli:115} - Starting metadata ingestion
[2022-07-28 19:30:06,244] INFO     {datahub.cli.ingest_cli:133} - Finished metadata pipeline

Failed to configure source (snowflake) due to pipeline_name must be provided if stateful ingestion is enabled.
2022-07-28 19:30:08.102905 [exec_id=70cf693c-d13f-42a5-b0fd-04ca739e33b4] INFO: Failed to execute 'datahub ingest'
2022-07-28 19:30:08.103382 [exec_id=70cf693c-d13f-42a5-b0fd-04ca739e33b4] INFO: Caught exception EXECUTING task_id=70cf693c-d13f-42a5-b0fd-04ca739e33b4, name=RUN_INGEST, stacktrace=Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/default_executor.py", line 122, in execute_task
    self.event_loop.run_until_complete(task_future)
  File "/usr/local/lib/python3.9/site-packages/nest_asyncio.py", line 89, in run_until_complete
    return f.result()
  File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result
    raise self._exception
  File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step
    result = coro.send(None)
  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 114, in execute
    raise TaskError("Failed to execute 'datahub ingest'")
acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'

    faint-translator-23365

    07/28/2022, 8:51 PM
In DataHub v0.8.40 the frontend screen goes blank whenever I click on glossary terms. Can anyone help with this? Thanks!

    microscopic-mechanic-13766

    07/29/2022, 10:23 AM
Hello again, reposting my issue from a few days back to see if anyone could help me with it. Appreciate any kind of help! 🙂 Update: I have found that the methods in the class
CentralLogoutController.java
(datahub/datahub-frontend/app/controllers) are not used in any part of the code. I have also found that the default logout URL used is
"/"
. Shouldn't it be something like this for Keycloak?
Copy code
https://<keycloak_host>/auth/realms/<realm>/protocol/openid-connect/logout?redirect_uri=https://<datahub_host>/logout

    victorious-pager-14424

    07/29/2022, 1:10 PM
(issue solved) Hi everyone, I've deployed DataHub in k8s using a custom Helm chart (to follow my company standards), but it seems that the frontend is returning error 400 Bad Request for every request on the
api/v2/graphql
route. Any tips on how I can debug this issue? More info in 🧵
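For anyone debugging a similar 400 on that route, it can help to take the browser out of the loop and hit the frontend's GraphQL proxy directly; a sketch (host, port, and session cookie are placeholders):

```bash
# A 400 here usually means a malformed JSON body, a missing Content-Type
# header, or a rejected/missing session cookie.
curl -v -X POST 'http://<datahub-frontend-host>:9002/api/v2/graphql' \
  -H 'Content-Type: application/json' \
  -H 'Cookie: PLAY_SESSION=<session-cookie-from-browser>' \
  --data '{"query": "{ me { corpUser { username } } }"}'
```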

    wide-printer-24185

    07/29/2022, 1:52 PM
👋 Hello, team! I'm facing the following issue while starting DataHub on Docker:

# python3 -m datahub docker quickstart
Unable to run quickstart:
- Docker doesn't seem to be running. Did you start it?

# python3 -c "import platform; print(platform.platform())"
Linux-5.10.16.3-microsoft-standard-WSL2-x86_64-with-glibc2.31

# python3 -c "import sys; print(sys.version); print(sys.executable); import datahub; print(datahub.__file__); print(datahub.__version__);"
3.10.5 (main, Jul 12 2022, 11:32:11) [GCC 10.2.1 20210110]
/usr/local/bin/python3
/usr/local/lib/python3.10/site-packages/datahub/__init__.py
0.8.41.2

# python3 -c "import docker; print(f'{docker.from_env()}')"
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 398, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/local/lib/python3.10/http/client.py", line 1282, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/local/lib/python3.10/http/client.py", line 1328, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.10/http/client.py", line 1277, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.10/http/client.py", line 1037, in _send_output
    self.send(msg)
  File "/usr/local/lib/python3.10/http/client.py", line 975, in send
    self.connect()
  File "/usr/local/lib/python3.10/site-packages/docker/transport/unixconn.py", line 30, in connect
    sock.connect(self.unix_socket)
FileNotFoundError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/requests/adapters.py", line 489, in send
    resp = conn.urlopen(
  File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 787, in urlopen
    retries = retries.increment(
  File "/usr/local/lib/python3.10/site-packages/urllib3/util/retry.py", line 550, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.10/site-packages/urllib3/packages/six.py", line 769, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 398, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/local/lib/python3.10/http/client.py", line 1282, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/local/lib/python3.10/http/client.py", line 1328, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.10/http/client.py", line 1277, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.10/http/client.py", line 1037, in _send_output
    self.send(msg)
  File "/usr/local/lib/python3.10/http/client.py", line 975, in send
    self.connect()
  File "/usr/local/lib/python3.10/site-packages/docker/transport/unixconn.py", line 30, in connect
    sock.connect(self.unix_socket)
urllib3.exceptions.ProtocolError: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/docker/api/client.py", line 214, in _retrieve_server_version
    return self.version(api_version=False)["ApiVersion"]
  File "/usr/local/lib/python3.10/site-packages/docker/api/daemon.py", line 181, in version
    return self._result(self._get(url), json=True)
  File "/usr/local/lib/python3.10/site-packages/docker/utils/decorators.py", line 46, in inner
    return f(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/docker/api/client.py", line 237, in _get
    return self.get(url, **self._set_request_timeout(kwargs))
  File "/usr/local/lib/python3.10/site-packages/requests/sessions.py", line 600, in get
    return self.request("GET", url, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/requests/sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.10/site-packages/requests/sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/requests/adapters.py", line 547, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.10/site-packages/docker/client.py", line 96, in from_env
    return cls(
  File "/usr/local/lib/python3.10/site-packages/docker/client.py", line 45, in __init__
    self.api = APIClient(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/docker/api/client.py", line 197, in __init__
    self._version = self._retrieve_server_version()
  File "/usr/local/lib/python3.10/site-packages/docker/api/client.py", line 221, in _retrieve_server_version
    raise DockerException(
docker.errors.DockerException: Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

Please help me on this.

    red-vr-34382

    07/29/2022, 3:42 PM
Hi, I'm trying to ingest S3 and the job keeps failing (through the UI).

    delightful-barista-90363

    07/29/2022, 6:27 PM
Hey, I think I ran into a bug. I want to run profiling on the S3 data lake ingestion source, like
<s3://bucket_name/{table}/20220729/*.csv>
but it doesn't seem to be working; Spark gets initialized but that's about it. Thanks in advance. Actually, it doesn't look like profiling is run at all: Spark gets initialized but isn't used 🤔 More specifically, I'm getting this error:
Unable to infer schema for CSV. It must be specified manually.
In the debug logs, it only goes down to the
{table}
level when trying to open the file in Spark:
DEBUG:datahub.ingestion.source.s3.source:Opening file <s3://bucket/jordan-test/dataset_a> for profiling in spark
even though the file lives two folders down.
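For comparison, a sketch of the path template shape the s3 source expects (bucket and layout are placeholders, and the key names follow the s3 source config of this release, so worth double-checking):

```yaml
source:
  type: s3
  config:
    path_spec:
      # {table} names the dataset folder; the levels below it still need to be
      # spelled out with wildcards so profiling can open the actual files.
      include: "s3://bucket_name/{table}/*/*.csv"
    profiling:
      enabled: true
```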

    chilly-elephant-51826

    07/31/2022, 5:48 PM
#troubleshoot I have an ingestion set up in DataHub that populates owner details, though some of the owners were added using the UI. After a scheduled run, the users added from the UI were removed. Shouldn't DataHub preserve those changes and add only the delta? Plus, I found it weird that using an MCE we can add only one user to an entity, even with other ownership types.
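On the second point, the Ownership aspect is a list, so a single MCE can carry several owners at once; a minimal file-source sketch (urns and ownership types are placeholders):

```json
{
  "proposedSnapshot": {
    "com.linkedin.pegasus2avro.metadata.snapshot.DatasetSnapshot": {
      "urn": "urn:li:dataset:(urn:li:dataPlatform:hive,example.table,PROD)",
      "aspects": [
        {
          "com.linkedin.pegasus2avro.common.Ownership": {
            "owners": [
              { "owner": "urn:li:corpuser:alice", "type": "TECHNICAL_OWNER" },
              { "owner": "urn:li:corpuser:bob", "type": "BUSINESS_OWNER" }
            ],
            "lastModified": { "time": 0, "actor": "urn:li:corpuser:ingestion" }
          }
        }
      ]
    }
  }
}
```

The UI-owner removal follows from the same shape: a snapshot-style ingestion writes the whole owners list, replacing whatever the UI added.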

    square-solstice-69079

    08/01/2022, 6:51 AM
Hello, our DataHub instance went down during vacation, and I'm not sure how to debug further. It's run on an EC2 instance using quickstart. The main modification done is enabling OIDC authentication with a custom docker-compose file. I'm able to connect to the DataHub EC2 instance. The error is first 502 Bad Gateway, and after refreshing it's the red "Oops, an error occurred. This exception has been logged with id xyz." I reran the docker compose command, tried to disable OIDC (docker-compose -p datahub -f docker-compose.yml -f docker-compose.yml up -d datahub-frontend-react), and tried ingestion, but I'm getting Connection refused. Any tips on how to debug further? I'm on version 0.8.40.2. This is the docker-compose: https://pastebin.com/3xRCtURe
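To narrow down where the 502 comes from, checking container state and the frontend/GMS logs is the usual first step; a sketch (container names assume the quickstart compose project):

```bash
docker ps -a                                  # look for exited or restarting containers
docker logs datahub-frontend-react --tail 200
docker logs datahub-gms --tail 200
curl -s http://localhost:8080/health          # GMS health endpoint used by quickstart
```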

    helpful-painting-48754

    08/01/2022, 8:29 AM
Hello, I tried to test DataHub's GraphQL in Postman but I am having this issue. How should I go about it?
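For reference, a request shape that works against the GMS GraphQL endpoint, assuming metadata service authentication is enabled (host and token are placeholders; in Postman this maps to a POST with a raw JSON body and an Authorization header):

```bash
curl -X POST 'http://<datahub-gms-host>:8080/api/graphql' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <personal-access-token>' \
  --data '{"query": "{ me { corpUser { username } } }"}'
```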

    cold-autumn-7250

    07/31/2022, 5:46 PM
Hey guys, we are currently setting up Airflow with DataHub. This works really well, but it seems like macros are not resolved. Does anyone know how to do this? Example (result: see attachment):
    Copy code
    s3_sensor = S3KeySensor(
      task_id="s3_file_check",
      aws_conn_id="aws_prod",
      bucket_key=bronze_path,
      bucket_name=bronze_bucket,
      poke_interval=60,
      mode="reschedule",
      timeout=60 * 60 * 8,
      dag=dag,
      inlets=[Dataset("s3", "test/{{ ds }}")],
    )
    Reason is that I would like to connect the actual file on S3 with the Airflow run. Thanks a lot for any suggestion 🙂 PS: I am using the following versions: apache-airflow-providers-amazon==4.1.0 acryl-datahub-airflow-plugin==0.8.41.2

    handsome-football-66174

    07/29/2022, 8:54 PM
Hi Team, I'm trying to use GraphQL for searchAcrossEntities (https://datahubproject.io/docs/graphql/queries#searchacrossentities). How do we add filters? (This is what I came up with so far.)
    Copy code
{
  searchAcrossEntities(
    input: {start: 0, count: 20, query: "*", types: [CONTAINER], filters: ["subTypes", "Database"]}
  ) {
    searchResults {
      entity {
        urn
        type
      }
    }
  }
}
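In the schema, filters takes facet filter objects rather than bare strings. A sketch of the corrected query, assuming the FacetFilterInput shape (field/value) and that subtypes are indexed under the typeNames facet; both are worth verifying against this version's schema:

```graphql
{
  searchAcrossEntities(
    input: {
      start: 0
      count: 20
      query: "*"
      types: [CONTAINER]
      filters: [{ field: "typeNames", value: "Database" }]
    }
  ) {
    searchResults {
      entity {
        urn
        type
      }
    }
  }
}
```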

    big-zoo-81740

    08/01/2022, 9:00 PM
Hey all, not sure if this is the right place to be asking this, but I'm setting up LookML ingestion and keep getting an error saying that the directory is not found. I'm pointing the
base_folder
config option at the directory of my LookML repo/files,
/home/ubuntu/github/myreponame
, but I keep getting an error saying it can't find the directory or that it doesn't exist. The folder has r/w permissions, so DataHub should be able to read from it. Is there something I am obviously doing wrong? Does the repo need to be located in a specific folder in order for the
base_folder
config to be able to read from it?
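One common cause when the path itself looks right: recipes scheduled through the UI execute inside the acryl-datahub-actions container, so a host path like /home/ubuntu/github/myreponame has to exist in that container's filesystem (or the recipe should be run with the CLI on the host). A sketch of the recipe shape for reference (values are placeholders):

```yaml
source:
  type: lookml
  config:
    # Must be a path visible to whichever process actually runs the ingestion
    base_folder: /home/ubuntu/github/myreponame
    # github_info / connection settings elided
```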

    little-breakfast-38102

    08/02/2022, 4:05 AM
Hi team, I recently updated our acryl-datahub-actions service to use a newer version of the image and started facing a "CrashLoopBackOff" error. Here are the images with versions used in values.yaml, and below is the error message from the pod log:
2022/08/01 22:22:56 Waiting for: http://health
2022/08/01 22:22:56 Problem with request: Get "http://health": http: no Host in request URL. Sleeping 1s
values.yaml:
tag: "v0.8.40"
datahub-gms:
  image:
    repository: ${cg_image_repo}/linkedin/datahub-gms
datahub-frontend:
  image:
    repository: ${cg_image_repo}/linkedin/datahub-frontend-react
acryl-datahub-actions:
  image:
    repository: ${cg_image_repo}/acryldata/datahub-actions
datahub-mae-consumer:
  image:
    repository: ${cg_image_repo}/linkedin/datahub-mae-consumer
datahub-mce-consumer:
  image:
    repository: ${cg_image_repo}/linkedin/datahub-mce-consumer
datahub-ingestion-cron:
  image:
    repository: ${ecr_image_repo}/acryldata/datahub-ingestion:v0.8.40  # customized by adding additional drivers
elasticsearchSetupJob:
  image:
    repository: ${cg_image_repo}/linkedin/datahub-elasticsearch-setup
kafkaSetupJob:
  image:
    repository: ${cg_image_repo}/linkedin/datahub-kafka-setup
mysqlSetupJob:
  image:
    repository: ${cg_image_repo}/acryldata/datahub-mysql-setup
postgresqlSetupJob:
  image:
    repository: ${cg_image_repo}/acryldata/datahub-postgres-setup
datahubUpgrade:
  image:
    repository: ${cg_image_repo}/acryldata/datahub-upgrade
Appreciate any help.

    astonishing-guitar-79208

    08/01/2022, 1:31 PM
Hi! Is there a way to disable frontend authentication for the datahub-frontend container? We want to use a proxy in front of DataHub which does the authentication for the user.

    icy-portugal-26250

    07/25/2022, 12:08 PM
Hi, after updating to DataHub v0.8.40 in our production system (which is weird, as it works in the sandboxed dev), all users can authenticate via OIDC but all content appears to be inaccessible with an "Unauthorized" message. I tried to check the policies on
<datahub-url>/policies
only to get a
Copy code
Unauthorized to perform this action. Please contact your DataHub administrator. (code 403)
I wanted to log in as the datahub user, but the logout just redirects me to the homepage, and in the logs of the
datahub-frontend
pod there is the following error:
Copy code
13:45:34 [application-akka.actor.default-dispatcher-47862] ERROR auth.sso.oidc.OidcCallbackLogic - Unable to renew the session. The session store may not support this feature
I tried also adding myself to
user.props
but it has no effect. Are there other ways to add policies? How would I go about debugging this?
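For reference, user.props is plain username:password pairs, one per line, read by datahub-frontend's JAAS login; a minimal sketch with placeholder credentials (the file has to be mounted into the datahub-frontend pod, and the pod restarted, before it takes effect):

```
datahub:YourStrongPasswordHere
debug-admin:AnotherPasswordHere
```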

    silly-room-64336

    07/28/2022, 9:12 AM
Hi everyone, I am facing a problem with the PostgreSQL setup over SSL. I tried the below conf but it was not successful. values.yaml, sql section:
Copy code
sql:
    datasource:
      host: "xxx.privatelink.azure.com:5432"
      hostForpostgresqlClient: "xxx.privatelink.azure.com"
      port: "5432"
      url: "jdbc:postgresql://xxx.privatelink.azure.com:5432/datahub?verifyServerCertificate=false&useSSL=true&useUnicode=yes&characterEncoding=UTF-8"
      driver: "org.postgresql.Driver"
      username: "pgadmin"
      password:
        secretRef: mysql-secrets
        secretKey: password
The error that I am getting:
Copy code
2022/07/28 10:48:26 Waiting for: tcp://xxx.privatelink.postgres.database.azure.com:5432
2022/07/28 10:48:26 Connected to tcp://xxx.privatelink.postgres.database.azure.com:5432
psql: error: connection to server at "xxx.privatelink.postgres.database.azure.com", port 5432 failed: FATAL:  password authentication failed for user "datahub"
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
connection to server at "xxx.privatelink.postgres.database.azure.com", port 5432 failed: FATAL:  SSL connection is required. Please specify SSL options and retry.
I verified the pgadmin password is correct and I am able to log in with the same password in the pgAdmin 4 app. Please help here.
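Two details in that output stand out: psql is authenticating as user "datahub" rather than the configured "pgadmin", and the JDBC URL carries MySQL-style SSL flags (verifyServerCertificate/useSSL) while the PostgreSQL driver expects sslmode. A sketch of a pgjdbc-style url value (host and database are placeholders; note the setup job's psql connection is configured separately from this JDBC URL):

```yaml
url: "jdbc:postgresql://xxx.privatelink.azure.com:5432/datahub?ssl=true&sslmode=require"
```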

    nutritious-finland-99092

    07/26/2022, 2:19 PM
Hi guys, I'm having trouble restoring indices on my DataHub deployment on AWS ECS with Docker images. I'm following this doc but I'm struggling with the
--restore-indices
command on ECS. Doc: https://datahubproject.io/docs/how/restore-indices/ Can someone help me please?
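Outside of quickstart, that doc boils down to running the acryldata/datahub-upgrade image with the RestoreIndices task, which on ECS means a one-off task (or a task definition with overridden command arguments). A sketch in docker run terms (the env file is a placeholder you'd populate with the same DB/Kafka/Elasticsearch endpoints GMS uses):

```bash
docker run --rm \
  --env-file ./datahub-upgrade.env \
  acryldata/datahub-upgrade:<version> \
  -u RestoreIndices
```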

    adamant-van-21355

    07/13/2022, 7:51 AM
    Hi everyone 👋🏼 we are using Datahub on the latest version (v0.8.40) and currently ingesting data from Snowflake and DBT. We are having some issues using the
    stateful ingestion
    feature on DBT. Once we enable the stateful configuration we got the following stack-trace (in thread) with an assertion error, while metadata is ingested successfully. This is happening either on top of an old DBT ingestion config or on a new one after enabling the stateful ingestion with
    "remove_stale_metadata": True
    . I would appreciate any clues on how we can make this work properly so any stale metadata is removed on future ingestion runs 🙏
    Copy code
    ...
    "pipeline_name": "my-dbt-pipeline",
    
    ...
    
    "stateful_ingestion": {
        "enabled": True,
        "remove_stale_metadata": True
    },
    ...

    jolly-traffic-67085

    08/02/2022, 8:59 AM
Hello Team, I need some help with a GraphiQL query: how do I list all data platforms, and how do I list all databases (or the databases under a given platform)? Can GraphiQL query things like this? Please advise me on the query.
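A sketch of one way to get at both with searchAcrossEntities in GraphiQL (run the two queries separately; the DATA_PLATFORM entity type and the platform facet filter are assumptions to verify against your version's schema):

```graphql
# Query 1: list data platforms
{
  searchAcrossEntities(input: { query: "*", types: [DATA_PLATFORM], start: 0, count: 100 }) {
    searchResults { entity { urn type } }
  }
}

# Query 2 (run separately): database containers under one platform
# {
#   searchAcrossEntities(input: {
#     query: "*", types: [CONTAINER], start: 0, count: 100,
#     filters: [{ field: "platform", value: "urn:li:dataPlatform:snowflake" }]
#   }) {
#     searchResults { entity { urn type } }
#   }
# }
```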

    delightful-zebra-4875

    08/02/2022, 11:22 AM
After adding the data source, the table structure of the flink-hivecatalog is in the properties.

    delightful-zebra-4875

    08/02/2022, 11:23 AM
What should I do in this case?

    miniature-journalist-76345

    07/19/2022, 8:09 AM
Hello, team! I'm trying to run quickstart on the latest version of DataHub and getting an unhealthy datahub-gms container, and this time I can't fix it myself; it shows some NPEs in the logs. Can anybody help me please? The error is in the thread.

    ancient-apartment-23316

    07/28/2022, 5:55 PM
Hi, I successfully installed DataHub on AWS (managed services) + EKS using Helm. I can access DataHub on the k8s pod via
kubectl port-forward pod/datahub-datahub-frontend-67986b756-tsrv4 9002:9002
but there is no access from the internet. I have a k8s service external IP (ac5408daf1d594c16b937f80a18e0218-1264659697.us-east-1.elb.amazonaws.com) but this link doesn't work (port 9002). I also have a k8s ingress and it doesn't work either. Can you help me please? I've checked everything but can't find the root cause.
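A few standard checks that usually localize this (the Service name below is inferred from the pod name, so treat it as hypothetical):

```bash
kubectl get svc -o wide                          # is the frontend Service a LoadBalancer with an external hostname?
kubectl get endpoints datahub-datahub-frontend   # hypothetical name; empty endpoints mean a selector/port mismatch
kubectl describe ingress                         # ingress class, host rules, and backend service/port (9002)
```

On EKS it's also worth confirming that the ELB's security group allows inbound traffic on the listener port.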

    numerous-account-62719

    08/01/2022, 6:55 PM
Hi, can anyone please tell me how to enable the lineage feature in DataHub?

    numerous-account-62719

    08/02/2022, 6:43 AM
Hi, can anyone please tell me how to enable the lineage feature in DataHub? Please resolve this on priority.

    purple-soccer-81736

    08/02/2022, 4:00 PM
Hi team! Do you know if it's possible to change
METADATA_SERVICE_AUTH_ENABLED=true
after the installation of DataHub, or is the only way to reinstall?
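Since the flag is just an environment variable read at startup, it can normally be flipped without reinstalling: set it on both GMS and the frontend and restart those containers. A sketch in docker-compose terms (service names assume the quickstart compose file):

```yaml
datahub-gms:
  environment:
    - METADATA_SERVICE_AUTH_ENABLED=true
datahub-frontend-react:
  environment:
    - METADATA_SERVICE_AUTH_ENABLED=true
```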

    echoing-alligator-70530

    08/02/2022, 4:48 PM
Is there a way to delete by source? For example, I did a file-based lineage ingestion and there are a lot of entities; is there a quick way to delete that lineage?
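Two CLI routes that may fit, sketched with placeholder values (flags per the datahub CLI of this era; preview before deleting):

```bash
# Option 1: roll back everything a specific ingestion run produced
datahub ingest list-runs                    # find the run id of the file-based lineage run
datahub ingest rollback --run-id <run-id>

# Option 2: delete entities by filter (removes the entities, not just lineage edges)
datahub delete --entity_type dataset --platform <platform> --dry-run
```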

    faint-translator-23365

    08/02/2022, 7:12 PM
Hi, I want to use LDAP just for authentication, and while doing so I want to retrieve the user attributes (email, username, etc.) from the LDAP server. Which module should I use in my jaas.conf? I used com.sun.security.auth.module.LdapLoginModule and also org.eclipse.jetty.server.server.plus.jaas.spi.LdapLoginModule, but these modules don't have the option to retrieve those user attributes. Can anyone please help and share a sample configuration if possible? Thanks!
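For reference, datahub-frontend reads its JAAS config from the WHZ-Authentication entry; a sketch of an LDAP variant (the server URL and DN patterns are placeholders, and whether this module can surface extra attributes like email is exactly the open question here):

```
WHZ-Authentication {
  com.sun.security.auth.module.LdapLoginModule sufficient
    userProvider="ldap://ldap.example.com:389/ou=people,dc=example,dc=com"
    authIdentity="uid={USERNAME},ou=people,dc=example,dc=com"
    useSSL=false
    debug=true;
};
```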

    delightful-jelly-56633

    08/02/2022, 7:13 PM
Hello, I'm trying to run the quickstart and getting an error. Here are some system config details:
    Copy code
    docker compose version
    Docker Compose version v2.6.0
    
    python --version
    Python 3.9.9
    
    docker --version
    Docker version 20.10.17, build 100c701
    
     docker version
    Client: Docker Engine - Community
     Version:           20.10.17
    ...
    Server: Docker Engine - Community
     Engine:
      Version:          20.10.17
    
    lsb_release -a
    No LSB modules are available.
    Distributor ID:	Ubuntu
    Description:	Ubuntu 22.04 LTS
    Release:	22.04
    Codename:	jammy
    
    datahub --version
    acryl-datahub, version 0.8.41.2