# troubleshoot
  • brave-tomato-16287 (08/10/2022, 8:04 AM)

    Hello All. What role should a user have to add tags?
  • adamant-van-21355 (08/10/2022, 8:54 AM)

    Hi everyone 👋 We are trying to upgrade to the latest version v0.8.42. We use Helm charts for deployment, but I don't see the latest chart version 0.2.88 in the datahub-helm releases yet, and the application version shown there is still 0.8.41. Any updates on that? Also FYI, the new version is not reflected in the component images in values.yaml in the datahub-helm repo (they still show 0.8.41 for all of them instead of 0.8.42).
  • narrow-apple-60403 (08/10/2022, 9:17 AM)

    Hello, I'm trying to deploy DataHub via EKS. The deployment was successful, but I am having difficulties with the load balancer. I followed this guide (https://datahubproject.io/docs/deploy/aws#expose-endpoints-using-a-load-balancer) as-is and succeeded in deploying the aws-load-balancer-controller. However, the ELB that gets created is a Classic Load Balancer, not an ALB. I'm not sure what the problem is. I need help.
    kubectl get deployment -n kube-system aws-load-balancer-controller
    NAME                           READY   UP-TO-DATE   AVAILABLE   AGE
    aws-load-balancer-controller   2/2     2            2           26h
    kubectl get ingress
    NAME                       CLASS    HOSTS                               ADDRESS   PORTS   AGE
    datahub-datahub-frontend   <none>   datahub-eks.example.com                       80      39m
    values.yaml
    datahub-frontend:
      enabled: true
      image:
        repository: linkedin/datahub-frontend-react
        tag: "v0.8.41"
      # Set up ingress to expose react front-end
      ingress:
        enabled: true
        annotations:
          kubernetes.io/ingress.class: alb
          alb.ingress.kubernetes.io/scheme: internet-facing
          alb.ingress.kubernetes.io/target-type: instance
          alb.ingress.kubernetes.io/certificate-arn: <<certificate-arn>>
          alb.ingress.kubernetes.io/inbound-cidrs: 0.0.0.0/0
          alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS":443}]'
          alb.ingress.kubernetes.io/actions.ssl-redirect: '{"Type": "redirect", "RedirectConfig": { "Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"}}'
        hosts:
          - host: datahub-eks.example.com
            redirectPaths:
              - path: /*
                name: ssl-redirect
                port: use-annotation
            paths:
              - /*
  • careful-france-4125 (08/10/2022, 9:19 AM)

    Hi Team, we are trying to push monthly stats info into DataHub. We see that this is a value that must be precalculated and just passed in. Do any of you know if there is an option to customise this field name to something like "yearly stats"?
  • ambitious-cartoon-15344 (08/10/2022, 10:08 AM)

    Hi Team: the DataHub Airflow plug-in has a security bug. When Airflow uses RedshiftOperator, the plugin captures the SQL statements, but there may be a security problem in that SQL: it should be sanitized before being emitted, because if an UNLOAD statement is executed, it contains an aws_iam_role credential.
  • famous-florist-7218 (08/10/2022, 10:52 AM)

    Hi guys, my MongoDB ingestion job keeps failing with the error below. Is there any way to deal with this case? Any help is appreciated 🤗
    File "/tmp/datahub/ingest/venv-d4f145d7-4f51-4053-990f-4c541a5eaa57/lib/python3.9/site-packages/bson/__init__.py", line 1164, in _decode_all_selective
        return decode_all(data, codec_options)
    File "/tmp/datahub/ingest/venv-d4f145d7-4f51-4053-990f-4c541a5eaa57/lib/python3.9/site-packages/bson/__init__.py", line 1108, in decode_all
        return _decode_all(data, opts)  # type: ignore[arg-type]

    InvalidBSON: year 50577 is out of range
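    For reference, the error means some document stores a BSON datetime outside Python's representable range. A minimal sketch of one way to read such documents, assuming the ingestion venv can be bumped to PyMongo >= 4.3, which added DatetimeConversion (older drivers can only raise InvalidBSON here); the connection string and collection names are placeholders:

        from pymongo import MongoClient
        from bson.codec_options import DatetimeConversion

        # DATETIME_AUTO returns a bson DatetimeMS wrapper instead of raising
        # InvalidBSON when a stored date (e.g. year 50577) cannot be
        # represented as a Python datetime.
        client = MongoClient(
            "mongodb://localhost:27017",  # placeholder connection string
            datetime_conversion=DatetimeConversion.DATETIME_AUTO,
        )

        for doc in client["mydb"]["mycollection"].find():
            print(doc)  # out-of-range dates arrive as DatetimeMS values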
  • famous-florist-7218 (08/10/2022, 11:28 AM)

    In bigquery.py, I found that the base_query fails with "Syntax error: Missing whitespace between literal and alias" whenever profiling is enabled. It seems like {schema}.__TABLES__ should be wrapped in backticks (`{schema}.__TABLES__`) per BigQuery syntax rules. Should I raise this point as a bug? (A sketch of the proposed fix follows below.)
    @staticmethod
    def get_all_schema_tables_query(schema: str) -> str:
        base_query = (
            f"SELECT "
            f"table_id, "
            f"size_bytes, "
            f"last_modified_time, "
            f"row_count, "
            f"FROM {schema}.__TABLES__"
        )
        return base_query
    P/s: Please take a look at the attached screenshot for more detail.
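    A minimal sketch of the proposed change, assuming quoting is the only fix required (hypothetical, not a merged patch):

        @staticmethod
        def get_all_schema_tables_query(schema: str) -> str:
            # Quote the dataset reference with backticks so schema names that
            # start with a digit or contain special characters parse as one
            # identifier instead of a literal followed by an alias.
            return (
                "SELECT "
                "table_id, "
                "size_bytes, "
                "last_modified_time, "
                "row_count "
                f"FROM `{schema}.__TABLES__`"
            )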
  • brave-secretary-27487 (08/10/2022, 11:48 AM)

    Hey, has anybody tried to integrate this https://datahubproject.io/docs/lineage/airflow/#using-datahubs-airflow-lineage-plugin-new with Google-managed Airflow (Cloud Composer)? We just tried it, our scheduler broke, and we had to revert the changes. If anybody has any tips, that would be great.
  • purple-analyst-83660 (08/10/2022, 1:17 PM)

    Hi Team, I am trying to enable the metadata access token option in the DataHub UI. I included this piece in values.yaml and upgraded:
    global:
      datahub:
        metadata_service_authentication:
          enabled: true
    I still get "Token based authentication is currently disabled. Contact your DataHub administrator to enable this feature." DataHub GMS version is 0.8.41. Can anybody suggest what I might be doing wrong?
  • gentle-camera-33498 (08/10/2022, 2:22 PM)

    Hi! I'm getting some errors while browsing the frontend, but when I open the network tab there is no problem with the calls to the backend. Does anyone have an idea what it could be?
  • jolly-traffic-67085 (08/11/2022, 3:38 AM)

    Hi Team, I have a question: can I change or update the name of this container? [picture below] If it can be changed, please advise me on how to do it. Thanks.
  • limited-forest-73733 (08/11/2022, 11:05 AM)

    Hi team! While upgrading images to 0.8.43, the readiness and liveness probes are not working for mae-consumer and mce-consumer.
  • fierce-garage-74290 (08/11/2022, 2:11 PM)

    Mocking queries and stats for demo purposes: I am about to demo DataHub to some clients, including features like Queries and Stats, which are quite hard to produce in an appealing manner in an R&D environment. Is there any way I can ingest some dummy query and stats data on service startup via a script with API calls? Thanks for the tips!
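    One hedged way to do this with the Python emitter, assuming the acryl-datahub package is available; the Queries and Stats tabs are fed by usage aspects such as datasetUsageStatistics, and the URN, counts, and queries below are made-up demo values:

        import time

        from datahub.emitter.mcp import MetadataChangeProposalWrapper
        from datahub.emitter.rest_emitter import DatahubRestEmitter
        from datahub.metadata.schema_classes import (
            CalendarIntervalClass,
            ChangeTypeClass,
            DatasetUsageStatisticsClass,
            DatasetUserUsageCountsClass,
            TimeWindowSizeClass,
        )

        emitter = DatahubRestEmitter("http://localhost:8080")  # placeholder GMS URL

        dataset_urn = "urn:li:dataset:(urn:li:dataPlatform:hive,demo.table,PROD)"

        # One day's worth of fake usage: query count, top queries, and users.
        usage = DatasetUsageStatisticsClass(
            timestampMillis=int(time.time() * 1000),
            eventGranularity=TimeWindowSizeClass(unit=CalendarIntervalClass.DAY, multiple=1),
            totalSqlQueries=42,
            topSqlQueries=["SELECT * FROM demo.table LIMIT 10"],
            uniqueUserCount=3,
            userCounts=[DatasetUserUsageCountsClass(user="urn:li:corpuser:demo_user", count=42)],
        )

        emitter.emit(
            MetadataChangeProposalWrapper(
                entityType="dataset",
                changeType=ChangeTypeClass.UPSERT,
                entityUrn=dataset_urn,
                aspectName="datasetUsageStatistics",
                aspect=usage,
            )
        )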
  • handsome-football-66174 (08/10/2022, 5:26 PM)

    Hi Team, quick question: while trying to enable Glue profiling for a non-partitioned table, I am getting this error. Is profiling only enabled for partitioned tables (though in the code I do see it as available for unpartitioned ones, glue.py lines 837-841)?
    else:
        # ingest data profile without partition
        table_stats = response["Table"]["Parameters"]
        column_stats = response["Table"]["StorageDescriptor"]["Columns"]
        return [self._create_profile_mcp(mce, table_stats, column_stats)]
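    A hedged guess at what may be failing here: for a non-partitioned table, Glue can omit the "Parameters" or statistics keys entirely, so a defensive rewrite of the same branch (a sketch, not the shipped code) would look like:

        else:
            # ingest data profile without partition; fall back to empty
            # values when Glue returns no stored statistics for the table
            table_stats = response["Table"].get("Parameters", {})
            column_stats = (
                response["Table"].get("StorageDescriptor", {}).get("Columns", [])
            )
            return [self._create_profile_mcp(mce, table_stats, column_stats)]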
  • little-breakfast-38102 (08/12/2022, 5:52 AM)

    Hi Team, I am trying to ingest metadata from MSSQL Server. I was able to successfully customize the base image "acryldata/datahub-ingestion:v0.8.40", create a container locally, and extract metadata through the CLI. Question: I have the following in my Terraform script to use the customized image, which I pushed to ECR manually:

        datahub-ingestion-cron:
          enabled: true
          image:
            repository: ${customized_image}
            tag: ${image_version}

    However, I am unable to see a "datahub-ingestion-cron" component in my k8s workspace. I'd appreciate it if you can help.
  • big-ocean-9800 (08/12/2022, 5:53 AM)

    Cross-posting from #ingestion to see if anyone has any ideas about this one: https://datahubspace.slack.com/archives/CUMUWQU66/p1660251283230179
  • high-gigabyte-86638 (08/12/2022, 8:44 AM)

    Hello, I am trying to install a DataHub quickstart instance on a Linux VM (Ubuntu). When running the command "datahub docker quickstart" I get an error message saying that Docker is not started. I think it's because it can't find the Docker Compose v1 plugin, only the latest version. Can someone help me?
  • purple-analyst-83660 (08/12/2022, 11:49 AM)

    Hi Team, I recently enabled metadata access tokens on my frontend and GMS. Now I can't seem to ingest my Tableau metadata into DataHub; it throws an authentication error. As far as I understand, I probably need to supply a metadata token for the ingestion. But is there a way I can keep metadata authentication token generation (I use it to make delete requests) and not have to authenticate when I ingest data?

        failed to write record with workunit container-urn:li:container:2380f5d80f130b57eeeee5016d8c18b7-to-urn:li:chart:(tableau,885b1266-5bd9-4338-745e-c480bc7f0168) with ('Unable to emit metadata to DataHub GMS', {'message': '401 Client Error: Unauthorized for url:
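    For context, once metadata service authentication is enabled, every REST emission has to carry a token; there is no per-source opt-out that we know of. A minimal sketch of supplying one from Python, assuming the acryl-datahub package (the URL and token are placeholders; in a recipe the same token goes under the datahub-rest sink config):

        from datahub.emitter.rest_emitter import DatahubRestEmitter

        emitter = DatahubRestEmitter(
            gms_server="http://localhost:8080",  # placeholder GMS URL
            token="<personal-access-token>",     # generated in the UI once tokens are enabled
        )
        emitter.test_connection()  # should raise if the token is missing or invalid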
  • great-motherboard-71467 (08/12/2022, 1:31 PM)

    Hi Team, I'm trying to integrate authentication for the frontend UI with an LDAP server. I have the following config in jaas.conf:

        WHZ-Authentication {
          com.sun.security.auth.module.LdapLoginModule sufficient
          userProvider="ldaps://ldaps.some.server.eu/dc=some,dc=domain,dc=com"
          authIdentity="{USERNAME}@some.domain.com"
          userFilter="uid={USERNAME},cn=users,cn=accounts,dc=some,dc=domain,dc=com"
          java.naming.security.authentication="simple"
          debug="true"
          useSSL="true";
        };

    Whatever I change inside this config (for example, setting the port to :636), I end up with the following error:

        datahub-frontend-react    |             [LdapLoginModule] authentication-first mode; SSL enabled
        datahub-frontend-react    |             [LdapLoginModule] user provider: ldaps://ldaps.some.server.eu/cn=users,cn=accounts,dc=some,dc=domain,dc=com
        datahub-frontend-react    | 13:06:46 [application-akka.actor.default-dispatcher-2] ERROR application - The submitted callback is of type: class javax.security.auth.callback.NameCallback : javax.security.auth.callback.NameCallback@332d2227
        datahub-frontend-react    | 13:06:46 [application-akka.actor.default-dispatcher-2] ERROR application - The submitted callback is of type: class javax.security.auth.callback.PasswordCallback : javax.security.auth.callback.PasswordCallback@7dbea6b9
        datahub-frontend-react    |             [LdapLoginModule] attempting to authenticate user: some_test_user
        datahub-frontend-react    |             [LdapLoginModule] authentication failed
        datahub-frontend-react    |             [LdapLoginModule] aborted authentication

    It does not work no matter whether I change authIdentity to only {USERNAME}, provide it with the domain name, or use the DC form. Alternatively, when I try to connect with a technical user by providing

        java.naming.security.principal=
        java.naming.security.credential=

    then the Dummy Module authenticates everything. When I run the following ldapsearch from my CLI, I am able to get info from LDAP about the specific user:

        ldapsearch -H ldaps://ldaps.some.server.eu -x -b dc=some,dc=domain,dc=com '(&(objectClass=person)(uid=some_test_user))'

    Any hint on what could be wrong?
  • busy-petabyte-37287 (08/12/2022, 2:11 PM)

    Hi Team, I am following this guide to deploy DataHub to AWS: https://datahubproject.io/docs/deploy/aws After executing this command:

        kubectl get deployment -n kube-system aws-load-balancer-controller

    I get the following output:
  • busy-petabyte-37287 (08/12/2022, 2:13 PM)

    However, when executing the command:

        helm upgrade --install datahub datahub/datahub --values values.yaml

    I get no address:
  • busy-petabyte-37287 (08/12/2022, 2:14 PM)

    I have repeated the full process two times, but the result is the same. Do any of you know what could be wrong? Thank you so much.
  • ambitious-cartoon-15344 (08/14/2022, 3:20 PM)

    If anyone knows why, please help me out, thanks.
  • bitter-insurance-49151 (08/15/2022, 9:54 AM)

    #jetty.port Help me please: when running ./gradlew :metadata-service:war:run, I'd like to change the port from 8080 to 9002.
  • echoing-alligator-70530 (08/15/2022, 1:03 PM)

    Hi Everyone, I am having some trouble viewing lineage on DataHub. It seems that I am not seeing any downstream lineage on a dataset; I see only upstream, and it says there are hidden downstream dependencies but doesn't show them. This is with dbt ingestion: it shows the upstream to a Snowflake dataset, but that dataset has many other downstreams which we were able to expand before; now there is an error and it doesn't show them.
  • brief-cat-57352 (08/15/2022, 1:19 PM)

    Hi Team, I'm having issues with data emission for our Airflow DAGs with schedule interval None or 1:00:00. Other DAGs are working fine. Any idea on this? Thanks. Please see the error stacktrace. cc: @hallowed-intern-9191
    File "/usr/local/lib/python3.7/site-packages/***/serialization/serialized_objects.py", line 830, in serialize_dag
        serialize_dag["tasks"] = [cls._serialize(task) for _, task in dag.task_dict.items()]
      File "/usr/local/lib/python3.7/site-packages/***/serialization/serialized_objects.py", line 830, in <listcomp>
        serialize_dag["tasks"] = [cls._serialize(task) for _, task in dag.task_dict.items()]
      File "/usr/local/lib/python3.7/site-packages/***/serialization/serialized_objects.py", line 308, in _serialize
        return SerializedBaseOperator.serialize_operator(var)
      File "/usr/local/lib/python3.7/site-packages/***/serialization/serialized_objects.py", line 578, in serialize_operator
        serialize_op['params'] = cls._serialize_params_dict(op.params)
      File "/usr/local/lib/python3.7/site-packages/***/serialization/serialized_objects.py", line 451, in _serialize_params_dict
        if f'{v.__module__}.{v.__class__.__name__}' == '***.models.param.Param':
    AttributeError: 'int' object has no attribute '__module__'
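    For what it's worth, the stacktrace dies in Airflow's _serialize_params_dict when a DAG-level param value is a plain int. A hedged workaround sketch, assuming Airflow 2.2+ and that the affected DAGs pass scalar params (the DAG below is made up):

        from datetime import datetime, timedelta

        from airflow import DAG
        from airflow.models.param import Param

        # Wrapping plain scalars in Param gives the serializer an object with
        # __module__/__class__, avoiding the AttributeError on int.
        with DAG(
            dag_id="example_dag",
            start_date=datetime(2022, 8, 1),
            schedule_interval=timedelta(hours=1),
            params={"batch_size": Param(100, type="integer")},
        ) as dag:
            ...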
  • thankful-morning-85093 (08/15/2022, 4:00 PM)

    Hi Team, I am trying to integrate a PySpark notebook with DataHub. I am using Spark on k8s under the hood. It works when I use spark-submit, but not when I use notebooks. I am getting this error:

        22/08/13 20:54:48 ERROR DatahubSparkListener: java.lang.NullPointerException

    Even though I get this error, the command completes execution without pushing data to DataHub. Using Spark 3.3.0. Any help appreciated. Thanks.
  • bitter-lizard-32293 (08/16/2022, 3:27 AM)

    Hi folks, we've been trying to enable the analytics support in the DataHub UI and seem to be getting stuck hitting 500s due to DataFetchingExceptions. GMS logs indicate the index is not found:

        Caused by: org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=index_not_found_exception, reason=no such index [datahub_usage_event]]

    Details in 🧵
  • few-carpenter-93837 (08/16/2022, 8:41 AM)

    Hey all, could someone help me understand how to go about solving https://github.com/datahub-project/datahub/issues/5295? For the Vertica source, in the case of timestamp(x) or timestamptz(x), the following warning is raised:

        'warnings': {'XSCHEMA.YTABLE': ['unable to get column information due to an error -> __init__() got an unexpected keyword ' "argument 'precision'"]}

    This issue comes from the following code in source/sql/vertica.py:
    elif attype in ("timestamptz", "timetz"):
        kwargs["timezone"] = True
        if charlen:
            kwargs["precision"] = int(charlen)  # type: ignore
        args = ()  # type: ignore
    elif attype in ("timestamp", "time"):
        kwargs["timezone"] = False
        if charlen:
            kwargs["precision"] = int(charlen)  # type: ignore
    This leads to the point that TIMESTAMP, imported via

        from sqlalchemy.sql.sqltypes import TIME, TIMESTAMP, String

    is missing the precision parameter: https://docs.sqlalchemy.org/en/14/core/type_basics.html#sqlalchemy.types.DateTime One solution would be to add

        self.precision = precision

    into

        class TIMESTAMP(DateTime):

    in \Lib\site-packages\sqlalchemy\sql\sqltypes.py, which results in the correct output:
    {
      "fieldPath": "DATE_CREATED",
      "jsonPath": null,
      "nullable": true,
      "description": null,
      "created": null,
      "lastModified": null,
      "type": {
        "type": {
          "com.linkedin.pegasus2avro.schema.TimeType": {}
        }
      },
      "nativeDataType": "TIMESTAMP(precision=0)",
      "recursive": false,
      "globalTags": null,
      "glossaryTerms": null,
      "isPartOfKey": false,
      "isPartitioningKey": null,
      "jsonProps": null
    }
    Since the fix is in the sqlalchemy files, how can this be implemented in the DataHub code instead? (A possible approach is sketched below.)
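    One hedged option: rather than patching the installed sqlalchemy, define a small TIMESTAMP subclass inside the Vertica source module and use it when reflecting the column type. A sketch under that assumption (the class name is made up):

        from typing import Optional

        from sqlalchemy.sql.sqltypes import TIMESTAMP


        class VerticaTimestamp(TIMESTAMP):
            """Accepts the precision kwarg that the Vertica reflection code
            passes, without touching the installed sqlalchemy package."""

            def __init__(self, timezone: bool = False, precision: Optional[int] = None):
                super().__init__(timezone=timezone)
                self.precision = precision

    The reflection code above would then construct VerticaTimestamp instead of the stock TIMESTAMP, which should render nativeDataType as TIMESTAMP(precision=0) as in the output shown.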
  • numerous-account-62719 (08/16/2022, 9:41 AM)

    Hi Team, I am facing one issue. I recently upgraded to version 0.8.41. The total dataset count is showing as 3.6k, but as you can see I have 3.8k in Oracle. Why is there this mismatch?