# troubleshoot
  • mysterious-portugal-30527 (03/02/2022, 9:34 PM)
    Hmmm. I have a Postgres ingestion where I am toggling profiling to true, and when I do, ingestion fails with
    permission denied for table someschema.blahblahblah
    …but the user connecting to the instance can query said tables, validated via the Postgres CLI. It fails on every table even though the user has SELECT on all tables via role membership. Help! Ingestion script follows:
    source:
        type: postgres
        config:
            host_port: 'somepostgreshost:port'
            database: someuserdatabase
            username: avaliduser
            password: '${userpassword}'
            schema_pattern:
                allow:
                    - public
            include_tables: true
            include_views: true
            profiling:
                enabled: true
    sink:
        type: datahub-rest
        config:
            server: 'http://datahub-gms:8080'
    Error text snippet:
    Profiling exception (psycopg2.errors.InsufficientPrivilege) permission denied for table
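    For comparison, a quick way to reproduce the profiler's access pattern outside DataHub is to run an aggregate query through psycopg2 with the exact credentials from the recipe, since the profiler issues aggregates rather than plain SELECTs (host, port, and table name below are placeholders):

    import psycopg2

    # Connect exactly as the ingestion recipe does (placeholder values).
    conn = psycopg2.connect(
        host="somepostgreshost",
        port=5432,
        dbname="someuserdatabase",
        user="avaliduser",
        password="...",
    )
    with conn.cursor() as cur:
        # The profiler runs aggregate queries, so test one of those too,
        # not just a plain SELECT.
        cur.execute("SELECT count(*) FROM someschema.blahblahblah")
        print(cur.fetchone())
    conn.close()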
  • adamant-kilobyte-90981 (03/02/2022, 10:19 PM)
    Hey everyone. I have started up DataHub 0.8.27 both with the Docker QuickStart and on Minikube, and with both I am getting an error when executing a Snowflake ingestion source. Recipe and logs are in the reply thread.
  • cool-painting-92220 (03/03/2022, 1:49 AM)
    Hey everyone! I have a scenario at hand and wanted to figure out the recommended practice for ensuring security when using DataHub: a teammate and I both have access to the base admin account (user datahub). My Snowflake credentials are currently stored in a secure recipe file and used for our ingestion jobs. We recently updated our platform to the new UI instance, where ingestion jobs can be scheduled and managed from an admin account's UI. We would prefer to start managing our recipe/ingestion jobs through the UI, but I want to preserve the secrecy of my Snowflake credentials in the recipe. I saw that a new Secrets feature has been added that could be related to what I'm looking for, but I'm not too familiar with its limitations/capabilities. Is there a best practice for this situation?
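    For reference, UI-managed ingestion stores credentials as named Secrets that a recipe references as ${SECRET_NAME} instead of a literal password. A hedged sketch of creating one through the GraphQL API (the createSecret input shape, endpoint path, and token handling are assumptions to verify against your version):

    import requests

    # Assumption: the createSecret mutation exposed by DataHub's GraphQL API.
    mutation = """
    mutation {
      createSecret(input: { name: "SNOWFLAKE_PASSWORD", value: "super-secret" })
    }
    """
    resp = requests.post(
        "http://datahub-gms:8080/api/graphql",               # placeholder server
        json={"query": mutation},
        headers={"Authorization": "Bearer <access-token>"},  # placeholder token
    )
    print(resp.json())
    # A UI recipe can then reference it as: password: '${SNOWFLAKE_PASSWORD}'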
  • red-zebra-92204 (03/03/2022, 2:47 AM)
    Hi, when calling DataHubGraph.get_aspect(aspect="dataJobInfo"), I'm getting this error:
    avro.schema.AvroException: ('Datum union type not in schema: %s', None)
    The problem lies in the type field, which should be either AzkabanJobTypeClass or string, but it has the value {"string": "SPARK"}. I don't know how to resolve this. Is it related to the deprecation of the AzkabanJobType class? Code to reproduce the error:
    from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig
    from datahub.metadata.schema_classes import DataJobInfoClass

    graph = DataHubGraph(config=DatahubClientConfig(
        server='https://demo.datahubproject.io/api/gms',
        extra_headers={'cookie': f'{cookie}'}
    ))

    job_info = graph.get_aspect(
        entity_urn='urn:li:dataJob:(urn:li:dataFlow:(spark,orders_cleanup_flow,PROD),orders_dedupe_job)',
        aspect='dataJobInfo',
        aspect_type=DataJobInfoClass,
    )
    GitHub issue: https://github.com/linkedin/datahub/issues/4289
  • adorable-flower-19656 (03/03/2022, 3:01 AM)
    Hello DataHub, I'd like to connect my DataHub instance to my company's OIDC, based on Microsoft ADFS. The claims seem to have no abnormalities, and I can see a newly created user URN in the DataHub MySQL database after trying to log in, but I cannot see the DataHub main page. Could you help me solve this problem?
  • red-napkin-59945 (03/03/2022, 6:26 AM)
    Hey team, I am trying to introduce a new BrowsableEntityType and noticed that other existing types have a variable called FACET_FIELDS, like in DashboardType:
    private static final Set<String> FACET_FIELDS = ImmutableSet.of("access", "tool");
  • square-solstice-69079 (03/03/2022, 8:29 AM)
    Hello, any idea why this error is showing? I've seen it before as well. It happens if I delete a source and then ingest it again with some changes, like filters on schemas, or changing from the test database to the prod database.
  • adamant-furniture-37835 (03/03/2022, 8:35 AM)
    Hi, after upgrading to 0.8.27 we are facing strange issues with Users & Groups. We get the following error when we try to add users to a group: "Failed to add group members!: Unauthorized to perform this action. Please contact your DataHub administrator." The default policy 'All Users - All Platform Privileges' is already activated. Has anyone else faced similar problems after upgrading to the latest version?
  • average-vr-64604 (03/03/2022, 11:41 AM)
    Hello, trying to deploy DataHub in a protected network segment. I'll set DATAHUB_TELEMETRY_ENABLED to false, but any call to the datahub CLI fails with this error:
    mixpanel.MixpanelException: HTTPSConnectionPool(host='api.mixpanel.com', port=443): Max retries exceeded with url: /engage (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fafa0073b80>, 'Connection to corp-proxy timed out. (connect timeout=10)'))
    Are there any other options to disable telemetry collection?
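    For what it's worth, one way to confirm the variable actually reaches the CLI process is to set it explicitly for a single invocation (whether the CLI honors it per-invocation is an assumption to verify):

    import os
    import subprocess

    # Run the CLI once with telemetry explicitly disabled in its environment.
    env = dict(os.environ, DATAHUB_TELEMETRY_ENABLED="false")
    subprocess.run(["datahub", "version"], env=env, check=True)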
  • adventurous-dream-16099 (03/03/2022, 5:30 PM)
    Hello! We are seeing issues with policies overall since upgrading to the latest version. Even though we have policies created and connected to groups, users in those groups cannot perform basic actions (e.g. adding owners to datasets (a metadata policy) or adding users to groups (a platform policy)). Even when activating 'All Users - All Platform Policies' we receive the error message below. However, when we use GraphQL through postgres and add the header 'X-DataHub-Actor' as datahub, we are able to add users programmatically. Is there any option to roll back to version 0.8.26, or to bypass the GraphQL workaround?
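    For anyone needing the same stopgap, a hedged sketch of that GraphQL workaround (the addGroupMembers mutation name and input shape are assumptions to check against your schema; URNs and server are placeholders):

    import requests

    mutation = """
    mutation {
      addGroupMembers(input: {
        groupUrn: "urn:li:corpGroup:data-eng",
        userUrns: ["urn:li:corpuser:jdoe"]
      })
    }
    """
    resp = requests.post(
        "http://datahub-gms:8080/api/graphql",  # placeholder server
        json={"query": mutation},
        # The workaround from the message above: act as the root user.
        headers={"X-DataHub-Actor": "urn:li:corpuser:datahub"},
    )
    print(resp.json())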
  • numerous-camera-74294 (03/03/2022, 5:52 PM)
    hi! I have been messing around with the Spark listener and I can't get the Spark application to finish: it hangs every single time and onApplicationEnd never gets called. Through some trial and error I have been able to track down the issue, and I think it is related to the emitter and the thread pool not being properly stopped and shut down. If I do so on the SparkListenerSQLExecutionEnd event, everything goes well, but that is not an option because of the many other problems it entails. Any clue?
  • mysterious-portugal-30527 (03/03/2022, 6:23 PM)
    Making progress. I saw an error profiling a table with a column of JSON data type (Postgres 13). Is this a known issue?
    'HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.\n'
    The associated SQL statement:
    (SELECT count(*) AS element_count, sum(CASE WHEN (offer_set IN (NULL) OR offer_set IS NULL) THEN %(param_11)s ELSE %(param_12)s END) AS null_count
    Also, this statement seems a bit nonsensical: checking for offer_set IN (NULL) or offer_set IS NULL? Both of these tests do the same thing, right? Wouldn't we want to pick one test? Why do both? Am I missing something?
  • billowy-rocket-47022 (03/03/2022, 9:58 PM)
    I tried ingesting one of my MySQL databases and it loaded; I can see the table. How do I see the table contents/rows?
  • red-napkin-59945 (03/03/2022, 10:58 PM)
    Hey Team, I found that the Dashboard entity has an aspect, domains, which indicates one dashboard could belong to multiple domains. However, in the GraphQL schema definition, the Dashboard entity only has one domain. Is this on purpose?
  • mysterious-portugal-30527 (03/03/2022, 11:46 PM)
    Sigh.
    datahub docker check
    The following issues were detected:
    - datahub-gms is running but not healthy
    What would be the next steps?
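    One concrete next step is to hit the GMS health endpoint directly, the same /health path the other containers probe, and then read the container logs (quickstart default port assumed):

    import requests

    # The quickstart maps datahub-gms to localhost:8080.
    resp = requests.get("http://localhost:8080/health", timeout=10)
    print(resp.status_code, resp.text)  # anything but 200 explains "not healthy"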
  • red-napkin-59945 (03/04/2022, 1:15 AM)
    Hey Team, I got the following error, which is related to GraphQL.
    {errors=[The object type 'DataDoc' [@3117:1] does not have a field 'relationships' required via interface 'Entity' [@297:1], There is no type resolver defined for interface / union 'DataDocCell' type]}
    Any suggestions about how to fix it?
  • most-pillow-90882 (03/04/2022, 5:52 AM)
    Hello! New to DataHub. Got the quickstart running no problem, but am having problems with setup for local development, following the instructions here: https://datahubproject.io/docs/docker/development. The Gradle build works, but when I try to run docker/dev.sh I get the following repeated errors:
    datahub-gms        | 2022/03/04 05:44:18 Problem with dial: dial tcp: lookup broker on 127.0.0.11:53: server misbehaving. Sleeping 1s
    datahub-actions_1  | 2022/03/04 05:44:18 Problem with request: Get "http://datahub-gms:8080/health": dial tcp 172.20.0.10:8080: connect: connection refused. Sleeping 1s
  • able-rain-74449 (03/04/2022, 8:54 AM)
    Hi all, has anyone come across this error before?
  • salmon-area-51650 (03/04/2022, 9:38 AM)
    Hi Team! 👋 JSON columns are not compatible with the SQL Profile for PostgreSQL. Is there a way to skip all JSON columns, or do I need to include all columns one by one in profile_pattern? Thanks!!
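    For illustration, a hedged sketch of running the recipe programmatically with a deny rule (whether profile_pattern matches column-level names in this version is an assumption to verify; hosts and names are placeholders):

    from datahub.ingestion.run.pipeline import Pipeline

    # Same recipe expressed as a Python dict; the deny rule is the part under test.
    pipeline = Pipeline.create({
        "source": {
            "type": "postgres",
            "config": {
                "host_port": "somepostgreshost:5432",  # placeholder
                "database": "somedb",                  # placeholder
                "username": "user",
                "password": "pass",
                "profiling": {"enabled": True},
                # Assumption: deny entries may target fully-qualified column names.
                "profile_pattern": {"deny": [".*\\.json_column$"]},
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://datahub-gms:8080"},
        },
    })
    pipeline.run()
    pipeline.raise_from_status()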
  • few-air-56117 (03/04/2022, 1:45 PM)
    Hi guys, I reinstalled DataHub but I got this error on datahub-acryl:
    2022-03-04 15:44:28.049 EETInvalidURL: Failed to parse: http://${GMS_HOST:-localhost}:${GMS_PORT:-8080}/config
    😞
  • damp-greece-27806 (03/04/2022, 2:01 PM)
    Hi, I'm trying to ingest metadata from Redshift. I have a recipe with the appropriate credentials and a proper sink setup, but I keep getting a 401 from the GMS service.
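    A 401 from GMS usually means metadata service authentication is rejecting the request rather than anything Redshift-side; a hedged sketch for checking the sink connection with a token (server and token are placeholders, and whether your deployment requires a token is an assumption):

    from datahub.emitter.rest_emitter import DatahubRestEmitter

    # Same endpoint the datahub-rest sink uses, plus an auth token.
    emitter = DatahubRestEmitter(
        gms_server="http://datahub-gms:8080",  # placeholder
        token="<personal-access-token>",       # placeholder
    )
    emitter.test_connection()  # raises if the server rejects the credentials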
  • some-pizza-26257 (03/04/2022, 6:52 PM)
    Hi all, I want to ingest from a custom data source through the RestEmitter. I am trying to add DatasetProperties to the new dataset, but it's not clear how to set the properties. Can anyone help?
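    A minimal sketch of emitting a datasetProperties aspect for a custom source with the Python REST emitter (platform, dataset name, and server are placeholders):

    from datahub.emitter.mce_builder import make_dataset_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import ChangeTypeClass, DatasetPropertiesClass

    emitter = DatahubRestEmitter(gms_server="http://datahub-gms:8080")  # placeholder

    properties = DatasetPropertiesClass(
        description="Orders table from our custom source",
        customProperties={"owner_team": "data-eng"},
    )
    mcp = MetadataChangeProposalWrapper(
        entityType="dataset",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn=make_dataset_urn(platform="mycustomsource", name="db.orders", env="PROD"),
        aspectName="datasetProperties",
        aspect=properties,
    )
    emitter.emit_mcp(mcp)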
  • elegant-traffic-96321 (03/04/2022, 9:58 PM)
    Hey all, we're currently running into some issues with our analytics tab. We're running the graph implementation using AWS Elasticsearch. Here's the error message we've been getting:
    {
      "error": {
        "root_cause": [
          {
            "type": "illegal_argument_exception",
            "reason": "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
          }
        ],
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "phase": "query",
        "grouped": true,
        "failed_shards": [
          {
            "shard": 0,
            "index": "datahub_usage_event",
            "node": "3d1IH_U4T1OXbZJkCbNWtw",
            "reason": {
              "type": "illegal_argument_exception",
              "reason": "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
            }
          }
        ],
        "caused_by": {
          "type": "illegal_argument_exception",
          "reason": "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory.",
          "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
          }
        }
      },
      "status": 400
    }
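    The reason string points at browserId being indexed as text instead of keyword in datahub_usage_event; a quick way to confirm is to inspect the index mapping directly (Elasticsearch host is a placeholder):

    import requests

    # Fetch the field mappings for the usage-event index.
    resp = requests.get("http://elasticsearch:9200/datahub_usage_event/_mapping")
    for index, body in resp.json().items():
        browser_id = body["mappings"]["properties"].get("browserId")
        print(index, browser_id)  # aggregations need a keyword type here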
  • acoustic-wolf-70583 (03/05/2022, 1:27 AM)
    Hi, which Python package versions are compatible for pip install 'acryl-datahub[datahub-rest,datahub-kafka]' when using DataHub v0.8.26 for the frontend and GMS containers (images linkedin/datahub-frontend-react:v0.8.26 and linkedin/datahub-gms:v0.8.26)?
  • mysterious-portugal-30527 (03/07/2022, 6:44 PM)
    I had to reboot the host where DataHub is running. After the reboot I ran datahub docker nuke --keep-data and then datahub docker quickstart. After everything restarts, the ingestion page indicates that an ingestion which had been running before my reboot is still running. I believe this is incorrect; I think it is a state artifact left over from before the restart, right?
  • melodic-helmet-78607 (03/08/2022, 2:09 AM)
    Hello, I have a recurring problem where I have to reload the frontend in my browser repeatedly to avoid a login error. Any ideas where to start troubleshooting? I'm using LdapLoginModule to connect to ldap.jumpcloud.com. How do I know whether it is a cookie problem or an LDAP problem?
    02:05:17 [application-akka.actor.default-dispatcher-16] ERROR controllers.AuthenticationController - Authentication error
    javax.naming.AuthenticationException: javax.security.auth.login.FailedLoginException: Cannot connect to LDAP server
  • adorable-tomato-97942 (03/08/2022, 6:51 AM)
    Hi all, I hit a problem logging in to DataHub after I enabled SSO. Here is the error message in the browser console: <h1>Bad Message 431</h1><pre>reason: Request Header Fields Too Large</pre> Could anyone help me?
  • sparse-account-96468 (03/08/2022, 10:34 AM)
    Not sure if this is the right channel (apologies, I'm new), but I wanted to ask about the preference for a local private key path vs. a serialized private key for use against Snowflake. I can see the env variable for a private key path (and the code in DataHub that then serializes it for the Snowflake connection), but I haven't found support for passing an already-serialized private key?
  • full-dentist-68591 (03/08/2022, 10:46 AM)
    Hi all, I am having trouble setting up policies. E.g. I create a policy granting MANAGE_USERS_AND_GROUPS, but when the applied user or group tries to add or remove other users from a group, an error occurs: "Failed to remove group member!: Unauthorized to perform this action. Please contact your DataHub administrator." Any hunch what could be wrong here?
  • numerous-camera-74294 (03/08/2022, 3:15 PM)
    hi all, I am getting a
    ValueError: com.linkedin.pegasus2avro.schema.Schemaless contains extra fields: {'com.linkedin.schema.MySqlDDL'}
    when using DataHubGraph.get_aspect like:
    from datahub.metadata.schema_classes import SchemaMetadataClass

    current_schema_metadata = graph.get_aspect(
        entity_urn=dataset_urn,
        aspect="schemaMetadata",
        aspect_type=SchemaMetadataClass,
    )