# troubleshoot
  • q

    quick-pizza-8906

    01/10/2023, 12:37 AM
    Hello, I have operational questions about DataHub's inner workings, especially with respect to the Kafka topics used internally. My team and I are currently running DataHub in production with all Kafka topics required by the project (https://datahubproject.io/docs/how/kafka-config/#topic-configuration) configured with a single partition. We were extremely early adopters of DataHub, and in the beginning we were worried about the ordering of events if topics were partitioned, which of course comes at a price in performance. Therefore we have the questions below:
    1. Can we use partitions for all (or some) of the topics used by DataHub? Is there any experience as to what numbers should be used here?
    2. How many parallel consumers should be used, depending on the partition count of the topics? Does it make sense to have dedicated consumers for separate topics (metadata change log vs. metadata change proposal), or is it better (performance-wise) to have each consumer consume all of them?
    3. We run a highly available service that both pulls metadata and receives metadata via pushes, so we cannot stop metadata ingestion. If we wanted to recreate the topics with a higher partition count, we planned to create a new topic, change the deployment of the consumers/GMS to point at the new topic and roll them out one at a time (to keep HA), then drain the old topic into the new one (to pick up metadata received during the transition period). In this case older events may be processed after newer ones. Will they be treated properly (discarded) and not override newer data?
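    On the ordering concern in question 1: Kafka only guarantees order within a partition, so per-entity ordering survives partitioning as long as all events for one entity share a message key. The sketch below illustrates that idea only. It assumes events are keyed by entity URN (not confirmed in this thread), uses crc32 as a stand-in for Kafka's real key hash, and the URNs and helper names are hypothetical.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a message key to a partition (crc32 stands in for Kafka's hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign(events, num_partitions):
    """Group (key, payload) events by partition, preserving arrival order."""
    partitions = defaultdict(list)
    for key, payload in events:
        partitions[partition_for(key, num_partitions)].append((key, payload))
    return partitions


# Hypothetical events: three updates to dataset "a" interleaved with one to "b".
events = [
    ("urn:li:dataset:a", "v1"),
    ("urn:li:dataset:b", "v1"),
    ("urn:li:dataset:a", "v2"),
    ("urn:li:dataset:a", "v3"),
]
partitions = assign(events, num_partitions=4)

# All updates for "a" sit in one partition, still in their original order.
a_partition = partition_for("urn:li:dataset:a", 4)
a_versions = [p for k, p in partitions[a_partition] if k == "urn:li:dataset:a"]
print(a_versions)
```

    Note that this says nothing about the drain scenario in question 3, where events are copied between topics after the fact.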
  • q

    quick-pizza-8906

    01/10/2023, 1:33 AM
    Another question, independent of the above: we observe an exponential rise in per-batch processing time when running the reindexing job. With 1.6 mln aspects, the initial batches are processed very quickly:
    metrics so far RestoreIndicesResult(ignored=0, rowsMigrated=1000, timeSqlQueryMs=2, timeGetRowMs=0, timeUrnMs=6, timeEntityRegistryCheckMs=0, aspectCheckMs=0, createRecordMs=92, sendMessageMs=854)
    Successfully sent MAEs for 1000/1651106 rows (0.06% of total). 0 rows ignored (0.00% of total)
    0.02 mins taken. 33.25 est. mins to completion. Total mins est. = 33.27.
    The time then increases with each batch, reaching 20 mins per batch at the end:
    Reading rows 1650000 through 1651000 from the aspects table completed.
    Successfully sent MAEs for 1649942/1651106 rows (99.93% of total). 58 rows ignored (0.00% of total)
    2113.96 mins taken. 1.49 est. mins to completion. Total mins est. = 2115.45.
    metrics so far RestoreIndicesResult(ignored=58, rowsMigrated=1650942, timeSqlQueryMs=40, timeGetRowMs=0, timeUrnMs=1821, timeEntityRegistryCheckMs=376, aspectCheckMs=296, createRecordMs=31830, sendMessageMs=80009)
    Args are RestoreIndicesArgs(start=1651000, batchSize=1000, numThreads=1, batchDelayMs=100, aspectName=null, urn=null, urnLike=null)
    Successfully sent MAEs for 1650942/1651106 rows (99.99% of total). 58 rows ignored (0.00% of total)
    Reading rows 1651000 through 1652000 from the aspects table started.
    2117.09 mins taken. 0.21 est. mins to completion. Total mins est. = 2117.30.
    Has anybody experienced similar problems with the restore job? Can we improve performance somehow?
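    One way to read the logs above is to add up the timing counters and compare them with the wall-clock time. A rough sketch, assuming the counters in RestoreIndicesResult are cumulative (as "metrics so far" suggests, though this is not confirmed here); the helper names are made up for illustration:

```python
import re

# Last metrics line from the log above, copied verbatim.
LINE = (
    "metrics so far RestoreIndicesResult(ignored=58, rowsMigrated=1650942, "
    "timeSqlQueryMs=40, timeGetRowMs=0, timeUrnMs=1821, "
    "timeEntityRegistryCheckMs=376, aspectCheckMs=296, "
    "createRecordMs=31830, sendMessageMs=80009)"
)


def parse_metrics(line: str) -> dict:
    """Extract name=value integer pairs from a RestoreIndicesResult log line."""
    return {name: int(value) for name, value in re.findall(r"(\w+)=(\d+)", line)}


def instrumented_ms(metrics: dict) -> int:
    """Sum only the timing counters (names ending in 'Ms')."""
    return sum(v for k, v in metrics.items() if k.endswith("Ms"))


metrics = parse_metrics(LINE)
accounted_min = instrumented_ms(metrics) / 60_000
wall_clock_min = 2117.09  # "mins taken" from the last log line above
print(f"instrumented: {accounted_min:.2f} min of {wall_clock_min:.2f} min wall clock")
```

    If the counters really are cumulative, the instrumented steps cover only a small part of the run, which would point at time spent outside them. If they are per-batch instead, this comparison does not hold.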
  • a

    average-dinner-25106

    01/10/2023, 2:29 AM
    Hi, how can I reset column descriptions that were edited? I removed the ingestion of the target database and re-ingested it, but the table and column descriptions stayed the same: the edited descriptions still appear. I want the column and table descriptions to revert to the ones written in the original database.
  • b

    best-wire-59738

    01/10/2023, 8:18 AM
    Hello Team, I am working on a custom authenticator plugin. It throws the issue below every time I try to deploy the plugin. I have implemented the Authenticator interface in my class A3Authenticator. Could you please have a look and let me know what is causing the error?
  • l

    late-book-30206

    01/10/2023, 10:36 AM
    Hello, my team and I are trying to deploy DataHub in our production environment with k8s, but we are getting an error with Elasticsearch:
    2023/01/10 10:19:06 Problem with request: Get http://elasticsearch:9200: dial tcp: lookup elasticsearch on 10.3.0.10:53: no such host. Sleeping 1s
    2023/01/10 10:19:06 Timeout after 2m0s waiting on dependencies to become available: [http://elasticsearch:9200]
    It seems Elasticsearch is not reachable, or its hostname is not being resolved. For information: prerequisites 0.0.9, datahub 0.2.89. Does anyone know what the problem could be?
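    The "no such host" part of the log is a DNS lookup failure, which can be checked separately from Elasticsearch itself. A minimal sketch of such a check; the can_resolve helper is hypothetical, and the resolver argument exists only so the lookup can be swapped out:

```python
import socket


def can_resolve(host: str, port: int = 9200, resolver=socket.getaddrinfo) -> bool:
    """Return True if the hostname resolves, False on a DNS lookup failure."""
    try:
        resolver(host, port)
        return True
    except socket.gaierror:
        # Same class of failure as "no such host" in the log above.
        return False
```

    Running can_resolve("elasticsearch") from a pod in the same namespace should show whether the name in the error is one the cluster DNS knows about. If it returns False, the service may simply be exposed under a different name than the one DataHub was configured with (an assumption, not something confirmed in this thread).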
  • a

    acceptable-terabyte-34789

    01/10/2023, 12:26 PM
    Hello, how can I export all the data from the tool? With a GET request I can only access a single element (by URN).
  • f

    faint-actor-78390

    01/10/2023, 2:57 PM
    Hi all, I'm trying to run the ./run_upgrade.sh script and get the error trace below. What are the prerequisites for running this upgrade? TIA and Happy New Year 🙂 Bruno
    head: Pulling from acryldata/datahub-upgrade
    Digest: sha256:2691ab0cdf49be771b38e472a1ae61a130194b97b856430ccbc0a170190537c0
    Status: Image is up to date for acryldata/datahub-upgrade:head
    docker.io/acryldata/datahub-upgrade:head
    ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console...
    =========|_|==============|___/=/_/_/_/
    :: Spring Boot :: (v2.5.12)
    2023-01-10 14:50:04.598 INFO 1 --- [ main] io.ebean.EbeanVersion : ebean version: 11.33.3
    2023-01-10 14:50:04.619 INFO 1 --- [ main] io.ebean.config.properties.LoadContext : loaded properties from [application.yml]
    Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
    2023-01-10 14:50:04.707 INFO 1 --- [ main] io.ebean.datasource.pool.ConnectionPool : DataSourcePool [gmsEbeanServiceConfig] autoCommit[false] transIsolation[READ_COMMITTED] min[2] max[50]
    2023-01-10 14:50:09.999 INFO 1 --- [ main] i.e.d.pool.PooledConnectionQueue : Reseting DataSourcePool [gmsEbeanServiceConfig] min[2] max[50] free[0] busy[0] waiting[0] highWaterMark[0] waitCount[0] hitCount[0]
    2023-01-10 14:50:09.999 INFO 1 --- [ main] i.e.d.pool.PooledConnectionQueue : Busy Connections:
    2023-01-10 14:50:10.000 ERROR 1 --- [ main] c.l.g.factory.entity.EbeanServerFactory : Failed to connect to the server. Is it up?
    ERROR SpringApplication Application run failed
    org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'upgradeCli': Unsatisfied dependency expressed through field 'noCodeUpgrade'; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'ebeanServer' defined in class path resource [com/linkedin/gms/factory/entity/EbeanServerFactory.class]: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [io.ebean.EbeanServer]: Factory method 'createServer' threw exception; nested exception is java.lang.NullPointerException
        at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor$AutowiredFieldElement.resolveFieldValue(AutowiredAnnotationBeanPostProcessor.java:659)
  • l

    late-shampoo-30424

    01/10/2023, 5:54 PM
    I have a question/issue: when attempting to post my Great Expectations results to my local instance of DataHub, I get this error:
    Datasource mssql_datasource is not present in platform_instance_map
    ('Unable to emit metadata to DataHub GMS', {'message': '401 Client Error: Unauthorized for url: http://localhost:9002/openapi/entities/v1/aspects?action=ingestProposal'})
    Any idea on what is going on? I’m a bit lost
  • k

    kind-policeman-5342

    01/10/2023, 7:47 PM
    Hey guys, I’m facing an issue in the web interface and wanted to know if anyone has a clue. In the top menu, when I click on the profile icon, it should show DataHub’s tag, but it shows “null”. Is there a parameter that should have been set during deployment for this to work? Where does this value come from? Does anyone know where in the GitHub repository I can find the code that creates this component? Thanks!!
  • s

    salmon-jackal-36326

    01/10/2023, 10:53 PM
    @here
  • s

    salmon-jackal-36326

    01/10/2023, 10:54 PM
    Has anyone had experience restoring big SQL files? I'm trying, but I only get: ERROR {datahub.cli.docker_cli:353} - Failed to run MySQL restore
  • s

    salmon-jackal-36326

    01/10/2023, 11:35 PM
    Guys, I loaded my backup.sql into an empty database via sqlcmd because the file was too big to restore with the datahub CLI:
    python3 -m datahub docker quickstart --restore --restore-file ../datahub/docker/quickstart/backup_2023_01_10.sql
    Despite the data already being in the database, I still don't see anything in the UI. What should I do? When I use the command above, I get: [2023-01-10 23:36:38,311] ERROR {datahub.cli.docker_cli:353} - Failed to run MySQL restore
  • a

    average-dinner-25106

    01/11/2023, 7:50 AM
    Hello, I want to embed an image stored on my local machine in a glossary term's Markdown documentation. However, as shown in the figure, the DataHub UI can't find the image even though the path seems correct. I have no idea what to do.
  • m

    melodic-dress-7431

    01/11/2023, 11:24 AM
    Hi Team, I have compiled and created a custom image for datahub_frontend, but I am not able to see any charts in the Analytics tab (I have added datasets, though).
  • m

    microscopic-mechanic-13766

    01/11/2023, 3:25 PM
    Hello, I am having a problem: I cannot see the validations for a certain PostgreSQL table. This morning I was able to send validations for another table, but now I am unable to do so, although I am following the same procedure. I am not seeing any error logs and, as shown in the image, the validation succeeds. Any idea why this could be?
  • f

    freezing-father-76422

    01/11/2023, 4:54 PM
    Hey! I have a problem with backup restoration. The steps are:
    • install DataHub via the official chart
    • add some datasets, users, tags
    • back up the contents of MySQL (the metadata_aspect_v2 table)
    • drop the chart completely and reinstall it
    • restore the MySQL data
    After that, I expect the datasets and users to reappear in the UI, but the UI remains completely empty.
  • f

    faint-painting-38451

    01/11/2023, 6:11 PM
    Hi, we recently had an issue where many lingering indices for one of our custom entities weren't getting removed. I tracked down where these indices should have been removed to ESIndexBuilder.java > buildIndex(). One thing I noticed there is that if it fails to delete the old index for whatever reason, that old index will be stuck and will need to be deleted manually, since this function only deletes the active index after setting up the new one. I was thinking of adding a block at the end of the function that would remove any indices that follow the index name pattern and don't have an alias, to prevent indices being left over. Does that seem like a good solution that would be wanted upstream, or is there already something in place to clean up the indices that we would need to add our custom entity to?
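    The cleanup rule proposed above (same name pattern, no alias) could be sketched roughly as follows. This is Python rather than the Java of ESIndexBuilder, and it only selects candidates without deleting anything. It assumes timestamped index names of the form <indexName>_<digits> and an input shaped like the response of Elasticsearch's GET /_alias; the index names and the orphaned_indices helper are hypothetical.

```python
import re


def orphaned_indices(alias_response: dict, index_name: str) -> list:
    """Return indices named '<index_name>_<digits>' that have no alias."""
    pattern = re.compile(rf"^{re.escape(index_name)}_\d+$")
    return sorted(
        name
        for name, info in alias_response.items()
        if pattern.match(name) and not info.get("aliases")
    )


# Hypothetical GET /_alias response: one stale index, one live, one unrelated.
response = {
    "myentityindex_v2_1673400000000": {"aliases": {}},
    "myentityindex_v2_1673450000000": {"aliases": {"myentityindex_v2": {}}},
    "otherindex_v2_1673400000000": {"aliases": {}},
}
print(orphaned_indices(response, "myentityindex_v2"))
```

    Restricting the match to the exact index name keeps the rule from touching other entities' indices that happen to have no alias.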
  • f

    freezing-father-76422

    01/11/2023, 7:28 PM
    Second question - Is there an easy way to integrate Datahub with LDAP? I found only this issue so far and it didn't help
  • r

    ripe-electrician-13049

    01/12/2023, 1:44 AM
    Hi all. We want to use DataHub to parse some SQL commands using Spark 3.0, but I find that the latest code uses Spark 2.0. When will the project support Spark 3.0?
  • r

    red-waitress-53338

    01/12/2023, 2:25 AM
    Hi! Is there a way to use Elasticsearch AUTH KEY instead of ELASTICSEARCH_USERNAME and ELASTICSEARCH_PASSWORD in the GMS service?
  • m

    magnificent-lock-58916

    01/12/2023, 7:22 AM
    Tableau ingestion imports data successfully, but the folder hierarchy is messed up. Some folders are in place, while others are missing completely along with their contents. Some folders are in the wrong place, contain things they should not contain, or have been merged because they share a name despite originally being two different folders in different places.
    What could be the reason, and what is the solution? I saw someone mention that if objects in Tableau have the same names, it can mess up the structure. Is this the reason? If it is, are there any plans to address this issue? Trying to have unique names across thousands of workbooks, charts, datasources, folders and so on seems almost impossible given the collaborative nature of our Tableau server.
    P.S. We’re running an older version of DataHub, so this might not be the case in newer versions. That’s what I want to know: our devops team would like to know whether it’s worth updating to fix this issue, or whether it will stay the same no matter the version.
  • d

    dazzling-shampoo-79134

    01/12/2023, 8:03 AM
    Hi all, I've been getting the following exception(s) after logging in, causing failures when viewing metadata in general. Can anyone help? Thanks!
  • l

    lemon-cat-72045

    01/12/2023, 9:11 AM
    Hi all, I am seeing this error on the homepage. Does anyone know what's causing this problem? I am running 0.9.3. Thanks!
    Validation error (UnknownType) : Unknown type 'BatchGetStepStatesInput' (code undefined)
  • s

    steep-fountain-54482

    01/12/2023, 9:43 AM
    Hello, I'm trying to get all data platforms by running this query, but it returns zero results.
  • s

    steep-fountain-54482

    01/12/2023, 9:44 AM
    query {
      search(input: {
        type: DATA_PLATFORM,
        query: "*",
        count: 100,
        start: 0
      }) {
        total
        searchResults {
          entity {
            urn
          }
        }
      }
    }
  • s

    steep-fountain-54482

    01/12/2023, 9:44 AM
    i can see 2 platforms in the UI (athena, s3)
  • s

    steep-fountain-54482

    01/12/2023, 9:44 AM
    any idea what's wrong on this?
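    For reproducing the query above outside the UI, a small sketch using only the Python standard library. It assumes the GraphQL endpoint is served at /api/graphql and accepts a personal access token as a Bearer header; base_url and token are placeholders. It does not explain the empty result, it only makes the request easy to rerun.

```python
import json
import urllib.request

# The query from the message above, unchanged apart from whitespace.
QUERY = """
query {
  search(input: {type: DATA_PLATFORM, query: "*", count: 100, start: 0}) {
    total
    searchResults { entity { urn } }
  }
}
"""


def build_request(base_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for the GraphQL endpoint."""
    body = json.dumps({"query": QUERY}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/graphql",  # assumed endpoint path
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="POST",
    )


def run(base_url: str, token: str) -> dict:
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(base_url, token)) as resp:
        return json.load(resp)
```

    Comparing the "total" field for DATA_PLATFORM with the same query for another entity type (for example DATASET) would show whether the problem is specific to this type.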
  • c

    chilly-postman-99763

    01/12/2023, 2:49 PM
    Hello!
    Hello,
    
    I'm trying to install bigquery plugin in composer,
  • m

    modern-garden-35830

    01/12/2023, 3:01 PM
    Hi All, datahub ingest returns the error "CSVEnricherConfig should_overwrite extra fields not permitted". Why is that?
    source:
        type: "csv-enricher"
        config:
            filename: "/datahub/datahub.csv"
            should_overwrite: false
            delimiter: ","
            array_delimiter: "|"
    
    sink:
        type: "datahub-rest"
        config:
            server: "http://serveraddress.com"
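    "Extra fields not permitted" is the kind of message a strict config model reports when a recipe contains a key it does not define. A minimal sketch of that check, assuming (not confirmed in this thread) that the installed version's csv-enricher config accepts only filename, write_semantics, delimiter and array_delimiter. ALLOWED_FIELDS and extra_fields are hypothetical names, not DataHub API, and the field list should be checked against the source docs for the installed version.

```python
# Assumed set of accepted keys; verify against your installed DataHub version.
ALLOWED_FIELDS = {"filename", "write_semantics", "delimiter", "array_delimiter"}


def extra_fields(config: dict, allowed=ALLOWED_FIELDS) -> list:
    """Return config keys that the (assumed) schema does not define."""
    return sorted(set(config) - set(allowed))


# The source config from the recipe above.
config = {
    "filename": "/datahub/datahub.csv",
    "should_overwrite": False,
    "delimiter": ",",
    "array_delimiter": "|",
}
print(extra_fields(config))
```

    Under that assumption the check flags should_overwrite, which matches the error in the message.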