# all-things-deployment

    brave-businessperson-3969

    01/28/2022, 4:18 PM
    A short question: is it correct that there is a hard limit of 10,000 on the number of results you can get from a search query? I've run into strange errors when sending search requests via the GraphQL API programmatically ("code 500 SERVER_ERROR, classification "DataFetchingException"") as soon as I set the "start" parameter in GraphQL search queries to values >= 10,000.

    agreeable-plastic-37919

    01/28/2022, 8:31 PM
    I was following https://github.com/linkedin/datahub/blob/master/metadata-models-custom/README.md - when I run ../gradlew install like the readme states, it appears to build and install locally. I'd like the newly created aspect pushed to my gms running in k8s. I checked and my datahub cli is configured to point to the remote server, so I'm not sure why the install doesn't push the artifact to the server. The /config endpoint shows only noCode: true. Am I understanding this example properly? Shouldn't it add new aspects at runtime for existing entities?

    prehistoric-room-17640

    02/01/2022, 1:41 PM
    Curious from the community here: I've deployed datahub into Kubernetes and wrote ingestion recipes for Redshift, Kafka, and S3, but what I want is to dynamically search our AWS systems for data-impacting systems (MySQL, Postgres, Kafka, Elasticsearch, etc.). Do most implementations create YAML files manually, save them to Git, and then run updates through Airflow, or is there a typical implementation pattern I can follow for AWS or GCP?

    agreeable-plastic-37919

    02/01/2022, 5:11 PM
    I was building custom artifacts, but how can I import and use existing linkedin objects in my model? Running ~/projects/linkedin/metadata-models-custom$ ../gradlew -Prest.model.compatibility=ignore -PprojVersion=0.0.14 install I see an error:
    "CorpuserUrn" or "com.linkedin.common.CorpuserUrn" cannot be resolved. 20,20: Type not found: CorpuserUrn
    (the same error repeats at 26,18; 32,25; 38,22)
    at com.linkedin.data.schema.generator.AbstractGenerator.parseSources(AbstractGenerator.java:157)
    at com.linkedin.data.avro.generator.AvroSchemaGenerator.generate(AvroSchemaGenerator.java:191)
    at com.linkedin.data.avro.generator.AvroSchemaGenerator.run(AvroSchemaGenerator.java:165)
    at com.linkedin.data.avro.generator.AvroSchemaGenerator.main(AvroSchemaGenerator.java:123)

    square-machine-96318

    02/03/2022, 1:03 AM
    I am about to use datahub on an in-house Kubernetes / AWS EKS cluster. I want to reset a wrong initial setting, but the official documentation seems to cover nuke only in the Docker environment. Do you have any documents to refer to in this regard?

    square-machine-96318

    02/03/2022, 2:38 AM
    I want to un-ingest a psql file (I don't know if 'uningest' is the right word; I just want to roll back to before I ingested some psql file). For example, I already ingested two PostgreSQL files but I want to un-ingest one of them. How can I do it?
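If the files were ingested through the datahub CLI, each ingestion run gets a run id that can be rolled back as a unit. A command sketch, assuming a reasonably recent acryl-datahub CLI (check `datahub ingest --help` on your version; `<run-id>` is a placeholder):

```shell
# List recent ingestion runs and note the run id of the ingestion to undo
datahub ingest list-runs

# Inspect which aspects that run wrote before deleting anything
datahub ingest show --run-id <run-id>

# Roll back everything written by that run
datahub ingest rollback --run-id <run-id>
```

This removes the metadata produced by one run while leaving the other PostgreSQL ingestion untouched.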

    better-orange-49102

    02/03/2022, 3:03 PM
    Is the datahub-actions image not buildable at our end, i.e., must we always pull it? I'm building my own containers on an intranet and need to be careful not to pull the wrong version of actions, especially when it's always pointing to head. Being able to build my own ensures that head doesn't go out of sync with the rest of the containers. I only see an option to pull in docker-compose.yml.

    agreeable-plastic-37919

    02/03/2022, 3:35 PM
    Can we hide UI components for a given subtype of a dataset? For example, I want to remove Lineage, Queries and Stats from showing up on a particular subtype we created?

    better-orange-49102

    02/04/2022, 9:02 AM
    I'm using Keycloak as the OIDC IdP. Is there a way to refresh the group membership of a user (maybe delete the user daily so that it forces a refresh? 🥴)? Assuming that I do not create any groups using the datahub UI or API, the group membership of a user is determined at first login. I'm wondering how I could keep users' memberships updated as they change appointments and move around in my org... 🤨 I do not want to manually edit each membership in the UI; that is very laborious in production and only workable on an ad-hoc basis. I'm wondering if I need to write a custom daily script that queries users' group memberships (via a third-party, in-house API) and generates a new version of groupMembership when it changes, since membership info is kept on the CorpUser entity and not the CorpGroup entity.
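A daily sync script along those lines can be kept simple: fetch each user's groups from the in-house IdP API, diff against what DataHub currently holds, and emit a fresh groupMembership aspect only for users whose groups changed. A sketch of the diff step (all names hypothetical; the IdP fetch and the DataHub read are assumed to exist elsewhere):

```python
# Sketch of a daily group-sync job: compare IdP group membership against
# DataHub's and report only the users whose membership actually changed,
# so we emit a new groupMembership aspect version only when needed.

def membership_changes(idp: dict, datahub: dict) -> dict:
    """Return {user_urn: sorted(new_groups)} for users whose membership differs.

    Both arguments map a user urn to an iterable of group urns.
    """
    changes = {}
    for user, groups in idp.items():
        if set(groups) != set(datahub.get(user, ())):
            changes[user] = sorted(set(groups))
    return changes

# For each changed user you would then emit a groupMembership aspect with
# the Python emitter (assuming acryl-datahub is installed), e.g.:
#
#   from datahub.emitter.mcp import MetadataChangeProposalWrapper
#   from datahub.metadata.schema_classes import GroupMembershipClass
#
#   mcp = MetadataChangeProposalWrapper(
#       entityType="corpUser",
#       entityUrn=user_urn,
#       aspectName="groupMembership",
#       aspect=GroupMembershipClass(groups=new_group_urns),
#   )
#   emitter.emit(mcp)
```

Emitting a new aspect version upserts the membership, which matches the observation that membership lives on the CorpUser entity.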

    handsome-football-66174

    02/04/2022, 2:58 PM
    Hi Everyone, we are trying to switch to MWAA (https://aws.amazon.com/managed-workflows-for-apache-airflow/ ). We would like to understand how to integrate Datahub with MWAA and if there is any documentation for the same.

    gorgeous-dinner-4055

    02/05/2022, 3:48 AM
    Hi all! Not sure where this question goes; deployment might be the best tangentially related area. I'm curious about the metadata-ingest/src/datahub/metadata subfolder, where the metadata is built by translating the .pdl files into avro files. Is there any reason not to check those into our forked version of Datahub? Do they change often? Are there cross-platform compilation issues? Conserving space? The reason behind this question is that internally we use Bazel to build our dependencies, and instead of trying to publish whls (we don't have an internal PyPI to host these whls 😞) we can directly import the code and create a dependency through the git source code in other repos. We could publish the whls to S3 or as git packages, but before going down that route, I'm just curious why we don't check them in.

    echoing-dress-35614

    02/05/2022, 9:07 PM
    Trying to deploy datahub on the new-ish AWS ARM64 platform (uname -m reports 'aarch64'), and it's not having it:
    ⠿ datahub-frontend-react Error                                                                                                                   0.3s
     ⠸ kafka-setup Pulling                                                                                                                            0.3s
     ⠿ broker Error                                                                                                                                   0.3s
     ⠿ elasticsearch-setup Error                                                                                                                      0.3s
     ⠸ datahub-gms Pulling                                                                                                                            0.3s
     ⠸ elasticsearch Pulling                                                                                                                          0.3s
     ⠸ mysql Pulling                                                                                                                                  0.3s
     ⠿ datahub-actions Error                                                                                                                          0.3s
    no matching manifest for linux/arm64/v8 in the manifest list entries

    echoing-dress-35614

    02/05/2022, 10:16 PM
    Trying to add the METADATA_SERVICE_AUTH_ENABLED environment variable to the datahub-gms container that was started using quickstart - what's the preferred method to add this when using the quickstart config?
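With quickstart, the containers come from a generated compose file, so the usual route is to add the variable there and re-run quickstart against that file. A sketch, assuming the default quickstart compose location (`~/.datahub/quickstart/docker-compose.yml`; verify the path and flag on your CLI version):

```yaml
# Add under the datahub-gms service in the quickstart compose file
# (and likewise under datahub-frontend-react if you want the frontend
# to forward credentials), then restart, e.g.:
#   datahub docker quickstart --quickstart-compose-file ~/.datahub/quickstart/docker-compose.yml
services:
  datahub-gms:
    environment:
      - METADATA_SERVICE_AUTH_ENABLED=true
```

Editing the compose file keeps the setting across container restarts, unlike injecting the variable into a running container.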

    breezy-noon-83306

    02/06/2022, 6:44 PM
    Is there a way to manage Datahub fully no-code, or through the UI? I am just starting. Thanks!!

    few-air-56117

    02/07/2022, 10:16 AM
    Hi guys, when will datahub v0.8.25 be available in Helm? 😄

    wooden-football-7175

    02/07/2022, 2:40 PM
    Hello DH team. I have a quick question: is it obligatory to deploy DH with Kafka, or is it optional? At the beginning the idea is not to use Kafka, but maybe to add the feature later!

    busy-lion-43973

    02/07/2022, 3:42 PM
    Hi all, I'm new to this Slack group and DataHub; I was looking for best practices for scaling the MCE and MAE consumers in a K8s cluster.

    sticky-kite-42322

    02/08/2022, 12:03 PM
    Hi all, I am kind of new here and have a couple of questions. How do I change the default login (datahub) and password (datahub) when deploying to Kubernetes using Helm?
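The default datahub/datahub login comes from a user.props file read by the datahub-frontend container (a JAAS PropertyFileLoginModule). A common approach on Kubernetes is to put a replacement user.props into a Secret and mount it over the default file in the frontend pod. A sketch of the file itself (password is a placeholder; the exact mount path inside the image varies by version, so check your datahub-frontend container):

```properties
# user.props - one username:password entry per line.
# Mounting this over the image's default user.props replaces the
# built-in datahub:datahub credentials.
datahub:my-new-strong-password
```

In the Helm values you would add the Secret as an extra volume and volume mount on the datahub-frontend deployment; for production, OIDC/SSO is usually preferred over file-based logins.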

    billowy-jewelry-4209

    02/08/2022, 7:33 PM
    Hi everyone. I need datahub to start automatically when the system reboots. Maybe someone has written a systemd service for this? I would also be glad to hear other advice on how to make datahub restart after a failure.
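For a quickstart-style Docker deployment, a small systemd unit that runs the quickstart on boot is enough; a sketch, assuming the `datahub` CLI is installed at the path shown (adjust path and user for your host):

```ini
# /etc/systemd/system/datahub.service (sketch)
# Enable with: systemctl daemon-reload && systemctl enable --now datahub
[Unit]
Description=DataHub (Docker quickstart)
After=docker.service network-online.target
Requires=docker.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/local/bin/datahub docker quickstart
# No ExecStop here: stop the containers via docker-compose if needed.

[Install]
WantedBy=multi-user.target
```

For restart-after-failure, Docker's own restart policy is often simpler: setting `restart: unless-stopped` on each service in the compose file lets the Docker daemon bring containers back up without systemd involvement.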

    red-accountant-48681

    02/08/2022, 8:30 PM
    Hi all. I am quite new to datahub, and Docker as well, and want to double-check my understanding: what does it mean to "deploy" datahub? I have gone through the quickstart, so I have it set up on my local machine. How would I be able to get others to access the UI?

    boundless-student-48844

    02/09/2022, 7:04 AM
    Hi team, I encountered this error building the datahub-mae-consumer image with tag v0.8.26:
    Execution failed for task ':metadata-jobs:mae-consumer-job:checkstyleMain'.
    > Could not resolve all files for configuration ':metadata-jobs:mae-consumer-job:checkstyle'.
       > Could not download antlr4-runtime.jar (org.antlr:antlr4-runtime:4.7)
          > Could not get resource 'https://plugins.gradle.org/m2/org/antlr/antlr4-runtime/4.7/antlr4-runtime-4.7.jar'.
             > Could not GET 'https://plugins.gradle.org/m2/org/antlr/antlr4-runtime/4.7/antlr4-runtime-4.7.jar'.
                > Connection reset
    It seems to be due to a redirect of the URL (303 response when using wget). Do you have any idea how to fix this?

    prehistoric-dawn-23569

    02/09/2022, 11:59 AM
    Hello. Has anyone got any advice about real-world Elasticsearch cluster sizes and index sizes to share, please? We're currently planning a deployment of DataHub from scratch, with at most a few tens of thousands of datasets from Hive, Druid, Cassandra, etc. I'm also considering hosting the graph database on Elasticsearch as well, as opposed to Neo4j. I'm looking at a 3-node Elasticsearch cluster for high-availability purposes, but I wondered if anyone could share their experiences sizing an Elasticsearch cluster for a similar workload, to make sure I'm not massively over- or under-speccing it. Thanks.

    better-orange-49102

    02/09/2022, 12:12 PM
    I've been running an instance using neo4j as graph backend. Is it possible to switch over to a pure ES and MySQL only instance without losing the data within?

    few-air-56117

    02/09/2022, 1:33 PM
    Hi guys, how can I deploy datahub 0.8.26 using Helm?
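With the acryldata Helm chart, the DataHub application version can be pinned in the chart values rather than waiting for a new chart release. A sketch, assuming the chart repo has been added (`helm repo add datahub https://helm.datahubproject.io/`); verify the key layout against the chart's values.yaml for your chart version:

```yaml
# values.yaml fragment - pin the DataHub image tag used by all components,
# then apply with:
#   helm upgrade --install datahub datahub/datahub -f values.yaml
global:
  datahub:
    version: v0.8.26
```

Per-component image tags can also be overridden individually if only one service needs a different version.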

    square-machine-96318

    02/10/2022, 1:22 AM
    Hi guys! How can I ingest Airflow metadata into the Datahub system?
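The usual integration at this point is DataHub's Airflow lineage backend: install `acryl-datahub[airflow]` in the Airflow environment, create an Airflow connection (e.g. `datahub_rest_default`) pointing at your GMS, and enable the backend in airflow.cfg. A sketch, with field names as in the DataHub Airflow docs (double-check against your DataHub version):

```ini
# airflow.cfg fragment - send task/DAG lineage to DataHub.
# Requires: pip install acryl-datahub[airflow]
# and an Airflow connection `datahub_rest_default` pointing at GMS.
[lineage]
backend = datahub_provider.lineage.datahub.DatahubLineageBackend
datahub_kwargs = {"datahub_conn_id": "datahub_rest_default", "capture_ownership_info": true, "graceful_exceptions": true}
```

With `graceful_exceptions` enabled, a DataHub outage logs an error instead of failing the Airflow tasks themselves.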

    delightful-sugar-63810

    02/10/2022, 5:58 PM
    Hey hey! I'm seeing that we have these topics, plus some others, in Kafka:
    FailedMetadataChangeEvent_v4
    FailedMetadataChangeProposal_v1
    MetadataAuditEvent_v4
    MetadataChangeEvent_v4
    MetadataChangeLog_Timeseries_v1
    MetadataChangeLog_Versioned_v1
    MetadataChangeProposal_v1
    I'm planning to define a retention policy for these topics. Does it sound okay if I define a window that allows the records to get consumed, like 5-10 days? Are there any topics that are used as a (long-term) source-of-truth data store by any component?
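DataHub's source of truth is the relational aspect store (MySQL/Postgres), not Kafka, so a consumption-window retention on these topics is generally reasonable; it is worth keeping the Failed* topics long enough to investigate and replay failures. A sketch of setting a 7-day retention on one topic (broker address and topic name are examples; repeat per topic):

```properties
# Topic-level override, applied per topic with kafka-configs, e.g.:
#   kafka-configs --bootstrap-server <broker:9092> --alter \
#     --entity-type topics --entity-name MetadataChangeEvent_v4 \
#     --add-config retention.ms=604800000
# 604800000 ms = 7 days
retention.ms=604800000
```

The same `--add-config` call can be repeated for the other MetadataChange*/MetadataAudit*/Failed* topics listed above.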

    agreeable-plastic-37919

    02/10/2022, 6:44 PM
    I'm seeing a strange error trying to ingest a dataset profile or usage: 'message': 'Failed to validate record with class com.linkedin.dataset.DatasetUsageStatistics: ERROR :: /partitionSpec/type :: unrecognized field found but not allowed\n', 'status': 422
    import datahub.emitter.mce_builder as builder
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        ChangeTypeClass,
        DatasetFieldUsageCountsClass,
        DatasetUsageStatisticsClass,
    )

    emitter = DatahubRestEmitter("http://localhost:8080")

    usageStats = DatasetUsageStatisticsClass(
        timestampMillis=1629840771000,
        uniqueUserCount=10,
        totalSqlQueries=20,
        fieldCounts=[
            DatasetFieldUsageCountsClass(fieldPath="field1", count=10)
        ],
    )

    # Construct a MetadataChangeProposalWrapper object.
    metadata_event = MetadataChangeProposalWrapper(
        entityType="dataset",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn=builder.make_dataset_urn("myplatform", "path.to.my.table"),
        aspectName="datasetUsageStatistics",
        aspect=usageStats,
    )

    # Emit metadata! This is a blocking call.
    emitter.emit(metadata_event)

    ancient-apartment-23316

    02/10/2022, 9:25 PM
    Hi, can you please advise the best way to properly back up the datahub infrastructure and restore from backups?
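Since the relational aspect store is DataHub's source of truth and the search/graph indexes can be rebuilt from it with the restore-indices job, the core of a backup is a database dump. A sketch for a MySQL-backed deployment (host, credentials, and database name are placeholders):

```shell
# Back up the versioned-aspect table (the source of truth).
mysqldump -h <mysql-host> -u datahub -p datahub metadata_aspect_v2 > datahub_backup.sql

# Restore later by loading the dump back, then re-run the
# restore-indices job so Elasticsearch/graph are rebuilt from MySQL.
mysql -h <mysql-host> -u datahub -p datahub < datahub_backup.sql
```

Kafka topics do not need to be part of the backup; they are transport, not long-term storage.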

    brave-secretary-27487

    02/11/2022, 10:25 AM
    Hey, I am currently working on configuring METADATA_SERVICE_AUTH_ENABLED. The docs state the following: set the METADATA_SERVICE_AUTH_ENABLED environment variable to "true" for the datahub-gms AND datahub-frontend containers / pods. Meanwhile, the docs on GitHub (https://github.com/acryldata/datahub-helm/tree/master/charts/datahub/subcharts/datahub-frontend) suggest there are more steps to enable metadata service authentication: https://datahubproject.io/docs/introducing-metadata-service-authentication/ Should I follow the official docs or go with the GitHub docs?

    agreeable-plastic-37919

    02/11/2022, 1:01 PM
    Does datasetUsageStatistics for a dataset show up in the GUI anywhere? I added usage to a dataset and it appears to have ingested just fine, but I don't see anything in the GUI for uniqueUserCount, etc.