# getting-started
  • a

    abundant-apartment-78179

    05/02/2023, 7:52 AM
    Hey guys, we are planning to integrate the schema registry into DataHub. Could you point us to some appropriate links on this? Could you also point to the roadmap items on this topic? I want to build a general picture of how we can utilize it and plan the work accordingly.
  • r

    rapid-appointment-83078

    05/02/2023, 10:18 AM
    @brave-room-48783 - Hey, can you please share the problem we are facing getting DataHub running on our end? Thanks
  • f

    fierce-animal-98957

    05/02/2023, 4:26 PM
    Hi Team, we are using "DataHubValidationAction" to send assertion metadata to DataHub. We are running this from inside Databricks using Great Expectations, which uses the Spark engine. From the documentation, this currently works only with "SqlAlchemyExecutionEngine". Does anyone know when this class will be enhanced to add Spark engine support? Is there anything on the roadmap? https://datahubproject.io/docs/metadata-ingestion/integration_docs/great-expectations/#capabilities https://docs.greatexpectations.io/docs/integrations/integration_datahub/
  • b

    bland-orange-13353

    05/03/2023, 9:50 AM
    This message was deleted.
  • c

    colossal-autumn-78301

    05/03/2023, 3:35 PM
    Hey everyone, trying to extend the metadata model as described in https://datahubproject.io/docs/metadata-modeling/extending-the-metadata-model/. We added a custom aspect to the dataset entity. When the added aspect contains an `array`, the dataset page in the UI becomes a blank white page. And when we include a `struct` in the PDL, the UI shows the new aspect as a tab on the dataset, but the fields in the struct are not shown. Examples (note: these PDLs are in separate files):
    PDLs:
    namespace a.b.c
    record TestRec{
       name: string
       desc: optional string
    }
    record NewAspectRecord {
      name: string
      isRequired: boolean
      description: string
      restrictions: optional array[string]
      testRec: TestRec
    }
    The aspect PDL is:
    namespace a.b.c
    
    @Aspect = {
      "name": "newAspect",
      "autoRender": true,
      "renderSpec": {
        "displayType": "tabular", // or properties
        "key": "newRecords",
        "displayName": "Dataset Consumption Contract"
      }
    }
    record NewAspect {
      rules: array[NewAspectRecord]
    }
    Here, (1) due to `restrictions` in `NewAspectRecord` the UI goes blank for the dataset, and (2) due to the `testRec` of type `TestRec`, the UI is there but `TestRec` does not get rendered. PS: we do add the aspect to entity-registry.yaml, and build and install work. Are we doing something wrong?
  • l

    lively-table-90009

    05/04/2023, 1:07 AM
    Hi Team, is there a way to look up a dataset by exact name via the API? I am using the Search query in GraphQL, inputting MY_SCHEMA.MY_TABLE, and it returns a list of results when I just want one result. I see that you need the `urn` to get a dataset directly. What is the best way to accomplish this?
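    One approach (a minimal sketch, not an official recommendation): since a dataset URN is just platform + name + environment, you can construct the URN yourself and fetch the entity directly with the Python SDK instead of searching. The platform, dataset name, and server below are placeholders.
    from datahub.emitter.mce_builder import make_dataset_urn
    from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph
    from datahub.metadata.schema_classes import DatasetPropertiesClass

    # Assumed values -- replace with your platform, dataset name, and GMS endpoint.
    dataset_urn = make_dataset_urn(platform="snowflake", name="my_schema.my_table", env="PROD")
    graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))

    # Fetching a single aspect by URN avoids the fuzzy search entirely; None means no such dataset.
    props = graph.get_aspect(entity_urn=dataset_urn, aspect_type=DatasetPropertiesClass)
    print(props)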
  • a

    average-lock-95905

    05/04/2023, 7:13 AM
    Hi Team,
    from datahub.ingestion.run.pipeline import Pipeline
    
    # The pipeline configuration is similar to the recipe YAML files provided to the CLI tool.
    pipeline = Pipeline.create(
        {
            "source": {
                "type": "mysql",
                "config": {
                    "username": "user",
                    "password": "pass",
                    "database": "db_name",
                    "host_port": "localhost:3306",
                },
            },
            "sink": {
                "type": "datahub-rest",
                "config": {"server": "<http://localhost:8080>"},
            },
        }
    )
    
    # Run the pipeline and report the results.
    pipeline.run()
    pipeline.pretty_print_summary()
    I'm using the above code to ingest data into DataHub. Is there a way to add column descriptions while ingesting the data?
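    Column descriptions normally come from the source itself (for MySQL, column comments should be picked up during ingestion), but if you need to set them programmatically, one option is to emit the editableSchemaMetadata aspect after the pipeline runs. A rough sketch, with placeholder dataset name, field path, and server; note that it overwrites any existing editable field documentation rather than merging with it:
    import time

    from datahub.emitter.mce_builder import make_dataset_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        AuditStampClass,
        EditableSchemaFieldInfoClass,
        EditableSchemaMetadataClass,
    )

    # Hypothetical dataset and column -- adjust to your environment.
    dataset_urn = make_dataset_urn(platform="mysql", name="db_name.my_table", env="PROD")
    now = AuditStampClass(time=int(time.time() * 1000), actor="urn:li:corpuser:ingestion")

    # editableSchemaMetadata holds UI-style documentation edits per column.
    aspect = EditableSchemaMetadataClass(
        created=now,
        lastModified=now,
        editableSchemaFieldInfo=[
            EditableSchemaFieldInfoClass(
                fieldPath="my_column",
                description="Describes what my_column stores.",
            ),
        ],
    )

    emitter = DatahubRestEmitter(gms_server="http://localhost:8080")
    emitter.emit(MetadataChangeProposalWrapper(entityUrn=dataset_urn, aspect=aspect))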
  • p

    proud-dusk-671

    05/04/2023, 10:06 AM
    Unable to roll back data of a particular run. Please look at my command and the response received -
    $ datahub ingest rollback --run-id e25bf8dd-d006-44d5-8b9e-9f7c9e5057af -f
    
    Failed to execute operation
    INTERNAL SERVER ERROR
  • b

    billions-baker-82097

    05/04/2023, 4:12 PM
    Does DataHub provide any way to override the class for its /login API?
  • a

    alert-piano-20790

    05/05/2023, 9:21 AM
    Hi, when testing DataHub we have some requirements around air-gapped deployments. Unfortunately, DataHub is not very friendly in this respect: although the solution is split into many images, two of them are mysteriously monolithic, acryldata/datahub-ingestion and acryldata/datahub-actions. Are there any plans to address their "pharaonic" size (about 4 GB and 5 GB)? Thanks
  • w

    wide-ghost-47822

    05/05/2023, 11:22 AM
    Hi, I'd like to ask a question. Suppose we execute lots of tests with Great Expectations on different tables in a dataset and integrate them into DataHub. There are lots of tables in the dataset, and we would like to see a metric such as how many validations have passed and failed across all tables in a dataset, or across all datasets. It would give us an overview of the tests and help us create alerts based on this data. Is there a DataHub API through which we can see that information?
  • w

    wide-battery-62896

    05/05/2023, 12:14 PM
    Hi all, I tried to set up DataHub on my machine (a MacBook) and got this error message
  • p

    proud-dusk-671

    05/05/2023, 12:49 PM
    Hi team, what happens if my current setup of quickstart has some state and I go ahead and restore some other backup via
    datahub docker quickstart --restore
    a. I want to specifically know what happens to my previous data.
    b. Will DataHub reach a bad state?
  • p

    proud-dusk-671

    05/05/2023, 12:56 PM
    Hi team, I'm getting this exception in the logs of datahub-frontend:
    2023-05-05 12:52:41,783 [application-akka.actor.default-dispatcher-6] WARN  o.e.j.j.spi.PropertyFileLoginModule - Exception starting propertyUserStore /etc/datahub/plugins/frontend/auth/user.props
  • b

    bulky-scientist-8960

    05/05/2023, 2:10 PM
    Hello guys, is it possible to run DataHub without Elasticsearch, or is it part of the core functionality? I know it's likely not officially supported to run without Elasticsearch; I'm just wondering how dependent DataHub is on it. When going the neo4j route, is Elasticsearch only used for the search functionality, so that with some code modifications it would be possible to disable search and thus eliminate the Elasticsearch requirement? Maybe someone familiar with DataHub's internal workings can speed things up; I have been going over the code but am still struggling to see the full picture.
  • i

    icy-kitchen-54364

    05/05/2023, 6:42 PM
    Hello, I want to integrate SSO for the DataHub UI. I am setting it up for the first time, and DataHub is running on CentOS. Can anyone please guide me through the process?
  • s

    silly-nest-50341

    05/08/2023, 3:22 AM
    Hi, I would like to ask whether DataHub supports a Python API or SDK for validating (or searching) whether a given dataset URN exists or not? (It seems to be possible in the GraphQL API.) Thanks in advance
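    Not sure whether this is the officially recommended route, but here is a minimal sketch using the Python SDK (the server URL and URN are placeholders): fetch the entity's key aspect and treat a missing result as "does not exist". Recent SDK versions may also expose a dedicated existence helper on DataHubGraph, but check that against your installed version.
    from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph
    from datahub.metadata.schema_classes import DatasetKeyClass

    graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))
    dataset_urn = "urn:li:dataset:(urn:li:dataPlatform:mysql,db_name.my_table,PROD)"

    # The key aspect is stored for every persisted entity, so a None result means
    # the URN is not known to the metadata service.
    key_aspect = graph.get_aspect(entity_urn=dataset_urn, aspect_type=DatasetKeyClass)
    print("exists" if key_aspect is not None else "not found")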
  • p

    proud-dusk-671

    05/08/2023, 4:12 PM
    Hi team, a very basic question - I have the following event coming into datahub-actions. Can you tell me the best way to serialize this so that I can extract information from it? I would prefer JSON serialization, but I'm okay with anything else as long as it can be written in Python.
    EventEnvelope(event_type='EntityChangeEvent_v1', event=EntityChangeEvent({'entityType': 'dataset', 'entityUrn': 'urn:li:dataset:(urn:li:dataPlatform:snowflake,user.l0.company_info,PROD)', 'category': 'TAG', 'operation': 'ADD', 'modifier': 'urn:li:tag:pii', 'parameters': None, 'auditStamp': AuditStampClass({'time': 1683562084007, 'actor': 'urn:li:corpuser:datahub', 'impersonator': None, 'message': None}), 'version': 0, '__parameters_json': {'tagUrn': 'urn:li:tag:pii'}}), meta={'kafka': {'topic': 'PlatformEvent_v1', 'offset': 4415, 'partition': 0}})
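    Not an authoritative answer, but here is a rough sketch of how one might pull fields out of such an event inside a custom action and dump them as JSON. It reads attributes directly off the event object (matching the names visible in the dump above); the import paths follow the hello-world action example and may differ between datahub-actions versions.
    import json

    from datahub_actions.action.action import Action
    from datahub_actions.event.event_envelope import EventEnvelope
    from datahub_actions.pipeline.pipeline_context import PipelineContext


    class JsonDumpAction(Action):
        """Toy action that serializes EntityChangeEvent_v1 payloads to JSON."""

        @classmethod
        def create(cls, config_dict: dict, ctx: PipelineContext) -> "Action":
            return cls()

        def act(self, event: EventEnvelope) -> None:
            if event.event_type != "EntityChangeEvent_v1":
                return
            change = event.event  # attribute access on the EntityChangeEvent payload
            payload = {
                "entityType": change.entityType,
                "entityUrn": change.entityUrn,
                "category": change.category,
                "operation": change.operation,
                "modifier": change.modifier,
            }
            print(json.dumps(payload))

        def close(self) -> None:
            pass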
  • c

    cold-london-92205

    05/09/2023, 12:22 AM
    Hi team, I would like to ask what went wrong in our SSO setup running on K8s. It has been running perfectly for a long time, but this week the SSO suddenly broke. Error from the frontend:
    2023-05-09 00:21:10,599 [application-akka.actor.default-dispatcher-9] ERROR auth.sso.oidc.OidcCallbackLogic - Failed to perform post authentication steps. Redirecting to error page.
    Error from GMS:
    2023-05-09 00:21:10,595 [qtp1645547422-24] WARN  c.d.a.a.AuthenticatorChain:80 - Authentication chain failed to resolve a valid authentication. Errors: [(com.datahub.authentication.authenticator.DataHubSystemAuthenticator,Failed to authenticate inbound request: Provided credentials do not match known system client id & client secret. Check your configuration values...), (com.datahub.authentication.authenticator.DataHubTokenAuthenticator,Failed to authenticate inbound request: Authorization header missing 'Bearer' prefix.)]
    Our configs are based on https://datahubproject.io/docs/authentication/guides/sso/configure-oidc-react-google, and I can confirm that the values weren't updated or changed. Feedback would be highly appreciated. Thank you!
  • a

    adamant-postman-92176

    05/09/2023, 12:24 AM
    I'd like to use datahub to track downstream consumers of data, as well as upstream producers. Say, for example, I have an Airflow job that writes to an S3 bucket. Later, a cron job reads from that S3 bucket and takes some action (e.g. emails a customer, etc.). What's the best way to represent this cron job as a consumer of data? Should it be tracked as a "dataset", even though it doesn't really store data anywhere? Or is it better to track it using metadata enrichment to write a set of tags to the data source for the S3 bucket saying how the data is used? Thanks for any help. I'm sure this is a common problem, but I think I lack the proper nouns to search for this; I haven't had much luck so far. -Eli
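    For what it's worth, here is a sketch of one way to model this: represent the cron job as a DataJob whose inputs include the S3 dataset, so it shows up in lineage as a downstream consumer without pretending to be a dataset. The orchestrator name, job id, bucket path, and server below are all made-up placeholders.
    from datahub.emitter.mce_builder import make_data_job_urn, make_dataset_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import DataJobInputOutputClass

    # Hypothetical identifiers for the cron job and the bucket it reads.
    s3_dataset_urn = make_dataset_urn(platform="s3", name="my-bucket/exports/customers", env="PROD")
    cron_job_urn = make_data_job_urn(orchestrator="cron", flow_id="nightly_emails", job_id="send_customer_emails")

    # DataJobInputOutput ties a job to the datasets it consumes and produces.
    lineage = DataJobInputOutputClass(inputDatasets=[s3_dataset_urn], outputDatasets=[])

    emitter = DatahubRestEmitter(gms_server="http://localhost:8080")
    emitter.emit(MetadataChangeProposalWrapper(entityUrn=cron_job_urn, aspect=lineage))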
  • l

    late-notebook-97260

    05/09/2023, 4:58 AM
    Hi, I want to get the downstream lineage of a dataset using the Python SDK. I tried the code below to fetch upstreams, but I don't see a way to get the downstream tables / list (a.k.a. the right side of the dataset).
    from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph
    from datahub.metadata.schema_classes import UpstreamLineageClass

    table_urn = "urn:li:dataset:(urn:li:dataPlatform:looker,Revenue_SOX.explore.payments_promos_SOX,PROD)"
    graph = DataHubGraph(DatahubClientConfig(server=gms_server, token=datahub_gms_api_token))

    # upstreamLineage only covers the left side (upstreams); downstream edges
    # are not stored on the dataset's own aspects.
    upstream_lineage = graph.get_aspect(
        entity_urn=table_urn,
        aspect_type=UpstreamLineageClass)
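    Not a confirmed answer, but one option is to go through GraphQL from the same client: DataHubGraph exposes execute_graphql, and the searchAcrossLineage query can walk the DOWNSTREAM direction. The exact field names below are from memory, so verify them against your server's GraphQL schema.
    from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph

    graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))
    table_urn = "urn:li:dataset:(urn:li:dataPlatform:looker,Revenue_SOX.explore.payments_promos_SOX,PROD)"

    # searchAcrossLineage with direction DOWNSTREAM lists entities to the "right" of the dataset.
    query = """
    query downstreams($urn: String!) {
      searchAcrossLineage(input: {urn: $urn, direction: DOWNSTREAM, query: "*", start: 0, count: 50}) {
        total
        searchResults { entity { urn type } }
      }
    }
    """
    result = graph.execute_graphql(query, variables={"urn": table_urn})
    print(result)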
  • a

    astonishing-father-13229

    05/09/2023, 7:41 AM
    Hi @hundreds-photographer-13496 @echoing-airport-49548 @astonishing-answer-96712 @better-orange-49102 Is there any way to disable the DataHub login page? We are using our own company's (xyz) authentication tool to authenticate. Note: we use a default username and password first, and later a dynamic username and password. Could you please advise? Thanks in advance.
  • b

    bland-orange-13353

    05/09/2023, 9:23 AM
    This message was deleted.
  • b

    bland-orange-13353

    05/09/2023, 9:29 AM
    This message was deleted.
  • k

    kind-solstice-88514

    05/09/2023, 9:40 AM
    Hi - I have recently started exploring DataHub! I want to check whether DataHub can integrate with ETL tools like Qlik Compose and Qlik Replicate to capture metadata from these tools and build a data catalog and data lineage.
  • f

    fast-oxygen-9103

    05/10/2023, 2:22 PM
    Hi, I have started working on DataHub, which is new to me. I am trying to integrate Airflow with DataHub; in the process, I started to run the DataHub container, which is giving the below error.
  • m

    mysterious-table-75773

    05/11/2023, 9:03 AM
    Is there a way to run DataHub without datahub-actions? It contains tens of critical vulnerabilities.
  • d

    dazzling-daybreak-52128

    05/11/2023, 3:22 PM
    Hello everyone, I'm new to DataHub (testing functionalities). I am researching whether it is possible to automatically reuse, for example, the description of a column every time that column is reused (i.e. automatically recognizing / propagating metadata). Can someone help me with this?
  • a

    average-nail-72662

    05/11/2023, 4:23 PM
    Hello everyone! I'm trying to run the DataHub container, but I'm getting the error that the broker is unhealthy. Can anybody help me?
  • f

    future-table-91845

    05/11/2023, 5:41 PM
    Hello Data Hub Team