# contribute-code
  • g

    gray-airplane-39227

    12/19/2023, 7:47 PM
    Hello, can I get some eyes 👀 on this small PR for fixing mongodb ingestion when platform_instance is missing from recipe? Thank you!
    ✅ 1
    👍 1
  • c

    clever-ice-25576

    01/02/2024, 8:08 PM
    Hi team, I am in the process of writing a custom ingestion source, using the DataHub documentation as a guide. Does the DataHub server need to have the custom source installed, or can sources be shimmed in at the package level for whatever is communicating with the server?
    b
    • 2
    • 2
  • l

    late-electrician-25891

    01/04/2024, 5:45 AM
    Hi all, requesting the team's help with PR https://github.com/datahub-project/datahub/pull/9412 - This addresses broken Spark lineage for Hive tables. Related message posted in other channels: https://datahubspace.slack.com/archives/C04BSKCPXCK/p1702033468493729?thread_ts=1696582266.105799&cid=C04BSKCPXCK.
    r
    g
    +2
    • 5
    • 6
  • p

    prehistoric-dream-62416

    01/08/2024, 11:34 AM
    query = """
    query searchAcrossLineage {
      searchAcrossLineage(
        input: {
          query: "*"
          urn: "urn:li:dataset:(urn:li:dataPlatform:""" + dataplatform + """,""" + dataset + """.""" + table + """,PROD)"
          start: 0
          count: 10
          direction: """ + lin_stream + """
          orFilters: [
            {
              and: [
                {
                  condition: EQUAL
                  negated: false
                  field: "degree"
                  values: ["1", "2", "3+"]
                }
              ] # Additional search filters can be included here as well
            }
          ]
        }
      ) {
        searchResults {
          degree
          entity {
            urn
            type
          }
        }
      }
    }
    """
    How can I add subTypes to this query, so that the results include subTypes such as table or view?
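One possible direction for the question above, as a minimal sketch rather than a confirmed answer: request the sub-type through an inline fragment on the entity. The sketch assumes the deployed GraphQL schema exposes `subTypes { typeNames }` on `Dataset`; check that against the schema of your own instance. The helper name `build_lineage_query` and its parameters are hypothetical, and `string.Template` is used only to avoid the error-prone string concatenation.

```python
from string import Template

# Assumption: Dataset exposes `subTypes { typeNames }` in the deployed schema.
LINEAGE_QUERY = Template("""
query searchAcrossLineage {
  searchAcrossLineage(
    input: {
      query: "*"
      urn: "$urn"
      start: 0
      count: 10
      direction: $direction
      orFilters: [
        { and: [{ condition: EQUAL, negated: false, field: "degree", values: ["1", "2", "3+"] }] }
      ]
    }
  ) {
    searchResults {
      degree
      entity {
        urn
        type
        ... on Dataset {
          subTypes {
            typeNames
          }
        }
      }
    }
  }
}
""")


def build_lineage_query(platform: str, dataset_name: str, direction: str) -> str:
    """Fill the dataset URN and lineage direction into the query text."""
    urn = f"urn:li:dataset:(urn:li:dataPlatform:{platform},{dataset_name},PROD)"
    return LINEAGE_QUERY.substitute(urn=urn, direction=direction)


if __name__ == "__main__":
    print(build_lineage_query("hive", "mydb.mytable", "UPSTREAM"))
```

The resulting string can be sent the same way as the original query; only the selection set inside `entity` differs.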
  • r

    ripe-eye-60209

    01/08/2024, 1:21 PM
    Hello Team, could someone point out where the usage statistics extraction logic lives in the source code for the Teradata source? https://datahubproject.io/docs/features/dataset-usage-and-query-history/
    s
    • 2
    • 1
  • f

    fresh-petabyte-24461

    01/08/2024, 7:06 PM
    Hello, can I get some eyes 👀 on this small PR? https://github.com/acryldata/datahub/pull/274 It's about overriding the datajob external_url in the DataHub Airflow plugin. Thanks
  • f

    future-controller-3884

    01/11/2024, 5:49 AM
    Hi team, I'm using Datastream and BigQuery. The table in BigQuery is an upsert table, so an error is raised when the query at https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/queries.py#L72 is executed. My question: it seems this query only collects the size of the table, the number of partitions, etc., so we could skip this metric. Is there any suggestion for dealing with it? We could also catch the exception and fall back to a default value in this case.
    r
    • 2
    • 1
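The "catch the exception and fall back to a default value" idea from the message above could look roughly like the sketch below. Everything here is hypothetical (`TableStats`, `fetch_stats_or_default`, and the `run_query` callable are illustration names, not part of the BigQuery source); it only shows the shape of the fallback.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Optional

logger = logging.getLogger(__name__)


@dataclass
class TableStats:
    # None means "not collected", which differs from a real value of 0.
    size_bytes: Optional[int] = None
    num_partitions: Optional[int] = None


def fetch_stats_or_default(
    table: str, run_query: Callable[[str], TableStats]
) -> TableStats:
    """Run the stats query; on any failure, log it and return empty stats."""
    try:
        return run_query(table)
    except Exception as exc:  # broad on purpose: the metric is optional
        logger.warning("Skipping table stats for %s: %s", table, exc)
        return TableStats()
```

Returning `None` fields rather than zeros keeps "could not be measured" distinguishable from "measured as empty".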
  • m

    most-refrigerator-85564

    01/11/2024, 3:13 PM
    Hi team, since AWS has announced that Amazon MSK IAM authentication now supports all programming languages, I want to check with the community on where we are with adopting this in datahub-action. The usage looks relatively straightforward, as the example here suggests. Happy to put together a draft RFC if this hasn't been done yet.
    👀 1
    g
    • 2
    • 1
  • g

    gray-airplane-39227

    01/11/2024, 7:18 PM
    Hi, I have a very small mongodb PR to fix an issue where, when downsampling the collection schema, the output is not always consistent. This is because the schema is sorted by frequency, but frequencies can be equal, so we should also compare delimited_name to make sure the output is consistent.
    g
    • 2
    • 2
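The tie-breaking described above can be sketched with a composite sort key: descending frequency first, then the field name. This is an illustration under assumed inputs (a plain dict of field name to count, and a hypothetical helper `downsample_fields`), not the code from the PR.

```python
from typing import Dict, List


def downsample_fields(field_counts: Dict[str, int], max_fields: int) -> List[str]:
    """Keep the most frequent fields; break frequency ties by field name."""
    ranked = sorted(
        field_counts.items(),
        # Negated count sorts high frequencies first; the name orders ties.
        key=lambda item: (-item[1], item[0]),
    )
    return [name for name, _ in ranked[:max_fields]]
```

With this key, the result no longer depends on the order in which fields were first seen, so repeated runs over the same data produce the same schema.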
  • t

    thankful-sunset-61734

    01/12/2024, 10:22 AM
    Hi guys, any chance someone from the team could have a quick look at this open PR: https://github.com/datahub-project/datahub/pull/9212. We currently need 3 custom images (mae-consumer, datahub-upgrade and gms) because of it. Many thanks!
    plus1 1
    b
    • 2
    • 1
  • s

    straight-cricket-47915

    01/22/2024, 8:32 AM
    Hi guys! What should we do if some tests failed for network reasons and those tests are unrelated to the main scope of the PR's changes? For example, the PR fixes mssql data_flows & data_jobs, but the failed tests are Airflow-related. PR
    g
    • 2
    • 2
  • a

    adventurous-dawn-19232

    01/23/2024, 10:54 AM
    My requirement is to create a dataset for each database inside MySQL. Inside each database dataset, create separate table datasets with names like "table_name" (without the database prefix), where each table dataset contains all the fields for that specific table. Is this possible from a CSV file using Python code? Currently I am getting databasename.tablename.
    l
    • 2
    • 1
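As a starting point for the CSV side of this question, the sketch below groups fields by database and by bare table name. The column names `qualified_name` and `field` are assumptions about the CSV layout, and the sketch only prepares the data; how the result is emitted to DataHub is outside its scope.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List


def group_fields(csv_text: str) -> Dict[str, Dict[str, List[str]]]:
    """Map database -> table (without database prefix) -> list of fields."""
    grouped: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(io.StringIO(csv_text)):
        # "databasename.tablename" is split at the first dot only.
        database, _, table = row["qualified_name"].partition(".")
        grouped[database][table].append(row["field"])
    return {database: dict(tables) for database, tables in grouped.items()}
```

Each inner key is then the short table name that could be used for display, while the database name remains available as the parent.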
  • q

    quiet-television-68466

    01/24/2024, 3:04 PM
    Heya all, we (Checkout.com) built an access request feature through DataHub and were wondering if there's any interest in receiving a contribution from us on it? This is what we've done:
    1. Extended the metadata model to include aspects called accessRequest and accessRequests, where an accessRequest contains an auditStamp and a stringMap of additionalMetadata (similar to custom properties).
    2. Attached accessRequests to datasets, containers and dataProducts.
    3. Implemented a GraphQL mutation, addAccessRequest, which looks like this:
    mutation addAccessRequest {
      addAccessRequest(input: {
        resourceUrn: "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)"
        additionalMetadata: [
          {
            key: "why2"
            value: "I need it"
          },
          {
            key: "who"
            value: "The Data Platform group"
          }
        ]
      })
    }
    4. We also implemented a button that contains a form (image attached) which submits the addAccessRequest ticket. (This could likely be made more modular by someone else.)
    5. Finally, we have an action that takes tickets and assigns them to a freshservice queue based on the attached domain (but we could contribute a simpler version that just requires a group_id and freshservice_api key in the config, or as secrets).
    Not sure which of these components would be useful for other people, but I thought it would at least start a discussion!
    plus1 5
    blob excited 2
    💯 1
    👌 4
    b
    b
    +2
    • 5
    • 9
  • s

    square-mouse-38194

    01/26/2024, 5:28 PM
    Hello everyone, I reimplemented the (S3) delta-lake ingestion source to support Azure (az://, adl://, abfs://, and will provide support for Microsoft Fabric as well). Is there any interest in this code being contributed back to the project once it is ready? Thanks!
    plus1 1
    l
    h
    d
    • 4
    • 4
  • l

    limited-monitor-26855

    01/31/2024, 5:44 AM
    We have EntityType.INGESTION_SOURCE in the EntityType enum values, but a search operation over INGESTION_SOURCE returns the exception "Unknown entity type: INGESTION_SOURCE". Note: changing the type to any other EntityType returns correct results.
    query SearchIngestionSourcesWithoutFilter {
      search(input: {
        type: INGESTION_SOURCE,
        query: "*", 
        start: 0,
        count: 10
        
      }) {
        start,
        count,
        total: count,
        
        searchResults {
          entity {
            urn
          }
        }
      }
    }
    r
    • 2
    • 1
  • q

    quick-pizza-8906

    02/01/2024, 8:40 PM
    Hello, I am struggling to understand why my PR fails during linting: https://github.com/datahub-project/datahub/actions/runs/7747136631/job/21126913171?pr=9762 Considering that:
    1. I didn't touch the mentioned files or anything related to them
    2. They were last touched a long time ago
    3. Running the :metadata-ingestion:lint task succeeds locally
    Does anybody have any idea what is happening there? I notice master builds succeed...
    g
    • 2
    • 2
  • f

    fresh-petabyte-24461

    02/06/2024, 12:46 AM
    Hey! How do I trigger GitHub Actions? It seems to be getting stuck, as always.
    d
    • 2
    • 7
  • f

    fresh-petabyte-24461

    02/12/2024, 9:08 PM
    Hello, here is the PR for upgrading shiro-core due to a vulnerability. Please take a look. Thanks
    🙌 1
    ✅ 1
  • f

    fresh-petabyte-24461

    02/12/2024, 10:03 PM
    Hello, here is the PR for upgrading fastapi and gitdb due to a vulnerability. Please take a look. Thanks
    ✅ 1
    o
    • 2
    • 1
  • g

    gray-airplane-39227

    02/14/2024, 7:46 PM
    Hello, I have two improvement PRs out for review:
    1. Implement flattening of struct fields in DynamoDB ingestion; this would allow flattening fields for the Map attribute type in DynamoDB: https://github.com/datahub-project/datahub/pull/9852
    2. Improve sorting when downsampling collection schema; this would make the output consistent, in alphabetical order. Relatively small: https://github.com/datahub-project/datahub/pull/9856
    It would be greatly appreciated if the team could have a look 🙏🙏🙏
    ✅ 1
    r
    • 2
    • 3
  • r

    ripe-agency-8696

    02/15/2024, 4:17 AM
    Hi Team - can someone help review my PR? https://github.com/acryldata/datahub-helm/pull/429 This PR would help in adding custom labels to the datahub-frontend service.
    • 1
    • 1
  • d

    dry-raincoat-85182

    02/15/2024, 1:24 PM
    Hi Team, kindly review our PR introducing the new entity "Business Attribute" in DataHub.
    r
    r
    • 3
    • 4
  • a

    adamant-article-76582

    02/16/2024, 7:09 PM
    Hi team 👋, could you please review my PR about fixing various lineage problems for the Tableau entities?
  • m

    most-scientist-56654

    02/23/2024, 8:49 AM
    Hey folks 👋🏻! I found that people other than me are interested in something like enforceable views: functionality for limiting entity visibility/discoverability. At the company I work for, we're still evaluating DataHub and are also interested in having this feature, so we can dedicate some time to work on it. But before starting, I wanted to ask: is there something similar in the works or on the roadmap? Thanks. https://datahubspace.slack.com/archives/CV2UXSE9L/p1680054010543709 https://datahubspace.slack.com/archives/CV2UXSE9L/p1702598510793429
    b
    • 2
    • 3
  • l

    lemon-processor-68383

    02/26/2024, 9:03 PM
    Hi All. This is my first contribution to DataHub. Can anyone please review this PR and trigger the workflow? https://github.com/datahub-project/datahub/pull/9921 It adds support for JSONL files in the S3 / GCS source.
    r
    • 2
    • 2
  • s

    some-crowd-4662

    02/27/2024, 5:33 PM
    Hi all, @hundreds-photographer-13496: we need to update descriptions for multiple tables across databases and schemas. We can use the mutation API, but then I have to make a separate call for each record. Is there a way, or an endpoint, where we can supply URNs and descriptions in an array and have it update the descriptions for multiple tables?
    r
    • 2
    • 2
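One generic GraphQL technique that may help with the question above is aliasing: several mutation fields can be placed in a single request document, each under its own alias, so one HTTP call carries many updates. The sketch below only builds such a document. It assumes a mutation named `updateDescription` whose input takes `resourceUrn` and `description`; verify both against your instance's schema before relying on it. The helper name is hypothetical.

```python
import json
from typing import Dict


def build_batch_description_mutation(descriptions: Dict[str, str]) -> str:
    """Build one GraphQL document with an aliased mutation per URN."""
    lines = ["mutation batchUpdateDescriptions {"]
    for index, (urn, description) in enumerate(descriptions.items()):
        # json.dumps produces a double-quoted, escaped string literal,
        # used here as the GraphQL string value.
        lines.append(
            f"  u{index}: updateDescription(input: {{ "
            f"resourceUrn: {json.dumps(urn)}, "
            f"description: {json.dumps(description)} }})"
        )
    lines.append("}")
    return "\n".join(lines)
```

The server still applies each aliased field as its own update, so this reduces round trips rather than providing a transactional bulk endpoint.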
  • a

    average-vr-23088

    02/27/2024, 8:56 PM
    Hi. Is there some documented way to do local development for the DataHub Airflow plugin? I'm trying to spin up a local Airflow via Docker and mount the files for the Airflow plugin into the Airflow plugins directory. I'm generating the files after doing a python setup.py build on the datahub-airflow-plugin. Is this the right approach? I've also tried to install the necessary Python dependencies, but I'm getting errors from Airflow when it tries to load the plugin. I'm currently getting the following error:
    File "/usr/local/airflow/plugins/datahub_airflow_plugin/datahub_listener.py", line 19, in <module>
    aws-mwaa-local-runner-2_7-local-runner-1  |     from datahub.sql_parsing.sqlglot_lineage import SqlParsingResult
    aws-mwaa-local-runner-2_7-local-runner-1  | ModuleNotFoundError: No module named 'datahub.sql_parsing'
    This is despite me installing the acryl-datahub[sql-parser]==0.12.1.5 package. Any guidance would be much appreciated!
    g
    • 2
    • 7
  • a

    average-vr-23088

    02/27/2024, 9:29 PM
    Which Python package would I need to install to be able to import datahub.sql_parsing? I see that it's part of the main acryl-datahub, but for some reason I can't do that import when I install 0.12.1.5.
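One way to narrow down a ModuleNotFoundError like the one above is to ask the interpreter that Airflow actually runs which files and which installed distribution it sees. The sketch below uses only the standard library; the helper name `describe_module` is hypothetical, and it diagnoses rather than fixes the problem.

```python
import importlib.util
from importlib import metadata
from typing import Any, Dict


def describe_module(module_name: str, dist_name: str) -> Dict[str, Any]:
    """Report whether a module is importable and which distribution version is installed."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        spec = None  # the parent package itself is missing
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    return {
        "module": module_name,
        "importable": spec is not None,
        "origin": getattr(spec, "origin", None),
        "dist_version": version,
    }


if __name__ == "__main__":
    print(describe_module("datahub.sql_parsing", "acryl-datahub"))
    print(describe_module("datahub", "acryl-datahub"))
```

If `datahub` resolves to an unexpected path, or the reported version differs from the one that was installed, the plugin is probably being loaded by a different environment than the one where the package was installed.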
  • a

    able-evening-90828

    03/01/2024, 1:47 AM
    Could someone please review my PR below? It has been a week since I submitted. https://github.com/acryldata/datahub-helm/pull/431
    r
    • 2
    • 1
  • f

    faint-painting-38451

    03/05/2024, 6:00 PM
    We were thinking about adding a failed topic for the MAE consumer, but just had some questions before looking into that. Found that the following files for the MCE consumer have failed topics: https://github.com/datahub-project/datahub/blob/master/metadata-jobs/mce-consumer/[…]/com/linkedin/metadata/kafka/MetadataChangeEventsProcessor.java https://github.com/datahub-project/datahub/blob/master/metadata-jobs/mce-consumer/[…]m/linkedin/metadata/kafka/MetadataChangeProposalsProcessor.java However, this file for the MAE consumer doesn't send to a failed topic in the event of an error: https://github.com/datahub-project/datahub/blob/master/metadata-jobs/mae-consumer/[…]ava/com/linkedin/metadata/kafka/MetadataChangeLogProcessor.java Is there a reason that the MetadataChangeLogProcessor doesn't have a failed topic like the MCE consumer or is that something that we can look into?
    g
    r
    • 3
    • 2
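For readers unfamiliar with the "failed topic" idea discussed above, the general pattern can be sketched as follows: when handling a record raises, the record and the error are forwarded to a separate destination instead of being dropped. This is a language-neutral illustration in Python with hypothetical names (`consume`, `handle`, `send_failed`); it is not the code in the linked Java processors.

```python
from typing import Any, Callable, Iterable


def consume(
    records: Iterable[Any],
    handle: Callable[[Any], None],
    send_failed: Callable[[Any, str], None],
) -> int:
    """Process records; route each failing record to a failed destination."""
    failed = 0
    for record in records:
        try:
            handle(record)
        except Exception as exc:
            # Keep the original record together with the reason it failed,
            # so it can be inspected or replayed later.
            send_failed(record, repr(exc))
            failed += 1
    return failed
```

One design point worth settling early is whether a failure to write to the failed destination should itself stop the consumer, since that case is not handled in this sketch.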