Powered by Linen
contribute
  • l

    limited-library-89060

    01/11/2023, 9:20 AM
    Hi, could someone please review my PR https://github.com/datahub-project/datahub/pull/7007 (my first DataHub contribution 😄)? I just changed the platform event topic name to match the related Docker env.
  • w

    worried-branch-76677

    01/11/2023, 10:46 AM
    Hi Team, I created a PowerBI connector for PowerBI Admin only. Please review, feel free to discuss here or on the PR. https://github.com/datahub-project/datahub/pull/7009
  • b

    boundless-nail-65912

    01/11/2023, 3:57 PM
    Hi Team, could someone please review my PR? https://github.com/datahub-project/datahub/pull/7010 We have added Vertica as a source in the DataHub UI.
  • a

    agreeable-dentist-42022

    01/13/2023, 1:43 PM
    Hi everyone! I found a message in this channel from several months ago (https://datahubspace.slack.com/archives/C017W0NTZHR/p1669132636417829) about multi-tenancy support in DataHub. I have the same question: is it possible to extend DataHub's search with a tenantId (with models carrying this tenantId in their properties) so that data can be filtered by tenant?
  • e

    elegant-state-4

    01/20/2023, 2:10 PM
    I am finding contributing to Datahub to be very frustrating. I am unable to build it on my local machine and can’t seem to get any help building it despite reaching out to the community. This pours cold water over my excitement at using Datahub
  • g

    gentle-camera-33498

    01/25/2023, 2:36 PM
    Hello Everyone, Just a little contribution to extract BigQuery lineage from Data Catalog Lineage API https://github.com/datahub-project/datahub/pull/7137
  • c

    curved-planet-99787

    01/26/2023, 6:34 AM
    Does anyone know anything about the status of this endeavor? https://github.com/datahub-project/datahub/pull/6119
  • w

    witty-butcher-82399

    01/30/2023, 7:31 PM
    Hi all, yet another little contribution https://github.com/datahub-project/datahub/pull/7177
  • f

    fast-barista-4910

    02/02/2023, 10:35 AM
    I encountered an issue where our late binding views didn't have any lineage, although I couldn't see anything wrong with them. After a while I found out that those views were written with `WITH NO SCHEMA BINDING` (upper case) in the SQL, while the query for lineage building was case sensitive and matched lower case only. I changed the `like` to `ilike` and got it working normally. PR here: https://github.com/datahub-project/datahub/pull/7223
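The difference between the two operators can be sketched in a few lines of Python. This is only an illustration: the `sql_like` helper and the lowercase pattern below are assumptions made for the example, not the actual lineage query from the PR.

```python
import re


def sql_like(value: str, pattern: str, case_insensitive: bool = False) -> bool:
    """Emulate SQL LIKE, or ILIKE when case_insensitive=True.

    Supports the two standard wildcards: % (any run of characters)
    and _ (exactly one character).
    """
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    flags = re.DOTALL | (re.IGNORECASE if case_insensitive else 0)
    return re.fullmatch(regex, value, flags) is not None


# Hypothetical view definition and pattern, for illustration only.
view_ddl = "CREATE VIEW v AS SELECT * FROM t WITH NO SCHEMA BINDING"
pattern = "%with no schema binding%"

like_hit = sql_like(view_ddl, pattern)                          # case sensitive
ilike_hit = sql_like(view_ddl, pattern, case_insensitive=True)  # case insensitive
```

With an upper-case clause in the view definition, the case-sensitive match misses the view while the case-insensitive one finds it.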
  • g

    gentle-lifeguard-88494

    02/04/2023, 6:01 PM
    Hey I'm trying to push a one-line change, but I get a permissions issue when trying to push a new branch. I'm sure there's some process I'm not realizing here, this is the first time I've tried to contribute to an open-source project. There is just a small issue in the quickstart file:
    - curl -sS --fail 'http://localhost:9200/_cluster/health?wait_for_status=yellow&timeout=0s'
    || exit 1
    It's just a formatting issue: the '|| exit 1' is on a new line instead of being on the same line. Here is the GitHub issue: https://github.com/datahub-project/datahub/issues/7255
  • a

    adventurous-nightfall-90271

    02/13/2023, 12:17 AM
    Hey guys. I just created https://github.com/acryldata/datahub-helm/pull/264 to allow deployments to pass in a plaintext password for elasticsearch. I've given this a brief test and it looks to have worked.
  • g

    gentle-lifeguard-88494

    02/15/2023, 11:55 PM
    Hey everyone, I am looking to partner with someone to help me implement a new addition to the GraphQL API for the SQL profiling stats options. I want to add an option for distinct field sample values for low-cardinality fields. Below is some code I mocked up that I think would achieve this: I took the existing sample values code and adapted it to my use case. I would also be happy to work with someone to convert it so that it does not depend on Great Expectations. Anyway, I'm going to start diving into updating the GraphQL API following the instructions here: https://github.com/datahub-project/datahub/tree/master/datahub-graphql-core. I've never contributed to an open source project before, so I definitely need some help navigating the codebase. If anyone wants to help me build this out, I would love to learn. Thanks!
    file: datahub/ingestion/source/ge_data_profiler.py (added at line 510)
        @_run_with_query_combiner
        def _get_dataset_column_distinct_values(
            self,
            column_profile: DatasetFieldProfileClass,
            column: str,
            unique_count: int,
            nonnull_count: int,
        ) -> None:
            # Only collect distinct values for low-cardinality columns.
            if not self.config.include_field_distinct_values or unique_count > 25:
                return

            try:
                # TODO do this without GE
                self.dataset.set_config_value("interactive_evaluation", True)

                # Check for distinct values in ever larger increments
                pct_dataset = [0.01, 0.05, 0.10, 0.25, 0.5, 1]

                for pct in pct_dataset:
                    # Use a whole number of rows (at least one) for the sample size
                    samples_to_check = max(1, int(nonnull_count * pct))

                    # An empty expected set makes every non-null value "unexpected",
                    # so the partial list doubles as a sample of the column values.
                    res = self.dataset.expect_column_values_to_be_in_set(
                        column,
                        [],
                        result_format={
                            "result_format": "SUMMARY",
                            "partial_unexpected_count": samples_to_check,
                        },
                    ).result

                    # Get the distinct values
                    distinct_values = [*set(res["partial_unexpected_list"])]

                    if len(distinct_values) == unique_count:
                        # Store the de-duplicated values, not the raw sample
                        column_profile.distinctValues = [str(v) for v in distinct_values]
                        # Exit loop if the distinct values are all captured
                        break

            except Exception as e:
                logger.debug(
                    f"Caught exception while attempting to get distinct values for column {column}. {e}"
                )
                self.report.report_warning(
                    "Profiling - Unable to get column distinct values",
                    f"{self.dataset_name}.{column}",
                )
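For the "do this without GE" TODO, the incremental sampling idea can be sketched on a plain in-memory sequence. This is a minimal sketch under stated assumptions: `distinct_values_by_increments` and its parameters are hypothetical names, and a real profiler would issue sampled SQL queries rather than slice a list.

```python
from typing import Hashable, List, Optional, Sequence


def distinct_values_by_increments(
    values: Sequence[Optional[Hashable]],
    unique_count: int,
    max_cardinality: int = 25,
    fractions: Sequence[float] = (0.01, 0.05, 0.10, 0.25, 0.5, 1.0),
) -> Optional[List[str]]:
    """Return the distinct non-null values as strings, or None if not captured.

    Progressively larger prefixes of the data are inspected, and the search
    stops as soon as the number of distinct values seen equals unique_count.
    """
    if unique_count > max_cardinality:
        return None  # not a low-cardinality column, skip it

    nonnull = [v for v in values if v is not None]
    for fraction in fractions:
        sample_size = max(1, int(len(nonnull) * fraction))
        distinct = set(nonnull[:sample_size])
        if len(distinct) == unique_count:
            return sorted(str(v) for v in distinct)
    return None


# Hypothetical column with three distinct non-null values.
rows = ["a", "b", "a", None, "c", "b"] * 50
result = distinct_values_by_increments(rows, unique_count=3)
```

Sorting the result keeps the output stable between runs, since set iteration order is not guaranteed.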
  • r

    refined-energy-76018

    02/16/2023, 12:46 AM
    hi, I would like to contribute as a corporate employee. Part of our internal open source approval process requires us to provide a copy of the Contributor License Agreement (CLA) for review if applicable. Is there any CLA for Datahub?
  • b

    best-umbrella-88325

    02/23/2023, 6:36 AM
    Hey guys! I created PR https://github.com/datahub-project/datahub/pull/7410 to enhance S3 ingestion for data stored in partitions. Currently, even if there is a file in the latest partition (day, month, or year), the first file in the oldest partition gets picked. Because of this limitation, schema updates in the latest files are not visible in DataHub. We are trying to improve this via this PR. I'm not sure whom to add in the reviewers section.
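The idea of preferring the newest partition can be sketched as follows. This is an illustration only: the `year=/month=/day=` path layout and the helper names are assumptions, and the PR's actual implementation is not shown in this thread.

```python
import re
from typing import Iterable, Optional, Tuple

# Assumed Hive-style partition layout: .../year=YYYY/month=MM/day=DD/file
PARTITION_RE = re.compile(r"year=(\d{4})/month=(\d{1,2})/day=(\d{1,2})/")


def partition_key(path: str) -> Optional[Tuple[int, int, int]]:
    """Extract (year, month, day) from a partitioned path, or None if absent."""
    match = PARTITION_RE.search(path)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    return year, month, day


def pick_schema_file(paths: Iterable[str]) -> Optional[str]:
    """Pick a file from the most recent partition for schema inference."""
    keyed = [(partition_key(p), p) for p in paths]
    keyed = [(key, p) for key, p in keyed if key is not None]
    if not keyed:
        return None
    # Tuples compare by partition date first, then by path as a tie-breaker.
    return max(keyed)[1]


# Hypothetical object keys.
paths = [
    "s3://bucket/events/year=2022/month=12/day=31/part-0.parquet",
    "s3://bucket/events/year=2023/month=02/day=01/part-0.parquet",
    "s3://bucket/events/year=2023/month=01/day=15/part-0.parquet",
]
latest = pick_schema_file(paths)
```

Comparing integer tuples avoids the pitfalls of comparing the raw path strings, where `month=2` would sort after `month=12`.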
  • b

    big-postman-38407

    02/28/2023, 9:52 AM
    Hello! I’m looking for a way to implement changes for the problem described here. I checked the links to posts provided by the maintenance team and didn’t find the answer I was looking for there. We want relationship changes between terms to be reflected in both directions so that we can track them and not lose them (as in Jira, for instance: when we link one task to another, we see the connection in both tasks without any additional manual work). Is there a way to implement this change? I also found similar requests that may be connected to the one I’m trying to solve:
    • List inbound relationships to Glossary Terms
    • Missing functionality to display inherited terms through the main search bar
  • h

    hallowed-lizard-92381

    03/01/2023, 8:58 PM
    One liner contribution that fixes a dbt_core ingestion edge case. cc @gray-shoe-75895
  • q

    quiet-jelly-11365

    03/03/2023, 11:40 AM
    https://github.com/datahub-project/datahub/pull/7487 -> One-line contribution that fixes AWS role-based authentication for delta-lake ingestion.
  • n

    nutritious-bird-77396

    03/03/2023, 11:50 PM
    https://github.com/datahub-project/datahub/pull/7476 -> Small PR to use estimated row counts in profiling for Postgres
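One common way to estimate a table's row count in Postgres is to read the planner statistic `reltuples` from the `pg_class` catalog instead of running `COUNT(*)`. Whether the PR does exactly this is not stated here; the sketch below only builds such a query, and `estimated_row_count_query` is a hypothetical helper name.

```python
from typing import Tuple


def estimated_row_count_query(schema: str, table: str) -> Tuple[str, Tuple[str, str]]:
    """Build a parameterized query for Postgres' row-count estimate.

    reltuples is maintained by VACUUM/ANALYZE, so the value is approximate
    and can be stale (or negative on newer versions if never analyzed).
    The %s placeholders follow the style used by psycopg2-like drivers.
    """
    sql = (
        "SELECT c.reltuples::bigint AS estimated_rows "
        "FROM pg_class c "
        "JOIN pg_namespace n ON n.oid = c.relnamespace "
        "WHERE n.nspname = %s AND c.relname = %s"
    )
    return sql, (schema, table)


# Hypothetical schema and table names.
sql, params = estimated_row_count_query("public", "orders")
```

The pair can be passed to a DB-API cursor as `cursor.execute(sql, params)`, which keeps the identifiers out of the SQL string itself.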
  • s

    shy-keyboard-55519

    03/06/2023, 12:16 PM
    Hi, can someone review this PR and reopen this issue? https://github.com/acryldata/datahub-helm/pull/270 https://github.com/acryldata/datahub-helm/issues/261
  • b

    blue-engineer-74605

    03/06/2023, 5:38 PM
    Hey Folks! I’m working with Superset Metadata extraction - updating it to Superset 2.0, but I’m lost and also have some questions, anyone available for a chat? I’m willing to open a PR too.
  • c

    cold-book-93720

    03/07/2023, 3:22 PM
    Hi there, I pushed a temp fix for an error that's happening on SSO with `v0.10.0`: https://github.com/datahub-project/datahub/pull/7512 (more on that here)
  • f

    fancy-oil-68203

    03/07/2023, 11:43 PM
    @astonishing-answer-96712 added a workflow to this channel: *Community Support Bot*.
  • w

    worried-branch-76677

    03/09/2023, 10:58 AM
    https://github.com/datahub-project/datahub/pull/7519 Please help review my PR 🙂 It has a lot of goodies for the PowerBI connector.
  • s

    silly-fish-85029

    03/10/2023, 9:53 AM
    https://github.com/datahub-project/datahub/pull/7514 Please review my PR, this PR adds a feature to support path_specs of different S3 buckets in a recipe
  • a

    adamant-article-76582

    03/13/2023, 4:02 PM
    https://github.com/datahub-project/datahub/pull/7559 Hi 👋, could you review my first PR please? It re-enables the batchDelayMs parameter: the value was still being passed as an argument, but its usage had been removed.
  • a

    acoustic-quill-54426

    03/13/2023, 4:05 PM
    👋 here is a PR adding support to parsing lineage from tableau custom SQL tables
  • s

    shy-dog-84302

    03/15/2023, 5:57 PM
    Hi! Here is a minor fix to the values.yaml file in datahub-helm repository. Can someone take time to review this?
  • a

    astonishing-cartoon-6079

    03/16/2023, 6:25 AM
    https://github.com/datahub-project/datahub/pull/7539 Please review my PR. It adds jattach to the Docker image to help debug performance issues in production environments.
  • m

    mysterious-monkey-71931

    03/22/2023, 11:46 AM
    Hello, I'd like to use DataHub with OpenSearch because of the ES license. However, I got the error below. After some investigation, I found that `mapping types` were completely removed in OpenSearch 2.0 and in Elasticsearch 8.0 as well. So my question is: is there any plan to support OpenSearch and Elasticsearch 8.x?
  • f

    flat-engineer-75197

    03/22/2023, 1:53 PM
    https://github.com/datahub-project/datahub/pull/7639 👋 PR to allow the Glue recipe to ignore resource links (a type of cross account database). First raised in this Slack thread.