# getting-started
  • swift-plastic-79414

    10/14/2022, 4:33 PM
    Hello, all! I understand that the best way to enrich metadata is through a transform at ingestion. However, is there any way to bulk update datasets once they've been ingested? Or do I need to rerun ingestion with a transformer any time I want to enrich the metadata? Thanks in advance!
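A hedged sketch of one way to bulk-enrich already-ingested datasets without re-running ingestion: GMS exposes an `/aspects?action=ingestProposal` REST endpoint (the Python SDK's `DatahubRestEmitter` wraps the same mechanism), so you can loop over URNs and upsert an aspect such as `globalTags`. The dataset URN, tag, and `localhost:8080` address below are illustrative assumptions, not values from this thread:

```python
import json
import urllib.request

GMS = "http://localhost:8080"  # assumption: default quickstart GMS address

def tag_proposal(dataset_urn: str, tag_urns: list) -> dict:
    """Build an ingestProposal payload that upserts the globalTags aspect."""
    aspect = {"tags": [{"tag": t} for t in tag_urns]}
    return {
        "proposal": {
            "entityType": "dataset",
            "entityUrn": dataset_urn,
            "changeType": "UPSERT",
            "aspectName": "globalTags",
            "aspect": {
                "value": json.dumps(aspect),
                "contentType": "application/json",
            },
        }
    }

def post_proposal(payload: dict) -> None:
    """POST one proposal to GMS (requires a running DataHub instance)."""
    req = urllib.request.Request(
        f"{GMS}/aspects?action=ingestProposal",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "X-RestLi-Protocol-Version": "2.0.0",
        },
    )
    urllib.request.urlopen(req)

# Loop over many URNs to bulk-apply a tag without re-running ingestion:
payload = tag_proposal(
    "urn:li:dataset:(urn:li:dataPlatform:hive,db.users,PROD)",  # hypothetical URN
    ["urn:li:tag:PII"],  # hypothetical tag
)
```

Note this upserts the whole aspect, so merge with the existing value first if you need to preserve tags applied elsewhere.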
  • chilly-scientist-91160

    10/14/2022, 10:44 PM
    Hi, I've set up DataHub and it's running (yay 🙂). I've tried some of the ingestion sources and they work (yay 🙂). Now I am trying to ingest a typical JSON schema through the API, but I can't get it to work. Actually, I am getting lost in the POST schema and am not sure what to put where. Is there a simple example available for ingesting a simple JSON schema (somehow, through the API or another source)?
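As a hedged sketch (not an official recipe): one way to get a plain JSON schema into DataHub is to translate its `properties` into the `fields` list of a `schemaMetadata` aspect and emit that aspect over the REST API or the Python SDK. The type mapping and flattening below are minimal assumptions, handling only top-level, non-nested properties:

```python
# Illustrative mapping from JSON-schema primitive types to DataHub's
# pegasus type records (com.linkedin.schema.*Type).
TYPE_MAP = {
    "string": "com.linkedin.schema.StringType",
    "integer": "com.linkedin.schema.NumberType",
    "number": "com.linkedin.schema.NumberType",
    "boolean": "com.linkedin.schema.BooleanType",
    "object": "com.linkedin.schema.RecordType",
    "array": "com.linkedin.schema.ArrayType",
}

def json_schema_to_fields(schema: dict) -> list:
    """Flatten top-level properties of a JSON schema into SchemaField dicts."""
    fields = []
    for name, prop in schema.get("properties", {}).items():
        js_type = prop.get("type", "string")
        fields.append({
            "fieldPath": name,
            "nativeDataType": js_type,
            "type": {"type": {TYPE_MAP.get(js_type, "com.linkedin.schema.StringType"): {}}},
            "description": prop.get("description", ""),
            "nullable": name not in schema.get("required", []),
        })
    return fields

example = {
    "type": "object",
    "required": ["id"],
    "properties": {
        "id": {"type": "integer", "description": "Primary key"},
        "email": {"type": "string"},
    },
}
fields = json_schema_to_fields(example)
# These dicts slot into the "fields" list of a schemaMetadata aspect,
# which can then be POSTed to GMS or emitted via the Python SDK.
```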
  • thankful-kite-1198

    10/15/2022, 2:11 PM
    hello everyone! I tried to load an example business glossary using the CLI; the following YAML file was loaded:
  • thankful-kite-1198

    10/15/2022, 2:11 PM
    version: 1
    source: DataHub
    nodes:
      - name: Classification
        description: A set of terms related to Data Classification
        terms:
          - name: Sensitive
            description: Sensitive Data
            custom_properties:
              is_confidential: false
          - name: Confidential
            description: Confidential Data
            custom_properties:
              is_confidential: true
          - name: HighlyConfidential
            description: Highly Confidential Data
  • thankful-kite-1198

    10/15/2022, 2:11 PM
    I got the following errors:
  • thankful-kite-1198

    10/15/2022, 2:12 PM
    image.png
  • thankful-kite-1198

    10/15/2022, 2:12 PM
    Could anyone tell me what I did wrong?
  • thankful-kite-1198

    10/15/2022, 7:11 PM
    Hi, everyone. I tried to load glossary terms and link them to each other via the CLI. I put "contains" and "inherits" in my YAML file, but in the UI I found that instead of linking one term to another, DataHub just created two different terms: one that is linked to my main term (but has no description and no properties in place), and another (which I described and filled properties in for) that is not linked to my main term but has the same name as the linked one. How should I reference terms in the "contains" and "inherits" statements so that terms link to each other without DataHub creating new terms itself?
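For reference, a hedged sketch of how term-to-term links are usually written in the business-glossary YAML: `inherits` and `contains` take a list of dotted `Node.Term` paths that must resolve exactly to terms declared elsewhere in the file (the `PersonalData` term below is an invented example; if a path does not match an existing term exactly, DataHub may create a new empty term, which matches the behaviour described above).

```yaml
version: 1
source: DataHub
nodes:
  - name: Classification
    description: A set of terms related to Data Classification
    terms:
      - name: Sensitive
        description: Sensitive Data
      - name: PersonalData           # invented example term
        description: Data about individuals
        inherits:
          - Classification.Sensitive # dotted path to the existing term above
```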
  • chilly-scientist-91160

    10/17/2022, 7:31 AM
    Hello again! Meanwhile, I am doing a PoC to show our organisation what DataHub can do; it's working well. I have a question about ownership: is it true that "owners" can only be registered users with access to DataHub? We have data sources from external companies which we would like to mark as owners, but without granting them access.
  • helpful-nightfall-24608

    10/17/2022, 9:20 AM
    👋 Hi everyone!
  • helpful-nightfall-24608

    10/17/2022, 9:23 AM
    I'm interested in adding custom lineage points for our data platform. From the docs it appears that I need to publish/push this metadata from outside of DataHub (e.g. from a scheduled script/task). Is there any way to have this work as a pull mechanism from DataHub itself?
  • helpful-nightfall-24608

    10/17/2022, 9:23 AM
    referring to lineage specifically here
  • kind-dawn-17532

    10/17/2022, 2:29 PM
    Hi Everyone! Is there a way to only reindex specific Elasticsearch indices rather than everything in Datahub?
  • quaint-potato-16935

    10/18/2022, 12:56 PM
    Hi everyone, we have a few schemas in our Schema Registry. Is there any way we can ingest those schemas into DataHub without creating Kafka topics? We will create and ingest data into the Kafka topics later on, but we plan to create a dataset with the schema definition in place first.
  • microscopic-restaurant-56474

    10/19/2022, 12:27 AM
    X-posting this here in case this is a better spot. Thanks! https://datahubspace.slack.com/archives/CV2UXSE9L/p1666138884735629
  • chilly-scientist-91160

    10/19/2022, 6:57 AM
    Hi, one last thing I don't yet understand: in the repo I see a lot of mappers (SchemaMapper.java, DataSetMapper.java) https://github.com/datahub-project/datahub/tree/55357783f330950408e4624b3f1421594c[…]rc/main/java/com/linkedin/datahub/graphql/types/dataset/mappers and also definitions for schema types (BinaryJsonSchema, ParquetSchema, etc.) https://github.com/datahub-project/datahub/tree/55357783f330950408e4624b3f1421594c98e3bc/metadata-models/src/main/pegasus/com/linkedin/schema It looks like there is already out-of-the-box conversion available from various data sources into entities. My question is: do we already have a simple way to map an actual JSON schema to the SchemaMetadata of a DataSet entity? If so, could you provide a link to an example? This would really bootstrap our adoption. (Currently I've built the jsonschema -> SchemaMetadata mapper myself, but looking at the project I expect there is already something out there.)
  • average-dinner-25106

    10/19/2022, 9:30 AM
    Hi, I have one question. Can datahub provide information about tables/columns that are joined frequently with the target table? If possible, what is the name of that function and how can I activate this?
  • brave-tomato-16287

    10/19/2022, 10:20 AM
    Hello all! Is it possible to filter by words in raw SQL queries? For example, filter all Tableau datasets that contain "from user_account" in "custom sql query". In the UI we can see this query as
    Definition
  • brief-toothbrush-55766

    10/19/2022, 12:00 PM
    This filter used to work:
    {
      search(input: {type: DATASET, query: "*", start: 0, count: 100, filters: {field: "owners", value: "urn:li:corpuser:5fe30ef289188b1a76ee1677"}}) {
        start
        count
        total
        searchResults {
         
        }
      }
    }
  • brief-toothbrush-55766

    10/19/2022, 12:04 PM
    However, now we get issues with the filters... something like
    filters is not defined by SearchInput ...Did you mean orFilter
  • brief-toothbrush-55766

    10/19/2022, 12:05 PM
    But I can't find any reference on how orFilter needs to be used
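For what it's worth, a hedged sketch of the newer shape: `filters` appears to have been replaced by `orFilters`, a list of clauses that are OR'd together, each holding an `and` list of facet filters, with the single `value` replaced by a `values` list. Worth validating against your version's GraphiQL schema before relying on it:

```graphql
{
  search(
    input: {
      type: DATASET
      query: "*"
      start: 0
      count: 100
      orFilters: [
        {and: [{field: "owners", values: ["urn:li:corpuser:5fe30ef289188b1a76ee1677"]}]}
      ]
    }
  ) {
    start
    count
    total
    searchResults {
      entity {
        urn
      }
    }
  }
}
```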
  • victorious-vegetable-86298

    10/19/2022, 1:40 PM
    Hey, I have a general question. I was looking at using datahub to make a bunch of datasets consisting of non-tabular files (life sciences: e.g. think images or DNA bases) more discoverable and accessible. I was wondering does datahub have a way to basically have a folder as a source? I wasn't sure if https://datahubproject.io/docs/generated/ingestion/sources/file was that?
  • astonishing-lizard-90580

    10/19/2022, 4:45 PM
    Hi folks, I was doing some more testing for our org -- after deleting a few tables in BigQuery and re-running the ingestion (a number of times..) the tables are still showing in Datahub. I'm on version 0.9.0 Is this a known issue or should the tables be soft deleted as expected? Thanks!
  • victorious-farmer-91004

    10/20/2022, 3:40 AM
    Hi everyone, we deployed DataHub on GKE using Kubernetes, but while configuring Google SSO I was not able to complete the procedure. I need to add env values to datahub-frontend, but since I am deploying with Helm I can't find where to add the values in values.yaml. Can anyone help me with this?
  • quiet-wolf-56299

    10/20/2022, 2:28 PM
    Can someone quickly explain the difference between JAAS and Native authentication with respect to datahub?
  • glamorous-lion-94745

    10/20/2022, 5:26 PM
    Hello everyone! I'm trying to deploy DataHub on AWS following the instructions here: https://datahubproject.io/docs/deploy/aws/ -> deployed the cluster and installed everything on my machine; https://datahubproject.io/docs/deploy/kubernetes/ -> created secrets, created the repo, and tried to install the prerequisites. After trying to install the prerequisites, prerequisites-cp-schema-registry keeps going into CrashLoopBackOff and restarting by itself, and the other prerequisites stay in the "Pending" state. Can someone help me debug this? I'm not very familiar with Kubernetes. kubectl get pods screenshot:
  • happy-gigabyte-50024

    10/20/2022, 10:39 PM
    We are evaluating DataHub within our company. One feature we are looking for is being able to see the unique values in a field (categories) and then define what those values mean. Do you know if we can do something like that in DataHub? I see a profiling feature, but it only shows up to 3 unique values and I can't seem to add meaning to the values.
  • red-teacher-23135

    10/21/2022, 9:31 AM
    Hi peeps 👋 We are using Helm charts to deploy DataHub, and we have metadata_service_authentication enabled. I'm trying to deploy datahub-ingestion-cron in conjunction with DataHub GMS and the other DataHub services, so that my cron jobs are configured at the same time DataHub is deployed. But because metadata authentication is enabled, it seems that I need a Personal Access Token in my recipes, which must be generated through an existing valid user (like the root 'datahub' user). So, does this mean that I shouldn't have the cron jobs and DataHub deployed at the same time? Is there a way for me to generate an access token for the ingestion recipes without having to log in and generate a token beforehand?
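One hedged pattern for this (service names and env-var names below are assumptions): generate a token once as an admin (UI: Settings -> Access Tokens), store it in a Kubernetes Secret, expose it to the cron pod as an environment variable, and reference it from the recipe's sink, so the deployment itself never embeds the raw token:

```yaml
source:
  type: postgres                              # whichever source the cron job ingests
  config: {}
sink:
  type: datahub-rest
  config:
    server: http://datahub-datahub-gms:8080   # assumed in-cluster GMS service name
    token: ${DATAHUB_ACCESS_TOKEN}            # injected from a k8s Secret into the cron pod
```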
  • miniature-plastic-43224

    10/21/2022, 5:30 PM
    Hi All, quick question, how do I check the exact version of datahub I currently have? I mean, by looking at datahub project files, assuming datahub is not running anywhere.
  • wonderful-egg-79350

    10/24/2022, 5:32 AM
    Hello all. I have a question about changing the root password. How do I change the datahub (root account) password in a Docker environment?
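A hedged sketch for the Docker case, based on the docs' user.props mechanism (paths assumed, so verify against your version): native-auth passwords for the root `datahub` user live in a `user.props` file read by datahub-frontend, so one approach is to mount a replacement and restart the container:

```yaml
# docker-compose override (sketch): mount a custom user.props containing
# a line like `datahub:my-new-password`
services:
  datahub-frontend-react:
    volumes:
      - ./custom-user.props:/datahub-frontend/conf/user.props
```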