# getting-started

    great-cpu-72376

    05/16/2022, 2:32 PM
    Hi, I am reading the DataHub docs to understand the architecture, but I cannot find anything about datahub-actions. What does this component do? Does it perform ingestion? I would like to deploy some pip modules to support metadata ingestion for other systems such as Apache Superset. Should I install the modules in datahub-actions, datahub-gms, and datahub-frontend, or should I modify the Dockerfiles of all of these components?

    delightful-barista-90363

    05/16/2022, 8:53 PM
    Hello, I am setting up DataHub using AWS managed services, including RDS. Is it possible to use IAM-based authentication for the RDS/database instance, or is username/password auth required?

    late-zoo-31017

    05/16/2022, 9:51 PM
    Hi all. I am looking for a nice (i.e. visually appealing) example or two to demonstrate data lineage to my group. Do you have anything I can ingest into my local DataHub to show people?

    bored-dress-52175

    05/17/2022, 2:34 AM
    I am trying to make a new entity in https://github.com/datahub-project/datahub/tree/master/metadata-models-custom ; the code of the entity registry and custom aspect is shown below. After running the ../gradlew -PprojVersion=0.0.1 build and ../gradlew -PprojVersion=0.0.1 install commands, I get an error, which is shown in the photo below.

    shy-refrigerator-3266

    05/17/2022, 9:10 AM
    Hi All, new to DataHub here. I am trying to generate a personal access token for programmatic access to the provided endpoints. I have followed this guide and set AUTH_POLICIES_ENABLED=true in datahub-gms in my docker-compose.quickstart.yml. However, the settings page in the UI still shows the error "Token based authentication is currently disabled. Contact your DataHub administrator to enable this feature." Am I doing something wrong here?

    abundant-receptionist-6114

    05/17/2022, 1:33 PM
    Hi All, we are thinking about how to visualize the data flow between microservices in our product. Our current idea is to write a custom plugin that ingests lineage between tables via a JSON config. Is there a better approach for doing this?
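    A minimal sketch (not from the thread) of the JSON-config idea using the acryl-datahub Python emitter, assuming a recent version of the package; the config file name and fields, the platform, and the server URL are all placeholders:

    import json

    from datahub.emitter.mce_builder import make_dataset_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        DatasetLineageTypeClass,
        UpstreamClass,
        UpstreamLineageClass,
    )

    emitter = DatahubRestEmitter("http://localhost:8080")

    # Hypothetical config shape:
    # {"downstream": "db.schema.target", "upstreams": ["db.schema.src_a", "db.schema.src_b"]}
    with open("lineage.json") as f:
        cfg = json.load(f)

    upstreams = [
        UpstreamClass(
            dataset=make_dataset_urn("postgres", name, "PROD"),
            type=DatasetLineageTypeClass.TRANSFORMED,
        )
        for name in cfg["upstreams"]
    ]

    # Emit the upstreamLineage aspect for the downstream dataset.
    emitter.emit(
        MetadataChangeProposalWrapper(
            entityUrn=make_dataset_urn("postgres", cfg["downstream"], "PROD"),
            aspect=UpstreamLineageClass(upstreams=upstreams),
        )
    )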

    alert-football-80212

    05/17/2022, 1:56 PM
    Hi all, is there a datahub CLI command to delete everything from a specific platform, e.g. s3? Thank you

    cool-actor-73767

    05/17/2022, 2:30 PM
    Hi All, I cloned the repo and, without changing anything, tried to build the project using the command ./gradlew build, but I'm receiving the error "task :docs-website:generateGraphQLSchema FAILED". Does someone know what is happening?

    great-cpu-72376

    05/18/2022, 1:41 PM
    Hi, I am trying to understand tags, domains, and glossaries. I noticed that there are transformers to associate tags with elements, such as all the tables of a database, but there is no transformer to associate a domain. From the UI I can add a domain to individual entities such as databases and tables, but how can I associate a domain with each table of a database? If I associate the domain with the database, it is associated only with the database entity and not with all the tables contained in it. Does a way exist?
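    One scripted workaround (a sketch, not an official transformer) is to emit the domains aspect for each table with the acryl-datahub Python emitter; the domain, platform, and table names below are hypothetical:

    from datahub.emitter.mce_builder import make_dataset_urn, make_domain_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import DomainsClass

    emitter = DatahubRestEmitter("http://localhost:8080")
    domain_urn = make_domain_urn("marketing")  # hypothetical domain

    # Apply the domain to each table of the database, one aspect per dataset.
    for table in ["mydb.public.orders", "mydb.public.customers"]:
        emitter.emit(
            MetadataChangeProposalWrapper(
                entityUrn=make_dataset_urn("postgres", table, "PROD"),
                aspect=DomainsClass(domains=[domain_urn]),
            )
        )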

    best-wolf-3369

    05/18/2022, 2:34 PM
    Hi all, I am trying to create a GlossaryTerm using the Rest.li API and it's almost working: the item is created, but the camelCase notation used in the name is not respected and everything is lowercased. This problem is not present when using the YAML ingestion method, but we need to use the Rest.li API in our development. Could you provide some insight? Thank you very much.
    import requests
    import json

    url = "http://host:port/entities?action=ingest"

    payload = json.dumps({
      "entity": {
        "value": {
          "com.linkedin.metadata.snapshot.GlossaryTermSnapshot": {
            "urn": "urn:li:glossaryTerm:camelCaseObject",
            "aspects": [
              {
                "com.linkedin.glossary.GlossaryTermInfo": {
                  "definition": "Object definition",
                  "parentNode": "urn:li:glossaryTerm:camelCaseObjectParent",
                  "sourceRef": "DataHub",
                  "sourceUrl": "https://github.com/linkedin/datahub/",
                  "termSource": "INTERNAL"
                }
              }
            ]
          }
        }
      }
    })
    headers = {
      'Content-Type': 'application/json'
    }

    response = requests.request("POST", url, headers=headers, data=payload)

    great-cpu-72376

    05/18/2022, 3:05 PM
    I am trying to execute a file ingestion using the following recipe:
    source:
        type: file
        config:
            filename: '/home/afmul/PRODUCTION/datasetAnalysis/positioning/RAW/positioningRaw.csv'
    sink:
        type: datahub-rest
        config:
        server: 'http://localhost:9090'
    I try to execute the ingestion using the command:
    datahub ingest run -c ingestion_file_receipe.yml
    But I receive a big error:
    ---- (full traceback above) ----
    File "/home/afmul/linkedin-datahub/lib/python3.8/site-packages/datahub/entrypoints.py", line 149, in main
        sys.exit(datahub(standalone_mode=False, **kwargs))
    File "/home/afmul/linkedin-datahub/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
        return self.main(*args, **kwargs)
    File "/home/afmul/linkedin-datahub/lib/python3.8/site-packages/click/core.py", line 1055, in main
        rv = self.invoke(ctx)
    File "/home/afmul/linkedin-datahub/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
    File "/home/afmul/linkedin-datahub/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
    File "/home/afmul/linkedin-datahub/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
        return ctx.invoke(self.callback, **ctx.params)
    File "/home/afmul/linkedin-datahub/lib/python3.8/site-packages/click/core.py", line 760, in invoke
        return __callback(*args, **kwargs)
    File "/home/afmul/linkedin-datahub/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
        return f(get_current_context(), *args, **kwargs)
    File "/home/afmul/linkedin-datahub/lib/python3.8/site-packages/datahub/telemetry/telemetry.py", line 317, in wrapper
        raise e
    File "/home/afmul/linkedin-datahub/lib/python3.8/site-packages/datahub/telemetry/telemetry.py", line 269, in wrapper
        res = func(*args, **kwargs)
    File "/home/afmul/linkedin-datahub/lib/python3.8/site-packages/datahub/utilities/memory_leak_detector.py", line 102, in wrapper
        res = func(*args, **kwargs)
    File "/home/afmul/linkedin-datahub/lib/python3.8/site-packages/datahub/cli/ingest_cli.py", line 128, in run
        raise e
    File "/home/afmul/linkedin-datahub/lib/python3.8/site-packages/datahub/cli/ingest_cli.py", line 114, in run
        pipeline.run()
    File "/home/afmul/linkedin-datahub/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 214, in run
        for wu in itertools.islice(
    File "/home/afmul/linkedin-datahub/lib/python3.8/site-packages/datahub/ingestion/source/file.py", line 77, in get_workunits
        for i, obj in enumerate(iterate_generic_file(self.config.filename)):
    File "/home/afmul/linkedin-datahub/lib/python3.8/site-packages/datahub/ingestion/source/file.py", line 42, in iterate_generic_file
        for i, obj in enumerate(_iterate_file(path)):
    File "/home/afmul/linkedin-datahub/lib/python3.8/site-packages/datahub/ingestion/source/file.py", line 25, in _iterate_file
        obj_list = json.load(f)
    File "/usr/lib/python3.8/json/__init__.py", line 293, in load
        return loads(fp.read(),
    File "/usr/lib/python3.8/json/__init__.py", line 357, in loads
        return _default_decoder.decode(s)
    File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
        obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    File "/usr/lib/python3.8/json/decoder.py", line 355, in raw_decode
        raise JSONDecodeError("Expecting value", s, err.value) from None
    What does it mean?
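    For context: the traceback shows the file source handing the input straight to json.load, so it expects a JSON file of metadata events (for example, one written by a file sink), not a CSV. A quick check that reproduces the error:

    import json

    # The `file` source json.load()s its input, so a CSV fails with
    # JSONDecodeError("Expecting value"), exactly as in the traceback above.
    with open("positioningRaw.csv") as f:
        try:
            json.load(f)
        except json.JSONDecodeError as e:
            print(f"Not valid JSON: {e}")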

    calm-jackal-26275

    05/18/2022, 6:08 PM
    Hi, I am trying to figure out how to add new terms to the glossary. I have DataHub quickstart running locally and have tried ingesting this business_glossary.yml with
    datahub ingest -c business_glossary.yml
    but I get some validation errors:
    [2022-05-18 14:02:49,315] INFO     {datahub.cli.ingest_cli:96} - DataHub CLI version: 0.8.34.2
    5 validation errors for PipelineConfig
    source
      value is not a valid dict (type=type_error.dict)
    nodes
      extra fields not permitted (type=value_error.extra)
    owners
      extra fields not permitted (type=value_error.extra)
    url
      extra fields not permitted (type=value_error.extra)
    version
      extra fields not permitted (type=value_error.extra)

    cool-actor-73767

    05/18/2022, 11:41 PM
    Hi Guys! When I try to build the project I receive this error message:
    Task :metadata-ingestion:docGen FAILED
    Caching disabled for task ':metadata-ingestion:docGen' because: Build cache is disabled
    Task ':metadata-ingestion:docGen' is not up-to-date because: Task has not declared any outputs despite executing actions.
    Starting process 'command 'bash''. Working directory: /home/ubuntu/datahub/metadata-ingestion Command: bash -c source venv/bin/activate && ./scripts/docgen.sh
    Successfully started process 'command 'bash''
    rm: cannot remove '../docs/generated/ingestion': No such file or directory
    Traceback (most recent call last):
      File "scripts/docgen.py", line 7, in <module>
        from importlib.metadata import metadata, requires
    ModuleNotFoundError: No module named 'importlib.metadata'
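    For context: importlib.metadata entered the standard library only in Python 3.8, so this error usually means the build venv is running an older interpreter. A common guard, assuming the importlib_metadata backport is available on older Pythons:

    try:
        from importlib.metadata import metadata, requires  # Python 3.8+
    except ImportError:
        from importlib_metadata import metadata, requires  # backport for older Pythons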

    curved-truck-53235

    05/19/2022, 12:19 PM
    Hi everyone! Can we use Postgres 14.3 as the backend? Has anyone tested with the latest Postgres versions?

    gentle-camera-33498

    05/19/2022, 12:47 PM
    Hello guys!! Is it possible to have multiple replicas of the GMS server instead of scaling it vertically?

    cool-actor-73767

    05/19/2022, 6:44 PM
    Hello guys! Does somebody have an easy guide for using LDAP authentication with DataHub? I followed the doc https://datahubproject.io/docs/datahub-frontend and changed jaas.conf with LDAP parameters, but it doesn't work.

    future-student-30987

    05/19/2022, 8:36 PM
    Hi everyone, does DataHub capture/collect any user data? If so, which data? I am asking because of GDPR.

    bright-beard-86474

    05/19/2022, 8:40 PM
    Hello DataHub team! I have a quick question regarding the GraphQL API. The docs say that functionality like creating and removing entities is planned to be integrated and supported. Is there any ETA? Thanks!

    gentle-umbrella-84426

    05/20/2022, 2:25 PM
    Hi all! I’m Antonio from Agile Lab 🙂 I’m a passionate data engineer and very curious about DataHub development!

    breezy-noon-83306

    05/20/2022, 7:22 PM
    Hi all! What is the best way to start with DataHub if you are an analyst, not a developer or engineer? Is there a feasible way to install it and get it running without coding?

    sparse-raincoat-42898

    05/20/2022, 10:02 PM
    Hello, I have DataHub set up in AKS and it's up and running, and I also have Airflow configured in AKS. I am following this document to integrate them, but I'm not sure where to run the command pip install acryl-datahub[airflow]. Any help is appreciated.
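    For context: the package must be importable by the Airflow scheduler and workers (for example, baked into the Airflow image), since the integration runs inside Airflow. A minimal sketch of annotating a task with lineage once it is installed; the platform and dataset names are placeholders:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from datahub_provider.entities import Dataset

    # Annotate a task with inlets/outlets; the DataHub lineage backend
    # (configured in airflow.cfg) emits these as lineage.
    with DAG(
        dag_id="datahub_lineage_example",
        start_date=datetime(2022, 1, 1),
        schedule_interval=None,
    ) as dag:
        transform = BashOperator(
            task_id="transform",
            bash_command="echo transforming",
            inlets=[Dataset("snowflake", "db.schema.source_table")],
            outlets=[Dataset("snowflake", "db.schema.target_table")],
        )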

    clever-machine-43182

    05/21/2022, 7:13 AM
    Hi. I’m trying to ingest a .json.gz file on S3. Can DataHub infer the schema in this case? There seem to be no options for compressed types.

    clever-machine-43182

    05/22/2022, 10:43 AM
    Hi. I have another question. In my case, the company has spec sheets for internal use and for external sharing. Can I import and export Google Sheets or other formats for migration? We have a problem managing the internal spec sheets and keeping them synchronized with systems like an RDBMS, and we also handle thousands of metadata entries in Google Sheets. We don’t want to migrate them one by one. We want to synchronize DataHub with the metadata of our multiple data stores automatically, but I can’t find a way to import/export all the data from our system. Can I get support?
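    A sketch of one direction of such a sync: export the sheet as CSV and bulk-emit descriptions with the acryl-datahub Python emitter. The file name, column names, and platform are hypothetical:

    import csv

    from datahub.emitter.mce_builder import make_dataset_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import DatasetPropertiesClass

    emitter = DatahubRestEmitter("http://localhost:8080")

    # spec_sheet.csv (exported from Google Sheets) with hypothetical
    # columns: table,description
    with open("spec_sheet.csv") as f:
        for row in csv.DictReader(f):
            emitter.emit(
                MetadataChangeProposalWrapper(
                    entityUrn=make_dataset_urn("mysql", row["table"], "PROD"),
                    aspect=DatasetPropertiesClass(description=row["description"]),
                )
            )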

    cool-carpet-74662

    05/23/2022, 10:16 AM
    We would like to track changes / metadata information, like how many datasets are inside DataHub, and functions for change tracking in files.

    steep-sandwich-72508

    05/23/2022, 2:04 PM
    Hello everyone, I am developing a frontend application that consumes the DataHub GraphQL API. I want to show the "Most Popular" dashboards accessed by a specific user (the user that represents my application). My questions are:
    1. I'm using the listRecommendations query to get that data. Is there a way to return just the "Most Popular" module?
    2. How can I filter the results to get just modules with a given tag?
    3. How can I filter the results to get just modules with a given type?
    4. How do I tell DataHub that the user accessed some dashboard, so that DataHub can count this access for "Most Popular" purposes?
    Thanks!
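    A sketch of calling listRecommendations from Python against the frontend's /api/graphql endpoint; the input fields are assumptions based on DataHub's GraphQL schema, and the URL, token, and user URN are placeholders:

    import requests

    QUERY = """
    query listRecommendations($input: ListRecommendationsInput!) {
      listRecommendations(input: $input) {
        modules {
          title
          moduleId
        }
      }
    }
    """

    resp = requests.post(
        "http://localhost:9002/api/graphql",
        headers={"Authorization": "Bearer <access-token>"},
        json={
            "query": QUERY,
            "variables": {
                "input": {
                    "userUrn": "urn:li:corpuser:app-user",
                    "requestContext": {"scenario": "HOME"},
                    "limit": 5,
                }
            },
        },
    )
    print(resp.json())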

    steep-sandwich-72508

    05/23/2022, 2:40 PM
    And I also need to get more than five "Most Popular" results

    breezy-noon-83306

    05/23/2022, 5:13 PM
    Have you ever thought about offering specific online training on getting started with DataHub? It would be very useful and an excellent complement to this community and the YouTube channel...

    full-raincoat-68234

    05/23/2022, 8:33 PM
    Hello guys, is it possible to specify the region when using the Snowflake connection?

    helpful-librarian-40144

    05/24/2022, 2:03 AM
    Hello everyone, I am trying to use the DataHub REST API, but there is no authentication. How can I integrate a third-party service to protect the REST API in DataHub?
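    For context: DataHub's metadata service supports token-based authentication when it is enabled; callers then send a personal access token as a bearer token on every request. A minimal sketch, with the URL and token as placeholders:

    import requests

    # Personal access tokens are generated in the UI under Settings and
    # sent on every request in the Authorization header.
    token = "<personal-access-token>"
    resp = requests.get(
        "http://localhost:8080/config",
        headers={"Authorization": f"Bearer {token}"},
    )
    print(resp.status_code, resp.json())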

    bitter-dusk-52400

    05/24/2022, 5:19 AM
    Hi DataHub team, how do I create users and change the password for an existing DataHub account in the DataHub UI?