# getting-started
  • m

    miniature-eve-89383

    12/08/2021, 9:03 PM
    Hi, is there a way to simply have the front-end (http://server:9002) listen on port 80 instead? We tried changing the docker-compose.yaml file but received an error. We then tried using an `nginx` container as a reverse proxy and got the same error: `Failed to log in! SyntaxError: JSON.parse: unexpected character at line 1 column 1 of the JSON data`. I have searched for documentation on this, but all I could find was a document explaining how to change a port number when, for example, MySQL is already running locally.
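A minimal sketch of the reverse-proxy setup described above, assuming the quickstart's default service name `datahub-frontend-react` and port 9002 (both may differ in your compose file). One common cause of that `JSON.parse` error is the proxy answering with an HTML error page where the frontend expects JSON, so the key point is to forward all paths and headers unchanged:

```shell
# Write a minimal nginx reverse-proxy config for the quickstart frontend.
# "datahub-frontend-react:9002" is the default service name/port in the
# quickstart compose file -- adjust to match your setup.
mkdir -p /tmp/datahub-nginx
cat > /tmp/datahub-nginx/default.conf <<'EOF'
server {
    listen 80;

    location / {
        # Forward everything (including /logIn, which returns JSON) unchanged.
        proxy_pass http://datahub-frontend-react:9002;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
EOF
```

Mount the file into an nginx container on the same compose network, e.g. `-v /tmp/datahub-nginx/default.conf:/etc/nginx/conf.d/default.conf:ro`, publishing port 80.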
  • m

    miniature-eve-89383

    12/09/2021, 4:57 PM
    Question one: I understand that the recommended deployment method for a production install is on k8s. Docker is only for POC/development?
  • m

    miniature-eve-89383

    12/09/2021, 4:58 PM
    Question two: Is it an option to use something other than MySQL 5.7 for the DBMS? Is that a good idea, or do 99% of people use MySQL 5.7?
  • m

    miniature-eve-89383

    12/09/2021, 5:02 PM
    Question 3: Is Apache Kafka necessary? I can't figure it out.
  • m

    miniature-eve-89383

    12/09/2021, 5:50 PM
    How does one back up the whole DataHub system? And how does one restore it?
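A hedged sketch of the backup half: in the quickstart deployment the primary metadata store is the MySQL container, so dumping it captures the critical state (the search and graph indexes can normally be rebuilt from it). The container name `mysql` and the `datahub`/`datahub` credentials are quickstart defaults and may differ in your install:

```shell
# Write a small backup script; the "mysql" container name and the
# datahub/datahub credentials are quickstart defaults -- adjust for
# your deployment. The script is only written here, not executed.
cat > /tmp/backup_datahub.sh <<'EOF'
#!/bin/sh
set -e
STAMP=$(date +%Y%m%d-%H%M%S)
docker exec mysql mysqldump -u datahub -pdatahub datahub > "datahub-backup-$STAMP.sql"
EOF
chmod +x /tmp/backup_datahub.sh
```

Restoring would be roughly the reverse (`docker exec -i mysql mysql -u datahub -pdatahub datahub < backup.sql`), followed by rebuilding the search indexes.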
  • m

    miniature-eve-89383

    12/09/2021, 5:53 PM
    We're trying to automate the creation of a POC datahub instance with at least one source of metadata (most likely the PostgreSQL demo database found here: https://www.postgresqltutorial.com/postgresql-sample-database/), but I was wondering: has this been done before? We could contribute our code once done, but if someone has already done it, that would save us time.
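For the PostgreSQL half of that automation, a minimal ingestion recipe might look like the sketch below. The host, database name, and credentials are placeholders, and the sink assumes a GMS reachable on localhost:8080:

```shell
# Write a placeholder recipe for a sample PostgreSQL database;
# every config value below is an assumption to be replaced.
cat > /tmp/postgres_recipe.yml <<'EOF'
source:
  type: postgres
  config:
    host_port: localhost:5432
    database: sample_db
    username: datahub_reader
    password: example_password
sink:
  type: datahub-rest
  config:
    server: http://localhost:8080
EOF
# Then, with the plugin installed:
#   pip install 'acryl-datahub[postgres]'
#   datahub ingest -c /tmp/postgres_recipe.yml
```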
  • m

    mysterious-lamp-91034

    12/09/2021, 8:34 PM
    Today I am deploying datahub on my dev server to try it out. I am seeing this error after running `datahub docker quickstart`: https://gist.github.com/wizardxz/bbdfc5aba74a6f7f61b4d9ef7f521745#file-gistfile1-txt-L32. I then replaced all occurrences of 8081 with 8089 in the datahub code:

```
find . -type f -exec sed -i 's/8081/8089/g' {} +
```

    It still complains with the same error. I hope I can get some support here. Thanks
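As a side note on that global replace: `find . -type f` rewrites every file in the tree, including binaries and `.git` internals, which can corrupt the checkout. A scoped variant with backups is safer; the demo below runs in a scratch directory with a made-up file standing in for the repo:

```shell
# Demo of a scoped, reversible replacement (the scratch directory and the
# sample compose file are made up for illustration; point the commands at
# the real repo root in practice).
mkdir -p /tmp/sed-demo
printf 'schema-registry: http://localhost:8081\n' > /tmp/sed-demo/docker-compose.yml
# 1. See which files would be touched:
grep -rl --include='*.yml' '8081' /tmp/sed-demo
# 2. Replace only in those file types, keeping .bak backups:
find /tmp/sed-demo -type f -name '*.yml' -exec sed -i.bak 's/8081/8089/g' {} +
```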
  • m

    miniature-eve-89383

    12/10/2021, 2:50 PM
    Are there data types that are not supported for databases? For example: json/jsonb on PostgreSQL.
  • m

    miniature-eve-89383

    12/10/2021, 2:51 PM
    Are there limitations in the docker install that are not in the k8s install? I can't seem to change the user's password or create new users in the interface, and I can't find anything in the CLI.
  • g

    gentle-florist-49869

    12/13/2021, 6:58 PM
    A simple question, and I'm sure someone can help: does Datahub have any feature to track schema versions? E.g. MySQL schemas that were changed, like a new column or something like that?
  • m

    miniature-eve-89383

    12/13/2021, 8:24 PM
    It looks like the REST service doesn't require authentication. Is that normal?
  • s

    strong-kite-83354

    12/14/2021, 9:46 AM
    Hi all, I'm doing a Proof of Concept with Datahub, and I have a question regarding how to represent multiple "instances" of a dataset in Datahub. To make this concrete, we take in a lot of data from suppliers on a monthly basis. So something like the Land Registry Price Paid dataset is released every month (https://www.gov.uk/government/statistical-data-sets/price-paid-data-downloads). I store this on disk in directories like "land-registry-price-paid\2021-09-01" where each instance is labelled with a date. The format and documentation of this dataset will generally not change. How should I represent these related datasets? It seems wrong to onboard each instance independently since they share a great deal of metadata. I can see I could put this information in the BrowsePath or label instances in a properties field but I don't know which way is "right". Thanks for your help!
  • t

    tall-toddler-62007

    12/14/2021, 2:43 PM
    Hi all, I'm a newbie here, trying to use datahub with clickhouse. I've pulled this and this branch and tried building my docker images. After doing that, when I try to ingest this recipe, I'm getting the error below:
```
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/datahub/entrypoints.py", line 93, in main
  sys.exit(datahub(standalone_mode=False, **kwargs))
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
  return self.main(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/click/core.py", line 1053, in main
  rv = self.invoke(ctx)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/click/core.py", line 1659, in invoke
  return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/click/core.py", line 1659, in invoke
  return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
  return ctx.invoke(self.callback, **ctx.params)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/click/core.py", line 754, in invoke
  return __callback(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/datahub/cli/ingest_cli.py", line 67, in run
  pipeline = Pipeline.create(pipeline_config, dry_run, preview)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/datahub/ingestion/run/pipeline.py", line 143, in create
  return cls(config, dry_run=dry_run, preview_mode=preview_mode)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/datahub/ingestion/run/pipeline.py", line 108, in __init__
  source_class = source_registry.get(source_type)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/datahub/ingestion/api/registry.py", line 124, in get
  raise KeyError(f"Did not find a registered class for {key}")

KeyError: 'Did not find a registered class for clickhouse'
```
  • d

    dazzling-appointment-34954

    12/15/2021, 3:11 PM
    Hi everyone, I have a short question regarding ingestion / lineage. Is it possible to add Glossary terms to the lineage graph in any way? (and maybe even lineage between multiple glossary items)
  • m

    magnificent-carpet-1266

    12/16/2021, 8:58 PM
    Hi everybody, I'm looking for information on whether it's possible to mask sensitive information in recipes. I have successfully ingested metadata from my secured Druid source, but I had to hard-code my username and password in the recipe. Having the username and password hard-coded in the recipe is a security risk, and I'm wondering if there is a way to pull them from my Azure vault, where they are securely stored. Then even if they change, I won't have to go and update my recipes. Cheers
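One commonly used approach is environment-variable expansion in the recipe, so the secret only exists in the shell that runs the ingestion. A hedged sketch follows; the Druid connection details and the vault/secret names are placeholders:

```shell
# Recipe with the credentials pulled from the environment instead of
# hard-coded; all connection values here are placeholders.
cat > /tmp/druid_recipe.yml <<'EOF'
source:
  type: druid
  config:
    host_port: druid-broker:8082
    username: ${DRUID_USER}
    password: ${DRUID_PASSWORD}
sink:
  type: datahub-rest
  config:
    server: http://localhost:8080
EOF
# Fetch the secret just before ingesting, e.g. from Azure Key Vault:
#   export DRUID_USER=svc_datahub
#   export DRUID_PASSWORD=$(az keyvault secret show --vault-name my-vault \
#       --name druid-password --query value -o tsv)
#   datahub ingest -c /tmp/druid_recipe.yml
```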
  • m

    mammoth-bear-12532

    12/17/2021, 4:52 PM
    <!here> I hear rumors that this town-hall has some really cool surprises, and it is starting in 8 mins 🎉 See you there!
  • f

    future-petabyte-5942

    12/20/2021, 5:52 AM
    Hi guys, does this quickstart guide also work well on Windows?
  • a

    agreeable-river-32119

    12/20/2021, 8:38 AM
    Hi team, can I run it locally without the docker dependency? After running

```
MCE_CONSUMER_ENABLED=true MAE_CONSUMER_ENABLED=true DATAHUB_ANALYTICS_ENABLED=true ./gradlew :metadata-service:war:run
```

    it throws an error:

```
Caused by: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'ebeanAspectDao' defined in com.linkedin.gms.factory.entity.EbeanAspectDaoFactory: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [com.linkedin.metadata.entity.ebean.EbeanAspectDao]: Factory method 'createInstance' threw exception; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'ebeanServer' defined in com.linkedin.gms.factory.entity.EbeanServerFactory: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [io.ebean.EbeanServer]: Factory method 'createServer' threw exception; nested exception is java.lang.NullPointerException
```
    😭
  • m

    miniature-eve-89383

    12/21/2021, 2:37 PM
    How can we tell what "version" of Datahub we're running? We just upgraded our test instance and want to make sure that log4j has been updated to the latest everywhere.
  • m

    miniature-eve-89383

    12/22/2021, 1:23 AM
    How can we easily manage our setup with an nginx container in front? We modified the `docker-compose.yml` file to add a standard nginx in front, to expose port 80 instead of 9002 directly from the `frontend` container. Easy to install, but I don't know how to deal with that when upgrading.
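One way to make that survive upgrades is to keep the nginx addition out of the stock file entirely: Compose automatically merges a `docker-compose.override.yml` sitting next to `docker-compose.yml`, so the upstream file can be replaced wholesale on upgrade. A sketch, with the service name and config path assumed from the quickstart:

```shell
# Write the nginx addition as a compose override instead of editing the
# stock docker-compose.yml; the service name and config path below are
# assumptions to adjust for your setup.
cat > /tmp/docker-compose.override.yml <<'EOF'
services:
  nginx:
    image: nginx:stable
    ports:
      - "80:80"
    volumes:
      - ./nginx/default.conf:/etc/nginx/conf.d/default.conf:ro
    depends_on:
      - datahub-frontend-react
EOF
```

Place the override next to the stock `docker-compose.yml`; `docker compose up` picks both up without any extra flags.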
  • m

    mysterious-lamp-91034

    12/22/2021, 10:30 PM
    n00b question: Do we support struct type in datahub? Looks like we don't support ingesting it? https://github.com/linkedin/datahub/blob/2ee1a78f4e50c3f0dba3a2d4f7bb2b9c85d19797/[…]tadata-ingestion/src/datahub/ingestion/source/sql/sql_common.py
  • b

    brief-church-81973

    12/23/2021, 2:41 PM
    Hello, we are running a POC with the datahub tool, and I'm working on integrating our data quality tool with it. Does datahub have a Java library that supports things like `MetadataChangeProposalWrapper`?
  • w

    white-air-98106

    12/27/2021, 12:57 PM
    🙋 Present! Who else is here?
  • c

    cuddly-telephone-51804

    12/30/2021, 1:17 AM
    Hi all, I want to create a new user to log in with (username and password). I installed DataHub with `datahub docker quickstart`, which did not include source code, just several running containers. How can I edit the `user.props` file to add a login user? I tried to edit this file inside the `datahub-frontend-react` container, but it didn't work.
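A hedged sketch of the usual workaround: edits made inside a container are lost when it is recreated, so keep `user.props` on the host and mount it into the frontend container instead. The in-container path `/datahub-frontend/conf/user.props` is the one the quickstart docs use, but verify it against your image version; the credentials below are placeholders:

```shell
# Keep user.props on the host: one username:password pair per line.
# The entries below are placeholders.
cat > /tmp/user.props <<'EOF'
datahub:datahub
newuser:example_password
EOF
# Then mount it via a compose override for datahub-frontend-react:
#   volumes:
#     - /tmp/user.props:/datahub-frontend/conf/user.props
# and restart that container.
```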
  • p

    powerful-flower-6452

    12/30/2021, 6:12 AM
    Can we extend the available fieldTypes inside `@Searchable` to include numeric field types? The ones currently available are listed at https://datahubproject.io/docs/metadata-modeling/extending-the-metadata-model/#searchable. This would enable simple numeric filters. Is there any particular reason why this isn't supported currently?
  • b

    busy-dusk-4970

    12/30/2021, 9:35 PM
    Does anyone know if datahub gets user details from the OIDC provider, like which groups they are a member of, or is this something I would have to handle on my end and push into datahub myself?
  • r

    rhythmic-kitchen-64860

    01/03/2022, 5:57 AM
    Can anyone tell me how to get the tags on every field? Thanks in advance.
  • p

    powerful-flower-6452

    01/03/2022, 11:11 AM
    I am looking at a use case where I have to extend the metadata model. I followed the flowchart at https://datahubproject.io/docs/metadata-modeling/extending-the-metadata-model/#to-fork-or-not-to-fork, and it looks like I need to clone the `metadata-models-custom` repo and deploy it as an extension to the metadata service. The example at https://github.com/linkedin/datahub/tree/master/metadata-models-custom shows how to deploy an extension locally. What should I do if I want to add an extension to the gms service running on Kubernetes? Should I do something like mounting the plugins folder on the gms pod?
  • f

    few-air-56117

    01/03/2022, 2:05 PM
    Hi guys, I am new to datahub, so I have a question (even if it's probably stupid) 😇. How can I run datahub on a production server (VM, App Engine), rather than `datahub docker quickstart` with the localhost server? Thx 😄