# getting-started
  • e

    eager-florist-67924

    03/10/2022, 10:19 PM
    Hi Team. I have noticed that the dataprocess entity has been deprecated and that lineage development now focuses more on dataflow. We plan to source data from multiple apps over a set of kafka topics using kafka streams springboot apps. What is your vision or suggestion here? https://github.com/linkedin/datahub/pull/4114
  • n

    nice-intern-57251

    03/14/2022, 10:34 AM
    Hello everyone! My name is Anton, and I am interested in using DataHub for the modern Data Mesh architecture. Is there any useful info regarding the DataHub federation process? So far I have found only a few paragraphs on the official website, but there are no docs, guidelines, etc.
    👍 1
  • c

    curved-carpenter-44858

    03/14/2022, 1:15 PM
    Hello everyone. My name is Ramakrishna, and I have been exploring Datahub for the last couple of weeks. It looks like a promising tool for our data discovery and observability use cases. To move further, I am looking for details on the feature differences between the OSS and the Acryl Data managed version of DataHub. I could not find them anywhere. Can someone help me with this?
    👍 1
  • t

    tall-island-10703

    03/14/2022, 2:31 PM
    hey guys how do i actually get data out from datahub? e.g. let users run a query or point them to a bq database, a csv download etc..?
  • g

    gifted-kite-59905

    03/15/2022, 1:41 PM
    Hi team, how should I decide whether to deploy MCE and MAE integrated with GMS or separately?
  • b

    brief-businessperson-12356

    03/16/2022, 2:01 AM
    Hi all - a lot of our database documentation is currently held in spreadsheets, so I am wondering whether it is possible to programmatically add table and column descriptions from these spreadsheets. It is not quite clear to me whether this is done through a Transformation while metadata is being ingested, the GraphQL API, or the Emitter (or something else entirely!). Thanks!
  • n

    numerous-telephone-79266

    03/16/2022, 6:38 AM
    Hello all. How can I configure LDAP login in datahub, please? I installed datahub from the datahub docker quickstart.
  • b

    big-businessperson-99368

    03/17/2022, 2:41 PM
    Checking the documentation about extracting metadata from a Data Lake, I have some questions: • Is it possible to read Azure Data Lake? I saw only YAML attributes for AWS.
  • a

    ambitious-magazine-1421

    03/18/2022, 6:00 AM
    Hi All, I’m a new user of datahub and I’m not sure if it’s okay to ask questions on this channel. I have a strange problem: I have a table in dataset hk_db named user_vouchers. I can’t find it when I search with “vouchers”, but I can find it when I search with “voucher”. Does anyone have any suggestions?
  • l

    loud-kite-94877

    03/18/2022, 7:33 AM
    Hi all. What is the difference between searching for "business" and "tags:'business'"? Why is the dataset "AppAsset" not in the results when the search text is "tags:'business'"? Is this a bug?
  • s

    swift-breakfast-25077

    03/18/2022, 10:38 AM
    Hi all, I am new to datahub. I just installed it using pip (the quick start guide) and now I want to ingest metadata from postgres, but I can't find the examples/recipes folder or any of the other datahub folders. I found the datahub-env package. Can anyone help me find the location of the files? I can't configure anything if I can't find them.
  • s

    sparse-grass-96875

    03/21/2022, 8:25 AM
    Hi all, I am new to datahub. I am trying to check whether datahub supports field-level lineage. I checked the docs and didn't find a direct answer. Thanks in advance for your reply.
  • c

    curved-truck-53235

    03/21/2022, 9:38 AM
    Hi everyone! Are there any code examples for creating lineage via REST?
  • h

    hallowed-analyst-96384

    03/22/2022, 12:54 PM
    Hello everyone, at work, we have an Airflow Dag which collects files from FTP and SFTP servers and moves them to GCS and AWS S3 after transformation. I have already configured an Airflow connection to Datahub. How should I implement Airflow inlets and outlets to represent this flow in Datahub? Most examples like this https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub_provider/example_dags/lineage_backend_demo.py use Datasets but I am not sure how to represent an Entity like FTP in Datahub
  • j

    jolly-traffic-67085

    03/23/2022, 5:18 AM
    Hello everyone, I am new to datahub and I have one problem: I need to use a curl command to remove a dataset entity, but I don't know the urn pattern. Please tell me. --> this command curl "http://localhost:8080/entities?action=delete" -X POST --data '{"urn": <urn pattern>}'
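For reference, the dataset URNs that appear elsewhere in this channel have the shape urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>). A minimal Python sketch that assembles such a string follows; the helper name dataset_urn and the example values are made up for illustration, and this is not DataHub's own client code.

```python
def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a dataset URN string (hypothetical helper, not part of DataHub)."""
    # The platform is itself a URN nested inside the dataset URN.
    platform_urn = f"urn:li:dataPlatform:{platform}"
    return f"urn:li:dataset:({platform_urn},{name},{env})"


if __name__ == "__main__":
    # Example values are illustrative only.
    print(dataset_urn("postgres", "db.schema.table"))
```

The resulting string is what would go in place of <urn pattern> in the curl payload, quoted as a JSON string.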
  • j

    jolly-traffic-67085

    03/23/2022, 1:13 PM
    Hi everyone again. I need to know more about policies. Can I allow a specific user to see only a specific dataset? For example, with 3 datasets (dataset1, dataset2, dataset3), I want to grant UserA access to dataset1 only, meaning he must not see the other 2 datasets. Is that possible? Thanks in advance.
  • b

    boundless-student-48844

    03/23/2022, 2:37 PM
    Hi team, can I check how entity relationships are stored in the persistence layer (mysql / es)? I don’t seem to find any es index that stores the entity relationship. For example, the ownedBy relationship between CorpUser / CorpGroup entities and Dataset entities (link). I am trying to understand the difference between adding and not adding the @relationship annotation in PDL.
  • f

    fancy-butcher-14164

    03/24/2022, 1:19 PM
    Hello all. General question - is anyone using Okera for access management?
  • w

    worried-zebra-47870

    03/25/2022, 1:13 PM
    Hi all. Quick question: did any one of you manage to make the lookml source work with Datahub dockerized? I don't know how to make it work..
  • t

    thousands-pencil-42773

    03/27/2022, 6:44 PM
    Hi All, I am new to datahub. I tried to build datahub locally (./gradlew build). It passed once at the start, and since then whenever I run ./gradlew build it fails for one reason or another. Now the build is also taking far too long (it recently took 2h45m to fail). Any suggestions as to what the problem might be?
  • b

    best-umbrella-24804

    03/27/2022, 9:46 PM
    Hi! I’m a senior data engineer trying to help my company get started with data hub. I’ve been going through the “Deploying to AWS” guide for the past few days: https://datahubproject.io/docs/deploy/aws I’ve completed all of the steps up to the end of “Expose endpoints using a load balancer”. I can confirm that the application works with port forwarding, that the alb is operational, and that the ingress is present, but when I navigate to the URL in my browser I get a “No website was found for the web address” error. I've been scratching my head, googling, and reverting to different versions, and I'm really having trouble working it out. Any advice would be really appreciated.
  • t

    thousands-pencil-42773

    03/28/2022, 8:31 AM
    Hi, I am a developer at Dell, trying to explore datahub for my company. Just a quick question... How much time does an average ./gradlew build take?
  • g

    gentle-camera-33498

    03/28/2022, 6:18 PM
    Hello everyone, I am a Data Engineer exploring Datahub functionalities for my company. I found documentation about MAC (Metadata Audit Change), but it's not clear to me whether I can export this information. Is there any way to see Metadata changes in the Datahub UI? An important requirement for us is to track who made changes to an entity and when.
  • c

    cuddly-lunch-28022

    03/29/2022, 4:57 AM
    Hello! I need field_level_lineage. If I understand correctly, I have to use DatasetUpstreamLineage https://datahubproject.io/docs/rfc/active/1841-lineage/field_level_lineage/#! but I don't understand where and how to create it (in graphql)?
    /**
     * ASPECT :: Fine Grained upstream lineage for fields in a dataset
     */
    record DatasetUpstreamLineage {
      /**
       * Upstream to downstream field level lineage mappings
       */
      fieldMappings: array[DatasetFieldMapping]
    }
  • m

    mammoth-fountain-32989

    03/29/2022, 9:23 AM
    Hi everyone, I am new to datahub and have started exploring it as a data catalog for our org. I am building a PoC using the docker setup locally and have datahub up and running. I am trying to ingest metadata from a Postgresql DB from the UI by creating a recipe, and I am getting the connection error below:
    '[2022-03-29 082413,989] WARNING {urllib3.connectionpool:810} - Retrying (Retry(total=2, connect=None, read=None, redirect=None, '
    "status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f7ee9465280>: Failed to "
    "establish a new connection: [Errno 111] Connection refused')': /config\n"
    '[2022-03-29 082417,993] WARNING {urllib3.connectionpool:810} - Retrying (Retry(total=1, connect=None, read=None, redirect=None, '
    "status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f7ee9465670>: Failed to "
    "establish a new connection: [Errno 111] Connection refused')': /config\n"
    '[2022-03-29 082426,003] WARNING {urllib3.connectionpool:810} - Retrying (Retry(total=0, connect=None, read=None, redirect=None, '
    "status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f7ee9465850>: Failed to "
    "establish a new connection: [Errno 111] Connection refused')': /config\n"
    '[2022-03-29 082427,169] ERROR {datahub.entrypoints:119} - File '
    '"/tmp/datahub/ingest/venv-79dc2846-eaae-4e1d-9e23-bf3e5ae54a87/lib/python3.9/site-packages/urllib3/connection.py", line 174, in '
    '_new_conn\n'
    ' 161 def _new_conn(self):\n'
    ' (...)\n'
    ' 170 if self.socket_options:\n'
    ' 171 extra_kw["socket_options"] = self.socket_options\n'
    ' 172 \n'
    ' 173 try:\n'
    '--> 174 conn = connection.create_connection(\n'
    ' 175 (self._dns_host, self.port), self.timeout, **ext
    Can someone please help? I am not sure if I am passing the sink URL correctly. I tried "http://localhost:9002/api/gms" as well, but got the same issue.
    Recipe is as below:
    source:
      type: postgres
      config:
        host_port: 'xxx.xx.xx.xxx:5432'
        database: xxxx
        username: xxx
        password: '${<secrete_name_given_in_secrets>}'
        include_tables: <table_name>
        include_views: true
        profiling:
          enabled: false
    sink:
      type: datahub-rest
      config:
        server: 'http://localhost:8080'
    Thanks
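The log above shows the ingestion run failing while requesting /config from the sink server. A minimal Python sketch, using only the standard library, that checks whether that endpoint answers from wherever the ingestion actually runs; the function name gms_reachable and the timeout value are assumptions for illustration and are not part of the DataHub CLI.

```python
import urllib.error
import urllib.request


def gms_reachable(server: str, timeout: float = 3.0) -> bool:
    """Return True if GET <server>/config answers with HTTP 200 (hypothetical helper)."""
    url = server.rstrip("/") + "/config"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False


if __name__ == "__main__":
    # Same server value as in the recipe's sink section.
    print(gms_reachable("http://localhost:8080"))
```

If the recipe runs inside a container, localhost resolves to that container rather than to the host, so the check is only meaningful when run from the same place as the ingestion.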
  • m

    miniature-park-34349

    03/30/2022, 10:45 AM
    Hi! I'm wondering if I could ask what might turn out to be a basic question. I'm slightly confused about the typical workflow a user would perform using a dataset. Let's say a user queries the DataHub UI and finds a dataset held on Postgres, for example https://demo.datahubproject.io/dataset/urn:li:dataset:(urn:li:dataPlatform:postgres,jaff[…]p.dbt_jaffle.customers,PROD)/Schema?is_lineage_mode=false How does the user know where that dataset is located in order to connect to it to query the data?
  • b

    bright-jewelry-99677

    03/30/2022, 12:37 PM
    Hi, I have a basic question. We can grant permissions to edit column-level tags to data users. But, I would like to integrate a review process before finalizing tags. Let’s consider if we would like to manage tags about data security level and take advantage of the metadata as policy tags. For instance, a user can send a request to add a tag of data security level and then the security team reviews it. After getting approval, we can merge the request as github pull request. Is there any feature like this on DataHub?
  • a

    able-rain-74449

    03/30/2022, 1:03 PM
    QQ: https://datahubproject.io/docs/deploy/kubernetes Does it need the pre-requisite statefulset/pods?
    prerequisites-cp-schema-registry-cf79bfccf-kvjtv   2/2     Running     1          63m
    prerequisites-kafka-0                              1/1     Running     2          62m
    prerequisites-mysql-0                              1/1     Running     1          62m
    prerequisites-neo4j-community-0                    1/1     Running     0          52m
    prerequisites-zookeeper-0
  • i

    icy-piano-35127

    03/30/2022, 10:26 PM
    Hi, I have a basic question. When I ingested some of my athena database data, container information started appearing when I explore the dataset (image below). Is there any way to remove that? What can I do to remove it? Should I ingest the data again?
  • f

    fresh-memory-10355

    03/31/2022, 7:38 AM
    Hey, I want to ingest a dataset from my postgres database into datahub. Please let me know how to achieve this.