# ingestion
• victorious-evening-88418 (02/17/2023, 1:54 PM)
Hi @ripe-eye-60209, I had the same problem in the past. I solved the issue by upgrading to DataHub CLI version 0.10.0 and commenting out "workspace_id_pattern" in the recipe.
    thanks bear 1
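What that recipe change might look like, assuming the Power BI source (the thread doesn't name the source, but workspace_id_pattern is a Power BI recipe option); the auth values are placeholders:

source:
  type: "powerbi"
  config:
    tenant_id: "..."
    client_id: "..."
    client_secret: "..."
    # Commented out, per the fix above:
    # workspace_id_pattern:
    #   allow:
    #     - "my-workspace-id"
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"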
• lively-dusk-19162 (02/23/2023, 2:09 AM)
Hello, I am facing the below error when running the command ./gradlew build. Could you please help me with this error?
• lively-dusk-19162 (03/01/2023, 3:10 PM)
    Could anyone please help me on this?
• most-animal-32096 (03/06/2023, 5:06 PM)
So, as a follow-up, attached is an archive of a very minimal sample Gradle project to import next to the datahub-client one, to actually try metadata emission through REST and Kafka. (NB: the previously mentioned documentation misses the emitter.close() call and doesn't mention the required Gradle dependencies.)
datahub-client-sample.zip
    🩺 1
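The thread concerns the Java datahub-client, but the same emission flow in the Python SDK makes a useful reference; a sketch with a hypothetical server and dataset, ending with the explicit close() that the documentation reportedly omits on the Java side:

from datahub.emitter.mce_builder import make_dataset_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import DatasetPropertiesClass

# Hypothetical GMS endpoint.
emitter = DatahubRestEmitter(gms_server="http://localhost:8080")
try:
    mcp = MetadataChangeProposalWrapper(
        entityUrn=make_dataset_urn(platform="hive", name="fct_users", env="PROD"),
        aspect=DatasetPropertiesClass(description="A sample dataset."),
    )
    emitter.emit(mcp)
finally:
    emitter.close()  # release the underlying HTTP session when done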
• numerous-scientist-83156 (03/08/2023, 10:31 AM)
I did some more digging around trying to find out why this was not working as expected. I found that if I changed the platform from adlsGen2 to adlsg2 (both name and id, and my class' local variable), it would work as expected, breadcrumbs and all (first picture). My coworker then mentioned that there are some predefined data platforms in data_platform.json; there I noticed that the delimiter for adlsGen2 is "/" instead of ".". So just for fun I changed the platform back to the predefined name adlsGen2 but added a line that replaces every "." in the dataset urn with "/", and this also works as expected. I've then looked through the code a bit more and found that the function create_from_ids, which is used by the make_dataset_urn_with_platform_instance function, is made to always use "." as the delimiter in the name. Is this working as intended? Is there another function I should be using to generate the dataset_urn when it comes from adlsGen2?
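A rough sketch of the workaround described above, using the Python SDK helper named in the message (the dataset name and instance are hypothetical):

from datahub.emitter.mce_builder import make_dataset_urn_with_platform_instance

# Hypothetical ADLS Gen2 path, expressed with the "." delimiter the helper uses.
name = "container.folder.file.csv"

urn = make_dataset_urn_with_platform_instance(
    platform="adlsGen2",
    name=name,
    platform_instance="prod",  # hypothetical instance
    env="PROD",
)

# The workaround from the message: data_platform.json declares "/" as the
# adlsGen2 delimiter, so swap the "." the helper hardcodes for "/".
urn = urn.replace(name, name.replace(".", "/"))
print(urn)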
• witty-butcher-82399 (03/20/2023, 12:03 PM)
    A quick search in the code shows the breaking changes have been implemented already https://github.com/datahub-project/datahub/blob/b526dc1ab6cd31cc235cd0edf87caacbba[…]metadata-ingestion/src/datahub/ingestion/source/dbt/dbt_core.py Question solved 😅
• wonderful-jordan-36532 (03/22/2023, 11:18 AM)
    Not a particular platform actually, but Databricks ml models or mlflow would work best for us. Otherwise AWS or just via plain documentation
• clean-scooter-32205 (03/23/2023, 11:59 AM)
Hi! When trying to ingest from Databricks Unity Catalog, I get a PERMISSION_DENIED: "Only account admin can list metastores". Is there a way to not require an account admin token? I would only be using a specific metastore id, and there's no way the account admin would be OK with having an associated token lying around.
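For context, a minimal Unity Catalog recipe looks roughly like this (workspace URL is a placeholder); the metastore listing that triggers the PERMISSION_DENIED happens with whatever token is supplied here:

source:
  type: "unity-catalog"
  config:
    workspace_url: "https://my-workspace.cloud.databricks.com"
    token: "${DATABRICKS_TOKEN}"  # the token whose admin rights are at issue
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"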
• brash-caravan-14114 (03/28/2023, 3:18 PM)
I am experiencing the same error. I configured according to the documentation. Kafka-setup-job, gms, and frontend were all deployed successfully, using only IAM permissions. The topics were created on the cluster, which proves the authentication works. However, when trying to run an ingestion from the UI, I see the same error as @best-napkin-60434 in the datahub-actions pod. I have also tried to configure executor.yaml as suggested here, and replaced the user/password configuration with the JAAS configuration described here. I received the following error:
    KafkaException: KafkaError{code=_INVALID_ARG,val=-186,str="Java JAAS configuration is not supported, see <https://github.com/edenhill/librdkafka/wiki/Using-SASL-with-librdkafka> for more information."}
Is it possible to use datahub-actions and authenticate using IAM? Attaching executor.yaml. Thanks!
    executor.yaml
    plus1 1
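The error comes from librdkafka, which takes flat properties rather than JAAS. A sketch of how the Kafka connection in executor.yaml might be expressed with librdkafka-style SASL settings instead (SCRAM is an assumption here; librdkafka has no drop-in equivalent of the MSK IAM JAAS module):

source:
  type: "kafka"
  config:
    connection:
      bootstrap: "b-1.my-msk-cluster.amazonaws.com:9096"  # hypothetical broker
      consumer_config:
        security.protocol: "SASL_SSL"
        sasl.mechanism: "SCRAM-SHA-512"
        sasl.username: "${KAFKA_USERNAME}"
        sasl.password: "${KAFKA_PASSWORD}"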
• witty-butcher-82399 (03/28/2023, 4:17 PM)
Beyond the concern about the missing entities in the map, we have found a scenario where an AssertionError is thrown when doing the _should_process validation. I have created a PR fixing this case: https://github.com/datahub-project/datahub/pull/7702
• proud-dusk-671 (03/30/2023, 11:23 AM)
Any updates on this? The docs are written with respect to Confluent, and no information about AWS MSK ingestion is provided.
• modern-france-82371 (04/05/2023, 10:26 AM)
Hi, I'm using version 0.10.0. I tried it both with the additional access policies configured and without, and got the same result either way. I think it's a bug.
• bland-barista-59197 (04/13/2023, 7:46 PM)
Hi Team, I'm getting the same error. Did you find any solution to address the 404 error?
• quiet-rain-16785 (04/17/2023, 2:05 PM)
Hi guys, any update on this? I am doing the same... but getting nothing in DataHub!! Can you help me, @dazzling-judge-80093 @quiet-television-68466?
    plus1 2
• agreeable-table-54007 (04/18/2023, 9:00 AM)
@modern-artist-55754 Oh okay, thanks for the info. So if I want to ingest data from Data Factory, I'd need to convert it into something else. But then, for CSV, are there only these columns (resource, subresource, glossary_terms, tags, owners, ownership_type, description, domain), or can we add more? And is the JSON schema useful for ingesting CSV files? Is this YAML structure correct for a CSV file?

source:
  type: "file"
  config:
    format: "csv"
    path: "/path/to/your/data.csv"
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"

Thanks.
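For reference, that column set matches the csv-enricher source, which reads the CSV directly rather than going through the generic file source; a sketch, with the path as a placeholder:

source:
  type: "csv-enricher"
  config:
    filename: "/path/to/your/data.csv"
    write_semantics: PATCH  # merge with existing metadata; OVERRIDE replaces it
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"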
• damp-lighter-99739 (04/18/2023, 2:28 PM)
Hi team, could someone help with this, please?
• wonderful-jordan-36532 (04/24/2023, 6:53 AM)
    How did you resolve bypassing the 2FA requirement for Tableau ingestion? @brave-france-7945
• quiet-television-68466 (04/25/2023, 11:49 AM)
    Really sorry to message this in the chat, but still looking for a bit more help if anyone has any ideas!
• adorable-magazine-49274 (04/25/2023, 12:01 PM)
Is there anyone who can help me?
• numerous-byte-87938 (05/01/2023, 5:29 PM)
    Gentle bump in case it was missed 😃
• fresh-dusk-60832 (05/02/2023, 2:15 PM)
    did anyone have this problem?
• flaky-refrigerator-97518 (05/09/2023, 2:51 AM)
0.10.2 quickstart (docker-compose). Logs already shared: org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=index_not_found_exception, reason=no such index]. The error occurs when I add a new custom entity.
• bland-barista-59197 (05/11/2023, 7:39 AM)
Hi @dazzling-judge-80093, I think this can be reproduced by:
1. Set project_on_behalf to a project other than the scanning project, e.g. bq-project-1.
2. Add two datasets in bq-project-1: one with a partition key and one without.
Solution: https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/ge_data_profiler.py#L923 should be something like:
bq_sql = f"SELECT * FROM {schema}.`{table}`"
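To illustrate the suggested fix with hypothetical names: a partition decorator (or any special character) in a table reference is only legal in BigQuery when the identifier is backtick-quoted.

schema = "my_project.dataset_a"  # hypothetical dataset
table = "events$20230511"        # hypothetical partitioned-table decorator

broken = f"SELECT * FROM {schema}.{table}"    # BigQuery rejects the bare "$"
fixed = f"SELECT * FROM {schema}.`{table}`"   # the quoting suggested above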
• ripe-helmet-49084 (05/31/2023, 9:47 AM)
Hi All, can someone advise on this, please?
• fierce-agent-11572 (05/31/2023, 2:12 PM)
    thank you very much 🙏
• astonishing-father-13229 (05/31/2023, 3:58 PM)
Thanks for all the help, it's working now. Thanks Steve and Bagwan 😎
    👍 1
• hundreds-airline-29192 (06/02/2023, 8:19 AM)
Quickstart with a pinned version solved my problem.
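For anyone hitting the same thing, pinning the quickstart version looks like this (the version number is only an example):

datahub docker quickstart --version v0.10.2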
• ripe-helmet-49084 (06/02/2023, 10:57 AM)
Hi @gentle-hamburger-31302, can you please help me get table-level lineage from MySQL to Redash? Not sure what I am missing here. I am using the recipe below.

source:
  type: "redash"
  config:
    connect_uri: "****"
    api_key: "****"
    dashboard_patterns:
      allow:
        - "test_dashboard"
    chart_patterns:
      allow:
        - "test_chart_*"
    parse_table_names_from_sql: true
sink:
  type: "datahub-rest"
  config:
    server: "http://****:8080"