# ingestion
  • q

    quiet-pilot-28237

    10/19/2021, 6:44 AM
    image.png
  • f

    freezing-teacher-87574

    10/25/2021, 8:50 AM
    Hello. How can I connect a recipe through SSL to Superset? And how do I configure the frontend to navigate to Superset? Thanks.
  • p

    powerful-manchester-27331

    10/26/2021, 6:25 PM
    Hello team, I'm getting the error below, any idea?
  • v

    victorious-dream-46349

    10/27/2021, 4:38 PM
    Please refer to this ingestion-related question I asked in the #troubleshoot channel
  • r

    red-pizza-28006

    10/29/2021, 9:20 AM
    Thanks, I tried setting that up but now I'm running into this issue, any ideas what I'm missing? AD is authenticating the user correctly though
    Copy code
    Caused by: java.util.concurrent.CompletionException: org.pac4j.core.exception.TechnicalException: Bad token response, error=invalid_client
        at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
        at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1606)
        at play.core.j.HttpExecutionContext$$anon$2.run(HttpExecutionContext.scala:56)
        ... 6 common frames omitted
    Caused by: org.pac4j.core.exception.TechnicalException: Bad token response, error=invalid_client
        at auth.sso.oidc.custom.CustomOidcAuthenticator.validate(CustomOidcAuthenticator.java:155)
        at auth.sso.oidc.custom.CustomOidcAuthenticator.validate(CustomOidcAuthenticator.java:39)
        at org.pac4j.core.client.BaseClient.retrieveCredentials(BaseClient.java:71)
        at org.pac4j.core.client.IndirectClient.getCredentials(IndirectClient.java:140)
        at org.pac4j.core.engine.DefaultCallbackLogic.perform(DefaultCallbackLogic.java:89)
        at auth.sso.oidc.OidcCallbackLogic.perform(OidcCallbackLogic.java:87)
        at controllers.SsoCallbackController$SsoCallbackLogic.perform(SsoCallbackController.java:62)
        at controllers.SsoCallbackController$SsoCallbackLogic.perform(SsoCallbackController.java:49)
        at org.pac4j.play.CallbackController.lambda$callback$0(CallbackController.java:56)
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
        ... 7 common frames omitted
  • a

    agreeable-hamburger-38305

    11/04/2021, 6:35 AM
    Cross-posting from #ui: I want to hear how people using Google services deal with batch user ingestion
  • e

    eager-answer-71364

    11/08/2021, 7:16 AM
    Any ideas?
  • f

    full-area-6720

    11/17/2021, 6:14 AM
    One more question: is foreign key data ingestable for Redshift?
  • o

    orange-flag-48535

    12/09/2021, 6:47 AM
    Feature request for a "datahub ingest rollback --latest" command - https://feature-requests.datahubproject.io/b/User-Experience/p/cli-option-for-rollback-of-latest-ingestion-run
  • g

    gentle-florist-49869

    12/13/2021, 9:38 PM
    Hello, it is a simple question; if someone can help, that would be great
  • g

    gentle-florist-49869

    12/13/2021, 9:39 PM
    # A sample recipe that pulls metadata from MSSQL and puts it into DataHub
    # using the Rest API.
    source:
      type: mssql
      config:
        username: sa
        password: ${MSSQL_PASSWORD}
        database: DemoData
    transformers:
      - type: "fully-qualified-class-name-of-transformer"
        config:
          some_property: "some.value"
    sink:
      type: "datahub-rest"
      config:
        server: "http://localhost:8080"
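    For reference, a minimal sketch of running an equivalent recipe programmatically with the ingestion framework's Pipeline API, assuming the acryl-datahub package is installed; the placeholder transformer is dropped, and MSSQL_PASSWORD is read from the environment since ${...} expansion only happens when the CLI loads a YAML recipe:

    import os

    from datahub.ingestion.run.pipeline import Pipeline

    # Build the recipe as a plain dict (same keys as the YAML above).
    pipeline = Pipeline.create(
        {
            "source": {
                "type": "mssql",
                "config": {
                    "username": "sa",
                    "password": os.environ["MSSQL_PASSWORD"],
                    "database": "DemoData",
                },
            },
            "sink": {
                "type": "datahub-rest",
                "config": {"server": "http://localhost:8080"},
            },
        }
    )
    pipeline.run()
    pipeline.raise_from_status()  # fail loudly if the run reported errors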
  • w

    witty-butcher-82399

    12/14/2021, 2:04 PM
    I'm testing the DBT connector and we are also having issues because of this restriction on the metadata files being local. In general, connectors fetch/pull metadata from the source system and publish it into the data catalog. However, in the case of the DBT connector, the metadata files are not in the domain of the source system but local. So in practice, the DBT connector requires the source owner to deliver/pull the manifest and catalog files into the domain of the data catalog. Instead, we should avoid this file exchange and, as usual, directly fetch the metadata files from somewhere in the domain of the source system. A simple way to make the DBT connector follow the same approach as other connectors could be: metadata files are URIs, so they live somewhere in the domain of the source system. WDYT? Any other experiences with the DBT connector?
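    For context, a hedged sketch of how the dbt source is configured today, assuming its documented manifest_path / catalog_path / target_platform fields; the artifact paths currently have to be files the ingestion process can open locally, which is the restriction discussed above (paths and platform are illustrative):

    # dbt source fragment (Python-dict form of the YAML recipe section).
    dbt_source = {
        "type": "dbt",
        "config": {
            "manifest_path": "/path/to/manifest.json",  # artifact from `dbt run` / `dbt compile`
            "catalog_path": "/path/to/catalog.json",    # artifact from `dbt docs generate`
            "target_platform": "redshift",              # platform the dbt models materialize on
        },
    }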
  • g

    gentle-florist-49869

    12/14/2021, 4:07 PM
    Has anyone already tested filtering and ingesting a specific table from MySQL? I'm using the recipe below
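    The recipe referenced above isn't captured in this archive; for reference, a hedged sketch of a source section for pulling a single MySQL table, assuming the standard SQL-source table_pattern / schema_pattern regex options (host, database, and table names are illustrative):

    # MySQL source fragment; allow/deny entries are regexes. For MySQL the
    # tables are matched as <database>.<table>, so anchor the pattern accordingly.
    mysql_source = {
        "type": "mysql",
        "config": {
            "host_port": "localhost:3306",
            "username": "datahub",
            "password": "datahub",
            "database": "demo_db",
            "table_pattern": {
                "allow": ["^demo_db\\.orders$"],
            },
            "schema_pattern": {
                "deny": ["^information_schema$", "^performance_schema$"],
            },
        },
    }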
  • g

    gray-table-56299

    12/14/2021, 6:09 PM
    Any updates on this? Are there any plans to provide ingestion libraries in Java in the future?
  • m

    miniature-eve-89383

    12/15/2021, 8:23 PM
    Is there a reason why the datahub_docker.sh script only accepts relative paths for the -c option?
  • a

    abundant-photographer-45796

    12/24/2021, 5:49 AM
    Same as @colossal-furniture-76714. Do you have any suggestions?
  • b

    big-coat-53708

    12/28/2021, 9:16 AM
    Hi @boundless-student-48844, I saw this discussion while I was evaluating the difference between DataHub and Amundsen. The ingestion duration you provided is a little bit shocking. Currently, our team is using Amundsen, and each Hive ingestion finishes in under 30 min for 40K tables and 2.5 million columns. Our approach actually combined the two methods you mentioned: 1. the Amundsen ingestion framework queries the Hive Metastore directly, and 2. we also applied partitioning and multiprocessing on top of it. I'm not saying that Amundsen is better than DataHub; I just want to share that Amundsen is running a more efficient approach on this one, and maybe DataHub could learn from it. More details about how we did the partitioning are written in this issue.
    thank you 1
  • r

    rich-policeman-92383

    12/30/2021, 9:31 AM
    Does running a pipeline with pipeline_name specified enable Pipelines on the UI? I have tried running this pipeline, but nothing named "pipeline" is visible on the UI.
    👀 1
    plus1 1
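    For reference, pipeline_name is a top-level recipe field used to identify an ingestion run (for example for rollback and stateful-ingestion bookkeeping); as far as I know, the "Pipelines" section of the UI lists DataFlow/DataJob entities from orchestrators rather than ingestion runs, so a recipe name would not show up there. A hedged sketch of where the field sits (names illustrative):

    # Recipe expressed as a dict: pipeline_name sits at the top level,
    # alongside source and sink.
    recipe = {
        "pipeline_name": "nightly_mysql_ingestion",  # illustrative run name
        "source": {"type": "mysql", "config": {"host_port": "localhost:3306"}},
        "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
    }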
  • l

    loud-holiday-22352

    01/05/2022, 7:11 AM
    @nice-planet-17111
  • g

    gentle-sundown-2310

    01/05/2022, 6:41 PM
    It was working fine the first time I tried it, but I keep getting the error afterwards.
  • t

    thankful-businessperson-69424

    01/11/2022, 10:51 AM
    Hi, is there an example of an Elasticsearch source configuration?
  • b

    breezy-controller-54597

    01/12/2022, 4:57 AM
    I used the pattern_add_dataset_tags transformer to add tags, but it only gave me a single tag even when multiple patterns in the rules matched. It would be nice if there were a function that applied multiple tags when multiple conditions are met.
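    For reference, a hedged sketch of the transformer configuration being described, assuming the documented pattern_add_dataset_tags shape where each regex rule maps to a list of tag URNs; whether a dataset matching several rules receives all of the corresponding tags is exactly the behaviour questioned above (patterns and tag names are illustrative):

    # Transformer fragment (dict form of the YAML recipe section). Each key
    # under "rules" is a regex matched against the dataset, mapping to the
    # tags to attach.
    transformers = [
        {
            "type": "pattern_add_dataset_tags",
            "config": {
                "tag_pattern": {
                    "rules": {
                        ".*orders.*": ["urn:li:tag:Sales"],
                        ".*pii.*": ["urn:li:tag:PII", "urn:li:tag:Restricted"],
                    }
                }
            },
        }
    ]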
  • r

    red-pizza-28006

    01/12/2022, 1:06 PM
    @dazzling-judge-80093
  • d

    damp-queen-61493

    01/19/2022, 9:26 PM
    Can I run a transform against dataset fields?
  • a

    acoustic-wolf-70583

    01/21/2022, 12:54 AM
    @big-carpet-38439 Is there a way I can check what was published to this topic? MetadataChangeEvent_v4. I tried using an external consumer, but I'm having issues.
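    For reference, a minimal sketch of an external consumer for that topic, assuming the quickstart broker and schema registry at localhost:9092 / localhost:8081 and confluent-kafka[avro] installed; the messages are Avro-encoded against the schema registry, so a plain byte-level consumer prints undecodable output, which is often the symptom behind "having issues":

    from confluent_kafka.avro import AvroConsumer

    # Avro-aware consumer: values are decoded using the schema registry.
    consumer = AvroConsumer(
        {
            "bootstrap.servers": "localhost:9092",            # assumed quickstart broker
            "group.id": "mce-debug-consumer",                  # illustrative group id
            "schema.registry.url": "http://localhost:8081",    # assumed quickstart registry
            "auto.offset.reset": "earliest",
        }
    )
    consumer.subscribe(["MetadataChangeEvent_v4"])

    try:
        while True:
            msg = consumer.poll(1.0)
            if msg is None:
                continue
            if msg.error():
                print(f"consumer error: {msg.error()}")
                continue
            print(msg.value())  # the decoded MetadataChangeEvent as a dict
    finally:
        consumer.close()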
  • b

    blue-boots-43993

    01/27/2022, 6:26 PM
    Is having whitespace in a URN "illegal"? This is one of the potential reasons for an unsuccessful lineage graph
  • l

    lemon-hydrogen-83671

    01/28/2022, 6:17 PM
    Hey folks, I was wondering how others are dealing with Kafka-to-Kafka lineage. I was thinking of setting up something with the Python lineage emitter, but it seemed kind of hacky. Would love to hear how others have approached it
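    For what it's worth, a minimal sketch of the lineage-emitter approach for a topic-to-topic edge, assuming the REST endpoint at localhost:8080 and illustrative topic names; Kafka-to-Kafka lineage is just dataset-to-dataset lineage with both datasets on the kafka platform:

    import datahub.emitter.mce_builder as builder
    from datahub.emitter.rest_emitter import DatahubRestEmitter

    # Model the upstream and downstream topics as datasets on the "kafka" platform.
    upstream_topic = builder.make_dataset_urn(platform="kafka", name="orders.raw", env="PROD")
    downstream_topic = builder.make_dataset_urn(platform="kafka", name="orders.enriched", env="PROD")

    # Declare the upstream -> downstream edge as a lineage MCE and emit it.
    lineage_mce = builder.make_lineage_mce(
        upstream_urns=[upstream_topic],
        downstream_urn=downstream_topic,
    )

    emitter = DatahubRestEmitter("http://localhost:8080")
    emitter.emit_mce(lineage_mce)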
  • f

    full-leather-27343

    01/28/2022, 10:50 PM
    Hello, for BigQuery it seems that the stats for views are not getting imported. Is this normal behaviour? I would love to see whether the views are also used and who uses them, not only the tables.
  • b

    brief-apartment-60236

    01/31/2022, 5:45 PM
    Copy code
    # Imports assume a recent acryl-datahub package; class names are aliased to
    # match the original snippet.
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        ChangeTypeClass,
        DatasetPropertiesClass as DatasetProperties,
        DatasetSnapshotClass as DatasetSnapshot,
    )

    properties = {
        "High LWM": "2022-01-19 04:00:00"
    }

    platform = 'hive'
    env = 'PROD'

    dataset_name = 'integrated_xx.yy'

    datasetUrn = f"urn:li:dataset:(urn:li:dataPlatform:{platform},{dataset_name},{env})"

    # todo: I want to get the existing set of properties and upsert my 'properties' value and then write back to datahub

    # Note: this snapshot is built but never emitted; only the MCP below is sent.
    dataset_snapshot = DatasetSnapshot(
        urn=datasetUrn, aspects=[],
    )

    # Construct a dataset properties object.
    dataset_properties = DatasetProperties(customProperties=properties)

    dataset_snapshot.aspects.append(dataset_properties)

    # Construct a MetadataChangeProposalWrapper object.
    properties_mcp = MetadataChangeProposalWrapper(
        entityType="dataset",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn=datasetUrn,
        aspectName="datasetProperties",
        aspect=dataset_properties,
    )

    # Emit the proposal to the GMS REST endpoint (Slack's auto-link "<...>"
    # wrapping removed from the URL).
    Restemitter = DatahubRestEmitter("http://localhost:8080")

    Restemitter.emit_mcp(properties_mcp)
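    For the todo above, one hedged way to do the read-modify-write is to fetch the current datasetProperties aspect from GMS and merge before emitting; the sketch below would replace the single dataset_properties = DatasetProperties(customProperties=properties) line. It reuses datasetUrn, properties, and DatasetProperties from the snippet above, assumes GMS at localhost:8080 with its Rest.li /aspects endpoint, and the exact response nesting may vary between server versions:

    from urllib.parse import quote

    import requests

    gms = "http://localhost:8080"

    # Fetch the latest datasetProperties aspect (version=0 means "latest").
    resp = requests.get(
        f"{gms}/aspects/{quote(datasetUrn, safe='')}",
        params={"aspect": "datasetProperties", "version": 0},
        headers={"X-RestLi-Protocol-Version": "2.0.0"},
    )

    existing_properties = {}
    if resp.ok:
        payload = resp.json()
        # The aspect value is nested under its fully-qualified record name;
        # adjust this lookup if your server returns a different shape.
        aspect = payload.get("aspect", {}).get("com.linkedin.dataset.DatasetProperties", {})
        existing_properties = aspect.get("customProperties", {}) or {}

    # Merge: keep existing keys, overwrite/add the new ones, then build the
    # aspect from the merged dict and emit it as in the snippet above.
    merged = {**existing_properties, **properties}
    dataset_properties = DatasetProperties(customProperties=merged)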
  • w

    white-animal-39458

    02/02/2022, 1:40 AM
    Hi everyone, has anyone ingested metadata to gather dataset-like information about in-house OpenAPI endpoints? I'd like to connect for more questions. (https://datahubproject.io/docs/metadata-ingestion/source_docs/openapi/)