# ingestion
  • q

    quiet-pilot-28237

    10/19/2021, 6:44 AM
    image.png
  • f

    freezing-teacher-87574

    10/25/2021, 8:50 AM
    Hello. How can I connect a recipe through SSL to Superset? And how do I configure the frontend to navigate to Superset? Thanks.
  • p

    powerful-manchester-27331

    10/26/2021, 6:25 PM
    Hello team, I'm getting the error below, any idea?
  • v

    victorious-dream-46349

    10/27/2021, 4:38 PM
    Please refer to this ingestion-related question I asked in the #troubleshoot channel
  • r

    red-pizza-28006

    10/29/2021, 9:20 AM
    Thanks, I tried setting that up but now I'm running into this issue, any ideas what I'm missing? AD is authenticating the user correctly though
    Copy code
    Caused by: java.util.concurrent.CompletionException: org.pac4j.core.exception.TechnicalException: Bad token response, error=invalid_client
        at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
        at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1606)
        at play.core.j.HttpExecutionContext$$anon$2.run(HttpExecutionContext.scala:56)
        ... 6 common frames omitted
    Caused by: org.pac4j.core.exception.TechnicalException: Bad token response, error=invalid_client
        at auth.sso.oidc.custom.CustomOidcAuthenticator.validate(CustomOidcAuthenticator.java:155)
        at auth.sso.oidc.custom.CustomOidcAuthenticator.validate(CustomOidcAuthenticator.java:39)
        at org.pac4j.core.client.BaseClient.retrieveCredentials(BaseClient.java:71)
        at org.pac4j.core.client.IndirectClient.getCredentials(IndirectClient.java:140)
        at org.pac4j.core.engine.DefaultCallbackLogic.perform(DefaultCallbackLogic.java:89)
        at auth.sso.oidc.OidcCallbackLogic.perform(OidcCallbackLogic.java:87)
        at controllers.SsoCallbackController$SsoCallbackLogic.perform(SsoCallbackController.java:62)
        at controllers.SsoCallbackController$SsoCallbackLogic.perform(SsoCallbackController.java:49)
        at org.pac4j.play.CallbackController.lambda$callback$0(CallbackController.java:56)
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
        ... 7 common frames omitted
  • a

    agreeable-hamburger-38305

    11/04/2021, 6:35 AM
    Cross-posting from #ui: I want to hear how people using Google services deal with batch user ingestion
  • e

    eager-answer-71364

    11/08/2021, 7:16 AM
    Any ideas?
  • f

    full-area-6720

    11/17/2021, 6:14 AM
    One more question: is foreign key data ingestable for Redshift?
  • o

    orange-flag-48535

    12/09/2021, 6:47 AM
    Feature request for a "datahub ingest rollback --latest" command - https://feature-requests.datahubproject.io/b/User-Experience/p/cli-option-for-rollback-of-latest-ingestion-run
  • g

    gentle-florist-49869

    12/13/2021, 9:38 PM
    Hello, it is a simple question; if someone can help, that would be great
  • g

    gentle-florist-49869

    12/13/2021, 9:39 PM
    # A sample recipe that pulls metadata from MSSQL and puts it into DataHub
    # using the Rest API.
    source:
      type: mssql
      config:
        username: sa
        password: ${MSSQL_PASSWORD}
        database: DemoData
    transformers:
      - type: "fully-qualified-class-name-of-transformer"
        config:
          some_property: "some.value"
    sink:
      type: "datahub-rest"
      config:
        server: "http://localhost:8080"
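    For reference, a minimal sketch of running an equivalent recipe programmatically with the ingestion framework's Pipeline API, assuming the acryl-datahub package is installed; the placeholder transformer is dropped, and MSSQL_PASSWORD is read from the environment since ${...} expansion only happens when the CLI loads a YAML recipe:

    import os

    from datahub.ingestion.run.pipeline import Pipeline

    # Build the recipe as a plain dict (same keys as the YAML above).
    pipeline = Pipeline.create(
        {
            "source": {
                "type": "mssql",
                "config": {
                    "username": "sa",
                    "password": os.environ["MSSQL_PASSWORD"],
                    "database": "DemoData",
                },
            },
            "sink": {
                "type": "datahub-rest",
                "config": {"server": "http://localhost:8080"},
            },
        }
    )
    pipeline.run()
    pipeline.raise_from_status()  # fail loudly if the run reported errors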
  • w

    witty-butcher-82399

    12/14/2021, 2:04 PM
    I'm testing the DBT connector and we are also having issues because of this restriction on the metadata files being local. In general, connectors fetch/pull metadata from the source system and publish it into the data catalog. However, in the case of the DBT connector, the metadata files are not in the domain of the source system but local. So in practice, the DBT connector requires the source owner to deliver/pull the manifest and catalog files into the domain of the data catalog. Instead, we should avoid this file exchange and, as usual, directly fetch the metadata files from somewhere in the domain of the source system. A simple way to make the DBT connector follow the same approach as other connectors could be: metadata files are URIs, so they live somewhere in the domain of the source system. WDYT? Any other experiences with the DBT connector?
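    For context, a hedged sketch of how the dbt source is configured today, assuming its documented manifest_path / catalog_path / target_platform fields; the artifact paths currently have to be files the ingestion process can open locally, which is the restriction discussed above (paths and platform are illustrative):

    # dbt source fragment (Python-dict form of the YAML recipe section).
    dbt_source = {
        "type": "dbt",
        "config": {
            "manifest_path": "/path/to/manifest.json",  # artifact from `dbt run` / `dbt compile`
            "catalog_path": "/path/to/catalog.json",    # artifact from `dbt docs generate`
            "target_platform": "redshift",              # platform the dbt models materialize on
        },
    }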
  • g

    gentle-florist-49869

    12/14/2021, 4:07 PM
    Has anyone already tested filtering and ingesting a specific table from MySQL? I'm using the recipe below
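    The recipe referenced above isn't captured in this archive; for reference, a hedged sketch of a source section for pulling a single MySQL table, assuming the standard SQL-source table_pattern / schema_pattern regex options (host, database, and table names are illustrative):

    # MySQL source fragment; allow/deny entries are regexes. For MySQL the
    # tables are matched as <database>.<table>, so anchor the pattern accordingly.
    mysql_source = {
        "type": "mysql",
        "config": {
            "host_port": "localhost:3306",
            "username": "datahub",
            "password": "datahub",
            "database": "demo_db",
            "table_pattern": {
                "allow": ["^demo_db\\.orders$"],
            },
            "schema_pattern": {
                "deny": ["^information_schema$", "^performance_schema$"],
            },
        },
    }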
  • g

    gray-table-56299

    12/14/2021, 6:09 PM
    Any updates on this? Are there any plans to provide ingestion libraries in Java in the future?
  • m

    miniature-eve-89383

    12/15/2021, 8:23 PM
    Is there a reason why the datahub_docker.sh script only accepts relative paths for the -c option?
  • a

    abundant-photographer-45796

    12/24/2021, 5:49 AM
    Same as @colossal-furniture-76714. Do you have any suggestions?
  • b

    big-coat-53708

    12/28/2021, 9:16 AM
    Hi @boundless-student-48844, I saw this discussion while I was evaluating the difference between DataHub and Amundsen. The ingestion duration you provided is a little bit shocking. Currently, our team is using Amundsen, and each Hive ingestion finishes in under 30 min for 40K tables and 2.5 million columns. Our approach actually combined the two methods you mentioned: 1. the Amundsen ingestion framework queries the Hive Metastore directly, and 2. we also applied partitioning and multiprocessing on top of it. I'm not saying that Amundsen is better than DataHub; I just want to share that Amundsen is running a more efficient approach on this one, and maybe DataHub could learn from it. More details about how we did the partitioning are written in this issue.
    thank you 1
  • r

    rich-policeman-92383

    12/30/2021, 9:31 AM
    Does running a pipeline with pipeline_name specified enable Pipelines on the UI? I have tried running this pipeline, but nothing named "pipeline" is visible on the UI.
    👀 1
    plus1 1
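    For reference, pipeline_name is a top-level recipe field used to identify an ingestion run (for example for rollback and stateful-ingestion bookkeeping); as far as I know, the "Pipelines" section of the UI lists DataFlow/DataJob entities from orchestrators rather than ingestion runs, so a recipe name would not show up there. A hedged sketch of where the field sits (names illustrative):

    # Recipe expressed as a dict: pipeline_name sits at the top level,
    # alongside source and sink.
    recipe = {
        "pipeline_name": "nightly_mysql_ingestion",  # illustrative run name
        "source": {"type": "mysql", "config": {"host_port": "localhost:3306"}},
        "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
    }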
  • l

    loud-holiday-22352

    01/05/2022, 7:11 AM
    @nice-planet-17111
  • g

    gentle-sundown-2310

    01/05/2022, 6:41 PM
    It was working fine the first time I tried it, but I keep getting the error afterwards.
  • t

    thankful-businessperson-69424

    01/11/2022, 10:51 AM
    Hi, is there an example of an Elasticsearch source configuration?
  • b

    breezy-controller-54597

    01/12/2022, 4:57 AM
    I used the pattern_add_dataset_tags transformer to add tags, but it only gave me a single tag even when multiple patterns in the rules matched. It would be nice if there were a function that applied multiple tags when multiple conditions are met.
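    For reference, a hedged sketch of the transformer configuration being described, assuming the documented pattern_add_dataset_tags shape where each regex rule maps to a list of tag URNs; whether a dataset matching several rules receives all of the corresponding tags is exactly the behaviour questioned above (patterns and tag names are illustrative):

    # Transformer fragment (dict form of the YAML recipe section). Each key
    # under "rules" is a regex matched against the dataset, mapping to the
    # tags to attach.
    transformers = [
        {
            "type": "pattern_add_dataset_tags",
            "config": {
                "tag_pattern": {
                    "rules": {
                        ".*orders.*": ["urn:li:tag:Sales"],
                        ".*pii.*": ["urn:li:tag:PII", "urn:li:tag:Restricted"],
                    }
                }
            },
        }
    ]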
  • r

    red-pizza-28006

    01/12/2022, 1:06 PM
    @dazzling-judge-80093
  • d

    damp-queen-61493

    01/19/2022, 9:26 PM
    Can I run a transform against dataset fields?
  • a

    acoustic-wolf-70583

    01/21/2022, 12:54 AM
    @big-carpet-38439 Is there a way I can check what was published to this topic? MetadataChangeEvent_v4. I tried using an external consumer, but I'm having issues.
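    For reference, a minimal sketch of an external consumer for that topic, assuming the quickstart broker and schema registry at localhost:9092 / localhost:8081 and confluent-kafka[avro] installed; the messages are Avro-encoded against the schema registry, so a plain byte-level consumer prints undecodable output, which is often the symptom behind "having issues":

    from confluent_kafka.avro import AvroConsumer

    # Avro-aware consumer: values are decoded using the schema registry.
    consumer = AvroConsumer(
        {
            "bootstrap.servers": "localhost:9092",            # assumed quickstart broker
            "group.id": "mce-debug-consumer",                  # illustrative group id
            "schema.registry.url": "http://localhost:8081",    # assumed quickstart registry
            "auto.offset.reset": "earliest",
        }
    )
    consumer.subscribe(["MetadataChangeEvent_v4"])

    try:
        while True:
            msg = consumer.poll(1.0)
            if msg is None:
                continue
            if msg.error():
                print(f"consumer error: {msg.error()}")
                continue
            print(msg.value())  # the decoded MetadataChangeEvent as a dict
    finally:
        consumer.close()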
  • b

    blue-boots-43993

    01/27/2022, 6:26 PM
    Is having whitespace in a URN "illegal"? This is one of the potential reasons for an unsuccessful lineage graph
  • l

    lemon-hydrogen-83671

    01/28/2022, 6:17 PM
    Hey folks, I was wondering how others are dealing with Kafka-to-Kafka lineage. I was thinking of setting up something with the Python lineage emitter, but it seemed kind of hacky. Would love to hear how others have approached it
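    For what it's worth, a minimal sketch of the lineage-emitter approach for a topic-to-topic edge, assuming the REST endpoint at localhost:8080 and illustrative topic names; Kafka-to-Kafka lineage is just dataset-to-dataset lineage with both datasets on the kafka platform:

    import datahub.emitter.mce_builder as builder
    from datahub.emitter.rest_emitter import DatahubRestEmitter

    # Model the upstream and downstream topics as datasets on the "kafka" platform.
    upstream_topic = builder.make_dataset_urn(platform="kafka", name="orders.raw", env="PROD")
    downstream_topic = builder.make_dataset_urn(platform="kafka", name="orders.enriched", env="PROD")

    # Declare the upstream -> downstream edge as a lineage MCE and emit it.
    lineage_mce = builder.make_lineage_mce(
        upstream_urns=[upstream_topic],
        downstream_urn=downstream_topic,
    )

    emitter = DatahubRestEmitter("http://localhost:8080")
    emitter.emit_mce(lineage_mce)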
  • f

    full-leather-27343

    01/28/2022, 10:50 PM
    Hello, for BigQuery it seems that the stats for views are not getting imported. Is this normal behaviour? I would love to see whether the views are also used and who uses them, not only the tables.
  • b

    brief-apartment-60236

    01/31/2022, 5:45 PM
    Copy code
    # Imports assume a recent acryl-datahub package; class names are aliased to
    # match the original snippet.
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        ChangeTypeClass,
        DatasetPropertiesClass as DatasetProperties,
        DatasetSnapshotClass as DatasetSnapshot,
    )

    properties = {
        "High LWM": "2022-01-19 04:00:00"
    }

    platform = 'hive'
    env = 'PROD'

    dataset_name = 'integrated_xx.yy'

    datasetUrn = f"urn:li:dataset:(urn:li:dataPlatform:{platform},{dataset_name},{env})"

    # todo: I want to get the existing set of properties and upsert my 'properties' value and then write back to datahub

    # Note: this snapshot is built but never emitted; only the MCP below is sent.
    dataset_snapshot = DatasetSnapshot(
        urn=datasetUrn, aspects=[],
    )

    # Construct a dataset properties object.
    dataset_properties = DatasetProperties(customProperties=properties)

    dataset_snapshot.aspects.append(dataset_properties)

    # Construct a MetadataChangeProposalWrapper object.
    properties_mcp = MetadataChangeProposalWrapper(
        entityType="dataset",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn=datasetUrn,
        aspectName="datasetProperties",
        aspect=dataset_properties,
    )

    # Emit the proposal to the GMS REST endpoint (Slack's auto-link "<...>"
    # wrapping removed from the URL).
    Restemitter = DatahubRestEmitter("http://localhost:8080")

    Restemitter.emit_mcp(properties_mcp)
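    For the todo above, one hedged way to do the read-modify-write is to fetch the current datasetProperties aspect from GMS and merge before emitting; the sketch below would replace the single dataset_properties = DatasetProperties(customProperties=properties) line. It reuses datasetUrn, properties, and DatasetProperties from the snippet above, assumes GMS at localhost:8080 with its Rest.li /aspects endpoint, and the exact response nesting may vary between server versions:

    from urllib.parse import quote

    import requests

    gms = "http://localhost:8080"

    # Fetch the latest datasetProperties aspect (version=0 means "latest").
    resp = requests.get(
        f"{gms}/aspects/{quote(datasetUrn, safe='')}",
        params={"aspect": "datasetProperties", "version": 0},
        headers={"X-RestLi-Protocol-Version": "2.0.0"},
    )

    existing_properties = {}
    if resp.ok:
        payload = resp.json()
        # The aspect value is nested under its fully-qualified record name;
        # adjust this lookup if your server returns a different shape.
        aspect = payload.get("aspect", {}).get("com.linkedin.dataset.DatasetProperties", {})
        existing_properties = aspect.get("customProperties", {}) or {}

    # Merge: keep existing keys, overwrite/add the new ones, then build the
    # aspect from the merged dict and emit it as in the snippet above.
    merged = {**existing_properties, **properties}
    dataset_properties = DatasetProperties(customProperties=merged)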
  • w

    white-animal-39458

    02/02/2022, 1:40 AM
    Hi everyone, has anyone ingested metadata to gather dataset-like information about in-house OpenAPI endpoints? I'd like to connect for more questions. (https://datahubproject.io/docs/metadata-ingestion/source_docs/openapi/)