# ingestion
  • polite-application-51650
    05/26/2022, 6:55 AM
    Can anyone tell me how to use the DataHub Actions integration with BigQuery and Great Expectations? @big-carpet-38439
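    For reference, a hedged sketch of how the Great Expectations side is usually wired up: register DataHub's validation action in a checkpoint's action_list (assumes `acryl-datahub[great-expectations]` is installed; the GMS URL is illustrative). It works the same way for BigQuery-backed datasources:
    ```python
    # Hedged sketch: add DataHub's validation action to a Great Expectations
    # checkpoint so assertion results are pushed to DataHub.
    datahub_action = {
        "name": "datahub_action",
        "action": {
            "module_name": "datahub.integrations.great_expectations.action",
            "class_name": "DataHubValidationAction",
            "server_url": "http://localhost:8080",  # illustrative GMS endpoint
        },
    }
    # Append datahub_action to the `action_list` of the checkpoint that runs
    # your BigQuery expectations (e.g. great_expectations/checkpoints/*.yml).
    ```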
  • gorgeous-optician-32034
    05/26/2022, 4:37 PM
    Just a note on deleting: you can use `stateful_ingestion` and configure it to delete datasets that don't show up in the latest ingestion. For example, see the `stateful_*` configuration options on sources like hive. Loving datahub!
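    For context, a hedged sketch of such a setup run programmatically (host names and the pipeline name are illustrative; stateful ingestion needs a `pipeline_name` and a DataHub sink to store its state):
    ```python
    from datahub.ingestion.run.pipeline import Pipeline

    # Hedged sketch: hive source with stale-metadata removal enabled.
    pipeline = Pipeline.create(
        {
            "pipeline_name": "hive_prod",  # required so state is keyed per pipeline
            "source": {
                "type": "hive",
                "config": {
                    "host_port": "localhost:10000",  # illustrative
                    "stateful_ingestion": {
                        "enabled": True,
                        # soft-deletes datasets missing from the latest run
                        "remove_stale_metadata": True,
                    },
                },
            },
            "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
        }
    )
    pipeline.run()
    pipeline.raise_from_status()
    ```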
  • lemon-hydrogen-83671
    05/26/2022, 6:26 PM
    TIL the hard way that the `simple_add_dataset_tags` transformer will replace any existing tags you have on a dataset. Keep an eye out for that!
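    For context, a hedged sketch of where that transformer sits in a recipe; because it overwrites the globalTags aspect emitted by the source, tags added by hand in the UI can be lost on the next run (the tag URN is illustrative):
    ```python
    # Hedged sketch of the transformer section of a recipe (dict form).
    # simple_add_dataset_tags overwrites each touched dataset's globalTags
    # aspect, so any tags not listed here are dropped.
    recipe_fragment = {
        "transformers": [
            {
                "type": "simple_add_dataset_tags",
                "config": {"tag_urns": ["urn:li:tag:NeedsReview"]},  # illustrative
            }
        ]
    }
    ```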
  • nutritious-bird-77396
    05/26/2022, 10:25 PM
    @big-carpet-38439 I see that all the ingestion sources added through the frontend have been running twice at the same scheduled time. As far as entries in the table go, I see only one entry per source:
    ```sql
    select * from metadata_aspect_v2 where urn like 'urn:li:dataHubIngestionSource:%' and version = 0 and aspect = 'dataHubIngestionSourceInfo'
    ```
    Do you know what else could trigger this duplicate execution?
  • clean-piano-28976
    05/27/2022, 10:01 AM
    Hi all 👋 I'm seeing some weird behaviour when trying to delete all dbt datasets using the command below. The right number of entities to delete is detected, but when checking the UI the data is still available (I thought this could be down to timing or caching, but it doesn't look like it). Screenshots in thread:
    ```bash
    datahub delete --env PROD --entity_type dataset --platform dbt --hard
    ```
  • calm-dinner-63735
    05/28/2022, 6:52 AM
    Can I ingest a business glossary through the DataHub UI?
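    For reference, glossaries are normally loaded from a YAML file via the `datahub-business-glossary` source; a hedged sketch of running it programmatically (the file path and server URL are illustrative), and the same recipe can be pasted into the UI ingestion form:
    ```python
    from datahub.ingestion.run.pipeline import Pipeline

    # Hedged sketch: ingest a business glossary defined in a local YAML file.
    Pipeline.create(
        {
            "source": {
                "type": "datahub-business-glossary",
                "config": {"file": "./business_glossary.yml"},  # illustrative path
            },
            "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
        }
    ).run()
    ```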
  • polite-application-51650
    05/30/2022, 5:23 AM
    Hi all, can anyone tell me what information can be ingested using the Java/Python emitters provided by DataHub?
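    Broadly, the emitters can send any metadata aspect (dataset properties, schemas, lineage, ownership, tags, and so on). A hedged sketch with the Python REST emitter; the dataset name and description are illustrative:
    ```python
    from datahub.emitter.mce_builder import make_dataset_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import ChangeTypeClass, DatasetPropertiesClass

    # Hedged sketch: emit a datasetProperties aspect for an illustrative hive table.
    emitter = DatahubRestEmitter(gms_server="http://localhost:8080")
    emitter.emit(
        MetadataChangeProposalWrapper(
            entityType="dataset",
            changeType=ChangeTypeClass.UPSERT,
            entityUrn=make_dataset_urn("hive", "db.example_table", "PROD"),
            aspectName="datasetProperties",
            aspect=DatasetPropertiesClass(description="Emitted via the Python SDK"),
        )
    )
    ```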
  • cuddly-arm-8412
    05/30/2022, 8:50 AM
    Hi team! I want to initialize our users in DataHub via the ingestion module. How should I build this with corpUser? Are there any encapsulation methods in the metadata-ingestion module?
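    A hedged sketch of creating a user by emitting a corpUserInfo aspect (the username, display name and email are illustrative):
    ```python
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import ChangeTypeClass, CorpUserInfoClass

    # Hedged sketch: create/update a corpuser entity. All identifiers are illustrative.
    DatahubRestEmitter(gms_server="http://localhost:8080").emit(
        MetadataChangeProposalWrapper(
            entityType="corpuser",
            changeType=ChangeTypeClass.UPSERT,
            entityUrn="urn:li:corpuser:jdoe",
            aspectName="corpUserInfo",
            aspect=CorpUserInfoClass(
                active=True,
                displayName="Jane Doe",
                email="jdoe@example.com",
            ),
        )
    )
    ```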
  • salmon-angle-92685
    05/30/2022, 1:25 PM
    Hey guys, I am having trouble creating lineages. I want to create lineage between my S3 tables and my Redshift tables, which have the same names. Do you have an example of how I could do that? The dataset-to-dataset lineage example on Git is too simple; I can't make it work. Thank you so much!
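    A hedged sketch of emitting S3-to-Redshift lineage with the Python emitter. Both dataset names below are illustrative; the URNs must match exactly what the S3 and Redshift sources produced:
    ```python
    from datahub.emitter.mce_builder import make_dataset_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        ChangeTypeClass,
        DatasetLineageTypeClass,
        UpstreamClass,
        UpstreamLineageClass,
    )

    # Hedged sketch: declare the S3 dataset as upstream of the Redshift table.
    upstream = UpstreamClass(
        dataset=make_dataset_urn("s3", "my-bucket/orders", "PROD"),  # illustrative
        type=DatasetLineageTypeClass.COPY,
    )
    DatahubRestEmitter(gms_server="http://localhost:8080").emit(
        MetadataChangeProposalWrapper(
            entityType="dataset",
            changeType=ChangeTypeClass.UPSERT,
            entityUrn=make_dataset_urn("redshift", "analytics.public.orders", "PROD"),
            aspectName="upstreamLineage",
            aspect=UpstreamLineageClass(upstreams=[upstream]),
        )
    )
    ```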
  • modern-laptop-12942
    05/30/2022, 2:48 PM
    Hi all. I can ingest from Snowflake using the UI, but it fails using the CLI. Does anyone have any idea? Thank you.
  • witty-butcher-82399
    05/30/2022, 4:55 PM
    Hi DH community! I have a couple of use cases where I need to apply a function to all datasets (e.g. set some aspect). One option could be to define such a function as a transform and apply it in all connectors and ingestions; however, that would leave out all datasets ingested via push. Also, in some use cases I may need an aggregated view of the datasets. So: does DH provide an interface for this kind of batch processing, or is there any plan for one? Has anyone here solved this use case before? Thanks!
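    One possible route, as a hedged sketch: iterate over every dataset URN via the Python graph client and emit the aspect per URN. Both `get_urns_by_filter` and this minimal `MetadataChangeProposalWrapper` form are assumptions about recent acryl-datahub releases, and the tag URN is illustrative:
    ```python
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig
    from datahub.metadata.schema_classes import GlobalTagsClass, TagAssociationClass

    graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))

    # Hedged sketch: stamp every dataset with a tag. Note this overwrites each
    # dataset's globalTags aspect rather than appending to it.
    for urn in graph.get_urns_by_filter(entity_types=["dataset"]):
        graph.emit(
            MetadataChangeProposalWrapper(
                entityUrn=urn,
                aspect=GlobalTagsClass(tags=[TagAssociationClass(tag="urn:li:tag:Reviewed")]),
            )
        )
    ```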
  • alert-football-80212
    05/31/2022, 11:52 AM
    Hi all, is there a command/API to check whether a dataset exists in DataHub?
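    A hedged sketch with the Python graph client; `DataHubGraph.exists` is an assumption about recent acryl-datahub releases, and the URN is illustrative:
    ```python
    from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig

    graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))
    urn = "urn:li:dataset:(urn:li:dataPlatform:hive,db.example_table,PROD)"  # illustrative
    print(graph.exists(urn))  # exists() assumed available in recent client releases
    ```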
  • glamorous-microphone-33484
    06/01/2022, 1:27 AM
    Hi all, I need help from the folks here, as I am not very familiar with restli. I was trying to ingest a dataset by calling the "ACTION_INGEST" method in (https://github.com/datahub-project/datahub/blob/master/metadata-service/restli-ser[…]java/com/linkedin/metadata/resources/entity/EntityResource.java). However, the code is not able to capture the actor/user making the call to the method. Where should we make the necessary changes to include auditStamp information?
  • numerous-diamond-76461
    06/01/2022, 3:54 AM
    I'm ingesting data from Hive; the debug log from datahub-actions is below. I run DataHub at my company behind a proxy. What should I do? Thanks.
  • lemon-zoo-63387
    06/01/2022, 7:51 AM
    What permissions do I need to grant before I can see the schemas, tables and columns in a dataset?
  • lemon-zoo-63387
    06/01/2022, 8:30 AM
    No matter which one is selected, it is always loading. Why? Did some plug-ins fail to load successfully?
  • lemon-zoo-63387
    06/01/2022, 9:15 AM
    We are trying to ingest an Oracle database:
    ```bash
    sudo python3 -m pip install cx_Oracle --upgrade
    sudo python3 -m pip install cx_Oracle --upgrade --user
    ```
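    For context, the Oracle source is normally installed via the plugin extra (which pulls in cx_Oracle) and then run with a recipe; a hedged sketch with illustrative connection details:
    ```python
    # Hedged sketch. Install the plugin first:  pip install 'acryl-datahub[oracle]'
    from datahub.ingestion.run.pipeline import Pipeline

    Pipeline.create(
        {
            "source": {
                "type": "oracle",
                "config": {
                    "host_port": "oracle-host:1521",  # illustrative
                    "service_name": "ORCLPDB1",       # illustrative
                    "username": "datahub",
                    "password": "***",
                },
            },
            "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
        }
    ).run()
    ```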
  • cuddly-arm-8412
    06/01/2022, 12:37 PM
    Hi team, I would like to ask: are glossary terms limited to two levels, or is a multi-level hierarchy supported?
  • worried-painting-70907
    06/01/2022, 2:13 PM
    So, looking at DataHub I was wondering what would be the best way to handle the following scenario with ingestion. I have a Kafka deployment on Kubernetes through Helm charts, and I declare topics like this:
    ```yaml
    apiVersion: kafka.strimzi.io/v1beta2
    kind: KafkaTopic
    metadata:
      name: my-topic
      labels:
        strimzi.io/cluster: my-cluster
        infra.mycompany.net/dev_team: dev_team_a
        infra.mycompany.net/ops_team: ops_team_a
        infra.mycompany.net/app_id: app-0001
    spec:
      partitions: 10
      replicas: 3
      config:
        retention.ms: 604800000
        segment.bytes: 1073741824
    ```
    What would be the best way to get this data into DataHub? I don't know whether it would be best to commit changes to the Kafka integration so it can connect to Kubernetes and read this metadata, or whether there's a better way. It's kind of a huge edge case, since the metadata lives in the Kubernetes layer, but that's the only way we've found to manage this kind of metadata.
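    One hedged option, since no built-in source reads Strimzi CRDs: a small script that lists KafkaTopic objects via the Kubernetes API and emits their labels as custom properties on the matching Kafka datasets. The namespace, env and minimal MetadataChangeProposalWrapper form (a recent-client assumption) are illustrative:
    ```python
    from kubernetes import client, config  # assumes the `kubernetes` package

    from datahub.emitter.mce_builder import make_dataset_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import DatasetPropertiesClass

    # List Strimzi KafkaTopic custom resources in an illustrative namespace.
    config.load_kube_config()
    topics = client.CustomObjectsApi().list_namespaced_custom_object(
        group="kafka.strimzi.io", version="v1beta2", namespace="kafka", plural="kafkatopics"
    )

    # Emit each topic's labels as custom properties on the matching Kafka dataset.
    emitter = DatahubRestEmitter(gms_server="http://localhost:8080")
    for topic in topics["items"]:
        emitter.emit(
            MetadataChangeProposalWrapper(
                entityUrn=make_dataset_urn("kafka", topic["metadata"]["name"], "PROD"),
                aspect=DatasetPropertiesClass(
                    customProperties=topic["metadata"].get("labels", {})
                ),
            )
        )
    ```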
  • dry-zoo-35797
    06/01/2022, 3:16 PM
    Hello, has anybody been successful in connecting to Synapse using the MSSQL plugin? I could not get it working. I am using "ODBC Driver 17 for SQL Server". I would appreciate it if you could share your experience. Thanks, Mahbub
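    For reference, a hedged sketch of an mssql recipe going through ODBC (`use_odbc` and `uri_args` are options on the mssql source; the host and credentials are illustrative, and whether this works against Synapse specifically is exactly the open question here):
    ```python
    from datahub.ingestion.run.pipeline import Pipeline

    # Hedged sketch: mssql source over ODBC. All connection details are illustrative.
    Pipeline.create(
        {
            "source": {
                "type": "mssql",
                "config": {
                    "host_port": "myworkspace.sql.azuresynapse.net:1433",
                    "database": "mydb",
                    "username": "user",
                    "password": "***",
                    "use_odbc": True,
                    "uri_args": {"driver": "ODBC Driver 17 for SQL Server", "Encrypt": "yes"},
                },
            },
            "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
        }
    ).run()
    ```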
  • gorgeous-telephone-63628
    06/01/2022, 7:19 PM
    I am looking to add the ability to insert a custom property as part of ingestion. Could someone point me towards the code for adding a config option to ingestion?
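    One route that may avoid source-code changes, assuming a release that ships the dataset-properties transformers: attach a transformer to the recipe (the transformer name is an assumption, and the property key/value are illustrative):
    ```python
    # Hedged sketch of a recipe fragment: add a fixed custom property to every
    # dataset the source emits.
    recipe_fragment = {
        "transformers": [
            {
                "type": "simple_add_dataset_properties",
                "config": {"properties": {"cost_center": "cc-1234"}},  # illustrative
            }
        ]
    }
    ```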
  • worried-painting-70907
    06/01/2022, 7:53 PM
    General question about ingestion and tagging: is the idea that things like owners, domains, documentation, tags etc. are all manually added in the DataHub UI?
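    For reference, some of these can also be set at ingestion time via transformers; a hedged sketch (both URNs are illustrative):
    ```python
    # Hedged sketch of a recipe fragment: attach owners and tags during
    # ingestion instead of by hand in the UI.
    recipe_fragment = {
        "transformers": [
            {
                "type": "simple_add_dataset_ownership",
                "config": {"owner_urns": ["urn:li:corpuser:jdoe"]},
            },
            {
                "type": "simple_add_dataset_tags",
                "config": {"tag_urns": ["urn:li:tag:Finance"]},
            },
        ]
    }
    ```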
  • rich-policeman-92383
    06/02/2022, 5:12 AM
    During Oracle ingestion we are getting the errors below. Please advise.
    ```
    Caused by: java.lang.RuntimeException: Failed to validate entity URN urn:li:dataset:(urn:li:dataPlatform:oracle,b0225565.PORTIN_,MAR22,PROD)
        at com.linkedin.metadata.utils.EntityKeyUtils.getUrnFromProposal(EntityKeyUtils.java:33)
        at com.linkedin.restli.internal.server.RestLiMethodInvoker.doInvoke(RestLiMethodInvoker.java:177)
        ... 78 more
    Caused by: java.lang.IllegalArgumentException: Failed to convert urn to entity key: urns parts and key fields do not have same length
        at com.linkedin.metadata.utils.EntityKeyUtils.convertUrnToEntityKey(EntityKeyUtils.java:97)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:918)
        at java.lang.Thread.run(Thread.java:748)
    Caused by: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
        at com.datahub.util.RecordUtils.invokeProtectedMethod(RecordUtils.java:355)
        at com.datahub.util.RecordUtils.getRecordTemplateField(RecordUtils.java:258)
        at com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:30)
        ... 84 more
    Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.GeneratedMethodAccessor54.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at com.datahub.util.RecordUtils.invokeProtectedMethod(RecordUtils.java:353)
        ... 90 more
    Caused by: com.linkedin.data.template.TemplateOutputCastException: Invalid URN syntax: Invalid number of keys.: urn:li:dataset:(urn:li:dataPlatform:oracle,b0225565.PORTIN_,MAR22,PROD)
        at com.linkedin.common.urn.DatasetUrn$1.coerceOutput(DatasetUrn.java:78)
        at com.linkedin.data.template.RecordTemplate.obtainCustomType(RecordTemplate.java:366)
        ... 94 more
    Caused by: java.net.URISyntaxException: Invalid number of keys.: urn:li:dataset:(urn:li:dataPlatform:oracle,b0225565.PORTIN_,MAR22,PROD)
        at com.linkedin.common.urn.DatasetUrn.createFromUrn(DatasetUrn.java:49)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:918)
        at java.lang.Thread.run(Thread.java:748)
    Caused by: java.lang.RuntimeException: Failed to validate entity URN urn:li:dataset:(urn:li:dataPlatform:oracle,b0225565.PORTIN_,MAR22,PROD)
        at com.linkedin.metadata.utils.EntityKeyUtils.getUrnFromProposal(EntityKeyUtils.java:33)
        at com.linkedin.restli.internal.server.RestLiMethodInvoker.doInvoke(RestLiMethodInvoker.java:177)
        ... 78 more
    Caused by: java.lang.IllegalArgumentException: Failed to convert urn to entity key: urns parts and key fields do not have same length
        at com.linkedin.metadata.utils.EntityKeyUtils.convertUrnToEntityKey(EntityKeyUtils.java:97)
    ```
  • lemon-zoo-63387
    06/02/2022, 7:37 AM
    Could you please help me with a question: how do I get the MSSQL icon to display on the new ingestion source page? This command has already been executed:
    ```bash
    sudo python3 -m pip install 'acryl-datahub[mssql]'
    ```
  • dry-zoo-35797
    06/02/2022, 1:33 PM
    Hello members, I do not see any plugin for Azure Databricks. Does anyone have experience connecting to Databricks using an out-of-the-box or custom plugin? Thanks, Mahbub
  • lemon-alarm-94169
    06/02/2022, 2:36 PM
    Hi all, I've read about Snowflake column lineage support here, but I don't see it on the Column Stats tab even with `include_table_lineage: True`. Is it a feature exclusive to Acryl Data?
  • lemon-alarm-94169
    06/02/2022, 2:41 PM
    Also, I'm having difficulty showing lineage from S3 to a Snowflake table, though all the lineage between Snowflake tables shows fine. Can DataHub reflect the 'COPY INTO' query to show this lineage? Thanks in advance.
  • flat-window-44654
    06/02/2022, 8:28 PM
    Hi there! Quick question about ingesting metadata from Looker into DataHub. We need to ingest more metadata from Looker than DataHub extracts (specifically dashboard `view_count`, `updated_at`, `last_updater_name`, etc.). How could we go about ingesting that additional metadata? Would we have to write a transformer?
  • nutritious-bird-77396
    06/02/2022, 9:57 PM
    Hi team, I would like to make the default DataHub credentials that currently live in `user.props` more secure, either pulled in via an env var or otherwise. Could you help with this?
  • sparse-raincoat-42898
    06/03/2022, 11:09 AM
    Hello, I am running DataHub in Azure Kubernetes Service (AKS) using the Helm chart. I can access the UI, but I am not able to ingest data. How do I find the REST endpoint for ingestion? My GMS pod and service are up and running.
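    For reference, the REST sink points at the GMS service; a hedged sketch of the sink section for a recipe running inside the cluster (the service DNS name depends on your Helm release name and namespace; check `kubectl get svc`):
    ```python
    # Hedged sketch: in-cluster address of the GMS REST endpoint. The service
    # name below is illustrative and depends on your Helm release/namespace.
    sink = {
        "type": "datahub-rest",
        "config": {"server": "http://datahub-datahub-gms.datahub.svc.cluster.local:8080"},
    }
    ```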