# ingestion
• wooden-chef-22394 (07/27/2022, 8:22 AM)
How do I ingest a ClickHouse datasource that has no password?
source:
  type: clickhouse
  config:
    # Coordinates
    host_port: ******:8123
    # Credentials
    username: default
    password:
    # Options
    platform_instance: DatabaseNameToBeIngested
    include_views: True # whether to include views, defaults to True
    include_tables: True # whether to include tables, defaults to True
sink:
  type: datahub-rest
  config:
    server: http://*****:18080
It failed with 'password is incorrect':
DatabaseException: Orig exception: Code: 516. DB::Exception: default: Authentication failed: password is incorrect or there is no user with such name. (AUTHENTICATION_FAILED) (version 22.5.1.2079 (official build))
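One thing worth checking, as a sketch: a bare password: in YAML parses as null rather than an empty string. Assuming the ClickHouse source accepts an explicit empty string when authentication is disabled, the same recipe can be run programmatically like this (host and server values are placeholders, not the thread's masked values):

from datahub.ingestion.run.pipeline import Pipeline

# Hypothetical recipe: an explicit empty-string password instead of a blank
# value, since a bare `password:` parses as YAML null rather than "".
pipeline = Pipeline.create(
    {
        "source": {
            "type": "clickhouse",
            "config": {
                "host_port": "clickhouse-host:8123",  # placeholder
                "username": "default",
                "password": "",  # assumption: empty string for no-password auth
                "include_views": True,
                "include_tables": True,
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://datahub-gms:18080"},  # placeholder
        },
    }
)
pipeline.run()
pipeline.raise_from_status()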
• mysterious-nail-70388 (07/27/2022, 9:21 AM)
Hi, I downloaded the DataHub source for version 0.8.40, built war.war with ./gradlew metadata-service:war:build -x test, and replaced the WAR in the GMS container, which produced the following error.
• hallowed-lawyer-5424 (07/27/2022, 5:24 PM)
Hi Team, I am trying to ingest datasets/pipelines by sending our custom Kafka events to DataHub's Kafka using the Kafka ingestion recipes mentioned in the official docs. Can anyone suggest whether this is the right approach, or do I need to rely on Java/Python emitter code that watches our custom Kafka and ingests the metadata? Or is it possible to ingest the data without implementing any code, just by creating ingestion recipes?
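For reference, if emitter code does turn out to be necessary, a minimal sketch of pushing a single aspect through DataHub's Kafka with the Python Kafka emitter could look like the following (broker, schema-registry address, topic name, and description are placeholders, not values from the thread):

from datahub.emitter.kafka_emitter import DatahubKafkaEmitter, KafkaEmitterConfig
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.metadata.schema_classes import ChangeTypeClass, DatasetPropertiesClass

# Placeholder addresses for DataHub's Kafka broker and schema registry.
emitter = DatahubKafkaEmitter(
    KafkaEmitterConfig.parse_obj(
        {
            "connection": {
                "bootstrap": "broker:9092",
                "schema_registry_url": "http://schema-registry:8081",
            }
        }
    )
)

mcp = MetadataChangeProposalWrapper(
    entityType="dataset",
    changeType=ChangeTypeClass.UPSERT,
    entityUrn="urn:li:dataset:(urn:li:dataPlatform:kafka,my_custom_topic,PROD)",
    aspectName="datasetProperties",
    aspect=DatasetPropertiesClass(description="Built from a custom Kafka event"),
)

emitter.emit_mcp(mcp, callback=lambda err, msg: print(err) if err else None)
emitter.flush()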
• mysterious-pager-59554 (07/27/2022, 7:28 PM)
Hello Team, has anybody here worked on DataHub's integration with Great Expectations, i.e. pushing Great Expectations validation results to DataHub for a CSV/Parquet file? I could only accomplish this for SQL-like data sources (e.g. BigQuery).
• echoing-alligator-70530 (07/27/2022, 7:47 PM)
Hello everyone, is there a way to ingest LookML data strictly using the API, without specifying a repo for it? If we do have to specify a repo, is there a way for it to be a URL rather than a local path?
• gifted-knife-16120 (07/27/2022, 7:31 PM)
Hi Team, I got this error after I added the expectation below using Great Expectations:
    Copy code
    ,
        {
          "expectation_context": {
            "description": null
          },
          "expectation_type": "expect_column_values_to_match_regex",
          "kwargs": {
            "column": "account_type",
            "regex": "^(CLIENT)$"
          },
          "meta": {}
        }
• cool-vr-73109 (07/28/2022, 8:25 AM)
Hello everyone, below are two configurations: one for S3 data ingestion and one for file-based lineage ingestion for the same S3 source, intended to enable the Lineage tab. But the lineage ingestion is treated as a new ingestion and does not enable the Lineage tab for the S3 ingestion. Can you suggest what I am doing wrong here?
• brainy-intern-50400 (07/28/2022, 12:44 PM)
Hi everyone, I am trying to emit metadata with the Python emitter. Is there a validator for Metadata Change Events? I could not find a hint in the sources or in the docs.
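One hedged option, assuming the CLI still ships the datahub check mce-file command: serialize the events to a JSON file in the same layout the file sink uses, then let the CLI validate it. A sketch with a placeholder URN and description:

import json

from datahub.metadata.schema_classes import (
    DatasetPropertiesClass,
    DatasetSnapshotClass,
    MetadataChangeEventClass,
)

# A tiny example MCE; the URN and description are placeholders.
mce = MetadataChangeEventClass(
    proposedSnapshot=DatasetSnapshotClass(
        urn="urn:li:dataset:(urn:li:dataPlatform:hive,example.table,PROD)",
        aspects=[DatasetPropertiesClass(description="example dataset")],
    )
)

# to_obj() produces the union-wrapped JSON layout the file sink/source uses.
with open("mces.json", "w") as f:
    json.dump([mce.to_obj()], f, indent=2)

# Then validate from a shell:  datahub check mce-file mces.json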
• square-hair-99480 (07/28/2022, 1:50 PM)
Hello friends, quick question. In my organisation we have multiple Snowflake accounts. I have been running a POC ingesting metadata for one account, but I was asked whether, with multiple accounts, I can filter the Snowflake datasets by account. For instance, I see we have a path like
prod/snowflake/database/schema/table
but if I ingest metadata from another account, would I get something like
prod/snowflake/account/database/schema/table?
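One hedged way to get that extra path segment, assuming the Snowflake source honors platform_instance the way other sources do: set a distinct platform_instance per account in each recipe. A sketch with placeholder account and credential values:

from datahub.ingestion.run.pipeline import Pipeline

# Hypothetical per-account recipe: "account_a" becomes the instance segment,
# e.g. prod/snowflake/account_a/database/schema/table in the browse path.
pipeline = Pipeline.create(
    {
        "source": {
            "type": "snowflake",
            "config": {
                "account_id": "account_a",    # placeholder account identifier
                "username": "datahub_user",   # placeholder
                "password": "${SNOWFLAKE_PASS}",
                "platform_instance": "account_a",
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://localhost:8080"},  # placeholder
        },
    }
)
pipeline.run()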
• alert-fall-82501 (07/28/2022, 2:19 PM)
After ingesting the YAML file I am getting these issues, and the data table was not pushed to DataHub. Can anyone help me with this?
• dazzling-insurance-83303 (07/28/2022, 2:30 PM)
Seeking input on allow_deny_patterns from the wider community… please see my thread from the starting point below. Thanks! https://datahubspace.slack.com/archives/CUMUWQU66/p1658979688399899?thread_ts=1658807355.706189&cid=CUMUWQU66
• shy-chef-10188 (07/28/2022, 2:57 PM)
Hi Team, is there a way to enable the Queries tab on Hive datasets? We capture table usage from hive-metastore using hooks, and we would like to push this information to DataHub. I was able to ingest some sample data using the file source, but the Queries tab is still disabled in the UI. Can you please suggest how to enable the Queries tab for datasets ingested using the file source? Sample data ingested using the file source:
    Copy code
    {
      "auditHeader": null,
      "entityType": "dataset",
      "entityUrn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",
      "entityKeyAspect": null,
      "changeType": "UPSERT",
      "aspectName": "datasetUsageStatistics",
      "aspect": {
        "value": "{\"timestampMillis\": 1623888000000, \"eventGranularity\": {\"unit\": \"DAY\", \"multiple\": 1}, \"partitionSpec\": {\"type\": \"FULL_TABLE\", \"partition\": \"FULL_TABLE_SNAPSHOT\"}, \"uniqueUserCount\": 1, \"totalSqlQueries\": 2, \"topSqlQueries\": [\"select * from `test`\"], \"userCounts\": [{\"user\": \"urn:li:corpuser:xx\", \"count\": 2, \"userEmail\": \"xx\"}], \"fieldCounts\": [{\"fieldPath\": \"complaint_description\", \"count\": 2}, {\"fieldPath\": \"last_update_date\", \"count\": 2}, {\"fieldPath\": \"complaint_type\", \"count\": 2}, {\"fieldPath\": \"unique_key\", \"count\": 2}, {\"fieldPath\": \"source\", \"count\": 1}, {\"fieldPath\": \"city\", \"count\": 1}, {\"fieldPath\": \"map_tile\", \"count\": 1}, {\"fieldPath\": \"longitude\", \"count\": 1}, {\"fieldPath\": \"state_plane_y_coordinate\", \"count\": 1}, {\"fieldPath\": \"map_page\", \"count\": 1}, {\"fieldPath\": \"status_change_date\", \"count\": 1}, {\"fieldPath\": \"latitude\", \"count\": 1}, {\"fieldPath\": \"incident_zip\", \"count\": 1}, {\"fieldPath\": \"status\", \"count\": 1}, {\"fieldPath\": \"created_date\", \"count\": 1}, {\"fieldPath\": \"county\", \"count\": 1}, {\"fieldPath\": \"owning_department\", \"count\": 1}, {\"fieldPath\": \"street_name\", \"count\": 1}, {\"fieldPath\": \"close_date\", \"count\": 1}, {\"fieldPath\": \"street_number\", \"count\": 1}, {\"fieldPath\": \"incident_address\", \"count\": 1}, {\"fieldPath\": \"state_plane_x_coordinate\", \"count\": 1}, {\"fieldPath\": \"council_district_code\", \"count\": 1}, {\"fieldPath\": \"location\", \"count\": 1}]}",
        "contentType": "application/json"
      },
      "systemMetadata": {
        "lastObserved": 1626739200000,
        "runId": "test-hivequery-usage",
        "registryName": null,
        "registryVersion": null,
        "properties": null
      }
    }
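For comparison, a sketch of emitting that same datasetUsageStatistics aspect programmatically with the generated classes (the GMS address is a placeholder; whether this alone enables the Queries tab is exactly the open question above):

from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    CalendarIntervalClass,
    ChangeTypeClass,
    DatasetUsageStatisticsClass,
    TimeWindowSizeClass,
)

# Same numbers as the file-source sample above, minus the per-field counts.
usage = DatasetUsageStatisticsClass(
    timestampMillis=1623888000000,
    eventGranularity=TimeWindowSizeClass(unit=CalendarIntervalClass.DAY, multiple=1),
    uniqueUserCount=1,
    totalSqlQueries=2,
    topSqlQueries=["select * from `test`"],
)

mcp = MetadataChangeProposalWrapper(
    entityType="dataset",
    changeType=ChangeTypeClass.UPSERT,
    entityUrn="urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",
    aspectName="datasetUsageStatistics",
    aspect=usage,
)

DatahubRestEmitter("http://localhost:8080").emit_mcp(mcp)  # placeholder GMS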
• best-receptionist-79324 (07/28/2022, 4:34 PM)
    Hi all! I'm just getting started with DataHub. I'm using Python and have the emitter working at a basic level. We have a different ingestion process that defines the "platform_instance" in YAML before ingestion. Is there a way to define the "platform_instance" when using the emitter? Here is some of the code I'm using:
    Copy code
from typing import List

import datahub.emitter.mce_builder as builder
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    ChangeTypeClass,
    DatasetLineageTypeClass,
    UpstreamClass,
    UpstreamLineageClass as UpstreamLineage,
)

emitter = DatahubRestEmitter("http://localhost:8080")  # placeholder GMS address

upstream_table_1 = UpstreamClass(
    dataset=builder.make_dataset_urn("DEPENDENCY_MAP", "upstream_table_4", "DEV"),
    type=DatasetLineageTypeClass.TRANSFORMED,
)
upstream_tables: List[UpstreamClass] = [upstream_table_1]
upstream_table_2 = UpstreamClass(
    dataset=builder.make_dataset_urn("DEPENDENCY_MAP", "upstream_table_3", "DEV"),
    type=DatasetLineageTypeClass.TRANSFORMED,
)
upstream_tables.append(upstream_table_2)

# Construct a lineage object.
upstream_lineage = UpstreamLineage(upstreams=upstream_tables)

# Construct a MetadataChangeProposalWrapper object.
lineage_mcp = MetadataChangeProposalWrapper(
    entityType="dataset",
    changeType=ChangeTypeClass.UPSERT,
    entityUrn=builder.make_dataset_urn("DEPENDENCY_MAP", "downstream", "DEV"),
    aspectName="upstreamLineage",
    aspect=upstream_lineage,
)

# Emit metadata
emitter.emit_mcp(lineage_mcp)
    Thanks! --Rob
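One candidate answer, sketched: the builder module also has make_dataset_urn_with_platform_instance, which embeds an instance segment in the URN ("MY_INSTANCE" is a placeholder, not a value from the message):

import datahub.emitter.mce_builder as builder

# The instance is prefixed to the dataset name inside the URN, e.g.
# urn:li:dataset:(urn:li:dataPlatform:DEPENDENCY_MAP,MY_INSTANCE.upstream_table_4,DEV)
urn = builder.make_dataset_urn_with_platform_instance(
    platform="DEPENDENCY_MAP",
    name="upstream_table_4",
    platform_instance="MY_INSTANCE",  # placeholder instance name
    env="DEV",
)
print(urn)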
• adamant-napkin-88678 (07/28/2022, 10:31 PM)
If I have a materialized view, how do I see its lineage?
• echoing-alligator-70530 (07/28/2022, 10:05 PM)
Hey everyone, quick question around lineage: is there a way to specify that lineage between datasets is a replication? For example, in the lineage in the picture, is there a way to say that customers is a replication of users, orders, and payments?
• magnificent-notebook-88304 (07/29/2022, 6:20 AM)
Quick question: when we ingest table metadata from an Oracle table, can't we see which column the table data is partitioned on, maybe somewhere in the Properties tab? I can see the partition count, but not the column on which the data is partitioned. @rich-policeman-92383
• flat-painter-78331 (07/29/2022, 7:54 AM)
Hi guys! Has anyone created a BigQuery source successfully for ingestion?
• flat-painter-78331 (07/29/2022, 7:55 AM)
If so, can someone let me know what configurations you used in the recipe.yml file?
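A minimal hedged sketch of a BigQuery recipe, run programmatically, assuming service-account credentials are supplied via the GOOGLE_APPLICATION_CREDENTIALS environment variable (project and server values are placeholders):

from datahub.ingestion.run.pipeline import Pipeline

# Assumes GOOGLE_APPLICATION_CREDENTIALS points at a service-account key file.
pipeline = Pipeline.create(
    {
        "source": {
            "type": "bigquery",
            "config": {
                "project_id": "my-gcp-project",  # placeholder
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://localhost:8080"},  # placeholder
        },
    }
)
pipeline.run()
pipeline.raise_from_status()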
• mysterious-nail-70388 (07/29/2022, 10:07 AM)
Hello, can the ES data source support stateful_ingestion?
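Whether the Elasticsearch source honors it depends on the version, but where stateful ingestion is supported it is switched on in the source config and needs a pipeline_name so state can be looked up between runs. A hedged sketch with placeholder values:

from datahub.ingestion.run.pipeline import Pipeline

# Hedged sketch: stateful ingestion also requires a stable pipeline_name
# so the checkpoint state can be stored and matched on the next run.
pipeline = Pipeline.create(
    {
        "pipeline_name": "es_ingestion",
        "source": {
            "type": "elasticsearch",
            "config": {
                "host": "localhost:9200",  # placeholder
                "stateful_ingestion": {
                    "enabled": True,
                    "remove_stale_metadata": True,
                },
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://localhost:8080"},  # placeholder
        },
    }
)
pipeline.run()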
• hallowed-bear-93832 (07/29/2022, 11:01 AM)
Hi Team, we are looking for S3-compatible data source ingestion, like MinIO. We tried using the s3 recipe for ingesting a MinIO data source, but it did not work. Is there any way we can ingest a MinIO data source in the current version of DataHub?
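One hedged idea, since MinIO speaks the S3 API: point the s3 source at the MinIO endpoint through aws_endpoint_url, assuming the source's AWS connection config honors a custom endpoint (all values below are placeholders):

from datahub.ingestion.run.pipeline import Pipeline

# Hedged sketch: MinIO exposes an S3-compatible API, so the s3 source is
# pointed at it through a custom endpoint URL.
pipeline = Pipeline.create(
    {
        "source": {
            "type": "s3",
            "config": {
                "path_spec": {"include": "s3://my-bucket/*.parquet"},  # placeholder
                "aws_config": {
                    "aws_access_key_id": "minioadmin",      # placeholder
                    "aws_secret_access_key": "minioadmin",  # placeholder
                    "aws_region": "us-east-1",
                    "aws_endpoint_url": "http://localhost:9000",  # MinIO endpoint
                },
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://localhost:8080"},  # placeholder
        },
    }
)
pipeline.run()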
• kind-whale-32412 (07/29/2022, 5:30 PM)
Hello there, I am trying to modify superset.py for ingestion and add more lineage to the existing one. I noticed that if I emit lineage via DatahubRestEmitter then it works completely fine. However, if I emit lineage via
self.report.report_workunit
then I do not see the workunit being sent to the server. I couldn't figure out why, so I'm asking here. Is there a way to submit lineage work with this workunit system?
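For context, a sketch of the common source pattern (based on how sources generally hand work to the sink, not on superset.py specifically): report_workunit only records the unit in the run report, while yielding it from get_workunits is what actually sends it to the sink. Names below are illustrative:

from typing import Iterable

from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.ingestion.api.workunit import MetadataWorkUnit
from datahub.metadata.schema_classes import (
    ChangeTypeClass,
    DatasetLineageTypeClass,
    UpstreamClass,
    UpstreamLineageClass,
)


def get_workunits(self) -> Iterable[MetadataWorkUnit]:
    # Sketch of a method body that would live on the Superset source class.
    upstream = UpstreamClass(
        dataset="urn:li:dataset:(urn:li:dataPlatform:superset,upstream,PROD)",
        type=DatasetLineageTypeClass.TRANSFORMED,
    )
    mcp = MetadataChangeProposalWrapper(
        entityType="dataset",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn="urn:li:dataset:(urn:li:dataPlatform:superset,downstream,PROD)",
        aspectName="upstreamLineage",
        aspect=UpstreamLineageClass(upstreams=[upstream]),
    )
    wu = MetadataWorkUnit(id="superset-lineage-example", mcp=mcp)
    # report_workunit only updates the run report; the yield below is what
    # hands the unit to the framework, which forwards it to the sink.
    self.report.report_workunit(wu)
    yield wu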
• lemon-terabyte-66903 (07/29/2022, 5:39 PM)
    Hello, can somebody address this?
• kind-whale-32412 (07/30/2022, 7:23 PM)
Hey there, I feel like there's an issue with URN parsing somewhere; I get this error:
Failed to convert urn to entity key: urns parts and key fields do not have same length
I am wondering what "key fields" refers to here, and how I can debug this. This is for Superset ingestion that uses Trino.
• crooked-holiday-47153 (08/01/2022, 8:28 AM)
Hi all, is there an option to filter out the following with the Tableau ingestion: • Embedded Data Source • View • Custom SQL. Thanks in advance, Eyal
• better-orange-49102 (07/28/2022, 12:50 PM)
For schema history, is it intended that a change in field TYPE for an existing column does not get a version increment in the history? When adding or removing fields I see the version increment, but not for type changes. (Though my schema changes are done via MCP emits, so I'm not sure if that is the cause.)
• alert-fall-82501 (07/28/2022, 2:17 PM)
    "class": algorithms.Blowfish, [2022-07-28 193658,839] INFO {datahub.cli.ingest_cli:170} - DataHub CLI version: 0.8.41.2 [2022-07-28 193659,267] INFO {datahub.ingestion.run.pipeline:163} - Sink configured successfully. DataHubRestEmitter: configured to talk to http://localhost:8080 Illegal instruction (core dumped)
• lively-farmer-38551 (08/01/2022, 9:51 PM)
Hi, does anybody know how to import metadata comments from tables and columns on PostgreSQL? Thanks a lot!
• alert-football-80212 (08/01/2022, 12:10 PM)
Hi all, is there any way in a Kafka ingestion recipe to add an owner and a domain? Thank you!
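A hedged sketch using ingestion transformers, assuming the simple_add_dataset_ownership and simple_add_dataset_domain transformer types are available in your CLI version (URNs and addresses are placeholders):

from datahub.ingestion.run.pipeline import Pipeline

# Hedged sketch: transformers decorate every ingested entity with an
# owner and a domain before it reaches the sink.
pipeline = Pipeline.create(
    {
        "source": {
            "type": "kafka",
            "config": {"connection": {"bootstrap": "broker:9092"}},  # placeholder
        },
        "transformers": [
            {
                "type": "simple_add_dataset_ownership",
                "config": {"owner_urns": ["urn:li:corpuser:jdoe"]},  # placeholder
            },
            {
                "type": "simple_add_dataset_domain",
                "config": {"domains": ["urn:li:domain:marketing"]},  # placeholder
            },
        ],
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://localhost:8080"},  # placeholder
        },
    }
)
pipeline.run()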
• limited-cricket-18852 (08/02/2022, 12:01 PM)
Hi all, I am trying to ingest data coming from a Hive metastore (hosted at Databricks) and it works almost perfectly, but we are not seeing the descriptions of the columns (and maybe the tables). I found that this has been fixed in acryl-datahub 0.3.0 as mentioned here. In fact, when I use the console sink I can see the descriptions, but when I use the HTTP sink I do not see them in the UI. Has anyone seen this? Is there any way to fix it? Thanks!
• alert-fall-82501 (08/02/2022, 12:02 PM)
[2022-08-02 17:31:51,021] INFO {datahub.cli.ingest_cli:170} - DataHub CLI version: 0.8.41.2
[2022-08-02 17:31:51,375] INFO {datahub.ingestion.run.pipeline:163} - Sink configured successfully. DataHubRestEmitter: configured to talk to http://localhost:8080
Illegal instruction (core dumped)