# ingestion
  • m

    magnificent-plumber-63682

    06/16/2023, 9:59 AM
    Hi, I am trying to run ingestion locally on my system. I have created a recipe for MySQL, and now I am trying to pass the password as a secret, but I don't understand how to generate a secrets file on my local system. Can anyone help me?
    ✅ 1
    g
    • 2
    • 6
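    A minimal sketch for the question above (not from the thread): when a recipe is run locally with the CLI, a separate secrets file isn't required, since ${VAR} placeholders in the recipe are expanded from environment variables; the programmatic equivalent can read os.environ directly. The host, database and user names and the MYSQL_PASSWORD variable below are placeholders.
    Copy code
    # A hedged sketch: read the MySQL password from an environment variable
    # instead of a secrets file. All connection values are placeholders.
    import os

    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(
        {
            "source": {
                "type": "mysql",
                "config": {
                    "host_port": "localhost:3306",  # placeholder
                    "database": "my_db",            # placeholder
                    "username": "datahub_reader",   # placeholder
                    # export MYSQL_PASSWORD=... before running this script
                    "password": os.environ["MYSQL_PASSWORD"],
                },
            },
            "sink": {
                "type": "datahub-rest",
                "config": {"server": "http://localhost:8080"},
            },
        }
    )
    pipeline.run()
    pipeline.raise_from_status()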
  • g

    great-notebook-53658

    06/19/2023, 2:26 AM
    Hi, I am trying to ingest PowerBI metadata and am getting the error extra fields not permitted (type=value_error.extra). The data source behind PowerBI is Snowflake. The error I was getting is as follows: Can anyone help? Thanks!
    g
    a
    • 3
    • 51
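    A hedged sketch for the question above (not from the thread): the pydantic error extra fields not permitted usually means the recipe contains a key the powerbi source config does not recognize, so comparing the recipe against a minimal config can isolate the offending field. The tenant/client values are placeholders, and the exact supported fields may vary by DataHub version.
    Copy code
    # A minimal, hedged powerbi config for comparison; credentials are placeholders.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(
        {
            "source": {
                "type": "powerbi",
                "config": {
                    "tenant_id": "00000000-0000-0000-0000-000000000000",  # placeholder
                    "client_id": "00000000-0000-0000-0000-000000000000",  # placeholder
                    "client_secret": "xxx",                               # placeholder
                },
            },
            "sink": {
                "type": "datahub-rest",
                "config": {"server": "http://localhost:8080"},
            },
        }
    )
    pipeline.run()
    pipeline.raise_from_status()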
  • s

    shy-dog-84302

    06/19/2023, 12:17 PM
    Hi, a question related to BigQuery metadata ingestion: can we prevent ingesting project entities when there are no datasets in them, and also prevent ingesting dataset entities when there are no tables/views in them? Maybe a configurable flag to prevent this?
    Copy code
    [2023-06-19 12:07:32,796] INFO     {datahub.ingestion.source.bigquery_v2.bigquery:474} - Processing project: xyz-project
    [2023-06-19 12:07:33,008] WARNING  {datahub.ingestion.source.bigquery_v2.bigquery:589} - No dataset found in xyz-project. Either there are no datasets in this project or missing bigquery.datasets.get permission. You can assign predefined roles/bigquery.metadataViewer role to your service account.
    [2023-06-19 12:07:33,008] INFO     {datahub.ingestion.source.bigquery_v2.bigquery_report:95} - Time spent in stage <xyz-project: Metadata Extraction at 2023-06-19 12:07:32.796768+00:00>: 0.21 seconds
    d
    m
    a
    • 4
    • 14
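    A hedged sketch for the question above (not from the thread): I'm not aware of a flag that skips empty projects/datasets automatically, but the bigquery source does accept allow/deny regex patterns, so known-empty projects or datasets can be excluded explicitly. The project and dataset names are placeholders, and the exact pattern-matching semantics may differ by version.
    Copy code
    # Hedged example of excluding known-empty projects/datasets via deny patterns;
    # this dict is the "source" section of a recipe or programmatic pipeline config.
    bigquery_source = {
        "type": "bigquery",
        "config": {
            "project_id_pattern": {
                # skip projects known to contain no datasets
                "deny": ["^xyz-project$"],
            },
            "dataset_pattern": {
                # skip datasets known to contain no tables/views
                "deny": ["^scratch_.*$"],
            },
        },
    }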
  • m

    millions-city-84223

    06/19/2023, 12:40 PM
    Hi Team, sorry if this is not the right place for this message. We are using DataHub, and one of the ingestion flows we currently have ingests data using the file source. The current file source implementation allows us to read files from the local fs and http(s), but we also need to ingest files located on AWS S3. Could you please clarify whether DataHub has (or maybe has work in progress on) AWS S3 file source support? If not, I would like to add AWS S3 file source support and, maybe, create a convenient mechanism for adding any other file-based source (like GCP, Azure, …) just by implementing an interface.
    d
    a
    • 3
    • 10
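    A hedged workaround sketch for the question above (not from the thread), pending native S3 support in the file source: download the file from S3 first (e.g. with boto3) and point the existing file source at the local copy. The bucket, key and local path are placeholders, and the file source's location key ("path" here) may be named differently in older versions.
    Copy code
    # Hedged workaround: copy the metadata file from S3, then ingest the local copy.
    import boto3

    from datahub.ingestion.run.pipeline import Pipeline

    local_path = "/tmp/metadata.json"                              # placeholder
    boto3.client("s3").download_file(
        "my-metadata-bucket", "exports/metadata.json", local_path  # placeholders
    )

    pipeline = Pipeline.create(
        {
            "source": {"type": "file", "config": {"path": local_path}},
            "sink": {
                "type": "datahub-rest",
                "config": {"server": "http://localhost:8080"},
            },
        }
    )
    pipeline.run()
    pipeline.raise_from_status()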
  • b

    bland-application-65186

    06/19/2023, 2:30 PM
    Hi, a question regarding OpenAPI ingestion: is ingestion of OpenAPI definitions in YAML format on the roadmap?
    ✅ 1
    d
    • 2
    • 1
  • s

    strong-diamond-4751

    06/19/2023, 7:59 PM
    Howdy! I've got a bit of a weird question. Is it possible to set up a gitlab repository as an ingestion source? It would be neat to be able to document pipeline processes and whatnot from here.
    ✅ 1
    d
    m
    • 3
    • 2
  • i

    icy-zoo-92866

    06/20/2023, 7:18 AM
    Hi, I am trying to ingest data from Superset into DataHub. I am sending a request like this:
    Copy code
    source:
        type: superset
        config:
            connect_uri: 'https://superset-xx.xx.xx/'
            username: xxx
            password: xxx
            provider: db
            env: xxx
    The request/auth is successful, but we are not getting any dashboards or charts back. When I log in to Superset with the same user and password I can see all the charts. What could be the issue? TIA
    g
    w
    i
    • 4
    • 9
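    A hedged debugging sketch for the question above (not from the thread): call Superset's REST API directly with the same credentials the recipe uses and check whether the dashboard list comes back non-empty for that user, which helps tell a Superset permission issue apart from a recipe issue. The host and credentials are placeholders.
    Copy code
    # Hedged check of what the Superset API returns for the recipe's credentials.
    import requests

    BASE = "https://superset-xx.xx.xx"  # placeholder

    # Log in the same way the superset source does (database auth provider).
    login = requests.post(
        f"{BASE}/api/v1/security/login",
        json={"username": "xxx", "password": "xxx", "provider": "db", "refresh": True},
    )
    login.raise_for_status()
    token = login.json()["access_token"]

    # List the dashboards visible to this user via the API.
    resp = requests.get(
        f"{BASE}/api/v1/dashboard/",
        headers={"Authorization": f"Bearer {token}"},
    )
    resp.raise_for_status()
    print(resp.json().get("count"), "dashboards visible via the API")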
  • m

    miniature-hair-20451

    06/20/2023, 8:00 AM
    Hi all. This bug affects me too - https://github.com/datahub-project/datahub/issues/6544. It's closed now, please reopen it.
    ✅ 1
    g
    d
    • 3
    • 2
  • j

    jolly-airline-17196

    06/20/2023, 11:52 AM
    hey! had a little query: during ingestion with the FILE method, should the path specified be on the docker container running datahub-gms, or on the machine the datahub containers are currently running on? I tried both ways and always ended up with the following error
    Copy code
    raise Exception(f"Failed to process {path}")
    Exception: Failed to process /home/datahub/students.json
    ✅ 1
    g
    • 2
    • 3
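    A hedged sketch for the question above (not from the thread): the path must be readable by whichever process actually executes the recipe; when ingestion is triggered from the UI that is typically the actions/executor container rather than datahub-gms or the host, while a recipe run with the CLI on the host can use a host path directly. The file path below is taken from the error above and is only illustrative.
    Copy code
    # Hedged example of running the file source with the CLI on the machine where
    # the file actually lives, avoiding the container/host path mismatch.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(
        {
            "source": {
                "type": "file",
                "config": {"path": "/home/datahub/students.json"},  # must exist locally
            },
            "sink": {
                "type": "datahub-rest",
                "config": {"server": "http://localhost:8080"},
            },
        }
    )
    pipeline.run()
    pipeline.raise_from_status()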
  • a

    ancient-queen-15575

    06/20/2023, 1:21 PM
    I'm seeing odd behaviour when ingesting dbt and Snowflake data. Column-level lineage for a lot of tables is not appearing: some columns are greyed out, and for some tables no columns are showing at all, as shown in the first screenshot. In the second screenshot I clicked into the rightmost table, which wasn't showing any columns in the Visualised Lineage view, and I can see the full column list fine. Initially I was ingesting Snowflake first, then dbt, and seeing no column-level lineage. But after reading comments here I saw I should ingest dbt first and then Snowflake. Doing that has led to this situation instead. Does anyone know what could be changed or how I could start debugging this? The dbt and Snowflake ingestions are running fine.
    g
    • 2
    • 2
  • l

    lively-raincoat-33818

    06/20/2023, 3:48 PM
    Hi folks, I'm working on the dbt ingestion and I want to have the compiled code of the queries in the view definition. Is that possible? For now, if I use a macro function I only see the macro and not the full compiled query. I'm using v0.10.3. Thanks in advance!
    g
    g
    d
    • 4
    • 7
  • l

    limited-cricket-18852

    06/20/2023, 4:54 PM
    Hi All! Is there a way to parse/transform/ignore some containers when ingesting data? I am using the Databricks/unity-catalog source type and it generates some containers that I would like to not show up in DataHub. I get something like Datasets/ prod/ databricks/ my_workspace/ global-euwest/ my_catalog/ some_layer/ my_beautiful_table, however my_workspace and global-euwest are not interesting to me. Is there a way to ingest without this information? Thanks!
    ✅ 1
    g
    f
    • 3
    • 15
  • b

    bumpy-hamburger-47757

    06/20/2023, 7:52 PM
    I'm using the Python SDK -- is there a way to filter datasets by exact name matches? I'm using DataHubGraph.get_urns_by_filter() and it's returning partial name matches for dataset names and column names. For example, if my query is "test_table", it will return any datasets with the words test or table in the dataset name or columns (for example, a dataset named users_table or a column named test_value will match). Thanks!
    ✅ 1
    b
    g
    a
    • 4
    • 12
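    A hedged sketch for the question above (not from the thread): get_urns_by_filter's query parameter behaves like full-text search, so one option is to over-fetch and keep only exact table-name matches client-side. The server URL, platform and target name are placeholders, and the quoted-query trick only narrows the search rather than making it exact.
    Copy code
    # Hedged post-filtering of search results down to exact table-name matches.
    from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig

    graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))

    target = "test_table"
    exact_matches = []
    for urn in graph.get_urns_by_filter(
        entity_types=["dataset"],
        platform="snowflake",   # placeholder platform
        query=f'"{target}"',    # quoting narrows the search but is still not exact
    ):
        # Dataset urns look like urn:li:dataset:(urn:li:dataPlatform:x,<name>,<env>);
        # keep only those whose final name component equals the target exactly.
        name = urn.split(",")[1]
        if name.split(".")[-1] == target:
            exact_matches.append(urn)

    print(exact_matches)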
  • a

    average-nail-72662

    06/20/2023, 9:15 PM
    Hi guys, I'm new to DataHub and I have a question to ask. When I ingest a Glue database, can I upsert its properties metadata?
    ✅ 1
    g
    a
    • 3
    • 25
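    A hedged sketch for the question above (not from the thread) of writing custom properties onto a Glue-ingested dataset with the Python emitter; note that emitting the datasetProperties aspect this way upserts the whole aspect rather than merging individual keys. The urn components and property values are placeholders.
    Copy code
    # Hedged example of upserting dataset properties on an existing Glue dataset.
    from datahub.emitter.mce_builder import make_dataset_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import DatasetPropertiesClass

    emitter = DatahubRestEmitter(gms_server="http://localhost:8080")  # placeholder

    dataset_urn = make_dataset_urn(
        platform="glue", name="my_database.my_table", env="PROD"      # placeholders
    )

    properties = DatasetPropertiesClass(
        description="Table ingested from Glue",                       # placeholder
        customProperties={"owner_team": "data-platform"},             # placeholder
    )

    emitter.emit(MetadataChangeProposalWrapper(entityUrn=dataset_urn, aspect=properties))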
  • b

    bland-orange-13353

    06/21/2023, 12:47 AM
    This message was deleted.
    ✅ 1
    g
    l
    • 3
    • 2
  • e

    eager-monitor-4683

    06/21/2023, 3:21 AM
    Hi team, I tried to get profiling working in the Redshift ingestion, but it's not working for external tables. I just want to know if there is any specific setting required. Thanks
    g
    d
    +2
    • 5
    • 9
  • r

    refined-gold-30439

    06/21/2023, 8:18 AM
    Hi 👋 Can't I collect metadata from LookerStudio instead of Looker? • ingestion.yaml
    Copy code
    source:
        type: looker
        config:
            base_url: 'https://lookerstudio.google.com/'
            client_id: '${looker_client_id}'
            stateful_ingestion:
                enabled: true
            client_secret: '${looker_client_secret}'
    • Error
    Copy code
    [2023-06-21 08:17:34,423] INFO     {looker_sdk.rtl.requests_transport:72} - POST(https://lookerstudio.google.com//api/4.0/login)
    [2023-06-21 08:17:35,243] ERROR    {datahub.entrypoints:199} - Command failed: Failed to configure the source (looker): Failed to connect/authenticate with looker - check your configuration: )]}'
    {"errorStatus":{"code":9}}
    g
    f
    • 3
    • 2
  • g

    gifted-bird-57147

    06/21/2023, 9:10 AM
    Hi Team, we are using an ingestion recipe to load Athena data into our catalog. There are no documentation properties in the Athena source, so I added documentation manually afterwards. However, when we rerun the ingestion recipe the documentation gets removed. What do I need to change in my recipe to keep the existing (manually edited) documentation?
    Copy code
    source:
      type: athena
      config:
        # Coordinates
        aws_region: eu-west-1
        work_group: ${ATHENA_WG_PROD_BDV}
        username: ${ATHENA_USER_BDV}
        password: ${ATHENA_PW_BDV}
        query_result_location: ${ATHENA_QL_BDV}
        ## Because of a bug in the Athena ingestion we have to specify the database.
        ## That's why we have separate scripts per database (you can only specify one database per script...)
        database: "bdv-prod-topdesk-transformed"
    
        # Options
        #s3_staging_dir: ${ATHENA_QL}
        profiling:
          enabled: true
          turn_off_expensive_profiling_metrics: true
          include_field_distinct_count: true
          include_field_min_value: true
          include_field_max_value: true
          include_field_mean_value: true
          include_field_sample_values: true
          field_sample_values_limit: 2
          profile_if_updated_since_days: 10
        stateful_ingestion:
          enabled: true
          ignore_old_state: false
          ignore_new_state: false
          remove_stale_metadata: true
        env: PROD
    
    pipeline_name: "BDV-prod-topdesk-transformed"
    
    
    transformers: # an array of transformers applied sequentially
      - type: "pattern_add_dataset_terms"
        config:
          term_pattern:
            rules:
              ".*": ["urn:li:glossaryTerm:INTERN_OPEN"]
      - type: simple_add_dataset_tags
        config:
          tag_urns:
            - "urn:li:tag:Bedrijfsvoering"
            - "urn:li:tag:Topdesk"
            - "urn:li:tag:PROD"
            - "urn:li:tag:Transformed"
      - type: "simple_add_dataset_domain"
        config:
          replace_existing: true  # false is default behaviour
          domains:
            - "urn:li:domain:1ef9fa01-a415-46e2-93ad-f8ce3bf84537" # domein 'Bedrijfsvoering'
    ✅ 1
    g
    • 2
    • 3
  • a

    adorable-forest-52600

    06/21/2023, 11:25 AM
    Hi all, I successfully ingested two JSON schemas, but for both I see "no data" when I want to view the schema. I can only see the raw JSON that I ingested when I click on Raw; it didn't extract the properties, types, descriptions, etc. for the schema. With another JSON schema I was successful before. Does anyone know what can cause this, where the ingestion is successful but no metadata is retrieved?
    ✅ 1
    g
    m
    • 3
    • 11
  • l

    lively-thailand-64294

    06/21/2023, 2:58 PM
    Hello Team!! I am new to DataHub. I would like to know where the CSV files for ingestion are supposed to be placed, and where the recipe for ingestion should be placed. I am running DataHub on Windows using Docker and WSL2. Also, can the CSV file be any dataset, or does it need specific columns like resource?
    ✅ 1
    g
    • 2
    • 1
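    A hedged sketch for the question above (not from the thread): the CSV and recipe just need to be readable by whatever process runs the recipe (the host shell if using the CLI, the actions/executor container if triggered from the UI), and the csv-enricher source expects its own header (columns along the lines of resource, subresource, glossary_terms, tags, owners, ownership_type, description, domain; check the csv-enricher docs for the exact layout) rather than an arbitrary data CSV. The paths below are placeholders; on WSL2 a Windows file is typically visible under /mnt/c/...
    Copy code
    # Hedged example of running the csv-enricher source against a local CSV path.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(
        {
            "source": {
                "type": "csv-enricher",
                "config": {"filename": "/mnt/c/datahub/enrichment.csv"},  # placeholder
            },
            "sink": {
                "type": "datahub-rest",
                "config": {"server": "http://localhost:8080"},
            },
        }
    )
    pipeline.run()
    pipeline.raise_from_status()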
  • r

    rich-restaurant-61261

    06/21/2023, 8:46 PM
    Hi Team, I am trying to ingest data from Superset based on the documentation at https://datahubproject.io/docs/generated/ingestion/sources/superset/. The recipe uses a variable called 'provider'; does anyone know what this variable is and what I should put there? My Superset and DataHub are deployed through Kubernetes.
    ✅ 1
    g
    • 2
    • 3
  • c

    calm-helmet-89243

    06/21/2023, 10:41 PM
    Hi folks. When I use the Hive source, on the UI I see “Lineage” and “Queries” tabs are enabled even though there’s no data there. AFAIK I don’t emit any lineage or queries MCP events. Is there a way to disable these tabs? I’m thinking seeing these tabs would give users false hope that there’s something valuable there when there never will be (yet).
    d
    g
    • 3
    • 11
  • g

    gifted-diamond-19544

    06/22/2023, 7:08 AM
    Hello! I am getting a rate-limit (throttling) error on my Athena ingestion. Any idea on how to deal with this?
    Copy code
    "Ingestion error: An error occurred (MetadataException) when calling the GetTableMetadata operation: Rate exceeded (Service: AmazonDataCatalog; Status Code: 400; Error Code: ThrottlingException
    g
    a
    • 3
    • 11
  • p

    proud-dusk-671

    06/22/2023, 7:41 AM
    For Snowflake ingestion, I have the following questions - 1. According to the diagram here, it seems that data is pulled into Metadata Ingestion, which pushes it into the gms service. Does that mean there is no involvement of Kafka here? 2. Secondly, I would also like to know which component of DataHub the Metadata Ingestion service belongs to.
    ✅ 1
    g
    • 2
    • 1
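    A hedged note on the questions above (not from the thread): the recipe's sink section decides the path, and the ingestion framework itself is the Python datahub CLI/library that runs the recipe, separate from GMS. The endpoints below are placeholders.
    Copy code
    # Hedged sketch of the two sink options that decide whether Kafka is involved.

    # Option 1: push metadata directly to GMS over HTTP (no Kafka on this path).
    rest_sink = {
        "type": "datahub-rest",
        "config": {"server": "http://datahub-gms:8080"},  # placeholder
    }

    # Option 2: write metadata change proposals to Kafka and let GMS consume them.
    kafka_sink = {
        "type": "datahub-kafka",
        "config": {
            "connection": {
                "bootstrap": "broker:9092",                             # placeholder
                "schema_registry_url": "http://schema-registry:8081",   # placeholder
            }
        },
    }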
  • c

    creamy-pizza-80433

    06/22/2023, 10:10 AM
    Hello everyone, recently we upgraded our DataHub version from 0.10.2 to 0.10.4 and we hit a new problem regarding permissions and policies for users: the permissions suddenly stopped working for every entity except Data Products. Does anyone know how I can solve this problem? Thanks!
    g
    • 2
    • 2
  • m

    modern-hospital-90979

    06/22/2023, 2:03 PM
    We have a question related to ingestion of Looker data. We've configured both the looker and lookml ingestion patterns and they appear to be pulling in most, if not all, of our assets in the platform. However, I'm unable to locate certain specific views that are defined in Looker as Persistent Derived Tables (PDTs). Some PDTs show up, but others do not. It's unclear if there's a pattern to which ones show up and which do not. Have other users experienced challenges ingesting Looker PDTs?
    g
    • 2
    • 6
  • s

    strong-diamond-4751

    06/22/2023, 3:41 PM
    Hey there, I'm using programmatic_pipeline.py to configure and run a pipeline from within my Python script. What is the proper syntax to add options? For example, if my type is redshift, how do I add include_table_lineage, include_views, etc.?
    ✅ 1
    g
    • 2
    • 1
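    A hedged sketch for the question above (not from the thread): source options go under the source's "config" key in the dict passed to Pipeline.create, mirroring the config: block of a YAML recipe. The connection values are placeholders.
    Copy code
    # Hedged example of passing redshift source options in a programmatic pipeline.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(
        {
            "source": {
                "type": "redshift",
                "config": {
                    "host_port": "my-cluster.example.com:5439",  # placeholder
                    "database": "analytics",                     # placeholder
                    "username": "datahub",                       # placeholder
                    "password": "xxx",                           # placeholder
                    "include_table_lineage": True,
                    "include_views": True,
                },
            },
            "sink": {
                "type": "datahub-rest",
                "config": {"server": "http://localhost:8080"},
            },
        }
    )
    pipeline.run()
    pipeline.raise_from_status()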
  • g

    great-notebook-53658

    06/23/2023, 7:59 AM
    Hi, is it possible to define access policies that prevent certain users from accessing metadata by platform (e.g. Snowflake)? I do not see in https://datahubproject.io/docs/authorization/access-policies-guide/ that resource type = Platform or platform instance is available, and I do not see any dropdown in the resource field when I select resource type = Container.
    ✅ 1
    d
    • 2
    • 1
  • g

    great-notebook-53658

    06/23/2023, 8:50 AM
    Hi, any idea why I am getting the following error when trying to delete Snowflake metadata using the --platform option? --urn is working, but it's tedious to delete by urn. Thanks!
    g
    • 2
    • 7
  • b

    billions-journalist-13819

    06/23/2023, 8:57 AM
    @famous-waitress-64616 A while ago, "only_ingest_assigned_metastore" was added to the Databricks ingest options and I used it. By the way, has this option disappeared? Can I use it again? I need this option.
    ✅ 1
    f
    a
    • 3
    • 37