# ingestion
  • witty-keyboard-20400 (10/14/2021, 4:53 PM)
    How do we enable versioning in captured metadata? 1. In push mode, can the MCE emitter specify the same version for an existing dataset definition to override the DatasetSnapshot present in DataHub? 2. In pull mode, is it possible for DataHub to create a new version of the DatasetSnapshot if there were changes in the definition, e.g. a field's type changing from int to string in a MongoDB collection?
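    For the push-mode half of this question, here is a minimal sketch using the Python REST emitter (the URN and server address are placeholders). Note that the emitter does not pass an aspect version; DataHub versions stored aspects server-side, so re-emitting a snapshot for the same URN updates the latest version.
    ```python
    # Sketch: re-emit a DatasetSnapshot for an existing URN; versioning of the
    # stored aspects is handled by the DataHub backend, not by the emitter.
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        DatasetPropertiesClass,
        DatasetSnapshotClass,
        MetadataChangeEventClass,
    )

    emitter = DatahubRestEmitter("http://localhost:8080")  # placeholder GMS address

    snapshot = DatasetSnapshotClass(
        urn="urn:li:dataset:(urn:li:dataPlatform:mongodb,mydb.mycollection,PROD)",
        aspects=[DatasetPropertiesClass(description="updated dataset definition")],
    )
    emitter.emit_mce(MetadataChangeEventClass(proposedSnapshot=snapshot))
    ```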
  • nice-planet-17111 (10/15/2021, 7:59 AM)
    Hi team 🥲 Is there a way to sink data from DataHub to a file? I want to save it like a snapshot of the current state of DataHub, in the same JSON format that is used by the file sink (or source), so I can customize it and ingest it into DataHub later. (Or is there any other way to solve this, e.g. via the MySQL DB?)
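    For reference, a sketch of the `file` sink, which writes ingested events as JSON that the `file` source can later re-ingest (the source config and paths are placeholders):
    ```python
    # Sketch: mirror any ingestion source into a JSON file via the "file" sink;
    # the resulting file can be edited and re-ingested with the "file" source.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(
        {
            "source": {"type": "mysql", "config": {"host_port": "localhost:3306"}},
            "sink": {"type": "file", "config": {"filename": "./datahub_snapshot.json"}},
        }
    )
    pipeline.run()
    pipeline.raise_from_status()
    ```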
  • nice-planet-17111 (10/15/2021, 9:30 AM)
    Also, does anyone know what the "`schema`" and "`property`" tabs on the Glossary term page are for? I checked the docs and examples, but there is no explanation of what these are or how to define them in the YAML file.
  • witty-butcher-82399 (10/15/2021, 10:59 AM)
    Hi! I've noticed there is a new `DataPlatformInstance` aspect for datasets. Is that somehow related to the "domains" feature that was mentioned at some point?
  • melodic-spoon-99300 (10/15/2021, 1:55 PM)
    Hello everyone, I'm ingesting metadata from Athena. Is writing a custom transformer the only way to change an instance's URN environment (like PROD to DEV)?
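    For reference, a sketch assuming the Athena source honors the common `env` config field that SQL-based sources expose, which would avoid a custom transformer (connection values are placeholders):
    ```python
    # Sketch: setting "env" in the source config changes the environment
    # fragment of emitted URNs (e.g. DEV instead of the default PROD).
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(
        {
            "source": {
                "type": "athena",
                "config": {
                    "aws_region": "us-east-1",
                    "work_group": "primary",
                    "env": "DEV",
                },
            },
            "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
        }
    )
    pipeline.run()
    ```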
  • numerous-cricket-19689 (10/15/2021, 4:52 PM)
    I am adding a custom source by following https://datahubproject.io/docs/metadata-ingestion/adding-source. This custom source is very specific to my company, and I was wondering what the right approach to building it is. I can think of two options. Option 1: create a fork of datahubproject in my company, include the source in it, and add it to setup.py. Option 2: create a repo only for my custom sources/sinks/transforms and install it on top of datahub. I prefer option 2, but I'm not sure whether it's feasible or how to include such a plugin in setup.py.
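    A sketch of option 2, assuming your datahub version resolves a dotted import path as the source type (the package and class names are hypothetical):
    ```python
    # Sketch: keep the custom source in your own package, install it alongside
    # datahub, and reference it by import path instead of a registered name.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(
        {
            "source": {
                "type": "my_company.ingestion.custom_source.MyCompanySource",  # hypothetical
                "config": {"some_option": "value"},  # whatever your source defines
            },
            "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
        }
    )
    pipeline.run()
    ```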
  • witty-keyboard-20400 (10/18/2021, 4:45 AM)
    Could anyone please help me with exporting and importing the metadata present in DataHub, preferably as a JSON dump? If that's not possible, then export to binary. @nice-planet-17111 / @big-carpet-38439 any pointers?
    đź‘€ 1
  • elegant-machine-39016 (10/18/2021, 10:46 AM)
    I added some data to a Kafka topic named topic8, then ran the ingestion script. I see topic8 show up in DataHub, but I don't see any properties associated with it. Can someone please tell me what I'm doing wrong? I've attached screenshots of reading data from topic8 and of my ingestion YAML.
  • rapid-sundown-8805 (10/18/2021, 12:42 PM)
    Hi all! Having some issues with the Azure AD ingestion recipe. Which permissions does my application need? Are the ones below appropriate?
  • clean-piano-28976 (10/18/2021, 4:38 PM)
    Hi all 👋🏾 Can anyone please help with the following error when ingesting LookML data? I have been following this documentation.
  • victorious-garage-66835 (10/18/2021, 5:43 PM)
    Hi folks, I need some help with the DataHub quickstart. After successfully ingesting a Looker dashboard, I'm seeing this error on the home page and many other pages (screenshots below). Any ideas what is required for a fix?
  • quiet-pilot-28237 (10/19/2021, 6:33 AM)
    image.png
  • quiet-pilot-28237 (10/19/2021, 6:44 AM)
    image.png
  • microscopic-elephant-47912 (10/19/2021, 8:11 PM)
    Hello, I'm trying to ingest Looker metadata but I'm having some problems. A week ago I was able to ingest with the same config file.
    ```
    ---- (full traceback above) ----
    File "/usr/local/lib/python3.8/site-packages/datahub/entrypoints.py", line 91, in main
        sys.exit(datahub(standalone_mode=False, **kwargs))
    File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
        return self.main(*args, **kwargs)
    File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1053, in main
        rv = self.invoke(ctx)
    File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
    File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
    File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
        return ctx.invoke(self.callback, **ctx.params)
    File "/usr/local/lib/python3.8/site-packages/click/core.py", line 754, in invoke
        return __callback(*args, **kwargs)
    File "/usr/local/lib/python3.8/site-packages/datahub/cli/ingest_cli.py", line 52, in run
        pipeline = Pipeline.create(pipeline_config)
    File "/usr/local/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 120, in create
        return cls(config)
    File "/usr/local/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 88, in __init__
        self.source: Source = source_class.create(
    File "/usr/local/lib/python3.8/site-packages/datahub/ingestion/source/looker.py", line 788, in create
        return cls(config, ctx)
    File "/usr/local/lib/python3.8/site-packages/datahub/ingestion/source/looker.py", line 245, in __init__
        self.client = LookerAPI(self.source_config).get_client()
    File "/usr/local/lib/python3.8/site-packages/datahub/ingestion/source/looker.py", line 84, in __init__
        raise ConfigurationError(
    
    ConfigurationError: Failed to initialize Looker client. Please check your configuration.
    ```
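    That ConfigurationError is raised while initializing the Looker API client, before any ingestion runs; a sketch of the credentials the source needs (all values are placeholders):
    ```python
    # Sketch: the Looker source fails at client init when base_url or the API3
    # credentials are wrong or unreachable. All values are placeholders.
    looker_recipe = {
        "source": {
            "type": "looker",
            "config": {
                "base_url": "https://company.looker.com",
                "client_id": "<api3 client id>",
                "client_secret": "<api3 client secret>",
            },
        },
        "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
    }
    ```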
  • rough-eye-60206 (10/19/2021, 8:38 PM)
    Hello team, I am getting the following error when I try to add tags to a dataset using the GraphQL API. Can someone please help me figure it out?
    ```json
    {
      "errors": [
        {
          "message": "An unknown error occurred.",
          "locations": [
            {
              "line": 2,
              "column": 3
            }
          ],
          "path": [
            "addTag"
          ],
          "extensions": {
            "code": 500,
            "type": "SERVER_ERROR",
            "classification": "DataFetchingException"
          }
        }
      ],
      "data": {
        "addTag": null
      }
    }
    ```
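    For comparison, a sketch of the call shape (the endpoint, auth header, and input field names are assumptions; verify them against your GraphQL schema in GraphiQL):
    ```python
    # Sketch of invoking the addTag mutation directly with requests.
    import requests

    query = """
    mutation addTag($input: TagAssociationInput!) {
      addTag(input: $input)
    }
    """
    variables = {
        "input": {
            "tagUrn": "urn:li:tag:Legacy",  # the tag entity must already exist
            "resourceUrn": "urn:li:dataset:(urn:li:dataPlatform:hive,db.table,PROD)",
        }
    }
    resp = requests.post(
        "http://localhost:9002/api/v2/graphql",  # placeholder frontend address
        json={"query": query, "variables": variables},
        headers={"Cookie": "PLAY_SESSION=<session cookie>"},  # placeholder auth
    )
    print(resp.json())
    ```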
  • better-orange-49102 (10/20/2021, 3:49 AM)
    For data profiling: if I set a limit on the number of rows to profile, the stats will reflect the number of rows profiled, yes? Do you think it is reasonable/possible to show a count of all rows in the table AND show the stats based on the number of rows profiled? I was thinking it shouldn't be too expensive to compute a count of all rows in the table, whereas profiling all rows could take a long time.
    👍 1
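    For context, the profiler already exposes a row limit; a sketch of the relevant recipe fragment, assuming the SQL profiling config's `limit` field (the number is illustrative):
    ```python
    # Sketch: profiling config fragment with a row limit. Whether the row-count
    # stat should reflect the full table or only the profiled sample is exactly
    # the open question above.
    profiling_fragment = {
        "profiling": {
            "enabled": True,
            "limit": 10000,  # profile at most this many rows
        }
    }
    ```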
  • full-area-6720 (10/20/2021, 7:12 AM)
    Hi, how do I add glossary terms to the UI? It's empty even after ingestion from Redshift. Do they have to be ingested separately?
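    Glossary terms are ingested from a separate glossary definition file rather than from warehouse sources like Redshift; a sketch (the file path is a placeholder):
    ```python
    # Sketch: ingest glossary terms from a business-glossary YAML file.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(
        {
            "source": {
                "type": "datahub-business-glossary",
                "config": {"file": "./business_glossary.yml"},
            },
            "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
        }
    )
    pipeline.run()
    ```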
  • powerful-telephone-71997 (10/20/2021, 8:15 AM)
    Rollback of a run ID is taking a long time and not logging anything in the UI, even when the --debug switch is used… any tips/suggestions?
    `datahub --debug ingest rollback --run-id <run id>`
  • freezing-teacher-87574 (10/20/2021, 3:15 PM)
    How can I connect a recipe to Superset through SSL?
    ```yaml
    source:
      type: "superset"
      config:
        connection:
          consumer_config:
            security.protocol: "ssl"
            ssl.ca.location: "/etc/ssl/certs/iamss-ca.crt"
        username: admin
        password: admin
        provider: db
        connect_uri: https://superset.XXXXXX.com
    sink:
      type: "datahub-rest"
      config:
        server: "http://XXXXX:8080"
    ```
    ERROR:
    ```
    1 validation error for SupersetConfig
    connection
      extra fields not permitted (type=value_error.extra)
    command terminated with exit code 1
    ```
    Thanks
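    The validation error means `SupersetConfig` does not accept a `connection` block (the `consumer_config` keys look copied from a Kafka recipe). A sketch of the accepted shape, with SSL coming from the https `connect_uri` (values are placeholders; a private CA may additionally need a trusted CA bundle on the client):
    ```python
    # Sketch: SupersetConfig takes connect_uri/username/password/provider at the
    # top level of "config"; there is no nested "connection" section.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(
        {
            "source": {
                "type": "superset",
                "config": {
                    "connect_uri": "https://superset.XXXXXX.com",
                    "username": "admin",
                    "password": "admin",
                    "provider": "db",
                },
            },
            "sink": {"type": "datahub-rest", "config": {"server": "http://XXXXX:8080"}},
        }
    )
    pipeline.run()
    ```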
  • polite-flower-25924 (10/21/2021, 1:22 PM)
    Hey folks, I have executed the `redshift-usage` ingestion, and after that the Queries tab is enabled. However, I'm not able to see any queries in it. When I check the documents of the `dataset_datasetusagestatisticsaspect_v1` index in Elasticsearch, I've validated that there are SQL queries in the `topSqlQueries` fields.
    ```json
    "topSqlQueries": [
      "/* ' Query generated by Chartio {\"reason\":\"dashboard_refresh_data\"} */ <REDACTED_SQL_COMMAND>",
      "-- Looker Query Context '{\"user_id\":111,\"history_id\":22222,\"instance_slug\":\"asdqwezxc\"}' WITH company_wide_cdf AS (/* Primary owner: Seref */ <REDACTED_SQL_COMMAND>"
    ]
    ```
    Could you please help me with this issue?
  • wooden-notebook-33413 (10/21/2021, 1:34 PM)
    Hi there, I'm trying to do a Snowflake ingestion. I've run the example to pull the top N queries, but now I want to capture the metadata of a specific DB while ensuring that I exclude one with a similar name. I've added the following to my YAML:
    ```yaml
    database_pattern:
      allow: "THISDB"
    database_pattern:
      ignoreCase: "NOTTHISDB"
    ```
    I keep getting the following error:
    ```
    1 validation error for SnowflakeUsageConfig
    database_pattern
      extra fields not permitted (type=value_error.extra)
    ```
    âś… 1
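    A sketch of the likely intended shape, assuming the standard allow/deny pattern (`allow` and `deny` take lists of regexes, and `ignoreCase` is a boolean flag, not a place to list databases). Note that the duplicate `database_pattern` key above means the second mapping overwrites the first in YAML, and "extra fields not permitted" can equally mean the usage source simply doesn't accept `database_pattern` at all:
    ```python
    # Sketch: standard allow/deny pattern shape (regexes are illustrative).
    database_pattern = {
        "allow": ["^THISDB$"],
        "deny": ["^NOTTHISDB$"],
        "ignoreCase": True,
    }
    ```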
  • nice-country-99675 (10/22/2021, 12:32 PM)
    Hi all! Just a quick question... I would like to know whether an Amazon QuickSight ingestion source has already been developed (I think not, per the documentation), or whether there's a plan to support it... I would love to contribute that, but since I'm just starting with DataHub it may take me a while to get a draft in place...
  • gorgeous-diamond-82312 (10/22/2021, 4:19 PM)
    Hi! Some folks around our org have documented various tables and columns in spreadsheets that can be read as CSV. Each line has enough info to recreate the table and column URNs. What's the best way to load these into DataHub?
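    One possible approach (a sketch, not the only way): walk the CSV with the Python REST emitter. The CSV columns and the choice of a dataset-level description aspect are hypothetical; column-level docs would need a schema aspect instead.
    ```python
    # Sketch: rebuild each dataset URN from a CSV row and push a description.
    import csv

    from datahub.emitter.mce_builder import make_dataset_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import DatasetPropertiesClass

    emitter = DatahubRestEmitter("http://localhost:8080")  # placeholder GMS address

    with open("table_docs.csv") as f:  # hypothetical file and columns
        for row in csv.DictReader(f):
            emitter.emit_mcp(
                MetadataChangeProposalWrapper(
                    entityType="dataset",
                    changeType="UPSERT",
                    entityUrn=make_dataset_urn(row["platform"], row["table"], "PROD"),
                    aspectName="datasetProperties",
                    aspect=DatasetPropertiesClass(description=row["description"]),
                )
            )
    ```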
  • full-area-6720 (10/25/2021, 6:02 AM)
    Hi, can we add descriptions while defining the Airflow DAGs themselves, rather than doing it in the UI?
  • cuddly-refrigerator-2629 (10/25/2021, 4:13 PM)
    Hello, has anyone already implemented ingestion for the Kafka Connect S3 sink?
    🚀 1
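    For reference, a sketch of pointing the generic `kafka-connect` source at a Connect cluster; whether it maps the S3 sink connector's lineage depends on the connectors your datahub version supports (connection values are placeholders):
    ```python
    # Sketch: ingest connector metadata from a Kafka Connect REST endpoint.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(
        {
            "source": {
                "type": "kafka-connect",
                "config": {
                    "connect_uri": "http://localhost:8083",
                    "cluster_name": "connect-cluster",
                },
            },
            "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
        }
    )
    pipeline.run()
    ```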
  • rhythmic-sundown-12093 (10/26/2021, 7:38 AM)
    I think it's time for an update here
  • brief-insurance-68141 (10/26/2021, 6:15 PM)
    Could someone share documentation and any demo code in the repo for the push ingestion model?
  • powerful-manchester-27331 (10/26/2021, 6:25 PM)
    ```
    Max retries exceeded with url: /config (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1129)')))
    ```
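    A common workaround sketch, assuming the failure comes from Python's requests library (used by the REST sink) not trusting the server's certificate authority (the bundle path is a placeholder):
    ```python
    # Sketch: point requests at the internal CA bundle before running ingestion;
    # REQUESTS_CA_BUNDLE is honored by the requests library.
    import os

    os.environ["REQUESTS_CA_BUNDLE"] = "/etc/ssl/certs/internal-ca.pem"
    ```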
  • narrow-kitchen-1309 (10/26/2021, 6:59 PM)
    Hello, I have a question regarding enabling profiling when ingesting from Trino (only one DB2 table, 56 columns, 200 rows) into DataHub. Below are the warnings I consistently get after running ingestion. BTW, it works fine and ingests successfully when I disable profiling. Any thoughts or suggestions, please let me know.
    ```
    [2021-10-26 14:41:41,449] WARNING  {great_expectations.dataset.sqlalchemy_dataset:1577} - No sqlalchemy dialect found; relying in top-level sqlalchemy types.
    [2021-10-26 14:41:50,462] WARNING  {great_expectations.dataset.sqlalchemy_dataset:2023} - Regex is not supported for dialect <sqlalchemy_trino.dialect.TrinoDialect object at 0x13f1fb880>
    [2021-10-26 14:41:54,881] WARNING  {great_expectations.dataset.sqlalchemy_dataset:1577} - No sqlalchemy dialect found; relying in top-level sqlalchemy types.
    ```
  • cuddly-family-62352 (10/27/2021, 6:28 AM)
    Does DataHub support the collection of database indexes and functions?