# ingestion
  • bland-orange-13353
    03/01/2024, 12:18 PM
    This message was deleted.
  • miniature-mouse-35911
    03/01/2024, 4:49 PM
    Hello, team! We are doing a DataHub POC and, as part of it, are using Athena, Glue, and S3 ingestion recipes to compare capabilities side by side. We are mostly interested in stats like row count, sample data, min, max, average, etc. DataHub is deployed on an EKS cluster. The EKS pod uses a certain IAM role, and I added the config below to the Athena recipe so that it assumes the role I defined, but: 1) the assume-role is not happening (it still brings back metadata, but via the instance IAM role), and 2) stats do not get updated even though profiling is enabled. 1. Are you using UI or CLI for ingestion? UI 2. Which DataHub version are you using? 0.12.0 3. What data source(s) are you integrating with DataHub? Athena, Glue, S3
    Copy code
    source:
        type: athena
        config:
            aws_region: us-east-1
            work_group: primary
            s3_staging_dir: 's3://datahubpoc-data/athena-results/'
            catalog_name: datahubpoc-gluecatalog
            aws_role_arn: 'arn:aws:iam::<awsaccountid>:role/test-datahubec2-poc-role'
            profiling:
                enabled: true
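    A quick way to check, outside of DataHub, whether the pod identity is actually allowed to assume the role from the recipe (a minimal sketch; the role ARN and region are the placeholders from the recipe above):
    Copy code
    # Sketch: run inside the ingestion pod to confirm the role chain works
    # before involving DataHub. The role ARN is the placeholder from the recipe.
    import boto3

    sts = boto3.client("sts")
    print("current identity:", sts.get_caller_identity()["Arn"])

    assumed = sts.assume_role(
        RoleArn="arn:aws:iam::<awsaccountid>:role/test-datahubec2-poc-role",
        RoleSessionName="datahub-athena-check",
    )
    creds = assumed["Credentials"]
    athena = boto3.client(
        "athena",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
        region_name="us-east-1",
    )
    print("work groups visible to the assumed role:", athena.list_work_groups()["WorkGroups"])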
  • bulky-island-74277
    03/04/2024, 3:10 AM
    Hello DataHub community! Today I ran into an issue while using DataHub: when I clicked on Lineage, it said "An unexpected error occurred". What is the problem, and how can I fix it?
  • gifted-diamond-19544
    03/04/2024, 8:58 AM
    Hi all! I have a question regarding Athena ingestion. I was looking into the permissions, and it seems that DataHub needs permission to run queries on Athena as well as to get objects from S3. Are these permissions necessary if I just want to ingest metadata from Athena (meaning, no profiling)?
  • bland-application-65186
    03/04/2024, 10:01 AM
    Hi all, in https://datahubproject.io/docs/generated/ingestion/sources/s3/, the example path spec s3://my-bucket/{dept}/tests/{table}/*.avro is annotated with "# specifying keywords to be used in display name". What is the expected result of using {dept}?
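    For context, a minimal sketch of where that path spec sits in an S3 recipe, using the Python pipeline form (bucket, region, and sink values are illustrative):
    Copy code
    # Sketch only: bucket, region, and sink values are illustrative.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(
        {
            "source": {
                "type": "s3",
                "config": {
                    "path_specs": [
                        # {dept} and {table} are named placeholders matched against
                        # the object path; * matches the individual file names.
                        {"include": "s3://my-bucket/{dept}/tests/{table}/*.avro"}
                    ],
                    "aws_config": {"aws_region": "us-east-1"},
                },
            },
            "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
        }
    )
    pipeline.run()
    pipeline.pretty_print_summary()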
  • purple-addition-48342
    03/04/2024, 8:29 PM
    Hello everyone. I am using DataHub v0.12.0 from the Helm chart on GCP. I would like to use custom code inside the actions container, e.g. to add custom transforms or other custom code for UI ingestion. What is the best way to "inject" this custom code? I would like to avoid building a custom image, which would need to be updated every time DataHub is updated. I thought about injecting files via ConfigMaps, e.g. overwriting ingestion_common.sh to add custom packages. Is there a better way? Thanks for the help.
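    For the transforms part specifically, built-in transformers can be declared in the recipe itself, which avoids a custom image for that use case; a minimal sketch (source details, tag URN, and server are illustrative):
    Copy code
    # Sketch: a recipe-level transformer; no custom actions image needed for this.
    # The same structure, written as YAML, works for UI ingestion recipes.
    recipe = {
        "source": {"type": "hive", "config": {"host_port": "hive.example.com:10000"}},
        "transformers": [
            {
                "type": "simple_add_dataset_tags",
                "config": {"tag_urns": ["urn:li:tag:Example"]},
            }
        ],
        "sink": {"type": "datahub-rest", "config": {"server": "http://datahub-gms:8080"}},
    }
    # With the SDK: Pipeline.create(recipe).run()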
  • boundless-bear-68728
    03/05/2024, 1:41 AM
    Hi Team, can you please suggest the recommended memory to assign to the datahub-action service? Currently I have assigned 6Gi, with a max of 8Gi, but I can still see the service consuming around 7.6Gi of memory, and during this time the application UI becomes unresponsive. Is there any resolution to this issue? Currently I am trying to ingest metadata for just one Snowflake DB with all advanced options turned on. Do I need to cut down the number of schemas I am ingesting, or should I give datahub-action more memory?
  • elegant-salesmen-99143
    03/05/2024, 8:07 AM
    Hi everyone. I have a question about the note that the env parameter is about to be deprecated, which says to use platform_instance instead. But it looks like platform_instance is meant for different use cases and works differently. For example, I had a recipe with env: STG. I tried replacing it with platform_instance: STG, but now when I look at the database structure I have a PROD container at the upper level (PROD is the default value for env), and inside it an STG container with my database. Is that the expected behavior? Is environment the same thing as instance, and how do I specify the environment now? After env is deprecated, what will happen to the databases that have PROD as the default env value, not specified in the recipe? Will they behave differently from those where env: PROD is specified explicitly? I did this while on DataHub 0.12.1; I haven't upgraded to 0.13.0 yet, as I wanted to try replacing env first.
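    For reference, the difference shows up in how the dataset URN is built; a small sketch using the SDK helpers (platform and table names are illustrative):
    Copy code
    # Sketch: how env and platform_instance each end up in the dataset URN.
    from datahub.emitter.mce_builder import (
        make_dataset_urn,
        make_dataset_urn_with_platform_instance,
    )

    # env becomes the third element of the URN:
    print(make_dataset_urn("postgres", "mydb.myschema.mytable", env="STG"))
    # -> urn:li:dataset:(urn:li:dataPlatform:postgres,mydb.myschema.mytable,STG)

    # platform_instance is prefixed onto the dataset name; env still defaults to PROD:
    print(
        make_dataset_urn_with_platform_instance(
            platform="postgres",
            name="mydb.myschema.mytable",
            platform_instance="STG",
        )
    )
    # -> urn:li:dataset:(urn:li:dataPlatform:postgres,STG.mydb.myschema.mytable,PROD)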
  • able-jelly-63005
    03/05/2024, 9:24 AM
    I am running a BigQuery ingestion; if I enable column-level profiling, it fails with the following exception in the logs:
    Copy code
    ERROR {datahub.entrypoints:201} - Command failed: 'Cursor' object has no attribute '_query_job'
    Traceback (most recent call last):
      File "/tmp/datahub/ingest/venv-bigquery-0.12.0/lib/python3.10/site-packages/datahub/entrypoints.py", line 188, in main
        sys.exit(datahub(standalone_mode=False, **kwargs))
      File "/tmp/datahub/ingest/venv-bigquery-0.12.0/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
        return self.main(*args, **kwargs)
      File "/tmp/datahub/ingest/venv-bigquery-0.12.0/lib/python3.10/site-packages/click/core.py", line 1078, in main
        rv = self.invoke(ctx)
      File "/tmp/datahub/ingest/venv-bigquery-0.12.0/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/tmp/datahub/ingest/venv-bigquery-0.12.0/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/tmp/datahub/ingest/venv-bigquery-0.12.0/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/tmp/datahub/ingest/venv-bigquery-0.12.0/lib/python3.10/site-packages/click/core.py", line 783, in invoke
        return __callback(*args, **kwargs)
      File "/tmp/datahub/ingest/venv-bigquery-0.12.0/lib/python3.10/site-packages/click/decorators.py", line 33, in new_func
        return f(get_current_context(), *args, **kwargs)
      File "/tmp/datahub/ingest/venv-bigquery-0.12.0/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 448, in wrapper
        raise e
      File "/tmp/datahub/ingest/venv-bigquery-0.12.0/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 397, in wrapper
        res = func(*args, **kwargs)
      File "/tmp/datahub/ingest/venv-bigquery-0.12.0/lib/python3.10/site-packages/datahub/utilities/memory_leak_detector.py", line 95, in wrapper
        return func(ctx, *args, **kwargs)
      File "/tmp/datahub/ingest/venv-bigquery-0.12.0/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 197, in run
        ret = loop.run_until_complete(run_ingestion_and_check_upgrade())
      File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
        return future.result()
      File "/tmp/datahub/ingest/venv-bigquery-0.12.0/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 181, in run_ingestion_and_check_upgrade
        ret = await ingestion_future
      File "/tmp/datahub/ingest/venv-bigquery-0.12.0/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 139, in run_pipeline_to_completion
        raise e
      File "/tmp/datahub/ingest/venv-bigquery-0.12.0/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 131, in run_pipeline_to_completion
        pipeline.run()
      File "/tmp/datahub/ingest/venv-bigquery-0.12.0/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 377, in run
        for wu in itertools.islice(
      File "/tmp/datahub/ingest/venv-bigquery-0.12.0/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 118, in auto_stale_entity_removal
        for wu in stream:
      File "/tmp/datahub/ingest/venv-bigquery-0.12.0/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 142, in auto_workunit_reporter
        for wu in stream:
      File "/tmp/datahub/ingest/venv-bigquery-0.12.0/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 224, in auto_browse_path_v2
        for urn, batch in _batch_workunits_by_urn(stream):
      File "/tmp/datahub/ingest/venv-bigquery-0.12.0/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 362, in _batch_workunits_by_urn
        for wu in stream:
      File "/tmp/datahub/ingest/venv-bigquery-0.12.0/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 155, in auto_materialize_referenced_tags
        for wu in stream:
      File "/tmp/datahub/ingest/venv-bigquery-0.12.0/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 70, in auto_status_aspect
        for wu in stream:
      File "/tmp/datahub/ingest/venv-bigquery-0.12.0/lib/python3.10/site-packages/datahub/ingestion/source/bigquery_v2/bigquery.py", line 551, in get_workunits_internal
        yield from self._process_project(project_id)
      File "/tmp/datahub/ingest/venv-bigquery-0.12.0/lib/python3.10/site-packages/datahub/ingestion/source/bigquery_v2/bigquery.py", line 670, in _process_project
        yield from self.profiler.get_workunits(
      File "/tmp/datahub/ingest/venv-bigquery-0.12.0/lib/python3.10/site-packages/datahub/ingestion/source/bigquery_v2/profiler.py", line 184, in get_workunits
        yield from self.generate_profile_workunits(
      File "/tmp/datahub/ingest/venv-bigquery-0.12.0/lib/python3.10/site-packages/datahub/ingestion/source/sql/sql_generic_profiler.py", line 103, in generate_profile_workunits
        for ge_profiler_request, profile in ge_profiler.generate_profiles(
      File "/tmp/datahub/ingest/venv-bigquery-0.12.0/lib/python3.10/site-packages/datahub/ingestion/source/ge_data_profiler.py", line 944, in generate_profiles
        yield async_profile.result()
      File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 458, in result
        return self.__get_result()
      File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
        raise self._exception
      File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
        result = self.fn(*self.args, **self.kwargs)
      File "/tmp/datahub/ingest/venv-bigquery-0.12.0/lib/python3.10/site-packages/datahub/ingestion/source/ge_data_profiler.py", line 987, in _generate_profile_from_request
        return request, self._generate_single_profile(
      File "/tmp/datahub/ingest/venv-bigquery-0.12.0/lib/python3.10/site-packages/datahub/ingestion/source/ge_data_profiler.py", line 1042, in _generate_single_profile
        bigquery_temp_table = create_bigquery_temp_table(
      File "/tmp/datahub/ingest/venv-bigquery-0.12.0/lib/python3.10/site-packages/datahub/ingestion/source/ge_data_profiler.py", line 1227, in create_bigquery_temp_table
        ] = cursor._query_job
    AttributeError: 'Cursor' object has no attribute '_query_job'. Did you mean: 'query_job'?
  • few-accountant-12561
    03/05/2024, 1:20 PM
    Hi all! Please tell me whether it is necessary to create a connection to Airflow in DataHub, and what the connection recipe should look like. I mean a UI connection.
  • few-piano-98292
    03/05/2024, 7:24 PM
    I have been trying to capture lineage information from a Spark Databricks notebook; however, information related to the notebook run, such as appName, startedAt, description, and queryPlan, is populated on the Properties page of that task, while no lineage details for the output dataset written to S3 show up on the task's Lineage page. Could someone help me understand what is missing in our configs/setup that would prevent the lineage details from being displayed on the frontend? DataHub version: 0.12.0. Apache Spark version: 3.2.1 (on Databricks LTS 10.4). Scala version: 2.12.
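    For comparison, these are the Spark-side settings the lineage listener generally needs, per the datahub-spark-lineage docs (a sketch with placeholder values; on Databricks they are usually set in the cluster's Spark config rather than the notebook itself):
    Copy code
    # Sketch: placeholder values; property names per the datahub-spark-lineage docs.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("lineage-test")
        .config("spark.jars.packages", "io.acryl:datahub-spark-lineage:<version>")
        .config("spark.extraListeners", "datahub.spark.DatahubSparkListener")
        .config("spark.datahub.rest.server", "http://<gms-host>:8080")
        .config("spark.datahub.rest.token", "<token>")
        .getOrCreate()
    )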
  • boundless-bear-68728
    03/05/2024, 10:53 PM
    Hi All, I am facing the following error while trying to ingest Looker metadata:
    Copy code
    Traceback (most recent call last):
      File "/usr/local/lib/python3.10/site-packages/acryl/executor/dispatcher/default_dispatcher.py", line 30, in dispatch_async
        res = executor.execute(request)
      File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/reporting_executor.py", line 94, in execute
        self._datahub_graph.emit_mcp(completion_mcp)
      File "/usr/local/lib/python3.10/site-packages/datahub/emitter/rest_emitter.py", line 245, in emit_mcp
        self._emit_generic(url, payload)
    datahub.configuration.common.OperationalError: ('Unable to emit metadata to DataHub GMS', {'message': 'HTTPConnectionPool(host=\'datahub-datahub-gms\', port=8080): Max retries exceeded with url: /aspects?action=ingestProposal (Caused by ReadTimeoutError("HTTPConnectionPool(host=\'datahub-datahub-gms\', port=8080): Read timed out. (read timeout=30)"))'})
    2024-03-05T22:44:20.989419034Z
    Can you please help me with this issue?
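    Since the failure is a 30-second read timeout against GMS, one thing worth checking is the REST sink timeout; a sketch of a recipe-level override (timeout_sec and retry_max_times are assumptions to verify against the datahub-rest sink docs for your CLI version):
    Copy code
    # Sketch: raising the REST sink timeout for a slow GMS endpoint.
    # timeout_sec / retry_max_times are assumptions; confirm they exist in the
    # datahub-rest sink config for the CLI version you are running.
    sink = {
        "type": "datahub-rest",
        "config": {
            "server": "http://datahub-datahub-gms:8080",
            "timeout_sec": 120,
            "retry_max_times": 3,
        },
    }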
  • fresh-river-19527
    03/06/2024, 12:23 PM
    Hi, is there any way to filter the notifications being published to Slack? For example, when I set up the Airflow connector, either with a personal token or with the DataHub user token, we receive a Slack notification for every metadata event the Airflow plugin publishes, which becomes very spammy when lots of DAGs are running.
  • some-alligator-9844
    03/06/2024, 2:43 PM
    Hi Team/@gray-shoe-75895, I am doing CLI-based ingestion for Hive sources. The ingestion fails after encountering a single exception from the source for one dataset/table; in an earlier version it used to continue with the next tables. 1. Is there any configuration to continue on error? 2. The ingestion gets stuck for hours if there is no response from the source; is it possible to have a timeout per dataset? DataHub CLI version: 0.12.1.3
  • some-alligator-9844
    03/06/2024, 2:46 PM
    Hi Team/@gray-shoe-75895, during CLI ingestion I am getting this warning. What should I do to correct this? I am already using platform_instance.
    Copy code
    ['env is deprecated and will be removed in a future release. Please use platform_instance instead.']
    recipe.yaml
    Copy code
    source:
      type: hive
      config:
        platform_instance: ANA.OCE.DEV
        env: DEV
        host_port: 'xxxxxxx.visa.com:10000'
        username: xxxxxxx
        options:
          connect_args:
            auth: KERBEROS
            kerberos_service_name: hive
    sink:
      type: datahub-rest
      config:
        server: '${DATAHUB_GMS_HOST}'
        token: '${DATAHUB_GMS_TOKEN}'
        max_threads: 1
    Datahub CLI version: 0.12.1.3
  • happy-branch-193
    03/06/2024, 3:12 PM
    Hi guys, just a quick question: for LookML ingestion, how is it possible to show all the nested LookML views (specifically those defined with the extends keyword) in the lineage, not just the last one used for the explore? This is UI ingestion. DataHub CLI version: 0.12.1.5. Looker/LookML.
  • incalculable-sundown-8765
    03/06/2024, 7:18 PM
    Hi guys, I have a question on datahub delete. I want to hard delete everything related to Redshift, but I encounter this issue:
    Copy code
    % datahub delete --platform redshift --dry-run
    [2024-03-06 20:13:35,266] INFO     {datahub.cli.delete_cli:341} - Using DataHubGraph: configured to talk to http://localhost:8080
    [2024-03-06 20:13:36,009] ERROR    {datahub.entrypoints:201} - Command failed: ('Unable to get metadata from DataHub', {'message': '401 Client Error: Unauthorized for url: http://localhost:8080/api/graphql'})
    Do I need a token to run the command? If so, how can I include the token in the command? Thank you. DataHub version: v0.12.1
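    The 401 suggests metadata service authentication is enabled, in which case the CLI needs a personal access token; it can be supplied via datahub init (which writes ~/.datahubenv) or the DATAHUB_GMS_URL / DATAHUB_GMS_TOKEN environment variables. A sketch of the equivalent programmatic connection (server and token are placeholders):
    Copy code
    # Sketch: connecting to GMS with a personal access token. The CLI picks up
    # the same values from `datahub init` (~/.datahubenv) or from the
    # DATAHUB_GMS_URL / DATAHUB_GMS_TOKEN environment variables.
    from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig

    graph = DataHubGraph(
        DatahubClientConfig(
            server="http://localhost:8080",
            token="<personal-access-token>",
        )
    )
    graph.test_connection()  # raises if GMS is unreachable or the token is rejected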
  • modern-orange-37660
    03/06/2024, 9:31 PM
    I am dealing with a weird Tableau ingestion bug. A dashboard is ingested and can be found in DataHub, but if I browse one level up to its workbook, it doesn't exist there. Has anyone encountered a similar problem? UI ingestion / CLI version: 0.12.1
  • cuddly-dinner-641
    03/07/2024, 4:02 PM
    Is the Snowflake ingestion source able to collect lineage for Dynamic Tables? It doesn't seem to be working for me.
  • flat-bear-65100
    03/08/2024, 2:10 AM
    Hello Team! I’ve attempted several times to ingest data into DataHub (v0.13.0) for both S3 and Glue. I’ve tried through the UI and via the CLI. This is an example of what the run report shows, although no assets get ingested:
    Copy code
    'container': ['urn:li:container:8e7ba34c02ebac26523e12b245223254',
                                'urn:li:container:8f14caa5a1220e7890ee5ca61d5c570d',
                                'urn:li:container:cee410e83a7898b2dda07dc3440c7cfd',
                                'urn:li:container:83e2422984342072527ec4f411c231e8',
                                'urn:li:container:4be6f93ced89cf3af76c4d5aa0a4313f',
                                'urn:li:container:fbf321045931666f19a792a7bcbd2d2e',
                                'urn:li:container:7702dc6c60dc4dbdd8ba26f3dc6464ad',
                                'urn:li:container:8da75ef4e929ee8bdc0dc8287d16cd2b',
                                'urn:li:container:1d0508f2f359898db300c54bd57ad670',
                                'urn:li:container:196bbcab079fa9315eb6badccfa8befb',
                                '... sampled of 21 total elements']},
     'aspects': {'dataset': {'datasetProperties': 26, 'schemaMetadata': 26, 'operation': 26, 'container': 26, 'browsePathsV2': 52, 'status': 26},
                 'container': {'containerProperties': 21,
                               'status': 21,
                               'dataPlatformInstance': 21,
                               'subTypes': 21,
                               'browsePathsV2': 42,
                               'container': 20}},
     'warnings': {},
     'failures': {},
     'soft_deleted_stale_entities': [],
     'filtered': [],
     'start_time': '2024-03-07 21:02:59.023267 (19.02 seconds ago)',
     'running_time': '19.02 seconds'}
    Sink (datahub-rest) report:
    {'total_records_written': 328,
     'records_written_per_second': 16,
     'warnings': [],
     'failures': [],
     'start_time': '2024-03-07 21:02:58.256726 (19.79 seconds ago)',
     'current_time': '2024-03-07 21:03:18.048629 (now)',
     'total_duration_in_seconds': 19.79,
     'max_threads': 15,
     'gms_version': 'v0.13.0',
     'pending_requests': 0}
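    When a run reports records written but nothing is visible, one quick check is whether the emitted entities actually exist in GMS (which would point at the search indices rather than ingestion); a sketch using one of the container URNs from the report above (server and token are placeholders):
    Copy code
    # Sketch: confirm an emitted container is present in GMS. If it exists but
    # is not visible in the UI, the search indices may need to be restored.
    from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig

    graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080", token="<token>"))
    urn = "urn:li:container:8e7ba34c02ebac26523e12b245223254"
    print(urn, "exists in GMS:", graph.exists(urn))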
  • flat-bear-65100
    03/08/2024, 2:11 AM
    CleanShot 2024-03-07 at 21.10.50@2x.png
  • quick-guitar-82682
    03/08/2024, 5:17 AM
    Hi everyone. I am looking to ingest structured properties via file ingestion in DataHub. I was following the documentation on how to write the aspect, but after ingestion no data seems to come through. This is what I am including in my ingest file. The structured property has been set up properly and tested on another dataset, so I am not sure why it doesn't show up on this one:
    Copy code
    {
      "com.linkedin.pegasus2avro.structured.StructuredProperties": {
        "properties": [
          {
            "propertyUrn": "urn:li:structuredProperty:io.acryl.props.keywords",
            "values": [{ "string": "foo" }, { "string": "bar" }]
          },
          {
            "propertyUrn": "urn:li:structuredProperty:io.acryl.props.testProperty",
            "values": [{ "string": "Test" }]
          }
        ]
      }
    },
  • some-zoo-21364
    03/08/2024, 10:26 AM
    Hi all, may I know if there is a way to map the Airflow DAG owner to a DataHub custom group? I created a group with members via CLI ingestion and set the owner in my DAG to the group's id. DAG example below:
    Copy code
    default_args = {
        'owner': 'mygroup',
    }
    and the group yaml file contains..
    Copy code
    id: mygroup
    display_name: "My Group"
    email: "mygroup@example.com"
    However, triggering the DAG creates a new user of type CORP_USER with urn urn:li:corpuser:mygroup, instead of mapping it to the group entity with urn urn:li:corpGroup:mygroup.
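    For reference, the two URN shapes involved, as built by the SDK helpers (a small illustration of the mismatch, not a fix):
    Copy code
    # Sketch: the Airflow owner string ends up as a corpuser URN, while the
    # group created from the YAML file lives under corpGroup.
    from datahub.emitter.mce_builder import make_group_urn, make_user_urn

    print(make_user_urn("mygroup"))   # urn:li:corpuser:mygroup
    print(make_group_urn("mygroup"))  # urn:li:corpGroup:mygroup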
  • gifted-coat-97302
    03/08/2024, 12:21 PM
    Hello Team, we have been trying to ingest from Athena using CLI-based ingestion (0.12.1.5) but are struggling with the GMS service throwing document_missing_exception. There is data in the metadata_aspect_v2 database table, but nothing in Elasticsearch, and nothing is visible in the datahub-frontend either. DataHub details:
    • Version: 0.12.1 (using docker images with this version)
    • Deployment type: Kubernetes (AWS EKS)
    • Deployment method: custom internal Helm chart
      ◦ Frontend deployed separately
      ◦ GMS deployed with multiple replicas
        ▪︎ with MCE/MAE turned off
        ▪︎ metadata-auth enabled
        ▪︎ Hazelcast enabled (although we are having problems with this, so currently only running one replica)
      ◦ MAE consumer deployed separately with 2 replicas
      ◦ MCE consumer deployed separately with 1 replica
    Further details in the thread; any help will be much appreciated.
  • miniature-magician-74764
    03/08/2024, 7:49 PM
    Hello Team, we have been trying to ingest from Athena using the CLI (acryl-datahub, version 0.12.1.5) and dbt. We need both ingestions because not all assets in Athena will necessarily be part of the dbt universe. The problem: we are getting duplicated entities. The main issue seems to be that Athena is not adding the catalog name to the URN.
    • Athena (top) URN: urn:li:dataset:(urn:li:dataPlatform:athena,dq_cat_test.mod_cat1_test1,PROD)
    • dbt (bottom) URN: urn:li:dataset:(urn:li:dataPlatform:dbt,AwsDataCatalog.dq_cat_test.mod_cat1_test1,PROD)
    Is there a way to add the correct data catalog to the Athena ingestion URN? Working with siblings would be impossible due to the volume and the data mesh schema we are developing.
    Copy code
    athena_ingestion_nonprod.py
    
    # The pipeline configuration is similar to the recipe YAML files provided to the CLI tool.
    pipeline = Pipeline.create(
        {
            "source": {
                "type": "athena",
                "config": {
                    "aws_region": "us-east-2",
                    "work_group":"primary",
                    "query_result_location":"REDACTED",
                    "catalog_name":"AwsDataCatalog"
                },
            },
            "sink": {
                "type": "datahub-rest",
                "config": {
                    "server": "REDACTED",
                    "token": "REDACTED"
                    },
            },
        }
    )
    
    # Run the pipeline and report the results.
    pipeline.run()
    pipeline.pretty_print_summary()
    Copy code
    recipe.dhub.dbt_nonprod.yaml
    
    source:
      type: "dbt"
      config:
        # Coordinates
        # To use this as-is, set the environment variable DBT_PROJECT_ROOT to the root folder of your dbt project
        manifest_path: "REDACTED/manifest.json"
        catalog_path: "REDACTED/catalog.json"
        sources_path: "REDACTED/sources.json" # optional for freshness
        test_results_path: "REDACTED/run_results.json" # optional for recording dbt test results after running dbt test
    
        # Options
        target_platform: "athena" # e.g. bigquery/postgres/etc.
        # incremental_lineage: False # for when we want to clear the previous lineage
        entities_enabled: # Multiple dbt projects
          sources: "no"
    
    sink: 
      type: "datahub-rest"
      config: 
        server: "REDACTED"
        token: "REDACTED"
  • boundless-bear-68728
    03/08/2024, 10:13 PM
    Hi Team/@gray-shoe-75895, I am having an issue with the Looker ingestion. I can see discrepancies between the number of datasets DataHub shows and the actual count that exists in our Looker: DataHub shows only 97 Explores, vs the 300+ Explores that we have. The Looker ingestion logs show success, but I still cannot see all the records in DataHub. Can you please help me resolve this issue?
  • ripe-machine-72145
    03/09/2024, 2:03 PM
    Hi Team, is there any better way to ingest CSV file metadata? UI-based, 0.13, CSV.
  • worried-agent-2446
    03/10/2024, 2:42 PM
    Hello! I’m using DataHub, and I’m considering ingesting MySQL with the SQL Queries source (https://datahubproject.io/docs/generated/ingestion/sources/sql-queries/) to view column-level lineage. I’d like to know whether I can use DDL SQL (like CREATE TABLE or DROP TABLE) in SQL Queries for this purpose. 🙇‍♂️
  • clean-magazine-98135
    03/11/2024, 2:42 AM
    Hi all! I'm using DataHub version 0.13.0. I want to connect to a Hive database using the UI ingestion feature. Could you please offer me a recipe demo for a Hive database connection? Thanks a lot.
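    A minimal Hive recipe sketch in dict form (host, credentials, and server are placeholders); the same keys, written as YAML, go into the UI recipe editor:
    Copy code
    # Sketch of a minimal hive recipe; all values are placeholders.
    recipe = {
        "source": {
            "type": "hive",
            "config": {
                "host_port": "my-hive-host:10000",
                "username": "hive_user",
                "password": "hive_password",
                # "database": "my_db",  # optional: restrict to a single database
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://datahub-gms:8080", "token": "<token>"},
        },
    }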
  • rich-barista-93413
    03/11/2024, 9:24 AM
    Hey there! 👋 Make sure your message includes the following information if relevant, so we can help more effectively!
    1. Are you using UI or CLI for ingestion?
    2. Which DataHub version are you using? (e.g. 0.12.0)
    3. What data source(s) are you integrating with DataHub? (e.g. BigQuery)