# ingestion
  • full-chef-85630 (11/23/2022, 5:56 AM)
    Hi all, I'm using the BigQuery usage source now. My problem is that the usage data I report isn't stored in DataHub; I want to get the usage data directly.
  • silly-finland-62382 (11/23/2022, 7:17 AM)
    Hey everyone, hope you are doing well. We are facing a DataHub RDS CPU utilisation issue. We run DataHub on AWS with RDS MySQL as the backend; even though we use a large RDS instance type, whenever we run any job/pipeline or add a dataset entry in DataHub, CPU utilisation spikes to 100%.
    1. Is there any way to delete unused entries in the DataHub RDS database (by detecting unused URNs or rows)?
    2. Is there any way to archive the RDS data of DataHub?
    3. If we delete some entries from the DataHub RDS database, will it affect DataHub or the DataHub service, or is it only used for logging purposes?
    I would be much obliged if you could reply on this. Thanks
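    One starting point for question 1, rather than deleting rows in RDS directly, is the DataHub CLI's delete command, which removes entities through GMS so the search and graph indexes stay consistent. A minimal sketch (the URN and platform are illustrative; check datahub delete --help for the full set of flags):
    Copy code
    # soft-delete a single entity (hides it in the UI, keeps the rows)
    datahub delete --urn "urn:li:dataset:(urn:li:dataPlatform:mysql,db.table,PROD)" --soft

    # hard-delete everything ingested for one platform (irreversible)
    datahub delete --platform mysql --hard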
  • better-fireman-33387 (11/23/2022, 10:48 AM)
    Hi all, I'm deploying DataHub using the Helm charts and I wonder how I can start a pod with the DataHub CLI docker image, as explained here. Can it be configured inside the DataHub chart, or is it a separate deployment?
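    For reference, one way to do this alongside the chart is a standalone pod (or CronJob) that runs the ingestion image, since the chart itself focuses on the server components. A minimal sketch, assuming a ConfigMap named ingestion-recipes holds the recipe (names and image tag are illustrative):
    Copy code
    apiVersion: v1
    kind: Pod
    metadata:
      name: datahub-ingestion-cli
    spec:
      restartPolicy: Never
      containers:
        - name: ingestion
          image: acryldata/datahub-ingestion:v0.9.2   # pick the tag matching your server version
          command: ["datahub", "ingest", "-c", "/recipes/recipe.yaml"]
          volumeMounts:
            - name: recipes
              mountPath: /recipes
      volumes:
        - name: recipes
          configMap:
            name: ingestion-recipes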
  • alert-fall-82501 (11/23/2022, 11:41 AM)
    Copy code
    Can anybody help me with this error? I am working on Airflow DAG job lineage; I installed the required plugins and the docker-compose file.
    airflow-init_1       | ....................
    airflow-init_1       | ERROR! Maximum number of retries (20) reached.
    airflow-init_1       | 
    airflow-init_1       | Last check result:
    airflow-init_1       | $ airflow db check
    airflow-init_1       | Unable to load the config, contains a configuration error.
    airflow-init_1       | Traceback (most recent call last):
    airflow-init_1       |   File "/usr/local/lib/python3.9/pathlib.py", line 1323, in mkdir
    airflow-init_1       |     self._accessor.mkdir(self, mode)
    airflow-init_1       | FileNotFoundError: [Errno 2] No such file or directory: '/opt/airflow/logs/scheduler/2022-11-23'
    airflow-init_1       | 
    airflow-init_1       | During handling of the above exception, another exception occurred:
    airflow-init_1       | 
    airflow-init_1       | Traceback (most recent call last):
    airflow-init_1       |   File "/usr/local/lib/python3.9/logging/config.py", line 564, in configure
    airflow-init_1       |     handler = self.configure_handler(handlers[name])
    airflow-init_1       |   File "/usr/local/lib/python3.9/logging/config.py", line 745, in configure_handler
    airflow-init_1       |     result = factory(**kwargs)
    airflow-init_1       |   File "/home/airflow/.local/lib/python3.9/site-packages/airflow/utils/log/file_processor_handler.py", line 46, in __init__
    airflow-init_1       |     Path(self._get_log_directory()).mkdir(parents=True, exist_ok=True)
    airflow-init_1       |   File "/usr/local/lib/python3.9/pathlib.py", line 1327, in mkdir
    airflow-init_1       |     self.parent.mkdir(parents=True, exist_ok=True)
    airflow-init_1       |   File "/usr/local/lib/python3.9/pathlib.py", line 1323, in mkdir
    airflow-init_1       |     self._accessor.mkdir(self, mode)
    airflow-init_1       | PermissionError: [Errno 13] Permission denied: '/opt/airflow/logs/scheduler'
    airflow-init_1       | 
    airflow-init_1       | The above exception was the direct cause of the following exception:
    airflow-init_1       | 
    airflow-init_1       | Traceback (most recent call last):
    airflow-init_1       |   File "/home/airflow/.local/bin/airflow", line 5, in <module>
    airflow-init_1       |     from airflow.__main__ import main
    airflow-init_1       |   File "/home/airflow/.local/lib/python3.9/site-packages/airflow/__init__.py", line 46, in <module>
    airflow-init_1       |     settings.initialize()
    airflow-init_1       |   File "/home/airflow/.local/lib/python3.9/site-packages/airflow/settings.py", line 444, in initialize
    airflow-init_1       |     LOGGING_CLASS_PATH = configure_logging()
    airflow-init_1       |   File "/home/airflow/.local/lib/python3.9/site-packages/airflow/logging_config.py", line 73, in configure_logging
    airflow-init_1       |     raise e
    airflow-init_1       |   File "/home/airflow/.local/lib/python3.9/site-packages/airflow/logging_config.py", line 68, in configure_logging
    airflow-init_1       |     dictConfig(logging_config)
    airflow-init_1       |   File "/usr/local/lib/python3.9/logging/config.py", line 809, in dictConfig
    airflow-init_1       |     dictConfigClass(config).configure()
    airflow-init_1       |   File "/usr/local/lib/python3.9/logging/config.py", line 571, in configure
    airflow-init_1       |     raise ValueError('Unable to configure handler '
    airflow-init_1       | ValueError: Unable to configure handler 'processor'
    airflow-init_1       |
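    For context, this looks like the well-known permission problem with the mounted ./logs directory in the official Airflow docker-compose setup: the directory on the host is owned by root while the container runs as the airflow user. A commonly suggested fix (a sketch, assuming the compose file is in the current directory) is to pre-create the mounted directories and set AIRFLOW_UID before re-running airflow-init:
    Copy code
    mkdir -p ./dags ./logs ./plugins
    echo -e "AIRFLOW_UID=$(id -u)" > .env   # run the containers with your host UID
    docker-compose up airflow-init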
  • silly-intern-25190 (11/23/2022, 12:40 PM)
    Hi team, our team is working on the Vertica plugin for DataHub. While developing, we noticed there is an ML Models section in the DataHub UI, but inside the DataHub code we were not able to find any files from which we can ingest data into ML Models. It would be a great help if someone could point us to the files/functions used to ingest metadata into the ML Models section of the DataHub UI. @mammoth-bear-12532
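    For orientation, the built-in SageMaker source is one example of a connector that emits ML model entities. A minimal sketch of emitting an mlModelProperties aspect with the Python SDK (the server URL, platform, and model name are illustrative):
    Copy code
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import ChangeTypeClass, MLModelPropertiesClass

    emitter = DatahubRestEmitter("http://localhost:8080")

    # URN format: urn:li:mlModel:(urn:li:dataPlatform:<platform>,<name>,<env>)
    model_urn = "urn:li:mlModel:(urn:li:dataPlatform:vertica,my_model,PROD)"

    mcp = MetadataChangeProposalWrapper(
        entityType="mlModel",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn=model_urn,
        aspectName="mlModelProperties",
        aspect=MLModelPropertiesClass(description="Example model emitted from a custom source"),
    )
    emitter.emit(mcp)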
  • future-iron-16086 (11/23/2022, 1:51 PM)
    Hello, I'm new here. Is it possible to ingest policy tags from BigQuery tables? Another question is about Stats: when using the UI to ingest data, are stats collected by default? Thank you
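    On the Stats question: that tab is populated by profiling, which is not enabled by default. A minimal sketch of turning it on inside a BigQuery recipe (only the profiling block is shown; the rest of the config is assumed):
    Copy code
    source:
        type: bigquery
        config:
            profiling:
                enabled: true                    # populates the Stats tab
                profile_table_level_only: false  # set true for cheaper, table-level stats only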
  • mammoth-gigabyte-6392 (11/23/2022, 3:06 PM)
    Hello team, I had a question: can we specify our own path in the DataHub metadata store during ingestion, instead of it just replicating the hierarchy from the source?
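    For the browse hierarchy specifically, one option is the set_dataset_browse_path transformer, which overrides browse paths at ingestion time. A minimal sketch added to a recipe (the template value is illustrative):
    Copy code
    transformers:
        - type: set_dataset_browse_path
          config:
              replace_existing: true
              path_templates:
                  - /my_team/PLATFORM/DATASET_PARTS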
  • quaint-barista-82836 (11/23/2022, 4:33 PM)
    Hi Team, I am trying to connect to BigQuery from a local DataHub setup and got a connection error. Could someone advise what the issue might be (I am not passing a key from the CLI)?
    exec-urn_li_dataHubExecutionRequest_ca938180-1b5e-43e3-bbc6-dbb13c947e9b.log
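    For reference, if the key is not picked up from the environment, the BigQuery source can take the service-account key fields inline via the credential block. A minimal sketch (all values are placeholders and are best referenced as secrets):
    Copy code
    source:
        type: bigquery
        config:
            credential:
                project_id: my-gcp-project
                private_key_id: '${BQ_PRIVATE_KEY_ID}'
                private_key: '${BQ_PRIVATE_KEY}'
                client_email: 'datahub@my-gcp-project.iam.gserviceaccount.com'
                client_id: '123456789012345678901'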
  • quaint-barista-82836 (11/23/2022, 7:19 PM)
    Hi Team, I am able to connect to BQ from the local DataHub version, however I do not see the Lineage, Queries, and remaining tabs, as they are disabled. Is this feature disabled locally and only enabled in the cloud version?
  • quaint-barista-82836 (11/23/2022, 7:22 PM)
    image.png
  • lemon-lock-89160 (11/23/2022, 8:25 PM)
    Anyone have experience ingesting from Alteryx?
  • famous-quill-82626 (11/23/2022, 10:15 PM)
    Hi there - I am a new user of DataHub and was attempting to ingest file input of:
    • Glossary Terms
    • Domains
    While I am able to successfully ingest Glossary Terms via:
    datahub ingest -c {recipe-filename.yaml}
    ... I do not seem to be able to do this with Domains; these must be manually entered via the UI - is this correct?
  • polite-alarm-98901 (11/23/2022, 11:11 PM)
    Hi everyone, I was curious if there are ways to do dynamic data ingestion using, say, a cronjob that is natively supported in DataHub, or is that something that would need to be built outside of it?
  • abundant-napkin-12120 (11/24/2022, 3:27 AM)
    Hi All, we are currently using the Hive recipe to ingest our metadata into DataHub from 200+ different Hive instances. By default the container name is generated from the database name.
    Current container hierarchy: {hiveDatabase} -> hiveTable
    Is there any way to have the container hierarchy updated as below using the Hive recipe?
    Desired container hierarchy: {tenantName} -> {hiveDatabase} -> hiveTable
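    One possible direction, assuming each tenant maps to a separate ingestion run, is to set platform_instance per tenant so that an instance-level grouping sits above the databases. A minimal sketch (host and instance name are illustrative; whether this matches the desired container hierarchy depends on your DataHub version):
    Copy code
    source:
        type: hive
        config:
            host_port: 'hive-tenant-a.internal:10000'
            platform_instance: tenant_a   # groups this run's databases under a tenant-level instance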
  • ancient-policeman-73437 (11/24/2022, 7:38 AM)
    Dear support, as I have mentioned a few times in the past, I am facing an issue in the newest version of DataHub with the Looker module (LookML seems to work fine). Looker ingestion sometimes doesn't build the lineage between charts and explores, and now I have started to see that not even all explores get imported. It is not clear why, since all of them are in the same model and require the same access permissions. Please help!
  • acoustic-ghost-64885 (11/24/2022, 9:16 AM)
    I am getting this error: "Failed to create ingestion source!: Unauthorized to perform this action. Please contact your DataHub administrator." Can anyone help me resolve this issue?
  • witty-microphone-40893 (11/24/2022, 9:48 AM)
    Hello. I have a dbt ingestion from a SQLite database, intended to tag fields for PII. Unfortunately the ingestion fails with an error.
    Copy code
    source:
      type: dbcat.datahub.CatalogSource
      config:
        database: main
        path: '/Users/user/Documents/datascience-experiments/piiscan'
        source_names:
          - prod_cat
    sink:
      type: "datahub-rest"
      config:
        server: "http://localhost:8080"
    I run it with
    datahub ingest -c ./export.dhub.yml
    And the resulting run contains errors like these: (snipped for conciseness)
    Copy code
    [2022-11-24 09:40:47,109] INFO     {datahub.cli.ingest_cli:167} - DataHub CLI version: 0.9.2.1
    [2022-11-24 09:40:47,171] INFO     {datahub.ingestion.run.pipeline:174} - Sink configured successfully. DataHubRestEmitter: configured to talk to <http://localhost:8080>
    [2022-11-24 09:40:53,745] INFO     {datahub.ingestion.run.pipeline:197} - Source configured successfully.
    [2022-11-24 09:40:53,746] INFO     {datahub.cli.ingest_cli:120} - Starting metadata ingestion
    -[2022-11-24 09:40:53,985] ERROR    {datahub.ingestion.run.pipeline:57} -  failed to write record with workunit loan_management.account_holder with ('Unable to emit metadata to DataHub GMS', {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:422]: com.linkedin.metadata.entity.validation.ValidationException: Failed to validate record with class com.linkedin.entity.Entity: ERROR :: /value/com.linkedin.metadata.snapshot.DatasetSnapshot/aspects/0/com.linkedin.schema.SchemaMetadata/fields/4/globalTags/tags/1/tag :: "Provided urn urn.li.tag.ADDRESS" is invalid\nERROR :: /value/com.linkedin.metadata.snapshot.DatasetSnapshot/aspects/0/com.linkedin.schema.SchemaMetadata/fields/5/globalTags/tags/1/tag :: "Provided urn urn.li.tag.PERSON" is invalid\n', 'message': 'com.linkedin.metadata.entity.validation.ValidationException: Failed to validate record with class com.linkedin.entity.Entity: ERROR :: /value/com.linkedin.metadata.snapshot.DatasetSnapshot/aspects/0/c', 'status': 422, 'id': 'urn:li:dataset:(urn:li:dataPlatform:mysql,loan_management.account_holder,PROD)'}) and info {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:422]: com.linkedin.metadata.entity.validation.ValidationException:
    ....
    ....
                  {'error': 'Unable to emit metadata to DataHub GMS',
                   'info': {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException',
                            'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:422]: '
                                          'com.linkedin.metadata.entity.validation.ValidationException: Failed to validate record with class '
                                          'com.linkedin.entity.Entity: ERROR :: '
                                          '/value/com.linkedin.metadata.snapshot.DatasetSnapshot/aspects/0/com.linkedin.schema.SchemaMetadata/fields/3/globalTags/tags/1/tag '
                                          ':: "Provided urn urn.li.tag.PERSON" is invalid\n'
                                          '\n'
                                          '\tat com.linkedin.metadata.resources.entity.EntityResource.ingest(EntityResource.java:213)',
                            'message': 'com.linkedin.metadata.entity.validation.ValidationException: Failed to validate record with class '
                                       'com.linkedin.entity.Entity: ERROR :: /value/com.linkedin.metadata.snapshot.DatasetSnapshot/aspects/0/c',
                            'status': 422,
                            'id': 'urn:li:dataset:(urn:li:dataPlatform:mysql,document_templates.editions,PROD)'}},
                  '... sampled of 87 total elements'],
     'start_time': '2022-11-24 09:40:47.165725 (11.66 seconds ago).',
     'current_time': '2022-11-24 09:40:58.824433 (now).',
     'total_duration_in_seconds': '11.66',
     'gms_version': 'v0.9.2',
     'pending_requests': '0'}
    
     Pipeline finished with at least 87 failures ; produced 181 events in 5.08 seconds.
    It seems the errors are all similar to:
    "Provided urn urn.li.tag.ADDRESS" is invalid
    What am I missing to get this to ingest?
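    For reference, DataHub tag URNs are colon-delimited, so the tags being emitted would need to look like the following (a sketch of the expected format, in contrast to the dot-separated form in the error):
    Copy code
    urn:li:tag:ADDRESS
    urn:li:tag:PERSON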
  • rich-van-74931 (11/24/2022, 10:47 AM)
    Hello! I'm trying to run an ingestion recipe for Tableau, but it's not able to log in with the provided credentials (using a token from a user with the required role in Tableau). The credentials are working, I can connect to the Metadata API, but for some reason DataHub cannot log in. I've tried using the UI and the CLI, but neither works. I'm using the docker deployment on an AWS EC2 instance and have created another ingestion task for Snowflake that worked fine the first time (but strangely is not working on schedule). Any suggestions? Do I have to set up something else? Thanks!
    Copy code
    datahub ingest -c tableau.yml
    [2022-11-24 10:38:28,123] INFO     {datahub.cli.ingest_cli:167} - DataHub CLI version: 0.9.2.2
    [2022-11-24 10:38:28,152] INFO     {datahub.ingestion.run.pipeline:174} - Sink configured successfully. DataHubRestEmitter: configured to talk to <http://localhost:8080>
    ==================
    [2022-11-24 10:38:28,546] INFO     {datahub.ingestion.run.pipeline:197} - Source configured successfully.
    [2022-11-24 10:38:28,549] INFO     {datahub.cli.ingest_cli:120} - Starting metadata ingestion
    -[2022-11-24 10:38:28,582] INFO     {datahub.cli.ingest_cli:135} - Finished metadata ingestion
    /
    Cli report:
    {'cli_entry_location': '/home/ec2-user/.local/lib/python3.7/site-packages/datahub/__init__.py',
     'cli_version': '0.9.2.2',
     'mem_info': '68.09 MB',
     'os_details': 'Linux-5.10.149-133.644.amzn2.x86_64-x86_64-with-glibc2.2.5',
     'py_exec_path': '/usr/bin/python3',
     'py_version': '3.7.10 (default, Jun  3 2021, 00:02:01) \n[GCC 7.3.1 20180712 (Red Hat 7.3.1-13)]'}
    Source (tableau) report:
    {'event_ids': [],
     'events_produced': '0',
     'events_produced_per_sec': '0',
     'failures': {'tableau-login': ['Unable to login with credentials provided: \n\n\t401001: Signin Error\n\t\tError signing in to Tableau Server']},
     'running_time': '0.55 seconds',
     'soft_deleted_stale_entities': [],
     'start_time': '2022-11-24 10:38:28.255411 (now).',
     'warnings': {}}
    Sink (datahub-rest) report:
    {'current_time': '2022-11-24 10:38:28.805863 (now).',
     'failures': [],
     'gms_version': 'v0.9.2',
     'pending_requests': '0',
     'records_written_per_second': '0',
     'start_time': '2022-11-24 10:38:28.148111 (now).',
     'total_duration_in_seconds': '0.66',
     'total_records_written': '0',
     'warnings': []}
    
     Pipeline finished with at least 1 failures ; produced 0 events in 0.55 seconds.
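    For comparison, a token-based Tableau source config typically needs the site's content URL alongside the token; a 401001 sign-in error is often a site/token mismatch. A minimal sketch (URL, site, and token names are illustrative):
    Copy code
    source:
        type: tableau
        config:
            connect_uri: 'https://tableau.mycompany.com'
            site: my_site                   # content URL of the site the token was created on
            token_name: datahub_token
            token_value: '${TABLEAU_TOKEN}'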
  • busy-computer-98970 (11/24/2022, 1:29 PM)
    Hey team, good morning/afternoon/night! I'm currently having a problem ingesting AWS Athena. I have more than 600 tables and would like to run with profiling. However, I am getting an exception only when I try to perform profiling, such as `asyncio.exceptions.LimitOverrunError: Separator is found, but chunk is longer than limit`. Log below:
    Copy code
    '"systemMetadata": {"lastObserved": 1669295089809, "runId": "aeed3a9a-9dcb-4b89-91c3-8c78c068dc88"}}}\' '
               "'<http://datahub-datahub-gms:8080/aspects?action=ingestProposal>'\n"
               '[2022-11-24 13:04:49,851] DEBUG    {datahub.ingestion.run.pipeline:47} -  sink wrote workunit '
               'container-urn:li:container:d517cb430c984c29a927ccf609be7dcf-to-urn:li:dataset:(urn:li:dataPlatform:athena,api_silver_db.superlogica_faturas,PROD)\n'
               '[2022-11-24 13:04:49,884] DEBUG    {datahub.emitter.rest_emitter:236} - Attempting to emit to DataHub GMS; using curl equivalent to:\n',
               '2022-11-24 13:04:49.888314 [exec_id=aeed3a9a-9dcb-4b89-91c3-8c78c068dc88] INFO: Caught exception EXECUTING '
               'task_id=aeed3a9a-9dcb-4b89-91c3-8c78c068dc88, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
               '  File "/usr/local/lib/python3.10/asyncio/streams.py", line 525, in readline\n'
               '    line = await self.readuntil(sep)\n'
               '  File "/usr/local/lib/python3.10/asyncio/streams.py", line 620, in readuntil\n'
               '    raise exceptions.LimitOverrunError(\n'
               'asyncio.exceptions.LimitOverrunError: Separator is found, but chunk is longer than limit\n'
               '\n'
               'During handling of the above exception, another exception occurred:\n'
               '\n'
               'Traceback (most recent call last):\n'
               '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 123, in execute_task\n'
               '    task_event_loop.run_until_complete(task_future)\n'
               '  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete\n'
               '    return future.result()\n'
               '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 147, in execute\n'
               '    await tasks.gather(_read_output_lines(), _report_progress(), _process_waiter())\n'
               '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 99, in _read_output_lines\n'
               '    line_bytes = await ingest_process.stdout.readline()\n'
               '  File "/usr/local/lib/python3.10/asyncio/streams.py", line 534, in readline\n'
               '    raise ValueError(e.args[0])\n'
               'ValueError: Separator is found, but chunk is longer than limit\n']}
    Execution finished with errors.
    To reiterate, this error only occurs when profiling is enabled. My recipe:
    Copy code
    source:
        type: athena
        config:
            aws_region: us-west-2
            s3_staging_dir: '------------------------------'
            profiling:
                enabled: true
                include_field_sample_values: false
            work_group: primary
  • enough-mouse-67490 (11/24/2022, 3:06 PM)
    👋 Hello, team! @witty-plumber-82249 I connected my Snowflake account using this recipe and got a succeeded status.
    Copy code
    source:
        type: snowflake
        config:
            include_table_lineage: true
            account_id: pagaya-luigi
            profiling:
                enabled: true
            include_view_lineage: true
            warehouse: datahub_wh
            stateful_ingestion:
                enabled: false
            username: '${snowflake_prod_username}'
            password: '${snowflake_prod_password}'
            table_pattern:
                deny:
                    - '.*TMP$'
            role: datahub_test
    pipeline_name: 'urn:li:dataHubIngestionSource:_______'
    I can see all the tables, but I can't see the Snowpipes, streams, and tasks. Does anyone know how to connect them, or what is wrong with my recipe? Thanks in advance 🙂
  • colossal-smartphone-90274 (11/24/2022, 4:26 PM)
    Hi, I am now using the PostgreSQL ingestion. Which permissions do I need to be able to ingest and profile the data? This is the version of PostgreSQL and the roles I am using: https://www.postgresql.org/docs/13/default-roles.html
  • white-xylophone-3944 (11/25/2022, 5:25 AM)
    Hello. I ingested a Glue source, and then deleted some databases in Glue, but they remained in DataHub, so I deleted them using the CLI. However, the browse_path remained. How can I delete the browse_path? And how can I get removed DBs and tables deleted automatically during ingestion?
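    For the last part, the usual mechanism is stateful ingestion with stale-entity removal, which soft-deletes entities that disappeared from the source between runs. A minimal sketch (region and pipeline name are illustrative; support for this varies by source and version):
    Copy code
    source:
        type: glue
        config:
            aws_region: us-east-1
            stateful_ingestion:
                enabled: true
                remove_stale_metadata: true   # soft-deletes entities no longer seen in Glue
    pipeline_name: glue_prod                  # a stable name is required to track state across runs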
  • flaky-soccer-57765 (11/25/2022, 9:48 AM)
    All, has anyone faced an issue while updating the description of a tag? The tag was added by the ingestion script, and when manually adding the description to the tag, it fails stating "URN not present". Any idea how to solve this, please?
  • full-planet-19427 (11/25/2022, 12:57 PM)
    Hello everyone! I'm struggling to ingest data from an S3 delta lake. I'm passing this recipe:
    Copy code
    source:
      type: delta-lake
      config:
        env: "DEV"
        base_path: "s3://dl-gold-zone-dev/"
        s3:
          aws_config:
            aws_region: "us-east-1"
            aws_access_key_id: {{ $aws_access_key_id }}
            aws_secret_access_key: {{ $aws_secret_access_key }}

    sink:
      type: "datahub-rest"
      config:
        server: "{{ $serverUrl }}"
    But when my job starts, it gets this error:
    deltalake.PyDeltaTableError: Failed to load checkpoint: Failed to read checkpoint content: Failed to read S3 object content: Request ID: None Body: <?xml version="1.0" encoding="UTF-8"?> <Error><Code>AuthorizationHeaderMalformed</Code><Message>The authorization header is malformed; the region 'custom' is wrong; expecting 'us-east-1'</Message><Region>us-east-1</Region><RequestId>AEW6HDPFQ0P65Z4C</RequestId><HostId>YlU8KVjy7UmSYF6tOM7iZmSJshn1tTKpCzF/mKogmz8lEPkB+ZWhcoNce4Laj/kNYmHTMiqWIRc=</HostId></Error>
    I can't understand where I have to configure this region, even though I've configured the "aws_region" parameter. Could someone help me understand this problem?
  • future-iron-16086 (11/25/2022, 6:23 PM)
    Hi everyone! Is it possible to connect DataHub with MS Teams in a self-hosted deployment to send notifications?
  • eager-cpu-59593 (11/26/2022, 2:29 PM)
    Hi all! We've just deployed Datahub and we're trying to create an ingestion for elasticsearch, but we're facing some problems. We've tried to do it both from the CLI and the UI, but both ways provide the same results. The ingestion runs without any errors and it scans all existing indexes, but once it finishes "successfully" it says that no assets were ingested. The recipe that we're using is this one (some parts are changed for privacy):
    source:
      type: elasticsearch
      config:
        host: <our_host>:9200
        use_ssl: false
        verify_certs: false
        url_prefix: ""
        index_pattern:
          allow: [".*"]
          deny: ["^_."]
    sink:
      type: "datahub-rest"
      config:
        server: "http://<datahub-service-name>:8080"
    • The elastic we're trying to ingest has version 6.6.2, and the result logs after the ingestion are the following:
    Cli report:
    {'cli_version': '0.9.2+docker',
    'cli_entry_location': '/usr/local/lib/python3.10/site-packages/datahub/__init__.py',
    'py_version': '3.10.7 (main, Oct 5 2022, 14:33:54) [GCC 10.2.1 20210110]',
    'py_exec_path': '/usr/local/bin/python',
    'os_details': 'Linux-5.4.219-126.411.amzn2.x86_64-x86_64-with-glibc2.31',
    'mem_info': '128.81 MB'}
    Source (elasticsearch) report:
    {'events_produced': '0',
    'events_produced_per_sec': '0',
    'event_ids': [],
    'warnings': {},
    'failures': {},
    'index_scanned': '55',
    'filtered': [],
    'start_time': '2022-11-26 12:10:03.720786 (now).',
    'running_time': '0.46 seconds'}
    Sink (datahub-rest) report:
    {'total_records_written': '0',
    'records_written_per_second': '0',
    'warnings': [],
    'failures': [],
    'start_time': '2022-11-26 12:10:03.494297 (now).',
    'current_time': '2022-11-26 12:10:04.179827 (now).',
    'total_duration_in_seconds': '0.69',
    'gms_version': 'v0.9.2',
    'pending_requests': '0'}
    Pipeline finished successfully; produced 0 events in 0.46 seconds.
    • Some WARNINGS that appear during the ingestion are similar to this one:
    [2022-11-26 12:10:03,932] WARNING {datahub.ingestion.source.elastic_search:172} - Missing 'properties' in elastic search mappings={"status": {"properties": {"indexing_status": {"type": "keyword", "index": false}, "version": {"type": "long"}}}}!
    Does anyone have an idea of what might be happening or if we're doing something wrong? Thank you very much!
  • thankful-kite-1198 (11/27/2022, 12:21 PM)
    Hi everyone! Could you please advise on ingesting metadata from an API? Reading the documentation, I found that the OpenAPI ingestor will search for each GET method in a contract, then call it and parse metadata from the response. Questions:
    1. Will metadata about the contract itself not be published in DataHub?
    2. If the contract contains some runtime functions, how will they be parsed?
    3. Is it correct that, to understand exactly what will be shown in DataHub after ingesting, I can look at the description in the "response" object of the API specification?
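    For context, a basic OpenAPI recipe looks roughly like the sketch below (the name, URL, and spec filename are illustrative); the source reads the spec from url + swagger_file and then probes the GET endpoints it finds there:
    Copy code
    source:
        type: openapi
        config:
            name: my_api
            url: 'https://api.mycompany.com/'
            swagger_file: openapi.json
    sink:
        type: datahub-rest
        config:
            server: 'http://localhost:8080'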
  • quaint-rainbow-60164 (11/28/2022, 3:38 AM)
    Hello guys, does anyone have experience with how to integrate lineage in PostgreSQL ingestion?
    ✅ 1
    d
    h
    • 3
    • 2
  • full-chef-85630 (11/28/2022, 10:10 AM)
    I now store the audit logs of multiple projects in one BigQuery dataset. We want to analyze the usage of each project and are wondering how to implement it. Do we need to write our own custom source? @dazzling-judge-80093
  • alert-fall-82501 (11/28/2022, 10:22 AM)
    I am trying to ingest Redshift metadata into DataHub, including table lineage. When I include include_table_lineage: true in the config file, the ingestion fails with errors. The following are the errors I am getting: