# ingestion
  • o

    orange-flag-48535

    10/13/2022, 4:56 AM
    For ingestion of a relational source (say postgres), is it possible to store both the Database and the Table as distinct "Dataset" entities in Datahub? I would like to retain the parent-child relationship as well. I was looking at how a postgres database is represented on the Demo page, and couldn't find what I'm asking for - https://demo.datahubproject.io/browse/dataset/prod/postgres/calm-pagoda-323403/jaffle_shop
    b
    • 2
    • 1
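    A note on how DataHub typically models this: the database (and schema) are represented as Container entities rather than Datasets, and each table Dataset points at its parent via the "container" aspect, which is what produces the parent-child browse experience on the demo page. A minimal sketch with the Python emitter (server address, table name and container urn are illustrative, not the official ingestion output):
    from datahub.emitter.mce_builder import make_dataset_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import ChangeTypeClass, ContainerClass

    emitter = DatahubRestEmitter("http://localhost:8080")  # assumed GMS address

    table_urn = make_dataset_urn(platform="postgres", name="jaffle_shop.public.customers", env="PROD")
    database_container_urn = "urn:li:container:my-postgres-database"  # hypothetical container urn

    # Attach the table dataset to its parent database/schema container.
    emitter.emit(
        MetadataChangeProposalWrapper(
            entityType="dataset",
            changeType=ChangeTypeClass.UPSERT,
            entityUrn=table_urn,
            aspectName="container",
            aspect=ContainerClass(container=database_container_urn),
        )
    )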
  • a

    alert-fall-82501

    10/13/2022, 6:52 AM
    Hi Team - I want to apply access control so that users who don't belong to the same group can't see each other's metadata; in other words, only users with access to a particular domain should be able to see that domain's metadata, and they should not be able to see other domains' metadata. Can anybody help me with this, or point me to any documentation on it?
    g
    • 2
    • 1
  • f

    full-chef-85630

    10/13/2022, 2:04 PM
    hi all, for the bigquery datasource ingestion, the “sharded_table_pattern” parameter no longer takes effect after upgrading from 41 to the latest version. Has anything changed? I remember there was no reply last time @bulky-soccer-26729 @dazzling-judge-80093
    d
    • 2
    • 10
  • f

    flaky-soccer-57765

    10/13/2022, 3:52 PM
    Hey all, I ingested a SQL Server source into DataHub using the CLI. Now I am trying to update glossary terms in a dataset schema through the emitter. I get a 404 (and None) when I call get_aspect_v2 with the aspect "editableSchemaMetadata", but a 200 if I use "schemaMetadata". What does this mean? Why are the glossary terms and descriptions not exposed in editableSchemaMetadata? Can you please help?
    g
    • 2
    • 2
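    For context on the 404: editableSchemaMetadata only comes into existence once someone edits the schema in the UI, so on a freshly ingested dataset it is normal for it to be missing while schemaMetadata is returned fine. A hedged sketch of creating the editable aspect yourself and attaching a term (urns, field name and server address are illustrative):
    from datahub.emitter.mce_builder import make_dataset_urn, make_term_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig
    from datahub.metadata.schema_classes import (
        AuditStampClass,
        ChangeTypeClass,
        EditableSchemaFieldInfoClass,
        EditableSchemaMetadataClass,
        GlossaryTermAssociationClass,
        GlossaryTermsClass,
    )

    graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))  # assumed GMS address
    dataset_urn = make_dataset_urn(platform="mssql", name="db.dbo.my_table", env="PROD")  # hypothetical dataset

    current = graph.get_aspect_v2(
        entity_urn=dataset_urn,
        aspect="editableSchemaMetadata",
        aspect_type=EditableSchemaMetadataClass,
    )
    if current is None:
        # No UI edits yet -- start from an empty editable aspect instead of failing.
        current = EditableSchemaMetadataClass(editableSchemaFieldInfo=[])

    current.editableSchemaFieldInfo.append(
        EditableSchemaFieldInfoClass(
            fieldPath="my_column",  # hypothetical field
            glossaryTerms=GlossaryTermsClass(
                terms=[GlossaryTermAssociationClass(urn=make_term_urn("Classification.Sensitive"))],
                auditStamp=AuditStampClass(time=0, actor="urn:li:corpuser:ingestion"),
            ),
        )
    )

    graph.emit(
        MetadataChangeProposalWrapper(
            entityType="dataset",
            changeType=ChangeTypeClass.UPSERT,
            entityUrn=dataset_urn,
            aspectName="editableSchemaMetadata",
            aspect=current,
        )
    )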
  • b

    brave-businessperson-3969

    10/13/2022, 4:12 PM
    When using the command line client to ingest data (datahub ingest -c ...), is there any option to assign a run-id manually or at least set the part before the date-time section of the run-id?
    g
    • 2
    • 3
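    I'm not aware of a dedicated CLI flag for this, but the recipe config (PipelineConfig) accepts a top-level run_id, so you can either add run_id to the recipe YAML passed to datahub ingest -c, or run the pipeline programmatically. A hedged sketch (source/sink details are placeholders, and it assumes the run_id field is still honoured in current versions):
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(
        {
            "run_id": "my-team-postgres-2022-10-13",  # chosen manually instead of the generated value
            "source": {
                "type": "postgres",  # hypothetical source
                "config": {"host_port": "localhost:5432", "database": "mydb"},
            },
            "sink": {
                "type": "datahub-rest",
                "config": {"server": "http://localhost:8080"},
            },
        }
    )
    pipeline.run()
    pipeline.raise_from_status()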
  • m

    microscopic-mechanic-13766

    10/13/2022, 4:49 PM
    Good afternoon, is there any known limitation for the ingestion of Hive? I am asking because I am trying to ingest and profile 40 tables (one of them has over 250,000 rows but the rest have 20 at best) and I keep getting errors like this. Note: I am using v0.8.45 for both the GMS and the frontend containers (as I am deploying it on Docker) and v0.0.8 for actions. I am also using CLI version 0.8.45.2
    exec-urn_li_dataHubExecutionRequest_6c3c0d3f-52fa-4b43-8956-f3ce9400a45d.log
    b
    • 2
    • 1
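    Not a confirmed diagnosis of the attached error, but when profiling is involved the usual first step is to dial it back and re-test. A hedged programmatic sketch (host and the exact profiling fields are assumptions based on the common SQL profiling config):
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(
        {
            "source": {
                "type": "hive",
                "config": {
                    "host_port": "localhost:10000",  # hypothetical Hive server
                    "profiling": {
                        "enabled": True,
                        # Start with table-level stats only; re-enable column profiling once stable.
                        "profile_table_level_only": True,
                        "max_workers": 2,
                    },
                },
            },
            "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
        }
    )
    pipeline.run()
    pipeline.raise_from_status()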
  • b

    best-eve-12546

    10/13/2022, 5:22 PM
    Hi! Was wondering if we can use the redshift-query-parser to parse sql queries without using the rest of the Redshift Source? Or is it mostly just using the old Postgres parser from when Redshift was forked?
    g
    • 2
    • 5
  • m

    millions-waiter-49836

    10/13/2022, 8:54 PM
    Can anyone help review this PR? https://github.com/datahub-project/datahub/pull/6173 It makes kafka connect ingestion support generic connectors
    ✅ 1
  • w

    white-xylophone-3944

    10/13/2022, 8:38 AM
    Hello, can I link related metadata together, or give a hint that it belongs together? I ingest Glue, Athena and Spark into DataHub. Each source describes the same underlying data, but they aren't recognized as the same, so I want to hint at or link them: the pipeline lives in Spark, while the table schema lives in Glue.
    g
    • 2
    • 4
  • w

    wooden-jackal-88380

    10/13/2022, 2:55 PM
    Hello there! We are using the dbt recipe in 2 separate dbt projects. One dbt project generates certain models, the other dbt project uses those models as sources. After running the dbt ingestion recipe for both, we notice that we don't see the end-to-end lineage in DataHub. However, if we run the dbt recipe from the first project (the one that creates the models) again, we do see end-to-end lineage. We would expect that models with a certain URN would not be completely overwritten by sources with the same URN. We know that one of the disadvantages of using multiple dbt projects is that you don't get full documentation and lineage, but we were hoping to resolve this disadvantage with DataHub. Any ideas?
    g
    l
    • 3
    • 4
  • a

    alert-fall-82501

    10/14/2022, 12:05 PM
    Hi Team - I have a question: I have successfully ingested metadata for various sources into DataHub, but suppose a field changes or an extra table is added to a schema. How will I get a notification of this? I have created Airflow DAG jobs for some sources and they run at a scheduled interval. I just want to know how I can track the new changes.
    m
    m
    • 3
    • 6
  • l

    lemon-hydrogen-83671

    10/14/2022, 4:40 PM
    Hey, is it common for ingestion via the CLI to be slow? I'm running it within a GitHub Actions pipeline and I'm seeing it produce only 2 records per second.
    Sink (***-kafka) report:
    {'current_time': '2022-10-14 16:39:57.173070 (now).',
     'failures': [],
     'records_written_per_second': '2',
     'start_time': '2022-10-14 16:35:33.365443 (4 minutes and 23.81 seconds ago).',
     'total_duration_in_seconds': '263.81',
     'total_records_written': '771',
     'warnings': []}
    I'm using Kafka as my sink, so I'm a bit surprised it's so slow.
    g
    • 2
    • 3
  • r

    ripe-apple-36185

    10/14/2022, 6:24 PM
    Hi Team, I am using the s3 plugin to load metadata from csv files. These files are then used to create tables in Snowflake. The s3 plugin creates the file urns using ".csv". The problem I am seeing is that the snowflake plugin will create lineage to the files, but using "_csv" in the urn. The problem that this creates is that I have two datasets for the same file and broken lineage. Has anyone else seen this?
    h
    • 2
    • 4
  • c

    chilly-scientist-91160

    10/14/2022, 6:44 PM
    Hi, quick question: I am trying to use the https://datahubproject.io/docs/generated/ingestion/sources/openapi ingestion module. I noticed it does not pick up any metadata on the fields (like description) because it tries to read the returned example and does not use the attached schema. Would this be a valid feature request, or am I missing something?
    g
    m
    • 3
    • 5
  • s

    salmon-jackal-36326

    10/14/2022, 6:56 PM
    @witty-plumber-82249 Another question, for example, let's say I have my application on EC2, and if I delete this machine, how can I have a YAML with all the settings? Tags, field descriptions? So I don't lose all my work?
    b
    • 2
    • 1
  • a

    alert-fall-82501

    10/15/2022, 4:10 PM
    Can anybody suggest on this? ...Ingesting metadata from bigquery to a DataHub private server (ip-10-231-6-97.ec2.internal):
    *** Reading remote log from Cloudwatch log_group: airflow-dt-airflow-prod-Task log_stream: datahub_bigquery_ingest/mp5_ingest/2022-10-14T06_00_00+00_00/1.log.
    [2022-10-15 06:00:20,042] {{taskinstance.py:1035}} INFO - Dependencies all met for <TaskInstance: datahub_bigquery_ingest.mp5_ingest scheduled__2022-10-14T06:00:00+00:00 [queued]>
    [2022-10-15 06:00:20,123] {{taskinstance.py:1035}} INFO - Dependencies all met for <TaskInstance: datahub_bigquery_ingest.mp5_ingest scheduled__2022-10-14T06:00:00+00:00 [queued]>
    [2022-10-15 06:00:20,123] {{taskinstance.py:1241}} INFO - 
    --------------------------------------------------------------------------------
    [2022-10-15 06:00:20,124] {{taskinstance.py:1242}} INFO - Starting attempt 1 of 2
    [2022-10-15 06:00:20,124] {{taskinstance.py:1243}} INFO - 
    --------------------------------------------------------------------------------
    [2022-10-15 06:00:20,213] {{taskinstance.py:1262}} INFO - Executing <Task(BashOperator): mp5_ingest> on 2022-10-14 06:00:00+00:00
    [2022-10-15 06:00:20,224] {{standard_task_runner.py:52}} INFO - Started process 630 to run task
    [2022-10-15 06:00:20,297] {{standard_task_runner.py:76}} INFO - Running: ['airflow', 'tasks', 'run', 'datahub_bigquery_ingest', 'mp5_ingest', 'scheduled__2022-10-14T06:00:00+00:00', '--job-id', '47734', '--raw', '--subdir', 'DAGS_FOLDER/dt_datahub/pipelines/bigquery_metadata_dag.pay.py', '--cfg-path', '/tmp/tmpqhkznspm', '--error-file', '/tmp/tmpmd2axjcd']
    [2022-10-15 06:00:20,298] {{standard_task_runner.py:77}} INFO - Job 47734: Subtask mp5_ingest
    [2022-10-15 06:00:20,528] {{logging_mixin.py:109}} INFO - Running <TaskInstance: datahub_bigquery_ingest.mp5_ingest scheduled__2022-10-14T06:00:00+00:00 [running]> on host ip-10-231-6-97.ec2.internal
    [2022-10-15 06:00:21,041] {{taskinstance.py:1429}} INFO - Exporting the following env vars:
    AIRFLOW_CTX_DAG_EMAIL=data-engineering@xxxx.com
    AIRFLOW_CTX_DAG_OWNER=data-engineering
    AIRFLOW_CTX_DAG_ID=datahub_bigquery_ingest
    AIRFLOW_CTX_TASK_ID=mp5_ingest
    AIRFLOW_CTX_EXECUTION_DATE=2022-10-14T06:00:00+00:00
    AIRFLOW_CTX_DAG_RUN_ID=scheduled__2022-10-14T06:00:00+00:00
    [2022-10-15 06:00:21,084] {{subprocess.py:62}} INFO - Tmp dir root location: 
     /tmp
    [2022-10-15 06:00:21,085] {{subprocess.py:74}} INFO - Running command: ['bash', '-c', 'python3 -m datahub ingest -c /usr/local/airflow/dags/dt_datahub/recipes/prod/bigquery/mp5.yaml']
    [2022-10-15 06:00:21,102] {{subprocess.py:85}} INFO - Output:
    [2022-10-15 06:00:26,521] {{subprocess.py:89}} INFO - [2022-10-15 06:00:26,521] INFO     {datahub.cli.ingest_cli:179} - DataHub CLI version: 0.8.44
    [2022-10-15 06:06:40,084] {{subprocess.py:89}} INFO - [2022-10-15 06:06:40,032] ERROR    {datahub.entrypoints:192} -
    [2022-10-15 06:06:40,084] {{subprocess.py:89}} INFO - Traceback (most recent call last):
    [2022-10-15 06:06:40,084] {{subprocess.py:89}} INFO -   File "/usr/local/airflow/.local/lib/python3.7/site-packages/urllib3/connection.py", line 175, in _new_conn
    [2022-10-15 06:06:40,084] {{subprocess.py:89}} INFO -     (self._dns_host, self.port), self.timeout, **extra_kw
    [2022-10-15 06:06:40,084] {{subprocess.py:89}} INFO -   File "/usr/local/airflow/.local/lib/python3.7/site-packages/urllib3/util/connection.py", line 95, in create_connection
    [2022-10-15 06:06:40,084] {{subprocess.py:89}} INFO -     raise err
    [2022-10-15 06:06:40,085] {{subprocess.py:89}} INFO -   File "/usr/local/airflow/.local/lib/python3.7/site-packages/urllib3/util/connection.py", line 85, in create_connection
    [2022-10-15 06:06:40,085] {{subprocess.py:89}} INFO -     sock.connect(sa)
    [2022-10-15 06:06:40,085] {{subprocess.py:89}} INFO - socket.timeout: timed out
    [2022-10-15 06:06:40,086] {{warnings.py:110}} WARNING - /usr/local/airflow/.local/lib/python3.7/site-packages/watchtower/__init__.py:349: WatchtowerWarning: Received empty message. Empty messages cannot be sent to CloudWatch Logs
      warnings.warn("Received empty message. Empty messages cannot be sent to CloudWatch Logs", WatchtowerWarning)
    
    [2022-10-15 06:06:40,086] {{logging_mixin.py:109}} WARNING - Traceback (most recent call last):
    [2022-10-15 06:06:40,086] {{logging_mixin.py:109}} WARNING -   File "/usr/local/airflow/config/cloudwatch_logging.py", line 161, in emit
        self.sniff_errors(record)
    [2022-10-15 06:06:40,087] {{logging_mixin.py:109}} WARNING -   File "/usr/local/airflow/config/cloudwatch_logging.py", line 211, in sniff_errors
        if pattern.search(record.message):
    [2022-10-15 06:06:40,087] {{logging_mixin.py:109}} WARNING - AttributeError: 'LogRecord' object has no attribute 'message'
    [2022-10-15 06:06:40,087] {{subprocess.py:89}} INFO - During handling of the above exception, another exception occurred:
    [2022-10-15 06:06:40,087] {{subprocess.py:89}} INFO - Traceback (most recent call last):
    [2022-10-15 06:06:40,087] {{subprocess.py:89}} INFO -   File "/usr/local/airflow/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 710, in urlopen
    [2022-10-15 06:06:40,087] {{subprocess.py:89}} INFO -     chunked=chunked,
    [2022-10-15 06:06:40,088] {{subprocess.py:89}} INFO -   File "/usr/local/airflow/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 386, in _make_request
    [2022-10-15 06:06:40,088] {{subprocess.py:89}} INFO -     self._validate_conn(conn)
    [2022-10-15 06:06:40,088] {{subprocess.py:89}} INFO -   File "/usr/local/airflow/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 1042, in _validate_conn
    [2022-10-15 06:06:40,088] {{subprocess.py:89}} INFO -     conn.connect()
    [2022-10-15 06:06:40,088] {{subprocess.py:89}} INFO -   File "/usr/local/airflow/.local/lib/python3.7/site-packages/urllib3/connection.py", line 358, in connect
    [2022-10-15 06:06:40,088] {{subprocess.py:89}} INFO -     self.sock = conn = self._new_conn()
    [2022-10-15 06:06:40,088] {{subprocess.py:89}} INFO -   File "/usr/local/airflow/.local/lib/python3.7/site-packages/urllib3/connection.py", line 182, in _new_conn
    [2022-10-15 06:06:40,088] {{subprocess.py:89}} INFO -     % (self.host, self.timeout),
    [2022-10-15 06:06:40,088] {{subprocess.py:89}} INFO - urllib3.exceptions.ConnectTimeoutError: (<urllib3.connection.HTTPSConnection object at 0x7fbb89ffccd0>, 'Connection to datahub-gms.xxxxx.com timed out. (connect timeout=30)')
    [2022-10-15 06:06:40,089] {{subprocess.py:89}} INFO - During handling of the above exception, another exception occurred:
    [2022-10-15 06:06:40,089] {{subprocess.py:89}} INFO - Traceback (most recent call last):
    [2022-10-15 06:06:40,089] {{subprocess.py:89}} INFO -   File "/usr/local/airflow/.local/lib/python3.7/site-packages/requests/adapters.py", line 499, in send
    [2022-10-15 06:06:40,089] {{subprocess.py:89}} INFO -     timeout=timeout,
    [2022-10-15 06:06:40,089] {{subprocess.py:89}} INFO -   File "/usr/local/airflow/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 828, in urlopen
    [2022-10-15 06:06:40,089] {{subprocess.py:89}} INFO -     **response_kw
    [2022-10-15 06:06:40,089] {{subprocess.py:89}} INFO -   File "/usr/local/airflow/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 828, in urlopen
    [2022-10-15 06:06:40,090] {{subprocess.py:89}} INFO -     **response_kw
    [2022-10-15 06:06:40,090] {{subprocess.py:89}} INFO -   File "/usr/local/airflow/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 828, in urlopen
    [2022-10-15 06:06:40,090] {{subprocess.py:89}} INFO -     **response_kw
    [2022-10-15 06:06:40,090] {{subprocess.py:89}} INFO -   File "/usr/local/airflow/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 788, in urlopen
    [2022-10-15 06:06:40,090] {{subprocess.py:89}} INFO -     method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
    [2022-10-15 06:06:40,090] {{subprocess.py:89}} INFO -   File "/usr/local/airflow/.local/lib/python3.7/site-packages/urllib3/util/retry.py", line 592, in increment
    [2022-10-15 06:06:40,090] {{subprocess.py:89}} INFO -     raise MaxRetryError(_pool, url, error or ResponseError(cause))
    [2022-10-15 06:06:40,090] {{subprocess.py:89}} INFO - urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='datahub-gms.xxxxx.com', port=8080): Max retries exceeded with url: /config (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fbb89ffccd0>, 'Connection to datahub-gms.xxxxxx.com timed out. (connect timeout=30)'))
    _record_initialization_failure
    [2022-10-15 06:06:40,103] {{subprocess.py:89}} INFO -     raise PipelineInitError(msg) from e
    [2022-10-15 06:06:40,103] {{subprocess.py:89}} INFO - datahub.ingestion.run.pipeline.PipelineInitError: Failed to set up framework context
    [2022-10-15 06:06:40,103] {{subprocess.py:89}} INFO - [2022-10-15 06:06:40,033] ERROR    {datahub.entrypoints:196} - Command failed:
    [2022-10-15 06:06:40,103] {{subprocess.py:89}} INFO - 	Failed to set up framework context due to
    [2022-10-15 06:06:40,103] {{subprocess.py:89}} INFO - 		'Failed to connect to DataHub' due to
    [2022-10-15 06:06:40,103] {{subprocess.py:89}} INFO - 			'HTTPSConnectionPool(host='datahub-gms.XXXXX.com', port=8080): Max retries exceeded with url: /config (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fbb89ffccd0>, 'Connection to datahub-gms.XXXXX.com timed out. (connect timeout=30)'))'.
    [2022-10-15 06:06:40,103] {{subprocess.py:89}} INFO - 	Run with --debug to get full stacktrace.
    [2022-10-15 06:06:40,103] {{subprocess.py:89}} INFO - 	e.g. 'datahub --debug ingest -c /usr/local/airflow/dags/dt_datahub/recipes/prod/bigquery/mp5.yaml'
    [2022-10-15 06:06:40,391] {{subprocess.py:93}} INFO - Command exited with return code 1
    [2022-10-15 06:06:40,869] {{taskinstance.py:1703}} ERROR - Task failed with exception
    
        result = execute_callable(context=context)
      File "/usr/local/lib/python3.7/site-packages/airflow/operators/bash.py", line 188, in execute
        f'Bash command failed. The command returned a non-zero exit code {result.exit_code}.'
    airflow.exceptions.AirflowException: Bash command failed. The command returned a non-zero exit code 1.
    [2022-10-15 06:06:40,947] {{local_task_job.py:154}} INFO - Task exited with return code 1
    [2022-10-15 06:06:40,990] {{local_task_job.py:264}} INFO - 0 downstream tasks scheduled from follow-on schedule check
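    For what it's worth, the traceback above is a plain network timeout: the Airflow worker never reaches datahub-gms on port 8080, so the pipeline fails before any BigQuery work starts. A quick connectivity check from the same worker against the /config endpoint (the same endpoint the CLI calls while initialising the pipeline, as shown in the log) usually narrows it down to DNS, security groups, or the wrong scheme/port. Hostname below is a placeholder:
    import requests

    GMS = "http://datahub-gms.example.com:8080"  # placeholder; use the exact server value from your recipe

    resp = requests.get(f"{GMS}/config", timeout=30)
    resp.raise_for_status()
    print(resp.json())  # if this also times out, the problem is networking, not DataHub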
  • f

    fierce-monkey-46092

    10/16/2022, 5:42 PM
    Hello guys, I've tried file-based lineage on Datahub, which is running on Docker. Currently I made son and father. But I need to create a grandpa and big-grandpa. Can someone give me a guide for it? My config yml looks like this:
    version: 1
    lineage:
      - entity:
          name: son
          type: dataset
          env: UAT
          platform: snowflake
        upstream:
          - entity:
              name: dad
              type: dataset
              env: UAT
              platform: oracle
          - entity:
              name: mom
              type: dataset
              env: UAT
              platform: kafka
    l
    d
    • 3
    • 5
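    On chaining the hops: in the file-based lineage format each list item is an entity plus its upstreams, so a grandparent is just another "- entity:" block with dad as the entity and grandpa as its upstream (and likewise for big-grandpa above grandpa). The same edge can also be emitted programmatically; a hedged sketch with the Python REST emitter (names and server address are illustrative):
    import datahub.emitter.mce_builder as builder
    from datahub.emitter.rest_emitter import DatahubRestEmitter

    emitter = DatahubRestEmitter("http://localhost:8080")  # assumed GMS address

    dad_urn = builder.make_dataset_urn(platform="oracle", name="dad", env="UAT")
    grandpa_urn = builder.make_dataset_urn(platform="oracle", name="grandpa", env="UAT")  # hypothetical

    # grandpa -> dad; the existing lineage file already covers dad/mom -> son.
    emitter.emit(builder.make_lineage_mce(upstream_urns=[grandpa_urn], downstream_urn=dad_urn))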
  • w

    wooden-jackal-88380

    10/17/2022, 3:13 PM
    Hey there, any idea why the primary key is shown on the “main” Snowflake dataset, but not on the composed dbt/Snowflake dataset?
    g
    • 2
    • 5
  • n

    nutritious-finland-99092

    10/17/2022, 5:41 PM
    Hi guys, I'm trying to add the Nullable option, as in the image below from demo.datahubproject.io, via ingestion through the DataHub API. I have set it on SchemaFieldClass but nothing seems to work; can anyone help me?
    from datetime import datetime

    from datahub.metadata.schema_classes import (
        AuditStampClass,
        SchemaFieldClass,
        SchemaFieldDataTypeClass,
    )

    # Inside the loop that builds the schemaMetadata fields:
    fields.append(
        SchemaFieldClass(
            fieldPath=column,
            type=SchemaFieldDataTypeClass(type=column_type_class),
            nativeDataType=f"{table_columns.get(column, {}).get('max_length', 0)}",
            nullable=True,
            description=table_columns.get(column, {}).get("description", ""),
            lastModified=AuditStampClass(
                time=int(round(datetime.timestamp(datetime.now()))),
                actor="urn:li:corpuser:carol",
            ),
        )
    )
    g
    • 2
    • 15
  • a

    able-evening-90828

    10/17/2022, 7:06 PM
    After upgrading to 0.9.0, the details of previous runs of UI ingestion were all lost. This includes the various columns and the execution logs ("DETAILS"). Is this a bug?
  • a

    ambitious-magazine-36012

    10/18/2022, 12:26 AM
    To ingest an Excel-based dataset from a custom source, is a push-based ingestion pattern with JSON the best option?
    d
    • 2
    • 1
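    Push-based ingestion is a reasonable fit for a one-off custom source like this, and it doesn't have to go through a JSON file: the Python emitter can push the metadata directly. A minimal sketch that reads the spreadsheet with pandas and emits a schemaMetadata aspect (file name, platform and server address are assumptions):
    import pandas as pd
    from datahub.emitter.mce_builder import make_data_platform_urn, make_dataset_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        ChangeTypeClass,
        OtherSchemaClass,
        SchemaFieldClass,
        SchemaFieldDataTypeClass,
        SchemaMetadataClass,
        StringTypeClass,
    )

    df = pd.read_excel("sales.xlsx")  # hypothetical file
    dataset_urn = make_dataset_urn(platform="file", name="sales.xlsx", env="PROD")  # assumed platform

    schema = SchemaMetadataClass(
        schemaName="sales",
        platform=make_data_platform_urn("file"),
        version=0,
        hash="",
        platformSchema=OtherSchemaClass(rawSchema=""),
        fields=[
            SchemaFieldClass(
                fieldPath=col,
                type=SchemaFieldDataTypeClass(type=StringTypeClass()),  # map real dtypes in practice
                nativeDataType=str(df[col].dtype),
            )
            for col in df.columns
        ],
    )

    DatahubRestEmitter("http://localhost:8080").emit(  # assumed GMS address
        MetadataChangeProposalWrapper(
            entityType="dataset",
            changeType=ChangeTypeClass.UPSERT,
            entityUrn=dataset_urn,
            aspectName="schemaMetadata",
            aspect=schema,
        )
    )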
  • s

    silly-oil-35180

    10/18/2022, 2:43 AM
    Hello, I want to insert custom queries into the Queries tab using the GraphQL API. I cannot find any mutation API to update Queries. updateDataset exists, however datasetUpdateInput doesn't include the Queries tab information (https://datahubproject.io/docs/graphql/inputObjects/#datasetupdateinput). Has anyone updated the Queries tab with their own custom queries?
  • r

    rhythmic-gold-76195

    10/18/2022, 6:52 AM
    Hi everyone, I have a problem with DataHub: I don't receive any update in DataHub when I delete a processor in NiFi.
    d
    • 2
    • 8
  • q

    quick-pizza-8906

    10/18/2022, 1:43 PM
    Hello, I wonder whether anyone in the community has experience with ingesting s3 datasets which are partitioned? Is there anything special that can be set to make it clear to the user that the dataset they are browsing is partitioned and spread across different prefixes in a bucket?
    g
    w
    • 3
    • 11
  • p

    prehistoric-fireman-61692

    10/18/2022, 2:29 PM
    Hi all, just starting with DataHub and have a couple of questions: 1. Is ingestion via .yaml the only way to ingest a large existing Business Glossary? What is the best method - via the BG UI plugin or via the CLI? 2. Which data sources are supported for the Stats and Queries tabs? Is there any way to extract/update the Stats and Queries metadata if it isn't natively supported through the ingestion integration?
    m
    • 2
    • 1
  • m

    miniature-plastic-43224

    10/18/2022, 4:12 PM
    Hi everyone, quick question. Is there any way to use a curl command against the GMS service to retrieve the number of "corpuser" entities?
    g
    a
    • 3
    • 2
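    One way that works against GMS directly is the GraphQL endpoint it exposes at /api/graphql: search for CORP_USER entities and read the total. A hedged sketch with Python's requests (server address is a placeholder; add an Authorization: Bearer <token> header if metadata service authentication is enabled):
    import requests

    GMS = "http://localhost:8080"  # assumed GMS address

    query = """
    query {
      search(input: { type: CORP_USER, query: "*", start: 0, count: 1 }) {
        total
      }
    }
    """
    resp = requests.post(f"{GMS}/api/graphql", json={"query": query})
    resp.raise_for_status()
    print(resp.json()["data"]["search"]["total"])  # number of corpuser entities in the search index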
  • s

    sparse-planet-56664

    10/19/2022, 9:34 AM
    Will the meta mappers on ingestions support “add_domain” in the future? Or should maybe an issue be created for this?
    b
    • 2
    • 2
  • a

    alert-fall-82501

    10/19/2022, 12:38 PM
    [2022-10-19 060514,691] {{taskinstance.py:1703}} ERROR - Task failed with exception
    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1332, in _run_raw_task
        self._execute_task_with_callbacks(context)
      File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1458, in _execute_task_with_callbacks
        result = self._execute_task(context, self.task)
      File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1509, in _execute_task
        result = execute_callable(context=context)
      File "/usr/local/lib/python3.7/site-packages/airflow/operators/bash.py", line 188, in execute
        f'Bash command failed. The command returned a non-zero exit code {result.exit_code}.'
    airflow.exceptions.AirflowException: Bash command failed. The command returned a non-zero exit code -9.
    [2022-10-19 060514,703] {{taskinstance.py:1280}} INFO - Marking task as UP_FOR_RETRY. dag_id=datahub_bigquery_ingest, task_id=bmp5_ingest, execution_date=20221018T060000, start_date=20221019T060154, end_date=20221019T060514
    [2022-10-19 060514,732] {{standard_task_runner.py:91}} ERROR - Failed to execute job 50473 for task bmp5_ingest
    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/site-packages/airflow/task/task_runner/standard_task_runner.py", line 85, in _start_by_fork
        args.func(args, dag=self.dag)
      File "/usr/local/lib/python3.7/site-packages/airflow/cli/cli_parser.py", line 48, in command
        return func(*args, **kwargs)
      File "/usr/local/lib/python3.7/site-packages/airflow/utils/cli.py", line 92, in wrapper
        return f(*args, **kwargs)
      File "/usr/local/lib/python3.7/site-packages/airflow/cli/commands/task_command.py", line 292, in task_run
        _run_task_by_selected_method(args, dag, ti)
      File "/usr/local/lib/python3.7/site-packages/airflow/cli/commands/task_command.py", line 107, in _run_task_by_selected_method
        _run_raw_task(args, ti)
      File "/usr/local/lib/python3.7/site-packages/airflow/cli/commands/task_command.py", line 184, in _run_raw_task
        error_file=args.error_file,
      File "/usr/local/lib/python3.7/site-packages/airflow/utils/session.py", line 70, in wrapper
        return func(*args, session=session, **kwargs)
      File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1332, in _run_raw_task
        self._execute_task_with_callbacks(context)
      File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1458, in _execute_task_with_callbacks
        result = self._execute_task(context, self.task)
      File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1509, in _execute_task
        result = execute_callable(context=context)
      File "/usr/local/lib/python3.7/site-packages/airflow/operators/bash.py", line 188, in execute
        f'Bash command failed. The command returned a non-zero exit code {result.exit_code}.'
    airflow.exceptions.AirflowException: Bash command failed. The command returned a non-zero exit code -9.
    [2022-10-19 060514,830] {{local_task_job.py:154}} INFO - Task exited with return code 1
    [2022-10-19 060514,904] {{local_task_job.py:264}} INFO - 0 downstream tasks scheduled from follow-on schedule check
    a
    • 2
    • 1
  • a

    alert-fall-82501

    10/19/2022, 12:38 PM
    Can anybody suggest on this? I am trying to ingest the bigquery-beta source into DataHub.
  • t

    thankful-monitor-87245

    10/19/2022, 2:55 PM
    Hi All, good day. I'm trying to ingest some data into my DataHub instance. I have created a dataset whose name contains the special character '/'. The UI, however, treats '/' as a delimiter and breaks it into a container and a dataset. Any suggestions on how I can bypass this? In the image below you can see the browse path depicts a container CUSTOMER CREDIT::R and then a dataset 7_2::PAYMENT, but I intend to create a single dataset with the name CUSTOMER CREDIT::R /7_2::PAYMENT.
    a
    • 2
    • 1
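    Not a confirmed fix, but one thing to experiment with: the browse hierarchy is driven by the browsePaths aspect, which is normally auto-generated, and emitting an explicit browsePaths value for the dataset lets you control where it appears in the browse tree even if the name contains "/". A hedged sketch (platform, urn and path are illustrative):
    from datahub.emitter.mce_builder import make_dataset_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import BrowsePathsClass, ChangeTypeClass

    emitter = DatahubRestEmitter("http://localhost:8080")  # assumed GMS address

    dataset_urn = make_dataset_urn(
        platform="mssql",  # hypothetical platform
        name="CUSTOMER CREDIT::R /7_2::PAYMENT",
        env="PROD",
    )

    emitter.emit(
        MetadataChangeProposalWrapper(
            entityType="dataset",
            changeType=ChangeTypeClass.UPSERT,
            entityUrn=dataset_urn,
            aspectName="browsePaths",
            aspect=BrowsePathsClass(paths=["/prod/mssql/customer_credit"]),  # hypothetical browse path
        )
    )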