# ingestion
  • able-rain-74449

    04/08/2022, 7:29 AM
    Hi all, I am hitting an error while using a source from the UI (see thread for the error). My source looks like:
    Copy code
    source:
        type: mysql
        config:
            host_port: 'datahub-mysql.oiasdaihsdoiahdoh.eu-west-1.rds.amazonaws.com::3306'
            database: datahub
            username: admin
            password: mypwd
            include_tables: true
            include_views: true
            profiling:
                enabled: false
    sink:
        type: datahub-rest
        config:
            server: 'http://myserver.eu-west-1.elb.amazonaws.com:9002/api/gms'
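For what it's worth, the doubled colon in host_port (`...::3306`) is a plausible culprit here. A minimal corrected sketch of the same recipe (hostnames are the poster's own; this is a guess at the fix, not a confirmed answer):

```yaml
source:
    type: mysql
    config:
        # single colon between host and port
        host_port: 'datahub-mysql.oiasdaihsdoiahdoh.eu-west-1.rds.amazonaws.com:3306'
        database: datahub
        username: admin
        password: mypwd
sink:
    type: datahub-rest
    config:
        server: 'http://myserver.eu-west-1.elb.amazonaws.com:9002/api/gms'
```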
  • many-guitar-67205

    04/08/2022, 8:23 AM
    I was going through the datahub entities on https://demo.datahubproject.io/browse/dataset/prod/datahub/entities and it got me wondering. How was this ingested? What was the source?
  • billions-twilight-48559

    04/08/2022, 12:40 PM
    Hi, when creating, for example, glossary terms with a “/” character, there is unexpected behavior: if I create a term with the name “Org./Destination Country”, it creates a folder “Org.” and a term “/Destination Country”. We are doing it in YAML (with quotation) and importing it with the CLI ingest. Is there any escape character? Something curious: the “/” character drives the sub-element creation, but it also appears as text.
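For reference, a business-glossary YAML sketch that models the hierarchy explicitly instead of encoding it in the term name with “/” (the structure is assumed from the glossary ingestion format; names are the poster's, the description is a placeholder):

```yaml
version: 1
source: DataHub
owners:
  users:
    - datahub
nodes:
  - name: "Org."                      # becomes the term group / folder
    terms:
      - name: "Destination Country"   # becomes the term inside it
        description: "Hypothetical description."
```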
  • plain-farmer-27314

    04/08/2022, 2:25 PM
    hi @square-activity-64562 wondering why the bigquery_usage config was changed to allow any tables/datasets vs the regex implementation that existed prior: https://github.com/datahub-project/datahub/blob/7c3ad3d2931cf6e19cdd94c65e44f684de[…]ion/src/datahub/ingestion/source_config/usage/bigquery_usage.py
  • most-waiter-95820

    04/08/2022, 5:03 PM
    @square-activity-64562 heya. Just raising awareness about a new weird issue with BigQuery lineage 😅 https://github.com/datahub-project/datahub/issues/4623
  • icy-piano-35127

    04/08/2022, 7:16 PM
    Hey guys! I'm having problems with Redash. Basically, it's appearing as a platform (I don't know if that's common behavior, because I also have Metabase metadata in my DataHub and it doesn't appear as a platform) (first image). But when I open the Redash platform it has no results (second image). Questions: • Should Redash appear as a platform? If yes, it should have its entities inside it, right? • Should Metabase appear as a platform too?
  • mammoth-fountain-32989

    04/11/2022, 9:34 AM
    Hi, can someone please help me with enabling a push mechanism for ingesting metadata changes into DataHub from PostgreSQL? I have set up a sample recipe from the UI and am able to extract from PostgreSQL into DataHub using the pull (API) mechanism. Is the push mechanism for posting metadata changes available only via Kafka? Please help with some samples if available. Thanks in advance.
  • brave-forest-5974

    04/11/2022, 3:39 PM
    ❓ Question on ingestion from multiple Looker instances... To prevent the dashboards and elements from overwriting each other, what should we do? I've set the platform name for now, but that then means these entities aren't grouped under Looker in the UI 😞
  • nutritious-bird-77396

    04/11/2022, 8:30 PM
    After ingesting Okta I have got a lot of corpGroups with HTML-encoded references, for example urn:li:corpGroup:Data%20Platform. Is there a way I can use okta_profile_to_group_name_regex to remove these special characters (https://datahubproject.io/docs/metadata-ingestion/source_docs/okta)? Adding a pattern like "[^\\s]+" returns Data, whereas what I am expecting is DataPlatform, without any spaces. Is there a way this can be achieved with just a regex, to get urn:li:corpGroup:DataPlatform?
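Plain Python reproduces the behavior described: a pattern like `[^\s]+` only *matches* up to the first space, while stripping the spaces requires a substitution rather than a match (a sketch of the regex semantics only; whether the source option supports substitution is a separate question):

```python
import re
from urllib.parse import unquote

# The URN fragment is URL-encoded; decode it first
name = unquote("Data%20Platform")   # "Data Platform"

# A match with [^\s]+ stops at the first whitespace character:
print(re.search(r"[^\s]+", name).group(0))   # Data

# Removing the whitespace needs a substitution:
print(re.sub(r"\s+", "", name))              # DataPlatform
```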
  • bitter-toddler-42943

    04/12/2022, 1:35 AM
    Hello, team. I have some problems with ingestion. After setting up successfully, I am trying to execute it from the [Manage Ingestion] UI, but it fails with the error details shown in the next message.
  • bitter-toddler-42943

    04/12/2022, 1:35 AM
    ~~~~ Execution Summary ~~~~
    RUN_INGEST - {'errors': [],
    'exec_id': 'b41cc395-054d-47a6-a683-377b91965383',
    'infos': ['2022-04-11 06:34:26.234370 [exec_id=b41cc395-054d-47a6-a683-377b91965383] INFO: Starting execution for task with name=RUN_INGEST',
    '2022-04-11 06:34:54.884051 [exec_id=b41cc395-054d-47a6-a683-377b91965383] INFO: stdout=Requirement already satisfied: pip in '
    '/tmp/datahub/ingest/venv-b41cc395-054d-47a6-a683-377b91965383/lib/python3.9/site-packages (21.2.4)\n'
    'WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by '
    "'ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))': /simple/pip/\n"
    'WARNING: Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by '
    "'ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))': /simple/pip/\n"
    'WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by '
    "'ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))': /simple/pip/\n"
    'WARNING: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by '
    "'ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))': /simple/pip/\n"
    'WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by '
    "'ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))': /simple/pip/\n"
    'WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by '
    "'ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))': /simple/wheel/\n"
    'WARNING: Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by '
    "'ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))': /simple/wheel/\n"
    'WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by '
    "'ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))': /simple/wheel/\n"
    'WARNING: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by '
    "'ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))': /simple/wheel/\n"
    'WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by '
    "'ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))': /simple/wheel/\n"
    'ERROR: Could not find a version that satisfies the requirement wheel (from versions: none)\n'
    'ERROR: No matching distribution found for wheel\n'
    'WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by '
    "'ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))': /simple/acryl-datahub/\n"
    'WARNING: Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by '
    "'ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))': /simple/acryl-datahub/\n"
    'WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by '
    "'ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))': /simple/acryl-datahub/\n"
    'WARNING: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by '
    "'ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))': /simple/acryl-datahub/\n"
    'WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by '
    "'ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))': /simple/acryl-datahub/\n"
    'ERROR: Could not find a version that satisfies the requirement acryl-datahub[datahub-rest,mssql]==0.8.26.6 (from versions: none)\n'
    'ERROR: No matching distribution found for acryl-datahub[datahub-rest,mssql]==0.8.26.6\n'
    '/tmp/datahub/ingest/venv-b41cc395-054d-47a6-a683-377b91965383/bin/python3: No module named datahub\n',
    "2022-04-11 06:34:54.884314 [exec_id=b41cc395-054d-47a6-a683-377b91965383] INFO: Failed to execute 'datahub ingest'",
    '2022-04-11 06:34:54.886440 [exec_id=b41cc395-054d-47a6-a683-377b91965383] INFO: Caught exception EXECUTING '
    'task_id=b41cc395-054d-47a6-a683-377b91965383, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
    '  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/default_executor.py", line 119, in execute_task\n'
    '    self.event_loop.run_until_complete(task_future)\n'
    '  File "/usr/local/lib/python3.9/site-packages/nest_asyncio.py", line 81, in run_until_complete\n'
    '    return f.result()\n'
    '  File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result\n'
    '    raise self._exception\n'
    '  File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step\n'
    '    result = coro.send(None)\n'
    '  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 115, in execute\n'
    '    raise TaskError("Failed to execute \'datahub ingest\'")\n'
    "acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"]}
    Execution finished with errors.
  • bitter-toddler-42943

    04/12/2022, 3:13 AM
    [zwahn@datahub-zwahn ~]$ python3 --version
    Python 3.6.8
  • icy-ram-1893

    04/12/2022, 5:02 AM
    Hi, I am trying to ingest some sample data from MSSQL with this config:
    Copy code
    source:
        type: mssql
        config:
            host_port: '192.168.70.59:1433'
            database: AdventureWorks2019
            username: sa
            password: sapass
    sink:
        type: datahub-rest
        config:
            server: 'http://datahub.cloud.tiddev.com/api/gms'
    and I receive these errors:
    Copy code
    Failed to establish a new connection: "
               "[Errno 101] Network is unreachable')': /simple/pip/\n",
    ERROR: Could not find a version that satisfies the requirement acryl-datahub[datahub-rest,mssql]==0.8.32 (from versions: none)\n'
               'ERROR: No matching distribution found for acryl-datahub[datahub-rest,mssql]==0.8.32\n'
    and this is the log picture. It should be mentioned that we deploy DataHub with containerd on a different server, but I could not find any documentation about DataHub on containerd. Is it mandatory to install the mssql plug-in to ingest data from it?
  • bitter-toddler-42943

    04/12/2022, 6:24 AM
    Copy code
    Please check your configuration and make sure you are talking to the DataHub GMS (usually "
               '<datahub-gms-host>:8080) or Frontend GMS API (usually <frontend>:9002/api/gms)
  • sticky-dawn-95000

    04/12/2022, 11:21 AM
    Hello, everyone. I have a question: can I ingest metadata from Hive using an SSL connection? If so, how do I configure the YAML file? I created and tested metadata ingestion over a Hive SSL connection like below:
    Copy code
    source:
     type: "hive"
     config:
      host_port: test.myhiveserver.com:10000
      database: default
      username: datahub
      password: 1234
      options:
       connect_args: {'ssl_cert': 'cacert.pem'}

    sink:
     type: "datahub-rest"
     config:
      server: "http://localhost:8080"
    But it did not work. 😞
    Copy code
    ValueError: Password should be set if and only if in LDAP or CUSTOM mode; Remove password or use one of whose modes
    Please help me.
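That ValueError comes from the underlying PyHive driver, which only accepts a password when the authentication mode is LDAP or CUSTOM. Assuming the Hive server actually authenticates via LDAP, a sketch of the options block might look like this (the ssl_cert key is carried over from the recipe above; exact TLS options depend on the driver build):

```yaml
source:
  type: "hive"
  config:
    host_port: test.myhiveserver.com:10000
    username: datahub
    password: 1234
    options:
      connect_args:
        auth: LDAP            # PyHive requires LDAP or CUSTOM when a password is set
        ssl_cert: cacert.pem
```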
  • square-solstice-69079

    04/12/2022, 12:58 PM
    Any idea what makes stateful ingestion not work for Redshift? I think it worked before. I added a schema and then removed it from the schema_pattern allow list, but the schema is still in DataHub.
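For context, stateful ingestion (the feature that soft-deletes entities which disappear from the allow pattern) has to be switched on explicitly; a sketch of the relevant recipe fragment, assuming the documented option names:

```yaml
pipeline_name: redshift_prod          # a stable pipeline name is required for state tracking
source:
  type: redshift
  config:
    # ...connection details...
    stateful_ingestion:
      enabled: true
      remove_stale_metadata: true     # soft-deletes entities missing from this run
```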
  • brave-forest-5974

    04/12/2022, 5:28 PM
    ❓ What would cause a dbt table node not to have a lineage connection to its BigQuery table? I'm seeing this for a subset of nodes that appear in the manifest but not the catalog (that's another story). I had earlier deleted these nodes to be able to recreate them.
  • plain-farmer-27314

    04/12/2022, 6:59 PM
    Hi all, what is currently the best way to "hide" datasets from users in the UI?
  • curved-crayon-1929

    04/12/2022, 7:53 PM
    Hi all, I am trying to ingest S3 data by following this documentation: https://datahubproject.io/docs/metadata-ingestion/source_docs/s3. However, when I try to ingest using the snippet below, I get Glue and some test job, as shown in the image. Image 2 shows Glue with the table name (elb_logs) and database name (sampledb). Please correct me where I am going wrong here.
    Copy code
    source:
        type: glue
        config:
            aws_region: us-east-2
            aws_access_key_id: AKIA**************V
            aws_secret_access_key: j4EzEH12YEQLYP************0p4+K
            aws_session_token: null
            database_pattern:
                allow:
                    - sampledb
            table_pattern:
                allow:
                    - elb_logs
    sink:
        type: datahub-rest
        config:
            server: 'http://localhost:8080'
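One thing to note about the recipe above: the allow entries are regular expressions matched against the name, and under typical re.match semantics an entry like elb_logs also admits names that merely start with that prefix. A small Python sketch of that allow/deny behavior (an illustration of the pattern semantics, not DataHub's actual code):

```python
import re

def allowed(name, allow=(".*",), deny=()):
    """Admit `name` if some allow regex matches its prefix and no deny regex does."""
    if any(re.match(p, name) for p in deny):
        return False
    return any(re.match(p, name) for p in allow)

print(allowed("elb_logs", allow=["elb_logs"]))        # True
print(allowed("elb_logs_copy", allow=["elb_logs"]))   # True  (prefix match!)
print(allowed("other", allow=["elb_logs"]))           # False
print(allowed("elb_logs_copy", allow=["elb_logs$"]))  # False (anchored pattern)
```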
  • mysterious-lamp-91034

    04/12/2022, 9:20 PM
    @early-lamp-41924 Let's use this thread to sync up on Thrift ingestion. To recap what we discussed yesterday: • It is not a good idea to have dedicated ThriftStruct, ThriftUnion and ThriftEnum entities; ThriftStruct/ThriftUnion will be part of the dataset. • We are not going to convert Thrift to Avro, then Avro to MCE.
  • salmon-rose-54694

    04/13/2022, 1:33 AM
    Is acryl-datahub[airflow] open sourced? I found a bug and need to investigate.
  • creamy-van-28626

    04/13/2022, 5:43 AM
    Hi team, I am facing an issue while running the ingestion job from the pod itself; the errors below are coming up.
  • dazzling-queen-76396

    04/13/2022, 6:51 AM
    Hey! Is it possible to get notifications about metadata ingestion failures (for example, to Slack) if I set up ingestion from UI?
  • eager-animal-48107

    04/13/2022, 10:02 AM
    Hi team, which version has the Feast connector for data ingestion been tested with? The connector uses gRPC, and the latest Feast release doesn't have any gRPC APIs.
  • dazzling-alarm-64985

    04/13/2022, 10:50 AM
    Hi, I am deploying DataHub on a private k8s cluster. datahub-acryl-datahub-actions is trying to reach PyPI to install packages. I want to configure my own custom PyPI repo without having to build my own custom datahub-acryl-datahub-actions Docker image. Help 🙂
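Since pip honors the standard PIP_INDEX_URL / PIP_TRUSTED_HOST environment variables, pointing the actions container at an internal mirror can in principle be done with env overrides alone; a hypothetical Helm values fragment (the exact key for extra env vars depends on your chart version, and the mirror hostname is a placeholder):

```yaml
acryl-datahub-actions:
  extraEnvs:
    - name: PIP_INDEX_URL        # pip reads this standard env var
      value: "https://pypi.mycorp.internal/simple"
    - name: PIP_TRUSTED_HOST     # only needed for plain-HTTP mirrors
      value: "pypi.mycorp.internal"
```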
  • delightful-barista-90363

    04/13/2022, 7:22 PM
    Hello, I have tried ingesting metadata through both the s3 and s3 data lake ingest tasks. One big thing for us is having the entity in DataHub pick up the tags that currently exist on the S3 bucket. I was wondering whether there is any setting, or what needs to be done, to get that to work? We would like to do this for any AWS resource, fwiw.
  • mysterious-nail-70388

    04/14/2022, 2:50 AM
    Hello, I want to replace the ES Docker container with a local ES 7.16.2 that has a username and password. How do I make this change? The original ES container does not have a username and password.
  • incalculable-forest-10734

    04/14/2022, 6:40 AM
    Hey, I ingested BigQuery tables named tb_TABLE_SUFFIX (e.g. tb_20220414, tb_20220413). I want to ingest the latest table, tb_20220414, but it ingested the oldest table, tb_20220413. How can I ingest the latest suffixed table?
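The intended behavior for date-sharded tables can at least be sanity-checked in plain Python: group tables by prefix and keep the lexicographically newest YYYYMMDD suffix (a sketch of the selection logic only, not DataHub's implementation):

```python
import re

# Date-sharded tables as reported: we want the newest shard per prefix
tables = ["tb_20220413", "tb_20220414", "other_table"]

shard = re.compile(r"^(?P<prefix>\w+)_(?P<date>\d{8})$")
latest = {}
for t in tables:
    m = shard.match(t)
    if not m:
        continue  # not a date-sharded name
    prefix, date = m.group("prefix"), m.group("date")
    if date > latest.get(prefix, ""):
        latest[prefix] = date  # keep the newest shard seen so far

print([f"{p}_{d}" for p, d in latest.items()])  # ['tb_20220414']
```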
  • bland-orange-13353

    04/14/2022, 7:35 AM
    This message was deleted.
  • creamy-van-28626

    04/14/2022, 7:56 AM
    Hi team, while running ingestion from the UI I am getting this error.