# ingestion
  • w

    wooden-chef-22394

    07/11/2022, 9:47 AM
    Hi~, why don't I have any stats for my ClickHouse data? How do I enable stats in the Dataset UI?
    h
    • 2
    • 1
  • b

    better-bird-87143

    07/11/2022, 1:12 PM
    Hi everyone, I work with the Great Expectations API V2 with the Spark execution engine. Is there any way to ingest GE results into DataHub and enable the full functionality of the Validation tab?
    l
    h
    • 3
    • 4
  • r

    rich-policeman-92383

    07/11/2022, 1:25 PM
    Hello. How do we change the stateful ingestion commit policy, what are the available options, and will there be any consequences related to this change? Problem: with the current commit policy on_no_errors, the checkpoint is not saved even if there is only a warning during ingestion. Because of this, stale metadata remains present in DataHub. #Stateful Ingestion # DataHub version: v0.8.38
  • p

    plain-guitar-45103

    07/11/2022, 5:32 PM
    Hello friends! I am trying to ingest a Delta table in S3. I am currently experiencing this error:
    Copy code
    'PyDeltaTableError: Failed to load checkpoint: Failed to read checkpoint content: Failed to read S3 object content: Request ID: None '
               'Body: <?xml version="1.0" encoding="UTF-8"?>\n'
               '<Error><Code>PermanentRedirect</Code><Message>The bucket you are attempting to access must be addressed using the specified endpoint. '
               'Please send all future requests to this '
               'endpoint.</Message><Endpoint>databricks-lake-dev-us-west-2.s3-us-west-2.amazonaws.com</Endpoint><Bucket>databricks-lake-dev-us-west-2</Bucket><RequestId>N79MDMFJ56EK9V74</RequestId><HostId>h0jvugKsFOWOKy/EPr8NELkO85lO7YQYBKR0H33LqZ7U3HkjFB2iUOM2Ne/3reGDzbzKxfEYPMg=</HostId></Error>\n'
    I noticed people experiencing similar issues with the delta-rs module when they don't pass the correct S3 region to the DeltaTable class. I went ahead and experimented with this behavior in my local Jupyter notebook: when I pass the correct region, the notebook instantiates a DeltaTable object properly, and when I pass the wrong region, I get the exact same error as the one I get when ingesting with DataHub. This leads me to believe that the DataHub code is not handling the AWS region correctly. I then dug into the DataHub source code a bit and noticed that the
    read_delta_table
    method is missing a region parameter. I believe that is the reason I am getting this failure. Can someone please confirm my suspicion? I am happy to open an issue on GitHub or pair up any time today to investigate further. Thanks in advance!
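    A minimal sketch of the notebook experiment described above (the bucket name and table path are placeholders, and relying on the AWS_REGION environment variable is an assumption about how delta-rs resolves the region):

    # Hypothetical reproduction: delta-rs picks up the S3 region from the
    # environment, so setting AWS_REGION to the bucket's real region avoids the
    # PermanentRedirect error shown in the traceback above.
    import os

    from deltalake import DeltaTable

    os.environ["AWS_REGION"] = "us-west-2"  # region the bucket actually lives in

    dt = DeltaTable("s3://my-bucket/path/to/delta-table")  # placeholder path
    print(dt.version())
    print(dt.schema())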
    l
    m
    c
    • 4
    • 5
  • p

    plain-guitar-45103

    07/11/2022, 5:33 PM
    Here is my ingestion yaml:
    Copy code
    source:
        type: delta-lake
        config:
            base_path: 'my_base_path'
            relative_path: 'my_relative_path'
            s3:
                aws_config:
                    aws_access_key_id: xxxxx
                    aws_secret_access_key: xxxxxx
                    aws_region: us-west-2
    sink:
        type: datahub-rest
        config:
            server: '<http://172.17.0.1:8080>'
  • m

    mysterious-nail-70388

    07/12/2022, 2:59 AM
    Hello. Now that DataHub can use non-Docker components (ES, Kafka, MySQL), will data on those non-Docker components also be deleted when we delete data in DataHub?
    b
    • 2
    • 4
  • s

    steep-vr-39297

    07/12/2022, 3:25 AM
    Hi there, I want to ingest a Hive DB into DataHub, but I keep getting errors. The connection URL used by DataGrip is
    jdbc:hive2://hive_host:10001/;transportMode=http;httpPath=cliservice
    Here is my recipe file:
    Copy code
    source:
        type: hive
        config:
          host_port: hive_host:10001
          database: db_name
          username: id
          password: pw
          options:
            connect_args:
              http_path: "/cliservice"
              auth: LDAP
    
    sink:
        type: datahub-rest
        config:
          server: "<http://localhost:8080>"
    The error message is:
    Copy code
    [2022-07-12 12:09:54,931] ERROR    {datahub.entrypoints:184} - File "/users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/datahub/entrypoints.py", line 149, in main
    
    ....
    
    '---- (full traceback above) ----
    File "/users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/datahub/entrypoints.py", line 149, in main
        sys.exit(datahub(standalone_mode=False, **kwargs))
    File "/users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/click/core.py", line 1128, in __call__
        return self.main(*args, **kwargs)
    File "/users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/click/core.py", line 1053, in main
        rv = self.invoke(ctx)
    File "/users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/click/core.py", line 1659, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
    File "/users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/click/core.py", line 1659, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
    File "/users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/click/core.py", line 1395, in invoke
        return ctx.invoke(self.callback, **ctx.params)
    File "/users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/click/core.py", line 754, in invoke
        return __callback(*args, **kwargs)
    File "/users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/click/decorators.py", line 26, in new_func
        return f(get_current_context(), *args, **kwargs)
    File "/users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/datahub/upgrade/upgrade.py", line 333, in wrapper
        res = func(*args, **kwargs)
    File "/users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/datahub/telemetry/telemetry.py", line 338, in wrapper
        raise e
    File "/users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/datahub/telemetry/telemetry.py", line 290, in wrapper
        res = func(*args, **kwargs)
    File "/users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/datahub/utilities/memory_leak_detector.py", line 102, in wrapper
        res = func(*args, **kwargs)
    File "/users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/datahub/cli/ingest_cli.py", line 131, in run
        raise e
    File "/users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/datahub/cli/ingest_cli.py", line 117, in run
        pipeline.run()
    File "/users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/datahub/ingestion/run/pipeline.py", line 217, in run
        self.preview_workunits if self.preview_mode else None,
    File "/users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/datahub/ingestion/source/sql/sql_common.py", line 712, in get_workunits
        for inspector in self.get_inspectors():
    File "/users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/datahub/ingestion/source/sql/sql_common.py", line 516, in get_inspectors
        with engine.connect() as conn:
    File "/users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 2263, in connect
        return self._connection_cls(self, **kwargs)
    File "/users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 104, in __init__
        else engine.raw_connection()
    File "/users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 2370, in raw_connection
        self.pool.unique_connection, _connection
    File "/users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 2336, in _wrap_pool_connect
        return fn()
    File "/users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 304, in unique_connection
        return _ConnectionFairy._checkout(self)
    File "/users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 778, in _checkout
        fairy = _ConnectionRecord.checkout(pool)
    File "/users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 495, in checkout
        rec = pool._do_get()
    File "/users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/sqlalchemy/pool/impl.py", line 140, in _do_get
        self._dec_overflow()
    File "/users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
        with_traceback=exc_tb,
    File "/users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
        raise exception
    File "/users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/sqlalchemy/pool/impl.py", line 137, in _do_get
        return self._create_connection()
    File "/users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 309, in _create_connection
        return _ConnectionRecord(self)
    File "/users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 440, in __init__
        self.__connect(first_connect_check=True)
    File "/users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 661, in __connect
        pool.logger.debug("Error on connect(): %s", e)
    File "/users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
        with_traceback=exc_tb,
    File "/users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
        raise exception
    File "/users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 656, in __connect
        connection = pool._invoke_creator(self)
    File "/users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/sqlalchemy/engine/strategies.py", line 114, in connect
        return dialect.connect(*cargs, **cparams)
    File "/users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/sqlalchemy/engine/default.py", line 508, in connect
        return self.dbapi.connect(*cargs, **cparams)
    File "/users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/pyhive/hive.py", line 126, in connect
        return Connection(*args, **kwargs)
    File "/users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/pyhive/hive.py", line 267, in __init__
        self._transport.open()
    File "/users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/thrift_sasl/__init__.py", line 93, in open
        status, payload = self._recv_sasl_message()
    File "/users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/thrift_sasl/__init__.py", line 115, in _recv_sasl_message
        payload = self._trans_read_all(length)
    File "/users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/thrift_sasl/__init__.py", line 210, in _trans_read_all
        return read_all(sz)
    File "/users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/thrift/transport/TTransport.py", line 62, in readAll
        chunk = self.read(sz - have)
    File "/users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/thrift/transport/TSocket.py", line 167, in read
        message='TSocket read 0 bytes')
    
    TTransportException: TSocket read 0 bytes
    [2022-07-12 12:09:54,942] INFO     {datahub.entrypoints:188} - DataHub CLI version: 0.8.40.2 at /users/user/workspace/datahub/datahub-env/lib64/python3.6/site-packages/datahub/__init__.py
    [2022-07-12 12:09:54,942] INFO     {datahub.entrypoints:191} - Python version: 3.6.8 (default, Nov 16 2020, 16:55:22)
    [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] at /users/user/workspace/datahub/datahub-env/bin/python3 on Linux-3.10.0-693.2.2.el7.x86_64-x86_64-with-centos-7.9.2009-Core
    [2022-07-12 12:09:54,942] INFO     {datahub.entrypoints:193} - GMS config {'models': {}, 'versions': {'linkedin/datahub': {'version': 'v0.8.40', 'commit': '5bb7fe3691e153ff64137a8bdd64ec1473b6095f'}}, 'managedIngestion': {'defaultCliVersion': '0.8.40', 'enabled': True}, 'statefulIngestionCapable': True, 'supportsImpactAnalysis': True, 'telemetry': {'enabledCli': True, 'enabledIngestion': False}, 'datasetUrnNameCasing': False, 'retention': 'true', 'datahub': {'serverType': 'quickstart'}, 'noCode': 'true'}
    Please, help me...
    c
    h
    • 3
    • 16
  • l

    lemon-zoo-63387

    07/12/2022, 4:04 AM
    Hello everyone, I ran into the following error. How can I solve it? Thanks for your help. DBAPIError: (pytds.tds_base.Error) Client does not have encryption enabled but it is required by server, enable encryption and try connecting again
    Copy code
    'File "/tmp/datahub/ingest/venv-5e0fb85d-849d-4862-af24-8090d2718e47/lib/python3.9/site-packages/pytds/tds.py", line 1343, in '
               'parse_prelogin\n'
               "    raise tds_base.Error('Client does not have encryption enabled but it is required by server, '\n"
               '\n'
               'DBAPIError: (pytds.tds_base.Error) Client does not have encryption enabled but it is required by server, enable encryption and try '
               'connecting again\n'
               '(Background on this error at: <http://sqlalche.me/e/13/dbapi>)\n'
    l
    • 2
    • 2
  • f

    future-helmet-59694

    07/12/2022, 5:41 AM
    Hi everyone! We are trying out DataHub's push-based ingestion using the Python REST emitter and we have some questions. We've created a process that, each time it runs, ingests multiple Metadata Change Proposal events essentially describing a data product: a GlossaryTerm associated with a GlossaryNode, several Datasets/DataJobs with Owners and Domain, the Datasets' Schema Fields, and relationships between assets. What we've tested so far is defining all the MetadataChangeProposalWrapper events needed and then emitting each of them sequentially using
    DatahubRestEmitter
    's
    emit()
    method. Is there a better way to do this? How should we deal with emitting multiple events as if they were a single transaction, and with a possible rollback in case of any error during the ingestion process? Thanks in advance! 🙂
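    A minimal sketch of the sequential-emit pattern described above, assuming placeholder URNs and server URL; the REST API has no multi-event transaction, so error handling stays on the caller's side:

    # Build all MCPs up front, then emit sequentially; on failure, the caller
    # decides whether to retry or clean up the partially written entities.
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import ChangeTypeClass, DatasetPropertiesClass

    emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

    mcps = [
        MetadataChangeProposalWrapper(
            entityType="dataset",
            changeType=ChangeTypeClass.UPSERT,
            entityUrn="urn:li:dataset:(urn:li:dataPlatform:hive,my_db.my_table,PROD)",
            aspectName="datasetProperties",
            aspect=DatasetPropertiesClass(description="Part of data product X"),
        ),
        # ... glossary term, domain, ownership, and schema MCPs built the same way
    ]

    emitted = []
    try:
        for mcp in mcps:
            emitter.emit(mcp)
            emitted.append(mcp.entityUrn)
    except Exception as exc:
        # No server-side rollback exists; log what was already written so it can
        # be retried or soft-deleted by a compensating step.
        print(f"Failed after {len(emitted)} events: {exc}")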
    m
    • 2
    • 4
  • s

    silly-ice-4153

    07/12/2022, 8:52 AM
    Hello all, I'm impressed by the DataHub software so far, to be honest. I'm now trying to get Airflow 2.1.2 connected as well. I see the plugin in Admin->Plugins
    source	acryl-datahub-airflow-plugin==0.8.35.6: EntryPoint(name='acryl-datahub-airflow-plugin', value='datahub_airflow_plugin.datahub_plugin:DatahubPlugin', group='airflow.plugins')
    I also set lazy_loading to False. I added this to a test DAG:
    Copy code
    task2 = PythonOperator(
        task_id='Execute_Test_Script',
        python_callable=main,
        dag=dag,
        inlets={
          "datasets": [
            Dataset("postgres", "postgres.test.y"),
          ],
        },
        outlets={
          "datasets": [
            Dataset("postgres", "postgres.test.y"),
            Dataset("postgres", "postgres.test.x"),
            Dataset("postgres", "postgres.test.z")
          ]
        },
    )
    I also added the datahub_rest_default connection to the Airflow connections. But I don't see anything in the logs indicating that it is emitting data. Does someone have an idea what could be wrong?
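    One way to narrow this down is to test the emitter directly from the Airflow worker's Python environment; a small sketch, assuming the GMS URL below is replaced with the one configured in datahub_rest_default:

    # If this check fails, the plugin cannot emit lineage either.
    from datahub.emitter.rest_emitter import DatahubRestEmitter

    emitter = DatahubRestEmitter(gms_server="http://localhost:8080")  # placeholder URL
    emitter.test_connection()  # raises if GMS is unreachable or misconfigured
    print("DataHub GMS reachable")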
    c
    d
    • 3
    • 26
  • q

    quick-article-20863

    07/12/2022, 3:00 PM
    Hello everyone, how are you? I've been trying to find this myself, without success so far, so I'm writing here to get a faster answer. Is there a way to discover all SQL Server databases for ingestion? I tried removing the database parameter from the recipe and received only a few system databases; the user databases don't appear in the list. Can you help me? If there is a link covering this, please share it with me.
    plus1 1
    c
    • 2
    • 1
  • q

    quick-article-20863

    07/12/2022, 3:00 PM
    Thanks in advance
  • w

    witty-butcher-82399

    07/12/2022, 3:22 PM
    The platform instance feature is not implemented for the BigQuery connector: https://datahubproject.io/docs/generated/ingestion/sources/bigquery#module-bigquery
    BigQuery doesn’t need platform instances because project ids in BigQuery are globally unique.
    While the feature is not required in terms of uniqueness (the project id is already included in the URN), setting the
    DataPlatformInstance
    aspect with the project id would enable using the project id for platform-instance faceting (as a filter in searches). WDYT about this? I could open a PR if there is some agreement on this.
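    For illustration, a rough sketch of the proposal, emitting the aspect for a single table by hand (project, dataset, and server URL are placeholders; the connector would do this during ingestion):

    # Populate dataPlatformInstance with the BigQuery project id so it becomes a
    # Platform Instance facet in search. All names below are placeholders.
    from datahub.emitter.mce_builder import make_data_platform_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import ChangeTypeClass, DataPlatformInstanceClass

    project_id = "my-gcp-project"
    aspect = DataPlatformInstanceClass(
        platform=make_data_platform_urn("bigquery"),
        instance=f"urn:li:dataPlatformInstance:(urn:li:dataPlatform:bigquery,{project_id})",
    )

    mcp = MetadataChangeProposalWrapper(
        entityType="dataset",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn="urn:li:dataset:(urn:li:dataPlatform:bigquery,my-gcp-project.my_dataset.my_table,PROD)",
        aspectName="dataPlatformInstance",
        aspect=aspect,
    )
    DatahubRestEmitter(gms_server="http://localhost:8080").emit(mcp)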
    c
    m
    • 3
    • 9
  • n

    numerous-bird-27004

    07/12/2022, 8:23 PM
    Has anyone ingested metadata from a Delta Lake on S3? I am stuck and need some help.
    m
    p
    +2
    • 5
    • 20
  • r

    rich-policeman-92383

    07/13/2022, 6:05 AM
    In stateful ingestion, do we have any way of forcefully making sure that the source datastore and DataHub are in sync? Right now with Hive, even after a successful ingestion, many datasets/tables are not getting deleted from DataHub. soft_deleted_stale_entities is []. DataHub version: v0.8.38
    c
    • 2
    • 11
  • w

    wonderful-egg-79350

    07/13/2022, 8:16 AM
    Is there a way to ingest metadata such as 'owners' and 'about' using a YAML file and the DataHub CLI?
    b
    • 2
    • 3
  • c

    crooked-holiday-47153

    07/13/2022, 12:17 PM
    Hi all, I have set up a DataHub lab environment using the getting-started Docker setup, and when I execute an ingestion the UI doesn't show the process running and the logs don't say anything that helps me debug this. Any ideas?
    b
    • 2
    • 1
  • b

    breezy-portugal-43538

    07/13/2022, 12:40 PM
    Hello everyone, I wanted to ask a quick question: is there some way to update the "Stats" tab for a given URN using a curl command? Profiling doesn't entirely work for me, since DataHub has hardcoded "s3a" as the path scheme and my S3 is on a completely different host (the URL differs); you can find the error referenced in the pasted image below. Regardless of that, would it be possible to profile the data via a manual update with curl? Perhaps there is some field inside com.linkedin.metadata.snapshot.DatasetSnapshot that could be altered to get the same results as with profiling set to "True". Thank you deeply for all the help, you guys do tremendous work.
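    If pushing the numbers manually is acceptable, a hedged alternative to raw curl is emitting the datasetProfile aspect (which backs the Stats tab) from Python; the URN, server URL, and values below are placeholders:

    # Push a datasetProfile aspect directly so the Stats tab has data even when
    # built-in profiling cannot reach the S3 endpoint.
    import time

    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import ChangeTypeClass, DatasetProfileClass

    profile = DatasetProfileClass(
        timestampMillis=int(time.time() * 1000),
        rowCount=123456,
        columnCount=12,
    )

    mcp = MetadataChangeProposalWrapper(
        entityType="dataset",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn="urn:li:dataset:(urn:li:dataPlatform:s3,my-bucket/my-table,PROD)",
        aspectName="datasetProfile",
        aspect=profile,
    )
    DatahubRestEmitter(gms_server="http://localhost:8080").emit(mcp)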
    c
    • 2
    • 7
  • f

    faint-advantage-18690

    07/13/2022, 1:11 PM
    Hi all, when using a custom transformer, is there a way to retrieve the name of the current entity in the
    transform_aspect()
    method?
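    For reference, in recent datahub versions the transform_aspect hook receives the entity URN as an argument, so the name can be parsed from it; a sketch below, with illustrative class names and the datasetProperties aspect chosen arbitrarily:

    from typing import Optional

    import datahub.emitter.mce_builder as builder
    from datahub.configuration.common import ConfigModel
    from datahub.ingestion.api.common import PipelineContext
    from datahub.ingestion.transformer.base_transformer import (
        BaseTransformer,
        SingleAspectTransformer,
    )
    from datahub.metadata.schema_classes import DatasetPropertiesClass


    class MyTransformerConfig(ConfigModel):
        pass


    class MyTransformer(BaseTransformer, SingleAspectTransformer):
        """Illustrative transformer that reads the current entity's URN/name."""

        def __init__(self, config: MyTransformerConfig, ctx: PipelineContext):
            super().__init__()
            self.config = config
            self.ctx = ctx

        @classmethod
        def create(cls, config_dict: dict, ctx: PipelineContext) -> "MyTransformer":
            return cls(MyTransformerConfig.parse_obj(config_dict), ctx)

        def entity_types(self):
            return ["dataset"]

        def aspect_name(self) -> str:
            return "datasetProperties"

        def transform_aspect(
            self, entity_urn: str, aspect_name: str, aspect: Optional[builder.Aspect]
        ) -> Optional[builder.Aspect]:
            # entity_urn looks like urn:li:dataset:(urn:li:dataPlatform:hive,db.table,PROD)
            dataset_name = entity_urn.split(",")[1]
            props = aspect or DatasetPropertiesClass()
            props.customProperties["source_entity_name"] = dataset_name
            return props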
    c
    • 2
    • 3
  • l

    lively-ice-56461

    07/13/2022, 2:34 PM
    Hello, I tried to use the reporting system; at the end of ingestion I got these logs:
    Copy code
    >datahub --debug ingest -c ./mssql.yml
    [2022-07-13 17:23:17,777] DEBUG    {datahub.telemetry.telemetry:201} - Sending init Telemetry
    [2022-07-13 17:23:18,130] DEBUG    {datahub.telemetry.telemetry:234} - Sending Telemetry
    [2022-07-13 17:23:18,302] INFO     {datahub.cli.ingest_cli:99} - DataHub CLI version: 0.8.40.3rc2
    ......
    
    [2022-07-13 17:24:16,697] INFO     {datahub.ingestion.reporting.datahub_ingestion_reporting_provider:143} - Committing ingestion run summary for pipeline:'pipeline_name',instance:'mssql_localhost:1433_master', job:'common_ingest_from_sql_source'
    [2022-07-13 17:24:16,698] DEBUG    {datahub.emitter.rest_emitter:224} - Attempting to emit to DataHub GMS; using curl equivalent to:
    curl -X POST -H 'User-Agent: python-requests/2.28.1' -H 'Accept-Encoding: gzip, deflate' -H 'Accept: */*' -H 'Connection: keep-alive' -H 'X-RestLi-Protocol-Version: 2.0.0' -H 'Content-Type: application/json' --data '{"proposal": {"entityType": "dataJob", "entityUrn": "urn:li:dataJob:(urn:li:dataFlow:(datahub,pipeline_name_mssql_localhost:1433_master,prod),common_ingest_from_sql_source)", "changeType": "UPSERT", "aspectName": "datahubIngestionRunSummary", "aspect": {"value": "< report data>"}}
    
    [2022-07-13 17:24:16,759] INFO     {datahub.ingestion.reporting.datahub_ingestion_reporting_provider:169} - Committed ingestion run summary for pipeline:'pipeline_name',instance:'mssql_localhost:1433_master', job:'common_ingest_from_sql_source'
    [2022-07-13 17:24:16,760] INFO     {datahub.ingestion.run.pipeline:296} - Successfully committed changes for DatahubIngestionReportingProvider.
    [2022-07-13 17:24:16,760] INFO     {datahub.cli.ingest_cli:133} - Finished metadata pipeline
    [2022-07-13 17:24:16,760] DEBUG    {datahub.telemetry.telemetry:234} - Sending Telemetry
    
    Source (mssql) report:
    {'workunits_produced': 86,
     'workunit_ids': [<workunit ids here>],
    
     'warnings': {'database.schema.view': ['unable to map type BIT() to metadata schema']},
     'failures': {},
     'cli_version': '0.8.40.3rc2',
     'cli_entry_location': '\\lib\\site-packages\\acryl_datahub-0.8.40.3rc2-py3.8.egg\\datahub\\__init__.py',
     'py_version': '3.8.0 (tags/v3.8.0:fa919fd, Oct 14 2019, 19:37:50) [MSC v.1916 64 bit (AMD64)]',
     'py_exec_path': 'Scripts\\python.exe',
     'os_details': 'Windows-10-10.0.19041-SP0',
     'tables_scanned': 5,
     'views_scanned': 1,
     'entities_profiled': 0,
     'filtered': [],
     'soft_deleted_stale_entities': [],
     'query_combiner': None}
    Sink (datahub-rest) report:
    {'records_written': 86,
     'warnings': [],
     'failures': [],
     'downstream_start_time': datetime.datetime(2022, 7, 13, 17, 23, 30, 127613),
     'downstream_end_time': datetime.datetime(2022, 7, 13, 17, 24, 15, 739817),
     'downstream_total_latency_in_seconds': 45.612204,
     'gms_version': 'v0.8.40'}
    
    Pipeline finished with 1 warnings in source producing 86 workunits
    [2022-07-13 17:24:18,093] DEBUG    {datahub.telemetry.telemetry:234} - Sending Telemetry
    Where can I see this report info in DataHub?
    • 1
    • 1
  • k

    kind-whale-32412

    07/13/2022, 7:15 PM
    Hello there, is there a way to revert an ingestion, or to completely clean up ingested data on localhost?
    i
    • 2
    • 9
  • g

    gray-hair-27030

    07/13/2022, 7:42 PM
    How can I load the Postgres data so that it appears as datasets in DataHub? It currently appears as a table. I attach a screenshot of my configuration.
    c
    • 2
    • 6
  • p

    powerful-planet-87080

    07/13/2022, 9:12 PM
    How can I deploy a new custom metadata ingestion connector?
    c
    • 2
    • 5
  • p

    powerful-planet-87080

    07/13/2022, 9:12 PM
    I created a my-connector.py and a recipe (yaml file with a file sink)
  • p

    powerful-planet-87080

    07/13/2022, 9:13 PM
    I am working with quickstart on my local machine.
  • p

    powerful-planet-87080

    07/13/2022, 9:15 PM
    My preliminary understanding is that I may have to specify these in some registry.
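    A possible packaging sketch, assuming the quickstart CLI environment and placeholder package/module/class names: install the connector into the same virtualenv as the DataHub CLI (e.g. pip install -e .), after which the recipe's source type can reference either the entry-point name or the full class path my_connector.source.MyConnectorSource.

    # setup.py sketch registering a custom source with the DataHub ingestion
    # framework via an entry point. All names here are placeholders.
    from setuptools import find_packages, setup

    setup(
        name="my-connector",
        version="0.1.0",
        packages=find_packages(),
        install_requires=["acryl-datahub"],
        entry_points={
            "datahub.ingestion.source.plugins": [
                "my-connector = my_connector.source:MyConnectorSource",
            ],
        },
    )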
  • c

    colossal-sandwich-50049

    07/13/2022, 9:35 PM
    Hello, I have two questions: 1. I am trying to install the
    datahub-protobuf
    module described in this documentation (https://github.com/datahub-project/datahub/tree/master/metadata-integration/java/datahub-protobuf) but can't seem to find it on Maven; can someone advise? 2. After running the code below with the Java emitter (written in Scala), I have found that some of the methods on
    DatasetProperties
    (e.g.
    setTags
    ,
    setQualifiedName
    ) don't alter anything in the user interface; can someone advise? Follow-up: I notice, based on Maven, that the
    datahub-client
    is fairly new; would it be fair to say that its functionality is still fairly limited?
    Copy code
    val emitter: RestEmitter = RestEmitter.create(b => b
              .server("<http://localhost:8080>")
              .extraHeaders(Collections.singletonMap("Custom-Header", "custom-val"))
      )
      // emitter.testConnection()
    
      val tags = new StringArray()
      tags.add("featureStore")
      tags.add("bi")
    
      val url = new Url("<https://www.denofgeek.com/>")
    
      val customProperties = new StringMap()
      customProperties.put("governance", "disabled")
      customProperties.put("otherProp", "someValue")
    
      val mcpw = MetadataChangeProposalWrapper.builder()
        .entityType("dataset")
        .entityUrn("urn:li:dataset:(urn:li:dataPlatform:delta-lake,fraud.feature-stores.feature-store-v1,PROD)")
        .upsert
        .aspect(
          new DatasetProperties()
            .setName("feature-store")
            .setDescription("some feature store desc")
            .setTags(tags, SetMode.DISALLOW_NULL) // SetMode.IGNORE_NULL
            .setQualifiedName("fraudFeatureStore")
            .setExternalUrl(url)
        //    .setUri(new URI("<https://www.geeksforgeeks.org/>"))
            .setCustomProperties(customProperties)
        )
        .build
    
    val requestFuture = emitter.emit(mcpw, null).get()
    c
    m
    • 3
    • 8
  • e

    echoing-alligator-70530

    07/13/2022, 10:03 PM
    Hello everyone, is there a way to ingest a CSV file into DataHub other than the csv-enricher?
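    If the CSV is just a list of datasets and descriptions, one hedged workaround is a small script over the Python REST emitter (file name, expected columns, server URL, and URNs are placeholders):

    # Read an arbitrary CSV and push one datasetProperties aspect per row.
    import csv

    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import ChangeTypeClass, DatasetPropertiesClass

    emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

    with open("datasets.csv", newline="") as f:
        for row in csv.DictReader(f):  # expects columns: urn, description
            emitter.emit(
                MetadataChangeProposalWrapper(
                    entityType="dataset",
                    changeType=ChangeTypeClass.UPSERT,
                    entityUrn=row["urn"],
                    aspectName="datasetProperties",
                    aspect=DatasetPropertiesClass(description=row["description"]),
                )
            )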
    m
    c
    r
    • 4
    • 4
  • w

    wonderful-egg-79350

    07/14/2022, 7:39 AM
    If the 'version' value of a row in the 'metadata_aspect_v2' table in MySQL is 1 or more, is there any problem if I delete that row?
    plus1 1
    i
    m
    b
    • 4
    • 5
  • l

    loud-kite-94877

    07/14/2022, 8:04 AM
    Copy code
    'File "/tmp/datahub/ingest/venv-dc280a46-0332-4755-a38c-552445dc2860/lib/python3.9/site-packages/jpype/_jvmfinder.py", line 212, in '
               'get_jvm_path\n'
               '    raise JVMNotFoundException("No JVM shared library file ({0}) "\n'
               '\n'
               'JVMNotFoundException: No JVM shared library file (libjvm.so) found. Try setting up the JAVA_HOME environment variable properly.\n'
               '[2022-07-14 07:54:21,301] INFO     {datahub.entrypoints:176} - DataHub CLI version: 0.8.40 at '
               '/tmp/datahub/ingest/venv-dc280a46-0332-4755-a38c-552445dc2860/lib/python3.9/site-packages/datahub/__init__.py\n'
               '[2022-07-14 07:54:21,301] INFO     {datahub.entrypoints:179} - Python version: 3.9.9 (main, Dec 21 2021, 10:03:34) \n'
               '[GCC 10.2.1 20210110] at /tmp/datahub/ingest/venv-dc280a46-0332-4755-a38c-552445dc2860/bin/python3 on '
    This error appeared when running kafka-connect ingestion through the UI.
    c
    • 2
    • 1