refined-ability-35859
11/02/2022, 5:08 PM
lively-dusk-19162
11/02/2022, 8:09 PM
full-chef-85630
11/03/2022, 6:14 AM
lemon-cat-72045
11/03/2022, 7:27 AM
datahub.ingestion.run.pipeline.PipelineInitError: Failed to configure source (bigquery)
[2022-11-03 07:23:42,552] ERROR {datahub.entrypoints:195} - Command failed:
    Failed to configure source (bigquery) due to
        'Missing provider configuration.'.
    Run with --debug to get full stacktrace.
    e.g. 'datahub --debug ingest run -c /tmp/datahub/ingest/bb9624b9-d4aa-4af4-b861-cd287691400c/recipe.yml --report-to
Do I need to configure the stateful ingestion provider for the Kafka sink? Thanks!
mammoth-gigabyte-6392
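For reference, a minimal sketch of the shape the stateful-ingestion guide describes, written as a Pipeline dict like the example further down in this thread. The bigquery and Kafka details, the broker, and the GMS URL are placeholders, and the top-level datahub_api section (assumed here to be how the checkpoint state provider finds GMS when the sink is datahub-kafka rather than datahub-rest) is taken from the stateful-ingestion docs, not from this thread:

from datahub.ingestion.run.pipeline import Pipeline

# Sketch only: all values are placeholders; field names follow the
# stateful-ingestion guide. "datahub_api" is the assumed way to give the
# state provider a GMS endpoint when the sink is not datahub-rest.
pipeline = Pipeline.create(
    {
        "pipeline_name": "my_bigquery_pipeline_1",  # required for stateful ingestion
        "datahub_api": {"server": "http://datahub-gms:8080"},
        "source": {
            "type": "bigquery",
            "config": {
                "stateful_ingestion": {"enabled": True},
            },
        },
        "sink": {
            "type": "datahub-kafka",
            "config": {"connection": {"bootstrap": "broker:9092"}},
        },
    }
)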
11/03/2022, 7:40 AM
from datahub.ingestion.run.pipeline import Pipeline

def get_pipeline():
    pipeline = Pipeline.create(
        {
            "source": {
                "type": "s3",
                "config": {
                    "path_specs": [{"include": "s3://path/to/my/json"}],
                    "aws_config": {
                        "aws_access_key_id": "**************",
                        "aws_secret_access_key": "***************",
                        "aws_region": "*********",
                    },
                    "env": "prod",
                    "profiling": {"enabled": False},
                },
            },
            "sink": {
                "type": "datahub-rest",
                "config": {
                    "server": "server-link",
                    "token": "*******",
                },
            },
        }
    )
    return pipeline

def main():
    pipeline = get_pipeline()
    pipeline.run()
    pipeline.pretty_print_summary()

if __name__ == "__main__":
    main()
Cli report:
{'cli_version': '0.9.1',
'cli_entry_location': '/usr/local/lib/python3.8/dist-packages/datahub/__init__.py',
'py_version': '3.8.10 (default, Mar 15 2022, 12:22:08) \n[GCC 9.4.0]',
'py_exec_path': '/usr/bin/python3',
'os_details': 'Linux-5.4.172-90.336.amzn2.x86_64-x86_64-with-glibc2.29',
'mem_info': '232.53 MB'}
Source (s3) report:
{'events_produced': '0',
'events_produced_per_sec': '0',
'event_ids': [],
'warnings': {},
'failures': {},
'filtered': [],
'start_time': '2022-11-03 07:29:28.481404 (now).',
'running_time': '0.5 seconds'}
Sink (datahub-rest) report:
{'total_records_written': '0',
'records_written_per_second': '0',
'warnings': [],
'failures': [],
'start_time': '2022-11-03 07:29:28.471589 (now).',
'current_time': '2022-11-03 07:29:28.982979 (now).',
'total_duration_in_seconds': '0.51',
'gms_version': 'v0.8.45',
'pending_requests': '0'}
Pipeline finished successfully; produced 0 events in 0.5 seconds.
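One thing worth checking when the run succeeds but produces 0 events is the path_specs include value: per the s3 source docs it is a pattern matched against full object keys (typically ending in the file name or a glob like *.json), not just a bare prefix. A hedged way to see what actually exists under the prefix, using boto3 directly (bucket and prefix below are placeholders):

import boto3

# If nothing prints here, the s3 source's include pattern has nothing to match
# either, which would explain the 0 events above.
s3 = boto3.client("s3")
resp = s3.list_objects_v2(Bucket="my-bucket", Prefix="my/prefix/")
for obj in resp.get("Contents", []):
    print(obj["Key"])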
microscopic-mechanic-13766
11/03/2022, 9:26 AM
steep-family-13549
11/03/2022, 9:52 AM
steep-family-13549
11/03/2022, 9:55 AM
dazzling-park-96517
11/03/2022, 11:15 AM
sink:
  type: datahub-rest
  config:
    server: http://datahub-Datahub-gms:8080
source:
  type: superset
  config:
    connect_uri: myhost:port
    username: myuser
    password: mypassword
But I always get the error below:
  self.access_token = login_response.json()['access_token']
KeyError: 'access_token'
My Superset access is implemented with Keycloak.
Any suggestions on how to solve this problem? Thanks in advance
rapid-army-98062
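For what it's worth, a hedged reproduction of the login call the superset connector performs (the /api/v1/security/login endpoint is Superset's documented API; the host and credentials below are placeholders). When Superset sits behind Keycloak/OAuth, this call typically does not return an access_token at all, which would match the KeyError above:

import requests

# Placeholders: replace the host, username, and password with real values.
login_response = requests.post(
    "https://myhost:8088/api/v1/security/login",
    json={
        "username": "myuser",
        "password": "mypassword",
        "provider": "db",   # the connector logs in with database auth by default
        "refresh": True,
    },
)
# Inspect whether "access_token" is actually present in the response body.
print(login_response.status_code, login_response.json())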
11/03/2022, 11:25 AM
  entrypoint = u._get_entrypoint()
File "/tmp/datahub/ingest/venv-928a9961-8859-44e5-aaab-dfe230122564/lib/python3.9/site-packages/sqlalchemy/engine/url.py", line 172, in _get_entrypoint
  cls = registry.load(name)
File "/tmp/datahub/ingest/venv-928a9961-8859-44e5-aaab-dfe230122564/lib/python3.9/site-packages/sqlalchemy/util/langhelpers.py", line 277, in load
  raise exc.NoSuchModuleError(

NoSuchModuleError: Can't load plugin: sqlalchemy.dialects:crate
We have installed the following packages in the acryl-datahub-actions docker image:
RUN pip install crate acryl-datahub[sqlalchemy] crate[sqlalchemy]
However, when the ingestion job runs, the crate[sqlalchemy] package is not present.
Any idea how we can get it loaded when the ingestion runs from the DataHub UI?
delightful-barista-90363
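A small check that may help narrow this down: the venv path in the traceback suggests UI-triggered ingestion runs inside a freshly created virtualenv under /tmp/datahub/ingest, separate from the image's site-packages. Running the snippet below in whichever environment the ingestion actually uses raises the same NoSuchModuleError if the crate dialect is not installed there:

from sqlalchemy.dialects import registry

# Raises NoSuchModuleError unless the crate SQLAlchemy dialect is importable
# in the *current* environment.
print(registry.load("crate"))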
11/03/2022, 4:32 PM
green-lion-58215
11/03/2022, 5:07 PM
File "pydantic/main.py", line 521, in pydantic.main.BaseModel.parse_obj
File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
ValidationError: 1 validation error for BusinessGlossarySourceConfig
enable_auto_id
extra fields not permitted (type=value_error.extra)
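An "extra fields not permitted" error from pydantic usually means the installed acryl-datahub version's config model simply does not know the enable_auto_id option yet, i.e. the option is newer than the running CLI. One hedged way to confirm, assuming the business-glossary config class lives at the module path below (an assumption; adjust if your version differs):

import datahub
# Assumed module path for the business glossary source config.
from datahub.ingestion.source.metadata.business_glossary import (
    BusinessGlossarySourceConfig,
)

print(datahub.__version__)
# Lists the fields this installed version accepts; if enable_auto_id is not in
# the list, upgrading acryl-datahub should resolve the ValidationError.
print(sorted(BusinessGlossarySourceConfig.__fields__))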
bumpy-pharmacist-66525
11/03/2022, 6:02 PM
pipeline_name in the recipe (https://datahubproject.io/docs/metadata-ingestion/docs/dev_guides/stateful#sample-configuration). Is there a way to delete pipelines once they have been created?
nutritious-salesclerk-57675
11/03/2022, 6:14 PM
[2022-11-04, 01:48:09 ] {logging_mixin.py:109} INFO - Exception: Traceback (most recent call last):
File "/opt/python3.8/lib/python3.8/site-packages/datahub/emitter/rest_emitter.py", line 241, in _emit_generic
    response = self._session.post(url, data=payload)
File "/opt/python3.8/lib/python3.8/site-packages/requests/sessions.py", line 577, in post
return self.request('POST', url, data=data, json=json, **kwargs)
File "/opt/python3.8/lib/python3.8/site-packages/requests/sessions.py", line 515, in request
prep = self.prepare_request(req)
File "/opt/python3.8/lib/python3.8/site-packages/requests/sessions.py", line 443, in prepare_request
p.prepare(
File "/opt/python3.8/lib/python3.8/site-packages/requests/models.py", line 318, in prepare
self.prepare_url(url, params)
File "/opt/python3.8/lib/python3.8/site-packages/requests/models.py", line 392, in prepare_url
raise MissingSchema(error)
requests.exceptions.MissingSchema: Invalid URL '/aspects?action=ingestProposal': No scheme supplied. Perhaps you meant http:///aspects?action=ingestProposal?
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/python3.8/lib/python3.8/site-packages/datahub_airflow_plugin/datahub_plugin.py", line 337, in custom_on_success_callback
datahub_on_success_callback(context)
File "/opt/python3.8/lib/python3.8/site-packages/datahub_airflow_plugin/datahub_plugin.py", line 204, in datahub_on_success_callback
dataflow.emit(emitter)
File "/opt/python3.8/lib/python3.8/site-packages/datahub/api/entities/datajob/dataflow.py", line 155, in emit
rest_emitter.emit(mcp)
File "/opt/python3.8/lib/python3.8/site-packages/datahub/emitter/rest_emitter.py", line 183, in emit
self.emit_mcp(item)
File "/opt/python3.8/lib/python3.8/site-packages/datahub/emitter/rest_emitter.py", line 218, in emit_mcp
self._emit_generic(url, payload)
File "/opt/python3.8/lib/python3.8/site-packages/datahub/emitter/rest_emitter.py", line 255, in _emit_generic
raise OperationalError(
datahub.configuration.common.OperationalError: ('Unable to emit metadata to DataHub GMS', {'message': "Invalid URL '/aspects?action=ingestProposal': No scheme supplied. Perhaps you meant http:///aspects?action=ingestProposal?"})
[2022-11-04, 01:48:09 ] {logging_mixin.py:109} INFO -
[2022-11-04, 01:48:09 ] {local_task_job.py:264} INFO - 0 downstream tasks scheduled from follow-on schedule check
I don't seem to get this error when I don't have a secret manager configured. It only occurs when I try to integrate DataHub with a Composer instance that has a secret manager configured. Does anyone have an idea as to what I am doing wrong here?
lively-dusk-19162
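For context on the traceback above, a minimal illustration of why requests raises MissingSchema here: the URL the emitter builds ends up as just the path, which suggests the GMS endpoint the plugin resolves (e.g. the datahub_rest Airflow connection backed by the secret manager) is coming back empty or without its http:// or https:// scheme. The values below are placeholders:

import requests

server = ""  # what the plugin appears to be resolving from the secret backend
url = f"{server}/aspects?action=ingestProposal"
try:
    requests.post(url, data="{}")
except requests.exceptions.MissingSchema as exc:
    # Invalid URL '/aspects?action=ingestProposal': No scheme supplied. ...
    print(exc)

# With a full URL (scheme + host) stored for the connection, the same call is
# well-formed:
url = "http://datahub-gms:8080/aspects?action=ingestProposal"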
11/03/2022, 7:10 PM
eager-lifeguard-22029
11/03/2022, 11:45 PM
lively-dusk-19162
11/04/2022, 2:57 AM
microscopic-mechanic-13766
11/04/2022, 8:44 AM
limited-forest-73733
11/04/2022, 11:12 AM
few-carpenter-93837
11/04/2022, 12:17 PM
datahub ingest -c datahub-vertica-lineage-ingestion.dhub.yaml
Then how am I supposed to toggle telemetry to disabled, as mentioned here:
https://datahubproject.io/docs/cli/#user-guide
few-carpenter-93837
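As that CLI page describes, telemetry can be turned off either with the datahub telemetry disable command or by setting DATAHUB_TELEMETRY_ENABLED=false in the environment the CLI runs in. If you trigger ingestion from Python rather than the CLI, the same variable can be set before importing datahub, e.g.:

import os

# Disable CLI/ingestion telemetry for this process before importing datahub.
os.environ["DATAHUB_TELEMETRY_ENABLED"] = "false"

from datahub.ingestion.run.pipeline import Pipeline  # imported after the flag is set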
11/04/2022, 12:19 PM
few-carpenter-93837
11/04/2022, 1:00 PM
limited-forest-73733
11/04/2022, 1:59 PM
most-monkey-10812
11/04/2022, 2:03 PM
dazzling-park-96517
11/04/2022, 3:14 PM
host_port: https://my-secured-Druid-app:443
Can somebody share a recipe for a Druid connection?
Maybe the sqlalchemy package is necessary?
Thanks in advance
ripe-alarm-85320
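A minimal sketch of a Druid recipe, written as a Pipeline dict like the earlier example in this thread. The druid source type and its host_port field come from the DataHub source docs; the host, port, and GMS URL below are placeholders. The source does go through SQLAlchemy, so the druid plugin (pip install 'acryl-datahub[druid]') needs to be installed in the same environment:

from datahub.ingestion.run.pipeline import Pipeline

# All values below are placeholders.
pipeline = Pipeline.create(
    {
        "source": {
            "type": "druid",
            "config": {
                "host_port": "my-secured-druid-app:443",
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://datahub-gms:8080"},
        },
    }
)
pipeline.run()
pipeline.pretty_print_summary()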
11/04/2022, 5:22 PM
quiet-school-18370
11/04/2022, 10:05 PM
sink:
  type: datahub-rest
  config:
    server: 'https://datahub.dev.dap.XXXXXX.com:8080'
    token: "XXXXXX"
source:
  type: lookml
  config:
    github_info:
      repo: 'XXXX'  # repo address where deploy key is added
      # deploy_key_file: <file_address>
    api:
      base_url: 'https://dev-looker.XXXXXXX.com'
      client_secret: 'XXXXXXXXX'
      client_id: XXXXXXXX
    base_folder: /
pipeline_name: XXXXXXXXX
But when I run the datahub ingest -c recipe.dhub.yaml command, I receive the following error:
raise ConfigurationError(
ConfigurationError: Failed to initialize Looker client. Please check your configuration.
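"Failed to initialize Looker client" points at the api section (base_url / client_id / client_secret) rather than the GitHub part of the recipe. One hedged way to test those credentials outside DataHub is with looker_sdk, the library the connector uses under the hood; the URL and keys below are placeholders, and note that some Looker deployments need an explicit API port on base_url:

import os
import looker_sdk

# Placeholders: use the same values as the recipe's api section.
os.environ["LOOKERSDK_BASE_URL"] = "https://dev-looker.XXXXXXX.com"
os.environ["LOOKERSDK_CLIENT_ID"] = "XXXXXXXX"
os.environ["LOOKERSDK_CLIENT_SECRET"] = "XXXXXXXXX"

sdk = looker_sdk.init40()        # raises if the API credentials are rejected
print(sdk.me().display_name)     # succeeds only with a working API key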
quiet-school-18370
11/04/2022, 10:06 PM
gifted-rocket-7960
11/07/2022, 5:41 AM
gifted-rocket-7960
11/07/2022, 5:42 AM