Hi Team I am trying to execute the ingestion pipel...
# troubleshoot
n
Hi Team, I am trying to execute the ingestion pipeline through the UI. I am getting the following error. Please help me out.
~~~~ Execution Summary ~~~~

RUN_INGEST - {'errors': [],
 'exec_id': 'ffb67bca-4da2-40f2-b846-34c17e167ce9',
 'infos': ['2022-09-15 07:35:04.529251 [exec_id=ffb67bca-4da2-40f2-b846-34c17e167ce9] INFO: Starting execution for task with name=RUN_INGEST',
           '2022-09-15 07:35:08.306135 [exec_id=ffb67bca-4da2-40f2-b846-34c17e167ce9] INFO: stdout=Requirement already satisfied: pip in '
           '/tmp/datahub/ingest/venv-ffb67bca-4da2-40f2-b846-34c17e167ce9/lib/python3.9/site-packages (21.2.4)\n'
           'ERROR: Exception:\n'
           'Traceback (most recent call last):\n'
           '  File "/tmp/datahub/ingest/venv-ffb67bca-4da2-40f2-b846-34c17e167ce9/lib/python3.9/site-packages/pip/_internal/cli/base_command.py", '
           'line 173, in _main\n'
           '    status = self.run(options, args)\n'
           '    state = resolution.resolve(requirements, max_rounds=max_rounds)\n'
           '  File "/tmp/datahub/ingest/venv-ffb67bca-4da2-40f2-b846-34c17e167ce9/lib/python3.9/site-packages/pip/_vendor/resolvelib/resolvers.py", '
           'line 341, in resolve\n'
           '    resp = self.send(prep, **send_kwargs)\n'
           '  File "/tmp/datahub/ingest/venv-ffb67bca-4da2-40f2-b846-34c17e167ce9/lib/python3.9/site-packages/pip/_vendor/requests/sessions.py", '
           'line 655, in send\n'
           '    r = adapter.send(request, **kwargs)\n'
           '  File "/tmp/datahub/ingest/venv-ffb67bca-4da2-40f2-b846-34c17e167ce9/lib/python3.9/site-packages/pip/_vendor/cachecontrol/adapter.py", '

           'ValueError: check_hostname requires server_hostname\n'
           'ERROR: Exception:\n'
           'Traceback (most recent call last):\n'
           '  File "/tmp/datahub/ingest/venv-ffb67bca-4da2-40f2-b846-34c17e167ce9/lib/python3.9/site-packages/pip/_internal/cli/base_command.py", '
           'line 173, in _main\n'
           '    status = self.run(options, args)\n'
           '  File "/tmp/datahub/ingest/venv-ffb67bca-4da2-40f2-b846-34c17e167ce9/lib/python3.9/site-packages/pip/_internal/cli/req_command.py", '
           'line 203, in wrapper\n'
           '    return func(self, options, args)\n'
           '  File "/tmp/datahub/ingest/venv-ffb67bca-4da2-40f2-b846-34c17e167ce9/lib/python3.9/site-packages/pip/_internal/commands/install.py", '
           'line 315, in run\n'
           '    requirement_set = resolver.resolve(\n'
           '  File '
           '"/tmp/datahub/ingest/venv-ffb67bca-4da2-40f2-b846-34c17e167ce9/lib/python3.9/site-packages/pip/_internal/resolution/resolvelib/resolver.py", '
           'line 94, in resolve\n'
           '    result = self._result = resolver.resolve(\n'
           '  File "/tmp/datahub/ingest/venv-ffb67bca-4da2-40f2-b846-34c17e167ce9/lib/python3.9/site-packages/pip/_vendor/resolvelib/resolvers.py", '
           'line 472, in resolve\n'
           '    state = resolution.resolve(requirements, max_rounds=max_rounds)\n'
           '  File "/tmp/datahub/ingest/venv-ffb67bca-4da2-40f2-b846-34c17e167ce9/lib/python3.9/site-packages/pip/_vendor/resolvelib/resolvers.py", '
           'line 341, in resolve\n'
           '    self._add_to_criteria(self.state.criteria, r, parent=None)\n'
           '  File "/tmp/datahub/ingest/venv-ffb67bca-4da2-40f2-b846-34c17e167ce9/lib/python3.9/site-packages/pip/_vendor/resolvelib/resolvers.py", '
           'line 172, in _add_to_criteria\n'
           '    if not criterion.candidates:\n'
           '  File "/tmp/datahub/ingest/venv-ffb67bca-4da2-40f2-b846-34c17e167ce9/lib/python3.9/site-packages/pip/_vendor/resolvelib/structs.py", '
           'line 151, in __bool__\n'
           '    return bool(self._sequence)\n'
           '  File '
           '"/tmp/datahub/ingest/venv-ffb67bca-4da2-40f2-b846-34c17e167ce9/lib/python3.9/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", '
           'line 140, in __bool__\n'
           '    return any(self)\n'
           '  File '
           '"/tmp/datahub/ingest/venv-ffb67bca-4da2-40f2-b846-34c17e167ce9/lib/python3.9/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", '
           'line 128, in <genexpr>\n'
           '    return (c for c in iterator if id(c) not in self._incompatible_ids)\n'
           '  File '
           '"/tmp/datahub/ingest/venv-ffb67bca-4da2-40f2-b846-34c17e167ce9/lib/python3.9/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", '
           'line 29, in _iter_built\n'
           '    for version, func in infos:\n'
           '  File '
           '"/tmp/datahub/ingest/venv-ffb67bca-4da2-40f2-b846-34c17e167ce9/lib/python3.9/site-packages/pip/_internal/resolution/resolvelib/factory.py", '
           'line 272, in iter_index_candidate_infos\n'
           '    result = self._finder.find_best_candidate(\n'
           '  File '
           '"/tmp/datahub/ingest/venv-ffb67bca-4da2-40f2-b846-34c17e167ce9/lib/python3.9/site-packages/pip/_internal/index/package_finder.py", line '
           '851, in find_best_candidate\n'
           '    candidates = self.find_all_candidates(project_name)\n'
           '  File '
           '"/tmp/datahub/ingest/venv-ffb67bca-4da2-40f2-b846-34c17e167ce9/lib/python3.9/site-packages/pip/_internal/index/package_finder.py", line '
           '798, in find_all_candidates\n'
           '    page_candidates = list(page_candidates_it)\n'
           '  File "/tmp/datahub/ingest/venv-ffb67bca-4da2-40f2-b846-34c17e167ce9/lib/python3.9/site-packages/pip/_internal/index/sources.py", line '
           '134, in page_candidates\n'
           '    yield from self._candidates_from_page(self._link)\n'
           '  File '
           '"/tmp/datahub/ingest/venv-ffb67bca-4da2-40f2-b846-34c17e167ce9/lib/python3.9/site-packages/pip/_internal/index/package_finder.py", line '
           '758, in process_project_url\n'
           '    html_page = self._link_collector.fetch_page(project_url)\n'
           '  File "/tmp/datahub/ingest/venv-ffb67bca-4da2-40f2-b846-34c17e167ce9/lib/python3.9/site-packages/pip/_internal/index/collector.py", '
           'line 490, in fetch_page\n'
           '    return _get_html_page(location, session=self.session)\n'
           '  File "/tmp/datahub/ingest/venv-ffb67bca-4da2-40f2-b846-34c17e167ce9/lib/python3.9/site-packages/pip/_internal/index/collector.py", '
           'line 400, in _get_html_page\n'
           '    resp = _get_html_response(url, session=session)\n'
           '  File "/tmp/datahub/ingest/venv-ffb67bca-4da2-40f2-b846-34c17e167ce9/lib/python3.9/site-packages/pip/_internal/index/collector.py", '
           'line 115, in _get_html_response\n'
           '    resp = session.get(\n'
           '  File "/tmp/datahub/ingest/venv-ffb67bca-4da2-40f2-b846-34c17e167ce9/lib/python3.9/site-packages/pip/_vendor/requests/sessions.py", '
           'line 555, in get\n'
           "    return self.request('GET', url, **kwargs)\n"
           '  File "/tmp/datahub/ingest/venv-ffb67bca-4da2-40f2-b846-34c17e167ce9/lib/python3.9/site-packages/pip/_internal/network/session.py", '
           'line 454, in request\n'
           '    return super().request(method, url, *args, **kwargs)\n'
           '  File "/tmp/datahub/ingest/venv-ffb67bca-4da2-40f2-b846-34c17e167ce9/lib/python3.9/site-packages/pip/_vendor/requests/sessions.py", '
           'line 542, in request\n'
           '    resp = self.send(prep, **send_kwargs)\n'
           '  File "/tmp/datahub/ingest/venv-ffb67bca-4da2-40f2-b846-34c17e167ce9/lib/python3.9/site-packages/pip/_vendor/requests/sessions.py", '
           'line 655, in send\n'
           '    r = adapter.send(request, **kwargs)\n'
           '  File "/tmp/datahub/ingest/venv-ffb67bca-4da2-40f2-b846-34c17e167ce9/lib/python3.9/site-packages/pip/_vendor/cachecontrol/adapter.py", '
           'line 53, in send\n'
           '    resp = super(CacheControlAdapter, self).send(request, **kw)\n'
           '  File "/tmp/datahub/ingest/venv-ffb67bca-4da2-40f2-b846-34c17e167ce9/lib/python3.9/site-packages/pip/_vendor/requests/adapters.py", '
           'line 439, in send\n'
           '    resp = conn.urlopen(\n'
           '  File '
           '"/tmp/datahub/ingest/venv-ffb67bca-4da2-40f2-b846-34c17e167ce9/lib/python3.9/site-packages/pip/_vendor/urllib3/connectionpool.py", line '
           '696, in urlopen\n'
           '    self._prepare_proxy(conn)\n'
           '  File '
           '"/tmp/datahub/ingest/venv-ffb67bca-4da2-40f2-b846-34c17e167ce9/lib/python3.9/site-packages/pip/_vendor/urllib3/connectionpool.py", line '
           '964, in _prepare_proxy\n'
           '    conn.connect()\n'
           '  File "/tmp/datahub/ingest/venv-ffb67bca-4da2-40f2-b846-34c17e167ce9/lib/python3.9/site-packages/pip/_vendor/urllib3/connection.py", '
           'line 359, in connect\n'
           '    conn = self._connect_tls_proxy(hostname, conn)\n'
           '  File "/tmp/datahub/ingest/venv-ffb67bca-4da2-40f2-b846-34c17e167ce9/lib/python3.9/site-packages/pip/_vendor/urllib3/connection.py", '
           'line 500, in _connect_tls_proxy\n'
           '    return ssl_wrap_socket(\n'
           '  File "/tmp/datahub/ingest/venv-ffb67bca-4da2-40f2-b846-34c17e167ce9/lib/python3.9/site-packages/pip/_vendor/urllib3/util/ssl_.py", '
           'line 453, in ssl_wrap_socket\n'
           '    ssl_sock = _ssl_wrap_socket_impl(sock, context, tls_in_tls)\n'
           '  File "/tmp/datahub/ingest/venv-ffb67bca-4da2-40f2-b846-34c17e167ce9/lib/python3.9/site-packages/pip/_vendor/urllib3/util/ssl_.py", '
           'line 495, in _ssl_wrap_socket_impl\n'
           '    return ssl_context.wrap_socket(sock)\n'
           '  File "/usr/local/lib/python3.9/ssl.py", line 500, in wrap_socket\n'
           '    return self.sslsocket_class._create(\n'
           '  File "/usr/local/lib/python3.9/ssl.py", line 997, in _create\n'
           '    raise ValueError("check_hostname requires server_hostname")\n'
           'ValueError: check_hostname requires server_hostname\n'
           '/tmp/datahub/ingest/venv-ffb67bca-4da2-40f2-b846-34c17e167ce9/bin/python3: No module named datahub\n',
           "2022-09-15 07:35:08.306472 [exec_id=ffb67bca-4da2-40f2-b846-34c17e167ce9] INFO: Failed to execute 'datahub ingest'",
           '2022-09-15 07:35:08.307260 [exec_id=ffb67bca-4da2-40f2-b846-34c17e167ce9] INFO: Caught exception EXECUTING '
           'task_id=ffb67bca-4da2-40f2-b846-34c17e167ce9, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
           '  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/default_executor.py", line 121, in execute_task\n'
           '    self.event_loop.run_until_complete(task_future)\n'
           '  File "/usr/local/lib/python3.9/site-packages/nest_asyncio.py", line 89, in run_until_complete\n'
           '    return f.result()\n'
           '  File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result\n'
           '    raise self._exception\n'
           '  File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step\n'
           '    result = coro.send(None)\n'
           '  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 115, in execute\n'
           '    raise TaskError("Failed to execute \'datahub ingest\'")\n'
           "acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"]}
Execution finished with errors.
p
The main point: look at the hostname. Maybe there is an error in the name?
check_hostname requires server_hostname
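For context, the last frames of the traceback show this message being raised by Python's own `ssl` module, not by the recipe parser. A minimal sketch of how the standard library produces it, assuming only stock Python 3 (the function name `reproduce_check_hostname_error` is made up for illustration):

```python
import socket
import ssl


def reproduce_check_hostname_error() -> str:
    """Wrap a socket without server_hostname and return the error text."""
    # create_default_context() enables hostname checking by default.
    context = ssl.create_default_context()
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # No server_hostname is passed, so ssl refuses before any
        # network traffic happens.
        context.wrap_socket(sock)
    except ValueError as exc:
        return str(exc)
    finally:
        sock.close()
    return "no error raised"


print(reproduce_check_hostname_error())
```

So the error means some caller opened a TLS connection without telling `ssl` which host it expected. In the log above that caller is pip's vendored urllib3, inside `_connect_tls_proxy`.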
n
Where to check that?
p
The source or GMS hostname.
n
I have specified the correct hostname in the ingestion YAML.
Is this a problem with the recipe YAML, or is it something else?
p
In the recipe YAML, I'd guess. urllib3 likes names like 'http://blah.blah'. Anyway, the ValueError says that server_hostname is not a hostname.
n
source:
  type: mongodb
  config:
    # Coordinates
    connect_uri: 'mongo-0.mongo-svc.telco-datastorage-mvp.svc.cluster.local:27017'
    # Credentials
    # Add secret in Secrets Tab with relevant names for each variable
    username:
    password:
    # profiling:
    #   enabled: true
    #   include_field_null_count: true
    #   include_field_min_value: true
    #   include_field_max_value: true
    #   include_field_mean_value: true
    #   include_field_median_value: true
    #   include_field_stddev_value: true
    #   include_field_quantiles: false
    #   include_field_distinct_value_frequencies:
    #   include_field_histogram: true
    #   include_field_sample_values: false
    #   query_combiner_enabled: true
    #   max_number_of_fields_to_profile:
    #   profile_table_level_only:
    #   limit:
    #   offset:
    # Options (recommended)
    enableSchemaInference: True
    useRandomSampling: True
    maxSchemaSize: 300
sink:
  type: datahub-rest
  config:
    server: 'http://datahub-mvp-datahub-gms.telco-dataprocessing-mvp:8080'
    # Add a secret in secrets Tab
    token:
This is the YAML. I can't see any issue in it.
p
connect_uri: 'mongodb://<host>:<port>', I'd guess
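A quick way to check the suggestion above is to test whether the URI starts with a MongoDB connection-string scheme. This is only a sketch: the helper name `has_mongodb_scheme` is made up, and it checks the prefix only, not whether the host is reachable.

```python
def has_mongodb_scheme(uri: str) -> bool:
    """Return True if the URI starts with a MongoDB connection-string scheme."""
    return uri.startswith(("mongodb://", "mongodb+srv://"))


# The value from the recipe above has no scheme.
recipe_uri = "mongo-0.mongo-svc.telco-datastorage-mvp.svc.cluster.local:27017"
print(has_mongodb_scheme(recipe_uri))
print(has_mongodb_scheme("mongodb://" + recipe_uri))
```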
n
The same YAML works fine when I trigger it from the CLI. It only has an issue in the UI.
I have also tested it with the following URI: connect_uri: 'mongodb://mongo-0.mongo-svc.telco-datastorage-mvp.svc.cluster.local:27017' and I am still getting the same error:
ValueError: check_hostname requires server_hostname\n'
p
So, another idea, since it works fine from the CLI but not from the UI: is sink.config.token set?
n
No, the token is not set. How do I do that?
h
Hey @numerous-account-62719 - gentle reminder to make use of threads when posting large blocks of code/stack trace.
Also, if you are ingesting using the UI, you can omit the sink config from the recipe.
n
Still getting the same error when I skipped the sink config
'ValueError: check_hostname requires server_hostname\n'
h
Okay, so the issue seems to be with the source config itself. When you mentioned it's working from the CLI, which container did you run it from? UI ingestion runs on the datahub-actions container, so you'll need to confirm network connectivity to the MongoDB service from there.
n
I used acryl-actions for ingesting the data through the CLI. The acryl pod has network connectivity as well.
h
Okay. Are you able to connect to the mongo server from the datahub-actions container? Can you try using the mongo shell client?
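If no mongo shell is installed in the container, a plain TCP check from Python answers the same reachability question. A minimal sketch, assuming Python is available in the datahub-actions container (the helper name `can_connect` is made up, and it only proves the port accepts connections, not that MongoDB authentication works):

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

From inside the container, this could be called as `can_connect("mongo-0.mongo-svc.telco-datastorage-mvp.svc.cluster.local", 27017)`, using the host and port from the recipe.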
p
@hundreds-photographer-13496 Why do you think the problem is in the connection, if the stack trace explicitly says that the problem is in the variable check? A hostname is used in only two places, and we have already excluded the mongo host. That leaves only the request to GMS, which fails without the token.
n
@hundreds-photographer-13496 Yes the container is able to curl to mongo.org
@purple-balloon-66501 are you sure the issue is with the token?
h
Hmm, after re-reading the error log, I see this error:
'/tmp/datahub/ingest/venv-ffb67bca-4da2-40f2-b846-34c17e167ce9/bin/python3: No module named datahub\n',
Looks like the failure is even before it tries to connect to mongodb. Probably some pip install is failing.
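One way to confirm this reading is to ask the interpreter whether it can find the `datahub` package at all. A minimal sketch, assuming it is run with the same Python interpreter the ingestion uses (the helper name `module_available` is made up):

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the current interpreter can locate a top-level module."""
    return importlib.util.find_spec(name) is not None


print("datahub importable:", module_available("datahub"))
```

If this prints `False` in the ingestion venv, the package was never installed there, which matches the `No module named datahub` line in the log.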
p
So we can check it with a cron schedule. @numerous-account-62719 can you schedule an ingest ASAP? If it works fine with the schedule, go to the acryl pod and check the connection to pypi.org.
n
The ingestion is triggering on schedule, but all the executions have the same error.
@hundreds-photographer-13496 I went through that link, but we need a proxy in order to connect to the internet. The k8s deployment pod does not have internet access by default. We need to use a proxy.
@hundreds-photographer-13496 @purple-balloon-66501 @little-megabyte-1074 @dazzling-judge-80093 Can you guys please help me out in resolving this issue?
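The traceback earlier in the thread passes through urllib3's `_prepare_proxy` and `_connect_tls_proxy`. That suggests pip is being pointed at a proxy whose URL uses the `https://` scheme. This is an inference from the stack trace, not something confirmed in this thread. A small sketch that lists proxy environment variables using that scheme, assuming the proxy is configured through the usual environment variables (the helper name `find_tls_proxies` is made up):

```python
import os
from urllib.parse import urlsplit

# Environment variables commonly read by pip/requests for proxy settings.
PROXY_VARS = ("HTTPS_PROXY", "https_proxy", "HTTP_PROXY", "http_proxy")


def find_tls_proxies(environ=None) -> dict:
    """Return the proxy variables whose URL uses the https:// scheme."""
    environ = os.environ if environ is None else environ
    flagged = {}
    for name in PROXY_VARS:
        value = environ.get(name, "")
        if value and urlsplit(value).scheme.lower() == "https":
            flagged[name] = value
    return flagged


print(find_tls_proxies())
```

If this reports a variable inside the datahub-actions pod and the proxy actually speaks plain HTTP, changing the scheme to `http://` is a commonly suggested remedy for this error. Whether it applies to this deployment is unverified.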
h
n
Hi @hundreds-photographer-13496, I tried this but had no success. I am still getting the same error.
h
Hey @numerous-account-62719, is it possible for you to open internet access for the datahub-actions pod?