# ingestion
n
Hi everyone! We're testing DataHub features and a few questions came up. I'm not sure this is the right channel for them, but I hope someone can help :)
1. Where should we create a recipe file: in the terminal or elsewhere?
   a. If in the terminal, what command should we use to create the recipe? The following doesn't seem to work (see screenshots in thread).
   b. If the recipe is set up in the UI: when we create it there, we can't see the results. What might we be doing wrong? (Also see screenshots in thread.)
s
A recipe is just a file that you create with a text editor. Alternatively, you can download an example to your computer from https://github.com/linkedin/datahub/tree/master/metadata-ingestion/examples/recipes
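As a minimal sketch of what such a recipe might contain, the snippet below writes a small recipe file from Python. The file name `recipe.yml`, the `mysql` source, and every connection value are placeholders chosen for illustration, not values from this thread. The `datahub-rest` sink pointing at `http://localhost:8080` assumes a default local quickstart.

```python
from pathlib import Path

# Hypothetical recipe: a MySQL source pushed to a local DataHub instance.
# Every value under `config` is a placeholder; replace it with your own.
RECIPE = """\
source:
  type: mysql
  config:
    host_port: localhost:3306
    database: my_database
    username: my_user
    password: my_password
sink:
  type: datahub-rest
  config:
    server: http://localhost:8080
"""


def write_recipe(path: str = "recipe.yml") -> Path:
    """Write the recipe text to `path` and return the resulting Path."""
    target = Path(path)
    target.write_text(RECIPE, encoding="utf-8")
    return target


if __name__ == "__main__":
    print(f"Wrote {write_recipe()}")
```

The resulting file can then be passed to the CLI with `datahub ingest -c recipe.yml`.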
For the 2nd question, can you share the output of `docker ps -a`? There should be a container with `action` in its name. We need to check whether it is running properly for you.
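If reading the full `docker ps -a` table is awkward, the check could be scripted roughly like this. It is only a sketch: the function names are made up for illustration, and it assumes the `docker` CLI is on the PATH and that running containers report a status beginning with "Up".

```python
import subprocess


def list_containers() -> str:
    """Return `docker ps -a` output as tab-separated name and status lines."""
    result = subprocess.run(
        ["docker", "ps", "-a", "--format", "{{.Names}}\t{{.Status}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def actions_status(ps_output: str) -> dict[str, bool]:
    """Map each container with 'action' in its name to whether it is up."""
    status = {}
    for line in ps_output.splitlines():
        if "\t" not in line:
            continue  # skip blank or malformed lines
        name, state = line.split("\t", 1)
        if "action" in name:
            # Assumption: `docker ps` shows running containers as "Up ...".
            status[name] = state.startswith("Up")
    return status


if __name__ == "__main__":
    for name, running in actions_status(list_containers()).items():
        print(f"{name}: {'running' if running else 'not running'}")
```

A container that shows up as not running is the one whose logs are worth reading next.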
n
@square-activity-64562 here it is
s
You would need to check the actions container; it should be running. Maybe check its logs, or re-run the quickstart.
n
Where can I find the logs?
s
docker logs datahub_datahub-actions_1
s
datahub docker quickstart
No Datahub Neo4j volume found, starting with elasticsearch as graph service.
To use neo4j as a graph backend, run
`datahub docker quickstart --quickstart-compose-file ./docker/quickstart/docker-compose.quickstart.yml`
from the root of the datahub repo

Fetching docker-compose file https://raw.githubusercontent.com/linkedin/datahub/master/docker/quickstart/docker-compose-without-neo4j.quickstart.yml from GitHub
Pulling elasticsearch          ... done
Pulling elasticsearch-setup    ... done
Pulling mysql                  ... done
Pulling datahub-gms            ... done
Pulling datahub-frontend-react ... done
Pulling datahub-actions        ... done
Pulling mysql-setup            ... done
Pulling zookeeper              ... done
Pulling broker                 ... done
Pulling schema-registry        ... done
Pulling kafka-setup            ... done

zookeeper is up-to-date
Recreating mysql ...
elasticsearch is up-to-date
broker is up-to-date
Recreating mysql               ... done
Recreating elasticsearch-setup ... done
Recreating kafka-setup         ... done
Recreating datahub-gms         ... done
Recreating mysql-setup         ... done
Recreating datahub-frontend-react    ... done
Recreating datahub_datahub-actions_1 ... done
...........
mysql is up-to-date
zookeeper is up-to-date
Starting mysql-setup ...
elasticsearch is up-to-date
Starting elasticsearch-setup ...
datahub-gms is up-to-date
broker is up-to-date
datahub-frontend-react is up-to-date
Starting mysql-setup         ... done
Starting elasticsearch-setup ... done
Starting kafka-setup         ... done
.........
✔ DataHub is now running
Ingest some demo data using `datahub docker ingest-sample-data`,
or head to http://localhost:9002 (username: datahub, password: datahub) to play around with the frontend.
docker logs datahub_datahub-actions_1
2022/03/03 14:43:40 Waiting for: http://datahub-gms:8080/health
2022/03/03 14:43:40 Problem with request: Get "http://datahub-gms:8080/health": dial tcp 172.18.0.7:8080: connect: connection refused. Sleeping 1s
2022/03/03 14:43:41 Problem with request: Get "http://datahub-gms:8080/health": dial tcp 172.18.0.7:8080: connect: connection refused. Sleeping 1s
2022/03/03 14:43:42 Problem with request: Get "http://datahub-gms:8080/health": dial tcp 172.18.0.7:8080: connect: connection refused. Sleeping 1s
2022/03/03 14:43:43 Problem with request: Get "http://datahub-gms:8080/health": dial tcp 172.18.0.7:8080: connect: connection refused. Sleeping 1s
2022/03/03 14:43:44 Problem with request: Get "http://datahub-gms:8080/health": dial tcp 172.18.0.7:8080: connect: connection refused. Sleeping 1s
2022/03/03 14:43:45 Problem with request: Get "http://datahub-gms:8080/health": dial tcp 172.18.0.7:8080: connect: connection refused. Sleeping 1s
2022/03/03 14:43:46 Problem with request: Get "http://datahub-gms:8080/health": dial tcp 172.18.0.7:8080: connect: connection refused. Sleeping 1s
2022/03/03 14:43:47 Problem with request: Get "http://datahub-gms:8080/health": dial tcp 172.18.0.7:8080: connect: connection refused. Sleeping 1s
2022/03/03 14:43:48 Problem with request: Get "http://datahub-gms:8080/health": dial tcp 172.18.0.7:8080: connect: connection refused. Sleeping 1s
2022/03/03 14:43:49 Problem with request: Get "http://datahub-gms:8080/health": dial tcp 172.18.0.7:8080: connect: connection refused. Sleeping 1s
2022/03/03 14:43:50 Problem with request: Get "http://datahub-gms:8080/health": dial tcp 172.18.0.7:8080: connect: connection refused. Sleeping 1s
2022/03/03 14:43:51 Problem with request: Get "http://datahub-gms:8080/health": dial tcp 172.18.0.7:8080: connect: connection refused. Sleeping 1s
2022/03/03 14:43:52 Problem with request: Get "http://datahub-gms:8080/health": dial tcp 172.18.0.7:8080: connect: connection refused. Sleeping 1s
2022/03/03 14:43:53 Problem with request: Get "http://datahub-gms:8080/health": dial tcp 172.18.0.7:8080: connect: connection refused. Sleeping 1s
2022/03/03 14:43:54 Problem with request: Get "http://datahub-gms:8080/health": dial tcp 172.18.0.7:8080: connect: connection refused. Sleeping 1s
2022/03/03 14:43:55 Problem with request: Get "http://datahub-gms:8080/health": dial tcp 172.18.0.7:8080: connect: connection refused. Sleeping 1s
2022/03/03 14:43:56 Problem with request: Get "http://datahub-gms:8080/health": dial tcp 172.18.0.7:8080: connect: connection refused. Sleeping 1s
2022/03/03 14:43:57 Problem with request: Get "http://datahub-gms:8080/health": dial tcp 172.18.0.7:8080: connect: connection refused. Sleeping 1s
2022/03/03 14:43:58 Problem with request: Get "http://datahub-gms:8080/health": dial tcp 172.18.0.7:8080: connect: connection refused. Sleeping 1s
2022/03/03 14:43:59 Received 200 from http://datahub-gms:8080/health
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/urllib3/connection.py", line 174, in _new_conn
    conn = connection.create_connection(
  File "/usr/local/lib/python3.9/site-packages/urllib3/util/connection.py", line 95, in create_connection
    raise err
  File "/usr/local/lib/python3.9/site-packages/urllib3/util/connection.py", line 85, in create_connection
    sock.connect(sa)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 386, in _make_request
    self._validate_conn(conn)
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 1040, in _validate_conn
    conn.connect()
  File "/usr/local/lib/python3.9/site-packages/urllib3/connection.py", line 358, in connect
    conn = self._new_conn()
  File "/usr/local/lib/python3.9/site-packages/urllib3/connection.py", line 179, in _new_conn
    raise ConnectTimeoutError(
urllib3.exceptions.ConnectTimeoutError: (<urllib3.connection.HTTPSConnection object at 0x7f3c7b495d60>, 'Connection to api.mixpanel.com timed out. (connect timeout=10)')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/requests/adapters.py", line 440, in send
    resp = conn.urlopen(
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 785, in urlopen
    retries = retries.increment(
  File "/usr/local/lib/python3.9/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='api.mixpanel.com', port=443): Max retries exceeded with url: /engage (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f3c7b495d60>, 'Connection to api.mixpanel.com timed out. (connect timeout=10)'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/mixpanel/__init__.py", line 615, in _write_request
    response = self._session.post(
  File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 577, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 529, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 645, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/requests/adapters.py", line 507, in send
    raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='api.mixpanel.com', port=443): Max retries exceeded with url: /engage (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f3c7b495d60>, 'Connection to api.mixpanel.com timed out. (connect timeout=10)'))

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/datahub", line 5, in <module>
    from datahub.entrypoints import main
  File "/usr/local/lib/python3.9/site-packages/datahub/entrypoints.py", line 11, in <module>
    from datahub.cli.delete_cli import delete
  File "/usr/local/lib/python3.9/site-packages/datahub/cli/delete_cli.py", line 21, in <module>
    from datahub.telemetry import telemetry
  File "/usr/local/lib/python3.9/site-packages/datahub/telemetry/telemetry.py", line 147, in <module>
    telemetry_instance = Telemetry()
  File "/usr/local/lib/python3.9/site-packages/datahub/telemetry/telemetry.py", line 46, in __init__
    mp.people_set(
  File "/usr/local/lib/python3.9/site-packages/mixpanel/__init__.py", line 238, in people_set
    return self.people_update({
  File "/usr/local/lib/python3.9/site-packages/mixpanel/__init__.py", line 392, in people_update
    self._consumer.send('people', json_dumps(record, cls=self._serializer))
  File "/usr/local/lib/python3.9/site-packages/mixpanel/__init__.py", line 594, in send
    self._write_request(self._endpoints[endpoint], json_message, api_key, api_secret)
  File "/usr/local/lib/python3.9/site-packages/mixpanel/__init__.py", line 623, in _write_request
    six.raise_from(MixpanelException(e), e)
  File "<string>", line 3, in raise_from
mixpanel.MixpanelException: HTTPSConnectionPool(host='api.mixpanel.com', port=443): Max retries exceeded with url: /engage (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f3c7b495d60>, 'Connection to api.mixpanel.com timed out. (connect timeout=10)'))
2022/03/03 14:44:39 Command exited with error: exit status 1
How can I resolve this problem?
s
This should not be happening. I have raised a PR to fix it.
In the meantime, if you re-run the container and it manages to connect to Mixpanel for telemetry, it should not crash. I understand this is not ideal. Once the PR is merged and a new release is made, this should stop happening.
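Until then, the re-run workaround could be scripted roughly as follows. This is only a sketch: the container name is taken from the logs above, while the function names, the retry count, and the 60-second wait are assumptions for illustration. It relies on `docker restart` and `docker inspect`, and does nothing about the underlying telemetry timeout.

```python
import subprocess
import time
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], str]


def run(cmd: Sequence[str]) -> str:
    """Run a command and return its stripped stdout."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


def restart_until_running(
    container: str,
    attempts: int = 3,
    wait_s: float = 60.0,
    runner: Runner = run,
) -> bool:
    """Restart `container` up to `attempts` times; True once it stays up."""
    for _ in range(attempts):
        runner(["docker", "restart", container])
        # Give the container time to either crash or settle (assumed delay).
        time.sleep(wait_s)
        state = runner(
            ["docker", "inspect", "-f", "{{.State.Running}}", container]
        )
        if state == "true":
            return True
    return False


if __name__ == "__main__":
    ok = restart_until_running("datahub_datahub-actions_1")
    print("container is running" if ok else "container keeps exiting")
```

The `runner` parameter exists only so the logic can be exercised without Docker; in normal use the default is enough.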