# troubleshoot
a
Hi guys!! I just started trying DataHub… I’m using acryl-datahub version 0.8.16.11 with the Docker quickstart (the image tag says `head`). I’m trying to ingest data using the `linkedin/datahub-ingestion` Docker image with the following command:
```shell
docker run -v /Desktop/test:/datahub-ingestion linkedin/datahub-ingestion ingest -c ./datahub-ingestion/config.yaml
```
the config.yaml looks like this:
```yaml
source:
  type: "athena"
  config:
    # Coordinates
    aws_region: "xxx"
    work_group: "xxx"

    # Credentials
    username: "xxx"
    password: "xxx"
    database: "xxx"

    # Options
    s3_staging_dir: "s3://xxx/"

sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"  # also tried "http://datahub-gms:8080"
```
then I’ve got this error:
```
ERROR    {datahub.ingestion.run.pipeline:52} - failed to write record with workunit admincube.cubeprod with ('Unable to emit metadata to DataHub GMS', {'message': "HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: /datasets?
```
I tried `linkedin/datahub-ingestion:latest` as well as `linkedin/datahub-ingestion:head`. Any idea? Cheers!
m
@agreeable-thailand-43234: you might need to nuke your datahub-ingestion container, as it is probably tainted by `latest`, which is not the latest. `:head` is the right one to use if you want to live on the current HEAD of the repo. If you want a container that is aligned with the PyPI package releases, you can use `acryldata/datahub-ingestion:0.8.16.x`. We just started publishing them yesterday to make the Python and container release process aligned.
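For the release-aligned route, the pull-and-run step might look like the sketch below (assumes Docker is installed; `0.8.16.x` is left as a placeholder for the actual patch release, and `DRY_RUN=1`, the default here, only prints each command instead of executing it):

```shell
# Sketch: pull the release-aligned image and run ingestion against it.
# Set DRY_RUN=0 on a machine with Docker to actually execute.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

run docker pull acryldata/datahub-ingestion:0.8.16.x
run docker run -v /Desktop/test:/datahub-ingestion \
    acryldata/datahub-ingestion:0.8.16.x ingest -c ./datahub-ingestion/config.yaml
```

Pinning the image to the same version as the pip package avoids the `latest`-vs-`head` ambiguity entirely.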
a
I did try this one: `docker pull linkedin/datahub-ingestion`, which is tagged as latest
m
right.. we’ll have to nuke that one so people don’t keep tripping over it.
a
Also tried `docker pull linkedin/datahub-ingestion:head`
m
can you check to make sure it actually pulled and removed the previous image?
sometimes it will silently keep the `latest` image
the error you posted is clearly an error due to the `latest`-tagged image
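A quick way to verify is to list the cached tags, drop the stale one, and re-pull — a sketch, assuming Docker is installed (`DRY_RUN=1`, the default here, only prints the commands; set `DRY_RUN=0` to actually run them):

```shell
# Sketch: confirm which datahub-ingestion tags are cached locally,
# remove the stale `latest` tag, and freshly pull `:head`.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

run docker image ls linkedin/datahub-ingestion   # list cached tags and image IDs
run docker rmi linkedin/datahub-ingestion:latest # remove the stale tag
run docker pull linkedin/datahub-ingestion:head  # fresh pull of the HEAD build
```

Comparing the image IDs before and after the pull shows whether Docker actually fetched anything or silently reused the cached image.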
a
Yup, I removed them a few times using the `docker rm` command rather than nuke
oh my bad… I forgot to post the error using `head`:
```
[2021-11-17 23:52:31,548] ERROR    {datahub.entrypoints:101} - File "/usr/local/lib/python3.8/site-packages/urllib3/connection.py", line 174, in _new_conn
    161  def _new_conn(self):
 (...)
    170      if self.socket_options:
    171          extra_kw["socket_options"] = self.socket_options
    172  
    173      try:
--> 174          conn = connection.create_connection(
    175              (self._dns_host, self.port), self.timeout, **extra_kw
    ..................................................
     self = <urllib3.connection.HTTPConnection object at 0x7f6200ace7f0>
     self.socket_options = [(6, 1, 1, ), ]
     extra_kw = {'socket_options': [(...), ]}
     connection.create_connection = <function 'create_connection' connection.py:38>
     self._dns_host = 'localhost'
     self.port = 8080
     self.timeout = None
```
I tried using `"http://localhost:8080"` as well as `"http://datahub-gms:8080"` in the sink config.yaml
m
hmm you need to add this container to the datahub network, otherwise it won’t see the hostname
a
you mean anchor to this:
m
yeah
let me see if there is a one liner for that
a
not sure how you do it (not an expert with docker 🙈)
`docker network connect`?
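(`docker network connect` is indeed the command for attaching an *already running* container to a network; a sketch below, where `my-ingestion` is a hypothetical container name and `DRY_RUN=1`, the default here, only prints the commands — set `DRY_RUN=0` to execute:)

```shell
# Sketch: attach a running container to the quickstart's network.
# `datahub_network` is the network the quickstart compose file creates.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

run docker network ls                                    # confirm datahub_network exists
run docker network connect datahub_network my-ingestion  # attach the container to it
```

For a one-shot `docker run`, passing `--network` at launch (as suggested next) is simpler, since the container joins the network before the ingest command starts.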
m
```shell
docker run --network=datahub_network <container_name>
```
since I guess you are running this container as a one-shot command?
a
yup
should I use docker compose to attach it permanently? I spin up the containers using `datahub docker quickstart`
not sure if I’m on the right track to attach the image to the network:
```shell
docker run --network=datahub_network -v /Desktop/test:/datahub-ingestion linkedin/datahub-ingestion:head ingest -c ./datahub-ingestion/config.yaml
```
m
yup that’s what I was suggesting
did that work?
a
nope, that spins up a new container but it’s not attached to the `datahub_network`
I’ve got this error:
```
ConnectionError: HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: /config (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f90428a87f0>: Failed to establish a new connection: [Errno 111] Connection refused')
```
docker creates a container but not within the `datahub` group
m
can you set the config to “datahub-gms” for the host?
the `datahub` inside the Docker dashboard is merely the grouping due to docker-compose
a
oh hold on… I’ve changed the sink from `localhost` to `datahub-gms` and it seems to be doing something
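(For anyone hitting the same wall: the sink section that works from inside `datahub_network` presumably ends up looking like this — same config as before, with only the server host changed:)

```yaml
sink:
  type: "datahub-rest"
  config:
    # use the compose service name, which resolves inside datahub_network;
    # "localhost" inside the ingestion container points at the container itself
    server: "http://datahub-gms:8080"
```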
m
👍
a
🥳
yeeeeiii
thanks a lot @mammoth-bear-12532 🙌
m
it shouldn’t be this hard 🙂 we’ll fix our docs. Thanks for powering through!