# ingestion
e
Got an error while ingesting data in v0.8.0. Steps:
1. Checked out v0.8.0
2. Ran nuke.sh
3. Executed quickstart.sh (validated that all containers are up)
4. Validated table datahub.metadata_aspect_v2
5. Executed ingestion.sh
While ingesting, got this error:
ERROR    {datahub.ingestion.run.pipeline:52} - failed to write record with workunit file:///bootstrap_mce.json:2 with ('Unable to emit metadata to DataHub GMS', {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': "com.linkedin.restli.server.RestLiServiceException [HTTP Status:500]: javax.persistence.PersistenceException: Query threw SQLException:Table 'datahub.metadata_aspect' doesn't exist Bind values:[com.linkedin.metadata.dao.EbeanMetadataAspect$PrimaryKey@64865b96, ]
👀 1
s
Hi Shikha, try running
docker exec -i mysql sh -c 'exec mysql datahub -udatahub -pdatahub' < docker/mysql/init.sql
, as specified at the following link: https://datahubproject.io/docs/debugging/
e
Hi Vincenzo, tried the above as well and got:
docker exec -i mysql sh -c 'exec mysql datahub -udatahub -pdatahub' < docker/mysql/init.sql
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 1050 (42S01) at line 2: Table 'metadata_aspect_v2' already exists
s
So that table actually exists... could you please retry launching the ingestion?
e
Got the same error again:
[2021-06-04 13:32:23,525] ERROR    {datahub.ingestion.run.pipeline:52} - failed to write record with workunit file:///bootstrap_mce.json:3 with ('Unable to emit metadata to DataHub GMS', {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': "com.linkedin.restli.server.RestLiServiceException [HTTP Status:500]: javax.persistence.PersistenceException: Query threw SQLException:Table 'datahub.metadata_aspect' doesn't exist
It seems it is looking for datahub.metadata_aspect in the ingestion script of v0.8.0.
m
@early-hydrogen-59749: can you export DATAHUB_VERSION=v0.8.0 before running quickstart.sh?
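(For reference, a minimal sketch of that workaround, assuming the scripts live under the repo's docker/ directory as referenced elsewhere in this thread:)
export DATAHUB_VERSION=v0.8.0   # tag to pin, instead of the default head/latest
cd docker
./nuke.sh         # remove the existing containers and volumes
./quickstart.sh   # pull and start the stack at the pinned tag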
d
I faced the same error today and that worked for me. Thanks @mammoth-bear-12532!
m
Cool. @dazzling-book-76108 were you running from master or from the specific tag (v0.8.0)?
e
@mammoth-bear-12532 containers are up and running with v0.8.0. The SQLException was resolved; however, ingestion still failed with a SerializationException. https://datahubspace.slack.com/archives/C0244FHMHJQ/p1622817476015500
d
@mammoth-bear-12532 I am running directly from master.
👍 1
m
@dazzling-book-76108: you should no longer need to set the environment variable for this once you pull from master... we have fixed the issues on our end.
👍 1
f
Hi @mammoth-bear-12532, appreciate all the good work you guys are doing. I really want to get this running, but I still get this error with the latest version. I had to add the metadata_aspect table manually.
1. Ingestion gives this error:
...
ingestion    |   File "/usr/local/lib/python3.8/site-packages/avrogen/avrojson.py", line 248, in _generic_from_json
ingestion    |     result = self._union_from_json(json_obj, writers_schema, readers_schema)
ingestion    |   File "/usr/local/lib/python3.8/site-packages/avrogen/avrojson.py", line 304, in _union_from_json
ingestion    |     raise schema.AvroException('Datum union type not in schema: %s', value_type)
ingestion    | avro.schema.AvroException: ('Datum union type not in schema: %s', 'com.linkedin.pegasus2avro.common.BrowsePaths')
ingestion exited with code 1
2. Trying the no-code metadata demo and getting an error when writing an entity:
curl "<http://mdhprot601:8080/entities?action=ingest>" -X POST -H "X-RestLi-Protocol-Version:2.0.0" --data "{   "entity":{      "value":{         "com.linkedin.metadata.snapshot.ServiceSnapshot":{            "urn": "urn:li:service:mydemoservice",            "aspects":[               {                  "com.linkedin.service.ServiceInfo":{                     "description":"My demo service",                     "owner": "urn:li:corpuser:user1"                  }               },               {                  "com.linkedin.common.BrowsePaths":{                     "paths":[                        "/my/custom/browse/path1",                        "/my/custom/browse/path2"                     ]                  }               }            ]         }      }   }}"
{"exceptionClass":"com.linkedin.restli.server.RestLiServiceException","stackTrace":"com.linkedin.restli.server.RestLiServiceException [HTTP Status:404]: No root resource defined for path '/entities'\n\tat com.linkedin.restli.server.RestLiServiceException.fromThrowable(RestLiServiceException.java:315)\n\tat com.linkedin.restli.server.BaseRestLiServer.buildPreRoutingError(BaseRestLiServer.java:158)\n\tat com.linkedin.restli.server.RestRestLiServer.buildPreRoutingRestException(RestRestLiServer.java:203)\n\tat com.linkedin.restli.server.RestRestLiServer.handleResourceRequest(RestRestLiServer.java:177)\n\tat com.linkedin.restli.server.RestRestLiServer.doHandleRequest(RestRestLiServer.java:164)\n\tat com.linkedin.restli.server.RestRestLiServer.handleRequest(RestRestLiServer.java:120)\n\tat com.linkedin.restli.server.RestLiServer.handleRequest(RestLiServer.java:132)\n\tat com.linkedin.restli.server.DelegatingTransportDispatcher.handleRestRequest(DelegatingTransportDispatcher.java:70)\n\tat com.linkedin.r2.filter.transport.DispatcherRequestFilter.onRestRequest(DispatcherRequestFilter.java:70)\n\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:72)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n\tat com.linkedin.r2.filter.TimedNextFilter.onRequest(TimedNextFilter.java:55)\n\tat com.linkedin.r2.filter.transport.ServerQueryTunnelFilter.onRestRequest(ServerQueryTunnelFilter.java:58)\n\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:72)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n\tat com.linkedin.r2.filter.TimedNextFilter.onRequest(TimedNextFilter.java:55)\n\tat com.linkedin.r2.filter.message.rest.RestFilter.onRestRequest(RestFilter.java:50)\n\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:72)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n\tat com.linkedin.r2.filter.FilterChainImpl.onRestRequest(FilterChainImpl.java:96)\n\tat com.linkedin.r2.filter.transport.FilterChainDispatcher.handleRestRequest(FilterChainDispatcher.java:75)\n\tat com.linkedin.r2.util.finalizer.RequestFinalizerDispatcher.handleRestRequest(RequestFinalizerDispatcher.java:61)\n\tat com.linkedin.r2.transport.http.server.HttpDispatcher.handleRequest(HttpDispatcher.java:101)\n\tat com.linkedin.r2.transport.http.server.AbstractR2Servlet.service(AbstractR2Servlet.java:105)\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n\tat com.linkedin.restli.server.spring.ParallelRestliHttpRequestHandler.handleRequest(ParallelRestliHttpRequestHandler.java:61)\n\tat org.springframework.web.context.support.HttpRequestHandlerServlet.service(HttpRequestHandlerServlet.java:73)\n\tat 
javax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n\tat org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:852)\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:544)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:536)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1581)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1307)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:482)\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1549)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1204)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)\n\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.Server.handle(Server.java:494)\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:374)\n\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:268)\n\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)\n\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)\n\tat org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)\n\tat org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:367)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:782)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:918)\n\tat java.lang.Thread.run(Thread.java:748)\nCaused by: com.linkedin.restli.server.RoutingException: No root resource defined for path '/entities'\n\tat com.linkedin.restli.internal.server.RestLiRouter.process(RestLiRouter.java:139)\n\tat com.linkedin.restli.server.BaseRestLiServer.getRoutingResult(BaseRestLiServer.java:139)\n\tat com.linkedin.restli.server.RestRestLiServer.handleResourceRequest(RestRestLiServer.java:173)\n\t... 62 more\n","message":"No root resource defined for path '/entities'","status":404}curl: (6) Could not resolve host: demo
curl: (3) [globbing] unmatched close brace/bracket in column 76
m
@faint-hair-91313: are you running release 0.8.0 or 0.8.1 or master?
f
master
m
The error messages are a bit confusing... in terms of what you should expect to see in your MySQL: we have deprecated metadata_aspect and moved to metadata_aspect_v2. @big-carpet-38439 to keep me honest here.
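(A quick sanity check, sketched with the default quickstart MySQL credentials used earlier in this thread:)
docker exec mysql mysql -udatahub -pdatahub datahub -e "SHOW TABLES LIKE 'metadata_aspect%'"
# a migrated deployment should list metadata_aspect_v2; if GMS still queries metadata_aspect, it is running a pre-migration image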
b
That's correct
I'm looking at the above
So it looks like you're still running old containers, unfortunately
at least for GMS
f
hmm how can I check to be sure?
b
one easy way is to curl http://localhost:8080/config
if you get an error, it's going to be an old version. @green-football-43791 to confirm
g
error or 404- yep
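(For reference, a sketch of the check and the response a post-migration GMS returns, per the {"noCode":"true"} payload shown later in this thread:)
curl http://localhost:8080/config
# expected on a current (no-code) GMS: {"noCode":"true"}
# a 404 / "No root resource defined for path '/config'" means the container is an old build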
f
Yup, I get that.
curl http://mdhprot601:8080/config
{"exceptionClass":"com.linkedin.restli.server.RestLiServiceException","stackTrace":"com.linkedin.restli.server.RestLiServiceException [HTTP Status:404]: No root resource defined for path '/config'\n\tat com.linkedin.restli.server.RestLiServiceException.fromThrowable(RestLiServiceException.java:315)\n\tat com.linkedin.restli.server.BaseRestLiServer.buildPreRoutingError(BaseRestLiServer.java:158)\n\tat com.linkedin.restli.server.RestRestLiServer.buildPreRoutingRestException(RestRestLiServer.java:203)\n\tat com.linkedin.restli.server.RestRestLiServer.handleResourceRequest(RestRestLiServer.java:177)\n\
So what am I doing wrong, should I not pull from master?
[gmarin@mdhprot601 datahub]$ datahub --version
acryl-datahub, version 0.3.3
g
you should be on latest master
what happens when you run
git log
?
f
git log
commit 31eae2430087de0d4c9d0fda46f050e578bb4da1 (HEAD -> master, origin/master, origin/HEAD)
Author: Harshal Sheth <hsheth2@gmail.com>
Date:   Fri Jun 4 18:19:11 2021 -0700

    fix(ingest): support mssql encryption via ODBC (#2657)

commit 051fa253d1d7d8ce366f170d1ff417b966734443
Author: Harshal Sheth <hsheth2@gmail.com>
Date:   Fri Jun 4 18:12:26 2021 -0700

    fix(docker): use debug tag in local dev images (#2658)

commit e73c6525e9abaa1674f3d5bd3dee14ff97a5bff2
Author: Harshal Sheth <hsheth2@gmail.com>
Date:   Fri Jun 4 18:11:36 2021 -0700

    revert: "fix(docker): pin containers to golden hash for release (#2654)" (#2659)

    This reverts commit a483933eab6d072ba6d2a414e378eee1e80dbdc6 and moves
    us back to using HEAD in quickstart on the master branch.

commit b1ff56f606aa2c28e74e4846a39906696ac67900
Author: Gabe Lyons <itsgabelyons@gmail.com>
Date:   Fri Jun 4 17:54:17 2021 -0700

    docs(nocode): Adding documentation for no-migration upgrade option (#2656)

commit ee454eb5d60a8948e305a04793b1a12bb286a1cb
Author: Dexter Lee <dexter@acryl.io>
Date:   Fri Jun 4 15:13:32 2021 -0700
b
And how are you deploying?
quickstart?
f
Yes
I hardcoded the DATAHUB_VERSION to 0.8.0 in the quickstart. It seemed that the default way was not working. Trying to hardcode it to 0.8.1 to see if I get any issues.
Same problem ...
b
It should be pinned as
export DATAHUB_VERSION=${DATAHUB_VERSION:-9829576}
I wonder if docker isn't refreshing your images or something
inside quickstart
if you ./nuke.sh and then ./quickstart.sh again, let's see if that works
pretty stumped as to why you're running with old images...
f
On-prem installation, nexus repo, maybe something to do with that...
$ sudo docker images
REPOSITORY                                            TAG       IMAGE ID       CREATED         SIZE
acryldata/datahub-upgrade                             head      311bf190e60e   2 days ago      283MB
linkedin/datahub-frontend-react                       latest    0fe49e21548a   4 days ago      579MB
linkedin/datahub-ingestion                            latest    b0a4af2ae320   4 days ago      1.57GB
linkedin/datahub-mce-consumer                         latest    86807c429e32   4 days ago      188MB
linkedin/datahub-mae-consumer                         latest    3543ba03c201   4 days ago      197MB
linkedin/datahub-gms                                  latest    57a94e065b5b   4 days ago      275MB
linkedin/datahub-kafka-setup                          latest    18a457859322   4 days ago      598MB
linkedin/datahub-elasticsearch-setup                  latest    2f8dfeb10834   4 days ago      14.3MB
mnexus001:8082/linkedin/datahub-frontend-react        latest    9d42ddf6fbed   13 days ago     579MB
mnexus001:8082/linkedin/datahub-mae-consumer          latest    bd191ea8c497   13 days ago     197MB
mnexus001:8082/linkedin/datahub-gms                   latest    8213ffb37dc9   13 days ago     275MB
mnexus001:8082/linkedin/datahub-mce-consumer          latest    bd59349e8c8f   13 days ago     188MB
mnexus001:8082/linkedin/datahub-frontend-react        <none>    e047c58e2251   2 weeks ago     579MB
mnexus001:8082/linkedin/datahub-mae-consumer          <none>    26d1a8106171   2 weeks ago     197MB
mnexus001:8082/linkedin/datahub-gms                   <none>    eaf6799ffb11   2 weeks ago     275MB
mnexus001:8082/linkedin/datahub-mce-consumer          <none>    b54646549ee3   2 weeks ago     188MB
mnexus001:8082/linkedin/datahub-frontend-react        <none>    e448478b8329   2 weeks ago     579MB
linkedin/datahub-ingestion                            <none>    ca905fa5f5c2   2 weeks ago     1.56GB
mnexus001:8082/linkedin/datahub-gms                   <none>    e24ebec15af2   2 weeks ago     275MB
mnexus001:8082/linkedin/datahub-mae-consumer          <none>    57257b33a7bf   2 weeks ago     197MB
mnexus001:8082/linkedin/datahub-mce-consumer          <none>    9306856493fe   2 weeks ago     188MB
neo4j                                                 4.0.6     520a5ddf4f6d   3 weeks ago     542MB
mnexus001:8082/neo4j                                  4.0.6     520a5ddf4f6d   3 weeks ago     542MB
mnexus001:8082/linkedin/datahub-kafka-setup           latest    606ea4fbf9a6   3 weeks ago     598MB
mnexus001:8082/linkedin/datahub-elasticsearch-setup   latest    2ffdf0c427a8   3 weeks ago     14.3MB
Nuked, now quickstart is running (put datahub version back)
Still not working ... 🤐
m
@faint-hair-91313: to simplify things, you could just run on master for now... there isn't a strong reason to pin to a release unless you are running in production. So I would just unset DATAHUB_VERSION.
for the acryl-datahub pip package, I believe the latest is 0.8.1.0 (https://pypi.org/project/acryl-datahub/)
f
pip freeze
acryl-datahub==0.3.3
m
Your nexus images look worrying... any way you can remove them and just use the images from linkedin/ and acryldata/?
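(A sketch of clearing the mirrored copies, assuming the mnexus001:8082 registry prefix shown in the image listing above:)
# untag the Nexus-mirrored images so compose has to pull the public linkedin/ and acryldata/ ones
sudo docker images --format '{{.Repository}}:{{.Tag}}' | grep '^mnexus001:8082/' | grep -v '<none>' | xargs -r sudo docker rmi
sudo docker image prune -f   # clean up any now-dangling layers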
f
It shows this. I would like the no-code metadata feature. I did a cleanup of my docker images and containers. Will try again.
m
pip install acryl-datahub --upgrade
should do the trick
but it does seem like you are running some old containers that are pre-release
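(A sketch of the upgrade plus a quick check, using the datahub CLI command that appears later in this thread:)
pip install --upgrade acryl-datahub
datahub version   # should now report 0.8.1.0 rather than 0.3.3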
f
Trying to clean up and force it to pull new containers.
Didn't work ... removed the entire datahub project. Started from scratch ... don't understand what is happening...
b
Same images showing when doing docker image ls?
I'm fairly certain it's an issue with your pulling of public images
f
I'm starting to believe that, too
$ sudo docker image ls
REPOSITORY                                       TAG       IMAGE ID       CREATED         SIZE
linkedin/datahub-frontend-react                  latest    0fe49e21548a   4 days ago      579MB
linkedin/datahub-mce-consumer                    latest    86807c429e32   4 days ago      188MB
linkedin/datahub-mae-consumer                    latest    3543ba03c201   4 days ago      197MB
linkedin/datahub-gms                             latest    57a94e065b5b   4 days ago      275MB
linkedin/datahub-kafka-setup                     latest    18a457859322   4 days ago      598MB
linkedin/datahub-elasticsearch-setup             latest    2f8dfeb10834   4 days ago      14.3MB
mnexus001:8082/neo4j                             4.0.6     520a5ddf4f6d   3 weeks ago     542MB
neo4j                                            4.0.6     520a5ddf4f6d   3 weeks ago     542MB
mysql                                            5.7       2c9028880e58   3 weeks ago     447MB
kibana                                           7.9.3     f9f7fac59a10   7 months ago    1.18GB
mnexus001:8082/kibana                            7.9.3     f9f7fac59a10   7 months ago    1.18GB
elasticsearch                                    7.9.3     1ab13f928dc8   7 months ago    742MB
mnexus001:8082/elasticsearch                     7.9.3     1ab13f928dc8   7 months ago    742MB
confluentinc/cp-kafka-rest                       5.4.0     9dbb7f03f2c7   17 months ago   1.06GB
mnexus001:8082/confluentinc/cp-kafka-rest        5.4.0     9dbb7f03f2c7   17 months ago   1.06GB
confluentinc/cp-kafka                            5.4.0     7fa4a6c57613   17 months ago   598MB
mnexus001:8082/confluentinc/cp-kafka             5.4.0     7fa4a6c57613   17 months ago   598MB
confluentinc/cp-schema-registry                  5.4.0     27756bdebb20   17 months ago   1.07GB
mnexus001:8082/confluentinc/cp-schema-registry   5.4.0     27756bdebb20   17 months ago   1.07GB
confluentinc/cp-zookeeper                        5.4.0     d834c6b4b3dc   17 months ago   598MB
mnexus001:8082/confluentinc/cp-zookeeper         5.4.0     d834c6b4b3dc   17 months ago   598MB
landoop/schema-registry-ui                       latest    a6e1e435b452   24 months ago   29.7MB
mnexus001:8082/landoop/schema-registry-ui        latest    a6e1e435b452   24 months ago   29.7MB
landoop/kafka-topics-ui                          0.9.4     f13537e7ec57   2 years ago     30.5MB
mnexus001:8082/landoop/kafka-topics-ui           0.9.4     f13537e7ec57   2 years ago     30.5MB
I am deploying on a sandbox Azure machine to see if that happens there. My work environment is locked down pretty tightly, so it's sometimes difficult to get things through.
b
Got it. Let's test this. Are you able to docker pull linkedin/datahub-gms:9829576?
If this runs successfully, we should see it in your images.
f
Copy code
$ sudo docker image ls
REPOSITORY                                       TAG       IMAGE ID       CREATED         SIZE
linkedin/datahub-gms                             9829576   45f1870d3519   3 days ago      275MB
Yes, it's there!
I removed the previous image and container and ran quickstart again. I see it is pulling an older version, for some reason.
$ sudo docker image list
REPOSITORY                                       TAG       IMAGE ID       CREATED         SIZE
linkedin/datahub-gms                             9829576   45f1870d3519   3 days ago      275MB
linkedin/datahub-frontend-react                  latest    0fe49e21548a   4 days ago      579MB
linkedin/datahub-mce-consumer                    latest    86807c429e32   4 days ago      188MB
linkedin/datahub-mae-consumer                    latest    3543ba03c201   4 days ago      197MB
linkedin/datahub-gms                             latest    57a94e065b5b   4 days ago      275MB
I have this in the quickstart, but it is still pulling and using latest.
export DATAHUB_VERSION=${DATAHUB_VERSION:-9829576}
I adapted the Dockerfile for this version too, and it seems to pick it up this time.
But there are other issues with ingestion, probably because the other images are not up to date.
curl http://mdhprot601:8080/config
{"noCode":"true"}
b
"I adapted the Dockerfile for this version," -- What exactly do you mean by this? I have no idea why the heck this is pulling the latest tag
can you check your quickstart.sh for sanity, just to see that it's using the correct version?
it has an export DATAHUB_VERSION line inside IIRC
f
$ cat quickstart.sh
#!/bin/bash

# Quickstarts DataHub by pulling all images from dockerhub and then running the containers locally. No images are
# built locally. Note: by default this pulls the latest (head) version; you can change this to a specific version by setting
# the DATAHUB_VERSION environment variable.
export DATAHUB_VERSION=${DATAHUB_VERSION:-9829576}
echo $DATAHUB_VERSION
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
#cd $DIR && docker-compose pull && docker-compose -p datahub up
cd $DIR && sudo /usr/local/bin/docker-compose pull && sudo /usr/local/bin/docker-compose -p datahub up
Sorry, I meant the docker-compose.yml
I have stopped all containers and removed all images. Trying again.
sudo docker ps -a -q | xargs sudo docker rm
sudo docker images -a -q | xargs sudo docker rmi -f
Oh boy, so after it pulls what it needs, ingestion fails.
...[HTTP Status:500]: INTERNAL SERVER ERROR...
b
So we can confirm we are on the correct version of GMS?
the curl to /config is now working?
f
Yes, the curl to /config worked. But I had issues when running ingestion.
b
Are you using the latest Python ingestion framework? Though this should not be throwing 500s regardless. Is there any additional error messaging provided at GMS?
Want to make sure we get to the bottom of this
f
Me too. So I have two installations:
1. One on a sandbox Azure VM, where I don't have problems bypassing the proxy.
2. The other one on premises, where I do have some issues.
I am first trying to get 1 fixed. This is what I did.
# stop all containers and remove images
sudo docker ps -a -q | xargs sudo docker rm
sudo docker images -a -q | xargs sudo docker rmi -f

# remove git repository
rm -R -f datahub

# pull git repository
git clone http://github.com/linkedin/datahub.git

# update acryl-datahub
pip install acryl-datahub 

# pull images and start containers
# include sudo docker compose in quickstart
# cd $DIR && sudo /usr/local/bin/docker-compose pull && sudo /usr/local/bin/docker-compose -p datahub up
./datahub/docker/quickstart.sh

# check running containers
sudo docker ps -a

# ingest demo data
# include sudo docker compose in ingestion
# cd $DIR && sudo /usr/local/bin/docker-compose pull && sudo /usr/local/bin/docker-compose -p datahub up
./datahub/docker/ingestion/ingestion.sh
And then when ingesting I got this error.
...
ingestion    |                                       '\tat io.ebeaninternal.server.query.CQueryEngine.findMany(CQueryEngine.java:384)\n'
ingestion    |                                       '\t... 89 more\n',
ingestion    |                         'status': 500}},
ingestion    |               {'error': 'Unable to emit metadata to DataHub GMS',
ingestion    |                'info': {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException',
ingestion    |                         'message': 'INTERNAL SERVER ERROR',
...
I have upgraded the python libraries.
$ datahub version
DataHub CLI version: 0.8.1.0
Python version: 3.6.8 (default, Nov 16 2020, 16:55:22) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)]
What else could I check?
OK, quick update on installation 1: I ran nuke.sh before ingestion.sh and that did the trick. No more failures when ingesting. Trying on 2, too.
Ugh, this is annoying. On premises, I followed the same steps and still get this error when running the ingestion.
...
ingestion    | [2021-06-09 11:30:43,864] ERROR    {datahub.ingestion.run.pipeline:52} - failed to write record with workunit file:///bootstrap_mce.json:3 with ('Unable to emit metadata to DataHub GMS', {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': "com.linkedin.restli.server.RestLiServiceException [HTTP Status:500]: javax.persistence.PersistenceException: Query threw SQLException:Table 'datahub.metadata_aspect' doesn't exist 
...
@big-carpet-38439, you were asking if I use the latest Python ingestion framework. I checked pip freeze and this is what I got:
$ pip freeze
acryl-datahub==0.8.1.0
attrs==21.2.0
avro-gen3==0.5.0
avro-python3==1.10.2
bcrypt==3.2.0
certifi==2020.12.5
cffi==1.14.5
chardet==4.0.0
click==8.0.1
cryptography==3.4.7
cx-Oracle==8.1.0
distro==1.5.0
docker==5.0.0
docker-compose==1.29.2
dockerpty==0.4.1
docopt==0.6.2
entrypoints==0.3
expandvars==0.7.0
greenlet==1.1.0
idna==2.10
jsonschema==3.2.0
mypy-extensions==0.4.3
numpy==1.20.3
pandas==1.2.4
paramiko==2.7.2
pycparser==2.20
pydantic==1.8.2
PyNaCl==1.4.0
pyrsistent==0.17.3
python-dateutil==2.8.1
python-dotenv==0.17.1
pytz==2021.1
PyYAML==5.4.1
requests==2.25.1
six==1.16.0
SQLAlchemy==1.4.15
texttable==1.6.3
toml==0.10.2
typing-extensions==3.10.0.0
typing-inspect==0.6.0
tzlocal==2.1
urllib3==1.26.4
websocket-client==0.59.0
I have uploaded the ingestion output and the datahub-gms log as produced by docker logs. I really want to get this solved for the on-premises installation.
b
Glad we got this working for case 1. For case 2: do you mind telling me which SQL technology you are using? Is this MySQL?
Thank you for the logs. Lots of useful info for me. Digging into it now
f
Here are the docker logs for ingestion and gms (stderr).
I am trying to get the sample data ingested.
But eventually I will use Oracle as a source, if that is what you are referring to.
b
Hi. So based on these logs, it still looks like your version of GMS is behind. You should never be hitting the code path that is throwing the exception after the migration.
You can confirm this by curling localhost:8080/config again for this deployment
f
I don't get it... I thought I pulled the latest one. This is the manifest for latest as it is in our Nexus repo; isn't this the latest version?
Curl of course does not work...
$ curl http://mdhprot601:8080/config
{"exceptionClass":"com.linkedin.restli.server.RestLiServiceException","stackTrace":"com.linkedin.restli.server.RestLiServiceException [HTTP Status:404]: No root resource defined for path '/config'\n\tat com.linkedin.restli.server.RestLiServiceException.fromThrowable(RestLiServiceException.java:315)\n\tat com.linkedin.restli.server.BaseRestLiServer.buildPreRoutingError(BaseRestLiServer.java:158)\n\tat com.linkedin.restli.server.RestRestLiServer.buildPreRoutingRestException(RestRestLiServer.java:203)\n\tat com.linkedin.restli.server.RestRestLiServer.handleResourceRequest(RestRestLiServer.java:177)\n\tat com.linkedin.restli.server.RestRestLiServer.doHandleRequest(RestRestLiServer.java:164)\n\tat com.linkedin.restli.server.RestRestLiServer.handleRequest(RestRestLiServer.java:120)\n\tat com.linkedin.restli.server.RestLiServer.handleRequest(RestLiServer.java:132)\n\tat com.linkedin.restli.server.DelegatingTransportDispatcher.handleRestRequest(DelegatingTransportDispatcher.java:70)\n\tat com.linkedin.r2.filter.transport.DispatcherRequestFilter.onRestRequest(DispatcherRequestFilter.java:70)\n\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:72)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n\tat com.linkedin.r2.filter.TimedNextFilter.onRequest(TimedNextFilter.java:55)\n\tat com.linkedin.r2.filter.transport.ServerQueryTunnelFilter.onRestRequest(ServerQueryTunnelFilter.java:58)\n\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:72)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n\tat com.linkedin.r2.filter.TimedNextFilter.onRequest(TimedNextFilter.java:55)\n\tat com.linkedin.r2.filter.message.rest.RestFilter.onRestRequest(RestFilter.java:50)\n\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:72)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n\tat com.linkedin.r2.filter.FilterChainImpl.onRestRequest(FilterChainImpl.java:96)\n\tat com.linkedin.r2.filter.transport.FilterChainDispatcher.handleRestRequest(FilterChainDispatcher.java:75)\n\tat com.linkedin.r2.util.finalizer.RequestFinalizerDispatcher.handleRestRequest(RequestFinalizerDispatcher.java:61)\n\tat com.linkedin.r2.transport.http.server.HttpDispatcher.handleRequest(HttpDispatcher.java:101)\n\tat com.linkedin.r2.transport.http.server.AbstractR2Servlet.service(AbstractR2Servlet.java:105)\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n\tat com.linkedin.restli.server.spring.ParallelRestliHttpRequestHandler.handleRequest(ParallelRestliHttpRequestHandler.java:61)\n\tat org.springframework.web.context.support.HttpRequestHandlerServlet.service(HttpRequestHandlerServlet.java:73)\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n\tat 
org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:852)\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:544)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:536)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1581)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1307)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:482)\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1549)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1204)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)\n\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.Server.handle(Server.java:494)\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:374)\n\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:268)\n\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)\n\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)\n\tat org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)\n\tat org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:367)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:782)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:918)\n\tat java.lang.Thread.run(Thread.java:748)\nCaused by: com.linkedin.restli.server.RoutingException: No root resource defined for path '/config'\n\tat com.linkedin.restli.internal.server.RestLiRouter.process(RestLiRouter.java:139)\n\tat com.linkedin.restli.server.BaseRestLiServer.getRoutingResult(BaseRestLiServer.java:139)\n\tat com.linkedin.restli.server.RestRestLiServer.handleResourceRequest(RestRestLiServer.java:173)\n\t... 62 more\n","message":"No root resource defined for path '/config'","status":404}[gmarin@mdhprot601 ~]$ echo $no_proxy
localhost,mnexus001,mnexus910,mdhprot601
b
So this confirms it
Running on an old version of GMS
f
Is it only the GMS container that needs to be current, or are there other services in between?
b
For your ingestion, we just need GMS to be running
I mean GMS to be updated
GMS + MAE Consumer actually. But it's also useful to have MCE consumer updated.
f
OK... investigating with the local platform team... possibly I cannot overwrite the Nexus images with a new version
Hi guys, coming back to this. I went a bit deeper and tried to understand the image confusion. Looking at Docker Hub, the latest datahub-gms tag was pushed 7 days ago, while v0.8.1 was pushed 6 days ago.
So v0.8.1 is newer but quickstart.sh keeps pulling latest.
I hardcoded the version in quickstart.sh to v0.8.1, but it still pulls latest.
I then hardcoded docker-compose.yml to v0.8.1 for datahub-gms only, and that seems to pull the right version (tested with curl), but ingestion still fails, so I assume everything needs to be in sync to work.
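(For reference, a sketch of checking which tag each service actually ended up on, using the same sudo docker invocation as elsewhere in this thread:)
sudo docker ps -a --format 'table {{.Names}}\t{{.Image}}' | grep datahub
# every datahub-* container should show the same tag (e.g. v0.8.1), including the ingestion container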
b
What’s the ingestion error with 0.8.1? And are you ingesting to datahub-kafka or datahub-rest? Ingestion is expected to work, though MAE consumer and MCE consumer will not.
So for whatever reason quickstart.sh is not working. Let me look into this on my end, but my worry is that this is not going to be repeatable
(In my env)
f
I am using the ingestion.sh sample data that comes with DataHub. I am so close to getting it working... it's annoying.
This is the error I am getting with v0.8.1:
$ ./docker/ingestion/ingestion.sh
Pulling ingestion ... done
WARNING: Found orphan containers (mysql, schema-registry, kibana, schema-registry-ui, datahub-mce-consumer, kafka-rest-proxy, datahub-mae-consumer, kafka-setup, datahub-gms, zookeeper, kafka-topics-ui, neo4j, broker, elasticsearch, elasticsearch-setup, datahub-frontend-react) for this project. If you removed or renamed this service in your compose file, you can run this command with the --remove-orphans flag to clean it up.
Creating ingestion ... done
Attaching to ingestion
ingestion    | [2021-06-10 15:06:05,574] INFO     {datahub.entrypoints:68} - Using config: {'source': {'type': 'file', 'config': {'filename': '/bootstrap_mce.json'}}, 'sink': {'type': 'datahub-rest', 'config': {'server': '<http://datahub-gms:8080>'}}}
ingestion    | [2021-06-10 15:06:08,229] INFO     {datahub.ingestion.run.pipeline:44} - sink wrote workunit file:///bootstrap_mce.json:0
ingestion    | [2021-06-10 15:06:08,266] INFO     {datahub.ingestion.run.pipeline:44} - sink wrote workunit file:///bootstrap_mce.json:1
ingestion    | [2021-06-10 15:06:08,292] INFO     {datahub.ingestion.run.pipeline:44} - sink wrote workunit file:///bootstrap_mce.json:2
ingestion    | [2021-06-10 15:06:08,316] INFO     {datahub.ingestion.run.pipeline:44} - sink wrote workunit file:///bootstrap_mce.json:3
ingestion    | Traceback (most recent call last):
ingestion    |   File "/usr/local/bin/datahub", line 8, in <module>
ingestion    |     sys.exit(main())
ingestion    |   File "/usr/local/lib/python3.8/site-packages/datahub/entrypoints.py", line 85, in main
ingestion    |     sys.exit(datahub(standalone_mode=False, **kwargs))
ingestion    |   File "/usr/local/lib/python3.8/site-packages/click/core.py", line 829, in __call__
ingestion    |     return self.main(*args, **kwargs)
ingestion    |   File "/usr/local/lib/python3.8/site-packages/click/core.py", line 782, in main
ingestion    |     rv = self.invoke(ctx)
ingestion    |   File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
ingestion    |     return _process_result(sub_ctx.command.invoke(sub_ctx))
ingestion    |   File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
ingestion    |     return ctx.invoke(self.callback, **ctx.params)
ingestion    |   File "/usr/local/lib/python3.8/site-packages/click/core.py", line 610, in invoke
ingestion    |     return callback(*args, **kwargs)
ingestion    |   File "/usr/local/lib/python3.8/site-packages/datahub/entrypoints.py", line 74, in ingest
ingestion    |     pipeline.run()
ingestion    |   File "/usr/local/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 108, in run
ingestion    |     for wu in self.source.get_workunits():
ingestion    |   File "/usr/local/lib/python3.8/site-packages/datahub/ingestion/source/mce_file.py", line 37, in get_workunits
ingestion    |     for i, mce in enumerate(iterate_mce_file(self.config.filename)):
ingestion    |   File "/usr/local/lib/python3.8/site-packages/datahub/ingestion/source/mce_file.py", line 18, in iterate_mce_file
ingestion    |     mce: MetadataChangeEvent = MetadataChangeEvent.from_obj(obj)
ingestion    |   File "/usr/local/lib/python3.8/site-packages/avrogen/dict_wrapper.py", line 41, in from_obj
ingestion    |     return conv.from_json_object(obj, cls.RECORD_SCHEMA)
ingestion    |   File "/usr/local/lib/python3.8/site-packages/avrogen/avrojson.py", line 98, in from_json_object
ingestion    |     return self._generic_from_json(json_obj, writers_schema, readers_schema)
ingestion    |   File "/usr/local/lib/python3.8/site-packages/avrogen/avrojson.py", line 250, in _generic_from_json
ingestion    |     result = self._record_from_json(json_obj, writers_schema, readers_schema)
ingestion    |   File "/usr/local/lib/python3.8/site-packages/avrogen/avrojson.py", line 333, in _record_from_json
ingestion    |     field_value = self._generic_from_json(json_obj[field.name], writers_field.type, field.type)
ingestion    |   File "/usr/local/lib/python3.8/site-packages/avrogen/avrojson.py", line 248, in _generic_from_json
ingestion    |     result = self._union_from_json(json_obj, writers_schema, readers_schema)
ingestion    |   File "/usr/local/lib/python3.8/site-packages/avrogen/avrojson.py", line 299, in _union_from_json
ingestion    |     return self._generic_from_json(value, s, readers_schema)
ingestion    |   File "/usr/local/lib/python3.8/site-packages/avrogen/avrojson.py", line 231, in _generic_from_json
ingestion    |     return self._generic_from_json(json_obj, writers_schema, s)
ingestion    |   File "/usr/local/lib/python3.8/site-packages/avrogen/avrojson.py", line 250, in _generic_from_json
ingestion    |     result = self._record_from_json(json_obj, writers_schema, readers_schema)
ingestion    |   File "/usr/local/lib/python3.8/site-packages/avrogen/avrojson.py", line 333, in _record_from_json
ingestion    |     field_value = self._generic_from_json(json_obj[field.name], writers_field.type, field.type)
ingestion    |   File "/usr/local/lib/python3.8/site-packages/avrogen/avrojson.py", line 244, in _generic_from_json
ingestion    |     result = self._array_from_json(json_obj, writers_schema, readers_schema)
ingestion    |   File "/usr/local/lib/python3.8/site-packages/avrogen/avrojson.py", line 272, in _array_from_json
ingestion    |     return [self._generic_from_json(x, writers_schema.items, readers_schema.items)
ingestion    |   File "/usr/local/lib/python3.8/site-packages/avrogen/avrojson.py", line 272, in <listcomp>
ingestion    |     return [self._generic_from_json(x, writers_schema.items, readers_schema.items)
ingestion    |   File "/usr/local/lib/python3.8/site-packages/avrogen/avrojson.py", line 248, in _generic_from_json
ingestion    |     result = self._union_from_json(json_obj, writers_schema, readers_schema)
ingestion    |   File "/usr/local/lib/python3.8/site-packages/avrogen/avrojson.py", line 304, in _union_from_json
ingestion    |     raise schema.AvroException('Datum union type not in schema: %s', value_type)
ingestion    | avro.schema.AvroException: ('Datum union type not in schema: %s', 'com.linkedin.pegasus2avro.common.BrowsePaths')
ingestion exited with code 1
These are the running containers:
$ sudo docker ps -a
CONTAINER ID   IMAGE                                                        COMMAND                  CREATED          STATUS                          PORTS                                                                                            NAMES
0bed706190b2   mnexus001:8082/linkedin/datahub-ingestion:latest             "datahub ingest -c /…"   17 seconds ago   Exited (1) 7 seconds ago                                                                                                         ingestion
04690dd67e07   mnexus001:8082/linkedin/datahub-frontend-react:v0.8.1        "datahub-frontend/bi…"   2 minutes ago    Up 2 minutes (healthy)          0.0.0.0:9002->9002/tcp, :::9002->9002/tcp                                                        datahub-frontend-react
98931ec96d0a   mnexus001:8082/linkedin/datahub-mce-consumer:v0.8.1          "/bin/sh -c /datahub…"   2 minutes ago    Up 2 minutes (healthy)          0.0.0.0:9090->9090/tcp, :::9090->9090/tcp                                                        datahub-mce-consumer
e4b87adfd723   mnexus001:8082/landoop/kafka-topics-ui:0.9.4                 "/run.sh"                3 minutes ago    Up 2 minutes                    0.0.0.0:18000->8000/tcp, :::18000->8000/tcp                                                      kafka-topics-ui
d48e795290b3   mnexus001:8082/linkedin/datahub-mae-consumer:v0.8.1          "/bin/sh -c /datahub…"   3 minutes ago    Up 2 minutes (healthy)          9090/tcp, 0.0.0.0:9091->9091/tcp, :::9091->9091/tcp                                              datahub-mae-consumer
57185b860869   mnexus001:8082/linkedin/datahub-gms:v0.8.1                   "/bin/sh -c /datahub…"   3 minutes ago    Up 2 minutes (healthy)          0.0.0.0:8080->8080/tcp, :::8080->8080/tcp                                                        datahub-gms
1a7b6fa60a4f   mnexus001:8082/confluentinc/cp-kafka-rest:5.4.0              "/etc/confluent/dock…"   3 minutes ago    Up 3 minutes                    0.0.0.0:8082->8082/tcp, :::8082->8082/tcp                                                        kafka-rest-proxy
4c792a142873   mnexus001:8082/linkedin/datahub-kafka-setup:v0.8.1           "/bin/sh -c ./kafka-…"   3 minutes ago    Exited (0) 2 minutes ago                                                                                                         kafka-setup
97561d70aee4   mnexus001:8082/landoop/schema-registry-ui:latest             "/run.sh"                3 minutes ago    Up 3 minutes                    0.0.0.0:8000->8000/tcp, :::8000->8000/tcp                                                        schema-registry-ui
4145b34af2e5   mnexus001:8082/confluentinc/cp-schema-registry:5.4.0         "/etc/confluent/dock…"   3 minutes ago    Up 3 minutes                    0.0.0.0:8081->8081/tcp, :::8081->8081/tcp                                                        schema-registry
cdc822a87518   mnexus001:8082/confluentinc/cp-kafka:5.4.0                   "/etc/confluent/dock…"   3 minutes ago    Up 3 minutes                    0.0.0.0:9092->9092/tcp, :::9092->9092/tcp, 0.0.0.0:29092->29092/tcp, :::29092->29092/tcp         broker
947cc7e9496c   mnexus001:8082/linkedin/datahub-elasticsearch-setup:v0.8.1   "dockerize /bin/sh -…"   3 minutes ago    Exited (0) About a minute ago                                                                                                    elasticsearch-setup
13750601227e   mnexus001:8082/kibana:7.9.3                                  "/usr/local/bin/dumb…"   3 minutes ago    Up 3 minutes                    0.0.0.0:5601->5601/tcp, :::5601->5601/tcp                                                        kibana
b1bd18bd3eba   mnexus001:8082/confluentinc/cp-zookeeper:5.4.0               "/etc/confluent/dock…"   3 minutes ago    Up 3 minutes                    2888/tcp, 0.0.0.0:2181->2181/tcp, :::2181->2181/tcp, 3888/tcp                                    zookeeper
da9d5ca55e49   mysql:5.7                                                    "docker-entrypoint.s…"   3 minutes ago    Up 3 minutes                    0.0.0.0:3306->3306/tcp, :::3306->3306/tcp, 33060/tcp                                             mysql
2aeccd941a5c   mnexus001:8082/elasticsearch:7.9.3                           "/tini -- /usr/local…"   3 minutes ago    Up 3 minutes (healthy)          0.0.0.0:9200->9200/tcp, :::9200->9200/tcp, 9300/tcp                                              elasticsearch
a2ac42f24e93   mnexus001:8082/neo4j:4.0.6                                   "/sbin/tini -g -- /d…"   3 minutes ago    Up 3 minutes                    0.0.0.0:7474->7474/tcp, :::7474->7474/tcp, 7473/tcp, 0.0.0.0:7687->7687/tcp, :::7687->7687/tcp   neo4j
b
We are quite close. @green-football-43791 you've seen this "Datum union type not in schema" error, right? What does this mean again?
g
That indicates to me that ./docker/ingestion/ingestion.sh is not pulling version 0.8.1. Could it be possible that it is experiencing the same issue we were having with GMS above?
b
Interesting, because I believe Gratiel verified earlier that he was pinning to the 0.8.1 package for ingestion. But this is, I guess, different in that it's the version of the container being pulled by ingestion.sh
g
right, maybe we need to pin the version for the ingestion docker script as well?
b
Do we do that in ingestion.sh? I believe so, but yeah, maybe an old version is being pulled, as in the case above
f
YESS, that was it. Good catch! Ingestion was still on latest, not v0.8.1. I adapted the docker-compose.yml and got it working. I am ready to try the new things.
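(A sketch of the kind of change that fixes this, assuming the ingestion compose file sits next to ingestion.sh and references the linkedin/datahub-ingestion image seen in the container listing above; the exact image string, e.g. a Nexus prefix, may differ:)
cd docker/ingestion
# hypothetical edit: pin the ingestion container to the same tag as the rest of the stack
sed -i 's|linkedin/datahub-ingestion:latest|linkedin/datahub-ingestion:v0.8.1|' docker-compose.yml
grep 'image:' docker-compose.yml   # verify the tag before re-running ./ingestion.sh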
b
Thank goodness!!!
So glad we were able to get through this. Really appreciate your patience @faint-hair-91313!
Remote debugging is not always the smoothest process 😛
f
Hold on, I found something else when deploying with quickstart.sh after running nuke.sh.
datahub-mae-consumer      | 09:09:47.686 [datahub-usage-event-consumer-job-client-0-C-1] ERROR o.s.k.listener.LoggingErrorHandler - Error while processing: ConsumerRecord(topic = DataHubUsageEvent_v1, partition = 0, leaderEpoch = 0, offset = 1, CreateTime = 1623661787568, serialized key size = 23, serialized value size = 577, headers = RecordHeaders(headers = [], isReadOnly = false), key = urn:li:corpuser:datahub, value = {"title":"DataHub","url":"<http://mdhprot601:9002/>","path":"/","hash":"","search":"","width":2194,"height":1101,"referrer":"<http://mdhprot601:9002/browse/dataset/prod/hive>","prevPathname":"<http://mdhprot601:9002/browse/dataset/prod/hive>","type":"PageViewEvent","actorUrn":"urn:li:corpuser:datahub","timestamp":1623661787246,"date":"Mon Jun 14 2021 11:09:47 GMT+0200 (Central European Summer Time)","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36","browserId":"19c905d6-afae-4b8a-a7ec-c4f509c98e0e"})
datahub-mae-consumer      | org.springframework.kafka.listener.ListenerExecutionFailedException: Listener method 'public void com.linkedin.metadata.kafka.DataHubUsageEventsProcessor.consume(org.apache.kafka.clients.consumer.ConsumerRecord<java.lang.String, java.lang.String>)' threw exception; nested exception is java.lang.ClassCastException: com.linkedin.metadata.key.CorpUserKey cannot be cast to com.linkedin.identity.CorpUserInfo; nested exception is java.lang.ClassCastException: com.linkedin.metadata.key.CorpUserKey cannot be cast to com.linkedin.identity.CorpUserInfo
datahub-mae-consumer      |     at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.decorateException(KafkaMessageListenerContainer.java:1376)
datahub-mae-consumer      |     at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.invokeErrorHandler(KafkaMessageListenerContainer.java:1365)
datahub-mae-consumer      |     at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.doInvokeRecordListener(KafkaMessageListenerContainer.java:1277)
datahub-mae-consumer      |     at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.doInvokeWithRecords(KafkaMessageListenerContainer.java:1248)
datahub-mae-consumer      |     at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.invokeRecordListener(KafkaMessageListenerContainer.java:1162)
datahub-mae-consumer      |     at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.invokeListener(KafkaMessageListenerContainer.java:971)
datahub-mae-consumer      |     at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.pollAndInvoke(KafkaMessageListenerContainer.java:775)
datahub-mae-consumer      |     at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:708)
datahub-mae-consumer      |     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
datahub-mae-consumer      |     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
datahub-mae-consumer      |     at java.lang.Thread.run(Thread.java:748)
datahub-mae-consumer      | Caused by: java.lang.ClassCastException: com.linkedin.metadata.key.CorpUserKey cannot be cast to com.linkedin.identity.CorpUserInfo
datahub-mae-consumer      |     at java.util.Optional.ifPresent(Optional.java:159)
datahub-mae-consumer      |     at com.linkedin.metadata.kafka.hydrator.CorpUserHydrator.getHydratedEntity(CorpUserHydrator.java:42)
datahub-mae-consumer      |     at com.linkedin.metadata.kafka.hydrator.HydratorFactory.getHydratedEntity(HydratorFactory.java:32)
datahub-mae-consumer      |     at com.linkedin.metadata.kafka.transformer.DataHubUsageEventTransformer.setFieldsForEntity(DataHubUsageEventTransformer.java:106)
datahub-mae-consumer      |     at com.linkedin.metadata.kafka.transformer.DataHubUsageEventTransformer.transformDataHubUsageEvent(DataHubUsageEventTransformer.java:71)
datahub-mae-consumer      |     at com.linkedin.metadata.kafka.DataHubUsageEventsProcessor.consume(DataHubUsageEventsProcessor.java:44)
datahub-mae-consumer      |     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
datahub-mae-consumer      |     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
datahub-mae-consumer      |     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
datahub-mae-consumer      |     at java.lang.reflect.Method.invoke(Method.java:498)
datahub-mae-consumer      |     at org.springframework.messaging.handler.invocation.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:171)
datahub-mae-consumer      |     at org.springframework.messaging.handler.invocation.InvocableHandlerMethod.invoke(InvocableHandlerMethod.java:120)
datahub-mae-consumer      |     at org.springframework.kafka.listener.adapter.HandlerAdapter.invoke(HandlerAdapter.java:48)
datahub-mae-consumer      |     at org.springframework.kafka.listener.adapter.MessagingMessageListenerAdapter.invokeHandler(MessagingMessageListenerAdapter.java:283)
datahub-mae-consumer      |     at org.springframework.kafka.listener.adapter.RecordMessagingMessageListenerAdapter.onMessage(RecordMessagingMessageListenerAdapter.java:79)
datahub-mae-consumer      |     at org.springframework.kafka.listener.adapter.RecordMessagingMessageListenerAdapter.onMessage(RecordMessagingMessageListenerAdapter.java:50)
datahub-mae-consumer      |     at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.doInvokeOnMessage(KafkaMessageListenerContainer.java:1327)
datahub-mae-consumer      |     at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.invokeOnMessage(KafkaMessageListenerContainer.java:1307)
datahub-mae-consumer      |     at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.doInvokeRecordListener(KafkaMessageListenerContainer.java:1267)
datahub-mae-consumer      |     ... 8 common frames omitted
I am trying to run through the advanced guide here (https://datahubproject.io/docs/advanced/no-code-modeling/#step-1-add-aspects) and don't seem to get through. Going through the deployment steps I spotted that issue; I am not sure it is related.
b
This is a known issue that should not be related. We have a fix coming soon, but it only affects analytics-related features. What do you mean by "don't seem to get through" here?