# ingestion
f
Hi guys, great job with the latest release. I do have a different behavior with lineage between charts and datasets. I see the sources here, but not on the lineage graph.
m
@faint-hair-91313 is this something you are seeing in the new release and used to work in the old release? @green-football-43791
f
Yes.
Just upgraded and was trying different things. Including the STATS for Oracle.
By the way, I see you are doing sampling (as in count rows < 1000), but Oracle has data dictionary views that could be queried instead. That would save a lot of time.
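(For context, a sketch of what the dictionary-based approach could look like - illustrative Python, not the actual ingestion code. `ALL_TABLES` with its `NUM_ROWS`/`LAST_ANALYZED` columns is a real Oracle dictionary view; the helper function and threshold below are hypothetical:)

```python
# Sketch: use Oracle's data dictionary instead of a COUNT(*) per table.
# NUM_ROWS is an optimizer estimate, populated by statistics gathering,
# so it is approximate and only as fresh as LAST_ANALYZED.
DICTIONARY_ROWCOUNT_QUERY = """
SELECT owner, table_name, num_rows, last_analyzed
FROM all_tables
WHERE owner = :schema_owner
"""

def pick_profiling_strategy(num_rows, sample_threshold=1000):
    """Decide full-table vs. sampled profiling from the dictionary
    estimate, mirroring the 'count rows < 1000' check mentioned above."""
    if num_rows is None:
        return "count"   # no stats gathered yet; fall back to counting
    return "full" if num_rows < sample_threshold else "sample"
```

(The trade-off: the dictionary read is near-instant, but the count is an estimate rather than exact.)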
m
Sounds good. We’ll take a look. Did stats for oracle work?
g
@faint-hair-91313 does the lineage graph work for other entities?
m
Good point, we are using Great Expectations as a "least common denominator" approach for profiling. We'll probably need to add shortcuts for specific sources that have built-in support.
f
Stats for Oracle. Yes, they've worked. See a snap here.
Lineage between datasets works ok.
g
Are you seeing lineage issues with all charts?
f
I looked at 10 or so... Yes, all have issues.
But the lineage is tracked in the main screen, not the graph.
g
Did you re-ingest chart data recently by any chance?
f
I did a clean install. Nuked everything and started over.
So yes 🙂
g
Ah ok - it seems like your chart relationships did not make it to the graph service
are you using neo4j or elastic?
f
Let me see ...
```
bash-4.4$ sudo docker ps -a
CONTAINER ID   IMAGE                                       COMMAND                  CREATED       STATUS                      PORTS                                                                                      NAMES
a16edb15dda9   linkedin/datahub-kafka-setup:head           "/bin/sh -c ./kafka-…"   2 hours ago   Exited (0) 35 minutes ago                                                                                              kafka-setup
8d14304888b5   confluentinc/cp-schema-registry:5.4.0       "/etc/confluent/dock…"   2 hours ago   Up 36 minutes               0.0.0.0:8081->8081/tcp, :::8081->8081/tcp                                                  schema-registry
75c8ddea67b5   linkedin/datahub-frontend-react:head        "datahub-frontend/bi…"   2 hours ago   Up 36 minutes (healthy)     0.0.0.0:9002->9002/tcp, :::9002->9002/tcp                                                  datahub-frontend-react
c401305f99ae   confluentinc/cp-kafka:5.4.0                 "/etc/confluent/dock…"   2 hours ago   Up 36 minutes               0.0.0.0:9092->9092/tcp, :::9092->9092/tcp, 0.0.0.0:29092->29092/tcp, :::29092->29092/tcp   broker
9ea5bb0e43fd   linkedin/datahub-gms:head                   "/bin/sh -c /datahub…"   2 hours ago   Up 36 minutes (healthy)     0.0.0.0:8080->8080/tcp, :::8080->8080/tcp                                                  datahub-gms
ebe9a46d13a9   acryldata/datahub-mysql-setup:head          "dockerize /bin/sh -…"   2 hours ago   Exited (0) 36 minutes ago                                                                                              mysql-setup
61d1c1be9f01   linkedin/datahub-elasticsearch-setup:head   "dockerize /bin/sh -…"   2 hours ago   Exited (0) 35 minutes ago                                                                                              elasticsearch-setup
a90acb427bdd   confluentinc/cp-zookeeper:5.4.0             "/etc/confluent/dock…"   2 hours ago   Up 36 minutes               2888/tcp, 0.0.0.0:2181->2181/tcp, :::2181->2181/tcp, 3888/tcp                              zookeeper
8e266e2f3292   mysql:5.7                                   "docker-entrypoint.s…"   2 hours ago   Up 36 minutes               0.0.0.0:3306->3306/tcp, :::3306->3306/tcp, 33060/tcp                                       mysql
d0592890fbb2   elasticsearch:7.9.3                         "/tini -- /usr/local…"   2 hours ago   Up 36 minutes (healthy)     0.0.0.0:9200->9200/tcp, :::9200->9200/tcp, 9300/tcp                                        elasticsearch
```
I would say elastic?
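(A minimal sketch of the check being done by eye here - hypothetical Python, not part of DataHub; it just scans the image names from `docker ps`: no neo4j container means lineage edges live in Elasticsearch:)

```python
def graph_backend_from_containers(container_images):
    """Infer which graph service a DataHub quickstart deployment is
    using from the running container images: a neo4j container means
    neo4j backs the graph index; otherwise Elasticsearch does."""
    if any("neo4j" in image for image in container_images):
        return "neo4j"
    if any("elasticsearch" in image for image in container_images):
        return "elasticsearch"
    return "unknown"
```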
g
yup, looks like there's no neo4j container
how many entities did you re-ingest?
f
All my charts, around 59.
g
and # of datasets?
trying to get a rough order of magnitude
f
3442
g
ok - let's try rebuilding your indexes
can you try running
```
./docker/datahub-upgrade/datahub-upgrade.sh -u RestoreIndices
```
that should fix the lineage relationships
f
Ongoing ...
Didn't fix it.
g
hmm...
f
```
...
Reading rows 16000 through 17000 from the aspects table.
Successfully sent MAEs for 17000 rows
Reading rows 17000 through 18000 from the aspects table.
Successfully sent MAEs for 18000 rows
Reading rows 18000 through 19000 from the aspects table.
Successfully sent MAEs for 19000 rows
Reading rows 19000 through 20000 from the aspects table.
Successfully sent MAEs for 19743 rows
Completed Step 4/4: SendMAEStep successfully.
Success! Completed upgrade with id RestoreIndices successfully.
Upgrade RestoreIndices completed with result SUCCEEDED. Exiting...
2021-08-04 14:26:27.814  INFO 1 --- [extShutdownHook] o.a.k.clients.producer.KafkaProducer     : [Producer clientId=producer-1] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms.
```
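(The batching visible in this log - the aspects table read in windows of 1000 rows, with a short final window for the 19743 total - can be sketched as a hypothetical generator:)

```python
def batch_ranges(total_rows, batch_size=1000):
    """Yield (start, end) row windows like the RestoreIndices log above:
    fixed-size batches, with the last one truncated to the row count."""
    start = 0
    while start < total_rows:
        end = min(start + batch_size, total_rows)
        yield (start, end)
        start = end
```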
g
can you run
docker logs --tail datahub-gms
?
it might take a bit to process the re-index on gms side
if that is still printing out, let's give it some time
f
```
sudo docker logs --tail datahub-gms
"docker logs" requires exactly 1 argument.
See 'docker logs --help'.

Usage:  docker logs [OPTIONS] CONTAINER

Fetch the logs of a container
```
Are you sure it's the right syntax? Let me check ...
g
ah, I'm sorry
--follow
(instead of --tail)
f
Log still running ... maybe it's still indexing?
g
sounds like it
f
Ok it stopped ...
```
sbx2.tomo_sectors_hourly,PROD), urn:li:dataset:(urn:li:dataPlatform:oracle,edw.traffic_volume_h,PROD), urn:li:dataset:(urn:li:dataPlatform:oracle,edw.sco_sector_configuration_bb,PROD), urn:li:dataset:(urn:li:dataPlatform:oracle,edw.time_r,PROD)]
14:32:26.115 [pool-11-thread-1] INFO  c.l.metadata.filter.LoggingFilter - GET /entities?ids=List(urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Aoracle%2Cedw.sector_controlling_bb%2CPROD%29,urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Aoracle%2Csbx2.tomo_sectors_hourly%2CPROD%29,urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Aoracle%2Cedw.traffic_volume_h%2CPROD%29,urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Aoracle%2Cedw.sco_sector_configuration_bb%2CPROD%29,urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Aoracle%2Cedw.time_r%2CPROD%29) - batchGet - 200 - 10ms
14:32:26.121 [pool-11-thread-1] INFO  c.l.metadata.filter.LoggingFilter - GET /lineage?direction=INCOMING&urn=urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Aoracle%2Cedw.traffic_volume_h%2CPROD%29 - get - 200 - 3ms
14:32:26.122 [pool-11-thread-1] INFO  c.l.metadata.filter.LoggingFilter - GET /lineage?direction=OUTGOING&urn=urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Aoracle%2Cedw.traffic_volume_h%2CPROD%29 - get - 200 - 4ms
14:32:26.124 [pool-11-thread-1] INFO  c.l.metadata.filter.LoggingFilter - GET /lineage?direction=OUTGOING&urn=urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Aoracle%2Cedw.sector_controlling_bb%2CPROD%29 - get - 200 - 2ms
14:32:26.125 [pool-11-thread-1] INFO  c.l.metadata.filter.LoggingFilter - GET /lineage?direction=INCOMING&urn=urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Aoracle%2Cedw.sector_controlling_bb%2CPROD%29 - get - 200 - 3ms
14:32:26.127 [pool-11-thread-1] INFO  c.l.metadata.filter.LoggingFilter - GET /lineage?direction=OUTGOING&urn=urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Aoracle%2Cedw.time_r%2CPROD%29 - get - 200 - 2ms
14:32:26.129 [pool-11-thread-1] INFO  c.l.metadata.filter.LoggingFilter - GET /lineage?direction=INCOMING&urn=urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Aoracle%2Cedw.time_r%2CPROD%29 - get - 200 - 2ms
14:32:26.130 [pool-11-thread-1] INFO  c.l.metadata.filter.LoggingFilter - GET /lineage?direction=OUTGOING&urn=urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Aoracle%2Cedw.sco_sector_configuration_bb%2CPROD%29 - get - 200 - 2ms
14:32:26.133 [pool-11-thread-1] INFO  c.l.metadata.filter.LoggingFilter - GET /lineage?direction=INCOMING&urn=urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Aoracle%2Cedw.sco_sector_configuration_bb%2CPROD%29 - get - 200 - 3ms
14:32:26.133 [pool-11-thread-1] INFO  c.l.metadata.filter.LoggingFilter - GET /lineage?direction=OUTGOING&urn=urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Aoracle%2Csbx2.tomo_sectors_hourly%2CPROD%29 - get - 200 - 2ms
14:32:26.135 [pool-11-thread-1] INFO  c.l.metadata.filter.LoggingFilter - GET /lineage?direction=INCOMING&urn=urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Aoracle%2Csbx2.tomo_sectors_hourly%2CPROD%29 - get - 200 - 1ms
14:32:26.137 [qtp544724190-121] INFO  c.l.m.r.entity.EntityResource - BATCH GET [urn:li:dataset:(urn:li:dataPlatform:oracle,edw.sector_crossing_bbs,PROD), urn:li:dataset:(urn:li:dataPlatform:oracle,stg6.nm_volumes_sector,PROD), urn:li:dataset:(urn:li:dataPlatform:oracle,edw.sector_controlling_bbs,PROD), urn:li:dataset:(urn:li:dataPlatform:oracle,edw.sco_sector_configuration_bbs,PROD), urn:li:dataset:(urn:li:dataPlatform:oracle,inf.sector_configuration_f,PROD), urn:li:dataset:(urn:li:dataPlatform:oracle,inf.flight_sector_d,PROD)]
14:32:26.155 [pool-11-thread-1] INFO  c.l.metadata.filter.LoggingFilter - GET /entities?ids=List(urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Aoracle%2Cedw.sector_crossing_bbs%2CPROD%29,urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Aoracle%2Cstg6.nm_volumes_sector%2CPROD%29,urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Aoracle%2Cedw.sector_controlling_bbs%2CPROD%29,urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Aoracle%2Cedw.sco_sector_configuration_bbs%2CPROD%29,urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Aoracle%2Cinf.sector_configuration_f%2CPROD%29,urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Aoracle%2Cinf.flight_sector_d%2CPROD%29) - batchGet - 200 - 18ms
```
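(The `/lineage` calls in this log can also be made by hand against GMS; a sketch, assuming GMS on localhost:8080 - the endpoint shape and percent-encoding match the log lines above, while the helper function itself is hypothetical:)

```python
from urllib.parse import quote

def lineage_url(gms_host, urn, direction):
    """Build a GMS /lineage request URL like the ones in the log:
    the URN is fully percent-encoded (':' -> %3A, '(' -> %28, ...)."""
    assert direction in ("INCOMING", "OUTGOING")
    return (f"{gms_host}/lineage?direction={direction}"
            f"&urn={quote(urn, safe='')}")
```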
Maybe it was displaying on screen
g
those logs look like they were triggered by you interacting with the app
otherwise the log is silent?
f
Yes
b
Gratiel, do you mind sending over your GMS debug log file?
f
I have an idea... we have to bypass a proxy, so maybe the neo4j image couldn't be retrieved and your script fell back to the previous installation. Give me a minute to try and get that one. I need to adapt your docker setup to use the proxy.
Now, I see this little baby:
```
b81e56beaa5b   mnexus001:8082/neo4j:4.0.6                    "/sbin/tini -g -- /d…"   49 seconds ago   Up 46 seconds                      0.0.0.0:7474->7474/tcp, :::7474->7474/tcp, 7473/tcp, 0.0.0.0:7687->7687/tcp, :::7687->7687/tcp   neo4j
```
Give 1 more min to run my ingestion.
Ah... damn, I am on an older version... d*** proxy again... I see gms:v0.8.6. Hold on, I'll get that fixed and let you know after.
g
ok- sounds good 👍
f
```
Quickstarting DataHub: version v0.8.7
Datahub Neo4j volume found, starting with neo4j as graph service
```
g
ah ha.. that may explain it
f
Ok.. re-deployed everything with the new version, but still no graph relationships. Looking at the logs after running the restore indices, I see this...
```
15:17:24.391 [mae-consumer-job-client-0-C-1] ERROR c.l.m.k.MetadataAuditEventsProcessor - Error deserializing message: java.lang.NullPointerException
15:17:24.391 [mae-consumer-job-client-0-C-1] ERROR c.l.m.k.MetadataAuditEventsProcessor - Message: {"auditHeader": null, "oldSnapshot": null, "oldSystemMetadata": null, "newSnapshot": {"urn": "urn:li:dataset:(urn:li:dataPlatform:oracle,edw.flow_es,PROD)", "aspects": [{"platform": "urn:li:dataPlatform:oracle", "name": "edw.flow_es", "origin": "PROD"}]}, "newSystemMetadata": null, "operation": "UPDATE"}
15:17:24.391 [mae-consumer-job-client-0-C-1] INFO  c.l.m.k.MetadataAuditEventsProcessor - {urn=urn:li:dataset:(urn:li:dataPlatform:oracle,edw.flow_es,PROD), aspects=[{com.linkedin.metadata.key.DatasetKey={name=edw.flow_es, platform=urn:li:dataPlatform:oracle, origin=PROD}}, {com.linkedin.common.GlobalTags={tags=[{tag=urn:li:tag:Data Warehouse}]}}]}
15:17:24.399 [I/O dispatcher 1] INFO  c.l.m.s.e.update.BulkListener - Successfully fed bulk request. Number of events: 1 Took time ms: -1
15:17:24.400 [mae-consumer-job-client-0-C-1] ERROR c.l.m.k.MetadataAuditEventsProcessor - Error deserializing message: java.lang.NullPointerException
15:17:24.400 [mae-consumer-job-client-0-C-1] ERROR c.l.m.k.MetadataAuditEventsProcessor - Message: {"auditHeader": null, "oldSnapshot": null, "oldSystemMetadata": null, "newSnapshot": {"urn": "urn:li:dataset:(urn:li:dataPlatform:oracle,edw.flow_es,PROD)", "aspects": [{"platform": "urn:li:dataPlatform:oracle", "name": "edw.flow_es", "origin": "PROD"}, {"tags": [{"tag": "urn:li:tag:Data Warehouse"}]}]}, "newSystemMetadata": null, "operation": "UPDATE"}
```
Over and over...
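(For illustration only - the real MAE consumer is Java, but the failure mode the error points at can be sketched in Python: `auditHeader`, `oldSnapshot`, and the systemMetadata fields are legitimately null in these messages, so a parser must guard them rather than dereference blindly:)

```python
import json

def safe_parse_mae(raw):
    """Parse an MAE message defensively: several top-level fields can
    be null, as in the messages in the log above, so every access goes
    through a guarded lookup instead of assuming the field is present."""
    msg = json.loads(raw)
    snapshot = msg.get("newSnapshot") or {}
    return {
        "urn": snapshot.get("urn"),
        "aspects": snapshot.get("aspects", []),
        "operation": msg.get("operation"),
    }
```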
g
Ok- taking a look.
b
ew!! this is no fun. we'll get back soon
f
Sure, thanks a lot!
g
got a fix here, Gratiel - sorry about that!
f
Wow, so fast! Neat!
g
@faint-hair-91313 - we released the fix- could you upgrade and see if you can re-index now?
f
I am trying... fighting with the proxy ...
b
this dang proxy lol
f
Ok, sorted the proxy out... but I still do not see the relationships. For some reason it's still running on elastic.
g
Did you try running re-indexing after setting up again?
f
Yes.
And this time, no errors in the datahub-gms.
Regarding the proxy, I have to force getting the image from our local Nexus repository. If the Nexus repo does not find it, it will retrieve it from the internet. With the new version, you guys added an if case in the quickstart to handle the neo4j deployment. I missed that and that is why I couldn't get the new docker images. That's one. The pip library also comes via this Nexus repository. But there we had a failure timeout - everytime we try to dld a library that does not exist, you would have to try again in 24 hours before it could check. Several interactions with the IT team and got it sorted it out and reduced to 6 hours. 🤦‍♂️
g
Hey @faint-hair-91313 - update on this
Turns out there was a bug that was causing issues with Chart <> Dataset relationships
We've fixed this in the latest master 🙂
if you pull latest and re-index, you should be good to go!
Thank you for all your patience with this bug.
f
No worries. You guys are doing an amazing job. Updated to 0.8.10 and the relationships are back.
b
Thank you for looping back around Gratiel! You are an appreciated user. Keep the feedback coming!
g
Glad to hear!
🙌 1