# ingestion
b
hey, not sure if this is the right channel, but here goes: I’ve ingested data from Hive, LookML, and Looker using the CLI tool. I’ve also prepared and ingested some custom
com.linkedin.dataset.UpstreamLineage
aspects for the datasets via the REST API. However, I see that some pages do not load when they try to batchLoad (I think) the upstream/downstream dependency datasets. The UI looks like this:
in the frontend logs I find:
10:58:37 [R2 Nio Event Loop-3-2] DEBUG c.l.r.t.h.c.rest.RAPResponseHandler - data-platform-datahub-gms/10.126.240.134:8080: handling a response
10:58:37 [R2 Nio Event Loop-3-2] DEBUG c.l.r.t.h.c.rest.RAPResponseHandler - data-platform-datahub-gms/some_ip:8080: exception on active channel
io.netty.handler.codec.TooLongFrameException: Response entity too large: HttpObjectAggregator$AggregatedFullHttpResponse(decodeResult: success, version: HTTP/1.1, content: CompositeByteBuf(ridx: 0, widx: 2097152, cap: 2097152, components=259))
HTTP/1.1 200 OK
Date: Tue, 17 Aug 2021 10:58:37 GMT
Content-Type: application/json
X-RestLi-Protocol-Version: 2.0.0
Server: Jetty(9.4.20.v20190813)
	at io.netty.handler.codec.http.HttpObjectAggregator.handleOversizedMessage(HttpObjectAggregator.java:276)
	at io.netty.handler.codec.http.HttpObjectAggregator.handleOversizedMessage(HttpObjectAggregator.java:87)
	at io.netty.handler.codec.MessageAggregator.invokeHandleOversizedMessage(MessageAggregator.java:404)
	at io.netty.handler.codec.MessageAggregator.decode(MessageAggregator.java:293)
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355)
	at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:436)
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:316)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
	at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:251)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at java.lang.Thread.run(Thread.java:748)
10:58:37 [R2 Nio Event Loop-3-2] DEBUG c.l.r.t.h.c.rest.RAPResponseHandler - data-platform-datahub-gms/some_ip:8080: idle channel closed
I’m running
v0.8.6
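For reference, pushing a custom UpstreamLineage aspect over the GMS REST API with the Python emitter looks roughly like the sketch below; the URNs and GMS address are placeholders, not taken from this thread.

```python
# Sketch of emitting a custom UpstreamLineage aspect over the GMS REST API,
# assuming the acryl-datahub Python package. URNs and the server address are
# illustrative placeholders.
from datahub.emitter.mce_builder import make_dataset_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    ChangeTypeClass,
    DatasetLineageTypeClass,
    UpstreamClass,
    UpstreamLineageClass,
)

emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

downstream_urn = make_dataset_urn(platform="hive", name="db.dataset_a", env="PROD")
upstream_urns = [
    make_dataset_urn(platform="hive", name="db.source_1", env="PROD"),
    make_dataset_urn(platform="hive", name="db.source_2", env="PROD"),
]

# One Upstream entry per source dataset; together they form the UpstreamLineage aspect.
lineage = UpstreamLineageClass(
    upstreams=[
        UpstreamClass(dataset=urn, type=DatasetLineageTypeClass.TRANSFORMED)
        for urn in upstream_urns
    ]
)

emitter.emit_mcp(
    MetadataChangeProposalWrapper(
        entityType="dataset",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn=downstream_urn,
        aspectName="upstreamLineage",
        aspect=lineage,
    )
)
```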
g
Hey @bumpy-activity-74405, do your entities have long descriptions, many tags, or deeply interconnected lineage by any chance?
It looks like the issue is
io.netty.handler.codec.TooLongFrameException: Response entity too large: HttpObjectAggregator$AggregatedFullHttpResponse(decodeResult: success, version: HTTP/1.1, content: CompositeByteBuf(ridx: 0, widx: 2097152, cap: 2097152, components=259))
does this happen for all entities or just a few?
b
do your entities have long descriptions, many tags, or deeply interconnected lineage by any chance?
No tags, long descriptions here and there, the lineage is a bit ridiculous too.
g
hmm, strange, we should still be able to handle this
we limit the lineage of entities to 100
but it’s possible that 100 downstream entities, each with 100 relationships, could cause payload size issues if long descriptions were also in play
is this sample data or real data?
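For a rough sense of how the 2 MiB cap from the exception can be hit, here is a back-of-envelope sketch; only the 100-entity limit and the 2,097,152-byte cap come from this thread, the per-relationship and description sizes are assumptions.

```python
# Back-of-envelope check: how quickly a batchLoad response can exceed the
# ~2 MiB (2,097,152 byte) limit reported in the TooLongFrameException above.
# The byte sizes below are illustrative assumptions, not measured values.
MAX_RESPONSE_BYTES = 2 * 1024 * 1024      # cap reported by the aggregator

lineage_entities = 100                    # per-entity lineage limit mentioned above
relationships_per_entity = 100            # worst case mentioned above
bytes_per_relationship = 200              # assumed serialized URN + metadata
description_bytes = 2_000                 # assumed long free-text description

estimated = lineage_entities * (
    relationships_per_entity * bytes_per_relationship + description_bytes
)

print(f"estimated payload: {estimated:,} bytes")           # ~2,200,000 bytes
print(f"over the limit: {estimated > MAX_RESPONSE_BYTES}")  # True
```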
b
I’ve narrowed it down to a single table that has ~8 upstream dependencies and ~40 downstream dependencies, but the weird part is that some pages of tables that have this table in their upstream/downstream load fine and some don’t.
g
It's likely this entity contributes a large payload, but not one large enough on its own to make the whole response oversized
so some pages still load fine, but for entities whose responses are already large, it pushes them over the limit
b
it’s a bit strange. Let’s call the problematic dataset “dataset A”. I have a different dataset, “dataset B”, that has dataset A in its downstream. Dataset B’s page loads fine, but when I go to the lineage tab and expand the dependencies of dataset A, I see the same error in the logs.
anyway, reducing the payload by removing the description should help, right?
g
yep, meanwhile let me file a ticket to support large payloads in the lineage endpoint
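If trimming descriptions is the interim workaround, one rough way to do it is to re-emit datasetProperties with a shortened description, sketched below; the URN, GMS address, and character limit are illustrative, and re-emitting the aspect overwrites whatever is currently stored for it.

```python
# Sketch of re-emitting datasetProperties with a truncated description to shrink
# the payload, assuming the acryl-datahub Python package. URN, GMS address, and
# the 500-char limit are illustrative assumptions.
from datahub.emitter.mce_builder import make_dataset_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import ChangeTypeClass, DatasetPropertiesClass

MAX_DESCRIPTION_CHARS = 500

def truncate(text: str, limit: int = MAX_DESCRIPTION_CHARS) -> str:
    """Clip a description so the aspect (and any response embedding it) stays small."""
    return text if len(text) <= limit else text[: limit - 1] + "…"

very_long_description = "this table is built from ... " * 200  # stand-in for the real text

emitter = DatahubRestEmitter(gms_server="http://localhost:8080")
urn = make_dataset_urn(platform="hive", name="db.dataset_a", env="PROD")

# Emitting datasetProperties replaces the stored aspect, so in practice the full
# properties (custom properties, external URL, etc.) should be rebuilt here with
# only the description shortened.
props = DatasetPropertiesClass(description=truncate(very_long_description))

emitter.emit_mcp(
    MetadataChangeProposalWrapper(
        entityType="dataset",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn=urn,
        aspectName="datasetProperties",
        aspect=props,
    )
)
```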