https://pinot.apache.org/ logo
#troubleshooting
Title
# troubleshooting
s

Satyam Raj

06/14/2022, 6:03 PM
hey guys, we’re using the pinot-java-client to fire point queries on a pinot table on user_id (inverted indexed). when the api that has this client is hit with a lot of requests (200rpm) it causes the application to crash because of too many open threads. just wanted to check if the queries are async or sync in nature. implemented like below
Copy code
Connection connection = ConnectionFactory.fromHostList(this.pinotConfig.getBrokerUrl());
ResultSetGroup resultSet = connection.execute(new Request("sql",query));
n

Neha Pawar

06/14/2022, 6:25 PM
@saurabh dubey would you help answer this? ^
s

saurabh dubey

06/15/2022, 1:32 AM
Would you happen to have a thread dump to check what these open threads are doing? One of the things I do see is the fact that there's no connection pooling happening within the client at the moment.
The http request itself is async but with a blocking wait effectively blocking your request thread. https://github.com/apache/pinot/blob/master/pinot-clients/pinot-java-client/src/ma[…]/org/apache/pinot/client/JsonAsyncHttpPinotClientTransport.java You could use
Copy code
connection.executeAsync(new Request("sql", query))
if you'd like to handle the response future yourself, in an async fashion
s

Satyam Raj

06/15/2022, 7:19 AM
I have a thread dump… >1k threads are being formed within seconds of the below:
Copy code
New I/O worker #971
PRIORITY : 5

THREAD ID : 0X00007FC938D560C0

NATIVE ID : 0X73F

NATIVE ID (DECIMAL) : 1855

STATE : RUNNABLE


stackTrace:
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <0x00000007baaf6e30> (a sun.nio.ch.Util$3)
- locked <0x00000007baaf6e20> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000007baaf6d08> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at org.jboss.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:68)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:434)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:212)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- <0x00000007baaf7188> (a java.util.concurrent.ThreadPoolExecutor$Worker)
Going ahead to use commons-pool2 for the Connection objects in the client side
s

saurabh dubey

06/15/2022, 7:33 AM
I think we can benefit from connection pooling here. Although that'll need a change in the client
s

Satyam Raj

06/15/2022, 7:33 AM
for now, i’m creating pool of the Connection objects in my application code
s

saurabh dubey

06/15/2022, 7:40 AM
I had and old PR which adds netty connection pooling https://github.com/saurabhd336/pinot/pull/3/files. I'll try to benchmark its performance under heavy query load like 200 QPM, and will try to get this reviewed if there are performance improvements. @Neha Pawar
👀 1
s

Satyam Raj

06/15/2022, 12:51 PM
@saurabh dubey i think the client has a thread-leak. even if i get a future, it creates New I/O worker #47 too many times
s

saurabh dubey

06/15/2022, 4:32 PM
Will it be possible to share a thread dump? If not, could you verify that in the thread dump, if these thread do indeed belong to the AsyncHTTPClient? i,e. look something like
Copy code
"AsyncHttpClient-3-3@3992" prio=5 tid=0x19 nid=NA runnable
  java.lang.Thread.State: RUNNABLE
	  at org.asynchttpclient.netty.channel.ChannelManager$1.initChannel(ChannelManager.java:222)
Just to make sure its indeed the client thread and not some application threads. I do think we could enable connection pooling and set max allowed connection for the http client being used underneath. https://github.com/apache/pinot/blob/master/pinot-clients/pinot-java-client/src/ma[…]/org/apache/pinot/client/JsonAsyncHttpPinotClientTransport.java
Copy code
builder.setReadTimeout((int) BROKER_READ_TIMEOUT_MS)
        .setConnectTimeout(BROKER_CONNECT_TIMEOUT_MS)
        .setKeepAlive(true)
        .setMaxConnections(10);
s

Satyam Raj

06/16/2022, 8:19 AM
@Troy Doles FYI
thankyou 1
I'll share the thread dump in a while
s

Satyam Raj

06/16/2022, 9:22 AM
@saurabh dubey
s

saurabh dubey

06/16/2022, 9:49 AM
@Satyam Raj are you creating a new Connection object for each request?
s

Satyam Raj

06/17/2022, 7:33 AM
No… I’m maintaining a GenericObjectPool of the Connection objects prepared
s

saurabh dubey

06/17/2022, 7:56 AM
So within a single instance of AsyncHttpClient (which is what the pinot-java client uses underneath), the number of IO threads are capped to 2 * number of CPU cores (this comes from netty) https://github.com/netty/netty/blob/4.1/transport/src/main/java/io/netty/channel/MultithreadEventLoopGroup.java#L40 So not sure why you're seeing those many IO threads. Not sure if this could be some other client within the application or something else.. Although, I did find one issue with the pinot-java client wherein, due to a lack of synchronisation, > 1 instances of AHC can get created if threads call
Copy code
Connection conn = ConnectionFactory.fromZookeeper("<ZK URL>");
https://github.com/apache/pinot/blob/master/pinot-clients/pinot-java-client/src/main/java/org/apache/pinot/client/ConnectionFactory.java#L174 But its a bit unlikely.
One way to explicitly cap the number of IO threads, would be via Dsl.config().setIoThreadsCount()
Copy code
builder.setReadTimeout((int) BROKER_READ_TIMEOUT_MS)
        .setIoThreadsCount(10)
        .setConnectTimeout(BROKER_CONNECT_TIMEOUT_MS);
here https://github.com/apache/pinot/blob/master/pinot-clients/pinot-java-client/src/ma[…]/org/apache/pinot/client/JsonAsyncHttpPinotClientTransport.java. But even without that, the max count is capped to 2 * number of CPU cores.. Could you verify if there might be something else in your application that might be creating these threads?
BTW, I just checked and the the threadFactory used to create the AHC threads, has a name "AsyncHttpClient". So all threads created by the AHC should look like
Copy code
"AsyncHttpClient-3-3@3992" prio=5 tid=0x19 nid=NA runnable
  java.lang.Thread.State: RUNNABLE
	  at org.asynchttpclient.netty.channel.ChannelManager$1.initChannel(ChannelManager.java:222)
The IO threads in your thread dump don't have this prefix. Hence my suspicion that these are being created by something else in the application
s

Satyam Raj

06/17/2022, 11:58 AM
Thanks for the debugging @saurabh dubey I searched a bit and found that these I/O threaads are coming from Netty itself (https://github.com/twitter/finagle/issues/152) and only spawning once our app’s api that has this client implementation is hit. They don’t spawn on other API calls (that don’t have pinot-client) Connection pooling is also again creating the same number of threads (although a bit low in number) As a workaround, I’m trying to use the REST api call to localhost:8099/query/sql (on the broker).
Quick question though, for this api that will be hit with ~200 rpm, should this rest call to pinot be asynchronously implemented?
s

saurabh dubey

06/20/2022, 4:49 AM
I think either should be fine. Just make sure you use keepAlive connections, and use connection pooling. Most http clients do that by default, one last thing wrt the debugging, I realised there's a possibility that either the entire async-http-client library or one of its core dependencies could be getting overriden by something else in the pom dependencies.. Could you check the output of
mvn org.apache.maven.plugins:maven-dependency-plugin:2.10:tree -Dverbose=true
and try and verigy whether the netty and AHC versions match with
Copy code
saurabh.dubey@Saurabhs-MacBook-Pro pinot-java-client % mvn org.apache.maven.plugins:maven-dependency-plugin:2.10:tree -Dverbose=true
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Detecting the operating system and CPU architecture
[INFO] ------------------------------------------------------------------------
[INFO] os.detected.name: osx
[INFO] os.detected.arch: aarch_64
[INFO] os.detected.version: 12.4
[INFO] os.detected.version.major: 12
[INFO] os.detected.version.minor: 4
[INFO] os.detected.classifier: osx-aarch_64
[INFO] 
[INFO] -----------------< org.apache.pinot:pinot-java-client >-----------------
[INFO] Building Pinot Java Client 0.11.0-SNAPSHOT
[INFO] --------------------------------[ jar ]---------------------------------
[INFO] 
[INFO] --- maven-dependency-plugin:2.10:tree (default-cli) @ pinot-java-client ---
[WARNING] Using Maven 2 dependency tree to get verbose output, which may be inconsistent with actual Maven 3 resolution
[INFO] org.apache.pinot:pinot-java-client:jar:0.11.0-SNAPSHOT
[INFO] +- org.testng:testng:jar:6.11:test
[INFO] |  +- com.beust:jcommander:jar:1.64:test
[INFO] |  \- (org.yaml:snakeyaml:jar:1.30:compile - version managed from 1.17; scope updated from test; omitted for duplicate)
[INFO] +- com.fasterxml.jackson.core:jackson-databind:jar:2.10.0:compile
[INFO] |  +- com.fasterxml.jackson.core:jackson-annotations:jar:2.10.0:compile
[INFO] |  \- com.fasterxml.jackson.core:jackson-core:jar:2.10.0:compile
[INFO] +- org.asynchttpclient:async-http-client:jar:2.12.3:compile
[INFO] |  +- org.asynchttpclient:async-http-client-netty-utils:jar:2.12.3:compile
[INFO] |  |  +- io.netty:netty-buffer:jar:4.1.74.Final:compile (version managed from 4.1.60.Final)
[INFO] |  |  |  \- (io.netty:netty-common:jar:4.1.74.Final:compile - omitted for duplicate)
[INFO] |  |  +- (org.slf4j:slf4j-api:jar:1.7.25:compile - version managed from 1.7.30; omitted for duplicate)
[INFO] |  |  \- (com.sun.activation:jakarta.activation:jar:1.2.2:compile - omitted for duplicate)
[INFO] |  +- io.netty:netty-codec-http:jar:4.1.74.Final:compile (version managed from 4.1.60.Final)
[INFO] |  |  +- io.netty:netty-common:jar:4.1.74.Final:compile
[INFO] |  |  +- (io.netty:netty-buffer:jar:4.1.74.Final:compile - version managed from 4.1.60.Final; omitted for duplicate)
[INFO] |  |  +- io.netty:netty-transport:jar:4.1.74.Final:compile
[INFO] |  |  |  +- (io.netty:netty-common:jar:4.1.74.Final:compile - omitted for duplicate)
[INFO] |  |  |  +- (io.netty:netty-buffer:jar:4.1.74.Final:compile - version managed from 4.1.60.Final; omitted for duplicate)
[INFO] |  |  |  \- (io.netty:netty-resolver:jar:4.1.74.Final:compile - omitted for duplicate)
[INFO] |  |  +- io.netty:netty-codec:jar:4.1.74.Final:compile
[INFO] |  |  |  +- (io.netty:netty-common:jar:4.1.74.Final:compile - omitted for duplicate)
[INFO] |  |  |  +- (io.netty:netty-buffer:jar:4.1.74.Final:compile - version managed from 4.1.60.Final; omitted for duplicate)
[INFO] |  |  |  \- (io.netty:netty-transport:jar:4.1.74.Final:compile - omitted for duplicate)
[INFO] |  |  \- (io.netty:netty-handler:jar:4.1.74.Final:compile - version managed from 4.1.60.Final; omitted for duplicate)
[INFO] |  +- io.netty:netty-handler:jar:4.1.74.Final:compile
[INFO] |  |  +- (io.netty:netty-common:jar:4.1.74.Final:compile - omitted for duplicate)
[INFO] |  |  +- io.netty:netty-resolver:jar:4.1.74.Final:compile
[INFO] |  |  |  \- (io.netty:netty-common:jar:4.1.74.Final:compile - omitted for duplicate)
[INFO] |  |  +- (io.netty:netty-buffer:jar:4.1.74.Final:compile - version managed from 4.1.60.Final; omitted for duplicate)
[INFO] |  |  +- (io.netty:netty-transport:jar:4.1.74.Final:compile - omitted for duplicate)
[INFO] |  |  +- (io.netty:netty-codec:jar:4.1.74.Final:compile - omitted for duplicate)
[INFO] |  |  \- io.netty:netty-tcnative-classes:jar:2.0.48.Final:compile
[INFO] |  +- io.netty:netty-codec-socks:jar:4.1.74.Final:compile (version managed from 4.1.60.Final)
[INFO] |  |  +- (io.netty:netty-common:jar:4.1.74.Final:compile - omitted for duplicate)
[INFO] |  |  +- (io.netty:netty-buffer:jar:4.1.74.Final:compile - version managed from 4.1.60.Final; omitted for duplicate)
[INFO] |  |  +- (io.netty:netty-transport:jar:4.1.74.Final:compile - omitted for duplicate)
[INFO] |  |  \- (io.netty:netty-codec:jar:4.1.74.Final:compile - omitted for duplicate)
[INFO] |  +- io.netty:netty-handler-proxy:jar:4.1.74.Final:compile (version managed from 4.1.60.Final)
[INFO] |  |  +- (io.netty:netty-common:jar:4.1.74.Final:compile - omitted for duplicate)
[INFO] |  |  +- (io.netty:netty-buffer:jar:4.1.74.Final:compile - version managed from 4.1.60.Final; omitted for duplicate)
[INFO] |  |  +- (io.netty:netty-transport:jar:4.1.74.Final:compile - omitted for duplicate)
[INFO] |  |  +- (io.netty:netty-codec:jar:4.1.74.Final:compile - omitted for duplicate)
[INFO] |  |  +- (io.netty:netty-codec-socks:jar:4.1.74.Final:compile - version managed from 4.1.60.Final; omitted for duplicate)
[INFO] |  |  \- (io.netty:netty-codec-http:jar:4.1.74.Final:compile - version managed from 4.1.60.Final; omitted for duplicate)
[INFO] |  +- io.netty:netty-transport-native-epoll:jar:linux-x86_64:4.1.74.Final:compile (version managed from 4.1.60.Final)
[INFO] |  |  +- (io.netty:netty-common:jar:4.1.74.Final:compile - omitted for duplicate)
[INFO] |  |  +- (io.netty:netty-buffer:jar:4.1.74.Final:compile - version managed from 4.1.60.Final; omitted for duplicate)
[INFO] |  |  +- (io.netty:netty-transport:jar:4.1.74.Final:compile - omitted for duplicate)
[INFO] |  |  \- io.netty:netty-transport-classes-epoll:jar:4.1.74.Final:compile
[INFO] |  |     +- (io.netty:netty-common:jar:4.1.74.Final:compile - omitted for duplicate)
[INFO] |  |     +- (io.netty:netty-buffer:jar:4.1.74.Final:compile - version managed from 4.1.60.Final; omitted for duplicate)
[INFO] |  |     \- (io.netty:netty-transport:jar:4.1.74.Final:compile - omitted for duplicate)
[INFO] |  +- org.reactivestreams:reactive-streams:jar:1.0.3:compile
[INFO] |  +- com.typesafe.netty:netty-reactive-streams:jar:2.0.4:compile
[INFO] |  |  +- (io.netty:netty-handler:jar:4.1.74.Final:compile - version managed from 4.1.43.Final; omitted for duplicate)
[INFO] |  |  \- (org.reactivestreams:reactive-streams:jar:1.0.3:compile - omitted for duplicate)
[INFO] |  +- (org.slf4j:slf4j-api:jar:1.7.25:compile - version managed from 1.6.1; omitted for duplicate)
[INFO] |  \- com.sun.activation:jakarta.activation:jar:1.2.2:compile
[INFO] +- io.netty:netty:jar:3.9.6.Final:compile
[INFO] +- com.101tec:zkclient:jar:0.7:compile
[INFO] |  +- (org.slf4j:slf4j-api:jar:1.7.25:compile - version managed from 1.6.1; omitted for duplicate)
[INFO] |  \- org.apache.zookeeper:zookeeper:jar:3.5.8:compile (version managed from 3.4.6)
[INFO] |     +- org.apache.zookeeper:zookeeper-jute:jar:3.5.8:compile
[INFO] |     +- (io.netty:netty-handler:jar:4.1.74.Final:compile - version managed from 4.1.48.Final; omitted for duplicate)
[INFO] |     +- io.netty:netty-transport-native-epoll:jar:4.1.74.Final:compile (version managed from 4.1.48.Final)
[INFO] |     |  +- (io.netty:netty-common:jar:4.1.74.Final:compile - omitted for duplicate)
[INFO] |     |  +- (io.netty:netty-buffer:jar:4.1.74.Final:compile - version managed from 4.1.60.Final; omitted for duplicate)
[INFO] |     |  +- (io.netty:netty-transport:jar:4.1.74.Final:compile - omitted for duplicate)
[INFO] |     |  +- io.netty:netty-transport-native-unix-common:jar:4.1.74.Final:compile
[INFO] |     |  |  +- (io.netty:netty-common:jar:4.1.74.Final:compile - omitted for duplicate)
[INFO] |     |  |  +- (io.netty:netty-buffer:jar:4.1.74.Final:compile - version managed from 4.1.60.Final; omitted for duplicate)
[INFO] |     |  |  \- (io.netty:netty-transport:jar:4.1.74.Final:compile - omitted for duplicate)
[INFO] |     |  \- (io.netty:netty-transport-classes-epoll:jar:4.1.74.Final:compile - omitted for duplicate)
[INFO] |     \- (org.slf4j:slf4j-api:jar:1.7.25:compile - version managed from 1.6.1; omitted for duplicate)
[INFO] +- org.slf4j:slf4j-api:jar:1.7.25:compile
[INFO] +- com.google.code.findbugs:jsr305:jar:3.0.0:compile
[INFO] +- org.mockito:mockito-core:jar:3.9.0:test
[INFO] |  +- net.bytebuddy:byte-buddy:jar:1.10.20:test
[INFO] |  +- net.bytebuddy:byte-buddy-agent:jar:1.10.20:test
[INFO] |  \- org.objenesis:objenesis:jar:3.2:test
[INFO] \- org.apache.pinot:pinot-spi:jar:0.11.0-SNAPSHOT:compile
[INFO]    +- commons-configuration:commons-configuration:jar:1.10:compile
[INFO]    +- commons-codec:commons-codec:jar:1.15:compile
[INFO]    +- commons-io:commons-io:jar:2.11.0:compile
[INFO]    +- commons-lang:commons-lang:jar:2.6:compile
[INFO]    +- commons-logging:commons-logging:jar:1.2:compile
[INFO]    +- org.apache.commons:commons-lang3:jar:3.5:compile
[INFO]    +- commons-collections:commons-collections:jar:3.2.2:compile
[INFO]    +- com.google.guava:guava:jar:20.0:compile
[INFO]    +- (com.google.code.findbugs:jsr305:jar:3.0.0:compile - omitted for duplicate)
[INFO]    +- org.apache.logging.log4j:log4j-slf4j-impl:jar:2.17.1:compile
[INFO]    |  +- (org.slf4j:slf4j-api:jar:1.7.25:compile - version managed from 1.6.1; omitted for duplicate)
[INFO]    |  +- org.apache.logging.log4j:log4j-api:jar:2.17.1:compile
[INFO]    |  \- org.apache.logging.log4j:log4j-core:jar:2.17.1:runtime
[INFO]    |     \- (org.apache.logging.log4j:log4j-api:jar:2.17.1:runtime - omitted for duplicate)
[INFO]    +- com.lmax:disruptor:jar:3.3.4:compile
[INFO]    +- org.apache.logging.log4j:log4j-1.2-api:jar:2.17.1:compile
[INFO]    |  \- (org.apache.logging.log4j:log4j-api:jar:2.17.1:compile - omitted for duplicate)
[INFO]    +- joda-time:joda-time:jar:2.10.5:compile
[INFO]    +- (com.fasterxml.jackson.core:jackson-annotations:jar:2.10.0:compile - omitted for duplicate)
[INFO]    +- org.yaml:snakeyaml:jar:1.30:compile
[INFO]    +- (com.fasterxml.jackson.core:jackson-databind:jar:2.10.0:compile - omitted for duplicate)
[INFO]    +- org.apache.avro:avro:jar:1.9.2:compile
[INFO]    |  +- (com.fasterxml.jackson.core:jackson-core:jar:2.10.0:compile - version managed from 2.10.2; omitted for duplicate)
[INFO]    |  +- (com.fasterxml.jackson.core:jackson-databind:jar:2.10.0:compile - version managed from 2.10.2; omitted for duplicate)
[INFO]    |  +- org.apache.commons:commons-compress:jar:1.21:compile (version managed from 1.19)
[INFO]    |  \- (org.slf4j:slf4j-api:jar:1.7.25:compile - version managed from 1.6.1; omitted for duplicate)
[INFO]    +- org.codehaus.groovy:groovy-all:jar:2.4.21:compile
[INFO]    \- org.reflections:reflections:jar:0.9.9:compile
[INFO]       +- (com.google.guava:guava:jar:20.0:compile - version managed from 15.0; omitted for duplicate)
[INFO]       +- org.javassist:javassist:jar:3.19.0-GA:compile (version managed from 3.18.2-GA)
[INFO]       \- com.google.code.findbugs:annotations:jar:2.0.1:compile
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  0.785 s
[INFO] Finished at: 2022-06-20T10:13:00+05:30
[INFO] ------------------------------------------------------------------------
s

Satyam Raj

06/20/2022, 8:18 AM
The application is on jdk-8 so we’re using the pinot-java-client:0.7.0 as the >0.10 are compiled with java11
this is when i do
mvn dependency:tree -Dincludes=io.netty
Copy code
[INFO] +- org.apache.phoenix:phoenix-core:jar:4.4.0.2.4.3.0-227:compile
[INFO] |  \- io.netty:netty-all:jar:4.0.23.Final:compile
[INFO] +- org.apache.pinot:pinot-jdbc-client:jar:0.7.0:compile
[INFO] |  \- io.netty:netty:jar:3.9.6.Final:compile
[INFO] \- com.datastax.cassandra:cassandra-driver-mapping:jar:3.4.0:compile
[INFO]    \- com.datastax.cassandra:cassandra-driver-core:jar:3.4.0:compile
[INFO]       \- io.netty:netty-handler:jar:4.0.47.Final:compile
[INFO]          +- io.netty:netty-buffer:jar:4.0.47.Final:compile
[INFO]          |  \- io.netty:netty-common:jar:4.0.47.Final:compile
[INFO]          +- io.netty:netty-transport:jar:4.0.47.Final:compile
[INFO]          \- io.netty:netty-codec:jar:4.0.47.Final:compile
I’ve been able to use the broker rest endpoint instead of this java client for the queries and working pretty fine in prod at 10k rpm. 🎉