Hi, I am having trouble getting the reload segment...
# troubleshooting
p
Hi, I am having trouble getting the reload segments to work (I am running latest master code) Seeing below logs on pinot-server. Looks like
SegmentMessageHandlerFactory
is not getting registered for some reason. When I restart the server I don’t see Logs from this function beyond this point. (https://sourcegraph.com/github.com/apache/incubator-pinot/-/blob/pinot-server/src/ma[…]org/apache/pinot/server/starter/helix/HelixServerStarter.java)
Copy code
Subscribing changes listener to path: /PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES, type: CALLBACK, listener: org.apache.helix.messaging.handling.HelixTaskExecutor@4b9419ff
Subscribing child change listener to path:/PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES
Subscribing to path:/PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES took:0
21 START:INVOKE /PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES listener:org.apache.helix.messaging.handling.HelixTaskExecutor@4b9419ff type: CALLBACK
Resubscribe change listener to path: /PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES, for listener: org.apache.helix.messaging.handling.HelixTaskExecutor@4b9419ff, watchChild: false
Subscribing changes listener to path: /PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES, type: CALLBACK, listener: org.apache.helix.messaging.handling.HelixTaskExecutor@4b9419ff
Subscribing child change listener to path:/PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES
Subscribing to path:/PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES took:1
The latency of message dbb39d17-cece-4f3c-bd88-29fb61a93863 is 922470 ms
Fail to find message handler factory for type: USER_DEFINE_MSG msgId: dbb39d17-cece-4f3c-bd88-29fb61a93863
The latency of message 4c8dffc3-68cf-417e-8cd2-a572ce4cdcb4 is 42 ms
Fail to find message handler factory for type: USER_DEFINE_MSG msgId: 4c8dffc3-68cf-417e-8cd2-a572ce4cdcb4
21 END:INVOKE /PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES listener:org.apache.helix.messaging.handling.HelixTaskExecutor@4b9419ff type: CALLBACK Took: 7ms
x
do you have more logs show any error logs
are you running on docker or baremetal
for reloading segments, do you mean you did call the controller reload endpoint or just restart server
p
I didn't find any more error logs, next set of logs are segment state transition logs. I was trying controller endpoint then I tried restarting to see debug it
It's running on docker
x
after you restart the pinot server, is ip address same?
can you check the idealstates for the table and see if segment assigned instances with same ip
p
Oh ok let me check that and get back
x
you can set this config in to pinot server config to use hostname not ip to register pinot:
Copy code
pinot.set.instance.id.to.hostname=true
p
thanks, let me try t hat
So I currently have only one pinot server and all the segments are always attached to that. But I see how this is better
Copy code
pinot.set.instance.id.to.hostname=true
x
got it, there should be a pinotServer.log file in your docker container
if possible you can enter into the container and check that file
p
hmm, it’s empty We created our own docker image but it uses pinot-admin.sh
x
can you check if this file is there :
/opt/pinot/conf/pinot-server-log4j2.xml
p
ah no, let me fix that and try
x
if so you can append
-Dlog4j2.configurationFile=/opt/pinot/conf/pinot-server-log4j2.xml
to the existing
JAVA_OPTS
(I think you must already configured it ) when you start up pinot server
p
Copy code
-Dlog4j2.configurationFile=conf/pinot-admin-log4j2.xml
yup it’s setup
i see it as an option with which the process is running too
x
ic
so
conf/pinot-admin-log4j2.xml
should tell where the log files are located
p
yeah for some reason i don’t see it appending to the pinotServer.log but i see console outputs as part of
sudo docker logs <container_id>
x
ic, that comes from pinot-admin-log4j2.xml
you can set console log to INFO
so it will print more info
p
it’s already set to INFO, i do see the info logs
are you looking for any specific log entry?
x
not really
cause you say segments are not reloading
so the first thing is to see if there is any error that prevent the reloading
if restart doesn’t work, then we need to check if idealstates matches the instance id
p
so segments are in healthy state, I am trying to reload some indicies
and trying to do that using reload. Assignment is correct, all segments are healthy and I am able to query them
x
ic
p
Initialized with retryCount: 3, retryWaitMs: 100, retryDelayScaleFactor: 5 NoOpPinotCrypter not configured, adding as default Register state model factory for state model SegmentOnlineOfflineStateModel using factory name DEFAULT with org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory@6 bb2e932 Adding preconnect callback: org.apache.pinot.server.starter.helix.HelixServerStarter$$Lambda$142/653079016@1c09a18 Connecting Helix manager ClusterManager.connect() Starting ZkClient event thread. Client environment:zookeeper.version=3.4.11-37e277162d567b55a07d1755f0b31c32e93c01a0, built on 11/01/2017 18:06 GMT Client environment:host.name=<NA> Client environment:java.version=1.8.0_275 Client environment:java.vendor=Private Build Client environment:java.home=/usr/lib/jvm/java-8-openjdk-amd64/jre Client environmentjava.class.path=/opt/pinot/apache pinot incubating 0.7.0 SNAPSHOT bin/lib/pinot all 0.7.0 SNAPSHOT jar with dependencies.jarjmx_prometheus_javaagent-0.12.0.jar Client environmentjava.library.path=/usr/java/packages/lib/amd64/usr/lib/x86_64-linux-gnu/jni/lib/x86 64 linux gnu/usr/lib/x86_64-linux-gnu/usr/lib/jni/lib:/usr/lib Client environment:java.io.tmpdir=/tmp Client environment:java.compiler=<NA> Client environment:os.name=Linux Client environment:os.arch=amd64 Client environment:os.version=5.4.0-1030-aws Client environment:user.name=root Client environment:user.home=/root Client environment:user.dir=/opt/pinot/apache-pinot-incubating-0.7.0-SNAPSHOT-bin
x
then there should be log entry with that segment id
p
let me try that on a single segment
x
if there is any indices, it should also tell it’s creating indices for that segment
yes
you can do /reload with force option
p
nope I see this on the server side when I trigger a reload on a segment. How do I a force reload, this is triggered from controller endpoint
/segments/{tablename}/{segment}/reload
Copy code
Subscribing changes listener to path: /PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES, type: CALLBACK, listener: org.apache.helix.messaging.handling.HelixTaskExecutor@4b9419ff
Subscribing child change listener to path:/PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES
Subscribing to path:/PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES took:7
21 START:INVOKE /PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES listener:org.apache.helix.messaging.handling.HelixTaskExecutor@4b9419ff type: CALLBACK
Resubscribe change listener to path: /PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES, for listener: org.apache.helix.messaging.handling.HelixTaskExecutor@4b9419ff, watchChild:
false
Subscribing changes listener to path: /PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES, type: CALLBACK, listener: org.apache.helix.messaging.handling.HelixTaskExecutor@4b9419ff
Subscribing child change listener to path:/PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES
Consumed 0 events from (rate:0.0/s), currentOffset=4037, numRowsConsumedSoFar=0, numRowsIndexedSoFar=0
Subscribing to path:/PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES took:3
The latency of message 2334f0d8-e251-4f75-85f0-c4db72623b31 is 136141 ms
Fail to find message handler factory for type: USER_DEFINE_MSG msgId: 2334f0d8-e251-4f75-85f0-c4db72623b31
The latency of message f41b6769-5f43-4982-bd40-5c8e43dbd9ac is 60 ms
Fail to find message handler factory for type: USER_DEFINE_MSG msgId: f41b6769-5f43-4982-bd40-5c8e43dbd9ac
21 END:INVOKE /PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES listener:org.apache.helix.messaging.handling.HelixTaskExecutor@4b9419ff type: CALLBACK Took: 9ms
only thing i can think of is that SegmentMessageHandlerFactory not getting registered for some reason which is weird since I don’t see any exception/errors
x
I don’t think that’s the issue
if it’s not registered then segment shouldn’t be loaded
so there is no segment got reloaded in server side ?
I think consuming segment cannot be reloaded
you can reload an online segment that is already persisted
p
yeah this is an online segment
x
ok
then there is no logs on server side for this segment?
p
I see that segments offline to online is managed by SegmentOnlineOfflineStateModelFactory rite? where as SegmentMessageHandlerFactory seems to be handling reloading/refreshing messages
no I don’t see any logs on the server side for that segment when I did a reload
ok let me get back to debugging this a bit later, but let me know if you have any other ideas/suggestions. Thanks a lot for the help
x
this SegmentMessageHandlerFactory is on pinot server side only
Copy code
// Register message handler factory
    SegmentMessageHandlerFactory messageHandlerFactory =
        new SegmentMessageHandlerFactory(fetcherAndLoader, instanceDataManager, serverMetrics);
    _helixManager.getMessagingService()
        .registerMessageHandlerFactory(Message.MessageType.USER_DEFINE_MSG.toString(), messageHandlerFactory);
this is registered always
p
yeah that makes sense which is why I am a bit confused on what’s going on, anyways let me get back to this when I get some time in the evening
x
sure, also check if restarting server makes segment reloading happen with new indexing
p
yeah restarting server is creating appropriates indicies
x
ok