Hi I am having trouble getting the reload segments to work I Apache Pinot #troubleshooting

Hi, I am having trouble getting the reload segment...

Pradeep

02/09/2021, 7:07 PM

Hi, I am having trouble getting the reload segments to work (I am running latest master code) Seeing below logs on pinot-server. Looks like

SegmentMessageHandlerFactory

is not getting registered for some reason. When I restart the server I don’t see Logs from this function beyond this point. (https://sourcegraph.com/github.com/apache/incubator-pinot/-/blob/pinot-server/src/ma[…]org/apache/pinot/server/starter/helix/HelixServerStarter.java)

Copy code

Subscribing changes listener to path: /PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES, type: CALLBACK, listener: org.apache.helix.messaging.handling.HelixTaskExecutor@4b9419ff
Subscribing child change listener to path:/PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES
Subscribing to path:/PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES took:0
21 START:INVOKE /PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES listener:org.apache.helix.messaging.handling.HelixTaskExecutor@4b9419ff type: CALLBACK
Resubscribe change listener to path: /PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES, for listener: org.apache.helix.messaging.handling.HelixTaskExecutor@4b9419ff, watchChild: false
Subscribing changes listener to path: /PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES, type: CALLBACK, listener: org.apache.helix.messaging.handling.HelixTaskExecutor@4b9419ff
Subscribing child change listener to path:/PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES
Subscribing to path:/PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES took:1
The latency of message dbb39d17-cece-4f3c-bd88-29fb61a93863 is 922470 ms
Fail to find message handler factory for type: USER_DEFINE_MSG msgId: dbb39d17-cece-4f3c-bd88-29fb61a93863
The latency of message 4c8dffc3-68cf-417e-8cd2-a572ce4cdcb4 is 42 ms
Fail to find message handler factory for type: USER_DEFINE_MSG msgId: 4c8dffc3-68cf-417e-8cd2-a572ce4cdcb4
21 END:INVOKE /PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES listener:org.apache.helix.messaging.handling.HelixTaskExecutor@4b9419ff type: CALLBACK Took: 7ms

Xiang Fu

02/09/2021, 8:56 PM

do you have more logs show any error logs

Xiang Fu

02/09/2021, 8:56 PM

are you running on docker or baremetal

Xiang Fu

02/09/2021, 8:57 PM

for reloading segments, do you mean you did call the controller reload endpoint or just restart server

Pradeep

02/09/2021, 8:57 PM

I didn't find any more error logs, next set of logs are segment state transition logs. I was trying controller endpoint then I tried restarting to see debug it

Pradeep

02/09/2021, 8:57 PM

It's running on docker

Xiang Fu

02/09/2021, 8:58 PM

after you restart the pinot server, is ip address same?

Xiang Fu

02/09/2021, 8:59 PM

can you check the idealstates for the table and see if segment assigned instances with same ip

Pradeep

02/09/2021, 8:59 PM

Oh ok let me check that and get back

Xiang Fu

02/09/2021, 9:03 PM

you can set this config in to pinot server config to use hostname not ip to register pinot:

Copy code

pinot.set.instance.id.to.hostname=true

Pradeep

02/09/2021, 9:08 PM

thanks, let me try t hat

Pradeep

02/09/2021, 9:14 PM

So I currently have only one pinot server and all the segments are always attached to that. But I see how this is better

Copy code

pinot.set.instance.id.to.hostname=true

Xiang Fu

02/09/2021, 9:17 PM

got it, there should be a pinotServer.log file in your docker container

Xiang Fu

02/09/2021, 9:17 PM

if possible you can enter into the container and check that file

Pradeep

02/09/2021, 9:24 PM

hmm, it’s empty We created our own docker image but it uses pinot-admin.sh

Xiang Fu

02/09/2021, 9:26 PM

can you check if this file is there :

/opt/pinot/conf/pinot-server-log4j2.xml

Pradeep

02/09/2021, 9:27 PM

ah no, let me fix that and try

Xiang Fu

02/09/2021, 9:27 PM

if so you can append

-Dlog4j2.configurationFile=/opt/pinot/conf/pinot-server-log4j2.xml

to the existing

JAVA_OPTS

(I think you must already configured it ) when you start up pinot server

Pradeep

02/09/2021, 9:31 PM

Copy code

-Dlog4j2.configurationFile=conf/pinot-admin-log4j2.xml

yup it’s setup

Pradeep

02/09/2021, 9:31 PM

i see it as an option with which the process is running too

Xiang Fu

02/09/2021, 9:37 PM

Xiang Fu

02/09/2021, 9:38 PM

conf/pinot-admin-log4j2.xml

should tell where the log files are located

Pradeep

02/09/2021, 9:42 PM

yeah for some reason i don’t see it appending to the pinotServer.log but i see console outputs as part of

sudo docker logs <container_id>

Xiang Fu

02/09/2021, 9:47 PM

ic, that comes from pinot-admin-log4j2.xml

Xiang Fu

02/09/2021, 9:47 PM

you can set console log to INFO

Xiang Fu

02/09/2021, 9:47 PM

so it will print more info

Pradeep

02/09/2021, 9:52 PM

it’s already set to INFO, i do see the info logs

Pradeep

02/09/2021, 9:53 PM

are you looking for any specific log entry?

Xiang Fu

02/09/2021, 9:54 PM

not really

Xiang Fu

02/09/2021, 9:54 PM

cause you say segments are not reloading

Xiang Fu

02/09/2021, 9:54 PM

so the first thing is to see if there is any error that prevent the reloading

Xiang Fu

02/09/2021, 9:55 PM

if restart doesn’t work, then we need to check if idealstates matches the instance id

Pradeep

02/09/2021, 9:56 PM

so segments are in healthy state, I am trying to reload some indicies

Pradeep

02/09/2021, 9:56 PM

and trying to do that using reload. Assignment is correct, all segments are healthy and I am able to query them

Xiang Fu

02/09/2021, 9:57 PM

Pradeep

02/09/2021, 9:57 PM

Initialized with retryCount: 3, retryWaitMs: 100, retryDelayScaleFactor: 5 NoOpPinotCrypter not configured, adding as default Register state model factory for state model SegmentOnlineOfflineStateModel using factory name DEFAULT with org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory@6 bb2e932 Adding preconnect callback: org.apache.pinot.server.starter.helix.HelixServerStarter$$Lambda$142/653079016@1c09a18 Connecting Helix manager ClusterManager.connect() Starting ZkClient event thread. Client environment:zookeeper.version=3.4.11-37e277162d567b55a07d1755f0b31c32e93c01a0, built on 11/01/2017 18:06 GMT Client environment:host.name=<NA> Client environment:java.version=1.8.0_275 Client environment:java.vendor=Private Build Client environment:java.home=/usr/lib/jvm/java-8-openjdk-amd64/jre Client environmentjava.class.path=/opt/pinot/apache pinot incubating 0.7.0 SNAPSHOT bin/lib/pinot all 0.7.0 SNAPSHOT jar with dependencies.jarjmx_prometheus_javaagent-0.12.0.jar Client environmentjava.library.path=/usr/java/packages/lib/amd64/usr/lib/x86_64-linux-gnu/jni/lib/x86 64 linux gnu/usr/lib/x86_64-linux-gnu/usr/lib/jni/lib:/usr/lib Client environment:java.io.tmpdir=/tmp Client environment:java.compiler=<NA> Client environment:os.name=Linux Client environment:os.arch=amd64 Client environment:os.version=5.4.0-1030-aws Client environment:user.name=root Client environment:user.home=/root Client environment:user.dir=/opt/pinot/apache-pinot-incubating-0.7.0-SNAPSHOT-bin

Xiang Fu

02/09/2021, 9:57 PM

then there should be log entry with that segment id

Pradeep

02/09/2021, 9:58 PM

let me try that on a single segment

Xiang Fu

02/09/2021, 9:58 PM

if there is any indices, it should also tell it’s creating indices for that segment

Xiang Fu

02/09/2021, 9:58 PM

yes

Xiang Fu

02/09/2021, 9:58 PM

you can do /reload with force option

Pradeep

02/09/2021, 10:05 PM

nope I see this on the server side when I trigger a reload on a segment. How do I a force reload, this is triggered from controller endpoint

/segments/{tablename}/{segment}/reload

Copy code

Subscribing changes listener to path: /PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES, type: CALLBACK, listener: org.apache.helix.messaging.handling.HelixTaskExecutor@4b9419ff
Subscribing child change listener to path:/PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES
Subscribing to path:/PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES took:7
21 START:INVOKE /PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES listener:org.apache.helix.messaging.handling.HelixTaskExecutor@4b9419ff type: CALLBACK
Resubscribe change listener to path: /PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES, for listener: org.apache.helix.messaging.handling.HelixTaskExecutor@4b9419ff, watchChild:
false
Subscribing changes listener to path: /PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES, type: CALLBACK, listener: org.apache.helix.messaging.handling.HelixTaskExecutor@4b9419ff
Subscribing child change listener to path:/PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES
Consumed 0 events from (rate:0.0/s), currentOffset=4037, numRowsConsumedSoFar=0, numRowsIndexedSoFar=0
Subscribing to path:/PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES took:3
The latency of message 2334f0d8-e251-4f75-85f0-c4db72623b31 is 136141 ms
Fail to find message handler factory for type: USER_DEFINE_MSG msgId: 2334f0d8-e251-4f75-85f0-c4db72623b31
The latency of message f41b6769-5f43-4982-bd40-5c8e43dbd9ac is 60 ms
Fail to find message handler factory for type: USER_DEFINE_MSG msgId: f41b6769-5f43-4982-bd40-5c8e43dbd9ac
21 END:INVOKE /PinotCluster/INSTANCES/Server_10.0.101.11_8069/MESSAGES listener:org.apache.helix.messaging.handling.HelixTaskExecutor@4b9419ff type: CALLBACK Took: 9ms

Pradeep

02/09/2021, 10:07 PM

only thing i can think of is that SegmentMessageHandlerFactory not getting registered for some reason which is weird since I don’t see any exception/errors

Xiang Fu

02/09/2021, 10:07 PM

I don’t think that’s the issue

Xiang Fu

02/09/2021, 10:08 PM

if it’s not registered then segment shouldn’t be loaded

Xiang Fu

02/09/2021, 10:08 PM

so there is no segment got reloaded in server side ?

Xiang Fu

02/09/2021, 10:09 PM

I think consuming segment cannot be reloaded

Xiang Fu

02/09/2021, 10:09 PM

you can reload an online segment that is already persisted

Pradeep

02/09/2021, 10:09 PM

yeah this is an online segment

Xiang Fu

02/09/2021, 10:10 PM

Xiang Fu

02/09/2021, 10:10 PM

then there is no logs on server side for this segment?

Pradeep

02/09/2021, 10:10 PM

I see that segments offline to online is managed by SegmentOnlineOfflineStateModelFactory rite? where as SegmentMessageHandlerFactory seems to be handling reloading/refreshing messages

Pradeep

02/09/2021, 10:10 PM

no I don’t see any logs on the server side for that segment when I did a reload

Pradeep

02/09/2021, 10:11 PM

ok let me get back to debugging this a bit later, but let me know if you have any other ideas/suggestions. Thanks a lot for the help

Xiang Fu

02/09/2021, 10:12 PM

this SegmentMessageHandlerFactory is on pinot server side only

Xiang Fu

02/09/2021, 10:12 PM

Copy code

// Register message handler factory
    SegmentMessageHandlerFactory messageHandlerFactory =
        new SegmentMessageHandlerFactory(fetcherAndLoader, instanceDataManager, serverMetrics);
    _helixManager.getMessagingService()
        .registerMessageHandlerFactory(Message.MessageType.USER_DEFINE_MSG.toString(), messageHandlerFactory);

Xiang Fu

02/09/2021, 10:13 PM

this is registered always

Pradeep

02/09/2021, 10:15 PM

yeah that makes sense which is why I am a bit confused on what’s going on, anyways let me get back to this when I get some time in the evening

Xiang Fu

02/09/2021, 10:18 PM

sure, also check if restarting server makes segment reloading happen with new indexing

Pradeep

02/09/2021, 10:19 PM

yeah restarting server is creating appropriates indicies

Xiang Fu

02/09/2021, 10:24 PM

Open in Slack

Previous Next