# troubleshooting
e
FYI, on pinot-0.4.0 , after deleting a segment which was in ERROR state the realtime table is no longer ingesting. Is there an api to restart ingestion?
n
how long has it been since deleting the segment?
the controller periodic task that fixes this runs hourly by default
e
Hi @Neha Pawar, thanks for the quick response! We deleted the segment on Friday night :) Is there a way to manually start it or configure the schedule?
n
restarting the controllers will trigger the periodic task.
but it shouldve run several times since friday
you can check in controller logs, if you see lines about RealtimeSegmentValidationManager
e
Thanks! will check logs to see if I can find out. What's the name of the periodic task class, so I can filter the logs?
n
RealtimeSegmentValidationManager
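As for configuring the schedule: the run frequency of this task is set in the controller config. A hedged sketch — the property name below is from memory of the 0.4.0-era ControllerConf and should be verified against your version:

```properties
# Run the realtime segment validation task every 30 minutes
# (the default in this era was hourly, i.e. 3600 seconds).
controller.realtime.segment.validation.frequencyInSeconds=1800
```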
e
Thanks for the help, will do
Actually just found that we had a few more segments in ERROR state from the external view. Deleted them. Could that be why the segment validator didn't run?
n
yes that would explain it. having a segment in ERROR state in the ExternalView means the ideal state would’ve been all good. The validation manager looks at the ideal state for erroneous conditions.
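One way to spot this situation is to compare the two views yourself. The controller exposes `GET /tables/{tableName}/idealstate` and `GET /tables/{tableName}/externalview`; assuming both return segment → server → state maps, a minimal helper (the fetch itself is omitted) might look like:

```python
def find_mismatched_segments(ideal_state, external_view):
    """Return segments that are ERROR in the external view while the
    ideal state expects something else (e.g. ONLINE or CONSUMING).

    Both arguments are assumed to be segment -> server -> state maps,
    as found in the controller's idealstate/externalview responses.
    """
    mismatched = {}
    for segment, expected in ideal_state.items():
        for server, state in external_view.get(segment, {}).items():
            if state == "ERROR" and expected.get(server) != "ERROR":
                mismatched.setdefault(segment, {})[server] = state
    return mismatched

# Hypothetical example: one segment in ERROR on one server
ideal = {"seg__0__1": {"Server_1": "CONSUMING"}}
ev = {"seg__0__1": {"Server_1": "ERROR"}}
print(find_mismatched_segments(ideal, ev))
# -> {'seg__0__1': {'Server_1': 'ERROR'}}
```

Segments flagged this way are exactly the case described above: the ideal state looks healthy, so the validation manager sees nothing to fix.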
e
Thanks!
n
deleting the segment deletes the entry from the ideal state, and that prompts the validation manager to re-create the entry
were all partitions having latest segment in ERROR state?
e
Only from 7/23 when we upgraded to pinot-0.4.0 and only for this table.
I deleted all the error segments and restarted the controllers so once the validation manager runs I should see segments in consuming state?
n
yup
e
Now I see "Mismatching schema/table config for ..." errors, but only for this one problematic table:
```text
textPayload: "java.lang.RuntimeException: Mismatching schema/table config for oas_integration_operation_event_REALTIME
	at org.apache.pinot.core.data.manager.realtime.RealtimeTableDataManager.addSegment(RealtimeTableDataManager.java:238) ~[pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-9e7da0349baa23dd02987a3142818dbc6a144fbe]
	at org.apache.pinot.server.starter.helix.HelixInstanceDataManager.addRealtimeSegment(HelixInstanceDataManager.java:132) ~[pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-9e7da0349baa23dd02987a3142818dbc6a144fbe]
	at org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel.onBecomeOnlineFromOffline(SegmentOnlineOfflineStateModelFactory.java:164) ~[pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-9e7da0349baa23dd02987a3142818dbc6a144fbe]
	at org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel.onBecomeConsumingFromOffline(SegmentOnlineOfflineStateModelFactory.java:88) ~[pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-9e7da0349baa23dd02987a3142818dbc6a144fbe]
	at sun.reflect.GeneratedMethodAccessor110.invoke(Unknown Source) ~[?:?]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_262]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_262]
	at org.apache.helix.messaging.handling.HelixStateTransitionHandler.invoke(HelixStateTransitionHandler.java:404) ~[pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-9e7da0349baa23dd02987a3142818dbc6a144fbe]
	at org.apache.helix.messaging.handling.HelixStateTransitionHandler.handleMessage(HelixStateTransitionHandler.java:331) ~[pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-9e7da0349baa23dd02987a3142818dbc6a144fbe]
	at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:97) ~[pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-9e7da0349baa23dd02987a3142818dbc6a144fbe]
	at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:49) ~[pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-9e7da0349baa23dd02987a3142818dbc6a144fbe]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_262]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_262]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_262]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262]
"
```
n
any WARN before that?
e
Didn't find any warnings, but here are the logs right up to the exception:
Could it be that the table config or schema is not compatible with pinot-0.4.0?
n
```text
Cannot convert from incoming field spec:< field name: operation_ts, data type: LONG, time type: MILLISECONDS, time unit size: 1, time format: EPOCH > to outgoing field spec:< field name: operation_ts, data type: LONG, time type: SECONDS, time unit size: 1, time format: EPOCH > if name is the same
```
we introduced a validation to check that incoming and outgoing names are different
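To illustrate the check: a timeFieldSpec like the one below now fails validation, because the conversion (MILLISECONDS to SECONDS) differs while both specs share the name `operation_ts`. Renaming the outgoing spec satisfies it — a sketch in the pre-0.4.0 schema format, with `operation_ts_seconds` as a hypothetical name:

```json
{
  "timeFieldSpec": {
    "incomingGranularitySpec": {
      "name": "operation_ts",
      "dataType": "LONG",
      "timeType": "MILLISECONDS"
    },
    "outgoingGranularitySpec": {
      "name": "operation_ts_seconds",
      "dataType": "LONG",
      "timeType": "SECONDS"
    }
  }
}
```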
e
Ah! That's great to know:)
n
was this working? as in, was the conversion happening correctly from millis to seconds for operation_ts?
e
It was in pinot-0.3.0
Even created a sorted index on the operation_ts column in pinot-0.3.0 and it appeared to be working
So if we have a timespec where the incoming != outgoing we should also change the name?
n
yeah.
trying to think how we can upgrade your table
e
Thanks! Can I just do a `PUT` with the new schema?
We only have realtime segments
n
umm, you’ll also have to change the time column name in the table config. that can have repercussions
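For reference, the controller API exposes `PUT /schemas/{schemaName}` and `PUT /tables/{tableName}` for updating both pieces. A hedged sketch using only the standard library — the controller address and payloads are placeholders, and (as noted above) renaming the time column this way can have repercussions:

```python
import json
import urllib.request

CONTROLLER = "http://localhost:9000"  # assumed controller address

def build_put(path: str, payload: dict) -> urllib.request.Request:
    """Build a PUT request with a JSON body for the controller REST API."""
    return urllib.request.Request(
        url=CONTROLLER + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

# Update the schema (PUT /schemas/{schemaName}), then the table config
# (PUT /tables/{tableName}) so its timeColumnName matches the new
# outgoing time column name. Payloads here are illustrative stubs.
schema_req = build_put("/schemas/myTable", {"schemaName": "myTable"})
table_req = build_put("/tables/myTable", {"tableName": "myTable"})
# urllib.request.urlopen(schema_req)  # send when ready
```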
e
I can try that (in staging first 🙂 ) - if this doesn't work can I create a new table with the correct table schema/realtime config and load the data into offline segments?
n
what is the retention of your table? and do you have all that data in the kafka topic?
> I create a new table with the correct table schema/realtime config
if this is possible, that's great. though I didn't understand this:
> load the data into offline segments
e
we just get the data from a hive table that we also ingest into and create offline segments
n
oh so this is a hybrid table?
so a complete rebootstrap is possible?
Kishore tells me that you have a custom Decoder implementation?
e
We use the avro confluent decoder - it was merged
So retention is 7 days and the kafka topic holds 7 days of data
n
what’s ingestion rate? 7 days rebootstrap should not take very long
e
No, it seems to suck it up in < 10 seconds 🙂
n
what do you do with the offline segments you create?
e
We just use the uploaddownload client and push them to gcs
And we use the pinot gcs fs plugin
So you recommend creating a new table vs trying to fix the old one?
n
yes, fixing this is going to be complex. been chatting with @Jackie and he feels the same
e
Recreating now, so for the realtime spec do we use the outgoing time field name?
n
> We just use the uploaddownload client and push them to gcs
who/how queries these segments?
e
We have a custom job that queries presto and creates the segments via a flink job, then pushes the generated segments to pinot. Does that answer your question?
n
so you have OFFLINE and REALTIME table ? just wanting to confirm if it is a REALTIME only table or a hybrid table
the fix involves: create a new table with the new outgoing time column name, and use that in the table config's timeColumnName. if this is a REALTIME-only table, then we only need to re-bootstrap the realtime table. if it is hybrid, we would also need to re-bootstrap the offline segments
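The new table config would then point its time column at the renamed outgoing field. A sketch with illustrative names (`myTable`, `operation_ts_seconds`), using the 7-day retention mentioned earlier:

```json
{
  "tableName": "myTable",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "timeColumnName": "operation_ts_seconds",
    "timeType": "SECONDS",
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": "7"
  }
}
```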
e
For this one it's realtime, as we never loaded the table from offline before. This is good to know though!