
Elon

07/27/2020, 5:00 PM
FYI, on pinot-0.4.0, after deleting a segment that was in ERROR state, the realtime table is no longer ingesting. Is there an API to restart ingestion?

Neha Pawar

07/27/2020, 5:04 PM
how long has it been since deleting the segment?
the controller periodic task that fixes this runs hourly by default
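For reference, the cadence of that periodic task is controllable through controller configuration. A sketch, since the exact property name has varied across Pinot releases and should be checked against the controller config reference for your version:

```properties
# Run the RealtimeSegmentValidationManager every 15 minutes instead of the
# hourly default. Property name as used around the 0.4.x line; newer releases
# may spell it differently (e.g. a frequencyPeriod variant).
controller.realtime.segment.validation.frequencyInSeconds=900
```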

Elon

07/27/2020, 5:08 PM
Hi @Neha Pawar, thanks for the quick response! We deleted the segment on Friday night :) Is there a way to manually start it, or to configure the schedule?

Neha Pawar

07/27/2020, 5:14 PM
restarting the controllers will trigger the periodic task.
👍 1
but it should've run several times since Friday
you can check the controller logs for lines mentioning RealtimeSegmentValidationManager

Elon

07/27/2020, 5:15 PM
Thanks! Will check the logs to see if I can find anything. What's the name of the periodic task class, so I can filter the logs?

Neha Pawar

07/27/2020, 5:15 PM
RealtimeSegmentValidationManager

Elon

07/27/2020, 5:16 PM
Thanks for the help, will do
Actually, just found that we had a few more segments in ERROR state in the external view. Deleted them. Could that be why the segment validator didn't run?

Neha Pawar

07/27/2020, 5:49 PM
yes, that would explain it. having a segment in ERROR state only in the ExternalView means the IdealState would've looked all good. the validation manager looks at the IdealState for erroneous conditions.
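The distinction matters because Helix tracks two views per table: the IdealState (desired segment states) and the ExternalView (what servers actually report). A small sketch of the mismatch, using hypothetical segment/server maps shaped like Helix's `{segment: {server: state}}` records:

```python
# Sketch: find segments whose ExternalView state (actual) disagrees with
# the IdealState (desired). A validation task that only inspects the
# IdealState would see nothing wrong in this situation.

def find_error_segments(ideal_state: dict, external_view: dict) -> list:
    """Return (segment, ideal-state entry) for segments any server reports as ERROR."""
    mismatched = []
    for segment, server_states in external_view.items():
        if "ERROR" in server_states.values():
            mismatched.append((segment, ideal_state.get(segment, {})))
    return mismatched

# Hypothetical maps; in a real cluster these would come from the controller's
# /tables/{tableName}/idealstate and /tables/{tableName}/externalview endpoints.
ideal = {"myTable__0__42__20200723T0000Z": {"Server_1": "CONSUMING"}}
external = {"myTable__0__42__20200723T0000Z": {"Server_1": "ERROR"}}

print(find_error_segments(ideal, external))
```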

Elon

07/27/2020, 5:49 PM
Thanks!

Neha Pawar

07/27/2020, 5:49 PM
deleting the segment deletes the entry from the IdealState, and then prompts the validation manager to recreate the entry
did all partitions have their latest segment in ERROR state?

Elon

07/27/2020, 5:50 PM
Only from 7/23 when we upgraded to pinot-0.4.0, and only for this table.
I deleted all the ERROR segments and restarted the controllers, so once the validation manager runs I should see segments in CONSUMING state?

Neha Pawar

07/27/2020, 5:52 PM
yup
😁 1

Elon

07/27/2020, 6:07 PM
Now I see "Mismatching schema/table config for ..." errors, but only for this one problematic table:
textPayload: "java.lang.RuntimeException: Mismatching schema/table config for oas_integration_operation_event_REALTIME
	at org.apache.pinot.core.data.manager.realtime.RealtimeTableDataManager.addSegment(RealtimeTableDataManager.java:238) ~[pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-9e7da0349baa23dd02987a3142818dbc6a144fbe]
	at org.apache.pinot.server.starter.helix.HelixInstanceDataManager.addRealtimeSegment(HelixInstanceDataManager.java:132) ~[pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-9e7da0349baa23dd02987a3142818dbc6a144fbe]
	at org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel.onBecomeOnlineFromOffline(SegmentOnlineOfflineStateModelFactory.java:164) ~[pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-9e7da0349baa23dd02987a3142818dbc6a144fbe]
	at org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel.onBecomeConsumingFromOffline(SegmentOnlineOfflineStateModelFactory.java:88) ~[pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-9e7da0349baa23dd02987a3142818dbc6a144fbe]
	at sun.reflect.GeneratedMethodAccessor110.invoke(Unknown Source) ~[?:?]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_262]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_262]
	at org.apache.helix.messaging.handling.HelixStateTransitionHandler.invoke(HelixStateTransitionHandler.java:404) ~[pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-9e7da0349baa23dd02987a3142818dbc6a144fbe]
	at org.apache.helix.messaging.handling.HelixStateTransitionHandler.handleMessage(HelixStateTransitionHandler.java:331) ~[pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-9e7da0349baa23dd02987a3142818dbc6a144fbe]
	at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:97) ~[pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-9e7da0349baa23dd02987a3142818dbc6a144fbe]
	at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:49) ~[pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-9e7da0349baa23dd02987a3142818dbc6a144fbe]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_262]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_262]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_262]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262]
"

Neha Pawar

07/27/2020, 6:12 PM
any WARN before that?

Elon

07/27/2020, 6:34 PM
Didn't find any warnings, but here are the logs right up to the exception:
Could it be that the table config or schema is not compatible with pinot-0.4.0?

Neha Pawar

07/27/2020, 6:38 PM
Cannot convert from incoming field spec:< field name: operation_ts, data type: LONG, time type: MILLISECONDS, time unit size: 1, time format: EPOCH > to outgoing field spec:< field name: operation_ts, data type: LONG, time type: SECONDS, time unit size: 1, time format: EPOCH > if name is the same
we introduced a validation that checks that the incoming and outgoing time column names are different
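The error Neha pasted points at the fix: give the outgoing spec its own column name. A sketch of what the updated timeFieldSpec in the schema might look like (`operation_ts_seconds` is a hypothetical name chosen for illustration; the dataType/timeType values are taken from the error message above):

```json
{
  "timeFieldSpec": {
    "incomingGranularitySpec": {
      "name": "operation_ts",
      "dataType": "LONG",
      "timeType": "MILLISECONDS"
    },
    "outgoingGranularitySpec": {
      "name": "operation_ts_seconds",
      "dataType": "LONG",
      "timeType": "SECONDS"
    }
  }
}
```

If this change is made, the new outgoing name also has to be reflected wherever the time column is referenced, notably the table config's timeColumnName.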

Elon

07/27/2020, 6:39 PM
Ah! That's great to know:)

Neha Pawar

07/27/2020, 6:39 PM
was this working before? as in, was the conversion from millis to seconds happening correctly for operation_ts?

Elon

07/27/2020, 6:40 PM
It was in pinot-0.3.0
We even created a sorted index on the operation_ts column in pinot-0.3.0 and it appeared to be working
So if we have a timeFieldSpec where the incoming spec != the outgoing spec, we should also change the column name?

Neha Pawar

07/27/2020, 6:47 PM
yeah.
trying to think how we can upgrade your table

Elon

07/27/2020, 6:48 PM
Thanks! Can I just do a PUT with the new schema?
We only have realtime segments

Neha Pawar

07/27/2020, 6:52 PM
umm, you’ll also have to change the time column name in the table config. that can have repercussions

Elon

07/27/2020, 6:57 PM
I can try that (in staging first 🙂 ) - if this doesn't work can I create a new table with the correct table schema/realtime config and load the data into offline segments?
n

Neha Pawar

07/27/2020, 6:59 PM
what is the retention of your table? and do you have all that data in the kafka topic?
"create a new table with the correct table schema/realtime config" - if this is possible, that's great. though i didn't understand this - "load the data into offline segments"

Elon

07/27/2020, 7:01 PM
we just pull the data from a hive table (which we also ingest into) and create offline segments from it

Neha Pawar

07/27/2020, 7:01 PM
oh so this is a hybrid table?
so a complete rebootstrap is possible?
Kishore tells me that you have a custom Decoder implementation?

Elon

07/27/2020, 7:05 PM
We use the avro confluent decoder - it was merged
So retention is 7 days and the kafka topic holds 7 days of data
n

Neha Pawar

07/27/2020, 7:10 PM
what's the ingestion rate? a 7-day rebootstrap should not take very long
e

Elon

07/27/2020, 7:10 PM
No, it seems to suck it up in < 10 seconds 🙂

Neha Pawar

07/27/2020, 7:10 PM
what do you do with the offline segments you create?

Elon

07/27/2020, 7:11 PM
We just use the uploaddownload client and push them to GCS
And we use the Pinot GCS filesystem plugin
So you recommend creating a new table vs. trying to fix the old one?

Neha Pawar

07/27/2020, 7:13 PM
yes, fixing this is going to be complex. been chatting with @Jackie and he feels the same
👍 1

Elon

07/27/2020, 7:14 PM
Recreating now, so for the realtime spec do we use the outgoing time field name?
n

Neha Pawar

07/27/2020, 7:14 PM
"We just use the uploaddownload client and push them to gcs" - who queries these segments, and how?

Elon

07/27/2020, 7:14 PM
We have a custom job that queries presto and creates the segments via a flink job, then pushes the generated segments to pinot. Does that answer your question?

Neha Pawar

07/27/2020, 7:15 PM
so you have both an OFFLINE and a REALTIME table? just wanting to confirm whether it is a REALTIME-only table or a hybrid table
the fix involves: create a new table, which has a new outgoing time column name, and use that in the table config's timeColumnName. if this is a REALTIME-only table, then we only need to re-bootstrap the realtime table. if it is hybrid, we would also need to re-bootstrap the offline segments
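The fix Neha describes amounts to a table config whose segmentsConfig points at the new outgoing column. A sketch, not the actual config: `operation_ts_seconds` is a hypothetical name, and the remaining fields would be carried over from the existing table (the 7-day retention comes from earlier in this thread):

```json
{
  "tableName": "oas_integration_operation_event",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "timeColumnName": "operation_ts_seconds",
    "timeType": "SECONDS",
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": "7"
  }
}
```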

Elon

07/27/2020, 7:18 PM
For this one it's realtime, as we never loaded the table from offline before. This is good to know though!
👍 1