https://pinot.apache.org/ logo
k

Kevin Vu

04/13/2021, 4:17 PM
Having issues adding a new streaming tables in pinot, I have added 3 new tables prior with status Good but adding this new table called interactions_REALTIME will show a status Bad and it doesn't seem to consume any new data. Any idea why this is the case? Also looking at the logs, there seems to be warning logs for this table being below replica threshold and failing to find servers hosting the segment.
a

ayush sharma

04/13/2021, 4:25 PM
Hi @Kevin Vu, Can you also post the log of AddTable execution and the table config files.
m

Mayank

04/13/2021, 4:37 PM
Are there any warn/error during table creation on the controller/server logs?
k

Kevin Vu

04/13/2021, 4:37 PM
Copy code
{
  "tableName": "interactions",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "timeColumnName": "interactionunixtime",
    "timeType": "MILLISECONDS",
    "segmentPushType": "APPEND",
    "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
    "schemaName": "interactions",
    "replication": "0"
  },
  "tenants": {},
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "streamConfigs": {
      "streamType": "solace",
      "stream.solace.consumer.type": "highLevel",
      "stream.solace.topic.name": "cw/jms/interactions/v1",
      "stream.solace.decoder.class.name": "org.apache.pinot.plugin.stream.solace.server.SolaceJSONMessageDecoder",
      "stream.solace.consumer.factory.class.name": "org.apache.pinot.plugin.stream.solace.server.SolaceStreamConsumerFactory",
      "stream.solace.jms.Host": "localhost",
      "stream.solace.jms.Username": "****",
      "stream.solace.jms.Password": "****",
      "stream.solace.jms.VPN": "solace-dev",
      "stream.solace.destinationType": "queue",
      "stream.solace.jms.ClientID": "dev-profile"
    }
  },
  "metadata": {
    "customConfigs": {}
  }
}
For adding the table, I'll see {"status":"Table interactions_REALTIME succesfully added"}
m

Mayank

04/13/2021, 4:41 PM
Any logs in controller/server during table creation?
k

Kevin Vu

04/13/2021, 4:45 PM
I've only seen warn logs, here is whats being shown
2021/04/13 114026.282 WARN [SegmentStatusChecker] [pool-7-thread-6] Segment interactions_REALTIME_1618330076604_0__0__1618330076699 of table interactions_REALTIME has no online replicas 2021/04/13 114026.282 WARN [SegmentStatusChecker] [pool-7-thread-6] Table interactions_REALTIME has 1 segments with no online replicas 2021/04/13 114026.282 WARN [SegmentStatusChecker] [pool-7-thread-6] Table interactions_REALTIME has 0 replicas, below replication threshold :1   2021/04/13 110757.743 WARN [BaseInstanceSelector] [ClusterChangeHandlingThread] Failed to find servers hosting segment: interactions_REALTIME_1618330076604_0__0__1618330076699 for table: interactions_REALTIME (all ONLINE/CONSUMING instances: [] and OFFLINE instances: [] are disabled, counting segment as unavailable)
m

Mayank

04/13/2021, 4:46 PM
This is probably from query execution?
I am asking more around the time of table creation.
Also, can you paste the Ideal State and External view?
k

Kevin Vu

04/13/2021, 4:55 PM
What do you mean by Ideal State and External View?
Logs look fine during table creation, no errors or warnings besides what i mentioned earlier
m

Mayank

04/13/2021, 4:58 PM
message has been deleted
Also curious if there's any difference between the tables that are working and the one that isn't?
k

Kevin Vu

04/13/2021, 5:00 PM
EXTERNAL VIEW {  "id": "interactions_REALTIME",  "simpleFields": {    "BATCH_MESSAGE_MODE": "false",    "BUCKET_SIZE": "0",    "IDEAL_STATE_MODE": "CUSTOMIZED",    "INSTANCE_GROUP_TAG": "interactions_REALTIME",    "MAX_PARTITIONS_PER_INSTANCE": "1",    "NUM_PARTITIONS": "1",    "REBALANCE_MODE": "CUSTOMIZED",    "REPLICAS": "1",    "STATE_MODEL_DEF_REF": "SegmentOnlineOfflineStateModel",    "STATE_MODEL_FACTORY_NAME": "DEFAULT"  },  "mapFields": {    "interactions_REALTIME_1618332793297_0__0__1618332793369": {      "Server_135.113.208.194_7000": "ERROR"    }  },  "listFields": {} }   IDEAL STATE {  "id": "interactions_REALTIME",  "simpleFields": {    "BATCH_MESSAGE_MODE": "false",    "IDEAL_STATE_MODE": "CUSTOMIZED",    "INSTANCE_GROUP_TAG": "interactions_REALTIME",    "MAX_PARTITIONS_PER_INSTANCE": "1",    "NUM_PARTITIONS": "1",    "REBALANCE_MODE": "CUSTOMIZED",    "REPLICAS": "1",    "STATE_MODEL_DEF_REF": "SegmentOnlineOfflineStateModel",    "STATE_MODEL_FACTORY_NAME": "DEFAULT"  },  "mapFields": {    "interactions_REALTIME_1618332793297_0__0__1618332793369": {      "Server_135.113.208.194_7000": "ONLINE"    }  },  "listFields": {} }
seems like there is an error in mapFields
m

Mayank

04/13/2021, 5:00 PM
"Server_135.113.208.194_7000": "ERROR"
The server hosting this segment is in ERROR state. There should be a log in the server indicating why it is in error state.
Are you able to access the server log? If so, grep for
interactions_REALTIME_1618332793297_0__0__1618332793369
in that log
k

Kevin Vu

04/13/2021, 5:03 PM
yes its the pinotServer.log correct?
m

Mayank

04/13/2021, 5:04 PM
Are you running locally? If so, yes
Can you grep
interactions_REALTIME_1618332793297_0__0__1618332793369
there?
k

Kevin Vu

04/13/2021, 5:07 PM
2021/04/13 115313.507 INFO [HelixInstanceDataManager] [HelixTaskExecutor-message_handle_thread] Adding segment: interactions_REALTIME_1618332793297_0__0__1618332793369 to table: interactions_REALTIME 2021/04/13 115313.537 INFO [FixedByteSVMutableForwardIndex] [HelixTaskExecutor-message_handle_thread] Allocating 20000000 bytes for: interactions_REALTIME_1618332793297_0__0__1618332793369:agentid.sv.unsorted.fwd 2021/04/13 115313.770 INFO [FixedByteSVMutableForwardIndex] [HelixTaskExecutor-message_handle_thread] Allocating 20000000 bytes for: interactions_REALTIME_1618332793297_0__0__1618332793369:interactionid.sv.unsorted.fwd
m

Mayank

04/13/2021, 5:08 PM
Is that it? Also, no errors elsewhere?
k

Kevin Vu

04/13/2021, 5:08 PM
yes
m

Mayank

04/13/2021, 5:09 PM
What's your log4j setting?
k

Kevin Vu

04/13/2021, 5:10 PM
defaults within the pinot binary
m

Mayank

04/13/2021, 5:10 PM
The fact that you see segment in ERROR state implies that the server is unable to load that segment for some reason. What are your JVM settings?
I am guessing server is running out of resources required to host this segment. But that should most definitely generate an error in the log.
k

Kevin Vu

04/13/2021, 5:11 PM
what jvm settings does pinot start with if i initialized the instance using ./quick-start-batch.sh?
m

Mayank

04/13/2021, 5:12 PM
That would probably be minimal.
That is just a demo purpose script (and also for batch ingestion). Are you launching Pinot using that and creating new realtime tables on that demo cluster?
k

Kevin Vu

04/13/2021, 5:13 PM
yes
m

Mayank

04/13/2021, 5:15 PM
What's the end goal for your exercise? If it is to setup a cluster to ingest sizable production size data, we might not be able to use quick-start.
But before that, I'd like to understand why you are not getting any errors logged.
The quick-start script is setting log4j as follows:
Copy code
if [ -z "$JAVA_OPTS" ] ; then
  ALL_JAVA_OPTS="-Xms4G -Dlog4j2.configurationFile=conf/quickstart-log4j2.xml"
else
  ALL_JAVA_OPTS=$JAVA_OPTS
fi
Can you play with it to increase the logging level to info or debug?
And then delete and recreate the table?
k

Kevin Vu

04/13/2021, 5:22 PM
ok I will try that, also this cluster is only for testing/poc, its not that much data.
m

Mayank

04/13/2021, 5:23 PM
Yeah, enabling the logging correctly will tell us what happened in the server.
k

Kevin Vu

04/13/2021, 5:25 PM
looking at the log4j for quickstart, seems like most of them are already set to info
m

Mayank

04/13/2021, 5:26 PM
Are you seeing info messages in your logs though?
k

Kevin Vu

04/13/2021, 5:26 PM
yes
m

Mayank

04/13/2021, 5:26 PM
Ah ok
For some reason I thought you werent
k

Kevin Vu

04/13/2021, 5:26 PM
no i was, i was just only grepping for warn logs
m

Mayank

04/13/2021, 5:26 PM
Mind deleting and re-creating the table again and monitoring the logs
In that case
can you grep all occurances the segment name (not just warn)?
k

Kevin Vu

04/13/2021, 5:28 PM
thats what I did earlier but there were no error logs
m

Mayank

04/13/2021, 5:29 PM
Ok, then lets delete and recreate and see if there are errors this time
k

Kevin Vu

04/13/2021, 5:41 PM
no errors
m

Mayank

04/13/2021, 5:52 PM
And still the same issue?
k

Kevin Vu

04/13/2021, 5:52 PM
yes same issue with the new table
m

Mayank

04/13/2021, 5:52 PM
Can we do a quick zoom meeting?
k

Kevin Vu

04/13/2021, 5:58 PM
can you join my webex instead, https://attcorp.webex.com/join/kv609y
m

Mayank

04/13/2021, 6:01 PM
Joining
Just to update the thread:
Copy code
1. We deleted and recreated the table, this time with entire cluster nuked, and did not see the issue.
2. Our current suspicion is that given that quick-start uses only 32M as Xmx, and several tables were created, the server ran out of resources.
3. This is using a custom connector (Solace), so that might have caused the logging issue.
Thanks @Kevin Vu for trying Pinot, let us know if you need more help
👍 1