Having issues adding a new streaming tables in pin...
# troubleshooting
k
Having issues adding a new streaming tables in pinot, I have added 3 new tables prior with status Good but adding this new table called interactions_REALTIME will show a status Bad and it doesn't seem to consume any new data. Any idea why this is the case? Also looking at the logs, there seems to be warning logs for this table being below replica threshold and failing to find servers hosting the segment.
a
Hi @Kevin Vu, Can you also post the log of AddTable execution and the table config files.
m
Are there any warn/error during table creation on the controller/server logs?
k
Copy code
{
  "tableName": "interactions",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "timeColumnName": "interactionunixtime",
    "timeType": "MILLISECONDS",
    "segmentPushType": "APPEND",
    "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
    "schemaName": "interactions",
    "replication": "0"
  },
  "tenants": {},
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "streamConfigs": {
      "streamType": "solace",
      "stream.solace.consumer.type": "highLevel",
      "stream.solace.topic.name": "cw/jms/interactions/v1",
      "stream.solace.decoder.class.name": "org.apache.pinot.plugin.stream.solace.server.SolaceJSONMessageDecoder",
      "stream.solace.consumer.factory.class.name": "org.apache.pinot.plugin.stream.solace.server.SolaceStreamConsumerFactory",
      "stream.solace.jms.Host": "localhost",
      "stream.solace.jms.Username": "****",
      "stream.solace.jms.Password": "****",
      "stream.solace.jms.VPN": "solace-dev",
      "stream.solace.destinationType": "queue",
      "stream.solace.jms.ClientID": "dev-profile"
    }
  },
  "metadata": {
    "customConfigs": {}
  }
}
For adding the table, I'll see {"status":"Table interactions_REALTIME succesfully added"}
m
Any logs in controller/server during table creation?
k
I've only seen warn logs, here is whats being shown
2021/04/13 114026.282 WARN [SegmentStatusChecker] [pool-7-thread-6] Segment interactions_REALTIME_1618330076604_0__0__1618330076699 of table interactions_REALTIME has no online replicas 2021/04/13 114026.282 WARN [SegmentStatusChecker] [pool-7-thread-6] Table interactions_REALTIME has 1 segments with no online replicas 2021/04/13 114026.282 WARN [SegmentStatusChecker] [pool-7-thread-6] Table interactions_REALTIME has 0 replicas, below replication threshold :1   2021/04/13 110757.743 WARN [BaseInstanceSelector] [ClusterChangeHandlingThread] Failed to find servers hosting segment: interactions_REALTIME_1618330076604_0__0__1618330076699 for table: interactions_REALTIME (all ONLINE/CONSUMING instances: [] and OFFLINE instances: [] are disabled, counting segment as unavailable)
m
This is probably from query execution?
I am asking more around the time of table creation.
Also, can you paste the Ideal State and External view?
k
What do you mean by Ideal State and External View?
Logs look fine during table creation, no errors or warnings besides what i mentioned earlier
m
image.png
Also curious if there's any difference between the tables that are working and the one that isn't?
k
EXTERNAL VIEW {  "id": "interactions_REALTIME",  "simpleFields": {    "BATCH_MESSAGE_MODE": "false",    "BUCKET_SIZE": "0",    "IDEAL_STATE_MODE": "CUSTOMIZED",    "INSTANCE_GROUP_TAG": "interactions_REALTIME",    "MAX_PARTITIONS_PER_INSTANCE": "1",    "NUM_PARTITIONS": "1",    "REBALANCE_MODE": "CUSTOMIZED",    "REPLICAS": "1",    "STATE_MODEL_DEF_REF": "SegmentOnlineOfflineStateModel",    "STATE_MODEL_FACTORY_NAME": "DEFAULT"  },  "mapFields": {    "interactions_REALTIME_1618332793297_0__0__1618332793369": {      "Server_135.113.208.194_7000": "ERROR"    }  },  "listFields": {} }   IDEAL STATE {  "id": "interactions_REALTIME",  "simpleFields": {    "BATCH_MESSAGE_MODE": "false",    "IDEAL_STATE_MODE": "CUSTOMIZED",    "INSTANCE_GROUP_TAG": "interactions_REALTIME",    "MAX_PARTITIONS_PER_INSTANCE": "1",    "NUM_PARTITIONS": "1",    "REBALANCE_MODE": "CUSTOMIZED",    "REPLICAS": "1",    "STATE_MODEL_DEF_REF": "SegmentOnlineOfflineStateModel",    "STATE_MODEL_FACTORY_NAME": "DEFAULT"  },  "mapFields": {    "interactions_REALTIME_1618332793297_0__0__1618332793369": {      "Server_135.113.208.194_7000": "ONLINE"    }  },  "listFields": {} }
seems like there is an error in mapFields
m
"Server_135.113.208.194_7000": "ERROR"
The server hosting this segment is in ERROR state. There should be a log in the server indicating why it is in error state.
Are you able to access the server log? If so, grep for
interactions_REALTIME_1618332793297_0__0__1618332793369
in that log
k
yes its the pinotServer.log correct?
m
Are you running locally? If so, yes
Can you grep
interactions_REALTIME_1618332793297_0__0__1618332793369
there?
k
2021/04/13 115313.507 INFO [HelixInstanceDataManager] [HelixTaskExecutor-message_handle_thread] Adding segment: interactions_REALTIME_1618332793297_0__0__1618332793369 to table: interactions_REALTIME 2021/04/13 115313.537 INFO [FixedByteSVMutableForwardIndex] [HelixTaskExecutor-message_handle_thread] Allocating 20000000 bytes for: interactions_REALTIME_1618332793297_0__0__1618332793369:agentid.sv.unsorted.fwd 2021/04/13 115313.770 INFO [FixedByteSVMutableForwardIndex] [HelixTaskExecutor-message_handle_thread] Allocating 20000000 bytes for: interactions_REALTIME_1618332793297_0__0__1618332793369:interactionid.sv.unsorted.fwd
m
Is that it? Also, no errors elsewhere?
k
yes
m
What's your log4j setting?
k
defaults within the pinot binary
m
The fact that you see segment in ERROR state implies that the server is unable to load that segment for some reason. What are your JVM settings?
I am guessing server is running out of resources required to host this segment. But that should most definitely generate an error in the log.
k
what jvm settings does pinot start with if i initialized the instance using ./quick-start-batch.sh?
m
That would probably be minimal.
That is just a demo purpose script (and also for batch ingestion). Are you launching Pinot using that and creating new realtime tables on that demo cluster?
k
yes
m
What's the end goal for your exercise? If it is to setup a cluster to ingest sizable production size data, we might not be able to use quick-start.
But before that, I'd like to understand why you are not getting any errors logged.
The quick-start script is setting log4j as follows:
Copy code
if [ -z "$JAVA_OPTS" ] ; then
  ALL_JAVA_OPTS="-Xms4G -Dlog4j2.configurationFile=conf/quickstart-log4j2.xml"
else
  ALL_JAVA_OPTS=$JAVA_OPTS
fi
Can you play with it to increase the logging level to info or debug?
And then delete and recreate the table?
k
ok I will try that, also this cluster is only for testing/poc, its not that much data.
m
Yeah, enabling the logging correctly will tell us what happened in the server.
k
looking at the log4j for quickstart, seems like most of them are already set to info
m
Are you seeing info messages in your logs though?
k
yes
m
Ah ok
For some reason I thought you werent
k
no i was, i was just only grepping for warn logs
m
Mind deleting and re-creating the table again and monitoring the logs
In that case
can you grep all occurances the segment name (not just warn)?
k
thats what I did earlier but there were no error logs
m
Ok, then lets delete and recreate and see if there are errors this time
k
no errors
m
And still the same issue?
k
yes same issue with the new table
m
Can we do a quick zoom meeting?
k
can you join my webex instead, https://attcorp.webex.com/join/kv609y
m
Joining
Just to update the thread:
Copy code
1. We deleted and recreated the table, this time with entire cluster nuked, and did not see the issue.
2. Our current suspicion is that given that quick-start uses only 32M as Xmx, and several tables were created, the server ran out of resources.
3. This is using a custom connector (Solace), so that might have caused the logging issue.
Thanks @Kevin Vu for trying Pinot, let us know if you need more help
👍 1