# troubleshooting
m
Hi, I created a real-time table that contains about 44M records. Each time I run a query to get the total row count, "select count(*) from table_Name", I get a different number. For example, one run returns 4M rows, a run one second later returns 9M rows, and a third run one second after that returns 4M rows again.
m
Is this a realtime-only table, or does it have an offline component too?
If it is realtime only, then your query should have been answered from metadata (i.e., no scan).
m
It's a realtime-only table.
Is there a way to refresh the metadata?
m
I can't think of a reason for the behavior you described.
How many segments do you have in the table, and how many servers?
Also, what’s the CPU/memory for the servers?
m
I have 3 servers and 2 segments
[
  {
    "tableName": "XXXXXX_REALTIME",
    "numSegments": 2,
    "numServers": 3,
    "numBrokers": 2,
    "segmentDebugInfos": [],
    "serverDebugInfos": [],
    "brokerDebugInfos": [],
    "tableSize": {
      "reportedSize": "1 GB",
      "estimatedSize": "1 GB"
    },
    "ingestionStatus": {
      "ingestionState": "HEALTHY",
      "errorMessage": ""
    }
  }
]
m
Can you paste the metadata from the query response?
m
This is the first run:
{
  "resultTable": {
    "dataSchema": {
      "columnNames": [
        "count(*)"
      ],
      "columnDataTypes": [
        "LONG"
      ]
    },
    "rows": [
      [
        9667129
      ]
    ]
  },
  "exceptions": [],
  "numServersQueried": 1,
  "numServersResponded": 1,
  "numSegmentsQueried": 2,
  "numSegmentsProcessed": 2,
  "numSegmentsMatched": 2,
  "numConsumingSegmentsQueried": 1,
  "numDocsScanned": 9667129,
  "numEntriesScannedInFilter": 0,
  "numEntriesScannedPostFilter": 0,
  "numGroupsLimitReached": false,
  "totalDocs": 10000000,
  "timeUsedMs": 110,
  "offlineThreadCpuTimeNs": 0,
  "realtimeThreadCpuTimeNs": 0,
  "segmentStatistics": [],
  "traceInfo": {},
  "numRowsResultSet": 1,
  "minConsumingFreshnessTimeMs": 1656428445718
}
This is the second run:
{
  "resultTable": {
    "dataSchema": {
      "columnNames": [
        "count(*)"
      ],
      "columnDataTypes": [
        "LONG"
      ]
    },
    "rows": [
      [
        6318998
      ]
    ]
  },
  "exceptions": [],
  "numServersQueried": 1,
  "numServersResponded": 1,
  "numSegmentsQueried": 2,
  "numSegmentsProcessed": 2,
  "numSegmentsMatched": 2,
  "numConsumingSegmentsQueried": 1,
  "numDocsScanned": 6318998,
  "numEntriesScannedInFilter": 0,
  "numEntriesScannedPostFilter": 0,
  "numGroupsLimitReached": false,
  "totalDocs": 6651510,
  "timeUsedMs": 65,
  "offlineThreadCpuTimeNs": 0,
  "realtimeThreadCpuTimeNs": 0,
  "segmentStatistics": [],
  "traceInfo": {},
  "numRowsResultSet": 1,
  "minConsumingFreshnessTimeMs": 1656461396988
}
m
That is really odd
If you look, totalDocs is very different in the two cases.
m
yes, it's weird
m
I suspect the two servers are somehow out of sync. Is the replication 2?
Can you paste the external view?
m
Yes, the replication is 2.
m
One server has 10M rows, the other has 6.6M rows.
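A minimal sketch of that comparison, using only the fields shown in the two responses above. The helper names (`summarize`, `replicas_diverge`) and the `tolerance` parameter are hypothetical and not part of Pinot; a small tolerance may be reasonable because a consuming segment keeps growing between queries.

```python
import json

def summarize(response_json: str) -> dict:
    """Pull out the fields that matter when comparing two runs."""
    resp = json.loads(response_json)
    return {
        "count": resp["resultTable"]["rows"][0][0],
        "totalDocs": resp["totalDocs"],
        "numServersQueried": resp["numServersQueried"],
    }

def replicas_diverge(first: dict, second: dict, tolerance: int = 0) -> bool:
    """True when totalDocs differs by more than `tolerance` (hypothetical threshold)."""
    return abs(first["totalDocs"] - second["totalDocs"]) > tolerance

# Trimmed copies of the two responses pasted in the thread.
first = summarize(
    '{"resultTable": {"rows": [[9667129]]}, "totalDocs": 10000000, "numServersQueried": 1}'
)
second = summarize(
    '{"resultTable": {"rows": [[6318998]]}, "totalDocs": 6651510, "numServersQueried": 1}'
)

gap = first["totalDocs"] - second["totalDocs"]
print(replicas_diverge(first, second), gap)
```

Each run queried a single server (`numServersQueried` is 1), so a large gap in `totalDocs` between runs is consistent with different replicas answering each query.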
m
What do you mean by external view?
m
In the ZooKeeper browser (from the UI), go to the external view and paste it.
Also, there’s a Swagger API to show it.
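A rough sketch of pulling the external view programmatically and counting replicas per state. It assumes the controller exposes `GET /tables/{tableName}/externalview` (verify this against the Swagger page of your own version); the controller URL, table name, and segment names are placeholders, and `replicas_by_state` is a hypothetical helper.

```python
import json
from urllib.request import urlopen

def fetch_external_view(controller_url: str, table: str) -> dict:
    """Fetch the external view from the controller (endpoint assumed, see lead-in)."""
    with urlopen(f"{controller_url}/tables/{table}/externalview") as resp:
        return json.load(resp)

def replicas_by_state(view: dict, table_type: str = "REALTIME") -> dict:
    """For each segment, count how many servers report each state."""
    segments = view.get(table_type) or {}  # "OFFLINE" may be null for realtime-only tables
    summary = {}
    for segment, servers in segments.items():
        counts = {}
        for state in servers.values():
            counts[state] = counts.get(state, 0) + 1
        summary[segment] = counts
    return summary

# Offline example with placeholder segment and server names.
sample_view = {
    "OFFLINE": None,
    "REALTIME": {
        "myTable__0__0__T1": {"Server_a_8098": "ONLINE", "Server_b_8098": "ONLINE"},
        "myTable__0__1__T2": {"Server_a_8098": "CONSUMING", "Server_b_8098": "CONSUMING"},
    },
}
print(replicas_by_state(sample_view))
```

Against a live cluster, the call would look something like `replicas_by_state(fetch_external_view("http://localhost:9000", "myTable"))`, where the host and port are assumptions about your deployment.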
m
{
  "OFFLINE": null,
  "REALTIME": {
    "XXXXX__0__0__20220628T1116Z": {
      "Server_pinot-server-0.pinot-server-headless.default.svc.cluster.local_8098": "ONLINE",
      "Server_pinot-server-1.pinot-server-headless.default.svc.cluster.local_8098": "ONLINE",
      "Server_pinot-server-2.pinot-server-headless.default.svc.cluster.local_8098": "ONLINE"
    },
    "XXXXX__0__1__20220628T1300Z": {
      "Server_pinot-server-0.pinot-server-headless.default.svc.cluster.local_8098": "CONSUMING",
      "Server_pinot-server-1.pinot-server-headless.default.svc.cluster.local_8098": "CONSUMING",
      "Server_pinot-server-2.pinot-server-headless.default.svc.cluster.local_8098": "CONSUMING"
    }
  }
}
m
Can you run this query?
select count(*) from <table> group by $segmentName
Wait, you have 3 servers, but a replication of 2?
m
{
  "resultTable": {
    "dataSchema": {
      "columnNames": [
        "count(*)"
      ],
      "columnDataTypes": [
        "LONG"
      ]
    },
    "rows": [
      [
        4667129
      ],
      [
        5000000
      ]
    ]
  },
  "exceptions": [],
  "numServersQueried": 1,
  "numServersResponded": 1,
  "numSegmentsQueried": 2,
  "numSegmentsProcessed": 2,
  "numSegmentsMatched": 2,
  "numConsumingSegmentsQueried": 1,
  "numDocsScanned": 9667129,
  "numEntriesScannedInFilter": 0,
  "numEntriesScannedPostFilter": 9667129,
  "numGroupsLimitReached": false,
  "totalDocs": 10000000,
  "timeUsedMs": 163,
  "offlineThreadCpuTimeNs": 0,
  "realtimeThreadCpuTimeNs": 0,
  "segmentStatistics": [],
  "traceInfo": {},
  "numRowsResultSet": 2,
  "minConsumingFreshnessTimeMs": 1656428445718
}
Yes, I have 3 servers and 2 replicas.
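As a quick sanity check, the per-segment counts in the group-by response above can be tallied and compared with `numDocsScanned`. This is only an illustrative sketch; `per_segment_total` is a hypothetical helper, and the dictionary is a trimmed copy of the pasted response.

```python
def per_segment_total(response: dict) -> int:
    """Sum the count(*) value of every row in a group-by-$segmentName response."""
    return sum(row[0] for row in response["resultTable"]["rows"])

# Trimmed copy of the group-by response pasted in the thread.
response = {
    "resultTable": {"rows": [[4667129], [5000000]]},
    "numDocsScanned": 9667129,
    "totalDocs": 10000000,
}

total = per_segment_total(response)
print(total, total == response["numDocsScanned"])  # prints: 9667129 True
```

The two segments add up to the 9,667,129 rows returned by the first count(*) run, so this response appears to come from the same replica that answered that run.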
m
That doesn’t make sense, right? (Likely not the issue here, though.)
can we hop on a call to debug?
m
sure
m
check dm