#general

Andre Hsu

05/14/2019, 12:50 AM
Is it possible to have the exact same segments on two different clusters but somehow get different row counts?
I have a table that has the exact same segments, according to the controller API, on two different clusters, but the row counts for the table are different.

Ravi

05/14/2019, 1:09 AM
Do you use count(*) or some other method? Either way, there is a small probability that one of the servers did not respond. You can check a flag in the response called "partialResponse" or "isPartialResponse" (I don't remember the exact name). However, this should not happen consistently.
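A minimal sketch of that check, assuming the broker response is JSON and carries `numServersQueried`, `numServersResponded`, and `exceptions` fields. The exact flag name was not confirmed in the thread, so these field names are assumptions to adjust against a real response.

```python
def looks_partial(broker_response: dict) -> bool:
    """Return True if the response suggests that not every server answered."""
    # Field names below are assumptions about the broker response JSON.
    queried = broker_response.get("numServersQueried")
    responded = broker_response.get("numServersResponded")
    if queried is not None and responded is not None and responded < queried:
        return True
    # Server-side exceptions also make the reported count untrustworthy.
    return bool(broker_response.get("exceptions"))


# Made-up sample responses for illustration only.
partial = {"numServersQueried": 5, "numServersResponded": 4, "exceptions": []}
complete = {"numServersQueried": 5, "numServersResponded": 5, "exceptions": []}
print(looks_partial(partial), looks_partial(complete))
```

Running the same count query several times and logging this result each time would show whether a discrepancy is intermittent (a missed server) or consistent (something else).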

Mayank

05/14/2019, 1:10 AM
Another possibility is that some segment(s) are not online in one cluster.

Andre Hsu

05/14/2019, 2:08 AM
How would one check if some of the segments are not online?
In the response it says all the servers responded.
Also, we set up the cluster in a way where the controllers don't share a single filesystem. Could that be causing this?

Mayank

05/14/2019, 2:13 AM
Please check the external view (I believe there’s an endpoint), to ensure all segments are online
You can also check external view via zooInspector
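With hundreds of segments, scanning the external view by eye is error-prone, so a scripted check may help. The sketch below assumes the external view has already been fetched (for example from the controller's external-view endpoint) and reduced to a mapping of segment name to `{server: state}`; that shape and all names in the sample are assumptions.

```python
def segments_not_online(external_view: dict) -> dict:
    """Map each segment to the replicas whose state is anything but ONLINE."""
    problems = {}
    for segment, replicas in external_view.items():
        bad = {server: state for server, state in replicas.items() if state != "ONLINE"}
        if bad:
            problems[segment] = bad
    return problems


# Hypothetical external view for illustration only.
sample_view = {
    "myTable_0": {"Server_host1_8098": "ONLINE", "Server_host2_8098": "ONLINE"},
    "myTable_1": {"Server_host1_8098": "ERROR", "Server_host2_8098": "ONLINE"},
}
print(segments_not_online(sample_view))
```

An empty result means every replica listed in the view is ONLINE. It does not catch segments that are absent from the view altogether.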

Andre Hsu

05/14/2019, 2:31 AM
Okay I checked it, all the segments come back as "ONLINE"

Mayank

05/14/2019, 2:46 AM
And both clusters have same number of online segments? Also, are you sure the data is exactly the same?

Andre Hsu

05/14/2019, 3:14 AM
For both clusters, calling the controller's table segments API and then downloading each segment results in the exact same segments. But the data is different: for the same table, I get different row counts in the two clusters.
Yes both clusters have 253 segments that are online

Mayank

05/14/2019, 3:23 AM
How did you determine that the segment contents are identical?
If there is no partial response, then the only plausible reason I can think of is somehow the two clusters are seeing different data.
To narrow down, you can start adding time predicates (assuming you have time column) and verify each day’s data.

Andre Hsu

05/14/2019, 5:29 AM
yeah I already narrowed the data down to a specific week
The way I verify the segments are identical is by using the get segment API to download all the segments from both clusters and compare each segment file content.
The way I am running into this problem is that I start with a cluster and download its segments. Then I post its segments to another cluster. Then I confirm both clusters now have the same segments. But according to the table row count, cluster 2 has more data than the original cluster. So it would seem some segments in the original cluster are not being used somehow. But the external view says all the segments are online.

Mayank

05/14/2019, 6:24 AM
Is that the oldest week, and beyond your retention time? The clusters may apply retention at different times, causing the row count to be different

Subbu Subramaniam

05/14/2019, 3:35 PM
@Andre Hsu please get the external views and compare the count of segments and the segment names across the two external views. See if any segments are missing in one cluster vs another.
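One way to do this comparison mechanically is sketched below. It assumes each external view has been reduced to a mapping of segment name to `{server: state}`; that shape, and the sample names, are assumptions rather than the exact API output.

```python
def compare_external_views(view_a: dict, view_b: dict) -> dict:
    """Compare segment counts, names, and replica states across two clusters."""
    names_a, names_b = set(view_a), set(view_b)
    shared = names_a & names_b
    # Server names differ between clusters, so compare only the set of states.
    state_mismatch = sorted(
        s for s in shared if set(view_a[s].values()) != set(view_b[s].values())
    )
    return {
        "count_a": len(names_a),
        "count_b": len(names_b),
        "only_in_a": sorted(names_a - names_b),
        "only_in_b": sorted(names_b - names_a),
        "state_mismatch": state_mismatch,
    }


# Hypothetical views for illustration only.
prod = {"seg_0": {"p1": "ONLINE"}, "seg_1": {"p2": "ERROR"}}
staging = {"seg_0": {"s1": "ONLINE"}, "seg_1": {"s1": "ONLINE"}}
print(compare_external_views(prod, staging))
```

Here both clusters list the same two segment names, yet `state_mismatch` flags `seg_1`, which a name-only comparison would miss.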
Also, you mentioned that controllers do not share storage. All controllers within a cluster must share storage. Not sure how you upload the segments to the cluster, but if you use a vip, for example, segments could go to any controller. What are the servers configured with as the controller address?

Andre Hsu

05/14/2019, 5:55 PM
@Mayank No the problematic data is not the oldest week so I don't think retention is the issue. When I delete all the segments for the problematic week, the count for that week becomes 0 so I know the segment files that match the problematic week.
@Subbu Subramaniam I counted the segments and the names across the two views, they are the same and all online.

Mayank

05/14/2019, 5:57 PM
Does the count match after you remove the problematic week from both clusters?

Andre Hsu

05/14/2019, 5:58 PM
@Subbu Subramaniam I believe the doc says that if there's more than one controller, they must all share the same filesystem/storage. We have a load balancer in front of the 5 controllers, so when we upload, the segment can go to any of the 5. Would the controllers not sharing storage be an issue?
@Mayank No, I haven't touched the original cluster from which I downloaded the segments because it is our prod cluster. I did, however, upload segments from the prod cluster to two separate clusters. Those two clusters report the same row count for the uploaded segments, and it is higher than the row count for the prod cluster. I wonder if prod is seeing all the segments correctly. It seems like the higher row counts that appear in the other two clusters, and match, are the correct numbers.

Mayank

05/14/2019, 6:35 PM
Hmm, I am confused. Do the two non-prod clusters report the same count? Were all the comparisons that you talked about (matching segment names and numbers) done between the two non-matching clusters or between the two separate clusters that match? Have you investigated the external view for the prod table (also, are you sure you are looking at the external view and not the ideal state)?

Andre Hsu

05/14/2019, 7:55 PM
Yeah I called the external view API.
The comparisons I was talking about were between the prod cluster and the staging cluster. Initially I downloaded some segments from prod and uploaded them to staging. For the same segments, prod reported lower counts than staging. Then, I also uploaded the same segments to another cluster called dev. The dev and staging clusters report the same row counts. All three clusters have the same segments and they all show up as online in the external view.
This leads me to think that prod cluster is the one with incorrect row counts.

Mayank

05/14/2019, 7:58 PM
Can you run count(*) group by time on all three clusters? Then you will know exactly which time stamp is causing the discrepancy
Yes, seems like prod cluster is the incorrect one. But from all the info you provided, I am unable to think what might be causing the issue
Are all three offline only?
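Once each cluster has returned its count(*) per time bucket, the comparison itself can be sketched as below. It assumes the query results have already been turned into plain `{bucket: count}` dictionaries; the bucket labels and numbers in the sample are made up.

```python
def diff_counts(counts_a: dict, counts_b: dict) -> dict:
    """Return the time buckets whose row counts differ, as {bucket: (a, b)}."""
    diffs = {}
    for bucket in sorted(set(counts_a) | set(counts_b)):
        # A bucket missing from one side counts as zero rows there.
        a = counts_a.get(bucket, 0)
        b = counts_b.get(bucket, 0)
        if a != b:
            diffs[bucket] = (a, b)
    return diffs


# Made-up per-bucket counts for illustration only.
prod_counts = {"2018_09": 1000, "2018_10": 4000, "2018_11": 2500}
staging_counts = {"2018_09": 1000, "2018_10": 4200, "2018_11": 2500}
print(diff_counts(prod_counts, staging_counts))
```

Only the buckets that disagree are reported, which narrows the search to the segments covering those time ranges.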

Andre Hsu

05/14/2019, 8:08 PM
Yep, all three are offline only. Yeah, I did a group by year_week, which led me to identify the exact segments causing the issue.
I suspect restarting the prod cluster will fix the issue, I'm just not sure what the root cause is.

Subbu Subramaniam

05/14/2019, 8:42 PM
@Andre Hsu is it possible that some segments have aged out in the prod cluster (retention manager runs)?
Is it possible that the prod cluster has a realtime table with the same name?
@Andre Hsu what is your controller config for vipProtocol, vipHost, and vipPort? To provide you context, when a segment is uploaded, the controllers use these values to set the segment access URL on the segment metadata. The servers use the segment metadata to download the segment when they need to do so. The server's request to download can go to any controller (via the vip) and if that controller does not have the segment, then the server will never get the segment. In this case, the EXTERNALVIEW will be in ERROR state, though, so I am confused how things are working at all.
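The failure mode described here can be sketched in a few lines. The URL layout (`/segments/<table>/<segment>`), the VIP hostname, and the controller and segment names are all assumptions for illustration, not values from a real cluster.

```python
def segment_download_url(vip_protocol, vip_host, vip_port, table, segment):
    """Build the URL a server would be told to download the segment from."""
    # Path layout is an assumption for illustration.
    return f"{vip_protocol}://{vip_host}:{vip_port}/segments/{table}/{segment}"


def controllers_able_to_serve(segment, local_storage):
    """List the controllers whose local disk actually holds the segment."""
    return sorted(name for name, segments in local_storage.items() if segment in segments)


# Hypothetical layout: three controllers, each with its own local disk.
local_storage = {
    "controller-a": {"seg_0", "seg_1"},
    "controller-b": {"seg_2"},
    "controller-c": {"seg_3"},
}
url = segment_download_url("http", "pinot-vip.example.com", 9000, "myTable", "seg_2")
able = controllers_able_to_serve("seg_2", local_storage)
print(url)
print(f"{len(able)} of {len(local_storage)} controllers can serve seg_2: {able}")
```

If the URL points at a load balancer, the request lands on an arbitrary controller, and in this layout only one of the three could answer it. With shared storage every controller would appear in the list.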

Andre Hsu

05/14/2019, 8:50 PM
No, every table in prod is an offline table and there are no duplicate tables. How do I check if the segments have aged out? It is unlikely, because the other weeks' data, whose segments were uploaded around the same time on the same day, have the correct counts.
How do I check for vipProtocol, vipHost, and vipPort?

Mayank

05/14/2019, 8:54 PM
@Andre Hsu Did you mention that you don't have shared storage in prod?
And also that you do have multiple controller hosts in prod?
It is possible that you ended up in a state where due to missing shared storage, the data seen by prod cluster is not the same as the data you are downloading from the prod cluster.
For example, different controllers potentially have different segments (with the same name). And the prod servers are loading one copy of these differing segments, while you are downloading a different copy with the same name.

Andre Hsu

05/14/2019, 9:02 PM
yep currently I believe each controller has its own storage/filesystem. I am looking through our deployment configuration scripts to find out more.
we have five controllers currently in the cluster.

Mayank

05/14/2019, 9:04 PM
Ok, is it possible that they don't have the same data?
That would be my guess.

Andre Hsu

05/14/2019, 9:04 PM
How would I confirm if that is the case?
Maybe logging into the servers and seeing what segments they're actually seeing?
and compare those to the segments I downloaded from prod?

Mayank

05/14/2019, 9:05 PM
You already know the segments that could be causing the issue right? (From your previous exercise)?
If so, just compare those in all 5 controllers and see if their size matches.
Also, segments on the controller are tar.gz files; you can unzip them and vim the metadata.properties to find the total.docs in that segment.
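The same lookup can be scripted instead of opening each file in an editor. This sketch assumes the archive contains a `metadata.properties` file in simple `key = value` form and that the doc count lives under a key ending in `total.docs`; the demo tarball and its contents are made up so the example runs on its own.

```python
import io
import os
import tarfile


def total_docs_from_tarball(source):
    """Read the total.docs value from a segment tar.gz (path or binary file object)."""
    if isinstance(source, (str, os.PathLike)):
        tar = tarfile.open(source, mode="r:gz")
    else:
        tar = tarfile.open(fileobj=source, mode="r:gz")
    with tar:
        for member in tar:
            if not (member.isfile() and member.name.endswith("metadata.properties")):
                continue
            text = tar.extractfile(member).read().decode("utf-8", errors="replace")
            for raw in text.splitlines():
                line = raw.strip()
                if line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                # Assumption: the key of interest ends with "total.docs".
                if key.strip().endswith("total.docs"):
                    return int(value.strip())
    return None


def build_demo_tarball():
    """Create an in-memory tar.gz with a made-up metadata.properties."""
    payload = b"segment.name = demo_segment_0\nsegment.total.docs = 42\n"
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        info = tarfile.TarInfo(name="demo_segment_0/v3/metadata.properties")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


print(total_docs_from_tarball(build_demo_tarball()))
```

Summing this value over the suspect segments on each controller gives a number that can be compared directly with the count(*) the cluster reports for the same time range.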

Andre Hsu

05/14/2019, 9:08 PM
Okay, I'll look at the segments from the five controllers separately and get back to you thanks
I sshed into each controller in prod cluster, and went into /pinot_data/controller_data/tablename
I see that for the year_week that has the issue, there are 253 segments and they are spread across the controllers.
meaning each controller has a portion of the 253 segments, is that an issue?
But somehow, the servers are still able to download the segments from the correct controller?
I checked the total segments file size by summing up the segment sizes from the 5 controllers. The number is the same as the total segments size from downloading the segments and summing up the file sizes.
Basically, each controller has a subset of the total segments for the table, and adding those up results in the same 253 segments as calling the download segment API for all 253 segments.
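A sketch of this per-controller audit, assuming the directory listing of each controller has been collected into a `{controller: {segment_file: size_in_bytes}}` mapping. The controller names, file names, and sizes in the sample are made up.

```python
def summarize_controller_storage(listings: dict) -> dict:
    """Summarize which controllers hold which segment files and flag size conflicts."""
    holders = {}
    for controller, files in listings.items():
        for segment, size in files.items():
            holders.setdefault(segment, {})[controller] = size
    return {
        "unique_segments": len(holders),
        # Same name on several controllers but different sizes: differing copies.
        "conflicting_sizes": sorted(
            s for s, sizes in holders.items() if len(set(sizes.values())) > 1
        ),
        # Present on exactly one controller: only that controller can serve it.
        "single_copy": sorted(s for s, sizes in holders.items() if len(sizes) == 1),
    }


# Made-up listings for illustration only.
listings = {
    "controller-01": {"seg_0": 1200, "seg_1": 900},
    "controller-02": {"seg_1": 950, "seg_2": 700},
}
print(summarize_controller_storage(listings))
```

Matching totals alone do not prove the servers can fetch every segment. The `single_copy` list shows which segments depend on a download request reaching one specific controller.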
Is that set up correct? I think I found the issue, I must have missed the external view output the first time. One of the segments outputs ERROR.
I logged into the server that's supposed to have the segment and the segment is missing from the server just like you predicted. Maybe the server is not asking for the segment from the right controller since the segment is on one of the controllers.
@Subbu Subramaniam as you said, one of the segments actually showed up as ERROR. How do I check if the vipProtocol, vipHost, and vipPort are correct? It seems that the servers know which controllers have the segments they need, except for one segment.

Mayank

05/14/2019, 11:09 PM
@Andre Hsu I thought you had confirmed that all segments were ONLINE in External View of prod? Anyway, this is the root cause for the discrepancy. Now you need to find out why the segment is in error state.

Andre Hsu

05/14/2019, 11:11 PM
@Mayank Sorry, I missed it. One of them said ERROR, and the servers that are supposed to have the segment do not have it in the index dir for some reason.

Mayank

05/14/2019, 11:12 PM
Also, as @Subbu Subramaniam mentioned, it is a bit surprising how things are working at all without shared storage. You can check the SegmentZKMetadata via ZooInspector for the download url of the segment. If that shows up as a vip, then the request can go to any one of the 5 controllers for that segment. However, it is not guaranteed that the controller has the segment needed, in which case the server won't be able to download it.

Andre Hsu

05/14/2019, 11:13 PM
ZooKeeper knows the controller IP that has the segment, because when I used the zk client to get the table segment IP, it returned the correct controller IP.

Mayank

05/14/2019, 11:13 PM
I would not recommend running the cluster without shared storage in controller, certainly not in production.
What does it return for the segment in ERROR state?

Andre Hsu

05/14/2019, 11:14 PM
what does zookeeper return?

Mayank

05/14/2019, 11:14 PM
What is the download url in segmentZKMetadata for the segment in ERROR state?

Andre Hsu

05/14/2019, 11:15 PM
oh hm I'm not sure how to use ZooInspector

Mayank

05/14/2019, 11:15 PM
It will be super helpful for you
In the meantime, you can also use the endpoint controllerHost:controllerPort/tables/<tableName>/segments/<segmentName> to access the segmentZKMetadata that has the download url.

Andre Hsu

05/14/2019, 11:45 PM
this is what it says
I got zooinspector to work
{
  "id": "dma_behavior_2018_10_2018_10_130",
  "simpleFields": {
    "segment.crc": "577498001",
    "segment.creation.time": "1556344508517",
    "segment.end.time": "-1",
    "segment.index.version": "v3",
    "segment.name": "dma_behavior_2018_10_2018_10_130",
    "segment.offline.download.url": "http://controller-02:9000/segments/dma_behavior/dma_behavior_2018_10_2018_10_130",
    "segment.offline.push.time": "1556344540556",
    "segment.offline.refresh.time": "-9223372036854775808",
    "segment.start.time": "-1",
    "segment.table.name": "dma_behavior",
    "segment.time.unit": "null",
    "segment.total.docs": "179477",
    "segment.type": "OFFLINE"
  },
  "listFields": {},
  "mapFields": {}
}
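To see at a glance where a server will be sent for this segment, the record above can be parsed as in the sketch below. The sample string is a trimmed copy of that record with only the fields the function reads; the function name is hypothetical.

```python
import json
from urllib.parse import urlparse


def download_target(segment_zk_metadata_json: str) -> dict:
    """Extract the host and port a server will contact to download the segment."""
    fields = json.loads(segment_zk_metadata_json)["simpleFields"]
    parsed = urlparse(fields["segment.offline.download.url"])
    return {
        "segment": fields["segment.name"],
        "host": parsed.hostname,
        "port": parsed.port,
        "total_docs": int(fields["segment.total.docs"]),
    }


# Trimmed copy of the record posted in the thread.
record = """
{
  "id": "dma_behavior_2018_10_2018_10_130",
  "simpleFields": {
    "segment.name": "dma_behavior_2018_10_2018_10_130",
    "segment.offline.download.url": "http://controller-02:9000/segments/dma_behavior/dma_behavior_2018_10_2018_10_130",
    "segment.total.docs": "179477"
  }
}
"""
print(download_target(record))
```

A host that names a single controller, as here, means the download only works while that specific controller still holds the file. A host that names a load balancer means any controller may receive the request.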
I believe the offline download url is correct and points to the right controller url. So it seems like it's set up in a way that the server will ask the right controller for the segment?
Also, is http://controller-url/segments/table_name/segment_name supposed to return code 404?
I can get the segment names if I omit the segment_name in the path, though.
Ah, I see why: asking for a specific segment is controller-specific the way we set this up.
This issue has been resolved. An environment set up with separate controller volumes may cause race conditions. We saw that there was a problematic segment that was in ERROR state and missing from the server. I will reupload that segment as a temporary fix and get the controllers to share a volume, maybe leveraging deep storage.
Thanks for your help!