s

Shen Wan

09/16/2020, 3:39 PM
A SQL query filtering on the field used for partitioning returns nothing. Filtering on other fields is fine. I do not see anything worth mentioning in the logs. What's going on?
k

Kishore G

09/16/2020, 3:54 PM
what's the query?
s

Shen Wan

09/16/2020, 4:00 PM
select * from abc_test where service_slug='xyz'
very simple query like this
service_slug is the column used to partition
k

Kishore G

09/16/2020, 4:09 PM
I don't think there is any data in that table
totalDocs is 0
s

Shen Wan

09/16/2020, 4:11 PM
This is the result of
select count(*) from …
however, if I add that where clause, everything is zero
n

Neha Pawar

09/16/2020, 4:12 PM
how about partition ‘xyz’, are you certain that exists in the data?
s

Shen Wan

09/16/2020, 4:13 PM
yes. I can see it from select *
"dimensionFieldSpecs": [
  {
    "name": "service_slug",
    "dataType": "STRING"
  },
n

Neha Pawar

09/16/2020, 4:13 PM
can you do
select count(*), service_slug from abc_test group by service_slug order by count(*) limit 10
and use one of those?
s

Shen Wan

09/16/2020, 4:14 PM
"segmentPartitionConfig": {
  "columnPartitionMap": {
    "service_slug": {
      "functionName": "HashCode",
      "numPartitions": 16
    }
  }
},
k

Kishore G

09/16/2020, 4:15 PM
can you paste the metadata of the segment
looks like its pruning all the segments
s

Shen Wan

09/16/2020, 4:16 PM
this query works
select count(*) from oas_log_test where service_slug='ofo4'
this returns nothing
k

Kishore G

09/16/2020, 4:18 PM
can you paste the metadata of a segment?
s

Shen Wan

09/16/2020, 4:19 PM
what’s that?
REST GET?
k

Kishore G

09/16/2020, 4:20 PM
yes, or you can use the cluster manager UI to navigate
s

Shen Wan

09/16/2020, 4:21 PM
n

Neha Pawar

09/16/2020, 4:24 PM
was this exact logic used to partition the stream:
return Math.abs(value.hashCode()) % _numPartitions;
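As a quick sanity check, the same computation can be run standalone to see which partition a given value maps to under the HashCode function. This is only a sketch; the value below is an example, not from the real data:

public class HashCodePartitionCheck {
  public static void main(String[] args) {
    int numPartitions = 16;                 // matches the segmentPartitionConfig above
    String value = "xyz";                   // example value, not from the real data
    int partition = Math.abs(value.hashCode()) % numPartitions;
    System.out.println(value + " -> partition " + partition);
  }
}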
s

Shen Wan

09/16/2020, 4:25 PM
I do not write any code to partition, do I?
And how do I get the Zookeeper browser UI?
There are 1557 records with null in the service_slug column used for partitioning. How does Pinot handle this?
n

Neha Pawar

09/16/2020, 4:27 PM
Data partitioning won’t happen in Pinot. The data needs to be pre-partitioned. From this doc: https://docs.pinot.apache.org/operators/operating-pinot/tuning/routing#partitioning
After setting the above config, data needs to be partitioned with the same partition function and number of partitions before running Pinot segment build and push job for offline push. Realtime partitioning depends on the kafka for partitioning. When emitting an event to kafka, a user need to feed partitioning key and partition function for Kafka producer API
k

Kishore G

09/16/2020, 4:28 PM
@Mayank ^^
s

Shen Wan

09/16/2020, 4:31 PM
Ah, I did not know Kafka needs to use the same partitioning. Should Pinot at least return some data or an error in such a case?
m

Mayank

09/16/2020, 4:31 PM
Pinot will treat it as unpartitioned.
k

Kishore G

09/16/2020, 4:32 PM
yes, it should return data
if it's unpartitioned,
n

Neha Pawar

09/16/2020, 4:32 PM
Mayank, Kishore, we put this in the metadata, so it looks like we set partitions based on whatever data was received
{\"columnPartitionMap\":{\"service_slug\":{\"functionName\":\"HashCode\",\"numPartitions\":16,\"partitions\":[10]}}}
m

Mayank

09/16/2020, 4:33 PM
Is this in the Pinot segment?
s

Shen Wan

09/16/2020, 4:33 PM
select count(*) from oas_log_test where service_slug='ofo1'
This query on ofo1 returns a bit more info
n

Neha Pawar

09/16/2020, 4:35 PM
basically the partitioning is mismatched. the stream was not partitioned with the hashcode, but Pinot is expecting it to be. Pinot has stored partitions in the metadata based on whatever it is seeing. So at query time we're seeing a mismatch
m

Mayank

09/16/2020, 4:35 PM
I don't think partitioning setup issues can cause empty results
n

Neha Pawar

09/16/2020, 4:35 PM
yes @Mayank that is in segment metadata. Shen has posted some metadata above
it will if there’s no matching partition found right?
m

Mayank

09/16/2020, 4:38 PM
I think it is from table config and not segment metadata?
s

Shen Wan

09/16/2020, 4:38 PM
preparing lunch. Will be back after lunch. Lemme know what other info you guys need.
m

Mayank

09/16/2020, 4:38 PM
Segment metadata should look like:
column.service_slug.partitionFunction = Murmur
column.service_slug.numPartitions = 32
column.service_slug.partitionValues = 24
@Neha Pawar So during consumption, we identify all the partitions that the rows of a consuming segment are in. If they belong to different partitions, then either we write multiple partitions in the metadata (or don't write it at all, I can't recall). So during pruning, a segment won't be pruned as long as there is either no partition info, or one of the partition ids in the metadata matches
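In other words (a simplified sketch of the pruning rule Mayank describes, not Pinot's actual classes): a segment survives pruning when it has no partition metadata at all, or when the partition computed from the query value is among the partitions recorded for that segment.

import java.util.Set;

class PartitionPruningSketch {
  // returns true if the segment must still be scanned for the given equality predicate value
  static boolean keepSegment(Set<Integer> segmentPartitions, String queryValue, int numPartitions) {
    if (segmentPartitions == null || segmentPartitions.isEmpty()) {
      return true;                                                       // no partition info: never pruned
    }
    int queryPartition = Math.abs(queryValue.hashCode()) % numPartitions; // the "HashCode" function
    return segmentPartitions.contains(queryPartition);                    // pruned only when nothing matches
  }
}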
n

Neha Pawar

09/16/2020, 4:43 PM
oh i see. so not doing partitioning on the stream will simply cause non-optimal querying. But there won't be any incorrectness. Got it
m

Mayank

09/16/2020, 4:43 PM
Yep
k

Kishore G

09/16/2020, 4:47 PM
Is this a bug?
m

Mayank

09/16/2020, 4:48 PM
No, why
@Shen Wan Could you modify the query as
where service_slug in ('ofo1')
? I want to validate a theory
n

Neha Pawar

09/16/2020, 5:02 PM
ofo1 is the one that returns results. You mean ofo4?
m

Mayank

09/16/2020, 5:03 PM
yeah
s

Shen Wan

09/16/2020, 5:12 PM
select count(*) from oas_log_test where service_slug in ('ofo4')
returns nothing
m

Mayank

09/16/2020, 5:12 PM
So I think it may not be related to partitioning
IIRC, partition pruning kicks in for equality predicate.
Is there any query that returns ofo4?
n

Neha Pawar

09/16/2020, 5:17 PM
also can you share the broker logs from around that time, even if there are no errors that you see. there might be something that pops up for us
s

Shen Wan

09/16/2020, 5:38 PM
recent broker logs
select distinct service_slug from oas_log_test where service_slug <> 'null'
n

Neha Pawar

09/16/2020, 6:00 PM
could you share segment metadata from a few other segments, of different partitions (for example, previously shared metadata was kafka partition 10)
s

Shen Wan

09/16/2020, 6:05 PM
like this?
n

Neha Pawar

09/16/2020, 6:06 PM
yes, maybe a few more for other partitions?
trying to validate something
s

Shen Wan

09/16/2020, 6:07 PM
{
  "segment.realtime.endOffset": "322913",
  "segment.time.unit": "MILLISECONDS",
  "segment.start.time": "1600245140167",
  "segment.flush.threshold.size": "113905",
  "segment.realtime.startOffset": "209008",
  "segment.end.time": "1600258808629",
  "segment.total.docs": "113905",
  "segment.table.name": "oas_log_test_REALTIME",
  "segment.realtime.numReplicas": "2",
  "segment.creation.time": "1600246011143",
  "segment.realtime.download.url": "http://pinot-logging-controller-2.pinot-logging-controller-headless.pinot-logging.svc.cluster.local:9000/segments/oas_log_test/oas_log_test__0__8__20200916T0846Z",
  "segment.name": "oas_log_test__0__8__20200916T0846Z",
  "segment.index.version": "v3",
  "custom.map": null,
  "segment.flush.threshold.time": null,
  "segment.type": "REALTIME",
  "segment.crc": "1038864885",
  "segment.partition.metadata": "{\"columnPartitionMap\":{\"service_slug\":{\"functionName\":\"HashCode\",\"numPartitions\":16,\"partitions\":[0]}}}",
  "segment.realtime.status": "DONE"
}
n

Neha Pawar

09/16/2020, 6:09 PM
thank you
is it possible that your stream is already partitioned by HashCode on service_slug? or are you certain the stream has no partitioning whatsoever? Just trying to verify why the kafka partition number is always matching the “partitions” in the partition metadata.
s

Shen Wan

09/16/2020, 6:32 PM
I do not know. This is the config
Do you have an example code that shows how to set up partition while sending messages to Kafka?
I hope this works.
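Since the thread never shows one, here is a rough sketch of one way to keep the Kafka stream partitioned the same way as the Pinot segmentPartitionConfig: compute the partition with the same HashCode logic and pass it explicitly to the producer. The broker address, topic name, and payload below are placeholders, not values from this setup.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class PartitionedProducerSketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");  // placeholder broker address
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

    try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
      String serviceSlug = "ofo4";                      // value of the partitioning column
      String event = "{\"service_slug\": \"ofo4\"}";    // placeholder payload
      int numPartitions = 16;                           // must match numPartitions in the Pinot config
      int partition = Math.abs(serviceSlug.hashCode()) % numPartitions;  // same HashCode logic as Pinot
      producer.send(new ProducerRecord<>("oas_log_topic", partition, serviceSlug, event));
    }
  }
}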
n

Neha Pawar

09/16/2020, 7:47 PM
Hey @Shen Wan we have identified a bug in the realtime partitioning logic. Please give us some time to figure out a fix/workaround.
👍 1
@Shen Wan if you want to use partitioning, unfortunately the only way forward is to recreate this table. And before doing that, set partitioning logic in Kafka stream to match the logic in the Pinot table config
s

Shen Wan

09/16/2020, 8:20 PM
I see. This is a test table. So it is OK. What is the bug about?
k

Kishore G

09/16/2020, 8:23 PM
we assume that the kafka stream is partitioned on that key (in your case, service_slug)
n

Neha Pawar

09/16/2020, 8:24 PM
In realtime, Pinot is assuming that the stream is partitioned. So the partition number is directly used as available partitions in the segment metadata. When consuming data from the partitions and creating segments, no validation is done to ensure that the data actually matches the partition, based on the column.
s

Shen Wan

09/16/2020, 8:26 PM
So you guys are going to make Pinot query all partitions when partition info is incorrect?
And BTW, before I drop the table and recreated, I’d like to get some stats, like storage usage per column. Where can I get them?
n

Neha Pawar

09/16/2020, 8:33 PM
i’m not sure we have per column storage stats. @Kishore G?
k

Kishore G

09/16/2020, 8:39 PM
we do, it's in the segment directory, it's called index_map
s

Shen Wan

09/16/2020, 8:43 PM
So not a REST API but a file?
k

Kishore G

09/16/2020, 8:45 PM
yes, for now. please file an issue, we can add that as part of segment metadata
s

Shen Wan

09/16/2020, 8:54 PM
in pinot server? what directory?
n

Neha Pawar

09/16/2020, 8:56 PM
this will be whatever directory you used as -dataDir when starting the server
s

Shen Wan

09/16/2020, 8:59 PM
I find nothing under /var/pinot/server/data/segment, but there is something under …/data/index
n

Neha Pawar

09/16/2020, 9:00 PM
do you see directories for each segment there?
s

Shen Wan

09/16/2020, 9:14 PM
no
actually yes, found index_map
Are all the sizes in bytes? I add them all up and get ~60% of diskSizeInBytes. Is the remaining ~40% raw data? Does this look reasonable?
And I wonder how repartitioning is supposed to work: updating the Kafka and Pinot configs cannot be atomic, so there will be a period when Kafka's partition setting and Pinot's are out of sync, right?
n

Neha Pawar

09/16/2020, 10:58 PM
yes it is in bytes
Which is why deleting the table was suggested: delete the Pinot table, correct the partitioning in the stream, then recreate the table.
s

Shen Wan

09/16/2020, 11:00 PM
That’s infeasible in prod.
k

Kishore G

09/16/2020, 11:00 PM
in prod, you will have to remove the partition info from the metadata
s

Shen Wan

09/16/2020, 11:01 PM
rebalance does not help?
k

Kishore G

09/16/2020, 11:02 PM
no, the segment processing framework that @Neha Pawar is building can help but its not ready
s

Shen Wan

09/16/2020, 11:04 PM
so removing partition info will cause Pinot to treat all data as one partition?
k

Kishore G

09/16/2020, 11:05 PM
yes
broker is basically looking at the segment metadata in ZK and thinks that this segment is partitioned
s

Shen Wan

09/16/2020, 11:06 PM
then update Kafka partition, then update Pinot partition to be consistent, right?
k

Kishore G

09/16/2020, 11:06 PM
and applies the partitioning function, if it does not match it excludes the segment from query execution
yes
is this already in production?
s

Shen Wan

09/16/2020, 11:07 PM
my table? no, it’s just a test.
k

Kishore G

09/16/2020, 11:08 PM
got it
s

Shen Wan

09/16/2020, 11:08 PM
so segments created during the interim will have bad query performance, right?
k

Kishore G

09/16/2020, 11:08 PM
correct, by the way how many services do you have
s

Shen Wan

09/16/2020, 11:09 PM
you mean pinot servers? 12
k

Kishore G

09/16/2020, 11:10 PM
no, what is the cardinality for the partition column
s

Shen Wan

09/16/2020, 11:10 PM
up to 100 I think
And to my previous question: the forward index is the data, right? So why do all the sizes in index_map add up to just ~60% of diskSizeInBytes?
k

Kishore G

09/16/2020, 11:19 PM
you are probably missing inverted index
s

Shen Wan

09/16/2020, 11:20 PM
I included that.
I included dict size, fwd index size, inv index size, range index size, and bloomfilter size: all that I can find in index_map
k

Kishore G

09/16/2020, 11:24 PM
can you paste the output
s

Shen Wan

09/16/2020, 11:39 PM
index_map
REST response
k

Kishore G

09/16/2020, 11:43 PM
it does not add up?
can you do ls -l on the segment file as well
s

Shen Wan

09/16/2020, 11:45 PM
not any more, I already dropped the table to repartition
will try to get some stats again tomorrow
k

Kishore G

09/16/2020, 11:47 PM
ok
these things should match.
s

Shen Wan

09/17/2020, 1:12 AM
I also wonder where the text index info is? I set a text index for columns req and resp but do not see anything related.
I deleted the table oas_log_test and created a new table oas_log_test_v2 with a new schema. But the new table contains 1.2 million very old records and new records are not flowing in. Do we need to reset Kafka?
n

Neha Pawar

09/17/2020, 4:28 PM
are you using the same kafka topic? and does that kafka topic have all this old data? As soon as the table is created, Pinot will ingest whatever is already in the topic
you could change that to consume only the latest messages post table creation: in the streamConfigs section, change the "offset" field from smallest to largest
another possibility is that the table didn't get deleted completely before the new table was created. after deleting, check the external view to make sure everything is gone
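For the offset change Neha mentions, a rough sketch of how the streamConfigs section of the realtime table config might look. In recent Pinot versions the field is usually spelled stream.kafka.consumer.prop.auto.offset.reset; the exact key can vary by version, and the topic name here is a placeholder.

"streamConfigs": {
  "streamType": "kafka",
  "stream.kafka.topic.name": "oas_log_topic",
  "stream.kafka.consumer.prop.auto.offset.reset": "largest"
}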
s

Shen Wan

09/17/2020, 5:35 PM
external view of oas_log_test is 404. external view of oas_log_test_v2 is stuck on CONSUMING.
n

Neha Pawar

09/17/2020, 5:38 PM
why do you say it is stuck on CONSUMING? it looks like a valid EV
s

Shen Wan

09/17/2020, 5:57 PM
because I’m expecting new segments generated for the new data I sent to Kafka
n

Neha Pawar

09/17/2020, 6:00 PM
you cannot see the new data in the queries?
segments are created only occasionally
s

Shen Wan

09/17/2020, 6:00 PM
no. always the same 1.2 million records over 30 hours ago
even after I tried your suggestion to update the table to largest
n

Neha Pawar

09/17/2020, 6:02 PM
updating to largest will not remove older data from the table. that signal only tells a new table where to start consumption
s

Shen Wan

09/17/2020, 6:03 PM
I do not see new data coming in.
n

Neha Pawar

09/17/2020, 6:03 PM
could you start with a clean kafka topic and table?
or post any exceptions that you see
s

Shen Wan

09/17/2020, 6:03 PM
And I do not understand why the old data is still there even after I deleted the table and recreated it.
n

Neha Pawar

09/17/2020, 6:04 PM
you used the same topic right? and that topic has all the data?
s

Shen Wan

09/17/2020, 6:04 PM
yes
I did not touch Kafka
n

Neha Pawar

09/17/2020, 6:04 PM
then Pinot is going to ingest all the data from the topic, if you had set to “smallest”
s

Shen Wan

09/17/2020, 6:05 PM
that’s fine. but only 1.2M ingested.
and nothing changes after I set to “largest”
n

Neha Pawar

09/17/2020, 6:06 PM
like i said above, updating an existing table to “largest” will have no effect
i cannot tell why newer events aren’t getting ingested. will need to see logs
s

Shen Wan

09/17/2020, 6:08 PM
I’ll delete table and recreate with “largest”
now ingestion is active. It should exceed 1.2M records soon.
You mentioned that maybe the old table was not deleted completely. How would that affect the new table consuming data from Kafka? And how can we verify that a table is completely deleted?
n

Neha Pawar

09/17/2020, 6:26 PM
if deleting and recreating with largest fixed it for you, then it was probably not about un-deleted data. When a table is deleted, the directories for that table in the server and controller get deleted. If new table create is issued before delete is done, the old directories could interfere with the new table. But again, it doesn’t appear to be the case for you
s

Shen Wan

09/17/2020, 7:17 PM
select count(*) from oas_log_test_v2
With this table setting, the number of docs ingested halted at 800k
no new segments created
I feel Pinot is still in an unhealthy/stuck state.
2020/09/17 18:47:41.613 ERROR [LLRealtimeSegmentDataManager_oas_log_test_v2__11__0__20200917T1810Z] [oas_log_test_v2__11__0__20200917T1810Z] Could not build segment
This log confirms the issue but provides no insight.
n

Neha Pawar

09/17/2020, 7:41 PM
can you share the whole log
what version of Pinot are you using?
s

Shen Wan

09/17/2020, 7:47 PM
And that was the whole log line.
logs around that error
Is this exception the culprit? What does it mean?
inverted index must be built on columns with dictionary?
n

Neha Pawar

09/17/2020, 8:04 PM
can i see the full table confg and full schema?
also why does the exception get skipped in your logs? The log line is actually
} catch (Exception e) {
        segmentLogger.error("Could not build segment", e);
but I don't see the exception
s

Shen Wan

09/17/2020, 8:05 PM
That's what I complained about yesterday and thought was a Pinot bug. Maybe Stackdriver is truncating?
schema
config
n

Neha Pawar

09/17/2020, 8:10 PM
afaik, you cannot put inv index column as noDictionary
also, timeFieldSpec is deprecated
suggest you put all time fields as dateTimeFieldSpecs
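A minimal illustration of the constraint Neha describes (the column names are borrowed from this thread only as placeholders, not taken from the actual failing config): a column listed in invertedIndexColumns needs a dictionary, so it must not also appear in noDictionaryColumns.

"tableIndexConfig": {
  "invertedIndexColumns": ["service_slug"],
  "noDictionaryColumns": ["req", "resp"]
}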
s

Shen Wan

09/17/2020, 8:11 PM
I thought an inv index would make it unnecessary to also build a dictionary?
If it has to be built on top of a dictionary, the config structure should represent the logical relationship, or at least document it?
n

Neha Pawar

09/17/2020, 8:13 PM
why do you want to make it no dictionary?
s

Shen Wan

09/17/2020, 8:16 PM
It will be UUID in prod.
does not make sense to me to build a dictionary for UUIDs.
I'd like the UUID itself to be the key.