Neil Teng
06/18/2021, 9:36 PMMayank
Neil Teng
06/18/2021, 9:49 PMNeil Teng
06/18/2021, 9:49 PMMayank
Mayank
Mayank
Neil Teng
06/18/2021, 10:17 PMMayank
Neil Teng
06/18/2021, 10:20 PMMayank
Your use case has queries mostly for a primary column (eg where customerId = xxx).
If you sort on customerId, then you will always pick contiguous docIds for a given query.
Now consider you have a high cardinality string column that you project in the query.
With dictionary, the fwd index will have dictionary ids, that may point to different disk blocks.
Without dictionary for this high cardinality column, the contiguous docIds will correspond to contiguous disk blocks.
Mayank
Neil Teng
06/18/2021, 10:32 PMMayank
Neil Teng
06/18/2021, 10:38 PMNeil Teng
06/18/2021, 10:41 PMMayank
Mayank
I think for a column-oriented DB, every column is sorted with this docIdĀ and compressed in default. That is the way data lay out in the disk.
Mayank
Neil Teng
06/18/2021, 10:48 PMNeil Teng
06/18/2021, 10:48 PMMayank
Neil Teng
06/18/2021, 10:50 PMMayank
Neil Teng
06/18/2021, 10:51 PMNeil Teng
06/18/2021, 10:52 PMMayank
Mayank
Mayank
Mayank
Neil Teng
06/18/2021, 10:55 PMNeil Teng
06/18/2021, 10:56 PMMayank
Mayank
Neil Teng
06/18/2021, 10:57 PMNeil Teng
06/18/2021, 11:02 PM