Nizar Hejazi
04/22/2022, 9:29 AM
Richard Startin
04/22/2022, 9:36 AM
Richard Startin
04/22/2022, 9:36 AM
Richard Startin
04/22/2022, 9:38 AM
Richard Startin
04/22/2022, 9:39 AM
Nizar Hejazi
04/22/2022, 9:53 AM
employee. Primary key: employeeId. Partitioning key: companyId.
I have 8 partitions for the employee table. This means the records of roughly 1/8 of all companies go into a single partition. As you mentioned, partitioning helps find the server. We also have an inverted index on our partitioning key (companyId), since each partition contains the records of hundreds of companies but our queries are always restricted to a single company.
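To make the two steps concrete, here is a minimal Python sketch of the idea, not the actual implementation: the partitioning key picks one of the 8 partitions, and an inverted index on companyId then narrows a segment down to one company's rows. The CRC32-modulo partition function, the helper names, and the sample rows are all assumptions for illustration; the real table may use a different partition function.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 8  # number of partitions mentioned in the message


def partition_of(company_id: str) -> int:
    # Hypothetical partition function (CRC32 modulo the partition count).
    # The real table may be configured with a different function.
    return zlib.crc32(company_id.encode("utf-8")) % NUM_PARTITIONS


def build_inverted_index(rows):
    # Map each companyId to the row positions (doc ids) where it occurs
    # inside a single segment.
    index = defaultdict(list)
    for doc_id, row in enumerate(rows):
        index[row["companyId"]].append(doc_id)
    return index


# Made-up rows standing in for one segment that holds several companies.
segment = [
    {"employeeId": "e1", "companyId": "acme"},
    {"employeeId": "e2", "companyId": "globex"},
    {"employeeId": "e3", "companyId": "acme"},
]

index = build_inverted_index(segment)

# Step 1: the partitioning key selects which partition (and server) to ask.
target_partition = partition_of("acme")

# Step 2: within a segment, the inverted index yields only that company's
# rows, so rows of the other companies are never scanned.
acme_rows = index["acme"]
```

The point of the sketch is that partitioning only narrows the search to a group of companies, while the inverted index is what isolates a single company inside each segment.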
I’ll add bloom filters on the employeeId column to help prune segments that don’t contain records related to a specific employeeId or a set of employeeIds. I think there is no value in adding companyId to the set of bloom filter columns, since it is highly unlikely that a segment does not contain at least one record from each company. Note: as per the docs, bloom filters can be applied only to dictionary-encoded columns.
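As a rough illustration of the pruning described above, the following Python sketch keeps one small bloom filter per segment over employeeId and skips any segment whose filter rules the id out. The BloomFilter class, its sizes, the segment names, and the sample ids are all hypothetical; this is not the filter the system builds internally.

```python
import hashlib


class BloomFilter:
    """Toy bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bitset

    def _positions(self, value):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


# Made-up segments; the ids are sample 24-character hex strings in the
# shape of an ObjectId.
segments = {
    "seg_a": ["507f1f77bcf86cd799439011", "507f1f77bcf86cd799439012"],
    "seg_b": ["507f191e810c19729de860ea", "507f191e810c19729de860eb"],
}

# Build one filter per segment at segment-creation time.
filters = {}
for name, employee_ids in segments.items():
    bloom = BloomFilter()
    for employee_id in employee_ids:
        bloom.add(employee_id)
    filters[name] = bloom


def candidate_segments(employee_id):
    # Only segments whose filter cannot rule the id out need to be opened.
    return [name for name, bloom in filters.items() if bloom.might_contain(employee_id)]


candidates = candidate_segments("507f1f77bcf86cd799439011")
```

A segment that really holds the id is always kept. A segment that does not hold it is usually dropped, but a false positive can still let it through, so the filter reduces work without changing query results.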
We want to avoid scanning the entire content of a segment. Can you elaborate on why a sorted index is better than an inverted index? Please note that each record has a unique primary key, and our primary key is a 12-byte MongoDB ObjectId.
Nizar Hejazi
04/22/2022, 9:55 AM
Richard Startin
04/22/2022, 12:44 PM
Richard Startin
04/22/2022, 12:45 PM
Richard Startin
04/22/2022, 12:46 PM
Richard Startin
04/22/2022, 12:48 PM
Richard Startin
04/22/2022, 12:49 PM
Kishore G
Richard Startin
04/22/2022, 2:56 PM
Richard Startin
04/22/2022, 2:57 PM
Nizar Hejazi
04/22/2022, 5:10 PM