What is the best practice on to dictionary or not ...
# general
j
What is the best practice on whether or not to dictionary-encode string id columns with billions of unique values? I have a non-dictionary column of id strings like
"0.0.89748-1612413398-472232185"
and the performance of
select * where id = '...'
is horrible. It just times out after 10 seconds. I assume I need to add a reverse index; the question is, should it be dictionary or non-dictionary?
k
Try a bloom filter first.
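For context on why a bloom filter can help an equality lookup on a very high-cardinality id column, here is a rough sketch in Python. It is not how any particular database implements it. The "segment" layout, the helper names (`build_segment`, `lookup`), the filter sizes, and every id other than the one quoted above are assumptions made up for illustration. The idea is that each chunk of data carries a small bit array that can answer "definitely not here" without scanning, so a lookup only scans the chunks that might hold the value.

```python
import hashlib


class BloomFilter:
    """Tiny bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, value: str):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(value))


def build_segment(ids):
    """Hypothetical 'segment': a list of ids plus a bloom filter over them."""
    bloom = BloomFilter()
    for id_ in ids:
        bloom.add(id_)
    return {"ids": list(ids), "bloom": bloom}


def lookup(segments, target):
    """Return (matching ids, number of segments actually scanned)."""
    matches, scanned = [], 0
    for seg in segments:
        if not seg["bloom"].might_contain(target):
            continue  # definitely absent: skip the scan entirely
        scanned += 1
        matches.extend(i for i in seg["ids"] if i == target)
    return matches, scanned


# Only the first id comes from the question; the rest are invented.
segments = [
    build_segment(["0.0.89748-1612413398-472232185", "0.0.1-1-1"]),
    build_segment(["0.0.2-2-2", "0.0.3-3-3"]),
    build_segment(["0.0.4-4-4"]),
]
matches, scanned = lookup(segments, "0.0.89748-1612413398-472232185")
print(matches, scanned)
```

Because false positives are possible, `scanned` can occasionally be higher than the number of segments that really contain the id, but a segment that does contain it is never skipped.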
j
Cool interesting, will do 🙂
thanks
m
Also, I notice you are running
select *
as opposed to
select count(*)
Are you interested in the actual rows, or just a count?