Sharath Gururaj
09/26/2023, 8:09 AM
I have two tables, sampledata and sampledata2, both with the same schema:
create temporary table sampledata (
  id INT primary key not enforced,
  my_value STRING,
  updated_at TIMESTAMP
) with (
  'connector' = 'filesystem',
  'path' = 'file:///home/sharath/data2/sample_data.csv',
  'format' = 'csv'
);
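The definition of sampledata2 is not shown; since it shares the schema, it would presumably look like the sketch below. The path here is a hypothetical placeholder, not the actual file location.

```sql
-- Sketch only: same schema as sampledata.
-- The path below is a hypothetical placeholder, not the real location.
create temporary table sampledata2 (
  id INT primary key not enforced,
  my_value STRING,
  updated_at TIMESTAMP
) with (
  'connector' = 'filesystem',
  'path' = 'file:///path/to/second_sample_data.csv',
  'format' = 'csv'
);
```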
Each table contains the same data: 1 million rows of 2 KB each. I am trying to measure join speed with the following statement:
create table sampledatasink as
select s1.id, s2.my_value, s1.updated_at
from sampledata s1
join sampledata2 s2 on s1.id = s2.id;
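To see which join operator the planner picks for this query, the plan can be printed before running the job. This is a minimal sketch, assuming both tables are registered in the current SQL session. In streaming mode a regular join like this one is expected to keep both inputs in state, so the plan is a useful first thing to check.

```sql
-- Sketch: print the plan for the same join without executing it.
explain plan for
select s1.id, s2.my_value, s1.updated_at
from sampledata s1
join sampledata2 s2 on s1.id = s2.id;
```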
Some more details about my setup:
• single-node deployment: 2 cores, 8 GB RAM, 40 GB SSD
• RocksDB state backend with incremental checkpoints
• checkpoint interval set to 10 min and MinPauseBetweenCheckpoints = 10 min
• mode = streaming
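For reference, the setup above roughly corresponds to the following settings when expressed in the Flink SQL client. This is a sketch, assuming Flink 1.x configuration key names; the actual deployment may set these in flink-conf.yaml or through the DataStream API instead.

```sql
-- Sketch, assuming Flink 1.x configuration keys set from the SQL client.
SET 'execution.runtime-mode' = 'streaming';
SET 'state.backend' = 'rocksdb';
SET 'state.backend.incremental' = 'true';
SET 'execution.checkpointing.interval' = '10 min';
SET 'execution.checkpointing.min-pause' = '10 min';
```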
The join speed is very slow (on the order of 100 rows per second).
I read some blog posts from Ververica that seemed to indicate join speeds of ~89,000 rows per second.
Please let me know if I am doing anything wrong. What speeds can I expect here (ballpark)?