Hello everyone, I am new to the community and tr...
# getting-started
j
Hello everyone, I am new to the community and trialling using Pinot to store IoT timeseries data (extremely impressed so far! 😁 ) Our data schema is as expected for IoT (sensor, metric, timestamp, value) and we need to query the data aggregated by time bucket (minute, hour, day but also raw unaggregated data). I have had some success with
timestamp
index but I think a
startree
index, using pre-aggregation, would be more powerful. A couple of questions: 1. Is Pinot a good use case for IoT data? 2. Can
startree
index be used with a continuous data field like
timestamp
? Many thanks in advance! 🙌
A typical query would look like:
Copy code
select dateTrunc(
'HOUR', 
TimestampMS, 
'MILLISECONDS', 
'UTC', 
'HOURS'
) as hour,
avg(value) as avg_val,
count(*) as count,
from data 
WHERE sensor_id = 1
AND metric_id = 'pressure'
AND TimestampMS > fromDateTime('2020-02-01', 'yyyy-MM-dd') 
AND TimestampMS < fromDateTime('2020-02-20', 'yyyy-MM-dd') 
group by hour
ORDER BY hour ASC
k
create a derived column for hour and you can use startree index and it will be extremely fast
d
Nice to see someone else interested in IoT use cases for Pinot @James Ramsay! I'm working on one right now myself. I'm collecting co2, temperature and humidity data. Working on doing some anomaly detection in ThirdEye as well.
j
@Kishore G I managed to configure the star tree index and, after fixing some segment size issues, the performance is really impressive! I will be interested to see what the limits are on cardinality, with respect to the star tree index, for the derived time columns.
@David G. Simmons - that's good to hear. I've noticed a few people on the channel who are also looking at IOT use cases. Have you come across any good example repositories or "recipes" for IoT with pinot?
d
I have written 2 myself — the writeups and like to the repos, etc. are on DZone here and here.
j
Fantastic, thanks for sharing these. Looking forward to going through them both