# troubleshooting
j
I have tested the three APIs (regular SQL API, Async SQL API, MSQ Task API) using curl, querying up to 50m narrow (4-column) records from a datasource. I found all three APIs work fine, but the regular sync API completed fastest because it skips the extra step of storing and retrieving results. For a Python-specific client, there is a Python API (DOC HERE) and PyDruid (GitHub repo HERE)
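For reference, the sync SQL API takes a JSON body POSTed to `/druid/v2/sql`; a minimal Python sketch of building that request (the table and column names here just mirror the Trips demo, adjust for your datasource):
```python
import json

# Minimal sketch of a request body for Druid's sync SQL API
# (POST /druid/v2/sql). "objectLines" streams newline-delimited
# JSON rows, which is friendlier for large result sets than one
# big JSON array.
def build_sql_payload(sql: str, result_format: str = "objectLines") -> dict:
    return {"query": sql, "resultFormat": result_format}

payload = build_sql_payload(
    "SELECT trip_id, pickup_longitude FROM trips_xaa "
    "WHERE pickup_longitude <> 0 LIMIT 1000"
)
body = json.dumps(payload)  # send as the POST body with Content-Type: application/json
```
With `requests` you would send it as `requests.post("http://localhost:8888/druid/v2/sql", data=body, headers={"Content-Type": "application/json"}, stream=True)` (router port 8888 assumed from the quickstart config).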
a
There is no Async SQL API in open-source Druid.
p
Hey @Bharat Thakur I seem to remember this blog from a while ago … maybe you’ll find some things in here? https://support.imply.io/hc/en-us/articles/360034310953-Tuning-Druid-for-Large-Result-Sets
10m is quite a lot --- sounds like a massive `GROUP BY` result 😄
j
In my test script I was doing a simple SELECT from the Trips demo DB:
```shell
SQLTEXT="SELECT trip_id, pickup_longitude FROM trips_xaa where pickup_longitude != 0 limit $FETCH_ROWS"
```
No GROUP BY involved (the use case is to pull a list of candidate events for a targeted marketing campaign) ... I think 50m records took about 40 sec on my laptop quickstart config? (Note: I did not store the results to disk; for the sync API I piped the output to `wc -l` to ensure the data was actually being sent back to the client, and of course no network was involved. But still pretty decent performance IMHO.)
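The `wc -l` check above maps naturally to counting newline-delimited rows on the client side; a small sketch (the sample lines are made up, with a live server you would pass the HTTP response's line iterator instead):
```python
def count_rows(line_iter):
    # Count non-empty newline-delimited result rows: the client-side
    # equivalent of piping the sync API's output to `wc -l`.
    return sum(1 for line in line_iter if line.strip())

# Simulated "objectLines" output; a real run would iterate the
# streaming HTTP response instead of this hard-coded sample.
sample = [
    b'{"trip_id": 1, "pickup_longitude": -73.99}',
    b'{"trip_id": 2, "pickup_longitude": -73.97}',
    b"",  # blank terminator line at the end of the stream
]
row_count = count_rows(sample)
```
Skipping blank lines also means a trailing newline in the stream does not inflate the count.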
d
10 million rows? or 10 million requests/second?
b
I have now fetched 185 million rows from Druid and dumped them with Python in 7 minutes. The only thing I changed was the group-by storage setting.