# troubleshooting
j
I have tested the three APIs (regular SQL API, Async SQL API, MSQ Task API) using curl, querying up to 50m narrow (4-column) records from a datasource. I found all three APIs work fine, but the regular sync API completed fastest because it skips the extra step of storing and retrieving results. For a Python-specific client, there is a Python API (DOC HERE) and PyDruid (GitHub repo HERE)
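For reference, the sync SQL API takes a JSON body POSTed to `/druid/v2/sql`; a minimal Python sketch of building that request (the table and column names here just mirror the Trips demo, adjust for your datasource):
```python
import json

# Minimal sketch of a request body for Druid's sync SQL API
# (POST /druid/v2/sql). "objectLines" streams newline-delimited
# JSON rows, which is friendlier for large result sets than one
# big JSON array.
def build_sql_payload(sql: str, result_format: str = "objectLines") -> dict:
    return {"query": sql, "resultFormat": result_format}

payload = build_sql_payload(
    "SELECT trip_id, pickup_longitude FROM trips_xaa "
    "WHERE pickup_longitude <> 0 LIMIT 1000"
)
body = json.dumps(payload)  # send as the POST body with Content-Type: application/json
```
With `requests` you would send it as `requests.post("http://localhost:8888/druid/v2/sql", data=body, headers={"Content-Type": "application/json"}, stream=True)` (router port 8888 assumed from the quickstart config).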
a
There is no Async SQL API in open-source Druid.
p
Hey @Bharat Thakur I seem to remember this blog from a while ago … maybe you’ll find some things in here? https://support.imply.io/hc/en-us/articles/360034310953-Tuning-Druid-for-Large-Result-Sets
10m is quite a lot --- sounds like a massive `GROUP BY` result 😄
j
In my test script I was doing a simple SELECT from the Trips demo DB:
```shell
SQLTEXT="SELECT trip_id, pickup_longitude FROM trips_xaa where pickup_longitude != 0 limit $FETCH_ROWS"
```
No GROUP BY involved (the use case is to pull a list of candidate events for a targeted marketing campaign) ... I think 50m records took about 40 sec on my laptop quickstart config? (Note: I did not store the results to disk; for the sync API I piped the output to `wc -l` to ensure the data was actually being sent back to the client, and of course no network was involved. But still pretty decent performance IMHO.)
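The `wc -l` check above maps naturally to counting newline-delimited rows on the client side; a small sketch (the sample lines are made up, with a live server you would pass the HTTP response's line iterator instead):
```python
def count_rows(line_iter):
    # Count non-empty newline-delimited result rows: the client-side
    # equivalent of piping the sync API's output to `wc -l`.
    return sum(1 for line in line_iter if line.strip())

# Simulated "objectLines" output; a real run would iterate the
# streaming HTTP response instead of this hard-coded sample.
sample = [
    b'{"trip_id": 1, "pickup_longitude": -73.99}',
    b'{"trip_id": 2, "pickup_longitude": -73.97}',
    b"",  # blank terminator line at the end of the stream
]
row_count = count_rows(sample)
```
Skipping blank lines also means a trailing newline in the stream does not inflate the count.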
d
10 million rows? or 10 million requests/second?
b
I have now fetched 185 million rows from Druid and dumped them with Python in 7 minutes. The only thing I changed was the group-by storage setting.