Gian Merlino
01/13/2024, 4:44 AM
Gian Merlino
01/13/2024, 9:13 PM
Karan Kumar
01/16/2024, 5:45 AM
Arun C
01/17/2024, 1:36 PM
Arun C
01/22/2024, 8:26 PM
Arun C
01/22/2024, 8:27 PM
Arun C
01/22/2024, 8:28 PM
Gian Merlino
01/22/2024, 8:47 PM
somaxconn is helpful when you have a lot of tasks, and we found that the Jetty accept queue needs to be sized to match if you want to avoid having a bunch of connection resets.
• we've been experimenting with using druid.global.http.eagerInitialization = false to reduce the number of connections created when services first start up (otherwise they will eagerly create druid.global.http.numConnections connections [default 20] all at once)
Abhishek Agarwal
01/24/2024, 7:03 AM
Abhishek Agarwal
01/25/2024, 11:03 AM
org.apache.druid.extensions or org.apache.druid.extensions.contrib - I like the former because we can move the extension to core without any user-facing impact, but the latter is what I have seen commonly followed.
Satya Kuppam
01/25/2024, 3:23 PM
Gian Merlino
02/08/2024, 12:22 AM
Clint Wylie
02/14/2024, 3:26 AM
Ahemad Ali Shaik
02/15/2024, 3:01 PM
Lasse Mammen
02/19/2024, 4:21 PM
Sergio Ferragut
03/04/2024, 5:53 PM
Kai Sun
03/08/2024, 10:25 PM
t0:
H0: group by 8 segments with 16 hydrants in total
L0: group by 2 segments with 2 hydrants in total
t1:
H1: group by 8 segments with 16 hydrants in total
L0: group by 2 segments with 2 hydrants in total.
t2:
....
So before this change, H0 (the high-priority query) and L0 (the low-priority query) at t0 would both make progress in the processing thread pool, because H0 only uses 8 threads to process its 8 segments. (The hydrants within the same segment are processed sequentially in the same thread.) Thus there would be 2 threads left for L0 to make progress.
It is not hard to see that if the high-priority query runs over roughly the same time span as the low-priority query, both queries make progress, with thread usage at an 8:2 ratio.
Now, with this change, H0 would take all 10 threads, since it has 16 hydrants (more than the 10 thread slots in the processing pool), and leave no threads for L0 to progress. So L0 and subsequent low-priority queries would "starve" and never make progress.
In general: before this change, low-priority queries can make progress as long as the total segment count of ongoing high-priority queries is less than the processing thread pool size. After the change, they can make progress only as long as the total hydrant count of ongoing high-priority queries is less than the pool size.
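The allocation arithmetic in this scenario can be checked with a toy model (an editorial sketch; the 10-thread pool and method names are illustrative, not Druid's actual scheduler):

```java
// Toy model of the processing-pool arithmetic described above (illustrative,
// not Druid's actual scheduler). Before the change a query occupies one thread
// per segment; after the change it occupies one thread per hydrant.
public class StarvationSketch {
    static final int POOL_SIZE = 10;

    // Threads a query occupies, capped by the pool size.
    static int threadsUsed(int units) {
        return Math.min(units, POOL_SIZE);
    }

    // Threads left over for lower-priority queries.
    static int threadsLeft(int unitsUsedByHighPriority) {
        return POOL_SIZE - threadsUsed(unitsUsedByHighPriority);
    }

    public static void main(String[] args) {
        // Before: H0 parallelizes per segment (8 segments) -> 2 threads left for L0.
        System.out.println("before: " + threadsLeft(8));
        // After: H0 parallelizes per hydrant (16 hydrants) -> 0 threads left, L0 starves.
        System.out.println("after: " + threadsLeft(16));
    }
}
```

Running it prints 2 and then 0: per-segment parallelism leaves headroom for L0, per-hydrant parallelism does not.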
In practice, it is very likely to see "starvation" of low-priority queries (or much longer running times for them) that was not seen before.
Egor Ryashin
03/14/2024, 4:19 PM
Could not resolve type id 'S3' as a subtype
because it's case-sensitive and doesn't match s3:
INSERT INTO
EXTERN(
S3(bucket => 'your_bucket', prefix => 'prefix/to/files')
)
AS CSV
SELECT
<column>
FROM <table>
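As an editorial aside on the error above: the type id is matched case-sensitively, which a plain map lookup illustrates (the map below is a stand-in for the real subtype registry; the names are made up):

```java
import java.util.Map;

// Stand-in for a case-sensitive registry of type ids, like the subtype
// resolution that rejects 'S3' above. Keys are the registered ids.
public class TypeIdSketch {
    static final Map<String, String> WRITERS = Map.of(
        "s3", "S3 export writer",      // registered lower-case id
        "local", "local export writer"
    );

    // Case-sensitive lookup: "S3" is not found, "s3" is.
    static boolean resolves(String typeId) {
        return WRITERS.containsKey(typeId);
    }

    public static void main(String[] args) {
        System.out.println(resolves("S3")); // false -> "Could not resolve type id 'S3'"
        System.out.println(resolves("s3")); // true
    }
}
```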
birinder tiwana
03/14/2024, 4:46 PM
Samarth Jain
03/15/2024, 11:06 PM
{
  "dimensions": ["dim"],
  "aggregations": [
    { "type": "longSum", "fieldName": "abc", "name": "abc" }
  ],
  "postAggregations": [
    { "type": "fieldAccess", "fieldName": "abc", "name": "value" }
  ],
  "intervals": ["2018-01-02T00:00:00.000Z/2024-03-12T23:59:59.999Z"],
  "limitSpec": { "type": "default", "limit": 2000001 },
  "queryType": "groupBy",
  "granularity": "day",
  "dataSource": "ds"
}
whereas the results of the below query do get cached:
{
  "dimensions": ["rejoiner_group"],
  "aggregations": [
    { "type": "longSum", "fieldName": "accounts_can_stream_cnt_7d", "name": "accounts_can_stream_cnt_7d" }
  ],
  "intervals": ["2018-01-02T00:00:00.000Z/2024-03-12T23:59:59.999Z"],
  "queryType": "groupBy",
  "granularity": "day",
  "dataSource": "productinsights_engagement"
}
The difference being that the first query has a post-aggregation and a limit.
ANANTHAN A
03/26/2024, 6:02 AM
Arun C
03/30/2024, 10:42 AM
Arun C
03/30/2024, 10:42 AM
Arun C
03/30/2024, 10:43 AM
Arun C
03/30/2024, 10:44 AM
Arun C
03/30/2024, 10:47 AM
Egor Ryashin
04/03/2024, 5:10 PM
Making ResultFormat open for extension would allow creating arbitrary ResultFormat.Writer implementations for export formats; for example, instead of CSV a user could specify PARQUET, XSL (and so on) by loading the relevant Druid extension.
INSERT INTO
EXTERN(
S3(bucket => 'your_bucket', prefix => 'prefix/to/files')
)
AS CSV
SELECT
<column>
FROM <table>
The problem is that ResultFormat is an enum right now. I propose that instead of using values() in ResultFormat.fromString(), there should be a Guice MapBinder that collects ResultFormat.Writer implementations from different extensions (as well as from the core lib).
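A MapBinder would essentially inject a map of (format name, writer factory) entries contributed by core and extensions. A plain-Java sketch of the lookup that would replace the enum's values() scan, with a HashMap standing in for the injected map (all names here are illustrative, not Druid's actual API):

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch of the extensible registry a Guice MapBinder would
// populate: each extension contributes (formatName -> writer factory)
// entries, and fromString() becomes a map lookup instead of an enum scan.
// All names are illustrative, not Druid's actual API.
public class ResultFormatRegistry {
    interface WriterFactory {
        String describe(); // stand-in for creating a ResultFormat.Writer
    }

    private final Map<String, WriterFactory> factories = new HashMap<>();

    // Core and extensions would each register a binding; with Guice this
    // would be MapBinder.newMapBinder(...).addBinding("parquet").to(...).
    public void register(String name, WriterFactory factory) {
        factories.put(name.toLowerCase(), factory);
    }

    // Replacement for ResultFormat.fromString(): unknown names fail clearly.
    public WriterFactory fromString(String name) {
        WriterFactory f = factories.get(name.toLowerCase());
        if (f == null) {
            throw new IllegalArgumentException("Unknown result format: " + name);
        }
        return f;
    }

    public static void main(String[] args) {
        ResultFormatRegistry registry = new ResultFormatRegistry();
        registry.register("csv", () -> "core CSV writer");
        registry.register("parquet", () -> "writer from a hypothetical Parquet extension");
        System.out.println(registry.fromString("PARQUET").describe());
    }
}
```

Normalizing the key case on both register and lookup would also sidestep case-sensitivity problems like the 'S3' vs s3 one earlier in the thread.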
I wonder if anyone is working on that right now or has some insight/plans?
Neeraj Kumar
04/08/2024, 10:15 AM
Hardik Bajaj
04/16/2024, 8:37 AM
statsd-emitter
Egor Ryashin
04/22/2024, 7:18 AM