Renato Santos
01/20/2023, 4:38 PM
Abdel
01/21/2023, 6:48 PM
| time | s_id | v1 | … | vn |
I have scaled the number of s_ids from 10 to 10,000 in a larger dataset. Queries are getting 500% slower, even for the same amount of accessed data and the same query output.
Could anyone provide some intuition about the reason for this slowdown? True, more data is stored in the system, but shouldn't the query latency be the same if its output is the same?
slack123
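One possible intuition for the slowdown question above (an assumption for illustration, not a diagnosis of this particular cluster): query cost typically scales with the rows examined, not the rows returned, so 1,000x more rows can mean 1,000x more scan work even when the filtered output is identical. A toy sketch:

```python
# Toy model (an illustrative assumption, not a Druid benchmark):
# a filter still has to examine every row of the scanned data unless
# an index prunes it, so work grows with table size, not output size.
def scan_filter(rows, s_id):
    examined = 0
    matches = []
    for row in rows:
        examined += 1
        if row["s_id"] == s_id:
            matches.append(row)
    return matches, examined

small = [{"s_id": i % 10, "v": i} for i in range(1_000)]          # 10 s_ids
large = [{"s_id": i % 10_000, "v": i} for i in range(1_000_000)]  # 10,000 s_ids

out_small, ex_small = scan_filter(small, 3)
out_large, ex_large = scan_filter(large, 3)
# Both queries return 100 rows, but the large scan examines 1,000x more rows.
```

In a real cluster the cause may instead be segment count, dictionary sizes, or index effectiveness, but the scan-versus-output distinction is the usual starting point.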
01/23/2023, 4:00 PM
Keren Meron
01/25/2023, 8:32 AM
makeGroupId (in screenshot).
A possible solution I thought of would be to have the two tasks under the same group, but unfortunately the groupId is not something I can control.
I would like to know:
1. Is there a reason against the Kafka and index tasks sharing the same lock in this case?
2. Is there a reason against allowing the groupId to be set in the ingestion task spec? (I thought of opening a PR for this.)
3. Is there another suggestion for how to overcome this issue?
Thank you!
Abdel
01/25/2023, 9:54 AM
tilak chowdary
01/25/2023, 6:30 PM
James Von Kaenel
01/25/2023, 9:30 PM
Michael Taranov
01/26/2023, 9:20 AM
spark_druid_connector?
I see the following branch on GitHub and a number of other open PRs, but nothing else recent:
https://github.com/apache/druid/tree/spark_druid_connector
My goal is simply to ETL data from Druid into other DBs/storage systems with the help of some tool (preferably Spark) 🙂
If you have any working solutions in production, I would really like to hear about them.
Hogan Chu
01/26/2023, 3:16 PM
The size result is not coming out close to the right value -> any advice on an approach?
tilak chowdary
01/26/2023, 9:32 PM
dimensions(_time, id, tags), metrics(duration)
{"timestamp": "2011-01-12T00:00:00.000Z", "id": "abc", "tags": ["t1","t2","t3"], "duration": 1} #row1
{"timestamp": "2011-01-12T00:00:00.000Z", "id": "abc", "tags": ["t3","t4","t5"], "duration": 2} #row2
{"timestamp": "2011-01-12T00:00:00.000Z", "id": "abc", "tags": ["t5","t6","t7"], "duration": 3} #row3
{"timestamp": "2011-01-12T00:00:00.000Z", "id": "abc", "tags": [], "duration": 4} #row4
We're expecting:
_time, id, tags (union), duration (max)
{"timestamp": "2011-01-12T00:00:00.000Z", "id": "abc", "tags": ["t1","t2","t3","t4","t5","t6","t7"], "duration": 4} # after
I was able to use SQL-based ingestion to replace the 4 rows with the one I expected. Is there a way to run this SQL ingestion as a scheduled task, like auto compaction?
The challenge with an external cron job is coordinating it with Druid's compaction.
Pramod Immaneni
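For reference, the roll-up semantics described in the question above (group by `_time` and `id`, union the `tags`, take the max `duration`) can be sketched as follows; this is an illustration of the expected result, not Druid's ingestion code:

```python
# Sketch of the desired roll-up: group by (_time, id), union tags, max duration.
rows = [
    {"_time": "2011-01-12T00:00:00.000Z", "id": "abc", "tags": ["t1", "t2", "t3"], "duration": 1},
    {"_time": "2011-01-12T00:00:00.000Z", "id": "abc", "tags": ["t3", "t4", "t5"], "duration": 2},
    {"_time": "2011-01-12T00:00:00.000Z", "id": "abc", "tags": ["t5", "t6", "t7"], "duration": 3},
    {"_time": "2011-01-12T00:00:00.000Z", "id": "abc", "tags": [], "duration": 4},
]

def rollup(rows):
    groups = {}
    for r in rows:
        key = (r["_time"], r["id"])
        g = groups.setdefault(key, {"tags": set(), "duration": 0})
        g["tags"].update(r["tags"])                       # union of tags
        g["duration"] = max(g["duration"], r["duration"])  # max duration
    return [
        {"_time": t, "id": i, "tags": sorted(g["tags"]), "duration": g["duration"]}
        for (t, i), g in groups.items()
    ]
```

The four input rows collapse into one row carrying the union of all tags and the maximum duration, matching the "after" row shown above.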
01/26/2023, 10:52 PM
Sergio Ferragut
01/27/2023, 4:24 PM
Vijay Narayanan
01/30/2023, 3:21 AM
D K
01/31/2023, 6:50 AM
Jose Robles
01/31/2023, 7:15 AM
Pranav
02/01/2023, 6:52 PM
Bastian
02/01/2023, 9:36 PM
If the druid.segmentCache.infoDir directory is deleted, what will happen to the existing cached segments? Will Historicals lose track of them? Will they stay on disk unless manually cleared?
Slackbot
02/02/2023, 2:58 PM
Niranjan Sridhara
02/02/2023, 5:04 PM
Bharat Thakur
02/03/2023, 4:29 AM
Bharat Thakur
02/03/2023, 4:29 AM
Vijay Narayanan
02/03/2023, 5:27 AM
Sergio Ferragut
02/03/2023, 10:33 PM
- The dimensions list will currently identify columns as strings.
- Nested Columns enable semi-dynamic schemas: the fields nested in the ingested object are automatically parsed into columns and given proper data types. But this only applies to nested columns.
So the question is, do you see value in extending the automatic schema detection with the right data type to all columns? What features do you think this should have?
Two use cases come to mind:
- new data / POC: just throw some data into Druid and query it
- schema evolution: by auto-detecting at every ingestion, there is no need to maintain ingestion specs as columns appear, disappear, and change type.
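The automatic type detection proposed above could look something like the following sketch, which infers a per-column type from sampled rows; this is my hypothetical illustration, not Druid's actual detection logic:

```python
# Hypothetical per-column type inference over sampled rows (an assumption
# for discussion, not Druid's implementation): a column is long if every
# value is an integer, double if every value is numeric, else string.
def infer_type(values):
    non_null = [v for v in values if v is not None]
    if not non_null:
        return "string"
    if all(type(v) is int for v in non_null):
        return "long"
    if all(type(v) in (int, float) for v in non_null):
        return "double"
    return "string"

def infer_schema(rows):
    # Columns may appear in only some rows; missing values count as null.
    columns = sorted({key for row in rows for key in row})
    return {c: infer_type([row.get(c) for row in rows]) for c in columns}
```

A real implementation would also have to decide how to widen a type when later ingestions see conflicting values (e.g. long followed by string), which is where the schema-evolution use case gets interesting.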
Any other use cases?
추호관
02/04/2023, 7:47 PM
Jamie Chapman-Brown
02/06/2023, 4:48 PM
Rishi Rana
02/06/2023, 7:44 PM
Sai Sharan Tangeda
02/07/2023, 7:56 AM
Ashok Kumar Ragupathi
02/07/2023, 10:23 AM
Jamie Chapman-Brown
02/07/2023, 7:55 PM
David Glasser
02/08/2023, 12:07 AM