Abhishek Agarwal
09/28/2022, 10:20 AM
Clint Wylie
09/29/2022, 10:14 AM
Cory Johannsen
10/03/2022, 10:26 PM
Cory Johannsen
10/03/2022, 10:46 PM
IndexTaskClient treats all 400 responses the same https://github.com/apache/druid/blob/master/indexing-service/src/main/java/org/apache/druid/indexing/common/IndexTaskClient.java#L373-L378
Abhishek Agarwal
10/04/2022, 6:00 AM
Cory Johannsen
10/10/2022, 11:45 PM
ChangeRequestHttpSyncer that performs sync operations with segments has no maximum failure limit, and will retry indefinitely to sync with a node. The issue I have been diagnosing is that the node no longer exists and will never come back; the underlying kubernetes pod is gone and the IP address is either no longer in use or recycled to another pod. I am assuming that the segment should have unannounced itself, but I'm wondering if that failed to occur. The end result is that a sync thread is running indefinitely attempting to talk to a host that will never return. In this case there is a NoRouteToHost that identifies that the host is gone:
Caused by: java.net.NoRouteToHostException: Host is unreachable
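The missing fail-safe described above can be sketched as a retry loop with a failure budget. This is a minimal, language-agnostic illustration in Python, not Druid's ChangeRequestHttpSyncer code; `sync_with_limit`, `SyncGaveUp`, `sync_once`, and every parameter name are hypothetical.

```python
import time


class SyncGaveUp(Exception):
    """Raised when the failure budget for one server is exhausted."""


def sync_with_limit(sync_once, max_failures=3, base_delay=1.0, sleep=time.sleep):
    """Call sync_once() until it succeeds or max_failures attempts in a row fail.

    sync_once is a hypothetical callable standing in for one sync request.
    It returns a result on success and raises OSError on a network failure
    such as an unreachable host.
    """
    failures = 0
    while True:
        try:
            return sync_once()
        except OSError as exc:
            failures += 1
            if failures >= max_failures:
                # Stop retrying so the calling thread is released.
                raise SyncGaveUp(
                    f"giving up after {failures} consecutive failures"
                ) from exc
            # Exponential backoff between attempts: 1x, 2x, 4x, ...
            sleep(base_delay * 2 ** (failures - 1))
```

With a budget like this, a caller that catches `SyncGaveUp` can drop the unreachable server instead of holding a thread for it forever. What the right limit is, and what should happen to the server's state afterwards, are open questions this sketch does not answer.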
With enough pod churn, these stuck sync threads eventually consume all the threads in the coordinator HTTP client pool and the coordinator stops responding to requests, so without a fail-safe this seems inevitable.
kfaraz
10/25/2022, 7:49 AM
Didip Kerabat
10/29/2022, 4:11 PM
Didip Kerabat
11/03/2022, 3:08 PM
Amatya Avadhanula
11/03/2022, 4:40 PM
Amatya Avadhanula
11/05/2022, 3:46 AM
Samarth Jain
11/16/2022, 10:31 PM
VectorValueSelector interface, it seems like only primitive columns are supported.
Abhishek Agarwal
11/23/2022, 11:15 AM
contrib extensions experimental? Reviewing this PR https://github.com/apache/druid/pull/13348/files#diff-f4ca21313f7523b8be9efd16ea196bd1426a143fdbf9422d3560643ff69e275f right now that lists experimental features. I think that we need not list the contrib extensions here at all.
Didip Kerabat
11/28/2022, 7:53 PM
Gian Merlino
12/06/2022, 8:26 AM
Michael Schiff
12/07/2022, 7:17 PM
Gian Merlino
12/09/2022, 8:11 PM
Maytas Monsereenusorn
12/15/2022, 9:24 AM
transformSpec or a flattenSpec which creates a field that has a different name which does not exist in your parquet schema and you do not include the base field (that exists in the parquet schema) in the dimensionsSpec. For example, your parquet file has column country, then you add a transformSpec that does something like "transforms": [ { "type": "expression", "name": "countryUpper", "expression": "upper(country)" } ]. In the parseSpec->dimensionsSpec->dimensions, you only have countryUpper (as you don’t want the original country field in your Druid datasource). The problem is that the DruidParquetReadSupport will not read in the country field (as country is not in the dimensionsSpec). As a result, you will not get countryUpper in your Druid datasource as the Druid (map/reduce) job did not read in the country field.
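The behavior described above can be reproduced with a toy model. This is a minimal Python sketch that assumes the reader projects only the columns named in the dimension list. It is not DruidParquetReadSupport code, and `read_columns`, `apply_transforms`, and `ingest` are hypothetical helper names.

```python
def read_columns(parquet_row, dimensions):
    """Toy projection: keep only the columns named in the dimension list."""
    return {name: parquet_row[name] for name in dimensions if name in parquet_row}


def apply_transforms(row, transforms):
    """Apply each (output_name, input_name, function) transform to the row."""
    out = dict(row)
    for output_name, input_name, func in transforms:
        value = row.get(input_name)
        # A missing input yields a null output, mirroring the lost column.
        out[output_name] = func(value) if value is not None else None
    return out


def ingest(parquet_row, dimensions, transforms):
    """Project first, then transform, then emit only the listed dimensions."""
    projected = read_columns(parquet_row, dimensions)
    transformed = apply_transforms(projected, transforms)
    return {name: transformed.get(name) for name in dimensions}


row = {"country": "fr", "city": "Paris"}
transforms = [("countryUpper", "country", str.upper)]

# Only the derived field is listed: country is never read, so the transform has no input.
only_derived = ingest(row, ["countryUpper"], transforms)

# The base field is listed too: the transform now sees its input.
with_base = ingest(row, ["country", "countryUpper"], transforms)
```

In this model `only_derived` ends up with a null `countryUpper`, while `with_base` carries both fields. Listing the base field makes the derived value appear, but it also puts the base column in the output, which is what the example above was trying to avoid.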
(An exception to this is when parseSpec is ParquetParseSpec and flattenSpec is not null… so for the example above, let's assume that this is not the case, i.e. we are using timeAndDims)
Clint Wylie
12/16/2022, 6:05 AM
frontCoded starting on the highlighted day (well had a false start on 2022-11-10 so the dip there is due to a handful of the segments also being frontCoded)
Rishi Rana
12/27/2022, 7:08 PM
Maytas Monsereenusorn
01/02/2023, 11:03 AM
Cory Johannsen
01/11/2023, 2:07 AM
druid-kubernetes-extensions that causes the coordinator that is the leader to become non-responsive over time. The root of the issue is that when an indexer node shuts down and un-announces itself, the ChangeRequestHttpSyncer configured to sync the workers (at druid-internal/v1/worker) shuts down cleanly, but the ChangeRequestHttpSyncer configured to sync the segments (at druid-internal/v1/segments) can be left running. I have isolated the behavior to the order of event processing at indexer shutdown. If the PEON node shutdown is processed after the INDEXER node shutdown, then the system cleans up the syncers correctly. If the PEON node shutdown is processed before the INDEXER node shutdown, it always generates an NPE and this appears to prevent the segment syncer from getting shut down.
The net result is that as the indexer pods are replaced, the syncers that never shut down accumulate in memory until the thread pool is exhausted and the coordinator stops accepting HTTP requests from the other nodes in the cluster.
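One way to make this kind of cleanup independent of event order is to have the removal handler tolerate state that another role's event has already removed. The following is a toy Python model of that idea only. It does not reproduce the coordinator's actual code or the cause of the NPE, and `Syncer`, `NodeRegistry`, and their methods are hypothetical names.

```python
class Syncer:
    """Stand-in for a long-running per-host sync task."""

    def __init__(self, name):
        self.name = name
        self.running = True

    def stop(self):
        self.running = False


class NodeRegistry:
    """Tracks announced roles per host and one segment syncer per host."""

    def __init__(self):
        self.roles_by_host = {}    # host -> set of announced roles
        self.segment_syncers = {}  # host -> Syncer

    def node_added(self, host, role):
        self.roles_by_host.setdefault(host, set()).add(role)
        if host not in self.segment_syncers:
            self.segment_syncers[host] = Syncer(f"segments@{host}")

    def node_removed(self, host, role):
        roles = self.roles_by_host.get(host)
        if roles is not None:
            roles.discard(role)
            if roles:
                return  # another role on this host is still announced
            del self.roles_by_host[host]
        # Look up with a default instead of assuming the entry exists, so a
        # removal arriving in an unexpected order cannot fail halfway through.
        syncer = self.segment_syncers.pop(host, None)
        if syncer is not None:
            syncer.stop()
```

In this model the syncer for a host is stopped once the last role on that host is removed, whichever role's removal arrives first.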
I have captured a log from the coordinator that shows the indexer cleanly starting up and shutting down two times, and then on the third time it processes the PEON before the INDEXER and produces the bug. I'll attach the annotated log in a thread.
Shilpa Sivanesan
01/11/2023, 6:29 AM
[KinesisRecordSupplier-Worker-13] org.apache.druid.indexing.kinesis.KinesisRecordSupplier - unknown getRecordRunnable exception, will not retry
Should we increase the heap size of the middle manager / indexing task? But since the source is the same, the number of events between Tranquility and Kinesis is the same.
Would anyone know about this issue with Kinesis ingestion?
Cory Johannsen
01/12/2023, 6:04 PM
Cory Johannsen
01/12/2023, 9:54 PM
libsigar and am curious if anyone has a good solution
Xavier
01/13/2023, 6:17 PM
Eyal Yurman
01/19/2023, 10:06 PM
Jason Koch
01/25/2023, 7:17 PM
StringDimensionIndexer will no longer need to track size, and -- does this mean that the code for estimateEncodedKeyComponentSize here will go away? 🧵
Shyam
01/27/2023, 12:14 AM
druid.indexer.tasklock.forceTimeChunkLock come out of its experimental status? I wonder if there is an effort towards that.
Abhishek Agarwal
01/27/2023, 4:59 AM