# troubleshooting
s
It seems to be complaining about a missing segment file in the task's temporary work area. In principle, a kill task only affects deep storage, so something is off. I don't think it is retention related. Then again, it could have happened when the input phase was trying to read that segment from deep storage... are you selecting from Druid tables?
t
Yes, it is from the Druid tables. After debugging it further, it looks like the segment it is complaining about was never actually ingested. Let's say I select one day of data from Druid tables and there was no ingestion between 08:00 and 09:00: MSQ then fails because it cannot access that one-hour segment. Looks like it is a bug.
s
Oh… interesting I will try to reproduce. Thanks for explaining.
@tilak chowdary what version were you on? I tried to reproduce this on 26.0 by ingesting this data:
REPLACE INTO "msq-reindex-failure" OVERWRITE ALL
WITH "ext" AS (SELECT *
FROM TABLE(
  EXTERN(
    '{"type": "inline","data": "{\"timestamp\":\"2023-01-01 08:00:00\",\"col1\":1}\n{\"timestamp\":\"2023-01-01 09:00:00\",\"col1\":1}\n{\"timestamp\":\"2023-01-01 10:00:00\",\"col1\":1}\n{\"timestamp\":\"2023-01-01 12:00:00\",\"col1\":1}\n{\"timestamp\":\"2023-01-01 13:00:00\",\"col1\":1}\n{\"timestamp\":\"2023-01-01 15:00:00\",\"col1\":1}\n{\"timestamp\":\"2023-01-01 17:00:00\",\"col1\":1}\n{\"timestamp\":\"2023-01-01 18:00:00\",\"col1\":1}\n{\"timestamp\":\"2023-01-01 19:00:00\",\"col1\":1}\n{\"timestamp\":\"2023-01-01 20:00:00\",\"col1\":1}"}',
    '{"type":"json"}',
    '[{"name":"timestamp","type":"string"},{"name":"col1","type":"long"}]'
  )
))
SELECT
  TIME_PARSE("timestamp") AS "__time",
  "col1"
FROM "ext"
PARTITIONED BY HOUR
which has gaps in the hours 11, 14, 16 and reindexed with:
REPLACE INTO "msq-reindex-failure-target" OVERWRITE ALL
SELECT *
FROM "msq-reindex-failure"
WHERE __time BETWEEN '2023-01-01' AND '2023-01-02'
PARTITIONED BY HOUR
but it succeeds. Is there an aspect of what you did that I am missing?
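As a quick sanity check on the test data, here is a minimal Python sketch that lists the hour buckets with no rows between the first and last timestamp. The timestamps are copied from the inline EXTERN data above; the helper name `missing_hours` is hypothetical, and the check runs locally without touching Druid.

```python
from datetime import datetime, timedelta

# Timestamps copied from the inline EXTERN data in the first REPLACE statement.
timestamps = [
    "2023-01-01 08:00:00", "2023-01-01 09:00:00", "2023-01-01 10:00:00",
    "2023-01-01 12:00:00", "2023-01-01 13:00:00", "2023-01-01 15:00:00",
    "2023-01-01 17:00:00", "2023-01-01 18:00:00", "2023-01-01 19:00:00",
    "2023-01-01 20:00:00",
]


def missing_hours(stamps):
    """Return the hour buckets between the first and last timestamp that hold no rows."""
    present = {
        datetime.strptime(s, "%Y-%m-%d %H:%M:%S").replace(minute=0, second=0)
        for s in stamps
    }
    current, last = min(present), max(present)
    gaps = []
    while current < last:
        if current not in present:
            gaps.append(current)
        current += timedelta(hours=1)
    return gaps


gap_hours = [g.hour for g in missing_hours(timestamps)]
print(gap_hours)  # [11, 14, 16]
```

With PARTITIONED BY HOUR, each of those empty buckets is an hour for which no segment is written.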
t
Thanks @Sergio Ferragut for trying it out. These are our segments in S3 for 2023-05-21. The segment for 2023-05-21T01:00:00.000Z_2023-05-21T02:00:00.000Z is missing, and MSQ fails on that segment when it runs. We are on 25.0. I couldn't reproduce it with your example.
The metadata has an entry for that hour with 0 rows.
@Sergio Ferragut I was able to reproduce with these steps.
Step 1)
REPLACE INTO "msq-reindex-failure-2" OVERWRITE ALL
WITH "ext" AS (SELECT *
FROM TABLE(
  EXTERN(
    '{"type": "inline","data": "{\"timestamp\":\"2023-01-01 12:00:00\",\"tags\":[\"t1\",\"t2\"], \"id\":\"id1\"}\n{\"timestamp\":\"2023-01-01 13:00:00\",\"tags\":[\"t2\",\"t3\"], \"id\":\"id1\"}"}',
    '{"type":"json"}',
    '[{"name":"timestamp","type":"string"},{"name":"tags","type":"string"},{"name":"id","type":"string"}]'
  )
))
SELECT
  TIME_PARSE("timestamp") AS "__time",
  "tags",
  "id"
FROM "ext"
PARTITIONED BY HOUR
Step 2)
REPLACE INTO "msq-reindex-failure-2" OVERWRITE ALL
SELECT
  max(__time) __time,
  ARRAY_AGG(DISTINCT tags, 1000) as tags,
  id
FROM "msq-reindex-failure-2"
GROUP BY id
PARTITIONED BY HOUR
Step 3) Run step 2 again:
REPLACE INTO "msq-reindex-failure-2" OVERWRITE ALL
SELECT
  max(__time) __time,
  ARRAY_AGG(DISTINCT tags, 1000) as tags,
  id
FROM "msq-reindex-failure-2"
GROUP BY id
PARTITIONED BY HOUR
s
Thanks! I will try this...
t
@Sergio Ferragut this PR fixed it: https://github.com/apache/druid/pull/14342
k
A potential race in MSQ-based ingestion can happen as follows:
• t0: segmentA is used and is fetched by the controller
• t1: segmentA matches the drop rule and is marked used==false by the coordinator
• t2: segmentA is removed by the kill task
• t3: a worker downloads segmentA after spawning
Generally the delta between t1 and t2 is minutes or hours, whereas the delta between t0 and t3 is seconds. So there is a race, but we can live with it for now 🙂
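To make the ordering concrete, here is a toy Python model of the t0 to t3 timeline. It is only a sketch for reasoning about the race, not Druid code: the `Timeline` class, its field names, and `worker_hits_missing_segment` are all hypothetical, and times are plain seconds from an arbitrary start.

```python
from dataclasses import dataclass


@dataclass
class Timeline:
    """Event times in seconds; all names here are illustrative, not Druid APIs."""
    controller_fetch: float  # t0: controller sees segmentA as used
    marked_unused: float     # t1: coordinator marks segmentA used==false
    killed: float            # t2: kill task removes segmentA from deep storage
    worker_download: float   # t3: worker tries to download segmentA


def worker_hits_missing_segment(t: Timeline) -> bool:
    """True when the controller planned with segmentA but it was killed before the worker fetched it."""
    controller_saw_used = t.controller_fetch < t.marked_unused
    return controller_saw_used and t.killed <= t.worker_download


# Typical case: the kill happens an hour after the drop, the worker starts within seconds.
typical = Timeline(controller_fetch=0, marked_unused=5, killed=3600, worker_download=10)

# Racy case: drop and kill both land between the controller fetch and the worker download.
racy = Timeline(controller_fetch=0, marked_unused=1, killed=2, worker_download=10)

print(worker_hits_missing_segment(typical))  # False
print(worker_hits_missing_segment(racy))     # True
```

The failure only shows up when t2 falls inside the short window between t0 and t3, which matches the observation that the race is real but rare.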