This message was deleted.
# general
s
This message was deleted.
s
26.0 introduces an actual Array data type, so I assume you mean a multi value string column. You can find all the query context options here. The one that you are looking for is
groupByEnableMultiValueUnnesting
which is part of the group by query context options found here. I tested it on 25.0 and it is working, it even respects your choice of MV behavior at ingestion, I tested with ARRAY to see that it respected the order of the elements, and distinguished between different orders when grouping: inline data:
Copy code
{"mv_column":["a","b","c"],"metric":1}
{"mv_column":["d","a","b"],"metric":3}
{"mv_column":["b","a","d"],"metric":5}
query context:
Copy code
{
  "groupByEnableMultiValueUnnesting": "false"
}
SQL:
Copy code
SELECT "mv_column", count(*) FROM "inline_data" GROUP BY 1
Results:
Wait, what!? I tried another SQL:
Copy code
SELECT MV_TO_ARRAY("mv_column"), count(*) FROM "inline_data" GROUP BY 1
Now it works:
so I wondered if the query context was doing anything, I removed it and the result was still correct, same as above. In 26.0 with Array data type and UNNEST functions, this gets more natural IMHO.
also tried to filter with:
Copy code
SELECT MV_TO_ARRAY("mv_column"), SUM("metric"), count(*) 
FROM "inline_data"  
WHERE MV_TO_STRING("mv_column",100) like '%c%'
GROUP BY 1
and this:
Copy code
SELECT MV_TO_ARRAY("mv_column"), SUM("metric"), count(*) 
FROM "inline_data"  
WHERE "mv_column" like '%c%'
GROUP BY 1
and both worked. After rereading your question I see that what you want is that only the item that matches be in the results, not the other items in the row that matched. This will at least give you the right number of rows, even though it still shows the other values in the MV column:
Copy code
SELECT MV_TO_ARRAY("mv_column"), SUM("metric"), count(*) 
FROM "inline_data"  
WHERE "mv_column" like '%c%'
GROUP BY 1
I tested unnest on 26.0-SNAPSHOT build where it is still experimental but the results are what I think you are expecting:
Copy code
SELECT n."mv_item", SUM("metric"), count(*) 
FROM "inline_data", 
     UNNEST("mv_column") AS n("mv_item")  
WHERE n."mv_item" like '%d%'
GROUP BY 1
🙌 1
v
groupByEnableMultiValueUnnesting
is only for (native) groupBy, the first query you, @Sergio Ferragut, wrote planned to a
topN
this flag is used in MSQ ingestion (to make sure that your ingested data does not expload)
at query time, as you have discovered, the correct thing to do is to wrap you MV value in an
MV_TO_ARRAY(...)
b
If you're using druid SQL, there's also a new (I think) function,
MV_FILTER_ONLY
. (See here._)
v
that's a good note but that would only let you express an
IN
filter, not a
LIKE
filter
✅ 1
b
@Sergio Ferragut when should we expect release of 26.0 version
Waiting for window function
s
I hear it will be soon. You can follow it’s progress at https://github.com/apache/druid/milestone/50
r
sorry, I was busy for the rest of the day and didn't came back for the answers I was indeed only expecting the matching values in the results, when I tried using a other query, the results where the same (I think there apache calcite just rewrote it removing the subquery), I just ended filtering in the spreadsheet after downloading