# troubleshooting
Off the top of my head I'm not sure this can be done with SQL right now, at least.
That said, I think it would be possible to write a 'native' query that could do it using the undocumented `expression` aggregator, which is what makes things like `ARRAY_AGG` work behind the scenes, and is able to model most aggregators using native Druid expressions.
Well, I'm not sure there is a clean way to make an 'empty' JSON object, so that might be a bit rough too. The `initialValue` would need to be a JSON object... I forget whether I let `json_object` take zero arguments or not; ideally it would work though.
Oh neat, just tested: zero arguments does work.
OK, so `initialValue` would be `json_object()`, I think. Then the fold expression would be something like the native `fold` expression (https://druid.apache.org/docs/latest/misc/math-expr.html#apply-functions) to walk over `json_keys(clicks_by_country)`.
Oh wait, I just noticed that `object_id` is not inside the nested column, so maybe this won't work yet either.
Well, I guess you could group by `object_id` and use the expression aggregator to combine the clicks by country; that might be approximately the right thing.
The engine cannot currently handle grouping on JSON blobs directly.
Oh, actually I think `json_value` only works with constant path arguments right now, so my idea won't work.
@Clint Wylie yup, I was working with native queries anyway, and what I tried so far was:
1. Use `JSON_PATH` as an expression within a virtual column `paths`. If you define the output type as `STRING`, you get N results (where N is the number of keys in the given JSON).
2. Apply `json_value` with a value from the `paths` column.

Example here, DATA IN:
```json
{
  "time": "2023-06-05T16:30:00",
  "object_id": 1,
  "clicks_by_country": {
    "US": 200,
    "CA": 400,
    "JP": 700
  }
}
```
OUTPUT AFTER 1 (THIS WORKS):
```
__time                object_id  paths
2023-06-05T16:30:00   1          $.US
2023-06-05T16:30:00   1          $.CA
2023-06-05T16:30:00   1          $.JP
```
EXPECTED OUTPUT AFTER 2 (THIS IS NOT WORKING):
```
__time                object_id  paths  values
2023-06-05T16:30:00   1          $.US   200
2023-06-05T16:30:00   1          $.CA   400
2023-06-05T16:30:00   1          $.JP   700
```
In my head that makes sense and should work, although AFAIK you cannot use a virtual column's value at the same level where you defined it, so I am unable to reach it in one query. When I want to introduce another layer of group-by into that query (no biggie, we do this every day), I cannot push the JSON anywhere higher; it's only available in the lowest layer of the query.
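For reference, roughly what the two virtual columns look like in native spec form. This is a hedged sketch, not a verified query: it assumes the native `json_paths` function and the column names from the example data above:

```json
"virtualColumns": [
  {
    "type": "expression",
    "name": "paths",
    "expression": "json_paths(\"clicks_by_country\")",
    "outputType": "STRING"
  },
  {
    "type": "expression",
    "name": "values",
    "expression": "json_value(\"clicks_by_country\", \"paths\")",
    "outputType": "LONG"
  }
]
```

The second column is the part that fails, since `json_value` currently only accepts a constant path argument.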
Bumping here!
@Clint Wylie, I was able to get partway here. This is logically what I thought would work:
```sql
with tmp as (
  select distinct string_to_mv(mv_to_string(json_keys(clicks_by_country, '$.'), ','), ',') country
    from inline_data
   group by 1
)
select tmp.country, json_value("clicks_by_country", '$.' || tmp.country) val
  from inline_data
 cross join tmp
```
In the above SQL:
- the contents of the CTE will work by themselves and return a single-column list of countries from the source data
- I can also run the main query if I use a simple CTE, e.g.:
```sql
with tmp as (
  select 'US' country
)
select tmp.country, json_value("clicks_by_country", '$.' || tmp.country) val
  from inline_data
 cross join tmp
```
... which works. But my first SQL statement above, as-is, generates a "query unsupported" error. I then tried tricking it by cross-joining the rows first, before doing the JSON extract on clicks:
```sql
with tmp_keys as (
  select distinct string_to_mv(mv_to_string(json_keys(clicks_by_country, '$.'), ','), ',') country
    from inline_data
   group by 1
), tmp as (
  select k.country, d.clicks_by_country
    from inline_data d cross join tmp_keys k
)
select country, json_value("clicks_by_country", '$.' || country) val
  from tmp
```
But it doesn't like that either. It will work with `select * from tmp` as the final query; the `json_value()` function just doesn't seem to like operating on a record set. Given the above tests and logic, can you think of a variation that will work? Thanks, John
The solution here was to use the `unnest` datasource with the data transformed into an array:
```json
{
  "time": "2023-06-05T16:30:00",
  "object_id": 1,
  "clicks_by_country": [
    {"country": "US", "clicks": 200},
    {"country": "CA", "clicks": 400},
    {"country": "JP", "clicks": 700}
  ]
}
```
The query looks like:
```json
{
  "queryType": "groupBy",
  "dataSource": {
    "type": "unnest",
    "base": {
      "type": "table",
      "name": "test_complex_data_unpacking"
    },
    "virtualColumn": {
      "type": "expression",
      "name": "impressions_by_country_flat",
      "expression": "\"impressions_by_country\"",
      "outputType": "COMPLEX"
    }
  },
  "granularity": "all",
  "intervals": "0000/3000",
  "dimensions": [
    "related_object_id",
    "v0",
    "v1"
  ],
  "virtualColumns": [
    {
      "type": "nested-field",
      "columnName": "impressions_by_country_flat",
      "outputName": "v0",
      "expectedType": "STRING",
      "path": "$.country"
    },
    {
      "type": "nested-field",
      "columnName": "impressions_by_country_flat",
      "outputName": "v1",
      "expectedType": "LONG",
      "path": "$.impressions"
    }
  ]
}
```
I am not sure how to approach multiple columns that might require unnesting. Still digging.
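One possibility for multiple array columns, since the `base` of an `unnest` datasource can itself be any datasource: chain two unnests. This is an untested sketch that assumes a hypothetical second array column `clicks_by_country`; note that chained unnests produce a cross product of the two arrays per row, which may or may not be what you want:

```json
{
  "type": "unnest",
  "base": {
    "type": "unnest",
    "base": {"type": "table", "name": "test_complex_data_unpacking"},
    "virtualColumn": {
      "type": "expression",
      "name": "impressions_by_country_flat",
      "expression": "\"impressions_by_country\"",
      "outputType": "COMPLEX"
    }
  },
  "virtualColumn": {
    "type": "expression",
    "name": "clicks_by_country_flat",
    "expression": "\"clicks_by_country\"",
    "outputType": "COMPLEX"
  }
}
```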