With respect to complex data types[1] (not yet ava...
# general
e
With respect to complex data types[1] (not yet available, not until 0.8 AFAIU), what would happen in the given example if there were an additional complex data type in the input object. Say that the example input object also had a
friends
array (arbitrary example) at the same level as
group
. I noted that the docs say this about the unnesting example (emphasis mine):
For instance, the example in the beginning will transform into two rows with this configuration example.
Are these then distinct rows in the table, or some kind of special nested rows associated with a top-level row? And if there were another nested/complex field, would that result in a combinatorial explosion in terms of number of rows? A related question: how does this approach work with conjunctions? In the example given where the topics are
paddling
and
hiking
, can a query be written to answer the question “which meetup events had talks about both paddling and hiking?” [1] https://docs.pinot.apache.org/basics/data-import/complex-type#handle-the-complex-type-with-ingestion-configurations
k
you can either unnest the complex fields (this will turn one row into multiple records in Pinot) or store them as JSON (this will be stored as one row). This will allow you to query them using json udf and index using json index
e
Thanks @User 🙏
k
regarding conjunction query - if you flatten then you lose the relationship once you flatten one row into multiple rows - so the conjunction query will require window function which will be expensive But if you store it as JSON, then you will be able to query using JSON_MATCH
our recommendation it to store it as JSON and use json udf and json_match
👍 1
e
and how is the performance of JSON search?
k
we will shortly enhance the storage to store JSON in columnar format
JSON search is very fast
🙌 1
cons of json today - storage is not as efficient as it could be. We plan to convert json into a columnar representation
e
So the context of nested objects/array of json objects is maintained? From the example here[1] would the following conjunction (using json) be truthy? • addresses.country: us AND addresses.number: 3 Maybe it’s a matter of my lack of syntax knowledge but the examples show the use of asterisk for array querying, and I wonder how one could query multiple fields within the same element of a json array. [1] https://docs.pinot.apache.org/basics/indexing/json-index
k
@User ^^
The context is maintained
If the leaf is an array of primitives, you can treat it as an multi value field
May be, we can take sample json and come up with queries

https://youtu.be/CnSnLKQLuXc

👀 1
This video from Jackie will give you more information
🙏 1
e
that’s awesome if the context is maintained! A sample query would be great. Let’s use the example in the docs[1]:
Copy code
{
 "name": "adam",
 "age": 30,
 "country": "us",
 "addresses":
 [
  {
   "number" : 112,
   "street" : "main st",
   "country" : "us"
  },
  {
   "number" : 2,
   "street" : "second st",
   "country" : "us"
  },
  {
   "number" : 3,
   "street" : "third st",
   "country" : "ca"
  }
 ]
}
How would one write a query to select rows that had an address on
main st
but only if that address was in
ca
? Would the below query return the
adam
row?
Copy code
SELECT ... FROM mytable WHERE JSON_MATCH(person, '"$.addresses[*].country"=''ca'' AND "$.addresses[*].street"=''main st'' ')
Especially thinking of the fields like multi-value fields, I would guess that it would return the
adam
row:
Copy code
{
  "name": "adam",
  "age": 30,
  "country": "us",
  "addresses.number": [
    112,
    2,
    3
  ],
  "addresses.street": [
    "main st",
    "second st",
    "third st"
  ],
  "addresses.country": [
    "ca",
    "us"
  ]
}
[1] https://docs.pinot.apache.org/basics/indexing/json-index
k
That’s rt.. that query works
And gets you what’s expected
j
Very good question. With json index, this query should not return
adam
because
main st
and
ca
are not from the same element
e
ah ok @User, so the
*
(asterisk) does maintain the context? That’s both surprising and great to learn!
j
If you don't want to maintain the context, then you may split it into 2
JSON_MATCH
clause. Basically in a single
JSON_MATCH
, the context is maintained
e
awesome 🙌 It’s great to have the flexibility to choose with/without context. Thanks all for the answers!
ya, this is really great. The video link of your presentation @User is also very helpful. Glad you added this feature!
k
@User this is an excellent scenario and we should add this to the docs as example queries
e
Ya definitely having examples which show boolean logic within a single JSON_MATCH vs. multiple JSON_MATCH clauses will really showcase the flexibility 🚀
👍 1