# complex-type-support
  • y

    Yupeng Fu

    04/20/2021, 8:54 PM
    currently i always flatten the struct
  • y

    Yupeng Fu

    04/20/2021, 8:54 PM
    we may consider a new rule to json serialize the struct
  • j

    Jackie

    04/20/2021, 8:56 PM
    You don't need to serialize the struct in the newly added module. It can be a Map or List
  • j

    Jackie

    04/20/2021, 8:57 PM
    In the rule, configure the columns to extract and the columns to flatten. If an extracted column is a primitive type, leave it as is; if it is a complex type, convert it to a Map or List
  • y

    Yupeng Fu

    04/20/2021, 8:59 PM
    okay, i think the default behavior differs between yours and mine
  • y

    Yupeng Fu

    04/20/2021, 9:00 PM
    in mine, primitive types take precedence over json type
  • y

    Yupeng Fu

    04/20/2021, 9:00 PM
    so i always try to flatten it until i cannot, e.g. array
  • y

    Yupeng Fu

    04/20/2021, 9:00 PM
    so perf is preferred
  • j

    Jackie

    04/20/2021, 9:01 PM
    Well, I think the only difference here is to make the flattened columns configurable
  • j

    Jackie

    04/20/2021, 9:01 PM
    We can start with always flattening all complex types
  • j

    Jackie

    04/20/2021, 9:02 PM
    Then everything will be extracted into primitive type, and you don't need the json index feature
    ➕ 1
  • y

    Yupeng Fu

    04/20/2021, 9:07 PM
    right, that's the behavior in the proposal
  • y

    Yupeng Fu

    04/20/2021, 9:07 PM
    i think we are on the same page
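A minimal sketch of the two behaviors discussed above, written in Python for illustration only. It is not the code from the Pinot transformer; the names `flatten`, `transform`, and `columns_to_flatten` are hypothetical. It assumes structs arrive as nested dicts, that flattened columns are named with a dot separator, and that arrays are where flattening stops.

```python
def flatten(record, prefix="", sep="."):
    """Flatten nested dicts into dotted column names; stop at lists."""
    out = {}
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # A struct: keep descending until only primitives or lists remain.
            out.update(flatten(value, name, sep))
        else:
            # Primitives and lists are kept as they are.
            out[name] = value
    return out


def transform(record, columns_to_flatten=None):
    """Flatten only the configured top-level columns (None means all of them)."""
    out = {}
    for key, value in record.items():
        selected = columns_to_flatten is None or key in columns_to_flatten
        if isinstance(value, dict) and selected:
            out.update(flatten(value, key))
        else:
            # Not selected for flattening: the column stays a Map or List.
            out[key] = value
    return out


row = {"id": 1, "person": {"name": "a", "address": {"city": "x"}, "tags": ["p", "q"]}}
print(flatten(row))
print(transform(row, columns_to_flatten=set()))
```

With `columns_to_flatten=None` the sketch always flattens every struct, which is the starting point agreed on in the thread; passing an explicit set corresponds to making the flattened columns configurable.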
  • y

    Yupeng Fu

    04/27/2021, 12:00 AM
    hey, folks, i have a PR to implement the transformer proposed in the design. https://github.com/apache/incubator-pinot/pull/6845 there will be followup PRs on other surrounding components
    👍 2
  • y

    Yupeng Fu

    04/27/2021, 12:08 AM
    plz take a look
  • k

    Kishore G

    04/29/2021, 8:29 PM
    @User did we summarize the notes from our zoom call the other day?
  • a

    Amrish Lal

    04/29/2021, 10:54 PM
    Also, there was a fair amount of discussion on complex type (STRUCT/MAP/LIST) as well, so here is a summary related to that discussion:
    Adding support for JSON has commonalities with adding support for complex data types (STRUCT, LIST, MAP). The key difference is that JSON will not enforce schema validation (in keeping with the JSON standard), whereas STRUCT/LIST/MAP will support schema validation. A table could be defined with both a JSON column and a STRUCT/LIST/MAP column. For example:
     nestedColumn1 JSON,
     nestedColumn2 STRUCT (name : STRING, age : INT, salary : INT, addresses : LIST (STRUCT (apt : INT, street : STRING, city : STRING, zip : INT)))
    
    The implementation steps that we describe under "Near Term Enhancements" are common to supporting both JSON and complex data types (STRUCT, LIST, MAP). Both JSON and STRUCT/LIST/MAP columns:
    -would be stored as text,
    -would use JsonIndex for fast filtering (with additional support for multidimensional arrays),
    -would be queried via the new dot/array based syntax as proposed in "Language enhancements"
    
    In the long term it is quite possible that these data types share common hierarchical indexing functionality and storage mechanisms while providing JSON specific semantics with JSON column type and a more well-defined schema and type checking semantics with STRUCT/LIST/MAP type.
    👍 1
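To make the JSON versus STRUCT/LIST/MAP distinction in the summary concrete, here is a rough Python sketch of what schema validation for the `nestedColumn2` example could look like. The `SCHEMA` layout and the `conforms` helper are assumptions made for illustration; they are not Pinot APIs, and the summary does not say how validation would be implemented.

```python
# Hypothetical in-memory form of the STRUCT from the summary:
# a dict is a STRUCT, a one-element list is a LIST of that element type.
SCHEMA = {
    "name": str,
    "age": int,
    "salary": int,
    "addresses": [{"apt": int, "street": str, "city": str, "zip": int}],
}


def conforms(value, schema):
    """Return True if value matches the schema shape and leaf types."""
    if isinstance(schema, dict):
        return (
            isinstance(value, dict)
            and value.keys() == schema.keys()
            and all(conforms(value[k], s) for k, s in schema.items())
        )
    if isinstance(schema, list):
        return isinstance(value, list) and all(conforms(v, schema[0]) for v in value)
    return isinstance(value, schema)


good = {
    "name": "daffy duck",
    "age": 30,
    "salary": 100,
    "addresses": [{"apt": 1, "street": "s", "city": "c", "zip": 12345}],
}
bad = dict(good, age="thirty")

print(conforms(good, SCHEMA))
print(conforms(bad, SCHEMA))
```

Under the proposal, a JSON column would accept both `good` and `bad`, while a STRUCT column with this schema would reject `bad`.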
  • a

    Amrish Lal

    04/29/2021, 11:53 PM
    @User If you are moving forward with long term storage aspects of JSON, would definitely like to discuss that further with you. cc: @User
    ➕ 1
  • k

    Kishore G

    04/30/2021, 12:05 AM
    yes, will write it up
    👍 1
  • a

    Amrish Lal

    05/01/2021, 5:30 PM
    @User in continuation of what we discussed earlier, this is the existing query:
    select jsoncolumn,json_extract_scalar(jsoncolumn, '$.person.companies[*].name', 'STRING') from jsontable where id = 106
    which produces the results:
    {"person":{"name":"daffy duck","companies":[{"name":"n1","title":"t1"},{"name":"n2","title":"t2"}]}}, 	["n1","n2"]
    What we would like to do is have that query generated by rewriting this simpler user query:
    select jsoncolumn.person.companies[*].name from jsontable where id = 106
    I believe by "unnesting" you are referring to the fact that ["n1","n2"] could be separate rows in Pinot? Also, for JSON storage support, we had briefly looked at the BSON format (MongoDB), JSON2 (a derivative of BSON used in Postgres), and OSON (used by the Oracle JSON database). In each case, we were looking for a format that would help minimize parsing of JSON strings into JSON objects before query evaluation, as is currently done in json_extract_scalar.
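The rewrite described in this message can be sketched in Python as follows. This is an illustration only: `rewrite_dot_path` and `extract` are hypothetical helpers, not Pinot code, and `extract` handles just plain keys and `[*]` steps rather than full JSONPath. The last lines show one reading of "unnesting", where each array element becomes its own row.

```python
import json
import re


def rewrite_dot_path(expr, result_type="STRING"):
    """Rewrite column.path into a json_extract_scalar(...) call."""
    column, _, path = expr.partition(".")
    if not path:
        return expr  # a plain column reference, nothing to rewrite
    return f"json_extract_scalar({column}, '$.{path}', '{result_type}')"


def extract(doc, path):
    """Evaluate a path made only of keys and [*] steps."""
    nodes, saw_wildcard = [doc], False
    for step in re.findall(r"\[\*\]|[^.\[\]]+", path.lstrip("$")):
        if step == "[*]":
            # Fan out over every element of each array.
            nodes = [item for node in nodes for item in node]
            saw_wildcard = True
        else:
            nodes = [node[step] for node in nodes]
    return nodes if saw_wildcard else nodes[0]


doc = json.loads(
    '{"person":{"name":"daffy duck","companies":'
    '[{"name":"n1","title":"t1"},{"name":"n2","title":"t2"}]}}'
)

print(rewrite_dot_path("jsoncolumn.person.companies[*].name"))
names = extract(doc, "$.person.companies[*].name")
print(names)

# "Unnesting": one output row per array element, keyed by the row id.
rows = [(106, name) for name in names]
print(rows)
```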
  • k

    Kishore G

    05/01/2021, 5:50 PM
    I was referring to a different query/requirement
  • j

    Jackie

    05/05/2021, 1:31 AM
    Support for nested (multi-dimensional) array: https://github.com/apache/incubator-pinot/pull/6877
  • j

    Jackie

    05/05/2021, 1:31 AM
    @User @User Please take a look
  • a

    Amrish Lal

    05/05/2021, 2:05 AM
    JSON_MATCH filter expression to be JSONPath compatible
    Nice
  • s

    Sidd

    05/05/2021, 6:18 AM
    Thanks @User. Will review tomorrow
  • a

    Amrish Lal

    05/05/2021, 4:17 PM
    Also, this is the PR for JSON column type support: https://github.com/apache/incubator-pinot/pull/6878
  • s

    Santiago Paz

    05/05/2024, 3:43 AM
    Hi everyone! Can some fine lad lend me a hand with this?
  • s

    Santiago Paz

    05/05/2024, 3:43 AM
    I have this json array to ingest ['{"objects":[{"detection":{"bounding_box":{"x_max":0.18486635386943817,"x_min":0.11353801190853119,"y_max":0.23475037515163422,"y_min":0.06590475142002106},"confidence":1.0,"label_id":1},"emotion":{"confidence":0.9898877739906311,"label":"happy","label_id":1,"model":{"name":"0003_EmoNet_ResNet10"}},"h":122,"region_id":131,"w":91,"x":145,"y":47}],"resolution":{"height":720,"width":1280},"timestamp":294364856}', ' {"onecallid": 1716745129 } ']
  • s

    Santiago Paz

    05/05/2024, 3:44 AM
    no matter how much i try to ingest the onecallid field, it doesn't happen.
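One thing visible in the payload as pasted: it is an array of two strings, and `onecallid` lives in the second string rather than inside the first JSON object. The Python sketch below shows one possible preprocessing step under that reading, merging the strings into a single record before ingestion. It is an assumption about the data shape, not a Pinot ingestion config; `merge_json_strings` is a hypothetical helper and the first element is abbreviated.

```python
import json

# Abbreviated version of the payload from the message above:
# each array element is itself a JSON document encoded as a string.
payload = [
    '{"objects": [{"h": 122, "w": 91}], '
    '"resolution": {"height": 720, "width": 1280}, "timestamp": 294364856}',
    ' {"onecallid": 1716745129 } ',
]


def merge_json_strings(parts):
    """Parse each JSON string and merge the objects into one record."""
    merged = {}
    for part in parts:
        merged.update(json.loads(part))  # json.loads tolerates outer whitespace
    return merged


record = merge_json_strings(payload)
print(record["onecallid"])
print(sorted(record))
```

After merging, `onecallid` is an ordinary top-level field of the record, alongside `objects`, `resolution`, and `timestamp`.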