What platform are you using? I've written several styles of semi-structured normalization scripts, in pure Python, PySpark, or Scala Spark. It's much easier to do this pre-SQL and then just land the normalized dataframe into SQL and select what you need. Unfortunately, with some IoT devices you get gibberish across the network. Combining schemas for one set of drilling devices produced ~500 columns, many of which were the same measurement just identified differently in the JSON stream. It takes effort to map them to canonical names, roughly along the lines of the sketch below.
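A minimal PySpark sketch of that kind of pre-SQL normalization, assuming a hand-maintained alias map; the field names, paths, and JDBC target here are hypothetical, not from any real device feed:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("iot_normalize").getOrCreate()

# Raw semi-structured JSON from the devices; schema inferred per batch.
raw = spark.read.json("s3://example-bucket/drilling/raw/")  # hypothetical path

# Hand-built mapping from the many device-specific names to one canonical column.
# In practice this grows as new vendors/firmware versions show up.
ALIASES = {
    "bitDepth": "bit_depth_m",
    "bit_depth": "bit_depth_m",
    "BD_meters": "bit_depth_m",
    "rpm": "rotary_speed_rpm",
    "rotarySpeed": "rotary_speed_rpm",
}

def normalize(df):
    """Coalesce each group of synonymous columns into a single canonical column."""
    out = df
    for canonical in set(ALIASES.values()):
        sources = [c for c, target in ALIASES.items()
                   if target == canonical and c in out.columns]
        if sources:
            out = out.withColumn(canonical, F.coalesce(*[F.col(c) for c in sources]))
            out = out.drop(*[c for c in sources if c != canonical])
    return out

clean = normalize(raw)

# Land the normalized frame in SQL and do the selecting there.
clean.write.mode("append").jdbc(
    url="jdbc:postgresql://dbhost:5432/telemetry",  # hypothetical target
    table="drilling_normalized",
    properties={"user": "etl", "password": "changeme"},
)
```

Same idea works in pure Python with dict renames before loading, or in Scala Spark with the equivalent `withColumn`/`coalesce` calls; the hard part is building and maintaining the alias map, not the code.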