# advice-data-transformation
j
Open to all advice & tips 🙂
m
denesting is hard, basically. You can write your own, but the edge cases will quickly drive you crazy. Airbyte will do denesting in many cases though - is that not working here?
m
I think the issue is that denesting is done for the first-level object, but when you have another nested object inside it, that one is parsed as a string. I encountered the same issue with MongoDB documents and am also looking for a solution (a dbt transformation?)
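The "second level arrives as a string" problem can be worked around by re-parsing those string values before flattening. A minimal sketch in pure Python (the field names in `record` are made up for illustration):

```python
import json

def flatten(obj, prefix=""):
    """Recursively flatten a nested dict into one level,
    joining keys with underscores: {"a": {"b": 1}} -> {"a_b": 1}."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        # Nested objects sometimes arrive serialized as JSON strings;
        # try to re-parse them before deciding how to handle the value.
        if isinstance(value, str):
            try:
                parsed = json.loads(value)
                if isinstance(parsed, dict):
                    value = parsed
            except ValueError:
                pass  # plain string, keep as-is
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat

# A second-level object that was stringified upstream:
record = {"_id": "abc123", "address": '{"city": "Paris", "geo": {"lat": 48.85}}'}
flatten(record)
# → {"_id": "abc123", "address_city": "Paris", "address_geo_lat": 48.85}
```

Arrays (e.g. MongoDB sub-documents in lists) would still need a separate strategy, such as exploding them into child tables.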
j
What platform are you using? I've written multiple styles of semi-structured normalization scripts, in pure Python, PySpark, or Scala Spark. It's much easier to do this pre-SQL and then land the normalized dataframe into SQL and select what you need. Unfortunately, with some IoT devices you get gibberish across the network. Combining schemas for a set of, say, drilling devices created ~500 columns, many of which were the same field just identified differently in the JSON stream. It takes effort to map them.
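The column-mapping effort described above usually boils down to maintaining an alias table and folding variant keys onto canonical names before loading. A small sketch (all field names here are hypothetical):

```python
# Hypothetical alias map: different device firmwares report the same
# measurement under different keys, so fold them onto one canonical column.
CANONICAL = {
    "temp": "temperature_c",
    "tempC": "temperature_c",
    "temperature": "temperature_c",
    "rpm": "rotation_rpm",
    "rotationSpeed": "rotation_rpm",
}

def remap_record(record):
    """Rename known aliases to canonical column names; keep unknown keys."""
    return {CANONICAL.get(key, key): value for key, value in record.items()}

remap_record({"tempC": 71.5, "rotationSpeed": 1200, "device_id": "rig-7"})
# → {"temperature_c": 71.5, "rotation_rpm": 1200, "device_id": "rig-7"}
```

In Spark the same idea becomes a chain of `withColumnRenamed` calls driven by the alias table; the hard part is building the table itself, which takes manual review of the incoming schemas.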
m
For me, I'm trying to move my MongoDB collection to PostgreSQL. In your case, have you scheduled your Python/PySpark jobs to extract the data?
j
@Marie if that was targeted at me, yes, I'd have a source > lake raw > lake normalized > destination pipeline that extracts the JSON and normalizes it at the lake layer, and then we can push it into a destination in tabular format.
j
@Matt Sterling Nope, Airbyte doesn't handle this case
m
@Jordan Fox In my case I'm currently extracting data from MongoDB and loading it into BigQuery. I'm thinking about implementing a generic dbt macro that can iterate over nested properties and objects
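One way to build such a generic dbt function is to walk a sample document once and generate one `JSON_VALUE` extraction per leaf path (BigQuery's `JSON_VALUE(json, path)` is a real function; everything else here, including the `doc` column name, is a made-up sketch of what the macro would loop over):

```python
def json_paths(obj, prefix="$"):
    """Walk a sample document and yield (column_name, JSON path) pairs
    for every leaf value, mirroring what a dbt macro could iterate over."""
    for key, value in obj.items():
        path = f"{prefix}.{key}"
        if isinstance(value, dict):
            yield from json_paths(value, path)
        else:
            # "$.user.geo.lat" -> column name "user_geo_lat"
            yield (path.replace("$.", "").replace(".", "_"), path)

sample = {"user": {"name": "ada", "geo": {"lat": 48.85}}, "active": True}
cols = [f"JSON_VALUE(doc, '{path}') AS {name}"
        for name, path in json_paths(sample)]
# cols[0] == "JSON_VALUE(doc, '$.user.name') AS user_name"
```

In dbt itself the same loop would be written as a Jinja macro emitting the `JSON_VALUE(...) AS ...` lines into a `SELECT`; the Python version just makes the recursion over nested properties explicit.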