# ask-community-for-troubleshooting
Hi everyone! I have a question regarding data preprocessing (data transformation). Airbyte is an ELT tool, which extracts data from the sources and uploads it to the destination as raw data. I would like to preprocess that raw data with Pandas (Python), and I want it to be automated as well. Here is my plan:

1. Extract Shopify data using Airbyte and store it in a Postgres database.
2. Get the data for each particular product from the products table (by `product_type`).
3. Preprocess this data (extract additional information about the product from the `tags` field, and add it to the corresponding fields).
4. Store the final (preprocessed) data in separate tables in Postgres.

For example, I get the data about shoes, preprocess it using Pandas (extract information from `tags`), and add the extracted data to additional fields. Then I save it as a separate table in the Postgres database (shoes_table).

Therefore, I have a question: is there a way to preprocess the data using Pandas (Python code) in Airbyte? Or are there other approaches, for example using Airflow together with Airbyte?
Hi @Andrei Batomunkuev, you can run a custom transformation with dbt on Airbyte for your use case. We have a guide here to help you set this up. We do not yet offer other kinds of transformation (e.g. with Pandas), but feel free to upvote this issue. If you're using Airflow, you can use our operator to trigger the Airbyte sync job and run your Pandas transformation after the sync.
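As a rough illustration of the Pandas step that would run after the sync, here is a minimal sketch. It assumes the `tags` column holds a comma-separated string and that the useful tags follow a `key:value` convention (for example `color:red`); both are assumptions about the data, not something Airbyte or Shopify guarantees. The function name `extract_tag_fields` and the sample rows are hypothetical.

```python
import pandas as pd


def extract_tag_fields(products: pd.DataFrame, product_type: str) -> pd.DataFrame:
    """Keep one product type and expand 'key:value' tags into extra columns."""
    subset = products[products["product_type"] == product_type].copy()

    def parse_tags(tags) -> dict:
        # Missing or non-string tags yield no extra fields.
        if not isinstance(tags, str):
            return {}
        fields = {}
        for tag in tags.split(","):
            key, sep, value = tag.strip().partition(":")
            if sep:  # skip tags that do not follow the assumed key:value form
                fields[key.strip()] = value.strip()
        return fields

    parsed = pd.DataFrame(subset["tags"].map(parse_tags).tolist(), index=subset.index)
    # rsuffix avoids an error if a tag key collides with an existing column name.
    return subset.join(parsed, rsuffix="_tag")


# Hypothetical sample standing in for the raw products table loaded by the sync.
products = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "product_type": ["shoes", "shoes", "hats"],
        "tags": ["color:red, size:42", "color:blue, sale", "color:black"],
    }
)
shoes = extract_tag_fields(products, "shoes")

# With a SQLAlchemy engine for the Postgres database, the result could be saved with:
# shoes.to_sql("shoes_table", engine, if_exists="replace", index=False)
```

In an Airflow DAG, a function like this could be a Python task placed downstream of the task that triggers the Airbyte sync, so the transformation only runs once fresh raw data has landed.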
@Augustin Lafanechere (Airbyte) Thank you! Could you please share a guide to set up custom transformations?