# advice-data-warehouses
g
Hey all! I'm using Snowflake as a data lake for my company, loading all the raw data from different sources into it with Airbyte.
First question: is it OK to use Snowflake as a data lake?
Second question: do you have any advice on how to structure the data (for data lake purposes)? I was thinking of doing:
• An independent database for all the raw data in the "Data Lake": all data from Airbyte or other sources in raw format.
  ◦ A schema for Airbyte data; I'd have another schema for data coming from other sources.
    ▪︎ One table for each source (e.g. Stripe, Google Ads, etc.)?
What do you think? Would you advise doing it differently? Is it just a matter of structural preference, or are there other pros and cons? Thanks!
a
@Madison Mae shared how she organizes raw data from Airbyte on Snowflake here: https://airbyte.com/blog/snowflake-data-warehouse-architecture When you refer to raw data, do you mean data with no transformations, or are you also disabling Airbyte normalization?
g
Thanks Ari, I’ll read it🙌🏻 Regarding raw data, I was referring to the data without transformations, but I am enabling normalisation. Would you recommend normalising it myself?
a
You would save some time at the beginning if you enable normalization, as your data will be easier to query. I've heard of some users who disable it to control everything, or because Airbyte sometimes normalizes data "too much"...
Otherwise, the article reads:
Your “RAW” and “BASE” databases should be made of the exact same schemas. “BASE” is essentially just a copy of your raw database but with basic transformations applied. In both of these databases, I recommend creating a new schema for every data source.
For example, if you ingest data from Google Ads, Facebook, Bing Ads, and Mailchimp, you would create a different schema for each of these.
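To make the layout from the article concrete, here is a rough sketch that only generates the Snowflake DDL for it: a "RAW" and a "BASE" database, each with the same schema per source. The function name `layout_ddl` and the source list are illustrative assumptions rather than anything from the article, and nothing is executed against Snowflake; you would still run the statements yourself and adapt the names.

```python
# Hypothetical source names, based on the example sources the article mentions.
SOURCES = ["google_ads", "facebook", "bing_ads", "mailchimp"]

# The two databases the article describes: RAW, and BASE with the same schemas.
DATABASES = ["RAW", "BASE"]


def layout_ddl(databases, sources):
    """Return CREATE statements that give every database one schema per source."""
    statements = []
    for db in databases:
        statements.append(f"CREATE DATABASE IF NOT EXISTS {db};")
        for source in sources:
            # Same schema name in every database, so RAW and BASE mirror each other.
            statements.append(
                f"CREATE SCHEMA IF NOT EXISTS {db}.{source.upper()};"
            )
    return statements


if __name__ == "__main__":
    print("\n".join(layout_ddl(DATABASES, SOURCES)))
```

Generating the statements from one list of sources is just one way to keep the schemas in "RAW" and "BASE" identical, so adding a new source means changing a single place.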
g
Awesome, thanks!🙌