# advice-data-transformation
o
Hey all! Does anybody know if there’s an existing dbt package for the GitHub source? I want to contribute but avoid starting from scratch if possible. I’ve tried searching for it, but if I Google “airbyte github dbt package” it just can’t handle those keywords 😂 “You mean like the GitHub page for Airbyte?” “Oh, you mean the dbt directory in Airbyte’s GitHub?” I mean something like Fivetran’s: https://hub.getdbt.com/fivetran/github/latest/
👋 1
o
Got it! https://github.com/CerebriumAI/airbyte_dbt_github buried in that post 🙌
m
There isn’t an official dbt package for Airbyte.
o
Time for me to get started then!!! *cracks knuckles*
a
@Michael Louis getting some traction! 🙂
@Oliver Laslett we may not have the same Google installed 😄
o
^ right but the title of that blog is not actually what I wanted. What I was searching for was a pre-built package rather than a tutorial on how to make my own!
a
Right! That's the 3rd result on my screenshot, but anyway, I'm curious to know if you try the Cerebrium package and have some feedback on how reusable it is for your use case 🙂
g
@Oliver Cerebrium have a package, but it assumes Airbyte’s basic normalisation. I've built one internally, directly over the raw JSON. I was thinking of publishing it, but it all feels too opinionated, IMO, to be a one-size-fits-all package.
😍 1
a
Hey @gunu, curious to know why you preferred to build it from the raw data instead of normalization? :)
g
I find the basic normalisation sometimes too bloated, e.g. the automatic table generation for nested JSON fields. Also, I think the default deduping method could be optimised.
👍 1
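To make the trade-off above concrete, here is a minimal sketch of working directly over raw JSON: pick only the nested fields you need (instead of generating a table per nested object) and dedupe by keeping the latest emitted record per key. This is not gunu's actual implementation. The sample rows, the `select_fields` and `dedupe_latest` helpers, and the column names `_airbyte_data` and `_airbyte_emitted_at` are assumptions for illustration only.

```python
import json

# Hypothetical raw rows; the column names are an assumption about the raw table layout.
RAW_ROWS = [
    {
        "_airbyte_emitted_at": "2022-05-01T10:00:00",
        "_airbyte_data": json.dumps(
            {"id": 1, "title": "Fix bug", "state": "open",
             "user": {"login": "octavia", "id": 7}}
        ),
    },
    {
        "_airbyte_emitted_at": "2022-05-02T09:30:00",
        "_airbyte_data": json.dumps(
            {"id": 1, "title": "Fix bug", "state": "closed",
             "user": {"login": "octavia", "id": 7}}
        ),
    },
    {
        "_airbyte_emitted_at": "2022-05-01T12:00:00",
        "_airbyte_data": json.dumps(
            {"id": 2, "title": "Add docs", "state": "open",
             "user": {"login": "gunu", "id": 9}}
        ),
    },
]


def select_fields(row):
    """Parse the raw JSON and keep only the fields we care about."""
    data = json.loads(row["_airbyte_data"])
    return {
        "id": data["id"],
        "title": data["title"],
        "state": data["state"],
        # Pull one nested value inline rather than creating a separate user table.
        "user_login": data.get("user", {}).get("login"),
        "emitted_at": row["_airbyte_emitted_at"],
    }


def dedupe_latest(rows, key="id"):
    """Keep the most recently emitted row per key.

    ISO-8601 timestamps in the same format compare correctly as strings.
    """
    latest = {}
    for row in rows:
        current = latest.get(row[key])
        if current is None or row["emitted_at"] > current["emitted_at"]:
            latest[row[key]] = row
    return sorted(latest.values(), key=lambda r: r[key])


issues = dedupe_latest([select_fields(r) for r in RAW_ROWS])
for issue in issues:
    print(issue["id"], issue["state"], issue["user_login"])
```

In a dbt project the same idea would typically be written in SQL against the warehouse; the Python version only illustrates the logic.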
m
@Edward Gao (Airbyte) fyi this discussion
👀 1
o
@gunu - some more context: I was thinking of adding some dbt metrics + Lightdash charts/dashboards as code. That way somebody can spin up airbyte+dbt+Lightdash for an instant GitHub analytics stack. I’m not sure which would be most useful for the community. But if not, I’ll use our internal one too, which is also quite opinionated I guess, ha. Do most Airbyte users use the normalisation or not?
octavia loves 1
a
cc @Simon Späti (Airbyte)
o
so many people cc’d on this thread, what is going on 😂
😝 1
Are y’all folks that would benefit from the package, or people looking to contribute?
👀 1
a
I think Simon is also building some Github analytics as a sample project to showcase Airbyte 😉
o
Cool! Can you share more @Simon Späti?
s
Hi @Oliver Laslett 🙂 Interesting thoughts with `dbt metrics + Lightdash`, I have similar ideas as part of my example MDS project with `airbyte+dbt+dagster+(cube.js or metricflow or dbt metrics)+(Metabase or superset)`. My plan: I start with airbyte and dbt and use the basic normalization that airbyte provides. I'd add additional transformation logic and cleaning on top with standard dbt and schedule it with dagster, which would look something like the attached (see article here). Not sure if that helps at all. Unfortunately, I do not have a repo or code yet to share, but that is something I will work on pretty soon.
m
@User check this discussion
a

https://media.giphy.com/media/iOH0OFXM94IwtZ7PON/giphy.gif

i
Hey @Simon Späti, I was wondering if you had any further thoughts on this idea or whether you started working on it? Would love to continue the conversation, particularly as I was separately planning on meeting up with @Justin Chau (Airbyte) anyway to talk about content ideas like this!