# contributing-to-airbyte
s
Hey guys. I'm really excited about this project and tried to get it running yesterday for the first time. Unfortunately, I ran into a normalization error. Hoping someone can suggest a fix. I'm running Airbyte on CE and attempting Shopify -> BigQuery. The first sync exited with an error. I checked the data and some of the columns that should exist based on the schema do not. For example, the orders table does not contain a line_items column. However, the rows all seem to be there. Below is the relevant section of the logs. Appreciate any help! EDIT: I see that the expected result would be a separate line_items table, but that doesn't exist either.
u
To add to this, I attempted to sync a different Shopify store's data to BigQuery and got the same error and results.
u
I also attempted using Postgres instead of BigQuery, but got the same error and missing column.
u
I see in the basic normalization docs that since the Shopify "line_items" datatype is an array, it should be expanded into its own table. However, I don't see the expected tables for the various arrays.
u
Hi @Spyq Sklar - thanks for reporting this. I’ll take a look later today.
u
FWIW normalization is not coupled to any particular destination; it's a single module that works with all destinations. So it makes sense that the issue is happening across destinations.
u
What Airbyte version are you running? You can find out by running
cat .env | head
in the airbyte repo root
u
Thanks for your help Shrif! Version 0.14.1-alpha
u
u
One more piece of information. I noticed the error was coming from Shopify's metadata endpoint. If I reset the connection and re-run it with metadata toggled off, the sync runs without errors. However, the additional tables that should be generated as part of basic normalization still don't show up.
u
Nested columns or array columns are not properly supported by normalization yet, as reported here: #886. We'll have to work on it, sorry for the inconvenience!
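Until normalization handles arrays, the unnesting can be done by hand. Here is a minimal Python sketch, assuming each order record is a dict with an `id` and a `line_items` list (both names come from this thread). The `order_id` column and the function name `explode_line_items` are made up for illustration; this is not what Airbyte's normalization produces.

```python
from typing import Any, Dict, Iterable, List


def explode_line_items(orders: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Turn each order's nested line_items array into flat child rows."""
    rows: List[Dict[str, Any]] = []
    for order in orders:
        # A missing or null line_items array yields no child rows.
        for item in order.get("line_items") or []:
            row = dict(item)
            # Hypothetical foreign-key column pointing back at the parent order.
            row["order_id"] = order["id"]
            rows.append(row)
    return rows


if __name__ == "__main__":
    # Made-up sample data, only to show the shape of the result.
    sample = [
        {"id": 1, "line_items": [{"sku": "A", "quantity": 2}, {"sku": "B", "quantity": 1}]},
        {"id": 2, "line_items": []},
    ]
    for row in explode_line_items(sample):
        print(row)
```

The same idea applies in SQL on the destination side: one output row per array element, each carrying a key back to its parent row.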
u
u
yes you’re right!
u
Do you have a sense for when this will be available? I suppose I could attempt doing this on my own for the data I need through dbt or SQL as suggested. However, the connecting EL to T docs looked challenging (I'm no expert at this).
u
for the nested data, we’re planning to work on it this week and it should be available in a couple of weeks. For the issue with normalization breaking, we’ll hopefully have a fix for that today.
u
Thanks Shrif. Is there a webhook that triggers on sync completion (or something similar) I could use to run my own unnesting code in the meantime?
u
@Spyq Sklar Can you go to the Admin UI and verify which version of the Shopify source connector you’re using? The latest version is 0.1.7. If you’re on an earlier version, could you update and retry the sync?
u
@s. I was on 0.1.6. I have updated to 0.1.7 and will let you know how the sync goes. However, a sync attempt is already underway and each attempt seems to have taken ~9hrs. Is there a safe way to cancel a sync in progress or should I just wait?
u
@Spyq Sklar did the sync run successfully? We don’t yet have a way to cancel in-progress syncs, though this is coming very soon.
u
@s I entered in the new version in the admin panel. Was there anything else I needed to do? So far, the sync is still failing.
u
@Spyq Sklar from the connection settings (sources -> shopify source -> destination -> settings) can you click on “Update Schema”? Also, what is the current data type of the “value” column under the metafields table?
u
there was a type fix that feels eerily close to this problem that was published in 0.1.7 but it would require updating the schema on your connection
u
out of curiosity how much data are you replicating from shopify? i’m trying to understand why the sync takes 9 hours
u
@s Datatype of value is number. Once the sync finishes I will update the schema and try again. Each sync attempt seems to take about 9.5hrs and it makes 3 attempts for every sync. Seems like it is not doing incremental updating since the syncs are failing.
u
BigQuery shows 0.071TB in Analysis and 0.06 in Active Storage. My store is probably on the low end of data with respect to the kinds of stores likely to use Airbyte.
u
yeah, really unfortunate that the first sync is failing because like you said it will be incremental afterwards 🤦🏼‍♂️
u
Is the type showing as number after you updated the schema? you should be able to update the schema at any time fwiw
u
it shouldn’t impact the destination since the syncs are failing
u
@s I was waiting to update the schema until after the sync. I just reset it, and now it says value.
u
value is a string?
u
Sorry, yes value is a string
u
great
u
I think that should fix it
u
some context: the value column can be a number, string, or object. We previously had a bug where we always recognized it as a number (in shopify 0.1.6)
u
the error log you sent indicates that a string value (technically a boolean) was being parsed as a number
u
which was causing normalization to fail
u
so with the new connector pulled and the schema updated, I think your sync should succeed
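To illustrate why a column typed as number breaks on these values, here is a small Python sketch. It is not Airbyte's normalization code; the two cast functions and the sample values are made up, and they only mimic what happens when a schema declares one type for a field that can hold several.

```python
import json
from typing import Any


def cast_as_number(value: Any) -> float:
    """Mimics a schema that declares the column as a number."""
    return float(value)


def cast_as_string(value: Any) -> str:
    """Mimics a schema that declares the column as a string; any JSON value fits."""
    return value if isinstance(value, str) else json.dumps(value)


if __name__ == "__main__":
    samples = [42, "true", {"color": "red"}]  # made-up metafield values
    for v in samples:
        try:
            print(repr(v), "-> number:", cast_as_number(v))
        except (TypeError, ValueError) as exc:
            print(repr(v), "-> number cast fails:", exc)
        print(repr(v), "-> string:", cast_as_string(v))
```

Only the first sample survives the number cast, while all three can be stored as strings, which is why the string-typed schema is the safer declaration for a mixed field.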
u
Awesome, will let you know
u
Quick question, until basic normalization includes nested data I have my own sql query handling that which is currently just scheduled by BQ. Is there a way to have it triggered after each sync instead?
u
At the moment there aren’t any hooks/events that are launched by Airbyte to indicate that a sync is complete. For the moment, one thing you can do is something like this:
1. Create the connection with a manual sync schedule.
2. Use Cloud Composer to create a workflow which triggers a sync and waits until the sync is done.
3. Trigger the BQ hook to extract nested data from the Cloud Composer workflow.
Definitely not ideal and something we are actively working on, but it might get you through.
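The three steps above can be sketched as a plain Python loop. This is only an outline under stated assumptions: `trigger_sync`, `get_status`, and `run_transform` are hypothetical callables you would supply yourself (for example, wrappers around whatever your Airbyte version and BigQuery client expose), and the status strings `"succeeded"` and `"failed"` are placeholders, not confirmed Airbyte values.

```python
import time
from typing import Callable


def run_sync_then_transform(
    trigger_sync: Callable[[], str],      # hypothetical: starts a sync, returns a job id
    get_status: Callable[[str], str],     # hypothetical: returns the job's current status
    run_transform: Callable[[], None],    # hypothetical: runs the unnesting query
    poll_seconds: float = 30.0,
    timeout_seconds: float = 12 * 60 * 60,
) -> str:
    """Trigger a manual sync, wait for it to finish, then run the transform."""
    job_id = trigger_sync()
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status(job_id)
        if status == "succeeded":  # placeholder status string
            run_transform()
            return status
        if status == "failed":  # placeholder status string
            # Skip the transform so it never runs on a partial load.
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"sync {job_id} did not finish in time")
        time.sleep(poll_seconds)
```

In a Cloud Composer workflow the same shape would typically be split into separate tasks (trigger, wait, transform) rather than one blocking function, but the ordering and the "only transform after success" rule stay the same.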
u
Got it. Thanks for the suggestions and your help! I can probably figure that out, and if not my current setup is fine if not perfect. Please let me know when basic normalization for nested data is live! I'll confirm the sync works with the new connector version
u
@s sync failed. I checked the schema and the value field is back to being a number. It must have reverted somehow. Admin still shows the shopify tag of 0.1.7
u
can you create a new connection from the same source and destination?
u
this should create a new schema from scratch
u
there might have been a bug with resetting the schema
u
will investigate
u
Okay will do
u
apologies for the issues
u
That's alright. Appreciate the help. In the end, this is going to be hugely helpful for my business
u
@s sync succeeded and best of all, only took 8 minutes instead of 9.5hrs. Thanks for your help!
u
amazing
u
glad to hear
u
@s Quick question. I have syncs set to hourly. The first sync took 8 minutes. The second took 31m and the 3rd is still running after 2hrs. Shouldn't the second sync take considerably less time due to it being incremental? Why would the 3rd be taking so long even though there isn't much new data? Seems strange
u
how many records were replicated in each sync?