s

Spyq Sklar

01/31/2021, 4:59 PM
Hey guys. I'm really excited about this project and tried to get it running yesterday for the first time. Unfortunately, I ran into a normalization error. Hoping someone can suggest a fix. I'm running Airbyte on CE and attempting Shopify -> BigQuery. The first sync exited with an error. I checked the data, and some of the columns that should exist based on the schema do not. For example, the orders table does not contain a line_items column. However, the rows all seem to be there. Below is the relevant section of the logs. Appreciate any help! EDIT: I see that the expected result would be a separate line_items table, but that doesn't exist either.
u

user

01/31/2021, 6:28 PM
To add to this, I attempted to sync a different Shopify store's data to BigQuery and got the same error and results.
u

user

01/31/2021, 8:19 PM
I also attempted using Postgres instead of BigQuery, but got the same error and missing column.
u

user

01/31/2021, 8:32 PM
I see in the basic normalization docs that since the Shopify "line_items" datatype is an array, it should be expanded into its own table. However, I don't see the expected tables for the various arrays.
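For anyone following along, here is a rough sketch of what "expanded into its own table" means for an array column. This is not Airbyte's normalization code; the record shape, the `id` key, and the `order_id` reference column are assumptions made purely for illustration.

```python
from __future__ import annotations

from typing import Any


def split_line_items(
    orders: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Split each order into a flat parent row plus one child row per line item."""
    parent_rows: list[dict[str, Any]] = []
    child_rows: list[dict[str, Any]] = []
    for order in orders:
        # The parent row keeps every field except the nested array.
        parent_rows.append({k: v for k, v in order.items() if k != "line_items"})
        for item in order.get("line_items") or []:
            # Each child row points back to its order (hypothetical column name).
            child_rows.append({"order_id": order["id"], **item})
    return parent_rows, child_rows


orders = [
    {
        "id": 1,
        "email": "a@example.com",
        "line_items": [{"sku": "X", "quantity": 2}, {"sku": "Y", "quantity": 1}],
    },
    {"id": 2, "email": "b@example.com", "line_items": []},
]
parents, children = split_line_items(orders)
```

With this input, `parents` holds two rows without a `line_items` field and `children` holds the two line items of order 1, which is roughly the orders table plus separate line_items table I was expecting to see.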
u

user

01/31/2021, 8:42 PM
Hi @Spyq Sklar - thanks for reporting this. I’ll take a look later today.
u

user

01/31/2021, 8:54 PM
FWIW normalization is not coupled to any particular destination; it’s a single module that works with all destinations. So it makes sense that the issue is happening across destinations.
u

user

01/31/2021, 8:56 PM
What Airbyte version are you running? You can find out by running
cat .env | head
in the airbyte repo root
u

user

01/31/2021, 9:13 PM
Thanks for your help Shrif! Version 0.14.1-alpha
u

user

01/31/2021, 9:18 PM
One more piece of information. I noticed the error was coming from Shopify's metadata endpoint. If I reset the connection and re-run it with metadata toggled off, the sync runs without errors. However, the additional tables that should be generated as part of basic normalization still don't show up.
u

user

02/01/2021, 10:06 AM
Nested columns and array columns are not yet properly supported by normalization, as reported here: #886. We’ll have to work on it, sorry for the inconvenience!
u

user

02/01/2021, 4:24 PM
yes you’re right!
u

user

02/01/2021, 4:30 PM
Do you have a sense for when this will be available? I suppose I could attempt doing this on my own for the data I need through dbt or SQL as suggested. However, the "connecting EL to T" docs looked challenging (I'm no expert at this).
u

user

02/01/2021, 5:05 PM
For the nested data, we’re planning to work on it this week and it should be available in a couple of weeks. For the issue with normalization breaking, we’ll hopefully have a fix for that today.
u

user

02/01/2021, 5:35 PM
Thanks Shrif. Is there a webhook that triggers on sync completion (or something similar) I could use to run my own unnesting code in the meantime?
u

user

02/02/2021, 6:16 AM
@Spyq Sklar Can you go to the Admin UI and verify which version of the Shopify source connector you’re using? The latest version is 0.1.7. If you’re on an earlier version, could you update and retry the sync?
u

user

02/02/2021, 8:22 PM
@s. I was on 0.1.6. I have updated to 0.1.7 and will let you know how the sync goes. However, a sync attempt is already underway and each attempt seems to have taken ~9hrs. Is there a safe way to cancel a sync in progress or should I just wait?
u

user

02/03/2021, 4:37 PM
@Spyq Sklar did the sync run successfully? We don’t yet have a way to cancel in-progress syncs, though this is coming very soon.
u

user

02/03/2021, 7:30 PM
@s I entered the new version in the admin panel. Was there anything else I needed to do? So far, the sync is still failing.
u

user

02/03/2021, 10:33 PM
@Spyq Sklar from the connection settings (sources -> shopify source -> destination -> settings) can you click on “Update Schema”? Also, what is the current data type of the “value” column under the metafields table?
u

user

02/03/2021, 10:33 PM
there was a type fix that feels eerily close to this problem that was published in 0.1.7 but it would require updating the schema on your connection
u

user

02/03/2021, 10:34 PM
out of curiosity how much data are you replicating from shopify? i’m trying to understand why the sync takes 9 hours
u

user

02/03/2021, 11:35 PM
@s Datatype of value is number. Once the sync finishes I will update the schema and try again. Each sync attempt seems to take about 9.5hrs and it makes 3 attempts for every sync. Seems like it is not doing incremental updating since the syncs are failing.
u

user

02/03/2021, 11:38 PM
BigQuery shows 0.071TB in Analysis and 0.06 in Active Storage. My store is probably on the low end of data with respect to the kinds of stores likely to use Airbyte.
u

user

02/03/2021, 11:53 PM
yeah, really unfortunate that the first sync is failing because like you said it will be incremental afterwards 🤦🏼‍♂️
u

user

02/03/2021, 11:55 PM
Is the type showing as number after you updated the schema? you should be able to update the schema at any time fwiw
u

user

02/03/2021, 11:55 PM
it shouldn’t impact the destination since the syncs are failing
u

user

02/04/2021, 1:02 AM
@s I was waiting to update the schema until after the sync. I just reset it, and now it says value.
u

user

02/04/2021, 1:02 AM
value is a string?
u

user

02/04/2021, 1:03 AM
Sorry, yes value is a string
u

user

02/04/2021, 1:04 AM
great
u

user

02/04/2021, 1:05 AM
I think that should fix it
u

user

02/04/2021, 1:05 AM
some context: the value column can be a number, string, or object. We previously had a bug where we always recognized it as a number (in shopify 0.1.6)
u

user

02/04/2021, 1:06 AM
the error log you sent indicates that a string value (technically a boolean) was being parsed as a number
u

user

02/04/2021, 1:07 AM
which was causing normalization to fail
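To make that failure mode concrete, here is a small sketch. It is only an illustration of the idea, not the connector's or normalization's actual code, and both helper names are hypothetical: when a schema insists that `value` is numeric, a string such as "true" cannot be cast, whereas representing every value as a string always works.

```python
import json
from typing import Any


def to_string_value(value: Any) -> str:
    """Represent a number, string, or object uniformly as a string."""
    if isinstance(value, str):
        return value
    # Numbers and objects are serialized as JSON text.
    return json.dumps(value, sort_keys=True)


def cast_as_number(value: Any) -> float:
    """Mimic a schema that insists the value is numeric."""
    return float(value)


samples = [42, "true", {"color": "red"}]

# Treating the column as a string works for every sample.
as_strings = [to_string_value(v) for v in samples]

# Treating the column as a number fails for the string and the object.
failures = []
for v in samples:
    try:
        cast_as_number(v)
    except (TypeError, ValueError):
        failures.append(v)
```

Here `failures` ends up containing the string and the object, which is the kind of mismatch that breaks a numeric column.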
u

user

02/04/2021, 1:07 AM
so with the new connector pulled and the schema updated, I think your sync should succeed
u

user

02/04/2021, 1:07 AM
Awesome, will let you know
u

user

02/04/2021, 1:08 AM
Quick question: until basic normalization includes nested data, I have my own SQL query handling that, which is currently just scheduled by BQ. Is there a way to have it triggered after each sync instead?
u

user

02/04/2021, 1:12 AM
At the moment there aren’t any hooks/events that are launched by Airbyte to indicate that a sync is complete. For the moment, one thing you can do is something like this:
1. Create the connection with a manual sync schedule.
2. Use Cloud Composer to create a workflow which triggers a sync and waits until the sync is done.
3. From the Cloud Composer workflow, trigger the BQ hook to extract the nested data.
Definitely not ideal and something we are actively working on, but it might get you through.
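The trigger, wait, then transform steps above could be sketched roughly like this. Everything here is an assumption for illustration: `trigger_sync`, `get_status`, and `run_transform` are placeholder callables you would wire to whatever your orchestrator offers, and the "succeeded" / "failed" status strings are made up rather than taken from any Airbyte API.

```python
import time
from typing import Callable


def sync_then_transform(
    trigger_sync: Callable[[], str],
    get_status: Callable[[str], str],
    run_transform: Callable[[], None],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 12 * 60 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Start a sync, poll until it finishes, and run the transform only on success."""
    job_id = trigger_sync()
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status(job_id)
        if status == "succeeded":
            # Only unnest once the raw data has fully landed.
            run_transform()
            return status
        if status == "failed":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"sync {job_id} did not finish in time")
        sleep(poll_seconds)


# Demo with stand-in callables; replace them with real calls in your environment.
statuses = iter(["running", "running", "succeeded"])
calls = []
result = sync_then_transform(
    trigger_sync=lambda: "job-1",
    get_status=lambda job_id: next(statuses),
    run_transform=lambda: calls.append("unnest"),
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
```

Passing the three operations in as callables keeps the polling logic independent of how the sync is actually started, so the same loop can sit inside a Cloud Composer task or a plain script.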
u

user

02/04/2021, 1:33 AM
Got it. Thanks for the suggestions and your help! I can probably figure that out, and if not my current setup is fine if not perfect. Please let me know when basic normalization for nested data is live! I'll confirm the sync works with the new connector version
u

user

02/04/2021, 4:21 PM
@s sync failed. I checked the schema and the value field is back to being a number. It must have reverted somehow. Admin still shows the shopify tag of 0.1.7
u

user

02/04/2021, 4:22 PM
can you create a new connection from the same source and destination?
u

user

02/04/2021, 4:22 PM
this should create a new schema from scratch
u

user

02/04/2021, 4:22 PM
there might have been a bug with resetting the schema
u

user

02/04/2021, 4:23 PM
will investigate
u

user

02/04/2021, 4:23 PM
Okay will do
u

user

02/04/2021, 4:23 PM
apologies for the issues
u

user

02/04/2021, 4:24 PM
That's alright. Appreciate the help. In the end, this is going to be hugely helpful for my business
u

user

02/04/2021, 6:10 PM
@s sync succeeded and best of all, only took 8 minutes instead of 9.5hrs. Thanks for your help!
u

user

02/04/2021, 6:10 PM
amazing
u

user

02/04/2021, 6:10 PM
glad to hear
u

user

02/04/2021, 10:23 PM
@s Quick question. I have syncs set to hourly. The first sync took 8 minutes. The second took 31m and the 3rd is still running after 2hrs. Shouldn't the second sync take considerably less time due to it being incremental? Why would the 3rd be taking so long even though there isn't much new data? Seems strange
u

user

02/04/2021, 10:29 PM
how many records were replicated in each sync?