h

Harshith (Airbyte)

04/07/2021, 3:33 PM
Hey, we are trying to do CSV (HTTPS) -> BigQuery using a file connection. There is a number field in the CSV data, and when we push to BigQuery it is taken as null. Can someone help me with this?
u

user

04/07/2021, 5:17 PM
Is this field showing as a number field in the configuration for your CSV file source?
u

user

04/07/2021, 5:17 PM
Is it possible to share the CSV line that is causing this problem?
u

user

04/07/2021, 5:34 PM
If we discover the schema, it shows up as number, so I updated the schema manually using the API. Even then it shows up as null.
u

user

04/07/2021, 5:42 PM
What is the type of the BQ column?
u

user

04/07/2021, 5:42 PM
Initially it was float, then I changed it to string.
u

user

04/07/2021, 5:49 PM
ah that’s probably it then
u

user

04/07/2021, 5:49 PM
it’s trying to write the value using the incorrect type
u

user

04/07/2021, 5:50 PM
did you change it to string in BQ before syncing or after syncing?
u

user

04/07/2021, 5:52 PM
I changed it before syncing
u

user

04/07/2021, 5:56 PM
are you syncing into an existing table then? not one created by airbyte?
u

user

04/07/2021, 5:58 PM
generally we recommend having airbyte create the tables and write to them. then after that point users often use DBT to combine/modify schemas
u

user

04/07/2021, 5:58 PM
I have created the whole thing using the API:
1. Created source
2. Created connection (in this, instead of using discover schema, I manually added the schema as string)
3. After this I clicked sync now
u

user

04/07/2021, 6:03 PM
We don’t actually support that now for basic normalization
u

user

04/07/2021, 6:04 PM
One sec, checking something.
u

user

04/07/2021, 6:07 PM
So what’s happening here is the source is producing some record with data {'key':100} because it thinks it’s producing a number. Then at the destination and normalization level it’s getting a catalog with string, and within normalization it isn’t actually receiving the correct value and doesn’t cast automatically.
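
To make the mismatch described above concrete, here is a minimal sketch in plain Python. It is not Airbyte code: `find_type_mismatches` and the shape of `catalog_schema` are hypothetical, chosen only to mirror the {'key':100} record and the string-typed catalog from this thread.

```python
# Hypothetical helper, not part of Airbyte: report fields whose runtime value
# does not match the JSON-schema type declared for them in the catalog.
JSON_TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly.
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
}


def find_type_mismatches(record, json_schema):
    """Return {field: (declared_type, actual_python_type_name)} for mismatches."""
    mismatches = {}
    for field, spec in json_schema.get("properties", {}).items():
        if field not in record or record[field] is None:
            continue  # missing or null values are not treated as mismatches here
        declared = spec.get("type")
        declared_types = declared if isinstance(declared, list) else [declared]
        checks = [JSON_TYPE_CHECKS[t] for t in declared_types if t in JSON_TYPE_CHECKS]
        if checks and not any(check(record[field]) for check in checks):
            mismatches[field] = (declared, type(record[field]).__name__)
    return mismatches


# The source emits a number, but the manually edited catalog declares a string.
record = {"key": 100}
catalog_schema = {"type": "object", "properties": {"key": {"type": "string"}}}
print(find_type_mismatches(record, catalog_schema))
```

Running a check like this on a few sample records before syncing would surface fields where the edited catalog and the emitted values disagree.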
u

user

04/07/2021, 6:13 PM
what is the solution?
u

user

04/07/2021, 6:14 PM
Type coercion from files is a very valid use case. I think what we might want to do is have source-file look at the catalog it is provided and cast the records it produces for a stream to the type specified.
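
The casting idea proposed above could look roughly like the following sketch. This is an illustration only, not how source-file is implemented: `coerce_record`, the `CASTERS` table, and the schema layout are assumptions made for the example.

```python
# Hypothetical sketch, not Airbyte's implementation: cast each field of a
# record to the type the catalog's JSON schema declares for it.
CASTERS = {"string": str, "number": float, "integer": int}


def coerce_record(record, json_schema):
    """Return a copy of record with fields cast to their declared types."""
    coerced = dict(record)
    for field, spec in json_schema.get("properties", {}).items():
        value = coerced.get(field)
        if value is None:
            continue  # keep missing or null values as they are
        declared = spec.get("type")
        declared_types = declared if isinstance(declared, list) else [declared]
        target = next((t for t in declared_types if t in CASTERS), None)
        if target is None:
            continue  # no caster known for this declared type
        try:
            coerced[field] = CASTERS[target](value)
        except (TypeError, ValueError):
            pass  # leave the original value if it cannot be cast
    return coerced


catalog_schema = {"type": "object", "properties": {"key": {"type": "string"}}}
print(coerce_record({"key": 100}, catalog_schema))
```

With the catalog from this thread, the number 100 would be emitted as the string "100", so the destination would receive a value matching the declared column type.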
u

user

04/07/2021, 6:18 PM
Actually after thinking about it more it probably makes more sense for normalization to handle the casting in a way that doesn’t end up with a null here.
u

user

04/07/2021, 6:20 PM
I’m going to try to reproduce this issue locally and create a ticket for fixing this.
u

user

04/07/2021, 6:20 PM
In the meantime is it possible to use floats on the BQ side?
u

user

04/07/2021, 6:43 PM
I tried float in BQ first, but it didn't work.