https://linen.dev logo
#contributing-to-airbyte
Title
# contributing-to-airbyte
m

Michel

12/03/2020, 6:27 PM
The Local CSV destination has an unexpected behavior
u

user

12/03/2020, 6:27 PM
it starts by writing to
exchange_rate_1607019960628.csv
and when it finishes the files gets renamed to
exchange_rate_raw.csv
u

user

12/03/2020, 6:28 PM
@Chris (deprecated profile) do you have an idea of what could cause that?
u

user

12/03/2020, 6:46 PM
why is that unexpected?
u

user

12/03/2020, 6:47 PM
this matches the db behavior of creating tmp tables and renaming
u

user

12/03/2020, 6:47 PM
I would expect this destination to create new files on every sync
u

user

12/03/2020, 6:47 PM
here it will just erase
u

user

12/03/2020, 6:47 PM
oh is this incremental?
u

user

12/03/2020, 6:47 PM
no
u

user

12/03/2020, 6:47 PM
it just overwrite
u

user

12/03/2020, 6:48 PM
full refresh we don't expect to overwrite the file?
u

user

12/03/2020, 6:48 PM
I thought this was intentional to keep the target file "up to date"
u

user

12/03/2020, 6:48 PM
ok, my bad, 🙂
u

user

12/03/2020, 6:48 PM
the exchange currency seems to be incremental out of the box
u

user

12/03/2020, 6:48 PM
ah
u

user

12/03/2020, 6:48 PM
so the data just gets erased
u

user

12/03/2020, 6:50 PM
Also do we want local csv files to end with \r\n or just \n
u

user

12/03/2020, 6:50 PM
right now it is \r\n
u

user

12/03/2020, 6:52 PM
(that caused me some issues when trying to extract the json blob)
u

user

12/03/2020, 7:08 PM
do we need to change some behavior in the destination csv then ?
u

user

12/03/2020, 7:08 PM
I think we need to define them first
u

user

12/03/2020, 7:09 PM
The way I think about it is that file should be immutable
u

user

12/03/2020, 7:12 PM
what do you think?
u

user

12/03/2020, 7:12 PM
regarding the EOL, it should be an option
u

user

12/03/2020, 7:13 PM
i think if we are saying that this destination does full refresh, then i expect the file to be overwritten.
u

user

12/03/2020, 7:13 PM
that makes it the same behavior as everything else.
u

user

12/03/2020, 7:14 PM
everything could be an option.... 🙄
u

user

12/03/2020, 7:14 PM
we are passing a state
u

user

12/03/2020, 7:15 PM
when I do a manual sync after the first sync, I only get the latest day,
u

user

12/03/2020, 7:15 PM
😒 damn. i think state wasn't being passed properly originally so we weren't seeing this behavior. but once we fixed state passing in the worker, this probably crept in.
u

user

12/03/2020, 7:16 PM
@s - heads up, related to what we were talking about re stripe singer and incremental 😂
u

user

12/03/2020, 7:19 PM
@Michel what was the issue caused by
\r\n
when extracting the json blob? it seems a little suspicious that this would break a json parser.
u

user

12/03/2020, 7:19 PM
it doesn't break the json parser
u

user

12/03/2020, 7:20 PM
but because the data is in acolumn and not a real json, I needed to do some post processing to extract the blob:
Copy code
cat exchange_rate_raw.csv | tail -n +2 | cut -d , -f 3- | sed 's/^.\(.*\)..$/\1/g' | tr -s \" | jq -s .
u

user

12/03/2020, 7:21 PM
it just made it harder to remove the last double quote
u

user

12/03/2020, 7:21 PM
it is more a user experience problem here
u

user

12/03/2020, 7:23 PM
Wouldn' t it be simpler to store as a newline delimited JSON instead of a fake CSV file ?
u

user

12/03/2020, 7:23 PM
haha 🙂 that's another question indeed
u

user

12/03/2020, 7:23 PM
It would probably make more sense
u

user

12/03/2020, 7:25 PM
to be clear this destination was one of the first one we built, mostly for debugging.
u

user

12/03/2020, 7:25 PM
Ideally we would do smth similar to the source file but for destinations
u

user

12/03/2020, 7:25 PM
yeah. if our goal here is for someone to test us out, new line delimited json may be easier to work with.
u

user

12/03/2020, 7:26 PM
so we can easily write to any store
u

user

12/03/2020, 7:26 PM
but we will need to answer the question of what is the expected behavior regarding files: overwrite or versioned?
u

user

12/03/2020, 7:27 PM
and how does that play in a world with full refreshes and incrementals
u

user

12/03/2020, 7:30 PM
full refresh as we have defined it is very clearly overwriting existing data. if you'd like this destination to act differently, i think you really need to want it.
u

user

12/03/2020, 7:30 PM
versioning is not in the vocabulary right now at all.
u

user

12/03/2020, 7:31 PM
just to be super clear that doing this is sneaking in a non-trivial new feature and vocabulary.
u

user

12/03/2020, 7:31 PM
agreed
u

user

12/04/2020, 1:25 AM
Thinking more about this issue..for this kind of source (singer that is incremental by default), is there any reason we’d force it to be full refresh? why don’t we say it’s only available for incremental sync?
u

user

12/04/2020, 1:26 AM
and for any destination which supports incremental, it should look at the incoming catalog and sync correctly
u

user

12/04/2020, 1:26 AM
I know we said this morning we’ll force it to full refresh just to minimize delivery risk
5 Views