The Local CSV destination has an unexpected behavi...
# contributing-to-airbyte
m
The Local CSV destination has an unexpected behavior
u
it starts by writing to
exchange_rate_1607019960628.csv
and when it finishes the files gets renamed to
exchange_rate_raw.csv
u
@Chris (deprecated profile) do you have an idea of what could cause that?
u
why is that unexpected?
u
this matches the db behavior of creating tmp tables and renaming
u
I would expect this destination to create new files on every sync
u
here it will just erase
u
oh is this incremental?
u
no
u
it just overwrite
u
full refresh we don't expect to overwrite the file?
u
I thought this was intentional to keep the target file "up to date"
u
ok, my bad, 🙂
u
the exchange currency seems to be incremental out of the box
u
ah
u
so the data just gets erased
u
Also do we want local csv files to end with \r\n or just \n
u
right now it is \r\n
u
(that caused me some issues when trying to extract the json blob)
u
do we need to change some behavior in the destination csv then ?
u
I think we need to define them first
u
The way I think about it is that file should be immutable
u
what do you think?
u
regarding the EOL, it should be an option
u
i think if we are saying that this destination does full refresh, then i expect the file to be overwritten.
u
that makes it the same behavior as everything else.
u
everything could be an option.... 🙄
u
we are passing a state
u
when I do a manual sync after the first sync, I only get the latest day,
u
😒 damn. i think state wasn't being passed properly originally so we weren't seeing this behavior. but once we fixed state passing in the worker, this probably crept in.
u
@s - heads up, related to what we were talking about re stripe singer and incremental 😂
u
@Michel what was the issue caused by
\r\n
when extracting the json blob? it seems a little suspicious that this would break a json parser.
u
it doesn't break the json parser
u
but because the data is in acolumn and not a real json, I needed to do some post processing to extract the blob:
Copy code
cat exchange_rate_raw.csv | tail -n +2 | cut -d , -f 3- | sed 's/^.\(.*\)..$/\1/g' | tr -s \" | jq -s .
u
it just made it harder to remove the last double quote
u
it is more a user experience problem here
u
Wouldn' t it be simpler to store as a newline delimited JSON instead of a fake CSV file ?
u
haha 🙂 that's another question indeed
u
It would probably make more sense
u
to be clear this destination was one of the first one we built, mostly for debugging.
u
Ideally we would do smth similar to the source file but for destinations
u
yeah. if our goal here is for someone to test us out, new line delimited json may be easier to work with.
u
so we can easily write to any store
u
but we will need to answer the question of what is the expected behavior regarding files: overwrite or versioned?
u
and how does that play in a world with full refreshes and incrementals
u
full refresh as we have defined it is very clearly overwriting existing data. if you'd like this destination to act differently, i think you really need to want it.
u
versioning is not in the vocabulary right now at all.
u
just to be super clear that doing this is sneaking in a non-trivial new feature and vocabulary.
u
agreed
u
Thinking more about this issue..for this kind of source (singer that is incremental by default), is there any reason we’d force it to be full refresh? why don’t we say it’s only available for incremental sync?
u
and for any destination which supports incremental, it should look at the incoming catalog and sync correctly
u
I know we said this morning we’ll force it to full refresh just to minimize delivery risk