# contributing-to-airbyte
@charles I'm running into some problems with google adwords -> postgres
u
2020-11-11 03:00:31 ERROR (/tmp/workspace/32/0) LineGobbler(voidCall):69 - Caused by: org.postgresql.util.PSQLException: ERROR: relation "click_performance_report_1605063619055" does not exist
u
however, `CLICK_PERFORMANCE_REPORT_1605063619055` does exist
u
all of the performance reports are all caps
u
campaigns/ad_groups/ads/accounts are all snake case
u
this is without normalization
u
maybe this could be related to the capitalization problems @s is seeing?
u
I think this might be a casing issue
u
capitalization
u
do you think this is in the same area you're already looking into?
u
damn
u
or want me to look into it
u
i think it’s related.
u
but different.
u
sources need to be responsible for passing valid stream and field names.
u
but not all valid stream and field names are going to be valid in a destination
u
and a destination should be responsible for making sure that all valid stream and field names can be handled
u
which it sounds like they don’t right now.
u
it seems like the postgres destination needs to be adjusted to handle this capitalization case.
u
unless we want to take a stance that stream names and fields names are always all caps 🤷
u
i think that’s probably a bad idea
u
if everything for postgres is properly quoted, it should support upper case names
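for context: postgres folds unquoted identifiers to lowercase, so an unquoted reference can never match a table that was created with quoted upper case in its name. a minimal jOOQ sketch of the difference (assuming jOOQ 3.11+, not the actual destination code):

```java
import org.jooq.SQLDialect;
import org.jooq.impl.DSL;

public class PostgresCaseDemo {
    public static void main(String[] args) {
        // Quoted identifier: postgres preserves the exact case, so the
        // table must be referenced with the same quoting everywhere.
        System.out.println(DSL.using(SQLDialect.POSTGRES)
                .render(DSL.table(DSL.name("CLICK_PERFORMANCE_REPORT"))));
        // -> "CLICK_PERFORMANCE_REPORT"

        // Unquoted identifier: postgres folds it to lowercase at
        // resolution time, i.e. it resolves to click_performance_report
        // and misses the quoted upper-case table above.
        System.out.println(DSL.using(SQLDialect.POSTGRES)
                .render(DSL.table(DSL.unquotedName("CLICK_PERFORMANCE_REPORT"))));
        // -> CLICK_PERFORMANCE_REPORT
    }
}
```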
u
maybe i’m not understanding the problem.
u
I guess it's a question of if we want to normalize names or support case sensitive names
u
why is it capitalized sometimes and not others in this postgres case?
u
I'm not sure, it might just be that the source streams happen to be that way
u
i’m leaning towards not supporting case sensitive names.
u
the source is outputting capitalized streams in my case:

https://user-images.githubusercontent.com/6246757/98760749-c5716700-2388-11eb-9b1d-d58cfde9575f.png

u
these are dumb issues to have.
u
woof. that’s so whacky.
u
I think this means we aren't using jooq escaping properly in the destination for names
u
i vote case insensitive.
u
meaning we auto lower case everything?
u
or that we reject uppercase?
u
• sources need to be able to only return, and to accept, either all caps or all lower case. don’t care which one.
• destinations need to be able to convert from all lower case or all upper case to whatever case they need.
u
tu so destination needs to be able to accept
u
what do we want uppercase or lower case?
u
lower case?
u
yes
u
so then sources can only emit lower case stream names and field names
u
why either all caps or all lower?
u
I think it has to happen all in the destination though
u
plus whatever restrictions we put on them around whitespace and whatever.
u
destinations assume all inputs are all lower case and adjust them to whatever they need them to be.
u
I think it's safer if the destination is fully responsible for interpreting the stream name
u
^
u
I think it’s simpler if we just say no matter what a source outputs, the destination will be responsible for handling it correctly
u
although I'm worried that the schema selection won't match then
u
yeah. i’m worried what the airbyte catalog is going to look like if we don’t enforce sanity in the sources.
u
i do agree destinations should be cautious and assume the worst. no disagreement there
u
but i think we also need to enforce what the source is allowed to emit
u
and we should be strict so that we don’t have to handle stupid edge cases in our configuration model.
u
yeah, my most recent statement was just w.r.t casing, not special characters
u
i want to enforce casing in airbyte catalog too is what i’m saying
u
i don’t want sources to emit any non-lowercase characters.
u
is that cool with you?
u
so that means at the python entrypoint.py level we'd want to standardize naming?
u
that is cool with me. I like the idea of enforcing naming
u
that would probably tide us over on the destination side for release too
u
what do we do if a source has `aBc` and `ABc` streams?
u
for the release I mean
u
deterministically pick one.
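one hedged sketch of what “deterministically pick one” could mean (hypothetical helper, not what airbyte actually does): lowercase everything, walk the original names in sorted order so the result is stable, and suffix later collisions:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class CollisionDemo {
    // Hypothetical: map original stream names to lower-cased names,
    // suffixing collisions in sorted (hence deterministic) order.
    static Map<String, String> resolveCollisions(List<String> streams) {
        Map<String, String> resolved = new HashMap<>();
        Map<String, Integer> seen = new HashMap<>();
        for (String original : new TreeSet<>(streams)) {
            String lower = original.toLowerCase();
            int n = seen.merge(lower, 1, Integer::sum);
            resolved.put(original, n == 1 ? lower : lower + "_" + n);
        }
        return resolved;
    }

    public static void main(String[] args) {
        // "ABc" sorts before "aBc", so the mapping is always the same:
        // ABc -> abc, aBc -> abc_2
        System.out.println(resolveCollisions(List.of("aBc", "ABc")));
    }
}
```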
u
so i actually don’t think stringcase works.
u
for this
u
i don’t think any of our infrastructure can normalize for the integration
u
the integration needs to do it right.
u
we can't do that for the release though
u
if the field name that the source emits in discover is `fooBar` and then entrypoint.py sneakily normalizes it to `foobar`, when `foobar` is passed back the source won’t know that it’s the same thing as `fooBar`.
u
ah. i see what you’re saying.
u
just as a stop gap.
u
yeah
u
i mean as far as i know there’s only one source that causes problems right now. google sheets.
u
i can fix that one tomorrow
u
sounds like maybe adwords too.
u
can we just fix those two and not mess with entrypoint.py?
u
and salesforce? or was that something else
u
I think it can happen in sql sources too
u
depending on the data
u
yeah…. i think if we put a stop gap thing in entrypoint.py we will cause new bugs
u
yeah...
u
i think we try our best to get by with what we have now. fix the sources we know are really bad and then gun for doing the right thing after release as a top priority.
u
is it actually hard to support all casings in the jdbc destinations?
u
is there a reason we can't just quote all table names and call it a day?
u
you’re just talking for sources?
u
no in the destinations
u
basically support all cases
u
🤦‍♀️ right. dumb question.
u
i don’t know enough off the top of my head to know if that will work.
u
yeah I don't want to make any decisions on this at this stage
u
going to sleep on it and revisit in the morning
u
🤝
u
maybe poke around a bit to see how we're quoting
u
🥨 🥨 🥨 🥨
u
agreed.
u
thought about this some more this morning. i think a lot of my thinking was wrong.
u
i think we should avoid touching sources at all pre-release if we can.
u
i think investing in getting destinations to normalize stuff is generally worthwhile.
u
priority of work from my point of view now is as follows:
• (pre-release) add to standard tests a data set that includes streams / fields with spaces and weird casing in the name. get all destinations (with normalization) to succeed given these cases
• (pre-release or right after release) add to the sync worker something that applies a normalization to stream and field names before they get to destinations (remove whitespace, strip out special characters; a rough sketch of this follows below)
• (post-release) at least highly encourage that sources conform to a certain set of rules, but maybe don’t fail on them.
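the sketch of that sync-worker normalization (names here are made up, not actual airbyte code): lowercase, turn whitespace into underscores, drop remaining special characters:

```java
import java.util.Locale;

public class NameNormalizer {
    // Hypothetical sketch of step 2: lowercase, whitespace -> "_",
    // then drop anything outside [a-z0-9_].
    static String normalize(String name) {
        return name.toLowerCase(Locale.ROOT)
                .replaceAll("\\s+", "_")
                .replaceAll("[^a-z0-9_]", "");
    }

    public static void main(String[] args) {
        System.out.println(normalize("Click Performance Report")); // click_performance_report
        System.out.println(normalize("fooBar"));                   // foobar
    }
}
```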
u
i could be convinced that i have the order of steps 1 and 2 wrong. what do you think?
u
Pre release we can add quoting to identifiers in the Postgres destination too
u
I think it’s easier to support case sensitivity short term than it is to consider name normalization
u
i came to this ordering because destinations will need to do some name normalization on what comes in no matter what: either they receive any possible string, or they receive a string that is already normalized to whatever standard airbyte sets (but not necessarily what the db supports). it will also be easy to test this and give us high confidence it’s working in the standard tests.
u
i have a meeting with zazmic and need to review their code now, but i can work on adding the extra data set to the standard tests once i’m done unless someone beats me to it.
u
jared, sounds like you’re focusing on fixing the pg stuff?
u
Yeah I think it’s a short fix, I’ll start on it in 10-15min
u
kk
u
> Pre release we can add quoting to identifiers in the Postgres destination too
That's what I end up doing in the normalization pieces with DBT, with the default `quoting` parameters set to true in the sources.yml file, but we can actually play with those settings to disable quoting or not...
u
going to publish postgres with this change
u
hotfixing the current version
u
tu
u
done
u
Exception in thread "main" org.jooq.exception.DataAccessException: SQL [insert into streamWithCamelCase_1605118306833 (ab_id, data, emitted_at) values (?, cast(? as jsonb), cast(? as timestamp with time zone))]; ERROR: relation "streamwithcamelcase_1605118306833" does not exist
u
i’m getting this error. should your change have fixed this?
u
or was that just for whitespace?
u
@Jared Rhizor (Airbyte)
u
might fix this
u
did you pull the latest version?
u
yes
u
I would expect the logged sql to include the quotes then
u
→ docker image ls airbyte/destination-postgres:0.1.1
REPOSITORY                     TAG                 IMAGE ID            CREATED             SIZE
airbyte/destination-postgres   0.1.1               2d00d190bca9        28 minutes ago      440MB
u
Can you verify you're on that image?
u
And also, is the test you're running based off of a dev image or 0.1.1? If it's off of dev, have you merged master into your test branch?
u
yes to all of those things.
u
i’m using :dev tag
u
and your change is included
u
oh this is the insert
u
so not impacted by my change
u
and this is postgres?
u
I thought insert would quote
u
maybe there's some jooq setting
u
By default, jOOQ will always generate quoted names for all identifiers
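for reference, that default is controllable through jOOQ’s render settings; a sketch assuming jOOQ 3.12+ (where `RenderQuotedNames` lives in `org.jooq.conf`):

```java
import org.jooq.SQLDialect;
import org.jooq.conf.RenderQuotedNames;
import org.jooq.conf.Settings;
import org.jooq.impl.DSL;

public class QuoteSettingsDemo {
    public static void main(String[] args) {
        // ALWAYS forces quotes onto every identifier, even ones built
        // with DSL.unquotedName(...).
        Settings always = new Settings()
                .withRenderQuotedNames(RenderQuotedNames.ALWAYS);
        System.out.println(DSL.using(SQLDialect.POSTGRES, always)
                .render(DSL.table(DSL.name("streamWithCamelCase_1605118306833"))));
        // -> "streamWithCamelCase_1605118306833"
    }
}
```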
u
i’m going to get the new standard tests in a working state and then we can just divide up destinations and get them passing.
u
do you want me to include field names that include special characters, e.g. `field*name*with*asterisk*`, or is that overkill for right now?
u
I think case sensitivity with _ is all we've seen and all we reasonably need to support
u
google sheets regularly has spaces too.
u
I'm fine failing if anyone does something `super$nonstandard!@`
oh
u
hmm
u
i’ll do spaces but not asterisk.
u
google sheets probably has special characters too
u
cool?
u
oh yeah. i guess $ could be a thing
u
i mean how much extra work is it to normalize everything that’s not a normal character to an underscore for our existing destinations?
u
if it’s little extra work then we should just do it now.
u
maybe?
u
I mean we can replace special chars with an `_`
u
but then it’s ambiguous because `x$*y == x*$y`
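concretely, with a naive replace-with-underscore rule (a sketch, not actual airbyte code), both names collapse to the same identifier and the original can’t be recovered:

```java
public class AmbiguityDemo {
    public static void main(String[] args) {
        // Both print x__y, so x$*y and x*$y become indistinguishable.
        System.out.println("x$*y".replaceAll("[^a-zA-Z0-9_]", "_"));
        System.out.println("x*$y".replaceAll("[^a-zA-Z0-9_]", "_"));
    }
}
```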
u
are there any destinations that don't support special characters?
u
snowflake/postgres do?
u
i don’t know.
u
bigquery does too
u
and i guess the filesystem is pretty tolerant for the csv stuff.
u
yeah
u
I guess we should just add some special character tests then
u
agreed
u
ok
u
i’ll do this.
u
@s re-requested your review since it changed kinda substantially from when you looked at it an hour ago.
u
plan is to merge this. it will break integration tests for bq, snowflake, and pg, and then we split them up and fix them.