I'm wondering how it is that you're not defaulting...
# contributing-to-airbyte
h
I'm wondering how it is that you're not defaulting to e.g. aiohttp or some other async http client in the python CDK?
u
The honest answer is I was so caught up in figuring out the right abstraction that it did not cross my mind while implementing it 😛
u
Is there a particular client you’d recommend? We’re working through v2 of the CDK these days, and it would be good to add performance enhancements like this
u
I’ve been trying the one I mentioned and it’s actually really nice
y
@s Please let me know when you start working on cdk v2 and perhaps we can collaborate?
u
@haf thinking through this once more, actually: this could potentially go in the current version if we can do it in a backwards-compatible way. Is it compatible with the `requests` interface?
u
I think its methods are named the same, but the return type changes to a coroutine
u
I’ll have to take another look. My use case is to run multiple customers in one process
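To make the "same method names, coroutine return type" point and the multi-customer use case concrete, here is a minimal stdlib-only sketch. `fetch` is a hypothetical stand-in for an async HTTP call (it is not an aiohttp or CDK function), and the customer names are made up:

```python
import asyncio


async def fetch(customer: str) -> dict:
    """Hypothetical stand-in for an async HTTP request for one customer."""
    await asyncio.sleep(0.01)  # simulates waiting on the network
    return {"customer": customer, "status": 200}


async def sync_all(customers: list) -> list:
    # All customers are polled concurrently inside a single process.
    return await asyncio.gather(*(fetch(c) for c in customers))


def run_demo():
    # Calling an async function returns a coroutine, not a response,
    # which is why a drop-in swap for a blocking client is not transparent.
    pending = fetch("acme")
    is_coro = asyncio.iscoroutine(pending)
    pending.close()  # discard it cleanly instead of leaving it un-awaited

    results = asyncio.run(sync_all(["acme", "globex"]))
    return is_coro, results
```

Existing callers that expect a response object would need to `await` the call, so backwards compatibility probably means offering the async path alongside the blocking one rather than replacing it.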
u
I see
u
I’m assuming you took a look at the HTTP stream already: https://github.com/airbytehq/airbyte/blob/master/airbyte-cdk/python/airbyte_cdk/sources/streams/http/http.py
Some potentially related issues, some of which are in the ideation/design phase:
• https://github.com/airbytehq/airbyte/issues/3284
• https://github.com/airbytehq/airbyte/issues/2787 <-- this one is maybe the most relevant to your use case, but it’s also one of the harder ones to get right, since it doesn’t play so nicely with the slices concept as it’s implemented today
u
I would also allow connectors to yield a new, updated `configured catalog` (alt: just `catalog`), and I would separate `tokens` (secrets) from the config and from the state
u
If you're going to support #5084 you probably will need a leaky-bucket implementation, such as https://github.com/dmarkey/aiopylimit
u
I also noticed there's a bunch of strangeness around datetime parsing: all of it assumes some "server time" or "non-local" dates (no offset, no timezone), when IRL you want to get reports localised to the timezone your advertising uses (e.g. "before noon on black friday")
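A small sketch of the difference, using only the standard library. The fixed UTC-5 offset and the helper names are assumptions for illustration, not anything from the CDK; a real connector would more likely take an IANA timezone name from its config:

```python
from datetime import datetime, timedelta, timezone

# Assumed advertiser timezone: a fixed UTC-5 offset, purely for illustration.
ADVERTISER_TZ = timezone(timedelta(hours=-5), "EST")


def parse_report_time(raw: str, tz: timezone = ADVERTISER_TZ) -> datetime:
    """Parse an ISO 8601 string; treat offset-less values as advertiser-local."""
    parsed = datetime.fromisoformat(raw)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=tz)
    return parsed


def is_before_local_noon(moment: datetime, tz: timezone = ADVERTISER_TZ) -> bool:
    """Check "before noon" in the advertiser's timezone, not in server time."""
    return moment.astimezone(tz).hour < 12


# The same instant, once without an offset and once as an explicit UTC stamp.
naive = parse_report_time("2021-11-26T11:30:00")
utc_stamp = parse_report_time("2021-11-26T16:30:00+00:00")
```

Here `utc_stamp.hour` is 16, so a check done in "server time" would miss a report that is still before noon for the advertiser.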
u
If we do the leaky bucket alternative, we can easily let the CDK spawn a queue per stream (one item in the queue means that one API poll is allowed); and so all streams can run concurrently within the process
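Roughly what I have in mind, as a stdlib-only sketch rather than aiopylimit's API: each stream gets its own `asyncio.Queue` of permits, a refill task drips one permit in per interval, and a poll may only happen after taking a permit. Stream names, intervals, and function names are all made up for the example:

```python
import asyncio


async def refill(bucket: asyncio.Queue, interval: float, count: int) -> None:
    """Drip `count` permits into the bucket, one per `interval` seconds."""
    for _ in range(count):
        await bucket.put(None)  # blocks while the bucket is full
        await asyncio.sleep(interval)


async def poll_stream(name: str, bucket: asyncio.Queue, polls: int, log: list) -> None:
    """Perform `polls` API polls, each one gated on a permit from the bucket."""
    for i in range(polls):
        await bucket.get()  # one item in the queue == one poll allowed
        log.append((name, i))  # stand-in for the real API request


async def main() -> list:
    log = []
    streams = {"ads": 3, "campaigns": 2}  # hypothetical streams and poll counts
    tasks = []
    for name, polls in streams.items():
        bucket = asyncio.Queue(maxsize=1)  # one queue per stream
        tasks.append(refill(bucket, 0.01, polls))
        tasks.append(poll_stream(name, bucket, polls, log))
    await asyncio.gather(*tasks)  # all streams run concurrently in one process
    return log
```

Because each stream waits only on its own queue, a slow or heavily limited stream does not hold up the others.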
y
I would also go down the route of middlewares (https://us-pycon-2019-tutorial.readthedocs.io/aiohttp_middlewares.html) rather than inheritance. That way, you can still compose the middlewares with inheritance in the end, but you can also use each one on individual requests
u
This also means you can build a library of middlewares that compose, including a rate-limiting middleware
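A minimal sketch of that idea, again stdlib-only. This is a generic "middleware wraps the next handler" pattern, not aiohttp's own middleware signature, and `compose`, `add_auth`, `rate_limit`, and `fake_send` are hypothetical names:

```python
import asyncio
from typing import Awaitable, Callable, Dict

Request = Dict[str, str]
Response = Dict[str, object]
Handler = Callable[[Request], Awaitable[Response]]
Middleware = Callable[[Request, Handler], Awaitable[Response]]


def _bind(middleware: Middleware, nxt: Handler) -> Handler:
    async def wrapped(request: Request) -> Response:
        return await middleware(request, nxt)
    return wrapped


def compose(handler: Handler, *middlewares: Middleware) -> Handler:
    """Wrap `handler` so that the first middleware listed runs outermost."""
    for middleware in reversed(middlewares):
        handler = _bind(middleware, handler)
    return handler


async def add_auth(request: Request, nxt: Handler) -> Response:
    # Placeholder token; a real connector would read it from its secrets.
    return await nxt({**request, "Authorization": "Bearer <token>"})


def rate_limit(min_interval: float) -> Middleware:
    """Space calls at least `min_interval` seconds apart (sequential use only)."""
    last_call = float("-inf")

    async def middleware(request: Request, nxt: Handler) -> Response:
        nonlocal last_call
        loop = asyncio.get_running_loop()
        wait = last_call + min_interval - loop.time()
        if wait > 0:
            await asyncio.sleep(wait)
        last_call = loop.time()
        return await nxt(request)

    return middleware


async def fake_send(request: Request) -> Response:
    """Stand-in for the actual HTTP call."""
    return {"status": 200, "sent": request}


client = compose(fake_send, rate_limit(0.01), add_auth)


async def demo():
    first = await client({"url": "https://example.com/a"})
    second = await client({"url": "https://example.com/b"})
    return first, second
```

Each middleware is an ordinary function, so the same `rate_limit` or `add_auth` can be reused across connectors or applied to a single request without subclassing anything.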