# connector-development
j
Hello, what are Airbyte’s guidelines for dealing with API quota errors? For instance, we are struggling with Google Analytics because it has strict quota limits. We are running incremental syncs, and when a sync fails:
• The state of the completed streams is saved, but the state of the partially synced stream is not.
I believe there are potential improvements:
• The state of the partially synced stream should be saved, so that the next run does not start from the beginning.
• The connector could sleep and regularly check whether the quota has been restored, instead of failing.
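The second idea (sleep and re-check instead of failing) could look roughly like the minimal Python sketch below. It is not Airbyte CDK code: `QuotaExceeded`, `call_with_quota_wait`, the 15-minute poll interval, and the 24-hour cap are hypothetical choices for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class QuotaExceeded(Exception):
    """Hypothetical error raised when the API reports an exhausted quota."""


def call_with_quota_wait(
    request: Callable[[], T],
    poll_interval_s: float = 15 * 60,
    max_wait_s: float = 24 * 60 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`; on a quota error, sleep and poll again until the quota is back.

    Gives up (re-raising, i.e. the sync fails as it does today) once the next
    sleep would push the total waiting time past `max_wait_s`.
    """
    waited = 0.0
    while True:
        try:
            return request()
        except QuotaExceeded:
            if waited + poll_interval_s > max_wait_s:
                raise
            sleep(poll_interval_s)  # injectable so tests do not actually sleep
            waited += poll_interval_s
```

The cap matters: without it, a sync whose quota never comes back would hang forever instead of failing visibly.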
a
Hi @Jaime Farres, generally speaking our connectors are able to persist the state of a partially synced stream thanks to checkpointing logic. This allows the next run to start from the latest stored cursor and avoids re-reading the same data after a partial failure. Connectors can also implement an exponential backoff and retry strategy to respect API quotas. Connector developers do not always implement both of these features, so let me check the current situation for Google Analytics.
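A minimal, hypothetical Python sketch of per-slice checkpointing: a STATE message is emitted after each completed daily slice, so if a later slice fails, the next run can resume from the last emitted cursor. The message tuples, the `date` cursor key, and `read_with_checkpoints` are illustrative only, not the Airbyte protocol or CDK classes, and the sketch assumes the platform keeps the last state emitted before the failure.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple, Union

State = Dict[str, str]
Message = Tuple[str, Union[dict, State]]


def read_with_checkpoints(
    slices: Iterable[str],
    fetch: Callable[[str], List[dict]],
    state: State,
) -> Iterator[Message]:
    """Yield ("RECORD", record) messages, then a ("STATE", ...) checkpoint after each slice.

    `slices` are ISO dates (YYYY-MM-DD), so string comparison matches date order.
    """
    cursor = state.get("date", "")
    for day in slices:
        if day <= cursor:
            continue  # already synced by a previous run
        for record in fetch(day):
            yield ("RECORD", record)
        cursor = day
        yield ("STATE", {"date": cursor})  # checkpoint: this slice is complete
```

If the API fails on the third day, the run has already emitted `{"date": "2024-01-02"}`, so a rerun starting from that state fetches only 2024-01-03.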
I dug a bit into the Google Analytics code. There's no custom backoff strategy: when a rate-limiting error occurs, the connector retries 5 times with exponentially increasing sleep times, which is the default behavior. The state is saved every X records (X being your window in days value).
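For comparison, a generic retry loop with exponential backoff and 5 retries looks like the Python sketch below. The `factor` of 5 seconds and all names are assumptions for illustration; they are not taken from the CDK or the Google Analytics connector.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error for an HTTP 429 / rate-limit response."""


def backoff_schedule(max_retries: int = 5, factor: float = 5.0) -> List[float]:
    """Sleep before retry n (0-based) is factor * 2**n seconds."""
    return [factor * 2 ** n for n in range(max_retries)]


def retry_with_backoff(
    request: Callable[[], T],
    max_retries: int = 5,
    factor: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, retrying up to `max_retries` times on RateLimited, then re-raise."""
    delays = backoff_schedule(max_retries, factor)
    for attempt in range(max_retries + 1):
        try:
            return request()
        except RateLimited:
            if attempt == max_retries:
                raise  # retries exhausted: the sync fails
            sleep(delays[attempt])
    raise AssertionError("unreachable")
```

With these numbers the connector waits about 155 seconds in total (5 + 10 + 20 + 40 + 80) before giving up, which covers short-term rate limits but is nowhere near enough to outlast a daily quota.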
You can increase the window in days value in your configuration to reduce the load on the API, but you might get sampled results.
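To see why a larger window lowers the load, here is a Python sketch that splits a date range into windows of `window_in_days` days, assuming (hypothetically) one report request per window and ignoring pagination. The function name is illustrative.

```python
from datetime import date, timedelta
from typing import List, Tuple


def date_windows(start: date, end: date, window_in_days: int) -> List[Tuple[date, date]]:
    """Split [start, end] (inclusive) into consecutive windows of at most window_in_days days."""
    if window_in_days < 1:
        raise ValueError("window_in_days must be >= 1")
    windows = []
    current = start
    while current <= end:
        window_end = min(current + timedelta(days=window_in_days - 1), end)
        windows.append((current, window_end))
        current = window_end + timedelta(days=1)
    return windows
```

For 2024-01-01 to 2024-03-30 (90 days), a 1-day window means 90 requests and a 30-day window means 3; the trade-off is that each larger request covers more data and is more likely to come back sampled.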
j
OK, thanks, but regardless of how it works now, what would be ideal? I work with @Vladimir Remar, so we have experience improving connectors.
a
If possible, you can leverage stream slices for more granular state checkpointing. You could also dig into the Google Analytics API reference to check whether rate-limiting responses contain information about quota consumption, and make the backoff_time and should_retry methods a bit more dynamic.
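A hypothetical Python sketch of more dynamic should_retry and backoff_time logic. These are plain functions loosely modeled on the CDK hooks of the same names, not their real signatures. The `"daily"` marker in the error body is an assumption (the actual wording of Google's quota errors is not confirmed here), and the standard HTTP `Retry-After` header may not be sent by this API at all. Returning None stands for "fall back to the default exponential backoff", which, as far as I know, is also how the CDK treats a None from backoff_time.

```python
from typing import Mapping, Optional

# Hypothetical marker; the real wording of the API's daily-quota error is unverified.
DAILY_QUOTA_MARKER = "daily"


def is_daily_quota_error(status_code: int, body: str) -> bool:
    return status_code == 429 and DAILY_QUOTA_MARKER in body.lower()


def should_retry(status_code: int, body: str) -> bool:
    """Retry transient errors (429, 5xx) but not an exhausted daily quota."""
    if is_daily_quota_error(status_code, body):
        return False  # minutes of backoff cannot outlast a daily reset
    return status_code == 429 or 500 <= status_code < 600


def backoff_time(headers: Mapping[str, str]) -> Optional[float]:
    """Seconds to wait from a numeric Retry-After header, else None.

    None means "use the default exponential backoff". HTTP-date values of
    Retry-After are not handled in this sketch.
    """
    normalized = {key.lower(): value for key, value in headers.items()}
    retry_after = normalized.get("retry-after", "").strip()
    return float(retry_after) if retry_after.isdigit() else None
```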
j
Yes, like the FB API. I have researched a little but found nothing. In fact, the limit we usually run into is the daily limit, which resets at 12am PST.
We’ll see, and maybe open a PR if we find meaningful improvements.
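Since the daily quota resets at midnight Pacific time, a connector that waits instead of failing could compute the sleep from the clock. A minimal Python sketch (3.9+ for `zoneinfo`), assuming "12am PST" means US Pacific time including daylight saving (America/Los_Angeles); if the reset were pinned to UTC-8 year-round, a fixed offset would be needed instead. The function name is hypothetical.

```python
from datetime import datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo

PACIFIC = ZoneInfo("America/Los_Angeles")


def seconds_until_quota_reset(now: datetime) -> float:
    """Seconds from `now` (timezone-aware) until the next midnight in Pacific time."""
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    local = now.astimezone(PACIFIC)
    next_midnight = datetime.combine(local.date() + timedelta(days=1), time(0), tzinfo=PACIFIC)
    # Compare in UTC: subtracting two datetimes that share a tzinfo ignores DST changes.
    return (next_midnight.astimezone(timezone.utc) - now.astimezone(timezone.utc)).total_seconds()
```

A backoff_time built on this could return the value plus a small buffer; keep in mind that a sync sleeping this long stays running, and holds its resources, for up to a day.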