# connector-development
hello, I am having problems pulling data from an API. When I retrieve only 10,000 objects, the EC2 instance dies. I tried the same thing locally using Python `requests` and it works well in around 3 minutes. My AWS EC2 has: 30 GB SSD, 4 GB RAM, 2 vCPU. In addition, when I pull 1,000 records using Airbyte, the size is too big.
```python
from typing import Any, Iterable, Mapping, MutableMapping, Optional

import requests
from airbyte_cdk.sources.streams.http import HttpStream


class ServicesnowApi(HttpStream):
    url_base = "https://.com/api/now/v1/"

    # Set this as a noop.
    primary_key = None

    def __init__(self, limit: str, sys_created_from: str, sys_created_to: str, **kwargs):
        super().__init__(**kwargs)
        # Store the connector inputs so the stream can use them below.
        self.limit = limit
        self.sys_created_from = sys_created_from
        self.sys_created_to = sys_created_to

    def path(self, **kwargs) -> str:
        # This defines the path to the endpoint that we want to hit,
        # filtering incidents by creation date and capping the result size.
        limit = self.limit
        sys_created_from = self.sys_created_from
        sys_created_to = self.sys_created_to
        return (
            f"table/incident?sysparm_offset=0&sysparm_limit={limit}"
            f"&sysparm_query=sys_created_on>={sys_created_from} 08:00"
            f"^sys_created_on<{sys_created_to} 08:00^active=ISNOTEMPTY"
        )

    def request_params(
            self,
            stream_state: Mapping[str, Any],
            stream_slice: Mapping[str, Any] = None,
            next_page_token: Mapping[str, Any] = None,
    ) -> MutableMapping[str, Any]:
        # Pass the limit and date range as query params on every request.
        return {
            "limit": self.limit,
            "sys_created_from": self.sys_created_from,
            "sys_created_to": self.sys_created_to,
        }

    def parse_response(
            self,
            response: requests.Response,
            stream_state: Mapping[str, Any],
            stream_slice: Mapping[str, Any] = None,
            next_page_token: Mapping[str, Any] = None,
    ) -> Iterable[Mapping]:
        # Return the whole JSON body as a single record.
        return [response.json()]

    def next_page_token(self, response: requests.Response) -> Optional[Mapping[str, Any]]:
        # This implementation does not paginate, so we return None to indicate
        # that there will never be any more pages in the response.
        return None
```
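[Editor's note: one likely cause of the memory blow-up above is fetching all 10,000 records in a single request. A hedged sketch of offset-based paging logic for a ServiceNow-style table API, written as a plain function so it is easy to drop into `next_page_token`; the name `next_offset` and the `page_size` parameter are illustrative, not from the thread:]

```python
# Sketch: sysparm_offset paging logic for a ServiceNow-style table API.
# Assumes the API wraps records in a {"result": [...]} envelope and that
# a page shorter than `page_size` signals the last page.

from typing import Any, Mapping, Optional


def next_offset(
    current_offset: int,
    page_size: int,
    payload: Mapping[str, Any],
) -> Optional[int]:
    """Return the next sysparm_offset, or None when there are no more pages."""
    records = payload.get("result", [])
    if len(records) < page_size:
        return None  # short (or empty) page: we have reached the end
    return current_offset + page_size


# A full page of 100 records means there may be more data after offset 100.
print(next_offset(0, 100, {"result": [{}] * 100}))   # 100
print(next_offset(100, 100, {"result": [{}] * 40}))  # None
```

Returning the next offset from `next_page_token` (and using it in `request_params`) lets the connector stream pages of, say, 100 records instead of holding 10,000 in memory at once.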
Hi @Daniel Eduardo Portugal Revilla, your EC2 instance is probably too small; please give it more memory and CPU, which should avoid the DEADLINE_EXCEEDED error. We have a guide about scaling Airbyte which has some sizing suggestions. You can try something like 16 GB.
but... 7,000 records are equal to about 50 MB, and 10,000 maybe 66 MB? I think that is smaller.
Hey, the resources are needed by the built-in systems (temporal, server, scheduler), so usage doesn't map exactly to the sync itself.
Can you try the same thing after increasing the resources?
Hello @Harshith (Airbyte), how much? This is only for one API; what happens if I have more APIs? For example, this API has about 4,000,000 records, and I need to pull all the historical data first.
I would suggest trying a 16 GB one.
Is an incremental stream or stream_slices a better option? Or page_size?
If you want incremental sync with respect to the data, you can use stream_slices; page_size is with respect to the API, right, if I am not wrong?
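[Editor's note: for the 4,000,000-record backfill mentioned above, `stream_slices` can split the date range into small windows so each request stays bounded. A hedged sketch of the windowing logic as a plain generator; the function name `date_slices` and the 7-day window are illustrative assumptions:]

```python
# Sketch: split a date range into consecutive windows for stream_slices.
# Each slice carries the same keys the connector already uses
# (sys_created_from / sys_created_to); the 7-day window is an assumption.

from datetime import date, timedelta
from typing import Iterator, Mapping


def date_slices(start: date, end: date, window_days: int = 7) -> Iterator[Mapping[str, str]]:
    """Yield consecutive [from, to) date windows covering start..end."""
    cursor = start
    while cursor < end:
        upper = min(cursor + timedelta(days=window_days), end)
        yield {
            "sys_created_from": cursor.isoformat(),
            "sys_created_to": upper.isoformat(),
        }
        cursor = upper


# Example: January 2022 split into weekly windows (the last one is shorter).
slices = list(date_slices(date(2022, 1, 1), date(2022, 1, 31)))
print(len(slices))  # 5
```

Each slice is then passed to `request_params` / `path` via `stream_slice`, so one huge historical pull becomes many small, retryable requests.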
The Airbyte result is this... Should I extract the records from
`"result": [...]`
? Maybe this doesn't fit the schema; all results are returned inside this array.
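[Editor's note: if each record is supposed to match the stream's schema, `parse_response` can unwrap the envelope instead of returning the whole body as one record. A minimal sketch, assuming the ServiceNow-style `{"result": [...]}` envelope seen above; `parse_result_envelope` is an illustrative name:]

```python
# Sketch: unwrap a {"result": [...]} envelope so each element is emitted
# as its own record against the stream schema, instead of one giant record.

from typing import Any, Iterable, Mapping


def parse_result_envelope(payload: Mapping[str, Any]) -> Iterable[Mapping[str, Any]]:
    """Yield individual records from the "result" array (empty if absent)."""
    yield from payload.get("result", [])


records = list(parse_result_envelope({"result": [{"sys_id": "a"}, {"sys_id": "b"}]}))
print(records)  # [{'sys_id': 'a'}, {'sys_id': 'b'}]
```

Inside the stream class this would become `yield from response.json().get("result", [])` in `parse_response`, which also keeps memory flat because records are streamed rather than collected into one list.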