# connector-development
Hello, I am having problems pulling data from an API. When I set it to retrieve only 10,000 objects, the EC2 instance dies. I tried the same thing locally using Python requests and it works well, in around 3 minutes. My AWS EC2 instance has a 30 GB SSD, 4 GB RAM, and 2 vCPUs. In addition, when I pull 1,000 records using Airbyte, the size is too big.
class ServicesnowApi(HttpStream):
    url_base = "https://.com/api/now/v1/"

    # Set this as a noop.
    primary_key = None

    def __init__(self, limit: str, sys_created_from: str, sys_created_to: str, **kwargs):
        # Initialize the HttpStream base class before storing our own settings.
        super().__init__(**kwargs)
        # Here's where we set the variable from our input to pass it down to the source.
        self.limit = limit
        self.sys_created_from = sys_created_from
        self.sys_created_to = sys_created_to

    def path(self, **kwargs) -> str:
        # This defines the path to the endpoint that we want to hit.
        limit = self.limit
        sys_created_from = self.sys_created_from
        sys_created_to = self.sys_created_to
        return f"table/incident?sysparm_offset=0&sysparm_limit={limit}&sysparm_query=sys_created_on>={sys_created_from} 08:00^sys_created_on<{sys_created_to} 08:00^active=ISNOTEMPTY"

    def request_params(
            self,
            stream_state: Mapping[str, Any],
            stream_slice: Mapping[str, Any] = None,
            next_page_token: Mapping[str, Any] = None,
    ) -> MutableMapping[str, Any]:
        # Pass the limit and the creation-date window as query parameters.
        limit = self.limit
        sys_created_from = self.sys_created_from
        sys_created_to = self.sys_created_to
        return {"limit": limit, "sys_created_from":sys_created_from, "sys_created_to":sys_created_to}

    def parse_response(
            self,
            response: requests.Response,
            stream_state: Mapping[str, Any],
            stream_slice: Mapping[str, Any] = None,
            next_page_token: Mapping[str, Any] = None,
    ) -> Iterable[Mapping]:
        # The response is a simple JSON whose schema matches our stream's schema exactly,
        # so we just return a list containing the response.
        return [response.json()]

    def next_page_token(self, response: requests.Response) -> Optional[Mapping[str, Any]]:
        # Pagination is not implemented here: returning None indicates
        # that there will never be any more pages in the response.
        return None
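The stream above asks for everything in one request (`sysparm_offset=0`, and `next_page_token` always returns `None`), so the full response is parsed in one go. A minimal sketch of offset-based paging follows, assuming the API honors the `sysparm_offset` and `sysparm_limit` parameters already used in `path` and returns its records under a `"result"` key, as described later in this thread. `next_offset`, `read_all`, and `fake_fetch_page` are hypothetical names for illustration, not Airbyte or ServiceNow functions, and the fake table stands in for the real HTTP call.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

Page = Dict[str, Any]


def next_offset(current_offset: int, page_size: int, records_in_page: int) -> Optional[int]:
    """Return the offset of the next page, or None when the last page was short."""
    if records_in_page < page_size:
        return None
    return current_offset + page_size


def read_all(fetch_page: Callable[[int, int], Page], page_size: int) -> Iterator[Dict[str, Any]]:
    """Yield records page by page instead of loading everything in one request."""
    offset: Optional[int] = 0
    while offset is not None:
        payload = fetch_page(offset, page_size)
        records: List[Dict[str, Any]] = payload.get("result", [])
        yield from records
        offset = next_offset(offset, page_size, len(records))


# Stand-in for the real HTTP request (assumption): serves 25 fake records in pages.
FAKE_TABLE = [{"number": f"INC{i:07d}"} for i in range(25)]


def fake_fetch_page(offset: int, limit: int) -> Page:
    return {"result": FAKE_TABLE[offset:offset + limit]}


all_records = list(read_all(fake_fetch_page, page_size=10))
```

In an `HttpStream`, the same idea would probably live in `next_page_token` (returning the next offset, or `None` to stop) and `request_params` (sending that offset), so that only one page is held in memory at a time.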
Hi @Daniel Eduardo Portugal Revilla, your EC2 instance is probably too small. Please give it more memory and CPU; this will probably avoid the DEADLINE_EXCEEDED error. We have a guide about scaling Airbyte with some sizing suggestions. You can try something like 16 GB.
But... 7,000 records equal about 50 MB, and 10,000 maybe 66 MB? I think that is small.
Hey, the resources are needed by the prebuilt systems (Temporal, server, scheduler), so the requirement doesn't map exactly to the size of the sync.
Can you try the same sync after increasing the resources?
Hello @Harshith (Airbyte), how much? This is only for one API; what happens if I have more APIs? For example, this API has about 4,000,000 records, and I need to pull all the historical data first.
I would suggest trying a 16 GB one.
Is an incremental stream or stream_slices a better option? Or page_size?
If you want incremental sync with respect to the data, you can use stream_slices. page_size is with respect to the API, right, if I am not wrong?
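To illustrate the stream_slices idea, here is a minimal sketch that splits a creation-date range into smaller windows, so that each request covers only a few days of history. The `date_slices` helper, the example dates, and the three-day window are assumptions for illustration; the keys reuse the `sys_created_from` / `sys_created_to` names from the stream above.

```python
from datetime import date, timedelta
from typing import Dict, Iterator


def date_slices(start: date, end: date, days_per_slice: int = 1) -> Iterator[Dict[str, str]]:
    """Split [start, end) into consecutive windows of at most days_per_slice days."""
    if days_per_slice < 1:
        raise ValueError("days_per_slice must be at least 1")
    step = timedelta(days=days_per_slice)
    window_start = start
    while window_start < end:
        # The last window is clipped so it never runs past the end date.
        window_end = min(window_start + step, end)
        yield {
            "sys_created_from": window_start.isoformat(),
            "sys_created_to": window_end.isoformat(),
        }
        window_start = window_end


# Hypothetical example range: one week split into three-day windows.
slices = list(date_slices(date(2022, 1, 1), date(2022, 1, 8), days_per_slice=3))
```

Each dictionary could then be returned from the stream's `stream_slices` method and used to build the `sysparm_query` for that window. Because each window's end date is the next window's start date, the windows line up with the `>=` / `<` comparison already used in `path`.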
The Airbyte result is this... Should I extract the records from
"result": [...]
? Maybe this doesn't fit the schema; all results are returned inside this array.
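As a sketch of what extracting the records could look like: returning `[response.json()]` emits a single record that contains the whole array, whereas iterating over `"result"` emits one record per item. The `iter_records` helper and the sample field values are made up for illustration; only the top-level `"result"` key comes from the message above.

```python
from typing import Any, Dict, Iterator


def iter_records(payload: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield each item of the top-level "result" array as its own record."""
    for record in payload.get("result", []):
        yield record


# Sample payload with made-up values, shaped like the response described above.
sample = {"result": [{"number": "INC0000001"}, {"number": "INC0000002"}]}

whole_payload_as_one_record = [sample]  # what `return [response.json()]` produces
records = list(iter_records(sample))    # one record per item in "result"
```

Inside `parse_response`, this would likely amount to `yield from response.json().get("result", [])`, with the stream's schema describing a single incident rather than the wrapper object.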