# connector-development
hello, I am having problems pulling data from an API. When I retrieve only 10,000 objects, the EC2 instance dies. I tried the same thing locally using Python `requests` and it works well in around 3 minutes. My AWS EC2 has: 30 GB SSD, 4 GB RAM, 2 vCPU. In addition, when I pull 1,000 records using Airbyte, the size is too big.
```python
from typing import Any, Iterable, Mapping, MutableMapping, Optional

import requests
from airbyte_cdk.sources.streams.http import HttpStream


class ServicesnowApi(HttpStream):
    url_base = "https://.com/api/now/v1/"

    # Set this as a noop.
    primary_key = None

    def __init__(self, limit: str, sys_created_from: str, sys_created_to: str, **kwargs):
        super().__init__(**kwargs)
        # Store the connector inputs so the stream can use them below.
        self.limit = limit
        self.sys_created_from = sys_created_from
        self.sys_created_to = sys_created_to

    def path(self, **kwargs) -> str:
        # This defines the path to the endpoint that we want to hit,
        # filtering incidents by creation date and capping the result size.
        limit = self.limit
        sys_created_from = self.sys_created_from
        sys_created_to = self.sys_created_to
        return (
            f"table/incident?sysparm_offset=0&sysparm_limit={limit}"
            f"&sysparm_query=sys_created_on>={sys_created_from} 08:00"
            f"^sys_created_on<{sys_created_to} 08:00^active=ISNOTEMPTY"
        )

    def request_params(
            self,
            stream_state: Mapping[str, Any],
            stream_slice: Mapping[str, Any] = None,
            next_page_token: Mapping[str, Any] = None,
    ) -> MutableMapping[str, Any]:
        # Pass the limit and date range as query params on every request.
        return {
            "limit": self.limit,
            "sys_created_from": self.sys_created_from,
            "sys_created_to": self.sys_created_to,
        }

    def parse_response(
            self,
            response: requests.Response,
            stream_state: Mapping[str, Any],
            stream_slice: Mapping[str, Any] = None,
            next_page_token: Mapping[str, Any] = None,
    ) -> Iterable[Mapping]:
        # Return the whole JSON body as a single record.
        return [response.json()]

    def next_page_token(self, response: requests.Response) -> Optional[Mapping[str, Any]]:
        # This implementation does not paginate, so we return None to indicate
        # that there will never be any more pages in the response.
        return None
```
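[Editor's note: one likely cause of the memory blow-up above is fetching all 10,000 records in a single request. A hedged sketch of offset-based paging logic for a ServiceNow-style table API, written as a plain function so it is easy to drop into `next_page_token`; the name `next_offset` and the `page_size` parameter are illustrative, not from the thread:]

```python
# Sketch: sysparm_offset paging logic for a ServiceNow-style table API.
# Assumes the API wraps records in a {"result": [...]} envelope and that
# a page shorter than `page_size` signals the last page.

from typing import Any, Mapping, Optional


def next_offset(
    current_offset: int,
    page_size: int,
    payload: Mapping[str, Any],
) -> Optional[int]:
    """Return the next sysparm_offset, or None when there are no more pages."""
    records = payload.get("result", [])
    if len(records) < page_size:
        return None  # short (or empty) page: we have reached the end
    return current_offset + page_size


# A full page of 100 records means there may be more data after offset 100.
print(next_offset(0, 100, {"result": [{}] * 100}))   # 100
print(next_offset(100, 100, {"result": [{}] * 40}))  # None
```

Returning the next offset from `next_page_token` (and using it in `request_params`) lets the connector stream pages of, say, 100 records instead of holding 10,000 in memory at once.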
Hi @Daniel Eduardo Portugal Revilla, your EC2 instance is probably too small; please give it more memory and CPU, which should avoid the DEADLINE_EXCEEDED error. We have a guide about scaling Airbyte which has some sizing suggestions. You can try something like 16 GB.
but... 7,000 records are equal to about 50 MB, and 10,000 maybe 66 MB? I think that is smaller.
Hey, the resources are needed by the built-in systems (temporal, server, scheduler), so usage doesn't map exactly to the sync itself.
Can you try the same thing after increasing the resources?
Hello @Harshith (Airbyte), how much? This is only for one API; what happens if I have more APIs? For example, this API has about 4,000,000 records, and I need to pull all the historical data first.
I would suggest trying a 16 GB one.
Is an incremental stream or stream_slices a better option? Or page_size?
If you want incremental sync with respect to the data, you can use stream_slices; page_size is with respect to the API, right, if I am not wrong?
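[Editor's note: for the 4,000,000-record backfill mentioned above, `stream_slices` can split the date range into small windows so each request stays bounded. A hedged sketch of the windowing logic as a plain generator; the function name `date_slices` and the 7-day window are illustrative assumptions:]

```python
# Sketch: split a date range into consecutive windows for stream_slices.
# Each slice carries the same keys the connector already uses
# (sys_created_from / sys_created_to); the 7-day window is an assumption.

from datetime import date, timedelta
from typing import Iterator, Mapping


def date_slices(start: date, end: date, window_days: int = 7) -> Iterator[Mapping[str, str]]:
    """Yield consecutive [from, to) date windows covering start..end."""
    cursor = start
    while cursor < end:
        upper = min(cursor + timedelta(days=window_days), end)
        yield {
            "sys_created_from": cursor.isoformat(),
            "sys_created_to": upper.isoformat(),
        }
        cursor = upper


# Example: January 2022 split into weekly windows (the last one is shorter).
slices = list(date_slices(date(2022, 1, 1), date(2022, 1, 31)))
print(len(slices))  # 5
```

Each slice is then passed to `request_params` / `path` via `stream_slice`, so one huge historical pull becomes many small, retryable requests.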
The Airbyte result is this... Should I extract the records from
`"result": [...]`
? Maybe this doesn't fit the schema; all results are returned inside this array.
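[Editor's note: if each record is supposed to match the stream's schema, `parse_response` can unwrap the envelope instead of returning the whole body as one record. A minimal sketch, assuming the ServiceNow-style `{"result": [...]}` envelope seen above; `parse_result_envelope` is an illustrative name:]

```python
# Sketch: unwrap a {"result": [...]} envelope so each element is emitted
# as its own record against the stream schema, instead of one giant record.

from typing import Any, Iterable, Mapping


def parse_result_envelope(payload: Mapping[str, Any]) -> Iterable[Mapping[str, Any]]:
    """Yield individual records from the "result" array (empty if absent)."""
    yield from payload.get("result", [])


records = list(parse_result_envelope({"result": [{"sys_id": "a"}, {"sys_id": "b"}]}))
print(records)  # [{'sys_id': 'a'}, {'sys_id': 'b'}]
```

Inside the stream class this would become `yield from response.json().get("result", [])` in `parse_response`, which also keeps memory flat because records are streamed rather than collected into one list.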