Yannick Sacherer
10/28/2024, 3:42 PM
user
10/28/2024, 3:43 PM
The `http_method` field in the HttpRequester configuration specifies which method to use:

```yaml
http_method: string
```

Examples given are `GET` or `POST`. This suggests that the HTTP method used depends on how the specific connector is configured, rather than following a predetermined order of calls. If you need more detailed information about the sequence of HTTP method calls in Airbyte, you may want to check the documentation for specific connectors or examine the source code of the relevant components.
Yannick Sacherer
10/28/2024, 3:48 PM
user
10/28/2024, 3:48 PM
user
10/28/2024, 3:48 PM
`read_records` method, which is called on the stream object.
2. Inside `read_records`, the `_read_pages` method is called. This method handles pagination and making HTTP requests.
3. For each page of data:
   a. The `_send_request` method is called to make the HTTP request.
   b. After receiving the response, the `parse_response` method is called to process the data from the API response.
   c. The `next_page_token` method is then called to determine if there are more pages to fetch.

This sequence is evident from the following code snippet from the [HttpStream class](https://github.com/airbytehq/airbyte/discussions/33814):

```python
def read_records(
    self,
    sync_mode: SyncMode,
    cursor_field: List[str] = None,
    stream_slice: Mapping[str, Any] = None,
    stream_state: Mapping[str, Any] = None,
) -> Iterable[StreamData]:
    yield from self._read_pages(
        lambda req, res, state, _slice: self.parse_response(res, stream_slice=_slice, stream_state=state),
        stream_slice,
        stream_state,
    )
```
As you can see, `read_records` calls `_read_pages`, which in turn calls `parse_response`. The `parse_response` method is typically implemented in your custom stream class to handle the specific format of the API response:

```python
def parse_response(self, response: requests.Response, **kwargs) -> Iterable[Mapping]:
    res = response.json()
    yield from res.get("data", [])
```
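The call order described above can be sketched without the CDK at all. The following is a toy model, not Airbyte's actual `HttpStream`: the class `SketchStream`, the in-memory `FAKE_PAGES` "API", and the `"next"` key are all assumptions made up for illustration. It only mirrors the method names from the discussion to show the order in which they run for each page.

```python
from typing import Any, Dict, Iterable, List, Mapping, Optional

# Hypothetical in-memory "API": maps a page key to a JSON-like response body.
FAKE_PAGES: Dict[Optional[str], Dict[str, Any]] = {
    None: {"data": [{"id": 1}, {"id": 2}], "next": "page-2"},
    "page-2": {"data": [{"id": 3}], "next": None},
}


class SketchStream:
    """Toy stand-in for an HTTP stream; this is not the CDK's HttpStream."""

    def __init__(self) -> None:
        self.calls: List[str] = []  # records the order in which methods run

    def _send_request(self, next_page_token: Optional[Mapping[str, Any]]) -> Dict[str, Any]:
        # Stands in for the real HTTP call: look the page up in FAKE_PAGES.
        self.calls.append("_send_request")
        key = next_page_token["page"] if next_page_token else None
        return FAKE_PAGES[key]

    def parse_response(self, response: Dict[str, Any]) -> Iterable[Mapping[str, Any]]:
        # Extract records from the (assumed) top-level "data" key.
        self.calls.append("parse_response")
        yield from response.get("data", [])

    def next_page_token(self, response: Dict[str, Any]) -> Optional[Mapping[str, Any]]:
        # Returning None signals that there are no more pages.
        self.calls.append("next_page_token")
        nxt = response.get("next")
        return {"page": nxt} if nxt else None

    def _read_pages(self) -> Iterable[Mapping[str, Any]]:
        token: Optional[Mapping[str, Any]] = None
        while True:
            response = self._send_request(token)      # step a
            yield from self.parse_response(response)  # step b
            token = self.next_page_token(response)    # step c
            if token is None:
                break

    def read_records(self) -> Iterable[Mapping[str, Any]]:
        yield from self._read_pages()


stream = SketchStream()
records = list(stream.read_records())
```

After running it, `stream.calls` holds the per-page sequence of calls, which makes it easy to see that the three steps repeat once per page until `next_page_token` returns `None`.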
It's important to note that other methods like `path`, `request_params`, and `next_page_token` are called as needed during this process to construct requests and handle pagination. This flow ensures that Airbyte can efficiently read data from HTTP APIs, handle pagination, and process the responses into a format suitable for further processing in the Airbyte pipeline.

[Source: How the Airbyte CDK read command works](https://github.com/airbytehq/airbyte/discussions/33814)
Yannick Sacherer
10/29/2024, 9:43 AM
user
10/29/2024, 9:43 AM
user
10/29/2024, 9:43 AM
`stream_slices` function and what a `cursor_field` is.
- [Python Concepts](https://docs.airbyte.com/connector-development/cdk-python/python-concepts) for tips on useful Python knowledge.
- [Tutorial for implementing an HTTP source connector](https://docs.airbyte.com/connector-development/tutorials/custom-python-connector/getting-started).

Remember, connectors that follow these best practices will be labelled as "Airbyte Certified" to indicate they've passed a high quality bar and will perform reliably in all production use cases.
Yannick Sacherer
10/29/2024, 2:37 PM
user
10/29/2024, 2:37 PM
user
10/29/2024, 2:38 PM
The `request_body_data` and `request_body_json` methods are typically used when you need to send data in the body of HTTP requests, which is common for POST, PUT, and PATCH requests.
- `request_body_data` is used when you want to send a non-JSON payload in the body of the request. If this method returns a string, it will be sent as is. If it returns a dictionary, it will be converted to a urlencoded form.
- `request_body_json` is used when you want to send a JSON payload in the body of the request.

Here's an example of how you might override these methods in your custom stream class:

```python
def request_body_json(
    self,
    stream_state: Optional[Mapping[str, Any]],
    stream_slice: Optional[Mapping[str, Any]] = None,
    next_page_token: Optional[Mapping[str, Any]] = None,
) -> Optional[Mapping[str, Any]]:
    # Your logic here to construct the JSON payload
    return {"key": "value"}
```
Remember, you can only override one of `request_body_data` and `request_body_json` in a given class. If you need to send both JSON and non-JSON data, you might need to create separate stream classes for each type of request.

For GET requests, you typically don't need to override these methods, as GET requests usually don't have a body. Instead, you would use the `request_params` method to specify any query parameters for the request.

For more information, refer to the [Python CDK documentation](https://docs.airbyte.com/connector-development/cdk-python/basic-concepts).
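To make the difference between the two body styles concrete, here is a minimal stand-alone sketch using only the Python standard library. The helper `encode_body` is hypothetical and is not part of the Airbyte CDK. It only imitates the rules described above: a string is sent as is, a dictionary from `request_body_data` becomes a urlencoded form, a dictionary from `request_body_json` becomes JSON, and supplying both is rejected. The content-type labels it returns are an assumption about what an HTTP client would typically set.

```python
import json
from typing import Any, Mapping, Optional, Tuple, Union
from urllib.parse import urlencode


def encode_body(
    body_data: Optional[Union[str, Mapping[str, Any]]] = None,
    body_json: Optional[Mapping[str, Any]] = None,
) -> Tuple[Optional[str], Optional[str]]:
    """Return (content_type, encoded_body) for a request body.

    Hypothetical helper for illustration only; not a CDK function.
    """
    if body_data is not None and body_json is not None:
        # Mirrors the "only one of the two" rule from the discussion.
        raise ValueError("Provide either body_data or body_json, not both")
    if body_json is not None:
        return "application/json", json.dumps(body_json)
    if isinstance(body_data, str):
        # Strings are passed through unchanged.
        return None, body_data
    if body_data is not None:
        # Dictionaries are converted to a urlencoded form.
        return "application/x-www-form-urlencoded", urlencode(body_data)
    # No body at all, e.g. for a typical GET request.
    return None, None


form_result = encode_body(body_data={"a": 1, "b": "x y"})
json_result = encode_body(body_json={"key": "value"})
raw_result = encode_body(body_data="raw text")
```

Here `form_result` carries a urlencoded string, `json_result` a JSON string, and `raw_result` the untouched input, which is the practical difference to keep in mind when choosing which of the two methods to override.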