gunu [03/16/2022, 10:19 AM]
Octavia Squidington III [03/16/2022, 11:14 AM]
Octavia Squidington III [03/16/2022, 11:42 AM]
Harshith (Airbyte) [03/16/2022, 11:44 AM]
gunu [03/16/2022, 8:24 PM]
gunu [03/16/2022, 8:24 PM]
Marcos Marx (Airbyte) [03/17/2022, 2:13 AM]
gunu [03/17/2022, 2:22 AM]
gunu [03/17/2022, 8:20 AM]
gunu [03/17/2022, 9:17 PM]
gunu [03/17/2022, 9:21 PM]
Marcos Marx (Airbyte) [03/17/2022, 10:03 PM]:
> are there related tests on this connector for the incremental responses stream?
There should be a test validating that an incremental sync or full refresh produces the correct output, maybe not specific to duplicate records.
gunu [03/18/2022, 12:38 AM]
gunu [03/21/2022, 6:47 AM]:
> which stream is this?
> Do you think the primary key is right here? `id` is not the primary key, right?
@Harshith (Airbyte) redirecting you back to this thread, as it contains much of what you're asking. I'm not sure what primary key is defined; it is source-defined, and I'm asking where this might be defined so I can investigate further myself.
Harshith (Airbyte):
It is `id`; you can find this here: https://github.com/airbytehq/airbyte/blob/eeb35872c2d30e348709589fd56480c3f9b513f0[…]s/connectors/source-surveymonkey/source_surveymonkey/streams.py
gunu [03/21/2022, 7:05 AM]:
`id` is the same in the duplicated rows. However, I do not see `start_modified_at` in the full response record.
gunu [03/21/2022, 7:06 AM]:
`start_modified_at` in the full response records
gunu [03/21/2022, 7:07 AM]:
`"date_modified": "2022-01-19T13:18:13+00:00",` is the same for both records.
gunu [03/21/2022, 7:09 AM]
Harshith (Airbyte) [03/21/2022, 7:24 AM]
gunu [03/21/2022, 8:05 AM]:
[…] `_AIRBYTE_RAW` tables) and thus the deduplication process (in incremental + dedupe) will not work. However, as for incremental: this duplicate record should still not appear, since it is the same data, i.e. only records after `date_modified` should be inserted. Correct?
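For context, a toy model of the "incremental + dedup" step under discussion (a sketch, not Airbyte's actual normalization SQL): emitted records accumulate in the raw table, and deduplication keeps one row per primary key, preferring the greatest cursor value. On this model, duplicates sharing the same `id` would collapse to a single row in the final table even though they sit twice in the raw table.

```python
# Toy model: raw rows accumulate across syncs; dedup keeps one row per
# primary key ("id"), preferring the latest cursor ("date_modified").
raw_rows = [
    {"id": "r2", "date_modified": "2022-01-19T13:18:13+00:00", "answer": "yes"},
    {"id": "r2", "date_modified": "2022-01-19T13:18:13+00:00", "answer": "yes"},  # duplicate emission
    {"id": "r3", "date_modified": "2022-02-01T09:00:00+00:00", "answer": "no"},
]

def dedupe(rows, pk="id", cursor="date_modified"):
    latest = {}
    for row in rows:
        kept = latest.get(row[pk])
        if kept is None or row[cursor] >= kept[cursor]:
            latest[row[pk]] = row
    return list(latest.values())

final_table = dedupe(raw_rows)  # one row for r2, one for r3
```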
Harshith (Airbyte) [03/21/2022, 8:06 AM]
Harshith (Airbyte) [03/21/2022, 8:06 AM]
gunu [03/21/2022, 8:49 AM]
gunu [03/21/2022, 8:53 AM]
Harshith (Airbyte) [03/21/2022, 8:54 AM]
gunu [03/22/2022, 1:02 PM]
Harshith (Airbyte) [03/23/2022, 5:06 AM]:
```python
params = super().request_params(stream_state=stream_state, **kwargs)
params["sort_order"] = "ASC"
params["sort_by"] = "date_modified"
params["per_page"] = 1000  # maybe as user input or bigger value
since_value = pendulum.parse(stream_state.get(self.cursor_field)) if stream_state.get(self.cursor_field) else self._start_date
since_value = max(since_value, self._start_date)
params["start_modified_at"] = since_value.strftime("%Y-%m-%dT%H:%M:%S")
return params
```
It doesn't look off to me. Can you help me understand whether there is a pattern to the duplicate records, e.g. whether they only occur at the borders?
gunu [03/23/2022, 8:27 AM]
gunu [03/23/2022, 8:28 AM]:
> It doesn't look off to me. Can you help me understand whether there is a pattern to the duplicate records, e.g. whether they only occur at the borders?
Is there a test to ensure this isn't happening on Airbyte's end/configuration?
gunu [03/28/2022, 9:21 PM]
Harshith (Airbyte) [03/29/2022, 6:03 AM]
Harshith (Airbyte) [03/29/2022, 6:03 AM]
gunu [03/29/2022, 9:05 PM]
gunu [04/02/2022, 12:51 PM]:
The `cursor_field` is not specific to the survey_id? i.e. when a list of survey IDs is provided and the connector goes through each one to get the responses, the `cursor_field` is leaking into the next survey ID?
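A sketch of this hypothesis (hypothetical code, not the connector's actual implementation): if a single cursor value is shared across all survey slices, the value reached while reading one survey carries over into the next, whereas keeping state per survey_id isolates the slices.

```python
# Illustration of a cursor "leaking" across survey slices. Each survey
# maps to the date_modified values of its responses.
surveys = {
    "survey_A": ["2022-03-01T00:00:00", "2022-04-01T00:00:00"],
    "survey_B": ["2022-03-15T00:00:00"],  # older than survey_A's max
}

# Shared cursor: survey_A advances it to 2022-04-01, so survey_B's
# response (2022-03-15) is silently dropped from the sync.
cursor, emitted_shared = "", []
for survey_id, times in surveys.items():
    for ts in times:
        if ts > cursor:
            emitted_shared.append((survey_id, ts))
            cursor = max(cursor, ts)

# Cursor scoped per survey_id: every slice keeps its own state.
state, emitted_scoped = {}, []
for survey_id, times in surveys.items():
    cursor = state.get(survey_id, "")
    for ts in times:
        if ts > cursor:
            emitted_scoped.append((survey_id, ts))
            cursor = max(cursor, ts)
    state[survey_id] = cursor
```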
Harshith (Airbyte) [04/03/2022, 1:13 PM]
gunu [04/04/2022, 11:24 PM]:
Is data being pulled as `WHERE timestamp > cursor_field`, or is it being pulled as `WHERE timestamp >= cursor_field`?
gunu [04/05/2022, 12:09 AM]:
(It's being pulled as `WHERE timestamp >= cursor_field`.)
e.g. my last response has `"date_modified": "2022-04-04T14:51:18+00:00"`, and this gets set as the cursor field. When I apply the updated state `start_modified_at: 2022-04-04T14:51:18`, it returns the same response.
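This observation matches an inclusive-filter failure mode: if the API's `start_modified_at` parameter behaves as >=, the record whose `date_modified` equals the saved cursor comes back on every subsequent sync. A common mitigation, sketched under that assumption (not necessarily what the connector does), is to drop records at or below the saved cursor on the client side:

```python
from datetime import datetime

def filter_new_records(records, state, cursor_field="date_modified"):
    # Keep only records strictly newer than the saved cursor, so an
    # inclusive (>=) API-side filter cannot re-emit the boundary record.
    if not state.get(cursor_field):
        return records
    cursor = datetime.fromisoformat(state[cursor_field])
    return [r for r in records if datetime.fromisoformat(r[cursor_field]) > cursor]

api_page = [
    {"id": "r9", "date_modified": "2022-04-04T14:51:18+00:00"},   # boundary record, already synced
    {"id": "r10", "date_modified": "2022-04-05T09:00:00+00:00"},  # genuinely new
]
state = {"date_modified": "2022-04-04T14:51:18+00:00"}
new_records = filter_new_records(api_page, state)  # only r10 remains
```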
gunu [04/05/2022, 12:51 AM]
gunu [04/06/2022, 12:15 PM]
gunu [04/07/2022, 6:27 AM]