y

Yiqing Wang

08/21/2021, 7:03 AM
Hello team, we just created PR #5561 for the new DynamoDB destination. Please let us know if anything else needs to be done!
u

user

08/23/2021, 11:07 AM
This is awesome!
u

user

08/23/2021, 11:07 AM
We'll take a look at it and walk through it with you; thanks 🙂
u

user

08/23/2021, 2:38 PM
@Yiqing Wang nice! Thanks for reaching out. I’m trying to think through the UX for consuming data written by this connector. What’s your use case with DynamoDB, and how do you intend to query the data? I’m specifically wondering whether the UUID key is sufficiently usable.
u

user

08/23/2021, 9:15 PM
As we are using DynamoDB as the database for our backend system, it would be great to enable Airbyte to use DynamoDB as a destination. Since I can query the data using JavaBaseConstants.COLUMN_NAME_DATA in DynamoDB, I think the UUID is good enough here. sync_time is used as the sort key in the PR to limit the query time range.
u

user

08/25/2021, 4:21 AM
@Yiqing Wang when you say you query the data using COLUMN_NAME_DATA, are you doing a full scan of tables in dynamo to consume this data? isn’t that very expensive? 😅 would it be better to key the records in dynamo on their primary key rather than a random UUID?
u

user

08/25/2021, 5:55 AM
What is the default primary key for a record? Is it determined by the specific data?
u

user

08/25/2021, 6:09 AM
sometimes the data declares the primary key in the catalog yes
u

user

08/25/2021, 6:10 AM
my main concern is whether a random UUID primary key is usable for the user without a full scan of the data. Maybe a full scan is acceptable here though? I’m not sure how else the data would be added. Might I suggest we use the PK as the key when it is present, and otherwise use the random UUID?
y

Yiqing Wang

08/25/2021, 6:18 AM
May I ask how I can dynamically parse the primary key from the catalog without knowing whether the key exists?
u

user

08/25/2021, 6:20 AM
you can inspect the catalog and, for each configured stream, see whether the user_configured_primary_key field is set
u

user

08/25/2021, 6:20 AM
although the downside here is that the only method of adding data is an overwrite… (if we use the primary key)
u

user

08/25/2021, 6:25 AM
We already have sync_time in place as the sort key, so the partition key (user_configured_primary_key) and the sort key together form the composite primary key. Thus we can add data without overwriting, as long as the partition key is unique within each sync.
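A rough sketch of that keying scheme, in case it helps the discussion. It does not use the AWS SDK or any connector code: the table is a plain in-memory map, and CompositeKeySketch, partitionKey and put are hypothetical names made up for illustration. The assumption is that an item is addressed by (partition key, sync_time), with a random UUID as the partition key when the stream has no primary key.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

public class CompositeKeySketch {
    // Stand-in for a DynamoDB table: each item is addressed by (partition key, sort key).
    static final Map<List<String>, String> table = new HashMap<>();

    // Uses the record's primary-key value when present, otherwise a random UUID.
    static String partitionKey(String primaryKeyValue) {
        return primaryKeyValue != null ? primaryKeyValue : UUID.randomUUID().toString();
    }

    // Writes one item; an existing item with the same (partition key, sync_time) is replaced.
    static void put(String primaryKeyValue, long syncTime, String data) {
        table.put(List.of(partitionKey(primaryKeyValue), Long.toString(syncTime)), data);
    }

    public static void main(String[] args) {
        put("user-1", 1000L, "{\"name\":\"a\"}");
        put("user-1", 2000L, "{\"name\":\"b\"}"); // later sync: a new item, not an overwrite
        put("user-1", 2000L, "{\"name\":\"c\"}"); // same key within one sync: replaces the item
        System.out.println(table.size()); // prints 2
    }
}
```

The third put shows the caveat: two records with the same partition key inside one sync collide, which is why the key has to be unique within each sync.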
u

user

08/25/2021, 6:26 AM
that would be excellent
u

user

08/25/2021, 6:27 AM
Nice, we will try to implement this change.
u

user

08/25/2021, 6:43 AM
In class ConfiguredAirbyteStream, why is the primary key a List<List<String>>?
y

Yiqing Wang

08/25/2021, 6:43 AM
@JsonProperty("primary_key")
public List<List<String>> getPrimaryKey() {
    return primaryKey;
}
u

user

08/25/2021, 6:49 AM
A primary key is described by a list of strings that gives the “path” to the field, in case it is nested in the record. A composite key consists of multiple such lists.
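To make the path idea concrete, here is a small sketch that resolves each path of a List<List<String>> against a nested record. It assumes the record is a plain Map rather than the JSON type the connector actually uses, and PrimaryKeyPaths and resolve are hypothetical names, not part of Airbyte.

```java
import java.util.List;
import java.util.Map;

public class PrimaryKeyPaths {
    // Walks one key path through a nested record; returns null if any segment is missing.
    static Object resolve(Map<String, Object> record, List<String> path) {
        Object current = record;
        for (String segment : path) {
            if (!(current instanceof Map)) {
                return null; // path goes deeper than the record does
            }
            current = ((Map<?, ?>) current).get(segment);
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> record = Map.of(
            "id", 7,
            "owner", Map.of("email", "a@example.com"));

        // Composite key: a top-level field plus a nested field.
        List<List<String>> primaryKey = List.of(List.of("id"), List.of("owner", "email"));

        for (List<String> path : primaryKey) {
            System.out.println(path + " -> " + resolve(record, path));
        }
    }
}
```

A single-column key on a top-level field is then just a list holding one one-element path, such as [["id"]].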
u

user

08/25/2021, 7:17 AM
We found that there is no user_configured_primary_key in the test files, so it is hard for us to test this feature.