# durable-objects
  • j

    Jens

    07/14/2021, 8:33 AM
    Is it a good idea to connect a client (browser) to a worker via WebSocket, then subscribe to multiple "topics" (DOs) by initiating multiple WebSocket connections to the topic DOs and multiplexing the messages back to the client from the worker? The idea is to avoid having the client open multiple WebSockets if it wants to listen to multiple topics. Is this a pattern you'd say DO is good for, or would you do it another way? Any limitations, e.g. duration of the WS connection, extreme cost? Or can this be considered a good pattern?
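    A rough sketch of the multiplexing pattern being asked about, assuming a hypothetical `TOPIC` Durable Object binding, illustrative topic names, and an invented message envelope for tagging topics:
    ```js
    export default {
      async fetch(request, env) {
        if (request.headers.get("Upgrade") !== "websocket") {
          return new Response("expected websocket", { status: 426 });
        }

        // One end of the pair goes back to the browser, the other stays in the worker.
        const [client, server] = Object.values(new WebSocketPair());
        server.accept();

        // Open one outgoing WebSocket per topic DO and forward its messages to the
        // client, tagged with the topic so the client can demultiplex them.
        const topics = ["news", "chat"]; // illustrative topic names
        for (const topic of topics) {
          const stub = env.TOPIC.get(env.TOPIC.idFromName(topic));
          const resp = await stub.fetch("https://do/subscribe", {
            headers: { Upgrade: "websocket" },
          });
          const ws = resp.webSocket; // present when the DO answered with a 101 + WebSocket
          ws.accept();
          ws.addEventListener("message", (e) =>
            server.send(JSON.stringify({ topic, data: e.data }))
          );
        }

        return new Response(null, { status: 101, webSocket: client });
      }
    };
    ```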
  • k

    kenton

    07/14/2021, 2:41 PM
    1 - DO reads are always satisfied within the local network, which shouldn't take more than a few ms. This differs from Workers KV, which will handle reads locally if they hit cache, but otherwise may take much longer to fetch over the internet. Before this week, all DO reads took about the same amount of time, as they all involved local network communication. This week's changes add an in-memory cache that lives in the same process as the worker itself, so reads that hit cache should be basically instantaneous.
    2 - I'm not sure if we have plans to increase it, but if we do, it would likely be a minor increase, like doubling it. To store a large blob, you should either split it across many keys, or use Workers KV or some object storage instead of DO storage. Think of DO storage key/values like database rows; they aren't optimized for huge values.
    3 - We are rolling out an in-memory caching layer this week. This should strictly improve performance compared to what you've seen in the past.
    4 - No. If a DO needs access to another one's data, it needs to send a fetch request to the other DO, and the other DO needs to return the data in a response. This is fundamental to the architecture and not likely to change.
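    For point 4, a minimal sketch of one object requesting another object's data over fetch; the `OTHER` binding, the "shared-dataset" name, and the "/data" route are hypothetical:
    ```js
    export class Consumer {
      constructor(state, env) {
        this.env = env;
      }

      async fetch(request) {
        // There is no direct access to another object's storage; the other object
        // has to expose its data through its own fetch() handler.
        const id = this.env.OTHER.idFromName("shared-dataset");
        const resp = await this.env.OTHER.get(id).fetch("https://do/data");
        const data = await resp.json();
        return new Response(`got ${Object.keys(data).length} entries`);
      }
    }
    ```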
  • w

    Wallacy

    07/14/2021, 5:15 PM
    Thanks @User and @User.... The problem may be related to what I'm trying to do. I have a lot of text/documents to process. Most of them are less than 32KB, but a few of them can be 200KB or more. I can split the documents into sections (or put them in KV like I'm doing now). But the major problem is: if I store each document under a different key, loading all of them into memory for processing takes very, very long. I will test again today, but last week the experience was not great (several seconds). I was planning to put several docs inside one KV value to speed that up, but the problem is rewriting the entire 25M data block for each text/doc change. Then I load 6-10 DOs in parallel to process some number of keys (that's why I asked about storage sharing). I can't load everything in one DO because of the 128M memory limit. I can send a put/post to the target DO with the data set, but I was not able to do that without consuming the entire memory of the base worker, which is why each target DO reads the data by itself. If I could at least read quickly (less than 10ms) and/or stream the data to other DOs (using fetch, whatever) without loading it into the worker's memory, similar to what we do with request.body (ReadableStream), that would be nice.
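    A minimal sketch of the streaming hand-off being asked for here, where the worker forwards `request.body` to a target DO without buffering it in its own memory; the `TARGET` binding, instance name, and route are hypothetical:
    ```js
    export default {
      async fetch(request, env) {
        const id = env.TARGET.idFromName("doc-processor-1"); // illustrative instance name
        const stub = env.TARGET.get(id);
        // request.body is a ReadableStream; passing it through fetch() streams the
        // data to the DO instead of loading it into this worker's memory.
        return stub.fetch("https://do/process", { method: "POST", body: request.body });
      }
    };
    ```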
  • k

    kenton

    07/14/2021, 5:44 PM
    Have you tried parallelizing reads? Like:
    ```js
    let promise1 = kv.get(key1);
    let promise2 = kv.get(key2);
    let value1 = await promise1;
    let value2 = await promise2;
    ```
  • a

    Amenadiel

    07/14/2021, 5:46 PM
    Same here. I'm moving to DO, but my former logic relied on KV and scheduled events to grab huge CSVs. I couldn't parse them on the fly due to CPU limits. Now I can't seem to do it with DO storage, but I will try passing a classic KV binding in the env. These CSVs are 500KB and have like 500 rows, but they mean nothing when inspected one by one.
  • k

    kenton

    07/14/2021, 5:47 PM
    note you can do streaming reads from KV using `kv.get(key, "stream")`, which returns a `ReadableStream` that you can use as your response body. Similarly, `kv.put(key, request.body)` will stream a request into KV without reading it into memory. (These apply to Workers KV, not durable object storage.)
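    A minimal sketch of both streaming directions, assuming a Workers KV binding named `MY_KV`; the key name is illustrative:
    ```js
    export default {
      async fetch(request, env) {
        if (request.method === "PUT") {
          // Stream the request body straight into KV without buffering it in memory.
          await env.MY_KV.put("big-doc", request.body);
          return new Response("stored");
        }
        // Stream the stored value straight back out as the response body.
        const stream = await env.MY_KV.get("big-doc", "stream");
        return new Response(stream, { status: stream ? 200 : 404 });
      }
    };
    ```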
  • k

    kenton

    07/14/2021, 5:49 PM
    with DO storage, you can do "streaming" using `list()` to fetch a few keys at a time. Since each value can't be more than 32KB, you would have to split large blobs across several keys.
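    A minimal sketch of that chunked-read pattern, assuming a blob split across keys under a `blob:` prefix; the prefix, page size, and string reassembly are illustrative:
    ```js
    // Inside a Durable Object: read a large blob a page at a time via list().
    async function readBlobInChunks(storage) {
      const parts = [];
      let start = undefined;
      while (true) {
        const page = await storage.list({ prefix: "blob:", start, limit: 16 });
        if (page.size === 0) break;
        for (const [key, value] of page) {
          if (key === start) continue; // `start` is inclusive; skip the key already read
          parts.push(value);
          start = key;
        }
        if (page.size < 16) break; // last page
      }
      return parts.join(""); // reassemble, assuming string chunks
    }
    ```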
  • a

    Amenadiel

    07/14/2021, 5:49 PM
    This is not a flaw anyway, it was just puzzling to find out they are so different from KV. The lack of metadata and TTL hurts, though.
  • f

    figgyc

    07/14/2021, 5:51 PM
    just to chime in with an additional bonus tip: if you're trying to read an arbitrary number of chunks, it'd probably be much cleaner code and easier to write with `Promise.all`
    ```js
    let promises = [kv.get(key1), kv.get(key2)] // and so on - you could also generate this array
    let values = await Promise.all(promises)
    console.log(values)
    // -> ["value1", "value2"]
    ```
  • f

    figgyc

    07/14/2021, 5:52 PM
    although I guess with your way you could do things with value1 while value2 is fetched in the background
  • w

    Wallacy

    07/14/2021, 6:38 PM
    Yes, I did that on KV! I still expected it to be a little quicker. I will try again using DO storage.get(keys), 128 keys at a time, to compare... (if we can use more keys in the future, that would be nice)
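    A minimal sketch of that batched read, relying on DO storage `get()` accepting an array of up to 128 keys and returning a Map; the helper name and key list are hypothetical:
    ```js
    async function readAllChunks(storage, keys) {
      const results = new Map();
      for (let i = 0; i < keys.length; i += 128) {
        // One storage round trip per batch of up to 128 keys.
        const batch = await storage.get(keys.slice(i, i + 128));
        for (const [k, v] of batch) results.set(k, v);
      }
      return results;
    }
    ```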
  • w

    Wallacy

    07/14/2021, 6:39 PM
    For KV it's simple, just make each DO read directly! But good to know.
  • w

    Wallacy

    07/14/2021, 6:41 PM
    I will try that too! Thanks.
  • a

    Amenadiel

    07/14/2021, 9:02 PM
    If I grab the DO stub with getIdFromName and use the request URL as the name, there would be as many instances as unique URLs, right?
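    A minimal sketch of that mapping (the binding method is `idFromName`); the `DOC` binding and the use of the full request URL as the name are illustrative:
    ```js
    export default {
      async fetch(request, env) {
        // Same name -> same id -> same instance, so each unique URL gets its own object.
        const id = env.DOC.idFromName(request.url);
        return env.DOC.get(id).fetch(request);
      }
    };
    ```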
  • j

    john.spurlock

    07/14/2021, 9:07 PM
    Will the new in-memory cache for DOs count against the 128MB RAM limit for the instance?
  • v

    Vanessa🦩

    07/14/2021, 9:09 PM
    Yes.
  • k

    kenton

    07/14/2021, 9:56 PM
    Sort of, but not exactly. The cache itself has the same memory limit as the isolate -- that is, if you put() more than 128MB of data into the cache all at once without giving it a chance to write the data back to disk, your isolate will be terminated. This is a sort of difficult thing to do (you'd have to be storing data that you are generating programmatically?), and you can avoid it entirely by making sure to `await` all writes (this will apply backpressure as needed). But the amount of memory your isolate is using and the amount of memory the cache is using are at present not connected; the limit is applied separately to each, not to the sum total. We might link them more closely in the future, but if so, what we'd do is make the cache automatically evict entries as needed so that the total memory usage stays within the limit.
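    A minimal sketch of the awaited-write pattern described above; the loop and entry list are hypothetical, only `put()` and its backpressure behaviour come from the message:
    ```js
    // Inside a Durable Object: awaiting each put() lets the storage layer apply
    // backpressure, so a long write loop can't build up unbounded dirty data in the cache.
    async function storeEntries(storage, entries) {
      for (const [key, value] of entries) {
        await storage.put(key, value); // waits whenever the write-back queue needs to drain
      }
    }
    ```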
  • k

    kenton

    07/14/2021, 9:58 PM
    In short, you shouldn't have to worry about the cache's memory use, and you definitely don't have to worry about it if you make sure to `await` your `put()`s (but that only matters in unusual scenarios).
  • v

    Vanessa🦩

    07/14/2021, 10:16 PM
    Interesting... currently I'm storing all data ASAP to protect against crashes, but also keeping a copy in memory for quick access if needed. We're writing much more often than reading, though (streaming everything to all clients immediately, writing ASAP, sending the backlog only when new clients join, reading the backlog only when the DO resumes). Since we don't know in advance if a new client will ever join this session, some of the memory is "wasted". Would there be any advantage in relying on fast readback from the new in-memory cache over keeping it in memory ourselves as we do now? We limit DO memory usage already.
  • k

    kenton

    07/14/2021, 10:22 PM
    Yes, it would probably make the code simpler if nothing else. Note the memory cache stores data in V8 serialized form, so the only possible concern would be if you're storing complex data structures that are expensive to serialize, but I'd be surprised if that had a real impact on performance. In fact it might actually improve performance by utilizing CPU caches better (since the serialized forms are much smaller than the object trees they'd parse to).
  • k

    kenton

    07/14/2021, 10:25 PM
    BTW, another part of this change might matter to you... if you don't await a put(), we will now implicitly await it for you before we let you return a response, so that writes aren't accidentally confirmed prematurely. But if you actually don't want to wait, and you are OK with the possibility of rare data loss (e.g. if the power went out before the write completed), then you should do `put(key, value, {allowUnconfirmed: true})`
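    A minimal sketch of the two write modes described above; the `EchoObject` class and key name are hypothetical, only `put()` and `allowUnconfirmed` come from the discussion:
    ```js
    export class EchoObject {
      constructor(state) {
        this.state = state;
      }

      async fetch(request) {
        const message = await request.text();

        // Not awaited: the runtime now implicitly waits for this write to be confirmed
        // before the Response below is actually delivered to the caller.
        this.state.storage.put("last-message", message);

        // If rare data loss is acceptable, opt out of that implicit wait instead:
        // this.state.storage.put("last-message", message, { allowUnconfirmed: true });

        return new Response("ok");
      }
    }
    ```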
  • v

    Vanessa🦩

    07/14/2021, 10:29 PM
    With WebSockets, would that only apply to the initial response, or to every message?
  • j

    john.spurlock

    07/14/2021, 10:31 PM
    But you'd get charged for reads when using the memory cache, right? That would seem like an important difference (vs. keeping the data around yourself in memory).
  • k

    kenton

    07/14/2021, 10:31 PM
    it applies to all outgoing messages of any type (including outgoing fetch() requests)
  • k

    kenton

    07/14/2021, 10:31 PM
    I'm not sure what the billing plan is. It might be that you get charged for reads even if they hit cache.
  • k

    kenton

    07/14/2021, 10:31 PM
    (headed out the door now, ttyl)
  • j

    john.spurlock

    07/14/2021, 10:33 PM
    Thanks for the info, it's very helpful!
  • k

    kenton

    07/15/2021, 4:22 PM
    FYI, this started rolling out globally two hours ago and should be everywhere in another hour or so. If anything weird started happening in the last couple of hours, let me know.
  • w

    Wallacy

    07/15/2021, 4:31 PM
    Will all locations have DO now?
  • c

    ckoeninger

    07/15/2021, 4:41 PM
    no