# durable-objects
  • brett (07/26/2021, 4:50 PM)
    Going back to this though, are you hitting some kind of timeout in your own code? I'm not sure I follow why hitting more instances would have an effect other than possibly making it slower. Are you saying at 16 you received an error, or?
  • john.spurlock (07/26/2021, 5:01 PM)
    It's all there in the scrollback from over the weekend, but I'll try to tl;dr. There were two behavioral changes noticed after last week's platform update with the big storage change:
    1) Certain storage `list` calls would hang, never return, and deadlock all other callers indefinitely. Kenton said he may have found the reason (a call that would return more than 16 MB), and a fix for that will be included in the next platform update.
    2) Data object instances that had been working fine previously started getting way slower: storage access times up by about 10x with the same data. Each object does an initial load from storage on first access, and I noticed that every access was now the first access, even on immediate subsequent requests. In other words, the object was getting reset after every call. This makes my whole project unusable, since the initial load is meant to be the cold-start slow path, not something expected on every call. No errors were thrown, even with the clever "create a hanging request and see what it throws" trap that Kenton suggested. In fact, instead of trapping, I was able to have multiple requests out to two versions of the same DO instance, which should never happen.
  • john.spurlock (07/26/2021, 5:06 PM)
    So I started a minimal repro project, but it doesn't repro with only one DO instance. Then I noticed that in my real worker, it only starts occurring at the full count of 16 instances, because like 8 or 9 of the instances are shoved into the same process! It sounds like what you're saying is that this is by design? If so, it makes DO kind of an unreliable building block, as you really can't count on any of the RAM being there, and the perf will vary wildly based on how the instances are mapped to processes.
  • brett (07/26/2021, 5:38 PM)
    Yeah, 1) should be fixed after the release in a couple of days. 2) is odd: were you able to verify the object was reset by setting some simple in-memory state and observing that it vanished after every request? Or were you purely going by the storage latency?
  • brett (07/26/2021, 5:40 PM)
    I wouldn't say it's by design so much as it's still a TODO we intend to fix
  • john.spurlock (07/26/2021, 5:45 PM)
    Yeah, I can tell when the object is reset by keeping state in the object (https://github.com/johnspurlock/workers-do-memory-issue/blob/master/memory_do.ts#L10) and also by tracking static DO state (https://github.com/johnspurlock/workers-do-memory-issue/blob/master/memory_do.ts#L150). It is both going through the slow path, and the `list` calls (there are now multiple calls, to work around issue 1) are slower to complete when there are too many instantiations inside one process.
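A minimal sketch of the two reset signals being described, per-instance state versus isolate-wide static state; names are illustrative, loosely modeled on the linked repro:

```ts
// Module-level state survives as long as the isolate does, so it is shared
// by every object instance that lands in the same isolate.
let isolateRequests = 0;

export class MemoryDO {
  // Instance-level state vanishes whenever this object is reset.
  instanceRequests = 0;

  async fetch(request: Request): Promise<Response> {
    const snapshot = {
      instanceRequests: this.instanceRequests++, // always 0? object is being reset
      isolateRequests: isolateRequests++,        // always 0? whole isolate is being reset
    };
    return new Response(JSON.stringify(snapshot));
  }
}
```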
  • john.spurlock (07/26/2021, 5:49 PM)
    That's awesome to hear - but I'm not sure what to do in the meantime; my prototype was working really well before last week. Could you add something to the stub call similar to jurisdiction, where you can specify that this instance should not be shared with others? Or maybe only overmap like 2 or 3 max? Or could you bump up the memory ceiling in proportion to how many instances are created inside a single process?
  • zifera (07/27/2021, 11:02 AM)
    Hmm, suddenly my DO stopped working; haven't changed anything for a while
  • zifera (07/27/2021, 11:02 AM)
    Anyone else having issues?
  • zifera (07/27/2021, 11:02 AM)
    Worker threw exception error 1101
  • zifera (07/27/2021, 12:33 PM)
    Seems to work again an hour later, strange!
  • john.spurlock (07/27/2021, 2:27 PM)
    When multiple DO instances are shoved into a single process, is the Bundled CPU limit also shared, or tracked precisely per incoming request? I'm seeing requests that were not hitting cpu limits before now hitting cpu limits when too many instances are stuffed into one process.
  • vans163 (07/27/2021, 2:55 PM)
    is there a way to load+run a .wasm blob inside a durable object?
  • brett (07/27/2021, 2:56 PM)
    CPU should be accurately accounted for down to the individual Object
  • brett (07/27/2021, 2:57 PM)
    Sorry, I got busy yesterday, I think it'd be interesting to see what happened if you leaned more on the new storage cache, rather than pulling things into object memory. But I do hope to fix the balancing of Objects around a colo soon
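A sketch of what "leaning on the storage cache" might look like: instead of holding a copy of the data in object memory, read through `state.storage` on every request and let the runtime's cache make repeat reads of hot keys cheap. This is an assumption about the intended usage, not a confirmed pattern:

```ts
// Read-through variant: no in-memory copy of the data set, so nothing is
// lost when the object is reset; repeat reads of the same key should be
// served from the runtime's storage cache rather than disk.
export class CachedReadDO {
  constructor(private state: DurableObjectState) {}

  async fetch(request: Request): Promise<Response> {
    const key = new URL(request.url).pathname.slice(1);
    const value = await this.state.storage.get(key);
    return new Response(JSON.stringify(value ?? null));
  }
}
```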
  • brett (07/27/2021, 2:57 PM)
    I can't think of any platform reasons you couldn't, is there a specific issue?
  • vans163 (07/27/2021, 2:57 PM)
    I never tried, I am just asking: are you aware of any examples of how to set up the project? Following the rollup template for DO projects
  • brett (07/27/2021, 3:00 PM)
    I just know of this https://github.com/cloudflare/rustwasm-worker-template
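As far as I know, Workers doesn't allow compiling wasm from raw bytes at runtime, so the usual route is to bundle the module with the script and instantiate the imported `WebAssembly.Module` inside the object. A rough sketch; `./add.wasm` and its `add` export are hypothetical:

```ts
// The bundler/upload step turns this import into a WebAssembly.Module.
import wasmModule from "./add.wasm";

export class WasmDO {
  instance?: WebAssembly.Instance;

  async fetch(request: Request): Promise<Response> {
    if (!this.instance) {
      // Instantiating a Module (rather than raw bytes) is permitted at runtime.
      this.instance = await WebAssembly.instantiate(wasmModule);
    }
    const add = this.instance.exports.add as (a: number, b: number) => number;
    return new Response(String(add(2, 3)));
  }
}
```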
  • vans163 (07/27/2021, 3:01 PM)
    awesome, thank you
  • john.spurlock (07/27/2021, 3:12 PM)
    Storage API calls are $$ though, so app-specific in-memory caching still seems to make more sense for mostly-read scenarios
  • john.spurlock (07/27/2021, 3:17 PM)
    Even a more sane limit would get me unstuck. Sometimes a parallel request for 20 objects that were not already active puts all 20 objects in the same process; a limit of 4 or 5 would be workable for now. It's true that not all scenarios are the same: for small objects that do light storage access, you could get away with 20 in a process. This is why it would be great to be able to specify the desired "sharing" level, either at the class-definition level (in the REST API?) or when making the stub call.
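For reference, the jurisdiction precedent being pointed at looks like this; the `sharing` option in the comment is purely hypothetical, sketching the kind of knob being requested, and does not exist:

```ts
// jurisdiction is a real option that restricts where the object's id can live.
function getEuStub(ns: DurableObjectNamespace): DurableObjectStub {
  const id = ns.newUniqueId({ jurisdiction: "eu" }); // real API
  // const id = ns.newUniqueId({ sharing: "exclusive" }); // hypothetical analog
  return ns.get(id);
}
```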
  • kenton (07/27/2021, 3:18 PM)
    hmm, but isn't reading the entire contents of storage into memory every time the object starts also going to be expensive? Does the object always end up using the whole data set?
  • john.spurlock (07/27/2021, 3:20 PM)
    In my case, yes. Since there is basically only one index on each object's storage, I bet many scenarios will require lots of reading/preloading.
  • john.spurlock (07/27/2021, 3:22 PM)
    Or lots of additional storage calls $$ for maintaining alternate indices : )
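A sketch of the alternate-index approach being priced out here: each record write pays for an extra `put` to keep a secondary lookup key in sync. The key layout is illustrative:

```ts
// Primary record under "rec:<id>", secondary index under "by-email:<email>".
// Every save costs one extra billed put for the index entry.
async function saveUser(
  storage: DurableObjectStorage,
  user: { id: string; email: string }
): Promise<void> {
  await storage.put(`rec:${user.id}`, user);
  await storage.put(`by-email:${user.email}`, user.id);
}

async function findByEmail(
  storage: DurableObjectStorage,
  email: string
): Promise<unknown> {
  const id = await storage.get<string>(`by-email:${email}`);
  return id === undefined ? undefined : storage.get(`rec:${id}`);
}
```

The two writes could also be combined into a single batched `storage.put({...})` call with multiple keys.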
  • kenton (07/27/2021, 3:24 PM)
    I guess it kind of sucks that pricing ends up warping the implementation like this
  • ckoeninger (07/27/2021, 3:31 PM)
    https://github.com/cloudflare/durable-objects-typescript-rollup-esm
  • kenton (07/27/2021, 3:33 PM)
    It's true we need to improve the spread of objects across isolates. But at the same time, any in-memory cache implementation that caches a large data set really needs to track memory usage and prune the cache to stay within limits -- and it needs to account for the possibility of multiple objects in the same isolate, because eventually if you have enough objects, even if we evenly distribute them across the colo, some will land in the same isolate. Tracking memory pressure is admittedly kind of hard to do in JavaScript, but not impossible -- you could count memory usage in a global variable, so that it tracks across all objects in the isolate. Or if you rely on the built-in cache, it'll be taken care of for you. But yeah, that might incur additional billing... (I don't yet know if cache hits will count for billing, but it's possible.)
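A sketch of the global-accounting idea, assuming a crude per-entry size estimate and an arbitrary budget; this is one way to do it, not a prescribed implementation:

```ts
// Module-level accounting spans every object in the isolate, which is the
// point: the budget is per-isolate, not per-object. Numbers are illustrative.
const MEMORY_BUDGET = 64 * 1024 * 1024; // bytes
let bytesUsed = 0;
const cache = new Map<string, { value: string; size: number }>(); // insertion order ~ age

function cachePut(key: string, value: string): void {
  const size = value.length * 2; // crude UTF-16 estimate of string footprint
  const prior = cache.get(key);
  if (prior) {
    bytesUsed -= prior.size;
    cache.delete(key); // re-inserting moves the key to "newest"
  }
  cache.set(key, { value, size });
  bytesUsed += size;
  // Prune oldest entries until back under budget.
  for (const [k, v] of cache) {
    if (bytesUsed <= MEMORY_BUDGET) break;
    cache.delete(k);
    bytesUsed -= v.size;
  }
}
```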
  • john.spurlock (07/27/2021, 3:43 PM)
    Isn't the unpredictable sharing of objects within a single isolate a larger problem than just storage caching scenarios tho? You have the same problem without storage in the picture. Any data structures allocated by the app (ws-connection-level stuff, encryption artifacts, things coming in from input or external fetch) are going to use memory, and it won't be good if your scenario is architected to work in dev on Monday, runs fine in production on Tuesday, but then fails on Wednesday because more object instances are now put into the same isolate. Say what you want about lambda, I've used it from the very beginning, and one nice property is that it's predictable about what you get in the container you specify, and you can architect within the advertised limits.
  • kenton (07/27/2021, 3:55 PM)
    So, an important design goal of durable objects is that it should scale to extremely large numbers of fine-grained objects. We strongly encourage developers to design for finer granularity, and we intentionally don't bill on the number of objects because we'd rather see people building apps around more, smaller objects. If we said that each object gets its own isolate, that would unfortunately blow up this goal, since even though isolates are much cheaper than containers, they are still much more expensive than what we'd like to see for fine-grained objects. What we really want to be able to do is adaptively adjust the isolate count to match what the app needs, so apps don't need to worry about it. We're still working on that, though, and it's a lot harder for large / coarse-grained objects compared to fine-grained.
  • vans163 (07/27/2021, 3:55 PM)
    is there a way to get chunks of a `content-encoding: chunked` response inside a DO?