Anyone else here work on “SaaS platform scaling”? ...
# general
j
Anyone else here work on “SaaS platform scaling”? Looking for kindred spirits to talk scaling with
g
I think many of us do 🙂 I'm curious which specific aspects you are currently looking into...
j
Wondering what other people do to bring their fellow devs around to “testing at scale”. I’m continually running into features that work for average-sized customers but fail completely for 10x customers. It’s not something you can catch with unit or integration tests.
g
100%. I'm honestly surprised there isn't a top-down mandate from the business to make sure all new features work for your largest customers. These are usually the ones who bring in 90% of your revenue and have a dedicated customer success team...
If you have low-hanging fruit that you think will make a difference, that's a great place to start. Some ideas that help in some cases:
• running integration tests against an old copy of production
• JMH testing
• production canaries of significant size (sometimes there is an internal "dogfood" customer that can serve as a good-size canary)
Full-fledged load testing is usually so much effort that the business has to agree it is a priority and make sure developers have time for it. Getting a framework and writing tests is time-consuming but a one-time investment. Analyzing the results on a good cadence is the real heavy lift.
💯 1
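To make the "does it still work for a 10x customer" check concrete, here is a minimal sketch of a size-sweep timing helper. Everything in it is an assumption for illustration: the names `time_at_sizes` and `sizes_over_budget`, the synthetic `range` data standing in for a customer's records, and the time budget. It is not a replacement for a real load test against a production-sized copy.

```python
from __future__ import annotations

import time
from typing import Callable, Iterable


def time_at_sizes(
    feature: Callable[[list[int]], object],
    sizes: Iterable[int],
) -> dict[int, float]:
    """Run `feature` once per input size and record wall-clock seconds."""
    timings: dict[int, float] = {}
    for size in sizes:
        # Hypothetical synthetic data; a real check would use a production-like copy.
        data = list(range(size))
        start = time.perf_counter()
        feature(data)
        timings[size] = time.perf_counter() - start
    return timings


def sizes_over_budget(timings: dict[int, float], budget_seconds: float) -> list[int]:
    """Return the input sizes whose run exceeded the time budget, smallest first."""
    return sorted(size for size, secs in timings.items() if secs > budget_seconds)
```

For example, calling `time_at_sizes(my_feature, [10_000, 1_000_000])` and then `sizes_over_budget(timings, 2.0)` would list the sizes that blew a 2-second budget, where `my_feature` and the budget are placeholders you would pick yourself.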
btw, what absolutely didn't work for me is a one-time project to set up a huge test environment and write some load tests. Without an ongoing commitment to making sure things really work at scale, developers started ignoring the load test results because the time it took to analyze them was seen as "a waste".
cc @Tal Borenstein - since "failing tests fatigue" may be a related problem to the "alert fatigue" that he's currently solving.
👀 1
t
Interesting, and would love to hear more. @Jeffrey Sherman is there a specific use case you can share to help us better understand, maybe through an example?
j
Recent one I ran into was a page that shows the count of Contacts associated with a List. (There’s some business logic with “associated”, so it’s much slower than you’d think from the description.) The initial implementation had a single endpoint gather all the data. Works fine for average customers with fewer than 10,000 contacts. When you get to millions of contacts, the endpoint would time out (3+ min). It’s easy to split the logic into 2 endpoints: 1 to get a list of Lists, 1 to get the count for any list. Much harder to get the developers to think about scale in the first place.
💯 2
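The two-endpoint split described above could look roughly like this. It is only a sketch under stated assumptions: the in-memory `LISTS` and `MEMBERSHIPS` data, the function names, and the response shapes are all hypothetical, and `len()` stands in for the slow "associated" business logic, which the thread does not spell out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ContactList:
    id: int
    name: str


# Hypothetical in-memory data standing in for the real data store.
LISTS = [ContactList(1, "Newsletter"), ContactList(2, "Trial users")]
MEMBERSHIPS = {1: [101, 102, 103], 2: [201]}


def get_lists() -> list[dict]:
    """Endpoint 1: cheap; returns list metadata only, with no counts."""
    return [{"id": lst.id, "name": lst.name} for lst in LISTS]


def get_contact_count(list_id: int) -> dict:
    """Endpoint 2: the expensive part, computed for one list per request."""
    # len() is a placeholder for the slow "associated" counting logic.
    contacts = MEMBERSHIPS.get(list_id, [])
    return {"list_id": list_id, "count": len(contacts)}
```

With this shape the page can render the list of Lists right away and fetch each count separately, so one slow count no longer holds up (or times out) the whole response.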
g
so relatable!