# general
Moshe
Hey gang, curious if anyone here has had experience with YugabyteDB (self-hosted OSS, managed, anything really). How is it holding up for you? Is it delivering on the promises? Latency issues (read-after-write and such)? Just spill it all out...

Bonus question: did you move from MySQL (using their migration tool)? How was that experience?

It's not really on the horizon, as we are quite entrenched in MySQL for now, but I'm super curious... It sounds like a next-level Cassandra (which I love), but with ACID (really?), so perhaps they deliver? I wonder how much a developer who is used to a "regular" single-master RDBMS needs to learn and watch out for when using this (the transition into Cassandra and all its eventual-consistency models is a very steep cliff to climb).

Link for reference: https://www.yugabyte.com/

Thanks
Rauan Mayemir
I’ve evaluated it and have also seen some benchmarks from its competitors. It really does deliver the crazy performance it boasts.
Haven’t checked its distributed qualities, but I trust they’re making strides in that area.
I’ve also explored ‘less disruptive’ approaches like the Citus extension for PostgreSQL.
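For context, the Citus approach is roughly: keep running vanilla Postgres and distribute individual big tables across worker nodes. A minimal sketch of what that looks like — the `events` table and `tenant_id` column are hypothetical, purely to illustrate the API:

```sql
-- Citus: stay on stock Postgres, shard one large table across workers.
CREATE EXTENSION citus;

-- Distribute the (hypothetical) events table by tenant_id;
-- rows are spread across worker nodes by hash of that column.
SELECT create_distributed_table('events', 'tenant_id');
```

Queries that filter on `tenant_id` then route to a single shard, which is what makes it ‘less disruptive’ than a full platform migration.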
Ultimately, I decided that unless you have massive operational data you need at your fingertips all at once, it is (to me) preferable to keep it as a single-node data set with no sharding (not talking about an HA setup with warm or hot standby). It is also better to work on the data model, making it suitable for partitioning, and ultimately to archive old data to either cheaper disks or just non-indexed, rarely-accessed tables (reachable only by primary key).

This makes sense for use cases where you have hot data with a requirement to keep the latest data accessible and queryable, and have older rows (documents, entities) at least directly accessible, e.g. via permalinks. Twitter could be the best example of such a system: I believe they cache the latest timeline, but you can still access individual tweets via direct links.
💯 1
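A minimal sketch of that hot/cold, partition-and-archive approach in vanilla Postgres — table, column, and partition names are all made up, and `cold_storage` is assumed to be a pre-created tablespace on cheaper disks:

```sql
-- Hot data lives in recent partitions; old partitions get demoted.
CREATE TABLE posts (
    id         bigint GENERATED ALWAYS AS IDENTITY,
    created_at timestamptz NOT NULL,
    body       text,
    PRIMARY KEY (id, created_at)     -- PK must include the partition key
) PARTITION BY RANGE (created_at);

CREATE TABLE posts_2023_12 PARTITION OF posts
    FOR VALUES FROM ('2023-12-01') TO ('2024-01-01');
CREATE TABLE posts_2024_01 PARTITION OF posts
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

-- Rich secondary index only on the hot partition (timeline queries).
CREATE INDEX posts_hot_created_idx ON posts_2024_01 (created_at);

-- Archiving: move the cold partition to cheaper disks. Rows stay
-- directly accessible by primary key (e.g. permalinks), just without
-- the extra indexes.
ALTER TABLE posts_2023_12 SET TABLESPACE cold_storage;
```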
Daniel Chaffelson
I think Yugabyte vs Cassandra/HBase/Mongo/etc. is similar to Pulsar's problem vs Kafka: they solve an interesting optimisation and architectural problem at large enterprise scale, but how many businesses actually need to move to that particular optimisation vs the better-known alternatives? Yuga touts a better metro cluster and good performance, Cassandra claims to be the generalist king, Accumulo covers govt security requirements, Mongo is the most accessible for newcomers, and nothing scales bigger than HBase if you know what you're doing.
Rauan Mayemir
I’m cautiously optimistic about FoundationDB. Granted, it’s unreasonable to spend precious engineering resources on building layers on top of it instead of just using PostgreSQL/MySQL, but seeing other teams like https://www.tigrisdata.com/ building open-source MongoDB layers on it is exciting.
@Daniel Chaffelson Pulsars and Kafkas pave the way for the next generation of systems that either build on the same protocol (Redpanda) or force the incumbents to improve (RabbitMQ Streams).
Daniel Chaffelson
@Rauan Mayemir Yeah, I agree, there are plenty of good challengers in the RT message-handling space. I think the parallel here to the big NoSQLs is that Pulsar came later and solved a specific variant of the problem, but it turned out not as many people really had that problem, and they (IMO) didn't beat Confluent in the marketing battle over who defined the overall message. Yugabyte needs to define a space between easiest with Mongo, best single vendor with Cassandra, or default integrated platform with Cloudera, and that's without taking on the 3 big cloud providers. It's a tough space.
Disclaimer-wise: I used to be at Hortonworks and competed against these guys, but that was a little while before Yugabyte became relevant on the scene. Quite a few ex-customers and ex-colleagues are over there now! These days I work on ClickHouse, which solves yet another kind of specialisation.
👍 1
Rauan Mayemir
ClickHouse is an engineering marvel. I can’t wait to increase the complexity budget and bring it into the stack.
Moshe
@Daniel Chaffelson @Rauan Mayemir I'm specifically interested in the RDBMS aspect; this is very different in almost everything from NoSQL. Sharding is nice, but I can achieve that by other means; it's the approach of Spanner, for example, which is why per-instance capacity is limited there. I want to know how much data/capacity a single YugabyteDB cluster can "carry" in the multi-writer, multi-reader configuration (in a single region).
Daniel Chaffelson
I guess you will need someone who has directly worked on it then, hence your original question. My best advice is always to test your specific data and use case in a PoC and not to trust benchmarketing figures, as hidden assumptions about CAP tradeoffs or required functionality often come to light and need to be addressed, particularly with 'new' database engines. We recently won a PoC because the customer assumed the other vendor (I won't name them) could do specific kinds of joins; it turned out they couldn't and we could, and that made the difference in the end. I gather from your description that you are pushing the boundaries of a particular kind of performance profile on MySQL. I hope you find a good answer!
Moshe
Yeah, there is no replacement for PoCing and testing. We are right now testing AWS Aurora for a migration, and the plan is for a months-long process, just testing various aspects. I'm looking for a "not a sales pitch" review from an unaffiliated engineer who used/uses it in production, so I get a feel for gotchas and other things I won't get from the vendor without very pointed questions, or would waste a LOT of time verifying myself.
👍 2
g
Moshe, if you tweet your question, I can retweet and try to reach more people. I don't know anyone using Yuga myself, but someone must be using them...
c
My family is good friends with some of the leadership at YB (so I might be biased), and I have studied their architecture pretty extensively (though I've never used it outside a KIND cluster on my own laptop).

It's almost 100% API-compatible with Postgres (last I checked, it didn't support all types of triggers but had everything else). The way they do that is they forked the Postgres query-parsing layer and rewrote the storage layer below it in a distributed manner. Transactions use the Raft consensus protocol. It's a CP system (when the leader crashes, it doesn't accept writes but does accept reads). Depending on configuration, failover is generally around 2-15 seconds; they support multiple heartbeats per second, so failure detection is quite fast.

They also have a larger hierarchy of rack-awareness. I think it goes region -> datacenter -> node, whereas Pulsar only has datacenter -> node and Kafka only has rack. (I don't know about the hierarchy in Cassandra.)

Fantastic product. But their problems are that 1. they haven't won the marketing battle yet, and 2. not everyone needs ACID transactions across multiple clouds and multiple regions.
🙌 1
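To make the compatibility point concrete: the pitch is that ordinary Postgres SQL, including multi-statement transactions, runs unchanged against YSQL, with the distributed storage and Raft replication happening underneath. A minimal sketch — the `accounts` table and values are made up:

```sql
-- Plain Postgres DDL/DML against YugabyteDB's YSQL endpoint; the
-- distributed storage layer and Raft replication are transparent.
CREATE TABLE accounts (
    id      bigint PRIMARY KEY,
    balance numeric NOT NULL CHECK (balance >= 0)
);
INSERT INTO accounts VALUES (1, 100), (2, 50);

-- A standard ACID transaction; YB coordinates it across tablets/nodes.
BEGIN;
UPDATE accounts SET balance = balance - 25 WHERE id = 1;
UPDATE accounts SET balance = balance + 25 WHERE id = 2;
COMMIT;
```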
YB also has strongly-consistent secondary indexes, which Cassandra doesn't have.
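That is, on the YSQL side a plain `CREATE INDEX` is maintained transactionally with the base table, so a committed write is immediately visible through the index. A quick sketch, reusing the hypothetical `accounts` table above:

```sql
-- Secondary index maintained transactionally with the base table.
CREATE INDEX accounts_balance_idx ON accounts (balance);

UPDATE accounts SET balance = 75 WHERE id = 2;

-- Read-after-write through the index sees the committed row at once
-- (no Cassandra-style eventual consistency on the index path).
SELECT id FROM accounts WHERE balance = 75;
```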