Thursday, December 04, 2014

AerospikeDB vs. Cassandra on Google Compute Engine





Last month I began a new job at Aerospike (Twitter: @aerospikeDB, http://www.aerospike.com). I've been learning a lot about the company, and every day I learn more about NoSQL and find a new reason to be impressed by our product and the people behind it.

Today, a new report was released on another Blogspot account, GoogleCloudPlatform.blogspot.com. It mirrors an earlier report, from March 2014, about Apache Cassandra, a different open source NoSQL database.

Many customers say, "I already have a NoSQL database. We're already running Cassandra. Why should I switch?" or "How does AerospikeDB compare to Cassandra?"

Some people try to answer those NoSQL questions simply by looking at the popularity of keyword searches, or at how often a NoSQL vendor shows up on resumes. But I have an axiom, which is derived from "quantity ≠ quality":

popularity ≠ performance

Today's blog post at Google Cloud Platform answers those questions, and points out how popular misperceptions need to be challenged from time to time. These two separate announcements on Google Compute Engine have a very similar headline, but tell a very different story.
A million writes per second! At first blush, that's impressive. If you were a busy professional reading tweets, you might just notice the similarity of the number, shrug and move on. This is why it is always important to "double click" and get into the details, because it's a question of how much hardware you have to throw at a problem to produce those same numbers.

300 vs. 50

In the Cassandra test, which was a write-only test, DataStax was able to achieve 1 million writes-per-second using 300 Google Compute Engine virtual machines. They generated the traffic using 30 more machines. Giving credit where credit is due, they were able to achieve an impressive 10.3 ms median latency, with 95% of writes within 23 ms.

In the Aerospike write test, we were able to achieve the same 1 million writes-per-second using 50 Aerospike servers, with traffic generated from 20 client nodes. 83% of the writes completed in 7 ms, and 96% completed within 32 ms.


Another detail to note is that the Cassandra test used 170-byte writes, whereas the Aerospike test used a 200-byte payload. That's 17.6% more raw data per write.

AerospikeDB delivered 17.6% more raw data throughput on 16.67% (1/6th) of the hardware. A back-of-the-napkin calculation suggests that the throughput per server on Aerospike was about 4 MB of data per second, whereas Cassandra got about 567 KB. That is just about 7 times the per-server throughput.
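
For anyone who wants to check the napkin math, here is a minimal Python sketch of it. It uses only the figures quoted above (1 million writes per second, 170-byte vs. 200-byte payloads, 300 vs. 50 servers). The function name is my own, and the calculation assumes decimal units (1 KB = 1,000 bytes), an even spread of writes across servers, and no accounting for replication or protocol overhead, so treat it as a rough estimate rather than a measured result.

```python
# Back-of-the-napkin per-server throughput, using the figures quoted in this post.
# Assumptions: decimal units (1 KB = 1,000 bytes), writes spread evenly across
# servers, replication and protocol overhead ignored.
WRITES_PER_SECOND = 1_000_000


def per_server_bytes_per_second(payload_bytes, servers,
                                writes_per_second=WRITES_PER_SECOND):
    """Raw payload bytes written per second, divided evenly across servers."""
    return writes_per_second * payload_bytes / servers


cassandra = per_server_bytes_per_second(payload_bytes=170, servers=300)
aerospike = per_server_bytes_per_second(payload_bytes=200, servers=50)

print(f"Cassandra: {cassandra / 1_000:.0f} KB/s per server")
print(f"Aerospike: {aerospike / 1_000_000:.1f} MB/s per server")
print(f"Ratio:     {aerospike / cassandra:.2f}x")
```

Swapping in the payload size and server count from any other benchmark gives the same kind of per-server comparison.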

Which would you rather pay for? 300 servers or 50?
We scaled even better in reads; we used only 10 machines to do 1 million reads-per-second.


There's a lot more to be read in the studies, and I encourage everyone to do their own research, benchmarks and testing into the claims of any vendor, including my own. Feel free to go to Aerospike.com to discover more about our product architecture, download the software, read the docs, and get in touch with the user community and follow what they've been doing.


If you want to get in touch with me directly to learn more about Aerospike's NoSQL DB, you can contact me via these mechanisms:

Peter Corless

e pcorless@aerospike.com
t @PeterCorless

m 650-906-3134
