Why in-memory doesn't matter (and why it does)
(This post is a bit of a blast from the past. It was originally published in 2009 and was lost during a migration in 2011. The software landscape it references is obviously dated and many of the links are likely broken, but the analysis is still relevant.)
Well, that title was going to be perfect flame-bait, but then I went all moderate and decided to write a post that actually matters. So here's the low-down:
There's a lot of talk lately about in-memory and how it's the awesome. This is especially true in the SAP-o-sphere, primarily due to SAP's marketing might getting thrown behind Business Warehouse Accelerator (BWA) and the in-memory analytics baked into Business ByDesign.
I'm here today to throw some cold water on that pronouncement. Yes, in-memory is a great idea in a lot of situations, but it has its downsides, and it won't address a lot of the issues that people are saying it addresses. In the SAP space, I blame some of the marketing around BWA. In the rest of the internet, I'm not sure if this is even an issue.
Since I've actually done a fair amount of thinking about these issues (and as a result I troll people on Twitter about it), I thought maybe it'd be helpful if I wrote it down.
So let's get down to brass tacks:
How in-memory helps
In short: it speeds everything up.
How much? Well, let's do the math: Your high-end server hard drive has a seek time of around 2 ms. That's 2*10^-3 seconds (thanks Google). Yes, I'm ignoring rotational latency to keep it simple.
Meanwhile, fast RAM has a latency measured in nanoseconds. Let's say 10ns to keep it simple. That's 10^-8 seconds.
So, if I remember my arithmetic (and I don't), that's 2*10^-3 divided by 10^-8, which gives 2*10^5: RAM access is about 200,000 times faster than a disk seek.
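If you don't trust my arithmetic either, here's the whole calculation, using the rough figures above (these are illustrative numbers, not measurements):

```python
# Back-of-the-envelope latency comparison -- not a benchmark.
disk_seek_s = 2e-3     # ~2 ms seek time for a high-end server drive
ram_latency_s = 10e-9  # ~10 ns for fast RAM

print(disk_seek_s / ram_latency_s)  # 200000.0, i.e. about 2 * 10^5
```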
Keep in mind that RAM's advantage is actually bigger than that, because the CPU-memory interface usually supports much higher transfer rates than the disk interface. Then again, disks can claw some of that back, because there are ways to drastically improve overall access performance and transfer rates (RAID, iSCSI? - not really my area). The point is, RAM makes your data access go a lot faster.
But ... er ... wait a second (or several thousand)
So here I am thinking, "Well, we're all fine and dandy then. I just put my job in RAM and it goes somewhere between 100,000 and 1,000,000 times as fast. Awesome!".
But then I remember that RAM isn't a viable backing store for some applications, like ERPs (no matter what Hasso Plattner seems to be saying) or any other application where you can't lose data, period. Yes, it can act as a cache, but your writes (at least) have to be transactional and durable, so they'll be constrained by the speed of your actual backing store, which will probably be a database on disk.
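To make that concrete, here's a minimal sketch of the pattern I mean (the class names are made up for illustration, not any particular product's API): reads are served from RAM, but every write has to hit the durable store before it counts, so write throughput is bounded by the disk no matter how much RAM you throw at the problem.

```python
# Minimal write-through cache sketch. Illustrative only; the class
# names are hypothetical stand-ins.

class SlowDiskStore:
    """Stand-in for a disk-backed database."""
    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data[key]      # imagine a ~2 ms seek here

    def write(self, key, value):
        self.data[key] = value     # imagine a transactional disk write here


class WriteThroughCache:
    def __init__(self, durable_store):
        self.store = durable_store
        self.cache = {}            # the in-memory copy

    def read(self, key):
        if key not in self.cache:
            self.cache[key] = self.store.read(key)  # slow path: disk
        return self.cache[key]                      # fast path: RAM

    def write(self, key, value):
        self.store.write(key, value)  # durability first -- the bottleneck
        self.cache[key] = value       # only then update the fast copy
```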
And then I see actual benchmarks meant to reflect the real world, like this one. For those who won't click the link, the numbers are a bit hard to read, but I'm seeing RAM doing about 10,000 database operations in the time it takes a disk-backed store to do about 100. That's only a 100x speedup.
Ok, now I'm back down to earth and I'm thinking, "I just put my job in RAM and I'll get maybe a 50-100x speedup but at the cost of significant volatility". (I'm also thinking that SAP's claimed performance improvements of 10x - 100x sound just about like what we'd expect.)
This is still really really good. It makes some things possible that were not possible before and it makes some things easy that used to be hard.
And finally, why in-memory doesn't matter
But really, what is the proportion of well-optimized workloads in the world? How often are people going to use in-memory as an excuse to be lazy about solving the actual underlying problems? In my experience, a lot. Already we are hearing things along the lines of, "The massive BW query on a DSO is slow? Throw the DSO into the BWA index." [Editor's note: A DSO is essentially a flat table. Also, the current version of BWA doesn't support direct indexing of DSOs, but it probably will soon, along with directly indexing ERP tables.]
Now's the part where we who know what we're doing tear these people to shreds and tell them to implement a real Information Lifecycle Management system and build an Inmon-approved data warehouse using their BW system (BW makes it relatively easy). Then that complex query on a flat table that used to take two days of runtime will run in 30 seconds.
Well, that would be one approach, but frankly most people and companies don't have the time or the organizational maturity in their IT function to pull it off. In a world where people have neither the time nor the business process for this sort of thing, it starts to make sense to spend money on the problem instead, and something like BWA is a great fit in that context.
But it's not great because it's in-memory. It's great because it takes your data - that data you haven't had the time to properly build into a data warehouse with a layered and scalable architecture, highly optimized ROLAP stores, painstakingly configured caching, and carefully crafted delta processes - and it compresses it, partitions it, and denormalizes it (where appropriate). Then, as the icing on the cake, it caches the heck out of it in memory.
Let's be clear: BW already has in-memory capabilities. Livecache is used with APO, and the OLAP cache resides in memory. The reason BWA matters is not that it is in-memory. It matters because it does the hard work for you behind the scenes, and partially because of this it is able to use architectural paradigms like column-based stores, compression, and partitioning that deliver performance improvements for certain types of queries regardless of the backing store.
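If the column-store point is unfamiliar, here's a toy illustration of why the layout matters independently of where the data lives. This is the general idea only - no claim about BWA's actual internals:

```python
# Row store vs. column store for an aggregate query (toy data).

rows = [
    {"customer": "A", "region": "EMEA", "revenue": 100},
    {"customer": "B", "region": "EMEA", "revenue": 250},
    {"customer": "C", "region": "APJ",  "revenue": 175},
]

# Row store: the query drags every field of every row past the CPU.
total = sum(r["revenue"] for r in rows)

# Column store: the same data, pivoted. The aggregate scans one
# contiguous column and never touches customer or region at all.
columns = {
    "customer": ["A", "B", "C"],
    "region":   ["EMEA", "EMEA", "APJ"],
    "revenue":  [100, 250, 175],
}
total = sum(columns["revenue"])

# And low-cardinality columns compress well, e.g. dictionary encoding:
dictionary = ["EMEA", "APJ"]  # distinct values, stored once
encoded = [0, 0, 1]           # region column as small integers
```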
In-memory is great, and fast, and should be used. But in most ways that are really important, it doesn't matter all that much.