Out of curiosity, let's measure the performance of memory access from Java. The test is limited to summing the values of a single long array. The array occupies 0.8 MB when testing the cache and 400 MB when testing RAM, and it is accessed either sequentially or in random order. This gives a total of four test cases: cache with sequential access, cache with random access, RAM with sequential access, and RAM with random access.
My setup: an i7-3687U CPU (2.10 GHz), Ubuntu 12.04 64-bit, 4 MiB cache, Oracle JVM 1.7.0_25 (64-Bit, build 23.25-b01). The "-server" flag is used when running Java. Computer: Toshiba Portege Z930-14L.
My Java code. Consider it public domain; do whatever you like with it. It's all in one class and has no dependencies besides a JVM. There are deficiencies, of course: code style has sometimes been sacrificed for performance. Loop unrolling did not improve performance, however. Note that more happens than just accessing memory; in particular, computing the next index is a significant part of the cacheRan test.
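The original class is not reproduced here, but a minimal sketch of the two cache-sized cases could look like the following. The class and method names, the repetition count, and the LCG constants are my own choices, not taken from the original code:

```java
// Minimal sketch of the benchmark idea; names and constants are
// hypothetical, not from the original class. Run with -server.
public class MemoryAccessBench {

    // Sequential pass: one add per element, prefetcher-friendly.
    static long sumSequential(long[] a) {
        long sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    // Pseudo-random pass: a linear congruential generator picks the
    // next index, so the index computation is itself part of the
    // measured time (as noted above for the cacheRan test).
    static long sumRandom(long[] a) {
        long sum = 0;
        int x = 12345;
        for (int n = 0; n < a.length; n++) {
            sum += a[x % a.length];
            x = (x * 1664525 + 1013904223) & 0x7fffffff; // keep non-negative
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] cache = new long[100_000]; // 100,000 longs = 0.8 MB
        int reps = 1_000;
        long sink = sumSequential(cache) + sumRandom(cache); // JIT warm-up

        long t0 = System.nanoTime();
        for (int r = 0; r < reps; r++) sink += sumSequential(cache);
        double seqNs = (System.nanoTime() - t0) / ((double) reps * cache.length);

        t0 = System.nanoTime();
        for (int r = 0; r < reps; r++) sink += sumRandom(cache);
        double ranNs = (System.nanoTime() - t0) / ((double) reps * cache.length);

        // Printing the checksum keeps the JIT from eliding the loops.
        System.out.printf("cacheSeq %.2f ns, cacheRan %.2f ns (checksum %d)%n",
                seqNs, ranNs, sink);
    }
}
```

For the RAM cases, the same idea applies with an array of 50 million longs (400 MB) and a sufficiently large heap, e.g. java -server -Xmx1g MemoryAccessBench.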
Test      Access time
cacheSeq   0.33 ns   (0.69 cycles)
cacheRan   3.16 ns   (6.64 cycles)
ramSeq     0.56 ns   (1.37 cycles)
ramRan    20.07 ns  (42.16 cycles)
Sequential access to the long array is surprisingly fast. It is interesting to note that sequential access to cached data takes less than one cycle per access. I believe this could be because SSE instructions are used, though I have not confirmed this; it could also be due to Intel's Turbo Boost technology. The cycle counts are computed assuming a 2.10 GHz clock frequency.
Sequential access in RAM is on par with sequential access to cached data. Random access to the cache is significantly slower (3 ns), and random access to the large RAM-resident array is many times slower still (20 ns).
Cache-aware algorithms are important. More on that later.
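As a taste of why access order matters, here is a classic illustration (my example, not from the original post): traversing a 2D array row by row versus column by column. Both compute the same sum, but the row-major order touches memory sequentially while the column-major order jumps a whole row's width between accesses:

```java
// Hypothetical example of cache-friendly vs cache-hostile traversal.
// Java stores a long[][] as rows of contiguous long[] arrays.
public class TraversalOrder {

    // Row-major: the inner loop walks contiguous memory (sequential access).
    static long sumRowMajor(long[][] m) {
        long sum = 0;
        for (int r = 0; r < m.length; r++) {
            for (int c = 0; c < m[r].length; c++) {
                sum += m[r][c];
            }
        }
        return sum;
    }

    // Column-major: the inner loop jumps between rows (strided access),
    // which defeats the prefetcher once the data exceeds the cache.
    static long sumColMajor(long[][] m) {
        long sum = 0;
        for (int c = 0; c < m[0].length; c++) {
            for (int r = 0; r < m.length; r++) {
                sum += m[r][c];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 2048; // 2048 x 2048 longs = 32 MB, well beyond a 4 MiB cache
        long[][] m = new long[n][n];
        for (int r = 0; r < n; r++)
            for (int c = 0; c < n; c++)
                m[r][c] = r + c;

        long t0 = System.nanoTime();
        long s1 = sumRowMajor(m);
        long t1 = System.nanoTime();
        long s2 = sumColMajor(m);
        long t2 = System.nanoTime();
        System.out.printf("rowMajor %d ms, colMajor %d ms (sums equal: %b)%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, s1 == s2);
    }
}
```

The two methods do exactly the same amount of arithmetic; any timing difference comes purely from the memory access pattern.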
Check the memory access time on your computer! Just run the Java class provided. Add the result as a comment, please.
UPDATE: Needless to say (?), it is very seldom that a programmer needs to make code cache-aware. Other things (disk/network IO) usually limit application performance, and the value of a product is seldom limited by its CPU performance anyway. Cache-aware algorithms are, however, interesting to me while creating BergDB. I aim at very high performance for data that fits into RAM; it should be on par with databases that assume all data fits into RAM. The topic is also interesting for those who write code for basic data structures: maps, sets, lists, and so on.