I loved this recent essay from one of the top PostgreSQL developers: Memory Matters.
Under any circumstances, reading from disk is vastly slower than reading from memory, but reading data from disk sequentially is 10 to 100 times faster than random I/O. Unfortunately, it's often the case that the task which is evicting data from memory is writing data sequentially while the underlying database workload is typically accessing some working set of pages in a more-or-less random fashion. The result is that data is removed from the cache at a vastly higher rate than it can be read back in. Even after the bulk operation terminates and the cache-purging ceases, it can take a painfully long time - sometimes many hours - for random I/O to bring all of the hot data back into memory.
I think it's possible that he meant "reading data sequentially," not "writing data sequentially,", but that's a nit.
The whole essay is great. I probably give this advice about 3x a week at work, and never word it anywhere nearly so well, so it's nice to have a great reference to point to.
And I really love his two final points:
- If adding memory doesn't seem to help, it's possible that you just haven't added enough.
- Memory is different: because of the way operating system and PostgreSQL caching works, it's likely that substantially all of your memory will be in use all the time