wheybags' blog: Recursive COW pages in userspace

COW stands for Copy On Write. Modern OSes will give you "pages" of memory (chunks of a predefined size) when you ask them to allocate space for your data. Unix-like OSes famously use copy-on-write pages to implement fork(2). When you call fork, the process is cloned, so naturally you'd need to copy all the memory that program is using into the clone, right?
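If you want to poke at those pages yourself, here's a small C sketch, assuming a POSIX system such as Linux. It asks the OS for its page size with `sysconf(_SC_PAGESIZE)` and then requests exactly one page with `mmap`. The helper names `page_size` and `alloc_one_page` are made up for illustration.

```c
#define _DEFAULT_SOURCE            /* exposes MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Ask the OS how big one page is. */
long page_size(void) {
    return sysconf(_SC_PAGESIZE);
}

/* Request exactly one page of private, anonymous memory straight from the
 * kernel. Returns NULL on failure; release it with munmap(p, page_size()). */
void *alloc_one_page(void) {
    void *p = mmap(NULL, (size_t)page_size(), PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

On a typical x86-64 Linux machine the page size is 4096 bytes, and the pointer `alloc_one_page()` returns is always page-aligned, because the kernel hands out memory in whole pages.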

Well, that would be fantastically wasteful. You'd spend all those cycles copying data, and double your memory footprint. It turns out that pretty often, after a fork, the new process doesn't need its own copy of most of the original's memory: it might never modify some of it, only read it. This is where COW comes in. The OS can create the cloned process so that it actually shares physical memory with the original. Doesn't that introduce all sorts of race conditions when the two processes read and write the same memory, though? Well, through the magic of virtual memory, the OS can detect when either process actually writes to a shared page (the shared pages are marked read-only, so a write traps into the kernel), and delay the copy until just before that write happens. This way you only pay for copying the pages you actually write to.
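Here's a rough sketch of what this looks like from the program's side, assuming a POSIX system. The child overwrites a global after `fork`, and the parent still sees the old value. Note that this only shows the *semantics* COW has to preserve (each process behaves as if it has a private copy); you can't observe the lazy copy itself from here. The function name `parent_value_after_child_write` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* A global the parent sets before forking. */
static int shared_counter = 1;

/* Fork, let the child overwrite shared_counter, and return what the PARENT
 * sees afterwards, or -1 on error. */
int parent_value_after_child_write(void) {
    shared_counter = 1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: on a COW-fork kernel, this write faults and the kernel
         * gives the child its own copy of the page before the store lands. */
        shared_counter = 42;
        _exit(shared_counter == 42 ? 0 : 1);   /* report whether the child saw its write */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return shared_counter;   /* the parent's page was never modified */
}
```

Calling `parent_value_after_child_write()` gives back 1, never 42: the child's write went to the child's copy of the page only.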

What is virtual memory tho

Virtual memory is a feature of most modern processors that adds a layer of indirection between the CPU and RAM. You have your physical RAM, with a range of addresses; let's say, for example, you have 10 bytes of RAM numbered 0-9. When your program asks for the memory at address 7, the CPU doesn't just look up the data at physical address 7; instead, it first consults the page table. The page table is a data structure maintained by your OS that maps your program's pointers (which live in the virtual address space) to the physical address space of your real memory.
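Real page tables don't map individual bytes; they map whole pages. Here's a toy, single-level page table in C to make the lookup concrete. It assumes 4 KiB pages and a tiny 16-page address space, and the types `pte_t`/`page_table_t` and the function `translate` are invented for this example. Real x86-64 hardware walks a multi-level tree of tables instead, but the split into page number and offset is the same idea.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12                       /* 4 KiB pages */
#define TOY_PAGE_SIZE  (1u << TOY_PAGE_SHIFT)
#define TOY_NUM_PAGES  16                       /* tiny 64 KiB virtual address space */

typedef struct {
    bool     present;                           /* is this virtual page mapped? */
    uint32_t frame;                             /* physical frame number if present */
} pte_t;

typedef struct {
    pte_t entries[TOY_NUM_PAGES];               /* indexed by virtual page number */
} page_table_t;

/* Translate vaddr to a physical address. Returns false (our stand-in for a
 * page fault) if the page is out of range or not mapped. */
bool translate(const page_table_t *pt, uint32_t vaddr, uint32_t *paddr) {
    uint32_t vpn    = vaddr >> TOY_PAGE_SHIFT;         /* virtual page number */
    uint32_t offset = vaddr & (TOY_PAGE_SIZE - 1);     /* byte within the page */
    if (vpn >= TOY_NUM_PAGES || !pt->entries[vpn].present)
        return false;
    *paddr = (pt->entries[vpn].frame << TOY_PAGE_SHIFT) | offset;
    return true;
}
```

With virtual page 2 mapped to physical frame 7, `translate` turns virtual address `0x2abc` into physical `0x7abc`: the page number changes, the offset within the page doesn't.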

So you might have a page table mapping that says "map virtual addresses 5-9 to physical addresses 0-4". Your lookup for the data at virtual address 7 would then read the physical memory at address 2. The OS maintains these page tables, but the CPU has special hardware (the memory management unit, or MMU) that uses them to translate every access, so the hardware and software work in tandem to keep the whole arrangement working. The greatest thing about this system is that it allows your OS to isolate different programs from each other by keeping a separate page table for each process. This means one buggy program can't overwrite the memory owned by another process, because it simply has no way to refer to it: those pages are not mapped in its page table. It is also the foundation of pretty much all sandboxing and local security measures.
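The 10-byte example, plus the per-process isolation, fits in a few lines of C. This is a toy model, not how an OS actually stores its tables: each process gets its own list of (virtual start, physical start, length) ranges, and `lookup`/`read_byte` are hypothetical helpers. A missing mapping returns -1, standing in for a segfault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The 10 bytes of physical RAM from the example, addresses 0-9. */
static unsigned char physical_ram[10];

/* One mapping: virtual [vstart, vstart+len) -> physical [pstart, pstart+len). */
typedef struct {
    unsigned vstart, pstart, len;
} mapping_t;

/* Each process has its own table; here a table is just a list of mappings. */
typedef struct {
    const mapping_t *maps;
    size_t count;
} process_table_t;

/* Translate a virtual address using one process's table.
 * Returns false if that process has no mapping covering the address. */
bool lookup(const process_table_t *t, unsigned vaddr, unsigned *paddr) {
    for (size_t i = 0; i < t->count; i++) {
        const mapping_t *m = &t->maps[i];
        if (vaddr >= m->vstart && vaddr < m->vstart + m->len) {
            *paddr = m->pstart + (vaddr - m->vstart);
            return true;
        }
    }
    return false;
}

/* Read a byte through a process's table; -1 plays the role of a segfault. */
int read_byte(const process_table_t *t, unsigned vaddr) {
    unsigned paddr;
    if (!lookup(t, vaddr, &paddr))
        return -1;
    return physical_ram[paddr];
}
```

With process A's table mapping virtual 5-9 to physical 0-4, `lookup` sends virtual 7 to physical 2. Give process B a table that maps virtual 0-4 to physical 5-9 instead, and no address B can name ever reaches physical 0-4.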

Side note: this is also one possible reason why calloc is preferred over malloc + memset(0). calloc can give you COW references to a single read-only zero page, repeated N times, and only bother zeroing real memory for a page when you actually write to it. malloc + memset(0), by contrast, writes to every page up front, forcing all of them to be backed by real memory straight away.
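You can watch the lazy part happen on Linux. This sketch skips calloc itself, since whether a given calloc call goes straight to the OS depends on the allocator and the request size, and instead maps fresh anonymous memory with `mmap`: the kind of zero-filled memory an allocator can hand back from calloc without a memset. It then uses the `mincore` system call (assumed Linux here) to count how many pages are actually resident in RAM: none right after the mmap, at least one after a single write (the kernel may fault in more than one), and all of them after a malloc-style `memset`. `zero_page_demo` and `resident_pages` are hypothetical names.

```c
#define _DEFAULT_SOURCE            /* exposes mincore() and MAP_ANONYMOUS in glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define NPAGES 64

/* Count how many of the npages pages starting at addr are resident in RAM.
 * Returns -1 on error. Linux-specific: mincore() fills one byte per page. */
static int resident_pages(void *addr, size_t npages, size_t page) {
    unsigned char vec[NPAGES];
    if (npages > NPAGES || mincore(addr, npages * page, vec) != 0)
        return -1;
    int count = 0;
    for (size_t i = 0; i < npages; i++)
        count += vec[i] & 1;                    /* low bit set = page is resident */
    return count;
}

/* Fills out[0..2] with the resident page count right after mmap, after one
 * write, and after a memset over the whole region. Returns 0 on success. */
int zero_page_demo(int out[3]) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = NPAGES * page;
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    out[0] = resident_pages(p, NPAGES, page);   /* nothing touched yet */
    p[0] = 1;                                   /* first write faults in real memory */
    out[1] = resident_pages(p, NPAGES, page);
    memset(p, 0, len);                          /* what malloc + memset(0) does */
    out[2] = resident_pages(p, NPAGES, page);
    munmap(p, len);
    return 0;
}
```

The memory reads as zero the whole time; what changes is how much of it the kernel has had to back with real pages.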

Tom Mason
