pinned_vec - invalidating iterator invalidation
- 4th December 2021
Alternate title: Fun with Virtual Memory
Anyone who has used the std::vector class in C++ should be familiar with the concept of iterator invalidation.
That doesn't sound like me, what is iterator invalidation?
For the uninitiated, an iterator in C++ is like a pointer to an item in a range of values, which can be used to iterate over that range. You can pass around pairs of start and end iterators to represent a range. The advantage of this is that the code receiving the iterator pair doesn't need to know which container they come from, so you can implement a generic algorithm like summation which can operate on any arbitrary range. If you're familiar with C#, it serves a similar purpose to IEnumerable, or generators in Python. When dealing with a flat block of memory, iterators are commonly implemented as just plain old pointers to the start and end of that block.
Because C++ is a language that gives you more than enough rope to hang yourself, you can of course store these iterators somewhere and then modify the collection they point into, or modify the collection while you're iterating it, or anything else you can think of. Of course, unless you're very careful, this is undefined behaviour, which can cause bugs and crashes.
auto end = vec.end(); // saving an iterator
// and then using it later on (comparison)
for (auto it = vec.begin(); it != end; ++it)
Totally fine, according to your compiler
The implementation of that collection might well decide to reallocate its storage (move its contents to a new, bigger block of memory and return the old block to the operating system), and now all your iterators are pointing into memory that has been freed. The iterators you had are no longer valid. This is iterator invalidation. To give a concrete example, when you append an item to a std::vector (the standard resizable array class) by calling push_back, you are required to abandon all your existing iterators, as the implementation could reallocate. In practice it won't do this on every push, as it reserves a bit of extra space, but you don't know when it will reallocate, so you have to act as though it might every time.
In general, when iterators are invalidated, any pointers into the range will be invalidated also, so I'll use the terms pointer and iterator interchangeably (iterator invalidation just has a certain ring to it. It's an Attractive Alliteration).
Wouldn't it be convenient if we didn't have to worry about iterator invalidation? We could have elements in a vector that hold pointers to other elements in the vector, pass pointers and iterators around willy-nilly, and live without fear of the dreaded nasal demons. What would be required for that to work? Well, the problem is that the address of the block of memory we are using changes, right? So we just need to make sure we always use a block of memory which starts at the same address. But there's a problem with that: we need to do other allocations, and besides, we're not the only program running on this machine; someone else will grab the next bit of memory after us and we'll have no room to expand. Well, enter...
We can use virtual memory to present a fragmented series of blocks of physical memory as a contiguous block of memory in virtual address space.
What is virtual memory?
Virtual memory is a feature of most modern processors that adds a layer of indirection between the CPU and RAM. You have your physical RAM, with a range of addresses; let's say for example you have 10 bytes of RAM numbered 0-9. When your program asks for the memory at address 7, it doesn't just look up the data at physical address 7; instead it first consults the page table. The page table is a data structure maintained by your OS that maps your program's pointers (which are in virtual address space) to the physical address space of your real memory.
So you might have a page table mapping that says "map virtual addresses 5-9 to physical addresses 0-4". This would mean your lookup for the data at virtual address 7 would end up reading the physical memory at address 2. The OS maintains these page tables, but the CPU has special hardware to use them, so the hardware and software work in tandem to keep the whole arrangement working. The greatest thing about this system, is it allows your OS to isolate different programs from each other, by keeping a separate page table for each process. This means that one buggy program can't overwrite the memory owned by another process, because it just has no way to refer to it, those pages are not mapped in its page table. It is also the foundation of pretty much all sandboxing and local security measures.
So we start with an initial block of memory mapped to some physical RAM, then when we need more we just map some pages at the end of that block's virtual address space. We still have a problem here: how do we stop malloc and friends from using the virtual address space immediately following our block? Well, we can reserve pages. And here's where it all comes together: these days, we're all using 64-bit operating systems. That gives us enough address space for 16 exabytes of RAM. I don't know about you, but I haven't got *nearly* that much physical RAM in my system (hi 2050, I hope that sentence aged badly).
Since just using virtual address space doesn't inherently use more physical RAM, this means we can afford to waste virtual address space. A lot of virtual address space. Unfortunately, OS developers already swooped and nabbed some of that free real estate, but we've still got a solid 17 terabytes of address space to play with, which is plenty for our purposes.
Here's a pie chart comparing the RAM in my PC (64 GiB) to the 64-bit address space.
What we can do is reserve a huge chunk of address space, way bigger than we could possibly need, say a whole terabyte. Different OSes have different ways of doing this, but at least Windows and Linux do allow it (side note: I really enjoyed reading the Windows memory management documentation pages. The page on VirtualAlloc2 has a pretty cool example of using virtual memory to implement a ring buffer that blew my mind when I first read about it). Now the rest of the system will leave our huge chunk of virtual memory alone, and we can map and unmap pages inside that range as we see fit.
So, that's about it. We use the OS-specific memory management utilities to set things up, wrap them into a trio of allocate, reallocate, and free functions, implement a basic copy of std::vector on top of that, then push_back() to your heart's content. But there's one last extra bonus...
When a traditional vector class reallocates, it has to move the old data into the new storage. Depending on what the vector is storing, this can get quite expensive. Even if it is just storing plain old data, copying all those bytes gets slow when the vector is large. With pinned_vec, we don't need to do that, so we should see a performance gain.
Benchmark of growing a vector of ints to specified size, on Windows 10.
At smaller sizes, std::vector wins out, but for larger vectors pinned_vec's lack of a need to copy takes over. Interestingly, for the larger sizes the percentage lead starts to slide; maybe Microsoft is doing some similar optimisation inside std::vector at large sizes? Testing on Linux I didn't see this: the pinned vector gained a substantial lead around 512 KiB and then kept it as far as I tested.
So should I actually use this thing in production?
I'm not, but I might some day. It's available here if you want. Support -100% guaranteed!