Now, at this stage there are two possibilities: either (a) there is a memory trample, but for whatever reason we only see it on one platform, or (b) the memory profiler code is not working correctly on that platform.
My first step was to put a breakpoint on the line of code that spits out the warnings and run the code in a debugger on the target platform.
Execution stops, and I examine the code in some more detail. The code that fires the warning works as follows: when a pointer to memory is about to be freed, it is passed to the memory profiler, which calls a "GetMemSize" function to determine the size of the allocation, then fetches a value from the last 4 bytes of the block and checks whether it is the expected value. If it is not, a warning is printed.
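To make that check concrete, here is a minimal sketch of a guard-value test of this kind. It is not the actual profiler code: the names WriteGuard and GuardIntact and the pattern 0xDEADBEEF are made up for illustration, and the size argument stands in for whatever GetMemSize returns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical guard pattern; the post does not say what value the profiler uses.
constexpr std::uint32_t kGuardValue = 0xDEADBEEFu;

// Writes the guard into the last 4 bytes of a block that is `size` bytes long.
inline void WriteGuard(void* block, std::size_t size) {
    assert(size >= sizeof(kGuardValue));
    unsigned char* end = static_cast<unsigned char*>(block) + size;
    std::memcpy(end - sizeof(kGuardValue), &kGuardValue, sizeof(kGuardValue));
}

// Returns true if the last 4 bytes of the block still hold the guard.
// `size` plays the role of the GetMemSize result: if it is wrong,
// the check reads from the wrong place and fails even on intact memory.
inline bool GuardIntact(const void* block, std::size_t size) {
    assert(size >= sizeof(kGuardValue));
    const unsigned char* end = static_cast<const unsigned char*>(block) + size;
    std::uint32_t found = 0;
    std::memcpy(&found, end - sizeof(found), sizeof(found));
    return found == kGuardValue;
}
```

Note that the check is only as trustworthy as the size it is given: an intact guard read through a size that is off by one byte looks exactly like a trample.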
The callstack does not make me happy. The code freeing the memory is my code - it's my vector (array) template class. This is code I've brought with me that has been in production use on shipping projects for years. It is unlikely, in my opinion, that this code is broken, but not impossible - so I keep looking. I look at the code using the array: is it doing anything untoward? The array class has assertions on element access to trap bounds errors, but the iterators do not. The code using the array class does not (that I can see) take iterators. It only uses push_back, pop_back and operator[]. All safe operations.
So time to take a step back. The profiling code itself is quite new, so let's look at it in more detail. Check where the trapped value is supposed to be set. And now I think I'm onto something.
Our memory manager is layered - Micro-Allocator, Turbo-Allocator and then the system (vendor/OS) allocator. The memory profiler is currently turned off for allocations that go via the micro and turbo allocators, so maybe there is a mistake in the way the #ifdefs have been done that means the value is being checked, but not set up at all.
This seems like a good lead at first, and I follow it for a while, but it turns out the mess of #ifdefs is set up correctly, the allocation in question is going via the system allocator, and the profiler is inserting its pad value at the end correctly.
Back to square one.
The problem is happening at start up, which is deterministic, and the OS uses a fixed memory model that means allocations always end up at the exact same address run after run. Time for some hardware breakpoints.
So I run again, find out the address of the value the mem profile system is complaining about, and breakpoint on write, then restart the run.
Skipping over earlier hits. And then nothing. Nothing relevant. Nothing writes to that byte. It is unused. The value is not touched.
What on earth?
Backtrack. Double check my numbers. That is the right address. Breakpoint the set up function. Breakpoint the creation of the array. Step through lots of code. The code is setting up the boundary value - but not in the place it is being fetched from.
So. It's not the value at the end of the array getting stomped, but the address the code is getting for the end of the array is wrong! Perhaps it's the value at the start of the array that is getting stomped?
New hardware breakpoint, at the start of the array this time. Run. Watch where we stop. The breakpoint hits in system malloc and system free. Makes sense. The callstacks look right. Hits malloc, free, malloc, free, malloc, free. Except nothing that shouldn't touch that memory is touching it. We just get a bunch of mallocs and frees and then the system falls over.
What on earth?
And then I notice it. The system that is using the array is an XML parser. There isn't one array - there are loads of them. And this is the mind-boggling part: the moment the call to free stomps on the size value of array 'A' is not the moment that the memory owned by 'A' is freed, it is when some other array is freeing memory.
So: we have two blocks of memory allocated by the system. At the start of each one, we have a 4-byte int for the total size of the block, and at the end of each one we have the guard value. We find the location of the guard value by checking the size, but when we free 'B' the size of 'A' changes.
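The layout is easier to see in a small simulation. The sketch below is only a model, not the vendor's allocator: FakeBlock, its 8 bytes of slack standing in for the neighbouring block, and the assumption that the size word counts the header plus the payload are all invented for illustration. It shows how a single bit changing in the size word moves the place the guard is read from, without the guard itself ever being written to.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::uint32_t kGuard = 0xDEADBEEFu;  // hypothetical guard pattern
constexpr std::uint32_t kHeaderBytes = 4;      // the 4-byte size word

// A simulated system block: [size word][payload ... guard][slack].
// The slack stands in for memory belonging to the next block.
struct FakeBlock {
    std::vector<unsigned char> bytes;

    explicit FakeBlock(std::uint32_t payloadBytes)
        : bytes(kHeaderBytes + payloadBytes + 8, 0) {
        assert(payloadBytes >= sizeof(kGuard) && payloadBytes % 4 == 0);
        SetHeader(kHeaderBytes + payloadBytes);
        // The guard sits in the last 4 bytes of the payload.
        std::memcpy(&bytes[kHeaderBytes + payloadBytes - sizeof(kGuard)],
                    &kGuard, sizeof(kGuard));
    }

    std::uint32_t Header() const {
        std::uint32_t h = 0;
        std::memcpy(&h, bytes.data(), sizeof(h));
        return h;
    }

    void SetHeader(std::uint32_t h) {
        std::memcpy(bytes.data(), &h, sizeof(h));
    }

    // Naive size lookup: trusts every bit of the size word.
    std::uint32_t NaivePayloadSize() const { return Header() - kHeaderBytes; }

    // Reads the "guard" from wherever the naive size says the payload ends.
    std::uint32_t GuardViaNaiveSize() const {
        std::uint32_t g = 0;
        std::memcpy(&g, &bytes[kHeaderBytes + NaivePayloadSize() - sizeof(g)],
                    sizeof(g));
        return g;
    }
};
```

In this model, setting the bottom bit of the header (as an allocator might while freeing a neighbouring block) makes NaivePayloadSize one byte too large, so GuardViaNaiveSize reads a shifted window and reports a mismatch.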
Nice.
Let's have a look at GetMemSize. For the platform in question GetMemSize(ptr) returns ptr[-1]-5. Note the vendor in question does not provide a memstat/memsize function so this function is the result of deduction and reverse engineering.
And it's wrong. The version of the function for another platform by the same vendor has a hack of a workaround that does: "if (memSize % 2 != 0) memSize = memSize + 1;".
Oh Hello.
So, it seems like the system memory manager uses those first 4 bytes for more than just the size. After all, the memory allocations are all 4-byte aligned, so the bottom 2 bits of the size are not needed, and it seems like they are used for something else, perhaps to do with the availability of neighboring blocks for use with realloc? Who knows.
But the fix now is clear: mask off the bottom bit, and we are good.
RwUInt32 memSize = (Ptr[-1]-4)&~1;
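As a sanity check, here is a small sketch comparing the original formula with the masked one. It uses std::uint32_t in place of RwUInt32, and the function names GetMemSizeOriginal and GetMemSizeFixed are mine. The meaning of the stored value (payload plus a 4-byte header, with the bottom bit as a flag) is an assumption, not something the vendor documents.

```cpp
#include <cassert>
#include <cstdint>

// Original deduction: size word minus 5. This is only right while the
// bottom (flag) bit of the size word happens to be set.
inline std::uint32_t GetMemSizeOriginal(const std::uint32_t* ptr) {
    return ptr[-1] - 5u;
}

// Fixed version: subtract the assumed 4-byte header, then clear the bottom bit
// so the answer no longer depends on the flag.
inline std::uint32_t GetMemSizeFixed(const std::uint32_t* ptr) {
    return (ptr[-1] - 4u) & ~1u;
}
```

With a size word of 37 (36 with the flag bit set) both versions give 32, but once the flag is cleared and the word reads 36, the original gives 31 while the masked version still gives 32.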
Perhaps I should mask off the other low bits after all?
Who knows, it all seems to be working now, so best leave well alone...