Tales from the Codeface
Working life
By codemonkey uk (Fri Mar 20, 2009 at 08:39:15 AM EST) (all tags)
This week is a milestone week: the end of MS12 is upon us, and my code is holding up well.  I've been at 0 bugs most of the week, and have spent most of my time testing and helping to debug miscellaneous bugs.

Including one particularly tricky problem described within.

On Wednesday I got an email from the technical lead.  Our memory profiling system had been spitting out a warning on one of our target platforms that indicated a memory overwrite/trample.  Both the technical lead and the programmer who implemented the system were snowed under, so could I take a look and find out what the problem was?

Now, at this stage there are two possibilities:  either (a) there is a memory trample, but for whatever reason we only see it on one platform, or (b) the memory profiler code is not working correctly on that platform.

My first step was to put a breakpoint on the line of code that spits out the warnings and run the code in a debugger on the target platform.

Execution stops, and I examine the code in some more detail.  The code that fires the warning works as follows: when a pointer to memory is about to be freed, it is passed to the memory profiler, which calls a "GetMemSize" function to determine the size of the allocation, then fetches a value from the last 4 bytes of the block and checks whether it is the expected value.  If it is not, a warning is printed.
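A minimal sketch of that kind of guard check, for illustration only.  The guard constant and the names SetGuard and GuardIntact are hypothetical (the post does not give the profiler's real ones), and here the block size is passed in directly rather than fetched via GetMemSize.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical guard value; the real profiler's constant is not given in the post.
constexpr std::uint32_t kGuard = 0xDEADBEEFu;

// Write the guard into the last 4 bytes of a block of `size` bytes.
void SetGuard(void* block, std::size_t size) {
    assert(size >= sizeof(kGuard));
    std::memcpy(static_cast<unsigned char*>(block) + size - sizeof(kGuard),
                &kGuard, sizeof(kGuard));
}

// Return true if the last 4 bytes of the block still hold the guard.
bool GuardIntact(const void* block, std::size_t size) {
    assert(size >= sizeof(kGuard));
    std::uint32_t tail;
    std::memcpy(&tail,
                static_cast<const unsigned char*>(block) + size - sizeof(tail),
                sizeof(tail));
    return tail == kGuard;
}
```

Note that the check is only as good as the size it is given: pass the wrong size and it reads the wrong 4 bytes.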

The callstack does not make me happy.  The code freeing the memory is my code - it's my vector (array) template class.  This is code I've brought with me and has been in production use on shipping projects for years.  It is unlikely, in my opinion, that this code is broken, but not impossible - so I keep looking.  I look at the code using the array: is it doing anything untoward?  The array class has assertions on element access to trap bounds errors, but the iterators do not.  The code using the array class does not (that I can see) take iterators.  It only uses push_back, pop_back and operator[].  All safe operations.

So time to take a step back.  The profiling code itself is quite new, so let's look at it in more detail.  Check where the trapped value is supposed to be set.  And now I think I'm onto something.

Our memory manager is layered - Micro-Allocator, Turbo-Allocator and then the system (vendor/OS) allocator.  The memory profiler is currently turned off for allocations that go via the micro and turbo allocators, so maybe there is a mistake in the way the #ifdefs have been done that means the value is being checked, but not set up at all.

This seems like a good lead at first, and I follow it for a while, but it turns out the mess of #ifdefs is set up correctly, the allocation in question is going via the system allocator, and the profiler is inserting its pad value at the end correctly.

Back to square one.

The problem is happening at start up, which is deterministic, and the OS uses a fixed memory model that means allocations always end up at the exact same address run after run.  Time for some hardware breakpoints.

So I run again, find out the address of the value the mem profile system is complaining about, and breakpoint on write, then restart the run.

Skipping over earlier hits.  And then nothing.  Nothing relevant.  Nothing writes to that byte.  It is unused.  The value is not touched.

What on earth?

Backtrack.  Double check my numbers.  That is the right address.  Breakpoint the set up function.  Breakpoint the creation of the array.  Step through lots of code.  The code is setting up the boundary value - but not in the place it is being fetched from.

So.  It's not the value at the end of the array getting stomped; the address the code is getting for the end of the array is wrong!  Perhaps it's the value at the start of the array that is getting stomped?

New hardware breakpoint, at the start of the array this time.  Run.  Watch where we stop.  The breakpoint hits in system malloc and system free.  Makes sense.  The callstacks look right.  Hits malloc, free, malloc, free, malloc, free.  Except nothing that shouldn't touch that memory is touching it.  We just get a bunch of mallocs and frees and then the system falls over.

What on earth?

And then I notice it.  The system that is using the array is an XML parser.  There isn't one array - there are loads of them.  And this is the mind boggling part:  the moment the call to free stomped on the size value of array 'A' was not the moment that the memory owned by 'A' was freed; it was when some other array was freeing its memory.

So:  We have two blocks of memory allocated by the system.  At the start of each one, we have a 4-byte int for the total size of the block, and at the end of each one we have the guard value.  We find the location of the guard value by checking the size, but when we free 'B' the size of 'A' changes.
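To make the consequence concrete, here is a small sketch of a simplified layout, not the platform's real one: a 4-byte size word sits just before the user pointer and the guard sits in the last 4 bytes of the user data.  The names, the guard constant and the exact offsets are all assumptions for illustration.  If the size word changes by even one, the profiler looks for the guard in the wrong place and reports a trample that never happened.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

constexpr std::uint32_t kGuard = 0xDEADBEEFu;  // hypothetical guard value
constexpr std::uint32_t kHeaderBytes = 4;      // assumed: size word precedes the user pointer

// Simplified deduction: trust the word before the user pointer as the total size.
std::uint32_t NaiveUserSize(const std::uint32_t* user) {
    return user[-1] - kHeaderBytes;
}

// Read the 4 bytes where the guard is expected for a given user size.
std::uint32_t ReadTail(const std::uint32_t* user, std::uint32_t userSize) {
    assert(userSize >= sizeof(std::uint32_t));
    std::uint32_t tail;
    std::memcpy(&tail,
                reinterpret_cast<const unsigned char*>(user) + userSize - sizeof(tail),
                sizeof(tail));
    return tail;
}
```

The guard bytes themselves are never written to in this scenario, which is why a hardware breakpoint on the guard's address sees nothing.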


Let's have a look at GetMemSize.  For the platform in question, GetMemSize(ptr) returns ptr[-1]-5.  Note that the vendor in question does not provide a memstat/memsize function, so this function is the result of deduction and reverse engineering.

And it's wrong.  The version of the function for another platform by the same vendor has a hack of a workaround that does: "if (memSize % 2 != 0) memSize = memSize + 1;".

Oh Hello.

So, it seems like the system memory manager uses those first 4 bytes for more than just the size.  After all, the memory allocations are all 4 byte aligned, so those first 3 bits are not needed, and it seems like they are used for something else, perhaps to do with the availability of neighboring blocks for use with realloc?  Who knows.

But the fix now is clear: mask off the bottom bit, and we are good.

RwUInt32 memSize = (Ptr[-1]-4)&~1;
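A quick way to sanity-check the arithmetic, as a sketch under stated assumptions: the header word is taken to be an even total size with one flag in the bottom bit, and the sibling platform's rounding hack is assumed to be applied to the same "minus 5" value (the post does not say).  The function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// The formula the post reports for the platform in question: ptr[-1] - 5.
std::uint32_t SizeOriginal(std::uint32_t header) {
    return header - 5;
}

// The sibling platform's workaround: round an odd result up to even.
std::uint32_t SizeRoundUp(std::uint32_t header) {
    std::uint32_t memSize = header - 5;
    if (memSize % 2 != 0) memSize = memSize + 1;
    return memSize;
}

// The post's fix: subtract 4, then clear the bottom bit.
std::uint32_t SizeMasked(std::uint32_t header) {
    return (header - 4) & ~1u;
}
```

With a true total of 20, a header of 21 (bottom bit set) gives 16 from all three, while a header of 20 (bottom bit clear) gives 15 from the original formula and 16 from the other two.  Under these assumptions the mask and the rounding hack agree; the mask just says what it means.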

Perhaps I should mask off the bottom 3 bits after all?

Who knows, it all seems to be working now, so best leave well alone...

Tales from the Codeface | 14 comments (14 topical, 0 hidden)
for all the whinging by sasquatchan (3.25 / 4) #1 Fri Mar 20, 2009 at 08:54:25 AM EST
FOSS folks make about winders platforms being a PITA to use (it can be), I've never had to deal with muck like that. Glad I've never needed to write my own allocator, much less profiling code.

You should try writing your own allocator. by dark nowhere (4.00 / 1) #2 Fri Mar 20, 2009 at 09:33:25 AM EST
There's lots of interesting little challenges in there.

See you, space cowboy.

[ Parent ]
I should add by sasquatchan (2.00 / 0) #5 Fri Mar 20, 2009 at 10:28:42 AM EST

I did for NACHOS, back in college, and loved it. (Well, I enjoyed writing the thread scheduling more ..)

If I were writing an OS, I would find it very interesting. If I'm writing a game or DB or other application, where I want my own pool of memory to allocate as I see fit, I would not like it. You have to deal with the underlying OS, the hardware, all the headaches of doing it for the OS, yet without the benefits of complete control.

[ Parent ]
Not sure I follow... by dark nowhere (2.00 / 0) #8 Fri Mar 20, 2009 at 11:34:35 AM EST
Aren't the cases where you want to manage the memory yourself mostly the same ones you'd want to write your own allocator for? Or do you mean that you want to allocate it... manually? Or are you just saying you hate to have to put up with the OS getting in your way?

See you, space cowboy.

[ Parent ]
Not sure I'm phrasing it right by sasquatchan (2.00 / 0) #9 Fri Mar 20, 2009 at 11:53:31 AM EST
Generally, most applications are happy to have new/delete/malloc/free provided by the OS. Those map to the runtime, which calls the OS-specific function for doing it. That is how most folks manage their memory themselves. The OS does the allocation, you call the OS function.

Certain very large applications, or very performance concerned applications will ask the underlying OS "I know what I'm doing. Give me a huge chunk of memory, and let me manage it myself". Meaning the OS hands over the chunk and walks away. The application  implements its own my_new/my_delete/my_malloc/my_free operations (its own memory manager) and doles out the RAM from that chunk. 

I believe the reasons behind it deal with lower overhead/faster calls to allocate/free, lets you do tricky things under the hood, etc. Unsure if codemonkey_uk can comment on their specific case.. Given his gaming background and the platform issues, it may be proprietary.

I haven't kept up with how various OSs handle that case, so perhaps there's less overhead involved in writing your own allocator nowadays.

[ Parent ]
but... malloc isn't provided by the OS by dark nowhere (2.00 / 0) #10 Fri Mar 20, 2009 at 02:22:06 PM EST
I mean, sure a C lib with malloc is often distributed with the OS, but it's not a system call. But maybe that's the semantical point that caused all of the confusion.

I know for sure there are reasons in gaming to write your own allocator or even just allocate manually for some things. Not sure how many of those cases have to do with speed, but situations with RAM constraints can benefit from tailored  allocation strategies (and thank god the modern "low RAM" scenario is usually one without an OS in the way.)

See you, space cowboy.

[ Parent ]
probably semantics.. by sasquatchan (2.00 / 0) #11 Fri Mar 20, 2009 at 02:59:12 PM EST
I prefer new, whatever new maps to ;)

Win32 programmers that haven't bothered to learn new platforms use the win32 api for alloc (eg VirtualAlloc) so there are both OS specific calls (like that) and generic malloc/new that "just work".

(Might still be quibble: OS still does memory management, whatever API is used eventually boils down to the OS walking the chain of free memory blocks looking for best-fit/first-fit and shuffling the chain based on those actions, no ?)

[ Parent ]
OS & memory by dark nowhere (2.00 / 0) #12 Fri Mar 20, 2009 at 03:21:47 PM EST
You're right--the OS will allocate to the program via syscall within malloc. That stuff is handled very differently in the modern OS, thanks to the MMU, which keeps everything looking nice and contiguous to the program.

I forget the specifics on 'new', but IIRC it does the same things for the purposes of this conversation (plus the C++ stuff on the side.)

See you, space cowboy.

[ Parent ]
Mask off the bottom 3 bits by wiredog (4.00 / 1) #3 Fri Mar 20, 2009 at 09:43:42 AM EST
You don't know what the system does with them, or what it will do with them in the future (Can it be field upgraded, or is it all in ROM?), and you don't use them yourself. Safer to explicitly ignore them.

Learned that bit doing motion control apps. Which are similarly low-level and constrained. If I never have to program in hex again...

Earth First!
(We can strip mine the rest later.)

As a hardware guy by garlic (2.00 / 0) #4 Fri Mar 20, 2009 at 10:20:25 AM EST
I am always irritated when the sw guys don't treat the 'unused' portions of a memory access with more respect. Because I know SW does this, I try not to grow the size of a current register and only add new registers. However, sometimes I need some sort of control register that has a variety of bits that can grow with new features. It'd be great if my software team wouldn't write garbage into the unused area that could eventually turn into a used area.

[ Parent ]
Folks abusing by sasquatchan (2.00 / 0) #6 Fri Mar 20, 2009 at 10:32:05 AM EST
supposedly "unused" bits of pointers/handles is a regular feature on "The old new thing". Always because some SW dev abuses something they "notice" about an abstract value (eg that handles are always mod 4, so  you can play with those other lower bits), then the next version of Windows doesn't keep that pattern (it was undocumented in the first place), and software breaks, and MSFT gets the grief, when bad SW writers were at fault for abusing something they didn't own.

[ Parent ]
(Comment Deleted) by yicky yacky (2.00 / 0) #7 Fri Mar 20, 2009 at 10:41:24 AM EST

This comment has been deleted by yicky yacky

[ Parent ]
You're too nice. by dark nowhere (4.00 / 0) #13 Fri Mar 20, 2009 at 04:05:25 PM EST
I would just use the words "undefined" or "reserved" here and there. If they fuck that up, they can eat one for all I care.

See you, space cowboy.

[ Parent ]
I haven't looked into computer architecture by wumpus (2.00 / 0) #14 Sun Mar 22, 2009 at 09:31:29 AM EST
for a long time, but it seemed full of "any warts you allow, such as letting programmers monkey with the 'unused bits', will have to be supported for all time".

I think the mac programmers playing with the 'unusable' 8 bits that a 68000 can't use is the poster child for this stupidity, but I suspect it happens every time they aren't forced not to.


[ Parent ]