Provided Example Code
This lab asks you to implement a tool that can be included within a C source file in order to check memory usage for a few specific types of errors:
- Freeing a pointer that has never been malloc'ed
- Freeing a pointer that has already been freed (without being rellocated)
- Leaking memory at the end, and only at the end, of the program
How Do We Track this?
In order to do this, it should keep track of all "active" allocations by address, adding them to some list when upon call to malloc() and removing them upon call to free(). Free should produce a message to stderr if, and only if, it is asked to free an address which is not known to be "active".
In order to distinguish between an allocation which has been freed more than once from a bogus pointer which never identified allocated space, the utility should also keep track of freed allocations, also by address. So, upon call to free(), an address should not only be removed from the list of allocations, but also added to the list of freed allocations. This way, if a user tries to free it again, the utility can determine that it is a multiple free, not an otherwise bogus pointer.
This also means that, upon malloc(), the address of the allocation should be removed from the list of "freed allocations", if present. If it isn't removed, an appropriate call to free might appear as a duplicate free.
The utility should also warn upon call to exit(), if any allocations remain within the allocated list. Specifically, it should report the address of each allocation which has not been freed. This will require exit() be replaced as was malloc and free.
For this to work, instead of returning from main, exit() must be called. Use a standard int main, but call exit() before your return 0. This means control will never reach return 0, but the compiler will be happy.
Format of Output
Messages should be as in the example below:
- Free requested of unallocated pointer 0x5A456312 at file.c:123
- Free requested of already freed pointer 0x5A456312 at file.c:123
- Allocation at address 0x5A456312 not freed at program termination
In the case of a duplicate free, report both the location of the free and the location of the original malloc. This can be done by leaving the information in the "active" list upon a free. If this is done, both sets of information are available -- and we can determine that the allocation is freed by its presence within the "freed" list alone.
- Free requested of already freed pointer 0x5A456312 at file.c:123, original allocation was at file.c:123
Intercepting calls to malloc() and free()
The basic plan is this, we're going to put together a header file that uses #define to replace calls to malloc() and free() with calls own versions, mymalloc() and myfree(). Anywhere that this header file is included, malloc and free won't be called -- instead, the preprocessor will transform those calls into calls to mymalloc() and myfree()
Since we don't want to reimplement malloc() and free() ourselves -- we'll leave that for 15-213 -- we still want to call the "real thing" from our version. This isn't a problem -- our library doesn't include the header file that does the replacement, so we still have access to malloc() and free() in our code.
The only other detail is that, in the header file, we're going to "#undef malloc" and "#undef free". We're doing this just in case one of the included header files has #defined them as we are. #undef is the opposite of a #define -- it removes a previously defined preprocessor macro.
So, for a listing of the code that illustrates this, grab the examples -- they are linked at the top of this page. Notice that main.c includes the memcheck.h header file -- this maps malloc() and free() over to our versions, mymalloc() and myfree(). Notice that memcheck.c does not include the header file -- this way, it can access the real malloc() and free(). You can use these examples as a starting point for this lab, if you'd like.
In addition to #define and #undef, this lab makes use of the preprocessor predefined names "__FILE__" and "__LINE__". When the preprocessor encounters these names, it replaces them with the name of the file and the line number, respectively. "__FILE__" is a string literal and "__LINE__" is a decimal constant.
If you'd like to see how they work, try adding this at any point within your favorite program:
/* prints the printf's file and line number */ printf ("%s:%d\n", __FILE__, __LINE__);
In order to incorporate this into your malloc and free, you'll want to change the malloc and free macros that we that we provided to capture the line and file at the point where the program called malloc or free. Remember that the macro is expanded "in place", so the "__LINE__" will be observed where malloc is called within the code, not where malloc is defined as a macro.
Below is an example of the modified macro.
#define malloc(x) mymalloc(x, __FILE__, __LINE__)
What to Track
Our main goal is to track memory addresses. But, reporting a problem to the user by memory address is not super-helpful. It doesn't answer the question, "Where did the problem occur?". And, it doesn't give any clue about "Which object was allocated?"
So, as you've noticed by our output, we're actually tracking a few things:
- Memory address
- Allocation size
- The file and line number of the malloc/free
In order to do this, you'll probably want to organize the information you need into a struct and keep track of instances of that struct, rather than pure addresses. this way, everything is in one place.
Keeping Track of Addresses: The Hash Table Approach
This assignment requires that you keep track of two different things: the "active" pointers and "freed" pointers. If you are familiar with hash tables, they are the logical choice of data structure for this assignment. One hash table can be used for each purpose.
If you take this route, you might want to consider the hash function shown below. It derives an index into the hash table directly from the memory address. it strips s you'll better understand after taking 15-213, it strips off the least-significant bits (right most) part of an address -- because these bits are affected by "alignment". And, it also strips of the high-order bits of an address -- because they are affected by the "big picture" location of the stack in memory. And, within the framed bits, it favors the lower bits, which change more rapidly, so the hash table won't fill sequentially as the stack grows upward. It isn't a really good hash function -- but it is simple and good enough.
#define hash(address,table_size) ( ((address) / 16) % (table_size) )
Also, you'll want to pick a hash table size that is "plenty big". This should be a configuration constant (#define or const), so that it can be changed. As yourself how many pointers you'll have in your program, estimate very high, and give your self several times that number -- and never less than a few hundred.
Keeping Track of Addresses: The Linked-List Approach
If you are unfamilar with hash tables, or just scared of them, you may use linked lists and search them brute-force. It isn't pretty -- but we do know that not everyone yet feels comfortable enough with hash tables. There is no penalty for this approach nor is there extra credit for using hash tables.
Shell script Test Driver
Please write a shell script to serve as a test driver for your library. For each source file in the input directory (argv), it should compile and execute the program using your memory utility. This means that your script should add the appropriate annotations to the original source code -- the original source should not necessarily have the appropriate #include.
For each input source program, it should generate an trace file with a name of the form "xyz.c.trace", where "xyz.c" is the name of the source file. these files should be placed into the output directory (argv).
You may assume that no arguments are required for any of the test programs. You may also generate temporary files, if you'd like -- but you should clean them up as you go.
We're Here To Help!
As always -- remember, we're here to help!