Last modified

This is the heap profiler we use at Google, to explore how C++ programs manage memory. This facility can be useful for

The profiling system instruments all allocations and frees. It keeps track of various pieces of information per allocation site. An allocation site is defined as the active stack trace at the call to malloc, calloc, realloc, or, new.

There are three parts to using it: linking the library into an application, running the code, and analyzing the output.

Linking in the Library

To install the heap profiler into your executable, add -ltcmalloc to the link-time step for your executable. Also, while we don't necessarily recommend this form of usage, it's possible to add in the profiler at run-time using LD_PRELOAD:

% env LD_PRELOAD="/usr/lib/libtcmalloc.so" <binary>

This does not turn on heap profiling; it just inserts the code. For that reason, it's practical to just always link -ltcmalloc into a binary while developing; that's what we do at Google. (However, since any user can turn on the profiler by setting an environment variable, it's not necessarily recommended to install profiler-linked binaries into a production, running system.) Note that if you wish to use the heap profiler, you must also use the tcmalloc memory-allocation library. There is no way currently to use the heap profiler separate from tcmalloc.

Running the Code

There are several alternatives to actually turn on heap profiling for a given run of an executable:

  1. Define the environment variable HEAPPROFILE to the filename to dump the profile to. For instance, to profile /usr/local/bin/my_binary_compiled_with_tcmalloc:

    % env HEAPPROFILE=/tmp/mybin.hprof /usr/local/bin/my_binary_compiled_with_tcmalloc
  2. In your code, bracket the code you want profiled in calls to HeapProfilerStart() and HeapProfilerStop(). (These functions are declared in <gperftools/heap-profiler.h>.) HeapProfilerStart() will take the profile-filename-prefix as an argument. Then, as often as you'd like before calling HeapProfilerStop(), you can use HeapProfilerDump() or GetHeapProfile() to examine the profile. In case it's useful, IsHeapProfilerRunning() will tell you whether you've already called HeapProfilerStart() or not.

For security reasons, heap profiling will not write to a file -- and is thus not usable -- for setuid programs.

Modifying Runtime Behavior

You can more finely control the behavior of the heap profiler via environment variables.

HEAP_PROFILE_ALLOCATION_INTERVAL default: 1073741824 (1 Gb) Dump heap profiling information each time the specified number of bytes has been allocated by the program.
HEAP_PROFILE_INUSE_INTERVAL default: 104857600 (100 Mb) Dump heap profiling information whenever the high-water memory usage mark increases by the specified number of bytes.
HEAP_PROFILE_TIME_INTERVAL default: 0 Dump heap profiling information each time the specified number of seconds has elapsed.
HEAPPROFILESIGNAL default: disabled Dump heap profiling information whenever the specified signal is sent to the process.
HEAP_PROFILE_MMAP default: false Profile mmap, mremap and sbrk calls in addition to malloc, calloc, realloc, and new. NOTE: this causes the profiler to profile calls internal to tcmalloc, since tcmalloc and friends use mmap and sbrk internally for allocations. One partial solution is to filter these allocations out when running pprof, with something like pprof --ignore='DoAllocWithArena|SbrkSysAllocator::Alloc|MmapSysAllocator::Alloc.
HEAP_PROFILE_ONLY_MMAP default: false Only profile mmap, mremap, and sbrk calls; do not profile malloc, calloc, realloc, or new.
HEAP_PROFILE_MMAP_LOG default: false Log mmap/munmap calls.

Checking for Leaks

You can use the heap profiler to manually check for leaks, for instance by reading the profiler output and looking for large allocations. However, for that task, it's easier to use the automatic heap-checking facility built into tcmalloc.

Analyzing the Output

If heap-profiling is turned on in a program, the program will periodically write profiles to the filesystem. The sequence of profiles will be named:

           <prefix>.0000.heap
           <prefix>.0001.heap
           <prefix>.0002.heap
           ...

where <prefix> is the filename-prefix supplied when running the code (e.g. via the HEAPPROFILE environment variable). Note that if the supplied prefix does not start with a /, the profile files will be written to the program's working directory.

The profile output can be viewed by passing it to the pprof tool -- the same tool that's used to analyze CPU profiles.

Here are some examples. These examples assume the binary is named gfs_master, and a sequence of heap profile files can be found in files named:

  /tmp/profile.0001.heap
  /tmp/profile.0002.heap
  ...
  /tmp/profile.0100.heap

Why is a process so big

    % pprof --gv gfs_master /tmp/profile.0100.heap

This command will pop-up a gv window that displays the profile information as a directed graph. Here is a portion of the resulting output:

A few explanations:

Comparing Profiles

You often want to skip allocations during the initialization phase of a program so you can find gradual memory leaks. One simple way to do this is to compare two profiles -- both collected after the program has been running for a while. Specify the name of the first profile using the --base option. For example:

   % pprof --base=/tmp/profile.0004.heap gfs_master /tmp/profile.0100.heap

The memory-usage in /tmp/profile.0004.heap will be subtracted from the memory-usage in /tmp/profile.0100.heap and the result will be displayed.

Text display

% pprof --text gfs_master /tmp/profile.0100.heap
   255.6  24.7%  24.7%    255.6  24.7% GFS_MasterChunk::AddServer
   184.6  17.8%  42.5%    298.8  28.8% GFS_MasterChunkTable::Create
   176.2  17.0%  59.5%    729.9  70.5% GFS_MasterChunkTable::UpdateState
   169.8  16.4%  75.9%    169.8  16.4% PendingClone::PendingClone
    76.3   7.4%  83.3%     76.3   7.4% __default_alloc_template::_S_chunk_alloc
    49.5   4.8%  88.0%     49.5   4.8% hashtable::resize
   ...

Ignoring or focusing on specific regions

The following command will give a graphical display of a subset of the call-graph. Only paths in the call-graph that match the regular expression DataBuffer are included:

% pprof --gv --focus=DataBuffer gfs_master /tmp/profile.0100.heap

Similarly, the following command will omit all paths subset of the call-graph. All paths in the call-graph that match the regular expression DataBuffer are discarded:

% pprof --gv --ignore=DataBuffer gfs_master /tmp/profile.0100.heap

Total allocations + object-level information

All of the previous examples have displayed the amount of in-use space. I.e., the number of bytes that have been allocated but not freed. You can also get other types of information by supplying a flag to pprof:

--inuse_space Display the number of in-use megabytes (i.e. space that has been allocated but not freed). This is the default.
--inuse_objects Display the number of in-use objects (i.e. number of objects that have been allocated but not freed).
--alloc_space Display the number of allocated megabytes. This includes the space that has since been de-allocated. Use this if you want to find the main allocation sites in the program.
--alloc_objects Display the number of allocated objects. This includes the objects that have since been de-allocated. Use this if you want to find the main allocation sites in the program.

Interactive mode

By default -- if you don't specify any flags to the contrary -- pprof runs in interactive mode. At the (pprof) prompt, you can run many of the commands described above. You can type help for a list of what commands are available in interactive mode.

Caveats


Sanjay Ghemawat