The 2 children line says that this node has two children of its own. Allocations are aggregated by their allocation stack trace. The Allocated at section shows the allocation stack trace that is shared by all the blocks covered by this node. The Total line shows how much allocation this node accounts for. The Total line also shows allocation rates, measured in bytes and blocks per million instructions. These rates are useful for comparing the significance of nodes across profiles made with different workloads.
Finally, the Total line shows the average size and lifetimes of these blocks. The At t-gmax line shows that no blocks from this PP were alive when the global heap peak occurred. In other words, these blocks do not contribute at all to the global heap peak. The At t-end line shows that no blocks from this PP were alive at shutdown.
In other words, all those blocks were explicitly freed before termination. The Reads and Writes lines show how many bytes were read within this PP's blocks, the fraction this represents of all heap reads, and the read rate.
Finally, it shows the read ratio, which is the number of reads per byte. In this case the number is low. The Writes line is similar to the Reads line. Together, the Reads and Writes measurements suggest that the blocks are being under-utilised and might be worth optimizing. Having said that, this kind of under-utilisation is common in data structures that grow, such as vectors and hash tables, and isn't always fixable.
Leaf nodes contain an additional Max line, indicating the peak memory use for the blocks covered by this PP. This peak may have occurred at a time other than t-gmax. In this case, 31,… bytes were allocated from this PP, but the maximum size alive at once was 16,… bytes. In this example, the first 8 frames are identical to those from the node in the previous example. These frames could be found by tracing back through ancestor nodes, but that can be annoying, which is why they are duplicated.
This also means that each node makes complete sense on its own. If all blocks covered by a PP node have the same size, an additional Accesses field will be present. It indicates how the reads and writes within these blocks were distributed. Every block covered by this PP was 32 bytes.
Within all of those blocks, byte 0 was accessed (read or written) 65,… times, byte 1 was accessed 7 times, byte 2 was accessed 8 times, and so on. A dash (-) means "zero"; it is used instead of 0 because it makes unaccessed regions more easily identifiable. Block layout can often be inferred from the counts. For example, these blocks probably have four separate byte-sized fields, followed by a four-byte field, and so on. The PP tree is very large and many nodes represent tiny numbers of blocks and bytes.
Therefore, DHAT's viewer aggregates insignificant nodes. Much of the detail is stripped away, leaving only basic measurements, along with an indication of how many nodes were aggregated together (5 in this case). The viewer also shows the function used to determine whether a PP node is significant. All nodes that don't satisfy this function are aggregated. This is occasionally useful if you don't understand why a PP node has been aggregated.
The exact threshold depends on the sort metric (see below). Finally, the bottom of the page shows a legend that explains some of the terms, abbreviations and symbols used in the output. The order in which sub-trees are sorted can be changed via the "Sort metric" drop-down menu at the top of DHAT's viewer.
Different sort metrics can be useful for finding different things. Some sort metrics also incorporate filtering, so that only nodes meeting a particular criterion are shown. "Total (bytes)" is the total number of bytes allocated during the execution: highly useful for evaluating heap churn, though not quite as useful as "Total (blocks)". "Total (blocks)" is the total number of blocks allocated during the execution: highly useful for evaluating heap churn; reducing the number of calls to the allocator can significantly speed up a program.
This is the default sort metric. One variant is like "Total (blocks)", but shows only very small blocks; moderately useful, because such blocks are often easy to avoid allocating. Another is like "Total (blocks)", but shows only very short-lived blocks. Another is like "Total (bytes)", but shows only blocks that are never read, never written, or both.
Highly useful, because such blocks indicate poor use of memory and are often easy to avoid allocating. For example, sometimes a block is allocated and written to but then only read if a condition C is true; in that case, it may be possible to delay creating the block until condition C is true.
Alternatively, sometimes blocks are created and never used; such blocks are trivial to remove. Another metric is like "Total (bytes)", but shows only blocks that have low numbers of reads, low numbers of writes, or both. Moderately useful, because such blocks indicate poor use of memory. Another shows the breakdown of memory at the point of peak heap memory usage.
Highly useful for reducing peak memory usage. Another shows the breakdown of memory at program termination. Highly useful for identifying process-lifetime leaks. The heuristics control which interior pointers to a block cause it to be considered as reachable. The heuristic set can be specified in several ways. The heuristics have been tested with some GCC versions. If set to yes, the results for the leak search done at exit will be output in a 'Callgrind Format' execution tree file.
The produced file will contain several events describing the leak search results. The increase or decrease for each of these events will also be output in the file, to provide the delta (increase or decrease) between two successive leak searches. The values for the increase and decrease events will be zero for the first leak search done. See Execution Trees for a detailed explanation about execution trees. Specifies that Valgrind should produce the xtree leak report in the specified file.
See the description of --log-file for details. See Execution Trees for a detailed explanation about execution tree formats. Controls whether Memcheck reports uses of undefined value errors. Set this to no if you don't want to see undefined value errors. It also has the side effect of speeding up Memcheck somewhat. AddrCheck (removed in Valgrind 3.1.0) functioned like Memcheck with this option set to no. Controls whether Memcheck tracks the origin of uninitialised values.
By default, it does not, which means that although it can tell you that an uninitialised value is being used in a dangerous way, it cannot tell you where the uninitialised value came from. This often makes it difficult to track down the root problem. When set to yes , Memcheck keeps track of the origins of all uninitialised values. Then, when an uninitialised value error is reported, Memcheck will try to show the origin of the value.
An origin can be one of the following four places: a heap block, a stack allocation, a client request, or miscellaneous other sources (e.g. a call to brk). For uninitialised values originating from a heap block, Memcheck shows where the block was allocated. For uninitialised values originating from a stack allocation, Memcheck can tell you which function allocated the value, but no more than that -- typically it shows you the source location of the opening brace of the function.
So you should carefully check that all of the function's local variables are initialised properly. Performance overhead: origin tracking is expensive. It halves Memcheck's speed and increases memory use by a minimum of 100MB, and possibly more.
Nevertheless it can drastically reduce the effort required to identify the root cause of uninitialised value errors, and so is often a programmer productivity win, despite running more slowly. Accuracy: Memcheck tracks origins quite accurately. To avoid very large space and time overheads, some approximations are made.
It is possible, although unlikely, that Memcheck will report an incorrect origin, or not be able to identify any origin. Memcheck checks for and rejects this combination at startup. Controls how Memcheck handles 32-, 64-, 128- and 256-bit naturally aligned loads from addresses for which some bytes are addressable and others are not. When yes, such loads do not produce an address error.
Instead, loaded bytes originating from illegal addresses are marked as uninitialised, and those corresponding to legal addresses are handled in the normal way. When no , loads from partially invalid addresses are treated the same as loads from completely invalid addresses: an illegal-address error is issued, and the resulting bytes are marked as initialised.
If at all possible, such code should be fixed. Controls whether Memcheck should employ more precise but also more expensive (time-consuming) instrumentation when checking the definedness of certain values. In particular, this affects the instrumentation of integer adds, subtracts and equality comparisons.
This maximises performance but will normally give an unusably high false error rate. It also enables an instrumentation-time analysis pass which aims to further reduce the costs of accurate instrumentation. Note that the exact instrumentation settings in this mode are architecture dependent. With alloc-then-free , a stack trace is recorded at allocation time, and is associated with the block.
When the block is freed, a second stack trace is recorded, and this replaces the allocation stack trace. As a result, any "use after free" errors relating to this block can only show a stack trace for where the block was freed. With alloc-and-free , both allocation and the deallocation stack traces for the block are stored.
Hence a "use after free" error will show both, which may make the error easier to diagnose. Compared to alloc-then-free, this setting slightly increases Valgrind's memory use as the block contains two references instead of one. With alloc, only the allocation stack trace is recorded and reported. With free, only the deallocation stack trace is recorded and reported. These values somewhat decrease Valgrind's memory and CPU usage.
They can be useful depending on the error types you are searching for and the level of detail you need to analyse them. For example, if you are only interested in memory leak errors, it is sufficient to record the allocation stack traces. With none , no stack traces are recorded for malloc and free operations. Of course, few details will be reported for errors related to heap blocks. Note that once a stack trace is recorded, Valgrind keeps the stack trace in memory even if it is not referenced by any block.
Some programs (for example, recursive algorithms) can generate a huge number of stack traces. When a heap block is freed, it is not immediately made available for re-allocation. Instead, it is marked inaccessible and placed in a queue of freed blocks. The purpose is to defer as long as possible the point at which freed-up memory comes back into circulation. This increases the chance that Memcheck will be able to detect invalid accesses to blocks for some significant period of time after they have been freed. This option specifies the maximum total size, in bytes, of the blocks in the queue.
The default value is twenty million bytes. Increasing this increases the total amount of memory used by Memcheck but may detect invalid uses of freed blocks which would otherwise go undetected. When making blocks from the queue of freed blocks available for re-allocation, Memcheck will preferentially re-circulate the blocks with a size greater than or equal to --freelist-big-blocks.
This ensures that freeing big blocks (in particular, blocks bigger than --freelist-vol) does not immediately lead to a re-circulation of all or many of the small blocks in the free list. In other words, this option increases the likelihood of discovering dangling pointers to the "small" blocks, even when big blocks are freed. When enabled, assume that reads and writes some small distance below the stack pointer are due to bugs in GCC 2.96.
The "small distance" is 256 bytes by default. Note that GCC 2.96 was the default compiler on some ancient Linux distributions, and so you may need to use this option. Do not use it if you do not have to, as it can cause real errors to be overlooked. A better alternative is to use a more recent GCC in which this bug is fixed. You may also need to use this option when working with GCC 3.X on 32-bit PowerPC Linux.
This is in violation of the 32-bit PowerPC ELF specification, which makes no provision for locations below the stack pointer to be accessible. This option is deprecated as of version 3.12. You should instead use --ignore-range-below-sp to specify the exact range of offsets below the stack pointer that should be ignored. This is a more general replacement for the deprecated --workaround-gcc296-bugs option. When specified, it causes Memcheck not to report errors for accesses at the specified offsets below the stack pointer.
The two offsets must be positive decimal numbers and -- somewhat counterintuitively -- the first one must be larger, in order to imply a non-wraparound address range to ignore. Only one range may be specified.
When enabled, Memcheck checks that heap blocks are deallocated using a function that matches the allocating function. That is, it expects free to be used to deallocate blocks allocated by malloc , delete for blocks allocated by new , and delete[] for blocks allocated by new[].
If a mismatch is detected, an error is reported. This is in general important because in some environments, freeing with a non-matching function can cause crashes.
There is, however, a scenario where such mismatches cannot be avoided. For example, imagine that delete[] is inlined but new[] is not. The result is that Memcheck "sees" all delete[] calls as direct calls to free, even when the program source contains no mismatched calls. This causes a lot of confusing and irrelevant error reports, so in such cases the checks can be disabled. It is not generally advisable to disable them, though, because you may miss real errors as a result. Any ranges listed in this option (and multiple ranges can be specified, separated by commas) will be ignored by Memcheck's addressability checking.
Fills blocks allocated by malloc , new , etc, but not by calloc , with the specified byte. This can be useful when trying to shake out obscure memory corruption problems.
The allocated area is still regarded by Memcheck as undefined -- this option only affects its contents. Fills blocks freed by free, delete, etc, with the specified byte value. The freed area is still regarded by Memcheck as not valid for access -- this option only affects its contents. The basic suppression format is described in Suppressing errors. Value1, Value2, Value4, Value8, Value16, meaning an uninitialised-value error when using a value of 1, 2, 4, 8 or 16 bytes.
Cond (or its old name, Value0), meaning use of an uninitialised CPU condition code. Addr1, Addr2, Addr4, Addr8, Addr16, meaning an invalid address during a memory access of 1, 2, 4, 8 or 16 bytes respectively. Param errors have a mandatory extra information line at this point, which is the name of the offending system call parameter.
Leak errors have an optional extra information line, of the form match-leak-kinds: <set>. Be aware that leak suppressions that are created using --gen-suppressions will contain this optional extra line, and therefore may match fewer leaks than you expect.
You may want to remove the line before using the generated suppressions. If you give the -v option, Valgrind will print the list of used suppressions at the end of execution. For a leak suppression, this output gives the number of different loss records that match the suppression, and the number of bytes and blocks suppressed by the suppression.
If the run contains multiple leak checks, the number of bytes and blocks are reset to zero before each new leak check. Note that the number of different loss records is not reset to zero. For ValueN and AddrN errors, the first line of the calling context is either the name of the function in which the error occurred, or, failing that, the full path of the .so file or executable containing the error location. For Overlap errors, the first line is the name of the function with the overlapping arguments (e.g. memcpy).
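Putting these pieces together, a complete suppression entry might look like the following sketch; the suppression name, function name and library path are hypothetical, invented for illustration.

```
{
   uninit-in-third-party-lib
   Memcheck:Value4
   fun:process_input
   obj:/usr/lib/libthirdparty.so*
}
```

The first line inside the braces is a free-form name, the second is the tool and error kind, and the remaining fun:/obj: lines match the calling context.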
The last part of any suppression specifies the rest of the calling context that needs to be matched. Every bit (literally) of data processed, stored and handled by the real CPU has, in the synthetic CPU, an associated "valid-value" bit, which says whether or not the accompanying bit has a legitimate value. In the discussions which follow, this bit is referred to as the V (valid-value) bit.
Each byte in the system therefore has 8 V bits which follow it wherever it goes. For example, when the CPU loads a word-size item (4 bytes) from memory, it also loads the corresponding 32 V bits from a bitmap which stores the V bits for the process' entire address space.
If the CPU should later write the whole or some part of that value to memory at a different address, the relevant V bits will be stored back in the V-bit bitmap. In short, each bit in the system has conceptually an associated V bit, which follows it around everywhere, even inside the CPU. Yes, all the CPU's registers integer, floating point, vector and condition registers have their own V bit vectors.
For this to work, Memcheck uses a great deal of compression to represent the V bits compactly. Copying values around does not cause Memcheck to check for, or report on, errors. However, when a value is used in a way which might conceivably affect your program's externally-visible behaviour, the associated V bits are immediately checked.
If any of these indicate that the value is undefined (even partially), an error is reported. Memcheck emits no complaints about this, since it merely copies uninitialised values from a[] into b[], and doesn't use them in a way which could affect the behaviour of the program.
However, if the loop is changed so that the copied values feed into a printf decision, Memcheck will complain. It's only when a decision has to be made as to whether or not to do the printf -- an observable action of your program -- that Memcheck complains.
Most low level operations, such as adds, cause Memcheck to use the V bits for the operands to calculate the V bits for the result. Even if the result is partially or wholly undefined, it does not complain. Checks on definedness only occur in three places: when a value is used to generate a memory address, when a control flow decision needs to be made, and when a system call is detected, at which point Memcheck checks the definedness of parameters as required.
If a check should detect undefinedness, an error message is issued. The resulting value is subsequently regarded as well-defined. To do otherwise would give long chains of error messages.
In other words, once Memcheck reports an undefined value error, it tries to avoid reporting further errors derived from that same undefined value. This sounds overcomplicated. Why not just check all reads from memory, and complain if an undefined value is loaded into a CPU register?
Well, that doesn't work well, because perfectly legitimate C programs routinely copy uninitialised values around in memory, and we don't want endless complaints about that. Here's the canonical example. Consider a struct containing an int followed by a single char.
The question to ask is: how large is struct S , in bytes? An int is 4 bytes and a char one byte, so perhaps a struct S occupies 5 bytes? All non-toy compilers we know of will round the size of struct S up to a whole number of words, in this case 8 bytes.
Not doing this forces compilers to generate truly appalling code for accessing arrays of struct S on some architectures. So s1 occupies 8 bytes, yet only 5 of them will be initialised. If Memcheck simply checked values as they came out of memory, it would yelp every time a structure assignment like this happened. So the more complicated behaviour described above is necessary. This allows GCC to copy s1 into s2 any way it likes, and an error will only be emitted if the uninitialised values are later used.
As explained above, Memcheck maintains 8 V bits for each byte in your process, including for bytes that are in shared memory. However, the same piece of shared memory can be mapped multiple times, by several processes or even by the same process (for example, if the process wants a read-only and a read-write mapping of the same page).
For such multiple mappings, Memcheck tracks the V bits for each mapping independently. This can lead to false positive errors, as the shared memory can be initialised via a first mapping, and accessed via another mapping. The access via this other mapping will have its own V bits, which have not been changed when the memory was initialised via the first mapping.
Notice that the previous subsection describes how the validity of values is established and maintained without having to say whether the program does or does not have the right to access any particular memory location.
We now consider the latter question. As described above, every bit in memory or in the CPU has an associated valid-value (V) bit. In addition, all bytes in memory, but not in the CPU, have an associated valid-address (A) bit. This indicates whether or not the program can legitimately read or write that location. It does not give any indication of the validity of the data at that location -- that's the job of the V bits -- only whether or not the location may be accessed.
Every time your program reads or writes memory, Memcheck checks the A bits associated with the address. If any of them indicate an invalid address, an error is emitted. Note that the reads and writes themselves do not change the A bits, only consult them. When your program allocates heap memory, the A bits for that area are set to indicate accessibility; upon freeing the area, the A bits are changed to indicate inaccessibility.
When the stack pointer register (SP) moves up or down, A bits are set. The rule is that the area from SP up to the base of the stack is marked as accessible, and below SP is inaccessible. Tracking SP like this has the useful side-effect that the section of stack used by a function for local variables etc. is automatically marked accessible on function entry and inaccessible on exit. When doing system calls, A bits are changed appropriately. For example, mmap magically makes files appear in the process' address space, so the A bits must be updated if mmap succeeds.
Optionally, your program can tell Memcheck about such changes explicitly, using the client request mechanism described above. When memory is read or written, the relevant A bits are consulted. If they indicate an invalid address, Memcheck emits an Invalid read or Invalid write error.
When memory is read into the CPU's registers, the relevant V bits are fetched from memory and stored in the simulated CPU; they are not consulted at that point. When a register is written out to memory, the V bits for that register are written back to memory too.
Question (Tyson): Does anyone know why massif would fail to profile my application?
Comment: Is your program statically linked? You can check with ldd. — It is statically linked. I've edited the question to clarify this.
Answer (Dave S): From the Valgrind FAQ: "Second, if your program is statically linked, most Valgrind tools won't work as well, because they won't be able to replace certain functions, such as malloc, with their own versions."
Comment (Tyson): I missed that part of the FAQ. After changing to partial dynamic linking I am able to get a profile. Based on a conversation on the valgrind-users mailing list, valgrind should be able to analyze statically linked applications from version 3.
As requested on the mailing list I have filed a bug about this.