Recording memory profiles with Perfetto

In this guide, you'll learn how to:

The memory use of a process plays a key role in the performance of processes and impact on overall system stability. Understanding where and how your process is using memory can give significant insight to understand why your process may be running slower than you expect or just help make your program more efficient.

When it comes to apps and memory, there are mainly two ways a process can use memory:

Perfetto offers two complementary techniques for debugging the above:

Native (C/C++/Rust) Heap Profling

Native languages like C/C++/Rust commonly allocate and deallocate memory at the lowest level by using the libc family of malloc/free functions. Native heap profiling works by intercepting calls to these functions and injecting code which keeps track of the callstack of memory allocated but not freed. This allows to keep track of the "code origin" of each allocation. malloc/free can be perf-hotspots in heap-heavy processes: in order to mitigate the overhead of the memory profiler we support sampling to trade-off accuracy and overhead.

NOTE: native heap profiling with Perfetto only works on Android and Linux; this is due to the techniques we use to intercept malloc and free only working on these operating systems.

A very important point to note is that heap profiling is not retroactive. It can only report allocations that happen after tracing has started. It cannot provide any insight into allocations that occurred before the trace began. If you need to analyze memory usage from the start of a process, you must begin tracing before the process is launched.

If your question is "why is this process so big right now?" you cannot use heap profiling to answer questions about what happened in the past. However our anecdotal experience is that if you are chasing a memory leak, there is a good chance that the leak will keep happening over time and hence you will be able to see future increments.

Collecting your first heap profile

On Android Perfetto heap profiling hooks are seamlessly integrated into the libc implementation.

Prerequisites

  • A device running Android 10+.
  • A Profileable or Debuggable app. If you are running on a "user" build of Android (as opposed to "userdebug" or "eng"), your app needs to be marked as profileable or debuggable in its manifest. See the heapprofd documentation for more details.

Instructions

  • Open https://ui.perfetto.dev/#!/record
  • Select Android as target device and use one of the available transports. If in doubt, WebUSB is the easiest choice.
  • Click on the Memory probe on the left and then toggle the Native Heap Profiling option.
  • Enter the process name in the Names box.
  • The process name you have to enter is (the first argument of the) the process cmdline. That is the right-most column (NAME) of adb shell ps -A.
  • Select an observation time in the Buffers and duration page. This will determine for how long the profile will intercept malloc/free calls.
  • Press the red button to start recording the trace.
  • While the trace is being recorded, interact with the process being profiled. Run your user journey, test patterns, interact with your app.

UI Recording

On Android Perfetto heap profiling hooks are seamlessly integrated into the libc implementation.

Prerequisites

  • ADB installed.
  • Windows users: Make sure that the downloaded adb.exe is in the PATH. set PATH=%PATH%;%USERPROFILE%\Downloads\platform-tools
  • A device running Android 10+.
  • A Profileable or Debuggable app. If you are running on a "user" build of Android (as opposed to "userdebug" or "eng"), your app needs to be marked as profileable or debuggable in its manifest. See the heapprofd documentation for more details.

Instructions

$ adb devices -l List of devices attached 24121FDH20006S device usb:2-2.4.2 product:panther model:Pixel_7 device:panther transport_id:1

If more than one device or emulator is reported you must select one upfront as follows:

export ANDROID_SERIAL=24121FDH20006S

Download the tools/heap_profile (if you don't have a perfetto checkout):

curl -LO https://raw.githubusercontent.com/google/perfetto/main/tools/heap_profile

Then start the profile:

python3 heap_profile -n com.google.android.apps.nexuslauncher

Run your test patterns, interact with the process and press Ctrl-C when done (or pass -d 10000 for a time-limited profiling)

When you press Ctrl-C the heap_profile script will pull the traces and store them in /tmp/heap_profile-latest. Look for the message that says

Wrote profiles to /tmp/53dace (symlink /tmp/heap_profile-latest) The raw-trace file can be viewed using https://ui.perfetto.dev

Prerequisites

  • You need to build the libheapprofd_glibc_preload.so library from a Perfetto checkout (instructions)

Instructions

Download tracebox and the heap_profile script

curl -LO https://get.perfetto.dev/tracebox curl -LO https://raw.githubusercontent.com/google/perfetto/main/tools/heap_profile chmod +x tracebox heap_profile

Start the tracing service

./tracebox traced &

Generate the heapprofd config and start the tracing session.

# Replace trace_processor_shell with with the name of the process you want to # profile. ./heap_profile -n trace_processor_shell --print-config | \ ./tracebox perfetto --txt -c - -o ~/trace_processor_memory.pftrace

Open another terminal (or tab), start the process you want to profile, preloading the heapprofd's .so. Example:

PERFETTO_HEAPPROFD_BLOCKING_INIT=1 \ LD_PRELOAD=out/lnx/libheapprofd_glibc_preload.so \ trace_processor_shell /dev/null # Typing this will cause a 40MB allocation. > CREATE TABLE x as SELECT randomblob(40000000) as data

By default, heapprofd lazily initalizes to avoid blocking your program's main thread. However, if your program makes memory allocations on startup, these can be missed. To avoid this from happening, set the enironment variable PERFETTO_HEAPPROFD_BLOCKING_INIT=1; on the first malloc, your program will be blocked until heapprofd initializes fully but means every allocation will be correctly tracked.

At this point:

  • If your process terminates first, it will flush all the profiling data into the tracing buffer.
  • If not, you can request a flush of the profiling data by stopping the trace.

In either case: press Ctrl-C on the tracebox perfetto command of the previous step. That will write the trace file (e.g. ~/trace_processor_memory.pftrace) which you can then open with the Perfetto UI.

Remember also to kill the tracebox traced service once you are done.

fg # Ctrl-C

Visualizing your first heap profile

Open the /tmp/heap_profile-latest file in the Perfetto UI and click on the chevron marker in the UI track labeled "Heap profile".

Profile Diamond Native Flamegraph

The aggregated flamegraph by default shows unreleased memory (i.e. memory that has not been free()d) aggregated by call stack. The frames at the top represent the earliest entrypoint in your call stack (typically main() or pthread_start()). As you go towards the bottom, you'll get closer to the frames that ultimately invoked malloc().

You can also change the aggregation to the following modes:

Heap Profiling modes

Querying your first heap profile

As well as visualizing traces on a timeline, Perfetto has support for querying traces using SQL. The easiest way to do this is using the query engine available directly in the UI.

  1. In the Perfetto UI, click on the "Query (SQL)" tab in the left-hand menu.

    Perfetto UI Query SQL

  2. This will open a two-part window. You can write your PerfettoSQL query in the top section and view the results in the bottom section.

    Perfetto UI SQL Window

  3. You can then execute queries Ctrl/Cmd + Enter:

For example, by running:

INCLUDE PERFETTO MODULE android.memory.heap_graph.heap_graph_class_aggregation; SELECT -- Class name (deobfuscated if available) type_name, -- Count of class instances obj_count, -- Size of class instances size_bytes, -- Native size of class instances native_size_bytes, -- Count of reachable class instances reachable_obj_count, -- Size of reachable class instances reachable_size_bytes, -- Native size of reachable class instances reachable_native_size_bytes FROM android_heap_graph_class_aggregation;

you can see a summary of the reachable aggregate object sizes and object counts.

Java/Managed Heap Dumps

Java—and managed languages built on top of it, like Kotlin—use a runtime environment to handle memory management and garbage collection. In these languages, (almost) every object is a heap allocation. Memory is managed through object references: objects retain other objects, and memory is automatically reclaimed by the garbage collector once objects become unreachable. There is no free() call as in manual memory management.

As a result, most profiling tools for managed languages work by capturing and analyzing a complete heap dump, which includes all live objects and their retaining relationships—a full object graph.

This approach has the advantage of retroactive analysis: it provides a consistent snapshot of the entire heap without requiring prior instrumentation. However, it comes with a trade-off: while you can see which objects are keeping others alive, you typically cannot see the exact call sites where those objects were allocated. This can make it harder to reason about memory usage, especially when the same type of object is allocated from multiple locations in the code.

NOTE: Java heap dumps with Perfetto only works on Android. This is due to the deep integration with the JVM (Android Runtime - ART) required to efficiently capture a heap dump without impacting the performance of the process.

Collecting your first heap dump

On Android Perfetto heap profiling hooks are seamlessly integrated into the libc implementation.

Prerequisites

  • A device running Android 10+.
  • A Profileable or Debuggable app. If you are running on a "user" build of Android (as opposed to "userdebug" or "eng"), your app needs to be marked as profileable or debuggable in its manifest.

Instructions

  • Open https://ui.perfetto.dev/#!/record
  • Select Android as target device and use one of the available transports. If in doubt, WebUSB is the easiest choice.
  • Click on the Memory probe on the left and then toggle the Java heap dumps option.
  • Enter the process name in the Names box.
  • The process name you have to enter is (the first argument of the) the process cmdline. That is the right-most column (NAME) of adb shell ps -A.
  • Select a short duration in the Buffers and duration page (10 s or less). The trace duration is meaningless for this particular data source, as it emits a whole dump at the end of the trace. A longer trace will not lead to more or better data.
  • Press the red button to start recording the trace.

UI Recording

On Android Perfetto heap profiling hooks are seamlessly integrated into the libc implementation.

Prerequisites

  • ADB installed.
  • Windows users: Make sure that the downloaded adb.exe is in the PATH. set PATH=%PATH%;%USERPROFILE%\Downloads\platform-tools
  • A device running Android 10+.
  • A Profileable or Debuggable app. If you are running on a "user" build of Android (as opposed to "userdebug" or "eng"), your app needs to be marked as profileable or debuggable in its manifest.

Instructions

$ adb devices -l List of devices attached 24121FDH20006S device usb:2-2.4.2 product:panther model:Pixel_7 device:panther transport_id:1

If more than one device or emulator is reported you must select one upfront as follows:

export ANDROID_SERIAL=24121FDH20006S

Download the tools/java_heap_dump (if you don't have a perfetto checkout):

curl -LO https://raw.githubusercontent.com/google/perfetto/main/tools/java_heap_dump

Then start the profile:

python3 java_heap_dump -n com.google.android.apps.nexuslauncher

The script will record a trace with the heap dump and print the path of the trace file (e.g. /tmp/tmpmhuvqmnqprofile)

Wrote profile to /tmp/tmpmhuvqmnqprofile This can be viewed using https://ui.perfetto.dev.

Visualizing your first heap dump

Open the /tmp/xxxx file in the Perfetto UI and click on the chevron marker in the UI track labeled "Heap profile".

The UI will show a flattened version of the heap graph, in the shape of a flamegraph. The flamegraph aggregates together summing objects of the same type that share the same reachability path. Two flattening strategies are possible:

You can learn more about them in the Debugging memory usage case study

Sample heap dump in the UI

Querying your first heap profile

As well as visualizing traces on a timeline, Perfetto has support for querying traces using SQL. The easiest way to do this is using the query engine available directly in the UI.

  1. In the Perfetto UI, click on the "Query (SQL)" tab in the left-hand menu.

    Perfetto UI Query SQL

  2. This will open a two-part window. You can write your PerfettoSQL query in the top section and view the results in the bottom section.

    Perfetto UI SQL Window

  3. You can then execute queries Ctrl/Cmd + Enter:

For example, by running:

INCLUDE PERFETTO MODULE android.memory.heap_profile.summary_tree; SELECT -- The id of the callstack. A callstack in this context -- is a unique set of frames up to the root. id, -- The id of the parent callstack for this callstack. parent_id, -- The function name of the frame for this callstack. name, -- The name of the mapping containing the frame. This -- can be a native binary, library, JAR or APK. mapping_name, -- The name of the file containing the function. source_file, -- The line number in the file the function is located at. line_number, -- The amount of memory allocated and *not freed* with this -- function as the leaf frame. self_size, -- The amount of memory allocated and *not freed* with this -- function appearing anywhere on the callstack. cumulative_size, -- The amount of memory allocated with this function as the leaf -- frame. This may include memory which was later freed. self_alloc_size, -- The amount of memory allocated with this function appearing -- anywhere on the callstack. This may include memory which was -- later freed. cumulative_alloc_size FROM android_heap_profile_summary_tree;

you can see the memory allocated by every unique callstack in the trace.

Other types of memory

Besides the standard native and Java heaps, memory can be allocated in other ways that are not profiled by default. Here are some common examples:

Next steps

Now that you've recorded and analyzed your first memory profile, you can explore more advanced topics: