pprof Support in Perfetto

Status: COMPLETED · lalitm · 2025-09-30

Objective

Add support for importing pprof files into Perfetto Trace Processor and visualizing them as flame graphs in the Perfetto UI. This makes it possible to analyze, within the Perfetto ecosystem, CPU and heap profiles from Go, C++, and other tools that emit the pprof format.

Overview

This feature extends Perfetto's trace analysis capabilities to include non-time-based aggregate profiling data. Unlike the existing profiling support, which is integrated with timeline-based traces, pprof data consists of standalone aggregate samples that carry no timestamps.

graph LR
  A[pprof file] --> B[PprofTraceReader]
  B --> C[aggregate_profile table]
  B --> D[aggregate_sample table]
  B --> E[stack_profile_* tables]
  C --> F[UI: Scope/Metric Selection]
  D --> F
  E --> F
  F --> G[Interactive Flamegraph]

The implementation builds upon existing Perfetto infrastructure:

Database layer: Adds new aggregate tables alongside the existing stack_profile_* tables
Import pipeline: Follows the established TraceType + TraceReader pattern
UI layer: Leverages existing flame graph visualization components

Requirements

Zero-setup analysis: A pprof file can be analyzed with a single command or drag-and-drop.

Full format support: Support gzipped and uncompressed pprof protobuf files from any pprof-compatible tool.

Multiple metrics per file: Handle pprof files containing multiple value types (e.g., CPU samples + allocation counts) in a single visualization.

Interactive flame graphs: Provide full interactivity including zoom, search, and source location attribution where available.

No timeline confusion: Keep pprof data completely separate from time-based trace analysis to avoid user confusion.

Detailed Design

File Format Support

The implementation supports the standard pprof format as defined by Google's pprof tool:

Gzipped format: Files compressed with gzip, as generated by most profiling tools.

Raw protobuf: Uncompressed protobuf files for development and testing.

Profile structure: Full support for the Profile protobuf message including:

String table for deduplicated strings
Sample data with location hierarchies
Function and mapping metadata
Multiple value types (CPU samples, allocations, etc.)

Import Architecture

File Detection

The import pipeline automatically detects pprof files through a two-stage process:

Gzip detection: Recognize gzipped files by their magic bytes (0x1f 0x8b)
Protobuf validation: After decompression, validate the pprof structure by checking for a Profile message with a sample_type field
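
The two checks can be sketched in standalone C++. This is an illustration under stated assumptions, not the Perfetto detection code: LooksLikeGzip, ReadVarint, and StartsWithSampleType are hypothetical names, and the second check assumes the encoder wrote sample_type (field 1 of the pprof Profile message) first, which protobuf encoders usually do but the wire format does not guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stage 1: gzip streams start with the magic bytes 0x1f 0x8b.
bool LooksLikeGzip(const std::vector<uint8_t>& data) {
  return data.size() >= 2 && data[0] == 0x1f && data[1] == 0x8b;
}

// Decodes one protobuf base-128 varint at *pos and advances *pos past it.
// Returns false if the buffer ends first or the varint exceeds 10 bytes.
bool ReadVarint(const std::vector<uint8_t>& buf, size_t* pos, uint64_t* out) {
  assert(pos != nullptr && out != nullptr);
  uint64_t result = 0;
  for (int shift = 0; shift < 64 && *pos < buf.size(); shift += 7) {
    const uint8_t byte = buf[(*pos)++];
    result |= static_cast<uint64_t>(byte & 0x7f) << shift;
    if ((byte & 0x80) == 0) {
      *out = result;
      return true;
    }
  }
  return false;
}

// Stage 2 (assumed heuristic): the first top-level field is field 1 with
// wire type 2 (length-delimited), and its payload fits inside the buffer.
bool StartsWithSampleType(const std::vector<uint8_t>& buf) {
  constexpr uint64_t kSampleTypeTag = (1u << 3) | 2u;  // == 0x0a
  size_t pos = 0;
  uint64_t tag = 0;
  uint64_t len = 0;
  if (!ReadVarint(buf, &pos, &tag) || tag != kSampleTypeTag) return false;
  if (!ReadVarint(buf, &pos, &len)) return false;
  return len <= buf.size() - pos;
}
```

A caller would run LooksLikeGzip on the raw bytes, decompress if it matches, and then apply StartsWithSampleType to the decompressed payload.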

PprofTraceReader

class PprofTraceReader : public ChunkedTraceReader {
 public:
  explicit PprofTraceReader(TraceProcessorContext* context);

  base::Status Parse(TraceBlobView blob) override;
  base::Status NotifyEndOfFile() override;

 private:
  base::Status ParseProfile();

  TraceProcessorContext* context_;
  std::vector<uint8_t> buffer_;
};

The reader accumulates pprof data into an internal buffer and parses the complete protobuf message upon EOF notification.
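
The accumulate-then-parse pattern can be shown without any Perfetto types. The sketch below is a hypothetical stand-in (BufferingReader is not a Perfetto class, and it takes raw pointers rather than a TraceBlobView); it only demonstrates that chunks are appended as they arrive and that decoding is deferred until the end-of-file callback, when the whole protobuf message is available.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the buffering behaviour of a chunked reader.
class BufferingReader {
 public:
  // Called once per chunk: only appends, never parses.
  void Parse(const uint8_t* data, size_t size) {
    assert(data != nullptr || size == 0);
    buffer_.insert(buffer_.end(), data, data + size);
  }

  // Called once after the last chunk, when the full message is buffered.
  // Returns false for empty input; a real reader would decode the
  // protobuf here instead of just setting a flag.
  bool NotifyEndOfFile() {
    if (buffer_.empty()) return false;
    complete_ = true;
    return true;
  }

  bool complete() const { return complete_; }
  const std::vector<uint8_t>& buffer() const { return buffer_; }

 private:
  std::vector<uint8_t> buffer_;
  bool complete_ = false;
};
```

Buffering the whole file trades peak memory for simplicity, which is reasonable here because a pprof file is a single protobuf message rather than a stream of independent packets.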

Database Schema

New Tables

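The implementation introduces two new tables that integrate with existing stack profiling infrastructure. The schema below uses the logical table names; the SQL elsewhere in this document accesses them with the __intrinsic_ prefix (__intrinsic_aggregate_profile and __intrinsic_aggregate_sample):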

-- Metadata for each profiling metric from pprof files
CREATE TABLE aggregate_profile (
  id INTEGER PRIMARY KEY,
  scope TEXT,              -- file identifier (e.g., "cpu.pprof")
  name TEXT,               -- display name (e.g., "pprof cpu")
  sample_type_type TEXT,   -- pprof ValueType.type (e.g., "cpu")
  sample_type_unit TEXT    -- pprof ValueType.unit (e.g., "nanoseconds")
);

-- Sample values aggregated by callsite
CREATE TABLE aggregate_sample (
  id INTEGER PRIMARY KEY,
  aggregate_profile_id INTEGER,  -- FK to aggregate_profile
  callsite_id INTEGER,           -- FK to stack_profile_callsite
  value REAL                     -- sample count/value
);

Integration with Existing Infrastructure

stack_profile_frame: Stores function name and source file information
stack_profile_callsite: Maintains call stack hierarchy from root to leaf
stack_profile_mapping: Contains binary/library mapping information

Each pprof location becomes a frame, callsites represent the full call chain from root to leaf, and samples aggregate values at each callsite.
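
To make the relationship between the two tables concrete, the following sketch fans one pprof sample out into aggregate_sample rows. It relies on the pprof rule that value[i] of a Sample corresponds to sample_type[i] of the Profile. The struct and function names are hypothetical, and the sketch assumes one aggregate_profile row has already been created per sample type.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// In-memory mirror of one aggregate_sample row (id column omitted).
struct AggregateSampleRow {
  int64_t aggregate_profile_id;
  int64_t callsite_id;
  double value;
};

// profile_ids[i] is the aggregate_profile row created for sample_type[i].
// values holds the pprof Sample.value entries, in the same order.
std::vector<AggregateSampleRow> FanOutSample(
    const std::vector<int64_t>& profile_ids,
    int64_t callsite_id,
    const std::vector<int64_t>& values) {
  std::vector<AggregateSampleRow> rows;
  // Tolerate a length mismatch by emitting only the pairs that exist.
  const size_t n = std::min(profile_ids.size(), values.size());
  rows.reserve(n);
  for (size_t i = 0; i < n; ++i) {
    rows.push_back(
        {profile_ids[i], callsite_id, static_cast<double>(values[i])});
  }
  return rows;
}
```

For a CPU profile with the value types samples/count and cpu/nanoseconds, a single sample at callsite 7 would therefore produce two rows that share callsite_id 7 but reference different aggregate_profile rows.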

Data Processing Pipeline

Step 1: String Table Parsing

All pprof files use a string table for deduplication. The importer builds a vector of strings from the protobuf string_table field.
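
Every name in a pprof file (function names, filenames, value type names and units) is an int64 index into this table, and the format requires entry 0 to be the empty string. A minimal sketch of validation and lookup follows; IsValidStringTable and LookupString are hypothetical helpers, and returning an empty string for an out-of-range index is an assumption about error handling, not documented importer behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// pprof requires string_table[0] to be the empty string.
bool IsValidStringTable(const std::vector<std::string>& table) {
  return !table.empty() && table[0].empty();
}

// Resolves a pprof string index. Out-of-range or negative indices yield
// an empty string instead of undefined behaviour (assumed policy).
const std::string& LookupString(const std::vector<std::string>& table,
                                int64_t index) {
  static const std::string kEmpty;
  if (index < 0 || static_cast<uint64_t>(index) >= table.size()) {
    return kEmpty;
  }
  return table[static_cast<size_t>(index)];
}
```

With the table {"", "cpu", "nanoseconds", "main"}, a ValueType of {type: 1, unit: 2} resolves to "cpu" and "nanoseconds".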

Step 2: Mapping and Function Creation

For each pprof Mapping and Function:

Extract binary name, build ID, and memory ranges
Create entries in stack_profile_mapping and populate frame metadata
Build lookup tables for location resolution

Step 3: Location Processing

Each pprof Location represents a program counter with optional debug information:

Map addresses to existing or dummy memory mappings
Extract function names from associated line information
Create stack_profile_frame entries with relative PCs
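
The address handling in this step can be illustrated as follows. The sketch is an assumption-laden simplification: the Mapping fields mirror the pprof Mapping message (memory_start, memory_limit, filename), but ResolveLocation, ResolvedFrame, the "[dummy]" mapping name, and the choice of rel_pc = address - memory_start are all hypothetical and may differ from what the importer does.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Subset of the pprof Mapping message used for address resolution.
struct Mapping {
  uint64_t memory_start;
  uint64_t memory_limit;
  std::string filename;
};

struct ResolvedFrame {
  std::string mapping_name;
  uint64_t rel_pc;
};

// Resolves a pprof Location (mapping_id + absolute address) to a mapping
// name and a PC relative to the start of that mapping. Locations with an
// unknown mapping, or an address outside it, fall back to a dummy mapping
// and keep the absolute address.
ResolvedFrame ResolveLocation(
    const std::unordered_map<uint64_t, Mapping>& mappings_by_id,
    uint64_t mapping_id,
    uint64_t address) {
  const auto it = mappings_by_id.find(mapping_id);
  if (it != mappings_by_id.end()) {
    const Mapping& m = it->second;
    if (address >= m.memory_start && address < m.memory_limit) {
      return {m.filename, address - m.memory_start};
    }
  }
  return {"[dummy]", address};
}
```

Storing relative PCs keeps frames comparable across runs in which the same binary is loaded at different base addresses.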

Step 4: Sample Processing

For each pprof Sample:

Build complete callsite hierarchy from location chain (reversing pprof leaf-first order)
Create aggregate entries for each value type in the sample
Link samples to callsites through aggregate_sample table

Pprof Sample → Location IDs [3,2,1] (leaf first)
             ↓
Perfetto Callsite hierarchy: 1 → 2 → 3 (root to leaf)
                            ↓
Multiple aggregate_sample entries (one per value type)
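
The reversal and callsite construction can be sketched as an interning table keyed by (parent callsite, frame). CallsiteTable is a hypothetical illustration rather than the Perfetto implementation; it uses -1 for "no parent" and, for brevity, treats pprof location IDs as frame IDs directly.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

struct Callsite {
  int64_t parent_id;  // -1 for a root callsite
  uint64_t frame_id;
};

class CallsiteTable {
 public:
  // Interns a stack given in pprof order (leaf first) and returns the id
  // of the leaf callsite, or -1 for an empty stack. Walking the vector in
  // reverse visits frames root first, so each parent exists before its
  // child and shared prefixes are reused.
  int64_t InternStack(const std::vector<uint64_t>& leaf_first) {
    int64_t parent = -1;
    for (auto it = leaf_first.rbegin(); it != leaf_first.rend(); ++it) {
      const auto key = std::make_pair(parent, *it);
      auto found = index_.find(key);
      if (found == index_.end()) {
        rows_.push_back({parent, *it});
        const int64_t id = static_cast<int64_t>(rows_.size() - 1);
        found = index_.emplace(key, id).first;
      }
      parent = found->second;
    }
    return parent;
  }

  const std::vector<Callsite>& rows() const { return rows_; }

 private:
  std::vector<Callsite> rows_;
  std::map<std::pair<int64_t, uint64_t>, int64_t> index_;
};
```

Interning [3,2,1] creates three callsites (frames 1, 2, 3 from root to leaf). Interning [4,2,1] afterwards reuses the callsites for frames 1 and 2 and adds only one new row, which is what lets samples with a common call prefix share nodes in the flame graph.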

UI Implementation

PprofPage Component

The UI provides a dedicated page for pprof analysis accessible from the main navigation. The page automatically discovers available data and provides interactive controls.

Dynamic Data Discovery

Upon loading, the UI queries the database to discover:

Available scopes (typically one per imported pprof file)
Available metrics within each scope (CPU, allocations, etc.)
Sample data for the selected scope/metric combination

// Discover available pprof data
const scopesResult = await trace.engine.query(`
  SELECT DISTINCT scope FROM __intrinsic_aggregate_profile ORDER BY scope
`);

// Load metrics for selected scope
const metricsResult = await trace.engine.query(`
  SELECT sample_type_type, sample_type_unit
  FROM __intrinsic_aggregate_profile
  WHERE scope = '${selectedScope}'
`);

Flamegraph Integration

The implementation reuses Perfetto's existing QueryFlamegraph component with dynamically generated metrics:

const flamegraphMetrics = metricsFromTableOrSubquery(
  `
    WITH metrics AS MATERIALIZED (
      SELECT
        callsite_id,
        sum(sample.value) AS self_value
      FROM __intrinsic_aggregate_sample sample
      JOIN __intrinsic_aggregate_profile profile
        ON sample.aggregate_profile_id = profile.id
      WHERE profile.scope = '${scope}'
        AND profile.sample_type_type = '${metric}'
      GROUP BY callsite_id
    )
    SELECT
      c.id,
      c.parent_id AS parentId,
      c.name,
      c.mapping_name,
      coalesce(m.self_value, 0) AS self_value
    FROM _callstacks_for_stack_profile_samples!(metrics) AS c
    LEFT JOIN metrics AS m USING (callsite_id)
  `,
  [{ name: 'Pprof Samples', unit: unit, columnName: 'self_value' }],
  'include perfetto module callstacks.stack_profile'
);

This query leverages the existing _callstacks_for_stack_profile_samples! macro to build the complete flamegraph hierarchy, then joins the per-callsite sums from the metrics subquery back in as self_value.
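
The query supplies only self values. A flame graph also needs, for each node, the cumulative value of its whole subtree, which determines the node's width. The sketch below shows that roll-up in isolation. It is not how QueryFlamegraph computes it; CumulativeValues is a hypothetical helper, and it assumes callsite ids are dense indices in which every parent id is smaller than its child's id.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// parent_id[i] is the parent of callsite i, or -1 for a root.
// self_value[i] is the summed sample value attributed directly to i.
// Returns self + descendants for every callsite.
// Precondition (assumed): parent_id[i] < i for all i.
std::vector<double> CumulativeValues(const std::vector<int64_t>& parent_id,
                                     const std::vector<double>& self_value) {
  assert(parent_id.size() == self_value.size());
  std::vector<double> total = self_value;
  // Visit children before parents so each child's finished total is
  // added to its parent exactly once.
  for (size_t i = parent_id.size(); i-- > 0;) {
    if (parent_id[i] >= 0) {
      total[static_cast<size_t>(parent_id[i])] += total[i];
    }
  }
  return total;
}
```

For the tree 0 → 1 → {2, 3} with self values {0, 1, 5, 2}, the cumulative values are {8, 8, 5, 2}.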

Usage

Command Line Analysis

# Analyze a pprof file directly
$ trace_processor_shell profile.pprof

# Query available metrics
> SELECT scope, sample_type_type, sample_type_unit
  FROM __intrinsic_aggregate_profile;

# Examine sample data
> SELECT COUNT(*) FROM __intrinsic_aggregate_sample
  WHERE aggregate_profile_id = 1;

Web UI Analysis

File loading: Drag and drop a pprof file into the Perfetto UI, or use the file picker
Automatic detection: Perfetto recognizes the pprof format and imports the data
Navigation: Go to the "Pprof" page from the main navigation
Interactive analysis: Select a scope and metric, then explore the flame graph

Multi-metric Files

For pprof files containing multiple value types (e.g., CPU samples + heap allocations):

Single import: All metrics from one file are imported together under the same scope
Metric switching: A dropdown in the UI switches between metrics without re-importing the file
Independent analysis: Each metric is displayed as a separate flame graph

Design Principles

Integration over Replacement

Rather than building a standalone pprof viewer, this feature integrates pprof analysis into Perfetto's existing infrastructure. This provides:

Unified tooling: Users can analyze pprof data alongside other trace formats using the same UI and SQL interface.

Leveraged infrastructure: Reuses existing flame graph rendering, call stack handling, and database optimization.

Consistent UX: Familiar Perfetto interface for users already using the platform.

Separation of Concerns

Timeline independence: pprof data represents aggregate samples without a time dimension and is kept completely separate from timeline-based trace analysis.

Static import model: pprof files are imported once and stored in read-only tables, avoiding complex re-aggregation logic.

Format-specific handling: Dedicated importer handles pprof-specific concepts while mapping to Perfetto's general profiling abstractions.

Minimal Overhead

Zero cost when unused: No impact on existing Perfetto functionality when pprof features are not used.

Efficient storage: Sample values stored in aggregated form, avoiding redundant per-sample overhead.

Query optimization: Leverages existing database indices and table functions to keep flame graph queries fast.