pprof Support in Perfetto
Status: COMPLETED · lalitm · 2025-09-30
Objective
Add support for importing pprof files into Perfetto Trace Processor and visualizing them with flame graphs in the Perfetto UI. This enables analysis of CPU/heap profiles from Go, C++, and other tools that generate pprof format within the Perfetto ecosystem.
Overview
This feature extends Perfetto's trace analysis capabilities to non-time-based aggregate profiling data. Unlike the existing profiling support, which is integrated with timeline-based traces, pprof data consists of standalone aggregate samples with no time dimension.
The implementation builds upon existing Perfetto infrastructure:
- Database layer: Extends existing stack_profile_* tables with new aggregate tables
- Import pipeline: Follows the established TraceType + TraceReader pattern
- UI layer: Leverages existing flame graph visualization components
Requirements
Zero-setup analysis: A pprof file can be analyzed with a single command or drag-and-drop.
Full format support: Support gzipped and uncompressed pprof protobuf files from any pprof-compatible tool.
Multiple metrics per file: Handle pprof files containing multiple value types (e.g., CPU samples + allocation counts) in a single visualization.
Interactive flame graphs: Provide full interactivity including zoom, search, and source location attribution where available.
No timeline confusion: Keep pprof data completely separate from time-based trace analysis to avoid user confusion.
Detailed Design
File Format Support
The implementation supports the standard pprof format as defined by Google's pprof tool:
Gzipped format: Files compressed with gzip, as typically generated by most profiling tools.
Raw protobuf: Uncompressed protobuf files for development and testing.
Profile structure: Full support for the Profile protobuf message including:
- String table for deduplicated strings
- Sample data with location hierarchies
- Function and mapping metadata
- Multiple value types (CPU samples, allocations, etc.)
Import Architecture
File Detection
The import pipeline automatically detects pprof files through a two-stage process:
- Gzip detection: Recognize gzipped files by magic bytes (1f 8b)
- Protobuf validation: After decompression, validate pprof structure by checking for a Profile message with a sample_type field
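The two detection stages can be sketched as follows. This is a minimal illustration, not the importer's actual code: the function names are hypothetical, and the second check is only a heuristic that assumes the serializer wrote sample_type (field 1 of the pprof Profile message) first, which protobuf does not guarantee.

```cpp
#include <cstddef>
#include <cstdint>

// Stage 1: returns true if the buffer starts with the gzip magic bytes 1f 8b.
bool LooksLikeGzip(const uint8_t* data, size_t size) {
  return size >= 2 && data[0] == 0x1f && data[1] == 0x8b;
}

// Stage 2 (heuristic, assumed): returns true if the first protobuf tag is
// field number 1 with wire type 2 (length-delimited), which is how
// Profile.sample_type is encoded on the wire.
bool StartsWithSampleTypeField(const uint8_t* data, size_t size) {
  constexpr uint8_t kSampleTypeTag = (1 << 3) | 2;  // 0x0a
  return size >= 1 && data[0] == kSampleTypeTag;
}
```

A real validator would likely decode more of the message before accepting it; a single-byte check is cheap but can misfire on other protobuf files whose first field is also a length-delimited field 1.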
PprofTraceReader
class PprofTraceReader : public ChunkedTraceReader {
 public:
  explicit PprofTraceReader(TraceProcessorContext* context);
  base::Status Parse(TraceBlobView blob) override;
  base::Status NotifyEndOfFile() override;

 private:
  base::Status ParseProfile();
  TraceProcessorContext* context_;
  std::vector<uint8_t> buffer_;
};
The reader accumulates pprof data into an internal buffer and parses the complete protobuf message upon EOF notification.
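The accumulate-then-parse behaviour can be illustrated with a standalone stand-in. The BufferingReader class below is hypothetical and does not use Perfetto's TraceBlobView or base::Status types; the "parse" step is a placeholder that only records how many bytes would be decoded.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the reader's buffering behaviour.
class BufferingReader {
 public:
  // Appends one chunk to the internal buffer; no parsing happens here.
  void Parse(const uint8_t* data, size_t size) {
    buffer_.insert(buffer_.end(), data, data + size);
  }

  // Called once at end of file. Returns false if no data was received.
  bool NotifyEndOfFile() {
    if (buffer_.empty()) {
      return false;
    }
    parsed_bytes_ = buffer_.size();  // Placeholder for the real decode step.
    return true;
  }

  size_t parsed_bytes() const { return parsed_bytes_; }

 private:
  std::vector<uint8_t> buffer_;
  size_t parsed_bytes_ = 0;
};
```

Buffering the whole file fits this format because a pprof profile is a single protobuf message (and is often gzipped), so it cannot be interpreted chunk by chunk the way a stream of trace packets can.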
Database Schema
New Tables
The implementation introduces two new tables that integrate with existing stack profiling infrastructure:
-- Metadata for each profiling metric from pprof files
CREATE TABLE aggregate_profile (
  id INTEGER PRIMARY KEY,
  scope TEXT,             -- file identifier (e.g., "cpu.pprof")
  name TEXT,              -- display name (e.g., "pprof cpu")
  sample_type_type TEXT,  -- pprof ValueType.type (e.g., "cpu")
  sample_type_unit TEXT   -- pprof ValueType.unit (e.g., "nanoseconds")
);
-- Sample values aggregated by callsite
CREATE TABLE aggregate_sample (
  id INTEGER PRIMARY KEY,
  aggregate_profile_id INTEGER,  -- FK to aggregate_profile
  callsite_id INTEGER,           -- FK to stack_profile_callsite
  value REAL                     -- sample count/value
);
Integration with Existing Infrastructure
- stack_profile_frame: Stores function name and source file information
- stack_profile_callsite: Maintains call stack hierarchy from root to leaf
- stack_profile_mapping: Contains binary/library mapping information
Each pprof location becomes a frame, callsites represent the full call chain from root to leaf, and samples aggregate values at each callsite.
Data Processing Pipeline
Step 1: String Table Parsing
All pprof files use a string table for deduplication. The importer builds a vector of strings from the protobuf string_table field.
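Every name in a pprof file (function names, file names, value types and units) is stored as an integer index into this table, so the importer needs a bounds-checked lookup. The sketch below is illustrative only: LookupString is a hypothetical helper, and it relies on the pprof convention that index 0 is always the empty string.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Resolves a pprof string-table index. Returns nullptr for indices that are
// negative or past the end of the table, so callers can reject malformed
// profiles instead of reading out of bounds.
const std::string* LookupString(const std::vector<std::string>& table,
                                int64_t index) {
  if (index < 0 || static_cast<uint64_t>(index) >= table.size()) {
    return nullptr;
  }
  return &table[static_cast<size_t>(index)];
}
```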
Step 2: Mapping and Function Creation
For each pprof Mapping and Function:
- Extract binary name, build ID, and memory ranges
- Create entries in stack_profile_mapping and populate frame metadata
- Build lookup tables for location resolution
Step 3: Location Processing
Each pprof Location represents a program counter with optional debug information:
- Map addresses to existing or dummy memory mappings
- Extract function names from associated line information
- Create stack_profile_frame entries with relative PCs
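One way to derive a relative PC is sketched below. It rests on assumptions the design does not spell out: locations are resolved through their mapping id, the relative PC is the address minus the mapping's memory_start, and a location with no usable mapping falls back to a dummy mapping based at 0. PprofMapping and RelativePc are hypothetical names.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical, trimmed-down view of a pprof Mapping.
struct PprofMapping {
  uint64_t memory_start;
  uint64_t memory_limit;
};

// Returns the PC relative to the start of the location's mapping. Locations
// with an unknown mapping id, or an address below the mapping start, are
// treated as belonging to a dummy mapping based at 0, so the address is
// returned unchanged.
uint64_t RelativePc(const std::unordered_map<uint64_t, PprofMapping>& mappings,
                    uint64_t mapping_id,
                    uint64_t address) {
  auto it = mappings.find(mapping_id);
  if (it == mappings.end() || address < it->second.memory_start) {
    return address;
  }
  return address - it->second.memory_start;
}
```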
Step 4: Sample Processing
For each pprof Sample:
- Build complete callsite hierarchy from location chain (reversing pprof leaf-first order)
- Create aggregate entries for each value type in the sample
- Link samples to callsites through the aggregate_sample table
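The reversal and the per-value-type fan-out can be sketched as below. This is a simplified model under stated assumptions, not the importer's code: CallsiteTable and ExpandSample are hypothetical, callsites are deduplicated by (parent, frame) pairs, and the index of each value within the sample stands in for the aggregate_profile id.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

struct Callsite {
  int64_t parent_id;  // -1 for a root callsite.
  uint64_t frame_id;
};

class CallsiteTable {
 public:
  // Walks the pprof leaf-first chain in reverse (root first), interning one
  // callsite per (parent, frame) pair. Returns the leaf callsite id, or -1
  // for an empty stack.
  int64_t InternStack(const std::vector<uint64_t>& leaf_first) {
    int64_t parent = -1;
    for (auto it = leaf_first.rbegin(); it != leaf_first.rend(); ++it) {
      auto key = std::make_pair(parent, *it);
      auto found = index_.find(key);
      if (found == index_.end()) {
        rows_.push_back({parent, *it});
        int64_t id = static_cast<int64_t>(rows_.size() - 1);
        found = index_.emplace(key, id).first;
      }
      parent = found->second;
    }
    return parent;
  }

  const std::vector<Callsite>& rows() const { return rows_; }

 private:
  std::vector<Callsite> rows_;
  std::map<std::pair<int64_t, uint64_t>, int64_t> index_;
};

struct AggregateSample {
  uint32_t profile_id;  // Assumed: index of the value type in the profile.
  int64_t callsite_id;
  double value;
};

// Emits one aggregate_sample-like row per value type in the pprof sample,
// all pointing at the same leaf callsite.
std::vector<AggregateSample> ExpandSample(
    CallsiteTable& table,
    const std::vector<uint64_t>& leaf_first,
    const std::vector<int64_t>& values) {
  std::vector<AggregateSample> out;
  int64_t callsite = table.InternStack(leaf_first);
  if (callsite < 0) {
    return out;
  }
  for (size_t i = 0; i < values.size(); ++i) {
    out.push_back({static_cast<uint32_t>(i), callsite,
                   static_cast<double>(values[i])});
  }
  return out;
}
```

Because callsites are keyed by (parent, frame), two samples that share a prefix from the root reuse the same rows and only diverge where their stacks differ.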
Pprof Sample → Location IDs [3,2,1] (leaf first)
↓
Perfetto Callsite hierarchy: 1 → 2 → 3 (root to leaf)
↓
Multiple aggregate_sample entries (one per value type)
UI Implementation
PprofPage Component
The UI provides a dedicated page for pprof analysis accessible from the main navigation. The page automatically discovers available data and provides interactive controls.
Dynamic Data Discovery
Upon loading, the UI queries the database to discover:
- Available scopes (typically one per imported pprof file)
- Available metrics within each scope (CPU, allocations, etc.)
- Sample data for the selected scope/metric combination
// Discover available pprof data
const scopesResult = await trace.engine.query(`
  SELECT DISTINCT scope FROM __intrinsic_aggregate_profile ORDER BY scope
`);
// Load metrics for selected scope
const metricsResult = await trace.engine.query(`
  SELECT sample_type_type, sample_type_unit
  FROM __intrinsic_aggregate_profile
  WHERE scope = '${selectedScope}'
`);
Flamegraph Integration
The implementation reuses Perfetto's existing QueryFlamegraph component with dynamically generated metrics:
const flamegraphMetrics = metricsFromTableOrSubquery(
  `
    WITH metrics AS MATERIALIZED (
      SELECT
        callsite_id,
        sum(sample.value) AS self_value
      FROM __intrinsic_aggregate_sample sample
      JOIN __intrinsic_aggregate_profile profile
        ON sample.aggregate_profile_id = profile.id
      WHERE profile.scope = '${scope}'
        AND profile.sample_type_type = '${metric}'
      GROUP BY callsite_id
    )
    SELECT
      c.id,
      c.parent_id as parentId,
      c.name,
      c.mapping_name,
      coalesce(m.self_value, 0) AS self_value
    FROM _callstacks_for_stack_profile_samples!(metrics) AS c
    LEFT JOIN metrics AS m USING (callsite_id)
  `,
  [{ name: 'Pprof Samples', unit: unit, columnName: 'self_value' }],
  'include perfetto module callstacks.stack_profile'
);
This query uses the existing _callstacks_for_stack_profile_samples! macro to build the complete flamegraph hierarchy while aggregating pprof sample values per callsite.
Usage
Command Line Analysis
# Analyze a pprof file directly
$ trace_processor_shell profile.pprof
# Query available metrics
> SELECT scope, sample_type_type, sample_type_unit
  FROM __intrinsic_aggregate_profile;
# Examine sample data
> SELECT COUNT(*) FROM __intrinsic_aggregate_sample
  WHERE aggregate_profile_id = 1;
Web UI Analysis
- File loading: Drag and drop pprof file into Perfetto UI or use file picker
- Automatic detection: Perfetto recognizes pprof format and imports data
- Navigation: Go to "Pprof" page from main navigation
- Interactive analysis: Select scope/metric and explore flame graph
Multi-metric Files
For pprof files containing multiple value types (e.g., CPU samples + heap allocations):
- Single import: All metrics from one file imported together under same scope
- Metric switching: UI dropdown allows switching between metrics instantly
- Independent analysis: Each metric displays as separate flame graph
Design Principles
Integration over Replacement
Rather than building a standalone pprof viewer, this feature integrates pprof analysis into Perfetto's existing infrastructure. This provides:
Unified tooling: Users can analyze pprof data alongside other trace formats using the same UI and SQL interface.
Leveraged infrastructure: Reuses existing flame graph rendering, call stack handling, and database optimization.
Consistent UX: Familiar Perfetto interface for users already using the platform.
Separation of Concerns
Timeline independence: pprof data represents aggregate samples without time dimension, kept completely separate from timeline-based trace analysis.
Static import model: pprof files are imported once and stored in read-only tables, avoiding complex re-aggregation logic.
Format-specific handling: Dedicated importer handles pprof-specific concepts while mapping to Perfetto's general profiling abstractions.
Minimal Overhead
Zero cost when unused: No impact on existing Perfetto functionality when pprof features are not used.
Efficient storage: Sample values stored in aggregated form, avoiding redundant per-sample overhead.
Query optimization: Leverages existing database indices and table functions for optimal performance.