Trace Processor Architecture
This document explains how Perfetto's trace processor works, from ingesting raw trace files to providing SQL-queryable data. It covers the key components, data flow, and architectural patterns that enable the trace processor to handle traces from various formats (Proto, JSON, Systrace, etc.) and transform them into a unified analytical database.
Overview
The trace processor is a system that ingests trace files of various formats, parses their contents, sorts events by timestamp, and stores the data in a columnar SQL database for analysis. It processes traces in chunks to efficiently handle large files.
Core Data Pipeline
Raw Trace → ForwardingTraceParser → Format-Specific ChunkedTraceReader →
TraceSorter → TraceStorage → SQL Query Engine
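The pipeline is driven chunk by chunk rather than on a fully loaded file. The sketch below illustrates that driving loop with a stand-in facade; `TraceProcessorFacade` and its method names are assumptions for illustration, not Perfetto's public API.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Stand-in facade: each Parse() call hands a chunk to the format-specific
// ChunkedTraceReader; NotifyEndOfFile() flushes any buffered state.
class TraceProcessorFacade {
 public:
  bool Parse(const uint8_t* data, size_t size) {
    (void)data;
    bytes_ingested_ += size;
    return true;  // Real code would propagate a status from the reader.
  }
  void NotifyEndOfFile() {
    std::printf("ingested %zu bytes\n", bytes_ingested_);
  }

 private:
  size_t bytes_ingested_ = 0;
};

int main(int argc, char** argv) {
  if (argc < 2) return 1;
  std::FILE* f = std::fopen(argv[1], "rb");
  if (!f) return 1;

  TraceProcessorFacade tp;
  std::vector<uint8_t> buf(1024 * 1024);  // Feed the file in 1 MiB chunks.
  size_t n;
  while ((n = std::fread(buf.data(), 1, buf.size(), f)) > 0) {
    if (!tp.Parse(buf.data(), n)) break;
  }
  tp.NotifyEndOfFile();
  std::fclose(f);
  return 0;
}
```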
Format Detection and Delegation
ForwardingTraceParser (src/trace_processor/forwarding_trace_parser.cc:95-134)
- Detects the trace format using GuessTraceType() from the first bytes (see the sketch after this list)
- Creates the appropriate reader via TraceReaderRegistry (src/trace_processor/trace_reader_registry.h)
- All readers implement the ChunkedTraceReader interface (src/trace_processor/importers/common/chunked_trace_reader.h)
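As a rough illustration of the detection step, the snippet below guesses a format from the first bytes of the file. The enum, function name, and heuristics are simplified assumptions loosely modelled on what GuessTraceType() does, not the real implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

enum class TraceTypeSketch { kUnknown, kJson, kProto, kSystrace };

// Only the first bytes are needed to pick a reader.
TraceTypeSketch GuessTraceTypeSketch(const uint8_t* data, size_t size) {
  if (size == 0) return TraceTypeSketch::kUnknown;
  std::string_view head(reinterpret_cast<const char*>(data),
                        size < 32 ? size : 32);
  // JSON traces open with an object or an array.
  if (head.front() == '{' || head.front() == '[') return TraceTypeSketch::kJson;
  // Raw systrace text starts with a "# tracer:" comment header.
  if (head.substr(0, 8) == "# tracer") return TraceTypeSketch::kSystrace;
  // Proto traces begin with the TracePacket field preamble (field 1, wire type 2).
  if (static_cast<uint8_t>(head.front()) == 0x0a) return TraceTypeSketch::kProto;
  return TraceTypeSketch::kUnknown;
}
```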
Format Registration (src/trace_processor/trace_processor_impl.cc:475-519)
```cpp
context()->reader_registry->RegisterTraceReader<JsonTraceTokenizer>(kJsonTraceType);
context()->reader_registry->RegisterTraceReader<ProtoTraceReader>(kProtoTraceType);
context()->reader_registry->RegisterTraceReader<SystraceTraceParser>(kSystraceTraceType);
```
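The registry behind RegisterTraceReader can be pictured as a map from trace type to a factory for the matching reader. The sketch below shows that pattern with made-up stand-in types; it is not the real TraceReaderRegistry.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <memory>

struct ContextSketch;  // Stand-in for TraceProcessorContext.

// Common interface every registered factory produces (simplified).
class ChunkedReaderSketch {
 public:
  virtual ~ChunkedReaderSketch() = default;
  virtual bool Parse(const uint8_t* data, size_t size) = 0;
};

enum class TraceTypeSketch { kJson, kProto, kSystrace };

class ReaderRegistrySketch {
 public:
  // Associates a trace type with a factory for the matching reader.
  template <typename ReaderT>
  void RegisterTraceReader(TraceTypeSketch type) {
    factories_[type] = [](ContextSketch* ctx) {
      return std::unique_ptr<ChunkedReaderSketch>(new ReaderT(ctx));
    };
  }

  // Called after format detection to instantiate the right reader.
  std::unique_ptr<ChunkedReaderSketch> CreateFor(TraceTypeSketch type,
                                                 ContextSketch* ctx) const {
    auto it = factories_.find(type);
    return it == factories_.end() ? nullptr : it->second(ctx);
  }

 private:
  std::map<TraceTypeSketch,
           std::function<std::unique_ptr<ChunkedReaderSketch>(ContextSketch*)>>
      factories_;
};
```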
Format-Specific Readers (Diverse Approaches)
1. JSON Traces
JsonTraceTokenizer (src/trace_processor/importers/json/json_trace_tokenizer.h:73)
- Data Flow: Raw JSON → Tokenizer → JsonEvent objects → TraceSorter::Stream
- Parser: JsonTraceParser processes sorted events → TraceStorage
- Architecture: Tokenizer/Parser split with JSON-specific state machine
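A hedged sketch of the tokenizer side of that split: the tokenizer extracts only a timestamp so the event can be sorted, and the parser decodes the full object later. All types below are simplified stand-ins, not the real JsonEvent or sorter stream.

```cpp
#include <cstdint>
#include <cstdlib>
#include <string>
#include <vector>

// A tokenized-but-not-parsed event: just enough (a timestamp) to sort it.
struct JsonEventSketch {
  int64_t ts_us = 0;
  std::string raw;  // Full JSON object, decoded by the parser after sorting.
};

// Stand-in for a TraceSorter stream buffering events for merge sorting.
class JsonStreamSketch {
 public:
  void Push(JsonEventSketch ev) { pending_.push_back(std::move(ev)); }
  size_t size() const { return pending_.size(); }

 private:
  std::vector<JsonEventSketch> pending_;
};

// Tokenizer phase: a cheap scan for the "ts" field; everything else is
// deferred to the post-sort parser.
void TokenizeJsonObject(const std::string& object, JsonStreamSketch* stream) {
  JsonEventSketch ev;
  ev.raw = object;
  auto pos = object.find("\"ts\":");
  if (pos != std::string::npos)
    ev.ts_us = std::strtoll(object.c_str() + pos + 5, nullptr, 10);
  stream->Push(std::move(ev));
}
```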
2. Proto Traces (Complex Modular System)
ProtoTraceReader (src/trace_processor/importers/proto/proto_trace_reader.h:58)
- Data Flow: Proto bytes → ProtoTraceTokenizer → ProtoImporterModules → TraceSorter::Stream
- Modules: Register for specific packet field IDs (src/trace_processor/importers/proto/proto_importer_module.h:110); see the sketch after this list
- Tokenization phase: Early processing before sorting
- Parsing phase: Post-sorting detailed processing
- Examples: FtraceModule, TrackEventModule, AndroidModule (many files in src/trace_processor/importers/proto/)
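The sketch below shows the field-based routing idea with simplified, made-up types: modules register for the TracePacket field IDs they care about and get callbacks in both phases. It is not the real ProtoImporterModule API.

```cpp
#include <cstdint>
#include <map>
#include <vector>

struct PacketSketch {
  uint32_t field_id;  // Which TracePacket field is set (e.g. ftrace events).
  int64_t ts;
  std::vector<uint8_t> payload;
};

class ModuleSketch {
 public:
  virtual ~ModuleSketch() = default;
  virtual void TokenizePacket(const PacketSketch&) {}  // Pre-sort phase.
  virtual void ParsePacket(const PacketSketch&) {}     // Post-sort phase.
};

class ModuleRouterSketch {
 public:
  // A module declares which packet field IDs it handles.
  void RegisterForField(uint32_t field_id, ModuleSketch* module) {
    handlers_[field_id].push_back(module);
  }
  // Routing: only modules registered for this field ID see the packet.
  void Tokenize(const PacketSketch& p) {
    for (ModuleSketch* m : handlers_[p.field_id]) m->TokenizePacket(p);
  }
  void Parse(const PacketSketch& p) {
    for (ModuleSketch* m : handlers_[p.field_id]) m->ParsePacket(p);
  }

 private:
  std::map<uint32_t, std::vector<ModuleSketch*>> handlers_;
};
```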
3. Systrace (Line-Based Processing)
SystraceTraceParser (src/trace_processor/importers/systrace/systrace_trace_parser.h:34)
- Data Flow: Text lines → SystraceLineTokenizer → SystraceLine objects → TraceSorter::Stream
- Architecture: State machine for HTML + trace data sections
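An illustrative sketch of that state machine: skip the HTML wrapper, then tokenize only the embedded ftrace text. The marker string and types are simplified assumptions, not the real SystraceLineTokenizer.

```cpp
#include <sstream>
#include <string>
#include <vector>

struct SystraceLineSketch {
  std::string raw;
};

std::vector<SystraceLineSketch> TokenizeSystrace(const std::string& text) {
  enum class State { kHtmlPreamble, kTraceData } state = State::kHtmlPreamble;
  std::vector<SystraceLineSketch> out;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    switch (state) {
      case State::kHtmlPreamble:
        // Systrace HTML embeds the raw ftrace dump after a marker line.
        if (line.find("# tracer") != std::string::npos)
          state = State::kTraceData;
        break;
      case State::kTraceData:
        // Keep data lines; comment lines are skipped.
        if (!line.empty() && line[0] != '#') out.push_back({line});
        break;
    }
  }
  return out;
}
```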
4. Other Formats
- Perf:
perf_importer::PerfDataTokenizer
(binary perf.data format) - Gecko:
gecko_importer::GeckoTraceTokenizer
(Firefox traces) - Fuchsia:
FuchsiaTraceTokenizer
(Fuchsia kernel traces)
Event Sorting and Processing
TraceSorter (src/trace_processor/sorter/trace_sorter.h:43)
- Purpose: Multi-stream timestamp-based merge sorting
- Architecture: Per-CPU queues for ftrace, windowed sorting for streaming
- Streams: Each format creates typed streams (JsonEvent, TracePacketData, SystraceLine, etc.)
- Output: Sorted events to format-specific parsers
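The core merge step can be pictured as a min-heap keyed on the head timestamp of each queue (for example, one queue per ftrace CPU). The sketch below shows that idea only; it omits the real sorter's windowing and typed streams.

```cpp
#include <cstdint>
#include <deque>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

struct EventSketch {
  int64_t ts;
  int payload;
};

// Drains several per-source queues (each already in timestamp order) into
// one globally ordered vector.
std::vector<EventSketch> MergeQueues(
    std::vector<std::deque<EventSketch>> queues) {
  using Head = std::pair<int64_t, size_t>;  // (head timestamp, queue index)
  std::priority_queue<Head, std::vector<Head>, std::greater<Head>> heap;
  for (size_t i = 0; i < queues.size(); ++i)
    if (!queues[i].empty()) heap.push({queues[i].front().ts, i});

  std::vector<EventSketch> sorted;
  while (!heap.empty()) {
    size_t idx = heap.top().second;
    heap.pop();
    sorted.push_back(queues[idx].front());
    queues[idx].pop_front();
    if (!queues[idx].empty()) heap.push({queues[idx].front().ts, idx});
  }
  return sorted;
}
```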
Storage Layer
TraceStorage (src/trace_processor/storage/trace_storage.h)
- Architecture: Columnar storage with specialized table types
- Tables: SliceTable, ProcessTable, ThreadTable, CounterTable, etc.
- Access: Direct insertion by parsers, SQL queries by engine
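A minimal sketch of the columnar layout: one vector per column rather than one struct per row, so a query touching only a couple of columns scans contiguous memory. This illustrates the layout only; it is not the real SliceTable.

```cpp
#include <cstdint>
#include <string>
#include <vector>

class SliceTableSketch {
 public:
  // Parsers append a row and get back its row id.
  uint32_t Insert(int64_t ts, int64_t dur, uint32_t track_id,
                  std::string name) {
    ts_.push_back(ts);
    dur_.push_back(dur);
    track_id_.push_back(track_id);
    name_.push_back(std::move(name));
    return static_cast<uint32_t>(ts_.size() - 1);
  }

  size_t row_count() const { return ts_.size(); }
  int64_t ts(uint32_t row) const { return ts_[row]; }
  int64_t dur(uint32_t row) const { return dur_[row]; }

 private:
  // Parallel columns; row N is the N-th entry of each vector.
  std::vector<int64_t> ts_;
  std::vector<int64_t> dur_;
  std::vector<uint32_t> track_id_;
  std::vector<std::string> name_;
};
```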
Context and Coordination
TraceProcessorContext (src/trace_processor/types/trace_processor_context.h)
- Multi-level state management:
  - Global state (shared across machines)
  - Per-trace state (specific to each trace file)
  - Per-machine state (unique to each machine)
  - Per-trace-and-machine state (most specific)
- Coordination: Central access point for storage, sorter, and trackers (sketched below)
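The coordination role can be pictured as a single struct of non-owning pointers that parsers receive instead of many constructor arguments. The field names below are assumptions for illustration, not the real TraceProcessorContext members.

```cpp
struct StorageSketch;         // Global state, shared across machines.
struct SorterSketch;          // Per-trace state.
struct ProcessTrackerSketch;  // Per-machine state.

// One object handed to every tokenizer/parser; they reach storage, the
// sorter, and trackers through it rather than owning them.
struct ContextSketch {
  StorageSketch* storage = nullptr;
  SorterSketch* sorter = nullptr;
  ProcessTrackerSketch* process_tracker = nullptr;
};
```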
Key Architectural Patterns
1. ChunkedTraceReader Interface
All format readers implement the same interface, but their internal architectures differ completely:
- JSON: Incremental JSON parsing with state machine
- Proto: Modular packet processing with field-based routing
- Systrace: Line-by-line text processing
- Archives (ZIP/TAR): Container formats that extract and delegate
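A hedged sketch of that common interface: two entry points shared by every format, with signatures simplified relative to the real ChunkedTraceReader.

```cpp
#include <cstddef>
#include <cstdint>

class ChunkedReaderSketch {
 public:
  virtual ~ChunkedReaderSketch() = default;
  // Called repeatedly with the next chunk of raw bytes; a chunk may end in
  // the middle of an event, so readers keep partial state between calls.
  virtual bool Parse(const uint8_t* data, size_t size) = 0;
  // Called once after the last chunk, so readers can flush buffered state.
  virtual void NotifyEndOfFile() = 0;
};
```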
2. TraceSorter::Stream Pattern
Each format defines its own event types and creates typed streams:
- Stream<JsonEvent> for JSON traces
- Stream<TracePacketData> for proto events
- Stream<SystraceLine> for systrace lines
3. Parser vs Tokenizer Split
- Tokenizer: Pre-sorting processing, fast timestamp extraction
- Parser: Post-sorting detailed processing into storage
- Not all formats use this split (depends on complexity)
File Path Reference
Core Infrastructure:
- src/trace_processor/forwarding_trace_parser.{h,cc} - Format detection and delegation
- src/trace_processor/trace_reader_registry.{h,cc} - Reader registration
- src/trace_processor/sorter/trace_sorter.h - Event sorting
- src/trace_processor/storage/trace_storage.h - Columnar storage
Format Readers (examples):
- src/trace_processor/importers/json/json_trace_tokenizer.h - JSON processing
- src/trace_processor/importers/proto/proto_trace_reader.h - Proto entry point
- src/trace_processor/importers/proto/proto_importer_module.h - Proto module system
- src/trace_processor/importers/systrace/systrace_trace_parser.h - Systrace processing
Registration:
- src/trace_processor/trace_processor_impl.cc:475-519 - Where all readers are registered