CPU Scheduling events

On Android and Linux, Perfetto can gather scheduler traces via the Linux kernel ftrace infrastructure.

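This makes it possible to record fine-grained scheduling events, such as:

- which thread was running on which CPU at any point in time, and for how long;
- the state a thread was in when it was descheduled (for example sleeping, blocked, or preempted while still runnable);
- the point in time when a thread was woken up and became eligible to run again.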

UI

When zoomed out, the UI shows a quantized view of CPU usage, which collapses the scheduling information.

However, when zooming in, the individual scheduling events become visible.
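To illustrate what the zoomed-out view collapses, the sketch below buckets the busy time of one CPU into fixed-width windows and reports the busy fraction of each window. This is a minimal illustration under assumed inputs (a list of `(ts, dur)` slices in nanoseconds with idle slices already removed, and a hypothetical `bucket_ns` width), not the Perfetto UI's actual implementation.

```python
from typing import List, Tuple


def quantize_cpu_usage(
    slices: List[Tuple[int, int]], start: int, end: int, bucket_ns: int
) -> List[float]:
    """Collapse (ts, dur) slices of one CPU into per-bucket busy fractions.

    All values are in nanoseconds. Idle slices are assumed to have been
    filtered out already, so every slice counts as busy time.
    """
    n_buckets = (end - start + bucket_ns - 1) // bucket_ns  # ceil division
    busy = [0] * n_buckets
    for ts, dur in slices:
        # Clip the slice to the [start, end) window.
        s, e = max(ts, start), min(ts + dur, end)
        # Walk the slice bucket by bucket, splitting it at bucket edges.
        while s < e:
            idx = (s - start) // bucket_ns
            chunk_end = min(e, start + (idx + 1) * bucket_ns)
            busy[idx] += chunk_end - s
            s = chunk_end
    # Fraction of each bucket's nominal width that was busy.
    return [b / bucket_ns for b in busy]


# Two hypothetical slices on one CPU, viewed with 100 ns buckets.
usage = quantize_cpu_usage([(0, 50), (150, 100)], start=0, end=300, bucket_ns=100)
print(usage)  # [0.5, 0.5, 0.5]
```

The second slice straddles a bucket edge and is split between two buckets, which is exactly the information that is lost once individual slices are no longer drawn.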

Clicking on a CPU slice shows the relevant information in the details panel.

Scrolling down and expanding individual processes reveals that the scheduling events also create one track per thread, which lets you follow how the state of each thread evolves over time.

A minimal TraceConfig fragment that enables scheduling events (sched_switch and sched_waking):

    data_sources {
      config {
        name: "linux.ftrace"
        ftrace_config {
          ftrace_events: "sched/sched_switch"
          ftrace_events: "sched/sched_waking"
        }
      }
    }

SQL

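At the SQL level, the scheduling data is exposed in the sched_slice table, which can be joined with the thread and process tables to resolve names. Timestamps (ts) and durations (dur) are in nanoseconds.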

    SELECT ts, dur, cpu, end_state, priority, process.name, thread.name
    FROM sched_slice
    LEFT JOIN thread USING (utid)
    LEFT JOIN process USING (upid)

    ts               dur     cpu  end_state  priority  process.name               thread.name
    261187012170995  247188  2    S          130       /system/bin/logd           logd.klogd
    261187012418183  12812   2    D          120       /system/bin/traced_probes  traced_probes0
    261187012421099  220000  4    D          120       kthreadd                   kworker/u16:2
    261187012430995  72396   2    D          120       /system/bin/traced_probes  traced_probes1
    261187012454537  13958   0    D          120       /system/bin/traced_probes  traced_probes0
    261187012460318  46354   3    S          120       /system/bin/traced_probes  traced_probes2
    261187012468495  10625   0    R          120       [NULL]                     swapper/0
    261187012479120  6459    0    D          120       /system/bin/traced_probes  traced_probes0
    261187012485579  7760    0    R          120       [NULL]                     swapper/0
    261187012493339  34896   0    D          120       /system/bin/traced_probes  traced_probes0
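Without a real trace at hand, the query can still be tried out against a small stand-in database. The sketch below models simplified, hypothetical versions of the sched_slice, thread and process tables in SQLite (only the columns the query uses; the utid and upid values are made up), loads a subset of the sample rows above, runs the same query, and then sums on-CPU time per thread. Filtering out swapper threads by name is a heuristic for this sample only.

```python
import sqlite3

# Hypothetical in-memory stand-in for the trace processor tables. Only the
# columns used below are modelled, and the utid/upid values are made up.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE process (upid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE thread (utid INTEGER PRIMARY KEY, upid INTEGER, name TEXT);
    CREATE TABLE sched_slice (
        ts INTEGER, dur INTEGER, cpu INTEGER, utid INTEGER,
        end_state TEXT, priority INTEGER
    );
    """
)
conn.executemany(
    "INSERT INTO process VALUES (?, ?)",
    [(1, "/system/bin/logd"), (2, "/system/bin/traced_probes")],
)
conn.executemany(
    "INSERT INTO thread VALUES (?, ?, ?)",
    [(10, 1, "logd.klogd"), (20, 2, "traced_probes0"),
     (21, 2, "traced_probes1"), (30, None, "swapper/0")],
)
# A subset of the sample rows shown above.
conn.executemany(
    "INSERT INTO sched_slice VALUES (?, ?, ?, ?, ?, ?)",
    [
        (261187012170995, 247188, 2, 10, "S", 130),
        (261187012418183, 12812, 2, 20, "D", 120),
        (261187012430995, 72396, 2, 21, "D", 120),
        (261187012454537, 13958, 0, 20, "D", 120),
        (261187012468495, 10625, 0, 30, "R", 120),
        (261187012479120, 6459, 0, 20, "D", 120),
    ],
)

# The query from this section, with an explicit ordering added.
rows = conn.execute(
    """
    SELECT ts, dur, cpu, end_state, priority, process.name, thread.name
    FROM sched_slice
    LEFT JOIN thread USING (utid)
    LEFT JOIN process USING (upid)
    ORDER BY ts
    """
).fetchall()

# Total on-CPU time per thread, skipping the idle swapper threads.
cpu_time = conn.execute(
    """
    SELECT thread.name, SUM(dur) AS total_dur
    FROM sched_slice
    JOIN thread USING (utid)
    WHERE thread.name NOT LIKE 'swapper%'
    GROUP BY utid
    ORDER BY total_dur DESC
    """
).fetchall()
print(cpu_time)
# [('logd.klogd', 247188), ('traced_probes1', 72396), ('traced_probes0', 33229)]
```

Grouping by utid rather than by name keeps threads that happen to share a name apart. The LEFT JOIN on process is what produces the [NULL] process name for swapper/0, since that thread has no upid here.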

TraceConfig

    data_sources: {
      config {
        name: "linux.ftrace"
        ftrace_config {
          ftrace_events: "sched/sched_switch"
          ftrace_events: "sched/sched_process_exit"
          ftrace_events: "sched/sched_process_free"
          ftrace_events: "task/task_newtask"
          ftrace_events: "task/task_rename"
        }
      }
    }

    # This is to get full process names and thread<>process relationships.
    data_sources: {
      config {
        name: "linux.process_stats"
      }
    }

Scheduling wakeups and latency analysis

By also enabling the following events in the TraceConfig, the ftrace data source will record scheduling wakeup events as well:

    ftrace_events: "sched/sched_wakeup_new"
    ftrace_events: "sched/sched_waking"

While sched_switch events are emitted only when a CPU switches from one thread to another (so a thread shows up in them only once it is actually running on a CPU), sched_waking events are emitted when a thread is woken up and becomes runnable, even if it does not start running straight away.

Consider the following example:

    Thread A                       Thread B
    condition_variable.wait()
                                   condition_variable.notify()
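For experimenting, the wait/notify scenario above can be reproduced with Python's threading.Condition. This is a hypothetical test program, not part of Perfetto. On Linux, CPython threads are backed by native OS threads, so recording a trace while it runs should show the waiting thread going to sleep and later being woken up, although the interpreter adds its own locking on top of the condition variable.

```python
import threading
import time

cond = threading.Condition()
ready = False
events = []


def thread_a():
    with cond:
        # wait() releases the lock and blocks until notified; the native
        # thread backing Thread A is typically put to sleep at this point.
        while not ready:  # the loop guards against spurious wakeups
            cond.wait()
        events.append("A resumed")


def thread_b():
    global ready
    time.sleep(0.05)  # give Thread A a chance to start waiting first
    with cond:
        ready = True
        events.append("B notified")
        cond.notify()  # makes Thread A eligible to run again


a = threading.Thread(target=thread_a, name="ThreadA")
b = threading.Thread(target=thread_b, name="ThreadB")
a.start()
b.start()
a.join()
b.join()
print(events)  # ['B notified', 'A resumed']
```

Because Thread A can only append after observing ready, which Thread B sets in the same critical section as its own append, the printed order is deterministic regardless of how the scheduler interleaves the two threads.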

When Thread A suspends on wait(), it enters the S (sleeping) state and is removed from the CPU run queue. When Thread B notifies the condition variable, the kernel wakes Thread A and transitions it into the R (runnable) state. From that point on Thread A is eligible to run again, but it might not actually get a CPU for some time, for instance because all CPUs are busy running other (possibly higher-priority) threads, or because the scheduler's load balancer takes some time to move Thread A to a less busy CPU.

Unless real-time thread priorities are used, most Linux kernel scheduler configurations are not strictly work-conserving. For instance, the scheduler might prefer to wait for some time in the hope that the thread running on the current CPU goes idle, avoiding a cross-CPU migration that might be more costly in terms of both overhead and power.
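The delay described above, between a thread being woken up and actually starting to run, is the scheduling latency that sched_waking makes measurable. A minimal sketch, assuming the events have already been extracted from a trace into plain tuples: each sched_waking timestamp is paired with the next slice start of the same thread. The function name, the tuple layout and the sample values are all hypothetical, and the pairing is simplified (repeated wakeups before a single slice all map to that slice).

```python
import bisect
from typing import Dict, List, Tuple


def wakeup_latencies(
    wakings: List[Tuple[int, int]], slice_starts: List[Tuple[int, int]]
) -> List[Tuple[int, int, int]]:
    """Pair each wakeup with the next time the same thread starts running.

    wakings:      (ts, utid) of sched_waking events, in nanoseconds.
    slice_starts: (ts, utid) start times of scheduling slices.
    Returns (utid, waking_ts, latency_ns) tuples; wakeups with no later
    slice for that thread (e.g. at the end of the trace) are dropped.
    """
    # Sorted slice start times per thread, for binary search.
    starts: Dict[int, List[int]] = {}
    for ts, utid in slice_starts:
        starts.setdefault(utid, []).append(ts)
    for ts_list in starts.values():
        ts_list.sort()

    result = []
    for ts, utid in sorted(wakings):
        ts_list = starts.get(utid, [])
        # First slice of this thread that starts at or after the wakeup.
        i = bisect.bisect_left(ts_list, ts)
        if i < len(ts_list):
            result.append((utid, ts, ts_list[i] - ts))
    return result


# Hypothetical events for two threads (utid 1 and 2).
wakings = [(1000, 1), (5000, 2), (9000, 1)]
slice_starts = [(800, 1), (1200, 1), (5300, 2)]
print(wakeup_latencies(wakings, slice_starts))
# [(1, 1000, 200), (2, 5000, 300)]
```

The last wakeup of utid 1 has no subsequent slice and is dropped rather than reported with a made-up latency.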

NOTE: sched_waking and sched_wakeup provide nearly the same information. The difference lies in wakeups across CPUs, which involve inter-processor interrupts: sched_waking is emitted on the source CPU (where the waker runs), while sched_wakeup is emitted on the destination CPU (where the woken thread, the wakee, will run). sched_waking is usually sufficient for latency analysis, unless you want to break down the latency caused by inter-processor signaling.

When sched_waking events are enabled, selecting a CPU slice in the UI also shows additional wakeup information for that slice.