The difference between isolcpus and cgroups

24 Dec 2023 - John Z. Li

CPU isolation is critical for low-latency applications such as high-frequency trading systems. To get the lowest latency the hardware can offer, it is important to run on dedicated CPU cores that are free from interruption by other processes or by the OS kernel. On Linux servers, both isolcpus and cgroups are popular choices for isolating CPU cores. This post gives a brief summary of how they differ.

First, cgroups works at the process level, and one process can contain multiple threads. This means that if a set of CPU cores is assigned to a process, the threads of that process may still be moved between the cores in that set. If the application is multithreaded, even occasional short-lived threads can cause jitter on the main thread that is supposed to be low-latency. Second, cgroups is orthogonal to the OS scheduler: assigning a set of CPU cores to a process does not by itself mean that other processes cannot use the cores in that set. If some of those cores are idle at a given moment, say, because some threads are waiting for I/O operations to finish, the OS scheduler may dispatch threads that do not belong to the isolated process onto them to avoid wasting system resources.
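To make the process-level granularity concrete, here is a minimal sketch of assigning cores to a process through the cgroup v2 cpuset controller. It assumes a cgroup v2 hierarchy (typically mounted at /sys/fs/cgroup) and root privileges; the function name assign_cpus_to_process and the group name are hypothetical, chosen only for illustration. Note that the membership file takes a process ID, so every thread of that process shares the same core set.

```python
import os

def assign_cpus_to_process(cgroup_root, name, cpus, pid):
    """Create a child cgroup, restrict it to `cpus`, and move `pid` into it.

    cgroup_root: path of a cgroup v2 directory (assumed: /sys/fs/cgroup).
    cpus: CPU list string in kernel format, e.g. "2-3".
    """
    group = os.path.join(cgroup_root, name)
    # On a real cgroup filesystem, creating the directory creates the cgroup.
    os.makedirs(group, exist_ok=True)
    # Enable the cpuset controller for the children of cgroup_root.
    with open(os.path.join(cgroup_root, "cgroup.subtree_control"), "w") as f:
        f.write("+cpuset")
    # Restrict the group to the given cores.
    with open(os.path.join(group, "cpuset.cpus"), "w") as f:
        f.write(cpus)
    # Membership is per process: all threads of `pid` move together.
    with open(os.path.join(group, "cgroup.procs"), "w") as f:
        f.write(str(pid))
    return group
```

A call might look like assign_cpus_to_process("/sys/fs/cgroup", "lowlat", "2-3", os.getpid()). Nothing in this sketch keeps other processes off cores 2 and 3, which is the second limitation described above.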

The isolcpus kernel boot parameter, on the other hand, provides stronger protection at the cost of not being changeable while the OS is running: it must be set at boot time, and a reboot is required to apply a new configuration. When a set of CPU cores is isolated using isolcpus, the OS scheduler ignores those cores when placing and balancing threads, as if they did not exist. The only way to run threads on the isolated cores is to place them there explicitly, using the taskset command or the sched_setaffinity system call. (The former calls the latter underneath.)
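As a rough illustration, the sketch below reads the list of isolated cores that the kernel exposes in /sys/devices/system/cpu/isolated and pins the calling thread onto a chosen set with os.sched_setaffinity, Python's wrapper around the system call. The helper names are hypothetical, and the parser assumes the plain "N" and "N-M" entries that this file normally contains.

```python
import os

def parse_cpu_list(text):
    """Parse a kernel CPU list such as "2-5,8" into a set of core numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file means no cores are isolated
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

def isolated_cpus(path="/sys/devices/system/cpu/isolated"):
    """Return the set of isolated cores, or an empty set if unavailable."""
    try:
        with open(path) as f:
            return parse_cpu_list(f.read())
    except OSError:
        return set()

def pin_calling_thread(cpus):
    """Restrict the calling thread to `cpus` and return the resulting mask."""
    # pid 0 means "the calling thread" for the underlying system call.
    os.sched_setaffinity(0, cpus)
    return os.sched_getaffinity(0)
```

On a machine booted with, for example, isolcpus=2-5 on the kernel command line, isolated_cpus() would be expected to return those four cores, and pin_calling_thread({2}) would move the caller onto one of them.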

isolcpus allows fine-grained control over CPU affinity. For example, if a low-latency application starts three long-running threads at startup, we can use sched_setaffinity to assign one core to each thread. Combining per-thread affinity with the CPU isolation provided by isolcpus gives each thread the best chance of running at full speed. No other user-space thread will be scheduled onto those designated cores unless it is explicitly pinned there, and a thread whose affinity is a single core will not be migrated to another core by the kernel.
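The one-core-per-thread arrangement can be sketched as follows. This is an illustration under assumptions, not a tuned implementation: the function names are hypothetical, and the core numbers passed in are expected to be cores the process is allowed to use (ideally the isolated ones). Each thread pins itself, because sched_setaffinity with pid 0 applies to the calling thread only.

```python
import os
import threading

def run_pinned(cpu, work, results, index):
    """Pin the calling thread to one core, then run `work` on it."""
    # pid 0 targets the calling thread, not the whole process.
    os.sched_setaffinity(0, {cpu})
    results[index] = (os.sched_getaffinity(0), work())

def start_pinned_threads(cpus, work):
    """Start one thread per entry in `cpus`, each pinned to its own core.

    Returns a list of (affinity_mask, work_result) tuples, one per thread.
    """
    results = [None] * len(cpus)
    threads = [
        threading.Thread(target=run_pinned, args=(cpu, work, results, i))
        for i, cpu in enumerate(cpus)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

For the three-thread example above, a call such as start_pinned_threads([2, 3, 4], work) would give each thread its own core, assuming cores 2 to 4 were isolated at boot. The main thread keeps its original affinity.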