Nuts and Bolts EDF Scheduler Lead image: Lead Image © SV-Luma,
Lead Image © SV-Luma,

Optimizing utilization with the EDF scheduler

Mr. Efficient

The superior "Earliest Deadline First" task scheduling method has been part of Linux since kernel 3.14. By Jürgen Quade, Eva-Katharina Kunst

Linux kernel 3.14 added an optional scheduling method known as Earliest Deadline First (EDF). EDF scheduling uses a scheme where the process closest to its deadline is the next process scheduled for execution.

EDF is mainly used in systems that need to meet time requirements, and it has been known to handle tasks on time even in cases for which common priority-driven scheduling fails (see Figure 1 and the "Utilization" box).

CPU occupancy example. In a priority-driven system, Task B will not be completed in time, but the superior EDF scheduler manages fine.
Figure 1: CPU occupancy example. In a priority-driven system, Task B will not be completed in time, but the superior EDF scheduler manages fine.

Existing interfaces, which select a scheduling process and assign parameters, do not yet support the EDF process. The development in Userland might be lagging behind the kernel, but admins can still use Earliest Deadline First scheduling if they put in a little manual work. This work involves defining required data structures and interface functions. The code from the kernel documentation [1] in Listing 1 shows how to set up the necessary structures.

Listing 1: dl_test.c

001 #define _GNU_SOURCE
002 #include <unistd.h>
003 #include <stdio.h>
004 #include <stdlib.h>
005 #include <string.h>
006 #include <time.h>
007 #include <linux/unistd.h>
008 #include <linux/kernel.h>
009 #include <linux/types.h>
010 #include <sys/syscall.h>
011 #include <pthread.h>
013 #define gettid() syscall(__NR_gettid)
015 #define SCHED_DEADLINE  6
017 /* XXX use the proper syscall numbers */
018 #ifdef __x86_64__
019 #define __NR_sched_setattr      314
020 #define __NR_sched_getattr      315
021 #endif
022 #ifdef __i386__
023 #define __NR_sched_setattr      351
024 #define __NR_sched_getattr      352
025 #endif
026 #ifdef __arm__
027 #define __NR_sched_setattr      380
028 #define __NR_sched_getattr      381
029 #endif
031 static volatile int done;
033 struct sched_attr {
034     __u32 size;
036     __u32 sched_policy;
037     __u64 sched_flags;
040     __s32 sched_nice;
042     /* SCHED_FIFO, SCHED_RR */
043     __u32 sched_priority;
045     /* SCHED_DEADLINE (nsec) */
046     __u64 sched_runtime;
047     __u64 sched_deadline;
048     __u64 sched_period;
049 };
051 int sched_setattr(pid_t pid, const struct sched_attr *attr, unsigned int flags)
052 {
053     return syscall(__NR_sched_setattr, pid, attr, flags);
054 }
056 int sched_getattr(pid_t pid, struct sched_attr *attr, unsigned int size, unsigned int flags)
057 {
058     return syscall(__NR_sched_getattr, pid, attr, size, flags);
059 }
061 void *run_deadline(void *data)
062 {
063     struct sched_attr attr;
064     int x = 0, ret;
065     unsigned int flags = 0;
067     printf("deadline thread start %ld\n",
068         gettid());
070     attr.size = sizeof(attr);
071     attr.sched_flags = 0;
072     attr.sched_nice = 0;
073     attr.sched_priority = 0;
075     /* creates a 10ms/30ms reservation */
076     attr.sched_policy = SCHED_DEADLINE;
077     attr.sched_runtime = 10 * 1000 * 1000;
078     attr.sched_period  = 30 * 1000 * 1000;
079     attr.sched_deadline= 30 * 1000 * 1000;
081     ret = sched_setattr(0, &attr, flags);
082     if (ret < 0) {
083         done = 0;
084         perror("sched_setattr");
085         exit(-1);
086     }
088     while (!done) {
089         x++;
090     }
091     return NULL;
092 }
094 int main (int argc, char **argv)
095 {
096     pthread_t thread;
098     printf("main thread [%ld]\n", gettid());
099     pthread_create(&thread, NULL, run_deadline, NULL);
100     sleep(10);
101     done = 1;
102     pthread_join(thread, NULL);
103     printf("main dies [%ld]\n", gettid());
104     return 0;
105 }

The block of code starting with line 17 defines the required system call numbers for three platforms (ARM, 32 bit, and 64 bit). The central data structure that follows, struct sched_attr (line 33), establishes the scheduling parameters [2]. The third part implements the system call selections (line 51).

The new system call

sched_setattr(pid_t pid,const structsched_attr * attr,unsigned int flags)

activates both EDF and the other scheduling method (line 51). It gets three parameters along the way. The first specifies the thread for which the scheduling applies. The second parameter selects the scheduling process and any required parameters. The third parameter is intended for potential enhancements; 0 is currently passed.

The return value 0 indicates that the scheduling was successfully selected and parameterized. If an error like -1 is returned, the global variable errno contains an error code, (e.g., for a Permission denied). In the last case, the job is presumably not running with the root privileges required to select the scheduling. The call of the function sched_setattr() takes place in the example within the separate function run_deadline(), which the code starts in line 99 as a separate thread.

In total, four values are important for parameterizing the EDF: the scheduling process itself (Policy), the processing time in the worst-case scenario (WCET), the shortest period in which the computing process is reactivated after the last activation (period), and finally the actual deadline. These times are stored in 64-bit variables and are specified in nanoseconds.

The task programmed in Listing 1 is active at a minimum distance of 30 milliseconds (line 78) for a maximum of 10 milliseconds (line 77). The calculations (the cycle) must be completed after 30 milliseconds at the latest (line 79). The Linux EDF scheduler recalculates the deadline every time a task is woken.

No Cheating

Specifying the deadline for parameterizing the EDF should be sufficient, but the Linux kernel has supplemented the classical EDF with a workload control (bandwidth control) that requires knowledge of the WCET and period. The quotient of the two results in utilization.

Linux ensures that the utilization of jobs selected by EDF by default does not add up to more than 95 percent of the available computing time. That leaves at least 5 percent for tasks that manage another scheduling algorithm. This setting can be changed by using the proc files:

The admin writes a -1 in the /proc/sys/kernel/sched_rt_runtime_us proc file to decommission this utilization control.

Linux very drastically stalls jobs that make false claims in the WCET or period and take longer to compute than requested. CPU time is even withdrawn. It also refuses to specify overly large (more than 2^63) deadlines or periods. Finally, this causes an error in calculating time differences. Linux will also prevent the admin from cheating the system by calling fork(), whereby the child process normally inherits the attributes of the parent process. The

int sched_getattr(pid_t pid, struct sched_attr *attr,unsigned int size, \
                      unsigned int flags)

(line 56) system call newly introduced with kernel 3.14 also reads the once-set scheduling parameters. The parameter pid references the task again, where 0 represents the calling task. attr concerns a memory area's address of size size, from which the program is supposed to store the parameters. A 0 must be passed for flags.

Kernel Exchange

The command

make LDLIBS="-lpthread" dl_test

makes an executable program from the source code in Listing 1. This command requires a kernel version 3.14 or beyond. By default, the kernel version only comes in with Ubuntu 14.10 (Utopic Unicorn). However, a current kernel can also be installed on older versions, especially version 14.04 with Long Term Support.

You can install kernel 3.18.1 for a 64-bit Ubuntu quite easily via the command shown in Listing 2. A developer who wants to use the 32-bit version can simply exchange the text amd64 for i386 in the filenames from Listing 2.

Listing 2: Installing the Kernel from a PPA

01 wget
02 sudo dpkg -i linux-image-3.18.1-031801-generic_

The commands that represent active tasks are not yet familiar with the deadline scheduling and therefore display the kernel internal number, #6 (Figure 2). This number is also visible if the admin reads a job's sched file in the directory /proc/PID/, which EDF manages. The entry in Figure 3 indicates a priority of -1. This entry is also indicative of deadline scheduling because to indicate and simplify internal processes, EDF jobs are assigned a the priority -1.

The command output from ps -eLfc shows the kernel internal number, #6.
Figure 2: The command output from ps -eLfc shows the kernel internal number, #6.
Command output of cat /proc/tp_id/sched.
Figure 3: Command output of cat /proc/tp_id/sched.

Class Society

The whole scheduling problem has been on a new footing thanks to scheduling classes (Figure 4) – especially since the Completely Fair Scheduler (CFS), which makes it possible to use various scheduling techniques simultaneously in one system, was merged into kernel version 2.6.23.

Scheduling classes enable the parallel use of different algorithms. EDF is embedded as dl_sched_class.
Figure 4: Scheduling classes enable the parallel use of different algorithms. EDF is embedded as dl_sched_class.

EDF is embedded within the new deadline scheduling class [3]. Linux then processes all classes one after another: first – hence the highest priority – the new deadline scheduling class, then the real-time class and, finally, the Completely Fair Scheduler.

Whipped Cream

Linux development has increasingly focused on the needs of real-time applications, so it is only logical to equip the kernel with an Earliest Deadline First scheduler. This option presents various advantages compared with conventional priority-driven scheduling.