Optimizing Performance with Intel Resource Director Technology (RDT)
Intel® Resource Director Technology (RDT) provides resource allocation (or control) capabilities that govern how applications use shared resources such as the last-level cache (LLC) and memory bandwidth. These capabilities include Cache Allocation Technology (CAT) and Memory Bandwidth Allocation (MBA):
- Certain applications can over-utilize the cache, reducing the performance of more important applications. Cache Allocation Technology (CAT) provides control over data placement in the last-level cache (LLC), which allows important threads to be isolated and prioritized. CAT enables eRTOS to allocate L3 cache or L2 cache for each logical processor.
- MBA allows eRTOS to control the memory bandwidth that is available for each core.
RDT introduces an intermediate construct called a Class of Service (CLOS) which acts as a resource control tag into which a thread can be grouped. The CLOS has associated resource capacity bitmasks (CBMs) indicating how much of the cache can be used by a given CLOS. RDT uses CLOS to configure the L3/L2 cache size and memory throttle (delay). As a matter of eRTOS policy for extensibility, CLOS 0 is considered and configured as the highest priority CLOS, followed by CLOS 1, and so on. In the eRTOS context, CLOS is different from thread priority but can impact the behavior of a thread.
Available RDT Modes
To differentiate performance among real-time threads that run in parallel, eRTOS introduces two RDT modes for CAT and MBA: Flat performance mode and Priority-based CLOS performance mode.
Cache Allocation Technology (CAT) Modes
Mode | Description |
---|---|
Flat performance mode | All logical processors are configured equally, with all real-time L3/L2 caches. |
Priority-based CLOS performance mode | The L3/L2 cache available to each real-time logical processor is based on the CLOS of the running thread (see Understanding Class of Service below). This optimizes the performance of a thread with a higher-priority CLOS by reducing L3/L2 cache contention from threads with a lower-priority CLOS. |
Memory Bandwidth Allocation (MBA) Modes
Mode | Description |
---|---|
Flat performance mode | All cores are configured with minimum memory delay. |
Priority-based CLOS performance mode | The memory delay of each real-time logical processor is based on the CLOS of the running thread (see Understanding Class of Service below). This avoids performance degradation of a bandwidth-intensive thread with a higher-priority CLOS by throttling threads that may be over-utilizing memory bandwidth relative to their priority. Because MBA uses a programmable rate controller between the cores and the last-level shared caches and memory controller, bandwidth to these caches may also be reduced. Note: We recommend that you throttle only bandwidth-intensive threads that do not use the off-core caches effectively. |
You can configure the CAT/MBA modes through the eRTOS grub.cfg file.
Understanding Class of Service (CLOS)
When the kernel is configured to use Priority-based CLOS performance mode, the Class of Service (CLOS) of a real-time thread is based on the thread’s priority. The real-time priority range (0~127) is divided among the CLOS available to real-time cores, and the mapping is inverse: the higher the priority, the lower the CLOS number. Use the real-time function RtGetRDTCapability to determine the number of CLOS. In Flat performance mode, the CLOS of every thread defaults to 0.
In the example below, the mapping is for 7 CLOS (CLOS 0 through 6 in the table). The table also shows the L3 Capacity Bitmask (CBM) and MBA delay configured for each CLOS.
CLOS | L3 CBM | MBA Delay | Real-Time Priority Range |
---|---|---|---|
0 | 001,1111,1111 | 0 | 109~127 |
1 | 000,0111,1111 | 20 | 90~108 |
2 | 000,0001,1111 | 40 | 72~89 |
3 | 000,0000,1111 | 50 | 54~71 |
4 | 000,0000,0111 | 60 | 36~53 |
5 | 000,0000,0011 | 70 | 18~35 |
6 | 000,0000,0001 | 80 | 0~17 |
7 | 110,0000,0000 | 90 | Windows cores |
Note: The number of CLOS, the number of CBM bits, and the range of MBA delay depend on the processor family. The above example is based on a Skylake i9-7900X system. If Flat performance mode is configured for both CAT and MBA in the eRTOS Control Panel, the number of CLOS available to RTKernel is one.
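The priority ranges in the example table are consistent with splitting the 128 real-time priorities into contiguous bands, one per CLOS, with any leftover priorities given to the lowest-numbered CLOS. The C sketch below reproduces the table under that assumption. The names `clos_for_priority` and `RT_PRIORITY_COUNT` are hypothetical and are not part of the eRTOS SDK, and the kernel's actual mapping may differ for other CLOS counts.

```c
#include <assert.h>

/* Number of real-time priorities (0~127). */
#define RT_PRIORITY_COUNT 128

/*
 * Hypothetical helper (not an eRTOS SDK function).
 * Maps a real-time priority to a CLOS number by walking the bands from
 * the highest priorities (CLOS 0) downward. Assumption: each CLOS gets
 * RT_PRIORITY_COUNT / num_clos priorities, and the remainder is handed
 * out one each to the lowest-numbered CLOS.
 */
int clos_for_priority(int priority, int num_clos)
{
    assert(priority >= 0 && priority < RT_PRIORITY_COUNT);
    assert(num_clos > 0 && num_clos <= RT_PRIORITY_COUNT);

    int base  = RT_PRIORITY_COUNT / num_clos;  /* minimum band width */
    int extra = RT_PRIORITY_COUNT % num_clos;  /* bands that get one more */
    int high  = RT_PRIORITY_COUNT - 1;         /* top of the current band */

    for (int clos = 0; clos < num_clos; clos++) {
        int width = base + (clos < extra ? 1 : 0);
        int low = high - width + 1;            /* bottom of the current band */
        if (priority >= low)
            return clos;
        high = low - 1;
    }
    return num_clos - 1; /* not reached for valid input */
}
```

With 7 CLOS, `clos_for_priority(127, 7)` yields 0 and `clos_for_priority(17, 7)` yields 6, matching the first and last real-time rows of the table.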
In the CLOS example above:
- CLOS 0 is configured with all real-time L3 CBM bits (maximum L3 cache space) and zero MBA delay. Therefore, a real-time thread with CLOS equal to 0 will have the highest performance when running.
- CLOS 6 is configured with one L3 CBM bit (minimum L3 cache space) and a larger MBA delay. Therefore, a real-time thread with CLOS equal to 6 will have the lowest performance when running.
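To make the bitmask comparison concrete, the C sketch below counts the set bits in each L3 CBM from the example table and checks whether two masks share any cache portion. The 11-bit masks are transcribed from the table as hexadecimal values. The helper names are hypothetical and are not eRTOS SDK functions. The contiguity check reflects the usual CAT requirement that the set bits in a mask form one contiguous run.

```c
#include <assert.h>
#include <stdint.h>

/* L3 CBMs from the example table, indexed by CLOS 0..7. */
const uint32_t example_l3_cbm[8] = {
    0x1FF, /* CLOS 0: 001,1111,1111 */
    0x07F, /* CLOS 1: 000,0111,1111 */
    0x01F, /* CLOS 2: 000,0001,1111 */
    0x00F, /* CLOS 3: 000,0000,1111 */
    0x007, /* CLOS 4: 000,0000,0111 */
    0x003, /* CLOS 5: 000,0000,0011 */
    0x001, /* CLOS 6: 000,0000,0001 */
    0x600  /* CLOS 7: 110,0000,0000 (Windows cores) */
};

/* Number of set bits, that is, how many cache portions the mask grants. */
int cbm_bit_count(uint32_t cbm)
{
    int count = 0;
    while (cbm != 0) {
        count += (int)(cbm & 1u);
        cbm >>= 1;
    }
    return count;
}

/* Nonzero if the set bits form a single contiguous run. */
int cbm_is_contiguous(uint32_t cbm)
{
    if (cbm == 0)
        return 0;
    uint32_t lowest = cbm & (~cbm + 1u); /* isolate the lowest set bit */
    /* Adding the lowest bit carries through one contiguous run and
       clears it; any bit left in common means there was a gap. */
    return ((cbm + lowest) & cbm) == 0;
}

/* Nonzero if the two masks share at least one cache portion. */
int cbm_overlaps(uint32_t a, uint32_t b)
{
    return (a & b) != 0;
}
```

For the example table, `cbm_bit_count` gives 9 for CLOS 0 and 1 for CLOS 6, and `cbm_overlaps(example_l3_cbm[7], example_l3_cbm[0])` is 0, so the Windows cores and the real-time CLOS do not share L3 portions in this example.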
Based on the CLOS example above:
- If CAT is configured to Flat performance mode, the L3 CBM of every CLOS from 0 to 6 is set to 001,1111,1111.
- If MBA is configured to Flat performance mode, the MBA Delay of every CLOS from 0 to 6 is set to 0.
- If Flat performance mode is configured for both CAT and MBA, only CLOS 0 is available to real-time threads.
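The effect of the two Flat settings on the example table can be sketched in C as follows. The `clos_config` struct and the `apply_flat_cat` and `apply_flat_mba` helpers are hypothetical: they only model the table in memory, they do not program any hardware, and they are not eRTOS SDK functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of the example table (hypothetical layout). */
struct clos_config {
    uint32_t l3_cbm;     /* L3 capacity bitmask */
    unsigned mba_delay;  /* MBA delay value */
};

/* Flat CAT: every real-time CLOS receives the CBM of CLOS 0. */
void apply_flat_cat(struct clos_config *table, size_t rt_clos_count)
{
    assert(table != NULL && rt_clos_count > 0);
    for (size_t i = 1; i < rt_clos_count; i++)
        table[i].l3_cbm = table[0].l3_cbm;
}

/* Flat MBA: every real-time CLOS receives zero delay, as in the example. */
void apply_flat_mba(struct clos_config *table, size_t rt_clos_count)
{
    assert(table != NULL);
    for (size_t i = 0; i < rt_clos_count; i++)
        table[i].mba_delay = 0;
}
```

Applying both helpers to the seven real-time rows leaves every row identical to CLOS 0, which is consistent with only one CLOS being needed when both features are flat.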
The real-time function RtSetThreadCLOS, available from the eRTOS SDK, can override a real-time thread's CLOS.
Note: The CLOS of a real-time thread does not impact the thread scheduler’s selection of the current thread, which is based on a thread’s priority. A thread's CLOS only impacts its performance when it is scheduled as the current thread.