AMXNest Sample
Description
This sample demonstrates the use of Intel® AMX technology in a nested scenario. It uses the highest-priority timer handler to preempt the main thread while the main thread is using AMX. The main thread creates a timer with a one-second period, creates an AMX tile configuration, and then enters loops that call the TileMultiply() subroutine.
Within these loops, execution is periodically interrupted by the timer handler. The timer handler, in turn, creates its own tile configuration, calls the TileMultiply() subroutine with a different parameter, and then releases the tile configuration.
The TileMultiply() subroutine initializes two source matrices, each of 16 rows by 64 bytes, and zeroes the destination matrix. The source matrices are loaded into two tiles (tmm2 and tmm3), while the destination matrix is loaded into tile tmm1. The subroutine computes the dot-product of the bytes in tmm2 and tmm3, accumulates the results in tmm1, stores tmm1 back to memory, and validates the stored data against the expected data.
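The validation step described above needs expected data to compare against. The following is a minimal scalar sketch of how such a reference result could be computed, not the code from AMXNest.c. It assumes signed 8-bit inputs and 32-bit accumulators (the behavior of the TDPBSSD instruction); the sample may use a different signedness variant. The names ref_tile_dot_product() and tile_matches() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TILE_ROWS   16               /* rows per tile, per the description  */
#define TILE_COLSB  64               /* bytes per tile row                  */
#define TILE_DWORDS (TILE_COLSB / 4) /* 32-bit elements per destination row */

/*
 * Scalar model of a signed-byte tile dot-product (assumed TDPBSSD semantics):
 * each 32-bit element dst[m][n] accumulates, for every k, the sum of four
 * byte products a[m][4k+i] * b[k][4n+i]. All matrices are stored row-major.
 */
void ref_tile_dot_product(int32_t *dst, const int8_t *a, const int8_t *b)
{
    assert(dst != NULL && a != NULL && b != NULL);
    for (int m = 0; m < TILE_ROWS; m++) {
        for (int k = 0; k < TILE_DWORDS; k++) {
            for (int n = 0; n < TILE_DWORDS; n++) {
                int32_t sum = 0;
                for (int i = 0; i < 4; i++) {
                    sum += (int32_t)a[m * TILE_COLSB + 4 * k + i] *
                           (int32_t)b[k * TILE_COLSB + 4 * n + i];
                }
                dst[m * TILE_DWORDS + n] += sum;
            }
        }
    }
}

/* Returns 1 if the data stored from the tile equals the expected data. */
int tile_matches(const int32_t *stored, const int32_t *expected)
{
    return memcmp(stored, expected,
                  sizeof(int32_t) * TILE_ROWS * TILE_DWORDS) == 0;
}
```

With a zeroed destination, the buffer filled by ref_tile_dot_product() can be compared against the data stored from tmm1 using tile_matches().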
In this program, the timer handler periodically interrupts the main thread while the main thread's tile state is live, so the main thread's metadata in the control register (TILECFG) and its tile registers in TILEDATA are periodically replaced by the timer handler's metadata and tile registers. If the OS does not save and restore TILECFG and TILEDATA correctly across this nesting, the program will either generate a general protection fault during execution or fail the validation check on the matrix multiply results.
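To make the TILECFG metadata concrete, the following sketch builds the 64-byte memory image that the LDTILECFG instruction reads. The byte layout (palette, start row, per-tile bytes-per-row and row counts) follows the Intel architecture definition. The choice of 16 rows by 64 bytes for tmm1, tmm2, and tmm3 is an assumption based on the description above, and the names TileConfig and make_tile_config() are hypothetical rather than taken from AMXNest.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 64-byte tile configuration image consumed by LDTILECFG. */
typedef struct {
    uint8_t  palette_id;   /* byte 0: palette 1 selects the 8-tile layout   */
    uint8_t  start_row;    /* byte 1: restart row for interrupted load/store */
    uint8_t  reserved[14]; /* bytes 2-15: must be zero                       */
    uint16_t colsb[16];    /* bytes 16-47: bytes per row (tiles 0-7 used)    */
    uint8_t  rows[16];     /* bytes 48-63: row count (tiles 0-7 used)        */
} TileConfig;

_Static_assert(sizeof(TileConfig) == 64, "tile config must be 64 bytes");
_Static_assert(offsetof(TileConfig, colsb) == 16, "colsb starts at byte 16");
_Static_assert(offsetof(TileConfig, rows) == 48, "rows starts at byte 48");

/*
 * Fill cfg so that tmm1, tmm2, and tmm3 are each rows x colsb bytes
 * (assumed 16 x 64 here). Unused tiles stay zero, which leaves them
 * unconfigured.
 */
void make_tile_config(TileConfig *cfg, uint8_t rows, uint16_t colsb)
{
    assert(cfg != NULL && rows <= 16 && colsb <= 64);
    memset(cfg, 0, sizeof *cfg);
    cfg->palette_id = 1;
    for (int t = 1; t <= 3; t++) {
        cfg->rows[t]  = rows;
        cfg->colsb[t] = colsb;
    }
    /* On AMX hardware this image would then be loaded with LDTILECFG
     * (the _tile_loadconfig() intrinsic) and dropped with TILERELEASE
     * (the _tile_release() intrinsic). Both are omitted here so that the
     * sketch compiles without AMX support. */
}
```

Because the main thread and the timer handler each load their own image of this form, the OS has to preserve the interrupted context's TILECFG together with its TILEDATA for the nested case to validate.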
Source Files
| File | Description |
|---|---|
| AMXNest.c | Source code for the AMXNest sample |
Usage
Run AMXNest.ertos
Examples
In this example, the sample runs on a 4th Gen Intel® Xeon® Scalable system, using the following command line:
run AMXNest.ertos
Output
AI App - AMXNest is running
**** AI App - AMXNest: PASS (Timer handler count = 36)****
APIs Referenced