=====================
Smaller Vector Tables
=====================

.. warning::
    Migrated from:
    https://cwiki.apache.org/confluence/display/NUTTX/Smaller+Vector+Tables

One of the largest OS data structures is the vector table,
``g_irqvector[]``. This is the table that holds the vector information
when ``irq_attach()`` is called and that ``irq_dispatch()`` uses to
dispatch interrupts. Recent changes have made that table even larger. For
32-bit ARM, the size of that table is given by:

.. code-block:: c

   nbytes = number_of_interrupts * (2 * sizeof(void *))

We will focus on the STM32 for this discussion to keep things simple.
However, this discussion applies to all architectures.

The number of (physical) interrupt vectors supported by the MCU hardware is
given by the definition ``NR_IRQS``, which is provided in a header file in
``arch/arm/include/stm32``. This is, by default, the value of
``number_of_interrupts`` in the above equation.

For a 32-bit ARM like the STM32 with, say, 100 interrupt vectors, this size
would be 800 bytes of memory. That is not a lot for high-end MCUs with a
lot of RAM, but could be a show stopper for MCUs with minimal RAM.

Two approaches for reducing the size of the vector tables are described
below. Both depend on the fact that not all interrupts are used on a given
MCU. Most of the time, the majority of entries in ``g_irqvector[]`` are
zero because only a small number of interrupts are actually attached and
enabled by the application. If you know that certain IRQ numbers are not
going to be used, then it is possible to filter those out and reduce the
size to the number of supported interrupts.

For example, if the actual number of interrupts used were 20, then the
above requirement would go from 800 bytes to 160 bytes.
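
The size arithmetic above can be checked with a small host-side sketch.
The names ``irq_info32_s`` and ``irqvector_bytes()`` are hypothetical and
exist only for this illustration. The entry is modeled with two
``uint32_t`` fields, on the assumption that each of the two pointers in a
table entry is 32 bits wide on the target:

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>

   /* Hypothetical stand-in for one g_irqvector[] entry on a 32-bit target.
    * The uint32_t fields model the two 32-bit pointers (handler, argument).
    */

   struct irq_info32_s
   {
     uint32_t handler;
     uint32_t arg;
   };

   /* Bytes needed for a vector table holding nirqs entries */

   static size_t irqvector_bytes(size_t nirqs)
   {
     return nirqs * sizeof(struct irq_info32_s);
   }

Under these assumptions, ``irqvector_bytes(100)`` gives 800 and
``irqvector_bytes(20)`` gives 160, matching the figures quoted above.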

Software IRQ Remapping
======================

`[On March 3, 2017, support for this "Software IRQ Remapping" was included
in the NuttX repository.]`

One of the simplest ways of reducing the size of ``g_irqvector[]`` would be
to remap the large set of physical interrupt vectors into a much smaller
set of interrupts that are actually used. For the sake of discussion, let's
imagine two new configuration settings:

* ``CONFIG_ARCH_MINIMAL_VECTORTABLE``: Enables IRQ mapping
* ``CONFIG_ARCH_NUSER_INTERRUPTS``: The number of IRQs after mapping.

Then the OS could allocate the interrupt vector table with size
``CONFIG_ARCH_NUSER_INTERRUPTS`` instead of the much bigger ``NR_IRQS``:

.. code-block:: c

   #ifdef CONFIG_ARCH_MINIMAL_VECTORTABLE
   struct irq_info_s g_irqvector[CONFIG_ARCH_NUSER_INTERRUPTS];
   #else
   struct irq_info_s g_irqvector[NR_IRQS];
   #endif

The ``g_irqvector[]`` table is accessed in only three places:

``irq_attach()``
----------------

``irq_attach()`` receives the physical vector number along with the
information needed later to dispatch interrupts:

.. code-block:: c

   int irq_attach(int irq, xcpt_t isr, FAR void *arg);

Logic in ``irq_attach()`` would map the incoming physical vector number to
a table index like:

.. code-block:: c

   #ifdef CONFIG_ARCH_MINIMAL_VECTORTABLE
   int ndx = g_irqmap[irq];
   #else
   int ndx = irq;
   #endif

where ``g_irqmap[]`` is an array indexed by the physical interrupt vector
number and contains the new, mapped interrupt vector table index. This
array must be provided by platform-specific code.

``irq_attach()`` would then use this index to set the ``g_irqvector[]``
entry:

.. code-block:: c

   g_irqvector[ndx].handler = isr;
   g_irqvector[ndx].arg     = arg;

``irq_dispatch()``
------------------

``irq_dispatch()`` is called by MCU logic when an interrupt is received:

.. code-block:: c

   void irq_dispatch(int irq, FAR void *context);

where, again, ``irq`` is the physical interrupt vector number.

``irq_dispatch()`` would do essentially the same thing as ``irq_attach()``.
First it would map the ``irq`` number to a table index:

.. code-block:: c

   #ifdef CONFIG_ARCH_MINIMAL_VECTORTABLE
   int ndx = g_irqmap[irq];
   #else
   int ndx = irq;
   #endif

Then it would dispatch the interrupt handling to the attached interrupt
handler. NOTE that the physical vector number is passed to the handler so
it is completely unaware of the underlying `shell` game:

.. code-block:: c

   vector = g_irqvector[ndx].handler;
   arg    = g_irqvector[ndx].arg;

   vector(irq, context, arg);

``irq_initialize()``
--------------------

``irq_initialize()`` simply sets the ``g_irqvector[]`` table to a known
state on power-up. It would only have to account for the difference in
table sizes.

.. code-block:: c

   #ifdef CONFIG_ARCH_MINIMAL_VECTORTABLE
   #  define TAB_SIZE CONFIG_ARCH_NUSER_INTERRUPTS
   #else
   #  define TAB_SIZE NR_IRQS
   #endif

   for (i = 0; i < TAB_SIZE; i++)
     {
       /* ... set g_irqvector[i] to its known power-up state ... */
     }

``g_irqmap[]``
--------------

An implementation of ``g_irqmap[]`` might be something like:

.. code-block:: c

   #include <nuttx/irq.h>

   const irq_mapped_t g_irqmap[NR_IRQS] =
   {
     ... IRQ to index mapping values ...
   };

``g_irqmap[]`` is an array of mapped IRQ table indices. It contains the
mapped index value and is itself indexed by the physical interrupt vector
number. It provides an ``irq_mapped_t`` value in the range of 0 to
``CONFIG_ARCH_NUSER_INTERRUPTS - 1`` that is the new, mapped index into the
vector table. Unsupported IRQs would simply map to an out of range value
like ``IRQMAPPED_MAX``. So, for example, if ``g_irqmap[37] == 24``, then
the hardware interrupt vector 37 will be mapped to the interrupt vector
table at index 24. If ``g_irqmap[42] == IRQMAPPED_MAX``, then hardware
interrupt vector 42 is not used and, if it occurs, will result in an
unexpected interrupt crash.
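
The pieces described in this section can be put together in a minimal,
host-side sketch. This is not the NuttX implementation: the ``sketch_``
functions, the table sizes, and the mapping values are all hypothetical,
``FAR`` is omitted, and unmapped or unattached interrupts simply return
``-1`` here rather than triggering an unexpected interrupt crash.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>

   #define NR_IRQS          8          /* Assumed number of physical vectors */
   #define NUSER_INTERRUPTS 2          /* Stands in for CONFIG_ARCH_NUSER_INTERRUPTS */
   #define IRQMAPPED_MAX    UINT8_MAX  /* Out-of-range marker for unused IRQs */

   typedef uint8_t irq_mapped_t;
   typedef int (*xcpt_t)(int irq, void *context, void *arg);

   struct irq_info_s
   {
     xcpt_t handler;
     void *arg;
   };

   /* Only physical vectors 3 and 5 are used; they map to indices 0 and 1 */

   static const irq_mapped_t g_irqmap[NR_IRQS] =
   {
     IRQMAPPED_MAX, IRQMAPPED_MAX, IRQMAPPED_MAX, 0,
     IRQMAPPED_MAX, 1,             IRQMAPPED_MAX, IRQMAPPED_MAX
   };

   static struct irq_info_s g_irqvector[NUSER_INTERRUPTS];

   /* Map a physical vector number to a table index, or -1 if unmapped */

   static int sketch_irq_index(int irq)
   {
     if (irq < 0 || irq >= NR_IRQS || g_irqmap[irq] >= NUSER_INTERRUPTS)
       {
         return -1;
       }

     return g_irqmap[irq];
   }

   static int sketch_irq_attach(int irq, xcpt_t isr, void *arg)
   {
     int ndx = sketch_irq_index(irq);
     if (ndx < 0)
       {
         return -1;
       }

     g_irqvector[ndx].handler = isr;
     g_irqvector[ndx].arg     = arg;
     return 0;
   }

   static int sketch_irq_dispatch(int irq, void *context)
   {
     int ndx = sketch_irq_index(irq);
     if (ndx < 0 || g_irqvector[ndx].handler == NULL)
       {
         return -1;
       }

     /* The handler still receives the physical vector number */

     return g_irqvector[ndx].handler(irq, context, g_irqvector[ndx].arg);
   }

   /* Example handler: records the IRQ number it was called with */

   static int sketch_record_irq(int irq, void *context, void *arg)
   {
     (void)context;
     *(int *)arg = irq;
     return 0;
   }

Attaching ``sketch_record_irq`` to physical vector 5 and then dispatching
vector 5 stores the value 5 through ``arg``, even though the handler lives
at table index 1. Attaching to an unmapped vector such as 4 is rejected.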

Hardware Vector Remapping
=========================

`[This technical approach is discussed here but is discouraged because of
technical "Complications" and "Dubious Performance Improvements" discussed
at the end of this section.]`

Most ARMv7-M architectures support two mechanisms for handling interrupts:

* The so-called `common` vector handler logic enabled with
  ``CONFIG_ARMV7M_CMNVECTOR=y`` that can be found in
  ``arch/arm/src/armv7-m/``, and
* MCU-specific interrupt handling logic. For the STM32, this logic can be
  found at ``arch/arm/src/stm32/gnu/stm32_vectors.S``.

The `common` vector logic is slightly more efficient; the MCU-specific
logic is slightly more flexible.

If we don't use the `common` vector logic enabled with
``CONFIG_ARMV7M_CMNVECTOR=y``, but instead the more flexible MCU-specific
implementation, then we can also use this to map the large set of hardware
interrupt vector numbers to a smaller set of software interrupt numbers.
This involves minimal changes to the OS and does not require any magic
software lookup table. But it is considerably more complex to implement.

This technical approach requires changes to three files:

* A new header file at ``arch/arm/include/stm32``, say ``xyz_irq.h`` for
  the purposes of this discussion. This new header file is like the other
  IRQ definition header files in that directory except that it defines only
  the IRQ numbers of the interrupts after remapping. So, instead of having
  the 100 IRQ number definitions of the original IRQ header file based on
  the physical vector numbers, this header file would define ``only`` the
  small set of 20 ``mapped`` IRQ numbers in the range from 0 through 19. It
  would also set ``NR_IRQS`` to the value 20.
* A new header file at ``arch/arm/src/stm32/hardware``, say
  ``xyz_vector.h``. It would be similar to the other vector definition
  files in that directory: It will consist of a sequence of 100 ``VECTOR``
  and ``UNUSED`` macros. It will define ``VECTOR`` entries for the 20 valid
  interrupts and 80 ``UNUSED`` entries for the unused interrupt vector
  numbers. More about this below.
* Modification of the ``stm32_vectors.S`` file. These changes are trivial
  and involve only the conditional inclusion of the new, special
  ``xyz_vector.h`` header file.

**REVISIT**: This needs to be updated. Neither the ``xyz_vector.h`` files
nor the ``stm32_vectors.S`` exist in the current realization. This has all
been replaced with the common vector handling at ``arch/arm/src/armv7-m``.

Vector Definitions
==================

In ``arch/arm/src/stm32/gnu/stm32_vectors.S``, notice that the
``xyz_vector.h`` file will be included twice. Before each inclusion, the
macros ``VECTOR`` and ``UNUSED`` are defined.

The first time that ``xyz_vector.h`` is included, it defines the hardware
vector table. The hardware vector table consists of ``NR_IRQS`` 32-bit
addresses in an array. This is accomplished by setting:

.. code-block:: c

   #undef VECTOR
   #define VECTOR(l,i) .word l

   #undef UNUSED
   #define UNUSED(i)   .word stm32_reserved

and then including ``xyz_vector.h``. So consider the following definitions
in the original file:

.. code-block:: c

   ...
   VECTOR(stm32_usart1, STM32_IRQ_USART1) /* Vector 16+37: USART1 global interrupt */
   VECTOR(stm32_usart2, STM32_IRQ_USART2) /* Vector 16+38: USART2 global interrupt */
   VECTOR(stm32_usart3, STM32_IRQ_USART3) /* Vector 16+39: USART3 global interrupt */
   ...

Suppose that we wanted to support only USART1 and that we wanted to have
the IRQ number for USART1 to be 12. That would be accomplished in the
``xyz_vector.h`` header file like this:

.. code-block:: c

   ...
   VECTOR(stm32_usart1, STM32_IRQ_USART1) /* Vector 16+37: USART1 global interrupt */
   UNUSED(0)                              /* Vector 16+38: USART2 global interrupt */
   UNUSED(0)                              /* Vector 16+39: USART3 global interrupt */
   ...

where the value of ``STM32_IRQ_USART1`` is defined to be 12 in the
``arch/arm/include/stm32/xyz_irq.h`` header file.

When ``xyz_vector.h`` is included by ``stm32_vectors.S`` with the above
definitions for ``VECTOR`` and ``UNUSED``, the following would result:

.. code-block:: c

   ...
   .word stm32_usart1
   .word stm32_reserved
   .word stm32_reserved
   ...

These are the settings for vectors 53, 54, and 55, respectively. The entire
vector table would be populated in this way. ``stm32_reserved``, if called,
would result in an "unexpected ISR" crash. ``stm32_usart1``, if called,
will process the USART1 interrupt normally as we will see below.

Interrupt Handler Definitions
-----------------------------

In the vector table, all of the valid vectors are set to the address of a
`handler` function. All unused vectors are forced to vector to
``stm32_reserved``. Currently, only vectors that are not supported by the
hardware are marked ``UNUSED``, but you can mark any vector ``UNUSED`` in
order to eliminate it.

The second time that ``xyz_vector.h`` is included by ``stm32_vectors.S``,
the `handler` functions are generated. Each of the valid vectors points to
the matching handler function. In this case, you do NOT have to provide
handlers for the ``UNUSED`` vectors, only for the used ``VECTOR`` vectors.
All of the unused vectors will go to the common ``stm32_reserved`` handler.
The remaining set of handlers is very sparse.

These are the values of the ``UNUSED`` and ``VECTOR`` macros the second
time that ``xyz_vector.h`` is included by ``stm32_vectors.S``:

.. code-block:: asm

       .macro HANDLER, label, irqno
       .thumb_func
   \label:
       mov r0, #\irqno
       b exception_common
       .endm

   #undef VECTOR
   #define VECTOR(l,i) HANDLER l, i

   #undef UNUSED
   #define UNUSED(i)

In the above USART1 example, a single handler would be generated that will
provide the IRQ number 12. Remember that 12 is the expansion of the macro
``STM32_IRQ_USART1`` that is provided in the
``arch/arm/include/stm32/xyz_irq.h`` header file:

.. code-block:: asm

       .thumb_func
   stm32_usart1:
       mov r0, #12
       b exception_common

Now, when vector 16+37 occurs, it is mapped to IRQ 12 with no significant
software overhead.

A Complication
--------------

A complication in the above logic has been noted by David Sidrane: When we
access the NVIC in ``stm32_irq.c`` in order to enable and disable
interrupts, the logic requires the physical vector number in order to
select the NVIC register and the bit(s) to modify in the NVIC register.

This could be handled with another small IRQ lookup table (20 ``uint8_t``
entries in our example situation above). But then this approach is not so
much better than the `Software IRQ Remapping` described above, which does
not suffer from this problem. Admittedly, enabling/disabling interrupts is
a much lower rate operation, so this at least does not put the lookup in
the critical interrupt path.

Another option suggested by David Sidrane is equally ugly:

* Don't change the ``arch/arm/include/stm32`` IRQ definition file.
* Instead, encode the IRQ number so that it has both the index and physical
  vector number:

.. code-block:: c

   ...
   VECTOR(stm32_usart1, STM32_IRQ_USART1 << 8 | STM32_INDEX_USART1)
   UNUSED(0)
   UNUSED(0)
   ...

``STM32_INDEX_USART1`` would have the value 12 and ``STM32_IRQ_USART1``
would be as before (53). This encoded value would be received by
``irq_dispatch()``, which would decode both the index and the physical
vector number. It would use the index to look up the entry in the
``g_irqvector[]`` table but would pass the physical vector number to the
interrupt handler as the IRQ number.

A lookup would still be required in ``irq_attach()`` in order to convert
the physical vector number back to an index (100 ``uint8_t`` entries in our
example). So some lookup is unavoidable.

Based upon this analysis, my recommendation is that we do not consider the
second option any further. The first option is cleaner, more portable, and
generally preferable.
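
For reference only, the encoding considered above could be sketched in C
as follows. The ``sketch_`` helpers and the ``SKETCH_INDEX_`` constants are
hypothetical names; the only thing taken from the text is the layout, with
the physical vector number shifted left by 8 bits and the table index in
the low 8 bits:

.. code-block:: c

   #include <stdint.h>

   #define SKETCH_INDEX_BITS 8
   #define SKETCH_INDEX_MASK ((1u << SKETCH_INDEX_BITS) - 1u)

   /* Pack the physical vector number and the table index into one value */

   static inline uint32_t sketch_irq_encode(uint32_t physical, uint32_t index)
   {
     return (physical << SKETCH_INDEX_BITS) | (index & SKETCH_INDEX_MASK);
   }

   /* Recover the g_irqvector[] index (an AND) */

   static inline uint32_t sketch_irq_index(uint32_t encoded)
   {
     return encoded & SKETCH_INDEX_MASK;
   }

   /* Recover the physical vector number passed to the handler (a SHIFT) */

   static inline uint32_t sketch_irq_physical(uint32_t encoded)
   {
     return encoded >> SKETCH_INDEX_BITS;
   }

For the USART1 example, encoding physical vector 53 with index 12 gives
``0x350c``; decoding returns 12 for the table lookup and 53 for the handler
argument.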

Dubious Performance Improvements
--------------------------------

The intent of this second option was to provide a higher performance
mapping of physical interrupt vectors to IRQ numbers compared to the pure
software mapping of option 1. However, in order to implement this approach,
we had to use the less efficient, non-common vector handling logic. That
logic is not terribly less efficient; the cost is probably only a 16-bit
load immediate instruction and a branch to another location in FLASH (which
will cause the CPU pipeline to be flushed).

The variant of option 2 where both the physical vector number and vector
table index are encoded would require even more processing in
``irq_dispatch()`` in order to decode the physical vector number and vector
table index, possibly just AND and SHIFT instructions.

However, the minimal cost of the first, pure software mapping approach was
possibly as small as a single indexed byte fetch from FLASH in
``irq_attach()``. Indexing is, of course, essentially `free` in the ARM
ISA; the primary cost would be the FLASH memory access. So my first
assessment is that the performance of both approaches is essentially the
same. If anything, the first approach is possibly the more performant if
implemented efficiently.

Both options would require some minor range checking in ``irq_attach()`` as
well.

Because of this and because of the simplicity of the first option, I see no
reason to support or consider this second option any further.

Complexity and Generalizability
-------------------------------

Option 2 is overly complex; it depends on a deep understanding of how the
MCU interrupt logic works and on a high level of Thumb assembly language
skills.

Another problem with option 2 is that it really only applies to the
Cortex-M family of processors and perhaps others that support vectored
interrupts in a similar fashion. It is not a general solution that can be
used with any CPU architecture.
And even worse, the MCU-specific interrupt handling logic that this support
depends upon is very limited. As soon as the common interrupt handler logic
was added, I stopped implementing the MCU-specific logic in all newer
ARMv7-M ports. So that MCU-specific interrupt handler logic is only present
for EFM32, Kinetis, LPC17, SAM3/4, STM32, Tiva, and nothing else. Very
limited!

These are further reasons why option 2 is not recommended and will not be
supported explicitly.