=====================
Smaller Vector Tables
=====================

.. warning::
    Migrated from:
    https://cwiki.apache.org/confluence/display/NUTTX/Smaller+Vector+Tables

One of the largest OS data structures is the vector table,
``g_irqvector[]``. This is the table that holds the vector
information recorded when ``irq_attach()`` is called and that is
used by ``irq_dispatch()`` to dispatch interrupts. Recent changes
have made that table even larger. For 32-bit ARM, the
size of that table is given by:

.. code-block:: c

    nbytes = number_of_interrupts * (2 * sizeof(void *))

We will focus on the STM32 for this discussion to keep
things simple. However, this discussion applies to all
architectures.

The number of (physical) interrupt vectors supported by
the MCU hardware is given by the definition ``NR_IRQS``, which
is provided in a header file in ``arch/arm/include/stm32``.
This is, by default, the value of ``number_of_interrupts``
in the above equation.

For a 32-bit ARM like the STM32 with, say, 100 interrupt
vectors, this size would be 800 bytes of memory. That is
not a lot for high-end MCUs with a lot of RAM,
but could be a show stopper for MCUs with minimal RAM.

Two approaches for reducing the size of the vector tables
are described below. Both depend on the fact that not all
interrupts are used on a given MCU. Most of the time,
the majority of entries in ``g_irqvector[]`` are zero because
only a small number of interrupts are actually attached
and enabled by the application. If you know that certain
IRQ numbers are not going to be used, then it is possible
to filter those out and reduce the size of the table to the
number of interrupts that are actually supported.

For example, if the actual number of interrupts used were
20, then the above requirement would go from 800 bytes to
160 bytes.

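The arithmetic above can be checked with a few lines of C. This is only
a sketch: ``irqvector_bytes()`` is a hypothetical helper, not a NuttX
function, and the pointer size is passed in explicitly (4 bytes for a
32-bit ARM) so that the result does not depend on the host that
compiles it.

.. code-block:: c

    #include <stddef.h>

    /* Hypothetical helper: size in bytes of a vector table in which each
     * entry holds two pointers (the handler and its argument).
     */

    size_t irqvector_bytes(size_t number_of_interrupts, size_t pointer_size)
    {
      return number_of_interrupts * (2 * pointer_size);
    }

With 4-byte pointers, ``irqvector_bytes(100, 4)`` evaluates to 800 and
``irqvector_bytes(20, 4)`` evaluates to 160, matching the figures above.
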
Software IRQ Remapping
======================

`[On March 3, 2017, support for this "Software IRQ Remapping"
was included in the NuttX repository.]`

One of the simplest ways of reducing the size of
``g_irqvector[]`` would be to remap the large set of physical
interrupt vectors into a much smaller set of interrupts that
are actually used. For the sake of discussion, let's
imagine two new configuration settings:

* ``CONFIG_ARCH_MINIMAL_VECTORTABLE``: Enables IRQ mapping
* ``CONFIG_ARCH_NUSER_INTERRUPTS``: The number of IRQs after mapping.

Then the OS could allocate the interrupt vector table to be
size ``CONFIG_ARCH_NUSER_INTERRUPTS`` instead of the much bigger
``NR_IRQS``:

.. code-block:: c

    #ifdef CONFIG_ARCH_MINIMAL_VECTORTABLE
    struct irq_info_s g_irqvector[CONFIG_ARCH_NUSER_INTERRUPTS];
    #else
    struct irq_info_s g_irqvector[NR_IRQS];
    #endif

The ``g_irqvector[]`` table is accessed in only three places:

``irq_attach()``
----------------

``irq_attach()`` receives the physical vector number along
with the information needed later to dispatch interrupts:

.. code-block:: c

    int irq_attach(int irq, xcpt_t isr, FAR void *arg);

Logic in ``irq_attach()`` would map the incoming physical
vector number to a table index like:

.. code-block:: c

    #ifdef CONFIG_ARCH_MINIMAL_VECTORTABLE
    int ndx = g_irqmap[irq];
    #else
    int ndx = irq;
    #endif

where ``g_irqmap[]`` is an array indexed by the physical
interrupt vector number and contains the new, mapped
interrupt vector table index. This array must be
provided by platform-specific code.

``irq_attach()`` would then use this index to set the
``g_irqvector[]`` entry:

.. code-block:: c

    g_irqvector[ndx].handler = isr;
    g_irqvector[ndx].arg     = arg;

``irq_dispatch()``
------------------

``irq_dispatch()`` is called by MCU logic when an interrupt is received:

.. code-block:: c

    void irq_dispatch(int irq, FAR void *context);

where, again, ``irq`` is the physical interrupt vector number.

``irq_dispatch()`` would do essentially the same thing as
``irq_attach()``. First it would map the IRQ number to
a table index:

.. code-block:: c

    #ifdef CONFIG_ARCH_MINIMAL_VECTORTABLE
    int ndx = g_irqmap[irq];
    #else
    int ndx = irq;
    #endif

Then it would dispatch the interrupt to the attached
interrupt handler. NOTE that the physical vector
number is passed to the handler so it is completely
unaware of the underlying `shell` game:

.. code-block:: c

    vector = g_irqvector[ndx].handler;
    arg    = g_irqvector[ndx].arg;

    vector(irq, context, arg);

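The attach and dispatch steps described above can be tried out on a host
machine with a small stand-alone sketch. Everything prefixed with
``sketch_`` or ``SKETCH_`` is hypothetical and exists only for this
illustration; it is not the NuttX implementation. The sketch assumes
eight physical vectors, of which only vectors 3 and 6 are used, and
adds the range checking that a real implementation would also need.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    #define SKETCH_NR_IRQS   8     /* Assumed number of physical vectors */
    #define SKETCH_NUSER     2     /* Assumed number of mapped vectors */
    #define SKETCH_UNMAPPED  0xff  /* Marks a physical vector that is not used */

    typedef int (*sketch_isr_t)(int irq, void *context, void *arg);

    struct sketch_info_s
    {
      sketch_isr_t handler;
      void *arg;
    };

    /* Physical vectors 3 and 6 are used; all others are unmapped */

    static const uint8_t g_sketch_map[SKETCH_NR_IRQS] =
    {
      SKETCH_UNMAPPED, SKETCH_UNMAPPED, SKETCH_UNMAPPED, 0,
      SKETCH_UNMAPPED, SKETCH_UNMAPPED, 1, SKETCH_UNMAPPED
    };

    static struct sketch_info_s g_sketch_vector[SKETCH_NUSER];

    /* Returns the table index for a physical vector, or -1 if unusable */

    static int sketch_map_irq(int irq)
    {
      if (irq < 0 || irq >= SKETCH_NR_IRQS ||
          g_sketch_map[irq] >= SKETCH_NUSER)
        {
          return -1;
        }

      return g_sketch_map[irq];
    }

    int sketch_attach(int irq, sketch_isr_t isr, void *arg)
    {
      int ndx = sketch_map_irq(irq);
      if (ndx < 0)
        {
          return -1;
        }

      g_sketch_vector[ndx].handler = isr;
      g_sketch_vector[ndx].arg     = arg;
      return 0;
    }

    int sketch_dispatch(int irq, void *context)
    {
      int ndx = sketch_map_irq(irq);
      if (ndx < 0 || g_sketch_vector[ndx].handler == NULL)
        {
          return -1;  /* Stands in for the "unexpected interrupt" path */
        }

      /* The handler sees the physical vector number, not the table index */

      return g_sketch_vector[ndx].handler(irq, context,
                                          g_sketch_vector[ndx].arg);
    }

    /* Example handler: returns the IRQ number that it was given */

    int sketch_echo_isr(int irq, void *context, void *arg)
    {
      (void)context;
      (void)arg;
      return irq;
    }

After ``sketch_attach(6, sketch_echo_isr, NULL)``, a call to
``sketch_dispatch(6, NULL)`` returns 6: the handler is stored at table
index 1 but still receives the physical vector number. Attaching to an
unmapped vector such as 4 is rejected.
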
``irq_initialize()``
--------------------

``irq_initialize()`` simply sets the ``g_irqvector[]`` table to
a known state on power-up. It would only have to account for
the difference in sizes.

.. code-block:: c

    #ifdef CONFIG_ARCH_MINIMAL_VECTORTABLE
    #  define TAB_SIZE CONFIG_ARCH_NUSER_INTERRUPTS
    #else
    #  define TAB_SIZE NR_IRQS
    #endif

    for (i = 0; i < TAB_SIZE; i++)
      {
        /* Put each entry in a known state, for example: */

        g_irqvector[i].handler = irq_unexpected_isr;
        g_irqvector[i].arg     = NULL;
      }

``g_irqmap[]``
--------------

An implementation of ``g_irqmap[]`` might be something like:

.. code-block:: c

    #include <nuttx/irq.h>

    const irq_mapped_t g_irqmap[NR_IRQS] =
    {
      ... IRQ to index mapping values ...
    };

``g_irqmap[]`` is an array of mapped IRQ table indices. It
contains the mapped index value and is itself indexed
by the physical interrupt vector number. It provides
an ``irq_mapped_t`` value in the range of 0 to
``CONFIG_ARCH_NUSER_INTERRUPTS - 1`` that is the new, mapped
index into the vector table. Unsupported IRQs would
simply map to an out of range value like ``IRQMAPPED_MAX``.
So, for example, if ``g_irqmap[37] == 24``, then the hardware
interrupt vector 37 will be mapped to the interrupt vector
table at index 24. If ``g_irqmap[42] == IRQMAPPED_MAX``, then
hardware interrupt vector 42 is not used and, if it occurs,
will result in an unexpected interrupt crash.

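Because the mapping table is written by hand in platform-specific
code, a consistency check can be useful during bring-up. The following
is a minimal sketch of such a check, not part of NuttX: the function
name, the ``SKETCH_UNMAPPED`` marker (standing in for an out of range
value like ``IRQMAPPED_MAX``), and the use of ``uint8_t`` entries are all
assumptions made for this illustration.

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define SKETCH_UNMAPPED 0xff  /* Assumed out-of-range marker */

    /* Returns true if every entry of map[] is either SKETCH_UNMAPPED or a
     * unique index below nuser.  nuser must not exceed 255 here.
     */

    bool sketch_map_is_valid(const uint8_t *map, size_t nr_irqs, size_t nuser)
    {
      bool seen[256] = { false };
      size_t irq;

      for (irq = 0; irq < nr_irqs; irq++)
        {
          uint8_t ndx = map[irq];

          if (ndx == SKETCH_UNMAPPED)
            {
              continue;          /* Unused physical vector */
            }

          if (ndx >= nuser || seen[ndx])
            {
              return false;      /* Out of range, or index used twice */
            }

          seen[ndx] = true;
        }

      return true;
    }

A table that maps two different physical vectors to the same index, or
that contains an index at or above the table size, is reported as
invalid. Note that entries left out of a C initializer default to 0,
which is a valid index, so unused vectors have to be set to the
out of range marker explicitly.
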
Hardware Vector Remapping
=========================

`[This technical approach is discussed here but is
discouraged because of technical "Complications" and
"Dubious Performance Improvements" discussed at the
end of this section.]`

Most ARMv7-M architectures support two mechanisms for handling interrupts:

* The so-called `common` vector handler logic enabled with
  ``CONFIG_ARMV7M_CMNVECTOR=y`` that can be found in
  ``arch/arm/src/armv7-m/``, and
* MCU-specific interrupt handling logic. For the
  STM32, this logic can be found at ``arch/arm/src/stm32/gnu/stm32_vectors.S``.

The `common` vector logic is slightly more efficient;
the MCU-specific logic is slightly more flexible.

If we don't use the `common` vector logic enabled with
``CONFIG_ARMV7M_CMNVECTOR=y``, but instead the more
flexible MCU-specific implementation, then we can
also use this to map the large set of hardware
interrupt vector numbers to a smaller set of software
interrupt numbers. This involves minimal changes to
the OS and does not require any magic software lookup
table. But it is considerably more complex to implement.

This technical approach requires changes to three files:

* A new header file at ``arch/arm/include/stm32``, say
  ``xyz_irq.h`` for the purposes of this discussion.
  This new header file is like the other IRQ definition
  header files in that directory except that it
  defines only the IRQ numbers of the interrupts after
  remapping. So, instead of having the 100 IRQ number
  definitions of the original IRQ header file based on
  the physical vector numbers, this header file would
  define ``only`` the small set of 20 ``mapped`` IRQ numbers in
  the range from 0 through 19. It would also set ``NR_IRQS``
  to the value 20.
* A new header file at ``arch/arm/src/stm32/hardware``, say
  ``xyz_vector.h``. It would be similar to the other vector
  definition files in that directory: It will consist
  of a sequence of 100 ``VECTOR`` and ``UNUSED`` macros. It will
  define ``VECTOR`` entries for the 20 valid interrupts and
  80 ``UNUSED`` entries for the unused interrupt vector numbers.
  More about this below.
* Modification of the ``stm32_vectors.S`` file. These changes
  are trivial and involve only the conditional inclusion
  of the new, special ``xyz_vector.h`` header file.

**REVISIT**: This needs to be updated. Neither the ``xyz_vector.h``
files nor the ``stm32_vectors.S`` exist in the current realization.
This has all been replaced with the common vector handling at
``arch/arm/src/armv7-m``.

Vector Definitions
==================

In ``arch/arm/src/stm32/gnu/stm32_vectors.S``, notice that the
``xyz_vector.h`` file will be included twice. Before each
inclusion, the macros ``VECTOR`` and ``UNUSED`` are defined.

The first time that ``xyz_vector.h`` is included, it defines the
hardware vector table. The hardware vector table consists
of ``NR_IRQS`` 32-bit addresses in an array. This is
accomplished by setting:

.. code-block:: c

    #undef VECTOR
    #define VECTOR(l,i) .word l

    #undef UNUSED
    #define UNUSED(i)   .word stm32_reserved

and then including ``xyz_vector.h``. So consider the following
definitions in the original file:

.. code-block:: c

    ...
    VECTOR(stm32_usart1, STM32_IRQ_USART1) /* Vector 16+37: USART1 global interrupt */
    VECTOR(stm32_usart2, STM32_IRQ_USART2) /* Vector 16+38: USART2 global interrupt */
    VECTOR(stm32_usart3, STM32_IRQ_USART3) /* Vector 16+39: USART3 global interrupt */
    ...

Suppose that we wanted to support only USART1 and that
we wanted to have the IRQ number for USART1 to be 12.
That would be accomplished in the ``xyz_vector.h`` header
file like this:

.. code-block:: c

    ...
    VECTOR(stm32_usart1, STM32_IRQ_USART1) /* Vector 16+37: USART1 global interrupt */
    UNUSED(0)                              /* Vector 16+38: USART2 global interrupt */
    UNUSED(0)                              /* Vector 16+39: USART3 global interrupt */
    ...

where the value of ``STM32_IRQ_USART1`` was defined to
be 12 in the ``arch/arm/include/stm32/xyz_irq.h`` header
file. When ``xyz_vector.h`` is included by ``stm32_vectors.S``
with the above definitions for ``VECTOR`` and ``UNUSED``, the
following would result:

.. code-block:: c

    ...
    .word stm32_usart1
    .word stm32_reserved
    .word stm32_reserved
    ...

These are the settings for vectors 53, 54, and 55,
respectively. The entire vector table would be populated
in this way. ``stm32_reserved``, if called, would result in
an "unexpected ISR" crash. ``stm32_usart1``, if called, will
process the USART1 interrupt normally as we will see below.

Interrupt Handler Definitions
-----------------------------

In the vector table, all of the valid vectors are set to
the address of a `handler` function. All unused vectors
are forced to vector to ``stm32_reserved``. Currently, only
vectors that are not supported by the hardware are
marked ``UNUSED``, but you can mark any vector ``UNUSED`` in
order to eliminate it.

The second time that ``xyz_vector.h`` is included by
``stm32_vectors.S``, the `handler` functions are generated.
Each of the valid vectors points to the matching handler
function. In this case, you do NOT have to provide
handlers for the ``UNUSED`` vectors, only for the used
``VECTOR`` vectors. All of the unused vectors will go
to the common ``stm32_reserved`` handler. The remaining
set of handlers is very sparse.

These are the values of the ``UNUSED`` and ``VECTOR`` macros the
second time that ``xyz_vector.h`` is included by ``stm32_vectors.S``:

.. code-block:: asm

        .macro HANDLER, label, irqno
        .thumb_func
    \label:
        mov r0, #\irqno
        b exception_common
        .endm

    #undef VECTOR
    #define VECTOR(l,i) HANDLER l, i

    #undef UNUSED
    #define UNUSED(i)

In the above USART1 example, a single handler would be
generated that will provide the IRQ number 12. Remember
that 12 is the expansion of the macro ``STM32_IRQ_USART1``
that is provided in the ``arch/arm/include/stm32/xyz_irq.h``
header file:

.. code-block:: asm

        .thumb_func
    stm32_usart1:
        mov r0, #12
        b exception_common

Now, when vector 16+37 occurs it is mapped to IRQ 12
with no significant software overhead.

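The double-inclusion technique used here is a form of what is often
called the X-macro pattern: one list of entries is expanded twice under
different macro definitions. The following is a host-side C sketch of
that pattern only, not the assembly implementation. The list macro
``SKETCH_VECTORS`` stands in for the ``xyz_vector.h`` header, and all
``sketch_`` names are hypothetical. Unlike the assembly file, the
handlers are generated first and the table second, because C requires a
function to be declared before its address is taken.

.. code-block:: c

    #include <stddef.h>

    /* Hypothetical stand-in for xyz_vector.h: one entry per physical vector */

    #define SKETCH_VECTORS \
      VECTOR(sketch_usart1, 12) \
      UNUSED(0) \
      UNUSED(0)

    int g_sketch_last_irq = -1;   /* Records the IRQ number last delivered */

    static void sketch_exception_common(int irqno)
    {
      g_sketch_last_irq = irqno;
    }

    static void sketch_reserved(void)
    {
      g_sketch_last_irq = -1;     /* Stands in for the "unexpected ISR" path */
    }

    /* Expansion 1: generate one handler per VECTOR, nothing for UNUSED */

    #define VECTOR(l, i) static void l(void) { sketch_exception_common(i); }
    #define UNUSED(i)
    SKETCH_VECTORS
    #undef VECTOR
    #undef UNUSED

    /* Expansion 2: generate the table; UNUSED slots share one handler */

    #define VECTOR(l, i) l,
    #define UNUSED(i)    sketch_reserved,
    void (*const g_sketch_table[])(void) =
    {
      SKETCH_VECTORS
    };
    #undef VECTOR
    #undef UNUSED

Calling ``g_sketch_table[0]()`` delivers IRQ number 12 to
``sketch_exception_common()``, while the two unused slots both point at
``sketch_reserved()``. Only one handler function is generated even
though the table has three entries.
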
A Complication
--------------

A complication in the above logic has been noted by David Sidrane:
When we access the NVIC in ``stm32_irq.c`` in order to enable
and disable interrupts, the logic requires the physical
vector number in order to select the NVIC register and
the bit(s) to modify in the NVIC register.

This could be handled with another small IRQ lookup table
(20 ``uint8_t`` entries in our example situation above). But
then this approach is not so much better than the `Software
IRQ Remapping` described above, which does not suffer from
this problem. Certainly, enabling/disabling interrupts is a
much lower rate operation and at least does not put the
lookup in the critical interrupt path.

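A reverse lookup of this kind might look something like the sketch
below. The table contents and all ``sketch_`` names are hypothetical.
The only hardware fact relied upon is that each ARMv7-M NVIC enable
register covers 32 external interrupts, so external interrupt ``n``
(the physical vector number minus 16) is controlled by bit ``n % 32`` of
register ``n / 32``.

.. code-block:: c

    #include <stdint.h>

    #define SKETCH_NMAPPED    20   /* Mapped IRQs in the running example */
    #define SKETCH_FIRST_EXT  16   /* Vectors 0-15 are processor exceptions */

    /* Indexed by mapped IRQ number; holds the physical vector number.
     * Only mapped IRQ 12 (USART1, physical vector 16+37) is filled in for
     * this illustration; the remaining entries default to 0 and are
     * treated as invalid below.
     */

    static const uint8_t g_sketch_physirq[SKETCH_NMAPPED] =
    {
      [12] = 16 + 37
    };

    /* Returns the external interrupt number for a mapped IRQ, or -1 */

    static int sketch_extint(int mapped)
    {
      if (mapped < 0 || mapped >= SKETCH_NMAPPED ||
          g_sketch_physirq[mapped] < SKETCH_FIRST_EXT)
        {
          return -1;
        }

      return g_sketch_physirq[mapped] - SKETCH_FIRST_EXT;
    }

    /* Index of the NVIC enable register that controls a mapped IRQ, or -1 */

    int sketch_nvic_regndx(int mapped)
    {
      int extint = sketch_extint(mapped);
      return extint < 0 ? -1 : extint / 32;
    }

    /* Bit number within that register, or -1 */

    int sketch_nvic_bitno(int mapped)
    {
      int extint = sketch_extint(mapped);
      return extint < 0 ? -1 : extint % 32;
    }

For mapped IRQ 12 this selects bit 5 of the second enable register
(register index 1), since 37 / 32 is 1 and 37 % 32 is 5.
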
Another option suggested by David Sidrane is equally ugly:

* Don't change the ``arch/arm/include/stm32`` IRQ definition file.
* Instead, encode the IRQ number so that it has both
  the index and physical vector number:

.. code-block:: c

    ...
    VECTOR(stm32_usart1, STM32_IRQ_USART1 << 8 | STM32_INDEX_USART1)
    UNUSED(0)
    UNUSED(0)
    ...

``STM32_INDEX_USART1`` would have the value 12 and
``STM32_IRQ_USART1`` would be as before (53). This encoded
value would be received by ``irq_dispatch()``, which would
decode both the index and the physical vector number.
It would use the index to look up the entry in the ``g_irqvector[]``
table but would pass the physical vector number to the
interrupt handler as the IRQ number.

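The encoding and decoding described above amount to a shift and a
mask. A minimal sketch follows; the helper names are hypothetical, and
the layout (physical vector number in bits 8 and up, table index in
bits 0 through 7) is taken from the ``VECTOR`` example above.

.. code-block:: c

    #include <stdint.h>

    /* Hypothetical helpers: the physical vector number occupies bits 8 and
     * up, the table index occupies bits 0-7 (so the index must be < 256).
     */

    uint32_t sketch_encode_irq(uint32_t physirq, uint32_t index)
    {
      return (physirq << 8) | (index & 0xff);
    }

    uint32_t sketch_decode_physirq(uint32_t encoded)
    {
      return encoded >> 8;
    }

    uint32_t sketch_decode_index(uint32_t encoded)
    {
      return encoded & 0xff;
    }

For the USART1 example, ``sketch_encode_irq(53, 12)`` yields ``0x350c``,
from which the dispatcher would recover the table index 12 and the
physical vector number 53.
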
A lookup would still be required in ``irq_attach()`` in
order to convert the physical vector number back to
an index (100 ``uint8_t`` entries in our example). So
some lookup is unavoidable.

Based upon this analysis, my recommendation is that
we do not consider the second option any further. The
first option is cleaner, more portable, and generally
preferable.

Dubious Performance Improvements
--------------------------------

The intent of this second option was to provide a higher
performance mapping of physical interrupt vectors to IRQ
numbers compared to the pure software mapping of option 1. However,
in order to implement this approach, we had
to use the less efficient, non-common vector handling
logic. That logic is not terribly less efficient; the
cost is probably only a 16-bit load immediate instruction
and a branch to another location in FLASH (which will cause
the CPU pipeline to be flushed).

The variant of option 2 where both the physical vector number
and vector table index are encoded would require even more
processing in ``irq_dispatch()`` in order to decode the
physical vector number and vector table index,
possibly just AND and SHIFT instructions.

However, the minimal cost of the first pure software
mapping approach was possibly as small as a single
indexed byte fetch from FLASH in ``irq_attach()``.
Indexing is, of course, essentially `free` in the ARM
ISA; the primary cost would be the FLASH memory access.
So my first assessment is that the performance of both
approaches is essentially the same. If anything, the
first approach is possibly the more performant if
implemented efficiently.

Both options would require some minor range checking in
``irq_attach()`` as well.

Because of this and because of the simplicity of the
first option, I see no reason to support or consider
this second option any further.

Complexity and Generalizability
-------------------------------

Option 2 is overly complex; it depends on a deep understanding
of how the MCU interrupt logic works and on a high level of
Thumb assembly language skills.

Another problem with option 2 is that it really only applies to
the Cortex-M family of processors and perhaps others that
support vectored interrupts in a similar fashion.
It is not a general solution that can be used with any CPU
architecture.

And even worse, the MCU-specific interrupt handling logic
that this support depends upon is very limited. As soon
as the common interrupt handler logic was added, I stopped
implementing the MCU-specific logic in all newer ARMv7-M
ports. So that MCU-specific interrupt handler logic is
only present for EFM32, Kinetis, LPC17, SAM3/4, STM32,
Tiva, and nothing else. Very limited!

These are further reasons why option 2 is not recommended and
will not be supported explicitly.