Doc: Migrate Smaller Vector Tables
Migrate https://cwiki.apache.org/confluence/display/NUTTX/Smaller+Vector+Tables to official wiki

Signed-off-by: Ludovic Vanasse <ludovicvanasse@gmail.com>
parent 2e2ec9afa4
commit 2c2a794240
2 changed files with 474 additions and 1 deletions
@ -53,4 +53,5 @@ Guides
semihosting.rst
renode.rst
signal_events_interrupt_handlers.rst
signaling_sem_priority_inheritance.rst
smaller_vector_tables.rst
472
Documentation/guides/smaller_vector_tables.rst
Normal file
|
@ -0,0 +1,472 @@
=====================
Smaller Vector Tables
=====================

.. warning::
   Migrated from:
   https://cwiki.apache.org/confluence/display/NUTTX/Smaller+Vector+Tables

One of the largest OS data structures is the vector table,
``g_irqvector[]``. This is the table that holds the vector
information when ``irq_attach()`` is called and that is used
by ``irq_dispatch()`` to dispatch interrupts. Recent changes
have made that table even larger. For 32-bit ARM, the
size of that table is given by:

.. code-block:: c

   nbytes = number_of_interrupts * (2 * sizeof(void *))

We will focus on the STM32 for this discussion to keep
things simple. However, this discussion applies to all
architectures.

The number of (physical) interrupt vectors supported by
the MCU hardware is given by the definition ``NR_IRQS``, which
is provided in a header file in ``arch/arm/include/stm32``.
This is, by default, the value of ``number_of_interrupts``
in the above equation.

For a 32-bit ARM like the STM32 with, say, 100 interrupt
vectors, this size would be 800 bytes of memory. That is
not a lot for high-end MCUs with a lot of RAM,
but could be a show stopper for MCUs with minimal RAM.

Two approaches for reducing the size of the vector tables
are described below. Both depend on the fact that not all
interrupts are used on a given MCU. Most of the time,
the majority of entries in ``g_irqvector[]`` are zero because
only a small number of interrupts are actually attached
and enabled by the application. If you know that certain
IRQ numbers are not going to be used, then it is possible
to filter those out and reduce the size to the number of
supported interrupts.

For example, if the actual number of interrupts used were
20, then the above requirement would go from 800 bytes to
160 bytes.
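The arithmetic above can be checked with a small helper. This is only an illustrative sketch: ``vector_table_bytes()`` is a hypothetical name, not a NuttX function, and the pointer size is passed in explicitly so that the 32-bit figures can be reproduced on any host.

```c
#include <stddef.h>

/* Bytes used by a vector table whose entries each hold two pointers
 * (the handler and its argument).  pointer_size is 4 on a 32-bit ARM.
 * Hypothetical helper for illustration only.
 */

size_t vector_table_bytes(size_t number_of_interrupts, size_t pointer_size)
{
  return number_of_interrupts * (2 * pointer_size);
}
```

With 4-byte pointers, 100 interrupts give 800 bytes and 20 interrupts give 160 bytes, matching the figures in the text.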
|
||||
|
||||
Software IRQ Remapping
======================

`[On March 3, 2017, support for this "Software IRQ Remapping"
was included in the NuttX repository.]`

One of the simplest ways of reducing the size of
``g_irqvector[]`` would be to remap the large set of physical
interrupt vectors into a much smaller set of interrupts that
are actually used. For the sake of discussion, let's
imagine two new configuration settings:

* ``CONFIG_ARCH_MINIMAL_VECTORTABLE``: Enables IRQ mapping
* ``CONFIG_ARCH_NUSER_INTERRUPTS``: The number of IRQs after mapping.

The OS could then allocate the interrupt vector table with
size ``CONFIG_ARCH_NUSER_INTERRUPTS`` instead of the much bigger
``NR_IRQS``:

.. code-block:: c

   #ifdef CONFIG_ARCH_MINIMAL_VECTORTABLE
   struct irq_info_s g_irqvector[CONFIG_ARCH_NUSER_INTERRUPTS];
   #else
   struct irq_info_s g_irqvector[NR_IRQS];
   #endif

The ``g_irqvector[]`` table is accessed in only three places:

``irq_attach()``
----------------

``irq_attach()`` receives the physical vector number along
with the information needed later to dispatch interrupts:

.. code-block:: c

   int irq_attach(int irq, xcpt_t isr, FAR void *arg);

Logic in ``irq_attach()`` would map the incoming physical
vector number to a table index like:

.. code-block:: c

   #ifdef CONFIG_ARCH_MINIMAL_VECTORTABLE
   int ndx = g_irqmap[irq];
   #else
   int ndx = irq;
   #endif

where ``g_irqmap[]`` is an array indexed by the physical
interrupt vector number and contains the new, mapped
interrupt vector table index. This array must be
provided by platform-specific code.

``irq_attach()`` would then use this index to set the
``g_irqvector[]`` entry:

.. code-block:: c

   g_irqvector[ndx].handler = isr;
   g_irqvector[ndx].arg     = arg;

``irq_dispatch()``
------------------

``irq_dispatch()`` is called by MCU logic when an interrupt is received:

.. code-block:: c

   void irq_dispatch(int irq, FAR void *context);

where, again, ``irq`` is the physical interrupt vector number.

``irq_dispatch()`` would do essentially the same thing as
``irq_attach()``. First it would map the irq number to
a table index:

.. code-block:: c

   #ifdef CONFIG_ARCH_MINIMAL_VECTORTABLE
   int ndx = g_irqmap[irq];
   #else
   int ndx = irq;
   #endif

Then it would dispatch the interrupt handling to the attached
interrupt handler. NOTE that the physical vector
number is passed to the handler, so the handler is completely
unaware of the underlying `shell` game:

.. code-block:: c

   vector = g_irqvector[ndx].handler;
   arg    = g_irqvector[ndx].arg;

   vector(irq, context, arg);
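The attach and dispatch steps described in this section can be put together in a small host-side sketch. Everything prefixed with ``sketch_`` or ``SKETCH_`` is hypothetical: the 8-vector MCU, the 3-entry table, the map contents, and the ``-1`` error returns are assumptions made for illustration and are not taken from the NuttX sources.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NR_IRQS       8     /* Physical vectors (hypothetical MCU) */
#define SKETCH_NUSER_IRQS    3     /* Entries kept after mapping */
#define SKETCH_IRQMAPPED_MAX 0xff  /* Out-of-range value: vector not mapped */

typedef int (*sketch_xcpt_t)(int irq, void *context, void *arg);

struct sketch_irq_info_s
{
  sketch_xcpt_t handler;
  void *arg;
};

struct sketch_irq_info_s g_sketch_irqvector[SKETCH_NUSER_IRQS];

/* Physical vectors 1, 4 and 6 map to table indices 0, 1 and 2. */

const uint8_t g_sketch_irqmap[SKETCH_NR_IRQS] =
{
  SKETCH_IRQMAPPED_MAX, 0, SKETCH_IRQMAPPED_MAX, SKETCH_IRQMAPPED_MAX,
  1, SKETCH_IRQMAPPED_MAX, 2, SKETCH_IRQMAPPED_MAX
};

/* Translate a physical vector number; returns -1 if it is not mapped. */

static int sketch_map_irq(int irq)
{
  if (irq < 0 || irq >= SKETCH_NR_IRQS ||
      g_sketch_irqmap[irq] >= SKETCH_NUSER_IRQS)
    {
      return -1;
    }

  return g_sketch_irqmap[irq];
}

int sketch_irq_attach(int irq, sketch_xcpt_t isr, void *arg)
{
  int ndx = sketch_map_irq(irq);
  if (ndx < 0)
    {
      return -1;
    }

  g_sketch_irqvector[ndx].handler = isr;
  g_sketch_irqvector[ndx].arg     = arg;
  return 0;
}

/* Look up the handler by mapped index, but pass it the physical vector
 * number so that the handler never sees the remapping.
 */

int sketch_irq_dispatch(int irq, void *context)
{
  int ndx = sketch_map_irq(irq);
  if (ndx < 0 || g_sketch_irqvector[ndx].handler == NULL)
    {
      return -1;  /* Stand-in for the unexpected-interrupt path */
    }

  return g_sketch_irqvector[ndx].handler(irq, context,
                                         g_sketch_irqvector[ndx].arg);
}

/* Example handler: records the IRQ number that it was given. */

int sketch_record_isr(int irq, void *context, void *arg)
{
  (void)context;
  *(int *)arg = irq;
  return 0;
}
```

Attaching ``sketch_record_isr`` to physical vector 4 and then dispatching vector 4 stores the value 4 in the handler's argument, even though the handler lives at table index 1. Attaching to an unmapped vector such as 2 is rejected.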
|
||||
|
||||
``irq_initialize()``
--------------------

``irq_initialize()`` simply sets the ``g_irqvector[]`` table
to a known state on power-up. It only has to account for
the difference in table sizes:

.. code-block:: c

   #ifdef CONFIG_ARCH_MINIMAL_VECTORTABLE
   #  define TAB_SIZE CONFIG_ARCH_NUSER_INTERRUPTS
   #else
   #  define TAB_SIZE NR_IRQS
   #endif

   /* Assumed loop body: mark every entry as unattached. */
   for (i = 0; i < TAB_SIZE; i++)
     {
       g_irqvector[i].handler = irq_unexpected_isr;
       g_irqvector[i].arg     = NULL;
     }

``g_irqmap[]``
--------------

An implementation of ``g_irqmap[]`` might be something like:

.. code-block:: c

   #include <nuttx/irq.h>

   const irq_mapped_t g_irqmap[NR_IRQS] =
   {
     /* ... IRQ to index mapping values ... */
   };

``g_irqmap[]`` is an array of mapped IRQ table indices. It
contains the mapped index value and is itself indexed
by the physical interrupt vector number. It provides
an ``irq_mapped_t`` value in the range of 0 to
``CONFIG_ARCH_NUSER_INTERRUPTS - 1`` that is the new, mapped
index into the vector table. Unsupported IRQs would
simply map to an out-of-range value like ``IRQMAPPED_MAX``.
So, for example, if ``g_irqmap[37] == 24``, then the hardware
interrupt vector 37 will be mapped to the interrupt vector
table at index 24. If ``g_irqmap[42] == IRQMAPPED_MAX``, then
hardware interrupt vector 42 is not used and, if it occurs,
will result in an unexpected interrupt crash.
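Because the map is written by hand in platform-specific code, it may be worth checking it once during bring-up. The following is a minimal sketch of such a check, not part of NuttX: ``sketch_irqmap_is_valid()`` and ``SKETCH_IRQMAPPED_MAX`` are hypothetical names, and the map is assumed to hold ``uint8_t`` entries.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_IRQMAPPED_MAX 0xff  /* Hypothetical "not mapped" marker */

/* Returns true if every mapped entry is a valid, unique index into a
 * vector table of nuser entries.  Entries equal to SKETCH_IRQMAPPED_MAX
 * are treated as unused vectors and skipped.
 */

bool sketch_irqmap_is_valid(const uint8_t *map, size_t nr_irqs, size_t nuser)
{
  for (size_t i = 0; i < nr_irqs; i++)
    {
      if (map[i] == SKETCH_IRQMAPPED_MAX)
        {
          continue;
        }

      if (map[i] >= nuser)
        {
          return false;  /* Index would overrun the vector table */
        }

      for (size_t j = 0; j < i; j++)
        {
          if (map[j] == map[i])
            {
              return false;  /* Two vectors share one table slot */
            }
        }
    }

  return true;
}
```

The quadratic duplicate scan is acceptable here because the check would run once at start-up or in a test build, never in the interrupt path.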
|
||||
|
||||
Hardware Vector Remapping
=========================

`[This technical approach is discussed here but is
discouraged because of technical "Complications" and
"Dubious Performance Improvements" discussed at the
end of this section.]`

Most ARMv7-M architectures support two mechanisms for handling interrupts:

* The so-called `common` vector handler logic enabled with
  ``CONFIG_ARMV7M_CMNVECTOR=y`` that can be found in
  ``arch/arm/src/armv7-m/``, and
* MCU-specific interrupt handling logic. For the
  STM32, this logic can be found at ``arch/arm/src/stm32/gnu/stm32_vectors.S``.

The `common` vector logic is slightly more efficient;
the MCU-specific logic is slightly more flexible.

If we don't use the `common` vector logic enabled with
``CONFIG_ARMV7M_CMNVECTOR=y``, but instead the more
flexible MCU-specific implementation, then we can
also use this to map the large set of hardware
interrupt vector numbers to a smaller set of software
interrupt numbers. This involves minimal changes to
the OS and does not require any magic software lookup
table. However, it is considerably more complex to implement.

This technical approach requires changes to three files:

* A new header file at ``arch/arm/include/stm32``, say
  ``xyz_irq.h`` for the purposes of this discussion.
  This new header file is like the other IRQ definition
  header files in that directory except that it
  defines only the IRQ numbers of the interrupts after
  remapping. So, instead of having the 100 IRQ number
  definitions of the original IRQ header file based on
  the physical vector numbers, this header file would
  define *only* the small set of 20 *mapped* IRQ numbers in
  the range from 0 through 19. It would also set ``NR_IRQS``
  to the value 20.
* A new header file at ``arch/arm/src/stm32/hardware``, say
  ``xyz_vector.h``. It would be similar to the other vector
  definition files in that directory: It will consist
  of a sequence of 100 ``VECTOR`` and ``UNUSED`` macros. It will
  define ``VECTOR`` entries for the 20 valid interrupts and
  80 ``UNUSED`` entries for the unused interrupt vector numbers.
  More about this below.
* Modification of the ``stm32_vectors.S`` file. These changes
  are trivial and involve only the conditional inclusion
  of the new, special ``xyz_vector.h`` header file.

**REVISIT**: This needs to be updated. Neither the ``xyz_vector.h``
files nor the ``stm32_vectors.S`` exist in the current realization.
This has all been replaced with the common vector handling at
``arch/arm/src/armv7-m``.
|
||||
|
||||
Vector Definitions
==================

In ``arch/arm/src/stm32/gnu/stm32_vectors.S``, notice that the
``xyz_vector.h`` file will be included twice. Before each
inclusion, the macros ``VECTOR`` and ``UNUSED`` are defined.

The first time that ``xyz_vector.h`` is included, it defines the
hardware vector table. The hardware vector table consists
of ``NR_IRQS`` 32-bit addresses in an array. This is
accomplished by setting:

.. code-block:: c

   #undef VECTOR
   #define VECTOR(l,i) .word l

   #undef UNUSED
   #define UNUSED(i) .word stm32_reserved

and then including ``xyz_vector.h``. So consider the following
definitions in the original file:

.. code-block:: c

   ...
   VECTOR(stm32_usart1, STM32_IRQ_USART1) /* Vector 16+37: USART1 global interrupt */
   VECTOR(stm32_usart2, STM32_IRQ_USART2) /* Vector 16+38: USART2 global interrupt */
   VECTOR(stm32_usart3, STM32_IRQ_USART3) /* Vector 16+39: USART3 global interrupt */
   ...

Suppose that we wanted to support only USART1 and that
we wanted the IRQ number for USART1 to be 12.
That would be accomplished in the ``xyz_vector.h`` header
file like this:

.. code-block:: c

   ...
   VECTOR(stm32_usart1, STM32_IRQ_USART1) /* Vector 16+37: USART1 global interrupt */
   UNUSED(0)                              /* Vector 16+38: USART2 global interrupt */
   UNUSED(0)                              /* Vector 16+39: USART3 global interrupt */
   ...

where the value of ``STM32_IRQ_USART1`` was defined to
be 12 in the ``arch/arm/include/stm32/xyz_irq.h`` header
file. When ``xyz_vector.h`` is included by ``stm32_vectors.S``
with the above definitions for ``VECTOR`` and ``UNUSED``, the
following would result:

.. code-block:: asm

   ...
   .word stm32_usart1
   .word stm32_reserved
   .word stm32_reserved
   ...

These are the settings for vectors 53, 54, and 55,
respectively. The entire vector table would be populated
in this way. ``stm32_reserved``, if called, would result in
an "unexpected ISR" crash. ``stm32_usart1``, if called, will
process the USART1 interrupt normally, as we will see below.

Interrupt Handler Definitions
-----------------------------

In the vector table, all of the valid vectors are set to
the address of a `handler` function. All unused vectors
are forced to vector to ``stm32_reserved``. Currently, only
vectors that are not supported by the hardware are
marked ``UNUSED``, but you can mark any vector ``UNUSED`` in
order to eliminate it.

The second time that ``xyz_vector.h`` is included by
``stm32_vectors.S``, the `handler` functions are generated.
Each of the valid vectors points to the matching handler
function. In this case, you do NOT have to provide
handlers for the ``UNUSED`` vectors, only for the used
``VECTOR`` vectors. All of the unused vectors will go
to the common ``stm32_reserved`` handler. The remaining
set of handlers is very sparse.

These are the values of the ``UNUSED`` and ``VECTOR`` macros
the second time that ``xyz_vector.h`` is included by ``stm32_vectors.S``:

.. code-block:: asm

       .macro  HANDLER, label, irqno
       .thumb_func
   \label:
       mov     r0, #\irqno
       b       exception_common
       .endm

   #undef VECTOR
   #define VECTOR(l,i) HANDLER l, i

   #undef UNUSED
   #define UNUSED(i)

In the above USART1 example, a single handler would be
generated that will provide the IRQ number 12. Remember
that 12 is the expansion of the macro ``STM32_IRQ_USART1``
that is provided in the ``arch/arm/include/stm32/xyz_irq.h``
header file:

.. code-block:: asm

       .thumb_func
   stm32_usart1:
       mov     r0, #12
       b       exception_common

Now, when vector 16+37 occurs, it is mapped to IRQ 12
with no significant software overhead.
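The double-inclusion trick is an instance of what C programmers often call an X-macro. As a rough analogy in plain C (not the assembly actually used in ``stm32_vectors.S``), the sketch below expands one list twice with different ``VECTOR`` and ``UNUSED`` definitions. All ``sketch_`` names and the three-entry list are hypothetical, and the handlers are generated before the table only because C, unlike the assembler, needs them declared first.

```c
#include <stddef.h>

/* Hypothetical three-entry vector list in the style of xyz_vector.h. */

#define SKETCH_VECTOR_LIST \
  VECTOR(sketch_usart1, 12) \
  UNUSED(0) \
  UNUSED(0)

typedef int (*sketch_handler_t)(void);

/* Stand-in for stm32_reserved, the "unexpected ISR" path. */

int sketch_reserved(void)
{
  return -1;
}

/* Expansion A: generate one handler per VECTOR and nothing for UNUSED.
 * Each generated handler just reports its mapped IRQ number.
 */

#define VECTOR(l, i) int l(void) { return (i); }
#define UNUSED(i)
SKETCH_VECTOR_LIST
#undef VECTOR
#undef UNUSED

/* Expansion B: emit the table with one slot per physical vector. */

#define VECTOR(l, i) l,
#define UNUSED(i)    sketch_reserved,
const sketch_handler_t g_sketch_vectors[] =
{
  SKETCH_VECTOR_LIST
};
#undef VECTOR
#undef UNUSED
```

The table keeps one slot per physical vector, but only the ``VECTOR`` entries cost a generated handler. Calling ``g_sketch_vectors[0]()`` yields the mapped IRQ number 12, while the other two slots share ``sketch_reserved``.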
|
||||
|
||||
A Complication
--------------

A complication in the above logic has been noted by David Sidrane:
When we access the NVIC in ``stm32_irq.c`` in order to enable
and disable interrupts, the logic requires the physical
vector number in order to select the NVIC register and
the bit(s) to modify in the NVIC register.

This could be handled with another small IRQ lookup table
(20 ``uint8_t`` entries in our example situation above). But
then this approach is not so much better than the `Software
IRQ Remapping` described above, which does not suffer from
this problem. Certainly, enabling/disabling interrupts is a
much lower-rate operation, and this at least does not put the
lookup in the critical interrupt path.
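A minimal sketch of that reverse lookup follows. It assumes the usual ARMv7-M layout in which vectors 0 through 15 are processor exceptions and each 32-bit NVIC enable register covers 32 external interrupts. The table contents and all ``sketch_`` names are hypothetical (a 3-entry table is used instead of 20 to keep it short), and no hardware register is touched.

```c
#include <stdint.h>

#define SKETCH_NMAPPED      3   /* Mapped IRQs in this example */
#define SKETCH_FIRST_EXTINT 16  /* ARMv7-M: vectors 0-15 are exceptions */

/* Reverse lookup: mapped IRQ number -> physical vector number.  The
 * values are made up except 53, the USART1 vector (16+37) from the text.
 */

const uint8_t g_sketch_physvector[SKETCH_NMAPPED] =
{
  22, 44, 53
};

/* Find which 32-bit NVIC enable register, and which bit within it,
 * controls a mapped IRQ.  Returns 0 on success, -1 if irq is out of range.
 */

int sketch_nvic_location(int irq, unsigned int *regndx, uint32_t *bitmask)
{
  unsigned int extint;

  if (irq < 0 || irq >= SKETCH_NMAPPED)
    {
      return -1;
    }

  extint   = g_sketch_physvector[irq] - SKETCH_FIRST_EXTINT;
  *regndx  = extint / 32;
  *bitmask = (uint32_t)1 << (extint % 32);
  return 0;
}
```

For the USART1 entry, external interrupt 37 lands in the second enable register (index 1) at bit 5.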
|
||||
|
||||
Another option suggested by David Sidrane is equally ugly:

* Don't change the ``arch/arm/include/stm32`` IRQ definition file.
* Instead, encode the IRQ number so that it has both
  the index and physical vector number:

.. code-block:: c

   ...
   VECTOR(stm32_usart1, STM32_IRQ_USART1 << 8 | STM32_INDEX_USART1)
   UNUSED(0)
   UNUSED(0)
   ...

``STM32_INDEX_USART1`` would have the value 12 and
``STM32_IRQ_USART1`` would be as before (53). This encoded
value would be received by ``irq_dispatch()``, which would
decode both the index and the physical vector number.
It would use the index to look up the entry in the ``g_irqvector[]``
table but would pass the physical vector number to the
interrupt handler as the IRQ number.
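The encoding and decoding amount to a shift and a mask. The macros below are a hypothetical sketch of that packing, assuming 8 bits are enough for each field. None of these names exist in NuttX.

```c
#include <stdint.h>

/* Hypothetical encoding: physical vector number in bits 8-15 and vector
 * table index in bits 0-7, mirroring IRQ << 8 | INDEX in the text.
 */

#define SKETCH_IRQ_ENCODE(irq, ndx)  ((uint32_t)(((irq) << 8) | (ndx)))

/* Decoding as irq_dispatch() would have to do it. */

#define SKETCH_IRQ_PHYSICAL(enc)     (((enc) >> 8) & 0xff)
#define SKETCH_IRQ_INDEX(enc)        ((enc) & 0xff)
```

For USART1, physical vector 53 and index 12 pack into ``0x350c``. The index selects the ``g_irqvector[]`` slot and the physical number is what the handler would see.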
|
||||
|
||||
A lookup would still be required in ``irq_attach()`` in
order to convert the physical vector number back to
an index (100 ``uint8_t`` entries in our example). So
some lookup is unavoidable.

Based on this analysis, my recommendation is that
we do not consider the second option any further. The
first option is cleaner, more portable, and generally
preferable.

Dubious Performance Improvements
--------------------------------

The intent of this second option was to provide a higher
performance mapping of physical interrupt vectors to IRQ
numbers compared to the pure software mapping of option 1. However,
in order to implement this approach, we had
to use the less efficient, non-common vector handling
logic. That logic is not terribly less efficient; the
cost is probably only a 16-bit load-immediate instruction
and a branch to another location in FLASH (which will cause
the CPU pipeline to be flushed).

The variant of option 2 where both the physical vector number
and vector table index are encoded would require even more
processing in ``irq_dispatch()`` in order to decode the
physical vector number and vector table index,
possibly just AND and SHIFT instructions.

However, the minimal cost of the first pure software
mapping approach was possibly as small as a single
indexed byte fetch from FLASH in ``irq_attach()``.
Indexing is, of course, essentially `free` in the ARM
ISA; the primary cost would be the FLASH memory access.
So my first assessment is that the performance of both
approaches is essentially the same. If anything, the
first approach is possibly the more performant if
implemented efficiently.

Both options would require some minor range checking in
``irq_attach()`` as well.

Because of this and because of the simplicity of the
first option, I see no reason to support or consider
this second option any further.

Complexity and Generalizability
-------------------------------

Option 2 is overly complex; it depends on a deep understanding
of how the MCU interrupt logic works and on a high level of
Thumb assembly language skills.

Another problem with option 2 is that it really only applies to
the Cortex-M family of processors and perhaps others that
support vectored interrupts in a similar fashion.
It is not a general solution that can be used with any CPU
architecture.

And even worse, the MCU-specific interrupt handling logic
that this support depends upon is very limited. As soon
as the common interrupt handler logic was added, I stopped
implementing the MCU-specific logic in all newer ARMv7-M
ports. So that MCU-specific interrupt handler logic is
only present for EFM32, Kinetis, LPC17, SAM3/4, STM32,
Tiva, and nothing else. Very limited!

These are further reasons why option 2 is not recommended and
will not be supported explicitly.