nuttx-mirror/Documentation/guides/protected_build.rst

=====================
NuttX Protected Build
=====================

.. warning::
    Migrated from :
    https://cwiki.apache.org/confluence/display/NUTTX/NuttX+Protected+Build

The Traditional "Flat" Build
============================

The traditional NuttX build is a "flat" build. By flat, I mean that when
you build NuttX, you end up with a single "blob" called ``nuttx``. All of the
components of the build reside in the same address space. All components
of the build can access all other components of the build.

The "Two Pass" Protected Build
==============================

The NuttX protected build, on the other hand, is a "two-pass" build and
generates two "blobs": (1) a separately compiled and linked `kernel` blob
called, again, `nuttx` and separately compiled and linked `user` blob called
in ``nuttx_user.elf`` (in the existing build configurations). The user blob
is created on pass 1 and the kernel blob is created on pass2.

These two make commands are identical:

.. code-block:: bash

    make
    make pass1 pass2

But the second is clearer and I prefer to use it for the protected build.
In the second case, the user and kernel blobs are built separately; in the
first, the kernel and user blob builds may be intermixed and somewhat
confusing. You can also build the kernel and user blobs separately with
one of the following commands:

.. code-block:: bash

    make pass1
    make pass2

At the end of the build, there will be several files in the top-level NuttX build directory. From Pass 1:

* ``nuttx_user.elf``. The pass1 user-space ELF file
* ``nuttx_user.hex``. The pass1 Intel HEX format file (selected in ``defconfig``)
* ``User.map``. Symbols in the user-space ELF file

From Pass 2:

* ``nuttx``. The pass2 kernel-space ELF file
* ``nuttx.hex``. The pass2 Intel HEX file (selected in ``defconfig``)
* ``System.map``. Symbols in the kernel-space ELF file

The Memory Protection Unit
==========================

If the MCU supports a Memory Protection Unit (MPU), then the logic within
the kernel blob all execute in kernel-mode, i.e., with all privileges.
These privileged threads can access all memory, all CPU instructions,
and all MCU registers. The logic executing within the user-mode blob,
on the other hand, all execute in user-mode with certain restrictions
as enforced by the MCU and by the MPU. The MCU may restrict access to
certain registers and machine instructions; with the MPU, access to all
kernel memory resources are prohibited from the user logic. This includes
the kernel blob's FLASH, .bss/.data storage, and the kernel heap memory.

Advantages of the Protected Build
=================================

The advantages of such a protected build are (1) security and (2)
modularity. Since the kernel resources are protected, it will be much
less likely that a misbehaving task will crash the system or that a
wild pointer access will corrupt critical memory. This security also
provides a safer environment in which to execute 3rd party software
and prevents "snooping" into the kernel memory from the hosted applications.

Modularity is assured because there is a strict control of the exposed
kernel interfaces. In the flat build, all symbols are exposed and there
is no enforcement of a kernel API. With the protected build, on the
other hand, all interactions with the kernel from the user application
logic must use `system calls` (or `syscalls`) to interface with the OS. A
system call is necessary to transition from user-mode to kernel-mode;
all user-space operating system interfaces are via syscall `proxies`.
Then, while in kernel mode, the kernel system call handler will
perform the OS service requested by the application. At the
conclusion of system processing, user-privileges are restored
and control is return to the user application. Since the only
interactions with the kernel can be through support system calls,
modularity of the OS is guaranteed.

User-Space Proxies/Kernel-Space Stubs
=====================================

The same OS interfaces are exposed to the application in both the "flat"
build and the protected build. The difference is that in the protected
build, the user-code interfaces with a `proxy` for the OS function. For
example, here is what a proxy for the OS ``getpid()`` interface:

.. code-block:: c

    #include <unistd.h>
    #include <syscall.h>
    pid_t getpid(void)
    {
        return (pid_t)sys_call0(SYS_getpid);
    }

Thus the ``getpid()`` proxy is a stand-in for the real OS ``getpid()`` interface
that executes a system call so the kernel code can perform the real
``getpid()`` operation on behalf of the user application. Proxies are
auto-generated for all exported OS interfaces using the CSV file
``syscall/syscall.csv`` and the program ``tools/mksyscalls``. Similarly,
on the kernel-side, there are auto-generated `stubs` that map the
system calls back into real OS calls. These, however, are internal
to the OS and the implementation may be architecture-specific.
See the ``README.txt`` files in those directories for further information.

Combining Intel HEX Files
=========================

One issue that you may face is that the two pass builds creates two
FLASH images. Some debuggers that I use will allow me to write each
image to FLASH separately. Others will expect to have a single Intel
HEX image. In this latter case, you may need to combine the two Intel
HEX files into one. Here is how you can do that:

1) The `tail` of the ``nuttx.hex`` file should look something like this
   (with my comments and spaces added):

.. code-block:: bash

    $ tail nuttx.hex
    # 00, data records
    ...
    :10 9DC0 00 01000000000800006400020100001F0004
    :10 9DD0 00 3B005A0078009700B500D400F300110151
    :08 9DE0 00 30014E016D0100008D
    # 05, Start Linear Address Record
    :04 0000 05 0800 0419 D2
    # 01, End Of File record
    :00 0000 01 FF

Use an editor such as vi to remove the 05 and 01 records.

2) The `head` of the ``nuttx_user.hex`` file should look something like this
   (again with my comments and spaces added):

.. code-block:: bash

    $ head nuttx_user.hex
    # 04, Extended Linear Address Record
    :02 0000 04 0801 F1
    # 00, data records
    :10 8000 00 BD89 01084C800108C8110208D01102087E
    :10 8010 00 0010 00201C1000201C1000203C16002026
    :10 8020 00 4D80 01085D80010869800108ED83010829
    ...

Nothing needs to be done here. The ``nuttx_user.hex`` file should be fine.

3) Combine the edited nuttx.hex and un-edited ``nuttx_user.hex`` file to produce
   a single combined hex file:

.. code-block:: bash

    $ cat nuttx.hex nuttx_user.hex >combined.hex

Then use the ``combined.hex`` file with for FLASH/JTAG tool. If you do this
a lot, you will probably want to invest a little time to develop a tool
to automate these steps.

Files and Directories
=====================

Here is a summary of directories and files used by the STM32F4Discovery
protected build:

* ``boards/arm/stm32/stm32f4discovery/configs/kostest``. This is the kernel
  mode OS test configuration. The two standard configuration files
  can be found in this directory: (1) ``defconfig`` and (2) ``Make.defs``.
* ``boards/arm/stm32/stm32f4discovery/kernel``. This is the first past
  build directory. The Makefile in this directory is invoked to
  produce the pass1 object (``nuttx_user.elf`` in this case). The
  second pass object is created by ``arch/arm/src/Makefile``. Also
  in this directory is the file ``userspace.c``. The user-mode blob
  contains a header that includes information need by the kernel
  blob in order to interface with the user-code. That header is
  defined in by this file.
* ``boards/arm/stm32/stm32f4discovery/scripts``. Linker scripts for
  the kernel mode build are found in this directory. This includes
  (1) ``memory.ld`` which hold the common memory map, (2) ``user-space.ld``
  that is used for linking the pass1 user-mode blob, and (3)
  ``kernel-space.ld`` that is used for linking the pass1 kernel-mode blob.

Alignment, Regions, and Subregions
==================================

There are some important comments in the ``memory.ld``
file that are worth duplicating here:

"The STM32F407VG has 1024Kb of FLASH beginning at address
0x0800:0000 and 192Kb of SRAM. SRAM is split up into three blocks:

* "112KB of SRAM beginning at address 0x2000:0000
* "16KB of SRAM beginning at address 0x2001:c000
* "64KB of CCM SRAM beginning at address 0x1000:0000

"When booting from FLASH, FLASH memory is aliased to address
0x0000:0000 where the code expects to begin execution by jumping
to the entry point in the 0x0800:0000 address range.

"For MPU support, the kernel-mode NuttX section is assumed to
be 128Kb of FLASH and 4Kb of SRAM. That is an excessive amount
for the kernel which should fit into 64KB and, of course, can
be optimized as needed... Allowing the additional memory does
permit addition debug instrumentation to be added to the kernel
space without overflowing the partition.

"Alignment of the user space FLASH partition is also a critical
factor: The user space FLASH partition will be spanned with a
single region of size 2||n bytes. The alignment of the user-space
region must be the same. As a consequence, as the user-space
increases in size, the alignment requirement also increases.

"This alignment requirement means that the largest user space
FLASH region you can have will be 512KB at it would have to be
positioned at 0x08800000. If you change this address, don't
forget to change the ``CONFIG_NUTTX_USERSPACE`` configuration
setting to match and to modify the check in ``kernel/userspace.c``.

"For the same reasons, the maximum size of the SRAM mapping is
limited to 4KB. Both of these alignment limitations could be
reduced by using multiple MPU regions to map the FLASH/SDRAM
range or perhaps with some clever use of subregions."

Memory Management
=================

At present, there are two options for memory management in the
NuttX protected build:

Single User Heap
----------------

By default, there is only a single user-space heap and heap
allocator that is shared by both kernel- and user-modes.
PROs: Simple and makes good use of the heap memory space,
CONs: Awkward architecture and no security for kernel-mode
allocations.

Dual, Partitioned Heaps
-----------------------

Two configuration options can change this behavior:

* ``CONFIG_MM_MULTIHEAP=y``. This changes internal memory manager interfaces
  so that multiple heaps can be supported.
* ``CONFIG_MM_KERNEL_HEAP=y``. Uses the multi-heap capability to enable
  a kernel heap

If this both options are defined defined, the two heap partitions and
two copies of the memory allocators are built:

One un-protected heap partition that will allocate user accessible memory
that is shared by both the kernel- and user-space code. That allocator
physically resides in the user address space so that it can be called
directly by both the user- and kernel-space code. There is a header at
the beginning of the user-space blob; the kernel-space code gets
address of the user-space allocator from this header.

And another protected heap partition that will allocate protected
memory that is only accessible from the kernel code. This allocator
is built into the kernel block. This separate protected heap is
required if you want to support security features.

NOTE: There are security issues with calling into the user space
allocators in kernel mode. That is a security hole that could be
exploit to gain control of the system! Instead, the kernel code
should switch to user mode before entering the memory allocator
stubs (perhaps via a trap). The memory allocator stubs should
then trap to return to kernel mode (as does the signal handler now).

The Traditional Approach
------------------------

A more traditional approach would use something like the interface
``sbrk()``. The ``sbrk()`` function adds memory to the heap space
allocation of the calling process. In this case, there would
still be kernel- and user-mode instances of the memory allocators.
Each would ``sbrk()`` as necessary to extend their heap; the pages
allocated for the kernel-mode allocator would be protected but
the pages allocated for the user-mode allocator would not.
PROs: Meets all of the needs. CONs: Complex. Memory losses
due to quantization.

This approach works well with CPUs that have very capable
Memory Management Units (MMUs) that can coalesce the
srbk-ed chunks to a contiguous, `virtual` heap region.
Without an MMU, the sbrk-ed memory would not be
contiguous; this would limit the sizes of allocations
due to the physical pages.

Many MCUs will have Memory Protection Units (MPUs) that can
support the security features (only). However these lower
end MPUs may not support sufficient mapping capability to
support this traditional approach. The ARMv7-M MPU, for
example, only supports eight protection regions to manage
all FLASH and SRAM and so this approach would not be
technically feasible for th ARMv7-M family (Cortex-M3/4).

Comparing the "Flat" Build Configuration with the Protected Build Configuration
===============================================================================

Compare, for example the configuration
``boards/arm/stm32/stm32f4discovery/configs/ostest`` and the
configuration ``boards/arm/stm32/stm32f4discovery/configs/kostest``.
These two configurations are identical except that one builds a
"flat" version of OS test and the other builds a kernel version
of the OS test. See the file ``boards/arm/stm32/stm32f4discovery/README.txt``
for more details about those configurations.

The configurations can be compared using the ``cmpconfig`` tool:

.. code-block:: bash

    cd tools
    make -f Makefile.host cmpconfig
    cd ..
    tools/cmpconfig boards/arm/stm32/stm32f4discovery/configs/ostest/defconfig boards/arm/stm32/stm32f4discovery/configs/kostest/defconfig

Here is a summary of the meaning of all of the important differences in the
configurations. This should be enough information for you to convert any
configuration from a "flat" to a protected build:

* ``CONFIG_BUILD_2PASS=y``. This enables the two pass build.
* ``CONFIG_BUILD_PROTECTED=y``. This option enables the "two pass"
  protected build.
* ``CONFIG_PASS1_BUILDIR="boards/arm/stm32/stm32f4discovery/kernel"``.
  This tells the build system the (relative) location of the pass1 build directory.
* ``CONFIG_PASS1_OBJECT=""``. In some "two pass" build configurations,
  the build system need to know the name of the first pass object.
  This setting is not used for the protected build.
* ``CONFIG_NUTTX_USERSPACE=0x08020000``. This is the expected location
  where the user-mode blob will be located. The user-mode blob
  contains a header that includes information need by the kernel
  blob in order to interface with the user-code. That header will
  be expected to reside at this location.
* ``CONFIG_PASS1_TARGET="all"``. This is the build target to use for
  invoking the pass1 make.
* ``CONFIG_MM_MULTIHEAP=y``. This changes internal memory manager
  interfaces so that multiple heaps can be supported.
* ``CONFIG_MM_KERNEL_HEAP=y``. NuttX supports the option of using a
  single user-accessible heap or, if this options is defined,
  two heaps: (1) one that will allocate user accessible memory
  that is shared by both the kernel- and user-space code, and
  (2) one that will allocate protected memory that is only
  accessible from the kernel code. Separate heap memory is required
  if you want to support security features.
* ``CONFIG_MM_KERNEL_HEAPSIZE=8192``. This determines an approximate
  size for the kernel heap. The standard heap space is partitioned
  into a kernel- and user-heap space. This size of the kernel heap
  is only approximate because the user heap is subject to stringent
  alignment requirements. Because of the alignment requirements, the
  actual size of the kernel heap could be considerable larger than this.
* ``CONFIG_BOARD_EARLY_INITIALIZE=y``. This setting enables a special,
  `early` initialization call to initialize board-specific resources.
* ``CONFIG_BOARD_LATE_INITIALIZE=y``. This setting enables a special
  initialization call to initialize `late` board-specific resources.
  The difference between ``CONFIG_BOARD_EARLY_INITIALIZE`` and
  ``CONFIG_BOARD_LATE_INITIALIZE`` is that the ``CONFIG_BOARD_EARLY_INITIALIZE``
  logic runs earlier in initialization before the full operating
  system is up and running. ``CONFIG_BOARD_LATE_INITIALIZE``, on the
  other hand, runs at the completion of initialization, just before
  the user applications are started. Neither ``CONFIG_BOARD_EARLY_INITIALIZE``
  nor ``CONFIG_BOARD_LATE_INITIALIZE`` are used in the OS test
  configuration but other configurations (such as NSH)
  require some application-specific initialization before
  the application can run. In the "flat" build, such initialization
  is performed as part of the application start-up sequence.
  These includes such things as initializing device drivers.
  These same initialization steps must be performed in kernel
  mode for the protected build and ``CONFIG_BOARD_LATE_INITIALIZE``.
  See ``boards/arm/stm32/stm32f4discovery/src/up_boot.c`` for an
  example of such board initialization code.
* ``CONFIG_NSH_ARCHINITIALIZE`` is not defined. The setting
  ``CONFIG_NSH_ARCHINITIALIZE`` does not apply to the OS test
  configuration, however, this is noted here as an example
  of initialization that cannot be performed in the protected build.

Architecture-Specific Options:

* ``CONFIG_SYS_RESERVED=8``. The user application logic
  interfaces with the kernel blob using system calls.
  The architecture-specific logic may need to reserved a
  few system calls for its own internal use. The ARMv7-M
  architectures all require 8 reserved system calls.
* ``CONFIG_SYS_NNEST=2``. System calls may be nested. The
  system must retain information about each nested system
  call and this setting is used to set aside resources for
  nested system calls. In the current architecture, a maximum
  nesting level of two is all that is needed.
* ``CONFIG_ARMV7M_MPU=y``. This settings enables support for
  the ARMv7-M Memory Protection Unit (MPU). The MPU is used
  to prohibit user-mode access to kernel resources.
* ``CONFIG_ARMV7M_MPU_NREGIONS=8``. The ARMv7-M MPU supports 8
  protection regions.

Size Expansion
==============

The protected build will, or course, result in a FLASH image that is
larger than that of the corresponding "flat" build. How much larger?
I don't have the numbers in hand, but you can build
``boards/arm/stm32/stm32f4discovery/configs/nsh`` and
``boards/arm/stm32/stm32f4discovery/configs/kostest`` and compare
the resulting binaries for yourself using the ``size`` command.

Increases in size are expected because:

* The syscall layer is included in the protected build but not the flat
  build.
* The kernel-size _syscal_l stubs will cause all enabled OS code to be
  drawn into the build. In the flat build, only those OS interfaces
  actually called by the application will be included in the final objects.
* The dual memory allocators will increase size.
* Code duplication. Some code, such as the C library, will be
  duplicated in both the kernel- and user-blobs, and
* Alignment. The alignments required by the MPU logic will leave
  relatively large regions of FLASH (and perhaps RAM) is not usable.

Performance Issues
==================

The only performance differences using the protected build should
result as a consequence of the `sycalls` used to interact with the
OS vs. the direct C calls as used in the flat build. If your
performance is highly dependent upon high rate OS calls, then
this could be an issue for you. But, in the typical application,
OS calls do not often figure into the critical performance paths.

The `syscalls` are, ultimately, software interrupts. If the platform
does not support prioritized, nested interrupts then the `syscall`
execution could also delay other hardware interrupt processing.
However, `sycall` processing is negligible: they really just
configure to return to in supervisor mode and vector to the
`syscall` stub. They should be lightning fast and, for the typical
real-time applications, should cause no issues.