=====================
NuttX Protected Build
=====================

.. warning::
    Migrated from : 
    https://cwiki.apache.org/confluence/display/NUTTX/NuttX+Protected+Build

The Traditional "Flat" Build
============================

The traditional NuttX build is a "flat" build. By flat, I mean that when 
you build NuttX, you end up with a single "blob" called ``nuttx``. All of the 
components of the build reside in the same address space. All components 
of the build can access all other components of the build.

The "Two Pass" Protected Build
==============================

The NuttX protected build, on the other hand, is a "two-pass" build and 
generates two "blobs": (1) a separately compiled and linked `kernel` blob 
called, again, ``nuttx`` and (2) a separately compiled and linked `user` blob 
called ``nuttx_user.elf`` (in the existing build configurations). The user 
blob is created on pass 1 and the kernel blob is created on pass 2.

These two make commands are equivalent:

.. code-block:: bash

    make
    make pass1 pass2

But the second is clearer and I prefer to use it for the protected build. 
In the second case, the user and kernel blobs are built separately; in the 
first, the kernel and user blob builds may be intermixed and somewhat 
confusing. You can also build the kernel and user blobs separately with 
one of the following commands:

.. code-block:: bash

    make pass1
    make pass2

At the end of the build, there will be several files in the top-level NuttX build directory. From Pass 1:

* ``nuttx_user.elf``. The pass1 user-space ELF file
* ``nuttx_user.hex``. The pass1 Intel HEX format file (selected in ``defconfig``)
* ``User.map``. Symbols in the user-space ELF file

From Pass 2:

* ``nuttx``. The pass2 kernel-space ELF file
* ``nuttx.hex``. The pass2 Intel HEX file (selected in ``defconfig``)
* ``System.map``. Symbols in the kernel-space ELF file

The Memory Protection Unit
==========================

If the MCU supports a Memory Protection Unit (MPU), then the logic within 
the kernel blob all executes in kernel-mode, i.e., with all privileges. 
These privileged threads can access all memory, all CPU instructions, 
and all MCU registers. The logic executing within the user-mode blob, 
on the other hand, all executes in user-mode with certain restrictions 
as enforced by the MCU and by the MPU. The MCU may restrict access to 
certain registers and machine instructions; with the MPU, access to all 
kernel memory resources is prohibited from the user logic. This includes 
the kernel blob's FLASH, .bss/.data storage, and the kernel heap memory.

Advantages of the Protected Build
=================================

The advantages of such a protected build are (1) security and (2) 
modularity. Since the kernel resources are protected, it will be much 
less likely that a misbehaving task will crash the system or that a 
wild pointer access will corrupt critical memory. This security also 
provides a safer environment in which to execute 3rd party software 
and prevents "snooping" into the kernel memory from the hosted applications.

Modularity is assured because there is a strict control of the exposed 
kernel interfaces. In the flat build, all symbols are exposed and there 
is no enforcement of a kernel API. With the protected build, on the 
other hand, all interactions with the kernel from the user application 
logic must use `system calls` (or `syscalls`) to interface with the OS. A 
system call is necessary to transition from user-mode to kernel-mode; 
all user-space operating system interfaces are via syscall `proxies`. 
Then, while in kernel mode, the kernel system call handler will 
perform the OS service requested by the application. At the 
conclusion of system processing, user-privileges are restored 
and control is returned to the user application. Since the only 
interactions with the kernel can be through supported system calls, 
modularity of the OS is guaranteed.

User-Space Proxies/Kernel-Space Stubs
=====================================

The same OS interfaces are exposed to the application in both the "flat" 
build and the protected build. The difference is that in the protected 
build, the user-code interfaces with a `proxy` for the OS function. For 
example, here is what a proxy for the OS ``getpid()`` interface looks like:

.. code-block:: c

    #include <unistd.h>
    #include <syscall.h>

    /* User-space proxy: traps into the kernel to run the real getpid() */

    pid_t getpid(void)
    {
        return (pid_t)sys_call0(SYS_getpid);
    }

Thus the ``getpid()`` proxy is a stand-in for the real OS ``getpid()`` interface 
that executes a system call so the kernel code can perform the real 
``getpid()`` operation on behalf of the user application. Proxies are 
auto-generated for all exported OS interfaces using the CSV file 
``syscall/syscall.csv`` and the program ``tools/mksyscalls``. Similarly, 
on the kernel-side, there are auto-generated `stubs` that map the 
system calls back into real OS calls. These, however, are internal 
to the OS and the implementation may be architecture-specific. 
See the ``README.txt`` files in those directories for further information.
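
The proxy/stub pairing can be illustrated with a small, self-contained 
simulation. This is only a sketch under stated assumptions: the names 
``demo_getpid``, ``demo_syscall0``, ``g_stub_table``, ``stub_getpid`` and 
``kernel_getpid`` are hypothetical, the returned PID is a fixed placeholder, 
and the "trap" is modelled as an ordinary function call rather than the 
architecture-specific software interrupt that a real port would use.

.. code-block:: c

    #include <stdint.h>

    /* Hypothetical system call numbers (the real ones are generated) */

    enum
    {
        DEMO_SYS_getpid = 0,
        DEMO_SYS_maxsyscall
    };

    /* Kernel side: the "real" OS function. A fixed value stands in for
     * the PID of the calling task.
     */

    static int kernel_getpid(void)
    {
        return 42;
    }

    /* Kernel side: the stub maps the system call back to the OS function */

    static uintptr_t stub_getpid(void)
    {
        return (uintptr_t)kernel_getpid();
    }

    /* Kernel side: table of stubs indexed by system call number */

    typedef uintptr_t (*demo_stub_t)(void);

    static const demo_stub_t g_stub_table[DEMO_SYS_maxsyscall] =
    {
        stub_getpid   /* DEMO_SYS_getpid */
    };

    /* Stand-in for the trap into kernel mode. On real hardware this would
     * be a software interrupt; here it is a plain function call.
     */

    static uintptr_t demo_syscall0(unsigned int nbr)
    {
        if (nbr >= DEMO_SYS_maxsyscall)
        {
            return (uintptr_t)-1;   /* Unknown system call */
        }

        return g_stub_table[nbr]();
    }

    /* User side: the proxy has the same shape as the getpid() proxy above */

    int demo_getpid(void)
    {
        return (int)demo_syscall0(DEMO_SYS_getpid);
    }

The user code only ever sees ``demo_getpid()``; everything between the 
proxy and ``kernel_getpid()`` is the part that the syscall layer hides.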

Combining Intel HEX Files
=========================

One issue that you may face is that the two-pass build creates two 
FLASH images. Some debuggers that I use will allow me to write each 
image to FLASH separately. Others will expect to have a single Intel 
HEX image. In this latter case, you may need to combine the two Intel 
HEX files into one. Here is how you can do that:

1) The `tail` of the ``nuttx.hex`` file should look something like this 
   (with my comments and spaces added):

.. code-block:: bash

    $ tail nuttx.hex
    # 00, data records
    ...
    :10 9DC0 00 01000000000800006400020100001F0004
    :10 9DD0 00 3B005A0078009700B500D400F300110151
    :08 9DE0 00 30014E016D0100008D
    # 05, Start Linear Address Record
    :04 0000 05 0800 0419 D2
    # 01, End Of File record
    :00 0000 01 FF

Use an editor such as vi to remove the 05 and 01 records.

2) The `head` of the ``nuttx_user.hex`` file should look something like this 
   (again with my comments and spaces added):

.. code-block:: bash 

    $ head nuttx_user.hex
    # 04, Extended Linear Address Record
    :02 0000 04 0801 F1
    # 00, data records
    :10 8000 00 BD89 01084C800108C8110208D01102087E
    :10 8010 00 0010 00201C1000201C1000203C16002026
    :10 8020 00 4D80 01085D80010869800108ED83010829
    ...

Nothing needs to be done here. The ``nuttx_user.hex`` file should be fine.

3) Combine the edited ``nuttx.hex`` and the unedited ``nuttx_user.hex`` files 
   to produce a single combined hex file:

.. code-block:: bash

    $ cat nuttx.hex nuttx_user.hex >combined.hex

Then use the ``combined.hex`` file with your FLASH/JTAG tool. If you do this 
a lot, you will probably want to invest a little time to develop a tool 
to automate these steps.

Files and Directories
=====================

Here is a summary of directories and files used by the STM32F4Discovery 
protected build:

* ``boards/arm/stm32/stm32f4discovery/configs/kostest``. This is the kernel 
  mode OS test configuration. The two standard configuration files 
  can be found in this directory: (1) ``defconfig`` and (2) ``Make.defs``.
* ``boards/arm/stm32/stm32f4discovery/kernel``. This is the first pass 
  build directory. The Makefile in this directory is invoked to 
  produce the pass1 object (``nuttx_user.elf`` in this case). The 
  second pass object is created by ``arch/arm/src/Makefile``. Also 
  in this directory is the file ``userspace.c``. The user-mode blob 
  contains a header that includes information needed by the kernel 
  blob in order to interface with the user-code. That header is 
  defined by this file.
* ``boards/arm/stm32/stm32f4discovery/scripts``. Linker scripts for 
  the kernel mode build are found in this directory. This includes 
  (1) ``memory.ld`` which holds the common memory map, (2) ``user-space.ld`` 
  that is used for linking the pass1 user-mode blob, and (3) 
  ``kernel-space.ld`` that is used for linking the pass2 kernel-mode blob.
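
The role of the header at the start of the user-mode blob can be sketched 
as follows. The structure here is a simplified, hypothetical stand-in and 
not the real layout used by ``userspace.c``: the type ``demo_userspace_s``, 
its fields ``entry`` and ``allocate``, and the helper 
``kernel_read_header()`` are all invented for illustration. The point is 
only that the kernel finds what it needs in user space through one 
structure at a known address.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical, simplified header placed at the start of the user blob */

    struct demo_userspace_s
    {
        int   (*entry)(int argc, char *argv[]);   /* User entry point */
        void *(*allocate)(size_t size);           /* User-space allocator */
    };

    /* User side: the functions that the header advertises */

    static int demo_user_main(int argc, char *argv[])
    {
        (void)argv;
        return argc;
    }

    static void *demo_user_alloc(size_t size)
    {
        static uint8_t pool[64];   /* Toy allocator: one fixed buffer */

        return size <= sizeof(pool) ? pool : NULL;
    }

    /* User side: in a real build the linker script would place this object
     * at the address that the kernel has been configured to expect.
     */

    static const struct demo_userspace_s g_demo_userspace =
    {
        demo_user_main,
        demo_user_alloc
    };

    /* Kernel side: on the target this would be a cast of the configured
     * address; in this sketch the address of the object above is passed in.
     */

    static const struct demo_userspace_s *kernel_read_header(uintptr_t address)
    {
        return (const struct demo_userspace_s *)address;
    }

With this arrangement the kernel blob needs no link-time knowledge of the 
user blob's symbols; it only needs the two blobs to agree on where the 
header lives and how it is laid out.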

Alignment, Regions, and Subregions
==================================

There are some important comments in the ``memory.ld`` 
file that are worth duplicating here:

"The STM32F407VG has 1024Kb of FLASH beginning at address 
0x0800:0000 and 192Kb of SRAM. SRAM is split up into three blocks:

* "112KB of SRAM beginning at address 0x2000:0000
* "16KB of SRAM beginning at address 0x2001:c000
* "64KB of CCM SRAM beginning at address 0x1000:0000

"When booting from FLASH, FLASH memory is aliased to address 
0x0000:0000 where the code expects to begin execution by jumping 
to the entry point in the 0x0800:0000 address range.

"For MPU support, the kernel-mode NuttX section is assumed to 
be 128Kb of FLASH and 4Kb of SRAM. That is an excessive amount 
for the kernel which should fit into 64KB and, of course, can 
be optimized as needed... Allowing the additional memory does 
permit additional debug instrumentation to be added to the kernel 
space without overflowing the partition.

"Alignment of the user space FLASH partition is also a critical 
factor: The user space FLASH partition will be spanned with a 
single region of size ``2**n`` bytes. The alignment of the user-space 
region must be the same. As a consequence, as the user-space 
increases in size, the alignment requirement also increases.

"This alignment requirement means that the largest user space 
FLASH region you can have will be 512KB and it would have to be 
positioned at 0x08080000. If you change this address, don't 
forget to change the ``CONFIG_NUTTX_USERSPACE`` configuration 
setting to match and to modify the check in ``kernel/userspace.c``.

"For the same reasons, the maximum size of the SRAM mapping is 
limited to 4KB. Both of these alignment limitations could be 
reduced by using multiple MPU regions to map the FLASH/SRAM 
range or perhaps with some clever use of subregions."
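
The alignment rule described in these comments can be checked with a few 
lines of C. This is a sketch only: the helper names ``region_size()`` and 
``region_is_aligned()`` are hypothetical, and the 32-byte minimum region 
size is an assumption about the MPU rather than something taken from 
``memory.ld``.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    /* Smallest power-of-two region size (in bytes) that spans 'size' bytes */

    static uint32_t region_size(uint32_t size)
    {
        uint32_t region = 32;   /* Assumed minimum region size */

        /* The second test stops the shift before it can overflow */

        while (region < size && region < 0x80000000u)
        {
            region <<= 1;
        }

        return region;
    }

    /* A region must start on a multiple of its own power-of-two size */

    static bool region_is_aligned(uint32_t base, uint32_t size)
    {
        return (base & (region_size(size) - 1)) == 0;
    }

For example, a user-space partition that starts at 0x08020000 satisfies 
the rule for sizes up to 128KB but not for 256KB, which is why growing the 
user blob can force the partition to move to a more strictly aligned address.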

Memory Management
=================

At present, there are two options for memory management in the 
NuttX protected build:

Single User Heap
----------------

By default, there is only a single user-space heap and heap 
allocator that is shared by both kernel- and user-modes. 
PROs: Simple and makes good use of the heap memory space. 
CONs: Awkward architecture and no security for kernel-mode 
allocations.

Dual, Partitioned Heaps
-----------------------

Two configuration options can change this behavior:

* ``CONFIG_MM_MULTIHEAP=y``. This changes internal memory manager interfaces 
  so that multiple heaps can be supported.
* ``CONFIG_MM_KERNEL_HEAP=y``. Uses the multi-heap capability to enable 
  a kernel heap

If both of these options are defined, then two heap partitions and 
two copies of the memory allocators are built:

* One unprotected heap partition that will allocate user-accessible memory 
  that is shared by both the kernel- and user-space code. That allocator 
  physically resides in the user address space so that it can be called 
  directly by both the user- and kernel-space code. There is a header at 
  the beginning of the user-space blob; the kernel-space code gets the 
  address of the user-space allocator from this header.
* Another, protected heap partition that will allocate protected 
  memory that is only accessible from the kernel code. This allocator 
  is built into the kernel blob. This separate protected heap is 
  required if you want to support security features.

NOTE: There are security issues with calling into the user space 
allocators in kernel mode. That is a security hole that could be 
exploited to gain control of the system! Instead, the kernel code 
should switch to user mode before entering the memory allocator 
stubs (perhaps via a trap). The memory allocator stubs should 
then trap to return to kernel mode (as does the signal handler now).

The Traditional Approach
------------------------

A more traditional approach would use something like the interface 
``sbrk()``. The ``sbrk()`` function adds memory to the heap space 
allocation of the calling process. In this case, there would 
still be kernel- and user-mode instances of the memory allocators. 
Each would ``sbrk()`` as necessary to extend their heap; the pages 
allocated for the kernel-mode allocator would be protected but 
the pages allocated for the user-mode allocator would not. 
PROs: Meets all of the needs. CONs: Complex. Memory losses 
due to quantization.

This approach works well with CPUs that have very capable 
Memory Management Units (MMUs) that can coalesce the 
sbrk-ed chunks into a contiguous, `virtual` heap region. 
Without an MMU, the sbrk-ed memory would not be 
contiguous; this would limit the sizes of allocations 
due to the physical pages.

Many MCUs will have Memory Protection Units (MPUs) that can 
support the security features (only). However these lower 
end MPUs may not support sufficient mapping capability to 
support this traditional approach. The ARMv7-M MPU, for 
example, only supports eight protection regions to manage 
all FLASH and SRAM and so this approach would not be 
technically feasible for the ARMv7-M family (Cortex-M3/4).

Comparing the "Flat" Build Configuration with the Protected Build Configuration
===============================================================================

Compare, for example, the configuration 
``boards/arm/stm32/stm32f4discovery/configs/ostest`` and the 
configuration ``boards/arm/stm32/stm32f4discovery/configs/kostest``. 
These two configurations are identical except that one builds a 
"flat" version of OS test and the other builds a kernel version 
of the OS test. See the file ``boards/arm/stm32/stm32f4discovery/README.txt`` 
for more details about those configurations.

The configurations can be compared using the ``cmpconfig`` tool:

.. code-block:: bash

    cd tools
    make -f Makefile.host cmpconfig
    cd ..
    tools/cmpconfig boards/arm/stm32/stm32f4discovery/configs/ostest/defconfig boards/arm/stm32/stm32f4discovery/configs/kostest/defconfig

Here is a summary of the meaning of all of the important differences in the 
configurations. This should be enough information for you to convert any 
configuration from a "flat" to a protected build:

* ``CONFIG_BUILD_2PASS=y``. This enables the two pass build.
* ``CONFIG_BUILD_PROTECTED=y``. This option enables the "two pass" 
  protected build.
* ``CONFIG_PASS1_BUILDIR="boards/arm/stm32/stm32f4discovery/kernel"``. 
  This tells the build system the (relative) location of the pass1 build directory.
* ``CONFIG_PASS1_OBJECT=""``. In some "two pass" build configurations, 
  the build system needs to know the name of the first pass object. 
  This setting is not used for the protected build.
* ``CONFIG_NUTTX_USERSPACE=0x08020000``. This is the expected location 
  where the user-mode blob will be located. The user-mode blob 
  contains a header that includes information needed by the kernel 
  blob in order to interface with the user-code. That header will 
  be expected to reside at this location.
* ``CONFIG_PASS1_TARGET="all"``. This is the build target to use for 
  invoking the pass1 make.
* ``CONFIG_MM_MULTIHEAP=y``. This changes internal memory manager 
  interfaces so that multiple heaps can be supported.
* ``CONFIG_MM_KERNEL_HEAP=y``. NuttX supports the option of using a 
  single user-accessible heap or, if this option is defined, 
  two heaps: (1) one that will allocate user-accessible memory 
  that is shared by both the kernel- and user-space code, and 
  (2) one that will allocate protected memory that is only 
  accessible from the kernel code. Separate heap memory is required 
  if you want to support security features.
* ``CONFIG_MM_KERNEL_HEAPSIZE=8192``. This determines an approximate 
  size for the kernel heap. The standard heap space is partitioned 
  into a kernel- and user-heap space. The size of the kernel heap 
  is only approximate because the user heap is subject to stringent 
  alignment requirements. Because of the alignment requirements, the 
  actual size of the kernel heap could be considerably larger than this.
* ``CONFIG_BOARD_EARLY_INITIALIZE=y``. This setting enables a special, 
  `early` initialization call to initialize board-specific resources.
* ``CONFIG_BOARD_LATE_INITIALIZE=y``. This setting enables a special 
  initialization call to initialize `late` board-specific resources. 
  The difference between ``CONFIG_BOARD_EARLY_INITIALIZE`` and 
  ``CONFIG_BOARD_LATE_INITIALIZE`` is that the ``CONFIG_BOARD_EARLY_INITIALIZE`` 
  logic runs earlier in initialization before the full operating 
  system is up and running. ``CONFIG_BOARD_LATE_INITIALIZE``, on the 
  other hand, runs at the completion of initialization, just before 
  the user applications are started. Neither ``CONFIG_BOARD_EARLY_INITIALIZE`` 
  nor ``CONFIG_BOARD_LATE_INITIALIZE`` is used in the OS test 
  configuration, but other configurations (such as NSH) 
  require some application-specific initialization before 
  the application can run. In the "flat" build, such initialization 
  is performed as part of the application start-up sequence. 
  This includes such things as initializing device drivers. 
  These same initialization steps must be performed in kernel 
  mode for the protected build, and ``CONFIG_BOARD_LATE_INITIALIZE`` 
  provides the hook for doing so. 
  See ``boards/arm/stm32/stm32f4discovery/src/up_boot.c`` for an 
  example of such board initialization code.
* ``CONFIG_NSH_ARCHINITIALIZE`` is not defined. The setting 
  ``CONFIG_NSH_ARCHINITIALIZE`` does not apply to the OS test 
  configuration; it is noted here as an example 
  of initialization that cannot be performed in the protected build.

Architecture-Specific Options:

* ``CONFIG_SYS_RESERVED=8``. The user application logic 
  interfaces with the kernel blob using system calls. 
  The architecture-specific logic may need to reserve a 
  few system calls for its own internal use. The ARMv7-M 
  architectures all require 8 reserved system calls.
* ``CONFIG_SYS_NNEST=2``. System calls may be nested. The 
  system must retain information about each nested system 
  call and this setting is used to set aside resources for 
  nested system calls. In the current architecture, a maximum 
  nesting level of two is all that is needed.
* ``CONFIG_ARMV7M_MPU=y``. This setting enables support for 
  the ARMv7-M Memory Protection Unit (MPU). The MPU is used 
  to prohibit user-mode access to kernel resources.
* ``CONFIG_ARMV7M_MPU_NREGIONS=8``. The ARMv7-M MPU supports 8 
  protection regions.

Size Expansion
==============

The protected build will, of course, result in a FLASH image that is 
larger than that of the corresponding "flat" build. How much larger? 
I don't have the numbers in hand, but you can build 
``boards/arm/stm32/stm32f4discovery/configs/ostest`` and 
``boards/arm/stm32/stm32f4discovery/configs/kostest`` and compare 
the resulting binaries for yourself using the ``size`` command.

Increases in size are expected because:

* The syscall layer is included in the protected build but not the flat 
  build.
* The kernel-side `syscall` stubs will cause all enabled OS code to be 
  drawn into the build. In the flat build, only those OS interfaces 
  actually called by the application will be included in the final objects.
* The dual memory allocators will increase size.
* Code duplication. Some code, such as the C library, will be 
  duplicated in both the kernel- and user-blobs.
* Alignment. The alignments required by the MPU logic will leave 
  relatively large regions of FLASH (and perhaps RAM) unusable.

Performance Issues
==================

The only performance differences using the protected build should 
result as a consequence of the `syscalls` used to interact with the 
OS vs. the direct C calls as used in the flat build. If your 
performance is highly dependent upon high-rate OS calls, then 
this could be an issue for you. But, in the typical application, 
OS calls do not often figure into the critical performance paths.

The `syscalls` are, ultimately, software interrupts. If the platform 
does not support prioritized, nested interrupts then the `syscall` 
execution could also delay other hardware interrupt processing. 
However, the `syscall` processing overhead is negligible: the handler 
really just configures the return into supervisor mode and vectors to 
the `syscall` stub. This should be lightning fast and, for typical 
real-time applications, should cause no issues.