diff --git a/Documentation/guides/port.rst b/Documentation/guides/port.rst index 377edbeaa4..32c8086e2a 100644 --- a/Documentation/guides/port.rst +++ b/Documentation/guides/port.rst @@ -66,10 +66,14 @@ After that, try follwoing procedure. | | | count the time accurately or not. | +------+---------------+--------------------------------------------------------------------+ -The result of porting procedure +Porting Case Studies =============================== -Although these depend on specific kernel version. +These porting guides depend on specific kernel versions, as some code structures have changed over time. They will still +provide a general idea on how to port. .. toctree:: - port_arm_cm4.rst + :glob: + :maxdepth: 1 + + porting-case-studies/* diff --git a/Documentation/guides/porting-case-studies/bcm2711-rpi4b.rst b/Documentation/guides/porting-case-studies/bcm2711-rpi4b.rst new file mode 100644 index 0000000000..376eb68d77 --- /dev/null +++ b/Documentation/guides/porting-case-studies/bcm2711-rpi4b.rst @@ -0,0 +1,382 @@ +Porting to the BCM2711 (Raspberry Pi 4B) +======================================== + +This port was completed for the 12.7.0 version of the NuttX kernel, and was contributed by Matteo Golin. + +The pull request with this initial support can be found at `apache/nuttx/pull/15188 +`_. + +The port required support to be written for a new chip (the BCM2711) and a new board. Matteo created journal entries +while working on the initial port, which can be found `on his blog +`_. The details below are a more concise summary of the +porting process. + +Researching +----------- + +The first step to porting a board to NuttX was researching the board and how NuttX works. + +The BCM2711 is a quad-core ARM Cortex A72 based SoC, and it supports both aarch64 and 32 bit ARM architectures. I +focused on the aarch64 implementation only in this port. My first step was determining other boards already in the NuttX +kernel that used the aarch64 architecture, because that gives me a starting point to porting this new chip and board. + +I primarily used the blog posts written by Lup Yuen Lee about porting NuttX to the PinePhone, another ARM Cortex-A based +device. The articles are listed `here `_. Lup's articles provided me with an +understanding of the NuttX boot process, as well as which files from the aarch64 support on NuttX were pulled into the +build process for booting. He also showed how he created an initial UART driver using the NuttX structure for UART +drivers, which allowed him to get NSH appearing in the console. + +Finally, I also of course needed the BCM2711 datasheet in order to figure out which registers were available to me for +creating peripheral drivers. The BCM2711 datasheet isn't exceptionally detailed on many of the features on the SoC, but +it did provide enough detail to set up interrupts and get UART working. + +Adding to the source tree +------------------------- + +In order to build my code with the NuttX build system, I would have to add the board and the BCM2711 chip to the source +tree for NuttX. This way, it would appear as an available configuration via the ``tools/configure.sh`` script and I +could select options for it with ``make menuconfig``. + +The first thing to do was to add the chip, which goes under the ``arch/arm64`` directory because it is an ARM 64 bit +SoC. The chip directory must be added in two places: ``arch/arm64/include/bcm2711`` and ``arch/arm64/src/bcm2711``. C +files go in the ``src`` directory with some header files, and some specific header files go in the ``include`` +directory. + +In addition, in order to make the BCM2711 visible as a supported chip, I had to add it as an option in +``arch/arm64/Kconfig``. In order to do this, I just copy-pasted the entry for the Allwinner A64, since the two chips +were very similar. I had to change a few fields (for instance, selecting ``ARCH_CORTEX_A72`` instead of +``ARCH_CORTEX_A53``), but this was relatively simple to complete with the information about the SoC. I also needed to +specify ``ARMV8A_HAVE_GICv2``, since that is the interrupt controller used by the BCM2711. ``ARCH_HAVE_MULTICPU`` +because it is a quad-core, and ``ARCH_USE_MMU`` because it has a memory management unit. + +I also needed to now add the Raspberry Pi 4B board to the source tree. To do this, I copied the board folder for the +PinePhone (``boards/arm64/a64/pinephone``) and renamed it ``raspberrypi-4b``. I also deleted many of the files in this +folder since they weren't applicable to the Pi 4B, and substituted all mentions of the PinePhone with the Raspberry Pi +4B (in path names and header include guards). + +I then added the Pi 4B to the list of supported boards in ``boards/Kconfig``. For this, I just needed to create an entry +with the name ``ARCH_BOARD_RASPBERRYPI_4B`` and write that it depends on the ``ARCH_CHIP_BCM2711``. No additional +options necessary! In two other places in this file I also had to add some directives to make sure the Kconfig for the +board was found properly. These set ``ARCH_BOARD`` to the name of the board directory "raspberrypi-4b" when the Pi 4B was +selected, and ``source``'d the Kconfig under ``boards/arm64/bcm2711/raspberrypi-4b`` when selected. + +The default configuration for this board was copied from the PinePhone's NSH configuration, which I modified to use the +correct board name, chip, and hardware specific settings. It was still incomplete because there was no code to actually +boot into NSH, but it was a starting point. + +This was basically all I needed for the board to show up as a possible configuration in the source tree! + +Mapping out the chip +-------------------- + +To start writing code for the BCM2711, I needed to map out the chip. This included the register addresses and the memory +mapping, which could all be found in the BCM2711 datasheet. From looking at other implementations, the register +addresses are usually defined as C macros and kept in header files under ``arch//src//hardware``. +This is where I put them as well, defining all the register mappings the different groups within individual files (i.e. +``bmc2711_i2c.h``, ``bcm2711_spi.h``, etc.). + +Many peripherals had groupings of memory-mapped registers, defined using a base address and then offsets from that +address to access the different fields. For instance, the two mini-SPI peripherals had the same structure, each with 12 +registers. The way I commonly saw these macros implemented was something like: + +.. code-block:: c + + #define BCM_AUX_SPI1_BASEADDR (BCM_AUX_BASEADDR + BCM_AUX_SPI1_OFFSET) + + #define BCM_AUX_SPI_CNTL0_REG_OFFSET (0x00) /* SPI control register 0 */ + /* ... more register offsets */ + + /* This allows you to choose which SPI interface base address to get the register for. */ + + #define BCM_AUX_SPI_CNTL0(base) ((base) + BCM_AUX_SPI_CNTL0_REG_OFFSET) + +In addition to the registers themselves, I also included macros to mask certain fields within the registers or set +certain values. This makes the code less error prone later, because any mistakes made while copying the long list of +fields and registers from the datasheet can be changed in one place. + +.. code-block:: c + + #define BCM_SPI_CNTL0_EN (1 << 11) /* Enable SPI interface */ + +In addition to the registers, I also had to map the interrupts. This was done in ``include/bcm2711/irq.h``. I copied the +IRQ numbers from the datasheet and listed them all as macros with names. I also had to define the number of IRQS, which +was 216 in this case. The ``MPID_TO_CORE(mpid)`` macro was copied from another arm64 implementation. + +.. code-block:: c + + #define NR_IRQS 216 + #define MPID_TO_CORE(mpid) (((mpid) >> MPIDR_AFF0_SHIFT) & MPIDR_AFFLVL_MASK) + + /* VideoCore interrupts */ + + #define BCM_IRQ_VC_BASE 96 + #define BCM_IRQ_VC(n) (BCM_IRQ_VC_BASE + n) + + #define BCM_IRQ_VC_TIMER0 BCM_IRQ_VC(0) + #define BCM_IRQ_VC_TIMER1 BCM_IRQ_VC(1) + /* More interrupts ... */ + +Finally was to define the memory mapping within the ``include/bcm2711/chip.h`` file. I did so simply since I was only +testing on the 4GB version of the BCM2711. The RAM starts at address 0, and is roughly 4GB in size. 64 MB of that is +reserved for the memory-mapped I/O, so I had to be sure to remove that. I also defined the load address of the kernel in +memory for the chip. + +.. code-block:: c + + #define CONFIG_RAMBANK1_ADDR (0x000000000) + + /* Both the 4GB and 8GB ram variants use all the size in RAMBANK1 */ + + #if defined(CONFIG_RPI4B_RAM_4GB) || defined(CONFIG_RPI4B_RAM_8GB) + #define CONFIG_RAMBANK1_SIZE GB(4) - MB(64) + #endif /* defined(CONFIG_RPI4B_RAM_4GB) || defined(CONFIG_RPI4B_RAM_8GB) */ + + /* Raspberry Pi 4B loads NuttX at this address */ + + #define CONFIG_LOAD_BASE 0x480000 + +The same load address had to be specified in the linker script for the Raspberry Pi 4B kernel. This scripts tells the +compiler how to lay out the kernel code in memory and what addresses to use. I was able to copy it from the PinePhone +and just change the load address to ``0x480000``. + +Figuring out the boot +--------------------- + +The first thing I wanted to do was determine how much work had already been done for aarch64 that would allow me to more +easily complete the port. In Lup's blogs, he tested out support for his core type (ARM Cortex-A53 on the PinePhone) by +booting the aarch64 instance of QEMU with NuttX using that core. I decided to take the same approach, and was able to +successfully boot on ARM Cortex-A72 using QEMU following his blog. This was a nice confirmation that the hardware I was +using was already supported in NuttX for booting the OS and getting NSH working with a PL011 UART interface. + +I cannot stress enough that the reason porting to this chip was made so much easier was because I am standing on the +shoulders of giants. NuttX contributors had already set up the boot scripts written in assembly, timer configuration, +interrupt handling and drivers for a lot of the standard features in aarch64 architectures. I did not have to deal with +any of this because of them, and it really cut down on the amount of assembly I had to read and understand. I also +barely had to write any assembly outside of debugging the boot process a little (we'll get to that later). Not to +mention I had Lup's well-written articles to guide me. + +In order to compile and boot the board, I had to add a definition for ``g_mmu_config``, which I was confused about and +left empty initially just to get past the compilation stage. I also defined the ``GICR_OFFSET`` and ``GICR_BASE`` macros +for the GICv2 interrupt controller by copying them from the Allwinner chip, which used the same controller. After +reading further in Lup's blog, I learned that the boot script has a ``PRINT`` macro which is called early in the boot +process, and requires an implementation of ``up_lowputc`` to print to the console. This would be the first thing I need +to implement. This compiled, but when I booted the Pi, nothing happened. + +After quite a while of trying different things and looking at other implementations, I noticed that many people were +using register manipulation directly in the early print functions. I decided I would do the same, but instead of +printing (a more complex operation), I would turn one of the GPIO pins high. I was able to measure this with my +multimeter and confirm that the GPIO did get set, so I knew that the ``arm64_earlyprint_init`` function was getting +called. Something was wrong with my UART configuration. + +I then tried directly manipulating registers to put the text "hi" in the UART FIFO. When I booted again, this printed, +but then was followed by some garbled output. It appeared that the the ``char *`` pointer passed to the print function +was getting garbled. After troubleshooting by printing characters directly by calling my ``arm64_lowputc`` in the +assembly boot script, I discovered that I could print a string from the C definition if I declared the string as static. +I also investigated the elf generated by building and confirmed the string was located in ``.rodata``. I was suspicious +that I was loading the kernel incorrectly into memory and some addresses were getting mixed up. Sure enough, I had +defined the load address in the linker script as ``0x80000`` instead of ``0x480000``. Fixing this allowed me to see the +boot messages properly! + +I received this message in the console: + +.. code-block:: console + + ----gic_validate_dist_version: No GIC version detect + arm64_gic_initialize: no distributor detected, giving up ret=-19 + _assert: Current Version: NuttX 12.6.0-RC0 6791d4a1c4-dirty Aug 4 2024 00:38:21 arm64 + _assert: Assertion failed panic: at file: common/arm64_fatal.c:375 task: Idle_Task process: Kernel 0x481418 + +I had accidentally kept the GICv3 in my config files when copying things from other boards, and changed it to GICv2. +That resolved the issue and presented me with a new one: + +.. code-block:: console + + MESS:00:00:06.144520:0:----_assert: Current Version: NuttX 12.6.0-RC0 f81fb7a076-dirty Aug 4 2024 16:16:30 arm64 + _assert: Assertion failed panic: at file: common/arm64_fatal.c:375 task: Idle_Task process: Kernel 0x4811e4 + +After enabling all of the debug output in the build options, this became: + +.. code-block:: console + + arm64_oneshot_initialize: cycle_per_tick 54000 + arm64_fatal_error: reason = 0 + arm64_fatal_error: CurrentEL: MODE_EL1 + arm64_fatal_error: ESR_ELn: 0xbf000002 + arm64_fatal_error: FAR_ELn: 0x0 + arm64_fatal_error: ELR_ELn: 0x48a458 + print_ec_cause: SError interrupt + +This looked like an unhandled interrupt, and after narrowing down which line was failing by adding log statements to the +kernel code, I discovered it was due to the spinlock code. An exception was being caused by the ``ldaxr`` instruction, +which the ARM documentation said could only be used once the MMU was enabled. I then enabled the MMU as well as its +debug information and was greeted with the lovely error: + +.. code-block:: console + + MESS:00:00:06.174977:0:----arm64_mmu_init: xlat tables: + arm64_mmu_init: base table(L1): 0x4cb000, 64 entries + arm64_mmu_init: 0: 0x4c4000 + arm64_mmu_init: 1: 0x4c5000 + arm64_mmu_init: 2: 0x4c6000 + arm64_mmu_init: 3: 0x4c7000 + arm64_mmu_init: 4: 0x4c8000 + arm64_mmu_init: 5: 0x4c9000 + arm64_mmu_init: 6: 0x4ca000 + init_xlat_tables: mmap: virt 4227858432x phys 4227858432x size 67108864x + set_pte_table_desc: + set_pte_table_desc: 0x4cb018: [Table] 0x4c4000 + init_xlat_tables: mmap: virt 0x phys 0x size 1006632960x + set_pte_table_desc: + set_pte_table_desc: 0x4cb000: [Table] 0x4c5000 + init_xlat_tables: mmap: virt 4718592x phys 4718592x size 192512x + split_pte_block_desc: Splitting existing PTE 0x4c5010(L2) + set_pte_table_desc: + set_pte_table_desc: 0x4c5010: [Table] 0x4c6000 + init_xlat_tables: mmap: virt 4911104x phys 4911104x size 81920x + init_xlat_tables: mmap: virt 4993024x phys 4993024x size 65536x + enable_mmu_el1: MMU enabled with dcache + nx_start: Entry + up_allocate_heap: heap_start=0x0x4d3000, heap_size=0x47b2d000 + mm_initialize: Heap: name=Umem, start=0x4d3000 size=1202900992 + mm_addregion: [Umem] Region 1: base=0x4d32a8 size=1202900304 + arm64_fatal_error: reason = 0 + arm64_fatal_error: CurrentEL: MODE_EL1 + arm64_fatal_error: ESR_ELn: 0x96000045 + arm64_fatal_error: FAR_ELn: 0x47fffff8 + arm64_fatal_error: ELR_ELn: 0x489d28 + print_ec_cause: Data Abort taken without a change in Exception level + _assert: Current Version: NuttX 12.6.0-RC0 96be557b64-dirty Aug 5 2024 14:56:42 arm64 + _assert: Assertion failed panic: at file: common/arm64_fatal.c:375 task: Idle_Task process: Kernel 0x481a34 + up_dump_register: stack = 0x4d2e10 + up_dump_register: x0: 0x13 x1: 0x4d32c0 + up_dump_register: x2: 0xfe215040 x3: 0xfe215040 + up_dump_register: x4: 0x0 x5: 0x0 + up_dump_register: x6: 0x1 x7: 0xdba53f65cc808a8 + up_dump_register: x8: 0xc4276feb17c016ba x9: 0xecbcfeb328124450 + up_dump_register: x10: 0xb7989dd7d34a1280 x11: 0x5ebf5f572386fdee + up_dump_register: x12: 0x6f7c07d067f6e38 x13: 0x3f7b5adaf798b4d5 + up_dump_register: x14: 0xf3dffbe2e4cff736 x15: 0xd76b1c050c964ea0 + up_dump_register: x16: 0x6d6fa9cfeeb0eff8 x17: 0x1a051d808a830286 + up_dump_register: x18: 0x3f7b5adaf798b4bf x19: 0x4d3000 + up_dump_register: x20: 0x47fffff0 x21: 0x4d32d0 + up_dump_register: x22: 0x47b2cd30 x23: 0x4d32a8 + up_dump_register: x24: 0x4d32b0 x25: 0x4806f4 + up_dump_register: x26: 0x2f56f66b2df71556 x27: 0x74ee6bbfb5d438f4 + up_dump_register: x28: 0x7ef57ab47b85f74f x29: 0x9a7fa1cb06923003 + up_dump_register: x30: 0x489cf8 + up_dump_register: + up_dump_register: STATUS Registers: + up_dump_register: SPSR: 0x600002c5 + up_dump_register: ELR: 0x489d28 + up_dump_register: SP_EL0: 0x4d3000 + up_dump_register: SP_ELX: 0x4d2f40 + up_dump_register: TPIDR_EL0: 0x0 + up_dump_register: TPIDR_EL1: 0x0 + up_dump_register: EXE_DEPTH: 0x1 + +Some more debugging allowed me to determine that the ``CONFIG_RAM_START`` and ``CONFIG_RAM_SIZE`` macros in the +defconfig for my nsh configuration were still set to the values from the PinePhone that I copied from. I set these to +the correct values for the Raspberry Pi 4B and got much further! + +.. code-block:: console + + MESS:00:00:06.211786:0:----irq_attach: In irq_attach + irq_attach: before spin_lock_irqsave + spin_lock_irqsave: me: 0 + spin_lock_irqsave: before spin_lock + spin_lock: about to enter loop + spin_lock: loop over + spin_lock_irqsave: after spin_lock + irq_attach: after spin_lock_irqsave + irq_attach: before spin_unlock_irqrestore + irq_attach: after spin_unlock_irqrestore + arm64_serialinit: arm64_serialinit not implemented + group_setupidlefiles: ERROR: Failed to open stdin: -38 + _assert: Current Version: NuttX 12.6.0-RC0 be262c7ad3-dirty Aug 5 2024 17:16:27 arm64 + _assert: Assertion failed : at file: init/nx_start.c:728 task: Idle_Task process: Kernel 0x48162c + up_dump_register: stack = 0x4c0170 + up_dump_register: x0: 0x4c0170 x1: 0x0 + up_dump_register: x2: 0x0 x3: 0x0 + up_dump_register: x4: 0x0 x5: 0x0 + up_dump_register: x6: 0x3 x7: 0x0 + up_dump_register: x8: 0x4c7468 x9: 0x0 + up_dump_register: x10: 0x4c7000 x11: 0x4 + up_dump_register: x12: 0x4b8000 x13: 0x4b7000 + up_dump_register: x14: 0x1 x15: 0xfffffff7 + up_dump_register: x16: 0x48a654 x17: 0x0 + up_dump_register: x18: 0x1 x19: 0x0 + up_dump_register: x20: 0x4ac181 x21: 0x4bf430 + up_dump_register: x22: 0x0 x23: 0x4c0170 + up_dump_register: x24: 0x4c0170 x25: 0x2d8 + up_dump_register: x26: 0x240 x27: 0x4b7000 + up_dump_register: x28: 0xfdc3ed41d6862df6 x29: 0xbf8e8f7280a0100 + up_dump_register: x30: 0x481bf8 + up_dump_register: + up_dump_register: STATUS Registers: + up_dump_register: SPSR: 0x20000245 + up_dump_register: ELR: 0x480230 + up_dump_register: SP_EL0: 0x4c7000 + up_dump_register: SP_ELX: 0x4c6e90 + up_dump_register: TPIDR_EL0: 0x4bf430 + up_dump_register: TPIDR_EL1: 0x4bf430 + up_dump_register: EXE_DEPTH: 0x0 + dump_tasks: PID GROUP PRI POLICY TYPE NPX STATE EVENT SIGMASK STACKBASE STACKSIZE USED FILLED COMMAND + dump_tasks: ---- --- --- -------- ------- --- ------- ---------- ---------------- 0x4c4000 4096 144 3.5% irq + dump_task: 0 0 0 FIFO Kthread - Running 0000000000000000 0x4c5010 8176 1200 14.6% Idle_Task + + CTRL-A Z for help | 115200 8N1 | NOR | Minicom 2.9 | VT102 | Offline | ttyUSB0 + +We actually got into tasks now! It appears stdin failed to open because in my Mini-UART driver implementation I had the +``attach`` and ``ioctl`` functions return ``-ENOSYS``. Just changing this to 0 for success in the interim allowed us to +get even further, and I could see the beginnings of NSH spawning. + +.. code-block:: console + + mm_initialize: Heap: name=Umem, start=0x4cc000 size=4222828544 + mm_addregion: [Umem] Region 1: base=0x4cc2a8 size=4222827856 + mm_malloc: Allocated 0x4cc2d0, size 144 + mm_malloc: Allocated 0x4cc360, size 80 + gic_validate_dist_version: GICv2 detected + up_timer_initialize: up_timer_initialize: cp15 timer(s) running at 54.0MHz + arm64_oneshot_initialize: oneshot_initialize + mm_malloc: Allocated 0x4cc3b0, size 48 + arm64_oneshot_initialize: cycle_per_tick 54000 + uart_register: Registering /dev/console + mm_malloc: Allocated 0x4cc3e0, size 80 + mm_malloc: Allocated 0x4cc430, size 80 + uart_register: Registering /dev/ttys0 + mm_malloc: Allocated 0x4cc480, size 80 + mm_malloc: Allocated 0x4cc4d0, size 80 + mm_malloc: Allocated 0x4cc520, size 80 + mm_malloc: Allocated 0x4cc570, size 32 + mm_malloc: Allocated 0x4cc590, size 64 + work_start_highpri: Starting high-priority kernel worker thread(s) + mm_malloc: Allocated 0x4cc5d0, size 336 + mm_malloc: Allocated 0x4cc720, size 8208 + nxtask_activate: hpwork pid=1,TCB=0x4cc5d0 + nx_start_application: Starting init thread + task_spawn: name=nsh_main entry=0x48b24c file_actions=0 attr=0x4cbfa0 argv=0x4cbf98 + mm_malloc: Allocated 0x4ce730, size 1536 + mm_malloc: Allocated 0x4ced30, size 64 + mm_malloc: Allocated 0x4ced70, size 32 + mm_malloc: Allocated 0x4ced90, size 8208 + nxtask_activate: nsh_main pid=2,TCB=0x4ce730 + lib_cxx_initialize: _sinit: 0x4ad000 _einit: 0x4ad000 + mm_malloc: Allocated 0x4d0da0, size 848 + mm_free: Freeing 0x4d0da0 + mm_free: Freeing 0x4ced70 + mm_free: Freeing 0x4ced30 + nxtask_exit: nsh_main pid=2,TCB=0x4ce730 + mm_free: Freeing 0x4ced90 + mm_free: Freeing 0x4ce730 + nx_start: CPU0: Beginning Idle Loop + +It seemed like we were waiting on an interrupt which never occurred. This was weird, because my Mini-UART driver had an +interrupt implementation and appeared to be written just fine. This took hours of debugging, logging from interrupt +handlers and dumping register values, but eventually I determined that the BCM2711 datasheet actually had an error where +the TX and RX interrupt fields were swapped in the datasheet. A blog post online had mentioned this for the BCM2835, but +it appeared to be an issue on this chip as well. Now we were booting into NSH! + +It was at this point that the port is considered a success, since I was able to boot into NSH and successfully run the +``ostest`` benchmark. I went on to write the start of a few more drivers, like the GPIO driver, but this completed the +requirements for an initial port and is most of what ended up being submitted in the initial pull request. diff --git a/Documentation/guides/port_arm_cm4.rst b/Documentation/guides/porting-case-studies/port_arm_cm4.rst similarity index 100% rename from Documentation/guides/port_arm_cm4.rst rename to Documentation/guides/porting-case-studies/port_arm_cm4.rst