Summary:
1、cntfrq_el0 is used to store the timer frequency, which may
be different from the CPU frequency.
2、Do not use up_perf interface for SMP.
Signed-off-by: wangming9 <wangming9@xiaomi.com>
Some app with same code runs on different cores in AMP mode,
need the physical core on which the function is called.
Signed-off-by: hujun5 <hujun5@xiaomi.com>
Signed-off-by: fangxinyong <fangxinyong@xiaomi.com>
reason:
To remove the "sync pause" and decouple the critical section from the dependency on enabling interrupts,
after that we need to further implement "schedlock + spinlock".
changelist
1 Modify the implementation of critical sections to no longer involve enabling interrupts or handling synchronous pause events.
2 GIC_SMP_CPUCALL attach to pause handler to remove arch interface up_cpu_paused_restore up_cpu_paused_save
3 Completely remove up_cpu_pause, up_cpu_resume, up_cpu_paused, and up_cpu_pausereq
4 change up_cpu_pause_async to up_send_cpu_sgi
Signed-off-by: hujun5 <hujun5@xiaomi.com>
reason:
In the kernel, we are planning to remove all occurrences of up_cpu_pause as one of the steps to
simplify the implementation of critical sections. The goal is to enable spin_lock_irqsave to encapsulate critical sections,
thereby facilitating the replacement of critical sections(big lock) with smaller spin_lock_irqsave(small lock)
Signed-off-by: hujun5 <hujun5@xiaomi.com>
Since FPU is now always saved into the current process stack location
upon exception entry, there is no need to keep fpu_regs (or saved_fpu_regs)
in the TCB.
Correct some of the cache operations:
- EP0 request length was handled incorrectly
- Received data cache invalidate was exceeding the received buffer
- writedtd is also called with no data (EP0 ACK/NACK). Don't touch cache in that case.
Fix trip wire handling to conform with the IMX93 reference manual
Also add DEBUGASSERTS for future to check the validity of pointers and sizes
Co-authored-by: Jukka Laitinen <jukkax@ssrc.tii.ae>
As the handling of sp_el0 was moved from the context switch routine
to exception entry/exit, we must set sp_el0 explicitly when the user
process is first started.
1. refactor the ghs/gcc/clang/armclang toolchain management in CMake
2. unify the cmake toolchain naming style
3. support greenhills build procedure with CMake
4. add protect build for greenhills and gnu toolchain with CMake
Signed-off-by: guoshichao <guoshichao@xiaomi.com>
add arm64 qemu pm compatible for demo pm_idle in not smp & smp usage
demo, chip should based on demo to add more operation in pm_idle_handler
Signed-off-by: buxiasen <buxiasen@xiaomi.com>
This fixes two issues with the tick timer
1) Each tick was longer than the requested period. This is because setting
the compare register was done by first reading the current time, and only
after that setting the compare register. In addition, when handling the
timer interrupts in arch_alarm.c / oneshot_callback, the current_tick is
first read, all the tick handling is done and only after that the next tick
is started. The whole tick processing time was added to the total tick time.
2) When the compare time is not aligned with tick period, and is drifting,
eventually any call to ONESHOT_TICK_CURRENT would either return the current
tick, or the next one, depending on the rounding of division by the
cycle_per_tick. This again leads to oneshot_callback randomly handling
two ticks at a time, which breaks all wdog based timers, causing them to
randomly timeout too early.
The issues are fixed as follows:
Align the compare time register to be evenly divisible by cycle_per_tick.
This will lead arm64_tick_current always to return the currently ongoing tick,
fixing 2). Also calculating the next tick's start from the aligned current
count will fix 1), as there is no time drift in the start cycle.
Signed-off-by: Jukka Laitinen <jukkax@ssrc.tii.ae>
reason:
Currently, if we need to schedule a task to another CPU, we have to completely halt the other CPU,
manipulate the scheduling linked list, and then resume the operation of that CPU. This process is both time-consuming and unnecessary.
During this process, both the current CPU and the target CPU are inevitably subjected to busyloop.
The improved strategy is to simply send a cross-core interrupt to the target CPU.
The current CPU continues to run while the target CPU responds to the interrupt, eliminating the certainty of a busyloop occurring.
Signed-off-by: hujun5 <hujun5@xiaomi.com>
To align with the implementation of ARMv7-A, remove the operation of clearing
interrupts during GIC initialization to avoid losing interrupts during asynchronous startup.
Signed-off-by: hujun5 <hujun5@xiaomi.com>
reason:
when a context switch occurs, up_switch_context is executed.
In order to reduce the time taken for context switching,
we inline the up_switch_context function.
Signed-off-by: hujun5 <hujun5@xiaomi.com>
for the citimon stats:
thread 0: thread 1:
enter_critical (t0)
up_switch_context
note suspend thread0 (t1)
thread running
IRQ happen, in ISR:
post thread0
up_switch_context
note resume thread0 (t2)
ISR continue f1
ISR continue f2
...
ISR continue fn
leave_critical (t3)
You will see, the thread 0, critical_section time is:
(t1 - t0) + (t3 - t2)
BUT, this result contains f1 f2 .. fn time spent, it is wrong
to tell user thead0 hold the critical lots of time but actually
not belong to it.
Resolve:
change the nxsched_suspend/resume_scheduler to real hanppends
Signed-off-by: ligd <liguiding1@xiaomi.com>
Summary
The original implement for exception handler is very simple and
haven't framework for breakpoint/watchpoint routine or brk instruction.
I refine the fatal handler and add framework for debug handler to
register or unregister. this is a prepare for watchpoint/breakpoint
implement
Signed-off-by: qinwei1 <qinwei1@xiaomi.com>
Function `arm64_lowputc` corrupted the x1 register which is used in function `boot_stage_puts`.
Signed-off-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>
Summary
The original implement for exception handler is very simple and
haven't framework for breakpoint/watchpoint routine or brk instruction.
I refine the fatal handler and add framework for debug handler to
register or unregister. this is a prepare for watchpoint/breakpoint
implement
Signed-off-by: qinwei1 <qinwei1@xiaomi.com>
The aforementioned functions can/will fail if the C compiler decides
to use the stack for the incoming entrypt/etc. parameters.
Fix this issue by converting the jump to user part into pure assembly,
ensuring the stack is NOT used for the parameters.
The original code made the incorrect assumption that the amount of
translation levels is 3, but this is incorrect. The amount of levels is 4
and the amount of levels that are utilized / in use is set dynamically
from the amount of VA bits in use.
The VMSAv8-64 translation system has 4 page table levels in total, ranging
from 0-3. The address environment code assumes only 3 levels, from 1-3 but
this is wrong; the amount of levels _utilized_ depends on the configured
VA size CONFIG_ARM64_VA_BITS. With <= 39 bits 3 levels is enough, while
if the va range is larger, the 4th translation table level is taken into
use dynamically by shifting the base translation table level.
From arm64_mmu.c, where va_bits is the amount of va bits used in address
translations:
(va_bits <= 21) - base level 3
(22 <= va_bits <= 30) - base level 2
(31 <= va_bits <= 39) - base level 1
(40 <= va_bits <= 48) - base level 0
The base level is what is configured as the page directory root. This also
affects the performance of address translations i.e. if the VA range is
smaller, address translations are also faster as the page table walk is
shorter.
1. Similar to asan, supports single byte out of bounds detection
2. Fix the script to address the issue of not supporting the big end
Signed-off-by: wangmingrong1 <wangmingrong1@xiaomi.com>
1. Tested on QEMU, the two sockets were basically the same, and their performance was not affected. The size of the generated bin file was also the same
2. Extract global detection as a separate file, both types of Kasan support global variable out of bounds detection simultaneously
Signed-off-by: wangmingrong1 <wangmingrong1@xiaomi.com>
Revert "Parallelize depend file generation"
This reverts commit d5b6ec450f.
parallel depend ddc does not significantly speed up compilation,
intermediately generated .ddc files can cause problems if compilation is interrupted unexpectedly
Signed-off-by: xuxin19 <xuxin19@xiaomi.com>
reason:
1 On different architectures, we can utilize more optimized strategies
to implement up_current_regs/up_set_current_regs.
eg. use interrupt registersor percpu registers.
code size
before
text data bss dec hex filename
262848 49985 63893 376726 5bf96 nuttx
after
text data bss dec hex filename
262844 49985 63893 376722 5bf92 nuttx
size change -4
Configuring NuttX and compile:
$ ./tools/configure.sh -l qemu-armv8a:nsh_smp
$ make
Running with qemu
$ qemu-system-aarch64 -cpu cortex-a53 -smp 4 -nographic \
-machine virt,virtualization=on,gic-version=3 \
-net none -chardev stdio,id=con,mux=on -serial chardev:con \
-mon chardev=con,mode=readline -kernel ./nuttx
Signed-off-by: hujun5 <hujun5@xiaomi.com>
TX clock or ref clock can be driven either from outside (PHY / oscilator) or by the ENET block.
Typical connection with RMII PHY is that the PHY drives the refclk.
Signed-off-by: Jukka Laitinen <jukkax@ssrc.tii.ae>
New configuration IMX9_HAVE_ATF_FIRMWARE introduced,
it is default on and it selects ARM64_HAVE_PSCI, when compiling
bootloader or when using bootloader that does not have atf
this shall be disabled
Signed-off-by: Jouni Ukkonen <jouni.ukkonen@unikie.com>
Clean up the interrupt-driven logic in the driver; handle error cases properly,
remove dead code and simplify logic.
Signed-off-by: Jukka Laitinen <jukkax@ssrc.tii.ae>
Change "DMACH_HANDLE *handle" into "DMACH_HANDLE handle". The DMACH_HANDLE is already
defined as "void *".
Signed-off-by: Jukka Laitinen <jukkax@ssrc.tii.ae>
Enforcing the default 48-bit VA for everyone also implies a 4 page table
translation system. However, if less than 40 bits are needed, a full
translation table level can be dropped, making the translations faster.
Thus, make this into a configurable option, instead of enforcing the same
address widht for everyone.
There is a tiny possibility that when a process is started a trap is
taken which causes a context switch. This moves the kernel stack
unexpectedly and the task start logic no longer works.
Fix this by recording the initial context location, and use that to
trampoline into the user process with interrupts disabled. This ensures
the context stays intact AND the kernel stack is fully unwound before
the user process starts.
The register context is not needed, the original idea was to provide
the user stack pointer for signal handler delivery, but the user stack
can be obtained via sp_el0 so the context registers are not needed.
SP0 is not stored upon exception entry anyways, so this code is just
completely redundant and wrong.
reason:
In SMP, when a context switch occurs, restore_critical_section is executed.
To reduce the time taken for context switching, we directly pass the required
parameters to restore_critical_section instead of acquiring them repeatedly.
Signed-off-by: hujun5 <hujun5@xiaomi.com>
The vaddr field in TLBI means: Bits[55:12] of the virtual address to match.
This basically means the page offset of the virtual address, so the input
vaddr must be shifted to the page offset.
Reference TLBI VALE1IS register description from ARMv8-A reference manual.
The 12:0 bits in table descriptors are RES0 and AF is the 10th bit, so
it is not valid to set it in this case.
Fix this by moving AF to the common MMU_MT_NORMAL_FLAGS field
Make sure the user L1 page is updated to system memory when the kernel
mappings are copied.
Also, flush the I-cache when switching address environments.
Make this_cpu is arch independent and up_cpu_index do that.
In AMP mode, up_cpu_index() may return the index of the physical core.
Signed-off-by: fangxinyong <fangxinyong@xiaomi.com>
In corner case, the pending ISR will be triggered immediately
after enable the IRQ, this PR will setting CPU affinity first
to avoid routing the unexpected IRQ to other CPUs.
Signed-off-by: chao an <anchao@lixiang.com>
Only in the non-critical region, nuttx can the respond to the irq and not hold the lock
When returning from the irq, there is no need to check whether the lock needs to be released
we also need keep restore_critical_section in svc call
test:
Configuring NuttX and compile:
$ ./tools/configure.sh -l qemu-armv8a:nsh_smp
$ make
Running with qemu
$ qemu-system-aarch64 -cpu cortex-a53 -smp 4 -nographic \
-machine virt,virtualization=on,gic-version=3 \
-net none -chardev stdio,id=con,mux=on -serial chardev:con \
-mon chardev=con,mode=readline -kernel ./nuttx
Signed-off-by: hujun5 <hujun5@xiaomi.com>
This is the initial version for kernel mode build on the arm64 platform.
It works much in the same way as the risc-v implementation so any
highlights can be read from there.
Features that have been tested working:
- Creating address environments
- Loading init (nsh) from elf file
- Booting to nsh
- Starting other processes from nsh
- ostest runs to completion
Features that are not tested / do not work:
- SHM / shared memory support
- Kernel memory mapping (MM_KMAP)
- fork/vfork
An example qemu target is provided as a separate patch:
tools/configure.sh qemu-armv8a:knsh
Why? Because this allows optimizing the user system call path in such
a way that the parameter registers don't have to be read from the saved
integer register context when the system call is executed.
"/mnt/yang/qixinwei_vela_warnings/nuttx/include/nuttx/spinlock.h", line 252: warning #76-D:
argument to macro is empty
SP_DSB();
^
"/mnt/yang/qixinwei_vela_warnings/nuttx/include/nuttx/spinlock.h", line 261: warning #76-D:
argument to macro is empty
SP_DMB();
^
"/mnt/yang/qixinwei_vela_warnings/nuttx/include/nuttx/spinlock.h", line 252: warning #76-D:
argument to macro is empty
SP_DSB();
^
"/mnt/yang/qixinwei_vela_warnings/nuttx/include/nuttx/spinlock.h", line 261: warning #76-D:
argument to macro is empty
SP_DMB();
^
"/mnt/yang/qixinwei_vela_warnings/nuttx/include/nuttx/spinlock.h", line 296: warning #76-D:
argument to macro is empty
SP_DSB();
^
Signed-off-by: yanghuatao <yanghuatao@xiaomi.com>
when the thread to backtrace is exiting, get_tcb and up_backtrace in
different critical section may cause try to dump invalid pointer, have
to ensure the nxsched_get_tcb and up_backtrace inside same critical
section procedure.
Signed-off-by: buxiasen <buxiasen@xiaomi.com>
This adds enablers for setting various clocks to some default
values. Also, this provides helpers to grant nonsecure access
to a number of clocks. Bootloader may utilize these to make
the system boot in a deterministic manner.
Signed-off-by: Eero Nurkkala <eero.nurkkala@offcode.fi>
In the algorithm there is a subtraction (int - unsigned), which results (potentially overflowed)
unsigned.
Passing this to macro ABS and the assigning to int doesn't work ( unsigned is always >= 0 ).
Fix this by replacing (dangerous) ABS macro with stdlib's standard "int abs(int)"
and change the substraction to (int - int).
Signed-off-by: Jukka Laitinen <jukkax@ssrc.tii.ae>
This is an initial FlexSPI SPI NOR MTD driver for IMX9
This supprts M25P SPI NOR on FlexSPI for now, and can later be extended to other
SPINOR devices if needed. The following configurations are needed to use this driver:
CONFIG_IMX9_FLEXSPI_NOR=y
CONFIG_MTD_M25P=y
In addition, board initialization logic needs to call the imx9_flexspi_nor_initialize
to receive a pointer to the mtd device.
Signed-off-by: Jukka Laitinen <jukkax@ssrc.tii.ae>
Co-authored-by: Jouni Ukkonen <jouni.ukkonen@unikie.com>
During the boot phase, when we transition from tee smp to ap smp, we can use a busy waitflag to wait for the completion of the initialization of ap's core0
test:
We can use qemu for testing.
compiling
make distclean -j20; ./tools/configure.sh -l qemu-armv8a:nsh_smp ;make -j20
running
qemu-system-aarch64 -cpu cortex-a53 -smp 4 -nographic -machine virt,virtualization=on,gic-version=3 -net none -chardev stdio,id=con,mux=on -serial chardev:con -mon chardev=con,mode=readline -kernel ./nuttx
Signed-off-by: hujun5 <hujun5@xiaomi.com>
Only in the non-critical region, nuttx can the respond to the irq and not hold the lock
When returning from the irq, there is no need to check whether the lock needs to be restored
test:
We can use qemu for testing.
compiling
make distclean -j20; ./tools/configure.sh -l qemu-armv8a:nsh_smp ;make -j20
running
qemu-system-aarch64 -cpu cortex-a53 -smp 4 -nographic -machine virt,virtualization=on,gic-version=3 -net none -chardev stdio,id=con,mux=on -serial chardev:con -mon chardev=con,mode=readline -kernel ./nuttx
Signed-off-by: hujun5 <hujun5@xiaomi.com>
reduce the time consumed by function call
test:
We can use qemu for testing.
compiling
make distclean -j20; ./tools/configure.sh -l qemu-armv8a:nsh_smp ;make -j20
running
qemu-system-aarch64 -cpu cortex-a53 -smp 4 -nographic -machine virt,virtualization=on,gic-version=3 -net none -chardev stdio,id=con,mux=on -serial chardev:con -mon chardev=con,mode=readline -kernel ./nuttx
Signed-off-by: hujun5 <hujun5@xiaomi.com>
cpu0 cpu1:
user_main
signest_test
sched_unlock
nxsched_merge_pending
nxsched_add_readytorun
up_cpu_pause
arm_sigdeliver
enter_critical_section
Reason:
In the SMP, cpu0 is already in the critical section and waiting for cpu1 to enter the suspended state.
However, when cpu1 executes arm_sigdeliver, it is in the irq-disabled state but not in the critical section.
At this point, cpu1 is unable to respond to interrupts and
is continuously attempting to enter the critical section, resulting in a deadlock.
Resolve:
adjust the logic, do not entering the critical section when interrupt-disabled.
test:
We can use qemu for testing.
compiling
make distclean -j20; ./tools/configure.sh -l qemu-armv8a:nsh_smp ;make -j20
running
qemu-system-aarch64 -cpu cortex-a53 -smp 4 -nographic -machine virt,virtualization=on,gic-version=3 -net none -chardev stdio,id=con,mux=on -serial chardev:con -mon chardev=con,mode=readline -kernel ./nuttx
Signed-off-by: hujun5 <hujun5@xiaomi.com>