gprof can analyze code hot spots based on periodic sampling.
After compiling with the "-pg" option, the code call graph can be viewed.
Signed-off-by: yinshengkai <yinshengkai@xiaomi.com>
If the write lock is already held by the current thread, then, since the write
lock can be held recursively, this operation can be converted into a write
lock to avoid a deadlock.
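A minimal sketch of the idea, with hypothetical types and field names (the real lock bookkeeping differs):

#include <pthread.h>

struct rwlock_sketch_s            /* Hypothetical layout, for illustration only */
{
  pthread_t writer;               /* Thread currently holding the write lock */
  int       writecount;           /* Write-lock recursion depth               */
};

int rwlock_sketch_rdlock(struct rwlock_sketch_s *lock)
{
  if (lock->writecount > 0 && pthread_equal(lock->writer, pthread_self()))
    {
      /* We already hold the write lock: since the write lock may be held
       * recursively, take it again as a write lock instead of blocking on
       * the read lock, which would deadlock against ourselves.
       */

      lock->writecount++;
      return 0;
    }

  /* ... otherwise fall through to the normal read-lock path ... */

  return 0;
}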
Signed-off-by: dongjiuzhu1 <dongjiuzhu1@xiaomi.com>
after
text data bss dec hex filename
269732 51065 63335 384132 5dc84 nuttx
before
text data bss dec hex filename
269784 51065 63335 384184 5dcb8 nuttx
size -50
Signed-off-by: hujun5 <hujun5@xiaomi.com>
In the original implementation, -EINVAL is returned if the wdog is not ACTIVE. The rp2040 i2c driver relies on this behavior to correctly release the semaphore.
Signed-off-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>
Some apps that run the same code on different cores in AMP mode
need to know the physical core on which the function is called.
Signed-off-by: hujun5 <hujun5@xiaomi.com>
Signed-off-by: fangxinyong <fangxinyong@xiaomi.com>
Do not format immediately when sched_note_printf is called; instead, delay formatting until the trace is dumped.
With SYSTEM_NOTE enabled, this provides functionality similar to an asynchronous syslog.
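A simplified sketch of the approach, using hypothetical names and assuming every format argument is long-sized (the real implementation handles general printf arguments):

#include <stdarg.h>
#include <stdio.h>

#define NOTE_MAX_ARGS 4

struct note_printf_sketch_s
{
  const char *fmt;                 /* Format string pointer saved as-is    */
  long        arg[NOTE_MAX_ARGS];  /* Raw arguments captured at trace time */
};

/* Trace path: cheap, no formatting is performed here */

void note_capture(struct note_printf_sketch_s *note, int nargs,
                  const char *fmt, ...)
{
  va_list ap;
  int i;

  va_start(ap, fmt);
  note->fmt = fmt;
  for (i = 0; i < NOTE_MAX_ARGS; i++)
    {
      note->arg[i] = (i < nargs) ? va_arg(ap, long) : 0;
    }

  va_end(ap);
}

/* Dump path: formatting is deferred until the trace is read out */

void note_dump(const struct note_printf_sketch_s *note, char *buf, size_t len)
{
  snprintf(buf, len, note->fmt,
           note->arg[0], note->arg[1], note->arg[2], note->arg[3]);
}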
Signed-off-by: yinshengkai <yinshengkai@xiaomi.com>
Some apps that run the same code on different cores in AMP mode
need to know the physical core ID on which the function is called.
Signed-off-by: fangxinyong <fangxinyong@xiaomi.com>
bug:
thread 1: takes mutexA
          then waits on a semaphore
thread 2: waits for mutexA
This situation will be mistakenly judged as a deadlock.
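For illustration, the application-level pattern being misdiagnosed (hypothetical code):

#include <pthread.h>
#include <semaphore.h>

static pthread_mutex_t mutexA = PTHREAD_MUTEX_INITIALIZER;
static sem_t sem;                      /* Assume initialized to 0 elsewhere */

static void *thread1(void *arg)
{
  pthread_mutex_lock(&mutexA);         /* Holds mutexA...                   */
  sem_wait(&sem);                      /* ...then blocks on a semaphore     */
  pthread_mutex_unlock(&mutexA);
  return NULL;
}

static void *thread2(void *arg)
{
  pthread_mutex_lock(&mutexA);         /* Blocks waiting for mutexA         */
  pthread_mutex_unlock(&mutexA);
  return NULL;
}

/* There is no cycle of lock holders here: thread 1 is blocked on a
 * semaphore that any other thread may post, so this must not be reported
 * as a deadlock.
 */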
Signed-off-by: anjiahao <anjiahao@xiaomi.com>
Signed-off-by: xuxingliang <xuxingliang@xiaomi.com>
reason:
Remove the "sync pause" mechanism and decouple the critical section from its dependency on enabling interrupts;
after that, we can further implement "schedlock + spinlock".
changelist:
1 Modify the implementation of critical sections so that they no longer involve enabling interrupts or handling synchronous pause events.
2 Attach GIC_SMP_CPUCALL to the pause handler so that the arch interfaces up_cpu_paused_restore and up_cpu_paused_save can be removed.
3 Completely remove up_cpu_pause, up_cpu_resume, up_cpu_paused, and up_cpu_pausereq.
4 Change up_cpu_pause_async to up_send_cpu_sgi.
Signed-off-by: hujun5 <hujun5@xiaomi.com>
Change mqueue_msg_s to use a dynamic array so that
an mqueue msg can be malloc'ed based on the size of the
current message when it is allocated dynamically.
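A minimal sketch of the idea (hypothetical, simplified layout): the payload becomes a flexible array member so each message is allocated to fit its actual size:

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct msg_sketch_s
{
  uint16_t msglen;                     /* Length of the payload in bytes */
  char     mail[];                     /* Payload, allocated to msglen   */
};

static struct msg_sketch_s *msg_alloc(const char *msg, uint16_t len)
{
  struct msg_sketch_s *m = malloc(sizeof(*m) + len);

  if (m != NULL)
    {
      m->msglen = len;
      memcpy(m->mail, msg, len);
    }

  return m;
}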
Signed-off-by: zhangyuan29 <zhangyuan29@xiaomi.com>
Add the current heap usage to the note.
Plot it in the Segger SystemView data plot.
Signed-off-by: xuxingliang <xuxingliang@xiaomi.com>
Signed-off-by: Neo Xu <neo.xu1990@gmail.com>
Record every memory allocation and release and save them to RAM, to analyze the memory allocation rate and memory usage.
The absolute value per thread is not trustworthy because memory may be allocated in thread A and released in thread B.
netinit-5 [0] 0.105984392: tracing_mark_write: C|5|Heap Usage|96|free: heap: 0x606000000020 size:24, address: 0x603000000370
netinit-5 [0] 0.105996874: tracing_mark_write: C|5|Heap Usage|24|free: heap: 0x606000000020 size:72, address: 0x6070000008e0
nsh_main-4 [0] 3.825169408: tracing_mark_write: C|4|Heap Usage|2177665|free: heap: 0x606000000020 size:424, address: 0x614000000840
nsh_main-4 [0] 3.825228525: tracing_mark_write: C|4|Heap Usage|14977|free: heap: 0x606000000020 size:2162688, address: 0x7f80a639f800
nsh_main-4 [0] 3.825298789: tracing_mark_write: C|4|Heap Usage|15189|malloc: heap: 0x606000000020 size:20, address: 0x6030000003a0
Signed-off-by: yinshengkai <yinshengkai@xiaomi.com>
Signed-off-by: Neo Xu <neo.xu1990@gmail.com>
reason:
In the kernel, we are planning to remove all occurrences of up_cpu_pause as one of the steps to
simplify the implementation of critical sections. The goal is to enable spin_lock_irqsave to encapsulate critical sections,
thereby facilitating the replacement of critical sections (big lock) with smaller spin_lock_irqsave (small lock).
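For illustration, the shape of the intended change on a typical code path (obj_lock is a hypothetical per-object spinlock, not an actual kernel symbol):

#include <nuttx/irq.h>
#include <nuttx/spinlock.h>

static spinlock_t obj_lock = SP_UNLOCKED;

void update_shared_state(void)
{
  irqstate_t flags;

  /* Before: the big lock serializes all CPUs behind one global critical
   * section.
   *
   *   flags = enter_critical_section();
   *   ... touch shared state ...
   *   leave_critical_section(flags);
   */

  /* After: a small lock only serializes access to this particular object */

  flags = spin_lock_irqsave(&obj_lock);
  /* ... touch shared state ... */
  spin_unlock_irqrestore(&obj_lock, flags);
}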
Signed-off-by: hujun5 <hujun5@xiaomi.com>
1. Use '__attribute__((constructor))' to mark the initialize function.
2. Use '__attribute__((destructor))' to mark the uninitialize function.
3. Compile the module with -fvisibility=hidden and use `__attribute__((visibility("default")))`
to mark the symbols that need to be exported, so module_initialize is no longer needed to initialize the exported symbols.
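A minimal sketch of a module source file following this convention (illustrative names):

#include <stdio.h>

__attribute__((visibility("default")))
int module_api(void)                 /* Explicitly exported symbol */
{
  return 0;
}

__attribute__((constructor))
static void module_init(void)        /* Runs automatically when the module is loaded */
{
  printf("module loaded\n");
}

__attribute__((destructor))
static void module_uninit(void)      /* Runs automatically when the module is unloaded */
{
  printf("module unloaded\n");
}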
Signed-off-by: anjiahao <anjiahao@xiaomi.com>
reason:
In the kernel, we are planning to remove all occurrences of up_cpu_pause as one of the steps to
simplify the implementation of critical sections. The goal is to enable spin_lock_irqsave to encapsulate critical sections,
thereby facilitating the replacement of critical sections (big lock) with smaller spin_lock_irqsave (small lock).
Configuring and compiling NuttX:
$ ./tools/configure.sh -l qemu-armv8a:nsh_smp
$ make
Running with qemu
$ qemu-system-aarch64 -cpu cortex-a53 -smp 4 -nographic \
-machine virt,virtualization=on,gic-version=3 \
-net none -chardev stdio,id=con,mux=on -serial chardev:con \
-mon chardev=con,mode=readline -kernel ./nuttx
Signed-off-by: hujun5 <hujun5@xiaomi.com>
reason:
In the kernel, we are planning to remove all occurrences of up_cpu_pause as one of the steps to
simplify the implementation of critical sections. The goal is to enable spin_lock_irqsave to encapsulate critical sections,
thereby facilitating the replacement of critical sections (big lock) with smaller spin_lock_irqsave (small lock).
Configuring and compiling NuttX:
$ ./tools/configure.sh -l qemu-armv8a:nsh_smp
$ make
Running with qemu
$ qemu-system-aarch64 -cpu cortex-a53 -smp 4 -nographic \
-machine virt,virtualization=on,gic-version=3 \
-net none -chardev stdio,id=con,mux=on -serial chardev:con \
-mon chardev=con,mode=readline -kernel ./nuttx
Signed-off-by: hujun5 <hujun5@xiaomi.com>
reason:
In the kernel, we are planning to remove all occurrences of up_cpu_pause as one of the steps to
simplify the implementation of critical sections. The goal is to enable spin_lock_irqsave to encapsulate critical sections,
thereby facilitating the replacement of critical sections (big lock) with smaller spin_lock_irqsave (small lock).
Configuring and compiling NuttX:
$ ./tools/configure.sh -l qemu-armv8a:nsh_smp
$ make
Running with qemu
$ qemu-system-aarch64 -cpu cortex-a53 -smp 4 -nographic \
-machine virt,virtualization=on,gic-version=3 \
-net none -chardev stdio,id=con,mux=on -serial chardev:con \
-mon chardev=con,mode=readline -kernel ./nuttx
Signed-off-by: hujun5 <hujun5@xiaomi.com>
reason:
1 To improve efficiency, we mimic Linux's behavior, where disabling preemption only applies to the current CPU and does not affect other CPUs.
2 In the future, we will implement "spinlock + sched_lock" and use it extensively. Under such circumstances, if preemption is still disabled globally, it will seriously impact scheduling efficiency.
3 We removed g_cpu_lockset and use irqcount in order to eliminate the dependency of schedlock on critical sections in the future, simplify the logic, and further improve the performance of sched_lock.
4 We set lockcount to 1 in order to lock scheduling on all CPUs during startup, without needing additional functions to disable scheduling on other CPUs.
5 CPUs 1..n must wait for cpu0 to enter the idle state before enabling scheduling; this prevents them from competing with cpu0 for the memory manager mutex, which could cause the cpu0 idle task to enter a wait state and trigger an assert.
size nuttx
before:
text data bss dec hex filename
265396 51057 63646 380099 5ccc3 nuttx
after:
text data bss dec hex filename
265184 51057 63642 379883 5cbeb nuttx
size -216
Configuring and compiling NuttX:
$ ./tools/configure.sh -l qemu-armv8a:nsh_smp
$ make
Running with qemu
$ qemu-system-aarch64 -cpu cortex-a53 -smp 4 -nographic \
-machine virt,virtualization=on,gic-version=3 \
-net none -chardev stdio,id=con,mux=on -serial chardev:con \
-mon chardev=con,mode=readline -kernel ./nuttx
Signed-off-by: hujun5 <hujun5@xiaomi.com>
The wd_timer handler may change the current task, so the scheduler
must be processed before the wd timer handler. This ensures that the
current task is not updated before the RR processing.
Signed-off-by: zhangyuan29 <zhangyuan29@xiaomi.com>
nxsem_tickwait correctly sleeps more than 1 tick. But nxsem_tickwait_uninterruptible
may wake up due to a signal (with -EINTR), in which case the tick + 1 must also
be taken into account. Otherwise nxsem_tickwait_uninterruptible may
wake up 1 tick too early.
Also fix nxsem_tickwait to return -ETIMEDOUT if called with a delay of 0.
This is similar to e.g. POSIX sem_timedwait.
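A simplified sketch of the retry pattern involved (illustrative only; tickwait() stands in for the underlying tick-based wait such as nxsem_tickwait, and the actual fix concerns where the extra "+1" tick is accounted for when the remaining delay is recomputed):

#include <errno.h>
#include <semaphore.h>
#include <stdint.h>
#include <nuttx/clock.h>

/* Returns 0 on success, -EINTR on signal, -ETIMEDOUT on timeout */

extern int tickwait(sem_t *sem, uint32_t delay);

static int tickwait_retry(sem_t *sem, uint32_t delay)
{
  clock_t end = clock_systime_ticks() + delay + 1;  /* "+1": never wake early */
  clock_t now;
  int ret;

  for (; ; )
    {
      ret = tickwait(sem, delay);
      if (ret != -EINTR)
        {
          break;                       /* Got the semaphore or timed out */
        }

      /* A signal interrupted the wait: recompute the remaining ticks
       * against the absolute end time so that the accumulated wait never
       * ends up one tick shorter than requested.
       */

      now   = clock_systime_ticks();
      delay = (end > now) ? (uint32_t)(end - now) : 0;
    }

  return ret;
}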
Signed-off-by: Jukka Laitinen <jukkax@ssrc.tii.ae>
1. Extract the dump from the main assert flow.
2. Use OSINIT_PANIC for fatal errors.
3. Fix the method used to judge whether a thread is a kernel thread.
Signed-off-by: xuxingliang <xuxingliang@xiaomi.com>
1. nxsem_post wakes up nxsched_smp_call;
2. the function holding smp_call_data_s on its stack may then return;
3. nxsched_smp_call_handler accessing call_data->lock is then not safe;
so adjust the unlock order.
Signed-off-by: dulibo1 <dulibo1@xiaomi.com>
1. Mainly for cases that spend a long time within the critical section.
2. Move the buffer memcpy out of enter_critical_section().
3. Let mq_send() return failure when the mq buffer is full in an ISR.
Signed-off-by: ligd <liguiding1@xiaomi.com>
LD: nuttx.elf
ld: /home/ligd/platform/mainline/nuttx/staging/libsched.a(mq_sndinternal.o): in function `wd_start_realtime':
/home/ligd/platform/mainline/nuttx/include/nuttx/wdog.h:274: undefined reference to `clock_realtime2absticks'
/home/ligd/platform/mainline/nuttx/include/nuttx/wdog.h:274:(.text+0xad): relocation truncated to fit: R_X86_64_PLT32 against undefined symbol `clock_realtime2absticks'
ld: /home/ligd/platform/mainline/nuttx/staging/libsched.a(mq_rcvinternal.o): in function `wd_start_realtime':
/home/ligd/platform/mainline/nuttx/include/nuttx/wdog.h:274: undefined reference to `clock_realtime2absticks'
/home/ligd/platform/mainline/nuttx/include/nuttx/wdog.h:274:(.text+0xad): relocation truncated to fit: R_X86_64_PLT32 against undefined symbol `clock_realtime2absticks'
make[1]: *** [Makefile:128: nuttx.elf] Error 1
make: *** [tools/Unix.mk:551: nuttx.elf] Error 2
Signed-off-by: ligd <liguiding1@xiaomi.com>
This feature depends on ARCH_USE_SEPARATED_SECTION.
Different memory areas have different access speeds and cache
capabilities, so the arch can custom-allocate them based on
section names to achieve performance optimization.
test:
sim:elf
sim:sotest
Signed-off-by: dongjiuzhu1 <dongjiuzhu1@xiaomi.com>
reason:
Currently, if we need to schedule a task to another CPU, we have to completely halt the other CPU,
manipulate the scheduling linked list, and then resume the operation of that CPU. This process is both time-consuming and unnecessary.
During this process, both the current CPU and the target CPU are inevitably forced to busyloop.
The improved strategy is to simply send a cross-core interrupt to the target CPU.
The current CPU continues to run while the target CPU responds to the interrupt, so a busyloop is no longer guaranteed to occur.
Signed-off-by: hujun5 <hujun5@xiaomi.com>
Signed-off-by: dulibo1 <dulibo1@xiaomi.com>
1. call_data->refcount has not yet been set to remote_cpus;
2. nxsched_smp_call_add enqueues the call_data;
3. nxsched_smp_call_handler may already be running;
4. it dequeues the call_data while call_data->refcount is not yet correct;
5. unpredictable behavior can occur.
reason:
1 In the scenario of active waiting, context switching is inevitable, so we can eliminate redundant checks.
code size
before
hujun5@hujun5-OptiPlex-7070:~/downloads1/vela_sim/nuttx$ size nuttx
text data bss dec hex filename
262848 49985 63893 376726 5bf96 nuttx
after
hujun5@hujun5-OptiPlex-7070:~/downloads1/vela_sim/nuttx$ size nuttx
text data bss dec hex filename
263324 49985 63893 377202 5c172 nuttx
code size change: +476
Configuring and compiling NuttX:
$ ./tools/configure.sh -l qemu-armv8a:nsh_smp
$ make
Running with qemu
$ qemu-system-aarch64 -cpu cortex-a53 -smp 4 -nographic \
-machine virt,virtualization=on,gic-version=3 \
-net none -chardev stdio,id=con,mux=on -serial chardev:con \
-mon chardev=con,mode=readline -kernel ./nuttx
Signed-off-by: hujun5 <hujun5@xiaomi.com>
This commit simplifies the thread ID dispatching logic by integrating it into the `nxsig_dispatch` function.
Signed-off-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>
There is a large performance loss after SCHED_CRITMONITOR is enabled.
By isolating the thread running-time related functions, CPU load measurement can run with less overhead.
Signed-off-by: yinshengkai <yinshengkai@xiaomi.com>
Signed-off-by: buxiasen <buxiasen@xiaomi.com>
When gathering statistics with sched_cpuload_critmonitor, a thread switch
only accounts for the running time on the current CPU; threads running on
other CPUs are updated when they themselves switch.
Signed-off-by: yinshengkai <yinshengkai@xiaomi.com>
Converted the timer overrun check from a loop-based approach to a division-based method. This change ensures a deterministic worst-case execution time (WCET), even though it might not outperform the loop in average scenarios.
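A sketch of the two approaches, with hypothetical names (not the actual timer code); both return how many whole intervals have elapsed past the expiry time:

#include <stdint.h>

static uint64_t overrun_loop(uint64_t now, uint64_t expiry, uint64_t interval)
{
  uint64_t overrun = 0;

  while (expiry + interval <= now)   /* WCET grows with the number of missed intervals */
    {
      expiry += interval;
      overrun++;
    }

  return overrun;
}

static uint64_t overrun_div(uint64_t now, uint64_t expiry, uint64_t interval)
{
  /* Deterministic WCET: one division regardless of how late we are */

  return (now > expiry) ? (now - expiry) / interval : 0;
}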
Signed-off-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>
This patch added support for SIGEV_THREAD_ID and sigev_notify_thread_id.
Signed-off-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>
Signed-off-by: ligd <liguiding1@xiaomi.com>
When we do not drop the notifier from g_notifier_pending, we need an
isolated dq entry for this queue; otherwise the queued work_s's dq entry
may be modified by the work queue and break the chain of
g_notifier_pending.
Signed-off-by: Zhe Weng <wengzhe@xiaomi.com>
Return 1 to indicate that the work was not cancelled,
because it is currently being processed by the work thread;
instead, wait for it to finish.
Signed-off-by: ligd <liguiding1@xiaomi.com>
Revert "Parallelize depend file generation"
This reverts commit d5b6ec450f.
Parallel .ddc dependency generation does not significantly speed up compilation,
and the intermediate .ddc files can cause problems if compilation is interrupted unexpectedly.
Signed-off-by: xuxin19 <xuxin19@xiaomi.com>
reason:
1 On different architectures, we can utilize more optimized strategies
to implement up_current_regs/up_set_current_regs.
e.g. using interrupt registers or per-CPU registers.
code size
before
text data bss dec hex filename
262848 49985 63893 376726 5bf96 nuttx
after
text data bss dec hex filename
262844 49985 63893 376722 5bf92 nuttx
size change -4
Configuring and compiling NuttX:
$ ./tools/configure.sh -l qemu-armv8a:nsh_smp
$ make
Running with qemu
$ qemu-system-aarch64 -cpu cortex-a53 -smp 4 -nographic \
-machine virt,virtualization=on,gic-version=3 \
-net none -chardev stdio,id=con,mux=on -serial chardev:con \
-mon chardev=con,mode=readline -kernel ./nuttx
Signed-off-by: hujun5 <hujun5@xiaomi.com>
Use the ts/tick conversion functions provided in clock.h.
We did this because we want to speed up the time calculations, so
clock_time2ticks, clock_ticks2time, clock_timespec_add,
clock_timespec_compare, clock_timespec_subtract... were changed to macros.
1 Only the idle task can have the flag TCB_FLAG_CPU_LOCKED.
According to the code logic, btcb cannot be an idle task, so this check can be removed.
2 Optimized the preemption logic check and removed the call to nxsched_add_prioritized.
3 Speed up scheduling while avoiding the potential for
tasks to be moved multiple times between g_assignedtasks and g_readytorun.
Configuring and compiling NuttX:
$ ./tools/configure.sh -l qemu-armv8a:nsh_smp
$ make
Running with qemu
$ qemu-system-aarch64 -cpu cortex-a53 -smp 4 -nographic \
-machine virt,virtualization=on,gic-version=3 \
-net none -chardev stdio,id=con,mux=on -serial chardev:con \
-mon chardev=con,mode=readline -kernel ./nuttx
Signed-off-by: hujun5 <hujun5@xiaomi.com>
Most tools used for compliance and SBOM generation use SPDX identifiers.
This change brings us a step closer to easy SBOM generation.
Signed-off-by: Alin Jerpelea <alin.jerpelea@sony.com>
After this wdog refactor:
We conducted a latency measurement using the rt-tests/cyclictest (commit cadd661) on an x86_64 NUC12 equipped with an i7-1255U processor and 16GB of LPDDR5 memory. The specific command used for this microbenchmark was cyclictest -q -l 100000 -h 30000, which is designed to assess the responsiveness of the cyclic timer.
The findings from our benchmark are summarized below, highlighting the minimum, median, and maximum latency values for each operating system tested:
Operating System      Minimum Latency (us)   Median Latency (us)   Maximum Latency (us)
Linux                                   48                     53                    410
PreemptRT                                6                     57                    148
Xenomai                                 53                     53                     64
NuttX                                   64                    626                   1212
NuttX (refactor)                         1                      1                      3
In this table, "Min" indicates the shortest latency observed, "Median" represents the middle value of the latency distribution, and "Max" denotes the longest latency encountered.
The systems tested were as follows:
Linux: ACRN version 6.1.80 (commit f528146)
PreemptRT: Linux kernel 5.4.251 with the 5.4.254-rt85 patch applied
Xenomai: Linux kernel 5.4.251 patched with ipipe-core-5.4.239-x86-13
These results clearly demonstrate the varying performance of different operating systems in terms of timer latency, with the refactored NuttX showing particularly low latency values.
Signed-off-by: ligd <liguiding1@xiaomi.com>
Now we have CONFIG_USEC_PER_TICK, and our timer system does all of its calculations in 'ticks'.
Every timespec has to be converted to 'ticks' before calling wd_start(), so USEC2TICK() cannot be avoided.
This means there is always a 'less than one tick' loss.
One resolution:
ticks++ anyway in wd_start(). But this can make the timer expire up to one tick late.
Another resolution:
Change the test case and allow the following logic:
t1 = current_time();
sleep(3);
t2 = current_time();
allow: t2 - t1 >= 3;
(the original test required: t2 - t1 > 3)
The original test assumes time must be strictly elapsing, so (t2 - t1) must be greater than 3,
but in our system we use 'tick' as the minimal wdog unit, so there is inherently a precision loss.
Now we choose the first resolution.
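A worked example of the precision loss, assuming USEC_PER_TICK = 10000 (a 10 ms tick):
request: sleep(3) -> USEC2TICK(3000000) = 300 ticks
the call arrives 9.9 ms after the previous tick interrupt
without ticks++: the wdog fires on the 300th tick, 0.1 ms + 299 * 10 ms = 2.9901 s later, so t2 - t1 < 3
with ticks++:    the wdog fires on the 301st tick, 0.1 ms + 300 * 10 ms = 3.0001 s later, so t2 - t1 >= 3 always holds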
Signed-off-by: ligd <liguiding1@xiaomi.com>
This patch addresses an issue where the elapsed time was incorrectly calculated.
Signed-off-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>
Signed-off-by: ligd <liguiding1@xiaomi.com>
For nested interrupts, one thing should be made clear:
we are in ISR context, but that does not mean interrupts are disabled.
Signed-off-by: ligd <liguiding1@xiaomi.com>
This patch moves g_wdtimernested to wd_start.c.
Signed-off-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>
Signed-off-by: ligd <liguiding1@xiaomi.com>
If g_wdactivelist has been changed in the wdog callback, traversing the list with the next pointer will cause problems.
Signed-off-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>
Signed-off-by: ligd <liguiding1@xiaomi.com>
This commit refactors the wdog module to use absolute time representation internally. The main improvements include:
1. Fixed recursive watchdog handling caused by calling wd_start within a watchdog timeout callback function.
2. Simplified timer processing to improve performance and enhance code readability.
3. Improved accuracy of timers.
4. Reduced critical section and interrupt disable time, improving real-time performance.
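A minimal sketch of the representation change (hypothetical names; not the actual wdog structures):

#include <stdbool.h>
#include <stdint.h>

struct wdog_sketch_s
{
  uint64_t expired;                        /* Absolute expiration tick */
};

static bool wdog_is_due(const struct wdog_sketch_s *w, uint64_t now)
{
  /* With absolute times, the timer interrupt only compares ticks; re-arming
   * a wdog from its own callback no longer disturbs any relative (delta)
   * bookkeeping of the entries behind it.
   */

  return (int64_t)(now - w->expired) >= 0;  /* Wrap-safe comparison */
}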
Signed-off-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>
Signed-off-by: ligd <liguiding1@xiaomi.com>
reason:
In SMP, when a context switch occurs, restore_critical_section is executed.
In order to reduce the time taken for context switching,
we inline the restore_critical_section function.
Given that restore_critical_section is small in size
and is called from only one location, inlining it does not increase the size of the image.
Signed-off-by: hujun5 <hujun5@xiaomi.com>
The file descriptors of kernel threads should be clean and should
not be generated based on user threads. Instead, an idle thread
should be chosen.
Signed-off-by: dongjiuzhu1 <dongjiuzhu1@xiaomi.com>
Configuring and compiling NuttX:
$ ./tools/configure.sh -l qemu-armv8a:nsh_smp
$ make
Running with qemu
$ qemu-system-aarch64 -cpu cortex-a53 -smp 4 -nographic \
-machine virt,virtualization=on,gic-version=3 \
-net none -chardev stdio,id=con,mux=on -serial chardev:con \
-mon chardev=con,mode=readline -kernel ./nuttx
reason:
If the type of tcb is TSTATE_TASK_ASSIGNED, removing it using nxsched_remove_not_running
and then putting it back into the queue may result in a context switch.
Signed-off-by: hujun5 <hujun5@xiaomi.com>
reason:
1 We place interrupt handling functions of the same priority into the work queue corresponding
to that priority, allowing high-priority interrupts to preempt low-priority ones,
thus ensuring the real-time performance of high-priority interrupts.
2 The sole purpose of the interrupt handler is to wake
up the work queue of the corresponding priority and execute the interrupt handling function.
3 Compared to the functionality of ISR threads, this
approach saves more memory, particularly when the number of interrupts is large.
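For illustration, the intended shape of such an interrupt handler (hypothetical names; HPWORK stands in for the work queue whose priority matches the interrupt):

#include <nuttx/wqueue.h>

static struct work_s g_irq_work;

static void my_irq_worker(FAR void *arg)
{
  /* The real interrupt handling runs here, in thread context, where it can
   * be preempted by work queues of higher priority.
   */
}

static int my_isr(int irq, FAR void *context, FAR void *arg)
{
  /* The ISR's only job is to wake the matching-priority work queue */

  return work_queue(HPWORK, &g_irq_work, my_irq_worker, arg, 0);
}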
Signed-off-by: hujun5 <hujun5@xiaomi.com>
Since pthread_mutex is implemented on top of sem, it is impossible to see in ps who holds the lock and is causing the wait.
Replace the sem-based implementation with mutex to solve this problem.
Signed-off-by: yinshengkai <yinshengkai@xiaomi.com>
reason:
Since the SMP call handler may lead to a context switch,
we need to update the context information by calling up_cpu_paused_[save|restore].
Signed-off-by: hujun5 <hujun5@xiaomi.com>