Fix Deadloop in VFS if CONFIG_CANCELLATION_POINTS is enabled

If cancellation points are enabled, then the following logic is activated in sem_wait().  This causes ECANCELED to be returned every time that sem_wait is called.

    int sem_wait(FAR sem_t *sem)
    {
      ...

      /* sem_wait() is a cancellation point */

      if (enter_cancellation_point())
        {
    #ifdef CONFIG_CANCELLATION_POINTS
          /* If there is a pending cancellation, then do not perform
           * the wait.  Exit now with ECANCELED.
           */

          errcode = ECANCELED;
          goto errout_with_cancelpt;
    #endif
        }
      ...

Normally this works fine.  sem_wait() is the OS API called by the application and will cancel the thread just before it returns to the application.  Since it is cancellation point, it should never be called from within the OS.

There there is is one perverse cases where sem_wait() may be nested within another cancellation point.  If open() is called, it will attempt to lock a VFS data structure and will eventually call nxmutex_lock().  nxmutex_lock() waits on a semaphore:

   int nxmutex_lock(FAR mutex_t *mutex)
   {
     ...

     for (; ; )
       {
         /* Take the semaphore (perhaps waiting) */

         ret = _SEM_WAIT(&mutex->sem);
         if (ret >= 0)
           {
             mutex->holder = _SCHED_GETTID();
             break;
           }

         ret = _SEM_ERRVAL(ret);
         if (ret != -EINTR && ret != -ECANCELED)
           {
             break;
           }
       }
   ...
}

In the FLAT build, _SEM_WAIT expands to sem_wait().  That causes the error in the logic:  It should always expand to nxsem_wait().  That is because sem_wait() is cancellation point and should never be called from with the OS or the C library internally.

The failure occurs because the cancellation point logic in sem_wait() returns -ECANCELED (via _SEM_ERRVAL) because sem_wait() is nested; it needs to return the -ECANCELED error to the outermost cancellation point which is open() in this case.  Returning -ECANCELED then causes an infinite loop to occur in nxmutex_lock().

The correct behavior in this case is to call nxsem_wait() instead of sem_wait().  nxsem_wait() is identical to sem_wait() except that it is not a cancelation point.  It will return -ECANCELED if the thread is canceled, but only once.  So no infinite loop results.

In addition, an nxsem_wait() system call was added to support the call from nxmutex_lock().

This resolves Issue #9695
This commit is contained in:
Gregory Nutt 2023-07-06 08:15:18 -06:00 committed by Alan Carvalho de Assis
parent bc62699c1d
commit 8868c58720
3 changed files with 6 additions and 4 deletions

View file

@ -71,6 +71,8 @@ SYSCALL_LOOKUP(sethostname, 2)
/* Semaphores */
SYSCALL_LOOKUP(nxsem_wait, 1)
SYSCALL_LOOKUP(sem_destroy, 1)
SYSCALL_LOOKUP(sem_post, 1)
SYSCALL_LOOKUP(sem_clockwait, 3)

View file

@ -27,6 +27,7 @@
#include <nuttx/sched.h>
#include <nuttx/clock.h>
#include <nuttx/mutex.h>
#include <nuttx/semaphore.h>
/****************************************************************************
* Pre-processor Definitions
@ -200,15 +201,13 @@ int nxmutex_lock(FAR mutex_t *mutex)
{
/* Take the semaphore (perhaps waiting) */
ret = _SEM_WAIT(&mutex->sem);
ret = nxsem_wait(&mutex->sem);
if (ret >= 0)
{
mutex->holder = _SCHED_GETTID();
break;
}
ret = _SEM_ERRVAL(ret);
if (ret != -EINTR && ret != -ECANCELED)
else if (ret != -EINTR && ret != -ECANCELED)
{
break;
}

View file

@ -84,6 +84,7 @@
"nx_pthread_exit","nuttx/pthread.h","!defined(CONFIG_DISABLE_PTHREAD)","noreturn","pthread_addr_t"
"nx_vsyslog","nuttx/syslog/syslog.h","","int","int","FAR const IPTR char *","FAR va_list *"
"nxsched_get_stackinfo","nuttx/sched.h","","int","pid_t","FAR struct stackinfo_s *"
"nxsem_wait","nuttx/semaphore.h","","int","FAR sem_t *"
"open","fcntl.h","","int","FAR const char *","int","...","mode_t"
"pgalloc", "nuttx/arch.h", "defined(CONFIG_BUILD_KERNEL)", "uintptr_t", "uintptr_t", "unsigned int"
"pipe2","unistd.h","defined(CONFIG_PIPES) && CONFIG_DEV_PIPE_SIZE > 0","int","int [2]|FAR int *","int"

Can't render this file because it has a wrong number of fields in line 2.