Commit graph

23 commits

Author SHA1 Message Date
zhanghongyu
b934555fd1 mm/iob: Support alloc IOB via malloc
Support the network interface card driver to receive zero copies of packets and send and receive giant frame packets, allowing drivers to initialize the DMA buffer to the iob structure, and we can apply for IOB with large memory

Signed-off-by: zhanghongyu <zhanghongyu@xiaomi.com>
2024-04-26 01:06:21 +08:00
zhanghongyu
13f59128fd iob: limit the iob bufsize is sufficient to fill all L2/L3/L4 headers
rndis header length is 36, L2 header is 14, IPv6 header is 40, tcp header is 56 when sack option count is 4(default max_ofosegs is 4). so the iob bufsize should greater than we need.

Signed-off-by: zhanghongyu <zhanghongyu@xiaomi.com>
2023-09-20 14:35:22 +08:00
chao an
e6b37f2b2d mm/iob: revert "modify iob to support header padding and alignment features"
we don't need to implement l2 isolation through io_head, iob offload will use io_offset

-------------------------------------------------------------
Layout of different NICs implementation:

        iob_data (aligned by CONFIG_IOB_ALIGNMENT)
            |
            |                 io_offset(CONFIG_NET_LL_GUARDSIZE)
            |                                |
            -------------------------------------------------
 Ethernet   |       Reserved    | ETH_HDRLEN |    io_len    |
            ---------------------------------|---------------
 8021Q      |   Reserved  | ETH_8021Q_HDRLEN |    io_len    |
            ---------------------------------|---------------
 ipforward  |            Reserved            |    io_len    |
            -------------------------------------------------

--------------------------------------------------------------------

Signed-off-by: chao an <anchao@xiaomi.com>
2022-12-21 01:43:02 +08:00
chao an
34d2cde8a8 net/l2/l3/l4: add support of iob offload
1. Add new config CONFIG_NET_LL_GUARDSIZE to isolation of l2 stack,
   which will benefit l3(IP) layer for multi-MAC(l2) implementation,
   especially in some NICs such as celluler net driver.

new configuration options: CONFIG_NET_LL_GUARDSIZE

CONFIG_NET_LL_GUARDSIZE will reserved l2 buffer header size of
network buffer to isolate the L2/L3 (MAC/IP) data on network layer,
which will be beneficial to L3 network layer protocol transparent
transmission and forwarding

------------------------------------------------------------
Layout of frist iob entry:

        iob_data (aligned by CONFIG_IOB_ALIGNMENT)
            |
            |                  io_offset(CONFIG_NET_LL_GUARDSIZE)
            |                                |
            -------------------------------------------------
      iob   |            Reserved            |    io_len    |
            -------------------------------------------------

-------------------------------------------------------------
Layout of different NICs implementation:

        iob_data (aligned by CONFIG_IOB_ALIGNMENT)
            |
            |                 io_offset(CONFIG_NET_LL_GUARDSIZE)
            |                                |
            -------------------------------------------------
 Ethernet   |       Reserved    | ETH_HDRLEN |    io_len    |
            ---------------------------------|---------------
 8021Q      |   Reserved  | ETH_8021Q_HDRLEN |    io_len    |
            ---------------------------------|---------------
 ipforward  |            Reserved            |    io_len    |
            -------------------------------------------------

--------------------------------------------------------------------

2. Support iob offload to l2 driver to avoid unnecessary memory copy

Support send/receive iob vectors directly between the NICs and l3/l4
stack to avoid unnecessary memory copies, especially on hardware that
supports Scatter/gather, which can greatly improve performance.

new interface to support iob offload:

  ------------------------------------------
  |    IOB version     |     original      |
  |----------------------------------------|
  |  devif_iob_poll()  |   devif_poll()    |
  |       ...          |       ...         |
  ------------------------------------------

--------------------------------------------------------------------

1> NIC hardware support Scatter/gather transfer

TX:

                tcp_poll()/udp_poll()/pkt_poll()/...(l3|l4)
                           /              \
                          /                \
devif_poll_[l3|l4]_connections()     devif_iob_send() (nocopy:udp/icmp/...)
           /                                   \      (copy:tcp)
          /                                     \
  devif_iob_poll("NIC"_txpoll)                callback() // "NIC"_txpoll
                                                  |
                            dev->d_iob:           |
                                                ---------------         ---------------
                             io_data       iob1 |  |          |    iob3 |  |          |
                                    \           ---------------         ---------------
                                  ---------------  |       --------------- |
                             iob0 |  |          |  |  iob2 |  |          | |
                                  ---------------  |       --------------- |
                                     \             |          /           /
                                        \          |       /           /
                                   ----------------------------------------------
                    NICs io vector |    |    |    |    |    |    |    |    |    |
                                   ----------------------------------------------

RX:

  [tcp|udp|icmp|...]ipv[4|6]_data_handler()(iob_concat/append to readahead)
                    |
                    |
      [tcp|udp|icmp|...]_ipv[4|6]_in()/...
                    |
                    |
          pkt/ipv[4/6]_input()/...
                    |
                    |
     NICs io vector receive(iov_base to each iobs)

--------------------------------------------------------------------

2> CONFIG_IOB_BUFSIZE is greater than MTU:

TX:

"(CONFIG_IOB_BUFSIZE) > (MAX_NETDEV_PKTSIZE + CONFIG_NET_GUARDSIZE + CONFIG_NET_LL_GUARDSIZE)"

                tcp_poll()/udp_poll()/pkt_poll()/...(l3|l4)
                           /              \
                          /                \
devif_poll_[l3|l4]_connections()     devif_iob_send() (nocopy:udp/icmp/...)
           /                                   \      (copy:tcp)
          /                                     \
  devif_iob_poll("NIC"_txpoll)                callback() // "NIC"_txpoll
                                                  |
                                             "NIC"_send()
                          (dev->d_iob->io_data[CONFIG_NET_LL_GUARDSIZE - NET_LL_HDRLEN(dev)])

RX:

  [tcp|udp|icmp|...]ipv[4|6]_data_handler()(iob_concat/append to readahead)
                    |
                    |
      [tcp|udp|icmp|...]_ipv[4|6]_in()/...
                    |
                    |
          pkt/ipv[4/6]_input()/...
                    |
                    |
     NICs io vector receive(iov_base to io_data)

--------------------------------------------------------------------

3> Compatible with all old flat buffer NICs

TX:
                tcp_poll()/udp_poll()/pkt_poll()/...(l3|l4)
                           /              \
                          /                \
devif_poll_[l3|l4]_connections()     devif_iob_send() (nocopy:udp/icmp/...)
           /                                   \      (copy:tcp)
          /                                     \
  devif_iob_poll(devif_poll_callback())  devif_poll_callback() /* new interface, gather iobs to flat buffer */
       /                                           \
      /                                             \
 devif_poll("NIC"_txpoll)                     "NIC"_send()(dev->d_buf)

RX:

  [tcp|udp|icmp|...]ipv[4|6]_data_handler()(iob_concat/append to readahead)
                    |
                    |
      [tcp|udp|icmp|...]_ipv[4|6]_in()/...
                    |
                    |
               netdev_input()  /* new interface, Scatter/gather flat/iob buffer */
                    |
                    |
          pkt/ipv[4|6]_input()/...
                    |
                    |
    NICs io vector receive(Orignal flat buffer)

3. Iperf passthrough on NuttX simulator:

  -------------------------------------------------
  |  Protocol      | Server | Client |            |
  |-----------------------------------------------|
  |  TCP           |  813   |   834  |  Mbits/sec |
  |  TCP(Offload)  | 1720   |  1100  |  Mbits/sec |
  |  UDP           |   22   |   757  |  Mbits/sec |
  |  UDP(Offload)  |   25   |  1250  |  Mbits/sec |
  -------------------------------------------------

Signed-off-by: chao an <anchao@xiaomi.com>
2022-12-03 11:47:04 +08:00
Xiang Xiao
130b196876 Refine how to specify iob and ramlog data section
1.Remove the default value(.bss)
2.Remove !ARCH_SIM dependence

Signed-off-by: Xiang Xiao <xiaoxiang@xiaomi.com>
2022-08-25 14:05:17 +02:00
luojun1
bee9dcc73a Add Kconfig option, user can specify the location of g_iob_buffer
Signed-off-by: luojun1 <luojun1@xiaomi.com>
2022-08-14 21:00:28 +08:00
luojun1
4e62772be5 modify iob to support header padding and alignment features
Signed-off-by: luojun1 <luojun1@xiaomi.com>
2022-08-14 21:00:28 +08:00
YAMAMOTO Takashi
7599ca4e95 Kconfig: mention that IOB_NCHAINS is not used by TCP anymore 2021-06-30 06:40:13 -05:00
YAMAMOTO Takashi
6b9d2fef00 mm/iob/Kconfig: Fix a typo (other other -> other) 2021-03-16 02:07:34 -07:00
Xiang Xiao
346336bb9e Make the read ahead buffer unselectable
Here is the email loop talk about why it is better to remove the option:
https://groups.google.com/forum/#!topic/nuttx/AaNkS7oU6R0

Change-Id: Ib66c037752149ad4b2787ef447f966c77aa12aad
Signed-off-by: Xiang Xiao <xiaoxiang@xiaomi.com>
2020-01-11 08:24:49 -06:00
Xiang Xiao
c5a7da5b7a mm/iob/Kconfig: Make the default of IOB_NCHAINS same as IOB_NBUFFERS. It is reasonable default value since iob_qentry_s is much small than iob_s and could avoid the buffer can be allocated but the chain can't: 'tcp_datahandler: ERROR: Failed to queue the I/O buffer chain: -12' 2019-11-24 10:43:45 -06:00
Xiang Xiao
fe421022e2 sched/wqueue/kwork_notifier.c and several Kconfig files: Notifier should work with either lpwork or hpwork and other minor typo fix. 2019-01-27 11:02:56 -06:00
Gregory Nutt
22da175629 mm/iob: Add a divider that can be used to reduce the rate of IOB notifications. 2018-09-13 06:56:29 -06:00
Gregory Nutt
af0ee3c8f7 sched/wqueue: Add an option to work queue notifier so that the notification can occur on different work queues. 2018-09-11 07:22:23 -06:00
Gregory Nutt
9d3148406c Signals were not a good choice of IPC to implement the poll function for several reasons: In order to handle the asynchrnous poll-related event, a substantial amount of state information is needed. Signals are only capable of passing minimal amounts of data. There are also complexities with performing kernel space signal handlers in kernel space code that is better to avoid. So, instead of signals, the equivalent logic was converted to run via a callback that executes on the high-priority work queue.
Squashed commit of the following:

    Fix up some final compile isses.

    net/netdev:  Convert the network down notification logic to use the new wqueue-based notification factility.

    net/udp:  Convert the UDP readahead notification logic to use the new wqueue-based notification factility.

    net/tcp:  Convert the TCP readahead notification logic to use the new wqueue-based notification factility.

    mm/iob:  Convert the IOB notification logic to use the new wqueue-based notification factility.

    sched/wqueue:  Signals are not good IPCs to support the target poll functionality for several reasons including the amount of data that can be passed with a signal and in the fact that in protected and kernel modes, user threads executing signal handlers in protected, kernel memory is problematic.  Instead, convert the same logic to perform the notifications via function callback on the high priority work queue.
2018-09-09 15:01:44 -06:00
Gregory Nutt
32e3e51678 net/netdev: Add signal notification for the case where the network goes down. 2018-09-09 10:39:25 -06:00
Gregory Nutt
28f73bd928 net/tcp and udp: Add logic to signal events when TCP or UDP read-ahead data is buffered.
Squashed commit of the following:

    net/tcp:  Add signal notification for the case when UDP read-ahead data is buffered.  This is basically of clone of the TCP notification logic with naming adapted for UDP.

    net/tcp:  Add signal notification for the case when TCP read-ahead data is buffered.
2018-09-09 09:21:39 -06:00
Gregory Nutt
fc127fd297 sched/signal: Add a generic signal notification facility. Modify the custom IOB available notifier so that it is now just a wrapper around this generic signal notification. This generic signal notification faility will, eventually, be used to support network polling.
Squashed commit of the following:

    mm/iob:  The IOB available notifier is now just a wrapper around the common signal notifier.

    sched/signal:  Add a generic signal notification facility.

    sched/signal/sig_evthread.c:  More trivial naming changes.

    sched/signal:  Rename nxsig_notification() to nxsig_evthread() to make forthcoming naming additions more consistent.
2018-09-09 08:32:37 -06:00
Gregory Nutt
bba8541ad6 mm/iob: Add an IOB notifier that will send a signal to any registered threads that want to be notified when an IOB has been freed. This is an untested work-in-progress and is intended to be a part of a larger solution to correctly handling network poll operations. 2018-09-08 11:21:18 -06:00
Gregory Nutt
fef255e5be This commit adds an as-of-yet untested implemented of UDP write buffering.
Squashed commit of the following:

    net/udp:  Address most of the issues with UDP write buffering.  There is a remaining issue with handling one network going down in a multi-network environment.  None of this has been test but it is certainly ready for test.  Hence, the feature is marked EXPERIMENTAL.
    net/udp:  Some baby steps toward a corrected write buffering design.
    net/udp:  Remove pesky write buffer macros.
    Eliminate trailing space at the end of lines.
    net/udp:  A little more UDP write buffering logic.  Still at least on big gaping hole in the design.
    net/udp:  Undefined CONFIG_NET_SENDTO_TIMEOUT.
    net/udp:  Crude, naive port of the TCP write buffering logic into UDP.  This commit is certainly non-functional and is simply a starting point for the implementatin of UDP write buffering.
    net/udp:  Rename udp/udp_psock_sendto.c udp/udp_psock_sendto_unbuffered.c.
2018-01-22 18:32:02 -06:00
Gregory Nutt
2c00825dcf Porting Guide: Add description of IOBs. 2017-05-20 08:50:05 -06:00
Gregory Nutt
6a3800f611 There can be a failure in IOB allocation to some asynchronous behavior caused by the use of sem_post(). Consider this scenario:
Task A holds an IOB.  There are no further IOBs.  The value of semcount is zero.
Task B calls iob_alloc().  Since there are not IOBs, it calls sem_wait().  The v
alue of semcount is now -1.

Task A frees the IOB.  iob_free() adds the IOB to the free list and calls sem_post() this makes Task B ready to run and sets semcount to zero NOT 1.  There is one IOB in the free list and semcount is zero.  When Task B wakes up it would increment the sem_count back to the correct value.

But an interrupt or another task runs occurs before Task B executes.  The interrupt or other tak takes the IOB off of the free list and decrements the semcount.  But since semcount is then < 0, this causes the assertion because that is an invalid state in the interrupt handler.

So I think that the root cause is that there the asynchrony between incrementing the semcount.  This change separates the list of IOBs:  Currently there is only a free list of IOBs.  The problem, I believe, is because of asynchronies due sem_post() post cause the semcount and the list content to become out of sync.  This change adds a new 'committed' list:  When there is a task waiting for an IOB, it will go into the committed list rather than the free list before the semaphore is posted.  On the waiting side, when awakened from the semaphore wait, it will expect to find its IOB in the committed list, rather than free list.

In this way, the content of the free list and the value of the semaphore count always remain in sync.
2017-05-16 11:03:35 -06:00
Gregory Nutt
2043e1a114 IOBs: Move from driver/iob to a better location in mm/iob 2017-05-09 07:35:30 -06:00
Renamed from drivers/iob/Kconfig (Browse further)