In the Linux kernel, the following vulnerability has been resolved:
net: openvswitch: fix race on port output
Assume the following setup on a single machine:
- an openvswitch instance with one bridge and default flows
- two network namespaces "server" and "client"
- two ovs interfaces "server" and "client" on the bridge
- for each ovs interface a veth pair with a matching name and 32 rx and
  tx queues
- move the ends of the veth pairs to the respective network namespaces
- assign ip addresses to each of the veth ends in the namespaces (they
  need to be in the same subnet)
- start some http server in the server network namespace
- test if a client in the client namespace can reach the http server
When following the actions below, the host has a chance of getting a cpu
stuck in an infinite loop:
- send a large number of parallel requests to the http server (around
  3000 curls should work)
- in parallel, delete the network namespace (do not delete interfaces or
  stop the server, just kill the namespace)
There is a low chance that this will cause the below kernel cpu stuck
message. If this does not happen, just retry.
Below there is also the output of bpftrace for the functions mentioned
in the output.
The series of events happening here is:
1. the network namespace is deleted, calling
   unregister_netdevice_many_notify somewhere in the process
2. this first sets NETREG_UNREGISTERING on both ends of the veth and
   then runs synchronize_net
3. it then calls call_netdevice_notifiers with NETDEV_UNREGISTER
4. this is then handled by dp_device_event, which calls
   ovs_netdev_detach_dev (if a vport is found, which is the case for
   the veth interface attached to ovs)
5. this removes the rx_handlers of the device but does not prevent
   packets from being sent to the device
6. dp_device_event then queues the vport deletion to work in
   background, as an ovs_lock is needed that we do not hold in the
   unregistration path
7. unregister_netdevice_many_notify continues to call
   netdev_unregister_kobject, which sets real_num_tx_queues to 0
8. port deletion continues (but details are not relevant for this issue)
9. at some future point the background task deletes the vport
If, after step 7 but before step 9, a packet is sent to the ovs vport
(which is not deleted at this point in time), the vport forwards it to
the dev_queue_xmit flow even though the device is unregistering.
In skb_tx_hash (which is called in the dev_queue_xmit path) there is
a while loop (taken if the packet has an rx_queue recorded) that is
infinite if dev->real_num_tx_queues is zero.
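To see why a zero queue count never terminates, here is a minimal user-space sketch of such a loop in C. It assumes the loop repeatedly subtracts the queue count from the recorded rx queue index until the index is in range; model_fold_rx_queue and MODEL_MAX_ITERATIONS are hypothetical names, and the iteration cap exists only so the model itself can return.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cap so this model terminates; the loop being modelled
 * has no such cap. */
#define MODEL_MAX_ITERATIONS 1000000u

/*
 * Fold a recorded rx queue index into the range [0, qcount) by repeated
 * subtraction. Returns true and stores the result in *txq if the loop
 * terminates, or false if the iteration cap is hit, which happens
 * exactly when qcount is 0 because the index then never shrinks.
 */
bool model_fold_rx_queue(uint16_t rx_queue, uint16_t qcount, uint16_t *txq)
{
    uint32_t hash = rx_queue;
    uint32_t iterations = 0;

    while (hash >= qcount) {
        if (++iterations > MODEL_MAX_ITERATIONS)
            return false; /* would spin forever without the cap */
        hash -= qcount;
    }
    *txq = (uint16_t)hash;
    return true;
}
```

With qcount of 0 the condition hash >= qcount holds for every unsigned value and the subtraction changes nothing, so only the cap ends the loop.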
To prevent this from happening we update do_output to handle devices
without carrier the same as if the device is not found (which would
be the code path after 9. is done).
Additionally we now produce a warning in skb_tx_hash if we will hit
the infinite loop.
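The do_output change can be illustrated with a user-space sketch in C. This is not the kernel patch: struct model_dev, struct model_vport and model_do_output are hypothetical stand-ins, and the sketch assumes, as the description implies, that a device in the unregistration window reports no carrier.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the device and vport. */
struct model_dev {
    bool carrier_ok;                 /* false once the device has no carrier */
    unsigned int real_num_tx_queues;
};

struct model_vport {
    struct model_dev *dev;
};

enum model_verdict { MODEL_TRANSMIT, MODEL_DROP };

/*
 * Output decision after the fix: a vport whose device has no carrier is
 * treated exactly like a missing vport, so the packet never reaches the
 * transmit path of a device with zero tx queues.
 */
enum model_verdict model_do_output(const struct model_vport *vport)
{
    if (vport != NULL && vport->dev != NULL && vport->dev->carrier_ok)
        return MODEL_TRANSMIT;
    return MODEL_DROP;
}
```

The point of the design is that the drop path for a missing vport already exists and is known to be safe, so routing the no-carrier case into it needs no new handling.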
bpftrace (first word is function name):
_dev_queue_xmit server: real_num_tx_queues: 1, cpu: 2, pid: 28024, tid: 28024, skb_addr: 0xffff9edb6f207000, reg_state: 1
netdev_core_pick_tx server: addr: 0xffff9f0a46d4a000 real_num_tx_queues: 1, cpu: 2, pid: 28024, tid: 28024, skb_addr: 0xffff9edb6f207000, reg_state: 1
dp_device_event server: real_num_tx_queues: 1 cpu 9, pid: 21024, tid: 21024, event 2, reg_state: 1
synchronize_rcu_expedited: cpu 9, pid: 21024, tid: 21024
synchronize_rcu_expedited: cpu 9, pid: 21024, tid: 21024
synchronize_rcu_expedited: cpu 9, pid: 21024, tid: 21024
synchronize_rcu_expedited: cpu 9, pid: 21024, tid: 21024
dp_device_event server: real_num_tx_queues: 1 cpu 9, pid: 21024, tid: 21024, event 6, reg_state: 2
ovs_netdev_detach_dev server: real_num_tx_queues: 1 cpu 9, pid: 21024, tid: 21024, reg_state: 2
netdev_rx_handler_unregister server: real_num_tx_queues: 1, cpu: 9, pid: 21024, tid: 21024, reg_state: 2
synchronize_rcu_expedited: cpu 9, pid: 21024, tid: 21024
netdev_rx_handler_unregister ret server: real_num_tx_queues: 1, cpu: 9, pid: 21024, tid: 21024, reg_state: 2
dp
---truncated---