Background setup:
This site has VMware vSphere 5.0 hosts connecting to NFS datastores on a NetApp SAN/NAS. A dedicated stack of Dell PowerConnect 5524 switches sits between the NetApp and the VMware hosts.
Issue description:
Over the last couple of weeks I have seen VMware virtual machines pause or, in some cases, drop user sessions. The Windows event log on the affected VMs consistently recorded Event ID 129 with a Source of LSI_SAS: "Reset to device, \Device\RaidPort0, was issued." Further research showed that this event is usually logged when storage is slow to respond, typically because of high I/O on the SAN, and the guest's storage driver resets the device after a request times out. However, the SAN at this location wasn’t experiencing high I/O.
I started to notice the following NFS disconnect error while I was logged into the SAN:
nfsd.tcp.close.idle.notify:warning]: Shutting down idle connection to client (192.168.1.10) where receive side flow control has been enabled. There are 0 bytes in the receive buffer.
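To see which hosts are hitting this warning and how often, a small script can tally the messages per client IP from exported NetApp log text. This is a minimal sketch, assuming the log lines are available as plain text and that the message wording matches the warning quoted above; the function name `count_idle_closes` is hypothetical, and the regex may need adjusting for your syslog format (timestamps or hostnames before the message are fine because `re.search` matches anywhere in the line).

```python
import re
from collections import Counter

# Matches the Data ONTAP warning quoted above. The client IP is captured so
# repeat offenders can be counted. The wording is taken from the message in
# this post; other ONTAP versions may phrase it differently (assumption).
IDLE_CLOSE = re.compile(
    r"nfsd\.tcp\.close\.idle\.notify:warning\]:\s+"
    r"Shutting down idle connection to client \((?P<ip>[0-9.]+)\) "
    r"where receive side flow control has been enabled"
)


def count_idle_closes(lines):
    """Return a Counter mapping client IP -> number of idle-connection shutdowns."""
    counts = Counter()
    for line in lines:
        match = IDLE_CLOSE.search(line)
        if match:
            counts[match.group("ip")] += 1
    return counts
```

For example, `count_idle_closes(open("messages.txt"))` (a hypothetical export file) returns a `Counter`, and `.most_common()` lists the vSphere hosts with the most disconnects first.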
Resolution:
Per NetApp’s best practice document (TR-3749), flow control should be disabled on the storage network when using modern network hardware. I had flow control enabled on both the switches and the SAN, and this appears to have been the cause of the disconnects.
http://media.netapp.com/documents/tr-3749.pdf