Blog: SAN

Simple erasure of a disk (or thumb drive): a Windows 7 “full” format overwrites every byte on the disk with zeros. This behavior began with Windows Vista and carries over to Windows 7. It can cause problems for virtual machines running on a SAN. Here is the statement from Microsoft KB 941961:
 
The format command behavior has changed in Windows Vista. By default in Windows Vista, the format command writes zeros to the whole disk when a full format is performed. In Windows XP and in earlier versions of the Windows operating system, the format command does not write zeros to the whole disk when a full format is performed.

The new format behavior may cause problems for the on-demand allocation modes that a volume storage provider, such as a Storage Area Network (SAN), supports. Problems may occur because the new format behavior prematurely triggers allocation of the backing space.

In the on-demand scenario, zeros do not have to be written to the whole disk because the volume storage provider initializes the on-demand-allocated data. To avoid causing unnecessary on-demand-allocation, you must use the quick format option.
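
As a small illustration of the KB's advice, here is a minimal Python sketch that builds a quick-format command line for the Windows format command. The function name quick_format_command is made up for this example, and the sketch only prints the command rather than running it, since formatting is destructive.

```python
def quick_format_command(drive_letter, filesystem="NTFS"):
    """Build the argument list for a quick format of one volume.

    Hypothetical helper for illustration; it never runs the command.
    """
    letter = drive_letter.strip().rstrip(":").upper()
    if len(letter) != 1 or not letter.isalpha():
        raise ValueError(f"not a drive letter: {drive_letter!r}")
    # /Q asks format for a quick format, which avoids the zero-fill pass
    # that triggers on-demand allocation on the SAN.
    return ["format", f"{letter}:", f"/FS:{filesystem}", "/Q"]


# Print the command so it can be reviewed before anyone runs it by hand.
print(" ".join(quick_format_command("e")))
```

Double-check the drive letter before running the printed command on a real system.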


 

Background setup:

This site has VMware vSphere 5.0 hosts connecting to NFS datastores on a NetApp SAN/NAS.  A dedicated stack of Dell PowerConnect 5524 switches sits between the NetApp and the VMware hosts.

Issue description:

Over the last couple of weeks I have been seeing VMware virtual machines pause or, in some cases, disconnect sessions.  The Windows event log consistently recorded Event ID 129 with a source of LSI_SAS: "Reset to device, \Device\RaidPort0, was issued."  I did some further research and found that this event is usually generated when there is high I/O on the SAN.  However, the SAN at this location wasn’t experiencing high I/O.

I started to notice the following NFS disconnect error while I was logged into the SAN:
nfsd.tcp.close.idle.notify:warning]: Shutting down idle connection to client (192.168.1.10) where receive side flow control has been enabled. There are 0 bytes in the receive buffer.

Resolution:

Per NetApp’s best practice document, flow control should be disabled on the storage network when using modern hardware.  I had flow control enabled on both the switch and the SAN, and this was apparently causing the disconnects.
http://media.netapp.com/documents/tr-3749.pdf
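
To see how often the idle-connection warning fires and which clients it hits, the filer's messages can be scanned with a few lines of Python. This is a rough sketch: the function name count_idle_closes is invented here, and the pattern is based only on the single log line quoted in this post, so other Data ONTAP versions may word the message differently.

```python
import re
from collections import Counter

# Pattern derived from the one warning quoted in this post (an assumption
# about the general format, not a documented NetApp log specification).
IDLE_CLOSE = re.compile(
    r"nfsd\.tcp\.close\.idle\.notify.*?"
    r"Shutting down idle connection to client \((\d{1,3}(?:\.\d{1,3}){3})\)"
)


def count_idle_closes(lines):
    """Count idle-connection shutdown warnings per NFS client IP."""
    counts = Counter()
    for line in lines:
        match = IDLE_CLOSE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "nfsd.tcp.close.idle.notify:warning]: Shutting down idle connection to "
    "client (192.168.1.10) where receive side flow control has been enabled. "
    "There are 0 bytes in the receive buffer.",
    "an unrelated log line",
]
print(count_idle_closes(sample))
```

Comparing the counts before and after disabling flow control is a simple way to confirm that the change helped.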


 

I found out last week how easily one can get a certificate from GoDaddy with a SAN (Subject Alternative Name) for a non-registered domain name. This includes domains that end in .dom or .local, which have no public registrar. Since GoDaddy cannot retrieve a WHOIS record for the domain, their authorization email only needs to be approved by the account that requested the certificate. This vulnerability removes a significant barrier to a man-in-the-middle attack, since the certificate would be trusted and the name would match the URL requested by users.

Additionally, Office 365 AD Sync (needed for password synchronization) will not work with these types of non-registrable DNS names in a UPN suffix. While the UPN suffix can be changed to differ from the domain name, the problem would not exist for domains that use names like “internal.registereddomain.com”.
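
One rough way to audit for certificates like the ones described above is to look through a certificate's Subject Alternative Name entries for non-public suffixes. The sketch below assumes the certificate is already available as the dictionary that Python's ssl.SSLSocket.getpeercert() returns. The function name internal_san_names and the suffix list (taken from the two examples in this post) are illustrative only.

```python
# Suffixes named in this post; extend the tuple for your own environment.
NON_PUBLIC_SUFFIXES = (".local", ".dom")


def internal_san_names(cert):
    """Return the DNS SAN entries that end in a non-public suffix.

    `cert` is expected in the dict form returned by
    ssl.SSLSocket.getpeercert(), where "subjectAltName" is a tuple of
    (type, value) pairs.
    """
    names = []
    for kind, value in cert.get("subjectAltName", ()):
        if kind == "DNS" and value.lower().endswith(NON_PUBLIC_SUFFIXES):
            names.append(value)
    return names


# Hand-written example data, not taken from a real certificate.
example_cert = {
    "subjectAltName": (
        ("DNS", "mail.example.com"),
        ("DNS", "exchange01.corp.local"),
        ("DNS", "INTRANET.CORP.DOM"),
    )
}
print(internal_san_names(example_cert))
```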


 

The order of shutdown on a multi-shelf SAN is important. This is especially so for situations where there are Vdisks that span the shelves in the SAN. There is apparently a timestamp (specifically on the MSA 2000 series) that the controller keeps current on each of the drives in the Vdisk set and these must match for the controller to bring the Vdisk online after a power cycle.

You should power off the main controller shelf first, then any secondary shelves so that the timestamps written by the controller will be consistent.

When powering on, power on the secondary shelves first and then the main controller shelf.
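
The ordering above can be written down as a tiny Python sketch for a runbook. The shelf names and both function names are placeholders invented for this example; the only rule encoded is the one described in this post (controller shelf off first, on last).

```python
def power_off_order(controller_shelf, expansion_shelves):
    """Controller shelf goes down first, then the secondary shelves."""
    return [controller_shelf, *expansion_shelves]


def power_on_order(controller_shelf, expansion_shelves):
    """Secondary shelves come up first, the controller shelf last."""
    return [*expansion_shelves, controller_shelf]


# Placeholder shelf names for illustration.
shelves = ["expansion-1", "expansion-2"]
print("Power off:", power_off_order("controller", shelves))
print("Power on: ", power_on_order("controller", shelves))
```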


 

I visited with an HP storage engineer at a conference and he told me that the I/O module on a 1510i does NOT keep the disk configuration information in its memory; instead, the disk configuration is written on each disk drive. Therefore, if the I/O module fails, you can replace it with another module and the drive configuration (RAID, LUNs, etc.) will not be affected. He also suggested that if the I/O module fails, you should move the cache memory from the old I/O controller to the new one before bringing up the system so that the cache will be flushed to the disks.  I definitely recommend contacting HP support to verify this if your I/O controller goes out, but it made me feel better about the recoverability of our SAN.  If you have a spare I/O module on hand, recovering from an I/O module failure should be easy (in theory).


 

When deploying a Microsoft cluster with shared disk resources on a SAN, be sure that the SAN controller supports “capacity extension” technology.  If it does not, it will be impossible to add storage capacity to the existing disk resource, and a new resource will have to be created.

Even after capacity is extended on the shared disk resource, Windows will not recognize it, because shared disk resources are basic disks and cannot be converted to dynamic.  The extra capacity can be used only by extending the partition with the diskpart command (use caution!).
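
For the diskpart step, one cautious approach is to generate the script text first and review it before feeding it to diskpart with its /s option. The Python sketch below only builds that text. The function name diskpart_extend_script is invented for this example, and the volume number is something you must confirm yourself from the output of "list volume".

```python
def diskpart_extend_script(volume_number):
    """Return diskpart script text that extends one volume into free space.

    Hypothetical helper for illustration; it does not run diskpart.
    """
    if not isinstance(volume_number, int) or volume_number < 0:
        raise ValueError("volume_number must be a non-negative integer")
    commands = [
        "rescan",                          # pick up the enlarged LUN
        "list volume",                     # record the layout in the output
        f"select volume {volume_number}",  # focus on the volume to grow
        "extend",                          # grow into adjacent free space
    ]
    return "\n".join(commands) + "\n"


# Volume 3 is an arbitrary example value.
print(diskpart_extend_script(3))
```

Saving the output to a file and reading it over before running diskpart against it gives one more chance to catch a wrong volume number.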