Blog: VMware

One of our information security auditors recently had the motherboard on his laptop replaced to fix a "shuts down on its own" issue he'd been having for a while. When he got the laptop back, his BIOS-level fingerprint logins (which unlock the hard drive and the BitLocker key) were no longer working. Also, the x64 VMware virtual machine he uses for audits would no longer boot. The VM issue was pretty clear: the CPU virtualization setting in the BIOS was disabled and needed to be turned back on. The fingerprint issue, however, took a little more digging to figure out. Eventually we realized the TPM on the new motherboard was not activated. Once we activated and initialized the TPM, then turned BitLocker off and back on (without decrypting), all the pre-boot login information unlocked by the fingerprint started working again.


 

In working with XenServer over the past couple of months, I have found that information is harder to come by than it is with VMware. We are only using XenServer for one customer and they are using the free version, so support is not an option. Up until last week, I had no need to get into the CLI of Xen much. It’s pretty easy to configure via XenCenter and our setup is pretty simple. However, the other day, our monitoring software detected an issue where the network interfaces on one of the monitored VMs were logging a high number of discards. One of the peculiar things was that the discard counts were exactly the same for Tx and Rx.

After some research, I decided that it would be a good idea to turn off all the offloading features in XenServer. XenServer sees network interfaces in two forms: physical interfaces (pifs) and virtual interfaces (vifs). Pifs are the actual connections to the server. Vifs are the NIC interfaces of the VMs. Naturally, turning off all of this can only be done via the XenServer CLI. So, part one of the gotcha…here is a set of scripts that can help in manipulating network interfaces in XenServer.
 
Script to turn off all offloading techniques on all vifs and pifs:
====================================================
#!/bin/bash
# Disable NIC offloading on a XenServer host.
#   --local | -l : apply immediately to the host's Ethernet devices with ethtool
#   (no argument): set ethtool-*=off in other-config on every vif and pif via xe

# Offload features to turn off
if_modes="rx tx sg tso ufo gso"

if [[ "$1" == "--local" || "$1" == "-l" ]]; then
    echo -n "disabling checksum offloading for local devices... "
    for iface in $(ifconfig | awk '$0 ~ /Ethernet/ { print $1 }'); do
        for if_mode in ${if_modes}; do
            # Errors are discarded because not every device supports every feature
            ethtool -K "$iface" "$if_mode" off 2>/dev/null
        done
    done
    echo "done."
else
    echo -n "disabling checksum offloading in xapi settings... "
    # --minimal returns a comma-separated uuid list; sed turns it into words
    for VIF in $(xe vif-list --minimal | sed -e 's/,/ /g'); do
        ###xe vif-param-clear uuid=$VIF param-name=other-config
        for if_mode in ${if_modes}; do
            xe vif-param-set uuid="$VIF" other-config:ethtool-${if_mode}="off"
        done
    done
    for PIF in $(xe pif-list --minimal | sed -e 's/,/ /g'); do
        ###xe pif-param-clear uuid=$PIF param-name=other-config
        for if_mode in ${if_modes}; do
            xe pif-param-set uuid="$PIF" other-config:ethtool-${if_mode}="off"
        done
    done
    echo "done."
fi
====================================================
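- create a text script file (turnOffloadingOff) using vi
- change the permissions to make it executable
                        chmod 755 turnOffloadingOff
- run the script (add --local to change the host's own devices with ethtool instead of the xapi settings)
                        ./turnOffloadingOff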
 
Other Useful XenServer Commands
 
- determine uuids of physical interfaces on XenServer
                        xe pif-list host-name-label=<hostname>
- determine parameters of the specific pif given the uuid
                        xe pif-param-list uuid=<uuid of pif>
- determine uuids of virtual interfaces of VMs on XenServer
                        xe vif-list
- determine parameters of the specific vif given the uuid
                        xe vif-param-list uuid=<uuid>
- new VMs created after the script was run will NOT have these vif other-config settings, so set them by hand for each new vif
                        xe vif-param-set uuid=<uuid of vif> other-config:ethtool-gso="off"
                        xe vif-param-set uuid=<uuid of vif> other-config:ethtool-ufo="off"
                        xe vif-param-set uuid=<uuid of vif> other-config:ethtool-tso="off"
                        xe vif-param-set uuid=<uuid of vif> other-config:ethtool-sg="off"
                        xe vif-param-set uuid=<uuid of vif> other-config:ethtool-tx="off"
                        xe vif-param-set uuid=<uuid of vif> other-config:ethtool-rx="off"
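
If you would rather not type six lines for every new vif, the same calls can be wrapped in a small loop. This is only a sketch: the function name disable_vif_offload and the XE variable are my own illustration, not part of XenServer. By default it just prints the xe commands so you can check them first; set XE=xe on the XenServer host to actually run them.

```shell
#!/bin/bash
# Sketch: issue the six vif-param-set calls for a single vif.
# XE is a hypothetical override: it defaults to "echo xe" (dry run, prints
# the commands); export XE=xe on the XenServer host to apply them for real.
XE="${XE:-echo xe}"

disable_vif_offload() {
    local vif_uuid="$1" mode
    for mode in rx tx sg tso ufo gso; do
        # $XE is left unquoted on purpose so "echo xe" splits into two words
        $XE vif-param-set uuid="$vif_uuid" other-config:ethtool-"$mode"="off"
    done
}
```

For example, `disable_vif_offload <uuid of vif>` prints the six commands, and `XE=xe` followed by the same call applies them.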

So on to part two of this post…this didn’t fix the problem. After a ton of additional troubleshooting, I determined that this behavior is due to the Citrix paravirtual (PV) NIC driver. The issue goes away if you uninstall XenTools so that the PV driver isn’t used. On Windows 2008 and Windows Vista/7 VMs, the PV Ethernet driver reports discards to the OS. In Windows 2003 and XP, it does not. Keep in mind the discards could be broadcast packets not intended for the VM or miscellaneous DOM0 traffic. In any case, it doesn’t make much sense, but there isn’t actually anything wrong with the VM. I ended up removing the monitoring of the VM’s network interfaces because I did want to keep using the PV Ethernet driver.


 

Last month, I was working a maintenance window for a customer that has VMware View 4 installed. The plan for the window was to install all the updates on the master image, snapshot it, and recompose the pool using the updated image. During a recompose, View shuts down all the machines needing the update, deletes them from the inventory, copies out a new replica disk, recreates all the VMs, attaches them to the replica disk, and completes the setup process. This particular recompose could not delete one of the machines. The other machines finished the process normally and were ready to go, but this one machine simply timed out during the recompose process.

During my troubleshooting, I ended up killing the task and trying to delete the machine through the View console. No luck. I could delete the machine from the vSphere client, but then how would I clean it up from inside View?

http://kb.vmware.com/kb/1008658

This article provides the steps to manually remove a linked clone entry from VMware View. The basic steps include:

  1. Remove the VM from the ADAM database
  2. Remove the linked clone reference from the View Composer database
  3. Delete the machine from vCenter

At that point, you can re-enable provisioning and everything should start working as normal once again.


 

If your virtual disk is at or close to the maximum size allowed by the file system, you might be unable to take snapshots due to overhead added by the snapshot process.  This failure occurs when the snapshot file at its maximum size would be unable to fit into a datastore. 

The failure depends on the size of the virtual disk. Any virtual machine with a disk at the maximum size supported by VMFS may experience this error. Overhead for the snapshot is roughly 2GB for a disk size of 256GB. If snapshots are to be used, consider the overhead while deciding the size of the disks. See VMware's documentation for the maximum file sizes for the different versions of VMFS.
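
As a rough planning aid, the arithmetic can be sketched in a couple of shell functions. This assumes the overhead scales linearly from the 2GB per 256GB figure above, which is my assumption for illustration, and the function names are made up. Check the real limits for your VMFS version before relying on the numbers.

```shell
#!/bin/bash
# Estimated snapshot overhead in GB, rounded up.
# ASSUMPTION: linear scaling from ~2GB per 256GB, i.e. 1GB per 128GB.
snapshot_overhead_gb() {
    local disk_gb="$1"
    echo $(( (disk_gb + 127) / 128 ))
}

# Succeeds (exit 0) if the disk plus its estimated snapshot overhead
# still fits under the maximum file size of the datastore (both in GB).
fits_with_snapshot() {
    local disk_gb="$1" max_file_gb="$2"
    [ $(( disk_gb + $(snapshot_overhead_gb "$disk_gb") )) -le "$max_file_gb" ]
}
```

With a 256GB file size limit, `fits_with_snapshot 256 256` fails while `fits_with_snapshot 254 256` succeeds, which is the kind of headroom to leave when sizing the disk.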


 

Block-level vmdk backups have limitations that will GET YOU. Backup Exec and Veeam both have the ability to back up the vmdk files in a VMware environment and still retain enough information in the backup set to do individual file-level restores. However, both products will ONLY work if you have vmdk disks partitioned using MBR (Master Boot Record) partition tables and NOT the more modern GPT (GUID Partition Table) structure.
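
If you are not sure which partition table a disk uses, you can look at its first two sectors. The sketch below is meant for a Linux guest's disk device or a raw disk image; the function name is hypothetical, and it assumes 512-byte sectors (on 4K-native disks the GPT header sits at a different offset). A GPT disk carries the ASCII signature "EFI PART" at the start of the second sector, while a plain MBR disk only has the 0x55AA boot signature at the end of the first.

```shell
#!/bin/bash
# Sketch: report "gpt", "mbr" or "unknown" for a raw disk device or image.
# ASSUMPTION: 512-byte logical sectors.
partition_table_type() {
    local disk="$1" gpt_sig boot_sig
    # Bytes 512-519 (start of LBA 1) as hex; "EFI PART" = 45 46 49 20 50 41 52 54
    gpt_sig=$(dd if="$disk" bs=1 skip=512 count=8 2>/dev/null | od -An -tx1 | tr -d ' \n')
    if [ "$gpt_sig" = "4546492050415254" ]; then
        echo gpt
        return
    fi
    # Bytes 510-511 (end of LBA 0) hold the MBR boot signature 55 AA
    boot_sig=$(dd if="$disk" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
    if [ "$boot_sig" = "55aa" ]; then
        echo mbr
    else
        echo unknown
    fi
}
```

For example, `partition_table_type /dev/sda` (run as root inside the guest). The GPT check comes first because GPT disks also carry a protective MBR with the same 0x55AA signature.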


 

A week or so ago, I had a task to do some work on the VMware cluster for one of our customers. It was late (~11:00 PM) when I finished the task, but I decided to go ahead and try to do a P2V conversion on one of the physical servers that we had targeted for virtualization a long time ago. It’s that server that sits in the corner and just keeps running…and nobody knows a thing about it. Everybody is just happy it keeps running because it’s on 10-year-old hardware. It’s the poster child for virtualization. I was 90% sure that the conversion wouldn’t even kick off because the server has such an odd-ball configuration, but to my surprise it worked fine. The P2V conversion was done in about 90 minutes (~12:30 AM). When I do these conversions, the first thing I do is start the VM up without the NICs connected just to see if the new VM will boot.

I knew it was too good to be true...BSOD with immediate reboot. Error message “Inaccessible boot device”. If it were any other error message, I probably would have just hung it up for the night, but I have fixed conversions that give this error a number of times. Most of the time it’s due to the existence of a recovery partition on the physical system, which leaves the boot.ini with the wrong partition numbers in the boot parameters…easy fix. I attached the vmdk to a helper VM to access the files. The boot.ini file looked fine and all the files necessary for boot made it over. So, I took a closer look at the conversion logs. I didn’t notice it while the conversion was running, but right at the end of the job, a warning was logged…“virtual machine reconfiguration failed”. So what is reconfiguration… never noticed that before.

The “reconfiguration” keyword as well as the stop code led me to the following VMware article: http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&externalId=1005208. Basically, the article provides instructions on how to inject the VMware SCSI controller driver into the physical machine pre-conversion. The reconfiguration task that failed on the conversion indicated that VMware was unable to inject the driver and registry keys into the VM post-conversion…ah yes, the odd-ball configuration. The server was built with only a 100 MB c:\ partition. The c:\ partition only contains the Windows boot files. The boot.ini references the \windows folder on partition(2), which is the d:\ drive. Why was the server built that way? Nobody knows. So, VMware tried to inject the drivers to fix the boot problem, but couldn’t because it doesn’t read the boot.ini to find out where the \windows folder is. It just assumes it will be at c:\windows and that c:\ will be partition(1). So, there is the cause…so how do I fix it? By this time, it’s 1:30 AM and I really didn’t want to wait 90 minutes for the conversion to run again. The article explains how to fix this problem. I just had to figure out how to make those changes post-conversion. Specifically, we want to follow the Windows 2003 (lsilogic controller) instructions. So here we go.

  1. Copying the symmpi.sys file to the %systemroot%\system32\drivers\ folder is pretty easy. Just mount the vmdk to a helper VM and copy the file from another Windows 2003 virtual machine (on VMware of course) to the target vmdk.
  2. The registry edits are where it gets a little tricky. The goal in this step is to manually load the registry hives for the VM that won’t boot and inject the .reg file changes into the registry.
    • Follow the article to export the registry sub-hives to .reg files from another Windows 2003 virtual machine.
    • From your helper VM where your unbootable vmdk is mounted, using regedit.exe, highlight HKEY_LOCAL_MACHINE and choose File -> Load Hive.
    • Browse to %systemroot%\system32\config inside your unbootable vmdk. For this instance, it was on partition(2) or d:\windows\system32\config. This directory contains all the physical files that provide the registry tree. Choose the file named “system” to load.
    • You will be prompted for a name to load it as…just type the hostname of the unbootable VM (server1). This will load the system hive for the unbootable VM under HKEY_LOCAL_MACHINE in regedit.
    • If you browse down into that mounted registry hive, you will see there is no CurrentControlSet. Your registry keys that were exported need to be put back in CurrentControlSet. Your safest bet is to restore the registry hives into ALL control sets so that you can be sure you get it in the right spot.
    • You have to make sure that the registry changes get imported into the mounted registry hive and not your local machine registry hive. Open the .reg files and do a search and replace for the string “HKEY_LOCAL_MACHINE\system\currentcontrolset”, replacing it with “HKEY_LOCAL_MACHINE\server1\controlset001”. (The file you loaded is the system hive itself, so its control sets sit directly under server1.) As mentioned in the step above, you may have multiple control sets (controlset001, controlset002, etc.), which means you will have to edit the .reg file multiple times and re-merge it to get the registry settings into all control sets.
    • Right-click on each of the .reg files and merge them. Double-check to make sure all the entries got added.
    • Highlight HKEY_LOCAL_MACHINE\server1, and choose File -> Unload Hive to unmount the system registry file.
  3. Remove the vmdk from the helper virtual machine. Make sure to just disconnect it and not delete it from disk.
  4. Start your previously unbootable virtual machine. It should work fine now. (3:00 AM)
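
The search and replace in step 2 gets tedious with several .reg files and several control sets, so it can be scripted. Here is a minimal sketch using sed; the function name retarget_reg is hypothetical. It assumes GNU sed (for the case-insensitive I flag) and that the .reg files were exported in the ANSI "Win9x/NT4 Registration Files" (REGEDIT4) format, since the default regedit export is UTF-16 and sed will not match it without converting first. Because the loaded file is the system hive itself, the control sets appear directly under the name you loaded it as (server1 above).

```shell
#!/bin/bash
# Sketch: rewrite key paths in an exported .reg file so they point into a
# hive loaded under HKEY_LOCAL_MACHINE\<hive_name> instead of the live registry.
# Reads the .reg text on stdin and writes the rewritten text to stdout.
# ASSUMPTIONS: GNU sed (I flag), ANSI/UTF-8 text rather than UTF-16.
retarget_reg() {
    local hive_name="$1" control_set="$2"
    sed -e 's/^\[HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet/[HKEY_LOCAL_MACHINE\\'"$hive_name"'\\'"$control_set"'/I'
}

# Example (file names are placeholders): one copy per control set, then
# merge each resulting file on the helper VM.
#   for cs in ControlSet001 ControlSet002; do
#       retarget_reg server1 "$cs" < symmpi.reg > "symmpi-$cs.reg"
#   done
```

Only lines that start with a key header are touched, so value data that happens to mention CurrentControlSet is left alone.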

So, this process is way easier if you just re-do the conversion. I ended up having to do this conversion again due to other circumstances, did the injection pre-conversion the second time, and that worked as well. However, I have had conversions that took 4-5 hours due to the amount of data involved. This process is obviously justified in those situations.
 


 

I was recently troubleshooting a problem where a terminal server, that happened to be a VMware virtual machine, could not browse the network.  Opening Explorer when drives were mapped would hang Explorer.  Opening Explorer with no drives mapped, but attempting to browse to a network location would hang Explorer.  Troubleshooting was complicated by this being a production server.

First, I cloned the server in VMware, renamed it, and rejoined the domain under the new name.  This allowed me to troubleshoot without further disrupting the users.

Next, after extensive testing (resetting TCP/IP, cleaning DNS, running HijackThis and CCleaner, removing a bunch of software, etc.), I found that the Network Provider Order was incorrect. VMware Shared Folders was listed first, followed by Terminal Services and then Windows Networking. I reordered the list so that Windows Networking was first, logged off and back on, and everything started working normally. I replicated the fix to the production server (TS2) and users are able to browse the network again.
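
For reference, Windows keeps this order as a comma-separated string in the ProviderOrder value under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\NetworkProvider\Order. The reordering itself is simple string handling, sketched below in shell purely as an illustration; the function name is made up. LanmanWorkstation is the Windows Networking provider and RDPNP is the Terminal Services one, while "hgfs" for VMware Shared Folders is an assumption, so check the names that actually appear in your own value.

```shell
#!/bin/bash
# Sketch: move one provider to the front of a comma-separated ProviderOrder
# string, keeping the relative order of the others.
# Note: if the provider is not in the list it is added at the front.
promote_provider() {
    local first="$1" order="$2" rest
    # Split on commas, drop the promoted provider (exact, case-insensitive
    # match), then join the remainder back together with commas
    rest=$(printf '%s\n' "$order" | tr ',' '\n' | grep -viFx -- "$first" | paste -sd, -)
    printf '%s\n' "${first}${rest:+,$rest}"
}
```

For example, `promote_provider LanmanWorkstation "hgfs,RDPNP,LanmanWorkstation"` gives the string to write back to ProviderOrder.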


 

I was adding a new SCSI/SATA controller card to an HP MSA 1510i. I had shut down the unit to perform the work, and after rebooting I could not connect to the management interface. I checked the small interface on the front and the system was attempting to get a DHCP address. I reset the management address and was able to connect, but the password had reset to the default. At that point I determined it had dumped its configuration.

The LUNs were fine; they just could not communicate over iSCSI. If you have ever configured an MSA 1510i, you know they are not very straightforward. I was able to get everything communicating again and the VMware servers back online without too much trouble. The lesson learned was to make sure to document the configuration of a device or back it up. Unfortunately, the MSA 1510i does not allow configuration backups. It’s also good to keep that documentation somewhere reachable, because I had lost access to information at our office, such as passwords and IPs, while the ISA server (which is a VM) was offline.


 

We use VMware Workstation a lot during our information security audit work, and there are lots of times when we just need to copy a file or two to or from a virtual machine. It would be nice to not have to wait to start up the VM, log in, copy, etc.

VMware has a Virtual Disk Development Kit (http://www.vmware.com/support/developer/vddk/) that contains a helpful tool for this problem.  There is only a 32-bit Windows version but it works on 64-bit systems. Among other tools, the kit includes a handy command line utility called vmware-mount, also known as VMware Disk Mount. You'll find the utility in C:\Program Files\VMware\VMware Virtual Disk Development Kit\bin.

Once it's mounted, you can work with that disk in Explorer, just like any other disk. To mount a local .VMDK to the M: drive, use the command:

vmware-mount M: {pathToVMDKFile}

You can even use this tool to mount remote .VMDKs, either on other Windows hosts or ESX/ESXi hosts. Here's some quick syntax to connect to a disk on a remote ESX/ESXi host:

vmware-mount K: "[storage1] WinXP/WinXP.vmdk" /i:ha-datacenter/vm/WinXP /h:esx3 /u:root /s:secret

You can get all the command line hints from the tool's documentation.


 

I was recently assigned a task for one of our customers in West Texas to get their servers checking in with WSUS correctly. After talking to a coworker, I found out that, since we deploy every virtual machine from a template that already has the OS installed, each deployed virtual machine has the same “SusClientID” for WSUS as the template. Here are several steps that you can go through in order to issue each machine a new “SusClientID”:

  • Stop the Windows Update service: “net stop wuauserv”
  • Delete “HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\WindowsUpdate\SusClientId”
  • Start the Windows Update service: “net start wuauserv”
  • In a command prompt, run “wuauclt /detectnow”

Following those steps will recreate the string and that system should begin reporting in with a different SusClientID.