Blog: ESX

Last week I learned why VMware suggests putting ESX service consoles on at least two distinct networks. I was troubleshooting intermittent Veeam backup failures on a customer network and couldn’t find any pattern to them. Two or three backups in a row would run successfully, then five in a row might fail. However, the failures were always on virtual machines associated with one specific ESX host. At first I thought the host was healthy, but after watching the VI client for an extended period, I noticed that the ESX host would drop offline (showing as disconnected in the VI client) and then come back online. This indicated the problem wasn’t affecting just the management/backup server.

To get a clean baseline for troubleshooting, I decided to reboot this ESX host. However, after the reboot, I could not connect to it with the VI client. I could ping the IP assigned to the service console, but couldn’t SSH in or connect via the VI client. I logged in via iLO and found that ifconfig at the command line showed the IP as 0.0.0.0. Interesting. So what was responding to my pings?

I checked the ARP cache on one of the switches and found that a thin client had been plugged in with the same IP as my LAN service console. What is really odd is that the MAC address for the thin client was all zeros AND the IP I was using for the LAN service console is not even in the range handed out by DHCP. I was not able to connect to the thin client to see how it was configured, but I was able to connect to the ESX host via a second service console port that I had placed on the iSCSI network. The management/backup server has a connection to the iSCSI network to do backups to disk, so I was able to change the LAN-facing service console IP to another address, and everything started working fine.

The backup issue was evidently caused by the ARP entry for that IP on the backup server flipping between the thin client and the ESX host. So be aware that, at boot time, if ESX determines that the IP it is using for a service console is already in use, it just rips it out of the configuration and continues to boot with NO WARNINGS or ERRORS on the console.
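One way to catch this kind of conflict from the backup server’s side is to sample the ARP table a few times and flag any IP that shows up with more than one MAC address, or with an all-zero MAC. The sketch below is only an illustration: it assumes the samples have already been collected as (ip, mac) pairs, and the function name and addresses are made up.

```python
from collections import defaultdict

ZERO_MAC = "00:00:00:00:00:00"

def find_arp_conflicts(observations):
    """Return {ip: sorted MACs} for IPs seen with several MACs or an all-zero MAC."""
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        # Normalize Windows-style "AA-BB-..." and mixed case to one form.
        macs_by_ip[ip].add(mac.lower().replace("-", ":"))
    return {
        ip: sorted(macs)
        for ip, macs in macs_by_ip.items()
        if len(macs) > 1 or ZERO_MAC in macs
    }

# Hypothetical samples taken at different times (documentation-range IPs).
samples = [
    ("192.0.2.10", "00-50-56-AA-BB-CC"),
    ("192.0.2.10", "00:00:00:00:00:00"),
    ("192.0.2.20", "00:50:56:11:22:33"),
]
conflicts = find_arp_conflicts(samples)
print(conflicts)
```

An IP that appears in the result is worth checking at the switch before blaming the backup software.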


 

We have a VMware ESXi 4 infrastructure in which we wanted VMs on two separate networks: DMZ and Internal. We accomplished this by using VLAN tags on the virtual switches to separate the traffic. However, once the VLAN tags were implemented on the switches, we could no longer reach the host itself at its IP address. The reason was that we had not assigned a VLAN ID to the host itself. This can be set from the configuration menu of the ESXi console (F2). Alternatively, one could dedicate a completely isolated NIC to managing the host, independent of the NIC(s) used by the guest VMs.


 

I created a virtual machine with an “independent persistent” disk.  This prevents VMware from being able to take snapshots.  Since the method for backing up an entire virtual machine on a stand-alone ESXi server is to take a snapshot and then copy the snapshot to a network location, this prevented me from being able to back up the server.  (I could only back up the virtual machine if I shut it down.)

I was able to correct the configuration by powering off the virtual machine and editing the virtual machine settings to take the disk out of independent mode.
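To spot this before a backup fails, one could scan a VM’s .vmx file for disks set to an independent mode. This is a minimal sketch, assuming the disk mode is recorded on a line of the form scsi0:0.mode = "independent-persistent"; the function name and the sample file contents are hypothetical.

```python
import re

# Matches lines such as: scsi0:0.mode = "independent-persistent"
MODE_LINE = re.compile(r'^\s*(\w+:\d+)\.mode\s*=\s*"([^"]*)"\s*$')

def independent_disks(vmx_text):
    """Return {disk_id: mode} for every disk whose mode starts with 'independent'."""
    flagged = {}
    for line in vmx_text.splitlines():
        match = MODE_LINE.match(line)
        if match and match.group(2).startswith("independent"):
            flagged[match.group(1)] = match.group(2)
    return flagged

# Hypothetical .vmx excerpt: one independent disk, one ordinary disk.
sample = '''scsi0:0.present = "true"
scsi0:0.fileName = "server.vmdk"
scsi0:0.mode = "independent-persistent"
scsi0:1.present = "true"
scsi0:1.fileName = "data.vmdk"
'''
flagged = independent_disks(sample)
print(flagged)
```

Any disk listed in the result would be skipped by a snapshot-based backup.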


 

I was trying to add a SCSI controller to an ESX box for use by a VM running on that box. ESX recognized the PCI-based Adaptec controller (it was on the HCL), but when I added the SCSI controller to the VM, the VM would not boot and displayed the following error:

Unable to open SCSI device 'vmfs/devices/genscsi/vmhba3:5:0:0'(scsi3:0):Could not find the file. Failed to configure scsi3.

The problem was an extra “:0” at the end of the device file name. I edited the .vmx file for the virtual machine to remove it, and it worked! Note that you need to edit the file with WordPad (not Notepad) because of how the .vmx file is formatted.

Edit the ******.vmx file (WordPad, etc.):

(Before)
scsi0:1.present = "true"
scsi0:1.deviceType = "scsi-passthru"
scsi0:1.fileName = "/vmfs/devices/genscsi/vmhba3:5:0:0"
scsi0:1.allowGuestConnectionControl = "false"

(After)
scsi0:1.present = "true"
scsi0:1.deviceType = "scsi-passthru"
scsi0:1.fileName = "/vmfs/devices/genscsi/vmhba3:5:0"
scsi0:1.allowGuestConnectionControl = "false"

I was able to find a solution in the following VMware Communities thread: http://communities.vmware.com/thread/199408
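The same edit can be sketched in code. This is only an illustration of the change shown above, not a general rule: it assumes the passthrough device path should have exactly three colon-separated fields after "vmhba", which is all this one case demonstrates. The function name is hypothetical.

```python
import re

# Groups: key and opening quote, three-field device path,
# optional fourth field, closing quote.
PASSTHRU_NAME = re.compile(
    r'^(scsi\d+:\d+\.fileName\s*=\s*")'
    r'(/vmfs/devices/genscsi/vmhba\d+:\d+:\d+)'
    r'(:\d+)?("\s*)$'
)

def trim_genscsi_path(vmx_text):
    """Drop a fourth ':N' field from genscsi fileName entries, if present."""
    fixed = []
    for line in vmx_text.splitlines():
        match = PASSTHRU_NAME.match(line)
        if match and match.group(3):
            line = match.group(1) + match.group(2) + match.group(4)
        fixed.append(line)
    return "\n".join(fixed)

before = '''scsi0:1.present = "true"
scsi0:1.deviceType = "scsi-passthru"
scsi0:1.fileName = "/vmfs/devices/genscsi/vmhba3:5:0:0"
scsi0:1.allowGuestConnectionControl = "false"'''
after = trim_genscsi_path(before)
print(after)
```

Lines that already have the three-field form pass through unchanged, so running it twice is harmless.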


 

I was installing a 64-bit VM on ESX Server 3.0.2. When attempting to load the ISO file to install the OS, I got a cryptic ‘Host CPU’ error in the VI client. After searching a number of forum posts, I decided to check the BIOS setting for CPU Virtualization Technology on that DL380 G5. Sure enough, it was disabled, and enabling it let me get past the ‘Host CPU’ error and load the OS. I noticed in the posts that many people were saying older ProLiants had this setting enabled, while newer models had it disabled. This setting should be enabled for systems acting as VM hosts (ESX, ESXi, Hyper-V, etc.), so be sure to check it, regardless of how new the server is, before installing your VM guests.

Also, a quick note: these CPU BIOS settings (VT, No-Execute memory protection, etc.) should be consistent across any systems being used for VMotion.

 

A while back, before the recent VMware ESX upgrade, I was having problems logging into the VC server. As soon as the main console window popped up, I’d receive “Exception has been thrown by the target of an invocation”. The fix, which I found in the VMware forums, is to open regedit, go to HKEY_CURRENT_USER\Software\VMware, and delete both entries (Virtual Infrastructure Client and VMware Infrastructure Client). Doing this, and making sure the compatibility setting was set to Windows XP, let me in successfully.

http://communities.vmware.com/thread/119422


 

To prevent possible problems with your backups, do not install VMware Tools with the Complete option on ESX guests. The Complete option installs the shared folders feature, which is not available on ESX. This causes VMware Tools to keep a file (hgfs.dat) open, which can cause backup errors. To disable the shared folders feature, remove hgfs from the ProviderOrder value under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\NetworkProvider\Order\.

See http://kb.vmware.com/kb/1317 for more information.
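The registry change amounts to dropping one name from a comma-separated list. A minimal sketch of just that string edit follows; it does not touch the registry, and the function name and the sample ProviderOrder value are hypothetical.

```python
def remove_provider(provider_order, name="hgfs"):
    """Return the comma-separated provider list without `name` (case-insensitive)."""
    kept = [
        provider.strip()
        for provider in provider_order.split(",")
        if provider.strip() and provider.strip().lower() != name.lower()
    ]
    return ",".join(kept)

# Hypothetical value; the real list varies from guest to guest.
example = "RDPNP,LanmanWorkstation,hgfs,webclient"
updated = remove_provider(example)
print(updated)
```

The order of the remaining providers is preserved, which matters because Windows consults them in sequence.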


 

I recently ran into an issue using VMware Converter to move a VM from VMware Server to VMware ESX v3.0.2. I had already successfully converted two other VMs in this manner, but with this one, every time I started the conversion it would bomb out during the creation of the VM on ESX. The log files in my profile at …\<username>\Local Settings\Temp\1\vmware-temp indicated the following error:

'P2V' 5748 error] [task,295] Task failed: P2VError UFAD_SYSTEM_ERROR(Invalid response code: 400 Bad Request)

I did some research and found that this can often be caused by invalid characters in the VM name or path. I looked and didn’t find anything unusual there; all standard alpha characters. Then I looked around some more and found that the “Notes” section of the VM Summary did have some double dashes ( -- ) and periods ( . ) in it. I didn’t think that should cause an issue, but I decided to take out all the text in the Notes section anyway. When I fired off the conversion again, it worked! There must have been something in that text that was causing the problem. Here is the Notes text it was choking on:

Package Testing Server – Windows Server 2003 SP2 – TS – Windows 2003 server template. Setup to match production TS Cluster.
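The exact character at fault was never pinned down. If the cause was a character outside plain ASCII (for example, a typographic dash substituted for a typed one), a check like the following would surface it. This is a sketch under that assumption; the function name and the sample string are illustrative only.

```python
def non_ascii_chars(text):
    """Return (index, character, code point) for each character outside 7-bit ASCII."""
    return [
        (index, char, f"U+{ord(char):04X}")
        for index, char in enumerate(text)
        if ord(char) > 127
    ]

# Illustrative string containing one en dash (U+2013).
notes = "Package Testing Server \u2013 Windows Server 2003 SP2"
hits = non_ascii_chars(notes)
print(hits)
```

Running this over a VM’s name, path, and Notes text before a conversion is a cheap way to rule such characters in or out.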