Blog: VMware

In older versions of vCenter (and associated VMware products), installing corporate certificates was a pain. It was an extremely manual process to update each component. Just take a look at https://kb.vmware.com/s/article/2034833 to get an idea. 

With vSphere 6.x, certificate replacement is considerably simpler. You still need a certificate for each of the components, but VMware has automated a lot of the pain points into a simple certificate wizard. In addition, instead of creating individual certificates for each piece, you simply sign a single certificate (created from a special certificate template), replace the root VMCA certificate with your custom-signed one, and regenerate all the subordinate certificates, now signed by your custom VMCA root. 

It's only slightly more complex with a separate PSC and vCenter appliance setup: you need to update all the PSCs before updating vCenter so that connectivity between the two appliances remains secure. Check out https://kb.vmware.com/s/article/2147542 for all the details. 
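On the appliances themselves, the wizard is a single interactive tool. A rough sketch of the order of operations for an external-PSC deployment (hostnames are placeholders, and the exact wizard menu wording may differ between builds):

```shell
# Sketch only: hostnames are placeholders and the wizard itself is interactive.
# Update every external PSC first so PSC-to-vCenter trust stays intact.
ssh root@psc01.example.com
/usr/lib/vmware-vmca/bin/certificate-manager
# In the wizard, choose the option to replace the VMCA root certificate with
# a custom signing certificate and regenerate all certificates.

# Only after all PSCs are done, repeat on each vCenter appliance:
ssh root@vcsa01.example.com
/usr/lib/vmware-vmca/bin/certificate-manager
```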


 

VMware Content Libraries are a 6.x "new" feature (I use quotations because 6.x has been out for a while) that I've really grown to enjoy. Content Libraries let you store objects such as ISOs and templates in a central repository that can be shared with other vCenter environments. One of the biggest use cases is cleaning up all those random "ISOs" folders scattered across datastores, each created by an engineer who, while searching for an ISO, looked in just the wrong folder first and made their own. 

When Content Libraries were first released, a few issues made them difficult to use. The main one was that in order to use an ISO from the library, I had to download it first and then connect it through the web client. At that point, I might as well have kept using the locally stored ISOs on my system. You could still deploy templates from the library, which helped, but ISOs were what I was really looking forward to. 

vCenter 6.5 (and 6.7) fixed this and allowed you to connect ISO images directly from a Content Library. When I stood up our 6.5 cluster internally, I started setting this up and created a repository on a FreeNAS device with the share exposed via NFS, so that I could use some cheaper storage to hold this content. Growing excited, I tried to create my first VM with an ISO image stored in the library. It didn't work. Turns out, in order to connect an ISO from a Content Library, the backing storage must be a datastore directly connected to the host, so that the VM can "see" it; my NFS share, which the hosts hadn't mounted as a datastore, didn't qualify. 

https://kb.vmware.com/s/article/2151380?lang=en_US


 

For our audits, we run VMware Health Analyzer (VMHA) against any vCenters to collect information on ESXi build numbers, snapshots, dormant VMs, etc. Recently, a customer we were scanning had two vCenters; while VMHA worked fine on one of them, we were getting errors on the other. Standard troubleshooting didn't work, and the customer didn't know why we weren't able to collect the information this year. After running nmap against the vCenter, we determined the customer had redefined the port used for this vCenter instance, and simply specifying that port in our scan credentials solved the issue.
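The scan itself was nothing fancy. Something like the following (placeholder hostname) is enough to spot a vCenter answering on a non-standard port instead of the default 443:

```shell
# Placeholder hostname; scan a broad TCP range to find where vCenter is listening.
nmap -p 1-10000 vcenter2.example.com

# Quick sanity check of the default HTTPS port alone:
nmap -p 443 vcenter2.example.com
```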


 

Recently, quiesce snapshot jobs for some customers kept showing up as failed in Veeam with the error "msg.snapshot.error-quiescingerror".

After exhausting several research options, I called VMware Support, and we began sifting through event log files on the server as well as looking at the VSS writers and how VMware Tools was installed.

Looking at the log files on the ESX host the server was on led to this article:

https://kb.vmware.com/s/article/2039900

A folder named backupScripts.d gets created that references the path C:\Scripts\PrepostSnap\, which is empty, so the quiesce scripts fail and the snapshot job fails with them. The fix is found below:

  1. Log in to the Windows virtual machine that is experiencing the issue.
  2. Navigate to C:\Program Files\VMware\VMware Tools.
  3. Rename the backupscripts.d folder to backupscripts.d.old

If that folder is not present, or if the job still fails after the rename, the VMware services should be checked next.
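For reference, the rename from step 3 is a one-liner from an elevated command prompt on the guest (the path shown is the default VMware Tools install location):

```shell
:: Run in an elevated cmd.exe on the affected Windows VM.
cd "C:\Program Files\VMware\VMware Tools"
ren backupscripts.d backupscripts.d.old
```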


 

Recently, I deployed a new vCenter appliance (VCSA) – version 6.5 – with an external Platform Services Controller (PSC) appliance. VMware has made the deployment considerably simpler than it was with their first few appliance releases. Instead of having to import an OVA/OVF and do a lot of the configuring yourself, VMware now provides an EXE that handles most of those steps automatically. Simply step through the wizard, providing information such as “what host to deploy the appliance to” and “what deployment model would you like” (external or internal PSC), and the wizard will deploy and configure the appropriate OVA templates.

Unfortunately, the first time I ran through this wizard, it hung indefinitely about two-thirds of the way through. I even left it running overnight and it never completed. Looking through the deployment logs, it turned out that the deployment failed due to licensing issues.

debug: initiateFileTransferFromGuest error: ServerFaultCode: Current license or ESXi version prohibits execution of the requested operation.
debug: Failed to get fileTransferInfo:ServerFaultCode: Current license or ESXi version prohibits execution of the requested operation.
debug: Failed to get url of file in guest vm:ServerFaultCode: Current license or ESXi version prohibits execution of the requested operation. 

Granted, these hosts hadn’t been licensed yet – I had just upgraded the hosts from 6.0 and had assumed the evaluation license was in effect. Apparently not. I installed the full license and tried the deployment once more. Sure enough, that did it …

Moral of the story, if you don’t license your hosts for vCenter (i.e. using the free Hypervisor license), you will not be able to deploy the vCenter appliance.
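If you want to verify a host's licensing before kicking off a deployment, you can check it from the host itself; a quick sketch (hostname is a placeholder):

```shell
# On the ESXi host (SSH enabled), show the currently assigned license and its
# features -- if this reports the free "Hypervisor" edition, the VCSA deploy
# will fail with the errors above.
ssh root@esxi01.example.com "vim-cmd vimsvc/license --show"
```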


 

There are power management settings that should be checked when running ESX on HP ProLiant G6 and above or Dell PowerEdge 11th- and 12th-generation servers. See this VMware article for details: https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1018206

The ProLiant G8 that I examined for performance issues was set in the BIOS to use "HP Dynamic Power Savings mode" instead of "HP Static High Performance mode". This can impact virtual machines' ability to utilize the CPU of the host. The setting can be changed through iLO without needing to get into the BIOS directly, and changing it this way does not require a reboot of the ESX host, which is even better.


 

After upgrading a customer's vCenter to 6.0, VMs that were being replicated with Veeam from one site to another started issuing a "VM MAC Conflict" alarm. However, when I compared the MACs of the replicated VM and the original VM, they were unique. I had not upgraded the hosts at this point, only vCenter, and nothing had changed with Veeam, so this was a new issue resulting from the vCenter upgrade.

As it turns out, there is nothing wrong technically, this is simply a change in behavior in the alarm issued by vCenter. When Veeam replicates a VM, the replica VM initially has all the same settings (other than the name) of the source VM. vCenter sees the same MAC address on two VMs and alarms. vCenter then changes the MAC address of the replica (as had always been the behavior), but it never clears the alarm. You must clear it manually. Then when the next replication occurs, the alarm will trigger again.

I found several references to this issue online and most had suggested simply disabling the alarm to avoid vCenter showing the replicas with an alarm all the time, but that's not a great solution because no alarm would be generated in the event of an actual MAC conflict. Further research revealed a workaround. You can edit the alarm VM MAC Conflict in vCenter and add an argument to exclude VMs whose name ends in "_replica".


 

I came across an issue where two ESX servers that had been running for approximately 8-9 months without a reboot suddenly showed offline in vCenter. The events in vCenter showed that the ramdisk 'tmp' was full and that the host could not write to the file /tmp/.SapInfoSysSwap.lock.LOCK.#####.

 

I consoled into the ESX hosts and saw that a log file at /tmp/mili2d.log had consumed most of the space. From what I read, this file would have been removed by rebooting the ESX host, but that was not something I wanted to do if I could help it.

 

I reviewed the log file and determined there was nothing of significance inside; it had simply been filling up for months on both hosts until hitting the limit. I figured I could just remove the file and reclaim the storage space, but deleting it didn't free the space. 

 

You can check the space allocation with the command "vdf -h", which shows the space left on the RAM disk.

 

To get the ESX host to rescan the RAM disk, restart the management services with "services.sh restart". After I did this, the space showed as available again, and the ESX hosts came back online in vCenter without having to reboot the servers.
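The underlying gotcha is a classic Unix one: deleting a file that a running process still has open only removes the name; the blocks aren't freed until the handle closes. Truncating the file in place does free the space immediately. A small local demonstration (the temp file stands in for /tmp/mili2d.log):

```shell
# Simulate a runaway log, then truncate it in place instead of deleting it.
logfile=$(mktemp)                        # stand-in for /tmp/mili2d.log
head -c 1048576 /dev/zero > "$logfile"   # grow it to 1 MiB
before=$(stat -c %s "$logfile")
: > "$logfile"                           # truncate in place; the inode (and any
                                         # open handles to it) survive
after=$(stat -c %s "$logfile")
echo "before=$before after=$after"       # before=1048576 after=0
rm -f "$logfile"
```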


 

I was updating ESX with a customer a few weeks ago and ran into issues. We successfully upgraded from ESXi 5.1 to 5.5 Update 3 using the custom Dell ISO. We then attempted to update to the latest version of ESXi 5.5, but the host purple-screened on reboot. We called VMware Support to open a ticket, and the engineer provided a simple solution: press Shift+R as the hypervisor progress bar starts loading, which brings up a menu where you can select the previous build. The VMware article can be found here: https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1033604. We followed these instructions and were able to successfully boot the ESX host again.

 

I believe what caused the purple screen was that vSphere Update Manager tried to install HP updates on Dell hardware. It turns out that vSphere Update Manager does not detect what updates are actually needed, just what isn’t installed. The fix for this is to create different baselines for each brand of hardware in mixed hardware environments.

 


 

I recently rebuilt a vCenter environment for a customer. We decided to use the vCenter Server Appliance 6.5. Configuration of the appliance was fairly simple, and it operates very similarly to vCenter Server installed on Windows. We attempted to set up email alerts but were unable to get them to send. We initially suspected an issue with the SMTP relay, but since this was not a Windows OS, I was not able to log in to the OS and test the SMTP relay using telnet. I checked my email alert configuration several times, the administrator of the SMTP servers checked his as well, and everything looked correct on both sides, yet emails still would not send.

After researching for quite some time, I found that I could use the "mailq" command to view the email queue on the vCenter Server Appliance. I connected to the appliance via SSH, ran the "shell" command to get to the full shell, and then ran "mailq". This showed several messages sitting in the mail queue, unsent. I continued troubleshooting and eventually found a VMware article describing a bug in the vCenter Server Appliance 6.5 that prevented SMTP from working correctly. The article had been published one day before I found it, about a month after I first started troubleshooting the issue. From looking at the files, the original code had the wrong path in the sendmail.cf file. 
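The queue check itself is only a couple of commands once you can reach the appliance; a sketch with a placeholder hostname:

```shell
# SSH to the VCSA, escape the appliance shell into bash, and inspect the queue.
ssh root@vcsa.example.com
shell          # at the Command> prompt, drops you into the full bash shell
mailq          # lists messages stuck in the queue, with reasons when available
```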

Here is a link to the VMware article with instructions on how to fix the bug: https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2148396

The following must be done to successfully SCP the file to the vCenter Server Appliance: https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2107727