Blog: vCenter

Recently, I deployed a new vCenter appliance (VCSA), version 6.5, with an external Platform Services Controller (PSC) appliance. VMware has made the deployment considerably simpler than it was with the first few appliance releases. Instead of having to import an OVA/OVF and do much of the configuration yourself, VMware now provides an EXE that handles most of those steps automatically. Simply step through the wizard, providing information such as “what host to deploy the appliance to” and “what deployment model would you like” (external or internal PSC), and the wizard will deploy and configure the appropriate OVA templates.

Unfortunately, the first time that I ran through this wizard, it hung indefinitely about two-thirds of the way through. I even left it running overnight and it never completed. Looking through the deployment logs, it turned out that the deployment had failed due to licensing issues.

debug: initiateFileTransferFromGuest error: ServerFaultCode: Current license or ESXi version prohibits execution of the requested operation.
debug: Failed to get fileTransferInfo:ServerFaultCode: Current license or ESXi version prohibits execution of the requested operation.
debug: Failed to get url of file in guest vm:ServerFaultCode: Current license or ESXi version prohibits execution of the requested operation. 

Granted, these hosts hadn’t been licensed yet – I had just upgraded the hosts from 6.0 and had assumed the evaluation license was in effect. Apparently not. I installed the full license and tried the deployment once more. Sure enough, that did it …

Moral of the story: if you don’t license your hosts for vCenter (i.e., you are using the free Hypervisor license), you will not be able to deploy the vCenter appliance.
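Since the wizard gave no on-screen hint, a small script can pull this fault out of a deployment log. This is only a sketch: the function name is made up for illustration, and it assumes the log has already been read into a list of lines. The fault text is copied from the log excerpt above.

```python
# Fault text copied from the deployment log excerpt in this post.
LICENSE_FAULT = (
    "Current license or ESXi version prohibits execution "
    "of the requested operation"
)


def find_license_faults(log_lines):
    """Return the log lines that report the licensing ServerFaultCode.

    Hypothetical helper: `log_lines` is any iterable of strings.
    """
    return [
        line.strip()
        for line in log_lines
        if "ServerFaultCode" in line and LICENSE_FAULT in line
    ]
```

If this returns anything, check the host licensing before rerunning the wizard.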


 

After upgrading a customer's vCenter to 6.0, VMs that were being replicated with Veeam from one site to another started to raise a "VM MAC Conflict" alarm. However, when I compared the MACs of the replicated VM and the original VM, they were unique. I had not upgraded the hosts at this point, only vCenter. Nothing had changed with Veeam, so this was a new issue resulting from the vCenter upgrade.

As it turns out, nothing is technically wrong; this is simply a change in the behavior of the alarm issued by vCenter. When Veeam replicates a VM, the replica VM initially has all the same settings (other than the name) as the source VM. vCenter sees the same MAC address on two VMs and raises the alarm. vCenter then changes the MAC address of the replica (as had always been the behavior), but it never clears the alarm. You must clear it manually. Then, when the next replication occurs, the alarm triggers again.

I found several references to this issue online, and most suggested simply disabling the alarm so that vCenter would stop showing the replicas with an alarm all the time. That's not a great solution, because no alarm would be generated in the event of an actual MAC conflict. Further research revealed a workaround: you can edit the "VM MAC Conflict" alarm in vCenter and add an argument to exclude VMs whose name ends in "_replica".
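To show what that exclusion is meant to achieve, here is a rough sketch of the logic. It is not how vCenter evaluates the alarm internally; the function name and the (name, MAC) input format are assumptions made for illustration only.

```python
from collections import defaultdict


def mac_conflicts(vms, exclude_suffix="_replica"):
    """Return {mac: [vm names]} for MACs shared by more than one VM.

    `vms` is an iterable of (name, mac) pairs. VMs whose name ends with
    `exclude_suffix` are skipped, mirroring the alarm exclusion above.
    """
    by_mac = defaultdict(list)
    for name, mac in vms:
        if name.endswith(exclude_suffix):
            continue  # replica VMs never count toward a conflict
        by_mac[mac.lower()].append(name)
    return {mac: names for mac, names in by_mac.items() if len(names) > 1}
```

With the suffix excluded, a source VM and its replica no longer count as a conflict, while two unrelated VMs sharing a MAC still do.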


 

I came across an issue where two ESX servers that had been running for approximately 8-9 months without a reboot suddenly showed offline status in vCenter. The events in vCenter showed that the ramdisk 'TMP' was full and that the host could not write to the file /tmp/.SapInfoSysSwap.lock.LOCK.#####.

 

I consoled into the ESX hosts and saw that a log file at /tmp/mili2d.log had consumed most of the space. From what I read, this file would have been removed upon rebooting the ESX host, but that was not something I wanted to do if I could help it.

 

I reviewed the log file and found nothing of significance inside, but it had been growing for months until it reached the limit on both hosts. I thought I could just remove the file, but deleting it didn't reclaim the space.

 

You can check the space allocation with the command "vdf -h", which shows the space left on the RAM disk.

 

In order to get the ESX host to rescan the RAM Disk, restart the management services with "services.sh restart".  After I did this, the space allocation showed available, and the ESX hosts showed online again within vCenter without having to reboot the servers.
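If it isn't obvious which file is eating the space, a short script can rank the files under a directory by size. This is a generic sketch that assumes a Python interpreter is available wherever you run it; the function name is made up for illustration. It only helps locate the culprit. As described above, the space itself was not released until the management services were restarted.

```python
import os


def largest_files(root, top_n=5):
    """Return up to `top_n` (size_in_bytes, path) tuples under `root`,
    largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return sorted(sizes, reverse=True)[:top_n]


if __name__ == "__main__":
    for size, path in largest_files("/tmp"):
        print(size, path)
```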


 

I recently rebuilt a vCenter environment for a customer. We decided to use the vCenter Server Appliance 6.5. The configuration of the vCenter Server Appliance was fairly simple, and it operates very similarly to vCenter Server installed on Windows. We attempted to set up email alerts, but were unable to get the alerts to send. We initially thought the alerts would not send due to an issue with the SMTP relay. Since this was not a Windows OS, I was not able to log in to the OS and test the SMTP relay using telnet. I checked my configuration of email alerts several times, and the administrator of the SMTP servers checked his as well. Everything looked correct on both sides, but emails still would not send.

After researching for quite some time, I found that I could use the "mailq" command to view the email queue on the vCenter Server Appliance. I connected to the vCenter Server Appliance via SSH, ran the "shell" command to get to the full shell, and then ran the "mailq" command. This showed me that several messages were in the mail queue and not being sent. I began to troubleshoot this more and eventually found a VMware article regarding a bug in the vCenter Server Appliance 6.5 that prevented SMTP from working correctly. This article had been published one day before I found it, which was about a month after I first started troubleshooting the issue. From looking at the files, the original code had the wrong patch in the sendmail.cf file.

Here is a link to the VMware article with instructions on how to fix the bug: https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2148396

The following must be done to successfully SCP the file to the vCenter Server Appliance: https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2107727
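For the telnet-style relay test mentioned earlier, a few lines of Python can stand in when telnet isn't available. This is a minimal sketch, assuming a Python interpreter is available on the machine you test from. The function name is made up and the hostname in the usage line is a placeholder.

```python
import socket


def smtp_banner(host, port=25, timeout=5.0):
    """Connect to an SMTP relay and return its greeting line.

    A reachable relay normally answers with a line starting with "220".
    Raises OSError if the connection fails or times out.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        return conn.recv(1024).decode("ascii", errors="replace").strip()


if __name__ == "__main__":
    # "smtp.example.com" is a placeholder; use your relay's name.
    print(smtp_banner("smtp.example.com"))
```

This only proves the relay is reachable and answering. In my case it would not have caught the sendmail.cf bug, which is why checking "mailq" was still the step that mattered.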


 

I was recently working on a project to migrate a customer from a physical server to new virtual servers on a new ESX host. I installed ESXi 6.0 Update 2 on the new physical server and delivered it to the customer site. After the server was onsite, I began building my first virtual machine. Since it was the first virtual machine and vCenter was not installed yet, I downloaded the VI client and connected directly to the host.

While creating the first VM, I received the following warning:

"If you use this client to create a VM with this version, the VM will not have the new features and controllers in this hardware version. If you want this VM to have the full hardware features of this version, use the vSphere Web Client to create it."

According to the warning message, I needed to use the vSphere Web Client to create a VM with the latest full hardware feature set. The vSphere Web Client is part of vCenter, so I didn’t see how this was possible because vCenter was not installed yet. VMware has been planning to make the VI client obsolete and move to the web client, so I figured this was just a push in that direction. Obviously, this doesn’t work well for customers who are just building their first virtual servers. I didn’t need the new hardware features, so I just picked Virtual Machine Version: 11 and continued building the VM.

A few days later, I was curious as to what the warning message meant and decided to do some more investigation. It turns out that with ESXi 6.0 Update 2, VMware started embedding a new VMware Embedded Host Client (EHC) in ESXi. The Embedded Host Client is an HTML5-based tool for managing the ESXi host directly and is a replacement for the VI client. This is nice because nothing needs to be downloaded or installed to manage the ESXi host using the EHC.

Here's a screenshot of the new EHC:

Knowing that the EHC exists, I now understand what the warning message I received in the VI client was saying. It was not necessarily saying that I had to use the vSphere Web Client that comes with vCenter, but rather that I could connect directly to the ESXi host using the Embedded Host Client.

The VMware Embedded Host Client can be accessed by going to http://IPAddressOfESXiHost/ui. More information on the VMware Embedded Host Client can be found here: http://blogs.vmware.com/vsphere/2016/04/vsphere-6-0-update-2-whats-new.html

 

 


 

This is handy if you need to quickly connect to the console of a VM and don't need any other features of the vSphere web interface. The documentation from VMware says to run this from the web interface, but it can be run standalone, like this:

"C:\Program Files (x86)\VMware\VMware Remote Console\vmrc.exe" "vmrc://DOMAIN\USERNAME@VCENTER.DOMAIN.COM/?moid=vm-VMID"

VCENTER.DOMAIN.COM should be replaced with the FQDN of your vCenter server.

The "DOMAIN\USERNAME@" can be omitted, but if you are saving this command somewhere, you might as well include your username.

Use the VMware PowerCLI command "get-vm MACHINENAME | fl id" to find the VMID. Just use the part that starts with vm-. You can also get these from the ESX console.
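If you save these commands for several VMs, the URI can be assembled with a couple of small helpers. This is only a sketch: the function names are made up, and the Id value in the usage line is an invented example of the kind of string PowerCLI returns, not output copied from a real system.

```python
def moid_from_id(powercli_id):
    """Keep only the part of a PowerCLI Id that starts with 'vm-'."""
    start = powercli_id.find("vm-")
    if start == -1:
        raise ValueError("no 'vm-' part found in %r" % powercli_id)
    return powercli_id[start:]


def vmrc_uri(vcenter_fqdn, moid, user=None):
    """Build a vmrc:// URI in the format shown above.

    `user` is optional, e.g. r"DOMAIN\\USERNAME".
    """
    prefix = f"{user}@" if user else ""
    return f"vmrc://{prefix}{vcenter_fqdn}/?moid={moid}"


if __name__ == "__main__":
    # Placeholder values; substitute your own vCenter FQDN and VM Id.
    moid = moid_from_id("VirtualMachine-vm-123")
    print(vmrc_uri("VCENTER.DOMAIN.COM", moid, r"DOMAIN\USERNAME"))
```

Pass the resulting string to vmrc.exe as the second quoted argument in the command above.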

Download VMRC from here: https://my.vmware.com/web/vmware/details?downloadGroup=VMRC90&productId=491.  There is a link to this on the vSphere web page.  This requires an account with VMware.


 

One of our customers reported their Veeam backups were failing. We determined the cause to be the vCenter services were stopped and would not restart. The vCenter issue was a result of the SQL Express database having grown to its 10GB maximum size. We were able to get the vCenter services running temporarily by purging performance data from the database using the procedure at http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1007453.

This procedure removed enough data to get the services running, but didn’t reduce the overall size of the database significantly. I found a VMware SQL stored procedure named “dbo.cleanup_events_tasks_proc” that reduced the size of the database by 60%. After a couple of shrink file operations, the database and the vCenter services were up and running. 

However, the Veeam backups failed yet again the next night. While the Veeam errors indicated that the vCenter services were again offline, this time it was because the virtual disk containing the SQL Server Express vCenter database was completely full. The transaction log for the vCenter database had bloated to 24GB and filled up the disk. This was confusing initially because I had checked the recovery model of the database prior to running the stored procedure to make sure it was set to “Simple” to prevent this very issue.

With SQL Server, the growth of the transaction log is directly proportional to the amount of “work” that SQL Server has to perform between the BEGIN TRANSACTION and COMMIT TRANSACTION commands. Certain SQL Server commands (insert, update, and delete) are always wrapped in implicit transactions, but bulk operations can also be wrapped in explicit BEGIN/COMMIT TRANSACTION commands to control rollback. The stored procedure that I ran wraps a potentially large batch purge process in a single SQL transaction so that the entire process can be rolled back in the event of a failure. In this case, the lengthy stored procedure produced a huge transaction log. The lesson learned is that the “Simple” recovery model doesn’t guarantee the transaction log will always stay a manageable size.


 

Recently, I was able to upgrade a vCenter environment from 5.0 to 5.1. One of the major steps in this is the installation of the Single Sign-On service. This is an interesting installation, as there are potentially a dozen gotchas before you even get to the install button. One of those gotchas is this:

I got to a step where the installation wanted to talk to the newly created database named “RSA” (that I had created in an earlier step using some scripts). I had to formulate a JDBC (yes, Java) connection string so that it could successfully authenticate. During this process, I found that the application wanted to create two new users, RSA_User and RSA_DBA, in the database and configure permissions so that everything was secured around those two users correctly.

The problem I kept running into was an error that stated “Unable to authenticate to db”. That’s all. I was able to connect to said “db” using what I thought were the same credentials the installer was using. That turned out not to be the case.

http://kb.vmware.com/kb/2035449

In my case, this error occurred because when the SQL instance was originally set up for the vCenter installation, it was set to use Windows Authentication only. For a vCenter 5.0 or prior installation, this is fine; however, for a vCenter 5.1, we’ve got to enable Mixed Mode authentication. I made the change on the instance, restarted services, and my installation continued (sort of) smoothly (not really, but that’s a Gotcha for another time).


 

I had been troubleshooting a failed vCenter upgrade recently and trying to restart the upgrade process. Every time I ran the installer, it would fail on some piece and roll back the install. I had opened several windows trying to figure this out, including Event Viewer, Services.msc, log files, etc., and wasn’t easily able to find a reason for the failures. At one point, the error that I was getting was something about permissions being denied, which was strange, as the account I was using had full admin rights on the system and the SQL server.

I found an obscure posting on some forum somewhere that suggested closing down the services.msc window and then running the install again. I did so and the install was successful! I’ve never seen an application that had to have the Services.msc window closed in order to add or remove services, but some portion of this install process seemed to require it.


 

Do you know how much additional disk space is needed to delete 4GB of data? Over 10GB of disk space.

According to http://blogs.vmware.com/kb/2010/09/dealing-with-vcenter-41-database-tables-growth.html, “In larger VirtualCenter installations you might notice the VPX_EVENT_ARG and VPX_EVENT tables can become very large.” The KB article includes instructions for clearing out old events. In this particular case, I had several million entries over a few days. Issuing a broad delete statement was filling up the transaction log before it could complete, resulting in vCenter service crashes and other database problems. Rather than significantly increase the allowed size of the transaction log (which would have meant expanding the virtual disk), I resolved this by deleting one hour’s worth of events at a time. This allowed the smaller transactions to complete and then re-use the space in the transaction log (since the database was set to use the Simple Recovery Model per VMware suggestions).
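The one-hour-at-a-time approach can be sketched as follows. To keep the example runnable it uses Python's built-in sqlite3 as a stand-in for SQL Server, and the table name "events" and column "created" are hypothetical, not the real vCenter schema. The point is only the pattern: one small delete per window, committed before the next one starts.

```python
import sqlite3
from datetime import datetime, timedelta


def hourly_windows(start, end):
    """Yield (lower, upper) datetime pairs covering [start, end) in
    steps of at most one hour."""
    current = start
    while current < end:
        upper = min(current + timedelta(hours=1), end)
        yield current, upper
        current = upper


def purge_in_batches(conn, start, end):
    """Delete rows one hourly window at a time, committing after each
    window so that no single transaction grows large.

    Assumes a hypothetical table events(created TEXT) holding ISO 8601
    timestamps. Returns the total number of rows deleted.
    """
    deleted = 0
    for lower, upper in hourly_windows(start, end):
        cursor = conn.execute(
            "DELETE FROM events WHERE created >= ? AND created < ?",
            (lower.isoformat(), upper.isoformat()),
        )
        conn.commit()  # end this small transaction before the next window
        deleted += cursor.rowcount
    return deleted
```

Against a real vCenter database the same loop would be written in T-SQL or run through whatever SQL Server client you use. The trade-off is that the purge can no longer be rolled back as a whole.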