Blog

A week or so ago, I had a task to do some work on the VMware cluster for one of our customers. It was late (~11:00 PM) when I finished the task, but I decided to go ahead and try a P2V conversion on one of the physical servers that we had targeted for virtualization a long time ago. It’s that server that sits in the corner and just keeps running…and nobody knows a thing about it. Everybody is just happy it keeps running because it’s on 10-year-old hardware. It’s the poster child for virtualization. I was 90% sure that the conversion wouldn’t even kick off because the server has such an odd-ball configuration, but to my surprise it worked fine. The P2V conversion was done in about 90 minutes (~12:30 AM). When I do these conversions, the first thing I do is start the VM up without the NICs connected just to see if the new VM will boot.

I knew it was too good to be true...BSOD with immediate reboot. Error message “Inaccessible boot device”. If it were any other error message, I probably would have just hung it up for the night, but I have fixed conversions that give this error a number of times. Most of the time it’s due to the existence of a recovery partition on the physical system and the boot.ini has the wrong partition numbers in the boot parameters…easy fix. I attached the vmdk to a helper VM to access the files. The boot.ini file looked fine and all the files necessary for boot made it over. So, I took a closer look at the conversion logs. I didn’t notice it while the conversion was running, but right at the end of the job, a warning was logged…“virtual machine reconfiguration failed”. So what is reconfiguration… never noticed that before.

The “reconfiguration” keyword as well as the stop code led me to the following VMware article: http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&externalId=1005208. Basically, the article provides instructions on how to inject the VMware SCSI controller driver into the physical machine pre-conversion. The reconfiguration task that failed on the conversion indicated that VMware was unable to inject the driver and registry keys into the VM post-conversion…ah yes, the odd-ball configuration. The server was built with only a 100 MB c:\ partition, which contains nothing but the Windows boot files. The boot.ini references the \windows folder on partition(2), which is the d:\ drive. Why was the server built that way? Nobody knows. So, VMware tried to inject the drivers to fix the boot problem, but couldn’t because it doesn’t read the boot.ini to find out where the \windows folder is. It just assumes it will be at c:\windows and that c:\ will be partition(1). So, there is the cause. How do I fix it? By this time, it’s 1:30 AM and I really didn’t want to wait 90 minutes for the conversion to run again. The article explains how to fix this problem. I just had to figure out how to make those changes post-conversion. Specifically, we want to follow the Windows 2003 (lsilogic controller) instructions. So here we go.

  1.  Copying the symmpi.sys file to the %systemroot%\system32\drivers\ folder is pretty easy. Just mount the vmdk to a helper VM and copy the file from another Windows 2003 virtual machine (on VMware of course) to the target vmdk.
  2. The registry edits are where it gets a little tricky. The goal in this step is to manually load the registry hives for the VM that won’t boot and inject the .reg file changes into the registry.
    • Follow the article to export the registry sub-hives to .reg files from another windows 2003 virtual machine.
    • From your helper VM where your unbootable vmdk is mounted, using regedit.exe, highlight HKEY_LOCAL_MACHINE and choose File -> Load Hive.
    • Browse to %systemroot%\system32\config inside your unbootable vmdk. For this instance, it was on partition(2) or d:\windows\system32\config. This directory contains all the physical files that provide the registry tree. Choose the file named “system” to load.
    • You will be prompted for a name to load it as…just type the hostname of the unbootable VM (server1). This will load the system hive for the unbootable VM under HKEY_LOCAL_MACHINE in regedit.
    • If you browse down into that mounted registry hive, you will see there is no CurrentControlSet. Your registry keys that were exported need to be put back in CurrentControlSet. Your safest bet is to restore the registry hives into ALL control sets so that you can be sure you get it in the right spot.
    • You have to make sure that the registry changes get imported into the mounted registry hive and not your local machine registry hive. Open the .reg files and do a search and replace for the string “HKEY_LOCAL_MACHINE\system\currentcontrolset” and replace it with “HKEY_LOCAL_MACHINE\server1\controlset001”. The file you loaded is the system hive itself, so its control sets sit directly under the name you gave it, with no extra \system level. As mentioned in the step above, you may have multiple control sets (controlset001, controlset002, etc.), which means you will have to edit the .reg file multiple times and re-merge it to get the registry settings into all control sets.
    • Right-click on each of the .reg files and merge them. Double-check to make sure all the entries got added.
    • Highlight HKEY_LOCAL_MACHINE\server1, and choose File -> Unload Hive to unmount the system registry file.
  3. Remove the vmdk from the helper virtual machine. Make sure to just disconnect it and not delete it from disk.
  4. Start your previously unbootable virtual machine. It should work fine now. (3:00 AM)
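
The search-and-replace on the exported .reg files is the step most likely to go wrong by hand, so here is a minimal Python sketch of it. The function names are my own, and the sketch assumes the .reg files were exported by regedit in the “Version 5.00” format, which is saved as UTF-16. The target prefix is passed in whole and should match the path regedit actually shows for a control set inside the loaded hive.

```python
import re
from pathlib import Path

# Key prefix used in a .reg file exported from a running Windows machine.
SOURCE_PREFIX = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet"
_PATTERN = re.compile(re.escape(SOURCE_PREFIX), re.IGNORECASE)


def retarget_reg_text(reg_text, target_prefix):
    """Return reg_text with every CurrentControlSet key path rewritten
    to start with target_prefix (matching is case-insensitive)."""
    # A function is used as the replacement so that the backslashes in
    # target_prefix are inserted literally and not treated as escapes.
    return _PATTERN.sub(lambda match: target_prefix, reg_text)


def retarget_reg_file(src_path, dst_path, target_prefix, encoding="utf-16"):
    """Read a .reg file, rewrite its key paths, and save the result as a
    new file. The UTF-16 default is an assumption about the export format."""
    text = Path(src_path).read_text(encoding=encoding)
    Path(dst_path).write_text(retarget_reg_text(text, target_prefix),
                              encoding=encoding)


if __name__ == "__main__":
    # Hypothetical example: hive loaded as "server1"; confirm the real
    # path in regedit before merging anything.
    sample = "[HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Services\\symmpi]\n"
    print(retarget_reg_text(sample, r"HKEY_LOCAL_MACHINE\server1\ControlSet001"))
```

Running `retarget_reg_file` once per control set, each time with a different output file, gives you one .reg file to merge per control set instead of repeated manual edits.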

So, this process is way easier if you just re-do the conversion. I ended up having to do this conversion again due to other circumstances and did the injection pre-conversion the second time, and that worked as well. However, I have had conversions that took 4-5 hours due to the amount of data involved. This process is obviously justified in those situations.
 


 

I've used the open source telnet/ssh client Putty for many years to manage remote machines. Recently I was using a blog post to set up a virtual machine to be used as a web server. The instructions were complete, but the commands were long and difficult to retype. I wanted to copy and paste but I couldn't use Ctrl-V to paste in Putty because the key sequence is just sent to the connected computer. It turns out copy and paste is extremely easy in Putty, but you need to read the help file. Copy and paste works with the mouse. Select any text with the left mouse button and it is immediately copied to the Windows clipboard. Right-click to paste the contents of the clipboard into Putty. Here are a few bonus tips:

Shift-Insert will also Paste into the Putty window.

Shift-Right-Click will bring up a context menu in the Putty window. The top menu item is Paste.

Double-Click will select the whole word below the mouse cursor and copy it to the clipboard.

Triple-Click will select the whole line below the mouse cursor and copy it to the clipboard.


 

During a Level Platforms monthly maintenance window, there was a server that decided it wanted to do updates the following week and not the night of the scheduled maintenance.  It appears that for some reason it didn’t receive the update policy until after the scheduled time, so it set the schedule to the following week.  If you have this problem, the following procedure worked for me:

  1. Log into the SC.
  2. Go to Patch Management – Settings
  3. Click on Windows Update Agent Policies, and select the appropriate server group.
  4. On the Policy tab, change the Automatic Update option to “Auto download and notify for install”
  5. Log into the server that you are having problems with and run the following command in command prompt:
    • wuauclt.exe /detectnow
    • Windows Server 2008 makes it a little more friendly.  You can just open up the Windows Server Update Service console and click on the “Check for updates” link.

 

Recently, I was working on two similar issues with two different laptops. Both users reported problems with opening Word documents. When I started troubleshooting the first laptop, I opened Word and then proceeded to wait for around 15-20 seconds for the application to start and another 15 seconds for a new document to be created. When I closed Word, I got a prompt telling me that there were changes to the normal.dotm template and asking if I wanted to save them. Sure, why not? I re-opened Word and the problem still existed. When I received the same prompt on closing Word, I decided to go check out the templates folder where normal.dotm is stored. In both the Templates folder and the STARTUP folder (for Microsoft Word), there was a template file that was stuck open with a tmp file present. I removed both tmp file instances and successfully started Word. Problem solved!
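
If you want to check for these leftovers without clicking through folders, here is a rough Python sketch. The function name and the file patterns (`~$*` and `*.tmp`) are my assumptions about what the stuck files look like, and the folder paths in the example are the usual per-user defaults, which may differ on your machines. It only lists files; deleting them is left to you, with Word closed.

```python
import os
from pathlib import Path


def find_leftover_temp_files(folders, patterns=("~$*", "*.tmp")):
    """Return a sorted list of files in the given folders whose names
    match any of the patterns. Folders that do not exist are skipped."""
    found = set()
    for folder in folders:
        folder = Path(folder)
        if not folder.is_dir():
            continue
        for pattern in patterns:
            for candidate in folder.glob(pattern):
                if candidate.is_file():
                    found.add(candidate)
    return sorted(found)


if __name__ == "__main__":
    # Assumed default locations under the user's roaming profile.
    appdata = Path(os.environ.get("APPDATA", "."))
    folders = [
        appdata / "Microsoft" / "Templates",
        appdata / "Microsoft" / "Word" / "STARTUP",
    ]
    for path in find_leftover_temp_files(folders):
        print(path)
```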

The second laptop was slightly different. Word would start immediately, but when double-clicking on a document, it would fail to load. No error message; it just wouldn’t load. Looked in the templates directory and didn’t see the same symptoms. After more troubleshooting, I began disabling add-ins for Word. Turns out, the “Send via Bluetooth” add-in was causing these problems. I disabled the add-in and all was good in the world.


 

I was working with a third party vendor to set up SQL Reporting Services. The Reporting Services install was on a remote server, while the database for SQL Reporting Services was on a remote SQL Server.  Once SQL Reporting Services was installed and I was using the Reporting Services Configuration Tool, I kept getting an error when trying to create the Reporting Services database.  The error was non-specific and said I should look into permissions.  I checked permissions and determined that was not the problem.

One interesting note in this case was that there was already a database for another SQL Reporting Services install on the SQL Server named ReportServer, which is the default name. Because of that, I had to give the database for this install a name other than the default – in this case it was EdgeSightReportServer.

The configuration tool would create the databases as requested on the SQL server, but would give errors when running the scripts post-creation. In looking at the logs, I found errors stating that the database could not be found – the name of the database in the log was ‘EdgeSight’. The ReportServer portion of the name had been removed when the script ran, and of course a database by that shortened name was nowhere to be found.

The error was that ‘EdgeSight’ database did not exist. Indeed it did not. The big question is why did the script say Use EdgeSight to start with? I then set out to try the following:

  • Try to configure reporting services with a database named XYZReportServer
  • Try to configure reporting services with a database named XYZ

What needs to be kept in mind is that the default name for a SQL Reporting Services database is ReportServer.  Generating scripts for these two different database names gave some interesting results.  No errors with the database name as XYZ.  But XYZReportServer will generate the error.  Further testing showed that [AnyName]ReportServer generated this error.  Any other database name worked.
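
To make the pattern concrete, here is a tiny Python sketch of a pre-check you could run on a proposed database name. It only encodes what I observed in my testing, it is not documented product behavior, and the function name is made up. I only tested names with this exact capitalization, so the case-sensitive comparison is an assumption.

```python
DEFAULT_NAME = "ReportServer"


def name_matches_problem_pattern(db_name):
    """Return True for names of the form [AnyName]ReportServer, which
    produced the error in my tests. The bare default name is fine."""
    # Case-sensitive on purpose: other capitalizations were not tested.
    return db_name != DEFAULT_NAME and db_name.endswith(DEFAULT_NAME)


if __name__ == "__main__":
    for name in ("EdgeSightReportServer", "XYZReportServer", "XYZ", "ReportServer"):
        verdict = "likely to fail" if name_matches_problem_pattern(name) else "ok"
        print(name, "->", verdict)
```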

So what’s with this?  I briefly searched and did not find anything on this subject.  Is it a bug?  a feature?  Is it something that only I have experienced? Who knows, but in the meantime if you are getting an error creating a reporting services database check the name.


 

At a bank IT consulting customer, the print spooler on both print cluster nodes was crashing multiple times a day and posting the following error. The DLL in question was part of the Xerox Global Print Driver package.

Faulting application spoolsv.exe, version 6.0.6002.18294, time stamp 0x4c6a9898, faulting module x2utilGO.dll, version 5185.4100.0.0, time stamp 0x4d46e6ea, exception code 0xc0000005, fault offset 0x0004cf8a, process id 0x778, application start time 0x01cc15d439353ddd.

SOLUTION:
When looking at the orphaned spool files, I found that some of the files had a .TMP extension. These types of spool files are associated with LPR print jobs. I was able to pull the printer name from the spool file and found the specific printer that sent the job. This printer was added to the network on the evening of the 16th – the print spooler started having issues on the 17th.
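
A quick way to see what is sitting in the spool folder is to count the files by extension. The Python sketch below does only that; the function name is my own, and the path in the example is the Windows default spool directory, which I am assuming has not been relocated on your print server. Pulling the printer name out of a spool file is not covered here.

```python
from collections import Counter
from pathlib import Path


def count_spool_files_by_extension(spool_dir):
    """Return a Counter mapping upper-cased file extensions (for
    example '.TMP') to the number of files with that extension."""
    counts = Counter()
    for entry in Path(spool_dir).iterdir():
        if entry.is_file():
            counts[entry.suffix.upper()] += 1
    return counts


if __name__ == "__main__":
    # Default spool location; an assumption for any given server.
    spool_dir = Path(r"C:\Windows\System32\spool\PRINTERS")
    if spool_dir.is_dir():
        for extension, total in sorted(count_spool_files_by_extension(spool_dir).items()):
            print(extension or "(none)", total)
```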

In looking at the configuration of the printer, the TCP/IP port was set to use the LPR protocol. This was a configuration that we had used on some printers in the past. When the new printer was set up, it was assigned to the port that was used previously (which is a common procedure). Even though the documentation states this printer supports the LPR protocol, it clearly has an issue with this configuration. I set the port back to the Raw protocol and also checked every other Xerox printer port and set it from LPR to Raw where necessary (8 printers total).


 

A customer who does CPA work was getting errors submitting tax returns electronically. They were instructed to install an update containing the new forms they needed. While one of the employees was running the installation, it stopped responding and only half installed. They were then instructed to reinstall the old version over the current install and run the update again. I was asked to perform the procedure.

Every time I attempted to re-install the older version, it would hang and then give me an error that it was the wrong operating system.  I attempted the install from both a Windows 2003 and a Windows XP system, which is how it is normally installed. After consulting with ProSystems support, I found that the problem was that the Microsoft installer was trying to run alongside the installation. The tech said “right after starting the install, open the task manager and kill any instances of MSIEXEC.exe that is running”. I did this and the install ran without any problem. I then applied the updates using the built-in update agent, and they installed without any issues.

The nice thing is that when I asked the tech if this was documented anywhere, his response was “nope”.


 

From time to time, one of my desk monitors would take on a yellow cast.  It happened recently and persisted for about three days.  I was thinking the monitor would have to be replaced when a co-worker came by my desk, glanced at the monitor and said “you’ve got a bad connection.”

At that time we checked the monitor cable and found it was secure, but the “bad connection”  idea made me wonder if perhaps my laptop wasn’t properly seated in the dock.  I undocked and then docked again, taking care the laptop was firmly locked into place.   When the monitor came up this time, it was back to the normal color.

Thanks to my co-worker for recognizing the problem right away.  And when checking connections, it is important to think through every link along the way.


 

I’ve been upgrading our internal Office Communication system to the new Lync 2010 environment. Everything I had been reading showed the two servers can run side-by-side, albeit with different pools created. Running side-by-side allows for easy testing and migration rather than switching everyone over and hoping it works. Unfortunately, what I didn’t realize was the two servers use some of the same database names. While Microsoft has documented this, you have to dig a little through the documentation to find it.

I discovered this travesty soon after I hit the magic “go” button. This button (also known as “Publish topology”) started the deployment of the Central Management Store into my SQL instance. This process involves taking existing databases and placing them into restricted mode. Then the installer attempts to drop the database and recreate it. Since these were OCS R2 databases, however, the Lync installer had problems recreating over the existing table layout. The whole process choked, leaving the databases in a funky, inconsistent, restricted state and Communicator non-functional. I was able to connect as ‘sa’ and remove the restriction, but the databases were pretty much a lost cause. Restoring from the previous night’s backup allowed everyone to get back online.

Moral of the story: Here’s yet another Microsoft product that does not warn you before dropping databases. Be wary when installing applications that automatically set up databases as part of the installation procedure.


 

Recently, I was working on our Cisco 3560E switch. I needed to create a route-map and apply it to an interface for some changes we were planning to make. I was able to create the route-map, but it wouldn’t allow me to apply it to the interface with the “ip policy route-map” command. After doing some research I realized that to apply a route-map to an interface for policy-based routing, our switch had to be licensed for “IP Services.” The command wasn’t even available, which makes sense given that it wasn’t supported. After upgrading the license, I tried to apply the “ip policy route-map” command again. I did a “sh run int vlan 1” to see if the command was showing up in the config and to my surprise it wasn’t listed!  I went back to the documentation and found the route-map command “set ip default next-hop” was not supported at all on the 3560/3750 switch platforms. I removed this command from my route-map and applied the route-map to the interface and everything seemed to apply correctly. Unfortunately, our whole plan revolved around the ability to use the “set ip default next-hop” command.

So when you are working with Cisco equipment there are at least 3 ways they let you know a command isn’t supported in the IOS:

  1. The logical way:  The command isn’t present in IOS and it can’t be used.
  2. The illogical way: IOS allows you to apply a command, but doesn’t prompt you with an error if another “child” command is not supported.  This can only be discovered if you review the configuration and see that the command you entered is nowhere to be found.
  3. And what I like to call “The Cisco Way”:  Include the command in the IOS to lead users to believe that the command is supported and works with that IOS/platform all the while not supporting the command in any variation of the IOS/Platform.

After further review, the documentation did have a note that stated the command we needed wasn’t supported on these platforms.  In summary, it is a good idea to fully read any and all documentation on supported/unsupported commands for a platform.