Blog: Disaster Recovery

Recently, one of our information security audit customers, a bank, lost a backup domain controller and contacted its network vendor to rebuild the machine. The bank thought everything was in order until it was audited three months later. The audit discovered that the rebuilt machine had not been restored to its backup domain controller role and that no antivirus software had been installed. When the bank contacted its network vendor, it was told there were some issues the vendor "meant to get back to". Regardless of the problems assigning the domain controller role, the vendor still should have installed antivirus and the other applications requested by the bank.

Why were steps missed? No equipment recovery checklists had been created in the bank's Business Continuity Plan (BCP), so the vendor didn't have a detailed list of steps to follow during the recovery. Missing checklists can lead to both lost time and missed steps when rebuilding equipment. Ensure equipment recovery checklists exist for the critical components of your infrastructure.


 

The Microsoft Exchange team is an interesting group. For years, it seemed like they didn't listen to any user feedback about the product. The GUI was way too complicated, and automation was difficult because there was no convenient CLI for the product. When Exchange 2007 came out, even though the GUI lacked a little functionality, the product as a whole was way better, and it incorporated a lot of the features that Exchange admins had been asking for. Most notable were Local Continuous Replication (LCR), Cluster Continuous Replication (CCR), and Standby Continuous Replication (SCR), which allow local HA and DR for Exchange with a lot less complexity than past versions.

Now, enter Exchange 2010. Exchange 2010 takes everything we were introduced to in Exchange 2007 on the availability side (LCR, CCR, SCR) and removes it. Yep, removes it. All the availability features have been merged into one technology called Database Availability Groups (DAGs). DAGs have a really nice feature set on paper. I say on paper because I haven't really implemented them in practice, but the one deal killer, at least for most of our customers, is that DAGs REQUIRE Enterprise Edition OS licenses. The reason is that DAGs still depend on pieces of Microsoft Cluster Services. This kinda stinks because in Exchange 2007 you could implement database HA with LCR and DR with SCR and never buy any Enterprise licenses. Like I said, the Exchange team is an interesting group. I wonder if they discussed this and decided, "That's no big deal. Doesn't everyone have Enterprise volume licensing?"


 

When I tried to use PlateSpin to seed a server image across a WAN connection for a DR site, the job would fail after a certain amount of time. It turns out the process has 24 hours to finish or it fails. This time limit is hard-coded and cannot be increased. A way around this is to use a local image server at both ends.

Update: We've added another post that discusses WAN optimizations for PlateSpin that you'll want to read.

These are the basic steps you need to follow:

  1. Create an image server local to the source (discover the machine, then right-click it in Portability Suite and choose install image server).
  2. Capture the image (drag the source to the image server).
  3. Export the image (this step fixes the config.xml file so that it points to the right location after the image files are moved).
    • The Image Operations tool is installed with the PowerConvert server and not with the Image server.
      On the PowerConvert server you have to locate the following folder: “C:\Program Files\PlateSpin PowerConvert Server\bin”
    • In that folder locate the folder: “ImageOperations” and copy it to the Image server.
    • On the Image server:
      1. Open a command prompt.
      2. Navigate to the folder “ImageOperations” that was copied over from the PowerConvert server.
      3. Run the command “imageoperations /gather /imagepath={path of the image} /output={path that you want to place the copy of the image}” without the quotes.
      4. Once the command completes, the folder specified in /output will contain the files that need to be copied to the other image server.
  4. Move the files across the WAN (FTP, physical media, or whatever method works best for the environment).
  5. Create an image server local to the target ESX host.
  6. Import the image into the image server local to the target.
    • The Image Operations tool is installed with the PowerConvert server and not with the Image server.
      On the PowerConvert server you have to locate the following folder: “C:\Program Files\PlateSpin PowerConvert Server\bin”
    • In that folder locate the folder: “ImageOperations” and copy it to the Image server.
    • On the new Image server:
      1. Copy the “ImageOperations” folder to this Image server from the PowerConvert server that installed it.
      2. Open a command prompt.
      3. Navigate to the folder “ImageOperations” that was copied over from the PowerConvert server.
      4. Run the command “imageoperations /register /imagepath={path of files copied from old Image server}” without quotes.
      5. Once the command completes, refresh the details of the Image server from within PowerConvert.
  7. From the Discovered Server list, expand the image server local to the target and deploy the image to the target ESX host.
  8. Select the deployed server and choose Prepare For Synchronization.
  9. Set up a file-based server sync job.

Once completed, the job should allow incremental updates over a slower link without hitting the 24-hour time limit.
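Before seeding across the WAN, it can help to estimate whether a full image could ever finish inside the 24-hour window. The following is a rough sketch, not anything PlateSpin provides: the function names, the decimal-gigabyte convention, and the 0.7 efficiency factor are all assumptions made for illustration, and real throughput will vary with the link and the traffic on it.

```python
TIME_LIMIT_HOURS = 24  # the hard limit described in this post


def transfer_hours(image_gb: float, link_mbps: float, efficiency: float = 0.7) -> float:
    """Estimate hours needed to move image_gb (decimal GB) over a link_mbps link.

    efficiency is an assumed fraction of the nominal link rate that is
    actually achieved; 0.7 is a placeholder, not a measured value.
    """
    if image_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("image_gb must be >= 0, link_mbps > 0, 0 < efficiency <= 1")
    bits = image_gb * 8 * 1000**3                    # gigabytes to bits
    usable_bits_per_second = link_mbps * 1_000_000 * efficiency
    return bits / usable_bits_per_second / 3600      # seconds to hours


def fits_in_window(image_gb: float, link_mbps: float, efficiency: float = 0.7) -> bool:
    """True if the estimated transfer finishes inside the 24-hour limit."""
    return transfer_hours(image_gb, link_mbps, efficiency) < TIME_LIMIT_HOURS


if __name__ == "__main__":
    # Hypothetical example: a 100 GB image over 10 Mbps and 100 Mbps links.
    for mbps in (10, 100):
        hours = transfer_hours(100, mbps)
        print(f"100 GB over {mbps} Mbps: about {hours:.1f} h, fits: {fits_in_window(100, mbps)}")
```

If the estimate lands anywhere near 24 hours, the local image server approach above is probably the safer choice.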


 

The other day I was setting up a Disaster Recovery DHCP server. Part of the testing process was to set up a test branch with an additional 'ip helper-address' command in the router so that it would start forwarding DHCP broadcasts across the WAN to the Disaster Recovery site. I entered the command and immediately started seeing traffic at the DR DHCP server. However, I was seeing more UDP traffic than just DHCP. I also started seeing errors like this in the event logs:

The master browser has received a server announcement from the computer <MACHINE> that believes that it is the master browser for the domain on transport NetBT_Tcpip_{66AC525D-CD06-401. The master browser is stopping or an election is being forced.

It's not uncommon to see these messages from time to time, but I was seeing them non-stop for about an hour. After some searching, I found that the 'ip helper-address' command that is standard in our Cisco router config turns on UDP broadcast forwarding for 8 different protocols. DHCP is one of them, but I wanted to turn forwarding off for all the others. So, I found this command:

ip forward-protocol udp <protocol/port>

The previous command was supposed to fix it. The router would accept 'ip forward-protocol udp dhcp', but it would not show up in the running config. Finally, I realized it is one of those commands where you have to turn off what you don't want instead of turning on what you do, so I entered these commands to stop the NETBIOS broadcast traffic:

no ip forward-protocol udp netbios-ns
no ip forward-protocol udp netbios-dgm
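
To turn off everything except DHCP rather than just the two NETBIOS services, the same pattern can be generated for each default service. This is only a sketch: the keyword and port table below reflects the default set as commonly documented for 'ip helper-address' and is an assumption here, so check it against 'ip forward-protocol udp ?' on your own IOS version before pasting anything into a router. The function name is made up for illustration.

```python
# Assumed default UDP services forwarded once 'ip helper-address' is configured.
# Keyword -> port; verify against your IOS release before relying on this list.
DEFAULT_FORWARDED = {
    "tftp": 69,
    "domain": 53,
    "time": 37,
    "netbios-ns": 137,
    "netbios-dgm": 138,
    "bootps": 67,   # DHCP/BOOTP server
    "bootpc": 68,   # DHCP/BOOTP client
    "tacacs": 49,
}

# Services that should keep being forwarded (DHCP only, as in this post).
KEEP = {"bootps", "bootpc"}


def disable_commands(forwarded=DEFAULT_FORWARDED, keep=KEEP):
    """Build 'no ip forward-protocol udp ...' lines for every service not in keep."""
    return [
        f"no ip forward-protocol udp {name}"
        for name in forwarded
        if name not in keep
    ]


if __name__ == "__main__":
    for line in disable_commands():
        print(line)
```

The output includes the two NETBIOS lines above plus one line for each of the other non-DHCP services in the table.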