Blog: Networking

In working with XenServer over the past couple of months, I have found that information is harder to come by than it is with VMware. We are only using XenServer for one customer, and they are using the free version, so support is not an option. Up until last week, I had little need to get into the XenServer CLI. It’s pretty easy to configure via XenCenter and our setup is pretty simple. However, the other day, our monitoring software detected an issue where the network interfaces on one of the monitored VMs were logging a high number of discards. One peculiar thing was that the discard counts were exactly the same for Tx and Rx. After some research, I decided that it would be a good idea to turn off all the offloading features in XenServer. XenServer sees network interfaces in two forms: physical interfaces (pifs) and virtual interfaces (vifs). Pifs are the actual connections to the server. Vifs are the NIC interfaces of the VMs. Naturally, turning off all of this can only be done via the XenServer CLI. So, part one of the gotcha… here is a set of scripts that can help in manipulating network interfaces in XenServer.
 
Script to turn off all offloading features on all vifs and pifs:
====================================================
#!/bin/bash
# Disable NIC offloading features.
# With --local (or -l), apply the change directly to the host's Ethernet
# devices via ethtool; otherwise record the settings in xapi for every
# vif and pif.

if_modes="rx tx sg tso ufo gso"

if [[ "$1" == "--local" || "$1" == "-l" ]]; then
    echo -n "disabling checksum offloading for local devices... "
    for iface in $(ifconfig | awk '$0 ~ /Ethernet/ { print $1 }'); do
        for if_mode in ${if_modes}; do
            # Errors from devices that lack a feature are discarded.
            ethtool -K "$iface" "$if_mode" off 2>/dev/null
        done
    done
    echo "done."
else
    echo -n "disabling checksum offloading in xapi settings... "
    for VIF in $(xe vif-list --minimal | sed -e 's/,/ /g'); do
        ###xe vif-param-clear uuid=$VIF param-name=other-config
        for if_mode in ${if_modes}; do
            xe vif-param-set uuid="$VIF" other-config:ethtool-${if_mode}="off"
        done
    done
    for PIF in $(xe pif-list --minimal | sed -e 's/,/ /g'); do
        ###xe pif-param-clear uuid=$PIF param-name=other-config
        for if_mode in ${if_modes}; do
            xe pif-param-set uuid="$PIF" other-config:ethtool-${if_mode}="off"
        done
    done
    echo "done."
fi
====================================================
- Create a text file (turnOffloadingOff) containing the script, using vi
- Change perms to make it executable
                        chmod 755 turnOffloadingOff
- Run the script
                        ./turnOffloadingOff
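
To confirm that a --local run took effect, one option is to filter the output of ethtool -k for features still reported as on. This is a minimal sketch and not part of the original script; the function name offloads_still_on and the interface name eth0 are assumptions for illustration.

```shell
#!/bin/bash
# Hypothetical helper: reads `ethtool -k` output on stdin and prints the
# name of every feature whose state still starts with "on".
offloads_still_on() {
    awk -F': ' '$2 ~ /^on/ { gsub(/^[ \t]+/, "", $1); print $1 }'
}

# Example usage on the XenServer host (eth0 is an assumed interface name):
#   ethtool -k eth0 | offloads_still_on
```

An empty result means none of the listed features is still enabled on that interface.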
 
Other Useful XenServer Commands
 
- determine uuids of physical interfaces on XenServer
                        xe pif-list host-name-label=<hostname>
- determine parameters of the specific pif given the uuid
                        xe pif-param-list uuid=<uuid of pif>
- determine uuids of virtual interfaces of VMs on XenServer
                        xe vif-list
- determine parameters of the specific vif given the uuid
                        xe vif-param-list uuid=<uuid>
- VMs created after the script was run will NOT have these vif other-config settings; set them manually for each new vif
                        xe vif-param-set uuid=<uuid of vif> other-config:ethtool-gso="off"
                        xe vif-param-set uuid=<uuid of vif> other-config:ethtool-ufo="off"
                        xe vif-param-set uuid=<uuid of vif> other-config:ethtool-tso="off"
                        xe vif-param-set uuid=<uuid of vif> other-config:ethtool-sg="off"
                        xe vif-param-set uuid=<uuid of vif> other-config:ethtool-tx="off"
                        xe vif-param-set uuid=<uuid of vif> other-config:ethtool-rx="off"
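
The per-vif commands above can be generated rather than typed six times. The sketch below only prints the commands (a dry run), so they can be reviewed before being applied. The function name vif_offload_commands and the VM name "newvm" are hypothetical, and filtering vif-list by vm-name-label is an assumption about how the new VM would be selected.

```shell
#!/bin/bash
# Sketch: print the xe commands that disable each offload mode on one vif.
# Nothing is changed until the output is piped to bash.
if_modes="rx tx sg tso ufo gso"

vif_offload_commands() {
    local vif_uuid="$1" if_mode
    for if_mode in ${if_modes}; do
        echo "xe vif-param-set uuid=${vif_uuid} other-config:ethtool-${if_mode}=off"
    done
}

# Example usage for every vif of one VM ("newvm" is a hypothetical name):
#   for vif in $(xe vif-list vm-name-label=newvm --minimal | sed -e 's/,/ /g'); do
#       vif_offload_commands "$vif" | bash
#   done
```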

So on to part two of this post… this didn’t fix the problem. After a ton of additional troubleshooting, I determined that this behavior is due to the Citrix paravirtual (PV) NIC driver. The issue goes away if you uninstall XenTools so that the PV driver isn’t used. On Windows 2008 and Windows Vista/7 VMs, the PV Ethernet driver reports discards to the OS. On Windows 2003 and XP, it does not. Keep in mind the discards could be broadcast packets not intended for the VM or miscellaneous dom0 traffic. In any case, it doesn’t make much sense, but there isn’t actually anything wrong with the VM. I ended up removing the monitoring of the VM’s NIC interfaces because I did want to keep using the PV Ethernet driver.


 

We recently encountered some machines running Windows Embedded with the Write Filter enabled that were losing their trust relationship with an Active Directory domain due to mismatched passwords associated with computer accounts.

Cause: After 30 days (the default), the machine account password expires. The password is updated on the machine as well as in AD. At some point, the machine is rebooted. Since the machine is running in read-only mode (write filter enabled), the password associated with the computer account reverts to the password that is stored with the image on the machine. Since that password does not match the updated password stored in AD, the machine can no longer communicate with the DC and the trust is broken.

Resolution: Windows Embedded Standard (from XP forward) has the ability to retain specific registry keys across reboots. The feature is called the Registry Filter service (Regfilter), and it works like this: determine what you want to retain, and configure it in a specific area of the registry. The service monitors the specified keys for changes; if there are any, it keeps them in memory and also writes them to c:\regfdata. From then on, any system call that reads or writes one of those keys is served from the copy in memory and in c:\regfdata. On a prebuilt HP image, the keys for the Terminal Services Client Access License (TSCAL) and the Domain Secret Key (the key that holds the machine account password involved in the issue above) are already added to the Regfilter registry key. This process didn’t seem to be working with the HP image we were using at the time. However, the most current image on HP’s site did work.


 

One of our customers was having problems connecting Outlook to Exchange accounts hosted with Microsoft through their Office 365 program. The machines in the domain running Windows XP with Office 2007 had no problem connecting, but none of the Windows 7 machines with Office 2010 were able to connect. Since the email accounts were hosted at Microsoft, Outlook was using port 80 web traffic to establish a connection. After we exempted the source IP of the test machine from filtering in the Barracuda, the connection immediately worked. This proved that something was not working correctly inside the Barracuda.

The domain outlook.com had been whitelisted before any of these changes. Barracuda tech support found several IP addresses that Outlook was trying to contact. They suggested adding those IP addresses to the list of addresses that bypass the Barracuda (which is the proxy server) and opening port 80 for those IP addresses on the firewall. We made the suggested changes and it worked correctly. The Barracuda engineering department found that the traffic to outlook.com was being redirected to live.com, and was therefore being dropped by the Barracuda. Barracuda suggested we add an expression to the Barracuda to allow port 443 traffic to live.com, but they later said we would probably have to whitelist live.com for this to work properly. We chose to just leave port 80 open to those IP addresses on the firewall and have clients bypass the proxy for those addresses.
 
When troubleshooting issues that might be related to the Barracuda, it is often helpful to temporarily exempt the source IP of the machine on which you are working. When the Barracuda is in Forward Proxy mode, this can be done by going to Advanced > Proxy. Add the IP to the Source IP group under the Proxy Authentication Exemptions.


 

We recently became aware of a problem with Exchange 2010 users being unable to set their out of office settings.  With their legacy Exchange 2003 mailboxes, they could set out of office.

When trying to set out of office within Outlook, users would get an error message that the Exchange server could not be contacted.  Performing the “Test e-mail autoconfiguration” kept failing to connect to the server with HTTP status code 401 Unauthorized.  It was also noted that OWA would not allow logins because the login credentials would not work for anyone.

After trying to troubleshoot permission problems within IIS of the mail server, I eventually came across this thread: http://social.technet.microsoft.com/Forums/en-US/exchange2010/thread/36662e7c-8c4a-44dc-85d9-eb7fab1d8b49/

I ran PowerShell as an administrator on the server, and typed in the following:

  • Import-Module ServerManager
  • Add-WindowsFeature NET-Framework,RSAT-ADDS,Web-Server,Web-Basic-Auth,Web-Windows-Auth,Web-Metabase,Web-Net-Ext,Web-Lgcy-Mgmt-Console,WAS-Process-Model,RSAT-Web-Server,Web-ISAPI-Ext,Web-Digest-Auth,Web-Dyn-Compression,NET-HTTP-Activation,RPC-Over-HTTP-Proxy

It appears that this command re-imports many IIS modules.  In the thread, the command has a -Restart at the end, but I left it off to prevent the server from rebooting.  A restart was not necessary in my case to resolve all of the issues with OWA/OOF/Autoconfiguration.


 

One of our customers has a point-to-point wireless connection, which started failing with an error that indicated problems with radio interference.  I ran the utility to check for busy radio channels, but it did not indicate any problems.  (Many channels came back as completely unused.)  I eventually reduced the transmit power of the root-bridge radio, which caused the connection to come back up. 

In retrospect, the issue was likely caused by a reflection that produced a second radio signal out of phase with the original signal.  This reduced or cancelled the signal at the antenna.
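
As a rough illustration of why an out-of-phase reflection can reduce the received signal, consider a simplified two-path model. The symbols are assumptions introduced here, not measurements from the site: A_d is the amplitude of the direct signal, A_r is the amplitude of the reflected signal, and \varphi is the phase difference between them at the antenna. The combined amplitude A is then:

```latex
A = \sqrt{A_d^2 + A_r^2 + 2 A_d A_r \cos\varphi},
\qquad
A\big|_{\varphi = \pi} = |A_d - A_r|
```

When the reflection arrives half a cycle out of phase (\varphi = \pi), the two signals subtract, and if they are of similar strength the result is close to zero.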


 

I was recently trying to factory reset a Cisco Express 500 switch for use at a customer site.  I researched Cisco’s website and other websites, but nothing I tried would work.  The basic steps are these:

  1. Hold down the mode button while applying power to the switch.
  2. After the mode lights turn amber, let go and the switch will reset to defaults.
  3. After a short time a port (usually port 1) light will start blinking.  Plug your workstation/laptop into that port.  Your workstation/laptop should then acquire a DHCP address from the switch.
  4. You should then be able to access the web GUI using the default IP address.

Unfortunately, none of the online documentation I read mentioned the fact that this only worked when Windows XP was the operating system.  Windows Vista or Windows 7 will not work.  I did not find this out until after the fact when another engineer, who had also struggled with this issue, informed me that this was the case. 


 

Cisco SGE2000 switches (and other Cisco switches) with a web interface still require that the running configuration get saved to the startup configuration.  Oddly, the option is buried under the “File Copy” menu option.  The “Save Configuration” menu option is for saving a backup (text) copy of the configuration.


 

One of our customers has a device that re-pins debit cards.  During the migration of users from the old Citrix farm to the new CoNetrix Citrix farm, users were having issues with this Magtec application.  When we launched the application, it would pop up a “Request Pin Timeout” error.  This meant that the application was unable to detect the Magtec IntilliPen device through the Citrix client.  We were on a very strict time schedule, so a coworker began looking into the issue while I continued to migrate users.  Four hours and numerous tests later, Magtec still wasn’t working.

After we finally finished migrating all the users (2 weeks later), my coworker tagged me to help figure out why this was not working.  After numerous tests, I was sure that something in the intCat.ini file wasn’t set up correctly.  We could launch the application and it would connect fine on the old farm, but on the new farm it would not work.  After calling Magtec support 3 times and explaining to them “I KNOW YOU HAVE SEEN THIS PROBLEM BEFORE”, I finally got a technician to send me a document that covered Citrix and Terminal Server setups.

In this document I found 2 changes that needed to be added to the intCat.ini file

a.       [Communication] add IPType="REMOTE_RS232"

b.      [Motorized_Intellicoder] change to DeviceName="MCPUM_COM2"

I believe the [Communication] change is the most important piece, as it tells the application to use the client’s COM1 port instead of the local COM1 port.  After we added this to the intCat.ini file, the device connected immediately.
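
For reference, the two settings can be laid out and checked as in the sketch below. It writes a sample fragment to a temporary file and confirms both lines are present. The function name has_citrix_settings and the temporary file are hypothetical; a real intCat.ini will contain other sections and keys that are not shown here.

```shell
#!/bin/bash
# Sketch: report whether an intCat.ini-style file contains the two settings
# described above. The function name is hypothetical.
has_citrix_settings() {
    local ini_file="$1"
    grep -q '^IPType="REMOTE_RS232"' "$ini_file" &&
        grep -q '^DeviceName="MCPUM_COM2"' "$ini_file"
}

# Sample fragment showing which section each setting belongs to.
sample_ini="$(mktemp)"
cat > "$sample_ini" <<'EOF'
[Communication]
IPType="REMOTE_RS232"

[Motorized_Intellicoder]
DeviceName="MCPUM_COM2"
EOF

if has_citrix_settings "$sample_ini"; then
    echo "settings present"
fi
```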


 

Last week, one of our customers experienced a system-wide outage due to a new MAC on the network. That’s right… it is typed correctly. The user of a newly set up MAC OS Lion 10.7 laptop had a home folder set in their Active Directory properties. When the MAC user logged in and the laptop attempted to map the user’s home folder, the EMC Celerra NS40 NAS experienced a kernel panic and crashed… very ungracefully, I might add. That NAS serves as the backbone of the network, serving CIFS shares and about a dozen iSCSI LUNs to Microsoft Exchange, SQL Server, and VMware hosts. The outage only lasted about 2-3 minutes while the datamover failed over to the standby. However, it took nearly two hours to clean up the carnage post-failure, plus another two hours outside of business hours to fail the unit back to the primary datamover.

Specifics about the exact cause are not 100% clear, but two things are known. First of all, MAC OS 10.7 uses a different SMB client than previous versions. Formerly, MAC OS bundled the open source Samba software for Windows file shares and network directory services. With 10.7, Apple replaced the open source code with its own SMB implementation. Second, according to EMC, the root cause of the issue is that MAC OS 10.7 passes a NULL value in one of the SMB message headers. Obviously, this is a problem for the Celerra. No word as to how gracefully Windows file servers handle this issue.


 

When troubleshooting file shares that were not working for several computers, I was presented with an error saying I could not connect. The specific message was “Not enough server storage is available to process this command.” I was able to find an article discussing this at http://support.microsoft.com/kb/225782/.  After I increased the registry value at HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\... LanmanServer\Parameters\IRPStackSize by 3 (decimal) and rebooted, the file share immediately started working. If the error persists, the value should be increased further in increments of 3.
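
The increment rule can be written down as a small calculation. This is only a sketch of the arithmetic and does not touch the registry. The function name is hypothetical, the starting value of 15 is an assumed example, and the upper bound of 50 is my recollection of the documented maximum, so check it against the KB article before relying on it.

```shell
#!/bin/bash
# Sketch: compute the next IRPStackSize value to try, stepping by 3.
# max_value=50 is an assumption (recalled upper bound, not verified here).
next_irp_stack_size() {
    local current="$1" step=3 max_value=50
    local next=$(( current + step ))
    if (( next > max_value )); then
        next=$max_value
    fi
    echo "$next"
}

next_irp_stack_size 15   # prints 18
```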