Blog: Networking

After working with SecurID to migrate to a new server (which requires a completely new server, a fresh install of the software, access to the original seeds, a backup and restore of the current database, etc.), we finally got the RADIUS server responsive, but I still could not get it working with the Cisco routers.  One particularly aggravating issue that held me up for a while is that the router not only allows the configuration of multiple RADIUS servers, it also allows multiple entries for the same server.  So if you initially enter the wrong port numbers and then re-enter the line with the correct ones, the line with the incorrect information remains active and your RADIUS tests will continue to fail.
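For example, on IOS the stale definition has to be removed explicitly; something along these lines (the address, ports, and key below are placeholders, not the customer's actual values):

show running-config | include radius-server host
configure terminal
! remove the entry that was created with the wrong ports
no radius-server host 192.168.1.20 auth-port 1645 acct-port 1646
! leave (or re-add) only the corrected entry
radius-server host 192.168.1.20 auth-port 1812 acct-port 1813 key <shared-secret>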

Lesson learned:  Sometimes you have to read through your configuration again, to make sure everything is the way you “know” it is.


 

SQL Server access from remote client machines:  If you have a named instance of SQL Server, there are specific things you need to do to allow clients to reach the database remotely. The default instance of SQL Server listens on TCP port 1433. Even for the default instance you will probably want to create an inbound rule on the SQL host that allows inbound traffic on TCP port 1433, or that allows sqlservr.exe to accept connections on that port.  For named instances, the ports used to talk to the SQL box are dynamic by default. The SQL Browser service can advertise the exact ports used by the named instances if UDP port 1434 is allowed through the firewall. A complete explanation of these issues can be found in this document: http://blogs.technet.com/b/nexthop/archive/2011/04/12/using-lync-server-2010-with-a-custom-sql-server-network-configuration.aspx. The document specifically addresses the Lync client connection, but the principles are the same regardless of the application.
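As a rough sketch, the firewall rules on the SQL host could be created like this (the rule names are arbitrary, and the sqlservr.exe path is just an example that depends on the SQL version and instance name):

netsh advfirewall firewall add rule name="SQL Server TCP 1433" dir=in action=allow protocol=TCP localport=1433
netsh advfirewall firewall add rule name="SQL Browser UDP 1434" dir=in action=allow protocol=UDP localport=1434
netsh advfirewall firewall add rule name="SQL Server (named instance)" dir=in action=allow program="C:\Program Files\Microsoft SQL Server\MSSQL10_50.MYINSTANCE\MSSQL\Binn\sqlservr.exe"

The first rule covers a default instance, the second lets the SQL Browser advertise named-instance ports, and the program-based rule covers a named instance that is using dynamic ports.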


 

A while back, we had a problem with two of our Cisco Aironet devices at a customer site that kept dropping their wireless connection to each other.  The old radios had been replaced by these newer Aironet devices while keeping the same antennas.

The error message, among many, in the logs stated: “Packet to client <clientname> reached max retries, removing the client”.

At first, we thought that they might be overshooting the other end by having their signal strength set too high.  We tried lowering the signal strength, but it didn’t help much.  We scanned the airwaves for interference, but couldn’t decisively find anything troublesome on the channel frequencies.  We also checked the alignment of the antennas and saw that they appeared to be in the same position they had been in for quite some time.

I came across a wireless troubleshooting guide by Cisco that mentioned the problem can be an indication of a bad RF environment. It suggested putting the command “packet retries 128 drop-packet” on both access points as a workaround for bad RF.  After this command was applied, the wireless connection stopped dropping.
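For reference, on an autonomous Aironet AP the command goes under the radio interface; a quick sketch (the interface name depends on the radio and model):

interface Dot11Radio0
 packet retries 128 drop-packet

The drop-packet keyword makes the radio discard the offending packet once it hits the retry limit instead of removing the client, which is the behavior described by the log message above.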


 

Background setup:

This site has VMware vSphere 5.0 hosts that connect to NFS datastores on a NetApp SAN/NAS.  There is a dedicated switch stack of Dell PowerConnect 5524 switches between the NetApp and the VMware hosts.

Issue description:

Over the last couple of weeks I had been seeing VMware virtual machines pause or, in some cases, drop sessions.  The Windows event log would consistently record an Event ID 129 with a source of LSI_SAS: "Reset to device, \Device\RaidPort0, was issued."  Further research showed that this event is usually generated when there is high I/O on the SAN.  However, the SAN at this location wasn’t experiencing high I/O.

I started to notice the following NFS disconnect error while I was logged into the SAN:
[nfsd.tcp.close.idle.notify:warning]: Shutting down idle connection to client (192.168.1.10) where receive side flow control has been enabled. There are 0 bytes in the receive buffer.

Resolution:

Per NetApp’s best practice document, flow control should be disabled on the storage network when using modern hardware.  I had flow control enabled on both the switch stack and the SAN, and this was apparently causing the disconnects.
http://media.netapp.com/documents/tr-3749.pdf
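For what it’s worth, a sketch of the change on both sides (interface names are examples; exact syntax depends on the Data ONTAP release and the PowerConnect firmware):

On the NetApp (7-Mode), per storage interface, and also in /etc/rc so it persists across reboots:

ifconfig e0a flowcontrol none

On the PowerConnect ports facing the NetApp and the ESXi hosts:

configure
interface gigabitethernet 1/0/1
flowcontrol off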


 

I had replaced the power supplies in the blade enclosures for a customer.  When they had all been replaced, I noticed that one enclosure showed a power warning for all but two blade servers.  I rebooted one thinking it would clear the alert, but it did not.  I researched the issue on the HP website to no avail.  In desperation, I shut down that server blade completely, waited, and then powered it back on.  That finally cleared the warning.  I shut down the rest of the blades, turned them back on, and all of the warnings cleared.


 

Vertafore performed an AMS360 software upgrade, and not one of the client systems would function properly. The problem was seen on physical PCs located in the main office. None of the View desktops (multi-user virtual desktop environments) used by remote users had the issue seen in the main office. The vendor said it was a permissions issue, but after trial and error it was determined the problem was not a permissions issue. By comparing the affected systems with the View desktops that worked properly after the upgrade, we found that for some reason the Local folder in the user’s profile was not getting the updated settings.

To resolve the problem, CoNetrix rebooted the machine, logged in as the administrator, and removed the user’s profile. We then uninstalled and reinstalled the AMS workstation client and logged the user back in, which reloaded their roaming profile and rebuilt the Local and LocalLow profile folders. When we then logged in to AMS as the user, the system check would run as it should and add the updated DLLs and information to the user’s Local profile folder. After these steps were completed, each user’s AMS worked as it should. However, the process also required a repair of each user’s Zixmail and Zywave installations because the Local and LocalLow folders had been rebuilt.

All the repairs were completed without elevating the user’s rights.

Also, as stated above, none of the multi-user virtual desktop environment systems had this problem.  After the Internet Explorer 9 upgrade, each of the View systems was recomposed, which removes the user’s profile and rebuilds the user’s Local and LocalLow folders at first login.


 

A customer reported a problem with one of their users not being able to get email on his iPhone. His phone would set up his account successfully, but when he went to the mail app, it would say “The connection to the mail server failed.” The customer tried setting up another user’s mailbox on his phone and it worked correctly. I set up both accounts on my iPhone and saw the same results. I checked his account in Exchange to make sure ActiveSync was enabled. After some research, I found that the user’s Active Directory account must inherit permissions from its parent for email to sync.

To change this setting, first open Active Directory Users and Computers.

Enable Advanced Features – (View > Advanced Features).

Find the user’s account in Active Directory and open the Properties.

Go to the Security tab > Advanced > check “Include inheritable permissions from this object’s parent”.

Click Apply and close Active Directory.

Refresh the mail app on the phone and mail should start flowing.
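If you would rather make the change from PowerShell, here is a minimal sketch (it assumes the ActiveDirectory module is loaded, which provides the AD: drive, and "jsmith" is a placeholder account):

Import-Module ActiveDirectory
$dn  = (Get-ADUser jsmith).DistinguishedName      # placeholder username
$acl = Get-Acl "AD:\$dn"
$acl.SetAccessRuleProtection($false, $true)       # $false = re-enable inherited permissions
Set-Acl "AD:\$dn" $acl

Worth noting: this checkbox is commonly found cleared on accounts that are, or once were, members of protected groups (Domain Admins and the like), because AdminSDHolder periodically strips inheritance from those accounts, so the setting can revert.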


 

BitLocker encryption on my new laptop asked for a recovery key every time I booted the system. Nothing had been changed within the system to cause this behavior. In an attempt to stop it from happening, I decrypted and re-encrypted the drive. When the re-encryption was complete, I rebooted and it asked for the recovery key again. I then went into the BitLocker settings, suspended protection, and rebooted without an issue. I then resumed protection, rebooted, and it did not ask for the recovery key again. So if your system begins prompting for the recovery key at every boot, suspending and then resuming BitLocker protection may resolve the problem.
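The same suspend/resume cycle can also be run from an elevated command prompt with manage-bde (the drive letter is just an example):

manage-bde -protectors -disable C:
(reboot)
manage-bde -protectors -enable C: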


 

Do you remember playing those games where you would look at two seemingly “identical” pictures and try to spot the 5 differences between the two? I’d like to play that, except there’s only one difference between these two:

Here’s the situation. We’re hosting files off of a CIFS share on our NetApp device. We needed to back up this CIFS data using Backup Exec. I went through and set up NDMP on both the NetApp and Backup Exec so that Backup Exec could connect to the CIFS share and grab a copy of the data. In order to get the resource credentials test to show successful, I had to set the logon account for the NetApp itself to our domain Backup Exec service account (BESA) and the logon account for the actual CIFS share to the NDMP user account I had created for backups.

Secondly, there is another gotcha associated with how the NDMP account for backup is set up. If you use the root account, you can simply use the regular password, just as if you were logging into the NetApp device from the backup server. However, if you create a new account, you must generate an NDMP-specific (encrypted) password for that account on the NetApp device. To create the account, do the following:

useradmin user add <login_name> -g "Backup Operators"
(Type in the password when prompted)
ndmpd password <login_name>

The password string that the ndmpd password command outputs is what you copy and paste into the backup server so that the NDMP user can authenticate.


 

I have been doing some testing on a Lync phone system, specifically with response groups and call queues, to figure out the most appropriate way to design our incoming call routing once we completely migrate to the Lync phone system. The response group features provide most of what you need to create basic call-center-ish call routing, queuing, and caller-facing feedback. You can configure your response groups to pick up an incoming call and then play a recorded message. You can then configure the response group to provide callers with routing options (like press or say 1 for sales… press or say 2 for support… stuff like that).

My goal was to imitate the current phone system setup as much as possible. Right now, when someone calls the main number (during business hours), it just rings back to the caller until one of the call operators answers. One of the operators must be “online” to receive the call. In Lync, the equivalent of this functionality is a response group configured to receive the inbound call and then ring all users who are “logged in” to the response group. The desire was expressed to make sure SOMEONE (not something) would be the first to answer the phone. So, from the Lync configuration side, this is basically a response group with no recorded greeting and no caller routing options… the incoming call rings, Lync answers, and the call is placed in the response group for someone to pick up.

The problem is that when Lync picks up the call and transfers it to the response group, it terminates the ring-back to the caller and since the call was transferred to the response group, it starts playing music on hold.....(it’s nice music on hold but could be very confusing to the caller). I struggled with how to fix this and still have the caller have the impression that they were not “talking to a machine”. The solution is somewhat lame, but what the ears hear, the mind believes. Basically, I called into our current phone system, put my phone on speaker and recorded the ring-back with my cell phone. Then, I created a .wav file of the ring-back and set that .wav file as a looping auto greeting for the response group. So, when an incoming call comes into the response group, the caller gets the impression that the phone is still ringing…just waiting for someone to answer. Problem solved.