Exchange 2010: Prevent DAG failover during vMotion

When Exchange 2010 servers run as virtual machines on VMware vSphere, a vMotion migration of an Exchange database server can trigger a failover of its databases.

The cause is often a cluster heartbeat timeout while the Exchange VM is moved to another ESX host: either the switches do not learn the VM's new port quickly enough, or the vMotion process takes a little too long.

By default, the heartbeat delay (the samesubnetdelay cluster property) is one second (1000 ms). The current settings can be checked with the "cluster /prop" command.

To prevent a failover of the Exchange databases from being triggered during a vMotion process, the value can be increased to two seconds. That should give all switches enough time to learn the changed switch port, and it should also bridge the short connection loss during the vMotion process.

The following command increases the value to two seconds:

cluster /prop samesubnetdelay=2000
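
As a rough sketch of what this change buys: the cluster does not declare a node down after a single missed heartbeat, but only after a number of consecutive misses set by the samesubnetthreshold property. Assuming that property is left at 5, the Windows Server 2008 R2 default (an assumption, since it is not covered above), the tolerated outage is roughly the delay multiplied by the threshold. The function and constant names below are made up for illustration.

```python
# Sketch: how long the cluster tolerates missed heartbeats before it
# treats a node as down (delay between heartbeats x allowed misses).

# Assumption: samesubnetthreshold stays at 5 missed heartbeats, the
# Windows Server 2008 R2 default; the article itself does not state it.
ASSUMED_SAME_SUBNET_THRESHOLD = 5


def detection_window_ms(same_subnet_delay_ms, threshold=ASSUMED_SAME_SUBNET_THRESHOLD):
    """Return the time in ms after which a silent node is considered failed."""
    return same_subnet_delay_ms * threshold


if __name__ == "__main__":
    # Compare the default delay with the increased value from the command above.
    for delay in (1000, 2000):
        window = detection_window_ms(delay)
        print(f"samesubnetdelay={delay} ms -> tolerated outage {window} ms")
```

Under these assumptions, raising the delay from 1000 ms to 2000 ms doubles the tolerated outage from about 5 to about 10 seconds.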

If DRS is active in the vSphere environment, it also helps to minimize the number of automatic vMotion operations for the Exchange servers, so that DRS moves a file server or a similar VM instead when performance problems occur.

To minimize the number of vMotion operations for the Exchange database servers, two vSphere DRS rules are required:

1. Virtual machines to hosts: for example, to keep 3 database servers on 3 ESX hosts