Franky's Web

VMware vSphere VMs: Caution with vMotion operations and time-critical VMs such as domain controllers

I recently fell into this trap myself: when time-critical VMs such as domain controllers run on VMware vSphere, there is a small peculiarity you have to watch out for. Because an incorrect time can have far-reaching consequences, here is a short article on the subject. The problem was as follows: a Linux-based NTP server (which also served as the time source for the domain controllers) suddenly had the wrong time, with no immediately recognizable cause, and showed a large deviation from the configured NTP servers on the Internet:

The screenshot shows that the local system time of the VM deviates by over 13 seconds from the time of the NTP servers on the Internet. The first assumption was that the local system time was synchronized with the time of the ESXi server, but this option was deactivated in the VM settings:

However, the ESXi host itself really was off by the same amount. It also turned out that the VM had recently been moved to exactly this ESXi host via vMotion. Further investigation then brought the following VMware KB article to light:

Here you can read the following:

So here is what happened: the VM was moved via vMotion to an ESXi host whose clock was off. Immediately after the vMotion operation, VMware Tools synchronizes the VM's system time with the time of the ESXi host once, even if periodic time synchronization is deactivated in the VM settings, so the VM ends up with the wrong time as well. If the VM serves as a time source for the network, for example because it is a domain controller or a Linux NTP server, the incorrect time is then distributed to the NTP clients.

Another pitfall is the way NTP itself works. The first screenshot shows that the system time of the VM deviates from the configured NTP servers on the Internet. When the deviation is large, however, NTP does not simply set the system time back to the correct value; it adjusts the clock slowly to avoid "time jumps". Specifically, NTP corrects the time by roughly 50 ms every 60 seconds in such a case (the exact rate depends on the NTP implementation and its configuration). At that rate, a difference of 13 seconds takes more than four hours to correct.
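To get a feeling for how long such a gradual correction takes, here is a minimal sketch in Python. It only does the arithmetic with the figures from this article (50 ms per 60 seconds, 13 seconds offset). The function name and its parameters are made up for illustration, and a real NTP daemon may correct faster or slower, or step the clock, depending on its configuration.

```python
def slew_duration_seconds(offset_s: float,
                          step_s: float = 0.050,
                          interval_s: float = 60.0) -> float:
    """Estimate how long it takes to remove a clock offset when the clock
    is nudged by step_s seconds once every interval_s seconds.

    The defaults (50 ms per 60 s) are the figures quoted in the article,
    not a property of any particular NTP implementation.
    """
    if step_s <= 0 or interval_s <= 0:
        raise ValueError("step_s and interval_s must be positive")
    # Number of correction intervals needed, times the length of one interval.
    return abs(offset_s) / step_s * interval_s


# The 13 second deviation from the screenshot:
duration = slew_duration_seconds(13.0)
print(f"{duration / 60:.0f} minutes, i.e. about {duration / 3600:.1f} hours")
```

For 13 seconds this works out to 260 correction intervals of one minute each, during which every client of this time source is also running on the wrong time.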

The effects of an incorrect time in the network can be very far-reaching (you would hardly believe what can happen; I won't go into detail here). The solution from the VMware KB article should therefore be implemented for all time-critical VMs, especially domain controllers, Linux NTP servers, VoIP telephone systems, and OTP / 2FA servers.
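If you want to check several VMs, a small script can help. The following is only a sketch in Python that looks through the text of a .vmx file and lists the time synchronization settings that are not switched off. The key names are the advanced settings as I recall them from VMware's KB article on disabling time synchronization, so please verify them against the KB before relying on this. Treating a missing key as "not disabled" is also an assumption, and the function names are made up for illustration.

```python
# Advanced settings assumed to control one-off time synchronization
# (recalled from VMware's KB on disabling time sync; verify before use).
TIME_SYNC_KEYS = (
    "tools.syncTime",
    "time.synchronize.continue",
    "time.synchronize.restore",
    "time.synchronize.resume.disk",
    "time.synchronize.shrink",
    "time.synchronize.tools.startup",
    "time.synchronize.tools.enable",
    "time.synchronize.resume.host",
)


def parse_vmx(text: str) -> dict:
    """Parse 'key = "value"' lines of a .vmx file into a dict.

    Keys are lowercased so that lookups are case-insensitive.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip().lower()] = value.strip().strip('"')
    return settings


def keys_not_disabled(vmx_text: str) -> list:
    """Return the time sync keys that are missing or not set to 0/FALSE."""
    settings = parse_vmx(vmx_text)
    return [
        key for key in TIME_SYNC_KEYS
        if settings.get(key.lower(), "").upper() not in ("0", "FALSE")
    ]


# Hypothetical excerpt of a .vmx file, for illustration only.
sample = '''
tools.syncTime = "0"
time.synchronize.continue = "0"
time.synchronize.restore = "1"
'''
for key in keys_not_disabled(sample):
    print("still enabled or not set:", key)
```

The script only reads text and changes nothing; the settings themselves are still switched off in the VM settings as described in the KB article.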

In vSphere 7, this option can be switched off directly in the VM settings while the VM is running (VMs on ESXi 6.7 or earlier must be shut down to switch off time synchronization):

A note has also been added which contains the description from the VMware KB article:

Maybe this will help someone.
