Failure to Protect a Virtual Machine
Occasionally, you might find that when you create a Protection Group, the process fails to register one or more virtual machines at the Recovery Site. It's important not to overreact to this situation, because the causes are usually trivial configuration issues that are very easy to remedy. The most common cause is either bad inventory mappings or a VM that falls outside the scope of your inventory mappings. In this section I will give you a checklist of settings to confirm, which will hopefully fix these problems for you. These checks amount to the kind of initial troubleshooting you are likely to encounter when you configure SRM for the first time.
Bad Inventory Mappings
This problem is normally caused by a user error in the earlier inventory mapping process. A typical failure to protect a VM is shown in Figure 9.51. The error is flagged on the Protected Site with a yellow exclamation mark on the Protection Group and on the virtual machines that failed to be registered.
Figure 9.51 A VM failing to be protected because the VM Network port group was not included in the inventory mappings
As a consequence, you will also see errors in the Tasks & Events tab for the affected VMs. The classic clue that a VM has a bad inventory mapping is the "Unable to protect VM <VM name> due to unresolved devices" message shown in Figure 9.52.
Figure 9.52 The unresolved devices error that usually indicates a problem with inventory mappings
This error usually means that the virtual machine's settings fall outside the scope of the inventory mappings defined previously, so the Protection Group doesn't know how to map the VM's current folder, resource pool, or network membership to the corresponding location at the Recovery Site. Networking, as shown in Figure 9.51, is a good example.
In the inventory mapping process, I did not provide any inventory mappings for the VM Network port group. I regarded this as a local network that contained local virtual machines that did not require protection. Accidentally, the virtual machine named "fs01" was patched into this network, and therefore did not get configured properly in the Recovery Plan. In the real world this could have been an oversight; perhaps I meant to set an inventory mapping for vlan10 but forgot to. In this case, the problem wasn't my virtual machine but my bad configuration of the inventory mapping.
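To make that failure concrete, here is a minimal Python sketch, assuming a simplified model in which each inventory mapping is just a lookup table from a Protected Site object name to a Recovery Site object name. The classes `VMConfig` and `InventoryMappings`, the function `unresolved_devices`, and the Recovery Site names (such as "vlan10-dr") are hypothetical and are not part of any SRM or vSphere API. Any folder, resource pool, or network with no mapping entry is reported, which is roughly what the "unresolved devices" message is telling you about fs01.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VMConfig:
    """Hypothetical snapshot of a protected VM's settings at the Protected Site."""
    name: str
    folder: str
    resource_pool: str
    networks: list[str] = field(default_factory=list)


@dataclass
class InventoryMappings:
    """Hypothetical model: Protected Site object name -> Recovery Site object name."""
    folders: dict[str, str]
    resource_pools: dict[str, str]
    networks: dict[str, str]


def unresolved_devices(vm: VMConfig, maps: InventoryMappings) -> list[str]:
    """Return every setting of `vm` that has no entry in the inventory mappings."""
    problems: list[str] = []
    if vm.folder not in maps.folders:
        problems.append(f"folder: {vm.folder}")
    if vm.resource_pool not in maps.resource_pools:
        problems.append(f"resource pool: {vm.resource_pool}")
    for net in vm.networks:
        if net not in maps.networks:
            problems.append(f"network: {net}")
    return problems


if __name__ == "__main__":
    # Mappings that cover vlan10 but deliberately omit "VM Network".
    maps = InventoryMappings(
        folders={"Protected VMs": "Recovery VMs"},
        resource_pools={"Production": "DR"},
        networks={"vlan10": "vlan10-dr"},
    )
    fs01 = VMConfig("fs01", "Protected VMs", "Production", ["VM Network"])
    db01 = VMConfig("db01", "Protected VMs", "Production", ["vlan10"])
    for vm in (fs01, db01):
        missing = unresolved_devices(vm, maps)
        status = "unresolved: " + ", ".join(missing) if missing else "ok"
        print(f"{vm.name}: {status}")
```

Run as a script, it prints "fs01: unresolved: network: VM Network" and "db01: ok". The point of the model is that the fix can go on either side of the lookup: add the missing key to the mappings, or move the VM onto an object that is already mapped.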
Another scenario is that the inventory mapping is intended to handle only the default settings, where the rule is always the same. A number of virtual machines in the Protection Group could have their own unique settings; after all, one size does not fit all. SRM allows for exceptions to those rules when a virtual machine has its own particular configuration that falls outside the defaults, much as with users and groups.
If you have this type of inventory mapping mismatch, it will be up to you to decide on the correct course of action to fix it. Only you can decide whether the virtual machine or the inventory mapping is at fault. You can resolve this mismatch in a few different ways.
- Update your inventory mappings to include objects that were originally overlooked.
- Correct the virtual machine settings to fall within the default inventory mapping settings.
- Customize the VM with its own unique inventory mapping. This means you can have rules (inventory mappings) and exceptions to the rules (custom VM settings): a VM is either covered by the default inventory mapping or given its own settings.
If you think the inventory mapping is correct and you simply have an exception, you can right-click the VM in the Protection Group, select Configure Protection in the menu that opens, and supply per-VM inventory settings. If you have a bigger problem, such as a large number of VMs that failed to be protected because of a bad inventory mapping configuration, fix it in the inventory mapping and then use Configure All to retry the protection process.
I would say the most common reason for this error is that you have deployed a new VM from a template, and the template is configured for a network not covered by the inventory mapping. Another cause concerns SvSwitches. It's possible to rename a port group on an SvSwitch to a different label, which can cause problems for both the inventory mapping and the affected VMs. As a consequence, when the Protection Groups are created for the first time, the protection process fails because the inventory mapping still refers to the old name.
Placeholder VM Not Found
Another error occurs if someone foolishly deletes the placeholder that represents a VM in the Recovery Site, as shown in Figure 9.53. It is possible to manually delete a placeholder VM, although you get the same warning message you would see if you tried to edit the placeholder settings. Nonetheless, these placeholder objects are not protected from deletion. If a rogue vCenter administrator deletes a placeholder, you will see a yellow exclamation mark on the Protection Group, together with a "Placeholder VM Not Found" error message.
Figure 9.53 The "Placeholder VM Not Found" error message caused by accidental deletion of the placeholder in the inventory
The quickest way to fix this problem is to choose either the Restore All link or the Restore Placeholder link in the Protection Group interface. The Restore All option rebuilds all the placeholders within the Protection Group, whereas Restore Placeholder fixes just one selected placeholder in the list.
VMware Tools Update Error—Device Not Found: CD/DVD Drive 1
Occasionally, the Protection Group can have a VM that displays an error on its own. For example, in Figure 9.54 the VM named "db01" has the error message "Device Not Found: CD-DVD drive1." This error is relatively benign and does not stop execution of the plan.
Figure 9.54 The old chestnut of connected CD/DVD drives can cause a benign error to appear on the Protection Group.
This issue was created by a faulty VMware Tools update using Update Manager. The VM runs a Linux distribution, and the automatic mounting and update of VMware Tools failed, leaving the CD-ROM device connected. Update Manager was unsuccessful in unmounting the .iso file at /usr/lib/vmware/isoimages/linux.iso; note that the auto-execution of VMware Tools does not work in the same way with Linux as it does with Windows. With Linux, all that happens is that the .iso file is mounted as a CD-ROM device, and it is up to the administrator to extract the .tgz package and install VMware Tools in the guest system. This error was resolved by right-clicking the affected VM and, under the Guest menu, selecting "End VMware Tools install." This triggered an unmounting of the VMware Tools .iso image.
Delete VM Error
Occasionally, you will want to delete a VM that is also a member of a Protection Group. The correct procedure is to unprotect the VM first, which unregisters its placeholder VMX file and, as a consequence, removes it from any Recovery Plan. Of course, there's nothing to stop someone from ignoring this procedure and simply deleting the VM from the inventory. This results in an "orphaned" object in the Protection Group and Recovery Plan, as shown in Figure 9.55.
Figure 9.55 The error when a VMware administrator deletes a protected VM without first unprotecting it in SRM
To clean up these orphaned entries, select the affected VM and click the Remove Protection button.
It's Not an Error, It's a Naughty, Naughty Boy!
If you can forgive the reference to Monty Python's The Meaning of Life, the confusing yellow exclamation mark on a Protection Group can be benign. It can actually indicate that a new virtual machine has been created that is covered by the Protection Group. As I may have stated before, simply creating a new virtual machine on a replicated LUN/volume does not automatically mean it is protected and enrolled in your Recovery Plan. I will cover this in more detail in Chapter 11, Custom Recovery Plans, as I examine how SRM interacts with a production environment that is constantly changing and evolving.
Hopefully, with these "errors," you can begin to see the huge benefit that inventory mapping offers. Remember, inventory mappings are optional; if you chose not to configure them in SRM, then when you created a Protection Group every virtual machine would fail to be registered at the Recovery Site. This would leave tens or hundreds of virtual machines with a yellow exclamation mark, and each one would have to be mapped by hand to the appropriate network, folder, and resource pool.