This paper provides an overview of the considerations and best practices for deploying VMware vSphere on NFS-based storage. It also examines common myths and attempts to dispel confusion as to when NFS should and should not be used with vSphere.
The considerations for choosing a storage resource (e.g. NAS, block, HCI) tend to hinge on cost, performance, availability, and ease of manageability. However, an additional factor could be the legacy environment and the storage administrator's familiarity with one protocol versus another, based on what is already installed.
Have a virtual switch with a VMkernel NIC configured for IP-based storage. The NFS storage target needs to have been configured to export a mount point that is accessible to the ESXi hosts on a trusted network.
Regarding the first requirement above, to configure the switch for IP storage access you will need to create a new port group, indicating that it is a VMkernel type connection. This section explains the different types of network settings and how they work with the different NFS versions vSphere supports.
It is important to understand that with NFS version 3 there is only one active pipe for the connection between the ESXi host and a single storage target. To leverage more of the available bandwidth with NFS version 3, an ESXi host may have multiple connections to different storage targets.
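As a sketch of this approach, two data stores can be mounted from the same array over two different target IP addresses, giving each mount its own NFS v3 connection. The IP addresses, share paths, and volume names below are hypothetical:

```shell
# Mount two data stores from the same array via two different target IPs,
# so each mount gets its own NFS v3 TCP connection
esxcli storage nfs add --host 10.0.1.10 --share /vol/datastore01 --volume-name DS01
esxcli storage nfs add --host 10.0.2.10 --share /vol/datastore02 --volume-name DS02

# Verify both mounts and their target addresses
esxcli storage nfs list
```

Whether this actually spreads load also depends on the array presenting the shares on separate interfaces.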
With NFS version 4.1, new multipathing and load-balancing capabilities are introduced. It is important to understand that these new features can only be leveraged so long as the NFS target supports them.
Let us now look at some options available and how you might be able to improve performance, keeping in mind that you have a single connection between host and storage. All devices sitting in the I/O path must be able to implement jumbo frames for this to make sense (array controller, physical switches, NICs and VMkernel ports).
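As an illustrative sketch of the host-side portion (the vSwitch and VMkernel interface names are hypothetical, and the physical switches and array ports must be configured separately), enabling jumbo frames might look like this:

```shell
# Raise the MTU on the vSwitch carrying NFS traffic, then on the VMkernel port
esxcli network vswitch standard set --vswitch-name vSwitch1 --mtu 9000
esxcli network ip interface set --interface-name vmk1 --mtu 9000

# Verify end-to-end: 8972 = 9000 - 20 (IP header) - 8 (ICMP header); -d = don't fragment
vmkping -d -s 8972 10.0.1.10
```

If the vmkping fails with the don't-fragment flag set, some device in the path is not passing jumbo frames.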
Multiple VMkernel interfaces are bound in a LAG (Link Aggregation Group), which is then used to access the NFS target. For detailed configuration steps, refer to the official vSphere Networking documentation on vmware.com.
NFS version 4.1 allows the same data store to be presented via multiple data paths. Ideally, one would not route between the ESXi host and the NFS target; try to keep them both on the same subnet.
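As a hedged sketch of NFS v4.1 multipathing (addresses and names hypothetical), a single data store can be mounted through multiple target IP addresses in one command:

```shell
# NFS v4.1: one data store, two data paths to the same target
esxcli storage nfs41 add --hosts 10.0.1.10,10.0.2.10 --share /vol/datastore01 --volume-name DS01

# List v4.1 mounts to confirm both addresses are in use
esxcli storage nfs41 list
```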
There are still quite a number of restrictions with routed NFS, and the vSphere release notes should be examined in detail to determine whether it is suitable for your environment. NAS array vendors agree that it is good practice to isolate NFS traffic for security reasons.
This means isolating the NFS traffic on its own separate physical switches or leveraging a dedicated VLAN (IEEE 802.1Q). Another security concern is that the ESXi host mounts the NFS data stores using root privileges.
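One way to realize such isolation, sketched here with a hypothetical port group name and VLAN ID, is to tag the NFS port group onto its own 802.1Q VLAN:

```shell
# Place the NFS VMkernel port group on a dedicated VLAN (ID 100 here)
esxcli network vswitch standard portgroup set --portgroup-name NFS --vlan-id 100

# Confirm the VLAN assignment
esxcli network vswitch standard portgroup list
```

The same VLAN ID must of course be trunked on the physical switch ports and configured on the array interfaces.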
To address this concern, it is again considered a best practice to use either a dedicated LAN or a VLAN for protection and isolation. Kerberos has a dependency on Active Directory, and each ESXi host should be joined to the AD domain.
Kerberos is enabled when the NFS v4.1 data store is mounted on the ESXi host. A warning message is displayed in the vSphere client that each host mounting this data store needs to be part of an AD domain.
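A minimal sketch of mounting an NFS v4.1 data store with Kerberos (server, share, and volume names are hypothetical; the host must already be joined to AD and have its NFS Kerberos credentials configured):

```shell
# Mount with Kerberos authentication instead of the default AUTH_SYS
esxcli storage nfs41 add --hosts nfs.example.com --share /vol/secure01 \
    --volume-name SecureDS --access SEC_KRB5
```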
This section outlines these steps and investigates a number of options which can be utilized to make your NFS data stores highly available. There are still issues with the physical LAN switch being a single point of failure (SPOF).
To avoid this, a common design is to use NIC teaming in a configuration with two physical switches. Link Aggregation Control Protocol (LACP) is another option one could consider.
Now this may or may not improve throughput/performance since NFS version 3 is limited to a single connection, but it does allow protection against path failures. Many NFS array vendors support this feature at the storage controller port level.
One of the features of LACP is its ability to respond to events on the network and decide which ports should be part of the logical interface. This feature may provide additional availability in so far as a failover to an alternate NIC can now occur based on feedback from the physical switch as opposed to just relying on a link failure event.
Depending on the interconnect (1 GigE or 10 GigE), some vendors recommend turning flow control off and allowing congestion to be managed higher up the stack. Another recommendation concerns switch ports when Spanning Tree Protocol (STP) is used in an environment.
This means that ports immediately transition their forwarding state to active. This section provides an explanation of the tunable parameters that are available when using NFS data stores.
Before we drill into these advanced settings in a bit more detail, it is important to understand that the recommended values for some of these settings may (and probably will) vary from storage array vendor to storage array vendor. My objective is to give you a clear and concise explanation of the tunable parameters and allow you to make your own decisions when it comes to tuning the values.
This means that 8 is the maximum number of NFS volumes which can be mounted to an ESXi host by default. This can be changed, as VMware supports a maximum of 256 NFS volumes mounted to an ESXi host.
Net.TcpipHeapSize is the size of the memory (in MB) which is allocated up front by the VMkernel to the TCP/IP heap. In earlier versions of ESXi, the default maximum TCP/IP heap size was much smaller than it is now.
Since the default maximum heap size is 512 MB in the current version of ESXi, it should be sufficient even with 256 NFS volumes mounted. Changing this advanced setting requires a host reboot to take effect.
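The advanced settings discussed above can be changed from the command line. The sketch below uses the documented maximums as example values, not as a recommendation; always follow your array vendor's guidance:

```shell
# Raise the NFS volume limit from the default of 8 to the supported maximum of 256
esxcli system settings advanced set -o /NFS/MaxVolumes -i 256

# Size the TCP/IP heap to match (a host reboot is required to take effect)
esxcli system settings advanced set -o /Net/TcpipHeapSize -i 32
esxcli system settings advanced set -o /Net/TcpipHeapMax -i 512

# Confirm the current values
esxcli system settings advanced list -o /NFS/MaxVolumes
```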
If ESXi gives up after 10 consecutive failed heartbeats, it treats the NFS data store as unreachable. ESXi hosts continue to make heartbeat requests in the hope that the data store becomes available once again.
In this case, another host must be able to take ownership of that VM, so a method to time out the previous lock must exist. It then takes 3 polling attempts at 10-second intervals for the competing host to declare that the lock has expired and break it.
Lock preemption is therefore completed in 3 × 10 + 10 = 40 seconds before I/O will start to flow on the competing host. The author is not aware of any storage vendor recommendations to change these values from the default.
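The arithmetic above can be sketched as a small calculation, using hypothetical variable names for the polling interval and attempt count described in the text:

```shell
POLL_INTERVAL=10   # seconds between lock-liveness polls
POLL_ATTEMPTS=3    # failed polls before the lock is declared expired

# 3 polls x 10 s, plus one further interval before I/O flows on the competing host
echo $(( POLL_ATTEMPTS * POLL_INTERVAL + POLL_INTERVAL ))   # prints 40
```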
Finally, it is extremely important that any changes to the lock settings are reflected on all hosts sharing the data store. If there are inconsistent lock settings across multiple hosts sharing the same NFS data store, it can result in some very undesirable behavior.
SunRPC.MaxConnPerIP defines the maximum number of unique TCP connections that can be opened for a given IP address. If the number of mounts to an IP address is more than SunRPC.MaxConnPerIP, existing connections are shared by the different mounts.
Even at the maximum value, existing TCP connections need to be shared in order to mount 256 volumes. Therefore, it is considered best practice to use NFS storage on trusted networks only and to isolate the traffic on separate physical switches or leverage a private VLAN.
Another security concern is that the ESXi host must mount the NFS server with root access. To address this concern, it is best practice to use either a dedicated LAN or a VLAN to provide protection and isolation.
Enhancements to Kerberos for NFS version 4.1 in vSphere 6.5 include the addition of AES encryption support. As mentioned, vSphere 6.5 also introduces a new Kerberos integrity checking mechanism for NFS v4.1, called SEC_KRB5I.
This feature performs integrity checking of NFS operations using a secure checksum to prevent data manipulation. Most of the interoperability features are tried and tested with NFS, but I will try to highlight areas that might be cause for additional consideration.
The following table lists the major vSphere solutions that each NFS version supports. The whole point of Storage I/O Control (SIOC) is to prevent a single virtual machine (VM) residing on one ESXi host from consuming more than its fair share of bandwidth on a data store that it shares with VMs residing on other ESXi hosts.
Historically, we have had a feature called 'disk shares' which could be set up on a per-ESXi-host basis. This works quite well for all VMs residing on the same ESXi host sharing the same data store (e.g. local disk).
However, it could not be used as a fairness mechanism for VMs from different ESXi hosts sharing the same data store. With SIOC you can set shares to reflect the priority of VMs, but you can also implement an IOPS limit per VM.
This means that you can limit the number of IOPS that a single VM can drive against a shared data store. With 10 Gb networks, this feature can be very useful, as you will typically be sharing one pipe with multiple other traffic types.
While SIOC assists with the noisy neighbor problem from a data store sharing perspective, Network I/O Control (NIOC) assists with the noisy neighbor problem from a network perspective. Storage DRS, introduced in vSphere 5.0, fully supports NFS data stores.
If the cluster is set to automatic mode of operation, Storage DRS will use Storage vMotion to automatically migrate VMs to other data stores in the data store cluster if the threshold is exceeded. Storage DRS will provide recommendations to balance the space usage of the data stores.
After a period of time, if the recommendations make sense and you build a comfort level with Storage DRS, consider switching it to automated mode. Storage vMotion on NFS data stores continues to use the VMkernel software data mover.
A future release of View (at the time of writing) is needed for full support of this primitive. Without VAAI-NAS, we never had the ability to preallocate or zero out space for VMDKs on NFS.
With the introduction of Reserve Space, one can now create thick VMDKs on NFS data stores. VAAI-NAS Reserve Space is not like Write Same for block; it does not get the array to do the zeroing on its behalf.
As an aside, we just said that VAAI-NAS Reserve Space allows you to create virtual disks in Thick Provision Lazy Zeroed (lazyzeroedthick) or Thick Provision Eager Zeroed (eagerzeroedthick) format on NFS data stores on arrays which support Reserve Space. Remember that a VAAI-NAS plugin is required from your respective storage array vendor for any of these primitives to work.
The plugin must be installed on each ESXi host that wishes to leverage the VAAI-NAS primitives. Similarly, VAAI TP-Stun was introduced to detect "Out of space" conditions on SCSI LUNs.
However, for NAS data stores, NFS servers can already return an out-of-space error, which should be propagated up the stack. This behavior does not need the VAAI-NAS plugin, and should work on all NFS data stores, whether the host has VAAI enabled or not.
Once again, a VAAI-NAS plugin is required from your respective storage array vendor for any of these primitives to work with NFS version 4.1, and the plugin must be installed on each ESXi host that wishes to leverage them. Site Recovery Manager (SRM) fully supports array-based replication on NFS data stores.
In regard to best practice, I was reliably informed that one should consider storing the swap in a different directory when using a replicated NFS data store with SRM. Another consideration is the use of Fully Qualified Domain Names (FQDNs) rather than IP addresses when mounting NFS data stores.
The latest version in vSphere 5.x uses a mirror driver to split writes to the source and destination data stores once a migration is initiated. This should mean speedier migrations, since only a single copy operation is now needed, unlike the recursive copy process used in previous versions, which leveraged Changed Block Tracking (CBT).
The one consideration, and this has been called out already, is that Storage vMotion operations cannot be offloaded to the array with VAAI. The only other considerations with Storage vMotion are relevant to both block and NAS, namely the configuration maximums.
At the time of writing, the maximum number of concurrent Storage vMotion operations per ESXi host was 8, and the maximum number of Storage vMotion operations per NFS data store was 2. This is to prevent any single data store from being unnecessarily impacted by Storage vMotion operations.
Another consideration is the fact that many arrays now have deduplication and compression features, which will also reduce capacity requirements. These align nicely with the 4 KB grain size used in VMware's virtual disk format (VMDK).
For those vendors who have it set to 8 KB, the recommendation is to format the volumes in the Guest OS to a matching 8 KB block size for optimal performance. Since having multiple virtual machines sharing an NFS data store is going to result in random workloads, you probably would be using 4 KB.
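As a hedged example (the device name is hypothetical), matching a 4 KB workload inside a Linux guest means using a 4 KB filesystem block size, which can be set explicitly at format time:

```shell
# Format the guest's data disk with a 4 KB filesystem block size
mkfs.ext4 -b 4096 /dev/sdb1

# Confirm the block size after formatting
tune2fs -l /dev/sdb1 | grep "Block size"
```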
(Be sure to explain that the data store will be used by VMs, so the workload will be random for the most part.) Are there any considerations when formatting the volumes in the Guest OS? I thought it useful just to add a note about the rsize and wsize parameters when discussing sizing.
While some storage arrays can now support much larger rsize and wsize values (up to 1 MB), there is still no way to change these settings in vSphere at the time of writing. Since VMware still only supports NFS v3 over TCP/IP, one recommendation was to use more data stores presented over different interfaces, so that traffic could be balanced over multiple TCP sessions.
The other major factor is related to the backup and recovery Service Level Agreement (SLA). If you have one single data store with lots of VMs, how long are you willing to wait while it is restored in the event of a failure?
Performance considerations are needed if using virtual machine snapshots to concurrently capture point in time copies of VMs. In many cases, array based snapshots have less impact on the data stores and are more scalable when it comes to backups.
The next section focuses on the options available and the criteria that might make one choice a better selection than the alternative. Virtual disks (VMDKs) created on NFS data stores are in thin provisioned format by default.
For the purposes of this paper, we will define wasted disk space as space that is allocated but not used. Thin provisioning therefore allocates less data store space than would be needed for the same set of virtual disks provisioned in thick format.
VMware vCenter™ now provides support for both the creation of thin virtual disks and the monitoring of data stores that may be over-committed (i.e. it can trigger alarms for various thresholds, either space used approaching capacity or provisioned space outpacing capacity by varying degrees of over-commit). Some common methods are to use products that utilize the vSphere APIs for Data Protection (VADP).
Another is placing agents in each Guest OS, or leveraging array based snapshot technology. With NFS and array based snapshots, one has the greatest ease and flexibility on what level of granularity can be restored.
Although this does open up a bit of a security risk, NFS does provide one of the most flexible and efficient restore-from-backup options available today. For this reason, NFS earns high marks for ease of backup and restore capability.
This section summarizes the best practices for running VMware vSphere on network attached storage. It is important not to over-subscribe the network connection between the LAN switch and the storage array.
The retransmitting of dropped packets can further degrade the performance of an already heavily congested network fabric. In short, Network Attached Storage has matured significantly in recent years, and it offers a solid availability and high-performance foundation for deployment with virtualization environments.
Following the best practices outlined in this paper will help ensure successful deployments of vSphere on NFS. Both performance and availability are holding up to expectations, and as best practices are further defined, the experience of running VMware technology on NFS is proving it to be a solid choice of storage protocols.
Some storage technology partners are working closely with VMware to further define best practices and extend the benefits for customers choosing to deploy VMware vSphere on this storage protocol option. Cormac Hogan is a Director and Chief Technologist in the Storage and Availability Business Unit at VMware.
Cormac has written a book called "Essential Virtual SAN", available from VMware Press, as well as a number of storage-related white papers.