Delphix

VMware and Delphix CPU Utilization Discrepancy Explained (KBA1019)

 

 


 

Applicable Delphix Versions

Major Release All Sub Releases
6.0 6.0.0.0, 6.0.1.0, 6.0.1.1, 6.0.2.0, 6.0.2.1, 6.0.3.0, 6.0.3.1, 6.0.4.0, 6.0.4.1, 6.0.4.2, 6.0.5.0, 6.0.6.0, 6.0.6.1, 6.0.7.0, 6.0.8.0, 6.0.8.1, 6.0.9.0, 6.0.10.0, 6.0.10.1, 6.0.11.0
5.3 5.3.0.0, 5.3.0.1, 5.3.0.2, 5.3.0.3, 5.3.1.0, 5.3.1.1, 5.3.1.2, 5.3.2.0, 5.3.3.0, 5.3.3.1, 5.3.4.0, 5.3.5.0, 5.3.6.0, 5.3.7.0, 5.3.7.1, 5.3.8.0, 5.3.8.1, 5.3.9.0
5.2 5.2.2.0, 5.2.2.1, 5.2.3.0, 5.2.4.0, 5.2.5.0, 5.2.5.1, 5.2.6.0, 5.2.6.1
5.1 5.1.0.0, 5.1.1.0, 5.1.2.0, 5.1.3.0, 5.1.4.0, 5.1.5.0, 5.1.5.1, 5.1.6.0, 5.1.7.0, 5.1.8.0, 5.1.8.1, 5.1.9.0, 5.1.10.0
5.0 5.0.1.0, 5.0.1.1, 5.0.2.0, 5.0.2.1, 5.0.2.2, 5.0.2.3, 5.0.3.0, 5.0.3.1, 5.0.4.0, 5.0.4.1, 5.0.5.0, 5.0.5.1, 5.0.5.2, 5.0.5.3, 5.0.5.4

Issue

Individuals responsible for the administration and/or monitoring of systems using VMware and Delphix may have noticed that the guest virtual machine (VM) associated with Delphix tends to show higher average CPU utilization than what would be considered normal when compared to a traditional application virtualized environment. Also, the performance metrics, specifically CPU utilization, reported by Delphix analytics may show a marked differential from those being reported by VMware tools like esxtop.

This phenomenon is not unique to DelphixOS (DxOS). In fact, many guests with higher vCPU and memory requirements will likely exhibit the same behavior. A number of papers have been authored on the subject such as this one, but this page identifies why this occurs specifically for DxOS.

 

In general, a Delphix guest VM will have higher resource requirements than pedestrian virtualized instances where the primary purpose is to increase consolidation and maximization of available hardware resources. As a minimum base, the Delphix Engine will have 8 vCPUs and 64GB of memory; larger installations can have more than 48 vCPUs and 1TB of memory, increasing the interaction with the VMware hypervisor. Rather than a wasteful use of available resources, this overhead is an intelligent and optimized utilization of the additional compute and memory associated with a business-critical virtualized instance. The hypervisor is not an impediment to performance but instead allows for increased observability of the nuances of DxOS.

 

First and foremost, Delphix is designed to virtualize the underlying storage for an Agile Data implementation. In order to service target environments effectively, DxOS has been engineered to ensure that requests from targets are not subject to inflated latency for reasons commonly associated with generic compute systems, such as interrupt pinning or priority inversion.

Obviously, when Delphix is actively servicing a target request, CPU utilization will be commensurate with client demand. However, once a particular Delphix vCPU has completed its work, and it has no other work in its own run-queue to process, there could be other requests that remain unserviced simply because they are scheduled to run on a more active vCPU. In this case, the idle vCPU will mark itself internally within DxOS as idle, but it will also actively search other vCPU run-queues for jobs that would be better served by being stolen rather than sitting unserviced in the current run-queue. From a DxOS standpoint, the CPU is idle; however, ESX will see this additional dispatcher work as guest CPU cycles. This optimization is not completely unique to DxOS, but given the larger number of vCPUs, it may seem more pronounced. This is not a defect, but a very well-designed solution to ensure low latency servicing of outstanding application or kernel threads.

The perceived overhead associated with this work is not fixed; in fact, it will trend toward zero as the actual workload increases. In other words, as the vCPUs find candidate threads for stealing or migration or have their own work to perform, the amount of time they will spend searching for runnable jobs will decrease.
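The dispatcher behavior described above can be illustrated with a small model. This is a minimal sketch, not DxOS code: the function names (next_job, probes_in_one_pass), the scan order, and the choice to steal from the tail of a queue are all assumptions made for illustration. Each run-queue inspected by an idle vCPU is counted as a "probe", standing in for the guest CPU cycles that ESX would still observe.

```python
from collections import deque


def next_job(run_queues, cpu):
    """Return (job, probes) for `cpu`: its own work first, otherwise steal.

    `probes` counts how many other run-queues were inspected. It is a
    stand-in for dispatcher cycles that the hypervisor still sees as
    guest CPU time even though the guest considers the vCPU idle.
    """
    own = run_queues[cpu]
    if own:
        return own.popleft(), 0

    probes = 0
    n = len(run_queues)
    for offset in range(1, n):
        victim = run_queues[(cpu + offset) % n]
        probes += 1
        if victim:
            # Hypothetical policy: take the newest job so the busy vCPU
            # keeps its oldest one.
            return victim.pop(), probes
    return None, probes


def probes_in_one_pass(n_cpus, busy_cpus):
    """Give `busy_cpus` vCPUs one job each, let every vCPU dispatch once,
    and return the total number of foreign run-queues inspected."""
    run_queues = [deque() for _ in range(n_cpus)]
    for cpu in range(busy_cpus):
        run_queues[cpu].append(f"job-{cpu}")
    return sum(next_job(run_queues, cpu)[1] for cpu in range(n_cpus))


if __name__ == "__main__":
    for busy in (0, 4, 8):
        print(f"busy vCPUs={busy} probes={probes_in_one_pass(8, busy)}")
```

In this model with 8 vCPUs, the probe count falls from 56 when every vCPU is idle, to 28 when half have their own work, to 0 when all of them do. That mirrors the point above: the search cost is highest on an idle system and shrinks as real work arrives.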

This process of searching for runnable threads occurs any time a vCPU becomes active. This can be the result of a target virtual database (VDB) request, an interrupt (network or disk), or even one of several high-frequency interval timers utilized within DxOS. These CPU cycles, as reported by ESX, are well warranted and would go undetected without hypervisor intervention.

 

Additionally, Delphix implementations tend to have larger guest memory footprints. With larger memory requirements, there is additional work that must be performed on any SMP system. The primary example is the allocation and deallocation of memory, and deallocation is of greater consequence. The Delphix file system cache comprises a significant portion of the overall Delphix memory footprint, so freeing memory from it requires CPU work by DxOS, and that work may appear magnified at the hypervisor because of the nature of virtualization. When a memory page is freed (invalidated), any vCPU that has accessed that memory region must be notified that the mapping is no longer valid. This requires a CPU crosscall, in this case a Translation Lookaside Buffer (TLB) invalidation: one vCPU must notify one, several, or all of the other vCPUs associated with the guest that the memory is no longer valid for reference. Any further reference will result in a fault.
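To make the cost of these invalidations concrete, the following is a simplified model of how crosscalls accumulate when pages are freed. It is a sketch under stated assumptions, not how DxOS tracks mappings: the function names and the page-to-vCPU dictionary are hypothetical, and a real kernel may batch or coalesce invalidations rather than issuing one per page.

```python
def shootdown_targets(freeing_cpu, cpus_with_mapping):
    """Return the vCPUs that must be told to drop a stale mapping.

    The freeing vCPU invalidates its own TLB entry locally, so only the
    other vCPUs that touched the page need a crosscall.
    """
    return sorted(set(cpus_with_mapping) - {freeing_cpu})


def crosscalls_for_free(freeing_cpu, page_mappings):
    """Count crosscalls needed when `freeing_cpu` frees several pages.

    `page_mappings` is a hypothetical dict of page id -> set of vCPUs
    that have accessed that page.
    """
    return sum(
        len(shootdown_targets(freeing_cpu, cpus))
        for cpus in page_mappings.values()
    )


if __name__ == "__main__":
    pages = {
        "p1": {0, 1},        # one other vCPU to notify
        "p2": {0, 1, 2, 3},  # several other vCPUs to notify
        "p3": {0},           # only the freeing vCPU used it
    }
    print(crosscalls_for_free(0, pages))
```

The more vCPUs a guest has and the more widely a cached page was accessed, the more notifications a single free can generate, which is why this work is more visible on large Delphix guests.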

Delphix product documentation specifically states that CPU and memory reservations are highly recommended and required, respectively. There are two reasons for this. First, Delphix will consume those cycles in order to lower latency on target requests. Second, when Delphix is denied CPU or memory because of policy, the impact on performance can be orders of magnitude worse than a traditionally virtualized instance, because multiple target environments are impacted simultaneously.

 

In order to determine the CPU saturation of a Delphix guest, it is important to look at statistics available from ESX other than simple CPU utilization. The primary statistics to monitor are %RDY, %VMWAIT, and %CSTP. Unless there are non-zero values for these statistics, assume that the Delphix guest is performing as designed. Internal testing has shown that even the most idle of Delphix guest VMs could report CPU utilization (according to ESX) of 65%. This is a performance optimization and should not be of concern from the perspective of capacity planning. VMware provides a knowledge base document with some details on these statistics.
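The rule of thumb above can be expressed as a small check. This is a minimal sketch: the sample dictionary layout, the function names, and the tolerance parameter are assumptions for illustration, and the code does not parse real esxtop output. It only shows that the decision depends on the three contention statistics and not on raw utilization.

```python
# Statistics the article names as the ones to monitor.
CONTENTION_STATS = ("%RDY", "%VMWAIT", "%CSTP")


def contention_signals(sample, tolerance=0.0):
    """Return the contention statistics in `sample` that exceed `tolerance`.

    `sample` is a hypothetical dict of statistic name -> percentage for
    one guest, collected however your monitoring exports it.
    """
    return {
        name: sample[name]
        for name in CONTENTION_STATS
        if sample.get(name, 0.0) > tolerance
    }


def is_contended(sample, tolerance=0.0):
    """True when any of the monitored statistics is above `tolerance`."""
    return bool(contention_signals(sample, tolerance))


if __name__ == "__main__":
    # High utilization alone is not treated as a problem.
    idle_guest = {"%USED": 65.0, "%RDY": 0.0, "%VMWAIT": 0.0, "%CSTP": 0.0}
    print(is_contended(idle_guest))
```

A guest reporting 65% utilization with all three statistics at zero is not flagged, which matches the guidance that such a guest should be assumed to be performing as designed.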

 

Delphix is a VM-encapsulated appliance, and many utilization metrics that are normally difficult to obtain are readily available through VMware’s toolset. There are specific implementation details that would go unnoticed if Delphix were not virtualized. These are not anomalous conditions; rather, they are well-designed software decisions to ensure that Delphix can provide its services without undue delay. The Analytics tools within the Delphix GUI are the best means of determining capacity and utilization of the services provided by the Delphix Engine.
 

 

 

 

 

 

 
