
Delphix Engine Performance Issues on Large or Memory-constrained Virtual Guests (TB021)

Alert Type

Availability/Performance

Impact

The Delphix Engine may experience recurring, short-lived periods of reduced performance. In most instances the issue may be difficult to observe directly. Individual network connections between an affected Delphix Engine and other hosts may appear to hang for brief periods lasting 2-60 seconds. Network connections between the Delphix Engine and Windows hosts may be reset, although in most instances Delphix will automatically retry or reestablish these connections without intervention.

In extremely rare cases, the Delphix Engine may hang, requiring manual intervention to reboot the system. 

Contributing Factors

The issue may occur in the following Delphix Releases:

  • All Delphix Engine 4.0 releases
  • All Delphix Engine 4.1 releases

The incidence and severity of the problem are related to the amount of memory configured for the virtual guest hosting a Delphix Engine. Guests with large memory configurations (greater than 100GB) are more likely to be impacted.
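
To confirm whether a given engine is in this higher-risk range, the configured guest memory can be checked from the engine's command line. This is a minimal sketch, assuming shell access to the DxOS console (normally used at the direction of Delphix Support); the output shown is illustrative:

# prtconf | grep Memory
Memory size: 262144 Megabytes

A value above roughly 102400 Megabytes (100GB) places the guest in the range where this issue is more likely.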

The problem correlates with increased VDB I/O activity, especially a growing number of read operations.

Connections between Windows hosts and affected Delphix Engines are more likely to be impacted, because Windows TCP connections attempt fewer segment retransmissions before resetting/aborting a connection.

Symptoms

  • On MS SQL dSources, Validated Sync or transaction log processing may be interrupted due to network connectivity errors. Delphix faults and alerts may be generated with text similar to the following:
Validated sync for dSource "<name>" failed with the error: An error occurred when attempting to connect to remote host "<hostname>" for environment "DSOURCE: <name>": "null".
  • TCP connections from Windows clients, e.g. PuTTY, to an affected Delphix Engine may be terminated with a "connection reset" message.
  • A single vCPU may remain at 100% utilization for periods ranging from 2 to 60 seconds. While this cannot be directly observed from the Delphix system, it may be visible using tools on the virtualization platform. For example, the esxtop utility on VMware's ESXi host console has an option on the CPU Panel that can display the utilization of individual vCPUs; a batch-mode capture sketch follows this list.
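
As an alternative to the interactive CPU Panel, per-vCPU data can also be captured non-interactively using esxtop batch mode and reviewed afterwards. This is a minimal sketch, assuming SSH access to the ESXi host; the delay and iteration counts are illustrative:

# esxtop -b -d 5 -n 24 > /tmp/esxtop-capture.csv

The resulting CSV contains per-world CPU columns for the Delphix Engine virtual machine and can be opened in a spreadsheet or Windows Performance Monitor to look for a single vCPU pinned at 100%.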

Resolution

This issue is fully resolved in Delphix Engine 4.2.0.0 and later releases.  

Additional Information

Using esxtop or resxtop in Interactive Mode (external link)

Each time the ARC shrinks, it targets a reduction of 1/32 of its size. With the fix in DLPX-29571, the ARC will not be allowed to grow again immediately after shrinking. While there may be some shrinking during the life of a system, this change should eliminate the frequent oscillation present in the worst pathologies of this problem. The changes associated with DLPX-36188 are mostly tools to help provide options should a remnant of the problem continue, and should only be used at the direction of Engineering.
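
The shrink/grow oscillation can be observed indirectly by sampling the standard ZFS ARC kstats over time and watching the target (c) and current (size) values step down and recover. This is a minimal sketch, assuming shell access to the DxOS console; the values shown are illustrative:

# kstat -p zfs:0:arcstats:size zfs:0:arcstats:c zfs:0:arcstats:c_max
zfs:0:arcstats:size     94489280512
zfs:0:arcstats:c        94489280512
zfs:0:arcstats:c_max    96207175680

Repeating the command every few seconds during an episode should show c (and then size) dropping by roughly 1/32 and, on releases without the DLPX-29571 fix, climbing back again shortly afterwards.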

Because the memory reclamation process is so CPU intensive, a single thread processing this work executes at a fixed high priority and can fully occupy one processor for long periods. That is where the network interaction with this problem comes in. DxOS has discrete worker threads assigned to service various network queues, and each of these threads is bound to a specific processor. When memory reaping activity is running on a processor, it starves any network worker threads bound to that processor. This can result in very poor network performance or, in the case of Windows TCP connections, aborted connections. Windows is particularly susceptible because it is limited to five TCP retransmissions before a connection is aborted. Aborted connections can manifest as failed SnapSync actions, disconnected PuTTY sessions, and the like.
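
On a Windows host that is losing connections to an affected engine, the effect can be corroborated from the Windows side by watching the standard TCP statistics while the problem is occurring. This is a hedged illustration rather than a Delphix-documented procedure:

C:\> netstat -s -p tcp

A rising "Segments Retransmitted" count together with "Connections Reset" during the 2-60 second stalls is consistent with the starvation behavior described above.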

Identification

A simple signature of the issue is to observe the output of the mpstat command and look for instances where one CPU is saturated at 100% of SYS time for an extended period:

# mpstat 1
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  dt idl
  0    0   0 402730   777  201    0    0    0  134    0     0    0 100  10   0
  1   90   0  832 60018  111  843  383  117  927    0   913   72  28   3   0
  2    0   0    2 60248  114  998  440   74 1729    0   535   21  79  24   0
  3    2   0    0 58481  775 1183  477  167 1243    0   362   75  23   8   2
  4    1   0    9 57130   26  983  378  212 1127    0   587   89   8   1   3
  5    0   0    2 56590  173  762  222  105  982    0   426   90   7   1   3
  6    6   0   12 58274  680 2087  896  180 1767    0  2252   75  22   5   3
  7    0   0 1331 60683  134 1415  535  186 1160    0  1021   89  11   2   0

CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  dt idl
  0    0   0 388928   689  200    0    0    0   85    0     0    0 100  14   0
  1  208   0  914 57725  114  744  309   90  842    0   949   71  29  11   0
  2   11   0   14 57959  111 1787  740  128 2559    0   503   37  62  18   1
  3    1   0   20 56889  714 1660  660  155 1144    1  2122   62  36  12   2
  4   24   0   13 56890   42 2851 1255  349 1263    0  2499   82  16   3   2
  5    1   0    9 55880  158 1027  377  220  781    1   433   90   8   1   2
  6    0   0    0 57041  987 1127  456  130 2053    0  1046   74  24   7   2
  7    0   0  458 55713  127  939  310  165 1004    0   748   63  35  14   2 

Running a DTrace profiling one-liner such as:

# dtrace -n 'profile-31ms { @[stack()] = count(); }' -n 'END { trunc(@,30) }' -c "sleep 10"
 

should show stacks similar to:

              unix`xc_call+0x39
              unix`hat_tlb_inval+0x2a8
              unix`x86pte_inval+0xb8
              unix`hat_pte_unmap+0xde
              unix`hat_unload_callback+0xe8
              unix`hat_unload+0x3e
              unix`segkmem_free_vn+0x62
              unix`segkmem_zio_free+0x23
              genunix`vmem_xfree+0xf4
              genunix`vmem_free+0x23
              genunix`kmem_slab_destroy+0x8d
              genunix`kmem_slab_free+0x309
              genunix`kmem_magazine_destroy+0x6e
              genunix`kmem_depot_ws_reap+0x5d
              genunix`taskq_thread+0x2d0
              unix`thread_start+0x8

 

Workaround

The best method of resolving this issue is to upgrade to Delphix Engine 4.2.0.0 or a later release. In case an emergency workaround is needed, the following options are available:

  1. Stop and restart the mongod service
  2. Increase the minimum length of time before DxFS will attempt to grow the ARC (default is 60 seconds)

    # echo "arc_grow_retry/W 0t1800" | mdb -kw
    # echo "set zfs_arc_grow_retry=1800" >>/etc/system

    The first command updates arc_grow_retry on the running kernel (0t1800 is decimal 1800 seconds), and the second persists the setting in /etc/system for subsequent boots. Of course, if option 2 is used, be sure to follow the Support hotfix process.

  3. Tune the maximum ARC size on the running system (NOT RECOMMENDED):


    # mdb -kw
    > arc_stats::print -at arc_stats_t ! grep c_max | head -1

        fffffffffbd1a760 kstat_named_t arcstat_c_max = {

    > fffffffffbd1a760::print -a kstat_named_t value.l

    fffffffffbd1a780 value.l = 0x15ff78e000

    > fffffffffbd1a780/Z *fffffffffbd1a780-(5*1<<0t30)

    arc_stats+0x5c0:0x15ff78e000            =       0x14bf78e000

    Note that the addresses and values shown above must be substituted with the values obtained from the running system. In the example shown above, arc_c_max is being set to 5GB less than its current value, which would ordinarily be 6GB less than all available memory. This does not shrink the size of the current ARC in real time. It should, however, prevent the ARC from growing past the amended size until the system is next rebooted.
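
    As a worked check of the arithmetic in the last command: 0t30 is mdb notation for decimal 30, so (5*1<<0t30) is 5 x 2^30 bytes = 0x140000000 (5GB), and 0x15ff78e000 - 0x140000000 = 0x14bf78e000, which matches the new value reported by mdb.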

  4. Tune the maximum arc size using an /etc/system parameter (NOT RECOMMENDED AND REBOOT REQUIRED):

    # echo "set zfs_arc_max=<value> >>/etc/system
     

    where <value> is the desired maximum size of the ARC, in bytes.

Related Bugs

DLPX-29571 arc_kmem_reap_now() should not result in clearing arc_no_grow

  • fixed in Delphix 4.2.0.0

DLPX-36188 add tunables to combat schedule delay of kernel threads

  • fixed in Delphix 4.2.2.0

DLPX-36268 kmem reap thread gets blocked in reclaim callback

  • fixed in Delphix 4.2.2.0

Engineering Contacts

Matt Ahrens <mahrens@delphix.com>

George Wilson <george.wilson@delphix.com>