Delphix Java Process May Stop Responding on AIX (KBA1783)

 

 

KBA# 1783

Issue 

On AIX, customers may find orphaned Delphix Java processes that no longer appear to be functioning but continue to consume memory. These processes may be hung (deadlocked) and never exit, so the server can eventually be starved of memory to the point of a loss of service. Before that point, system performance may also degrade dramatically as the standard paging and swapping algorithms are used to free physical memory.

Troubleshooting 

Use the "ps" command to check whether processes running from under the Delphix toolkit directory have been left unresponsive for an extended period of time (days). In the following example, there are a number of processes owned by the Delphix operating system user "delphix":

$ ps -ef | grep delphix
delphix  4980954 50331840   0                  0:00 <defunct>
delphix 12910728        1   0 07:58:19      -  0:02 /var/opt/delphix/Toolkit/Delphix_COMMON_423e004e_8079_502d_14bb_82d1bfdd3532_delphix_host/java/jdk/bin/java -ea -XX:-UseVMInterruptibleIO -Ddelphix.host.os=unix -Ddelphix.toolkit.base.dir=/var/opt/delphix/Toolkit -Ddelphix.max.worker=16 -Djava.io.tmpdir=/var/opt/delphix/Toolkit/Delphix_423e004e_8079_502d_14bb_82d1bfdd3532_delphix_host/tmp -jar /var/opt/delphix/Toolkit/Delphix_COMMON_423e004e_8079_502d_14bb_82d1bfdd3532_delphix_host/client/dsp/client.jar
delphix 13303810        1   0   Aug 19      -  0:02 /var/opt/delphix/Toolkit/Delphix_COMMON_423e004e_8079_502d_14bb_82d1bfdd3532_delphix_host/java/jdk/bin/java -ea -XX:-UseVMInterruptibleIO -Ddelphix.host.os=unix -Ddelphix.toolkit.base.dir=/var/opt/delphix/Toolkit -Ddelphix.max.worker=16 -Djava.io.tmpdir=/var/opt/delphix/Toolkit/Delphix_423e004e_8079_502d_14bb_82d1bfdd3532_delphix_host/tmp -jar /var/opt/delphix/Toolkit/Delphix_COMMON_423e004e_8079_502d_14bb_82d1bfdd3532_delphix_host/client/dsp/client.jar
delphix 16253078 35192968   0                  0:00 <defunct>
delphix 16973938 59048178   0                  0:00 <defunct>
delphix 20971624        1   0   Aug 19      -  0:02 /var/opt/delphix/Toolkit/Delphix_COMMON_423e004e_8079_502d_14bb_82d1bfdd3532_delphix_host/java/jdk/bin/java -ea -XX:-UseVMInterruptibleIO -Ddelphix.host.os=unix -Ddelphix.toolkit.base.dir=/var/opt/delphix/Toolkit -Ddelphix.max.worker=16 -Djava.io.tmpdir=/var/opt/delphix/Toolkit/Delphix_423e004e_8079_502d_14bb_82d1bfdd3532_delphix_host/tmp -jar /var/opt/delphix/Toolkit/Delphix_COMMON_423e004e_8079_502d_14bb_82d1bfdd3532_delphix_host/client/dsp/client.jar
delphix 22282392 41680912   0                  0:00 <defunct>
delphix 22937658        1   0   Aug 19      -  0:02 /var/opt/delphix/Toolkit/Delphix_COMMON_423e004e_8079_502d_14bb_82d1bfdd3532_delphix_host/java/jdk/bin/java -ea -XX:-UseVMInterruptibleIO -Ddelphix.host.os=unix -Ddelphix.toolkit.base.dir=/var/opt/delphix/Toolkit -Ddelphix.max.worker=16 -Djava.io.tmpdir=/var/opt/delphix/Toolkit/Delphix_423e004e_8079_502d_14bb_82d1bfdd3532_delphix_host/tmp -jar /var/opt/delphix/Toolkit/Delphix_COMMON_423e004e_8079_502d_14bb_82d1bfdd3532_delphix_host/client/dsp/client.jar
delphix 35192968        1   0   Aug 17      -  0:02 /var/opt/delphix/Toolkit/Delphix_COMMON_423e004e_8079_502d_14bb_82d1bfdd3532_delphix_host/java/jdk/bin/java -ea -XX:-UseVMInterruptibleIO -Ddelphix.host.os=unix -Ddelphix.toolkit.base.dir=/var/opt/delphix/Toolkit -Ddelphix.max.worker=16 -Djava.io.tmpdir=/var/opt/delphix/Toolkit/Delphix_423e004e_8079_502d_14bb_82d1bfdd3532_delphix_host/tmp -jar /var/opt/delphix/Toolkit/Delphix_COMMON_423e004e_8079_502d_14bb_82d1bfdd3532_delphix_host/client/dsp/client.jar

It is also likely that each of these processes will have a child process with an executable name of “<defunct>” (zombie). If these conditions are observed, it is likely that the Java process is hung in memory allocation and cannot make forward progress. This is not a Delphix or Java issue as the hang occurs in the AIX memory management facility.
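
The check described above can be sketched as a pair of small shell functions. This is only an illustration: the function names are hypothetical, and treating a parent PID of 1 as the sign of an orphaned process is an assumption drawn from the example output, so a healthy, recently started process may also be listed. Compare the start time column before concluding that a process is hung.

```shell
#!/bin/sh
# Both functions read `ps -ef` output on stdin, so they can be run against
# live data or against saved output.

# Print the PIDs of toolkit Java processes whose parent PID is 1.
# Assumption: the toolkit path contains "/Toolkit/" and ends in "/bin/java",
# as in the example output above.
list_orphaned_delphix_java() {
    awk '$3 == 1 && /\/Toolkit\/.*\/bin\/java / { print $2 }'
}

# Count zombie (<defunct>) entries in the same output.
count_defunct() {
    awk '/<defunct>/ { n++ } END { print n + 0 }'
}

# Example usage ("delphix" is the OS user from the example above):
#   ps -ef | grep delphix | list_orphaned_delphix_java
#   ps -ef | grep delphix | count_defunct
```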

Resolution 

To mitigate the hang, add an environment variable for the Delphix operating system users. The change affects only those users and has no impact on the rest of the system. You may have multiple Delphix operating system users, as seen in the following graphic:

Delphix_Environment_Users.png

Adding the following environment variable and options for all of the environment users will prevent the accumulation of Java processes and the depletion of memory:

$ cat .ssh/environment
MALLOCOPTIONS=multiheap:32,pool

 

Note:

The environment variable must be defined for non-interactive logins. 

Many of the commands Delphix issues are run through a non-interactive login, so the variable needs to be defined in the ".ssh/environment" file. Depending on the operating system user's shell, there may be other alternatives.
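
As a minimal sketch, the setting could be added to a user's environment file with a helper like the one below. The function name is hypothetical and the restrictive file mode is an assumption rather than a documented Delphix requirement. The helper only checks for the exact line shown above, so an existing, different MALLOCOPTIONS entry should be reviewed by hand first.

```shell
#!/bin/sh
# Append the MALLOCOPTIONS setting to an environment file if it is not
# already present. $1 is the path to the file; for a Delphix OS user this
# would be "$HOME/.ssh/environment".
ensure_mallocoptions() {
    env_file=$1
    setting='MALLOCOPTIONS=multiheap:32,pool'
    mkdir -p "$(dirname "$env_file")"
    touch "$env_file"
    # -x matches the whole line, -F treats the setting as a fixed string.
    if ! grep -qxF "$setting" "$env_file"; then
        printf '%s\n' "$setting" >> "$env_file"
    fi
    # Assumption: keep the file private to its owner.
    chmod 600 "$env_file"
}

# Example usage, run as each Delphix OS user:
#   ensure_mallocoptions "$HOME/.ssh/environment"
```

Running the helper more than once leaves a single copy of the line in the file.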

Update the sshd_config file to permit setting user environment variables:

$ grep PermitUserEnvironment /etc/ssh/sshd_config
#PermitUserEnvironment no
PermitUserEnvironment yes

If PermitUserEnvironment is set to "no", which is the default, sshd ignores the ".ssh/environment" file and the variable will not be set. sshd must reload its configuration before a change to this setting takes effect.
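
The effective setting can also be read with a small function such as the following sketch. The function name is hypothetical. It assumes a simple sshd_config without Match blocks, relies on sshd using the first uncommented occurrence of a keyword, and reports "no" when the keyword is absent.

```shell
#!/bin/sh
# Print the effective PermitUserEnvironment value from an sshd_config file.
# $1 is the path to the file, for example /etc/ssh/sshd_config.
permit_user_environment() {
    awk 'tolower($1) == "permituserenvironment" { print $2; found = 1; exit }
         END { if (!found) print "no" }' "$1"
}

# Example usage:
#   permit_user_environment /etc/ssh/sshd_config
#
# Assumption: on AIX hosts where sshd runs under the System Resource
# Controller, the daemon can be restarted as root with:
#   stopsrc -s sshd
#   startsrc -s sshd
```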

To confirm that the environment variable is defined correctly for non-interactive logins, run the following test:

$ ssh ora11202@aix101-14.delphix.com "env | grep MALLOCOPTIONS"
ora11202@aix101-14.delphix.com's password: 
MALLOCOPTIONS=multiheap:32,pool

According to IBM’s documentation, these two options have the following effect:

multiheap

  • Configures the number of parallel heaps to be used by memory allocators. You can set the multiheap by exporting MALLOCOPTIONS=multiheap:n. The value n can vary from 1 through 32. The default value is 32, if n is not specified. This option is advisable for multithreaded applications, as it can significantly improve the performance.

pool

  • Maintains the bucket for each thread and provides a lock-free allocation and deallocation for blocks less than 513 bytes. This option improves the performance of multithreaded applications as it avoids the time that is spent on locking of memory size less than 513 bytes. The pool option makes small memory block allocations fast and efficient.