Detecting Memory Leak in RemoteServices for SQL Server Target Environments (KBA8070)
KBA
KBA# 8070
Issue
In the 6.0.2.0 release of the Delphix Virtualization Engine, a new feature was added that allows the use of Windows Authentication to access source environments when checking for available backups:
Windows Authentication for SQL Server:
You will now be able to use Windows Authentication to link SQL Server databases. Rather than providing both a database user and a Windows user to ingest data, you can leverage one set of credentials (a Windows OS user) to perform all source operations. This capability will simplify SQL Server deployments and reduce Delphix security requirements on source databases.
This new capability uses a JDBC connection from the source environment Connector Host, the host which has DelphixConnector installed, to issue SQL queries that return the backup set history of the source backups used for the SQL Server dSource ingestions during sync operation (maunal sync, validated sync). An example of the Connector Host is seen here from a Windows source environment:
This SQL Server JDBC driver is controlled by java remote services from the engine - communicating to the Connector Host to provide the SQL commands the JDBC driver will execute, collecting results and presenting them back to the Delphix engine. This information is used to determine the next available backups and some environmental aspects of the source database (size and properties, such as recovery model).
It was discovered in some environments the dSources started to throw connection faults even though there were no known network or other connectivity issues. This would happen to any dSource configured to the Connector Host, which usually provided the staging databases for the source databases and the result was stalls in the validated syncs, a common configuration for dSources.
The Fault
An example of such a fault looked like this:
Object ID FAULT-7316 Fault Id 7316 Target MSSQL_LINKED_SOURCE-56 Target Name TEST_DSRC Bundle Id fault.mssql.dSource.validated_sync.failed Params TEST_DSRC Failed to query source database "master" over JDBC. Command output: SQLState: 08S01 SQLErrorCode: 0 ErrorMessage: The TCP/IP connection to the host sourcehost.acme.com, port 1433 has failed. Error: "null. Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall.". Make sure the database is accepting remote connections over TCP, the database is not using a dynamic TCP port, and the SQL login "WINDOMAIN\OS_USER" has privileges to connect to the source database and has required permissions. Then refresh the environment and try again. Title Validated sync for the dSource failed Description Validated sync for dSource "TEST_DSRC" failed with the error: Failed to query source database "master" over JDBC.Command output: SQLState: 08S01 SQLErrorCode: 0 ErrorMessage: The TCP/IP connection to the host sourcehost.acme.com, port 1433 has failed. Error: "null. Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall.". Response Action Review the error. Additional recommendation(s): Make sure the database is accepting remote connections over TCP, the database is not using a dynamic TCP port, and the SQL login "WINDOMAIN\OS_USER" has privileges to connect to the source database and has required permissions. Then refresh the environment and try again.. Severity CRITICAL Status ACTIVE Date Diagnosed 2021-08-26 16:03:18 UTC
After executing the recommended actions, the faults remained. This was critical as it stalled the advance of ingestions necessary to keep the synchronization of the source database with the staging database intact, preventing the ability to provision VDBs with timely data from the source environment.
The issue is caused by a memory leak in the non-DelphixConnector java.exe process running on the Connector Host that manages the JDBC connections to the source environment and controlled by the Delphix engine. In some cases, in the DelphixConnector installation directory (default is C:\Program Files\Delphix\DelphixConnector) a very large HPROF file is found, the file can be several GBs. This is the heap dump of the memory of the java-based process and usually has a name like <PID>.hprof
or connector.hprof
where <PID> is the process id of the java process.
You can also check java process (see the Troubleshooting section) with task manager to check how much memory it is using. If it is in the GB range, you are likely experiencing this memory leak issue.
Prerequisites
- Delphix Virtualization Engine 6.0.2.0 or higher up to 6.0.8.1 (inclusive).
- Fault indicating connection issue with SQL Server message starting: SQLState: 08S01.
- This is the SQLState seen in all known issues.
- Validated Sync configured for dSources.
- It is conceivable that manual syncs could cause this but it would require a lot of them in repeated fashion to cause the issue.
Applicable Delphix Versions
- Click here to view the versions of the Delphix engine to which this article applies
-
Major Release All Sub Releases 6.0 6.0.2.0, 6.0.2.1, 6.0.3.0, 6.0.3.1, 6.0.4.0, 6.0.4.1, 6.0.4.2, 6.0.5.0, 6.0.6.0, 6.0.6.1, 6.0.7.0, 6.0.8.0, 6.0.8.1
Resolution
There is a fix in the 6.0.9.0 Delphix Virtualization Engine release.
There are two workarounds that provide temporary relief:
- Restart the Management Service on the Delphix engine. This is not desired but will provide relief. It will not affect VDBs or dSource staging databases. It will cancel running jobs, and interrupt tasks such as validated sync and policies. To restart the Management Service from the GUI, log in as the sysadmin in the "Setup" application.
- Click on the "three dots", the ellipsis in the upper right, select "Restart", read the "Do you really want to do this" popup and agree to it.
- Look for the java process running the JDBC connections (refer to Troubleshooting section). You can kill this process, but you may need to check for restore activity on the staging instance first so as to not interrupt the ingestions. You can use a query like this to check:
SELECT sqltext.TEXT,WAIT_TYPE, req.session_id, req.status, req.command, req.cpu_time, percent_complete, req.total_elapsed_time FROM sys.dm_exec_requests req CROSS APPLY sys.dm_exec_sql_text(sql_handle) AS sqltext WHERE req.command in ('BACKUP DATABASE','RESTORE DATABASE', 'RESTORE LOG')
When you do not see RESTORE activity you can kill the java process (you can locate java process as described in the Troubleshooting section). After it is killed, the Delphix engine will start a new process to proceed with validated sync activities.
Troubleshooting
There is a diagnostic marker as indicated in the beginning of this article:
SQLState: 08S01 SQLErrorCode: 0 ErrorMessage: The TCP/IP connection to the host Windows_Source_host.acme.com, port 1433 has failed. Error: "null. Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall.".
There is another process used to create and manage the JDBC connections from staging to the source to determine backupset history. For example, this information is used to properly determine next available backups for a validated sync scenario.
Locating the java process for JDBC connections
This other java process originates from a toolkit directory similar to this one as seen in the Windows task manager:
C:\Program Files\Delphix\DelphixConnector\Delphix_COMMON_b6ecbbc43c25_host\java
You can inspect this java process from the Windows task manager on the staging host:
In order to see the command line you need to add it to the Details tab of the task manager.
If you right click in the column header for the Task Manager Details (where you see column names such as Name
, PID
, Status
) you will access a popup window. Scroll through it and select Command line to add to the headers.
If you notice the "Memory (active private working set)" is displaying usage in the tens or more GBs range (for example, 12,000,000 K), you might be approaching a failure threshold and may need to kill that java process. You can do this once you can confirm no activities for a few minutes (See Resolution section for example of Restore activity query to run to check).
You can also check for java HPROF (heap memory profile) files as well in the event an out of memory condition occurs, under the Delphix\DelphixConnector directory. If this occurs you may experience the connectivity issue and message indicated at the beginning of this article (Fault example).
Also, restarting DelphixConnector will do nothing to resolve since it is not handling these JDBC connections.
Related Articles
The following articles may provide more information or related information to this article:
- N/A