Skip to main content
Delphix

Detecting Memory Leak in RemoteServices for SQL Server Target Environments (KBA8070)

 

 

KBA

KBA# 8070

 

Issue

In the 6.0.2.0 release of the Delphix Virtualization Engine, a new feature was added that allows the use of Windows Authentication to access source environments when checking for available backups:

Windows Authentication for SQL Server:
You will now be able to use Windows Authentication to link SQL Server databases. Rather than providing both a database user and a Windows user to ingest data, you can leverage one set of credentials (a Windows OS user) to perform all source operations. This capability will simplify SQL Server deployments and reduce Delphix security requirements on source databases.

This new capability uses a JDBC connection from the source environment Connector Host, the host which has DelphixConnector installed, to issue SQL queries that return the backup set history of the source backups used for the SQL Server dSource ingestions during sync operation (maunal sync, validated sync). An example of the Connector Host is seen here from a Windows source environment:

ConnectorHost.png

This SQL Server JDBC driver is controlled by java remote services from the engine - communicating to the Connector Host to provide the SQL commands the JDBC driver will execute, collecting results and presenting them back to the Delphix engine.  This information is used to determine the next available backups and some environmental aspects of the source database (size and properties, such as recovery model).

It was discovered in some environments the dSources started to throw connection faults even though there were no known network or other connectivity issues. This would happen to any dSource configured to the Connector Host, which usually provided the staging databases for the source databases and the result was stalls in the validated syncs, a common configuration for dSources.

The Fault

An example of such a fault looked like this:


    Object ID                       FAULT-7316
    Fault Id                        7316
    Target                          MSSQL_LINKED_SOURCE-56
    Target Name                     TEST_DSRC
    Bundle Id                       fault.mssql.dSource.validated_sync.failed
    Params                          TEST_DSRC
                                    Failed to query source database "master" over JDBC.
                                    Command output: SQLState: 08S01 SQLErrorCode: 0 ErrorMessage: The TCP/IP connection to the host sourcehost.acme.com, port 1433 has failed. Error: "null. Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall.".
                                    Make sure the database is accepting remote connections over TCP, the database is not using a dynamic TCP port, and the SQL login "WINDOMAIN\OS_USER" has privileges to connect to the source database and has required permissions. Then refresh the environment and try again.
    Title                           Validated sync for the dSource failed
    Description                     Validated sync for dSource "TEST_DSRC" failed with the error: Failed to query source database "master" over JDBC.Command output: SQLState: 08S01 SQLErrorCode: 0 ErrorMessage: The TCP/IP connection to the host sourcehost.acme.com, port 1433 has failed. Error: "null. Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall.".
    Response
    Action                          Review the error. Additional recommendation(s): Make sure the database is accepting remote connections over TCP, the database is not using a dynamic TCP port, and the SQL login "WINDOMAIN\OS_USER" has privileges to connect to the source database and has required permissions. Then refresh the environment and try again..
    Severity                        CRITICAL
    Status                          ACTIVE
    Date Diagnosed                  2021-08-26 16:03:18 UTC

After executing the recommended actions, the faults remained. This was critical as it stalled the advance of ingestions necessary to keep the synchronization of the source database with the staging database intact, preventing the ability to provision VDBs with timely data from the source environment.

The issue is caused by a memory leak in the non-DelphixConnector java.exe process running on the Connector Host that manages the JDBC connections to the source environment and controlled by the Delphix engine. In some cases, in the DelphixConnector installation directory (default is C:\Program Files\Delphix\DelphixConnector) a very large HPROF file is found, the file can be several GBs. This is the heap dump of the memory of the java-based process and usually has a name like <PID>.hprof or connector.hprof where <PID> is the process id of the java process.

You can also check java process (see the Troubleshooting section) with task manager to check how much memory it is using. If it is in the GB range, you are likely experiencing this memory leak issue.

Prerequisites

  • Delphix Virtualization Engine 6.0.2.0 or higher up to 6.0.8.1 (inclusive).
  • Fault indicating connection issue with SQL Server message starting: SQLState: 08S01.
    • This is the SQLState seen in all known issues.
  • Validated Sync configured for dSources.
    • It is conceivable that manual syncs could cause this but it would require a lot of them in repeated fashion to cause the issue.

 

Applicable Delphix Versions

Click here to view the versions of the Delphix engine to which this article applies
Major Release All Sub Releases
6.0 6.0.2.0, 6.0.2.1, 6.0.3.0, 6.0.3.1, 6.0.4.0, 6.0.4.1, 6.0.4.2, 6.0.5.0, 6.0.6.0, 6.0.6.1, 6.0.7.0, 6.0.8.0, 6.0.8.1

Resolution

There is a fix in the 6.0.9.0 Delphix Virtualization Engine release.

 

important

Important:

Please be aware that you should only follow this resolution path if you have already ruled out any other actual, real connectivity issue (port blocks, incorrect address, DNS, et cetera). 

There are two workarounds that provide temporary relief:

  • Restart the Management Service on the Delphix engine. This is not desired but will provide relief.  It will not affect VDBs or dSource staging databases. It will cancel running jobs, and interrupt tasks such as validated sync and policies. To restart the Management Service from the GUI, log in as the sysadmin in the "Setup" application.
    • Click on the "three dots", the ellipsis in the upper right, select "Restart", read the "Do you really want to do this" popup and agree to it.

restart.png

  • Look for the java process running the JDBC connections (refer to Troubleshooting section). You can kill this process, but you may need to check for restore activity on the staging instance first so as to not interrupt the ingestions.  You can use a query like this to check:
SELECT sqltext.TEXT,WAIT_TYPE,
req.session_id,
req.status,
req.command,
req.cpu_time,
percent_complete,
req.total_elapsed_time
FROM sys.dm_exec_requests req
CROSS APPLY sys.dm_exec_sql_text(sql_handle) AS sqltext
WHERE req.command in ('BACKUP DATABASE','RESTORE DATABASE', 'RESTORE LOG')

When you do not see RESTORE activity you can kill the java process (you can locate java process as described in the Troubleshooting section).  After it is killed, the Delphix engine will start a new process to proceed with validated sync activities.

 

Troubleshooting

There is a diagnostic marker as indicated in the beginning of this article:

SQLState: 08S01 SQLErrorCode: 0 ErrorMessage: The TCP/IP connection to the host Windows_Source_host.acme.com, port 1433 has failed. Error: "null. Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall.". 

There is another process used to create and manage the JDBC connections from staging to the source to determine backupset history. For example, this information is used to properly determine next available backups for a validated sync scenario.

Locating the java process for JDBC connections

This other java process originates from a toolkit directory similar to this one as seen in the Windows task manager:

C:\Program Files\Delphix\DelphixConnector\Delphix_COMMON_b6ecbbc43c25_host\java 

You can inspect this java process from the Windows task manager on the staging host:

commandLine.png

In order to see the command line you need to add it to the Details tab of the task manager.

If you right click in the column header for the Task Manager Details (where you see column names such as Name, PID, Status) you will access a popup window.  Scroll through it and select Command line to add to the headers.

columnHeader.png

If you notice the "Memory (active private working set)" is displaying usage in the tens or more GBs range (for example, 12,000,000 K), you might be approaching a failure threshold and may need to kill that java process. You can do this once you can confirm no activities for a few minutes (See Resolution section for example of Restore activity query to run to check).

You can also check for java HPROF  (heap memory profile) files as well in the event an out of memory condition occurs, under the Delphix\DelphixConnector directory. If this occurs you may experience the connectivity issue and message indicated at the beginning of this article (Fault example).

Also, restarting DelphixConnector will do nothing to resolve since it is not handling these JDBC connections.

 

 


Related Articles

The following articles may provide more information or related information to this article:

  • N/A