Skip to main content
Delphix

Byte Order Mark (BOM) in Windows text file causes Pre/Post Scripts errors

Applicable Delphix Versions

This applies to all current Delphix Masking version.

  • 5.1
  • 5.0
  • 4.3
  • 4.2
  • 4.1
  • 4.0
  • 3.2

Issue

The Delphix Masking Engine supports files saved in ASCII or UTF-8 format without the BOM (Byte Order Mark) when importing text files. If the lookup file contains foreign alphabet characters, the file must be saved in UTF-8 format with no BOM for the Masking Engine to read the Unicode text correctly. The BOM is usually added by Windows text editors.

The BOM causes a number of different issues and the error message differs depending on BOM code. This page describes SQL Pre and Post Scripts.

There are five different BOM codes:

Byte order mark Description
EF BB BF UTF-8
FF FE UTF-16, little endian
FE FF UTF-16, big endian
FF FE 00 00 UTF-32, little endian
00 00 FE FF UTF-32, big-endian

Troubleshooting

Pre/Post Scripts in Job Configuration

The Pre/Post Scripts are SQL scripts executed before and after the transformation (masking) job. The script file is selected in the Job Configuration screen and ingested into the masking job. 

The error caused by: UTF-8 and UTF-16 

The UTF-8 and UTF-16 BOM codes cause the ingestion process to include the BOM and an 'invisible' semicolon before the SQL statement (or row of statements). This empty SQL causes "Incorrect syntax near ''."

grep -a1 "Couldn't execute SQL" joblog

2017/07/06 13:27:54 - Execute_Prescript.0 - ERROR (version 6.0.1.0-386, build 1 from 2015-12-03 11.37.25 by buildguy) : An error occurred, processing will be stopped:
2017/07/06 13:27:54 - Execute_Prescript.0 - Couldn't execute SQL: update dbo.OneCol set last_name ='David'
2017/07/06 13:27:54 - Execute_Prescript.0 -
2017/07/06 13:27:54 - Execute_Prescript.0 - Incorrect syntax near ''.

The error caused by: UTF-32

The UTF-32 BOM codes cause the ingestion process to include the BOM and an 'invisible' semicolon before the SQL statement (or row of statements). This BOM includes an invalid XML (Unicode: 0x0), which results in  "[Fatal Error] :xxx:xx: An invalid XML character (Unicode: 0x0) was found in the element content of the document."

2017-07-06 04:12:13,392 INFO  ...RunServiceImpl - [JOB_ID_39_513] INFO  06-07 04:12:13,391 - PRESCRIPT TRANSFORMATION - Loading transformation from XML file [/var/delphix/dmsuite//output/MSSQL/DMSApplicator/LongRun/39/KETTLE_PRESCRIPT_TRANSFORMATION_39.xml]
2017-07-06 04:12:13,417 INFO  ...RunServiceImpl - [JOB_ID_39_513] [Fatal Error] :173:13: An invalid XML character (Unicode: 0x0) was found in the element content of the document.
2017-07-06 04:12:13,422 INFO  ...RunServiceImpl - [JOB_ID_39_513] ERROR 06-07 04:12:13,422 - 39_513 - org.pentaho.di.core.exception.KettleException:
2017-07-06 04:12:13,423 INFO  ...RunServiceImpl - [JOB_ID_39_513] Unexpected error during transformation metadata load
2017-07-06 04:12:13,423 INFO  ...RunServiceImpl - [JOB_ID_39_513]
2017-07-06 04:12:13,423 INFO  ...RunServiceImpl - [JOB_ID_39_513] Error reading information from input stream
2017-07-06 04:12:13,423 INFO  ...RunServiceImpl - [JOB_ID_39_513] An invalid XML character (Unicode: 0x0) was found in the element content of the document.
2017-07-06 04:12:13,423 INFO  ...RunServiceImpl - [JOB_ID_39_513]
2017-07-06 04:12:13,423 INFO  ...RunServiceImpl - [JOB_ID_39_513]
2017-07-06 04:12:13,423 INFO  ...RunServiceImpl - [JOB_ID_39_513]         at org.pentaho.di.job.entries.trans.JobEntryTrans.getTransMeta(JobEntryTrans.java:1202)
2017-07-06 04:12:13,423 INFO  ...RunServiceImpl - [JOB_ID_39_513]         at org.pentaho.di.job.entries.trans.JobEntryTrans.execute(JobEntryTrans.java:621)
2017-07-06 04:12:13,423 INFO  ...RunServiceImpl - [JOB_ID_39_513]         at org.pentaho.di.job.Job.execute(Job.java:589)
2017-07-06 04:12:13,423 INFO  ...RunServiceImpl - [JOB_ID_39_513]         at org.pentaho.di.job.Job.execute(Job.java:728)
2017-07-06 04:12:13,424 INFO  ...RunServiceImpl - [JOB_ID_39_513]         at org.pentaho.di.job.Job.execute(Job.java:728)
2017-07-06 04:12:13,424 INFO  ...RunServiceImpl - [JOB_ID_39_513]         at org.pentaho.di.job.Job.execute(Job.java:443)
2017-07-06 04:12:13,424 INFO  ...RunServiceImpl - [JOB_ID_39_513]         at org.pentaho.di.job.Job.run(Job.java:363)
2017-07-06 04:12:13,424 INFO  ...RunServiceImpl - [JOB_ID_39_513] Caused by: org.pentaho.di.core.exception.KettleXMLException:
2017-07-06 04:12:13,424 INFO  ...RunServiceImpl - [JOB_ID_39_513] Error reading information from input stream
2017-07-06 04:12:13,424 INFO  ...RunServiceImpl - [JOB_ID_39_513] An invalid XML character (Unicode: 0x0) was found in the element content of the document.
2017-07-06 04:12:13,424 INFO  ...RunServiceImpl - [JOB_ID_39_513]
2017-07-06 04:12:13,424 INFO  ...RunServiceImpl - [JOB_ID_39_513]         at org.pentaho.di.core.xml.XMLHandler.loadXMLFile(XMLHandler.java:588)
2017-07-06 04:12:13,424 INFO  ...RunServiceImpl - [JOB_ID_39_513]         at org.pentaho.di.core.xml.XMLHandler.loadXMLFile(XMLHandler.java:508)
2017-07-06 04:12:13,424 INFO  ...RunServiceImpl - [JOB_ID_39_513]         at org.pentaho.di.core.xml.XMLHandler.loadXMLFile(XMLHandler.java:494)
2017-07-06 04:12:13,424 INFO  ...RunServiceImpl - [JOB_ID_39_513]         at org.pentaho.di.trans.TransMeta.<init>(TransMeta.java:2760)
2017-07-06 04:12:13,424 INFO  ...RunServiceImpl - [JOB_ID_39_513]         at org.pentaho.di.trans.TransMeta.<init>(TransMeta.java:2741)
2017-07-06 04:12:13,425 INFO  ...RunServiceImpl - [JOB_ID_39_513]         at org.pentaho.di.job.entries.trans.JobEntryTrans.getTransMeta(JobEntryTrans.java:1158)
2017-07-06 04:12:13,425 INFO  ...RunServiceImpl - [JOB_ID_39_513]         ... 6 more
2017-07-06 04:12:13,425 INFO  ...RunServiceImpl - [JOB_ID_39_513] Caused by: org.xml.sax.SAXParseException; lineNumber: 173; columnNumber: 13; An invalid XML character (Unicode: 0x0) was found in the element content of the document.
2017-07-06 04:12:13,425 INFO  ...RunServiceImpl - [JOB_ID_39_513]         at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
2017-07-06 04:12:13,425 INFO  ...RunServiceImpl - [JOB_ID_39_513]         at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
2017-07-06 04:12:13,425 INFO  ...RunServiceImpl - [JOB_ID_39_513]         at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
2017-07-06 04:12:13,425 INFO  ...RunServiceImpl - [JOB_ID_39_513]         at org.pentaho.di.core.xml.XMLHandler.loadXMLFile(XMLHandler.java:561)
2017-07-06 04:12:13,425 INFO  ...RunServiceImpl - [JOB_ID_39_513]         ... 11 more

Resolution

To workaround this issue - save the text file with an encoding which excludes the BOM code. This could be done using ANSI encoding or any of the other encoding but without the BOM. This will ensure that the BOM code is not included and added to the SQL statement. 

Steps:

  1. Add SQL statement(s) in a text editor
  2. Select ANSI Encoding or one without BOM
  3. Save

Below are an example of selecting encoding in Notepad++ and to the left the Encoding (here ANSI) which is seen at the bottom of the page.

    

External Links

Information about the BOM code can be found here:

https://msdn.microsoft.com/en-us/lib...(v=vs.85).aspx

 

 

Conditional content (Pro member)
  • Was this article helpful?