DB2 JSON values contain strange "NEL" characters when viewed in Notepad++ - encoding

We have a problem viewing a JSON file that contains "\r\n" sequences fetched from DB2 (encoding scheme EBCDIC).
When we check the content of the attribute with TOAD directly in the database, we see the correct hex values for CRLF.
We are fetching this data from DB2 in JSON format, but when we view the JSON file the line breaks come out converted: Notepad++ (UTF-8) displays "NEL" where they should be, and when I convert the file to ANSI I see "Â" instead.
I am writing to the file as in the sample code below:
output = new FileOutputStream(tempFile);
IOUtils.write(getBytes(), output);

public byte[] getBytes() throws IOException {
    String data = "{\r\n\"dataLists\" : [ ]}";
    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    byteArrayOutputStream.write(data.getBytes("UTF-8"));
    return byteArrayOutputStream.toByteArray();
}
Please help.

JSON is based on JavaScript, and as part of the JavaScript definition NEL characters may be treated as white space. NEL (U+0085, NEXT LINE) is what the EBCDIC newline byte 0x15 becomes during codepage conversion, which is why it turns up in data coming off the mainframe.
Based on that, if you don't want to have problems while viewing data imported from the mainframe, you can convert the NEL characters into ordinary white space.
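The suggested conversion can be sketched in Java (assuming the JSON arrives as a String before being written out; the class and method names are illustrative):

```java
// Sketch: strip NEL (U+0085) from text pulled off the mainframe before
// writing it to the JSON file. Replacing it with a plain space (or with
// "\r\n" if the line break must be kept) makes the file display normally
// in Notepad++.
public class NelCleaner {
    static String stripNel(String s) {
        return s.replace('\u0085', ' ');
    }
}
```

Applying stripNel to the response text before the FileOutputStream write would remove the characters Notepad++ shows as "NEL".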

Related

Access Error: invalid UTF-8 encoding #{FFD8FFE0}

Right now there aren't really any books on Red since it is so new, so I am trying to follow along with an old Rebol book and salvage what I can from it.
I have found a few commands, such as read, where I can't execute the code because of the file encoding.
save %/c/users/abagget/desktop/bay.jpg read http://rebol.com/view/bay.jpg
Access Error: invalid UTF-8 encoding: #{FFD8FFE0}
In Rebol this^ would have been read/binary and write/binary
>> write %/c/alex.txt read http://google.com
*** Access Error: invalid UTF-8 encoding: #{A050726F}
Is there a way to convert incoming content to UTF-8 so I can do the read?
Or are there other types of read that handle non-UTF-8?
In Rebol this^ would have been read/binary and write/binary
In Red too, save is for converting a Red datatype to a serialized text or binary format. So if you want to save to a JPEG file, you need to provide an image! value. read fetches text content (limited to UTF-8 for now), so your usage is invalid. The proper line should be:
write/binary %/c/users/abagget/desktop/bay.jpg read/binary http://rebol.com/view/bay.jpg
Is there a way to convert incoming content to UTF-8 so I can do the read?
To obtain a string from a non-UTF-8 text resource, you need to fetch the resource as binary and then write a poor man's converter, which should work fine for the common Latin-1 encoding:
bin-to-string: function [bin [binary!]][
    text: make string! length? bin
    foreach byte bin [append text to char! byte]
    text
]
Using it from the console:
>> bin-to-string read/binary http://google.com
== {<!doctype html><html itemscope="" itemtype="http://schema.org...
Red will provide proper converters for commonly used text encodings in the future. In the meantime, you can use such a function, or write a proper decoder (using a conversion table) for the encodings you use most often.
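For comparison, here is a Java sketch of the same byte-for-byte mapping (class name illustrative): mapping each byte 0x00-0xFF straight to the code point of the same value is exactly what the ISO-8859-1 (Latin-1) decoder does, so the conversion collapses to one line.

```java
import java.nio.charset.StandardCharsets;

public class Latin1 {
    // Each Latin-1 byte maps directly to the Unicode code point with the
    // same numeric value, so the decoder is the whole "converter".
    static String binToString(byte[] bin) {
        return new String(bin, StandardCharsets.ISO_8859_1);
    }
}
```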

Non-ISO extended-ASCII CSV giving special character while importing in DB

I am getting CSV from S3 server and inserting it into PostgreSQL using java.
S3Object object = s3Client.getObject(new GetObjectRequest(bucketName, key));
BufferedReader reader = new BufferedReader(
new InputStreamReader(object.getObjectContent())
);
For some of the rows the value in a column contains the special characters �. I tried using the encodings UTF-8, UTF-16 and ISO-8859-1 with InputStreamReader, but it didn't work out.
When the encoding WIN-1252 is used, the DB still shows some special characters, but when I export the data to CSV it shows the same characters which I found in the raw file.
But again, when I open the file in Notepad the character is fine, yet when I open it in Excel the same special character appears.
All the PostgreSQL stuff is quite irrelevant; PostgreSQL can deal with practically any encoding. Check your data with a utility such as enca to determine how it is encoded, and set your PostgreSQL session to that encoding. If the server is in the same encoding or in some Unicode encoding, it should work fine.
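On the Java side, the fix is to pass the detected charset to InputStreamReader explicitly instead of relying on the platform default. A minimal sketch, assuming the file really is windows-1252 (verify with enca first; the class and method names are illustrative):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

public class Cp1252Reader {
    // Decode the stream with an explicit charset; the single-argument
    // InputStreamReader constructor silently uses the JVM default,
    // which is where the stray replacement characters come from.
    static String readLine(InputStream in) throws IOException {
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(in, Charset.forName("windows-1252")))) {
            return r.readLine();
        }
    }
}
```

The same Charset argument would go into the InputStreamReader wrapping object.getObjectContent() in the original code.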

Writing CR+LF into Open XML from a Database

I'm trying to take some data stored in a database and populate a Word template's Content Controls with it using the Open XML SDK. The data contains paragraphs and so there are carriage return and line feed characters in it. The data is stored in the database as nvarchar.
When I open the generated document, the CR+LF combination shows up as a question mark with a box around it (I'm not sure of the name of this character). This is actually two sequences back to back, so CR+LF CR+LF equals two strange characters:
If I unzip the .docx, take the Custom XML part and do a hex dump, I can clearly see 0d0a 0d0a so the CR+LF is there. Word is just printing it weird.
I've tried enforcing UTF-8 encoding in my XmlWriter's settings, but that didn't seem to help:
Dim docStream As New MemoryStream
Dim settings As XmlWriterSettings = New XmlWriterSettings()
settings.Encoding = New UTF8Encoding(False)
Dim docWriter As XmlWriter = XmlTextWriter.Create(docStream, settings)
Does anyone know how I can get Word to render these characters correctly when written to a .docx through the Open XML SDK?
To bind to a Word 2013 rich text control, your XML element has to contain a complete docx. See [MS-DOCX]:
the data stored in the XML element will be an escaped string comprised of a flattened WordprocessingML document representing the formatted data in the structured document tag range.
Earlier versions couldn't bind a rich text control.
Things should work, though (with CR/LF, not w:br), if you bind to a plain text control with multiline set to true.

Cyrillic characters in Apache POI excel file hyperlink address

I use Scala and Apache POI (with folone/poi-scala).
I want to create a hyperlink to a local file in a cell. The path of the file contains Cyrillic characters, and in Excel I can't open this file; I see '?' instead of the Cyrillic characters.
I tried a number of encodings and URL encoding, but it did not work.
Here is my code:
...
val cell = sheetOne.asPoi.getSheetAt(0).getRow(0).getCell(0)
cell.setHyperlink({
val link = new HSSFHyperlink(HSSFHyperlink.LINK_FILE)
link.setAddress("D:/Проверка/проверка.txt")
link
})
...
Any suggestions?
You need to replace HSSFHyperlink.LINK_FILE with HSSFHyperlink.LINK_URL.
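With LINK_URL, the address can also be made ASCII-safe by percent-encoding the Cyrillic path segments. This is an addition, not part of the original answer; a sketch using only the JDK (class and method names are illustrative):

```java
import java.io.File;

public class LinkAddress {
    // toURI() turns the path into a file: URI; toASCIIString()
    // percent-encodes the non-ASCII path segments as UTF-8 octets,
    // so only ASCII characters reach Excel.
    static String fileUrl(String path) {
        return new File(path).toURI().toASCIIString();
    }
}
```

The result could then be passed to link.setAddress(...) in place of the raw "D:/Проверка/проверка.txt" string.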

How to convert Unicode Hebrew that appears as gibberish in VBScript?

I am gathering information from a Hebrew (windows-1255 / UTF-8 encoding) website using VBScript and the WinHttp.WinHttpRequest.5.1 object.
For Example :
Set objWinHttp = CreateObject("WinHttp.WinHttpRequest.5.1")
...
'writes the file as unicode (can't use Ascii)
Set Fileout = FSO.CreateTextFile("c:\temp\myfile.xml", true, true)
....
Fileout.WriteLine(objWinHttp.responsetext)
When viewing the file in Notepad / Notepad++, I see the Hebrew as gibberish.
For example:
äìëåú - äøá àáøäí éåñó - îåøùú
I need a VBScript function that returns the Hebrew correctly. The function should behave like the converter at http://www.pixiesoft.com/flip/ (choose the 2nd radio button and press the Convert button to see the Hebrew correctly).
Your script is correctly fetching the byte stream and saving it as-is. No problems there.
Your problem is that the local text editor doesn't know that it's supposed to read the file as cp1255, so it tries the default on your machine of cp1252. You can't save the file locally as cp1252, so that Notepad will read it correctly, because cp1252 doesn't include any Hebrew characters.
What is ultimately going to be reading the file or byte stream, that will need to pick up the Hebrew correctly? If it does not support cp1255, you will need to find an encoding that is supported by that tool, and convert the cp1255 string to that encoding. Suggest you might try UTF-8 or UTF-16LE (the encoding Windows misleadingly calls 'Unicode'.)
Converting text between encodings in VBScript/JScript can be done as a side-effect of an ADODB stream. See the example in this answer.
Thanks to Charming Bobince (who posted the answer), I am now able to see the Hebrew correctly (saving windows-1255 encoded text to a .txt file opened in Notepad) by implementing the following:
Function ConvertFromUTF8(sIn)
    Dim oIn: Set oIn = CreateObject("ADODB.Stream")
    oIn.Open
    ' Write the string under a single-byte charset so each character
    ' turns back into its original raw byte
    oIn.CharSet = "X-ANSI"
    oIn.WriteText sIn
    ' Rewind and re-read those same bytes as windows-1255 Hebrew
    oIn.Position = 0
    oIn.CharSet = "WINDOWS-1255"
    ConvertFromUTF8 = oIn.ReadText
    oIn.Close
End Function
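The byte-reinterpretation trick the function above performs can be sketched in Java for clarity (class and method names are illustrative): encode the mis-decoded text back to bytes under the charset the editor wrongly assumed, then decode those bytes with the right codepage.

```java
import java.nio.charset.Charset;

public class MojibakeFix {
    // The garbled text is really windows-1255 bytes shown as
    // windows-1252 characters; encoding it back recovers the original
    // bytes, which are then decoded as Hebrew.
    static String fix(String garbled) {
        byte[] raw = garbled.getBytes(Charset.forName("windows-1252"));
        return new String(raw, Charset.forName("windows-1255"));
    }
}
```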