VB6: load a Unicode string written as 'ChrW$ (&H410)' etc. from a txt file

I have a long Unicode string saved in Unicode encoding from notepad, in this form
ChrW$ (&H410) & " " & ChrW$(&H430) & vbNewLine & ChrW$(&H42F)
etc, to end of file
If I assign the above code as the value of an Ink Edit box in code, it displays the correct Unicode chars, which is what I wanted.
But for some reason I can't find the right way to open the text file and get that to display the Unicode chars. This is probably very simple, but I've got totally confused.
What is a simple way of achieving this? Thanks

Assuming your file has the Unicode text and not VB expressions as you showed... not much to it:
Dim F As Integer
Dim Text() As Byte
F = FreeFile(0)
Open "SomeUnicode.txt" For Binary Access Read As #F
'File is UTF-16LE, so we'll skip the BOM:
ReDim Text(LOF(F) - 3)
Get #F, 3, Text
Close #F
InkEd.Text = Text
Otherwise you'll need an expression evaluator, and you could use the Microsoft Script Control to process such expressions if you drop the $ type decorators.
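If the file really does contain the expressions rather than the text itself, and they only ever use ChrW$(&H...), quoted literals, vbNewLine and &, a small hand-written evaluator may be enough. Below is a minimal sketch of that idea, written in Python rather than VB6 purely to show the logic. The function name eval_vb_concat and the set of tokens it accepts are assumptions for illustration, not something taken from the answer above.

```python
import re

# One token: ChrW$(&Hxxxx), a double-quoted literal, or vbNewLine.
TOKEN = re.compile(
    r'\s*(?:ChrW\$?\s*\(\s*&H([0-9A-Fa-f]+)\s*\)'  # group 1: hex code point
    r'|"((?:[^"]|"")*)"'                            # group 2: literal, "" escapes a quote
    r'|(vbNewLine))\s*',                            # group 3: line-break constant
    re.IGNORECASE,
)


def eval_vb_concat(expr: str) -> str:
    """Evaluate tokens joined by '&' into a single string (hypothetical helper)."""
    expr = expr.strip()
    parts = []
    pos = 0
    while True:
        match = TOKEN.match(expr, pos)
        if match is None:
            raise ValueError(f"unexpected text at offset {pos}")
        hex_code, literal, _newline = match.groups()
        if hex_code is not None:
            parts.append(chr(int(hex_code, 16)))
        elif literal is not None:
            parts.append(literal.replace('""', '"'))
        else:
            parts.append("\r\n")  # vbNewLine is CR+LF on Windows
        pos = match.end()
        if pos >= len(expr):
            break
        if expr[pos] != "&":
            raise ValueError(f"expected '&' at offset {pos}")
        pos += 1
    return "".join(parts)


sample = 'ChrW$ (&H410) & " " & ChrW$(&H430) & vbNewLine & ChrW$(&H42F)'
result = eval_vb_concat(sample)
```

In Python, a file saved from Notepad as "Unicode" can be read with open(path, encoding="utf-16"), which consumes the BOM, before the contents are passed to the function.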

Related

BBEdit "Find" dialog vs. CR and LF

In BBEdit (v11.6), when I search for the "\r" character in a txt file previously saved as "Unix (LF)" from the "Save as..." dialog, the search matches the end of every line of the file.
Why?
The BBEdit hex dump correctly shows that no CR (0D) chars are present in the file.
From the 11.6 release notes:
BBEdit now uses the line feed (ASCII decimal 10) as line breaks in its internal representation for text in open documents, instead of the carriage return (ASCII decimal 13) that was the standard Mac format for many years. This (theoretically) reduces the time required to open documents, since in the normal case, no conversion is necessary; it also eliminates conversion logic when copying and pasting text, since LF-delimited text is also the standard interchange format on the Clipboard.
As before, you may use \n and \r interchangeably in search strings and Grep patterns. (The latter usage is for compatibility with old versions of BBEdit.)

Writing CR+LF into Open XML from a Database

I'm trying to take some data stored in a database and populate a Word template's Content Controls with it using the Open XML SDK. The data contains paragraphs and so there are carriage return and line feed characters in it. The data is stored in the database as nvarchar.
When I open the generated document, each CR+LF combination shows up as a question mark with a box around it (I'm not sure what this character is called). The data actually contains two sequences back to back, so CR+LF CR+LF produces two of these strange characters.
If I unzip the .docx, take the Custom XML part and do a hex dump, I can clearly see 0d0a 0d0a so the CR+LF is there. Word is just printing it weird.
I've tried enforcing UTF-8 encoding in my XmlWriter's settings, but that didn't seem to help:
Dim docStream As New MemoryStream
Dim settings As XmlWriterSettings = New XmlWriterSettings()
settings.Encoding = New UTF8Encoding(False)
Dim docWriter As XmlWriter = XmlWriter.Create(docStream, settings)
Does anyone know how I can get Word to render these characters correctly when written to a .docx through the Open XML SDK?
To bind to a Word 2013 rich text control, your XML element has to contain a complete docx. See [MS-DOCX]:
the data stored in the XML element will be an escaped string comprised of a flattened WordprocessingML document representing the formatted data in the structured document tag range.
Earlier versions couldn't bind a rich text control.
Things should work though (with CR/LF, not w:br), if you bind to a plain text control, with multiline set to true.

Applescript: Save Word documents as plaintext while retaining accents

I'm trying to save Word documents as plain text docs. Currently, the accents sometimes turn into other symbols (usually the same ones, for example: é turns into a theta). Other times it works fine. How do I prevent this?
Currently using the line:
save as active document file name FullDocPath file format format Unicode text
When I encounter this error, I can save the document using the dialog (selecting Western Mac OS Roman encoding), and that fixes the problem.
The applescript Word dictionary mentions:
[text encoding unsigned integer] : Text encoding to use when saving out as text file
I have no idea if this is the piece I'm missing or how to utilize it (is there a set integer that designates Western Mac OS Roman encoding?)
Anyone have any ideas?
Try:
set wordDoc to choose file
do shell script "textutil -convert txt " & quoted form of POSIX path of (wordDoc as text)
Check out StefanK's solution using textutil
This is in response to your comment beginning "Thanks Stefan and bibadiak"
The problem with .txt file formats is that there is no universally used way to specify the encoding of a file inside the file, so either the application has to guess, or you have to know the encoding and the application has to let you tell it.
AFAIK if you do not specify an output encoding when you use textutil to convert from .doc or .docx format to text, you get UTF-8. But Mac Word just does not seem to recognise that when you try to open it, either programmatically or in the UI.
So I think you need to do some mix of the following:
a. save in, and work with, a format that uses 16-bit Unicode encoding. Word should recognise that, certainly if the BOM is preserved
b. save to UTF-8 and work with UTF-8 elsewhere, but use textutil to do the conversion back to (say) .docx before you re-open the document in Mac Word
c. if all your characters can be encoded using Mac OS Roman, use e.g.
textutil -convert txt -encoding 30
to save, ensure you work only with that character set, and re-open with Word. (30 is the value of Apple's NSString encoding constant NSMacOSRomanStringEncoding.) I think textutil will fail to convert documents that contain characters outside the Mac OS Roman set.
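To make the guessing problem concrete, here is a small Python sketch (Python is used only for illustration; it is not part of the AppleScript or textutil workflow) showing how the same "é" becomes different bytes under different encodings, and what happens when UTF-8 bytes are read as Mac OS Roman. The variable names are arbitrary.

```python
text = "é"

# The same character, three different byte sequences.
mac_bytes = text.encode("mac_roman")   # single byte in Mac OS Roman
utf8_bytes = text.encode("utf-8")      # two bytes in UTF-8
utf16_bytes = text.encode("utf-16")    # BOM followed by one 16-bit code unit

# Reading UTF-8 bytes with the wrong encoding yields two unrelated characters.
misread = utf8_bytes.decode("mac_roman")

# A character outside Mac OS Roman cannot be encoded at all by Python's codec.
try:
    "あ".encode("mac_roman")
    outside_ok = True
except UnicodeEncodeError:
    outside_ok = False

print(mac_bytes, utf8_bytes, utf16_bytes, misread, outside_ok)
```

The BOM at the start of the UTF-16 bytes is what lets a reader detect that encoding without guessing, which is the point of option (a) above.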

Breaking UTF-16 Unicode text by delimiters in Applescript?

I have a list of text coded in MacRoman, broken by linefeeds. Somehow a second list could not be saved in MacRoman, so I had to use Unicode UTF-16 to get German "ö", "ä" and stuff. While ListA gets filled like expected, listB doesn't get broken anymore and I end up with a single string, which I'm unable to break anymore/don't know how. Can someone help me out?
set ListA to (read file myFile1 using delimiter linefeed) as list
display dialog "" & item 1 of ListA
--> "Name A"
set ListB to (read file myFile2 using delimiter linefeed as Unicode text) as list
display dialog "" & item 1 of ListB
--> "Name A
Name B
Name C
Name D"
There can be many different types of characters that separate lines in text files. It's not always a linefeed. The easiest way to handle them is with the applescript command "paragraphs" rather than using the delimiter when reading the file. Paragraphs is pretty good at figuring out what character is used and handling it. It doesn't always work but it's worth a try before you go any deeper into the problem. As such, try reading your files like this...
set ListB to paragraphs of (read file myFile2 as Unicode text)
If that doesn't work then you'll have to try and figure out what the character is. What I do in these cases is physically open the file and select the return character with my mouse... and copy it. Then I go back to AppleScript Editor and paste it into this command. Paste it where I have the letter "a". It will give you the character id.
id of "a"
Then you can read the file using the delimiter like this, obviously using the id number from the command above in place of 97...
set ListB to read file myFile2 using delimiter (character id 97) as Unicode text
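For comparison, the same two ideas (split on whatever line terminator is present, and inspect the code of the separator when that fails) look like this in Python. This is only an illustrative sketch on made-up data, not a replacement for the AppleScript above; str.splitlines plays roughly the role of "paragraphs" here, and it also splits on a few rarer separators such as form feed.

```python
# Made-up sample mixing CR, LF and CR+LF, stored as UTF-16 with a BOM.
raw = "Name A\rName B\nName C\r\nName D".encode("utf-16")

# Decode first, then split: splitlines handles CR, LF and CR+LF alike.
decoded = raw.decode("utf-16")
names = decoded.splitlines()

# Equivalent of asking for the "id" of the separator characters:
# collect the code points of every CR or LF found in the text.
separator_ids = [ord(ch) for ch in decoded if ch in "\r\n"]

print(names)
print(separator_ids)
```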
Are you sure the file uses LF line endings? This works for me:
set f to POSIX file "/tmp/1"
set b to open for access f with write permission
set eof b to 0
write "あ" & linefeed & "い" to b as Unicode text -- UTF-16
close access b
read f using delimiter linefeed as Unicode text
Did you try saving the file as UTF-8? You can read it by replacing Unicode text with «class utf8».

How to convert UNICODE Hebrew appears as Gibberish in VBScript?

I am gathering information from a HEBREW (WINDOWS-1255 / UTF-8 encoding) website using vbscript and WinHttp.WinHttpRequest.5.1 object.
For Example :
Set objWinHttp = CreateObject("WinHttp.WinHttpRequest.5.1")
...
'writes the file as unicode (can't use Ascii)
Set Fileout = FSO.CreateTextFile("c:\temp\myfile.xml", true, true)
....
Fileout.WriteLine(objWinHttp.responsetext)
When viewing the file in Notepad / Notepad++, I see the Hebrew as gibberish.
For example :
äìëåú - äøá àáøäí éåñó - îåøùú
I need a VBScript function that returns the Hebrew correctly. It should behave like the converter at http://www.pixiesoft.com/flip/: choose the 2nd radio button and press the convert button, and you will see the Hebrew correctly.
Your script is correctly fetching the byte stream and saving it as-is. No problems there.
Your problem is that the local text editor doesn't know that it's supposed to read the file as cp1255, so it tries the default on your machine of cp1252. You can't save the file locally as cp1252, so that Notepad will read it correctly, because cp1252 doesn't include any Hebrew characters.
What is ultimately going to be reading the file or byte stream, that will need to pick up the Hebrew correctly? If it does not support cp1255, you will need to find an encoding that is supported by that tool, and convert the cp1255 string to that encoding. Suggest you might try UTF-8 or UTF-16LE (the encoding Windows misleadingly calls 'Unicode'.)
Converting text between encodings in VBScript/JScript can be done as a side-effect of an ADODB stream. See the example in this answer.
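The mismatch described above can be reproduced outside VBScript. The following Python sketch is illustrative only (the helper name repair_cp1255_mojibake is made up): it encodes a Hebrew word as cp1255, decodes those bytes as cp1252 to produce the kind of gibberish shown in the question, then reverses the mistake and re-encodes the result as UTF-8.

```python
def repair_cp1255_mojibake(garbled: str) -> str:
    """Undo 'cp1255 bytes read as cp1252' (hypothetical helper for illustration)."""
    return garbled.encode("cp1252").decode("cp1255")


hebrew = "הלכות"

# What the server sends: Hebrew text as cp1255 bytes.
cp1255_bytes = hebrew.encode("cp1255")

# What an editor shows if it assumes cp1252 instead.
gibberish = cp1255_bytes.decode("cp1252")

# Reversing the wrong decode recovers the original text.
repaired = repair_cp1255_mojibake(gibberish)

# Once the text is correct, it can be stored in an encoding most tools detect.
utf8_bytes = repaired.encode("utf-8")

print(gibberish, repaired, utf8_bytes)
```

Applied to the sample line from the question, the same round trip turns the accented Latin letters back into Hebrew.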
Thanks to Charming Bobince (who posted the answer), I am now able to see the Hebrew correctly (saving windows-1255 encoded text to a txt file opened in Notepad) by implementing the following:
Function ConvertFromUTF8(sIn)
Dim oIn: Set oIn = CreateObject("ADODB.Stream")
oIn.Open
oIn.CharSet = "X-ANSI"
oIn.WriteText sIn
oIn.Position = 0
oIn.CharSet = "WINDOWS-1255"
ConvertFromUTF8 = oIn.ReadText
oIn.Close
End Function