Inno Setup: recording/recovering a file path in UTF-8 (Unicode version)

We are using Inno Setup (Unicode version) to create a resource package (or "samples") for our product. The program part of our product learns the location of the samples from a file written by the samples installer. Currently, this is implemented in a plain way:
procedure CurStepChanged(CurStep: TSetupStep);
begin
  if (CurStep = ssPostInstall) then
  begin
    ForceDirectories(ExpandConstant('{userappdata}\MyCompany\MyApp'));
    SaveStringToFile(ExpandConstant('{userappdata}\MyCompany\MyApp\SamplePath.txt'), ExpandConstant('{app}'), False);
  end;
end;
This plain approach has a fatal issue: when the installer runs on Chinese-language Windows, everything is written in the GBK encoding, but our product is built on UTF-8.
After some searching, I found a possible solution that calls the Windows WideCharToMultiByte function from Pascal code. However, this does not work here, as it requires UTF-16 input, and what I have is GBK.
In addition, Inno Setup also cannot handle an existing UTF-8 file name in my SamplePath.txt. If I manually edit SamplePath.txt to contain UTF-8-encoded Chinese characters and initialize the default directory with the following code, the directory selection page displays garbled characters:
[Setup]
DefaultDirName={code:GetPreviousSampleDir}

[Code]
function GetPreviousSampleDir(Param: String): String;
var
  tmp: AnsiString;
begin
  if FileExists(ExpandConstant('{userappdata}\MyCompany\MyApp\SamplePath.txt')) then
  begin
    LoadStringFromFile(ExpandConstant('{userappdata}\MyCompany\MyApp\SamplePath.txt'), tmp);
    Result := tmp;
  end
  else
  begin
    Result := 'D:\MyApp_samples';
  end;
end;
So, is there any way to load/store a file name with i18n characters in UTF-8?

To load a string from a UTF-8 file, use LoadStringFromFileInCP from
Inno Setup - Convert array of string to Unicode and back to ANSI
const
  CP_UTF8 = 65001;

{ ... }

var
  FileName: string;
  S: string;
begin
  FileName := 'test.txt';
  if not LoadStringFromFileInCP(FileName, S, CP_UTF8) then
  begin
    Log('Error reading the file');
  end
  else
  begin
    Log('Read: ' + S);
  end;
end;
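The helper itself is defined in the linked question rather than here; a minimal sketch of what LoadStringFromFileInCP looks like, assuming the standard kernel32 MultiByteToWideChar import (declaration modeled on the linked answer):
function MultiByteToWideChar(
  CodePage: Cardinal; dwFlags: Cardinal; lpMultiByteStr: AnsiString;
  cbMultiByte: Integer; lpWideCharStr: string; cchWideChar: Integer): Integer;
  external 'MultiByteToWideChar@kernel32.dll stdcall';

function LoadStringFromFileInCP(FileName: string; var S: string; CP: Integer): Boolean;
var
  Ansi: AnsiString;
  Len: Integer;
begin
  { Read the raw bytes, then convert them from the given code page }
  { to the UTF-16 strings used by Unicode Inno Setup. }
  Result := LoadStringFromFile(FileName, Ansi);
  if Result and (Length(Ansi) > 0) then
  begin
    { First call queries the required length; second call converts. }
    Len := MultiByteToWideChar(CP, 0, Ansi, Length(Ansi), S, 0);
    SetLength(S, Len);
    MultiByteToWideChar(CP, 0, Ansi, Length(Ansi), S, Len);
  end;
end;
The same helper also covers the GBK side of the question: calling it with code page 936 (GBK) instead of CP_UTF8 converts GBK-encoded text to the native string type.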
To save a UTF-8 file without BOM:
either use SaveStringsToFileInCP from the same question,
or see Create a UTF8 file without BOM with Inno Setup (Unicode version)
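For a single string like the path here, a write-side counterpart can be sketched the same way (the SaveStringToFileInCP name is mine, modeled on the helpers above; SaveStringToFile writes the AnsiString bytes verbatim, so no BOM is added):
function WideCharToMultiByte(
  CodePage: Cardinal; dwFlags: Cardinal; lpWideCharStr: string;
  cchWideChar: Integer; lpMultiByteStr: AnsiString; cbMultiByte: Integer;
  lpDefaultChar: Cardinal; lpUsedDefaultChar: Cardinal): Integer;
  external 'WideCharToMultiByte@kernel32.dll stdcall';

function SaveStringToFileInCP(FileName: string; S: string; CP: Integer): Boolean;
var
  Ansi: AnsiString;
  Len: Integer;
begin
  { Convert UTF-16 to the target code page, then write the raw bytes. }
  { The two trailing zeros pass NULL for lpDefaultChar/lpUsedDefaultChar. }
  Len := WideCharToMultiByte(CP, 0, S, Length(S), Ansi, 0, 0, 0);
  SetLength(Ansi, Len);
  WideCharToMultiByte(CP, 0, S, Length(S), Ansi, Len, 0, 0);
  Result := SaveStringToFile(FileName, Ansi, False);
end;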

Related

How to get the length of a formatted string read in using fscanf in SystemVerilog?

I am reading a text file that contains string test cases, and I decode them into Verilog test constructs to simulate. The code that I use to read the file is as follows:
integer pntr, file;
string a, b, c, d;

initial
begin
  pntr = $fopen(FOO, "r");
end

always
begin
  if (!$feof(pntr))
  begin
    file = $fscanf(pntr, "%s %s %s %s \n", a, b, c, d);
  end
  else
    $fclose(pntr);
end
I have tried using
integer k;
k = strlen($fscanf(pntr, "%s %s %s %s \n", a,b,c,d));
$display(k);
and the display statement outputs an "x"
I also tried using
$display(file)
but this also gives x as the display output. The above code is just a representation of my problem; I am using a larger formatted string to read in larger data. Each line of my test case may have a different size, so I have initialized the format to the maximum number of string literals that a test case can have. Is there a way to get the length of each line that I read, or the number of string literals that $fscanf read?
Note: I am using Cadence tools for this task.
Input file looks like
read reg_loc1
write regloc2 2.5V regloc3 20mA
read regloc3 regloc5 regloc7
It's hard to debug your code when it has lots of typos and is incomplete. You also have a race condition: pntr may not yet have been assigned by $fopen if the always block executes before the initial block.
But in any case, the problem with using $fscanf and the %s format is that a newline gets treated as whitespace. It's better to use $fgets to read a line at a time, and then use $sscanf to parse the line:
module top;
  integer pntr, file;
  string a, b, c, d, line;
  initial begin
    pntr = $fopen("FOO", "r");
    while (!$feof(pntr))
      if ((file = $fgets(line, pntr)) != 0) begin
        $write("%d line: ", file, line);
        file = $sscanf(line, "%s %s %s %s \n", a, b, c, d);
        $display(file,, a, b,, c,, d);
      end
    $fclose(pntr);
  end
endmodule

Convert hexadecimal to Base64 in AutoHotkey

I just found this code written in Python to convert hexadecimal to Base64:
import codecs
hex = "10000000000002ae"
b64 = codecs.encode(codecs.decode(hex, 'hex'), 'base64').decode()
So, is it possible to do the same thing in AutoHotkey?
There are many implementations, some very fast and some simpler. You can take a look at libcrypt.ahk for many encoding and encryption algorithms, including hexadecimal and Base64.
If you do use it, you'll want to look at LC_Hex2Bin() and LC_Base64_Encode(). Examples are available too. You will likely want this libcrypt.ahk file in particular.
Example
#Include libcrypt.ahk
Hex := "48656c6c6f20576f726c642100"
len := LC_Hex2Bin(Bin, Hex)
LC_Base64_Encode(base64, Bin, len)
MsgBox % base64
Or as a single function:
#Include libcrypt.ahk
MsgBox % hexstring2base64("48656c6c6f20576f726c642100")
hexstring2base64(hex_string) {
    len := LC_Hex2Bin(Bin, hex_string)
    LC_Base64_Encode(base64, Bin, len)
    return base64
}

Unicode encoding / decoding error in Free Pascal 3.2.0

This test passed with Free Pascal 3.0.4 (the source file encoding is UTF-8; the OS is Windows 10 64-bit):
{$MODE DELPHI}
...
var
  Raw: RawByteString;
  Actual: string;
begin
  Raw := UTF8Encode('关于汉语');
  Actual := string(UTF8Decode(Raw));
  CheckEquals('关于汉语', Actual);
end;
With Free Pascal 3.2.0 it fails:
expected: <关于汉语> but was: <å³äºæ±è¯­>
RawByteString is declared as type AnsiString(CP_NONE) in the System unit.
The conversion works if I cast (or declare) the characters as UnicodeString. The following test succeeds with Free Pascal 3.2.0:
procedure TFreePascalTests.TestUTF8Encode;
const
  THE_CHARACTERS: UnicodeString = '关于汉语';
var
  Raw: UTF8String;
  Actual: UnicodeString;
begin
  Raw := UTF8Encode(THE_CHARACTERS);
  Actual := UTF8Decode(Raw);
  CheckEquals(THE_CHARACTERS, Actual);
end;
The Raw variable may be declared as either RawByteString or UTF8String.
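For reference, a sketch of the same passing test with Raw declared as RawByteString instead (per the note above; the test name is hypothetical and only the declaration changes):
procedure TFreePascalTests.TestUTF8EncodeRaw;
const
  THE_CHARACTERS: UnicodeString = '关于汉语';
var
  Raw: RawByteString; { UTF8String works equally well here }
  Actual: UnicodeString;
begin
  Raw := UTF8Encode(THE_CHARACTERS);
  Actual := UTF8Decode(Raw);
  CheckEquals(THE_CHARACTERS, Actual);
end;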

PowerBuilder 12: how to determine the encoding of an input file

I'm new to PowerBuilder 12 and would like to know whether there is any way to determine the encoding (e.g. Unicode, Big5) of an input file. Any comments and code samples are appreciated! Thanks!
From the PB 12.5 help file:
FileEncoding ( filename )
filename: the name of the file you want to test for encoding type.
Return values:
EncodingANSI!
EncodingUTF8!
EncodingUTF16LE!
EncodingUTF16BE!
If filename does not exist, the function returns null.
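A minimal usage sketch (the file name is made up):
encoding le_enc

le_enc = FileEncoding("C:\temp\input.txt")
IF IsNull(le_enc) THEN
	MessageBox("Encoding", "File not found")
ELSEIF le_enc = EncodingUTF8! THEN
	MessageBox("Encoding", "UTF-8")
ELSE
	MessageBox("Encoding", "ANSI or UTF-16")
END IF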
Finding Unicode is pretty easy if you assume the Unicode file has a BOM prefix (although in reality not all Unicode files have one). Some code to do this is below. However, I have no idea about Big5; it looks to me (at first glance at the spec; I've never had occasion to use it) like it doesn't have a similar prefix.
Good luck,
Terry
function of_filetype (string as_filename) returns encoding

integer li_NullCount, li_NonNullCount, li_OffsetTest
long ll_File
encoding le_Return
blob lblb_UTF16BE, lblb_UTF16LE, lblb_UTF8, lblb_Test, lblb_BOMTest, lblb_Null

lblb_UTF16BE = Blob ("~hFE~hFF", EncodingANSI!)
lblb_UTF16LE = Blob ("~hFF~hFE", EncodingANSI!)
lblb_UTF8 = Blob ("~hEF~hBB~hBF", EncodingANSI!)
lblb_Null = BlobMid (Blob ("~h01", EncodingUTF16LE!), 2, 1)
SetNull (le_Return)

// Get a set of bytes to test
ll_File = FileOpen (as_FileName, StreamMode!, Read!, Shared!)
FileRead (ll_File, lblb_Test)
FileClose (ll_File)

// Test for BOMs: UTF-16BE (FE FF), UTF-16LE (FF FE), UTF-8 (EF BB BF)
lblb_BOMTest = BlobMid (lblb_Test, 1, Len (lblb_UTF16BE))
IF lblb_BOMTest = lblb_UTF16BE THEN RETURN EncodingUTF16BE!
lblb_BOMTest = BlobMid (lblb_Test, 1, Len (lblb_UTF16LE))
IF lblb_BOMTest = lblb_UTF16LE THEN RETURN EncodingUTF16LE!
lblb_BOMTest = BlobMid (lblb_Test, 1, Len (lblb_UTF8))
IF lblb_BOMTest = lblb_UTF8 THEN RETURN EncodingUTF8!

// I've removed a hack from here that I wouldn't encourage others to use, basically checking for
// 0x00 in places I'd "expect" them to be if it was a Unicode file, but that makes assumptions

RETURN le_Return

How to convert Unicode characters to escape codes

So, I have a bunch of strings like this: {\b\cf12 よろてそ }. I'm thinking I could iterate over each character and replace any non-ASCII character (Edit: anything where AscW(char) > 127 or < 0) with a Unicode escape code (\u###). However, I'm not sure how to do so programmatically. Any suggestions?
Clarification:
I have a string like {\b\cf12 よろてそ } and I want a string like {\b\cf12 [STUFF]}, where [STUFF] will display as よろてそ when I view the RTF text.
You can simply use the AscW() function to get the correct value:
sRTF = "\u" & CStr(AscW(char))
Note that unlike other Unicode escapes, RTF uses the decimal signed short int (2-byte) representation for a Unicode character, which makes the conversion in VB6 really quite easy.
Edit
As MarkJ points out in a comment, you would only do this for characters outside the 0-127 range, but some characters inside the 0-127 range (RTF's reserved \, { and }) would also need special handling, as sketched below.
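Putting that together, a sketch of a per-character conversion loop (the function name is mine; note the trailing "?" after each \u escape, which RTF readers use as the fallback character for non-Unicode-aware viewers):
Function ToRtfEscapes(ByVal s As String) As String
    Dim i As Long
    Dim code As Integer
    Dim out As String
    For i = 1 To Len(s)
        code = AscW(Mid$(s, i, 1)) ' signed 16-bit value, as RTF expects
        If code >= 0 And code <= 127 Then
            ' NB: \, { and } would still need their own escaping here
            out = out & Mid$(s, i, 1)
        Else
            out = out & "\u" & CStr(code) & "?"
        End If
    Next i
    ToRtfEscapes = out
End Function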
Another, more roundabout way would be to add MSScript.OCX to the project and call VBScript's Escape function. For example:
Sub main()
    Dim s As String
    s = ChrW$(&H3088) & ChrW$(&H308D) & ChrW$(&H3066) & ChrW$(&H305D)
    Debug.Print MyEscape(s)
End Sub

Function MyEscape(s As String) As String
    Dim scr As Object
    Set scr = CreateObject("MSScriptControl.ScriptControl")
    scr.Language = "VBScript"
    scr.Reset
    MyEscape = scr.Eval("escape(" & dq(s) & ")")
End Function

Function dq(s)
    dq = Chr$(34) & s & Chr$(34)
End Function
The Main routine passes in the original Japanese characters and the debug output says:
%u3088%u308D%u3066%u305D
HTH