I need to store certain Unicode characters in a few of my registry keys, but I cannot find the syntax for this in an .iss file. I am using the Unicode version of Inno Setup.
The Inno Setup site says this about Unicode values:
you can for example instead use encoded Unicode characters to build Unicode strings (like S := #$0100 + #$0101 + 'Aa';), or load the string from a file using LoadStringsFromFile, or use a {cm:...} constant.
For example, one of the characters I want to enter is the degree Fahrenheit symbol (℉), which is #$2109.
I can't put #$2109 directly into the value string, because that just outputs the literal text.
I tried to create a #define constant, but it doesn't recognize the # and $ characters.
So I want:
[Registry]
Root: HKLM; Subkey: "MyPath"; ValueType string; ValueName: "MyName; \
ValueData: "Temperature [℉]"
but obviously I cannot put it in directly.
How do I get Unicode characters into the [Registry] section, either directly or via some variable or constant? I'm fairly new to Inno Setup.
Thanks in advance!
Just make sure your .iss file is UTF-8 encoded with BOM.
Then you can use UTF-8 strings directly in it (with Unicode version of Inno Setup), as the documentation says:
Unicode Inno Setup supports UTF-8 encoded .iss files (but not UTF-16).
[Registry]
Root: HKLM; Subkey: "MyPath"; ValueType: string; ValueName: "MyName"; \
ValueData: "Temperature [℉]"
(note that the entry syntax in your question is wrong, you are missing a colon and a quote)
An easy way to save the file in UTF-8 with BOM:
Open the .iss file in Inno Setup Compiler GUI.
Go to File > Save Encoding and select UTF-8.
Save the file.
You need to do this before inserting your UTF-8 string. Also note that the Inno Setup Compiler editor cannot display the ℉, but it will still work correctly when compiled.
Another way is:
Open the .iss file in Windows Notepad.
Go to File > Save As.
Select UTF-8 in Encoding drop down box.
Click Save.
Windows Notepad can display the ℉ (with an appropriate font, like the default Consolas or Lucida Console).
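If you would rather script the conversion than click through an editor, the same re-save can be sketched in Python. This is only an illustration: the helper name save_as_utf8_with_bom is hypothetical, and it assumes the existing file is in the Windows-1252 (ANSI) code page unless you pass a different source encoding.

```python
import codecs


def save_as_utf8_with_bom(path, source_encoding="cp1252"):
    """Re-save a text file as UTF-8 with BOM (hypothetical helper).

    Assumes the file is currently in `source_encoding`; returns False
    if the file already starts with the UTF-8 BOM and is left untouched.
    """
    with open(path, "rb") as f:
        raw = f.read()
    if raw.startswith(codecs.BOM_UTF8):
        return False
    text = raw.decode(source_encoding)
    # "utf-8-sig" writes the EF BB BF byte order mark before the text;
    # newline="" keeps the original line endings unchanged.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        f.write(text)
    return True
```

Calling save_as_utf8_with_bom("setup.iss") (the file name is just an example) once before pasting the ℉ into the script should have the same effect as the Save Encoding step above.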
An easy way is to type what you need in Notepad++:
e.g.: Temperature + (ALT+2109)
Set the encoding to UTF-8 without BOM.
Select the whole line and copy it (Ctrl+C).
Paste the copied line into ValueData: "Temperature [?]"
[Registry]
Root: ... ValueType: string; ValueName: "AString"; ValueData: "Temperature [?]"
That's all.
Another solution is to use a constant and a function:
FHcnst1 = #$2109#$20#$54#$65#$6D#$70; // ℉ Temp
The value is then assembled from two parts, "℉ Temp" (returned by the function) and "erature Const" (literal text), which together give "℉ Temperature Const":
... ValueType: string; ValueName: "AConst"; ValueData: "{code:SetTemperature|1}erature Const"
The function SetTemperature:
[Code]
var
  UserPage: TInputQueryWizardPage;
  UsagePage: TInputOptionWizardPage;
  DataDirPage: TInputDirWizardPage;
const
  FHcnst1 = #$2109#$20#$54#$65#$6D#$70; // ℉ Temp
  FHcnst2 = #$2109#$20;
...
function SetTemperature(Param: String): String;
begin
  if Param = '1' then Result := FHcnst1;
  if Param = '2' then Result := FHcnst2;
end;
The Result :
The hack:
You must write three bytes to the registry; the single character #$2109 on its own will not work.
A good character to append is the space, #$20, because it is invisible:
FHcnst2 = #$2109#$20;
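Writing the #$XXXX codes by hand is error-prone, so here is a small Python sketch that generates them from ordinary text. The function name to_pascal_char_codes is hypothetical, and the sketch deliberately rejects characters outside the Basic Multilingual Plane rather than guess how they should be written.

```python
def to_pascal_char_codes(text):
    """Return `text` as a run of Pascal-style #$XX character codes.

    Hypothetical helper: only handles code points up to U+FFFF.
    """
    codes = []
    for ch in text:
        code = ord(ch)
        if code > 0xFFFF:
            raise ValueError(f"{ch!r} is outside the Basic Multilingual Plane")
        # At least two hex digits, matching the style used above (#$20, #$2109)
        codes.append(f"#${code:02X}")
    return "".join(codes)


# "\u2109" is the degree Fahrenheit sign
fh_const = to_pascal_char_codes("\u2109 Temp")
```

Here fh_const holds the same sequence as the FHcnst1 constant, ready to paste into the [Code] section.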
I'm running this code in Windows PowerShell. It reads a file called languages.txt, and I'm trying to convert between bytes and strings:
Here is languages.txt:
Afrikaans
አማርኛ
Аҧсшәа
العربية
Aragonés
Arpetan
Azərbaycanca
Bamanankan
বাংলা
Bân-lâm-gú
Беларуская
Български
Boarisch
Bosanski
Буряад
Català
Чӑвашла
Čeština
Cymraeg
Dansk
Deutsch
Eesti
Ελληνικά
Español
Esperanto
فارسی
Français
Frysk
Gaelg
Gàidhlig
Galego
한국어
Հայերեն
हिन्दी
Hrvatski
Ido
Interlingua
Italiano
עברית
ಕನ್ನಡ
Kapampangan
ქართული
Қазақша
Kreyòl ayisyen
Latgaļu
Latina
Latviešu
Lëtzebuergesch
Lietuvių
Magyar
Македонски
Malti
मराठी
მარგალური
مازِرونی
Bahasa Melayu
Монгол
Nederlands
नेपाल भाषा
日本語
Norsk bokmål
Nouormand
Occitan
Oʻzbekcha/ўзбекча
ਪੰਜਾਬੀ
پنجابی
پښتو
Plattdüütsch
Polski
Português
Română
Romani
Русский
Seeltersk
Shqip
Simple English
Slovenčina
کوردیی ناوەندی
Српски / srpski
Suomi
Svenska
Tagalog
தமிழ்
ภาษาไทย
Taqbaylit
Татарча/tatarça
తెలుగు
Тоҷикӣ
Türkçe
Українська
اردو
Tiếng Việt
Võro
文言
吴语
ייִדיש
中文
Then, here is the code I used:
import sys
script, input_encoding, error = sys.argv
def main(language_file, encoding, errors):
    line = language_file.readline()
    if line:
        print_line(line, encoding, errors)
        return main(language_file, encoding, errors)
def print_line(line, encoding, errors):
    next_lang = line.strip()
    raw_bytes = next_lang.encode(encoding, errors=errors)
    cooked_string = raw_bytes.decode(encoding, errors=errors)
    print(raw_bytes, "<===>", cooked_string)
languages = open("languages.txt", encoding="utf-8")
main(languages, input_encoding, error)
Here's the output (credit: Learn Python 3 the Hard Way by Zed A. Shaw):
I don't know why it doesn't display the characters and shows question-mark blocks instead. Can anyone help me?
The first string that fails is አማርኛ. Its first character, አ, is U+12A0 in Unicode. In UTF-8, that is b'\xe1\x8a\xa0'. So that part is obviously fine. The file really is UTF-8.
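You can check this yourself with a couple of lines of Python. The variable names are only illustrative:

```python
ch = "አ"

# Code point of the character, formatted the way Unicode charts show it
code_point = f"U+{ord(ch):04X}"

# The bytes this character becomes when encoded as UTF-8
utf8_bytes = ch.encode("utf-8")

# Decoding the bytes gives the original character back
round_trip = utf8_bytes.decode("utf-8")
```

If code_point and utf8_bytes match the values above, the encoding and decoding steps are behaving and the problem lies in how the console draws the text.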
Printing did not raise an exception, so your output encoding can handle all of the characters. Everything is fine.
The only remaining reason I see for it to fail is that the font used in the console does not support all of the characters.
If it is just for play, you should not worry about it. Consider it working correctly.
On the other hand, I would suggest changing some things in your code:
You are running main recursively for each line. There is absolutely no need for that, and it would run into the recursion depth limit on a longer file. Use a for loop instead.
for line in language_file:
    print_line(line, encoding, errors)
You are opening the file as UTF-8, so reading from it automatically decodes UTF-8 into Unicode. You then encode it back into raw_bytes and decode it again into cooked_string, which is the same as line. It would be a better exercise to read the file as raw binary, split it on newlines and then decode. Then you'd have a clearer picture of what is going on.
with open("languages.txt", 'rb') as f:
    raw_file_contents = f.read()
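To carry that suggestion through, here is a minimal sketch of the split-then-decode step. The function name decode_lines is hypothetical, and the sample bytes stand in for whatever f.read() returned so that the snippet runs on its own.

```python
def decode_lines(raw, encoding="utf-8", errors="strict"):
    """Split raw file bytes on newlines and decode each line separately.

    Returns a list of (raw_bytes, decoded_text) pairs, skipping blank lines.
    """
    pairs = []
    for raw_line in raw.splitlines():
        raw_line = raw_line.strip()
        if not raw_line:
            continue
        pairs.append((raw_line, raw_line.decode(encoding, errors=errors)))
    return pairs


# Stand-in for raw_file_contents; in the real script this comes from f.read()
sample = "Afrikaans\nአማርኛ\n".encode("utf-8")
pairs = decode_lines(sample)
```

Each pair holds the bytes exactly as they are stored on disk next to the string Python builds from them, which makes it obvious that decoding happens only once.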
I have a function called GetServerName. I need to pass the file name (say for example 'test.txt') as well as a needed section string (say for example 'server')
The test.txt file contains something like this:
data1 | abcd
data2 | efgh
server| 'serverName1'
data3 | ijkl
I need to extract server name so in my function I will pass something like GetServerName('test.txt', 'server') and it should return serverName1.
My problem is that test.txt used to be an ANSI-encoded file. Now it can be either an ANSI-encoded or a Unicode-encoded file. The function below works correctly for an ANSI-encoded file, but fails if the file is encoded in Unicode. I suspect the LoadStringsFromFile function, because when I debug I can see that it returns Unicode characters instead of human-readable characters. How can I solve this simply? (Or how can I find out the encoding of my file and convert a Unicode string to ANSI for comparison? Then I can do it myself.)
function GetServerName(const FileName, Section: string): string;
// Get Smartlink server name
var
  DirLine: Integer;
  LineCount: Integer;
  SectionLine: Integer;
  Lines: TArrayOfString;
  //Lines: String;
  AHA: TArrayOfString;
begin
  Result := '';
  if LoadStringsFromFile(FileName, Lines) then
  begin
    LineCount := GetArrayLength(Lines);
    for SectionLine := 0 to LineCount - 1 do
    begin
      // Split "key | value" into its two parts
      AHA := StrSplit(Trim(Lines[SectionLine]), '|');
      if AHA[0] = Section then
      begin
        Result := AHA[1];
        Exit;
      end
    end
  end;
end;
First, note that Unicode is not an encoding. Unicode is a character set. The encodings are UTF-8, UTF-16, UTF-32, etc. So we do not know which encoding you actually use.
In the Unicode version of Inno Setup, the LoadStringsFromFile function (plural; do not confuse it with the singular LoadStringFromFile) uses the current Windows ANSI encoding by default.
But, if the file has the UTF-8 BOM, it will treat the contents accordingly. The BOM is a common way to autodetect the UTF-8 (and other UTF-*) encoding. You can create a file in the UTF-8 encoding with BOM using Windows Notepad.
UTF-16 or other encodings are not supported natively.
For implementation of reading UTF-16 file, see Reading UTF-16 file in Inno Setup Pascal script.
For working with files in any encoding, including UTF-8 without BOM, see Inno Setup - Convert array of string to Unicode and back to ANSI or Inno Setup replace a string in a UTF-8 file without BOM.
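To illustrate the BOM-based detection idea outside Inno Setup, here is a rough Python sketch. It is not how Inno Setup itself is implemented: the function names are hypothetical, cp1252 is only an assumed stand-in for "the current Windows ANSI encoding", and UTF-32 is ignored.

```python
import codecs


def detect_bom_encoding(raw, fallback="cp1252"):
    """Pick a codec from the byte order mark, else fall back to ANSI."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # decodes UTF-8 and drops the BOM
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"     # uses the BOM to choose the byte order
    return fallback


def get_server_name(raw, section):
    """Return the value for `section` from 'key | value' lines, or ''."""
    text = raw.decode(detect_bom_encoding(raw))
    for line in text.splitlines():
        key, sep, value = line.partition("|")
        if sep and key.strip() == section:
            # Drop surrounding whitespace and the single quotes
            return value.strip().strip("'")
    return ""
```

With the sample test.txt from the question, get_server_name(raw, "server") should return serverName1 whether the bytes were saved as ANSI, UTF-8 with BOM, or UTF-16 with BOM.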
I have an ascii string, e.g.
"\u005c\u005c192.150.4.89\u005ctpa_test_python\u005c5.1\u005c\videoquality\u005crel_5.1.1Mx86\u005cblacklevelsetting\u005c\u5e8f\u5217\u5e8f\u5217.xml"
And I want to convert it into unicode and dump into a file, so that it gets dumped like:
"\\192.150.4.89\tpa\tpa_test_python\5.1\videoquality\logs\blacklevelsetting\序列序列.xml"
Please share your thoughts.
Thanks,
Abhishek
Use the unicode_escape codec. Python 3 example:
s=rb'\u005c\u005c192.150.4.89\u005ctpa_test_python\u005c5.1\u005cvideoquality\u005crel_5.1.1Mx86\u005cblacklevelsetting\u005c\u5e8f\u5217\u5e8f\u5217.xml'
s=s.decode('unicode_escape')
with open('out.txt','w',encoding='utf8') as f:
    f.write(s)
Output to file:
\\192.150.4.89\tpa_test_python\5.1\videoquality\rel_5.1.1Mx86\blacklevelsetting\序列序列.xml
Note: There was an extra backslash before videoquality that turned the v into a \v character (vertical tab). I removed it from your example string.
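The effect of that stray backslash is easy to see on a shortened version of the string. The variable names here are only illustrative:

```python
# With the extra backslash, "\v" is read as an escape for a vertical tab
with_extra = rb"\u005c\videoquality".decode("unicode_escape")

# Without it, the "v" survives as an ordinary letter
without_extra = rb"\u005cvideoquality".decode("unicode_escape")

# Non-ASCII escapes decode the same way
cjk = rb"\u5e8f\u5217".decode("unicode_escape")

has_vertical_tab = "\x0b" in with_extra
```

Checking for "\x0b" in the decoded text, as has_vertical_tab does, is a quick way to catch this kind of accidental escape before writing the file.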
Where can I find a list of allowed characters in filenames, depending on the operating system?
(e.g. on Linux, the character : is allowed in filenames, but not on Windows)
You should start with the Wikipedia Filename page. It has a decent-sized table (Comparison of filename limitations), listing the reserved characters for quite a lot of file systems.
It also has a plethora of other information about each file system, including reserved file names such as CON under MS-DOS. I mention that only because I was bitten by that once when I shortened an include file from const.h to con.h and spent half an hour figuring out why the compiler hung.
Turns out DOS ignored extensions for devices so that con.h was exactly the same as con, the input console (meaning, of course, the compiler was waiting for me to type in the header file before it would continue).
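A check for this trap can be sketched in a few lines of Python. The function name is hypothetical, and the list covers the commonly documented DOS/Windows device names (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9); treat it as a starting point rather than a complete rule set.

```python
WINDOWS_RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def is_reserved_device_name(filename):
    """True if the name, ignoring case and extension, is a device name."""
    # Everything after the first dot is ignored, as in the con.h story
    stem = filename.split(".", 1)[0]
    return stem.rstrip(" ").upper() in WINDOWS_RESERVED_NAMES
```

For example, is_reserved_device_name("con.h") is true while is_reserved_device_name("const.h") is not.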
OK, so looking at Comparison of file systems, if you only care about the main players' file systems:
Windows (FAT32, NTFS): Any Unicode except NUL, \, /, :, *, ?, ", <, >, |. Also, no space character at the start or end, and no period at the end.
Mac (HFS, HFS+): Any valid Unicode except : or /
Linux (ext[2-4]): Any byte except NUL or /
So, to be safe everywhere: any byte except NUL, \, /, :, *, ?, ", <, >, |. You also can't have files or folders called . or .., and no control characters (of course).
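The rules listed above translate fairly directly into code. This Python sketch only mirrors that list; the function names are hypothetical, and it does not cover length limits, reserved device names, or anything else the list leaves out.

```python
WINDOWS_FORBIDDEN = set('\\/:*?"<>|')


def is_valid_windows_name(name):
    """Apply the Windows (FAT32, NTFS) rules from the list above."""
    if not name or name in (".", ".."):
        return False
    # Reserved punctuation, NUL and other control characters
    if any(ch in WINDOWS_FORBIDDEN or ord(ch) < 32 for ch in name):
        return False
    # No space at the start or end, no period at the end
    if name[0] == " " or name[-1] in " .":
        return False
    return True


def is_valid_mac_name(name):
    """HFS/HFS+: anything except ':' or '/'."""
    return name not in ("", ".", "..") and not any(ch in ":/" for ch in name)


def is_valid_linux_name(name):
    """ext2-4: anything except NUL or '/'."""
    return name not in ("", ".", "..") and not any(ch in "/\0" for ch in name)


def is_portable_name(name):
    """Valid on all three; the Windows rules are the strictest here."""
    return (is_valid_windows_name(name)
            and is_valid_mac_name(name)
            and is_valid_linux_name(name))
```

A name such as a:b passes the Linux check but fails the Windows and Mac ones, which is the difference the question asked about.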
On Windows, create a file and type an invalid character such as \ into its name. You will get a popup listing all the characters that are invalid in a filename.
To be more precise about Mac OS X (now called macOS): a / in the Finder is translated to a : in the Unix file system.
This was done for backward compatibility when Apple moved from Classic Mac OS.
It is legitimate to use a / in a file name in the Finder; if you look at the same file in the terminal, it will show up with a :.
And it works the other way around too: you can't use a / in a file name in the terminal, but a : is OK and will show up as a / in the Finder.
Some applications may be more restrictive and prohibit both characters, either to avoid confusion, because they kept logic from Classic Mac OS, or for name compatibility between platforms.
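The swap described above is symmetric, so a single translation table covers both directions. This Python sketch only models the naming convention; the function name is hypothetical and nothing here touches the file system.

```python
# "/" and ":" trade places between the Finder view and the Unix view
_SWAP = str.maketrans({"/": ":", ":": "/"})


def swap_finder_unix(name):
    """Convert a Finder display name to its Unix form, or back again."""
    return name.translate(_SWAP)


unix_name = swap_finder_unix("Notes 1/2")   # as seen in the terminal
finder_name = swap_finder_unix(unix_name)   # back to the Finder form
```

Because the mapping is its own inverse, applying it twice returns the original name.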
Rather than trying to identify all the characters that are unwanted, you could just look for anything except the acceptable characters. Here's a regex that removes anything outside the POSIX portable filename characters (letters, digits, ., _ and -):
cleaned_name = re.sub(r'[^A-Za-z0-9._-]', '', name)
For "English locale" file names, this works nicely. I'm using this for sanitizing uploaded file names. The file name is not meant to be linked to anything on disk, it's for when the file is being downloaded hence there are no path checks.
$file_name = preg_replace('/([^\x20-~]+)|([\\\\\/:?"<>|]+)/', '_', $client_specified_file_name);
Basically it strips all non-printable and reserved characters for Windows and other OSs. You can easily extend the pattern to support other locales and functionalities.
I took a different approach. Instead of looking if the string contains only valid characters, I look for invalid/illegal characters instead.
NOTE: I needed to validate a path string, not a filename. But if you need to check a filename, simply add / to the set.
def check_path_validity(path: str) -> bool:
    # Check for invalid characters
    for char in set('\\?%*:|"<>'):
        if char in path:
            print(f"Illegal character {char} found in path")
            return False
    return True
Here is code to clean a file name in Python.
import re
import unicodedata
def clean_name(name, replace_space_with=None):
    """
    Remove invalid file name chars from the specified name
    :param name: the file name
    :param replace_space_with: if not none replace space with this string
    :return: a valid name for Win/Mac/Linux
    """
    # ref: https://en.wikipedia.org/wiki/Filename
    # ref: https://stackoverflow.com/questions/4814040/allowed-characters-in-filename
    # No control chars, no: /, \, ?, %, *, :, |, ", <, >
    # remove control chars
    name = ''.join(ch for ch in name if unicodedata.category(ch)[0] != 'C')
    cleaned_name = re.sub(r'[/\\?%*:|"<>]', '', name)
    if replace_space_with is not None:
        return cleaned_name.replace(' ', replace_space_with)
    return cleaned_name