How to install a platform dictionary in Eclipse - eclipse

For spell checking purpose I would like to install an addictional "platform dictionary" in my Eclipse IDE.
You can see the list of platform dictionaries installed in Window > Preferences > General > Editors > Text Editors > Spelling, in the field "Platform dictionary". In my Helios Service Release 1 there are only english of UK or USA. I would to put the language of my country, so I can write comments in my language and have spell check. Eclipse help doesn't explain how.

If you can't find your language word list you can generate one using aspell.
aspell --lang=pl dump master | aspell --lang=pl expand | tr ' ' '\n' > pl.dict
In Ubuntu aspell generates list in UTF-8 in other systems you can add encoding option.
--encoding=utf-8

I am not sure you can add a "Platform dictionary", so that leaves you with a "user defined" one:
Eclipse supports a standard one-word-per-line format for the 'dictionary' file.
You can have several of those at Kevin's Word List on Sourceforge.net, including links to other sites.

If you can't find a good worldlist and can't run aspell, you can also get wordlists from Debian. On Windows, I used the Swiss German wordlist. Klick all, pick a mirror, download the .deb file, use 7-zip or similar to open it, open the data.tar inside, and find the file you are looking for. In my case it was /usr/share/dict/swiss.

Thank you #VonC, #Konrad Nowicki and #Alex Schröder for the earlier answers. I didn't find the other answers fully satisfying so I wanted to write my own answer. Your question:
How to install a platform dictionary in Eclipse?
For English:
Since you mentioned that US and UK English is included, no need to explain.
For non-English languages, like languages with strange characters like åäöü, example here is Swedish (tested and it works):
Download a text file of the words. My method for downloading a file of words: I googled swedish word list txt (just change swedish to whatever language you're looking for a dictionary and I hope you'll find a txt dictionary) and found this (this link worked 2018-09-25) GitHub repo with an Swedish dictionary: https://github.com/martinlindhe/wordlist_swedish. As long as your dictionary file is formatted correctly in UTF-8 and have EOL (End of lines) characters Unix (LF) it should be fine. However, if it isn't formatted as UTF-8 you'll have to convert the åäöü characters to reflect the UTF-8 standard, example program for this: Notepad++. If the dictionary has another EOL characters: Windows (CR LF) or Macintosh (CR), then just convert it to Unix (LF), example program for this: Notepad++. Then open a text editor, example program for this: Notepad++, to append the new dictionary to your custom dictionary where it is located, often times %userprofile%/eclipse/dictionary.txt or ~/eclipse/dictionary.txt depending on where you installed Eclipse. Restart Eclipse and it should work.

Related

gdb says "No such file or directory" when filename contains non-ascii (unicode) character

In VS Code, I tried to debug "구슬.cpp" with F5(gdb) and it says no such file or directory. But it works when I rename the file with "a.cpp".
Any solutions would be appreciated.
You're pulling the filename from a program which is using a multibyte character set in the current default encoding, which is for English versions of Windows.
ASCII doesn't include any extended characters for such situation. And that's why you can't use Unicode characters as a filename, otherwise it'll obviously throw an error.
Win+R -> intl.cpl
Administrative tab
Click the Change system locale button.
Enable Beta: Use Unicode UTF-8 for worldwide language support
Reboot
It helped me.

I need to remove a specific unicode in my existing subtitle text file

I basically work on subtitles and I have this arabic file and when I open it up on notepad and right click and select SHOW UNICODE CONTROL CHARACTERS I give me some weird characters on the left of every line. I tried so many ways to remove it but failed I also tried NOTEPAD++ but failed.
Notepad ++
SUBTITLE EDIT
EXCEL
WORD
288
00:24:41,960 --> 00:24:43,840
‫أتعلم، قللنا من شأنك فعلاً‬
289
00:24:44,000 --> 00:24:47,120
‫كان علينا تجنيدك لتكون جاسوساً‬
‫مكان (كاي سي)‬
290
00:24:47,280 --> 00:24:51,520
‫لا تعلمون كم أنا سعيد‬
‫لسماع ذلك‬
291
00:24:54,800 --> 00:24:58,160
‫لا تقلق، سيستيقظ نشيطاً غداً‬
292
00:24:58,320 --> 00:25:00,800
‫ولن يتذكر ما حصل‬
‫في الساعات الـ٦‬
the unicodes are not showing in this the unicode is U+202B which shows a ¶ sign, after googling it I think it's called PILCROW.
The issue with this is that it doesn't display subtitles correctly on ps4 app.
I need this PILCROW sign to go away. with this website I can see the issue in this file https://www.soscisurvey.de/tools/view-chars.php
The PILCROW ¶ is used by various software and publishers to show the end of a line in a document. The actual Unicode character does not exist in your file so you can't get rid of it.
The Unicode characters in these lines are 'RIGHT-TO-LEFT EMBEDDING'
(code \u202b) and 'POP DIRECTIONAL FORMATTING' (code \u202c) -
these are used in the text to indicate that the included text should be rendered
right-to-left instead of the ocidental left-to-right direction.
Now, these characters are included as hints to the application displaying the text, rather than to actually perform the text reversing - so they likely can be removed without compromising the text displaying itself.
Now this a programing Q&A site, but you did not indicate any programming language you are familiar with - enough for at least running a program. So it is very hard to know how give an answer that is suitable to you.
Python can be used to create a small program to filter such characters from a file, but I am not willing to write a full fledged GUI program, or an web app that you could run there just as an answer here.
A program that can work from the command line just to filter out a few characters is another thing - as it is just a few lines of code.
You have to store the follwing listing as a file named, say "fixsubtitles.py" there, and, with a terminal ("cmd" if you are on Windows) type python3 fixsubtitles.py \path\to\subtitlefile.txt and press enter.
That, of course, after installing Python3 runtime from http://python.org
(if you are on Mac or Linux that is already pre-installed)
import sys
from pathlib import Path
encoding = "utf-8"
remove_set = str.maketrans("\u202b\u202c")
if len(sys.argv < 2):
print("Usage: python3 fixsubtitles.py [filename]", file=sys.stderr)
exit(1)
path = Path(sys.argv[1])
data = path.read_text(encoding=encoding)
path.write_text(data.translate("", "", remove_set), encoding=encoding)
print("Done")
You may need to adjust the encoding - as Windows not always use utf-8 (the files can be in, for example "cp1256" - if you get an unicode error when running the program try using this in place of "utf-8") , and maybe add more characters to the set of characters to be removed - the tool you linked in the question should show you other such characters if any. Other than that, the program above should work

HtmlHelp hhc file doesn't show russian characters

I use free pascal's chmcmd command to create chm file from hhp. After converting content goes right, but left pane side (tree) doesn't show russian characters. I tried to set charset at hhc file to cp1251. And saved file in windows 1251 encoding. After that it shows tree in russian right in cool reader but not in xChm. In windows it still doesnt work, only weird symbols. Utf-8 doesn't work at all.
The Microsoft CHM help format is very old and not maintained anymore. It wasn't created with Unicode in mind and various tricks need to be done in order to be able to generate CHM files for certain encodings:
You Windows is setup in the target language of the help file
The content HTML pages must be created using the proper charset

Using unicode / utf-8 in programmers editors

There are a lot of programmers editors that claim to support unicode / utf-8. I've tried a number of them (UltraEdit, jedit, emedit) but none of them tell you how to actually enter unicode characters into a file. Some of them tell you how to change the default file encoding to utf-8 or how to select a font that has good support for utf-8, but not how to enter utf-8 into a file using their editor.
The Go language (and some others) support utf-8 and I like the idea of using the actual utf-8 symbols for variables instead of variables with names like omega. I haven't found a programmers editor yet that actually allows you to do this, though.
The only editor / word processor that I've found that lets you how to enter unicode is Microsoft Word. Type the unicode and Alt+X and Word converts it. To get the Greek letter omega type "03c9" followed by Alt+X. UltraEdit will let you copy utf-8 from a web page into it, but their docs don't say how to actually enter utf-8 in a file, and their tech. support people don't know either.
This should be simple, but seems to be completely undocumented. Is there some key combination convention the lets you enter unicode into these editors that supposedly support unicode the way that Ctrl-F is widely used for search?
Thanks.
The standard programmer’s editor vim(1) supports limited Unicode input even if your operating system should be too broken to do so (are there any such, still?).
Just enter ^VuXXXX, where XXXX represents exactly four hex digits.
That will allow you to enter the ~6% of Unicode allocated to the Basic Multilingual Plane. The rest are forbidden to you.
This may be fixed in a newer release.
Otherwise, just use your mouse.
A few techniques I use if an editor is lacking:
Use the Windows charmap.exe utility to select characters and paste into a document.
Install an input method editor (IME) to write in a particular language.
Windows ALT keycodes.
Better to set your keyboard to generate Unicode characters across all Windows applications than to rely on a single application's custom input feature IMO.
Use the EnableHexNumpad feature and you can type any character in the Basic Multilingual Plane using Alt+numbad-plus,hexcode. (May not be of much use on a laptop without a numpad though.)
Or if there are particular characters you want to type a lot, find a keyboard layout that allows you to type them directly. For example eurokb might cover it, or you can make your own with MSKLC.
Old question, but you can type a lot of unicode in GNU Emacs or Vim
GNU Emacs: M-x set-input-method RET tex (or C-x RET C-\ tex) will let you type \omega to generate ω
Vim: Vim digraphs can generate unicode; C-k w * in insert mode gives you ω.
deceze hit the nail on the head. (S)he just didn't elaborate. bobince gave a bit more.
And I'm hazarding a guess that you're a developer or tester working on L14N or I18N. I'm also guessing you need to do more than just a few characters here or there, or you'd be satisfied with pasting from another app. So, I'll share some advice. (note: here, "you" refers to the next person to look here. I'm sure the original poster doesn't care anymore by now. :-))
If you're on Windows 10, install an appropriate keyboard driver that lets you input the characters you want into any application. I'm sure Linux has support for the same sort of thing.
E.g. I'm teaching myself Hindi (हिंदी), so I installed Windows' Hindi (Devanangari) support. I typed "Hindi", in Hindi using that support, then I switched back to US English to do the rest of this post. If all you need are accented characters from Western European languages, you can install the INTL English support and type directly in español or français or whatever.
Don't look at entering Unicode characters as entering some sort of special data amidst your English text. It's just someone else's language. Use their keyboard. Type their language.
I'm writing a flashcard app to help my learning. I'm using the Hindi keyboard support to type characters into Word, WordPad, Excel, and the Visual Studio editor. And that Hindi keyboard support works exactly the same way in all of those apps, as I'd expect it to work in just about any text editor that supports Unicode. And as you saw above, it also works in a simple text edit control in Chrome. No copy and paste. No remembering special codes. It's as ubiquitous as ctrl-F.
It looks like the unicode support in programmers editors (except for some Microsoft products) is mostly read-only. They can open a file with unicode and display the characters, but typing unicode into a file is a different story. If you want to enter unicode in a programmers editor you can copy it from somewhere else (a web page or Microsoft Word or Notepad) and paste it into the editor, but the editors make typing unicode difficult or impossible.
UltraEdit tech support referred me to this web page which explains a lot. Unfortunately none of the solutions worked with UltraEdit.
Microsoft Word and Notepad support unicode entry. Type the unicode value followed by Alt+X and it converts the hexadecimal and displays it. You can then copy and paste it into UltraEdit or one of the other programmers editors. As others have mentioned unicode support depends on support within the operating system as well as the editor.
What got me interested in using unicode in source code files is Mark Summerfield's book Programming in Go. He includes an example .go file that uses unicode. It would be great to use unicode Greek characters for variable names instead of variables named "omega" or "theta".
Using unicode in source code is a bad idea, however. Support for unicode in programmers editors is lousy, and developers would have to save or convert their source code files to utf-8 instead of ASCII. Developer's tools are just not ready to write code in unicode no matter how neat the idea sounds.

How to use unicode characters in Eclipse File Search?

We have some XML file that contains some invalid character, and the program says neither which file it is, nor which line number or character offset. It would be a few seconds work to fix the problem if I could just search for exactly that character, but I cannot find how to express a Unicode character in the file search (or at least I assume so, since the search returns nothing).
Neither 0x1e nor \u001e seem to match anything.
[EDIT] I mean, I can still change the code, and eventually find which file it is by catching the Exception, and using some kind of script/tool to find where exactly the character is, but I do believe it should be possible to search with Unicode in Eclipse, and that is what I am asking in this question.
It may be a problem with the character encoding.
As you're going to need to perform a global / site-wide search to find the , you'll probably need to set the global text file encoding:
Preferences -> Workspace -> Text file encoding
This option may be under the 'General' section in Eclipse, depending on your setup and installed plugins etc.
Ensure that the encoding is set to UTF-8.
You will also need to escape the unicode character sequences, like so:
\u2665
(which I see you have tried)