How to display non-ASCII characters in Emacs?

I am looking at text files of Portuguese words in Firefox, and I see the accented characters correctly, but when I download the text files and open them in Emacs, I see n\372mero (which is número), rela\347\343o (which is relação), and so on. What needs to be done to display the words correctly in Emacs?

Try M-x revert-buffer-with-coding-system and choose latin-1 at the prompt. It looks like the file is encoded in Latin-1 (ISO-8859-1) rather than UTF-8.
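The octal escapes Emacs shows are raw bytes: \372 is 0xFA, \347 is 0xE7 and \343 is 0xE3, which are ú, ç and ã in Latin-1. If you would rather convert the downloaded file once than reopen it with a different coding system each time, here is a minimal Python sketch. It assumes the file really is Latin-1, and `latin1_to_utf8` is just an illustrative name.

```python
from pathlib import Path


def latin1_to_utf8(src: str, dst: str) -> None:
    """Re-encode a Latin-1 (ISO-8859-1) text file as UTF-8.

    Assumes the input is Latin-1. Every byte 0-255 is valid Latin-1,
    so a wrong guess gives garbled text rather than an error.
    """
    text = Path(src).read_bytes().decode("latin-1")
    Path(dst).write_text(text, encoding="utf-8")


# The same bytes Emacs displayed as n\372mero and rela\347\343o:
print(b"n\372mero".decode("latin-1"))
print(b"rela\347\343o".decode("latin-1"))
```

After conversion, Emacs (and most other editors) should detect the file as UTF-8 automatically.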

Related

View text or log files containing color escape sequences as colored text in vscode

I am writing some logs and graphs to .log and .txt files in NodeJS. I use chalk to color my logs and help things stand out. I am also using asciichart to generate some lo-fi but very helpful graphs.
As far as I understand, both of these libraries use escape sequences to color text, e.g. \x1b[32m for green, \x1b[31m for red, etc. These escape sequences, when interpreted by a terminal console, are printed as actual color. This is nicely explained in this answer to the question "How to change node.js's console font color?"
I would like to be able to see similarly colored text within an actual text file. Obviously a text file cannot show colors, but I am wondering if a way exists to view a text file such that the escape characters are processed/parsed, and colors are shown, in the same way that happens within a terminal console.
For example, when I write a colored asciichart graph to a file and open it in vscode, all of the escape codes show up as raw text. Obviously, in a text file \x1b[34m╭\x1b[0m\x1b[34m─\x1b[0m just shows up as exactly those characters.
Does anyone know of any vscode extensions, vscode custom language settings, or any other text viewers for that matter, that can display a .txt or .log file with the escape sequences used to color the text, rather than shown as a big mess of raw codes? (Could this feasibly be written as a vscode extension or custom language setting?) The question https://unix.stackexchange.com/questions/262185/display-file-with-ansi-colors has some nice hints, but it only shows how to view the file in the terminal, not in a more UI-friendly file viewer.
Well, I asked this a bit prematurely: it turns out the ANSI Colors vscode extension does exactly this. I'll leave this here for anyone who may have the same need in the future.
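What such a viewer has to do is fairly small: find each ESC [ ... m sequence, update the current color, and style the text in between. Below is a minimal Python sketch of that parsing step. It only handles the eight standard foreground colors (codes 30 to 37) and resets; other codes such as bold or background colors are ignored. The function names are illustrative and are not part of chalk, asciichart or the ANSI Colors extension.

```python
import re

# An ANSI SGR ("Select Graphic Rendition") sequence such as \x1b[32m or \x1b[0m.
SGR = re.compile(r"\x1b\[([0-9;]*)m")

# Standard foreground color codes.
FG = {"30": "black", "31": "red", "32": "green", "33": "yellow",
      "34": "blue", "35": "magenta", "36": "cyan", "37": "white"}


def segments(text):
    """Split text into (color, chunk) pairs; color is None when no color is active."""
    color, pos, out = None, 0, []
    for m in SGR.finditer(text):
        if m.start() > pos:
            out.append((color, text[pos:m.start()]))
        # An empty parameter list (\x1b[m) means the same as 0 (reset).
        for code in (m.group(1) or "0").split(";"):
            if code in ("", "0", "39"):
                color = None          # reset / default foreground
            elif code in FG:
                color = FG[code]
            # any other code (bold, background, ...) is ignored in this sketch
        pos = m.end()
    if pos < len(text):
        out.append((color, text[pos:]))
    return out


def strip_ansi(text):
    """Remove the escape sequences, leaving only the plain text."""
    return SGR.sub("", text)


print(segments("\x1b[34m╭\x1b[0m\x1b[34m─\x1b[0m"))
```

A viewer (or an editor extension) would then map each color name to a style, for example a CSS class or a text decoration, instead of printing the pairs.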

How to convert embedded CRLF codes to their REAL newlines in Vscode?

I searched everywhere for this, but the problem is that the search terms are very similar to those of other questions.
The issue I have is that a file (a script, actually) is embedded in another file. So when I open the parent file, I see the script as a massive string with many \n and \r\n codes. I need a way to convert these codes to real newlines so that the code is formatted correctly and I can read it and work on it.
Quick snippet:
\n\n\n\n\nlocal scriptingFunctions\n\n\n\n\nlocal measuringCircles = {}\r\nlocal isCurrentlyCheckingCoherency
Should convert to (blank lines omitted):
local scriptingFunctions
local measuringCircles = {}
local isCurrentlyCheckingCoherency
Perform a regex Find/Replace (with Use Regular Expression enabled):
Find: (\\r)?\\n
Replace: \n
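If you'd rather do the same conversion outside the editor, for example on many files, the same pattern works with Python's `re` module. A minimal sketch; the function names are illustrative:

```python
import re

# A literal backslash-r (optional) followed by a literal backslash-n,
# i.e. the same pattern as in the Find box: (\\r)?\\n
ESCAPED_NEWLINE = re.compile(r"(\\r)?\\n")


def unescape_newlines(s: str) -> str:
    """Turn literal \\n and \\r\\n escape text into real newlines."""
    return ESCAPED_NEWLINE.sub("\n", s)


def escape_newlines(s: str) -> str:
    """Reverse direction: real newlines back into literal \\n text."""
    return s.replace("\n", "\\n")


snippet = r"\n\n\n\n\nlocal scriptingFunctions\n\n\n\n\nlocal measuringCircles = {}\r\nlocal isCurrentlyCheckingCoherency"
code = unescape_newlines(snippet)
print("\n".join(line for line in code.splitlines() if line))
```

Two caveats: `escape_newlines` writes every line break back as \n, so any former \r\n pairs are not restored, and if the embedded script itself contains literal \n escapes (for example inside its own string literals), the unescape step converts those too.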
If you don't need to convert the newlines back to \n after you're done working on the code, you can simply press ctrl-f and replace every occurrence of \n with a newline (you can type a newline in the replace box by pressing ctrl-enter or shift-enter).
If you do need to convert back to \n once you're done working on the code, add an invisible character to the replace string as well (type ctrl-enter followed by the invisible character). When you're done, replace each newline followed by that invisible character back with \n.
There are plenty of invisible characters, but I'd personally suggest U+200B (zero-width space); another good one is U+2800 (⠀, Braille pattern blank), as it renders like a normal space and is therefore noticeable.
Note that recent versions of vscode highlight invisible characters, but you can easily disable this by clicking Adjust settings and then selecting Exclude from being highlighted.
If you need to re-enable highlighting in the future, look for "editor.unicodeHighlight.allowedCharacters" in the settings.

Entering accented characters with notepad++ using only the keyboard

I am new to notepad++ and like it very much, since I can customize how my text documents look more easily than with wordpad. However, I would like to know if it’s possible to enter accented characters like in wordpad (I thought it was a windows thing, but perhaps it isn’t). In wordpad, I can type, for instance, ctrl-’ then i to get an accented í character. Similarly, I can type ctrl-shift-~ then n to get the accented ñ character. It makes it much easier to enter accented characters than copying and pasting from the character map application, or trying to remember code points. When I tried this method in notepad++ I just got the plain character without the accents. I should also mention that when I open documents with such accented characters already present they appear just as expected. Is there a way to enter accented characters like this in notepad++ using only the keyboard? I am using the latest notepad++ under Windows 7.
In Notepad++ you can go to “Edit” and select “Character Panel” near the bottom of the drop-down menu. It shows the characters with codes 0 to 255, which include most accented characters. Find the character you want and note its number. To type it, press and hold the ALT key and, on the numeric keypad on the right side of your keyboard, type zero followed by that number. For example, the code for “ñ” is 241, so hold down ALT, type 0241 on the keypad, and you will get the character you need. That works in most Windows programs, even in this answer box.
This only works for characters in the range 0 to 255 (strictly speaking, ASCII only covers 0 to 127; codes 128 to 255 come from the Windows ANSI code page). For other Unicode characters, I don't know of a method other than copying and pasting from the Windows “Character Map” app. I did test WordPad with the decimal equivalent of the hex value shown for a Unicode character above 255, and ALT+#### works there (and probably in other places), but sadly it doesn't work in Notepad or Notepad++ for some reason. Two I use a lot and have memorized are ALT+0147 and ALT+0148 for the quotation marks “like these”, so once you use the numbers enough you get used to them, or you can jot down the ones you use most.
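The ALT+0nnn numbers are byte values in the system's ANSI code page, which is why they stop at 255. A minimal Python sketch of the mapping, assuming the Western European code page Windows-1252 (other Windows locales use a different code page, so the same number can give a different character there); `alt_code_char` is a hypothetical helper name:

```python
import unicodedata


def alt_code_char(code: int, codepage: str = "cp1252") -> str:
    """Character produced by ALT+0<code> on the numeric keypad.

    Assumes the ANSI code page is Windows-1252. A few values
    (129, 141, 143, 144, 157) are unassigned in cp1252 and
    raise UnicodeDecodeError here.
    """
    return bytes([code]).decode(codepage)


for code in (241, 147, 148):
    ch = alt_code_char(code)
    print(code, ch, unicodedata.name(ch))

# For characters above 255, the decimal number is simply the code point:
print(ord("ā"))  # 257 (U+0101)
```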
For anyone searching for a solution and coming across this page, try this (Windows): install and use the US International keyboard instead of the plain US keyboard. Search for "windows keyboard us international install" or something similar. I liked the techlanguage.com write-up on it and the teckangaroo.com step by step on how to install. Hope this helps someone in future looking around as I was earlier today for how to easily meet this need.
You can make your own keyboard layout to enter arbitrary characters anywhere in Windows, using MSKLC (the Microsoft Keyboard Layout Creator).
I think this is controlled by the input method. With an input method that contains the characters you mentioned, you can press key combinations to get the accented letters.
You can add a keyboard layout preset in Windows. Under "Language and Regions" - "Language" - "Language settings" - "Input method" settings in Control Panel, you can add whichever layouts you want.
Switch keyboard layout with Alt + Shift.

Is there a way to make emacs ispell/aspell ignore soft hyphens in HTML?

I write most of my documentation in HTML, using Emacs as my main editor. Emacs lets you interactively spell-check the current buffer with the command ispell-buffer. (I think the underlying program that does the spell-checking is aspell.)
When Emacs is in HTML mode, all HTML markup is skipped and only the remaining text is spell-checked.
However, soft-hyphen entities (&shy; or &#173;) are not stripped, so a word written as speci&shy;fies in the HTML source is spell-checked as two separate words (speci and fies), which is not what is wanted.
Is there a way to make emacs ispell/aspell ignore soft hyphens in HTML?
Or can anyone suggest an elisp function that will strip soft hyphens out of the HTML text before it is handed over to aspell for spell-checking?
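This does not hook into ispell-buffer; it is only a minimal Python sketch of the stripping step, for example to produce a hyphen-free copy of a page that you then spell-check. It assumes soft hyphens appear as the named entity, a decimal or hex numeric reference, or the raw U+00AD character. Inside Emacs, the same regular expression idea could be applied with `replace-regexp-in-string`, but that wiring is not shown here.

```python
import re

# Soft hyphen as &shy;, &#173;, &#xAD; (any case, optional leading zeros),
# or the raw character U+00AD.
SOFT_HYPHEN = re.compile(r"&shy;|&#0*173;|&#[xX]0*[aA][dD];|\u00ad")


def strip_soft_hyphens(html: str) -> str:
    """Remove soft hyphens so that words like speci&shy;fies are checked whole."""
    return SOFT_HYPHEN.sub("", html)


print(strip_soft_hyphens("<p>This speci&shy;fies the format.</p>"))
```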

How can I clean source code files of invisible characters?

I have a bizarre problem: Somewhere in my HTML/PHP code there's a hidden, invisible character that I can't seem to get rid of. By copying it from Firebug and converting it, I identified it as &#65279;, or 'Zero width no-break space' (U+FEFF). It shows up as a non-empty text node in my website and is causing a serious layout problem.
The problem is, I can't get rid of it. I can't see it in my files even when turning Invisibles on (duh). I can't seem to find it, no search tool seems to pick up on it. I rewrote my code around where it could be, but it seems to be somewhere deeper in one of the framework files.
How can I find characters by charcode across files or something like that? I'm open to different tools, but they have to work on Mac OS X.
You can't see the character in your editor because editors that understand it deliberately hide it. U+FEFF is the so-called byte-order mark (BOM). At the start of a UTF-16 file it appears as the bytes FE FF or FF FE and tells a program in which order the bytes of multi-byte characters are stored. In a UTF-8 file it is stored as EF BB BF and carries no byte-order information, but some tools, Microsoft's in particular, write it anyway.
To get rid of it, tell your editor to save the file either as ANSI/ISO-8859 or as Unicode (UTF-8) without BOM. If your editor can't do so, you'll either have to switch editors (sadly) or use a tool that shows you the raw bytes of the file, such as a hex editor.
From a quick search, TextWrangler appears to have a "UTF-8, no BOM" mode. Otherwise, if you're comfortable with the terminal, you can use Vim:
:set nobomb
and save the file. Presto!
The BOM always comes at the very start of a text file, and, as I mentioned, editors that support BOMs will not show it to you at all.
If you are using TextMate and the problem is in a UTF-8 file:
Open the file
File > Re-open with encoding > ISO-8859-1 (Latin1)
You should be able to see and remove the first character in file
File > Save
File > Re-open with encoding > UTF8
File > Save
It works for me every time.
It's a byte-order mark. On Mac OS X, open a Terminal window, cd into your source directory and type:
grep -rn $'\xEF\xBB\xBF' *
It will show you the file names and line numbers of lines containing a UTF-8 BOM (the bytes EF BB BF).
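A UTF-8 BOM is stored as the three bytes EF BB BF, so a short script can search for that byte sequence directly across a source tree, which also answers the "find characters by charcode across files" part of the question. A minimal Python sketch, assuming the sources are UTF-8. It reads every file matching the glob pattern (including things like .git with the default `*`), so you may want to narrow the pattern. The function names are illustrative.

```python
import codecs
from pathlib import Path

BOM = codecs.BOM_UTF8  # b"\xef\xbb\xbf", i.e. U+FEFF encoded as UTF-8


def find_boms(root: str, pattern: str = "*"):
    """Yield (path, line_number) for every UTF-8 BOM found under root."""
    for path in Path(root).rglob(pattern):
        if not path.is_file():
            continue
        data = path.read_bytes()
        start = data.find(BOM)
        while start != -1:
            # Line number = newlines before the match, plus one.
            yield path, data.count(b"\n", 0, start) + 1
            start = data.find(BOM, start + 1)


def strip_leading_bom(path: Path) -> bool:
    """Remove a BOM at the very start of the file; return True if one was removed."""
    data = path.read_bytes()
    if data.startswith(BOM):
        path.write_bytes(data[len(BOM):])
        return True
    return False
```

For example, `for path, line in find_boms(".", "*.php"): print(f"{path}:{line}")` lists the offending files, and `strip_leading_bom` then removes a BOM at the start of a file. Make a backup first, since it rewrites the file in place. A U+FEFF in the middle of a file is reported but not removed, because there it is a zero-width no-break space rather than a BOM.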
In Notepad++, there is an option to show all characters. From the top menu:
View -> Show Symbol -> Show All Characters
I'm not a Mac user, but my general advice would be: when all else fails, use a hex editor. Very useful in such cases.
See "Comparison of hex editors" on Wikipedia.
I know it is a little late to answer this question, but I am adding how to change the encoding in Visual Studio, in the hope that it will be helpful for someone reading this later:
Go to File -> Save (your filename) as...
In the Save As dialog, click the small arrow next to the Save button, then click Save with Encoding...
Click Yes in the "Do you want to replace the existing file?" dialog
And finally select e.g. Unicode (UTF-8 without signature) - that removes BOM