How can I clean source code files of invisible characters? - unicode

I have a bizarre problem: Somewhere in my HTML/PHP code there's a hidden, invisible character that I can't seem to get rid of. By copying it from Firebug and converting it I identified it as U+FEFF, the 'Zero width no-break space'. It shows up as a non-empty text node in my website and is causing a serious layout problem.
The problem is, I can't get rid of it. I can't see it in my files even when turning Invisibles on (duh). I can't seem to find it, no search tool seems to pick up on it. I rewrote my code around where it could be, but it seems to be somewhere deeper in one of the framework files.
How can I find characters by charcode across files or something like that? I'm open to different tools, but they have to work on Mac OS X.

You can't find the character in your editor because editors that understand it don't display it. #FEFF is the so-called byte-order mark (BOM); #FFFE is what it looks like when it is read with the wrong byte order. The BOM is defined by the Unicode standard to tell, in a Unicode file, in which order multi-byte characters are stored, and Windows tools in particular tend to write it at the start of files.
To get rid of it, tell your editor to save the file either as ANSI/ISO-8859 or as Unicode without BOM. If your editor can't do so, you'll either have to switch editors (sadly) or use some kind of truncation tool, e.g. a hex editor that allows you to see how the file really looks.
From a quick search, it seems that TextWrangler has a "UTF-8, no BOM" mode. Otherwise, if you're comfortable with the terminal, you can use Vim:
:set nobomb
and save the file. Presto!
The characters are always the very first in a text file. Editors with support for the BOM will not, as I mentioned, show it to you at all.

If you are using Textmate and the problem is in a UTF-8 file:
Open the file
File > Re-open with encoding > ISO-8859-1 (Latin1)
You should be able to see and remove the first character in file
File > Save
File > Re-open with encoding > UTF8
File > Save
It works for me every time.

It's a byte-order mark. Under Mac OS X: open a terminal window, go to your sources and type:
grep -rn $'\xEF\xBB\xBF' .
It will show you the filenames and line numbers containing a UTF-8 BOM (stored in the file as the three bytes EF BB BF).
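If you would rather script the search than rely on grep, here is a minimal Python sketch of the same idea. It assumes the files are UTF-8 and that the BOM sits at the very start of the file; the function names `files_with_bom` and `strip_bom` are made up for this example.

```python
import codecs
from pathlib import Path


def files_with_bom(root):
    """Return the files under root whose first three bytes are the UTF-8 BOM."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            with path.open("rb") as handle:
                # codecs.BOM_UTF8 is the byte string EF BB BF
                if handle.read(3) == codecs.BOM_UTF8:
                    hits.append(path)
    return hits


def strip_bom(path):
    """Rewrite the file without its leading UTF-8 BOM. Return True if it changed."""
    path = Path(path)
    data = path.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        path.write_bytes(data[len(codecs.BOM_UTF8):])
        return True
    return False
```

Calling `files_with_bom(".")` from the project root lists the affected files, and `strip_bom(path)` can then be applied to each one. Work on a copy or under version control, since the files are rewritten in place.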

In Notepad++, there is an option to show all characters. From the top menu:
View -> Show Symbol -> Show All Characters

I'm not a Mac user, but my general advice would be: when all else fails, use a hex editor. Very useful in such cases.
See "Comparison of hex editors" on Wikipedia.

I know it is a little late to answer this question, but here is how to change the encoding in Visual Studio. I hope it will be helpful for someone reading this later:
Go to File -> Save (your filename) As...
In the file dialog, click the small arrow next to the Save button -> click Save with Encoding...
Click Yes (on the "Do you want to replace existing file" dialog)
And finally select e.g. Unicode (UTF-8 without signature), which removes the BOM

Related

How to convert embedded CRLF codes to their REAL newlines in Vscode?

I searched everywhere for this, the problem is that the search criteria is very similar to other questions.
The issue I have is that a file (a script, actually) is embedded in another file. So when I open the parent file I see the script as one massive string with several \n and \r\n codes. I need a way to convert these codes to what they should be so that the code is formatted correctly and I can read it and work on it.
Quick snippet:
\n\n\n\n\nlocal scriptingFunctions\n\n\n\n\nlocal measuringCircles = {}\r\nlocal isCurrentlyCheckingCoherency
Should convert to:
local scriptingFunctions
local measuringCircles = {}
local isCurrentlyCheckingCoherency
Perform a regex find-and-replace:
Find: (\\r)?\\n
Replace: \n
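Outside the editor, the same pattern can be applied with a short script. This is only a sketch, assuming the embedded script has already been extracted into a string; `unescape_newlines` is a name invented for this example.

```python
import re


def unescape_newlines(text):
    """Turn literal backslash-n and backslash-r-backslash-n sequences into real newlines."""
    # Same pattern as the Find box above: an optional literal "\r" then a literal "\n"
    return re.sub(r"(\\r)?\\n", "\n", text)


# Raw string, so the backslashes stay literal, as they appear in the parent file
sample = r"local measuringCircles = {}\r\nlocal isCurrentlyCheckingCoherency"
print(unescape_newlines(sample))
```

Note that every `\n` code becomes one newline, so runs such as `\n\n\n` still produce blank lines. Collapse those separately if you don't want them.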
If you don't need to reconvert from newlines to \n after you're done working on the code, you can accomplish the trick by simply pressing ctrl-f and substituting every occurrence of \n with a new line (you can type enter in the replace box by pressing ctrl-enter or shift-enter).
If after you're done working on the code you need to reconvert to \n, you can add an invisible char to the replace string (typing it like ctrl-enter invisibleChar), and after you're done you can re-replace it with \n.
There's plenty of invisible chars, but I'd personally suggest [U+200b] (you can copy it from here); another good one is [U+2800] (⠀), as it renders as a normal whitespace, and thus is noticeable.
A thing to notice is that recent versions of vscode will show a highlight around invisible chars, but you can easily disable it by clicking on Adjust settings and then selecting Exclude from being highlighted.
If you need to reenable highlighting in the future, you'll have to look for "editor.unicodeHighlight.allowedCharacters" in the settings.

\u0119 to ę in notepad++

I have searched all over the Internet for an answer. I have achieved this once before, but I can't remember how I did this...
I have a long text file with a lot of encoded characters, for example
\u0119,\u015b\\u0107
How do I change characters like \u0119 to ę, etc?
This question is not off-topic. In the past I also used Notepad++ for programming; today I use Atom. You can find a lot of questions about Notepad++ on Stack Overflow, for instance: Removing duplicate rows in Notepad++ or Convert tabs to spaces in Notepad++ (and many more). So please do not give this question minus points.
Answer: I assume that when you go to menu>Encoding you will see 'Encode in UTF-8'.
I used this site to create part of my answer: https://superuser.com/questions/576431/notepad-inserting-special-unicode-characters-in-utf-8
If you see character codes like \u0119,\u015b\u0107 in your file, this probably means that they are just not encoded, and their codes are put explicitly as raw text.
So to change these codes into UTF-8 characters:
1. Go to menu>Run>Run, type: charmap, and click Run.
2. The Windows Character Map will show up. Check 'Advanced view' and put your character code (without the \u prefix, so for instance only 0119) in the field 'Go to Unicode'. Then click 'Select' and 'Copy' and close the window.
3. Go to menu>Search>Replace. In the field 'Replace with' paste your character, and in the field 'Find what' put its code (with the prefix, for instance \u0119). Then click 'Replace All'.
Repeat steps 1-3 for each character code. You can check that you're done by going to menu>Search>Find and typing '\u' in 'Find what'; if no code is found, your job is done.
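If the file contains many different codes, repeating the manual steps gets tedious. As an alternative outside Notepad++, here is a small Python sketch that replaces every \uXXXX code in one pass. It assumes each code is exactly four hex digits; `decode_u_escapes` is a name invented for this example.

```python
import re

# A literal backslash, the letter u, then exactly four hex digits
_U_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def decode_u_escapes(text):
    """Replace each literal \\uXXXX code with the character it stands for."""
    return _U_ESCAPE.sub(lambda match: chr(int(match.group(1), 16)), text)


print(decode_u_escapes(r"\u0119,\u015b"))  # prints: ę,ś
```

Read the file as text, pass its contents through `decode_u_escapes`, and write the result back as UTF-8. Characters outside the Basic Multilingual Plane, which are written as two consecutive \u codes, are not recombined by this sketch.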

Entering accented characters with notepad++ using only the keyboard

I am new to notepad++ and like it very much, since I can customize how my text documents look more easily than with wordpad. However, I would like to know if it’s possible to enter accented characters like in wordpad (I thought it was a windows thing, but perhaps it isn’t). In wordpad, I can type, for instance, ctrl-’ then i to get an accented í character. Similarly, I can type ctrl-shift-~ then n to get the accented ñ character. It makes it much easier to enter accented characters than copying and pasting from the character map application, or trying to remember code points. When I tried this method in notepad++ I just got the plain character without the accents. I should also mention that when I open documents with such accented characters already present they appear just as expected. Is there a way to enter accented characters like this in notepad++ using only the keyboard? I am using the latest notepad++ under Windows 7.
In Notepad++ you can go to “Edit” then select “Character Panel” near the bottom of the drop-down menu. It will show you the available character set, which includes most accented characters. Find the character you want and note the number next to it. To use it, press and hold your ALT key, then, on the numeric keypad on the right side of your keyboard, type zero followed by the number for that character. For “ñ”, for example, the code is 241, so you would hold down ALT, type 0241 on the keypad, and release ALT to get the character. That works in most Windows programs, even in here.
This only works for characters in the range 0 to 255 (the Windows ANSI code page; strictly speaking, ASCII itself stops at 127). I don't know of a method for other Unicode characters besides copying and pasting from the “Character Map” app available in Windows. I did test WordPad with the decimal number of the hex value you see for a Unicode character above 255, and ALT+#### works there, and probably in other places, but sadly it doesn't work in Notepad or Notepad++ for some strange reason. Two I use a lot and have memorized are ALT+0147 and ALT+0148 for the quotation marks “like these”, so once you use the numbers enough you tend to get used to them, or you can jot down the ones you use the most.
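To look up which character a given ALT+0nnn code produces without opening the panel, a rough Python sketch follows. It assumes the system's ANSI code page is Windows-1252 (the usual one on Western European and US installations); `alt_code_char` is a name invented for this example.

```python
def alt_code_char(code):
    """Return the character for ALT+0<code>, assuming the Windows-1252 code page."""
    if not 0 <= code <= 255:
        raise ValueError("ALT+0nnn codes only cover 0 to 255")
    # Decode the single byte using Python's cp1252 codec
    return bytes([code]).decode("cp1252")


for code in (241, 147, 148):
    print(code, alt_code_char(code))
```

A handful of byte values (for example 129) are unassigned in Python's cp1252 codec and raise `UnicodeDecodeError`.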
For anyone searching for a solution and coming across this page, try this (Windows): install and use the US International keyboard instead of the plain US keyboard. Search for "windows keyboard us international install" or something similar. I liked the techlanguage.com write-up on it and the teckangaroo.com step by step on how to install. Hope this helps someone in future looking around as I was earlier today for how to easily meet this need.
You can make your own keyboard layout to enter arbitrary characters anywhere in Windows, using MSKLC. Here's one I made earlier.
I think it is configured in the input method. With input method containing the characters you mentioned, you can press key combinations to get special letters.
You can add a keyboard layout preset in Windows. Under "Language and Regions" - "Language" - "Language settings" - "Input method" settings in Control Panel, you can add whatever you want.
Switch keyboard layout with Alt + Shift.

Automatic EOL conversion in Eclipse

Need to keep EOL format consistent in all resources under Eclipse workspace.
I know about the Eclipse preference that sets the new line style for newly created files, but I would like to have automatic conversion for already existing files. Are there any settings or plugins for this?
I want just setup once and be sure that all line endings are in the same format.
In addition to the Window > Preferences > General > Workspace setting for new files that you already know about, there is a File > Convert Line Delimiters To option. I don't know of any existing plugin/tool that will do this automatically when you save, but you could certainly write one or make converting the line ending part of your process.
To make it easier on yourself, you can bind keyboard shortcuts to the conversion commands by going to Window > Preferences > General > Keys and filtering using "delimiter".
In Eclipse, to convert the line endings for existing files:
Go to the file browser view, and click on the project/folder/file that you wish to convert.
From the menu bar, select File > Convert Line Delimiters To > Windows / Unix / MacOS 9.
You can search your resources with the Search dialog: go to the File Search tab. There you can enter a regular expression. Enter \r\n or whatever line ending you want to change.
Then hit the Replace... button instead of Search.
"I want just setup once and be sure that all line endings are in the same format."
... OK, my answer does not consider this.
You might get useful results with Eclipse save actions: if the Eclipse formatter also converts the EOL style, you could use it to modify the EOL style only for the files you are modifying.
Unfortunately I don't have Eclipse here, so I can't test if this actually works. Worth a try, however.
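If nothing inside Eclipse does this automatically, a one-off conversion can also be scripted outside the IDE. The following Python sketch is not an Eclipse feature; it assumes the files are text and that you want a single line-ending style across the tree. The names `normalize_eol` and `convert_tree`, and the default `*.java` pattern, are made up for illustration.

```python
from pathlib import Path


def normalize_eol(data, eol=b"\n"):
    """Return data with every CRLF, lone CR, or LF replaced by eol."""
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", eol)


def convert_tree(root, pattern="*.java", eol=b"\n"):
    """Rewrite matching files under root in place; return the files that changed."""
    changed = []
    for path in sorted(Path(root).rglob(pattern)):
        if path.is_file():
            original = path.read_bytes()
            converted = normalize_eol(original, eol)
            if converted != original:
                path.write_bytes(converted)
                changed.append(path)
    return changed
```

Running `convert_tree("path/to/workspace")` once, and again whenever foreign files are imported, keeps existing files consistent with the new-file preference. Restrict `pattern` to text files, since binary files would be corrupted.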

How do you display code snippets in MS Word preserving format and syntax highlighting?

Does anyone know a way to display code in Microsoft Word documents that preserves coloring and formatting? Preferably, the method would also be unobtrusive and easy to update.
I have tried to include code as regular text which looks awful and gets in the way when editing regular text. I have also tried inserting objects, a WordPad document and Text Box, into the document then putting the code inside those objects. The code looks much better and is easier to avoid while editing the rest of the text. However, these objects can only span one page which makes editing a nightmare when several pages of code need to be added.
Lastly, I know that there are much better editors/formats that have no problem handling this but I am stuck working with MS word.
Here is the best way, for me, to add code inside word:
Go to Insert tab, Text section, click Object button (it's on the right)
Choose OpenDocument Text which will open a new embedded word document
Copy and paste your code from Visual Studio / Eclipse inside this embedded word page
Save and close
Advantages
The result looks very nice. Here are the advantages of this method:
The code keeps its original layout and colors
The code is separated from the rest of the document, as if it was a picture or a chart
Spelling errors won't be highlighted in the code (this is cool !)
And it takes only a few seconds.
Download and install Notepad++ and do the following:
Paste your code in the window;
Select the programming language from the language menu;
Select the text to copy;
Right click and select Plugin commands -> Copy Text with Syntax Highlighting;
Paste it into MS Word and you are good to go!
Update 29/06/2013:
Notepad++ has a plugin called "NppExport" (comes pre-installed) that allows you to copy to RTF, HTML and ALL. It permits dozens of languages, whereas the aforementioned IDEs are limited to a handful each (without other plug-ins).
I use Copy all formats to clipboard and "paste as HTML" in MS word.
After reading a lot of related answers, I came across my own solution, which for me is the most suitable one.
The result has the same syntax highlighting as on Stack Overflow, which is quite awesome.
Steps to reproduce:
on Stack Overflow
Go to Ask Question (preferably with Chrome)
Paste the code and add a language tag (e.g. Java) to get syntax highlighting
Copy code from preview
in Word
Insert > Table > 1x1
Paste code (you may need to use Paste Special... > Formatted Text (RTF) from the Edit menu to not lose the syntax highlighting)
Table Design > Borders > No Border
Select code > Edit > Find > Replace
Search Document ^p (Paragraph Mark)
Replace With ^l (Manual Line Break)
(This is required to remove the gaps between some lines)
Select code again > Review > Language > check "Do not check spelling or grammar"
Finally add a caption using References > Insert Caption > New Label > name it "Listing" or something similar
Sample code thanks to this guy
There is a nice Online Tool for that : https://www.troye.io/planetb/
Just copy the generated code and paste it into your word editing software. So far I've tried it on MS Word and WPS Writer, works really well.
Doesn't play nice with Firefox but works just fine on Chrome (and IE too, but who wants to use that).
One of the main benefits is that, unlike the Code Format Add-In for Word, it does NOT mess with your code, and respects various languages' syntax.
I tried many other options offered in other answers but I found this one to be the most efficient (quick and really effective).
There is also another online tool quoted in another answer (markup.su) but I find the planetB output more elegant (although less versatile).
I type my code in Visual Studio, and then copy-paste it into Word. It preserves the colors.
When I've done this, I've made extensive use of styles. It helps a lot.
What I do is create a paragraph style (perhaps called "Code Example" or something like that) which uses a monospaced font, carefully chosen tabs, a very light grey background, a thin black border above and below (that helps visibility a lot) and with spelling turned off. I also make sure that inter-line and inter-paragraph spacing are set right. I then create additional character styles (e.g., "Comment", "String", "Keyword", "Function Name Decl", "Variable Name Decl") which I layer on top; those set the color and whether the text is bold/italic. It's then pretty simple to go through and mark up a pasted example as being code and have it come out looking really good, and this works well for short snippets. Long chunks of code probably should not normally be in something that's going to go on a dead tree. :-)
An advantage of doing it this way is that it is easy to adapt to whatever code you're doing; you don't have to rely on some IDE to figure out whatever is going on for you. (The main problem? Printed pages typically aren't as wide as editor windows so wrapping will suck...)
Maybe this is overly simple, but have you tried pasting in your code and setting the font on it to Courier New?
Try defining a style called 'code' and make it use a small fixed width font, it should look better then.
Use CTRL+SPACEBAR to reset style.
If you are using Sublime Text, you can copy the code from Sublime to MS Word preserving the syntax highlighting.
Install the package called SublimeHighlight.
In Sublime, using your cursor, select the code you want to copy, right click it, select 'copy as rtf', and paste into MS Word.
I'm using Easy Code Formatter. It's also an Office add-in. It allows you to select the coding style / and has a quick formatting button. Pretty neat.
In case you're like me and are too lazy or in a hurry and don't want to download additional software, you can use http://markup.su/highlighter/. It's very straightforward and supports several highlight themes and many programming languages. In my case I was using Visual Studio Code, which doesn't allow copying with format due to the CSS involved in styling (as reported here).
Copy the text from the Preview box and then in Word go to Insert -> Textbox, paste the Preview from the website, highlight all the text, and then disable spell checking for that textbox.
The best way I found is by using the table.
Create a table with 1x1. Then copy the code and paste it.
If you're using the desktop app then it will inherit the code editor theme color and paste it accordingly, else you can change the table style to any color.
Update:
From Word 2021, you can directly paste the code and it will preserve the formatting. No need to create the table.
Thank you #RdC1965 for mentioning this.
This is a bit indirect, but it works very nicely. Get LiveWriter and install this plugin:
http://lvildosola.blogspot.com/2007/02/code-snippet-plugin-for-windows-live.html
Insert your code using the plugin into a blog post. Select all and copy it to Word.
It looks great and can include line numbers. It also spans pages decently.
HTH
Colby Africa
Vim has a nifty feature that converts code to HTML format preserving syntax highlighting, font style, background color and even line numbers. Run :TOhtml and vim creates a new buffer containing html markup.
Next, open this html file in a web browser and copy/paste whatever it rendered to Word. Vim tips wiki has more information.
In my experience copy-paste from eclipse and Notepad++ works directly with word.
For some reason I had a problem with a file that didn't preserve coloring. I made a new .java file, copy-paste code to that, then copy-paste to word and it worked...
As the other guys said, create a new paragraph style. What I do is use a monospaced font like Courier New, a small font size close to 8px, single spacing with no space between paragraphs, small tab stops (0.5cm, 1cm, ..., 5cm), a simple line border around the text, and disabled grammar checks. That way I achieved the line breaking of Eclipse, so I don't have to do anything more.
Hope I helped ;)
This is the simplest approach I follow. Say I want to paste Java code.
I paste the code here so that spaces, tabs and curly braces are neatly formatted: http://www.tutorialspoint.com/online_java_formatter.htm
Then I paste the code from step 1 here so that the colors and fonts are added to the code: http://markup.su/highlighter/
Then I paste the preview code from step 2 into MS Word.
You can use VS code to keep code format and highlighting. Directly copy and paste code from VS.
You can simply use this add-in in any Office program.
Go to insert tab, then Get Add-ins, and search for Easy Syntax Highlighter
It supports
185 languages and 89 themes.
Automatic language detection.
Multi-language code highlighting.
Use a monospaced font like Lucida Console, which comes with Windows. If you cut/paste from Visual Studio or something that supports syntax highlighting, you can often preserve the colour scheme of the syntax highlighter.
Answer for people trying to resolve this issue in 2019:
Most answers to this question are outdated by now. I wish there was a way to reinspect old questions and answers every now and then!
The method I found for this question that works with Office 365 and its associated programs can be found here.
I'm using Word 2010 and I like copying and paste from a github gist. Just remember to keep source formatting!
I then change the font to DejaVu Sans Mono.
You can opt to copy with or without the numbering.
Copying into Eclipse and pasting from there into Word is another option.
You can also use SciTE to paste code if you don't want to install heavy IDEs and then download plugins for all the code you're making. Simply choose your language from the language menu, type your code, highlight the code, select Edit->Copy as RTF, and paste into Word with formatting (default paste).
SciTE supports the following languages but probably has support for others: Abaqus*, Ada, ASN.1 MIB definition files*, APDL, Assembler (NASM, MASM), Asymptote*, AutoIt*, Avenue*, Batch files (MS-DOS), Baan*, Bash*, BlitzBasic*, Bullant*, C/C++/C#, Clarion, cmake*, conf (Apache), CSound, CSS*, D, diff files*, E-Script*, Eiffel*, Erlang*, Flagship (Clipper / XBase), Flash (ActionScript), Fortran*, Forth*, GAP*, Gettext, Haskell, HTML*, HTML with embedded JavaScript, VBScript, PHP and ASP*, Gui4Cli*, IDL - both MSIDL and XPIDL*, INI, properties* and similar, InnoSetup*, Java*, JavaScript*, LISP*, LOT*, Lout*, Lua*, Make, Matlab*, Metapost*, MMIXAL, MSSQL, nnCron, NSIS*, Objective Caml*, Opal, Octave*, Pascal/Delphi*, Perl, most of it except for some ambiguous cases*, PL/M*, Progress*, PostScript*, POV-Ray*, PowerBasic*, PowerShell*, PureBasic*, Python*, R*, Rebol*, Ruby*, Scheme*, scriptol*, Specman E*, Spice, Smalltalk, SQL and PLSQL, TADS3*, TeX and LaTeX, Tcl/Tk*, VB and VBScript*, Verilog*, VHDL*, XML*, YAML*.
If you are using Intellij IDEA, just copy the code from the IDE and paste it in the word document.
A web site for coloration with lots of languages.
http://hilite.me/
You can host one yourself since it is open source. The code is on github.
There really isn't a clean way to do it, and it could still look fishy based on your exact style settings.
What you could try to do is to first run a code-to-HTML conversion (there are many programs that do that), and then try to open up the HTML file with word, that might hopefully provide you with the formatted and pretty code, and then copy and paste it into your document.
I was also looking for it and ended up creating something for my code display.
Here's a good way:
Create a rectangular form and place your text inside.
Change the font to Consolas and size ~10.
Change the text color to a near-black gray (gray 25%, darker 75%)
Use darker colors to highlight your text if needed and choose one to be the contour.
I have created an easier method using tables, as they are easier to create, manage, and more consistent (with the possibility to save the table's style inside the document itself), but I couldn't find a better way for code colouring scheme, sorry for that.
Steps:
Create a 3x3 table.
Select the table, and make its borders invisible ("No Borders" option), and activate "View Gridlines" option.
Make the adjustments to cells' spacing and columns' widths to get the desired aspect. (You will have to get in "Table Properties" for fine tuning).
Create a "Paragraph Style" with the name of "Code" just for your code snippets (as mentioned in https://stackoverflow.com/a/25092977/8533804)
Create another "Paragraph Style" with the name of "Code_numberline" that is based upon the previously created style, but this time add line numbering in its definition (this will automate line numbering).
Apply "Code_numberline" to the first column, and "Code" to the third column.
Add a fill in the middle column.
Save that table style and enjoy!
The best presentation for code in documents is in a fixed-width font (as it should appear in an IDE), with either a faint, shaded background or a light border to distinguish the block from other text.
If it's Java source code, copy it into Visual Studio and then copy it from there into Word.