Show info about the current character in the status bar in Sublime Text 2

I'm missing one useful feature that other text editors often offer: in the bottom status bar they show the ASCII and UTF code of the current character (the character before or after the current position, I'm not sure which). I cannot find a package or a native feature that does that.
Thank you for your help.

I made a plugin for this :)
Create an anyname.py file in your Packages/User/ directory.
import sublime, sublime_plugin, textwrap, unicodedata
class utfcodeCommand(sublime_plugin.EventListener):
    def on_selection_modified(self, view):
        # some test chars = $ €
        sublime.status_message('Copying with pretty format')
        selected = view.substr(view.sel()[0].a)
        char = str(selected)
        view.set_status('Charcode', "ASCII: " + str(ord(selected)) + " UTF: " + str(char.encode("unicode_escape"))[2:-1])
This should show, in the status bar, the ASCII and Unicode code of the character to the right of the caret.
Let me know if this works for you; I tested it with ST3 on Kubuntu Linux 12.04 x64.
It probably won't work on ST2, because ST2 and ST3 embed different Python versions (Python 2 vs. Python 3).

Here is one such plugin, which displays the character code in decimal: Show Character Code
"Simple Sublime Text plugin for displaying decimal code of the current character in the status bar"
Note, though, that it shows only the decimal value of the character code.
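If you want more than the decimal value, the status-bar text can be built in a small pure-Python function. The sketch below is hypothetical (describe_char and its output format are my own, not part of that plugin). Inside an EventListener you would pass it view.substr(view.sel()[0].a) and hand the result to view.set_status(...), as the other answers do. Keeping it separate from the Sublime API makes it easy to try in a normal Python 3 interpreter.

```python
import unicodedata


def describe_char(ch):
    """Hypothetical helper: status-bar text for one character ("" for empty input)."""
    if not ch:
        return ""
    code = ord(ch)
    # Only code points below 128 have an ASCII code.
    ascii_part = str(code) if code < 128 else "n/a"
    # Control characters have no Unicode name, so supply a default.
    name = unicodedata.name(ch, "<unnamed>")
    return "Dec: %d  Hex: U+%04X  ASCII: %s  Name: %s" % (code, code, ascii_part, name)
```

For example, describe_char("€") returns "Dec: 8364  Hex: U+20AC  ASCII: n/a  Name: EURO SIGN".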

I ran into several issues with the code posted by Sergey Telshevsky in ST2 / Python 2.7:
I got SyntaxError: Non-ASCII character '\xe2' in file ./display_character_code.py on line 7 because of the comment # some test chars = $ €. Removing this comment, or declaring a source encoding at the top of the file (e.g. # -*- coding: UTF-8 -*-), gets rid of the error.
I also got UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' when selecting the sample "€", because it is not an ASCII character.
Even after fixing these, the Unicode value was never displayed; e.g. the status bar showed ASCII: 123 UTF:.
So I reworked his example and came up with the following:
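import sublime_plugin
class StatusCharCodes(sublime_plugin.EventListener):
    def on_selection_modified(self, view):
        # Character to the right of the first caret
        selected = view.substr(view.sel()[0].a)
        try:
            # Decimal ASCII code, zero-padded to 3 digits
            ascii_code = str(ord(selected.encode("ascii"))).zfill(3)
        except (UnicodeEncodeError, TypeError):
            # Non-ASCII character, or no character at all
            ascii_code = "n/a"
        try:
            # Unicode code point in U+XXXX notation
            utf = "U+%04X" % ord(selected)
        except TypeError:
            utf = "n/a"
        view.set_status("Charcode", "ASCII: " + ascii_code + " UTF: " + utf)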


kdb+/q script: how to encode/decode a string

I am looking for a way to encode/decode a string in a q script.
.Q.x10, .Q.j10, .Q.x12 and .Q.j12 do not seem to meet the requirement.
For example, I want to encode "Hello world" and then be able to decode it again.
Your issue is that the default .Q.j10 and .Q.x10 don't support the space character " ", because space is not in the default alphabet they use:
q).Q.j10
64/:?["ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"]
The "tip" in the official documentation (https://code.kx.com/q/ref/dotq/#qj10-encode-binhex) suggests creating your own .Q.j10/.Q.x10 functions in which the space character is the first character of your custom alphabet.
Your alphabet must still contain exactly 64 characters, though, so you would have to drop either + or / (or replace one of them with another character of your choosing).
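To illustrate the tip, here is a rough Python sketch (not q) of what a custom .Q.j10/.Q.x10 pair does: each character becomes a 6-bit index into a 64-character alphabet, and the indices are packed into one integer. ALPHABET, j10 and x10 are hypothetical names; I dropped / to make room for the leading space, and I'm assuming, from the "10" in the q names, that at most 10 characters (60 bits) fit in one value.

```python
# Hypothetical custom alphabet: space first, "/" dropped to keep 64 characters.
ALPHABET = " ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+"
assert len(ALPHABET) == 64


def j10(s):
    """Pack up to 10 characters into one integer, 6 bits per character."""
    if len(s) > 10:
        raise ValueError("at most 10 characters fit in 60 bits")
    n = 0
    for ch in s:
        # str.index raises ValueError if ch is not in the alphabet
        n = n * 64 + ALPHABET.index(ch)
    return n


def x10(n):
    """Unpack an integer into exactly 10 characters, left-padded with ALPHABET[0]."""
    chars = []
    for _ in range(10):
        n, digit = divmod(n, 64)
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))
```

Because x10 here always returns 10 characters padded with ALPHABET[0], putting the space first means the padding is just leading spaces you can trim: x10(j10("Hello w")) gives "   Hello w". "Hello world" has 11 characters, so under this sketch's 10-character assumption it would have to be split across two values.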
A similar question came up on the k4 Topicbox on August 16th 2019 (subject "b64 encode"), where Geo Carncross came up with this solution for base64 decoding:
q).Q.btoa"Hello World!"
"SGVsbG8gV29ybGQh"
q)
q){c:sum x="=";neg[c]_"c"$raze 256 vs'64 sv'0N 4#.Q.b6?x}"SGVsbG8gV29ybGQh"
"Hello World!"
I haven't tested the latter though.
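For reference, here is my reading of what that lambda does, written as Python rather than q (b64_decode and B64 are hypothetical names; B64 is the standard base64 alphabet, which I believe is what .Q.b6 holds). It counts the = padding, maps each character to its 6-bit index, combines groups of 4 indices into a 24-bit number, splits that into 3 bytes, and drops one byte per =. One deliberate difference: = is mapped to 0 explicitly, whereas q's ? returns the alphabet length for characters it cannot find.

```python
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def b64_decode(s):
    """Hypothetical step-by-step base64 decoder mirroring the q lambda."""
    pad = s.count("=")                                   # c:sum x="="
    idx = [0 if ch == "=" else B64.index(ch) for ch in s]  # .Q.b6?x
    out = bytearray()
    for i in range(0, len(idx), 4):                      # 0N 4#
        a, b, c, d = idx[i:i + 4]
        n = ((a * 64 + b) * 64 + c) * 64 + d             # 64 sv'
        out += bytes([n >> 16, (n >> 8) & 0xFF, n & 0xFF])  # 256 vs' (always 3 bytes here)
    if pad:
        del out[-pad:]                                   # neg[c]_
    return out.decode("latin-1")                         # "c"$ (one char per byte)
```

b64_decode("SGVsbG8gV29ybGQh") returns "Hello World!", matching the q session above.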

SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-5: truncated \UXXXXXXXX escape

I'm using AutoKey 0.95.8 (the Python 3 version) on Linux Mint 19.3, and I have a series of keyboard macros that generate Unicode characters. This example works:
# alt+shift+a = á
import sys
char = "\u00E1"
keyboard.send_keys(char)
sys.exit()
But trying to type an em dash (—) generates the following error:
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-5: truncated \UXXXXXXXX escape
# alt+shift+- = —
import sys
char = "\u2014"
keyboard.send_keys(char)
sys.exit()
Any ideas on how to overcome this problem in AutoKey are greatly appreciated.
The code you posted above would not generate the error you are getting. The "truncated \UXXXXXXXX escape" error is about an uppercase \U escape, which needs 8 hex digits: if you put char = "\U2014" in the Python source, you will get exactly that error message (you probably got it while experimenting with the file this way).
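A quick way to see the difference in plain Python 3 (outside AutoKey): \u takes exactly 4 hex digits and \U exactly 8, and compiling source that contains "\U2014" reproduces the error. escape_error is a hypothetical helper that just returns the SyntaxError message.

```python
# \u takes exactly 4 hex digits, \U exactly 8; both spell the same em dash.
EM_DASH_SHORT = "\u2014"
EM_DASH_LONG = "\U00002014"


def escape_error(source):
    """Hypothetical helper: compile Python source, return the SyntaxError text or None."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError as exc:
        return str(exc)
    return None
```

escape_error(r'char = "\U2014"') returns a message containing "truncated \UXXXXXXXX escape", while escape_error('char = "\\u2014"') returns None.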
The sequence char = "\u2014" creates an em dash Unicode character on the Python side, but that does not mean it is possible to send it as a keyboard symbol via AutoKey to the window system. That is where your program is likely failing. Since there is no programming error, you won't get a Python error message; it simply won't work (although AutoKey might be nice and print an appropriate error message in this case).
You'd have to find out how to type an arbitrary Unicode character in your OS configuration (on Linux Mint it should be in the docs for "wayland", I guess), and have AutoKey send that character-composing sequence instead. If there is no such sequence, find a way to copy the desired character to the window environment's clipboard, and then send AutoKey the "paste" sequence (usually Ctrl+V, but it depends on the app; terminal emulators use Ctrl+Shift+V, for example).
When you need to emit characters outside the US-English set in AutoKey, you have two choices. The simplest is to put them into the clipboard with clipboard.fill_clipboard(text) and paste them into the window with keyboard.send_keys("<ctrl>+v"). This almost always works.
If you need to define a phrase with multibyte characters in it, select the Paste using Clipboard (Ctrl+V) option. (I'm trying to get that to be the default option in a future release.)
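A minimal sketch of the clipboard approach as a reusable function. paste_unicode is a hypothetical name; clipboard.fill_clipboard() and keyboard.send_keys() are the AutoKey calls mentioned above, and the objects are passed in as parameters only so the function can be exercised outside AutoKey. The "<ctrl>+<shift>+v" variant for terminal emulators is my assumption about AutoKey's key syntax.

```python
def paste_unicode(text, clipboard, keyboard, paste_keys="<ctrl>+v"):
    """Hypothetical helper: type arbitrary Unicode text by pasting it.

    clipboard and keyboard are the objects AutoKey provides to scripts.
    Assumption: pass paste_keys="<ctrl>+<shift>+v" for terminal emulators.
    """
    clipboard.fill_clipboard(text)  # put the character(s) on the clipboard
    keyboard.send_keys(paste_keys)  # ask the focused window to paste them
```

In an AutoKey script, define the function and then call paste_unicode("\u2014", clipboard, keyboard) to type an em dash.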
The other choice, which I'm still not quite sure of, is to send the Unicode escape sequence itself to the window and let it convert that into the actual Unicode character, with something like keyboard.send_keys(r"\U2014") (the r prefix matters: without it, Python itself rejects "\U2014" with the truncated \UXXXXXXXX escape error). Using a normal, non-raw string, as in the question, creates the actual Unicode character whether or not it is assigned to a variable first, and that is what the API call can't handle correctly.
The problem is that the underlying code for keyboard.send_keys() wants to send keycodes that actually exist on your keyboard, or that it can assign to an unused key in your layout. Most of the time that doesn't work for anything multibyte.

I need to remove a specific Unicode character from my existing subtitle text file

I work on subtitles, and I have this Arabic file. When I open it in Notepad, right-click and select "Show Unicode control characters", it shows some weird characters on the left of every line. I have tried many ways to remove them, without success, including:
Notepad++
Subtitle Edit
Excel
Word
288
00:24:41,960 --> 00:24:43,840
‫أتعلم، قللنا من شأنك فعلاً‬
289
00:24:44,000 --> 00:24:47,120
‫كان علينا تجنيدك لتكون جاسوساً‬
‫مكان (كاي سي)‬
290
00:24:47,280 --> 00:24:51,520
‫لا تعلمون كم أنا سعيد‬
‫لسماع ذلك‬
291
00:24:54,800 --> 00:24:58,160
‫لا تقلق، سيستيقظ نشيطاً غداً‬
292
00:24:58,320 --> 00:25:00,800
‫ولن يتذكر ما حصل‬
‫في الساعات الـ٦‬
The control characters are not visible in the sample above. The character is U+202B, which shows up as a ¶ sign; after googling, I think that sign is called a PILCROW.
The problem is that the subtitles are not displayed correctly in the PS4 app.
I need this PILCROW sign to go away. With this website I can see the issue in the file: https://www.soscisurvey.de/tools/view-chars.php
The pilcrow (¶) is used by various programs and publishers to mark the end of a line in a document. The actual ¶ character does not exist in your file, so you can't get rid of it.
The Unicode characters in these lines are RIGHT-TO-LEFT EMBEDDING (code \u202b) and POP DIRECTIONAL FORMATTING (code \u202c). They are used to indicate that the enclosed text should be rendered right-to-left instead of the Western left-to-right direction.
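If you want to check which invisible characters a file contains without the website, a short Python 3 sketch can list them. It assumes the characters of interest all have the Unicode general category "Cf" (format), which is true of U+202B and U+202C but also matches others such as the byte order mark U+FEFF. find_format_chars is a hypothetical name.

```python
import unicodedata


def find_format_chars(text):
    """Hypothetical helper: list invisible format characters (category "Cf").

    Returns (line_number, "U+XXXX", name) tuples; line numbers start at 1.
    """
    found = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for ch in line:
            if unicodedata.category(ch) == "Cf":
                name = unicodedata.name(ch, "<unnamed>")
                found.append((lineno, "U+%04X" % ord(ch), name))
    return found
```

Run on the subtitle text, each subtitle line should report a U+202B at its start and a U+202C at its end.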
These characters are included as hints to the application displaying the text, rather than to actually perform the text reversal, so they can likely be removed without compromising how the text itself is displayed.
Now, this is a programming Q&A site, but you did not mention any programming language you are familiar with, even just enough to run a program. So it is hard to know how to give an answer that suits you.
Python can be used to write a small program that filters such characters out of a file, but I am not willing to write a full-fledged GUI program, or a web app you could run, just as an answer here.
A command-line program that just filters out a few characters is another matter: it is only a few lines of code.
Save the following listing in a file named, say, "fixsubtitles.py". Then, in a terminal ("cmd" if you are on Windows), type python3 fixsubtitles.py \path\to\subtitlefile.txt and press Enter (on Windows the command may be py or python rather than python3).
This, of course, requires installing the Python 3 runtime from http://python.org first (on Mac or Linux it is usually already installed).
import sys
from pathlib import Path
encoding = "utf-8"
# Translation table that deletes RIGHT-TO-LEFT EMBEDDING and POP DIRECTIONAL FORMATTING
remove_table = str.maketrans("", "", "\u202b\u202c")
if len(sys.argv) < 2:
    print("Usage: python3 fixsubtitles.py [filename]", file=sys.stderr)
    sys.exit(1)
path = Path(sys.argv[1])
data = path.read_text(encoding=encoding)
# Overwrite the file with the control characters removed
path.write_text(data.translate(remove_table), encoding=encoding)
print("Done")
You may need to adjust the encoding, as Windows files are not always UTF-8 (they may be in, for example, "cp1256"; if you get a Unicode error when running the program, try that in place of "utf-8"). You may also need to add more characters to the set of characters to be removed; the tool you linked in the question should show you any other such characters. Other than that, the program above should work.
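The encoding fallback and the extensible character set described above can be folded into one function. This is a sketch under assumptions: clean_subtitle_file is a hypothetical name, the fallback order ("utf-8", then "cp1256") is just the example from this answer, and the file is rewritten as bytes so the original line endings are preserved.

```python
from pathlib import Path

# Characters to strip; extend this if the character viewer shows others.
REMOVE = "\u202b\u202c"


def clean_subtitle_file(path, remove=REMOVE, encodings=("utf-8", "cp1256")):
    """Hypothetical helper: strip `remove` characters from a file in place.

    Tries each encoding in turn, writes the file back in the first one that
    decodes it, and returns that encoding.
    """
    path = Path(path)
    raw = path.read_bytes()
    table = str.maketrans("", "", remove)
    for encoding in encodings:
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # not this encoding; try the next one
        # Write bytes rather than text so "\r\n" line endings stay untouched.
        path.write_bytes(text.translate(table).encode(encoding))
        return encoding
    raise ValueError("none of %r could decode %s" % (encodings, path))
```

For example, clean_subtitle_file("subtitles.srt") returns "utf-8" for a UTF-8 file and leaves it without the U+202B/U+202C characters.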

ZPL: UTF-8 characters disappearing

I've encountered a problem when trying to print a simple ZPL string.
My ZPL contains some UTF-8 characters like so:
^XA
^FT16,591^A0N,34^FH^FVM_F6lntorp^FS
^FT16,626^A0N,34^FH^FVV_E4gen^FS
^XZ
This should print out Mölntorp (_F6 = ö) and Vägen (_E4 = ä). And it does.
But here comes the problem: I tried adding a Danish ø (_F8 = ø), like so:
^XA
^FT16,626^A0N,34^FH^FVK_F8benhavnsvej
^XZ
But what comes out is K°benhavnsvej (_F8 = ° in CP-850). I have no clue why it successfully translates one hex code and then messes up the other, since they should both be using the same encoding table (none is specified).
If I add ^CI28 below the starting ^XA tag, the UTF-8 characters simply vanish, and the output is just Kbenhavnsvej.
I hope someone could give me input on why this is happening. It's frustrating.
^XA
^FT16,626^CI4^A0N,34^FH^FVK_7Cbenhavnsvej
^XZ
I haven't tried this - but it should work in theory.
^CI4 selects the international character set for Denmark; character 7C should be the one you need (5C for the upper-case Ø).
It may also be the font you are using: the font might simply not contain this character. You may have to use the Swiss Eastern European font.
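Regarding the ^CI28 attempt in the question: as I understand it, with ^CI28 the ^FH escapes are read as UTF-8 bytes, and ø is the two bytes C3 B8 in UTF-8, so a lone _F8 is not valid UTF-8, which would explain why it vanished. The hypothetical helper below produces ^FH escapes for whichever encoding the printer is set to. Escaping the indicator and the ^ and ~ prefixes as hex is my assumption about safe ^FH usage, not something from the answers above.

```python
def zpl_hex_escape(text, encoding="utf-8", indicator="_"):
    """Hypothetical helper: encode text for a ^FH field as _XX escapes.

    Assumes the printer decodes escaped bytes with the character set chosen
    by ^CI (e.g. ^CI28 for UTF-8). Bytes outside printable ASCII, plus the
    indicator and the ZPL prefixes ^ and ~, are written as hex escapes.
    """
    out = []
    for byte in text.encode(encoding):
        ch = chr(byte)
        if 0x20 <= byte < 0x7F and ch not in (indicator, "^", "~"):
            out.append(ch)
        else:
            out.append("%s%02X" % (indicator, byte))
    return "".join(out)
```

With the default UTF-8 setting, zpl_hex_escape("Københavnsvej") gives "K_C3_B8benhavnsvej", whereas the Latin-1 byte used in the question would be zpl_hex_escape("Københavnsvej", "latin-1"), i.e. "K_F8benhavnsvej".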

Show a character's Unicode codepoint value in Eclipse

I have a UTF-8 text file open in Eclipse, and I'd like to find out what a particular Unicode character is. Is there a function to display the Unicode codepoint of the character under the cursor?
I do not think there is yet a plugin doing exactly what you are looking for.
I know about a small plugin that can encode/decode a Unicode sequence:
The sources (there is not even a fully built plugin jar yet) are here, with an associated tarball: you can import it as a PDE plugin project and test it in your Eclipse.
You can also look-up a character in the Unicode database using the Character Properties Unicode Utility at http://unicode.org/. I've made a Firefox Search Engine to search via that utility. So, just copy-and-paste from your favourite editor into the search box.
See the list of online tools at http://unicode.org/. E.g. it lists Unicode Lookup by Jonathan Hedley.
Here's a Python script that shows information about the Unicode characters on the Windows clipboard. Just copy the character in your favourite editor, then run this program.
Not built-in to Eclipse, but it's what I'll probably use when I haven't got a better option.
"""
Print information about Unicode characters on the Windows clipboard
Requires Python 2.6 and PyWin32.
For ideas on how to make it work on Linux via GTK, see:
http://mrlauer.wordpress.com/2007/12/31/python-and-the-clipboard/
"""
import win32con
import win32clipboard
import unicodedata
import sys
import codecs
from contextlib import contextmanager
MAX_PRINT_CHARS = 1
# If a character can't be output in the current encoding, output a replacement e.g. '??'
sys.stdout = codecs.getwriter(sys.stdout.encoding)(sys.stdout, errors='replace')
@contextmanager
def win_clipboard_context():
    """
    A context manager for using the Windows clipboard safely.
    """
    # Open outside the try block so we never close a clipboard we failed to open
    win32clipboard.OpenClipboard()
    try:
        yield
    finally:
        win32clipboard.CloseClipboard()
def get_clipboard_text():
    with win_clipboard_context():
        clipboard_text = win32clipboard.GetClipboardData(win32con.CF_UNICODETEXT)
    return clipboard_text
def print_unicode_info(text):
    # Print the character, its code point (hex and decimal) and its Unicode name
    for char in text[:MAX_PRINT_CHARS]:
        print(u"Char: {0}".format(char))
        print(u" Code: {0:#x} (hex), {0} (dec)".format(ord(char)))
        print(u" Name: {0}".format(unicodedata.name(char, u"Unknown")))
try:
    clipboard_text = get_clipboard_text()
except TypeError:
    print(u"The clipboard does not contain Unicode text")
else:
    print_unicode_info(clipboard_text)
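On systems without PyWin32, a similar lookup works on text passed on the command line. This is a minimal Python 3 sketch, not a port of the script above: unicode_info is a hypothetical name, and it produces one line per character.

```python
import sys
import unicodedata


def unicode_info(text):
    """Hypothetical helper: one line per character with code point and name."""
    lines = []
    for ch in text:
        code = ord(ch)
        name = unicodedata.name(ch, "Unknown")
        lines.append("U+%04X (%d) %s" % (code, code, name))
    return lines


if __name__ == "__main__":
    # Usage: python3 unicode_info.py "text to inspect"
    for arg in sys.argv[1:]:
        for line in unicode_info(arg):
            print(line)
```

For example, unicode_info("é") returns ["U+00E9 (233) LATIN SMALL LETTER E WITH ACUTE"].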