How to insert unusual characters in emacs?


In VIM I can insert unusual characters by using digraphs:
<C-K>{char1}{char2}
for example the ¿ character is represented by the ?I digraph.
<C-K>?I
I can also define a custom list of digraphs in a separate file. For now, I'll just post the content of that file:
digraph uh 601 " ə UNSTRESSED SCHWA VOWEL
digraph uH 652 " ʌ STRESSED SCHWA VOWEL
digraph ii 618 " ɪ NEAR-CLOSE NEAR-FRONT UNROUNDED VOWEL
digraph uu 650 " ʊ NEAR-CLOSE NEAR-BACK ROUNDED VOWEL
digraph ee 603 " ɛ OPEN-MID FRONT UNROUNDED VOWEL
digraph er 604 " ɜ OPEN-MID CENTRAL UNROUNDED VOWEL
digraph oh 596 " ɔ OPEN-MID BACK ROUNDED VOWEL
digraph ae 230 " æ NEAR-OPEN FRONT UNROUNDED VOWEL
digraph ah 593 " ɑ OPEN BACK UNROUNDED VOWEL
digraph th 952 " θ VOICELESS DENTAL FRICATIVE
digraph tH 240 " ð VOICED DENTAL FRICATIVE
digraph sh 643 " ʃ VOICELESS POSTALVEOLAR FRICATIVE
digraph zs 658 " ʒ VOICED POSTALVEOLAR FRICATIVE
digraph ts 679 " ʧ VOICELESS POSTALVEOLAR AFFRICATE
digraph dz 676 " ʤ VOICED POSTALVEOLAR AFFRICATE
digraph ng 331 " ŋ VOICED VELAR NASAL
digraph as 688 " ʰ ASPIRATED
digraph ps 712 " ˈ PRIMARY STRESS
digraph ss 716 " ˌ SECONDARY STRESS
digraph st 794 " ̚ NO AUDIBLE RELEASE
digraph li 8255 " ‿ LINKING
They are symbols of the phonetic alphabet I frequently use in documents.
The question is: is there a way to port the same symbols to Emacs so that I can use them, ideally with the same letter combinations ("uh", "uH", "ii", "uu", and so on)?

First of all, Emacs comes with three "input methods" that let you type IPA characters, ipa-kirshenbaum, ipa-praat and ipa-x-sampa. You can see the description of them by typing C-h I (for describe-input-method), and you can switch to one of them with C-u C-\ (for toggle-input-method with a prefix argument).
If you'd rather use your own combinations, you can define your own input method:
(quail-define-package
 "my-ipa-symbols" "" "IPA" t
 "My IPA input method
Documentation goes here."
 nil t nil nil nil nil nil nil nil nil t)
(quail-define-rules
 ("uh" ?ə) ; UNSTRESSED SCHWA VOWEL
 ("uH" ?ʌ) ; STRESSED SCHWA VOWEL
 ;; add more combinations here
 )
Evaluate that with eval-buffer or eval-region, and then switch to the newly created input method with C-u C-\ my-ipa-symbols.
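If you'd rather not retype the whole digraph list by hand, it can be converted mechanically. The following is a minimal Python sketch, not part of the answer above. The helper name digraph_to_rule and the regular expression are my own assumptions, based only on the file layout shown in the question (the word digraph, two keys, a decimal code point, then an optional "-comment).

```python
import re

# Assumed layout, taken from the question:
#   digraph uh 601 " ə UNSTRESSED SCHWA VOWEL
DIGRAPH_RE = re.compile(r'^digraph\s+(\S\S)\s+(\d+)(?:\s+"\s*\S+\s+(.*))?$')


def digraph_to_rule(line):
    """Turn one Vim digraph line into a quail rule, or None if it does not match."""
    m = DIGRAPH_RE.match(line.strip())
    if m is None:
        return None
    keys, code, comment = m.group(1), int(m.group(2)), m.group(3)
    # chr() converts the decimal code point into the character itself.
    rule = '("%s" ?%s)' % (keys, chr(code))
    if comment:
        rule += " ; " + comment
    return rule


if __name__ == "__main__":
    vim_lines = [
        'digraph uh 601 " ə UNSTRESSED SCHWA VOWEL',
        'digraph th 952 " θ VOICELESS DENTAL FRICATIVE',
    ]
    for line in vim_lines:
        print(digraph_to_rule(line))
```

Paste the printed lines into the quail-define-rules form. The sketch does no escaping, so a key combination containing " or \ would need to be handled by hand.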

M-x insert-char will let you interactively search for a character to insert. Searching for 'schwa' brings up a set of different schwas to choose from.
For characters I insert often, I've added keybindings like this:
(global-set-key (kbd "C-<down>") (lambda () (interactive) (insert "↓")))
where I just copy-and-pasted the character I want into the string. According to the docs, you should also be able to create a keybinding that calls insert-char with the name or the hex code point of the character you want: https://www.gnu.org/software/emacs/manual/html_node/emacs/Inserting-Text.html

A nicer alternative to M-x insert-char is to use helm-ucs (or alternatively helm-unicode). This brings up a nice list of unicode characters in a helm interface. You can enter words of the name in any order (eg "alpha small greek") to choose from characters matching those strings.
note: helm-ucs takes a few seconds to load the first time it's used in a session, but helm-unicode doesn't suffer from this problem.

Related

Extract words in Lua split by Unicode spaces and control characters

I'm interested in a pure-Lua (i.e., no external Unicode library) solution to extracting the units of a string between certain Unicode control characters and spaces. The code points I would like to use as delimiters are:
0000-0020
007f-00a0
00ad
1680
2000-200a
2028-2029
202f
205f
3000
I know how to access the code points in a string, for example:
> for i,c in utf8.codes("é$ \tπ😃") do print(c) end
233
36
32
9
960
128515
but I am not sure how to "skip" the spaces and tabs and reconstitute the other codepoints into strings themselves. What I would like to do in the example above, is drop the 32 and 9, then perhaps use utf8.char(233, 36) and utf8.char(960, 128515) to somehow get ["é$", "π😃"].
It seems that putting everything into a table of numbers and painstakingly walking through the table with for-loops and if-statements would work, but is there a better way? I looked into string:gmatch but that seems to require making utf8 sequences out of each of the ranges I want, and it's not clear what that pattern would even look like.
Is there an idiomatic way to extract the strings between the spaces? Or must I manually hack tables of code points? gmatch does not look up to the task. Or is it?

would require painstakingly generating the utf8 encodings for all code points at each end of the range.
Yes. But of course not manually.
local function range(from, to)
  -- both ends must differ only in their last UTF-8 byte
  assert(utf8.codepoint(from) // 64 == utf8.codepoint(to) // 64)
  return from:sub(1,-2).."["..from:sub(-1).."-"..to:sub(-1).."]"
end
local function split_unicode(s)
  for w in s
    :gsub("[\0-\x1F\x7F]", " ")
    -- U+0080..U+00A0: the question's 007f-00a0 range also covers the C1 controls
    :gsub(range("\u{0080}", "\u{00a0}"), " ")
    :gsub("\u{00ad}", " ")
    :gsub("\u{1680}", " ")
    :gsub(range("\u{2000}", "\u{200a}"), " ")
    :gsub(range("\u{2028}", "\u{2029}"), " ")
    :gsub("\u{202f}", " ")
    :gsub("\u{205f}", " ")
    :gsub("\u{3000}", " ")
    :gmatch"%S+"
  do
    print(w)
  end
end
Test:
split_unicode("#\0#\t#\x1F#\x7F#\u{00a0}#\u{00ad}#\u{1680}#\u{2000}#\u{2005}#\u{200a}#\u{2028}#\u{2029}#\u{202f}#\u{205f}#\u{3000}#")
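As a cross-check on the delimiter set (this is not a pure-Lua solution), here is a small Python sketch that splits on exactly the code points listed in the question. The names DELIMS and split_unicode are mine, and the character class is transcribed from the question's list rather than from any Unicode property.

```python
import re

# Delimiters from the question: 0000-0020, 007f-00a0, 00ad, 1680,
# 2000-200a, 2028-2029, 202f, 205f, 3000.
DELIMS = re.compile(
    r"[\u0000-\u0020\u007f-\u00a0\u00ad\u1680"
    r"\u2000-\u200a\u2028\u2029\u202f\u205f\u3000]+"
)


def split_unicode(s):
    """Return the non-empty runs of characters between delimiters."""
    return [w for w in DELIMS.split(s) if w]


if __name__ == "__main__":
    print(split_unicode("é$ \tπ😃"))
```

Running the same input through the Lua function and through this sketch should give the same words, which is a quick way to spot a range that was missed.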

How to remove spaces and underscores from a string in kdb?

How do I remove spaces and underscores from a string?
Input String:
s:"Monday comes_after Sunday";
Expected Output:
"MondaycomesafterSunday"

You may want to look at the special characters section of https://code.kx.com/v2/kb/regex/
q)s:"Monday comes_after Sunday";
q)ssr[s;"[ _]";""]
"MondaycomesafterSunday"
Alternatively, you could use except, which is generally faster when you are only removing characters:
q)s except " _"
"MondaycomesafterSunday"
q)\ts:100000 s except " _"
90 816
q)\ts:100000 ssr[s;"[ _]";""]
691 1072

Common Lisp: getting the Unicode name of a character

In CL, can I get the Unicode name of a character into a string? Is there a
function that, receiving #\α as an argument, would return "GREEK SMALL LETTER ALPHA"?

Using the cl-unicode library:
CL-USER> (cl-unicode:unicode-name #\α)
"GREEK SMALL LETTER ALPHA"
CL-USER> (cl-unicode:unicode-name 945)
"GREEK SMALL LETTER ALPHA"

The result of CHAR-NAME is not standardized, but often you'll get:
? (char-name #\α)
"Greek_Small_Letter_Alpha"
In LispWorks:
CL-USER 40 > (char-name #\α)
"U+03B1"
CL-USER 41 > (system::lookup-unicode-character-name #\α)
"GREEK SMALL LETTER ALPHA"

Transpose only the two outer words of three

How can I transpose "foo" and "bar" in "foo and bar" in Emacs with the fewest keystrokes?
input:
foo and bar
output:
bar and foo

Another way:
A numeric prefix of 0 to M-t will transpose the word ending after the mark with the word ending after the point.
So, if ^ is mark and | is point:
f^oo and ba|r
will become, after pressing M-0 M-t:
|bar and ^foo
So, in your example, if you are typing foo and bar|, the key sequence can be C-space M-3 M-b M-0 M-t (set mark at end of line, back 3 words to foo, transpose those words).

Here's one way (starting from the beginning of the phrase): M-t M-t M-m M-t.
If there's text on the line before foo, replace M-m with M-b M-b.

Code Golf - Word Scrambler

Please answer with the shortest possible source code for a program that converts an arbitrary plaintext to its corresponding ciphertext, following the sample input and output I have given below. Bonus points* for the least CPU time or the least amount of memory used.
Example 1:
Plaintext: The quick brown fox jumps over the lazy dog. Supercalifragilisticexpialidocious!
Ciphertext: eTh kiquc nobrw xfo smjup rvoe eth yalz .odg !uioiapeislgriarpSueclfaiitcxildcos
Example 2:
Plaintext: 123 1234 12345 123456 1234567 12345678 123456789
Ciphertext: 312 4213 53124 642135 7531246 86421357 975312468
Rules:
Punctuation is defined to be included with the word it is closest to.
The center of a word is defined to be ceiling((strlen(word)+1)/2).
Whitespace is ignored (or collapsed).
Odd words move to the right first. Even words move to the left first.
You can think of it as reading every other character backwards (starting from the end of the word), followed by the remaining characters forwards. Corporation => XoXpXrXtXoX => niaorCoprto.
Thank you to those who pointed out the inconsistency in my description. It has led many of you down the wrong path, for which I apologize. Rule #4 should clear things up.
*Bonus points will only be awarded if Jeff Atwood decides to do so. Since I haven't checked with him, the chances are slim. Sorry.
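For reference, the rules can be read as the following ungolfed Python 3 sketch. It is only an illustration of the "every other character backwards, then the rest forwards" description; the function names scramble_word and scramble are made up here and are not an entry in the contest.

```python
def scramble_word(word):
    """Every other character read backwards from the end, then the rest forwards."""
    backwards = word[::-2]
    # The forward pass takes the characters the backward pass skipped:
    # odd indices for odd-length words, even indices for even-length words.
    forwards = word[len(word) % 2::2]
    return backwards + forwards


def scramble(text):
    """Scramble each whitespace-separated word; runs of whitespace collapse."""
    return " ".join(scramble_word(w) for w in text.split())


if __name__ == "__main__":
    print(scramble("123 1234 12345 123456 1234567 12345678 123456789"))
```

Punctuation needs no special handling because it is simply part of the word it is attached to, as the first rule states.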

Python, 50 characters
For input in i:
' '.join(x[::-2]+x[len(x)%2::2]for x in i.split())
Alternate version that handles its own IO:
print ' '.join(x[::-2]+x[len(x)%2::2]for x in raw_input().split())
A total of 66 characters if including whitespace. (Technically, the print could be omitted if running from a command line, since the evaluated value of the code is displayed as output by default.)
Alternate version using reduce:
' '.join(reduce(lambda x,y:y+x[::-1],x) for x in i.split())
59 characters.
Original version (both even and odd go right first) for an input in i:
' '.join(x[::2][::-1]+x[1::2]for x in i.split())
48 characters including whitespace.
Another alternate version which (while slightly longer) is slightly more efficient:
' '.join(x[len(x)%2-2::-2]+x[1::2]for x in i.split())
(53 characters)

J, 58 characters
>,&.>/({~(,~(>:#+:#i.#-#<.,+:#i.#>.)#-:)#<:##)&.><;.2,&' '

Haskell, 64 characters
unwords.map(map snd.sort.zip(zipWith(*)[0..]$cycle[-1,1])).words
Well, okay, 76 if you add in the requisite "import List".

Python - 69 chars
(including whitespace and linebreaks)
This handles all I/O.
for w in raw_input().split():
 o=""
 for c in w:o=c+o[::-1]
 print o,

Perl, 78 characters
For input in $_. If that's not acceptable, add six characters for either $_=<>; or $_=$s; at the beginning. The newline is for readability only.
for(split){$i=length;print substr$_,$i--,1,''while$i-->0;
print"$_ ";}print $/

C, 140 characters
Nicely formatted:
main(c, v)
    char **v;
{
    for( ; *++v; )
    {
        char *e = *v + strlen(*v), *x;
        for(x = e-1; x >= *v; x -= 2)
            putchar(*x);
        for(x = *v + (x < *v-1); x < e; x += 2)
            putchar(*x);
        putchar(' ');
    }
}
Compressed:
main(c,v)char**v;{for(;*++v;){char*e=*v+strlen(*v),*x;for(x=e-1;x>=*v;x-=2)putchar(*x);for(x=*v+(x<*v-1);x<e;x+=2)putchar(*x);putchar(32);}}

Lua
130 char function, 147 char functioning program
Lua doesn't get enough love in code golf -- maybe because it's hard to write a short program when you have long keywords like function/end, if/then/end, etc.
First I write the function in a verbose manner with explanations, then I rewrite it as a compressed, standalone function, then I call that function on the single argument specified at the command line.
I had to format the code with <pre></pre> tags because Markdown does a horrible job of formatting Lua.
Technically you could get a smaller running program by inlining the function, but it's more modular this way :)
t = "The quick brown fox jumps over the lazy dog. Supercalifragilisticexpialidocious!"
T = t:gsub("%S+",       -- for each word in t...
  function(w)           -- argument: current word in t
    W = ""              -- initialize new Word
    for i = 1,#w do     -- iterate over each character in word
      c = w:sub(i,i)    -- extract current character
      -- determine whether letter goes on right or left end
      W = (#w % 2 ~= i % 2) and W .. c or c .. W
    end
    return W            -- swap word in t with inverted Word
  end)
-- code-golf unit test
assert(T == "eTh kiquc nobrw xfo smjup rvoe eth yalz .odg !uioiapeislgriarpSueclfaiitcxildcos")
-- need to assign to a variable and return it,
-- because gsub returns a pair and we only want the first element
f=function(s)c=s:gsub("%S+",function(w)W=""for i=1,#w do c=w:sub(i,i)W=(#w%2~=i%2)and W ..c or c ..W end return W end)return c end
--       1         2         3         4         5         6         7         8         9        10        11        12        13
--34567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
-- 130 chars, compressed and written as a proper function
print(f(arg[1]))
--34567890123456
-- 16 (+1 whitespace needed) chars to make it a functioning Lua program,
-- operating on command line argument
Output:
$ lua insideout.lua 'The quick brown fox jumps over the lazy dog. Supercalifragilisticexpialidocious!'
eTh kiquc nobrw xfo smjup rvoe eth yalz .odg !uioiapeislgriarpSueclfaiitcxildcos
I'm still pretty new at Lua so I'd like to see a shorter solution if there is one.
For a minimal version that ciphers all command-line arguments and writes them to stdout, we can do 111 chars:
for _,w in ipairs(arg)do W=""for i=1,#w do c=w:sub(i,i)W=(#w%2~=i%2)and W ..c or c ..W end io.write(W ..' ')end
But this approach does output a trailing space like some of the other solutions.

For an input in s:
f=lambda t,r="":t and f(t[1:],len(t)&1and t[0]+r or r+t[0])or r
" ".join(map(f,s.split()))
Python, 90 characters including whitespace.

TCL
125 characters
set s set f foreach l {}
$f w [gets stdin] {$s r {}
$f c [split $w {}] {$s r $c[string reverse $r]}
$s l "$l $r"}
puts $l

Bash - 133, assuming input is in $w variable
Pretty
for x in $w; do
    z="";
    for l in `echo $x|sed 's/\(.\)/ \1/g'`; do
        if ((${#z}%2)); then
            z=$z$l;
        else
            z=$l$z;
        fi;
    done;
    echo -n "$z ";
done;
echo
Compressed
for x in $w;do z="";for l in `echo $x|sed 's/\(.\)/ \1/g'`;do if ((${#z}%2));then z=$z$l;else z=$l$z;fi;done;echo -n "$z ";done;echo
Ok, so it outputs a trailing space.