Remapping <C-ö> doesn't work - neovim

This works:
noremap ö l
This doesn't work:
noremap <C-ö> l
It seems that <C-ö/ä/ü/etc.> can't be mapped at all, or is there a solution?

Related

What is the difference between ö and ö?

The following characters look alike. But they are not the same. I can not visually see their difference. Could anybody let me know what their difference is? Why are there two Unicode characters that are so similar?
$ xxd <<< ö
00000000: c3b6 0a ...
$ xxd <<< ö
00000000: 6fcc 880a o...
The first is a single Unicode code point, while the second is two Unicode code points. They are two forms of the same glyph (examples in Python):
import unicodedata as ud
o1 = '\xf6'     # 'ö' as one precomposed code point
o2 = 'o\u0308'  # 'o' followed by a combining diaeresis
for c in o1:
    print(f'U+{ord(c):04X} {ud.name(c)}')
print()
for c in o2:
    print(f'U+{ord(c):04X} {ud.name(c)}')
U+00F6 LATIN SMALL LETTER O WITH DIAERESIS
U+006F LATIN SMALL LETTER O
U+0308 COMBINING DIAERESIS
Ensure the two strings are in the same normalization form (either composed or decomposed) for comparison:
print(ud.normalize('NFC',o1) == ud.normalize('NFC',o2))
print(ud.normalize('NFD',o1) == ud.normalize('NFD',o2))
True
True

How to insert unusual characters in emacs?

In VIM I can insert unusual characters by using digraphs:
<C-K>{char1}{char2}
for example the ¿ character is represented by the ?I digraph.
<C-K>?I
then I can define a custom list for digraphs in a separate file, but for now, I'm just going to post the content of that file:
digraph uh 601 " ə UNSTRESSED SCHWA VOWEL
digraph uH 652 " ʌ STRESSED SCHWA VOWEL
digraph ii 618 " ɪ NEAR-CLOSE NEAR-FRONT UNROUNDED VOWEL
digraph uu 650 " ʊ NEAR-CLOSE NEAR-BACK ROUNDED VOWEL
digraph ee 603 " ɛ OPEN-MID FRONT UNROUNDED VOWEL
digraph er 604 " ɜ OPEN-MID CENTRAL UNROUNDED VOWEL
digraph oh 596 " ɔ OPEN-MID BACK ROUNDED VOWEL
digraph ae 230 " æ NEAR-OPEN FRONT UNROUNDED VOWEL
digraph ah 593 " ɑ OPEN BACK UNROUNDED VOWEL
digraph th 952 " θ VOICELESS DENTAL FRICATIVE
digraph tH 240 " ð VOICED DENTAL FRICATIVE
digraph sh 643 " ʃ VOICELESS POSTALVEOLAR FRICATIVE
digraph zs 658 " ʒ VOICED POSTALVEOLAR FRICATIVE
digraph ts 679 " ʧ VOICELESS POSTALVEOLAR AFFRICATE
digraph dz 676 " ʤ VOICED POSTALVEOLAR AFFRICATE
digraph ng 331 " ŋ VOICED VELAR NASAL
digraph as 688 " ʰ ASPIRATED
digraph ps 712 " ˈ PRIMARY STRESS
digraph ss 716 " ˌ SECONDARY STRESS
digraph st 794 " ̚ NO AUDIBLE RELEASE
digraph li 8255 " ‿ LINKING
They are symbols of the phonetic alphabet I frequently use in documents.
The question is: Is there a way to port the same symbols to emacs so I can use them possibly with the same letter combination "uh, uH, ii, uu" and so on?
First of all, Emacs comes with three "input methods" that let you type IPA characters, ipa-kirshenbaum, ipa-praat and ipa-x-sampa. You can see the description of them by typing C-h I (for describe-input-method), and you can switch to one of them with C-u C-\ (for toggle-input-method with a prefix argument).
If you'd rather use your own combinations, you can define your own input method:
(quail-define-package
"my-ipa-symbols" "" "IPA" t
"My IPA input method
Documentation goes here."
nil t nil nil nil nil nil nil nil nil t)
(quail-define-rules
("uh" ?ə) ; UNSTRESSED SCHWA VOWEL
("uH" ?ʌ) ; STRESSED SCHWA VOWEL
;; add more combinations here
)
Evaluate that with eval-buffer or eval-region, and then switch to the newly created input method with C-u C-\ my-ipa-symbols.
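To port the whole digraph list rather than retyping it, the rules could be generated mechanically. The following is a minimal Python sketch, assuming every line has the form `digraph {two characters} {decimal code point}` as in the question. The name `digraph_to_quail_rule` is made up for this illustration, and characters that need escaping in Emacs Lisp (such as `"` or `\`) are not handled.

```python
import re

# Matches lines such as: digraph uh 601 " ə UNSTRESSED SCHWA VOWEL
DIGRAPH_RE = re.compile(r'^\s*digraph\s+(\S\S)\s+(\d+)')


def digraph_to_quail_rule(line):
    """Turn one Vim digraph line into a quail rule, or None if it does not match."""
    match = DIGRAPH_RE.match(line)
    if match is None:
        return None
    keys, code_point = match.group(1), int(match.group(2))
    # chr() converts the decimal code point into the character itself.
    return f'("{keys}" ?{chr(code_point)})'


if __name__ == "__main__":
    vim_lines = [
        'digraph uh 601 " UNSTRESSED SCHWA VOWEL',
        'digraph uH 652 " STRESSED SCHWA VOWEL',
    ]
    for vim_line in vim_lines:
        print(digraph_to_quail_rule(vim_line))
```

The printed rules can then be pasted into the quail-define-rules form in place of the ";; add more combinations here" comment.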
M-x insert-char will let you interactively search for a character to insert. Searching for 'schwa' brings up a set of different schwas to choose from.
For characters I've found I like to insert often, I've added keybindings for them like this:
(global-set-key (kbd "C-<down>") (lambda () (interactive) (insert "↓")))
where I just copy-and-pasted the character I want into that string. Looking at the docs, you should also be able to create a keybinding that calls insert-char with the name or the hex code of the character you want: https://www.gnu.org/software/emacs/manual/html_node/emacs/Inserting-Text.html
A nicer alternative to M-x insert-char is helm-ucs (or alternatively helm-unicode). This brings up a list of Unicode characters in a helm interface. You can enter words from the name in any order (e.g. "alpha small greek") to choose from characters matching those strings.
Note: helm-ucs takes a few seconds to load the first time it's used in a session, but helm-unicode doesn't suffer from this problem.

Sorting Czech in Perl

I have the following Perl program:
use 5.014_001;
use utf8;
use Unicode::Collate::Locale;
require 'Unicode/Collate/Locale/cs.pl';
binmode STDOUT, ':encoding(UTF-8)';
my @old_list = (
    "cash",
    "Cash",
    "cat",
    "Cat",
    "čash",
    "dash",
    "Dash",
    "Ďash",
    "database",
    "Database",
);
my $col = Unicode::Collate::Locale->new(
    level => 3,
    locale => 'cs',
    normalization => 'NFD',
);
my @list = $col->sort(@old_list);
foreach my $item (@list) {
    print $item, "\n";
}
This program prints out the output:
cash
Cash
cat
Cat
čash
dash
Dash
Ďash
database
Database
I believe that a careful observer would have to conclude that in Czech either
(1) č is a first-class letter while Ď is not, or
(2) the Unicode::Collate::Locale sorting of Czech in Perl is not correct.
I'd like to believe (1), and the following bolsters my case:
http://en.wiktionary.org/wiki/Index_talk:Czech
where it says:
Let us sort the entries by the existing Czech conventions, as far as practicable. That is, only the following characters have any sorting significance:
a b c č d e f g h ch i j k l m n o p q r ř s š t u v w x y z ž
But I'm confused, because I thought "D with a v over it" (and its lowercase equivalent) is a first-class letter of the Czech alphabet.
Where is @tchrist when I need him?
I'd appreciate any insights on this.
I have not yet seen a language that would correctly order Czech or Slovak words. (The Slovak alphabet is quite similar to the Czech one.) .NET, Java, Python, all get it wrong. The closest to the correct solution are Raku and Go.
Yes, in Czech and Slovak, the letter ď comes (right) after d. There are quite a few peculiarities, such as the digraphs ch, dz, dž.
#!/usr/bin/perl
use v5.30;
use warnings;
use utf8;
use Unicode::Collate::Locale;
use open ":std", ":encoding(UTF-8)";
my @words = qw/čaj auto pot márny kľak chyba drevo cibuľa džíp džem šum pól čučoriedka
banán čerešňa červený klam čierny tŕň pôst hôrny mat chobot cesnak kĺb mäta ďateľ
troska sýkorka elektrón fuj zem guma hora gejzír ihla pýr hrozno jazva džavot lom/;
my $col = Unicode::Collate::Locale->new(
    level => 3,
    locale => 'sk',
    normalization => 'NFKC',
);
my @sort_asc = $col->sort(@words);
say "@sort_asc";
The example sorts Slovak words; it contains plenty of challenges.
$ ./sort_accented_words.pl
auto banán cesnak cibuľa čaj čerešňa červený čierny čučoriedka ďateľ drevo
džavot džem džíp elektrón fuj gejzír guma hora hôrny hrozno chobot chyba
ihla jazva kľak klam kĺb lom márny mat mäta pól pot pôst pýr sýkorka šum
tŕň troska zem
Perl did not order the accented words correctly. Interestingly, it correctly ordered the words with ch, dz, dž digraphs.
#!/usr/bin/raku
my @words = <čaj auto pot márny kľak chyba drevo cibuľa džíp džem šum pól čučoriedka
banán čerešňa červený klam čierny tŕň pôst hôrny mat chobot cesnak kĺb mäta ďateľ
troska sýkorka elektrón fuj zem guma hora gejzír ihla pýr hrozno jazva džavot lom>;
say @words.sort({ .unival, .NFKD[0], .fc });
This is a Raku example.
./sort_words.raku
(auto banán cesnak chobot chyba cibuľa čaj čerešňa červený čierny čučoriedka
drevo džavot džem džíp ďateľ elektrón fuj gejzír guma hora hrozno hôrny ihla
jazva klam kĺb kľak lom mat márny mäta pot pól pôst pýr sýkorka šum troska
tŕň zem)
Accented words are correctly sorted but the ch, dz, and dž digraphs are wrong.
So in my opinion, unless we create our own solution, we won't get a 100% correct output in any programming language.
A locale is just a set of rules. Here's the locale for cs from Unicode::Collate::Locale 1.31. DUCET is the Default Unicode Collation Element Table.
The Ď may be a first class letter, but that's not what DUCET thinks. If you want different sorts, you can adjust your locale or supply your own.
+{
locale_version => 1.31,
entry => <<'ENTRY', # for DUCET v13.0.0
010D ; [.1FD7.0020.0002] # LATIN SMALL LETTER C WITH CARON
0063 030C ; [.1FD7.0020.0002] # LATIN SMALL LETTER C WITH CARON
010C ; [.1FD7.0020.0008] # LATIN CAPITAL LETTER C WITH CARON
0043 030C ; [.1FD7.0020.0008] # LATIN CAPITAL LETTER C WITH CARON
0063 0068 ; [.2076.0020.0002] # <LATIN SMALL LETTER C, LATIN SMALL LETTER H>
0063 0048 ; [.2076.0020.0007][.0000.0000.0002] # <LATIN SMALL LETTER C, LATIN CAPITAL LETTER H>
0043 0068 ; [.2076.0020.0007][.0000.0000.0008] # <LATIN CAPITAL LETTER C, LATIN SMALL LETTER H>
0043 0048 ; [.2076.0020.0008] # <LATIN CAPITAL LETTER C, LATIN CAPITAL LETTER H>
0159 ; [.2194.0020.0002] # LATIN SMALL LETTER R WITH CARON
0072 030C ; [.2194.0020.0002] # LATIN SMALL LETTER R WITH CARON
0158 ; [.2194.0020.0008] # LATIN CAPITAL LETTER R WITH CARON
0052 030C ; [.2194.0020.0008] # LATIN CAPITAL LETTER R WITH CARON
0161 ; [.21D3.0020.0002] # LATIN SMALL LETTER S WITH CARON
0073 030C ; [.21D3.0020.0002] # LATIN SMALL LETTER S WITH CARON
0160 ; [.21D3.0020.0008] # LATIN CAPITAL LETTER S WITH CARON
0053 030C ; [.21D3.0020.0008] # LATIN CAPITAL LETTER S WITH CARON
017E ; [.2287.0020.0002] # LATIN SMALL LETTER Z WITH CARON
007A 030C ; [.2287.0020.0002] # LATIN SMALL LETTER Z WITH CARON
017D ; [.2287.0020.0008] # LATIN CAPITAL LETTER Z WITH CARON
005A 030C ; [.2287.0020.0008] # LATIN CAPITAL LETTER Z WITH CARON
ENTRY
};
If the default sort is not working for you, this common workaround is an easy do-it-yourself:
Make a sort-array by transforming your strings: if a and á should be equivalent, transform both to a; if á should follow a, transform it into a{, for example (any character that sorts after z should be fine; note that [ only sorts after uppercase Z). Transform ch into h{, as it goes after h, if I understand correctly. Then sort the original array together with the sort-array.
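The workaround above can be sketched in Python as a sort key. This is only an illustration under stated assumptions, not a complete Czech collation: it treats č, ř, š, ž and ch as separate letters (following the alphabet quoted in the question), ignores every other diacritic, and uses `{` as the marker because it sorts after lowercase `z` in code-point order. The names `SEPARATE_LETTERS` and `czech_sort_key` are made up for the example.

```python
import unicodedata

# Letters assumed to sort as separate letters. Each maps to its base letter
# plus "{", which comes after "z" in code-point order.
SEPARATE_LETTERS = {
    "\u010d": "c{",  # c with caron
    "\u0159": "r{",  # r with caron
    "\u0161": "s{",  # s with caron
    "\u017e": "z{",  # z with caron
}


def czech_sort_key(word):
    """Build a key whose plain code-point order approximates Czech order."""
    text = unicodedata.normalize("NFC", word).lower()
    text = text.replace("ch", "h{")  # the digraph "ch" sorts right after "h"
    parts = []
    for char in text:
        if char in SEPARATE_LETTERS:
            parts.append(SEPARATE_LETTERS[char])
        else:
            # Drop the remaining diacritics so that d with caron is keyed like d.
            decomposed = unicodedata.normalize("NFD", char)
            parts.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(parts)


words = ["chata", "cibule", "čaj", "hora", "ihla", "dům", "ďábel", "cesta"]
print(sorted(words, key=czech_sort_key))
```

Words whose keys are identical (for example two words differing only in case or in an ignored diacritic) keep their input order, because Python's sort is stable; a real implementation would add a tie-breaking level.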
Despite Czech being my native language, I don't know Czech collation perfectly. But surely, for ď, ť, ň and vowels with diacritics, the diacritics have a lower significance than for other Czech characters like č.
Why? This is related to pronunciation. Barring assimilation and non-native words, all consonants but d, t and n have clear pronunciation regardless of their context. (“Ch” is considered as a separate letter.) Those three letters (D, T and N) can be “softened” when they are followed by “i”, “í” or “ě”. In those cases, they are pronounced as if they had a caron (háček). As a result, the diacritics on them are less significant.

Remove only single spaces in text file with sed, perl, awk, tr or anything

I have a rather large text file where there is an extra space between every character:
I t l o o k s l i k e t h i s .
I'd like to remove those extra characters so
It looks like this.
via the Linux terminal.
I can't seem to find any way to do this without removing all of the whitespace. I'm willing to try any solution at this point. I'd appreciate any nudge in the right direction.
$ echo 'I t l o o k s l i k e t h i s . ' | sed 's/\(.\) /\1/g'
It looks like this.
Are you certain that the intermediate characters are spaces? It is most likely that this is a UTF-16 file.
I suggest you use a capable editor to open it as such and convert it to UTF-8.
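If an editor is not convenient for a large file, the conversion could also be scripted. Here is a minimal Python sketch, assuming the file really is UTF-16 and starts with a byte order mark; the function name `utf16_to_utf8` and its parameters are made up for this example. Without a byte order mark, the `src_encoding` argument would have to be `"utf-16-le"` or `"utf-16-be"` instead.

```python
import shutil


def utf16_to_utf8(src_path, dst_path, src_encoding="utf-16"):
    """Re-encode a text file as UTF-8, copying it in chunks."""
    # newline="" keeps the original line endings untouched on both sides.
    with open(src_path, encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        # copyfileobj reads and writes fixed-size chunks, so a large file
        # never has to fit in memory at once.
        shutil.copyfileobj(src, dst)
```

Calling `utf16_to_utf8("/path/to/file.txt", "/path/to/new.txt")` would write the converted copy next to the original rather than overwriting it.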
An awk solution:
echo "I t l o o k s l i k e t h i s ." | awk '{for (i=1;i<=NF;i+=2) printf $i;print ""}' FS=""
It looks like this.
As long as it's every other character you want to get rid of, you can use Python.
>>> s = "I t l o o k s l i k e t h i s ."
>>> print(s[0::2])
It looks like this.
If you wanted to do this for the text file, do the following:
with open("/path/to/file.txt") as f:
    lines = f.readlines()
# Open the output file for writing; the default mode is read-only.
with open("/path/to/new.txt", "w") as g:
    for line in lines:
        # Strip the newline first so the slice only sees the text itself.
        g.write(line.rstrip("\n")[0::2] + "\n")
perl -pe 's|( +)| " " x (length($1) > 1) |ge' file

How does uʍop-ǝpᴉsdn text work?

Here's a website I found that will produce upside down versions of any English text.
How does it work? Does Unicode have upside-down chars? Or what?
How can I write my own text flipping function?
how does it work? does unicode have upside down chars?
Unicode does have upside-down characters. They have "TURNED" in their name:
ƍ U+018D LATIN SMALL LETTER TURNED DELTA
Ɯ U+019C LATIN CAPITAL LETTER TURNED M
ǝ U+01DD LATIN SMALL LETTER TURNED E
Ʌ U+0245 LATIN CAPITAL LETTER TURNED V
ɐ U+0250 LATIN SMALL LETTER TURNED A
ɒ U+0252 LATIN SMALL LETTER TURNED ALPHA
ɥ U+0265 LATIN SMALL LETTER TURNED H
ɯ U+026F LATIN SMALL LETTER TURNED M
ɰ U+0270 LATIN SMALL LETTER TURNED M WITH LONG LEG
ɹ U+0279 LATIN SMALL LETTER TURNED R
ɺ U+027A LATIN SMALL LETTER TURNED R WITH LONG LEG
ɻ U+027B LATIN SMALL LETTER TURNED R WITH HOOK
ʇ U+0287 LATIN SMALL LETTER TURNED T
ʌ U+028C LATIN SMALL LETTER TURNED V
ʍ U+028D LATIN SMALL LETTER TURNED W
ʎ U+028E LATIN SMALL LETTER TURNED Y
ʞ U+029E LATIN SMALL LETTER TURNED K
ʮ U+02AE LATIN SMALL LETTER TURNED H WITH FISHHOOK
ʯ U+02AF LATIN SMALL LETTER TURNED H WITH FISHHOOK AND TAIL
ʴ U+02B4 MODIFIER LETTER SMALL TURNED R
ʵ U+02B5 MODIFIER LETTER SMALL TURNED R WITH HOOK
ʻ U+02BB MODIFIER LETTER TURNED COMMA
̒ U+0312 COMBINING TURNED COMMA ABOVE
ჹ U+10F9 GEORGIAN LETTER TURNED GAN
ᴂ U+1D02 LATIN SMALL LETTER TURNED AE
ᴈ U+1D08 LATIN SMALL LETTER TURNED OPEN E
ᴉ U+1D09 LATIN SMALL LETTER TURNED I
ᴔ U+1D14 LATIN SMALL LETTER TURNED OE
ᴚ U+1D1A LATIN LETTER SMALL CAPITAL TURNED R
ᴟ U+1D1F LATIN SMALL LETTER SIDEWAYS TURNED M
ᵄ U+1D44 MODIFIER LETTER SMALL TURNED A
ᵆ U+1D46 MODIFIER LETTER SMALL TURNED AE
ᵌ U+1D4C MODIFIER LETTER SMALL TURNED OPEN E
ᵎ U+1D4E MODIFIER LETTER SMALL TURNED I
ᵚ U+1D5A MODIFIER LETTER SMALL TURNED M
ᵷ U+1D77 LATIN SMALL LETTER TURNED G
ᶛ U+1D9B MODIFIER LETTER SMALL TURNED ALPHA
ᶣ U+1DA3 MODIFIER LETTER SMALL TURNED H
ᶭ U+1DAD MODIFIER LETTER SMALL TURNED M WITH LONG LEG
ᶺ U+1DBA MODIFIER LETTER SMALL TURNED V
℩ U+2129 TURNED GREEK SMALL LETTER IOTA
Ⅎ U+2132 TURNED CAPITAL F
⅁ U+2141 TURNED SANS-SERIF CAPITAL G
⅂ U+2142 TURNED SANS-SERIF CAPITAL L
⅄ U+2144 TURNED SANS-SERIF CAPITAL Y
⅋ U+214B TURNED AMPERSAND
ⅎ U+214E TURNED SMALL F
⌙ U+2319 TURNED NOT SIGN
❛ U+275B HEAVY SINGLE TURNED COMMA QUOTATION MARK ORNAMENT
❝ U+275D HEAVY DOUBLE TURNED COMMA QUOTATION MARK ORNAMENT
⦢ U+29A2 TURNED ANGLE
Ɐ U+2C6F LATIN CAPITAL LETTER TURNED A
ⱹ U+2C79 LATIN SMALL LETTER TURNED R WITH TAIL
ⱻ U+2C7B LATIN LETTER SMALL CAPITAL TURNED E
Ꝿ U+A77E LATIN CAPITAL LETTER TURNED INSULAR G
ꝿ U+A77F LATIN SMALL LETTER TURNED INSULAR G
Ꞁ U+A780 LATIN CAPITAL LETTER TURNED L
ꞁ U+A781 LATIN SMALL LETTER TURNED L
However, it's far from a complete set. Most upside-down text works by choosing characters that happen to have a close-enough resemblance to upside-down letters. It's the equivalent of typing 0.7734 on your calculator to spell "hELLO".
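A list like the one above can be regenerated with Python's unicodedata module. This is a small sketch, assuming a plain substring match on "TURNED" is good enough; the exact set depends on the Unicode version bundled with the Python interpreter, and the function name `turned_characters` is made up for the example.

```python
import sys
import unicodedata


def turned_characters():
    """Return (character, name) pairs for code points whose name contains 'TURNED'."""
    found = []
    for code_point in range(sys.maxunicode + 1):
        char = chr(code_point)
        name = unicodedata.name(char, "")  # unnamed code points yield ""
        if "TURNED" in name:
            found.append((char, name))
    return found


# Show only the first few entries to keep the output short.
for char, name in turned_characters()[:5]:
    print(f"{char} U+{ord(char):04X} {name}")
```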
does unicode have upside down chars?
Yup! Or at least characters that look like they are upside down. Also, regular English-alphabetical characters can appear to be upside down. Like u could be an upside-down n.
To code it up, you just have to take an array of characters, display them in reverse order and replace those characters with the upside down version of them. This will get you a good start: zʎxʍʌnʇsɹbdouɯןʞſıɥbɟǝpɔqɐ
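The recipe above (reverse the string, then swap each letter for its look-alike) could be sketched in Python as follows. The mapping follows the flipped alphabet quoted above, written with escapes to avoid copy-and-paste problems; `FLIP_TABLE` and `flip_text` are names invented for this example, and uppercase letters are simply lowercased first.

```python
# Letters without an entry (o, s, x, z) look the same when rotated, so they
# pass through unchanged, as does punctuation.
FLIP_TABLE = str.maketrans({
    "a": "\u0250", "b": "q", "c": "\u0254", "d": "p", "e": "\u01dd",
    "f": "\u025f", "g": "b", "h": "\u0265", "i": "\u0131", "j": "\u017f",
    "k": "\u029e", "l": "\u05df", "m": "\u026f", "n": "u", "p": "d",
    "q": "b", "r": "\u0279", "t": "\u0287", "u": "n", "v": "\u028c",
    "w": "\u028d", "y": "\u028e",
})


def flip_text(text):
    """Reverse the text and replace each letter with an upside-down look-alike."""
    return text.lower()[::-1].translate(FLIP_TABLE)


print(flip_text("upside-down"))
```

Because both g and q map to b in this table, the transformation cannot be undone exactly; a table with distinct targets would be needed for that.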
When 'uʍop-ǝpısdn' is copied and echoed into a hex dump program, the string is seen as:
75 CA 8D 6F 70 2D C7 9D 70 C4 B1 73 64 6E
The UTF-8 breakdown of that is:
0x75 = U+0075 = LATIN SMALL LETTER U
0xCA 0x8D = U+028D = LATIN SMALL LETTER TURNED W
0x6F = U+006F = LATIN SMALL LETTER O
0x70 = U+0070 = LATIN SMALL LETTER P
0x2D = U+002D = HYPHEN-MINUS
0xC7 0x9D = U+01DD = LATIN SMALL LETTER TURNED E
0x70 = U+0070 = LATIN SMALL LETTER P
0xC4 0xB1 = U+0131 = LATIN SMALL LETTER DOTLESS I
0x73 = U+0073 = LATIN SMALL LETTER S
0x64 = U+0064 = LATIN SMALL LETTER D
0x6E = U+006E = LATIN SMALL LETTER N
They are just Unicode characters.
Look at the source of the web page:
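The same breakdown can be reproduced without a separate hex dump program. A short Python sketch, assuming the bytes are valid UTF-8 as stated above (the helper name `describe` is made up for the example):

```python
import unicodedata

# The bytes from the hex dump above.
hex_dump = "75 CA 8D 6F 70 2D C7 9D 70 C4 B1 73 64 6E"
text = bytes.fromhex(hex_dump).decode("utf-8")


def describe(text):
    """Return one line per character: UTF-8 bytes, code point and Unicode name."""
    lines = []
    for char in text:
        encoded = " ".join(f"0x{byte:02X}" for byte in char.encode("utf-8"))
        lines.append(f"{encoded} = U+{ord(char):04X} = {unicodedata.name(char)}")
    return lines


for line in describe(text):
    print(line)
```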
function flip() {
var result = flipString(document.f.original.value);
document.f.flipped.value = result;
}
function flipString(aString) {
aString = aString.toLowerCase();
var last = aString.length - 1;
var result = "";
for (var i = last; i >= 0; --i) {
result += flipChar(aString.charAt(i))
}
return result;
}
function flipChar(c) {
if (c == 'a') {
return '\u0250'
}
else if (c == 'b') {
return 'q'
}
else if (c == 'c') {
return '\u0254' //Open o -- copied from pne
There is the "upsidedown" Python module: https://pypi.org/project/upsidedown/. It supports non-English characters too.