What LaTeX command creates an emdash? - unicode

I know I can create an em-dash with --- (and an en-dash with --). However, I want to use these glyphs in my Unicode setup, and trying it as follows:
\DeclareUnicodeCharacter{2012}{--}
\DeclareUnicodeCharacter{2013}{--}
\DeclareUnicodeCharacter{2014}{---}
simply yields a series of two or three dashes in the output. What should I use instead? I tried \endash and \ndash, but those are not known commands.

Use \textemdash and \textendash, for example \DeclareUnicodeCharacter{2014}{\textemdash}.

Related

Unicode variable labels

I have a dataset but its variable labels are in unicode like this:
I tried typing:
unicode
However, this simply displays the following:
How can I correctly display the unicode?
Or, at least, is there any way to view the labels using another program?
Assuming your data are in a file data.dta in your working directory (where Stata should point):
clear
unicode encoding set euc-kr
unicode translate data.dta
Type help encodings from Stata's command prompt for details regarding the different formats.
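To illustrate what the translation step does conceptually, here is a minimal Python sketch (not Stata code). It assumes the labels were stored as EUC-KR bytes, as the answer above assumes, and that they were displayed by software reading them as Latin-1; the sample label and both function names are hypothetical.

```python
# Hypothetical label; the real labels from the question are not shown.
SAMPLE_LABEL = "한국어"


def misread(text, stored_as="euc-kr", read_as="latin-1"):
    """Simulate software decoding legacy-encoded bytes with the wrong encoding."""
    return text.encode(stored_as).decode(read_as)


def translate(garbled, stored_as="euc-kr", read_as="latin-1"):
    """Undo the misreading: recover the raw bytes, then decode them correctly."""
    return garbled.encode(read_as).decode(stored_as)


# The garbled form differs from the label, and translating it restores the label.
assert misread(SAMPLE_LABEL) != SAMPLE_LABEL
assert translate(misread(SAMPLE_LABEL)) == SAMPLE_LABEL
```

If the recovered text still looks wrong, the stored encoding was probably not the one assumed, which is why picking the right name from the encodings list matters.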

Which writing systems does a given character set cover?

What is the simplest way to figure out which writing systems (as in, Latin, Hebrew, Arabic, Katakana, Chinese characters) are supported by a given set of Unicode characters?
Inspect the Script and Script_Extensions properties of each character in the set, as documented in UAX #24.
Scripts and Blocks:
Unicode characters are divided into non-overlapping ranges called
blocks [Blocks]. Many of these blocks have a name derived from
a script name, because characters of that script are primarily encoded
in that block. However, blocks and scripts differ in the following
ways:
Blocks are simply ranges, and often contain code points that are unassigned.
Characters from the same script may be encoded in several different blocks.
Characters from different scripts may be encoded in the same block.
As a result, using the block names as simplistic substitute for script
identity generally leads to poor results. For example, see Annex A,
Character Blocks, in Unicode Technical Standard #18, "Unicode Regular
Expressions" [UTS18].
In the latter document [UTS18], pay particular attention to Writing Systems Versus Blocks in Annex A: Character Blocks.
At this point I’m leaning towards testing if enough glyphs from a script appear in the character set.
The approach would require two preparatory steps:
(1) Put together a set of writing systems (scripts) supported by Unicode.
(2) For each script, define a character set containing characters of that script.
Then I can answer the question “does character set A support script X” with the test “are enough characters of script X’s character set also members of character set A”. If I do that for every script from step (1), I get a list of supported scripts.
The link provided by 一二三 references a data file that maps Unicode characters to their respective scripts, which would be invaluable in steps (1) and (2).
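The coverage test described above could be sketched roughly as follows in Python. This is only an illustration: the embedded excerpt imitates the line format of the Unicode Scripts.txt data file but contains just four ranges, the function names are hypothetical, and the 0.9 threshold is an arbitrary stand-in for "enough".

```python
import re
from collections import defaultdict

# Tiny excerpt in the line format of the Unicode Scripts.txt data file.
# Illustrative only: the real file lists far more ranges per script.
SCRIPTS_EXCERPT = """\
# comment lines like this one are skipped
0041..005A    ; Latin # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0061..007A    ; Latin # L&  [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
05D0..05EA    ; Hebrew # Lo  [27] HEBREW LETTER ALEF..HEBREW LETTER TAV
30A1..30FA    ; Katakana # Lo  [90] KATAKANA LETTER SMALL A..KATAKANA LETTER VO
"""

# Matches "XXXX ; Script" and "XXXX..YYYY ; Script" at the start of a line.
LINE_RE = re.compile(r"^([0-9A-F]+)(?:\.\.([0-9A-F]+))?\s*;\s*(\w+)")


def parse_scripts(text):
    """Steps (1) and (2): map each script name to its set of code points."""
    scripts = defaultdict(set)
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # blank line or comment
        start = int(match.group(1), 16)
        end = int(match.group(2) or match.group(1), 16)
        scripts[match.group(3)].update(range(start, end + 1))
    return dict(scripts)


def supported_scripts(charset, scripts, threshold=0.9):
    """Return {script: coverage} for scripts covered at least `threshold`."""
    code_points = {ord(ch) for ch in charset}
    result = {}
    for name, members in scripts.items():
        coverage = len(members & code_points) / len(members)
        if coverage >= threshold:
            result[name] = coverage
    return result
```

With the full data file in place of the excerpt, the threshold would need tuning per script, since large scripts such as Han are rarely covered completely by any one character set.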

How to escape the '|' character in an org-mode table

I am building some tables in org-mode and I need to enter "||" into the table (for the logical OR command) and nothing I try turns the two characters off as table constructors.
I've tried single quotes, double quotes, backticks and prefacing them with '\'. I've also tried every permutation of using ':=' to get a literal string and they don't work.
// Tony Williams
Depending on what you want to do with the output of the table, you could use alternative unicode characters that look like vertical pipes (or double vertical pipes). Examples:
This is the pipe character written twice (as for logical OR):
||
These are similar-looking (or at least not too different) Unicode characters:
‖ - ¦¦ - ❘❘
Of course, this won't work for you if you are not just interested in the looks (but escaping pipes would not work either).
See here for more Unicode characters you might prefer to the three above.
It turns out that you can use HTML entities in org-mode tables for output via pandoc.
\vert{} doesn't work, but I went to the table pointed to by MrSpock and tried the HTML entity, and the output when run through pandoc was perfect: writing the entity for the vertical bar twice (for example &#124;&#124;) gives me '||'. I also tested a few other HTML entities and they worked fine too.
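If the table is generated by a script, the substitution can be done while building each row. The Python helper below is a hypothetical sketch: the function name is made up, and the default entity &#124; is an assumption that only makes sense when the exporter (pandoc, in this answer) renders HTML entities.

```python
def org_table_row(cells, pipe_entity="&#124;"):
    """Build one org-mode table row, replacing literal '|' inside the cells.

    pipe_entity is an assumption: pass "\\vert{}" or a look-alike Unicode
    character instead if that is what the export target understands.
    """
    escaped = [str(cell).replace("|", pipe_entity) for cell in cells]
    return "| " + " | ".join(escaped) + " |"
```

Calling org_table_row(["a || b", "logical OR"]) produces a row whose only literal pipes are the column separators.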
Well, if the goal is to export your notes, then
$\lvert\mathbb{N}\rvert$
would be an equivalent of
$|\mathbb{N}|$
Character is: \vert
Example: a \vert\ b -> a | b

How to produce Unicode characters with the Matlab LaTeX interpreter

I have the following line of code
ylabel('Średnia wartość parametru $f_{max}$','Interpreter','latex');
and would like to use it as a label for my plot. Unfortunately what I actually get is:
Warning: Unable to interpret LaTeX string
If I remove Unicode characters like so:
ylabel('Srednia wartosc parametru $f_{max}$','Interpreter','latex');
it works with no problem.
So how could I make Matlab print those unfortunate characters?
Use LaTeX representations for those characters: \'S, \'s, etc. And don't forget to duplicate quote signs within the string:
ylabel('\''Srednia warto\''s\''c parametru $f_{max}$','Interpreter','latex')
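For longer labels the same substitution can be automated. The following Python sketch is one possible approach, not part of the original answer: it decomposes each character with unicodedata and maps a handful of combining marks to LaTeX accent commands, then doubles the quote signs for a Matlab string. The function names and the accent table are assumptions, and letters outside the table (for example ł or ę) are passed through unchanged.

```python
import unicodedata

# Combining mark -> LaTeX accent command (deliberately incomplete).
ACCENTS = {
    "\u0301": "\\'",  # acute, as in ś and ć
    "\u0300": "\\`",  # grave
    "\u0302": "\\^",  # circumflex
    "\u0308": '\\"',  # diaeresis
    "\u0303": "\\~",  # tilde
}


def to_latex_accents(text):
    """Replace letters carrying one known accent with \\<accent><letter>."""
    out = []
    for ch in text:
        decomposed = unicodedata.normalize("NFD", ch)
        if len(decomposed) == 2 and decomposed[1] in ACCENTS:
            out.append(ACCENTS[decomposed[1]] + decomposed[0])
        else:
            out.append(ch)  # plain or unsupported character: keep as-is
    return "".join(out)


def matlab_quote(text):
    """Wrap text in single quotes, doubling any quote signs inside it."""
    return "'" + text.replace("'", "''") + "'"
```

Applied to the label from the question, matlab_quote(to_latex_accents(...)) reproduces the hand-written string in the ylabel call above.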

How to use Unicode characters in a vim script?

I'm trying to get vim to display my tabs as ⇥ so they cannot be mistaken for actual characters. I'd hoped the following would work:
if has("multi_byte")
set lcs=tab:⇥
else
set lcs=tab:>-
endif
However, this gives me
E474: Invalid argument: lcs=tab:⇥
The file is UTF-8 encoded and includes a BOM.
Googling "vim encoding" or similar gives me many results about the encoding of edited files, but nothing about the encoding of executed scripts. How to get this character into my .vimrc so that it is properly displayed?
The tab setting requires two characters. From :help listchars:
tab:xy Two characters to be used to show a tab. The first
char is used once. The second char is repeated to
fill the space that the tab normally occupies.
"tab:>-" will show a tab that takes four spaces as
">---". When omitted, a tab is show as ^I.
Something like :set lcs=tab:⇥- works but kind of defeats your purpose as it results in tabs that look like ⇥--- instead of ---⇥ which I'm assuming is probably what you wanted.
Try:
set lcs=tab:⇥\
Be sure to put a space after the '\': the backslash escapes that space, so the space becomes the second (fill) character.
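The rule quoted from :help listchars (first character once, second character repeated to fill the width) can be mimicked in a few lines of Python. This is only a model of the display rule for building intuition, not anything Vim itself runs, and render_tab is a hypothetical name.

```python
def render_tab(first, fill, width):
    """Model how 'tab:xy' displays a tab occupying `width` screen columns."""
    if width < 1:
        raise ValueError("a displayed tab occupies at least one column")
    return first + fill * (width - 1)


# The example from the help text: "tab:>-" on a four-column tab.
assert render_tab(">", "-", 4) == ">---"
```

With an escaped space as the fill character, render_tab("⇥", " ", 4) yields the arrow followed by three blanks, which matches what the suggested setting shows.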