MediaWiki cannot find 'three' in search - Unicode

I have two installations of MediaWiki on two different hosting services. When I search for 'three', it is found in page names and shown in the search dropdown while I am typing, but the actual search finds it in neither the page name nor the body.
If I search for 'three a' it lists pages with 'a', and then bolds 'three' on the listing.
https://2ndbook.org/w/index.php?search=Three&title=Special%3ASearch&profile=all&fulltext=1
https://sensusplenior.net/wiki/index.php?title=Special%3ASearch&profile=advanced&fulltext=Search&search=Three&ns0=1&ns6=1&ns8=1&ns10=1&ns14=1&profile=advanced
I use a Hebrew font and switch to it while typing using Win+space.
I don't have command line access to either installation.
I have tried deleting pages and recreating them. Adding 'three' to random pages.
Perhaps there is a Unicode character in the body of a page which kills the search?
Added:
I have a Terminology page that stressed the limits of transclusions. Each single and double letter Hebrew entry had four or more transclusions. I have since emptied that page, and it did not resolve the issue.

'Three' is on the list of MySQL full-text stop words, so the search index ignores it. The fix is here:
https://dev.mysql.com/doc/refman/5.6/en/fulltext-stopwords.html
And it is a reminder that black swan events are built into the invisible assumptions that come along with all the tech we adopt without fully understanding it.
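A minimal Python sketch of the kind of filtering that causes this, under stated assumptions: the stopword set is only a small illustrative subset (the real list is on the linked MySQL page), the minimum word length of 4 is MySQL's default `ft_min_word_len` for MyISAM full-text indexes, and `is_indexed` is a hypothetical helper, not part of MediaWiki or MySQL.

```python
# Illustrative subset only (assumption); the real default list is on the linked MySQL page.
SAMPLE_STOPWORDS = {"the", "three", "through", "two"}

# Default ft_min_word_len for MyISAM full-text indexes.
MIN_WORD_LEN = 4


def is_indexed(term, stopwords=SAMPLE_STOPWORDS, min_len=MIN_WORD_LEN):
    """Hypothetical helper: True if a full-text index with these settings keeps the term."""
    word = term.lower()
    return len(word) >= min_len and word not in stopwords


for term in ("Three", "Hebrew", "two"):
    print(term, "->", "indexed" if is_indexed(term) else "ignored")
```

A term that is ignored at indexing time can never match in a full-text query, no matter how many pages contain it, which is consistent with the symptoms in the question.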

Related

ICU Message Format and Interaction / Links

In my application I use the ICU message format to localize user-visible strings for different languages. A relevant (contrived) example would be:
(EN) Click on this link to find out more.
(DE) Dieser Link führt zu weiteren Informationen.
The issue I ran into is interactivity and styling for this sentence. I want the bold fragments to be clickable links in the sentence and style them differently.
The go-to solution for such cases would be to separate the bold words and make them a separate ICU message and handle this message separately in the app (to apply interactivity and styling), perhaps like a button. The problem is how this should be implemented in the context of a sentence, since the language prescribes different number and order of sentence-fragments:
(EN) (Click on) (this link) (to find out more)
(DE) (Dieser Link) (führt zu weiteren Informationen)
I could of course feed the string of the link as a parameter into an ICU message containing the surrounding sentence to obtain the sentence as one string, but then I can't apply the link-action or styling in-app, since I end up with one sentence.
What is a solution to this problem? For context: The app is written in flutter targeting mobile.
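One possible approach, sketched here rather than taken from the question: keep the whole sentence as a single translatable message, let the translator mark the clickable fragment with a placeholder tag, and split the formatted string into segments at render time. The `<link>` tag, `Segment`, and `split_message` are all hypothetical names, and the sketch is in Python only to keep the logic language-neutral; in Flutter each segment could map to its own text span (for example a `TextSpan` with a tap recognizer).

```python
import re
from typing import List, NamedTuple


class Segment(NamedTuple):
    text: str
    is_link: bool


# Hypothetical markup: translators wrap the clickable fragment in <link>...</link>.
_LINK_RE = re.compile(r"<link>(.*?)</link>")


def split_message(message: str) -> List[Segment]:
    """Split a localized message into plain and link segments, in sentence order."""
    segments = []
    pos = 0
    for match in _LINK_RE.finditer(message):
        if match.start() > pos:
            segments.append(Segment(message[pos:match.start()], False))
        segments.append(Segment(match.group(1), True))
        pos = match.end()
    if pos < len(message):
        segments.append(Segment(message[pos:], False))
    return segments


MESSAGES = {
    "en": "Click on <link>this link</link> to find out more.",
    "de": "<link>Dieser Link</link> führt zu weiteren Informationen.",
}

for locale, message in MESSAGES.items():
    print(locale, split_message(message))
```

Because the tag travels inside the translated string, each language decides where the link sits and how many fragments surround it. Nested tags are not handled in this sketch.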

Microsoft Keyboard Layout Creator: diacritics shown wrong after install

I'm trying to make a keyboard for myself in Microsoft Keyboard Layout Creator based on German with two additional diacritics: a caron (wedge above the character) and a macron (dash above the character). I'm defining these as dead keys. Everything works fine in "Test Keyboard Layout".
Once I build and install, it works fine for the macron (dash) but the carons are displayed as breves, i.e. instead of a little wedge over the vowels, I get a little semi-circle.
I do get errors like the one below in the verification log, but a) in the actual log, the character displays fine (i.e. the issue isn't encoding, which is set to UTF-8) and b) I also get that same error for the macron but that one works.
Here's the error:
"The dead key ̌ (U+030c) when combined with I (U+0049) returns Ǐ (U+01cf), but Ǐ (U+01cf) is not on the default system code page (1252) of the German (Germany) language you specified. This may cause compatibility problems in non-Unicode applications."
I also tried setting the language of my custom keyboard to US and a bunch of other languages in case that made a difference, but it looks like US and DE use the same code page, so it shouldn't matter.
My suspicion is that it has something to do with the limit of the language code page or unicode range or something, but then again someone seemed to have managed to get it working with a US base (same code page), so... I don't know. (And then why can I copy/paste it from other texts and insert it from the character map? Is the issue maybe directly in the keyboard driver?)
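A small Python sketch for checking what the verification log is reporting, assuming Python's `cp1252` codec is a fair stand-in for the Windows code page named in the warning. It composes a base letter with a combining mark and tests whether the result fits in the code page; `compose` and `in_code_page` are hypothetical helper names.

```python
import unicodedata

CARON, BREVE, MACRON = "\u030c", "\u0306", "\u0304"


def compose(base, combining_mark):
    """Return the precomposed (NFC) form of base + combining mark."""
    return unicodedata.normalize("NFC", base + combining_mark)


def in_code_page(char, codec="cp1252"):
    """True if the character can be encoded in the given legacy code page."""
    try:
        char.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


for mark in (CARON, BREVE, MACRON):
    result = compose("I", mark)
    print(f"I + U+{ord(mark):04X} -> {result} U+{ord(result):04X} "
          f"({unicodedata.name(result)}), in cp1252: {in_code_page(result)}")
```

The caron and macron results are both outside code page 1252, which matches the observation that the same warning appears for both diacritics. It also shows that the caron (U+030C) and the breve (U+0306) are distinct code points, so comparing the code point actually produced by the installed layout against U+01CF may help narrow down where the substitution happens.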
All thoughts are appreciated!
EDIT: It's now the next day and now even the dashed characters don't work anymore! Re-installed the keyboard, nothing ò_ó

Is there a "n/a" symbol in unicode?

Is there a Unicode symbol for "n/a"? There are some fractions like ½, but an n/a symbol seems to be missing.
If there is none, what would be the most appropriate Unicode symbol to use for n/a on a website (one that is contained in common fonts, to avoid needing a webfont)?
Looking at the Unicode code charts, I do not see a single N/A symbol. I do, however, see ⁿ (U+207F) and ₐ (U+2090), which you could separate with / (U+002F) eg: ⁿ/ₐ, or ̷ (U+0337), eg: ⁿ̷ₐ, or ̸ (U+0338), eg: ⁿ̸ₐ. Probably not what you are hoping for, though. And I don't know if "common" fonts implement them, either.
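A short Python sketch that builds the three candidates from their code points and lists what each one is made of. `build_na` is a hypothetical helper, and whether a given font renders the result well is not something the code can check.

```python
import unicodedata

SUPERSCRIPT_N = "\u207f"  # ⁿ
SUBSCRIPT_A = "\u2090"    # ₐ

# Plain solidus plus the two combining solidus overlays mentioned above.
SEPARATORS = ["\u002f", "\u0337", "\u0338"]


def build_na(separator):
    """Join superscript n and subscript a with the given separator."""
    return SUPERSCRIPT_N + separator + SUBSCRIPT_A


for sep in SEPARATORS:
    candidate = build_na(sep)
    parts = [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in candidate]
    print(candidate, parts)
```

Note that U+0337 and U+0338 are combining marks, so they attach to the preceding ⁿ instead of sitting between the two letters.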
For future reference, the fastest way I know to answer questions like the OP's when I have them myself is to go to unicodelookup.com, because of the way it works: there's a search bar at the top, and you just type a string and it will return any and all unicode characters containing that string (this is also a great way to discover new and useful symbols). So in the OP's case, he could proceed like this:
1. first try entering "not" (without the quotes) in the search field
2. visually scan through the results... doing so would not reveal a "not applicable" character in this case
3. try again but this time entering "applic" in the search field
4. again, doing so would not turn up anything along the lines of what he's looking for
At that point he would be reasonably confident the current Unicode standard does not have a "n/a" symbol.
If you use Firefox you can define a keyword like "uni" to search that site from the URL bar, meaning any time the browser is open and regardless of what page or site is currently showing, you could do this:
1. hit [F6]... this moves the cursor to the URL bar at the top
2. type something like "uni applic" and hit [Enter]... this brings up the unicodelookup.com website with the search results for "applic" already showing
For the above to work you would need to define your keyword ("uni" or whatever you prefer) to point to the location http://unicodelookup.com/#%s.
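The same kind of name search can be approximated offline. This is a rough Python sketch using the standard `unicodedata` module, not the website's own implementation; `search_names` is a hypothetical helper, and the results depend on the Unicode version bundled with the Python interpreter.

```python
import sys
import unicodedata


def search_names(fragment):
    """Return (code point, name) pairs whose character name contains the fragment."""
    needle = fragment.upper()
    hits = []
    for cp in range(sys.maxunicode + 1):
        # Unnamed code points (controls, surrogates, unassigned) yield "".
        name = unicodedata.name(chr(cp), "")
        if needle in name:
            hits.append((cp, name))
    return hits


for cp, name in search_names("negative acknowledge"):
    print(f"U+{cp:04X} {chr(cp)} {name}")
```

Scanning the whole code space takes a moment but needs no network access, and the fragment can be any part of a character name.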
There's a Negative Acknowledge icon:
␕ SYMBOL FOR NEGATIVE ACKNOWLEDGE (U+2415; decimal 9237, octal 022025, hex 0x2415)
Found by searching for "negative" on the Unicode Lookup site.
I'm not a fan, and for my purposes have just gone with __N/A__ (Markdown..)
I see lots of answers going head-on at the "Not Applicable" abbreviation, without exploring what a symbol is. A quick search for the equivalent phrase "out of scope" brings up a couple of variations on the No symbol: ⃠ – this seems to fit the bill (and since I was looking for a way to represent inapplicability, I'll be using it in my technical document).
Per the Wikipedia article, the Unicode codepoint U+20E0 is a combining character, so it is superimposed on the preceding character; e.g. !⃠ overlays an exclamation point. To get it to appear isolated, use a non-breaking space as the preceding character.
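A minimal Python sketch of both uses, building the strings from explicit code points so that no invisible character gets lost in copy and paste. `prohibit` is a hypothetical helper name.

```python
import unicodedata

ENCLOSING_NO = "\u20e0"  # the combining "No symbol" mark
NBSP = "\u00a0"          # non-breaking space, used as an invisible base


def prohibit(base):
    """Append the combining mark so it is drawn over the given base character."""
    return base + ENCLOSING_NO


overlaid = prohibit("!")   # mark drawn over an exclamation point
isolated = prohibit(NBSP)  # mark drawn on its own

print(unicodedata.name(ENCLOSING_NO), unicodedata.category(ENCLOSING_NO))
print(repr(overlaid), overlaid)
print(repr(isolated), isolated)
```

How well the overlay lines up with the base character still depends on the font and renderer.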
If you don't want to bother with the combining symbol, the article mentions there's also an emoji U+1F6AB 🚫 but it's typically going to be colored red, or won't render!
There's actually a single character that could be repurposed for this: the "Square Na" character ㎁ (U+3381), which is used to represent the nanoampere in fullwidth (CJK) scripts.
What about the "SYMBOL FOR NULL" ␀ (U+2400)?

Facebook search manipulates keywords

When using the Facebook Graph API to search through public posts, it seems like Facebook alters search terms in ways that are not obvious. For example, searching for 'coffee' will return only posts containing 'coffe' (note the missing final e). When searching for certain plural words ending with 's', Facebook removes the 's'. The problem now is that searching for certain non-English terms such as 'Wilders' (the name of a person) will return results for the English word 'wilder'.
This has been reported to Facebook dev support, but they claim it to be 'by design'. If this is by design, however, I am wondering how I can actually search for 'coffee', 'carrots' or 'wilders' without my search terms being altered.
See also http://developers.facebook.com/bugs/336651873112920 and http://developers.facebook.com/bugs/591586327522090
If the last letter of each search word is being cut off, try always adding one extra letter to the end of every word.
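The behaviour described in the question resembles word stemming. Facebook's actual algorithm is not documented in this thread, so the following Python sketch is only a toy with made-up rules (`toy_stem` is hypothetical); it shows how simple suffix stripping would reproduce the three examples and why a name and an unrelated English word can end up as the same search key.

```python
def toy_stem(word):
    """Toy suffix stripper (assumption): drops a plural 's', then a trailing 'e'."""
    w = word.lower()
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        w = w[:-1]
    if w.endswith("e") and len(w) > 4:
        w = w[:-1]
    return w


for term in ("coffee", "carrots", "Wilders", "wilder"):
    print(term, "->", toy_stem(term))
```

Under these toy rules 'Wilders' and 'wilder' collapse to the same stem, so an index keyed on stems cannot tell them apart.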

Japanese characters in a latex \section{} cause an error

I am working on getting Japanese documents created with LaTeX. I have installed the latest version of texlive-2008, which includes CJK.
In my document I have the following:
\documentclass{class}
\usepackage{CJK}
\begin{document}
\begin{CJK*}{UTF8}{min}
\title{[Japanese Characters here 1]}
\maketitle
\section{[Japanese Characters here 2]}
[Japanese Characters here 3]
\end{CJK*}
\end{document}
In the above code there are 3 locations Japanese characters are used.
1 + 3 work fine whereas 2, which contains Japanese characters in a \section{} fails with the following error.
! Argument of \@sect has an extra }.
After some research, it turns out this error manifests when you've put a fragile command inside a moving argument. The argument of \section is a moving argument because the section title can be moved to a contents page, for example.
Does anyone know how to get this to work, and why LaTeX thinks Japanese characters are "fragile"?
Sorry to post this as an answer rather than a comment to your answer; I don't have enough rep yet to comment. (EDIT: Now I have enough rep to comment, but I'm not sorry anymore. Thanks Will.)
Your solution of replacing
\section{[Japanese Text]}
with
\section{\texorpdfstring{[Japanese Text]}{}}
suggests that you're using the hyperref package. When you use the hyperref package, any sort of not-totally-boring text (e.g. math) within \section causes a problem because \section is having trouble generating pdf bookmarks. \texorpdfstring allows you to specify how you want the section title to appear in the pdf bookmark. For example, I might write
\section{Calculation of \texorpdfstring{$H_2(\mathcal{X})$}{H\_2(X)}}
if I want the section title to be "Calculation of $H_2(\mathcal{X})$" but I want the pdf bookmark to be "Calculation of H_2(X)".
You should probably use xetex/xelatex, as it has been created to support Unicode. The change is sometimes not easy for already existing documents, though. (xelatex should be included in texlive, it is just a different executable to call -- this is how it is done in Debian).
I have managed to get this working now!
Using Latex and CJK as before.
\section{[Japanese Text]}
was replaced with
\section{\texorpdfstring{[Japanese Text]}{}}
Now the contents pages and section titles work and update fine.