R displays accented characters as escaped UTF-8 bytes (encoding)

When I create an object x from a character string containing an accent, such as "Opéra", RStudio prints the object with the accented character replaced by an escape code (UTF-8?).
I would like to get "Opéra" instead of this code.
What can I do about this? Is it a problem with my Mac, or with the settings of R or RStudio?
Many thanks for your help.
I am using R version 3.6.1 and RStudio 1.1.456 on a MacBook Pro running macOS 10.14.5.
x<-"Opéra"
x
[1] "Op\303\251ra"

I think I found a solution to my encoding question: using the str_replace_all function from the stringr package, I can easily replace every accented character in a variable (column) of a dataset with the same character without the accent.
Here is how I proceed, first with a single string, then with a vector:
# One string
library(stringr)
x<-"Opéra"
x
[1] "Op\303\251ra"
str_replace(x,"é","e")
[1] "Opera"
# Several strings in a vector
x4<-c("Opéra","Hôtel-de-Ville","Ménilmontant")
str_replace_all(x4,c("é"="e","ô"="o"))
[1] "Opera" "Hotel-de-Ville" "Menilmontant"
I can now apply this to an entire column of a dataset, create a new column without accents with mutate, and drop the old column containing those horrible codes like \303\251 (the two UTF-8 bytes of é, written as octal escapes).
If you have a more elegant solution, I would be very interested.
Thanks
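One possibly more general approach, sketched here in Python rather than R and only as an illustration: decompose each character with Unicode normalization (NFD) and drop the combining marks, so that no per-letter replacement table is needed. The helper names strip_accents and utf8_octal are hypothetical, and the sketch assumes that plain accent removal is acceptable for the data. The second helper shows that \303\251 is simply the two UTF-8 bytes of é written in octal.

```python
import unicodedata


def strip_accents(text):
    """Remove combining marks after NFD decomposition (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", text)
    # Category "Mn" (nonspacing mark) covers the combining accents.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def utf8_octal(text):
    """Return the UTF-8 bytes of text as octal strings (hypothetical helper)."""
    return ["%o" % byte for byte in text.encode("utf-8")]


if __name__ == "__main__":
    # Escapes are used so the example does not depend on the file's encoding.
    stations = ["Op\u00e9ra", "H\u00f4tel-de-Ville", "M\u00e9nilmontant"]
    print([strip_accents(name) for name in stations])
    print(utf8_octal("\u00e9"))
```

In R, stringi's stri_trans_general(x, "Latin-ASCII") should, as far as I know, give a comparable transliteration without listing every accented letter by hand.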

Related

How to use FT_Load_Char in Arabic (compatibility characters)

This is a follow-up to this question. I'm interested in different glyphs for the same character, also known as "Unicode Compatibility Characters".
Let's take the following two Arabic "reversed-character" words: كلمة ةملك
First word is:
كلمة
in hex code:
0643 0644 0645 0629
Second word is:
ةملك
in hex code:
0629 0645 0644 0643
If I paste those two words into Microsoft Word using DejaVu Sans, I get this:
With the following pseudo-code using FreeType2, I get:
FT_Face face;
FT_New_Face(library, "DejaVuSans.ttf", 0, &face);
FT_GlyphSlot slot;
FT_Load_Char(face, each_character, FT_LOAD_RENDER);
slot = face->glyph;
//Use slot->bitmap.buffer
FT_Done_Face(face);
What am I missing? How can I get the right glyphs depending on the context?
My key issue is that I store each "character" (I should say glyph, but for me a character was equivalent to a glyph) in a table, so it's going to be complicated. I'm limited in speed, not in space. Can I have two different Unicode characters for the same logical character?
libraqm is a solution for getting the glyph for each character depending on its position in the sentence. But I'm still interested in getting the character corresponding to the glyph (I know it's not a 1-to-1 relation). For instance, there are 4 characters for the 4 glyphs of the letter Kaf, as stated in the comment above.
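To illustrate the "4 characters for the 4 glyphs of Kaf" remark, here is a small Python sketch, offered as an illustration and not as part of the FreeType code. It looks up the four Kaf presentation forms in the Unicode database by name and maps each one back to the logical letter U+0643 with NFKC normalization. The helper names are hypothetical, and whether a given font actually maps these presentation-form code points to glyphs is an assumption to check per font.

```python
import unicodedata

# Names of the four Arabic Presentation Forms-B characters for Kaf.
KAF_FORM_NAMES = [
    "ARABIC LETTER KAF ISOLATED FORM",
    "ARABIC LETTER KAF FINAL FORM",
    "ARABIC LETTER KAF INITIAL FORM",
    "ARABIC LETTER KAF MEDIAL FORM",
]


def kaf_presentation_forms():
    """Return the four compatibility characters for Kaf (hypothetical helper)."""
    return [unicodedata.lookup(name) for name in KAF_FORM_NAMES]


def logical_character(presentation_form):
    """Map a presentation form back to its logical letter via NFKC."""
    return unicodedata.normalize("NFKC", presentation_form)


if __name__ == "__main__":
    for form in kaf_presentation_forms():
        print("U+%04X -> U+%04X" % (ord(form), ord(logical_character(form))))
```

The mapping is many-to-one in this direction, which matches the note that the relation between characters and glyphs is not 1-to-1.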

Freetype unicode on Windows

I'm using FreeType 2.5.3 in a portable OpenGL application.
My issue is that I can't get Unicode characters on my Windows machine, while I get them correctly on my other systems (Lubuntu, OS X, Android).
I'm using the famous arialuni.ttf (23 MB), so I'm pretty sure it contains everything. In fact, I had this working on my previous Windows installation (Win7), then reinstalled Win7 from another source, and now Unicode is not working right.
Specifically, when I draw a string, only the Latin characters are rendered while the non-Latin ones are skipped. I dug deeper and found that the character codes in the wstring are not what they should be. For example, I'm using some Greek letters in the string, like γ, which I know should have a code point of 947.
My engine just iterates over the wstring characters and uses each code point to index into another vector that holds texture coordinates, so I can draw the glyph.
The problem is that on my Windows 7 machine, the wstring does not give me 947 for γ; instead it gives me 179. In addition, the character Ά comes back as 2 characters with code 206 (??) instead of one with code 902.
It's a simple iteration over a wstring, like:
for(size_t c=0,sz=wtext.size();c<sz;c++) {
uint32_t ch = wtext[c]; // code point
...
}
This is only happening on my newly installed Win7; it worked before on another Win7 system, along with all my Linux machines. Now it's broken on this one, and also on my XP virtual machine.
I don't use any wide formatting functions for this, just something like:
wstring wtext = L"blΆh";
In addition, I can see my glyphs being rendered correctly in my OpenGL texture, so it's not a font issue either. My font generator uses the Greek range of ~900-950 code points to collect the glyphs.
I add the code points per language with this:
FT_UInt glyph_index; // receives the glyph index; 0 means no more characters
FT_ULong character = FT_Get_First_Char(face, &glyph_index);
while(glyph_index != 0) {
...
character = FT_Get_Next_Char(face, character, &glyph_index);
}
Not sure why, but I fixed it by saving the file as UTF-8 with BOM rather than plain UTF-8 (which I had by default).
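A plausible explanation, assuming the compiler was reading the BOM-less source file in a single-byte code page: each UTF-8 byte of a non-ASCII character then becomes its own character in the wide string. The Python sketch below reproduces the effect; Latin-1 stands in for whatever code page the Windows machine actually used (an assumption), and the helper name is hypothetical. The 206 and 179 reported in the question are exactly the two UTF-8 bytes of γ.

```python
def misread_as_single_byte(text):
    """Encode as UTF-8, then decode every byte as its own character.

    Latin-1 is a stand-in for the machine's actual code page (assumption).
    """
    return [ord(ch) for ch in text.encode("utf-8").decode("latin-1")]


if __name__ == "__main__":
    for letter in ("\u03b3", "\u0386"):  # Greek gamma, Greek Alpha with tonos
        print(ord(letter), misread_as_single_byte(letter))
```

A BOM removes the ambiguity because it tells the compiler that the file is UTF-8, which would be consistent with the fix described above.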

Unicode vector over a character string

I'm using Python 3.5 and PyQt5, and I need to print a character with a vector arrow above it.
I know I have to use a Unicode code point, and I tried the following instruction:
myLabel = QLabel(b"\U+20D6".encode('utf-16','ignore')
Nothing worked. It does not work with any encoding (utf-8, utf-16, etc.).
My goal is to put an arrow above a character; according to a tutorial I found on the web, I have to use the Unicode code point b"\U+20D6".
Do you know the right way to do this?
Thanks in advance.
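A minimal sketch of one way this could work, assuming the goal is a combining arrow drawn over the preceding character. In Python 3 a str is already Unicode, so no bytes prefix and no encode call are needed, and the escape is written \u20D6 (no plus sign). U+20D6 is the left-pointing arrow; U+20D7 is the right-pointing one usually used for vectors. The helper name with_arrow is hypothetical, and how well the arrow is positioned depends on the font.

```python
import unicodedata

LEFT_ARROW_ABOVE = "\u20D6"   # COMBINING LEFT ARROW ABOVE
RIGHT_ARROW_ABOVE = "\u20D7"  # COMBINING RIGHT ARROW ABOVE


def with_arrow(base, arrow=RIGHT_ARROW_ABOVE):
    """Append a combining arrow so it is drawn above the base character."""
    # A combining mark always follows the character it modifies.
    return base + arrow


if __name__ == "__main__":
    text = with_arrow("v")
    print(text, [unicodedata.name(ch) for ch in text])
    # With PyQt5 available, the str can be passed as-is: QLabel(text)
```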

Detecting special chars in postgres

I have usernames in my Postgres 9 DB such as:
Ron R ty ♥☆♡★Green Eyes♥☆♡★
Sωℯℯт۞Angel 2 ᾧ➍ᾧ ty Լù☪ƖƒεƦ
The DB is encoded in UTF-8.
Is there a way to detect in SQL the presence of these special characters outside the standard Roman characters?
I tried using convert (documented at http://www.postgresql.org/docs/9.1/static/functions-string.html) but only got errors.
Try matching on a regexp character range based on Unicode code points.
WHERE uname ~ '[\x80-\xffff]';
Or, if you want to be stricter, you can match anything non-alphanumeric.
WHERE uname ~ '[^[:alnum:]]';
Other character classes are available too. See the docs for details.
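The same code-point-range idea can be tried outside the database. The following Python sketch is an analogue of the first pattern and not Postgres syntax: it flags any character above the ASCII range. The function names are hypothetical.

```python
import re

# Any character outside the 7-bit ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7f]")


def has_non_ascii(name):
    """True if the name contains at least one non-ASCII character."""
    return NON_ASCII.search(name) is not None


def non_ascii_chars(name):
    """List the non-ASCII characters found, in order of appearance."""
    return NON_ASCII.findall(name)


if __name__ == "__main__":
    for uname in ["Ron R", "ty \u2665\u2606Green Eyes\u2665\u2606"]:
        print(repr(uname), has_non_ascii(uname), non_ascii_chars(uname))
```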

Using Eclipse with Arabic and English on the same line

I just noticed, while doing a string comparison in Eclipse, that placing an Arabic character in a line completely throws off Eclipse. How can I mix English and Arabic on a single line of code?
** EDIT **
OK, now that my question has been migrated here, I suppose some code is in order. I was trying to do the following in Java:
Character c1 = 'ة';
Map<Character, Double> arabicRootMap = new HashMap<Character, Double>();
arabicRootMap.put(c1, 5.0);
The exact same thing happens here on SO as in Eclipse. Instead of putting c1 into my map, I would like to put my Arabic character into the map directly, but the left-to-right order is partly broken and the mixed cursor navigation on the line is crazy. As you can see, I have an indirect solution to the problem: defining the character beforehand.
So that is my answer: use substitution whenever you have a character or string that needs to be in the middle of a statement. This gets rather labor-intensive as you build up strings of various lengths, and you cannot predefine every Arabic word ever written. If there is a better answer, I would like to hear it.
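One possible alternative to predefining every character, sketched in Python for illustration: write the Arabic text as \uXXXX escapes, so that the source line contains only left-to-right ASCII and the editor has nothing to reorder. Java accepts the same escape form in char and String literals (for example '\u0629'). The variable names below are hypothetical, and the cost is that the literals are no longer readable as Arabic in the source.

```python
import unicodedata

# TEH MARBUTA written as an escape, so the line stays pure left-to-right ASCII.
TEH_MARBUTA = "\u0629"

# The map from the question, with the character used directly as the key.
arabic_root_map = {TEH_MARBUTA: 5.0}

# A whole word can be built the same way (Kaf, Lam, Meem, Teh Marbuta).
WORD = "\u0643\u0644\u0645\u0629"

if __name__ == "__main__":
    print(unicodedata.name(TEH_MARBUTA), arabic_root_map[TEH_MARBUTA])
    print(["%04X" % ord(ch) for ch in WORD])
```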