I have a μ character in my MySQL database. I would like to replace it with the following code:
UPDATE produkte.produktliste SET Kurzbeschreibung
=REPLACE(REPLACE(Kurzbeschreibung, 'μ', ''), 'μ', '');
The code works, but I can't save the command to a script, because the μ character is not ANSI. I had a similar problem in the past and solved it with CHAR():
UPDATE produkte.produktliste SET Link
= REPLACE(REPLACE(Link, CHAR(160), ''), CHAR(160), '');
But which number (like 160) does μ have? I have tried many values (181, 924, 956), but nothing works. Does anyone have an idea why my statement does not work?
The numbers represent the Unicode value of the character.
Thanks in advance.
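One thing that may explain the failed attempts: two different characters look like "mu", the micro sign (U+00B5, decimal 181) and the Greek small letter mu (U+03BC, decimal 956), and in a UTF-8 column each is stored as two bytes, so a single-byte CHAR(181) would not match either of them. The sketch below uses Python purely as an inspection tool; the helper name describe is made up for this example.

```python
import unicodedata

def describe(ch):
    """Return (decimal code point, Unicode name, UTF-8 bytes as hex) for one character."""
    return ord(ch), unicodedata.name(ch), ch.encode("utf-8").hex()

# Both look-alike characters, written as escapes so the script stays pure ASCII.
for ch in ("\u00b5", "\u03bc"):
    print(describe(ch))
# (181, 'MICRO SIGN', 'c2b5')
# (956, 'GREEK SMALL LETTER MU', 'cebc')
```

If the column really is UTF-8, something like CHAR(0xCEBC USING utf8mb4) in place of the literal character might then work. That is an assumption about this table and connection charset, not something tested against it.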
Related
I need to replace a specific Unicode character in SAS, namely U+0191, with a whitespace or blank. How can I do it with COMPRESS? Thanks in advance.
You should use the KCOMPRESS function rather than COMPRESS for compressing unicode characters, as it is considered safer for Unicode and DBCS environments.
However, it sounds like you actually want to TRANSLATE, or more accurately KTRANSLATE, which actually replaces characters with whitespace or other characters (as opposed to removing them, as COMPRESS does).
Here's an example:
data have;
charvar = "Ƒellow Americans";
fixed_charvar = translate(charvar,'F','Ƒ');
kfixed_charvar= ktranslate(charvar,'F','Ƒ');
put _all_;
run;
Here I convert U+0191 to a normal F; of course you can convert to space as you wish (Replace the 'F' with whatever you want it converted to).
This will work in an instance of SAS set up in Unicode mode; if you're running in WLATIN1 or similar, you may have more difficulty, particularly with actually passing SAS the U+0191 character.
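For readers without a SAS session at hand, the difference between translating (replacing) and compressing (removing) can be sketched in Python. This is only an analogy to KTRANSLATE and KCOMPRESS, not SAS code, and the function names replace_char and remove_char are invented for the example.

```python
F_WITH_HOOK = "\u0191"  # LATIN CAPITAL LETTER F WITH HOOK

def replace_char(text, old=F_WITH_HOOK, new=" "):
    """Translate-style: swap every `old` for `new`; the string length is unchanged."""
    return text.translate({ord(old): new})

def remove_char(text, old=F_WITH_HOOK):
    """Compress-style: drop every `old` from the string."""
    return text.translate({ord(old): None})

sample = F_WITH_HOOK + "ellow Americans"
print(replace_char(sample, new="F"))  # Fellow Americans
print(remove_char(sample))            # ellow Americans
```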
I have a text box in my GUI, into which I want to write a tabbed text.
As you may or may not know, the \t modifier does not work in TeX-interpreted text strings.
Is there an elegant way to emulate the tab modifier with the CORRECT number of spaces, also taking into account that different characters might have different widths?
Result should be like this:
[tabText('Try\tThis') ; tabText('Tryy\tThis')]
ans =
Try This
Tryy This
Thanks.
'\t' in MATLAB is interpreted literally: two characters, \ and t, not a tab.
To obtain the tabulation character, you'll have to go through sprintf:
> 'Try\tThis'
Try\tThis
> sprintf('Try\tThis')
Try This
Or with char(9) (ASCII code):
> ['Try' char(9) 'This']
Try This
Looking at the relevant part of the MATLAB documentation for text (at the time of writing, this points to the R2016b docs) one can see the TeX "subset" that is supported by MATLAB, and it does not include any tab-like character. Thus it seems that there's no proper way to do this with the tex interpreter.
You have several options:
If using uifigures is an option, text labels there allow MathML to be used, which is very customizable.
If you switch to the 'latex' interpreter, you could use \quad, \qquad etc.
figure();
text(.5,.5,{'$$This \quad text$$','$$is \quad properly$$','$$tabbed, \quad Right?$$'},...
'Interpreter','latex');
What O'Neil suggested.
Regarding the unequal character width - you might be able to overcome this by changing the font, using the 'FontName' argument to text(...).
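If a fixed-width font is acceptable, padding to tab stops with spaces is simple arithmetic. The sketch below shows the idea in Python rather than MATLAB; tab_text is a made-up name echoing the tabText call in the question, and the approach assumes every character has the same width, so it does not solve the proportional-font case.

```python
def tab_text(text, tabsize=8):
    """Replace each tab with enough spaces to reach the next multiple of `tabsize`.

    Assumes a monospaced font: every character occupies exactly one column.
    """
    return text.expandtabs(tabsize)

for line in ("Try\tThis", "Tryy\tThis"):
    print(repr(tab_text(line)))
# 'Try     This'
# 'Tryy    This'
```

In both lines "This" starts at column 8, which is what a real tab stop would give.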
I'm using unaccent in Postgres, but it cannot convert a special character like:
ù : ù
but it's okay for ù: ù
2 characters same meaning but different code, the first one is character u + ̀
How can I solve this problem?
Thank you so much.
Your problem is Unicode normalization, which PostgreSQL unfortunately does not do. And it's not so simple to implement on your own.
But because you only want to remove diacritical marks, you only need to remove the code points that are Unicode combining characters (before or after calling the unaccent() function):
select regexp_replace(
'ùù',
'[\u0300-\u036F\u1AB0-\u1AFF\u1DC0-\u1DFF\u20D0-\u20FF\uFE20-\uFE2F]',
'',
'g'
)
should do the trick.
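If the strings pass through application code anyway, the same idea can be applied there. Below is a minimal Python sketch using the standard unicodedata module: decompose to NFD, then drop every combining character. The function name strip_accents is made up for this example, and unlike the regex above it removes every character with a non-zero combining class rather than a fixed set of ranges.

```python
import unicodedata

def strip_accents(text):
    """Decompose to NFD, then drop all combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

decomposed_u = "u\u0300"   # u followed by COMBINING GRAVE ACCENT
precomposed_u = "\u00f9"   # LATIN SMALL LETTER U WITH GRAVE

print(strip_accents(decomposed_u))   # u
print(strip_accents(precomposed_u))  # u
```

Both spellings of the character end up as a plain u, which is the behaviour the question was after.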
I'm using Python 3.5, PyQT5 and I need to print a character with a vector above it.
I know I have to use a Unicode code point, and I tried the following statement:
myLabel = QLabel(b"\U+20D6".encode('utf-16','ignore')
Nothing worked. It does not work with any type of encoding (utf-8, utf-16, etc.).
My goal is to put an arrow above a character, according to the tutorial found on the web I have to use unicode b"\U+20D6" codepoint.
Do you know the right way to do this?
Thanks in advance.
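A possible direction, sketched under a few assumptions: in Python 3, str is already Unicode, so no bytes literal or encode() call should be needed. The escape is written "\u20d6" (no plus sign), and a combining mark goes after the letter it decorates. U+20D6 is the left arrow above; U+20D7 is the right arrow usually used for vectors. The helper name with_vector_arrow is invented, and whether the arrow actually renders depends on the label's font.

```python
import unicodedata

LEFT_ARROW_ABOVE = "\u20d6"   # COMBINING LEFT ARROW ABOVE
RIGHT_ARROW_ABOVE = "\u20d7"  # COMBINING RIGHT ARROW ABOVE

def with_vector_arrow(letter, arrow=RIGHT_ARROW_ABOVE):
    """Append a combining arrow so it is drawn above `letter`."""
    return letter + arrow

label_text = with_vector_arrow("v")
print(unicodedata.name(RIGHT_ARROW_ABOVE))  # COMBINING RIGHT ARROW ABOVE
print(len(label_text))                      # 2 (base letter + combining mark)

# With PyQt5 installed and a QApplication running, the plain str can be
# passed to the widget directly (assumption: the font contains the glyph):
# myLabel = QLabel(label_text)
```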
I am trying to replace German and Dutch umlauts such as ä, ü, or ß. They should be written like ae instead of ä. So I can't simply translate one char with another.
Is there a more elegant way to do that? Currently it looks like this (not complete yet):
SELECT addr, REPLACE (REPLACE(addr, 'ü','ue'),'ß','ss') FROM search;
While trying different commands I ran into another problem:
When I searched for Ü I got this:
ERROR: invalid byte sequence for encoding "UTF8": 0xdc27
I tried it with U&'\0220', but it didn't replace anything. Only ü (lowercase) was replaced correctly. It has something to do with Unicode, but how do I solve this issue?
Kind regards from Germany. :)
Your server encoding seems to be UTF8.
I suspect your client_encoding does not match, which might give you a wrong impression of what you are dealing with. Check with:
SHOW client_encoding; -- in your actual session
And read these related answers:
Can not insert German characters in Postgres
Replace unicode characters in PostgreSQL
The rest of the tool chain has to be in sync, too. When using PuTTY, for instance, one has to make sure the terminal agrees with the rest: Change settings... Window -> Translation -> Remote character set = UTF-8.
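The bytes in the error message fit this explanation. In Latin-1 (and Windows-1252) Ü is the single byte 0xDC, and 0x27 is the apostrophe that closes the SQL literal, so 0xdc27 is what a non-UTF-8 client would send for Ü'. A small Python sketch to reproduce the mismatch; the function names are made up, and Latin-1 is only an assumed client encoding.

```python
def client_bytes(text, client_encoding="latin-1"):
    """Bytes a client would send if it encoded `text` in `client_encoding`."""
    return text.encode(client_encoding)

def is_valid_utf8(raw):
    """True if `raw` decodes cleanly as UTF-8, as a UTF8 server expects."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

literal_tail = "\u00dc'"  # U with diaeresis followed by the closing quote

raw = client_bytes(literal_tail)
print(raw.hex())                                    # dc27
print(is_valid_utf8(raw))                           # False
print(is_valid_utf8(literal_tail.encode("utf-8")))  # True
```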
As for your first question, you already have the best solution. A couple of umlauts are best replaced with a string of replace() statements.
As you seem to know already as well, single character replacements are more efficient with (a single) translate() statement.
Related:
Replace unicode characters in PostgreSQL
Regex remove all occurrences of multiple characters in a string
Besides other reasons, I decided to write the replacement in Python. As Erwin wrote before, there seems to be no better solution than combining replace commands.
In general it is pretty simple; no encoding even had to be used. My "final" solution now looks like this:
ger_UE="Ü"
ger_AE="Ä"
ger_OE="Ö"
ger_SS="ß"
dk_AA="Å"
dk_OE="Ø"
dk_AE="Æ"
cur.execute("""Select addr, REPLACE (REPLACE (REPLACE( REPLACE (REPLACE (REPLACE (REPLACE(addr, '%s','UE'),'%s','OE'),'%s','AE'),'%s','SS'),'%s','AA'),'%s','OE'),'%s','AE')
from search WHERE x = '1';"""%(ger_UE,ger_OE,ger_AE,ger_SS,dk_AA,dk_OE,dk_AE))
I am now looking forward to the speed when it hits the large table. If anyone would like to make some annotations, they are very welcome.
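As one such annotation: if the rows end up in Python anyway, the replacement itself could also be done client-side with a single translation table instead of nested REPLACE calls. This is a sketch, not a drop-in replacement for the query above. The name transliterate and the lowercase mappings are additions for illustration, and ß is mapped to lowercase "ss" here, whereas the query above uses 'SS'.

```python
# One-to-many mapping; str.maketrans accepts single-character keys
# with replacement strings of any length.
TRANSLIT = str.maketrans({
    "\u00dc": "UE", "\u00fc": "ue",  # U with diaeresis
    "\u00c4": "AE", "\u00e4": "ae",  # A with diaeresis
    "\u00d6": "OE", "\u00f6": "oe",  # O with diaeresis
    "\u00df": "ss",                  # sharp s
    "\u00c5": "AA", "\u00e5": "aa",  # A with ring above
    "\u00d8": "OE", "\u00f8": "oe",  # O with stroke
    "\u00c6": "AE", "\u00e6": "ae",  # AE ligature
})

def transliterate(text):
    """Replace each mapped character in a single pass over the string."""
    return text.translate(TRANSLIT)

print(transliterate("\u00dcbergr\u00f6\u00dfe"))  # UEbergroesse
print(transliterate("\u00c5rhus"))                # AArhus
```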