Pronouncing German words in English - google-text-to-speech

I'm using Google TTS to generate audio files in my native language, German.
Unfortunately there are some words, e.g. "laser", that are pronounced as German words, but I want them pronounced as in English.
Is there any way to make this possible with SSML, or do I have to generate separate audio files and splice them together?
Thanks for your answer!

Try:
<?xml version="1.0"?>
<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
http://www.w3.org/TR/speech-synthesis11/synthesis.xsd"
xml:lang="en-US">
The French word for cat is <w xml:lang="fr">chat</w>.
He prefers to eat pasta that is <lang xml:lang="it">al dente</lang>.
</speak>
Check out the following link:
https://www.w3.org/TR/speech-synthesis11/#edef_lang
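If you are calling the API programmatically, you can pass SSML with a <lang> element directly. Below is a minimal sketch using the google-cloud-texttospeech Python client; the sentence, the voice selection and the output file name are placeholders I am assuming, so adjust them to your project.
# A minimal sketch, assuming the google-cloud-texttospeech package is
# installed and application credentials are already configured.
from google.cloud import texttospeech

client = texttospeech.TextToSpeechClient()

# German sentence with a single English word wrapped in <lang>.
ssml = (
    "<speak>"
    'Der <lang xml:lang="en-US">laser</lang> schneidet das Material.'
    "</speak>"
)

response = client.synthesize_speech(
    input=texttospeech.SynthesisInput(ssml=ssml),
    voice=texttospeech.VoiceSelectionParams(language_code="de-DE"),
    audio_config=texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    ),
)

with open("output.mp3", "wb") as f:
    f.write(response.audio_content)
How convincingly the pronunciation switches may depend on the voice you pick, so it is worth testing a few voices before falling back to cutting separate audio files together.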

Related

IBM Text to Speech: How to have correct pronunciation of English words in German text?

I am using IBM Text to Speech to process some German texts. Pronunciation of the text with a German voice works fine. However, the text contains some English words and phrases, and for those the pronunciation is sometimes incorrect. How can I fix that?
Example:
Er kommt aus Amerika und nennt sich TJ.
TJ should be pronounced like someone from California would do.
I was able to make it work by using Speech Synthesis Markup Language (SSML) on my text. The German text remains the same. For the English expressions I provided some "phonemes". Here is the one for TJ:
<phoneme alphabet="ibm" ph="tIZE">TJ</phoneme>
I had to use German symbols which can be found in the docs.
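For completeness, here is roughly how such markup can be sent to the service with the Watson Python SDK. This is only a sketch under my assumptions: the API key, the service URL and the voice name de-DE_BirgitV3Voice are placeholders, so check them against your own instance and the docs.
from ibm_watson import TextToSpeechV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Placeholders: use your own credentials and service URL.
tts = TextToSpeechV1(authenticator=IAMAuthenticator("YOUR_APIKEY"))
tts.set_service_url("https://api.eu-de.text-to-speech.watson.cloud.ibm.com")

# German sentence with an IBM-alphabet phoneme for the English name "TJ".
text = (
    "Er kommt aus Amerika und nennt sich "
    '<phoneme alphabet="ibm" ph="tIZE">TJ</phoneme>.'
)

result = tts.synthesize(
    text,
    voice="de-DE_BirgitV3Voice",  # assumed German voice name, check the catalog
    accept="audio/wav",
).get_result()

with open("tj.wav", "wb") as f:
    f.write(result.content)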

How can I spell-check for multiple languages in Emacs?

I write most of my documentation in HTML, using Emacs as my main editor. Emacs lets you interactively spell-check the current buffer with the command ispell-buffer.
Since I switch between a number of languages, I have an HTML comment at the end of the file specifying the main dictionary and personal dictionary for that file. For example, for Norwegian (norsk) I use the following pair of dictionaries:
<!-- Local IspellDict: norsk -->
<!-- Local IspellPersDict: ~/.aspell/personal.dict -->
This works great.
However, sometimes I have a paragraph in another language (e.g. English) embedded in an otherwise Norwegian document. Example:
<p xml:lang="en">This paragraph is in English.</p>
The spell-checker naturally flags all the words in such a paragraph as misspellings (since the dictionary only contains Norwegian words).
To avoid this, I've tried to add a "british" dictionary to the document, like this:
<!-- Local IspellDict: british -->
<!-- Local IspellDict: norsk -->
<!-- Local IspellPersDict: ~/.aspell/personal.dict -->
Unfortunately, this does not work. The "british" dictionary is simply ignored.
My preferred solution would be to load an additional dictionary and use it, together with the primary dictionary, for spell-checking. Is this possible?
However, I am also interested in a solution that lets me mark paragraphs to be excluded from spell-checking. It is not ideal, but it would stop valid English words from being flagged as misspellings.
PS: I have also looked at the answer to this question: Multilingual spell checking with language detection, but it is much broader and does not address the specific use of Emacs ispell for doing the spell-check.
Try ispell-multi and flyspell-xml-lang: http://www.dur.ac.uk/p.j.heslin/Software/Emacs/
You can spawn multiple instances of ispell and use the xml:lang attribute to decide which language to check against.

Decoding Korean text files from the 90s

I have a collection of .html files created in the mid-90s, which include a significant amount of Korean text. The HTML lacks character set metadata, so of course all of the Korean text now does not render properly. The following examples all use the same excerpt of text.
In text editors such as Coda and Text Wrangler the text displays as
╙╦ ╝№бя└К ▓щ╥НВь╕цль▒Ф ▓щ╥НВь╕цль▒Ф
which, in the absence of character set metadata in <head>, is rendered by the browser as:
ÓË ¼ü¡ïÀŠ ²éÒ‚ì¸æ«ì±” ²éÒ‚ì¸æ«ì±”
Adding euc-kr metadata to <head>
<meta http-equiv="Content-Type" content="text/html; charset=euc-kr">
Yields the following, which is illegible nonsense (verified by a native speaker):
沓 숩∽핅 꿴�귥멩レ콛 꿴�귥멩レ콛
I have tried this approach with all historic Korean character sets, each yielding similarly unsuccessful results. I also tried parsing and upgrading to UTF-8, via Beautiful Soup, which also failed.
Viewing the files in Emacs seems promising, as it reveals the text encoding at a lower level. The following is the same sample of text:
\323\313 \274\374\241\357\300\212
\262\351\322\215\202\354\270\346\253\354\261\224 \262\351\322\215\202\354\270\346\253\354\261\224
How can I identify this text encoding and promote it to UTF-8?
All of those octal codes that Emacs revealed are less than 254 (or \376 in octal), so it looks like one of those old pre-Unicode fonts that just used its own mapping in the ASCII range. If this is right, you'll just have to try to figure out what font it was intended for, find it, and perhaps do the conversion yourself.
It's a pain. Many years ago I did something similar for some popular pre-Unicode Greek fonts: http://litot.es/unicode-converter/ (the code: https://github.com/seanredmond/Encoding-Converter)
In the end, it is about finding the correct character encoding and using iconv.
iconv --list
displays all available encodings. Grepping for "KR" reveals that my system, at least, can do CSEUCKR, CSISO2022KR, EUC-KR, ISO-2022-KR and ISO646-KR. Korean is also covered by BIG5HKSCS, CSKSC5636 and KSC5636 according to Wikipedia. Try them all until something reasonable pops out.
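If you would rather experiment from a script than with iconv, here is a rough Python sketch along the same lines: it tries a handful of Korean codecs on the raw bytes and prints whatever decodes cleanly, so a native reader can judge which output is real Korean. The file name and the candidate list are only placeholders.
# Try a few historic Korean encodings and show which candidates decode.
CANDIDATES = ["euc_kr", "cp949", "iso2022_kr", "johab"]

with open("page.html", "rb") as f:
    raw = f.read()

for name in CANDIDATES:
    try:
        text = raw.decode(name)
    except UnicodeDecodeError:
        print(name, "failed")
        continue
    print("---", name, "---")
    print(text[:200])

# Once the right codec is known, rewrite the file as UTF-8, e.g.:
# with open("page.utf8.html", "wb") as out:
#     out.write(raw.decode("cp949").encode("utf-8"))
If none of the standard codecs produces readable Korean, that would fit the other answer's suggestion that the files were made for a custom pre-Unicode font mapping.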
Even though this thread is old, it's still an issue, and not having found a way to convert the files in bulk (outside of using a Korean version of Windows 7), I'm now using Naver, which has a cloud service like Google Docs. If you upload those weirdly encoded files there, it deals with them very well. I just edit and copy the text, and it's back to being standard when I copy it elsewhere.
Not the kind of solution I like, but it might save a few passers-by.
By the way, you can register for the cloud account with an ID even if you do not live in South Korea; there is enough minimal English to get by.

How to parse XML which contains data in the Norwegian language?

How do I parse XML which contains data in the Norwegian language?
Do I need any special encoding handling with NSXMLParser?
Thanks.
I guess you are worried about non-ASCII characters in the XML file. Well, you don't need to be. The first line of an XML file should look something like:
<?xml version="1.0" encoding="UTF-8"?>
where the encoding attribute tells you which character set was used to encode the characters in the file. NSXMLParser will use that line to determine which character set to use. Once the data gets to your delegate methods, all the text will be in NSStrings, which can handle your Norwegian characters automatically.
All you need to be concerned about is that the file really is encoded in the character set that the first line says it is.
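The question is about NSXMLParser, but the principle is the same in any XML library. Here is a small Python illustration (my own example string, not from the question) showing that the parser honours the declared encoding and hands back ordinary Unicode strings.
import xml.etree.ElementTree as ET

# UTF-8 bytes with an explicit encoding declaration, as an XML file would have.
data = '<?xml version="1.0" encoding="UTF-8"?><ord>blåbærsyltetøy</ord>'.encode("utf-8")

root = ET.fromstring(data)
print(root.text)  # blåbærsyltetøy -- already a normal Unicode string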
XML itself does not care which natural language the data is in. As long as every start tag has a matching end tag (i.e. the document is well formed), you can parse it as usual.
Here is the tutorial to understand XML, and
here is the link to the tutorial on parsing the XML file.
I hope this will be helpful for your problem.

How to parse XML with special characters?

Whenever I try to parse XML with special characters such as ō or 満月先生 I get an error. The XML document claims to use UTF-8 encoding, but that does not seem to be the case.
Here is what the troublesome text looks like when I view the XML in Firefox:
Bleach: The Diamond Dust
Rebellion - M� Hitotsu no
Hy�rinmaru; Bleach - The
DiamondDust Rebellion - Mou Hitotsu no
Hyourinmaru
On the actual website, Å� is actually the character ō.
<br /> One day,
Doraemon and his friends meet
Professor Mangetsu
(����,
Professor Mangetsu?), who studies
magic and magical beings such as
goblins, and his daughter Miyoko
(���,
Miyoko?), and are warned of the
dangerous approximation of the
"star of the
Underworld" to the
Earth's orbit.<br />
<br />
And once again, on the actual website, those characters appear as 満月先生 and 美夜子.
The actual XML file is formatted properly other than those special characters, which certainly do not appear to be using the UTF-8 encoding. Is there a way to get NSXML to parse these XML files?
To use characters that are not represented correctly in the document's actual encoding, you need to use their character references. If you want to represent ö you need to type &#246; (in HTML, the named entity &ouml; also works).
Find more on Wikipedia: http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
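To illustrate the idea (this is my own sketch, not from the original answer): numeric character references keep the document ASCII-safe, and the parser turns them back into the real characters.
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

title = "Mō Hitotsu no Hyōrinmaru"

# Escape XML metacharacters, then replace non-ASCII characters with
# numeric character references such as &#333; for ō.
ascii_doc = ("<title>%s</title>" % escape(title)).encode("ascii", "xmlcharrefreplace")
print(ascii_doc)  # b'<title>M&#333; Hitotsu no Hy&#333;rinmaru</title>'

# The parser resolves the references back to the original characters.
print(ET.fromstring(ascii_doc).text)  # Mō Hitotsu no Hyōrinmaru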