MaryTTS HMM voice quality changes with text length

I am using MaryTTS as a text to speech engine inside a Grails Application.
During app testing I found that the speech quality changes drastically (for the worse) with increasing text length when using an HMM voice.
So naturally I tested via the MARY Web Client while tweaking all HMM-relevant parameters (F0Add, F0Scale and Rate), as well as removing them or leaving the default values, but without success.
The voice I am using is bits1-hsmm:5.2 (German Female)
gradle dependency:
compile "de.dfki.mary:voice-bits1-hsmm:5.2"
The code is as simple as:
def marytts = new LocalMaryInterface()
marytts.locale = Locale.GERMAN
marytts.generateAudio text
Everything works fine up to the point where the text to convert exceeds 120 characters (not only in the code but also via the MARY Web Client).
Here is the text I used for the last tests:
Baumaßnahmen im Mai und Oktober Notwendige Instandhaltungsarbeiten an der Münchner S-Bahn-Stammstrecke sollen von nun an gebündelt stattfinden. Die Bahn möchte dadurch die baubedingten Fahrplaneinschränkungen durch gesperrte Gleise geringer halten.
To hear the difference in quality, compare a part of the text (the first couple of words) with the whole text.
Another important point: this does not occur when using a Unit Selection voice.
Am I missing something like a configuration or a specific parameter set, or is this the standard behaviour of HMM voices inside MaryTTS?
It would be great to be able to use this voice with decent quality, since Unit Selection voices are not available as standalone dependencies, and having to split the text into smaller parts and play them sequentially is not really something I would consider.
Any input is appreciated.
Update
Further trial and error showed that the robotic background sound is added when the text contains punctuation marks such as . , : ; [ ] { }, independent of text length! I am not really sure what the root cause is, but at least with some text manipulation before the conversion the voice is usable.
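The text manipulation mentioned in the update could look roughly like the following. This is only a sketch in Python (the application itself is Groovy): the function name strip_punctuation is hypothetical, and the character set is exactly the list of marks from the update, nothing more. Note that removing sentence punctuation also removes the pauses the synthesizer would otherwise insert.

```python
import re

# Punctuation marks reported in the update as triggering the artifact.
PUNCTUATION = ".,:;[]{}"


def strip_punctuation(text: str) -> str:
    """Replace each listed mark with a space, then collapse runs of whitespace."""
    table = str.maketrans({ch: " " for ch in PUNCTUATION})
    return re.sub(r"\s+", " ", text.translate(table)).strip()


if __name__ == "__main__":
    # Hyphens are not in the list, so compound words stay intact.
    print(strip_punctuation("Die Bahn: S-Bahn-Stammstrecke, Mai und Oktober."))
```

The cleaned string would then be passed to generateAudio in place of the raw text.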

How to use "Easy edge trace" and "edge trace distances" in ImageJ?

I have already installed both plugins but don't know how to use them for pod analysis. I need help with that, as I don't have a programming background. Also, can they be used for batch processing of images, in case I have more than 100 images?
Another approach, using a specially written ImageJ macro, gives reasonable estimates of the widths and lengths of all pods in the sample image. You can access the macro code from here. Unzip the zip archive and drop the file "plantPodDimensions.ijm" onto the ImageJ main window. Then open the sample image and run the macro. The estimated pod dimensions appear in a table.
Specimen [right to left]   Mean Pod Width [cm]   Pod Length [cm]
OHiI7_pod-1                0.70±0.11             23.6
OHiI7_pod-2                0.59±0.09             22.3
OHiI7_pod-3                0.64±0.05             20.7
OHiI7_pod-4                0.41±0.04             20.5
OHiI7_pod-5                0.66±0.07             22.9
OHiI7_pod-6                0.68±0.10             24.4
OHiI7_pod-7                0.60±0.07             20.5
Of course, I could not test whether the macro works as expected for images other than the sample image.
The plugin downloads include documents that explain how to use the plugins. Batch processing is possible if the starting points of the traces are known or can somehow be determined by additional pre-processing steps (not trivial). Both plugins are macro-recordable. In any case, batch processing will require some macro code.
For the use case in question, I would recommend performing the analyses via the GUI rather than by batch processing. Coding a suitable macro would take more time than processing 100+ images by hand.

Matlab audio conversion

I recorded my voice in Matlab. Now I want to convert that audio into strings, i.e. written sentences, in Matlab. Is there a way to convert audio into text?
I'm pretty sure MATLAB does not have native speech-to-text functionality.
A quick Google search turned up at least one project integrating speech-to-text into MATLAB.
http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html
Other software that can translate recorded speech into text includes Microsoft's SAPI (built into Windows Vista and Windows 7, and available as a download for Windows XP) and CMU's Sphinx project. Nuance Dragon NaturallySpeaking is an option, but it is comparatively expensive. It's not obvious to me how these could be integrated into MATLAB, though.
You can get somewhat limited mileage out of the built-in Windows Speech API. It depends on your operating system etc., and you need to follow the principles described in the API documentation:
http://msdn.microsoft.com/en-us/library/ms723627(v=vs.85).aspx
This is done through MATLAB's ActiveX server (http://www.mathworks.co.uk/help/matlab/ref/actxserver.html).
You first need to declare a speech recogniser engine:
RC = actxserver('SAPI.SpSharedRecoContext'); %connect to speech engine
Then set up callback functions for each state of the recogniser:
RC.registerevent({'Recognition' @CallbackFunction; 'Hypothesis' @CallbackFunction; 'FalseRecognition' @CallbackFunction})
The contents of the callback function should be along these lines:
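function CallbackFunction(varargin)
% The recognised phrase is handed back through a global variable,
% so the function itself declares no output argument.
global word
result = varargin{end-2}; % third-from-last event argument, as in the original code
word = result.PhraseInfo.GetText;
end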
Then finally switch the recogniser on:
RC.Recognizer.State = 'SRSActive';
You will need to consult the documentation to find out which callback functions are called and when.
You will also need to set up a grammar dictionary to get meaningful results, as otherwise the engine will attempt to recognise any word.

How to store text as paragraphs in SQLite database in an iPhone app?

In my iPhone app, I have a requirement to store a huge amount of text. I have paragraphs of text to be stored in my database along with the newline characters.
What should I do to store the text as paragraphs in SQLite database?
For example, I want to store paragraphs like the ones below:
(the mother of the faithful believers) The commencement of the Divine Inspiration to Allah's Apostle was in the form of good dreams which came true like bright day light, and then the love of seclusion was bestowed upon him. He used to go in seclusion in the cave of Hira where he used to worship (Allah alone) continuously for many days before his desire to see his family. He used to take with him the journey food for the stay and then come back to (his wife) Khadija to take his food like-wise again till suddenly the Truth descended upon him while he was in the cave of Hira. The angel came to him and asked him to read. The Prophet replied, "I do not know how to read.
The Prophet added, "The angel caught me (forcefully) and pressed me so hard that I could not bear it any more. He then released me and again asked me to read and I replied, 'I do not know how to read.'
Basically, I want to save the paragraphs in the database in the same format, with carriage returns.
It depends on what you mean by huge and how you're planning on showing the data. The SQLite TEXT field, by default, can store 1 billion bytes.
You could in theory store all of it in a TEXT field in SQLite, then render it in a UIScrollView (or whatever it is you're using to render) and check the performance, memory usage, etc.
If the performance is unacceptable, you can try "chunking" the text into multiple rows and displaying only the records of the text required for the UI.
See the SQLite Limits document:
Maximum length of a string or BLOB
The maximum number of bytes in a string or BLOB in SQLite is defined by the preprocessor macro SQLITE_MAX_LENGTH. The default value of this macro is 1 billion (1 thousand million or 1,000,000,000). You can raise or lower this value at compile-time using a command-line option like this:
-DSQLITE_MAX_LENGTH=123456789
On the face of it, SQLite doesn't treat newlines any differently from other characters; you can just store the text as-is.
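A quick way to convince yourself of this is a round trip through an in-memory database. The sketch below uses Python's standard sqlite3 module rather than the iPhone C API, and the table name documents, the column name body and the function name roundtrip are made up for illustration. The important detail is the bound parameter (?), which passes the text through untouched so no escaping of newlines is needed.

```python
import sqlite3


def roundtrip(text: str) -> str:
    """Store text in a TEXT column and read it back unchanged."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT)")
        # Bind the value as a parameter instead of building the SQL string.
        conn.execute("INSERT INTO documents (body) VALUES (?)", (text,))
        (stored,) = conn.execute("SELECT body FROM documents").fetchone()
        return stored
    finally:
        conn.close()


if __name__ == "__main__":
    sample = "First paragraph.\n\nSecond paragraph,\r\nwith a carriage return."
    print(roundtrip(sample) == sample)
```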
The question, though, is why you are storing large volumes of raw text in SQLite. If you want to search it or organize it somehow, neither SQLite nor Core Data is probably the best choice without first massaging the text into some other form. Alternatively, you could store the raw text on disk and keep some kind of searchable index in the database.
My suggestion: if you want to display your text in a web view, add HTML tags to your text. That way you can add paragraphs, new lines and many other effects to your text.
Thanks
So do you want to split the text into paragraphs and store each one in its own row, like this:
(paragraph_number, text_of_paragraph)
That would be:
create table paragraphs (paragraph_number, text_of_paragraph);
Then, in whatever language you use, split the text into a list l of (pn, tp) tuples and run:
executemany("insert into paragraphs values (?, ?)", l)
or:
for p in l:
    execute("insert into paragraphs values (?, ?)", p)
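Spelled out as a runnable sketch with Python's sqlite3 module, the idea above might look like this. It assumes paragraphs are separated by blank lines, which the question does not state; the function names store_paragraphs and load_paragraphs are hypothetical. Loading by paragraph number is what allows the UI to fetch only the rows it currently needs.

```python
import sqlite3


def store_paragraphs(conn: sqlite3.Connection, text: str) -> int:
    """Split text on blank lines and insert one row per paragraph."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    rows = list(enumerate(paragraphs, start=1))  # (paragraph_number, text)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS paragraphs ("
        "paragraph_number INTEGER PRIMARY KEY, text_of_paragraph TEXT)"
    )
    conn.executemany("INSERT INTO paragraphs VALUES (?, ?)", rows)
    return len(rows)


def load_paragraphs(conn: sqlite3.Connection, first: int, last: int) -> list:
    """Return the paragraphs numbered first..last (inclusive), in order."""
    cursor = conn.execute(
        "SELECT text_of_paragraph FROM paragraphs "
        "WHERE paragraph_number BETWEEN ? AND ? ORDER BY paragraph_number",
        (first, last),
    )
    return [row[0] for row in cursor]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    store_paragraphs(connection, "First.\n\nSecond.\n\nThird.")
    print(load_paragraphs(connection, 2, 3))
```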
I would use HTML to represent my paragraphs, for example:
Saving the Text
<div>
<p>(the mother of the faithful believers) The commencement of the Divine Inspiration to Allah's Apostle was in the form of good dreams which came true like bright day light, and then the love of seclusion was bestowed upon him. He used to go in seclusion in the cave of Hira where he used to worship (Allah alone) continuously for many days before his desire to see his family. He used to take with him the journey food for the stay and then come back to (his wife) Khadija to take his food like-wise again till suddenly the Truth descended upon him while he was in the cave of Hira. The angel came to him and asked him to read. The Prophet replied, "I do not know how to read.</p>
<p>The Prophet added, "The angel caught me (forcefully) and pressed me so hard that I could not bear it any more. He then released me and again asked me to read and I replied, 'I do not know how to read.</p>
</div>
Loading the Paragraphs
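I would load them inside a UIWebView as HTML. You can save the HTML into a file in the app sandbox, let's say Paragraph1.HTML, and load it as follows:
// This is a user-defined method
- (void)loadDocument:(NSString *)documentName inView:(UIWebView *)webView
{
    // Assumption: the HTML file was saved in the app's Documents directory
    NSString *documentsDir = [NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES) objectAtIndex:0];
    NSString *filePath = [documentsDir stringByAppendingPathComponent:documentName];
    NSURL *url = [NSURL fileURLWithPath:filePath]; // Path of the HTML file
    NSURLRequest *request = [NSURLRequest requestWithURL:url];
    [webView loadRequest:request];
}
Dispose of the file after loading it; this will save you time and space.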
Good luck.

Issues with iPhone Http Streaming with concatenated video files

We are seeing this when "tying" two video files together.
For example, we have an ad video that is segmented and a content file that is also segmented.
We create a new file that contains both the ad and the content segment information. However, we are seeing an issue where either the ad is truncated or the content starts having A/V sync issues.
Both the ad and the content are segmented the same way, with 5-second segments. However, since ads are of variable length, the resulting file may contain a leftover short segment, something like:
#EXTM3U
#EXT-X-TARGETDURATION:5
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:5,
fileSequence6.ts
#EXTINF:5,
fileSequence7.ts
#EXTINF:4,
fileSequence8.ts
#EXTINF:5,
fileSequence0.ts
#EXTINF:5,
fileSequence1.ts
#EXTINF:5,
fileSequence2.ts
#EXTINF:3,
fileSequence3.ts
Is this the proper way to play two files one after the other without rebuffering?
Should generate-variant-plist be used to create a playlist of the two files?
When you have a break in the stream to switch to a commercial, ad, or alternate video source then you want to introduce the discontinuity tag before the start of the next segment, for example:
#EXTM3U
#EXT-X-TARGETDURATION:5
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:5,
movie0.ts
#EXTINF:2,
movie1.ts
#EXT-X-DISCONTINUITY
#EXTINF:5,
commercial0.ts
#EXTINF:5,
commercial1.ts
#EXTINF:3,
commercial2.ts
This gets a little more complicated if you encrypt the streams, because they use progressive encryption based on the prior segment's encryption state and the sequence number, which come together to form an "Initialization Vector". If you break the stream, you have to reset the initialization vector so that encryption/decryption can continue uninterrupted. This is an involved process, so it is best to search for Initialization Vector in Apple's docs.
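For the unencrypted case, generating such a combined playlist is mostly string assembly. The following Python sketch writes the discontinuity tag between consecutive sources; the function name join_with_discontinuity and its input shape (a list of sources, each a list of (duration, uri) pairs) are assumptions made for this example, and it emits only the tags that appear in the playlists above.

```python
def join_with_discontinuity(sources, target_duration):
    """Build a media playlist from several segment lists.

    sources: list of segment lists, one per original stream; each segment
             is a (duration_in_seconds, uri) tuple.
    An #EXT-X-DISCONTINUITY tag is written before the first segment of
    every source except the first one.
    """
    lines = [
        "#EXTM3U",
        f"#EXT-X-TARGETDURATION:{target_duration}",
        "#EXT-X-MEDIA-SEQUENCE:0",
    ]
    for index, segments in enumerate(sources):
        if index > 0:
            lines.append("#EXT-X-DISCONTINUITY")
        for duration, uri in segments:
            lines.append(f"#EXTINF:{duration},")
            lines.append(uri)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    movie = [(5, "movie0.ts"), (2, "movie1.ts")]
    commercial = [(5, "commercial0.ts"), (5, "commercial1.ts"), (3, "commercial2.ts")]
    print(join_with_discontinuity([movie, commercial], target_duration=5), end="")
```

Run as a script, this reproduces the example playlist shown above.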

Need a tool for video processing

I have a 2 GB MPEG file of people running, jogging, walking, etc. I will use it in an image classification project, but I need to segment the video per person and per action.
For example:
there are 25 people in the video, who repeat these actions in order
1st person
-runs
-walks
2nd person
-runs
-walks
and so on...
What I want is to have 2 different MPEG files for each person,
such as:
firstperson_runs.mpeg
firstperson_waves.mpeg
So I need a tool to split the big file into these files. The splitting should be based on time,
such as:
pick t1: start of action
pick t2: end of action
create a new video from the big file for the interval between t1 and t2
Of course, I will select the time intervals for each video.
OS: Windows XP Pro
If it can be done in MATLAB, can you describe how?
Any help?
I imagine there are a number of tools available to do this without MATLAB, but if you really want to use MATLAB I would check out these submissions on The MathWorks File Exchange:
Gerald Dalley's videoIO Toolbox for Matlab
Micah Richert's mmread
David Foti's mpgread and mpgwrite
EDIT:
As mentioned by M456, you can also use the built-in function MMREADER for creating a multimedia reader object for your movie file (and subsequently reading selected movie frames from it with the READ method). However, I don't know which version of MATLAB this function was introduced in. It is in versions 7.7 and 7.8 (R2008b and R2009a, respectively), but it is not in version 7.1.
MATLAB can do such video split operations. There are two built-in functions (aviread and mmreader) for reading video files. Both give you access to the individual frames of the video. You can save these as separate frames or make a new video out of them using avifile.
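Whichever reader is used, splitting by time comes down to converting each (t1, t2) pair into a range of frame numbers. Here is a minimal sketch of that arithmetic in Python, assuming a constant frame rate and 1-based frame numbering (as MATLAB uses); the function name frame_range and the example times are hypothetical.

```python
import math


def frame_range(t_start: float, t_end: float, fps: float) -> tuple:
    """Return the 1-based, inclusive (first, last) frame numbers that cover
    the interval from t_start to t_end (both in seconds)."""
    if t_start < 0 or t_end <= t_start:
        raise ValueError("need 0 <= t_start < t_end")
    first = math.floor(t_start * fps) + 1
    last = math.ceil(t_end * fps)
    return first, last


if __name__ == "__main__":
    # Hypothetical hand-picked intervals for one person, at 25 frames per second.
    clips = {"firstperson_runs": (0.0, 2.0), "firstperson_walks": (2.0, 4.5)}
    for name, (t1, t2) in clips.items():
        print(name, frame_range(t1, t2, fps=25))
```

The resulting frame numbers can then be used to read only the frames of one clip and write them out as a new video.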