I am learning how to use Sphinx4 using the Maven plug-in for Eclipse.
I took the transcribe demo found on GitHub and altered it to process a file of my own. The audio file is 16-bit, mono, 16 kHz, and approximately 13 seconds long. I noticed that it sounds like it is in slow motion.
The words spoken in the file are, "also make sure it's easy for you to access the recording files so you could upload it if asked".
I am attempting to transcribe the file and my results are horrendous. My attempts at finding forum posts or links that thoroughly explain how to improve the results, or what I am doing wrong, have led me nowhere.
I am looking to improve the accuracy of the transcription, but would like to avoid training a model myself because of the variety of data that my current project will have to deal with. Is this not possible, and is the code I am using wrong?
CODE
(NOTE: Audio file available at https://instaud.io/8qv)
public class App {
public static void main(String[] args) throws Exception {
System.out.println("Loading models...");
Configuration configuration = new Configuration();
// Load model from the jar
configuration
.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
// You can also load model from folder
// configuration.setAcousticModelPath("file:en-us");
configuration
.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
configuration
.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.dmp");
StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(
configuration);
FileInputStream stream = new FileInputStream(new File("/home/tmscanlan/workspace/example/vocaroo_test_revised.wav"));
// stream.skip(44); I commented this out due to the short length of my file
// Simple recognition with generic model
recognizer.startRecognition(stream);
SpeechResult result;
while ((result = recognizer.getResult()) != null) {
// I added the following print statements to get more information
System.out.println("\ngetWords() before loop: " + result.getWords());
System.out.format("Hypothesis: %s\n", result.getHypothesis());
System.out.print("\nThe getResult(): " + result.getResult()
+ "\nThe getLattice(): " + result.getLattice());
System.out.println("List of recognized words and their times:");
for (WordResult r : result.getWords()) {
System.out.println(r);
}
System.out.println("Best 3 hypothesis:");
for (String s : result.getNbest(3))
System.out.println(s);
}
recognizer.stopRecognition();
// Live adaptation to speaker with speaker profiles
stream = new FileInputStream(new File("/home/tmscanlan/workspace/example/warren_test_smaller.wav"));
// stream.skip(44); I commented this out due to the short length of my file
// Stats class is used to collect speaker-specific data
Stats stats = recognizer.createStats(1);
recognizer.startRecognition(stream);
while ((result = recognizer.getResult()) != null) {
stats.collect(result);
}
recognizer.stopRecognition();
// Transform represents the speech profile
Transform transform = stats.createTransform();
recognizer.setTransform(transform);
// Decode again with updated transform
stream = new FileInputStream(new File("/home/tmscanlan/workspace/example/warren_test_smaller.wav"));
// stream.skip(44); I commented this out due to the short length of my file
recognizer.startRecognition(stream);
while ((result = recognizer.getResult()) != null) {
System.out.format("Hypothesis: %s\n", result.getHypothesis());
}
recognizer.stopRecognition();
System.out.println("...Printing is done..");
}
}
Here is the output (a photo album I took): http://imgur.com/a/Ou9oH
As Nikolay says, the audio sounds odd, probably because you haven't resampled it in the right way.
To downsample the audio from the original 22050 Hz to the desired 16kHz, you can run the following command:
sox Vocaroo.wav -r 16000 Vocaroo16.wav
Vocaroo16.wav will sound much better and will (probably) give you better ASR results.
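Before feeding a file to the recognizer, it can help to check what its header claims. The following is a minimal sketch using only the standard javax.sound.sampled API; the class and method names (WavFormatCheck, isSphinxFriendly) are made up for illustration, and the target format (16 kHz, 16-bit, mono, signed little-endian PCM) is an assumption based on what the question is aiming for. Note that it only reads the header: a file whose header says 16 kHz while the samples were recorded at 22050 Hz will still pass, which is exactly the "slow motion" case, so the sox resampling above is still needed.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import java.io.File;

final class WavFormatCheck {
    /** True if the format is 16 kHz, 16-bit, mono, signed little-endian PCM. */
    static boolean isSphinxFriendly(AudioFormat f) {
        return AudioFormat.Encoding.PCM_SIGNED.equals(f.getEncoding())
                && Math.abs(f.getSampleRate() - 16000f) < 0.5f
                && f.getSampleSizeInBits() == 16
                && f.getChannels() == 1
                && !f.isBigEndian();
    }

    public static void main(String[] args) throws Exception {
        // Path to the WAV file is passed on the command line.
        File wav = new File(args[0]);
        AudioFormat format = AudioSystem.getAudioFileFormat(wav).getFormat();
        System.out.println(format + " -> "
                + (isSphinxFriendly(format) ? "header looks fine" : "resample first"));
    }
}
```

Running it on both the original and the resampled file shows quickly whether the header changed as expected.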
I am trying to make a SIP video call using PJSIP/pjsua on my Raspberry Pi 3.
Before coding, I'm using the main sample app to test different options. Everything seems to work (registering, audio calling, ...) but when I try to start a video call, the program stops with the following message:
pjsua-armv7l-unknown-linux-gnueabihf: ../src/pjmedia-videodev/v4l2_dev.c:737: vid4lin_stream_get_frame_mmap: Assertion `!"frame buffer is too small for v4l2"' failed.
I've searched a lot, including the source code :
/* get frame from mmap */
static pj_status_t vid4lin_stream_get_frame_mmap(vid4lin_stream *stream, pjmedia_frame *frame)
{
struct v4l2_buffer buf;
pj_time_val time;
pj_status_t status = PJ_SUCCESS;
pj_bzero(&buf, sizeof(buf));
buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
buf.memory = V4L2_MEMORY_MMAP;
status = xioctl(stream->fd, VIDIOC_DQBUF, &buf);
if (status != PJ_SUCCESS)
return status;
if (frame->size < buf.bytesused) {
/* supplied buffer is too small */
pj_assert(!"frame buffer is too small for v4l2");
status = PJ_ETOOSMALL;
goto on_return;
}
So I understand that the pjmedia_frame has a "size" smaller than the v4l2 buffer, which causes the failure.
My question is simple: how can I change this setting?
I tried everything in the sample app: changing resolution, bitrate, fps, ...
I found some resources saying to change the H.264 profile level. OK, but where do I set it? Is it within the v4l2 manager, or directly in the app? How can I do it?
I played with different v4l2 options to reduce the bitrate/resolution in order to get a smaller buffer, but I still get the same error.
At this point I'm completely clueless.
For info, I compiled PJsip using openh264 (no libx264) as suggested by PjSip.
Thanks for your help/ideas ;)
Regarding your question about the profile level, you can try:
const pj_str_t codec_id = {"H264", 4};
pjmedia_vid_codec_param param;
pj_status_t status;
status = pjsua_vid_codec_get_param(&codec_id, &param);
param.dec_fmtp.param[0].name = pj_str("profile-level-id");
param.dec_fmtp.param[0].val = pj_str("42e01f");
status = pjsua_vid_codec_set_param(&codec_id, &param);
Do this anywhere after pjsua_start(). The last two characters of val are the level (1f here, i.e. level 3.1). A description of the levels can be found here (link), and more information about H.264 profiles here (link).
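To make the layout of the profile-level-id string concrete, here is a small sketch that splits the six hex digits into their three bytes: profile_idc, the constraint flags, and level_idc (the layout used for H.264 over RTP). It is written in Java purely as an illustration and is not part of PJSIP; the class name ProfileLevelId is made up, and the special case of level 1b is deliberately not handled.

```java
final class ProfileLevelId {
    final int profileIdc;       // first byte, e.g. 0x42 = 66 (Baseline)
    final int constraintFlags;  // second byte (constraint_set flags)
    final int levelIdc;         // third byte, e.g. 0x1f = 31 (level 3.1)

    private ProfileLevelId(int profileIdc, int constraintFlags, int levelIdc) {
        this.profileIdc = profileIdc;
        this.constraintFlags = constraintFlags;
        this.levelIdc = levelIdc;
    }

    /** Parses a six-digit hex string such as "42e01f". */
    static ProfileLevelId parse(String hex) {
        if (hex == null || !hex.matches("[0-9a-fA-F]{6}")) {
            throw new IllegalArgumentException("expected 6 hex digits, got: " + hex);
        }
        int v = Integer.parseInt(hex, 16);
        return new ProfileLevelId((v >> 16) & 0xFF, (v >> 8) & 0xFF, v & 0xFF);
    }

    /** Level as a dotted string; level_idc is ten times the level number. */
    String level() {
        return (levelIdc / 10) + "." + (levelIdc % 10);
    }
}
```

For "42e01f" this yields profile_idc 66, constraint flags 0xe0 and level 3.1, so lowering the level means changing only the last two characters.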
I'm not a v4l2 expert, but I have a little experience with encoding video on the RPi 3, and I suggest using FFmpeg instead of plain openh264 because it supports hardware acceleration (link).
Good luck!
I capture images from an IP camera and work with the frames. My program detects movement, then takes a photo and saves it on the computer.
It works perfectly at first, but after running for 2-3 hours it usually throws an error, and I cannot find an explanation for it. If the error were in grabbing or processing the image, it should happen from the start, shouldn't it?
The error I get is the following:
Exception in thread "main" java.lang.NullPointerException
at com.googlecode.javacv.IPCameraFrameGrabber.grab(IPCameraFrameGrabber.java:105)
at Llamada.main(Llamada.java:34)
I have searched for error no. 105 but have not found anything.
The program is the following:
public class Llamada {
public static void main(String[] args) throws Exception {
IPCameraFrameGrabber grabber = new IPCameraFrameGrabber("http://192.168.2.102:80/mjpg/video.mjpg");
//OpenCVFrameGrabber grabber = new OpenCVFrameGrabber(0);
grabber.start();
IplImage frame = grabber.grab();
IplImage image = null;
IplImage prevImage = null;
IplImage diff = null;
Date data = new Date();
String output = "";
int i=0, j=0;
CanvasFrame canvasFrame = new CanvasFrame("IP Camera");
canvasFrame.setCanvasSize(frame.width(), frame.height());
CvMemStorage storage = CvMemStorage.create();
while (canvasFrame.isVisible() && (frame = grabber.grab()) != null) {
cvSmooth(frame, frame, CV_GAUSSIAN, 9, 9, 2, 2);
if (image == null) {
image = IplImage.create(frame.width(), frame.height(), IPL_DEPTH_8U, 1);
cvCvtColor(frame, image, CV_RGB2GRAY);
} else {
prevImage = IplImage.create(frame.width(), frame.height(), IPL_DEPTH_8U, 1);
prevImage = image;
image = IplImage.create(frame.width(), frame.height(), IPL_DEPTH_8U, 1);
cvCvtColor(frame, image, CV_RGB2GRAY);
}
if (diff == null) {
diff = IplImage.create(frame.width(), frame.height(), IPL_DEPTH_8U, 1);
}
if (prevImage != null) {
// perform ABS difference
cvAbsDiff(image, prevImage, diff);
// do some threshold for wipe away useless details
cvThreshold(diff, diff, 64, 255, CV_THRESH_BINARY);
canvasFrame.showImage(diff);
// recognize contours
CvSeq contour = new CvSeq(null);
cvFindContours(diff, storage, contour, Loader.sizeof(CvContour.class), CV_RETR_LIST, CV_CHAIN_APPROX_SIMPLE);
while (contour != null && !contour.isNull()) {
if (contour.elem_size() > 0) {
output = data.toString();
if (data != null)
output = output.substring(0,10);
if(i%300 == 0)
cvSaveImage((j++)+" "+ output +"-capture.jpg", frame);
CvBox2D box = cvMinAreaRect2(contour, storage);
// test intersection
if (box != null) {
CvPoint2D32f center = box.center();
CvSize2D32f size = box.size();
}
i++;
}
contour = contour.h_next();
}
}
}
grabber.stop();
canvasFrame.dispose();
}
}
Thank you for everything!
Have you tried using a debugger and setting a break point? I understand that waiting around for 2-3 hours isn't fun, but maybe it'd help you get a handle on what's going on.
That seems to be in your while loop's second conditional part. Something inside the method grab on the grabber object is throwing a NullPointerException.
Probably the way you've initialized the grabber has led it to do this.
And it would be useful to know which version of the IPCameraFrameGrabber class you're using and what the author of that class really expected. Namely it's initialized to respond to a particular camera's url. In reading the class, it would appear this makes no claim to work with all IP cameras' MJPEG streams.
Let's look at one example comment in there:
foscam url http://host/videostream.cgi?user=username&pwd=password
http://192.168.0.59:60/videostream.cgi?user=admin&pwd=password
android ipcam http://192.168.0.57:8080/videofeed
And compare that to your url:
http://192.168.2.102:80/mjpg/video.mjpg
I gather it is not a foscam videostream.cgi url nor an android ipcam videofeed url, which would appear to be the only tested urls. It reminds me of an Axis camera url. More on that later.
In a recent version of that class (also in the older one actually), there seems to be some hackish attempt at reading only to the end of a subheader that is always delimited by crlfcrlf which could have been done just as well with a buffered input reader reading lines until it gets an empty line. What I do see here that seems likely to cause an npe is:
When your url's http server's response does not contain the content-length header, which is quite possible, the returned readImage() byte[] is null.
Since javax.imageio.ImageIO is specified to throw an IllegalArgumentException (not an NPE) when it gets a null input, my guess is that the NPE comes from the ByteArrayInputStream constructor in the grabBufferedImage method, from IplImage.createFrom(null) in the old version, or from b.length in the newer version.
None of the line numbers of these versions line up with the error message you've shown that you're getting, so maybe your version of the library is yet again different, and broken differently. Try using the debugger, edit and patch the source of the IPCameraFrameGrabber to better support your mjpeg over http "device" based on what you find out is really in the input stream of the http response.
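As a starting point for such a patch, here is a minimal sketch of reading one part's headers up to the empty line and treating a missing Content-Length as an explicit case instead of letting a null propagate. It uses only java.io and is not the IPCameraFrameGrabber code; the class and method names (MjpegPartHeaders, read, contentLength) are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class MjpegPartHeaders {
    /** Reads header lines up to the first empty line; names are lower-cased. */
    static Map<String, String> read(InputStream in) throws IOException {
        Map<String, String> headers = new LinkedHashMap<>();
        String line = readLine(in);
        // Skip blank lines that may precede the boundary line.
        while (line != null && line.isEmpty()) {
            line = readLine(in);
        }
        while (line != null && !line.isEmpty()) {
            int colon = line.indexOf(':');
            if (colon > 0) { // the boundary line has no colon and is ignored
                headers.put(line.substring(0, colon).trim().toLowerCase(Locale.ROOT),
                        line.substring(colon + 1).trim());
            }
            line = readLine(in);
        }
        return headers;
    }

    /** Returns the declared length, or -1 if the header is missing or malformed. */
    static int contentLength(Map<String, String> headers) {
        String value = headers.get("content-length");
        if (value == null) {
            return -1;
        }
        try {
            return Integer.parseInt(value);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    /** Reads bytes up to LF, dropping CR; returns null at end of stream. */
    private static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            if (b != '\r') {
                buf.write(b);
            }
        }
        if (b == -1 && buf.size() == 0) {
            return null;
        }
        return new String(buf.toByteArray(), StandardCharsets.ISO_8859_1);
    }
}
```

Reading byte by byte keeps the stream positioned exactly at the start of the JPEG data, which a BufferedReader would not guarantee. When contentLength returns -1, the caller has to scan for the next boundary (or reconnect) rather than allocate a buffer.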
Since the url format reminds me of an Axis camera, I tried this with one running firmware v5.50 with the boa server built in:
$ curl -I http://user:pass@10.10.10.10:8080/mjpg/video.mjpg
HTTP/1.0 200 OK
Cache-Control: no-cache
Pragma: no-cache
Expires: Thu, 01 Dec 1994 16:00:00 GMT
Connection: close
Content-Type: multipart/x-mixed-replace; boundary=myboundary
So you can see the content length is missing there. However, you do say you get frames for hours before it fails, so I'm at a loss on that part. It sounds as though either the input stream is getting closed, or the Java implementation wrapping the stream, implemented in the HTTP protocol handler, runs out of some kind of total space or hits an open-connection timer for some reason. I know this seems vague.
Another thing that seems odd is that from what I read in the two example classes of IPCameraFrameGrabber linked, every call to grab reads the input stream looking for headers first, which doesn't make sense to me right now, and I feel as though I must be misreading that.
I am very comfortable with UIMA, but my new work requires me to use GATE.
So I started learning GATE. My question is about how to calculate the performance of my tagging engines (Java based).
With UIMA, I generally dump all my system annotations into an XMI file and then use Java code to compare them with human-annotated (gold standard) annotations to calculate precision, recall, and F-score.
But I am still struggling to find something similar with GATE.
After going through GATE's Annotation Diff and other info on that page, I feel there has to be an easy way to do it in Java, but I am not able to figure out how. I thought I would ask here, since someone might have already figured this out.
How do I store system annotations in an XMI file (or any other format) programmatically?
How do I create gold standard data (i.e. human-annotated data) once for performance calculation?
Let me know if you need more specifics or details.
This code seems helpful for writing the annotations to an XML file.
http://gate.ac.uk/wiki/code-repository/src/sheffield/examples/BatchProcessApp.java
String docXMLString = null;
// if we want to just write out specific annotation types, we must
// extract the annotations into a Set
if(annotTypesToWrite != null) {
// Create a temporary Set to hold the annotations we wish to write out
Set annotationsToWrite = new HashSet();
// we only extract annotations from the default (unnamed) AnnotationSet
// in this example
AnnotationSet defaultAnnots = doc.getAnnotations();
Iterator annotTypesIt = annotTypesToWrite.iterator();
while(annotTypesIt.hasNext()) {
// extract all the annotations of each requested type and add them to
// the temporary set
AnnotationSet annotsOfThisType =
defaultAnnots.get((String)annotTypesIt.next());
if(annotsOfThisType != null) {
annotationsToWrite.addAll(annotsOfThisType);
}
}
// create the XML string using these annotations
docXMLString = doc.toXml(annotationsToWrite);
}
// otherwise, just write out the whole document as GateXML
else {
docXMLString = doc.toXml();
}
// Release the document, as it is no longer needed
Factory.deleteResource(doc);
// output the XML to <inputFile>.out.xml
String outputFileName = docFile.getName() + ".out.xml";
File outputFile = new File(docFile.getParentFile(), outputFileName);
// Write output files using the same encoding as the original
FileOutputStream fos = new FileOutputStream(outputFile);
BufferedOutputStream bos = new BufferedOutputStream(fos);
OutputStreamWriter out;
if(encoding == null) {
out = new OutputStreamWriter(bos);
}
else {
out = new OutputStreamWriter(bos, encoding);
}
out.write(docXMLString);
out.close();
System.out.println("done");
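Once both the system and the gold annotations are available as sets, the scoring itself is short. Below is a minimal sketch of strict precision, recall and F1, where an annotation counts as correct only if type and offsets match exactly. It does not use GATE's API: the Span class is a hypothetical stand-in for whatever you extract from the annotation sets, and SpanEval is a made-up name.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

final class SpanEval {
    /** Hypothetical stand-in for an annotation: type plus character offsets. */
    static final class Span {
        final long start;
        final long end;
        final String type;

        Span(long start, long end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Span)) {
                return false;
            }
            Span s = (Span) o;
            return start == s.start && end == s.end && type.equals(s.type);
        }

        @Override
        public int hashCode() {
            return Objects.hash(start, end, type);
        }
    }

    /** Returns {precision, recall, f1} under strict matching. */
    static double[] score(Set<Span> gold, Set<Span> system) {
        Set<Span> correct = new HashSet<>(system);
        correct.retainAll(gold); // annotations present in both sets
        double p = system.isEmpty() ? 0.0 : (double) correct.size() / system.size();
        double r = gold.isEmpty() ? 0.0 : (double) correct.size() / gold.size();
        double f1 = (p + r == 0.0) ? 0.0 : 2 * p * r / (p + r);
        return new double[] {p, r, f1};
    }
}
```

With three gold spans and two system spans of which one matches exactly, this gives precision 0.5, recall 1/3 and F1 0.4. Lenient (overlap-based) matching would need a different comparison than set intersection.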
I am new to J2ME mobile applications. I have a college project to be done within a month, and I need some basic idea of how it can be done. I am using NetBeans 6.8 with the J2ME platform.
I have to create a source, a destination and many intermediate nodes (mobile phones). A file has to be sent (I am using Bluetooth) from the source to the destination via several intermediate nodes. The file can be split into chunks, which can also be subdivided at any level.
http://imageshack.us/photo/my-images/692/parallelism.jpg/
This is how it should work:
Initially, the source sends simple objects of a class (present in all nodes) to the destination via several paths. Every node updates the object by adding its Bluetooth address and passes it to the next node. When it reaches the destination, the same object is sent back to the source. The source identifies some of the optimal paths and uses them for the file transfer.
The source splits the file and sends the parts to the nearest nodes. The intermediate nodes can also split and send the divided parts.
When all the parts reach the destination, they are joined and the file is reconstructed.
I created separate NetBeans projects for the source, destination and intermediate nodes.
Splitting: I split the file successfully by reading it into a byte array and creating files using FileConnection and OutputStream.
public void splitfiles(int len)
{
String url="file:///root1/testfile.jpg";
// int len = 102400;
byte buffer[] = new byte[size];
int count = 0;
try
{
FileConnection fconi = (FileConnection)Connector.open(url,Connector.READ);
InputStream fis = fconi.openInputStream();
while (true)
{
int i = fis.read(buffer, 0, len); //creating byte array of size "len" bytes
if (i == -1)
break;
++count;
String filename ="file:///root1/testfile.part" + count;
FileConnection fcono = (FileConnection)Connector.open(filename,Connector.READ_WRITE);
if (!fcono.exists())
fcono.create();
OutputStream fos = fcono.openOutputStream();
fos.write(buffer, 0, i); //creating files out of byte array "buffer"
fos.close();
fcono.close();
}
}
catch(Exception e)
{ }
}
Please tell me how to rejoin the files. (I rejoined them using the Java class "RandomAccessFile", which is not present in J2ME.)
I tried the following:
while(number of chunks)
{
read a single file in inputstream (files are read one after the other)
copy it to a byte array and flush inputstream
write it to ouputstream
}
copy outputstream to a file
flush outputstream
Please give me some idea about
how to rejoin the chunks in j2me
How to pass object of a class via bluetooth
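For the rejoining part, one possible approach is to skip the intermediate byte array per file and copy each chunk straight into a single output stream, in order. The sketch below uses only java.io.InputStream and OutputStream and plain arrays, so it should be portable to J2ME; the class name ChunkJoiner is made up. It assumes the input streams would come from FileConnection.openInputStream() on testfile.part1, testfile.part2, ... and the output stream from openOutputStream() on the target file, as in the splitting code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class ChunkJoiner {
    /**
     * Copies every part, in array order, into out and returns the total
     * number of bytes written. Each part is closed after it has been read.
     */
    static long join(InputStream[] parts, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096];
        long total = 0;
        for (int i = 0; i < parts.length; i++) {
            InputStream part = parts[i];
            try {
                int n;
                // read() may return fewer bytes than requested, so loop until EOF
                while ((n = part.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                    total += n;
                }
            } finally {
                part.close();
            }
        }
        out.flush();
        return total;
    }
}
```

The caller stays responsible for closing the output stream and its connection. The order of the array is what determines the order of the bytes in the rebuilt file, so the parts must be opened by their chunk number.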
From what I understood here, "V8 has a generational garbage collector. Moves objects around randomly. Node can't get a pointer to raw string data to write to socket." so I shouldn't store data that comes from a TCP stream in a string, especially if that string becomes bigger than Math.pow(2,16) bytes. (I hope I'm right so far.)
What is then the best way to handle all the data that's coming from a TCP socket? So far I've been trying to use _:_:_ as a delimiter because I think it's fairly unique and won't clash with other content.
A sample of the data that would come would be something_:_:_maybe a large text_:_:_ maybe tons of lines_:_:_more and more data
This is what I tried to do:
net = require('net');
var server = net.createServer(function (socket) {
socket.on('connect',function() {
console.log('someone connected');
buf = new Buffer(Math.pow(2,16)); //new buffer with size 2^16
socket.on('data',function(data) {
if (data.toString().search('_:_:_') === -1) { // If there's no separator in the data that just arrived...
buf.write(data.toString()); // ... write it on the buffer. it's part of another message that will come.
} else { // if there is a separator in the data that arrived
parts = data.toString().split('_:_:_'); // the first part is the end of a previous message, the last part is the start of a message to be completed in the future. Parts between separators are independent messages
if (parts.length == 2) {
msg = buf.toString('utf-8',0,4) + parts[0];
console.log('MSG: '+ msg);
buf = (new Buffer(Math.pow(2,16))).write(parts[1]);
} else {
msg = buf.toString() + parts[0];
for (var i = 1; i <= parts.length -1; i++) {
if (i !== parts.length-1) {
msg = parts[i];
console.log('MSG: '+msg);
} else {
buf.write(parts[i]);
}
}
}
}
});
});
});
server.listen(9999);
Whenever I try to console.log('MSG' + msg), it will print out the whole buffer, so it's useless to see if something worked.
How can I handle this data the proper way ? Would the lazy module work, even if this data is not line oriented ? Is there some other module to handle streams that are not line oriented ?
It has indeed been said that there's extra work going on because Node has to take that buffer and then push it into V8 and cast it to a string. However, doing a toString() on the buffer isn't any better. There's no good solution to this right now, as far as I know, especially if your end goal is to get a string and fool around with it. It's one of the things Ryan mentioned at NodeConf as an area where work needs to be done.
As for delimiter, you can choose whatever you want. A lot of binary protocols choose to include a fixed header, such that you can put things in a normal structure, which a lot of times includes a length. In this way, you slice apart a known header and get information about the rest of the data without having to iterate over the entire buffer. With a scheme like that, one can use a tool like:
node-binary - https://github.com/substack/node-binary
node-ctype - https://github.com/rmustacc/node-ctype
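To illustrate the fixed-header idea independently of any particular module, here is a minimal sketch of a length-prefixed decoder. It is written in Java rather than Node, and the wire format is an assumption made for the example: a 4-byte big-endian length followed by that many payload bytes. The class name LengthPrefixedDecoder and its methods are made up. The key point is the same as for the socket's 'data' events: chunks arrive at arbitrary boundaries, so incomplete bytes are kept until the rest shows up.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

final class LengthPrefixedDecoder {
    private byte[] pending = new byte[0]; // bytes of a frame that is not complete yet

    /** Feeds one received chunk and returns every message completed by it. */
    List<byte[]> feed(byte[] chunk) {
        byte[] joined = new byte[pending.length + chunk.length];
        System.arraycopy(pending, 0, joined, 0, pending.length);
        System.arraycopy(chunk, 0, joined, pending.length, chunk.length);

        List<byte[]> messages = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(joined); // big-endian by default
        while (buf.remaining() >= 4) {
            buf.mark();
            int length = buf.getInt();
            if (length < 0) {
                throw new IllegalStateException("negative frame length: " + length);
            }
            if (buf.remaining() < length) {
                buf.reset(); // header read but payload incomplete: wait for more data
                break;
            }
            byte[] payload = new byte[length];
            buf.get(payload);
            messages.add(payload);
        }
        pending = new byte[buf.remaining()];
        buf.get(pending);
        return messages;
    }

    /** Builds one frame: 4-byte length header followed by the payload. */
    static byte[] frame(byte[] payload) {
        return ByteBuffer.allocate(4 + payload.length)
                .putInt(payload.length)
                .put(payload)
                .array();
    }
}
```

Compared with searching for a delimiter such as _:_:_, the decoder never has to scan or stringify the payload, and the payload may contain any bytes. A real server would also want an upper bound on the accepted length so that a bad header cannot make it buffer without limit.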
As an aside, buffers can be accessed via array syntax, and they can also be sliced apart with .slice().
Lastly, check here: https://github.com/joyent/node/wiki/modules -- find a module that parses a simple tcp protocol and seems to do it well, and read some code.
You should use the new streams2 API: http://nodejs.org/api/stream.html
Here are some very useful examples: https://github.com/substack/stream-handbook
https://github.com/lvgithub/stick