TagLib-Sharp - Can it read/write chapter marks? - taglib-sharp

I am trying to read chapters from mp4 video files. I don't see this in the File.Tags list, but I was hoping there was a way to get them via requesting the chap atom.
I did try mp4chap, but it only gets me the first chapter. I think it may be meant for audio files only.

I am the author of the mp4chap lib.
Yes, you are correct: the mp4chap library is meant for audio files only.
The library is open source and really simple, so you are free to modify it.
At the moment the library analyzes only the first sound track (see the code below). You can play with this code and pull more data out of the mp4 tracks.
http://mp4chap.codeplex.com/SourceControl/latest#Mp4Chapters/ChapterExtractor.cs
private void ReadChapters(MoovInfo moovBox)
{
    var soundBox = moovBox.Tracks.Where(b => b.Type == "soun").ToArray();
    if (soundBox.Length == 0) return;
    if (soundBox[0].Chaps != null && soundBox[0].Chaps.Length > 0)
    {
        var cb = new HashSet<uint>(soundBox[0].Chaps);
        var textBox = moovBox.Tracks.Where(b => b.Type == "text" && cb.Contains(b.Id)).ToArray();
        if (textBox.Length == 0) return;
        ReadChaptersText(textBox[0]);
    }
}
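If you need chapters from video files, one option is to generalize that method so it scans every track's chap references rather than only the first "soun" track. A minimal sketch, assuming the same MoovInfo/TrackInfo shapes used in ChapterExtractor.cs above (the method name is mine, not part of the library):
private void ReadChaptersFromAnyTrack(MoovInfo moovBox)
{
    // Hypothetical generalization: inspect every track's 'chap'
    // references so chapters referenced by video tracks are found too.
    foreach (var track in moovBox.Tracks)
    {
        if (track.Chaps == null || track.Chaps.Length == 0) continue;
        var cb = new HashSet<uint>(track.Chaps);
        var textTrack = moovBox.Tracks
            .FirstOrDefault(b => b.Type == "text" && cb.Contains(b.Id));
        if (textTrack != null)
        {
            ReadChaptersText(textTrack);
            return; // stop at the first track that carries chapters
        }
    }
}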

Assistant Entities and Different Speakers

It seems possible to differentiate among speakers/users with the Watson-Unity-SDK, since it can return an array that identifies which words were spoken by which speakers in a multi-person exchange, but I cannot figure out how to make that work, particularly in the case where I am sending different utterances (spoken by different people) to the Assistant service to get a response accordingly.
The code snippets for parsing Assistant's JSON output/response, as well as OnRecognize, OnRecognizeSpeaker, SpeechRecognitionResult, and SpeakerLabelsResult, are all there, but how do I get Watson to return this from the server when an utterance is recognized and its intent is extracted?
Both OnRecognize and OnRecognizeSpeaker are used only once, in the Active property, so they should both be called, but only OnRecognize does the speech-to-text (transcription); OnRecognizeSpeaker is never fired...
public bool Active
{
    get
    {
        return _service.IsListening;
    }
    set
    {
        if (value && !_service.IsListening)
        {
            _service.RecognizeModel = (string.IsNullOrEmpty(_recognizeModel) ? "en-US_BroadbandModel" : _recognizeModel);
            _service.DetectSilence = true;
            _service.EnableWordConfidence = true;
            _service.EnableTimestamps = true;
            _service.SilenceThreshold = 0.01f;
            _service.MaxAlternatives = 0;
            _service.EnableInterimResults = true;
            _service.OnError = OnError;
            _service.InactivityTimeout = -1;
            _service.ProfanityFilter = false;
            _service.SmartFormatting = true;
            _service.SpeakerLabels = false;
            _service.WordAlternativesThreshold = null;
            _service.StartListening(OnRecognize, OnRecognizeSpeaker);
        }
        else if (!value && _service.IsListening)
        {
            _service.StopListening();
        }
    }
}
Typically, the output of Assistant (i.e. its result) is something like the following:
Response: {"intents":[{"intent":"General_Greetings","confidence":0.9962662220001222}],"entities":[],"input":{"text":"hello eva"},"output":{"generic":[{"response_type":"text","text":"Hey!"}],"text":["Hey!"],"nodes_visited":["node_1_1545671354384"],"log_messages":[]},"context":{"conversation_id":"f922f2f0-0c71-4188-9331-09975f82255a","system":{"initialized":true,"dialog_stack":[{"dialog_node":"root"}],"dialog_turn_counter":1,"dialog_request_counter":1,"_node_output_map":{"node_1_1545671354384":{"0":[0,0,1]}},"branch_exited":true,"branch_exited_reason":"completed"}}}
I have set up intents and entities, and this list is returned by the Assistant service, but I am not sure how to get it to also consider my entities or how to get it to respond accordingly when the STT recognizes different speakers.
I would appreciate some help, particularly how to do this via Unity scripting.
I had the exact same question about dealing with the Assistant's messages, so I looked at the Assistant.OnMessage() method, which returns a string like "Response: {0}", customData["json"].ToString(), plus the JSON output, which will be something like this:
[Assistant.OnMessage()][DEBUG] Response: {"intents":[{"intent":"General_Greetings","confidence":1}],"entities":[],"input":{"text":"hello"},"output":{"text":["good evening"],"nodes_visited": etc...}
I personally parse the JSON in order to extract the content from messageResponse.Entities. In the above example you can see that the array is empty, but if you are populating it, then that's where you need to extract the values from, and then in your code you can do what you want.
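For example, here is a minimal sketch of that extraction step, assuming the SimpleJSON parser (substitute whatever JSON library your project already uses); the callback signature follows the SDK samples:
private void OnMessage(object resp, Dictionary<string, object> customData)
{
    // customData["json"] carries the raw response shown above.
    var json = SimpleJSON.JSON.Parse(customData["json"].ToString());
    foreach (SimpleJSON.JSONNode entity in json["entities"].AsArray)
    {
        // Each entry exposes the entity name and the matched value.
        Debug.Log("entity: " + entity["entity"] + ", value: " + entity["value"]);
    }
}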
Regarding the different speaker recognition, in the Active property whose code you have included, the _service.StartListening(OnRecognize, OnRecognizeSpeaker) line takes care of both, so perhaps put some Debug.Log statements inside their code blocks to see if they are called or not.
Please set SpeakerLabels to true:
_service.SpeakerLabels = true;
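With that flag on, OnRecognizeSpeaker should start firing. As a sketch modeled on the SDK's streaming example (verify the SpeakerRecognitionEvent/SpeakerLabelsResult fields against your SDK version), the handler can then log who spoke when:
private void OnRecognizeSpeaker(SpeakerRecognitionEvent result)
{
    if (result == null) return;
    foreach (SpeakerLabelsResult label in result.speaker_labels)
    {
        // Each label carries a speaker id plus the time span it covers.
        Debug.Log(string.Format("speaker {0} | confidence {1} | from {2}s to {3}s",
            label.speaker, label.confidence, label.from, label.to));
    }
}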

Error reading from an IP-Camera

I capture the image from an IP camera and I work with the frames. My program detects when there is movement, then takes a photo and saves it on the computer.
It works perfectly at first, but after running for about 2-3 hours it usually gets an error, and I cannot find an explanation for it. If it were an error in grabbing the image or in the processing, it should happen from the start, shouldn't it?
The error I get is the following:
Exception in thread "main" java.lang.NullPointerException
at com.googlecode.javacv.IPCameraFrameGrabber.grab(IPCameraFrameGrabber.java:105)
at Llamada.main(Llamada.java:34)
I have looked at line 105 of IPCameraFrameGrabber but have not found anything.
The program is the following:
public class Llamada {
    public static void main(String[] args) throws Exception {
        IPCameraFrameGrabber grabber = new IPCameraFrameGrabber("http://192.168.2.102:80/mjpg/video.mjpg");
        //OpenCVFrameGrabber grabber = new OpenCVFrameGrabber(0);
        grabber.start();
        IplImage frame = grabber.grab();

        IplImage image = null;
        IplImage prevImage = null;
        IplImage diff = null;

        Date data = new Date();
        String output = "";
        int i = 0, j = 0;

        CanvasFrame canvasFrame = new CanvasFrame("IP Camera");
        canvasFrame.setCanvasSize(frame.width(), frame.height());

        CvMemStorage storage = CvMemStorage.create();

        while (canvasFrame.isVisible() && (frame = grabber.grab()) != null) {
            cvSmooth(frame, frame, CV_GAUSSIAN, 9, 9, 2, 2);
            if (image == null) {
                image = IplImage.create(frame.width(), frame.height(), IPL_DEPTH_8U, 1);
                cvCvtColor(frame, image, CV_RGB2GRAY);
            } else {
                prevImage = IplImage.create(frame.width(), frame.height(), IPL_DEPTH_8U, 1);
                prevImage = image;
                image = IplImage.create(frame.width(), frame.height(), IPL_DEPTH_8U, 1);
                cvCvtColor(frame, image, CV_RGB2GRAY);
            }

            if (diff == null) {
                diff = IplImage.create(frame.width(), frame.height(), IPL_DEPTH_8U, 1);
            }

            if (prevImage != null) {
                // perform ABS difference
                cvAbsDiff(image, prevImage, diff);
                // do some threshold to wipe away useless details
                cvThreshold(diff, diff, 64, 255, CV_THRESH_BINARY);
                canvasFrame.showImage(diff);
                // recognize contours
                CvSeq contour = new CvSeq(null);
                cvFindContours(diff, storage, contour, Loader.sizeof(CvContour.class), CV_RETR_LIST, CV_CHAIN_APPROX_SIMPLE);

                while (contour != null && !contour.isNull()) {
                    if (contour.elem_size() > 0) {
                        output = data.toString();
                        if (data != null)
                            output = output.substring(0, 10);
                        if (i % 300 == 0)
                            cvSaveImage((j++) + " " + output + "-capture.jpg", frame);
                        CvBox2D box = cvMinAreaRect2(contour, storage);
                        // test intersection
                        if (box != null) {
                            CvPoint2D32f center = box.center();
                            CvSize2D32f size = box.size();
                        }
                        i++;
                    }
                    contour = contour.h_next();
                }
            }
        }
        grabber.stop();
        canvasFrame.dispose();
    }
}
Thank you for everything!
Have you tried using a debugger and setting a break point? I understand that waiting around for 2-3 hours isn't fun, but maybe it'd help you get a handle on what's going on.
That seems to be in your while loop's second conditional part. Something inside the method grab on the grabber object is throwing a NullPointerException.
Probably the way you've initialized the grabber has led it to do this.
And it would be useful to know which version of the IPCameraFrameGrabber class you're using and what the author of that class really expected. Namely it's initialized to respond to a particular camera's url. In reading the class, it would appear this makes no claim to work with all IP cameras' MJPEG streams.
Let's look at one example comment in there:
foscam url http://host/videostream.cgi?user=username&pwd=password
http://192.168.0.59:60/videostream.cgi?user=admin&pwd=password
android ipcam http://192.168.0.57:8080/videofeed
And compare that to your url:
http://192.168.2.102:80/mjpg/video.mjpg
I gather it is not a foscam videostream.cgi url nor an android ipcam videofeed url, which would appear to be the only tested urls. It reminds me of an Axis camera url. More on that later.
In a recent version of that class (and in the older one too, actually), there is a somewhat hackish attempt to read only to the end of a subheader that is always delimited by CRLFCRLF, which could have been done just as well with a buffered reader consuming lines until it hits an empty one. What I do see here that seems likely to cause an NPE is this:
When your url's http server's response does not contain the content-length header, which is quite possible, the returned readImage() byte[] is null.
Since javax.imageio.ImageIO specifies that it will throw an IllegalArgumentException when it gets a null input, I'm guessing the exception comes from the ByteArrayInputStream constructor in the grabBufferedImage method, from the IplImage.createFrom(null) in the old version, or from the b.length dereference in the newer version.
None of the line numbers of these versions line up with the error message you've shown that you're getting, so maybe your version of the library is yet again different, and broken differently. Try using the debugger, edit and patch the source of the IPCameraFrameGrabber to better support your mjpeg over http "device" based on what you find out is really in the input stream of the http response.
Since the url format reminds me of an Axis camera, I tried this with one running firmware v5.50 with the boa server built in:
$ curl -I http://user:pass@10.10.10.10:8080/mjpg/video.mjpg
HTTP/1.0 200 OK
Cache-Control: no-cache
Pragma: no-cache
Expires: Thu, 01 Dec 1994 16:00:00 GMT
Connection: close
Content-Type: multipart/x-mixed-replace; boundary=myboundary
So you can see the content length is missing there. However, you say you do get frames for hours at first and only then does it fail, so I'm somewhat at a loss with that part. It sounds as though EITHER the input stream is getting closed, or the Java implementation wrapping the stream, in the http protocol handler, runs out of some kind of total space or hits an open-connection timer for some reason. I know this seems vague.
Another thing that seems odd is that from what I read in the two example classes of IPCameraFrameGrabber linked, every call to grab reads the input stream looking for headers first, which doesn't make sense to me right now, and I feel as though I must be misreading that.

Why is my Sphinx4 Recognition poor?

I am learning how to use Sphinx4 using the Maven plug-in for Eclipse.
I took the transcribe demo found on GitHub and altered it to process a file of my own. The audio file is 16-bit, mono, 16 kHz, and approximately 13 seconds long. I noticed that it sounds as though it is in slow motion.
The words spoken in the file are, "also make sure it's easy for you to access the recording files so you could upload it if asked".
I am attempting to transcribe the file and my results are horrendous. My attempts at finding forum posts or links that thoroughly explain how to improve the results, or what I am not doing correctly, have led me nowhere.
I am looking to strengthen the accuracy of the transcription, but would like to avoid having to train a model myself, given the variance in the type of data my current project will have to deal with. Is that not possible, or is the code I am using off?
CODE
(NOTE: Audio file available at https://instaud.io/8qv)
public class App {
    public static void main(String[] args) throws Exception {
        System.out.println("Loading models...");

        Configuration configuration = new Configuration();

        // Load model from the jar
        configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        // You can also load model from folder
        // configuration.setAcousticModelPath("file:en-us");
        configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
        configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.dmp");

        StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);
        FileInputStream stream = new FileInputStream(new File("/home/tmscanlan/workspace/example/vocaroo_test_revised.wav"));
        // stream.skip(44); I commented this out due to the short length of my file

        // Simple recognition with generic model
        recognizer.startRecognition(stream);
        SpeechResult result;
        while ((result = recognizer.getResult()) != null) {
            // I added the following print statements to get more information
            System.out.println("\ngetWords() before loop: " + result.getWords());
            System.out.format("Hypothesis: %s\n", result.getHypothesis());
            System.out.print("\nThe getResult(): " + result.getResult()
                    + "\nThe getLattice(): " + result.getLattice());

            System.out.println("List of recognized words and their times:");
            for (WordResult r : result.getWords()) {
                System.out.println(r);
            }

            System.out.println("Best 3 hypothesis:");
            for (String s : result.getNbest(3))
                System.out.println(s);
        }
        recognizer.stopRecognition();

        // Live adaptation to speaker with speaker profiles
        stream = new FileInputStream(new File("/home/tmscanlan/workspace/example/warren_test_smaller.wav"));
        // stream.skip(44); I commented this out due to the short length of my file

        // Stats class is used to collect speaker-specific data
        Stats stats = recognizer.createStats(1);
        recognizer.startRecognition(stream);
        while ((result = recognizer.getResult()) != null) {
            stats.collect(result);
        }
        recognizer.stopRecognition();

        // Transform represents the speech profile
        Transform transform = stats.createTransform();
        recognizer.setTransform(transform);

        // Decode again with updated transform
        stream = new FileInputStream(new File("/home/tmscanlan/workspace/example/warren_test_smaller.wav"));
        // stream.skip(44); I commented this out due to the short length of my file
        recognizer.startRecognition(stream);
        while ((result = recognizer.getResult()) != null) {
            System.out.format("Hypothesis: %s\n", result.getHypothesis());
        }
        recognizer.stopRecognition();

        System.out.println("...Printing is done..");
    }
}
Here is the output (a photo album I took): http://imgur.com/a/Ou9oH
As Nikolay says, the audio sounds odd, probably because you haven't resampled it in the right way.
To downsample the audio from the original 22050 Hz to the desired 16kHz, you can run the following command:
sox Vocaroo.wav -r 16000 Vocaroo16.wav
The resulting Vocaroo16.wav will sound much better, and it will (probably) give you better ASR results.

Converting programmatically created openXML word documents to HTML errors with nulls

I'm creating WordprocessingDocuments using OpenXML (which works fine, and the produced Word doc is exactly what I want); now I'm trying to convert these newly created docs to HTML using the OpenXML PowerTools. I'm new to this, so I'm hoping it's something stupid that I'm missing, but I was hoping someone could point me in the right direction with these null reference errors I'm receiving.
This is the exact error...
System.NullReferenceException: Object reference not set to an instance of an object.
at OpenXmlPowerTools.HtmlConverter.ConvertToHtmlTransform(WordprocessingDocument wordDoc, HtmlConverterSettings settings, XNode node, Func`2 imageHandler)
at OpenXmlPowerTools.HtmlConverter.<>c__DisplayClass37.<ConvertToHtmlTransform>b__1d(XElement e)
at System.Linq.Enumerable.WhereSelectEnumerableIterator`2.MoveNext()
at System.Xml.Linq.XContainer.AddContentSkipNotify(Object content)
at System.Xml.Linq.XElement..ctor(XName name, Object content)
at OpenXmlPowerTools.HtmlConverter.ConvertToHtmlTransform(WordprocessingDocument wordDoc, HtmlConverterSettings settings, XNode node, Func`2 imageHandler)
at OpenXmlPowerTools.HtmlConverter.<>c__DisplayClass37.<ConvertToHtmlTransform>b__1c(XElement e)
at System.Linq.Enumerable.WhereSelectEnumerableIterator`2.MoveNext()
at System.Xml.Linq.XContainer.AddContentSkipNotify(Object content)
at System.Xml.Linq.XContainer.AddContentSkipNotify(Object content)
at System.Xml.Linq.XElement..ctor(XName name, Object[] content)
at OpenXmlPowerTools.HtmlConverter.ConvertToHtmlTransform(WordprocessingDocument wordDoc, HtmlConverterSettings settings, XNode node, Func`2 imageHandler)
I'm using the exact same code you can find on Eric White's blog.
public static void PrintHTML(string file)
{
    byte[] byteArray = File.ReadAllBytes(file);
    using (MemoryStream memoryStream = new MemoryStream())
    {
        memoryStream.Write(byteArray, 0, byteArray.Length);
        using (WordprocessingDocument doc =
            WordprocessingDocument.Open(memoryStream, true))
        {
            HtmlConverterSettings settings = new HtmlConverterSettings()
            {
                //PageTitle = "some title"
            };
            XElement html = HtmlConverter.ConvertToHtml(doc, settings);
            File.WriteAllText(@"C:\Temp\Test.html", html.ToStringNewLineOnAttributes());
        }
    }
}
I know the code works because if I pass it a normal Word doc that I haven't created, it converts to HTML fine. If I create a Word doc using OpenXML, then manually copy the contents into a new Word file, save it, and then pass it through the conversion code, that will work as well. So I'm thinking it must be something to do with the way I'm creating the Word doc in OpenXML initially. Maybe I'm not adding a part to the file that is required.
Using the OpenXML SDK I have compared a working and a non-working file, and they appear to have the same components/parts.
From the errors I've posted, does anyone have any ideas of where the problem could be, i.e., what is null? I can post the creation code for the Word doc, but it's quite extensive and it might just confuse people more.
I finally got to the bottom of this. I had to dig out the source code for the HtmlConverter in the OpenXML PowerTools; after some debugging I found that this line in the code was erroring...
line 371
styleId = (string)wordDoc.MainDocumentPart.StyleDefinitionsPart
    .GetXDocument().Root.Elements(W.style)
    .Where(e => (string)e.Attribute(W.type) == "paragraph" &&
                (string)e.Attribute(W._default) == "1")
    .FirstOrDefault().Attributes(W.styleId).FirstOrDefault();
Basically, in my debugging,
(string)e.Attribute(W._default)
was returning "true" or "false" rather than "1", so I changed the following lines
.Where(e => (string)e.Attribute(W.type) == "paragraph" &&
            (string)e.Attribute(W._default) == "1")
to
.Where(e => (string)e.Attribute(W.type) == "paragraph" && (
            (string)e.Attribute(W._default) == "1" || (string)e.Attribute(W._default) == "true"))
and now it works as expected.
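If you would rather keep that check in one place, a small helper along these lines works too (a sketch; IsDefaultStyle is a name I made up, and it simply treats w:default as an XML boolean that may be serialized as "1"/"0" or "true"/"false"):
private static bool IsDefaultStyle(XElement style)
{
    // w:default is an XML boolean: producers emit "1"/"0" or "true"/"false".
    var d = (string)style.Attribute(W._default);
    return d == "1" || string.Equals(d, "true", StringComparison.OrdinalIgnoreCase);
}
// Usage in the converter's query:
// .Where(e => (string)e.Attribute(W.type) == "paragraph" && IsDefaultStyle(e))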
Had the same issue where I was saving a reportbuilder report to Open Word XML and could not convert the bytes to HTML.
I had to add the following line of code for it to work correctly with version 2.8.1.0:
private static IEnumerable<XElement> ParaStyleParaPropsStack(XDocument stylesXDoc,
    string paraStyleName, XElement para)
{
    if (stylesXDoc == null)
        yield break;
    var localParaStyleName = paraStyleName;
    while (localParaStyleName != null)
    {
        XElement paraStyle = stylesXDoc.Root.Elements(W.style).FirstOrDefault(s =>
            s.Attribute(W.type) != null &&   // the line that was added
            s.Attribute(W.type).Value == "paragraph" &&
            s.Attribute(W.styleId).Value == localParaStyleName);
        // ... rest of the method unchanged
    }
}

How do I unlock a content control using the OpenXML SDK in a Word 2010 document?

I am manipulating a Word 2010 document on the server-side and some of the content controls in the document have the following Locking properties checked
Content control cannot be deleted
Contents cannot be edited
Can anyone advise how to set these Locking options to false, or remove them altogether, using the OpenXML SDK?
The OpenXML SDK provides the Lock class and the LockingValues enumeration for programmatically setting the options:
Content control cannot be deleted and
Contents cannot be edited
So, to set those two options to "false" (LockingValues.Unlocked),
search for all SdtElement elements in the document and set the Val property to
LockingValues.Unlocked.
The code below shows an example:
static void UnlockAllSdtContentElements()
{
    using (WordprocessingDocument wordDoc =
        WordprocessingDocument.Open(@"c:\temp\myword.docx", true))
    {
        IEnumerable<SdtElement> elements =
            wordDoc.MainDocumentPart.Document.Descendants<SdtElement>();
        foreach (SdtElement elem in elements)
        {
            if (elem.SdtProperties != null)
            {
                Lock l = elem.SdtProperties.ChildElements.First<Lock>();
                if (l == null)
                {
                    continue;
                }
                if (l.Val == LockingValues.SdtContentLocked)
                {
                    Console.Out.WriteLine("Unlock content element...");
                    l.Val = LockingValues.Unlocked;
                }
            }
        }
    }
}

static void Main(string[] args)
{
    UnlockAllSdtContentElements();
}
Just for the ones who copy this code: keep in mind that if there are no locks associated with the content control, there won't be a Lock element on it either, so when the code executes the following instruction it will throw an exception because no element is found:
Lock l = elem.SdtProperties.ChildElements.First<Lock>();
The way to fix this is to use FirstOrDefault instead of First.
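As a sketch of that fix inside the loop above, using standard LINQ's OfType plus FirstOrDefault (requires using System.Linq):
// FirstOrDefault returns null instead of throwing when the content
// control has no Lock child, so the existing null check now works.
Lock l = elem.SdtProperties.ChildElements.OfType<Lock>().FirstOrDefault();
if (l == null)
{
    continue; // nothing to unlock on this control
}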