How to write Moses decoder translations into a file? - moses

I'm trying to test the Moses decoder. I have a large amount of data and need to write its translations to a file, that is, pass all of the data to Moses and record the output in a file. Can anyone help? I followed the baseline tutorial at http://www.statmt.org/moses/?n=Moses.Baseline, and so far I can only give it a single sentence at a time to translate.
Thanks!
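One way to approach this, as a minimal sketch: the decoder reads source sentences from standard input, one per line, and prints translations to standard output, so a whole file can be piped through it. The helper name translate_file, the assumption that a moses binary is on PATH, and the moses.ini location are all placeholders, not part of the question.

```python
import subprocess

# Assumed invocation: a `moses` binary on PATH and a moses.ini in the current
# directory. Both are placeholders; adjust them to your installation.
MOSES_CMD = ["moses", "-f", "moses.ini"]


def translate_file(input_path, output_path, decoder_cmd=MOSES_CMD):
    """Feed input_path to the decoder on stdin and write its stdout to output_path."""
    with open(input_path, "rb") as src, open(output_path, "wb") as dst:
        # check=True raises CalledProcessError if the decoder exits with an error.
        subprocess.run(decoder_cmd, stdin=src, stdout=dst, check=True)
```

Calling translate_file("input.txt", "output.txt") should then leave one translated line per input line in output.txt. The same thing can be done directly in a shell with input and output redirection (moses -f moses.ini < input.txt > output.txt).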

Related

Matlab converting library to model

I'm working on a script to convert a Simulink library to a plain model, meaning it can be simulated, it does not auto-lock etc.
Is there a way to do this in code other than copy-pasting every single block into a new model? If not, what is the most efficient way to do the "copy-paste"?
I could not find any clues as to how to approach this problem here, on Google, in the official documentation, or on the MathWorks forum, so I'm at a loss on how to proceed.
Thank you in advance!
I don't think it's possible to convert a library to a model, but you can programmatically add library blocks to models like so:
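sys = 'testModel';
new_system(sys);    % create an empty model in memory
open_system(sys);
% copy a block from the built-in simulink library into the new model
add_block('simulink/Sources/Sine Wave', [sys, '/MySineWave']);
save_system(sys);   % write the model file to the current folder
close_system(sys);
sim(sys);           % sim loads the saved model again if it is not open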
You could even use the find_system command to list all the blocks in a library and then loop through them all and create a new model for each using the above code.

How do I train tesseract but not create a new language?

So I am trying out tesseract at the moment, and it does work, but it is not accurate enough. I know that the image quality plays a role as well, etc. etc., but some of the documents I am using use a rather unusual font. It still does recognise parts of it though (about 50-60%, which is pretty good), but this is obviously not entirely satisfying.
I would like to know whether it's possible to train tesseract without creating an entirely new language, that is, to start from the trained data I am already using and improve on it.
Second, if this is possible, would this even be advisable? Or (2) would it be better to create new languages for every new font I encounter, or (3) create new languages for each new font I encounter, but not from scratch but always built upon the default data I am using right now? What do you think? If you can provide any links on how to train tesseract & make use of the training data already provided, do let me know please.
You can extract the component files from a .traineddata file, as described in the documentation.
Specify the -u option to unpack all the components to the specified path:
combine_tessdata -u tessdata/eng.traineddata /home/$USER/temp/eng.
This will create /home/$USER/temp/eng.* files with individual tessdata components from tessdata/eng.traineddata.
There are other options too; please check the documentation at the following link.
https://github.com/tesseract-ocr/tesseract/blob/master/doc/combine_tessdata.1.asc
But rather than modifying the original files, it's advisable to train tesseract for a new language.
(2) You don't have to create a new language for each font. You have to create an image, a box file, and a training file for each font. All of these are then combined into a single language's traineddata file.
(3) This is possible too. Please see
https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract-3.00%E2%80%933.02#bootstrapping-a-new-character-set

Matlab tiff setTag number not recognized

I am trying to change the value of a tag on a Tiff object in my MATLAB code, but I keep getting this error:
Error using tifflib
Tag number (273) is unrecognized by the TIFF library.
Error in Tiff/setTag (line 1146)
tifflib('setField',obj.FileID, ...
The code I am using is included below:
fname='C:\FileLocation\pcd144_012.tif';
t=Tiff(fname,'r+');
t.getTag('StripOffsets')
t.setTag('StripOffsets',[8, 16392])
Why is it I can get the tag and see it, but cannot set the tag to a different value?
Here is a link to the tiff I am working with:
Tiff Data
I think that you're out of luck with this approach. The setTag methods are mostly used when building a TIFF from scratch. My guess is that the 'StripOffsets' field is not modifiable. Keep in mind that these tools are designed for the normal case of non-broken image files and that changing this field in such cases would either break the file or necessitate re-encoding of the data most of the time. The function should give better feedback (documentation for the TIFF could be better in general) so you might still contact The MathWorks to let them know about this.
As far as finding a way to edit these tags/fields, you might look for and try out some TIFF tag viewer/editor programs to see if they might do it. Otherwise it may come down to parsing the header yourself to find the relevant bytes.
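To illustrate what parsing the header yourself could look like, here is a minimal sketch in Python rather than MATLAB. It assumes a classic (non-BigTIFF) file and only walks the first IFD; the function name find_tag_entry is made up for this example. Tag number 273 is the StripOffsets tag from the error message.

```python
import struct

STRIP_OFFSETS = 273  # tag number reported in the error message


def find_tag_entry(data, tag=STRIP_OFFSETS):
    """Locate `tag` in the first IFD of a classic TIFF held in `data` (bytes).

    Returns (entry_offset, field_type, count, raw_value) or None if absent.
    raw_value is the entry's last 4 bytes read as an unsigned 32-bit integer:
    either the value itself or a file offset to the values, depending on
    field type and count.
    """
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("not a TIFF file: bad byte-order mark")
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file (magic number is not 42)")
    (n_entries,) = struct.unpack(order + "H", data[ifd_offset:ifd_offset + 2])
    for i in range(n_entries):
        pos = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes long
        entry_tag, field_type, count, raw = struct.unpack(
            order + "HHII", data[pos:pos + 12]
        )
        if entry_tag == tag:
            return pos, field_type, count, raw
    return None
```

With the entry located, the relevant bytes could in principle be overwritten in a copy of the file. For a tag holding more than one value, such as the two strip offsets in the question, raw_value would be a pointer to the array of offsets rather than an offset itself.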

How to create .model files while doing model transformation using ETL

I am doing a model transformation using the Epsilon Transformation Language (ETL). I already have the metamodels for the input and output models. I have written the transformation code and want to check whether it works for a small hello-world application, so I wrote a JUnit test.
I have a hello-world application written according to my input metamodel, and I also have the hello-world application that I expect my transformation to produce. However, the examples at http://www.eclipse.org/epsilon/cinema/
show .model files for the input and output models. I have no clue how to obtain .model files from my hello-world files. The file extensions of my hello-world applications conform to their respective metamodels, e.g. hello-world.xml.
Could someone please tell me how to generate .model files from another file format?
Thank you so much.
For anyone who might face this issue in the future, here is the solution that I figured out myself. A .model file is basically an XML file. If you open it in Eclipse with a plain text editor rather than the Exeed editor, you can see the XML structure. If you manually create a dummy .model file and inspect its XML structure, you will see how to build your own model files from the code of your hello-world application. Basically, you can write a Java application that extracts the necessary information from the code and generates an XML file.
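The idea of generating the XML programmatically can be sketched as follows, here in Python rather than Java for brevity. The namespace URI, the Application element, and the statements/text names are hypothetical placeholders; the real names come from your own metamodel, and the exact layout is best checked against a dummy .model file created in Eclipse.

```python
import xml.etree.ElementTree as ET

# Hypothetical metamodel namespace; replace with the nsURI of your metamodel.
NS_URI = "http://example.org/helloworld"


def build_model(messages):
    """Build an XML tree with one root element and one child per message."""
    ET.register_namespace("hw", NS_URI)
    # Root element qualified with the metamodel namespace (placeholder names).
    root = ET.Element("{%s}Application" % NS_URI, {"name": "hello-world"})
    for text in messages:
        ET.SubElement(root, "statements", {"text": text})
    return ET.ElementTree(root)


def write_model(tree, path):
    """Serialise the tree to `path` with an XML declaration."""
    tree.write(path, encoding="UTF-8", xml_declaration=True)
```

For example, write_model(build_model(["Hello", "World"]), "hello-world.model") writes a small XML file whose structure can then be compared with the dummy .model file.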

training a new model using pascal kit

I need some help with this.
I am currently doing a computer vision project that requires me to train a new model to detect a certain object.
For this I am using the system provided by P. Felzenszwalb, D. McAllester, D. Ramanan and their team, "Discriminatively trained deformable part models", which is implemented in MATLAB.
Project webpage: http://www.cs.uchicago.edu/~pff/latent/.
However, I have no idea how to direct the system to use my own dataset (a collection of images and annotations), which is different from the PASCAL datasets, in order to train a new model.
By directing, I mean a line of code that lets me change the dataset the system reads from when training a model.
E.g.
% directory for caching models, intermediate data, and results
cachedir = ['/var/tmp/rbg/YOURPATH/' VOCyear '/'];
I looked through their README and documentation, but they do not mention this. Do correct me if I am wrong.
Let me know if I have not made my problem clear enough.
I also looked at some files such as global.m, but had no luck.
Your help is much appreciated and thanks in advance!
You can try reading pascal.m in the DPM package (voc-release5); it contains similar code that works on the VOC2007/2010 datasets.
Plenty of parts need to be adapted to achieve this. For example, voc_config has to be adapted to read from your files.
The same goes for the pascal_train.m function. Depending on your images and how you parse them, adapting this function may take quite some time.
Other functions to consider:
imreadx
pascal_test
pascaleval