What are the steps needed to train DeepSpeech on Google Colab? - mozilla-deepspeech

What are the steps (A to Z) we need to follow in order to train a model using Colab?
How do we prepare our own data set if we need to fine-tune the model for our own voices or our country's accent?

The Mozilla DeepSpeech PlayBook is an excellent starting point for your question. It is not specific to Google Colab, but provides guidance on both data preparation and training.
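Not Colab-specific, but the core of the data-preparation step covered in the PlayBook is producing CSV files that point at 16 kHz, 16-bit mono WAV clips and their transcripts. Below is a minimal sketch of that step; the folder layout and file names are assumptions for illustration.

# Minimal sketch: build a DeepSpeech-style training CSV from a folder of
# 16 kHz, 16-bit mono WAV clips with matching .txt transcripts.
# AUDIO_DIR and the clip .wav/.txt naming are hypothetical.
import csv
import os

AUDIO_DIR = "my_voice_data"

rows = []
for name in sorted(os.listdir(AUDIO_DIR)):
    if not name.endswith(".wav"):
        continue
    wav_path = os.path.join(AUDIO_DIR, name)
    txt_path = wav_path[:-4] + ".txt"
    with open(txt_path, encoding="utf-8") as f:
        transcript = f.read().strip().lower()
    rows.append((wav_path, os.path.getsize(wav_path), transcript))

# DeepSpeech's importers expect this header: wav_filename, wav_filesize, transcript
with open("train.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["wav_filename", "wav_filesize", "transcript"])
    writer.writerows(rows)

The resulting train.csv (plus similar dev/test CSVs) is what the training script consumes via its --train_files/--dev_files/--test_files flags, as described in the PlayBook; for accent fine-tuning you would also continue from a released checkpoint rather than train from scratch.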

Related

How to "Load Statistics from File" for Raster Supervised Classification in QGIS with SAGA extension

I'm working on bark-beetle outbreaks in Alpine forests. To do so, I work on UAV-acquired orthophotos with multispectral bands (RGB, NIR, RE). I want to perform a raster (VRT) supervised classification based on field-acquired ROIs.
I successfully did this using the SAGA GUI. I'm now trying to repeat the same process using QGIS, as I have all my working layers sorted in a project. I want to get the same supervised classification with the built-in SAGA extension, but here (unlike in the SAGA GUI) the algorithm asks for a mandatory "Load Statistics from File" parameter.
How do I have to set this parameter?
From reading the SAGA documentation, I saw it should be the path to a stats file (about the raster to classify?), but no further information was provided about the content of this stats file. I don't know how to create it, nor whether there is a way to create it using QGIS or the SAGA GUI.
Nor did I find any help on this in the SAGA documentation or elsewhere on the internet.

Watson visual-recognition training failed

I am trying to use IBM Watson Visual Recognition.
In the Visual Recognition web tool I try to train my classifier, but the training fails.
I don't know what I am doing wrong.
I upload 2 zip files, each containing 30 JPG photos of houses.
After the upload the training starts, but after a few minutes it turns red and fails.
In addition, I tried IBM's example training data (the Beagle) and it also failed.
Please help me see what I am doing wrong.
Thanks,
Niv.
There are a number of additional requirements regarding your training data, for instance the size of individual pictures or of the entire file. Try checking your training data against these requirements:
https://www.ibm.com/watson/developercloud/doc/visual-recognition/customizing.html#size-limitations
In addition, it's possible that your training request has a syntax error.
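As a rough local sanity check before uploading, you can inspect the zips yourself. The numeric limits below are placeholders; replace them with the values from the size-limitations page linked above.

# Check a training zip against (assumed) Visual Recognition limits before uploading.
import os
import zipfile
from io import BytesIO

from PIL import Image

MAX_ZIP_MB = 100             # assumed per-zip limit; verify against the docs
MIN_IMAGES = 10              # assumed minimum images per class; verify against the docs
MIN_WIDTH = MIN_HEIGHT = 32  # assumed minimum pixel dimensions; verify against the docs

def check_training_zip(path):
    size_mb = os.path.getsize(path) / (1024 * 1024)
    print("%s: %.1f MB (limit ~%d MB)" % (path, size_mb, MAX_ZIP_MB))
    with zipfile.ZipFile(path) as z:
        names = [n for n in z.namelist() if n.lower().endswith((".jpg", ".jpeg", ".png"))]
        print("  %d images (minimum %d)" % (len(names), MIN_IMAGES))
        for n in names:
            with z.open(n) as f:
                width, height = Image.open(BytesIO(f.read())).size
            if width < MIN_WIDTH or height < MIN_HEIGHT:
                print("  too small: %s (%dx%d)" % (n, width, height))

check_training_zip("houses_positive.zip")   # hypothetical file name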

Predict Class Probabilities in Spark RandomForestClassifier

I built random forest models using ml.classification.RandomForestClassifier. I am trying to extract the predicted probabilities from the models, but I only see the predicted classes instead of the probabilities. According to this issue link, the issue is resolved and it leads to this GitHub pull request and this. However, it seems it was resolved in version 1.5. I'm using AWS EMR, which provides Spark 1.4.1, and I still have no idea how to get the predicted probabilities. If anyone knows how to do it, please share your thoughts or solutions. Thanks!
I have already answered a similar question before.
Unfortunately, with MLlib you can't get the per-instance probabilities for classification models as of version 1.4.1.
There are JIRA issues (SPARK-4362 and SPARK-6885) concerning this exact topic, which were IN PROGRESS as I was writing this answer. Nevertheless, the issue seems to have been on hold since November 2014:
There is currently no way to get the posterior probability of a prediction with a Naive Bayes model during prediction. This should be made available along with the label.
And here is a note from #sean-owen on the mailing list on a similar topic regarding the Naive Bayes classification algorithm:
This was recently discussed on this mailing list. You can't get the probabilities out directly now, but you can hack a bit to get the internal data structures of NaiveBayesModel and compute it from there.
Reference: source.
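For what it's worth, the "hack" mentioned above amounts to reading NaiveBayesModel's public pi (log class priors) and theta (log feature likelihoods) fields and recomputing the posterior by hand. A rough sketch against the old pyspark.mllib API follows; note that it covers Naive Bayes, not random forests, and the names are illustrative.

# Recompute multinomial Naive Bayes class probabilities from a trained
# pyspark.mllib NaiveBayesModel (sketch only; old RDD-based API).
import numpy as np

def predict_proba(model, features):
    """Return normalized class probabilities for one dense feature vector."""
    x = np.asarray(features, dtype=float)
    # log p(c) + sum_j x_j * log p(feature_j | c)
    log_scores = np.asarray(model.pi) + np.dot(np.asarray(model.theta), x)
    log_scores -= log_scores.max()          # for numerical stability
    probs = np.exp(log_scores)
    return dict(zip(model.labels, probs / probs.sum()))

# Hypothetical usage, given an existing SparkContext `sc`:
# from pyspark.mllib.classification import NaiveBayes
# from pyspark.mllib.regression import LabeledPoint
# model = NaiveBayes.train(sc.parallelize([LabeledPoint(0, [1.0, 0.0]),
#                                           LabeledPoint(1, [0.0, 1.0])]))
# print(predict_proba(model, [1.0, 0.0]))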
This issue has been resolved with Spark 1.5.0. Please refer to the JIRA issue for more details.
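Once you are on 1.5.0 or later, the DataFrame produced by the ml RandomForestClassificationModel carries a probability column directly. A minimal pyspark sketch against the 1.5-era API (toy data, illustrative names):

# Spark >= 1.5.0: per-class probabilities come out of the ml pipeline directly.
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.mllib.linalg import Vectors
from pyspark.ml.classification import RandomForestClassifier

sc = SparkContext(appName="rf-probability-demo")
sqlContext = SQLContext(sc)

# Tiny toy DataFrame just to show the output columns
train_df = sqlContext.createDataFrame(
    [(0.0, Vectors.dense([0.0, 1.0])), (1.0, Vectors.dense([1.0, 0.0]))] * 10,
    ["label", "features"])

rf = RandomForestClassifier(labelCol="label", featuresCol="features", numTrees=10)
model = rf.fit(train_df)

# The transformed DataFrame has rawPrediction, probability and prediction columns;
# "probability" is a vector with one entry per class.
model.transform(train_df).select("probability", "prediction").show(5)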
Concerning AWS, there is not much you can do about that right now. A possible solution is to fork emr-bootstrap-actions for Spark and configure it for your needs; then you'll be able to install Spark on AWS using the bootstrap step.
Nevertheless, this might seem a little complicated.
There are some things you might need to consider:
Update the spark/config.file to install your Spark 1.5.0, something like:
+3 1.5.0 python s3://support.elasticmapreduce/spark/install-spark-script.py s3://path.to.your.bucket.spark.installation/spark/1.5.0/spark-1.5.0.tgz
The file listed above must be a proper build of Spark located in an S3 bucket you own, for the time being.
To build your Spark, I advise reading the examples section about building-spark-for-emr and also the official documentation. That should be about it! (I hope I haven't forgotten anything.)
EDIT: Amazon EMR release 4.1.0 offers an upgraded version of Apache Spark (1.5.0). You can check here for more details.
Unfortunately this isn't possible with version 1.4.1. You could extend the random forest class and copy some of the code I added in that pull request if you can't upgrade, but be sure to switch back to the regular version once you are able to upgrade.
Spark 1.5.0 is now supported natively on EMR with the emr-4.1.0 release! There is no more need to use emr-bootstrap-actions, which, by the way, only work on 3.x AMIs, not emr-4.x releases.

Plotting arbitrary data for repository

I'm looking for a way to visualize arbitrary information about my repository over time, which might be some version-dependent number, such as:
lines of code
number of lines in a latex document
time between commits
anything that can be output by a script
What is the best way to visualize this information?
More specifically, I'm using Mercurial and would ideally like something with a decent interface, with plot resizing/scrolling/etc. Jenkins' Plot plugin is decent but not great; more importantly, it's not possible to visualize past data (say, after adding a new metric).
I would suggest splitting your task to simplify everything a little bit. You will likely need several different tools to collect and visualize all the required information. The historical view seems to be another big challenge.
Lines of code
There are several plugins available for Jenkins, but almost all are highly specialized. The SLOCCount plug-in seems to be the most universal, but it does not provide any graphical output.
NSIQ Collector Plugin
SLOCCount plug-in
JavaNCSS Plugin
There might be other options for your language. For example, CCCC will provide the required information for C and C++ code.
Number of lines in a latex document
I see several options to achieve that:
adapt existing solution/plugin
use a repository statistics tool (Pepper, for example, can do the trick)
use a simple script to count lines and report it (see the sketch after this list)
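As an example of the scripted option, here is a minimal sketch that walks the Mercurial history and counts the lines of one LaTeX file at every revision, writing a CSV that Jenkins' Plot plugin (or any plotting tool) can consume; the file names are illustrative.

# Count lines of a LaTeX file at every Mercurial revision and write a CSV.
import csv
import subprocess

TEX_FILE = "thesis.tex"     # hypothetical file to track

# One log line per revision: "<rev> <unix timestamp> <tz offset>"
log = subprocess.check_output(
    ["hg", "log", "--template", "{rev} {date|hgdate}\n"]).decode().splitlines()

with open("line_counts.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["rev", "timestamp", "lines"])
    for entry in reversed(log):                 # oldest revision first
        rev, timestamp, _offset = entry.split()
        try:
            content = subprocess.check_output(
                ["hg", "cat", "-r", rev, TEX_FILE], stderr=subprocess.DEVNULL)
        except subprocess.CalledProcessError:   # file did not exist yet at this revision
            continue
        writer.writerow([rev, timestamp, len(content.splitlines())])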
Pepper will generate a chart of the metric over time; please check the Pepper gallery for examples. There are other tools as well, for example hgchart.
Time between commits
The simplest solution is to let a commit trigger some trivial job, so that Jenkins will provide all the information as part of the build history (with a timeline, etc.).
Another solution is to use a repository statistics tool once again.
Anything that can be output by a script
There are several good plug-ins for that.
The Plot plugin can visualize multiple values provided as a properties or CSV file (a sketch of such a CSV follows below).
The Measurement Plots plugin scans the build output in order to find values to be visualized.
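For the Plot plugin route, the build step just has to drop a small data file into the workspace on every build and point the plugin at it. A minimal sketch of such a step, assuming a CSV layout with a header row of series labels and one row of values (check the plugin's documentation for the exact layout it expects):

# Per-build metric script: count LaTeX lines in the working copy and write a
# one-row CSV for Jenkins' Plot plugin. File name and metric are illustrative.
import csv
import os

total = 0
for root, _dirs, files in os.walk("."):
    for name in files:
        if name.endswith(".tex"):
            with open(os.path.join(root, name), errors="ignore") as f:
                total += sum(1 for _ in f)

with open("metrics.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["latex_lines"])    # header row = series label
    writer.writerow([total])

Because the file is regenerated on every build, values accumulate in the plot build by build; retroactive history for a newly added metric would still have to be backfilled, e.g. with a history-walking script like the earlier sketch.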
Happy continuous integration.

Correlation in Windows Workflow

I am new to WF and I am working on correlation in WF. I am not finding any step-by-step guide that explains correlation in easy language. Please suggest some step-by-step and easy-to-implement resources. Does anyone have useful links?
Check out the following link. You may need to read part 1 first:
http://blog.petegoo.com/index.php/2010/06/26/wf4-services-part-2-ndash-correlation/