Looking for advice on creating my first neural network to classify text (MongoDB)

I am very new to this field and I would like to create a neural network to classify a dataset that I have in MongoDB. I would like some advice about where I should start, what technology I should use, or any tutorial that you think could help.
If you know of any open source code that already does this, I would love to take a look at it.
Thank you!

Pick a platform
In essence, you should pick a platform or framework that does much of the dirty work for you and read up on some tutorials for that.
The big choice is between natural language processing frameworks such as NLTK, spaCy or the Stanford NLP tools, and generic machine learning frameworks such as TensorFlow or PyTorch.
Text classification is a popular, reasonably entry-level task that is well supported by pretty much everything (so there's not much to say in a shopping question like this; pick whatever you like), and there are plenty of tutorials available online for every major platform.
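To give a feel for what those frameworks automate, here is a minimal from-scratch sketch: a bag-of-words perceptron (the simplest single-layer neural network) in plain Python. It assumes the documents have already been read from MongoDB as (text, label) pairs with labels +1/-1; the pymongo loading lines in the comment use hypothetical database, collection and field names. It is an illustration of the idea, not a replacement for a real framework.

```python
from collections import Counter

# Hypothetical loading step (requires pymongo; "mydb", "docs", "text" and
# "label" are placeholder names, not anything from the question):
#   from pymongo import MongoClient
#   coll = MongoClient()["mydb"]["docs"]
#   docs = [(d["text"], d["label"]) for d in coll.find({}, {"text": 1, "label": 1})]


def featurize(text):
    """Bag-of-words features: lowercase, split on whitespace, count tokens."""
    return Counter(text.lower().split())


def score(weights, bias, feats):
    """Linear score: bias plus the weighted sum of token counts."""
    return bias + sum(weights[tok] * cnt for tok, cnt in feats.items())


def train_perceptron(docs, epochs=10):
    """Train on (text, label) pairs where label is +1 or -1."""
    weights = Counter()  # unseen tokens read as weight 0
    bias = 0
    for _ in range(epochs):
        for text, label in docs:
            feats = featurize(text)
            # Perceptron rule: update only on a mistake (or a tie at 0).
            if label * score(weights, bias, feats) <= 0:
                for tok, cnt in feats.items():
                    weights[tok] += label * cnt
                bias += label
    return weights, bias


def predict(weights, bias, text):
    """Return +1 or -1 for a new piece of text."""
    return 1 if score(weights, bias, featurize(text)) > 0 else -1


if __name__ == "__main__":
    docs = [
        ("great movie loved it", 1),
        ("terrible plot hated it", -1),
        ("loved the acting", 1),
        ("hated the ending", -1),
    ]
    w, b = train_perceptron(docs)
    print(predict(w, b, "loved it"), predict(w, b, "hated it"))  # prints: 1 -1
```

In practice you would swap this for a framework's text classifier (for example spaCy's text categorizer, or an embedding plus a linear layer in PyTorch), which adds proper tokenization, multiple classes and evaluation.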

Related

Templates for designing neural network architectures

I am trying to produce a neural network visualization similar to the one below (link):
I was wondering if anybody could suggest any resources that they have used themselves and are happy with. In particular, if anyone knows of a freely available template that I can build on, that would be great.
I e-mailed the author, who let me know that he used PowerPoint for this one. Who would have thought!

Is using the Smile library instead of H2O a good idea in terms of performance?

Walkthrough of Smile performance as compared to H2O.
Smile - Statistical Machine Intelligence and Learning Engine.
I want to use the Smile library to construct a pipeline using a word2vec model.
If you want to use the H2O word2vec algorithm, you can use the Sparkling Water samples here, which are written in Scala/Java; it is very easy to make them work in your Scala code.
I don't have much experience with the Smile library; however, looking at their code, you can find some Java samples in the nlp section of their test code here. You can try using them from Scala as well.
I hope you know what you want to do, because the way you describe your question is not very clear. What you really need is to write some code first and then ask a very specific question. This will help you understand exactly where you are stuck and what is needed to unblock you.

Any PyTorch tools to monitor a neural network's training?

Are there any tools to monitor a network's training in PyTorch, like TensorBoard in TensorFlow?
PyTorch 1.1.0 supports TensorBoard natively with torch.utils.tensorboard. The API is very similar to tensorboardX. See the documentation for more details.
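As a rough illustration of the native API, the sketch below logs one scalar per step through torch.utils.tensorboard.SummaryWriter (tensorboardX.SummaryWriter exposes the same add_scalar call). The helper name log_losses, the tag "loss/train", the log directory "runs/demo" and the loss values are all placeholders chosen for the example; the torch import is kept inside main() so the helper can be reused with any writer object.

```python
def log_losses(writer, losses, tag="loss/train"):
    """Log one scalar per training step.

    `writer` is expected to be a torch.utils.tensorboard.SummaryWriter
    (or a tensorboardX.SummaryWriter); any object with an
    add_scalar(tag, value, global_step=...) method works.
    """
    for step, loss in enumerate(losses):
        writer.add_scalar(tag, loss, global_step=step)
    return len(losses)


def main():
    # Requires PyTorch >= 1.1 plus the separate `tensorboard` package.
    from torch.utils.tensorboard import SummaryWriter

    # "runs/demo" is an arbitrary example directory, not a required name.
    writer = SummaryWriter(log_dir="runs/demo")
    # Placeholder loss values; in a real loop these come from training.
    log_losses(writer, [0.9, 0.6, 0.4, 0.3])
    writer.close()  # flush pending events to disk
```

After calling main(), running `tensorboard --logdir runs` starts the dashboard, where the curve appears under the "loss/train" tag.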
I am using tensorboardX. It supports most (if not all) of the features of TensorBoard. I am using Scalars, Images, Distributions, Histograms and Text. I haven't tried the rest, like Audio and Graph, but the repo also contains examples for those use cases. It can be installed easily with pip, as explained in the repo's README.
There are also other GitHub repos that implement wrappers from PyTorch (and other languages/frameworks) to TensorBoard. As far as I know, they support fewer features. But have a look at:
Crayon
Tensorboard-Logger
I have asked this question before in the forums. TensorBoard seems very convenient for TensorFlow, and it is also part of the library/framework itself. However, PyTorch would not take the same approach. But there is a library called visdom here, released by Facebook, that helps you log training information. It gives you the flexibility of logging information the way you want. While this means a lot of flexibility, it also means you need to write some extra code to make things work.
Following up on blckbird's answer, I'm also a big fan of Tensorboard-PyTorch. However, I also found that its API is relatively low-level and I was writing a lot of similar logging code over and over. So (shameless plug) I've written a small package on top of it that automates monitoring of network training experiments with minimal code. Hopefully someone else finds it helpful: pytorch-monitor
Minetorch helped me a lot in my past two Kaggle competitions, and I think it's ready for others to use. It has built-in TensorBoard and Matplotlib support, and many other features that make the work easier, including:
Logger
TensorBoard support
Matplotlib (to save plots as PNG files)
Auto resume training
Auto best model saving
Hook points for customization
...
It's still in development, so any issues or PRs are very welcome :)
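"Auto best model saving" usually means keeping a checkpoint only when the monitored validation metric improves. The plain-Python sketch below shows that idea; the class BestModelSaver, its parameters and the default path "best.pt" are hypothetical and are not Minetorch's API. Saving is delegated to a callable, so in PyTorch you could pass something like lambda p: torch.save(model.state_dict(), p).

```python
import math


class BestModelSaver:
    """Hypothetical helper: call save_fn only when the metric improves.

    mode="min" suits losses, mode="max" suits accuracies.
    """

    def __init__(self, save_fn, path="best.pt", mode="min"):
        self.save_fn = save_fn
        self.path = path
        self.mode = mode
        # Start from the worst possible value so the first epoch always saves.
        self.best = math.inf if mode == "min" else -math.inf

    def step(self, metric):
        """Record this epoch's metric; save and return True if it improved."""
        if self.mode == "min":
            improved = metric < self.best
        else:
            improved = metric > self.best
        if improved:
            self.best = metric
            self.save_fn(self.path)
        return improved
```

You would call step(val_loss) once per epoch after validation; epochs that do not beat the best value so far leave the existing checkpoint untouched.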

easiest tool for an extreme beginner to use for classification/clustering

I saw that Weka has a GUI. This GUI makes it very easy for non-coding users to classify data sets into classes. Matlab is very difficult: to make a neural network, for example, you need to write code, and to do that you need a solid understanding of what's going on. Are there other tools like Weka, or is there a plugin for Matlab that makes it more powerful?
RapidMiner has a functional GUI, and will work for both classification and clustering. It is the most popular open-source (free) data mining application available as of 2012.
RapidMiner: http://rapid-i.com/
It also has numerous training videos and tutorials that you can follow along with; I learned basic clustering with the K-means method in about 3 hours. See the Vancouver Data blog for some great RapidMiner analytics videos. Top-notch stuff, really.
Vancouver Data (Neil McGuigan): http://vancouverdata.blogspot.com/
As a bonus, you can install the Weka plug-in, which then gives you Weka through the GUI. All of the add-ons are free and well integrated. Other add-ons include a GUI for R (the stats program), Reporting Services, Text and Web Analytics, etc. It is fairly simple to use straight 'out of the box' (IMO).
Weka is very (very) powerful and you can write your own classifier if that's what you need to do.
Between Matlab and Weka there's pretty much nothing you can't do in terms of Machine Learning.
You might want to check out the Netlab toolkit for Matlab, a neural network toolkit developed by a professor at Aston University. It is available from http://www1.aston.ac.uk/eas/research/groups/ncrg/resources/netlab/

advice about ANN libraries

First, I am a beginner with artificial neural networks and I need a library for training them, but I am very confused about which library to choose, and since I don't have the experience, I wanted to ask you.
I have read about three libraries: FANN, Flood, and Neuro Fusion.
So, which do you think is the easiest and least problematic library to use with VC++ 6?
I just started using FANN, and it seems to be very well documented, with great examples and fast.
It operates with floats/doubles/integers and implements the Cascade2 training method, which is really great if you are unsure about the architecture of your NN.
It is not as rich as Encog (I haven't used it), but if FANN implements all the functionality you need, I think you should go with it.
Edit: I just realized that, besides Java, Encog is only available for .NET (C#).