Am I using too many training data in GEE? - classification

I am running a classification script in GEE and I have about 2100 training data since my AOI is a region in Italy and have many classes. I receive the following error while I try save my script:
Script error File too large (larger than 512KB).
I tried cancelling some of the training data and it saves. I thought there is no limit in GEE to choose training points. How can I know what is the limit so I adjust my training points or if there is a way to save the script without deleting any points.
Here is the link to my code

The Earth Engine Code Editor “drawing tools” are a convenient, but not very scalable, way to create geometry. The error you're getting is because “under the covers” they actually create additional code that is part of your script file. Not only is this fairly verbose (hence the error you received), it's not very efficient to run, either.
In order to use large training data sets, you will need to create your point data in another tool and upload it (using CSV or SHP files) to become one or more Earth Engine “table” assets, and use those from your script.

Related

How to test cntk object detection example on custom image?

I am trying to run CNTK object detecion example on PascalVoc pretrained dataset. I run all required scripts in fastrcnn and get the visual output for the test data defined in dataset. Now I want to test network on my own image, how can I do that?
For Fast R-CNN you need a library that generates candidate ROIs (regions of interest) for your test images, e.g. selective search.
If you want to evaluate a batch of images you can follow the description in the tutorial to generate the test mapping file and the ROI coordinates (see test.txt and test.rois.txt in the corresponding proc sub folder). If you want to evaluate a single you would need to pass the image and the candidate ROI coordinates as inputs to cntk eval, similar to this example:
# compute model output
arguments = {loaded_model.arguments[0]: [hwc_format]}
output = loaded_model.eval(arguments)
For FastRCNN you need to first run your custom image through Selective Search algorithm to generate ROIs (regions of interest) and then feed it to your model with sth like this:
output = frcn_eval.eval({image_input: image_file, roi_proposals: roi_proposals})
You can find more details here: https://github.com/Microsoft/CNTK/tree/release/latest/Examples/Image/Detection/FastRCNN
Anyway FastRCNN is not the most efficient way to do it because of usage of Selective Search (which is a real bottleneck here). If you want to improve the performance you can try FasterRCNN as it gets rid of SS algorithm and replaces it with Region Proposal Network which performs much, much better.
If you're interested, you can check my repo on GitHub: https://github.com/karolzak/CNTK-Hotel-pictures-classificator

How to write "Big Data" to a text file using Matlab

I am getting some readings off an accelerometer connected to an Arduino which is in turn connected to MATLAB through serial communication. I would like to write the readings into a text file. A 10 second reading will write around 1000 entries that make the text file size around 1 kbyte.
I will be using the following code:
%%%%%// Communication %%%%%
arduino=serial('COM6','BaudRate',9600);
fopen(arduino);
fileID = fopen('Readings.txt','w');
%%%%%// Reading from Serial %%%%%
for i=1:Samples
scan = fscanf(arduino,'%f');
if isfloat(scan),
vib = [vib;scan];
fprintf(fileID,'%0.3f\r\n',scan);
end
end
Any suggestions on improving this code ? Will this have a time or Size limit? This code is to be run for 3 days.
Do not use text files, use binary files. 42718123229.123123 is 18 bytes in ASCII, 4 bytes in a binary file. Don't waste space unnecessarily. If your data is going to be used later in MATLAB, then I just suggest you save in .mat files
Do not use a single file! Choose a reasonable file size (e.g. 100Mb) and make sure that when you get to that many amount of data you switch to another file. You could do this by e.g. saving a file per hour. This way you minimize the possible errors that may happen if the software crashes 2 minutes before finishing.
Now knowing the real dimensions of your problem, writing a text file is totally fine, nothing special is required to process such small data. But there is a problem with your code. You are writing a variable vid which increases over time. That may cause bad performance because you are not using preallocation and it may consume a lot of memory. I strongly recommend not to keep this variable, and if you need the dater read it afterwards.
Another thing you should consider is verification of your data. What do you do when you receive less samples than you expect? Include timestamps! Be aware that these timestamps are not precise because you add them afterwards, but it allows you to identify if just some random samples are missing (may be interpolated afterwards) or some consecutive series of maybe 100 samples is missing.

Average results from multiple simulations

I have a simulation with a lot of random components, so I would like to run many simulations and average the results (the result is determined by a variable called score).
How would you do this in Netlogo?
Currently I'm working on a program that will export the results to csv, then I plan to use python/excel to average them. I don't like this because I want to run 100+ simulations (so there will be 100+ files)... I'm hoping there is a better solution
EDIT or an implementation of what I described (I have to relearn enough python/vba to solve this, so it's going to take me some time)
This should be simple enough if you use BehaviorSpace.
In your experiment definition, put score in the Measure runs using these reporters textbox and uncheck Measure run at every step.
When you run your experiment, save your results using Table output. It will produce a csv that you can open in your spreadsheet application. From there, producing an average of the score column should be trivial.

Error using caffe Invalid input size

I tried to train my own neural net using my own imagedatabase as described in
http://caffe.berkeleyvision.org/gathered/examples/imagenet.html
However when I want to check the neural net after training on some standard images using the matlab wrapper I get the following output / error:
Done with init
Using GPU Mode
Done with set_mode
Elapsed time is 3.215971 seconds.
Error using caffe
Invalid input size
I used the matlab wrapper before to extract cnn features based on a pretrained model. It worked. So I don't think the input size of my images is the problem (They are converted to the correct size internally by the function "prepare_image").
Has anyone an idea what could be the error?
Found the solution: I was referencing the wrong ".prototxt" file (Its a little bit confusing because the files are quite similar.
So for computing features using the matlab wrapper one needs to reference the following to files in "matcaffe_demo.m":
models/bvlc_reference_caffenet/deploy.prototxt
models/bvlc_reference_caffenet/MyModel_caffenet_train_iter_450000.caffemodel
where "MyModel_caffenet_train_iter_450000.caffemodel" is the only file needed which is created during training.
In the beginning I was accidently referencing
models/bvlc_reference_caffenet/MyModel_train_val.prototxt
which was the ".prototxt" file used for training.

Need a method to store a lot of data in Matlab

I've asked this before, but I feel I wasn't clear enough so I'll try again.
I am running a network simulation, and I have several hundreds output files. Each file holds the simulation's test result for different parameters.
There are 5 different parameters and 16 different tests for each simulation. I need a method to store all this information (and again, there's a lot of it) in Matlab with the purpose of plotting graphs using a script. suppose the script input is parameter_1 and test_2, so I get a graph where parameter_1 is the X axis and test_2 is the Y axis.
My problem is that I'm not quite familier to Matlab, and I need to be directed so it doesn't take me forever (I'm short on time).
How do I store this information in Matlab? I was thinking of two options:
Each output file is imported separately to a different variable (matrix)
All output files are merged to one output file and imprted together. In the resulted matrix each line is a different output file, and each column is a different test. Problem is, I don't know how to store the simulation parameters
Edit: maybe I can use a dataset?
So, I would appreciate any suggestion of how to store the information, and what functions might help me fetch the only the data I need.
If you're still looking to give matlab a try with this problem, you can iterate through all the files and import them one by one. You can create a list of the contents of a folder with the function
ls(name)
and you can import data like this:
A = importdata(filename)
if your data is in txt files, you should consider this Prev Q
A good strategy to avoid cluttering your workspace is to import them all into a single matrix. SO if you have a matrix called VAR, then VAR{1,1}.{1,1} could be where you put your test results and VAR{1,1}.{2,1} could be where you put your simulation parameters of the first file. I think that is simpler than making a data structure. Just make sure you uniformly place the information in the same indexes of the arrays. You could also organize your VAR row v col by parameter vs test.
This is more along the lines of your first suggestion
Each output file is imported separately to a different variable
(matrix)
Your second suggestion seems unnecessary since you can just iterate through your files.
You can use the command save to store your data.
It is very convenient, and can store as much data as your hard disk can bear.
The documentation is there:
http://www.mathworks.fr/help/techdoc/ref/save.html
Describe the format of text files. Because if it has a systematic format then you can use dlmread or similar commands in matlab and read the text file in a matrix. From there, you can plot easily. If you try to do it in excel, it will be much slower than reading from a text file. If speed is an issue for you, I suggest that you don't go for Excel.