NEAT with multiple output - neural-network

I'm currently implementing a program classifier for my coursework.
My lecturer asked me to use an "Evolving ANN" algorithm.
So I found a package called NEAT (NeuroEvolution of Augmenting Topologies).
I have 10 inputs and 7 outputs, so I just modified the example from its documentation:
def eval_fitness(genomes):
    for g in genomes:
        net = nn.create_feed_forward_phenotype(g)
        mse = 0
        for inputs, expected in zip(alldata, label):
            output = net.serial_activate(inputs)
            output = np.clip(output, -1, 1)
            mse += (output - expected) ** 2
        g.fitness = 1 - (mse/44000) #44000 is the number of samples
        print(g.fitness)
I changed the config file too, so the network has 10 inputs and 7 outputs.
But when I try to run the code, it gives me this error:
Traceback (most recent call last):
  File "/home/ilhammaziz/PycharmProjects/tuproSC2/eANN.py", line 40, in <module>
    pop.run(eval_fitness, 10)
  File "/home/ilhammaziz/.local/lib/python3.5/site-packages/neat/population.py", line 190, in run
    best = max(population)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
What am I supposed to do?
Thanks

As far as I can tell, the error is not in your code but in the library itself. Just use a different one.
This one looks promising to me.
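Another possible reading of the traceback, offered as a hedged note rather than a confirmed diagnosis: with 7 outputs, `mse += (output - expected) ** 2` accumulates a 7-element array, so `g.fitness` becomes an array, and `max(population)` then cannot compare genomes. If that is the cause, reducing the error to a single float should avoid it. The sketch below uses plain Python lists instead of NumPy; the name `scalar_fitness` is hypothetical, and normalizing by the number of compared values (rather than by 44000 samples) is an assumption.

```python
def scalar_fitness(outputs, targets):
    """Return one float: 1 minus the mean squared error over all samples and outputs."""
    total = 0.0
    count = 0
    for output, expected in zip(outputs, targets):
        for o, e in zip(output, expected):
            # Same clipping range as np.clip(output, -1, 1) in the question.
            o = max(-1.0, min(1.0, o))
            total += (o - e) ** 2
            count += 1
    return 1.0 - total / count


# Two samples with two outputs each (made-up numbers for illustration).
outputs = [[0.5, -2.0], [1.0, 0.0]]
targets = [[0.5, -1.0], [0.0, 0.0]]
fitness = scalar_fitness(outputs, targets)
print(fitness)
```

In the question's loop, the equivalent change would be to sum the squared differences over the 7 outputs before adding them to `mse`, so that `g.fitness` is a plain number.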

Related

Getting "raise EOFError" when call the df.show() function

I have a dataframe (df) with 1 million rows and two columns: ID (long int) and description (string). After transforming the descriptions into TF-IDF (using Tokenizer, HashingTF, and IDF), df has two columns: ID and features (a sparse vector).
I computed the item-item similarity matrix using a udf and a dot function.
Computing the similarities completes successfully.
However, when I call the show() function, I get
"raise EOFError"
I have read many questions on this issue but have not found the right answer yet.
Note that if I apply my solution to a small dataset (like 100 rows), everything works.
Is it related to an out-of-memory issue?
I checked my dataset and the description column, and I don't see any records with null or unsupported text.
```
dist_mat = data.alias("i").join(data.alias("j"), psf.col("i.ID") < psf.col("j.ID")) \
    .select(psf.col("i.ID").alias("i"), psf.col("j.ID").alias("j"),
            dot_udf("i.features", "j.features").alias("score"))
dist_mat = dist_mat.filter(psf.col('score') > 0.05)
dist_mat.show(1)
```
If I remove the last line, dist_mat.show(1), it runs without error. However, with this line I get an error like
.......
```Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded```
...
Here is the part of the error message:
```
[Stage 6:=======================================================> (38 + 1) / 39]Traceback (most recent call last):
  File "/usr/local/Cellar/apache-spark/2.4.0/libexec/python/lib/pyspark.zip/pyspark/daemon.py", line 170, in manager
  File "/usr/local/Cellar/apache-spark/2.4.0/libexec/python/lib/pyspark.zip/pyspark/daemon.py", line 73, in worker
  File "/usr/local/Cellar/apache-spark/2.4.0/libexec/python/lib/pyspark.zip/pyspark/worker.py", line 397, in main
    if read_int(infile) == SpecialLengths.END_OF_STREAM:
  File "/usr/local/Cellar/apache-spark/2.4.0/libexec/python/lib/pyspark.zip/pyspark/serializers.py", line 714, in read_int
    raise EOFError
EOFError
```
I increased the cluster size and ran it again, and it worked without errors. So the error message was accurate:
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
However, for computing pairwise similarities on such a large-scale matrix, I found an alternative solution: Large scale matrix multiplication with pyspark.
In fact, it is very efficient and much faster, even better than using BlockMatrix.
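A rough back-of-the-envelope check of why the self-join can exhaust memory: the condition `i.ID < j.ID` keeps one candidate row for every unordered pair of IDs, so the number of rows scored by the udf grows quadratically with the input size. The function name `pair_count` is hypothetical, and the sketch assumes all IDs are distinct.

```python
def pair_count(n_rows):
    """Number of (i, j) pairs with i.ID < j.ID, assuming all IDs are distinct."""
    return n_rows * (n_rows - 1) // 2


small = pair_count(100)        # the 100-row test that worked
large = pair_count(1_000_000)  # the full dataframe from the question
print(small, large)
```

This is consistent with the 100-row test succeeding while the full dataset needs a larger cluster: the small case scores a few thousand pairs, the full case roughly half a trillion before the `score > 0.05` filter is applied.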

Pyspark, EOFError - Memory issue or broken data?

I have a dataframe, which has about 2 million rows with URLs, 2 columns: id and url. I need to parse the domain from the url. I used lambda with urlparse or simple split. But I keep getting EOFError with both ways. If I create a random "sample" of 400 000, it works.
What is also interesting, is that pyspark shows me the top 20 rows with the new column domain, but I cannot do anything with it or I get the error again.
Is it a memory issue or is something wrong with the data? Can somebody please advise me or give me a hint?
I searched several questions regarding this, none of them helped me.
The code:
parse_domain = udf(lambda x: x.split("//")[-1].split("/")[0].split('?')[0],
                   returnType=StringType())
df = df.withColumn("domain", parse_domain(col("url")))
df.show()
Example urls:
"https://www.dataquest.io/blog/loading-data-into-postgres/"
"https://github.com/geekmoss/WrappyDatabase"
"https://www.google.cz/search?q=pyspark&rlz=1C1GCEA_enCZ786CZ786&oq=pyspark&aqs=chrome..69i64j69i60l3j35i39l2.3847j0j7&sourceid=chrome&ie=UTF-8"
"https://search.seznam.cz/?q=google"
And the error I keep getting:
Traceback (most recent call last):
  File "/opt/spark-2.3.2-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/daemon.py", line 170, in manager
  File "/opt/spark-2.3.2-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/daemon.py", line 73, in worker
  File "/opt/spark-2.3.2-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/worker.py", line 278, in main
    if read_int(infile) == SpecialLengths.END_OF_STREAM:
  File "/opt/spark-2.3.2-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/serializers.py", line 692, in read_int
    raise EOFError
EOFError
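For reference, the domain extraction itself can be sketched without Spark, using `urllib.parse.urlparse` from the standard library (the question mentions urlparse). The function name `parse_domain` mirrors the question's udf; returning None for missing or empty input is an assumption. This only illustrates the parsing step and is not offered as a fix for the EOFError.

```python
from urllib.parse import urlparse


def parse_domain(url):
    """Return the network location (domain) of a URL, or None if it cannot be found."""
    if not url:
        # Assumption: treat None and empty strings as "no domain" instead of raising.
        return None
    return urlparse(url).netloc or None


examples = [
    "https://www.dataquest.io/blog/loading-data-into-postgres/",
    "https://github.com/geekmoss/WrappyDatabase",
    "https://search.seznam.cz/?q=google",
]
domains = [parse_domain(u) for u in examples]
print(domains)
```

A function like this could be wrapped in a udf the same way as the lambda in the question; guarding against null values keeps a single bad row from raising inside the Python worker.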

Find category of MATLAB mlint warning ID

I'm using the checkcode function in MATLAB to give me a struct of all error messages in a supplied filename, along with their McCabe complexity and the ID associated with each error, i.e.:
info = checkcode(fileName, '-cyc','-id');
In MATLAB's preferences, there is a list of all possible errors, and they are broken down into categories such as "Aesthetics and Readability", "Syntax Errors", "Discouraged Function Usage", etc.
Is there a way to access these categories using the error ID gained from the above line of code?
I tossed around different ideas in my head for this question and was finally able to come up with a mostly elegant solution for how to handle this.
The Solution
The critical component of this solution is the undocumented -allmsg flag of checkcode (or mlint). If you supply this argument, then a full list of mlint IDs, severity codes, and descriptions is printed. More importantly, the categories are also printed in this list and all mlint IDs are listed underneath their respective mlint category.
The Execution
Now we can't simply call checkcode (or mlint) with only the -allmsg flag because that would be too easy. Instead, it requires an actual file to try to parse and check for errors. You can pass any valid m-file, but I have opted to pass the built-in sum.m because the actual file itself only contains help information (as its real implementation is likely C++) and mlint is therefore able to parse it very rapidly with no warnings.
checkcode('sum.m', '-allmsg');
An excerpt of the output printed to the command window is:
INTER ========== Internal Message Fragments ==========
MSHHH 7 this is used for %#ok and should never be seen!
BAIL 7 done with run due to error
INTRN ========== Serious Internal Errors and Assertions ==========
NOLHS 3 Left side of an assignment is empty.
TMMSG 3 More than 50,000 Code Analyzer messages were generated, leading to some being deleted.
MXASET 4 Expression is too complex for code analysis to complete.
LIN2L 3 A source file line is too long for Code Analyzer.
QUIT 4 Earlier syntax errors confused Code Analyzer (or a possible Code Analyzer bug).
FILER ========== File Errors ==========
NOSPC 4 File <FILE> is too large or complex to analyze.
MBIG 4 File <FILE> is too big for Code Analyzer to handle.
NOFIL 4 File <FILE> cannot be opened for reading.
MDOTM 4 Filename <FILE> must be a valid MATLAB code file.
BDFIL 4 Filename <FILE> is not formed from a valid MATLAB identifier.
RDERR 4 Unable to read file <FILE>.
MCDIR 2 Class name <name> and @directory name do not agree: <FILE>.
MCFIL 2 Class name <name> and file name do not agree: <file>.
CFERR 1 Cannot open or read the Code Analyzer settings from file <FILE>. Using default settings instead.
...
MCLL 1 MCC does not allow C++ files to be read directly using LOADLIBRARY.
MCWBF 1 MCC requires that the first argument of WEBFIGURE not come from FIGURE(n).
MCWFL 1 MCC requires that the first argument of WEBFIGURE not come from FIGURE(n) (line <line #>).
NITS ========== Aesthetics and Readability ==========
DSPS 1 DISP(SPRINTF(...)) can usually be replaced by FPRINTF(...).
SEPEX 0 For better readability, use newline, semicolon, or comma before this statement.
NBRAK 0 Use of brackets [] is unnecessary. Use parentheses to group, if needed.
...
The first column is clearly the mlint ID, the second column is actually a severity number (0 = mostly harmless, 1 = warning, 2 = error, 4-7 = more serious internal issues), and the third column is the message that is displayed.
As you can see, all categories also have an identifier but no severity, and their message format is ===== Category Name =====.
So now we can just parse this information and create some data structure that allows us to easily look up the severity and category for a given mlint ID.
Again, though, it can't always be so easy. Unfortunately, checkcode (or mlint) simply prints this information out to the command window and doesn't assign it to any of our output variables. Because of this, it is necessary to use evalc (shudder) to capture the output and store it as a string. We can then easily parse this string to get the category and severity associated with each mlint ID.
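To illustrate the parsing idea independently of MATLAB, here is a minimal Python sketch that walks through text in the format shown in the excerpt above and records the current category for each mlint ID. The names `parse_allmsg`, `CATEGORY_RE`, and `MESSAGE_RE` are hypothetical, and the sample text is copied from the excerpt; it assumes every category header has the `ID ===== Name =====` shape and every message line has the `ID severity message` shape.

```python
import re

# Category headers look like: "NITS ========== Aesthetics and Readability =========="
CATEGORY_RE = re.compile(r"^\s*(?P<id>\S+)\s+=+\s*(?P<name>.*?)\s*=+\s*$")
# Message lines look like: "DSPS 1 DISP(SPRINTF(...)) can usually be replaced ..."
MESSAGE_RE = re.compile(r"^\s*(?P<id>\S+)\s+(?P<severity>\d+)\s+(?P<message>.*\S)\s*$")


def parse_allmsg(text):
    """Map each mlint ID to its severity, message, and enclosing category name."""
    catalog = {}
    category = None
    for line in text.splitlines():
        header = CATEGORY_RE.match(line)
        if header:
            # Every following message belongs to this category until the next header.
            category = header.group("name")
            continue
        entry = MESSAGE_RE.match(line)
        if entry:
            catalog[entry.group("id")] = {
                "severity": int(entry.group("severity")),
                "message": entry.group("message"),
                "category": category,
            }
    return catalog


SAMPLE = """\
FILER ========== File Errors ==========
NOFIL 4 File <FILE> cannot be opened for reading.
NITS ========== Aesthetics and Readability ==========
DSPS 1 DISP(SPRINTF(...)) can usually be replaced by FPRINTF(...).
SEPEX 0 For better readability, use newline, semicolon, or comma before this statement.
"""
catalog = parse_allmsg(SAMPLE)
print(catalog["DSPS"]["category"])
```

Tracking the "current category" while scanning line by line is a simpler alternative to computing category indices after the fact; it relies on the messages being listed directly under their category header, as in the printed output.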
An Example Parser
I have put all of the pieces I discussed previously together into a little function which will generate a struct where all of the fields are the mlint IDs. Within each field you will receive the following information:
warnings = mlintCatalog();
warnings.DWVRD
id: 'DWVRD'
severity: 2
message: 'WAVREAD has been removed. Use AUDIOREAD instead.'
category: 'Discouraged Function Usage'
category_id: 17
And here's the little function if you're interested.
function [warnings, categories] = mlintCatalog()
% Get a list of all categories, mlint IDs, and severity rankings
output = evalc('checkcode sum.m -allmsg');
% Break each line into its components
lines = regexp(output, '\n', 'split').';
pattern = '^\s*(?<id>[^\s]*)\s*(?<severity>\d*)\s*(?<message>.*?\s*$)';
warnings = regexp(lines, pattern, 'names');
warnings = cat(1, warnings{:});
% Determine which ones are category names
isCategory = cellfun(@isempty, {warnings.severity});
categories = warnings(isCategory);
% Fix up the category names
pattern = '(^\s*=*\s*|\s*=*\s*$)';
messages = {categories.message};
categoryNames = cellfun(@(x)regexprep(x, pattern, ''), messages, 'uni', 0);
[categories.message] = categoryNames{:};
% Now pair each mlint ID with its category
comp = bsxfun(@gt, 1:numel(warnings), find(isCategory).');
[category_id, ~] = find(diff(comp, [], 1) == -1);
category_id(end+1:numel(warnings)) = numel(categories);
% Assign a category field to each mlint ID
[warnings.category] = categoryNames{category_id};
category_id = num2cell(category_id);
[warnings.category_id] = category_id{:};
% Remove the categories from the warnings list
warnings = warnings(~isCategory);
% Convert warning severity to a number
severity = num2cell(str2double({warnings.severity}));
[warnings.severity] = severity{:};
% Save just the categories
categories = rmfield(categories, 'severity');
% Convert array of structs to a struct where the MLINT ID is the field
warnings = orderfields(cell2struct(num2cell(warnings), {warnings.id}));
end
Summary
This is a completely undocumented but fairly robust way of getting the category and severity associated with a given mlint ID. This functionality existed in 2010 and maybe even before that, so it should work with any version of MATLAB that you have to deal with. This approach is also a lot more flexible than simply noting what categories a given mlint ID is in because the category (and severity) will change from release to release as new functions are added and old functions are deprecated.
Thanks for asking this challenging question, and I hope that this answer provides a little help and insight!
Just to close this issue off: I've managed to extract the data from a few different places and piece it together. I now have an Excel spreadsheet of all MATLAB's warnings and errors with columns for their corresponding ID codes, category, and severity (i.e., whether it is a warning or an error). I can now read this file in, look up ID codes I get from using the 'checkcode' function, and draw out any information required. This can now be used to create analysis tools to look at the quality of written scripts, classes, etc.
If anyone would like a copy of this file then drop me a message and I'll be happy to provide it.
Darren.

Matlab assignment - return only odd elements

I wrote code similar to that posted by FeliceM, but when I tried to include the recommended changes/additions I got the following error:
Warning: File: odd_index.m Line: 5 Column: 18
Function with duplicate name "odd_index" cannot be called.
Here's my code:
function odd_index
M=[1:5; 6:10; 11:15; 16:20; 21:25];
M=M(1:2:end, 1:2:end);
end
function M_out = odd_index(M)
M_out = M(1:2:end, 1:2:end);
end
Not quite sure what I'm doing wrong.
It appears to be simply a problem of naming your two functions the same name. Rename the second function (and make sure it's in a separate file with the same name as itself). Really, the first one isn't a function at all, but appears to be a script.
You may want to refer to the MATLAB Getting Started help documentation for more information on functions and scripts. http://www.mathworks.com/help/matlab/getting-started-with-matlab.html

Reading input m-file in a main m-file

I would like MATLAB to tell me if I have an input file (.m file) that contains some variables with their numbers (i.e., a = 5, b = 6, c = 7) so that I can then use that .m file in another program (main .m file) that uses these variables to calculate S = a + b + c. How can I then read the input file from the main file? Assume the input file is called INP and the main MAIN.
This is typically not good practice in MATLAB. The file containing the input variables would, in your example, be a script, as would your main file. MATLAB does not error when running one script from another, as suggested by ScottieT812, but under certain circumstances strange errors can arise (runtime compilation has difficulty, and variable names can collide across scripts).
A better option is to turn the inputs script into a function which returns the variables of interest:
function [a, b, c] = inputs
a = 5;
b = 6;
c = 7;
Then this function can be called in the main.m script.
% main.m
[a,b,c] = inputs;
s = a+b+c;
If your "input" file is an m-file, just use the name of the file in your "main" m-file. For example, you might have a file called inputs.m that looks like this:
% File: inputs.m
a = 5;
b = 6;
c = 7;
Then, you can use it in the file main.m like this:
% File: main.m
inputs;
S = a + b + c;
For this sort of stuff (parameters that are easily adjusted later) I almost always use structures:
function S = zark
S.wheels = 24;
S.mpg = 13.2;
S.name = 'magic bus';
S.transfer_fcn = @(x) x+7;
S.K = [1 2; -2 1];
Then you can return lots of data without having to do stuff like [a,b,c,d,e,f]=some_function;
One nice thing about structures is you can address them dynamically:
>> f = 'wheels';
>> S.(f)
ans =
24
It sounds like you want to have some global configuration information that's used by scripts. Often, it's much better to create functions and pass values as arguments, but sometimes it makes sense to do things the way you suggest. One way to accomplish this is to save the information in a file. See "load" and "save" in the Matlab documentation.
I ran into the exact problem KennyMorton mentioned when trying to create runtime compiled versions of MATLAB software for my work. The software uses m-files extensively for passing arguments between functions. Additionally, we create these m-files dynamically which the deployed version of MATLAB does not play nice with. Our workaround was:
1. Save the parameters to a file without the .m extension.
2. Read and eval the contents of the file.
So, to follow the OPs example, in a function we would create a text file, INP, containing our parameters. We create this file in the directory returned by the ctfroot function. Then, in MAIN, we would use the following to retrieve these parameters:
eval(char(textread(fullfile(ctfroot, 'INP'), '%s', 'whitespace', '')));
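The same "parameters in a plain text file" idea can be sketched in Python without calling eval, by parsing `name = value;` lines directly. This is an illustration of the approach under stated assumptions, not the workaround's actual code: the function name `read_parameters` is hypothetical, values are assumed to be plain numbers, and `%` is treated as starting a comment.

```python
def read_parameters(text):
    """Parse lines such as 'a = 5;' into a dict mapping names to floats."""
    params = {}
    for raw in text.splitlines():
        # Drop anything after a '%' comment marker, then the trailing semicolon.
        line = raw.split("%", 1)[0].strip().rstrip(";").strip()
        if "=" not in line:
            continue  # skip blank lines and anything that is not an assignment
        name, _, value = line.partition("=")
        params[name.strip()] = float(value)
    return params


# Contents of the parameter file from the question's example (a = 5, b = 6, c = 7).
INP = "% File: INP\na = 5;\nb = 6;\nc = 7;\n"
params = read_parameters(INP)
S = params["a"] + params["b"] + params["c"]
print(S)
```

Parsing explicit assignments is more restrictive than eval, but it avoids executing arbitrary text from the file and keeps the parameters in one container instead of injecting variables into the workspace.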
If the data script is just a script, you can call it from a function or another script directly. No extra commands are required. For example:
%mydata.m
a = 1;
b = 2;
%mymain.m
mydata
whos
mymain
>>mymain
Name Size Bytes Class Attributes
a 1x1 8 double
b 1x1 8 double
This also works for functions in addition to scripts
%foo.m
function foo
mydata
whos
>>foo
Name Size Bytes Class Attributes
a 1x1 8 double
b 1x1 8 double
Generally, it is preferable to use a MAT or other data file for this sort of thing.