Nnet in caret, basic structure - neural-network

I'm very new to the caret package and nnet in R. I've done some ANN projects in Matlab before, but now I need to work with R and need some basic help.
My input dataset has 1000 observations (in rows) and 23 variables (in columns). My output has 1000 observations and 12 variables.
Here are some sample data that represent my dataset and might help to understand my problem better:
input = as.data.frame(matrix(sample(1 : 20, 100, replace = TRUE), ncol = 10))
colnames(input) = paste ( "X" , 1:10, sep = "") #10 observations and 10 variables
output = as.data.frame(matrix(sample(1 : 20, 70, replace = TRUE), ncol = 7))
colnames(output) = paste ( "Y" , 1:7, sep = "") #10 observations and 7 variables
#nnet with caret:
net1 = train(output ~., data = input, method= "nnet", maxit = 1000)
When I run the code, I get this error:
error: invalid type (list) for variable 'output'.
I think I have to add all output variables separately (which is very annoying, especially with a lot of variables), like this:
train(output$Y1 + output$Y2 + output$Y3 + output$Y4 + output$Y5 +
output$Y6 + output$Y7 ~., data = input, method= "nnet", maxit = 1000)
This time it runs but I get this error:
Error in [.data.frame(data, , all.vars(Terms), drop = FALSE) :
undefined columns selected
I also tried the neuralnet package. With the code below it works perfectly, but I still have to add the output variables separately :(
net1 = neuralnet(output$Y1 + output$Y2 + output$Y3 + output$Y4 +
output$Y5 + output$Y6 + output$Y7 ~., data = input, hidden=c(2,10))
p.s. since these sample data are created randomly, the neuralnet cannot converge, but in my real data it works well (in comparison to Matlab ANN)
Now, if you could help me with a way to put output variables automatically (not manually), it solves my problem (although with neuralnet not caret).

Use the str() function to check that output is a data frame; it looks like you are passing a list to the train function. This may be caused by a transformation you applied to output earlier.
str(output)
Without a full script of the earlier steps it's difficult to understand what is going on.

After trying different things and searches, I finally found a solution:
First, we must use as.formula to show the relation between our input and output. With the code below we don't need to add all the variables separately:
names1 <- colnames(output) #the name of our variables in the output
names2 = colnames(input) #the name of our variables in the input
a <- as.formula(paste(paste(names1,collapse='+', sep = ""),' ~ '
,paste(names2,collapse='+', sep = "")))
Then we have to combine our input and output in a single data frame:
all_data = cbind(output, input)
Then use neuralnet like this:
net1 = neuralnet(formula = a, data = all_data, hidden=c(2,10))
plot(net1)
This also works with the caret package:
net1 = train(a, data = all_data, method= "nnet", maxit = 1000)
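but it seems neuralnet works faster (at least in my case). One caveat: a standard R formula treats Y1+...+Y7 on the left-hand side as arithmetic, so with caret it is worth checking that the model really has seven outputs and is not fitting their sum.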
I hope this helps someone else.

Related

How to automatically change variable names given a new input file name

I am importing data from a .mat file and then extracting certain signals from it. I call the loaded data data. data is a 1x1 struct with 1 field, FT_est_X, where X is the particular run that I collected the samples from. Here is the code snippet showing how I do that.
data = load('site_data_all_2.mat');
t = data.FT_est_2.time;
% estimated data
Fx = data.FT_est_2.signals(1).values;
Fy = data.FT_est_2.signals(2).values;
Fz = data.FT_est_2.signals(3).values;
Mx = data.FT_est_2.signals(4).values;
My = data.FT_est_2.signals(5).values;
Mz = data.FT_est_2.signals(6).values;
So, you can see that this data was collected from run 2. Now, let's say I want to load a file named site_data_all_3.mat (run 3). Everything below %estimated data then needs a different field name: everything stays the same, except the 2 becomes a 3 (e.g. Fx would be Fx = data.FT_est_3.signals(1).values;). Currently, I have to manually enter the 3 for each variable. Can anyone tell me how I can change only the file name and have the variable names change automatically? Essentially, I just want it to be Fx = data.name_of_struct_field.signals(1).values.
Thank you!
You could construct the string some programmatic way (maybe with an iteration variable), but here's the simple answer of defining the fieldname as a string and simply using it. At the next iteration, update the fieldname variable and repeat.
fieldname = 'FT_est_2';
Fx = data.(fieldname).signals(1).values;
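The "programmatic way" of building the field name could start from the file name itself. Below is a minimal sketch of just that string step, written in Python for illustration. It assumes the run number is the last underscore-separated number before .mat and that the field prefix is always FT_est_; the helper name field_name_for is hypothetical.

```python
import re

def field_name_for(mat_filename, prefix="FT_est_"):
    """Derive a field name such as 'FT_est_3' from 'site_data_all_3.mat'.

    Assumes the run number is the trailing '_<digits>' before '.mat'.
    """
    match = re.search(r"_(\d+)\.mat$", mat_filename)
    if match is None:
        raise ValueError("no run number found in %r" % mat_filename)
    return prefix + match.group(1)

print(field_name_for("site_data_all_3.mat"))
```

The resulting string plays the same role as the fieldname variable above: it is computed once per file and then used for every signal.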

How to Give int-string-int Input as Parameter for Matlab's Matrix?

I would like a shorthand for a set of indexing parameters that I keep fixed in Matlab 2016a. I need them in many places, and managing them separately causes many errors.
Code where the signal is 15x60x3 in dimensions
signal( 1:1 + windowWidth/4, 1:1 + windowWidth,: );
Its pseudocode
videoParams = 1:1 + windowWidth/4, 1:1 + windowWidth,: ;
signal( videoParams );
Here videoParams cannot be written as a single string; I think ":" would have to be a string and everything else integers.
There should be some way to do the pseudocode.
Since size(signal,3) is 3, 1:size(signal,3) gives 1:3. I do not see how this would replace : in the pseudocode.
Extension of horcler's code as a function
function videoParams = fix(k, windowWidth)
videoParams = {k:k + windowWidth/4, k:k + windowWidth};
end
The test call signal( fix(1,windowWidth){:}, : ) is still unsuccessful, giving the error
()-indexing must appear last in an index expression.
so I am not sure if such a function is possible.
How can you make such an int-string-int input for the matrix?
This can be accomplished via comma-separated lists:
signal = rand(15,60,3); % Create random data
windowWidth = 2;
videoParams = {1:1+windowWidth/4, 1:1+windowWidth, 1:size(signal,3)};
Then use the comma-separated list as such:
signal(videoParams{:})
which is equivalent to
signal(1:1+windowWidth/4, 1:1+windowWidth, 1:size(signal,3))
or
signal(1:1+windowWidth/4, 1:1+windowWidth, :)
The colon operator by itself is shorthand for the entirety of a dimension. However, it is only applicable in a direct context. The following is meaningless (and invalid code) as the enclosing cell has no defined size for its third element:
videoParams = {1:1+windowWidth/4, 1:1+windowWidth, :};
To work around this, you could of course use:
videoParams = {1:1+windowWidth/4, 1:1+windowWidth};
signal(videoParams{:},:)

How can I create a POSIXct vector in ffdf?

I've had a look around and can't quite get a grasp of what is going on with this. I'm using R in Eclipse. The file I'm trying to import is 700 MB with around 15 million rows and 6 columns. As I was having problems loading it in, I have started using the ff package.
library(ff)
FDF = read.csv.ffdf(file='C:\\Users\\William\\Desktop\\R Data\\GBPUSD.1986.2014.txt', header = FALSE, colClasses=c('factor','factor','numeric','numeric','numeric','numeric'), sep=',')
names(FDF)= c('Date','Time','Open','High','Low','Close')
#names the columns in the ffdf file
dim(FDF)
# produces dimensions of the file
I then want to create a POSIXct sequence which will later be joined against the imported file. I had tried;
tm1 = seq(as.POSIXct("1986/12/1 00:00"), as.POSIXct("2014/09/04 23:59"),"mins")
tm1 = data.frame (DateTime=strftime(tm1,format='%Y.%m.%d %H:%M'))
However, R kept crashing. I then tested this in RStudio and saw that there were constraints on the vector. It did, however, produce the correct
dim(tm1)
names(tm1)
So I went back into Eclipse thinking this was something to do with memory allocation. I've attempted the following;
library(ff)
tm1 = as.ffdf(seq(as.POSIXct("1986/12/1 00:00"), as.POSIXct("2014/09/04 23:59"),"mins"))
tm1 = as.ffdf(DateTime=strftime(tm1,format='%Y.%m.%d %H:%M'))
names(tm1) = c('DateTime')
dim(tm1)
names(tm1)
This gives an error of
no applicable method for 'as.ffdf' applied to an object of class "c('POSIXct', 'POSIXt')"
I can't seem to work around this. I then tried ...
library(ff)
tm1 = as.ff(seq(as.POSIXct("1986/12/1 00:00"), as.POSIXct("2014/09/04 23:59"),"mins"))
tm1 = as.ff(DateTime=strftime(tm1,format='%Y.%m.%d %H:%M'))
This produces the output dates, but not in the correct format. In addition to this, when ...
dim(tm1)
names(tm1)
were executed, they both returned NULL.
Question
How can I produce a POSIXct seq in the format I require above?
Well, we got there in the end.
I believe the problem was the available RAM during the creation of the full vector. As this was the case I broke the vector into 3, converted them into ffdf format to free up RAM and then used rbind to bind them together.
The problem with formatting the vector once created, I believe, was due to accessing RAM. Every time I tried this R crashed.
Even with the workaround below my machine (4 GB of RAM) is slowing down. I've ordered some more RAM and hope this will smooth future operations.
Below is the working code;
library(ff)
library(ffbase)
tm1 = seq(from = as.POSIXct('1986-12-01 00:00'), to = as.POSIXct('2000-12-01 23:59'), by = 'min')
tm1 = data.frame(DateTime=strftime(tm1, format='%Y.%m.%d %H:%M'))
# create data frame within memory constraints
tm1 = as.ffdf(tm1)
# converts to ffdf format
memory.size()
tm2 = seq(from = as.POSIXct('2000-12-02 00:00'), to = as.POSIXct('2010-12-01 23:59'), by = 'min')
tm2 = data.frame(DateTime=strftime(tm2, format='%Y.%m.%d %H:%M'))
# create data frame within memory constraints
tm2 = as.ffdf(tm2)
memory.size()
tm3 = seq(from = as.POSIXct('2010-12-2 00:00'), to = as.POSIXct('2014-09-04 23:59'), by = 'min')
tm3 = data.frame(DateTime=strftime(tm3, format='%Y.%m.%d %H:%M'))
memory.size()
tm3 = as.ffdf(tm3)
# converts to ffdf format
memory.size()
tm4 = rbind(tm1, tm2, tm3)
# binds ffdf objects into one
dim(tm4)
# checks the row numbers

Convert matlab symbol to array of products

Can I convert a symbol that is a product of products into an array of products?
I tried to do something like this:
syms A B C D;
D = A*B*C;
factor(D);
but it doesn't factor it out (mostly because that isn't what factor is designed to do).
ans =
A*B*C
I need it to work if A B or C is replaced with any arbitrarily complicated parenthesized function, and it would be nice to do it without knowing what variables are in the function.
For example (all variables are symbolic):
D = x*(x-1)*(cos(z) + n);
factoring_function(D);
should be:
[x, x-1, (cos(z) + n)]
It seems like a string parsing problem, but I'm not confident that I can convert back to symbolic variables afterwards (also, string parsing in matlab sounds really tedious).
Thank you!
Use regexp on the string to split based on *:
>> str = 'x*(x-1)*(cos(z) + n)';
>> factors_str = regexp(str, '\*', 'split')
factors_str =
'x' '(x-1)' '(cos(z) + n)'
The result factors_str is a cell array of strings. To convert to a cell array of sym objects, use
N = numel(factors_str);
factors = cell(1,N); %// each cell will hold a sym factor
for n = 1:N
factors{n} = sym(factors_str{n});
end
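One caveat with splitting on every *: a factor that itself contains a product inside parentheses, such as (2*y + 1), would be cut in two. A minimal sketch of a split that only breaks on top-level * characters is given below, in Python since the thread later moves there. The function name split_top_level_factors is hypothetical, and the sketch assumes balanced parentheses and no ** power operator.

```python
def split_top_level_factors(expr):
    """Split expr on '*' characters that are not nested inside parentheses."""
    factors, current, depth = [], [], 0
    for ch in expr:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "*" and depth == 0:
            # A '*' outside all parentheses ends the current factor.
            factors.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    factors.append("".join(current).strip())
    return factors

print(split_top_level_factors("x*(x-1)*(cos(z) + n)"))
print(split_top_level_factors("x*(2*y + 1)"))
```

For the example from the question this gives the same three pieces as the regexp split; the second call shows the nested product staying in one piece.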
I ended up writing the code to do this in python using sympy. I think I'm going to port the matlab code over to python because it is a more preferred language for me. I'm not claiming this is fast, but it serves my purposes.
# Factors a sum of products function that is first order with respect to all symbolic variables
# into a reduced form using products of sums whenever possible.
# #params orig_exp A symbolic expression to be simplified
# #params depth Used to control indenting for printing
# #params verbose Whether to print or not
def factored(orig_exp, depth = 0, verbose = False):
    # Prevents sympy from doing any additional factoring
    exp = expand(orig_exp)
    if verbose: tabs = '\t'*depth
    terms = []
    # Break up the added terms
    while(exp != 0):
        my_atoms = symvar(exp)
        if verbose:
            print tabs,"The expression is",exp
            print tabs,my_atoms, len(my_atoms)
        # There is nothing to sort out, only one term left
        if len(my_atoms) <= 1:
            terms.append((exp, 1))
            break
        (c,v) = collect_terms(exp, my_atoms[0])
        # Makes sure it doesn't factor anything extra out
        exp = expand(c[1])
        if verbose:
            print tabs, "Collecting", my_atoms[0], "terms."
            print tabs,'Separated terms with ',v[0], ', (',c[0],')'
        # Factor the leftovers and recombine
        c[0] = factored(c[0], depth + 1)
        terms.append((v[0], c[0]))
    # Combines trivial terms whenever possible
    i=0
    def termParser(thing): return str(thing[1])
    terms = sorted(terms, key = termParser)
    while i<len(terms)-1:
        if equals(terms[i][1], terms[i+1][1]):
            terms[i] = (terms[i][0]+terms[i+1][0], terms[i][1])
            del terms[i+1]
        else:
            i += 1
    recombine = sum([terms[i][0]*terms[i][1] for i in range(len(terms))])
    return simplify(recombine, ratio = 1)

matlab How to use a textstring as input parameter in functions

I would like to use a dataset filename "AUDUSD" in several functions. It would be easier for me to assign the filename "AUDUSD" to a more general name like "FX" and then use "FX" in other MATLAB functions, e.g. double(). But MATLAB does not recognise the name "FX" (which should refer to the dataset "AUDUSD") in the code below. Any suggestions?
CODE:
FX = 'AUDUSD';
load(FX); %OKAY !!! FX works as input to open file AUDUSD!
Svars = {'S_bid','S_offer'};
Fvars = {'F_bid','F_offer'};
vS = double(FX,Svars); % FX does NOT work as input for the file AUDUSD
There is no double() function that accepts a char array followed by a cell array as arguments (this is what happens when you call double(FX,Svars)).
If you call double(FX), then each character in FX is interpreted for its ASCII value and then cast to double. So you get [ 65.0 85.0 68.0 85.0 83.0 68.0 ]. This is the behavior for the double() function if you provide a vector: each individual value in the vector is cast to double.
You'd have to provide more details on what you're trying to accomplish to give any more suggestions.
I have a different example that may make my point clearer. What I would like to do is as follows:
I have a folder with "dataset" files. I would like to loop through this folder, open each dataset file, extract its 2nd and 3rd columns, and build ONE new dataset file containing the 2nd and 3rd columns of all the dataset files.
One problem is that the dataset files are not all the same size, so I tried to convert each dataset file into a double matrix and then consolidate all the double matrices into ONE double matrix.
Here my code:
folder_string = 'Diss_Data/Raw';
FolderContent = dir(folder_string);
No_ds = numel(FolderContent);
for i = 1:No_ds
    if isdir(FolderContent(i).name)==0
        file_string = FolderContent(i).name;
        file_path = strcat(folder_string,'/',file_string)
        dataset_filename = file_string(1:6);
        load(file_path); %loads the suggested datasetfile; OKAY
        M = double(dataset_filename);% returns an ASCII code number; WRONG; should transfer the datasetfile into a matrix M
        vS = M(:,2:3);
        %... to be continued
    end
end