How can I create a POSIXct vector in ffdf? - eclipse

I've had a look around and can't quite seem to get a grasp of is going on with this. I'm using R in Eclipse. The file I'm trying to import is 700mb with around 15mil rows and 6 columns. As I was having problems loading in I have started using the ff package.
library(ff)
FDF = read.csv.ffdf(file='C:\\Users\\William\\Desktop\\R Data\\GBPUSD.1986.2014.txt', header = FALSE, colClasses=c('factor','factor','numeric','numeric','numeric','numeric'), sep=',')
names(FDF)= c('Date','Time','Open','High','Low','Close')
#names the columns in the ffdf file
dim(FDF)
# produces dimensions of the file
I then want to create a POSIXct sequence which will later be joined against the imported file. I had tried;
tm1 = seq(as.POSIXct("1986/12/1 00:00"), as.POSIXct("2014/09/04 23:59"),"mins"))
tm1 = data.frame (DateTime=strftime(tm1,format='%Y.%m.%d %H:%M'))
However R kept of crashing. I then tested this is RStudio and saw that their where constraints on the vector. It did, however, produce the correct
dim(tm1)
names(tm1)
So I went back into Eclipse thinking this was something to do with memory allocation. I've attempted the following;
library(ff)
tm1 = as.ffdf(seq(as.POSIXct("1986/12/1 00:00"), as.POSIXct("2014/09/04 23:59"),"mins"))
tm1 = as.ffdf(DateTime=strftime(tm1,format='%Y.%m.%d %H:%M'))
names(tm1) = c('DateTime')
dim(tm1)
names(tm1)
This gives an error of
no applicable method for 'as.ffdf' applied to an object of class "c('POSIXct', 'POSIXt')"
I can't seem to work around this. I then tried ...
library(ff)
tm1 = as.ff(seq(as.POSIXct("1986/12/1 00:00"), as.POSIXct("2014/09/04 23:59"),"mins"))
tm1 = as.ff(DateTime=strftime(tm1,format='%Y.%m.%d %H:%M'))
Which produce the output dates, however not in the correct format. In addition to this, when ...
dim(tm1)
names(tm1)
where executed they both returned null.
Question
How can I produce a POSIXct seq in the format I require above?

We'll we got there in the end.
I believe the problem was the available RAM during the creation of the full vector. As this was the case I broke the vector into 3, converted them into ffdf format to free up RAM and then used rbind to bind them together.
The problem with formatting the vector once created, I believe, was due to accessing RAM. Every time I tried this R crashed.
Even with the work around below my machine is slowing (4gb). I've ordered some more RAM and hope this will smooth future operations.
Below is the working code;
library(ff)
library(ffbase)
tm1 = seq(from = as.POSIXct('1986-12-01 00:00'), to = as.POSIXct('2000-12-01 23:59'), by = 'min')
tm1 = data.frame(DateTime=strftime(tm1, format='%Y.%m.%d %H:%M'))
# create data frame within memory contrainst
tm1 = as.ffdf(tm1)
# converts to ffdf format
memory.size()
tm2 = seq(from = as.POSIXct('2000-12-02 00:00'), to = as.POSIXct('2010-12-01 23:59'), by = 'min')
tm2 = data.frame(DateTime=strftime(tm2, format='%Y.%m.%d %H:%M'))
# create data frame within memory contrainst
tm2 = as.ffdf(tm2)
memory.size()
tm3 = seq(from = as.POSIXct('2010-12-2 00:00'), to = as.POSIXct('2014-09-04 23:59'), by = 'min')
tm3 = data.frame(DateTime=strftime(tm3, format='%Y.%m.%d %H:%M'))
memory.size()
tm3 = as.ffdf(tm3)
# converts to ffdf format
memory.size()
tm4 = rbind(tm1, tm2, tm3)
# binds ffdf objects into one
dim(tm4)
# checks the row numbers

Related

Nnet in caret, basic structure

I'm very new to caret package and nnet in R. I've done some projects related to ANN with Matlab before, but now I need to work with R and I need some basic help.
My input dataset has 1000 observations (in rows) and 23 variables (in columns). My output has 1000 observations and 12 variables.
Here are some sample data that represent my dataset and might help to understand my problem better:
input = as.data.frame(matrix(sample(1 : 20, 100, replace = TRUE), ncol = 10))
colnames(input) = paste ( "X" , 1:10, sep = "") #10 observations and 10 variables
output = as.data.frame(matrix(sample(1 : 20, 70, replace = TRUE), ncol = 7))
colnames(output) = paste ( "Y" , 1:7, sep = "") #10 observations and 7 variables
#nnet with caret:
net1 = train(output ~., data = input, method= "nnet", maxit = 1000)
When I run the code, I get this error:
error: invalid type (list) for variable 'output'.
I think I have to add all output variables separately (which is very annoying, especially with a lot of variables), like this:
train(output$Y1 + output$Y2 + output$Y3 + output$Y4 + output$Y5 +
output$Y6 + output$Y7 ~., data = input, method= "nnet", maxit = 1000)
This time it runs but I get this error:
Error in [.data.frame(data, , all.vars(Terms), drop = FALSE) :
undefined columns selected
I try to use neuralnet package, with the code below it works perfectly but I still have to add output variables separately :(
net1 = neuralnet(output$Y1 + output$Y2 + output$Y3 + output$Y4 +
output$Y5 + output$Y6 + output$Y7 ~., data = input, hidden=c(2,10))
p.s. since these sample data are created randomly, the neuralnet cannot converge, but in my real data it works well (in comparison to Matlab ANN)
Now, if you could help me with a way to put output variables automatically (not manually), it solves my problem (although with neuralnet not caret).
use the str() function and ascertain that its a data frame looks like you are inputting a list to the train function. This may be because of a transformation you are doing before to output.
str(output)
Without a full script of earlier steps its difficult to understand what is going on.
After trying different things and searches, I finally found a solution:
First, we must use as.formula to show the relation between our input and output. With the code below we don't need to add all the variables separately:
names1 <- colnames(output) #the name of our variables in the output
names2 = colnames(input) #the name of our variables in the input
a <- as.formula(paste(paste(names1,collapse='+', sep = ""),' ~ '
,paste(names2,collapse='+', sep = "")))
then we have to combine our input and output in a single data frame:
all_data = cbind(output, input)
then, use neuralnet like this:
net1 = neuralnet(formula = a, data = all_data, hidden=c(2,10))
plot(net1)
This is also work with the caret package:
net1 = train(a, data = all_data, method= "nnet", maxit = 1000)
but it seems neuralnet works faster (at least in my case).
I hope this helps someone else.

Matlab: read multiple files

my matlab script read several wav files contained in a folder.
Each read signal is saved in cell "mat" and each signal is saved in array. For example,
I have 3 wav files, I read these files and these signals are saved in arrays "a,b and c".
I want apply another function that has as input each signal (a, b and c) and the name of corresponding
file.
dirMask = '\myfolder\*.wav';
fileRoot = fileparts(dirMask);
Files=dir(dirMask);
N = natsortfiles({Files.name});
C = cell(size(N));
D = cell(size(N));
for k = 1:numel(N)
str =fullfile(fileRoot, Files(k).name);
[C{k},D{k}] = audioread(str);
mat = [C(:)];
fs = [D(:)];
a=mat{1};
b=mat{2};
c=mat{3};
myfunction(a,Files(1).name);
myfunction(b,Files(2).name);
myfunction(c,Files(3).name);
end
My script doesn't work because myfunction considers only the last Wav file contained in the folder, although
arrays a, b and c cointain the three different signal.
If I read only one wav file, the script works well. What's wrong in the for loop?
Like Cris noticed, you have some issues with how you structured your for loop. You are trying to use 'b', and 'c' before they are even given any data (in the second and third times through the loop). Assuming that you have a reason for structuring your program the way you do (I would rewrite the loop so that you do not use 'a','b', or 'c'. And just send 'myfunction' the appropriate index of 'mat') The following should work:
dirMask = '\myfolder\*.wav';
fileRoot = fileparts(dirMask);
Files=dir(dirMask);
N = natsortfiles({Files.name});
C = cell(size(N));
D = cell(size(N));
a = {};
b = {};
c = {};
for k = 1:numel(N)
str =fullfile(fileRoot, Files(k).name);
[C{k},D{k}] = audioread(str);
mat = [C(:)];
fs = [D(:)];
a=mat{1};
b=mat{2};
c=mat{3};
end
myfunction(a,Files(1).name);
myfunction(b,Files(2).name);
myfunction(c,Files(3).name);
EDIT
I wanted to take a moment to clarify what I meant by saying that I would not use the a, b, or c variables. Please note that I could be missing something in what you were asking so I might be explaining things you already know.
In a certain scenarios like this it is possible to articulate exactly how many variables you will be using. In your case, you know that you have exactly 3 audio files that you are going to process. So, variables a, b, and c can come out. Great, but what if you have to throw another audio file in? Now you need to go back in and add a 'd' variable and another call to 'myfunction'. There is a better way, that not only reduces complexity but also extends functionality to the program. See the following code:
%same as your code
dirMask = '\myfolder\*.wav';
fileRoot = fileparts(dirMask);
Files = dir(dirMask);
%slight variable name change, k->idx, slightly more meaningful.
%also removed N, simplifying things a little.
for idx = 1:numel(Files)
%meaningful variable name change str -> filepath.
filepath = fullfile(fileRoot, Files(idx).name);
%It was unclear if you were actually using the Fs component returned
%from the 'audioread' call. I wanted to make sure that we kept access
%to that data. Note that we have removed 'mat' and 'fs'. We can hold
%all of that data inside one variable, 'audio', which simplifies the
%program.
[audio{idx}.('data'), audio{idx}.('rate')] = audioread(filepath);
%this function call sends exactly the same data that your version did
%but note that we have to unpack it a little by adding the .('data').
myfunction(audio{idx}.('data'), Files(idx).name);
end

How to load a sequence of image files using a for loop in MATLAB?

I am beginner in MATLAB. I would like to load 200 image files (size 192x192) in a specific folder by using a for loop.
The image names are '1.png', '2.png', '3.png' and so on.
My code is as below.
list = dir('C:/preds/*.png');
N = size(list,1);
sum_image = zeros(192,192,200);
for i = 1:N
sum_image(:,:,i) = imread('C:/preds/i.png');
end
Which part should I change ?
I would probably do it like the code below:
You are currently getting the list of filenames then not really doing much with it. Iterating over the list is safer otherwise if there is a missing number you could have issues. Also, the sort maybe unnecessary depending if you image numbering is zero-padded so they come out in the correct order ... but better safe than sorry. One other small change initializing the array to size N instead of hard-coding 200. This will make it more flexible.
searchDir = 'C:\preds\';
list = dir([searchDir '*.png']);
nameList = {list.name}; %Get array of names
imNum = str2double(strrep(nameList,'.png','')); %Get image number
[~,idx] = sort(imNum); %sort it
nameList = nameList(idx);
N = numel(nameList);
sum_image = zeros(192,192,N);
for i=1:N
sum_image(:,:,i) = imread(fullfile(searchDir,nameList{i}));
end
I would suggest changing the line within the loop to the following:
sum_image(:,:,i) = imread(['C:/preds/', num2str(i), '.png']);
MATLAB treats the i in your string as a character and not the variable i. The above line of code builds your string piece by piece.
If this isn't a homework problem, the right answer to this question is don't write this as a for loop. Use an imageDatastore:
https://www.mathworks.com/help/matlab/ref/imagedatastore.html
ds = imageDatastore('C:/preds/');
sumImageCellArray = readall(ds);
sumImage = cat(3,sumImageCellArray{:});

matlab How to use a textstring as input parameter in functions

I would like to use a dataset filename "AUDUSD" in several functions. It would be easier for me, just to change the filename "AUDUSD" to a more general name like "FX" and then using the abbreviation "FX" in other_matlab functions, e.g. double(). But matlab does not know the name "FX" (that should be assigned to the dataset "AUDUSD") in the code below... Any suggestions?
CODE:
FX = 'AUDUSD';
load(FX); %OKAY !!! FX works as input to open file AUDUSD!
Svars = {'S_bid','S_offer'};
Fvars = {'F_bid','F_offer'};
vS = double(FX,Svars); % FX does NOT work as input for the file AUDUSD
There is no double() function that accepts multiple cell arrays as arguments (this is what happens when you call double(FX,Svars)).
If you call double(FX), then each character in FX is interpreted for its ASCII value and then cast to double. So you get [ 65.0 85.0 68.0 85.0 83.0 68.0 ]. This is the behavior for the double() function if you provide a vector: each individual value in the vector is cast to double.
You'd have to provide more details on what you're trying to accomplish to give any more suggestions.
I have a different example, maybe you will better understand my point. The key work I would like to process is as follows:
I have got a folder with "dataset" files. I would like to loop through this folder, entering in any datasetfile, extracting the 2nd and 3rd column of each dataset file, and constructing only ONE new datasetfile with all 2nd and 3rd columns of the datasetfiles.
One problem is that the size of the datasetfiles are not the same, so I tried to translate a datasetfile into a double-matrix and then consolidate all double matrices into ONE double matrx.
Here my code:
folder_string = 'Diss_Data/Raw';
FolderContent = dir(folder_string);
No_ds = numel(FolderContent);
for i = 1:No_ds
if isdir(FolderContent(i).name)==0
file_string = FolderContent(i).name;
file_path = strcat(folder_string,'/',file_string)
dataset_filename = file_string(1:6);
load(file_path); %loads the suggested datasetfile; OKAY
M = double(dataset_filename);% returns an ASCII code number; WRONG; should transfer the datasetfile into a matrix M
vS = M(:,2:3);
%... to be continued
end
end

MATLAB Changing the name of a matrix with each iteration

I was just wondering if there is a clean way to store a matrix after each iteration with a different name? I would like to be able to store each matrix (uMatrix) under a different name depending on which simulation I am on eg Sim1, Sim2, .... etc. SO that Sim1 = uMatrix after first run through, then Sim2 = uMatrix after 2nd run through. SO that each time I can store a different uMatrix for each different Simulation.
Any help would be much appreciated, and sorry if this turns out to be a silly question. Also any pointers on whether this code can be cleaned up would be great too
Code I am using below
d = 2;
kij = [1,1];
uMatrix = [];
RLABEL=[];
SimNum = 2;
for i =1:SimNum
Sim = ['Sim',num2str(i)] %Simulation number
for j=1:d
RLABEL = [RLABEL 'Row','',num2str(j) ' '];
Px = rand;
var = (5/12)*d*sum(kij);
invLam = sqrt(var);
u(j) = ((log(1-Px))*-invLam)+kij(1,j);
uMatrix(j,1) = j;
uMatrix(j,2) = u(j);
uMatrix(j,3) = kij(1,j);
uMatrix(j,4) = Px;
uMatrix(j,5) = invLam;
uMatrix(j,6) = var;
end
printmat(uMatrix,'Results',RLABEL,'SECTION u kij P(Tij<u) 1/Lambda Var')
end
There are really too many options. To go describe both putting data into, and getting data our of a few of these methods:
Encode in variable names
I really, really dislike this approach, but it seems to be what you are specifically asking for. To save uMatrix as a variable Sim5 (after the 5th run), add the following to your code at the end of the loop:
eval([Sim ' = uMatrix;']); %Where the variable "Sim" contains a string like 'Sim5'
To access the data
listOfStoredDataNames = who('Sim*')
someStoredDataItem = eval(listOfStoredDataNames {1}) %Ugghh
%or of you know the name already
someStoredDataItem = Sim1;
Now, please don't do this. Let me try and convince you that there are better ways.
Use a structure
To do the same thing, using a structure called (for example) simResults
simResults.(Sim) = uMatrix;
or even better
simResults.(genvarname(Sim)) = uMatrix;
To access the data
listOfStoredDataNames = fieldnames(simResults)
someStoredDataItem = simResults.(listOfStoredDataNames{1})
%or of you know the name already
someStoredDataItem = simResults.Sim1
This avoids the always problematic eval statement, and more importantly makes additional code much easier to write. For example you can easily pass all simResults into a function for further processing.
Use a Map
To use a map to do the same storage, first initialize the map
simResults = containers.Map('keyType','char','valueType','any');
Then at each iteration add the values to the map
simResults(Sim) = uMatrix;
To access the data
listOfStoredDataNames = simResults.keys
someStoredDataItem = simResults(listOfStoredDataNames{1})
%or of you know the name already
someStoredDataItem = simResults('Sim1')
Maps are a little more flexible in the strings which can be used for names, and are probably a better solution if you are comfortable.
Use a cell array
For simple, no nonsense storage of the results
simResults{i} = uMatrix;
To access the data
listOfStoredDataNames = {}; %Names are Not available using this method
someStoredDataItem = simResults{1}
Or, using a slight level of nonesense
simResults{i,1} = Sim; %Store the name in column 1
simResults{i,2} = uMatrix; %Store the result in column 2
To access the data
listOfStoredDataNames = simResults(:,1)
someStoredDataItem = simResults{1,2}
Just to add to the detailed answer given by #Pursuit, there is one further method I am fond of:
Use an array of structures
Each item in the array is a structure which stores the results and any additional information:
simResults(i).name = Sim; % store the name of the run
simResults(i).uMatrix = uMatrix; % store the results
simResults(i).time = toc; % store the time taken to do this run
etc. Each element in the array will need to have the same fields. You can use quick operations to extract all the elements from the array, for example to see the timings of each run at a glance you can do:
[simResults.time]
You can also use arrayfun to to a process on each element in the array:
anon_evaluation_func = #(x)( evaluate_uMatrix( x.uMatrix ) );
results = arrayfun( anon_evaluation_func, simResults );
or in a more simple syntax,
for i = 1:length(simResults)
simResults(i).result = evaluate_uMatrix( simResults(i).uMatrix );
end
I would try to use a map which stores a <name, matrix>.
the possible way to do it would be to use http://www.mathworks.com/help/matlab/map-containers.html