Efficiently converting between MongoCursor and OrderedDict in Julia - mongodb

I'm using mongo.jl and need a more efficient way to convert between a MongoCursor and OrderedDict
Here, grabbing the data cursor with find typically takes < 1 second. Extracting and converting to OrderedDict takes ~20 seconds.
using Mongo
...
collection = MongoCollection(client, "myDB", json)
dataPre = find(collection, Dict(category=>id)) #Returns type Mongo.MongoCursor
i = 1
data = DataStructures.OrderedDict()
for rec in dataPre
data[i] = dict(rec)
i=i+1
end
Edit: I've realised through using #profile that 95% of my delay is in the conversion from dict to array, which is a few lines I omitted from the example code.
Here's the same example code with the omitted line:
using Mongo
...
collection = MongoCollection(client, "myDB", json)
dataPre = find(collection, Dict(category=>id)) #Returns type Mongo.MongoCursor
i = 1
data = DataStructures.OrderedDict()
form_arr = Array(Float64,(100,100,8,length(dataPre)))
for rec in dataPre
data = dict(rec) #A
for v1 = 1:100
for v2 = 1:8
form_arr[v1,:,v2,i] = data["info"][v1]["meas"][v2]["vals"] #B
end
end
i=i+1
end
The profiler indicates that line A takes ~ 5% of time, and line B uses up 95% of the time.
I'm stuck with that dict structure.. Is there a way to extract and format the data more efficiently?

Related

How to make this for loop run faster?

I have a data structure and I would like to change the information associated to some of its inputs listed in set1 to 4 using my function. I've written a for loop, but its execution time is very long. Is there any way to speed it up or any other way besides for loop (I can run them separately but it will be a very long script). Thanks in advance...
Set1= {'Andrew';'Mike';'Jane';'Bill';'Adam'};
Set2={'Romania';'Ecuador';'Singapore';'Norway';'India';'UK'};
Set3 = {'Liverpool';'Delhi';'New York'};
Set4 = {'2003';'1992';'1991';'2018';'2011';'2024';'2020'};
for A=1:length(Set1);
for B=1:length(Set2);
for C=1:length(Set3);
for D=1:length(Set4);
SET1 = Set1{A};
SET2 = Set2{B};
SET3 = Set3{C};
SET4 = Set4{D};
Data = myfunction(structure, SET1, 65,'Categ1');
Data = myfunction(structure, SET2, 100,'Categ2');
Data = myfunction(structure, SET3, 90,'Categ2');
Data = myfunction(structure, SET4, 76,'Categ1');
end
end
end
end
Your code is slow because you're unnecessarily using 4 nested loops.
You can simply handle each set in a separate loop, as:
for A=1:length(Set1);
SET1 = Set1{A};
Data = myfunction(structure, SET1, 65,'Categ1');
end
for B=1:length(Set2);
SET2 = Set2{B};
Data = myfunction(structure, SET2, 100,'Categ2');
end
for C=1:length(Set3);
SET3 = Set3{C};
Data = myfunction(structure, SET3, 90,'Categ2');
end
for D=1:length(Set4);
SET4 = Set4{D};
Data = myfunction(structure, SET4, 76,'Categ1');
end
You can also drop the temporary variable and write loops as:
for A=1:length(Set1);
Data = myfunction(structure, Set1{A}, 65,'Categ1');
end
// same for the 3 other loops

What is the best practice to recursively extract data from a nested structure in matlab?

I'm trying to exctract some data from a nested structure in a recursive manner. First, I know that this has a field (values) which repeats itself inside the nested structure. Secondly I know that the structure which has those values has only structures as fields. In the below code I've tried to acces the structure.values by searching if my current structure has a field named values. If it has, I put .values at the end of my structure name. If it doesn't have this field, I verify if all the fields are structures. If they are, it means that I will have to consider them further and to extract the values from each one. If the fields are not structures, it means that they are values and I save them into a new simplified structure. Example of fields that I want: S.values.model1.values.mission.values.(alt/list). Currently, with the below code I'm only able to get the values from one field and then I get an error and don't know how to approach further.
Code example:
clear all
clc
S=struct()
S.case='1';
S.type='A';
S.values.model1.case='2'
S.values.model1.type='C'
S.values.model1.values.mission.case='3'
S.values.model1.values.mission.type='D'
S.values.model1.values.mission.values.alt='none'
S.values.model1.values.mission.values.list=2
S.values.model1.values.mission.values.parameter=4
S.values.model1.values.phase.case='4'
S.values.model1.values.phase.type='A'
S.values.model1.values.phase.values.num='all'
S.values.model1.values.phase.values.eq=2
S.values.model1.values.phase.values.unit=4
S.values.model1.values.analysis.case='1'
S.values.model1.values.phase.type='A'
S.values.model1.values.phase.values.nump1.list='all'
S.values.model1.values.phase.values.nump1.table='four'
S.values.model1.values.phase.values.nump1.mean=0
S.values.model1.values.phase.values.nump2.list='none'
S.values.model1.values.phase.values.nump2.table='three';
S.values.model1.values.phase.values.nump2.mean=1
s=S.values.model1;
names=fieldnames(s);
nnames=numel(names);
newStruct={};
[valsi,newstructi]=extractValues(names,s,nnames,newStruct)
function [vals,newStruct]=extractValues(names,vals,nnames,newStruct)
if any(strcmp(names,'values'))
vals=vals.('values');
names=fieldnames(vals)
nnames=numel(names)
[vals,newStruct]=extractValues(names,vals,nnames,newStruct);
end
for j=1:nnames
value(j)=isstruct((vals.(names{j})));
end
if all(value)
for k=1:nnames
vals=(vals.(names{k}));
names=fieldnames(vals);
nnames=numel(names);
[vals,newStruct]=extractValues(names,vals,nnames,newStruct);
end
else
for j=1:nnames
value=(vals.(names{j}));
newStruct.(names{j})=value;
end
end
end
As it is known beforehand what fields are requested you can arrange the subsequent filed names in a cell array and use a loop to extract the value:
names = {'values', 'model1', 'values', 'mission', 'values', 'alt'};
out = S;
for name : names
out = out.(name{1});
end
So that is a loop version of using:
out = S.values.model1.values.mission.values.alt;
EDIT:
If you want to list all field names and all field values you can used these functions:
function out = names(s, p)
if isstruct(s)
out = {};
f = fieldnames(s);
for i = 1:numel(f)
s1 = s.(f{i});
p1 = [p '.' f{i}];
out = [out; names(s1, p1)];
end
else
out = {p};
end
end
function out = values(s)
if isstruct(s)
out = {};
f = fieldnames(s);
for i = 1:numel(f)
out = [out; values(s.(f{i}))];
end
else
out = {s};
end
end
Use them as:
n = names(S, 'S');
v = values(S);

Struggling to vertically append a string ( csv file) to an existing matrix in Scilab/matlab

M=read_csv("Gradesheets\MECN2073_2014.csv")
disp(size(M,1))
for x=1:1:size(M,1)
StudNum=M(x,1)
Grd=M(x,2) // Grd is grade column in gradesheets hence x,2 (column 2 of Grd )
if find(StudNum==StudentNumbers)>0 then disp("Exists")
H = [StudNum]
StudentNumbers = [StudentNumbers;StudNum]
end
end
This is where i get error:
Undefined operation for the given operands.
check or define function %s_f_c for overloading.
at line 38 of exec file called by :
I assume that StudentNumbers is the vector containing all your existing students right? If so, try the code below and let me know what you get.
M=read_csv("Gradesheets\MECN2073_2014.csv")
storing = [];
for i=1:size(M,1)
StudNum=M(i,1)
if find(StudNum==StudentNumbers(i)) == 1
disp("Exists");
storing = [storing;StudNum];
end
end

Using a string to refer to a structure array - matlab

I am trying to take the averages of a pretty large set of data, so i have created a function to do exactly that.
The data is stored in some struct1.struct2.data(:,column)
there are 4 struct1 and each of these have between 20 and 30 sub-struct2
the data that I want to average is always stored in column 7 and I want to output the average of each struct2.data(:,column) into a 2xN array/double (column 1 of this output is a reference to each sub-struct2 column 2 is the average)
The omly problem is, I can't find a way (lots and lots of reading) to point at each structure properly. I am using a string to refer to the structures, but I get error Attempt to reference field of non-structure array. So clearly it doesn't like this. Here is what I used. (excuse the inelegence)
function [avrg] = Takemean(prefix,numslits)
% place holder arrays
avs = [];
slits = [];
% iterate over the sub-struct (struct2)
for currslit=1:numslits
dataname = sprintf('%s_slit_%02d',prefix,currslit);
% slap the average and slit ID on the end
avs(end+1) = mean(prefix.dataname.data(:,7));
slits(end+1) = currslit;
end
% transpose the arrays
avs = avs';
slits = slits';
avrg = cat(2,slits,avs); % slap them together
It falls over at this line avs(end+1) = mean(prefix.dataname.data,7); because as you can see, prefix and dataname are strings. So, after hunting around I tried making these strings variables with genvarname() still no luck!
I have spent hours on what should have been 5min of coding. :'(
Edit: Oh prefix is a string e.g. 'Hs' and the structure of the structures (lol) is e.g. Hs.Hs_slit_XX.data() where XX is e.g. 01,02,...27
Edit: If I just run mean(Hs.Hs_slit_01.data(:,7)) it works fine... but then I cant iterate over all of the _slit_XX
If you simply want to iterate over the fields with the name pattern <something>_slit_<something>, you need neither the prefix string nor numslits for this. Pass the actual structure to your function, extract the desired fields and then itereate them:
function avrg = Takemean(s)
%// Extract only the "_slit_" fields
names = fieldnames(s);
names = names(~cellfun('isempty', strfind(names, '_slit_')));
%// Iterate over fields and calculate means
avrg = zeros(numel(names), 2);
for k = 1:numel(names)
avrg(k, :) = [k, mean(s.(names{k}).data(:, 7))];
end
This method uses dynamic field referencing to access fields in structs using strings.
First of all, think twice before you use string construction to access variables.
If you really really need it, here is how it can be used:
a.b=123;
s1 = 'a';
s2 = 'b';
eval([s1 '.' s2])
In your case probably something like:
Hs.Hs_slit_01.data= rand(3,7);
avs = [];
dataname = 'Hs_slit_01';
prefix = 'Hs';
eval(['avs(end+1) = mean(' prefix '.' dataname '.data(:,7))'])

MATLAB Changing the name of a matrix with each iteration

I was just wondering if there is a clean way to store a matrix after each iteration with a different name? I would like to be able to store each matrix (uMatrix) under a different name depending on which simulation I am on eg Sim1, Sim2, .... etc. SO that Sim1 = uMatrix after first run through, then Sim2 = uMatrix after 2nd run through. SO that each time I can store a different uMatrix for each different Simulation.
Any help would be much appreciated, and sorry if this turns out to be a silly question. Also any pointers on whether this code can be cleaned up would be great too
Code I am using below
d = 2;
kij = [1,1];
uMatrix = [];
RLABEL=[];
SimNum = 2;
for i =1:SimNum
Sim = ['Sim',num2str(i)] %Simulation number
for j=1:d
RLABEL = [RLABEL 'Row','',num2str(j) ' '];
Px = rand;
var = (5/12)*d*sum(kij);
invLam = sqrt(var);
u(j) = ((log(1-Px))*-invLam)+kij(1,j);
uMatrix(j,1) = j;
uMatrix(j,2) = u(j);
uMatrix(j,3) = kij(1,j);
uMatrix(j,4) = Px;
uMatrix(j,5) = invLam;
uMatrix(j,6) = var;
end
printmat(uMatrix,'Results',RLABEL,'SECTION u kij P(Tij<u) 1/Lambda Var')
end
There are really too many options. To go describe both putting data into, and getting data our of a few of these methods:
Encode in variable names
I really, really dislike this approach, but it seems to be what you are specifically asking for. To save uMatrix as a variable Sim5 (after the 5th run), add the following to your code at the end of the loop:
eval([Sim ' = uMatrix;']); %Where the variable "Sim" contains a string like 'Sim5'
To access the data
listOfStoredDataNames = who('Sim*')
someStoredDataItem = eval(listOfStoredDataNames {1}) %Ugghh
%or of you know the name already
someStoredDataItem = Sim1;
Now, please don't do this. Let me try and convince you that there are better ways.
Use a structure
To do the same thing, using a structure called (for example) simResults
simResults.(Sim) = uMatrix;
or even better
simResults.(genvarname(Sim)) = uMatrix;
To access the data
listOfStoredDataNames = fieldnames(simResults)
someStoredDataItem = simResults.(listOfStoredDataNames{1})
%or of you know the name already
someStoredDataItem = simResults.Sim1
This avoids the always problematic eval statement, and more importantly makes additional code much easier to write. For example you can easily pass all simResults into a function for further processing.
Use a Map
To use a map to do the same storage, first initialize the map
simResults = containers.Map('keyType','char','valueType','any');
Then at each iteration add the values to the map
simResults(Sim) = uMatrix;
To access the data
listOfStoredDataNames = simResults.keys
someStoredDataItem = simResults(listOfStoredDataNames{1})
%or of you know the name already
someStoredDataItem = simResults('Sim1')
Maps are a little more flexible in the strings which can be used for names, and are probably a better solution if you are comfortable.
Use a cell array
For simple, no nonsense storage of the results
simResults{i} = uMatrix;
To access the data
listOfStoredDataNames = {}; %Names are Not available using this method
someStoredDataItem = simResults{1}
Or, using a slight level of nonesense
simResults{i,1} = Sim; %Store the name in column 1
simResults{i,2} = uMatrix; %Store the result in column 2
To access the data
listOfStoredDataNames = simResults(:,1)
someStoredDataItem = simResults{1,2}
Just to add to the detailed answer given by #Pursuit, there is one further method I am fond of:
Use an array of structures
Each item in the array is a structure which stores the results and any additional information:
simResults(i).name = Sim; % store the name of the run
simResults(i).uMatrix = uMatrix; % store the results
simResults(i).time = toc; % store the time taken to do this run
etc. Each element in the array will need to have the same fields. You can use quick operations to extract all the elements from the array, for example to see the timings of each run at a glance you can do:
[simResults.time]
You can also use arrayfun to to a process on each element in the array:
anon_evaluation_func = #(x)( evaluate_uMatrix( x.uMatrix ) );
results = arrayfun( anon_evaluation_func, simResults );
or in a more simple syntax,
for i = 1:length(simResults)
simResults(i).result = evaluate_uMatrix( simResults(i).uMatrix );
end
I would try to use a map which stores a <name, matrix>.
the possible way to do it would be to use http://www.mathworks.com/help/matlab/map-containers.html