Import numbers between parentheses - matlab

I have made a data file from OpenFoam that records the velocity at certain locations over time. I would like to extract two of these velocities and take their time averages. For example, I would like to extract the numbers 0.0539764, 0.0104665, 0.00201741 and so on from probe 0, and the numbers 0.690374, 0.711402, 0.699848 and so on from probe 1. How can this be done in Matlab?
I have done something similar before, but then each probe consisted of a single number (without parentheses). Now each probe consists of three numbers enclosed in parentheses, and I don't know how to handle that.
Help is much appreciated.
Link to the whole file: https://drive.google.com/file/d/0B9CEsYCSSZUSdjFzYXVFc1RhM0k/view?usp=sharing

This will create two matrices probe0 & probe1. You can index just the first column of each if that is all you are after.
id = fopen('testprobe.txt','r');
t = textscan(id,'%s','delimiter',sprintf('\n'));
fclose(id);
out = regexp(t{1,1}(6:end-3), '(?<=\()[^)]*(?=\))', 'match', 'all');
probe0 = zeros(size(out,1),3);
probe1 = zeros(size(out,1),3);
for i = 1:size(out,1)
    if ~isempty(out{i,:})
        probe0(i,:) = (str2double(split(out{i,1}{1,1})))';
        probe1(i,:) = (str2double(split(out{i,1}{1,2})))';
    else
        probe0(i,:) = [0,0,0];
        probe1(i,:) = [0,0,0];
    end
end
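For a single line of the file, a possible alternative to the lookaround regexp is to replace the parentheses with spaces and let sscanf read every number in one pass. This is only a sketch: the sample line below is made up (a time value followed by one parenthesised triple per probe), not copied from the linked file, and the variable names are illustrative.

```matlab
% Made-up sample line: time, then one "(u v w)" group per probe.
sampleLine = '0.5 (0.0539764 1.2 -0.3) (0.690374 0.01 0.02)';

% Replace the parentheses with spaces so only numbers and whitespace remain.
cleaned = regexprep(sampleLine, '[()]', ' ');

% Read every number on the line into a column vector.
nums = sscanf(cleaned, '%f');

timeValue = nums(1);                     % first number is the time
probes = reshape(nums(2:end), 3, []).';  % one row per probe, columns u v w
```

Here probes(1,:) holds the three components of the first probe and probes(2,:) those of the second, however many probes the line contains.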

Related

Reading in numbers between parentheses

I am trying to import numbers between parentheses in Matlab. I am using the software OpenFoam, which produces a file containing the velocities (u,v,w) at a number of different positions for different time steps. I would like to import these velocities so I can take their average over a certain time interval. I have about 250 probes in my flow domain, meaning I would like to import 750 different numbers for a number of different time steps. The file looks like this:
Link to file:
https://drive.google.com/file/d/1CuoflLADasUybsR4UJf1PQBUcGD0SsVb/view?usp=sharing
So I would like to import all the numbers into a matrix of size ((number of time steps) x (probes)).
I found some code that works and imports these numbers, but it is very manual. I would have to write out probexx(i,:) = (str2double(split(out{i,1}{1,xx})))'; manually 250 times to get it to work. I would like more automatic code, so that I can change the number of probes easily. Could anyone help me?
Thank you in advance!
id = fopen('probe.dat','r');
t = textscan(id,'%s','delimiter',sprintf('\n'));
fclose(id);
out = regexp(t{1,1}(6:end-3), '(?<=\()[^)]*(?=\))', 'match', 'all');
probe0 = zeros(size(out,1),3);
probe1 = zeros(size(out,1),3);
for i = 1:size(out,1)
    if ~isempty(out{i,:})
        probe0(i,:) = (str2double(split(out{i,1}{1,1})))';
        probe1(i,:) = (str2double(split(out{i,1}{1,2})))';
    else
        probe0(i,:) = [0,0,0];
        probe1(i,:) = [0,0,0];
    end
end
I would do it like this, assuming that each row is uniform after the header lines.
id = fopen('probes.dat','r');
t = textscan(id,'%f','Delimiter',{'(',')',' '},'MultipleDelimsAsOne',true,'headerlines',5);
fclose(id);
numProbes = 254;
temp = reshape(t{1},numProbes*3+1,[]);
outData.time = temp(1,:).';
for ii = 1:numProbes
    rowIdx = (ii-1)*3+2:(ii-1)*3+4;
    outData.(num2str(ii,'probe%d')) = temp(rowIdx,:).';
end
Basically, read all of the numeric data into one array, using the multiple-delimiters feature and specifying the number of header lines. Next, reshape based on the number of probes (in your example DAT file there were 254).
Then loop over the probes to assign each one to a field of a structure with the variable names that you want (probeXX).
This leaves you with a structure of the form:
outData =
time: [47x1 double]
probe1: [47x3 double]
probe2: [47x3 double]
probe3: [47x3 double]
...
probe254: [47x3 double]
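Since the goal in the question is a time average, one option is to keep the numbers in a 3-D array instead of separate fields. A minimal sketch, assuming a matrix laid out like temp above (time in row 1, then u, v, w for each probe); the data here is synthetic and the names U, inWindow and Umean are illustrative:

```matlab
% Synthetic stand-in for temp: 4 probes, 5 time steps.
numProbes = 4;
numSteps = 5;
time = (0:numSteps-1) * 0.1;
vals = zeros(3*numProbes, numSteps);
for p = 1:numProbes
    for c = 1:3
        vals(3*(p-1)+c, :) = p + c/10;   % constant in time, easy to check
    end
end
temp = [time; vals];

% Rearrange to component x probe x time.
U = reshape(temp(2:end,:), 3, numProbes, []);

% Average over the time steps that fall inside a chosen window.
inWindow = time >= 0.1 & time <= 0.35;
Umean = mean(U(:,:,inWindow), 3);        % 3 x numProbes
```

Umean(:,p) then holds the averaged (u,v,w) of probe p, and changing the number of probes only means changing numProbes.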

Filter data with standard deviation in loop

I have acceleration data (10240x31) that I want to filter by replacing every data point that exceeds a threshold of 4 times the standard deviation of its column with the mean value of the two adjacent data points.
First, I wanted to replace every data point with zero if it exceeds the threshold value. This is my loop:
for w = 1:31
    Sigma(w) = std(zacceleration(:,w));
    zacceleration(zacceleration<(-4*Sigma(w))) = 0;
    zacceleration(zacceleration>(4*Sigma(w))) = 0;
end
That code works if w is just one number, for example:
w = 1;
But when w changes every iteration, the filtered data only contains the values that don't exceed the threshold value of the last dataset, Sigma(31).
So I guess that I overwrite my data or something like that, but I can't seem to find a solution.
Can anybody please give me a hint?
Thank you in advance and best regards.
I think I got it now.
Sigma = std(zacceleration);
for a = 1:10240
    for b = 1:31
        if zacceleration(a,b)<(-4*Sigma(b))
            zacceleration(a,b) = 0;
        end
        if zacceleration(a,b)>(4*Sigma(b))
            zacceleration(a,b) = 0;
        end
    end
end
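The same per-column test can be written without the double loop, and the flagged points can then be replaced by the mean of their two neighbours as described in the question. A sketch under stated assumptions: the data below is made up (a small matrix standing in for the 10240x31 one), the names are illustrative, and outliers that sit next to each other are not treated specially.

```matlab
% Made-up data standing in for the 10240x31 matrix: 50 samples, 2 columns.
z = zeros(50, 2);
z(9:11, 1) = [0.2; 5; 0.4];   % spike at row 10 of column 1
z(30, 2)   = -3;              % spike at row 30 of column 2

nRows = size(z, 1);
sigma = std(z, 0, 1);                    % one standard deviation per column
thresh = repmat(4 * sigma, nRows, 1);    % per-column threshold, same size as z
isOutlier = abs(z) > thresh;             % both tails in one comparison

zFiltered = z;
[rows, cols] = find(isOutlier);
for k = 1:numel(rows)
    nb = [rows(k) - 1, rows(k) + 1];     % neighbouring rows
    nb = nb(nb >= 1 & nb <= nRows);      % drop neighbours outside the data
    % Mean of the unfiltered neighbours; adjacent outliers are not handled.
    zFiltered(rows(k), cols(k)) = mean(z(nb, cols(k)));
end
```

Comparing each column against its own threshold is what the original loop missed: there, every pass applied Sigma(w) to the whole matrix.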

Variable labels in MATLAB

I have a huge table, data = {1000 x 1000}, of binary data.
The table's variable names are encoded, e.g. D1, D2, ..., DA2, DA3, ..., with their real labels given in a .txt file.
The .txt file also contains some other text, e.g.:
D1: Age
Mean age: 33
Median :
.
.
.
D2: weight
I would just like to pick out these names from the text file and create a table with the real variable names.
Any suggestions?
If there is a specific number of lines between each of those labels, then you can extract them by reading in the file and looping over the relevant lines. For each label, it is simple to extract the label with strsplit().
E.g., let's say there are 5 lines between each label:
uselessLines = 5;
% imports as a vertical cell array with each line from the file.
dataLabelsFile = importdata(filename);
% get the total number of lines (numel returns a scalar, size would return a vector)
numLines = numel(dataLabelsFile);
% pre-allocate a column cell array for the labels, a cell is used for a string
dataLabels = cell(ceil(numLines/(uselessLines+1)), 1);
% use a separate counting variable
m = 1;
% now, for each label, we add it to the dataLabels array
for i=1:(uselessLines+1):numLines
    line = strsplit(dataLabelsFile{i}); % by default splits on whitespace
    dataLabels(m) = line(2);
    m = m + 1;
end
By the end of that loop you should have a variable called dataLabels that holds all of the labels. You can then easily work out which label goes with which set of data, provided they are still in the same order: the index of a label is the same as the index of its data.
This is a method you could try if the labels are evenly spaced.
However, if the labels are separated by a varying number of lines, then you probably want to check each line with a regular expression, as the person below me has suggested. Then you just replace the last two lines of the loop with something like this.
...
if (regular expression matched)
dataLabels(m) = line(2);
m = m + 1;
end
...
That being said, while regular expressions are flexible, if you can get away with replacing one with literally a single function call, it's usually better to do that. The efficiency of a regex depends on the skill of the programmer, while built-in functions have generally been tested by some of the better programmers in the world. Additionally, regexes are harder to understand if you ever want to go back and change them.
Of course there are times when regexes are amazing; I'm just not convinced this is one of those times.
An implementation of the approach in my earlier comment:
fid = fopen(filename);
varNames = cell(0);
proceed = true;
while proceed
    line = fgetl(fid);
    if ischar(line)
        startIdx = regexp(line,'(?<=^[A-Z]*\d*:)\s');
        if ~isempty(startIdx)
            varNames{end+1} = strtrim(line(startIdx:end)); %#ok<SAGROW>
        end
    else
        proceed = false;
    end
end
fclose(fid);
I can't put the resulting varNames in a table for you, since I have a version of Matlab that does not support tables.
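If the lines are already in memory as a cell array, the same idea can be written with a capture group instead of a lookbehind. A sketch only: the sample lines are the ones quoted in the question, and the pattern assumes every label line starts with capital letters followed by digits and a colon, which may not hold for the whole file.

```matlab
% Sample lines taken from the excerpt in the question.
fileLines = {'D1: Age'; 'Mean age: 33'; 'Median :'; 'D2: weight'};

% Capture whatever follows a code such as "D1:" or "DA2:".
tok = regexp(fileLines, '^[A-Z]+\d+:\s*(.*)$', 'tokens', 'once');

% Lines that are not labels give an empty result; keep only the matches.
isLabel = ~cellfun(@isempty, tok);
names = cellfun(@(c) strtrim(c{1}), tok(isLabel), 'UniformOutput', false);
```

On a Matlab version with tables, assigning something like matlab.lang.makeValidName(names) to data.Properties.VariableNames should then apply the labels, provided the number of names matches the number of variables; that step is not shown or tested here.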

MATLAB: Creating a matrix from for loop values?

I have the following code:
for i = 1450:9740:89910
    n = i+495;
    range = ['B',num2str(i),':','H',num2str(n)];
    iter = xlsread('BrokenDisplacements.xlsx' , range);
    displ = iter;
    displ = [displ; iter];
end
This takes values from a number of ranges in an Excel file and outputs them as matrices. However, displ only ever holds the result of the final iteration. I would like to combine these outputs into one large matrix, saving the values along the way. How would I go about doing this?
Since you know the size of the block of data you are reading, you can make your code much more efficient as follows:
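firstVals = 1450:9740:89910;
% rows from the start of the first block to the end of the last (496 rows per block)
displ = zeros(firstVals(end) - firstVals(1) + 496, 7);
for ii = firstVals
    n = ii + 495;
    range = sprintf('B%d:H%d', ii, n);
    % place each block at its offset from the first row; rows between blocks stay zero
    displ((ii:n)-firstVals(1)+1,:) = xlsread('BrokenDisplacements.xlsx', range);
end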
Couple of points:
I prefer not to use i as a variable since it is built in as sqrt(-1) - if you later execute code that assumes that to be true, you're in trouble
I am not assuming that the last value of ii is 89910 - by first assigning the value to a vector, then finding the last value in the vector, I sidestep that question
I assign all space in displ at once - otherwise, as it grows, Matlab keeps having to move the array around which can slow things down a lot
I used sprintf to generate the string representing the range - I think it's more readable but it's a question of style
I assign the return value of xlsread directly to a block in displ that is the right size
I hope this helps.
How about this:
displ=[];
for i = 1450:9740:89910
    n = i+495;
    range = ['B',num2str(i),':','H',num2str(n)];
    iter = xlsread('BrokenDisplacements.xlsx' , range);
    displ = [displ; iter];
end
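A middle ground between the two answers is to collect each block in a cell array and concatenate once at the end, which avoids both the repeated growth of displ and the zero rows between blocks. In this sketch readBlock is a hypothetical stand-in for the xlsread call so the example runs without the spreadsheet; it just returns a 496x7 block filled with the starting row number.

```matlab
firstRows = 1450:9740:89910;
blockLen = 496;   % rows per block (start row to start row + 495)

% Hypothetical stand-in for xlsread('BrokenDisplacements.xlsx', range).
readBlock = @(startRow) repmat(startRow, blockLen, 7);

% Read every block into its own cell.
blocks = cell(numel(firstRows), 1);
for k = 1:numel(firstRows)
    blocks{k} = readBlock(firstRows(k));
end

% Stack all blocks vertically in one step.
displ = vertcat(blocks{:});
```

With the real file, the body of readBlock would build the range string from startRow and call xlsread as in the answers above.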

Matlab: Random sample with replacement

What is the best way to draw a random sample with replacement from a dataset? My dataset is 316 x 34. I want to segment the data into three buckets, but with replacement. Should I use randperm? I need to keep the row index intact, because that index is handy for identifying the label data. I am new to Matlab. I saw a couple of random sampling methods, but they didn't look like they do what I am looking for. It is strange to think that something like this doesn't exist in Matlab, so I did the following:
My issue is that when I do row_idx = round(rand(1)*316), I sometimes get zero. That leads to two questions:
What should I do to avoid zero?
What is the best way to do random sampling with replacement?
shuffle_X = X(randperm(size(X,1)),:);
lengthOf_shuffle_X = length(shuffle_X)
number_of_rows_per_bucket = round(lengthOf_shuffle_X / 3)
bucket_cell = cell(3,1)
bag_matrix = []
for k = 1:length(bucket_cell)
    for i = 1:number_of_rows_per_bucket
        row_idx = round(rand(1)*316)
        bag_matrix(i,:) = shuffle_X(row_idx,:)
    end
    bucket_cell{k} = bag_matrix
end
I could do the following:
if row_idx == 0
    row_idx = round(rand(1)*316)
end
assuming the random number generator never gives zero in two consecutive draws.
randi is a good way to get integer indices for sampling with replacement. Assuming you want to fill three buckets with an equal number of samples, then you can write
data = rand(316,34); %# create some dummy data
number_of_data = size(data,1);
number_of_rows_per_bucket = 50;
bucket_cell = cell(1,3);
idx = randi([1,number_of_data],[number_of_rows_per_bucket,3]);
for iBucket = 1:3
    bucket_cell{iBucket} = data(idx(:,iBucket),:);
end
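On the zero index from round(rand(1)*316): rand returns values in the open interval (0,1), so rounding rand*316 yields 0 whenever the draw is below 0.5/316, and it also makes 0 and 316 half as likely as the other values. A short sketch of two ways that stay within 1..N (N and nDraws are example values, and the variable names are illustrative):

```matlab
N = 316;        % number of rows in the dataset
nDraws = 1000;  % example number of draws

% randi draws integers uniformly from 1..N, with replacement.
idxA = randi(N, nDraws, 1);

% Same idea with rand: ceil maps values in (0,N) onto 1..N.
idxB = ceil(rand(nDraws, 1) * N);
```

Either vector can be used directly as row indices, so no retry on zero is needed.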
To answer the question: randperm gives you a draw order without replacement, since each item can be drawn only once.
randi draws with replacement, that is, you may draw the same item many times.
If you want to "segment" a dataset, that usually means splitting it into three distinct sets. For that you draw without replacement (you don't put the items back; use randperm). If you did it with replacement (using randi), it would be incredibly slow, since after some time the chance of drawing an item you have not drawn before is very low.
(See the coupon collector's problem for details.)
If you need a segmentation that is a split, you can just go over the elements and independently decide where to put each one. (That is, you choose a bucket for each item with replacement: any chosen bucket goes back into the game.)
For that:
% if your data items are row vectors, say data = [1 1; 2 2; 3 3; 4 4]
num_data = size(data,1); % number of rows; length() would return the largest dimension
bucket_labels = randi(3,[1,num_data]); % draw a bucket label for each item, independently.
bucket = cell(1,3);
for i=1:3
    bucket{i} = data(bucket_labels==i,:);
end
% if your data items are scalars, say data = [1 2 3 4 5]
num_data = numel(data);
bucket_labels = randi(3,[1,num_data]);
bucket = cell(1,3);
for i=1:3
    bucket{i} = data(bucket_labels==i);
end
There we go.