Extract the data from txt file with various format specification in matlab - matlab

I would like to extract Andy's result from the txt file below. I expect to extract 1.033, -0.017, 1.016 and put them into a cell array. Is there a way to do this? I have around 100 files with the same format as below.
Total 6 outlier 0
| Thomas | -0.255 | -0.006 | -0.261 |
| Todd | 1.012 | 0.112 | 1.124 |
| Harry | -0.033 | 0.005 | -0.028 |
| Andy | 1.033 | -0.017 | 1.016 |
| Zheng | 0.152 | 0.226 | 0.378 |
| Betsi | -19.409 | 1.010 | -18.399 |
| Andrew | -0.066 | 0.048 | -0.018 |
| Tom | -95.582 | 0.590 | -94.991 |

I assume there are no newlines in your text file, so all records sit on a single line.
Let's try something:
% Open your file and load the data
fid = fopen('the_path_of_your_file');
line = fgetl(fid);
% Split the data using the '||' delimiter. Each person's results
% are stored in one cell.
txtSplitCell = strsplit(line,'||');
% Check in each cell whether 'Andy' appears
indAndy = cellfun(@(c) ~isempty(strfind(c,'Andy')), txtSplitCell);
% Split the content of Andy's results cell with the second separator '|'
% to get the values, trimming the whitespace around each field
resAndyRaw = strtrim(strsplit(txtSplitCell{find(indAndy,1)},'|'));
% Select only the part of the cell array that actually contains
% Andy's results (in other words, drop 'Andy' itself) and convert
% it to a numeric vector
posAndy = find(strcmp(resAndyRaw,'Andy'),1);
resAndyTab = str2double(resAndyRaw(posAndy+1:end));
% Fields left empty by a leading or trailing '|' become NaN; drop them
resAndyTab = resAndyTab(~isnan(resAndyTab));
% Note: strcmp(resAndyRaw,'Andy') locates the position of 'Andy' in the
% cell array returned by strsplit.
% Consequently this code should work for Thomas as well.
% Use num2cell(resAndyTab) if you need a cell array instead of a vector.
fclose(fid);
disp(resAndyTab)
I can't run that code yet (no matlab on my computer). So test it please and correct it if needed.
Bye.
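For comparison, here is a rough sketch of the same extraction written in Python rather than MATLAB, so the logic can be checked without a MATLAB install. The function name extract_results and the shortened sample text are illustrative assumptions, not part of the original answer. It treats newlines as delimiters too, so it does not depend on whether the records sit on one line or several.

```python
def extract_results(text, name):
    """Return the numeric fields that follow `name` in pipe-delimited text."""
    # Treat newlines like '|' so the layout (one line or many) does not matter,
    # then strip whitespace and drop the empty fields between delimiters.
    fields = [f.strip() for f in text.replace("\n", "|").split("|")]
    fields = [f for f in fields if f]
    start = fields.index(name) + 1  # raises ValueError if the name is absent
    values = []
    for field in fields[start:]:
        try:
            values.append(float(field))
        except ValueError:
            break  # reached the next person's name
    return values


# Shortened copy of the sample file from the question
sample = """Total 6 outlier 0
| Thomas | -0.255 | -0.006 | -0.261 |
| Andy | 1.033 | -0.017 | 1.016 |
| Zheng | 0.152 | 0.226 | 0.378 |"""

andy = extract_results(sample, "Andy")
print(andy)
```

Because the name is matched against whole fields, "Andy" cannot be confused with "Andrew".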

Related

SPSS group by rows and concatenate string into one variable

I'm trying to export SPSS metadata to a custom format using SPSS syntax. The dataset with value labels contains one or more labels for the variables.
However, now I want to concatenate the value labels into one string per variable. For example for the variable SEX combine or group the rows F/Female and M/Male into one variable F=Female;M=Male;. I already concatenated the code and labels into a new variable using Compute CodeValueLabel = concat(Code,'=',ValueLabel).
so the starting point for the source dataset is like this:
+--------------+------+----------------+------------------+
| VarName | Code | ValueLabel | CodeValueLabel |
+--------------+------+----------------+------------------+
| SEX | F | Female | F=Female |
| SEX | M | Male | M=Male |
| ICFORM | 1 | Yes | 1=Yes |
| LIMIT_DETECT | 0 | Too low | 0=Too low |
| LIMIT_DETECT | 1 | Normal | 1=Normal |
| LIMIT_DETECT | 2 | Too high | 2=Too high |
| LIMIT_DETECT | 9 | Not applicable | 9=Not applicable |
+--------------+------+----------------+------------------+
The goal is to get a dataset something like this:
+--------------+-------------------------------------------------+
| VarName | group_and_concatenate |
+--------------+-------------------------------------------------+
| SEX | F=Female;M=Male; |
| ICFORM | 1=Yes; |
| LIMIT_DETECT | 0=Too low;1=Normal;2=Too high;9=Not applicable; |
+--------------+-------------------------------------------------+
I tried using CASESTOVARS but that creates separate variables, so several variables not just one single string variable. I'm starting to suspect that I'm running up against the limits of what SPSS can do. Although maybe it's possible using some AGGREGATE or OMS trickery, any ideas on how to do this?
First I recreate your example here to demonstrate on:
data list list/varName CodeValueLabel (2a30).
begin data
"SEX" "F=Female"
"SEX" "M=Male"
"ICFORM" "1=Yes"
"LIMIT_DETECT" "0=Too low"
"LIMIT_DETECT" "1=Normal"
"LIMIT_DETECT" "2=Too high"
"LIMIT_DETECT" "9=Not applicable"
end data.
Now to work:
* sorting to make sure all labels are bunched together.
sort cases by varName CodeValueLabel.
string combineall (a300).
* adding ";" .
compute combineall=concat(rtrim(CodeValueLabel), ";").
* if this is the same varname as last row, attach the two together.
if $casenum>1 and varName=lag(varName)
combineall=concat(rtrim(lag(combineall)), " ", rtrim(combineall)).
exe.
*now to select only relevant lines - first I identify them.
match files /file=* /last=selectthis /by varName.
*now we can delete the rest.
select if selectthis=1.
exe.
NOTE: make combineall wide enough to contain all the values of your most populated variable.
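To check the expected result outside SPSS, the same sort-then-merge idea can be sketched in Python (an illustration only; the function name group_and_concatenate is borrowed from the column header in the question). Unlike the syntax above, this sketch joins the labels without a space after each ";".

```python
from itertools import groupby

rows = [
    ("SEX", "F=Female"),
    ("SEX", "M=Male"),
    ("ICFORM", "1=Yes"),
    ("LIMIT_DETECT", "0=Too low"),
    ("LIMIT_DETECT", "1=Normal"),
    ("LIMIT_DETECT", "2=Too high"),
    ("LIMIT_DETECT", "9=Not applicable"),
]


def group_and_concatenate(rows):
    """Map each variable name to its labels joined as 'a;b;c;'."""
    ordered = sorted(rows)  # plays the role of SORT CASES: bunches each variable together
    return {
        var: "".join(label + ";" for _, label in group)
        for var, group in groupby(ordered, key=lambda r: r[0])
    }


result = group_and_concatenate(rows)
for var, labels in result.items():
    print(var, labels)
```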

Does SQL have a way to group rows without squashing the group into a single row?

I want to do a single query that outputs an array of arrays of table rows. Think along the lines of <table><rowgroup><tr><tr><tr><rowgroup><tr><tr>. Is SQL capable of this? (specifically, as implemented in MariaDB, though migration to AWS RDS might occur one day)
The GROUP BY statement alone does not do this; it creates one row per group.
Here's an example of what I'm thinking of…
SELECT * FROM memes;
+------------+----------+
| file_name | file_ext |
+------------+----------+
| kittens | jpeg |
| puppies | gif |
| cats | jpeg |
| doggos | mp4 |
| horses | gif |
| chickens | gif |
| ducks | jpeg |
+------------+----------+
SELECT * FROM memes GROUP BY file_ext WITHOUT COLLAPSING GROUPS;
+------------+----------+
| file_name | file_ext |
+------------+----------+
| kittens | jpeg |
| cats | jpeg |
| ducks | jpeg |
+------------+----------+
| puppies | gif |
| horses | gif |
| chickens | gif |
+------------+----------+
| doggos | mp4 |
+------------+----------+
I've been using MySQL for ~20 years and have not come across this functionality before but maybe I've just been looking in the wrong place ¯\_(ツ)_/¯
I haven't seen an array rendering such as the one you want, but you can simulate it with multiple GROUP BY / GROUP_CONCAT() clauses.
For example:
select concat('[', group_concat(g), ']') as a
from (
select concat('[', group_concat(file_name), ']') as g
from memes
group by file_ext
) x
Result:
a
---------------------------------------------------------
[[puppies,horses,chickens],[kittens,cats,ducks],[doggos]]
See running example at DB Fiddle.
You can tweak the delimiters, such as ",", "[", and "]".
SELECT ... ORDER BY file_ext will come close to your second output.
Using GROUP BY ... WITH ROLLUP would let you do subtotals under each group, which is not what you wanted either, but it would give you extra lines where you want the breaks.
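If the grouping can happen in the client, the ORDER BY suggestion is enough: fetch the rows sorted by file_ext and start a new group whenever the value changes. Below is a minimal Python sketch of that idea. It uses an in-memory SQLite database as a stand-in for MariaDB, and the rowid tie-breaker is SQLite-specific; both are assumptions made only to keep the example self-contained (with MariaDB you would order by a key column of your own).

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memes (file_name TEXT, file_ext TEXT)")
conn.executemany(
    "INSERT INTO memes VALUES (?, ?)",
    [
        ("kittens", "jpeg"), ("puppies", "gif"), ("cats", "jpeg"),
        ("doggos", "mp4"), ("horses", "gif"), ("chickens", "gif"),
        ("ducks", "jpeg"),
    ],
)

# Sorting by the grouping column makes each group contiguous in the result.
rows = conn.execute(
    "SELECT file_name, file_ext FROM memes ORDER BY file_ext, rowid"
).fetchall()

# groupby starts a new group each time file_ext changes: an array of arrays of rows.
groups = [list(group) for _, group in groupby(rows, key=lambda r: r[1])]

for group in groups:
    print(group)
conn.close()
```

Each inner list keeps the full rows, so nothing is collapsed; only the nesting is added.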

Extract contents from cell array

I have a series of Images, stored into an array A. So every entry of A contains an Image (matrix). All matrices are equally sized.
Now I want to extract the value of a specific position (pixel), but my current approach seems to be slow and I think there may be a better way to do it.
% Create data that resembles my problem
N = 5
for i = 1:N
A{i} = rand(5,5);
end
% my current approach
I = size(A{1},1);
J = size(A{1},2);
val = zeros(N,1);
for i = 1:I
for j = 1:J
for k = 1:N
B(k) = A{k}(i,j);
end
% do further operations on B for current i,j, don't save B
end
end
I was thinking there should be some way along the lines of A{:}(i,j) or vertcat(A{:}(i,j)) but both lead to
??? Bad cell reference operation.
I'm using Matlab2008b.
For further information, I use fft on B afterwards.
Here are the results of the answer by Cris
| Code | # images | Extracting Values | FFT | Overall |
|--------------|----------|-------------------|----------|-----------|
| Original | 16 | 12.809 s | 19.728 s | 62.884 s |
| Original | 128 | 105.974 s | 23.242 s | 177.280 s |
| ------------ | -------- | ----------------- | ------- | --------- |
| Answer | 16 | 42.122 s | 27.382 s | 104.565 s |
| Answer | 128 | 36.807 s | 26.623 s | 102.601 s |
| ------------ | -------- | ----------------- | ------- | --------- |
| Answer (mod) | 16 | 14.772 s | 27.797 s | 77.784 s |
| Answer (mod) | 128 | 13.637 s | 28.095 s | 83.839 s |
The answer's code was modified to double(squeeze(A(i,j,:))); because without double the FFT took much longer.
Answer (mod) uses double(A(i,j,:)); that is, it omits squeeze.
So the improvement seems to really kick in for larger sets of images; however, I currently plan to process ~500 images per run.
Update
Measured with the profile function, the result of using/omitting squeeze
| Code | # Calls | Time |
|--------------------------------|---------|----------|
| B = double(squeeze(A(i,j,:))); | 1431040 | 36.325 s |
| B= double(A(i,j,:)); | 1431040 | 14.289 s |
A{:}(i,j) does not work because A{:} is a comma-separated list of elements, equivalent to A{1},A{2},A{3},...A{end}. It makes no sense to index into such a list.
To speed up your operation, I recommend that you create a 3D matrix out of your data, like this:
A3 = cat(3,A{:});
Of course, this will only work if all elements of A have the same size (as was originally specified in the question).
Now you can quickly access the data like so:
for i = 1:I
for j = 1:J
B = squeeze(A3(i,j,:));
% do further operations on B for current i,j, don't save B
end
end
Depending on the operations you apply to each B, you could vectorize those operations as well.
Edit: Since you apply fft to each B, you can obtain that also without looping:
B_fft = fft(A3,[],3); % 3 is the dimension along which to apply the FFT
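The point of cat(3,A{:}) is to regroup the data once, so that the series for one pixel can be read with a single index instead of visiting every image inside the inner loop. A plain-Python analogue with nested lists is sketched below; the sizes and the name series are made up for illustration, and nested lists only mimic the indexing, not MATLAB's performance.

```python
import random

N, I, J = 5, 4, 3  # number of images, rows, columns (arbitrary illustration sizes)
images = [[[random.random() for _ in range(J)] for _ in range(I)] for _ in range(N)]

# Regroup once: series[i][j] holds the values at pixel (i, j) across all images,
# in image order. This plays the role of A3(i,j,:) in the MATLAB answer.
series = [[[img[i][j] for img in images] for j in range(J)] for i in range(I)]

for i in range(I):
    for j in range(J):
        B = series[i][j]  # one lookup, no loop over the images here
        # do further operations on B for current i, j

print(len(series), len(series[0]), len(series[0][0]))
```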

Scala - Remove first row of Spark DataFrame

I know dataframes are supposed to be immutable and everything and I know it's not a great idea to try to change them. However, the file I'm receiving has a useless header of 4 columns (the whole file has 50+ columns). So, what I'm trying to do is just get rid of the very top row because it throws everything off.
I've tried a number of different solutions (mostly found on here) like using .filter() and map replacements, but haven't gotten anything to work.
Here's an example of how the data looks:
H | 300 | 23098234 | N
D | 399 | 54598755 | Y | 09983 | 09823 | 02983 | ... | 0987098
D | 654 | 65465465 | Y | 09983 | 09823 | 02983 | ... | 0987098
D | 198 | 02982093 | Y | 09983 | 09823 | 02983 | ... | 0987098
Any ideas?
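The cleanest way I've seen so far is something along the lines of filtering out the first row:
// Read the file as an RDD of lines
val csv_rows = sc.textFile("path_to_csv")
// The first line is the header to skip
val skipable_first_row = csv_rows.first()
// Keep every line that differs from the header line
val useful_csv_rows = csv_rows.filter(row => row != skipable_first_row)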

Matlab: find inner boundary of set of vertices

I have a set of (x,y) points defined in the following way:
map=[0,0;66,0;66,44;44,44;44,66;110,66;110,110;0,110];
There is then a function that connects these points (which are vertices, i.e. corner points) together to form a closed shape. The example vertices I have given form a shape something like this:
________________________________________
| |
| |
| |
| ____________________|
| |
| |_______
| |
| |
| |
| |
|___________________________|
I would like to now automatically generate a second set of vertices that form a boundary inside the shape, offset by some amount. I.e. this:
inner_boundary=[5,5;61,5;61,39;39,39;39,71;105,71;105,105;5,105];
________________________________________
| ___________________________________ |
| | | |
| | _____________________| |
| | | ____________________|
| | | |
| | | |_______
| | |________ |
| | | |
| | | |
| |______________________| |
|___________________________|
Any ideas on how to do this? I've been racking my brains but can't think of a robust way to do this. I need it to automatically do this for any input set of vertices. Also, to clarify - I am just interested in how to specify the set of vertices, not the drawing part.
Many thanks!
Here is a solution based on Image Processing Toolbox functions. The basic idea is as follows:
Use "poly2mask" to create a BW (0-1) image from the polygon coordinates
Use "imerode" to erode the mask by 1 pixel
Use "bwboundaries" to trace the new, eroded, boundary
Code example:
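% Closed square polygon (first vertex repeated at the end)
x = [4 10 10 4 4];
y = [4 4 10 10 4];
mask = poly2mask(x,y,12,12);
% A 3x3 structuring element erodes the mask by 1 pixel
% (a scalar 1 would leave the mask unchanged)
mask_eroded = imerode(mask, ones(3));
% Trace the eroded boundary; each cell holds [row, col] pairs
newBnds = bwboundaries(mask_eroded);
newBnds = newBnds{1};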
Note that the newBnds will probably contain more points than you want because it traces every single pixel on the boundary. You can write a simple iterative routine to discard non-endpoints.
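One possible way to discard the non-endpoints is to keep only the points where the boundary changes direction, which a cross product of the incoming and outgoing steps detects. The sketch below is written in Python rather than MATLAB for illustration; the function name corner_points and the hand-built square are assumptions, and it expects a closed boundary given as a list of coordinate pairs, like the one bwboundaries returns.

```python
def corner_points(boundary):
    """Keep only the points of a closed boundary where the direction changes."""
    pts = list(boundary)
    # A traced boundary usually repeats its first point at the end; drop the repeat.
    if len(pts) > 1 and pts[0] == pts[-1]:
        pts.pop()
    corners = []
    n = len(pts)
    for k in range(n):
        (x0, y0), (x1, y1), (x2, y2) = pts[k - 1], pts[k], pts[(k + 1) % n]
        # Cross product of the incoming and outgoing steps; zero means collinear.
        cross = (x1 - x0) * (y2 - y1) - (y1 - y0) * (x2 - x1)
        if cross != 0:
            corners.append(pts[k])
    return corners


# Pixel-by-pixel trace of a square from (4, 4) to (8, 8), closed at the start point
square = (
    [(x, 4) for x in range(4, 9)]
    + [(8, y) for y in range(5, 9)]
    + [(x, 8) for x in range(7, 3, -1)]
    + [(4, y) for y in range(7, 3, -1)]
)

corners = corner_points(square)
print(corners)
```

Points where the trace doubles back on itself also have a zero cross product, so boundaries with one-pixel spurs would need extra handling.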