Is there any way to read a file with subsampling in MATLAB? The input data look like:
id=3,age=25, 0.5 0.5 0.2 0.6 0.6 0.5
id=1,age=15, 0.5 0.8 0.2 0.9 0.6 0.9
id=7,age=24, 0.5 0.2 0.9 0.6 0.1 0.5
(Edited) From the LAST SIX columns, I only want to read the columns whose index is a multiple of three (i.e. the 3rd and 6th columns of those six, which are the 5th and 8th columns of the whole data file). That is, I want a matrix like:
0.2 0.5
0.2 0.9
0.9 0.5
Ideally, the code looks like:
for line=1:maxLine
header(line,:) = fscanf(fid,'id=%d,age=%d,',[1,2]);
content(line,:) = fscanf(fid,'only read columns multiple of three');
end;
I know that I could read each whole line and then sub-sample it; the problem is that the array I'm dealing with is large, 10k+ columns, and I do not want to consume too much memory.
There is a way:
With fscanf you do not need a loop at all: a single call can consume the whole file into one array, which you then transpose.
Your desire to save memory is met by ignoring the unwanted elements with the %*f (skip) specifier:
fid = fopen('new.txt','r');
A = fscanf(fid, 'id=%d,age=%d, %*f %*f %f %*f %*f %f\n', [4 inf])
fclose(fid);
I used your data and got this result:
A =
3.0000 1.0000 7.0000
25.0000 15.0000 24.0000
0.2000 0.2000 0.9000
0.5000 0.9000 0.5000
As documented at http://www.mathworks.com/help/matlab/ref/fscanf.html?searchHighlight=fscanf, fscanf fills the output array in column order, which is why we need to transpose it.
So using A=A' gives the result you want:
A =
3.0000 25.0000 0.2000 0.5000
1.0000 15.0000 0.2000 0.9000
7.0000 24.0000 0.9000 0.5000
Now you can make two different matrices if needed.
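For files with many more columns, the skip pattern can be generated rather than typed out. A minimal sketch, assuming the total column count is known, divisible by three, and stored in a hypothetical file new.txt:

```matlab
nCols = 10002;                                  % hypothetical number of data columns
% skip two floats, read the third, repeated nCols/3 times
fmt = ['id=%d,age=%d,' repmat(' %*f %*f %f', 1, nCols/3) '\n'];
fid = fopen('new.txt', 'r');
A = fscanf(fid, fmt, [2 + nCols/3, inf]).';     % transpose so each row is one line
fclose(fid);
```

Only every third float is ever stored, so memory use stays proportional to the subsampled width.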
Is there a way to calculate a moving mean in a way that the values at the beginning and at the end of the array are averaged with the ones at the opposite end?
For example, instead of this result:
A=[2 1 2 4 6 1 1];
movmean(A,2)
ans = 2.0 1.5 1.5 3.0 5 3.5 1.0
I want to obtain the vector [1.5 1.5 1.5 3 5 3.5 1.0], as the initial array element 2 would be averaged with the ending element 1.
Generalizing to an arbitrary window size N, this is how you can add circular behavior to movmean in the way you want:
movmean(A([(end-floor(N./2)+1):end 1:end 1:(ceil(N./2)-1)]), N, 'Endpoints', 'discard')
For the given A and N = 2, you get:
ans =
1.5000 1.5000 1.5000 3.0000 5.0000 3.5000 1.0000
For an arbitrary window size n, you can use circular convolution with an averaging mask defined as [1/n ... 1/n] (with n entries; in your example n = 2):
result = cconv(A, repmat(1/n, 1, n), numel(A));
Convolution offers some nice ways of doing this, though you may need to tweak your input slightly if you only partially average the ends (i.e. in your example the first element is averaged with the last, but the last is not averaged with the first).
conv([A(end),A],[0.5 0.5],'valid')
ans =
1.5000 1.5000 1.5000 3.0000 5.0000 3.5000 1.0000
The generalized case here, for a moving average of size N, is:
conv(A([end-N+2:end, 1:end]),repmat(1/N,1,N),'valid')
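For the example above, the three variants can be checked against each other with a quick sketch (note that for even N they coincide, while for odd N the movmean variant centers the window and the convolution variants trail it):

```matlab
A = [2 1 2 4 6 1 1];
N = 2;
% circular movmean via index wrapping
r1 = movmean(A([(end-floor(N/2)+1):end 1:end 1:(ceil(N/2)-1)]), N, 'Endpoints', 'discard');
% circular convolution with an averaging mask
r2 = cconv(A, repmat(1/N, 1, N), numel(A));
% ordinary convolution on a wrapped copy of the input
r3 = conv(A([end-N+2:end, 1:end]), repmat(1/N, 1, N), 'valid');
% for N = 2 all three give 1.5 1.5 1.5 3.0 5.0 3.5 1.0
```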
I have a matrix containing 8 columns and 80k rows (from an Excel file).
Each row has an ID.
I want to store all the data with ID no. 1 in a new matrix, all the data with ID no. 2 in a second matrix, and so on: each time the ID changes, I want the data for the new ID saved in a new matrix.
There are over 800 IDs.
I've tried several things without luck, among others:
k = zeros(117,8)
for i =1:80000
k(i) = i + Dataset(1:i,:)
end
The above was only to see if I could actually get the first 117 rows saved in another matrix, which didn't succeed.
If one of the 8 columns contains the ID, then you can use logical indexing. For example, if column 1 contains the ID, we can first find a list of all the different ID values:
uniqueIDs = unique(Dataset(:, 1));
Then we can create a cell array holding, for each ID, the list of its rows:
listsByID = cell(length(uniqueIDs), 1);
for idx = 1:length(uniqueIDs)
listsByID{idx} = Dataset(Dataset(:, 1) == uniqueIDs(idx), :);
end
Running the above on an example dataset:
Dataset = [1 0.1 10
1 0.2 20
2 0.3 30
3 0.4 40
2 0.5 50
2 0.6 60];
results in listsByID holding three matrices, shown here concatenated (vertcat(listsByID{:})):
1.0000 0.1000 10.0000
1.0000 0.2000 20.0000
2.0000 0.3000 30.0000
2.0000 0.5000 50.0000
2.0000 0.6000 60.0000
3.0000 0.4000 40.0000
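On newer MATLAB releases (R2015b or later), the same grouping can be sketched without an explicit loop using findgroups and splitapply, assuming Dataset as above:

```matlab
[g, uniqueIDs] = findgroups(Dataset(:, 1));          % group number for each row
listsByID = splitapply(@(rows) {rows}, Dataset, g);  % one cell of rows per ID
```

This is only a variation on the loop above; for 80k rows either version should be fast, and the loop may be easier to debug.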
So I have this data I'd like plotted on loglog scale, with linear values on the y-axis and the values in dB on the x axis and
loglog(EbN0,BER)
outputs a nice-looking curve, but the problem is the axis ticks. It's fine on the y-axis, but the x-axis only has one tick, at 10^0, and no other ticks. Furthermore, that tick corresponds to the absolute value, not the dB value. Is there any convenient way to accomplish this?
(Note that both EbN0 and BER contain absolute values)
EDIT: I'll add my data and explain what I want a bit more.
EbN0 =
Columns 1 through 14
0.5000 1.0000 1.5000 2.0000 2.5000 3.0000 3.5000 4.0000 4.5000 5.0000 5.5000 6.0000 6.5000 7.0000
Columns 15 through 20
7.5000 8.0000 8.5000 9.0000 9.5000 10.0000
BER_TOT_ITER =
Columns 1 through 14
0.2928 0.2024 0.1183 0.0511 0.0164 0.0046 0.0010 0.0003 0.0001 0 0.0000 0.0000 0.0000 0
Columns 15 through 20
0 0 0 0 0 0
If I do plot(10*log10(EbN0),10*log10(BER_TOT_ITER)), I actually get exactly the graph I want, with the dB values on the x-axis, but now the y ticks are displayed in dB instead of absolute values... so I just want to relabel the y ticks, NOT rescale the figure.
Relabeling the ticks is really the wrong approach here. You'd replace numerical values with strings, and resizing etc. would no longer work.
Also, the axis labels would then no longer match the data you're actually looking at.
You should always try to transform your data first.
So besides loglog, have a look at semilogx and semilogy, which give you a single logarithmic axis.
To sum up, what you're looking for is:
semilogy(10*log10(EbN0), BER_TOT_ITER)
I have protein-protein interaction data for Homo sapiens. The size of the matrix is 4850628x3. The first two columns are proteins and the third is their confidence score. The problem is that half the rows are duplicate pairs:
if protein A interacts with B, C, and D, it is listed as
A B 0.8
A C 0.5
A D 0.6
B A 0.8
C A 0.5
D A 0.6
If you observe, the confidence score of A interacting with B and of B interacting with A is the same, 0.8.
So in my 4850628x3 matrix half the rows are duplicate pairs, and if I just call unique on a single column I might lose some data.
What I want is the 2425314x3 matrix, i.e. without duplicate pairs. How can I do it efficiently?
Suppose that in your matrix you store each protein as a unique numeric id
(e.g. A=1, B=2, C=3, ...); your example matrix is then:
M =
1.0000 2.0000 0.8000
1.0000 3.0000 0.5000
1.0000 4.0000 0.6000
2.0000 1.0000 0.8000
3.0000 1.0000 0.5000
4.0000 1.0000 0.6000
You must first sort the first two columns row-wise, so that each protein pair always appears in the same order:
M2 = sort(M(:,1:2),2)
M2 =
1 2
1 3
1 4
1 2
1 3
1 4
Then use unique with the 'rows' option and keep the indexes of the unique pairs:
[~, idx] = unique(M2, 'rows')
idx =
1
2
3
Finally, filter your initial matrix to keep only the unique pairs.
R = M(idx,:)
R =
1.0000 2.0000 0.8000
1.0000 3.0000 0.5000
1.0000 4.0000 0.6000
Et voilĂ !
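Putting the steps together in one place (same assumptions as above, with proteins encoded as numeric ids in M):

```matlab
M2 = sort(M(:, 1:2), 2);          % canonical order within each pair
[~, idx] = unique(M2, 'rows');    % index of the first occurrence of each pair
R = M(idx, :);                    % one row per interaction pair
```

Since sort and unique both operate on the whole matrix at once, this scales fine to millions of rows.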
Let's say I have this data:
my_data = [ 10 20 30 40; 0.1 0.7 0.4 0.3; 6 1 2 3; 2 5 4 2];
my_index = logical(my_data(4,:)==2);
What is the simplest way to use my_index to produce this output?
10.0000 40.0000
0.1000 0.3000
6.0000 3.0000
2.0000 2.0000
my_data(:,my_index)
but I'm suspicious that this is so simple that it doesn't satisfy your (background) requirements ...
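As a small aside, the comparison my_data(4,:)==2 already returns a logical array, so the logical(...) wrapper in the question is redundant. The whole thing in one sketch:

```matlab
my_data = [10 20 30 40; 0.1 0.7 0.4 0.3; 6 1 2 3; 2 5 4 2];
my_index = my_data(4, :) == 2;    % logical row vector; no logical() needed
result = my_data(:, my_index);    % keeps columns 1 and 4
```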