Comparing two text files containing numbers in columns in Matlab - matlab

I have 2 text files (a.txt, b.txt) with some columns of numbers and a header line (one header for each column as shown below). I want to match 2nd col. in a.txt with 1st col. in b.txt and get all the matched rows from b.txt. The numerical values in col.-gr are not repeated either in a.txt or b.txt.
a.txt
—————
gc gr
1 5
3 8
3 4
3 9
b.txt
—————
gr c1 c2
1 12 32
3 21 23
7 33 12
8 54 45
9 99 65
34 43 76
56 80 24
5 32 80
32 15 23
4 11 31
I want matched rows from b.txt exactly like-
5 32 80
8 54 45
4 11 31
9 99 65

try this
id = fopen('a.txt','r');
A = cell2mat(textscan(id,'%d %d','headerlines',1));
fclose(id);
id = fopen('b.txt','r');
B = cell2mat(textscan(id,'%d %d %d','headerlines',1));
fclose(id);
out_ = cell2mat(arrayfun(@(i)(B(find(A(i,2) == B(:,1),1,'first'),:)),1:size(A,1),'uni',0)');
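A loop-free alternative, as a minimal sketch: the second output of ismember gives, for each key in A(:,2), the row of B(:,1) where it occurs. The matrices are typed in directly (an assumption, so the snippet runs without the text files), and the approach relies on the gr values not repeating, as stated in the question.

```matlab
% Data from the question, typed in directly so the snippet is self-contained.
A = [1 5; 3 8; 3 4; 3 9];
B = [1 12 32; 3 21 23; 7 33 12; 8 54 45; 9 99 65; ...
     34 43 76; 56 80 24; 5 32 80; 32 15 23; 4 11 31];

% tf(k) is true when A(k,2) occurs in B(:,1); loc(k) is the matching row of B.
[tf, loc] = ismember(A(:,2), B(:,1));

% Keep only the matched rows, in the order the keys appear in A.
out = B(loc(tf), :);
disp(out)
```

Indexing with loc(tf) rather than loc skips any key of A that has no match in B, for which ismember returns a location of 0.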

Related

Octave: How can I vectorize this for-loop?

Can this for-loop be vectorized?
I want to be able to vectorize the for-loop of this code to obtain a matrix like "sample". Trying to vectorize I got the "sample2" matrix, however as you can see it does not show the values I want for each row due to the linear index when I take "data" as a matrix instead of as a vector.
close all; clear all; clc;
N=5; n=10; n1=2; n2=8;
rand('state', sum(100*clock));
choose=round(((n-1)*rand(N,n))+1);
data=choose.^2;
idx=choose(:,n1:n2);
for i=1:N
dat=data(i,:);
sample(i,:)=dat(idx(i,:));
end
%Trying to vectorize to get the same result
sample2(:,(n1:n2)-n1+1)=data(idx);
Results:
data =
36 64 64 25 81 4 100 36 49 25
4 4 1 16 4 16 81 16 100 64
36 81 36 25 16 16 1 64 49 4
36 64 49 49 25 36 100 64 81 64
1 16 16 49 64 49 81 4 16 64
idx =
8 8 5 9 2 10 6
2 1 4 2 4 9 4
9 6 5 4 4 1 8
8 7 7 5 6 10 8
4 4 7 8 7 9 2
sample =
36 36 81 49 64 25 4
4 4 16 4 16 100 16
49 16 16 25 25 36 64
64 100 100 25 36 64 64
49 49 81 4 81 16 16
sample2 =
81 81 1 64 4 16 64
4 36 36 4 36 64 36
64 64 1 36 36 36 81
81 4 4 1 64 16 81
36 36 4 81 4 64 4
It looks like you are trying to index in row-major order, but Octave indexes matrices in column-major order. You can transpose your input to get the indices right. Then, to index into the second column of the transposed matrix, you can just add the length of the first column (and likewise for later columns).
Try this:
data2 = data';
sample2 = data2([idx + [0:size(idx,1)-1]'*size(data,2)])
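The first line transposes the matrix, so that column-major linear indexing into data2 walks along the rows of the original data.
The second line turns each per-row column index into a linear index into data2 by adding (i-1)*size(data,2) to row i of idx, i.e. the length of one original data row for every preceding row, and then uses the result to index data2. Adding a column vector to a matrix relies on broadcasting, which Octave (and MATLAB R2016b or later) does automatically.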
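The same row-wise lookup can also be written with sub2ind, which avoids the transpose. This is only a sketch on a small fixed example; the values of data and idx below are made up for illustration and are not the random data from the question.

```matlab
% Small fixed example (hypothetical values, chosen so the result is easy to check).
data = [10 20 30; 40 50 60];   % 2-by-3
idx  = [3 1; 2 2];             % columns to pick, one row of idx per row of data

% Row number for every entry of idx, same shape as idx.
rows = repmat((1:size(idx,1))', 1, size(idx,2));

% sub2ind converts (row, column) pairs into column-major linear indices.
sample2 = data(sub2ind(size(data), rows, idx));
disp(sample2)   % [30 10; 50 50]
```

Row 1 picks columns 3 and 1 of its own data row (30 and 10), and row 2 picks column 2 twice (50 and 50), which is what the original for-loop computes.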

Matlab: extract submatrix with selecting some values from the last line

20 4 4 74 20 20 74 85 85 85
A = 36 1 1 11 36 36 11 66 66 66
77 1 1 15 77 77 15 11 11 11
3 4 2 6 7 8 10 10 15 17
How can I extract from the matrix A the submatrix whose fourth (last) row contains only the values [3 6 10]?
For a single value, I do:
B=A(:,A(4,:)==10)
but I do not know how to do this for several values.
Use ismember -
search_array = [3 6 10]
subA = A(:,ismember(A(end,:),search_array))
Or bsxfun -
subA = A(:,any(bsxfun(@eq,A(end,:),search_array(:)),1))
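As a worked example on the matrix from the question, the snippet below builds the logical column mask in a separate step (the name keep is introduced here for illustration) and shows the submatrix it selects.

```matlab
% Matrix from the question.
A = [20 4 4 74 20 20 74 85 85 85;
     36 1 1 11 36 36 11 66 66 66;
     77 1 1 15 77 77 15 11 11 11;
      3 4 2  6  7  8 10 10 15 17];
search_array = [3 6 10];

% keep(j) is true when the last-row entry of column j is one of the wanted values.
keep = ismember(A(end,:), search_array);

% Columns 1, 4, 7 and 8 are selected.
subA = A(:, keep);
disp(subA)
```

The value 10 appears twice in the last row, so both of its columns are kept.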

Functional addition of Columns in kdb+q

I have a q table in which the number of non-keyed columns is variable. The column names also contain an integer. I want to apply some function to these columns without using their actual names.
How can I achieve this?
For Example:
table:
a | col10 col20 col30
1 | 2 3 4
2 | 5 7 8
// Assume that I have numbers 10, 20 ,30 obtained from column names
I want something like **update NewCol:10*col10+20*col20+30*col30 from table**
except that the number of columns is not fixed, and neither are the numbers in their names.
We want to use a functional update (simple example shown here: http://www.timestored.com/kdb-guides/functional-queries-dynamic-sql#functional-update)
For this particular query we want to generate the computation tree of the select clause, i.e. the last part of the functional update statement. The easiest way to do that is to parse a similar statement then recreate that format:
q)/ create our table
q)t:([] c10:1 2 3; c20:10 20 30; c30:7 8 9; c40:0.1*4 5 6)
q)t
c10 c20 c30 c40
---------------
1 10 7 0.4
2 20 8 0.5
3 30 9 0.6
q)parse "update r:(10*c10)+(20*c20)+(30*c30) from t"
!
`t
()
0b
(,`r)!,(+;(*;10;`c10);(+;(*;20;`c20);(*;30;`c30)))
q)/ notice the last value, the parse tree
q)/ we want to recreate that using code
q){(*;x;`$"c",string x)} 10
*
10
`c10
q){(+;x;y)} over {(*;x;`$"c",string x)} each 10 20
+
(*;10;`c10)
(*;20;`c20)
q)makeTree:{{(+;x;y)} over {(*;x;`$"c",string x)} each x}
/ now write as functional update
q)![t;();0b; enlist[`res]!enlist makeTree 10 20 30]
c10 c20 c30 c40 res
-------------------
1 10 7 0.4 420
2 20 8 0.5 660
3 30 9 0.6 900
q)update r:(10*c10)+(20*c20)+(30*c30) from t
c10 c20 c30 c40 r
-------------------
1 10 7 0.4 420
2 20 8 0.5 660
3 30 9 0.6 900
I think functional select (as suggested by @Ryan) is the way to go if the table is quite generic, i.e. the column names may vary and the number of columns is unknown.
Yet I prefer the way @JPC uses vectors to solve the multiplication and summation problem, i.e. update res:sum 10 20 30*(col10;col20;col30) from table
Let's combine both approaches, with some extreme cases:
q)show t:1!flip(`a,`$((10?2 3 4)?\:.Q.a),'string 10?10)!enlist[til 100],0N 100#1000?10
a | vltg4 pnwz8 mifz5 pesq7 fkcx4 bnkh7 qvdl5 tl5 lr2 lrtd8
--| -------------------------------------------------------
0 | 3 3 0 7 9 5 4 0 0 0
1 | 8 4 0 4 1 6 0 6 1 7
2 | 4 7 3 0 1 0 3 3 6 4
3 | 2 4 2 3 8 2 7 3 1 7
4 | 3 9 1 8 2 1 0 2 0 2
5 | 6 1 4 5 3 0 2 6 4 2
..
q)show n:"I"$string[cols get t]inter\:.Q.n
4 8 5 7 4 7 5 5 2 8i
q)show c:cols get t
`vltg4`pnwz8`mifz5`pesq7`fkcx4`bnkh7`qvdl5`tl5`lr2`lrtd8
q)![t;();0b;enlist[`res]!enlist({sum x*y};n;enlist,c)]
a | vltg4 pnwz8 mifz5 pesq7 fkcx4 bnkh7 qvdl5 tl5 lr2 lrtd8 res
--| -----------------------------------------------------------
0 | 3 3 0 7 9 5 4 0 0 0 176
1 | 8 4 0 4 1 6 0 6 1 7 226
2 | 4 7 3 0 1 0 3 3 6 4 165
3 | 2 4 2 3 8 2 7 3 1 7 225
4 | 3 9 1 8 2 1 0 2 0 2 186
5 | 6 1 4 5 3 0 2 6 4 2 163
..
You can create a functional form query as @Ryan Hamilton indicated, and overall that will be the best approach since it is very flexible. But if you're just looking to add these up, multiplied by some weight, I'm a fan of going through other avenues.
EDIT: missed that you said the number in the columns name could vary, in which case you can easily adjust this. If the column names are all prefaced by the same number of letters, just drop those and then parse the remaining into int or what have you. Otherwise if the numbers are embedded within text, check out this other question
//Create our table with a random number of columns (up to 9 value columns) and 1 key column
q)show t:1!flip (`$"c",/:string til n)!flip -1_(n:2+first 1?10) cut neg[100]?100
c0| c1 c2 c3 c4 c5 c6 c7 c8 c9
--| --------------------------
28| 3 18 66 31 25 76 9 44 97
60| 35 63 17 15 26 22 73 7 50
74| 64 51 62 54 1 11 69 32 61
8 | 49 75 68 83 40 80 81 89 67
5 | 4 92 45 39 57 87 16 85 56
48| 88 34 55 21 12 37 53 2 41
86| 52 91 79 33 42 10 98 20 82
30| 71 59 43 58 84 14 27 90 19
72| 0 99 47 38 65 96 29 78 13
q)update res:sum (1+til -1+count cols t)*flip value t from t
c0| c1 c2 c3 c4 c5 c6 c7 c8 c9 res
--| -------------------------------
28| 3 18 66 31 25 76 9 44 97 2230
60| 35 63 17 15 26 22 73 7 50 1551
74| 64 51 62 54 1 11 69 32 61 1927
8 | 49 75 68 83 40 80 81 89 67 3297
5 | 4 92 45 39 57 87 16 85 56 2582
48| 88 34 55 21 12 37 53 2 41 1443
86| 52 91 79 33 42 10 98 20 82 2457
30| 71 59 43 58 84 14 27 90 19 2134
72| 0 99 47 38 65 96 29 78 13 2336
q)![t;();0b; enlist[`res]!enlist makeTree 1+til -1+count cols t] ~ update res:sum (1+til -1+count cols t)*flip value t from t
1b
q)\ts do[`int$1e4;![t;();0b; enlist[`res]!enlist makeTree 1+til 9]]
232 3216j
q)\ts do[`int$1e4;update nc:sum (1+til -1+count cols t)*flip value t from t]
69 2832j
I haven't tested this on a large table, so caveat emptor
Here is another solution which is also faster.
t,'([]res:(+/)("I"$(string tcols) inter\: .Q.n) *' (value t) tcols:(cols t) except keys t)
By spending some time, we can decrease the word count as well. Logic goes like this:
a:"I"$(string tcols) inter\: .Q.n
Here I am first extracting out the integers from column names and storing them in a vector. Variable 'tcols' is declared at the end of query which is nothing but columns of table except key columns.
b:(value t) tcols:(cols t) except keys t
Here I am extracting out each column vector.
c:(+/) a *' b
Multiplying each column vector (var b) by its integer (var a) and adding the corresponding values from each resulting list.
t,'([]res:c)
Finally storing result in a temp table and joining it to t.

How to select and remove cells from a 2d matrix of cells in matlab

I have a 35x2 matrix (randomwords), and I have randomly selected 8 rows (rndm). What I need to do is remove the 8 selected rows from the randomwords matrix and save the resulting 27x2 matrix under a new variable name, but I am finding this extremely difficult. I have provided my code below. Any help would be greatly appreciated.
target = words ([30 1 46 14 44 55 8 3 57 65 69 70 57 39 21 60 22 20 16 10 9 17 62 19 25 41 49 53 36 6 42 58 40 56 63]);
synonym = words([43 15 32 28 72 27 48 51 13 67 59 33 35 47 52 61 71 7 23 12 2 66 11 37 4 45 64 38 34 31 29 18 50 68 26]);
% assigns these elements of words into targets and synonyms. They are
% ordered so that words and synonyms are corresponding elements of
% synonyms and targets
% TO SELECT 8 RANDOM WORDS FOR THE ENCODING PHASE
randomwords = [target; synonym]'; % should be a 35x2 matrix
rndm = datasample(randomwords, 8, 1); % should select 8 random couples from the rows and none of them will be repeats
unpaired = rndm(:,2); % should select only the synonyms to form the unpaired stimuli; will be different for each run
Store the index of the removed rows in a variable, let's say removedrows and then just do:
result = randomwords;
result(removedrows,:) = [];
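One way to obtain removedrows, sketched with randperm instead of the datasample call in the question: randperm(n, k) returns k distinct indices, so no row is picked twice. The contents of randomwords below are placeholder strings invented for the example. If you keep datasample, note that it samples with replacement unless 'Replace' is set to false, and that its second output returns the selected indices.

```matlab
% Stand-in for the 35x2 cell array from the question (hypothetical contents).
targets  = arrayfun(@(k) sprintf('target%d', k),  (1:35)', 'UniformOutput', false);
synonyms = arrayfun(@(k) sprintf('synonym%d', k), (1:35)', 'UniformOutput', false);
randomwords = [targets, synonyms];

% Pick 8 distinct row indices; randperm never repeats an index.
removedrows = randperm(size(randomwords, 1), 8);

rndm = randomwords(removedrows, :);   % the 8 selected pairs
unpaired = rndm(:, 2);                % their synonyms only

result = randomwords;
result(removedrows, :) = [];          % the remaining 27 pairs
```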

I am trying to extract the rows with the same x values from two different files in matlab, how can I do it?

To be more clear, what I want is to generate file3 from file1 but with the x values in file 2.
Example:
file 1:
x1=[1 2 3 4 5 6 7 8 9 10]'
y1=[11 22 33 44 55 66 77 88 99 00]'
file 2:
x2=[3 4 5 8 9]'
y2=[333 444 555 888 999]'
file 3:
x3=[3 4 5 8 9]'
y3=[33 44 55 88 99]'
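Use ismember to find which values of x1 are in x2. The logical mask it returns selects the matching entries of y1 in the order of x1, which agrees with x2 here because both are sorted the same way.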
x1=[1 2 3 4 5 6 7 8 9 10]'
y1=[11 22 33 44 55 66 77 88 99 00]'
x2=[3 4 5 8 9]'
y2=[333 444 555 888 999]'
x3 = x2;
y3 = y1(ismember(x1,x2))
y3 =
33
44
55
88
99
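If x2 is not in the same order as x1, or contains values missing from x1, the second output of ismember can be used to look up each x2 value individually. A minimal sketch, with an x2 made up for illustration (unsorted, and including the value 42, which is not in x1):

```matlab
x1 = [1 2 3 4 5 6 7 8 9 10]';
y1 = [11 22 33 44 55 66 77 88 99 0]';
x2 = [9 3 42 8]';   % hypothetical: unsorted, and 42 does not occur in x1

% tf marks the x2 values found in x1; loc gives their positions in x1 (0 if absent).
[tf, loc] = ismember(x2, x1);

x3 = x2(tf);        % [9; 3; 8]
y3 = y1(loc(tf));   % [99; 33; 88]
```

The result follows the order of x2, and values with no match in x1 are dropped.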