Combine two matrices with respect to primary key in Matlab - matlab

I have two matrices with the first column being the primary key as shown below:
A = | 123 3 234 11 |
    | 124 2 634 22 |
    | 125 8 731 33 |
    | 126 8 237 44 |
    | 127 6 235 55 |

B = | 124 34 23 |
    | 125 45 73 |
    | 126 33 37 |
    | 127 44 25 |
I want to build a new matrix C as follows:
1. Find the rows of A that satisfy find(A(:,2) > 5). In this case the indices that satisfy this condition are 3 and 4.
2. The primary key values for indices 3 and 4 in A are 125 and 126.
3. Find the rows of B that have the key values 125 and 126, which are rows 2 and 3.
4. Create the new matrix C by concatenating the rows of A and B that share each primary key.
Matrix C should look like
C = | 125 8 731 33 45 73 |
    | 126 8 237 44 33 37 |
How can I do this?
Thanks!

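The key function to use is ISMEMBER. Call it with two outputs:
[idxa, idxb] = ismember(a(:,1), b(:,1));
idxb(idxb==0) = [];
Here idxa is a logical mask of the rows of a whose key also appears in b, and idxb holds the matching row numbers in b (zeros mean "no match" and are dropped). Then you can combine:
c = [a(idxa,:) b(idxb,:)];
You can add the filter and select the columns you need from there.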
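For readers who want to check the expected result outside Matlab, here is a minimal Python sketch of the same filter-then-join idea. The helper name join_on_key is hypothetical, and the sketch assumes the keys in B are unique. Note that with the sample data as posted, row 127 (second column 6) also passes the > 5 filter, so the sketch returns three rows rather than the two listed in the question.

```python
# Sample data from the question; the first column is the primary key.
A = [
    [123, 3, 234, 11],
    [124, 2, 634, 22],
    [125, 8, 731, 33],
    [126, 8, 237, 44],
    [127, 6, 235, 55],
]
B = [
    [124, 34, 23],
    [125, 45, 73],
    [126, 33, 37],
    [127, 44, 25],
]


def join_on_key(a_rows, b_rows, keep):
    """Inner-join a_rows and b_rows on column 0, keeping only the a_rows
    for which keep(row) is true. Hypothetical helper; assumes unique keys in b_rows."""
    b_by_key = {row[0]: row[1:] for row in b_rows}  # key -> remaining B columns
    return [
        row + b_by_key[row[0]]
        for row in a_rows
        if keep(row) and row[0] in b_by_key
    ]


C = join_on_key(A, B, keep=lambda row: row[1] > 5)
for row in C:
    print(row)
```

Tightening the condition passed as keep (for example row[1] > 6) reproduces the two-row C from the question.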

The statistics toolbox contains a function called JOIN that basically does what you want.
http://www.mathworks.de/de/help/stats/dataset.join.html
I think it does what you are looking for. Does it?

Related

diff -y usage of |'s vs. >'s and <'s in the output

I have two files:
cat test1
1
3
5
11
13
17
19
21
22
23
29
30
32
33
34
39
and
cat test2
2
4
6
10
12
20
21
22
23
24
29
30
31
32
34
49
When I diff them, I get:
diff -y test1 test2
1 | 2
3 | 4
5 | 6
11 | 10
13 | 12
17 | 20
19 <
21 21
22 22
23 23
> 24
29 29
30 30
> 31
32 32
33 <
34 34
39 | 49
The rule for when diff -y prints "|" instead of "<" or ">" has proven very difficult to dig up, in spite of googling. I think I understand what it means (i.e. "19 <" instead of "19 | 21" because 21 appears later in test1 to match up with the 21 in test2), but it strikes me as obtuse because the output for a given line is influenced by subsequent lines, and nothing in the output indicates that the command is doing this or why.
Is there a way to suppress this behavior or make it more transparent?
I want to suppress common lines in my output. The problem is that without the common lines the next output is very confusing: the common lines are the only context that explains why some lines get a | while others get a > or a <, so nothing in the output below answers that question:
diff -y --suppress-common-lines test1 test2
1 | 2
3 | 4
5 | 6
11 | 10
13 | 12
17 | 20
19 <
> 24
> 31
33 <
39 | 49
I'd basically like it to be either all |'s or all <'s and >'s.
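One way to get output that uses only < and > is to post-process the comparison yourself. The sketch below is an assumption-laden illustration in Python using the standard difflib module, not a diff option: it treats every "changed" block (where diff -y would print |) as a run of left-only lines followed by a run of right-only lines. The helper name side_markers is hypothetical, and difflib's matching is not guaranteed to agree with GNU diff on every input.

```python
import difflib


def side_markers(left, right):
    """Return (marker, line) pairs using only '<' and '>'.
    Hypothetical helper; common lines are dropped."""
    out = []
    matcher = difflib.SequenceMatcher(a=left, b=right, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # same effect as --suppress-common-lines
        # A 'replace' block is where diff -y would print '|';
        # here it is split into its left-only and right-only halves.
        out.extend(("<", line) for line in left[i1:i2])
        out.extend((">", line) for line in right[j1:j2])
    return out


test1 = "1 3 5 11 13 17 19 21 22 23 29 30 32 33 34 39".split()
test2 = "2 4 6 10 12 20 21 22 23 24 29 30 31 32 34 49".split()

result = side_markers(test1, test2)
for marker, line in result:
    print(marker, line)
```

The trade-off is that the pairing information is lost: a line marked < only says "not matched on the right", which is the uniform reading the question asks for.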

KDB - Find duplicates or similar entries in one column

I'm trying to eliminate duplicate entries for customers in my contact list. Assume my table has three columns (FirstName, LastName, CustomerID).
Can somebody help me create a query that identifies different CustomerIDs with either the same or very similar First and Last Names? We end up with multiple entries due to sales people searching for a name and not finding it due to misspellings. They then create a new entry for the customer with a slightly different spelling of the name.
Thanks!
One approach is to manage a mapping of names to common (mis)spellings and then map all the various spellings back to the intended name. Then group them.
t:([] fn:100?(`John;`Mike;`Bob;`john;`Johnn;`Mick;`Bobby);ln:100?(`Doe;`Smith;`doe;`Do;`smith);id:til 100)
mapFN:exec similar!name from ungroup flip `name`similar!flip (
(`Bob; (`Bob;`bob;`Bobby;`bobby));
(`John; (`John;`Johnn;`john));
(`Mike; (`Mike;`mike;`Mick;`Michael))
);
mapLN:exec similar!name from ungroup flip `name`similar!flip (
(`Doe; (`Doe;`doe;`Do));
(`Smith; (`Smith;`smith;`Smyth))
);
Without mapping:
q)`fn`ln xgroup t
fn ln | id
-----------| ----------------
Mick Do | 0 25 26 50 68 71
Bobby Smith| 1 22 23 83
John Smith| 2 8 48 51 69 85
Mike Doe | 3 44
john doe | ,4
Mick Doe | 5 47 95
John Doe | 6 46 49 63
john Smith| 7 66 74
Johnn doe | 9 13 79 94
Mick doe | 10 20 55 67
Bobby smith| 11 17 18 53
john Doe | 12 21 56
...
With mapping:
q)`fn`ln xgroup update mapFN[fn],mapLN[ln] from t
fn ln | id
----------| -----------------------------------------------------------------
Mike Doe | 0 3 5 10 20 25 26 39 44 47 50 52 55 67 68 70 71 78 95 97
Bob Smith| 1 11 17 18 22 23 30 38 45 53 77 82 83
John Smith| 2 7 8 16 19 33 37 40 43 48 51 64 66 69 73 74 80 85 87
John Doe | 4 6 9 12 13 21 31 32 41 42 46 49 56 57 62 63 65 72 79 81 86 89 91
Bob Doe | 14 24 27 28 35 54 58 59 61 75 76 84
Mike Smith| 15 29 34 36 60 88 90 93 96 98
You could also do something more sophisticated with regex pattern matching.
The mapping would need to be pretty precise, though, as otherwise you might end up with false groupings.
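The mapping idea is independent of q, so here is a small Python sketch of it for comparison. The variant lists mirror mapFN and mapLN above; the helper names invert and group_ids and the six sample rows are made up for illustration. Spellings that are not in the mapping (such as Jon below) are deliberately left unmerged.

```python
from collections import defaultdict

# Variant lists mirroring mapFN and mapLN in the q example.
FIRST = {
    "Bob": ["Bob", "bob", "Bobby", "bobby"],
    "John": ["John", "Johnn", "john"],
    "Mike": ["Mike", "mike", "Mick", "Michael"],
}
LAST = {
    "Doe": ["Doe", "doe", "Do"],
    "Smith": ["Smith", "smith", "Smyth"],
}


def invert(variants):
    """Turn canonical -> [spellings] into spelling -> canonical."""
    return {s: canon for canon, spellings in variants.items() for s in spellings}


def group_ids(rows, first_map, last_map):
    """Group customer ids by canonical (first, last) name.
    Unmapped spellings are kept as-is so they are not merged by accident."""
    groups = defaultdict(list)
    for first, last, cust_id in rows:
        key = (first_map.get(first, first), last_map.get(last, last))
        groups[key].append(cust_id)
    return dict(groups)


# Illustrative rows: (FirstName, LastName, CustomerID).
rows = [
    ("Mick", "Do", 0), ("Bobby", "Smith", 1), ("John", "Smith", 2),
    ("Mike", "Doe", 3), ("john", "doe", 4), ("Jon", "Doe", 5),
]
groups = group_ids(rows, invert(FIRST), invert(LAST))
for name, ids in groups.items():
    print(name, ids)
```

Any group with more than one id is a candidate duplicate; groups of size one, like ("Jon", "Doe"), show where the mapping still needs an entry.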

What kind of encoding is this? - apple, int, hex

I'm trying to figure out how to alter some preset files and wonder how to convert my values to match this encoding. It's used in a preset file of an Apple music sequencer.
Each value appears as (int), followed by its hex representation (2 bytes):
b1 b2 | b1,b2(i) b1b2(i) b2,b1(i) b2b1(i)
0 00 00 | 0,0 0 0,0 0
1 80 3f | 128,63 32831 63,128 16256
2 00 40 | 0,64 64 64,0 16384
3 40 40 | 64,64 16448 64,64 16448
4 80 40 | 128,64 32832 64,128 16512
5 a0 40 | 160,64 41024 64,160 16544
6 c0 40 | 192,64 49216 64,192 16576
7 e0 40 | 224,64 57408 64,224 16608
8 00 41 | 0,65 65 65,0 16640
9 10 41 | 16,65 4161 65,16 16656
10 20 41 | 32,65 8257 65,32 16672
11 30 41 | 48,65 12353 65,48 16688
20 a0 41 | 160,65 41025 65,160 16800
21 a8 41 | 168,65 43073 65,168 16808
40 20 42 | 32,66 8258 66,32 16928
50 48 42 | 72,66 18498 66,72 16968
100 c8 42 | 200,66 51266 66,200 17096
200 48 43 | 72,67 18499 67,72 17224
I can't understand the logic behind this. Help much appreciated, tyia!
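One reading that fits every row of the table is that b1 b2 are the two high-order bytes of the value stored as a little-endian IEEE 754 single-precision float (for example 1.0 is 00 00 80 3f, of which the table shows 80 3f). Whether the preset file really stores a full 4-byte float with the two low bytes outside the quoted range is an assumption; the Python sketch below only checks the arithmetic. The helper names are hypothetical.

```python
import struct


def high_bytes(value):
    """Return the last two bytes of value packed as a little-endian float32,
    formatted like the b1 b2 columns of the table. Hypothetical helper."""
    packed = struct.pack("<f", float(value))
    return " ".join(f"{b:02x}" for b in packed[2:])


def from_high_bytes(b1, b2):
    """Rebuild the number from b1 b2, assuming the two missing
    low-order bytes are zero."""
    return struct.unpack("<f", bytes([0, 0, b1, b2]))[0]


for n in (0, 1, 2, 3, 5, 10, 21, 100, 200):
    print(n, high_bytes(n))
```

If this reading is right, values that need more mantissa bits than the two high bytes provide would not round-trip through from_high_bytes, so the full four bytes in the file should be checked before editing presets.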

Comparing two text files containing numbers in columns in Matlab

I have 2 text files (a.txt, b.txt) with some columns of numbers and a header line (one header per column, as shown below). I want to match the 2nd column in a.txt with the 1st column in b.txt and get all the matched rows from b.txt. The values in column gr are not repeated in either a.txt or b.txt.
a.txt
—————
gc gr
1 5
3 8
3 4
3 9
b.txt
—————
gr c1 c2
1 12 32
3 21 23
7 33 12
8 54 45
9 99 65
34 43 76
56 80 24
5 32 80
32 15 23
4 11 31
I want the matched rows from b.txt, exactly like this:
5 32 80
8 54 45
4 11 31
9 99 65
Try this:
id = fopen('a.txt','r');
A = cell2mat(textscan(id,'%d %d','headerlines',1));
fclose(id);
id = fopen('b.txt','r');
B = cell2mat(textscan(id,'%d %d %d','headerlines',1));
fclose(id);
out_ = cell2mat(arrayfun(@(i)(B(find(A(i,2) == B(:,1),1,'first'),:)),1:size(A,1),'uni',0)');
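As a cross-check of the expected output, the same lookup can be sketched in Python. The file contents are inlined as strings here so the example is self-contained; read_rows and match_rows are hypothetical helper names, and the sketch relies on the question's statement that gr values are not repeated.

```python
A_TXT = """gc gr
1 5
3 8
3 4
3 9
"""

B_TXT = """gr c1 c2
1 12 32
3 21 23
7 33 12
8 54 45
9 99 65
34 43 76
56 80 24
5 32 80
32 15 23
4 11 31
"""


def read_rows(text):
    """Parse whitespace-separated integer columns, skipping the header line."""
    lines = text.splitlines()[1:]
    return [[int(tok) for tok in line.split()] for line in lines if line.strip()]


def match_rows(a_rows, b_rows):
    """For each row of a (in order), return the b row whose first column
    equals a's second column. Rows of a without a match are skipped."""
    b_by_gr = {row[0]: row for row in b_rows}
    return [b_by_gr[row[1]] for row in a_rows if row[1] in b_by_gr]


matched = match_rows(read_rows(A_TXT), read_rows(B_TXT))
for row in matched:
    print(*row)
```

The result follows the row order of a.txt, which is what the requested output shows.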

Functional addition of Columns in kdb+q

I have a q table in which the number of non-keyed columns is variable. The names of these columns each contain an integer. I want to apply a function to these columns without using their actual names.
How can I achieve this?
For Example:
table:
a | col10 col20 col30
1 | 2 3 4
2 | 5 7 8
// Assume that I have the numbers 10, 20, 30 obtained from the column names
I want something like update NewCol:10*col10+20*col20+30*col30 from table
except that the number of columns is not fixed, and neither are the numbers included in their names.
We want to use a functional update (simple example shown here: http://www.timestored.com/kdb-guides/functional-queries-dynamic-sql#functional-update)
For this particular query we want to generate the computation tree of the select clause, i.e. the last part of the functional update statement. The easiest way to do that is to parse a similar statement then recreate that format:
q)/ create our table
q)t:([] c10:1 2 3; c20:10 20 30; c30:7 8 9; c40:0.1*4 5 6)
q)t
c10 c20 c30 c40
---------------
1 10 7 0.4
2 20 8 0.5
3 30 9 0.6
q)parse "update r:(10*c10)+(20*c20)+(30*c30) from t"
!
`t
()
0b
(,`r)!,(+;(*;10;`c10);(+;(*;20;`c20);(*;30;`c30)))
q)/ notice the last value, the parse tree
q)/ we want to recreate that using code
q){(*;x;`$"c",string x)} 10
*
10
`c10
q){(+;x;y)} over {(*;x;`$"c",string x)} each 10 20
+
(*;10;`c10)
(*;20;`c20)
q)makeTree:{{(+;x;y)} over {(*;x;`$"c",string x)} each x}
/ now write as functional update
q)![t;();0b; enlist[`res]!enlist makeTree 10 20 30]
c10 c20 c30 c40 res
-------------------
1 10 7 0.4 420
2 20 8 0.5 660
3 30 9 0.6 900
q)update r:(10*c10)+(20*c20)+(30*c30) from t
c10 c20 c30 c40 r
-------------------
1 10 7 0.4 420
2 20 8 0.5 660
3 30 9 0.6 900
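To make the tree-building step easier to follow for readers who do not know q, here is a rough Python analogue. It is only an illustration of the idea: nested tuples stand in for q parse trees, functools.reduce plays the role of over, and the names make_tree and evaluate are hypothetical. Like over, reduce folds from the left, so the nesting differs from the parse output above, but the sum is the same.

```python
from functools import reduce
import operator

OPS = {"+": operator.add, "*": operator.mul}


def make_tree(weights):
    """Build nested (op, left, right) tuples, analogous to makeTree."""
    terms = [("*", w, f"c{w}") for w in weights]     # one (*; w; `cW) per weight
    return reduce(lambda x, y: ("+", x, y), terms)  # left fold, like `over`


def evaluate(node, row):
    """Evaluate a tree against one row (a dict of column name -> value)."""
    if isinstance(node, tuple):
        op, left, right = node
        return OPS[op](evaluate(left, row), evaluate(right, row))
    if isinstance(node, str):
        return row[node]  # column reference
    return node           # numeric literal


# Same values as the c10, c20, c30 columns of t above.
table = [
    {"c10": 1, "c20": 10, "c30": 7},
    {"c10": 2, "c20": 20, "c30": 8},
    {"c10": 3, "c20": 30, "c30": 9},
]
tree = make_tree([10, 20, 30])
res = [evaluate(tree, row) for row in table]
print(tree)
print(res)
```

The point of the exercise is that the expression is built as data first and evaluated afterwards, which is what the functional update does with the tree returned by makeTree.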
I think a functional select (as suggested by @Ryan) is the way to go if the table is quite generic, i.e. the column names might vary and the number of columns is unknown.
Yet I prefer the way @JPC uses vectors to solve the multiplication and summation problem, i.e. update res:sum 10 20 30*(col10;col20;col30) from table
Let's combine both approaches, with some extreme cases:
q)show t:1!flip(`a,`$((10?2 3 4)?\:.Q.a),'string 10?10)!enlist[til 100],0N 100#1000?10
a | vltg4 pnwz8 mifz5 pesq7 fkcx4 bnkh7 qvdl5 tl5 lr2 lrtd8
--| -------------------------------------------------------
0 | 3 3 0 7 9 5 4 0 0 0
1 | 8 4 0 4 1 6 0 6 1 7
2 | 4 7 3 0 1 0 3 3 6 4
3 | 2 4 2 3 8 2 7 3 1 7
4 | 3 9 1 8 2 1 0 2 0 2
5 | 6 1 4 5 3 0 2 6 4 2
..
q)show n:"I"$string[cols get t]inter\:.Q.n
4 8 5 7 4 7 5 5 2 8i
q)show c:cols get t
`vltg4`pnwz8`mifz5`pesq7`fkcx4`bnkh7`qvdl5`tl5`lr2`lrtd8
q)![t;();0b;enlist[`res]!enlist({sum x*y};n;enlist,c)]
a | vltg4 pnwz8 mifz5 pesq7 fkcx4 bnkh7 qvdl5 tl5 lr2 lrtd8 res
--| -----------------------------------------------------------
0 | 3 3 0 7 9 5 4 0 0 0 176
1 | 8 4 0 4 1 6 0 6 1 7 226
2 | 4 7 3 0 1 0 3 3 6 4 165
3 | 2 4 2 3 8 2 7 3 1 7 225
4 | 3 9 1 8 2 1 0 2 0 2 186
5 | 6 1 4 5 3 0 2 6 4 2 163
..
You can create a functional form query as @Ryan Hamilton indicated, and overall that will be the best approach since it is very flexible. But if you're just looking to add these up, multiplied by some weight, I'm a fan of going through other avenues.
EDIT: I missed that you said the number in the column names could vary, in which case you can easily adjust this. If the column names are all prefixed by the same number of letters, just drop those and then parse the remainder into an int or what have you. Otherwise, if the numbers are embedded within text, check out this other question
//Create our table with a random number of columns (up to 9 value columns) and 1 key column
q)show t:1!flip (`$"c",/:string til n)!flip -1_(n:2+first 1?10) cut neg[100]?100
c0| c1 c2 c3 c4 c5 c6 c7 c8 c9
--| --------------------------
28| 3 18 66 31 25 76 9 44 97
60| 35 63 17 15 26 22 73 7 50
74| 64 51 62 54 1 11 69 32 61
8 | 49 75 68 83 40 80 81 89 67
5 | 4 92 45 39 57 87 16 85 56
48| 88 34 55 21 12 37 53 2 41
86| 52 91 79 33 42 10 98 20 82
30| 71 59 43 58 84 14 27 90 19
72| 0 99 47 38 65 96 29 78 13
q)update res:sum (1+til -1+count cols t)*flip value t from t
c0| c1 c2 c3 c4 c5 c6 c7 c8 c9 res
--| -------------------------------
28| 3 18 66 31 25 76 9 44 97 2230
60| 35 63 17 15 26 22 73 7 50 1551
74| 64 51 62 54 1 11 69 32 61 1927
8 | 49 75 68 83 40 80 81 89 67 3297
5 | 4 92 45 39 57 87 16 85 56 2582
48| 88 34 55 21 12 37 53 2 41 1443
86| 52 91 79 33 42 10 98 20 82 2457
30| 71 59 43 58 84 14 27 90 19 2134
72| 0 99 47 38 65 96 29 78 13 2336
q)![t;();0b; enlist[`res]!enlist makeTree 1+til -1+count cols t] ~ update res:sum (1+til -1+count cols t)*flip value t from t
1b
q)\ts do[`int$1e4;![t;();0b; enlist[`res]!enlist makeTree 1+til 9]]
232 3216j
q)\ts do[`int$1e4;update nc:sum (1+til -1+count cols t)*flip value t from t]
69 2832j
I haven't tested this on a large table, so caveat emptor.
Here is another solution which is also faster.
t,'([]res:(+/)("I"$(string tcols) inter\: .Q.n) *' (value t) tcols:(cols t) except keys t)
With some more time the expression could be shortened further. The logic goes like this:
a:"I"$(string tcols) inter\: .Q.n
Here I first extract the integers from the column names and store them in a vector. The variable 'tcols' is assigned at the end of the query (q evaluates right to left) and is simply the columns of the table except the key columns.
b:(value t) tcols:(cols t) except keys t
Here I extract each column vector.
c:(+/) a *' b
This multiplies each column vector (var b) by its integer (var a) and adds the corresponding values from each resulting list.
t,'([]res:c)
Finally, the result is stored in a temporary table and joined to t.
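The three steps (extract the integers from the names, multiply each column by its integer, sum across columns) can be sketched in Python as follows. This is an illustration of the logic only, not a q replacement; the function name weighted_sum, the dict-of-lists table layout and the key_cols parameter are assumptions, and the data is the small table from the question.

```python
import re


def weighted_sum(table, key_cols=("a",)):
    """table maps column name -> list of values.
    Each non-key column is weighted by the integer formed from the digits in its name."""
    value_cols = [c for c in table if c not in key_cols]                 # like tcols
    weights = [int("".join(re.findall(r"\d", c))) for c in value_cols]   # like a
    n_rows = len(table[value_cols[0]])
    return [
        sum(w * table[c][i] for w, c in zip(weights, value_cols))        # like (+/) a *' b
        for i in range(n_rows)
    ]


# The table from the question: key column a, value columns col10, col20, col30.
table = {"a": [1, 2], "col10": [2, 5], "col20": [3, 7], "col30": [4, 8]}
table["res"] = weighted_sum(table)
print(table["res"])
```

Joining all digits in a name mirrors inter\: .Q.n, so a name with digits in two places (for example x1y2) would be read as 12; that may or may not be what is wanted.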