Merging two data sets based on intervals

I was wondering how I can merge these two data sets by a and b. Column a in data set f holds the lower bound of each interval, so I need to merge 1.5 from g with 1 from f, 4.4 from g with 4 from f, 9.8 from g with 9 from f, and so on.
a<-seq(1:10)
b<-c("a","b","a","b","a","a","a","b","b","a")
f<-data.frame(a,b)
a<-c(1.5,1.4,2.3,2.2,4.4,4,5,6.6,9.8,4.1,4.6,5.5)
b<-c("a","b","b","b","a","b","a","b","a","b","a","b")
m<-seq(1:12)
g<-data.frame(a,b,m)

Not sure exactly what you are looking for here, but the floor() function should give you what you need. You might also look into the tidyverse, in general, and dplyr, in particular, for data manipulation.
It is not entirely clear what output you expect: the b column differs between the two data sets after merging, so did you want only the records that match? Remove the all.x and all.y arguments if you do not care about unmatched records. I also presume that renaming your columns may be in order:
a <- seq(1:10)
b <- c("a", "b", "a", "b", "a", "a", "a", "b", "b", "a")
f <- data.frame(a, b)
a <- c(1.5, 1.4, 2.3, 2.2, 4.4, 4, 5, 6.6, 9.8, 4.1, 4.6, 5.5)
b <- c("a", "b", "b", "b", "a", "b", "a", "b", "a", "b", "a", "b")
m <- seq(1:12)
g <- data.frame(a, b, m)
## floor function takes care of rounding down
g$c <- floor(g$a)
merge(f, g, by.x = "a", by.y = "c", all.x = TRUE, all.y = TRUE)
#> Warning in merge.data.frame(f, g, by.x = "a", by.y = "c", all.x = TRUE, :
#> column name 'a' is duplicated in the result
#> a b.x a b.y m
#> 1 1 a 1.5 a 1
#> 2 1 a 1.4 b 2
#> 3 2 b 2.3 b 3
#> 4 2 b 2.2 b 4
#> 5 3 a NA <NA> NA
#> 6 4 b 4.4 a 5
#> 7 4 b 4.0 b 6
#> 8 4 b 4.6 a 11
#> 9 4 b 4.1 b 10
#> 10 5 a 5.5 b 12
#> 11 5 a 5.0 a 7
#> 12 6 a 6.6 b 8
#> 13 7 a NA <NA> NA
#> 14 8 b NA <NA> NA
#> 15 9 b 9.8 a 9
#> 16 10 a NA <NA> NA
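floor() works in the answer above because the lower bounds in f happen to be consecutive integers. If the bounds were arbitrary sorted numbers, the same lookup could be done with a binary search. The following is a minimal sketch in Python (not R, purely for illustration); lower_bound_for is a hypothetical helper name, not a library function.

```python
from bisect import bisect_right

# Lower bounds taken from f$a and values taken from g$a in the question.
bounds = list(range(1, 11))
values = [1.5, 1.4, 2.3, 2.2, 4.4, 4, 5, 6.6, 9.8, 4.1, 4.6, 5.5]

def lower_bound_for(x, sorted_bounds):
    """Return the largest bound <= x, or None if x lies below every bound."""
    i = bisect_right(sorted_bounds, x)
    return sorted_bounds[i - 1] if i > 0 else None

# Pair every value with the lower bound of the interval it falls into.
matched = [(lower_bound_for(v, bounds), v) for v in values]
```

The pairs in matched play the role of the g$c column: joining on the first element gives the same grouping as the floor-based merge, but the bounds no longer need to be integers.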


kdb/q how to drop entries from lists

I am trying to concatenate several csv's with identical columns.
getDataFromCsv :{[fn];
if[not () ~key hsym fn; data: ("zSzzSISSIIIIIffffff"; enlist "\t") 0:fn;
... do stuff...
:data];}
getFiles:{[dates;strat];root:"/home/me/data_";:{x: `$x, ssr[string y; "."; ""], ".csv"}[root] each dates;}
getData:{[dates;strat];`tasks set ([]c:());files:getFiles[dates;strat];:getDataFromCsv each files;}
Doing this, I get a list of tables in which some entries are empty because there was no file:
[0] = ([] c1;c2;c2 ...
[1] = ([] c1;c2;c2 ...
[2] = ([] c1;c2;c2 ...
[3] = ([] c1;c2;c2 ...
[4] = ([] c1;c2;c2 ...
[5] = ::
[6] = ([] c1;c2;c2 ...
With those entries, I can't raze the list to get a single table containing all entries. How can I drop the empty entries?
You can remove from the list where the type is not 98h for a quick fix, assuming there are no other data types contained in the list:
q)r
::
+`a`b`c!(41 48 29;2 8 6;5 8 5)
+`a`b`c!(41 48 29;2 8 6;5 8 5)
+`a`b`c!(41 48 29;2 8 6;5 8 5)
q)raze @[r;where 98h=type each r]
a b c
------
41 2 5
48 8 8
29 6 5
41 2 5
48 8 8
29 6 5
41 2 5
48 8 8
29 6 5
This also assumes that all columns are the same from each output. If they're not, you can use a uj to merge columns:
q)t:r,enlist ([] d:1 2 3; e:3 4 5)
q)(uj/)@[t;where 98h=type each t]
a b c d e
----------
41 2 5
48 8 8
29 6 5
41 2 5
48 8 8
29 6 5
41 2 5
48 8 8
29 6 5
1 3
2 4
3 5
Personally...
Where you have "...do stuff..." or ":data", I'd simply check the count of data or add a similar check. If the count=0 then return an empty list '()' rather than the generic null '(::)' that you return in your function currently.
The generic null is the problem here and that's what you are looking to fix.
Example below...
// example returning generic null
q){if[x~0;:(::)];([]2?10)}each 1 0 3
+(,`x)!,4 4
::
+(,`x)!,7 9
q)raze {if[x~0;:(::)];([]2?10)}each 1 0 3
(,`x)!,6
(,`x)!,2
::
(,`x)!,9
(,`x)!,2
// put a check in against 'data' to return an empty list if count=0 or similar
q){if[x~0;:()];([]2?10)}each 1 0 3
+(,`x)!,3 2
()
+(,`x)!,1 8
// your raze works now
q)raze {if[x~0;:()];([]2?10)}each 1 0 3
x
-
3
1
7
2
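The two approaches above (filter out the placeholder entries, or return an empty list in the first place) are not specific to q. As a rough analogy only, here is a minimal Python sketch in which None stands in for the generic null and lists of tuples stand in for tables; the data and the load helper are made up for illustration.

```python
from itertools import chain

# Each entry stands in for one file's rows; None marks a missing file,
# playing the role of the generic null (::) in the q output above.
results = [[(41, 2, 5), (48, 8, 8)], None, [(29, 6, 5)]]

# Option 1: filter the placeholders out before flattening.
flat_filtered = list(chain.from_iterable(r for r in results if r is not None))

# Option 2: make the loader return an empty list instead of a placeholder,
# so that flattening needs no special case (hypothetical helper).
def load(rows):
    return rows if rows is not None else []

flat_empty = list(chain.from_iterable(load(r) for r in results))
```

Both variants yield the same flat list. The second keeps the check next to the code that reads the file, which is the point made in the answer above.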

How to compare more than one row vector to a matrix?

I have a 24 x 3 matrix “point1” and 5 other 1x3 row vectors. I want to compare each of the 5 row vectors to every row of “point1” to see whether any of them equals a row of “point1”, and then return the index of that row in “point1”. I have been able to do this with the following code, but I am looking for a simpler and more elegant solution (possibly without a loop?).
point1 = [7.5 4 5
8.5 4 5
9.5 4 5
10.5 4 5
11.5 4 5
7 4 5.5
12 4 5.5
6.5 4 6
12.5 4 6
6 4 6.5
13 4 6.5
5.5 4 7
13.5 4 7
5 4 7.5
14 4 7.5
5 4 8.5
14 4 8.5
5 4 9.5
14 4 9.5
5 4 10.5
14 4 10.5
5 4 11.5
14 4 11.5
5.5 4 12];
fN = [8, 4.5, 5];
fS = [8, 3.5, 5];
fE = [8.5, 4, 5];
bN = [7, 4.5, 5];
bT = [7, 4, 5.5];
for ii = 1:size(point1, 1)
indx(ii) = isequal(point1(ii,:),fN(:)') | isequal(point1(ii,:),fS(:)') | isequal(point1(ii,:),fE(:)') | isequal(point1(ii,:),bN(:)') | isequal(point1(ii,:),bT(:)')
pIndx = find(indx)
end
This returns:
pIndx = [2 6];
Thanks guys!
You can use ismember with the 'rows' flag to search for intersections between your vectors and your data matrix.
Using the above example it's probably easiest to concatenate your query vectors into one matrix and use that as an input:
test = find(ismember(point1, vertcat(fN, fS, fE, bN, bT), 'rows'))
Which returns:
test =
2
6
Alternatively you can make the queries individually if the individual results are important:
test_fN = find(ismember(point1, fN, 'rows'));
test_fS = find(ismember(point1, fS, 'rows'));
test_fE = find(ismember(point1, fE, 'rows'));
test_bN = find(ismember(point1, bN, 'rows'));
test_bT = find(ismember(point1, bT, 'rows'));
You could try the following:
[find(all(fN == point1, 2)), ...
find(all(fS == point1, 2)), ...
find(all(fE == point1, 2)), ...
find(all(bN == point1, 2)), ...
find(all(bT == point1, 2))]
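For readers outside MATLAB, the idea behind ismember with the 'rows' flag (treat each row as one item and test membership) can be sketched in Python with a set of tuples. This is only an illustration of the technique, not a translation of MATLAB's implementation, and like the answers above it relies on exact floating-point equality.

```python
# The 24 rows of point1 from the question, one tuple per row.
point1 = [
    (7.5, 4, 5), (8.5, 4, 5), (9.5, 4, 5), (10.5, 4, 5), (11.5, 4, 5),
    (7, 4, 5.5), (12, 4, 5.5), (6.5, 4, 6), (12.5, 4, 6), (6, 4, 6.5),
    (13, 4, 6.5), (5.5, 4, 7), (13.5, 4, 7), (5, 4, 7.5), (14, 4, 7.5),
    (5, 4, 8.5), (14, 4, 8.5), (5, 4, 9.5), (14, 4, 9.5), (5, 4, 10.5),
    (14, 4, 10.5), (5, 4, 11.5), (14, 4, 11.5), (5.5, 4, 12),
]

# The five query vectors fN, fS, fE, bN, bT collected in one set.
queries = {(8, 4.5, 5), (8, 3.5, 5), (8.5, 4, 5), (7, 4.5, 5), (7, 4, 5.5)}

# 1-based row indices, to match MATLAB's convention.
p_indx = [i for i, row in enumerate(point1, start=1) if row in queries]
```

With the data from the question, p_indx is [2, 6], the same rows the MATLAB answers report.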

Reorder Table Rows and columns Matlab

I have a 5x5 table:
a b c d e
a 1 2 3 4 5
b 3 5 7 2 6
c 1 3 4 6 1
d 4 4 1 7 8
e 6 7 2 1 6
where the headers are the strings.
I want to know how to reorder the table rows and columns using the headers
So, for example, if I wanted them in the order e b c a d, this would be the table:
e b c a d
e 6 7 2 6 1
b 6 5 7 3 2
c 1 3 4 1 6
a 5 7 3 1 4
d 8 4 1 4 7
Let the table be defined as
T = table;
T.a = [1 3 1 4 6].';
T.b = [2 5 3 4 7].';
T.c = [3 7 4 1 2].';
T.d = [4 2 6 7 1].';
T.e = [5 6 1 8 6].';
And let the new desired order be
order = {'e' 'b' 'c' 'a' 'd'};
The table can be reordered using just indexing:
[~, ind] = ismember(order, T.Properties.VariableNames);
T_reordered = T(ind,order);
Note that:
To reorder only columns you'd use T_reorderedCols = T(:,order);
To reorder only rows you'd use T_reorderedRows = T(ind,:);
So in this example,
T =
a b c d e
_ _ _ _ _
1 2 3 4 5
3 5 7 2 6
1 3 4 6 1
4 4 1 7 8
6 7 2 1 6
T_reordered =
e b c a d
_ _ _ _ _
6 7 2 6 1
6 5 7 3 2
1 3 4 1 6
5 2 3 1 4
8 4 1 4 7
Here is a way to do it using indexing. You can indeed re-arrange the rows and columns using indices as you would for any array. In this case, I substitute each letter in the headers array with a number (originally [1 2 3 4 5]) and then, using a vector defining the new order [5 2 3 1 4], re-order the table. You could make some kind of lookup table to automate this when you deal with larger tables:
clc
clear
a = [1 2 3 4 5;
3 5 7 2 6;
1 3 4 6 1;
4 4 1 7 8;
6 7 2 1 6];
headers = {'a' 'b' 'c' 'd' 'e'};
%// Original order. Not used but useful to understand the idea... I think :)
OriginalOrder = 1:5;
%// New order
NewOrder = [5 2 3 1 4];
%// Create table
t = table(a(:,1),a(:,2),a(:,3),a(:,4),a(:,5),'RowNames',headers,'VariableNames',headers)
As a less cumbersome alternative to creating the table manually with the function table, you can use the function array2table (thanks to @excaza), which saves a couple of steps:
t = array2table(a,'RowNames',headers,'VariableNames',headers)
Either way, re-arrange the table using the new indices:
New_t = t(NewOrder,NewOrder)
Output:
t =
a b c d e
_ _ _ _ _
a 1 2 3 4 5
b 3 5 7 2 6
c 1 3 4 6 1
d 4 4 1 7 8
e 6 7 2 1 6
New_t =
e b c a d
_ _ _ _ _
e 6 7 2 6 1
b 6 5 7 3 2
c 1 3 4 1 6
a 5 2 3 1 4
d 8 4 1 4 7
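The lookup table suggested above for larger tables amounts to a map from header name to position. A minimal sketch in Python (used here only for illustration), with plain lists standing in for the MATLAB table; pos and ind are names chosen for this example.

```python
headers = ["a", "b", "c", "d", "e"]
data = [
    [1, 2, 3, 4, 5],
    [3, 5, 7, 2, 6],
    [1, 3, 4, 6, 1],
    [4, 4, 1, 7, 8],
    [6, 7, 2, 1, 6],
]
order = ["e", "b", "c", "a", "d"]

# Lookup table: header name -> 0-based position in the original table.
pos = {name: i for i, name in enumerate(headers)}

# Positions of the requested headers, in the requested order.
ind = [pos[name] for name in order]

# Apply the same permutation to rows and columns.
reordered = [[data[r][c] for c in ind] for r in ind]
```

Here ind comes out as [4, 1, 2, 0, 3], the 0-based counterpart of the NewOrder vector [5 2 3 1 4], and reordered matches the New_t output shown above.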

combine two value of array in Matlab

For example, I have an array:
a = [2 3 1 2 4 5 6 4];
b = sort(a);
b = [1 2 2 3 4 4 5 6];
Now I want to combine the values of a and b element by element:
c = [21 32 12 23 44 54 65 46]
Then I sort c:
d = [12 21 23 32 44 46 54 65]
and I combine c and d the same way (each element of c followed by the last digit of the corresponding element of d):
e = [212 321 123 232 444 546 654 465]
Then I sort e:
f = [123 212 232 321 444 465 546 654]
and I combine e and f:
g = [2123 3212 1232 2321 4444 5465 6546 4654]
and so on, until each number has as many digits as a has elements, which is 8.
Please help me.
Try this:
a = [2 3 1 2 4 5 6 4]
for m=2:8
b = sort(a)
t = round(b-10*floor(b/10))
a = 10*a+t
end
It looks to me like the algorithm just appends the last digit of each number in the sorted list to the corresponding number in the unsorted list. t is the last digit of each element of b, and 10*a+t shifts the existing digits of a one place to the left and puts t at the end. Apologies if I have misunderstood the objective and this is the wrong algorithm, but it works with your example. You will need to convince yourself that the code follows your rules.
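To check the reading of the algorithm described above against the numbers in the question, here is the same loop as a minimal Python sketch (Python is used only for illustration; y % 10 plays the role of b-10*floor(b/10)). It keeps every intermediate list so they can be compared with c, e and g.

```python
a = [2, 3, 1, 2, 4, 5, 6, 4]
n = len(a)

# steps[0] is the input; each later entry is the list after one more round.
steps = [a[:]]
for _ in range(n - 1):
    b = sorted(a)
    # Shift each number one digit left and append the last digit of the
    # number at the same position in the sorted list.
    a = [10 * x + y % 10 for x, y in zip(a, b)]
    steps.append(a)
```

steps[1], steps[2] and steps[3] reproduce c, e and g from the question, and after n - 1 rounds every number has 8 digits.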

Conditionally replacing cell values with column names

I have a 165 x 165 rank matrix in which each row holds the values 1-165. I want to parse each row, delete all values > 5, sort the remaining values in increasing order, and then replace the values 1-5 with the names of the columns they came from in the original matrix.
For example, for row k the first two transformations would leave the values 1, 2, 3, 4, 5, which would then be replaced by p, d, m, n, a.
I am assuming that your array consists of an array of arrays...
Neither Awk, Sed, nor Perl has multi-dimensional arrays. However, they can be emulated in Perl by using arrays of arrays.
$a[0]->[0] = xx;
$a[0]->[1] = yy;
[...]
$a[0]->[164] = zz;
$a[1]->[0] = qq;
$a[1]->[1] = rr;
[...]
$a[164]->[164] = vv;
Does this make sense?
I'm calling the row $x and columns $y, so an element in your array will be $array[$x]->[$y]. Is that good?
Okay, your column names will be in row $array[0], so if we find a value of five or less in $array[$x]->[$y], we know the column name is in $array[0]->[$y]. Is that good?
for my $x (1..164) { #First row is column names
for my $y (0..164) {
if ($array[$x]->[$y] <= 5) {
$array[$x]->[$y] = $array[0]->[$y];
}
}
}
I'm simply going through all the rows, and for each row, all the columns, and checking the value. If the value is less than or equal to five, I replace it with the column name.
I hope I'm not doing your homework for you.
This GNU sed solution might work although it will need scaling up as I only used a 10x10 matrix for testing purposes:
# { echo {a..j};for x in {1..10};do seq 1 10 | shuf |sed 'N;N;N;N;N;N;N;N;N;s/\n/ /g';done; }> test_data
# cat test_data
a b c d e f g h i j
4 5 9 3 6 2 10 8 7 1
3 7 4 2 1 6 10 5 8 9
10 9 3 1 2 7 8 5 6 4
5 10 4 9 7 8 1 3 6 2
8 6 5 9 1 4 3 2 7 10
2 8 9 3 5 6 10 1 4 7
3 9 8 2 1 4 10 6 7 5
3 7 2 1 8 6 10 4 5 9
1 10 8 3 6 5 4 2 7 9
7 2 3 5 6 1 10 4 8 9
# cat test_data |
sed -rn '1{h;d};s/[0-9]{2,}|[6-9]/0/g;G;s/\n|$/ &/g;s/$/&1 2 3 4 5 /;:a;s/^(\S*) (.*\n)(\S* )(.*)/\2\4\1\3/;ta;s/\n//;s/0[^ ]? //g;:b;s/([1-5])(.*)\1(.)/\3\2/;tb;p'
j f d a b
e d a c h
d e c j h
g j h c a
e h g f c
h a d i e
e d a f j
d c a h i
a h d g f
f b c h d
The sed command works as follows.
The first line of the data file, which contains the column headings, is stored in the hold space, and the pattern space (current line) is then deleted. For all subsequent data lines, every number with two or more digits and every value from 6 to 9 is converted to 0. The column names are appended to the data values, separated by a newline. Spaces are inserted before the newline and at the end of the string. The data is transformed into a lookup, and the sorted values, i.e. 1 2 3 4 5, are prepended to it. The newline is removed along with any 0 values and their associated lookups. Finally, the values 1 to 5 are replaced by the column names in the lookup.
EDIT:
I may have misunderstood the problem regarding sorting columns or rows, if so it's a minimal fix - replace 1 2 3 4 5 by the original values and perform a numeric sort prior to replacing the numeric data with column names from the lookup.
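For comparison with the sed pipeline, the three steps from the question (drop values above 5, sort what remains, replace each value with its column name) can be sketched in a few lines of Python. This is a minimal illustration using the first two data rows of test_data; top_columns is a hypothetical helper name.

```python
def top_columns(header, row, keep=5):
    """Return the column names of the ranks 1..keep, ordered by rank."""
    # Pair each rank with its column name and drop ranks above `keep`.
    ranked = sorted(
        (value, name) for value, name in zip(row, header) if value <= keep
    )
    # After sorting by rank, only the column names are needed.
    return [name for _, name in ranked]

header = "a b c d e f g h i j".split()
rows = [
    [4, 5, 9, 3, 6, 2, 10, 8, 7, 1],
    [3, 7, 4, 2, 1, 6, 10, 5, 8, 9],
]
result = [top_columns(header, r) for r in rows]
```

For these two rows, result is [['j', 'f', 'd', 'a', 'b'], ['e', 'd', 'a', 'c', 'h']], which agrees with the first two lines of the sed output.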