Remove first n elements by group from table - group-by

Say I have the following table:
t:([]date:25#(.z.d+ til 5); travel:(5#`car),(5#`plane),(5#`bus),(5#`cycle),(5#`scooter); val:25?100)
date travel val
----------------------
2019.12.06 car 75
2019.12.07 car 47
2019.12.08 car 70
2019.12.09 car 32
2019.12.10 car 86
2019.12.06 plane 29
2019.12.07 plane 96
How do I remove the first n observations per travel group (the groups do not necessarily all start on 2019.12.06)?
For instance, in this particular example, if n=1, I would only get entries where date>2019.12.06.

I would use the following code snippet:
n: 3;
select from t where i>({last[y]^y@x-1}[n];i) fby travel
In the above statement, all row numbers are grouped by travel and the first n of them in each group are removed. {last[y]^y@x-1}[n] returns the nth row number of the group, or the last row number if n is larger than the number of rows in the group.
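For readers less familiar with q, the per-group logic can be sketched in Python. This is only an illustration of the idea (count the rows seen per group and keep a row once n have been skipped), not a translation of the q internals; the helper name drop_first_n_per_group and the sample rows are made up for the example.

```python
from collections import defaultdict

def drop_first_n_per_group(rows, key, n):
    """Return rows with the first n rows of each group removed, keeping order."""
    seen = defaultdict(int)  # number of rows encountered so far, per group
    kept = []
    for row in rows:
        group = row[key]
        if seen[group] >= n:  # the first n rows of this group are already skipped
            kept.append(row)
        seen[group] += 1
    return kept

# Hypothetical sample data, shaped like the table in the question
rows = [
    {"date": "2019.12.06", "travel": "car"},
    {"date": "2019.12.07", "travel": "car"},
    {"date": "2019.12.06", "travel": "plane"},
    {"date": "2019.12.07", "travel": "plane"},
    {"date": "2019.12.08", "travel": "plane"},
]
result = drop_first_n_per_group(rows, "travel", 1)
```

As in the q version, a group with n or fewer rows simply disappears from the result.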

Below would work (courtesy of rank) in cases where the data is not necessarily sorted by date. rank gives each row's position in date order within its group; iasc would give the sort permutation instead, which only coincides with the rank when the group is already sorted:
q)select from t where 0<(rank;date) fby travel
date travel val
----------------------
2019.12.07 car 30
2019.12.08 car 75
2019.12.09 car 61
2019.12.10 car 56
2019.12.07 plane 46
...
Wrapping in a function as you describe:
q){[tbl;skipNum]select from tbl where (skipNum-1)<(rank;date) fby travel}[t;1]
date travel val
----------------------
2019.12.07 car 30
2019.12.08 car 75
2019.12.09 car 61
2019.12.10 car 56
2019.12.07 plane 46
...

I would use a simple by clause. We can get the row indices for each travel group easily enough:
q)exec i by travel from t
bus | 10 11 12 13 14
car | 0 1 2 3 4
cycle | 15 16 17 18 19
plane | 5 6 7 8 9
scooter| 20 21 22 23 24
From this we can drop the first n indices of each group, raze the remaining indices, and index back into the table:
q)n:1
q)t asc raze value exec _\:[n;]i by travel from t
date travel val
----------------------
2019.12.07 car 10
2019.12.08 car 1
2019.12.09 car 90
2019.12.10 car 73
2019.12.07 plane 43
2019.12.08 plane 90
2019.12.09 plane 84
2019.12.10 plane 63
2019.12.07 bus 54
2019.12.08 bus 38
2019.12.09 bus 97
2019.12.10 bus 88
2019.12.07 cycle 68
2019.12.08 cycle 45
2019.12.09 cycle 2
2019.12.10 cycle 39
2019.12.07 scooter 49
2019.12.08 scooter 82
2019.12.09 scooter 40
2019.12.10 scooter 88

Why can't I put a condition in update statement?

I have a table:
t:([]val:10?100)
And I want to add a column with a cond statement: if the value is below 55, just set it to 55. However, the update statement does not work with that:
update newVal:$[val<55;55;val] from t
How do I have to change it?
Thanks.
Since val is a vector, you have to use the vector conditional:
update newVal:?[val<55;55;val] from t
By the way, an alternative way of flooring it at 55 is to use max/or (|):
update val|55 from t
I think @terrylynch's answer is perfect. But when the vector conditional is too awkward to use, you can apply a lambda with each inside the q-sql statement instead. The reason the vector conditional is needed in the first place is simply that a column is a list, not an atom.
q)t
val
---
12
10
1
90
73
90
43
90
84
63
q)update newVal:{$[x<55;55;x]}each val from t
val newVal
----------
12 55
10 55
1 55
90 90
73 73
90 90
43 55
90 90
84 84
63 63
q)update newVal:{x|55}each val from t
val newVal
----------
12 55
10 55
1 55
90 90
73 73
90 90
43 55
90 90
84 84
63 63
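The distinction between a scalar conditional and an element-wise one is not specific to q. A minimal Python sketch of the same idea, using the val column shown above (the names vals, floor_scalar and new_vals are just for illustration):

```python
def floor_scalar(v, floor=55):
    """Scalar conditional: only meaningful for a single value, like $[...] in q."""
    return floor if v < floor else v

vals = [12, 10, 1, 90, 73, 90, 43, 90, 84, 63]

# Apply the scalar conditional to each element (the analogue of `each`)
new_vals_each = [floor_scalar(v) for v in vals]

# Element-wise maximum (the analogue of val|55)
new_vals = [max(v, 55) for v in vals]
```

Both forms give the newVal column printed in the q session above.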

Sorting wrt to a column value in matlab [duplicate]

I have multiple columns in my dataset, and column 2 contains values from 1 to 7. I want to sort my dataset with respect to the second column. Thanks in advance.
The command you need is sortrows.
By default it sorts with respect to the first column, but an additional argument selects a different column (the 2nd, 5th, 17th, etc.).
If A is your original array:
B = sortrows(A,2);
will give you the array B with its rows sorted by the 2nd column.
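To make the difference between sorting whole rows by one column and sorting each column independently concrete, here is a small Python sketch. The matrix is made-up illustration data, and plain lists stand in for a MATLAB array.

```python
# Hypothetical 3x3 data; each inner list is one row
A = [
    [95, 45, 92],
    [23, 1, 73],
    [60, 82, 17],
]

# Sort whole rows by the second column (index 1); rows stay intact,
# which is the behaviour of sortrows(A,2)
B = sorted(A, key=lambda row: row[1])

# Sort every column independently; rows are NOT kept together,
# which is the behaviour of sort(A,1)
columns = [sorted(col) for col in zip(*A)]
C = [list(row) for row in zip(*columns)]
```

In B every row still holds the values it started with; in C the values of a row come from different original rows.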
What did you mean by sort with respect to second column? You should be more specific or at least give us an example.
If you need a simple sort on each column use the following
A =
95 45 92 41 13 1 84
23 1 73 89 20 74 52
60 82 17 5 19 44 20
48 44 40 35 60 93 67
89 61 93 81 27 46 83
76 79 91 0 19 41 1
Sort each column of A in ascending order:
c = sort(A, 1)
c =
23 1 17 0 13 1 1
48 44 40 5 19 41 20
60 45 73 35 19 44 52
76 61 91 41 20 46 67
89 79 92 81 27 74 83
95 82 93 89 60 93 84

How can I interrupt a 'loop' in kdb?

numb is a list of numbers:
q))input
42 58 74 51 63 23 41 40 43 16 64 29 35 37 30 3 34 33 25 14 4 39 66 49 69 13..
31 41 39 27 9 21 7 25 34 52 60 13 43 71 10 42 19 30 46 50 17 33 44 28 3 62..
15 57 4 55 3 28 14 21 35 29 52 1 50 10 39 70 43 53 46 68 40 27 13 69 20 49..
3 34 11 53 6 5 48 51 39 75 44 32 43 23 30 15 19 62 64 69 38 29 22 70 28 40..
18 30 60 56 12 3 47 46 63 19 59 34 69 65 26 61 50 67 8 71 70 44 39 16 29 45..
I want to iterate through each row and calculate the sum of the first 2, then 3, then 4 numbers, and so on. If that sum is greater than 1000, I want to stop the iteration on that particular row, jump to the next row, and do the same thing. This is my code:
{[input]
tot::tot+{[x;y]
if[1000<sum x;:count x;x,y]
}/[input;input]
}each numb
My problem here is that after the count of x is added to tot the over keeps going on the same row. How can I exit over and jump on the next row?
UPDATE: (QUESTION STILL OPEN) I do appreciate all the answers so far but I am not looking for an efficient way to sum the first n numbers. My question is how do I break the over and jump on the next line. I would like to achieve the same thing as with those small scripts:
C++
for (int i = 0; i <= 100; i++) {
    if (i == 50) { printf("for loop exited at: %i ", i); break; }
}
Python
for i in range(100):
    if i == 50:
        print(i)
        break
R
for(i in 1:100){
    if(i == 50){
        print(i)
        break
    }
}
I think this is what you are trying to accomplish.
sum {(x & sums y) ? x}[1000] each input
It takes a cumulative sum of each row and then the element-wise minimum between that sum and the input limit, thereby capping the output at the limit like so:
q)(100 & sums 40 43 16 64 29)
40 83 99 100 100
It then uses the ? operator to find the first occurrence of that limit (i.e. the element where the limit was equaled or passed). The result is 0-indexed: in the example the first 100 occurs after 3 elements. You might want to add one to include the element that reaches the limit in the count.
q)40 83 99 100 100 ? 100
3
And then it sums this count over all rows of the input.
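The same capped-running-sum idea can be sketched in Python, where the early exit is an explicit return from the loop. The function name first_index_reaching is made up for this illustration; like q's ?, it returns the row length when the limit is never reached.

```python
from itertools import accumulate

def first_index_reaching(row, limit):
    """0-based index of the first element at which the running sum reaches limit."""
    for idx, running in enumerate(accumulate(row)):
        if running >= limit:
            return idx  # stop scanning this row as soon as the limit is hit
    return len(row)  # limit never reached, mirroring q's `?` on a missing item

rows = [[40, 43, 16, 64, 29], [10, 20, 30]]
per_row = [first_index_reaching(r, 100) for r in rows]
total = sum(per_row)
```

For the example row 40 43 16 64 29 and a limit of 100 this gives 3, matching the q session above.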
You could use converge in this case to exit when you fail to satisfy a condition:
https://code.kx.com/q/ref/adverbs/#converge-repeat
The first parameter is a function that performs your check on the current value of x, which is the next value to be passed to the main function.
For your example I've made a projection over the input row and then increased the number of elements I am summing each time:
q)numb
98 11 42 97 89 80 73 35 4 30
86 33 38 86 26 15 83 71 21 22
23 43 41 80 56 11 22 28 47 57
q){[input] {x+1}/[{100>sum (y+1)#x}[input;];0] }each numb
1 1 2
This returns, for each row, the first index at which the running sum reaches 100.
However, this isn't really an ideal use case for kdb; the same result could instead be obtained with something like
(sums@/:numb) binr\: 100
Maybe your real example makes more sense.
You can use while loops in KDB although all KDB developers are generally too afraid of being openly mocked and laughed at for doing so
q){i:0;while[i<>50;i+:1];:"loop exited at ",string i}`
"loop exited at 50"
Kdb does have a "stop loop" mechanism but only in the case of a monadic function with single seed value
/keep squaring until number is no longer less than 1000, starting at 2
q){x*x}/[{x<1000};2]
65536
/keep dealing random numbers under 20 until you get an 18 (seed value 0 is irrelevant)
q){first 1?20}\[18<>;0]
0 19 17 12 15 10 18
However this doesn't really fit your use case and as other people have pointed out, this is not how you would/should solve this problem in kdb.
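The "keep applying a function while a condition holds" form of over corresponds to an ordinary while loop. A hedged Python sketch of that pattern follows; iterate_while is a made-up helper name, and the guard on the row length is an extra assumption added so the loop always terminates.

```python
def iterate_while(step, keep_going, seed):
    """Apply step repeatedly, starting from seed, for as long as keep_going(x) is true."""
    x = seed
    while keep_going(x):
        x = step(x)
    return x

# Keep squaring until the number is no longer less than 1000, starting at 2
squared = iterate_while(lambda x: x * x, lambda x: x < 1000, 2)

# Grow an index while the sum of the first i+1 elements is still below 100
row = [98, 11, 42, 97]
idx = iterate_while(
    lambda i: i + 1,
    lambda i: i < len(row) and sum(row[: i + 1]) < 100,  # length guard is an assumption
    0,
)
```

The first call mirrors {x*x}/[{x<1000};2] and the second mirrors the converge example applied to a single row.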

How to select and remove cells from a 2d matrix of cells in matlab

I have a 35x2 matrix (randomwords), and I have randomly selected 8 rows (rndm). What I need to do is remove the 8 selected rows from the randomwords matrix and save the resulting 27x2 matrix under a new variable name, but I am finding this extremely difficult. I have provided my code below. Any help would be greatly appreciated.
target = words ([30 1 46 14 44 55 8 3 57 65 69 70 57 39 21 60 22 20 16 10 9 17 62 19 25 41 49 53 36 6 42 58 40 56 63]);
synonym = words([43 15 32 28 72 27 48 51 13 67 59 33 35 47 52 61 71 7 23 12 2 66 11 37 4 45 64 38 34 31 29 18 50 68 26]);
% assigns these elements of words into targets and synonyms. They are
% ordered so that words and synonyms are corresponding elements of
% synonyms and targets
% TO SELECT 8 RANDOM WORDS FOR THE ENCODING PHASE
randomwords = [target; synonym]'; % should be a 35x2 matrix
rndm = datasample(randomwords, 8, 1); % should select 8 random couples from the rows and none of them will be repeats
unpaired = rndm(:,2); % should select only the synonyms to form the unpaired stimuli; will be different for each run
Store the indices of the removed rows in a variable, say removedrows (datasample returns the sampled indices as its second output), and then just do:
result = randomwords;
result(removedrows,:) = [];
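The key step is to sample row indices rather than rows, so that the same indices can be used both to pick the sample and to delete it. A small Python sketch of that idea follows; the word pairs are placeholder data and split_sample is a hypothetical helper, not part of the original code.

```python
import random

def split_sample(rows, k, rng):
    """Pick k distinct row indices; return (indices, sampled rows, remaining rows)."""
    picked = sorted(rng.sample(range(len(rows)), k))  # sampling without replacement
    picked_set = set(picked)
    sampled = [rows[i] for i in picked]
    remaining = [row for i, row in enumerate(rows) if i not in picked_set]
    return picked, sampled, remaining

# Placeholder 35x2 data standing in for the target/synonym pairs
randomwords = [(f"target{i}", f"synonym{i}") for i in range(35)]
removedrows, rndm, result = split_sample(randomwords, 8, random.Random(0))
unpaired = [synonym for _, synonym in rndm]
```

Because the indices are drawn without replacement, the sample never contains repeats and the remainder always has 27 rows.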

Matlab: Moving a percentage of one value to another in a matrix

t=3;
If I have matrix A (9x9xt):
1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18
19 20 21 22 23 24 25 26 27
28 29 30 31 32 33 34 35 36
37 38 39 40 41 42 43 44 45
46 47 48 49 50 51 52 53 54
55 56 57 58 59 60 61 62 63
64 65 66 67 68 69 70 71 72
73 74 75 76 77 78 79 80 81
And vector B (9x1xt):
0.5
0.6
0.7
0.5
0.6
0.7
0.5
0.6
0.7
For j=1:t, I'm trying to move a percentage of A(7,1,j) into A(7,2,j+1). The percentage is B(3,1,j) in this case.
I thought I should create a new value, m(1,1,j), which holds the percentage value: B(3,1,j) * A(7,1,j)...
m(1,1,j)= A(7,1,j)*B(3,1,j); %# Find out what the percentage of A(7,1,j) is.
A(7,2,j+1)= A(7,2,j)+m(1,1,j); %# Add that ''percentaged'' value to the A(7,2,j+1)
A(7,1,j+1) = A(7,1,j)-m(1,1,j); %# Remove that ''percentaged'' value from A(7,1,j+1)
This, however, does not work. m(1,1,j) doesn't actually seem to equal A(7,1,j)*B(3,1,j) when I type ''m(1,1,j)''..
Does anyone have a better and simple idea in how to move a percentage of one value in a matrix into another for the next timestep...That percentage must be removed from one value and added to the other.
Edit: Is this possible to complete in a loop?
There are apparently two problems. The first is the j+1 in your last line (as indicated by @Sam). The second is that you mistakenly increase the j-th item by j*m(1,1,j) and not by m(1,1,j). This happens because you add to the next element, move on to it, and then add the accumulated amount again. A corrected vectorized version:
t=3;
A = repmat(reshape(1:81,9,9)',[1,1,t]);
B = repmat([0.5 0.6 0.7 0.5 0.6 0.7 0.5 0.6 0.7]', [1,1,t]);
m(1,1,1:t)= A(7,1,1:t).*B(3,1,1:t); %# Find out what the percentage of A(7,1,j) is.
A(:,:,t+1)=0; % Add zeros matrix at A(:,:,t+1)
A(7,2,2:t+1) = A(7,2,2:t+1)+m(1,1,1:t); %# Add that ''percentaged'' value to the A(7,2,j+1)
A(7,1,1:t) = A(7,1,1:t)-m(1,1,1:t); %# Remove that ''percentaged'' value from A(7,1,j+1)
Note: your original code also increases the size of A.
Should your last line read:
A(7,1,j) = A(7,1,j)-m(1,1,j);
instead of
A(7,1,j+1) = A(7,1,j)-m(1,1,j);
As to whether there's a better way to do this - I'm not sure, as I'm not sure what you're ultimately trying to do. I would guess that if you're trying to carry out this operation for all rows, or all columns, or repeatedly in some other way, then there would be a vectorized way of doing it rather than a for loop.
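For what it is worth, one possible reading of the question is a two-compartment transfer in which, at every timestep, a fixed fraction of the source value moves to the target value and both results are carried into the next step. The Python sketch below implements only that reading; the function name, the starting values and the fraction are assumptions made for illustration, and it is not a translation of the MATLAB code above.

```python
def transfer_over_time(source0, target0, fraction, steps):
    """Move `fraction` of the source into the target at each timestep.

    Returns two lists holding the values at timesteps 0..steps.
    """
    source, target = [source0], [target0]
    for _ in range(steps):
        moved = source[-1] * fraction       # amount leaving the source this step
        source.append(source[-1] - moved)   # removed from the source...
        target.append(target[-1] + moved)   # ...and added to the target
    return source, target

# Hypothetical values: A(7,1)=55, A(7,2)=56, fraction 0.5, two steps
source, target = transfer_over_time(55.0, 56.0, 0.5, 2)
```

Under this reading the total of the two compartments is conserved at every step, which is a convenient sanity check.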