I'm trying to create a table in KDB where the columns are the results of a query. For example, I have loaded in stock data and want to find what prices the stock traded at in a given time window. I created a function
getTrades[Symbol; Date; StartTime; StopTime]
This will search through my database and return the prices that traded between the start and stop time. So my results for Apple for a 30 second window might be:
527.10, 527.45, 527.60, 526.90 etc.
What I want to do now is create a table using xbar where the rows are every second and the columns are all the prices that traded between StartTime and StopTime. I will then place an X in a column if that price traded in that 1 second. I think I can handle most of this, but the main thing I'm struggling with is converting the results I got above into the column names of the table. I'm also struggling with how to make it flexible, so that my table has 5 columns in one scenario (5 prices traded) but 10 in another; essentially, the number of columns varies with how many price levels traded in the window I'm searching.
Thanks.
The best and cleanest way to do programmatic selects is with the functional form of select.
From Q for Mortals:
?[t;c;b;a]
where t is a table, a is a dictionary of aggregates, b is a dictionary of groupbys and c is a list of constraints.
In other words, select a by b from t where c.
This will allow you to dynamically create a, which can be of arbitrary size.
You can find more information here:
http://code.kx.com/q4m3/9_Queries_q-sql/#912-functional-forms
Pivot Table
I think that a pivot table will be suitable in this case. Using jgleeson's example:
time price
------------------
11:27:01.600 106
11:27:02.600 102
11:27:02.600 102
11:27:03.100 100
11:27:03.100 102
11:27:03.100 102
11:27:03.100 104
11:27:03.600 104
11:27:03.600 102
11:27:04.100 106
11:27:05.100 105
11:27:06.600 106
11:27:07.100 101
11:27:07.100 104
11:27:07.600 105
11:27:07.600 105
11:27:07.600 101
not null exec (exec `$string asc distinct price from s)#(`$string price)!price by time:1 xbar time.second from s:select from t where time within 11:27:00 11:27:30
and returns:
time | 100 101 102 103 104 105 106
--------| ---------------------------
11:27:01| 0 0 0 0 0 0 1
11:27:02| 0 0 1 0 0 0 0
11:27:03| 1 0 1 0 1 0 0
11:27:04| 0 0 0 0 0 0 1
11:27:05| 0 0 0 0 0 1 0
11:27:06| 0 0 0 0 0 0 1
11:27:07| 0 1 0 0 1 1 0
It can support any number of unique prices.
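For readers who want to check the pivot logic outside q, here is a minimal Python sketch of the same idea: bucket each trade into its second, then flag which price levels appear in each bucket. The function name `pivot_by_second` is hypothetical, and the rows are copied from the sample table above, so there is no 103 column because 103 does not occur in those rows.

```python
from collections import defaultdict

# (time, price) rows copied from the sample table above.
trades = [
    ("11:27:01.600", 106), ("11:27:02.600", 102), ("11:27:02.600", 102),
    ("11:27:03.100", 100), ("11:27:03.100", 102), ("11:27:03.100", 102),
    ("11:27:03.100", 104), ("11:27:03.600", 104), ("11:27:03.600", 102),
    ("11:27:04.100", 106), ("11:27:05.100", 105), ("11:27:06.600", 106),
    ("11:27:07.100", 101), ("11:27:07.100", 104), ("11:27:07.600", 105),
    ("11:27:07.600", 105), ("11:27:07.600", 101),
]


def pivot_by_second(rows):
    """Return (sorted price levels, {second: [0/1 flag per price level]})."""
    levels = sorted({price for _, price in rows})
    seen = defaultdict(set)
    for time, price in rows:
        # Truncate "HH:MM:SS.mmm" to "HH:MM:SS" (the 1-second bucket).
        seen[time[:8]].add(price)
    grid = {sec: [int(p in seen[sec]) for p in levels] for sec in sorted(seen)}
    return levels, grid


levels, grid = pivot_by_second(trades)
print("time    ", *levels)
for sec, flags in grid.items():
    print(sec, *flags)
```

The number of columns is derived from the data, so the same code handles 5 price levels or 10 without changes.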
This looks a bit convoluted... but I think this might be what you're after.
Sample table t with time and price columns:
t:`time xasc([]time:100?(.z.T+500*til 100);price:100?(100 101 102 103 104 105 106))
This table should replicate what you get from the first step of your function call - "select time,price from trade where date=x, symbol=y, starttime=t1, endtime=t2".
To return the table in the format specified:
q) flip (`time,`$string[c])!flip {x,'y}[key a;]value a:{x in y}[c:asc distinct tt`price] each group (!) . reverse value flip tt:update time:time.second from t
time 100 101 102 103 104 105 106
------------------------------------
20:34:29 0 1 0 0 0 1 0
20:34:30 0 0 0 0 0 0 1
20:34:31 0 0 1 0 0 0 0
20:34:32 0 0 1 0 1 0 0
...
This has bools instead of X as bools are probably easier to work with.
Also please excuse the one-liner... If I get a chance I'll break it up and try to make it more readable.
A simpler version is:
q)t:`time xasc([] s:100#`s ; time:100?(.z.T+500*til 100);price:100?(100 101 102 103 104 105 106))
q)t1:update `$string price,isPrice:1b from t
q)p:(distinct asc t1`price)
q)exec p#(10b!"X ")#(price!isPrice) by time:1 xbar time.second from t1
time | 100 101 102 103 104 105 106
--------| ---------------------------
20:39:00| X X X
20:39:01| X X X X
20:39:02| X
20:39:04| X
20:39:05| X X X X
I am trying to implement a moving rank function, taking parameters of n, the number of items, and m, the column name. Here is how I implement it:
mwindow: k){[y;x]$[y>0;x#(!#x)+\:!y;x#(!#x)+\:(!-y)+y+1]};
mrank: {[n;x] sum each x > prev mwindow[neg n;x]};
But this seems to take quite some time if n is moderately large, say 100.
I figure it is because it has to calculate from scratch, unlike msum, which keeps a running variable and only calculates the difference between the newly added and the dropped items.
There's a number of general sliding window functions here that you can use to generate rolling lists on which to apply your rank: https://code.kx.com/q/kb/programming-idioms/#how-do-i-apply-a-function-to-a-sequence-sliding-window
Those approaches seem to fill the lists out with zeros/nulls, however, which I think won't really suit your use of rank. Here's another possible approach which might be more suitable for rank (though I haven't tested its performance at scale):
q)mwin:{x each (),/:{neg[x]sublist y,z}[y]\[z]}
q)update r:mwin[rank;4;c] from ([]c:10?100)
c r
----------
84 ,0
25 1 0
31 2 0 1
0 3 1 2 0
51 1 2 0 3
29 2 0 3 1
25 0 3 2 1
73 2 1 0 3
0 2 1 3 0
6 2 3 0 1
q)update r:last each mwin[rank;4;c] from ([]c:10?100)
c r
----
38 0
72 1
13 0
77 3
64 1
9 0
37 1
79 3
97 3
63 1
q)
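As an illustration of the incremental idea raised in the question (updating the window rather than recomputing it), here is a minimal Python sketch. It is not a q implementation, and `moving_rank` is a hypothetical name. It keeps a sorted copy of the previous n-1 values and uses binary search to find the rank of each new value, with ties ranked after earlier equal values, which is how the last element behaves under q's rank.

```python
from bisect import bisect_left, bisect_right, insort


def moving_rank(values, n):
    """Rank of each value within the window of the last n values (0-based)."""
    window = []  # sorted copy of the values currently in the window
    ranks = []
    for i, v in enumerate(values):
        if i >= n:
            # Drop the value that has just left the window.
            old = values[i - n]
            del window[bisect_left(window, old)]
        # Number of earlier window values <= v is the rank of v.
        ranks.append(bisect_right(window, v))
        insort(window, v)
    return ranks


# The c column from the second q example above, with a window of 4.
c = [38, 72, 13, 77, 64, 9, 37, 79, 97, 63]
print(moving_rank(c, 4))
```

On that input the sketch reproduces the r column shown above. The list insert and delete are still linear in the window size, so this is only a sketch of the approach, not a tuned implementation.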
I am trying to perform Anosim {vegan} on my ecological data and I keep getting the same error message. I don't think this is a duplicate question from another one already posted and would like to fully show what's happening.
I have got my numeric dataframe ("sps") consisting of 17 rows (sites) and 313 columns (species), and a second dataframe ("env.data") containing a column with 17 factors. I would therefore want to test if there are any significant differences between my 17 groups.
Here is a sample of my data:
> sps[,2:5]
A. faranauti A. tecta A. lyra A. arbuscula
Sargasso Sea 0 0 2 0
Equatorial Brazil 0 0 0 0
Canarias Sea 0 0 0 0
Corner Seamounts 0 0 0 2
Gulf of Mexico 0 0 0 0
Labrador Sea 0 0 0 0
Equatorial Africa 0 0 0 0
Tropic Seamount 0 0 0 107
NewEngland Seamount Chain 0 0 0 0
Norwegian Basin 0 0 0 0
Eastern North Atlantic 0 0 3 0
Logachev and BritishIsles 0 0 0 4
Reykjanes Ridge 0 0 0 0
MAR North 0 0 0 14
Flemish Cap 0 0 0 217
MAR South 1 1 0 0
Azores Seamount Chain 0 0 0 12
> class(sps)
[1] "data.frame"
> head(env.data)
idcell geo_area
1 1 Sargasso Sea
2 2 Equatorial Brazil
3 3 Canarias Sea
4 4 Corner Seamounts
5 5 Gulf of Mexico
6 6 Labrador Sea
> str(env.data)
'data.frame': 17 obs. of 2 variables:
$ idcell : Factor w/ 17 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
$ geo_area: Factor w/ 17 levels "Canarias Sea",..: 15 5 1 2 7 8 4 17 12 13..
Following {vegan}, I have first calculated a dissimilarity matrix with Sorensen as the distance method. I then use this dissimilarity matrix as my input for anosim:
dist.sorensen <- vegdist(sps, method= "bray", binary = TRUE, na.rm= TRUE,
diag = TRUE)
sorensen.anosim <- anosim(dat=dist.sorensen, env.data$geo_area, permutations
= 999)
> summary(sorensen.anosim )
Call:
anosim(dat = dist.sorensen, grouping = env.data$geo_area, permutations =
999)
Dissimilarity: binary bray
ANOSIM statistic R:
Significance: 0.001
Permutation: free
Number of permutations: 999
Error in sort.int(x, na.last = na.last, decreasing = decreasing, ...) :
'x' must be atomic
I have also tried anosim with the raw species data and I get the same error:
raw.anosim <- anosim(sps, env.data$geo_area, permutations = 999, distance =
"bray")
Any ideas? My "sps" dataframe (x) is numeric. My "env.data" dataset (groupings) has a factor column with 17 levels. I can't see where the error comes from, unless it's intrinsic to my data. Many of the 313 species listed in my original dataframe have been recorded only once across my 17 sites (very probably due to sampling bias). However, I get clusters after performing "vegdist (Sorensen index)" and "hclust".
whos.exit   whos   condition1   result
650         452    1            0
654         456    0            0
254         650    1            1
785         412    1            0
756         654    1            1
            744    0            0
            125    1            0
            985    1            0
...         ...    ...
I wish to obtain the result matrix.
The result matrix marks all "whos" which satisfy condition1 and are present in whos.exit, in no particular order. Note: all elements in whos.exit are unique, and the result of whos(condition1) will give unique whos.
You can use ismember -
result = ismember(whos,whos.exit).*condition1
Or bsxfun -
result = any(bsxfun(@eq,whos,whos.exit.'),2).*condition1
Since whos is an in-built command in MATLAB, I would suggest using some other variable name there as a matter of good practice.
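The membership-and-condition logic can be sketched in plain Python as a cross-check. This assumes the question's table is read with whos.exit holding five entries and whos and condition1 holding eight; the names `whos_exit` and `member_and_condition` are hypothetical stand-ins, not part of the original code.

```python
# Values read from the question's table (assumed column layout, see above).
whos = [452, 456, 650, 412, 654, 744, 125, 985]
whos_exit = [650, 654, 254, 785, 756]
condition1 = [1, 0, 1, 1, 1, 0, 1, 1]


def member_and_condition(values, allowed, condition):
    """1 where the value is in `allowed` and its condition flag is 1, else 0."""
    allowed_set = set(allowed)  # set lookup, analogous in spirit to ismember
    return [int(v in allowed_set and c == 1)
            for v, c in zip(values, condition)]


result = member_and_condition(whos, whos_exit, condition1)
print(result)
```

Under that reading of the table, the output lines up with the result column in the question.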
You could use intersect
intersect(whos.exit,whos.*condition1)
ans =
650
654
Or if you want a binary array (not as elegant as ismember, though)
A=zeros(size(whos.exit,1),1);
[~,~,iwe]=intersect(whos,whos.exit);
A(iwe) = 1;
A.*c1
ans =
0
0
1
0
1
0
0
0
or
[~,~,iwe]=intersect(whos,whos.exit);
sum((((c1.*whos.exit)./whos.exit(iwe)')==1)')'
ans =
0
0
1
0
1
0
0
0
Details
Find the indices in whos.exit whose values are in both arrays.
[~,~,iwe]=intersect(whos,whos.exit)
iwe =
3
5
Find where those values are. I just use a division, because a value divided by itself gives 1, and that tells us where the values are. Each row represents a value we are looking for and each column a location of this value. The first value (whos.exit(iwe(1))) is located at position 3 and the second (whos.exit(iwe(2))) is located at position 5.
(((c1.*whos.exit)./whos.exit(iwe)')==1)'
ans =
0 0 1 0 0 0 0 0
0 0 0 0 1 0 0 0
Then we just sum and transpose that to get the binary array
sum((((c1.*whos.exit)./whos.exit(iwe)')==1)')'
ans =
0
0
1
0
1
0
0
0
I have a .mat file containing two columns, "Product" and "Customer". A customer number is repeated as many times as that customer purchased different products. The table looks like this:
Product Customer
114 1
112 2
112 1
113 4
115 3
113 2
111 2
113 3
And I need to make it like this:
Customer 111 112 113 114 115
1 0 1 0 1 0
2 1 1 1 0 0
3 0 0 1 0 1
4 0 0 1 0 0
The new table has to have a "Customer" column and five more columns, one for each product. If customer "1" bought product "112" there should be a 1, and if he didn't buy it there should be a 0.
How can I do it with MATLAB? Any help would be very nice!
This is a classic case for accumarray.
>> product = [114, 112, 112, 113, 115, 113, 111, 113]';
>> customer = [1, 2, 1, 4, 3, 2, 2, 3]';
>> [~,~,ic] = unique(product);
>> accumarray([customer, ic], 1)
ans =
0 1 0 1 0
1 1 1 0 0
0 0 1 0 1
0 0 1 0 0
Here we use unique to work out the unique product IDs, and the third output is the mapping from the product vector to the unique ID.
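For comparison, the same mapping step (unique product IDs to column positions, then one cell per purchase) can be sketched in plain Python. This is not a port of accumarray: it sets flags rather than summing counts, which gives the same matrix here because no customer/product pair repeats in the sample data. The function name `purchase_matrix` is hypothetical.

```python
# Sample data from the question.
product = [114, 112, 112, 113, 115, 113, 111, 113]
customer = [1, 2, 1, 4, 3, 2, 2, 3]


def purchase_matrix(products, customers):
    """Return (customer ids, product ids, 0/1 matrix of who bought what)."""
    product_ids = sorted(set(products))    # plays the role of unique(product)
    customer_ids = sorted(set(customers))
    col = {p: j for j, p in enumerate(product_ids)}
    row = {c: i for i, c in enumerate(customer_ids)}
    matrix = [[0] * len(product_ids) for _ in customer_ids]
    for p, c in zip(products, customers):
        matrix[row[c]][col[p]] = 1         # flag the purchase
    return customer_ids, product_ids, matrix


customer_ids, product_ids, matrix = purchase_matrix(product, customer)
print("Customer", *product_ids)
for c, flags in zip(customer_ids, matrix):
    print(c, *flags)
```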
Say N_of_pr is the total number of products, N_of_cus is the total number of customers, and tab is the first two-column table you have. The resulting binary matrix is M
pr=zeros(1,N_of_pr);
cus=zeros(1,N_of_cus);
s=size(tab);
for j=1:s(1)
pr(tab(j,1))=1;
cus(tab(j,2))=1;
end;
[X,Y]=meshgrid(pr,cus);
M=X.*Y;
You could either use basic MATLAB commands like sparse
table = sparse(Customer, Product, 1);
or something like grpstats from the statistics toolbox.
t = table(Product, Customer);
grpstats(t, {'Customer','Product'})
This doesn't yield exactly the table you want, but I guess you could still achieve your goal with that.
There is also a submission called pivottable on the File Exchange, that will do what you want:
pivottable([Customer, Product, ones(size(Product))], 1, 2, 3, @sum)
How to find the indices of rows without any zero in a matrix?
Example:
A = [
14 0 6 9 8 17
85 14 1 3 0 99
0 0 0 0 0 0
29 4 5 8 7 46
0 0 0 0 0 0
17 0 5 0 0 49
]
the desired result :
V =[4]
Since Adiel did not post an answer, I'll make their comment a CW: the command
V = find(all(A,2))
does the job, because all(A,2) processes every row, returning 1 only if every entry in that row is nonzero. Then find returns the indices of the nonzero entries of that result, which are the desired row numbers.
Similarly, V = find(all(A,1)) works column-wise.
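The same test can be written in plain Python as a quick sanity check of the logic: a row qualifies only if all of its entries are nonzero. The helper names are hypothetical, and indices are reported 1-based to match MATLAB.

```python
# The example matrix from the question.
A = [
    [14, 0, 6, 9, 8, 17],
    [85, 14, 1, 3, 0, 99],
    [0, 0, 0, 0, 0, 0],
    [29, 4, 5, 8, 7, 46],
    [0, 0, 0, 0, 0, 0],
    [17, 0, 5, 0, 0, 49],
]


def rows_without_zeros(matrix):
    """1-based indices of rows in which every entry is nonzero."""
    return [i for i, row in enumerate(matrix, start=1) if all(row)]


def columns_without_zeros(matrix):
    """1-based indices of columns in which every entry is nonzero."""
    return [j for j, col in enumerate(zip(*matrix), start=1) if all(col)]


print(rows_without_zeros(A))
print(columns_without_zeros(A))
```

For this matrix only row 4 qualifies, and no column does, since rows 3 and 5 are entirely zero.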