Inconsistent Results Using Loopback Count End-Point With Filter+Where vs. Just Where

We're getting unexpected and inconsistent results when using Loopback's count end-point: filter[where] always returns the total number of rows in the table, ignoring any filter. For example, the following call always returns a value of 9, which is not correct given the data below:
/api/class/count?filter[where][companyrowid]=1
Class Table Data (MySQL):

rowid  description          companyrowid
3367   test1                0
3366   test2                0
3364   Asia Division        1
3365   Australia Division   1
3362   Canada Division      1
3363   Europe Division      1
3359   US East Division     1
3361   US Midwest Division  1
3360   US West Division     1
Strangely, using 'where[companyrowid]=1' returns the correct value of 7. Close inspection of the HTTP request object shows that filter[where][...] and where[...] filters are actually stored separately and yield different results:

/api/class/count?filter[where][companyrowid]=1
req.query.filter.where = {companyrowid: "1"}
req.query.where = undefined
Result: 9 (incorrect)

/api/class/count?where[companyrowid]=1
req.query.filter.where = undefined
req.query.where = {companyrowid: "1"}
Result: 7 (correct)
The biggest issue is that filter[where] is ignored entirely, so the count is always taken over the whole table.

The count() method only takes a where clause, not a full filter object, which is why the filter[where] variant falls back to counting every row. So your REST call should look like this:

/api/class/count?where[companyrowid]=1
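The same rule applies if you call count() from the Node API: pass the where object directly. A minimal sketch (assuming your model is named Class, to match the /api/class route):

// count() takes a bare where object, not {where: {...}}
Class.count({companyrowid: 1}, function (err, count) {
  if (err) throw err;
  console.log(count); // 7 with the data above
});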

Equal not throwing error during list comparison in KDB

I have a table:
t:([] s:`GOOG`APPL; d:2?.z.d; t:2?.z.t)
I use = when selecting a single record:
q)select from t where s=`GOOG
s d t
----------------------------
GOOG 2017.01.25 04:30:47.898
However, I expect an error when selecting multiple records using '=' (because I should be using 'in'), but it returns the correct data rather than throwing an error:
q)select from t where s=`GOOG`APPL
s d t
----------------------------
GOOG 2017.01.25 04:30:47.898
APPL 2007.12.27 04:07:38.923
In the first case you are comparing a vector to an atom. = is overloaded for this operation, so it compares each element of the vector to that atom:
q)t[`s]
`GOOG`APPL
q)t[`s]=`GOOG
10b
In the second case, where you expect the error, = is doing a vector-to-vector comparison. Since the lengths of both vectors are the same, it works and does not throw a length error:
q)t[`s]=`GOOG`APPL
11b
Changing the order won't return anything, due to the element-by-element comparison:
q)t[`s]=`APPL`GOOG
00b
For example, for the following table with 3 rows,
t:([] s:`GOOG`APPL`HP; d:3?.z.d; t:3?.z.t)
q)t[`s]
`GOOG`APPL`HP
you'd get a length error
q)t[`s]=`GOOG`APPL
'length
and therefore using in fixes the error:
q)t[`s] in `GOOG`APPL
110b
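So the robust form of the query, which works whether you filter on one symbol or several (of any length), is:
q)select from t where s in `GOOG`APPL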

Reshaping and merging simulations in Stata

I have a dataset, which consists of 1000 simulations. The output of each simulation is saved as a row of data. There are variables alpha, beta and simulationid.
Here's a sample dataset:
simulationid  beta         alpha
1             0.025840106  20.59671241
2             0.019850549  18.72183088
3             0.022440886  21.02298228
4             0.018124857  20.38965861
5             0.024134726  22.08678021
6             0.023619479  20.67689981
7             0.016907209  17.69609466
8             0.020036455  24.6443037
9             0.017203175  24.32682682
10            0.020273349  19.1513272
I want to estimate a new value - let's call it new - which depends on alpha and beta as well as different levels of two other variables which we'll call risk and price. Values of risk range from 0 to 100, price from 0 to 500 in steps of 5.
What I want to achieve is a dataset holding, for each combination of risk and price, the probability (across the simulations) that new is greater than 0.
I can achieve this using the code below. However, the reshape process takes more hours than I'd like, and it seems to me like something that could be completed a lot quicker. So my question is either:
i) is there an efficient way to generate multiple datasets from a single row of data without multiple reshapes, or
ii) am I going about this in totally the wrong way?
set maxvar 15000
/* Input sample data */
input simulationid beta alpha
1 0.025840106 20.59671241
2 0.019850549 18.72183088
3 0.022440886 21.02298228
4 0.018124857 20.38965861
5 0.024134726 22.08678021
6 0.023619479 20.67689981
7 0.016907209 17.69609466
8 0.020036455 24.6443037
9 0.017203175 24.32682682
10 0.020273349 19.1513272
end
forvalues risk = 0(1)100 {
    forvalues price = 0(5)500 {
        gen new_r`risk'_p`price' = `price' * (`risk'/200) * beta - alpha
        gen probnew_r`risk'_p`price' = 0
        replace probnew_r`risk'_p`price' = 1 if new_r`risk'_p`price' > 0
        sum probnew_r`risk'_p`price', meanonly
        gen mnew_r`risk'_p`price' = r(mean)
        drop new_r`risk'_p`price' probnew_r`risk'_p`price'
    }
}
drop if simulationid > 1
save simresults.dta, replace
forvalues risk = 0(1)100 {
    clear
    use simresults.dta
    reshape long mnew_r`risk'_p, i(simulationid) j(price)
    keep simulationid price mnew_r`risk'_p
    rename mnew_r`risk'_p risk`risk'
    save risk`risk'.dta, replace
}
clear
use risk0.dta
forvalues risk = 1(1)100 {
    merge m:m price using risk`risk'.dta, nogen
    save merged.dta, replace
}
Here's a start on your problem.
So far as I can see, you don't need more than one dataset.
The various reshapes and merges just rearrange what was first generated and that can be done within one dataset.
The code here in the first instance is for just one pair of values of alpha and beta. To simulate 1000 such pairs, you would need 1000 times more observations, i.e. about 10 million, which is not usually a problem, and you would need to loop over the alphas and betas. But the loop can be tacit. We'll get to that.
This code has been run and is legal. It's limited to one alpha, beta pair.
clear
input simulationid beta alpha
1 0.025840106 20.59671241
2 0.019850549 18.72183088
3 0.022440886 21.02298228
4 0.018124857 20.38965861
5 0.024134726 22.08678021
6 0.023619479 20.67689981
7 0.016907209 17.69609466
8 0.020036455 24.6443037
9 0.017203175 24.32682682
10 0.020273349 19.1513272
end
local N = 101 * 101                    // 101 risk values x 101 price values
set obs `N'
egen risk = seq(), block(101)          // 1 repeated 101 times, then 2, ...
replace risk = risk - 1                // shift to 0..100
egen price = seq(), from(0) to(100)    // cycles 0..100 within each risk block
replace price = 5 * price              // now 0(5)500
gen result = (price * (risk/200) * beta[1] - alpha[1]) > 0
bysort price risk: gen mean = sum(result)     // running sum within group
by price risk: replace mean = mean[_N]/_N     // i.e. the group mean
Assuming now that you first read in 1000 values, here is a sketch of how to get the whole thing. This code has not been tested. That is, your dataset starts with 1000 observations; you then enlarge it to about 10 million and get your results. The tricky part is using an expression for the subscript to ensure that each block of results is for a distinct alpha, beta pair. That's not compulsory; you could do it in a loop, but then you would need to generate outside the loop and replace within it.
local N = 101 * 101 * 1000             // 10,201 (risk, price) cells per simulation
set obs `N'
egen risk = seq(), block(101)
replace risk = risk - 1
egen price = seq(), from(0) to(100)
replace price = 5 * price
egen sim = seq(), block(10201)         // simulation id: 1 for the first 10,201 obs, etc.
gen result = (price * (risk/200) * beta[ceil(_n/10201)] - alpha[ceil(_n/10201)]) > 0
bysort sim price risk: gen mean = sum(result)
by sim price risk: replace mean = mean[_N]/_N
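If what you ultimately want is the probability, across the 1000 simulations, that new is greater than 0 for each (risk, price) pair, a final collapse would give it directly. A sketch (untested, like the code above):

* prob = share of simulations in which new > 0, per (risk, price) cell
collapse (mean) prob=result, by(risk price)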
Other devices used: egen to set up in blocks; getting the mean without repeated calls to summarize; using a true-or-false expression directly.
NB: I haven't tried to understand what you are doing, but it seems to me that the price-risk-simulation conditions define single values, so calculating a mean looks redundant. But perhaps that is in the code because you wish to add further detail to the code once you have it working.
NB2: This seems a purely deterministic calculation. Not sure that you need this code at all.

kdb+/q: Apply iterative procedure with updated variable to a column

Consider the following procedure f:{[x] ..} with starting value a:0:
1. Do something with x and a. The output is saved as the new version of a, and the output is returned by the function.
2. For the next input x, redo the procedure, but now with the new a.
For a single value x, this procedure is easily constructed. For example:
a:0;
f:{[x] a::a+x; :a} / A simple example (actual function more complicated)
However, how do I make such a function work when it is applied to a table column?
I am clueless about how to incorporate this 'intermediate saving of a variable' step into a function that can be applied to a whole column at once. Is there a special technique for this? E.g. when I use a table column in the example above, it simply calculates a+x with a:0 for every row, as opposed to also updating a at each iteration.
There's no need to use global variables for this; you can use scan instead (see here).
Example --
Generate a table -
q)t:0N!([] time:5?.z.p; sym:5?`3; price:5?100f; size:5?10000)
time sym price size
-----------------------------------------------
2002.04.04D18:06:07.889113280 cmj 29.07093 3994
2007.05.21D04:26:13.021438816 llm 7.347808 496
2010.10.30D10:15:14.157553088 obp 31.59526 1728
2005.11.01D21:15:54.022395584 dhc 34.10485 5486
2005.03.06D21:05:07.403334368 mho 86.17972 2318
Example with a simple accumulator. Note that the function has access to the other arguments if needed (see the next example):
q)update someCol:{[a;x;y;z] (a+1)}\[0;time;price;size] from t
time sym price size someCol
-------------------------------------------------------
2002.04.04D18:06:07.889113280 cmj 29.07093 3994 1
2007.05.21D04:26:13.021438816 llm 7.347808 496 2
2010.10.30D10:15:14.157553088 obp 31.59526 1728 3
2005.11.01D21:15:54.022395584 dhc 34.10485 5486 4
2005.03.06D21:05:07.403334368 mho 86.17972 2318 5
Say you wanted to get the cumulative size:
q)update cuSize:{[a;x;y;z] (a+z)}\[0;time;price;size] from t
time sym price size cuSize
------------------------------------------------------
2002.04.04D18:06:07.889113280 cmj 29.07093 3994 3994
2007.05.21D04:26:13.021438816 llm 7.347808 496 4490
2010.10.30D10:15:14.157553088 obp 31.59526 1728 6218
2005.11.01D21:15:54.022395584 dhc 34.10485 5486 11704
2005.03.06D21:05:07.403334368 mho 86.17972 2318 14022
If you want more than one variable passed through the scan, you can pack more values into the first argument by giving it a more complex structure:
q)update cuPriceAndSize:{[a;x;y;z] (a[0]+y;a[1]+z)}\[0 0;time;price;size] from t
time sym price size cuPriceAndSize
--------------------------------------------------------------
2002.04.04D18:06:07.889113280 cmj 29.07093 3994 29.07093 3994
2007.05.21D04:26:13.021438816 llm 7.347808 496 36.41874 4490
2010.10.30D10:15:14.157553088 obp 31.59526 1728 68.014 6218
2005.11.01D21:15:54.022395584 dhc 34.10485 5486 102.1188 11704
2005.03.06D21:05:07.403334368 mho 86.17972 2318 188.2986 14022
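If you then want the packed pairs back as separate columns, you can index into the list of pairs afterwards. A small sketch building on the query above (t2 is a hypothetical name for its result):

q)t2:update cuPriceAndSize:{[a;x;y;z] (a[0]+y;a[1]+z)}\[0 0;time;price;size] from t
q)update cuPrice:cuPriceAndSize[;0], cuSize:cuPriceAndSize[;1] from t2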
@MdSalih's solution is correct; I am just explaining here what the likely problem is with the global variable in your case, and the solution for that.
q) t:([]id: 1 2)
q)a:1
I think you might have been using it like this:
q) select k:{x:x+a;a::a+1;:x} id from t
output:
k
--
1
2
And a's value is 2, which means the function executed only once. The reason is that we passed the full id column (a list) to the function, and (+) is atomic, which means it operates on the full list at once. In the following example, 2 gets added to every item in the list:
q) 2 + (1;3;5)
3 5 7
The correct way to use it is with 'each':
q)select k:{x:x+a;a::a+1;:x} each id from t
output:
k
--
2
3
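As an aside, for the simple running-total example in the question you don't need state or each at all: the built-in sums (the scan +\) does it in one vectorised step:

q)select k:sums id from t
k
--
1
3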

Why does calling random() inside of a case statement produce unexpected results?

https://gist.github.com/anonymous/2463d5a8ee2849a6e1f5
Query 1 does not produce the expected results. However, queries 2 and 3 do. Why does moving the call to random() outside of the case statement matter?
Consider the first expression:
select (case when round(random()*999999) + 1 between 000001 and 400000 then 1
             when round(random()*999999) + 1 between 400001 and 999998 then 2
             when round(random()*999999) + 1 between 999999 and 999999 then 3
             else 4
        end)
from generate_series(1, 8000000)
Presumably, you are thinking that the value "4" should almost never be selected. But, the problem is that random() is being called separately for each when clause.
So the chance that it fails each clause is independent:
About 60% of the time a random number will not match "1".
About 40% of the time a random number will not match "2".
About 99.9999% of the time a random number will not match "3" (I apologize if the number of nines is off, but the value is practically 1).
That means that about 24% of the time (60% * 40% * 99.9999%), the value "4" will appear. In fact, the first query returns "4" 23.98% of the time. That is very close to the predicted value; given this size of data it is a bit further off than I would expect, but it is close enough to explain what is happening.
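The usual fix (and presumably what queries 2 and 3 in the gist do) is to evaluate random() once per row in a derived table and then test that single value:

select (case when r between 1 and 400000 then 1
             when r between 400001 and 999998 then 2
             when r = 999999 then 3
             else 4
        end)
from (select round(random()*999999) + 1 as r
      from generate_series(1, 8000000)) s  -- random() is volatile, so r is computed once per row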

SSRS expression evaluation not working

A shared data set returns two columns: DaysEarly and NumberShipped. I want to use this in a table to show the number of shipments that were early, on time, and late. The data looks like this:
DaysEarly  NumberShipped
 3         123
 2         234
 1         254
 0         542    -- on-time shipments
-1          43
-2          13
The table uses this dataset. I have the following expressions:
-- Early shipments
=IIf(Fields!DaysEarly.Value > 0, Sum(Fields!NumberShipped.Value), Nothing)
-- On-time shipments
=IIf(Fields!DaysEarly.Value = 0, Sum(Fields!NumberShipped.Value), Nothing)
-- Late shipments
=IIf(Fields!DaysEarly.Value < 0, Sum(Fields!NumberShipped.Value), Nothing)
The last expression sums all shipments. The first two return the following message:
The Value expression for the textrun...contains an error:
The query returned no rows for the dataset. The expression
therefore evaluates to null.
What is the correct way to do this? I am looking for a result like this based on the data above:
Early shipments: 611
On time shipments: 542
Late shipments: 56
You were on the right track; you just need to move the IIf expression inside the Sum expression:
Early:
=Sum(IIf(Fields!DaysEarly.Value > 0, Fields!NumberShipped.Value, Nothing))
On Time:
=Sum(IIf(Fields!DaysEarly.Value = 0, Fields!NumberShipped.Value, Nothing))
Late:
=Sum(IIf(Fields!DaysEarly.Value < 0, Fields!NumberShipped.Value, Nothing))
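This works because the Sum aggregates over the whole dataset scope, while the IIf is evaluated per row; rows that fail the condition contribute Nothing, which Sum ignores. Against the sample data these give 611 (123 + 234 + 254), 542, and 56 (43 + 13), matching the expected results. The same pattern extends to derived measures, e.g. a (hypothetical) on-time rate:

=Sum(IIf(Fields!DaysEarly.Value = 0, Fields!NumberShipped.Value, Nothing)) / Sum(Fields!NumberShipped.Value)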