Wrong Output in Postgres

I am joining 3 tables to get the retention rate. Here is my query:
select first_visit.first_month as first_month,
new_users.new_users as new_users,
count(distinct visit_tracker.customer__id) as retained,
cast(count(distinct visit_tracker.customer__id) / new_users.new_users as float) as retention_percent
from first_visit
left join visit_tracker
on visit_tracker.customer__id=first_visit.customer__id
left join new_users
on new_users.first_month=first_visit.first_month
group by 1,2;
I get the following output:
first_month new_users retained retention_percent
0           93        34       0
1           119       42       0
2           188       102      0
3           223       71       0
and so on
What I want is this:
first_month new_users retained retention_percent
0           93        34       0.37
1           119       42       0.35
2           188       102      0.54
3           223       71       0.32
I am not sure why it's not producing the results I want. Any inputs?

This looks like a classic case of an integer division problem.
In this case count(distinct visit_tracker.customer__id) returns an integer, which is then divided by another integer, new_users.new_users. Integer division in Postgres truncates the result, and because the expected answer here is less than one, it truncates to zero. The as float part of your query does not help, as the cast happens after the truncation has already occurred.
Try making sure at least one of the numerator and denominator is a float before performing the division, or multiply by 100.0 beforehand, as this Stack Overflow answer suggests.
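For example, casting the count to float before dividing (a sketch against the query in your post):
select first_visit.first_month as first_month,
new_users.new_users as new_users,
count(distinct visit_tracker.customer__id) as retained,
count(distinct visit_tracker.customer__id)::float / new_users.new_users as retention_percent
from first_visit
left join visit_tracker
on visit_tracker.customer__id=first_visit.customer__id
left join new_users
on new_users.first_month=first_visit.first_month
group by 1,2;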

SQL translation of IF statement to PowerBi Dax Query

I'm fairly new to DAX and Power BI, and I need to translate my SQL IF statement into the equivalent Power BI syntax to achieve the output I want.
SQL code I want to translate:
IF (Payment.payment>0) AND (Account.PV = Account.GV) THEN 1 ELSE 0
I want to make a calculated column on the Payment table which will return 1 or 0, so that I can use it to filter all the records that meet my condition.
account_id is the relationship between these two tables.
Here is some sample data for reference: Account table
account_id pv   gv   due_date
123        100  200  08/08/2022
124        200  200  08/09/2022
125        300  800  08/10/2022
126        400  670  08/11/2022
127        500  500  08/12/2022
128        600  600  08/13/2022
129        700  1000 08/14/2022
130        800  760  08/15/2022
131        900  900  08/16/2022
132        1000 1000 08/17/2022
133        1100 2300 08/09/2022
Here is some sample data for reference: Payment table
payment_id payment_number payment payment_date account_id _test
101        554321         1000    03/01/2022   123        0
102        554322         1200    03/21/2022   124        1
103        554322         1100    04/28/2022   124        1
104        554323         2500    05/04/2022   131        1
105        554324         3000    05/14/2022   133        0
106        554325         3000    05/14/2022   132        1
107        554322         1200    03/21/2022   124        1
108        554323         2500    04/05/2022   131        1
109        554328         3100    04/05/2022   128        0
Code I tried, but I can't find the correct way to do it and return the output that I need:
_test = IF(Payments[payment]>0 && RELATED('Account'[PV])=RELATED('Account'[GV]), 1)
_test = IF(AND(Payments[payment])>0, RELATED('Account'[PV])=RELATED('Account'[GV])),1,0)
Any suggestion is much appreciated. Please recommend what kind of syntax/function should be used to achieve the output, or what workaround could be used other than an IF statement.
The problem you are facing with RELATED is that RELATED only works from the one side to the many side.
Meaning, if you bring the axis from the one side and perform a calculation on the many side, the filter works perfectly. Under normal circumstances the direction of the filter tells you that you should bring your axis from Account, and whatever calculation you perform on the Payment table will work out.
But you are doing exactly the reverse. You are bringing the axis from Payment and hoping for RELATED to work. It won't, because the filter direction is as described.
However, DAX is much more dynamic than that. If for whatever reason you need to bring the axis from the many side while still filtering on the one side, you can define a reverse filter direction on the fly (because DAX is magical), without needing to change anything in the data model, by using CROSSFILTER. With CROSSFILTER you customize the filter direction as follows:
CROSSFILTER(<columnName1>, <columnName2>, <direction>)
This is how (with your given dataset):
Column =
VAR cond1 =
    CALCULATE (
        MAX ( Account[Account.pv] ),
        CROSSFILTER ( Payment[Payment.account_id], Account[Account.account_id], BOTH )
    )
        - CALCULATE (
            MAX ( Account[Account.gv] ),
            CROSSFILTER ( Payment[Payment.account_id], Account[Account.account_id], BOTH )
        )
RETURN
    IF ( cond1 == 0 && Payment[Payment.payment] > 0, 1, 0 )

How can I efficiently convert the output of one KDB function into three table columns?

I have a function that takes as input some of the values in a table and returns a tuple if you will - three separate return values, which I want to transpose into the output of a query. Here's a simplified example of what I want to achieve:
multiplier:{(x*2;x*3;x*4)};
select twoX:multiplier[price][0], threeX:multiplier[price][1], fourX:multiplier[price][2] from data;
The above basically works (I think I've got the syntax right for the simplified example - if not then hopefully my intention is clear), but is inefficient because I'm calling the function three times and throwing away most of the output each time. I want to rewrite the query to only call the function once, and I'm struggling.
Update
I think I missed a crucial piece of information in my explanation of the problem which affects the outcome - I need to get other data in the query alongside the output of my function. Here's a hopefully more realistic example:
multiplier:{(x*2;x*3;x*4)};
select average:avg price, total:sum price, twoX:multiplier[sum price][0], threeX:multiplier[sum price][1], fourX:multiplier[sum price][2] by category from data;
I'll have a go at adapting your answers to fit this requirement anyway, and apologies for missing this bit of information. The real function is a proprietary and fairly complex algorithm, and the real query has about 30 output columns, hence the attempt at simplifying the example :)
If you're just looking for the results themselves, you can extract (exec) as lists, create a dictionary and then flip the dictionary into a table:
q)exec flip`twoX`threeX`fourX!multiplier[price] from ([]price:til 10)
twoX threeX fourX
-----------------
0    0      0
2    3      4
4    6      8
6    9      12
8    12     16
10   15     20
12   18     24
14   21     28
16   24     32
18   27     36
If you need other columns from the original table too then it's trickier, but you could join the tables sideways using ,':
q)t:([]price:til 10)
q)t,'exec flip`twoX`threeX`fourX!multiplier[price] from t
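For the same example table, this returns:
price twoX threeX fourX
-----------------------
0     0    0      0
1     2    3      4
2     4    6      8
3     6    9      12
4     8    12     16
5     10   15     20
6     12   18     24
7     14   21     28
8     16   24     32
9     18   27     36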
An apply # can also achieve what you want. Here data is just a table with 10 random prices. # is then used to apply the multiplier function to the price column while also assigning a column name to each of the three resulting lists:
q)data:([] price:10?100)
q)multiplier:{(x*2;x*3;x*4)}
q)#[data;`twoX`threeX`fourX;:;multiplier data`price]
price twoX threeX fourX
-----------------------
80    160  240    320
24    48   72     96
41    82   123    164
0     0    0      0
81    162  243    324
10    20   30     40
36    72   108    144
36    72   108    144
16    32   48     64
17    34   51     68

KDB+/Q: Custom min max scaler

I'm trying to implement a custom min-max scaler in kdb+/q. I have taken note of the implementation located in the ml package; however, I'm looking to be able to scale data between a custom range, i.e. 0 and 255. What would be an efficient implementation of min-max scaling in kdb+/q?
Thanks
Looking at the link to GitHub on the page you referenced, it looks like you may be able to define a function like so:
minmax255:{[sf;x]sf*(x-mnx)%max[x]-mnx:min x}[255]
Where sf is your scaling factor (here given by 255).
q)minmax255 til 10
0 28.33333 56.66667 85 113.3333 141.6667 170 198.3333 226.6667 255
If you don't like decimals you could round to the nearest whole number like:
q)minmax255round:{[sf;x]floor 0.5+sf*(x-mnx)%max[x]-mnx:min x}[255]
q)minmax255round til 10
0 28 57 85 113 142 170 198 227 255
(the logic here is that a number like 1.7 plus .5, floored, winds up as 2, whereas a number like 1.2 plus .5, floored, ends up as 1)
If you don't want to start at 0 you could use |, which takes the max of its left and right arguments:
q)minmax255roundlb:{[sf;lb;x]lb|floor sf*(x-mnx)%max[x]-mnx:min x}[255;10]
q)minmax255roundlb til 10
10 28 56 85 113 141 170 198 226 255
Where I'm using lb to mean 'lower bound'
If you want to apply this to a table you could use
q)show testtab:([]a:til 10;b:til 10)
a b
---
0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
q)update minmax255 a from testtab
a        b
----------
0        0
28.33333 1
56.66667 2
85       3
113.3333 4
141.6667 5
170      6
198.3333 7
226.6667 8
255      9
The following will work nicely:
minmaxCustom:{[l;u;x]l + (u - l) * (x-mnx)%max[x]-mnx:min x}
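For example, applied to the same til 10 input as above:
q)minmaxCustom[10;255;til 10]
10 37.22222 64.44444 91.66667 118.8889 146.1111 173.3333 200.5556 227.7778 255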
As petty as it sounds, it is my strong recommendation that you do not follow through with Shehir94's solution for a custom minimum value. Applying a maximum to force a starting range will mess with the original distribution. A custom min-max scaling should be a simple linear transformation on a standard 0-1 min-max transformation:
X' = a + bX
For example, to get a custom scaling of 10-255, that would be b=245 and a=10. We would expect the new mean to follow this formula, and the standard deviation to change only multiplicatively, but applying a lower bound messes with this. For example:
q)dummyData:10000?100.0
q)stats:{`transform`minVal`maxVal`avgVal`stdDev!(x;min y;max y; avg y; dev y)}
q)minmax255roundlb:{[sf;lb;x]lb|sf*(x-mnx)%max[x]-mnx:min x}[255;10]
q)minmaxCustom:{[l;u;x]l + (u - l) * (x-mnx)%max[x]-mnx:min x}
q)res:stats'[`orig`lb`linear;(dummyData;minmax255roundlb dummyData;minmaxCustom[10;255;dummyData])]
q)res
transform minVal     maxVal   avgVal   stdDev
-----------------------------------------------
orig      0.02741043 99.98293 50.21896 28.92852
lb        10         255      128.2518 73.45999
linear    10         255      133.024  70.9064
// The transformed average should roughly be
q)10 + ((255-10)%100)*50.21896
133.0365
// The transformed standard deviation should roughly be
q)2.45*28.92852
70.87487
To answer the comment, this could be applied over a large number of columns; it would be applied to a table in the following manner:
q)n:10000
q)tab:([]sym:n?`3;col1:n?100.0;col2:n?100.0;col3:n?1000.0)
q)multiColApply:{[tab;scaler;colList]flip ft,((),colList)!((),scaler each (ft:flip tab)[colList])}
q)multiColApply[tab;minmaxCustom[10;20];`col1`col2]
sym col1     col2     col3
------------------------------
cag 13.78461 10.60606 392.7524
goo 15.26201 16.76768 517.0911
eoh 14.05111 19.59596 515.9796
kbc 13.37695 19.49495 406.6642
mdc 10.65973 12.52525 178.0839
odn 16.24697 17.37374 301.7723
ioj 15.08372 15.05051 785.033
mbc 16.7268  20       534.7096
bhj 12.95134 18.38384 711.1716
gnf 19.36005 15.35354 411.597
gnd 13.21948 18.08081 493.1835
khi 12.11997 17.27273 578.5203

Create table in KDB with columns from results

I'm trying to create a table in KDB where the columns are the results of a query. For example, I have loaded in stock data and, for a given time window, search for the prices the stock traded at. I created a function
getTrades[Symbol; Date; StartTime; StopTime]
This will search through my database and return the prices that traded between the start and stop time. So my results for Apple for a 30 second window might be:
527.10, 527.45, 527.60, 526.90 etc.
What I want to do now is create a table using xbar where I have rows for every second and columns for all the prices that traded between StartTime and StopTime. I will then place an X in a column if that price traded in that 1-second interval. I think I can handle most of this, but the main thing I'm struggling with is converting the results I got above into the column names of the table. I'm also struggling with how to make it flexible, so that my table will have 5 columns in one scenario (5 prices traded) but 10 in another; essentially it varies depending on how many price levels traded in the window I'm searching.
Thanks.
The best and cleanest way to do programmatic selects is with the functional form of select.
From Q for Mortals:
?[t;c;b;a]
where t is a table, a is a dictionary of aggregates, b is a dictionary of groupbys and c is a list of constraints.
In other words, select a by b from t where c.
This will allow you to dynamically create a, which can be of arbitrary size.
You can find more information here:
http://code.kx.com/q4m3/9_Queries_q-sql/#912-functional-forms
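For instance, a minimal sketch (the table t and its columns here are invented for illustration), equivalent to select total:sum price by sym from t:
q)t:([]sym:`a`b`a`b;price:1 2 3 4)
q)?[t;();(enlist`sym)!enlist`sym;(enlist`total)!enlist(sum;`price)]
sym| total
---| -----
a  | 4
b  | 6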
Pivot Table
I think a pivot table will be suitable in this case. Using jgleeson's example:
time price
------------------
11:27:01.600 106
11:27:02.600 102
11:27:02.600 102
11:27:03.100 100
11:27:03.100 102
11:27:03.100 102
11:27:03.100 104
11:27:03.600 104
11:27:03.600 102
11:27:04.100 106
11:27:05.100 105
11:27:06.600 106
11:27:07.100 101
11:27:07.100 104
11:27:07.600 105
11:27:07.600 105
11:27:07.600 101
not null exec (exec `$string asc distinct price from s)#(`$string price)!price by time:1 xbar time.second from s:select from t where time within 11:27:00 11:27:30
and returns:
time    | 100 101 102 103 104 105 106
--------| ---------------------------
11:27:01| 0   0   0   0   0   0   1
11:27:02| 0   0   1   0   0   0   0
11:27:03| 1   0   1   0   1   0   0
11:27:04| 0   0   0   0   0   0   1
11:27:05| 0   0   0   0   0   1   0
11:27:06| 0   0   0   0   0   0   1
11:27:07| 0   1   0   0   1   1   0
It can support any number of unique prices.
This looks a bit convoluted... but I think this might be what you're after.
Sample table t with time and price columns:
t:`time xasc([]time:100?(.z.T+500*til 100);price:100?(100 101 102 103 104 105 106))
This table should replicate what you get from the first step of your function call - "select time,price from trade where date=x, symbol=y, starttime=t1, endtime=t2".
To return the table in the format specified:
q) flip (`time,`$string[c])!flip {x,'y}[key a;]value a:{x in y}[c:asc distinct tt`price] each group (!) . reverse value flip tt:update time:time.second from t
time     100 101 102 103 104 105 106
------------------------------------
20:34:29 0   1   0   0   0   1   0
20:34:30 0   0   0   0   0   0   1
20:34:31 0   0   1   0   0   0   0
20:34:32 0   0   1   0   1   0   0
...
This has bools instead of X as bools are probably easier to work with.
Also please excuse the one-liner... If I get a chance I'll break it up and try to make it more readable.
A more simplified version is:
q)t:`time xasc([] s:100#`s ; time:100?(.z.T+500*til 100);price:100?(100 101 102 103 104 105 106))
q)t1:update `$string price,isPrice:1b from t
q)p:(distinct asc t1`price)
q)exec p#(10b!"X ")#(price!isPrice) by time:1 xbar time.second from t1
time | 100 101 102 103 104 105 106
--------| ---------------------------
20:39:00| X X X
20:39:01| X X X X
20:39:02| X
20:39:04| X
20:39:05| X X X X

joining three tables each with an incomplete set of each others' keys

I have three tables (actually temp tables, each the result of other queries) with very similar data sets; I need to "condense" them, for lack of a better term, and my limited SQL knowledge is stopping me.
For example, we have Budgets by Code, Estimates by Code, and Actuals by Code. Not all possible values for Code exist in any of the three, nor even in another accessible table.
Budgets
1 $13
2 $22
4 $44
7 $71
Estimates
1 $14
4 $49
5 $55
Actuals
2 $21
3 $33
5 $57
7 $70
What I want:
Code Bgt Est Act
1    13  14  0
2    22  0   21
3    0   0   33
4    44  49  0
5    0   55  57
7    71  0   70
(I don't have to have 0 when there's no value, that's just for illustrative purposes.)
I just have no idea how to approach this - any help appreciated!
Try using a FULL OUTER JOIN. In your case the query will look like this:
Select ISNULL(Bgt.Code, ISNULL(Est.Code, Act.Code)) AS Code,
ISNULL(Bgt.Budget, 0) AS Bgt,
ISNULL(Est.Estimate, 0) AS Est,
ISNULL(Act.Actual, 0) AS Act
FROM Budgets Bgt
FULL OUTER JOIN Estimates Est ON Est.Code = Bgt.Code
FULL OUTER JOIN Actuals Act ON Act.Code = ISNULL(Bgt.Code, Est.Code)
Note the join condition on Actuals uses ISNULL(Bgt.Code, Est.Code): joining on Bgt.Code alone would miss codes (such as 5) that appear in Estimates and Actuals but not in Budgets.
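As a side note, COALESCE is the standard-SQL equivalent of ISNULL that also accepts more than two arguments, so the same query can be written more portably (a sketch, assuming the same table and column names as above):
SELECT COALESCE(Bgt.Code, Est.Code, Act.Code) AS Code,
COALESCE(Bgt.Budget, 0) AS Bgt,
COALESCE(Est.Estimate, 0) AS Est,
COALESCE(Act.Actual, 0) AS Act
FROM Budgets Bgt
FULL OUTER JOIN Estimates Est ON Est.Code = Bgt.Code
FULL OUTER JOIN Actuals Act ON Act.Code = COALESCE(Bgt.Code, Est.Code)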