Ruby on Rails: How to get a monthly count using PostgreSQL

I'm facing an issue: I want to write a statement that returns a monthly count.
For example, for the period 2014-01 to 2014-12, return an ordered array like
["Jan, 5", "Feb, 0", ..., "Dec, 55"]
The only solution I can think of is:
1. get a scope that returns the monthly records
2. calculate the number of periods, here 12
3. repeat 12 times to get the record count for each month
4. build the array
The problem is I have to repeat the query 12 times! That seems wasteful.
I know group_by could be a better choice, but I have no idea how to achieve the performance I'm after. Could anyone help?

Format your date column using Postgres's to_char and then use it in ActiveRecord's group method.
start = Date.new(2014, 1, 1)
finish = Date.new(2014, 12, 31)
range = start..finish
return_hash = ModelClass.
  where(created_at: range).
  group("to_char(created_at, 'Mon YYYY')").
  count
That will return a hash like {"Nov 2014" => 500}
To 'fill in the gaps' you can create a month_names array and do:
month_names.each { |month| return_hash[month] ||= 0 }
Consider building a new hash altogether whose keys are ordered according to your month_names variable.
Then to get your desired output:
return_hash.map{ |month, count| "#{month}, #{count}" }
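Putting it all together, a minimal sketch; the Order model name and the fixed 2014 range are assumptions for illustration:
# Order is a hypothetical model; adjust to your own
month_names = Date::ABBR_MONTHNAMES.compact.map { |m| "#{m} 2014" } # ["Jan 2014", ..., "Dec 2014"]
counts = Order.
  where(created_at: Date.new(2014, 1, 1)..Date.new(2014, 12, 31)).
  group("to_char(created_at, 'Mon YYYY')").
  count
month_names.each { |month| counts[month] ||= 0 } # fill in months with no records
month_names.map { |month| "#{month}, #{counts[month]}" }
# => e.g. ["Jan 2014, 5", "Feb 2014, 0", ..., "Dec 2014, 55"]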

I use the groupdate gem (https://github.com/ankane/groupdate)
Then add .group_by_month(:created_at).count to your query
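For example, a minimal sketch, again assuming a hypothetical Order model; groupdate's range option fills in empty months and format controls the labels:
Order.group_by_month(:created_at,
  range: Date.new(2014, 1, 1)..Date.new(2014, 12, 31),
  format: "%b").count
# => e.g. {"Jan" => 5, "Feb" => 0, ..., "Dec" => 55}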

Related

How to query the first row efficiently?

I have a table with a large number of records:
date        instrument  price
2019.03.07  X           1.1
2019.03.07  X           1.0
2019.03.07  X           1.2
...
When I query for the day's opening price, I use:
1 sublist select from prices where date = 2019.03.07, instrument = `X
It takes a long time to execute because it selects all the prices on that day and then takes the first one.
I also tried:
select from prices where date = 2019.03.07, instrument = `X, i = 0 //It does not return any record (why?)
select from prices where date = 2019.03.07, instrument = `X, i = first i //Seem to work. Does it?
In Oracle an equivalent would be:
select * from prices where date = to_date(...) and instrument = "X" and rownum = 1
and Oracle will stop immediately when it finds the first record.
How can I do this in kdb (i.e. stop immediately after finding the first record)?
In kdb, where subclauses in select statements are executed sequentially, i.e. only those records which pass the first "test" get passed to the second test. With that in mind, looking at your two attempts:
select from prices where date = 2019.03.07, instrument = `X, i = 0 //It does not return any record (why?)
This doesn't (necessarily) return anything, because by the time it gets to the i=0 check you've already filtered out some records (possibly including the first record in the original table, which is the one that had i=0).
select from prices where date = 2019.03.07, instrument = `X, i = first i //Seem to work. Does it?
This one should work. First you filter by date. Then within the records for that date, you select the records for instrument `X. Then within those records, you take the record where i equals first i. Since i has already been filtered down, first i is simply the index of the first surviving record (still its index in the original table, not in the filtered version).
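A toy illustration of that behaviour, with made-up values:
q)t:([]instrument:`X`Y`X;price:1.1 1.0 1.2)
q)select from t where instrument=`X, i=first i  / surviving indices are 0 2, so first i is 0
instrument price
----------------
X          1.1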
The Q-SQL equivalent for that is select[n], which also performs better than other approaches in most cases. A positive n gives the first n records and a negative n gives the last n records.
q) select[1] from prices where date = 2019.03.07, instrument = `X
There is no built-in functionality to stop after the first match. You could write a custom function for that, but it would probably execute more slowly than the supported version above.
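For instance, following that rule, the last record of the day (the close rather than the open) would be:
q)select[-1] from prices where date = 2019.03.07, instrument = `X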

Filter portal for most recently created record by group

I have a portal on my "Clients" table. The related table contains the results of surveys that are updated over time. For each combination of client and category (a field in the related table), I only want the portal to display the most recently collected row.
Here is a link to a trivial example that illustrates the issue I'm trying to address. I have two tables in this example (Related on ClientID):
Clients
Table 1 Get Summary Method
The fields in the Table 1 Get Summary Method table are:
MaxDate is a summary field = Maximum of Date
MaxDateGroup is a calculated field = GetSummary ( MaxDate ; ClientIDCategory )
ShowInPortal = If ( Date = MaxDateGroup ; 1 ; 0 )
The table is sorted on ClientIDCategory
Issue 1 that I'm stumped on:
ShowInPortal should equal 1 in row 3 (PKTable01 = 5), row 4 (PKTable01 = 6), and row 6 (PKTable01 = 4) in the table above. I'm not sure why FileMaker is interpreting 1Red and 1Blue as the same category, or perhaps I'm just misunderstanding what the GetSummary function does.
In the Clients table:
The portal records are sorted on ClientIDCategory
Issue 2 that I'm stumped on:
I only want rows with a ShowInPortal value equal to 1 to appear in the portal. I tried creating a portal filter with the following formula: Table 1 Get Summary Method::ShowInPortal = 1. However, using that filter removes all rows from the portal.
Any help is greatly appreciated.
One solution is to use ExecuteSQL to grab the max date. This removes the need for summary fields and sorts, and works as expected. I propose returning it as a number to avoid any issues with date formats.
GetAsTimestamp (
  ExecuteSQL (
    "SELECT DISTINCT COALESCE ( MaxDate , '' )
     FROM Survey
     WHERE ClientIDCategory = ?"
    ; "" ; "" ; ClientIDCategory
  )
)
Also, you need to change the ShowInPortal field to an unstored calc field with:
If ( GetAsNumber(Date) = MaxDateGroupSQL ; 1 ; 0 )
Then filter the portal on this field.
I can send you the sample file if you want.

Min value with GROUP BY in Power BI Desktop

id  datetime             new_column           datetime_rankx
1   12.01.2015 18:10:10  12.01.2015 18:10:10  1
2   03.12.2014 14:44:57  03.12.2014 14:44:57  1
2   21.11.2015 11:11:11  03.12.2014 14:44:57  2
3   01.01.2011 12:12:12  01.01.2011 12:12:12  1
3   02.02.2012 13:13:13  01.01.2011 12:12:12  2
3   03.03.2013 14:14:14  01.01.2011 12:12:12  3
I want to make a new column that holds the minimum datetime value for the group each row's id belongs to.
How could I do that in Power BI Desktop using a DAX query?
Use this expression:
NewColumn =
CALCULATE (
    MIN ( Table[datetime] ),
    FILTER ( Table, Table[id] = EARLIER ( Table[id] ) )
)
In Power BI, using a table with your data, it will produce the new_column values shown in the sample above.
UPDATE: Explanation and EARLIER function usage.
Basically, the EARLIER function gives you access to the values of a different (outer) row context.
When you use the CALCULATE function it creates a row context over the whole table; conceptually it iterates over every table row. The same happens with the FILTER function: it iterates over the whole table and evaluates every row against the filter condition.
So far we have two row contexts: the one created by CALCULATE and the one created by FILTER. Note that FILTER uses EARLIER to access CALCULATE's row context. That said, for every row in the outer (CALCULATE) context, FILTER returns the set of rows that match the current id of the outer context.
If you have a programming background, this may make more sense: it is similar to a nested loop.
Hopefully this Python code conveys the main idea (using the ids from the sample data):
outer_context = [(1, '2015-01-12'), (2, '2014-12-03'), (2, '2015-11-21')]
inner_context = list(outer_context)
for outer_id, outer_dt in outer_context:      # CALCULATE's row context
    matching = []
    for inner_id, inner_dt in inner_context:  # FILTER's row context
        if inner_id == outer_id:              # this comparison is what FILTER and EARLIER do
            matching.append(inner_dt)
    print(outer_id, min(matching))            # calculate the min datetime using the filtered rows
UPDATE 2: Adding a ranking column.
To get the desired rank you can use this expression:
RankColumn =
RANKX (
    CALCULATETABLE ( Table, ALLEXCEPT ( Table, Table[id] ) ),
    Table[datetime],
    Table[datetime],
    1
)
This produces the datetime_rankx values shown in the sample table above (the final argument 1 makes RANKX rank in ascending order, so the earliest datetime gets rank 1).
Let me know if this helps.

How to use multiple arguments in a kdb where query?

I want to select the max price from a table within the next 5, 10, 30 minutes, etc.
I suspect this is not possible with multiple elements in the where clause.
Both a normal < and </: are failing. My query is below:
select max price from dat where time</: (09:05:00; 09:10:00; 09:30:00)
Any idea what I am doing wrong here?
The idea is to get the max price for each row within the next 5, 10, 30... minutes of the time in that row, not just 3 max prices for the entire table.
select max price from dat where time</: time+\:(5 10 30)
This won't work but should give the general idea.
To further clarify, I want to calculate the max price in 5, 10, 30 minute intervals from time[i] of each row of the table. So for each table row, the max price within x+5, x+10, x+30 minutes, where x is the time entry in that row.
You could try something like this:
select c1:max price[where time <09:05:00],c2:max price[where time <09:10:00],c3:max price from dat where time< 09:30:00
You can parameterize this query however you like. So if you have a list of times, l:09:05:00 09:10:00 09:15:00 09:20:00 ..., you can create a function using the functional form of the query above that works for different lengths of l, something like:
q)f:{[t]?[dat;enlist (<;`time;max t);0b;(`$"c",/:string til count t)!flip (max;flip (`price;flip (where;((<),/:`time,/:t))))]}
q)f l
You can extend f to take different functions instead of max, work for different tables etc.
This works but takes a lot of time: ~20 seconds for 20k records, which is too much! Any way to make it faster?
dat: update tmlst: time+\:mtf*60 from dat;
dat[`pxs]: {[x;y] {[x; ts] raze flip raze {[x;y] select min price from x where time<y}[x] each ts }[x; y`tmlst]} [dat] each dat;
This constructs a step dictionary to map the times to your buckets:
q)-1_select max price by(`s#{((neg w),x)!x,w:(type x)$0W}09:05:00 09:10:00 09:30:00)time from dat
You may also be able to abuse wj:
q)wj[{(prev x;x)}09:05:00 09:10:00 09:30:00;`time;([]time:09:05:00 09:10:00 09:30:00);(delete sym from dat;(max;`price))]
If all your buckets are the same size, it's much easier:
q)select max price by 300 xbar time from dat where time<09:30:00 / 300-second (5-min) buckets

TSQL Syntax, Replace Existing "Wrong Value" with previous "Correct Value"

I have an application that makes an entry every hour in an MS SQL database.
The last entry on the 12th Feb is a zero value and is showing up in my weekly report.
What I want to do is take the value from the previous count and enter it into the field instead of the zero value.
Can someone offer some advice on how to do this? It is beyond my TSQL skills.
SELECT * FROM [dbo].[CountDetails]
WHERE [updateTime] < '2013-02-13'
AND [updateTime] > '2013-02-12'
AND ( DATEPART(hh, [updateTime]) = '22' OR DATEPART(hh, [updateTime]) = '23' )
Note: The application is supposed to zero the count at midnight, but on the 12th Feb it happened early, and I know why.
EDIT: There are 5 IP addresses in total and 6 counters in total, because 192.168.168.11 has 2 counters. So rows 2111 to 2116 form an entire entry for all available counters at 22:58, and rows 2117 to 2122 form an entire entry for all available counters at 23:58. I need to replace the 23:58 values with the corresponding values from 22:58.
Guessing here, but an update that joins on the ipAddress, counterNumber, and the datetime excluding fractional seconds, separated by an hour (do the SELECT part first for safety):
UPDATE b
SET [count] = a.[count]   -- [count] is bracketed since COUNT is a reserved word
-- SELECT *
FROM dbo.CountDetails a
JOIN dbo.CountDetails b
  ON a.ipAddress = b.ipAddress
 AND a.counterNumber = b.counterNumber
 AND CONVERT(VARCHAR(20), b.updateTime, 120) = CONVERT(VARCHAR(20), DATEADD(HOUR, 1, a.updateTime), 120)
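To preview the affected rows first, a sketch of the SELECT variant of the same join; the column names follow the guesses above, and the date filter limits it to the known bad day:
SELECT a.updateTime AS goodTime, a.[count] AS goodCount,
       b.updateTime AS badTime,  b.[count] AS badCount
FROM dbo.CountDetails a
JOIN dbo.CountDetails b
  ON a.ipAddress = b.ipAddress
 AND a.counterNumber = b.counterNumber
 AND CONVERT(VARCHAR(20), b.updateTime, 120) = CONVERT(VARCHAR(20), DATEADD(HOUR, 1, a.updateTime), 120)
WHERE b.updateTime >= '2013-02-12' AND b.updateTime < '2013-02-13';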