SQL query to display calculated fields on a yearly/monthly basis - PostgreSQL

I need help writing this SQL query (PostgreSQL) to display results in the form below:
--------------------------------------------------------------------------------
State | Jan '17 | Feb '17 | Mar '17 | Apr '17 | May '17 ... Dec '18
--------------------------------------------------------------------------------
Principal Outs. |700,839 |923,000 |953,000 |6532,293 | 789,000 ... 913,212
Disbursal Amount |23,000 |25,000 |23,992 | 23,627 | 25,374 ... 23,209
Interest |113,000 |235,000 |293,992 |322,627 |323,374 ... 267,209
There are multiple tables but I would be okay joining them.
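A common way to get months as columns is conditional aggregation: one SUM(CASE ...) per output column, grouped by the row label. The sketch below is not from the original thread. It uses Python's built-in sqlite3 so it runs standalone, and it assumes (hypothetically) that the joined data has already been reshaped into a long table monthly_metrics(metric, month, value). Only two months are spelled out; a full query would list one column per month through Dec '18. In PostgreSQL the same SUM(CASE ...) pattern applies, and the crosstab function from the tablefunc extension is an alternative.

```python
import sqlite3

# Hypothetical long-format table: one row per (metric, month) after joining
# the source tables. Table name, column names and values are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_metrics (metric TEXT, month TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO monthly_metrics VALUES (?, ?, ?)",
    [
        ("Principal Outs.", "2017-01", 700839),
        ("Principal Outs.", "2017-02", 923000),
        ("Disbursal Amount", "2017-01", 23000),
        ("Disbursal Amount", "2017-02", 25000),
        ("Interest", "2017-01", 113000),
        ("Interest", "2017-02", 235000),
    ],
)

# Conditional aggregation: each output column sums only the rows of its month.
cur = conn.execute("""
    SELECT metric AS state,
           SUM(CASE WHEN month = '2017-01' THEN value END) AS "Jan '17",
           SUM(CASE WHEN month = '2017-02' THEN value END) AS "Feb '17"
    FROM monthly_metrics
    GROUP BY metric
    ORDER BY metric
""")
headers = [d[0] for d in cur.description]
rows = cur.fetchall()

print(headers)
for row in rows:
    print(row)
```

The column list is fixed in the SQL text, so adding a month means adding a column expression; that is the usual trade-off of this pattern.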


Cannot create stream in KSQL

I have the stream below and I want to create another stream from it. I am running the command below and getting the following error. Am I missing something?
ksql> create stream down_devices_stream as select * from fakedata119 where deviceProperties['status']='false';
Failed to generate code for SqlPredicate.filterExpression: (FAKEDATA119.DEVICEPROPERTIES['status'] = 'false')schema:org.apache.kafka.connect.data.SchemaBuilder#6e18dbbfisWindowedKey:false
Caused by: Line 1, Column 180: Operator "<=" not allowed on reference operands
ksql> select * from fakedata119;
1529505497087 | null | 19 | visibility sensors | Wed Jun 20 16:38:17 CEST 2018 | {visibility=74, status=true}
1529505498087 | null | 7 | fans | Wed Jun 20 16:38:18 CEST 2018 | {temperature=44, rotationSense=1, status=false, frequency=49}
1529505499088 | null | 28 | air quality monitors | Wed Jun 20 16:38:19 CEST 2018 | {coPpm=257, status=false, Co2Ppm=134}
1529505500089 | null | 4 | fans | Wed Jun 20 16:38:20 CEST 2018 | {temperature=42, rotationSense=1, status=true, frequency=51}
1529505501089 | null | 23 | air quality monitors | Wed Jun 20 16:38:21 CEST 2018 | {coPpm=158, status=true, Co2Ppm=215}
ksql> describe fakedata119;
Field | Type
---------------------------------------------------------
ROWTIME | BIGINT (system)
ROWKEY | VARCHAR(STRING) (system)
DEVICEID | INTEGER
CATEGORY | VARCHAR(STRING)
TIMESTAMP | VARCHAR(STRING)
DEVICEPROPERTIES | MAP[VARCHAR(STRING),VARCHAR(STRING)]
Without seeing your input data, I have guessed that it looks something like this:
{
"id": "a42",
"category": "foo",
"timestamp": "2018-06-21 10:04:57 BST",
"deviceID": 42,
"deviceProperties": {
"status": "false",
"foo": "bar"
}
}
If so, you are better off using EXTRACTJSONFIELD to access the nested values and build predicates:
CREATE STREAM test (Id VARCHAR, category VARCHAR, timeStamp VARCHAR, \
deviceID INTEGER, deviceProperties VARCHAR) \
WITH (KAFKA_TOPIC='test_map2', VALUE_FORMAT='JSON');
ksql> SELECT EXTRACTJSONFIELD(DEVICEPROPERTIES,'$.status') AS STATUS FROM fakeData223;
false
ksql> SELECT * FROM fakeData223 \
WHERE EXTRACTJSONFIELD(DEVICEPROPERTIES,'$.status')='false';
1529572405759 | null | a42 | foo | 2018-06-21 10:04:57 BST | 42 | {"status":"false","foo":"bar"}
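To make the predicate concrete, here is a rough Python analogue of what EXTRACTJSONFIELD(DEVICEPROPERTIES,'$.status')='false' evaluates for each row. It is only an illustration: extract_json_field is a hypothetical helper that handles simple dotted paths such as '$.status', not KSQL's actual implementation, and the sample strings mirror the guessed input above.

```python
import json


def extract_json_field(json_text, path):
    """Hypothetical stand-in for KSQL's EXTRACTJSONFIELD.

    Supports only simple dotted paths like '$.status' or '$.a.b';
    returns None when the text is not valid JSON or a key is missing.
    """
    if not path.startswith("$."):
        raise ValueError("expected a path starting with '$.'")
    try:
        value = json.loads(json_text)
    except json.JSONDecodeError:
        return None
    for key in path[2:].split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


# deviceProperties is declared as VARCHAR, so each row holds raw JSON text.
device_properties = [
    '{"status":"false","foo":"bar"}',
    '{"status":"true","foo":"baz"}',
    '{"foo":"qux"}',
]

# Same filter as: WHERE EXTRACTJSONFIELD(DEVICEPROPERTIES,'$.status')='false'
matches = [p for p in device_properties if extract_json_field(p, "$.status") == "false"]
print(matches)
```

The comparison is against the string 'false', as in the KSQL query, because the guessed input stores the status as a JSON string.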
I've logged the error you found as a bug, tracked here: https://github.com/confluentinc/ksql/issues/1474
I've added a test to cover this use case:
https://github.com/confluentinc/ksql/pull/1476/files
Interestingly, this passes on our master and upcoming 5.0 branches, but fails on 4.1.
So it looks like this is an issue in the version you're using, but the good news is that it's fixed in the upcoming release. In the meantime, you can use Robin's workaround above.
Happy querying!
Andy

Pivot table with columns as year/date in KDB+

I am trying to create a pivot table with one column per year from a simple table:
q)growth:([] stock:asc 9#`goog`apple`nokia; year: 9#2015 2016 2017; returns:9?20 )
q)growth
stock year returns
------------------
apple 2015 9
apple 2016 18
apple 2017 17
goog 2015 8
goog 2016 13
goog 2017 17
nokia 2015 12
nokia 2016 12
nokia 2017 2
But I cannot get the correct structure: it still returns a dictionary rather than multiple year columns.
q)exec (distinct growth`year)#year!returns by stock:stock from growth
stock|
-----| ----------------------
apple| 2015 2016 2017!9 18 17
goog | 2015 2016 2017!8 13 17
nokia| 2015 2016 2017!12 12 2
Am I doing something wrong?
You need to convert the years to symbols in order to use them as column headers. In this case I updated the growth table first, then performed the pivot:
q)exec distinct[year]#year!returns by stock:stock from update `$string year from growth
stock| 2015 2016 2017
-----| --------------
apple| 12 8 10
goog | 1 9 11
nokia| 5 6 1
Additionally, you may notice that I changed (distinct growth`year) to distinct[year]; this yields the same result, with year taken from the updated table.
The column names of a table in KDB should be symbols rather than any other data type.
In your pivot table, the 'year' column has type int/long; this is why a proper table does not turn up.
If you cast it to symbol, it will work:
q)growth:([] stock:asc 9#`goog`apple`nokia; year: 9#2015 2016 2017; returns:9?20 )
q)growth:update `$string year from growth
q)exec (distinct growth`year)#year!returns by stock:stock from growth
stock| 2015 2016 2017
-----| --------------
apple| 9 18 17
goog | 8 13 17
nokia| 12 12 2
Alternatively, you can pivot on 'stock' rather than 'year' and get a pivot table from the original, unmodified table, since stock is already a symbol column:
q)growth:([] stock:asc 9#`goog`apple`nokia; year: 9#2015 2016 2017; returns:9?20 )
q)show exec (distinct growth`stock)#stock!returns by year:year from growth
year| apple goog nokia
----| ----------------
2015| 4 2 4
2016| 5 13 12
2017| 12 6 1
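Outside kdb+, the same reshaping can be sketched in Python. The pivot helper below is hypothetical and only mirrors the idea of exec ... by: it groups rows by one field and turns each distinct value of another field into a column name. It converts column values with str() to echo the `$string year step; Python dictionaries would accept integer keys, so that conversion matters in q, not here. The sample rows are the values from the question's first listing.

```python
def pivot(rows, row_field, col_field, value_field):
    """Group rows by row_field; each distinct col_field value becomes a column.

    Column names are converted to strings, echoing the `$string cast in q.
    Returns (column_names, {row_key: {column_name: value}}).
    """
    columns = []
    table = {}
    for row in rows:
        col = str(row[col_field])
        if col not in columns:
            columns.append(col)  # keep first-seen order, like distinct in q
        table.setdefault(row[row_field], {})[col] = row[value_field]
    return columns, table


data = [
    ("apple", 2015, 9), ("apple", 2016, 18), ("apple", 2017, 17),
    ("goog", 2015, 8), ("goog", 2016, 13), ("goog", 2017, 17),
    ("nokia", 2015, 12), ("nokia", 2016, 12), ("nokia", 2017, 2),
]
growth = [dict(zip(("stock", "year", "returns"), t)) for t in data]

by_stock = pivot(growth, "stock", "year", "returns")  # years as columns
by_year = pivot(growth, "year", "stock", "returns")   # stocks as columns
print(by_stock)
print(by_year)
```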

Total count for each month within period

Given this scenario, I have the following employment records:
id | user_id | Month | Active
1 | 1 | June 2014 | true
2 | 1 | September 2014 | false
3 | 2 | June 2014 | true
How can I write a query that returns the total number of active users for each month? The result should be:
active_count | month
2 | June 2014
2 | July 2014
2 | August 2014
1 | September 2014
Any help is highly appreciated
You are looking for a conditional aggregate:
SELECT count(case when active then 1 end) as active_count,
month
FROM employment
GROUP BY month;
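Running the conditional aggregate on the question's three sample rows shows what it actually returns. This uses Python's sqlite3 so it is self-contained; SQLite has no boolean type, so active is stored as 0/1, and months are stored as 'YYYY-MM' text (an assumption) so they sort chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample rows from the question; active is 1/0 because SQLite lacks booleans.
conn.execute(
    "CREATE TABLE employment (id INTEGER, user_id INTEGER, month TEXT, active INTEGER)"
)
conn.executemany(
    "INSERT INTO employment VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2014-06", 1),
        (2, 1, "2014-09", 0),
        (3, 2, "2014-06", 1),
    ],
)

# CASE yields 1 for active rows and NULL otherwise; count() skips NULLs.
rows = conn.execute("""
    SELECT count(CASE WHEN active THEN 1 END) AS active_count, month
    FROM employment
    GROUP BY month
    ORDER BY month
""").fetchall()
print(rows)
```

The result contains only the months present in the table, and September counts 0 rather than 1, so on its own this does not reproduce the expected output.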
Since Postgres 9.4 this can be written a bit more concisely using the FILTER clause:
SELECT count(*) filter (where active) as active_count,
month
FROM employment
GROUP BY month;
Try this SQL query (it filters on active; without that filter every record in the month would be counted):
SELECT
count(id) active_count,
month
FROM
employment
WHERE
active
GROUP BY
month;
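One reading of the expected output is that each record marks a state change that persists until the same user's next record, and that every month in the period should appear. Under that assumption (it is not stated in the question), a sketch in Python looks like this; in PostgreSQL the month list could come from generate_series. month_range and active_counts are hypothetical helper names.

```python
def month_range(start, end):
    """Yield (year, month) tuples from start to end, inclusive."""
    year, month = start
    while (year, month) <= end:
        yield (year, month)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


def active_counts(records):
    """Count active users per month, carrying each user's last state forward.

    records: iterable of (id, user_id, (year, month), active).
    If a user has several records in one month, the last one listed wins.
    """
    changes = {}
    for _id, user_id, month, active in records:
        changes.setdefault(month, []).append((user_id, active))

    state = {}  # user_id -> most recent active flag
    result = []
    for month in month_range(min(changes), max(changes)):
        for user_id, active in changes.get(month, []):
            state[user_id] = active
        result.append((sum(state.values()), month))  # True counts as 1
    return result


records = [
    (1, 1, (2014, 6), True),
    (2, 1, (2014, 9), False),
    (3, 2, (2014, 6), True),
]
result = active_counts(records)
for count, (year, month) in result:
    print(count, f"{year}-{month:02d}")
```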

Function to calculate an aggregate sum of counts in PostgreSQL

Is there a function that calculates the total count for the complete month, like the Total line below? I am not sure whether Postgres has one. I am looking for the grand total value.
2012-08=# select date_trunc('day', time), count(distinct column) from table_name group by 1 order by 1;
date_trunc | count
---------------------+-------
2012-08-01 00:00:00 | 22
2012-08-02 00:00:00 | 34
2012-08-03 00:00:00 | 25
2012-08-04 00:00:00 | 30
2012-08-05 00:00:00 | 27
2012-08-06 00:00:00 | 31
2012-08-07 00:00:00 | 23
2012-08-08 00:00:00 | 28
2012-08-09 00:00:00 | 28
2012-08-10 00:00:00 | 28
2012-08-11 00:00:00 | 24
2012-08-12 00:00:00 | 36
2012-08-13 00:00:00 | 28
2012-08-14 00:00:00 | 23
2012-08-15 00:00:00 | 23
2012-08-16 00:00:00 | 30
2012-08-17 00:00:00 | 20
2012-08-18 00:00:00 | 30
2012-08-19 00:00:00 | 20
2012-08-20 00:00:00 | 24
2012-08-21 00:00:00 | 20
2012-08-22 00:00:00 | 17
2012-08-23 00:00:00 | 23
2012-08-24 00:00:00 | 25
2012-08-25 00:00:00 | 35
2012-08-26 00:00:00 | 18
2012-08-27 00:00:00 | 16
2012-08-28 00:00:00 | 11
2012-08-29 00:00:00 | 22
2012-08-30 00:00:00 | 26
2012-08-31 00:00:00 | 17
(31 rows)
--------------------------------
Total | 12345
As best I can guess from your question and comments, you want sub-totals of the distinct counts by month. You can't do this with group by date_trunc('month',time), because that would compute a count(distinct column) that is distinct across all days of the month.
For this you need a subquery or CTE:
WITH day_counts(day,day_col_count) AS (
select date_trunc('day', time), count(distinct column)
from table_name group by 1
)
SELECT 'Day', day, day_col_count
FROM day_counts
UNION ALL
SELECT 'Month', date_trunc('month', day), sum(day_col_count)
FROM day_counts
GROUP BY 2
ORDER BY 2;
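The CTE approach can be tried end to end with Python's sqlite3. SQLite has no date_trunc, so substr() on ISO-formatted text stands in for it, and the placeholder column name column (a reserved word) is replaced by a hypothetical col. The tiny data set is made up to show why the month subtotal is a sum of daily distinct counts (4 here) rather than a distinct count over the whole month (3).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for table_name(time, column).
conn.execute("CREATE TABLE table_name (time TEXT, col TEXT)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?)",
    [
        ("2012-08-01 09:00", "a"), ("2012-08-01 10:00", "a"), ("2012-08-01 11:00", "b"),
        ("2012-08-02 09:00", "a"), ("2012-08-02 12:00", "c"),
    ],
)

# substr(time, 1, 10) truncates to the day, substr(day, 1, 7) to the month.
rows = conn.execute("""
    WITH day_counts(day, day_col_count) AS (
        SELECT substr(time, 1, 10), count(DISTINCT col)
        FROM table_name
        GROUP BY substr(time, 1, 10)
    )
    SELECT 'Day', day, day_col_count FROM day_counts
    UNION ALL
    SELECT 'Month', substr(day, 1, 7), sum(day_col_count)
    FROM day_counts
    GROUP BY substr(day, 1, 7)
    ORDER BY 2, 1
""").fetchall()

# For comparison: distinct values across the whole month.
month_distinct = conn.execute(
    "SELECT count(DISTINCT col) FROM table_name WHERE substr(time, 1, 7) = '2012-08'"
).fetchone()[0]

print(rows)
print(month_distinct)
```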
My earlier guess, before your comments, was: group by month?
select date_trunc('month', time), count(distinct column)
from table_name
group by date_trunc('month', time)
-- order by the grouped expression; the raw time column is not available after grouping
order by 1
Or are you trying to include running totals or subtotal lines? For running totals you need to use sum as a window function. Subtotals are just a pain, as SQL doesn't really lend itself to them; you need to UNION two queries, then sort the combined result with an ORDER BY in an outer query.
-- a UNION's own ORDER BY accepts only output column names, so sort in an outer query
select * from (
select
date_trunc('day', time)::text as "date",
count(distinct column) as count
from table_name
group by 1
union all
select
'Total',
count(distinct column)
from table_name
group by date_trunc('month', time)
) as t
order by "date" = 'Total', "date"
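For the running-total option mentioned above, sum() used as a window function keeps every daily row and adds a cumulative column. The sketch uses Python's sqlite3, which needs SQLite 3.25 or later for window functions; the daily table is hypothetical and its counts are the first three days from the question. PARTITION BY month restarts the total each month, so the last row of a month carries that month's total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table of already-computed daily counts.
conn.execute("CREATE TABLE daily (day TEXT, n INTEGER)")
conn.executemany(
    "INSERT INTO daily VALUES (?, ?)",
    [("2012-08-01", 22), ("2012-08-02", 34), ("2012-08-03", 25)],
)

# Window sum: cumulative within each month, ordered by day.
rows = conn.execute("""
    SELECT day, n,
           sum(n) OVER (PARTITION BY substr(day, 1, 7) ORDER BY day) AS running_total
    FROM daily
    ORDER BY day
""").fetchall()
print(rows)
```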

PostgreSQL - WHERE clause within OVER clause?

I need to use a where clause within an over clause. How?
SELECT SUM(amount) OVER(WHERE dateval > 12)
Or something like that.
--EDIT--
More details
My table has year, month, and amount columns.
I want to select all the year, month, and amount rows AND create a fourth 'virtual column' that holds the sum of the amount column over the past 12 months.
For example:
YEAR | MONTH | AMOUNT
2001 | 03 | 10
2001 | 05 | 25
2001 | 07 | 10
Should create:
YEAR | MONTH | AMOUNT | ROLLING 12 MONTHS
2001 | 03 | 10 | 10
2001 | 05 | 25 | 35
2001 | 07 | 10 | 45
Given a query against your three-column result set, does the following work for you?
SELECT
SUM(amount) OVER(ORDER BY YEAR ASC, MONTH ASC
ROWS BETWEEN 11 PRECEDING AND CURRENT ROW)
...
select a,(select sum(a) from foo fa where fa.a > fb.a) from foo fb;
It doesn't use OVER, and it is fairly inefficient since it runs a new subquery for each row, but it works.
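One caveat on the ROWS BETWEEN 11 PRECEDING version: it counts rows, not months, so it covers exactly 12 months only if every month has one row. In the question's sample, months are missing, so a later row's frame could reach back more than a year. The sketch below instead applies the correlated-subquery idea to the rolling sum, comparing a month index (year * 12 + month) so the window is 12 calendar months. It uses Python's sqlite3, assumes month is stored as an integer, and adds one hypothetical row (2002-04) to show the cutoff.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE amounts (year INTEGER, month INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO amounts VALUES (?, ?, ?)",
    [
        (2001, 3, 10), (2001, 5, 25), (2001, 7, 10),
        (2002, 4, 5),  # hypothetical extra row, added to show the 12-month cutoff
    ],
)

# For each row, sum amounts whose month index lies in [current - 11, current].
rows = conn.execute("""
    SELECT cur.year, cur.month, cur.amount,
           (SELECT sum(prev.amount)
            FROM amounts AS prev
            WHERE prev.year * 12 + prev.month
                  BETWEEN cur.year * 12 + cur.month - 11
                      AND cur.year * 12 + cur.month) AS rolling_12_months
    FROM amounts AS cur
    ORDER BY cur.year, cur.month
""").fetchall()
for row in rows:
    print(row)
```

For the 2002-04 row, the sum excludes 2001-03 (more than 11 months earlier) and gives 40, whereas a ROWS frame over these four rows would give 50.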