Selecting certain columns from a table with dates as columns - postgresql

I have a table whose column names are months such as "2020-05", "2020-06", "2020-07", and so on, covering many months. I need to select only the columns for the current month and the next two months from this table. (DB: PostgreSQL version 11)
The column names are text in the format YYYY-MM. How can I select only the current month and the following 2 months from this table without hard-coding the column names?
Below is the table structure. Name: static_data
The table contains 14 months of data, with the dates as columns, as in the screenshot above. From this I want the current month and the next 2 months' columns along with their data. The required select statement is something like this:
SELECT "2020-05","2020-06","2020-07" from static
-- SELECT Current month and next 2 months
Required output:

It's nearly impossible to use the actual value of the current month as the name of an output column, but you can do something like this:
select d.item_sku,
d.status,
to_jsonb(d) ->> to_char(current_date, 'yyyy-mm') as current_month,
to_jsonb(d) ->> to_char(current_date + interval '1 month', 'yyyy-mm') as "month + 1",
to_jsonb(d) ->> to_char(current_date + interval '2 month', 'yyyy-mm') as "month + 2"
from bad_design d
;

Technically, you can use the information schema to achieve this. But, like GMB said, please re-design your schema instead of approaching the issue this way in the first place.
The special schema information_schema contains metadata about your DB, including details about existing columns. In other words, you can query it and convert the column names into dates to compare them to what you need.
Here are a few hints.
Query existing column names.
SELECT column_name
FROM information_schema.columns
WHERE table_schema = 'your_schema'
AND table_name = 'your_table'
Compare two dates.
SELECT now() + INTERVAL '3 months' < now() AS compare;
compare
---------
f
(1 row)
You're already pretty close with the conversion yourself.
Have fun and re-design your schema!
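Putting these hints together, one possible sketch lets PostgreSQL assemble the text of the SELECT statement from information_schema. It assumes the table is public.static_data and that the month columns are named exactly YYYY-MM; instead of converting names to dates, it compares them to the three expected names as text. The result is only the query text, which the client (or EXECUTE in a PL/pgSQL function) still has to run.

```sql
-- Builds the text of a SELECT listing the current month and the next two.
-- Assumptions: schema "public", table "static_data", columns named YYYY-MM.
-- If none of the three columns exist, the generated text is not a valid query.
SELECT format(
         'SELECT %s FROM %I.%I',
         string_agg(quote_ident(c.column_name::text), ', '
                    ORDER BY c.column_name::text),
         'public',
         'static_data'
       ) AS query_text
FROM information_schema.columns AS c
WHERE c.table_schema = 'public'
  AND c.table_name   = 'static_data'
  AND c.column_name IN (
        to_char(date_trunc('month', current_date),                       'YYYY-MM'),
        to_char(date_trunc('month', current_date) + interval '1 month',  'YYYY-MM'),
        to_char(date_trunc('month', current_date) + interval '2 months', 'YYYY-MM')
      );
```

In psql, ending the statement with \gexec instead of the semicolon should run the generated text directly.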

Disclaimer: this does not answer your question - but it's too long for a comment.
You need to fix the design of this table. Instead of storing dates in columns, you should have each date on a separate row.
There are numerous drawbacks to your current design:
very simple queries are utterly complicated : filtering on dates, aggregation... All these operations require dynamic SQL, which adds a great deal of complexity
adding or removing new dates requires modifying the structure of the table
storage is wasted for rows where not all columns are filled
Instead, consider this simple design, with one table that stores the master data of each item_sku, and a child table that stores one value per item and date:
create table myskus (
item_sku int primary key,
name text,
cat_level_3_name text
);
create table myvalues (
item_sku int references myskus(item_sku),
date_sku date,
value_sku text,
primary key (item_sku, date_sku)
);
Now your original question is easy to solve:
select v.*, s.name, s.cat_level_3_name
from myskus s
inner join myvalues v on v.item_sku = s.item_sku
where
v.date_sku >= date_trunc('month', now())
and v.date_sku < date_trunc('month', now()) + interval '3 month'
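If the existing wide table has to be migrated into this design, a rough sketch could unpivot it through to_jsonb. The names here are assumptions: the wide table is taken to be static_data with an integer item_sku column, its month columns are named YYYY-MM, and myskus is already filled so the foreign key is satisfied.

```sql
-- One row per (item, month) from the wide table.
-- Assumptions: static_data has an integer item_sku column and YYYY-MM columns;
-- myskus already contains every item_sku (required by the foreign key).
INSERT INTO myvalues (item_sku, date_sku, value_sku)
SELECT d.item_sku,
       to_date(kv.key, 'YYYY-MM'),   -- first day of that month
       kv.value
FROM static_data AS d
CROSS JOIN LATERAL jsonb_each_text(to_jsonb(d)) AS kv(key, value)
WHERE kv.key ~ '^\d{4}-\d{2}$'       -- keep only the YYYY-MM columns
  AND kv.value IS NOT NULL;          -- skip empty cells
```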


How to properly index and query time series data in Postgres?

I'm experimenting with Postgres recently instead of using Bigquery.
My table transactions_all structure
access_id (text)
serial_no (text)
transaction_id (text)
date_local (timestamp)
and an index (BTREE, condition (((date_local)::date), serial_no))
When I'm loading 500.000 rows for one month into this table, performance is okay to query the last 2 days like this
SELECT *
FROM transactions_all
WHERE DATE(date_local) BETWEEN CURRENT_DATE - INTERVAL '1 day' AND CURRENT_DATE
AND access_id = 'accessid1'
and serial_no = 'IO99267637'
But if I'm selecting the last 21 days like this
SELECT *
FROM transactions_all
WHERE DATE(date_local) BETWEEN CURRENT_DATE - INTERVAL '20 day' AND CURRENT_DATE
AND access_id = 'accessid1'
and serial_no = 'IO99267637'
then fetching the data takes multiple seconds instead of milliseconds.
Is this a normal behaviour or am I using the wrong index?
Your index columns are in the wrong order. In the front you need those expressions that are used with the = operator in the WHERE condition, because index columns after the column that is used with a different operator can no longer be used.
To understand that, imagine a phone book where the names are ordered by (last name, first name). Then consider that it is easy to find all entries with last name "Miller" and a first name less than "J": you just read the "Miller" entries until you hit "J". Now consider that the task is to find all "Joe"s whose last name is less than "M": you have to scan all entries up to "M", and it doesn't help you much that the first names are sorted too.
So use an index like
CREATE INDEX ON transactions_all (serial_no, access_id, date(date_local));
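To check whether the planner actually picks up the new index, the 21-day query can be run under EXPLAIN. This is only a sketch: it writes the lower bound as CURRENT_DATE - 20 (date minus integer) so both sides of the comparison are of type date, which is a small change from the original query, and the resulting plan still depends on table statistics.

```sql
-- Shows the chosen plan with actual timings and buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM transactions_all
WHERE serial_no = 'IO99267637'
  AND access_id = 'accessid1'
  AND date(date_local) BETWEEN CURRENT_DATE - 20 AND CURRENT_DATE;
```

If the output still shows a sequential scan, running ANALYZE transactions_all after the bulk load may help the planner.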

Add dates ranges to a table for individual values using a cursor

I have a calendar table called CalendarInformation that gives me a list of dates from 2015 to 2025. This table has a column called BusinessDay that shows what dates are weekends or holidays. I have another table called OpenProblemtimeDiffTable with a column called number for my problem number and a date for when the problem was opened called ProblemNew and another date for the current column called Now. What I want to do is for each problem number grab its date ranges and find the dates between and then sum them up to give me the number of business days. Then I want to insert these values in another table with the problem number associated with the business day.
Thanks in advance and I hope I was clear.
TRUNCATE TABLE ProblemsMoreThan7BusinessDays
DECLARE @date AS date
DECLARE @businessday AS INT
DECLARE @Startdate as DATE, @EndDate as DATE
DECLARE CONTACT_CURSOR CURSOR FOR
SELECT date, businessday
FROM CalendarInformation
OPEN contact_cursor
FETCH NEXT FROM Contact_cursor INTO @date, @businessday
WHILE (@@FETCH_STATUS=0)
BEGIN
SELECT @enddate= now FROM OpenProblemtimeDiffTable
SELECT @Startdate= problemnew FROM OpenProblemtimeDiffTable
SET @Date=@Startdate
PRINT @enddate
PRINT @startdate
SELECT @businessday= SUM (businessday) FROM CalendarInformation WHERE date > @startdate AND date <= @Enddate
INSERT INTO ProblemsMoreThan7BusinessDays (businessdays, number)
SELECT @businessday, number
FROM OpenProblemtimeDiffTable
FETCH NEXT FROM CONTACT_CURSOR INTO @date, @businessday
END
CLOSE CONTACT_CURSOR
DEALLOCATE CONTACT_CURSOR
I tried this code using a cursor and I'm close, but I cannot get the date ranges to change for each row.
So if I have a problem number with a date range between 02-07-2018 and 05-20-2019, I would want in my new table the sum of business days from the calendar along with the problem number. So my output would be column number PROB0421 and businessdays (with the correct sum). Then the next problem, PROB0422, has a date range of 11-6-18 to 5-20-19, so my output would be PROB0422 with the correct sum of business days.
Rather than doing this with a cursor, you should approach this in a set-based manner. That you already have a calendar table makes this a lot easier. The basic approach is to select from your data table and join into your calendar table to return all the rows in the calendar table that sit within your date range. From here you can then aggregate as you require.
This would look something like the below, though apply it to your situation and adjust as required:
select p.ProblemNow
,p.Now
,sum(c.BusinessDay) as BusinessDays
from dbo.Problems as p
join dbo.calendar as c
on c.CalendarDate between p.ProblemNow and p.Now
and c.BusinessDay = 1
group by p.ProblemNow
,p.Now
I think you can do this without a cursor. Should only require a single insert..select statement.
I assume your "businessday" column is just a bit or flag-type field that is 1 if the date is a business day and 0 if not? If so, this should work (or something close to it if I'm not understanding your environment properly):
insert ProblemsMoreThan7BusinessDays
(
number
, businessdays
)
select
op.number
, sum( ci.businessday ) -- or count(*), since only business days are joined
from OpenProblemtimeDiffTable op
inner join CalendarInformation ci on ci.[date] >= op.ProblemNew
and ci.[date] <= op.[Now]
and ci.businessday = 1
group by
op.number
I usually try to avoid the use of cursors and working with data in a procedural manner, especially if I can handle the task as above. Don't think of the data as thousands of individual rows, but think of the data as only two sets of data. How do they relate?

Produce a row for dates that do not exist in a table [duplicate]

I have a postgresql table userDistributions like this :
user_id, start_date, end_date, project_id, distribution
I need to write a query that, for a given date range and user id, outputs the sum of all distributions for every day for that user.
So the output should be like this for input : '2-2-2012' - '2-4-2012', some user id :
Date SUM(Distribution)
2-2-2012 12
2-3-2012 15
2-4-2012 34
A user has distribution in many projects, so I need to sum the distributions in all projects for each day and output that sum against that day.
My problem is: what should I group by? If I had a single date field (instead of start_date and end_date), then I could just write something like
select date, SUM(distributions) from userDistributions group by date;
but in this case I am stumped as what to do. Thanks for the help.
Use generate_series to produce your dates, something like this:
select dt.d::date, sum(u.distributions)
from userdistributions u
join generate_series('2012-02-02'::date, '2012-02-04'::date, '1 day') as dt(d)
on dt.d::date between u.start_date and u.end_date
group by dt.d::date
Your date format is ambiguous, so I guessed while converting it to ISO 8601.
This is much like @mu's answer.
However, to cover days with no matches you should use LEFT JOIN:
SELECT d.d::date, sum(u.distributions) AS dist_sum
FROM generate_series('2012-02-02'::date, '2012-02-04'::date, '1 day') AS d(d)
LEFT JOIN userdistributions u ON d.d::date BETWEEN u.start_date AND u.end_date
GROUP BY 1
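The question also asks for a single user, which interacts with the LEFT JOIN: a user filter in the WHERE clause would throw away the days without matches again, so it has to go into the join condition. A minimal sketch, assuming the column is called distributions as in the queries above and using 42 as a made-up user id:

```sql
-- 42 is a hypothetical user id; replace it with the requested one.
SELECT d.d::date                         AS day,
       coalesce(sum(u.distributions), 0) AS dist_sum  -- 0 instead of NULL for empty days
FROM generate_series('2012-02-02'::date, '2012-02-04'::date, '1 day') AS d(d)
LEFT JOIN userdistributions u
       ON d.d::date BETWEEN u.start_date AND u.end_date
      AND u.user_id = 42                 -- filter in ON, not WHERE, to keep empty days
GROUP BY 1
ORDER BY 1;
```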

fetch data from and to date to get all matching results

Hello everyone I have to get data from and to date, I tried using between clause which fails to retrieve data what I need. Here is what I need.
I have table called hall_info which has following structure
hall_info
id | hall_name |address |contact_no
1 | abc | India |XXXX-XXXX-XX
2 | xyz | India |XXXX-XXXX-XX
Now I have one more table which is events, that contains data about when and which hall is booked on what date, the structure is as follows.
id |hall_info_id |event_date(booked_date)| event_name
1 | 2 | 2015-10-25 | Marriage
2 | 1 | 2015-10-28 | Marriage
3 | 2 | 2015-10-26 | Marriage
So what I need now is I wanna show hall_names that are not booked on selected dates, suppose if user chooses from 2015-10-23 to 2015-10-30 so I wanna list all halls that are not booked on selected dates. In above case both the halls of hall_info_id 1 and 2 ids booked in given range but still I wanna show them because they are free on 23,24,27 and on 29 date.
In second case suppose if user chooses date from 2015-10-25 and 2015-10-26 then only hall_info_id 2 is booked on both the dates 25 and 26 so in this case i wanna show only hall_info_id 1 as hall_info_id 2 is booked.
I tried using an inner query and a BETWEEN clause, but I am not getting the required result. To simplify, I have given only selected fields; I have more tables to join, so I can't paste my query. Please help with this. Thanks in advance to all who are trying.
Some changes in Yasen Zhelev's code:
SELECT * FROM hall_info
WHERE id not IN (
SELECT hall_info_id FROM events
WHERE event_date >= '2015-10-23' AND event_date <= '2015-10-30'
GROUP BY hall_info_id
HAVING COUNT(DISTINCT event_date) > DATE_PART('day', '2015-10-30'::timestamp - '2015-10-23'::timestamp))
I have not tried it, but how about checking whether the number of bookings per hall covers every day in the selected period, and excluding only those halls?
SELECT * FROM hall_info WHERE id NOT IN
(SELECT hall_info_id FROM events
WHERE event_date >= '2015-10-23' AND event_date <= '2015-10-30'
GROUP BY hall_info_id
HAVING COUNT(id) >= DATEDIFF('2015-10-30', '2015-10-23') + 1
);
That will only work if you have one booking per day per hall.
To get the "available dates" for the hall returned, your query needs a row source of all possible dates. For example, if you had a calendar table populated with possible date values, e.g.
CREATE TABLE cal (dt DATE NOT NULL PRIMARY KEY) Engine=InnoDB
;
INSERT INTO cal (dt) VALUES ('2015-10-23')
,('2015-10-24'),('2015-10-25'),('2015-10-26'),('2015-10-27')
,('2015-10-28'),('2015-10-29'),('2015-10-30'),('2015-10-31')
;
Then you could use a query that performs a cross join between the calendar table and hall_info... to get every hall on every date... and an anti-join pattern to eliminate rows that are already booked.
The anti-join pattern is an outer join with a restriction in the WHERE clause to eliminate matching rows.
For example:
SELECT cal.dt, h.id, h.hall_name, h.address
FROM cal cal
CROSS
JOIN hall_info h
LEFT
JOIN events e
ON e.hall_info_id = h.id
AND e.event_date = cal.dt
WHERE e.id IS NULL
AND cal.dt >= '2015-10-23'
AND cal.dt <= '2015-10-30'
The cross join between cal and hall_info gets all halls for all dates (restricted in the WHERE clause to a specified range of dates.)
The outer join to events finds matching rows in the events table (matching on hall_info_id and event_date). The trick is the predicate (condition) in the WHERE clause, e.id IS NULL. That throws out any rows that had a match, leaving only rows that don't have a match.
This type of problem is similar to other "sparse data" problems. e.g. How do you return a zero total for sales by a given store on a given date, when there are no rows with that store and date...
In your case, the query needs a source of rows with available date values. That doesn't necessarily have to be a table named calendar. (Other databases give us the ability to dynamically generate a row source; someday, MySQL may have similar features.)
If you want the row source to be dynamic in MySQL, then one approach would be to create a temporary table, populate it with the dates, run the query referencing the temporary table, and then drop the temporary table.
Another approach is to use an inline view to return the rows...
SELECT cal.dt, h.id, h.hall_name, h.address
FROM (
SELECT '2015-10-23'+INTERVAL 0 DAY AS dt
UNION ALL SELECT '2015-10-24'
UNION ALL SELECT '2015-10-25'
UNION ALL SELECT '2015-10-26'
UNION ALL SELECT '2015-10-27'
UNION ALL SELECT '2015-10-28'
UNION ALL SELECT '2015-10-29'
UNION ALL SELECT '2015-10-30'
) cal
CROSS
JOIN hall_info h
LEFT
JOIN events e
ON e.hall_info_id = h.id
AND e.event_date = cal.dt
WHERE e.id IS NULL
FOLLOWUP: When this question was originally posted, it was tagged with mysql. The SQL in the examples above is for MySQL.
In terms of writing a query to return the specified results, the general issue is still the same in PostgreSQL. The general problem is "sparse data".
The SQL query needs a row source for the "missing" date values, but the specification doesn't provide any source for those date values.
The answer above discusses several possible row sources in MySQL: 1) a table, 2) a temporary table, 3) an inline view.
The answer also mentions that some databases (not MySQL) provide other mechanisms that can be used as a row source.
For example, PostgreSQL provides a nifty generate_series function (Reference: http://www.postgresql.org/docs/9.1/static/functions-srf.html).
It should be possible to use the generate_series function as a row source, to supply a set of rows containing the date values needed by the query to produce the specified result.
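As a rough PostgreSQL sketch of that idea, generate_series can take the place of the calendar table in the same anti-join. It assumes events.hall_info_id refers to hall_info.id and that event_date is a date column, as in the question's tables.

```sql
-- PostgreSQL variant of the anti-join, with generate_series as the row source.
-- Assumes events.hall_info_id references hall_info.id and event_date is a date.
SELECT cal.dt::date AS free_date, h.id, h.hall_name, h.address
FROM generate_series('2015-10-23'::date, '2015-10-30'::date, interval '1 day') AS cal(dt)
CROSS JOIN hall_info h
LEFT JOIN events e
       ON e.hall_info_id = h.id
      AND e.event_date = cal.dt::date
WHERE e.id IS NULL          -- keep only hall/date pairs without a booking
ORDER BY h.id, free_date;
```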
This answer demonstrates the approach to solving the "sparse data" problem.
If the specification is to return just the list of halls, and not the dates they are available, the queries above can be easily modified to remove the date expression from the SELECT list, and add a GROUP BY clause to collapse the rows into a distinct list of halls.
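For PostgreSQL, that modification might look like the following sketch, again with generate_series as the row source and the same assumed column names (events.hall_info_id, event_date as a date). It uses the second case from the question, 2015-10-25 to 2015-10-26.

```sql
-- Halls with at least one free day in the selected range.
SELECT h.id, h.hall_name
FROM generate_series('2015-10-25'::date, '2015-10-26'::date, interval '1 day') AS cal(dt)
CROSS JOIN hall_info h
LEFT JOIN events e
       ON e.hall_info_id = h.id
      AND e.event_date = cal.dt::date
WHERE e.id IS NULL
GROUP BY h.id, h.hall_name  -- collapse the free dates into one row per hall
ORDER BY h.id;
```

With the sample rows from the question, hall 2 is booked on both days and drops out, so only hall 1 remains.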

psql 8.4.1 select all the person born in a specific month

I am supposed to select all the persons born in July (or 07). This did not work:
select * from people where date_trunc('month',dob)='07';
ERROR: invalid input syntax for type timestamp with time zone: "07"
LINE 1: ...ct * from people where date_trunc('month',dob)='07';
What is the right way?
to_char() is meant to format dates. For a condition like yours, extract() is simpler & faster:
SELECT *
FROM people
WHERE extract(month FROM dob) = 7;
If you want to search for a specific year and month too (YYYY-MM), like mentioned in the comment, use date_trunc() like you had initially. Just compare it to a date or timestamp, not to a string, which wouldn't make any sense (and was the cause of the error message). To find people born July 1970:
SELECT *
FROM people
WHERE date_trunc('month', dob) = '1970-07-01 0:0'::timestamp;
If performance is relevant, rewrite that to:
SELECT *
FROM people
WHERE dob >= '1970-07-01 0:0'::timestamp
AND dob < '1970-08-01 0:0'::timestamp; -- note the < with the upper limit
Because this form can use a plain index on people.dob:
CREATE INDEX people_dob_idx ON people (dob);
... and will therefore vastly outperform the previous queries on big tables. It doesn't matter much with small tables.
You could also speed up the first query with a functional index, if needed.
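A sketch of such an index for the extract() query follows. The index name is made up, and it assumes dob is of type date or timestamp without time zone, since an index expression has to be immutable and extract() on timestamp with time zone is not.

```sql
-- Expression index matching: WHERE extract(month FROM dob) = 7
-- Assumes dob is date or timestamp without time zone.
CREATE INDEX people_dob_month_idx ON people ((extract(month FROM dob)));
```

Because a single month still matches roughly a twelfth of the rows, the planner may prefer a sequential scan anyway.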
select * from people where to_char(dob, 'MM') = '09';
gives you all people who were born in September, if the date of birth is stored in a timestamp column called 'dob'.
The second param is the date format pattern. All typical patterns should be supported.
E.g.:
select * from people where to_char(dob, 'MON') = 'SEP';
would do the same.
look here for timestamp format patterns in Postgres: