fetch data from and to date to get all matching results - postgresql

Hello everyone, I need to get data between a from date and a to date. I tried using a BETWEEN clause, but it doesn't retrieve the data I need. Here is what I need.
I have a table called hall_info with the following structure:
hall_info
id | hall_name | address | contact_no
1  | abc       | India   | XXXX-XXXX-XX
2  | xyz       | India   | XXXX-XXXX-XX
I also have a table called events, which records which hall is booked on which date. Its structure is as follows:
id | hall_info_id | event_date (booked date) | event_name
1  | 2            | 2015-10-25               | Marriage
2  | 1            | 2015-10-28               | Marriage
3  | 2            | 2015-10-26               | Marriage
What I need is to show the hall_names that are not booked on the selected dates. Suppose the user chooses 2015-10-23 to 2015-10-30: I want to list all halls that are free on at least one of the selected dates. In the case above, halls 1 and 2 both have bookings in the given range, but I still want to show them because they are free on other days (for example the 23rd, 24th, 27th and 29th).
In the second case, suppose the user chooses 2015-10-25 to 2015-10-26. Hall 2 is booked on both dates (the 25th and the 26th), so in this case I want to show only hall 1.
I tried using a subquery and a BETWEEN clause, but I am not getting the required result. To simplify, I have shown only the relevant fields; I have more tables to join, so I can't paste my full query. Please help with this. Thanks in advance to everyone who tries.
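To make the requirement concrete, here is a small Python sketch (not a database query) of the expected result, using the sample rows above. It assumes a hall should be listed when it is free on at least one day of the inclusive range; the function name free_halls is hypothetical.

```python
from datetime import date, timedelta
from typing import List

# Sample data from the question
HALLS = {1: "abc", 2: "xyz"}                      # hall_info: id -> hall_name
EVENTS = [(2, date(2015, 10, 25)),                # events: (hall_info_id, event_date)
          (1, date(2015, 10, 28)),
          (2, date(2015, 10, 26))]


def free_halls(start: date, end: date) -> List[str]:
    """Names of halls that are free on at least one day of [start, end]."""
    booked = set(EVENTS)
    n_days = (end - start).days + 1               # inclusive: 23..30 is 8 days
    days = [start + timedelta(days=i) for i in range(n_days)]
    return [name for hall_id, name in sorted(HALLS.items())
            if any((hall_id, d) not in booked for d in days)]


print(free_halls(date(2015, 10, 23), date(2015, 10, 30)))  # ['abc', 'xyz']
print(free_halls(date(2015, 10, 25), date(2015, 10, 26)))  # ['abc']
```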

With some changes to Yasen Zhelev's code:
SELECT * FROM hall_info
WHERE id NOT IN (
SELECT hall_info_id FROM events
WHERE event_date >= '2015-10-23' AND event_date <= '2015-10-30'
GROUP BY hall_info_id
HAVING COUNT(DISTINCT event_date) > DATE_PART('day', '2015-10-30'::timestamp - '2015-10-23'::timestamp)
);
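The comparison works because the date subtraction returns the gap between the two dates (7 for 2015-10-23 to 2015-10-30), which is one less than the number of days in the inclusive range (8). A Python sketch mirroring the HAVING test for a single hall; fully_booked is a hypothetical name, and the data is hall 2 from the question:

```python
from datetime import date


def fully_booked(booked_days, start, end):
    """Mirror of the HAVING test for one hall."""
    in_range = {d for d in booked_days if start <= d <= end}   # WHERE + DISTINCT
    gap = (end - start).days   # like DATE_PART('day', end - start): 7 for 23..30
    return len(in_range) > gap  # true only when all gap + 1 days are booked


hall_2 = [date(2015, 10, 25), date(2015, 10, 26)]
print(fully_booked(hall_2, date(2015, 10, 23), date(2015, 10, 30)))  # False -> listed
print(fully_booked(hall_2, date(2015, 10, 25), date(2015, 10, 26)))  # True  -> excluded
```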

I have not tried it, but how about checking whether the number of bookings per hall is less than the number of days in the selected period, and excluding the halls where it is not?
SELECT * FROM hall_info WHERE id NOT IN
(SELECT hall_info_id FROM events
WHERE event_date >= '2015-10-23' AND event_date <= '2015-10-30'
GROUP BY hall_info_id
HAVING COUNT(id) >= ('2015-10-30'::date - '2015-10-23'::date) + 1 -- days in range, inclusive
);
That will only work if you have one booking per day per hall.
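To see why the one-booking-per-day caveat matters: counting booking rows overstates the number of booked days when a hall has two events on the same day. A Python sketch with hypothetical data, contrasting COUNT(id) with COUNT(DISTINCT event_date) against the number of days in the range:

```python
from datetime import date

# Hypothetical bookings for one hall: two events on the 25th, nothing on the 26th
bookings = [(101, date(2015, 10, 25)), (102, date(2015, 10, 25))]
days_in_range = 2  # 2015-10-25 .. 2015-10-26, inclusive

count_rows = len(bookings)                        # like COUNT(id)
count_days = len({day for _, day in bookings})    # like COUNT(DISTINCT event_date)

print(count_rows >= days_in_range)  # True:  hall wrongly looks fully booked
print(count_days >= days_in_range)  # False: hall is correctly seen as free on the 26th
```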

To get the "available dates" for the hall returned, your query needs a row source of all possible dates. For example, if you had a calendar table populated with possible date values, e.g.
CREATE TABLE cal (dt DATE NOT NULL PRIMARY KEY) Engine=InnoDB
;
INSERT INTO cal (dt) VALUES ('2015-10-23')
,('2015-10-24'),('2015-10-25'),('2015-10-26'),('2015-10-27')
,('2015-10-28'),('2015-10-29'),('2015-10-30'),('2015-10-31')
;
Then you could use a query that performs a cross join between the calendar table and hall_info (to get every hall on every date) and an anti-join pattern to eliminate rows that are already booked.
The anti-join pattern is an outer join with a restriction in the WHERE clause to eliminate matching rows.
For example:
SELECT cal.dt, h.id, h.hall_name, h.address
FROM cal cal
CROSS
JOIN hall_info h
LEFT
JOIN events e
ON e.hall_info_id = h.id
AND e.event_date = cal.dt
WHERE e.id IS NULL
AND cal.dt >= '2015-10-23'
AND cal.dt <= '2015-10-30';
The cross join between cal and hall_info gets all halls for all dates (restricted in the WHERE clause to a specified range of dates.)
The outer join to events finds matching rows in the events table (matching on hall_info_id and event_date). The trick is the predicate (condition) e.id IS NULL in the WHERE clause. That throws out any rows that had a match, leaving only rows that don't have a match.
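Here is a runnable sketch of the cross join plus anti-join, assuming Python's built-in sqlite3 module as a stand-in for MySQL or PostgreSQL. It uses the sample halls and events from the question and a small hypothetical calendar; dates are stored as ISO text, which is a SQLite simplification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hall_info (id INTEGER PRIMARY KEY, hall_name TEXT, address TEXT);
CREATE TABLE events (id INTEGER PRIMARY KEY, hall_info_id INTEGER,
                     event_date TEXT, event_name TEXT);
CREATE TABLE cal (dt TEXT NOT NULL PRIMARY KEY);
INSERT INTO hall_info VALUES (1, 'abc', 'India'), (2, 'xyz', 'India');
INSERT INTO events VALUES (1, 2, '2015-10-25', 'Marriage'),
                          (2, 1, '2015-10-28', 'Marriage'),
                          (3, 2, '2015-10-26', 'Marriage');
INSERT INTO cal VALUES ('2015-10-25'), ('2015-10-26'), ('2015-10-27');
""")

rows = conn.execute("""
SELECT cal.dt, h.id
  FROM cal
 CROSS JOIN hall_info h                 -- every hall on every date
  LEFT JOIN events e                    -- look for a booking on that date
    ON e.hall_info_id = h.id
   AND e.event_date = cal.dt
 WHERE e.id IS NULL                     -- keep only unmatched (free) rows
   AND cal.dt BETWEEN ? AND ?
 ORDER BY cal.dt, h.id
""", ("2015-10-25", "2015-10-26")).fetchall()
print(rows)  # [('2015-10-25', 1), ('2015-10-26', 1)]
```

Hall 2 disappears because both of its (hall, date) rows found a matching booking.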
This type of problem is similar to other "sparse data" problems. e.g. How do you return a zero total for sales by a given store on a given date, when there are no rows with that store and date...
In your case, the query needs a source of rows with available date values. That doesn't necessarily have to be a table named calendar. (Other databases give us the ability to dynamically generate a row source; someday, MySQL may have similar features.)
If you want the row source to be dynamic in MySQL, one approach would be to create a temporary table, populate it with the dates, run the query referencing the temporary table, and then drop the temporary table.
Another approach is to use an inline view to return the rows...
SELECT cal.dt, h.id, h.hall_name, h.address
FROM (
SELECT '2015-10-23'+INTERVAL 0 DAY AS dt
UNION ALL SELECT '2015-10-24'
UNION ALL SELECT '2015-10-25'
UNION ALL SELECT '2015-10-26'
UNION ALL SELECT '2015-10-27'
UNION ALL SELECT '2015-10-28'
UNION ALL SELECT '2015-10-29'
UNION ALL SELECT '2015-10-30'
) cal
CROSS
JOIN hall_info h
LEFT
JOIN events e
ON e.hall_info_id = h.id
AND e.event_date = cal.dt
WHERE e.id IS NULL;
FOLLOWUP: When this question was originally posted, it was tagged with mysql. The SQL in the examples above is for MySQL.
In terms of writing a query to return the specified results, the general issue is still the same in PostgreSQL. The general problem is "sparse data".
The SQL query needs a row source for the "missing" date values, but the specification doesn't provide any source for those date values.
The answer above discusses several possible row sources in MySQL: 1) a table, 2) a temporary table, 3) an inline view.
The answer also mentions that some databases (not MySQL) provide other mechanisms that can be used as a row source.
For example, PostgreSQL provides a nifty generate_series function (reference: http://www.postgresql.org/docs/9.1/static/functions-srf.html).
It should be possible to use the generate_series function as a row source, to supply a set of rows containing the date values the query needs to produce the specified result.
This answer demonstrates the approach to solving the "sparse data" problem.
If the specification is to return just the list of halls, and not the dates they are available, the queries above can be easily modified to remove the date expression from the SELECT list, and add a GROUP BY clause to collapse the rows into a distinct list of halls.
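In PostgreSQL the calendar could come from something like generate_series('2015-10-23'::date, '2015-10-30'::date, interval '1 day') instead of a table. The sketch below cannot call that function, so it uses Python's sqlite3 with a recursive CTE as a stand-in row source, and adds the GROUP BY described above to return just the list of halls. Table contents are the question's sample data; halls_free_between is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hall_info (id INTEGER PRIMARY KEY, hall_name TEXT);
CREATE TABLE events (id INTEGER PRIMARY KEY, hall_info_id INTEGER, event_date TEXT);
INSERT INTO hall_info VALUES (1, 'abc'), (2, 'xyz');
INSERT INTO events VALUES (1, 2, '2015-10-25'), (2, 1, '2015-10-28'), (3, 2, '2015-10-26');
""")

QUERY = """
WITH RECURSIVE cal(dt) AS (               -- generated row source, one row per date
    SELECT date(:start_date)
    UNION ALL
    SELECT date(dt, '+1 day') FROM cal WHERE dt < date(:end_date)
)
SELECT h.id, h.hall_name
  FROM cal
 CROSS JOIN hall_info h                   -- every hall on every date
  LEFT JOIN events e
    ON e.hall_info_id = h.id
   AND e.event_date = cal.dt
 WHERE e.id IS NULL                       -- anti-join: keep unbooked (hall, date) rows
 GROUP BY h.id, h.hall_name               -- collapse to one row per hall
 ORDER BY h.id
"""


def halls_free_between(start_date, end_date):
    params = {"start_date": start_date, "end_date": end_date}
    return conn.execute(QUERY, params).fetchall()


print(halls_free_between("2015-10-23", "2015-10-30"))  # [(1, 'abc'), (2, 'xyz')]
print(halls_free_between("2015-10-25", "2015-10-26"))  # [(1, 'abc')]
```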

Related

Postgres: Storing output of moving average query to a column

I have a table in Postgres 14.2
Table name is test
There are 3 columns: date, high, and five_day_mavg (date is PK if it matters)
I have a select statement which properly calculates a 5 day moving average based on the data in high.
select date,
avg(high) over (order by date rows between 4 preceding and current row) as mavg_calc
from test
It produces the expected output: one row per date with its five-day moving average.
I have 2 goals:
First, to store the output of the query in five_day_mavg.
Second, to store it in such a way that when I insert a new row with data in high, the value is calculated automatically.
The closest I got was:
update test set five_day_mavg = a.mav_calc
from (
select date,
avg(high) over (order by date rows between 4 preceding and current row) as mav_calc
from test
) a;
but all it does is set five_day_mavg in every row to the average of the entire high column.
Thanks to @a_horse_with_no_name
I played around with the WHERE clause
update test l
set five_day_mavg = b.five_day_mavg
from (
select date,
avg(high) over (order by date rows between 4 preceding and current row) as five_day_mavg
from test
) b
where l.date = b.date;
A couple of things: I aliased each table. The original table is aliased as l, and the derived table produced by the window function (the SELECT in parentheses) is aliased as b. I joined them in the WHERE clause on date, which is the primary key.
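Why the WHERE clause fixes it: PostgreSQL's UPDATE ... FROM joins each target row to the FROM list, and when a target row matches several rows (as it does with no join condition at all), only one of them is used and which one is not predictable, so every row can end up with the same value. Joining on the primary key date gives each row exactly one match. A Python sketch with hypothetical data that mimics the window frame and the date-keyed update:

```python
from datetime import date, timedelta

# Hypothetical table "l": (date, high), already in date order
rows = [(date(2022, 1, 1) + timedelta(days=i), h)
        for i, h in enumerate([10, 20, 30, 40, 50, 60])]


def five_day_mavg(data):
    """Like avg(high) OVER (ORDER BY date ROWS BETWEEN 4 PRECEDING AND CURRENT ROW)."""
    result = {}
    for i, (d, _) in enumerate(data):
        window = [h for _, h in data[max(0, i - 4): i + 1]]  # up to 5 rows ending here
        result[d] = sum(window) / len(window)
    return result


b = five_day_mavg(rows)                      # derived table "b", keyed by date
updated = [(d, h, b[d]) for d, h in rows]    # each row of "l" takes b's row with the same date
for row in updated:
    print(row)
```

The first four rows average fewer than five values, because the ROWS frame simply starts at the first row.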
Also, I was using 'a' as the letter for alias, and I think that may have contributed to the issue.
Either way, solved now.

Postgres query filter by non column in table

I have a challenge that consists of filtering a query not by a value present in a table, but by a value returned by a function.
Let's consider a table that contains all sales in the database:
id, description, category, price, col1, ..., coln
I have a function that returns the sales similar to a given one (based on rules and business logic). This function runs a query over all records in the sales table and validates matches on some fields.
similar_sales(sale_id integer) -> returns integer[]
Now I need to list all similar sales for each sale in the sales table.
select s.id, similar_sales (s.id)
from sales s
but similar_sales can return NULL, and I am only interested in returning sales that have at least one similar sale.
select id, similar_ids
from (
select s.id, similar_sales(s.id) as similar_ids
from sales s
) q
where cardinality(similar_ids) > 0 -- at least one similar sale
limit x
I can't apply the limit in the subquery because I don't know which sales have similar sales and which don't.
I just want the subquery to process a small set of rows rather than the entire table, to gain query performance (a pagination strategy).
You can try this:
select s.id, similar_ids
from sales s
cross join lateral similar_sales(s.id) as similar_ids
where cardinality(similar_ids) > 0 -- drops NULL and empty arrays
limit x
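The idea behind the answer is that the filter runs as rows are produced, so a LIMIT can stop the work once x qualifying rows are found instead of evaluating the function for the whole table (whether PostgreSQL actually stops early depends on the plan). A Python analogy, with a hypothetical stand-in for similar_sales that returns None or a list of IDs:

```python
from itertools import islice

# Hypothetical results of similar_sales(): None, empty, or a list of sale ids
SIMILAR = {1: None, 2: [5, 7], 3: [], 4: [1], 5: [2], 6: None}
calls = []


def similar_sales(sale_id):
    calls.append(sale_id)  # record how often the "expensive" function runs
    return SIMILAR.get(sale_id)


def sales_with_similar(sale_ids, limit):
    """Return up to `limit` (id, similar) pairs whose result is non-empty."""
    pairs = ((sid, similar_sales(sid)) for sid in sale_ids)   # evaluated lazily
    matches = ((sid, sims) for sid, sims in pairs if sims)    # drops None and []
    return list(islice(matches, limit))                       # stops after `limit` hits


print(sales_with_similar([1, 2, 3, 4, 5, 6], limit=2))  # [(2, [5, 7]), (4, [1])]
print(calls)                                            # [1, 2, 3, 4]
```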

Oracle Sql - Discarding outer select if inner select returns null, and avoiding multiple rows

Background: in our company a person is marked * if they are actively working, and some people have changed departments.
For a report I use two tables, COMPANY_PERSON_ALL and trifm_izinler4, joined on the person_id field as below.
I want to discard (not list) the row if the first inner select returns null.
I also want to prevent the second inner select from returning multiple departments.
select izn.person_id, izn.adi_soyadi, izn.company_id,
(select a.employee_status from COMPANY_PERSON_ALL a where a.employee_status = '*' and a.person_id = izn.person_id) as Status,
(select a.org_code from COMPANY_PERSON_ALL a where a.person_id = izn.person_id) as Department,
izn.hizmet_suresi, izn.kalan_izin
from trifm_izinler4 izn
where trunc(rapor_tarihi) = trunc(SYSDATE)
Can you help me overcome these two problems with the inner select statements?
Assuming you only want to see the department from the active person record, you can just join the two tables instead of using subquery expressions, and filter on that status:
select izn.person_id, izn.adi_soyadi, izn.company_id,
a.employee_status as status, a.org_code as department,
izn.hizmet_suresi, izn.kalan_izin
from trifm_izinler4 izn
join company_person_all a on a.person_id = izn.person_id
where rapor_tarihi >= trunc(SYSDATE)
-- and rapor_tarihi < trunc(SYSDATE) + 1 -- probably not needed
and a.employee_status = '*'
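A runnable sketch of how the join plus filter handles both problems, using Python's sqlite3 as a stand-in for Oracle. Columns are trimmed and the date filter is left out; the data is hypothetical: person 1 moved from department OLD to NEW, and person 2 has no active record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company_person_all (person_id INTEGER, employee_status TEXT, org_code TEXT);
CREATE TABLE trifm_izinler4 (person_id INTEGER, adi_soyadi TEXT);
INSERT INTO company_person_all VALUES (1, NULL, 'OLD'), (1, '*', 'NEW'), (2, NULL, 'HR');
INSERT INTO trifm_izinler4 VALUES (1, 'Person One'), (2, 'Person Two');
""")

rows = conn.execute("""
SELECT izn.person_id, izn.adi_soyadi,
       a.employee_status AS status, a.org_code AS department
  FROM trifm_izinler4 izn
  JOIN company_person_all a ON a.person_id = izn.person_id
 WHERE a.employee_status = '*'   -- drops inactive people and old department rows
""").fetchall()
print(rows)  # [(1, 'Person One', '*', 'NEW')]
```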
I've also changed the date comparison; if you compare using trunc(rapor_tarihi) then a normal index on that column can't be used, so it's generally better to compare the original value against a range. Since you're comparing against today's date you probably only need to look for values greater than midnight today, but if that column can have future dates then you can put an upper bound on the range of midnight tomorrow - which I've included but commented out.
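The range comparison selects exactly the same rows as the trunc() comparison, as long as both bounds are used. A Python sketch checking this for a hypothetical "today" and a few boundary timestamps:

```python
from datetime import date, datetime, time, timedelta

today = date(2024, 5, 17)                  # hypothetical value of trunc(SYSDATE)
start = datetime.combine(today, time.min)  # midnight today
end = start + timedelta(days=1)            # midnight tomorrow, i.e. trunc(SYSDATE) + 1


def same_day_trunc(ts):
    return ts.date() == today              # like trunc(rapor_tarihi) = trunc(SYSDATE)


def same_day_range(ts):
    return start <= ts < end               # half-open range on the raw column


samples = [start - timedelta(seconds=1), start, start + timedelta(hours=13),
           end - timedelta(seconds=1), end]
print([same_day_trunc(t) == same_day_range(t) for t in samples])  # [True, True, True, True, True]
```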
If a person can be active in more than one department at a time then this will show all of those, but your wording suggests people are only active in one at a time. If you want to see a department for all active users, but not necessarily the one that has the active flag (or if there can be more than one active), then it's a bit more complicated, and you need to explain how you would want to choose which to show.

Divide count of Table 1 by count of Table 2 on the same time interval in Tableau

I have two tables with IDs and time stamps. Table 1 has two columns: ID and created_at. Table 2 has two columns: ID and post_date. I'd like to create a chart in Tableau that displays the Number of Records in Table 1 divided by Number of Records in Table 2, by week. How can I achieve this?
One way might be to use Custom SQL like this to create a new data source for your visualization:
SELECT created_table.created_date,
created_table.created_count,
posted_table.posted_count
FROM (SELECT TRUNC (created_at) AS created_date, COUNT (*) AS created_count
FROM Table1
GROUP BY TRUNC (created_at)) created_table
LEFT JOIN
(SELECT TRUNC (post_date) AS posted_date, COUNT (*) AS posted_count
FROM Table2
GROUP BY TRUNC (post_date)) posted_table
ON created_table.created_date = posted_table.posted_date
This would give you dates and counts from both tables for those dates, which you could group using Tableau's date functions in the visualization. I made created_table the first part of the left join on the assumption that some records would be created and not posted, but you wouldn't have posts without creations. If that isn't the case you will want a different join.
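Once the custom SQL returns daily counts, the weekly ratio should be computed from weekly sums rather than by averaging daily ratios; in Tableau that would be a calculated field along the lines of SUM([created_count]) / SUM([posted_count]) with the date at week level. A Python sketch of that aggregation with hypothetical counts, using ISO weeks as an assumption:

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily counts, as produced by the custom SQL (date -> count)
created = {date(2024, 1, 1): 4, date(2024, 1, 2): 6, date(2024, 1, 8): 5}
posted = {date(2024, 1, 1): 1, date(2024, 1, 2): 4}   # no posts on 2024-01-08

weekly = defaultdict(lambda: [0, 0])          # (ISO year, ISO week) -> [created, posted]
for day, n in created.items():                # LEFT JOIN: driven by the created dates
    week = tuple(day.isocalendar()[:2])
    weekly[week][0] += n
    weekly[week][1] += posted.get(day, 0)     # a missing match counts as 0

# Ratio of the weekly sums (10 / 5 = 2.0), not the mean of daily ratios ((4 + 1.5) / 2)
ratios = {week: (c / p if p else None) for week, (c, p) in sorted(weekly.items())}
print(ratios)  # {(2024, 1): 2.0, (2024, 2): None}
```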

Select unique values sorted by date

I am trying to solve an interesting problem. I have a table that has, among other data, these columns (dates in this sample are shown in European format - dd/mm/yyyy):
n_place_id  dt_visit_date
(integer)   (date)
==========  =============
1           10/02/2012
3           11/03/2012
4           11/05/2012
13          14/06/2012
3           04/10/2012
3           03/11/2012
5           05/09/2012
13          18/08/2012
Basically, each place may be visited multiple times - and the dates may be in the past (completed visits) or in the future (planned visits). For the sake of simplicity, today's visits are part of planned future visits.
Now, I need to run a select on this table, which would pull unique place IDs from this table (without date) sorted in the following order:
Future visits go before past visits
Future visits take precedence in sorting over past visits for the same place
For future visits, the earliest date must take precedence in sorting for the same place
For past visits, the latest date must take precedence in sorting for the same place.
For example, for the sample data shown above, the result I need is:
5 (earliest future visit)
3 (next future visit into the future)
13 (latest past visit)
4 (previous past visit)
1 (earlier visit in the past)
Now, I can achieve the desired sorting using case when in the order by clause like so:
select
n_place_id
from
place_visit
order by
(case when dt_visit_date >= now()::date then 1 else 2 end),
(case when dt_visit_date >= now()::date then 1 else -1 end) * extract(epoch from dt_visit_date)
This sort of does what I need, but it contains repeated IDs, whereas I need unique place IDs. If I try to add DISTINCT to the select statement, Postgres complains that the ORDER BY expressions must appear in the select list, but then DISTINCT would no longer be meaningful, as I would have dates in there.
Somehow I feel that there should be a way to get the result I need in one select statement, but I can't get my head around how to do it.
If this can't be done, then, of course, I'll have to do the whole thing in the code, but I'd prefer to have this in one SQL statement.
P.S. I am not worried about the performance, because the dataset I will be sorting is not large. After the where clause will be applied, it will rarely contain more than about 10 records.
With DISTINCT ON you can easily show additional columns of the row with the resulting n_place_id:
SELECT n_place_id, dt_visit_date
FROM (
SELECT DISTINCT ON (n_place_id) *
,dt_visit_date < now()::date AS prio -- future first
,@(now()::date - dt_visit_date) AS diff -- closest first
FROM place_visit
ORDER BY n_place_id, prio, diff
) x
ORDER BY prio, diff;
Effectively I pick the row with the earliest future date (including "today") per n_place_id - or latest date in the past, failing that.
Then the resulting unique rows are sorted by the same criteria.
FALSE sorts before TRUE
The "absolute value" operator @ helps to sort "closest first"
More on the Postgres specific DISTINCT ON in this related answer.
Result:
n_place_id | dt_visit_date
------------+--------------
5 | 2012-09-05
3 | 2012-10-04
13 | 2012-08-18
4 | 2012-05-11
1 | 2012-02-10
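A Python sketch that mimics the DISTINCT ON query on the sample data. It assumes "today" is 2012-08-25, since the sample result implies a date between 2012-08-19 and 2012-09-05:

```python
from datetime import date

TODAY = date(2012, 8, 25)  # assumed "now()::date"
VISITS = [(1, date(2012, 2, 10)), (3, date(2012, 3, 11)), (4, date(2012, 5, 11)),
          (13, date(2012, 6, 14)), (3, date(2012, 10, 4)), (3, date(2012, 11, 3)),
          (5, date(2012, 9, 5)), (13, date(2012, 8, 18))]


def sort_key(visit_date):
    prio = visit_date < TODAY                # False (today/future) sorts first
    diff = abs((TODAY - visit_date).days)    # like @(now()::date - dt_visit_date)
    return (prio, diff)


best = {}  # DISTINCT ON (n_place_id): keep the first row per place in this order
for place, visit in sorted(VISITS, key=lambda r: (r[0], sort_key(r[1]))):
    best.setdefault(place, visit)

result = sorted(best.items(), key=lambda kv: sort_key(kv[1]))  # outer ORDER BY prio, diff
print([place for place, _ in result])        # [5, 3, 13, 4, 1]
```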
Try this
select n_place_id
from
(
select *,
extract(epoch from (dt_visit_date - now())) as seconds,
1 + SIGN(extract(epoch from (dt_visit_date - now()))) as futurepast -- 2 = future, 0 = past
from place_visit
) v
group by n_place_id
order by max(futurepast) desc, min(abs(seconds))