PostgreSQL: get results that were created within the last 24 hours

I have two tables that I am joining together. I want to filter the results based on whether or not each row was created within the last 24 hours. Here are my tables.
CREATE TABLE user_infos (
id integer,
date_created timestamp with time zone,
name varchar(40)
);
CREATE TABLE user_data (
id integer,
team_name varchar(40)
);
This is my query that I am using to join them together and hopefully filter them:
SELECT timestampdiff(HOUR, user_infos.date_created, now()) as hours_since,
user_data.id, user_data.team_name,
user_infos.name, user_infos.date_created
FROM user_data
JOIN user_infos
ON user_infos.id=user_data.id
WHERE timestampdiff(HOUR, user_infos.date_created, now()) < 24
ORDER BY name ASC, id ASC
LIMIT 50 OFFSET 0
What I am trying to do is join the two tables so that id, team_name, name, and date_created are treated as one table.
Then I would like to filter the result so that I only get the rows that were created within the last 24 hours. This is what I am using timestampdiff for.
Then I order them by name and id in ascending order, and limit the results to 50.
Everything looks good except that it doesn't work. When I run this query, it tells me that the "hour" column does not exist.
Clearly there is something subtle here that is messing everything up. Does anyone have any suggestions?
Alternatively, I've tried this, but it tells me that there is a syntax error at 1:
SELECT
user_data.id, user_data.team_name,
user_infos.name, user_infos.date_created
FROM user_data
JOIN user_infos
ON user_infos.id=user_data.id
WHERE user_infos.date_created
BETWEEN DATE( DATE_SUB( NOW() , INTERVAL 1 DAY ) ) AND
DATE ( NOW() )
ORDER BY name ASC, id ASC
LIMIT 50 OFFSET 0

I think your problem is with your data types. You are checking whether a timestamp column lies between two values that have been cast to dates (which strips the time from them). NOW() is not the same as DATE(NOW()).
So you have two options. You can either remove the DATE() casts and it should work, or you can cast date_created to a date.
SELECT
user_data.id, user_data.team_name,
user_infos.name, user_infos.date_created
FROM user_data
JOIN user_infos
ON user_infos.id=user_data.id
WHERE user_infos.date_created
-- PostgreSQL has no DATE_SUB(); subtract an interval instead
BETWEEN NOW() - INTERVAL '1 day' AND
NOW()
ORDER BY name ASC, id ASC
LIMIT 50 OFFSET 0
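For reference, timestampdiff() and DATE_SUB() are MySQL functions. PostgreSQL reads the bare HOUR argument as a column name, which is where the "hour" column error in the question comes from. Below is a minimal sketch of the same query using only PostgreSQL interval arithmetic. It assumes the tables from the question; the aliases ud and ui are mine.

```sql
-- Sketch: PostgreSQL-native 24-hour filter (aliases ud/ui are illustrative).
SELECT
    -- whole hours elapsed since the row was created
    floor(extract(epoch FROM (now() - ui.date_created)) / 3600) AS hours_since,
    ud.id,
    ud.team_name,
    ui.name,
    ui.date_created
FROM user_data AS ud
JOIN user_infos AS ui
  ON ui.id = ud.id
-- compare the raw column to a computed bound so an index on date_created can be used
WHERE ui.date_created >= now() - interval '24 hours'
ORDER BY ui.name ASC, ud.id ASC
LIMIT 50 OFFSET 0;
```

Filtering on the column itself, rather than on a function of it, is a deliberate choice: the planner can then use a plain index on date_created if one exists.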

Related

I am trying to list where all users can go using BETWEEN, but it doesn't work properly

My code to identify where one user can go works properly, but I want to make a list of where all users can go. For that I tried using BETWEEN ... AND, but it did not work as expected.
Code: where ONE USER can go:
SELECT place_name, user_id, user_name
FROM schema.place, schema.person
WHERE schema.place_id NOT IN(
SELECT place_id
FROM went_to
WHERE went_to.user_id = 1
AND age(date) <= interval '4 months'
)
AND user_id=1
The code working properly: there is a total of 40 rows, the places the user with id 1 can go.
Code: where ALL USERS can go:
SELECT place_name, user_id, user_name
FROM schema.place, schema.person
WHERE schema.place_id NOT IN(
SELECT place_id
FROM went_to
WHERE went_to.user_id BETWEEN 1 AND 15
AND age(date) <= interval '4 months'
)
AND user_id BETWEEN 1 AND 15
ORDER BY user_id
The code not working properly: it should still show a total of 40 rows for the places the user with id 1 can go.
When I narrow the range in the BETWEEN, the result gets closer to the right answer, but it still isn't right.
What am I doing wrong with the BETWEEN?
The tables:
CREATE TABLE schema.place (
place_id VARCHAR(8),
place_name VARCHAR (50),
CONSTRAINT pk_place_id PRIMARY KEY (place_id)
);
CREATE TABLE schema.user (
user_id VARCHAR(3),
user_name VARCHAR (50),
CONSTRAINT pk_user_id PRIMARY KEY (user_id)
);
CREATE TABLE schema.visit (
user_id VARCHAR(3),
place_id VARCHAR(8),
data DATE,
CONSTRAINT pk_user_id FOREIGN KEY (user_id) REFERENCES SCHEMA.user,
CONSTRAINT pk_place_id FOREIGN KEY (place_id) REFERENCES code.place,
EXCLUDE USING gist (pk_user_id WITH =, daterange(data, (data + interval '6 months')::date) WITH &&)
);
Seeing the schema would be helpful, but I believe the issue is in how you constructed your query.
SELECT place_id
FROM went_to
WHERE went_to.user_id BETWEEN 1 AND 15
AND age(date) <= interval '4 months'
If we look at just the subquery, it returns every place_id that any of users 1-15 visited in the last 4 months. You're then returning all place/user combinations whose place is not in that set. The issue is that you're pooling the places visited by all of those users and using the pooled set as the exclusion, when you really want to exclude, for each user, only the places that particular user went to.
I think you want something like this:
SELECT schema.place.place_name, schema.user.user_id, schema.user.user_name
FROM schema.place, schema.user
WHERE (schema.place.place_id, schema.user.user_id) NOT IN(
SELECT place_id, schema.visit.user_id
FROM schema.visit
WHERE schema.visit.user_id::int BETWEEN 1 AND 15
AND age(data) <= interval '4 months'
)
AND user_id::int BETWEEN 1 AND 15
ORDER BY schema.user.user_id
Your schema has the ids as varchars rather than ints, and the date column is called data, so I had to make some tweaks.
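As a hedged alternative, the same per-user exclusion can be written with a correlated NOT EXISTS. This is a sketch against the schema posted in the question; the aliases p, u and v are mine. One reason to consider it: user_id and place_id in schema.visit are nullable, and NOT IN returns no rows at all once the subquery produces a NULL, whereas NOT EXISTS is not affected by that.

```sql
-- Sketch: for each user 1-15, list the places that user has NOT visited
-- in the last 4 months (aliases p, u, v are illustrative).
SELECT p.place_name, u.user_id, u.user_name
FROM schema.place AS p
CROSS JOIN schema.user AS u              -- every (place, user) combination
WHERE u.user_id::int BETWEEN 1 AND 15
  AND NOT EXISTS (
      SELECT 1
      FROM schema.visit AS v
      WHERE v.user_id  = u.user_id       -- correlate on the same user
        AND v.place_id = p.place_id      -- and the same place
        AND age(v.data) <= interval '4 months'
  )
ORDER BY u.user_id::int, p.place_name;
```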

Values in rows as column names in PostgreSQL

I have a very simple table like the one below.
Events:
Event_name | Event_time
A          | 2022-02-10
B          | 2022-05-11
C          | 2022-07-17
D          | 2022-10-20
New events are added to this table, but we always take only the events from the last X days (for example, 30 days), so the query result for this table changes over time.
I would like to transform the above table into this:
A          | B          | C          | D
2022-02-10 | 2022-05-11 | 2022-07-17 | 2022-10-20
In general, the number of columns won't be constant. If that's not possible, we can limit the number of columns, for example to 10.
I tried crosstab, but I had to add the column names manually, which is not what I want, and it doesn't work with the CTE query:
WITH CTE AS (
SELECT DISTINCT
1 AS "Id",
event_time,
event_name,
ROW_NUMBER() OVER(ORDER BY event_time) AS nr
FROM
events
WHERE
event_time >= CURRENT_DATE - INTERVAL '31 days')
SELECT *
FROM
crosstab (
'SELECT id, event_name, event_time
FROM
CTE
WHERE
nr <= 10
ORDER BY
nr') AS ct(id int,
event_name text,
EventTime1 timestamp,
EventTime2 timestamp,
EventTime3 timestamp,
EventTime4 timestamp,
EventTime5 timestamp,
EventTime6 timestamp,
EventTime7 timestamp,
EventTime8 timestamp,
EventTime9 timestamp,
EventTime10 timestamp)
This query will be used as a data source in Tableau (data visualization and analysis software), so it would be great if it could be a single query (without temp tables, new functions, etc.).
Thanks!

How to get the timestamp associated with a percentile(x) value using TimescaleDB time_bucket

I need to find the percentile(50) value and its timestamp using TimescaleDB's time_bucket. Finding P50 is easy, but I don't know how to get the timestamp.
Select time_bucket('120 sec',timestamp_utc) as interval_size,
first(timestamp_utc,int_val) as minTime,
min(int_val) as minVal,
last(timestamp_utc,int_val) as maxTime,
max(int_val) as maxVal,
-- timestamp of percentile value below.
percentile_disc(0.5) within group (order by int_val) as medianVal
from timeseries.raw
where timestamp_utc > NOW() - INTERVAL '10 min'
AND tag_id = 59560544877390423
group by interval_size
order by interval_size desc
I think you can get what you're looking for by selecting, in a LATERAL subquery, the rows where int_val equals the median value. percentile_disc guarantees that at least one row has exactly that value; there may be more than one, and you can handle that case in different ways depending on what you want. Building on a previous answer and making it work a bit better, I think it would look something like this:
WITH p50 AS (
Select time_bucket('120 sec',timestamp_utc) as interval_size,
first(timestamp_utc,int_val) as minTime,
min(int_val) as minVal,
last(timestamp_utc,int_val) as maxTime,
max(int_val) as maxVal,
-- timestamp of percentile value below.
percentile_disc(0.5) within group (order by int_val) as medianVal
from timeseries.raw
where timestamp_utc > NOW() - INTERVAL '10 min'
AND tag_id = 59560544877390423
group by interval_size
order by interval_size desc
) SELECT p50.*, rmed.*
FROM p50, LATERAL (SELECT * FROM timeseries.raw r
-- copy over the same where clause from above so we're dealing with the same subset of data
WHERE timestamp_utc > NOW() - INTERVAL '10 min'
AND tag_id = 59560544877390423
-- add a where clause on the median value
AND r.int_val = p50.medianVal
-- now add a where clause to account for the time bucket
AND r.timestamp_utc >= p50.interval_size
AND r.timestamp_utc < p50.interval_size + '120 sec'::interval
-- Can add an order by something desc limit 1 if you want to avoid ties
) rmed;
Note that this does a second scan of the table. It should be reasonably efficient, especially if you have an index on that column, but I don't know of a good way to do it without the second scan.
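The tie-breaking idea mentioned in the query comment (an ORDER BY with LIMIT 1 inside the lateral) could look roughly like the sketch below. It is trimmed to the median columns, it assumes the same timeseries.raw table and tag_id as the question, and picking the earliest timestamp among ties is my assumption; order by whatever rule suits your data.

```sql
-- Sketch: one timestamp per bucket for the median value, ties broken by earliest time.
WITH p50 AS (
    SELECT time_bucket('120 sec', timestamp_utc) AS interval_size,
           percentile_disc(0.5) WITHIN GROUP (ORDER BY int_val) AS medianVal
    FROM timeseries.raw
    WHERE timestamp_utc > now() - interval '10 min'
      AND tag_id = 59560544877390423
    GROUP BY interval_size
)
SELECT p50.interval_size,
       p50.medianVal,
       rmed.timestamp_utc AS medianTime
FROM p50
CROSS JOIN LATERAL (
    SELECT r.timestamp_utc
    FROM timeseries.raw AS r
    -- same subset of data as the CTE
    WHERE r.timestamp_utc > now() - interval '10 min'
      AND r.tag_id = 59560544877390423
      -- rows holding the median value inside this bucket
      AND r.int_val = p50.medianVal
      AND r.timestamp_utc >= p50.interval_size
      AND r.timestamp_utc <  p50.interval_size + interval '120 sec'
    ORDER BY r.timestamp_utc   -- assumption: earliest matching row wins
    LIMIT 1
) AS rmed
ORDER BY p50.interval_size DESC;
```

Because now() returns the same value throughout a transaction, the two time filters select the same rows, so every bucket in p50 finds at least one matching row in the lateral.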

Using min/max values from a CTE in a later query, instead of using a subquery in Postgres

I've got a remedial question about pulling results out of a CTE in a later part of the query. For the example code, below are the relevant, stripped down tables:
CREATE TABLE print_job (
created_dts timestamp not null default now(),
status text not null
);
CREATE TABLE calendar_day (
date_actual date not null
);
In the current setup, there are gaps in the dates in the print_job data, and we would like to have a gapless result. For example, there are 87 days from the first to last date in the table, and only 77 days in there have data. We've already got a calendar_day dimension table to join with to get the 87 rows for the 87-day range. It's easy enough to figure out the min and max dates in the data with a subquery or in a CTE, but I don't know how to use those values from a CTE. I've got a full query below, but here are the relevant fragments with comments:
-- Get the date range from the data.
date_range AS (
select min(created_dts::date) AS start_date,
max(created_dts::date) AS end_date
from print_job),
-- This CTE does not work because it doesn't know what date_range is.
complete_date_series_using_cte AS (
select actual_date
from calendar_day
where actual_date >= date_range.start_date
and actual_date <= date_range.end_date
),
-- Subqueries are fine, because the FROM is specified in the subquery condition directly.
complete_date_series_using_subquery AS (
select date_actual
from calendar_day
where date_actual >= (select min(created_dts::date) from print_job)
and date_actual <= (select max(created_dts::date) from print_job)
)
I run into this regularly, and finally figured I'd ask. I've hunted around already for an answer, but I'm not clear how to summarize it well. And while there's nothing wrong with the subqueries in this case, I've got other situations where a CTE is nicer/more readable.
If it helps, I've listed the complete query below.
-- Get some counts and give them names.
WITH
daily_status AS (
select created_dts::date as created_date,
count(*) AS daily_total,
count(*) FILTER (where status = 'Error') AS status_error,
count(*) FILTER (where status = 'Processing') AS status_processing,
count(*) FILTER (where status = 'Aborted') AS status_aborted,
count(*) FILTER (where status = 'Done') AS status_done
from print_job
group by created_dts::date
),
-- Get the date range from the data.
date_range AS (
select min(created_dts::date) AS start_date,
max(created_dts::date) AS end_date
from print_job),
-- There are gaps in the data, and we want a row for dates with no results.
-- Could use generate_series on a timestamp & convert that to dates. But,
-- in our case, we've already got dimension tables for days. All that's needed
-- here is the actual date.
-- This CTE does not work because it doesn't know what date_range is.
-- complete_date_series_using_cte AS (
-- select actual_date
--
-- from calendar_day
--
-- where actual_date >= date_range.start_date
-- and actual_date <= date_range.end_date
-- ),
complete_date_series_using_subquery AS (
select date_actual
from calendar_day
where date_actual >= (select min(created_dts::date) from print_job)
and date_actual <= (select max(created_dts::date) from print_job)
)
-- The final query joins the complete date series with whatever data is in the print_job table daily summaries.
select date_actual,
coalesce(daily_total,0) AS total,
coalesce(status_error,0) AS errors,
coalesce(status_processing,0) AS processing,
coalesce(status_aborted,0) AS aborted,
coalesce(status_done,0) AS done
from complete_date_series_using_subquery
left join daily_status
on daily_status.created_date =
complete_date_series_using_subquery.date_actual
order by date_actual
I said it was a remedial question... I remembered where I'd seen this done before:
https://tapoueh.org/manual-post/2014/02/postgresql-histogram/
In my example, I need to list the CTE in the table list. That's obvious in retrospect, and I realize that I automatically don't think to do that as I'm habitually avoiding CROSS JOIN. The fragment below shows the slight change needed:
WITH
date_range AS (
select min(created_dts)::date as start_date,
max(created_dts)::date as end_date
from print_job
),
complete_date_series AS (
select date_actual
from calendar_day, date_range
where date_actual >= date_range.start_date
and date_actual <= date_range.end_date
),
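Put together, a shortened version of the full query might look like the sketch below. It keeps only one status column for brevity and spells the comma join as an explicit CROSS JOIN; the aliases c, r, s and d are mine. The cross join is harmless here because an aggregate without GROUP BY always returns exactly one row.

```sql
-- Sketch: gapless daily totals using the date_range CTE in the FROM list.
WITH
daily_status AS (
    SELECT created_dts::date AS created_date,
           count(*) AS daily_total,
           count(*) FILTER (WHERE status = 'Error') AS status_error
    FROM print_job
    GROUP BY created_dts::date
),
date_range AS (
    SELECT min(created_dts)::date AS start_date,
           max(created_dts)::date AS end_date
    FROM print_job
),
complete_date_series AS (
    SELECT c.date_actual
    FROM calendar_day AS c
    CROSS JOIN date_range AS r           -- date_range has exactly one row
    WHERE c.date_actual BETWEEN r.start_date AND r.end_date
)
SELECT s.date_actual,
       coalesce(d.daily_total, 0)  AS total,
       coalesce(d.status_error, 0) AS errors
FROM complete_date_series AS s
LEFT JOIN daily_status AS d
       ON d.created_date = s.date_actual
ORDER BY s.date_actual;
```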

Days since last purchase in Postgres (for each purchase)

Just have a standard orders table:
order_id
order_date
customer_id
order_total
Trying to write a query that generates a column that shows the days since the last purchase, for each customer. If the customer had no prior orders, the value would be zero.
I have tried something like this:
WITH user_data AS (
SELECT customer_id, order_total, order_date::DATE,
ROW_NUMBER() OVER (
PARTITION BY customer_id ORDER BY order_date::DATE DESC
)
AS order_count
FROM transactions
WHERE STATUS = 100 AND order_total > 0
)
SELECT * FROM user_data WHERE order_count < 3;
I could feed this into Tableau and then use some table calculations to wrangle the data, but I really would like to understand the SQL approach. My approach also only analyzes the most recent 2 transactions, which is a drawback.
Thanks
You should use the lag() function:
select *,
lag(order_date) over (partition by customer_id order by order_date)
as prior_order_date
from transactions
order by order_id
To have the number of days since last order, just subtract the prior order date from the current order date:
select *,
order_date- lag(order_date) over (partition by customer_id order by order_date)
as days_since_last_order
from transactions
order by order_id
The query selects null if there is no prior order. You can use coalesce() to change it to zero.
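A minimal sketch of the coalesce() variant follows. It assumes the transactions table from the question; the casts to date are my addition, so that the subtraction yields an integer number of days even if order_date is stored as a timestamp.

```sql
-- Sketch: days since the customer's previous order, 0 for a first order.
SELECT t.*,
       coalesce(
           t.order_date::date
             - lag(t.order_date::date) OVER (PARTITION BY t.customer_id
                                             ORDER BY t.order_date),
           0                              -- no prior order for this customer
       ) AS days_since_last_order
FROM transactions AS t
ORDER BY t.customer_id, t.order_date;
```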
You indicated that you need to calculate the number of days since the last purchase:
..Trying to write a query that generates a column that shows the days
since the last purchase
So, basically, you need to get the difference between now and the last purchase date for each customer. The query can be the following:
-- test DDL
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
order_date DATE,
customer_id INTEGER,
order_total INTEGER
);
INSERT INTO orders(order_date, customer_id, order_total) VALUES
('01-01-2015'::DATE,1,2),
('01-02-2015'::DATE,1,3),
('02-01-2015'::DATE,2,4),
('02-02-2015'::DATE,2,5),
('03-01-2015'::DATE,3,6),
('03-02-2015'::DATE,3,7);
WITH orderdata AS (
SELECT customer_id,order_total,order_date,
(now()::DATE - max(order_date) OVER (PARTITION BY customer_id)) as days_since_purchase
FROM orders
WHERE order_total > 0
)
SELECT DISTINCT customer_id ,days_since_purchase FROM orderdata ORDER BY customer_id;
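If only one row per customer is wanted, a plain GROUP BY gives the same result without the window function and the DISTINCT. This is a sketch against the test orders table defined above, not a claim about the author's preferred form.

```sql
-- Sketch: days between today and each customer's most recent order.
SELECT customer_id,
       current_date - max(order_date) AS days_since_purchase
FROM orders
WHERE order_total > 0
GROUP BY customer_id
ORDER BY customer_id;
```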