I have a table of companies where each company has its own time zone.
For example:
company 1 has time zone UTC+10 and
company 2 has UTC+2.
The companies table has a time_zone field that stores the zone name, like America/Los_Angeles (I can add an additional field to store the offset from UTC if needed).
There is also a requests table with a start_date field that stores a TIMESTAMP without time zone (in UTC).
for example -
id | company_id | start_date (UTC+0)
------------------------------------
 1 |          1 | 21-03-16 02-00    // added for company `21-03-16 12-00`
 2 |          2 | 21-03-16 23-00    // added for company `22-03-16 01-00`
 3 |          1 | 20-03-16 13-00    // added for company `20-03-16 23-00`
 4 |          1 | 21-03-16 23-00    // added for company `22-03-16 09-00`
I want to select records that started between 21-03-16 00-00 and 21-03-16 23-59, taking each company's time zone into account.
But if I use:
select * from request where start_date between '2016-03-21 00:00:00.000000' AND '2016-03-21 23:59:59.999999'
I get the requests where id = 2 and 4,
but those requests were in fact added on 22-03-16 in each company's local time.
Any suggestions how I can solve this with a single select? Many thanks.
I'm not sure if I understand your question right; you might need to clarify.
But I'd say that joining with companies, where the time zone information is stored, should solve the problem:
SELECT r.*
FROM request r
JOIN companies c ON c.id = r.company_id
WHERE r.start_date BETWEEN '2016-03-21 00:00:00.000000'
AT TIME ZONE c.time_zone
AND '2016-03-21 23:59:59.999999'
AT TIME ZONE c.time_zone;
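If that doesn't give the expected rows, an equivalent approach is to convert the stored UTC timestamp into each company's local time and compare the local date. This is only a sketch, assuming start_date really is stored in UTC and time_zone holds full zone names such as America/Los_Angeles:
SELECT r.*
FROM request r
JOIN companies c ON c.id = r.company_id
-- interpret the stored value as UTC, shift it into the company's zone, then compare the local date
WHERE (r.start_date AT TIME ZONE 'UTC' AT TIME ZONE c.time_zone)::date = DATE '2016-03-21';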
First time poster here, not sure if my title really outlines what I am looking for...
I am trying to get the following:
"Which month of the year does each property type earn the most money on average?"
I have two tables with the following fields I am working with:
calendar_metric
period (this is a date, formatted 'yyyy-mm-dd')
revenue
airbnb_property_id
property
property_type
airbnb_property_id
I have figured out how to get the month, property type, and average revenue to display, but I think I am having trouble with grouping it correctly.
select
extract(month from calendar_metric.period) as month,
property.property_type,
avg(calendar_metric.revenue) as average_revenue
from
calendar_metric
inner join property on
calendar_metric.airbnb_property_id = property.airbnb_property_id
group by
month,
property_type
What I want it to output would look like this:
month | property_type | max_average_revenue
---------------------------------------------
1 | place | 123
2 | floor apt | 535
3 | hostel | 666
4 | b&b | 363
5 | boat | 777
etc| etc | etc
but currently I am getting this:
month | property_type | max_average_revenue
---------------------------------------------
1 | place | 123
2 | floor apt | 535
1 | place | 444
4 | b&b | 363
4 | b&b | 777
etc| etc | etc
So essentially, months are coming back duplicated. Since I extracted the month from a date stamp and the data set spans about 5 years, I am probably not grouping right? I know I am probably missing something simple; I just cannot seem to figure out how to do this correctly.
Help!
You should be grouping by year and month, since you are looking at a 5-year period:
select
extract(month from calendar_metric.period) as month,
property.property_type,
avg(calendar_metric.revenue) as average_revenue
from
calendar_metric
inner join property on
calendar_metric.airbnb_property_id = property.airbnb_property_id
group by
extract(year from period),
month,
property_type
I think your query is basically there; it's just returning all months rather than filtering out the rows you don't want. I'd tend to use the DISTINCT ON clause for this sort of thing, something like:
SELECT DISTINCT ON (property_type)
p.property_type, extract(month from cm.period) AS month,
avg(cm.revenue) AS revenue
FROM calendar_metric AS cm
JOIN property AS p USING (airbnb_property_id)
GROUP BY property_type, month
ORDER BY property_type, revenue DESC;
I've shortened your query down a bit; hope it still makes sense to you.
Using CTEs you can express this in two steps, which might make it easier to follow what's going on:
WITH results AS (
SELECT p.property_type, extract(month from cm.period) AS month,
avg(cm.revenue) AS revenue
FROM calendar_metric AS cm
JOIN property AS p USING (airbnb_property_id)
GROUP BY property_type, month
)
SELECT DISTINCT ON (property_type)
property_type, month, revenue
FROM results
ORDER BY property_type, revenue DESC;
Difficult question to title, but I am trying to replicate what social media or notification feeds do where they batch recent events so they can display “sequences” of actions. For example, if these are "like" records, in reverse chronological order:
like_id | user_id | like_timestamp
--------------------------------
1 | bob | 12:30:00
2 | bob | 12:29:00
3 | jane | 12:27:00
4 | bob | 12:26:00
5 | jane | 12:24:00
6 | jane | 12:23:00
7 | scott | 12:22:00
8 | bob | 12:20:00
9 | alice | 12:19:00
10 | scott | 12:18:00
I would like to group them such that I get the last 3 "bursts" of user likes, grouped (partitioned?) by user. If the "burst" rule is that likes less than 5 minutes apart belong to the same burst, then we would get:
user_id | num_likes | burst_start | burst_end
----------------------------------------------
bob | 3 | 12:26:00 | 12:30:00
jane | 3 | 12:23:00 | 12:27:00
scott | 2 | 12:18:00 | 12:22:00
alice's like does not get counted because it's part of the 4th most recent batch, and like 8 does not get added to bob's tally because it is 6 minutes before the next one.
I've tried keeping track of bursts with postgres' lag function, which lets me mark start and end events, but since like events can be staggered, I have no way of tying a like back to its "originator" (for example, tying id 4 back to 2).
Is this grouping possible? If so, is it possible to keep track of the start and end timestamp of each burst?
step-by-step demo: db<>fiddle
WITH group_ids AS ( -- 1
    SELECT DISTINCT
        user_id,
        first_value(like_id) OVER (PARTITION BY user_id ORDER BY like_id) AS group_id
    FROM
        likes
    LIMIT 3
)
SELECT
    user_id,
    COUNT(*) AS num_likes,
    burst_start,
    burst_end
FROM (
    SELECT
        user_id,
        -- 4
        first_value(like_timestamp) OVER (PARTITION BY group_id ORDER BY like_id) AS burst_end,
        first_value(like_timestamp) OVER (PARTITION BY group_id ORDER BY like_id DESC) AS burst_start
    FROM (
        SELECT
            l.*, gi.group_id,
            -- 2
            lag(like_timestamp) OVER (PARTITION BY group_id ORDER BY like_id) - like_timestamp AS diff
        FROM
            likes l
        JOIN
            group_ids gi ON l.user_id = gi.user_id
    ) s
    WHERE diff IS NULL OR diff <= '00:05:00' -- 3
) s
GROUP BY user_id, burst_start, burst_end -- 5
The CTE creates an ordered group id per user_id. The first user (here the most recent one, bob) gets the lowest group_id, the second user (jane) the second lowest, and so on. This makes it possible to work with all likes of a certain user within one partition. The step is necessary because you cannot simply order by user_id, which would bring alice to the top. The LIMIT 3 limits the whole query to the first three users.
After joining each user's calculated group_id, the time differences are calculated using the lag() window function, which gives you the previous value. So it can be used to easily calculate the difference between the current timestamp and the previous one. This happens only within each user's group.
After that, the likes that are too far away (more than 5 minutes from the previous one) can be removed using the calculated diff.
Then the highest and lowest timestamps can be calculated with the first_value() window function (in ascending and descending order). These mark your burst_start and burst_end.
Finally you can group all users and count their records.
I'm trying to construct a very simple graph showing how many visits I've got in some period of time (for example, per 5 minutes).
I have Grafana v5.4.0 paired with Postgres v9.6 full of data.
My table below:
CREATE TABLE visit (
id serial CONSTRAINT visit_primary_key PRIMARY KEY,
user_credit_id INTEGER NOT NULL REFERENCES user_credit(id),
visit_date bigint NOT NULL,
visit_path varchar(128),
method varchar(8) NOT NULL DEFAULT 'GET'
);
Here's some data in it:
id | user_credit_id | visit_date | visit_path | method
----+----------------+---------------+---------------------------------------------+--------
1 | 1 | 1550094818029 | / | GET
2 | 1 | 1550094949537 | /mortgage/restapi/credit/{userId}/decrement | POST
3 | 1 | 1550094968651 | /mortgage/restapi/credit/{userId}/decrement | POST
4 | 1 | 1550094988557 | /mortgage/restapi/credit/{userId}/decrement | POST
5 | 1 | 1550094990820 | /index/UGiBGp0V | GET
6 | 1 | 1550094990929 | / | GET
7 | 2 | 1550095986310 | / | GET
...
So I tried these 3 variants (actually, dozens of others) with no success:
Solution A:
SELECT
visit_date as "time",
count(user_credit_id) AS "user_credit_id"
FROM visit
WHERE $__timeFilter(visit_date)
ORDER BY visit_date ASC
No data on graph. Error: pq: invalid input syntax for integer: "2019-02-14T13:16:50Z"
Solution B:
SELECT
$__unixEpochFrom(visit_date),
count(user_credit_id) AS "user_credit_id"
FROM visit
GROUP BY time
ORDER BY user_credit_id
Series A
SELECT
$__time(visit_date/1000,10m,previous),
count(user_credit_id) AS "user_credit_id A"
FROM
visit
WHERE
visit_date >= $__unixEpochFrom()::bigint*1000 and
visit_date <= $__unixEpochTo()::bigint*1000
GROUP BY 1
ORDER BY 1
No data on graph. No error.
Solution C:
SELECT
$__timeGroup(visit_date, '1h'),
count(user_credit_id) AS "user_credit_id"
FROM visit
GROUP BY time
ORDER BY time
No data on graph. Error: pq: function pg_catalog.date_part(unknown, bigint) does not exist
Could someone please help me sort out this simple problem? I think the query should be compact and simple, but the Grafana docs demoing its syntax and features confuse me slightly. Thanks in advance!
Use this query, which will work if visit_date is timestamptz:
SELECT
$__timeGroupAlias(visit_date,5m,0),
count(*) AS "count"
FROM visit
WHERE
$__timeFilter(visit_date)
GROUP BY 1
ORDER BY 1
But your visit_date is bigint, so you need to convert it to a timestamp (probably with TO_TIMESTAMP()), or you will need to find another way to use it as bigint. Use the query inspector for debugging and you will see the SQL generated by Grafana.
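For instance, since visit_date stores milliseconds since the Unix epoch, a conversion along these lines might work (only a sketch; whether the macros accept an expression as their column argument may depend on your Grafana version):
SELECT
  $__timeGroupAlias(to_timestamp(visit_date / 1000), 5m, 0),
  count(*) AS "count"
FROM visit
WHERE
  -- visit_date is in milliseconds, to_timestamp() expects seconds
  $__timeFilter(to_timestamp(visit_date / 1000))
GROUP BY 1
ORDER BY 1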
Jan Garaj, thanks a lot! I should admit that your snippet, and what's more valuable, your advice to switch to SQL debugging, dramatically helped me to make my "breakthrough".
So, the resulting query which solved my problem is below:
SELECT
$__unixEpochGroup(visit_date/1000, '5m') AS "time",
count(user_credit_id) AS "Total Visits"
FROM visit
WHERE
'1970-01-01 00:00:00 GMT'::timestamp + ((visit_date/1000)::text)::interval BETWEEN
$__timeFrom()::timestamp
AND
$__timeTo()::timestamp
GROUP BY 1
ORDER BY 1
Several comments to decipher all this Grafana magic:
Grafana has a limited DSL for making configurable graphs; this set of functions is converted into meaningful SQL (this is where seeing the "compiled" SQL helped me a lot, many thanks again).
To make my BIGINT column suitable for the predefined Grafana functions, we simply need to convert it to seconds since the UNIX epoch, i.e. just divide by 1000.
Now, the WHERE statement is not so simple and predictable; the Grafana DSL works differently there, and simple division did not do the trick. I solved it by using other Grafana functions to get the FROM and TO points in time (the period for which the graph should be rendered), but those functions generate a timestamp type while our column is BIGINT. So, thanks to Postgres, we have a bunch of conversion tools to turn it into a timestamp: '1970-01-01 00:00:00 GMT'::timestamp + ((visit_date/1000)::text)::interval converts the BIGINT value to a Postgres TIMESTAMP, which Grafana handles just fine.
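To illustrate the conversion trick on a single value from the table above (a unitless number cast to interval is interpreted as seconds):
-- 1550094818029 ms since the epoch, converted to a Postgres TIMESTAMP
SELECT '1970-01-01 00:00:00 GMT'::timestamp
       + ((1550094818029 / 1000)::text)::interval;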
P.S. If you don't mind, I've changed my question text to be more precise and detailed.
I have daily log data stored in a Postgres database structured with an id and date. Users can, obviously, have multiple rows in the database if they log in multiple times.
To visualize:
| id | timestamp |
|------|---------------------|
| 0099 | 2004-10-19 10:23:54 |
| 1029 | 2004-10-01 10:23:54 |
| 2353 | 2004-10-20 8:23:54 |
Let's say MAU ("monthly active users") is defined as the number of unique ids that log in for a given calendar month. I would like to get the rolling sum of MAU for each day in a month, i.e. MAU at different points in time as it grows. For example, if we were looking at October 2014:
| date | MAU |
|------------|-------|
| 2014-10-01 | 10000 |
| 2014-10-02 | 12948 |
| 2014-10-03 | 13465 |
And so forth until the end of the month. I've heard that window functions might be one way to solve this. Any ideas how to utilize that to get the rolling MAU sum?
After reading the documentation for Postgres window functions, here's one solution that gets the rolling MAU sum for the current month:
-- First, get the id and day of each timestamp within the current month
WITH raw_data as (SELECT id, date_trunc('day', timestamp) as timestamp
                  FROM user_logs
                  WHERE date_trunc('month', timestamp) = date_trunc('month', current_timestamp)),
-- Since we only want to count the earliest login in the month
-- for a given user, use MIN() to aggregate
month_data as (SELECT id, MIN(timestamp) as timestamp_day FROM raw_data GROUP BY id)
-- Postgres doesn't support DISTINCT for window functions, so query
-- from the rolling sum to have each row as a day
SELECT timestamp_day as date, MAX(count) as MAU
FROM (SELECT timestamp_day, COUNT(id) OVER (ORDER BY timestamp_day) FROM month_data) foo
GROUP BY timestamp_day
For a given month, you can calculate this by adding in a user on the first day during the month when they are seen:
select date_trunc('day', mints), count(*) as usersOnDay,
       sum(count(*)) over (order by date_trunc('day', mints)) as cume_users
from (select id, min(timestamp) as mints
      from log
      where timestamp >= '2004-10-01'::date and timestamp < '2004-11-01'::date
      group by id
     ) l
group by date_trunc('day', mints);
Note: This answers your question about one month. This can be extended to more calendar months, where you are counting the unique users on the first day and then adding increments.
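That extension to all calendar months could look roughly like this, restarting the running sum at each month boundary (only a sketch, assuming the same log(id, timestamp) table as above):
select date_trunc('day', mints) as day,
       count(*) as usersOnDay,
       sum(count(*)) over (partition by date_trunc('month', mints)
                           order by date_trunc('day', mints)) as cume_users
from (-- one row per user per calendar month: their first login in that month
      select id, min(timestamp) as mints
      from log
      group by id, date_trunc('month', timestamp)
     ) l
group by date_trunc('month', mints), date_trunc('day', mints);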
If you have a question where the cumulative period passes month boundaries, then ask another question and explain what a month means under those circumstances.
Here's my use case:
We have an analytics-like tool which used to count the number of users per hour on our system, and now the business would like to have the number of unique users. As our number of users is very small, we will do that using:
SELECT count(*)
FROM (
SELECT DISTINCT user_id
FROM unique_users
WHERE date BETWEEN x and y
) distinct_users
i.e., we will store the pair (user_id, date) and count unique users using DISTINCT (user_id is not a foreign key; as users are not logged in, it's just a unique identifier generated by the system, some kind of UUIDv4).
This works great in terms of performance for this magnitude of data.
Now the problem is to import legacy data into it.
I would like to know the SQL query to transform
date | number_of_users
12:00 | 2
13:00 | 4
into
date | user_id
12:00 | 1
12:00 | 2
13:00 | 1
13:00 | 2
13:00 | 3
13:00 | 4
(as long as the "count but not unique" returns the same number as before, we're fine if the "unique users count" is a bit off)
Of course, I could write a Python script, but I was wondering if there was a SQL trick to do that, using generate_series or something related.
generate_series() is indeed the way to go:
with data (date, number_of_users) as (
values
('12:00',2),
('13:00',4)
)
select d.date, i.n
from data d
cross join lateral generate_series(1, d.number_of_users) i (n)
order by d.date, i.n ;
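Applied to your real tables, the import could look roughly like this (only a sketch; legacy_counts and visits are hypothetical names for your legacy table and the user_id/date table):
-- both table names below are placeholders for your actual tables
INSERT INTO visits (date, user_id)
SELECT l.date, i.n
FROM legacy_counts l
CROSS JOIN LATERAL generate_series(1, l.number_of_users) AS i (n);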