When running the following query in PostgreSQL:
SELECT group_,
COUNT(*),
MIN(time)
FROM TRIP
WHERE time >= NOW()
GROUP BY group_
ORDER BY time ASC
I am getting this error:
ERROR: column "trip.time" must appear in the GROUP BY clause or be used in an aggregate function
I don't understand. Isn't MIN an aggregate function? This query runs on MySQL.
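time isn't a GROUP BY item or an aggregate term, so in a query with a GROUP BY clause you cannot reference it on its own. MIN(time) in the select list is fine; the error comes from ORDER BY time. (MySQL accepts such queries when its ONLY_FULL_GROUP_BY SQL mode is disabled, which is why it runs there.) Presumably, you meant to order by MIN(time), which can be stated explicitly: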
SELECT group_,
COUNT(*),
MIN(time)
FROM TRIP
WHERE time >= NOW()
GROUP BY group_
ORDER BY MIN(time) ASC
Or by its position in the select list:
SELECT group_,
COUNT(*),
MIN(time)
FROM TRIP
WHERE time >= NOW()
GROUP BY group_
ORDER BY 3 ASC
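To check that the two rewrites agree, here is a small sketch in Python using the standard-library sqlite3 module as a stand-in for PostgreSQL. The table contents are made up, and a fixed cutoff string replaces NOW(). SQLite is more permissive than PostgreSQL about ungrouped columns, so this only compares the two corrected forms; it does not reproduce the original error.

```python
import sqlite3

# In-memory SQLite stand-in for the TRIP table (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trip (group_ TEXT, time TEXT)")
conn.executemany(
    "INSERT INTO trip VALUES (?, ?)",
    [("a", "2030-01-05"), ("a", "2030-01-02"),
     ("b", "2030-01-01"),
     ("c", "2030-01-03"), ("c", "2030-01-09")],
)

base = """
SELECT group_, COUNT(*), MIN(time)
FROM trip
WHERE time >= '2025-01-01'  -- fixed cutoff replaces NOW() for repeatability
GROUP BY group_
"""
# Order by the aggregate expression itself...
by_expression = conn.execute(base + "ORDER BY MIN(time) ASC").fetchall()
# ...or by its position (3rd column) in the select list.
by_position = conn.execute(base + "ORDER BY 3 ASC").fetchall()
print(by_expression)
```

Both forms give the same order; the expression form keeps working if the select list is later reordered, while the positional form silently changes meaning.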
I have used the following postgresql query to find the maximum difference between timestamp events for each user:
select
sq.user_id,
max(sq.diffs) inactivity
from (
select
user_id,
(lead("when", 1, now()) over (partition by user_id order by "when") - "when") as diffs
from tracking_viewed
) as sq
group by sq.user_id
order by inactivity desc;
This query works on a different table, but on this table, whose "when" column contains nulls, it returns only null values.
How can I remove or skip nulls from the lead and partition functions?
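The thread gives no answer here. One possible approach, offered as an assumption and not confirmed against PostgreSQL, is to filter out rows with a null "when" inside the subquery, so LEAD only ever pairs real timestamps. In PostgreSQL, nulls sort last in ascending order, so otherwise each user's last real timestamp is paired with a null row and its gap to now() becomes null. The sketch runs the same query shape in SQLite (3.25+ for window functions) with invented integer timestamps and a literal 100 standing in for now().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE tracking_viewed (user_id INTEGER, "when" INTEGER)')
conn.executemany(
    "INSERT INTO tracking_viewed VALUES (?, ?)",
    [(1, 10), (1, 40), (1, None), (2, 5), (2, 15), (2, None)],
)

query = """
SELECT sq.user_id, MAX(sq.diffs) AS inactivity
FROM (
    SELECT user_id,
           -- 100 is a hypothetical stand-in for now()
           LEAD("when", 1, 100) OVER (PARTITION BY user_id ORDER BY "when") - "when" AS diffs
    FROM tracking_viewed
    WHERE "when" IS NOT NULL  -- drop null rows before LEAD sees them
) AS sq
GROUP BY sq.user_id
ORDER BY inactivity DESC
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Filtering in the inner query (rather than after the window function) is the design point: the partition is built only from non-null timestamps, so every gap, including the trailing one to "now", is computed.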
I'm trying to write a query for Hive that uses the system date to determine both yesterday's date as well as the date 30 days ago. This will provide me with a rolling 30 days without the need to manually feed the date range to the query every time I run it.
I have that code working fine in a CTE. The problem I'm having is in referencing those dates in another CTE without joining the CTEs together, which I can't do since there's not a common field to join on.
I've tried various approaches but I get a "ParseException" every time.
WITH
date_range AS (
SELECT
CAST(from_unixtime(unix_timestamp()-30*60*60*24,'yyyyMMdd') AS INT) AS start_date,
CAST(from_unixtime(unix_timestamp()-1*60*60*24,'yyyyMMdd') AS INT) AS end_date
)
SELECT * FROM myTable
WHERE date_id BETWEEN (SELECT start_date FROM date_range) AND (SELECT end_date FROM date_range)
The intended result is the set of records from myTable that have a date_id between the start_date and end_date as found in the CTE date_range. Perhaps I'm going about this all wrong?
You can do a cross join; it does not require an ON condition. Your date_range dataset has only one row, so you can CROSS JOIN it with your table, and it will be converted to a map-join: the small dataset is broadcast to all the mappers and loaded into each mapper's memory, so it runs very fast. Check the EXPLAIN output to make sure it is a map-join:
set hive.auto.convert.join=true;
set hive.mapjoin.smalltable.filesize=250000000;
WITH
date_range AS (
SELECT
CAST(from_unixtime(unix_timestamp()-30*60*60*24,'yyyyMMdd') AS INT) AS start_date,
CAST(from_unixtime(unix_timestamp()-1*60*60*24,'yyyyMMdd') AS INT) AS end_date
)
SELECT t.*
FROM myTable t
CROSS JOIN date_range d
WHERE t.date_id BETWEEN d.start_date AND d.end_date
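The one-row CROSS JOIN pattern is not Hive-specific, so it can be tried out locally. This Python sketch uses sqlite3 with hard-coded integer date keys in place of the unix_timestamp() arithmetic; the table rows are invented. The map-join settings above are Hive-only and have no equivalent here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (date_id INTEGER, val TEXT)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?)",
    [(20240101, "too old"), (20240115, "in range"),
     (20240130, "in range"), (20240201, "too new")],
)

# The CTE yields exactly one row, so the CROSS JOIN attaches
# start_date and end_date to every myTable row without an ON condition.
query = """
WITH date_range AS (
    SELECT 20240110 AS start_date, 20240131 AS end_date  -- fixed stand-ins
)
SELECT t.*
FROM myTable t
CROSS JOIN date_range d
WHERE t.date_id BETWEEN d.start_date AND d.end_date
ORDER BY t.date_id
"""
rows = conn.execute(query).fetchall()
print(rows)
```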
Alternatively, you can calculate the dates directly in the WHERE clause; then no CTE or join is needed:
SELECT t.*
FROM myTable t
WHERE t.date_id
BETWEEN CAST(from_unixtime(unix_timestamp()-30*60*60*24,'yyyyMMdd') AS INT)
AND CAST(from_unixtime(unix_timestamp()-1*60*60*24,'yyyyMMdd') AS INT)
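To see what the two Hive expressions produce, here is a Python sketch that mirrors the arithmetic: subtract N days' worth of seconds from the current time and format the result as a yyyyMMdd integer. A fixed "now" keeps the output repeatable. The datetime is naive, so time zones are ignored here, whereas Hive's from_unixtime formats in the server's time zone.

```python
from datetime import datetime, timedelta


def date_key(now: datetime, days_back: int) -> int:
    """Mirror of CAST(from_unixtime(unix_timestamp() - days_back*60*60*24, 'yyyyMMdd') AS INT)."""
    shifted = now - timedelta(seconds=days_back * 60 * 60 * 24)
    return int(shifted.strftime("%Y%m%d"))


now = datetime(2024, 3, 15, 9, 30)  # fixed "current time" for a repeatable example
start_date = date_key(now, 30)
end_date = date_key(now, 1)
print(start_date, end_date)  # 20240214 20240314
```

Because yyyyMMdd integers increase with the calendar date, BETWEEN on them behaves like a date-range test, including across month and year boundaries.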
I want to get the last entry for each user but the customer_id is a hash 'ASAG#...' order by customer_id destroys the query. Is there an alternative?
Select Distinct On (l.customer_id)
l.customer_id
,l.created_at
,l.text
From likes l
Order By l.customer_id, l.created_at Desc
Your current query already appears to be working.
I don't know why it is not generating the results you expect. Given your ORDER BY clause, it should return one record per customer, namely the most recent one.
In any case, if it does not do what you want, an alternative is to use ROW_NUMBER() partitioned by customer. The inner query numbers each customer's records, giving 1 to the most recent one, and the outer query keeps only that latest record.
SELECT
t.customer_id,
t.created_at,
t.text
FROM
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY created_at DESC) rn
FROM likes
) t
WHERE t.rn = 1
To speed up the inner query which uses ROW_NUMBER() you can try adding a composite index on the customer_id and created_at columns:
CREATE INDEX yourIdx ON likes (customer_id, created_at);
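A quick way to sanity-check the ROW_NUMBER() approach is to run it against a throwaway SQLite table from Python (SQLite 3.25+ supports window functions). The customer ids below are hypothetical placeholders, not real hashed values. The sort order of hashed ids does not affect correctness, because PARTITION BY only groups equal ids; only created_at is used for ordering within each group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE likes (customer_id TEXT, created_at TEXT, text TEXT)")
conn.executemany(
    "INSERT INTO likes VALUES (?, ?, ?)",
    [("hash_a", "2024-01-01", "old a"), ("hash_a", "2024-02-01", "new a"),
     ("hash_b", "2024-01-15", "only b"),
     ("hash_c", "2024-03-01", "new c"), ("hash_c", "2023-12-31", "old c")],
)
# Same composite index as suggested above.
conn.execute("CREATE INDEX yourIdx ON likes (customer_id, created_at)")

query = """
SELECT t.customer_id, t.created_at, t.text
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY created_at DESC) AS rn
    FROM likes
) t
WHERE t.rn = 1          -- keep only the newest row per customer
ORDER BY t.customer_id  -- only for a stable display order
"""
latest = conn.execute(query).fetchall()
print(latest)
```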
I have a table with an ID column called mmsi and another column of timestamp, with multiple timestamps per mmsi.
For each mmsi I want to calculate the standard deviation of the difference between consecutive timestamps.
I'm not very experienced with SQL but have tried to construct a function as follows:
SELECT
mmsi, stddev(time_diff)
FROM
(SELECT mmsi,
EXTRACT(EPOCH FROM (timestamp - lag(timestamp) OVER (ORDER BY mmsi ASC, timestamp ASC)))
FROM ais_messages.ais_static
ORDER BY mmsi ASC, timestamp ASC) AS time_diff
WHERE time_diff IS NOT NULL
GROUP BY mmsi;
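Your query is on the right track, but it has a few problems. You gave the subquery the alias time_diff and then selected that alias as if it were a column, but the alias names the whole derived table (multiple rows and columns), not the computed difference, so this doesn't make sense. Also, your LAG has no PARTITION BY mmsi, so the first row of each mmsi is compared with the last row of the previous mmsi. Here is a corrected version: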
SELECT
t.mmsi,
STDDEV(t.time_diff) AS std
FROM
(
SELECT
mmsi,
EXTRACT(EPOCH FROM (timestamp - LAG(timestamp) OVER
(PARTITION BY mmsi ORDER BY timestamp))) AS time_diff
FROM ais_messages.ais_static
ORDER BY mmsi, timestamp
) t
WHERE t.time_diff IS NOT NULL
GROUP BY t.mmsi
This approach should be fine, but there is one edge case where it might not behave as expected. If a given mmsi group has only one record, it will not appear in the result set at all, because the LAG calculation returns NULL for that single record and the WHERE clause filters it out.
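The sketch below, again using Python's sqlite3 as a stand-in, shows why PARTITION BY mmsi matters and how the edge cases play out. It assumes timestamps are already stored as epoch seconds (so EXTRACT(EPOCH ...) is unnecessary), and it computes the sample standard deviation in Python with statistics.stdev because SQLite has no STDDEV aggregate. PostgreSQL's STDDEV is the sample version and returns NULL for a group with only one difference; the sketch mirrors that with None. The rows are invented.

```python
import sqlite3
import statistics
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ais_static (mmsi INTEGER, timestamp INTEGER)")
conn.executemany(
    "INSERT INTO ais_static VALUES (?, ?)",
    [(1, 0), (1, 10), (1, 30), (1, 60),  # diffs 10, 20, 30
     (2, 100), (2, 105),                 # a single diff of 5
     (3, 500)],                          # one record: no diff at all
)

# Per-mmsi differences between consecutive timestamps.
rows = conn.execute("""
    SELECT mmsi,
           timestamp - LAG(timestamp) OVER (PARTITION BY mmsi ORDER BY timestamp) AS time_diff
    FROM ais_static
""").fetchall()

diffs = defaultdict(list)
for mmsi, time_diff in rows:
    if time_diff is not None:  # same role as WHERE t.time_diff IS NOT NULL
        diffs[mmsi].append(time_diff)

# Sample standard deviation; None where it is undefined (fewer than 2 diffs).
std = {m: (statistics.stdev(d) if len(d) > 1 else None) for m, d in diffs.items()}

# Without PARTITION BY, LAG crosses mmsi boundaries (mmsi 2 is compared with mmsi 1).
unpartitioned = conn.execute("""
    SELECT mmsi, timestamp - LAG(timestamp) OVER (ORDER BY mmsi, timestamp)
    FROM ais_static
""").fetchall()
print(std)
```

mmsi 3 never appears in std, which is exactly the edge case described above.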
I'm working on a query that I've run successfully in MySQL for a while, but in Postgres it fails with the ole:
ERROR: column "orders.created_at" must appear in the GROUP BY clause or be used in an aggregate function
Here's the query:
SELECT SUM(total) AS total, to_char(created_at, 'YYYY/MM/DD') AS order_date
FROM orders
WHERE created_at >= (NOW() - INTERVAL '2 DAYS')
GROUP BY to_char(created_at, 'DD')
ORDER BY created_at ASC;
It's just supposed to return something like this:
total | order_date
---------+------------
1099.90 | 2013/01/15
650.00 | 2013/01/16
4399.00 | 2013/01/17
The main thing is I want the sum grouped by each individual day of the month.
Anyone have ideas?
UPDATE:
The reason I'm grouping by day is because the graph will be labeled with each day of the month, and the total sales for each.
1st - $3400.00
2nd - $2237.00
3rd - $1489.00
etc.
I'm not sure why you're doing a conversion there. I think the better thing to do would be this:
SELECT
SUM(total) AS total,
created_at::date AS order_date
FROM
orders
WHERE
created_at >= (NOW() - INTERVAL '2 DAYS')
GROUP BY
created_at::date
ORDER BY
created_at::date ASC;
I would recommend this query, and then formatting the daily labels through your graph's settings, so that the same day of different months is never grouped together. However, to get the labels shown in your update, you can do this:
SELECT
SUM(total) AS total,
to_char(created_at, 'DDth') AS order_date
FROM
orders
WHERE
created_at >= (NOW() - INTERVAL '2 DAYS')
GROUP BY
to_char(created_at, 'DDth')
ORDER BY
to_char(created_at, 'DDth') ASC;
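To see the "same day in different months" problem concretely, here is a Python/sqlite3 sketch with invented rows. SQLite's date(created_at) stands in for created_at::date, and strftime('%d', created_at) for a day-of-month grouping such as to_char(created_at, 'DD'). The question's two-day WHERE window is left out on purpose: with a window that short, two months cannot collide, but with a wider one they can.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (total REAL, created_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(100.0, "2013-01-15 09:00:00"), (50.0, "2013-01-15 17:30:00"),
     (20.0, "2013-02-15 12:00:00"), (10.0, "2013-01-16 08:00:00")],
)

# One group per calendar date (like created_at::date).
by_date = conn.execute("""
    SELECT SUM(total), date(created_at) AS order_date
    FROM orders
    GROUP BY date(created_at)
    ORDER BY date(created_at)
""").fetchall()

# One group per day of month only (like to_char(created_at, 'DD')).
by_day_of_month = conn.execute("""
    SELECT SUM(total), strftime('%d', created_at) AS order_day
    FROM orders
    GROUP BY strftime('%d', created_at)
    ORDER BY strftime('%d', created_at)
""").fetchall()
print(by_date)
print(by_day_of_month)
```

In the second result, January 15 and February 15 are merged into a single "15" bucket.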
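Here is the SQL you need to run this. Any non-aggregated expression in the SELECT list or the ORDER BY clause must match the GROUP BY expression, so the GROUP BY and ORDER BY here use the same to_char(created_at, 'YYYY/MM/DD') expression as the SELECT list.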
SELECT SUM(total) AS total,
to_char(created_at, 'YYYY/MM/DD') AS order_date
FROM orders
WHERE created_at >= (NOW() - INTERVAL '2 DAYS')
GROUP BY to_char(created_at, 'YYYY/MM/DD')
order by to_char(created_at, 'YYYY/MM/DD')
http://sqlfiddle.com/#!12/52d99/2
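Grouping and ordering by the same zero-padded 'YYYY/MM/DD' string works because such strings sort in chronological order, even across years. A minimal Python/sqlite3 check, with strftime('%Y/%m/%d', ...) standing in for to_char(created_at, 'YYYY/MM/DD') and made-up rows inserted out of order:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (total REAL, created_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(4399.0, "2013-01-17 09:00"), (1000.0, "2013-01-15 10:00"),
     (300.0, "2012-12-31 23:00"), (650.0, "2013-01-16 11:00"),
     (99.5, "2013-01-15 18:00")],
)

# GROUP BY and ORDER BY use the very same expression as the SELECT list.
rows = conn.execute("""
    SELECT SUM(total) AS total, strftime('%Y/%m/%d', created_at) AS order_date
    FROM orders
    GROUP BY strftime('%Y/%m/%d', created_at)
    ORDER BY strftime('%Y/%m/%d', created_at)
""").fetchall()
print(rows)
```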