What's wrong with this query?
select extract(week from created_at) as week,
count(*) as received,
(select count(*) from bugs where extract(week from updated_at) = a.week) as done
from bugs as a
group by week
The error message is:
column a.week does not exist
UPDATE:
following the suggestion of the first comment, I tried this:
select a.extract(week from created_at) as week,
count(*) as received, (select count(*)
from bugs
where extract(week from updated_at) = a.week) as done from bugs as a group by week
But it doesn't seem to work:
ERROR: syntax error at or near "from"
LINE 1: select a.extract(week from a.created_at) as week, count(*) a...
As far as I can tell you don't need the sub-select at all:
select extract(week from created_at) as week,
count(*) as received,
sum( case when extract(week from updated_at) = extract(week from created_at) then 1 end) as done
from bugs
group by week
This counts all bugs per week and counts those that are updated in the same week as "done".
Note that your query will only report correct values if you never have more than one year in your table.
If you have more than one year of data in the table you need to include the year in the comparison as well:
select to_char(created_at, 'iyyy-iw') as week,
count(*) as received,
sum( case when to_char(created_at, 'iyyy-iw') = to_char(updated_at, 'iyyy-iw') then 1 end) as done
from bugs
group by week
Note that I used IYYY and IW to cater for the ISO definition of the year and the week around the year end/start.
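To see why the ISO year has to go together with the ISO week, here is a small sketch in Python (outside the database, purely as an illustration). The helper name iso_week_key is made up for this example; it builds the same kind of key that to_char(..., 'iyyy-iw') produces.

```python
from datetime import date

def iso_week_key(d: date) -> str:
    """Build an 'IYYY-IW' style key from the ISO year and ISO week of d."""
    iso_year, iso_week, _ = d.isocalendar()
    return f"{iso_year:04d}-{iso_week:02d}"

# 2021-01-01 is a Friday and still belongs to the last ISO week of 2020,
# so it gets the same key as Monday 2020-12-28.
print(iso_week_key(date(2021, 1, 1)))    # 2020-53
print(iso_week_key(date(2020, 12, 28)))  # 2020-53
```

Combining the calendar year with the ISO week would instead label 2021-01-01 as week 53 of 2021, which is why the query uses iyyy rather than yyyy.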
Maybe a little explanation on why your original query did not work would be helpful:
The "outer" query uses two aliases
a table alias for bugs named a
a column alias for the expression extract(week from created_at) named week
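In PostgreSQL, the only places where the column alias week can be used are the group by and order by clauses.
To the sub-select (select count(*) from bugs where extract(week from updated_at) = a.week) the alias a is visible, but not the alias week (that's how the SQL standard is defined). The reference a.week is therefore looked up as a real column of bugs, which does not exist, hence the error.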
To get your subselect working (in terms of column visibility) you would need to reference the full expression of the "outer" column:
(select count(*) from bugs b where extract(week from b.updated_at) = extract(week from a.created_at))
Note that I introduced another table alias b in order to make it clear which column stems from which alias.
But even then you'd have a problem with the grouping as you can't reference an ungrouped column like that.
This could work as well:
with origin as (
select extract(week from created_at) as week, count(*) as received
from bugs
group by week
)
select week, received,
(select count(*) from bugs where extract(week from updated_at) = origin.week) as done
from origin
It should also perform reasonably well.
I have a table called Table1. I am trying to get the weekly average, but I only have daily data. My table contains the following attributes: caseID, date, status and some other (irrelevant) attributes. With the following query, I made the following table which comes close to what I want:
However, I would like to add an average per week of the number of cases. I have looked everywhere, but I am not sure how to include that. Does anybody have any clues on how to add it?
Thanks.
To expand on @luuk's answer...
SELECT
date,
COUNT(id) as countcase,
EXTRACT(WEEK FROM date) AS weeknbr,
AVG(COUNT(id)) OVER (PARTITION BY EXTRACT(WEEK FROM date)) as weeklyavg
FROM table1
GROUP BY date, weeknbr
ORDER BY date, weeknbr
This is possible as the Aggregation / GROUP BY is applied before the window/analytic function.
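The order of evaluation can be sketched in Python (an illustration only, not how the database executes it). The function name and the sample shape are assumptions for this example: step 1 plays the role of GROUP BY date, and step 2 plays the role of the window function, which only ever sees the already grouped rows.

```python
from collections import defaultdict
from datetime import date

def daily_counts_with_weekly_avg(case_dates):
    """Return (date, countcase, weeknbr, weeklyavg) rows, ordered by date."""
    # Step 1 (GROUP BY date): count the cases per day.
    per_day = defaultdict(int)
    for d in case_dates:
        per_day[d] += 1

    # Step 2 (window over the grouped rows): collect the daily counts per ISO week.
    per_week = defaultdict(list)
    for d, n in per_day.items():
        per_week[d.isocalendar()[1]].append(n)

    rows = []
    for d, n in sorted(per_day.items()):
        week = d.isocalendar()[1]
        rows.append((d, n, week, sum(per_week[week]) / len(per_week[week])))
    return rows

sample = [date(2021, 3, 1)] * 2 + [date(2021, 3, 2)] * 4 + [date(2021, 3, 8)]
for row in daily_counts_with_weekly_avg(sample):
    print(row)
```

Each day keeps its own count, and every day of the same week carries the same average of the daily counts.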
select
date,
countcase,
extract(week from date) as weeknbr,
avg(countcase) over (partition by extract(week from date)) as weeklyavg
from table1;
with partial as(
select
date_part('week', activated_at) as weekly,
count(*) as count
from vendors
where activated_at notnull
group by weekly
)
This query counts the number of vendors activated per week. I need to change the start day of the week from Monday to Saturday. Similar posts, like "how to change the first day of the week in PostgreSQL" or "Making Postgres date_trunc() use a Sunday based week", do not explain how to embed this in the date_part function. I would like to know how to use this function in my query and start the week on Saturday.
Thanks in advance.
This may be a little overkill, but you can use some CTEs and window functions. First generate your intervals, starting with the first Saturday you want (e.g. 2018-01-06 00:00) and ending with the last day you want (2018-12-31). Then select your data, join it to the intervals and sum it. As a benefit, you also get weeks with zero activations:
with temp_days as (
SELECT a as a ,
a + '7 days'::interval as e
FROM generate_series('2018-01-06 00:00'::timestamp,
'2018-12-31 00:00', '7 day') as a
),
temp_data as (
select
1 as counter,
vendors.activated_at
from vendors
where activated_at notnull
),
temp_order as
(
select *
from temp_days
left join temp_data on temp_data.activated_at >= temp_days.a and temp_data.activated_at < temp_days.e
)
select
distinct on (temp_order.a)
temp_order.a,
temp_order.e,
coalesce(sum(temp_order.counter) over (partition by temp_order.a),0) as result
from temp_order
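The bucketing idea behind the intervals can be sketched in Python (an illustration only; the function name week_start_saturday is made up here). Instead of a week number, each date is mapped to the Saturday that opens its week, which is the same value as column a in the query above.

```python
from datetime import date, timedelta

def week_start_saturday(d: date) -> date:
    """Return the Saturday on or before d (weekday(): Monday=0 ... Saturday=5)."""
    days_since_saturday = (d.weekday() - 5) % 7
    return d - timedelta(days=days_since_saturday)

# 2018-01-06 is a Saturday, so the whole span up to Friday 2018-01-12 maps to it.
print(week_start_saturday(date(2018, 1, 6)))   # 2018-01-06
print(week_start_saturday(date(2018, 1, 12)))  # 2018-01-06
print(week_start_saturday(date(2018, 1, 13)))  # 2018-01-13
```

Grouping by this start date rather than by a week number also avoids mixing up the same week number from different years.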
Good morning,
I have a problem I've been trying to solve but am getting nowhere.
I need to find the max date of the previous month. Normally I would just use the following to find the last day of the previous month: last_day(add_months(current_date, -1))
However, this particular data set doesn't always have data for the last day. E.g. the last day in the data for May was May 30th. Obviously, if I try using the syntax above it would return no data because it would be looking for 5/31.
So is there a way to find the "max" day available in the data of the previous month? Or the month prior etc.?
For example like this (two scans of table: one in subquery to find max date and one in main query):
select *
from mytable
where as_of_date in (select max(as_of_date) from mytable where as_of_date between first_day(add_months(current_date, -1)) and last_day(add_months(current_date, -1)))
Or with a single scan and an analytic function, like this:
select col1 ... colN
from
(
select t.*, rank() over (partition by month (t.as_of_date) order by t.as_of_date desc) rnk
from mytable t
where --If you have partition on date, this WHERE may improve performance
t.as_of_date between first_day(add_months(current_date, -1)) and last_day(add_months(current_date, -1))
)s
where rnk=1
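The logic of both queries (restrict to the previous month, then take the latest date that actually exists) can be sketched in Python. This is an illustration only; the function name and the sample dates are assumptions, with May 30th as the last loaded day as in the question.

```python
from datetime import date, timedelta

def max_date_in_previous_month(dates, today):
    """Return the latest date from `dates` that falls in the month before `today`."""
    first_of_this_month = today.replace(day=1)
    last_of_prev = first_of_this_month - timedelta(days=1)
    first_of_prev = last_of_prev.replace(day=1)
    in_range = [d for d in dates if first_of_prev <= d <= last_of_prev]
    # None signals that the previous month has no data at all.
    return max(in_range) if in_range else None

loaded = [date(2019, 4, 30), date(2019, 5, 29), date(2019, 5, 30), date(2019, 6, 3)]
print(max_date_in_previous_month(loaded, today=date(2019, 6, 10)))  # 2019-05-30
```

For "the month prior" the same idea applies with the window shifted back one more month.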
I'm trying to write a query for Hive that uses the system date to determine both yesterday's date as well as the date 30 days ago. This will provide me with a rolling 30 days without the need to manually feed the date range to the query every time I run it.
I have that code working fine in a CTE. The problem I'm having is in referencing those dates in another CTE without joining the CTEs together, which I can't do since there's not a common field to join on.
I've tried various approaches but I get a "ParseException" every time.
WITH
date_range AS (
SELECT
CAST(from_unixtime(unix_timestamp()-30*60*60*24,'yyyyMMdd') AS INT) AS start_date,
CAST(from_unixtime(unix_timestamp()-1*60*60*24,'yyyyMMdd') AS INT) AS end_date
)
SELECT * FROM myTable
WHERE date_id BETWEEN (SELECT start_date FROM date_range) AND (SELECT end_date FROM date_range)
The intended result is the set of records from myTable that have a date_id between the start_date and end_date as found in the CTE date_range. Perhaps I'm going about this all wrong?
You can do a cross join; it does not require an ON condition. Your date_range dataset has only one row, so you can CROSS JOIN it with your table if necessary and it will be converted to a map-join (the small dataset is broadcast to all the mappers and loaded into each mapper's memory, so it works very fast). Check the EXPLAIN output to make sure it is a map-join:
set hive.auto.convert.join=true;
set hive.mapjoin.smalltable.filesize=250000000;
WITH
date_range AS (
SELECT
CAST(from_unixtime(unix_timestamp()-30*60*60*24,'yyyyMMdd') AS INT) AS start_date,
CAST(from_unixtime(unix_timestamp()-1*60*60*24,'yyyyMMdd') AS INT) AS end_date
)
SELECT t.*
FROM myTable t
CROSS JOIN date_range d
WHERE t.date_id BETWEEN d.start_date AND d.end_date
Alternatively, you can calculate the dates directly in the WHERE clause, in which case neither the CTE nor the join is needed:
SELECT t.*
FROM myTable t
WHERE t.date_id
BETWEEN CAST(from_unixtime(unix_timestamp()-30*60*60*24,'yyyyMMdd') AS INT)
AND CAST(from_unixtime(unix_timestamp()-1*60*60*24,'yyyyMMdd') AS INT)
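To sanity-check which date_id values the rolling window covers, the arithmetic can be reproduced in Python (an illustration only; the function name rolling_date_ids is made up here, and it works on calendar dates rather than on Unix timestamps).

```python
from datetime import date, timedelta

def rolling_date_ids(today: date, days_back: int = 30):
    """Return (start_id, end_id) as yyyyMMdd integers: `days_back` days ago through yesterday."""
    start = today - timedelta(days=days_back)
    end = today - timedelta(days=1)
    return int(start.strftime("%Y%m%d")), int(end.strftime("%Y%m%d"))

print(rolling_date_ids(date(2019, 3, 15)))  # (20190213, 20190314)
```

Because BETWEEN is inclusive on both ends, this range spans exactly 30 calendar days.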
To make the example super simple, let's say that I have a table with three columns: ID, Name, and Date. I need to find the count of all IDs belonging to a specific name where the ID does not belong to this month.
Using that example, I would want this output:
In other words, I want to count how many ID's that a name has that aren't this month/year.
I'm more into PowerShell and still fairly new to SQL. I tried doing a case statement, but because it's not a foreach it seems to be returning "If the Name has ANY date in this month, return NULL" which is not what I want. I want it to count how many ID's per name do not appear in this month.
SELECT NAME,
CASE
WHEN ( Month(date) NOT LIKE Month(Getdate())
AND Year(date) NOT LIKE Year(Getdate()) ) THEN Count(id)
END AS TotalCount
FROM dbo.table
GROUP BY NAME,
date
I really hope this makes sense, but if it doesn't please let me know and I can try to clarify more. I tried researching cursors, but I'm having a hard time grasping them to get them into my statement. Any help would be greatly appreciated!
You only want to group by the non-aggregated columns that are in the result set (in this case, Name). You totally don't need a cursor for this, it's a fairly straight-forward query.
select
Name,
Count(*) count
from
tbl
where
tbl.date > eomonth(getdate()) or
tbl.date <= eomonth(dateadd(mm, -1, getdate()))
group by
Name
I did a little bit of trickery on the exclusion of rows that are in the current month. Generally, you want to avoid running functions on the columns you're comparing to if you can so that SQL Server can use an index to speed up its search. I assumed that the ID column is unique, if it's not, change count(*) to count(distinct ID).
Alternative where clause if you're using older versions of sql server. If the table is small enough, you can just do it directly (similar to what you tried originally, it just goes in the query where clause and not embedded in a case)
where
Month(date) <> Month(Getdate())
OR Year(date) <> Year(Getdate())
If you have a large table and keeping the predicate sargable (able to use the index) is important, you can emulate eomonth with dateadd and the date part functions, but it's a pain.
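The range-based idea (compare the raw date against two month boundaries instead of wrapping it in functions) can be sketched in Python. This is an illustration only; the function names and the sample rows are assumptions, and it uses a half-open range from the first day of this month to the first day of the next.

```python
from datetime import date

def month_bounds(today: date):
    """Return (first day of this month, first day of next month)."""
    start = today.replace(day=1)
    if start.month == 12:
        next_start = date(start.year + 1, 1, 1)
    else:
        next_start = date(start.year, start.month + 1, 1)
    return start, next_start

def count_outside_current_month(rows, today):
    """Count rows per name whose date is before this month or on/after next month."""
    start, next_start = month_bounds(today)
    counts = {}
    for name, d in rows:
        if d < start or d >= next_start:
            counts[name] = counts.get(name, 0) + 1
    return counts

rows = [
    ("A", date(2020, 5, 1)),   # this month, excluded
    ("A", date(2020, 4, 30)),  # last month, counted
    ("A", date(2019, 5, 15)),  # same month, other year, counted
    ("B", date(2020, 5, 20)),  # this month, excluded
    ("B", date(2020, 6, 1)),   # next month, counted
]
print(count_outside_current_month(rows, today=date(2020, 5, 10)))  # {'A': 2, 'B': 1}
```

As with the WHERE-based queries, a name whose rows all fall in the current month does not appear in the result at all.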
SELECT Name, COUNT(ID) AS TotalCount
FROM dbo.[table]
WHERE DATEPART(MONTH, [Date]) != DATEPART(MONTH, GETDATE()) OR DATEPART(YEAR, [Date]) != DATEPART(YEAR, GETDATE())
GROUP BY Name;
In T-SQL:
SELECT
NAME,
COUNT(id)
FROM dbo.[table]
WHERE MONTH(Date_M) <> MONTH(GETDATE()) OR YEAR(Date_M) <> YEAR(GETDATE())
GROUP BY NAME