For almost a year, I have been using this SQL to report on the rank of game profiles, based on the number of days that a game has been ranked in the top 10:
SELECT P.id, P.name, P.rank, COUNT(P.id)
FROM application_classes_models_gameprofile P
LEFT JOIN application_classes_models_gamedeveloper D ON D.id = P."developerId"
LEFT JOIN application_classes_models_gameprofileposition PP ON
PP."gameProfileId" = P.id AND
PP.position <= 10 AND
PP.position > 0
WHERE
P.inactive = false AND
D."excludeFromRanking" = false AND
P.rank <= 10 AND
P.rank > 0
GROUP BY P.id
ORDER BY COUNT(P.id) DESC
Grouping is always a bit of a pain in PostgreSQL, but the above SQL has been working fine for almost a year, returning the expected results.
Yesterday, I had an issue with the game profile table which forced me to restore a backup of that table. I did so using pg_restore -v --clean -t application_classes_models_gameprofile < backup.bak.
This morning, when we ran our reports, postgresql came back with the error:
column "p.name" must appear in the GROUP BY clause or be used in an aggregate function
Just to clarify: this SQL has been running for almost a year, and the above error has never appeared for this specific query. However, since we cleaned and restored the game profile table, we have been getting the above error...
I know that I can work around the problem by rewriting the query to drop name/rank from the select list, but I worry that there is a deeper issue here... so does anyone know why the above might happen?
PostgreSQL version is 9.6, running on Debian 9.
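One thing I have not ruled out is that the restore dropped the table's primary key: as I understand it, grouping by P.id alone while selecting P.name and P.rank is only legal when PostgreSQL can see those columns as functionally dependent on a primary key. A sketch of the check I plan to run against the standard catalogs:
-- List all constraints on the restored table; contype = 'p' is the primary key
SELECT conname, contype
FROM pg_constraint
WHERE conrelid = 'application_classes_models_gameprofile'::regclass;
If no row with contype = 'p' comes back, the primary key did not survive the restore.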
I use macOS and I am having an issue with --full-refresh on a large table. During the run it appears to hang, and there is no query running in Redshift. It does not happen with smaller tables, and it does not happen if I run an incremental build. This table used to be smaller, and I was able to run a full refresh as long as I specified the table. Now that it is bigger, I seem to be running into this issue. There are 6 tables that this model depends on. It is almost like the command isn't being sent. Any suggestions?
There is no error; it just doesn't run. Other team members running this on Windows and macOS expect it to finish in 10 minutes. It has currently been 30 minutes, but I have let it sit a lot longer than that.
My command is
dbt run --models +fct_mymodel --full-refresh --vars "run_date_start: 2020-06-01"
Thank you
The Redshift UI usually shows only the long-running queries. I ran into similar problems, and they were caused by locks on some tables - in our case by uncommitted explicit transactions (a BEGIN without a COMMIT or ROLLBACK).
Run this query to see current transactions and their locks:
select a.txn_owner,
       a.txn_db,
       a.xid,
       a.pid,
       a.txn_start,
       a.lock_mode,
       a.relation as table_id,
       nvl(trim(c."name"), d.relname) as tablename,
       a.granted,
       b.pid as blocking_pid,
       datediff(s, a.txn_start, getdate())/86400 || ' days ' ||
       datediff(s, a.txn_start, getdate())%86400/3600 || ' hrs ' ||
       datediff(s, a.txn_start, getdate())%3600/60 || ' mins ' ||
       datediff(s, a.txn_start, getdate())%60 || ' secs' as txn_duration
from svv_transactions a
left join (select pid,relation,granted from pg_locks group by 1,2,3) b
on a.relation=b.relation and a.granted='f' and b.granted='t'
left join (select * from stv_tbl_perm where slice=0) c
on a.relation=c.id
left join pg_class d on a.relation=d.oid
where a.relation is not null;
Read the AWS knowledge base entry for more details: https://aws.amazon.com/premiumsupport/knowledge-center/prevent-locks-blocking-queries-redshift/
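If the query does turn up a stuck transaction, you can end it from SQL. A minimal sketch, assuming the blocking_pid returned above is 12345:
-- Terminate the session holding the lock; its open transaction is rolled back
select pg_terminate_backend(12345);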
I have a simple script that counts form leads and displays the counts by month and year. It worked fine until I upgraded to MySQL 5.7. Now I get this error:
There was an error running the query [Expression #3 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'form.form_25.submission_date' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by]
My query is:
SELECT YEAR(`submission_date`) AS yr,
MONTH(`submission_date`) AS mth,
DATE_FORMAT(`submission_date`,'%M %Y') AS display_date,
COUNT(*) AS leadcount
FROM form_25
WHERE `submission_date` >= CURRENT_DATE - INTERVAL 1 YEAR
GROUP BY yr,mth
ORDER BY yr DESC, mth DESC
I realize this is because only_full_group_by is enabled, but I don't want to disable it.
I've researched this problem, but it seems like all of the suggested solutions are about grouping by a unique column. That isn't a solution in this case because grouping by my primary column does not display the lead counts properly.
Thanks in advance for your help.
Okay, I figured out a solution that is good enough for my purposes. I discovered that the error only happens when this line is included:
DATE_FORMAT(`submission_date`,'%M %Y') AS display_date,
So I removed that line and recreated the display_date variable in PHP by using the yr and mth aliases.
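If you would rather keep the formatting in SQL, ANY_VALUE() (added in MySQL 5.7 for exactly this situation) should also silence the error - a sketch, safe here because every row in a given year/month group formats to the same display string anyway:
SELECT YEAR(`submission_date`) AS yr,
MONTH(`submission_date`) AS mth,
ANY_VALUE(DATE_FORMAT(`submission_date`,'%M %Y')) AS display_date,
COUNT(*) AS leadcount
FROM form_25
WHERE `submission_date` >= CURRENT_DATE - INTERVAL 1 YEAR
GROUP BY yr,mth
ORDER BY yr DESC, mth DESC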
I am running Postgres 9.1.3 32-bit on Windows 7 x64. (I have to use 32-bit because there is no Windows PostGIS release compatible with 64-bit Postgres.) (EDIT: As of PostGIS 2.0, it is compatible with 64-bit Postgres on Windows.)
I have a query that left joins a table (consistent.master) with a temporary table, then inserts the resulting data into a third table (consistent.masternew).
Since this is a left join, the resulting table should have the same number of rows as the left table in the query. However, if I run this:
SELECT count(*)
FROM consistent.master
I get 2085343. But if I run this:
SELECT count(*)
FROM consistent.masternew
I get 2085703.
How can masternew have more rows than master? Shouldn't masternew have the same number of rows as master, the left table in the query?
Below is the query. The master and masternew tables should be identically-structured.
--temporary table created here
--I am trying to locate where multiple tickets were written on
--a single traffic stop
WITH stops AS (
SELECT citation_id,
rank() OVER (ORDER BY offense_timestamp,
defendant_dl,
offense_street_number,
offense_street_name) AS stop
FROM consistent.master
WHERE citing_jurisdiction=1
)
--Here's the insert statement. Below you'll see it's
--pulling data from a select query
INSERT INTO consistent.masternew (arrest_id,
citation_id,
defendant_dl,
defendant_dl_state,
defendant_zip,
defendant_race,
defendant_sex,
defendant_dob,
vehicle_licenseplate,
vehicle_licenseplate_state,
vehicle_registration_expiration_date,
vehicle_year,
vehicle_make,
vehicle_model,
vehicle_color,
offense_timestamp,
offense_street_number,
offense_street_name,
offense_crossstreet_number,
offense_crossstreet_name,
offense_county,
officer_id,
offense_code,
speed_alleged,
speed_limit,
work_zone,
school_zone,
offense_location,
source,
citing_jurisdiction,
the_geom)
--Here's the select query that the insert statement is using.
SELECT stops.stop,
master.citation_id,
defendant_dl,
defendant_dl_state,
defendant_zip,
defendant_race,
defendant_sex,
defendant_dob,
vehicle_licenseplate,
vehicle_licenseplate_state,
vehicle_registration_expiration_date,
vehicle_year,
vehicle_make,
vehicle_model,
vehicle_color,
offense_timestamp,
offense_street_number,
offense_street_name,
offense_crossstreet_number,
offense_crossstreet_name,
offense_county,
officer_id,
offense_code,
speed_alleged,
speed_limit,
work_zone,
school_zone,
offense_location,
source,
citing_jurisdiction,
the_geom
FROM consistent.master LEFT JOIN stops
ON stops.citation_id = master.citation_id
In case it matters, I have run a VACUUM FULL ANALYZE and reindexed both tables. (Not sure of exact commands; did it through pgAdmin III.)
A left join does not necessarily return the same number of rows as the left table. Basically, it is like an inner join, except that rows of the left table which would not appear in the inner join are also added. So, if more than one row in the right table matches a single row in the left table, you can get more rows in the result than there are in the left table.
In order to do what you want, you should use a GROUP BY with a COUNT to detect the multiples:
select s.citation_id
from stops s join consistent.master m on s.citation_id = m.citation_id
group by s.citation_id
having count(*) > 1
Sometimes you know there are multiples, but don't care. You just want to take the first or top entry.
If so, you can use SELECT DISTINCT ON:
FROM consistent.master LEFT JOIN (SELECT DISTINCT ON (citation_id) * FROM stops) s
ON s.citation_id = master.citation_id
where citation_id is the column for which you want to keep just the first (any one) row per value.
You might want to ensure this is deterministic and use ORDER BY with some other orderable column:
SELECT DISTINCT ON (citation_id) * FROM stops ORDER BY citation_id, created_at
I can't paste in the entire SQL for various reasons, so consider this example:
select *
from
(select nvl(get_quantity(1), 10) available_qty
from dual)
where available_qty > 30;
get_quantity is a function that makes a calculation based on the ID of a record that's passed through it. If it returns null, I use nvl() to force it to 10.
The query runs very slowly when I use the WHERE clause in the parent query. When I comment out the WHERE clause, however, it runs very fast. What I don't get is why it can display the data very fast but can't filter it just as fast. I am querying the results of a subquery, too. I was under the impression that subqueries return a "rendered" dataset. It's almost as if referencing the available_qty identifier causes it to re-evaluate something within the subquery.
For that reason, I don't think the contents of the get_quantity function are relevant here, so I didn't bother posting them. Instead, I think it's a misunderstanding on my part of how Oracle handles subqueries and whatnot.
Do any of you Oracle gurus have any idea what I am doing wrong?
Afterthought: as I was entering tags for this question, the tag "correlated subquery" came up. In doing some quick research, it seems that this type of subquery somewhat depends on the outer query. Could this be related to my problem?
Let's try an experiment. First we'll run the following query:
select lvl, rnd
from (select level as lvl from dual connect by level <= 5) a,
(select dbms_random.value() rnd from dual) b;
The "a" subquery will return 5 rows with values from 1 to 5. The "b" subquery will return one row with a random value. If the function is run before the two tables are join (by Cartesian), the same random value will be returned for each row. The actual results:
LVL RND
---------- ----------
1 .417932089
2 .963531718
3 .617016889
4 .128395638
5 .069405568
5 rows selected.
Clearly the function was run for each of the joined rows, not once for the subquery before the join. This is a result of Oracle's optimizer deciding that the best path for the query is to do things in that order. To prevent this, we have to add something to the second subquery that will make Oracle run the subquery in its entirety before performing the join. We'll add rownum to the subquery, since Oracle knows rownum will change if it is evaluated after the join. The following query demonstrates this:
select lvl, rnd from (
select level as lvl from dual connect by level <= 5) a,
(select dbms_random.value() rnd, rownum from dual) b;
As you can see from the results, the function was only run once in this case:
LVL RND
---------- ----------
1 .028513902
2 .028513902
3 .028513902
4 .028513902
5 .028513902
5 rows selected.
In your case, it seems likely that the filter provided by the where clause is making the optimizer take a different path, where it's running the function repeatedly, rather than once. By making Oracle run the subquery as written, you should get more consistent run-times.
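Applied to your original query, the same trick would look something like this - a sketch, with get_quantity standing in for your actual function; the rownum column is only there to force the subquery to be evaluated before the outer filter:
select *
from
  (select nvl(get_quantity(1), 10) available_qty,
          rownum rn  -- forces the subquery to run before the join/filter
   from dual)
where available_qty > 30;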
What about the following is not proper syntax for PostgreSQL?
select p.*, SUM(vote) as votes_count
FROM votes v, posts p
where p.id = v.`voteable_id`
AND v.`voteable_type` = 'Post'
group by v.voteable_id
order by votes_count DESC limit 20
I am in the process of installing PostgreSQL locally but wanted to get this out sooner :)
Thank you
MySQL is a lot looser in its interpretation of standard SQL than PostgreSQL is. There are two issues with your query:
Backtick quoting is a MySQL thing.
Your GROUP BY is invalid.
The first one can be fixed by simply removing the offending quotes. The second one requires more work; from the fine manual:
When GROUP BY is present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions, since there would be more than one possible value to return for an ungrouped column.
This means that every column mentioned in your SELECT either has to appear in an aggregate function or in the GROUP BY clause. So you have to expand your p.* and make sure that all of those columns are in the GROUP BY; you should end up with something like this, but with real columns in place of p.column...:
select p.id, p.column..., sum(v.vote) as votes_count
from votes v, posts p
where p.id = v.voteable_id
and v.voteable_type = 'Post'
group by p.id, p.column...
order by votes_count desc
limit 20
This is a pretty common problem when moving from MySQL to anything else.
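As an aside: if posts.id is the primary key and you are on PostgreSQL 9.1 or later, you can keep the p.* as-is, because PostgreSQL recognizes that the remaining posts columns are functionally dependent on the primary key and allows grouping by it alone. A sketch under that assumption:
-- group by the primary key only; p.* is allowed via functional dependency
select p.*, sum(v.vote) as votes_count
from votes v
join posts p on p.id = v.voteable_id
where v.voteable_type = 'Post'
group by p.id
order by votes_count desc
limit 20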