Rails PG GroupingError column must appear in the GROUP BY clause - postgresql

There are a few topics about this already with accepted answers, but I couldn't figure out a solution based on those, e.g.:
Ruby on Rails: must appear in the GROUP BY clause or be used in an aggregate function
GroupingError: ERROR: column must appear in the GROUP BY clause or be used in an aggregate function
PGError: ERROR: column "p.name" must appear in the GROUP BY clause or be used in an aggregate function
My query is:
Idea.unscoped.
  joins('inner join likes on ideas.id = likes.likeable_id').
  select('likes.id, COUNT(*) AS like_count, ideas.id, ideas.title, ideas.intro, likeable_id').
  group('likeable_id').
  order('like_count DESC')
This works fine in development with SQLite but breaks on Heroku with PostgreSQL.
The error is:
PG::GroupingError: ERROR: column "likes.id" must appear in the GROUP BY clause or be used in an aggregate function
If I put likes.id in my GROUP BY, the results make no sense. I tried putting group before select, but that doesn't help. I even tried splitting the query into two parts. No joy. :(
Any suggestions appreciated. TIA!

I don't know why you want to select likes.id in the first place. You basically want the like_count for each Idea, so selecting likes.id serves no purpose. Also, when you already have ideas.id, I don't see why you'd want the value of likes.likeable_id, since the two will be equal. :/
Anyway, the problem is that since you're grouping by likeable_id (basically ideas.id), you can't "select" likes.id, because those values are "lost" by the grouping.
I suppose SQLite is lax about this. I imagine it wouldn't group things properly.
Anyway (2): let me propose a cleaner solution.
# model
class Idea < ActiveRecord::Base
  # to save you the effort of specifying the join-conditions
  has_many :likes, foreign_key: :likeable_id
end
# in your code elsewhere
ideas =
  Idea.
    joins(:likes).
    group("ideas.id").
    select("COUNT(likes.id) AS like_count, ideas.id, ideas.title, ideas.intro").
    order("like_count DESC")
If you still want to get the IDs of likes for each item, then after the above, here's what you could do:
grouped_like_ids =
  Like.
    select(:id, :likeable_id).
    each_with_object({}) do |like, hash|
      (hash[like.likeable_id] ||= []) << like.id
    end
ideas.each do |idea|
  # selected previously:
  idea.like_count
  idea.id
  idea.title
  idea.intro
  # from the hash
  like_ids = grouped_like_ids[idea.id] || []
end
Other readers: I'd be very interested in a "clean" one-query, non-sub-query solution. Let me know in the comments if you post one. Thanks.
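One PostgreSQL-specific sketch that comes close is array_agg, which collects the like IDs inside the same grouped query (untested against the asker's schema, and it is still an aggregate under the hood):
SELECT ideas.id, ideas.title, ideas.intro,
       COUNT(likes.id) AS like_count,
       array_agg(likes.id) AS like_ids
FROM ideas
INNER JOIN likes ON likes.likeable_id = ideas.id
GROUP BY ideas.id
ORDER BY like_count DESC;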

Related

Converting complex query with inner join to tableau

I have a query like this, which we use to generate data for our custom dashboard (a Rails app):
SELECT AVG(wait_time) FROM (
  SELECT TIMESTAMPDIFF(MINUTE, a.finished_time, b.start_time) wait_time
  FROM (
    SELECT MAX(start_time + INTERVAL avg_time_spent SECOND) finished_time, branch
    FROM mytable
    WHERE name IN ('test_name')
      AND status = 'SUCCESS'
    GROUP BY branch
  ) a
  INNER JOIN (
    SELECT MIN(start_time) start_time, branch
    FROM mytable
    WHERE name IN ('test_name_specific')
    GROUP BY branch
  ) b ON a.branch = b.branch
  HAVING avg_time_spent BETWEEN 0 AND 1000
) t
GROUP BY week
Now I am trying to port this to Tableau, and I am not able to find a way to represent this data there. I am stuck on how to represent the inner GROUP BY in a calculated field. I could also just use a custom SQL data source, but I am already using another data source.
Columns in mytable:
start_time
avg_time_spent
name
branch
status
I think this could be achieved with the new Level of Detail formulas, but unfortunately I am stuck on version 8.3.
Save custom SQL for rare cases. This doesn't look like a rare case. Let Tableau generate the SQL for you.
If you simply connect to your table, then you can usually write calculated fields to get the information you want. I'm not exactly sure why you have test_name in one part of your query but test_name_specific in another, so ignoring that, here is a simplified approach to a similar query.
If you define a calculated field called worst_case_test_time as
DATEDIFF('minute', MIN(start_time), MAX(DATEADD('second', avg_time_spent, start_time)))
that seems close to what your original query computes.
It would help if you explained exactly what you are trying to compute. It appears to be some sort of worst-case bound on average test time. There may be an even simpler formula, but it's hard to know without a little context.
You could filter on status = "Success" and avg_time_spent < 1000, and place branch and WEEK(start_time) on, say, the row and column shelves.
P.S. Your query seems a little off. Don't you need an aggregation function like MAX or AVG after the HAVING keyword?

COUNT(field) returns correct amount of rows but full SELECT query returns zero rows

I have a UDF in my database which basically tries to find a station (e.g. bus/train) based on some input data (geographic/name/type). Inside this function I try to check whether there are any rows matching the given values:
SELECT COUNT(s.id)
INTO firsttry
FROM geographic.stations AS s
WHERE ST_DWithin(s.the_geom, plocation, 0.0017)
  AND s.name <-> pname < 0.8
  AND s.type ~ stype;
The firsttry variable now contains the value 1. If I use the following (slightly extended) SELECT statement, I get no results:
RETURN QUERY SELECT
  s.id, s.name, s.type, s.the_geom,
  similarity(
    regexp_replace(s.name::text, '(Hauptbahnhof|Hbf)', 'Hbf'),
    regexp_replace(pname::text, '(Hauptbahnhof|Hbf)', 'Hbf')
  )::double precision AS sml,
  ST_Distance(s.the_geom, plocation) AS dist
FROM geographic.stations AS s
WHERE ST_DWithin(s.the_geom, plocation, 0.0017)
  AND s.name <-> pname < 0.8
  AND s.type ~ stype
ORDER BY dist ASC, sml DESC
LIMIT 1;
The parameters are as follows:
stype = '^railway'
pname = 'Amsterdam Science Park'
plocation = ST_GeomFromEWKT('SRID=4326;POINT(4.9492530 52.3531670)')
The tuple I need returned is:
id name type geom (displayed as ST_AsText)
909658;"Amsterdam Sciencepark";"railway_station";"POINT(4.9482893 52.352904)"
The same UDF works fine for a lot of other stations, but this is one (of several) that just won't work. Any suggestions?
P.S. The use of the <-> operator is coming from the pg_trgm module.
Some ideas on how to troubleshoot this:
Break your troubleshooting into steps. Start with the simplest query possible: no aggregates, just joins and no filters. Then add filters. Then add ORDER BY, then add aggregates. Look at exactly where the behavior changes (see the sketch after these steps).
Try reindexing the database.
One possibility that occurs to me based on this is a corrupted index that is used in the second query but not the first. I have seen corrupted indexes in the past; usually they throw errors, but at least in theory they could create a problem like this.
If this is correct, your query will suddenly return rows if you remove the ORDER BY clause.
If you have a corrupted index, then you need to pay close attention to hardware. Is the RAM ECC? Is the processor overheating? How are your disks doing?
A second possibility is that there is a typo in a join condition or filter clause. Normally this is something I would suspect first, but it is easy enough to weed out index problems by starting there. If removing the ORDER BY doesn't change things, then chances are it is a typo. If you can't find a typo, then try reindexing.
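To make the stepwise approach concrete, here is a sketch against the query from the question (substitute literal values for the function parameters when testing outside the UDF):
-- stage 1: one filter, no ordering
SELECT s.id
FROM geographic.stations AS s
WHERE ST_DWithin(s.the_geom, plocation, 0.0017);
-- stage 2: add the remaining filters one at a time
--   AND s.name <-> pname < 0.8
--   AND s.type ~ stype
-- stage 3: add the ORDER BY last; if the row disappears only now,
-- suspect an index and try rebuilding:
REINDEX TABLE geographic.stations;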

same query, two different ways, vastly different performance

I have a Postgres table with more than 8 million rows. Given the following two ways of doing the same query via DBD::Pg, I get wildly different results.
$q .= '%';

## query 1
my $sql = qq{
  SELECT a, b, c
  FROM t
  WHERE Lower( a ) LIKE '$q'
};
my $sth1 = $dbh->prepare($sql);
$sth1->execute();

## query 2
my $sth2 = $dbh->prepare(qq{
  SELECT a, b, c
  FROM t
  WHERE Lower( a ) LIKE ?
});
$sth2->execute($q);
Query 2 is at least an order of magnitude slower than query 1. It seems like it is not using the index, while query 1 is. Would love to hear why.
With LIKE expressions, b-tree indexes can only be used if the search pattern is left-anchored, i.e. it starts with literal text and any % wildcard comes at the end. More details in the manual.
Thanks to @evil otto for the link; this link points to the current version.
Your first query provides this essential information at prepare time, so the query planner can use a matching index.
Your second query does not provide any information about the pattern at prepare time, so the query planner cannot use any indexes.
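If the pattern is always left-anchored like this, an expression index can serve the query; a sketch (assuming a non-C locale, where the text_pattern_ops operator class is needed for LIKE to use the index):
-- matches the Lower(a) expression used in the WHERE clause
CREATE INDEX t_lower_a_idx ON t (Lower(a) text_pattern_ops);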
I suspect that in the first case the query compiler/optimizer detects that the clause is a constant and can build an optimal query plan. In the second it has to compile a more generic plan, because the bound variable can be anything at run time.
Are you running both test cases from the same file using the same $dbh object?
I think the reason the second case should be faster is that you are using a prepared statement, which is already parsed (but maybe I'm wrong :)).
Ahh, I see - I will drop out after this comment since I don't know Perl. But I would trust that the editor is correct in highlighting the $q as a constant. I'm guessing that you need to concatenate the value into the string rather than just directly referencing the variable. Since Perl uses . for string concatenation, that would be something like:
my $sql = qq{
  SELECT a, b, c
  FROM t
  WHERE Lower( a ) LIKE '} . $q . qq{'};
(Note: unless the language is tightly integrated with the database, such as Oracle/PLSQL, you usually have to create a completely valid SQL string before submitting it to the database, instead of expecting the compiler to 'interpolate'/'substitute' the value of the variable.)
I would again suggest that you get the COUNT() of the statements, to make sure that you are comparing apples to apples.
I don't know Postgres at all, but I think in line 7 (WHERE Lower( a ) LIKE '$q'), $q is actually a constant. It looks like your editor thinks so too, since it is highlighted in red. You probably still need to use the ? for the variable. To test, do a COUNT(*) for both and make sure they match - I could be way off base.

left join in zend framework

I am new to ZF and I would like to left join a table named country on country.id = firm_dtl.firm_country.
$firmobj->fetchAll($firmobj->select($this)->where("firm_name like '$alpha%'")->order('firm_name'));
How can I do this? I am trying with this code:
$firmobj->select($this)->joinLeft(array("c"=>"country"), "c.id = firm_dtl.firm_country","c.name")->where("firm_name like '$alpha%'")->order('firm_name');
Here are some things that you can try to get the left join working and also to improve security.
I usually build my select statements across many lines, so I like to put the statement in a variable. To debug, I simply comment out the lines that I don't need.
$select = $firmobj->select();
(The table object already scopes the query to firm_dtl, which the join condition below relies on.)
You'll want to setIntegrityCheck(false) because you probably won't be changing and committing the results from the query. Here's a quote from the ZF documentation about it.
The Zend_Db_Table_Select is primarily used to constrain and validate so that it may enforce the criteria for a legal SELECT query. However there may be certain cases where you require the flexibility of the Zend_Db_Table_Row component and do not require a writable or deletable row. For this specific user case, it is possible to retrieve a row or rowset by passing a FALSE value to setIntegrityCheck().
$select->setIntegrityCheck(false);
Here is where you join. You can replace field1, field2, fieldn with the fields from the joined country table that you want to see in the results.
$select->joinLeft(array('c' => 'country'), 'c.id = firm_dtl.firm_country', array('field1', 'field2', 'fieldn'));
Use parameter substitution to avoid SQL injection attacks.
$select->where('firm_name LIKE ?', "$alpha%");
And finally order the results and fetch the row set.
$select->order('firm_name');
$rowSet = $firmobj->fetchAll($select);
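For reference, the assembled query comes out roughly like this (a sketch; Zend_Db quotes identifiers and binds the placeholder):
SELECT firm_dtl.*, c.field1, c.field2, c.fieldn
FROM firm_dtl
LEFT JOIN country AS c ON c.id = firm_dtl.firm_country
WHERE firm_name LIKE ?  -- bound to "$alpha%"
ORDER BY firm_name;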
The third parameter of the joinLeft function should be an array of the columns you want to fetch.
$firmobj->select($this)
->joinLeft(array("c"=>"country"), "c.id = firm_dtl.firm_country", array("c.name"))
->where("firm_name like '$alpha%'")
->order('firm_name');
Additionally, the better way is to use the where function like this:
->where("firm_name like ?", $alpha . "%")
This way is the safer solution.

How to do a SQL query using columns from a related table?

I've got three related SQL tables, simplified they look like this:
ShopTable
[ShopID]
ShelfTable
[ShelfID]
[ShopID]
InventoryTable
[ShelfID]
[Value]
[ShopID] and [ShelfID] are relations. Now what I want to do is get the SUM of [Value] for one [ShopID], but this obviously won't work, since [ShopID] isn't part of InventoryTable:
SELECT SUM([Value]) WHERE [ShopID] = '1'
How do I have to write the query to filter the InventoryTable using the ShopID?
SELECT SUM(i.value)
FROM shelfTable s
JOIN inventoryTable i
ON i.shelfId = s.shelfId
WHERE s.shopId = 1
This is a fundamental question about relations between tables, so I'll provide some detail, hoping that you can use some of these ideas when writing SQL queries in the future.
Let's start with one basic thing first. [ShopID] could refer to two different but related columns, one in [ShopTable] and one in [ShelfTable]. The same thing applies to [ShelfID]. It's useful to always specify the table.
You describe [ShopID] and [ShelfID] as "relations." As Damien_The_Unbeliever has commented, those columns are, in fact, two pairs of primary and foreign keys. That is, [ShelfTable].[ShelfID] identifies a "shelf" record, and [InventoryTable].[ShelfID] relates an "inventory item" (whatever that is) to a "shelf." (It's not always possible to interpret rows in a database this naively, but I'm willing to guess I'm not too far off from reality.)
Likewise, each "shelf" belongs to one "shop," and [ShelfTable].[ShopID] refers to that specific "shop." Notice that because we have the value of [ShopID] already (I'll call it "#MyShopID"), we don't even need the [ShopTable] here. We can just use [ShelfTable].[ShopID] to filter for the "shelves" we're interested in.
You're asking to get the sum total of [InventoryTable].[Value] for one [ShopID] value, but [ShopID] doesn't show up in [InventoryTable]. That's where your (inner) join comes into play. You know that you'll be adding up values from [InventoryTable], but you've got to specify the particular "shop." You specify #MyShopID for [ShelfTable].[ShopID], and the join will do your filtering in [InventoryTable] for you.
One final thing before composing the query. I'm assuming that you haven't oversimplified your tables too much, and that [Value] is the total value of each "inventory item," and not just a unit value. If it wasn't, we'd have to multiply values by quantities, etc., but I'll let you check your own work here.
So, here's what we do:
We select FROM the [InventoryTable]
but we INNER JOIN to the [ShelfTable] on [ShelfID] from both tables
and we only want "shelves" from one "shop," i.e. WHERE [ShelfTable].[ShopID] = #MyShopID
and then we SELECT the SUM([InventoryTable].[Value])
and we're done. In SQL, let's remove the brackets, provide some table aliases, and we'll get a query that looks like this:
SELECT SUM(inv.Value)
FROM InventoryTable AS inv
INNER JOIN ShelfTable AS shf ON shf.ShelfID = inv.ShelfID
WHERE shf.ShopID = #MyShopID
;
Here are a few take-away points to consider. Notice we handled the FROM clause first. You'll always want to do that.
You'll also want a "driving table" to start with, in this case, [InventoryTable]. The other tables in your join add extra information and provide you a means to filter, but don't otherwise interfere with your summing up. More complex queries don't offer such an obvious luxury, but we're not getting too fancy here.
You'll also note, just briefly, that because [ShelfID] is a primary key in [ShelfTable], those [ShelfID]'s are unique values in [ShelfTable], and so each "inventory" thing belongs to a single "shelf." So the join won't cause us to double-count values. That's a good thing to remember when you're not dealing with primary and foreign keys, like we're doing here.
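To illustrate the double-counting risk (hypothetical here, since [ShelfID] is unique in [ShelfTable]): if the join key were not unique, each inventory row would match several shelf rows and be summed more than once. Filtering with a subquery instead of a join sidesteps that:
SELECT SUM(inv.Value)
FROM InventoryTable AS inv
WHERE inv.ShelfID IN (
    SELECT shf.ShelfID
    FROM ShelfTable AS shf
    WHERE shf.ShopID = #MyShopID
);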
Hope that helps. And I hope I didn't come across as too pedantic.