JPA: Group by and select function in Postgres

I have a simple reporting query that groups by id and day and looks like the following:
select id,
avg(case when name = 'temp' then value end) as average_temp,
DATE_TRUNC('day', timestamp) as day
from data
group by id, day
order by id;
The query basically needs to show the average daily temperature for each asset.
The user can specify a range of aggregation functions beyond just 'average' (for example avg temp, max temp, max speed); the above is only a simple example.
I'm trying to translate that into JPA as follows:
CriteriaQuery<Object[]> query = criteriaBuilder.createQuery(Object[].class);
Root<Data> root = query.from(Data.class);
List<Selection<?>> selectionList = getSelections(aggregationQuery, criteriaBuilder, root);
Expression<Instant> groupDate = criteriaBuilder.function("date_trunc", Instant.class, criteriaBuilder.literal("day"), root.get("timestamp"));
selectionList.add(groupDate.alias("day"));
query.multiselect(selectionList);
query.where(getWherePredicates(aggregationQuery, criteriaBuilder, root));
query.orderBy(getOrderBy(aggregationQuery, criteriaBuilder, root));
query.groupBy(root.get("id"), groupDate);
return this.setupPagination(entityManager.createQuery(query), aggregationQuery);
I'm using criteriaBuilder.function to group by the date. However, when I execute the query using JPA I get the following exception:
org.postgresql.util.PSQLException: ERROR: column "data0_.timestamp" must appear in the GROUP BY clause or be used in an aggregate function
This appears to occur because the query is parameterized and Postgres doesn't realize that the 'day' parameter that appears in both the select and group by clauses is the same value.
Is there any way around this? Can I somehow bake in the 'day' value so it's not sent as a parameter? Or is there some other method?

In the end there's a relatively simple solution to the problem. Rather than grouping by the expression, I thought I'd try to group by the alias instead: criteriaBuilder.literal("day"). This didn't work, however, with Postgres complaining about a non-integer literal.
I then realised I could group by a positional integer instead, which in my case ended up looking like:
query.groupBy(root.get("id"), criteriaBuilder.literal(selectionList.size()));
This works as expected.
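A minimal sketch of the positional GROUP BY idea, using Python's built-in sqlite3 as a stand-in for Postgres. This is an assumption made for runnability: SQLite also accepts ordinal positions in GROUP BY, and date() replaces date_trunc('day', ...). The table contents are made-up sample rows.

```python
import sqlite3

# In-memory database with a hypothetical "data" table mirroring the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER, name TEXT, value REAL, timestamp TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?)",
    [
        (1, "temp", 10.0, "2020-01-01 08:00:00"),
        (1, "temp", 20.0, "2020-01-01 18:00:00"),
        (1, "temp", 30.0, "2020-01-02 09:00:00"),
        (2, "temp", 5.0, "2020-01-01 12:00:00"),
    ],
)

# GROUP BY 1, 3 refers to the first and third select-list items (id and day),
# so the day expression never has to be repeated in the GROUP BY clause.
rows = conn.execute(
    """
    SELECT id,
           avg(CASE WHEN name = 'temp' THEN value END) AS average_temp,
           date(timestamp) AS day
    FROM data
    GROUP BY 1, 3
    ORDER BY 1, 3
    """
).fetchall()

for row in rows:
    print(row)
```

Because the group key is given by position, the database does not need to prove that two separately parameterized expressions are identical, which is the comparison that failed in the original query.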

Related

SQL: Change the datetime to the exact string returned

See below for what is returned in my automated test for this query:
Select visit_date
from patient_visits
where patient_id = '50'
AND site_id = '216'
ORDER by patient_id
DESC LIMIT 1
08:52:48.406 DEBUG Executing : Select visit_date from patient_visits where patient_id = '50' AND site_id = '216' ORDER by patient_id DESC LIMIT 1
08:52:48.416 TRACE Return: [(datetime.date(2017, 2, 17),)]
When I run this in Workbench I get
2017-02-17
How can I make the query return this instead of the datetime.date bit above? Is some formatting needed?
What you got from the database is Python's datetime.date object. That happens because the DB connector drivers cast the DB records to their corresponding Python counterparts. Trust me, it's much better this way than plain strings that the user would have to parse and cast later.
Imagining the result of this query is stored in a variable ${record}, there are a couple of steps to get it into the form you want.
First, the response is (pretty much always) a list of tuples; as in your case it will always be a single record, go for the 1st list member, and its first tuple member:
${the_date}=    Set Variable    ${record[0][0]}
Now ${the_date} is the datetime.date object; there are at least two ways to get its string representation.
1) With strftime() (the pythonic way):
${the_date_string}=    Evaluate    $the_date.strftime('%Y-%m-%d')    datetime
The Python documentation lists the strftime directives.
2) Using the fact it's a regular object, access its attributes and construct the result as you'd like:
${the_date_string}=    Set Variable    ${the_date.year}-${the_date.month}-${the_date.day}
Note that this way you'd almost certainly lose the leading zeros in the month and day.
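The difference between the two approaches can be checked in plain Python. This is only a sketch: the record value is copied from the log output in the question, and the variable names are illustrative.

```python
import datetime

# Shape of the result returned by the DB driver: a list of row tuples.
record = [(datetime.date(2017, 2, 17),)]

# First row, first column.
the_date = record[0][0]

# 1) strftime keeps the zero padding.
formatted = the_date.strftime("%Y-%m-%d")

# isoformat() gives the same YYYY-MM-DD layout for a date object.
iso = the_date.isoformat()

# 2) Building the string from the attributes drops the leading zeros.
naive = f"{the_date.year}-{the_date.month}-{the_date.day}"

print(formatted, iso, naive)
```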

Converting complex query with inner join to tableau

I have a query like this, which we use to generate data for our custom dashboard (A Rails app) -
SELECT AVG(wait_time) FROM (
SELECT TIMESTAMPDIFF(MINUTE,a.finished_time,b.start_time) wait_time
FROM (
SELECT max(start_time + INTERVAL avg_time_spent SECOND) finished_time, branch
FROM mytable
WHERE name IN ('test_name')
AND status = 'SUCCESS'
GROUP by branch) a
INNER JOIN
(
SELECT MIN(start_time) start_time, branch
FROM mytable
WHERE name IN ('test_name_specific')
GROUP by branch) b
ON a.branch = b.branch
HAVING avg_time_spent between 0 and 1000)t
GROUP BY week
Now I am trying to port this to Tableau, and I cannot find a way to represent this data there. I am stuck on how to represent the inner group by in a calculated field. I could also just use a custom SQL data source, but I am already using another data source.
columns in mytable -
start_time
avg_time_spent
name
branch
status
I think this could be achieved with the new Level of Detail formulas, but unfortunately I am stuck on version 8.3.
Save custom SQL for rare cases. This doesn't look like a rare case. Let Tableau generate the SQL for you.
If you simply connect to your table, then you can usually write calculated fields to get the information you want. I'm not exactly sure why you have test_name in one part of your query but test_name_specific in another, so ignoring that, here is a simplified example to a similar query.
Suppose you define a calculated field called worst_case_test_time as
datediff(min(start_time), dateadd('second', max(start_time), avg_time_spent))
This seems close to what your original query computes.
It would help if you explained what exactly you are trying to compute. It appears to be some sort of worst case bound for avg test time. There may be an even simpler formula, but it's hard to know without a little context.
You could filter on status = "Success" and avg_time_spent < 1000, and place branch and WEEK(start_time) on say the row and column shelves.
P.S. Your query seems a little off. Don't you need an aggregation function like MAX or AVG after the HAVING keyword?

COUNT(field) returns correct amount of rows but full SELECT query returns zero rows

I have a UDF in my database which basically tries to get a station (e.g. bus/train) based on some input data (geographic/name/type). Inside this function I try to check if there are any rows matching the given values:
SELECT
COUNT(s.id)
INTO
firsttry
FROM
geographic.stations AS s
WHERE
ST_DWithin(s.the_geom,plocation,0.0017)
AND
s.name <-> pname < 0.8
AND
s.type ~ stype;
The firsttry variable now contains the value 1. If I use the following (slightly extended) SELECT statement I get no results:
RETURN query SELECT
s.id, s.name, s.type, s.the_geom,
similarity(
regexp_replace(s.name::text,'(Hauptbahnhof|Hbf)','Hbf'),
regexp_replace(pname::text,'(Hauptbahnhof|Hbf)','Hbf')
)::double precision AS sml,
st_distance(s.the_geom,plocation) As dist from geographic.stations AS s
WHERE ST_DWithin(s.the_geom,plocation,0.0017) and s.name <-> pname < 0.8
AND s.type ~ stype
ORDER BY dist asc,sml desc LIMIT 1;
The parameters are as follows:
stype = '^railway'
pname = 'Amsterdam Science Park'
plocation = ST_GeomFromEWKT('SRID=4326;POINT(4.9492530 52.3531670)')
The tuple I need to be returned is:
id name type geom (displayed as ST_AsText)
909658;"Amsterdam Sciencepark";"railway_station";"POINT(4.9482893 52.352904)"
The same UDF works well for a lot of other stations, but this is one of several which just won't work. Any suggestions?
P.S. The use of the <-> operator is coming from the pg_trgm module.
Some ideas on how to troubleshoot this:
Break your troubleshooting into steps. Start with the simplest query possible. No aggregates, just joins and no filters. Then add filters. Then add order by, then add aggregates. Look at exactly where the change occurs.
Try reindexing the database.
One possibility that occurs to me based on this is that it could be a corrupted index used in the second query but not the first. I have seen corrupted indexes in the past and usually they throw errors but at least in theory they should be able to create a problem like this.
If this is correct, your query will suddenly return rows if you remove the ORDER BY clause.
If you have a corrupted index, then you need to pay close attention to hardware. Is the RAM ECC? Is the processor overheating? How are your disks doing?
A second possibility is that there is a typo in a join condition or filter statement. Normally this is something I would suspect first, but index problems are easy enough to rule out that it makes sense to start there. If removing the ORDER BY doesn't change things, then chances are it is a typo. If you can't find a typo, then try reindexing.
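The step-by-step approach can be sketched as a small helper that adds one filter at a time and reports the row count after each, so the filter that eliminates the expected row stands out. Everything here is hypothetical: the simplified stations table, the precomputed dist column, and the filters stand in for the PostGIS and pg_trgm conditions of the real query, and sqlite3 is used only so the sketch runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stations (id INTEGER, name TEXT, type TEXT, dist REAL)")
conn.executemany(
    "INSERT INTO stations VALUES (?, ?, ?, ?)",
    [
        (1, "Amsterdam Sciencepark", "railway_station", 0.001),
        (2, "Amsterdam Centraal", "railway_station", 0.05),
        (3, "Science Park bus stop", "bus_stop", 0.0005),
    ],
)

# Hypothetical filters, applied cumulatively in this order.
filters = [
    ("dist < ?", 0.0017),
    ("type LIKE ?", "railway%"),
    ("name = ?", "Amsterdam Science Park"),
]


def count_after_each_filter(conn, filters):
    """Return [(label, row_count)] as each filter is added to the WHERE clause."""
    base = "SELECT COUNT(*) FROM stations"
    results = [("no filter", conn.execute(base).fetchone()[0])]
    clauses, params = [], []
    for clause, param in filters:
        clauses.append(clause)
        params.append(param)
        query = base + " WHERE " + " AND ".join(clauses)
        results.append((clause, conn.execute(query, params).fetchone()[0]))
    return results


steps = count_after_each_filter(conn, filters)
for label, n in steps:
    print(f"{label:15} -> {n} row(s)")
```

In this made-up data the count drops to zero only when the exact name comparison is added, which points at the name condition rather than the distance or type filter.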

JPQL Group By not working

This is my simple JPQL:
SELECT s
FROM Site s
GROUP BY s.siteType
siteResult = q.getResultList();
for (Site site : siteResult) {
// loops all sites
}
This query returns all sites, including sites of the same siteType.
I'm using JPA 2.0 Eclipselink.
What's wrong here?
Such a query does not make sense. If you use GROUP BY, other attributes in SELECT should be aggregated. As it is said in JPA specification:
The requirements for the SELECT clause when GROUP BY is used follow
those of SQL: namely, any item that appears in the SELECT clause
(other than as an aggregate function or as an argument to an aggregate
function) must also appear in the GROUP BY clause. In forming the
groups, null values are treated as the same for grouping purposes.
If you think about the SQL counterpart of your query:
SELECT s.attr1, s.attr2, s.siteType
FROM site s
GROUP BY (s.siteType)
you notice that it is hard to imagine which possible value of attr1 and attr2 should be chosen.
In such a case EclipseLink with Derby simply drops the GROUP BY from the query, which is of course a somewhat questionable way to handle invalid JPQL. I prefer how Hibernate+MySQL behaves with such invalid JPQL: it fails with a quite clear error message:
java.sql.SQLSyntaxErrorException: The SELECT list of a grouped query
contains at least one invalid expression. If a SELECT list has a GROUP
BY, the list may only contain valid grouping expressions and valid
aggregate expressions.
Answer to comment:
A Site probably contains attributes other than siteType as well. Let's use the following example:
public class Site {
int id;
String siteType;
}
and two instances: (id=1, siteType="same"), (id=2, siteType="same"). Now when the type of the select is Site itself (or all of its attributes) and you group by siteType, it is impossible to decide whether the result should contain the one with id value 1 or 2. That's why you have to use some aggregate function (like AVG, which gives you the average of the attribute values) for the remaining attributes (id in our case).
The ObjectDB documentation on GROUP BY has some examples with GROUP BY and aggregates.
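For illustration, here is a grouped query in which every selected item is either the grouping column or an aggregate, run against the two "same" instances above plus one extra made-up row. It uses Python's sqlite3 and plain SQL rather than JPQL, so treat it as a sketch of the rule, not of EclipseLink's behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE site (id INTEGER PRIMARY KEY, siteType TEXT)")
conn.executemany(
    "INSERT INTO site VALUES (?, ?)",
    [(1, "same"), (2, "same"), (3, "other")],
)

# siteType is the grouping column; COUNT(*) and MIN(id) are aggregates,
# so each group yields exactly one well-defined row.
rows = conn.execute(
    """
    SELECT siteType, COUNT(*) AS sites, MIN(id) AS first_id
    FROM site
    GROUP BY siteType
    ORDER BY siteType
    """
).fetchall()

for row in rows:
    print(row)
```

The analogous JPQL would select s.siteType together with aggregates such as COUNT(s) or MIN(s.id) instead of the whole entity s.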

How to avoid T-SQL function being called more times when needing combined results?

I have two T-SQL scalar functions that both perform calculations over large sums of data (taking 'a lot' of time) and return a value, e.g. CalculateAllIncomes(EmployeeID) and CalculateAllExpenditures(EmployeeID).
I run a select statement that calls these and returns results for each Employee. I also need the balance of each employee calculated as AllIncomes-AllExpenditures.
I have a function GetBalance(EmployeeID) that calls the two above mentioned functions and returns the result {CalculateAllIncomes(EmployeeID) - CalculateAllExpenditures(EmployeeID)}. But if I do:
Select CalculateAllIncomes(EmployeeID), CalculateAllExpenditures(EmployeeID), GetBalance(EmployeeID) .... the functions CalculateAllIncomes() and CalculateAllExpenditures() get called twice (once explicitly and once inside the GetBalance function), so the resulting query takes twice as long as it should.
I'd like to find some better solution. I tried:
select CalculateAllIncomes(EmployeeID) AS Incomes, CalculateAllExpenditures
(EmployeeID) AS Expenditures, (Incomes - Expenditures) AS Balance....
but it throws errors:
Invalid column name Incomes and
Invalid column name Expenditures.
I'm sure there has to be a simple solution, but I cannot figure it out. For some reason it seems that I am not able to use column Aliases in the SELECT clause. Is it so? And if so, what could be the workaround in this case?
Thanks for any suggestions.
Forget function calls: you can probably do everything in one normal query.
Function calls misused (trying for OO encapsulation) force you into this situation. In addition, if you have GetBalance(EmployeeID) per row in the Employee table then you are CURSORing over the table. And you've now compounded this by multiple calls too.
What you need is something like this:
;WITH cSUMs AS
(
SELECT
SUM(CASE WHEN type = 'Incomes' THEN SomeValue ELSE 0 END) AS Income,
SUM(CASE WHEN type = 'Expenditures' THEN SomeValue ELSE 0 END) AS Expenditure
FROM
MyTable
WHERE
EmployeeID = @empID --optional for all employees
GROUP BY
EmployeeID
)
SELECT
Income, Expenditure, Income - Expenditure
FROM
cSUMs
I once got a query down from a weekend to under a second by eliminating this kind of OO thinking from a bog standard set based aggregate query.
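A runnable sketch of the same pattern, using Python's sqlite3 in place of SQL Server (an assumption made so the example is self-contained; the CTE syntax shown is accepted by both). MyTable, type and SomeValue follow the placeholder names in the answer, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (EmployeeID INTEGER, type TEXT, SomeValue REAL)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [
        (1, "Incomes", 100.0),
        (1, "Incomes", 50.0),
        (1, "Expenditures", 30.0),
        (2, "Incomes", 20.0),
        (2, "Expenditures", 45.0),
    ],
)

# Each sum is computed once per employee inside the CTE; the outer SELECT can
# then reuse the aliases, which is not possible within a single SELECT list.
rows = conn.execute(
    """
    WITH cSUMs AS (
        SELECT EmployeeID,
               SUM(CASE WHEN type = 'Incomes' THEN SomeValue ELSE 0 END) AS Income,
               SUM(CASE WHEN type = 'Expenditures' THEN SomeValue ELSE 0 END) AS Expenditure
        FROM MyTable
        GROUP BY EmployeeID
    )
    SELECT EmployeeID, Income, Expenditure, Income - Expenditure AS Balance
    FROM cSUMs
    ORDER BY EmployeeID
    """
).fetchall()

for row in rows:
    print(row)
```

The table is scanned once and the balance is derived from the already aggregated columns, so nothing is calculated twice.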