How to avoid a T-SQL function being called multiple times when combined results are needed?

I have two T-SQL scalar functions that both perform calculations over large amounts of data (taking 'a lot' of time) and return a value, e.g. CalculateAllIncomes(EmployeeID) and CalculateAllExpenditures(EmployeeID).
I run a select statement that calls these and returns results for each Employee. I also need the balance of each employee calculated as AllIncomes-AllExpenditures.
I have a function GetBalance(EmployeeID) that calls the two above mentioned functions and returns the result {CalculateAllIncomes(EmployeeID) - CalculateAllExpenditures(EmployeeID)}. But if I do:
Select CalculateAllIncomes(EmployeeID), CalculateAllExpenditures(EmployeeID), GetBalance(EmployeeID) .... the functions CalculateAllIncomes() and CalculateAllExpenditures() get called twice (once explicitly and once inside the GetBalance function), and so the resulting query takes twice as long as it should.
I'd like to find some better solution. I tried:
select CalculateAllIncomes(EmployeeID) AS Incomes, CalculateAllExpenditures
(EmployeeID) AS Expenditures, (Incomes - Expenditures) AS Balance....
but it throws errors:
Invalid column name Incomes and
Invalid column name Expenditures.
I'm sure there has to be a simple solution, but I cannot figure it out. For some reason it seems that I am not able to use column aliases in the SELECT clause. Is that so? And if so, what could be the workaround in this case?
Thanks for any suggestions.

Forget function calls: you can probably do everything in one normal query.
Misused function calls (reaching for OO encapsulation) force you into this situation. In addition, if you call GetBalance(EmployeeID) per row in the Employee table, then you are effectively CURSORing over the table. And you've now compounded this with multiple calls too.
What you need is something like this:
;WITH cSUMs AS
(
SELECT
EmployeeID,
SUM(CASE WHEN type = 'Incomes' THEN SomeValue ELSE 0 END) AS Income,
SUM(CASE WHEN type = 'Expenditures' THEN SomeValue ELSE 0 END) AS Expenditure
FROM
MyTable
WHERE
EmployeeID = @empID --optional; omit for all employees
GROUP BY
EmployeeID
)
SELECT
EmployeeID, Income, Expenditure, Income - Expenditure AS Balance
FROM
cSUMs
I once got a query down from a weekend to under a second by eliminating this kind of OO thinking from a bog standard set based aggregate query.
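If the scalar functions have to stay, a common workaround is CROSS APPLY: each expensive value is computed once per row inside the APPLY, and the outer select can then reference the aliases. A minimal sketch, assuming the functions live in the dbo schema and an Employees table exists:

SELECT
    e.EmployeeID,
    x.Incomes,
    x.Expenditures,
    x.Incomes - x.Expenditures AS Balance
FROM Employees AS e
CROSS APPLY (
    -- compute each expensive value exactly once per row
    SELECT
        dbo.CalculateAllIncomes(e.EmployeeID) AS Incomes,
        dbo.CalculateAllExpenditures(e.EmployeeID) AS Expenditures
) AS x;

Each function is still invoked once per row, but no longer twice.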

Related

grouping multiple queries into a single one, with Postgres

I have a very simple query:
SELECT * FROM someTable
WHERE instrument = '{instrument}' AND ts >= '{fromTime}' AND ts < '{toTime}'
ORDER BY ts
That query is applied to 3 tables across 2 databases.
I receive a list of rows that have timestamps (ts). I take the last timestamp and it serves as the basis for the 'fromTime' of the next iteration. toTime is usually equal to 'now'.
This allows me to only get new rows at every iteration.
I have about 30 instrument types and I need an update every 1s.
So that's 30 instruments * 3 queries = 90 queries per second.
How can I rewrite the query so that I could use a function like this:
getData table [(instrument, fromTime) list] toTime
and get back some dictionary, in the form:
Dictionary<instrument, MyDataType list>
To use a list of instruments, I could do something like:
WHERE instrument in '{instruments list}'
but this wouldn't help with the various fromTime as there is one value per instrument.
I could take the min of all fromTime values, get the data for all instruments and then filter the results, but that's wasteful since I could potentially query a lot of data only to throw it away right after.
What is the right strategy for this?
So there is a single toTime to test against per query, but a different fromTime per instrument.
One solution to group them in a single query would be to pass a list of (instrument, fromTime) pairs as a relation.
The query would look like this:
SELECT [columns] FROM someTable
JOIN (VALUES
('{name of instrument1}', '{fromTime for instrument1}'),
('{name of instrument2}', '{fromTime for instrument2}'),
('{name of instrument3}', '{fromTime for instrument3}'),
...
) AS params(instrument, fromTime)
ON someTable.instrument = params.instrument AND someTable.ts >= params.fromTime
WHERE someTable.ts < '{toTime}';
Depending on your datatypes and on the method the client-side driver uses to pass parameters, you may have to be explicit about the datatype of your parameters by casting the first value of the list, as in, for example:
JOIN (VALUES
('{name of instrument1}', '{fromTime for instrument1}'::timestamptz),
If you had many more than 30 values, a variant of this query with arrays as parameters (instead of the VALUES clause) might be preferable. The difference is that it would take 3 parameters: 2 arrays + 1 upper bound, instead of N*2+1 parameters. But it depends on the client-side driver's ability to support Postgres arrays as a datatype, and to pass them as a single value.
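A minimal sketch of that array variant, assuming Postgres 9.4+ (for multi-argument unnest) and text/timestamptz element types:

-- $1: array of instrument names, $2: array of matching fromTime values
-- (same order as $1), $3: the shared upper bound toTime
SELECT someTable.*
FROM someTable
JOIN unnest($1::text[], $2::timestamptz[]) AS params(instrument, fromTime)
  ON someTable.instrument = params.instrument
 AND someTable.ts >= params.fromTime
WHERE someTable.ts < $3;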

JPA: Group by and select function in Postgres

I have a simple reporting query group by id and day that looks like the following:
select id,
avg(case when name = 'temp' then value end) as average_temp,
DATE_TRUNC('day', timestamp) as day
from data
group by id, day
order by id;
The query basically needs to show the average daily temperature for each asset.
The user is able to specify a bunch of different aggregation functions beyond just 'average', the above is only a simple example. For example, avg temp, max temp, max speed, etc.
I'm trying to translate that into JPA as follows:
CriteriaQuery<Object[]> query = criteriaBuilder.createQuery(Object[].class);
Root<AssetMetricDataPoint> root = query.from(AssetMetricDataPoint.class);
List<Selection<?>> selectionList = getSelections(aggregationQuery, criteriaBuilder, root);
Expression<Instant> groupDate = criteriaBuilder.function("date_trunc", Instant.class, criteriaBuilder.literal("day"), root.get("timestamp"));
selectionList.add(groupDate.alias("day"));
query.multiselect(selectionList);
query.where(getWherePredicates(aggregationQuery, criteriaBuilder, root));
query.orderBy(getOrderBy(aggregationQuery, criteriaBuilder, root));
query.groupBy(root.get("id"), groupDate);
return this.setupPagination(entityManager.createQuery(query), aggregationQuery);
I'm using criteriaBuilder.function to group by the date. However, when I execute the query using JPA I get the following exception:
org.postgresql.util.PSQLException: ERROR: column "data0_.timestamp" must appear in the GROUP BY clause or be used in an aggregate function
This appears to occur because the query is parameterized and Postgres doesn't realize that the 'day' parameter appearing in both the select and group by clauses is the same.
Is there any way around this? Can I somehow bake in the 'day' value so it's not sent as a parameter? Or some other method?
In the end there's a relatively simple solution to the problem. Rather than grouping by the expression, I first tried to group by the alias instead: criteriaBuilder.literal("day"). This didn't work, however, with Postgres complaining about a non-integer literal.
I then realised I could group by a positional integer instead, which in my case ended up looking like:
query.groupBy(root.get("id"), criteriaBuilder.literal(selectionList.size()));
This works as expected.
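For reference, the SQL Postgres receives should then look roughly like this (a sketch only; the literal 3 assumes the date_trunc expression is the third item in the select list):

select data0_.id,
    avg(case when data0_.name = 'temp' then data0_.value end),
    date_trunc('day', data0_.timestamp)
from data data0_
group by data0_.id, 3 -- 3 = position of the date_trunc expression in the select list
order by data0_.id;

Postgres resolves a positional GROUP BY reference against the select list, so the grouping expression and the selected expression can no longer disagree.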

SQL: Change the datetime to the exact string returned

See below for what is returned in my automated test for this query:
Select visit_date
from patient_visits
where patient_id = '50'
AND site_id = '216'
ORDER by patient_id DESC
LIMIT 1
08:52:48.406 DEBUG Executing : Select visit_date from patient_visits where patient_id = '50' AND site_id = '216' ORDER by patient_id DESC LIMIT 1
08:52:48.416 TRACE Return: [(datetime.date(2017, 2, 17),)]
When I run this in Workbench I get:
2017-02-17
How can I make the query return this instead of the datetime.date bit above? Is some formatting needed?
What you got from the database is Python's datetime.date object, and that happens because the DB connector drivers cast the DB records to their corresponding Python counterparts. Trust me, it's much better this way than getting plain strings the user would have to parse and cast later.
Assuming the result of this query is stored in a variable ${record}, there are a couple of ways to get to it in the form you want.
First, the response is (pretty much always) a list of tuples; as in your case it will always be a single record, go for the 1st list member, and its first tuple member:
${the_date}= Set Variable ${record[0][0]}
Now ${the_date} is the datetime.date object; there are at least two ways to get its string representation.
1) With strftime() (the pythonic way):
${the_date_string}= Evaluate $the_date.strftime('%Y-%m-%d') datetime
See the documentation of strftime's directives for the available format codes.
2) Using the fact it's a regular object, access its attributes and construct the result as you'd like:
${the_date_string}= Set Variable ${the_date.year}-${the_date.month}-${the_date.day}
Note that this way you'd most certainly lose the leading zeros in the month and day.
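Alternatively, you can make the database return a string in the first place, so no client-side conversion is needed. A sketch assuming MySQL (suggested by Workbench and the LIMIT syntax):

-- DATE_FORMAT returns a string, so the driver no longer hands back a date object
SELECT DATE_FORMAT(visit_date, '%Y-%m-%d') AS visit_date
FROM patient_visits
WHERE patient_id = '50'
AND site_id = '216'
ORDER BY patient_id DESC
LIMIT 1

The trade-off is that you lose the typed date object on the client side.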

Converting complex query with inner join to tableau

I have a query like this, which we use to generate data for our custom dashboard (a Rails app):
SELECT AVG(wait_time) FROM (
SELECT TIMESTAMPDIFF(MINUTE,a.finished_time,b.start_time) wait_time
FROM (
SELECT max(start_time + INTERVAL avg_time_spent SECOND) finished_time, branch
FROM mytable
WHERE name IN ('test_name')
AND status = 'SUCCESS'
GROUP by branch) a
INNER JOIN
(
SELECT MIN(start_time) start_time, branch
FROM mytable
WHERE name IN ('test_name_specific')
GROUP by branch) b
ON a.branch = b.branch
HAVING avg_time_spent between 0 and 1000)t
GROUP BY week
Now I am trying to port this to Tableau, and I have not been able to find a way to represent this data in Tableau. I am stuck at how to represent the inner group by in a calculated field. I could also try to just use a custom SQL data source, but I am already using another data source.
columns in mytable -
start_time
avg_time_spent
name
branch
status
I think this could be achieved with the new Level of Detail expressions, but unfortunately I am stuck on version 8.3.
Save custom SQL for rare cases. This doesn't look like a rare case. Let Tableau generate the SQL for you.
If you simply connect to your table, then you can usually write calculated fields to get the information you want. I'm not exactly sure why you have test_name in one part of your query but test_name_specific in another, so ignoring that, here is a simplified example of a similar query.
If you define a calculated field called worst_case_test_time as
datediff('minute', min(start_time), dateadd('second', avg(avg_time_spent), max(start_time))), that seems close to what your original query says.
It would help if you explained what exactly you are trying to compute. It appears to be some sort of worst-case bound for average test time. There may be an even simpler formula, but it's hard to know without a little context.
You could filter on status = "Success" and avg_time_spent < 1000, and place branch and WEEK(start_time) on say the row and column shelves.
P.S. Your query seems a little off. Don't you need an aggregation function like MAX or AVG after the HAVING keyword?
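For what it's worth, here is a hedged reconstruction of what the corrected SQL might look like, assuming MySQL (given TIMESTAMPDIFF), with the HAVING turned into a real aggregate on subquery a and an explicit week column for the outer GROUP BY:

SELECT WEEK(b.start_time) AS week,
    AVG(TIMESTAMPDIFF(MINUTE, a.finished_time, b.start_time)) AS avg_wait_time
FROM (
    -- latest estimated finish per branch for 'test_name'
    SELECT branch, MAX(start_time + INTERVAL avg_time_spent SECOND) AS finished_time
    FROM mytable
    WHERE name IN ('test_name') AND status = 'SUCCESS'
    GROUP BY branch
    HAVING AVG(avg_time_spent) BETWEEN 0 AND 1000
) a
INNER JOIN (
    -- earliest start per branch for 'test_name_specific'
    SELECT branch, MIN(start_time) AS start_time
    FROM mytable
    WHERE name IN ('test_name_specific')
    GROUP BY branch
) b ON a.branch = b.branch
GROUP BY WEEK(b.start_time);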

COUNT(field) returns correct amount of rows but full SELECT query returns zero rows

I have a UDF in my database which basically tries to get a station (e.g. bus/train) based on some input data (geographic/name/type). Inside this function I try to check if there are any rows matching the given values:
SELECT
COUNT(s.id)
INTO
firsttry
FROM
geographic.stations AS s
WHERE
ST_DWithin(s.the_geom,plocation,0.0017)
AND
s.name <-> pname < 0.8
AND
s.type ~ stype;
The firsttry variable now contains the value 1. If I use the following (slightly extended) SELECT statement I get no results:
RETURN query SELECT
s.id, s.name, s.type, s.the_geom,
similarity(
regexp_replace(s.name::text,'(Hauptbahnhof|Hbf)','Hbf'),
regexp_replace(pname::text,'(Hauptbahnhof|Hbf)','Hbf')
)::double precision AS sml,
st_distance(s.the_geom,plocation) AS dist
FROM geographic.stations AS s
WHERE ST_DWithin(s.the_geom,plocation,0.0017) AND s.name <-> pname < 0.8
AND s.type ~ stype
ORDER BY dist asc, sml desc LIMIT 1;
the parameters are as follows:
stype = '^railway'
pname = 'Amsterdam Science Park'
plocation = ST_GeomFromEWKT('SRID=4326;POINT(4.9492530 52.3531670)')
The tuple I need to be returned is:
id name type geom (displayed as ST_AsText)
909658;"Amsterdam Sciencepark";"railway_station";"POINT(4.9482893 52.352904)"
The same UDF returns good results for a lot of other stations, but this is one (of several) which just won't work. Any suggestions?
P.S. The use of the <-> operator is coming from the pg_trgm module.
Some ideas on how to troubleshoot this:
Break your troubleshooting into steps. Start with the simplest query possible. No aggregates, just joins and no filters. Then add filters. Then add order by, then add aggregates. Look at exactly where the change occurs.
Try reindexing the database.
One possibility that occurs to me, based on this, is a corrupted index that is used by the second query but not the first. I have seen corrupted indexes in the past; usually they throw errors, but at least in theory they should be able to create a problem like this.
If this is correct, your query will suddenly return rows if you remove the ORDER BY clause.
If you have a corrupted index, then you need to pay close attention to hardware. Is the RAM ECC? Is the processor overheating? How are your disks doing?
A second possibility is that there is a typo in a join condition or filter statement. Normally this is something I would suspect first, but it is easy enough to weed out index problems, so start there. If removing the ORDER BY doesn't change things, then chances are it is a typo. If you can't find a typo, then try reindexing.
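If you do try reindexing, a minimal sketch (Postgres syntax; assumes sufficient privileges and tolerance for the locks it takes):

-- rebuild all indexes on the table the UDF queries
REINDEX TABLE geographic.stations;
-- or rebuild a single suspect index by name (the index name here is hypothetical)
-- REINDEX INDEX geographic.stations_the_geom_idx;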