Conditional OR in the SQL Server Join – Multi-Value Parameters - sql-server-2008-r2

I have an SSRS report with 4 parameters, two of which are multi-value parameters (@material and @color, using the VARCHAR(MAX) datatype in SQL Server 2008 R2). I am using a split function to turn each comma-separated value into rows:
SELECT *
FROM MyView
WHERE height > 200
AND width > 100
AND (
material IN (SELECT Item FROM [dbo].[MySplitFunction] (@material, ',')) OR
color IN (SELECT Item FROM [dbo].[MySplitFunction] (@color, ','))
)
(The code above would return 50 records)
The problem with this approach is that these two multi-value parameters can hold around 1,500 different colors and materials, which degrades performance. Sometimes it takes more than 40 minutes to return the results (the view has around 600,000 rows).
I tried a different approach where I used a temp table and used it in the JOIN instead of the WHERE clause:
SELECT Item
INTO #TempTable
FROM [dbo].[MySplitFunction] (@material, ',')
SELECT *
FROM MyView
INNER JOIN #TempTable ON MyView.Item = #TempTable.Item
WHERE height > 200
AND width > 100
AND material IN (SELECT Item FROM [dbo].[MySplitFunction] (@material, ','))
(The code above would return 7 records only, but the performance is much better)
My question is: how can I return the same number of records (50 rows) using the second approach, by adding the other parameter (@color) and allowing the OR condition? In the SSRS report the user can multi-select these two parameters, and the query should return rows where @material matches OR @color matches.
I am open to any other approach as long as it speeds up the query and allows the OR condition for the two multi-value parameters (@material, @color).
Thanks!

Something like the following might do the trick. I'm not sure I have the syntax precisely right, and it wants further testing and analysis that I can't do without the proper structures and data...
SELECT *
from MyView
where height > 200
and width > 100
and (exists (select Item
from dbo.MySplitFunction(@material, ',')
where Item = material)
or exists (select Item
from dbo.MySplitFunction(@color, ',')
where Item = color)
)
This performs two correlated subqueries on nested function calls. EXISTS checks are generally faster than IN lookups in these situations. The syntax bit that worries me is the "and (exists" bit -- that's the parenthesis for the OR clause, and combined with EXISTS it looks a bit wonky.
I think it should do what you want, but testing is definitely called for.
I mistrust that OR clause. To get rid of it, try this and see what happens:
SELECT * -- Better with specific columns
from MyView
where height > 200
and width > 100
and exists (select Item
from dbo.MySplitFunction(@material, ',')
where Item = material)
UNION select *
from MyView
where height > 200
and width > 100
and exists (select Item
from dbo.MySplitFunction(@color, ',')
where Item = color)
This runs and combines two queries, removing all duplicates -- pretty much the same as the OR clause would.
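To check the claim that the UNION form returns the same rows as the OR form, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. The tables my_view, materials and colors, and the sample rows, are hypothetical; materials and colors play the role of the rows returned by the split function.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_view (id INTEGER PRIMARY KEY, height INT, width INT,
                          material TEXT, color TEXT);
    INSERT INTO my_view VALUES
        (1, 250, 150, 'oak',   'red'),    -- matches on material only
        (2, 250, 150, 'pine',  'blue'),   -- matches on color only
        (3, 250, 150, 'oak',   'blue'),   -- matches on both
        (4, 250, 150, 'steel', 'green'),  -- matches on neither
        (5, 100, 150, 'oak',   'red');    -- fails the height filter
    -- Hypothetical stand-ins for the split function's output.
    CREATE TABLE materials (item TEXT PRIMARY KEY);
    CREATE TABLE colors (item TEXT PRIMARY KEY);
    INSERT INTO materials VALUES ('oak');
    INSERT INTO colors VALUES ('blue');
""")

or_query = """
    SELECT id FROM my_view v
    WHERE height > 200 AND width > 100
      AND (EXISTS (SELECT 1 FROM materials m WHERE m.item = v.material)
        OR EXISTS (SELECT 1 FROM colors c WHERE c.item = v.color))
"""

union_query = """
    SELECT id FROM my_view v
    WHERE height > 200 AND width > 100
      AND EXISTS (SELECT 1 FROM materials m WHERE m.item = v.material)
    UNION
    SELECT id FROM my_view v
    WHERE height > 200 AND width > 100
      AND EXISTS (SELECT 1 FROM colors c WHERE c.item = v.color)
"""

or_ids = sorted(r[0] for r in conn.execute(or_query))
union_ids = sorted(r[0] for r in conn.execute(union_query))
print(or_ids, union_ids)
```

One difference to keep in mind: UNION removes duplicate rows from the combined result, so if the view itself can contain fully identical rows, the OR form would keep them while the UNION form would collapse them.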
The next thing to check would be table sizes and indexes. You're filtering results on (only!) the columns height, width, material, and color; if the table is huge, an appropriate index would help here.

Related

OrientDB: Find Connected Components Values during the visit

I have schema with 3 main classes: Transaction , Address and ValueTx(Edge).
I am trying to find connected components within a range of time.
Now I am doing this query based on this one ( OrientDB: connected components OSQL query) :
SELECT distinct(traversedElement(0)) from ( TRAVERSE both('ValueTx') from (select * from Transaction where height >= 402041 and height <= 402044))
This returns the rid of the 'head' of each traversal; by running another DFS from it I can get every node and edge of the connected component I am interested in.
How can I, using the query above, also get the number of the transactions within the connected component and also the sum of their values? (The value of a tx is a property of the class Transaction)
I want to do something like:
SELECT distinct(traversedElement(0)) as head, count(Transaction), sum(valueTot) from ( TRAVERSE both('ValueTx') from (select * from Transaction where height >= 402041 and height <= 402044)) group by head
But of course it is not working. I get only one row, with the last head and the sum of all the transactions.
Thanks in advance.
Edit:
This is an example of what I'm looking for:
Connected Transactions
Every transaction there is within the same range of height:
Using my query (the first one in my post) I get the rid of the first node of each group of transactions that are linked through several addresses.
example:
#15:27
#15:28
#15:30
#15:34
#15:35
#15:36
#15:37
#15:41
#15:47
#15:53
What I'm trying to get is a list of every first node with the total number of transactions (not addresses, only the transactions) in the group it belongs to, and the sum of the value of every Transaction (stored in valueTot inside the class Transaction).
Edit2:
This is the dataset where I am making the tests:
The main problem is that I have a lot of data, and the approach I was trying before (running a separate SQL query from every rid) is quite slow. I hope there is a faster way.
Edit3:
This is an updated sample db: Download
(note, it's way larger than the other)
select head, sum(valueTot) as valueTot, count(*) as numTx,sum(miner) as minerCount from (SELECT *,traversedElement(0) as head from ( TRAVERSE both('ValueTx') from (select * from Transaction where height >= 0 and height <= 110000 ) while ( @class = 'Address' or (@class = 'Transaction' and height >= 0 and height <= 110000 )) ) where @class = 'Transaction' ) group by head
This query takes around one minute on my system, even if I limit the result set, so I think the problem may be that the inner query that selects the transactions isn't using the indexes... Do you have any idea?
You can use this query
select @rid, $a[0].sum as sumValueTot ,$a[0].count as countTransaction from Transaction
let $a = ( select sum(valueTot),count(*) from (TRAVERSE both('ValueTx') from $parent.$current) where @class="Transaction")
where height >= 402041 and height <= 402044
Hope it helps.
Is this what you are looking for?
select head, sum(valueTot), count(*) from (SELECT *,traversedElement(0) as head from ( TRAVERSE both('ValueTx') from (select * from Transaction where height >= 402041 and height <= 402044)) where @class = "Transaction") group by head

Is it possible to concatenate one result set onto another in a single query?

I have a table of Verticals which have names, except one of them is called 'Other'. My task is to return a list of all Verticals, sorted in alpha order, except with 'Other' at the end. I have done it with two queries, like this:
String sqlMost = "SELECT * from core.verticals WHERE name != 'Other' order by name";
String sqlOther = "SELECT * from core.verticals WHERE name = 'Other'";
and then appended the second result in my code. Is there a way to do this in a single query, without modifying the table? I tried using UNION
(select * from core.verticals where name != 'Other' order by name)
UNION (select * from core.verticals where name = 'Other');
but the result was not ordered at all. I don't think the second query is going to hurt my execution time all that much, but I'm kind of curious if nothing else.
UNION ALL is the usual way to request a simple concatenation; without ALL an implicit DISTINCT is applied to the combined results, which often causes a sort. However, UNION ALL isn't required to preserve the order of the individual sub-results as a simple concatenation would; you'd need to ORDER the overall UNION ALL expression to lock down the order.
Another option would be to compute an integer order-override column like CASE WHEN name = 'Other' THEN 2 ELSE 1 END, and ORDER BY that column followed by name, avoiding the UNION entirely.
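The order-override idea can be tried quickly with Python's built-in sqlite3 module. This is only a sketch: the verticals table and its sample names are made up for illustration, and SQLite stands in for whatever database actually holds core.verticals.

```python
import sqlite3

# In-memory SQLite table with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE verticals (name TEXT);
    INSERT INTO verticals VALUES
        ('Retail'), ('Other'), ('Automotive'), ('Zoology'), ('Finance');
""")

# Sort first on the override (1 for normal rows, 2 for 'Other'),
# then alphabetically by name within each override value.
rows = conn.execute("""
    SELECT name
    FROM verticals
    ORDER BY CASE WHEN name = 'Other' THEN 2 ELSE 1 END, name
""").fetchall()

names = [r[0] for r in rows]
print(names)
```

Because the override is the leading sort key, 'Other' lands last even though names such as 'Retail' and 'Zoology' sort after it alphabetically.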

T-SQL Query to process data in batches without breaking groups

I am using SQL 2008 and trying to process the data I have in a table in batches, however, there is a catch. The data is broken into groups and, as I do my processing, I have to make sure that a group will always be contained within a batch or, in other words, that the group will never be split across different batches. It's assumed that the batch size will always be much larger than the group size. Here is the setup to illustrate what I mean (the code is using Jeff Moden's data generation logic: http://www.sqlservercentral.com/articles/Data+Generation/87901)
DECLARE @NumberOfRows INT = 1000,
@StartValue INT = 1,
@EndValue INT = 500,
@Range INT
SET @Range = @EndValue - @StartValue + 1
IF OBJECT_ID('tempdb..#SomeTestTable','U') IS NOT NULL
DROP TABLE #SomeTestTable;
SELECT TOP (@NumberOfRows)
GroupID = ABS(CHECKSUM(NEWID())) % @Range + @StartValue
INTO #SomeTestTable
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
This will create a table with about 435 groups of records containing between 1 and 7 records in each. Now, let's say I want to process these records in batches of 100 records per batch. How can I make sure that my GroupID's don't get split between different batches? I am fine if each batch is not exactly 100 records, it could be a little more or a little less.
I appreciate any suggestions!
This will result in batches slightly smaller than 100 entries, because it removes any group that isn't entirely in the selection:
WITH cte AS (SELECT TOP 100 * FROM (
SELECT GroupID, ROW_NUMBER() OVER (PARTITION BY GroupID ORDER BY GroupID) r
FROM #SomeTestTable) a
ORDER BY GroupID, r DESC)
SELECT c1.GroupID FROM cte c1
JOIN cte c2
ON c1.GroupID = c2.GroupID
AND c2.r = 1
It'll select the groups with the lowest GroupID's, limited to 100 entries into a common table expression along with the row number, then it'll use the row number to throw away any groups that aren't entirely in the selection (row number 1 needs to be in the selection for the group to be, since the row number is ordered descending before cutting with TOP).
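The trimming step can be seen on a tiny data set. The sketch below runs the same query shape in SQLite through Python's sqlite3 module, with LIMIT in place of TOP; it assumes an SQLite build with window function support (3.25 or later). The table name some_test_table, the group sizes and the batch size of 7 are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_test_table (GroupID INTEGER)")

# Hypothetical data: group 1 has 3 rows, group 2 has 2 rows, group 3 has 4 rows.
group_sizes = {1: 3, 2: 2, 3: 4}
conn.executemany(
    "INSERT INTO some_test_table VALUES (?)",
    [(gid,) for gid, size in group_sizes.items() for _ in range(size)],
)

batch_size = 7  # cuts group 3 in half: 3 + 2 + 2 of its 4 rows

batch_query = """
    WITH cte AS (
        SELECT * FROM (
            SELECT GroupID,
                   ROW_NUMBER() OVER (PARTITION BY GroupID ORDER BY GroupID) AS r
            FROM some_test_table) a
        ORDER BY GroupID, r DESC   -- highest row numbers first within a group
        LIMIT ?                    -- SQLite equivalent of TOP
    )
    SELECT c1.GroupID
    FROM cte c1
    JOIN cte c2
      ON c1.GroupID = c2.GroupID
     AND c2.r = 1                  -- keep a group only if its row 1 made the cut
"""

batch = sorted(r[0] for r in conn.execute(batch_query, (batch_size,)))
print(batch)
```

Group 3 is dropped because only its rows numbered 4 and 3 fit under the limit, so its row 1 is missing from the selection; it would be picked up by a later batch.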

Using two different rows from the same table in an expression

I'm using PostgreSQL + PostGIS.
I have a point geometry and a line geometry in the same column of the same table, in different rows. To get the line I run:
SELECT the_geom
FROM filedata
WHERE id=3
If I want the point, I run:
SELECT the_geom
FROM filedata
WHERE id=4
I want to use the point and the line together, as shown in this WITH expression, but with a real query against the table instead:
WITH data AS (
SELECT 'LINESTRING (50 40, 40 60, 50 90, 30 140)'::geometry AS road,
'POINT (60 110)'::geometry AS poi)
SELECT ST_AsText(
ST_Line_Interpolate_Point(road, ST_Line_Locate_Point(road, poi))) AS projected_poi
FROM data;
In this example the data comes from a hand-written WITH expression. I want to take it from my filedata table instead. My problem is that I don't know how to work with data from two different rows of one table at the same time.
One possible way:
A subquery to retrieve another value from a different row.
SELECT ST_AsText(
ST_Line_Interpolate_Point(
the_geom
,ST_Line_Locate_Point(
the_geom
,(SELECT the_geom FROM filedata WHERE id = 4)
)
)
) AS projected_poi
FROM filedata
WHERE id = 3;
Use a self-join:
SELECT ST_AsText(
ST_Line_Interpolate_Point(fd_road.the_geom, ST_Line_Locate_Point(
fd_road.the_geom,
fd_poi.the_geom
))) AS projected_poi
FROM filedata fd_road, filedata fd_poi
WHERE fd_road.id = 3 AND fd_poi.id = 4;
Alternately use a subquery to fetch the other row, as Erwin pointed out.
The main options for using multiple rows from one table in a single expression are:
Self-join the table with two different aliases as shown above, then filter the rows;
Use a subquery expression to get a value for all but one of the rows, as Erwin's answer shows;
Use a window function like lag() and lead() to get a row relative to the current row within the query result; or
JOIN on a subquery that returns a table.
The latter two are more advanced options that solve problems that are difficult or inefficient to solve with the simpler self-join or subquery expression.
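The first two options can be compared side by side without PostGIS. The following sketch uses Python's sqlite3 module with a plain numeric column val in place of the_geom, and a simple subtraction in place of the PostGIS functions; the column name and values are invented for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for the filedata table: a number instead of a geometry.
    CREATE TABLE filedata (id INTEGER PRIMARY KEY, val REAL);
    INSERT INTO filedata VALUES (3, 10.0), (4, 2.5);
""")

# Option 1: self-join with two aliases, one per row of interest.
self_join = conn.execute("""
    SELECT a.val - b.val
    FROM filedata a, filedata b
    WHERE a.id = 3 AND b.id = 4
""").fetchone()[0]

# Option 2: scalar subquery that fetches the value from the other row.
subquery = conn.execute("""
    SELECT val - (SELECT val FROM filedata WHERE id = 4)
    FROM filedata
    WHERE id = 3
""").fetchone()[0]

print(self_join, subquery)
```

Both forms feed values from row 3 and row 4 into one expression; the self-join scales more naturally when several columns from the second row are needed.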

Postgres - Get data from each alias

In my application I have a query that joins the position table several times, like this:
SELECT *
FROM (...) as trips
join trip as t on trips.trip_id = t.trip_id
left outer join vehicle as v on v.vehicle_id = t.trip_vehicle_id
left outer join position as start on trips.start_position_id = start.position_id and start.position_vehicle_id = v.vehicle_id
left outer join position as "end" on trips.end_position_id = "end".position_id and "end".position_vehicle_id = v.vehicle_id
left outer join position as last on trips.last_position_id = last.position_id and last.position_vehicle_id = v.vehicle_id;
My position table has 35 columns (for example, position_id).
When I run the query, the result should contain the columns of the position table three times: once each for start, end and last. But Postgres does not distinguish between, for example, start.position_id, end.position_id and last.position_id, so these three columns are grouped and appear as a single position_id column.
Because the data in start.position_id and end.position_id differ, the position_id column that appears in the result is empty.
Without having to rename every column (for example, start.position_id AS start_position_id), how can I get each group of data separately, for example all columns from the 'start' alias? In MySQL I can do this by calling fetch_fields and passing the function an alias such as 'start'.
How can I do this in Postgres?
Best Regards,
Nuno Oliveira
My understanding is that you can't (or find it difficult to) discern between which table each column with a shared name (such as "position_id") belongs to, but only need to see one of the sets of shared columns at any one time. If that is the case, use tablename.* in your SELECT, so SELECT trips.*, start.*... would show the columns from trips and start, but no columns from other tables involved in the join.
SELECT [...,] start.* [,...] FROM [...] atable AS start [...]
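A small sketch of the alias.* form, run in SQLite through Python's sqlite3 module rather than Postgres. The cut-down trip and position tables, their columns and the aliases start_pos and end_pos are hypothetical and only there to show which columns come back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, heavily reduced versions of the tables in the question.
    CREATE TABLE position (position_id INTEGER PRIMARY KEY, lat REAL, lon REAL);
    CREATE TABLE trip (trip_id INTEGER PRIMARY KEY,
                       start_position_id INT, end_position_id INT);
    INSERT INTO position VALUES (1, 38.7, -9.1), (2, 41.1, -8.6);
    INSERT INTO trip VALUES (10, 1, 2);
""")

# Join position twice, but select only the columns of the end_pos alias.
cur = conn.execute("""
    SELECT end_pos.*
    FROM trip t
    LEFT JOIN position AS start_pos ON t.start_position_id = start_pos.position_id
    LEFT JOIN position AS end_pos   ON t.end_position_id   = end_pos.position_id
""")

columns = [d[0] for d in cur.description]
row = cur.fetchone()
print(columns, row)
```

Only the three position columns from the end_pos alias are returned, so there is no clash with the identically named columns of start_pos.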