OrientDB: Find Connected Components Values during the visit - orientdb

I have schema with 3 main classes: Transaction , Address and ValueTx(Edge).
I am trying to find connected components within a range of time.
Now I am doing this query based on this one ( OrientDB: connected components OSQL query) :
SELECT distinct(traversedElement(0)) from ( TRAVERSE both('ValueTx') from (select * from Transaction where height >= 402041 and height <= 402044))
And this returns the rid of the 'head' of each trasversal and from it doing another DFS I can get every node and edge of the connected component I want to search about.
How can I, using the query above, also get the number of the transactions within the connected component and also the sum of their values? (The value of a tx is a property of the class Transaction)
I want to do something like:
SELECT distinct(traversedElement(0)) as head, count(Transaction), sum(valueTot) from ( TRAVERSE both('ValueTx') from (select * from Transaction where height >= 402041 and height <= 402044)) group by head
But of course is not working. I get only one row with the last head and the sum of all the transactions.
Thanks in advance.
Edit:
This is an example of what I'm looking for:
Connected Transactions
Every transaction there is within the same range of height:
Using my query ( the first one in my post) I get the rid of the first node of each group of transaction that are linked through several addresses.
example:
#15:27
#15:28
#15:30
#15:34
#15:35
#15:36
#15:37
#15:41
#15:47
#15:53
What I'm trying to get is a list of every first node with the total number of transactions (not addresses only the transaction) of the group it belongs to and the sum of the value of every Transaction (stored in valueTot inside the class transaction.
Edit2:
This is the dataset where I am making the tests:
The main problem is that I have a lot of data and the approach I was trying before (from every rid I make a different sql query) it's quite slow, I hope there is a faster way.
Edit3:
This is an updated sample db: Download
(note, it's way larger than the other)
select head, sum(valueTot) as valueTot, count(*) as numTx,sum(miner) as minerCount from (SELECT *,traversedElement(0) as head from ( TRAVERSE both('ValueTx') from (select * from Transaction where height >= 0 and height <= 110000 ) while ( #class = 'Address' or (#class = 'Transaction' and height >= 0 and height <= 110000 )) ) where #class = 'Transaction' ) group by head
This query on my system takes around one minute, also if I limit the result set, so I think the problem maybe in the internal query that selects the transactions that isn't using the indexes... Do you have any idea?

You can use this query
select #rid, $a[0].sum as sumValueTot ,$a[0].count as countTransaction from Transaction
let $a = ( select sum(valueTot),count(*) from (TRAVERSE both('ValueTx') from $parent.$current) where #class="Transaction")
where height >= 402041 and height <= 402044
Hope it helps.

is this what are you looking for?
select head, sum(valueTot), count(*) from (SELECT *,traversedElement(0) as head from ( TRAVERSE both('ValueTx') from (select * from Transaction where height >= 402041 and height <= 402044)) where #class = "Transaction") group by head

Related

Unexpected Results From Postgresql Query

I am trying to find a way to create a table where it shows each customer profile (71 in total) their top item bought per time frame (10 in total), what that item is, and the time frame that was most popular. Whenever I run this query it shows the top items for a time frame but it shows all the customers as null. I also need a way to display the customer name which is also accessed through the id_table. I'm lost so any direction would be greatly appreciated! I only have read permissions on this DB.
select distinct id_table.name as product_name, pb.recruitment_round, count(pb.purchased), st.cust_dbf_id as cust_profile
from product_bought pb
join id_table
on id_table.dbf_id = pb.dbf_id
left join shopper_table st
on st.cust_dbf_id = id_table.dbf_id
where pb.date >= '2022-01-01'
and pb.date <= '2022-01-05'
and pb.shopping_time = 4
group by id_table."name", pb.recruitment_round, pb.cust_dbf_id
order by count(pb.purchased) desc, pb.recruitment_round
limit 1;
Expected: a return of st.cust_dbf_id.
Received: Null values

TSQL select and join issue

I have two tables, EMPL which is a historical employee table to track changes in an employee's tax rate and PAYROLL which is also a historical table filled with employee pay over a number of periods.
FROM EMPL, based upon the EMPL.effect_pd <= PAYROLL.payroll_pd, only one record should be joined from EMPL to PAYROLL.
Below are the two tables, query and result set. However, I only want 1 record for each employee per pay period, which matches the relevant employee record based upon the payroll_pd and effect_pd.
(Click image to enlarge)
first of all - welcome!
You wrote "...FROM EMPL, based upon the EMPL.effect_pd <= PAYROLL.payroll_pd ..." but you start your SQL with PAYROLL and not with EMPL.
Pls test this statement first:
SELECT
E.rec_id
,E.empl_id
,E.empl_name
,E.tax_rate
,E.effect_pd
,P.rec_id
,P.payroll_pd
,P.empl_id
,P.pd_pay
FROM
empl AS E
LEFT OUTER JOIN
payroll AS P
ON E.empl_id = P.empl_id
AND E.effect_pd < P.payroll_pd
After that you get 7 records witch are uniqe.
i think, thats it.
Best regards
After 3 days of messing around with the code, I finally arrived at the solution which is:
SELECT * FROM PAYROLL p
LEFT JOIN EMPL e on p.empl_id = e.empl_id
WHERE e.rec_id = ( SELECT TOP 1 c.rec_id
FROM EMPL c
WHERE c.empl_id = p.empl_id
AND p.payroll_pd >= c.effect_pd
ORDER BY c.effect_pd DESC );

Tableau - Calculating average where date is less than value from another data source

I am trying to calculate the average of a column in Tableau, except the problem is I am trying to use a single date value (based on filter) from another data source to only calculate the average where the exam date is <= the filtered date value from the other source.
Note: Parameters will not work for me here, since new date values are being added constantly to the set.
I have tried many different approaches, but the simplest was trying to use a calculated field that pulls in the filtered exam date from the other data source.
It successfully can pull the filtered date, but the formula does not work as expected. 2 versions of the calculation are below:
IF DATE(ATTR([Exam Date])) <= DATE(ATTR([Averages (Tableau Test Scores)].[Updated])) THEN AVG([Raw Score]) END
IF DATEDIFF('day', DATE(ATTR([Exam Date])), DATE(ATTR([Averages (Tableau Test Scores)].[Updated]))) > 1 THEN AVG([Raw Score]) END
Basically, I am looking for the equivalent of this in SQL Server:
SELECT AVG([Raw Score]) WHERE ExamDate <= (Filtered Exam Date)
Below a workbook that shows an example of what I am trying to accomplish. Currently it returns all blanks, likely due to the many-to-one comparison I am trying to use in my calculation.
Any feedback is greatly appreciated!
Tableau Test Exam Workbook
I was able to solve this by using Custom SQL to join the tables together and calculate the average based on my conditions, to get the column results I wanted.
Would still be great to have this ability directly in Tableau, but whatever gets the job done.
Edit:
SELECT
[AcademicYear]
,[Discipline]
--Get the number of student takers
,COUNT([Id]) AS [Students (N)]
--Get the average of the Raw Score
,CAST(AVG(RawScore) AS DECIMAL(10,2)) AS [School Mean]
--Get the number of failures based on an "adjusted score" column
,COUNT([AdjustedScore] < 70 THEN 1 END) AS [School Failures]
--This is the column used as the cutoff point for including scores
,[Average_Update].[Updated]
FROM [dbo].[Average] [Average]
FULL OUTER JOIN [dbo].[Average_Update] [Average_Update] ON ([Average_Update].[Id] = [Average].UpdateDateId)
--The meat of joining data for accurate calculations
FULL OUTER JOIN (
SELECT DISTINCT S.[Id], S.[LastName], S.[FirstName], S.[ExamDate], S.[RawScoreStandard], S.[RawScorePercent], S.[AdjustedScore], S.[Subject], P.[Id] AS PeriodId
FROM [StudentScore] S
FULL OUTER JOIN
(
--Get only the 1st attempt
SELECT DISTINCT [NBOMEId], S2.[Subject], MIN([ExamDate]) AS ExamDate
FROM [StudentScore] S2
GROUP BY [NBOMEId],S2.[Subject]
) B
ON S.[NBOMEId] = B.[NBOMEId] AND S.[Subject] = B.[Subject] AND S.[ExamDate] = B.[ExamDate]
--Group in "Exam Periods" based on the list of periods w/ start & end dates in another table.
FULL OUTER JOIN [ExamPeriod] P
ON S.[ExamDate] = P.PeriodStart AND S.[ExamDate] <= P.PeriodEnd
WHERE S.[Subject] = B.[Subject]
GROUP BY P.[Id], S.[Subject], S.[ExamDate], S.[RawScoreStandard], S.[RawScorePercent], S.[AdjustedScore], S.[NBOMEId], S.[NBOMELastName], S.[NBOMEFirstName], S.[SecondYrTake]) [StudentScore]
ON
([StudentScore].PeriodId = [Average_Update].ExamPeriodId
AND [StudentScore].Subject = [Average].Subject
AND [StudentScore].[ExamDate] <= [Average_Update].[Updated])
--End meat
--Joins to pull in relevant data for normalized tables
FULL OUTER JOIN [dbo].[Student] [Student] ON ([StudentScore].[NBOMEId] = [Student].[NBOMEId])
INNER JOIN [dbo].[ExamPeriod] [ExamPeriod] ON ([Average_Update].ExamPeriodId = [ExamPeriod].[Id])
INNER JOIN [dbo].[AcademicYear] [AcademicYear] ON ([ExamPeriod].[AcademicYearId] = [AcademicYear].[Id])
--This will pull only the latest update entry for every academic year.
WHERE [Updated] IN (
SELECT DISTINCT MAX([Updated]) AS MaxDate
FROM [Average_Update]
GROUP BY[ExamPeriodId])
GROUP BY [AcademicYear].[AcademicYearText], [Average].[Subject], [Average_Update].[Updated],
ORDER BY [AcademicYear].[AcademicYearText], [Average_Update].[Updated], [Average].[Subject]
I couldn't download your file to test with your data, but try reversing the order of taking the average ie
average(IF DATE(ATTR([Exam Date])) <= DATE(ATTR([Averages (Tableau Test Scores)].[Updated]) then [Raw Score]) END)
as written, I believe you'll be averaging the data before returning it from the if statement, whereas you want to return the data, then average it.

Conditional OR in the SQL Server Join – Multi-Value Parameters

I have an SSRS report with 4 parameters, two of which are multi-value parameters (#material and #color using VARCHAR(MAX) datatype in SQL Server 2008 R2). I am using a split function to return the value as a comma separated:
SELECT *
FROM MyView
WHERE height > 200
AND width > 100
AND (
material IN (SELECT Item FROM [dbo].[MySplitFunction] (#material, ',')) OR
color IN (SELECT Item FROM [dbo].[MySplitFunction] (#color, ','))
)
(The code above would return 50 records)
The problem with this approach is that these two multi-value parameters have around of 1,500 different colors and materials and degrade the performance. Sometimes, it takes more than 40 minutes to return the results (row count in the view around 600,000).
I tried a different approach where I used a temp table and used it in the JOIN instead of the WHERE clause:
SELECT Item
INTO #TempTable
FROM [dbo].[MySplitFunction] (#material, ',')
SELECT *
FROM MyView
INNER JOIN ON MyView.Item = #TempTable.Item
WHERE height > 200
AND width > 100
AND material IN (SELECT Item FROM [dbo].[MySplitFunction] (#material, ','))
(The code above would return 7 records only, but the performance is much better)
My question is how can I return the same number of records (50 rows) using the second approach by adding the other #color parameter and allowing the OR condition? So in the SSRS report, the user can multi select these two parameters and the query will return #material = values OR #color = Values.
I am open to any other approach as long as it speeds up the query and allows the OR condition for the two multi-value parameters (#material, #color).
Thanks!
Something like the following might do the trick. I'm not sure I have the syntax precisely right, and it wants further testing and analysis that I can't do without the proper structures and data...
SELECT
from MyVeiew
where height > 200
and width > 100
and (exists (select Item
from dbo.MySplitFunction(#material, ',')
where Item = material)
or exists (select Item
from dbo.MySplitFunction(#color, ',')
where Item = color)
)
This performs two correlated subqueries on nested function calls. Exists checks are generally faster than in lookups in these situations. The syntax bit that worries me is the "and (exists" bit -- that's the parenthesis for the OR clause, and combined with exists it looks a bit wonky.
I think it should do what you want, but testing is definitely called for.
I mistrust that or clause. To get rid of it, try this and see what happens:
SELECT * -- Better with specific columns
from MyView
where height > 200
and width > 100
and exists (select Item
from dbo.MySplitFunction(#material, ',')
where Item = material)
UNION select *
from MyView
where height > 200
and width > 100
and exists (select Item
from dbo.MySplitFunction(#color, ',')
where Item = color)
This runs and combines two queries, removing all duplicates -- pretty much the same as the OR clause would.
Next thing to check would be reviewing table sizes and checking indexes. You're filtering results on (only!) columns height, width, material, and color; if the table is huge, appropriate index would help here.

Postgresql upper limit of a calculated field

is there a way to set an upper limit to a calculation (calculated field) which is already in a CASE clause? I'm calculating percentages and, obviously, don't want the highest value exceed '100'.
If it wasn't in a CASE clause already, I'd create something like 'case when calculation > 100.0 then 100 else calculation end as needed_percent' but I can't do it now..
Thanks for any suggestions.
I think using least function will be the best option.
select least((case when ...), 100) from ...
There is a way to set an upper limit on a calculated field by creating an outer query. Check out my example below. The inner query will be the query that you have currently. Then just create an outer query on it and use a WHERE clause to limit it to <= 1.
SELECT
z.id,
z.name,
z.percent
FROM(
SELECT
id,
name,
CASE WHEN id = 2 THEN sales/SUM(sales) ELSE NULL END AS percent
FROM
users_table
) AS z
WHERE z.percent <= 1