OrientDB Create GroupBy Table - orientdb

For example, I have the following table:
[id, V1, V2]
[A , 1 , 5]
[A , 2 , 4]
[A , 3 , 3]
[A , 4 , 2]
[B , 9 , 6]
[B , 8 , 7]
[B , 7 , 8]
[B , 6 , 9]
I want to create a query with the following result:
[id, V1` , V2` ]
[A , [1,2,3,4] , [5,4,3,2] ]
[B , [9,8,7,6] , [6,7,8,9] ]
OR
[id, min , max ]
[A , [1] , [5] ]
[B , [6] , [9] ]
How can this query be constructed?
I already tried a lot of options but failed.
Any help will be appreciated.
Thanks in advance

Try this:
select id, $a.V1 as V1, $a.V2 as V2 from <class name>
let $a = (select from <class name> where $parent.current.id = id)
group by id
Hope it helps
Regards
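As a cross-check on the two result shapes asked for above, here is a minimal plain-Python sketch of the same grouping. It is not OrientDB code; the names `ROWS`, `group_lists`, and `group_min_max` are hypothetical. The second shape is assumed to mean the minimum and maximum over both value columns combined; for this data, min(V1) and max(V2) give the same numbers.

```python
from collections import defaultdict

# Rows copied from the question: (id, V1, V2)
ROWS = [
    ("A", 1, 5), ("A", 2, 4), ("A", 3, 3), ("A", 4, 2),
    ("B", 9, 6), ("B", 8, 7), ("B", 7, 8), ("B", 6, 9),
]


def group_lists(rows):
    """Collect V1 and V2 into one list per id, keeping row order."""
    v1, v2 = defaultdict(list), defaultdict(list)
    for key, a, b in rows:
        v1[key].append(a)
        v2[key].append(b)
    return [(key, v1[key], v2[key]) for key in v1]


def group_min_max(rows):
    """Assumption: min and max are taken over V1 and V2 together."""
    return [(key, min(a + b), max(a + b)) for key, a, b in group_lists(rows)]


if __name__ == "__main__":
    print(group_lists(ROWS))
    print(group_min_max(ROWS))
```
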

Related

Iterate through records to build hierarchy

Please consider the following tables. They describe a school's hierarchy and the notes per student.
users
-------------------------------------------------------------
root_id obj_id obj_type obj_ref_id obj_role
-------------------------------------------------------------
1 2 student 7 learn
1 3 student 7 learn
1 1 principal 1 lead
1 4 mentor 1 train teachers
1 5 trainee 4 learn teaching
1 6 trainee 4 learn teaching
1 7 teacher 1 teach
2 8 student 9 learn
2 9 principal 9 lead
notes
--------------------------------------------------------------
note_id obj_id note
--------------------------------------------------------------
1 2 foo
2 2 bar
3 2 baz
4 3 lorem
5 8 ipsum
I need to write out the hierarchy and number of notes per user as follows:
-------------------------------------------------------------------------------------------
obj_id notes obj_path
-------------------------------------------------------------------------------------------
1 0 principal 1 (lead)
2 3 student 2 (learn) > teacher 7 (teach) > principal 1 (lead)
3 1 student 3 (learn) > teacher 7 (teach) > principal 1 (lead)
4 0 mentor 4 (train teachers) > principal 1 (lead)
5 0 trainee 5 (learn teaching) > mentor 4 (train teachers) > principal 1 (lead)
6 0 trainee 6 (learn teaching) > mentor 4 (train teachers) > principal 1 (lead)
7 0 teacher 7 (teach) > principal 1 (lead)
8 1 student 8 (learn) > principal 9 (lead)
9 0 principal 9 (lead)
For this, I understand that I need to use a loop as follows:
declare cur cursor for
select obj_id from users order by root_id
open cur
declare @obj_id int
fetch next from cur into @obj_id
while (@@FETCH_STATUS = 0)
begin
select obj_role from users where obj_id = @obj_id
fetch next from cur into @obj_id
end
close cur
deallocate cur
This is what I have until now, but I do not understand how to go from here. Can someone help me on my way?
Keep in mind that a cursor processes each individual record one by one.
A recursive CTE would be a better solution:
Sql server CTE and recursion example
CTE Recursion to get tree hierarchy
How does a Recursive CTE run, line by line?
Something like:
DECLARE @User TABLE
(
[root_id] INT
, [obj_id] INT
, [Obj_type] NVARCHAR(100)
, [obj_ref_id] INT
, [obj_role] NVARCHAR(100)
);
DECLARE @Notes TABLE
(
[note_id] INT
, [obj_id] INT
, [note] NVARCHAR(255)
);
INSERT INTO @Notes (
[note_id]
, [obj_id]
, [note]
)
VALUES ( 1, 2, 'foo ' )
, ( 2, 2, 'bar ' )
, ( 3, 2, 'baz ' )
, ( 4, 3, 'lorem' )
, ( 5, 8, 'ipsum' );
INSERT INTO @User (
[root_id]
, [obj_id]
, [Obj_type]
, [obj_ref_id]
, [obj_role]
)
VALUES ( 1, 2, 'student', 7, 'learn' )
, ( 1, 3, 'student', 7, 'learn' )
, ( 1, 1, 'principal', 1, 'lead' )
, ( 1, 4, 'mentor', 1, 'train teachers' )
, ( 1, 5, 'trainee', 4, 'learn teaching' )
, ( 1, 6, 'trainee', 4, 'learn teaching' )
, ( 1, 7, 'teacher', 1, 'teach' )
, ( 2, 8, 'student', 9, 'learn' )
, ( 2, 9, 'principal', 9, 'lead' );
WITH [Hierarchy]
AS ( SELECT [obj_id] AS [root_obj]
, [obj_ref_id] AS [root_obj_ref]
, [obj_id]
, [obj_ref_id]
, CONVERT(
NVARCHAR(MAX)
, [Obj_type] + ' ' + CONVERT(NVARCHAR, [obj_id]) + ' ('
+ [obj_role] + ')'
) AS [obj_path]
FROM @User
UNION ALL
SELECT [a].[root_obj]
, [a].[root_obj_ref]
, [b].[obj_id]
, [b].[obj_ref_id]
, [a].[obj_path] + ' > ' + [b].[Obj_type] + ' '
+ CONVERT(NVARCHAR, [b].[obj_id]) + ' (' + [b].[obj_role] + ')' AS [obj_path]
FROM [Hierarchy] [a]
INNER JOIN @User [b]
ON [b].[obj_id] = [a].[obj_ref_id]
WHERE [a].[obj_id] <> [a].[obj_ref_id] ) --Here, basically continue the recursion while the record isn't referencing itself. The final row will include that self-referencing record.
SELECT [Hierarchy].[root_obj] AS [obj_id]
, (
SELECT COUNT(*)
FROM @Notes
WHERE [obj_id] = [Hierarchy].[root_obj]
) AS [notes] --Here we'll go out and get the count of notes.
, [Hierarchy].[obj_path]
FROM [Hierarchy]
WHERE [Hierarchy].[obj_id] = [Hierarchy].[obj_ref_id] --Then we only want those records built up to the final record that was referencing itself.
ORDER BY [Hierarchy].[root_obj];
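
To see what the recursion in the query above computes, here is a minimal plain-Python sketch of the same parent-chain walk: follow obj_ref_id from each row until a row references itself, building the path on the way. This is an illustration, not part of the SQL answer; `USERS`, `NOTES`, and `build_report` are hypothetical names, and the data is copied from the question.

```python
from collections import Counter

# Rows copied from the users table:
# (root_id, obj_id, obj_type, obj_ref_id, obj_role)
USERS = [
    (1, 2, "student", 7, "learn"),
    (1, 3, "student", 7, "learn"),
    (1, 1, "principal", 1, "lead"),
    (1, 4, "mentor", 1, "train teachers"),
    (1, 5, "trainee", 4, "learn teaching"),
    (1, 6, "trainee", 4, "learn teaching"),
    (1, 7, "teacher", 1, "teach"),
    (2, 8, "student", 9, "learn"),
    (2, 9, "principal", 9, "lead"),
]
# Rows copied from the notes table: (note_id, obj_id, note)
NOTES = [(1, 2, "foo"), (2, 2, "bar"), (3, 2, "baz"), (4, 3, "lorem"), (5, 8, "ipsum")]


def build_report(users, notes):
    """Return (obj_id, note_count, obj_path) tuples ordered by obj_id."""
    by_id = {obj_id: (obj_type, ref_id, role)
             for _, obj_id, obj_type, ref_id, role in users}
    note_counts = Counter(obj_id for _, obj_id, _ in notes)
    report = []
    for start in sorted(by_id):
        parts, current, seen = [], start, set()
        while current not in seen:  # guard against accidental cycles
            seen.add(current)
            obj_type, ref_id, role = by_id[current]
            parts.append(f"{obj_type} {current} ({role})")
            if ref_id == current:  # a self-reference marks the top of the tree
                break
            current = ref_id
        report.append((start, note_counts[start], " > ".join(parts)))
    return report


if __name__ == "__main__":
    for row in build_report(USERS, NOTES):
        print(row)
```

The loop plays the role of the recursive member of the CTE, and the self-reference check corresponds to its WHERE clause.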

SELECT, format rows with different columns into a single row that share an ID

I'm trying to format one SELECT statement so that it outputs a resultset with combined values over a few columns.
I have a resultset like this:
ID VID PID VALUE
1 x 1 a
2 y 1 A
3 y 2 B
4 x 2 b
5 y 3 C
6 x 3 c
7 x 4 d
8 y 4 D
9 x 5 e
10 y 5 E
Can I format one SELECT statement to effectively join the values with duplicate PIDs into a single row? I'm only really interested in PID and VALUE, e.g.
PID VALUE1 VALUE2
1 a A
2 b B
3 c C
4 d D
5 e E
Otherwise, should I be using actual JOINs with queries acting on the same table?
I tried to use CASE but can get up to a resultset like this:
ID VID PID VALUE1 VALUE2
1 x 1 a NULL
2 y 1 NULL A
3 y 2 NULL B
4 x 2 b NULL
5 y 3 NULL C
6 x 3 c NULL
7 x 4 d NULL
8 y 4 NULL D
9 x 5 e NULL
10 y 5 NULL E
The query I'm using looks somewhat like this.
SELECT
ID,
VID,
PID,
CASE WHEN VID = 'x' THEN VALUE END VALUE1,
CASE WHEN VID = 'y' THEN VALUE END VALUE2
FROM BIGTABLE
WHERE PID IN (1, 2, 3, 4, 5)
AND VID IN ('x', 'y')
There are many more values of PID and VID than just 1-5 and x & y, so I'm selecting them that way from the whole table.
Do you mean like this? It's called "conditional aggregation."
with
resultset ( id, vid, pid, value ) as (
select 1, 'x', 1, 'a' from dual union all
select 2, 'y', 1, 'A' from dual union all
select 3, 'y', 2, 'B' from dual union all
select 4, 'x', 2, 'b' from dual union all
select 5, 'y', 3, 'C' from dual union all
select 6, 'x', 3, 'c' from dual union all
select 7, 'x', 4, 'd' from dual union all
select 8, 'y', 4, 'D' from dual union all
select 9, 'x', 5, 'e' from dual union all
select 10, 'y', 5, 'E' from dual
)
-- End of simulated resultset (for testing purposes only, not part of the solution).
-- SQL query begins below this line.
select pid,
min(case when vid = 'x' then value end) as value1,
min(case when vid = 'y' then value end) as value2
from resultset
-- WHERE conditions, if any are needed - as in your attempt
group by pid
order by pid
;
PID VALUE1 VALUE2
--- ------ ------
1 a A
2 b B
3 c C
4 d D
5 e E
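
MIN works here because each CASE column holds at most one non-NULL value per PID, and aggregates ignore NULLs. A minimal sketch of the same conditional aggregation, run against an in-memory SQLite database instead of Oracle (so there is no `dual`); the table name `bigtable` follows the question, while `ROWS` and `pivot_by_pid` are hypothetical names:

```python
import sqlite3

# Rows copied from the question: (id, vid, pid, value)
ROWS = [
    (1, "x", 1, "a"), (2, "y", 1, "A"), (3, "y", 2, "B"), (4, "x", 2, "b"),
    (5, "y", 3, "C"), (6, "x", 3, "c"), (7, "x", 4, "d"), (8, "y", 4, "D"),
    (9, "x", 5, "e"), (10, "y", 5, "E"),
]


def pivot_by_pid(rows):
    """Collapse the x/y rows of each pid into one (pid, value1, value2) row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bigtable (id INTEGER, vid TEXT, pid INTEGER, value TEXT)"
    )
    conn.executemany("INSERT INTO bigtable VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT pid,
               MIN(CASE WHEN vid = 'x' THEN value END) AS value1,
               MIN(CASE WHEN vid = 'y' THEN value END) AS value2
        FROM bigtable
        WHERE pid IN (1, 2, 3, 4, 5)
          AND vid IN ('x', 'y')
        GROUP BY pid
        ORDER BY pid
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in pivot_by_pid(ROWS):
        print(row)
```
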

Update Column 1 Based on time and repeating value in Column B

Table A:
ID (Identity) Name State StateTimestamp Course
1 ABC C 1/1/2001 ?
2 ABC A 1/5/2001 OO
3 ABC B 2/3/2001 OO
4 ABC A 2/4/2001 PP
5 ABC D 2/5/2001 PP
6 ABC A 2/12/2001 QQ
7 ABC A 2/18/2001 RR
8 ABC z 2/20/2001 ?
9 XYZ C 1/1/2001 ?
10 XYZ A 1/14/2001 ?
11 XYZ D 1/16/2001 ?
12 XYZ A 1/17/2001 ?
13 XYZ z 1/31/2001 ?
ID is a unique column. Each state belongs to a Name and has a timestamp associated with it, in increasing order.
Update the Course for each row based on state A: when state A occurs for the first time for a Name, update Course to 'OO'; when it occurs for the second time, update Course to 'PP'; and when it occurs for the third time, update Course to 'QQ'. State A can repeat any number of times.
The Course for every state dated after a state A and before the next state A (i.e. after one pending state and before the following pending state, such as B, 2/3/2001 in the table above) has to be set to the same course as the preceding A state (i.e. OO). Your ideas will be much appreciated.
Create table As:
CREATE TABLE TABLEA
(
ID INT IDENTITY(1,1)
, Name varchar(155)
, State varchar(155)
, StateTimestamp Datetime
, Course varchar(155)
)
Populate As:
INSERT INTO TABLEA VALUES
( 'ABC' ,'C', '1/1/2001' , '? ')
,( 'ABC' ,'A', '1/5/2001' , '?')
,( 'ABC' ,'B', '2/3/2001' , '?')
,( 'ABC' ,'A', '2/4/2001' , '?')
,( 'ABC' ,'D', '2/5/2001' , '?')
,( 'ABC' ,'A', '2/12/2001' , '?')
,( 'ABC' ,'A', '2/18/2001' , '?')
,( 'ABC' ,'z', '2/20/2001' , '? ')
,( 'XYZ' ,'C', '1/1/2001' , '? ')
,( 'XYZ' ,'A', '1/14/2001' , '? ')
,( 'XYZ' ,'D', '1/16/2001' , '? ')
,( 'XYZ' ,'A', '1/17/2001' , '? ')
,( 'XYZ' ,'z', '1/31/2001' , '? ')
Desired Result:
Table A:
ID (Identity) Name State StateTimestamp Course
1 ABC C 1/1/2001 ?
2 ABC A 1/5/2001 OO
3 ABC B 2/3/2001 OO
4 ABC A 2/4/2001 PP
5 ABC D 2/5/2001 PP
6 ABC A 2/12/2001 QQ
7 ABC A 2/18/2001 RR
8 ABC z 2/20/2001 RR
9 XYZ C 1/1/2001 XX
10 XYZ A 1/14/2001 YY
11 XYZ D 1/16/2001 YY
12 XYZ A 1/17/2001 ZZ
13 XYZ z 1/31/2001 ZZ
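
The stated rule can be sketched in plain Python as a running count of state A per Name, ordered by timestamp. This is a minimal illustration, not a T-SQL answer. `ROWS`, `COURSES`, and `assign_courses` are hypothetical names, and the label list is an assumption taken from the ABC rows (OO, PP, QQ, RR). The question does not say what comes after RR or why the XYZ rows show XX, YY, ZZ, so the sketch follows only the written rule.

```python
from datetime import datetime
from itertools import groupby

# Rows copied from the question: (ID, Name, State, StateTimestamp)
ROWS = [
    (1, "ABC", "C", "1/1/2001"), (2, "ABC", "A", "1/5/2001"),
    (3, "ABC", "B", "2/3/2001"), (4, "ABC", "A", "2/4/2001"),
    (5, "ABC", "D", "2/5/2001"), (6, "ABC", "A", "2/12/2001"),
    (7, "ABC", "A", "2/18/2001"), (8, "ABC", "z", "2/20/2001"),
    (9, "XYZ", "C", "1/1/2001"), (10, "XYZ", "A", "1/14/2001"),
    (11, "XYZ", "D", "1/16/2001"), (12, "XYZ", "A", "1/17/2001"),
    (13, "XYZ", "z", "1/31/2001"),
]
# Assumed labels for the 1st, 2nd, 3rd, 4th occurrence of state A.
COURSES = ["OO", "PP", "QQ", "RR"]


def assign_courses(rows, courses, marker_state="A", default="?"):
    """Map each row ID to the course of the most recent marker state."""
    ordered = sorted(
        rows, key=lambda r: (r[1], datetime.strptime(r[3], "%m/%d/%Y"))
    )
    result = {}
    for _name, group in groupby(ordered, key=lambda r: r[1]):
        count = 0  # how many marker states this Name has seen so far
        for row_id, _, state, _ in group:
            if state == marker_state:
                count += 1
            # Rows before the first marker, or past the known labels,
            # keep the default value.
            in_range = 0 < count <= len(courses)
            result[row_id] = courses[count - 1] if in_range else default
    return result


if __name__ == "__main__":
    for row_id, course in sorted(assign_courses(ROWS, COURSES).items()):
        print(row_id, course)
```

In SQL Server, the same running count can usually be expressed with a windowed conditional sum partitioned by Name and ordered by StateTimestamp.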

group rows into a list in pyspark

I have a spark dataframe having a structure similar to the following table
**col1** **col2**
A 1
B 2
A 3
B 4
C 1
A 2
I want it to be grouped on col1 and create a list of values on col2. Following should be my output
**col1** **list**
A [1,3,2]
B [2, 4]
C [1]
Can someone point me to any references?
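This should do the job (Scala syntax):
df.groupBy($"col1").agg( collect_list($"col2") )
In PySpark, the equivalent call is:
from pyspark.sql import functions as F
df.groupBy("col1").agg(F.collect_list("col2").alias("list"))
Note that collect_list does not guarantee the order of the collected values after a shuffle.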

Spark MLlib - SQLContext - Group items based on top 3 value

I'm trying to do some market basket analysis using Spark MLlib with this dataset:
Purchase_ID Category Furnisher Value
1 , A , 1 , 7
1 , B , 2 , 7
2 , A , 1 , 1
3 , C , 2 , 4
3 , A , 1 , 4
3 , D , 3 , 4
4 , D , 3 , 10
4 , A , 1 , 10
5 , E , 1 , 8
5 , B , 3 , 8
5 , A , 1 , 8
6 , A , 1 , 3
6 , B , 1 , 3
6 , C , 5 , 3
7 , D , 3 , 4
7 , A , 1 , 4
The transaction value (Value) is grouped by Purchase_ID. What I want is to return only the categories of the top 3 purchases with the highest Value. Basically, I want to return this dataset:
D,A
E,B,A
A,B
For that I'm trying with the following code:
val data = sc.textFile("PATH");
case class Transactions(Purchase_ID:String,Category:String,Furnisher:String,Value:String);
def csvToMyClass(line: String) = {
val split = line.split(',');
Transactions(split(0),split(1),split(2),split(3))}
val df = data.map(csvToMyClass)
.toDF("Purchase_ID","Category","Furnisher","Value")
.(select("Purchase_ID","Category") FROM (SELECT "Purchase_ID","Category",dense_rank() over (PARTITION BY "Category" ORDER BY "Value" DESC) as rank) tmp WHERE rank <= 3)
.distinct();
The rank function isn't correct...
Does anyone know how to solve this problem?
Many thanks!
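
Judging by the expected output, the ranking is per purchase, not per category: take the three Purchase_IDs with the highest Value and list their categories. A minimal plain-Python sketch of that logic, under that assumption (it is not Spark code; `ROWS` and `top_baskets` are hypothetical names, and the data is copied from the question):

```python
# Rows copied from the question: (Purchase_ID, Category, Furnisher, Value)
ROWS = [
    (1, "A", 1, 7), (1, "B", 2, 7),
    (2, "A", 1, 1),
    (3, "C", 2, 4), (3, "A", 1, 4), (3, "D", 3, 4),
    (4, "D", 3, 10), (4, "A", 1, 10),
    (5, "E", 1, 8), (5, "B", 3, 8), (5, "A", 1, 8),
    (6, "A", 1, 3), (6, "B", 1, 3), (6, "C", 5, 3),
    (7, "D", 3, 4), (7, "A", 1, 4),
]


def top_baskets(rows, n=3):
    """Return the category lists of the n purchases with the highest Value."""
    baskets = {}  # Purchase_ID -> (Value, [categories in row order])
    for purchase_id, category, _furnisher, value in rows:
        baskets.setdefault(purchase_id, (value, []))[1].append(category)
    # Python's sort is stable, so purchases with equal Value keep input order.
    ranked = sorted(baskets.values(), key=lambda b: b[0], reverse=True)
    return [categories for _value, categories in ranked[:n]]


if __name__ == "__main__":
    for categories in top_baskets(ROWS):
        print(",".join(categories))
```

In Spark, the same idea would mean ranking over Value across purchases rather than using PARTITION BY "Category", then collecting the categories per Purchase_ID.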