MariaDB - Conjunction-Search in Many-to-Many - group-by

I am having trouble implementing an AND-combined (conjunctive) search across many-to-many tables. I have tried to present a simple example below. I use MariaDB.
I have a table of processes. Persons and tags can be assigned to a process. There is a table for tags and a table for persons.
There are two many-to-many relationships: tags_to_processes and persons_to_processes.
Example: Find all processes with person 1 and person 2 and with tag 1 and tag 2. Result: Process 1.
Example: Find all processes with person 1 and person 2 and with tag 2. Result: Process 1 and Process 2.
Thank you very much!
'processes' Table
+-----------+-------------------+
|process_id |process_name |
+-----------+-------------------+
|1 |Process 1 |
|2 |Process 2 |
|3 |Process 3 |
+-----------+-------------------+
'persons' table
+----------+------------+
|person_id |person_name |
+----------+------------+
|1 |Person 1 |
|2 |Person 2 |
|3 |Person 3 |
|4 |Person 4 |
|5 |Person 5 |
+----------+------------+
'tags' table
+----------+-----------+
|tag_id |tag_name |
+----------+-----------+
|1 |Tag 1 |
|2 |Tag 2 |
|3 |Tag 3 |
|4 |Tag 4 |
|5 |Tag 5 |
|6 |Tag 6 |
+----------+-----------+
'persons_to_processes' table
+----------+-----------+
|person_id |process_id |
+----------+-----------+
|1 |1 |
|2 |1 |
|3 |1 |
|4 |1 |
|5 |1 |
|1 |2 |
|2 |2 |
|4 |3 |
+----------+-----------+
'tags_to_processes' table
+----------+-----------+
|tag_id |process_id |
+----------+-----------+
|1 |1 |
|2 |1 |
|3 |1 |
|6 |1 |
|2 |2 |
|2 |3 |
+----------+-----------+

You can join persons_to_processes to persons, filter the results for the persons that you want and use aggregation:
SELECT ptp.process_id
FROM persons_to_processes ptp INNER JOIN persons p
ON p.person_id = ptp.person_id
WHERE p.person_name IN ('Person 1', 'Person 2')
GROUP BY ptp.process_id
HAVING COUNT(*) = 2 -- 2 persons
Similarly for the tables tags_to_processes and tags:
SELECT ttp.process_id
FROM tags_to_processes ttp INNER JOIN tags t
ON t.tag_id = ttp.tag_id
WHERE t.tag_name IN ('Tag 1', 'Tag 2')
GROUP BY ttp.process_id
HAVING COUNT(*) = 2 -- 2 tags
Finally, you can combine the 2 queries to get their common results with INTERSECT:
WITH
cte1 AS (
SELECT ptp.process_id
FROM persons_to_processes ptp INNER JOIN persons p
ON p.person_id = ptp.person_id
WHERE p.person_name IN ('Person 1', 'Person 2')
GROUP BY ptp.process_id
HAVING COUNT(*) = 2 -- 2 persons
),
cte2 AS (
SELECT ttp.process_id
FROM tags_to_processes ttp INNER JOIN tags t
ON t.tag_id = ttp.tag_id
WHERE t.tag_name IN ('Tag 1', 'Tag 2')
GROUP BY ttp.process_id
HAVING COUNT(*) = 2 -- 2 tags
)
SELECT process_id FROM cte1
INTERSECT
SELECT process_id FROM cte2;
See the demo.
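The INTERSECT approach above can be tried out without a MariaDB server. The following is a minimal sketch that runs the same query shape against an in-memory SQLite database loaded with the sample rows from the question. The helper name find_processes is hypothetical, SQLite is only a stand-in for MariaDB, and the COUNT(*) check assumes that each (person_id, process_id) and (tag_id, process_id) pair is stored once and that the requested names are distinct.

```python
import sqlite3

# In-memory stand-in for the MariaDB schema from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE persons (person_id INTEGER PRIMARY KEY, person_name TEXT);
    CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, tag_name TEXT);
    CREATE TABLE persons_to_processes (person_id INTEGER, process_id INTEGER);
    CREATE TABLE tags_to_processes (tag_id INTEGER, process_id INTEGER);
""")
conn.executemany("INSERT INTO persons VALUES (?, ?)",
                 [(i, f"Person {i}") for i in range(1, 6)])
conn.executemany("INSERT INTO tags VALUES (?, ?)",
                 [(i, f"Tag {i}") for i in range(1, 7)])
conn.executemany("INSERT INTO persons_to_processes VALUES (?, ?)",
                 [(1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (1, 2), (2, 2), (4, 3)])
conn.executemany("INSERT INTO tags_to_processes VALUES (?, ?)",
                 [(1, 1), (2, 1), (3, 1), (6, 1), (2, 2), (2, 3)])


def find_processes(conn, person_names, tag_names):
    """Return ids of processes linked to ALL given persons and ALL given tags."""
    p_marks = ",".join("?" for _ in person_names)
    t_marks = ",".join("?" for _ in tag_names)
    sql = f"""
        SELECT ptp.process_id
        FROM persons_to_processes ptp
        JOIN persons p ON p.person_id = ptp.person_id
        WHERE p.person_name IN ({p_marks})
        GROUP BY ptp.process_id
        HAVING COUNT(*) = ?          -- every requested person matched
        INTERSECT
        SELECT ttp.process_id
        FROM tags_to_processes ttp
        JOIN tags t ON t.tag_id = ttp.tag_id
        WHERE t.tag_name IN ({t_marks})
        GROUP BY ttp.process_id
        HAVING COUNT(*) = ?          -- every requested tag matched
    """
    # Parameters follow the order of the placeholders in the SQL text.
    params = [*person_names, len(person_names), *tag_names, len(tag_names)]
    return sorted(row[0] for row in conn.execute(sql, params))


both_tags = find_processes(conn, ["Person 1", "Person 2"], ["Tag 1", "Tag 2"])
one_tag = find_processes(conn, ["Person 1", "Person 2"], ["Tag 2"])
print(both_tags)  # [1]
print(one_tag)    # [1, 2]
```

Passing the list lengths as the HAVING thresholds keeps the counts in step with the IN lists, so the same helper covers both examples from the question.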


How to add some values in a dataframe in Scala Spark?

Here is the dataframe I have for now. Suppose there are 4 days in total, {1,2,3,4}:
+-------------+----------+------+
| key | Time | Value|
+-------------+----------+------+
| 1 | 1 | 1 |
| 1 | 2 | 2 |
| 1 | 4 | 3 |
| 2 | 2 | 4 |
| 2 | 3 | 5 |
+-------------+----------+------+
And what I want is
+-------------+----------+------+
| key | Time | Value|
+-------------+----------+------+
| 1 | 1 | 1 |
| 1 | 2 | 2 |
| 1 | 3 | null |
| 1 | 4 | 3 |
| 2 | 1 | null |
| 2 | 2 | 4 |
| 2 | 3 | 5 |
| 2 | 4 | null |
+-------------+----------+------+
Is there a way to get this?
Say df1 is our main table:
+---+----+-----+
|key|Time|Value|
+---+----+-----+
|1 |1 |1 |
|1 |2 |2 |
|1 |4 |3 |
|2 |2 |4 |
|2 |3 |5 |
+---+----+-----+
We can use the following transformations:
import org.apache.spark.sql.functions.{col, explode, lit, sequence}

val data = df1
  // we first group by and aggregate the values to a sequence between 1 and 4 (your number)
  .groupBy("key")
  .agg(sequence(lit(1), lit(4)).as("Time"))
  // we explode the sequence, thus creating all 'Time' per 'key'
  .withColumn("Time", explode(col("Time")))
  // finally, we join with our main table on 'key' and 'Time'
  .join(df1, Seq("key", "Time"), "left")
To get this output:
+---+----+-----+
|key|Time|Value|
+---+----+-----+
|1 |1 |1 |
|1 |2 |2 |
|1 |3 |null |
|1 |4 |3 |
|2 |1 |null |
|2 |2 |4 |
|2 |3 |5 |
|2 |4 |null |
+---+----+-----+
Which should be what you are looking for, good luck!
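The idea behind the transformation (build every key/Time combination, then left-join the known values onto it) can also be illustrated without Spark. This is a plain-Python sketch of that logic only, not a replacement for the DataFrame code; the function name fill_missing_times is hypothetical.

```python
from itertools import product

# (key, Time, Value) rows from the question.
rows = [(1, 1, 1), (1, 2, 2), (1, 4, 3), (2, 2, 4), (2, 3, 5)]


def fill_missing_times(rows, times):
    """Emit one row per (key, time) pair, using None where no value is known."""
    known = {(key, time): value for key, time, value in rows}
    keys = sorted({key for key, _, _ in rows})
    # The cross product plays the role of groupBy + sequence + explode;
    # the dict lookup plays the role of the left join.
    return [(key, time, known.get((key, time))) for key, time in product(keys, times)]


filled = fill_missing_times(rows, range(1, 5))
for row in filled:
    print(row)
```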

SQL Select Unique Values Each Column

I'm looking to select unique values from each column of a table and output the results into a single table. Take the following example table:
+------+---------------+------+---------------+
|col1 |col2 |col_3 |col_4 |
+------+---------------+------+---------------+
|1 |"apples" |A |"red" |
|2 |"bananas" |A |"red" |
|3 |"apples" |B |"blue" |
+------+---------------+------+---------------+
the ideal output would be:
+------+---------------+------+---------------+
|col1 |col2 |col_3 |col_4 |
+------+---------------+------+---------------+
|1 |"apples" |A |"red" |
|2 |"bananas" |B |"blue" |
|3 | | | |
+------+---------------+------+---------------+
Thank you!
Edit: My actual table has many more columns, so ideally the SQL query can be done via a SELECT * as opposed to 4 individual select queries within the FROM statement.
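To make the requested output concrete, here is a small Python sketch of the transformation itself, assuming the rows have already been fetched (for example with SELECT *). It is not a SQL solution; the function name distinct_per_column is hypothetical, and the distinct values are kept in first-seen order, which is an assumption the question does not state.

```python
from itertools import zip_longest

# Rows of the example table: col1, col2, col_3, col_4.
rows = [
    (1, "apples", "A", "red"),
    (2, "bananas", "A", "red"),
    (3, "apples", "B", "blue"),
]


def distinct_per_column(rows):
    """Deduplicate each column independently, then pad shorter columns with None."""
    columns = zip(*rows)
    # dict.fromkeys keeps the first occurrence of each value, in order.
    distinct = [list(dict.fromkeys(column)) for column in columns]
    return list(zip_longest(*distinct, fillvalue=None))


result = distinct_per_column(rows)
for row in result:
    print(row)
```

Because the function works on whole columns, it does not need to know how many columns the table has.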

Trying to unwind two or more arrays in OrientDB

I'm using OrientDB's UI/Query tool to analyze some graph data, and I've spent a couple of days unsuccessfully trying to unwind two arrays.
The unwind clause works just fine for one array but I can't seem to get the output I'm looking for when trying to unwind two arrays.
Here's a simplified example of my data:
#class | amt | storeID | customerID
transaction $4 1 1
transaction $2 1 1
transaction $6 1 4
transaction $3 1 4
transaction $2 2 1
transaction $7 2 1
transaction $8 2 2
transaction $3 2 2
transaction $4 2 3
transaction $9 2 3
transaction $10 3 4
transaction $3 3 4
transaction $4 3 5
transaction $10 3 5
Each customer is a document with the following information:
#class | customerID | State
customer 1 NY
customer 2 NJ
customer 3 PA
customer 4 NY
customer 5 NY
Each store is a document with the following information:
#class | storeID | State | Zip
store 1 NY 1
store 2 NJ 3
store 3 NY 2
Assuming I did not have storeID (nor wanted to create it), I want to recover a flattened table with the following distinct values: name of the store, city, account numbers, and the total amount spent.
The query would hopefully generate something like the table below (for a given depth value).
State | Zip | customerID
NY 1 4
NY 1 5
NY 2 1
NY 2 4
NJ 3 1
NJ 3 2
NJ 3 3
I've tried various expand/flatten/unwind operations but I can't seem to get my query to work.
Here's the query I have that recovers the State and Zip as two arrays and flattens the customerID:
SELECT out().State as State,
out().Zip as Zip,
customerID
FROM ( SELECT EXPAND(IN())
FROM (TRAVERSE * FROM
( SELECT FROM transaction)
)
) ;
Which yields,
State | Zip | customerID
[NY, NY, NJ, NJ] [1,1,2,2] 1
[NY, NY, NJ, NJ] [1,1,2,2] 1
[NY, NY, PA, PA] [1,1,3,3] 4
[NY, NY, PA, PA] [1,1,3,3] 4
... .... ....
Which is not what I'm looking for. Can someone provide a little help on how I can flatten/unwind these two arrays all together?
I tried your case with this structure (based on your example):
I used these queries to retrieve State, Zip and customerID (not as arrays):
Query 1:
SELECT State, Zip, in('transaction').customerID AS customerID FROM Store
ORDER BY Zip UNWIND customerID
----+------+-----+----+----------
# |#CLASS|State|Zip |customerID
----+------+-----+----+----------
0 |null |NY |1 |1
1 |null |NY |1 |1
2 |null |NY |1 |4
3 |null |NY |1 |4
4 |null |NY |2 |4
5 |null |NY |2 |4
6 |null |NY |2 |5
7 |null |NY |2 |5
8 |null |NJ |3 |1
9 |null |NJ |3 |1
10 |null |NJ |3 |2
11 |null |NJ |3 |2
12 |null |NJ |3 |3
13 |null |NJ |3 |3
----+------+-----+----+----------
Query 2:
SELECT inV('transaction').State AS State, inV('transaction').Zip AS Zip,
outV('transaction').customerID AS customerID FROM transaction ORDER BY Zip
----+------+-----+----+----------
# |#CLASS|State|Zip |customerID
----+------+-----+----+----------
0 |null |NY |1 |1
1 |null |NY |1 |1
2 |null |NY |1 |4
3 |null |NY |1 |4
4 |null |NY |2 |4
5 |null |NY |2 |4
6 |null |NY |2 |5
7 |null |NY |2 |5
8 |null |NJ |3 |1
9 |null |NJ |3 |1
10 |null |NJ |3 |2
11 |null |NJ |3 |2
12 |null |NJ |3 |3
13 |null |NJ |3 |3
----+------+-----+----+----------
EDITED
In the following example, with the query you'll be able to retrieve the average and the total spent for every storeID (based on each customerID):
SELECT customerID, storeID, avg(amt) AS averagePerStore, sum(amt) AS totalPerStore
FROM transaction GROUP BY customerID,storeID ORDER BY customerID
----+------+----------+-------+---------------+-------------
# |#CLASS|customerID|storeID|averagePerStore|totalPerStore
----+------+----------+-------+---------------+-------------
0 |null |1 |1 |3.0 |6.0
1 |null |1 |2 |4.5 |9.0
2 |null |2 |2 |5.5 |11.0
3 |null |3 |2 |6.5 |13.0
4 |null |4 |1 |4.5 |9.0
5 |null |4 |3 |6.5 |13.0
6 |null |5 |3 |7.0 |14.0
----+------+----------+-------+---------------+-------------
Hope it helps
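The averages and totals in the last table can be cross-checked outside OrientDB. The sketch below loads the transaction rows from the question into an in-memory SQLite table and runs an equivalent GROUP BY. The table name transactions is an assumption (chosen because TRANSACTION is a keyword in SQLite), and SQLite here only stands in for the OrientDB query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (amt INTEGER, storeID INTEGER, customerID INTEGER)")
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", [
    (4, 1, 1), (2, 1, 1), (6, 1, 4), (3, 1, 4),
    (2, 2, 1), (7, 2, 1), (8, 2, 2), (3, 2, 2), (4, 2, 3), (9, 2, 3),
    (10, 3, 4), (3, 3, 4), (4, 3, 5), (10, 3, 5),
])

# One row per (customerID, storeID) pair with the average and total amount.
per_store = conn.execute("""
    SELECT customerID, storeID, AVG(amt) AS averagePerStore, SUM(amt) AS totalPerStore
    FROM transactions
    GROUP BY customerID, storeID
    ORDER BY customerID, storeID
""").fetchall()

for row in per_store:
    print(row)
```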

How to search and join multi indexes with SphinxQL?

I have 2 indexes, indexA and indexB. These 2 indexes have different columns.
Example:
Index A:
+---+-----+
|id |text |
+---+-----+
|1 |john |
|2 |tom |
|3 |sam |
+---+-----+
Index B:
+---+---------+-----+
|id |parentid |num |
+---+---------+-----+
|1 |1 |64 |
|2 |1 |128 |
|3 |2 |256 |
+---+---------+-----+
Question:
How do I get result like this?
/*Client search*/
SELECT
A.id, A.text, B.num
FROM
indexa A
INNER JOIN
indexb B ON A.id = B.parentid
WHERE
B.num > 100
Result:
+-----+--------+-------+
|A.id | A.text |B.num |
+-----+--------+-------+
|1 |john |128 |
|2 |tom |256 |
+-----+--------+-------+
After editing the index query, the problem was solved.
Solved index query:
SELECT
A.id,A.text,B.num
FROM
tableA A
LEFT JOIN
tableB B ON A.id=B.parentid
Search query:
SELECT * FROM indexA
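To see what rows the joined source query produces, and which of them satisfy the num > 100 condition from the question, the join can be replayed in an in-memory SQLite database. This is only a sketch of the SQL side: it does not use Sphinx, and applying the filter in the same statement is an assumption made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER, text TEXT);
    CREATE TABLE tableB (id INTEGER, parentid INTEGER, num INTEGER);
""")
conn.executemany("INSERT INTO tableA VALUES (?, ?)",
                 [(1, "john"), (2, "tom"), (3, "sam")])
conn.executemany("INSERT INTO tableB VALUES (?, ?, ?)",
                 [(1, 1, 64), (2, 1, 128), (3, 2, 256)])

# Rows produced by the joined source query (one row per A/B pair,
# plus A rows without any B match, where num is NULL).
joined = conn.execute("""
    SELECT A.id, A.text, B.num
    FROM tableA A
    LEFT JOIN tableB B ON A.id = B.parentid
    ORDER BY A.id, B.num
""").fetchall()

# The subset the question asks for.
filtered = [row for row in joined if row[2] is not None and row[2] > 100]
print(joined)
print(filtered)
```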

Sort hierarchical table CTE query

How can I sort a hierarchical table with a CTE query?
sample table :
|ID|Name |ParentID|
| 0| |-1 |
| 1|1 |0 |
| 2|2 |0 |
| 3|1-1 |1 |
| 4|1-2 |1 |
| 5|2-1 |2 |
| 6|2-2 |2 |
| 7|2-1-1 |5 |
and my desired result is:
|ID|Name |ParentID|Level
| 0| |-1 |0
| 1|1 |0 |1
| 3|1-1 |1 |2
| 4|1-2 |1 |2
| 2|2 |0 |1
| 5|2-1 |2 |2
| 7|2-1-1 |5 |3
| 6|2-2 |2 |2
Another sample:
|ID|Name |ParentID|
| 0| |-1 |
| 1|Book |0 |
| 2|App |0 |
| 3|C# |1 |
| 4|VB.NET |1 |
| 5|Office |2 |
| 6|PhotoShop |2 |
| 7|Word |5 |
and my desired result is:
|ID|Name |ParentID|Level
| 0| |-1 |0
| 1|Book |0 |1
| 3|C# |1 |2
| 4|VB.NET |1 |2
| 2|App |0 |1
| 5|Office |2 |2
| 7|Word |5 |3
| 6|PhotoShop |2 |2
The hierarchyid datatype is able to represent hierarchical data, and already has the desired sorting order. If you can't replace your ParentID column, then you can convert to it on the fly:
(Most of this script is data setup, the actual answer is quite small)
declare @t table (ID int not null,Name varchar(10) not null,ParentID int not null)
insert into @t(ID,Name,ParentID)
select 0,'' ,-1 union all
select 1,'Book' ,0 union all
select 2,'App' ,0 union all
select 3,'C#' ,1 union all
select 4,'VB.NET' ,1 union all
select 5,'Office' ,2 union all
select 6,'PhotoShop' ,2 union all
select 7,'Word' ,5
;With Sensible as (
select ID,Name,NULLIF(ParentID,-1) as ParentID
from @t
), Paths as (
select ID,CONVERT(hierarchyid,'/' + CONVERT(varchar(10),ID) + '/') as Pth
from Sensible where ParentID is null
union all
select s.ID,CONVERT(hierarchyid,p.Pth.ToString() + CONVERT(varchar(10),s.ID) + '/')
from Sensible s inner join Paths p on s.ParentID = p.ID
)
select
*
from
Sensible s
inner join
Paths p
on
s.ID = p.ID
order by p.Pth
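The same path-based ordering can be sketched on engines without hierarchyid. The example below swaps hierarchyid for a zero-padded text path built in a recursive CTE and runs it in an in-memory SQLite database with the second sample. The four-digit padding is an assumption (it requires IDs below 10000), and the table name t mirrors the script above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ID INTEGER PRIMARY KEY, Name TEXT, ParentID INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (0, "", -1), (1, "Book", 0), (2, "App", 0), (3, "C#", 1),
    (4, "VB.NET", 1), (5, "Office", 2), (6, "PhotoShop", 2), (7, "Word", 5),
])

SQL = """
WITH RECURSIVE Paths(ID, Name, ParentID, Level, Pth) AS (
    -- anchor: the root row starts the path
    SELECT ID, Name, ParentID, 0, printf('/%04d/', ID)
    FROM t WHERE ParentID = -1
    UNION ALL
    -- step: append the child's padded ID to its parent's path
    SELECT c.ID, c.Name, c.ParentID, p.Level + 1, p.Pth || printf('%04d/', c.ID)
    FROM t c JOIN Paths p ON c.ParentID = p.ID
)
SELECT ID, Name, ParentID, Level FROM Paths ORDER BY Pth
"""
ordered = conn.execute(SQL).fetchall()
for row in ordered:
    print(row)
```

Sorting by the padded path places each parent directly before its descendants and orders siblings by ID, which is the order the question asks for.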
ORDER BY Name should work as desired:
WITH CTE
AS(
SELECT parent.*, 0 AS Level
FROM @table parent
WHERE parent.ID = 0
UNION ALL
SELECT parent.*, Level+1
FROM @table parent
INNER JOIN CTE prev ON parent.ParentID = prev.ID
)
SELECT * FROM CTE
ORDER BY Name
Here's your sample data (add it next time yourself):
declare @table table(ID int,Name varchar(10),ParentID int);
insert into @table values(0,'',-1);
insert into @table values(1,'1',0);
insert into @table values(2,'2',0);
insert into @table values(3,'1-1',1);
insert into @table values(4,'1-2',1);
insert into @table values(5,'2-1',2);
insert into @table values(6,'2-2',2);
insert into @table values(7,'2-1-1',5);
Result:
|ID|Name  |ParentID|Level|
| 0|      |-1      |0    |
| 1|1     |0       |1    |
| 3|1-1   |1       |2    |
| 4|1-2   |1       |2    |
| 2|2     |0       |1    |
| 5|2-1   |2       |2    |
| 7|2-1-1 |5       |3    |
| 6|2-2   |2       |2    |