In SPARQL, can I list the members of each group, possibly by using GROUP BY?
Right now my query returns:
?p ?p2
----------------
abc zza
abc zba
abc zdf
bcd zbc
bcd zef
bcd zhr
bcd zfe
cde zop
cde zzz
The query I used is:
PREFIX bo: <https://webfiles.uci.edu/jenniyk2/businessontology#>
PREFIX v: <http://www.w3.org/2006/vcard/ns#>
SELECT DISTINCT ?p ?p2
WHERE
{
?p v:hasAddress ?ad .
?p2 v:hasAddress ?ad .
FILTER( ?p != ?p2 )
}
Is there any way I can make it return something like:
?p ?p2
---------------
abc zza
zba
zdf
bcd zbc
zef
zhr
zfe
cde zop
zzz
or
?p
-------------------
abc zza zba zdf
bcd zbc zef zhr zfe
cde zop zzz
Something like this should do the trick:
PREFIX bo: <https://webfiles.uci.edu/jenniyk2/businessontology#>
PREFIX v: <http://www.w3.org/2006/vcard/ns#>
SELECT ?p1 (GROUP_CONCAT(DISTINCT ?p2; SEPARATOR=" ") AS ?p)
WHERE {
  ?p1 v:hasAddress ?ad .
  ?p2 v:hasAddress ?ad .
  FILTER( ?p1 != ?p2 )
} GROUP BY ?p1
Projecting ?p1 alongside the aggregate keeps the grouping key in the result, and the FILTER stops each ?p1 from being concatenated with itself, so with the sample data above you should get one row per person, e.g. abc with "zza zba zdf".
I altered some code from the SoloLearn app but got confused:
import re

pattern = r'(.+)(.+) \2'

match = re.match(pattern, 'ABC bca cab ABC')
if match:
    print('Match 1', match.group())

match = re.match(pattern, 'abc BCA cab BCA')
if match:
    print('Match 2', match.group())

match = re.match(pattern, 'abc bca CAB CAB')
if match:
    print('Match 3', match.group())
And I am getting this output:
Match 1 ABC bca ca
Match 3 abc bca CAB CAB
Any help?
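Printing the two capture groups makes the greedy behaviour easier to see. A minimal sketch, using the same pattern and strings as above:

import re

pattern = r'(.+)(.+) \2'   # two greedy groups, then a space and a repeat of group 2

for s in ['ABC bca cab ABC', 'abc BCA cab BCA', 'abc bca CAB CAB']:
    match = re.match(pattern, s)
    if match:
        # groups() shows how the engine split the text so that
        # " " + group(2) could still be found afterwards.
        print(repr(s), '->', match.groups(), 'matched:', repr(match.group()))
    else:
        # The second string has no space followed by a case-sensitive
        # repeat of anything group 2 could capture, so there is no match.
        print(repr(s), '-> no match')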
Student Name Start Date
ABC 2021-01-01
DEF 2021-01-02
GHI 2021-02-03
JKL 2021-02-01
MNO 2021-01-05
PQR 2021-03-06
STU 2021-03-01
Output
Student Name Start Date Month_Start
ABC 2021-01-01 TRUE
DEF 2021-01-02 FALSE
GHI 2021-02-03 FALSE
JKL 2021-02-01 TRUE
MNO 2021-01-05 FALSE
PQR 2021-03-06 FALSE
STU 2021-03-01 TRUE
Using date_trunc() you can achieve this:
select student_name,
start_date,
start_date = date_trunc('month', start_date)::date as month_start
from the_table
The cast ::date is necessary because date_trunc() returns a timestamp, but we want to compare date values.
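The condition itself is simply "is the start date the first day of its month". Purely as an illustration of that logic (not part of the SQL answer), the same check over the sample dates in Python:

from datetime import date

# Sample rows from the question: (student name, start date).
rows = [
    ("ABC", date(2021, 1, 1)),
    ("DEF", date(2021, 1, 2)),
    ("GHI", date(2021, 2, 3)),
    ("JKL", date(2021, 2, 1)),
    ("MNO", date(2021, 1, 5)),
    ("PQR", date(2021, 3, 6)),
    ("STU", date(2021, 3, 1)),
]

for name, start in rows:
    # Setting the day back to 1 is the Python analogue of
    # date_trunc('month', start_date)::date on the SQL side.
    print(name, start, start == start.replace(day=1))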
I have different files in a directory, as below:
f1.txt
id FName Lname Adrress sex levelId
t1 Girish Hm 10oak m 1111
t2 Kiran Kumar 5wren m 2222
t3 sara chauhan 15nvi f 6666
f2.txt
t4 girish hm 11oak m 1111
t5 Kiran Kumar 5wren f 2222
t6 Prakash Jha 18nvi f 3333
f3.txt
t7 Kiran Kumar 5wren f 2222
t8 Girish Hm 10oak m 1111
t9 Prakash Jha 18nvi m 3333
f4.txt
t10 Kiran Kumar 5wren f 2222
t11 girish hm 10oak m 1111
t12 Prakash Jha 18nvi f 3333
Only the first name and last name are constant here, and case should be ignored; the other columns (Address, sex, levelId) can change.
The data should first be grouped by first name and last name:
t1 Girish Hm 10oak m 1111
t4 girish hm 11oak m 1111
t8 Girish Hm 10oak m 1111
t11 girish hm 10oak m 1111
t2 Kiran Kumar 5wren m 2222
t5 Kiran Kumar 5wren f 2222
t7 Kiran Kumar 5wren f 2222
t10 Kiran Kumar 5wren f 2222
t3 sara chauhan 15nvi f 6666
t6 Prakash Jha 18nvi f 3333
t9 Prakash Jha 18nvi m 3333
t12 Prakash Jha 18nvi f 3333
Later we need to choose the appropriate first record from each group, based on the frequency of the values in the columns Address, sex and levelId.
Example: for the person Girish Hm,
10oak has the maximum frequency for address,
m has the maximum frequency for sex,
1111 has the maximum frequency for levelId,
so the record with id t1 is the correct one (we need to choose the first appropriate record from the group).
Final output should be:
t1 Girish Hm 10oak m 1111
t5 Kiran Kumar 5wren f 2222
t3 sara chauhan 15nvi f 6666
t6 Prakash Jha 18nvi f 3333
Scala solution:
First define columns of interest:
val cols = Array("Adrress", "sex", "levelId")
Then, for each column of interest, add an array column holding that value's frequency (per person) together with the value itself, using a window:
df.select(
  col("*") +: cols.map(
    x => array(
      count(x).over(
        Window.partitionBy(
          lower(col("FName")),
          lower(col("LName")),
          col(x)
        )
      ),
      col(x)
    ).alias(x ++ "_freq")
  ): _*
)
Then group by each person and, for every array column, take the maximum [frequency, value] pair and extract its second element, i.e. the most frequent value (ignore the dummy aggregate; it is only there because agg requires a first argument before the varargs):
.groupBy(
lower(col("FName")).alias("FName"),
lower(col("LName")).alias("LName"))
.agg(
count($"*").alias("dummy"),
cols.map(
x => max(col(x ++ "_freq"))(1).alias(x)
): _*
)
.drop("dummy")
Overall code:
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions.Window
val cols = Array("Adrress", "sex", "levelId")
val df = spark.read.option("header", "true").option("delimiter", " ").option("inferSchema", "true").csv("names.txt")
val df2 = (df
  .select(col("*") +: cols.map(x =>
    array(
      count(x).over(Window.partitionBy(lower(col("FName")), lower(col("LName")), col(x))),
      col(x)
    ).alias(x ++ "_freq")
  ): _*)
  .groupBy(lower(col("FName")).alias("FName"), lower(col("LName")).alias("LName"))
  .agg(count($"*").alias("dummy"), cols.map(x => max(col(x ++ "_freq"))(1).alias(x)): _*)
  .drop("dummy"))
df2.show
+-------+-------+-------+---+-------+
| FName| LName|Adrress|sex|levelId|
+-------+-------+-------+---+-------+
| sara|chauhan| 15nvi| f| 6666|
|prakash| jha| 18nvi| f| 3333|
| girish| hm| 10oak| m| 1111|
| kiran| kumar| 5wren| f| 2222|
+-------+-------+-------+---+-------+
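For reference, the selection rule described in the question (per person, keep the most frequent Adrress/sex/levelId values and then the first record carrying them) can be sketched outside Spark in plain Python. This is only an illustration of the logic on the hand-typed sample rows, not part of the Scala solution:

from collections import Counter
from itertools import groupby

# Sample records from the four files: (id, fname, lname, address, sex, levelId).
rows = [
    ("t1", "Girish", "Hm", "10oak", "m", "1111"),
    ("t2", "Kiran", "Kumar", "5wren", "m", "2222"),
    ("t3", "sara", "chauhan", "15nvi", "f", "6666"),
    ("t4", "girish", "hm", "11oak", "m", "1111"),
    ("t5", "Kiran", "Kumar", "5wren", "f", "2222"),
    ("t6", "Prakash", "Jha", "18nvi", "f", "3333"),
    ("t7", "Kiran", "Kumar", "5wren", "f", "2222"),
    ("t8", "Girish", "Hm", "10oak", "m", "1111"),
    ("t9", "Prakash", "Jha", "18nvi", "m", "3333"),
    ("t10", "Kiran", "Kumar", "5wren", "f", "2222"),
    ("t11", "girish", "hm", "10oak", "m", "1111"),
    ("t12", "Prakash", "Jha", "18nvi", "f", "3333"),
]

# Group case-insensitively on first and last name.
name_key = lambda r: (r[1].lower(), r[2].lower())

for person, group in groupby(sorted(rows, key=name_key), key=name_key):
    group = list(group)
    # Most frequent address, sex and levelId within the group.
    best = [Counter(r[i] for r in group).most_common(1)[0][0] for i in (3, 4, 5)]
    # First record whose address/sex/levelId all equal those values
    # (falling back to the first record of the group if none does).
    chosen = next((r for r in group if list(r[3:6]) == best), group[0])
    print(chosen)

This prints one record per person (t1, t5, t6, t3), matching the expected output apart from row order.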
How can I get an additional column of type string using ? (the functional form of select)?
I tried this:
t:([]c1:`a`b`c;c2:1 2 3)
?[t;();0b;`c1`c2`c3!(`c1;`c2;10)] / ok
?[t;();0b;`c1`c2`c3!(`c1;`c2;enlist(`abc))] / ok
?[t;();0b;`c1`c2`c3!(`c1;`c2;"10")] / 'length
?[t;();0b;`c1`c2`c3!(`c1;`c2;enlist("10"))] / 'length
but the last two give a 'length error.
Your first case works because an atom will automatically expand to the required length. For a compound column you'll need to explicitly generate a list of the correct length, as follows:
q)select c1,c2,c3:`abc,c4:10,c5:count[i]#enlist"abc" from t
c1 c2 c3 c4 c5
------------------
a 1 abc 10 "abc"
b 2 abc 10 "abc"
c 3 abc 10 "abc"
// in functional form
q)?[t;();0b;`c1`c2`c3!(`c1;`c2;(#;(count;`i);(enlist;"abc")))]
c1 c2 c3
-----------
a 1 "abc"
b 2 "abc"
c 3 "abc"
I have two tables. The first one:
col1 | col2 | ColumnOfInterest | DateOfInterest
--------------------------------------------------------
abc | def | ghi | 2013-02-24 17:48:32.548
.
.
.
The second one:
ColumnOfInterest | DateChanged | col3 | col4
--------------------------------------------------------
ghi | 2012-08-13 06:28:11.092 | jkl | mno
ghi | 2012-10-16 23:54:07.613 | pqr | stu
ghi | 2013-01-29 14:13:18.502 | vwx | yz1
ghi | 2013-10-01 14:17:32.992 | 234 | 567
.
.
.
What I'm trying to do is make a 1:1 join between the two tables on ColumnOfInterest, so that each DateOfInterest is matched with the most recent DateChanged from the second table that precedes it.
That is, the line from the first table would be joined to the third line of the second table.
Do you have any ideas?
Thanks
select table1.ColumnOfInterest, max(table2.DateChanged)
from table1
join table2
on table1.ColumnOfInterest = table2.ColumnOfInterest
and table1.DateOfInterest >= table2.DateChanged
group by table1.ColumnOfInterest
SELECT 'abc' col1,
'def' col2,
'ghi' ColumnOfInterest,
CAST('2013-02-24 17:48:32.548' AS DATETIME) DateOfInterest
INTO #DateOfInterest
CREATE TABLE #History
(
ColumnOfInterest VARCHAR(5),
DateChanged DATETIME,
col3 VARCHAR(5),
col4 VARCHAR(5)
)
INSERT INTO #History
VALUES ('ghi','2012-08-13 06:28:11.092','jkl','mno'),
('ghi','2012-10-16 23:54:07.613','pqr','stu'),
('ghi','2013-01-29 14:13:18.502','vwx','yz1'),
('ghi','2013-10-01 14:17:32.992','234','567');
;WITH CTE_Date_Ranges
AS
(
SELECT ColumnOfInterest,
DateChanged,
LEAD(DateChanged,1,GETDATE()) OVER (PARTITION BY ColumnOfInterest ORDER BY DateChanged) AS end_date,
col3,
col4
FROM #History
)
SELECT B.*,
A.*
FROM CTE_Date_Ranges A
INNER JOIN #DateOfInterest B
ON B.DateOfInterest > A.DateChanged AND B.DateOfInterest < A.end_date
Results:
col1 col2 ColumnOfInterest DateOfInterest ColumnOfInterest DateChanged end_date col3 col4
---- ---- ---------------- ----------------------- ---------------- ----------------------- ----------------------- ----- -----
abc def ghi 2013-02-24 17:48:32.547 ghi 2013-01-29 14:13:18.503 2013-10-01 14:17:32.993 vwx yz1
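Both answers implement the same rule: for each row of the first table, take the row of the second table with the latest DateChanged that does not come after the DateOfInterest. A minimal Python sketch of that rule on the sample rows, for illustration only (not T-SQL):

from datetime import datetime

date_of_interest = datetime(2013, 2, 24, 17, 48, 32, 548000)

# (ColumnOfInterest, DateChanged, col3, col4) rows from the second table.
history = [
    ("ghi", datetime(2012, 8, 13, 6, 28, 11, 92000), "jkl", "mno"),
    ("ghi", datetime(2012, 10, 16, 23, 54, 7, 613000), "pqr", "stu"),
    ("ghi", datetime(2013, 1, 29, 14, 13, 18, 502000), "vwx", "yz1"),
    ("ghi", datetime(2013, 10, 1, 14, 17, 32, 992000), "234", "567"),
]

# Keep only the changes made up to the date of interest, then take the latest.
candidates = [row for row in history if row[0] == "ghi" and row[1] <= date_of_interest]
latest = max(candidates, key=lambda row: row[1])
print(latest)   # the 2013-01-29 row: ('ghi', ..., 'vwx', 'yz1')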