SPARQL query for merging RDF Data Cubes - merge

I'm working on a project that stores two RDF Data Cubes:
Climate Data Cube: humidity-dataset, rainfall-dataset, temperature-dataset
Industry Data Cube: industry-dataset
Both data cubes are stored in a GraphDB database as named graphs. The datasets in both graphs share the same dimensions: city and year. Now I need to merge these datasets for data exploration. Assume we have the observations below, which contain the climate and industry data of Ha Noi city in 2016-2017:
graph : http://sda-research.ml/graph/climate
Dataset-climate
ds:obs5 a qb:Observation;
  qb:dataSet ds:dataset-climate;
  prop:city "Ha Noi"@en;
  prop:cityid "hanoi";
  prop:humidity 8.17E1;
  prop:rainfall 2.1668E3;
  prop:year "2016"^^xsd:int .
ds:obs6 a qb:Observation;
  qb:dataSet ds:dataset-climate;
  prop:city "Ha Noi"@en;
  prop:cityid "hanoi";
  prop:humidity 8.18E1;
  prop:rainfall 2.6402E3;
  prop:year "2017"^^xsd:int .
graph : http://sda-research.ml/graph/industry
Dataset-industry
ds:obs205 a qb:Observation;
  qb:dataSet ds:dataset-industry;
  prop:city "Hà Nội"@en;
  prop:cityid "hanoi";
  prop:industry 1.073E2;
  prop:year "2016"^^xsd:int .
ds:obs206 a qb:Observation;
  qb:dataSet ds:dataset-industry;
  prop:city "Hà Nội"@en;
  prop:cityid "hanoi";
  prop:industry 1.07E2;
  prop:year "2017"^^xsd:int .
Now I want to merge the two graphs so that the output contains the humidity and industry values of Hanoi for 2016-2017. On the GraphDB SPARQL endpoint, I used this query:
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX prop: <http://www.sda-research.ml/dc/prop/>
select ?city ?year ?temperature ?industry
where {
  {
    graph ?g {
      ?obs a qb:Observation.
      ?obs prop:cityid ?cityid filter regex(?cityid, 'hanoi').
      ?obs prop:city ?city.
      ?obs prop:year ?year filter(?year >= 2017 && ?year <= 2018 ).
      ?obs prop:temperature ?temperature.
    }
  }
  UNION
  {
    graph ?g {
      ?obs a qb:Observation.
      ?obs prop:cityid ?cityid filter regex(?cityid, 'hanoi').
      ?obs prop:city ?city.
      ?obs prop:year ?year filter(?year >= 2016 && ?year <= 2017).
      ?obs prop:industry ?industry.
    }
  }
}
Expected output:
city     year   humidity   industry
Ha Noi   2016   8.17E1     1.073E2
Ha Noi   2017   8.18E1     1.07E2
Actual output:
city     year   humidity   industry
Ha Noi   2016   8.17E1     null
Ha Noi   2017   8.18E1     null
Ha Noi   2016   null       1.073E2
Ha Noi   2017   null       1.07E2
How can I remove the null values when using UNION, or is there a query that gives the expected result?

There are several issues with your query before we even get to the SPARQL itself.
Your dataset contains humidity, but you are querying temperature.
The years that you are querying do not match, except for 2017: in the first graph you are looking at 2017 and 2018; in the second, you are looking at 2016 and 2017. This may be fine in certain cases, but it will not produce the result you expect.
Now for the SPARQL issues.
You query both ?cityid and ?city, but the value of ?city is spelt differently across named graphs, namely "Hà Nội"@en and "Ha Noi"@en, so ?city cannot be used to match rows across graphs; ?cityid can.
Your observations are not the same resource across named graphs, so each graph needs its own observation variable.
You use only one variable, ?g, for your named graphs, and UNION simply appends the solutions of one branch to those of the other. This means that the first two of the four results come from the climate graph and the other two from the industry graph, and in each row the variable from the other branch is left unbound (shown as null).
When you have a specific graph in mind from which to extract data, you should name it explicitly.
When you have a specific city in mind, I would avoid using REGEX. Different triplestores implement query planning differently, but regex matching is an expensive operation that may significantly worsen your performance (and regex(?cityid, 'hanoi') also matches any ID that merely contains "hanoi"). See below for how to deal with this by using the VALUES keyword.
Now here is a slightly amended query that produces the results you're after:
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX prop: <http://www.sda-research.ml/dc/prop/>
select ?cityid ?year ?humidity ?industry
where {
  values ?cityid {'hanoi'}
  graph <http://sda-research.ml/graph/climate> {
    ?obs1 a qb:Observation.
    ?obs1 prop:cityid ?cityid.
    ?obs1 prop:year ?year filter(?year >= 2016 && ?year <= 2017 ).
    ?obs1 prop:humidity ?humidity.
  }
  graph <http://sda-research.ml/graph/industry> {
    ?obs2 a qb:Observation.
    ?obs2 prop:cityid ?cityid.
    ?obs2 prop:year ?year filter(?year >= 2016 && ?year <= 2017).
    ?obs2 prop:industry ?industry.
  }
}
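The difference between UNION and joining the two graph patterns can be simulated in a few lines of Python. This is only an illustration of SPARQL's solution semantics, not how GraphDB evaluates queries. Each solution is a dict of variable bindings taken from the sample observations. `union`, `join`, `compatible` and `project` are hypothetical helpers that mimic UNION, the implicit join between group patterns, and SELECT projection.

```python
# Solutions from the climate graph pattern (one dict per matching observation).
climate = [
    {"cityid": "hanoi", "year": 2016, "humidity": 81.7},
    {"cityid": "hanoi", "year": 2017, "humidity": 81.8},
]
# Solutions from the industry graph pattern.
industry = [
    {"cityid": "hanoi", "year": 2016, "industry": 107.3},
    {"cityid": "hanoi", "year": 2017, "industry": 107.0},
]


def union(left, right):
    """SPARQL UNION: concatenate solutions; variables bound on one side only stay unbound."""
    return left + right


def compatible(a, b):
    """Two solutions are compatible if they agree on every variable they share."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())


def join(left, right):
    """Join of two group patterns: merge every compatible pair of solutions."""
    return [{**a, **b} for a in left for b in right if compatible(a, b)]


def project(rows, variables):
    """SELECT projection; unbound variables appear as None (the 'null' in the results)."""
    return [tuple(row.get(v) for v in variables) for row in rows]


cols = ["cityid", "year", "humidity", "industry"]
for row in project(union(climate, industry), cols):
    print(row)  # four rows, each with one None
for row in project(join(climate, industry), cols):
    print(row)  # two rows with both measures filled in
```

The shared variables ?cityid and ?year are what line the rows up in the join. This is why the amended query relies on ?cityid rather than ?city, whose spelling differs between the graphs.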

Related

SPARQL: Combine two select statements that each have a GROUP BY clause

Hello, I am trying to find, for each region, its name and total number of municipalities, and, for each regional unit, its name and total number of municipalities. A region consists of regional units, and a regional unit consists of municipalities. Below is my query, which unfortunately returns wrong results. What I am basically trying to do is group by region to get each region's name and total municipalities, and group by regional unit to get each unit's name and total municipalities. Any suggestions in the right direction would be appreciated. Cheers!
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX strdf: <http://strdf.di.uoa.gr/ontology#>
PREFIX gag: <http://geo.linkedopendata.gr/gag/ontology/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?region ?municipality_region ?unit ?municipality_unit
WHERE
{
{ SELECT ?region (COUNT(?municipality) AS ?municipality_region)
{?m rdf:type gag:Δήμος .
?m gag:έχει_επίσημο_όνομα ?municipality .
?m gag:ανήκει_σε ?reg_un .
?reg_un gag:ανήκει_σε ?reg .
?reg gag:έχει_επίσημο_όνομα ?region .
}GROUP BY ?region}
{ SELECT ?unit (COUNT(?municipality_un) AS ?municipality_unit)
{ ?m rdf:type gag:Δήμος .
?m gag:έχει_επίσημο_όνομα ?municipality_un .
?m gag:ανήκει_σε ?reg_un .
?reg_un gag:έχει_επίσημο_όνομα ?unit .
} GROUP BY ?unit}
};
Below I am giving a mapping of properties in english:
Δήμος = municipality
έχει_επίσημο_όνομα = has name
ανήκει_σε = belongs to
And here is the ontology I am working with:
link
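One likely contributor to the wrong results is that the two subqueries share no variables. SPARQL therefore joins them as a cross product, pairing every region row with every regional-unit row. The Python sketch below shows the effect on a small hypothetical hierarchy; the municipality, unit and region names are invented and do not come from the gag ontology. Counting per group works, but combining the two grouped results without a shared variable multiplies the rows. Grouping the units by their region as well gives the two results a common key. In SPARQL terms, that would roughly mean grouping the second subquery by both the unit and its region, although this is an untested suggestion here.

```python
from collections import Counter
from itertools import product

# Hypothetical data: (municipality, regional unit, region).
municipalities = [
    ("M1", "UnitA", "Region1"),
    ("M2", "UnitA", "Region1"),
    ("M3", "UnitB", "Region1"),
    ("M4", "UnitC", "Region2"),
]

# Like the first subquery: COUNT(?municipality) GROUP BY ?region.
per_region = Counter(region for _, _, region in municipalities)
# Like the second subquery: COUNT(?municipality_un) GROUP BY ?unit.
per_unit = Counter(unit for _, unit, _ in municipalities)

# Joining two result sets that share no variable yields every pairing.
cross = list(product(per_region.items(), per_unit.items()))
print(len(cross))  # 2 regions x 3 units = 6 rows

# Grouping units together with their parent region gives a shared key,
# so each unit row lines up only with its own region.
per_unit_with_region = Counter((unit, region) for _, unit, region in municipalities)
joined = [
    (region, per_region[region], unit, n)
    for (unit, region), n in per_unit_with_region.items()
]
for row in joined:
    print(row)
```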

sparql select wikidata group_by and concat

I want to extract a list of players and, for each player, the clubs they have played for, separated by commas.
SELECT DISTINCT ?playerLabel
(GROUP_CONCAT(?teamLabel ; separator=',') as ?teams)
WHERE {
?player wdt:P106 wd:Q937857 .
?player wdt:P2574 ?team
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?playerLabel
I have two problems:
I don't get a list of teams for each player, only the name, and the ?teams variable is empty.
If I don't use GROUP_CONCAT and GROUP BY, I get the team IDs, but I would prefer the team labels.
For example, for two players:
playerLabel teams
Cristiano Ronaldo Sporting Portugal, Manchester U, Real Madrid, Juventus, Manchester U
Leo Messi Barcelona, PSG
At the very least I need GROUP_CONCAT and GROUP BY to work, even if only with the team IDs...
Thanks
You use P2574, which is "National-Football-Teams.com player ID". While National-Football-Teams.com lists all teams a player played for, this data is not accessible through the Wikidata Query Service. But Wikidata itself has a dedicated property for sports team member: P54.
So write ?player wdt:P54 ?team instead of ?player wdt:P2574 ?team.
Additionally, you need to add ?team rdfs:label ?teamLabel . filter (lang(?teamLabel)='en') to be able to use ?teamLabel in GROUP_CONCAT().
Thus, the full working query looks like this (restricted to US players to avoid query timeouts):
SELECT DISTINCT ?playerLabel (GROUP_CONCAT(?teamLabel ; separator=',') as ?teams)
WHERE {
?player wdt:P106 wd:Q937857 .
?player wdt:P27 wd:Q30 .
?player wdt:P54 ?team .
?team rdfs:label ?teamLabel . filter (lang(?teamLabel)='en')
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?playerLabel
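The following Python sketch shows what GROUP BY combined with GROUP_CONCAT does to the rows produced by the WHERE clause. The (player, team) pairs are illustrative and come from the example in the question, not from Wikidata. Each player's team labels are collected and joined with ','. `group_concat` is a hypothetical helper, not a Wikidata API.

```python
from collections import defaultdict

# Rows as the WHERE clause might produce them: one (playerLabel, teamLabel) per team.
rows = [
    ("Cristiano Ronaldo", "Sporting Portugal"),
    ("Cristiano Ronaldo", "Manchester U"),
    ("Leo Messi", "Barcelona"),
    ("Cristiano Ronaldo", "Real Madrid"),
    ("Leo Messi", "PSG"),
]


def group_concat(rows, separator=","):
    """Mimic GROUP BY ?playerLabel with GROUP_CONCAT(?teamLabel; separator=',')."""
    groups = defaultdict(list)
    for player, team in rows:
        groups[player].append(team)
    return {player: separator.join(teams) for player, teams in groups.items()}


for player, teams in group_concat(rows).items():
    print(player, teams)
```

SPARQL does not guarantee the order of the concatenated values, so the real query may list a player's teams in a different order than this sketch does.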

OrientDB: Join in OrientDB - How to

Hello everybody (again),
I am trying to join two classes in OrientDB.
I want all the records and properties from both classes as a result.
Since JOIN doesn't work here, please explain how joins work in OrientDB,
and also how to use edges for joins in OrientDB.
It's rather simple: write the RID of the target record into a field of your master class.
I'll illustrate this using ActiveOrient, the Ruby OrientDB ORM:
DB.create_class :basiswert
=> Basiswert
DB.create_class :stock
=> Stock
apple = Basiswert.create name: 'Apple', kind: 'silicon valley company'
=> #<Basiswert:0x0000000241ca38 #metadata={"type"=>"d", "class"=>"basiswert", "version"=>1, "fieldTypes"=>nil, "cluster"=>53, "record"=>0}, #d=nil, #attributes={"name"=>"Apple", "kind"=>"silicon valley company", "created_at"=>Fri, 24 Feb 2017 16:55:37 +0100}>
apple_stock = Stock.create symbol: 'AAPL', :price => 200, basiswert: apple
=> #<Stock:0x00000003ecb370 #metadata={"type"=>"d", "class"=>"stock", "version"=>1, "fieldTypes"=>"basiswert=x", "cluster"=>57, "record"=>0}, #d=nil, #attributes={"symbol"=>"AAPL", "price"=>200, "basiswert"=>"#53:0", "created_at"=>Fri, 24 Feb 2017 16:55:43 +0100}>
apple_stock.basiswert
=> #<Basiswert:0x0000000241ca38 #metadata={"type"=>"d", "class"=>"basiswert", "version"=>1, "fieldTypes"=>nil, "cluster"=>53, "record"=>0}, #d=nil, #attributes={"name"=>"Apple", "kind"=>"silicon valley company", "created_at"=>Fri, 24 Feb 2017 16:55:37 +0100}>
Alternatively, you can simply put "#53:0" into apple_stock.basiswert.
This is a unidirectional join (or a simple link).
You can then query the Stock class:
Stock.where basiswert: apple.rid
or, in plain OrientDB SQL:
select from stock where basiswert = "#53:0"
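A unidirectional link of this kind can be modelled in a few lines of Python. This is a conceptual sketch only and does not use the ActiveOrient or OrientDB API. Records live in a dict keyed by an RID-like string, and the Stock record stores the Basiswert's RID in its basiswert field. Following the link is a lookup, and the reverse query is a scan that mirrors `select from stock where basiswert = "#53:0"`. The `follow` and `where` helpers are hypothetical.

```python
# Hypothetical in-memory store keyed by record id, loosely mimicking OrientDB RIDs.
records = {
    "#53:0": {"class": "Basiswert", "name": "Apple", "kind": "silicon valley company"},
    "#57:0": {"class": "Stock", "symbol": "AAPL", "price": 200, "basiswert": "#53:0"},
}


def follow(rid, field):
    """Resolve a link field: read the stored RID and fetch the target record."""
    return records[records[rid][field]]


def where(cls, **conditions):
    """Return the RIDs of records of class cls whose fields equal the given values."""
    return [
        rid
        for rid, rec in records.items()
        if rec["class"] == cls and all(rec.get(k) == v for k, v in conditions.items())
    ]


print(follow("#57:0", "basiswert")["name"])  # Apple
print(where("Stock", basiswert="#53:0"))      # ['#57:0']
```

Because only the stock points at the company, going from the company back to its stocks requires a query rather than a direct field access.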

How to aggregate synonym data with SPARQL

I'm using a Sesame repository with publication data in triples like this:
<http://example.org/doc2> a qb:Observation;
foaf:Organization "Inst. of Technol.";
ps:sumPaper 3 .
<http://example.org/doc3> a qb:Observation;
foaf:Organization "Institute of Technology";
ps:sumPaper 5 .
<http://example.org/doc4> a qb:Observation;
foaf:Organization "Dong C Univ.";
ps:sumPaper 4 .
<http://example.org/doc5> a qb:Observation;
foaf:Organization "University of Dong C";
ps:sumPaper 2 .
doc2 and doc3 actually refer to the same organization, and likewise doc4 and doc5 have synonymous organization names.
I want to aggregate the data with SPARQL and expect a result like this:
Organization              sumPaper
-----------------------------------
Institute of Technology   8
University of Dong C      6
So I added a synonym description to the repository:
:org2 a foaf:Organization;
  ps:organizationName "Inst. of Technol.";
  owl:sameAs :org3.
:org3 a foaf:Organization;
  ps:organizationName "Institute of Technology".
:org4 a foaf:Organization;
  ps:organizationName "Dong C Univ.".
:org5 a foaf:Organization;
  ps:organizationName "University of Dong C";
  owl:sameAs :org4.
Please help me... I'm confused about how to write a SPARQL query that gives the result I expect.
You're complicating things with owl:sameAs. Try this instead:
:org1 a foaf:Organization ;
  ps:organizationName "Inst. of Technol.", "Institute of Technology" .
:org2 a foaf:Organization ;
  ps:organizationName "Dong C Univ.", "University of Dong C" .
You can then do the following:
select ?org (SUM(?sumP) as ?sum)
{
?ob a qb:Observation ;
ps:sumPaper ?sumP ;
foaf:Organization ?orgName .
# Lookup org based on synonyms
?org ps:organizationName ?orgName .
}
group by ?org
That will give you organization identifiers, though. If that bothers you, select a name instead:
select (SAMPLE(?orgName) as ?name) (SUM(?sumP) as ?sum)
...
or even add an rdfs:label or skos:prefLabel to each org in your synonym file.
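The lookup-then-sum logic of that query can be written out in plain Python to make each step explicit. This sketch mirrors the sample data and the synonym file; the `name_to_org` index and `sum_papers` helper are hypothetical stand-ins for the `?org ps:organizationName ?orgName` pattern and the SUM/GROUP BY. Matching is on exact strings, as it is for RDF literals. A name variant that is missing from the synonym file (for example "Inst. of Technol" without the trailing period) silently drops that observation from the totals.

```python
from collections import defaultdict

# Observations: (organization name as written in the data, sumPaper).
observations = [
    ("Inst. of Technol.", 3),
    ("Institute of Technology", 5),
    ("Dong C Univ.", 4),
    ("University of Dong C", 2),
]

# Synonym file: each organization lists all of its name variants
# (like ps:organizationName with several values per org).
org_names = {
    ":org1": ["Inst. of Technol.", "Institute of Technology"],
    ":org2": ["Dong C Univ.", "University of Dong C"],
}

# Reverse index name -> org, the lookup done by "?org ps:organizationName ?orgName".
name_to_org = {name: org for org, names in org_names.items() for name in names}


def sum_papers(observations):
    """Mimic SELECT ?org (SUM(?sumP) AS ?sum) ... GROUP BY ?org."""
    totals = defaultdict(int)
    for name, papers in observations:
        org = name_to_org.get(name)
        if org is not None:  # unmatched names drop out, as in the SPARQL join
            totals[org] += papers
    return dict(totals)


print(sum_papers(observations))  # {':org1': 8, ':org2': 6}
```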

Selecting rows only if meeting criteria

I am new to PostgreSQL and to database queries in general.
I have a table of user_ids with the university courses taken and their start and finish dates.
Some users have multiple entries, and sometimes the start date, the finish date, or both are missing.
I need to retrieve the longest course taken by a user or, if the start date is missing, the latest one.
If multiple candidates still remain, pick one of them at random.
For example:
for user 2 (below) I want only "Economics and Politics" because it has the latest date;
for user 6, only "Electrical and Electronics Engineering" because it is the longest course.
The query I wrote doesn't work (and I think I am off track):
(SELECT Q.user_id, min(Q.started_at) as Started_on, max(Q.ended_at) as Completed_on,
q.field_of_study
FROM
(select distinct(user_id),started_at, Ended_at, field_of_study
from educations
) as Q
group by Q.user_id, q.field_of_study )
order by q.user_id
The result is:
User_id Started_on Completed_on Field_of_studies
2 "2001-01-01" "" "International Economics"
2 "" "2002-01-01" "Economics and Politics"
3 "1992-01-01" "1999-01-01" "Economics, Management of ..."
5 "2012-01-01" "2016-01-01" ""
6 "2005-01-01" "2009-01-01" "Electrical and Electronics Engineering"
6 "2011-01-01" "2012-01-01" "Finance, General"
6 "" "" ""
6 "2010-01-01" "2012-01-01" "Financial Mathematics"
I think this query should do what you need. It calculates the difference in days between ended_at and started_at, and uses 0001-01-01 when started_at is null (making that a very long interval):
select
  educations.user_id,
  max(educations.started_at) started_at,
  max(educations.ended_at) ended_at,
  max(educations.field_of_study) field_of_study
from educations
join (
  select
    user_id,
    max(ended_at::date - coalesce(started_at, '0001-01-01')::date) max_length
  from educations
  where (started_at is not null or ended_at is not null)
  group by user_id
) x on educations.user_id = x.user_id
  and ended_at::date - coalesce(started_at, '0001-01-01')::date = x.max_length
group by educations.user_id
;
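Here is the same selection rule restated in Python. It uses rows from the question's sample, reading the dates as YYYY-MM-DD and treating blanks as missing. A missing start becomes 0001-01-01, a missing end makes the length unknown, and each user keeps a course of maximum length. The SQL takes max() of each column separately. This sketch instead keeps one whole row per user and picks randomly among ties, as the question asked. `length_in_days` and `pick_courses` are hypothetical helpers.

```python
import random
from datetime import date

EARLIEST = date(1, 1, 1)  # stand-in for a missing start date, as in the SQL answer

# (user_id, started_at, ended_at, field_of_study); None marks a missing date.
educations = [
    (2, date(2001, 1, 1), None, "International Economics"),
    (2, None, date(2002, 1, 1), "Economics and Politics"),
    (6, date(2005, 1, 1), date(2009, 1, 1), "Electrical and Electronics Engineering"),
    (6, date(2011, 1, 1), date(2012, 1, 1), "Finance, General"),
    (6, None, None, ""),
    (6, date(2010, 1, 1), date(2012, 1, 1), "Financial Mathematics"),
]


def length_in_days(started, ended):
    """ended - coalesce(started, 0001-01-01) in days; None when ended is missing."""
    if ended is None:
        return None
    start = started if started is not None else EARLIEST
    return (ended - start).days


def pick_courses(rows, rng=random):
    """For each user keep one whole row of maximum length, breaking ties at random."""
    candidates = {}
    for row in rows:
        length = length_in_days(row[1], row[2])
        if length is None:
            continue  # like the SQL, rows without an end date never match
        candidates.setdefault(row[0], []).append((length, row))
    result = {}
    for user, scored in candidates.items():
        top = max(length for length, _ in scored)
        result[user] = rng.choice([row for length, row in scored if length == top])
    return result


for user, row in pick_courses(educations).items():
    print(user, row[3])
```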