How can I fix Unicode issues in the dataset returned from my SPARQL query?

At the moment, I am getting rows with Unicode encoding issues when using SPARQL on DBpedia (which runs on Virtuoso servers). This is an example of what I am getting: Knut %C3%85ngstr%C3%B6m.
The right name is Knut Ångström. How do I fix this? My query is:
select distinct (strafter(str(?influencerString),str(dbpedia:)) as ?influencerString) (strafter(str(?influenceeString),str(dbpedia:)) as ?influenceeString) where {
{ ?influencer a dbpedia-owl:Person . ?influencee a dbpedia-owl:Person .
?influencer dbpedia-owl:influenced ?influencee .
bind( replace( str(?influencer), "_", " " ) as ?influencerString )
bind( replace( str(?influencee), "_", " " ) as ?influenceeString )
}
UNION
{ ?influencee a dbpedia-owl:Person . ?influencer a dbpedia-owl:Person .
?influencee dbpedia-owl:influencedBy ?influencer .
bind( replace( str(?influencee), "_", " " ) as ?influenceeString )
bind( replace( str(?influencer), "_", " " ) as ?influencerString )
}
}

The DBpedia wiki explains that the identifiers for resources in the English DBpedia dataset use URIs, not IRIs, which means that you'll end up with encoding issues like this.
3. Denoting or Naming “things”
Each thing in the DBpedia data set is denoted by a de-referenceable
IRI- or URI-based reference of the form
http://dbpedia.org/resource/Name, where Name is derived from the URL
of the source Wikipedia article, which has the form
http://en.wikipedia.org/wiki/Name. Thus, each DBpedia entity is tied
directly to a Wikipedia article. Every DBpedia entity name resolves to
a description-oriented Web document (or Web resource).
Until DBpedia release 3.6, we only used article names from the English
Wikipedia, but since DBpedia release 3.7, we also provide localized
datasets that contain IRIs like http://xx.dbpedia.org/resource/Name,
where xx is a Wikipedia language code and Name is taken from the
source URL, http://xx.wikipedia.org/wiki/Name.
Starting with DBpedia release 3.8, we use IRIs for most DBpedia entity
names. IRIs are more readable and generally preferable to URIs, but
for backwards compatibility, we still use URIs for DBpedia resources
extracted from the English Wikipedia and IRIs for all other languages.
Triples in Turtle files use IRIs for all languages, even for English.
There are several details on the encoding of URIs that should always
be taken into account.
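If the percent-encoded local name is all you have on the client side, it can also be decoded after the query returns. A minimal sketch in Python, assuming the value arrives as a plain string like the one in the question; `decode_local_name` is a hypothetical helper, not part of any DBpedia tooling:

```python
from urllib.parse import unquote


def decode_local_name(value: str) -> str:
    """Percent-decode a URI local name (as UTF-8) and replace underscores with spaces."""
    return unquote(value, encoding="utf-8").replace("_", " ")


if __name__ == "__main__":
    print(decode_local_name("Knut_%C3%85ngstr%C3%B6m"))  # Knut Ångström
```

This only repairs the display string; the resource identifier itself stays percent-encoded in the English dataset.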
In this particular case, it looks like you don't really need to break up the identifier so much as get a label for the entity.
## If things were guaranteed to have just one English label,
## we could simply take ?xLabel as the value that we want with
## `select ?xLabel { … }`, but since there might be more than
## one, we can group by `?x` and then take a sample from the
## set of labels for each `?x`.
select (sample(?xLabel) as ?label) {
?x dbpedia-owl:influenced dbpedia:August_Kundt ;
rdfs:label ?xLabel .
filter(langMatches(lang(?xLabel),"en"))
}
group by ?x
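To see what the `group by` plus `sample` combination does, here is a rough Python analogue over in-memory rows. The rows and labels are made-up illustration data, and `sample_label_per_resource` is a hypothetical helper. SPARQL's `sample` may return any member of the group, whereas this sketch simply keeps the first label it sees.

```python
from typing import Dict, Iterable, Tuple


def sample_label_per_resource(rows: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Keep one label per resource, mimicking GROUP BY ?x with SAMPLE(?xLabel)."""
    picked: Dict[str, str] = {}
    for resource, label in rows:
        # setdefault keeps the first label seen and ignores later ones
        picked.setdefault(resource, label)
    return picked


# Hypothetical (resource, label) rows; the first resource has two English labels.
rows = [
    ("dbpedia:Knut_%C3%85ngstr%C3%B6m", "Knut Ångström"),
    ("dbpedia:Knut_%C3%85ngstr%C3%B6m", "Knut Johan Ångström"),
    ("dbpedia:Example_Person", "Example Person"),
]
labels = sample_label_per_resource(rows)
print(len(labels))  # 2, one entry per resource
```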
Simplifying your query a bit, we can have this:
select
(sample(?rLabel) as ?influencerName)
(sample(?eLabel) as ?influenceeName)
where {
?influencer dbpedia-owl:influenced|^dbpedia-owl:influencedBy ?influencee .
dbpedia-owl:Person ^a ?influencer, ?influencee .
?influencer rdfs:label ?rLabel .
filter( langMatches(lang(?rLabel),"en") )
?influencee rdfs:label ?eLabel .
filter( langMatches(lang(?eLabel),"en") )
}
group by ?influencer ?influencee
If you don't want language tags on those results, then add a call to str():
select
(str(sample(?rLabel)) as ?influencerName)
(str(sample(?eLabel)) as ?influenceeName)
where {
?influencer dbpedia-owl:influenced|^dbpedia-owl:influencedBy ?influencee .
dbpedia-owl:Person ^a ?influencer, ?influencee .
?influencer rdfs:label ?rLabel .
filter( langMatches(lang(?rLabel),"en") )
?influencee rdfs:label ?eLabel .
filter( langMatches(lang(?eLabel),"en") )
}
group by ?influencer ?influencee

Related

SPARQL: Combine two select statements that each have a GROUP BY clause

Hello, I am trying to find the total number of municipalities in each region, along with the region's name, and the total number of municipalities in each regional unit, along with the regional unit's name. A region consists of regional units, and a regional unit consists of municipalities. My query below unfortunately returns wrong results. What I am basically trying to do is group by region to get the name and municipality count of each region, and group by regional unit to get the name and municipality count of each unit. Any suggestions in the right direction would be appreciated. Cheers!
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX strdf: <http://strdf.di.uoa.gr/ontology#>
PREFIX gag: <http://geo.linkedopendata.gr/gag/ontology/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?region ?municipality_region ?unit ?municipality_unit
WHERE
{
{ SELECT ?region (COUNT(?municipality) AS ?municipality_region)
{?m rdf:type gag:Δήμος .
?m gag:έχει_επίσημο_όνομα ?municipality .
?m gag:ανήκει_σε ?reg_un .
?reg_un gag:ανήκει_σε ?reg .
?reg gag:έχει_επίσημο_όνομα ?region .
}GROUP BY ?region}
{ SELECT ?unit (COUNT(?municipality_un) AS ?municipality_unit)
{ ?m rdf:type gag:Δήμος .
?m gag:έχει_επίσημο_όνομα ?municipality_un .
?m gag:ανήκει_σε ?reg_un .
?reg_un gag:έχει_επίσημο_όνομα ?unit .
} GROUP BY ?unit}
}
Below is a mapping of the property names to English:
Δήμος = municipality
έχει_επίσημο_όνομα = has name
ανήκει_σε = belongs to
And here is the ontology I am working with:
link

sparql select wikidata group_by and concat

I want to extract a list of players and, for each player, a list of the clubs where they have played, separated by commas.
SELECT DISTINCT ?playerLabel
(GROUP_CONCAT(?teamLabel ; separator=',') as ?teams)
WHERE {
?player wdt:P106 wd:Q937857 .
?player wdt:P2574 ?team
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?playerLabel
I have two problems:
I don't get a list of teams for each player, only the name, and the variable ?teams is empty.
If I don't use GROUP_CONCAT and GROUP BY, I get the team ID, but I would prefer the team's label.
For example 2 players...:
playerLabel teams
Cristiano Ronaldo Sporting Portugal, Manchester U, Real Madrid, Juventus, Manchester U
Leo Messi Barcelona, PSG
At least I need the Concat and group by, even with code...
thanks
You use P2574, which is "National-Football-Teams.com player ID". While National-Football-Teams.com lists all teams a player played for, this data is not accessible through the Wikidata Query Service. But Wikidata itself has a dedicated property for sports team member: P54.
So write ?player wdt:P54 ?team instead of ?player wdt:P2574 ?team.
Additionally, you need to add ?team rdfs:label ?teamLabel . filter (lang(?teamLabel)='en') to be able to use ?teamLabel in GROUP_CONCAT().
Thus, the full working query looks like this (restricted to US players to avoid query timeouts):
SELECT DISTINCT ?playerLabel (GROUP_CONCAT(?teamLabel ; separator=',') as ?teams)
WHERE {
?player wdt:P106 wd:Q937857 .
?player wdt:P27 wd:Q30 .
?player wdt:P54 ?team .
?team rdfs:label ?teamLabel . filter (lang(?teamLabel)='en')
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?playerLabel
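What `GROUP BY` with `GROUP_CONCAT` computes can be sketched in Python over plain (player, team) pairs. This is only an analogue, assuming the rows have already been fetched; `group_concat` is a hypothetical helper, and the sample rows reuse names from the question. Note that SPARQL does not guarantee the order of the concatenated values, while this sketch keeps input order.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_concat(rows: Iterable[Tuple[str, str]], separator: str = ",") -> Dict[str, str]:
    """Join all team labels per player, like GROUP_CONCAT(?teamLabel; separator=',')."""
    teams_by_player: Dict[str, List[str]] = defaultdict(list)
    for player, team in rows:
        teams_by_player[player].append(team)
    return {player: separator.join(teams) for player, teams in teams_by_player.items()}


# (player, team) rows using the names from the example in the question
rows = [
    ("Cristiano Ronaldo", "Sporting Portugal"),
    ("Cristiano Ronaldo", "Manchester U"),
    ("Cristiano Ronaldo", "Real Madrid"),
    ("Leo Messi", "Barcelona"),
    ("Leo Messi", "PSG"),
]
result = group_concat(rows)
print(result["Leo Messi"])  # Barcelona,PSG
```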

Sparql Select only one label and comment of classes

I am trying to select one label and comment of classes but group by doesn't work as expected. The following query is an example of the select.
SELECT ?class ?label ?comment WHERE
{
{SELECT DISTINCT ?class WHERE { {?uri rdf:type ?class}UNION {?class rdf:type owl:Class} UNION {?class rdf:type rdfs:Class} }OFFSET 0 LIMIT 100}
.optional{?class rdfs:label ?label}
.optional{?class rdfs:comment ?comment}
}GROUP BY ?class
The goal is to have every class URI with one label and one comment.
But I am getting results such as:
http://dbpedia.org/ontology/Activity "attività"#it
http://dbpedia.org/ontology/Activity "活動"#ja
Any idea ?
Your query is actually illegal: the outer SELECT projects ?label and ?comment, which are neither aggregates nor group keys.
You need to use SAMPLE to pick one (arbitrary) item from a group if you think there might be multiple labels or comments.
SELECT ?class (sample(?labelX) as ?label) (sample(?commentX) as ?comment) WHERE
{
{ SELECT DISTINCT ?class {
{?uri rdf:type ?class} UNION
{?class rdf:type owl:Class} UNION
{?class rdf:type rdfs:Class}
} LIMIT 100 }
optional{?class rdfs:label ?labelX}
optional{?class rdfs:comment ?commentX}
} GROUP BY ?class

Quote raw sql in ZEND to avoid sql injection

Long story short, I have an admin section where the user can choose, from multiple dropdown lists, the tables and fields that must be queried in order to get some values. Therefore, the query in ZEND is built by concatenating strings:
$query = "SELECT $fieldName1, $fieldName2 from $tableName where $fieldName1 = $value";
How can I escape the above using the ZEND approach to avoid SQL injection? I tried replacing them all with ? and calling quoteInto(), but this does not seem to work for some of the variables (such as table names or field names).
ZF has quoteIdentifier() specifically for this purpose:
$query = "SELECT ".$db->quoteIdentifier($fieldName1).","...
In your case you might (also) want to check against a white list of valid column names.
Use quoteInto() or Zend_Db_Select::where() for the values, and for the table and column names, I would simply strip any non-alpha characters and then wrap them in ` quotes prior to using them in your SQL.
Example:
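// Strip non alpha (this also drops digits and underscores) and quote
$fieldName1 = '`' . preg_replace('/[^A-Za-z]/', '', $fieldName1) . '`';
$fieldName2 = '`' . preg_replace('/[^A-Za-z]/', '', $fieldName2) . '`';
$tableName = '`' . preg_replace('/[^A-Za-z]/', '', $tableName) . '`';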
// ....
// Build the SQL using Zend Db Select
$db->select()->from($tableName, array($fieldName1, $fieldName2))
->where($fieldName1 . ' = ?', $value);
In SafeMysql you can make it as simple as
$sql = "SELECT ?n, ?n from ?n where ?n = ?s";
$data = $db->getAll($sql,$fieldName1,$fieldName2, $tableName, $fieldName1, $value);
though I understand that you won't change your ZF to SafeMysql.
Nevertheless, there is one essential thing that ought to be done manually:
I doubt you want to let users browse the users table or a financial table or whatever.
So, you have to verify a passed table name against an array of allowed tables,
like
$allowed = array('test1','test2');
if (!in_array($tableName, $allowed)) {
throw new _403();
}
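The same two-part idea (allow-list the identifiers, bind the value) can be sketched outside of ZF. The example below is in Python with the standard-library `sqlite3` module as a stand-in database, not Zend code; the table and column names, the allow-lists, and `build_query` are all hypothetical.

```python
import sqlite3

# Hypothetical allow-lists; a real application would load these from configuration.
ALLOWED_TABLES = {"test1", "test2"}
ALLOWED_COLUMNS = {"id", "name"}


def build_query(table: str, field1: str, field2: str) -> str:
    """Return a SELECT with validated identifiers and a placeholder for the value."""
    if table not in ALLOWED_TABLES:
        raise PermissionError(f"table not allowed: {table!r}")
    for column in (field1, field2):
        if column not in ALLOWED_COLUMNS:
            raise PermissionError(f"column not allowed: {column!r}")
    # Identifiers are interpolated only because they matched the allow-list;
    # the value is never interpolated, it is bound through the "?" placeholder.
    return f'SELECT "{field1}", "{field2}" FROM "{table}" WHERE "{field1}" = ?'


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test1 (id INTEGER, name TEXT)")
conn.execute("INSERT INTO test1 VALUES (1, ?)", ("alice",))
rows = conn.execute(build_query("test1", "id", "name"), (1,)).fetchall()
print(rows)  # [(1, 'alice')]
```

Placeholders can only stand in for values, which is why the table and column names need a separate check.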

Get Place categories from DBpedia using SPARQL

The following code queries DBpedia for places within a bounded geographic area and returns the name, lat, and long of the place. I'd also like the query to return the category of the place--e.g., park, restaurant, museum, etc.
The following code works fine.
sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery("""
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX category: <http://dbpedia.org/resource/Category:>
SELECT * WHERE {
?s a dbo:Place .
?s geo:lat ?lat .
?s geo:long ?long .
I tried to add the following code to get categories for places, but this doesn't work:
?s category:cat ?cat .
What should I add/change? Thanks.
You can get the category of a place (assuming you mean the type) by finding the type (rdf:type) or the subject (dcterms:subject) of a resource. In DBpedia, the first relates to the DBpedia and YAGO ontologies, and the second is a SKOS hierarchy in DBpedia. Here is an example query:
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT * WHERE {
?s a dbo:Place .
?s geo:lat ?lat .
?s geo:long ?long .
?s a ?type .
?s dcterms:subject ?sub
}
Note that you will get multiple types and subjects for each place.
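Because every combination of type and subject comes back as its own row, it can help to merge the rows per place on the client. A minimal Python sketch, assuming the results were requested in the SPARQL JSON results format (a list of bindings keyed by variable name); `collect_categories` is a hypothetical helper and the URIs below are illustrative, not actual query output.

```python
from collections import defaultdict
from typing import Any, Dict, List, Set


def collect_categories(bindings: List[Dict[str, Any]]) -> Dict[str, Dict[str, Set[str]]]:
    """Merge repeated rows for the same ?s into sets of types and subjects."""
    places: Dict[str, Dict[str, Set[str]]] = defaultdict(
        lambda: {"types": set(), "subjects": set()}
    )
    for row in bindings:
        place = places[row["s"]["value"]]
        place["types"].add(row["type"]["value"])
        place["subjects"].add(row["sub"]["value"])
    return dict(places)


# Hand-written sample shaped like the "bindings" list of a SPARQL JSON result.
bindings = [
    {"s": {"value": "http://dbpedia.org/resource/Example_Park"},
     "type": {"value": "http://dbpedia.org/ontology/Place"},
     "sub": {"value": "http://dbpedia.org/resource/Category:Example_parks"}},
    {"s": {"value": "http://dbpedia.org/resource/Example_Park"},
     "type": {"value": "http://dbpedia.org/ontology/Park"},
     "sub": {"value": "http://dbpedia.org/resource/Category:Example_parks"}},
]
places = collect_categories(bindings)
print(len(places["http://dbpedia.org/resource/Example_Park"]["types"]))  # 2
```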