How to isolate a list of URIs in a RDF file using CONSTRUCT or DESCRIBE in SPARQL? - select

I'm trying to get only a list of URIs in RDF instead of a list of triples:
PREFIX gr: <http://purl.org/goodrelations/v1#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
DESCRIBE ?product
WHERE
{
?product rdfs:subClassOf gr:ProductOrService .
}
Using SELECT, instead of DESCRIBE I receive only the subject (which I want), but not as an RDF but like a SPARQL Result with binds, vars, etc.
Using CONSTRUCT, I can't specify only the ?product, as above, so the closest I can get is:
PREFIX gr: <http://purl.org/goodrelations/v1#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
CONSTRUCT
WHERE
{
?product rdfs:subClassOf gr:ProductOrService .
}
Which returns an RDF with triples of different products, but the same properties and objects.

It looks like you should read over the SPARQL specification, which tells you that this is exactly as expected. To get the single column you wish, you must use SELECT.
SPARQL has four query forms. These query forms use the solutions from pattern matching to form result sets or RDF graphs. The query forms are:
SELECT
Returns all, or a subset of, the variables bound in a query pattern match.
CONSTRUCT
Returns an RDF graph constructed by substituting variables in a set of triple templates.
ASK
Returns a boolean indicating whether a query pattern matches or not.
DESCRIBE
Returns an RDF graph that describes the resources found.

I found a solution. I've used SELECT, but instead of binds and vars, as output received a "Comma-Separated Values (with fields in N-Triples syntax)" CSV file:
<http://www.productontology.org/id/Real_estate>
<http://www.productontology.org/id/Automobile>
<http://www.productontology.org/id/Auction>
<http://www.productontology.org/id/Video_game>
<http://www.productontology.org/id/Campsite>
<http://www.productontology.org/id/Car>
<http://www.productontology.org/id/Audiobook>
<http://www.productontology.org/id/Browser_game>
...

Representing the result of a SPARQL SELECT query as an RDF List is a tooling issue - it is not something that can be solved in SPARQL in general.
Some SPARQL tools may have ways to support rendering the query result as an RDF list. But it's not something you can fix by just formulating your query differently, you'll need to use a tool to (programmatically) format the result.
In Java, using Eclipse RDF4J, you can do it as follows (untested so you may need to tweak it to work properly, but it should give you a general idea):
String query = "SELECT ?product WHERE { ?product rdfs:subClassOf gr:ProductOrService . }";
// do the query on repo and convert to a Java list of URI objects
List<URI> results = Repositories.tupleQuery(
repo,
query,
r -> QueryResults.stream(r).map(bs -> (URI)bs.getValue("product")).collect(Collectors.toList()
);
// create a resource (bnode or URI) for the start of the rdf:List
Resource head = SimpleValueFactory.getInstance().createBNode();
// convert the Java list of results to an rdf:list and add it
// to a newly-created RDF Model
Model m = RDFCollections.asRDF(results, head, new LinkedHashModel());
Once you have your result as an RDF model you can use any of the existing RDF4J APIs to write it to file or to store it in a triplestore.
Now, I am not claiming that any of this is a good idea - I have never seen a use case for something like this. But this is how I would do it if it were necessary.

Related

Load more records from Gatling feeder

I would like to inject n-rows from my csv file to Gatling feeder. The default approach of Gatling is to read and inject one row at a time. However, I cannot find anywhere, how to take and inject an eg. Array into a template.
I came up with creating a JSON template with Gatling Expressions as some of the fields. The issue is I have a JSON array with N-elements:
[
{"myKey": ${value}, "mySecondKey": ${value2}, ...},
{"myKey": ${value}, "mySecondKey": ${value2}, ...},
{"myKey": ${value}, "mySecondKey": ${value2}, ...},
{"myKey": ${value}, "mySecondKey": ${value2}, ...}
]
And my csv:
value,value2,...
value,value2,...
value,value2,...
value,value2,...
...
I would like to make it as efficient as possible. My data is in CSV file, so I would like to use csv feeder. Also, the size is large, so readRecords is not possible, since I'm getting out of memory.
Is there a way I can put N-records into the request body using Gatling?
From the documentation:
feed(feeder, 2)
Old Gatling versions:
Attribute names, will be suffixed. For example, if the columns are name “foo” and “bar” and you’re feeding 2 records at once, you’ll get “foo1”, “bar1”, “foo2” and “bar2” session attributes.
Modern Gatling versions:
values will be arrays containing all the values of the same key.
In this latter case, you can access a value at a given index with Gatling EL: #{foo(0)}, #{foo(1)}, #{bar(0)} and #{bar(1)}
It seems that the documentation on this front might have changed a bit since then:
It’s also possible to feed multiple records at once. In this case, values will be arrays containing all the values of the same key.
I personally wrote this in Java, but it is easy to find the syntax for scala as well in the documentation.
The solution I used for my CSV file is to add the feeder to the scenario like:
.feed(CoreDsl.csv("pathofyourcsvfile"), NUMBER_OF_RECORDS)
To apply/receive that array data during your .exec you can do something like this:
.post("YourEndpointPath").body(StringBody(session -> yourMethod(session.get(YourStringKey))))
In this case, I am using a POST and requestBody, but the concept remains similar for GET and their corresponding queryParameters. So basically, you can use the session lambda in combination with the session.get method.
"yourMethod" can then receive this parameter as an Object[].

RDF4J not filtering TreeModel in expected way

I have a TTL with something like
ex:isDataProperty rdf:type owl:DatatypeProperty .
ex:Article a owl:Class ;
owl:hasKey ( ex:isDataProperty ) .
And when I load the model with RDF4J (as a TreeModel) then try to filter to extract the properties annotated with haskey fails (just returns empty list result)
Some samples that return data:
val dataProperties = model.filter(null, RDF.TYPE, OWL.DATATYPEPROPERTY).subjects().asScala
val classes = model.filter(null, RDF.TYPE, OWL.CLASS).subjects().asScala
The sample I want, that doesn't return data:
val propertiesWithKeys = model.filter(null, RDF.PROPERTY, OWL.HASKEY).subjects().asScala
I have tried a few variations of the previous one using RDF.TYPE or RDF.Value. (instead of RDF.PROPERTY)
The thing you're after is any subject that has a owl:hasKey property, regardless of value. So both the subject and the object are wildcards, you just want to filter by property name. The way to do that is like this:
model.filter(null, OWL.HASKEY, null)
Now, furthermore you say that you want to know the properties that have been used as annotation using this owl:hasKey property. In your example, that would be ex:isDataProperty. Note that in your model, this is not the subject of the owl:hasKey relation - it's in the object values:
model.filter(null, OWL.HASKEY, null).objects()
To further complicate matters, the object values in your example are not simply single values. Instead, each class is annotated using a list of properties, so the object value is a list object (a.k.a. an RDF Collection). To process this list, there are some utility methods provided by the Models and RDFCollections classes.
For each of the objects you can do this to get the actual list of values:
RDFCollections.asValues(model, objectNode, new ArrayList<Value>())
(where objectNode is one of the values that .objects() returned)
Edit since objects() returns objects of type Value and RDFCollections expects a Resource, you'll either have to do a cast, or if you want to do all of this in a fluent way, you can use Models.objectResources instead. The whole thing then becomes:
Models.objectResources(model.filter(null, OWL.HASKEY, null))
.asScala.map(o => RDFCollections.asValues(model, o, new ArrayList[Value]()));
(I may have the Scala-specific bits of this wrong, but you get the gist hopefully)
For more information on how to work with the rdf4j Model API and with RDF Collections, see the rdf4j documentation.

RDF document metadata

I have a software that generates an RDF representation of certain dataset. I want to add to the generated data also some metadata describing not specific data contained in the data set but the document itself - i.e., when the document was created, by which software, which version, etc. The schema.org properties provide the necessary relationships, but I can not figure out the proper place to attach it. Is there some standard way of saying "this is the metadata about the document itself" in RDF? I use Turtle serialization for RDF but generic answer working with any serialization would be preferable.
There is not a standard place to do this. A RDF graph is just a collection of triples; it's not identified by an IRI or anything like that. (However, in SPARQL datasets, you could post some metadata about a named graph by using the name of the graph as the subject in a triple. That would just be a convention, though. It's not "official" in any sense.)
In the RDF serializations of OWL ontologies, there can be an ontology element (i.e., a resource with the type owl:Ontology), and that can be used to associate some metadata with the ontology. You'd probably want to adopt an approach like that. That is, you'd establish a convention with something like
#prefix ex: <...>
[] a ex:DatasetRepresentation ;
ex:created "..." ;
ex:representationOf <...> .
#... rest of generated content ...
I am not that familiar with Schema.org, but both DCat and Doublin Core provide means to do so.
In Doublin Core there is the identifier Data property. An example:
PREFIX : <http://my.domain/meta-data#>
PREFIX dcterms: <http://purl.org/dc/terms/>
:1 a dcterms:BibliographicResource ;
dcterms:identifier <http://my.domain/my-document> .
A similar record but now using DCat and the landingPage data property:
PREFIX : <http://my.domain/meta-data#>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
:1 a dcat:Resource ;
dcat:landingPage <http://my.domain/my-document> .

How do I dynamically build a search block in sunspot?

I am converting a Rails app from using acts_as_solr to sunspot.
The app uses the field search capability in solr that was exposed in acts_as_solr. You could give it a query string like this:
title:"The thing to search"
and it would search for that string in the title field.
In converting to sunspot I am parsing out field specific portions of the query string and I need to dynamically generate the search block. Something like this:
Sunspot.search(table_clazz) do
keywords(first_string, :fields => :title)
keywords(second_string, :fields => :description)
...
paginate(:page => page, :per_page => per_page)
end
This is complicated by also needing to do duration (seconds, integer) ranges and negation if the query requires it.
On the current system users can search for something in the title, excluding records with something else in another field and scoping by duration.
In a nutshell, how do I generate these blocks dynamically?
I recently did this kind of thing using instance_eval to evaluate procs (created elsewhere) in the context of the Sunspot search block.
The advantage is that these procs can be created anywhere in your application yet you can write them with the same syntax as if you were inside a sunspot search block.
Here's a quick example to get you started for your particular case:
def build_sunspot_query(conditions)
condition_procs = conditions.map{|c| build_condition c}
Sunspot.search(table_clazz) do
condition_procs.each{|c| instance_eval &c}
paginate(:page => page, :per_page => per_page)
end
end
def build_condition(condition)
Proc.new do
# write this code as if it was inside the sunspot search block
keywords condition['words'], :fields => condition[:field].to_sym
end
end
conditions = [{words: "tasty pizza", field: "title"},
{words: "cheap", field: "description"}]
build_sunspot_query conditions
By the way, if you need to, you can even instance_eval a proc inside of another proc (in my case I composed arbitrarily-nested 'and'/'or' conditions).
Sunspot provides a method called Sunspot.new_search which lets you build the search conditions incrementally and execute it on demand.
An example provided by the Sunspot's source code:
search = Sunspot.new_search do
with(:blog_id, 1)
end
search.build do
keywords('some keywords')
end
search.build do
order_by(:published_at, :desc)
end
search.execute
# This is equivalent to:
Sunspot.search do
with(:blog_id, 1)
keywords('some keywords')
order_by(:published_at, :desc)
end
With this flexibility, you should be able to build your query dynamically. Also, you can extract common conditions to a method, like so:
def blog_facets
lambda { |s|
s.facet(:published_year)
s.facet(:author)
}
end
search = Sunspot.new_search(Blog)
search.build(&blog_facets)
search.execute
I have solved this myself. The solution I used was to compiled the required scopes as strings, concatenate them, and then eval them inside the search block.
This required a separate query builder library that interrogates the solr indexes to ensure that a scope is not created for a non existent index field.
The code is very specific to my project, and too long to post in full, but this is what I do:
1. Split the search terms
this gives me an array of the terms or terms plus fields:
['field:term', 'non field terms']
2. This is passed to the query builder.
The builder converts the array to scopes, based on what indexes are available. This method is an example that takes the model class, field and value and returns the scope if the field is indexed.
def convert_text_query_to_search_scope(model_clazz, field, value)
if field_is_indexed?(model_clazz, field)
escaped_value = value.gsub(/'/, "\\\\'")
"keywords('#{escaped_value}', :fields => [:#{field}])"
else
""
end
end
3. Join all the scopes
The generated scopes are joined join("\n") and that is evaled.
This approach allows the user to selected the models they want to search, and optionally to do field specific searching. The system will then only search the models with any specified fields (or common fields), ignoring the rest.
The method to check if the field is indexed is:
# based on http://blog.locomotivellc.com/post/6321969631/sunspot-introspection
def field_is_indexed?(model_clazz, field)
# first part returns an array of all indexed fields - text and other types - plus ':class'
Sunspot::Setup.for(model_clazz).all_field_factories.map(&:name).include?(field.to_sym)
end
And if anyone needs it, a check for sortability:
def field_is_sortable?(classes_to_check, field)
if field.present?
classes_to_check.each do |table_clazz|
return false if ! Sunspot::Setup.for(table_clazz).field_factories.map(&:name).include?(field.to_sym)
end
return true
end
false
end

RESTful URL design for search

I'm looking for a reasonable way to represent searches as a RESTful URLs.
The setup: I have two models, Cars and Garages, where Cars can be in Garages. So my urls look like:
/car/xxxx
xxx == car id
returns car with given id
/garage/yyy
yyy = garage id
returns garage with given id
A Car can exist on its own (hence the /car), or it can exist in a garage. What's the right way to represent, say, all the cars in a given garage? Something like:
/garage/yyy/cars ?
How about the union of cars in garage yyy and zzz?
What's the right way to represent a search for cars with certain attributes? Say: show me all blue sedans with 4 doors :
/car/search?color=blue&type=sedan&doors=4
or should it be /cars instead?
The use of "search" seems inappropriate there - what's a better way / term? Should it just be:
/cars/?color=blue&type=sedan&doors=4
Should the search parameters be part of the PATHINFO or QUERYSTRING?
In short, I'm looking for guidance for cross-model REST url design, and for search.
[Update] I like Justin's answer, but he doesn't cover the multi-field search case:
/cars/color:blue/type:sedan/doors:4
or something like that. How do we go from
/cars/color/blue
to the multiple field case?
For the searching, use querystrings. This is perfectly RESTful:
/cars?color=blue&type=sedan&doors=4
An advantage to regular querystrings is that they are standard and widely understood and that they can be generated from form-get.
The RESTful pretty URL design is about displaying a resource based on a structure (directory-like structure, date: articles/2005/5/13, object and it's attributes,..), the slash / indicates hierarchical structure, use the -id instead.
Hierarchical structure
I would personaly prefer:
/garage-id/cars/car-id
/cars/car-id #for cars not in garages
If a user removes the /car-id part, it brings the cars preview - intuitive. User exactly knows where in the tree he is, what is he looking at. He knows from the first look, that garages and cars are in relation. /car-id also denotes that it belongs together unlike /car/id.
Searching
The searchquery is OK as it is, there is only your preference, what should be taken into account. The funny part comes when joining searches (see below).
/cars?color=blue;type=sedan #most prefered by me
/cars;color-blue+doors-4+type-sedan #looks good when using car-id
/cars?color=blue&doors=4&type=sedan #also possible, but & blends in with text
Or basically anything what isn't a slash as explained above.
The formula: /cars[?;]color[=-:]blue[,;+&], though I wouldn't use the & sign as it is unrecognizable from the text at first glance if that's your thing.
** Did you know that passing JSON object in URI is RESTful? **
Lists of options
/cars?color=black,blue,red;doors=3,5;type=sedan #most prefered by me
/cars?color:black:blue:red;doors:3:5;type:sedan
/cars?color(black,blue,red);doors(3,5);type(sedan) #does not look bad at all
/cars?color:(black,blue,red);doors:(3,5);type:sedan #little difference
possible features?
Negate search strings (!)
To search any cars, but not black and red:
?color=!black,!red
color:(!black,!red)
Joined searches
Search red or blue or black cars with 3 doors in garages id 1..20 or 101..103 or 999 but not 5
/garage[id=1-20,101-103,999,!5]/cars[color=red,blue,black;doors=3]
You can then construct more complex search queries. (Look at CSS3 attribute matching for the idea of matching substrings. E.g. searching users containing "bar" user*=bar.)
Conclusion
Anyway, this might be the most important part for you, because you can do it however you like after all, just keep in mind that RESTful URI represents a structure which is easily understood e.g. directory-like /directory/file, /collection/node/item, dates /articles/{year}/{month}/{day}.. And when you omit any of last segments, you immediately know what you get.
So.., all these characters are allowed unencoded:
unreserved: a-zA-Z0-9_.-~
Typically allowed both encoded and not, both uses are then equivalent.
special characters: $-_.+!*'(),
reserved: ;/?:#=&
May be used unencoded for the purpose they represent, otherwise they must be encoded.
unsafe: <>"#%{}|^~[]`
Why unsafe and why should rather be encoded: RFC 1738 see 2.2
Also see RFC 1738#page-20 for more character classes.
RFC 3986 see 2.2
Despite of what I previously said, here is a common distinction of delimeters, meaning that some "are" more important than others.
generic delimeters: :/?#[]#
sub-delimeters: !$&'()*+,;=
More reading:
Hierarchy: see 2.3, see 1.2.3
url path parameter syntax
CSS3 attribute matching
IBM: RESTful Web services - The basics
Note: RFC 1738 was updated by RFC 3986
Although having the parameters in the path has some advantages, there are, IMO, some outweighing factors.
Not all characters needed for a search query are permitted in a URL. Most punctuation and Unicode characters would need to be URL encoded as a query string parameter. I'm wrestling with the same problem. I would like to use XPath in the URL, but not all XPath syntax is compatible with a URI path. So for simple paths, /cars/doors/driver/lock/combination would be appropriate to locate the 'combination' element in the driver's door XML document. But /car/doors[id='driver' and lock/combination='1234'] is not so friendly.
There is a difference between filtering a resource based on one of its attributes and specifying a resource.
For example, since
/cars/colors returns a list of all colors for all cars (the resource returned is a collection of color objects)
/cars/colors/red,blue,green would return a list of color objects that are red, blue or green, not a collection of cars.
To return cars, the path would be
/cars?color=red,blue,green or /cars/search?color=red,blue,green
Parameters in the path are more difficult to read because name/value pairs are not isolated from the rest of the path, which is not name/value pairs.
One last comment. I prefer /garages/yyy/cars (always plural) to /garage/yyy/cars (perhaps it was a typo in the original answer) because it avoids changing the path between singular and plural. For words with an added 's', the change is not so bad, but changing /person/yyy/friends to /people/yyy seems cumbersome.
To expand on Peter's answer - you could make Search a first-class resource:
POST /searches # create a new search
GET /searches # list all searches (admin)
GET /searches/{id} # show the results of a previously-run search
DELETE /searches/{id} # delete a search (admin)
The Search resource would have fields for color, make model, garaged status, etc and could be specified in XML, JSON, or any other format. Like the Car and Garage resource, you could restrict access to Searches based on authentication. Users who frequently run the same Searches can store them in their profiles so that they don't need to be re-created. The URLs will be short enough that in many cases they can be easily traded via email. These stored Searches can be the basis of custom RSS feeds, and so on.
There are many possibilities for using Searches when you think of them as resources.
The idea is explained in more detail in this Railscast.
Justin's answer is probably the way to go, although in some applications it might make sense to consider a particular search as a resource in its own right, such as if you want to support named saved searches:
/search/{searchQuery}
or
/search/{savedSearchName}
I use two approaches to implement searches.
1) Simplest case, to query associated elements, and for navigation.
/cars?q.garage.id.eq=1
This means, query cars that have garage ID equal to 1.
It is also possible to create more complex searches:
/cars?q.garage.street.eq=FirstStreet&q.color.ne=red&offset=300&max=100
Cars in all garages in FirstStreet that are not red (3rd page, 100 elements per page).
2) Complex queries are considered as regular resources that are created and can be recovered.
POST /searches => Create
GET /searches/1 => Recover search
GET /searches/1?offset=300&max=100 => pagination in search
The POST body for search creation is as follows:
{
"$class":"test.Car",
"$q":{
"$eq" : { "color" : "red" },
"garage" : {
"$ne" : { "street" : "FirstStreet" }
}
}
}
It is based in Grails (criteria DSL): http://grails.org/doc/2.4.3/ref/Domain%20Classes/createCriteria.html
This is not REST. You cannot define URIs for resources inside your API. Resource navigation must be hypertext-driven. It's fine if you want pretty URIs and heavy amounts of coupling, but just do not call it REST, because it directly violates the constraints of RESTful architecture.
See this article by the inventor of REST.
In addition i would also suggest:
/cars/search/all{?color,model,year}
/cars/search/by-parameters{?color,model,year}
/cars/search/by-vendor{?vendor}
Here, Search is considered as a child resource of Cars resource.
There are a lot of good options for your case here. Still you should considering using the POST body.
The query string is perfect for your example, but if you have something more complicated, e.g. an arbitrary long list of items or boolean conditionals, you might want to define the post as a document, that the client sends over POST.
This allows a more flexible description of the search, as well as avoids the Server URL length limit.
RESTful does not recommend using verbs in URL's /cars/search is not restful. The right way to filter/search/paginate your API's is through Query Parameters. However there might be cases when you have to break the norm. For example, if you are searching across multiple resources, then you have to use something like /search?q=query
You can go through http://saipraveenblog.wordpress.com/2014/09/29/rest-api-best-practices/ to understand the best practices for designing RESTful API's
Though I like Justin's response, I feel it more accurately represents a filter rather than a search. What if I want to know about cars with names that start with cam?
The way I see it, you could build it into the way you handle specific resources:
/cars/cam*
Or, you could simply add it into the filter:
/cars/doors/4/name/cam*/colors/red,blue,green
Personally, I prefer the latter, however I am by no means an expert on REST (having first heard of it only 2 or so weeks ago...)
My advice would be this:
/garages
Returns list of garages (think JSON array here)
/garages/yyy
Returns specific garage
/garage/yyy/cars
Returns list of cars in garage
/garages/cars
Returns list of all cars in all garages (may not be practical of course)
/cars
Returns list of all cars
/cars/xxx
Returns specific car
/cars/colors
Returns lists of all posible colors for cars
/cars/colors/red,blue,green
Returns list of cars of the specific colors (yes commas are allowed :) )
Edit:
/cars/colors/red,blue,green/doors/2
Returns list of all red,blue, and green cars with 2 doors.
/cars/type/hatchback,coupe/colors/red,blue,green/
Same idea as the above but a lil more intuitive.
/cars/colors/red,blue,green/doors/two-door,four-door
All cars that are red, blue, green and have either two or four doors.
Hopefully that gives you the idea. Essentially your Rest API should be easily discoverable and should enable you to browse through your data. Another advantage with using URLs and not query strings is that you are able to take advantage of the native caching mechanisms that exist on the web server for HTTP traffic.
Here's a link to a page describing the evils of query strings in REST: http://web.archive.org/web/20070815111413/http://rest.blueoxen.net/cgi-bin/wiki.pl?QueryStringsConsideredHarmful
I used Google's cache because the normal page wasn't working for me here's that link as well:
http://rest.blueoxen.net/cgi-bin/wiki.pl?QueryStringsConsideredHarmful