Is there any Open topic Classifier API for demo purposes - classification

Is there an Open Classifier/Categorizer API which returns categories/topics for a given entity string. For example for an input string like : "Barack Obama" it should return topics like : "Politics" similarly for "Albert Einsten" it should emit "Physicist, Nobel Laureate" etc.

try texlexan. it gave me some really good results but with slightly longer input texts.

Related

Kafka Connect - json path - regex condition not working

I have an application where I use a connector to save data to a database.
I want to filter the messages saved by removing the ones that have a certain property very long.
My messages are like this :
{
field_a : value,
field_b : value,
field_c : possible very long value
}
So, I used in the kafka connector the Confluent Filter like this :
transforms: filterSpam
transforms.filterSpam.type: io.confluent.connect.transforms.Filter$Value
transforms.filterSpam.filter.condition: $[?(#.field_c =~ /^.{32000,}$/)]
transforms.filterSpam.filter.type: exclude
transforms.filterSpam.missing.or.null.behavior: include
For some reason the filter is not working. All messages pass through.
I tried also with the negation :
$[?(!(#.field_c =~ /^.{1,32000}$/))]
In this case, the very long were filtered out, but also some of the shorter ones were.
I do not understand where the issue is coming from. Any help ?
Actually, I needed to update my regex knowledge.
The issue was related to the fact that the string field on which I tried to apply the regex sometimes was multiline.
Thanks to this I managed to use a proper validation of the size of this field.
The final solution is :
transforms.filterSpam.filter.condition: $[?(#.field_c =~ /(\s)^.{32000,}$/)]

how do duckduckgo spice IA secondary API calls get their parameters?

I have been looking through the spice instant answer source code. Yes, I know it is in maintenance mode, but I am still curious.
The documentation makes it fairly clear that the primary spice to API gets its numerical parameters $1, $2, etc. from the handle function.
My question: should there be secondary API calls included with spice alt_to as, say, in the movie spice IA, where do the numerical parameters to that API call come from?
Note, for instance, the $1 in both the movie_image and cast_image secondary API calls in spice alt_to at the preceding link. I am asking which regex capture returns those instances of $1.
I believe I see how this works now. The flow of information is still a bit murky to me, but at least I see how all of the requisite information is there.
I'll take the cryptocurrency instant answer as an example. The alt_to element in the perl package file at that link has a key named cryptonator. The corresponding .js file constructs a matching endpoint:
var endpoint = "/js/spice/cryptonator/" + from + "/" + to;
Note the general shape of the "remainder" past /js/spice/cryptonator: from/to, where from and to will be two strings.
Back in the perl package the hash alt_to->{cryptonator} has a key from which receives, I think, this remainder from/to. The value corresponding to that key is a regex meant to split up that string into its two constituents:
from => '([^/]+)/([^/]*)'
Applied to from/to, that regex will return $1=from and $2=to. These, then, are the $1 and $2 that go into
to => 'https://api.cryptonator.com/api/full/$1-$2'
in alt_to.
In short:
The to field of alt_to->{blah} receives its numerical parameters by having the from regex operate on the remainder past /js/spice/blah/ of the name of the corresponding endpoint constructed in the relevant .js file.

Autocomplete with Firebase

How does one use Firebase to do basic auto-completion/text preview?
For example, imagine a blog backed by Firebase where the blogger can tag posts with tags. As the blogger is tagging a new post, it would be helpful if they could see all currently-existing tags that matched the first few keystrokes they've entered. So if "blog," "black," "blazing saddles," and "bulldogs" were tags, if the user types "bl" they get the first three but not "bulldogs."
My initial thought was that we could set the tag with the priority of the tag, and use startAt, such that our query would look something like:
fb.child('tags').startAt('bl').limitToFirst(5).once('value', function(snap) {
console.log(snap.val())
});
But this would also return "bulldog" as one of the results (not the end of the world, but not the best either). Using startAt('bl').endAt('bl') returns no results. Is there another way to accomplish this?
(I know that one option is that this is something we could use a search server, like ElasticSearch, for -- see https://www.firebase.com/blog/2014-01-02-queries-part-two.html -- but I'd love to keep as much in Firebase as possible.)
Edit
As Kato suggested, here's a concrete example. We have 20,000 users, with their names stored as such:
/users/$userId/name
Oftentimes, users will be looking up another user by name. As a user is looking up their buddy, we'd like a drop-down to populate a list of users whose names start with the letters that the searcher has inputted. So if I typed in "Ja" I would expect to see "Jake Heller," "jake gyllenhaal," "Jack Donaghy," etc. in the drop-down.
I know this is an old topic, but it's still relevant. Based on Neil's answer above, you more easily search doing the following:
fb.child('tags').startAt(queryString).endAt(queryString + '\uf8ff').limit(5)
See Firebase Retrieving Data.
The \uf8ff character used in the query above is a very high code point
in the Unicode range. Because it is after most regular characters in
Unicode, the query matches all values that start with queryString.
As inspired by Kato's comments -- one way to approach this problem is to set the priority to the field you want to search on for your autocomplete and use startAt(), limit(), and client-side filtering to return only the results that you want. You'll want to make sure that the priority and the search term is lower-cased, since Firebase is case-sensitive.
This is a crude example to demonstrate this using the Users example I laid out in the question:
For a search for "ja", assuming all users have their priority set to the lowercased version of the user's name:
fb.child('users').
startAt('ja'). // The user-inputted search
limitToFirst(20).
once('value', function(snap) {
for(key in snap.val()){
if(snap.val()[key].indexOf('ja') === 0) {
console.log(snap.val()[key];
}
}
});
This should only return the names that actually begin with "ja" (even if Firebase actually returns names alphabetically after "ja").
I choose to use limitToFirst(20) to keep the response size small and because, realistically, you'll never need more than 20 for the autocomplete drop-down. There are probably better ways to do the filtering, but this should at least demonstrate the concept.
Hope this helps someone! And it's quite possible the Firebase guys have a better answer.
(Note that this is very limited -- if someone searches for the last name, it won't return what they're looking for. Hence the "best" answer is probably to use a search backend with something like Kato's Flashlight.)
It strikes me that there's a much simpler and more elegant way of achieving this than client side filtering or hacking Elastic.
By converting the search key into its' Unicode value and storing that as the priority, you can search by startAt() and endAt() by incrementing the value by one.
var start = "ABA";
var pad = "AAAAAAAAAA";
start += pad.substring(0, pad.length - start.length);
var blob = new Blob([start]);
var reader = new FileReader();
reader.onload = function(e) {
var typedArray = new Uint8Array(e.target.result);
var array = Array.prototype.slice.call(typedArray);
var priority = parseInt(array.join(""));
console.log("Priority of", start, "is:", priority);
}
reader.readAsArrayBuffer(blob);
You can then limit your search priority to the key "ABB" by incrementing the last charCode by one and doing the same conversion:
var limit = String.fromCharCode(start.charCodeAt(start.length -1) +1);
limit = start.substring(0, start.length -1) +limit;
"ABA..." to "ABB..." ends up with priorities of:
Start: 65666565656565650000
End: 65666665656565650000
Simples!
Based on Jake and Matt's answer, updated version for sdk 3.1. '.limit' no longer works:
firebaseDb.ref('users')
.orderByChild('name')
.startAt(query)
.endAt(`${query}\uf8ff`)
.limitToFirst(5)
.on('child_added', (child) => {
console.log(
{
id: child.key,
name: child.val().name
}
)
})

What version of GWT-RPC is this request?

I have this kind of request which seems to be GWT-RPC format :
R7~"61~com.foo.Service~"14~MethodA~D2~"3~F7e~"3~B0e~Ecom.foo.data.BeanType~I116~Lcom.foo.Parameter~I5~"1~b~Z0~"1~n~V~"1~o~Z1~"1~p~Z1~"1~q~V~'
But it is not in line with protocol described here:
https://docs.google.com/document/d/1eG0YocsYYbNAtivkLtcaiEE5IOF5u4LUol8-LL0TIKU/edit
What exactly is this protocol ? Is it really GWT-RPC or something else (deRPC?) ?
Looking into gwt-2.5.1 source code, I notice it seems that following packages could be generating this kind of format:
com.google.gwt.rpc.client
com.google.gwt.rpc.server
Is this deRPC ?
Based on a quick glance through the deRPC classes in the packages you listed, it does indeed appear to be deRPC. Note that deRPC has always been marked as experimental, and is now deprecated, and that either RPC or RequestFactory should be used instead.
Details that seem to confirm this:
com.google.gwt.rpc.client.impl.SimplePayloadSink#RPC_SEPARATOR_CHAR is a constant equal to the ~ character, which appears to be a separator between different tokens in the sample string you provided.
Both com.google.gwt.rpc.client.impl.SimplePayloadSink andcom.google.gwt.rpc.server.SimplePayloadDecoder` have many comments that appear to depict the same basic format that you are seeing in there:
// "4~abcd in endVisit(StringValueCommand x, Context ctx) closely matches several tokens in the sample string - a quote indicating a string, an int describing the length, a ~ separator, then the string itself (this doesnt match everything, I suspect because you removed details about the service and the name of the method):
"3~F7e
"3~B0e
"1~b
Booleans all follow Z1 or Z0, as in your sample string

RESTful URL design for search

I'm looking for a reasonable way to represent searches as a RESTful URLs.
The setup: I have two models, Cars and Garages, where Cars can be in Garages. So my urls look like:
/car/xxxx
xxx == car id
returns car with given id
/garage/yyy
yyy = garage id
returns garage with given id
A Car can exist on its own (hence the /car), or it can exist in a garage. What's the right way to represent, say, all the cars in a given garage? Something like:
/garage/yyy/cars ?
How about the union of cars in garage yyy and zzz?
What's the right way to represent a search for cars with certain attributes? Say: show me all blue sedans with 4 doors :
/car/search?color=blue&type=sedan&doors=4
or should it be /cars instead?
The use of "search" seems inappropriate there - what's a better way / term? Should it just be:
/cars/?color=blue&type=sedan&doors=4
Should the search parameters be part of the PATHINFO or QUERYSTRING?
In short, I'm looking for guidance for cross-model REST url design, and for search.
[Update] I like Justin's answer, but he doesn't cover the multi-field search case:
/cars/color:blue/type:sedan/doors:4
or something like that. How do we go from
/cars/color/blue
to the multiple field case?
For the searching, use querystrings. This is perfectly RESTful:
/cars?color=blue&type=sedan&doors=4
An advantage to regular querystrings is that they are standard and widely understood and that they can be generated from form-get.
The RESTful pretty URL design is about displaying a resource based on a structure (directory-like structure, date: articles/2005/5/13, object and it's attributes,..), the slash / indicates hierarchical structure, use the -id instead.
Hierarchical structure
I would personaly prefer:
/garage-id/cars/car-id
/cars/car-id #for cars not in garages
If a user removes the /car-id part, it brings the cars preview - intuitive. User exactly knows where in the tree he is, what is he looking at. He knows from the first look, that garages and cars are in relation. /car-id also denotes that it belongs together unlike /car/id.
Searching
The searchquery is OK as it is, there is only your preference, what should be taken into account. The funny part comes when joining searches (see below).
/cars?color=blue;type=sedan #most prefered by me
/cars;color-blue+doors-4+type-sedan #looks good when using car-id
/cars?color=blue&doors=4&type=sedan #also possible, but & blends in with text
Or basically anything what isn't a slash as explained above.
The formula: /cars[?;]color[=-:]blue[,;+&], though I wouldn't use the & sign as it is unrecognizable from the text at first glance if that's your thing.
** Did you know that passing JSON object in URI is RESTful? **
Lists of options
/cars?color=black,blue,red;doors=3,5;type=sedan #most prefered by me
/cars?color:black:blue:red;doors:3:5;type:sedan
/cars?color(black,blue,red);doors(3,5);type(sedan) #does not look bad at all
/cars?color:(black,blue,red);doors:(3,5);type:sedan #little difference
possible features?
Negate search strings (!)
To search any cars, but not black and red:
?color=!black,!red
color:(!black,!red)
Joined searches
Search red or blue or black cars with 3 doors in garages id 1..20 or 101..103 or 999 but not 5
/garage[id=1-20,101-103,999,!5]/cars[color=red,blue,black;doors=3]
You can then construct more complex search queries. (Look at CSS3 attribute matching for the idea of matching substrings. E.g. searching users containing "bar" user*=bar.)
Conclusion
Anyway, this might be the most important part for you, because you can do it however you like after all, just keep in mind that RESTful URI represents a structure which is easily understood e.g. directory-like /directory/file, /collection/node/item, dates /articles/{year}/{month}/{day}.. And when you omit any of last segments, you immediately know what you get.
So.., all these characters are allowed unencoded:
unreserved: a-zA-Z0-9_.-~
Typically allowed both encoded and not, both uses are then equivalent.
special characters: $-_.+!*'(),
reserved: ;/?:#=&
May be used unencoded for the purpose they represent, otherwise they must be encoded.
unsafe: <>"#%{}|^~[]`
Why unsafe and why should rather be encoded: RFC 1738 see 2.2
Also see RFC 1738#page-20 for more character classes.
RFC 3986 see 2.2
Despite of what I previously said, here is a common distinction of delimeters, meaning that some "are" more important than others.
generic delimeters: :/?#[]#
sub-delimeters: !$&'()*+,;=
More reading:
Hierarchy: see 2.3, see 1.2.3
url path parameter syntax
CSS3 attribute matching
IBM: RESTful Web services - The basics
Note: RFC 1738 was updated by RFC 3986
Although having the parameters in the path has some advantages, there are, IMO, some outweighing factors.
Not all characters needed for a search query are permitted in a URL. Most punctuation and Unicode characters would need to be URL encoded as a query string parameter. I'm wrestling with the same problem. I would like to use XPath in the URL, but not all XPath syntax is compatible with a URI path. So for simple paths, /cars/doors/driver/lock/combination would be appropriate to locate the 'combination' element in the driver's door XML document. But /car/doors[id='driver' and lock/combination='1234'] is not so friendly.
There is a difference between filtering a resource based on one of its attributes and specifying a resource.
For example, since
/cars/colors returns a list of all colors for all cars (the resource returned is a collection of color objects)
/cars/colors/red,blue,green would return a list of color objects that are red, blue or green, not a collection of cars.
To return cars, the path would be
/cars?color=red,blue,green or /cars/search?color=red,blue,green
Parameters in the path are more difficult to read because name/value pairs are not isolated from the rest of the path, which is not name/value pairs.
One last comment. I prefer /garages/yyy/cars (always plural) to /garage/yyy/cars (perhaps it was a typo in the original answer) because it avoids changing the path between singular and plural. For words with an added 's', the change is not so bad, but changing /person/yyy/friends to /people/yyy seems cumbersome.
To expand on Peter's answer - you could make Search a first-class resource:
POST /searches # create a new search
GET /searches # list all searches (admin)
GET /searches/{id} # show the results of a previously-run search
DELETE /searches/{id} # delete a search (admin)
The Search resource would have fields for color, make model, garaged status, etc and could be specified in XML, JSON, or any other format. Like the Car and Garage resource, you could restrict access to Searches based on authentication. Users who frequently run the same Searches can store them in their profiles so that they don't need to be re-created. The URLs will be short enough that in many cases they can be easily traded via email. These stored Searches can be the basis of custom RSS feeds, and so on.
There are many possibilities for using Searches when you think of them as resources.
The idea is explained in more detail in this Railscast.
Justin's answer is probably the way to go, although in some applications it might make sense to consider a particular search as a resource in its own right, such as if you want to support named saved searches:
/search/{searchQuery}
or
/search/{savedSearchName}
I use two approaches to implement searches.
1) Simplest case, to query associated elements, and for navigation.
/cars?q.garage.id.eq=1
This means, query cars that have garage ID equal to 1.
It is also possible to create more complex searches:
/cars?q.garage.street.eq=FirstStreet&q.color.ne=red&offset=300&max=100
Cars in all garages in FirstStreet that are not red (3rd page, 100 elements per page).
2) Complex queries are considered as regular resources that are created and can be recovered.
POST /searches => Create
GET /searches/1 => Recover search
GET /searches/1?offset=300&max=100 => pagination in search
The POST body for search creation is as follows:
{
"$class":"test.Car",
"$q":{
"$eq" : { "color" : "red" },
"garage" : {
"$ne" : { "street" : "FirstStreet" }
}
}
}
It is based in Grails (criteria DSL): http://grails.org/doc/2.4.3/ref/Domain%20Classes/createCriteria.html
This is not REST. You cannot define URIs for resources inside your API. Resource navigation must be hypertext-driven. It's fine if you want pretty URIs and heavy amounts of coupling, but just do not call it REST, because it directly violates the constraints of RESTful architecture.
See this article by the inventor of REST.
In addition i would also suggest:
/cars/search/all{?color,model,year}
/cars/search/by-parameters{?color,model,year}
/cars/search/by-vendor{?vendor}
Here, Search is considered as a child resource of Cars resource.
There are a lot of good options for your case here. Still you should considering using the POST body.
The query string is perfect for your example, but if you have something more complicated, e.g. an arbitrary long list of items or boolean conditionals, you might want to define the post as a document, that the client sends over POST.
This allows a more flexible description of the search, as well as avoids the Server URL length limit.
RESTful does not recommend using verbs in URL's /cars/search is not restful. The right way to filter/search/paginate your API's is through Query Parameters. However there might be cases when you have to break the norm. For example, if you are searching across multiple resources, then you have to use something like /search?q=query
You can go through http://saipraveenblog.wordpress.com/2014/09/29/rest-api-best-practices/ to understand the best practices for designing RESTful API's
Though I like Justin's response, I feel it more accurately represents a filter rather than a search. What if I want to know about cars with names that start with cam?
The way I see it, you could build it into the way you handle specific resources:
/cars/cam*
Or, you could simply add it into the filter:
/cars/doors/4/name/cam*/colors/red,blue,green
Personally, I prefer the latter, however I am by no means an expert on REST (having first heard of it only 2 or so weeks ago...)
My advice would be this:
/garages
Returns list of garages (think JSON array here)
/garages/yyy
Returns specific garage
/garage/yyy/cars
Returns list of cars in garage
/garages/cars
Returns list of all cars in all garages (may not be practical of course)
/cars
Returns list of all cars
/cars/xxx
Returns specific car
/cars/colors
Returns lists of all posible colors for cars
/cars/colors/red,blue,green
Returns list of cars of the specific colors (yes commas are allowed :) )
Edit:
/cars/colors/red,blue,green/doors/2
Returns list of all red,blue, and green cars with 2 doors.
/cars/type/hatchback,coupe/colors/red,blue,green/
Same idea as the above but a lil more intuitive.
/cars/colors/red,blue,green/doors/two-door,four-door
All cars that are red, blue, green and have either two or four doors.
Hopefully that gives you the idea. Essentially your Rest API should be easily discoverable and should enable you to browse through your data. Another advantage with using URLs and not query strings is that you are able to take advantage of the native caching mechanisms that exist on the web server for HTTP traffic.
Here's a link to a page describing the evils of query strings in REST: http://web.archive.org/web/20070815111413/http://rest.blueoxen.net/cgi-bin/wiki.pl?QueryStringsConsideredHarmful
I used Google's cache because the normal page wasn't working for me here's that link as well:
http://rest.blueoxen.net/cgi-bin/wiki.pl?QueryStringsConsideredHarmful