Ranking rules for camelCase attributes - algolia

I'm building an Algolia index to search through user-created communities on my site.
Just like for subreddits, the name of the communities can't contain spaces and are therefore often written by users in camelCase.
Here is an example of an object in my index:
{
"name": "headphoneAdvice",
"description": "This community is dedicated to enthusiasts and newcomers. We are all about making the right decision when purchasing new headphones."
}
Both name and description are set to be searchable attributes and i'm currently using these ranking rules :
["typo","geo","words","filters","proximity","attribute","exact","custom"]
However, this does not seem to work well with the camelCase name. For example, if I type "advice" in the search, the object above with "name": "headphonesAdvice" isn't found.
I'm guessing this is because words in camelCase are considered single words and thus do not match.
I've looked online for rules that allow indexing of camelCase attributes but couldn't find anything really.
Any ideas?
Cheers!

After asking around, found that someone at algolia thought of this and added https://www.algolia.com/doc/api-reference/api-parameters/camelCaseAttributes/ kudos!

Related

RESTapi nesting endpoint

Ok, new to RESTapi so not sure if I am using the correct terminology for what I want to ask so bear with me. I believe what I am asking about is nested resources in a service but I want to ask specifically about using it for separating a blob of "closely related" content. It may be easier to provide an example. Let's say I have the following service that could output the following:
/Policy
"data": [ {
"name": "PolicyName1",
"description": "",
"size": 25000,
.... (bunch of other fields)
"specialEnablement": true,
“specialEnablementOptions”: { <-- options below valid only if specialEnablement is true
“optionType”: “TypeII”,
“optionFlagA”: false,
“optionFlagB”: true,
“optionFlagC”: false,
...(bunch of other options here)
}
},
{ . . . }],
The specialEnablementOptions are only used if specialEnablement is 'true'. It is all part of this /Policy service so has no primary key other than the policy "name" (and doesnt make sense to have to generate one) so does not fall under some of the other questions I have been reading about nested resources.
It does make it more readable to separate this set of information since there are 12 or so options but, this is REST so, maybe human readability does not weigh heavily here.
I am being told that, if we do it this way, it makes it more complex to work with during POST/PUT/PATCH commands. Specifically, it is being said in my group that if we do this, we should require two calls....one that creates the policy main information then the user must call a second time to PATCH the specialEnablementOptions (assuming specialEnablement is true). This seems kludgy to me.
I am looking for expert advise on what the best practice is.
My questions:
Does having the specialEnablementOptions nested in this way cause a
lot of complexity. Seems to me that either way we have to verify
that the settings are valid?
Does having the specialEnablementOptions nested in this way require
two calls? In other words, can a user not do a POST/PATCH/PUT for
all the fields including those in the specialEnablementOptions in
one call? We are planning to provide a way for the user to do a
PATCH of just the specialEnablementOptions options without changing
any of the first level for ease of use but is there something that
prevents them from creating or modifying all settings in one call?
Another option is to just get rid of the nested
specialEnablementOptions and put everything at the same level. I
dont have a problem with this but wasn't sure if this was just being
lazy. I dont mind doing this if the consensus is it is the best way
to do it....but I also have a second example that is similar to this
scenario but is a bit more complex where putting everything under the parent level is not really optimal (I will show in the next example)
So, my second example is as follows:
/anotherPolicy
"data": [ {
"name": "APolicyName1",
"description": "",
"count": 123,
"lastModified": "2022-05-17-20.37.27.000000",
[{
"ownerId": 1
"ownerCount": 1818181
"specialFlags": 'ABA'
},
{ . . . }]
},
{ . . . }],
The above 'count' is the total number associated to that policy and then there is a nested resource by owner where the count by owner can be seen..plus maybe other information specific to that owner. The SUM(ownerCount) would equal "count" above it. Does this scenario change any of the answers to the questions above?
I appreciate your help. I found a ton of information and reference on when to use or not use nested endpoints but all the examples seem to orient around subjects that seem like they could easily be separated into two resource...for instance whether to nest /employees under /departments or /comments under /posts. Also, they didn't deal with the complexities of having nested endpoints vs avoiding them. And last, if using nesting is unnecessary as a readability standpoint.

How do i stop my Watson Assistant from auto-correcting the user input

Can anyone help me solve this issue
The screenshot attached below is self explanatory
It is auto-correcting sufyan to Susan
The value for the context variable is
"<? input.text.substring(0, 1).toUpperCase() + input.text.substring(1) ?>"
The motive here is to simply convert lowercase name sufyan to Sufyan
or for that case any Indian name.
But the auto-correct has now become a hindrance.
I want the assistant to interact with the user in the later part using his/her name.
You can configure autocorrection in your bot settings.
Along with Henrik’s answer, it’s good to learn about fuzzy matching in Watson Assistant as it runs before autocorrection
How is spelling autocorrection related to fuzzy matching?
Fuzzy matching helps your assistant recognize dictionary-based entity mentions in user input. It uses a dictionary lookup approach to match a word from the user input to an existing entity value or synonym in the skill's training data. For example, if the user enters boook, and your training data contains a #reading_material entity with a book value, then fuzzy matching recognizes that the two terms (boook and book) mean the same thing.
When you enable both autocorrection and fuzzy matching, the fuzzy matching function runs before autocorrection is triggered. If it finds a term that it can match to an existing dictionary entity value or synonym, it adds the term to the list of words that belong to the skill, and does not correct it.
Check the complete documentation here before turning of autocorrection
You can use the following context variable:
"<? input.original_text.substring(0,1).toUpperCase() + input.original_text.substring(1) ?>"

Google DLP - Displaying the Region using InfoTypes.list()

After integrating the Google DLP API, the ListInfoTypes() currently returns the name, description, supported types of the infotypes present in the infotypes reference. Is it possible to also obtain the region for the infotypes like "Australia" or "Argentina" as a seperate field?
Currently this is my output:
"name": "AUSTRALIA_MEDICARE_NUMBER",
"displayName": "Australia medicare number",
"supportedBy": [
"INSPECT"
],
"description": "A 9-digit Australian Medicare account
I need the Region as well for example Region: "Australia" for every other infotypes.
I also got around to see locations.infoTypes.list() but I'm not sure which location I should enter in the filter to get any value.
Looking at the REST API there doesn't appear to be identifying data that can be formally used to determine the region. If we look at the InfoTypeDescription JSON structure found here:
https://cloud.google.com/dlp/docs/reference/rest/v2/ListInfoTypesResponse#InfoTypeDescription
we see that "name" is described as an "internal name of the InfoType". I wondered if we could depend on a structure of the string ... perhaps (.)*_.* as a regular expression grouping. While this might work, it shouldn't be relied upon without investigation of more samples and the docs don't describe the structure.
If you really need a solution, my recommendation would be to dump ALL the InfoTypes and then manually group the "name" fields into the regions of interest to you. You could then store this as CSV or JSON and have a reference piece of data that you could use in your app and regenerate as needed.
It's a great feature request I'll forward to the team. In the short term you can hack the name as ones that are regional will say they are in their name.

Can you list multiple features within the same Schema.org "LocationFeatureSpecification"?

I am working on Schema.org Resort schema for a ton of resorts on a travel website and am trying to find the most efficient ways of filling out the schema with regards to amenities.
The current code looks something like this:
"amenityFeature": [
{
"#type":"http://schema.org/LocationFeatureSpecification",
"name":"Spa",
"value":"true"
},
{
"#type":"http://schema.org/LocationFeatureSpecification",
"name":"Internet Access",
"value":"true"
},
{
"#type":"http://schema.org/LocationFeatureSpecification",
"name":"Tennis Courts",
"value":"true"
}
]
My question is, can I write it like this instead to shorten lines of code:
{
"#type":"http://schema.org/LocationFeatureSpecification",
"name":[
"Spa", "Internet Access", "Tennis Courts"
],
"value":"true"
}
When I test it in Google’s Structured Data Testing Tool, it doesn’t give any errors. Here is what it looks like in the SDTT when I write it the short way:
And here is what it looks like if I do it the first/long way:
If I do it the short way, I want to make sure all those items are getting listed as amenities and not just different names for the same amenity. Otherwise, I'll go the long route.
No, each LocationFeatureSpecification represents one feature:
Specifies a location feature by providing a structured value representing a feature of an accommodation as a property-value pair of varying degrees of formality.
Your second snippet would represent one feature with multiple names.

RESTful URL design for search

I'm looking for a reasonable way to represent searches as a RESTful URLs.
The setup: I have two models, Cars and Garages, where Cars can be in Garages. So my urls look like:
/car/xxxx
xxx == car id
returns car with given id
/garage/yyy
yyy = garage id
returns garage with given id
A Car can exist on its own (hence the /car), or it can exist in a garage. What's the right way to represent, say, all the cars in a given garage? Something like:
/garage/yyy/cars ?
How about the union of cars in garage yyy and zzz?
What's the right way to represent a search for cars with certain attributes? Say: show me all blue sedans with 4 doors :
/car/search?color=blue&type=sedan&doors=4
or should it be /cars instead?
The use of "search" seems inappropriate there - what's a better way / term? Should it just be:
/cars/?color=blue&type=sedan&doors=4
Should the search parameters be part of the PATHINFO or QUERYSTRING?
In short, I'm looking for guidance for cross-model REST url design, and for search.
[Update] I like Justin's answer, but he doesn't cover the multi-field search case:
/cars/color:blue/type:sedan/doors:4
or something like that. How do we go from
/cars/color/blue
to the multiple field case?
For the searching, use querystrings. This is perfectly RESTful:
/cars?color=blue&type=sedan&doors=4
An advantage to regular querystrings is that they are standard and widely understood and that they can be generated from form-get.
The RESTful pretty URL design is about displaying a resource based on a structure (directory-like structure, date: articles/2005/5/13, object and it's attributes,..), the slash / indicates hierarchical structure, use the -id instead.
Hierarchical structure
I would personaly prefer:
/garage-id/cars/car-id
/cars/car-id #for cars not in garages
If a user removes the /car-id part, it brings the cars preview - intuitive. User exactly knows where in the tree he is, what is he looking at. He knows from the first look, that garages and cars are in relation. /car-id also denotes that it belongs together unlike /car/id.
Searching
The searchquery is OK as it is, there is only your preference, what should be taken into account. The funny part comes when joining searches (see below).
/cars?color=blue;type=sedan #most prefered by me
/cars;color-blue+doors-4+type-sedan #looks good when using car-id
/cars?color=blue&doors=4&type=sedan #also possible, but & blends in with text
Or basically anything what isn't a slash as explained above.
The formula: /cars[?;]color[=-:]blue[,;+&], though I wouldn't use the & sign as it is unrecognizable from the text at first glance if that's your thing.
** Did you know that passing JSON object in URI is RESTful? **
Lists of options
/cars?color=black,blue,red;doors=3,5;type=sedan #most prefered by me
/cars?color:black:blue:red;doors:3:5;type:sedan
/cars?color(black,blue,red);doors(3,5);type(sedan) #does not look bad at all
/cars?color:(black,blue,red);doors:(3,5);type:sedan #little difference
possible features?
Negate search strings (!)
To search any cars, but not black and red:
?color=!black,!red
color:(!black,!red)
Joined searches
Search red or blue or black cars with 3 doors in garages id 1..20 or 101..103 or 999 but not 5
/garage[id=1-20,101-103,999,!5]/cars[color=red,blue,black;doors=3]
You can then construct more complex search queries. (Look at CSS3 attribute matching for the idea of matching substrings. E.g. searching users containing "bar" user*=bar.)
Conclusion
Anyway, this might be the most important part for you, because you can do it however you like after all, just keep in mind that RESTful URI represents a structure which is easily understood e.g. directory-like /directory/file, /collection/node/item, dates /articles/{year}/{month}/{day}.. And when you omit any of last segments, you immediately know what you get.
So.., all these characters are allowed unencoded:
unreserved: a-zA-Z0-9_.-~
Typically allowed both encoded and not, both uses are then equivalent.
special characters: $-_.+!*'(),
reserved: ;/?:#=&
May be used unencoded for the purpose they represent, otherwise they must be encoded.
unsafe: <>"#%{}|^~[]`
Why unsafe and why should rather be encoded: RFC 1738 see 2.2
Also see RFC 1738#page-20 for more character classes.
RFC 3986 see 2.2
Despite of what I previously said, here is a common distinction of delimeters, meaning that some "are" more important than others.
generic delimeters: :/?#[]#
sub-delimeters: !$&'()*+,;=
More reading:
Hierarchy: see 2.3, see 1.2.3
url path parameter syntax
CSS3 attribute matching
IBM: RESTful Web services - The basics
Note: RFC 1738 was updated by RFC 3986
Although having the parameters in the path has some advantages, there are, IMO, some outweighing factors.
Not all characters needed for a search query are permitted in a URL. Most punctuation and Unicode characters would need to be URL encoded as a query string parameter. I'm wrestling with the same problem. I would like to use XPath in the URL, but not all XPath syntax is compatible with a URI path. So for simple paths, /cars/doors/driver/lock/combination would be appropriate to locate the 'combination' element in the driver's door XML document. But /car/doors[id='driver' and lock/combination='1234'] is not so friendly.
There is a difference between filtering a resource based on one of its attributes and specifying a resource.
For example, since
/cars/colors returns a list of all colors for all cars (the resource returned is a collection of color objects)
/cars/colors/red,blue,green would return a list of color objects that are red, blue or green, not a collection of cars.
To return cars, the path would be
/cars?color=red,blue,green or /cars/search?color=red,blue,green
Parameters in the path are more difficult to read because name/value pairs are not isolated from the rest of the path, which is not name/value pairs.
One last comment. I prefer /garages/yyy/cars (always plural) to /garage/yyy/cars (perhaps it was a typo in the original answer) because it avoids changing the path between singular and plural. For words with an added 's', the change is not so bad, but changing /person/yyy/friends to /people/yyy seems cumbersome.
To expand on Peter's answer - you could make Search a first-class resource:
POST /searches # create a new search
GET /searches # list all searches (admin)
GET /searches/{id} # show the results of a previously-run search
DELETE /searches/{id} # delete a search (admin)
The Search resource would have fields for color, make model, garaged status, etc and could be specified in XML, JSON, or any other format. Like the Car and Garage resource, you could restrict access to Searches based on authentication. Users who frequently run the same Searches can store them in their profiles so that they don't need to be re-created. The URLs will be short enough that in many cases they can be easily traded via email. These stored Searches can be the basis of custom RSS feeds, and so on.
There are many possibilities for using Searches when you think of them as resources.
The idea is explained in more detail in this Railscast.
Justin's answer is probably the way to go, although in some applications it might make sense to consider a particular search as a resource in its own right, such as if you want to support named saved searches:
/search/{searchQuery}
or
/search/{savedSearchName}
I use two approaches to implement searches.
1) Simplest case, to query associated elements, and for navigation.
/cars?q.garage.id.eq=1
This means, query cars that have garage ID equal to 1.
It is also possible to create more complex searches:
/cars?q.garage.street.eq=FirstStreet&q.color.ne=red&offset=300&max=100
Cars in all garages in FirstStreet that are not red (3rd page, 100 elements per page).
2) Complex queries are considered as regular resources that are created and can be recovered.
POST /searches => Create
GET /searches/1 => Recover search
GET /searches/1?offset=300&max=100 => pagination in search
The POST body for search creation is as follows:
{
"$class":"test.Car",
"$q":{
"$eq" : { "color" : "red" },
"garage" : {
"$ne" : { "street" : "FirstStreet" }
}
}
}
It is based in Grails (criteria DSL): http://grails.org/doc/2.4.3/ref/Domain%20Classes/createCriteria.html
This is not REST. You cannot define URIs for resources inside your API. Resource navigation must be hypertext-driven. It's fine if you want pretty URIs and heavy amounts of coupling, but just do not call it REST, because it directly violates the constraints of RESTful architecture.
See this article by the inventor of REST.
In addition i would also suggest:
/cars/search/all{?color,model,year}
/cars/search/by-parameters{?color,model,year}
/cars/search/by-vendor{?vendor}
Here, Search is considered as a child resource of Cars resource.
There are a lot of good options for your case here. Still you should considering using the POST body.
The query string is perfect for your example, but if you have something more complicated, e.g. an arbitrary long list of items or boolean conditionals, you might want to define the post as a document, that the client sends over POST.
This allows a more flexible description of the search, as well as avoids the Server URL length limit.
RESTful does not recommend using verbs in URL's /cars/search is not restful. The right way to filter/search/paginate your API's is through Query Parameters. However there might be cases when you have to break the norm. For example, if you are searching across multiple resources, then you have to use something like /search?q=query
You can go through http://saipraveenblog.wordpress.com/2014/09/29/rest-api-best-practices/ to understand the best practices for designing RESTful API's
Though I like Justin's response, I feel it more accurately represents a filter rather than a search. What if I want to know about cars with names that start with cam?
The way I see it, you could build it into the way you handle specific resources:
/cars/cam*
Or, you could simply add it into the filter:
/cars/doors/4/name/cam*/colors/red,blue,green
Personally, I prefer the latter, however I am by no means an expert on REST (having first heard of it only 2 or so weeks ago...)
My advice would be this:
/garages
Returns list of garages (think JSON array here)
/garages/yyy
Returns specific garage
/garage/yyy/cars
Returns list of cars in garage
/garages/cars
Returns list of all cars in all garages (may not be practical of course)
/cars
Returns list of all cars
/cars/xxx
Returns specific car
/cars/colors
Returns lists of all posible colors for cars
/cars/colors/red,blue,green
Returns list of cars of the specific colors (yes commas are allowed :) )
Edit:
/cars/colors/red,blue,green/doors/2
Returns list of all red,blue, and green cars with 2 doors.
/cars/type/hatchback,coupe/colors/red,blue,green/
Same idea as the above but a lil more intuitive.
/cars/colors/red,blue,green/doors/two-door,four-door
All cars that are red, blue, green and have either two or four doors.
Hopefully that gives you the idea. Essentially your Rest API should be easily discoverable and should enable you to browse through your data. Another advantage with using URLs and not query strings is that you are able to take advantage of the native caching mechanisms that exist on the web server for HTTP traffic.
Here's a link to a page describing the evils of query strings in REST: http://web.archive.org/web/20070815111413/http://rest.blueoxen.net/cgi-bin/wiki.pl?QueryStringsConsideredHarmful
I used Google's cache because the normal page wasn't working for me here's that link as well:
http://rest.blueoxen.net/cgi-bin/wiki.pl?QueryStringsConsideredHarmful