Conditional text replacement - text-processing

I'm pre-processing some ecommerce product titles, such as:
Input:
1. Jersey Shore: Family Vac Season 2
2. Robotic Vac Cleaner with Max Power Suction
Notice that both titles contain the word "Vac". I would like to correct the second one, replacing it with "Vacuum".
Desired output:
1. Jersey Shore: Family Vac Season 2
2. Robotic Vaccum Cleaner with Max Power Suction
I could write an algorithm myself (for instance, checking whether the string contains "clean" or "suction"), but first I would like to know whether there are any frameworks or libraries that already do this kind of task. It seems to be a common problem. Any language is fine (Java, Python, C, etc.).
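For comparison, the keyword heuristic mentioned in the question takes only a few lines. This is a minimal sketch in JavaScript (the language used in the answer below); the keyword list is taken from the question's own example ("clean", "suction") and is an assumption, not a complete rule set.

```javascript
// Keyword heuristic from the question: only expand "Vac" to "Vacuum"
// when the title also mentions cleaning-related words.
// The keyword list is an illustrative assumption; extend it for real data.
const CONTEXT_KEYWORDS = ['clean', 'suction'];

function expandVac(title) {
  const lower = title.toLowerCase();
  const looksLikeVacuum = CONTEXT_KEYWORDS.some(word => lower.includes(word));
  if (!looksLikeVacuum) {
    return title; // e.g. "Family Vac" (vacation) is left alone
  }
  // \b restricts the match to the standalone word "Vac" (not "Vacation")
  return title.replace(/\bVac\b/g, 'Vacuum');
}

const titles = [
  'Jersey Shore: Family Vac Season 2',
  'Robotic Vac Cleaner with Max Power Suction',
];
console.log(titles.map(expandVac));
```

A hand-written list like this is easy to reason about but needs maintenance; the library-based approach below trades that for fuzzy matching.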

I assume you are getting those titles from an API, or are they hardcoded in the site?
If you have them in JSON format, or even in something simpler such as an array of objects:
var products = [
  { title: 'Jersey Shore: Family Vac Season 2' },
  { title: 'Robotic Vac Cleaner with Max Power Suction' },
];
There is a very useful JavaScript library for this, Fuse.js:
https://fusejs.io/
With it you can search and even specify useful options such as threshold, which controls whether you require a perfect match or also accept similar words.
Visit the site; the documentation is great.
After that, you can use JavaScript's String.prototype.replace to substitute the word you want, in this case "Vacuum":
https://developer.mozilla.org/es/docs/Web/JavaScript/Referencia/Objetos_globales/String/replace
This gives you the final object or array to place on your e-commerce site.
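Fuse.js handles the fuzzy-matching step for you. To illustrate what a match threshold means without pulling in the library, here is a dependency-free stand-in that scores words by normalized Levenshtein distance and then calls String.prototype.replace. This is a swapped-in technique, not Fuse.js's own algorithm; the target word "cleaner", the threshold 0.3 and the helper names are illustrative assumptions.

```javascript
// Dependency-free stand-in for the fuzzy search step. This is NOT how
// Fuse.js works internally; it only illustrates the idea of a threshold.

// Classic Levenshtein edit distance between two strings.
function editDistance(a, b) {
  const dp = Array.from({ length: a.length + 1 }, (_, i) => [i]);
  for (let j = 1; j <= b.length; j++) dp[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,        // deletion
        dp[i][j - 1] + 1,        // insertion
        dp[i - 1][j - 1] + cost  // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

// Score in [0, 1]: 0 = identical, 1 = completely different.
function fuzzyScore(word, target) {
  const maxLen = Math.max(word.length, target.length);
  return maxLen === 0 ? 0 : editDistance(word, target) / maxLen;
}

// True if any word of the title is within `threshold` of the target word.
function fuzzyContains(title, target, threshold) {
  return title
    .toLowerCase()
    .split(/\W+/)
    .some(word => word.length > 0 && fuzzyScore(word, target) <= threshold);
}

const products = [
  { title: 'Jersey Shore: Family Vac Season 2' },
  { title: 'Robotic Vac Cleaner with Max Power Suction' },
];

// Replace "Vac" only in titles that fuzzily mention "cleaner".
const fixed = products.map(p =>
  fuzzyContains(p.title, 'cleaner', 0.3)
    ? { ...p, title: p.title.replace(/\bVac\b/g, 'Vacuum') }
    : p
);
console.log(fixed);
```

A misspelled title such as "Robotic Vac Claener" still matches at this threshold. In Fuse.js the threshold option plays a similar role (0.0 requires a perfect match, 1.0 matches anything), but its scoring differs, so tune it against your real titles.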

Related

In a quiz, how can I let the user say one of the three answer words instead of "A, B or C"?

Via the Actions Console, not Dialogflow!
After several days I finally finished creating a quiz that works like this:
Google Mini says: "What is the capital of France? A) Rome, B) Berlin or C) Paris ?"
In my scene I have two conditions:
scene.slots.status == "FINAL" && intent.params.choosenABC.original == session.params.antwort
AND
!(scene.slots.status == "FINAL" && intent.params.choosenABC.original == session.params.antwort)
These conditions check whether the user says the correct letter, which comes from the session parameter "antwort".
Everything works smoothly as long as the user says "A", "B" or "C".
But how can I write a condition that compares against what the user actually says?
In the example above, I want the user to be able to say "Rome", "Berlin" or "Paris" and have the condition check these entries.
Thanks in advance!
You have a number of questions packed in there, so let's look at each.
Does input.params.original exist?
In short, yes. If you look at the documentation for the request's Intent object, you'll see that there is intent.params.*name*.original. Your question seems to suggest this would work as well.
There is also intent.params.*name*.resolved which contains the value after you take type aliases into account.
I found some variables on a Dialogflow forum...
Those only work if you're using Dialogflow and don't make any sense when you're looking at Action Builder.
How to match
You don't show the possible values of session.params.antwort or how you're setting antwort, but it sounds like you're setting it in a handler. So one thing you could do is set antwort to the city name (or whatever the full-word answer is) and set another parameter, letter, to the letter of the valid reply. Then test both against original to see if there is a match.
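In a scene this would be two comparisons joined with ||; the sketch below shows the same check in plain JavaScript, as it might look inside a webhook handler. The parameter names antwort and letter, and the helper isCorrectAnswer, are assumptions, since the question does not show how they are stored.

```javascript
// Hypothetical check: `antwort` holds the full answer (e.g. "Paris") and
// `letter` holds its letter (e.g. "C"). The comparison ignores case and
// surrounding whitespace, since speech transcripts vary in capitalization.
function isCorrectAnswer(original, sessionParams) {
  const said = String(original || '').trim().toLowerCase();
  const accepted = [sessionParams.antwort, sessionParams.letter]
    .filter(Boolean)
    .map(v => String(v).trim().toLowerCase());
  return accepted.includes(said);
}

const session = { antwort: 'Paris', letter: 'C' };
console.log(isCorrectAnswer('paris', session)); // true
console.log(isCorrectAnswer('C', session));     // true
console.log(isCorrectAnswer('Rome', session));  // false
```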
But, to be honest, that starts getting somewhat messy.
You also don't indicate how the Intent is set up, or whether you're using an Entity Type to capture the answer. One great way to handle this, however, is to create a Type that can represent the answers, and use a runtime type override to set the possible values and their aliases. Then you control exactly which value you will compare against.
For example, if you create a type named "Answer", then in your fulfillment, when you ask the question, you can set its possible values with something like:
conv.session.typeOverrides = [{
name: 'Answer',
mode: 'TYPE_REPLACE',
synonym: {
entries: [
{
name: 'A',
synonyms: ['A', 'Rome']
},
{
name: 'B',
synonyms: ['B', 'Berlin']
},
{
name: 'C',
synonyms: ['C', 'Paris']
}
]
}
}];
If you then have an Intent with a parameter named answer of type Answer, you can test whether intent.params.answer.resolved contains the expected letter.
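When the questions come from data, the override can be built per question rather than hardcoded. A minimal sketch, assuming up to four options labelled A to D; buildAnswerOverride and answeredCorrectly are hypothetical helpers, and the object shape is copied from the example above rather than checked against the library's type definitions.

```javascript
// Hypothetical helper: builds the runtime type override shown above from a
// question's options. In a webhook you would assign the result to
// conv.session.typeOverrides when asking the question.
function buildAnswerOverride(options) {
  const letters = ['A', 'B', 'C', 'D']; // assumes at most four options
  return [{
    name: 'Answer',
    mode: 'TYPE_REPLACE',
    synonym: {
      entries: options.map((option, i) => ({
        name: letters[i],
        synonyms: [letters[i], option],
      })),
    },
  }];
}

// `params` stands in for intent.params, with a parameter named "answer"
// of type Answer; compares its resolved letter with the expected one.
function answeredCorrectly(params, expectedLetter) {
  return Boolean(params.answer) && params.answer.resolved === expectedLetter;
}

const overrides = buildAnswerOverride(['Rome', 'Berlin', 'Paris']);
console.log(JSON.stringify(overrides[0].synonym.entries));
console.log(answeredCorrectly({ answer: { original: 'Paris', resolved: 'C' } }, 'C'));
```

Because "Paris" is a synonym of entry C, the user saying either "C" or "Paris" resolves to the same letter, so the scene condition only has to compare the resolved value.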
Adding a visual interface
Using runtime type overrides is particularly useful if you also decide to add support for a visual selection response such as a list. The visual response builds on the runtime type override to add visual aliases that users can select on appropriate devices. When you get the reply, however, it is treated as if the user said the entry name.

Classify cell array in MATLAB

I want to do text categorization on a news dataset. I have many features, such as subject, keyword, summary, etc., all stored in one cell array of structs, each struct looking like this:
label: 'misc.forsale'
subj: ' Motorcycle wanted.'
keyword: [1x190 char]
reference: []
organization: ' Worcester Polytechnic Institute'
from: ' kedz#bigwpi.WPI.EDU (John Kedziora)'
summary: []
lines: ' 11'
vocab: [4x2 double]
I want to classify them with class = classify(test, train, target, 'diaglinear'); but this function only accepts numeric arrays as input, not cells or structs.
I can't convert this cell array to one multidimensional array because the number of features varies (for example, one subject has two words and another has three).
What can I do?
Do some feature extraction first. For example, tokenize the strings, then use TF-IDF.
You can include the field key with the tokens. This is common practice in information retrieval; see the Xapian manual for an example.
Usually you will also do some stemming, e.g. examples -> exampl. Then just add a prefix to make the words distinct depending on which field they occur in, e.g. Sexampl when the subject contained example, and Kexampl when it was a keyword.
Then you have a "bag of words" representation, which is used everywhere. It is even used for mining images, where it's called "visual words"; those aren't English-language words either.
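The steps above (tokenize, prefix by field, weight with TF-IDF) produce one fixed-length numeric vector per document, which is the kind of input classify() expects. This is a minimal sketch in JavaScript rather than MATLAB; stemming is omitted, only the subj and keyword fields are used, and the helper names and sample data are assumptions.

```javascript
// Field-prefixed tokens: "S" for subject, "K" for keyword, as in the answer.
const FIELD_PREFIX = { subj: 'S', keyword: 'K' };

function tokenize(doc) {
  const tokens = [];
  for (const [field, prefix] of Object.entries(FIELD_PREFIX)) {
    const text = doc[field] || '';
    for (const word of text.toLowerCase().match(/[a-z0-9]+/g) || []) {
      tokens.push(prefix + word);
    }
  }
  return tokens;
}

// Returns the shared vocabulary and one TF-IDF row per document.
// Every row has the same length, so the rows form a numeric matrix.
function tfidfMatrix(docs) {
  const tokenLists = docs.map(tokenize);
  const vocab = [...new Set(tokenLists.flat())].sort();
  const docFreq = new Map(vocab.map(t => [t, 0]));
  for (const tokens of tokenLists) {
    for (const t of new Set(tokens)) docFreq.set(t, docFreq.get(t) + 1);
  }
  const n = docs.length;
  const rows = tokenLists.map(tokens => {
    const counts = new Map();
    for (const t of tokens) counts.set(t, (counts.get(t) || 0) + 1);
    // term frequency * inverse document frequency
    return vocab.map(t =>
      ((counts.get(t) || 0) / (tokens.length || 1)) * Math.log(n / docFreq.get(t))
    );
  });
  return { vocab, rows };
}

// Illustrative documents; "honda" and "car sale" are made-up sample data.
const docs = [
  { subj: ' Motorcycle wanted.', keyword: 'motorcycle honda' },
  { subj: ' Car for sale', keyword: 'car sale' },
];
const { vocab, rows } = tfidfMatrix(docs);
console.log(vocab);
console.log(rows.map(r => r.map(x => x.toFixed(3))));
```

Note that Smotorcycle and Kmotorcycle become separate features, which is exactly the effect of the prefixes.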

Address an ordered list the RESTful way

I'm unsure what the best way is to address an ordered list in a RESTful API. Imagine the following example: a chart list of LPs, where you want to add new LPs, delete those which aren't in the top 10 yet, and change their positions. How would you implement those methods in a RESTful JSON API?
I thought of the following way:
GET / to return the ordered chart list like [{ "name": "1st-place LP", "link": "/uid123" }, { "name": "2nd-place LP", "link": "/uid987" }, ...]
GET /{uid} to return an LP by its unique ID, returning something like {"name": "1st-place LP", "ranking": 1 }
GET /ranking/{position} to access e.g. the current first-ranked LP, returning a 303 See Other with a Location-header like Location: /uid123
POST / with request body { "name": "my first LP title" } to create a new LP without specifying its current chart position
The question now is how to change the current chart positions. One could simply PUT /{uid} to update the ranking attribute, but I think PUT /ranking/{position} would be more natural. On the other hand, it doesn't make sense to PUT to a URI that returns a 303 See Other on GET.
What do you think would be the best way to address such a chart list? I don't like the solution of simply changing the ranking attribute in the LP datasets, as this could end in nonsensical states such as two LPs with the same ranking.
I see two questions: 1. What is the most RESTful (beautiful) way to design the API? 2. How do I make sure that two LPs do not get the same ranking?
1:
Your LPs could have several properties that are relative to each other, e.g. different rankings on different charts. I would say that you want the ranking moved OUT of your LP resource. Keep the ranking on a certain list as a separate resource. Example:
GET /LPuid only returns properties about the LP, not relative properties, like rankings
GET /billboard/3 returns the URI of the LP that has ranking 3 on the billboard list.
PUT /billboard takes a document of 100 LP URIs.
PUT /billboard/3 INSERTS an LP URI at that ranking and moves the other ones down.
2: That has nothing to do with REST; you would have that issue no matter how you design your API. Transactions are one solution.
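One way to make duplicate rankings impossible by construction is to store a chart as an ordered list of LP URIs, so the ranking is simply the position. This hypothetical in-memory Chart class sketches the "PUT /billboard/3 inserts and moves the others down" behaviour; with a real database the read-modify-write would still need a transaction, as noted above.

```javascript
// Hypothetical in-memory chart: the rank is the array index, so two LPs
// can never share a rank.
class Chart {
  constructor(size) {
    this.size = size;  // e.g. 10 for a top-10 chart
    this.entries = []; // LP URIs, index 0 = rank 1
  }

  // Mirrors "PUT /billboard/{rank}": removes the LP from its old rank,
  // inserts it at the 1-based rank (appending if the chart is shorter),
  // and drops whatever falls off the bottom of the chart.
  insertAt(rank, lpUri) {
    if (rank < 1 || rank > this.size) throw new RangeError('rank out of range');
    this.entries = this.entries.filter(uri => uri !== lpUri);
    this.entries.splice(rank - 1, 0, lpUri);
    this.entries.length = Math.min(this.entries.length, this.size);
  }

  // Mirrors "GET /billboard/{rank}": the LP URI at a 1-based rank, if any.
  at(rank) {
    return this.entries[rank - 1];
  }
}

const chart = new Chart(10);
chart.insertAt(1, '/uid123');
chart.insertAt(1, '/uid987');
console.log(chart.at(1), chart.at(2)); // /uid987 /uid123
```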
You have two collection resources within your music service. As such, I would design a URI structure like this:
/ => returns links to collections (ergo itself a collection resource)
/releases => returns a list of LPs
/chart => returns the top 10 LPs, or redirects to the current chart URI
You would POST to /releases to add a new LP, and PUT or PATCH to /chart to define a new chart or alter the current chart. You will need to define which representation formats are transferred in each case.
This gives you the flexibility to define things like /chart/2012-12-25 to show the chart as it stood on Christmas Day 2012.
I do not suggest using PUT /chart/{position} to insert an LP at a specific position and shuffle everything else down. Intermediaries would not know that a PUT to that URI causes other resources to change their URIs. This is bad for caching.
Also, as a user, I would hope you avoid the word "billboard" as the other answer suggests. A billboard conjures in the mind pictures of an advertising hoarding, and not anything to do with ranking charts!
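With the /chart design, the client PUTs the complete new chart, so the server can check the whole representation at once before accepting it. A minimal sketch; the body shape { entries: [...] }, the 10-entry limit, the /releases/ URIs and validateChartDocument are assumptions.

```javascript
// Hypothetical validation for a "PUT /chart" body: the complete new chart
// as an ordered list of release URIs. Returns a list of error messages;
// an empty list means the representation is acceptable.
function validateChartDocument(body, knownReleases, maxEntries = 10) {
  if (!body || !Array.isArray(body.entries)) {
    return ['entries must be an array of release URIs'];
  }
  const errors = [];
  if (body.entries.length > maxEntries) {
    errors.push(`chart has more than ${maxEntries} entries`);
  }
  const seen = new Set();
  body.entries.forEach((uri, i) => {
    if (seen.has(uri)) errors.push(`duplicate release at position ${i + 1}: ${uri}`);
    seen.add(uri);
    if (!knownReleases.has(uri)) errors.push(`unknown release: ${uri}`);
  });
  return errors;
}

const known = new Set(['/releases/1', '/releases/2', '/releases/3']);
console.log(validateChartDocument({ entries: ['/releases/2', '/releases/1'] }, known));
```

Because a PUT replaces only the /chart resource, no other URI changes meaning, which keeps the caching concern above out of the picture.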

Problem while using NSPredicate

SQL query:
select * from test_mart
where replace(replace(replace(replace(replace(replace(lower(name), '+', ''), '_', ''), 'the ', ''), ' the', ''), 'a ', ''), ' a', '') = 'tariq'
I could run this query easily with plain SQLite, but in my current project I'm using Core Data, and I'm not very familiar with NSPredicate.
The requirement is to remove everything except alphanumeric characters, i.e. to strip special characters.
The characters that should be valid in the comparison would be
ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890
But the comparison should not fail because of the following characters:
:;,~`!@#$%^&*()_-+="'/?.>,<|\
or because of the following words:
'the' 'an' 'a'
Some examples:
'Walmart' would be seen as the same payee as 'Wal-Mart'
'The Shoe Store' would be seen as the same payee as 'Shoe Store'
'Domino's Pizza' would be seen as the same payee as 'Dominos Pizza'
'Test Payee;' would be seen as the same payee as 'Test Payee'
Can anyone suggest an appropriate predicate or regular expression?
Thanks
I would add an extra field in the database holding a processed version of the original with all the irrelevant characters stripped out, then use that for comparisons.
You might want to look at the Soundex algorithm, which may suit your purposes better.
It seems to me that you would want to normalize your data before it ever gets saved into the Core Data store. So if you're given "Wal-Mart", normalize it to "walmart" once and then save it. Then you won't be doing all of this expensive on-the-fly comparison many times.
The normalization would be fairly simple, given your rules:
Strip the words "a", "an", and "the"
Remove punctuation
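The normalization rules above fit in one small function. The sketch is in JavaScript for consistency with the other examples here, not Objective-C or Swift; normalizePayee is a hypothetical name, and it assumes ASCII-only names (matching the valid character list in the question) plus lowercasing. If the result is stored in an extra attribute, the Core Data predicate only needs a plain equality test on that attribute.

```javascript
// Words to ignore when comparing payee names, per the rules above.
const STOP_WORDS = new Set(['a', 'an', 'the']);

// Lowercase, drop everything except ASCII letters, digits and whitespace,
// remove the stop words, and join the remaining words without spaces.
function normalizePayee(name) {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, '') // "Wal-Mart" -> "walmart", "Domino's" -> "dominos"
    .split(/\s+/)
    .filter(word => word.length > 0 && !STOP_WORDS.has(word))
    .join('');
}

console.log(normalizePayee('Wal-Mart'));       // walmart
console.log(normalizePayee('The Shoe Store')); // shoestore
```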

RESTful URL design for search

I'm looking for a reasonable way to represent searches as RESTful URLs.
The setup: I have two models, Cars and Garages, where Cars can be in Garages. So my urls look like:
/car/xxx
xxx == car id; returns the car with the given id
/garage/yyy
yyy == garage id; returns the garage with the given id
A Car can exist on its own (hence the /car), or it can exist in a garage. What's the right way to represent, say, all the cars in a given garage? Something like:
/garage/yyy/cars ?
How about the union of cars in garage yyy and zzz?
What's the right way to represent a search for cars with certain attributes? Say: show me all blue sedans with 4 doors :
/car/search?color=blue&type=sedan&doors=4
or should it be /cars instead?
The use of "search" seems inappropriate there - what's a better way / term? Should it just be:
/cars/?color=blue&type=sedan&doors=4
Should the search parameters be part of the PATHINFO or QUERYSTRING?
In short, I'm looking for guidance for cross-model REST url design, and for search.
[Update] I like Justin's answer, but he doesn't cover the multi-field search case:
/cars/color:blue/type:sedan/doors:4
or something like that. How do we go from
/cars/color/blue
to the multiple field case?
For the searching, use querystrings. This is perfectly RESTful:
/cars?color=blue&type=sedan&doors=4
An advantage of regular query strings is that they are standard, widely understood, and can be generated by an HTML form submitted with GET.
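Because query strings are standard, the platform already builds and parses them. A short sketch with the standard URLSearchParams and URL APIs (global in browsers and in recent Node.js); carSearchUrl and parseCarSearch are hypothetical helper names, and the example.invalid base is only there because URL() needs an absolute URL.

```javascript
// Builds the search URL from a plain object of filters.
function carSearchUrl(filters) {
  const params = new URLSearchParams(filters);
  return `/cars?${params.toString()}`;
}

// Parses a relative search URL back into a plain object of strings.
function parseCarSearch(url) {
  const { searchParams } = new URL(url, 'http://example.invalid');
  return Object.fromEntries(searchParams);
}

console.log(carSearchUrl({ color: 'blue', type: 'sedan', doors: 4 }));
console.log(parseCarSearch('/cars?color=blue&type=sedan&doors=4'));
```

URLSearchParams uses the same form encoding as an HTML GET form (spaces become "+"), so server-side parsing is identical for both sources.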
Pretty RESTful URL design is about exposing a resource through a structure (a directory-like structure, dates such as articles/2005/5/13, an object and its attributes, ...). The slash / indicates hierarchical structure, so use -id instead.
Hierarchical structure
I would personally prefer:
/garage-id/cars/car-id
/cars/car-id #for cars not in garages
If a user removes the /car-id part, they get the overview of the cars, which is intuitive. The user knows exactly where in the tree they are and what they are looking at. They can tell at first glance that garages and cars are related. /car-id also signals that the parts belong together, unlike /car/id.
Searching
The search query is fine as it is; what you choose is only a matter of preference. The fun part comes when joining searches (see below).
/cars?color=blue;type=sedan #most preferred by me
/cars;color-blue+doors-4+type-sedan #looks good when using car-id
/cars?color=blue&doors=4&type=sedan #also possible, but & blends in with text
Or basically anything that isn't a slash, as explained above.
The formula: /cars[?;]color[=-:]blue[,;+&], though I wouldn't use the & sign, as it is hard to tell apart from the surrounding text at first glance, if that matters to you.
Did you know that passing a JSON object in a URI is RESTful?
Lists of options
/cars?color=black,blue,red;doors=3,5;type=sedan #most preferred by me
/cars?color:black:blue:red;doors:3:5;type:sedan
/cars?color(black,blue,red);doors(3,5);type(sedan) #does not look bad at all
/cars?color:(black,blue,red);doors:(3,5);type:sedan #little difference
Possible features?
Negate search strings (!)
To search any cars, but not black and red:
?color=!black,!red
color:(!black,!red)
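The list and negation syntax above is straightforward to parse. A minimal sketch of the "field=value,value;field=value" form with "!" negation; parseFilter and matches are hypothetical helpers, and the input is assumed to be the already URL-decoded part after "?".

```javascript
// Parses "color=black,blue,red;doors=3,5;type=sedan" (and "!" negation)
// into { field: { include: [...], exclude: [...] } }.
function parseFilter(query) {
  const filters = {};
  for (const clause of query.split(';')) {
    if (!clause) continue;
    const [field, rawValues = ''] = clause.split('=');
    const entry = { include: [], exclude: [] };
    for (const value of rawValues.split(',')) {
      if (!value) continue;
      if (value.startsWith('!')) entry.exclude.push(value.slice(1));
      else entry.include.push(value);
    }
    filters[field] = entry;
  }
  return filters;
}

// A car matches if, for every field, its value is in `include` (when
// given) and not in `exclude`. Values are compared as strings.
function matches(car, filters) {
  return Object.entries(filters).every(([field, { include, exclude }]) => {
    const value = String(car[field]);
    return (include.length === 0 || include.includes(value)) && !exclude.includes(value);
  });
}

const filter = parseFilter('color=black,blue,red;doors=3,5;type=sedan');
console.log(matches({ color: 'blue', doors: 3, type: 'sedan' }, filter)); // true
console.log(matches({ color: 'blue', doors: 4, type: 'sedan' }, filter)); // false
```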
Joined searches
Search red or blue or black cars with 3 doors in garages id 1..20 or 101..103 or 999 but not 5
/garage[id=1-20,101-103,999,!5]/cars[color=red,blue,black;doors=3]
You can then construct more complex search queries. (Look at CSS3 attribute matching for the idea of matching substrings. E.g. searching users containing "bar" user*=bar.)
Conclusion
Anyway, this might be the most important part for you: you can do it however you like after all, but keep in mind that a RESTful URI represents a structure that is easily understood, e.g. directory-like /directory/file, /collection/node/item, or dates /articles/{year}/{month}/{day}. When you omit any of the last segments, you immediately know what you get.
So, the following characters are allowed unencoded:
unreserved: a-zA-Z0-9_.-~
Typically allowed both encoded and unencoded, with both forms being equivalent:
special characters: $-_.+!*'(),
reserved: ;/?:@=&
May be used unencoded for the purpose they represent, otherwise they must be encoded.
unsafe: <>"#%{}|\^~[]`
For why these are unsafe and should be encoded, see RFC 1738, section 2.2.
Also see RFC 1738#page-20 for more character classes.
See also RFC 3986, section 2.2.
Despite what I said previously, here is a common distinction between delimiters, meaning that some "are" more important than others:
generic delimiters: :/?#[]@
sub-delimiters: !$&'()*+,;=
More reading:
Hierarchy: see 2.3, see 1.2.3
url path parameter syntax
CSS3 attribute matching
IBM: RESTful Web services - The basics
Note: RFC 1738 was updated by RFC 3986
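A practical consequence of these character classes: if you use sub-delimiters such as ",", ";" and "=" as your own separators, percent-encode the values so they cannot collide with them. JavaScript's encodeURIComponent leaves only A-Z a-z 0-9 - _ . ! ~ * ' ( ) unencoded, slightly more than RFC 3986's unreserved set. buildFilterQuery is a hypothetical helper.

```javascript
// Joins filters as "field=v1,v2;field=v3": the separators stay raw, while
// field names and values are percent-encoded, so a comma inside a value
// cannot be mistaken for the list separator.
function buildFilterQuery(filters) {
  return Object.entries(filters)
    .map(([field, values]) =>
      `${encodeURIComponent(field)}=${values.map(v => encodeURIComponent(v)).join(',')}`)
    .join(';');
}

console.log(buildFilterQuery({ color: ['black', 'blue', 'red'], doors: [3, 5] }));
console.log(buildFilterQuery({ color: ['navy, dark'] }));
```

Since encodeURIComponent does not escape "!", the negation syntax shown earlier survives encoding unchanged.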
Although having the parameters in the path has some advantages, there are, IMO, some outweighing factors.
Not all characters needed for a search query are permitted in a URL. Most punctuation and Unicode characters would need to be URL encoded as a query string parameter. I'm wrestling with the same problem. I would like to use XPath in the URL, but not all XPath syntax is compatible with a URI path. So for simple paths, /cars/doors/driver/lock/combination would be appropriate to locate the 'combination' element in the driver's door XML document. But /car/doors[id='driver' and lock/combination='1234'] is not so friendly.
There is a difference between filtering a resource based on one of its attributes and specifying a resource.
For example, since
/cars/colors returns a list of all colors for all cars (the resource returned is a collection of color objects)
/cars/colors/red,blue,green would return a list of color objects that are red, blue or green, not a collection of cars.
To return cars, the path would be
/cars?color=red,blue,green or /cars/search?color=red,blue,green
Parameters in the path are more difficult to read because name/value pairs are not isolated from the rest of the path, which is not name/value pairs.
One last comment. I prefer /garages/yyy/cars (always plural) to /garage/yyy/cars (perhaps it was a typo in the original answer) because it avoids changing the path between singular and plural. For words with an added 's', the change is not so bad, but changing /person/yyy/friends to /people/yyy seems cumbersome.
To expand on Peter's answer - you could make Search a first-class resource:
POST /searches # create a new search
GET /searches # list all searches (admin)
GET /searches/{id} # show the results of a previously-run search
DELETE /searches/{id} # delete a search (admin)
The Search resource would have fields for color, make, model, garaged status, etc., and could be specified in XML, JSON, or any other format. Like the Car and Garage resources, you could restrict access to Searches based on authentication. Users who frequently run the same Searches can store them in their profiles so that they don't need to be re-created. The URLs will be short enough that in many cases they can easily be shared via email. These stored Searches can be the basis of custom RSS feeds, and so on.
There are many possibilities for using Searches when you think of them as resources.
The idea is explained in more detail in this Railscast.
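A minimal in-memory sketch of Search as a first-class resource. SearchStore and its methods are hypothetical; a real service would persist searches and map these methods to the POST, GET and DELETE routes listed above. Criteria here are simple field equality, which is an assumption.

```javascript
// Hypothetical in-memory Search resource.
class SearchStore {
  constructor(cars) {
    this.cars = cars;
    this.searches = new Map();
    this.nextId = 1;
  }

  // POST /searches: store the criteria, return the new resource's location.
  create(criteria) {
    const id = this.nextId++;
    this.searches.set(id, { ...criteria });
    return { id, location: `/searches/${id}` };
  }

  // GET /searches/{id}: re-run the stored criteria against current data.
  results(id) {
    const criteria = this.searches.get(id);
    if (!criteria) return null; // would map to 404 Not Found
    return this.cars.filter(car =>
      Object.entries(criteria).every(([field, value]) => car[field] === value));
  }

  // DELETE /searches/{id}
  remove(id) {
    return this.searches.delete(id);
  }
}

const store = new SearchStore([
  { id: 1, color: 'blue', doors: 4 },
  { id: 2, color: 'red', doors: 4 },
]);
const { id, location } = store.create({ color: 'blue', doors: 4 });
console.log(location, store.results(id));
```

Because results are computed on each GET, a saved search always reflects the current inventory, which is what makes it usable as a feed.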
Justin's answer is probably the way to go, although in some applications it might make sense to consider a particular search as a resource in its own right, such as if you want to support named saved searches:
/search/{searchQuery}
or
/search/{savedSearchName}
I use two approaches to implement searches.
1) Simplest case, to query associated elements, and for navigation.
/cars?q.garage.id.eq=1
This means, query cars that have garage ID equal to 1.
It is also possible to create more complex searches:
/cars?q.garage.street.eq=FirstStreet&q.color.ne=red&offset=300&max=100
Cars in all garages on FirstStreet that are not red (skipping the first 300 results, with 100 elements per page, i.e. the 4th page).
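The q.<path>.<op> convention can be evaluated generically. This is a minimal sketch, not the Grails criteria DSL: only eq and ne are handled, values are compared as strings, the default page size of 100 is an assumption, and parseCriteria and runQuery are hypothetical names.

```javascript
// Supported comparison operators; anything else is rejected.
const OPS = {
  eq: (actual, expected) => String(actual) === expected,
  ne: (actual, expected) => String(actual) !== expected,
};

// Parses "q.<path>.<op>=<value>" parameters plus offset/max paging.
function parseCriteria(queryString) {
  const params = new URLSearchParams(queryString);
  const criteria = [];
  for (const [key, value] of params) {
    if (!key.startsWith('q.')) continue;
    const path = key.slice(2).split('.');
    const op = path.pop(); // the last segment is the operator
    if (!Object.prototype.hasOwnProperty.call(OPS, op)) {
      throw new Error(`unsupported operator: ${op}`);
    }
    criteria.push({ path, op, value });
  }
  return {
    criteria,
    offset: Number(params.get('offset') || 0),
    max: Number(params.get('max') || 100),
  };
}

// Follows a path such as ["garage", "street"] into a nested object.
function getPath(obj, path) {
  return path.reduce((o, key) => (o == null ? undefined : o[key]), obj);
}

function runQuery(items, { criteria, offset, max }) {
  return items
    .filter(item => criteria.every(c => OPS[c.op](getPath(item, c.path), c.value)))
    .slice(offset, offset + max);
}

const cars = [
  { id: 1, color: 'blue', garage: { street: 'FirstStreet' } },
  { id: 2, color: 'red', garage: { street: 'FirstStreet' } },
  { id: 3, color: 'blue', garage: { street: 'SecondStreet' } },
];
const query = parseCriteria('q.garage.street.eq=FirstStreet&q.color.ne=red&offset=0&max=100');
console.log(runQuery(cars, query).map(car => car.id));
```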
2) Complex queries are considered as regular resources that are created and can be recovered.
POST /searches => Create
GET /searches/1 => Recover search
GET /searches/1?offset=300&max=100 => pagination in search
The POST body for search creation is as follows:
{
"$class":"test.Car",
"$q":{
"$eq" : { "color" : "red" },
"garage" : {
"$ne" : { "street" : "FirstStreet" }
}
}
}
It is based on Grails (criteria DSL): http://grails.org/doc/2.4.3/ref/Domain%20Classes/createCriteria.html
This is not REST. You cannot define URIs for resources inside your API. Resource navigation must be hypertext-driven. It's fine if you want pretty URIs and heavy amounts of coupling, but just do not call it REST, because it directly violates the constraints of RESTful architecture.
See this article by the inventor of REST.
In addition, I would also suggest:
/cars/search/all{?color,model,year}
/cars/search/by-parameters{?color,model,year}
/cars/search/by-vendor{?vendor}
Here, Search is considered as a child resource of Cars resource.
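The {?color,model,year} parts above are URI Templates (RFC 6570) using form-style query expansion. This is a minimal sketch of just that operator; expandQueryTemplate is a hypothetical name, and it is not a full URI Template implementation (no modifiers, lists or other operators). encodeURIComponent is a close approximation here; strict RFC 6570 expansion would also percent-encode ! ' ( ) *.

```javascript
// Expands "{?a,b,c}" into "?a=1&b=2", skipping undefined or null
// variables as RFC 6570 specifies; an all-undefined group expands to "".
function expandQueryTemplate(template, vars) {
  return template.replace(/\{\?([^}]+)\}/g, (match, names) => {
    const pairs = names
      .split(',')
      .filter(name => vars[name] !== undefined && vars[name] !== null)
      .map(name => `${name}=${encodeURIComponent(String(vars[name]))}`);
    return pairs.length > 0 ? `?${pairs.join('&')}` : '';
  });
}

console.log(expandQueryTemplate('/cars/search/by-parameters{?color,model,year}', { color: 'red', year: 2012 }));
console.log(expandQueryTemplate('/cars/search/by-vendor{?vendor}', {}));
```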
There are a lot of good options for your case here. Still, you should consider using the POST body.
The query string is perfect for your example, but if you have something more complicated, e.g. an arbitrarily long list of items or boolean conditionals, you might want to define the search as a document that the client sends via POST.
This allows a more flexible description of the search and avoids server URL length limits.
REST does not recommend using verbs in URLs, so /cars/search is not RESTful. The right way to filter, search, or paginate your APIs is through query parameters. However, there might be cases where you have to break the norm: for example, if you are searching across multiple resources, you have to use something like /search?q=query
You can go through http://saipraveenblog.wordpress.com/2014/09/29/rest-api-best-practices/ to understand the best practices for designing RESTful APIs.
Though I like Justin's response, I feel it more accurately represents a filter rather than a search. What if I want to know about cars with names that start with cam?
The way I see it, you could build it into the way you handle specific resources:
/cars/cam*
Or, you could simply add it into the filter:
/cars/doors/4/name/cam*/colors/red,blue,green
Personally, I prefer the latter, however I am by no means an expert on REST (having first heard of it only 2 or so weeks ago...)
My advice would be this:
/garages
Returns list of garages (think JSON array here)
/garages/yyy
Returns specific garage
/garage/yyy/cars
Returns list of cars in garage
/garages/cars
Returns list of all cars in all garages (may not be practical of course)
/cars
Returns list of all cars
/cars/xxx
Returns specific car
/cars/colors
Returns list of all possible colors for cars
/cars/colors/red,blue,green
Returns list of cars of the specific colors (yes commas are allowed :) )
Edit:
/cars/colors/red,blue,green/doors/2
Returns list of all red,blue, and green cars with 2 doors.
/cars/type/hatchback,coupe/colors/red,blue,green/
Same idea as above, but a little more intuitive.
/cars/colors/red,blue,green/doors/two-door,four-door
All cars that are red, blue, green and have either two or four doors.
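The path-style filters above can be read as name/values pairs after the collection name. A minimal sketch; parsePathFilters is a hypothetical helper, and it leaves an unpaired trailing segment (such as "colors" in /cars/colors) to the caller, since that URL means "list the colors" rather than a filter.

```javascript
// Parses e.g. /cars/colors/red,blue,green/doors/2 into
// { collection: 'cars', filters: { colors: [...], doors: ['2'] }, trailing: null }.
function parsePathFilters(path) {
  const [collection, ...rest] = path.split('/').filter(Boolean);
  const filters = {};
  let i = 0;
  for (; i + 1 < rest.length; i += 2) {
    filters[rest[i]] = rest[i + 1].split(',').map(v => decodeURIComponent(v));
  }
  // An unpaired trailing segment is not a filter; return it separately.
  const trailing = i < rest.length ? rest[i] : null;
  return { collection, filters, trailing };
}

console.log(parsePathFilters('/cars/colors/red,blue,green/doors/2'));
console.log(parsePathFilters('/cars/colors'));
```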
Hopefully that gives you the idea. Essentially, your REST API should be easily discoverable and should let you browse through your data. Another advantage of using URLs rather than query strings is that you can take advantage of the native caching mechanisms that exist on the web server for HTTP traffic.
Here's a link to a page describing the evils of query strings in REST: http://web.archive.org/web/20070815111413/http://rest.blueoxen.net/cgi-bin/wiki.pl?QueryStringsConsideredHarmful
I used an archived copy because the normal page wasn't working for me; here's the original link as well:
http://rest.blueoxen.net/cgi-bin/wiki.pl?QueryStringsConsideredHarmful