Does anyone know if it's possible to get the searchkick gem to explain itself when it uses misspellings?
I.e., on a search for "Penut", I should be able to know that the results returned are actually for "Peanut". Pretty much the same way that Google works: it shows "Showing search results for Peanut" when searching for "Penut".
You can get suggestions for the search query (not the results) by using Searchkick's suggestion feature.
You can try to use highlighting to get them from the results.
class Product < ApplicationRecord
  searchkick highlight: [:name]
end
# Collect the text of every <em> tag wrapped around a matched term
def parse_highlights(name)
  Nokogiri::HTML(name).css("em").map { |v| v.text }
end
products = Product.search("penut", highlight: true)
products.with_highlights.each do |product, highlights|
  # Terms that actually matched in the name field
  parse_highlights(highlights[:name])
end
Similar question about Elasticsearch.
How does one use Firebase to do basic auto-completion/text preview?
For example, imagine a blog backed by Firebase where the blogger can tag posts with tags. As the blogger is tagging a new post, it would be helpful if they could see all currently-existing tags that matched the first few keystrokes they've entered. So if "blog," "black," "blazing saddles," and "bulldogs" were tags, if the user types "bl" they get the first three but not "bulldogs."
My initial thought was that we could set the tag with the priority of the tag, and use startAt, such that our query would look something like:
fb.child('tags').startAt('bl').limitToFirst(5).once('value', function(snap) {
  console.log(snap.val());
});
But this would also return "bulldogs" as one of the results (not the end of the world, but not the best either). Using startAt('bl').endAt('bl') returns no results. Is there another way to accomplish this?
(I know that one option is that this is something we could use a search server, like ElasticSearch, for -- see https://www.firebase.com/blog/2014-01-02-queries-part-two.html -- but I'd love to keep as much in Firebase as possible.)
Edit
As Kato suggested, here's a concrete example. We have 20,000 users, with their names stored as such:
/users/$userId/name
Oftentimes, users will be looking up another user by name. As a user is looking up their buddy, we'd like a drop-down to populate a list of users whose names start with the letters that the searcher has inputted. So if I typed in "Ja" I would expect to see "Jake Heller," "jake gyllenhaal," "Jack Donaghy," etc. in the drop-down.
I know this is an old topic, but it's still relevant. Based on Neil's answer above, you can search more easily by doing the following:
fb.child('tags').startAt(queryString).endAt(queryString + '\uf8ff').limit(5)
See Firebase Retrieving Data.
The \uf8ff character used in the query above is a very high code point
in the Unicode range. Because it is after most regular characters in
Unicode, the query matches all values that start with queryString.
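The effect of the \uf8ff upper bound can be checked without Firebase. The following is a minimal Python sketch, not Firebase code: range_query is a hypothetical helper that mimics startAt/endAt over an already sorted list, and it assumes strings are ordered by plain code point.

```python
from bisect import bisect_left, bisect_right

HIGH = "\uf8ff"  # very high code point, sorts after most regular characters


def range_query(sorted_values, start, end, limit=5):
    """Return up to `limit` values v with start <= v <= end (hypothetical helper)."""
    lo = bisect_left(sorted_values, start)
    hi = bisect_right(sorted_values, end)
    return sorted_values[lo:hi][:limit]


tags = sorted(["blog", "black", "blazing saddles", "bulldogs"])

# Upper bound "bl" + \uf8ff keeps everything that starts with "bl"
print(range_query(tags, "bl", "bl" + HIGH))  # ['black', 'blazing saddles', 'blog']

# Upper bound "bl" only matches the exact value "bl", so nothing is returned
print(range_query(tags, "bl", "bl"))  # []
```

This also shows why startAt('bl').endAt('bl') in the question came back empty: every tag that starts with "bl" sorts after the bare string "bl".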
As inspired by Kato's comments -- one way to approach this problem is to set the priority to the field you want to search on for your autocomplete and use startAt(), limit(), and client-side filtering to return only the results that you want. You'll want to make sure that the priority and the search term are both lower-cased, since Firebase is case-sensitive.
This is a crude example to demonstrate this using the Users example I laid out in the question:
For a search for "ja", assuming all users have their priority set to the lowercased version of the user's name:
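fb.child('users').
  startAt('ja'). // The user-inputted search
  limitToFirst(20).
  once('value', function(snap) {
    var users = snap.val();
    for (var key in users) {
      // Names live at /users/$userId/name; keep only those that really start with 'ja'
      var name = users[key].name;
      if (name.toLowerCase().indexOf('ja') === 0) {
        console.log(name);
      }
    }
  });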
This should only return the names that actually begin with "ja" (even if Firebase actually returns names alphabetically after "ja").
I chose to use limitToFirst(20) to keep the response size small and because, realistically, you'll never need more than 20 for the autocomplete drop-down. There are probably better ways to do the filtering, but this should at least demonstrate the concept.
Hope this helps someone! And it's quite possible the Firebase guys have a better answer.
(Note that this is very limited -- if someone searches for the last name, it won't return what they're looking for. Hence the "best" answer is probably to use a search backend with something like Kato's Flashlight.)
It strikes me that there's a much simpler and more elegant way of achieving this than client side filtering or hacking Elastic.
By converting the search key into its Unicode value and storing that as the priority, you can search with startAt() and endAt(), where the end value is the key with its last character incremented by one.
var start = "ABA";
var pad = "AAAAAAAAAA";
// Pad the key to a fixed width of 10 characters
start += pad.substring(0, pad.length - start.length);
var blob = new Blob([start]);
var reader = new FileReader();
reader.onload = function(e) {
  var typedArray = new Uint8Array(e.target.result);
  var array = Array.prototype.slice.call(typedArray);
  // Concatenate the byte values as decimal digits. The result exceeds 2^53,
  // so the Number loses precision (hence the trailing zeros in the output).
  var priority = parseInt(array.join(""), 10);
  console.log("Priority of", start, "is:", priority);
};
reader.readAsArrayBuffer(blob);
You can then limit your search priority to the key "ABB" by incrementing the last charCode by one and doing the same conversion:
var limit = String.fromCharCode(start.charCodeAt(start.length - 1) + 1);
limit = start.substring(0, start.length - 1) + limit;
"ABA..." to "ABB..." ends up with priorities of:
Start: 65666565656565650000
End: 65666665656565650000
Simples!
Based on Jake and Matt's answer, updated version for sdk 3.1. '.limit' no longer works:
firebaseDb.ref('users')
  .orderByChild('name')
  .startAt(query)
  .endAt(`${query}\uf8ff`)
  .limitToFirst(5)
  .on('child_added', (child) => {
    console.log({
      id: child.key,
      name: child.val().name
    });
  });
I am using Whoosh to index over 200,000 books, but I have encountered some problems with it.
The Whoosh query parser returns NullQuery for words like "C#" and "C++" that contain meta-characters, and also for some other short words. These words are used in the title and body of some documents, so I am not using the keyword type for them. I guess the problem is in the analysis or query-parsing phase of searching or indexing, but I can't touch my data blindly. Can anyone help me correct this issue? Thanks.
I fixed the problem by creating a StandardAnalyzer with a regex pattern that meets my requirements. Here is the regex pattern:
'\w+[#+.\w]*'
With this pattern the fields are tokenized successfully, and searching works well too.
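To see what that pattern keeps, it can be tried with Python's standard re module. This is only a sketch of the tokenizing step, not Whoosh itself: tokenize is a hypothetical helper and the sample strings are made up.

```python
import re

# The pattern from the answer above, applied with the standard library
TOKEN = re.compile(r"\w+[#+.\w]*")


def tokenize(text):
    """Return every substring matched by the pattern (hypothetical helper)."""
    return [m.group(0) for m in TOKEN.finditer(text)]


print(tokenize("C# and C++ on .NET"))  # ['C#', 'and', 'C++', 'on', 'NET']
print(tokenize("ends here."))          # ['ends', 'here.']
```

Two side effects are visible: a leading dot is dropped (".NET" becomes "NET"), and a trailing full stop is absorbed into the token ("here."), because "." is allowed anywhere after the first word character.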
But when I use queries like "some query++*" or "some##*", the parsed query is a single Every query, just the '*'. I also found that this is not related to my analyzer; it is Whoosh's default behavior. So here is my new question: is this behavior correct, or is it a bug?
Note: removing the WildcardPlugin from the query parser solves this problem, but I also need the WildcardPlugin.
now i am using the following code:
from whoosh import analysis, fields
from whoosh.util import rcompile
# for matching words like '.NET', 'C++' and 'C#'
word_pattern = rcompile(r'(\.|[\w]+)(\.?\w+|#|\+\+)*')
# I don't need words shorter than two characters, so I keep the default minsize
analyzer = analysis.StandardAnalyzer(expression=word_pattern)
... now in my schema:
...
title = fields.TEXT(analyzer=analyzer),
...
This solves my first problem, yes. But the main problem is in searching. I don't want to let users search using the Every query, i.e. *. But when I parse queries like C++* I end up with an Every(*) query. I know that there is some problem, but I can't figure out what it is.
I had the same issue and found out that StandardAnalyzer() uses minsize=2 by default. So in your schema, you have to tell it otherwise.
import whoosh.analysis
import whoosh.fields
schema = whoosh.fields.Schema(
    name=whoosh.fields.TEXT(stored=True, analyzer=whoosh.analysis.StandardAnalyzer(minsize=1)),
    # ...
)
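The effect of minsize can be illustrated without Whoosh. The sketch below is a simulation only: analyze is a hypothetical helper, the token pattern is an assumption about the analyzer's default expression, and stop-word removal is left out.

```python
import re

# Assumed default token pattern; the real analyzer may differ
DEFAULT_PATTERN = re.compile(r"\w+(\.?\w+)*")


def analyze(text, minsize=2):
    """Lower-case the tokens and drop those shorter than minsize (simulation)."""
    tokens = (m.group(0).lower() for m in DEFAULT_PATTERN.finditer(text))
    return [t for t in tokens if len(t) >= minsize]


print(analyze("C#"))                      # []
print(analyze("C#", minsize=1))           # ['c']
print(analyze("learning C# quickly"))     # ['learning', 'quickly']
```

Under these assumptions "C#" is reduced to the one-character token "c", which the default minsize=2 then discards, leaving nothing to search for. Lowering minsize keeps the token but still loses the "#", so a custom expression like the one in the question is needed as well if "C#" and "C++" must stay distinct.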
I have a mongoid document (in a rails app) that has failed validation. I want to reset all of the invalid fields. Currently I am doing this:
@product.errors.each do |e, m|
  method_name = "reset_#{e}!"
  @product.send(method_name)
end
This is ok, but is there not a better (more concise) way to do this? I've read the dirty tracking documentation and googled, but I can't find anything about this.
I don't know what you mean by 'more concise', but you don't need to use "send":
@product.errors.keys.each do |f|
  # changes[f] is [old_value, new_value]; restore the old value
  @product[f] = @product.changes[f.to_s].first
end
I want to use some keywords that include special characters like & in Facebook search api. I tried the query below but I cannot get useful results. Is there any chance for this usage in search api? How should I build my search query?
My example keywords are "H&M" and "marks & spencer":
http://graph.facebook.com/search?type=post&limit=25&q="H&M"
http://graph.facebook.com/search?type=post&limit=25&q="marks & spencer"
My team worked on this forever, ended up finding this as a solution that provides relevant results for a query with an ampersand, such as 'H&M'.
%26amp%3b
This is the URL-encoded form of &amp;, the HTML entity for & (%26 encodes & and %3b encodes ;).
So your example link would be
http://graph.facebook.com/search?type=post&limit=25&q="H%26amp%3bM"
We found the solution thanks to Creative Jar
You want %26, which is the URL encoding for the ampersand, so:
http://graph.facebook.com/search?type=post&limit=25&q="H%26M"
http://graph.facebook.com/search?type=post&limit=25&q="marks %26 spencer"
Depending on your language, it may have a URL encoding function or you can just use string replacement.
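As one example of such a function, Python's standard urllib.parse module can do the encoding. This sketch only shows how the query string is built; it makes no request and says nothing about what the search API returns for it.

```python
from urllib.parse import quote, unquote, urlencode

# Percent-encode a single value; safe="" means nothing is left unescaped
print(quote("H&M", safe=""))  # H%26M

# Decoding the other suggestion shows it is the HTML entity, not a bare ampersand
print(unquote("%26amp%3b"))  # &amp;

# urlencode escapes each parameter value and joins them with literal '&'
params = {"type": "post", "limit": 25, "q": '"marks & spencer"'}
url = "http://graph.facebook.com/search?" + urlencode(params)
print(url)
# http://graph.facebook.com/search?type=post&limit=25&q=%22marks+%26+spencer%22
```

Building the query string from a dict avoids the original problem, where the raw & inside the keyword was read as a parameter separator.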
It seems that none of the solutions suggested here work any more.
Searching for q=H%26%bM returns empty data set. The same for q=H%26M.
It must have changed recently, in the last 2 months.
If you try to search for posts about H&M on the Facebook site (type H&M in the search box, then "Show me more results" at the bottom of the list, and then public posts in the menu on the left side), the list is empty.
The only query that returns any results is q=H&M but it is not helpful, as the results are irrelevant for that query.
I have been trying to work out the best way to gather all of the documents in a database that have a certain date.
Originally I was trying to use FTsearch or search to move through a document collection, but I changed over to processing a view and associated documents.
My first question is what is the easiest way to spin through a set of documents and find if a date stored in the documents is greater than or less than a specified date?
So, to continue working I implemented the following code.
If (doc.creationDate(0) > CDat(parm1)) _
And (doc.creationDate(0) < CDat(parm2)) Then
    ...
End If
but the results are off
Included! Date:3/12/10 11:07:08 P1:3/1/10 P2: 3/5/10
Included! Date:3/13/10 9:15:09 P1:3/1/10 P2: 3/5/10
Included! Date:3/17/10 16:22:07 P1:3/1/10 P2: 3/5/10
You can see that the date stored in the doc is not between P1 and P2. BUT! it does limit the documents with a date less than P1 correctly. So I won't get a result for a document with a date less than 3/1/10
If there isn't a better way than the if statement, can someone help me understand why the two examples from above are included?
Hi you can try something like this:
searchStr = {(Form = "yourForm" & ((@Created > [} & parm1 & {]) & (@Created < [} & parm2 & {])))}
Set docCollection = currentDB.Search(searchStr, Nothing, 0)
If (docCollection.Count > 0) Then
    'do your stuff with the collection returned
End If
Carlos' response is pretty good.
If you have a lot of documents, you can also use a full-text search, which is much faster. The method call is very similar (db.ftsearch(), online help can be found here).
The standard DB Search method operates in the same way as view index updates, so it can get a little slow if you have thousands of documents to search through.
Just make sure you enable full text index for your database in the database properties, (last tab).
Syntax on this approach is very similar, this link provides a good reference for FTsearch. Using Carlos' syntax, you can substitute FTSearch and searchStr assignment for faster searching.