Use of full-text search + GIN in a view (Django 1.11) - postgresql

I need some help building a proper query in a Django view for full-text search using a GIN index. I have quite a big database (~400k rows) and need to run a full-text search on 3 fields. I followed the Django docs on search, and the code below is what I had BEFORE adding GIN. It works, but takes 6+ seconds to search over all fields. Next I tried to add a GIN index to speed up the search. There are already a lot of questions about how to build the index. My question is: how does the view query change when using a GIN index for search? Which fields should I search?
Before GIN:
models.py
class Product(TimeStampedModel):
    product_id = models.AutoField(primary_key=True)
    shop = models.ForeignKey("Shop", to_field="shop_name")
    brand = models.ForeignKey("Brand", to_field="brand_name")
    title = models.TextField(blank=False, null=False)
    description = models.TextField(blank=True, null=True)
views.py
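from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
from django.shortcuts import render


def get_cosmetic(request):
    products = []  # nothing to show until a search is submitted
    if request.method == "POST":
        search_words = request.POST.get("search")
        search_vectors = (
            SearchVector("title", weight="B")
            + SearchVector("description", weight="C")
            + SearchVector("brand__brand_name", weight="A")
        )
        # Rank against the same query that is used for filtering
        search_query = SearchQuery(search_words)
        products = (
            Product.objects.annotate(
                search=search_vectors, rank=SearchRank(search_vectors, search_query)
            )
            .filter(search=search_query)
            .order_by("-rank")
        )
    return render(request, "example.html", {"products": products})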
After GIN:
models.py
from django.contrib.postgres import indexes
from django.contrib.postgres import search as pg_search
from django.db import models


class ProductManager(models.Manager):
    def with_documents(self):
        vector = (
            pg_search.SearchVector("brand__brand_name", weight="A")
            + pg_search.SearchVector("title", weight="A")
            + pg_search.SearchVector("description", weight="C")
        )
        return self.get_queryset().annotate(document=vector)


class Product(TimeStampedModel):
    product_id = models.AutoField(primary_key=True)
    shop = models.ForeignKey("Shop", to_field="shop_name")
    brand = models.ForeignKey("Brand", to_field="brand_name")
    title = models.TextField(blank=False, null=False)
    description = models.TextField(blank=True, null=True)
    search_vector = pg_search.SearchVectorField(null=True)
    objects = ProductManager()

    class Meta:
        indexes = [
            indexes.GinIndex(
                fields=["search_vector"],
                name="title_index",
            ),
        ]

    # update search_vector every time the entry updates
    def save(self, *args, **kwargs):
        super().save(*args, **kwargs)
        if (
            "update_fields" not in kwargs
            or "search_vector" not in kwargs["update_fields"]
        ):
            instance = (
                self._meta.default_manager
                .with_documents().get(pk=self.pk)
            )
            instance.search_vector = instance.document
            instance.save(update_fields=["search_vector"])
views.py
def get_cosmetic(request):
    if request.method == "GET":
        pass
    else:
        search_words = request.POST.get('search')
        products = ?????????
    return render(request, 'example.html', {"products": products})

Answering my own question:
products = (
    Product.objects.annotate(rank=SearchRank(F("search_vector"), search_words))
    .filter(search_vector=search_words)
    .order_by("-rank")
)
This means you should search the indexed field, which in my case is the search_vector field.
I have also changed the ProductManager class a bit, so now I can just use
products = Product.objects.with_documents(search_words)
where with_documents() is a custom method of the custom ProductManager. The recipe for this change is here (page 29).
What all this code does:
creates a search_vector with a weight for each field; a match in a field with a higher weight ranks higher in the results.
creates a GIN index for full-text search via the Django ORM
updates the search_vector (and with it the GIN index) every time a model instance is saved
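To make the effect of the weight labels concrete, here is a toy sketch in plain Python. It is not PostgreSQL's actual ts_rank algorithm: it only assumes PostgreSQL's default numeric values for the labels D, C, B, A (0.1, 0.2, 0.4, 1.0) and sums the weights of the fields that contain the search term. The toy_rank function and both dictionaries are hypothetical names used for illustration.

```python
# PostgreSQL's default numeric values for the weight labels D, C, B, A.
DEFAULT_WEIGHTS = {"D": 0.1, "C": 0.2, "B": 0.4, "A": 1.0}

# Weight labels as assigned in ProductManager.with_documents().
FIELD_LABELS = {"brand__brand_name": "A", "title": "A", "description": "C"}


def toy_rank(matching_fields):
    """Toy score: sum the weights of the fields that contain the term."""
    return sum(DEFAULT_WEIGHTS[FIELD_LABELS[f]] for f in matching_fields)


title_hit = toy_rank(["title"])              # term found only in the title
description_hit = toy_rank(["description"])  # term found only in the description
print(title_hit > description_hit)
```

Under these assumptions a product that matches in the title sorts above one that matches only in the description, which is the ordering that order_by("-rank") relies on.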
What this code doesn't do:
It doesn't sort by the relevance of a queried substring. Possible solution.
Hope this helps somebody with a somewhat complicated full-text search in Django.

Related

Alternatives for withFilterExpression for supporting composite key

I'm trying to query DynamoDB through withFilterExpression. I get an error because the attribute in the filter is part of the composite key:
Filter Expression can only contain non-primary key attributes: Primary key attribute: question_id
The expression also uses the OR operator, so it cannot be passed to withKeyConditionExpression either.
The expression passed to withFilterExpression is similar to question_id = 1 OR question_id = 2. The entire code is as follows:
import scala.collection.mutable.ArrayBuffer
import com.amazonaws.services.dynamodbv2.document.spec.QuerySpec

def getQuestionItems(conceptCode : String) = {
  val qIds = List("1","2","3")
  val hash_map : java.util.Map[String, Object] = new java.util.HashMap[String, Object]()
  var queries = ArrayBuffer[String]()
  hash_map.put(":c_id", conceptCode)
  // one placeholder per question id: ":qId0", ":qId1", ...
  for ((qId, index) <- qIds.zipWithIndex) {
    val placeholder = ":qId" + index
    hash_map.put(placeholder, qId)
    queries += "question_id = " + placeholder
  }
  val query = queries.mkString(" or ")
  val querySpec = new QuerySpec()
    .withKeyConditionExpression("concept_id = :c_id")
    .withFilterExpression(query)
    .withValueMap(hash_map)
  questionsTable.query(querySpec)
}
Apart from withFilterExpression and withKeyConditionExpression, are there any other methods on QuerySpec that I can use?
Let's raise things up a level. With a Query (as opposed to a GetItem or Scan) you provide a single partition key (PK) value and optionally a sort key (SK) condition. That's what a Query requires. You can't provide multiple PK values. If you want multiple PK values, you can do multiple Query calls. Or you may consider a Scan across all PK values.
You can also consider having a GSI that presents the data in a format more suitable to efficient lookup.
Side note: With PartiQL you can actually specify multiple PK values, up to a limit. So if you really truly want this, that's a possibility. The downside is it raises things up to a new level of abstraction and can make inefficiencies hard to spot.
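The "multiple Query calls" option can be sketched as follows, in Python rather than the asker's Scala. This is a minimal sketch and assumes that concept_id is the partition key and question_id is the sort key, which the question does not state outright. The run_query callable is a hypothetical stand-in for whatever actually sends a Query to DynamoDB; here it is replaced by an in-memory fake so the sketch runs on its own.

```python
from typing import Any, Callable, Dict, Iterable, List

# Hypothetical signature for "run one Query": takes a key-condition
# expression and its value map, returns the matching items.
QueryFn = Callable[[str, Dict[str, Any]], List[Dict[str, Any]]]


def query_questions(run_query: QueryFn, concept_code: str,
                    question_ids: Iterable[str]) -> List[Dict[str, Any]]:
    """Issue one Query per question id and concatenate the results."""
    items: List[Dict[str, Any]] = []
    for q_id in question_ids:
        items.extend(run_query(
            "concept_id = :c_id AND question_id = :q_id",
            {":c_id": concept_code, ":q_id": q_id},
        ))
    return items


# In-memory stand-in for the table, used only to exercise the helper.
FAKE_TABLE = [
    {"concept_id": "c1", "question_id": "1"},
    {"concept_id": "c1", "question_id": "2"},
    {"concept_id": "c1", "question_id": "9"},
    {"concept_id": "c2", "question_id": "1"},
]


def fake_run_query(expression: str, values: Dict[str, Any]) -> List[Dict[str, Any]]:
    # Ignores the expression text and matches on the two bound values.
    return [row for row in FAKE_TABLE
            if row["concept_id"] == values[":c_id"]
            and row["question_id"] == values[":q_id"]]


found = query_questions(fake_run_query, "c1", ["1", "2", "3"])
print([row["question_id"] for row in found])
```

Because every condition is on key attributes, no filter expression is needed, at the cost of one round trip per question id.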

Combining two search/filter functions to render template in flask pymongo

I have a text search based on a text index of my MongoDB collection 'locations', so a user can search for results by town/city.
I also have a dropdown select where the user can select a 'type'.
Both functions return the desired result, but they work independently. I don't know enough to understand how to combine them and render the results. Any help much appreciated.
@app.route("/search")
query = request.form.get("query")
search = mongo.db.properties.find({"$text": {"$search": query}})
Dropdown Filter:
@app.route("/search", methods=["GET", "POST"])
def search():
    # query the database for all property types
    featured = mongo.db.properties.find()
    types = mongo.db.type.find()
    query = request.form.get("query")
    filtered_result = []
    # gets property type input from dropdown select and renders results
    if request.method == "POST":
        property_types = request.form.get("propertytype")
        filtered_result = list(mongo.db.properties.find({'property_type' : property_types}))
    return render_template("properties.html", types=types, properties=filtered_result, featured=featured)
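One possible way to combine the two, offered as a sketch rather than a tested answer: MongoDB accepts a $text condition alongside ordinary field conditions in the same filter document, so both inputs can go into a single find() call. The helper name build_property_filter is hypothetical; the field name property_type and the form keys are taken from the code above.

```python
from typing import Any, Dict, Optional


def build_property_filter(query: Optional[str],
                          property_type: Optional[str]) -> Dict[str, Any]:
    """Build one MongoDB filter from whichever inputs the user supplied."""
    criteria: Dict[str, Any] = {}
    if query:
        # Uses the collection's text index, as in the text search above.
        criteria["$text"] = {"$search": query}
    if property_type:
        criteria["property_type"] = property_type
    return criteria


print(build_property_filter("london", "flat"))
print(build_property_filter("", "flat"))
print(build_property_filter(None, None))
```

In the view this would be used as mongo.db.properties.find(build_property_filter(request.form.get("query"), request.form.get("propertytype"))). When both inputs are empty the filter is an empty dict, which matches every document.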

How do I create a Document class having a geopoint property in motorengine

Motorengine is a great library for doing async DB operations with MongoDB, but I am wondering how I can do a geospatial query with motorengine.
Since the library doesn't support geo fields, the option I have is to use a 2dsphere index through motor. But it would be really nice to find a way to do it with motorengine.
Can anyone please help me with that?
I fixed the problem like this.
from motorengine.document import Document
import pymongo


class BaseDocument(Document):
    ASCENDING = pymongo.ASCENDING
    DESCENDING = pymongo.DESCENDING
    GEO2D = pymongo.GEOSPHERE  # despite the name, this is the "2dsphere" index type

    def __init__(self, alias=None, **kwargs):
        # no early return on an empty list, so the parent __init__ always runs
        indexes = self.__indexes__ if hasattr(self, "__indexes__") else []

        def ensure_index(index, **spec):
            self.objects.coll(alias).ensure_index(index, **spec)

        for index_spec in indexes:
            ensure_index([index_spec])
        super(BaseDocument, self).__init__(**kwargs)
and then using the index with it:
class Team(BaseDocument):
    __indexes__ = [('location', BaseDocument.GEO2D)]
    name = StringField(required=True)
    location = GeoPointField()
    contact = StringField(required=True)

PonyORM orphaned items when clearing a set that defines one-to-many relation

I have a basic relation defined as follows:
db = Database('sqlite', 'test_db.sqlite', create_db=True)

class WOEID(db.Entity):
    woeid = PrimaryKey(int)
    iso = Optional(str)
    name = Required(str)
    language = Optional(str)
    place_type = Required(str)
    parent_id = Required(int)
    trends = Set('Trend')
    ancestry = Optional(str)

class Trend(db.Entity):
    woeid = Required(int)
    events = Optional(str)
    name = Required(str)
    promoted_content = Optional(str)
    query = Required(str)
    url = Required(str)
    location = Optional(WOEID)

db.generate_mapping(create_tables=True)
Now, I add some items to WOEID.trends within a function decorated with @db_session. This works as expected.
Now I try to update WOEID.trends by first reading an object using
location = WOEID.get(woeid = some_woeid)
later on I issue
location.trends.clear()
to delete the old entries, and then I add new items to the trends set.
In the generated Trend table after this operation I have the added items, but the previous items (cleared from the set) are not deleted; they stay in the database with the 'location' field nulled (they are dereferenced, I guess).
How should I perform the operation outlined above to get rid of the orphaned items?
There are two kinds of one-to-many relationships in PonyORM. The first kind of relationship is when one end of relationship is Set and the other end of relationship is Required. In that case when you remove an item from the collection this item will be deleted. For example we can define two entities Article and Comment in the following way:
class Article(db.Entity):
    author = Required(User)
    text = Required(str)
    comments = Set('Comment')

class Comment(db.Entity):
    author = Required(User)
    text = Required(str)
    article = Required(Article)
In that case, when you perform article.comments.clear(), all comments will be deleted, because the Comment.article attribute is required and a comment cannot exist without an article.
The other kind of relationship is where Comment.article attribute is defined as Optional:
class Comment(db.Entity):
    author = Required(User)
    text = Required(str)
    article = Optional(Article)
In that case a comment can exist without any article, and when you remove the comment from the Article.comments collection it remains in the database, but the Comment.article attribute value is set to NULL.
You can find orphaned items by executing the following query:
select(c for c in Comment if c.article is None)
or, equivalently
Comment.select(lambda c: c.article is None)
In some cases it may be desirable to define attribute as Optional, but perform cascade delete on removing item from the collection. In order to do this, you can specify cascade_delete option for the Set attribute:
class Article(db.Entity):
    author = Required(User)
    text = Required(str)
    comments = Set('Comment', cascade_delete=True)

class Comment(db.Entity):
    author = Required(User)
    text = Required(str)
    article = Optional(Article)
Then, when you call article.comments.clear(), all removed comments will be deleted from the database.

Columns Priority while searching with Lucene.NET

Team,
I have 6 indexed columns to search as below.
Name
Description
SKU
Category
Price
SearchCriteria
Now, while searching, I need to search the "SearchCriteria" column first and then the rest of the columns.
In short: products with a matching "SearchCriteria" should be displayed at the top of the search results.
var parser = new MultiFieldQueryParser(Version.LUCENE_30,
    new[] { "SearchCriteria",
            "Name",
            "Description",
            "SKU",
            "Category",
            "Price"
          }, analyzer);
var query = parseQuery(searchQuery, parser);
var finalQuery = new BooleanQuery();
finalQuery.Add(parser.Parse(searchQuery), Occur.SHOULD);
var hits = searcher.Search(finalQuery, null, hits_limit, Sort.RELEVANCE);
There are 2 ways to do it.
The first method is using field boosting:
During indexing set a boost to the fields by their priority:
Field name = new Field("Name", strName, Field.Store.NO, Field.Index.ANALYZED);
name.Boost = 1;
Field searchCriteria = new Field("SearchCriteria", strSearchCriteria, Field.Store.NO, Field.Index.ANALYZED);
searchCriteria.Boost = 2;
doc.Add(name);
doc.Add(searchCriteria);
This way the score of terms in the SearchCriteria field will be double that of terms in the Name field.
This method is better if you always want SearchCriteria to be more important than Name.
The second method is to use MultiFieldQueryParser boosting during search:
Dictionary<string,float> boosts = new Dictionary<string,float>();
boosts.Add("SearchCriteria",2);
boosts.Add("Name",1);
MultiFieldQueryParser parser = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_30, new[] { "SearchCriteria", "Name" }, analyzer, boosts);
This method is better if you want the boosting to apply only in some scenarios of your application.
You should experiment to see whether the boost values fit your needs (how strongly the priority should apply) and adjust them accordingly.
To keep the example short and readable I used only 2 of your fields, but you should of course use all of them.
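Why the boost value needs tuning can be shown with a toy model in Python. This is a sketch under a strong simplifying assumption: a document's score is just the sum of the boosts of the fields that match (unboosted fields count as 1.0), which is not Lucene's real scoring formula. The document names and the rank function are hypothetical.

```python
from typing import Dict, List

# Toy documents: the fields in which the search term occurs.
DOCS: Dict[str, List[str]] = {
    "criteria_only": ["SearchCriteria"],
    "three_other_fields": ["Name", "Description", "Category"],
}


def rank(boosts: Dict[str, float]) -> List[str]:
    """Order documents by the summed boost of their matching fields."""
    def score(fields: List[str]) -> float:
        return sum(boosts.get(field, 1.0) for field in fields)
    return sorted(DOCS, key=lambda name: score(DOCS[name]), reverse=True)


print(rank({"SearchCriteria": 2.0}))
print(rank({"SearchCriteria": 10.0}))
```

In this toy model a boost of 2 is not enough to put the SearchCriteria match first when another product matches in three ordinary fields, while a boost of 10 is. Real Lucene scores depend on more factors, so the right value has to be found by trying it against your own index.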