Let's say I want to create two separate indexes on something like BlogPosts, so that I can do a quick search using one index (for autocomplete, for example) and then use the other index for full-blown search querying.
Is that something I can do with Tire?
Something like this (forgive me if it's a little primitive):
class Post < ActiveRecord::Base
  include Tire::Model::Search
  include Tire::Model::Callbacks

  index_name 'autocomplete'
  mapping do
    indexes :title, :analyzer => 'my_ngram_analyzer'
  end

  index_name 'main'
  mapping do
    indexes :title
    indexes :description
    indexes :author
    indexes :published_on
  end
end
where the callbacks know to add and remove posts from the appropriate indexes.
You can't do this in Tire: the mapping DSL method doesn't support setting two separate indices (and mappings) in one class.
It may be a good idea to use two separate indices, one for auto-completion, another for search. There's a nice tutorial, StackOverflow answer and even an elasticsearch plugin to get you started.
However, unless you have a lot of data, you should be able to achieve that even with a single index, using the multi_field type, a match query across multiple fields, and potentially an NGram-based analyzer.
Check out the Autocomplete with Tire tutorial which outlines the approach.
Cross-post from https://github.com/karmi/tire/issues/531
I would like to get specific neuron models, and even though I believe I understand the RMA query system, I cannot find a list of the valid keywords/arguments/criteria/parameters that would correspond to what I am looking for.
For example 'homo sapiens' as donor species is valid, and makes sense.
But if 'm__biophys_perisomatic' returns all cells with perisomatic biophysical models, what about 'all active' ones (just an example, I would be interested in many other categories)?
I assume it is obvious but I will not stumble upon it until I have posted this question.
Thanks for your question. You can see what fields and associations are available for a table using the describe route. For example:
http://api.brain-map.org/api/v2/data/NeuronalModel/describe.xml
From your question, I believe you're looking at this table:
http://api.brain-map.org/api/v2/data/ApiCellTypesSpecimenDetail/describe.xml
You can use m__biophys_all_active to see if a cell in that table has an all-active model.
FYI: The ApiCellTypesSpecimenDetail table is a denormalized table, which means it combines a complex set of relationships among tables into a single flat table.
You could similarly use the following, more generic query to find the all-active models.
http://api.brain-map.org/api/v2/data/query.xml?criteria=model::NeuronalModel,rma::criteria,neuronal_model_template[name$eq'Biophysical - all active']&num_rows=150
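If you want to build such query URLs programmatically, a minimal sketch could look like the following. The helper name neuronalModelQueryUrl is made up for illustration and is not part of any Allen Institute library; the endpoint and criteria string are copied from the query above, and the sketch assumes the API accepts the percent-encoded form of the criteria.

```javascript
// Endpoint copied from the example query above.
const RMA_QUERY_ENDPOINT = "http://api.brain-map.org/api/v2/data/query.xml";

// Hypothetical helper (not part of any Allen Institute SDK): builds an RMA
// query URL for NeuronalModel rows whose template has the given name.
function neuronalModelQueryUrl(templateName, numRows) {
  const criteria =
    "model::NeuronalModel,rma::criteria," +
    "neuronal_model_template[name$eq'" + templateName + "']";
  // encodeURI percent-encodes spaces and brackets but keeps ':' ',' '$' and "'".
  return RMA_QUERY_ENDPOINT + "?criteria=" + encodeURI(criteria) +
    "&num_rows=" + numRows;
}

const allActiveUrl = neuronalModelQueryUrl("Biophysical - all active", 150);
console.log(allActiveUrl);
```

Decoding the result with decodeURI gives back the query shown above, so only the encoding differs.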
I'm new to NoSQL and views. I'm wondering if someone could show me how to build an index that returns all the documents that match a given key, for multiple different keys. An example is below.
I have many many documents that all have the naming convention as follows:
AABA_August-11-2017_2017-06-29_10
BBY_August-11-2017_2017-06-29_10
CECO_January-19-2018_2017-06-08_19
GEL_December-15-2017_2017-06-08_1
Etc..
I'd like a view such that I could query on "starts with BBY" for example. And it would return all the documents that start with BBY. Maybe even "BBY_December", "BBY_August" etc.
Wondering if this is possible and what it would look like. I'm using CouchDB which uses Mango to build indexes. If someone could just point me in the right direction that would help too.
Thanks
You could write such a view like this:
function(doc) {
    var docId = doc._id;
    var p = docId.substring(0, 3); // Or however many chars you want
    if (p === 'BBY') emit(doc._id, doc); // Or whatever kind of key you want
}
And then write similar views for alternate prefixes. You can also use query parameters similar to the _all_docs endpoint with views (http://docs.couchdb.org/en/2.0.0/api/ddoc/views.html).
I think the only benefit of using a view instead of what you have done is that you can filter unnecessary fields, do some basic transformations, etc.
Considering the similarities between retrieval from _all_docs and from a view, it looks like the _all_docs endpoint is just an index similar to a custom view. But I'm not sure about that.
Not sure how to use Mango to do the same.
My current naming convention required no new index. I used Futon to find:
ip:port/DB/_all_docs?inclusive_end=true&start_key="BBY_Aug"&end_key="BBY_Auh"
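The same range trick can be generalized to any prefix. The sketch below is only an illustration: the helper names are made up, and it uses "\ufff0" as an upper sentinel, which is a common convention for CouchDB range queries rather than something taken from this thread. The in-memory filter uses plain JavaScript string comparison, which only approximates CouchDB's collation.

```javascript
// Builds start/end keys that bracket every string beginning with `prefix`.
// "\ufff0" is a very high code point used here as an upper sentinel.
function prefixRange(prefix) {
  return { startkey: prefix, endkey: prefix + "\ufff0" };
}

// Builds the query string for _all_docs (or a view); keys are JSON-encoded.
function prefixQuery(prefix) {
  const range = prefixRange(prefix);
  return "startkey=" + encodeURIComponent(JSON.stringify(range.startkey)) +
    "&endkey=" + encodeURIComponent(JSON.stringify(range.endkey));
}

// In-memory stand-in for the server-side range scan (approximate collation).
function idsWithPrefix(ids, prefix) {
  const range = prefixRange(prefix);
  return ids.filter((id) => id >= range.startkey && id <= range.endkey);
}

const sampleIds = [
  "AABA_August-11-2017_2017-06-29_10",
  "BBY_August-11-2017_2017-06-29_10",
  "CECO_January-19-2018_2017-06-08_19",
  "GEL_December-15-2017_2017-06-08_1",
];
console.log(idsWithPrefix(sampleIds, "BBY_"));
console.log(prefixQuery("BBY_Aug"));
```

Compared with hand-picking an end key such as "BBY_Auh", the sentinel avoids having to work out the next character for each prefix.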
Short description of the situation:
We're running a forked version of Sulu 1.5.2, PHP 7.1, Windows server environment, db connection with PostgreSQL
We have a website structure/tree where we have house templates at the top level; each house has one house_rooms and one house_occupants template; each house_rooms template has N house_rooms_room templates, and each house_occupants template has N house_occupants_occupant templates. This represents an actual House that has N Rooms and N Occupants.
Now I'd like to know if there is a way to specifically get, for instance, all the house_occupants_occupant content that follows a certain pattern of attributes (for instance: their gender attribute having value 'female' and their date_of_birth parameter being >= 1990/01/01), without having to load each house, then find its house_occupants page among the children, and then loop over that template's house_occupants_occupant children and filter the resulting content according to their gender and date of birth attributes.
I already found that there is a ContentRepository class that can ::findAll() and ::findByUuids(), but there doesn't seem to be a way to filter on specific attributes (like template type, template attributes, ...). So I took a roundabout way of creating my own "repository" that does direct PDO queries on the phpcr_nodes table in the database, to specifically scan the props attribute for the occurrence of a certain template name:
$this->pdo->query("SELECT identifier, props FROM phpcr_nodes WHERE props LIKE '%>house_occupants_occupant<%'");
I can see that props contains a string value representing an XML document that somehow translates into the entire template with attribute-value pairs; however, it is obscure regarding tag levels and how certain attributes relate to certain values. So in theory I could use a specific XML parser to turn this into something human-readable, so that for my house_occupants_occupant data I could get something like:
// what I would get after putting the props through a certain XML parser:
$xmlHumanReadableData = [
    '<the_uuid_of_occupant_1>' => [
        ...
        'gender' => 'female',
        'date_of_birth' => '1992-05-18T00:00:00.000+00:00',
        ...
    ],
    ... //etcetera etcetera
];
Once I have that, I could filter the readable data to determine which content I want to keep, add the node UUIDs to some $theUuids variable, and then retrieve the actual content using Sulu's ContentRepository::findByUuids($theUuids) method. That would "only" require 2 queries and some PHP array filtering in between, which is a great deal better than looping over all the child content starting from a certain parent and doing this until you've traversed all the parents and all their children... (Certainly, the overhead would increase if you wanted to search for, for instance, all house nodes where at least one of its house_occupants_occupant nodes represents a child less than 10 years old, since you'd need extra queries to "set up" the filter data used in the final query. But still: a great deal better than looping over everything... ;-) )
So my question is sort-of twofold:
What is the Sulu-specific XML parser I can use to turn the XML string value in this props column into something human-readable, with proper attribute-value pairs?
And/or, hopefully: is there a way I can avoid all this nonsense and just use a less low-level way of retrieving content of a specific template type with specific values for specific attributes?
The ContentRepository you've found is already an abstraction built for some of our own requirements for pages. Your requirements are quite specific, so you should write your own query using JCR-SQL2, the query language for PHPCR.
This should enable you to write a query which matches your requirements.
We have nested categories for several products (e.g., Sports -> Basketball -> Men's, Sports -> Tennis -> Women's ) and are using Mongo instead of MySQL.
We know how to store nested categories in a SQL database like MySQL, but would appreciate any advice on what to do for Mongo. The operation we need to optimize for is quickly finding all products in one category or subcategory, which could be nested several layers below a root category (e.g., all products in the Men's Basketball category or all products in the Women's Tennis category).
This Mongo doc suggests one approach, but it says it doesn't work well when operations are needed for subtrees, which we need (since categories can reach multiple levels).
Any suggestions on the best way to efficiently store and search nested categories of arbitrary depth?
The first thing you want to decide is exactly what kind of tree you will use.
The big thing to consider is your data and access patterns. You have already stated that 90% of all your work will be querying and by the sounds of it (e-commerce) updates will only be run by administrators, most likely rarely.
So you want a schema that lets you query quickly for children by path, e.g. Sports -> Basketball -> Men's or Sports -> Tennis -> Women's, and that doesn't really need to scale well for updates.
As you so rightly pointed out, MongoDB does have a good documentation page for this: https://docs.mongodb.com/manual/applications/data-models-tree-structures/ in which 10gen describe different models and schema methods for trees, along with the main ups and downs of each.
The one that should catch the eye if you are looking to query easily is materialised paths: https://docs.mongodb.com/manual/tutorial/model-tree-structures-with-materialized-paths/
This is a very interesting way to build up trees, since to query the example you gave above, "Womens" in "Tennis", you could simply use a prefix regex (which can use the index: http://docs.mongodb.org/manual/reference/operator/regex/ ) like so:
db.products.find({category: /^Sports,Tennis,Womens[,]/})
to find all products listed under a certain path of your tree.
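If the path comes from user input or from application code, the regex can be built rather than typed by hand. This is only a sketch: the helper names escapeRegex and subtreeRegex are made up, and it assumes category names are joined with commas exactly as in the query above.

```javascript
// Escapes characters that have a special meaning inside a regular expression.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Builds an anchored regex matching every path strictly below the given
// category path, in the same shape as the query above.
function subtreeRegex(pathSegments) {
  return new RegExp("^" + pathSegments.map(escapeRegex).join(",") + "[,]");
}

const womensTennis = subtreeRegex(["Sports", "Tennis", "Womens"]);
console.log(womensTennis.source);
console.log(womensTennis.test("Sports,Tennis,Womens,Shoes"));
console.log(womensTennis.test("Sports,Tennis,WomensWear"));
console.log(womensTennis.test("Sports,Tennis,Womens"));
```

Note that the trailing [,] means a value that is exactly "Sports,Tennis,Womens" does not match, so products filed directly under that category would need either a trailing comma on their stored path or an additional equality condition.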
Unfortunately this model is really bad at updating: if you move a category or change its name you have to update all of its products, and there could be thousands of products under one category.
A better method would be to house a cat_id on the product and then separate the categories into a separate collection with the schema:
{
    _id: ObjectId(),
    name: 'Women\'s',
    path: 'Sports,Tennis,Womens',
    normed_name: 'all_special_chars_and_spaces_and_case_sensitive_letters_taken_out_like_this'
}
So now your queries only involve the categories collection, which should make them much smaller and more performant. The exception to this is deleting a category: the products will still need touching.
So an example of changing "Tennis" to "Badmin":
db.categories.find({path: /^Sports,Tennis[,]/}).forEach(function(doc){
    // Rewrite the path client side, then write the document back
    doc.path = doc.path.replace(/,Tennis/, ",Badmin");
    db.categories.save(doc);
});
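The string replace on ",Tennis" would also touch a segment such as ",TennisTable", so the path rewrite itself may deserve a stricter helper. The function below is a hypothetical sketch (the name renamePathPrefix is not from any library) that only rewrites paths equal to, or nested under, the old prefix.

```javascript
// Rewrites `path` only if it is exactly `oldPrefix` or lies below it.
// Paths are comma-separated category names, as in the schema above.
function renamePathPrefix(path, oldPrefix, newPrefix) {
  if (path === oldPrefix || path.startsWith(oldPrefix + ",")) {
    return newPrefix + path.slice(oldPrefix.length);
  }
  return path; // unrelated path: leave untouched
}

console.log(renamePathPrefix("Sports,Tennis,Womens", "Sports,Tennis", "Sports,Badmin"));
console.log(renamePathPrefix("Sports,TennisTable", "Sports,Tennis", "Sports,Badmin"));
```

Inside the forEach callback this would replace the doc.path.replace(...) line, and the same helper also covers the "Sports,Tennis" category document itself.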
Unfortunately MongoDB provides no in-query document reflection at the moment, so you do have to pull the documents out client side, which is a little annoying. Hopefully it shouldn't result in too many categories being brought back.
And this is basically how it works really. It is a bit of a pain to update but the power of being able to query instantly on any path using an index is more fitting for your scenario I believe.
Of course, the added benefit is that this schema is compatible with nested set models (http://en.wikipedia.org/wiki/Nested_set_model), which I have found time and time again are just awesome for e-commerce sites. For example, Tennis might be under both "Sports" and "Leisure", and you want multiple paths depending on where the user came from.
The schema for materialised paths easily supports this by just adding another path, that simple.
Hope it makes sense, quite a long one there.
If all categories are distinct then think of them as tags. The hierarchy isn't necessary to encode in the items because you don't need it when you query for items. The hierarchy is a presentational thing. Tag each item with all the categories in its path, so "Sport > Baseball > Shoes" could be saved as {..., categories: ["sport", "baseball", "shoes"], ...}. If you want all items in the "Sport" category, search for {categories: "sport"}; if you want just the shoes, search for {categories: "shoes"}.
This doesn't capture the hierarchy, but if you think about it that doesn't matter. If the categories are distinct, the hierarchy doesn't help you when you query for items. There will be no other "baseball", so when you search for that you will only get things below the "baseball" level in the hierarchy.
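A small sketch of the tagging idea, run against in-memory objects rather than a real collection. The helper names and sample items are invented for illustration; the filter mimics how a query like {categories: "sport"} matches any document whose array contains that value.

```javascript
// Turns a display path such as "Sport > Baseball > Shoes" into tag values.
function categoriesFor(displayPath) {
  return displayPath.split(">").map((name) => name.trim().toLowerCase());
}

// In-memory stand-in for find({categories: category}) on an array field.
function findByCategory(items, category) {
  return items.filter((item) => item.categories.includes(category));
}

// Hypothetical sample data.
const items = [
  { name: "Cleats", categories: categoriesFor("Sport > Baseball > Shoes") },
  { name: "Bat", categories: categoriesFor("Sport > Baseball") },
];

console.log(findByCategory(items, "sport").map((item) => item.name));
console.log(findByCategory(items, "shoes").map((item) => item.name));
```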
My suggestion relies on categories being distinct, and I guess they aren't in your current model. However, there's no reason why you can't make them distinct. You've probably chosen to use the strings you display on the page as category names in the database. If you instead use symbolic names like "sport" or "womens_shoes" and use a lookup table to find the string to display on the page (this will also save you hours of work if the name of a category ever changes -- and it will make translating the site easier, if you would ever need to do that), you can easily make sure that they are distinct, because they don't have anything to do with what is displayed on the page. So if you have two "Shoes" in the hierarchy (for example "Tennis > Women's > Shoes" and "Tennis > Men's > Shoes") you can just add a qualifier to make them distinct (for example "womens_shoes" and "mens_shoes", or "tennis_womens_shoes"). The symbolic names are arbitrary and can be anything; you could even use numbers and just use the next number in the sequence every time you add a category.
I have a database of restaurants which I do a full-text search on. The code looks something like this:
SELECT * FROM restaurant WHERE restaurant.search_vector @@ plainto_tsquery(:terms);
And search_vector is defined like this:
alter table restaurant add column search_vector tsvector;
create index restaurant_search_index on restaurant using gin(search_vector);
create trigger restaurant_search_update before update or insert on restaurant
    for each row execute procedure
    tsvector_update_trigger('search_vector', 'pg_catalog.english', 'title');
Now, a notable problem with this search is the word barbecue. It can be spelled many different ways: barbecue, barbeque, BBQ, B.B.Q., B-B-Q, etc. When somebody searches any of these, I need to search restaurants for all of these terms.
From what I've read online, it seems I need to modify the dictionary (That would be pg_catalog.english, right?), but I'm not sure how to go about this.
Sounds like what you want to do is add a synonym dictionary in front of your english one. This will only work on single words though, so you might have problems with B.B.Q. if it gets parsed as three separate tokens.
Synonym dictionaries in postgresql.org docs
When I stumbled over a similar problem I came across the option of query rewriting; see, for example, section 12.4.2.1 of http://www.postgresql.org/docs/8.3/static/textsearch-features.html
This is an easier approach than tackling the dictionary, as it lets you extend your rewrite rules instantly by just inserting new rules into your rewrite table.