Sunspot facet option for an array-type attribute

I have researched this question but found nothing close, which is why I'm asking. Stack Overflow has been a tremendous help for me.
I have a jobs table with a location attribute. Each job.location is either New York, Boston, or both (New York & Boston).
In Sunspot Solr for Rails, how do I create a facet for this attribute so that if a job.location is in both cities, the job is displayed in the results when either New York or Boston is selected in the facet?
Should it be an array like ['New York', 'Boston']?
Thanks!

You should change your design so that you have a separate Location class and set the Job class to have has_and_belongs_to_many :locations because it is a many-to-many relationship.
Then you can define a multi-valued integer field in your Job class's searchable block for the locations:
integer :location_ids, :multiple => true
to allow multiple locations per job. (HABTM gives you location_ids for free; Sunspot needs a method returning integers here, which locations itself would not.)
Now it is easy to add a facet(:location_ids) that will do exactly what you want.
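Put together, a minimal sketch (the :references option, which lets Sunspot hand you back Location instances for each facet row, is an assumption about your setup):
class Job < ActiveRecord::Base
  has_and_belongs_to_many :locations

  searchable do
    # Index the HABTM ids as a multi-valued field
    integer :location_ids, :multiple => true, :references => Location
  end
end

search = Job.search do
  facet :location_ids
end
search.facet(:location_ids).rows.each do |row|
  puts "#{row.instance.name} (#{row.count})"  # row.instance is a Location
end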
The above works for any number of locations, not only two. However, if you don't want to create a new class/table and are sure you only have one or two locations, you can just define a virtual field in the searchable block. Note that it has to be a string field, since the block returns location names rather than ids:
string :locations, :multiple => true do
  if location == 'NY & Boston'
    ['NY', 'Boston']
  else
    [location]
  end
end


How can I match up user inputs to ambiguous city names?

We have a set of location tables (city, state, and so on) that our other tables reference for location data. Some example uses:
Find all companies within X miles of X City
Create a company profile's location as X City
We solve the problem of multiple cities with similar names by matching with State as well, but now we ran into a different set of problems. We use Google's Place Autocomplete for both Geocoding and matching up a users query with our Cities. This works fairly well until Google's format deviates from ours.
Example:
St. Louis !== Saint Louis and
Ameca del Torro !== Ameca Torro
Is there a way to fuzzy match cities in our queries?
Our query to match cities now looks like:
SELECT c.id
FROM city c
INNER JOIN state s
ON s.id = c.state_id
WHERE c.name = 'Los Angeles' AND s.short_name = 'CA'
I've also considered denormalizing city data and simply storing coordinates, which would still allow the radius search. We have around 2 million rows in our company table now, so a radius search would be performed on that table rather than on the city table with a JOIN to company. It would also mean we couldn't (simply, anyway) create custom regions for cities or add other attributes to cities in the future.
I found this answer, but it basically affirms that our way of normalizing input is a good method; it doesn't cover how we match against our local table (unless Google offers a city-name export I don't know about).
The short answer is that you can use Postgres's full text search functionality, with a customized search configuration.
Since you're dealing with place names, you probably want to avoid stemming, so you can use the simple configuration as a starting point. You can also add stop-words that make sense for place names (with the examples above, you can probably consider "St.", "Saint", and "del" as stop-words).
A pretty basic outline of setting up your customized configuration is below:
1. Create a stopwords file and put it in your $SHAREDIR/tsearch_data Postgres directory. See https://www.postgresql.org/docs/9.1/static/textsearch-dictionaries.html#TEXTSEARCH-STOPWORDS.
2. Create a dictionary that uses this stopwords list (you can probably use pg_catalog.simple as your template dictionary). See https://www.postgresql.org/docs/9.1/static/textsearch-dictionaries.html#TEXTSEARCH-SIMPLE-DICTIONARY.
3. Create a search configuration for place names. See https://www.postgresql.org/docs/9.1/static/textsearch-configuration.html.
4. Alter your search configuration to use the dictionary you created in step 2 (cf. the link above).
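In SQL, steps 2 through 4 might look roughly like this (a sketch; the names places_dict and places are made up, and it assumes a places.stop stopwords file already exists in $SHAREDIR/tsearch_data):
-- Step 2: a simple-template dictionary that uses the places stopwords file
CREATE TEXT SEARCH DICTIONARY places_dict (
    TEMPLATE = pg_catalog.simple,
    STOPWORDS = places
);
-- Step 3: a search configuration for place names, copied from 'simple'
CREATE TEXT SEARCH CONFIGURATION places (COPY = simple);
-- Step 4: route the word-like token types through the new dictionary
ALTER TEXT SEARCH CONFIGURATION places
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, hword_part
    WITH places_dict;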
Another consideration is internationalization. It seems that the issue with your second example (Ameca del Torro vs. Ameca Torro) might be a Spanish vs. English representation of the name. If that's the case, you could also consider storing both a "localized" and a "universal" (e.g. English) version of the city name.
At the end, your query (using full-text search) might look like this (where 'places' is the name of your search configuration):
SELECT cities."id"
FROM cities
INNER JOIN "state" ON "state".id = cities.state_id
WHERE
    "state".short_name = 'CA'
    AND TO_TSVECTOR('places', cities.name) @@ TO_TSQUERY('places', 'Los & Angeles')

REST API structure for multiple countries

I'm designing a REST API where you can search for data in different countries, but since you can search for the same thing, at the same time, in different countries (max 4), I'm unsure of the best/correct way to do it.
This would work to start with to get data (I'm using cars as an example):
/api/uk,us,nl/car/123
That request could return different ids for the different countries (uk=1,us=2,nl=3), so what do I do when data is requested for those 3 countries?
For a nice structure I could get the data one at a time:
/api/uk/car/1
/api/us/car/2
/api/nl/car/3
But that is not very efficient since it hits the backend 3 times.
I could do this:
/api/car/?uk=1&us=2&nl=3
But that doesn't work very well if I want to add to that path:
/api/uk/car/1/owner
Because that would then turn into:
/api/car/owner/?uk=1&us=2&nl=3
Which doesn't look good.
Anyone got suggestions on how to structure this in a good way?
I answered a similar question before, so I will stick to that idea:
You have a set of elements (cars) and you want to filter it in some way. My advice is to add any filter as a query parameter. If the parameter is not present, then choose one country based on the client's locale:
mydomain.com/api/v1/car?countries=uk,us,nl
This parameter should disappear when you look for a specific car or its owner
mydomain.com/api/v1/car/1/owner
because the country is not needed (unless the car ID 1 is reused for each country)
Update:
I really did not expect that a car's ID could be shared by several cars; an ID should be unique (like a primary key in a database). In that case, it makes sense to keep the country parameter in the owner search:
mydomain.com/api/v1/car/1/owner?countries=uk,us
This should return a list of people who own a car with the ID 1... but to me this makes little sense as functionality, so in this search I would allow only one country:
mydomain.com/api/v1/car/1/owner?country=uk
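As a rough sketch of how the server side might handle that country parameter (Sinatra-style Ruby; the framework choice and both helper names are assumptions, not part of the question):
require 'sinatra'
require 'json'

get '/api/v1/car/:id/owner' do
  # Fall back to one country derived from the client's locale when no
  # parameter is given; country_from_locale and find_owner are hypothetical.
  country = params['country'] || country_from_locale(request)
  find_owner(params['id'], country).to_json
end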

Reporting Services and Dynamic Fields

I'm new to Reporting Services, so this question might be insane. I am looking for a way to create an empty 'template' report (basically a form letter) rather than having to create one for every client in our system. Part of this form letter is a section that can contain any subset of 25 specific fields. The section is arranged as such:
Name: Jesse James
Date of Birth: 1/1/1800
Address: 123 Blah Blah Street
Anywhere, USA 12345
Another Field: Data
Another Field2: More Data
Those (and any of the other fields the client specifies) could be arranged in any order, and the label on the left could be whatever the client decides (example: 'DOB' instead of 'Date of Birth'). Ideally, I'd like a web interface where you can click on the fields you want, specify the order in which they'll appear, and specify what the custom label is. I figured out a way to specify the labels and order them (and load them 'dynamically' in the report), but I wanted to take it one step further if I could and allow dynamic field (right side) selection and ordering. The catch is, I want to do this without using dynamic SQL.
I went down the path of having a configuration table that contained an ordinal, custom label text, and the actual column name, and attempting to join that table with the table that actually contains the data via information_schema.columns. Maybe querying ALL of the potential fields and having an INNER JOIN do my filtering (if there's a match from the 'configuration' table, etc). That doesn't work like I thought it would :) I guess I was thinking I could simulate the functionality of a dataset (it having the value and field name baked into the object). I realize that this isn't the optimal tool to be attempting such a feat; it's just what I'm forced to work with.
The configuration table would hold the configuration for many customers/reports and I would be filtering by a customer ID. The config table would look something like this:
CustID  LabelText      ColumnName  Ordinal
1       First Name     FName       1
1       Last Name      LName       2
1       Date of Birth  DOBirth     3
2       Client ID      ClientID    1
2       Last Name      LName       2
2       Address 1      Address1    3
2       Address 2      Address2    4
All that to say:
Is there a way to pull off the above-mentioned query?
Am I being too picky about avoiding dynamic SQL, given that the section in question will only pull back one row? (There are, however, hundreds of clients running this report/letter two or three times a day.)
Also, keep in mind I am not trying to dynamically create text boxes on the report. I will either concatenate the fields into a single string and dump that into a text box, or have multiple reports, each with a set number of text boxes expecting a generic field name ("field1", etc). The more I type, the crazier this sounds...
If there isn't a way to do this I'll likely finagle something in custom code; but my OCD side wants to believe there is SQL beyond my current powers that can do this in a slicker way.
Not sure why you need this all returned in one row: it seems like SSRS would want this normalized further, returning a row for every row in the configuration table for the current report. If you really need to concatenate, then do that in embedded code in the report, or consider just putting a table in the form letter. The query below makes some assumptions about your configuration table. Does it only hold the configuration for the current report, or does it hold the config for many customers/reports at once? Also, you didn't give much info about how you'll filter to the appropriate record, so I just used a customer ID.
SELECT
config.ordinal,
config.LabelText,
CASE config.ColumnName
WHEN 'FName' THEN DataRecord.FirstName
WHEN 'LName' THEN DataRecord.LastName
WHEN 'ClientID' THEN DataRecord.ClientID
WHEN 'DOBirth' THEN DataRecord.DOB
WHEN 'Address' THEN DataRecord.Address
WHEN 'Field' THEN DataRecord.Field
WHEN 'Field2' THEN DataRecord.Field2
ELSE
NULL
END AS response
FROM
ConfigurationTable AS config
LEFT OUTER JOIN
DataTable AS DataRecord
ON config.CustID = DataRecord.CustomerID
WHERE DataRecord.CustomerID = @CustID
ORDER BY
config.Ordinal
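Given the sample configuration above, customer 1 would get back rows like:
Ordinal  LabelText      response
1        First Name     Jesse
2        Last Name      James
3        Date of Birth  1/1/1800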
There are other ways to do this, in SSRS or in SQL, depends on more details of your requirements.

Most efficient database schema for counting keywords

I'm working on an iPhone app with a GAE backend. I currently have a database of ~8000 products and each product has 5 keywords, mined from reviews, that are the words used most often to describe the product. Once I deploy the app, I'd like to allow users to add new products, and add their 5 keywords to existing products. So, when "reviewing" an existing product, they would add their 5 words, and these would be reflected in the Top 5 words if they push a word over into the Top 5. These keywords will be selected via a large whitelist with indirect selection so I can control the user input. I'd like the application to scale to thousands of users without hitting my backend too hard.
My question is:
What's the most efficient database schema for keeping track of all the words for a product and calculating the top 5 for each product once it's updated?
My two ideas (which may be terrible):
Have a "words" column which contains a 2d array, one dimension is the word, the other is the count for that word. They would then be incremented/decremented as needed.
Have a database with each word as a column and each product as a row and the corresponding row/column would contain the count.
The easiest way to do this would be to have a 'tags' kind, defined something like this (you haven't specified a backend language, so I'm assuming Python):
class Tag(db.Model):
    # Tags should be child entities of Products and have a key name based
    # on the tag, eg created with Tag(parent=a_product, key_name='awesome', ...)
    count = db.IntegerProperty(required=True, default=0)

    @classmethod
    def increment_tags(cls, product, tag_names):
        def _tx():
            tags = cls.get_by_key_name(tag_names, parent=product)
            for i, tag in enumerate(tags):
                if tag is None:
                    # New tag
                    tags[i] = tag = cls(key_name=tag_names[i], parent=product)
                tag.count += 1
            db.put(tags)
        return db.run_in_transaction(_tx)

    @classmethod
    def get_top_product_tags(cls, product, num=5):
        return [x.key().name() for x
                in cls.all().ancestor(product).order('-count').fetch(num)]
The increment_tags method increments the count property on all the relevant tags. Since they all have the same parent entity, they're in the same entity group, so it can update them all in a single transaction.
The get_top_product_tags method does a simple datastore query to find the num top ranked tags for a product.
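Usage would look something like this (assuming product is a previously stored Product entity):
# Record one user's five keywords for a product, then read back the top 5
Tag.increment_tags(product, ['fast', 'cheap', 'light', 'small', 'loud'])
top_five = Tag.get_top_product_tags(product)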
You should use a normalized schema and let SQL and the database engine be your friend. Have a single table with a design like this:
create table KeywordUse
( AppID int
, UserID int
, Sequence int
, Word varchar(50) -- or whatever makes sense
)
You can also have an identity primary key if you like, but AppID + UserID + Sequence is a candidate key (i.e. the combination of these three must be unique).
To find the top 5 keywords for any app, do a SQL query like this:
select top 5
count(AppID) as Frequency -- If you have an identity PK count that instead.
, Word
from KeywordUse
where AppID = @AppIDVariable...
group by Word, AppID
order by count(AppID) desc
If you are really, really worried about performance you could denormalize the results of this query into a table that shows the words for each app. Then you'd have to work out how often to refresh that snapshot.
REVISED ANSWER:
As Nick Johnson so generously pointed out, aggregate functions are not available in GQL. However, the philosophy of my answer remains unchanged. Let the database engine do its job.
The table should be AppID, Word, and Frequency. (AppID and Word are the PK.) Each use of a word would be added to its Frequency as it is applied. Then, when you want to know the top five words for an app, you select by AppID, order by Frequency (descending), and apply a LIMIT of 5.
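In GQL, that lookup might look like this (a sketch; the kind name KeywordFrequency is made up):
SELECT * FROM KeywordFrequency WHERE AppID = :app_id ORDER BY Frequency DESC LIMIT 5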
You would need a separate table to track user keywords if that is important.

How to order documents by a dynamic property in Mongoid

I am using Mongoid to store a series of geocoded listings. These listings need to be sorted by price and proximity. The price of every listing is a field in the database whereas distance is a dynamic property that is unique for every user.
class Listing
  include Mongoid::Document
  field :price

  def distance
    get_distance(current_user.location, coordinates)
  end
end
How can I sort these documents by distance? I tried @listing.desc(:distance) and that didn't work.
The short (and unhelpful) answer is: you can't.
Mongoid does have the ability to query based on 2d coordinates, though, so you could update your controller to do something like this:
@listings = Listing.near(current_user.location)
Which I believe will return your listings in order of distance.
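For that to work, Listing needs stored coordinates and a 2d geospatial index. A minimal sketch (the coordinates field name and the Mongoid 3+ index syntax are assumptions about your setup):
class Listing
  include Mongoid::Document
  field :price
  field :coordinates, type: Array  # [longitude, latitude]

  # MongoDB needs a 2d index before it will serve $near queries
  index({ coordinates: "2d" })
end

# After creating the index (e.g. rake db:mongoid:create_indexes):
@listings = Listing.near(coordinates: current_user.location)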
On a side note, I noticed that your Listing model is referring to your current_user object, which kinda breaks the MVC architecture, since your models shouldn't know anything about the current session.