How do I return multiple documents that share indexes efficiently? - sphinx

I'm indexing some data using Sphinx. I have objects that are categorised, and the categories have a hierarchy. My basic table structure is as follows:
Objects
| id | name |
| 1 | ABC |
| 2 | DEF |
...
Categories
| id | name | parent_id |
| 1 | My Category | 0 |
| 2 | A Child | 1 |
| 3 | Another Child | 1 |
...
Object_Categories
| object_id | category_id |
| 1 | 2 |
| 2 | 3 |
...
My config currently is:
sql_query = SELECT categories.id, objects.name, parent_id FROM categories \
LEFT JOIN object_categories ON categories.id = object_categories.category_id \
LEFT JOIN objects ON objects.id = object_categories.object_id
sql_attr_uint = parent_id
This returns category IDs for any categories that contain objects matching my search, but I need to adjust it so that I get objects in that category or any of its children.
Obviously, I could UNION this query with another that gets the IDs of the matched categories' parents, and so on (the hierarchy could be up to 4 or 5 levels deep), but this seems hugely inefficient. Is there a way to return multiple document IDs in the first field, or to avoid repeated, needless indexing?
I'm a Sphinx noob, so I'm not sure how to approach the problem.

See
http://www.sitepoint.com/hierarchical-data-database/
It's talking about a database, but the same system works equally well within Sphinx. It can take a while to get your head around, but it's well worth mastering (IMHO!).
(i.e., add the left/right columns to the database, and then include them as attributes in the Sphinx index)
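To make the nested-set (left/right) idea concrete, here is a minimal Python sketch that uses the standard-library sqlite3 module as a stand-in for the real database. It assumes the Objects, Categories and Object_Categories tables from the question; the lft/rgt column names and the helpers assign_nested_set and objects_in_subtree are hypothetical. Each category is numbered by a depth-first walk, after which "this category or any of its descendants" becomes a single BETWEEN test instead of one UNION per level.

```python
import sqlite3


def assign_nested_set(conn):
    """Number categories depth-first so each descendant's lft lies inside its ancestor's [lft, rgt]."""
    children = {}
    for cid, parent in conn.execute("SELECT id, parent_id FROM categories ORDER BY id"):
        children.setdefault(parent, []).append(cid)

    counter = 0

    def walk(cid):
        nonlocal counter
        counter += 1
        lft = counter
        for child in children.get(cid, []):
            walk(child)
        counter += 1
        conn.execute("UPDATE categories SET lft = ?, rgt = ? WHERE id = ?", (lft, counter, cid))

    for root in children.get(0, []):  # parent_id = 0 marks a top-level category
        walk(root)


def objects_in_subtree(conn, category_id):
    """Objects filed under category_id or any descendant, via one range test."""
    return conn.execute(
        """
        SELECT DISTINCT o.id, o.name
        FROM categories target
        JOIN categories c ON c.lft BETWEEN target.lft AND target.rgt
        JOIN object_categories oc ON oc.category_id = c.id
        JOIN objects o ON o.id = oc.object_id
        WHERE target.id = ?
        ORDER BY o.id
        """,
        (category_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE objects (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER,
                             lft INTEGER, rgt INTEGER);
    CREATE TABLE object_categories (object_id INTEGER, category_id INTEGER);
    INSERT INTO objects VALUES (1, 'ABC'), (2, 'DEF');
    INSERT INTO categories (id, name, parent_id) VALUES
        (1, 'My Category', 0), (2, 'A Child', 1), (3, 'Another Child', 1);
    INSERT INTO object_categories VALUES (1, 2), (2, 3);
    """
)
assign_nested_set(conn)
print(objects_in_subtree(conn, 1))  # [(1, 'ABC'), (2, 'DEF')]
print(objects_in_subtree(conn, 2))  # [(1, 'ABC')]
```

In a Sphinx index the same idea would mean exposing the category's lft value as an attribute (for example with sql_attr_uint) and filtering on the chosen category's [lft, rgt] range. Keep in mind that lft/rgt have to be recomputed, or carefully updated, whenever the hierarchy changes.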

Related

SQL parameter table

I suspect this question is already well answered, but perhaps due to my limited SQL vocabulary I have not managed to find what I need. I have a database with many code:description mappings in a single 'parameter' table. I would like to define a query or procedure that returns all (or an arbitrary list of) the coded values in a given 'content' table together with their descriptions from the parameter table. I don't want to alter the original data; I just want to display friendly results.
Is there a standard way to do this?
Can it be accomplished with SELECT or are other statements required?
Here is a sample query for a single coded field:
SELECT TOP (5)
newid() as id,
B.BRIDGE_STATUS,
P.SHORTDESC
FROM
BRIDGE B
LEFT JOIN PARAMTRS P ON P.TABLE_NAME = 'BRIDGE'
AND P.FIELD_NAME = 'BRIDGE_STATUS'
AND P.PARMVALUE = B.BRIDGE_STATUS
ORDER BY
id
I want to produce 'decoded' results like:
| id | BRIDGE_STATUS |
|--------------------------------------|---------------|
| BABCEC1E-5FE2-46FA-9763-000131F2F688 | Active |
| 758F5201-4742-43C6-8550-000571875265 | Active |
| 5E51634C-4DD9-4B0A-BBF5-00087DF71C8B | Active |
| 0A4EA521-DE70-4D04-93B8-000CD12B7F55 | Inactive |
| 815C6C66-8995-4893-9A1B-000F00F839A4 | Proposed |
rather than the original coded data like:
| id | BRIDGE_STATUS |
|--------------------------------------|---------------|
| F50214D7-F726-4996-9C0C-00021BD681A4 | 3 |
| 4F173E40-54DC-495E-9B84-000B446F09C3 | 3 |
| F9C216CD-0453-434B-AFA0-000C39EFA0FB | 3 |
| 5D09554E-201D-4208-A786-000C537759A1 | 1 |
| F0BDB9A4-E796-4786-8781-000FC60E200C | 4 |
but for an arbitrary number of columns.
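The single-column query above generalises by joining the parameter table once per coded column, each time under its own alias. Below is a minimal sketch, using Python's sqlite3 as a stand-in for SQL Server, that generates such a query for any list of columns. BRIDGE_TYPE is a hypothetical second coded column, the helper build_decode_query is made up for illustration, and all codes and descriptions are invented. The generated SQL uses only LEFT JOIN and COALESCE, so it should carry over to SQL Server with little change, but that has not been tested here.

```python
import sqlite3


def build_decode_query(table, key_column, coded_columns):
    """Build one SELECT that LEFT JOINs PARAMTRS once per coded column.

    Table and column names are interpolated into the SQL text, so they must come
    from a trusted list, never from user input. The TABLE_NAME / FIELD_NAME
    lookup values are passed as bound parameters.
    """
    select_parts = [f"t.{key_column}"]
    joins = []
    params = []
    for i, col in enumerate(coded_columns):
        alias = f"p{i}"  # one alias per join so the lookups don't collide
        # Show the description; fall back to the raw code if no mapping exists.
        select_parts.append(f"COALESCE({alias}.SHORTDESC, t.{col}) AS {col}")
        joins.append(
            f"LEFT JOIN PARAMTRS {alias} ON {alias}.TABLE_NAME = ? "
            f"AND {alias}.FIELD_NAME = ? AND {alias}.PARMVALUE = t.{col}"
        )
        params.extend([table, col])
    sql = (
        f"SELECT {', '.join(select_parts)} FROM {table} t "
        + " ".join(joins)
        + f" ORDER BY t.{key_column}"
    )
    return sql, params


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE PARAMTRS (TABLE_NAME TEXT, FIELD_NAME TEXT,
                           PARMVALUE INTEGER, SHORTDESC TEXT);
    CREATE TABLE BRIDGE (id TEXT, BRIDGE_STATUS INTEGER, BRIDGE_TYPE INTEGER);
    -- Made-up codes and descriptions, for illustration only
    INSERT INTO PARAMTRS VALUES
        ('BRIDGE', 'BRIDGE_STATUS', 1, 'Active'),
        ('BRIDGE', 'BRIDGE_STATUS', 2, 'Inactive'),
        ('BRIDGE', 'BRIDGE_TYPE', 10, 'Steel'),
        ('BRIDGE', 'BRIDGE_TYPE', 20, 'Concrete');
    INSERT INTO BRIDGE VALUES ('a', 1, 10), ('b', 2, 20), ('c', 2, 99);
    """
)

sql, params = build_decode_query("BRIDGE", "id", ["BRIDGE_STATUS", "BRIDGE_TYPE"])
for row in conn.execute(sql, params):
    print(row)
```

Row 'c' has a BRIDGE_TYPE code (99) with no entry in PARAMTRS, so COALESCE shows the raw code instead of NULL; drop the COALESCE if you would rather see the gap.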

How to aggregate Postgres table so that ID is unique and column values are collected in array?

I'm not sure what to call what I'm trying to do, so looking it up didn't work very well. I would like to aggregate my table on one column and have all the values from another column collapsed into an array per unique ID.
| ID | some_other_value |
-------------------------
| 1 | A |
| 1 | B |
| 2 | C |
| .. | ... |
To return
| ID | values_array |
-------------------------
| 1 | {A, B} |
| 2 | {C} |
Sorry for the bad explanation, I'm really lacking the vocabulary here. Any help with writing a query that achieves what's in the example would be very much appreciated.
Try the following.
SELECT id, array_agg(some_other_value ORDER BY some_other_value) AS values_array FROM <yourTableName> GROUP BY id;
See Aggregate Functions documentation.
SELECT
id,
array_agg(some_other_value)
FROM
the_table
GROUP BY
id;
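To make the expected result concrete, here is a small Python sketch of what GROUP BY id combined with array_agg(some_other_value ORDER BY some_other_value) produces. It only illustrates the semantics and is not how Postgres evaluates the query; aggregate_to_arrays is a hypothetical name. The ORDER BY inside array_agg in the first answer is what makes the element order deterministic. Without it, the array order follows whatever order the rows happen to arrive in.

```python
from collections import defaultdict


def aggregate_to_arrays(rows):
    """Mimic SELECT id, array_agg(value ORDER BY value) ... GROUP BY id."""
    groups = defaultdict(list)
    for row_id, value in rows:
        groups[row_id].append(value)
    # Sort each array (like ORDER BY inside array_agg) and the ids (for stable output).
    return {row_id: sorted(values) for row_id, values in sorted(groups.items())}


rows = [(1, "B"), (2, "C"), (1, "A")]
print(aggregate_to_arrays(rows))  # {1: ['A', 'B'], 2: ['C']}
```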

Eloquent: Distinct values and counts from relations

I have a DB structure as follows:
fashion_item
==============
| id | name |
|------------|
| 1 | item1 |
|------------|
| 2 | item2 |
--------------
fashion_colour
===============
| id | name |
|-------------|
| 1 | red |
|-------------|
| 2 | white |
|-------------|
| 3 | green |
|-------------|
| 4 | black |
---------------
fashion_color_fashion_item
======================================
| fashion_item_id | fashion_color_id |
|------------------------------------|
| 1 | 1 |
|------------------------------------|
| 1 | 2 |
|------------------------------------|
| 1 | 3 |
|------------------------------------|
| 2 | 2 |
|------------------------------------|
| 2 | 3 |
--------------------------------------
The fashion_color_fashion_item table is a join table for a many-to-many relationship between fashion_item and fashion_color.
Using Eloquent, I would like to retrieve a list of results from fashion_item (based on other criteria), then get a distinct list of fashion_colour IDs from those results, with a count for each.
I need to end up with a value like the following, though I'm willing to transform the relevant data from another structure.
[ 1 => 1, 2 => 2, 3 => 2, 4 => 0 ]
In this format, each key is a fashion_color.id and each value is the number of rows in the result set that reference that colour.
Colour IDs with no matches can map to null or 0, or simply be absent.
I have the correct relationships setup between the tables and I can return results using all of the regular methods, including eager loading the colour data.
I've been able to achieve a similar result on direct belongs-to relationships by grouping the results on the foreign key column and counting each group. This won't work for many-to-many relationships, because there the foreign keys live in the join table rather than on the item.
e.g.
$silhouetteFilterList = array();
$results = FashionItem::(where clauses, etc...)->get();
$silhouettes = $results->groupBy('fashion_silhouette_id')->all();
foreach ($silhouettes as $key => $value) {
$silhouetteFilterList[$key] = count($value);
}
P.S. We're currently using Eloquent 4.1 because we need PHP 5.3 compatibility; we're hoping to move on soon. Comments regarding the antiquated nature of either PHP 5.3 or Eloquent 4.1 will not be welcome :p
We are using Eloquent but not Laravel.
Try eager loading the relationship:
$collection = $items->with('colors')->get();
Each item in the collection should now have a colors property holding the colours for that particular item.
Since this is a general Laravel Collection, you can use collection and array methods to get it in the format you like.
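Once the colours are eager-loaded, producing the requested map is just a tally over each item's colour IDs, seeded with every known colour so that unused ones report 0. The sketch below is written in Python purely to illustrate that logic. It is not Eloquent code: the dict of item ID to colour-ID list stands in for the eager-loaded collection, and colour_counts is a hypothetical helper.

```python
from collections import Counter


def colour_counts(items, all_colour_ids):
    """Count how many items in the result set reference each colour id.

    items: mapping of fashion_item id -> list of its colour ids (a stand-in
    for the eager-loaded relation). Colours never referenced map to 0.
    """
    tally = Counter()
    for colour_ids in items.values():
        # set() guards against the same colour being listed twice for one item
        tally.update(set(colour_ids))
    return {cid: tally[cid] for cid in sorted(all_colour_ids)}


items = {1: [1, 2, 3], 2: [2, 3]}  # the join-table rows from the question
print(colour_counts(items, [1, 2, 3, 4]))  # {1: 1, 2: 2, 3: 2, 4: 0}
```

In PHP the same shape would come from looping over each item's colors and incrementing an array keyed by colour ID, starting from an array with every colour ID set to 0.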

removing duplicates from Sphinx Search based on 1 column

I have one table that I need to search on; it is formed by a join between two other tables.
ThinkingSphinx::Index is defined on the posts table, and my posts_index.rb looks something like this:
join 'LEFT JOIN threads on posts.parent = threads.id'
indexes 'posts.text', as: :posts_text
indexes 'threads.text', as: :threads_text
and my tables:
threads
| id | text |
| 0 | test title |
| 1 | foo bar |
posts
| id | parent | text |
| 0 | 0 | some stuff |
| 1 | 0 | more stuff |
What I need is a Sphinx search on both thread.text and posts.text. If I search for the word stuff, this comes back:
thread.id | posts.id | thread.text | posts.text |
0 | 0 | test title | some stuff |
0 | 1 | test title | more stuff |
This is what I need. But if I search for the word test, this comes back:
thread.id | posts.id | thread.text | posts.text |
0 | 0 | test title | some stuff |
0 | 1 | test title | more stuff |
This is NOT what I want: as you can see, an extra, unnecessary row is returned, and in this case I only want the first row. I cannot use group_by, because one thread can have many posts that may or may not contain the search term, and I still need to return all of the posts that match. The only time I don't want a duplicate result is when the search term is found ONLY in the thread title. For various reasons I cannot filter the results after the Sphinx search; it has to be done in the query.
About the only way to do this would be to arrange for the thread.text column to be blank on all but one post (e.g., the first). That way, a keyword match against the thread title can only ever match one post.
(But you then can't get matches where some words match the title and some match the post text, for anything but the first post.)
... I'm also not sure exactly how to arrange this in ThinkingSphinx.
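At the SQL level, "blank on all but the first post" can be expressed as a CASE in the index's source query. Below is a minimal sketch with Python's sqlite3 standing in for the database. It assumes posts join threads on posts.parent (as the example data suggests) and that "first" means the lowest post ID. matching_post_ids is a crude whole-word stand-in for a Sphinx keyword match, not Sphinx itself. How to wire such a query into a Thinking Sphinx index is left open, as in the answer.

```python
import sqlite3

# Index source query: the thread title is kept only on the first (lowest-id)
# post of each thread, so a title-only keyword can match just one document.
INDEX_SOURCE_SQL = """
SELECT posts.id,
       posts.text AS posts_text,
       CASE WHEN posts.id = (SELECT MIN(p2.id) FROM posts p2
                             WHERE p2.parent = posts.parent)
            THEN COALESCE(threads.text, '')
            ELSE '' END AS threads_text
FROM posts
LEFT JOIN threads ON posts.parent = threads.id
ORDER BY posts.id
"""


def matching_post_ids(conn, word):
    """Crude stand-in for a Sphinx keyword match: whole word in either field."""
    return [
        post_id
        for post_id, posts_text, threads_text in conn.execute(INDEX_SOURCE_SQL)
        if word in posts_text.split() or word in threads_text.split()
    ]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE threads (id INTEGER PRIMARY KEY, text TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, parent INTEGER, text TEXT);
    INSERT INTO threads VALUES (0, 'test title'), (1, 'foo bar');
    INSERT INTO posts VALUES (0, 0, 'some stuff'), (1, 0, 'more stuff');
    """
)

print(matching_post_ids(conn, "stuff"))  # [0, 1]
print(matching_post_ids(conn, "test"))   # [0]
```

The caveat from the answer still applies: a query such as "test more" can no longer match post 1, because post 1 no longer carries the title.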
Not really a solution, but I ended up adding another index and removing the title field from the other index.

Order by date AND id, sqldeveloper

I have some tables with date and id as two of the columns:
ID | DATE | ITEMS
1 | 7/1/13 | More Apples
2 | 6/29/13 | Carrots
1 | 6/20/13 | Apples
2 | 6/10/13 | Broccoli
I would like to order them by DATE and then group them by ID, so that all the 1s are together, ordered by date:
ID | DATE | ITEMS
1 | 7/1/13 | More Apples
1 | 6/20/13 | Apples
2 | 6/29/13 | Carrots
2 | 6/10/13 | Broccoli
How would I accomplish this?
I'm thinking my solution might be a sub-select, but I haven't gotten anywhere close to what I want to achieve. Note that the tables above are very simplified; I'm actually trying to do this with many joined tables and many different fields being displayed. Thanks.
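For the output shown, a sub-select may not be needed. Sorting on ID first and then on DATE descending keeps each ID's rows together, newest first. Below is a minimal sketch using Python's sqlite3 as a stand-in for Oracle. The table name produce and the column name item_date are hypothetical (DATE is a reserved word in Oracle). The dates are stored as ISO-8601 text so that string order matches date order, whereas an Oracle DATE column sorts chronologically on its own. A variant that orders the groups by each ID's most recent date is included in case that is what "order by DATE, then group" means.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE produce (id INTEGER, item_date TEXT, items TEXT)")
conn.executemany(
    "INSERT INTO produce VALUES (?, ?, ?)",
    [
        (1, "2013-07-01", "More Apples"),
        (2, "2013-06-29", "Carrots"),
        (1, "2013-06-20", "Apples"),
        (2, "2013-06-10", "Broccoli"),
    ],
)

# Keep each id's rows together, newest first within the id.
by_id = conn.execute(
    "SELECT id, item_date, items FROM produce ORDER BY id, item_date DESC"
).fetchall()

# Variant: order the id groups by their most recent date, newest group first.
by_latest = conn.execute(
    """
    SELECT id, item_date, items FROM produce AS t
    ORDER BY (SELECT MAX(p.item_date) FROM produce AS p WHERE p.id = t.id) DESC,
             t.id, t.item_date DESC
    """
).fetchall()

for row in by_id:
    print(row)
```

With this sample data both queries return the same order, because ID 1 also has the most recent date. They would diverge if ID 2 had the newest row.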