Removing duplicates from Sphinx search based on one column

I need to search a result set that is formed by joining two tables.
The ThinkingSphinx::Index is defined on the posts table; my posts_index.rb looks something like this:
join 'LEFT JOIN threads ON posts.parent = threads.id'
indexes 'posts.text', as: :posts_text
indexes 'threads.text', as: :threads_text
And my tables:
threads
| id | text |
| 0 | test title |
| 1 | foo bar |
posts
| id | parent | text |
| 0 | 0 | some stuff |
| 1 | 0 | more stuff |
What I need is a Sphinx search across both threads.text and posts.text. Say I search for the word "stuff"; this comes back:
| threads.id | posts.id | threads.text | posts.text |
| 0 | 0 | test title | some stuff |
| 0 | 1 | test title | more stuff |
This is what I need. But if I search for the word "test", this comes back:
| threads.id | posts.id | threads.text | posts.text |
| 0 | 0 | test title | some stuff |
| 0 | 1 | test title | more stuff |
This is NOT what I want: the search term appears only in the thread title, so the second row is an unnecessary duplicate and I only want the first row. I cannot use group_by, because one thread can have many posts that may or may not contain the search term, and I still need every post that is hit. The only time I don't want a duplicate result is when the search term is found ONLY in the thread title. For various reasons I cannot filter the results after the Sphinx search; it has to be written into the query.

About the only way to do this is to arrange for the threads.text column to be blank on all but one post per thread (e.g. the first). That way a keyword match against the thread title can only ever match one post.
The drawback is that, for any post other than the first, you can no longer get matches where some words match the title and others match the post text.
I'm also not sure exactly how to arrange this in ThinkingSphinx.
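One way to arrange this is in the SQL that feeds the index: emit the thread title only for the first post of each thread. The sketch below uses Python's sqlite3 module with the tables from the question, purely to illustrate the query shape. It assumes posts.parent references threads.id, and the search function is a naive stand-in for Sphinx matching; wiring the expression into a ThinkingSphinx index is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE threads (id INTEGER PRIMARY KEY, text TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, parent INTEGER, text TEXT);
    INSERT INTO threads VALUES (0, 'test title'), (1, 'foo bar');
    INSERT INTO posts VALUES (0, 0, 'some stuff'), (1, 0, 'more stuff');
""")

# Keep the thread title only on the lowest-numbered post of each thread;
# every other post gets an empty title field.
INDEX_SOURCE_SQL = """
    SELECT posts.id,
           posts.text AS posts_text,
           CASE WHEN posts.id = (SELECT MIN(p.id) FROM posts p
                                 WHERE p.parent = posts.parent)
                THEN COALESCE(threads.text, '')
                ELSE ''
           END AS threads_text
    FROM posts
    LEFT JOIN threads ON posts.parent = threads.id
    ORDER BY posts.id
"""

documents = conn.execute(INDEX_SOURCE_SQL).fetchall()


def search(word):
    """Naive stand-in for a full-text match over both fields."""
    return [doc_id for doc_id, posts_text, threads_text in documents
            if word in posts_text.split() or word in threads_text.split()]


print(search("stuff"))  # both posts match on their own text
print(search("test"))   # only the first post still carries the title
```

As the answer notes, the trade-off is that a query mixing title words and post words can now only match the first post of a thread.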

Not really a solution, but I ended up adding another index and removing the title field from the existing one.

Related

Loop insert with select

I have the following structures
Tickets
+----+---------------------+-----------+---------------+
| id | price | seat_id | flight_id |
+----+---------------------+-----------+---------------+
Seats
+----+--------+-----------+
| id | letter | number |
+----+--------+-----------+
| 1 | A | 1 |
| 2 | A | 2 |
| 3 | A | 3 |
+----+--------+-----------+
I want to insert 2 tickets using only one query, for the seats where the letter is A and the number is between 1 and 2. I guess that to do more than one insert at a time I have to use some PL/SQL loop, but I don't know how to write it, and I don't know whether this is the right approach.

I'm not sure what you actually want to do, but from your description I'll assume you want 2 rows in tickets referencing ids 1 and 2 from seats.
SQL works on sets, NOT on individual rows. Loops are available (via PL/pgSQL), but avoid them whenever possible. Inserting 2 rows does not require one; in fact it is almost exactly the same as inserting a single row. Since you did not specify values for price and flight_id, I'll just omit them. To insert 2 rows:
INSERT INTO tickets (id, seat_id) VALUES (1, 1), (2, 2);
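If the goal is to derive the rows from seats rather than type the ids by hand, an INSERT ... SELECT can do it in one statement, still with no loop. This is a minimal sketch using Python's sqlite3 module; the same statement shape is valid in PostgreSQL. The price (100) and flight_id (7) are made-up placeholder values, since the question does not give any.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE seats (id INTEGER PRIMARY KEY, letter TEXT, number INTEGER);
    CREATE TABLE tickets (id INTEGER PRIMARY KEY, price NUMERIC,
                          seat_id INTEGER, flight_id INTEGER);
    INSERT INTO seats VALUES (1, 'A', 1), (2, 'A', 2), (3, 'A', 3);
""")

# One set-based statement: every seat matching the filter becomes a ticket.
# The price (100) and flight_id (7) are placeholder values for illustration.
conn.execute("""
    INSERT INTO tickets (price, seat_id, flight_id)
    SELECT 100, id, 7
    FROM seats
    WHERE letter = 'A' AND number BETWEEN 1 AND 2
""")

tickets = conn.execute(
    "SELECT seat_id, price, flight_id FROM tickets ORDER BY seat_id"
).fetchall()
print(tickets)
```

The WHERE clause decides how many rows are inserted, so the same statement works whether the filter matches two seats or two hundred.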

How to aggregate Postgres table so that ID is unique and column values are collected in array?

I'm not sure what to call what I'm trying to do, so searching for it didn't work very well. I would like to aggregate my table on one column and have all the values from another column collapsed into an array per unique ID.
| ID | some_other_value |
-------------------------
| 1 | A |
| 1 | B |
| 2 | C |
| .. | ... |
To return
| ID | values_array |
-------------------------
| 1 | {A, B} |
| 2 | {C} |
Sorry for the bad explanation; I'm really lacking the vocabulary here. Any help with writing a query that produces the result in the example would be very much appreciated.

Try the following:
SELECT id, array_agg(some_other_value ORDER BY some_other_value) AS values_array FROM <yourTableName> GROUP BY id;

See the Aggregate Functions documentation.
SELECT
id,
array_agg(some_other_value)
FROM
the_table
GROUP BY
id;
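To make concrete what array_agg combined with GROUP BY computes, here is a plain-Python sketch of the same grouping applied to the sample rows from the question. It only illustrates the shape of the result; the function name is hypothetical, and in practice the aggregation belongs in the database query above.

```python
from collections import defaultdict

# Sample rows from the question: (id, some_other_value)
rows = [(1, "A"), (1, "B"), (2, "C")]


def aggregate_by_id(rows):
    """Collect every value that shares an id into one list, in input order."""
    grouped = defaultdict(list)
    for row_id, value in rows:
        grouped[row_id].append(value)
    return dict(grouped)


values_by_id = aggregate_by_id(rows)
print(values_by_id)
```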

Eloquent: Distinct values and counts from relations

I have a DB structure as follows:
fashion_item
==============
| id | name |
|------------|
| 1 | item1 |
|------------|
| 2 | item2 |
--------------
fashion_colour
===============
| id | name |
|-------------|
| 1 | red |
|-------------|
| 2 | white |
|-------------|
| 3 | green |
|-------------|
| 4 | black |
---------------
fashion_color_fashion_item
======================================
| fashion_item_id | fashion_color_id |
|------------------------------------|
| 1 | 1 |
|------------------------------------|
| 1 | 2 |
|------------------------------------|
| 1 | 3 |
|------------------------------------|
| 2 | 2 |
|------------------------------------|
| 2 | 3 |
--------------------------------------
The fashion_color_fashion_item table is a join table for a many to many relationship between fashion_item and fashion_color.
Using Eloquent, I would like to retrieve a list of results from fashion_item (based on other criteria), then get a distinct list of fashion_colour ids from those results, each with a count.
I need to end up with a value like the following, though I'm willing to transform the relevant data from another structure.
[ 1 => 1, 2 => 2, 3 => 2, 4 => 0 ]
In this format, there is a key which reflects a fashion_color.id, and a value which represents the number of times the colour is referenced by a row from the result set.
A fashion_colour id with no matching rows can be null, 0 or simply not present.
I have the correct relationships set up between the tables and I can return results using all of the regular methods, including eager loading the colour data.
I've been able to achieve a similar result on direct belongs-to relationships by grouping the results on the foreign key in the table and counting each group. This won't work for many-to-many relationships.
e.g.
$silhouetteFilterList = array();
$results = FashionItem::(where clauses, etc...)->get();
$silhouettes = $results->groupBy('fashion_silhouette_id')->all();
foreach ($silhouettes as $key => $value) {
$silhouetteFilterList[$key] = count($value);
}
P.S. We're currently using Eloquent 4.1 because we need PHP 5.3 compatibility; we're hoping to move on soon. Comments regarding the antiquated nature of either PHP 5.3 or Eloquent 4.1 will not be welcome :p
We are using Eloquent but not Laravel.

Try eager loading the relationship:
$collection = $items->with('colors')->get();
Each item in the collection should now have a colors property holding the colours related to that particular item.
Since the result is a general Collection object, you can use collection and array methods to get the data into the format you want.
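The counting step itself is language-neutral, so here is a small Python sketch of it. It assumes the eager-loaded result has already been reduced to a mapping from item id to the colour ids attached to that item; the variable and function names are hypothetical, and the data is the sample from the question.

```python
from collections import Counter

# Hypothetical stand-in for the eager-loaded result: item id -> colour ids
items_with_colours = {1: [1, 2, 3], 2: [2, 3]}
# Every colour id that exists, so unreferenced colours show up with a count of 0
all_colour_ids = [1, 2, 3, 4]


def colour_counts(items_with_colours, all_colour_ids):
    """Count how many items in the result set reference each colour id."""
    counts = Counter(colour_id
                     for colour_ids in items_with_colours.values()
                     for colour_id in colour_ids)
    return {colour_id: counts[colour_id] for colour_id in all_colour_ids}


colour_filter_list = colour_counts(items_with_colours, all_colour_ids)
print(colour_filter_list)
```

The same flatten-then-count shape should carry over to PHP by looping over each item's colors and incrementing an array keyed by colour id.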

Crosstabs Crystal Reports Null value vs zero

I'm creating a crosstab report showing the survey history for gopher tortoises (if you must know what that is) monitoring stations. Not all stations are monitored for a given survey and sometimes when we monitor we don't find any and thus record a 0 which is a valid result.
In the crosstab when the station isn't used I would like it to say "N/A" or some other equivalent, but when it's a zero I want it to stay as zero.
I've found so much on how to change a null to a zero, but nothing when you want to keep the zero and somehow note the null.
Below is what the crosstab should look like. You'll see that the 0 in Station4 on 1/1/2004 is "real" (meaning we didn't find any) but all of the N/A's are when we didn't use the station.
Survey Dates
| | 1/1/2000 | 1/1/2002 | 1/1/2004 | 1/1/2006 |
|----------|----------|----------|----------|----------|
| Station1 | 9 | 5 | N/A | N/A |
| Station2 | 5 | 7 | 2 | 6 |
| Station3 | N/A | N/A | 6 | 9 |
| Station4 | 10 | 9 | 0 | 11 |
This is what the Oracle table looks like for the 1/1/2000 survey, as an example:
| SurveyID | StationID | Number |
|----------|-----------|--------|
| 1 | 1 | 9 |
| 1 | 2 | 5 |
| 1 | 4 | 6 |
So, basically, how do I keep the zeros and put some text in the nulls in a CR crosstab?
Thanks!

Because CR doesn't differentiate between nulls and actual zeros in the crosstab, you can try replacing actual zero values with a placeholder so you can tell the difference. Note that this solution will only work if you are trying to display the values and not do any aggregate calculations.
First, create a formula that will replace the zeros with a placeholder value. In this case, I'm using -1 since that number should never appear in the database.
//{#Survey Num}
local numbervar totalSurvey;
totalSurvey:={Table.ActiveBurrows} + {Table.InactiveBurrows};
if totalSurvey=0 then -1 else totalSurvey
Use this formula to create your crosstab. Now you need to set a display string so that everything appears correctly. Right-click one of your crosstab cells → hit "Format Field" → select the "Common" tab → then create a "Display String" formula. That formula should be something like:
if currentfieldvalue=-1 then "0" else if currentfieldvalue=0 then "N/A" else totext(currentfieldvalue,0,'')
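The crosstab still holds the placeholder values; the display string just prints the real values in their place. Genuine zeros (stored as -1) print as "0", while cells with no data, which the crosstab summarizes as 0, print as "N/A".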
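The two-step mapping can be traced outside Crystal Reports. The Python sketch below mirrors the two formulas above under the same assumptions: -1 never occurs in the data, each crosstab cell holds at most one record, and an empty cell is summarized as 0. The function names are hypothetical.

```python
PLACEHOLDER = -1  # stands in for a genuine zero; assumed never to occur in the data


def to_crosstab_value(active_burrows, inactive_burrows):
    """Mirror of the first formula: swap a real zero for the placeholder."""
    total = active_burrows + inactive_burrows
    return PLACEHOLDER if total == 0 else total


def crosstab_cell(record):
    """A cell with no record is summarized as 0, like the crosstab does."""
    if record is None:
        return 0
    return to_crosstab_value(*record)


def display_string(cell_value):
    """Mirror of the display-string formula."""
    if cell_value == PLACEHOLDER:
        return "0"
    if cell_value == 0:
        return "N/A"
    return str(cell_value)


print(display_string(crosstab_cell((0, 0))))  # surveyed, nothing found
print(display_string(crosstab_cell(None)))    # station not used
print(display_string(crosstab_cell((4, 2))))  # surveyed, six found
```

Because the placeholder changes the stored value, any sum or average over these cells would be wrong, which is why the approach only suits display.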

How do I return multiple documents that share indexes efficiently?

I'm indexing some data using Sphinx. I have objects that are categorised, and the categories have a hierarchy. My basic table structure is as follows:
Objects
| id | name |
| 1 | ABC |
| 2 | DEF |
...
Categories
| id | name | parent_id |
| 1 | My Category | 0 |
| 2 | A Child | 1 |
| 3 | Another Child | 1 |
...
Object_Categories
| object_id | category_id |
| 1 | 2 |
| 2 | 3 |
...
My config currently is:
sql_query = SELECT categories.id, objects.name, parent_id FROM categories \
LEFT JOIN object_categories ON categories.id = object_categories.category_id \
LEFT JOIN objects ON objects.id = object_categories.object_id
sql_attr_uint = parent_id
This returns the IDs of any categories that contain objects matching my search, but I need to adjust it to get the objects in a category or any of its children.
Obviously, I could UNION this query with another that gets the IDs of the matched categories' parents, and so on (it could be up to 4 or 5 levels deep), but this seems hugely inefficient. Is there a way to return multiple document IDs in the first field, or to avoid repeated needless indexing?
I'm a Sphinx noob, so I'm not sure how to approach the problem.

See
http://www.sitepoint.com/hierarchical-data-database/
It's talking about a database, but the same system works equally well within Sphinx. It can take a while to get your head around, but it's well worth mastering (IMHO!).
(i.e. add the left/right columns to the database, and then include them as attributes in the Sphinx index, so a category and all of its descendants can be selected with a single range filter)
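As a rough illustration of the left/right idea, the Python sketch below numbers the categories from the question with a depth-first walk and then selects a whole subtree with a range comparison. The function names are hypothetical, and treating parent_id 0 as "no parent" is an assumption based on the sample data; in Sphinx the two numbers would be stored as attributes and filtered by range.

```python
# Categories from the question: (id, name, parent_id); parent_id 0 means top level
categories = [
    (1, "My Category", 0),
    (2, "A Child", 1),
    (3, "Another Child", 1),
]


def assign_left_right(categories, root_parent=0):
    """Number each category with (left, right) bounds via a depth-first walk."""
    children = {}
    for cat_id, _name, parent_id in categories:
        children.setdefault(parent_id, []).append(cat_id)

    bounds = {}
    counter = 0

    def visit(cat_id):
        nonlocal counter
        counter += 1
        left = counter
        for child_id in children.get(cat_id, []):
            visit(child_id)
        counter += 1
        bounds[cat_id] = (left, counter)

    for top_id in children.get(root_parent, []):
        visit(top_id)
    return bounds


def subtree_ids(bounds, cat_id):
    """A category plus all descendants: bounds nested inside its own bounds."""
    left, right = bounds[cat_id]
    return sorted(cid for cid, (l, r) in bounds.items()
                  if left <= l and r <= right)


bounds = assign_left_right(categories)
print(bounds)
print(subtree_ids(bounds, 1))
```

The point of the numbering is that "this category or any of its children" becomes one range condition, however deep the hierarchy goes, instead of one UNION per level.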
