Using MYSQLI to select rows in which part of a column matches part of an input - mysqli

I have a database in which one of the columns contains a series of information 'tags' about the row that are stored as a comma-separated list (a string) of dynamic length. I am using mysqli within PHP, and I want to select rows in which any of these items match any of the items in an input string.
For example, there could be a row describing an apple, containing the tags: "tasty, red, fruit, sour, sweet, green." I want this to show up as a result in a query like: "SELECT * FROM table WHERE info tags IN ('blue', 'red', 'yellow')", because it has at least one item ("red") overlapping. Kind of like "array_intersect" in PHP.
I think I could use IN if each row had only one tag, and I could use LIKE if I used only one input tag, but both are of dynamic length. I know I can loop over all the input tags, but I was hoping to put this in a single query. Is that possible? If not, can I use a different structure to store the tags in the database to make this possible (something other than a comma separated string)?

I think the best approach would be to create a tags table (id + label) and a separate "table_tags" table that holds table_id and tag_id.
That means using JOINs to get the final result.
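As a rough sketch of that layout, here is the idea run against an in-memory SQLite database from Python rather than MySQL/mysqli. The table names (items, tags, item_tags), the sample rows and the helper items_with_any_tag are all made up for illustration; the SELECT itself should carry over to MySQL with a prepared statement.

```python
import sqlite3

# Hypothetical schema: one row per item, one row per tag,
# and a link table holding the many-to-many relation.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, label TEXT UNIQUE);
    CREATE TABLE item_tags (item_id INTEGER, tag_id INTEGER,
                            PRIMARY KEY (item_id, tag_id));
    INSERT INTO items VALUES (1, 'apple'), (2, 'sky');
    INSERT INTO tags VALUES (1, 'red'), (2, 'fruit'), (3, 'blue'), (4, 'vast');
    INSERT INTO item_tags VALUES (1, 1), (1, 2), (2, 4);
""")


def items_with_any_tag(conn, wanted):
    """Return names of items carrying at least one of the wanted tags."""
    if not wanted:
        return []
    # One placeholder per input tag, so the IN list can have any length.
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT DISTINCT i.name
        FROM items AS i
        JOIN item_tags AS it ON it.item_id = i.id
        JOIN tags AS t ON t.id = it.tag_id
        WHERE t.label IN ({placeholders})
        ORDER BY i.name
    """
    return [row[0] for row in conn.execute(sql, list(wanted))]


matches = items_with_any_tag(conn, ["blue", "red", "yellow"])
print(matches)
```

The DISTINCT keeps an item from showing up twice when it matches more than one of the input tags.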
Another (but lazy) solution would be to prefix and suffix the tag list with commas, so that the full column contains something like:
,tasty,red,fruit,sour,sweet,green,
You can then do a LIKE search without worrying about overlapping words (e.g. red vs. bored) and still get a proper match by using LIKE '%,WORD,%'.
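To cover several input tags in one query with this approach, the LIKE conditions can be OR-ed together. A minimal sketch, again using SQLite from Python instead of MySQL; the items table, its rows and the helper items_matching_any are invented for the example, and it assumes tags never contain the LIKE wildcards % or _.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, tags TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [
        # Tag lists are stored with a leading and trailing comma.
        ("apple", ",tasty,red,fruit,sour,sweet,green,"),
        ("lecture", ",bored,long,"),
    ],
)


def items_matching_any(conn, wanted):
    """Return names of rows whose tag list contains any wanted tag."""
    if not wanted:
        return []
    # One "tags LIKE ?" per input tag, joined with OR.
    clause = " OR ".join("tags LIKE ?" for _ in wanted)
    # Wrapping the tag in commas prevents 'red' from matching 'bored'.
    params = [f"%,{tag},%" for tag in wanted]
    sql = f"SELECT name FROM items WHERE {clause} ORDER BY name"
    return [row[0] for row in conn.execute(sql, params)]


matches = items_matching_any(conn, ["blue", "red", "yellow"])
print(matches)
```

A pattern with a leading % generally cannot use an ordinary index, so this tends to scan the whole table.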

Related

Postgres/full text search showing a preview of part of a document

I'm using postgres 9.3 with full text search and I'm running a query like
select * from jobs where fts @@ plainto_tsquery('pg_catalog.english','search term');
I'm getting the proper results, however, I'd like to be able to get a portion of the search results that match the terms searched. The FTS column is just a to_tsvector() of the description column. What I'd like to do is show a short excerpt of the description, with the terms highlighted. Any ideas on how I'd achieve this?
This is what the ts_headline() function is intended for.
It is designed to deliver you excerpts or highlights of the "original" text you have normalized. The most basic usage would be this:
SELECT ts_headline(description, keywords) as result
FROM jobs, plainto_tsquery('pg_catalog.english','search term') as keywords
WHERE fts @@ keywords;
Note that "description" in this query is my guess at the name of the column that holds the original text, and "fts" is my guess at the column that contains the normalized text.
This query will return a result set containing an excerpt of your original text with the matching tokens highlighted through HTML <b> tags.
There is a comma-separated string of optional values you can pass into this function to alter its behavior. You could, for example, change the surrounding tags you get back by setting the StartSel and StopSel values:
SELECT ts_headline(description, keywords, 'StartSel=<em>,StopSel=</em>') as result
FROM jobs, plainto_tsquery('pg_catalog.english','search term') as keywords
WHERE fts @@ keywords;
Now the <b> tags will become <em> tags. They do not actually have to be HTML tags; you can pass in (almost) any string.
Another popular option is MaxFragments, which sets the maximum number of excerpts to return. It is used in combination with the MaxWords and MinWords values, which set how much text each excerpt should contain.
SELECT ts_headline(description, keywords, 'MaxFragments=4,MaxWords=5,MinWords=2') as result
FROM jobs, plainto_tsquery('pg_catalog.english','search term') as keywords
WHERE fts @@ keywords;
The above query will now show a maximum of four excerpts, each between two and five words long.
If you wish to simply show the whole document with the results highlighted, you could use the HighlightAll value, which overrides all fragment values set:
SELECT ts_headline(description, keywords, 'HighlightAll=true') as result
FROM jobs, plainto_tsquery('pg_catalog.english','search term') as keywords
WHERE fts @@ keywords;
Note: be careful with ts_headline(), as it can become a performance bottleneck. For each record you wish to highlight, the database has to fetch the whole text, parse it and insert the desired start and end elements.
Please use the function with great care and only set it loose on a small portion (top five or top ten records) of your complete result set.
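One way to keep ts_headline() on a small slice is to pick the top rows in a subquery first and build the excerpts only in the outer query. The sketch below just composes such a query as a Python string with psycopg-style named placeholders and does not run it. The id column, the ts_rank() ordering, the top_hits alias and the helper headline_params are assumptions, not something taken from the question.

```python
# Query text with psycopg-style named placeholders (an assumption about the
# client library). The inner query picks the best matches; ts_headline() only
# runs on those rows in the outer query.
HEADLINE_TOP_N_SQL = """
SELECT id,
       ts_headline('pg_catalog.english', description, keywords, %(options)s) AS excerpt
FROM (
    SELECT id, description, keywords, ts_rank(fts, keywords) AS rank
    FROM jobs, plainto_tsquery('pg_catalog.english', %(term)s) AS keywords
    WHERE fts @@ keywords
    ORDER BY rank DESC
    LIMIT %(limit)s
) AS top_hits
ORDER BY rank DESC
"""


def headline_params(term, limit=10, **options):
    """Build the parameter dict, joining options into 'Key=value,Key=value'."""
    opts = ",".join(f"{key}={value}" for key, value in options.items())
    return {"term": term, "limit": int(limit), "options": opts}


params = headline_params("search term", limit=5,
                         MaxFragments=4, MaxWords=5, MinWords=2)
print(params["options"])
```

With a real connection this would be passed as cursor.execute(HEADLINE_TOP_N_SQL, params); depending on the driver, the options parameter may need an explicit cast to text.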

Sphinx Filtering based on categories using OR

I have the following text fields I search with Sphinx: Title, Description, keywords.
However, sometimes things are narrowed down using categories. We have 3 category fields: CatID1, CatID2 and CatID3.
So, for example, I need to see if the word "Kittens" is in the Title, Description, or Keywords, but I also want to filter so that only items that have the categories (Animals - ID Number 8) or (Pets - ID Number 9) or (Felines - Category ID Number 10) in either of those CatID fields.
To clarify, only show items that have a 8,9 or 10 in CatID1, 2 or 3.
Any ideas on how I would accomplish this using sphinx filtering or searching the CatID1 fields as keywords?
Note: I am able to filter and it works great only using one category, i.e:
if(!empty($cat_str)) {
$cl->SetFilter( 'catid1', array( $cat_str ));
}
Thanks!
Craig
SetFilter takes an array. In your example you are putting $cat_str into an array of one item.
So you just need to build an array with all the IDs:
$cl->SetFilter( 'catid', array( $cat1, $cat2, $cat3 ));
But that's not very flexible, so you would probably build the array dynamically rather than hard-coding it like that. How you build the array is up to your application.
Storing the IDs in three separate attributes also makes them hard to search. Notice that the example above uses an attribute called catid. This would be a single multi-value attribute (MVA) that contains the IDs from all three cat fields. That way it's easy to search for IDs in ANY of the columns at once.
http://sphinxsearch.com/docs/current.html#mva
If you are using a SQL source, you could do this with something like:
sql_query = SELECT id, title ... , CONCAT_WS(',', CatID1, CatID2, CatID3) as catid FROM ...
sql_attr_multi = uint catid from field
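Building the ID array dynamically might look roughly like the following. This is only the list-building step, sketched in Python instead of PHP; the helper category_filter_values is hypothetical, and the resulting list is what would be handed to SetFilter on the combined catid attribute.

```python
def category_filter_values(*raw_ids):
    """Collect non-empty category IDs into a de-duplicated list of ints."""
    values = []
    for raw in raw_ids:
        # Skip categories the user did not select.
        if raw is None or str(raw).strip() == "":
            continue
        value = int(raw)
        if value not in values:
            values.append(value)
    return values


ids = category_filter_values("8", "", 9, None, "10", "9")
print(ids)
```

If the list comes back empty, the filter call should be skipped, just like the !empty($cat_str) check in the question.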

Unable to use Sphinx MVA sql_attr_multi

I have a field called "tags" and it has values (say) "Music, Art, Sports, Food" etc. How can I use setFilter function in PHP-Sphinx for this field. I know that it has to be an integer and should be used as an array in PHP. So, if I use a numeric field for tags, what about the delimiters (in this case comma). Currently, I am using "sql_attr_multi" like this…
sql_attr_multi = uint tags from field
I have to filter the search based on any of the keywords the user has selected, Music, Sports, Food etc. As such, only MVA is the right option to do this. But I am just not able to figure out, how to do this. I can store all tag elements as numeric values and make the tags field as int. But what about the comma or how will I convert the whole string (Music, Art, Sports, Food) as an integer. Later, how do I call setFilter using PHP.
Any help is highly appreciated.
Well, using an MVA suggests you already have unique IDs for each tag.
You would have those if you had a separate table for tags (with a PK) and a many-to-many table joining your documents and tags. (That's a very common way to store tags, in normal form.)
If you have a text column containing the tag text, it would be easier to just use a field. You can easily filter by fields in the main text query:
crispy creams @tags Food
for example (that's an extended-mode query).
(But fields can't do grouping like you can with attributes.)

Google refine cross-reference between row and column

I'm not sure if this can be achieved in Google Refine at all. But basically, I have data like this.
The first table lists all the users. The second table shows all the friends. However, not all the IDs in the "friends" column of the second table exist in the first table, and I want to get rid of those. So, how can I look up each ID in the friends column of the second table and remove the ones that don't exist in table 1?
Put the two tables in different projects (we'll call them Table1 and Table2).
In Table2, on the friends column:
use "split multi-valued cells" to get each value on a separate row
convert the visitors column to numbers (or conversely user_id in Table1 to string)
use "add a new column based on this column" with the expression cross(cell,'Table1','user_id').length()
This will return 0 if there's no match, 1 if there's a match or N>1 if there are duplicates in Table1
If you want the data back in the original format, set up a facet to filter on the validity column, blank out all the bad values and then use "join multi-valued cells" to reverse the split operation you did up front.
I fixed some caching bugs with cross() for OpenRefine 2.6, so if the cross doesn't work, try stopping and restarting the Refine server.
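For comparison, the same split / look up / join logic can be sketched outside Refine in plain Python. This is not GREL and not how cross() is implemented; the helper prune_friends, the comma separator and the sample IDs are assumptions made for illustration.

```python
def prune_friends(users, friendships, sep=","):
    """Drop friend IDs that do not appear in the list of known users."""
    # Compare everything as strings, mirroring the type conversion step.
    known = {str(user_id) for user_id in users}
    pruned = {}
    for user_id, friends in friendships.items():
        # Split the multi-valued cell, keep only IDs with a match, re-join.
        kept = [f.strip() for f in friends.split(sep) if f.strip() in known]
        pruned[user_id] = sep.join(kept)
    return pruned


result = prune_friends([1, 2, 3], {1: "2,3,7", 2: "9", 3: "1, 2"})
print(result)
```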

Reporting Services and Dynamic Fields

I'm new to reporting services so this question might be insane. I am looking for a way to create an empty 'template' report (that is basically a form letter) rather than having to create one for every client in our system. Part of this form letter is a section that has any number of 25 specific fields. The section is arranged as such:
Name: Jesse James
Date of Birth: 1/1/1800
Address: 123 Blah Blah Street
Anywhere, USA 12345
Another Field: Data
Another Field2: More Data
Those (and any of the other fields the client specifies) could be arranged in any order and the label on the left could be whatever the client decides (example: 'DOB' instead of 'Date of Birth'). IDEALLY, I'd like to be able to have a web interface where you can click on the fields you want, specify the order in which they'll appear, and specify what the custom label is. I figured out a way to specify the labels and order them (and load them 'dynamically' in the report) but I wanted to take it one step further if I could and allow dynamic field (right side) selection and ordering. The catch is, I want to do this without using dynamic SQL. I went down the path of having a configuration table that contained an ordinal, custom label text, and the actual column name and attempting to join that table with the table that actually contains the data via information_schema.columns. Maybe querying ALL of the potential fields and having an INNER JOIN do my filtering (if there's a match from the 'configuration' table, etc). That doesn't work like I thought it would :) I guess I was thinking I could simulate the functionality of a dataset (it having the value and field name baked in to the object). I realize that this isn't the optimal tool to be attempting such a feat, it's just what I'm forced to work with.
The configuration table would hold the configuration for many customers/reports and I would be filtering by a customer ID. The config table would look something like this:
CustID  LabelText      ColumnName  Ordinal
1       First Name     FName       1
1       Last Name      LName       2
1       Date of Birth  DOBirth     3
2       Client ID      ClientID    1
2       Last Name      LName       2
2       Address 1      Address1    3
2       Address 2      Address2    4
All that to say:
Is there a way to pull off the above mentioned query?
Am I being too picky about not using dynamic SQL as the section in question will only be pulling back one row? However, there are hundreds of clients running this report (letter) two or three times a day.
Also, keep in mind I am not trying to dynamically create text boxes on the report. I will either just concatenate the fields into a single string and dump that into a text box or I'll have multiple reports each with a set number of text boxes expecting a generic field name ("field1",etc). The more I type, the crazier this sounds...
If there isn't a way to do this I'll likely finagle something in custom code; but my OCD side wants to believe there is SQL beyond my current powers that can do this in a slicker way.
Not sure why you need this all returned in one row: it seems like SSRS would want this normalized further, returning a row for every row in the configuration table for the current report. If you really need to concatenate, then do that in embedded code in the report, or consider just putting a table in the form letter. The query below makes some assumptions about your configuration table. Does it only hold the configuration for the current report, or does it hold the config for many customers/reports at once? Also, you didn't give much info about how you'll filter to the appropriate record, so I just used a customer ID.
SELECT
    config.Ordinal,
    config.LabelText,
    CASE config.ColumnName
        WHEN 'FName' THEN DataRecord.FirstName
        WHEN 'LName' THEN DataRecord.LastName
        WHEN 'ClientID' THEN DataRecord.ClientID
        WHEN 'DOBirth' THEN DataRecord.DOB
        WHEN 'Address' THEN DataRecord.Address
        WHEN 'Field' THEN DataRecord.Field
        WHEN 'Field2' THEN DataRecord.Field2
        ELSE NULL
    END AS response
FROM
    ConfigurationTable AS config
LEFT OUTER JOIN
    DataTable AS DataRecord
    ON config.CustID = DataRecord.CustomerID
-- Filter on the config side so the LEFT JOIN still returns the labels
-- when no data row exists for the customer.
WHERE config.CustID = @CustID
ORDER BY
    config.Ordinal
There are other ways to do this, in SSRS or in SQL, depending on the details of your requirements.
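To see the CASE approach return one row per configured label, here is a small sketch using SQLite from Python rather than SQL Server. The table layouts follow the query above, while the sample rows, the trimmed-down column list and the helper letter_fields are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ConfigurationTable (
        CustID INTEGER, LabelText TEXT, ColumnName TEXT, Ordinal INTEGER);
    CREATE TABLE DataTable (
        CustomerID INTEGER, FirstName TEXT, LastName TEXT, DOB TEXT);
    -- Made-up sample rows; customer 1 relabels 'Date of Birth' as 'DOB'.
    INSERT INTO ConfigurationTable VALUES
        (1, 'Last Name', 'LName', 2),
        (1, 'First Name', 'FName', 1),
        (1, 'DOB', 'DOBirth', 3),
        (2, 'Last Name', 'LName', 1);
    INSERT INTO DataTable VALUES
        (1, 'Jesse', 'James', '1/1/1800'),
        (2, 'Jane', 'Doe', '2/2/1900');
""")

# The CASE maps each configured column name to the real column, so no
# dynamic SQL is needed; unknown names simply yield NULL.
QUERY = """
    SELECT config.LabelText,
           CASE config.ColumnName
               WHEN 'FName' THEN DataRecord.FirstName
               WHEN 'LName' THEN DataRecord.LastName
               WHEN 'DOBirth' THEN DataRecord.DOB
               ELSE NULL
           END AS response
    FROM ConfigurationTable AS config
    LEFT OUTER JOIN DataTable AS DataRecord
        ON config.CustID = DataRecord.CustomerID
    WHERE config.CustID = :cust_id
    ORDER BY config.Ordinal
"""


def letter_fields(conn, cust_id):
    """Return (label, value) pairs in the customer's configured order."""
    return conn.execute(QUERY, {"cust_id": cust_id}).fetchall()


rows = letter_fields(conn, 1)
for label, value in rows:
    print(f"{label}: {value}")
```

Each configured label comes back as its own row in Ordinal order, which is the shape a table in the report can bind to directly.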