Find questions with multiple tags (AND, not OR) - tags

How can I find questions that are tagged with multiple tags, for example, BOTH regex AND python?
I've found answers on meta about using the browser to search multiple tags, but this question is specific to SEDE (the Stack Exchange Data Explorer): https://data.stackexchange.com
In the Posts table, the Tags column is stored as nvarchar(250), and the tags are concatenated into a single string (not stored as an array).
I only need the question (and answer) text for some text mining, so I had tried going direct to the Posts table:
Pseudo-SQL:
select * from Posts where Tags in (tag_list)
-- this returns tag1 OR tag2
select * from Posts p1 inner join Posts p2 ON p1.Tags in (tag1) AND p2.Tags in (tag2)
I've also tried a larger query based on this popular query.

For multiple tags, one (inefficient) way to do this in SEDE is to combine several LIKE conditions:
SELECT TOP 10
*
FROM Posts
WHERE Tags LIKE ('%python%')
AND Tags LIKE ('%regex%')
This also matches similar tags, for example python-3.x.
To match only the exact tags, with no fuzzy matching, wrap each tag in angle brackets in the pattern:
%<python>%
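As a rough illustration of the exact-match pattern, here is a minimal sketch using Python's sqlite3 module against a made-up miniature Posts table. The sample rows and the helper name posts_with_all_tags are assumptions for the demo; only the <tag1><tag2> storage format comes from SEDE. On SEDE itself the equivalent is simply one LIKE '%<tag>%' condition per tag, joined with AND.

```python
import sqlite3

# Hypothetical stand-in for the SEDE Posts table (sample data is invented).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Posts (Id INTEGER PRIMARY KEY, Tags TEXT)")
conn.executemany(
    "INSERT INTO Posts (Id, Tags) VALUES (?, ?)",
    [
        (1, "<python><regex>"),
        (2, "<python-3.x><regex>"),
        (3, "<python>"),
        (4, "<regex><python><pandas>"),
    ],
)


def posts_with_all_tags(conn, tags):
    """Return ids of posts whose Tags string contains every given tag exactly."""
    if not tags:
        raise ValueError("at least one tag is required")
    # One LIKE condition per tag, joined with AND; the angle brackets
    # prevent 'python' from matching 'python-3.x'.
    where = " AND ".join("Tags LIKE ?" for _ in tags)
    params = [f"%<{tag}>%" for tag in tags]
    rows = conn.execute(f"SELECT Id FROM Posts WHERE {where} ORDER BY Id", params)
    return [row[0] for row in rows]


matching_ids = posts_with_all_tags(conn, ["python", "regex"])
print(matching_ids)
```

Post 2 is skipped here because <python-3.x> does not contain the substring <python>.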

Related

Search pages with a Tag in CQ5

I am working on a custom search component in CQ5. I need to search for one or more tags selected by the user using checkboxes. I tried using an earlier query to search text (select * from cq:Pagecontent where...)
I tried using:
select * from cq:PageContent where cq:tags like '%mytag%'
but it is not working. There are 2 pages which have 'mytag' as a tag.
Any suggestions on how to do it?
The following query is working for me. Here I'm searching for the tags marketing:interest/services and marketing:interest/product:
//element(*,cq:PageContent)[@cq:tags='marketing:interest/services' or @cq:tags='marketing:interest/product']
At the moment I would still go for XPath, because it performs better than SQL2.
When searching for a tag I would also avoid wildcards, as they are not necessary if you are searching for an exact tag name.
Wildcards can negatively influence the performance of your query.
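Since the tags come from user checkboxes, the predicate has to be assembled at run time. A minimal sketch of that in Python follows; the helper tags_xpath and its require_all flag are hypothetical, and it only builds the query string (it does not talk to CQ5). As far as I know, joining the comparisons with "and" instead of "or" should match only pages carrying every selected tag, but treat that as an assumption to verify on your instance.

```python
def tags_xpath(tags, require_all=False):
    """Build an XPath query over cq:PageContent for exact tag names.

    Hypothetical helper: joins one @cq:tags comparison per tag with
    'or' (any tag) or 'and' (all tags). No wildcards are used.
    """
    if not tags:
        raise ValueError("at least one tag is required")
    for tag in tags:
        # Keep the sketch simple: refuse values that would need escaping.
        if "'" in tag:
            raise ValueError(f"unsupported character in tag: {tag!r}")
    joiner = " and " if require_all else " or "
    predicate = joiner.join(f"@cq:tags='{tag}'" for tag in tags)
    return f"//element(*,cq:PageContent)[{predicate}]"


selected = ["marketing:interest/services", "marketing:interest/product"]
print(tags_xpath(selected))
print(tags_xpath(selected, require_all=True))
```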

Finding similar posts with PostgreSQL

I have a table posts:
CREATE TABLE posts (
id serial primary key,
content text
);
When a user submits a post, how can I compare his post with the others and find similar posts?
I'm looking for something like StackOverflow does with the "Similar Questions".
While Text Search is an option it is not meant for this type of search primarily. The typical use case would be to find words in a document based on dictionaries and stemming, not to compare whole documents.
I am sure StackOverflow has put some smarts into the similarity search, as this is not a trivial matter.
You can get halfway decent results with the similarity function and operators provided by the pg_trgm module:
SELECT content, similarity(content, 'grand new title asking foo') AS sim_score
FROM posts
WHERE content % 'grand new title asking foo'
ORDER BY 2 DESC, content;
Be sure to have a GiST index on content for this.
But you'll probably have to do more. You could combine it with Text Search after identifying keywords in the new content.
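To give a feel for what the trigram score measures, here is a simplified Python approximation: each lowercased word is padded with two leading spaces and one trailing space, cut into 3-character windows, and the score is the share of trigrams the two strings have in common. The function names are made up for this sketch, and the real pg_trgm module may treat non-alphanumeric characters and locales differently, so the numbers are not guaranteed to match PostgreSQL's.

```python
import re


def trigrams(text):
    """Collect the set of 3-character windows over each padded, lowercased word."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "  # two leading spaces, one trailing space
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


score = similarity("grand new title asking foo", "brand new title asking bar")
print(round(score, 3))
```

Identical strings score 1.0 and strings with no trigram in common score 0.0; everything else lands in between.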
You need to use Full Text Search in Postgres.
http://www.postgresql.org/docs/9.1/static/textsearch-intro.html

Unable to use Sphinx MVA sql_attr_multi

I have a field called "tags" and it has values (say) "Music, Art, Sports, Food" etc. How can I use the setFilter function in PHP-Sphinx for this field? I know that it has to be an integer and should be used as an array in PHP. So, if I use a numeric field for tags, what about the delimiters (in this case, commas)? Currently, I am using "sql_attr_multi" like this…
sql_attr_multi = uint tags from field
I have to filter the search based on any of the keywords the user has selected (Music, Sports, Food, etc.). As such, MVA seems to be the only right option for this, but I am just not able to figure out how to do it. I can store all tag elements as numeric values and make the tags field an int. But what about the commas, and how would I convert the whole string (Music, Art, Sports, Food) to integers? And later, how do I call setFilter from PHP?
Any help is highly appreciated.
Well, using an MVA suggests you already have unique ids for each tag.
That would be the case if you had a separate table for tags (with a PK) and a many-to-many table joining your documents and tags (that's a very common way to store tags, in normal form).
If you have a text column containing the tags, it would be easier to just use a field. You can easily filter by fields in the main text query:
crispy creams @tags Food
for example (that's an extended-mode query).
(But fields can't do grouping like you can with attributes.)
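If you do want to go the MVA route, the comma-separated strings first have to be turned into integer ids. A rough sketch of that normalization step in Python is below; the function name, the sample rows and the id numbering are all made up for illustration. It yields a tag-name-to-id mapping (the tags table) and a list of (document id, tag id) pairs (the many-to-many table), which is the shape an integer multi-value attribute needs.

```python
def normalize_tags(rows):
    """Split comma-separated tag strings into a tag-id table and (doc_id, tag_id) pairs.

    rows is an iterable of (doc_id, "Tag A, Tag B, ...") tuples.
    """
    tag_ids = {}   # tag name -> integer id (would be the tags table)
    doc_tags = []  # (doc_id, tag_id) pairs (would be the join table)
    for doc_id, tag_string in rows:
        for name in tag_string.split(","):
            name = name.strip()
            if not name:
                continue  # ignore empty pieces such as a trailing comma
            # Reuse the existing id, or assign the next free one.
            tag_id = tag_ids.setdefault(name, len(tag_ids) + 1)
            doc_tags.append((doc_id, tag_id))
    return tag_ids, doc_tags


rows = [(1, "Music, Art, Sports, Food"), (2, "Food, Music")]
tag_ids, doc_tags = normalize_tags(rows)
print(tag_ids)
print(doc_tags)
```

The ids from the mapping are then what you would pass (as an array of integers) when filtering on the attribute.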

Is there a way to create & index N no. of 'fields' dynamically with Sphinx

I am using Sphinx (with the Thinking Sphinx v2.0 plugin for RoR).
Let's say I have several indexes on the User model: on 'name', 'address', and on its one-to-many associations like 'posts', 'comments', etc.
This means that searching by post content returns the User who made the post, and using the :fieldmask 'rank mode' of Sphinx, I am able to determine that the user was found because of a match in 'posts'. But a user has many posts, so how do I determine which post matched?
Is there any way I can specify the index fields dynamically while indexing?
For example, if I could specify 'post_1'='< post1content >', 'post_5'='< post5content >' as different fields for user1, and similarly 'post_2', 'post_7' for user2, then a search would return user2 with post_7 as the matching field.
Sphinx can't have different fields for each record, I'm afraid, so what you're hoping to do isn't possible with that approach.
If you need to know which posts match a query, I'd recommend conducting the search on the Post model instead; you can then refer to each post's user. You could sort by user_id before weight, or group by user_id (so only one post per user is returned). You'd also be able to bring user data into the Post index definition (and since a post has one user, that data is kept to single values, instead of many, per record).
Hope this gives you some clarity with your options.
If you know that you want to search for post_5 in one query and for post_7 in another query, you may use json as {post_1:, post_2:}.
The problem is that you have to know the number of the post you are searching for.
Maybe have a look at https://stackoverflow.com/a/24505347/1444576 to see if it is similar to your example.

Is it possible to perform a Sphinx search on one string attribute?

sql_query=SELECT id,headline,summary,body,tags,issues,published_at
FROM sphinx_search
I am working on the search feature of my website and I am using Sphinx, Perl and Sphinx::Search. As long as I search in all the attributes and don't restrict the search to just one, everything goes well. However, when the user searches for a specific tag, I can't just give the result of a fuzzy search; I want to use the power of Sphinx to search only on tags or issues, and sometimes the user may want to search on headline and issues.
How can I perform such a task?
You need to put it in Extended Match Mode:
https://metacpan.org/module/JJSCHUTZ/Sphinx-Search-0.27.2/lib/Sphinx/Search.pm#SetMatchMode
Then you can use the Extended Query syntax:
http://sphinxsearch.com/docs/current.html#extended-syntax
which includes the field search operator:
@tags keyword1
(Be careful: in Sphinx, the word "attribute" has a specific meaning. Attributes are values attached to the document, useful for sorting/grouping/filtering and for returning with the result set, whereas I think you are talking about fields. All the columns from the sql_query that you don't mark as an attribute are fields, and are full-text searchable.)
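For the "headline and issues" case, the extended syntax also accepts a list of fields in parentheses after the @. A small sketch of building such query strings follows; it is written in Python rather than Perl purely for illustration, and the helper field_query is hypothetical. It only produces the string you would then hand to the search call with the match mode set to extended, and it does not escape special characters in the keywords.

```python
def field_query(fields, keywords):
    """Build an extended-syntax query limited to the given full-text fields.

    One field gives '@field keywords'; several give '@(f1,f2) keywords'.
    """
    if not fields:
        raise ValueError("at least one field is required")
    if len(fields) == 1:
        prefix = f"@{fields[0]}"
    else:
        prefix = "@(" + ",".join(fields) + ")"
    return f"{prefix} {keywords}"


print(field_query(["tags"], "keyword1"))
print(field_query(["headline", "issues"], "keyword1"))
```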