SPH_MATCH_EXTENDED sphinx - sphinx

My index contains the text "this is test", and I want to be able to search for "this test is".
In other words, I'm looking for an exact match on the words, regardless of their position.
Is this the right way to do it?
$cl->SetMatchMode( SPH_MATCH_EXTENDED);
$result = $cl->Query( '"this test is"~3', $index );
If so, should the number after the ~ be the number of words in the phrase, to get an exact-match search regardless of word position?
I have tested it and the results look good, but I'm not sure.

Your use of the proximity operator looks fine to me.

Microsoft graph Mail Search Strict value

I have an issue with the search parameters. I want to pass a phrase in my query. For example, I'm looking for emails where the subject is "Test 1".
For this I'm doing a GET on this resource:
https://graph.microsoft.com/v1.0/me/messages?$search="subject:Test 1"
But this query looks for mails that contain "Test" in the subject OR "1" in any other field.
Referring to the KQL syntax documentation:
A phrase (includes two or more words together, separated by spaces; however, the words must be enclosed in double quotation marks)
So, to do what I want, I have to put double quotes (") around my phrase to get a strict value search, like below:
subject:"Test 1"
This is where the problem is: the Microsoft Graph API already uses double quotes (") after the $search parameter.
?$search="Key words"
So I can't do what is mentioned in the KQL doc.
https://graph.microsoft.com/v1.0/me/messages?$search="subject:"Test 1""
It throws an error:
"Syntax error: character '1' is not valid at position 15 in '\"subject:\"test 1\"\"'.",
This is expected behaviour; I was fairly sure it would not work.
If someone has any suggestions for a solution or a workaround, I'd welcome them.
What I've already tried so far:
Using single quotes
Removing the quotes right after $search=
Removing the subject part, i.e. $search="Test 1": same behaviour as the first request mentioned in this post. It looks for emails that contain "test" or "1".
Best regards.
EDIT:
After sasfrog's answer:
I used $filter. It works well with the simple operators AND and OR, but I get errors when using the NOT operator. Also, you have to use the orderby parameter to sort the results by date, and include that field in the filter parameter.
Example 1 (working, what I asked for first):
https://graph.microsoft.com/v1.0/me/messages/?$orderby=receivedDateTime desc &$filter=receivedDateTime ge 1900-01-01T00:00:00Z AND contains(subject,'test 1')
Example 2 (not working):
https://graph.microsoft.com/v1.0/me/messages/?$orderby=receivedDateTime desc &$filter=(receivedDateTime ge 1900-01-01T00:00:00Z AND contains(subject,'test 1')) NOT(contains(from/EmailAddress/address,[specific address]))
EDIT 2
After some tests with the filter parameter:
The NOT operator is still not working, so as a workaround use "ne" (not equals).
Example 2 becomes:
https://graph.microsoft.com/v1.0/me/messages/?$orderby=receivedDateTime desc&$filter=(receivedDateTime ge 1900-01-01T00:00:00Z AND contains(subject,'test 1')) AND (from/EmailAddress/address ne [specific address])
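The $filter URLs above contain raw spaces, which some HTTP clients will not send as-is. As a rough sketch (in Python, which is an assumption; the thread does not say which client language is used), the query string could be percent-encoded like this. The `build_filter_url` helper and its parameters are hypothetical names for illustration only.

```python
from urllib.parse import urlencode, quote

BASE_URL = "https://graph.microsoft.com/v1.0/me/messages"


def build_filter_url(subject_text: str) -> str:
    """Build the 'Example 1' request with spaces percent-encoded.

    Hypothetical helper: it only assembles the URL and does not call the API.
    """
    params = {
        "$orderby": "receivedDateTime desc",
        "$filter": (
            "receivedDateTime ge 1900-01-01T00:00:00Z "
            f"AND contains(subject,'{subject_text}')"
        ),
    }
    # Keep $ ( ) , ' : readable; encode everything else that needs it (spaces -> %20).
    return BASE_URL + "?" + urlencode(params, quote_via=quote, safe="$(),':")


url = build_filter_url("test 1")
print(url)
```

This sketch does not escape single quotes inside `subject_text`, so it is only suitable for simple values like the one in the question.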
UPDATE: OTHER SOLUTION WITH $search
Using $filter is great, but it is sometimes pretty slow. So I found a workaround for my issue.
It is to use the AND operator between all the terms.
Example 4:
I'm looking for the mails where the subject is test 1.
Let value = "test 1". Split it on spaces, then write some code to turn the resulting array into something like the following:
$search="(subject:test AND subject:1)"
The brackets can be important if you search across multiple fields. And voilà.
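The split-and-join step described above could look roughly like the following. This is a minimal sketch in Python (the post does not name a language), and `build_and_search` is a hypothetical helper name. It only builds the value for the $search parameter; it does not send a request.

```python
def build_and_search(field: str, phrase: str) -> str:
    """Return a $search value requiring every word of `phrase` in `field`."""
    terms = phrase.split()  # split on whitespace
    if not terms:
        raise ValueError("phrase must contain at least one word")
    # One field:term clause per word, joined with AND and wrapped in brackets.
    clauses = " AND ".join(f"{field}:{term}" for term in terms)
    return f'"({clauses})"'


search_value = build_and_search("subject", "test 1")
print("$search=" + search_value)  # $search="(subject:test AND subject:1)"
```

Note that this requires both words to be present but does not by itself require them to be adjacent or in order, so it is looser than a true phrase match.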
I'm not sure if it's sufficient for what you're doing, but how about using the contains function within a filter query instead:
https://graph.microsoft.com/v1.0/me/messages?$filter=contains(subject,'Test 1')
It sounds like you're already looking at the documentation, but here it is just in case.
Update: this also worked for me using the $search parameter:
https://graph.microsoft.com/v1.0/me/messages?$search="subject:'Test 1'"

Tableau Filter Formula

I am trying to filter on workgroup names that contain BL or CL, so I used the formula:
STARTSWITH([wrkgrp_shrt_nm], "BL") or STARTSWITH([wrkgrp_shrt_nm], "CL" )
I get the little green check, but when I hit Apply the result is blank and nothing pulls through.
I tried another formula:
if right([wrkgrp_shrt_nm],2) = 'BL' then 1 elseif
right([wrkgrp_shrt_nm],2) = 'CL' then 1 elseif
right([wrkgrp_shrt_nm],2) then 0
end
but I only get an error.
Any suggestions?
If you want "contains", you can just call contains()
contains(wrkgrp_shrt_nm, 'BL') or contains(wrkgrp_shrt_nm, 'CL')
Does the same thing as the find() solution Fred posted, just a little easier to read in this case. I'm not sure why Fred says you cannot use IF. I use IF all the time without problems.
BTW, in case you were wondering, the square brackets around field names are optional if the field name does not include spaces or punctuation, and function names are not case sensitive.
To clarify, you're asking for "contains BL or CL", but your formula specifies STARTSWITH, which will be true only if your field [wrkgrp_shrt_nm] starts with the string "BL" or the string "CL".
If you want "contains", you could use FIND:
FIND([wrkgrp_shrt_nm], 'BL' ) > 0 OR FIND([wrkgrp_shrt_nm], 'CL' ) > 0
You cannot use IF in a condition field, but you can use inline IF (IIF). However, it's not necessary in your case.
Edit:
I could well be wrong with my comment on IF (I'm still new to Tableau), but I tried IF in a condition field of a filter (as the OP asked) and couldn't make it work. I use IF all the time in Calculated Fields, however. I'll try again...
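The difference between "starts with", "ends with" (the RIGHT(..., 2) attempt) and "contains" is easy to check outside Tableau. The sketch below uses Python string methods as an analogy only; it is not Tableau syntax, and the workgroup names are made-up sample data.

```python
# Hypothetical workgroup short names, for illustration only.
names = ["BL01", "XBLX", "01CL", "ZZ99"]

codes = ("BL", "CL")

# Analogue of STARTSWITH(...): code must be at the beginning.
starts_with = [n for n in names if n.startswith(codes)]

# Analogue of RIGHT(..., 2) = 'BL': code must be the last two characters.
ends_with = [n for n in names if n.endswith(codes)]

# Analogue of CONTAINS(...) or FIND(...) > 0: code may appear anywhere.
contains = [n for n in names if any(c in n for c in codes)]

print(starts_with)  # ['BL01']
print(ends_with)    # ['01CL']
print(contains)     # ['BL01', 'XBLX', '01CL']
```

If the real field values carry the code in the middle of the name, only the "contains" form will keep them, which would explain an empty result with STARTSWITH.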

regexpression [R] "About 52,883,038 results"

I want to parse an HTML webpage (specifically a Google search results page),
looking for the specific counter string
"About *many results"
where *many can range from 0 to 999,999,999,999.
grep("About [0-9] results",file)
I can't figure out how to incorporate the range of numbers (including commas) into the regular expression. Can anyone clarify? I've looked at similar questions, but their code does not work for this task.
I'm guessing I need to introduce some kind of wildcard ".", but I don't think I'm using it correctly.
The structure I had in mind was
Any#Times { { Any#Times( [0-9] ) },}
Solved my own question.
It didn't have to be fancy at all:
"About .* results"
works fine.
Depending on the content of the page, your .* may work, but it could also match a very long and incorrect string.
If you want to make sure that you get only numbers, try:
"About ([0-9]+|[0-9]{1,3}(,[0-9]{3})*) results"
I've tested it with grep -E and it'll give you ungrouped numbers:
About 10000000 results
as well as grouped numbers using British/English conventions:
About 100,000 results
but not non-numbers:
About a bajillion results
or badly grouped numbers:
About 100,0 results
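The four cases above can be checked quickly with the same pattern. The sketch below uses Python's `re` module rather than R or grep -E (an assumption made for illustration; the pattern itself is unchanged), and `extract_count` is a hypothetical helper name.

```python
import re
from typing import Optional

# Same pattern as above: either plain digits or comma-grouped thousands.
COUNT_RE = re.compile(r"About ([0-9]+|[0-9]{1,3}(,[0-9]{3})*) results")


def extract_count(text: str) -> Optional[int]:
    """Return the result count as an int, or None if the text does not match."""
    match = COUNT_RE.search(text)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


for sample in [
    "About 10000000 results",
    "About 100,000 results",
    "About a bajillion results",
    "About 100,0 results",
    "About 52,883,038 results",
]:
    print(sample, "->", extract_count(sample))
```

The first, second and last samples yield integers; the other two yield None, matching the behaviour described for grep -E.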

What's the opposite of substr?

I have a text file like so:
dave ran very quickly
dan very slowly ran
I am doing a regex to look for the word "ran" but I also need to know where it starts (in the first case it is character 6, in the second case it's 17).
I have (though it isn't much):
for (@lines) {
    if (/ran/) {
        # find where ran is so we can continue parsing
    }
}
It's easy:
my $ran_pos = $-[0];
See the perlvar man page for a detailed description of the @- array.
I believe the index function is what you're looking for.
Here are a couple links:
http://perlmeme.org/howtos/perlfunc/index_function.html
http://www.misc-perl-info.com/perl-index.html
index STR,SUBSTR will return the position of a substring within a string.
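For comparison, here are the same two approaches (position of a regex match, and position of a plain substring) as a short Python sketch. This is an analogy in a different language, not Perl, and the function names are hypothetical. Positions here are zero-based, so they come out one lower than the 1-based character counts given in the question.

```python
import re

lines = ["dave ran very quickly", "dan very slowly ran"]


def regex_start(line: str, pattern: str) -> int:
    """Zero-based offset where the regex first matches, or -1 if it does not."""
    match = re.search(pattern, line)
    return match.start() if match else -1


def substring_start(line: str, word: str) -> int:
    """Zero-based offset of the first occurrence of `word`, or -1 if absent."""
    return line.find(word)


regex_positions = [regex_start(line, r"ran") for line in lines]
find_positions = [substring_start(line, "ran") for line in lines]
print(regex_positions)  # [5, 16]
print(find_positions)   # [5, 16]
```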

Zend Search Lucene and Accented Characters

I'm trying to find a way in Zend_Search_Lucene to pull off the following scenario:
Let's say we have a user and her name is Aïcha (note the special character). If I'm searching the index for Aicha (without the special derivative of i), I'd like for Aïcha to be returned in the results.
Is there something special I need to do when indexing or searching in order to make this work? I've read solutions about normalizing the data before indexing, replacing all special characters with normalized characters, but I'd rather not go that route.
Thanks in advance,
Gary
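function normalize ($string){
    // Each character in $a is replaced by the character at the same position in $b,
    // so each string must stay on a single line and both must have the same length.
    // Note: Ŕ and ŕ are outside ISO-8859-1, so utf8_decode() turns them into '?'.
    $a = 'ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûýýþÿŔŕ';
    $b = 'aaaaaaaceeeeiiiidnoooooouuuuybsaaaaaaaceeeeiiiidnoooooouuuyybyRr';
    $string = utf8_decode($string);
    $string = strtr($string, utf8_decode($a), $b);
    $string = strtolower($string);
    return utf8_encode($string);
}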
$passToIndexer = normalize(" Aïcha ");
Try using this function's output when creating the index, and store the actual value without indexing it. Hope it helps; frankly, I don't think there is any other way.
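The same normalize-before-indexing idea can also be expressed with Unicode decomposition instead of a hand-written lookup table. The sketch below is in Python, not PHP, and is not part of Zend_Search_Lucene; `fold_accents` is a hypothetical helper name. The folded value would be what gets indexed and searched, while the original name is stored for display.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Lowercase `text` and drop combining accent marks (e.g. 'ï' -> 'i')."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep only characters that are not combining marks.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


indexed_value = fold_accents("Aïcha")
print(indexed_value)  # aicha
```

Letters that have no Unicode decomposition, such as ø or ß, pass through unchanged with this approach, so a small explicit mapping may still be needed for them.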