Return Start/End position of match in Sphinx

Can I request the start and stop positions of matches in a document with Sphinx?
Given say
select ID from idx_1 WHERE (MATCH('@(name) "New York"'))
can I ask it to tell me the character positions of the first letter, 'N', and the last letter, 'k', of "New York" in the match?

Sphinx does not track character positions, so it can't tell you that directly.
You could use BuildExcerpts or the SNIPPET function and compare the output with the document to deduce the position yourself.
Or there is the PACKEDFACTORS function, which gives you many details of the ranking calculation, including the WORD position of each keyword. (Sphinx does track word positions, since all of its matching is word-based, or strictly speaking token-based.)
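To illustrate the "deduce the position yourself" idea, here is a minimal Python sketch that looks for the phrase in the stored document text on the application side. It is not part of Sphinx: the helper name find_phrase_spans and the sample text are made up for illustration, and the plain case-insensitive match only approximates Sphinx's tokenization (no stemming, charset tables or wordforms).

```python
import re

def find_phrase_spans(document, phrase):
    """Return (start, end) character offsets of each occurrence of phrase.

    Offsets are 0-based and end is exclusive, so the last matched
    character is document[end - 1].
    """
    words = [re.escape(w) for w in phrase.split()]
    if not words:
        return []
    # Allow any run of whitespace between the words of the phrase.
    pattern = re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)
    return [m.span() for m in pattern.finditer(document)]

# Hypothetical document text fetched for an ID that Sphinx returned.
doc = "Flights to New York leave daily; new  york hotels are busy."
spans = find_phrase_spans(doc, "New York")
for start, end in spans:
    # Print the first and last character position plus the matched text.
    print(start, end - 1, repr(doc[start:end]))
```

Sphinx would still do the searching; a helper like this would only run over the handful of documents it returns.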

Yes, you can do it as shown below when using SphinxQL:
select * from courses where MATCH('@title PMP') limit 0, 100;

Related

How do I use pg_trgm to be more permissive

I used pg_trgm to check string matches and I am pretty happy with the results, but it does not behave exactly the way I want. I want a search for "poduto" to find "produtos" (the r is missing), and a search for "sofáa" to find "sofa". I am using PostgreSQL 9.6.
It does find "vermelho" when I type "vermelo" (the h is missing), and it does find "sofa" when I type "sof". It seems that only some letters in the middle can be left out, while a final letter can always be missing. I want to be able to omit any letter in the middle of a word, and also to make "two mistakes", as in the case of sofáa and sofá (I used an accent and added one extra "a").
The solution is to lower pg_trgm.similarity_threshold (or pg_trgm.word_similarity_threshold if you are using <% or %>).
Then words with lower similarity will also be found.
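To see why lowering the threshold lets more words through, here is a rough Python sketch of trigram similarity. It is an approximation rather than pg_trgm's actual code: it assumes the padding described in the pg_trgm docs (two spaces before each word, one after) and a simple "shared trigrams divided by all distinct trigrams" score. The function names and the >= comparison are my own choices for illustration.

```python
import re

def trigrams(text):
    """Set of trigrams, padding each word with two leading spaces
    and one trailing space."""
    result = set()
    for word in re.findall(r"\w+", text.lower()):
        padded = "  " + word + " "
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result

def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def search(query, candidates, threshold=0.3):
    """Keep candidates whose similarity reaches the threshold
    (0.3 is the documented default of pg_trgm.similarity_threshold)."""
    return [c for c in candidates if similarity(query, c) >= threshold]

words = ["produtos", "vermelho", "sofa"]
for q in ["poduto", "vermelo", "sof"]:
    # Show each candidate's score, then what survives two thresholds.
    print(q, [(w, round(similarity(q, w), 3)) for w in words])
    print("  threshold 0.5:", search(q, words, 0.5))
    print("  threshold 0.3:", search(q, words, 0.3))
```

A missing letter in the middle of a word destroys several trigrams at once, while a missing final letter only affects the last one or two, which is consistent with the behaviour described in the question.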

I can't understand the behaviour of btrim()

I'm currently working with PostgreSQL and came across the btrim function. I checked many websites for an explanation, but I still don't really understand it.
Here they mention this example:
btrim('xyxtrimyyx', 'xyz')
It gives trim.
When I try this example:
btrim('xyxtrimyyx', 'yzz')
or
btrim('xyxtrimyyx', 'y')
I get this: xyxtrimyyx
I don't understand this. Why didn't it remove the y?
From the docs you point to, the definition says:
Remove the longest string consisting only of characters in characters
(a space by default) from the start and end of string
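The reason your examples don't work is that the function strips from both ends of the text, and at each end it only keeps going while the character there is one of the characters specified. 'xyxtrimyyx' starts and ends with x, so with 'y' or 'yzz' as the character set the very first and very last characters already fail the test, nothing is removed, and the y characters further in are never reached.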
Let's take a look at the first example (from the docs):
btrim('xyxtrimyyx', 'xyz')
This returns trim. Working from the front of xyxtrimyyx, the function strips x, y and x, then reaches t, which is not one of the letters in xyz, so it stops stripping from the front.
We are now left with trimyyx.
Now we do the same, but from the end of the string.
While the last letter is one of xyz, remove that letter.
We do this until we reach m, so we are left with trim.
Note: I have never worked with any form of SQL, so I could be wrong about exactly how PostgreSQL implements this, but I am fairly certain from the docs that this is how it behaves.
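The stripping rule described above can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not PostgreSQL's implementation; Python's built-in str.strip(chars) follows the same rule.

```python
def btrim(string, characters=" "):
    """Strip from both ends while the end character is in `characters`."""
    chars = set(characters)
    start, end = 0, len(string)
    # Advance from the front while the current character is in the set.
    while start < end and string[start] in chars:
        start += 1
    # Retreat from the back while the last character is in the set.
    while end > start and string[end - 1] in chars:
        end -= 1
    return string[start:end]

for chars in ["xyz", "yzz", "y"]:
    print(repr(chars), "->", btrim("xyxtrimyyx", chars))
```

With 'yzz' or 'y' both loops stop immediately, because the string begins and ends with x.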

Looking for the first underscore then find the 5th space after

I am trying to create a filter where I am looking for the first (or last) occurrence of an underscore (or it can be any character) and then start from there to look for the 5th character.
I am thinking of something along the lines of a right or left character index, picking a side to start on. I would really appreciate an explanation of why your answer is written the way it is.
Example: I am looking for __poptarts_________.
So I would want it to start at the leftmost _ and search for the 5th character after that (p).
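You could achieve that by using both SUBSTRING and CHARINDEX:
SELECT SUBSTRING(string, CHARINDEX('_', string, 0) + 1, 5)
CHARINDEX returns the position of the first '_', and SUBSTRING then takes 5 characters starting one position after it.
In your case that would be:
SELECT SUBSTRING('I am looking for __poptarts_________', CHARINDEX('_', 'I am looking for __poptarts_________', 0) + 1, 5)
The result is _popt, because there are two '_' before the 'p' and the substring starts right after the first one.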
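For comparison, the same "find the marker, then take the next 5 characters" logic can be sketched in Python. The helper names after_first and after_last are made up for illustration. One difference from the T-SQL version: when the marker is missing, this sketch returns an empty string, whereas CHARINDEX would return 0 and SUBSTRING would start at the beginning of the string.

```python
def after_first(text, marker="_", length=5):
    """Return `length` characters following the first occurrence of marker."""
    pos = text.find(marker)
    if pos == -1:
        return ""  # marker not present
    return text[pos + 1 : pos + 1 + length]

def after_last(text, marker="_", length=5):
    """Same idea, but starting from the last occurrence of marker."""
    pos = text.rfind(marker)
    if pos == -1:
        return ""
    return text[pos + 1 : pos + 1 + length]

s = "I am looking for __poptarts_________"
print(repr(after_first(s)))
print(repr(after_last(s)))
```

Searching from the right uses rfind; in this sample string the last underscore is the final character, so nothing follows it.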

Selecting words out of table which sound similar

I read an interesting article about English and phonetics - and would like to see if my newfound knowledge can be applied in TSQL to generate a fuzzy result set. In one of my applications, there is a table containing words, which I extracted from a word list. It is literally a one-column table -
Word |
------
A
An
Apple
...
their
there
Is there a built-in function in SQL Server to select a word which sounds the same, even though it is spelled differently? (The globalization settings are on en-ZA, last time I checked.)
SELECT Word FROM WordTable WHERE Word = <word that sounds similar>
SoundEx()
SOUNDEX converts an alphanumeric string to a four-character code that is based on how the string sounds when spoken.
Difference()
Returns an integer value that indicates the difference between the SOUNDEX values of two character expressions.
SELECT word
, SoundEx(word) As word_soundex
, SoundEx(word_that_sounds_similar) As similar_soundex
, Difference(word, word_that_sounds_similar) As how_similar
FROM wordtable
WHERE Difference(word, word_that_sounds_similar) >= 3 /* quite close! */
The value returned by Difference() indicates how similar the two words sound. Note that it takes the words themselves, not their SOUNDEX codes, and works out the codes internally.
A value of 4 indicates a strong match and a value of 0 means slim-to-no match.
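To give a feel for what the four-character code looks like, here is a small Python sketch of the classic American Soundex rules. It is an assumption that this matches SQL Server's SOUNDEX for ordinary words; SQL Server's handling of edge cases can differ (for example by compatibility level), so treat it as an illustration only.

```python
# Consonant groups used by classic Soundex; vowels, H, W and Y have no code.
CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}

def soundex(word):
    """Return the four-character Soundex code of word ('' if no letters)."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = CODES.get(c, "")
        # Add a digit only when it differs from the previous coded letter.
        if digit and digit != prev:
            code += digit
        # H and W do not separate equal codes; vowels do.
        if c not in "HW":
            prev = digit
    # Pad with zeros or truncate to exactly four characters.
    return (code + "000")[:4]

for w in ["their", "there", "Apple", "An"]:
    print(w, soundex(w))
```

Words such as their and there end up with the same code, which is what makes a lookup by SOUNDEX (or a high DIFFERENCE score) work for sound-alike matches.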

Is there a way to get the page number of the start of a paragraph with Applescript in Mac Word 2011?

I have a Word document and want to get the page number for any arbitrary paragraph within the document. I realise that paragraphs can span pages, so I actually need to ask about the start (or end) of the paragraph. Something like this pseudocode:
set the_page_number to page number of character 1 of paragraph 1 of my_document
I haven't been able to figure out how you link a range object with any kind of information about its rendering and am officially baffled.
Does anyone know the proper way?
I just found this question about dealing with this in C#: How do I find the page number for a Word Paragraph?
Poking around in the answer to that I found reference to range.get_Information(Word.WdInformation.wdActiveEndPageNumber)
It turns out there's a get range information command in the applescript dictionary, so you can do this:
set the_range to text object of character 1 of paragraph 123 of the_document
set page_number to get range information the_range information type active end adjusted page number
That'll get the page number that would be printed (e.g. if you'd set the document to start at page 42, this will produce the number you expect). Or, you can get the number without adjustment, i.e. your document page numbering is set to start at 42, but you want the page number as if numbering started at 1.
set the_range to text object of character 1 of paragraph 456 of the_document
set page_number to get range information the_range information type active end page number
Phew.