can my int / uint mappings of a configuration be stop words - postgresql

I want to run a PostgreSQL text search where the search phrase contains a number.
Today, for the English sentence The brown foxes jumped over the 10 lazy dogs., the query a fox jumped over the 2 lazy dogs returns false:
> SELECT to_tsvector('public.english', 'The brown foxes jumped over the 10 lazy dogs.') @@ phraseto_tsquery('public.english', 'a fox jumped over the 2 lazy dogs')
false
I want the request to return true, whatever the number specified.
The only solution I found so far is to create a new configuration, based on the existing one, and drop the mappings uint / int from this new configuration :
CREATE TEXT SEARCH CONFIGURATION public.english ( COPY = pg_catalog.english );
ALTER TEXT SEARCH CONFIGURATION public.english DROP MAPPING FOR int, uint;
Then I have the result below :
> SELECT to_tsvector('public.english', 'The brown foxes jumped over the 10 lazy dogs.') @@ phraseto_tsquery('public.english', 'a fox jumped over the 2 lazy dogs')
true
But my second test returns true when it should return false, because the query contains no number:
> SELECT to_tsvector('public.english', 'The brown foxes jumped over the 10 lazy dogs.') @@ phraseto_tsquery('public.english', 'a fox jumped over the lazy dogs')
true
So I was thinking of marking uint / int tokens as stop words, but I don't see how to do that. Any help would be really appreciated!
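One possible workaround, offered as a sketch and not as the way to turn a token type into a stop word: replace every run of digits with a fixed placeholder word on both sides before the text reaches the parser. Any number then produces the same lexeme and still occupies a position, which is what the phrase distance needs. The helper name mask_numbers and the placeholder numtoken are hypothetical, and the pattern assumes plain whole numbers (decimals, signs and digits embedded in words are not handled).

```sql
-- Hypothetical helper: replace each run of digits with the placeholder word
-- 'numtoken', so every number yields the same lexeme and keeps its position.
CREATE OR REPLACE FUNCTION mask_numbers(input text)
RETURNS text
LANGUAGE sql
IMMUTABLE
AS $$
    SELECT regexp_replace(input, '[0-9]+', 'numtoken', 'g');
$$;

-- Query with a different number than the document.
SELECT to_tsvector('pg_catalog.english', mask_numbers('The brown foxes jumped over the 10 lazy dogs.'))
       @@ phraseto_tsquery('pg_catalog.english', mask_numbers('a fox jumped over the 2 lazy dogs'));

-- Query with no number at all.
SELECT to_tsvector('pg_catalog.english', mask_numbers('The brown foxes jumped over the 10 lazy dogs.'))
       @@ phraseto_tsquery('pg_catalog.english', mask_numbers('a fox jumped over the lazy dogs'));
```

Under these assumptions the first query should return true and the second false, because in the second query lazy sits one position closer to jumped than it does in the document. This uses the stock pg_catalog.english configuration, so the int / uint mappings stay in place.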

Related

Postgres FTS Priority Field

I am using Postgres FTS to search a field in a table. For some reason, the following is happening:
store=# select name from service where to_tsvector(name) @@ to_tsquery('find:*') = true;
name
--------------
Finding Nora
(1 row)
store=# select name from service where to_tsvector(name) @@ to_tsquery('findi:*') = true;
name
------
(0 rows)
Why does the query findi:* return no result?
In my PG 12.2 with default text search configuration I have:
# select to_tsvector('Finding Nora');
to_tsvector
-------------------
'find':1 'nora':2
(1 row)
# select to_tsquery('findi:*');
to_tsquery
------------
'findi':*
(1 row)
As I understand it, to_tsvector stems Finding to the lexeme find, while to_tsquery leaves findi as it is. The prefix findi is longer than the stored lexeme find, so the query does not find any match.
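To make the difference concrete, here is a small sketch comparing the stemming english configuration with the built-in simple configuration, which only lowercases words. Switching to simple is an assumption about what you want, not something the question asks for: it makes prefix search on the typed characters work, but you lose stemming.

```sql
-- 'english' stems the document word to 'find', so the longer prefix
-- 'findi' has nothing to match against.
SELECT to_tsvector('english', 'Finding Nora') @@ to_tsquery('english', 'findi:*');

-- 'simple' stores 'finding' unstemmed, so the prefix 'findi' can match it.
SELECT to_tsvector('simple', 'Finding Nora') @@ to_tsquery('simple', 'findi:*');
```

The first query returns false and the second returns true. If prefix search matters more than stemming for this column, building the tsvector (and any index on it) with simple may be the better fit.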

Full text search with #

I have tables with a comment column like
create table x (
id uuid primary key,
a_random_business_field int,
comment text
);
In this comment field, people put #myname, where #myname is their Slack name. I would like to search for these #myname tags. I tried this:
SELECT * FROM x WHERE comment @@ to_tsquery('#myname');
and
SELECT * FROM x WHERE comment @@ plainto_tsquery('#myname');
It depends on the full text search configuration. You can make to_tsquery leave these names as they are, but there are still some problems with matching the "#" symbol:
SELECT
to_tsvector('simple', 'Hi! My name is #Mike') @@ to_tsquery('simple', '#Mike'),
to_tsvector('simple', 'Hi! My name is #Mikel') @@ to_tsquery('simple', '#Mike'),
to_tsvector('simple', 'Hi! My name is Mike') @@ to_tsquery('simple', '#Mike');
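The remaining problem is that the default parser treats "#" as punctuation, so #Mike and Mike end up as the same lexeme. One way to work around this, sketched here under the assumption that a second filter is acceptable, is to let full text search narrow the rows by word and then require the literal "#" with a pattern match. The sample rows below are hypothetical stand-ins for table x.

```sql
-- Hypothetical sample rows standing in for table x.
WITH x(comment) AS (
    VALUES ('Hi! My name is #Mike'),
           ('Hi! My name is #Mikel'),
           ('Hi! My name is Mike')
)
SELECT comment
FROM x
-- Whole-word match: keeps the 'mike' rows and drops '#Mikel'.
WHERE to_tsvector('simple', comment) @@ plainto_tsquery('simple', '#Mike')
-- Literal check: drops the row that has 'Mike' without the '#'.
  AND comment ILIKE '%#Mike%';
```

Only the first row should come back. The full text condition can use a GIN index, while the ILIKE condition is evaluated only on the rows that survive it.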

PostgreSQL full text search yielding weird results

I have a schema like this (simplified):
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL
);
CREATE INDEX users_idx
ON users
USING GIN (to_tsvector('finnish', name));
But I'm getting completely invalid results with my queries:
# select name from users where to_tsvector('finnish', name) @@ to_tsquery('lemmin');
name
------
(0 rows)
# select name from users where to_tsvector('finnish', name) @@ to_tsquery('lemmink');
name
--------------------
Riitta ja Lemminki
Riitta ja Lemminki
(2 rows)
# select name from users where name ilike 'lemmink%';
name
----------------------
Lemminkäinen Matilda
Lemminkäinen Matias
Lemminkäinen Kyösti
Lemminkäinen Tuomas
(4 rows)
Another example:
# select name from users where to_tsvector('finnish', name) @@ to_tsquery('partu');
name
----------
Partuuna
(1 row)
# select name from users where to_tsvector('finnish', name) @@ to_tsquery('partur');
name
------------------------
Parturi-Kampaamo Raija
Parturi-Kampaamo Siema
(2 rows)
I was expecting to get the bottom two results on both queries...
Using the following version:
psql (9.4.6, server 9.5.2)
WARNING: psql major version 9.4, server major version 9.5.
Some psql features might not work.
I don't speak Finnish, but this looks like the expected result. FTS looks for lexemes, not for parts of words. E.g., do is not a lexeme for dog, but dog is for dogs:
t=# select to_tsvector('english', 'Dogs eats bone') @@ to_tsquery('do');
NOTICE: text-search query contains only stop words or doesn't contain lexemes, ignored
?column?
----------
f
(1 row)
t=# select to_tsvector('english', 'Dogs eats bone') @@ to_tsquery('dog');
?column?
----------
t
(1 row)
So I believe the final i in Parturi is an optional ending, right?
Update:
from https://en.wiktionary.org/wiki/parturi :
partur[i], partur[eita] => lexeme will be partur
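To check this kind of guess directly, you can look at the lexemes on both sides. The sketch below also passes the configuration to to_tsquery explicitly: the queries in the question call to_tsquery without one, so the query side is processed with default_text_search_config, which may not be finnish. The second statement is an assumption about the goal (matching by typed prefix) and uses the unstemmed simple configuration, which would need its own index.

```sql
-- Inspect what is stored and what is searched for, using the same
-- configuration on both sides.
SELECT to_tsvector('finnish', 'Parturi-Kampaamo Raija') AS stored,
       to_tsquery('finnish', 'partur') AS searched;

-- Prefix search on unstemmed words: 'lemmin:*' is meant to match any
-- word that starts with 'lemmin'.
SELECT to_tsvector('simple', 'Lemminkäinen Matilda')
       @@ to_tsquery('simple', 'lemmin:*') AS prefix_match;
```

If the stored and searched lexemes differ in the first result, that explains the missing rows. The prefix form behaves more like the ilike 'lemmink%' query from the question, at the cost of Finnish stemming.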

EntityFramework Query select (contains) (all villages that have farmers which plant apples)

village(id, list(farmers))
farmer(id, List(fruits));
fruit(id,name).
How would I write a query that selects all villages that have the fruit with ID 23 (e.g. apples)?
It would be easy to write this with 2 queries. How would you do it with one?
Try
var villages = db.Villages
.Where(v => v.Farmers.Any(f => f.Fruits.Any(t => t.Id == 23)));

Postgresql: how to make full text search ignore certain tokens?

Is there a magic function or operator to ignore some tokens?
select to_tsvector('the quick. brown fox') @@ 'brown' -- returns true
select to_tsvector('the quick,brown fox') @@ 'brown' -- returns true
select to_tsvector('the quick.brown fox') @@ 'brown' -- returns false, should return true
select to_tsvector('the quick/brown fox') @@ 'brown' -- returns false, should return true
I'm afraid that you are probably stuck. If you run your terms through ts_debug you will see that 'quick.brown' is parsed as a hostname and 'quick/brown' is parsed as a filesystem path. Sadly, the parser really isn't that clever.
My only suggestion is that you preprocess your texts to convert these tokens to spaces. You could easily create a function in plpgsql to do that.
nicg=# select ts_debug('the quick.brown fox');
ts_debug
---------------------------------------------------------------------
(asciiword,"Word, all ASCII",the,{english_stem},english_stem,{})
(blank,"Space symbols"," ",{},,)
(host,Host,quick.brown,{simple},simple,{quick.brown})
(blank,"Space symbols"," ",{},,)
(asciiword,"Word, all ASCII",fox,{english_stem},english_stem,{fox})
(5 rows)
As you can see from the above, you don't get tokens for quick and brown.
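The preprocessing suggested above could look roughly like this. It is a minimal sketch written as a plain SQL function instead of plpgsql; the function name normalize_for_fts is hypothetical, and it only handles the two separators from the question ("." and "/").

```sql
-- Hypothetical helper: turn '.' and '/' into spaces so the parser sees
-- separate words instead of a hostname or a file path.
CREATE OR REPLACE FUNCTION normalize_for_fts(input text)
RETURNS text
LANGUAGE sql
IMMUTABLE
AS $$
    SELECT regexp_replace(input, '[./]', ' ', 'g');
$$;

-- Both of the failing examples from the question, after normalization.
SELECT to_tsvector('english', normalize_for_fts('the quick.brown fox')) @@ to_tsquery('english', 'brown'),
       to_tsvector('english', normalize_for_fts('the quick/brown fox')) @@ to_tsquery('english', 'brown');
```

Both columns should now be true. The trade-off is that real hostnames and paths in the text are split into separate words as well, so this only fits if you never need to search for them as a unit.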