PostgreSQL full text search cannot find "andy" - postgresql

I have this PostgreSQL query:
SELECT d.user_id, display_name, avatar_url
FROM user_directory_search AS d
WHERE
user_id like '#and%';
I get these results:
user_id | display_name | avatar_url
----------------------------------------+--------------+------------
#andy.huang:synapse.siliconmotion.com | |
#andy.zhao:synapse.siliconmotion.com | Andy.zhao |
#andy.yao:synapse.siliconmotion.com | |
#andy.zou:synapse.siliconmotion.com | |
#andy.xie:synapse.siliconmotion.com | |
#andy.chang:synapse.siliconmotion.com | andy.chang |
#andy.chuang:synapse.siliconmotion.com | andy.chuang |
#andy.hsiao:synapse.siliconmotion.com | |
(8 rows)
But when I use the command:
SELECT d.user_id, display_name, avatar_url
FROM user_directory_search AS d
WHERE
vector @@ to_tsquery('english', '(andy:* | andy)');
I get nothing:
user_id | display_name | avatar_url
---------+--------------+------------
(0 rows)
Does anyone know the reason?

The problem is that the full text parser parses these strings as host names:
SELECT alias, description, token, lexemes
FROM ts_debug('english', '#andy.huang:synapse.siliconmotion.com')
WHERE alias <> 'blank';
alias | description | token | lexemes
-------+-------------+---------------------------+-----------------------------
host | Host | andy.huang | {andy.huang}
host | Host | synapse.siliconmotion.com | {synapse.siliconmotion.com}
(2 rows)
You could replace the offending periods with spaces during indexing:
SELECT alias, description, token, lexemes
FROM ts_debug('english',
translate('#andy.huang:synapse.siliconmotion.com', '.', ' '))
WHERE alias <> 'blank';
alias | description | token | lexemes
-----------+-----------------+---------------+--------------
asciiword | Word, all ASCII | andy | {andi}
asciiword | Word, all ASCII | huang | {huang}
asciiword | Word, all ASCII | synapse | {synaps}
asciiword | Word, all ASCII | siliconmotion | {siliconmot}
asciiword | Word, all ASCII | com | {com}
(5 rows)
But I would use the simple full text search configuration if I were you. Or do you want stemming (compare "token" and "lexemes" above)?
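A minimal sketch of that suggestion, assuming the simple configuration and the same translate() call as above. The question does not show how the vector column is populated, so this only checks the single sample ID from the question:

```sql
-- Replace the periods before building the tsvector so that "andy" becomes
-- its own token, and use 'simple' so that no stemming is applied.
SELECT to_tsvector('simple',
                   translate('#andy.huang:synapse.siliconmotion.com', '.', ' '))
       @@ to_tsquery('simple', 'andy:*') AS matches;
```

With the periods replaced, the prefix query should match. The same translate() call would have to be applied wherever the stored vector is built.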

Related

Create full text search configuration with two dictionaries

I want to perform a full text search on a postgresql column using the english_stem dictionary and the simple dictionary. I can do something like this:
ALTER TEXT SEARCH CONFIGURATION english_simple_conf
ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, hword_part
WITH english_stem, simple;
But this checks that the word is in both dictionaries. Is there a way to alter this configuration so the word can be matched with one dictionary OR the other?
Edit:
The reason I think they are not being checked in order is because when searching for a partial word that should be found in the simple dictionary, nothing is returned.
select * from ts_debug('english', 'gutter cleaning services');
alias | description | token | dictionaries | dictionary | lexemes
-----------+-----------------+----------+----------------+--------------+----------
asciiword | Word, all ASCII | gutter | {english_stem} | english_stem | {gutter}
blank | Space symbols | | {} | |
asciiword | Word, all ASCII | cleaning | {english_stem} | english_stem | {clean}
blank | Space symbols | | {} | |
asciiword | Word, all ASCII | services | {english_stem} | english_stem | {servic}
select * from ts_debug('simple', 'gutter cleaning services');
alias | description | token | dictionaries | dictionary | lexemes
-----------+-----------------+----------+--------------+------------+------------
asciiword | Word, all ASCII | gutter | {simple} | simple | {gutter}
blank | Space symbols | | {} | |
asciiword | Word, all ASCII | cleaning | {simple} | simple | {cleaning}
blank | Space symbols | | {} | |
asciiword | Word, all ASCII | services | {simple} | simple | {services}
select name from categories where (to_tsvector('english_simple_conf', name) @@ (to_tsquery('english_simple_conf', 'cleani:*')));
name
------
(0 rows)
But searching for a partial word that is in the english dictionary returns the row as expected.
select name from categories where (to_tsvector('english_simple_conf', name) @@ (to_tsquery('english_simple_conf', 'clea:*')));
name
--------------------------
Gutter Cleaning Services
"But this checks that the word is in both dictionaries."
That's not correct. As noted in the docs (see the description for the dictionary_name parameter), the dictionaries are consulted in order; the second one is only consulted if the first does not recognize the token. You can verify this with ts_debug().
testdb=# ALTER TEXT SEARCH CONFIGURATION english_simple_conf
ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, hword_part
WITH simple;
ALTER TEXT SEARCH CONFIGURATION
testdb=# select * from ts_debug('public.english_simple_conf', 'cars boats n0taword');
alias | description | token | dictionaries | dictionary | lexemes
-----------+--------------------------+----------+--------------+------------+------------
asciiword | Word, all ASCII | cars | {simple} | simple | {cars}
blank | Space symbols | | {} | |
asciiword | Word, all ASCII | boats | {simple} | simple | {boats}
blank | Space symbols | | {} | |
numword | Word, letters and digits | n0taword | {simple} | simple | {n0taword}
(5 rows)
testdb=# ALTER TEXT SEARCH CONFIGURATION english_simple_conf
ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, hword_part
WITH english_stem, simple;
ALTER TEXT SEARCH CONFIGURATION
testdb=# select * from ts_debug('public.english_simple_conf', 'cars boats n0taword');
alias | description | token | dictionaries | dictionary | lexemes
-----------+--------------------------+----------+-----------------------+--------------+------------
asciiword | Word, all ASCII | cars | {english_stem,simple} | english_stem | {car}
blank | Space symbols | | {} | |
asciiword | Word, all ASCII | boats | {english_stem,simple} | english_stem | {boat}
blank | Space symbols | | {} | |
numword | Word, letters and digits | n0taword | {simple} | simple | {n0taword}
(5 rows)
The reason for the difference in the last two queries is that english_stem stems 'Cleaning' to 'clean', so searching for 'cleani:*' will not match. Try moving the to_tsvector and to_tsquery expressions from the WHERE clause into the select list; you'll see that "Gutter Cleaning Services" is stemmed to 'clean':2 'gutter':1 'servic':3.
testdb=# select to_tsvector('english_simple_conf', name), to_tsquery('english_simple_conf', 'cleani:*'), name from categories;
to_tsvector | to_tsquery | name
---------------------------------+------------+--------------------------
'clean':2 'gutter':1 'servic':3 | 'cleani':* | Gutter Cleaning Services
(1 row)
testdb=# select to_tsvector('english_simple_conf', name), to_tsquery('english_simple_conf', 'cleaning:*'), name from categories;
to_tsvector | to_tsquery | name
---------------------------------+------------+--------------------------
'clean':2 'gutter':1 'servic':3 | 'clean':* | Gutter Cleaning Services
(1 row)
If you change the tsquery to search for cleaning:* instead, that gets stemmed as well and matches again. But english_stem cannot figure out that 'cleani' is meant to stem to 'clean' unless it also sees the 'ng'. A Snowball dictionary such as english_stem accepts every token whether or not it can simplify it, so 'cleani' is kept unchanged rather than being handed on to simple, and you end up with the mismatch: still a trailing i in the tsquery, but not in the tsvector.
Stemming isn't meant to work on arbitrary prefixes of words, only on whole ones; for prefix matching, you'd use a traditional left-anchored LIKE.
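Two ways to get prefix matching on unstemmed words, sketched against the categories table from the question. Both are illustrations under the assumption that stemming is not needed for this search:

```sql
-- Option 1: prefix search against unstemmed lexemes. With 'simple' the
-- tsvector keeps 'cleaning' as is, so the prefix 'cleani' can match it.
SELECT name
FROM categories
WHERE to_tsvector('simple', name) @@ to_tsquery('simple', 'cleani:*');

-- Option 2: left-anchored LIKE. This anchors at the start of the whole
-- string, not at the start of each word.
SELECT name
FROM categories
WHERE lower(name) LIKE 'gutter clea%';
```

Option 1 trades stemming for predictable prefix behaviour; a search for 'clean' would then no longer find 'cleaned' unless it is written as 'clean:*'.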

Sorting Issue with Underscore in Postgres

I'm trying to sort the data below, but Postgres returns the wrong order.
Can someone please help me here? How can I get the data sorted properly?
I wrote the query below to get the data,
SELECT * FROM TempTable ORDER BY a_test ASC NULLS FIRST;
and it returns this result,
| BB001217 |
| BB001217_000010 |
| BB001217_000011 |
| BB001217_00002 |
| BB001217_00003 |
| BB001218 |
| BB001219 |
| BB001220 |
| BB001220_000010 |
| BB001220_000011 |
| BB001220_00002 |
| BB001220_00003 |
| BB001220_00004 |
| BB001220_00005 |
| BB001220_00006 |
And I expected the result in this form,
| BB001217 |
| BB001217_00002 |
| BB001217_00003 |
| BB001217_000010 |
| BB001217_000011 |
| BB001218 |
| BB001219 |
| BB001220 |
| BB001220_00002 |
| BB001220_00003 |
| BB001220_00004 |
| BB001220_00005 |
| BB001220_00006 |
| BB001220_000010 |
| BB001220_000011 |
From PostgreSQL v10 on you could use an ICU collation that provides “natural sorting”:
CREATE COLLATION english_natural (
LOCALE = 'en-US-u-kn-true',
PROVIDER = icu
);
SELECT *
FROM TempTable
ORDER BY a_test COLLATE english_natural
ASC NULLS FIRST;
You are storing numbers in a VARCHAR column, so the sorting is based on character comparison, where '10' is considered to be smaller than '2'.
You need to split the column into two parts, then convert the second to a number and sort on those two:
SELECT *
FROM temptable
ORDER BY split_part(a_test,'_',1),
nullif(split_part(a_test,'_',2),'')::int ASC NULLS FIRST;
Online example: https://rextester.com/RNU44666
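To try the split_part() approach without the real table, here is a small self-contained sketch. The CTE and its four sample values (taken from the question) stand in for TempTable:

```sql
-- Inline sample data instead of the real table.
WITH temptable(a_test) AS (
    VALUES ('BB001217_000010'),
           ('BB001217_00002'),
           ('BB001218'),
           ('BB001217')
)
SELECT a_test
FROM temptable
ORDER BY split_part(a_test, '_', 1),
         -- An empty suffix becomes NULL, which NULLS FIRST sorts ahead
         -- of the numbered variants; the rest compare as integers.
         nullif(split_part(a_test, '_', 2), '')::int ASC NULLS FIRST;
```

Here BB001217 sorts first, followed by BB001217_00002 and BB001217_000010, then BB001218. The cast fails if the suffix ever contains non-digits, so this assumes the suffix is always numeric.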

Weird ghost records in PostgreSQL - what are they?

I have a very weird issue on our postgresql DB. I have a table called "statement" which has some strange records in it.
Using the command line console psql, I query select * from customer.statement where type in ('QUOTE'); and get 12 rows back. 7 rows look normal, 5 are missing all data except a single column which is a nullable column but seems to hold real values entered by the user. psql tells me that 7 rows were returned even though there are 12. Most of the other columns are not nullable. The weird records look like this:
select * from customer.statement where type = 'QUOTE';
id | issuer_id | recipient_id | recipient_name | recipient_reference | source_statement_id | catalogue_id | reference | issue_date | due_date | description | total | currency | type | tax_level | rounding_mode | status | recall_requested | time_created | time_updated | time_paid
------------------+------------------+------------------+----------------+---------------------+---------------------+--------------+-----------+------------+------------+------------------------------------------------------------------+-----------+----------+-------+-----------+---------------+-----------+------------------+----------------------------+----------------------------+-----------
... 7 valid records removed ...
| | | | | | | | | | Build bulkheads and sheet with plasterboard. +| | | | | | | | | |
| | | | | | | | | | Patch all patches. +| | | | | | | | | |
| | | | | | | | | | Set and sand all joints ready for painting. +| | | | | | | | | |
| | | | | | | | | | Use wall angle on bulkhead in main bedroom. +| | | | | | | | | |
| | | | | | | | | | Build nib and sheet and set in entrance | | | | | | | | | |
(7 rows)
If I run the same query using pgAdmin, I don't see those weird records.
Anyone know what these are?
The plus sign before the separator (+|) indicates a newline character in the displayed string value in psql. So no additional rows, just the same row continued with line breaks. The final line of output in your quote confirms as much: (7 rows).
In pgAdmin you don't see the extra lines as long as you don't increase the height of the field (or copy / paste the content somewhere), but the value contains multiple lines there as well.
Try in psql and in pgAdmin:
test=# SELECT E'This\nis\na\ntest.' AS multi_line, 'foo' AS single_line;
multi_line | single_line
--------------+-------------
This +| foo
is +|
a +|
test. |
(1 row)
The manual about psql:
linestyle
Sets the border line drawing style to one of ascii, old-ascii, or unicode. [...] The default setting is ascii. [...]
ascii style uses plain ASCII characters. Newlines in data are shown using a + symbol in the right-hand margin. [...]
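A quick way to confirm this on the table from the question is to count the rows and to display the description with its newlines replaced. This is only a sketch; the ' / ' separator and the description_one_line alias are arbitrary choices:

```sql
-- The row count is independent of how psql wraps multi-line values.
SELECT count(*)
FROM customer.statement
WHERE type = 'QUOTE';

-- Show each description on one line by replacing embedded newlines.
SELECT id,
       replace(description, E'\n', ' / ') AS description_one_line
FROM customer.statement
WHERE type = 'QUOTE';
```

If the count is 7 and the second query prints one line per row, the "ghost records" were continuation lines of a single value.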

Using variables in select (apostrophes needed)

psql (9.6.1, server 9.5.5)
employees
Column | Type | Modifiers | Storage | Stats target | Description
----------------+-----------------------------+-----------------------------------------------------------------+----------+--------------+-------------
employee_id | integer | not null default nextval('employees_employee_id_seq'::regclass) | plain | |
first_name | character varying(20) | | extended | |
last_name | character varying(25) | not null | extended | |
email | character varying(25) | not null | extended | |
phone_number | character varying(20) | | extended | |
hire_date | timestamp without time zone | not null | plain | |
job_id | character varying(10) | not null | extended | |
salary | numeric(8,2) | | main | |
commission_pct | numeric(2,2) | | main | |
manager_id | integer | | plain | |
department_id | integer
The result of this query would suit me:
hr=> select last_name, char_length(last_name) as Length from employees where substring(last_name from 1 for 1) = 'H' order by last_name;
last_name | length
-----------+--------
Hartstein | 9
Higgins | 7
Hunold | 6
(3 rows)
But for self education I'd like to use a variable:
\set chosen_letter 'H'
hr=> select last_name, char_length(last_name) as Length from employees where substring(last_name from 1 for 1) = :chosen_letter order by last_name;
ERROR: column "h" does not exist
LINE 1: ...ployees where substring(last_name from 1 for 1) = H order by...
^
Those apostrophes seem to ruin everything, and I can't work around the problem.
Could you help me understand how to use a variable to get the result above?
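psql strips the outer single quotes in \set, so the variable holds a bare H, which is then interpolated as a column name. Embed the quotes in the value by doubling them:
\set chosen_letter '''H'''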
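An alternative sketch that leaves the quoting to psql: the :'name' form interpolates the variable as a properly quoted SQL literal, so the value can be stored without embedded apostrophes. This uses the employees table from the question:

```sql
-- Store the bare value; no apostrophes inside the variable.
\set chosen_letter H

-- :'chosen_letter' expands to the quoted literal 'H'.
SELECT last_name, char_length(last_name) AS length
FROM employees
WHERE substring(last_name from 1 for 1) = :'chosen_letter'
ORDER BY last_name;
```

This form also escapes any quotes inside the value, which the manual tripling does not do for you.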

Error in Insert query : syntax error at or near ","

My insert query is,
insert into app_library_reports
(app_id,adp_id,reportname,description,searchstr,command,templatename,usereporttemplate,reporttype,sentbothfiles,useprevioustime,usescheduler,cronstr,option,displaysettings,isanalyticsreport,report_columns,chart_config)
values
(25,18,"Report_Barracuda_SpamDomain_summary","Report On Domains Sending Spam Emails","tl_tag:Barracuda_spam AND action:2","BarracudaSpam/Report_Barracuda_SpamDomain_summary.py",,,,,,,,,,,,);
Schema for the table 'app_library_reports' is:
Table "public.app_library_reports"
Column | Type | Modifiers | Storage | Stats target | Description
-------------------+---------+------------------------------------------------------------------+----------+--------------+-------------
id | integer | not null default nextval('app_library_reports_id_seq'::regclass) | plain | |
app_id | integer | | plain | |
adp_id | integer | | plain | |
reportname | text | | extended | |
description | text | | extended | |
searchstr | text | | extended | |
command | text | | extended | |
templatename | text | | extended | |
usereporttemplate | boolean | | plain | |
reporttype | text | | extended | |
sentbothfiles | text | | extended | |
useprevioustime | text | | extended | |
usescheduler | text | | extended | |
cronstr | text | | extended | |
option | text | | extended | |
displaysettings | text | | extended | |
isanalyticsreport | boolean | | plain | |
report_columns | json | | extended | |
chart_config | json | | extended | |
Indexes:
"app_library_reports_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
"app_library_reports_adp_id_fkey" FOREIGN KEY (adp_id) REFERENCES app_library_adapter(id)
"app_library_reports_app_id_fkey" FOREIGN KEY (app_id) REFERENCES app_library_definition(id)
When I execute the insert query, it gives this error: ERROR: syntax error at or near ","
Please help me find the cause of this error. Thank you.
I'm fairly certain your immediate error is coming from the run of commas with no values between them (i.e. ,,,,,,,) at the end of the INSERT. If you don't want to specify a value for a particular column, you can pass NULL for it. But in your case, since you only specify values for the first 6 columns, another way is to list just those 6 column names when you insert. Note also that string literals in PostgreSQL take single quotes; double quotes denote identifiers.
INSERT INTO app_library_reports
(app_id, adp_id, reportname, description, searchstr, command)
VALUES
(25, 18, 'Report_Barracuda_SpamDomain_summary',
'Report On Domains Sending Spam Emails', 'tl_tag:Barracuda_spam AND action:2',
'BarracudaSpam/Report_Barracuda_SpamDomain_summary.py');
This insert only works if the columns not specified accept NULL or have a default. If some of the other columns are not nullable and have no default, you have to pass in values for them.
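If you prefer to keep more columns in the column list, each position still needs a value. A sketch using explicit NULL and the DEFAULT keyword; the two extra columns are picked from the schema above purely for illustration:

```sql
INSERT INTO app_library_reports
    (app_id, adp_id, reportname, description, searchstr, command,
     templatename, usereporttemplate)
VALUES
    (25, 18, 'Report_Barracuda_SpamDomain_summary',
     'Report On Domains Sending Spam Emails',
     'tl_tag:Barracuda_spam AND action:2',
     'BarracudaSpam/Report_Barracuda_SpamDomain_summary.py',
     NULL,      -- explicit NULL for templatename
     DEFAULT);  -- column default for usereporttemplate (NULL if none is defined)
```

Columns left out of the list entirely, such as id, get their defaults in the same way.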