PostgreSql XML Text search - postgresql

I have a text column in a table. We store XML in this column. Now I want to search for tags and values
Example data:
<bank>
<name>Citi Bank</name>
.....
.....
/<bank>
I would like to run the following query:
select * from xxxx where to_tsvector('english',xml_column) ## to_tsquery('<name>Citi Bank</name>')
This works fine but it also works for tags like name1 or no tag.
How do I have to setup my search in order for this to work so I get an exact match for the tag and value ?

You could use the xpath function like this
select *
from xxx
where xpath(xml_column, 'bank/name/text()') = 'CitiBank';
BUT it won't use the full-text search index. You could use a subquery to find probable matches and avoid full scans, and the xpath expression for getting correct answers, or create a function index if the queries are going to be always the same.

You might want to reconsider storing XML in a database, instead you could look at inserting the data into related tables, since using XML is a poor replacement for a relational store. Even if you go with XML in database, use the XML type, not the TEXT type, and create an index like this (yes, basically you'd need an index per xpath expression):
CREATE INDEX my_funcidx ON my_table USING GIN ( CAST(xpath('/bank/name/text()', xmlfield) AS TEXT[]) );
then, query it like this:
SELECT * FROM my_table WHERE CAST(xpath('/bank/name/text()', xmlfield) AS TEXT[]) #> '{Citi Bank}'::TEXT[];
and this will use the index, as EXPLAIN will indicate.
The important part is the CASTing to TEXT[], as XML[], which the xpath function returns, isn't indexable by default.

Related

Postgres full text search for partial word

I've been seeing a lot of examples around like this one:
postgres full text search like operator
They all specify that you can do a prefix search like this:
SELECT *
FROM eventlogging
WHERE description_tsv ## to_tsquery('mess:*');
and it will retrieve a word like: "message"
However, what I do not see anywhere is whether or not there is a way to search for different parts of a word, such as a suffix?
The example that I am having trouble with right now is this:
CREATE TABLE IF NOT EXISTS project (
id VARCHAR NOT NULL,
org_name VARCHAR NOT NULL DEFAULT '',
project_name VARCHAR NOT NULL DEFAULT ''
);
insert into project(id, org_name, project_name) values ('123', 'org', 'proj');
insert into project(id, org_name, project_name) values ('456', 'huh', 'org');
insert into project(id, org_name, project_name) values ('789', 'orgs', 'project');
CREATE OR REPLACE FUNCTION get_projects(query_in VARCHAR)
RETURNS TABLE (id VARCHAR, org_name VARCHAR, project_name VARCHAR) AS $$
BEGIN
RETURN QUERY
SELECT * FROM project WHERE (
to_tsvector('simple', coalesce(project.project_name, '')) ||
to_tsvector('simple', coalesce(project.org_name, ''))
) ## to_tsquery('simple', query_in);
END;
$$ LANGUAGE plpgsql;
The following example returns:
select * from get_projects('org');
id org_name project_name
----------------------------
123 org proj
456 huh org
My question is: why does it not return orgs? Similarly, if I search for proj, I only get the project named "proj" but not the one named "project."
Bonus points: how can I get results if I search for a substring? For example, if I search for the string jec, I would like to get back the project named project. I'm not really looking for fuzzy searching, but I would say that I am looking for substring searching.
Am I completely wrong to be using to_tsquery? I also tried plainto_tsquery and I tried using english instead of simple, but several references said to stick with simple.
Full text search is different from substring search. Full text search is about searching whole words, omitting frequent words from indexing, ignoring inflection and the like. PostgreSQL full text search extends that somewhat by allowing prefix searches.
To search for substrings, you have to search with a condition like
WHERE word ~ 'suffix\M'
(This would be a suffix search with the regular expression matching operator ~.)
To speed up a search like that, create a trigram index:
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX ON tab USING gin (doc gin_trgm_ops);
So-called prefix searching doesn't really thematically belong in full text searching. I think it was tossed in because, given that tokens would be stored in a btree anyway, adding that "feature" was free. No other types of partial matching are mentioned in the context of FTS because they don't exist.
You discuss the partial matching that does exist with FTS, the :* notation. But then in your example, you don't actually use it. That is why you don't see it working, because you don't use it. If you do use it, it does work:
select * from get_projects('org:*');
But given your description, it sounds like you don't want FTS in the first place. You want LIKE or regex, perhaps with index support from pg_trgm.
but several references said to stick with simple.
It is hard to know how good the judgement of anonymous references are, but if you only want to use 'simple' than most likely you shouldn't be using FTS in the first place. 'simple' is useful for analyzing or learning or debugging real FTS situations, and can be used as a baseline for building up more complex configurations.

PostgreSQL, allow to filter by not existing fields

I'm using a PostgreSQL with a Go driver. Sometimes I need to query not existing fields, just to check - maybe something exists in a DB. Before querying I can't tell whether that field exists. Example:
where size=10 or length=10
By default I get an error column "length" does not exist, however, the size column could exist and I could get some results.
Is it possible to handle such cases to return what is possible?
EDIT:
Yes, I could get all the existing columns first. But the initial queries can be rather complex and not created by me directly, I can only modify them.
That means the query can be simple like the previous example and can be much more complex like this:
WHERE size=10 OR (length=10 AND n='example') OR (c BETWEEN 1 and 5 AND p='Mars')
If missing columns are length and c - does that mean I have to parse the SQL, split it by OR (or other operators), check every part of the query, then remove any part with missing columns - and in the end to generate a new SQL query?
Any easier way?
I would try to check within information schema first
"select column_name from INFORMATION_SCHEMA.COLUMNS where table_name ='table_name';"
And then based on result do query
Why don't you get a list of columns that are in the table first? Like this
select column_name
from information_schema.columns
where table_name = 'table_name' and (column_name = 'size' or column_name = 'length');
The result will be the columns that exist.
There is no way to do what you want, except for constructing an SQL string from the list of available columns, which can be got by querying information_schema.columns.
SQL statements are parsed before they are executed, and there is no conditional compilation or no short-circuiting, so you get an error if a non-existing column is referenced.

What PostgreSQL type is good for stroring array of strings and offering fast lookup afterwards

I am using PostgreSQL 11.9
I have a table containing a jsonb column with arbitrary number of key-values. There is a requirement when we perform a search to include all values from this column as well. Searching in jsonb is quite slow so my plan is to create a trigger which will extract all the values from the jsonb column:
select t.* from app.t1, jsonb_each(column_jsonb) as t(k,v)
with something like this. And then insert the values in a newly created column in the same table so I can use this column for faster searches.
My question is what type would be most suitable for storing the keys and then searchin within them. Currently the search looks like this:
CASE
WHEN something IS NOT NULL
THEN EXISTS(SELECT value FROM jsonb_each(column_jsonb) WHERE value::text ILIKE search_term)
END
where the search_term is what the user entered from the front end.
This is not going to be pretty, and normalizing the data model would be better.
You can define a function
CREATE FUNCTION jsonb_values_to_string(
j jsonb,
separator text DEFAULT ','
) RETURNS text LANGUAGE sql IMMUTABLE STRICT
AS 'SELECT string_agg(value->>0, $2) FROM jsonb_each($1)';
Then you can query like
WHERE jsonb_values_to_string(column_jsonb, '|') ILIKE 'search_term'
and you can define a trigram index on the left hand side expression to speed it up.
Make sure that you choose a separator that does not occur in the data or the pattern...

Timescaledb - How to display chunks of a hypertable in a specific schema

I have a table named conditions on a schema named test. I created a hypertable and inserted hundreds of rows.
When I run select show_chunks(), it works and displays chunks but I cannot use the table name as parameter as suggested in the manual. This does not work:
SELECT show_chunks("test"."conditions");
How can I fix this?
Ps: I want to query the chunk itself by its name? How can I do this?
The show_chunks expects a regclass, which depending on your current search path means you need to schema qualify the table.
The following should work:
SELECT public.show_chunks('test.conditions');
The double quotes are only necessary if your table is a delimited identifier, for example if your tablename contains a space, you would need to add the double quotes for the identifier. You will still need to wrap it in single quotes though:
SELECT public.show_chunks('test."equipment conditions"');
SELECT public.show_chunks('"test schema"."equipment conditions"');
For more information about identifier quoting:
https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS
Edit: Addressing the PS:
I want to query the chunk itself by its name? How can I do this?
feike=# SELECT public.show_chunks('test.conditions');
show_chunks
--------------------------------------------
_timescaledb_internal._hyper_28_1176_chunk
_timescaledb_internal._hyper_28_1177_chunk
[...]
SELECT * FROM _timescaledb_internal._hyper_28_1176_chunk;

How to make fulltext search in PostgreSQL useful?

I have a russian dictionary in postgresql 8.4.
I try to use full search but have some troubles.
I dont get any results becouse try to find words by 4-5symbols. For example:
select * from parcels_temp where name_dispatcher ## to_tsquery('Нику');
Get result: 0 rows.
select * from parcels_temp where name_dispatcher ## to_tsquery('Никуд');
Get result: 2 rows. its correct.
I try to do search by words not contained in dictionary. What i gonna do in this case? How can i update dictionary in PostgreSQL?
Its must create column to tsvector or i can use to_tsvector function i queries? Or its more slowly?
Postgresql indexer does not process strings shorter than 3 characters. It is done to reduce index size and generation time.
Obviously, you get nothing. There is way to update dictionary, refer to documentation.
Use GIN index on tsvector field.
1: By default, postgreSQL only matches the whole word. You can opt in to do a prefix matching:
select * from parcels_temp where name_dispatcher ## to_tsquery('Нику:*');
See: https://www.postgresql.org/docs/9.5/static/textsearch-controls.html
3: You can create a column and GIN index on it, or you can just create the index without creating a column. I think they are the same in terms of performance. For details, see:
https://www.postgresql.org/docs/9.5/static/textsearch-tables.html