PostgreSQL: Combining similarity with tsvector

I have a database table containing more than 50 million records, which I need to full-text search as fast as possible.
On a smaller table I just had an index on the text column and used the similarity function to get similar results. I was also able to sort by the result of similarity().
Now that my table is a lot bigger, I have switched to tsvector. I created a column for the tsvector result and a trigger that updates the column before insert or update. With that in place, searches are very fast (<100 ms).
The problem is that I would like to use a combination of both tsvector and similarity.
Example
My table contains the following data.
| MyColumn |
------------
| Apple |
| Orange |
| ... |
But if I search for "App" I don't get "Apple" back.
Any ideas on how to get a fast "like/similar" search that also returns a similarity score?

https://www.postgresql.org/docs/current/static/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES
The documentation says: "Also, * can be attached to a lexeme to specify prefix matching."
Something like this:
postgres=# with c(v) as (values('Apple'),('App'),('application'),('apricote'))
select v, to_tsvector(v), to_tsvector(v) @@ to_tsquery('app:*') from c;
      v      | to_tsvector | ?column?
-------------+-------------+----------
 Apple       | 'appl':1    | t
 App         | 'app':1     | t
 application | 'applic':1  | t
 apricote    | 'apricot':1 | f
(4 rows)
postgres=# with c(v) as (values('Apple'),('App'),('application'),('apricote'))
select v, to_tsvector(v), to_tsvector(v) @@ to_tsquery('ap:*') from c;
      v      | to_tsvector | ?column?
-------------+-------------+----------
 Apple       | 'appl':1    | t
 App         | 'app':1     | t
 application | 'applic':1  | t
 apricote    | 'apricot':1 | t
(4 rows)
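Prefix matching alone gives no score. One possible way to get both, sketched here in Python rather than SQL so it runs without a database, is to filter by prefix first and then rank the few surviving rows by trigram similarity. The trigrams and similarity functions below imitate what pg_trgm documents for plain ASCII words (lowercase, two spaces of padding in front, one behind, shared trigrams divided by distinct trigrams). prefix_search is a hypothetical helper, and unlike to_tsvector it does no stemming.

```python
# Sketch: prefix filter first, then rank the survivors by trigram similarity.
# The trigram logic is an approximation of pg_trgm's documented behaviour,
# not the extension's actual code.

def trigrams(text):
    """Trigrams of each lowercased word, padded with two leading spaces
    and one trailing space."""
    result = set()
    for word in text.lower().split():
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def prefix_search(term, rows):
    """Keep rows containing a word that starts with term, best match first."""
    needle = term.lower()
    hits = [r for r in rows
            if any(w.startswith(needle) for w in r.lower().split())]
    return sorted(hits, key=lambda r: similarity(term, r), reverse=True)


rows = ["Apple", "App", "application", "apricote"]
for row in prefix_search("app", rows):
    print(row, round(similarity("app", row), 3))
```

In SQL the same idea would be a WHERE clause using @@ with a prefix query, plus ORDER BY similarity(...) DESC, assuming the pg_trgm extension is installed. Because the tsvector index has already reduced the row count, the similarity call only runs on the matches.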

Related

How to restore values that were converted to scientific notation in a table

I have a problem: some values I inserted into my table were automatically converted to scientific notation (e.g. 8.24e+04). Does anyone know how to restore the original values, or how to keep the original values in the table?
I'm using double precision as the data type for the column, and I just noticed that double precision often displays long numbers in scientific notation.
This is how the table looks after I inserted some values:
test=# select * from demo;
| string_col | values |
|------------|-----------------------|
| Rocket | 123228435521 |
| Test | 13328422942213 |
| Power | 1.243343991231232e+15 |
| Pull | 1.233433459353712e+15 |
| Drag | 1244375399128 |
edb=# \d+ demo;
Table "public.demo"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
------------+-----------------------+-----------+----------+---------+----------+--------------+-------------
string_col | character varying(20) | | | | extended | |
values | double precision | | | | plain | |
Access method: heap
This is just a dummy table I used to explain my question.
You'll have to format the number using to_char if you want it in a specific format:
SELECT 31672516735473059594023526::double precision,
to_char(
31672516735473059594023526::double precision,
'999999999999999999999999999.99999999999999FM'
);
float8 │ to_char
═══════════════════════╪════════════════════════════
3.167251673547306e+25 │ 31672516735473058997862400
(1 row)
The result is not exact because the precision of double precision is not high enough.
If you don't want the rounding errors and want to avoid scientific notation as well, use the data type numeric instead.
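The difference between the two types can be tried outside the database. The sketch below is Python, not SQL, and relies on two assumptions: a Python float is an IEEE 754 double, which is also what double precision uses on typical platforms, and Decimal stands in for numeric because both keep every digit.

```python
from decimal import Decimal

original = 31672516735473059594023526  # the integer from the example above

# Converting to a double keeps only about 15 to 17 significant digits.
as_double = float(original)
print(as_double)            # shown in scientific notation
print(f"{as_double:.0f}")   # fixed notation, but the low digits are already lost
print(int(as_double) == original)

# Decimal keeps every digit, comparable to what numeric does.
as_decimal = Decimal(original)
print(as_decimal)
print(int(as_decimal) == original)
```

So formatting only changes how the stored double is displayed. Digits that were rounded away on insert cannot be restored from the column.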

SQL parameter table

I suspect this question is already well-answered but perhaps due to limited SQL vocabulary I have not managed to find what I need. I have a database with many code:description mappings in a single 'parameter' table. I would like to define a query or procedure to return the descriptions for all (or an arbitrary list of) coded values in a given 'content' table with their descriptions from the parameter table. I don't want to alter the original data, I just want to display friendly results.
Is there a standard way to do this?
Can it be accomplished with SELECT or are other statements required?
Here is a sample query for a single coded field:
SELECT TOP (5)
newid() as id,
B.BRIDGE_STATUS,
P.SHORTDESC
FROM
BRIDGE B
LEFT JOIN PARAMTRS P ON P.TABLE_NAME = 'BRIDGE'
AND P.FIELD_NAME = 'BRIDGE_STATUS'
AND P.PARMVALUE = B.BRIDGE_STATUS
ORDER BY
id
I want to produce 'decoded' results like:
| id                                   | BRIDGE_STATUS |
|--------------------------------------|---------------|
| BABCEC1E-5FE2-46FA-9763-000131F2F688 | Active        |
| 758F5201-4742-43C6-8550-000571875265 | Active        |
| 5E51634C-4DD9-4B0A-BBF5-00087DF71C8B | Active        |
| 0A4EA521-DE70-4D04-93B8-000CD12B7F55 | Inactive      |
| 815C6C66-8995-4893-9A1B-000F00F839A4 | Proposed      |
Rather than original, coded data like:
| id | BRIDGE_STATUS |
|--------------------------------------|---------------|
| F50214D7-F726-4996-9C0C-00021BD681A4 | 3 |
| 4F173E40-54DC-495E-9B84-000B446F09C3 | 3 |
| F9C216CD-0453-434B-AFA0-000C39EFA0FB | 3 |
| 5D09554E-201D-4208-A786-000C537759A1 | 1 |
| F0BDB9A4-E796-4786-8781-000FC60E200C | 4 |
but for an arbitrary number of columns.
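One possible approach, offered as a sketch and not as a standard recipe, is to generate the SELECT with one LEFT JOIN on the parameter table per coded column. The example runs against an in-memory SQLite database so it is self-contained. BRIDGE, PARAMTRS and their columns come from the question, while BRIDGE_ID, BRIDGE_TYPE, the sample rows and the decode_query helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE BRIDGE (BRIDGE_ID INTEGER, BRIDGE_STATUS TEXT, BRIDGE_TYPE TEXT);
    CREATE TABLE PARAMTRS (TABLE_NAME TEXT, FIELD_NAME TEXT,
                           PARMVALUE TEXT, SHORTDESC TEXT);
    -- Made-up sample data; code '9' deliberately has no description.
    INSERT INTO BRIDGE VALUES (1, '3', 'A'), (2, '1', 'B'), (3, '9', 'A');
    INSERT INTO PARAMTRS VALUES
        ('BRIDGE', 'BRIDGE_STATUS', '1', 'Inactive'),
        ('BRIDGE', 'BRIDGE_STATUS', '3', 'Active'),
        ('BRIDGE', 'BRIDGE_TYPE', 'A', 'Arch'),
        ('BRIDGE', 'BRIDGE_TYPE', 'B', 'Beam');
""")


def decode_query(table, key, coded_columns):
    """Build a SELECT with one LEFT JOIN on PARAMTRS per coded column.

    Table and column names are pasted into the SQL text, so they must come
    from trusted code, never from user input.
    """
    selects = [f"t.{key}"]
    joins = []
    for i, col in enumerate(coded_columns):
        alias = f"p{i}"
        # Fall back to the raw code when no description exists.
        selects.append(f"COALESCE({alias}.SHORTDESC, t.{col}) AS {col}")
        joins.append(
            f"LEFT JOIN PARAMTRS {alias} ON {alias}.TABLE_NAME = '{table}' "
            f"AND {alias}.FIELD_NAME = '{col}' AND {alias}.PARMVALUE = t.{col}"
        )
    return (f"SELECT {', '.join(selects)} FROM {table} t "
            f"{' '.join(joins)} ORDER BY t.{key}")


sql = decode_query("BRIDGE", "BRIDGE_ID", ["BRIDGE_STATUS", "BRIDGE_TYPE"])
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

The generated statement is a plain SELECT, so the original data is not altered. The same text could also be saved as a view if the list of coded columns is fixed.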

How to aggregate Postgres table so that ID is unique and column values are collected in array?

I'm not sure what to call what I'm trying to do, so searching for it didn't work very well. I would like to aggregate my table based on one column and have all the rows from another column collapsed into an array by unique ID.
| ID | some_other_value |
-------------------------
| 1 | A |
| 1 | B |
| 2 | C |
| .. | ... |
To return
| ID | values_array |
-------------------------
| 1 | {A, B} |
| 2 | {C} |
Sorry for the bad explanation, I'm really lacking the vocabulary here. Any help with writing a query that achieves what's in the example would be very much appreciated.
Try the following.
select id, array_agg(some_other_value order by some_other_value) as values_array from <yourTableName> group by id;
You can also check here.
See Aggregate Functions documentation.
SELECT
id,
array_agg(some_other_value)
FROM
the_table
GROUP BY
id;
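The two answers differ only in the ORDER BY inside array_agg. Without it, the order of the array elements is not guaranteed and simply follows the order in which the rows reach the aggregate. A small Python model of the grouping step, with made-up rows, shows the difference:

```python
from collections import defaultdict

# (id, some_other_value) pairs in a made-up arrival order.
rows = [(1, "B"), (2, "C"), (1, "A")]

groups = defaultdict(list)
for row_id, value in rows:
    groups[row_id].append(value)

# Like array_agg(some_other_value): elements in arrival order.
unordered = dict(groups)
# Like array_agg(some_other_value ORDER BY some_other_value).
ordered = {row_id: sorted(values) for row_id, values in groups.items()}

print(unordered)
print(ordered)
```

If the element order matters to the caller, the variant with ORDER BY is the safer choice.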

How do I list all streams and continuous views in pipelinedb?

In pipelinedb I can't seem to locate a way to list all of the streams and continuous views that I've created.
I can back into the CVs by looking for the "mrel" tables that are created but it's kind of clunky.
Is there a system table or view I can query that will list them?
You may have an older version of pipelinedb, or you may be looking at an older version of the docs.
You can check your version with psql like so:
pipeline=# select * from pipeline_version();
pipeline_version
-----------------------------------------------------------------------------------------------------------------------------------------------------------
PipelineDB 0.9.0 at revision b1ea9ab6acb689e6ed69fb26af555ca8d025ebae on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4, 64-bit
(1 row)
In the latest version, information about views can be obtained like so:
pipeline=# select * from pipeline_views();
id | schema | name | query
----+--------+------+-----------------------
11 | public | cv | SELECT x::integer, +
| | | count(*) AS count+
| | | FROM ONLY s +
| | | GROUP BY x::integer
(1 row)
Information about streams can be obtained like so:
pipeline=# select * from pipeline_streams();
schema | name | inferred | queries | tup_desc
--------+------+----------+---------+----------------------------------------
public | s | t | {cv} | \x000000017800000006a4ffffffff00000000
(1 row)
More information can be obtained by using \d+:
pipeline=# \d+ cv
Continuous view "public.cv"
Column | Type | Modifiers | Storage | Description
--------+---------+-----------+---------+-------------
x | integer | | plain |
count | bigint | | plain |
View definition:
SELECT x::integer,
count(*) AS count
FROM ONLY s
GROUP BY x::integer;
pipeline=# \d+ s
Stream "public.s"
Column | Type | Storage
-------------------+-----------------------------+---------
arrival_timestamp | timestamp(0) with time zone | plain
It's easy, just run
select * from pipeline_streams();
to list the streams. The output also shows which views read from each stream.
Edit:
The snippet above only applies to the 0.9.x versions of PipelineDB. Since version 1.x PipelineDB is a PostgreSQL extension and streams are foreign tables, so you can list them with
psql -c "\dE+"
This shows all foreign tables in psql (the streams in PipelineDB).
For more information: http://docs.pipelinedb.com/streams.html

Escaping special characters in to_tsquery

How do you escape special characters in a string passed to to_tsquery? For instance, this kind of query:
select to_tsquery('AT&T');
Produces:
NOTICE: text-search query contains only stop words or doesn't contain lexemes, ignored
to_tsquery
------------
(1 row)
Edit: I also noticed that there is the same issue in to_tsvector.
A simple solution is to create the tsquery as follows:
select $$'AT&T'$$::tsquery;
You can make more complex queries:
select $$'AT&T' & Phone | '|Bang!'$$::tsquery;
See the text search docs for more.
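If the search terms come from an application, the quoted form can be built there. The sketch below assumes the quoting rules from the text search documentation (wrap the lexeme in single quotes, double any embedded single quote or backslash). quote_lexeme and and_query are hypothetical helper names, not part of any library.

```python
def quote_lexeme(term):
    """Wrap term in single quotes, doubling embedded quotes and backslashes."""
    escaped = term.replace("\\", "\\\\").replace("'", "''")
    return "'" + escaped + "'"


def and_query(terms):
    """Join quoted lexemes with the tsquery AND operator."""
    return " & ".join(quote_lexeme(t) for t in terms)


print(and_query(["AT&T", "Phone"]))
print(quote_lexeme("O'Reilly"))
```

The resulting string is meant to be sent as a bound parameter and cast with ::tsquery. Keep in mind that the cast does not normalize its input, so the lexemes have to match what is stored in the tsvector exactly, including case.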
I found this answer, which uses the plainto_tsquery('AT&T') function, very useful: https://stackoverflow.com/a/16020565/350195
If you want 'AT&T' to be treated as a search word, you're going to need some customised components, because the default parser splits it as two words:
steve@steve@[local] =# select * from ts_parse('default', 'AT&T');
 tokid | token
-------+-------
     1 | AT
    12 | &
     1 | T
(3 rows)
steve@steve@[local] =# select * from ts_debug('simple', 'AT&T');
   alias   |   description   | token | dictionaries | dictionary | lexemes
-----------+-----------------+-------+--------------+------------+---------
 asciiword | Word, all ASCII | AT    | {simple}     | simple     | {at}
 blank     | Space symbols   | &     | {}           |            |
 asciiword | Word, all ASCII | T     | {simple}     | simple     | {t}
(3 rows)
As you can see from the documentation for CREATE TEXT SEARCH PARSER, this is not trivial, as the parser appears to need to be a C function.
You might find this post of someone getting "underscore_word" to be recognised as a single token useful: http://postgresql.1045698.n5.nabble.com/Configuring-Text-Search-parser-td2846645.html