Postgres Text Search against a derived ts_vector - postgresql

Somehow I've tricked myself into writing a full text search implementation on a database. I have a table that represents all the entities in my database, a table that represents all the tags, and a table representing the many to many relationship of tags to entities.
I wrote a query that groups all tag names for a given entity and concatenates them into a string, which I then transform into a ts_vector. That query looks like this:
SELECT e.id, to_tsvector(c.publicname || ' ' || string_agg(cv.name, ' '))
FROM categoryvalue cv, entitycategoryvalue ecv, entity e
WHERE ccv.categoryvalueid = cv.id AND e.id = ecv.entityid
GROUP BY e.id;
The query returns results in this schema:
id | to_tsvector
1 | tag_a, tag_b, tag_c
2 | tag_b, tag_d, tag_e
Which is exactly what I'd like to match a ts_query against. I'm a noob at SQL though, and I am wondering if there is a way I can create a table that is continually updated with the results of the query I've written?
EDIT: I documented my eventual solution and system in this blog post http://tech.pro/tutorial/1142/building-faceted-search-with-postgresql

I think you want a VIEW.
CREATE VIEW my_tsearch_table AS
SELECT e.id, to_tsvector(c.publicname || ' ' || string_agg(cv.name, ' '))
FROM categoryvalue cv, entitycategoryvalue c=ecv, entity e
WHERE ccv.categoryvalueid = cv.id AND e.id = ecv.celebrityid
GROUP BY e.id;
Note, however, that you cannot add indexes to a view. This may be a problem with full-text search, as it's quite expensive to generate all those tsvectors for every search.
If you need to index that table, you are looking for a materialized view.
PostgreSQL does not support automatically maintained materialized views; you can't just CREATE MATERIALIZED VIEW and then add some indexes to it.
What you can do is manually maintain a materialized view using an ordinary table, an ordinary view created with your query, and some trigger functions.
Add a trigger function on each table that contributes to the view, and have that trigger function update (or insert into, or delete from, as appropriate) the materialized view based on updates made to that table. This can be complicated to get right in a concurrent environment, though, and it can be prone to lock-ordering deadlocks.
An alternative to a trigger-maintained view is to live with the materialized view getting a little out of date. Periodically create a new table, copy your ordinary view into the new table, add the desired indexes, then drop the old materialized view table and rename the new one to replace it.
See also:
http://wiki.postgresql.org/wiki/Materialized_Views
http://tech.jonathangardner.net/wiki/PostgreSQL/Materialized_Views

Related

Join a view to another table

Is it possible to join a view with another table in SQL? If so, how?
I have a query on Oracle db which has specific fields. I need to re create the same query on PostgreSQL but some of the data in the PostgreSQL query are coming from a view... And that view has missing information. It's a pretty complex view, so I don't want to NOT use it for now.
For example, in Oracle I do this:
SELECT
d.dos_id,
trunc(d.dos_creation, 'MM') as Cohorte,
sum(v.ver_etude + v.ver_direct) as encaissé
from t_dossier d
left outer join v_versement v
on v.dos_id = d.dos_id
In the Postgres one, I'm using a view. But the view does not return "dos_id" so I cannot explicitly join v_versement with the view.
Is there a way to force a view to return specific fields at runtime which weren't there when creating the view?
You can't force it
to return specific fields at runtime which weren't there when creating
the view
You can create or replace it with limitation:
https://www.postgresql.org/docs/current/static/sql-createview.html
CREATE OR REPLACE VIEW is similar, but if a view of the same name
already exists, it is replaced. The new query must generate the same
columns that were generated by the existing view query (that is, the
same column names in the same order and with the same data types), but
it may add additional columns to the end of the list. The calculations
giving rise to the output columns may be completely different.
example:
t=# create view v2 as select now();
CREATE VIEW
Time: 36.488 ms
t=# create or replace view v2 as select now(),current_user;
CREATE VIEW
Time: 8.551 ms
t=# create or replace view v2 as select now()::text,current_user;
ERROR: cannot change data type of view column "now" from timestamp with time zone to text
Time: 0.430 ms
I guess I had not realised that I can actually use the view without creating it...
So I edited the SQL statement that makes up the view, added the fields that I needed and used the code of the view without having to create a new view (creating a new view would mean outsourcing it to another company, which would cost us money..)
Thanks :)

Common table expression,WITH clause in PostgreSQL ;ERROR: relation "stkpos" does not exist

for example the following is my query
WITH stkpos as (
select * from mytbl
),
updt as (
update stkpos set field=(select sum(fieldn) from stkpos)
)
select * from stkpos
ERROR: relation "stkpos" does not exist
Unlike MS-SQL and some other DBs, PostgreSQL's CTE terms are not treated like a view. They're more like an implicit temp table - they get materialized, the planner can't push filters down into them or pull filters up out of them, etc.
One consequence of this is that you can't update them, because they're a copy of the original data, not just a view of it. You'll need to find another way to do what you want.
If you want help with that you'll need to post a new question that has a clear and self contained example (with create table statements, etc) showing a cut down version of your real problem. Enough to actually let us understand what you're trying to accomplish and why.

Postgres/GIS: RULE does not affect all inserts

I have a rule for a table which simply checks if the new entry matches a name and intersects with that matching existing row using st_intersects from postgis library.
It seems that only a part are NOT inserted, but most get through this rule. I checked some entries manually after the insert and can confirm the rule should have blocked that insert.
Is something wrong with my RULE?
Table has 3 columns. id serial, name varchar(200) and way geometry(Linestring,4326)
And my RULE is as follows (excerpt from \d names)
blockduplicate AS
ON INSERT TO nameslist
WHERE (EXISTS ( SELECT 1
FROM nameslist
WHERE nameslist.name::text = new.name::text AND st_intersects(nameslist.way, new.way) = true)) DO INSTEAD NOTHING
This table simply takes a line having a name, and whenever another entry comes in with the same name and intersecting with another existing entry having the same name, it should be blocked. So I have only one entry with this name in the area represented by the geometry field way. After the insert I see plenty of duplicates (name matches and st_intersects returns true when checking way field). Why is my rule not blocking the insert?
Update: Is it because I do multiple inserts in one query. I actually insert 12000 entries in one shot with the query INSERT INTO (a,b,c) VALUES (...),(...),(...),...
Does PostgreSQL call the RULE for each value? I need to do multiple inserts otherwise it would take months to finish my inserts.
Ok, usually you are going to find triggers are cleaner than rules. Triggers are fired on each insert. Rules are macros which rewrite your SQL. There is a time and place for the use of rules but they are certainly an advanced area.
Let's look at what happens with your insert. Suppose you:
INSERT INTO nameslist
SELECT * FROM nameslist_import;
Your rule will actually rewrite the query into something like:
INSERT INTO nameslist
SELECT * FROM nameslist_import WHERE not (expression modelled on your rule);
Usually it is cleaner here to just write this into your query rather than using a rule to rewrite your query for you. This allows you to tune exactly what you want to do in each query. If you want to prevent such data from overlapping, look at exclude constraints if they are applicable or triggers.
Hope this helps.

Postgresql, query results to new table

Windows/NET/ODBC
I would like to get query results to new table on some handy way which I can see through data adapter but I can't find a way to do it.
There is no much examples around to satisfy beginner's level on this.
Don't know temporary or not but after seeing results that table is no more needed so I can delete it 'by hand' or it can be deleted automatically.
This is what I try:
mCmd = New OdbcCommand("CREATE TEMP TABLE temp1 ON COMMIT DROP AS " & _
"SELECT dtbl_id, name, mystr, myint, myouble FROM " & myTable & " " & _
"WHERE myFlag='1' ORDER BY dtbl_id", mCon)
n = mCmd.ExecuteNonQuery
This run's without error and in 'n' I get correct number of matched rows!!
But with pgAdmin I don't see those table no where?? No matter if I look under opened transaction or after transaction is closed.
Second, should I define columns for temp1 table first or they can be made automatically based on query results (that would be nice!).
Please minimal example to illustrate me what to do based on upper code to get new table filled with query results.
A shorter way to do the same thing your current code does is with CREATE TEMPORARY TABLE AS SELECT ... . See the entry for CREATE TABLE AS in the manual.
Temporary tables are not visible outside the session ("connection") that created them, they're intended as a temporary location for data that the session will use in later queries. If you want a created table to be accessible from other sessions, don't use a TEMPORARY table.
Maybe you want UNLOGGED (9.2 or newer) for data that's generated and doesn't need to be durable, but must be visible to other sessions?
See related: Is there a way to access temporary tables of other sessions in PostgreSQL?

SQL Views - no variables?

Is it possible to declare a variable within a View? For example:
Declare #SomeVar varchar(8) = 'something'
gives me the syntax error:
Incorrect syntax near the keyword 'Declare'.
You are correct. Local variables are not allowed in a VIEW.
You can set a local variable in a table valued function, which returns a result set (like a view does.)
http://msdn.microsoft.com/en-us/library/ms191165.aspx
e.g.
CREATE FUNCTION dbo.udf_foo()
RETURNS #ret TABLE (col INT)
AS
BEGIN
DECLARE #myvar INT;
SELECT #myvar = 1;
INSERT INTO #ret SELECT #myvar;
RETURN;
END;
GO
SELECT * FROM dbo.udf_foo();
GO
You could use WITH to define your expressions. Then do a simple Sub-SELECT to access those definitions.
CREATE VIEW MyView
AS
WITH MyVars (SomeVar, Var2)
AS (
SELECT
'something' AS 'SomeVar',
123 AS 'Var2'
)
SELECT *
FROM MyTable
WHERE x = (SELECT SomeVar FROM MyVars)
EDIT: I tried using a CTE on my previous answer which was incorrect, as pointed out by #bummi. This option should work instead:
Here's one option using a CROSS APPLY, to kind of work around this problem:
SELECT st.Value, Constants.CONSTANT_ONE, Constants.CONSTANT_TWO
FROM SomeTable st
CROSS APPLY (
SELECT 'Value1' AS CONSTANT_ONE,
'Value2' AS CONSTANT_TWO
) Constants
#datenstation had the correct concept. Here is a working example that uses CTE to cache variable's names:
CREATE VIEW vwImportant_Users AS
WITH params AS (
SELECT
varType='%Admin%',
varMinStatus=1)
SELECT status, name
FROM sys.sysusers, params
WHERE status > varMinStatus OR name LIKE varType
SELECT * FROM vwImportant_Users
also via JOIN
WITH params AS ( SELECT varType='%Admin%', varMinStatus=1)
SELECT status, name
FROM sys.sysusers INNER JOIN params ON 1=1
WHERE status > varMinStatus OR name LIKE varType
also via CROSS APPLY
WITH params AS ( SELECT varType='%Admin%', varMinStatus=1)
SELECT status, name
FROM sys.sysusers CROSS APPLY params
WHERE status > varMinStatus OR name LIKE varType
Yes this is correct, you can't have variables in views
(there are other restrictions too).
Views can be used for cases where the result can be replaced with a select statement.
Using functions as spencer7593 mentioned is a correct approach for dynamic data. For static data, a more performant approach which is consistent with SQL data design (versus the anti-pattern of writting massive procedural code in sprocs) is to create a separate table with the static values and join to it. This is extremely beneficial from a performace perspective since the SQL Engine can build effective execution plans around a JOIN, and you have the potential to add indexes as well if needed.
The disadvantage of using functions (or any inline calculated values) is the callout happens for every potential row returned, which is costly. Why? Because SQL has to first create a full dataset with the calculated values and then apply the WHERE clause to that dataset.
Nine times out of ten you should not need dynamically calculated cell values in your queries. Its much better to figure out what you will need, then design a data model that supports it, and populate that data model with semi-dynamic data (via batch jobs for instance) and use the SQL Engine to do the heavy lifting via standard SQL.
What I do is create a view that performs the same select as the table variable and link that view into the second view. So a view can select from another view. This achieves the same result
How often do you need to refresh the view? I have a similar case where the new data comes once a month; then I have to load it, and during the loading processes I have to create new tables. At that moment I alter my view to consider the changes.
I used as base the information in this other question:
Create View Dynamically & synonyms
In there, it is proposed to do it 2 ways:
using synonyms.
Using dynamic SQL to create view (this is what helped me achieve my result).