Does CitusDB support `CREATE VIEW`?

Assume we have a distributed CitusDB table named customer_reviews, and we try to create a view on it:
CREATE VIEW book_reviews AS
(SELECT * FROM customer_reviews WHERE product_group = 'Book');
This appears to work. But if we run:
SELECT COUNT(1) FROM book_reviews;
CitusDB gives the following error:
ERROR: cannot plan queries that include both regular and partitioned relations
Two questions:
Is there a way to work around this by manually creating the view on all worker nodes?
Is there a way to make CREATE VIEW and DROP VIEW work correctly on the master node, for apps which create and destroy views automatically at runtime?

UPDATE: View support has been added to Citus with this PR.
First of all, I created an issue to track this. Please feel free to add your comments and feedback on that issue.
Until we implement this feature, I see two workarounds:
Using a UDF or a PL/pgSQL function to wrap the view query instead of creating the view. I added specific examples to the GitHub issue; see the sketch right after this list.
Creating some UDFs and PL/pgSQL functions to propagate views down to the shards on worker nodes and manipulating metadata to simulate views on the master node. I also added a prototype approach to the GitHub issue.
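A minimal sketch of the first workaround, reusing the book_reviews example from the question (whether the wrapped query can be planned depends on your Citus version and the query itself):
-- Wrap the view's query in a set-returning SQL function
-- instead of creating a view.
CREATE FUNCTION book_reviews()
RETURNS SETOF customer_reviews AS $$
SELECT * FROM customer_reviews WHERE product_group = 'Book';
$$ LANGUAGE sql STABLE;
-- Query it wherever you would have queried the view:
SELECT count(1) FROM book_reviews();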
I think what is best for you depends on your CREATE VIEW queries and your application stack. Could you explain your use case and how you plan to use views in a bit more detail?

PostgreSQL row-level security with views (protecting/hiding columns)

I need strict control of the reading and writing of my Postgres data. Updatable views have always provided very good, strict control over reads of my data and allow me to add valuable computed columns. With Postgres 9.5, row-level security introduces a new and powerful way to control my data. But I can't use the two technologies, views and row-level security, together. Why?
Basically because it wasn't possible to retroactively change how views work. I'd like to be able to support SECURITY INVOKER (or equivalent) for views, but as far as I know no such feature presently exists.
You can filter access to the view itself with row security normally.
The tables accessed by the view will also have their row security rules applied. However, they'll see the current_user as the view creator because views access tables (and other views) with the rights of the user who created/owns the view.
Maybe it'd be worth raising this on pgsql-hackers if you're willing to step in and help with development of the feature you need, or pgsql-general otherwise?
That said, while views access tables as the creating user and change current_user accordingly, they don't prevent you from using custom GUCs, the session_user, or other contextual information in row security policies. You can use row security with views, just not (usefully) to filter based on current_user.
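For example, a minimal sketch of that approach; the table, column, policy, and GUC names are all illustrative:
-- Filter on a custom GUC instead of current_user, so the policy
-- still applies usefully when the table is read through a view.
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
CREATE POLICY doc_owner ON documents
USING (owner_name = current_setting('app.username'));
-- The application sets the GUC at the start of each session:
-- SET app.username = 'alice';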
You can do this from PostgreSQL v15 on, which introduced the security_invoker option on views. If you turn that on, permissions on the underlying tables are checked as the user who calls the view, and RLS policies for the invoking user are used.
You can change existing views with
ALTER VIEW view_name SET (security_invoker = on);
The row level security policy can still be applied in the WHERE clause of the view. For example:
WHERE my_security_policy_function(person_id)
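Putting the pieces together, a minimal sketch for PostgreSQL 15 or later (the person table, its columns, and my_security_policy_function are illustrative):
-- The view checks privileges and RLS policies as the *calling*
-- user, and applies the row filter in its own WHERE clause:
CREATE VIEW person_view WITH (security_invoker = on) AS
SELECT person_id, name
FROM person
WHERE my_security_policy_function(person_id);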

Can we add data to a continuous view in PipelineDB externally?

I would like to add data to a specific continuous view. I do not want to feed it through the stream, as I want to add it only to this specific view without disturbing the others.
I have tried adding rows to the cv_mrel table directly, but I was unable to because some columns of the view are of hll (HyperLogLog) type.
Is there any way, or a function, by which I can create/cast a value to this data structure?
PipelineDB exposes functions for building and manipulating HLLs:
http://docs.pipelinedb.com/builtin.html#hyperloglog-functions
So in your case I'm guessing you'd want to build representative HLLs and then insert those into the _mrel table directly.
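For example, a minimal sketch assuming the materialization table has a day column and an hll column named uniques (both names are illustrative):
-- Build an HLL from scratch with the documented
-- hll_empty()/hll_add() functions and insert it directly.
INSERT INTO cv_mrel (day, uniques)
VALUES ('2017-01-01', hll_add(hll_empty(), 'some-value'));
-- Sanity-check the stored HLL:
SELECT day, hll_cardinality(uniques) FROM cv_mrel;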

Delayed Table Name Resolution in View

I have a view over a table. It turns out the table gets moved and an updated version of it is created each night. This ensures there is always a table of the expected name present in the database, but I cannot find a way to make my view continue to point to the current version of the table. Whichever table existed when the view was created is the one I end up pointing to, even after it moves and goes stale.
ViewA:
select a, b, c from todays_table;
todays_table stays current all day, then at night it gets renamed to todays_table01. View A now points to todays_table01 and a new table shows up called todays_table. Again, todays_table is current, but ViewA no longer is.
Is there a way to delay the table name resolution until the view is used? I haven't been able to get EXECUTE IMMEDIATE working for a SELECT statement. I think I could get a dynamic SQL statement working if I used a cursor, but I have never needed these before and I'm not sure if they are the right path. I read about AUTO_REVAL, but I believe this would only delay resolution until the first time the view was used, and it would still go stale that night.
I could, of course, stop using the view and just move the complex query into my program but there are many places it is needed so I would like to eliminate all other solutions before falling back to this.
It would be ideal to eliminate the temporary table and just have the master table receive updates throughout the day but this is beyond my comprehension as I know nothing about RPG II and OCL.
Thanks for reading.
Edit
Per @Mr. Llama's suggestion, I experimented with using synonyms and aliases to point to todays_table and then having my view point to the synonym. Unfortunately, in this scenario the view uses the alias to resolve the actual table name at creation time, so when the table is renamed to todays_table01 the view follows it, even though the alias itself continues to reference todays_table.
Edit 2
I'm accepting @mustaccio's answer because it does work and would be a reasonable approach to this problem if I could get the parameters going where they need to. My particular project requires flexibility, so I am actually going to jump on the nightly-process bandwagon and add a program to recreate my views after the process messes with their references, as @danny117 suggested.
Thanks to everyone who replied though, I learned a lot about how all of these pieces work together.
I think you might be able to achieve what you want by wrapping your view definition in a SQL table function, something like:
CREATE FUNCTION insteadofview (<parameters>)
RETURNS TABLE (<columns>)
LANGUAGE SQL
READS SQL DATA
RETURN
SELECT <the rest of your view definition>
Depending on how you query your view, you will probably need to pass search criteria into the function as parameters; otherwise performance will be suboptimal, because the function will have to return all rows from the query before search arguments can be applied.
According to the manual, as you have noticed, views on a table that is renamed continue to point to the original table object. Routines, however, including table functions, will be invalidated and their plans prepared again when they are next invoked, using the original source table name.
I have no way of testing this though.
Full syntax to create a table function.
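For illustration, a minimal sketch in DB2 for i SQL matching ViewA above (the function name, parameter, and column types are assumptions):
CREATE FUNCTION todays_table_fn (p_a INT)
RETURNS TABLE (a INT, b INT, c INT)
LANGUAGE SQL
READS SQL DATA
NOT DETERMINISTIC
RETURN
SELECT a, b, c
FROM todays_table
WHERE a = p_a;
-- Invoked like a view, with the search criterion as a parameter:
SELECT * FROM TABLE(todays_table_fn(42)) AS t;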

Managing PostgreSQL views without having to write migrations?

The problem with SQL views is that every time I need to make a small change, I need to create another migration. In a small startup, that's quite a hindrance when all you want is a small change to a view.
Is it advisable to do the following:
Drop and recreate the views every time I deploy my app?
This way, when I change something in a view, it gets updated in the database as soon as I deploy my app.
What you are describing is just another type of migration, one that gets reverted and re-applied on each deploy. This may make sense for your business needs, and if you get blocked by this technique, you can always fall back to the regular migration system.
The best way to implement such a system in PostgreSQL is to create a schema that you drop on deploy. This way you don't have to write all the DROP VIEW ... commands; just DROP SCHEMA and everything in it will be deleted. Then you can run your procedure to rebuild it.
Example deploy script to execute on deploy:
/* Drop and rebuild the schema */
DROP SCHEMA IF EXISTS view_schema CASCADE;
CREATE SCHEMA view_schema;
CREATE VIEW view_schema.my_users AS (SELECT * FROM users);
CREATE VIEW view_schema.my_products AS (SELECT * FROM products);
...

Managing database changes

I'm starting to move more logic into the database, using triggers, views, functions, CTEs, etc. When plv8/json comes out for postgres, I can see myself putting lots of logic in there.
I'm having problems with the "standard" way of doing database migrations in sequel and activerecord. Both sequel and activerecord let you put arbitrary SQL code into timestamped files. When each file is run, a schema_versions table is updated with the filename (or the timestamp in the filename), which keeps a record of which migrations have been applied to the current database.
If a lot of coding is being done at the database level, modifications to existing views, functions, etc. follow the pattern below:
Migration 1 defines a function and a view that uses that function.
-- Migration 1
create function calculate(x int) returns int as $$
select x + 1;
$$ language sql;
create view foos as (
select something, calculate(something) from a_table
);
Requirements change, and I need to change the function's argument type. In Migration 2 I have to drop all objects that depend on the calculate function, and recreate them by copying their entire bodies, even though most of that code hasn't changed at all!
-- Migration 2
-- Have to drop all views and functions that depend on the
-- `calculate(int)` function.
drop view foos;
drop function calculate(int);
-- (I could do `drop function calculate(int) cascade` instead,
-- but I might accidentally drop some objects that wouldn't get recreated below.)
create function calculate(x bigint) returns bigint as $$
select x + 1;
$$ language sql;
-- Now I have to recreate foos.
create view foos as (
select something, calculate(something) from a_table
);
If I'm building a system based on views, functions, and triggers, my migrations end up filled with duplicated code, and it's difficult to find the latest version of the code. You might say "don't do that!", but for my purposes (e-commerce, shipping, transactions) I'm finding it a lot easier and faster to have the database ensure the integrity of the data by doing the logic inside the database.
You can (of course) dump the current database schema (which includes all the code definitions), but I think you lose comments. And you wouldn't generally want to edit a giant file that contains the whole schema.
Any ideas on how to solve this problem?
My best idea is to have the SQL code contained in its own canonical files (app/sql/orders/shipping.sql, app/sql/orders/creation.sql, etc.), and everyone develops directly against these. Whenever it's time for a release, you make a new migration file, look at all the code changed since the previous release, figure out the dependency chain of the database objects that need to be dropped and recreated, and then copy the SQL from the canonical files into a new sequel/activerecord migration file. But it's a pain. :/
Thoughts are very welcome. I hope I explained this well enough; I'm cutting back on my caffeine intake and I'm a little groggy atm.
Oh, I asked a similar question on Stack Overflow: Changing the type of a column used in other views. The answer was a function that let me pass in:
sql code to run
database views to drop and recreate
The function would retrieve the view definitions, drop the views, run the SQL code, then recreate the view definitions (in reverse order of dropping). Perhaps a system of functions like this would help solve the problem of having to copy/paste SQL code into the migration files.
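A minimal sketch of such a helper, assuming pg_get_viewdef() captures each definition and that the caller passes the views in a safe drop order (all names here are illustrative):
create function run_with_views_dropped(ddl text, views text[])
returns void as $$
declare
  defs text[] := '{}';
  v text;
  i int;
begin
  -- Save each view's definition, then drop it.
  foreach v in array views loop
    defs := defs || pg_get_viewdef(v::regclass);
    execute format('drop view %s', v);
  end loop;
  -- Run the caller's DDL (e.g. changing a function's signature).
  execute ddl;
  -- Recreate the views in reverse order of dropping.
  for i in reverse array_length(views, 1) .. 1 loop
    execute format('create view %s as %s', views[i], defs[i]);
  end loop;
end;
$$ language plpgsql;
For the Migration 2 example above, the call would pass the drop/create of calculate as the ddl argument and array['foos'] as the views.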
I'd recommend Liquibase.
You create files which track the changes to your database, and these are applied to the database in the correct order.
You might find David Wheeler's blog posts interesting, starting from here:
http://justatheory.com/computers/databases/simple-sql-change-management.html
My rate of database change is fairly small but I tend to be careless and make small changes to the schema directly, so I've had to come up with a fair bit of infrastructure to catch when I've done so. The basic elements are:
1. A makefile that can rebuild a development database from scratch
2. A set of schema files separated into "modules" (lookups_schema.sql, lookup_data.sql)
3. A set of update files that transition from one revision to the next
4. Corresponding downgrade scripts (I don't usually have these; some people do)
5. A script to populate my database with a plausible amount of test data
6. Crucially, a test suite via pgTAP that checks my various functions, views, and also the upgrade scripts. The upgrade tests can be run against a live database too.
If you have a separate instance of PostgreSQL set up with fsync turned off, running on a ramdisk, etc., then rebuilding the whole DB and populating it can take seconds (if you don't have too much test data).
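On a recent PostgreSQL, a minimal sketch of how such a disposable test instance might be configured (these settings trade crash safety for speed, so never use them where the data matters):
-- Disposable test instance only: nothing here survives a crash.
ALTER SYSTEM SET fsync = off;
ALTER SYSTEM SET synchronous_commit = off;
ALTER SYSTEM SET full_page_writes = off;
SELECT pg_reload_conf();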
Start with #1, #2, then add #6 (pgTAP is very cool), then the rest. The crucial thing is a test suite that checks your in-database code.
There are tools that try to automate schema changes for you, but they are really only good at adding a new column to a table and that sort of thing. Once you have code in your db, they're not much help.