I've created a function to index a certain value from another table.
Basically, I'm querying these activities, often filtering on the context of the activityplans table.
this is my function:
CREATE or replace FUNCTION get_activity_context(parentact uuid, parentplan uuid) RETURNS TEXT
LANGUAGE sql IMMUTABLE AS
$$
SELECT CASE
WHEN $2 is not null THEN
(select LOWER((("context")::json->>'href')::text) from activityplans ap where ap.key = $2)
WHEN $1 is not null THEN
(select LOWER((("context")::json->>'href')::text) from activityplans ap, activities act where act."parentPlan" = ap.key AND act.key=$1)
END
$$;
The function works when I use it directly, for example: select get_activity_context("parentActivity", "parentPlan") from activities limit 10;
But when I try to create an index:
create index on activities (get_activity_context("parentActivity", "parentPlan"));
I get this:
ERROR: could not read block 0 in file "base/16402/60840": read only 0 of 8192 bytes
CONTEXT: SQL function "get_activity_context" during startup
SQL state: XX001
Googling this error only brings me to database corruption issues etc., but I don't think that is the case here. My guess is that something is wrong with the function, but I can't figure out what.
I don't know which relation 60840 is in your database, but it sure has a problem. Find out with
SELECT relname,relnamespace::regnamespace
FROM pg_class
WHERE relfilenode = 60840;
Anyway, that index will never work, because the function is not really IMMUTABLE, no matter how you declared it: it reads from another table, so it may return a different result tomorrow. Indexing it would lead to data corruption.
An index on one table can never refer to data from another table.
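If the goal is to filter activities by that context quickly, the usual workaround is to denormalize: store the looked-up value in a column on activities, keep it current with a trigger, and index that column. A rough sketch, where the column name plan_context, the function name and the trigger name are all made up:

ALTER TABLE activities ADD COLUMN plan_context text;

CREATE OR REPLACE FUNCTION set_activity_context() RETURNS trigger
LANGUAGE plpgsql AS
$$
BEGIN
    -- same lookup as your function, but stored in the row itself
    SELECT LOWER((ap."context")::json->>'href')
    INTO NEW.plan_context
    FROM activityplans ap
    WHERE ap.key = COALESCE(NEW."parentPlan",
                            (SELECT act."parentPlan"
                             FROM activities act
                             WHERE act.key = NEW."parentActivity"));
    RETURN NEW;
END
$$;

CREATE TRIGGER activities_plan_context_trg
BEFORE INSERT OR UPDATE ON activities
FOR EACH ROW EXECUTE FUNCTION set_activity_context();  -- EXECUTE PROCEDURE on PostgreSQL < 11

CREATE INDEX ON activities (plan_context);

You would still need a one-time UPDATE to backfill existing rows, and the stored value only stays correct as long as activityplans.context itself doesn't change; if it can change, you need a trigger on activityplans too, which is exactly the mutability problem that makes the IMMUTABLE declaration a lie.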
My first thought:
Usually indexes are created like this:
CREATE INDEX id_column_idx
ON public.naleznosc USING btree
(id_column)
TABLESPACE pg_default;
but you are trying to create them this way:
CREATE INDEX .......... on activities ...........(get_activity_context("parentActivity", "parentPlan")) ...........;
The dots show the places where you did not put anything :)
I have a query that does a large bulk insert from a CSV. The CSV is formatted in a very specific way, so the query has lots of lines that create temp tables, alter and update columns in them, and then drop them, and so on. It takes a lot of 'steps', but I'm happy with it as it's very efficient/fast.
Eg portion:
--5
UPDATE #Map_M
SET Site = REPLACE(Site, ' ', '')
GO
--6
ALTER TABLE #Map_M
ADD P_V NUMERIC(12,3)
GO
--7
UPDATE #Map_M
SET P_V = TRY_CAST(Mock_P_V AS NUMERIC(12,3))
GO
--8
ALTER TABLE #Map_M
DROP COLUMN Mock_P_V
GO
To get this to work in a stored procedure, though, I had to swap 'GO' for ';', but when I do that the query fails; it doesn't seem like the order of execution honors the order in which the query is written. For example, I'm getting the error 'Invalid column name 'P_V''. I'm assuming it's because UPDATE #Map_M SET P_V = TRY_CAST(Mock_P_V AS NUMERIC(12,3)); is processed before ALTER TABLE #Map_M ADD P_V NUMERIC(12,3); once GO is swapped for ;.
How can I resolve this?
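One common workaround, assuming you want to keep everything in a single batch or stored procedure, is to run the statements that reference a freshly added column as dynamic SQL, so the column name is resolved at execution time rather than when the whole batch is compiled. A sketch of how steps 6-8 might look:

--6
ALTER TABLE #Map_M
ADD P_V NUMERIC(12,3);

--7: reference the new column inside dynamic SQL, so it is only
--   resolved after the ALTER above has taken effect
EXEC sp_executesql N'
UPDATE #Map_M
SET P_V = TRY_CAST(Mock_P_V AS NUMERIC(12,3));';

--8
ALTER TABLE #Map_M
DROP COLUMN Mock_P_V;

Alternatively, you can split the steps into separate child procedures called from one driver procedure; temp tables created in the driver remain visible in the procedures it calls.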
I am writing an experimental script to do a SQL comparison (collated as case-sensitive) and I am having issues with SET IDENTITY_INSERT <Table> ON.
I have switched on this option and disabled foreign key checks, but it still seems to be complaining about the identity insert.
Here are the steps I followed:
1 - I created a linked server
EXEC sp_addlinkedserver @server=N'xxx.xxx.xxx.xxx', @srvproduct=N'SQL Server'
2 - I added the login credentials
EXEC master.dbo.sp_addlinkedsrvlogin
@rmtsrvname = N'xxx.xxx.xxxx.xxx',
@locallogin = NULL ,
@useself = N'False',
@rmtuser = N'xxxxxxxxxxx',
@rmtpassword = N'xxxxxxxxxxx'
3 - In the same batch, I set IDENTITY_INSERT, disabled foreign key checks, and ran the following merge script. Note: the remote (OPENQUERY) query returns an XML field, which is disallowed over distributed queries, so I cast it to NVARCHAR(MAX).
SET IDENTITY_INSERT [DATABASE1].[dbo].[TABLE1] ON
ALTER TABLE [DATABASE1].[dbo].[TABLE1] NOCHECK CONSTRAINT ALL
MERGE [DATABASE1].[dbo].[TABLE1]
USING OPENQUERY([xxx.xxx.xxx.xxx], 'SELECT S.ID, S.EventId, S.SnapshotTypeID, CAST(S.Content AS NVARCHAR(MAX)) AS Content FROM [DATABASE1].[dbo].[TABLE1] AS S') AS S
ON (CAST([DATABASE1].[dbo].[TABLE1].Content AS NVARCHAR(MAX)) = S.Content)
WHEN NOT MATCHED BY TARGET
THEN INSERT VALUES (S.ID, S.EventId, S.SnapshotTypeID, CAST(S.Content AS XML))
WHEN MATCHED
THEN UPDATE SET [DATABASE1].[dbo].[TABLE1].EventId = S.EventId,
[DATABASE1].[dbo].[TABLE1].SnapshotTypeID = S.SnapshotTypeID,
[DATABASE1].[dbo].[TABLE1].Content = S.Content
COLLATE Latin1_General_CS_AS;
GO
The error message I am getting reads as follows:
Msg 8101, Level 16, State 1, Line 4
An explicit value for the identity column in table 'Database1.dbo.Table' can only be specified when a column list is used and IDENTITY_INSERT is ON.
How can I fix this? As I mentioned, this script is only an experiment for one of the systems I am writing. I am probably reinventing the wheel somewhere, but it's all about learning in this exercise.
An explicit value for the identity column in table 'Database1.dbo.Table' can only be specified when a column list is used and IDENTITY_INSERT is ON.
You have no column list.
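In other words, when IDENTITY_INSERT is ON, the INSERT must name its target columns explicitly, including the identity column. For the MERGE above that would look roughly like this (column names taken from your script):

WHEN NOT MATCHED BY TARGET
THEN INSERT (ID, EventId, SnapshotTypeID, Content)
VALUES (S.ID, S.EventId, S.SnapshotTypeID, CAST(S.Content AS XML))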
We've got a system (MS SQL 2008 R2-based) that has a number of "input" databases and one "output" database. I'd like to write a query that will read from the output DB and JOIN it to data in one of the source DBs. However, the source table may be any one of a number of individual tables :( The name of the source DB is included in the output DB; ideally, I'd like to do something like the following (pseudo-SQL ahoy):
select o.[UID]
,o.[description]
,i.[data]
from [output].dbo.[description] as o
left join (select [UID]
,[data]
from
[output.sourcedb].dbo.datatable
) as i
on i.[UID] = o.[UID];
Is there any way to do something like the above - "dynamically" specify the database and table to be joined on for each row in the query?
Try using the exec function, then specify the select as a string, adding variables for database names and tables where appropriate. Simple example:
DECLARE @dbName VARCHAR(255), @tableName VARCHAR(255), @colName VARCHAR(255)
...
EXEC('SELECT * FROM ' + @dbName + '.dbo.' + @tableName + ' WHERE ' + @colName + ' = 1')
No, the table must be known at the time you prepare the query. Otherwise how would the query optimizer know what indexes it might be able to use? Or whether the table you reference even has a UID column?
You'll have to do this in stages:
Fetch the sourcedb value from your output database in one query.
Build an SQL query string, interpolating the value you fetched in the first query into the FROM clause of the second query.
Be careful to check that this value contains a legitimate database name. For instance, filter out non-alpha characters or apply a regular expression or look it up in a whitelist. Otherwise you're exposing yourself to a SQL Injection risk.
Execute the new SQL string you built with exec() as @user353852 suggests.
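Putting those stages together, a rough sketch might look like this (the sourcedb column and the datatable name are assumptions taken from your pseudo-SQL):

DECLARE @dbName sysname, @sql NVARCHAR(MAX);

-- stage 1: fetch the source database name recorded in the output DB
SELECT TOP (1) @dbName = o.sourcedb
FROM [output].dbo.[description] AS o;

-- stage 2: check it is a real database name before using it
IF DB_ID(@dbName) IS NOT NULL
BEGIN
    -- stage 3: build the query; QUOTENAME guards the interpolated name
    SET @sql = N'SELECT o.[UID], o.[description], i.[data]
                 FROM [output].dbo.[description] AS o
                 LEFT JOIN ' + QUOTENAME(@dbName) + N'.dbo.datatable AS i
                        ON i.[UID] = o.[UID];';

    -- stage 4: execute it
    EXEC sp_executesql @sql;
END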
Some SQL servers have a feature where INSERT is skipped if it would violate a primary/unique key constraint. For instance, MySQL has INSERT IGNORE.
What's the best way to emulate INSERT IGNORE and ON DUPLICATE KEY UPDATE with PostgreSQL?
With PostgreSQL 9.5, this is now native functionality (like MySQL has had for several years):
INSERT ... ON CONFLICT DO NOTHING/UPDATE ("UPSERT")
9.5 brings support for "UPSERT" operations.
INSERT is extended to accept an ON CONFLICT DO UPDATE/IGNORE clause. This clause specifies an alternative action to take in the event of a would-be duplicate violation.
...
Further example of new syntax:
INSERT INTO user_logins (username, logins)
VALUES ('Naomi',1),('James',1)
ON CONFLICT (username)
DO UPDATE SET logins = user_logins.logins + EXCLUDED.logins;
Edit: in case you missed warren's answer, PG9.5 now has this natively; time to upgrade!
Building on Bill Karwin's answer, to spell out what a rule-based approach would look like (transferring from another schema in the same DB, and with a multi-column primary key):
CREATE RULE "my_table_on_duplicate_ignore" AS ON INSERT TO "my_table"
WHERE EXISTS(SELECT 1 FROM my_table
WHERE (pk_col_1, pk_col_2)=(NEW.pk_col_1, NEW.pk_col_2))
DO INSTEAD NOTHING;
INSERT INTO my_table SELECT * FROM another_schema.my_table WHERE some_cond;
DROP RULE "my_table_on_duplicate_ignore" ON "my_table";
Note: The rule applies to all INSERT operations until the rule is dropped, so not quite ad hoc.
For those of you that have Postgres 9.5 or higher, the new ON CONFLICT DO NOTHING syntax should work:
INSERT INTO target_table (field_one, field_two, field_three )
SELECT field_one, field_two, field_three
FROM source_table
ON CONFLICT (field_one) DO NOTHING;
For those of us who have an earlier version, this left join will work instead:
INSERT INTO target_table (field_one, field_two, field_three )
SELECT source_table.field_one, source_table.field_two, source_table.field_three
FROM source_table
LEFT JOIN target_table ON source_table.field_one = target_table.field_one
WHERE target_table.field_one IS NULL;
Try to do an UPDATE first. If it doesn't modify any row, the row didn't exist, so do an INSERT. Obviously, you do this inside a transaction.
You can of course wrap this in a function if you don't want to put the extra code on the client side. You also need a loop to handle the very rare race condition in that approach.
There's an example of this in the documentation: http://www.postgresql.org/docs/9.3/static/plpgsql-control-structures.html, example 40-2 right at the bottom.
That's usually the easiest way. You can do some magic with rules, but it's likely going to be a lot messier. I'd recommend the wrap-in-function approach over that any day.
This works for single-row, or few-row, values. If you're dealing with large amounts of rows, for example from a subquery, you're best off splitting it into two queries, one for INSERT and one for UPDATE (as an appropriate join/subselect of course - no need to write your main filter twice).
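If you do wrap it in a function, the sketch below follows the shape of that documentation example (merge_db); the table and column names here are made up:

CREATE OR REPLACE FUNCTION merge_kv(k integer, v text) RETURNS void AS
$$
BEGIN
    LOOP
        -- first try to update an existing row
        UPDATE kv_table SET value = v WHERE key = k;
        IF found THEN
            RETURN;
        END IF;
        -- no row was updated, so try to insert;
        -- if someone else inserts the same key concurrently we get
        -- a unique_violation and loop back to retry the UPDATE
        BEGIN
            INSERT INTO kv_table (key, value) VALUES (k, v);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            NULL;  -- do nothing, loop to try the UPDATE again
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;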
To get the insert ignore logic you can do something like below. I found simply inserting from a select statement of literal values worked best, then you can mask out the duplicate keys with a NOT EXISTS clause. To get the update on duplicate logic I suspect a pl/pgsql loop would be necessary.
INSERT INTO manager.vin_manufacturer
(SELECT * FROM( VALUES
('935',' Citroën Brazil','Citroën'),
('ABC', 'Toyota', 'Toyota'),
('ZOM',' OM','OM')
) as tmp (vin_manufacturer_id, manufacturer_desc, make_desc)
WHERE NOT EXISTS (
--ignore anything that has already been inserted
SELECT 1 FROM manager.vin_manufacturer m where m.vin_manufacturer_id = tmp.vin_manufacturer_id)
)
INSERT INTO mytable(col1,col2)
SELECT 'val1','val2'
WHERE NOT EXISTS (SELECT 1 FROM mytable WHERE col1='val1')
As @hanmari mentioned in his comment, when inserting into a postgres table, ON CONFLICT (..) DO NOTHING is the best code to use for not inserting duplicate data:
query = "INSERT INTO db_table_name(column_name)
VALUES(%s) ON CONFLICT (column_name) DO NOTHING;"
The ON CONFLICT line of code will allow the insert statement to still insert the other rows of data. The query-and-values code is an example of inserting data from an Excel sheet into a postgres db table.
I have constraints added to a postgres table I use to make sure the ID field is unique. Instead of running a delete on rows of data that are the same, I add a line of SQL code that renumbers the ID column starting at 1.
Example:
q = 'ALTER SEQUENCE id_column_seq RESTART WITH 1'
If my data has an ID field, I do not use it as the primary/serial ID; I create an ID column and set it to serial.
I hope this information is helpful to everyone.
*I have no college degree in software development/coding. Everything I know in coding, I study on my own.
Looks like PostgreSQL supports a schema object called a rule.
http://www.postgresql.org/docs/current/static/rules-update.html
You could create a rule ON INSERT for a given table, making it do NOTHING if a row exists with the given primary key value, or else making it do an UPDATE instead of the INSERT if a row exists with the given primary key value.
I haven't tried this myself, so I can't speak from experience or offer an example.
This solution avoids using rules (the block below goes inside a PL/pgSQL function):
BEGIN
INSERT INTO tableA (unique_column,c2,c3) VALUES (1,2,3);
EXCEPTION
WHEN unique_violation THEN
UPDATE tableA SET c2 = 2, c3 = 3 WHERE unique_column = 1;
END;
but it has a performance drawback (see PostgreSQL.org):
A block containing an EXCEPTION clause is significantly more expensive
to enter and exit than a block without one. Therefore, don't use
EXCEPTION without need.
For bulk loads, you can always delete the rows before the insert. Deleting a row that doesn't exist doesn't cause an error, so it's safely skipped.
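For example, when bulk loading from a staging table, something along these lines (table and column names are placeholders) keeps the load repeatable:

BEGIN;

-- delete any rows we are about to re-insert; deleting nothing is not an error
DELETE FROM target_table t
USING source_table s
WHERE t.id = s.id;

INSERT INTO target_table (id, col1, col2)
SELECT id, col1, col2 FROM source_table;

COMMIT;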
For data import scripts, to replace "IF NOT EXISTS", in a way, there's a slightly awkward formulation that nevertheless works:
DO
$do$
BEGIN
PERFORM id
FROM whatever_table;
IF NOT FOUND THEN
-- INSERT stuff
END IF;
END
$do$;