I have two tables: one is role and the other is user_role. The user_role table contains all users of a particular role. Now, I have added two new roles to the role table. I want to insert rows into the user_role table for every user who has role_id=4, but with the new role_ids. I cannot update the existing rows, because I want to keep them; the users who have role_id=4 should also belong to the new role_ids.
I tried it this way:
INSERT INTO table2 (all_links, fields_one, fields_two)
SELECT URI, fields, details FROM table1
WHERE date > '12-11-2013 00-00-00';
But that example involves two tables, whereas in my case there is only one table. Also, in the user_role table the id column is a sequence, so I can't simply select from another table and insert directly.
Please help with this.
You can select and insert records into the same table.
If I understood your use case correctly,
INSERT INTO user_role (role_id, <other columns>)
SELECT <new_role_id>, <other columns>
FROM user_role
WHERE role_id = 4;
This will insert all the existing records in the user_role table that have role_id 4 back into the same table, with a different role_id. HTH.
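As a concrete illustration, here is a minimal sketch, assuming the two new roles received IDs 5 and 6 and that user_role has columns (id, user_id, role_id) with id filled by the sequence (all of these names are assumptions; adjust them to your schema):
INSERT INTO user_role (user_id, role_id)
SELECT ur.user_id, new_role.role_id
FROM user_role ur
CROSS JOIN (VALUES (5), (6)) AS new_role(role_id)
WHERE ur.role_id = 4;
Because id is populated by the sequence, it is simply omitted from the column list.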
I have a time-series location data table containing the following columns (time, first_name, last_name, loc_lat, loc_long) with the first three columns as the primary key. The table has more than 1M rows.
I notice that first_name and last_name duplicate quite often. There are only 100 combinations in 1M rows. Therefore, to save disk space, I am thinking about creating a separate people table with columns (id, first_name, last_name) where (first_name, last_name) is a unique constraint, in order to simplify the time-series location table to be (time, person_id, loc_lat, loc_long) where person_id is a foreign key for the people table.
I want to first create a new table from my existing 1M-row table to test whether this change actually yields a meaningful disk-space saving. This task feels quite doable, but I cannot find a concrete way to do it yet. Any suggestions?
That's a basic step of database normalization.
If you can afford to do so, it will be faster to write a new table exchanging full names for IDs than to alter the schema of the existing table and update all rows. Basically:
BEGIN; -- wrap in single transaction (optional, but safer)
CREATE TABLE people (
people_id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY
, first_name text NOT NULL
, last_name text NOT NULL
, CONSTRAINT full_name_uni UNIQUE (first_name, last_name)
);
INSERT INTO people (first_name, last_name)
SELECT DISTINCT first_name, last_name
FROM tbl
ORDER BY 1, 2; -- optional
ALTER TABLE tbl RENAME TO tbl_old; -- free up org. table name
CREATE TABLE tbl AS
SELECT t.time, p.people_id, t.loc_lat, t.loc_long
FROM tbl_old t
JOIN people p USING (first_name, last_name);
-- ORDER BY ??
ALTER TABLE tbl ADD CONSTRAINT people_id_fk FOREIGN KEY (people_id) REFERENCES people(people_id);
-- make sure the new table is complete. indexes? constraints?
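-- To gauge the actual disk-space saving, compare sizes before dropping the
-- old table; pg_total_relation_size() includes indexes and TOAST:
SELECT pg_size_pretty(pg_total_relation_size('tbl_old')) AS old_size
     , pg_size_pretty(pg_total_relation_size('tbl'))     AS new_size
     , pg_size_pretty(pg_total_relation_size('people'))  AS people_size;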
-- Finally:
DROP TABLE tbl_old;
COMMIT;
Related:
Best way to populate a new column in a large table?
Add new column without table lock?
Updating database rows without locking the table in PostgreSQL 9.2
DISTINCT is simple. But for only 100 distinct full names - and with the right index support! - there are more sophisticated, (much) faster ways. See:
Optimize GROUP BY query to retrieve latest row per user
I have a table that has the following fields
----------------------------------
| id | user_id | doc_id |
----------------------------------
I want to create a new unique constraint to make sure that there are no repeat user_id and doc_id records. Aka a user can only be linked to a doc one time. That is simple enough.
ALTER TABLE mytable
ADD CONSTRAINT uniquectm_const UNIQUE (user_id, doc_id);
The issue is I have records that currently violate that constraint. I was wondering if there is an easy way to query for those records, or to tell Postgres to just delete anything that violates the constraint.
Identifying records that violate your new key:
SELECT *
FROM
(
SELECT id, user_id, doc_id
, COUNT(*) OVER (PARTITION BY user_id, doc_id) AS unique_check
FROM mytable
) AS dupes -- Postgres requires an alias for a subquery in FROM
WHERE unique_check > 1;
Then you can figure out from those duplicates which ones should be deleted, and perform the delete.
To my knowledge there is no other way to do this, since any automated "delete any duplicates" command would leave it to the database engine to decide which of the two-or-more duplicate records to get rid of.
If the entire record is a duplicate (all columns match) then you could just create a new table with your new unique constraint and do an INSERT INTO newtable SELECT DISTINCT * FROM oldtable, but I'm betting that isn't the case.
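If you do decide to keep exactly one row per (user_id, doc_id) pair, here is a hedged sketch that assumes id is unique and arbitrarily keeps the lowest id per pair:
DELETE FROM mytable m
USING (
    SELECT user_id, doc_id, MIN(id) AS keep_id
    FROM mytable
    GROUP BY user_id, doc_id
    HAVING COUNT(*) > 1
) d
WHERE m.user_id = d.user_id
AND m.doc_id = d.doc_id
AND m.id <> d.keep_id;
After that, the ALTER TABLE ... ADD CONSTRAINT should succeed.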
Let's say I have 2 tables: Students and Groups.
The Group table has 2 columns: id, GroupName
The Student table has 3 columns: id, StudentName and GroupID
GroupID is a foreign key referencing the Group table.
I need to import the Students table from a CSV, but in my CSV instead of the Group id appears the name of the group. How can I import it with pgAdmin without modifying the csv?
Based on Laurenz's answer, use the following scripts:
Create a temp table to insert from CSV file:
CREATE TEMP TABLE std_temp (id int, student_name char(25), group_name char(25));
Then, import the CSV file:
COPY std_temp FROM '/home/username/Documents/std.csv' CSV HEADER;
Now, create std and grp tables for students and groups:
CREATE TABLE grp (id int, name char(25));
CREATE TABLE std (id int, name char(20), grp_id int);
Now it's the grp table's turn to be populated, based on the distinct values of group name. Note how row_number() is used to provide the value for id:
INSERT INTO grp (id, name)
SELECT row_number() OVER (), *
FROM (SELECT DISTINCT group_name FROM std_temp) AS foo;
And the final step: select the data based on the join, then insert it into the std table:
INSERT INTO std (id, name, grp_id)
SELECT std_temp.id, std_temp.student_name, grp.id
FROM std_temp
INNER JOIN grp ON std_temp.group_name = grp.name;
At the end, retrieve the data from the final std table:
SELECT * FROM std;
Your easiest option is to import the file into a temporary table that is defined like the CSV file. Then you can join that table with the "groups" table and use INSERT INTO ... SELECT ... to populate the "students" table.
There is of course also the option to define a view on a join of the two tables and define an INSTEAD OF INSERT trigger on the view that inserts values into the underlying tables as appropriate. Then you could load the data directly to the view.
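Here is a hedged sketch of that view-plus-trigger approach, assuming tables students(id, studentname, groupid) and groups(id, groupname) where groupname is UNIQUE, the id columns are generated, and PostgreSQL 11+ (all names here are illustrative, not from the question):
CREATE VIEW student_import AS
SELECT s.studentname, g.groupname
FROM students s
JOIN groups g ON g.id = s.groupid;

CREATE FUNCTION student_import_ins() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- create the group on first sight; do nothing if it already exists
    INSERT INTO groups (groupname)
    VALUES (NEW.groupname)
    ON CONFLICT (groupname) DO NOTHING;

    -- resolve the group name to its id and insert the student
    INSERT INTO students (studentname, groupid)
    SELECT NEW.studentname, g.id
    FROM groups g
    WHERE g.groupname = NEW.groupname;

    RETURN NEW;
END $$;

CREATE TRIGGER student_import_ins
INSTEAD OF INSERT ON student_import
FOR EACH ROW EXECUTE FUNCTION student_import_ins();
COPY FROM can target such a view directly, e.g. COPY student_import FROM '/path/to/std.csv' CSV HEADER; (the path is just a placeholder).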
The suggestion by @LaurenzAlbe is the obvious approach (IMHO never load a spreadsheet directly into your tables; they are untrustworthy beasts). But I believe your implementation after loading the staging table is flawed.
First, using row_number() virtually ensures you get duplicated ids for the same group name. The ids will always increment from 1 by 1 up to the number of group names, no matter how many groups were previously loaded, and you cannot ensure the identical sequence on subsequent spreadsheets. What happens when you get a group that did not previously exist? Further, there is no validation that the group name does not already exist. Result: duplicate group names and/or multiple ids for the same name.
Second, your attempt to use the id from the spreadsheet as the id for the student (std) table is full of error possibilities. How do you ensure that number is unique across spreadsheets? Even if it is unique within a single spreadsheet, how do you ensure another spreadsheet does not use the same numbers as a previous one? Or, assuming multiple users create the spreadsheets, that one user's numbers do not overlap another user's, even if all users are very conscious of the numbers they use? Result: duplicate id numbers.
A much better approach is to put a unique key on the group table's name column, then insert any group names from the stage table into the group table, trapping any duplicate-name errors (using ON CONFLICT). Then load the student table directly from the stage table, selecting the group id from the group table by the (now unique) group name.
create table csv_load_temp( junk_num integer, student_name text, group_name text);
create table groups( grp_id integer generated always as identity
, name text
, grp_key text generated always as ( lower(name) ) stored
, constraint grp_pk
primary key (grp_id)
, constraint grp_bk
unique (grp_key)
);
create table students (std_id integer generated always as identity
, name text
, grp_id integer
, constraint std_pk
primary key (std_id)
, constraint std2grp_fk
foreign key (grp_id)
references groups(grp_id)
);
-- Function to load Groups and Students
create or replace function establish_students()
returns void
language sql
as $$
insert into groups (name)
select distinct group_name
from csv_load_temp
on conflict (grp_key) do nothing;
insert into students (name, grp_id)
select student_name, grp_id
from csv_load_temp t
join groups grp
on (grp.name = t.group_name);
$$;
The groups table requires Postgres v12. For prior versions, remove the grp_key column and put the unique constraint directly on the name column. What to do about capitalization is up to your business logic.
See the fiddle for a full example. Obviously the 2 inserts in the establish_students function can be run standalone and independently. In that case the function itself is not necessary.
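For completeness, a minimal usage sketch under the definitions above (the CSV path is just a placeholder):
COPY csv_load_temp FROM '/path/to/std.csv' CSV HEADER;
SELECT establish_students();
TRUNCATE csv_load_temp; -- clear the staging table for the next load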
I have one table, USER, and some other tables like USER_DETAILS, USER_QUALIFICATION, etc., whose USER_ID columns reference the USER table. I want to remove those USER_IDs which are not present in any of the other tables.
Deleting all of the users that are not present in a connected table:
DELETE FROM table WHERE user_id NOT IN (SELECT user_id FROM other_table)
If you want to delete only users that are not found in any table, then you can add
AND user_id NOT IN (SELECT user_id FROM another_table)
Alternatively you can create a tmp table and merge in all the user_ids that you want to keep and use that table in the sub-select for the NOT IN.
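For instance, a hedged sketch of that temp-table variant (table names assumed from the question):
CREATE TEMP TABLE keep_ids AS
SELECT user_id FROM user_details
UNION
SELECT user_id FROM user_qualification;

DELETE FROM "USER"
WHERE user_id NOT IN (SELECT user_id FROM keep_ids);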
Use a DELETE with a not exists condition for all related tables:
delete from "USER" u
where not exists (select *
from user_details ud
where ud.user_id = u.user_id)
and not exists (select *
from user_qualification uq
where uq.user_id = u.user_id);
Note that user is a reserved word, and thus needs to be quoted to be usable as a table name. But quoting makes it case-sensitive. So "USER" and "user" are two different table names. As you have not included the DDL for your tables I cannot tell if your table is named "USER" or "user".
In general I would strongly recommend avoiding double quotes for identifiers completely.
I have a table 'users' with the columns:
user_id(PK), user_firstname, user_lastname
and another table 'room' with the columns:
event_id(PK), user_id(FK), user_firstname, user_lastname....(and more columns).
I want to know if it is possible to fill the user_firstname and user_lastname automatically just knowing the user_id column.
Like the default value of user_firstname would be like: "select users.user_firstname where users.user_id = user_id"
I don't know if I was clear enough... As you can see, my knowledge of databases is very limited.
What you want to achieve can be done with JOINs. They will avoid those redundant user_firstname and user_lastname columns. So you'd just fetch from both tables when querying the room table and you get the extra columns of users into the result set:
SELECT * FROM room AS r INNER JOIN users AS u ON r.user_id = u.user_id;
The thing we did here is called normalization. Another important thing to take care of are foreign key constraints and their cascades; in your case room.user_id references users.user_id. A delete on users should most probably cascade to room, if you want to delete users instead of flagging them deleted.
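A hedged DDL sketch of such a cascading constraint (the constraint name is illustrative):
ALTER TABLE room
    ADD CONSTRAINT room_user_fk
    FOREIGN KEY (user_id) REFERENCES users (user_id)
    ON DELETE CASCADE;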
The columns user_firstname and user_lastname do not belong in your room table. The user_id column references the users table, that is all you need.
To select the data, you can use a JOIN statement, something like
SELECT R.event_id, R.user_id, U.user_firstname, U.user_lastname
FROM room AS R
JOIN users AS U ON R.user_id = U.user_id
The answer here is sideways to the question: you do not want user_firstname and user_lastname columns in the room table at all. The user_id is a proxy for that row of the entire users table. When you need to access user_firstname, you do a JOIN of the two tables on the common column.