ROWID equivalent in postgres 9.2 - postgresql

Is there any way to get rowid of a record in postgres??
In oracle i can use like
SELECT MAX(BILLS.ROWID) FROM BILLS

Yes, there is ctid column which is equivalent for rowid. But is useless for you. Rowid and ctid are physical row/tuple identifiers => can change after rebuild/vacuum.
See: Chapter 5. Data Definition > 5.4. System Columns

The PostgreSQL row_number() window function can be used for most purposes where you would use rowid. Whereas in Oracle the rowid is an intrinsic numbering of the result data rows, in Postgres row_number() computes a numbering within a logical ordering of the returned data. Normally if you want to number the rows, it means you expect them in a particular order, so you would specify which column(s) to order the rows when numbering them:
select client_name, row_number() over (order by date) from bills;
If you just want the rows numbered arbitrarily you can leave the over clause empty:
select client_name, row_number() over () from bills;
If you want to calculate an aggregate over the row number you'll have to use a subquery:
select max(rownum) from (
select row_number() over () as rownum from bills
) r;
If all you need is the last item from a table, and you have a column to sort sequentially, there's a simpler approach than using row_number(). Just reverse the sort order and select the first item:
select * from bills
order by date desc limit 1;

Use a Sequence. You can choose 4 or 8 byte values.
http://www.neilconway.org/docs/sequences/

Add any unique column to your table(name maybe rowid).
And prevent changing it by creating BEFORE UPDATE trigger, which will raise exception if someone will try to update.
You may populate this column with sequence as #JohnMudd mentioned.

Related

Postgres Remove records by duplicate control_id [duplicate]

I have a table in a PostgreSQL 8.3.8 database, which has no keys/constraints on it, and has multiple rows with exactly the same values.
I would like to remove all duplicates and keep only 1 copy of each row.
There is one column in particular (named "key") which may be used to identify duplicates, i.e. there should only exist one entry for each distinct "key".
How can I do this? (Ideally, with a single SQL command.)
Speed is not a problem in this case (there are only a few rows).
A faster solution is
DELETE FROM dups a USING (
SELECT MIN(ctid) as ctid, key
FROM dups
GROUP BY key HAVING COUNT(*) > 1
) b
WHERE a.key = b.key
AND a.ctid <> b.ctid
DELETE FROM dupes a
WHERE a.ctid <> (SELECT min(b.ctid)
FROM dupes b
WHERE a.key = b.key);
This is fast and concise:
DELETE FROM dupes T1
USING dupes T2
WHERE T1.ctid < T2.ctid -- delete the older versions
AND T1.key = T2.key; -- add more columns if needed
See also my answer at How to delete duplicate rows without unique identifier which includes more information.
EXISTS is simple and among the fastest for most data distributions:
DELETE FROM dupes d
WHERE EXISTS (
SELECT FROM dupes
WHERE key = d.key
AND ctid < d.ctid
);
From each set of duplicate rows (defined by identical key), this keeps the one row with the minimum ctid.
Result is identical to the currently accepted answer by a_horse. Just faster, because EXISTS can stop evaluating as soon as the first offending row is found, while the alternative with min() has to consider all rows per group to compute the minimum. Speed is of no concern to this question, but why not take it?
You may want to add a UNIQUE constraint after cleaning up, to prevent duplicates from creeping back in:
ALTER TABLE dupes ADD CONSTRAINT constraint_name_here UNIQUE (key);
About the system column ctid:
Is the system column “ctid” legitimate for identifying rows to delete?
If there is any other column defined UNIQUE NOT NULL column in the table (like a PRIMARY KEY) then, by all means, use it instead of ctid.
If key can be NULL and you only want one of those, too, use IS NOT DISTINCT FROM instead of =. See:
How do I (or can I) SELECT DISTINCT on multiple columns?
As that's slower, you might instead run the above query as is, and this in addition:
DELETE FROM dupes d
WHERE key IS NULL
AND EXISTS (
SELECT FROM dupes
WHERE key IS NULL
AND ctid < d.ctid
);
And consider:
Create unique constraint with null columns
For small tables, indexes generally do not help performance. And we need not look further.
For big tables and few duplicates, an existing index on (key) can help (a lot).
For mostly duplicates, an index may add more cost than benefit, as it has to be kept up to date concurrently. Finding duplicates without index becomes faster anyway because there are so many and EXISTS only needs to find one. But consider a completely different approach if you can afford it (i.e. concurrent access allows it): Write the few surviving rows to a new table. That also removes table (and index) bloat in the process. See:
How to delete duplicate entries?
I tried this:
DELETE FROM tablename
WHERE id IN (SELECT id
FROM (SELECT id,
ROW_NUMBER() OVER (partition BY column1, column2, column3 ORDER BY id) AS rnum
FROM tablename) t
WHERE t.rnum > 1);
provided by Postgres wiki:
https://wiki.postgresql.org/wiki/Deleting_duplicates
I would use a temporary table:
create table tab_temp as
select distinct f1, f2, f3, fn
from tab;
Then, delete tab and rename tab_temp into tab.
I had to create my own version. Version written by #a_horse_with_no_name is way too slow on my table (21M rows). And #rapimo simply doesn't delete dups.
Here is what I use on PostgreSQL 9.5
DELETE FROM your_table
WHERE ctid IN (
SELECT unnest(array_remove(all_ctids, actid))
FROM (
SELECT
min(b.ctid) AS actid,
array_agg(ctid) AS all_ctids
FROM your_table b
GROUP BY key1, key2, key3, key4
HAVING count(*) > 1) c);
Another approach (works only if you have any unique field like id in your table) to find all unique ids by columns and remove other ids that are not in unique list
DELETE
FROM users
WHERE users.id NOT IN (SELECT DISTINCT ON (username, email) id FROM users);
Postgresql has windows function, you can use rank() to archive your goal, sample:
WITH ranked as (
SELECT
id, column1,
"rank" () OVER (
PARTITION BY column1
order by column1 asc
) AS r
FROM
table1
)
delete from table1 t1
using ranked
where t1.id = ranked.id and ranked.r > 1
Here is another solution, that worked for me.
delete from table_name a using table_name b
where a.id < b.id
and a.column1 = b.column1;
How about:
WITH
u AS (SELECT DISTINCT * FROM your_table),
x AS (DELETE FROM your_table)
INSERT INTO your_table SELECT * FROM u;
I had been concerned about execution order, would the DELETE happen before the SELECT DISTINCT, but it works fine for me.
And has the added bonus of not needing any knowledge about the table structure.
Here is a solution using PARTITION BY and the virtual ctid column, which is works like a primary key, at least within a single session:
DELETE FROM dups
USING (
SELECT
ctid,
(
ctid != min(ctid) OVER (PARTITION BY key_column1, key_column2 [...])
) AS is_duplicate
FROM dups
) dups_find_duplicates
WHERE dups.ctid == dups_find_duplicates.ctid
AND dups_find_duplicates.is_duplicate
A subquery is used to mark all rows as duplicates or not, based on whether they share the same "key columns", but not the same ctid, as the "first" one found in the "partition" of rows sharing the same keys.
In other words, "first" is defined as:
min(ctid) OVER (PARTITION BY key_column1, key_column2 [...])
Then, all rows where is_duplicate is true are deleted by their ctid.
From the documentation, ctid represents (emphasis mine):
The physical location of the row version within its table. Note that although the ctid can be used to locate the row version very quickly, a row's ctid will change if it is updated or moved by VACUUM FULL. Therefore ctid is useless as a long-term row identifier. A primary key should be used to identify logical rows.
well, none of this solution would work if the id is duplicated which is my use case, then the solution is simple:
myTable:
id name
0 value
0 value
0 value
1 value1
1 value1
create dedupMyTable as select distinct * from myTable;
delete from myTable;
insert into myTable select * from dedupMyTable;
select * from myTable;
id name
0 value
1 value1
well you shouldn't have duplicates id into your table unless it doesn't have PK constraints or simply doesn't support it such as Hive/data lake tables
Better pay attention when loading your data to avoid dups over ID's
DELETE FROM tracking_order
WHERE
mvd_id IN (---column you need to remove duplicate
SELECT
mvd_id
FROM (
SELECT
mvd_id,thoi_gian_gui,
ROW_NUMBER() OVER (
PARTITION BY mvd_id
ORDER BY thoi_gian_gui desc) AS row_num
FROM
tracking_order
) s_alias
WHERE row_num > 1)
AND thoi_gian_gui in ( --column you used to compare to delete duplicates, eg last update time
SELECT
thoi_gian_gui
FROM (
SELECT
thoi_gian_gui,
ROW_NUMBER() OVER (
PARTITION BY mvd_id
ORDER BY thoi_gian_gui desc) AS row_num
FROM
tracking_order
) s_alias
WHERE row_num > 1)
My code, I remove all duplicates 7800445 row and keep only 1 copy of each row with 7 min 28 secs.
enter image description here
This worked well for me. I had a table, terms, that contained duplicate values. Ran a query to populate a temp table with all of the duplicate rows. Then I ran the a delete statement with those ids in the temp table. value is the column that contained the duplicates.
CREATE TEMP TABLE dupids AS
select id from (
select value, id, row_number()
over (partition by value order by value)
as rownum from terms
) tmp
where rownum >= 2;
delete from [table] where id in (select id from dupids)

postgres random using setseed

I would like to add a column with a random number using setseed to a table.
The original table structure (test_input) col_a,col_b,col_c
Desired output (test_output) col_a, col_b, col_c, random_id
The following returns the same random_id on all rows instead of a different value in each row.
select col_a,col_b,col_c,setseed(0.5),(
select random() from generate_series(1,100) limit 1
) as random_id
from test_input
Could you help me modify the query that uses setseed and returns a different random_id in each row?
You have to use setseed differently. Also generate_series() is misued in your example. You need to use something like:
select setseed(0.5);
select col_a,col_b,col_c, random() as random_id from test_input;
If you want to get the same random number assigned to the same row, you will have to sort rows first, quoting documentation:
If the ORDER BY clause is specified, the returned rows are sorted in
the specified order. If ORDER BY is not given, the rows are returned
in whatever order the system finds fastest to produce.
You can use:
select setseed(0.5);
select *, random() as random_id from (
select col_a,col_b,col_c from test_input order by col_a, col_b, col_c) a;
Here I assume that combination of col_a, col_b, col_c is unique. If it's not the case, you will have to first add another column with unique ID to the table and sort by this column in the query above.

how to get rowNum like column in sqlite IPHONE

I have an Sqlite database table like this (with out ascending)
But i need to retrive the table in Ascending order by name, when i set it ascending order the rowId changes as follows in jumbled order
But i need to retrieve some limited number of contacts 5 in ascending order every time
like Aaa - Eeee and then Ffff- Jjjjj ......
but to se**t limits like 0-5 5-10 .... ** it can able using rowids since they are in jumble order
So i need another column like (rowNum in oracle) wich is in order 1234567... every time as follows
how to retrive that column with existing columns
Note: WE DONTE HAVE ROWNUM LIKE COLUMN IN SQLITE
The fake rownum solution is clever, but I am afraid it doesn't scale well (for complex query you have to join and count on each row the number of row before current row).
I would consider using create table tmp as select /*your query*/.
because in the case of a create as select operation the rowid created when inserting
the rows is exactly what would be the rownum (a counter). It is specified by the SQLite doc.
Once the initial query has been inserted, you only need to query the tmp table:
select rowid, /* your columns */ from tmp
order by rowid
You can use offset/limit.
Get the first, 2nd, and 3rd groups of five rows:
select rowid, name from contactinfo order by name limit 0, 5
select rowid, name from contactinfo order by name limit 5, 5
select rowid, name from contactinfo order by name limit 10, 5
Warning, using the above syntax requires SQLite to read through all prior records in sorted order. So to get the 10th record for statement number 3 above SQLite needs to read the first 9 records. If you have a large number of records this can be problematic from a performance standpoint.
More info on limit/ offset:
Sqlite Query Optimization (using Limit and Offset)
Sqlite LIMIT / OFFSET query
This is a way of faking a RowNum, hope it helps:
SELECT
(SELECT COUNT(*)
FROM Names AS t2
WHERE t2.name < t1.name
) + (
SELECT COUNT(*)
FROM Names AS t3
WHERE t3.name = t1.name AND t3.id < t1.id
) AS rowNum,
id,
name
FROM Names t1
ORDER BY t1.name ASC
SQL Fiddle example

Resequencing a column with identifier in Postgresql

The following code works and creates a temporary table with a sequence number which is restarted for every new name:
with results as (select row_number() over (partition by name order BY name) as mytid,name from telephn_table)
select * from results order by name
My objective however is to insert the new sequence number permanently into the telephone table.
How do I transfer the new sequence number from the results table to the telephone table? I have come across the following for MySql but was not able to convert it to Postgresql.
MySQL: Add sequence column based on another field
Can anyone help?
If memory serves, row_number() returns the number within its own partition. In other words, row_number() over (partition by name order BY name) would return 1 for each row except duplicates. You likely want rank() over (order by name) instead.
After a long discussion:
update telephn_table
set sid = rows.new_sid
from (select pkey,
row_number() over (partition BY name) as new_sid,
name
from telephn_table
) as rows
where rows.pkey = telephn_table.pkey;
THIS WORKS! (See my OP link to a previous MySql link. In Postgresql it works without need for a temporary table)
alter table telephn_table add column tid integer default 0;
UPDATE telephn_table set tid=(SELECT count(*)+1 from telephn_table t where t.sid < telephn_table.sid and telephn_table.name=t.name)

Equivalent of LIMIT for DB2

How do you do LIMIT in DB2 for iSeries?
I have a table with more than 50,000 records and I want to return records 0 to 10,000, and records 10,000 to 20,000.
I know in SQL you write LIMIT 0,10000 at the end of the query for 0 to 10,000 and LIMIT 10000,10000 at the end of the query for 10000 to 20,000
So, how is this done in DB2? Whats the code and syntax?
(full query example is appreciated)
Using FETCH FIRST [n] ROWS ONLY:
http://publib.boulder.ibm.com/infocenter/dzichelp/v2r2/index.jsp?topic=/com.ibm.db29.doc.perf/db2z_fetchfirstnrows.htm
SELECT LASTNAME, FIRSTNAME, EMPNO, SALARY
FROM EMP
ORDER BY SALARY DESC
FETCH FIRST 20 ROWS ONLY;
To get ranges, you'd have to use ROW_NUMBER() (since v5r4) and use that within the WHERE clause: (stolen from here: http://www.justskins.com/forums/db2-select-how-to-123209.html)
SELECT code, name, address
FROM (
SELECT row_number() OVER ( ORDER BY code ) AS rid, code, name, address
FROM contacts
WHERE name LIKE '%Bob%'
) AS t
WHERE t.rid BETWEEN 20 AND 25;
Developed this method:
You NEED a table that has an unique value that can be ordered.
If you want rows 10,000 to 25,000 and your Table has 40,000 rows, first you need to get the starting point and total rows:
int start = 40000 - 10000;
int total = 25000 - 10000;
And then pass these by code to the query:
SELECT * FROM
(SELECT * FROM schema.mytable
ORDER BY userId DESC fetch first {start} rows only ) AS mini
ORDER BY mini.userId ASC fetch first {total} rows only
Support for OFFSET and LIMIT was recently added to DB2 for i 7.1 and 7.2. You need the following DB PTF group levels to get this support:
SF99702 level 9 for IBM i 7.2
SF99701 level 38 for IBM i 7.1
See here for more information: OFFSET and LIMIT documentation, DB2 for i Enhancement Wiki
Here's the solution I came up with:
select FIELD from TABLE where FIELD > LASTVAL order by FIELD fetch first N rows only;
By initializing LASTVAL to 0 (or '' for a text field), then setting it to the last value in the most recent set of records, this will step through the table in chunks of N records.
#elcool's solution is a smart idea, but you need to know total number of rows (which can even change while you are executing the query!). So I propose a modified version, which unfortunately needs 3 subqueries instead of 2:
select * from (
select * from (
select * from MYLIB.MYTABLE
order by MYID asc
fetch first {last} rows only
) I
order by MYID desc
fetch first {length} rows only
) II
order by MYID asc
where {last} should be replaced with row number of the last record I need and {length} should be replaced with the number of rows I need, calculated as last row - first row + 1.
E.g. if I want rows from 10 to 25 (totally 16 rows), {last} will be 25 and {length} will be 25-10+1=16.
Try this
SELECT * FROM
(
SELECT T.*, ROW_NUMBER() OVER() R FROM TABLE T
)
WHERE R BETWEEN 10000 AND 20000
The LIMIT clause allows you to limit the number of rows returned by the query. The LIMIT clause is an extension of the SELECT statement that has the following syntax:
SELECT select_list
FROM table_name
ORDER BY sort_expression
LIMIT n [OFFSET m];
In this syntax:
n is the number of rows to be returned.
m is the number of rows to skip before returning the n rows.
Another shorter version of LIMIT clause is as follows:
LIMIT m, n;
This syntax means skipping m rows and returning the next n rows from the result set.
A table may store rows in an unspecified order. If you don’t use the ORDER BY clause with the LIMIT clause, the returned rows are also unspecified. Therefore, it is a good practice to always use the ORDER BY clause with the LIMIT clause.
See Db2 LIMIT for more details.
You should also consider the OPTIMIZE FOR n ROWS clause. More details on all of this in the DB2 LUW documentation in the Guidelines for restricting SELECT statements topic:
The OPTIMIZE FOR clause declares the intent to retrieve only a subset of the result or to give priority to retrieving only the first few rows. The optimizer can then choose access plans that minimize the response time for retrieving the first few rows.
There are 2 solutions to paginate efficiently on a DB2 table :
1 - the technique using the function row_number() and the clause OVER which has been presented on another post ("SELECT row_number() OVER ( ORDER BY ... )"). On some big tables, I noticed sometimes a degradation of performances.
2 - the technique using a scrollable cursor. The implementation depends of the language used. That technique seems more robust on big tables.
I presented the 2 techniques implemented in PHP during a seminar next year. The slide is available on this link :
http://gregphplab.com/serendipity/uploads/slides/DB2_PHP_Best_practices.pdf
Sorry but this document is only in french.
Theres these available options:-
DB2 has several strategies to cope with this problem.
You can use the "scrollable cursor" in feature.
In this case you can open a cursor and, instead of re-issuing a query you can FETCH forward and backward.
This works great if your application can hold state since it doesn't require DB2 to rerun the query every time.
You can use the ROW_NUMBER() OLAP function to number rows and then return the subset you want.
This is ANSI SQL
You can use the ROWNUM pseudo columns which does the same as ROW_NUMBER() but is suitable if you have Oracle skills.
You can use LIMIT and OFFSET if you are more leaning to a mySQL or PostgreSQL dialect.