TABLESAMPLE SYSTEM_ROWS() does not give random results - postgresql

I'm trying to use SELECT id FROM table TABLESAMPLE SYSTEM_ROWS(1) to select one random row, but I always get the first row, so the result is not random.
I only see this problem with a newly created table; the same query against an older table gives properly random results. I already tried reindexing the affected table, but it didn't help.
So, how can I get properly random output using TABLESAMPLE SYSTEM_ROWS(1)?

Related

Postgresql - Slow running Query for a simple select statement attached to timescaledb

I have an append-only table with more than 80M records attached to TimescaleDB; records are inserted once per minute. There is also an index on a non-unique column and the start time: (ds_id, start_time).
When I run the simple query
select * from observation where ds_id in (27525, 27567, 28787,27099)
The query takes longer than 1 minute to return.
I also tried running ANALYZE on the table; since it is append-only, there is nothing for VACUUM to clean up.
So I am confused about why such a simple select takes so long. My guess is that the sheer number of records is what makes it slow.
Please help me understand the issue and fix it.
query plan: https://explain.depesz.com/s/M8H7
Thanks in advance.
Note: ds_id (a foreign key) and start_time (the insertion time) are the columns used for fetching results. Also, I am sorry for not providing the table structure and details, as they are confidential. :(

How can I sum/subtract time values from same row

I want to sum and subtract two or more timestamp columns.
I'm using PostgreSQL and I have a structure as you can see:
I can't round the minutes or seconds, so I'm trying to extract the EPOCH from each column and do the arithmetic afterwards. The first EXTRACT recognizes its column, but as soon as I add a second EXTRACT to the same SQL command I get an error saying that the second column does not exist.
I'll give you an example:
SELECT
EXAMPLE.PERSON_ID,
COALESCE(EXTRACT(EPOCH from EXAMPLE.LEFT_AT),0) +
COALESCE(EXTRACT(EPOCH from EXAMPLE.ARRIVED_AT),0) AS CREDIT
FROM
EXAMPLE
WHERE
EXAMPLE.PERSON_ID = 1;
In this example I would get an error like:
Column ARRIVED_AT does not exist
Why is this happening?
Can I sum/subtract time values from the same row?
Is ARRIVED_AT a calculated value instead of a column? What did you run to get the query results image you posted showing those columns?
The following script does what you expect, so there's something about the structure of the table you're querying that isn't what you expect.
CREATE SCHEMA so46801016;
SET search_path=so46801016;
CREATE TABLE trips (
person_id serial primary key,
arrived_at time,
left_at time
);
INSERT INTO trips (arrived_at, left_at) VALUES
('14:30'::time, '19:30'::time)
, ('11:27'::time, '20:00'::time)
;
SELECT
t.person_id,
COALESCE(EXTRACT(EPOCH from t.left_at),0) +
COALESCE(EXTRACT(EPOCH from t.arrived_at),0) AS credit
FROM
trips t;
DROP SCHEMA so46801016 CASCADE;
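For a rough sense of the numbers the script above works with, here is a small Python sketch (not part of the original answer) that mirrors the arithmetic outside the database. It assumes that, as in PostgreSQL, EXTRACT(EPOCH FROM ...) on a time value yields the number of seconds since midnight. The helper name seconds_since_midnight is hypothetical and exists only for this illustration.

```python
from datetime import time


def seconds_since_midnight(t: time) -> int:
    """Hypothetical helper mirroring EXTRACT(EPOCH FROM <time value>)."""
    return t.hour * 3600 + t.minute * 60 + t.second


# Same sample rows as the trips table above: (arrived_at, left_at).
trips = [
    (time(14, 30), time(19, 30)),
    (time(11, 27), time(20, 0)),
]

# Sum of the two columns, like the "credit" column in the query.
credits = [
    seconds_since_midnight(left) + seconds_since_midnight(arrived)
    for arrived, left in trips
]

# Difference of the two columns, i.e. the length of each trip in seconds.
durations = [
    seconds_since_midnight(left) - seconds_since_midnight(arrived)
    for arrived, left in trips
]

print(credits)
print(durations)
```

In SQL the subtraction can also be written directly as left_at - arrived_at, which should give an interval rather than a number of seconds.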

Wrong display of count(*) result with PgAdmin and postgresql 9.3

I'm a little confused by my data: Postgres returns a wrong value for a simple count(*).
I run:
select count(*) from DimUsers
It returns: 74280
This one:
select count(*) from DimUsers group by user_type
returns:
72134, 12288, 89850
And this one:
select * from DimUsers
displays a table of 1674280 rows.
pgAdmin also estimates my full database at 1674280 rows.
I can't see what is wrong. Has this happened to anyone before?
I recently encountered a similar problem. However I found that pgAdmin 3 was in fact returning the correct count value, except the column size for the count column was not auto-sized correctly to fit the data and thus it seemed like it was only returning the last 5 digits of the count. Increasing the size of the count column by using the column resize on the header row allows you to see the full number, but unfortunately you have to do this for every query you run.
As noted by rachekalmir, the issue is pgAdmin 3. You can work around it by giving the count a longer column name; then you will see the full number.
select count(*) as mybiggggcount from DimUsers

SQLite - a smart way to remove and add new objects

I have a table in my database, and I want each row to have a unique id, with the rows numbered sequentially.
For example: I have 10 rows, each with an id, starting from 0 and ending at 9. When I remove a row from the table, let's say row number 5, a "hole" appears. When I add more data afterwards, the "hole" is still there.
It is important for me to know the exact number of rows and to have data at every row number, so that I can access my table arbitrarily.
Is there a way to do this in SQLite? Or do I have to manage the removal and addition of data manually?
Thank you in advance,
Ilya.
It may be worth considering whether you really want to do this. Primary keys usually should not change through the lifetime of the row, and you can always find the total number of rows by running:
SELECT COUNT(*) FROM table_name;
That said, the following trigger should "roll down" every ID number whenever a delete creates a hole:
CREATE TRIGGER sequentialize_ids AFTER DELETE ON table_name FOR EACH ROW
BEGIN
UPDATE table_name SET id=id-1 WHERE id > OLD.id;
END;
I tested this on a sample database and it appears to work as advertised. If you have the following table:
id name
1 First
2 Second
3 Third
4 Fourth
And delete where id=2, afterwards the table will be:
id name
1 First
2 Third
3 Fourth
This trigger can take a long time and has very poor scaling properties (it takes longer for each row you delete and each remaining row in the table). On my computer, deleting 15 rows at the beginning of a 1000 row table took 0.26 seconds, but this will certainly be longer on an iPhone.
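As a quick way to try the trigger without touching a real database, the following sketch runs it against an in-memory SQLite database using Python's standard sqlite3 module. The table and trigger are the ones from the answer above; the name column type (TEXT) and the use of an in-memory database are assumptions made for the illustration.

```python
import sqlite3

# In-memory database, so nothing is written to disk (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_name (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO table_name (id, name) VALUES
        (1, 'First'), (2, 'Second'), (3, 'Third'), (4, 'Fourth');

    -- The trigger from the answer above.
    CREATE TRIGGER sequentialize_ids AFTER DELETE ON table_name FOR EACH ROW
    BEGIN
        UPDATE table_name SET id = id - 1 WHERE id > OLD.id;
    END;
""")

# Deleting id 2 creates a hole, which the trigger then closes.
conn.execute("DELETE FROM table_name WHERE id = 2")
rows = conn.execute("SELECT id, name FROM table_name ORDER BY id").fetchall()
print(rows)
```

The printed rows should match the second table shown above, with 'Third' and 'Fourth' moved down to ids 2 and 3.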
I strongly suggest that you rethink your design. In my opinion you're asking for trouble in the future (e.g. if you create another table and want to have some relations between the tables).
If you want to know the number of rows just use:
SELECT count(*) FROM table_name;
If you want to access rows in the order of id, just define this field with a PRIMARY KEY constraint:
CREATE TABLE test (
id INTEGER PRIMARY KEY,
...
);
and get rows using an ORDER BY clause with ASC or DESC:
SELECT * FROM table_name ORDER BY id ASC;
In SQLite an INTEGER PRIMARY KEY column is an alias for the rowid, which the table itself is keyed on, so this query is fast.
You may also be interested in reading about the LIMIT and OFFSET clauses.
The best source of information is the SQLite documentation.
If you don't want to take Stephen Jennings's very clever but performance-killing approach, just query a little differently. Instead of:
SELECT * FROM mytable WHERE id = ?
Do:
SELECT * FROM mytable ORDER BY id LIMIT 1 OFFSET ?
Note that OFFSET is zero-based, so you may need to subtract 1 from the variable you're indexing in with.
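A minimal sketch of this positional lookup, using Python's standard sqlite3 module and an in-memory database. The table contents and the helper name nth_row are made up for the illustration; only the SELECT ... ORDER BY id LIMIT 1 OFFSET ? query comes from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO mytable (id, name) VALUES (?, ?)",
    [(i, f"row {i}") for i in range(10)],  # ids 0..9, as in the question
)
conn.execute("DELETE FROM mytable WHERE id = 5")  # leaves a hole at id 5


def nth_row(conn, position):
    """Hypothetical helper: row at a zero-based position, ignoring id gaps."""
    return conn.execute(
        "SELECT id, name FROM mytable ORDER BY id LIMIT 1 OFFSET ?",
        (position,),
    ).fetchone()


row_at_5 = nth_row(conn, 5)    # skips the hole and lands on the next id
past_end = nth_row(conn, 9)    # only 9 rows remain, so there is no position 9
total = conn.execute("SELECT COUNT(*) FROM mytable").fetchone()[0]
print(row_at_5, past_end, total)
```

The ids keep their hole, but positions 0 through total - 1 are always contiguous, which is what arbitrary access by position needs.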
If you want to reclaim deleted row ids, the VACUUM command or the auto_vacuum pragma may be what you seek:
http://www.sqlite.org/faq.html#q12
http://www.sqlite.org/lang_vacuum.html
http://www.sqlite.org/pragma.html#pragma_auto_vacuum

PostgreSQL changing returned rows order

I have a table named categories, which contains ID (long), Name (varchar(50)), parentID (long), and shownByDefault (boolean) columns.
This table contains 554 records. All the shownByDefault values are 'false'.
When I execute 'select id, name from categories', pg returns all the categories, ordered by id.
Then I update some of the rows of the table ('update categories set shownByDefault = true where parentId = 1'), and the update succeeds.
Then, when I execute the first query again, which returns all the categories, they come back in a very weird order.
I have no problem adding 'order by', but since I am using JPA to get these values, does anyone know what the problem is or whether there is a way to fix this?
That's not a problem. The order of rows returned by a SQL SELECT is undefined unless it has an ORDER BY. The order you get them in is usually influenced by the order in which they are stored in the table and/or the indices used by the statement.
So relying on that order without using ORDER BY is a very, very bad idea.
If you need them in some order, simply specify it.
It is important to understand that a table is a set of rows and not a sequence of rows.
From the docs:
If the ORDER BY clause is specified, the returned rows are sorted in the specified order. If ORDER BY is not given, the rows are returned in whatever order the system finds fastest to produce.
Without an ORDER BY, the rows typically come back in whatever their physical order on disk is; you can reorder them physically using the CLUSTER SQL command, but due to the way Postgres works they'll become unordered again as soon as you start modifying rows.
For what you're doing an ORDER BY is the right answer.