Our web-based app with 100,000 concurrent users has a use case where we auto-save the user's activity every 5 seconds. Consider a table like this:
create table essays
(
    id          uuid not null constraint essays_pkey primary key,
    userId      text not null,
    essayparts  jsonb default '{}'::jsonb,
    create_date timestamp with time zone default now() not null,
    modify_date timestamp with time zone default now() not null
);
create index essays_create_idx on essays ("create_date");
create index essays_modify_idx on essays ("modify_date");
This works well for us, as all the data related to a user's essay, such as the title, brief byline, requestor, full essay body, etc., is stored in the essayparts column as JSON. For auto-saving the essay, we don't insert new rows each time, though; we update each ID (each essay) with all of its components.
So there are plenty of updates per essay, as writing one is a time-consuming and thoughtful activity. Given the auto-save every 5 seconds, if a user were to write for half an hour, we'd have updated her essay around 360 times.
This would normally be fine thanks to PostgreSQL's HOT (heap-only tuples) mechanism; we're on v10, which supports it. The challenge is that we also update the modify_date column every time the essay is saved, and that column is indexed. By the rules of HOT, the update therefore cannot be heap-only, so we get no benefit from HOT and a lot of bloat occurs.
I suppose this is not an unusual pattern in the web or mobile world; many services seem to auto-save content. Are they insert-only? If so, when the user logs out and comes back in, how do they find the current record - by looking at max(modify_date)? Or is there some other mechanism to leverage HOT updates while also updating an indexed column in the table?
Appreciate any pointers, thank you!
Performing an update every 5 seconds with 100,000 concurrent users will produce 20,000 updates per second. That is quite a challenge in itself, and you would need a good system to pull it off, but autovacuum will never be able to keep up if those updates are not HOT.
You have several options:
Choose a relational database management system other than PostgreSQL that updates rows in place.
Do not index modify_date and hope that HOT will do the trick (you can measure whether it does; see the sketch after this list).
Perform these updates way less often than once every 5 seconds (who needs auto-save every 5 seconds anyway?).
Auto-save the data somewhere else than in the database.
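For the second option, here is a minimal sketch of how you could check whether the updates actually end up HOT (the fillfactor value is illustrative, not a recommendation):

-- Drop the index that prevents HOT updates and leave free space per page
-- so new row versions can stay on the same page.
drop index if exists essays_modify_idx;
alter table essays set (fillfactor = 70);

-- After running the workload for a while, compare total vs. HOT updates:
select n_tup_upd, n_tup_hot_upd
from pg_stat_user_tables
where relname = 'essays';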
Related
So we have a task at the moment where I need to move millions of records from one database to another.
To complicate things slightly I need to change an id on each record before inserting the data.
How it works is we have 100 stations in database a.
Each station contains 30+ sensors.
Each sensor contains readings for about the last 10 years.
These readings are anywhere from 15-minute intervals to daily intervals.
So each station can have at least 5m records.
database b has the same structure as database a.
The reading table contains the following fields
id: primary key
sensor_id: int4
value: numeric(12)
time: timestamp
What I have done so far for one station is (see the sketch after these steps):
Connect to database a and select all readings for station 1
Find all corresponding sensors in database b
Change the sensor_id from database a to its new sensor_id from database b
Chunk the updated sensor_id data to groups of about 5000 parameters
Loop over the chunks and do a mass insert
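Expressed in SQL, the remap-and-insert steps above would look roughly like the sketch below. It assumes, hypothetically, that database a's readings for the station have been staged into a table readings_a and that step 2 produced a mapping table sensor_map(old_sensor_id, new_sensor_id); if database b is PostgreSQL, ON CONFLICT skips rows that already exist instead of aborting the whole chunk:

-- Remap sensor ids and insert; duplicate rows are skipped rather than
-- failing the whole batch.
INSERT INTO readings (sensor_id, value, time)
SELECT m.new_sensor_id, r.value, r.time
FROM readings_a r
JOIN sensor_map m ON m.old_sensor_id = r.sensor_id
ON CONFLICT DO NOTHING;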
In theory, this should work.
However, I am getting errors saying "duplicate key value violates unique constraint".
If I query the database for the records that are failing, the data doesn't exist.
The weird thing is that if I run the script 4 or 5 times in a row, all the data eventually gets in there. So I am at a loss as to why I am receiving this error, because it doesn't seem accurate.
Is there a way I can get around this error from happening?
Is there a more efficient way of doing this?
If I have large amounts of data in a table defined like
CREATE TABLE sensor_values (
    ts        TIMESTAMPTZ NOT NULL,
    value     FLOAT8 DEFAULT 'NaN' NOT NULL,
    sensor_id INT4 NOT NULL
);
Data comes in every minute for thousands of points. Quite often, though, I need to extract and work with daily values over years (on a web frontend). To aid this I would like a sensor_values_days table that only has the daily sums for each point, which I can then use for faster queries over longer timespans.
I don't want a trigger for every write to the DB, as I am afraid that would slow down writes, which are already the bottleneck.
Is there a way to trigger only after so many rows have been inserted?
Or perhaps an index that maintains a sum of entries per day? I don't think that is possible.
What would be the best way to do this? It would not have to be very up to date; losing the last few hours or a day would not be an issue.
Thanks
What would be the best way to do this?
Install ClickHouse and use the AggregatingMergeTree table type.
With postgres:
Create a per-period aggregate table. You can have several with different granularities, like hours, days, and months.
Have a cron or scheduled task run at the end of each period, plus a few minutes. First, select the latest timestamp in the per-period table, so you know at which period to start. Then aggregate all rows in the main table for the periods that came after the last available one. This process also works if the per-period table is empty, and if it missed the last update it will simply catch up.
In order to do only inserts and no updates, you have to run it at the end of each period, to make sure it got all the data. You can also store the first and last timestamps of the rows that were aggregated, so that if you check the table later you can see it did use all the data from the period.
After aggregation, the "hour" table should be 60x smaller than the "minute" table, which should help!
Then repeat the same process for the "day" and "month" tables; a sketch of the hourly step follows.
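A minimal sketch of that hourly aggregation step, assuming a hypothetical rollup table sensor_values_hours(ts_hour, sensor_id, value_sum):

-- Aggregate every complete hour that is not yet in the rollup table,
-- starting right after the latest hour already aggregated.
INSERT INTO sensor_values_hours (ts_hour, sensor_id, value_sum)
SELECT date_trunc('hour', ts), sensor_id, sum(value)
FROM sensor_values
WHERE ts >= (SELECT coalesce(max(ts_hour) + interval '1 hour', '-infinity')
             FROM sensor_values_hours)
  AND ts < date_trunc('hour', now())
GROUP BY 1, 2;

Because it only aggregates hours strictly before the current one, re-running it is harmless and it catches up automatically after a missed run.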
If you want up-to-date stats, you can UNION ALL the results of the "per day" table (for example) with the results of the live table, but only pull the current day out of the live table, since all the previous days' worth of data has already been summarized into the "per day" table. Hopefully, the current day's data will be cached in RAM.
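A sketch of such a combined query, again using the hypothetical sensor_values_days(day, sensor_id, value_sum) rollup:

-- Summarized history plus today's live data for one sensor.
SELECT day, value_sum
FROM sensor_values_days
WHERE sensor_id = 42
UNION ALL
SELECT date_trunc('day', ts) AS day, sum(value) AS value_sum
FROM sensor_values
WHERE sensor_id = 42
  AND ts >= date_trunc('day', now())
GROUP BY 1;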
It would not have to be very up to date. Losing the last few hours or a day would not be an issue.
Also if you want to partition your huge table, make sure you do it before its size becomes unmanageable...
Materialized views and a cron job every 5 minutes can help you:
https://wiki.postgresql.org/wiki/Incremental_View_Maintenance
In PG14 we will have INCREMENTAL MATERIALIZED VIEW, but for the moment it is still in development.
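Until then, a plain materialized view refreshed from cron is a workable sketch (the view name sensor_values_daily is hypothetical, built on the sensor_values example above):

-- Daily rollup as a materialized view.
CREATE MATERIALIZED VIEW sensor_values_daily AS
SELECT date_trunc('day', ts) AS day, sensor_id, sum(value) AS value_sum
FROM sensor_values
GROUP BY 1, 2;

-- A unique index is required to allow CONCURRENTLY refreshes.
CREATE UNIQUE INDEX ON sensor_values_daily (day, sensor_id);

-- Run from cron every 5 minutes; readers are not blocked during the refresh.
REFRESH MATERIALIZED VIEW CONCURRENTLY sensor_values_daily;

Note that this recomputes the whole view on every refresh, which is exactly what incremental view maintenance is meant to avoid.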
So I know that TTL is not available for counters for design reasons, and I've read https://issues.apache.org/jira/browse/CASSANDRA-2103 as well as some other SO questions about this, but there seems to be no clear answer (unless I am missing something, which is entirely plausible):
How do we elegantly handle the expiration of counters in Cassandra?
Example use case: page views on a specific day.
For this we might have a table such as
CREATE TABLE pageviews (page varchar, date varchar, views counter, PRIMARY KEY(page, date));
One year from now, the information of how many views we had on one specific day is not very relevant (instead we might have aggregated it into a views/month table or similar), and we don't want unnecessary data hanging around in our DB for no reason. Normally we would put a TTL on this and let Cassandra handle it for us - elegant! But since we aren't allowed to use TTL on counter tables, this is not an option.
You also can't just run delete from pageviews where date > 'xxxx', since both keys must be defined in the where clause.
You would first need to query all the pages and then issue individual deletes, which is not scalable.
Is there any proper way of achieving this ?
It's significantly slower, but that's kind of the price if you don't want to manage the expiration yourself - you can use LWTs and actually insert TTL'd columns instead of updating a counter, i.e.:
CREATE TABLE pageviews (
page varchar,
date timestamp,
views int,
PRIMARY KEY(page, date))
WITH compaction = {'class': 'LeveledCompactionStrategy'};
To update a page view:
UPDATE pageviews USING TTL 604800
SET views = 12
WHERE page = '/home' AND date = 'YYYY-MM-DD'
IF views = 11
If it fails, re-read and try again. This can be very slow under high contention, but in that case you can do some batching per app instance - say, only flush updates every 10 seconds or so and increment by more than 1 at a time.
To see total in range of dates:
SELECT sum(views) FROM pageviews WHERE page='/home' and date >= '2017-01-01 00:00:00+0200' AND date <= '2017-01-13 23:59:00+0200'
The fastest approach would be to use counters and just have a job, run during a less busy time, that deletes data older than X days.
Another idea, if you are OK with some percentage of error: use a single counter per page and use forward decay to "expire" (make insignificant) old view increments; you will still need a job to adjust the landmark periodically, though. This will not be as useful for looking at ranges and will only give you an estimate of the total so far.
If you don't need date range queries, you can use a partition key of (page % X, date) and a clustering key of page.
Then for each date you wish to discard, you can delete partitions 0 through X - 1 with X delete statements.
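A sketch of that layout in CQL (the table name and bucket column are hypothetical, and "page % X" is taken to mean a hash of the page computed modulo X on the application side):

CREATE TABLE pageviews_bucketed (
    bucket int,     -- hash(page) % X, computed client-side
    date varchar,
    page varchar,
    views counter,
    PRIMARY KEY ((bucket, date), page)
);

-- To discard one date, issue X partition-level deletes:
DELETE FROM pageviews_bucketed WHERE bucket = 0 AND date = '2017-01-01';
-- ...repeat for bucket = 1 .. X - 1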
In many examples I've read of SQL timestamp usage, a typical case is that a timestamp column is added to prevent a race condition whereby a user changes data that has lost its integrity because another user 'got in there first'.
More specifically, prior to issuing an update on a row, business logic would cross-check the timestamp it believes it is changing so that there isn't a mix-up with row versioning.
Question
Why wouldn't DATETIME suffice for this task? In fact, by that logic, why wouldn't any unique value be appropriate instead - NEWID() every time an update is issued, for example?
In MySQL, TIMESTAMP is a physically smaller data type to store than DATETIME. In addition, TIMESTAMP is stored in UTC, independent of time zones. For international products, this is important.
IDs are not recommended, as they are often generated at the point of insert/update.
It appears I missed the fundamental feature of timestamp: it auto-updates.
So calling UPDATE on a row will automatically increment its TIMESTAMP column without me manually setting it.
I'll leave this answer here just in case anybody has comments about what else I may have missed.
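For completeness, here is a sketch of the optimistic-concurrency pattern with an auto-updating column in MySQL syntax (since the answer above mentions MySQL; the table and column names are made up for illustration):

CREATE TABLE documents (
    id     INT PRIMARY KEY,
    body   TEXT,
    -- Set on insert and bumped automatically on every update.
    row_ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);

-- The client sends back the row_ts it originally read; if another user has
-- updated the row in the meantime, row_ts no longer matches, 0 rows are
-- affected, and the application treats that as a conflict.
UPDATE documents
SET body = 'new text'
WHERE id = 1
  AND row_ts = '2017-01-13 10:00:00';

One caveat: with second-level precision, two updates within the same second are indistinguishable, which is one reason a dedicated version counter is sometimes preferred.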
I was using hstore, on PostgreSQL 9.3.4, to store a count for each time an event happened in a given day, with an update like the following.
days_count = days_count || hstore('x', (coalesce((days_count -> 'x')::integer, 0) + 1)::text)
Where x is the day of the year. After running a simulation of expected behavior for production, I ended up with a table that was 150MB + 2GB of TOAST + 25-30MB for the index, after ANALYZE and VACUUM.
I am now instead breaking up the above column into one for each month like the following
y_month_days_count = y_month_days_count || hstore('x', (coalesce((y_month_days_count -> 'x')::integer, 0) + 1)::text)
Where x is the day of the month, and y is the month of the year.
I am still running the simulation right now, but so far, a third of the way through, I am at 60MB + a fairly steady 20-30MB of TOAST + 25-30MB for the index. That means in the end I should end up with about 180MB + 30-40MB of TOAST + 25-30MB for the index after ANALYZE and VACUUM.
So first, are there any known issues with hstore and TOAST bloat that would explain my issue with my first setup?
Second, will my current solution of breaking up the columns cause any type of issue with hstore and performance in the future because of the number of hstore columns on one table? It seems to be steady now with row counts in the hundreds of thousands, and while I know more columns can make things slower, I am unsure if this is worse with hstore columns.
Finally, I did find something out. I have one hstore column that ends up representing each hour of a day, so it has 24 different keys. When I run the simulation for just this column I end up with almost no TOAST, just a few KB, but when I run the whole simulation, with the days broken up into month columns, my largest hstore has 52 keys.
So for a simple store of either a counter or a word or two, the maximum number of keys before I see any amount of TOAST for hstore is somewhere between 24 and 52 keys.
So first, are there any known issues with hstore and TOAST bloat that would explain my issue with my first setup?
Yes.
When you update any part of an out-of-line stored TOASTed field like text, hstore, or json, the whole field must be rewritten as a new row version. This is a consequence of MVCC: it's necessary to retain a copy of every version of the row that might still be visible to another transaction.
The old one can be vacuumed away when it's no longer required by any running transaction, so in practice this has minimal impact so long as autovacuum is running aggressively enough.
So if you're updating lots of rows with big text, hstore, or json fields, or updating them frequently, tune autovacuum up so it runs more often and works faster. Make sure you don't have long-running <IDLE> in transaction connections.
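A sketch of per-table autovacuum tuning (the table name and thresholds are illustrative only, not specific recommendations):

-- Vacuum this table after roughly 1% of it changes, and remove the
-- cost-based throttling so autovacuum works as fast as it can.
ALTER TABLE event_counts SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_cost_delay = 0
);

-- Check how autovacuum is keeping up with dead tuples.
SELECT relname, n_dead_tup, last_autovacuum, autovacuum_count
FROM pg_stat_user_tables
WHERE relname = 'event_counts';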
You say the table sizes you quoted were "after analyze and vacuum", but I'm guessing you only ran a regular VACUUM, so the bloated space would have been freed for re-use by PostgreSQL but not released back to the OS. See if VACUUM FULL compacts it.
Will my current solution of breaking up the columns cause any type of issue with hstore and performance in the future because of the number of hstore columns on one table?
Depends on your query patterns and workload, but probably not.