I need some very simple storage to hold data before it is added to PostgreSQL. Currently, I have a web application that collects data from clients. This data is not very important. The application collects about 50 KB of data (a simple string) from each client, which adds up to roughly 2 GB per hour.
This data is not needed immediately, and it is no great loss if some of it disappears.
Is there an existing solution to store it in memory for a while (~1 hour) and then write it all to PostgreSQL? I don't need to query it in any way.
I could probably use Redis, but Redis feels too complex for this task.
I could write something myself, but the tool would have to handle many requests to store data (maybe about 100 per second), and an existing solution may be better.
Thanks,
Dmitry
If you do not need to work with this data right away, why store it in memory at all? You can create an UNLOGGED table and store the data there.
Look at the documentation for details:
UNLOGGED
If specified, the table is created as an unlogged table. Data written
to unlogged tables is not written to the write-ahead log, which makes
them considerably faster than ordinary tables.
However, they are not crash-safe: an unlogged table is automatically
truncated after a crash or unclean shutdown. The contents of an
unlogged table are also not replicated to standby servers. Any indexes
created on an unlogged table are automatically unlogged as well;
however, unlogged GiST indexes are currently not supported and cannot
be created on an unlogged table.
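A minimal sketch of that approach (the table and column names here are just placeholders):

CREATE UNLOGGED TABLE client_data_staging (
    client_id  integer,
    payload    text,
    created_at timestamptz DEFAULT now()
);

-- Once an hour, move everything into the permanent (logged) table in one
-- atomic statement, so rows written during the move are not lost:
WITH moved AS (
    DELETE FROM client_data_staging
    RETURNING client_id, payload, created_at
)
INSERT INTO client_data (client_id, payload, created_at)
SELECT client_id, payload, created_at FROM moved;

If the server crashes you lose at most the staging data since the last move, which matches your "it's nothing if it will be lost" requirement.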
Storing data in memory sounds like caching to me. So, if you are using Java, I would recommend Guava Cache to you!
It seems to fit all your requirements, such as setting an expiry delay and handling the data once it is evicted from the cache:
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.RemovalListener;
import java.util.concurrent.TimeUnit;

private final Cache<String, Object> yourCache = CacheBuilder.newBuilder()
        .expireAfterWrite(2, TimeUnit.HOURS)
        // giving the listener an explicit type lets the builder's generics resolve
        .removalListener((RemovalListener<String, Object>) notification -> {
            // store notification.getValue() in the DB
        })
        .build();
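One thing to keep in mind: Guava caches clean up expired entries lazily, during other cache operations, so if writes stop for a while the removal listener may not fire until much later. Scheduling a periodic call to yourCache.cleanUp() is a common way to make eviction (and therefore the DB write) happen on time.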
I'm migrating from a proprietary DBMS to PG. In the proprietary DBMS, "offlining" and "onlining" data partitions is a very lightweight operation. I'm looking to implement similar functionality in PG by backing up and restoring individual tables (partitions). Obviously I need to avoid a performance regression. So my question is: what is the fastest way of:
Backing up a table (partition), both data and indexes
Taking the table offline (meaning that the data is now gone from the database)
Restoring the table (partition), both data and indexes
Once I have some advice I can design more targeted performance comparisons. Thanks in advance for any pointers.
What is fast and needs to be fast is adding or removing a partition (ALTER TABLE ... ATTACH/DETACH PARTITION).
After you have detached the partition, you are in no great hurry to back up or export the data. This can be done comfortably with pg_dump.
Similarly, importing the data for a table that is to become a new partition is normally not time critical.
If you need this to happen faster (for example, you want the old partition to be visible in another database as soon as it is detached in the old one), you could use logical replication to replicate the partition to another PostgreSQL database before you detach it. As soon as replication has caught up, you detach or drop the original partition and attach the copy in the other database.
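A rough sketch of that flow (partition names, ranges and the dump command are only illustrative; ATTACH/DETACH PARTITION needs declarative partitioning, i.e. PostgreSQL 10 or later):

-- Take the partition "offline" in the source database:
ALTER TABLE measurements DETACH PARTITION measurements_2020_01;

-- Back up just that table, e.g.:
--   pg_dump -Fc -t measurements_2020_01 sourcedb > measurements_2020_01.dump

-- After restoring it in the target database, attach it there:
ALTER TABLE measurements
    ATTACH PARTITION measurements_2020_01
    FOR VALUES FROM ('2020-01-01') TO ('2020-02-01');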
We are using a trigger to store data in the warehouse. Whenever some process is executed, a trigger fires and stores some information in the data warehouse. As the number of transactions increases, this affects processing time.
What would be the best way to handle this?
I was thinking about a Foreign Data Wrapper or an AWS read replica. Any other way to do this would be appreciated as well. Or maybe I don't have to use a trigger at all?
Here are some quick tips:
1) Reduce latency between the database servers.
2) The target warehouse table should have fewer indexes, to improve DML performance.
3) Logical replication can handle syncing data to the warehouse.
Option 3 is an architectural change, but with it you don't need to write triggers on each table to sync data; a minimal sketch follows below.
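If you go with option 3, it could look roughly like this (names and connection details are placeholders; logical replication needs PostgreSQL 10+, wal_level = logical on the source, and the target tables must already exist with a matching schema):

-- On the OLTP (source) server:
CREATE PUBLICATION warehouse_pub FOR TABLE orders, order_items;

-- On the warehouse (target) server:
CREATE SUBSCRIPTION warehouse_sub
    CONNECTION 'host=oltp-host dbname=appdb user=replicator'
    PUBLICATION warehouse_pub;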
I have to upsert a large number of rows into multiple tables in Postgres 9.6 daily. Some of these tables can be loaded with 6 million rows of 1 KB each; the volume is extremely large and this needs to be done rather quickly.
I have some transformation needs, so I cannot use COPY directly from the source; also, COPY doesn't update existing rows. I could use a foreign data wrapper after the transformation, but that would also require SQL. So I decided to write a Java program using the producer/consumer concurrency pattern, with producers reading from files and consumers writing to Postgres with multi-row upserts.
Since I can recreate the data from the ultimate source, I figure I can skip WAL and use unlogged tables: make them unlogged before the copy and logged again afterwards. I am on v9.6.
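Roughly, what I plan to do looks like this in SQL (table and column names are just for illustration; ALTER TABLE ... SET UNLOGGED / SET LOGGED is available since 9.5):

-- Before the load:
ALTER TABLE target_table SET UNLOGGED;

-- Multi-row upsert issued by each consumer thread
-- (assumes a unique constraint on id):
INSERT INTO target_table (id, payload)
VALUES (1, '...'), (2, '...'), (3, '...')
ON CONFLICT (id) DO UPDATE SET payload = EXCLUDED.payload;

-- After the load (note this rewrites the whole table into WAL):
ALTER TABLE target_table SET LOGGED;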
I did all this and now want to run performance tests. Before I do that, I want to know what a commit and a checkpoint mean for an unlogged table. I suspect the data and index files (I have dropped the index) are going to be written at checkpoint. Here are my questions:
1) What happens at commit, given that commit applies to WAL and there is no data for unlogged tables in WAL? Does commit write my data and index files to disk, or is it an unnecessary operation and the data will only be written at checkpoint?
2) Ultimately, what should I be tuning for unlogged tables?
Thanks.
I need some expert advice on Postgres.
I have a few tables in my database that can grow huge, maybe a hundred million records, and I have to implement some sort of data archiving. Say I have a subscriber table and a subscriber_logs table. The subscriber_logs table will grow huge over time, affecting performance. I want to create a separate table called archive_subscriber_logs and a scheduled task that reads from subscriber_logs, inserts the data into archive_subscriber_logs, and then deletes the moved data from subscriber_logs.
But my concern is: should I create archive_subscriber_logs in the same database or in a different one? The problem with storing it in a different database is the foreign key constraints that already exist on the main tables.
Can anyone suggest whether the same database or a different one is preferable? Or any other solutions?
Consider table partitioning, which is implemented in Postgres using table inheritance. This will improve performance on very large tables. Of course you would do measurements first to make sure it is worth implementing. The details are in the excellent Postgres documentation.
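A bare-bones sketch of inheritance-based partitioning for the log table (the column, table names and monthly ranges are only an example; you would also need a trigger or rule on the parent to route inserts):

-- Parent table stays (mostly) empty; children hold the rows for one month each.
CREATE TABLE subscriber_logs_2016_01 (
    CHECK (logged_at >= DATE '2016-01-01' AND logged_at < DATE '2016-02-01')
) INHERITS (subscriber_logs);

-- With constraint_exclusion = partition (the default), queries that filter
-- on logged_at skip children whose CHECK constraint rules them out.

Archiving a month then becomes detaching or dropping one child table (ALTER TABLE subscriber_logs_2016_01 NO INHERIT subscriber_logs, or DROP TABLE), instead of a massive DELETE.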
Using separate databases is not recommended because you won't be able to have foreign key constraints easily.
I am considering log-shipping of Write Ahead Logs (WAL) in PostgreSQL to create a warm-standby database. However I have one table in the database that receives a huge amount of INSERT/DELETEs each day, but which I don't care about protecting the data in it. To reduce the amount of WALs produced I was wondering, is there a way to prevent any activity on one table from being recorded in the WALs?
Ran across this old question, which now has a better answer. Postgres 9.1 introduced "Unlogged Tables", which are tables that don't log their DML changes to WAL. See the docs for more info, but at least now there is a solution for this problem.
See Waiting for 9.1 - UNLOGGED tables by depesz, and the 9.1 docs.
Unfortunately, I don't believe there is. The WAL logging operates on the page level, which is much lower than the table level and doesn't even know which page holds data from which table. In fact, the WAL files don't even know which pages belong to which database.
You might consider moving your high activity table to a completely different instance of PostgreSQL. This seems drastic, but I can't think of another way off the top of my head to avoid having that activity show up in your WAL files.
To offer one option to my own question: there are temp tables - "temporary tables are automatically dropped at the end of a session, or optionally at the end of the current transaction (see ON COMMIT below)" - which I believe don't generate WAL. Even so, this might not be ideal, as the table creation and design will have to live in the application code.
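For reference, a temp table with the transactional cleanup mentioned above could be declared like this (purely illustrative):

-- Lives only for this session; its contents are not written to WAL.
CREATE TEMPORARY TABLE busy_scratch (
    id      bigint,
    payload text
) ON COMMIT DELETE ROWS;   -- or ON COMMIT DROP to discard it at commit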
I'd consider memcached for use-cases like this. You can even spread the load over a bunch of cheap machines too.