I am running various tests that spend a lot of time in the database.
I'd like to keep it all in memory and have it not touch the db, hopefully that would speed things up. Like using sqlite3's in-memory option. I don't need persistence/durability/whatnot, everything is immediately discarded after the test.
Is that possible? I tried tweaking my Postgres memory-related settings (as in the answer linked below), but that doesn't seem to affect the number of writes the database performs, and I couldn't find anything that looks like an 'in-memory' option.
https://dba.stackexchange.com/questions/18484/tuning-postgresql-for-large-amounts-of-ram
I wrote a detailed post on this some time ago:
Optimise PostgreSQL for fast testing
You may find it informative; it covers options for making PostgreSQL run without durability, plus other tweaks that are useful for running tests.
You do not actually need in-memory operation. If PostgreSQL is set not to flush changes to disk, then in practice there will be little difference for databases that fit in RAM, and for databases that don't fit in RAM it won't crash the way a purely in-memory database would.
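Concretely, the durability-related settings I mean look roughly like this in postgresql.conf. This is a sketch for a throwaway test cluster only; never use these settings for data you care about:

    # postgresql.conf -- disposable test cluster only: trades crash safety for speed
    fsync = off                 # don't force WAL writes out to disk
    synchronous_commit = off    # don't wait for the WAL flush at commit time
    full_page_writes = off      # only sensible once fsync is already off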
You should test with the same database engine you're using in production. Testing with SQLite, Derby, H2, etc then deploying live on PostgreSQL doesn't make tons of sense... as any Heroku/Rails user can tell you from experience.
I have a mongodb database with several million users.
I wanted to free up space, so I created a bot to remove users who have been inactive for more than 6 months.
I have been watching the disk usage for several minutes
and I saw that it fluctuated, but it never released any significant space, not even 1 MB. That's weird.
I've read that remove does not actually free space on disk; it simply marks the records as deleted so they can be overwritten. Is that true?
That seemed to make a lot of sense to me. So I looked for something that forces the space to really be freed up...
I ran repairDatabase() and I think I did something wrong.
Everything has been blocked!
I took a chance and restarted the server.
There is a MongoDB service running, but its status stays at "Starting" (not Running).
I'm reading on other sites that repairDatabase() requires free space equal to twice the original size of the database, which it does not have.
I don't know what it is doing, and it could take several hours, or days...
Is the database lost? I think I will stop all services and delete the database.
repairDatabase is similar to fsck. That is, it attempts to clean the database of any corrupt documents that may be preventing MongoDB from starting up. How it works in detail differs depending on your storage engine, but repairDatabase could potentially remove documents from the database.
The details of what the command does is outlined quite clearly (with all the warnings) in the MongoDB documentation page: https://docs.mongodb.com/manual/reference/command/repairDatabase/
I would suggest that next time it's better to read the official documentation first rather than relying on what people say in forums. Second-hand information like this can be outdated, or just plain wrong.
Having said that, you should leave the process running until completion, and perform any troubleshooting if the database cannot be started. It may require 2x the disk space of your data, but it's also possible that the command just needs time to finish.
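For reference, this is roughly how the command is issued from a driver; a minimal sketch with Python's pymongo, where the connection details and database name are made up (and note that recent MongoDB releases steer you toward mongod --repair instead):

    from pymongo import MongoClient

    # Host, port, and database name are assumptions; adjust to your setup.
    client = MongoClient("mongodb://localhost:27017")
    db = client["mydatabase"]

    # repairDatabase blocks the database while it runs and needs a large amount
    # of free disk space (the question above mentions roughly 2x the data size),
    # so only run it when you can afford both.
    db.command("repairDatabase")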
I am using a SQLite backed core data store for an iPhone app I am currently developing.
I was wondering how database maintenance is handled with SQLite on iPhone if at all? Server based RDBMS systems usually require regular defragmentation and statistics updates etc to ensure consistent performance. I'm not sure how this is handled on mobile devices.
How is this handled in iOS and SQLite? Is there anything that needs to be done on the part of the developer, or is it all handled automatically?
My database will contain at most 100,000 records, with approximately 500 inserts or deletes a day.
SQLite has the ANALYZE command to update statistics, but it is unlikely to have much of an effect in a small database like yours. You could check with EXPLAIN QUERY PLAN whether there is any difference. (In any case, it wouldn't hurt.)
To defragment, you can use the VACUUM command. However, on flash-based storage, fragmentation is much less of a problem than the time during which the database is blocked by a complete reorganization.
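Not iOS code, but if you want to see what these commands do, you can try them from any SQLite client; a quick sketch with Python's sqlite3 module, where the file name and the query are placeholders:

    import sqlite3

    # isolation_level=None puts the connection in autocommit mode, which is
    # required because VACUUM cannot run inside a transaction.
    conn = sqlite3.connect("app.db", isolation_level=None)

    conn.execute("ANALYZE")   # refresh the statistics used by the query planner
    conn.execute("VACUUM")    # rebuild the file, reclaiming free pages and fragmentation

    # Compare the plan before and after ANALYZE to see whether it changed anything.
    # ("items" is a placeholder; use one of your own tables and queries.)
    for row in conn.execute("EXPLAIN QUERY PLAN SELECT * FROM items WHERE name = ?", ("x",)):
        print(row)

    conn.close()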
In practice, database maintenance is almost never worth the bother on handheld devices.
I just stumbled upon pgpool-II in my search for clustering my Postgres DB (just getting ready to deploy a web app in a couple months). I still have the shakes from excitement, but I'm nervous, as each time I find something this excellent I am soon let down. Have you any experience with pgpool-II, and will it help me run my database in multiple VMs, and later in multiple physical servers altogether? Is it all I need for backing up, load balancing, and providing a higher availability for my DB server!?
Also, is it easy to use the parallel query function (for instance, in Django or through Python's psycopg2)? This would be most excellent for providing reporting and aggregation!
One last thing: it seems to sit between Postgres and psycopg2. Is that a correct understanding, so that I can use psycopg2 the same as normal, without having to care about pgpool-II?
pgpool-II works fine for what it claims to do. And it fits between your application and the database the way you expect it to; just point psycopg2 toward it instead of directly at the database and off you go.
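To make that concrete, here's a minimal sketch of what it looks like with psycopg2; the only thing that changes is the host/port you connect to. The host, database name, and credentials below are made up, and 9999 is pgpool-II's default listen port:

    import psycopg2

    # Point the connection at pgpool-II rather than directly at PostgreSQL.
    # pgpool then forwards (and, if configured, load-balances) the queries.
    conn = psycopg2.connect(
        host="localhost",
        port=9999,       # pgpool-II's default port; PostgreSQL itself usually listens on 5432
        dbname="myapp",
        user="myuser",
        password="secret",
    )

    cur = conn.cursor()
    cur.execute("SELECT version()")
    print(cur.fetchone())
    cur.close()
    conn.close()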
The main thing you have to note is that while it supports many different types of features--replication, load balancing, parallel query--you can't use them all at once. It sounds like you may be under the impression you can do that, and it doesn't work that way. The documentation is not all that clear on this subject (the English version at least, I can't speak to the original Japanese one).
For example, if you run pgpool-II in its "Master/Slave" mode so that it supports load balancing for scaling reads, you have to use another program to actually do the replication between those nodes. Slony was the supported replication solution to put underneath it on earlier PostgreSQL versions; as of pgpool-II 3.0 and PostgreSQL 9.0, you can also use the soon-to-be-released Streaming Replication/Hot Standby features of that new version.
pgpool-II is a useful component and you can use it in a lot of interesting ways, but I doubt it will be "all you need" for every requirement you hope to achieve with it.
We are currently working on migrating our software from a general-purpose PC server to a kind of embedded system that uses a Disk on Module (DOM) instead of a hard disk drive.
My colleague insists that since the DOM can only support about 1 million write operations, we should run our database entirely on a RAM disk and back the database up to the DOM.
There are 3 ways we could trigger the backup:
User trigger
Every 30 minutes
Every time there is an add/update/delete operation in the database
Since we expect that users will only modify the database when the system is installed, I think PostgreSQL might not write that often.
But I don't know much about PostgreSQL, so I can't judge whether it's worth all this trouble, or which approach is better.
What do you think about it?
The problem of wearing out SSDs can be alleviated by whatever firmware the SSD has. Sometimes those chipsets don't do it well, or leave the responsibility to someone else. In this case, you can use a filesystem designed to do wear levelling by itself. UBIFS or LogFS are suitable filesystems.
Assuming that the claim about the DOM write cycles is true, which I can't comment on, then this won't work very well. PostgreSQL assumes that it can write whatever it wants whenever it wants (even if no logical updates are happening), and you have no real chance of making it go along with the 3 triggers that you mention.
What you could do is have the entire thing run on a RAM disk and have some operating system process back this up atomically to permanent storage. This needs careful file system and kernel support. This could work if your device is on most of the time, but probably not so well if it's the sort of thing that you switch on and off like a TV, because the recovery times could be annoying.
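If you do go down that road, a simpler do-it-yourself variant is a logical dump rather than the atomic filesystem copy described above: a small job that periodically dumps the RAM-disk instance onto the DOM. A rough sketch in Python, with the paths, database name, and interval all made up:

    import subprocess
    import time

    DUMP_TARGET = "/mnt/dom/backup/app.dump"   # assumed mount point of the DOM
    INTERVAL_SECONDS = 30 * 60                 # the "every 30 minutes" option from the question

    while True:
        # pg_dump takes a consistent snapshot without stopping the server,
        # so each pass writes one file's worth of data to the DOM.
        subprocess.run(
            ["pg_dump", "--format=custom", "--file", DUMP_TARGET, "appdb"],
            check=True,
        )
        time.sleep(INTERVAL_SECONDS)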
Alternatives are using either a more embedded-style RDBMS such as SQLite, or a storage system that can handle PostgreSQL, like recent solid state drives, although some SSDs have bogus cache settings that might make them unsuitable for PostgreSQL.
Our shop has developed a few WEB/SMS/DB solutions for a dozen client installations. The applications have some real-time performance requirements and are just good enough to function properly. The problem is that the clients (owners of the production servers) are using the same server/database for customizations that are causing problems with the performance of the applications we created and deployed.
A few examples of clients' customizations:
Adding large tables with many text datatypes for the columns that get cast to other data types in the queries
No primary keys, indexes, or FK constraints
Use of external scripts that use count(*) from table where id = x, in a loop from the script, to determine how to construct more queries later in the same script. (no bulk actions that the planner can optimize or just do everything in a single pass)
All new code files on the server are created/owned by root, with 0777 permissions
The clients don't take suggestions/criticism well. If we just go ahead and try to port/change the scripts ourselves, the old code can come back, clobbering any changes that we make! Or, with our limited knowledge of their use cases, we break functionality while trying to optimize their changes.
My question is this: how can we limit the resources available to queries/applications other than the ones we create and deploy? Are there any pragmatic options in scenarios like this? We prided ourselves on having an OSS solution, but it seems that it's become a liability.
We use PG 8.3 running on a range of Linux distros. The clients prefer PHP, but shell scripts, Perl, Python, and PL/pgSQL are all used on the system in one form or another.
This problem started about two minutes after the first client was given full access to the first computer, and it hasn't gone away since. Anytime someone's priority is getting business-oriented work done quickly, they will be sloppy about it and screw things up for everyone. That's just how things work, because proper design and implementation are harder than cheap hacks. You're not going to solve this problem; all you can do is figure out how to make it easier for the client to work with you than against you. If you do it right, it will look like excellent service rather than nagging.
First off, the database side. There's no way to control query resources in PostgreSQL. The main difficulty is that tools like "nice" control CPU usage, but if the database doesn't fit in RAM it may very well be I/O usage that is killing you. See this developer message summarizing the issues here.
Now, if in fact it's CPU the clients are burning through, you can use two techniques to improve that situation:
Install a C function that changes the process priority (example 1, example 2) and make sure it gets called first whenever they run something (maybe put it into their psql config file; there are other ways).
Write a script that looks for postmaster processes spawned by their userid and renices them; make it run often from cron or as a daemon.
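A rough sketch of that second approach in Python, suitable for running from cron. The role name is an assumption, the script must run as root or as the postgres user for renice to succeed, and note that on 8.3 the pg_stat_activity column is procpid (it was renamed to pid in 9.2):

    import subprocess
    import psycopg2

    CLIENT_ROLE = "client_user"   # assumed name of the role the client connects as

    conn = psycopg2.connect(dbname="postgres", user="postgres")
    cur = conn.cursor()
    # procpid on PostgreSQL 8.3; use the pid column on 9.2 and later.
    cur.execute(
        "SELECT procpid FROM pg_stat_activity WHERE usename = %s",
        (CLIENT_ROLE,),
    )

    for (pid,) in cur.fetchall():
        # Lower the priority of each backend belonging to that role.
        subprocess.call(["renice", "-n", "10", "-p", str(pid)])

    cur.close()
    conn.close()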
It sounds like your problem isn't the particular query processes they're running, but rather other modifications they're making to the larger structure. There's only one way to cope with that: you have to treat the client like they're an intruder and use the approaches of that portion of the computer security field to detect when they screw things up. Seriously! Install an intrusion detection system like Tripwire on the server (there are better tools, that's just the classic example), and have it alert you when they touch anything. New file that's 0777? Should jump right out of a proper IDS report.
On the database side, there's no useful way to directly detect the database being modified. You should do a pg_dump of the schema every day into a file (pg_dumpall -g plus pg_dump -s), then diff that against the last one you delivered, and again have it alert you when something has changed. If you manage this well, the contact with the client turns into "we noticed you changed something on the server... what is it you're trying to accomplish with that?", which makes you look like you're really paying attention to them. That can turn into a sales opportunity, and they may stop fiddling with things as much just knowing you're going to catch it immediately.
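A sketch of that daily schema check in Python; the snapshot directory, database name, and the "alert" step (here just a print) are placeholders, and cron is still the natural way to schedule it:

    import datetime
    import difflib
    import pathlib
    import subprocess

    SNAPSHOT_DIR = pathlib.Path("/var/lib/schema-snapshots")   # assumed location
    SNAPSHOT_DIR.mkdir(parents=True, exist_ok=True)

    today = SNAPSHOT_DIR / f"schema-{datetime.date.today()}.sql"

    # Globals (roles etc.) plus the schema-only dump of the application database.
    globals_sql = subprocess.run(
        ["pg_dumpall", "-g"], check=True, capture_output=True, text=True
    ).stdout
    schema_sql = subprocess.run(
        ["pg_dump", "-s", "appdb"], check=True, capture_output=True, text=True
    ).stdout
    today.write_text(globals_sql + schema_sql)

    # Compare against the newest previous snapshot and alert on any difference.
    previous = sorted(SNAPSHOT_DIR.glob("schema-*.sql"))[:-1]
    if previous:
        diff = list(difflib.unified_diff(
            previous[-1].read_text().splitlines(),
            today.read_text().splitlines(),
            lineterm="",
        ))
        if diff:
            print("\n".join(diff))   # replace with the mail/alerting mechanism of your choice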
The other thing you should start doing immediately is put as much as you can under version control on each client box. You should be able to log in to each system, run the appropriate status/diff tool for the install, and see what's changed. Get that mailed to you regularly too. Again, this works best if combined with something that dumps the schema as a component of what it manages. Not enough people use serious version control approaches for the code that lives in the database.
That's the main set of technical approaches useful here. The rest of what you've got is a classic consulting client management problem that's far more of a people problem than a computer one. Cheer up, it could be worse--FSM help you if you give them ODBC access and they discover they can write their own queries in Access or something simple like that.