How can I replicate only one database to another server so that I can run read-only queries against that other server?
I tried setting it up as a cluster, but the problem is that all the databases are copied, and I only need one. I know my question is basic, but a little guidance would help. The database is 10 GB; dumping it, copying it, importing it, and repeating that process every minute is very expensive and takes a long time (3 minutes).
Is there a step-by-step guide for doing what I need?
From what I've read, I could apparently do it with Bucardo, but I can't find clear, up-to-date, and accurate documentation on how to do it in production environments.
Thanks.
Related
We have one Google Cloud SQL instance with 1 vCPU for production. I want to grab a copy of the data by exporting to a bucket. Is this safe to do? That is, might it block other operations on the instance?
I think it's important to take into account the RDBMS you are using. It's mentioned here that PostgreSQL has issues handling big blobs in an export, and in this other SO post the most-voted answer has hints for a smoother export, since exports can make databases unresponsive, which is a fairly well-known problem.
In the case of MySQL, the product documentation has some tips for this case in this article, which states:
"If the server is running, it is necessary to perform appropriate locking so that the server does not change database contents during the backup"
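If the goal is to avoid blocking, you can add --lock-tables=false to your mysqldump export command. Note that this disables table locking, so for InnoDB tables it is usually combined with --single-transaction to still get a consistent snapshot.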
We have a few tables with a pretty large number of entries that sometimes need to be re-imported. Only some tables are concerned, so we don't use restore but a command similar to this:
heroku pg:psql --app ourapp HEROKU_POSTGRESQL_WHITE < data.sql
This takes roughly 30min, mainly due to data upload (about 1GB).
Until now we've put the app in maintenance mode to import the new data, but we'd like to avoid the long downtime in the future.
What would be the best way to achieve this in Heroku?
Our first thought to reduce downtime was to find a way to run the command from a server that will have much better upload speed, but it's still not perfect.
We've thought of using followers but some other tables need to be written to when users are interacting with the app, and we're not sure if the app can be told to fall back on followers even if the master db doesn't have issues.
We've also thought of entirely caching all relevant tables while we're uploading new data, and then clearing that cache, but Heroku doesn't seem to give enough control on the cache to achieve that.
Import into a temporary second table, then, in a single transaction, drop the first table and rename the second one to take its place.
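A minimal sketch of that swap, in Python. To keep it runnable anywhere it uses the built-in sqlite3 module as a stand-in for Heroku Postgres (PostgreSQL also supports DDL inside transactions, but its locking behavior differs). The table name items, the _staging suffix, and the two-column layout are assumptions made up for the illustration.

```python
import sqlite3

def swap_in_new_data(conn, table, rows):
    """Load rows into a staging table, then replace `table` in one transaction."""
    staging = f"{table}_staging"  # hypothetical naming convention
    # Slow part: fill the staging table while readers still use the old table.
    conn.execute(f"DROP TABLE IF EXISTS {staging}")
    conn.execute(f"CREATE TABLE {staging} (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(f"INSERT INTO {staging} (id, name) VALUES (?, ?)", rows)
    # Fast part: drop the old table and rename the new one atomically.
    conn.execute("BEGIN")
    try:
        conn.execute(f"DROP TABLE {table}")
        conn.execute(f"ALTER TABLE {staging} RENAME TO {table}")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise

# isolation_level=None puts the connection in autocommit mode, so the
# BEGIN/COMMIT statements above are the only transaction boundaries.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO items VALUES (1, 'old')")
swap_in_new_data(conn, "items", [(1, "new"), (2, "newer")])
print(conn.execute("SELECT id, name FROM items ORDER BY id").fetchall())
```

The long upload happens outside the swap transaction, so only the drop-and-rename step needs exclusive access to the table. Indexes, constraints, and grants on the old table would have to be recreated on the new one.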
I've upgraded a server from SQL Server 2005 to SQL Server 2008 but the database runs slower when running certain stored procedures especially against records which contain more data than others.
It's been suggested that I run a basic reindex to see if this resolves.
Can someone take a look at the screenshot and advise if this will remove any data from my database - if so then this isn't the right thing to do.
Thanks, James
P.S. I will now try to attach a screenshot; I haven't done that before on this forum.
Those actions won't remove any data from the database, but generally I wouldn't advise trying to shrink the database unless you really need the space as this can cause more fragmentation of indexes. The only options that you have ticked there that have the ability to improve performance are the rebuild/reorganise indexes and the update statistics options.
Rather than maintenance plans, though, I would generally recommend Ola Hallengren's DB maintenance scripts, as they offer more flexibility and are generally a lot better than these plans:
Ola Hallengren - SQL Server Maintenance Solution
There is a web application that has been running for years, and during its lifetime it has gathered a lot of user data. The data is stored in a relational DB (Postgres). Not all of this data is needed to run the application (to do the business). However, from time to time business people ask me to provide reports on this data, and this causes some problems:
sometimes these SQL queries are long-running
queries are executed against the production DB (not cool)
it is not easy to deliver reports on a weekly or monthly basis
some of the data is stored in a way that is not suitable for such querying (queries are inefficient)
My idea for improving this whole process of delivering reports (note that I am a developer, not a data mining specialist) is:
create a separate DB that is regularly updated with production data
optimize how the data is stored
create a dashboard to present reports
Question: But is there a better way? Is there another DB that is a better fit for such data analysis? Or should I look into modern data mining tools?
Thanks!
Do you really do data mining (as in: classification, clustering, anomaly detection), or is "data mining" for you any reporting on the data? In the latter case, all the "modern data mining tools" will disappoint you, because they serve a different purpose.
Have you used the indexing functionality of Postgres well? Your scenario sounds as if selection and aggregation are most of the work, and SQL databases are excellent for this - if well designed.
For example, materialized views and triggers can be used to process data into a schema more usable for your reporting.
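As a rough illustration of the trigger half of that idea, here is a small Python sketch that keeps a pre-aggregated reporting table up to date as rows are inserted. It uses the built-in sqlite3 module as a stand-in so it runs without a server (SQLite has no materialized views, and Postgres trigger syntax differs). The orders and customer_totals tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER);

-- Reporting-friendly table: one pre-aggregated row per customer.
CREATE TABLE customer_totals (
    customer TEXT PRIMARY KEY,
    order_count INTEGER,
    total INTEGER
);

-- Keep the summary in step with every new order.
CREATE TRIGGER orders_after_insert AFTER INSERT ON orders
BEGIN
    INSERT OR IGNORE INTO customer_totals VALUES (NEW.customer, 0, 0);
    UPDATE customer_totals
       SET order_count = order_count + 1,
           total = total + NEW.amount
     WHERE customer = NEW.customer;
END;
""")

conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 10), ("bob", 5), ("alice", 7)],
)
conn.commit()

# The report reads the small summary table instead of scanning orders.
report = conn.execute(
    "SELECT customer, order_count, total FROM customer_totals ORDER BY customer"
).fetchall()
print(report)
```

Only inserts are handled here; updates and deletes on orders would need their own triggers to keep the summary correct.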
There are a thousand ways to approach this issue, but I think the path of least resistance for you would be Postgres replication. Check out this Postgres replication tutorial for a quick proof of concept. (There are many hits when you Google for Postgres replication, and that link is just one of them.) Here is a link documenting streaming replication from the PostgreSQL site's wiki.
I am suggesting this because it meets all of your criteria and also stays within the bounds of the technology you're familiar with. The only learning curve would be the replication part.
Replication solves your issue because it would create a second database which would effectively become your "read-only" DB, updated via the replication process. You would keep the schema the same, and reports/dashboards could be customized. Note that altering the indexes on the replica requires logical replication, since a physical streaming-replication standby is an exact, read-only copy of the primary. This is the database you would query. Your main database would be your transactional database, which serves the users, and the replicated database would serve the stakeholders.
This is a wide topic, so please do your due diligence and research it. But it's also something that can work for you and can be turned around quickly.
If you really want to try data mining with PostgreSQL, there are some tools you can use.
The simple way is KNIME. It is easy to install and has full-featured data mining tools. You can read your data directly from the database, process it, and save it back to the database.
The hardcore way is MADlib. It installs data mining functions written in Python and C directly in Postgres, so you can mine with SQL queries.
Both projects are stable enough to try.
For reporting, we use a non-transactional (read-only) database, and we don't care about normalization. If I were you, I would use another database for reporting. I would design the tables following OLAP principles (star schema, snowflake schema) and use an ETL tool to load the data periodically (maybe weekly) into the read-only database, then build the reports from there.
Reports are used for decision support, so they don't have to be real-time and usually don't have to be current. In other words, it is acceptable for a report to cover data only up to last week or last month.
Hi guys, I have a PostgreSQL 8.3 server with many databases.
Currently, I'm planning to back up those DBs with a script that stores all the backups in a folder with the same name as the DB, for example:
/mypath/backup/my_database1/
/mypath/backup/my_database2/
/mypath/backup/foo_database/
Every day I make one dump every 2 hours, overwriting the previous day's files. For example, in the my_database1 folder I have:
my_database1.backup-00.sql // backup made every day at 00:00
my_database1.backup-02.sql // backup made every day at 02:00
my_database1.backup-04.sql // backup made every day at 04:00
my_database1.backup-06.sql // backup made every day at 06:00
my_database1.backup-08.sql // backup made every day at 08:00
my_database1.backup-10.sql // backup made every day at 10:00
[...and so on...]
This is how I currently make sure I can restore every database while losing at most 2 hours of data.
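For reference, the naming scheme described above can be sketched in a few lines of Python. The /mypath/backup root and the file pattern come from the listing; the function name backup_path and the INTERVAL_HOURS constant are made up for this illustration, and the actual dump command is left out.

```python
from datetime import datetime
from pathlib import PurePosixPath

BACKUP_ROOT = PurePosixPath("/mypath/backup")  # root directory from the listing
INTERVAL_HOURS = 2  # one dump every 2 hours

def backup_path(dbname, when):
    """Return the file that the dump taken at `when` would overwrite."""
    # Round the hour down to the start of its 2-hour slot: 05:30 -> slot 04.
    slot = (when.hour // INTERVAL_HOURS) * INTERVAL_HOURS
    return BACKUP_ROOT / dbname / f"{dbname}.backup-{slot:02d}.sql"

print(backup_path("my_database1", datetime(2024, 1, 1, 5, 30)))
# prints /mypath/backup/my_database1/my_database1.backup-04.sql
```

Because the slot depends only on the hour of day, each file is overwritten by the dump taken 24 hours later, which matches the rotation described above.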
Two hours still seems like too much.
I've looked at PostgreSQL PITR using WAL files, but those files seem to contain the data for all of my databases.
I would need to separate those files, the same way I separate the dump files.
How can I do that?
Otherwise, is there another easy-to-install backup procedure that would allow me to restore a single database to its state 10 seconds earlier, without creating a dump file every 10 seconds?
It is not possible with one instance of PostgreSQL.
You can divide your 500 tables between several instances, each listening on a different port, but that would mean they will not use resources like memory effectively (memory reserved but unused in one instance cannot be used by another).
Slony will also not work here, as it does not replicate DDL statements, like dropping a table.
I'd recommend doing both:
continue to do your pg_dump backups, but try to smooth them out: throttle pg_dump I/O bandwidth so it will not cripple the server, and run it continuously, so that when it finishes with the last database it immediately starts again with the first one;
additionally, set up PITR.
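A minimal sketch of the continuous dump loop, in Python. It assumes a Linux host where nice and ionice are available; these lower the CPU and I/O priority of pg_dump rather than capping its bandwidth, and whether ionice has any effect depends on the I/O scheduler. The output path layout and the .dump extension are assumptions, as are the function names.

```python
import itertools
import subprocess

def dump_command(dbname, outfile):
    """Build a low-priority pg_dump command line for one database."""
    # nice -n 19: lowest CPU priority; ionice -c 3: "idle" I/O class.
    return [
        "nice", "-n", "19",
        "ionice", "-c", "3",
        "pg_dump", "--format=custom", f"--file={outfile}", dbname,
    ]

def dump_forever(databases, run=subprocess.run):
    """Dump each database in turn; after the last one, start over."""
    for dbname in itertools.cycle(databases):
        outfile = f"/mypath/backup/{dbname}/{dbname}.dump"  # hypothetical layout
        run(dump_command(dbname, outfile), check=True)

print(" ".join(dump_command("my_database1", "/mypath/backup/my_database1/my_database1.dump")))
```

Calling dump_forever(["my_database1", "my_database2"]) would loop until interrupted or until a dump fails. The run parameter exists only so the loop can be exercised without a real server.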
This way you can restore a single database fast, but you can lose some data. If you decide that you cannot afford to lose that much data, you can restore your PITR backup to a temporary location (with fsync=off and pg_xlog symlinked to a ramdisk for speed), pg_dump the affected database from there, and restore it to your main database.
Why do you want to separate the databases?
Because of the way PITR works, this is not possible: it operates on the complete cluster.
What you can do in that case is create a separate data directory and cluster for each of those databases (not recommended, though, since it requires different ports and postmaster instances).
I believe that the benefits of using PITR instead of regular dumps outweigh having separate backups for each database, so perhaps you can re-think the reasons for why you need to separate it.
Another way could be to set up some replication with Slony-I but that would require a separate machine (or instance) that receives the data. On the other hand, that way you would have a replicated system in near real-time.
Update for comment:
To recover from mistakes, like deleting a table, PITR would be perfect, since you can replay to a specific time. However, for 500 databases I understand that it can be a lot of overhead. Slony-I would probably not work, since it replicates changes as they happen; I'm not sure how it handles table deletions.
I am not aware of any other ways you can go. What I would probably do is still go for PITR and just not make any mistakes ;). Jokes aside, depending on how frequently mistakes are made, this could be a solution:
Set up PITR.
Have a second instance ready on standby.
When a mistake happens, replay the restore to the point in time on the second instance.
Do a pg_dump of the affected database from that instance.
Do a pg_restore on the production instance for that database.
However, it would require you to have a second instance ready, either on the same server or a different one (different is recommended). Also, the restore time would be a bit longer since it would require you to do one extra dump and restore.
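The last two steps of that procedure (dump from the second instance, restore into production) can be sketched in Python as below. The port numbers, database name, and dump path are placeholders, the function name is made up, and the PITR replay itself is not covered. The code only builds the command lines, so they can be reviewed before anything is run.

```python
def recovery_commands(dbname, standby_port, production_port, dumpfile):
    """Return the pg_dump and pg_restore command lines, in execution order."""
    return [
        # Dump the affected database from the instance replayed to the target time.
        ["pg_dump", f"--port={standby_port}", "--format=custom",
         f"--file={dumpfile}", dbname],
        # Restore into production; --clean drops existing objects before recreating them.
        ["pg_restore", f"--port={production_port}", f"--dbname={dbname}",
         "--clean", dumpfile],
    ]

for cmd in recovery_commands("my_database1", 5433, 5432, "/tmp/my_database1.dump"):
    print(" ".join(cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) once the second instance has finished replaying to the chosen point in time.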
I think the way you are doing this is flawed. You should have one database with multiple schemas and roles. Then you can use PITR. However, PITR is not a replacement for dumps.