Move data from one database cluster to another - PostgreSQL

In a short time, I will deploy the first version of my web app online. I am using a PostgreSQL database for this project. I have two choices for the database cluster: PlanetScale PostgreSQL, which is free up to 5 GB of space, or a DigitalOcean cluster, which costs about 12 dollars a month for 10 GB. The price is not that big a deal, but I think that for at least one year the 5 GB plan is more than enough, given that my application will generate almost no revenue.
My question is: how hard can it be to move all my data from PlanetScale to DigitalOcean, or to any other provider, in the future if more space is needed? Can I do that almost painlessly, or would I be putting my data at risk, and is it better to start paying now so I don't have problems later?
If you have any suggestions or recommendations, they are welcome.
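For what it's worth, a plain PostgreSQL migration between providers usually comes down to pg_dump and pg_restore. Below is a minimal sketch wrapped in Python; the connection URLs are placeholders, and the real ones would come from each provider's dashboard.

```python
# Minimal sketch: dump the source database and restore it into the target cluster.
# Both connection URLs are hypothetical placeholders.
import subprocess

SOURCE_URL = "postgresql://user:pass@old-cluster-host:5432/mydb"
TARGET_URL = "postgresql://user:pass@new-cluster-host:5432/mydb"

# 1. Dump the source in custom format (compressed; keeps schema, data, indexes).
subprocess.run(
    ["pg_dump", "--format=custom", "--no-owner", "--file=mydb.dump", SOURCE_URL],
    check=True,
)

# 2. Restore the dump into the new cluster.
subprocess.run(
    ["pg_restore", "--no-owner", "--dbname=" + TARGET_URL, "mydb.dump"],
    check=True,
)
```

For a database of a few GB this is usually a short, mostly painless operation; the main cost is a brief write freeze while you cut over.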

Related

Google Cloud SQL very slow from time to time

It's been almost 3 months since I switched my platform to Google Cloud (Compute Engine + Cloud SQL + Cloud Storage).
I am very happy with it, but from time to time I notice big latency on the Cloud SQL server. My VMs from Compute Engine and my Cloud SQL instance are all in the same location (us-1 datacenter).
Since my Java backend makes a lot of SQL queries to generate a server response, the response times may vary from 250-300ms (normal) up to 2s!
In the console, I notice absolutely nothing: no CPU peaks, no read/write peaks, no backup running, nothing. No alert. Last time it happened, it lasted for a few days and then the response times went suddenly better than ever.
I am pretty sure Google works on the infrastructure behind the scenes... But no way to point that out.
So here are my questions:
Has anybody else ever noticed the same kind of problem?
It is really annoying for me because my web pages get very slow and I have absolutely no control over it. Plus I lose a lot of time, because I generally never first suspect a hardware problem / maintenance but instead something that we introduced in our app. Is it normal or do I have a problem on my SQL instance?
Is there anywhere I can have visibility over what Google is doing on the hardware? I know there are maintenance alerts, but for my zone it always seems empty when this happens.
The only option I have for now is to wait and that is really not acceptable.
I suspect that Google does some sort of IO throttling and their algorithm is not very sophisticated. We have a build server which slows down to a crawl if we do more than two builds within an hour. The build that normally takes 15 minutes will run for more than an hour and we usually terminate it and re-run manually later. This question describes a similar problem and the recommended solution is to use larger volumes as they come with more IO allowance.

Server-side technology for a game

We’re creating a massively-multiplayer social game. We expect up to 1 million concurrent users. The game is not real-time, instead it’s turn-based. We need reliable messaging between our clients and the server, preferably over HTTP protocol.
Besides the multiplayer functionality, we’ll also need a content delivery service.
Could you please recommend a server-side technology for us, so we’ll start searching for the right people to hire?
Is it a correct assumption that no single server will hold that amount of load, so it must scale horizontally?
Will Windows Azure do the job?
Thanks in advance.
Hmmm... gaming, concurrency, server?
G-WAN (200 KB, full-ANSI C scripts included).
This is the best candidate, by far. And it lets you grow horizontally with load-balancing as time goes on (you will not have 1 million users the day you ship the game).
I know they are working on applets (client-side), so you might benefit from asking them the question.
"a million concurrent users IS NOT a real number by any means"
There are games that have this concurrency, and more. Most of the popular Facebook games do, while they have their 15 days in the sun. That being said, having to solve that problem is a nice problem to have :-)
It's probably possible to write such a system on Azure, but you'd probably be piloting in uncharted waters, and you'd also have to pay Microsoft for the hosting. Compare to Amazon EC2 for pricing, for example, and perhaps another approach would be better.
Other technologies to consider, depending on what it is you're really trying to do:
- J2EE
- Erlang/OTP
- Python/Twisted
Also, the networking and multiplayer game FAQ on gamedev.net: http://www.gamedev.net/community/forums/showfaq.asp?forum_id=15
Is it [a] correct assumption that no single server will hold that amount of load so it must scale horizontally?
Yes. It depends on how much work the server has to do per person, but I'd say 1 million concurrent users would require more than one server.
Will Windows Azure do the job?
Windows Azure will provide the computers and the storage for a fee. You have to provide the software and make sure the software can scale horizontally.
Is it correct assumption that no single server will hold that amount of load so it must scale horizontally?
No, that is not a valid assumption. There are servers that are HUGE: 1000+ processors (not on a cluster). Also, a million concurrent users IS NOT a real number by any means; that would be way too big a slice of the concurrent Facebook users. And it totally depends on what you do in your game. Turn-based could be chess, and I would not have a problem hosting 1,000,000 concurrent chess boards on a high-end server with, let's say, 256 GB of memory.
Realistically, though, you possibly will scale horizontally. First, it makes no sense to have a million people in one game/world (even EVE Online scales horizontally by solar system); second, it is likely cheaper than buying a super big computer.
Will Windows Azure do the job?
Hahaha. Seriously. Scaling horizontally - yes.
Look at the price, calculate out an instance for a month, compare to a dedicated server, and laugh on the way to the shop. Nice for very varying load, bad for base load.
Compare a mid-range server (8-12 cores, 64 GB RAM) to an Azure instance and it is clear ONE Azure instance is not going to compare.

Are there clusters available to rent?

I am wondering if there are clusters available to rent.
Scenario:
We have a program that will take what we estimate to be a week to run (after optimization) on a given file. Quite possibly longer. Unfortunately, we also need to process approximately 300+ different files, resulting in approximately 300 weeks of compute time (roughly 6 wall-clock years of continuously running the job). For a research job that should be done, at the latest, by December, that's simply unacceptable. While we are exploring other options, I am investigating the option of simply renting a Beowulf cluster. The job is academic and will lead towards the completion of a PhD.
What would be ideal would be a company to which we send the source and the job files, and then receive the result files a week or two later. Voila!
Quick googling doesn't turn up anything terribly promising.
Suggested Solutions?
Cloud computing sounds like what you need. Amazon, Microsoft and Google rent computer resources on a pay-for-what-you-use basis.
Amazon's service is the most mature, and there are several questions already about Amazon's service, e.g. here and here.
Amazon EC2 (Elastic Compute Cloud) sounds like exactly what you're looking for. You can sign up for one or more virtual machines (up to 20 automatically, more if you request permission), starting at $0.10 an hour per VM, plus bandwidth costs (free between EC2 machines and Amazon's other web services). You can choose between several operating systems (various Linux distributions, OpenSolaris, Windows if you pay extra), and you can use pre-existing machine images or create your own. If you're using all open-source software and don't have high bandwidth costs, it sounds like it would cost you around $5000 to run your job (assuming that your 6 years of compute time was for something comparable to their small instances, with a single virtual CPU).
Once you sign up for the service and get their tools set up, it's pretty easy to get new virtual machines launched. I've even spent the $0.10 to launch a machine for a few minutes just to verify an answer I was giving someone here on StackOverflow; I wanted to check something on Solaris, so I just booted up an instance and had a Solaris VM at my disposal within 5 minutes.
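The exact tooling has changed over the years, but the "launch a batch of machines, run the job, shut them down" idea is the same. Here is a rough sketch with the current boto3 SDK; the AMI ID and instance type are placeholders, not recommendations.

```python
# Sketch: launch up to 20 worker instances from a prepared machine image.
# The AMI ID is hypothetical; it stands for an image with your job's software baked in.
import boto3

ec2 = boto3.resource("ec2", region_name="us-east-1")

instances = ec2.create_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder AMI
    InstanceType="t3.small",
    MinCount=1,
    MaxCount=20,
)

for inst in instances:
    inst.wait_until_running()
    inst.reload()  # refresh attributes such as the public IP
    print(inst.id, inst.public_ip_address)
```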
I don't know where you are doing your PhD... Most Asian, European, and North American universities have some clusters. You can:
- meet directly with the people at the lab which is in charge of the cluster, or
- ask your PhD director to arrange that. Maybe he/she has some friends who can handle it.
Also, the classical trick is to use the unused time of the computers in your lab/university... Basically, each computer runs a client application that crunches numbers when the computer is not in use. See http://boinc.berkeley.edu/
This lead may prove helpful:
http://lcic.org/vendors.html
And this is a fantastic resource site on the matter:
http://www.hpcwire.com
The thread has been replete with pointers to Amazon's EC2 - and correctly so. They are the most mature in this area. Recently, they've released their Elastic MapReduce platform, which sounds similar (although not identical) to what you are trying to do. Google is not an option for you, as their compute model doesn't support the generic compute model you need.
For academic/scientific use, there are several public centers offering HPC capability. In Europe, there is DEISA. http://www.deisa.eu/ and DEISA members. There must be similar possibilities in the US, probably through the NSF.
For commercial use, check IBM Deep Computing On Demand offerings.
http://www-03.ibm.com/systems/deepcomputing/cod/
There are several ways to get time on clusters.
Purchase time on Amazon elastic cloud. Depending on how familiar you are with their service, it may take time to get it configured the way you want it.
Approach a university and see if they have a commercial program to rent out time to companies. I know several do. One that I know of specifically is the private sector program at NCSA at UIUC. Depending on the institution, they may also offer porting and optimization services for your code.
Or you could rent CPU time from a private provider.
I'm from Slovenia and, for example, here we have a great private provider called Arctur. The guys were helpful and responsive when I contacted them.
You can find them here: hpc.arctur.net
One option is to rent the virtual resources equivalent of whatever number of PCs you need, and set them up as a cluster, using the Amazon Elastic Compute Cloud.
Setting up a Beowulf cluster of those is entirely possible.
Check out this link which provides resources and software to do exactly that.
Go to : http://www.extremefactory.com/index.php
True HPC cluster, up to 200 TFlops.

What time should I build to production?

My users use the site pretty equally 24/7. Is there a rule of thumb for build timing?
International audience, a single cluster of servers on Eastern time, but it gets hit well into the morning by international clients.
One DB, several web servers; so if there are no DB changes involved, it's simple: deploy whenever.
But when the site has to come down, when would you, as a programmer, be least mad to see SO be down for, say, 15 minutes?
If there's truly no good time from the users' perspective, then I'd suggest doing it when your team has the most time to recover from any build-related disaster.
Here's what I have done and it's worked well for me:
- Get a site traffic analysis tool which will graph hourly user load
- Select the low point in the graph for doing updates (a quick log-parsing sketch follows)
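If you don't have an analytics tool handy, even a quick pass over the web server's access log gives you the hourly shape. A rough sketch; the log path and the common/combined-format timestamp are assumptions about your setup.

```python
# Count requests per hour of day from an access log to find the daily low point.
# The path and timestamp format ("[10/Oct/2023:13:55:36 ...") are assumptions.
import re
from collections import Counter

hour_counts = Counter()
timestamp = re.compile(r"\[\d{2}/\w{3}/\d{4}:(\d{2})")

with open("/var/log/nginx/access.log") as log:  # hypothetical log location
    for line in log:
        match = timestamp.search(line)
        if match:
            hour_counts[int(match.group(1))] += 1

for hour in range(24):
    print(f"{hour:02d}:00  {hour_counts.get(hour, 0)} requests")
```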
If you're small, then yeah, find when your lowest usage period is, and do it then (for us personally, usually around 1AM-3AM PST is the lowest dip...but it never drops to 0 of course). Once you start growing to having a larger userbase, if you want people to take you seriously you'll need to design your application such that you can upgrade without downtime. This is not simple, and it often involves having multiple servers.
I've spent ages trying to get our application to this point; the best I've come up with so far is to run both the old version and the new version at the same time for a couple of hours. Users logged in at the time of the switchover stay on the old version until they log out. Next time they come in they go to the new version. Any users coming on after the switchover get sent straight to the new version. It's still not foolproof, but it's pretty good.
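A minimal sketch of that pinning logic, assuming a dict-like session carrying a start timestamp (not any particular framework's API):

```python
# Pin sessions that were already active at switchover to the old version;
# everything newer goes to the new version. The session shape is an assumption.
import time

SWITCHOVER_AT = time.time()  # the moment the new version goes live

def pick_version(session):
    started = session.get("started_at", SWITCHOVER_AT)
    return "old" if started < SWITCHOVER_AT else "new"

# An already-active session keeps the old version...
print(pick_version({"started_at": SWITCHOVER_AT - 3600}))  # -> old
# ...while a fresh login after the switchover gets the new one.
print(pick_version({"started_at": SWITCHOVER_AT + 60}))    # -> new
```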
What kind of an application is it? Most sites that I use tend to update around 2AM or 3AM.
Use a second site, and hotswap as needed.
The issue with hot-swapping is that the database would still be shared, and breaking changes would bring the stand-in down as well.
I guess you have to ask your clients.
In any case, there's the wee hours of the morning. If you're talking about a locally available website, I do not think users will mind if they get an "under maintenance" notice at 2 am in their time zone.
Depends on your location: 4AM East Coast/1AM West Coast is typically the lightest time.
Pick a few times that you'd like to do it and offer them as choices to the decider-types. Whatever you do, put up a "down for routine maintenance" page while you deploy.
- Check the time of least usage
- Clone/copy/update the latest production code to another directory
- If there are any database migrations to be done, perform any that are required and non-conflicting with the old code base
- At the time of least usage, move the symlink to point to the latest code (see the sketch below)
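A minimal sketch of that final symlink flip, assuming a releases/current directory layout (the paths are hypothetical); building the new link under a temporary name and renaming it over the old one makes the switch effectively atomic.

```python
# Flip the "current" symlink to a freshly prepared release directory.
# The /srv/app layout is an assumption, not something your stack requires.
import os

RELEASES = "/srv/app/releases"   # each release unpacked into its own directory
CURRENT = "/srv/app/current"     # the path the web server actually serves

def activate(release_name):
    target = os.path.join(RELEASES, release_name)
    tmp_link = CURRENT + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(target, tmp_link)   # build the new link beside the old one
    os.replace(tmp_link, CURRENT)  # single rename: old link swapped out atomically

# At the time of least usage:
# activate("2024-01-15_2300")
```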
First use an analysis tool to try and determine your typically "light" traffic times. Depending on the site and your location in the world in comparison to most of your users, it could be 4am, it could be 1pm, who knows. Then, once you have a good timeframe nailed down, make sure to have your deployment process as automated as possible, so that it happens quickly to minimize the downtime of your site.

What technical considerations must a system/network administrator worry about when a site gets onto social bookmarking/sharing sites?

The reason I ask is that Stack Overflow has been Slashdotted, and Redditted.
First, what kinds of effect does this have on the servers that power a website? Second, what can be done by system administrators to ensure that their sites remain up and running as best as possible?
Unfortunately, if you haven't planned for this before it happens, it's probably too late and your users will have a poor experience.
Scalability is your first immediate concern. You may start getting more hits per second than you were getting per month. Your first line of defense is good programming and design. Make sure you're not doing anything stupid like reloading data from a database multiple times per request instead of caching it. Before the spike happens, you need to do some fairly realistic load tests to see where the bottlenecks are.
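As a small illustration of that caching point, memoizing a lookup so the same row isn't fetched repeatedly is often enough; in the sketch below, fetch_user_from_db is a stand-in for your real (expensive) query.

```python
# Cache a lookup in memory so repeated requests for the same id skip the database.
import functools

def fetch_user_from_db(user_id):
    # Stand-in for the real database query.
    return {"id": user_id, "name": f"user-{user_id}"}

@functools.lru_cache(maxsize=1024)
def get_user(user_id):
    return fetch_user_from_db(user_id)

get_user(42)  # performs the "query"
get_user(42)  # served from the cache
```

In a real application you would want some form of expiry (or a per-request cache) so stale data doesn't linger, but the principle is the same.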
For absurdly high traffic, consider the ability to switch some dynamic pages over to static pages.
Having a server architecture that can scale also helps. Shared hosts generally don't scale. A single dedicated machine generally doesn't scale. Using something like Amazon's EC2 to host can help, especially if you plan for a cluster of servers from the beginning (even if your cluster is a single computer).
Your next major concern is security. You're suddenly a much bigger target for the bad guys. Make sure you have a good security plan in place. This is something you should always have, but it becomes more important with high usage.
Firstly, ask if you really want to spend weeks and thousands of dollars on planning for something that might not even happen, and that, if it does happen, lasts about 5 hours.
Easiest solution is to have a good way to switch to a page simply allowing a signup. People will sign up and you can email them when the storm has passed.
More elaborate solutions rely on being able to scale quickly. That's firstly a software issue (can you connect to a db on another server, can you do load balancing). Secondly, your hosting solution needs to support fast expansion. Amazon EC2 comes to mind, or maybe slicehost. With both services you can easily start new instances ("Let's move the database to a different server") and expand your instances ("Let's upgrade the db server to 4GB RAM").
If you keep all data in the db (including sessions), you can easily have multiple front-end servers. For the database I'd usually try a single server with the highest resources available, but only because I haven't worked with db replication and it used to be quite hard to do, at least with mysql. Things might have improved.
The app designer needs to think about scaling up (larger machines with more cores and higher performance) and/or scaling out (distributing workload across multiple systems). The IT guy needs to work out how to best support that. The network is what you look at first, because obviously everything rides on top of it. Starting at the border, that usually means network load balancers and redundant routers being served by multiple providers. You can also look at geographic caching services and apps such as cachefly.
You want to reduce your bottlenecks as much as possible. You also want to design the environment such that it can be scaled out as needed without much work. Do the design work up front and it'll mean less headaches when you do get dugg.
Some ideas (of what I used in the past and current projects):
For boosting performance (if needed) you can put a reverse-proxying, caching Squid in front of your server. Of course, that only works if you don't have session keys and if the pages are somewhat static (meaning they change only once an hour or so) and not personalised.
With the Squid you can boost a bloated and slow CMS like TYPO3, thus having the performance of static websites with the comfort of a CMS.
You can outsource large files to external services like Amazon S3, saving your server's bandwidth.
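A minimal sketch of pushing a large static file to S3 with boto3; the bucket name and file paths are placeholders. Once the file is there, it is no longer served from your own server's bandwidth.

```python
# Upload a large static asset to S3 so your own server stops serving it.
# Bucket name and paths are hypothetical.
import boto3

s3 = boto3.client("s3")
s3.upload_file(
    Filename="/var/www/static/intro-video.mp4",
    Bucket="my-site-static-assets",
    Key="video/intro-video.mp4",
    ExtraArgs={"ContentType": "video/mp4"},
)
```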
And if you are able to spend some (three figures per month) bucks, you can as well use a Content Delivery Network. With that in place you automatically get scaling, high availability and low latencies for your users. Of course, your pages must be cacheable, so session keys and personalised pages are a no-no. If designed carefully and with CDNs in mind, you can at least cache SOME content, like pics and videos and static stuff.
The load goes up, as other answers have mentioned.
You'll also get an influx of new users/blog comments/votes from bored folks who are only really interested in vandalism. This is mostly a problem for blogs which allow completely anonymous commenting, where some dreadful stuff will be entered. The blog platform might have spam filters sufficient to block it, but manual intervention is frequently required to clean up remaining drivel.
Even a little barrier to entry, like requiring a user name or email address even if no verification is done, will dramatically reduce the volume of the vandalism.