Google Cloud SQL very slow from time to time - google-cloud-sql

It's been almost 3 months I have switched my platform to Google Cloud (Compute Engine + Cloud SQL + Cloud Storage).
I am very happy with it but from time to time I noticed big latency on the Cloud SQL server. My VMs from Compute Engine and my Cloud SQL instance are all on the same location (us-1) datacenter.
Since my Java backend makes a lot of SQL queries to generate a server response, the response times may vary from 250-300ms (normal) up to 2s!
In the console, I notice absolutely nothing: no CPU peaks, no read/write peaks, no backup running, nothing. No alert. Last time it happened, it lasted for a few days and then the response times went suddenly better than ever.
I am pretty sure Google works on the infrastructure behind the scenes... But no way to point that out.
So here's my questions:
Has anybody else ever had noticed the same kind of problem?
It is really annoying for me because my web pages get very slow and I have absolutely no control over it. Plus I loose a lot of time because I generally never first suspect a hardware problem / maintenance but instead something that we introduced in our app. Is it normal or do I have a problem on my SQL instance?
Is there anywhere I can have visibility over what's Google doing on the hardware? I know there are maintenance alerts, but for my zone it seems always empty when it happen.
The only option I have for now is to wait and that is really not acceptable.

I suspect that Google does some sort of IO throttling and their algorithm is not very sophisticated. We have a build server which slows down to a crawl if we do more than two builds within an hour. The build that normally takes 15 minutes will run for more than an hour and we usually terminate it and re-run manually later. This question describes a similar problem and the recommended solution is to use larger volumes as they come with more IO allowance.

Related

PDO queries very slow over high latency, high bandwidth connection

Running postgresql 9.x (9.1 - 9.3)
I have a custom web app built using php's PDO library. Every query in our app uses prepared statements (via our internal PDO wrapper library).
Our production system uses AWS EC2 small instances for the web server and RDS for the app server.
For local development, my local machine serves as the web-server, and an office machine running Mac OSX (Mavericks) serves up the DB.
When I'm at work, I can typically ping my local office DB server and get 1-5ms ping times. Everything performs well, page load times are very speedy, my internal timer shows that PHP runs the page from start to end in about 12ms.
Now, the issue comes in when I take my work laptop home-- from home, I get about 50-60ms ping times to the office DB server. If I run my development machine at home, page times now take 5-10 seconds to load-- every time. Granted, there are 4 db queries running per page load, it's very, very little data. I've tried TCP_NOWAIT settings, I've tried loading pgbouncer on my local machine with persistent connections to the remote db-- nothing has helped so far.
When timing the queries, we have a simple query that returns 100 bytes of data that runs in .0006 seconds locally to taking around 1 second to run remotely. Lastly, I'll say it appears to be related to latency only, no matter how much data a query returns, it's like it takes around 1 second longer than it normally would if running locally.. give or take.
I was simply wondering if anyone could help me resolve what this delay might be. It seems that every single query, no matter how much data, imparts a delay of around a second, give or take. The odd thing is that when I run PGAdmin on the my machine connecting to the remote DB, it takes nowhere near that much time to run simple queries.
Does anyone have any idea of other things to try? I'm not runnig an SSL DB connection, or using any compression, I'm willing to try if necessary, however, that's one thing I haven't gotten to work before, and I doubt that'll help with latency anyway.

Severe delays in cloud SQL responses

In the past 4-5 hours there have been 10s of simple read queries that took 40-70 seconds to return result from the cloud SQL DB. Usually they take 50ms or so. Is there some ongoing issue? I can provide DB IDs and specific times if needed.
Thanks.
Between 11.00PST and 11.30PST there was an issue that interrupted many Cloud SQL instances. The problem should now be resolved.
We apologize for the inconvenience and thank you for your patience and continued support. Please rest assured that system reliability is a top priority for the Google Cloud Platform, and we are making continuous improvements to make our systems better.
To be kept informed of other Google Cloud SQL issues and launches, please join google-cloud-sql-announce#googlegroups.com
https://groups.google.com/forum/#!forum/google-cloud-sql-announce
(from Joe Faith in another thread)

How scalable is Parse? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 7 years ago.
Improve this question
I've been considering using Parse.com's service for my backend, but I'm skeptical about its scalability.
Can it really handle several thousand simultaneous users? If not, is their any good way transitioning away from it?
I know the question may be old, but wanted to provide my 2 cents for others out there who may be considering parse....
Under the simplest of scenarios, parse may work well. As soon as you need to scale up to more complex queries, I have personally found nothing but headaches.
Queries are limited to 1000 records. Initially, you may think this is not an issue, until you start dealing with sub queries, and realize weird data is returned because the sub query cuts records off without warning or error. (FYI, the default is 100 records unless you specify a limit up to 1000, so the problem is even worse if you are not paying attention).
For some strange reason there is a limit to the number of times you can issue a count query in a min. (and this limit appears to be really low). Be prepared to try and throttle your code so you don't hit this limit, otherwise errors are thrown.
Background Jobs do not run reliably. I have had a background job set to run every 5 min, and there are times it takes 20+ min before the job will kick in.
Lots of Timeouts. This is the one that gives me the most heartburn.
A. If you have a cloud function that takes a while to process, you have about 6 or 7 seconds to get it done or it will cut you off.
B. I get the feeling that there is a general instability with the system. Periodically, I run into issues which seems to last for about an hour or so where timeouts happen more frequently (and with relatively simple functions that should return immediately).
I fully regret my decision to use parse, and I am doing all I can to keep the app alive long enough for us to get funding, so we can move off the platform. If anyone has any better alternatives to parse, I am all ears.
[Edit: after three amazing years with the team, I've decided to move on and am no longer a Parse or Facebook employee. The team is in great hands and has done amazing things. The entire backend has been rewritten to increase performance and reliability dramatically. The roadmap is amazing, and I expect great things to come from the team. At the time of my departure, Parse powered over 600,000 applications and served a mind boggling number of requests each day. Were each Parse push to be sent to a unique person, they could form the world's fourth largest country in one day. For future help with Parse, please either post questions here with the parse.com tag or post to the parse-developers Google group.]
Full disclosure: I'm a Parse engineer.
Parse already hosts thousands of apps, let alone users. When we exited beta in late march, we announced over 10,000 applications running on Parse with a 40% month-over-month growth rate. Parse is staffed by a world-class team, many with years of experience in big data and high volume traffic.
We welcome your traffic with open arms; you will be in the company of great teams like Band of the Day and Hipmunk. We are so confident in our services that we built our One Click Export system so people like you can try Parse risk free. If you feel Parse does not meet your performance expectations, we will gladly send you off with all of your data intact.
We chose Parse as the backend for our app.
Conclusion: DON'T.
Stability is a disaster, performance is a disaster too, and so is support (probably because they can't really help you because all the issues are non-reproducible).
Running even the simplest of functions can lead to random timeouts inside Parse (I am talking about simple PFUser login calls for instance):
Error: Error Domain=NSURLErrorDomain Code=-1001 "The request timed out." UserInfo=0x17e42480 {NSErrorFailingURLStringKey=https://api.parse.com/2/client_events, NSErrorFailingURLKey=https://api.parse.com/2/client_events, NSLocalizedDescription=The request timed out., NSUnderlyingError=0x17d10e60 "The request timed out."} (Code: 100, Version: 1.2.20)
We encounter timeouts on a daily basis, and this is with an app we are testing with 10 users max!
This is the typical one we get back all the time, at completely arbitrary moments and impossible to reproduce. Calling a Cloud Code function that does a few queries and a few inserts:
{"code":124,"message":"Request timed out"}
Try the same 10 minutes later and it runs in less than a second. Try again 20 minutes later and it takes 30 seconds to execute.
Because there is no transactionality it is really a lot of fun when storing for instance 3 objects in 1 Cloud Code function, where Parse decides to bail out of the function randomly after let's say having saved 2 of the 3 objects. Great to keep your database consistent.
The "best" ones we got where these. Mind you, this is the actual data coming back from a Cloud Code function:
{"code":107,"message":"Received an error with invalid JSON from Parse: <!DOCTYPE html>\n<html>\n<head>\n <title>We're sorry, but something went wrong (500)</title>\n <style type=\"text/css\">\n body { background-color: #fff; color: #666; text-align: center; font-family: arial, sans-serif; }\n div.dialog {\n width: 25em;\n padding: 0 4em;\n margin: 4em auto 0 auto;\n border: 1px solid #ccc;\n border-right-color: #999;\n border-bottom-color: #999;\n }\n h1 { font-size: 100%; color: #f00; line-height: 1.5em; }\n </style>\n</head>\n\n<body>\n <!-- This file lives in public/500.html -->\n <div class=\"dialog\">\n <h1>We're sorry, but something went wrong.</h1>\n <p>We've been notified about this issue and we'll take a look at it shortly.</p>\n </div>\n</body>\n</html>\n"}
The stuff I describe here is not something that happens once in a blue moon in our project. Except for the 500 errors (which I encountered twice in a month) all the others are seen on a daily basis.
So yes, it's very easy to get started with, but you must take into account that you are working on an unstable platform, so make sure you got your retries and exponential backoff systems up and running, because you will need this!
What worries me the most is that I have no idea what would happen once 20.000 people start using my app on this backend.
edit:
Right now I have this when doing a PFUser login:
Error: Error Domain=PF_AFNetworkingErrorDomain Code=-1011 "Expected status code in (200-299), got 502" UserInfo=0x165ec090 {NSLocalizedRecoverySuggestion=<html><body><h1>502 Bad Gateway</h1>
The server returned an invalid or incomplete response.
</body></html>
, PF_AFNetworkingOperationFailingURLResponseErrorKey=<NSHTTPURLResponse: 0x16615c10> { URL: https://api.parse.com/2/get } { status code: 502, headers {
"Cache-Control" = "no-cache";
Connection = "keep-alive";
"Content-Length" = 107;
"Content-Type" = "text/html; charset=utf-8";
Date = "Mon, 08 Sep 2014 13:16:46 GMT";
Server = "nginx/1.6.0";
} }, NSErrorFailingURLKey=https://api.parse.com/2/get, NSLocalizedDescription=Expected status code in (200-299), got 502, PF_AFNetworkingOperationFailingURLRequestErrorKey=<NSMutableURLRequest: 0x166f68b0> { URL: https://api.parse.com/2/get }} (Code: 100, Version: 1.2.20)
Isn't it great?
If you're writing a small/simple app (or a throwaway prototype) with little to no logic on the backend then go for it, but for something larger/scalable it's best to avoid it, I can say that from first hand experience. It all sounds good with their user management, push notifications, abstracted storage and what not but in the end it's not worth the trouble. Namely I was developing the backend for an app on Parse, clients were so much into it because it sounded cool and promising (strong marketing I guess), being bought by Facebook and what not, but a few weeks into production major issues/limitations with the platform started arising, what should be a simple app turned out to be a nightmare to develop and scale.
The result/conclusion of the project:
- broke the time window for a relatively simple app - it should have lasted 2-3 months, it lasted almost a year and still isn't stable/reliable, if we used a custom stack it'd be done inside the time window for sure cause I made a similar demo project in 5-10 days with a custom node stack
- lost the client's trust, they're now remaking the app with another team who'll use a custom stack
- lost loads of cash for breaking the time window and trying to make it work
- did so much overtime cause of it that it started to reflect on my health
- never using some platform/solution that promises to have it all, always going with a custom/tried stack
First were the stability issues and constant failing of the platform like server downtimes and random errors, but they have all that sorted out (that was at the start-mid of 2014), but the following problems remain:
you can't debug your code, at least at the time being (there are ways you could make it work with an additional node server and some obscure lib)
the limits are ridiculous, a scalable platform which can do 50-60 API request per second (or more depending on your subscription), which isn't as low it sounds until you start to do strain testing, and when you hit it your code will constantly fail
API calls are measured like this: calling a server function (Parse job) - 1 call, querying the database - 1 call, another query (cause they don't have some advanced/complex query system in place, if you have a more complex database schema you'll realise very soon what I mean) - 1 call, if you need to get more than 1000 queries guess what - query again, etc., query for count (you need to do it as a separate query) which is unreliable (tends to return an approximation for a few thousand entries)
creating/saving ~1000+ simple objects is a strain on the platform/database, deleting 1000 or more objects, even more so, which is ridiculously fast for normal databases, but on Parse it tends to take 5-10 minutes (if you check it more closely it deletes 20 objects per batch)
no way to use most of the npm packages (only the pure JS ones by including the source directly)
if you go and read Parse forums you'll see users downvoting/roasting the Parse team constantly for the platform's lack of features and needing to jump through hoops for arbitrary logic implementation like fetching random entries and similar stuff
they support Stripe integration, but if you want to use Paypal or some other payment service (we decided to use Paypal cause it has a vastly superior country support over Stripe) you can't make it work on Parse, for Paypal integration I had to use a separate server to pull it off
no easy way to sync users and handle concurrency issues, you have to use hacks and some funny logic you wouldn't use or admit using nowhere never
want 100+, let alone 1000+ simultaneous users, good luck pulling that off
when you want to find out the number of entries in a table, you can hit the limit on calling the count query which it's funny, not documented and totally ridiculous, and in the end returns an approximate number
modularity is foreign to the platform, the functions you call from your jobs can't last more than a couple of seconds (7 seconds I think) and when you take into consideration the query time it's bound to happen a lot with more complex queries and some complex logic
You can have something like Cron jobs but they can't last more than 15 minutes (due to the low performance of the platform like multiple queries that's very, very short), they are limited to 2-3-4 simultaneous jobs depending on your subscription fee, and have a very limited/poor scheduling system in place (e.g. you can't edit it from your code, it's very limited so you have to use hacks to run the same job at 2 exact times during the day or something similar, it can't watch for time savings etc.)
When you get an error on the server it can be totally misleading, check the forums for that, can't remember anything from top of my mind
Push notifications are regularly late as much as 20-30 minutes
An arbitrary example: you want to fetch a random item from their database, your app makes the call to a job that'll provide it (1 API call), the job queries the database, but you have to make 2 calls, first to get the count of the items (1 API call) and then a second one to get a random item (1 API call), this is 3 API calls for that functionality, and with 60 requests per second, 20 users can make that call at a given time before hitting the request limit and the platform going haywire, after you include other users browsing through app screens and stuff, you see where this leads...
If it were any good wouldn't Facebook who bought it every mention using it for even some of their apps? I'd suggest 3 things:
- first - don't listen to the Parse guy, it's his platform so he has to promote it, listen to people who have been using it to make something using it
- second - if you need a serious and scalable platform and don't want to go fully custom, go for Amazon Cloud services or something similar that's tested and reliable
- third - stay away from the platform if you have any server side experience, if you don't then go and hire a backend dev for the project, it will be cheaper and you'll get a working solution in the end
I have spent the day looking into parse.com and here is my current opinion based on what I've found (Please bear in mind that I have only very brief experience of developing with the SDK as yet)..
Parse.com clearly has some very attractive positives which is why I found myself looking into it, but for the sake of debate I will concentrate on being critical as the great positives are all listed on their website. (Well done parse.com for attempting to solve such a great problem!)...
In the testimonials, Hipmunk is the biggest name I would say. It is listed as an app which uses the data portion of the SDK. Without approaching Hipmunk developers, I can't know for sure but I can't imagine them storing ALL their data in the parse.com cloud.
After trying and browsing most of the apps listed. None really stand out as being hugely dependent on a server back-end so I find it impossible to get an idea of whether or not scalability has been solved using parse.com based on these.
The website states 40,000 apps and counting. I feel (but do not know) that based on the app gallery, this figure is based on the amount of apps in their user-base, and not real live production apps in the app-stores. The app gallery would feature far more big names if that many apps were using parse.com.
Parse.com is a very new concept, and very different even to its closest rivals. So without concrete evidence on how scalable and stable (and all the rest) it is, then it is very hard for a developer on a project to consider committing to it as there is too much at stake.
I ran tests for my own answer to similar question and it can be VERY, VERY FAST. However , the results you get may depend on the details of your implementation...
Test compared Android SDK to Android using native HTTP stack making Parse/REST calls...
Test Details:
Test environment - newest Android version on 10 month old phone over fast WIFI connection.
( upload 63 pictures where avg filesize=80K )
test 1 using the android SDK RESULT=Slow performance
test 2 using native REST calls over android RESULT=VERT FAST
--EDIT-- as there is interest here....
Regarding http thruput , the parse SDK(android) and performance, it may be that parse.com has not optimized performance on the way that they implement android asyncTask() in the parse.android SDK? How the work that required 8 min. on parse.sdk could be done in 3 seconds on an optimized REST , DIY framework ( see links for details on implementations), i really do not know. If parse have not fixed their SDK implementation since these comparison tests ran, then you probably dont want their default SDK asnycTask stuff doing anything approaching a real workload on the network.
The great attraction about Parse (and similar SaaS) is that you can save tens of thousands on back-end development costs. Given that the back-end is often the most expensive aspect of a Web app; that head-ache is suddenly poof.
The problem with Parse and most (all) SaaS is that the region, power, memory, bandwidth, scalability, thresholds, alerts and various actions are out of your control.
Same with Shopify. It's a great Saas with comprehensive control over products, orders, inventory, and aesthetics -- but zero control over the machine. So, today's SaaS is not a heck of a lot different than godaddy. They invariably oversell or max-out their machines in order to make money; and you are stuck if you really care about ass-kicking performance. You cannot even buy that level of service.
I would like something AT LEAST as powerful and comprehensive as the AWS console. Most techies know and accept that Heroku and Parse are both hosted on AWS. Who cares. So charge more for the added service, but don't deny access to those critical low-level tools that make a Site and App and the user experience zing. Hint to those Parse employees.
At any rate, in answer to the question:
The Parse API is simple JSON. So you can pump out the data in the same JSON format that a Parse application expects.
You might even be able to utilize their PFObject (iOS). At some point, all that highlevel API goes to a common HTTP request/response. The good thing about REST's generality means common-of-the-shelf; things like http, url, strings, and utf. No funky Orb here.
Parse is great to start with especially helper functions/features about user management. But I started encountering issues ..
Long execution/ping times, 1000 object limit INCLUDING subqueries, no datacenters at europe (as far as I know)
It would've been a divine platform if they could sort performance and stability issues. I somehow regret developing with it, but I put 5000+ lines of code so I'm going to stick with it.
Maybe they should separate their DEV apps and PROD apps environments, and only allow PROD apps after some kind of supervision, or create a different environment with only paying customers?
We are in 2014, $20/month servers can handle unoptimized websites(60 not-cached db queries on homepage) with 1 million visits/month, this shouldn't be that hard come on Parse!
It's ok for prototyping the apps, especially if the iOS/Android developer doesn't know how to build a DB/API backend himself.
It's not ok at all, when it comes to developing an application with a logic that requires queries more complex than:
SELECT * FROM 'db' WHERE 'column' = 'value' LIMIT 100;
Related queries and inner joins do not exist on Parse. And good luck updating/removing 320 000 records if you need (that's the number I'm working with now).
The only thing that is really useful is handling the Users through the SDK. If I could find a good docs or even tutorial how to handle/create users through iOS/Android apps using Django and DRF/Tastypie, I'm instantly converting everything is being developed in our company to use that.

Hosting needs for a turn based iPhone game

So i've been spending some time developing an iPhone app - it's a simple little game and is similar to "Words with friends" in that it:
1) is turn based
2) contacts a web service API to store the "game data" (turns, user info, etc).
In my case, i'm using .NET MVC and a SQL Server backend to develop the API. We're not talking an immense amount of data here - small images will be transferred back and forth and stored in the database though. A typical request would see a few records added or changed in the database.
I mostly don't have much concept of when things would start to get overloaded - my concern, of course, is that this thing takes off (obviously wishful thinking) and then my server gets so overwhelmed that it dies. That being said, I don't want to spend time and money on Windows Azure or something when my hosting needs may be totally trivial.
So, my somewhat general question is this - does anyone have any firsthand knowledge of when things start to get overloaded? Like...just a general estimate of number of requests or something for a time period, assuming each request hits the .NET app which then hits the database a reasonable number of times.
Even some anecdotal "My similar API gets hit 10,000 times a minute and is hosted on crappy shared hosting" would be awesome just so I get some concept.
Thanks in advance!
It is very hard to give a good answer to your question as it greatly depends on what precisely the backend does for each request. Even "trivial" services as you describe can easily differ greatly in performance depending on the actual implementation.
As a rough guideline based on our projects, if your API is a single HTTP request (no HTTPS), hitting a bare-bones controller, being translated into a single, simple SQL statement ("SELECT * FROM foo WHERE bar") returning less than 100 Bytes of data, you can serve about 750 requests per minute on a 32 Bit, 1 Gigahertz box with 512MB ram.
But this number will be reduced to 75 or less if any of those factors go up.
That said:
This is the poster-child case for cloud computing.
If Azure is too much hassle / cost for you (which is not an uncommon complaint from independent developers) you have three main alternatives:
1) Ditch .NET in favor of Python and host within Google App Engine
Python is quick to learn and GAE scales beautifully without you ever needing to care. Best of all, there is a huge free-tier so unless your app really takes off, you won't pay a cent. As you are developing for iOS, I assume you aren't hell bent on .NET to begin with.
2) If you need .NET, go with AWS
They also have a rather large free-tier. Either throw everything on top of a Mono stack (completely free for the 1st year) or shell out the money for a Windows EC2 instance. This takes more planning than GAE but with a little work you can make it scale to wherever your app goes.
If cost is a concern, use the same AWS cluster to host several of your Apps' APIs.
3) Go with OpenFeint's Multiplayer API
OpenFeint supports basic multiplayer games. If you can implement the needed functionality using it, then this might be the best solution. If not, look into (1) and (2).
How long is a piece of string? It all depends with the hosting and connection speeds. .Net is more than capable of handling LARGE amounts of requests. The simplest solution is to monitor the server (or if you cannot, monitor your web services performance) and get better hosting if your app starts to suffer.

PostgreSQL availability and merges

Is there a PostgreSQL HA solution that can handle a splitbrain situation gracefully. To elaborate, the system i'm working on is expected to run in several areas with users close to the servers there and connectivity between the zones is known to be questionable. I'd like for the users to be able to continue using the system in a degraded state (without updates from disconnected zones) and for a sensible merge once they come back online.
If you're prepared to live with a time delay, there should be some log shipping solutions that you could implement with a scheduled job. Basically, you send slices of the transaction log to the backup server. Here's some links with a better description:
http://developer.postgresql.org/pgdocs/postgres/warm-standby.html
http://developer.postgresql.org/~wieck/slony1/adminguide-1.1.rc1/logshipping.html
http://www.network-theory.co.uk/docs/postgresql/vol3/RecordbasedLogShipping.html
Note that a full implementation of Slony-I may be clunky (at least I found it that way a couple years ago, it may have drastically improved).