How to reduce build times with Algolia

How to reduce build times with Algolia - algolia

I set up Algolia a few weeks ago have been liking it a lot. But today I started to notice that that it would take a while for updates in my Rails application to be shown in the Algolia index.
Some investigation shows that for whatever reason, the build time shot up yesterday from around 20s to 750s. There doesn't seem to be anything else particularly strange about that time, except for the fact that there were a bunch of 'Get Settings' search operations for some reason. Any ideas on why this happened or how to fix it?
build time graph
Index operation graph

All FREE, Starter, Growth or Pro accounts run on clusters shared with other users. Only Enterprise accounts have their own cluster(s).
In your case, it sounds like the cluster you've been currently on, has been more solicited that the normal at the time you indexed your data. Resulting in the increase of the average build time. As on many other hosted services, it totally normal and expected. Nothing to worry about.

Related

How long do you fine tune false positives with mod_security and OWASP rules?

I just started using owasp rules and got tons of false positives. Example someone in the description field has written:
"we are going to select some users tomorrow for our job platform."
This is detected as sql injection attack (id 950007). Well it is not. It is valid comment. I have tons of this kind false positives.
First I have set up SecRuleEngine DetectionOnly to gather information.
Then I started using "SecRuleUpdateTargetById 950007 !ARGS:desc" or "SecRuleRemoveById 950007" and I already spend a day for this. modsec_audit.log is alreay > 100MB of size.
I am interested from your experience, how long do you fine tune it (roughly). After you turn it on, do you still get false positives and how do you manage to add white lists on time (do you analyze the logs daily) ?
I need this info to tell by boss the estimation for this task. It seems that will be long lasting.

Totally depends on your site, your technology and your test infrastructure. The OWASP CRS is very noisy by default and does require a LOT of tweaking. Incidentally there is some work going on this and next version might have a normal and a paranoid mode, to hopefully reduce false positives.
To give an example I look after a reasonably sized site with a mixture of static pages and a number of apps written in wide variety of technologies (legacy code - urgh!) and a fair amount of visitors.
Luckily I had a nightly regression run in our preproduction environment with good coverage, so that was my first port of call. I released ModSecurity there after some initial testing, in DetectionOnly mode and tweaked it over a month maybe until I'd addressed all of the issues and was comfortable moving to prod. This wasn't a full month of continuous work of course but 30-60 mins on most days to check the previous nights run, tweak the rules appropriately and set it up for next night's run (damn cookies with their random strings!).
Next up I did the same in production, and pretty much immediately ran into issues with free text feedback fields like you have (of course I didn't see most of these in regression runs). That took a lot of tweaking (had to turn off a lot of SQL Injection rules for those fields). I also got a lot of insight how many bots and scripts run against our site! Most were harmless or Wordpress exploit attempts (luckily I don't run Wordpress), so no real risk to my site, but still an eye opener. I monitored the logs hourly initially (paranoid!), then daily, and then weekly.
I would say from memory that it took another 3 months or so until I was fully comfortable turning it on fully and checked it a lot over the next few days. Luckily all hard work paid off and very few false positives.
Since then it's been fairly stable and very few false alerts - mostly dues to bad data (e.g. email##example.com entered as an email address for a field which didn't validate email addresses properly) and I often left those place and fixed the field validation instead.
Some of the common issues and rules I had to tweak are given here: Modsecurity: Excessive false positives (note you may not need or want to turn off all these rules in your site).
We have Splunk installed on our web servers (basically a tool which sucks up log files and can then be searched or automatically alert or report on issues). So set up a few alerts for when the more troublesome, free text fields fields caused a ModSecurity block (have corrected one or two more false positives there), and also on volume (so we get an alert when a threshold passed and could see we were under a sustained attack - happens few times a year) and weekly/monthly reporting.
So a good 4-5 months to implement from scratch end to end with maybe 30-40 man days work over that time. But it was a very complicated site and I had no prior ModSecurity/WAF experience. On plus side learned a lot about web technologies, ModSecurity and got regexpr-blindness from staring at some of the rules! :-)

What are ways to handle too many database connections?

I have a website that caters to daily fantasy sports players and on days with big slates (i.e. a lot of games), the site gets hammered before the lineups get locked. "Hammered" in my case means 50-100 concurrent users, which I'm sure doesn't sound like a lot, but it seems to kill my database host (Mongolab). During "normal" operations, the max number of open connections I see is less than 500, but during these peak times, there can be upwards of 10,000 open connections!
I think what happens is that users open the site, see that it is not responding quickly enough, and then they proceed to open a bunch more sessions thinking one of them will work. This causes the downward spiral.
Last week I added multi-key indexes which helped quite a bit, and I thought had "solved" the issues. But today the site got hammered and even the new indexes haven't proved to be enough. What else can I do?

Google Cloud SQL very slow from time to time

It's been almost 3 months I have switched my platform to Google Cloud (Compute Engine + Cloud SQL + Cloud Storage).
I am very happy with it but from time to time I noticed big latency on the Cloud SQL server. My VMs from Compute Engine and my Cloud SQL instance are all on the same location (us-1) datacenter.
Since my Java backend makes a lot of SQL queries to generate a server response, the response times may vary from 250-300ms (normal) up to 2s!
In the console, I notice absolutely nothing: no CPU peaks, no read/write peaks, no backup running, nothing. No alert. Last time it happened, it lasted for a few days and then the response times went suddenly better than ever.
I am pretty sure Google works on the infrastructure behind the scenes... But no way to point that out.
So here's my questions:
Has anybody else ever had noticed the same kind of problem?
It is really annoying for me because my web pages get very slow and I have absolutely no control over it. Plus I loose a lot of time because I generally never first suspect a hardware problem / maintenance but instead something that we introduced in our app. Is it normal or do I have a problem on my SQL instance?
Is there anywhere I can have visibility over what's Google doing on the hardware? I know there are maintenance alerts, but for my zone it seems always empty when it happen.
The only option I have for now is to wait and that is really not acceptable.

I suspect that Google does some sort of IO throttling and their algorithm is not very sophisticated. We have a build server which slows down to a crawl if we do more than two builds within an hour. The build that normally takes 15 minutes will run for more than an hour and we usually terminate it and re-run manually later. This question describes a similar problem and the recommended solution is to use larger volumes as they come with more IO allowance.

How scalable is Parse? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 7 years ago.
Improve this question
I've been considering using Parse.com's service for my backend, but I'm skeptical about its scalability.
Can it really handle several thousand simultaneous users? If not, is their any good way transitioning away from it?

I know the question may be old, but wanted to provide my 2 cents for others out there who may be considering parse....
Under the simplest of scenarios, parse may work well. As soon as you need to scale up to more complex queries, I have personally found nothing but headaches.
Queries are limited to 1000 records. Initially, you may think this is not an issue, until you start dealing with sub queries, and realize weird data is returned because the sub query cuts records off without warning or error. (FYI, the default is 100 records unless you specify a limit up to 1000, so the problem is even worse if you are not paying attention).
For some strange reason there is a limit to the number of times you can issue a count query in a min. (and this limit appears to be really low). Be prepared to try and throttle your code so you don't hit this limit, otherwise errors are thrown.
Background Jobs do not run reliably. I have had a background job set to run every 5 min, and there are times it takes 20+ min before the job will kick in.
Lots of Timeouts. This is the one that gives me the most heartburn.
A. If you have a cloud function that takes a while to process, you have about 6 or 7 seconds to get it done or it will cut you off.
B. I get the feeling that there is a general instability with the system. Periodically, I run into issues which seems to last for about an hour or so where timeouts happen more frequently (and with relatively simple functions that should return immediately).
I fully regret my decision to use parse, and I am doing all I can to keep the app alive long enough for us to get funding, so we can move off the platform. If anyone has any better alternatives to parse, I am all ears.

[Edit: after three amazing years with the team, I've decided to move on and am no longer a Parse or Facebook employee. The team is in great hands and has done amazing things. The entire backend has been rewritten to increase performance and reliability dramatically. The roadmap is amazing, and I expect great things to come from the team. At the time of my departure, Parse powered over 600,000 applications and served a mind boggling number of requests each day. Were each Parse push to be sent to a unique person, they could form the world's fourth largest country in one day. For future help with Parse, please either post questions here with the parse.com tag or post to the parse-developers Google group.]
Full disclosure: I'm a Parse engineer.
Parse already hosts thousands of apps, let alone users. When we exited beta in late march, we announced over 10,000 applications running on Parse with a 40% month-over-month growth rate. Parse is staffed by a world-class team, many with years of experience in big data and high volume traffic.
We welcome your traffic with open arms; you will be in the company of great teams like Band of the Day and Hipmunk. We are so confident in our services that we built our One Click Export system so people like you can try Parse risk free. If you feel Parse does not meet your performance expectations, we will gladly send you off with all of your data intact.

We chose Parse as the backend for our app.
Conclusion: DON'T.
Stability is a disaster, performance is a disaster too, and so is support (probably because they can't really help you because all the issues are non-reproducible).
Running even the simplest of functions can lead to random timeouts inside Parse (I am talking about simple PFUser login calls for instance):
Error: Error Domain=NSURLErrorDomain Code=-1001 "The request timed out." UserInfo=0x17e42480 {NSErrorFailingURLStringKey=https://api.parse.com/2/client_events, NSErrorFailingURLKey=https://api.parse.com/2/client_events, NSLocalizedDescription=The request timed out., NSUnderlyingError=0x17d10e60 "The request timed out."} (Code: 100, Version: 1.2.20)
We encounter timeouts on a daily basis, and this is with an app we are testing with 10 users max!
This is the typical one we get back all the time, at completely arbitrary moments and impossible to reproduce. Calling a Cloud Code function that does a few queries and a few inserts:
{"code":124,"message":"Request timed out"}
Try the same 10 minutes later and it runs in less than a second. Try again 20 minutes later and it takes 30 seconds to execute.
Because there is no transactionality it is really a lot of fun when storing for instance 3 objects in 1 Cloud Code function, where Parse decides to bail out of the function randomly after let's say having saved 2 of the 3 objects. Great to keep your database consistent.
The "best" ones we got where these. Mind you, this is the actual data coming back from a Cloud Code function:
{"code":107,"message":"Received an error with invalid JSON from Parse: <!DOCTYPE html>\n<html>\n<head>\n <title>We're sorry, but something went wrong (500)</title>\n <style type=\"text/css\">\n body { background-color: #fff; color: #666; text-align: center; font-family: arial, sans-serif; }\n div.dialog {\n width: 25em;\n padding: 0 4em;\n margin: 4em auto 0 auto;\n border: 1px solid #ccc;\n border-right-color: #999;\n border-bottom-color: #999;\n }\n h1 { font-size: 100%; color: #f00; line-height: 1.5em; }\n </style>\n</head>\n\n<body>\n <!-- This file lives in public/500.html -->\n <div class=\"dialog\">\n <h1>We're sorry, but something went wrong.</h1>\n <p>We've been notified about this issue and we'll take a look at it shortly.</p>\n </div>\n</body>\n</html>\n"}
The stuff I describe here is not something that happens once in a blue moon in our project. Except for the 500 errors (which I encountered twice in a month) all the others are seen on a daily basis.
So yes, it's very easy to get started with, but you must take into account that you are working on an unstable platform, so make sure you got your retries and exponential backoff systems up and running, because you will need this!
What worries me the most is that I have no idea what would happen once 20.000 people start using my app on this backend.
edit:
Right now I have this when doing a PFUser login:
Error: Error Domain=PF_AFNetworkingErrorDomain Code=-1011 "Expected status code in (200-299), got 502" UserInfo=0x165ec090 {NSLocalizedRecoverySuggestion=<html><body><h1>502 Bad Gateway</h1>
The server returned an invalid or incomplete response.
</body></html>
, PF_AFNetworkingOperationFailingURLResponseErrorKey=<NSHTTPURLResponse: 0x16615c10> { URL: https://api.parse.com/2/get } { status code: 502, headers {
"Cache-Control" = "no-cache";
Connection = "keep-alive";
"Content-Length" = 107;
"Content-Type" = "text/html; charset=utf-8";
Date = "Mon, 08 Sep 2014 13:16:46 GMT";
Server = "nginx/1.6.0";
} }, NSErrorFailingURLKey=https://api.parse.com/2/get, NSLocalizedDescription=Expected status code in (200-299), got 502, PF_AFNetworkingOperationFailingURLRequestErrorKey=<NSMutableURLRequest: 0x166f68b0> { URL: https://api.parse.com/2/get }} (Code: 100, Version: 1.2.20)
Isn't it great?

If you're writing a small/simple app (or a throwaway prototype) with little to no logic on the backend then go for it, but for something larger/scalable it's best to avoid it, I can say that from first hand experience. It all sounds good with their user management, push notifications, abstracted storage and what not but in the end it's not worth the trouble. Namely I was developing the backend for an app on Parse, clients were so much into it because it sounded cool and promising (strong marketing I guess), being bought by Facebook and what not, but a few weeks into production major issues/limitations with the platform started arising, what should be a simple app turned out to be a nightmare to develop and scale.
The result/conclusion of the project:
- broke the time window for a relatively simple app - it should have lasted 2-3 months, it lasted almost a year and still isn't stable/reliable, if we used a custom stack it'd be done inside the time window for sure cause I made a similar demo project in 5-10 days with a custom node stack
- lost the client's trust, they're now remaking the app with another team who'll use a custom stack
- lost loads of cash for breaking the time window and trying to make it work
- did so much overtime cause of it that it started to reflect on my health
- never using some platform/solution that promises to have it all, always going with a custom/tried stack
First were the stability issues and constant failing of the platform like server downtimes and random errors, but they have all that sorted out (that was at the start-mid of 2014), but the following problems remain:
you can't debug your code, at least at the time being (there are ways you could make it work with an additional node server and some obscure lib)
the limits are ridiculous, a scalable platform which can do 50-60 API request per second (or more depending on your subscription), which isn't as low it sounds until you start to do strain testing, and when you hit it your code will constantly fail
API calls are measured like this: calling a server function (Parse job) - 1 call, querying the database - 1 call, another query (cause they don't have some advanced/complex query system in place, if you have a more complex database schema you'll realise very soon what I mean) - 1 call, if you need to get more than 1000 queries guess what - query again, etc., query for count (you need to do it as a separate query) which is unreliable (tends to return an approximation for a few thousand entries)
creating/saving ~1000+ simple objects is a strain on the platform/database, deleting 1000 or more objects, even more so, which is ridiculously fast for normal databases, but on Parse it tends to take 5-10 minutes (if you check it more closely it deletes 20 objects per batch)
no way to use most of the npm packages (only the pure JS ones by including the source directly)
if you go and read Parse forums you'll see users downvoting/roasting the Parse team constantly for the platform's lack of features and needing to jump through hoops for arbitrary logic implementation like fetching random entries and similar stuff
they support Stripe integration, but if you want to use Paypal or some other payment service (we decided to use Paypal cause it has a vastly superior country support over Stripe) you can't make it work on Parse, for Paypal integration I had to use a separate server to pull it off
no easy way to sync users and handle concurrency issues, you have to use hacks and some funny logic you wouldn't use or admit using nowhere never
want 100+, let alone 1000+ simultaneous users, good luck pulling that off
when you want to find out the number of entries in a table, you can hit the limit on calling the count query which it's funny, not documented and totally ridiculous, and in the end returns an approximate number
modularity is foreign to the platform, the functions you call from your jobs can't last more than a couple of seconds (7 seconds I think) and when you take into consideration the query time it's bound to happen a lot with more complex queries and some complex logic
You can have something like Cron jobs but they can't last more than 15 minutes (due to the low performance of the platform like multiple queries that's very, very short), they are limited to 2-3-4 simultaneous jobs depending on your subscription fee, and have a very limited/poor scheduling system in place (e.g. you can't edit it from your code, it's very limited so you have to use hacks to run the same job at 2 exact times during the day or something similar, it can't watch for time savings etc.)
When you get an error on the server it can be totally misleading, check the forums for that, can't remember anything from top of my mind
Push notifications are regularly late as much as 20-30 minutes
An arbitrary example: you want to fetch a random item from their database, your app makes the call to a job that'll provide it (1 API call), the job queries the database, but you have to make 2 calls, first to get the count of the items (1 API call) and then a second one to get a random item (1 API call), this is 3 API calls for that functionality, and with 60 requests per second, 20 users can make that call at a given time before hitting the request limit and the platform going haywire, after you include other users browsing through app screens and stuff, you see where this leads...
If it were any good wouldn't Facebook who bought it every mention using it for even some of their apps? I'd suggest 3 things:
- first - don't listen to the Parse guy, it's his platform so he has to promote it, listen to people who have been using it to make something using it
- second - if you need a serious and scalable platform and don't want to go fully custom, go for Amazon Cloud services or something similar that's tested and reliable
- third - stay away from the platform if you have any server side experience, if you don't then go and hire a backend dev for the project, it will be cheaper and you'll get a working solution in the end

I have spent the day looking into parse.com and here is my current opinion based on what I've found (Please bear in mind that I have only very brief experience of developing with the SDK as yet)..
Parse.com clearly has some very attractive positives which is why I found myself looking into it, but for the sake of debate I will concentrate on being critical as the great positives are all listed on their website. (Well done parse.com for attempting to solve such a great problem!)...
In the testimonials, Hipmunk is the biggest name I would say. It is listed as an app which uses the data portion of the SDK. Without approaching Hipmunk developers, I can't know for sure but I can't imagine them storing ALL their data in the parse.com cloud.
After trying and browsing most of the apps listed. None really stand out as being hugely dependent on a server back-end so I find it impossible to get an idea of whether or not scalability has been solved using parse.com based on these.
The website states 40,000 apps and counting. I feel (but do not know) that based on the app gallery, this figure is based on the amount of apps in their user-base, and not real live production apps in the app-stores. The app gallery would feature far more big names if that many apps were using parse.com.
Parse.com is a very new concept, and very different even to its closest rivals. So without concrete evidence on how scalable and stable (and all the rest) it is, then it is very hard for a developer on a project to consider committing to it as there is too much at stake.

I ran tests for my own answer to similar question and it can be VERY, VERY FAST. However , the results you get may depend on the details of your implementation...
Test compared Android SDK to Android using native HTTP stack making Parse/REST calls...
Test Details:
Test environment - newest Android version on 10 month old phone over fast WIFI connection.
( upload 63 pictures where avg filesize=80K )
test 1 using the android SDK RESULT=Slow performance
test 2 using native REST calls over android RESULT=VERT FAST
--EDIT-- as there is interest here....
Regarding http thruput , the parse SDK(android) and performance, it may be that parse.com has not optimized performance on the way that they implement android asyncTask() in the parse.android SDK? How the work that required 8 min. on parse.sdk could be done in 3 seconds on an optimized REST , DIY framework ( see links for details on implementations), i really do not know. If parse have not fixed their SDK implementation since these comparison tests ran, then you probably dont want their default SDK asnycTask stuff doing anything approaching a real workload on the network.

The great attraction about Parse (and similar SaaS) is that you can save tens of thousands on back-end development costs. Given that the back-end is often the most expensive aspect of a Web app; that head-ache is suddenly poof.
The problem with Parse and most (all) SaaS is that the region, power, memory, bandwidth, scalability, thresholds, alerts and various actions are out of your control.
Same with Shopify. It's a great Saas with comprehensive control over products, orders, inventory, and aesthetics -- but zero control over the machine. So, today's SaaS is not a heck of a lot different than godaddy. They invariably oversell or max-out their machines in order to make money; and you are stuck if you really care about ass-kicking performance. You cannot even buy that level of service.
I would like something AT LEAST as powerful and comprehensive as the AWS console. Most techies know and accept that Heroku and Parse are both hosted on AWS. Who cares. So charge more for the added service, but don't deny access to those critical low-level tools that make a Site and App and the user experience zing. Hint to those Parse employees.
At any rate, in answer to the question:
The Parse API is simple JSON. So you can pump out the data in the same JSON format that a Parse application expects.
You might even be able to utilize their PFObject (iOS). At some point, all that highlevel API goes to a common HTTP request/response. The good thing about REST's generality means common-of-the-shelf; things like http, url, strings, and utf. No funky Orb here.

Parse is great to start with especially helper functions/features about user management. But I started encountering issues ..
Long execution/ping times, 1000 object limit INCLUDING subqueries, no datacenters at europe (as far as I know)
It would've been a divine platform if they could sort performance and stability issues. I somehow regret developing with it, but I put 5000+ lines of code so I'm going to stick with it.
Maybe they should separate their DEV apps and PROD apps environments, and only allow PROD apps after some kind of supervision, or create a different environment with only paying customers?
We are in 2014, $20/month servers can handle unoptimized websites(60 not-cached db queries on homepage) with 1 million visits/month, this shouldn't be that hard come on Parse!

It's ok for prototyping the apps, especially if the iOS/Android developer doesn't know how to build a DB/API backend himself.
It's not ok at all, when it comes to developing an application with a logic that requires queries more complex than:
SELECT * FROM 'db' WHERE 'column' = 'value' LIMIT 100;
Related queries and inner joins do not exist on Parse. And good luck updating/removing 320 000 records if you need (that's the number I'm working with now).
The only thing that is really useful is handling the Users through the SDK. If I could find a good docs or even tutorial how to handle/create users through iOS/Android apps using Django and DRF/Tastypie, I'm instantly converting everything is being developed in our company to use that.

What time should I build to production?

My users use the site pretty equally 24/7. Is there a meme for build timing?
International audience, single cluster of servers on eastern time, but gets hit well into the morning, by international clients.
1 db, several web servers, so if no db, simple, whenever.
But when the site has to come down, when would you, as a programmer be least mad to see SO be down for say 15 minutes.

If there's truly no good time from the users' perspective, then I'd suggest doing it when your team has the most time to recover from any build-related disaster.

Here's what I have done and its worked well for me:
Get a site traffic analysis tool
which will graph hourly user load
Select low-point in graph for doing
updates

If you're small, then yeah, find when your lowest usage period is, and do it then (for us personally, usually around 1AM-3AM PST is the lowest dip...but it never drops to 0 of course). Once you start growing to having a larger userbase, if you want people to take you seriously you'll need to design your application such that you can upgrade without downtime. This is not simple, and it often involves having multiple servers.
I've spent ages trying to get our application to this point, the best I've come up with so far is for a couple hours run both the old version and new version at the same time. Users logged in at the time of the switchover stay on the old version, until they log out. Next time they come in they go to the new version. Any users coming on after the switchover get sent straight to the new version. It's still not foolproof, but it's pretty good.

What kind of an application is it? Most sites that I use tend to update around 2AM or 3AM.

Use a second site, and hotswap as needed.

The issue with hot-swapping, is database would still be shared, and breaking changes would bring stand in down as well.

I guess you have to ask your clients.
In any case, there's the wee hours of the morning. If you're talking about a locally available website, I do not think users will mind if they get an "under maintenance" notice at 2 am in their time zone.

Depends on your location: 4AM East Coast/1AM West Coast is typlically the lightest time.

Pick a few times that you'd like to do it and offer them as choices to the decider-types. Whatever you do, put up a "down for routine maintenance" page while you deploy.

Check the time of least usage
Clone/copy/update latest production code to another directory
If there exists any database migrations to be done, perform any that are required, and non conflicting with the old code base
At time of least usage, move symlink to point to latest code

First use an analysis tool to try and determine your typically "light" traffic times. Depending on the site and your location in the world in comparison to most of your users, it could be 4am, it could be 1pm, who knows. Then, once you have a good timeframe nailed down, make sure to have your deployment process as automated as possible, so that it happens quickly to minimize the downtime of your site.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse