Why does my Github webhook keep timing out? - github

We couldn’t deliver this payload: Service Timeout
I was successfully sending webooks to my server 5 minutes ago, and now I just keep getting timeouts. I tried deleting the webook and re-adding it, changing the URL it points to, but nothing.
Am I flooding it with too many pushes, or is GitHub's webhook service just down?

It also turns out that GitHub has a 10-second timeout set on their webhooks. That is what I ran into. See the documentation here.

Unless there is some kind of error on the GitHub side (which doesn't seem to be the case at the moment, given their "System Status" history), you might check the program receiving the payload of that webhook.
See a similar problem in Supybot-plugins 225:
I contacted GitHub support and one of the employees has been troubleshooting this for me. Here is part of what he had to say about the issue:
I just tried making a request manually from one of our machines, and that went through with no error (see curl -v output below).
However, I did notice that it took extremely long for the request to be processed -- over 15 seconds (for 2 bytes of data).
Decoupling the listening and reception of the payload, from its proicessing, is generally the right approach, as I recommended ion "Perl Script slow over Tomcat 6.0 and generates service time out".
The first part should be as fast as possible.

Related

Getting many welcome messages from the same user

I am getting many welcome messages from the same user, is it some kind of a monitoring system by Google?
How can I learn to ignore those requests?
Yes, Google periodically issues a health check against your Action, usually about every 5-10 minutes. Your Action should respond to it normally so Google knows if there is something wrong. If there is, you will receive email that your Action is unavailable because it is unhealthy. They will continue to monitor it and, when healthy again, will restore it.
You don't need to ignore those requests, however you may wish to, either to save on resources or to avoid logging it all the time.
With a library such as multivocal, it detects it and responds automatically - there is nothing you need to to. For other libraries, you will need to examine the raw input sent in the body of your webhook request.
If you are using the Action SDK, you should examine the inputs array to see if there is one with an argument named "is_health_check". If you are using Dialogflow, then you would need to look under originalDetectIntentRequest.data.inputs.

An attempt was made to access a socket in a way forbidden by its access permissions in Azure Web Apps

I'm running a webapi on an Azure website that makes calls to external web services. The webapi handles approximately 2K-3K requests per minute.
Periodically, lots of socket errors start occurring that indicate: "An attempt was made to access a socket in a way forbidden by its access permissions". This error seems to occur regardless of the ip address of the external web service.
At first, I thought it might be ephemeral port exhaustion, but I've limited "connectionManagement" to a maximum of 100 connections.
What would be causing this?
Thanks very much. Happy to provide whatever information might be helpful.
Update 6/1: - doesn't work per 6/2
I added the following to my web.config system.net section:
<defaultProxy enabled="false" useDefaultCredentials="false">
<proxy/>
<bypasslist/>
<module/>
</defaultProxy>
It appears to have helped as I haven't seen this issue in the last 6 hours. I have no idea why this would actually help though as I'm not using any proxy-related stuff.
Any thoughts?
Update 6/2:
Adding the defaultProxy doesn't actually appear to help. The problem is still occurring. Back to the drawing board.
I've finally figured out the cause of this problem. The issue was occurring due to port exhaustion.
I was using an NLog email target which was grabbing and holding onto too many SMTP connections over time (despite the 100 max connection limit). After removing the email target, the issue no longer occurs. I haven't figured out why NLog was exhibiting this behavior.

Distributed Recovery - can this be done without timeout?

We have a mail sender application, that receives a bunch of mails in one blob, and then puts all those mails into database. This can take up to ten minutes. During this process the state of the mailing is BUILDING.
When it is finished the state gets changed to READY.
When the server crashes (shouldn't happen of course) and restarts, it looks for all mailings with status BUILDING and marks them as ERROR. This happens, because we never want to send incomplete mailings.
Now we'd like to scale up using a second server. The recovery strategy above doesn't work here.
e.g. server 1 is BUILDING a mailing, and server 2 crashes and restarts. Now server 2 will see the BUILDING mailing and doesn't know if it's been aborted or if it's running on another server.
So what's the best recovery strategy for distributed services?
(We thought about some timeout mechanism, where the BUILDING server updates a timestamp every few seconds, and when some server reboots it checks if there's a BUILDING mailing that hasn't been updated for x minutes. Then it's highly possible that this mailing has been aborted.)
EDIT:
What I'd like to achieve: If some server restarts (after a crash or just because we added a new mailing server to the cluster), it should not mark mailings as ERROR if this particular mailing is actually being built (by another server).
Nice to have: If this would work without having to store server ids, because then it's possible to easily add and/or remove servers. Else it would not be possible to completely remove some server, because then there might be a BUILDING mailing with that particular server id. But this server got removed and will never get started again. Though the only server that could set the mailing to ERROR will be gone.
Add two things to your state tracking: a timestamp and the server working on it.
If a server starts up and sees anything in a building state for itself it knows it failed. Conversely, if it starts up and sees something in a building state for another server, it now has information that it's going to need to look at later to see if there's a problem that needs to be addressed. You need to worry about multiple servers restarting at the same time, so you can't just have a server grab all old bundles for all servers at startup.
Or you can just use a clustering service for your OS.

GWT-RPC and the infamous sporadic "StatusCodeException: 0" exception revisited

My problem is the infamous "StatusCodeException: 0" problem happening when using GWT 2.6.1 when accessing page via subdomain https://sub.site.com/.
Now, this happens quite sporadically for one customer using IE11 and I can't reproduce this from several distinct computers using IE11, IE10, IE9 or IE8 (not to talk about Chrome or Firefox).
Accessing exactly the same webapp from https://site.com/ seems to work fine for that customer.
This obviously lead me to conclusion that I'm having problem with Same Origin Policy.
What is strange though is that my webapp is designed in the way that no cross-domain or cross-subdomain requests are made. Same goes for no cross-protocol as well no cross-port requests. In other words, Same Origin Policy is not violated in this situation. As a confirmation of that, I can provide following proof:
While being at customer site I've seen how this is reproduced: customer starts using application and everything works fine - all requests are returning response normally. Then, after several minutes of working, exactly the same requests on the same page (without reloads) starts to fail with StatusCodeException: 0.
Basically, both https://sub.site.com and https://site.com points to the same IP, and there is only one Tomcat webapp serving exactly the same resources both for https://sub.site.com and https://site.com.
Another proof would be the codebase of the single GWT module itself: there I use only one instance of one service called DashboardService:
public class DashboardModule extends EntryPoint implements IDashboardModule {
private final DashboardServiceAsync dashboardService = createDashboardService();
#Override
public void onModuleLoad() {
// loading of module elements
// dashboardService is passed as a parameter so only one instance is used
}
/**
* PLEASE SEE QUESTION #1 BELOW CODE SNIPPET
*/
private static final String DASHBOARD_REQUEST_URL = "request";
private static DashboardServiceAsync createDashboardService() {
final DashboardServiceAsync service = GWT.create(DashboardService.class);
((ServiceDefTarget) service).setServiceEntryPoint(DASHBOARD_REQUEST_URL);
return service;
}
}
=================================== EDIT ====================================
After looking in the console at customer location, the error was always the following:
SCRIPT7002: XmlHttpRequest: network error 0x2ee4, ...
so it seems that this has nothing to do with Same-Origin Policy, because as per this article it is described as ERROR_INTERNET_INTERNAL_ERROR An internal error has occurred.
It's a pity, but I've found only 2 mentions of this error which were not resolved:
Error under IE10 and Error under IE11.
I have an assumption that customer is very probably accessing site through some proxy which slightly changes the requests and IE can't handle them.
Question 1: does anybody knows how can I simulate or reproduce mentioned error locally?
Question 2: does anybody knows how this problem can be gracefully worked around?
Question 3: is it ok to simply retry the request, or this request may have reached the server and modify it, so retrying it may produce duplicate modification?
Will try to setup forwarding proxy to simulate possible customer setup to at least reproduce mentioned error...
I greatly appreciate any help!
Ok, so after bugging with this problem for a workweek I finally managed to solve it.
Actually, I was able to reproduce very similar problem locally when I installed Apache2 server in front of Tomcat and accessed it from another VirtualBox Win7 host with IE11. This gave me sporadic StatusCodeException: 0 with Network error 0x2ef3 though but the behaviour was very similar: GWT-RPC requests started to fail after a minute or so. This was reproducable in IE10 and IE11 but working fine in IE8 and IE9 :) (is IE getting crappier with new versions?)
Locally I was able to fix that problem by simply disabling Keep-alive functionality for IE browsers by adding following lines to /etc/apache2/sites-enabled/default-ssl.conf Apache2 ssl configuration file:
# following line was added
BrowserMatch "Trident" nokeepalive ssl-unclean-shutdown downgrade-1.0 force-response-1.0
</VirtualHost>
</IfModule>
This basically tells Apache2 not to use keep-alive, use special SSL handling and generally downgrade to HTTP 1.0 standard whenever user-agent string in request has Trident word (matches IE11 and IE10 and possibly earlier IEs)
This added Connection: close HTTP header to each response and seemed to work fine locally.
On customers site this wasn't still working and produced the same Network error: 0x2ee4.
It may be worth noting that customer was using McAfee Web Gateway as forwarding proxy which stood in the middle of browser <-> server communication.
Long story short, I found out that the problem was in the following: when page loads there are multiple GET requests being sent to server to get the page, resources etc. Then after 10 seconds of using it (my webapp is single-page-application, so user may spend more than 10 minutes on same page) only GWT-RPC requests are being made to the server which are POST requests. And after a minute of using this page (I suspect 1 minute = keep-alive timeout of proxy server) these POST requests start randomly fail with 0x2ee4 network error.
After I implemented GWT-RPC retry functionality, I found out that after 30 seconds of retries simply ALL GWT-RPC requests fail with above error. Refreshing the page was solving this problem again for a minute or so and then same story happened.
So, I figured out that CRAPPY IE11 and IE10 are incorrectly handling combination of SSL, Keep-alive and POST requests. It seems that сrappy IE10 and IE11 simply can't renew keep-alive ssl connection using POST requests and only do this using GET requests.
Please note that Chrome, Firefox and other normal browsers are handling this situation quite well. When inspecting how Firefox behaves in such situation in Firebug: it can be clearly seen that POST request is made, then it is shown as aborted for like 0.5s and then this it is shown as successful (I suspect that Firefox handles this specific situation and makes GET request to server itself to renew SSL keep-alive connection and then retries POST request)
So, to fix this problem in IE I simply implemented functionality which "pings" server with GET request every 5 seconds (be ready to experiment with this time since this is most probably related to customer's proxy keep-alive timeout).
This made it work (please note that above Apache2 configuration hack is not needed in this case)
I really hope that this will help people with similar issue and save their time
Resources used:
IE Network Error 0x2ef3 question 1
IE Network Error 0x2ef3 question 2
IE Network Error 0x2ef3 question 3
Awesome q&a on how to implement transparent GWT-RPC retry functionality
P.S. Will I report this IE10 and IE11 issue to Microsoft? - really I'm not eager spending 30+ minutes of my time reporting issue on commercial crappy IE browser issue after I've already spent more than a week of finding out the problem.
I insist on recommending Chrome or Firefox or other normal browser to customers as viable alternative and I still think that IE11 is not suited for modern websites with AJAX

Deleting BlogEngine.NET comments from the database

I've searched through all the questions tagged blogengine.net and not found an answer here. I set up a site with BlogEngine.NET. At the time I never configured for spam comment purposes the spam settings (figuring that, dontcha know, I was going to write such earth-shatteringly good content that all the comments would be just affirmations of how great my content was, with the occasional pithy insight)
Turns out the blog is more of a notebook for myself rather than anything I people have got engaged with (quelle surprise) so now I'm going to turn off comments (at least until I can find a good way to moderate them) but I need to delete the existing spam.
I've tried:
using the admin UI but it times out with "Could not delete comment: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding. The statement has been terminated."
writing my own Q&D aspx page to cycle through all the comments but it suffers a similar fate. "Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding"
I then turned to the database itself and from inspection it seems that the be_PostComment table is the place to go so I issued a blanket delete statement there and it deleted all the rows.
However they're still in shown in the UI - is this down to caching by ASP.NET?
It must have been ASP.NET caching, as when I wrote the initial question I had tried (CTRL+) F5 to no effect. I cycled the site and the app pool from IIS and I'm now happy to report that all spam comments are gone.