Robots visiting website 1k+ times a day

I'm having trouble identifying what makes my website load extremely slowly. I found something in the logs, but searching Google hasn't turned up an answer or even an explanation.
In my raw access logs I found many records of different robots accessing my website. Here's an example:
202.46.53.40 - - [31/Dec/2016:03:30:51 +0100] "GET /en/home/184-2016-hyperlite-motive-wakeboard.html HTTP/1.1" 302 - "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.93 Safari/537.36"
202.46.54.27 - - [31/Dec/2016:03:30:52 +0100] "GET /en/home/184-2016-hyperlite-motive-wakeboard.html HTTP/1.1" 301 - "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.93 Safari/537.36"
202.46.56.210 - - [31/Dec/2016:03:30:53 +0100] "GET /en/home/184-2016-hyperlite-motive-wakeboard.html HTTP/1.1" 302 - "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.93 Safari/537.36"
202.46.56.114 - - [31/Dec/2016:03:30:54 +0100] "GET /en/wakeboards/184-2016-hyperlite-motive-wakeboard.html HTTP/1.1" 200 140041 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.93 Safari/537.36"
180.76.15.154 - - [31/Dec/2016:03:31:26 +0100] "GET /en/26-sup HTTP/1.1" 406 73864 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
157.55.39.40 - - [31/Dec/2016:03:31:50 +0100] "GET /en/helmets/57-2015-mystic-mk8-helmet-mint.html HTTP/1.1" 302 - "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.39.40 - - [31/Dec/2016:03:31:55 +0100] "GET /en/helmets/57-2015-mystic-mk8-helmet-mint.html HTTP/1.1" 301 - "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
77.75.77.95 - - [31/Dec/2016:03:34:03 +0100] "GET /robots.txt HTTP/1.1" 404 57839 "-" "Mozilla/5.0 (compatible; SeznamBot/3.2; +http://napoveda.seznam.cz/en/seznambot-intro/)"
77.75.77.95 - - [31/Dec/2016:03:34:05 +0100] "GET /en/31-bags HTTP/1.1" 301 - "-" "Mozilla/5.0 (compatible; SeznamBot/3.2; +http://napoveda.seznam.cz/en/seznambot-intro/)"
163.172.66.143 - - [31/Dec/2016:03:43:36 +0100] "GET /en/13-rokavice HTTP/1.1" 302 - "-" "Mozilla/5.0 (compatible; AhrefsBot/5.2; +http://ahrefs.com/robot/)"
202.46.54.134 - - [31/Dec/2016:04:04:20 +0100] "GET /en/accessories/169-plavutke-pro-ii.html HTTP/1.1" 302 - "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.93 Safari/537.36"
202.46.54.102 - - [31/Dec/2016:04:04:21 +0100] "GET /en/accessories/169-plavutke-pro-ii.html HTTP/1.1" 301 - "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.93 Safari/537.36"
202.46.48.140 - - [31/Dec/2016:04:04:22 +0100] "GET /en/accessories/169-plavutke-pro-ii.html HTTP/1.1" 200 110602 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.93 Safari/537.36"
180.76.15.10 - - [31/Dec/2016:04:04:55 +0100] "GET /en/56-kiteboarding-gear HTTP/1.1" 406 62988 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
66.249.76.47 - - [31/Dec/2016:04:25:33 +0100] "GET /380/komplet-oceanrodeo-razor-fst8-advenced-performance-kite.jpg HTTP/1.1" 200 126044 "-" "Googlebot-Image/1.0"
112.210.233.49 - - [31/Dec/2016:04:29:17 +0100] "POST /modules/sendtoafriend/sendtoafriend_ajax.php?rand=1472104141118 HTTP/1.1" 500 - "https://proadrenalin.si/modules/sendtoafriend/sendtoafriend_ajax.php?rand=1472104141118" "Mozilla/4.0 (compatible; MSIE 9.0; Windows NT 6.1)"
66.249.76.78 - - [31/Dec/2016:04:33:09 +0100] "POST /modules/leocustomajax/leoajax.php?rand=1482019200024 HTTP/1.1" 200 14 "https://www.proadrenalin.si/en/20-wakeboards?p=3" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
Is it possible that these visits are causing the slow page loads?
On 31 December I had 1342 requests, on 1 January 1222, on 2 January 2374, on 4 January 2391... This goes on every day.
The webshop runs on PrestaShop, and as far as I can tell from inspecting it, the platform itself is not causing the slow page loads. Most modules are disabled or removed, only the needed (enabled) ones are on the server, caching is on, and recompilation is set to run only when something changes.
Any tips, links to read, or possible solutions would be very useful, because right now I'm living in a nightmare.

You can find the IP patterns of the robots hitting your store and then block those IPs using the .htaccess file.
For more details, see:
How to Block an IP address range using the .htaccess file
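
Before blocking anything, it helps to see which address ranges actually dominate the log. The following is a minimal sketch, assuming the log is in the combined format shown in the question; the function name `count_by_prefix` and the choice of grouping by the first two octets are illustrative assumptions, not part of any tool mentioned above. The sample lines are copied from the question.

```python
import re
from collections import Counter

# Combined Log Format: ip - - [time] "request" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\d+\.\d+\.\d+\.\d+) \S+ \S+ \[[^\]]+\] '
    r'"[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def count_by_prefix(lines, octets=2):
    """Count requests per leading IP octets (hypothetical helper).

    With octets=2 the key is the first two octets, i.e. roughly a /16 range.
    Lines that do not match the expected format are skipped.
    """
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            prefix = ".".join(match.group("ip").split(".")[:octets])
            counts[prefix] += 1
    return counts

# User-agent strings as they appear in the question's log excerpt.
CHROME = ("Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
          "(KHTML, like Gecko) Chrome/45.0.2454.93 Safari/537.36")
BAIDU = ("Mozilla/5.0 (compatible; Baiduspider/2.0; "
         "+http://www.baidu.com/search/spider.html)")

sample = [
    f'202.46.53.40 - - [31/Dec/2016:03:30:51 +0100] "GET /en/home/184-2016-hyperlite-motive-wakeboard.html HTTP/1.1" 302 - "-" "{CHROME}"',
    f'202.46.54.27 - - [31/Dec/2016:03:30:52 +0100] "GET /en/home/184-2016-hyperlite-motive-wakeboard.html HTTP/1.1" 301 - "-" "{CHROME}"',
    f'202.46.56.210 - - [31/Dec/2016:03:30:53 +0100] "GET /en/home/184-2016-hyperlite-motive-wakeboard.html HTTP/1.1" 302 - "-" "{CHROME}"',
    f'180.76.15.154 - - [31/Dec/2016:03:31:26 +0100] "GET /en/26-sup HTTP/1.1" 406 73864 "-" "{BAIDU}"',
]

counts = count_by_prefix(sample)
for prefix, hits in counts.most_common():
    print(prefix, hits)
```

In the question's excerpt, the 202.46.x.x requests send an ordinary Chrome user agent, so grouping by address prefix may reveal them where filtering on the user-agent string would not. To run this on a real log, pass an open file object instead of `sample`.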

I have the same problem and am currently trying to block that bot via robots.txt, like this:
User-agent: SeznamBot
Disallow: /
Taken from the official source: https://napoveda.seznam.cz/en/full-text-search/crawling-control/
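
One way to sanity-check such a rule before deploying it is Python's standard `urllib.robotparser`. This is only a sketch: it shows how a compliant parser would read the two lines above, not how SeznamBot itself behaves. Note also that in the question's log the request for /robots.txt returned 404, so the rule can only take effect once the file is actually served from the site root, and robots.txt is advisory in any case.

```python
from urllib.robotparser import RobotFileParser

# The rule from the answer above, exactly as it would appear in robots.txt.
rules = """\
User-agent: SeznamBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A path taken from the question's access log.
path = "/en/31-bags"

# The parser compares the product token before the "/" against the rule.
seznam_allowed = parser.can_fetch("SeznamBot/3.2", path)
# No rule mentions bingbot, so it stays allowed.
bing_allowed = parser.can_fetch("bingbot", path)

print("SeznamBot allowed:", seznam_allowed)
print("bingbot allowed:", bing_allowed)
```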

Error converting markdown into pdf in jupyter notebook

I can't convert a notebook with Markdown cells to PDF, but I am able to convert code cells.
The error I get is:
500 : Internal Server Error
The error was:
nbconvert failed: PDF creating failed
[I 10:32:16.940 NotebookApp] Running pdflatex 3 times: [u'pdflatex', u'notebook.tex']
[C 10:32:17.174 NotebookApp] pdflatex failed: [u'pdflatex', u'notebook.tex']
This is pdfTeX, Version 3.14159265-2.6-1.40.16 (TeX Live 2015) (preloaded format=pdflatex)
restricted \write18 enabled.
entering extended mode
(./notebook.tex
LaTeX2e <2015/01/01>
Babel <3.9l> and hyphenation patterns for 79 languages loaded.
tex/latex/amsfonts/umsb.fd)
LaTeX Warning: No \author given.
! Missing $ inserted.
<inserted text>
$
l.232 ...l the conduits, \$ \frac{\Delta P}{\mu L}
\$ are set to be 1.0
?
! Emergency stop.
<inserted text>
$
l.232 ...l the conduits, \$ \frac{\Delta P}{\mu L}
\$ are set to be 1.0
! ==> Fatal error occurred, no output PDF file produced!
Transcript written on notebook.log.
[I 10:32:17.175 NotebookApp] Running bibtex 1 time: [u'bibtex', u'notebook']
[W 10:32:17.262 NotebookApp] bibtex had problems, most likely because there were no citations
[I 10:32:17.262 NotebookApp] Running pdflatex 3 times: [u'pdflatex', u'notebook.tex']
[C 10:32:17.501 NotebookApp] pdflatex failed: [u'pdflatex', u'notebook.tex']
This is pdfTeX, Version 3.14159265-2.6-1.40.16 (TeX Live 2015) (preloaded format=pdflatex)
restricted \write18 enabled.
entering extended mode
LaTeX Warning: No \author given.
! Missing $ inserted.
<inserted text>
$
l.232 ...l the conduits, \$ \frac{\Delta P}{\mu L}
\$ are set to be 1.0
?
! Emergency stop.
<inserted text>
$
l.232 ...l the conduits, \$ \frac{\Delta P}{\mu L}
\$ are set to be 1.0
! ==> Fatal error occurred, no output PDF file produced!
Transcript written on notebook.log.
[W 10:32:17.503 NotebookApp] 500 GET /nbconvert/pdf/ComputerProject1.ipynb?download=true (::1): nbconvert failed: PDF creating failed
[E 10:32:17.505 NotebookApp] {
"Accept-Language": "en-US,en;q=0.8",
"Accept-Encoding": "gzip, deflate, sdch",
"Connection": "keep-alive",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36",
"Host": "localhost:8888",
"Referer": "http://localhost:8888/notebooks/ComputerProject1.ipynb",
"Upgrade-Insecure-Requests": "1"
}
[E 10:32:17.505 NotebookApp] 500 GET /nbconvert/pdf/ComputerProject1.ipynb?download=true (::1) 787.74ms referer=http://localhost:8888/notebooks/ComputerProject1.ipynb
I have pandoc and MacTeX installed. Does anyone know how to solve this?
Thanks,

Multiline regex for a Ruby on Rails log file

I work with a Ruby on Rails log file which looks like the following example:
[...]
Started GET "/staff/sign_in" for 22.22.22.22 at 2014-02-16 03:39:32 -0800
Processing by Staffer::SessionsController#new as */*
Rendered layouts/_compatible_browsers.html.erb (0.9ms)
Rendered layouts/headers/_guest.html.erb (0.6ms)
Cache digest for layouts/social_media_footer_link.html: bc9b2db49cc435f550be0f0dffe79548
Cache digest for layouts/_footer.html: 87dcaa136f1edad80dd5eb5a4b5dde82
Read fragment views/staffer_footer/87dcaa136f1edad80dd5eb5a4b5dde82/87dcaa136f1edad80dd5eb5a4b5dde82 (4.6ms)
Rendered layouts/_footer.html.erb (7.4ms)
Completed 200 OK in 527ms (Views: 445.2ms | ActiveRecord: 1.0ms)
Started GET "/staff/sign_in" for 22.22.22.22 at 2014-02-16 03:49:32 -0800
Started GET "/staff/sign_in" for 22.22.22.22 at 2014-02-16 03:59:32 -0800
Processing by Staffer::SessionsController#new as */*
Rendered layouts/_compatible_browsers.html.erb (0.9ms)
Rendered layouts/headers/_guest.html.erb (0.6ms)
Cache digest for layouts/social_media_footer_link.html: bc9b2db49cc435f550be0f0dffe79548
Read fragment views/staffer_footer/87dcaa136f1edad80dd5eb5a4b5dde82/87dcaa136f1edad80dd5eb5a4b5dde82 (4.6ms)
Rendered layouts/_footer.html.erb (7.4ms)
Completed 200 OK in 527ms (Views: 445.2ms | ActiveRecord: 1.0ms)
[...]
How can I tell sed to give me the following output for that given example?
Started GET "/staff/sign_in" for 22.22.22.22 at 2014-02-16 03:39:32 -0800; Completed 200 OK in 527ms (Views: 445.2ms | ActiveRecord: 1.0ms)
Started GET "/staff/sign_in" for 22.22.22.22 at 2014-02-16 03:59:32 -0800; Completed 200 OK in 527ms (Views: 445.2ms | ActiveRecord: 1.0ms)
This might work for you (GNU sed):
sed -n '/Started GET/{h;d};/Completed 200 OK/{H;g;s/\n/; /p}' file
More portable syntax (a semicolon before each closing brace):
sed -n '/Started GET/{h;d;};/Completed 200 OK/{H;g;s/\n/; /p;}' file
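
For readers who prefer to see the pairing logic spelled out, here is a rough Python sketch of the same idea. It is not a byte-for-byte equivalent of the sed command: the function name `pair_requests` is made up for illustration, and unlike sed it drops a "Completed" line that has no preceding "Started" line. The sample lines are taken from the log in the question.

```python
def pair_requests(lines):
    """Join each 'Started GET' line with the next 'Completed 200 OK' line.

    Hypothetical helper. A later 'Started GET' overwrites an earlier one
    that never completed, which mirrors what sed's `h` command does.
    """
    started = None
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        if "Started GET" in line:
            started = line  # remember the most recent request start
        elif "Completed 200 OK" in line and started is not None:
            pairs.append(f"{started}; {line}")
            started = None  # each start is paired at most once
    return pairs

sample = [
    'Started GET "/staff/sign_in" for 22.22.22.22 at 2014-02-16 03:39:32 -0800',
    'Processing by Staffer::SessionsController#new as */*',
    'Completed 200 OK in 527ms (Views: 445.2ms | ActiveRecord: 1.0ms)',
    'Started GET "/staff/sign_in" for 22.22.22.22 at 2014-02-16 03:49:32 -0800',
    'Started GET "/staff/sign_in" for 22.22.22.22 at 2014-02-16 03:59:32 -0800',
    'Processing by Staffer::SessionsController#new as */*',
    'Completed 200 OK in 527ms (Views: 445.2ms | ActiveRecord: 1.0ms)',
]

pairs = pair_requests(sample)
for pair in pairs:
    print(pair)
```

The request started at 03:49:32 has no matching "Completed" line, so it is silently replaced by the next "Started" line, just as in the desired output.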

Sending form via POST results in broken GET request on webserver (405)

Situation:
Submitting an HTML form (method POST) sporadically results in a broken GET request on the web server (405). The browser displays "Method not allowed" (405).
The malformed string in front of the GET method looks like part of the form variables, for example checkout=Weiter+%3E%3E, which is the value attribute of the submit button (Weiter >>).
WA Log entries:
"egoryID=vvXAqAFS1FIAAAFA6CQIDsbGGET /is-bin/WFS/XYZ-DE-Site/de_DE/-/EUR/ViewData-Start/1268826926?JumpTarget=ViewRequisitionCheckout-ShowLoginPage HTTP/1.1" 405 92 "https://www.XYZ.de/is-bin/WFS/XYZ-DE-Site/de_DE/-/EUR/ViewData-Start/1268341878?JumpTarget=ViewRequisition-View" "Mozilla/5.0 (iPhone; CPU iPhone OS 7_0_2 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11A501 Safari/9537.53" 1016
"t_State=true&processLogin=WeiterGET /is-bin/WFS/XYZ-DE-Site/de_DE/-/EUR/ViewData-Start/1269223568?JumpTarget=ViewRequisitionCheckout-ManageAddresses HTTP/1.1" 405 92 "https://www.XYZ.de/is-bin/WFS/XYZ-DE-Site/de_DE/-/EUR/ViewData-Start/1268974319?JumpTarget=ViewRequisitionCheckout-ShowLoginPage" "Mozilla/5.0 (iPhone; CPU iPhone OS 7_0_2 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11A501 Safari/9537.53" 1309
"ipTo=true&checkout=Weiter+%3E%3EGET /is-bin/WFS/XYZ-DE-Site/de_DE/-/EUR/ViewData-Start/1270218168?JumpTarget=ViewRequisitionCheckout-ManageAddresses HTTP/1.1" 405 92 "https://www.XYZ.de/is-bin/WFS/XYZ-DE-Site/de_DE/-/EUR/ViewData-Start/1269355351?JumpTarget=ViewRequisitionCheckout-ManageAddresses" "Mozilla/5.0 (iPhone; CPU iPhone OS 7_0_2 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11A501 Safari/9537.53" 1223
"KpsHpPhxqCtk&apply=Weiter+%3E%3EGET /is-bin/WFS/XYZ-DE-Site/de_DE/-/EUR/ViewData-Start/1271422613?JumpTarget=ViewRequisitionCheckoutPayment-Edit HTTP/1.1" 405 92 "https://www.XYZ.de/is-bin/WFS/XYZ-DE-Site/de_DE/-/EUR/ViewData-Start/1270749634?JumpTarget=ViewRequisitionCheckoutPayment-Edit" "Mozilla/5.0 (iPhone; CPU iPhone OS 7_0_2 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11A501 Safari/9537.53" 1132
"egoryID=KzfAqAFSLHUAAAFARCQIDsbGGET /is-bin/WFS/XYZ-DE-Site/de_DE/-/EUR/ViewData-Start/499191000?JumpTarget=ViewRequisitionCheckout-ShowLoginPage HTTP/1.1" 405 92 "https://www.XYZ.de/is-bin/WFS/XYZ-DE-Site/de_DE/-/EUR/ViewData-Start/499170102?JumpTarget=ViewRequisition-View" "Mozilla/5.0 (iPad; CPU OS 7_0_2 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11A501 Safari/9537.53" 1072
"rue&checkout=Bestellung+absendenGET /is-bin/static/WFS/XYZ-DE-Site/-/de_DE/jscript/snippets/catalog/LeftPanelCatalog.js HTTP/1.1" 405 92 "https://www.XYZ.de/is-bin/WFS/XYZ-DE-Site/de_DE/-/EUR/ViewRequisitionCheckoutFinish-Dispatch" "Mozilla/5.0 (iPhone; CPU iPhone OS 7_0_2 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11A501 Safari/9537.53" 1086
"ponent=&addList=In+den+WarenkorbGET /is-bin/static/WFS/XYZ-DE-Site/-/de_DE/images/ajax_loader_bg_white.png HTTP/1.1" 405 84 "https://www.XYZ.de/is-bin/WFS/XYZ-DE-Site/de_DE/-/EUR/ViewDirectRequisition-List" "Mozilla/5.0 (iPad; CPU OS 7_0_2 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11A501 Safari/9537.53" 813
"4E45&QuantityString=1&Position=1GET /is-bin/static/WFS/XYZ-DE-Site/-/de_DE/images/ajax_loader.gif HTTP/1.1" 405 84 "https://www.XYZ.de/is-bin/WFS/XYZ-DE-Site/de_DE/-/EUR/ViewDirectRequisition-List" "Mozilla/5.0 (iPad; CPU OS 6_1_3 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10B329 Safari/8536.25" 740
[...]
WA Log analysis:
User Agent
mostly mobile Safari (Version 6.0, 5.1.1)
-- iPad and iPhone (Apple iOS; Version 7.0.x, 6.x)
sometimes desktop Safari (Version 5.1.1)
-- Macintosh (Mac OS X; Version 10.6.8)
different pages
apparently only HTTPS (SSL)
Question
What causes this behavior?

GWTTestCase timeout in Jenkins

When I kick off a Jenkins job that runs a GWTTestCase through Maven, the job fails randomly with the same exception. I can't reproduce this behavior in Eclipse. Below is the exception.
<testcase time="300.144" classname="client.gdo.model.impl.GwtTestLevelOfDetailDefinition" name="testGetAttributesAfterJsonParse">
<error message="The browser did not complete the test method WebFrameworkClientTest.JUnit:client.gdo.model.impl.GwtTestLevelOfDetailDefinition.testGetAttributesAfterJsonParse in 300000ms.
We have no results from:
161.134.22.175 / Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.19) Gecko/2010031422 Firefox/3.0.19
Actual time elapsed: 300.144 seconds.
Try increasing this timeout using the &apos;-testMethodTimeout minutes&apos; option
" type="com.google.gwt.junit.client.TimeoutException">com.google.gwt.junit.client.TimeoutException: The browser did not complete the test method WebFrameworkClientTest.JUnit:client.gdo.model.impl.GwtTestLevelOfDetailDefinition.testGetAttributesAfterJsonParse in 300000ms.
We have no results from:
161.134.22.175 / Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.19) Gecko/2010031422 Firefox/3.0.19
Actual time elapsed: 300.144 seconds.
Try increasing this timeout using the &apos;-testMethodTimeout minutes&apos; option
at com.google.gwt.junit.JUnitShell.notDone(JUnitShell.java:1031)
at com.google.gwt.junit.JUnitShell.runTestImpl(JUnitShell.java:1381)
at com.google.gwt.junit.JUnitShell.runTestImpl(JUnitShell.java:1309)
at com.google.gwt.junit.JUnitShell.runTest(JUnitShell.java:653)
at com.google.gwt.junit.client.GWTTestCase.runTest(GWTTestCase.java:441)
I believe this is because, depending on your architecture, a GWTTestCase can take a very long time to execute.
Try increasing the timeout as indicated in the error log.

What is "useragent" in codeigniter email library's configuration?

I can see that the useragent of CodeIgniter's email library is configurable, as the documentation says.
But what actually is the useragent, and what does it do for us?
I'm not sure about the email library's case, but in most cases the useragent identifies what the user "used" to make a request: browser type/version, OS type/version, etc.
This information is commonly used or contained in cookies.
Here are examples of useragents:
Mozilla/5.0 (iPad; U; CPU OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 Mobile/7B334b Safari/531.21.10  2011-10-16 20:23:50
Mozilla/5.0 (iPad; U; CPU OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 Mobile/7B334b Safari/531.21.10  2011-10-16 20:23:10
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:7.0.1) Gecko/20100101 Firefox/7.0.1  2011-10-16 20:23:00
Mozilla/5.0 (Linux; U; Android 2.3.3; en-au; GT-I9100 Build/GINGERBREAD) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1  2011-10-16 20:22:55
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; InfoPath.2; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR 1.1.4322)  2011-10-16 20:22:33
Mozilla/5.0 (Windows NT 6.1; rv:5.0) Gecko/20100101 Firefox/5.0  2011-10-16 20:21:42
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_2) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.835.202 Safari/535.1  2011-10-16 20:21:13
Mozilla/5.0 (BlackBerry; U; BlackBerry 9800; en) AppleWebKit/534.1+ (KHTML, like Gecko) Version/6.0.0.337 Mobile Safari/534.1+  2011-10-16 20:21:10
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)  2011-10-16 20:21:07
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:7.0.1) Gecko/20100101 Firefox/7.0.1  2011-10-16 20:21:05
Mozilla/5.0 (X11; Linux i686) AppleWebKit/534.34 (KHTML, like Gecko) rekonq Safari/534.34  2011-10-16 20:21:01
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; GTB6; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; OfficeLiveConnector.1.4; OfficeLivePatch.1.3)  2011-10-16 20:20:48
BlackBerry8300/4.2.2 Profile/MIDP-2.0 Configuration/CLDC-1.1 VendorID/107 UP.Link/6.2.3.15.0  2011-10-16 20:20:17
IE 7 ? Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30)  2011-10-16 20:20:09
Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.23) Gecko/20110920 Firefox/3.6.23 SearchToolbar/1.2  2011-10-16 20:20:07