Cypress is being run in CircleCI inside a Docker container via this command:
command: npx cypress run --headless --browser chrome
Besides the Cypress test results, it also prints too much other output, such as:
base_1 | 127.0.0.1 - - [18/May/2021:19:52:18 +0000] "GET /roof.svg HTTP/1.1" 200 547 "http://
I want to omit these request logs.
According to the Cypress docs, paste this into cypress/support/index.js:
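Cypress.Server.defaults({
  delay: 500,
  force404: false,
  // Note: Cypress 5.0 renamed this option from `whitelist` to `ignore`,
  // and Cypress.Server / cy.server() were deprecated in 6.0 in favor of cy.intercept().
  whitelist: (xhr) => {
    // handle custom logic for whitelisting
    return true;
  }
})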
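The `base_1 |` prefix suggests these lines are the access log of a Docker Compose service (apparently named `base`), not Cypress output, so `Cypress.Server` settings may not suppress them. If they cannot be switched off at the web server, one workaround is to filter them out of the stream. This is a minimal sketch, assuming every request line has the combined-log shape shown above; the pattern and the name `drop_access_logs` are illustrative and not part of Cypress or CircleCI.

```python
import re

# Assumed shape: '<service> | <ip> <ident> <user> [<time>] "<METHOD> <path> HTTP/x.y" <status> ...'
ACCESS_LOG = re.compile(
    r'^\S+\s*\|\s*\S+ \S+ \S+ \[[^\]]+\] "[A-Z]+ \S+ HTTP/[\d.]+" \d{3}\b'
)


def drop_access_logs(lines):
    """Yield only the lines that do not look like HTTP access-log entries."""
    for line in lines:
        if not ACCESS_LOG.match(line):
            yield line
```

A small wrapper script could call `sys.stdout.writelines(drop_access_logs(sys.stdin))` and sit at the end of a pipe, so the test results still pass through unchanged.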
I have a package which contains static files I want to reuse among applications. Based on https://webassets.readthedocs.io/en/latest/environment.html#webassets.env.Environment.load_path I came up with the following code snippet, to be used in each application's __init__.py (the shared package is loutilities):
with app.app_context():
    # js/css files
    asset_env.append_path(app.static_folder)
    # os.path.split to get package directory
    asset_env.append_path(os.path.join(os.path.split(loutilities.__file__)[0], 'tables-assets', 'static'))
But when ASSETS_DEBUG = False, this raises a ValueError for one of the files found in the package. (See https://github.com/louking/rrwebapp/issues/366 for the detailed traceback; this is possibly related to https://github.com/miracle2k/webassets/issues/387.)
ValueError: Cannot determine url for /var/www/sandbox.scoretility.com/rrwebapp/lib/python2.7/site-packages/loutilities/tables-assets/static/branding.css
I changed the code to pass a url parameter, which now works fine when ASSETS_DEBUG = False:
asset_env.append_path(os.path.join(os.path.split(loutilities.__file__)[0], 'tables-assets', 'static'), '/loutilities')
However, now when ASSETS_DEBUG = True, the JavaScript console shows that the file failed to load:
Failed to load resource: the server responded with a status of 404 (NOT FOUND) branding.css
I have worked around the Catch-22 with the following inelegant code, but I'm wondering how to choose the append_path() url parameter so that it works for both ASSETS_DEBUG = True and False.
with app.app_context():
    # js/css files
    asset_env.append_path(app.static_folder)
    # os.path.split to get package directory
    loutilitiespath = os.path.split(loutilities.__file__)[0]
    # kludge: seems like assets debug doesn't like url and no debug insists on it
    if app.config['ASSETS_DEBUG']:
        url = None
    else:
        url = '/loutilities'
    asset_env.append_path(os.path.join(loutilitiespath, 'tables-assets', 'static'), url)
One solution is to create a route for /loutilities/static, as follows:
import os
from flask import send_from_directory
import loutilities
# add loutilities tables-assets for js/css/template loading
# see https://adambard.com/blog/fresh-flask-setup/
# and https://webassets.readthedocs.io/en/latest/environment.html#webassets.env.Environment.load_path
# loutilities.__file__ is __init__.py file inside loutilities; os.path.split gets package directory
loutilitiespath = os.path.join(os.path.split(loutilities.__file__)[0], 'tables-assets', 'static')
@app.route('/loutilities/static/<path:filename>')
def loutilities_static(filename):
    # serve the package's static files so the individual files load in debug mode
    return send_from_directory(loutilitiespath, filename)
with app.app_context():
    # js/css files
    asset_env.append_path(app.static_folder)
    # the url matches the route above, so it resolves with ASSETS_DEBUG True or False
    asset_env.append_path(loutilitiespath, '/loutilities/static')
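To see why the url argument matters, here is a simplified, hypothetical model (not webassets' real resolver) in which each load-path directory is paired with the URL prefix passed to append_path(), and a file can only be given a URL if its directory has one. Under this model the prefix must also be a URL that Flask actually serves, which is what the /loutilities/static route provides. The function name `url_for_file` and the example paths are illustrative.

```python
import os


def url_for_file(load_path, filepath):
    """Map a file on the load path to a URL (simplified model, not webassets' code).

    load_path is a list of (directory, url_prefix) pairs, like the arguments
    passed to append_path(); url_prefix may be None.
    """
    filepath = os.path.abspath(filepath)
    for directory, url_prefix in load_path:
        directory = os.path.abspath(directory)
        if os.path.commonpath([directory, filepath]) != directory:
            continue  # the file is not under this directory
        if url_prefix is None:
            # No URL was given for this directory, so there is nothing to map to.
            raise ValueError(f"Cannot determine url for {filepath}")
        rel = os.path.relpath(filepath, directory).replace(os.sep, "/")
        return url_prefix.rstrip("/") + "/" + rel
    raise ValueError(f"Cannot determine url for {filepath}")
```

For example, with `("/pkgs/loutilities/tables-assets/static", "/loutilities/static")` on the load path, `branding.css` maps to `/loutilities/static/branding.css`, the same URL the route serves.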
It appears that sourceURL is made relative differently in Firefox and Chrome - when some tooling generates //# sourceURL=... strings in JS files that are relative to the file they are placed in, Firefox treats the URL as relative to the JS file, while Chrome instead treats it as relative to the original HTML file. Which is correct, or is there a clearer way to state this?
In this sample application, I'm trying to use sourceURL to allow many smaller files to be combined into a single large file while still letting the browser know what each smaller file should be called, and sourceMappingURL to then specify the sourcemap file, relative to that original file.
Directory structure:
index.html
js/
    all.js
    uncompiled/
        app.js
        app.js.map
        app.min.js
The index.html is a minimal page that loads either js/all.js or js/uncompiled/app.min.js. There is no other JS being baked into js/all.js (as this is a minimal example), but in theory there could be many files here. The purpose of this file is just to combine the various minified JS files into one larger file, yet still allow the developer to see the original code and set breakpoints accordingly.
Contents of app.js:
class App {
  constructor(name) {
    this.name = name;
  }
  sayHi() {
    window.alert("Hello " + this.name);
  }
}
new App("Colin").sayHi();
Then, running a simple minifier rebuilds that into app.min.js with a matching app.js.map file:
var App=function(a){this.name=a};App.prototype.sayHi=function(){window.alert("Hello "+this.name)};(new App("Colin")).sayHi();
//# sourceMappingURL=app.js.map
{
  "version":3,
  "file":"./app.min.js",
  "lineCount":1,
  "mappings":"AAAA,IAAMA,IAELC,QAAW,CAACC,CAAD,CAAO,CACjB,IAAAA,KAAA,CAAYA,CADK,CAIlB,IAAA,UAAA,MAAAC,CAAAA,QAAK,EAAG,CACPC,MAAAC,MAAA,CAAa,QAAb,CAAwB,IAAAH,KAAxB,CADO,CAKTC,EAAA,IAAIH,GAAJ,CAAQ,OAAR,CAAAG,OAAA;",
  "sources":["app.js"],
  "names":["App","constructor","name","sayHi","window","alert"]
}
And finally, that minified output is wrapped in eval, and the sourceURL param is added to the end (line breaks added for readability):
eval('var App=function(a){this.name=a};App.prototype.sayHi=function
(){window.alert("Hello "+this.name)};(new App("Colin")).sayHi();\n
//# sourceMappingURL=app.js.map\n//# sourceURL=uncompiled/app.min.js');
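The question doesn't say which tool builds js/all.js. A minimal Python sketch of that wrapping step, assuming each minified file has already been read into a string; `wrap_in_eval` and `build_bundle` are hypothetical names. It uses json.dumps to turn the code into a string literal that JavaScript can parse (double-quoted, rather than the single quotes shown above), which takes care of escaping quotes, backslashes and newlines.

```python
import json


def wrap_in_eval(minified_js, source_url):
    """Wrap one minified file in eval() and tag it with a //# sourceURL comment."""
    body = minified_js.rstrip("\n") + "\n//# sourceURL=" + source_url
    # A JSON string literal is also a valid JavaScript string literal.
    return "eval(" + json.dumps(body) + ");\n"


def build_bundle(files):
    """Concatenate (minified_js, source_url) pairs into one bundle such as js/all.js."""
    return "".join(wrap_in_eval(code, url) for code, url in files)
```

Because the sourceMappingURL comment stays inside the eval'd string, the browser still sees it when it evaluates each piece.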
If the index.html directly points to js/uncompiled/app.min.js, then both Firefox and Chrome correctly understand that app.js.map is in the same directory, and should be used when debugging. However, if index.html points instead to js/all.js, then while both browsers correctly show the eval'd contents in an individual file, only Firefox makes the path relative to all.js.
Serving this structure with python -m http.server shows these requests from Firefox:
127.0.0.1 - - [14/Jun/2019 08:33:37] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [14/Jun/2019 08:33:37] "GET /js/all.js HTTP/1.1" 200 -
127.0.0.1 - - [14/Jun/2019 08:33:38] code 404, message File not found
127.0.0.1 - - [14/Jun/2019 08:33:38] "GET /favicon.ico HTTP/1.1" 404 -
127.0.0.1 - - [14/Jun/2019 08:33:41] "GET /js/uncompiled/app.js.map HTTP/1.1" 200 -
127.0.0.1 - - [14/Jun/2019 08:33:41] "GET /js/uncompiled/app.js HTTP/1.1" 200 -
On the other hand, here is what Chrome attempts:
127.0.0.1 - - [14/Jun/2019 08:34:22] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [14/Jun/2019 08:34:22] "GET /js/all.js HTTP/1.1" 200 -
127.0.0.1 - - [14/Jun/2019 08:34:22] code 404, message File not found
127.0.0.1 - - [14/Jun/2019 08:34:22] "GET /uncompiled/app.js.map HTTP/1.1" 404 -
127.0.0.1 - - [14/Jun/2019 08:34:23] code 404, message File not found
127.0.0.1 - - [14/Jun/2019 08:34:23] "GET /favicon.ico HTTP/1.1" 404 -
Chrome appears to assume that the sourceURL within js/all.js is relative to index.html, while Firefox instead (correctly, from my perspective) interprets it as relative to all.js. I suggest that Firefox is correct, since this permits any HTML file to include that JS, at any path, and still have the sourcemaps loaded correctly.
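The two behaviours can be reproduced with ordinary relative-URL resolution: resolve the sourceURL against a base, then resolve sourceMappingURL against the result. The only difference is the base (the script URL vs. the page URL). This sketch assumes the page is served from http://127.0.0.1:8000/ (port 8000 is the default for python -m http.server); `resolve_map` is an illustrative name.

```python
from urllib.parse import urljoin

PAGE = "http://127.0.0.1:8000/"          # assumed page URL (index.html)
SCRIPT = urljoin(PAGE, "js/all.js")      # the bundle loaded by the page
SOURCE_URL = "uncompiled/app.min.js"     # from the //# sourceURL comment
MAP_URL = "app.js.map"                   # from the //# sourceMappingURL comment


def resolve_map(base):
    """Resolve sourceURL against `base`, then the source map URL against that."""
    generated = urljoin(base, SOURCE_URL)
    return urljoin(generated, MAP_URL)


firefox_like = resolve_map(SCRIPT)  # sourceURL relative to js/all.js
chrome_like = resolve_map(PAGE)     # sourceURL relative to the page
```

The two results correspond to the GET /js/uncompiled/app.js.map and GET /uncompiled/app.js.map requests in the logs above.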
Example sources, including two html files at different relative paths: https://github.com/niloc132/sourceurl-and-sourcemapping-url-relative-paths
From the spec (or the copy at https://sourcemaps.info/spec.html):
When the source mapping URL is not absolute, then it is relative to the generated code’s “source origin”. The source origin is determined by one of the following cases:
If the generated source is not associated with a script element that has a “src” attribute and there exists a //# sourceURL comment in the generated code, that comment should be used to determine the source origin. Note: Previously, this was “//@ sourceURL”, as with “//@ sourceMappingURL”, it is reasonable to accept both but //# is preferred.
If the generated code is associated with a script element and the script element has a “src” attribute, the “src” attribute of the script element will be the source origin.
If the generated code is associated with a script element and the script element does not have a “src” attribute, then the source origin will be the page’s origin.
If the generated code is being evaluated as a string with the eval() function or via new Function(), then the source origin will be the page’s origin.
js/all.js falls into the last case: its contents are evaluated with eval(), so the source origin will be the page's origin. So it would appear that Chrome is following the spec, even though that might seem counter-intuitive.
Development environment:
CentOS7
pip 18.1
Docker version 18.09.3, build 774a1f4
anaconda Command line client (version 1.7.2)
Python3.7
Scrapy 1.6.0
scrapy-splash
MongoDB(db version v4.0.6)
PyCharm
Server Specs:
CPU ->
processor: 22,
vendor_id: GenuineIntel,
cpu family: 6,
model: 45,
model name: Intel(R) Xeon(R) CPU E5-2430 0 @ 2.20GHz
RAM -> Mem: 31960
64 bit
Hello.
I'm a PHP developer, and this is my first Python project. I'm using Python because I heard that it has many benefits for web crawling.
I'm crawling a dynamic website, and I need to crawl around 3,500 pages every 5-15 seconds. Currently my crawler is too slow: it crawls only 200 pages per minute.
My source code looks like this:
main.py
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from spiders.bot1 import Bot1Spider
from spiders.bot2 import Bot2Spider
from spiders.bot3 import Bot3Spider
from spiders.bot4 import Bot4Spider
# run all four spiders in the same process (and the same Twisted reactor)
process = CrawlerProcess(get_project_settings())
process.crawl(Bot1Spider)
process.crawl(Bot2Spider)
process.crawl(Bot3Spider)
process.crawl(Bot4Spider)
process.start()  # blocks until all crawls have finished
bot1.py
import scrapy
import datetime
import math
from scrapy_splash import SplashRequest
from pymongo import MongoClient
from pprint import pprint
class Bot1Spider(scrapy.Spider):
    name = 'bot1'
    client = MongoClient('localhost', 27017)
    db = client.db
    def start_requests(self):
        # this bot takes (roughly) the first quarter of the games collection
        count = int(self.db.games.find().count())
        num = math.floor(count*0.25)
        start_urls = self.db.games.find().limit(num-1)
        for url in start_urls:
            # `domain` is not defined in the snippet shown; it must come from elsewhere
            full_url = domain + list(url.values())[5]
            yield SplashRequest(full_url, self.parse, args={'wait': 0.1}, meta={'oid': list(url.values())[0]})
    def parse(self, response):
        pass
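bot1 queries the first quarter of the games collection; bot2 to bot4 are not shown, but presumably each takes another quarter. Note that limit(num-1) takes one document fewer than a quarter, so if the next bot starts at num, one document per shard would be missed. Below is a hypothetical helper (not part of Scrapy or pymongo) that computes contiguous (skip, limit) pairs so every document lands in exactly one spider; each pair would then be used as `self.db.games.find().skip(skip).limit(limit)`.

```python
def shard_bounds(total, shards):
    """Split `total` documents into `shards` contiguous (skip, limit) ranges.

    The first `total % shards` ranges get one extra document, so every
    document is assigned exactly once.
    """
    base, extra = divmod(total, shards)
    bounds = []
    skip = 0
    for i in range(shards):
        limit = base + (1 if i < extra else 0)
        bounds.append((skip, limit))
        skip += limit
    return bounds
```

MongoDB treats limit(0) as "no limit", so a shard whose limit comes out as 0 should be skipped rather than queried.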
settings.py
BOT_NAME = 'crawler'
SPIDER_MODULES = ['crawler.spiders']
NEWSPIDER_MODULE = 'crawler.spiders'
# Scrapy Configuration
SPLASH_URL = 'http://localhost:8050'
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
# Crawl responsibly by identifying yourself (and your website) on the user-agent
USER_AGENT = 'my-project-name (www.my.domain)'
# Obey robots.txt rules
ROBOTSTXT_OBEY = False
# Configure maximum concurrent requests performed by Scrapy (default: 16)
CONCURRENT_REQUESTS = 64
# Configure a delay for requests for the same website (default: 0)
# See https://doc.scrapy.org/en/latest/topics/settings.html#download-delay
# See also autothrottle settings and docs
#DOWNLOAD_DELAY = 3
# The download delay setting will honor only one of:
CONCURRENT_REQUESTS_PER_DOMAIN = 16
To run this code, I use the command: python main.py
Please take a look at my code and help me; I'll gladly listen to any suggestions.
1. How can I make my spider faster? I've tried using threading, but it doesn't seem to work properly.
2. What is the best-performing setup for web crawling?
3. Is it possible to crawl 3,500 dynamic pages every 5-15 seconds?
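The gap behind question 3 can be quantified with the numbers from the question, plus Little's law (requests in flight = throughput x time per request) to estimate how much concurrency the target implies. The per-page time below is a hypothetical placeholder, not a measurement; the settings above cap in-flight requests at 64 overall and 16 per domain.

```python
# Numbers taken from the question.
CURRENT_PAGES_PER_MIN = 200
TARGET_PAGES = 3500
WINDOWS_S = (5, 15)          # crawl all pages every 5-15 seconds

current_rate = CURRENT_PAGES_PER_MIN / 60                 # about 3.3 pages/s
required_rates = [TARGET_PAGES / w for w in WINDOWS_S]    # 700 and about 233 pages/s
speedups = [r / current_rate for r in required_rates]     # about 210x and 70x


def in_flight_needed(rate_per_s, seconds_per_page):
    """Little's law: concurrent requests = throughput x time each request takes."""
    return rate_per_s * seconds_per_page


# HYPOTHETICAL time per page (Splash render plus 'wait': 0.1); measure the real value.
ASSUMED_SECONDS_PER_PAGE = 2.0
for rate in required_rates:
    needed = in_flight_needed(rate, ASSUMED_SECONDS_PER_PAGE)
    print(f"{rate:.0f} pages/s needs about {needed:.0f} requests in flight")
```

Under that assumption, the target needs hundreds to over a thousand concurrent renders, far beyond the 16-per-domain limit configured here.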
Thank you.
I am trying to connect to an ejabberd server using strophe.js, but I get the following error:
POST http://localhost/http-bind/ 404 (Not Found)
Strophe.Bosh._processRequest.sendFunc @ strophe.js:4614
Strophe.Bosh._processRequest @ strophe.js:4626
Strophe.Bosh._throttledRequestHandler @ strophe.js:4778
Strophe.Bosh._connect @ strophe.js:4177
Strophe.Connection.connect @ strophe.js:2335
$scope.login @ app.js:162
fn @ VM165:4
Ic.(anonymous function).compile.d.on.f @ angular.js:23411
$get.n.$eval @ angular.js:15916
$get.n.$apply @ angular.js:16016
(anonymous function) @ angular.js:23416
n.event.dispatch @ jquery-2.1.3.min.js:3
n.event.add.r.handle @ jquery-2.1.3.min.js:3
strophe.js:2784 7
I found that Skype sometimes uses port 80, which was assigned to the Apache server. This may prevent HTTP binding. After quitting Skype, it works fine.
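To check whether something is already listening on port 80 before starting Apache (for example, before and after quitting Skype), a minimal standard-library sketch is below; `port_is_listening` is an illustrative name. It only shows that some process accepts TCP connections on the port, not which one, so OS tools are still needed to identify the owner.

```python
import socket


def port_is_listening(host, port, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or unreachable: nothing usable is listening.
        return False
```

For example, `port_is_listening("127.0.0.1", 80)` returning True while Apache is stopped would point to another program holding the port.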