I am trying out scrapy's rules/link extractors.
The CSS under "restrict_css" is correct and I can retrieve the links using response.css in the Scrapy shell, but for some reason, when I run this in a spider under rules and link extractors, the parse_product callback function is not called.
rules = (
    Rule(LinkExtractor(restrict_css='a.i-next')),
    Rule(LinkExtractor(restrict_css='div.product-image-wrapper>a'),
         callback='parse_product'),
)

def parse_product(self, response):
    print("Print anything for testing")
    return
Thanks, any help would be appreciated.
Your start_urls match none of the rules. The first rule is for the next page and the second rule is for products, right? But your start URLs don't point to a product category, just the landing page, so the link extractors presumably find nothing to follow there. You need to either start directly from a product listing page URL, like: http://www.orsay.com/de-de/neuheiten/t-shirts/tops.html
Or add an additional rule to find the product listing pages.
I'm using VWO to do some split URL testing. Is there a query param you can pass in the URL so that a visitor isn't put into the A/B test and doesn't get redirected?
For example, let's say I'm running a split URL test on
mywebsite.com/a vs. mywebsite.com/b
I want to give someone a link to mywebsite.com/a without being included in the split URL test, ensuring that they actually get to mywebsite.com/a and not mywebsite.com/b
Is there some query param or some other way I can ensure this?
(example: mywebsite.com/a?vwo_testing=false)
You can exclude someone from the test by using a Query Parameter segmentation to exclude the query parameter. You can refer to the article below to create a Custom Visitor Segment for this.
https://vwo.com/knowledge/how-to-define-custom-visitor-segments/
Segmentation would look like -
Query Parameter: vwo_testing IS NOT EQUAL TO false
You can reach out to support@vwo.com in case you have any further queries.
In my use case I was able to go to https://vwo.com/opt-out/ and opt out of being redirected. Another option is to add ?vwo_opt_out=1 param to the url.
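For links that are handed out by hand, the opt-out parameter can be appended without clobbering an existing query string. This is a minimal Python sketch, assuming the vwo_opt_out=1 parameter behaves as described above; the helper name with_vwo_opt_out is made up for illustration and is not part of VWO.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def with_vwo_opt_out(url):
    """Return url with vwo_opt_out=1 added to its query string."""
    parts = urlsplit(url)
    # Keep existing parameters, dropping any previous vwo_opt_out value.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "vwo_opt_out"]
    query.append(("vwo_opt_out", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_vwo_opt_out("https://mywebsite.com/a"))
print(with_vwo_opt_out("https://mywebsite.com/a?ref=mail"))
```

The second call shows that an existing parameter (ref, a hypothetical one) is preserved and the opt-out flag is added after it.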
I'm trying to use Scrapy to collect the contact information of a university's professors from its directory. Since I can't post more than 2 links, I put all the links in the following picture.
I set "last name equals" in the drop-down menu as shown in the picture. Then I search for professors by last name.
On other universities' websites the URL usually follows some pattern. For this one, however, the original URL is (1), and it becomes (2) when I search for 'An' as the last name. It seems like 'An' is replaced by something like 529385FD5FF90A198625819E002B8B41, but I'm not sure. Is there any way I can get the URL that I need to send as a request? This time I searched for 'An'; if I search for another last name, like Lee, it will be a different request. The URLs are irregular and I can't find a pattern.
The scraper is not as complex as you think it is. It just makes a POST call from the form, and that returns a GET request. The following should work:
import scrapy
from scrapy.http import FormRequest
from scrapy.utils.response import open_in_browser

class univSpider(scrapy.Spider):
    name = "univ"
    start_urls = ["http://appl103.lsu.edu/dir003.nsf/(NoteID)/5903C096337C2AA28625819E0038E3E4?OpenDocument"]

    def parse(self, response):
        # Submit the directory search form with the last name to look up.
        yield FormRequest.from_response(
            response, formname="_DIRVNAM", formdata={"LastName": "Lalwani"},
            callback=self.search_result)

    def search_result(self, response):
        # Open the result page in a browser for inspection and dump the raw body.
        open_in_browser(response)
        print(response.body)
I'd like to know how to change pages on this page, possibly by submitting form values via POST:
http://www.aldoshoes.com/us/women/shoes/flats
How do I move to page 2 by putting some variables in the URL?
Please help me out.
This does not seem to be possible, because the server does not appear to handle GET parameters in the HTTP request.
However it is possible to achieve this using POST parameters, either programmatically or using a plug-in for Chrome/Firefox (see Super User question).
The server accepts the following parameters:
formAction:
startRow:1
rowsperPage:12
flagID:
seasonID:
dimensionID:
sizeID:
colorID:
colorGroupID:
itemCategoryLevel1:1100
itemCategoryLevel2:1101
itemCategoryLevel3:1120
itemCategoryLevel4:
styleID:
styleDescription:
keywords:
alternateDescription:
descriptionDetail1:
descriptionDetail2:
descriptionDetail3:
onsale:N
sortBy:
itemFamilyID:
heelHeight:
To get to the second page of the shop you can set the parameter startRow to 13:
startRow:13
rowsperPage:12
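The startRow arithmetic generalizes to any page: with 12 rows per page, page n starts at row (n - 1) * 12 + 1. Below is a minimal Python sketch that builds the POST body for a given page. It assumes the server accepts a request with the empty fields left out, which has not been verified; the function name page_form_data is made up for illustration.

```python
from urllib.parse import urlencode

ROWS_PER_PAGE = 12


def page_form_data(page, rows_per_page=ROWS_PER_PAGE):
    """Build the POST fields for a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return {
        # Page 1 starts at row 1, page 2 at row 13, and so on.
        "startRow": (page - 1) * rows_per_page + 1,
        "rowsperPage": rows_per_page,
        # Category values copied from the parameter list above.
        "itemCategoryLevel1": 1100,
        "itemCategoryLevel2": 1101,
        "itemCategoryLevel3": 1120,
        "onsale": "N",
    }


# URL-encoded body that could be sent as the POST payload for page 2.
body = urlencode(page_form_data(2))
print(body)
```

The resulting string can be passed as the request body by any HTTP client, with the content type application/x-www-form-urlencoded.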
The create action shows the form:
def create = Action {
  Ok(html.post.create(postForm))
}
How can I modify this action so that for a GET request it returns the form, and for a POST request it processes the user's input data, as this separate action does:
def newPost = Action { implicit request =>
  postForm.bindFromRequest.fold(
    errors => BadRequest(views.html.index(Posts.all(), errors)),
    label => {
      Posts.create(label)
      Redirect(routes.Application.posts)
    }
  )
}
What I mean is that I want to combine these two actions.
UPDATE1: I want a single Action that serves GET and POST requests
It is recommended not to merge the two actions, but to modify the routes to get the behavior you are expecting. For instance:
GET /create controllers.Posts.create
POST /create controllers.Posts.newPost
In case you have several kinds of resources (posts and comments, for instance), just add
a prefix to the path to disambiguate:
GET /post/create controllers.Posts.create
POST /post/create controllers.Posts.newPost
GET /comment/create controllers.Comments.create
POST /comment/create controllers.Comments.newComment
I once tried to accomplish something similar, but I realized that I wasn't using the framework the way it was meant to be used. Use separate GET and POST methods as @paradigmatic showed. For cases like the one you specified ("If we take adding comments to another action, we wouldn't be able to get information on post and comments in case an error occurred (avoiding copy-paste code)."), just render the page at the end of the controller method with the view you like. For errors etc. you can always use the flash scope too, see http://www.playframework.org/documentation/2.0.2/ScalaSessionFlash
You could also render the form page with two or more beans and send them to the controller side to catch the related error messages and data.
I am developing a website using Zend Framework.
I have a search form that uses the GET method. When the user clicks the submit button, the query string appears in the URL after the ? mark, but I want it to be a Zend-like URL.
Is that possible?
As well as the JS approach you can do a redirect back to the preferred URL you want. I.e. let the form submit via GET, then redirect to the ZF routing style.
This is, however, overkill unless you have a really good reason to want to create neat URLs for your search queries. Generally speaking a search form should send a GET query that can be bookmarked. And there's nothing wrong with ?param=val style parameters in a URL :-)
ZF URLs are a little odd in that they force URL parameters to be part of the main URL, i.e. domain.com/controller/action/param/val/param2/val rather than domain.com/controller/action?param=val&param2=val
This isn't always what you want, but seems to be the way frameworks are going with URL parameters
There is no obvious solution. The form generated by ZF will be a standard HTML one. When submitted from the browser using GET, it will result in a request like
/action/specified/in/form?var1=val1&var2=val2
The only way to get a "Zend-like URL" (one with / instead of ? or &) would be to hack the form submission using JavaScript. For example, you can listen for onSubmit, abort the submission, and instead redirect the browser to a translated URL. I personally don't believe this solution is worth the added complexity, but it should do what you're looking for.
After raging against this for a day and a half, and doing my best to figure out the right way to do this fairly simple thing, I gave up and did the following. I still can't believe there's not a better way.
The use case that necessitates this is a simple record listing, with a form up top for adding some filters (via GET), maybe some column sorting, and Zend_Paginate thrown in for good measure. I ran into issues using the Url view helper in my pagination partial, but I suspect with even just sorting and a filter-form, Zend_View_Helper_Url would still fall down.
But I digress. My solution was to add a method to my base controller class that merges any raw query-string parameters with the existing zend-style slashy-params, and redirects (but only if necessary). The method can be called in any action that doesn't have to handle POSTs.
Hopefully someone will find this useful. Or even better, find a better way:
/**
 * Translate standard URL parameters (?foo=bar&baz=bork) to zend-style
 * param (foo/bar/baz/bork). Query-string style
 * values override existing route-params.
 */
public function mergeQueryString(){
    if ($this->getRequest()->isPost()){
        throw new Exception("mergeQueryString only works on GET requests.");
    }
    $q = $this->getRequest()->getQuery();
    $p = $this->getRequest()->getParams();
    if (empty($q)) {
        //there's nothing to do.
        return;
    }
    $action = $p['action'];
    $controller = $p['controller'];
    $module = $p['module'];
    unset($p['action'],$p['controller'],$p['module']);
    $params = array_merge($p,$q);
    $this->_helper->getHelper('Redirector')
        ->setCode(301)
        ->gotoSimple(
            $action,
            $controller,
            $module,
            $params);
}
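To make the merge rule easier to see in isolation, here is the same idea sketched in Python rather than PHP. This is only an illustration of the URL translation, not Zend code; the function name merge_to_slash_path and the sample route and parameters are made up.

```python
from urllib.parse import parse_qsl, quote


def merge_to_slash_path(route, route_params, query_string):
    """Merge query-string values over route params and emit a slash-style path."""
    merged = dict(route_params)
    # Query-string values win over existing route params, like array_merge($p, $q).
    # parse_qsl drops blank values by default, which avoids empty path segments.
    merged.update(parse_qsl(query_string))
    segments = [route.strip("/")]
    for key, value in merged.items():
        # Percent-encode each piece so a "/" inside a value cannot split a segment.
        segments += [quote(key, safe=""), quote(value, safe="")]
    return "/" + "/".join(segments)


print(merge_to_slash_path("/records/list",
                          {"page": "2", "sort": "name"},
                          "sort=date&q=foo"))
```

Here the existing sort route param is overridden by the query string, and q is appended as a new pair, which is the path the redirect would target.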