How to crawl data from encrypted url? - forms

I'm trying to use scrapy to collect the university's professors' contact information from its directory. Since I can't post more than 2 links, I put all links in the following picture.
I set last name equals from the drop-down menu as shown in the picture. Then I search all professors by last name.
Usually the URL follows some pattern, as on other universities' websites. For this one, however, the original URL is (1), and it becomes (2) when I search 'An' as the last name. It seems that 'An' is replaced by something like 529385FD5FF90A198625819E002B8B41, but I'm not sure. Is there any way to get the URL that I need to send as a request? This time I searched 'An'; if I search another last name, like Lee, it will be a different request. The URLs are irregular and I can't find a pattern.

The scraper is not as complex as you think it is. The form just makes a POST request, and the server answers by sending you on to the GET URL you saw. The code below should work:
import scrapy
from scrapy import FormRequest
from scrapy.utils.response import open_in_browser


class univSpider(scrapy.Spider):
    name = "univ"
    start_urls = ["http://appl103.lsu.edu/dir003.nsf/(NoteID)/5903C096337C2AA28625819E0038E3E4?OpenDocument"]

    def parse(self, response):
        # Fill in the search form found on the page and submit it.
        yield FormRequest.from_response(
            response,
            formname="_DIRVNAM",
            formdata={"LastName": "Lalwani"},
            callback=self.search_result,
        )

    def search_result(self, response):
        # Inspect the result page while developing the spider.
        open_in_browser(response)
        print(response.body)
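
The point of the answer is that only the LastName form field changes between searches, not the URL the form is posted to. A minimal standard-library sketch of that idea follows. It assumes a placeholder form URL (FORM_ACTION and build_search_request are hypothetical names, not part of the site or of Scrapy) and ignores any hidden fields that FormRequest.from_response would copy from the page:

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: placeholder for the form's action URL, which the question does not show.
FORM_ACTION = "http://example.com/directory/search"


def build_search_request(last_name):
    """Build (but do not send) a POST request for one last name."""
    body = urlencode({"LastName": last_name}).encode("ascii")
    return Request(FORM_ACTION, data=body, method="POST")


# One request per last name: same URL every time, different form body.
search_requests = [build_search_request(n) for n in ("An", "Lee", "Lalwani")]
for req in search_requests:
    print(req.get_method(), req.full_url, req.data.decode("ascii"))
```

In the spider itself, the same effect would come from yielding one FormRequest.from_response per last name inside parse.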


VWO Split URL test no redirect link

I'm using VWO to do some split URL testing. Is there a query param you can pass in the URL so that the visitor doesn't get put into the A/B test and doesn't get redirected?
For example, let's say I'm running a split URL test on
mywebsite.com/a vs. mywebsite.com/b
I want to give someone a link to mywebsite.com/a without being included in the split URL test, ensuring that they actually get to mywebsite.com/a and not mywebsite.com/b
Is there some query param or some other way I can ensure this?
(example: mywebsite.com/a?vwo_testing=false)
You can exclude someone from the test by adding a Query Parameter condition to the test's segmentation, so that visitors whose URL carries the parameter are left out. The article below explains how to create a Custom Visitor Segment for this.
https://vwo.com/knowledge/how-to-define-custom-visitor-segments/
The segmentation would look like:
Query Parameter: vwo_testing IS NOT EQUAL TO false
You can reach out to support@vwo.com if you have any further questions.
In my use case I was able to go to https://vwo.com/opt-out/ and opt out of being redirected. Another option is to add ?vwo_opt_out=1 param to the url.

Call back not being called in scrapy

I am trying out Scrapy's rules and link extractors.
The CSS under "restrict_css" is correct and I can retrieve the links using response.css in the Scrapy shell, but for some reason, when I run this in a spider under rules and link extractors, the parse_product callback function is not called.
rules = (
    Rule(LinkExtractor(restrict_css='a.i-next')),
    Rule(LinkExtractor(restrict_css='div.product-image-wrapper>a'),
         callback='parse_product'),
)

def parse_product(self, response):
    print("Print anything for testing")
    return
Thanks, any help would be appreciated.
Your start_urls match none of the rules. The first rule is for the next page and the second rule is for products, right? But your start URL doesn't point to a product category, just the landing page. You either need to start directly from a product listing page URL, such as http://www.orsay.com/de-de/neuheiten/t-shirts/tops.html
or add an additional rule to find product listing pages.

Gatling / Scala remove Vector from string value in POST request

I'm trying to send a POST request within a Gatling test.
Two values have to be sent: the first is extracted from my page content, the second is hardcoded.
My issue is that when I extract a value from my page content, the string submitted in my POST request is polluted with a "Vector()" wrapper.
Here is my scenario and how my variable is extracted:
val dossier = exec(http("Content creation - Extract vars")
    .get("/node/add/dossier")
    .check(css("""input[name="form_token"]""", "value").findAll.saveAs("form_token_node"))
    .headers(headers_0))
  .pause(2)
  .exec(http("Content creation")
    .post("/node/add/dossier")
    .headers(headers_1)
    .formParam("form_token", "${form_build_id_node}")
    .formParam("form_id", "node_dossier_form")
    .check(status.is(303))
  )
And here is what the data looks like when it is sent in the POST request:
form_token: Vector(HciBSyvuZ14NIj9HHuebgHYc06gL62B0iKAQ-E-KhvA)
form_id: node_dossier_form
As you can see, the form_token variable should not look like this at all; it breaks the form submission because the token is invalid.
So my question is, how do I get rid of the Vector() part of the string?
And the answer is to use ${form_build_id_node(0)} instead of ${form_build_id_node} to access the value. Thanks to sschaef.
The issue here is in how the attribute is saved.
You used .findAll.saveAs, which saves a list containing all the occurrences.
If you want to pass only the first occurrence, it should be
.check(css("""input[name="form_token"]""","value").saveAs("form_token_node"))
instead of
.check(css("""input[name="form_token"]""","value").findAll.saveAs("form_token_node"))
If you are going to use foreach or repeat to get more values, you can keep .findAll.saveAs, which stores a list, and add logic to iterate over the session attribute, for example
${form_build_id_node(i)} in your scenario

Play Framework sending POST request data to GET

How would I send request data from a POST request to a GET request using Scala Play? (Using Play Framework 2.1.1.)
My goal is to have a single page "Reports" that works like this: The report is a GET request, so if needed you could bookmark this report. The report consists of a table of models, and each model row has a delete button at the end. I want to click the delete button, have it post the id to my controller then reload the page with the reports filter parameters still on.
Currently I have the delete button just adding to the get, which works correctly but the remove=id parameter stays in the request/address bar. Therefore it tries to delete this model every time the page is reloaded. What I would like to do is have this form POST and then remove the model, then send all the request parameters other than remove to a GET request.
I would rather do this without javascript/AJAX.
You could reconstruct the URL from the request object's queryString and path, then redirect back to it (without the delete parameter):
How to get query string parameters in java play framework?
Or, if the call is set up in the routes file to parse the parameters out, use the reverse route minus the delete parameter:
Play Framework - Redirect with params
http://www.mariussoutier.com/blog/2012/12/10/playframework-routes-part-1-basics/
def index() = Action { request =>
  // "del" is the query parameter to drop; every other parameter is kept.
  val allWithoutDel = request.queryString - "del"
  println(allWithoutDel)
  // Redirect re-encodes the remaining parameters into the target URL.
  // Do this only after handling the delete, or the action redirects to itself forever.
  Redirect(request.path, allWithoutDel)
}
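
The "drop one parameter, then rebuild the URL" step is independent of Play, so here is a small language-neutral sketch of it in Python using only the standard library. The helper name without_param and the sample report URL are hypothetical; only the remove parameter name comes from the question:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def without_param(url, name):
    """Return url with every occurrence of query parameter `name` removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != name
    ]
    # Re-encode the remaining parameters and put the URL back together.
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Hypothetical report URL with filter parameters plus the one-off remove=42.
print(without_param("/reports?from=2013-01-01&status=open&remove=42", "remove"))
```

The POST handler would delete the model first and then redirect to the URL produced this way, so reloading the report never repeats the delete.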

What is a REST response, what does it do?

I made a REST web service that works by redirecting to various paths. For example, if I need to delete a user, I redirect to the address given in the @Path annotation:
user/delete
so I have never used anything like a Response.
While going through code given to me by my senior, I came across these lines:
java.net.URI uri = uriInfo.getAbsolutePathBuilder().path(id).build();
Response.created(uri).build();
I have no idea what these lines are doing.
Can someone explain this to me without wiki links or any other 'Basic REST Service' links?
Without explicit details about the uriInfo object, I can only assume its type is the JAX-RS UriInfo class.
The first line can be broken down as follows:
java.net.URI uri = uriInfo.getAbsolutePathBuilder().path(id).build();
getAbsolutePathBuilder is documented at http://jackson.codehaus.org/javadoc/jax-rs/1.0/javax/ws/rs/core/UriInfo.html#getAbsolutePathBuilder%28%29
The method returns a UriBuilder object, on which the path(...) method is called with the id. So if the absolute path is http://www.host.com (this may or may not include a port number), the builder now effectively holds two parts: the base URI and the path. The two values have not yet been put together.
The build method then concatenates the two values, resulting in a full URI, for example http://www.google.com/id (where http://www.google.com is the absolute path).
The second line
Response.created(uri).build();
is basically saying 'Respond with a created (201) status code, and set a Location header containing the built URI value'.
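
To make the effect of the two lines concrete, here is a rough Python sketch of what they produce. It is not JAX-RS code: created_response is a hypothetical helper, and the /user base path and the id 42 are made-up values for illustration:

```python
from urllib.parse import quote


def created_response(absolute_path, new_id):
    """Return the status code and headers of a '201 Created' style response."""
    # Roughly mirrors getAbsolutePathBuilder().path(id).build():
    # append the id as a path onto the current absolute path.
    location = absolute_path.rstrip("/") + "/" + quote(new_id)
    # Roughly mirrors Response.created(uri).build():
    # status 201 plus a Location header pointing at the new resource.
    return 201, {"Location": location}


status, headers = created_response("http://www.host.com/user", "42")
print(status, headers["Location"])
```

A client that creates a resource can then read the Location header to find out where the new resource lives.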