How to use CSS selector and XPath extractor in Locust?

I have previously used JMeter's CSS Selector Extractor and XPath Extractor post-processors to retrieve a CSRF token.
Is there any way to do the same in Locust?
I want to fetch the token from a value attribute.

You can use lxml for XPath:
from lxml import html
...
@task
def my_task(self):
    response = self.client.post(...)
    tree = html.fromstring(response.text)
    # <div title="buyer-name">Carson Busses</div>
    # <span class="item-price">$29.95</span>
    buyers = tree.xpath('//div[@title="buyer-name"]/text()')
(example from https://docs.python-guide.org/scenarios/scrape/)
Another option is Python's built-in HTML parser (https://docs.python.org/3/library/html.parser.html), but that is probably more complicated for this use case.
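For completeness, a minimal sketch of that approach, assuming the token sits in a hidden input named __RequestVerificationToken (that field name is an assumption, adjust it to your page):
from html.parser import HTMLParser

class TokenParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # assumed field name; adjust to whatever hidden input your page uses
        if tag == "input" and attrs.get("name") == "__RequestVerificationToken":
            self.token = attrs.get("value")

parser = TokenParser()
parser.feed(response.text)  # response comes from self.client.get/post inside a task
token = parser.token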
You could also use a regular expression for a quick and dirty solution:
import re
message_regex = re.compile(r"...")  # use a regex that actually matches your desired text
...
@task
def my_task(self):
    response = self.client.post(...)
    m = message_regex.match(response.text)  # use search() instead if the pattern does not start at the beginning of the body
    my_value = m.group(1)

I have used PyQuery to fetch the value attribute; it accepts CSS selectors, which it translates to XPath internally via lxml.
First we need to import PyQuery:
from pyquery import PyQuery as pq
Selecting the form by id:
response = self.client.get("/edtwrkkd?id=251612")
tree = pq(response.text)
form = tree("#formTest")
self.Token = form("input[name='__RequestVerificationToken']").val()
Or selecting the form by its method attribute instead:
form = tree("form[method='post']")
self.Token = form("input[name='__RequestVerificationToken']").val()
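Putting the snippets above together into a complete Locust task, a minimal sketch might look like this; the POST target and the form fields sent back are assumptions, so adjust them to your application:
from locust import HttpUser, task
from pyquery import PyQuery as pq

class CsrfUser(HttpUser):
    @task
    def submit_form(self):
        # fetch the page that embeds the hidden CSRF input (path taken from the answer above)
        response = self.client.get("/edtwrkkd?id=251612")
        token = pq(response.text)("input[name='__RequestVerificationToken']").val()
        # send the token back with the form post (target path is an assumption)
        self.client.post("/edtwrkkd", data={"__RequestVerificationToken": token})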

Related

How to write back to a dataframe using transform_df in Palantir Foundry?

I created a library for updating the descriptions of the columns of an input dataset. The function takes three parameters (input_dataset, output_dataset, config file) and writes the descriptions back to the output dataset. We now want to import this library across various use cases. How do I handle the cases where the Spark transformation takes its input through @transform_df, since there we can't assign to the output variable ourselves? In that situation, how can I call my description-library function in Palantir Foundry? Any suggestions?
This isn't currently supported with the @transform_df decorator; you'll have to use the @transform decorator for the moment.
The reasoning behind this comes from recognizing the need for broader access to metadata APIs, which the @transform decorator already allows. It seemed more in line with that pattern to keep such access there, since the @transform_df decorator is inherently higher-level.
You can always simply move your transformation from...
from transforms.api import transform_df, Input, Output

@transform_df(
    Output("/my/output"),
    my_input=Input("/my/input"),
)
def my_compute_function(my_input):
    df = my_input
    # ... logic ...
    return df
...to...
from transforms.api import transform, Input, Output

@transform(
    my_output=Output("/my/output"),
    my_input=Input("/my/input"),
)
def my_compute_function(my_input, my_output):
    df = my_input.dataframe()
    # ... logic ...
    my_output.write_dataframe(df)
...in which only a few lines need to change.
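As an illustration only: assuming the question's description library exposes a helper like update_column_descriptions(input, output, config) (both the helper name and the config path below are hypothetical), the call would then live inside the @transform compute function, which has direct access to the output object:
from transforms.api import transform, Input, Output
from description_lib import update_column_descriptions  # hypothetical import

@transform(
    my_output=Output("/my/output"),
    my_input=Input("/my/input"),
)
def my_compute_function(my_input, my_output):
    df = my_input.dataframe()
    # ... logic ...
    my_output.write_dataframe(df)
    # @transform hands you the output object itself, so the helper can
    # attach column descriptions to it (helper name and config are assumptions)
    update_column_descriptions(my_input, my_output, "description_config.yaml")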

Parse XML with multiple rowTags using Spark

I want to parse XML using Spark, so I am using the Databricks spark-xml library. A sample of the XML is as follows:
<Transactions>
<Transaction>
<transid>1111</transid>
</Transaction>
<Transaction>
<transid>2222</transid>
</Transaction>
</Transactions>
<Payments>
<Payment>
<Id>123</Id>
</Payment>
<Payment>
<Id>456</Id>
</Payment>
</Payments>
Code to parse:
val transNestedDF = sqlContext.read.format("com.databricks.spark.xml").option("rowTag","Transactions").load("trans_nested.xml")
transNestedDF.registerTempTable("TransNestedTbl")
sqlContext.sql("select Transaction[0].transid from TransNestedTbl").collect()
There is no root tag here, and I can't define multiple rowTags, so if I have to process both transactions and payments in a single read using the single dataframe above, how can I achieve that?
Any help is appreciated.
Let's try this with lxml, a Python library which itself uses XPath.
If you don't have it installed, you need to:
pip install lxml
then:
import lxml.html

pay = """ [your code above] """
doc = lxml.html.fromstring(pay)
# lxml.html lowercases tag names on parse, hence the .lower() calls;
# use '//Transactions//transid' instead, depending on the structure of the original doc
tid = doc.xpath('Transactions//transid'.lower())
pid = doc.xpath('Payments//id'.lower())  # same comment
final = ''
for i in tid:
    for p in pid:
        final = final + i.text + '|' + p.text + ' \n'
print(final)
Output:
1111|123
1111|456
2222|123
2222|456
You can't do it in one read if there is no tag wrapping both of these. If there is any common parent tag, you can use that as rowTag and ignore the rest of what is parsed.
You can of course read them separately into two DataFrames (as sketched below). That works fine if you treat them separately, but you lose the association between transactions and payments unless you can join on some ID.
But then I'd wonder why the XML structure has no common parent if these records are associated.
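For what it's worth, here is a hedged PySpark sketch of the two-read approach (the question's snippet is Scala, but the options are identical); the file path comes from the question, and the cross join merely reproduces the lxml output above since the sample has no shared key:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# spark-xml accepts only one rowTag per read, so read each record type separately
transactions = (spark.read.format("com.databricks.spark.xml")
                .option("rowTag", "Transaction")
                .load("trans_nested.xml"))
payments = (spark.read.format("com.databricks.spark.xml")
            .option("rowTag", "Payment")
            .load("trans_nested.xml"))

transactions.createOrReplaceTempView("TransTbl")
payments.createOrReplaceTempView("PayTbl")

# without a real join key the tables stay unrelated; join on an ID column if one exists
spark.sql("SELECT t.transid, p.Id FROM TransTbl t CROSS JOIN PayTbl p").show()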

Programmatically return a list of all functions

I want to programmatically get a list of available functions in the current MATLAB namespace, as well the available functions in a package. How can this be done?
We can use package metadata for this:
pkgs = meta.package.getAllPackages();
% Or if the specific package name is known:
mp = meta.package.fromName('matlab')
The cell array returned in the 1st case, pkgs, contains objects such as this:
package with properties:
Name: 'Sldv'
Description: ''
DetailedDescription: ''
ClassList: [29×1 meta.class]
FunctionList: [8×1 meta.method]
PackageList: [9×1 meta.package]
ContainingPackage: [0×0 meta.package]
So all that's left to do is iterate through the packages and sub-packages and collect their FunctionList entries.
I'm not sure how to get the functions that belong to the "default" namespace, other than by parsing the function list doc page, for example using the Python API and BeautifulSoup:
fl = arrayfun(@(x)string(x{1}.string.char), py.bs4.BeautifulSoup( ...
    fileread(fullfile(docroot,'matlab','functionlist-alpha.html')), ...
    'html.parser').find_all("code")).';
Further to Dev-iL's answer, parsing the function list documentation web page is pretty easy because of the useful "function" class that the web devs have (currently) used to tag each function name with! Each function looks like this within the HTML:
<code class="function">accumarray</code>
So we can use urlread to grab the source, and regular expressions to strip out the inner text of each "function" class item:
str = urlread('https://mathworks.com/help/matlab/functionlist-alpha.html');
funcs = regexp( str, '(?<="function">)[0-9A-Za-z.]+', 'match' );
Note: "alpha" in the URL is for "alphabetical" rather than to denote early testing!
funcs is a cell array with all the function names on that page.
The page used above is for the most recent MATLAB version. For a specific version, use the historic documentation pages structured like so:
https://mathworks.com/help/releases/R2017b/matlab/functionlist.html
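If you would rather do the scraping in plain Python instead of going through MATLAB's Python API, a small sketch with requests and BeautifulSoup does the same job; it assumes the page still tags each name with class="function", as noted above:
import requests
from bs4 import BeautifulSoup

# most recent release; for older ones use the /help/releases/<version>/ form quoted above
url = "https://mathworks.com/help/matlab/functionlist-alpha.html"
soup = BeautifulSoup(requests.get(url).text, "html.parser")

# each function name is wrapped in <code class="function">...</code>
funcs = [code.get_text(strip=True) for code in soup.find_all("code", class_="function")]
print(len(funcs), funcs[:10])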

XPath - /a/text(), can't extract email address (text)

I have a simple HTML file with usernames and links to their sub-pages; the anchor text is either an email address or a plain username:
someUserName@domain.com
someUserName
I use
xpath('.//a/text()').extract_first()
to extract the user name as plain text.
The problem is that when a user specifies the username in the form of an email (see the first example), an empty object is returned.
Edit: I have just noticed the HTML has changed recently and I hadn't rechecked:
<td><span class="__cf_email__" data-cfemail="3f4d565c544c5e514bwer4rwre58525e5653115c5052">[email protected]</span></td>
I'll extract the address from @href instead.
I have used the following code:
import scrapy

inputString = '''<xmlData>
<a>someUserName@domain.com</a>
<a>someUserName</a>
</xmlData>'''
print(scrapy.selector.Selector(text=inputString).xpath('.//a/text()').extract_first())
Output:
someUserName@domain.com
Can you paste the full Python code? The XPath itself seems to work fine:
scrapy.selector.Selector(text=inputString).xpath('.//a/text()').extract_first()
Getting the text node children of an element (using text()) is generally discouraged, for exactly the reasons demonstrated here. With <a>content</a> you will get "content", with <a><span>content</span></a> you will get nothing, and with <a>h<sub>2</sub>o</a> you will get two text nodes, "h" and "o".
Use string() to get the string value instead. The string value contains the concatenated content of all the descendant text nodes at any depth ("content", "content", and "h2o" in these three examples).
My only reservation is that I don't know the Scrapy API, so I don't know how it handles XPath expressions that return strings rather than nodes.
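For reference, Scrapy's selectors (built on parsel) do accept string-valued XPath expressions, so the string() approach works there too; a small sketch, with made-up markup that wraps the address in a nested <span>:
from scrapy.selector import Selector

html_snippet = '<td><a href="/user/1"><span>someUserName@domain.com</span></a></td>'

# string(.//a) concatenates all descendant text nodes, so the nested <span>
# no longer hides the value the way .//a/text() does
value = Selector(text=html_snippet).xpath('string(.//a)').extract_first()
print(value)  # someUserName@domain.com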

Gatling: check if an HTML result contains some string

Writing a Gatling performance test, I need to check whether the HTML returned from the server contains a predefined string. If it does, break the test with an error.
I did not find out how to do it. It should be something like this:
val scn = scenario("CheckAccess")
.exec(http("request_0")
.get("/")
.headers(headers_0)
.check(css("h1").contains("Access denied")).breakOnFailure()
)
I have called the wished-for features "contains" and "breakOnFailure". Does Gatling offer something similar?
Better solutions:
with one single CSS selector:
.check(css("h1:contains('Access denied')").notExists)
with substring:
.check(substring("Access denied").notExists)
Note: if what you're looking for only occurs in one place in your response payload, substring is certainly more efficient, as it doesn't have to parse the body into a DOM.
Here is the solution:
.check(css("h1").transform((s: String) => s.indexOf("Access denied"))
.greaterThan(-1)).exitHereIfFailed
You can write it very simple like:
.check(css("h1", "Access denied").notExists)
If you are not sure about H1 you can use:
.check(substring("Access denied").notExists)
IMO the server should respond with a proper status, thus:
.check(status.not(403))
Enjoy and see http://gatling.io/docs/2.1.7/http/http_check.html for details
EDIT:
My usage of the CSS selector is wrong; see Stephane Landelle's solution with CSS.
I'm using substring way most of the time :)