PyTest Mark some parameters as slow but not others [duplicate] - pytest

I have been trying to parametrize my tests using @pytest.mark.parametrize, and I have a marker @pytest.mark.test("1234") whose value I use to post the results to JIRA. Note that the value given for the marker changes for every set of test data. Essentially the code looks something like this:
@pytest.mark.foo
@pytest.mark.parametrize(("n", "expected"), [
    (1, 2),
    (2, 3)])
def test_increment(n, expected):
    assert n + 1 == expected
I want to do something like
@pytest.mark.foo
@pytest.mark.parametrize(("n", "expected"), [
    (1, 2, pytest.mark.test("T1")),
    (2, 3, pytest.mark.test("T2"))
])
How can I add the marker when using parametrized tests, given that the value of the marker changes with each test?

It's explained here in the documentation: https://docs.pytest.org/en/stable/example/markers.html#marking-individual-tests-when-using-parametrize
To show it here as well, it'd be:
@pytest.mark.foo
@pytest.mark.parametrize(("n", "expected"), [
    pytest.param(1, 2, marks=pytest.mark.T1),
    pytest.param(2, 3, marks=pytest.mark.T2),
    (4, 5)
])
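If, as in the question, the marker has to carry a per-row JIRA id, the same pytest.param mechanism should work. A minimal sketch, assuming a custom "test" marker is registered in pytest.ini so it does not trigger an unknown-mark warning:
@pytest.mark.foo
@pytest.mark.parametrize(("n", "expected"), [
    # marks= accepts a single mark or a list of marks; the argument "T1"
    # travels with the mark and can be read back from the test item later
    pytest.param(1, 2, marks=pytest.mark.test("T1")),
    pytest.param(2, 3, marks=pytest.mark.test("T2")),
])
def test_increment(n, expected):
    assert n + 1 == expected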

Related

Get networkx filtered nodes from subgraph_view of filtered edges

I've created a subgraph_view by applying a filter to edges. When I call nodes() on the subgraph it still shows me all nodes, even if none of the edges use them. I need to get a list of only nodes that are still part of the subgraph.
G = nx.path_graph(6)
G[2][3]["cross_me"] = False
G[3][4]["cross_me"] = False
def filter_edge(n1, n2):
    return G[n1][n2].get("cross_me", True)
view = nx.subgraph_view(G, filter_edge=filter_edge)
# node 3 is no longer used by any edges in the subgraph
view.edges()
This produces
EdgeView([(0, 1), (1, 2), (4, 5)])
as expected. However, when I run view.nodes() I get
NodeView((0, 1, 2, 3, 4, 5))
What I expect to see is
NodeView((0, 1, 2, 4, 5))
This seems odd. Is there some way to extract only the nodes used by the subgraph?
The confusion stems from the definition of 'graph.' A disconnected node is still a part of a graph. In fact, you could have a graph with no edges at all. So the behavior of subgraph_view() is counterintuitive but correct.
If, however, you still want to achieve what you're describing, there are lots of potential ways, depending on your tolerance for modifying the original graph. I'll mention two that attempt to stay as close to your current method as possible and avoid deleting edges or nodes from G.
Method 1
The easiest way using your view object is to take it as input to edge_subgraph() (which only takes edges as input) like this:
final_view = view.edge_subgraph(view.edges())
final_view.nodes()
gives
NodeView((0, 1, 2, 4, 5))
Method 2
To me, Method 1 seems clunky and confusing because it defines an intermediate view. If instead we go back up a little and start from G, we could define a filter_node function that checks the edge attributes of each node and filters that node out if
all edges are flagged for removal, or
the node has no edges in the first place.
You could also do this by manually flagging the node itself, as you've done with the edges.
G = nx.path_graph(6)
G[2][3]["cross_me"] = False
G[3][4]["cross_me"] = False
def filter_edge(n1, n2):
    return G[n1][n2].get("cross_me", True)
def filter_node(n):
    return sum([i[2].get("cross_me", True) for i in G.edges(n, data=True)])
view = nx.subgraph_view(G, filter_node=filter_node, filter_edge=filter_edge)
view.nodes()
also gives the expected
NodeView((0, 1, 2, 4, 5))
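For completeness, one more option (my own sketch, not part of the original answer), starting from the edge-only view of Method 1: collect the endpoints that still appear in the filtered edge view and take the induced subgraph of just those nodes.
edge_view = nx.subgraph_view(G, filter_edge=filter_edge)
used_nodes = {n for edge in edge_view.edges() for n in edge}  # endpoints of surviving edges
edge_view.subgraph(used_nodes).nodes()  # NodeView((0, 1, 2, 4, 5))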

Reuse previous example in pytest parametrize

Consider a parametrized pytest test, which reuses the same complex
example a number of times. To keep the sample code as simple as possible,
I have simulated the 'complex' examples with very long integers.
from operator import add
from pytest import mark
parm = mark.parametrize
@parm(' left, right, result',
((9127384955, 1, 9127384956),
(9127384955, 2, 9127384957),
(9127384955, 3, 9127384958),
(9127384955, 4, 9127384959),
(4729336234, 1, 4729336235),
(4729336234, 2, 4729336236),
(4729336234, 3, 4729336237),
(4729336234, 4, 4729336238),
))
def test_one(left, right, result):
    assert add(left, right) == result
The first four (and the next four) examples use exactly the same value for left but:
I have to read the examples carefully to realize this
This repetition is verbose
I would like to make it absolutely clear that exactly the same example is being reused, and save myself the need to repeat the same example many times. (Of course, I could bind the example to a global variable and use that variable, but that variable would have to be bound at some distant point outside my collection of examples, and I want to see the actual example in the context in which it is used, i.e. near the other values used in this particular set, rather than having to look for it elsewhere.)
Here is an implementation which allows me to make this explicit using a syntax that I find perfectly acceptable, but the implementation itself is horrible: it uses a global variable and doesn't stand a chance of working with distributed test execution.
class idem: pass
@parm(' left, right, result',
((9127384955, 1, 9127384956),
( idem , 2, 9127384957),
( idem , 3, 9127384958),
( idem , 4, 9127384959),
(4729336234, 1, 4729336235),
( idem , 2, 4729336236),
( idem , 3, 4729336237),
( idem , 4, 4729336238),
))
def test_two(left, right, result):
    global previous_left
    if left is idem: left = previous_left
    else: previous_left = left
    assert add(left, right) == result
How can this idea be implemented in a more robust way? Is there some feature built in to pytest that could help?
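There is no built-in pytest feature for this that I know of, but one more robust route (a sketch, using a hypothetical expand_idem helper) is to expand the placeholders once while the parameter list is built, so the collected parameters are ordinary values and no global state is needed at test time:
class idem: pass

def expand_idem(rows):
    # Replace idem in the first column with the most recent explicit value.
    expanded, previous_left = [], None
    for left, right, result in rows:
        if left is idem:
            left = previous_left
        else:
            previous_left = left
        expanded.append((left, right, result))
    return expanded

@parm(' left, right, result', expand_idem((
    (9127384955, 1, 9127384956),
    (      idem, 2, 9127384957),
    (4729336234, 1, 4729336235),
    (      idem, 2, 4729336236),
)))
def test_three(left, right, result):
    assert add(left, right) == result
Because the expansion happens at collection time, each distributed worker sees the same fully expanded parameter list.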

How can I conditionally skip a parameterized pytest scenario?

I need to flag certain tests to be skipped. However, some of the tests are parameterized, and I need to be able to skip only certain scenarios.
I invoke the test using py.test -m "hermes_only" or py.test -m "not hermes_only" as appropriate.
Simple testcases are marked using:
@pytest.mark.hermes_only
def test_blah_with_hermes(self):
However, I have some parameterized tests:
outfile_scenarios = [('buildHermes'),
('buildTrinity')]
@pytest.mark.parametrize('prefix', outfile_scenarios)
def test_blah_build(self, prefix):
    self._activator(prefix=prefix)
I would like a mechanism to filter the scenario list or otherwise skip certain tests if a pytest mark is defined.
More generally, how can I test for the definition of a pytest mark?
Thank you.
A nice solution from the documentation is this:
import sys
import pytest
@pytest.mark.parametrize(
    ("n", "expected"),
    [
        (1, 2),
        pytest.param(1, 0, marks=pytest.mark.xfail),
        pytest.param(1, 3, marks=pytest.mark.xfail(reason="some bug")),
        (2, 3),
        (3, 4),
        (4, 5),
        pytest.param(
            10, 11, marks=pytest.mark.skipif(sys.version_info >= (3, 0), reason="py2k")
        ),
    ],
)
def test_increment(n, expected):
    assert n + 1 == expected
Found it! It's elegant in its simplicity. I just mark the affected scenarios:
outfile_scenarios = [pytest.mark.hermes_only('buildHermes'),
                     ('buildTrinity')]
I hope this helps others.
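Note that newer pytest versions no longer allow applying a mark by wrapping a parameter value like this; the equivalent with the pytest.param API shown above would be (a sketch):
outfile_scenarios = [pytest.param('buildHermes', marks=pytest.mark.hermes_only),
                     'buildTrinity']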

Obtaining inconsistent results in Spark

Have any Spark experts had a strange experience like this: obtaining inconsistent map-reduce results using pyspark?
Suppose that midway through the job I have an RDD
....
rdd = sc.parallelize([(('Alex', item1), 3), (('Joe', item2), 1),...])
My goal is to count how many different users there are, so I do
print (set(rdd.map(lambda x: (x[0][0],1)).reduceByKey(add).collect()))
print (rdd.map(lambda x: (x[0][0],1)).reduceByKey(add).collect())
print (set(rdd.map(lambda x: (x[0][0],1)).reduceByKey(add).map(lambda x: x[0]).collect()))
These three prints should have the same content (though in different formats). For example, the first should be a set like {('Alex', 1), ('John', 10), ('Joe', 2), ...}; the second a list like [('Alex', 1), ('John', 10), ('Joe', 2), ...], with as many items as there are different users; and the third a set like {'Alex', 'John', 'Joe', ...}.
But instead I got a set like {('Alex', 1), ('John', 2), ('Joe', 3), ...} and, for the second, a list like [('John', 5), ('Joe', 2), ...] ('Alex' is even missing here). The lengths of the set and the list are different.
Unfortunately, I cannot even reproduce the error when I write a short test script; there I still get the right results. Has anyone met this problem before?
I think I figured it out.
The reason is that if I use the same RDD repeatedly, I need to .cache() it.
If the RDD becomes
rdd = sc.parallelize([(('Alex', item1), 3), (('Joe', item2), 1),...]).cache()
then the inconsistency problem is solved.
Or, if I further prepare the aggregated rdd as
aggregated_rdd = rdd.map(lambda x: (x[0][0],1)).reduceByKey(add)
print (set(aggregated_rdd.collect()))
print (aggregated_rdd.collect())
print (set(aggregated_rdd.map(lambda x: x[0]).collect()))
then there are no inconsistency problems either.
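For reference, here is a minimal self-contained sketch (hypothetical data, assuming a local pyspark session) of the same effect: when the upstream RDD involves any non-determinism, each action recomputes it unless it is cached, so caching pins down one consistent result.
from operator import add
from pyspark import SparkContext

sc = SparkContext.getOrCreate()
# sample() stands in for whatever non-deterministic upstream step the real job had;
# without .cache(), each collect() below could see a different sample.
rdd = sc.parallelize(range(1000)).sample(False, 0.5).cache()
counts = rdd.map(lambda x: (x % 3, 1)).reduceByKey(add)
print(set(counts.collect()))
print(counts.collect())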

filling a matrix with Scala library breeze

I'm new to Scala and I'm having a mental block on a seemingly easy problem. I'm using the Scala library Breeze and need to take an array buffer (mutable) and put the results into a matrix. This should be simple, but Scala is so strictly typed that Breeze seems really picky about what data types it will accept when making a DenseVector. This is just some prototype code, but can anyone help me come up with a solution?
Right now I have something like...
// 9 elements that need to go into a 3x3 matrix (1-3 as top row, 4-6 as middle row, etc.)
val numbersForMatrix: ArrayBuffer[Double] = ArrayBuffer(1, 2, 3, 4, 5, 6, 7, 8, 9)
//the empty 3x3 matrix
var M: breeze.linalg.DenseMatrix[Double] = DenseMatrix.zeros(3,3)
In breeze you can do stuff like
M(0,0) = 100 and set the first value to 100 this way,
You can also do stuff like:
M(0, 0 to 2) := DenseVector(1, 2, 3)
which sets the first row to 1, 2, 3
But I cannot get it to do something like...
var dummyList: List[Double] = List(1, 2, 3) //this works
var dummyVec = DenseVector[Double](dummyList) //this works
M(0, 0 to 2) := dummyVec //this does not work
and successfully change the first row to 1, 2, 3.
And that's with a List, not even an ArrayBuffer.
I'm willing to change data types away from ArrayBuffer, but I'm just not sure how to approach this at all. I could try updating the matrix values one by one, but that seems like it would be VERY hacky to code up.
Note: I'm a Python programmer who is used to using numpy and just giving it arrays. The breeze documentation doesn't provide enough examples with other datatypes for me to have been able to figure this out yet.
Thanks!
Breeze is, in addition to pickiness over types, pretty picky about vector shape: DenseVectors are column vectors, but you are trying to assign to a subset of a row, which expects a transposed DenseVector:
M(0, 0 to 2) := dummyVec.t