Serenity jbehave issue when using multiple scenario outlines in a single .feature file - jbehave

We are using Serenity with JBehave. We are facing issues when we have multiple scenario outlines with examples tables in a single .feature file.
We have something like this in our Feature file:
Scenario Outline: title 1
Given
When
Then
Examples:
|data|
Scenario Outline: title 2
Given 2
When 2
Then 2
Examples:
|Data|
In this case, after executing the Examples of Scenario Outline 1, it treats Scenario Outline 2 as additional rows of Scenario Outline 1's examples table instead of as a new scenario.
This is what the output looks like:
Scenario Outline: title 1
Given
When
Then
Examples:
|data|
|Scenario Outline: title 2|
|Given 2|
|When 2|
|Then 2|
|Examples:|
|Data|
Here are the versions of plugins:
jbehave.core.version - 3.9.5;
serenity.version - 1.0.47;
serenity.jbehave.version - 1.0.21
Can someone please help resolve this?
Note: I have seen posts where people had the same issue when using Scenario with an examples table rather than Scenario Outline with Examples. Here I am using Scenario Outline only, but still have the same issue.

Related

Combining two VCF files with differing sampleIds and locations

Good day,
How to combine multiple Variant call files (VCF) with differing subjects?
I have multiple VCF datasets with differing sampleIds and locations:
file1:
contigName |start | end | names | referenceAllele | alternateAlleles| qual| filters| splitFromMultiAllelic| genotypes
1 |792460|792461|["bla"]|G |["A"] |null|["PASS"] |false | [{"sampleId": "abba", "phased": false, "calls": [0, 0]}]
1 |792461|792462|["blaA"]|G |["A"] |null|["PASS"] |false | [{"sampleId": "abba", "phased": false, "calls": [0, 0]}]
file2:
contigName |start | end | names | referenceAllele | alternateAlleles| qual| filters| splitFromMultiAllelic| genotypes
1 |792460|792461|["bla"]|G |["A"] |null|["PASS"] |false | [{"sampleId": "baab", "phased": false, "calls": [0, 0]}]
1 |792464|792465|["blaB"]|G |["A"] |null|["PASS"] |false | [{"sampleId": "baab", "phased": false, "calls": [0, 0]}]
I need to combine these into a single VCF file. I'm required to work in a Databricks (pyspark/scala) environment due to data security.
The Glow documentation had an idea, which I copied:
import pyspark.sql.functions as F
spark.read.format("vcf")\
  .option("flattenInfoFields", True)\
  .load(file_list)\
  .groupBy('contigName', 'start', 'end', 'referenceAllele', 'alternateAlleles', 'qual', 'filters', 'splitFromMultiAllelic')\
  .agg(F.sort_array(F.flatten(F.collect_list('genotypes'))).alias('genotypes'))\
  .write.mode("overwrite").format("vcf").save(my_output_destination)
This only works when the sampleIds are the same in both files:
Task failed while writing rows
Cannot infer sample ids because they are not the same in every row.
I'm considering creating a dummy table with NULL calls for all the IDs, but that seems silly (not to mention a huge resource sink).
Is there a simple way to combine VCF files with differing sampleIds? Or to autofill missing values with NULL calls?
Edit: I managed to do this with the bigvcf format. However, it autofills -1,-1 calls. I'd like to manually set the autofilled values to something that makes it clearer they are not 'real':
.write.mode("overwrite").format("bigvcf").save(my_output_destination)
The code above works if you have identical variants in both tables. I would not recommend using it to combine two distinct datasets as this would introduce batch effects.
The best practice for combining two datasets is to reprocess them from the BAM files to gVCF using the same pipeline. Then run joint-genotyping to merge the samples (instead of a custom spark-sql function).
Databricks does provide a GATK4 best practices pipeline that includes joint-genotyping. Or you can use DeepVariant to call variants.
If it is not possible to reprocess the data, then the two datasets should be treated separately in a meta-analysis, as opposed to merging the VCFs and performing a mega-analysis.

Scala/Spark - Find total number of value in row based on a key

I have a large text file which contains the page views of some Wikimedia projects. (You can find it here if you're really interested) Each line, delimited by a space, contains the statistics for one Wikimedia page. The schema looks as follows:
<project code> <page title> <num hits> <page size>
In Scala, using Spark RDDs or Dataframes, I wish to compute the total number of hits for each project, based on the project code.
So for example for projects with the code "zw", I would like to find all the rows that begin with project code "zw", and add up their hits. Obviously this should be done for all project codes simultaneously.
I have looked at functions like aggregateByKey etc, but the examples I found don't go into enough detail, especially for a file with 4 fields. I imagine it's some kind of MapReduce job, but how exactly to implement it is beyond me.
Any help would be greatly appreciated.
First, you have to read the file in as a Dataset[String]. Then, parse each string into a tuple so that it can easily be converted to a Dataframe. Once you have a Dataframe, a simple .groupBy().agg() is enough to finish the computation.
import org.apache.spark.sql.functions.sum
import spark.implicits._  // for .toDF and the $-column syntax (already in scope in spark-shell)

val df = spark.read.textFile("/tmp/pagecounts.gz").map(l => {
  val a = l.split(" ")
  (a(0), a(2).toLong)
}).toDF("project_code", "num_hits")

val agg_df = df.groupBy("project_code")
  .agg(sum("num_hits").as("total_hits"))
  .orderBy($"total_hits".desc)

agg_df.show(10)
The above snippet shows the top 10 project codes by total hits.
+------------+----------+
|project_code|total_hits|
+------------+----------+
| en.mw| 5466346|
| en| 5310694|
| es.mw| 695531|
| ja.mw| 611443|
| de.mw| 572119|
| fr.mw| 536978|
| ru.mw| 466742|
| ru| 463437|
| es| 400632|
| it.mw| 400297|
+------------+----------+
It is certainly also possible to do this with the older API as an RDD map/reduce, but you lose many of the optimizations that the Dataset/Dataframe API brings.
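For reference, a minimal RDD-based sketch of the same aggregation (assuming the same pagecounts file and field layout as above, and that malformed lines can simply be skipped) could look like this:
val lines = spark.sparkContext.textFile("/tmp/pagecounts.gz")
val totalsByProject = lines
  .map(_.split(" "))                     // split each record into its fields
  .filter(_.length >= 3)                 // skip malformed lines
  .map(a => (a(0), a(2).toLong))         // (project_code, num_hits)
  .reduceByKey(_ + _)                    // sum hits per project code
totalsByProject.sortBy(_._2, ascending = false).take(10).foreach(println)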

store elements to hashset from file scala

I am playing a little bit with Scala and I want to open a text file, read each line and save some of the fields in a HashSet.
The input file will be something like this:
1 2 3
2 4 5
At first, I am just trying to store the first element of each column in a variable, but nothing seems to happen.
My code is:
var id = 0
val textFile = sc.textFile(inputFile);
val nline = textFile.map(_.split(" ")).foreach(r => id = r(0))
I am using Spark because I want to process a bigger amount of data later, so I'm trying to get used to it. I am printing id but I get only 0.
Any ideas?
A couple of things:
First, inside map and foreach you are running code out on your executors. The id variable you defined is on the driver. You can pass variables to your executors using closures, but not the other way around. If you think about it, when you have 10 executors running through records simultaneously, which value of id would you expect to be returned?
Edit - foreach is an action
I mistakenly said below that foreach is not an action. It is an action that just lets you run arbitrary code against your rows. It is useful, for example, if you have your own code to save the result to a different data store. foreach does not bring any data back to the driver, so it does not help with your case.
End edit
Second, all of the Spark methods you called are transformations; you haven't called an action yet. Spark doesn't actually run any code until an action is called. Instead, it just builds a graph of the transformations you want to happen until you specify an action. Actions are things that require materializing a result, either to provide data back to the driver or to save it out somewhere like HDFS.
In your case, to get values back you will want to use an action like "collect" which returns all the values from the RDD back to the driver. However, you should only do this when you know there aren't going to be many values returned. If you are operating on 100 million records you do not want to try and pull them all back to the driver! Generally speaking you will want to only pull data back to the driver after you have processed and reduced it.
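For your file, a minimal sketch (assuming sc and inputFile are as defined in the question) that pulls the first field of every line back to the driver and stores it in a set could look like this:
val textFile = sc.textFile(inputFile)
// Transformation only: nothing runs yet, this just describes the work.
val firstFields = textFile.map(_.split(" ")(0))
// collect() is an action: it triggers the job and brings the results back
// to the driver, so only use it when the result is known to be small.
val ids: Set[String] = firstFields.collect().toSet   // Set("1", "2") for the sample input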
i am just trying to store the first element of each column to a variable but nothing seems to happen.
val file_path = "file.txt"
val ds = ss.read.textFile(file_path)
val ar = ds.map(x => x.split(" ")).first()
val (x,y,z) = (ar(0),ar(1),ar(2))
You can access the values of the first row's columns as x, y, z, as shown above.
With your file, x=1, y=2, z=3.
Alternatively, to get a Dataframe with named columns:
val ar1 = ds.map(x => x.split(" "))
val final_ds = ar1.select($"value".getItem(0).as("col1"), $"value".getItem(1).as("col2"), $"value".getItem(2).as("col3")) // name the columns like this
Output :
+----+----+----+
|col1|col2|col3|
+----+----+----+
| 1| 2| 3|
| 2| 4| 5|
+----+----+----+
You can run any kind of SQL-style query on final_ds, like the small sample below.
final_ds.select("col1","col2").where(final_ds.col("col1") > 1).show()
Output:
+----+----+
|col1|col2|
+----+----+
| 2| 4|
+----+----+

Execute single cucumber test case in a scenario outline using command line command

I want to execute a single test case from a scenario outline using Protractor. For example, in the below Scenario Outline, if I want to execute the test case TCID0002 alone, how can I run it using Protractor?
#shopping
Scenario Outline: Test
Given the user navigates to xxx.com
When the user searches for <product>
Then the current page is shopping cart page
Examples:
|TCID | product|
|TCID0001|soap |
|TCID0002|watch |
|TCID0003|lipstick |
To run all the test cases now I use
protractor Config.js --cucumberOpts.tags="#shopping"
Is there any command to execute a single test case in the scenario outline?
You can use tags on the examples table by splitting it into two tables. Then provide the #runone tag to the tags option of cucumberOpts in the config file, as shown after the tables below.
#runall
Examples:
|TCID | product|
|TCID0001|soap |
|TCID0002|watch |
|TCID0003|lipstick |
#runone
Examples:
|TCID | product|
|TCID0002|watch |
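With that split in place, only the #runone example should run when the tag is passed to cucumberOpts.tags (a sketch, reusing the config file name from the question):
protractor Config.js --cucumberOpts.tags="#runone"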
Found a solution to execute a single test case in Cucumber with the help of my team member.
To run a single test case, follow the below 2 steps.
Step 1
Keep TCID in the scenario title as shown below
Scenario Outline: <TCID> test case to validate search
Given the user navigates to xxx.com
When the user searches for <product>
Then the current page is search result page
Examples:
|TCID | product|
|TCID0001|soap |
|TCID0002|watch |
|TCID0003|lipstick |
Step 2
Use cucumberOpts.name in your command. cucumberOpts.name will filter the scenarios whose titles contain the given string.
--cucumberOpts.name="TCID0002" will filter the TCID0002 scenario alone.
Command
The below command will execute the test case TCID0002:
protractor Config/wagConfig.js --cucumberOpts.name="TCID0002"

Meteor Collection exists in Mongo but returns ReferenceError in browser console

I'm setting up a new Meteor project and am having trouble working with collections on the client side. Right now what I'm building is an administrative page where a user can maintain a list of schools.
Running meteor mongo and db.schools.find() in the terminal returns a BSON object just as I would expect; however, when I enter Schools in the Chrome console it returns Uncaught ReferenceError: Schools is not defined, which is a bummer.
My project architecture (simplified to the bits that reference the School collection) is as follows:
client/
layouts/
admin/
schools.coffee
schools.jade
lib/
subscriptions.coffee
lib/
collections.coffee
server/
lib/
publications.coffee
The contents of each of these files (in the desired load order) are:
1) lib/collections.coffee
Schools = new Mongo.Collection('schools')
2) server/lib/publications.coffee
Meteor.publish 'schools', ->
  Schools.find()
3) client/lib/subscriptions.coffee
Meteor.subscribe('schools')
4) client/layouts/admin/schools.coffee
Template.admin_schools.helpers
  schools: ->
    Schools.find()
  numSchools: ->
    Schools.find().count()
  hasSchools: ->
    Schools.find().count() > 0
5) client/layouts/admin/schools.jade
h2.admin__header Schools
span.admin__record-count {{numSchools}} entries
...
table.admin__list
  if hasSchools
    each schools
      tr.admin__list-item
        td #{name}
I also have a form for new collection entries which calls Schools.insert, but the error there is the same.
When the page loads, I get the following error (likely because it is called first):
debug.js:41 Exception in template helper: ReferenceError: Schools is not defined
at Object.Template.admin_schools.helpers.numSchools
Those two errors, combined with the fact that I know there exists an entry in the collection, leads me to believe that the issue lies with the client side's awareness of the existence of the collection.
This discrepancy might be due to load order (I am pretty sure I accounted for that by putting the important files in lib/ directories, though I would love a second opinion), or maybe due to a spelling/syntax mistake (though the absence of compile errors is puzzling). Maybe it's something completely different!
Thank you very much for your time and assistance, they are much appreciated.
Turns out that because this is CoffeeScript, placing an @ before Schools = new Mongo.Collection('schools') in lib/collections.coffee (so that it reads @Schools = new Mongo.Collection('schools')) makes Schools a global variable, thereby solving the problem! Pretty simple fix in the end :)
Special thanks to Kishor for helping troubleshoot.
Have you checked your subscriptions on created?