How to solve "Step definition is not found" error: StepDefinitionNotFoundError - pytest

Here is my feature file - a.feature:
Scenario Outline: Some outline
Given something
When <thing> is moved to <position>
Then something else
| thing | position |
| 1 | 1 |
I saved it as /tmp/a.feature.
Here is my pytest step file (/tmp/
from pytest_bdd import given, scenario, then, when

@scenario('./x.feature', 'Some outline')
def test_some_outline():
    """Some outline."""

@given('something')
def something():
    """something."""

@when('<thing> is moved to <position>')
def thing_is_moved_to_position(thing, position):
    assert isinstance(thing, int)
    assert isinstance(position, int)

@then('something else')
def something_else():
    """something else."""
When I run it:
$ pwd
$ pytest ./
E pytest_bdd.exceptions.StepDefinitionNotFoundError: Step definition is not found: When "1 is moved to 1". Line 3 in scenario "Some outline" in the feature "/tmp/x.feature"
/home/cyan/.local/lib/python3.10/site-packages/pytest_bdd/ StepDefinitionNotFoundError
============= short test summary info =============
FAILED[1-1] - pytest_bdd.exceptions.StepDefinitionNotFoundError: Step definition is not found: When "1 is moved to 1". Line 3 in scenario "Some outli...
============ 1 failed in 0.09s ============
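The error message quotes the step text after the example row has been substituted ("1 is moved to 1"), so a step registered under the literal name '<thing> is moved to <position>' presumably no longer matches it. The sketch below uses only the standard-library re module to illustrate the difference between a literal step name and a pattern with typed arguments. It is not pytest-bdd code; in pytest-bdd the analogous tool would be a step parser such as parsers.parse, which is an assumption about the fix and not something confirmed in this post.

```python
import re

# Step text exactly as it appears in the error message.
step_text = "1 is moved to 1"

# A literal step name only matches by string equality, so the
# placeholder version cannot match the substituted text.
literal_name = "<thing> is moved to <position>"
literal_matches = literal_name == step_text

# A pattern with named groups matches the substituted text and
# lets the arguments be converted to int before the assertions run.
pattern = re.compile(r"(?P<thing>\d+) is moved to (?P<position>\d+)")
match = pattern.fullmatch(step_text)
args = {name: int(value) for name, value in match.groupdict().items()}

print(literal_matches, args)
```

With a pattern-based step, the isinstance(thing, int) checks in the step function would also have a chance to pass, because the captured values are converted explicitly.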


Dataflow job doesn't emit messages after GroupByKey()

I have a streaming Dataflow pipeline that writes to BigQuery, and I want to window all the failed rows for further analysis. The pipeline looks like this. All the error messages arrive at the second step, but they get stuck at beam.GroupByKey() and nothing moves downstream after it. Does anyone have any idea how to fix this?
data = (
| "Read PubSub Messages" >>,
| "write to BQ" >>
| f"Window into: {options.window_size}m" >> GroupWindowsIntoBatches(options.window_size)
| f"Failed Rows for " >> beam.ParDo(BadRows(options.bq_dataset, 'table'))
class GroupWindowsIntoBatches(beam.PTransform):
    """A composite transform that groups Pub/Sub messages based on publish
    time and outputs a list of dictionaries, where each contains one message
    and its publish timestamp.
    """

    def __init__(self, window_size):
        # Convert minutes into seconds.
        self.window_size = int(window_size * 60)

    def expand(self, pcoll):
        return (
            pcoll
            # Assigns window info to each Pub/Sub message based on its publish timestamp.
            | "Window into Fixed Intervals" >> beam.WindowInto(window.FixedWindows(10))
            # If the windowed elements do not fit into memory please consider using `beam.util.BatchElements`.
            | "Add Dummy Key" >> beam.Map(lambda elem: (None, elem))
            | "Groupby" >> beam.GroupByKey()
            | "Abandon Dummy Key" >> beam.MapTuple(lambda _, val: val)
        )
Also, I don't know if it's relevant, but the beam.DoFn.TimestampParam inside my GroupWindowsIntoBatches has an invalid (negative) timestamp.
OK, so the issue was that the messages coming from BigQuery FAILED_ROWS were not timestamped. Adding | 'Add Timestamps' >> beam.Map(lambda x: beam.window.TimestampedValue(x, time.time())) seems to fix the GroupByKey.
import time

import apache_beam as beam
from apache_beam.transforms import window

class GroupWindowsIntoBatches(beam.PTransform):
    """A composite transform that groups Pub/Sub messages based on publish
    time and outputs a list of dictionaries, where each contains one message
    and its publish timestamp.
    """

    def __init__(self, window_size):
        # Convert minutes into seconds.
        self.window_size = int(window_size * 60)

    def expand(self, pcoll):
        return (
            pcoll
            # Added this line: stamp each element with the current time.
            | 'Add Timestamps' >> beam.Map(lambda x: beam.window.TimestampedValue(x, time.time()))
            | "Window into Fixed Intervals" >> beam.WindowInto(window.FixedWindows(30))
            | "Add Dummy Key" >> beam.Map(lambda elem: (None, elem))
            | "Groupby" >> beam.GroupByKey()
            | "Abandon Dummy Key" >> beam.MapTuple(lambda _, val: val)
        )
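The role the timestamp plays can be seen in the arithmetic that assigns an element to a fixed window. The following is a minimal pure-Python sketch under stated assumptions, not Beam's implementation: fixed_window is a hypothetical helper, and timestamps are plain seconds.

```python
def fixed_window(timestamp, size, offset=0):
    """Return the [start, end) bounds of the fixed window containing timestamp."""
    start = timestamp - (timestamp - offset) % size
    return (start, start + size)

# Two elements stamped five seconds apart share one 30-second window.
same_window = fixed_window(95, 30) == fixed_window(100, 30)

# An element with a negative timestamp lands in a window before the epoch.
negative_window = fixed_window(-5, 30)

print(fixed_window(95, 30), same_window, negative_window)
```

Stamping each element with time.time(), as the line added above does, places the elements in windows around the current wall-clock time instead.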

Scala: Can't run gcloud compute ssh

I am trying to run a Hive query via gcloud compute ssh from Scala.
First, here is what I tried:
scala> import sys.process._
scala> val results = Seq("hive", "-e", "show databases;").!!
This works. Now I want to run the same hive command, but against a GCP cluster. I have gcloud set up on my VM, and from the command line I can easily do:
$ gcloud compute ssh --zone myZone myNode --internal-ip -- 'hive -e "show databases;"'
Updating project ssh metadata...⠶Updated [].
Updating project ssh metadata...done.
Waiting for SSH key to propagate.
Warning: Permanently added 'compute.2746937995265952194' (RSA) to the list of known hosts.
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 19 100 19 0 0 2982 0 --:--:-- --:--:-- --:--:-- 3166
Logging initialized using configuration in file:/etc/hive/conf.dist/ Async: true
Now I want to run the above from Scala. Here is what I tried:
scala> val results = Seq("gcloud", "compute", "ssh", "--zone", "myZone", "myNode", "--internal-ip", "--", "hive", "-e" ,"show databases").!!
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 19 100 19 0 0 3270 0 --:--:-- --:--:-- --:--:-- 3800
Pseudo-terminal will not be allocated because stdin is not a terminal.
Logging initialized using configuration in file:/etc/hive/conf.dist/ Async: true
NoViableAltException(-1@[846:1: ddlStatement : ( createDatabaseStatement | switchDatabaseStatement | dropDatabaseStatement | createTableStatement | dropTableStatement | truncateTableStatement | alterStatement | descStatement | showStatement | metastoreCheck | createViewStatement | createMaterializedViewStatement | dropViewStatement | dropMaterializedViewStatement | createFunctionStatement | createMacroStatement | createIndexStatement | dropIndexStatement | dropFunctionStatement | reloadFunctionStatement | dropMacroStatement | analyzeStatement | lockStatement | unlockStatement | lockDatabase | unlockDatabase | createRoleStatement | dropRoleStatement | ( grantPrivileges )=> grantPrivileges | ( revokePrivileges )=> revokePrivileges | showGrants | showRoleGrants | showRolePrincipals | showRoles | grantRole | revokeRole | setRole | showCurrentRole | abortTransactionStatement );])
at org.antlr.runtime.DFA.noViableAlt(
at org.antlr.runtime.DFA.predict(
at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(
at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(
at org.apache.hadoop.hive.ql.parse.HiveParser.statement(
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(
at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(
at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(
at org.apache.hadoop.hive.ql.Driver.compile(
at org.apache.hadoop.hive.ql.Driver.compileInternal(
at org.apache.hadoop.hive.ql.Driver.runInternal(
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(
at org.apache.hadoop.hive.cli.CliDriver.processCmd(
at org.apache.hadoop.hive.cli.CliDriver.processLine(
at org.apache.hadoop.hive.cli.CliDriver.processLine(
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(
at org.apache.hadoop.hive.cli.CliDriver.main(
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(
at sun.reflect.DelegatingMethodAccessorImpl.invoke(
at java.lang.reflect.Method.invoke(
at org.apache.hadoop.util.RunJar.main(
FAILED: ParseException line 1:4 cannot recognize input near 'show' '<EOF>' '<EOF>' in ddl statement
java.lang.RuntimeException: Nonzero exit value: 64
at scala.sys.package$.error(package.scala:27)
at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.slurp(ProcessBuilderImpl.scala:132)
at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.$bang$bang(ProcessBuilderImpl.scala:102)
... 50 elided
Why am I getting this error? I also tried:
scala> val results = Seq("gcloud", "compute", "ssh", "--zone", "myZone", "myNode", "--internal-ip", "--", "hive", "-e" ,"show databases;").!!
but got the same error. Then I tried:
scala> val results = Seq("gcloud", "compute", "ssh", "--zone", "myZone", "myNode", "--internal-ip", "--", "'hive -e \"show databases;\"'").!!
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 19 100 19 0 0 3245 0 --:--:-- --:--:-- --:--:-- 3800
Pseudo-terminal will not be allocated because stdin is not a terminal.
bash: hive -e "show databases;": command not found
java.lang.RuntimeException: Nonzero exit value: 127
at scala.sys.package$.error(package.scala:27)
at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.slurp(ProcessBuilderImpl.scala:132)
at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.$bang$bang(ProcessBuilderImpl.scala:102)
... 50 elided
How can I run gcloud compute ssh properly from Scala?
You don't need the single quotes in your last example. You're trying to pass the string:
hive -e "show databases;"
For fun, I would use triple quotes in Scala:
"""hive -e "show databases;""""
to avoid backslashes. The single quotes in your working command line are processed by bash, so gcloud never sees them.
This is what worked in bash:
$ gcloud compute ssh --zone myZone myNode --internal-ip -- 'hive -e "show databases;"'
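To see which argument list bash hands to gcloud for the working command above, the line can be tokenised with Python's shlex module. This is an illustration in Python, not Scala, and it assumes shlex's default POSIX mode is a close enough stand-in for bash's quoting rules on this particular command line.

```python
import shlex

# The command line that worked in bash, exactly as typed at the prompt.
command_line = """gcloud compute ssh --zone myZone myNode --internal-ip -- 'hive -e "show databases;"'"""

# shlex.split applies POSIX shell quoting rules and returns the argument list.
argv = shlex.split(command_line)

# The remote command is the single argument after the "--" separator.
remote_command = argv[-1]

print(argv)
```

The whole remote command arrives as one argument, with the outer single quotes already removed. That is the shape the Scala Seq would need to reproduce: its last element would be the string hive -e "show databases;" without surrounding single quotes.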
scala.sys.process gained some basic command-line parsing at some point. In the session below the file name contains a space, so it must be quoted. Somewhat surprisingly, shell-style quotes seem to work:
$ scala
Welcome to Scala 2.13.0 (OpenJDK 64-Bit Server VM, Java 11.0.3).
Type in expressions for evaluation. Or try :help.
scala> import scala.sys.process._
import scala.sys.process._
scala> "ls -l /tmp/skypeforlinux Crashes".!!
ls: cannot access '/tmp/skypeforlinux': No such file or directory
ls: cannot access 'Crashes': No such file or directory
java.lang.RuntimeException: Nonzero exit value: 2
at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.slurp(ProcessBuilderImpl.scala:155)
at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.$bang$bang(ProcessBuilderImpl.scala:112)
... 28 elided
scala> """ls -l "/tmp/skypeforlinux Crashes"""".!!
res1: String =
"total 0
scala> """ls -l '/tmp/skypeforlinux Crashes'""".!!
res2: String =
"total 0
scala> """ls -l /tmp/skypeforlin'ux Cr'ashes""".!!
res3: String =
"total 0
scala> """echo 'hive -e "show databases;"'""".!!
res4: String =
"hive -e "show databases;"
The double quotes around "my house" are part of the file name:
scala> """ls '/tmp/"my house"'""".!!
res5: String =
"/tmp/"my house"
I guess that code is where I learned how shell-style quotes work, though I never have a chance to use that knowledge. Except for this answer, so thanks for the opportunity.
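The quoting behaviour seen in the REPL session can be explored outside Scala as well. The sketch below is a rough illustration that assumes Python's shlex in POSIX mode as a stand-in for shell-style quoting; it says nothing about how scala.sys.process is implemented internally.

```python
import shlex

# Unquoted: the space splits the file name into two arguments.
unquoted = shlex.split("ls -l /tmp/skypeforlinux Crashes")

# Quotes may start and stop in the middle of a word; the pieces are joined.
mid_word = shlex.split("ls -l /tmp/skypeforlin'ux Cr'ashes")

# Double quotes inside single quotes are kept as literal characters.
nested = shlex.split("""ls '/tmp/"my house"'""")

print(unquoted, mid_word, nested)
```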

Collecting output from Apache Beam pipeline and displaying it to console

I have been working with Apache Beam for a couple of days. I want to iterate quickly on the application I am working on and make sure the pipeline I am building is error-free. In Spark we can use sc.parallelize, and when we apply an action we get a value that we can inspect.
Similarly, while reading about Apache Beam, I found that we can create a PCollection and work with it using the following syntax:
with beam.Pipeline() as pipeline:
    lines = pipeline | beam.Create(["this is test", "this is another test"])
    word_count = (lines
                  | "Word" >> beam.ParDo(lambda line: line.split(" "))
                  | "Pair of One" >> beam.Map(lambda w: (w, 1))
                  | "Group" >> beam.GroupByKey()
                  | "Count" >> beam.Map(lambda (w, o): (w, sum(o))))
result =
I actually wanted to print the result to console. But I couldn't find any documentation around it.
Is there a way to print the result to console instead of saving it to a file each time?
You don't need the temp list. In python 2.7 the following should be sufficient:
def print_row(row):
print row
| ...
| "print" >> beam.Map(print_row)
result =
In python 3.x, print is a function so the following is sufficient:
| ...
| "print" >> beam.Map(print)
result =
After exploring further and learning how to write test cases for my application, I figured out a way to print the result to the console. Please note that right now I am running everything on a single-node machine, trying to understand the functionality Apache Beam provides and how I can adopt it without compromising industry best practices.
So, here is my solution. At the very last stage of the pipeline we can add a map function that either prints each result to the console or accumulates the results in a variable, which we can print later to see the values:
import apache_beam as beam

# Let's have some sample strings.
data = ["this is sample data", "this is yet another sample data"]

# Create a pipeline.
pipeline = beam.Pipeline()
counts = (pipeline | "create" >> beam.Create(data)
          | "split" >> beam.ParDo(lambda row: row.split(" "))
          | "pair" >> beam.Map(lambda w: (w, 1))
          | "group" >> beam.CombinePerKey(sum))

# Collect the results into the output list with a map transformation.
output = []
def collect(row):
    output.append(row)
    return True

counts | "print" >> beam.Map(collect)

# Run the pipeline.
result = pipeline.run()

# Wait until the result is available.
result.wait_until_finish()

# Print the output.
print(output)
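For a quick cross-check of what the pipeline above should print, the same split, pair and sum steps can be written in plain Python. This is only a sanity-check sketch using the standard library, not a replacement for running the pipeline; the variable names are chosen for illustration.

```python
from collections import Counter

# Same sample input as in the pipeline above.
data = ["this is sample data", "this is yet another sample data"]

# "split": break every row into words; "pair" and "group": count each word.
words = [word for row in data for word in row.split(" ")]
expected = sorted(Counter(words).items())

print(expected)
```

The pipeline's collected output may arrive in a different order, so comparing sorted lists (or dictionaries) is the safer check.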
Maybe logging info instead of print?
import logging
def _logging(elem):
    logging.info(elem)
    return elem
P | "logging info" >> beam.Map(_logging)
Following an example from PyCharm Edu:
import apache_beam as beam

class LogElements(beam.PTransform):
    class _LoggingFn(beam.DoFn):
        def __init__(self, prefix=''):
            super(LogElements._LoggingFn, self).__init__()
            self.prefix = prefix

        def process(self, element, **kwargs):
            print(self.prefix + str(element))
            yield element

    def __init__(self, label=None, prefix=''):
        super(LogElements, self).__init__(label)
        self.prefix = prefix

    def expand(self, input):
        return input | beam.ParDo(self._LoggingFn(self.prefix))

class MultiplyByTenDoFn(beam.DoFn):
    def process(self, element):
        yield element * 10

p = beam.Pipeline()
(p | beam.Create([1, 2, 3, 4, 5])
   | beam.ParDo(MultiplyByTenDoFn())
   | LogElements())
p.run()
Out[10]: <apache_beam.runners.portability.fn_api_runner.RunnerResult at 0x7ff41418a210>
I know it isn't what you asked for, but why don't you store it in a text file? That is always better than printing to stdout, and it isn't volatile.

Xtext 2.8+ formatter, formatting HiddenRegion with comment

I am using the Xtext 2.9 formatter and I am trying to format a HiddenRegion that contains a comment. Here is the part of my document region I am trying to format:
Columns: 1:offset 2:length 3:kind 4: text 5:grammarElement
Kind: H=IHiddenRegion S=ISemanticRegion B/E=IEObjectRegion
35 0 H
35 15 S ""xxx::a::b"" Refblock:namespace=Namespace
50 0 H
50 1 S "}" Refblock:RCBRACKET
E Refblock PackageHead:Block=Refblock path:PackageHead/Block=Package'xxx_constants'/head=Model/packages[0]
51 0 H
51 1 S ":" PackageHead:COLON
E PackageHead Package:head=PackageHead path:Package'xxx_constants'/head=Model/packages[0]
52 >>>H "\n " Whitespace:TerminalRule'WS'
"# asd" Comment:TerminalRule'SL_COMMENT'
15 "\n " Whitespace:TerminalRule'WS'<<<
B Error'ASSD' Package:expressions+=Expression path:Package'xxx_constants'/expressions[0]=Model/packages[0]
67 5 S "error" Error:'error'
72 1 H " " Whitespace:TerminalRule'WS'
and corresponding part of the grammar
Error | Warning | Enum | Text;
'package' name=Name head=PackageHead
('error') name=ENAME parameter=Parameter COLON
Block=Refblock COLON;
The problem arises when I try to prepend some characters before the error keyword,
for example
error.regionFor.keyword('error').prepend[setSpace("\n ")]
The indentation is prepended before the comment, not after it. This results in improper formatting when a single-line comment precedes the 'error' keyword.
To provide more clarity, here is example code from my grammar and description of desired behavior:
package xxx_constants {namespace="xxx::a::b"}:
# asd
error ASSD {0}:
Hello {0,world}
This is expected result: (one space to the left)
package xxx_constants {namespace="xxx::a::b"}:
# asd
error ASSD {0}:
Hello {0,world}
and this is the actual result with prepend method
package xxx_constants {namespace="xxx::a::b"}:
# asd
error ASSD {0}:
Hello {0,world}
As the document structure shows, the HiddenRegion in this case is the statement:
# asd
How can I prepend my characters directly before the keyword 'error' and not before the comment? Thanks.
I assume you're creating an indentation-sensitive language, because you're explicitly calling BEGIN and END.
For indentation-sensitive language my answer is: You'll want to overwrite
The append[] and prepend[] methods you're using are agnostic to comments; at a later time applyHiddenRegionFormatting() is called to decide how that formatting is woven between comments.
To make Xtext use your own subclass of HiddenRegionReplacer, overwrite
org.eclipse.xtext.formatting2.AbstractFormatter2.createHiddenRegionReplacer(IHiddenRegion, IHiddenRegionFormatting)
For languages that do not do whitespace-sensitive lexing/parsing (that's the default) the answer is to not call setSpace() to create indentation or line-wraps.
Instead, do

search for a string between repetitive tags in a txt file

I need some help searching for a string between repeated tags.
I have a text file in which the following format repeats many times:
========== File Name: fixed_am_7bitLI_HE10.txt Test Case: Test_Case_1 START ==========
Block 1
Block 2
Input section OK
data section OK
Block 3
Input section OK
data section OK
Block 4
Input section OK
data section OK
========== File Name: fixed_am_7bitLI_HE10.txt Test Case: Test_Case_2 START ========
Block 1
Block 2
Input section OK
line mismatch:
output line: "Buffer allocated from pool: MIF_CTRL_POOL, buffer_id: 1, size (words): 4"
reference line: "pdcp_pdu_delete_count = 0, reset_cip_rdy = 1"
line mismatch:
output line: "pdcp_pdu_delete_count = 0, reset_cip_rdy = 1"
reference line: "-- MIF CTRL output: --------------------------------------------------------------------------------"
line mismatch:
output line: "-- MIF CTRL output: --------------------------------------------------------------------------------"
reference line: "mif_ctrl_rlc_am_um_reset_reestablish_ind_t.rlc_reset_reestablish_ind = 3 (0x0003)"
line mismatch:
output line: "mif_ctrl_rlc_am_um_reset_reestablish_ind_t.rlc_reset_reestablish_ind = 3 (0x0003)"
reference line: "Buffer released from pool: MIF_CTRL_POOL, buffer_id 0, size (words) 6 (used 5)"
line count mismatch:
last output line: "mif_ctrl_rlc_am_um_reset_reestablish_ind_t.rlc_reset_reestablish_ind = 3 (0x0003)"
last reference line: "Buffer released from pool: MIF_CTRL_POOL, buffer_id 0, size (words) 6 (used 5)"
data section DIFFERS
Block 3
Input section OK
data section OK
Block 4
Input section OK
data section OK
========== File Name: fixed_am_7bitLI_HE10.txt Test Case: Test_Case_3 START ========
Block 1
Block 2
Input section OK
data section OK
Block 3
Input section OK
data section OK
Block 4
Input section OK
data section OK
I need to find out whether anything other than 'OK' exists between the START tags and, if so, mark that particular block as failed.
For example, if I find anything other than OK between Test Case: Test_Case_1 START and Test Case: Test_Case_2 START, I have to mark Test Case: Test_Case_1 as failed.
The expected output should look something like this:
File Name: fixed_am_7bitLI_HE10.txt Test Case: Test_Case_1 Status: PASS
(if the string 'DIFFERS' does not occur between the tags (==))
File Name: fixed_am_7bitLI_HE10.txt Test Case: Test_Case_2 Status: FAILED
(if the string 'DIFFERS' occurs between the tags (==))
If a test case fails:
File Name: fixed_am_7bitLI_HE10.txt Test Case: Test_Case_2 Status: FAILED
Section of block differing is:
Block 2
Input section OK
line mismatch:
output line: "Buffer allocated from pool: MIF_CTRL_POOL, buffer_id: 1, size (words): 4"
reference line: "pdcp_pdu_delete_count = 0, reset_cip_rdy = 1"
line mismatch:
output line: "pdcp_pdu_delete_count = 0, reset_cip_rdy = 1"
reference line: "-- MIF CTRL output: --------------------------------------------------------------------------------"
line mismatch:
output line: "-- MIF CTRL output: --------------------------------------------------------------------------------"
reference line: "mif_ctrl_rlc_am_um_reset_reestablish_ind_t.rlc_reset_reestablish_ind = 3 (0x0003)"
line mismatch:
output line: "mif_ctrl_rlc_am_um_reset_reestablish_ind_t.rlc_reset_reestablish_ind = 3 (0x0003)"
reference line: "Buffer released from pool: MIF_CTRL_POOL, buffer_id 0, size (words) 6 (used 5)"
line count mismatch:
last output line: "mif_ctrl_rlc_am_um_reset_reestablish_ind_t.rlc_reset_reestablish_ind = 3 (0x0003)"
last reference line: "Buffer released from pool: MIF_CTRL_POOL, buffer_id 0, size (words) 6 (used 5)"
data section DIFFERS
Wild guess:
while(<>) {print '**FAIL**' unless /TC ID OK/; print $_; }
But really, you should specify your requirements.
Maybe something quirky along the lines of the following:
#!/usr/bin/env perl
use strict;
use warnings;

my ($failed, $file, $test);

while (<>)
{
    next if /^$/;
    if (/^=/)
    {
        # A new START tag: report the previous test case, then reset the state.
        print "$file $test Status: $failed\n" if $failed;
        ($file, $test) = ($_ =~ /(File Name:\s+\S+).+\b(Test Case: Test_Case_\d+)\b/);
        $failed = 'PASSED';
    }
    $failed = 'FAILED' if /\bDIFFERS\b/g;
}
# Report the last test case in the file.
print "$file $test Status: $failed\n";
$ cat testdata
========== File Name: fixed_am_7bitLI_HE10.txt Test Case: Test_Case_1 START ==========
Block 1 TC ID OK
Block 2 Input section OK data section OK
Block 3 Input section OK data section OK
Block 4 Input section OK data section OK
========== File Name: fixed_am_7bitLI_HE10.txt Test Case: Test_Case_2 START ========
Block 1 TC ID OK
Block 2 Input section OK data section DIFFERS
Block 3 Input section OK data section OK
Block 4 Input section OK data section OK
$ ./ < testdata
File Name: fixed_am_7bitLI_HE10.txt Test Case: Test_Case_1 Status: PASSED
File Name: fixed_am_7bitLI_HE10.txt Test Case: Test_Case_2 Status: FAILED
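The question also asks which block differs in a failed test case. A minimal Python sketch of the same line-by-line scan, extended to remember the current block, might look like the following. The function name summarize, the regular expressions and the sample lines are assumptions made for illustration, based on the file format shown in the question.

```python
import re

# Matches a START tag line and captures the file name and test case parts.
HEADER = re.compile(r"^=+\s*(File Name:\s+\S+)\s+(Test Case:\s+\S+)\s+START\s*=+$")
BLOCK = re.compile(r"^(Block \d+)\b")

def summarize(lines):
    """Return one (label, status, differing_blocks) tuple per test case."""
    results = []
    label, block, differing = None, None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        header = HEADER.match(line)
        block_match = BLOCK.match(line)
        if header:
            # A new START tag closes the previous test case.
            if label is not None:
                results.append((label, "FAILED" if differing else "PASS", differing))
            label, block, differing = " ".join(header.groups()), None, []
        elif block_match:
            block = block_match.group(1)
        if re.search(r"\bDIFFERS\b", line) and block and block not in differing:
            differing.append(block)
    if label is not None:
        results.append((label, "FAILED" if differing else "PASS", differing))
    return results

sample = [
    "========== File Name: fixed_am_7bitLI_HE10.txt Test Case: Test_Case_1 START ==========",
    "Block 1", "Block 2", "Input section OK", "data section OK",
    "========== File Name: fixed_am_7bitLI_HE10.txt Test Case: Test_Case_2 START ========",
    "Block 1", "Block 2", "Input section OK", "data section DIFFERS",
    "Block 3", "Input section OK", "data section OK",
]
results = summarize(sample)
for label, status, blocks in results:
    print(label, "Status:", status, blocks)
```

Because the block name is taken from the start of the line, the sketch should handle both the multi-line layout from the question and the one-line-per-block layout used in the testdata above.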