Perl one liner to simulate awk script - perl

I'm new to both awk and perl, so please bear with me.
I have the following awk script:
awk '/regex1/{p = 0;} /regex2/{p = 1;} p'
What this basically does is print all lines starting from a line matching regex2 until a line matching regex1 is found.
Example:
regex1
regex2
line 1
line 2
regex1
regex2
regex1
Output:
regex2
line 1
line 2
regex2
Is it possible to simulate this using a perl one-liner? I know I can do it with a script saved in a file.
Edit:
A practical example:
24 May 2017 17:00:06,827 [INFO] 123456 (Blah : Blah1) Service-name:: Single line content
24 May 2017 17:00:06,828 [INFO] 567890 (Blah : Blah1) Service-name:: Content( May span multiple lines)
24 May 2017 17:00:06,829 [INFO] 123456 (Blah : Blah2)
Service-name: Multiple line content. Printing Object[ ID1=fac-adasd
ID2=123231
ID3=123108 Status=Unknown
Code=530007 Dest=CA
]
24 May 2017 17:00:06,830 [INFO] 123456 (Blah : Blah1) Service-name:: Single line content
24 May 2017 17:00:06,831 [INFO] 567890 (Blah : Blah2) Service-name:: Content( May span multiple lines)
Given the search key 123456 I want to extract the following:
24 May 2017 17:00:06,827 [INFO] 123456 (Blah : Blah1) Service-name:: Single line content
24 May 2017 17:00:06,829 [INFO] 123456 (Blah : Blah2)
Service-name: Multiple line content. Printing Object[ ID1=fac-adasd
ID2=123231
ID3=123108 Status=Unknown
Code=530007 Dest=CA
]
24 May 2017 17:00:06,830 [INFO] 123456 (Blah : Blah1) Service-name:: Single line content
The following awk script does the job:
awk '/[0-9]{2}\s\w+\s[0-9]{4}/{n = 0} /123456/ {n =1}n' file

perl -ne 'print if (/regex2/ .. /regex1/) =~ /^\d+$/'
This is slightly crazy, but here's how it works:
-n adds an implicit loop over the input lines
the current line is in $_
the two bare regex matches (/regex2/, /regex1/) implicitly test against $_
we use .. in scalar context, which turns it into a stateful flip-flop operator
By that I mean: X .. Y starts out in the "false" state. In the "false" state it only evaluates X. If X returns a false value, it remains in the "false" state (and returns false itself). Once X returns a true value, it moves into the "true" state and returns true.
In the "true" state it only evaluates Y. If Y returns false, it remains in the "true" state (and returns true itself). Once Y returns a true value, it moves into the "false" state but it still returns true.
had we just used print if /regex2/ .. /regex1/, it would have printed all the terminating regex1 lines, too
a close reading of Range Operators in perldoc perlop reveals that you can distinguish the end points of the range
the "true" value returned by .. is actually a sequence number starting from 1, so the start of a range can be identified by checking for 1
when the end of the range is reached (i.e. we're about to move from the "true" state to the "false" state again), the return value gets a "E0" tacked on to the end
Adding "E0" to an integer doesn't affect its numeric value. Perl implicitly converts strings to numbers when needed, and something like "5E0" is just scientific notation (meaning 5 * 10**0, which is 5 * 1, which is 5).
the "false" value returned by .. is the empty string, ""
We check that the result of .. matches the regex /^\d+$/, i.e. is all digits. This excludes the empty string (because we require at least one digit to match), so we don't print lines outside of the range. It also excludes the last line in our range, because E is not a digit.
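The state machine described above can be sketched outside Perl as well. The following Python is only an illustration of the flip-flop logic, not what Perl does internally; the function name flip_flop_filter and its parameters are made up for this example.

```python
import re

def flip_flop_filter(lines, start, stop):
    """Yield lines from a `start` match up to, but excluding, the next `stop` match."""
    seq = 0  # 0 stands for the "false" state; otherwise the running sequence number
    for line in lines:
        if seq == 0:
            if not re.search(start, line):
                continue  # still "false": Perl's .. would return ""
            seq = 1
        else:
            seq += 1
        # As with Perl's two-dot form, the end test also runs on the line that opened the range.
        ended = re.search(stop, line) is not None
        result = f"{seq}E0" if ended else str(seq)
        if ended:
            seq = 0  # back to the "false" state for the next line
        if re.fullmatch(r"\d+", result):  # same check as =~ /^\d+$/
            yield line

sample = ["regex1", "regex2", "line 1", "line 2", "regex1", "regex2", "regex1"]
print(list(flip_flop_filter(sample, r"regex2", r"regex1")))
```

Run on the example input from the question, this keeps regex2, line 1, line 2 and the second regex2, and drops every regex1 line.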

Not sure if awk prints both the start and end of the range, but Perl does:
perl -ne 'if(/regex2/ ... /regex1/){print}' file
Edit: Awk (at least Gnu awk) also has a range operator, so this could have been done more simply as:
awk '/regex2/,/regex1/' file

Related

Perl illegal division by zero at -e

I am running the following script:
# Usage: sh hmmscan-parser.sh hmmscan-file
# 1. take hmmer3 output and generate the tabular output
# 2. sort on the 6th and 7th cols
# 3. remove overlapped/redundant hmm matches and keep the one with the lower e-values
# 4. calculate the covered fraction of hmm (make sure you have downloaded the "all.hmm.ps.len" file to the same directory of this perl script)
# 5. apply the E-value cutoff and the covered faction cutoff
cat $1 | perl -e 'while(<>){if(/^\/\//){$x=join("",@a);($q)=($x=~/^Query:\s+(\S+)/m);while($x=~/^>> (\S+.*?\n\n)/msg){$a=$&;@c=split(/\n/,$a);$c[0]=~s/>> //;for($i=3;$i<=$#c;$i++){@d=split(/\s+/,$c[$i]);print $q."\t".$c[0]."\t$d[6]\t$d[7]\t$d[8]\t$d[10]\t$d[11]\n" if $d[6]<1;}}@a=();}else{push(@a,$_);}}' \
| sort -k 1,1 -k 6,6n -k 7,7n | uniq \
| perl -e 'while(<>){chomp;@a=split;next if $a[-1]==$a[-2];push(@{$b{$a[0]}},$_);}foreach(sort keys %b){@a=@{$b{$_}};for($i=0;$i<$#a;$i++){@b=split(/\t/,$a[$i]);@c=split(/\t/,$a[$i+1]);$len1=$b[-1]-$b[-2];$len2=$c[-1]-$c[-2];$len3=$b[-1]-$c[-2];if($len3>0 and ($len3/$len1>0.5 or $len3/$len2>0.5)){if($b[2]<$c[2]){splice(@a,$i+1,1);}else{splice(@a,$i,1);}$i=$i-1;}}foreach(@a){print $_."\n";}}' \
| uniq | perl -e 'open(IN,"all.hmm.ps.len");
while(<IN>)
{
chomp;
@a=split;
$b{$a[0]}=$a[1]; # creates hash of hmmName : hmmLength
}
while(<>)
{
chomp;
@a=split;
$r=($a[3]-$a[2])/$b{$a[1]}; # $a[4] = hmm end $a[3] = hmm start ; $b{$a[1]} = result of the hash of the name of the hmm (hmm length).
print $_."\t".$r."\n";
}' \
| perl -e 'while(<>){@a=split(/\t/,$_);if(($a[-2]-$a[-3])>80){print $_ if $a[2]<1e-5;}else{print $_ if $a[2]<1e-3;}}' | awk '$NF>0.3'
When I run the file, I get "Illegal division by zero at -e line 14, <> line 1"
Line 14 is :
$r=($a[3]-$a[2])/$b{$a[1]};
The first input (hmmscan-file) is under this form :
Query: NODE_1_length_300803_cov_11.207433_1 [L=264]
Description: # 139 # 930 # 1 # ID=1_1;partial=00;start_type=ATG;rbs_motif=AGGAG;rbs_spacer=5-10bp;gc_cont=0.543
Scores for complete sequence (score includes all domains):
--- full sequence --- --- best 1 domain --- -#dom-
E-value score bias E-value score bias exp N Model Description
------- ------ ----- ------- ------ ----- ---- -- -------- -----------
[No hits detected that satisfy reporting thresholds]
Domain annotation for each model (and alignments):
[No targets detected that satisfy reporting thresholds]
Internal pipeline statistics summary:
-------------------------------------
Query sequence(s): 1 (264 residues searched)
Target model(s): 641 (202466 nodes)
Passed MSV filter: 18 (0.0280811); expected 12.8 (0.02)
Passed bias filter: 18 (0.0280811); expected 12.8 (0.02)
Passed Vit filter: 1 (0.00156006); expected 0.6 (0.001)
Passed Fwd filter: 0 (0); expected 0.0 (1e-05)
Initial search space (Z): 641 [actual number of targets]
Domain search space (domZ): 0 [number of targets reported over threshold]
# CPU time: 0.02u 0.02s 00:00:00.04 Elapsed: 00:00:00.03
# Mc/sec: 1357.56
//
Query: NODE_1_length_300803_cov_11.207433_2 [L=184]
Description: # 945 # 1496 # 1 # ID=1_2;partial=00;start_type=ATG;rbs_motif=GGA/GAG/AGG;rbs_spacer=5-10bp;gc_cont=0.431
The second input file (all.hmm.ps.len), which the third perl command reads, has this form:
BM10.hmm 28
CBM11.hmm 163
I don't understand where the problem is. The overall aim of the script is to turn the input (hmmscan-file) into a table that is easy to read.
Thank you very much.
So you have this error:
Illegal division by zero at -e line 14, <> line 1
And you say that line 14 is this:
$r=($a[3]-$a[2])/$b{$a[1]};
Well, there's only one division in that line of code, so it seems clear that when you're executing that line $b{$a[1]} contains zero (but note that because you don't have use warnings in your code, it's possible that it might also contain an empty string or undef and that's being silently converted to a zero).
So as the programmer, your job is to trace through your code and find out where these values are being set and what might be causing it not to contain a value that can be used in your division.
I would be happy to help work out what the problem is, but for a few problems:
Your program reads from two input files and you've only given us one.
Your variables all have single-letter names, making it impossible to understand what they are for.
Your use of a string of command-line programs is bizarre and renders your code pretty much unreadable.
If you want help (not just from here, but from any forum), your first move should be to make your code as readable as possible. It's more than likely that by doing that, you'll find the problem yourself.
But your current habit of reposting the question with incremental changes is winning you no friends here.
Update: One idea might be to rewrite your line as something like this:
if (exists $b{$a[1]}) {
$r=($a[3]-$a[2])/$b{$a[1]};
} else {
warn "Hey, I can't find the key '$a[1]' in the hash \%b\n";
}
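The same guard can be sketched in Python rather than Perl, which makes it easy to see that there are two distinct failure cases: the key is missing from the table, or the stored length is zero. The function and parameter names here are hypothetical and only mirror the roles of %b and @a in the question.

```python
import sys

def covered_fraction(hmm_lengths, hmm_name, hmm_start, hmm_end):
    """Return (end - start) / length, or None when the length is unusable."""
    length = hmm_lengths.get(hmm_name)
    if length is None:
        # Counterpart of the failed "exists" check in the Perl snippet above
        print(f"cannot find the key '{hmm_name}' in the length table", file=sys.stderr)
        return None
    if length == 0:
        print(f"length recorded for '{hmm_name}' is zero", file=sys.stderr)
        return None
    return (hmm_end - hmm_start) / length

lengths = {"CBM11.hmm": 163}
print(covered_fraction(lengths, "CBM11.hmm", 10, 100))
print(covered_fraction(lengths, "missing.hmm", 10, 100))
```

Reporting the offending key is the useful part: it shows whether the names in the two input files actually agree.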

Combining Multiple String Commands Into One Line

I'm using PowerShell and running a tool to extract Lenovo hardware RAID controller info to identify the controller number for use later on in another command line (this is part of a SCCM Server Build Task Sequence). The tool outputs a lot of data and I'm trying to isolate just what I need from the output.
I've been able to isolate what I need, but I'm thinking there has to be a more efficient way so looking for optimizations. I'm still learning when it comes to working with strings.
The line output from the tool that I'm looking for looks like this:
0 0 0 252:0 17 DRIVE Onln N 557.861 GB dsbl N N dflt -
I'm trying to get the 3 characters to the left of the :0 (the 252 but on other models this could be 65 or some other 2 or 3 digit number)
My existing code is:
$ControllerInfo = cmd /c '<path>\storcli64.exe /c0 show'
foreach ($line in $ControllerInfo) {
if ($line -like '*:0 *') {
$ControllerNum = $line.split(':')[0] # Get everything left of :
$ControllerNum = $ControllerNum.Substring($ControllerNum.Length -3) # Get last 3 chars of string
$ControllerNum = $ControllerNum.Replace(' ', '') # Remove blanks
Write-Host $ControllerNum
break #stop looping through output
}
}
The above works, but I'm wondering if there's a way to combine the three lines that start with $ControllerNum = so I can just have a single $ControllerNum = (commands) line to set the variable instead of doing it in 3 lines. Basically, I want to combine the Split, Substring and Replace commands into a single line.
Thanks!
Here's another option:
$ControllerNum = ([regex]'(\d{2,3}):0').Match($line).Groups[1].Value
Used on your sample 0 0 0 252:0 17 DRIVE Onln N 557.861 GB dsbl N N dflt -
the result in $ControllerNum will be 252
If you want just the last digits before the first :, without any whitespace, you can do that with one or two regex expressions:
$line -replace '^.*\b(\d+):.*$','$1'
Regex explanation:
^ # start of string
.* # any number of any characters
\b # word boundary
( # start capture group
\d+ # 1 or more digits
) # end capture group
: # a literal colon (:)
.* # any number of any characters
$ # end of string
replacement:
$1 # Value captured in the capture group above
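To check the pattern outside PowerShell, here is a small Python sketch of the same idea: capture the digits that sit directly before a colon. It uses re.search, so it returns the first such group on the line; the helper name controller_number is invented for this example.

```python
import re

def controller_number(line):
    """Return the digits immediately before the first ':' on the line, or None."""
    match = re.search(r"\b(\d+):", line)
    return match.group(1) if match else None

sample = "0 0 0 252:0 17 DRIVE Onln N 557.861 GB dsbl N N dflt -"
print(controller_number(sample))
```

Because the capture group contains only digits, no separate trimming of blanks is needed for two-digit values such as 65.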

sed: replace letter between square brackets

I have the following string:
signal[i]
signal[bg]
output [10:0]
input [i:1]
What I want is to replace the letters between square brackets (with underscores, for example) while keeping the other strings, which represent table declarations:
signal[_]
signal[__]
output [10:0]
input [i:1]
thanks
try:
awk '{gsub(/\[[a-zA-Z]+\]/,"[_]")} 1' Input_file
This globally replaces each run of letters enclosed in brackets (longest match) with [_]. The trailing 1 prints every line, edited or not.
EDIT: The above replaces any run of letters with a single _. To get one underscore per character, the following may help:
awk '{match($0,/\[[a-zA-Z]+\]/);VAL=substr($0,RSTART+1,RLENGTH-2);if(VAL){len=length(VAL);;while(i<len){q=q?q"_":"_";i++}};gsub(/\[[a-zA-Z]+\]/,"["q"]")}1' Input_file
OR
awk '{
match($0,/\[[a-zA-Z]+\]/);
VAL=substr($0,RSTART+1,RLENGTH-2);
if(VAL){
len=length(VAL);
while(i<len){
q=q?q"_":"_";
i++
}
};
gsub(/\[[a-zA-Z]+\]/,"["q"]")
}
1
' Input_file
Will add explanation soon.
EDIT2: The same script, commented for the OP and other readers.
awk '{
match($0,/\[[a-zA-Z]+\]/); #### match() is an awk built-in; here it looks for [letters], as the OP requires.
VAL=substr($0,RSTART+1,RLENGTH-2); #### VAL holds the letters between the brackets: the substring starting at RSTART+1 with length RLENGTH-2.
#### RSTART and RLENGTH are built-in variables set by match(); they describe a match only when one was found.
if(VAL){ #### If VAL is not empty, do the following.
len=length(VAL); #### len holds the length of VAL.
while(i<len){ #### Loop while i (initially unset, so 0) is less than len.
q=q?q"_":"_"; #### Append one "_" to q on each pass.
i++ #### Increment i by 1 each time.
}
};
gsub(/\[[a-zA-Z]+\]/,"["q"]") #### Globally replace [letters] with [ followed by q (the underscores) and ].
}
1 #### The 1 prints every line, edited or not.
' Input_file #### The input file.
Alternative gawk solution:
awk -F'\\[|\\]' '$2!~/^[0-9]+:[0-9]$/{ gsub(/./,"_",$2); $2="["$2"]" }1' OFS= file
The output:
signal[_]
signal[__]
output [10:0]
-F'\\[|\\]' - treating [ and ] as field separators
$2!~/^[0-9]+:[0-9]$/ - performing action if the 2nd field does not represent table declaration
gsub(/./,"_",$2) - replace each character with _
This might work for you (GNU sed);
sed ':a;s/\(\[_*\)[[:alpha:]]\([[:alpha:]]*\]\)/\1_\2/;ta' file
Match on opening and closing square brackets with any number of _'s and at least one alpha character and replace said character by an underscore and repeat.
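For comparison, the "one underscore per letter" rule can be written as a single substitution with a replacement function. This Python sketch is an illustration of the rule, not a translation of any one answer here; the name mask_bracket_letters is made up.

```python
import re

def mask_bracket_letters(line):
    """Replace a letters-only bracket group with the same number of underscores."""
    return re.sub(
        r"\[([A-Za-z]+)\]",
        lambda m: "[" + "_" * len(m.group(1)) + "]",
        line,
    )

for text in ["signal[i]", "signal[bg]", "output [10:0]", "input [i:1]"]:
    print(mask_bracket_letters(text))
```

Groups such as [10:0] and [i:1] contain characters other than letters, so the pattern does not match them and they are left untouched.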
awk '{sub(/\[i\]/,"[_]")sub(/\[bg\]/,"[__]")}1' file
signal[_]
signal[__]
output [10:0]
input [i:1]
The explanation is as follows: since a bracket is a special character, it has to be escaped to be treated literally; after that, using sub is easy.

Brainfuck challenge

I have a challenge: I must write Brainfuck code.
For a given number n, determine its last digit.
Input
The input consists of a single line containing one integer n (1 <= n <= 2,000,000,000), followed by a newline '\n' (ASCII 10).
Output
The output must contain exactly one integer: the last digit of n.
Example I
Input: 32
Output: 2
Example II
Input: 231231132
Output: 2
This is what I tried, but it didn't work:
+[>,]<.>++++++++++.
The last input is the newline. So you have to go two memory positions back to get the last digit of the number. And maybe you don't have to return a newline character, so the code is
,[>,]<<.
Nope sorry, real answer is
,[>,]<.
because your answer was getting one too far ;)
Depending on the interpreter, you might have to handle the return key yourself. Since the return key is ASCII 10, your code should look like this:
>,----- -----[+++++ +++++>,----- -----]<.
broken down :
> | //first operation (just in case your interpreter does not
support a negative pointer index)
,----- ----- | //first entry if it's a return; you don't even get in the loop
[
+++++ +++++ | //if the value was not ASCII 10; you want the original value back
>, | //every next entry
----- ----- | //check again for the return,
you exit the loop only if the last entered value is 10
]
<. | //your current pointer is 0; you go back to the last valid entry
and you display it
Your issue is that a loop keeps running until the cell under the pointer equals 0 when the end of the loop is reached. Your code never prints inside the loop and never subtracts, so your loop will never end: all your code does is take an ASCII character as input, move one cell forward, take another ASCII character as input, and so on. Everything after the end of the loop is useless, because your loop will never end.
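Since the answers disagree partly because interpreters differ, it helps to try the candidate programs in a tiny interpreter. The Python sketch below assumes that reading at end of input stores 0 and that cells wrap at 256; other interpreters leave the cell unchanged or store -1 on EOF, and the results change accordingly. The name run_bf is invented for this example.

```python
def run_bf(code, stdin_text):
    """Run a Brainfuck program; EOF is assumed to read as 0 (interpreters differ)."""
    # Pair up the brackets once so loops can jump directly.
    jumps, stack = {}, []
    for pos, ch in enumerate(code):
        if ch == "[":
            stack.append(pos)
        elif ch == "]":
            opening = stack.pop()
            jumps[opening], jumps[pos] = pos, opening

    tape = [0] * 30000
    data = stdin_text.encode("ascii")
    ptr = pc = read = 0
    out = []
    while pc < len(code):
        ch = code[pc]
        if ch == ">":
            ptr += 1
        elif ch == "<":
            ptr -= 1
        elif ch == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif ch == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif ch == ",":
            tape[ptr] = data[read] if read < len(data) else 0
            read += 1
        elif ch == ".":
            out.append(chr(tape[ptr]))
        elif ch == "[" and tape[ptr] == 0:
            pc = jumps[pc]  # skip the loop body
        elif ch == "]" and tape[ptr] != 0:
            pc = jumps[pc]  # go back to the start of the loop
        pc += 1  # any other character is ignored, as in Brainfuck
    return "".join(out)

print(repr(run_bf(",[>,]<<.", "32\n")))
print(repr(run_bf(">,----------[++++++++++>,----------]<.", "231231132\n")))
```

Under these assumptions, ,[>,]<<. prints the last digit, while ,[>,]<. prints the newline, because the loop only stops on the 0 stored after the newline. The version that subtracts 10 stops on the newline itself and does not rely on EOF behaviour at all.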

Code Golf - Word Scrambler

Please answer with the shortest possible source code for a program that converts an arbitrary plaintext to its corresponding ciphertext, following the sample input and output I have given below. Bonus points* for the least CPU time or the least amount of memory used.
Example 1:
Plaintext: The quick brown fox jumps over the lazy dog. Supercalifragilisticexpialidocious!
Ciphertext: eTh kiquc nobrw xfo smjup rvoe eth yalz .odg !uioiapeislgriarpSueclfaiitcxildcos
Example 2:
Plaintext: 123 1234 12345 123456 1234567 12345678 123456789
Ciphertext: 312 4213 53124 642135 7531246 86421357 975312468
Rules:
Punctuation is defined to be included with the word it is closest to.
The center of a word is defined to be ceiling((strlen(word)+1)/2).
Whitespace is ignored (or collapsed).
Odd words move to the right first. Even words move to the left first.
You can think of it as reading every other character backwards (starting from the end of the word), followed by the remaining characters forwards. Corporation => XoXpXrXtXoX => niaorCoprto.
Thank you to those who pointed out the inconsistency in my description. This has led many of you down the wrong path, for which I apologize. Rule #4 should clear things up.
*Bonus points will only be awarded if Jeff Atwood decides to do so. Since I haven't checked with him, the chances are slim. Sorry.
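As an ungolfed reference for the rule "every other character backwards from the end of the word, then the remaining characters forwards", here is a short Python 3 sketch. It uses the same slicing idea as the golfed Python answer; the function names are chosen for this example only.

```python
def scramble_word(word):
    """Every other character read backwards from the end, then the rest forwards."""
    backwards = word[::-2]            # last char, third from last, ...
    forwards = word[len(word) % 2::2]  # the characters skipped above, in order
    return backwards + forwards

def scramble(text):
    """Scramble each whitespace-separated word; runs of whitespace collapse to one space."""
    return " ".join(scramble_word(w) for w in text.split())

print(scramble("123 1234 12345 123456 1234567 12345678 123456789"))
print(scramble_word("Corporation"))
```

The len(word) % 2 offset is what makes odd-length and even-length words start on different sides, which is the behaviour rule #4 describes.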
Python, 50 characters
For input in i:
' '.join(x[::-2]+x[len(x)%2::2]for x in i.split())
Alternate version that handles its own IO:
print ' '.join(x[::-2]+x[len(x)%2::2]for x in raw_input().split())
A total of 66 characters if including whitespace. (Technically, the print could be omitted if running from a command line, since the evaluated value of the code is displayed as output by default.)
Alternate version using reduce:
' '.join(reduce(lambda x,y:y+x[::-1],x) for x in i.split())
59 characters.
Original version (both even and odd go right first) for an input in i:
' '.join(x[::2][::-1]+x[1::2]for x in i.split())
48 characters including whitespace.
Another alternate version which (while slightly longer) is slightly more efficient:
' '.join(x[len(x)%2-2::-2]+x[1::2]for x in i.split())
(53 characters)
J, 58 characters
>,&.>/({~(,~(>:#+:#i.#-#<.,+:#i.#>.)#-:)#<:##)&.><;.2,&' '
Haskell, 64 characters
unwords.map(map snd.sort.zip(zipWith(*)[0..]$cycle[-1,1])).words
Well, okay, 76 if you add in the requisite "import List".
Python - 69 chars
(including whitespace and linebreaks)
This handles all I/O.
for w in raw_input().split():
o=""
for c in w:o=c+o[::-1]
print o,
Perl, 78 characters
For input in $_. If that's not acceptable, add six characters for either $_=<>; or $_=$s; at the beginning. The newline is for readability only.
for(split){$i=length;print substr$_,$i--,1,''while$i-->0;
print"$_ ";}print $/
C, 140 characters
Nicely formatted:
main(c, v)
char **v;
{
for( ; *++v; )
{
char *e = *v + strlen(*v), *x;
for(x = e-1; x >= *v; x -= 2)
putchar(*x);
for(x = *v + (x < *v-1); x < e; x += 2)
putchar(*x);
putchar(' ');
}
}
Compressed:
main(c,v)char**v;{for(;*++v;){char*e=*v+strlen(*v),*x;for(x=e-1;x>=*v;x-=2)putchar(*x);for(x=*v+(x<*v-1);x<e;x+=2)putchar(*x);putchar(32);}}
Lua
130 char function, 147 char functioning program
Lua doesn't get enough love in code golf -- maybe because it's hard to write a short program when you have long keywords like function/end, if/then/end, etc.
First I write the function in a verbose manner with explanations, then I rewrite it as a compressed, standalone function, then I call that function on the single argument specified at the command line.
I had to format the code with <pre></pre> tags because Markdown does a horrible job of formatting Lua.
Technically you could get a smaller running program by inlining the function, but it's more modular this way :)
t = "The quick brown fox jumps over the lazy dog. Supercalifragilisticexpialidocious!"
T = t:gsub("%S+", -- for each word in t...
function(w) -- argument: current word in t
W = "" -- initialize new Word
for i = 1,#w do -- iterate over each character in word
c = w:sub(i,i) -- extract current character
-- determine whether letter goes on right or left end
W = (#w % 2 ~= i % 2) and W .. c or c .. W
end
return W -- swap word in t with inverted Word
end)
-- code-golf unit test
assert(T == "eTh kiquc nobrw xfo smjup rvoe eth yalz .odg !uioiapeislgriarpSueclfaiitcxildcos")
-- need to assign to a variable and return it,
-- because gsub returns a pair and we only want the first element
f=function(s)c=s:gsub("%S+",function(w)W=""for i=1,#w do c=w:sub(i,i)W=(#w%2~=i%2)and W ..c or c ..W end return W end)return c end
-- 1 2 3 4 5 6 7 8 9 10 11 12 13
--34567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
-- 130 chars, compressed and written as a proper function
print(f(arg[1]))
--34567890123456
-- 16 (+1 whitespace needed) chars to make it a functioning Lua program,
-- operating on command line argument
Output:
$ lua insideout.lua 'The quick brown fox jumps over the lazy dog. Supercalifragilisticexpialidocious!'
eTh kiquc nobrw xfo smjup rvoe eth yalz .odg !uioiapeislgriarpSueclfaiitcxildcos
I'm still pretty new at Lua so I'd like to see a shorter solution if there is one.
For a minimal cipher on all args, written to stdout, we can do 111 chars:
for _,w in ipairs(arg)do W=""for i=1,#w do c=w:sub(i,i)W=(#w%2~=i%2)and W ..c or c ..W end io.write(W ..' ')end
But this approach does output a trailing space like some of the other solutions.
For an input in s:
f=lambda t,r="":t and f(t[1:],len(t)&1and t[0]+r or r+t[0])or r
" ".join(map(f,s.split()))
Python, 90 characters including whitespace.
TCL
125 characters
set s set f foreach l {}
$f w [gets stdin] {$s r {}
$f c [split $w {}] {$s r $c[string reverse $r]}
$s l "$l $r"}
puts $l
Bash - 133, assuming input is in $w variable
Pretty
for x in $w; do
z="";
for l in `echo $x|sed 's/\(.\)/ \1/g'`; do
if ((${#z}%2)); then
z=$z$l;
else
z=$l$z;
fi;
done;
echo -n "$z ";
done;
echo
Compressed
for x in $w;do z="";for l in `echo $x|sed 's/\(.\)/ \1/g'`;do if ((${#z}%2));then z=$z$l;else z=$l$z;fi;done;echo -n "$z ";done;echo
Ok, so it outputs a trailing space.