How can I extract the first and last line from multiple text blocks separated by a blank line? - sed

I have a file containing multiple tests, each with its detailed actions written one beneath another. The test blocks are separated from one another by a blank line. I want to extract only the first and last line of every block, put them on one line per block, and write the result to a new file. Here is an example:
input.txt:
[test1]
duration
summary
code=
Results= PASS
[test2]
duration
summary=x
code=
Results=FAIL
.....
[testX]
duration
summary=x
code=
Results= PASS
output.txt should be something like this:
test1 PASS
test2 FAIL
...
testX PASS
Example 2:
[Linux_MP3Enc_xffv.2_Con_37_003]
type = testcase
summary = MP3 encoder test
ActionGroup[Linux_Enc] = PASS
ActionGroup[Linux_Playb] = PASS
ActionGroup[Linux_Pause_Resume] = PASS
ActionGroup[Linux_Fast_Seek] = PASS
Duration = 230.607398987 s
Total_Result = PASS
[Composer__vtx_007]
type = testcase
summary = composer
Background[0xff000000] = PASS
Background[0xffFFFFFF] = PASS
Background[0xffFF0000] = PASS
Background[0xff00FF00] = PASS
Background[0xff0000FF] = PASS
Background[0xff00FFFF] = PASS
Background[0xffFFFF00] = PASS
Background[0xffFF00FF] = PASS
Duration = 28.3567230701 s
Total_Result = PASS
[Videox_Rotate_008]
type = testcase
summary = rotation
Rotation[0] = PASS
Rotation[1] = PASS
Rotation[2] = PASS
Rotation[3] = PASS
Duration = 14.0116529465 s
Total_Result = PASS
Thank you!

Short and simple GNU awk:
awk -F= -v RS='' '{print $1 $NF}' file
[Linux_MP3Enc_xffv.2_Con_37_003] PASS
[Composer__vtx_007] PASS
[Videox_Rotate_008] PASS
If you do not like the brackets:
awk -F'[]=[]' -v RS='' '{print $2 $NF}' file
Linux_MP3Enc_xffv.2_Con_37_003 PASS
Composer__vtx_007 PASS
Videox_Rotate_008 PASS

Using sed as tagged (although other tools would probably be more natural to use):
sed -nE '/^\[.*\]$/h;s/^Results= ?//;t r;b;:r;H;x;s/\n/ /;p'
Explanation :
/^\[.*\]$/h # matches the [...] lines, put them in the hold buffer
s/^Results= ?// # matches the Results= lines, discards the useless part
t r;b # on lines which matched, jump to label r;
# otherwise jump to the end (and start processing the next line)
:r;H;x;s/\n/ /;p # label r; append the pattern space (which contains the rest of the Results= line)
# to the hold buffer, swap the hold buffer and the pattern space,
# replace the newline in the pattern space by a space and print it
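For readers less familiar with sed's hold space, here is a rough Python sketch of the same state machine. Python is swapped in purely for illustration and is not part of the answer; the helper name pair_header_with_result is hypothetical, and the sketch assumes every Results= line follows a [header] line.

```python
import re

def pair_header_with_result(lines):
    """Rough Python analogue of the sed script (hypothetical helper)."""
    held = None  # plays the role of sed's hold space
    out = []
    for line in lines:
        if re.fullmatch(r"\[.*\]", line):      # /^\[.*\]$/h : remember the header
            held = line
            continue
        m = re.match(r"Results= ?(.*)", line)  # s/^Results= ?// : keep only the result
        if m and held is not None:             # t r : only lines that matched go on
            out.append(held + " " + m.group(1))  # H;x;s/\n/ /;p : join with a space
    return out


sample = """[test1]
duration
summary
code=
Results= PASS
[test2]
duration
summary=x
code=
Results=FAIL"""

for row in pair_header_with_result(sample.splitlines()):
    print(row)  # prints "[test1] PASS", then "[test2] FAIL"
```

Like the sed version, this keeps the square brackets around the test name and does not depend on blank lines between blocks.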

One way to solve this is using a regular expression such as:
(?<testId>test\d+)(?:.*\n){4}.*(?<outcome>PASS|FAIL)
The regex matches your sample input and stores the test id (e.g. "test1") in the capture group named "testId" and the outcome (e.g. "PASS") in the capture group "outcome".
The regex can be used in any language with regex support. The code below shows how to do it in Python.
import re
# Read from input.txt
with open('input.txt', 'r') as f:
    indata = f.read()
# Modify the regex slightly to fit Python regex syntax (named groups are written (?P<name>...));
# a raw string keeps \d and \n as regex escapes
pattern = r'(?:.*)(?P<testId>test\d+)(?:.*\n){4}.*(?P<outcome>PASS|FAIL)'
# Get an iterator which yields all matches
matches = re.finditer(pattern, indata)
# Combine the matches into a list of strings
outputs = ['{} {}'.format(m.group('testId'), m.group('outcome')) for m in matches]
# Join all rows into one string
output = '\n'.join(outputs)
# Write to output.txt
with open('output.txt', 'w') as f:
    f.write(output)
Running the above script on input.txt containing:
[test1]
duration
summary
code=
Results= PASS
[test2]
duration
summary=x
code=
Results=FAIL
[test444]
duration
summary=x
code=
Results= PASS
yields a file output.txt containing:
test1 PASS
test2 FAIL
test444 PASS

To print the first and last line of each block, how about:
awk -v RS="" '{
n = split($0, a, /\n/)
print a[1]
print a[n]
}' input.txt
Result for the second example:
[Linux_MP3Enc_xffv.2_Con_37_003]
Total_Result = PASS
[Composer__vtx_007]
Total_Result = PASS
[Videox_Rotate_008]
Total_Result = PASS
The awk man page says:
If RS is set to the null string, then records are separated by blank lines.
With this feature you can easily split the input into blocks at the blank lines.
Hope this helps.
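To make the paragraph-mode behaviour concrete, here is a small Python sketch that emulates RS="": runs of two or more newlines separate the blocks and leading/trailing newlines are ignored (roughly what GNU awk documents for an empty RS). The helper name first_and_last is hypothetical, and the sample is an abbreviated version of the second example with the separating blank line the question describes.

```python
import re

def first_and_last(text):
    """Emulate awk's RS="" (paragraph mode): blocks are separated by runs of
    blank lines, and leading/trailing newlines are ignored. Returns a
    (first_line, last_line) pair per block."""
    blocks = re.split(r"\n{2,}", text.strip("\n"))
    pairs = []
    for block in blocks:
        lines = block.split("\n")
        pairs.append((lines[0], lines[-1]))
    return pairs


sample = """[Linux_MP3Enc_xffv.2_Con_37_003]
type = testcase
Duration = 230.607398987 s
Total_Result = PASS

[Videox_Rotate_008]
type = testcase
Duration = 14.0116529465 s
Total_Result = PASS
"""

for first, last in first_and_last(sample):
    print(first)
    print(last)
```

As with the awk version, the header and the result are printed on separate lines; joining them per block is a one-line change.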

Related

sed: replace letter between square brackets

I have the following strings:
signal[i]
signal[bg]
output [10:0]
input [i:1]
What I want is to replace the letters between square brackets (with underscores, for example) while keeping the other strings that represent table declarations:
signal[_]
signal[__]
output [10:0]
input [i:1]
thanks
Try:
awk '{gsub(/\[[a-zA-Z]+\]/,"[_]")} 1' Input_file
This globally substitutes each bracketed run of letters (longest match) with [_]. The trailing 1 is an always-true condition, so every line is printed, edited or not.
EDIT: The above replaces all the letters with a single _. To get as many underscores as there are letters, the following may help:
awk '{q="";i=0;match($0,/\[[a-zA-Z]+\]/);VAL=substr($0,RSTART+1,RLENGTH-2);if(VAL){len=length(VAL);while(i<len){q=q?q"_":"_";i++}};gsub(/\[[a-zA-Z]+\]/,"["q"]")}1' Input_file
OR
awk '{
  q=""; i=0;
  match($0,/\[[a-zA-Z]+\]/);
  VAL=substr($0,RSTART+1,RLENGTH-2);
  if(VAL){
    len=length(VAL);
    while(i<len){
      q=q?q"_":"_";
      i++
    }
  };
  gsub(/\[[a-zA-Z]+\]/,"["q"]")
}
1
' Input_file
Will add explanation soon.
EDIT2: Here is the same script with explanatory comments for the OP and other users.
awk '{
  q=""; i=0;                           #### Reset the underscore string and the counter on every line, so earlier lines do not leak into later ones.
  match($0,/\[[a-zA-Z]+\]/);           #### Use the built-in match() function of awk to find [letters], as the OP requires.
  VAL=substr($0,RSTART+1,RLENGTH-2);   #### VAL holds the letters between the brackets: the substring starting at RSTART+1 with length RLENGTH-2.
                                       #### RSTART and RLENGTH are built-in variables set by match(); they are 0 and -1 when nothing matches.
  if(VAL){                             #### Check that VAL is NOT empty, then perform the following actions.
    len=length(VAL);                   #### len holds the number of letters in VAL.
    while(i<len){                      #### Loop until i (which starts at 0) reaches len.
      q=q?q"_":"_";                    #### Append one underscore to q on each iteration.
      i++                              #### Increment i by 1 each time.
    }
  };
  gsub(/\[[a-zA-Z]+\]/,"["q"]")        #### Now globally substitute every [letters] with [ followed by q (all underscores) and ].
}
1                                      #### Mentioning 1 prints every line, edited or not.
' Input_file                           #### Mentioning the Input_file here.
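The underscore-counting loop above can be expressed more directly with a replacement callback. Here is a minimal Python sketch (a swapped-in technique, not part of the awk answer; the names BRACKETED_LETTERS and underscore_letters are hypothetical). Unlike the awk script, which computes q from the first match on a line, the callback sizes each bracketed group independently.

```python
import re

# [A-Za-z] mirrors the [a-zA-Z] class used in the awk answer.
BRACKETED_LETTERS = re.compile(r"\[([A-Za-z]+)\]")

def underscore_letters(line):
    """Replace every [letters] group with [ + one '_' per letter + ]."""
    return BRACKETED_LETTERS.sub(lambda m: "[" + "_" * len(m.group(1)) + "]", line)


for line in ["signal[i]", "signal[bg]", "output [10:0]", "input [i:1]"]:
    print(underscore_letters(line))
```

Lines such as output [10:0] and input [i:1] are left alone because the pattern only accepts letters between the brackets.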
Alternative gawk solution:
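awk -F'\\[|\\]' '$2~/^[[:alpha:]]+$/{ gsub(/./,"_",$2); $2="["$2"]" }1' OFS= file
The output:
signal[_]
signal[__]
output [10:0]
input [i:1]
-F'\\[|\\]' - treating [ and ] as field separators
$2~/^[[:alpha:]]+$/ - performing the action only if the 2nd field consists of letters only, i.e. it is not a table declaration such as 10:0 or i:1
gsub(/./,"_",$2) - replace each character with _
OFS= - an empty output field separator, so the line rebuilt after assigning $2 gets no extra spaces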
This might work for you (GNU sed):
sed ':a;s/\(\[_*\)[[:alpha:]]\([[:alpha:]]*\]\)/\1_\2/;ta' file
Match an opening square bracket followed by any number of _'s, one letter, any further letters and a closing bracket; replace that first letter with an underscore and repeat (ta jumps back to :a) until no letter is left between the brackets.
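The ":a ... ta" idiom is a "substitute once, loop while something changed" pattern. Here is a rough Python sketch of the same loop (illustrative only; STEP and underscore_loop are hypothetical names, and [A-Za-z] stands in for sed's locale-dependent [[:alpha:]]).

```python
import re

# One sed step: "[" plus any "_"s, then one letter, then only letters up to "]".
STEP = re.compile(r"(\[_*)[A-Za-z]([A-Za-z]*\])")

def underscore_loop(line):
    """Repeat the single substitution until nothing changes (the sed ':a ... ta' loop)."""
    while True:
        line, n = STEP.subn(r"\1_\2", line, count=1)  # s/.../\1_\2/ (first match only)
        if n == 0:                                    # ta: loop again only after a substitution
            return line


for line in ["signal[i]", "signal[bg]", "output [10:0]", "input [i:1]"]:
    print(underscore_loop(line))
```

Each pass turns exactly one letter into an underscore, so signal[bg] takes two successful passes plus a final pass that finds nothing to replace.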
awk '{sub(/\[i\]/,"[_]"); sub(/\[bg\]/,"[__]")}1' file
signal[_]
signal[__]
output [10:0]
input [i:1]
Explanation: a bracket is a special character in regular expressions, so it has to be escaped to be matched literally; after that, sub() does the replacement easily.
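The same escaping issue exists in any regex engine. This short Python illustration (not part of the answer) shows what happens when the brackets are left unescaped: [bg] becomes a character class that matches a single b or g anywhere in the line. re.escape() adds the needed backslashes.

```python
import re

line = "signal[bg]"
# Unescaped, "[bg]" is a character class: each "b" or "g" is replaced on its own.
print(re.sub("[bg]", "X", line))                # siXnal[XX]
# Escaped, the brackets are matched literally.
print(re.sub(re.escape("[bg]"), "[__]", line))  # signal[__]
print(re.escape("[bg]"))                        # \[bg\]
```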

Sed: How to insert a pattern which includes single ticks?

I'm trying to use sed to replace a specific line within a configuration file:
The pattern for the line I want to replace is:
ALLOWED_HOSTS.*
The text I want to insert is:
'$PublicIP' (Including the single ticks)
But when I run the command:
sed 's/ALLOWED_HOSTS.*/ALLOWED_HOSTS = ['$PublicIP']/g' /root/project/django/mysite/mysite/settings.py
The line is changed to:
ALLOWED_HOSTS = [1.1.1.1]
instead of:
ALLOWED_HOSTS = ['1.1.1.1']
How shall I edit the command to include the single ticks as well?
You could try to escape the single ticks or, better, reassign the variable so that it includes the single ticks:
PublicIP="'$PublicIP'"
By the way, even this sed command works in my case without redefining the variable:
$ a="3.3.3.3"
$ echo "ALLOWED_HOSTS = [2.2.2.2]" |sed 's/2.2.2.2/'"'$a'"'/g'
ALLOWED_HOSTS = ['3.3.3.3']
Even this works ok:
$ echo "ALLOWED_HOSTS = [2.2.2.2]" |sed "s/2.2.2.2/'$a'/g"
ALLOWED_HOSTS = ['3.3.3.3']
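The difference between the failing and the working commands is how the shell glues quoted pieces into one argument. Python's shlex.split() follows POSIX quoting rules closely enough to show this (it does not expand $variables, so they appear literally). This is only an illustration of the quoting, not a replacement for running sed.

```python
import shlex

# The question's command: the single-quoted string ends just before $PublicIP,
# so the surrounding ticks are quote delimiters and never reach sed.
broken = "sed 's/ALLOWED_HOSTS.*/ALLOWED_HOSTS = ['$PublicIP']/g' /root/project/django/mysite/mysite/settings.py"
# The answer's version: '...' + "'$a'" + '...'; inside double quotes the ticks are literal.
fixed = "sed 's/2.2.2.2/'\"'$a'\"'/g'"

print(shlex.split(broken)[1])  # s/ALLOWED_HOSTS.*/ALLOWED_HOSTS = [$PublicIP]/g
print(shlex.split(fixed)[1])   # s/2.2.2.2/'$a'/g
```

In the real shell, $PublicIP and $a would then be expanded, which is why the first command produces [1.1.1.1] and the second ['3.3.3.3'].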

Replacing first occurrence except comments using sed

I have a text file of the form:
a = 1
#b = [2,3]
c = 4
d = [5,6]
e = [7,8]
I want to replace the pattern inside the brackets (and the brackets) with a number, but ignore matches in the comments, preferably using sed.
For files with exactly one matching line, I've used
sed -i "/^#/!s/\[.*\]/9/" myfile
How can this be modified to replace only the first match if there are more?
This is correct, because it changes just the first occurrence:
awk '!end && /^[^#]+ = \[/ {$3="9"; end=1}1' myfile
If the end flag is not set yet, the line does not begin with # and it matches = [, then change the third field and set the flag to prevent changes to later occurrences.
a = 1
#b = [2,3]
c = 4
d = 9
e = [7,8] <--- this is not changed as you want
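The flag idea in the awk answer carries over to any language. Here is a minimal Python sketch (a hypothetical helper, not from the answer) that skips comment lines, replaces the first bracketed group it finds, and then stops replacing. Unlike the awk version, it keeps the rest of the line's spacing untouched instead of rebuilding the fields.

```python
import re

def replace_first_uncommented(lines, replacement="9"):
    """Replace the first [...] group on the first non-comment line that has one;
    later matches are left alone (the same 'end' flag idea as the awk answer)."""
    done = False
    out = []
    for line in lines:
        if not done and not line.lstrip().startswith("#"):
            line, n = re.subn(r"\[[^\]]*\]", replacement, line, count=1)
            done = n > 0  # set the flag once a substitution has happened
        out.append(line)
    return out


data = ["a = 1", "#b = [2,3]", "c = 4", "d = [5,6]", "e = [7,8]"]
print("\n".join(replace_first_uncommented(data)))
```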
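This one-liner skips comment lines, including indented ones:
sed '/^\s*#/!{s/\[[^]]*\]/9/}' file
Note that it replaces the bracketed text on every non-comment line, not only on the first one. Add the -i option if you want to make the change in place.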

Autofiltering Excel with multiple filter conditions

I am trying to autofilter in Excel using the VBScript code below. This script is called multiple times from a Perl program.
Dim objExcel : Set objExcel = GetObject(,"Excel.Application")
objExcel.Visible = True
objExcel.Selection.AutoFilter
objExcel.ActiveSheet.Range("G1").AutoFilter WScript.Arguments.Item(0), _
WScript.Arguments.Item (1)
Now I would like to know: is there a way to pass an array for WScript.Arguments.Item (1) so that all the conditions are selected in one go? The task is to delete the filtered values. I call this script from Perl multiple times, and each time the script above filters one value and deletes it. The program works fine, but it is slow.
Here is the part of the Perl program that calls the VBScript:
while(<FILE>){
chomp;
system("CSCRIPT "."\"$currentWorkingDirectory\"".'aboveVBS.vbs 9 '."\"$_\"");
sleep(2);
}
If you put quotes around the values, VBScript will treat it as a single argument.
> cscript script.vbs arg1 "multiple values for arg 2"
In the script:
WScript.Echo WScript.Arguments.Count ' ==> 2
a = Split(WScript.Arguments(1))
WScript.Echo a(0) ' ==> multiple
WScript.Echo a(1) ' ==> values
WScript.Echo a(2) ' ==> for
WScript.Echo a(3) ' ==> arg
WScript.Echo a(4) ' ==> 2
Excel expects:
Range.AutoFilter <Field>, <Criteria>, <Operator>
If you want a list of criteria to filter on, you'll use xlFilterValues for the <Operator> argument. <Criteria> will be an array of string values, which we created above.
Const xlFilterValues = 7
objExcel.ActiveSheet.Range("G1").AutoFilter WScript.Arguments.Item(0), a, xlFilterValues
So, just try adding Split() around WScript.Arguments(1) in your existing code, and pass xlFilterValues for the third param.
If only your second argument changes, you could pass the entire content of your data file to the VBScript:
local $/;
my $args = <FILE>;
$args =~ s/^\s+|\s+$//g;
$args =~ s/\r?\n/" "/g;
system("cscript \"$currentWorkingDirectory\\your.vbs\" 9 \"$args\"");
and change the processing in your VBScript to this:
Set xl = CreateObject("Excel.Application")
xl.Visible = True
Set wb = xl.Workbooks.Open("C:\path\to\your.xlsx")
Set ws = wb.Sheets(1)
...
xl.Selection.AutoFilter
For i = 1 To WScript.Arguments.Count - 1
ws.Range("G1").AutoFilter WScript.Arguments(0), WScript.Arguments(i)
...
Next
Or you could simply call the VBScript with the field and the path to the data file:
system("cscript \"$currentWorkingDirectory\\your.vbs\" 9 \"$filepath\"");
and do all the processing in VBScript:
Set xl = CreateObject("Excel.Application")
xl.Visible = True
Set wb = xl.Workbooks.Open("C:\path\to\your.xlsx")
Set ws = wb.Sheets(1)
...
xl.Selection.AutoFilter
Set fso = CreateObject("Scripting.FileSystemObject")
Set f = fso.OpenTextFile(WScript.Arguments(1))
Do Until f.AtEndOfStream
ws.Range("G1").AutoFilter WScript.Arguments(0), f.ReadLine
...
Loop
f.Close
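The first variant above works because every filter value arrives as its own command-line argument. If the calling side were Python instead of Perl (an assumption for illustration only; the helper names are hypothetical), building an argument list avoids hand-quoting each value. This sketch only builds the list; it does not launch cscript.

```python
def values_from_text(text):
    """One filter value per non-empty line; surrounding whitespace and carriage returns are stripped."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_cscript_argv(vbs_path, field, values):
    """Argument list for cscript: the field first, then one argument per value,
    so the VBScript can read them as WScript.Arguments(1), (2), ..."""
    return ["cscript", vbs_path, str(field), *values]


argv = build_cscript_argv("your.vbs", 9, values_from_text("Alpha\r\nBeta Gamma\r\n\r\n"))
print(argv)
# On Windows this list could be passed to subprocess.run(argv); values containing
# spaces, such as "Beta Gamma", stay single arguments without extra quoting.
```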
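Applying more than 2 AutoFilter conditions to a column at the same time is not possible with the Criteria1/Operator/Criteria2 form (Excel 2007 and later also accept an array of values for Criteria1 when Operator is xlFilterValues). Check the signature of the AutoFilter method in the documentation: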
expression.AutoFilter(Field, Criteria1, Operator, Criteria2, VisibleDropDown)
expression An expression that returns a Range object.
You have Criteria1 and Criteria2 and an Operator for combining the two. Calling the AutoFilter method with another set of criteria replaces the existing criteria.

SAS: how to read in this raw data file

I have a raw data file like this:
JamesBrownSenior
AshleyPinkJunior
The first column is the name and the second is a color tag, but the length of the values varies from row to row.
I have tried this
data ct_11;
infile '';
length Name $ 10 Tag $ 10 Title $ 10;
input Name $ Tag $ Title $;
run;
It didn't work. I guess I missed some options.
If there is no delimiter, you have to read each line as a single variable and split it afterwards based on a rule. In your case, you can add a delimiter using a regular expression and then use the scan function to write the words to different variables.
data ct_11 (keep=name tag title);
infile 'z:\nametagtitle.txt';
length line $120 name tag title $40;
input line $;
dlmline = prxchange('s/([A-Z]{1}[a-z]*)([A-Z]{1}[a-z]*)([A-Z]{1}[a-z]*)/$1 $2 $3/',-1,line);
name = scan(dlmLine,1);
tag = scan(dlmline,2);
title = scan(dlmline,3);
run;
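The rule used by the prxchange pattern is "a new word starts at each capital letter". The same rule can be checked quickly outside SAS; here is a small Python sketch (illustrative only; split_camel is a hypothetical name). Like the SAS pattern, it assumes every word is one capital letter followed by lowercase letters, and the unpacking assumes exactly three words per line.

```python
import re

def split_camel(line):
    """Split a run-together string into words at each capital letter,
    like the ([A-Z]{1}[a-z]*) groups in the prxchange pattern."""
    return re.findall(r"[A-Z][a-z]*", line)


for line in ["JamesBrownSenior", "AshleyPinkJunior"]:
    name, tag, title = split_camel(line)
    print(name, tag, title)
```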