Change one line to column - sh

I have a file like this:
20180127200000
DEFAULT, Proc_m0_s1
DEFAULT, Proc_m0_s2
DEFAULT, Proc_m0_s3
20180127200001
DEFAULT, Proc_m0_s1
DEFAULT, Proc_m0_s2
DEFAULT, Proc_m0_s3
I'd like to convert to:
20180127200000 DEFAULT, Proc_m0_s1
20180127200000 DEFAULT, Proc_m0_s2
20180127200000 DEFAULT, Proc_m0_s3
20180127200001 DEFAULT, Proc_m0_s1
20180127200001 DEFAULT, Proc_m0_s2
20180127200001 DEFAULT, Proc_m0_s3
The time stamp appears every 4 lines.

Awk should be both faster and simpler than a shell loop for this type of task.
awk 'NR%4==1 { time=$1; next }
{ print time " " $0 }' file
NR is the line number. If the line number modulo 4 is one, remember the first field but don't print. Otherwise, print with the remembered value and a space as the prefix.

iconvert() {
while true; do
read linePrefix || return 0
for i in $(seq $((prefix_lines - 1))); do
read line || return 0
echo "$linePrefix" "$line"
done
done
}
You just call iconvert like this:
prefix_lines=4
iconvert < file.txt
which gives your desired output.

Related

How can I extract first and last line from multiple text blocks separated with new line?

I have a file containing multiple tests with detailed action written one beneath another. All test blocks are separated one from another by new line. I want to extract only first and last line from the all blocks and put it on one line for each block into a new file. Here is an example:
input.txt:
[test1]
duration
summary
code=
Results= PASS
[test2]
duration
summary=x
code=
Results=FAIL
.....
[testX]
duration
summary=x
code=
Results= PASS
output.txt should be sometime like this:
test1 PASS
test2 FAIL
...
testX PASS
eg2:
[Linux_MP3Enc_xffv.2_Con_37_003]
type = testcase
summary = MP3 encoder test
ActionGroup[Linux_Enc] = PASS
ActionGroup[Linux_Playb] = PASS
ActionGroup[Linux_Pause_Resume] = PASS
ActionGroup[Linux_Fast_Seek] = PASS
Duration = 230.607398987 s
Total_Result = PASS
[Composer__vtx_007]
type = testcase
summary = composer
Background[0xff000000] = PASS
Background[0xffFFFFFF] = PASS
Background[0xffFF0000] = PASS
Background[0xff00FF00] = PASS
Background[0xff0000FF] = PASS
Background[0xff00FFFF] = PASS
Background[0xffFFFF00] = PASS
Background[0xffFF00FF] = PASS
Duration = 28.3567230701 s
Total_Result = PASS
[Videox_Rotate_008]
type = testcase
summary = rotation
Rotation[0] = PASS
Rotation[1] = PASS
Rotation[2] = PASS
Rotation[3] = PASS
Duration = 14.0116529465 s
Total_Result = PASS
Thank you!
Short and simple gnu awk:
awk -F= -v RS='' '{print $1 $NF}' file
[Linux_MP3Enc_xffv.2_Con_37_003] PASS
[Composer__vtx_007] PASS
[Videox_Rotate_008] PASS
If you do not like the brackets:
awk -F'[]=[]' -v RS='' '{print $2 $NF}' file
Linux_MP3Enc_xffv.2_Con_37_003 PASS
Composer__vtx_007 PASS
Videox_Rotate_008 PASS
Using sed as tagged (although other tools would probably be more natural to use) :
sed -nE '/^\[.*\]$/h;s/^Results= ?//;t r;b;:r;H;x;s/\n/ /;p'
Explanation :
/^\[.*\]$/h # matches the [...] lines, put them in the hold buffer
s/^Results= ?// # matches the Results= lines, discards the useless part
t r;b # on lines which matched, jump to label r;
# otherwise jump to the end (and start processing the next line)
:r;H;x;s/\n/ /;p # label r; append the pattern space (which contains the end of the Results= line)
# to the hold buffer. Switch Hold buffer and pattern space,
# replace the linefeed in the pattern space by a space and print it
You can try it here.
One way to solve this is using a regular expression such as:
(?<testId>test\d+)(?:.*\n){4}.*(?<outcome>PASS|FAIL)
The regex matches your sample output and stores the test id (e.g. "test1") in the capture group named "testId" and the outcome (e.g. "PASS") in the capture group "outcome".
(Test it in regexr)
The regex can be used in any language with regex support. The below code shows how to do it in Python.
(Test it in repl.it)
import re
# Read from input.txt
with open('input.txt', 'r') as f:
indata = f.read()
# Modify the regex slightly to fit Python regex syntax
pattern = '(?:.*)(?P<testId>test\d+)(?:.*\n){4}.*(?P<outcome>PASS|FAIL)'
# Get a generator which yeilds all matches
matches = re.finditer(pattern, indata)
# Combine the matches to a list of strings
outputs = ['{} {}'.format(m.group('testId'), m.group('outcome')) for m in matches]
# Join all rows to one string
output = '\n'.join(outputs)
# Write to output.txt
with open('output.txt', 'w') as f:
f.write(output)
Running the above script on input.txt containing:
[test1]
duration
summary
code=
Results= PASS
[test2]
duration
summary=x
code=
Results=FAIL
[test444]
duration
summary=x
code=
Results= PASS
yields a file output.txt containing:
test1 PASS
test2 FAIL
test444 PASS
In order to print the first and last line from the block, how about:
awk -v RS="" '{
n = split($0, a, /\n/)
print a[1]
print a[n]
}' input.txt
Result for the 1st example:
[Linux_MP3Enc_xffv.2_Con_37_003]
Total_Result = PASS
[Composer__vtx_007]
Total_Result = PASS
[Videox_Rotate_008]
Total_Result = PASS
The man page of awk tells:
If RS is set to the null string, then records are separated by blank lines.
You can easily split the block with blank lines with this feature.
Hope this helps.

Extracting values from a single file

I have a file with multiple lines; but a specific line contains tons of information, with several repeated expressions. I'm trying to extract some specific values. I first tried some commands with sed, for instance, but with no success. So, I was wondering if you could give me some insights.
So, here you have one fraction of the unique line of the given document I mentioned:
[...]6[&length_range={0.19
[... a lot of more information here in between ...]
0.01},habitat.set.prob={0.01,0.03,0.56,0.01,0.01,0.34,0.01,0.01,0.01},DLOOP.rate_median=0.04131395026396427,length=
[...]
10[&length_range={0.19
[... a lot of more information here in between ...]
0.01},habitat.set.prob={0.21,0.33,0.56,0.01,0.01,0.33,0.01,0.01,0.61},DLOOP.rate_median=0.04131395026396427,length=
[...]
My aim here is first to extract all the values that is between the brackets, after "habitat.set.prob={". and put them in a single line in a text file.
Also, it would be important to extract the numbers that appears just before the expression "[&length_range=]", which in this case are "6" and "10". They are the label of the set of numbers after "prob={"
So the set of numbers I want to extract always appears between "habitat.set.prob={" and "},DLOOP.rate_median", while the other number (the label) is always rigth before "[&length_range="; but what is before the label is not the same expression; actually it is a random number.
The goal then is end up with a file with the following characteristcs:
6 0.21,0.33,0.56,0.01,0.01,0.33,0.01,0.01,0.61
10 0.21,0.33,0.56,0.01,0.01,0.33,0.01,0.01,0.61
and so on …
What do you think? Is this possible?
I started with this very basic command at least to try to extract the set of numbers, but it didn't work
sed -n "/habitat.set.prob={/,/},DLOOP.rate_median=/ p"
| Well... I got some improvement.
I was able to get the values at least:
awk '{gsub("habitat.set.prob={","\n");printf"%s",$0}' filename | awk -F'},' '{print $1"}"}' | grep -iv "TREE" > stats.txt
|
Many thanks in advance.
Cheers,
Luiz
Something like that:
sed -rn '/.*[0-9]+\[&length_range=\{/,/habitat.set.prob=\{/{s/.*\b([0-9]+)\[&length_range.*/\1/p; s/.*habitat.set.prob=\{([^D]+)\},DLOOP.rate.*/\1/p}' habitat
6
0.01,0.03,0.56,0.01,0.01,0.34,0.01,0.01,0.01
10
0.21,0.33,0.56,0.01,0.01,0.33,0.01,0.01,0.61
The first part '/.a./,/.b./' searches from pattern a to b, distributed over multiple lines. The -n told sed to do non-printing as default.
In '/.a./,/.b./{s/.c./.d./p; s/.e./.f./p}'
there are two substitution commands with p=print in curly braces.
I am not sure if you really digged a little, so not providing the complete answer, but let's hope this would help you:
for the first part: getting the no(which you call as label) you didn't mention if there is any specific pattern, so try this (data is the file which contains the actual input) - you need to work on how to get the number and tweak the RE a bit
sed -n 's/.*\([0-9][0-9]*\).*length_range.*/\1/p' data
For the other part which gives the numericals between habitat and DLOOP:
sed -n 's/.*habitat.set.prob=\(.*\),DLOOP.*/\1/pg' data | tr '{' ' ' | tr '}' ' '
Now, try to take this as a starter and work on your output to get your desired result!
To explain a bit:
In the first section - I am trying to capture the numericals between anything(.*) and (.*)length_range [you can escape the character [ and & by using \ in front of them]
In the second section: I am capturing pattern in between habitat.set.prob and DLOOP and then doin a tr to remove the brackets.
#include <iostream>
using namespace std;
int main()
{
string p = "1:2:3:4"; //input your string
int arr[4] = {}; //create a new empty integer array to put the integers in it
for(int i=0, j=0; i <p.length(); i++){//loop on the string to extract integers
if( p[i] == ':'){continue;}//if the value = ':' skip it and continue
arr[j]=(int)p[i]-48;j++;//put the integer in the array we created
}
cout << "String={"<<arr[0]<<" "<<arr[1]<<" "<<arr[2]<<" "<<arr[3]<<"}";//print the array
return 0;
}

convert row to column based on text

I have a rather large file (single column) with data similar to this:
BT1111
2.2.2.2/3
3.3.3.3/4
7.2.1.1/5
BT6766
2.2.1.1/5
4.5.1.1/7
BT9898
4.4.4.4/2
8.8.8.8/9
I wish to find a function that can align it into two columns, by moving all entries starting with digit one column ($1 to $2) and enrich it with the corresponding BT field, so desired output should be
BT1111;2.2.2.2/3
BT1111;3.3.3.3/4
BT1111;7.2.1.1/5
BT6766;2.2.1.1/5
BT6766;4.5.1.1/7
BT9898;4.4.4.4/2
BT9898;8.8.8.8/9
I can't imagine how to ensure the "look for next occurence" should be performed, but hope there is a function for it I have managed to overlook ?
perl -nle'if (/^\D/) { $n=$_ } else { print "$n;$_" }' input.txt
See Specifying file to process to Perl one-liner for alternate usages.
$ awk '/BT/{a=$1; next}{print a ";" $1}' input.txt
BT1111;2.2.2.2/3
BT1111;3.3.3.3/4
BT1111;7.2.1.1/5
BT6766;2.2.1.1/5
BT6766;4.5.1.1/7
BT9898;4.4.4.4/2
BT9898;8.8.8.8/9

sed: replace letter between square brackets

I have the following string:
signal[i]
signal[bg]
output [10:0]
input [i:1]
what I want is to replace the letters between square brackets (by underscore for example) and to keep the other strings that represents table declaration:
signal[_]
signal[__]
output [10:0]
input [i:1]
thanks
try:
awk '{gsub(/\[[a-zA-Z]+\]/,"[_]")} 1' Input_file
Globally substituting the (bracket)alphabets till their longest match then with [_]. Mentioning 1 will print the lines(edited or without edited ones).
EDIT: Above will substitute all alphabets with one single _, so to get as many underscores as many characters are there following may help in same.
awk '{match($0,/\[[a-zA-Z]+\]/);VAL=substr($0,RSTART+1,RLENGTH-2);if(VAL){len=length(VAL);;while(i<len){q=q?q"_":"_";i++}};gsub(/\[[a-zA-Z]+\]/,"["q"]")}1' Input_file
OR
awk '{
match($0,/\[[a-zA-Z]+\]/);
VAL=substr($0,RSTART+1,RLENGTH-2);
if(VAL){
len=length(VAL);
while(i<len){
q=q?q"_":"_";
i++
}
};
gsub(/\[[a-zA-Z]+\]/,"["q"]")
}
1
' Input_file
Will add explanation soon.
EDIT2: Following is the one with explanation purposes for OP and users.
awk '{
match($0,/\[[a-zA-Z]+\]/); #### using match awk's built-in utility to match the [alphabets] as per OP's requirement.
VAL=substr($0,RSTART+1,RLENGTH-2); #### Creating a variable named VAL which has substr($0,RSTART+1,RLENGTH-2); which will have substring value, whose starting point is RSTART+1 and ending point is RLENGTH-2.
RSTART and RLENGTH are the variables out of the box which will be having values only when awk finds any match while using match.
if(VAL){ #### Checking if value of VAL variable is NOT NULL. Then perform following actions.
len=length(VAL); #### creating a variable named len which will have length of variable VAL in it.
while(i<len){ #### Starting a while loop which will run till the value of VAL from i(null value).
q=q?q"_":"_"; #### creating a variable named q whose value will be concatenated it itself with "_".
i++ #### incrementing the value of variable i with 1 each time.
}
};
gsub(/\[[a-zA-Z]+\]/,"["q"]") #### Now globally substituting the value of [ alphabets ] with [ value of q(which have all underscores in it) then ].
}
1 #### Mentioning 1 will print (edited or non-edited) lines here.
' Input_file #### Mentioning the Input_file here.
Alternative gawk solution:
awk -F'\\[|\\]' '$2!~/^[0-9]+:[0-9]$/{ gsub(/./,"_",$2); $2="["$2"]" }1' OFS= file
The output:
signal[_]
signal[__]
output [10:0]
-F'\\[|\\]' - treating [ and ] as field separators
$2!~/^[0-9]+:[0-9]$/ - performing action if the 2nd field does not represent table declaration
gsub(/./,"_",$2) - replace each character with _
This might work for you (GNU sed);
sed ':a;s/\(\[_*\)[[:alpha:]]\([[:alpha:]]*\]\)/\1_\2/;ta' file
Match on opening and closing square brackets with any number of _'s and at least one alpha character and replace said character by an underscore and repeat.
awk '{sub(/\[i\]/,"[_]")sub(/\[bg\]/,"[__]")}1' file
signal[_]
signal[__]
output [10:0]
input [i:1]
The explanation is as follows: Since bracket is as special character it has to be escaped to be handled literally then it becomes easy use sub.

Insert double quotes multiple times into string

I have inherited a flat html file with a few hundred lines similar to this:
<blink>
<td class="pagetxt bordercolor="#666666 width="203 colspan="3 height="20>
</blink>
So far I have not been able to work out a sed way of inserting the closing double quotes for each element. Probably needs something other than sed to do this. Can anyone suggest an easy way to do this?
Thanks
sed -i 's/"\([^" >]\+\)\( \|>\)/"\1"\2/g' file.html
Explanation:
" - leading double quote
\([^" >]\+\) - non-quote-or-space-or-'>' chars, grouped (into group 1)
\( \|>\) - terminating space or '>', grouped (into group 2)
We replace it with '"<group1>"<group2>'.
One solution that pops out at me is to parse through each line of the file looking for the quote. When it finds one, activate a flag to keep track of being inside a quoted area, then continue parsing the line until it hits the first space or > it comes to and inserts an additional " just before it. Flip the flag off, then continue through the string looking for the next quote. Probably not a perfect solution, but a start perhaps.
If all lines share the same structure, you could use a simple texteditor to globally replace
' bordercolor'
with
'" bordercolor'
(without single-quotes). This is then independend from the field values and works similarly for the other fields. You still have to do some manual work, but if it's just one big file, I'd bite the bullet this time and not waste probably more time working out a sed-solution.
This should do if your file is simple - it won't work if you have whitespace which should be inside the quotes - in that case, a more complex code will be needed, but can be done along the same lines.
#!usr/bin/env python
#change the "utf-8" bellow to your files encoding
data = open("<myfile.html>").read().decode("utf-8")
new_data = []
inside_tag = False
inside_quotes = False
for char in data:
if char == "<":
inside_tag = True
if char == '"':
inside_quotes = True
if inside_tag and (char.isspace() or char==">") and inside_quotes:
new_data.append('"')
inside_quotes = False
if char == ">":
inside_tag = False
new_data.append(char)
outputfile = open("<mynewfile.html>", "wt")
outputfile.write("".join(new_data).encode("utf-8"))
outputfile.close()
with bash
for file in *
do
flag=0
while read -r line
do
case "$line" in
*"<blink>"*)
flag=1
;;
esac
if [ "$flag" -eq 1 ];then
case "$line" in
*class=\"pagetxt*">" )
line="${line%>}\">"
flag=0
;;
esac
fi
echo "${line}"
done <"file" > temp
mv temp "$file"
done
Regular expressions are your friend:
Find: (="[^" >]+)([ >])
Replace: \1"\2
After you've done that, make sure to run this one too:
Find: </?blink>
Replace: \n
(This won't fix more than one class on an element, like <element class="class1 class2 id="jimmy">)