How to assign number for a repeating pattern - sed

I am doing some calculations using Gaussian. From the Gaussian output file, I need to extract the input structure information. The output file contains more than 800 structure coordinates. So far, I have collected all the input coordinates using a combination of grep, awk and sed commands, like so:
grep -A 7 "Input orientation:" test.log | grep -A 5 "C" | awk '/C/{print "structure number"}1' | sed '/--/d' > test.out
This helped me to grep all the input coordinates and insert a line with "structure number". So now I have a file that contains a pattern which is being repeated in a regular fashion. The file is like the following:
structure Number
4.176801 -0.044096 2.253823
2.994556 0.097622 2.356678
5.060174 -0.115257 3.342200
structure Number
4.180919 -0.044664 2.251182
3.002927 0.098946 2.359346
5.037811 -0.103410 3.389953
Here, "structure number" is repeated. I want to append a counter to each of these lines, such as "structure number:1", "structure number 2", in increasing order.
How can I solve this problem?
Thanks for your help in advance.

I am not familiar at all with a program called gaussian, so I have no clue what the original input looked like. If someone posts an example I might be able to give an even shorter solution.
However, as far as I understand, the OP is content with the output of his/her code, except that he/she wants to append an increasing number to the lines inserted with awk.
This can be achieved with the following line (adjusting the OP's code):
grep -A 7 "Input orientation:" test.log | grep -A 5 "C" | awk '/C/{print "structure number"++i}1' | sed '/--/d' > test.out
Addendum:
Even without knowing the actual input, I am sure that one can at least get rid of the sed command leaving that piece of work to awk. Also, there is no need to quote a single character grep pattern:
grep -A 7 "Input orientation:" test.log | grep -A 5 C | awk '/C/{print "structure number"++i}!/--/' > test.out
I am not sure since I cannot test, but it should be possible to let awk do the grep's work, too. As a first guess I would try the following:
awk '/Input orientation:/{li=7}!li{next}{--li}/C/{print "structure number"++i;lc=5}!lc{next}{--lc}!/--/' test.log > test.out
While this might be a little bit longer in code it is an awk-only solution doing all the work in one process. If I had input to test with, I might come up with a shorter solution.
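If the marker lines are already in a file such as test.out, the counter can also be added afterwards. The following is only a minimal sketch: it assumes every marker line starts with "structure number", and both the sample data and the helper name number_structures are made up for illustration.

```shell
# Append a running counter to every line that starts with "structure number".
# number_structures is a hypothetical helper; it reads stdin and writes stdout.
number_structures() {
    awk '/^structure number/ { n++; $0 = $0 " " n } 1'
}

# Made-up sample input in the shape shown in the question.
numbered=$(printf '%s\n' \
    'structure number' \
    '4.176801 -0.044096 2.253823' \
    'structure number' \
    '4.180919 -0.044664 2.251182' | number_structures)

printf '%s\n' "$numbered"
```

Incrementing n in its own statement avoids any ambiguity in how awk parses a string constant directly followed by ++.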


sed `D` with address range

As explained in the manual, D deletes a portion of the pattern space, up to the first embedded newline. But I cannot find any documentation that explains D combined with address ranges. For example:
$ cat /tmp/test
accident if I use one.
My wife won't let me buy a power saw. She is afraid of an
$ cat /tmp/test | sed -ne '$p;:a;N;$!{ba};2,$D'
accident if I use one.
My wife won't let me buy a power saw. She is afraid of an
It looks like, if there are 2 or more lines in the pattern space, the first portion up to the first embedded newline is deleted.
Is there any official documentation for this?
And why does 2D not work at all?
$ cat /tmp/test | sed -ne '$p;:a;N;$!{ba};2D'
Nothing is shown for the above command.

perl regex matched part as output filename

When I have a simple file like
Ann Math 99
Bob Math 100
Ann Chemistry 92
Ann History 78
I may split it into files per person with
awk '{print > $1}' input_filename
However, when the file becomes complex, it is no longer possible to do so unless I use a very complex regex as a field separator. I find that I can extract output filename with some regex, and the following command seems to be able to do what I want for a test with 5 lines:
sed 5q input_filename | perl -nle 'if(/\[([A-Za-z0-9_]+)\]/){open(FH,">","$1"); print FH $_; close FH}'
but the file is large and the command seems to be inefficient. Are there better ways to do it?
The original files look like this:
SOME_VERY_LONG_STUFF[TAG1]SOME_EVEN_LONGER_STUFF
SOME_VERY_LONG_STUFF[TAG2]SOME_EVEN_LONGER_STUFF
SOME_VERY_LONG_STUFF[TAG3]SOME_EVEN_LONGER_STUFF
SOME_VERY_LONG_STUFF[TAG1]SOME_EVEN_LONGER_STUFF
SOME_VERY_LONG_STUFF[TAG3]SOME_EVEN_LONGER_STUFF
...
and I just want to split it into files named TAG1, TAG2, TAG3, ..., where each file contains only those lines of the original file that have that tag in the brackets.
Here is the first line, with small modifications:
Nov 30 18:00:00 something#syslog: [2019-11-30 18:00:00][BattleEnd],{"result":1,"life":[[0,30,30],[1,30,30],[2,30,29],[3,30,29],[4,30,29],[5,28,29],[6,28,21],[7,28,21],[8,28,14],[9,28,14],[10,29,13],[11,21,13],[12,21,13],[13,15,13],[14,16,12],[15,12,12],[16,12,12],[17,9,12],[18,9,12],[19,5,12],[20,5,12],[21,3,12],[22,3,12],[23,1,12],[24,1,10],[25,1,10],[26,1,10],[27,1,10],[28,2,9],[29,-1,9]],"Info":[[160,0],[161,0],[162,0],[163,0],[155,0],[157,0],[158,0],[159,0]],"cards":[11401,11409,11408,12201,12208,10706,12002,10702,12207,12204,12001,12007,12208,10702,12005,10701,12005,11404,10705,10705,12007,11401,10706,12002,12001,12204,10701,12207,11404,11409,11408,12201]}
The tag I want is "BattleEnd". I want to split the log according to log sources.
EDIT: Since the OP changed the samples, I am adding this code, which is based entirely on the samples shown.
awk -F"[][]" '{print >> ($4);close($4)}' Input_file
Or, if you want to close an output file only when the tag differs from the one on the previous line (to avoid a "too many open files" error), try the following.
awk -F"[][]" 'prev!=$4{close(prev)} {print >> ($4);prev=$4}' Input_file
Please try the following, based on your shown samples.
awk '
match($0,/[^]]*/){
  val=substr($0,RSTART,RLENGTH)
  sub(/.*\[/,"",val)
  print >> (val)
  close(val)
}
' Input_file
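For comparison, here is a sketch that reuses the character class from the perl one-liner in the question, so the bracketed timestamp is skipped and the first bracket group made only of letters, digits and underscores becomes the file name. The sample lines, the outdir variable and the awk variable names are assumptions made for illustration.

```shell
# Split a log into one file per [Tag]; outdir and the sample lines are made up.
outdir=$(mktemp -d)

printf '%s\n' \
    'Nov 30 18:00:00 host: [2019-11-30 18:00:00][BattleEnd],{"result":1}' \
    'Nov 30 18:00:01 host: [2019-11-30 18:00:01][Login],{"user":7}' \
    'Nov 30 18:00:02 host: [2019-11-30 18:00:02][BattleEnd],{"result":0}' |
awk -v dir="$outdir" '
    # First [...] group that holds only letters, digits and underscores.
    match($0, /\[[A-Za-z0-9_]+\]/) {
        tag = substr($0, RSTART + 1, RLENGTH - 2)   # strip the brackets
        file = dir "/" tag
        if (file != prev) {                         # close only when the tag changes
            if (prev != "") close(prev)
            prev = file
        }
        print >> file
    }'

ls "$outdir"
```

Writing into a separate directory keeps the per-tag files away from the input, and appending with >> means a file that was closed earlier is extended rather than truncated when its tag shows up again.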

Improving sed program - conditions

I use this code according to this question.
$ names=(file1.txt file2.txt file3.txt) # Declare array
$ printf 's/%s/a-&/g\n' "${names[@]%.txt}" # Generate sed replacement script
s/file1/a-&/g
s/file2/a-&/g
s/file3/a-&/g
$ sed -f <(printf 's/%s/a-&/g\n' "${names[@]%.txt}") f.txt
TEXT
\connect{a-file1}
\begin{a-file2}
\connect{a-file3}
TEXT
75
How can I add a condition that solves the following problem?
names=(file1.txt file2.txt file3file2.txt)
I mean that a word in one file name is repeated as part of another file name, so a- is added more than once.
I tried
sed -f <(printf 's/{%s}/{s-&}/g\n' "${files[@]%.tex}")
but the result is
\input{a-{file1}}
I need to match {%s} and place a- between { and %s.
It's not clear from the question how to resolve conflicting input. In particular, the code will replace any instance of file1 with a-file1, even things like 'foofile1'.
On the surface, the goal seems to be to change whole tokens only (e.g., foofile1 should not be affected by the file1 substitution). This could be achieved by adding a word boundary assertion (\b, a GNU sed extension) before and after the file name. This prevents the pattern from matching inside other, longer file names.
printf 's/\\b%s\\b/a-&/g\n' "${names[@]%.txt}"
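An alternative that avoids \b is to anchor each name on the surrounding braces and put a- back between the opening brace and the captured name, which is what the question asks for. This is only a sketch under the assumption that every name to be changed fills the whole {...} group; the names and sample lines are made up.

```shell
# Build one substitution per name; only a name that fills the whole {...} matches.
# The names and the sample text are made up for illustration.
script=$(for name in file1 file2 file3file2; do
    printf 's/{\\(%s\\)}/{a-\\1}/g\n' "$name"
done)

result=$(printf '%s\n' \
    '\connect{file1}' \
    '\begin{file2}' \
    '\input{file3file2}' \
    '\include{foofile1}' | sed -e "$script")

printf '%s\n' "$result"
```

Because the opening brace is part of the pattern, the file2 rule cannot match inside {file3file2}, and {foofile1} is left alone.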
This explanation is too long for a comment, so I am adding it as an answer. I am not sure whether my previous answer was clear, but it handles this case: it replaces only exact file names, not names embedded in other file names.
Let's say the following are the array value and the input file:
names=(file1.txt file2.txt file3file2.txt)
echo "${names[*]}"
file1.txt file2.txt file3file2.txt
cat file1
TEXT
\connect{file1}
\begin{file2}
\connect{file3}
TEXT
75
Now when we run following code:
awk -v arr="${names[*]}" '
BEGIN{
  FS=OFS="{"
  num=split(arr,array," ")
  for(i=1;i<=num;i++){
    sub(/\.txt/,"",array[i])
    array1[array[i]"}"]
  }
}
$2 in array1{
  $2="a-"$2
}
1
' file1
Output will be as follows. You could see file3 is NOT replaced since it was NOT present in array value.
TEXT
\connect{a-file1}
\begin{a-file2}
\connect{file3}
TEXT
75

Using sed to comment out lines that contain a specific string of text

Please bear with me as I'm new to the forums; I tried to do my research before posting this. I'm trying to use sed to look through a file and, for any line that contains the words "CPU Usage", comment out that line and the 19 lines immediately after it.
Example file.txt
This is some random text CPU USAGE more random text
Line2
Line3
Line4
Line5
etc.
I want sed to find the string of text CPU usage and comment out that line and the 19 lines following it:
#This is some random text CPU USAGE more random text
#Line2
#Line3
#Line4
#Line5
#etc.
This is what I've been trying, but it is not working:
sed '/\/(CPU Usage)s/^/#/+18 > File_name
sed: -e expression #1, char 17: unknown command: `^'
I'd like to be able to use this on multiple files. Any help you can provide is much appreciated!
GNU sed has a non-standard extension (okay, it has many non-standard extensions, but there's one that's relevant here) of permitting /pattern/,+N to mean from the line matching pattern to that line plus N.
I'm not quite sure what you expected your sed command to do with the \/ part of the pattern, and you're missing a single quote in what you show, but this does the trick:
sed '/CPU Usage/,+19 s/^/#/'
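If you want to overwrite the original files, add -i.bak to keep a backup of each original (or just -i if you don't mind losing your originals). With GNU sed the suffix must be attached directly to -i, with no space in between.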
If you don't have GNU sed, now might be a good time to install it.
This can easily be done with awk
awk '/CPU Usage/ {f=20} f && f-- {$0="#"$0}1' file
When CPU Usage is found, the counter f is set to 20.
While f is non-zero, it is decremented on each line and # is added in front of that line. The trailing 1 prints every line, changed or not.
I think this should work, but I can't test it; if anyone finds something wrong, just let me know :)
awk '/CPU Usage/{t=1}t{x++;$0="#"$0}x==20{t=0;x=0}1' file
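The counter idea used in the awk answers can be tried on a small input first. The sketch below wraps it in a shell function that takes the number of following lines as an argument; the name comment_block, the variable left and the sample input are made up for illustration.

```shell
# comment_block N: prefix "#" to each line matching "CPU Usage" and to the N lines after it.
# comment_block is a hypothetical helper; it reads stdin and writes stdout.
comment_block() {
    awk -v n="$1" '
        /CPU Usage/ { left = n + 1 }            # matching line plus n followers
        left > 0    { $0 = "#" $0; left-- }
        { print }'
}

commented=$(printf '%s\n' 'start' 'x CPU Usage y' 'Line2' 'Line3' 'Line4' |
    comment_block 2)

printf '%s\n' "$commented"
```

For the case in the question the call would be comment_block 19 < file.txt. The match is case-sensitive, so a line containing "CPU USAGE" would need a pattern adjusted accordingly.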

I want to print a text file in columns

I have a text file which looks something like this:
jdkjf
kjsdh
jksfs
lksfj
gkfdj
gdfjg
lkjsd
hsfda
gadfl
dfgad
[very many lines, that is]
but would rather like it to look like
jdkjf kjsdh
jksfs lksfj
gkfdj gdfjg
lkjsd hsfda
gadfl dfgad
[and so on]
so I can print the text file on a smaller number of pages.
Of course, this is not a difficult problem, but I'm wondering if there is some excellent tool out there for solving problems like these.
EDIT: I'm not looking for a way to remove every other newline from a text file, but rather a tool which interprets text as "pictures" and then lays these out on the page nicely (by writing the appropriate whitespace symbols).
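You can use this Python code.
tables = int(input("Enter number of columns: "))
matrix = []
with open("test.txt") as file:
    for line in file:
        matrix.append(line.rstrip("\n"))
        # Once a full row has been collected, print it and start the next one.
        if len(matrix) == tables:
            print(" ".join(matrix))
            matrix = []
# Print any leftover entries that did not fill a complete row.
if matrix:
    print(" ".join(matrix))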
(Since you don't name your operating system, I'll simply assume Linux, Mac OS X or some other Unix...)
Your example looks like it can also be described by the expression "joining 2 lines together".
This can be achieved in a shell (with the help of xargs and awk) -- but only for an input file that is structured like your example (the result always puts 2 words on a line, irrespective of how many words each one contains):
cat file.txt | xargs -n 2 | awk '{ print $1" "$2 }'
This can also be achieved with awk alone (this time it really joins 2 full lines, irrespective of how many words each one contains):
awk '{printf "%s ", $0; getline; print $0}' file.txt
Or use sed --
sed 'N;s#\n# #' < file.txt
Also, xargs could do it:
xargs -L 2 < file.txt
I'm sure other people could come up with dozens of other, quite different methods and commandline combinations...
Caveats: You'll have to test for files with an odd number of lines explicitly. The last input line may not be processed correctly in case of odd number of lines.
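Another standard tool for the "join every 2 lines" reading of the question is paste, which is not mentioned above. A minimal sketch with made-up sample words:

```shell
# Join every two input lines with a space; each "-" reads the next line of stdin.
two_columns=$(printf '%s\n' jdkjf kjsdh jksfs lksfj | paste -d ' ' - -)
printf '%s\n' "$two_columns"
```

For a real file this would be paste -d ' ' - - < file.txt, and adding a third - gives three columns. With an odd number of lines the last line is still printed, followed by a trailing delimiter. For page-style layout with padded columns, pr (for example pr -2 -t file.txt) may be closer to what the edit in the question asks for.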