awk only change matching line and print rest of lines without modification - perl

So I have a large file like the following:
RESOURCETAGMAPPINGLIST arn:aws:ec2:us-east-1:XXXXXX:instance/i-XXXXXXXXXXXXXXXXX
TAGS app-name appname1
RESOURCETAGMAPPINGLIST arn:aws:ec2:us-east-1:XXXXXX:instance/i-XXXXXXXXXXXXXXXXX
TAGS app-name appname2
RESOURCETAGMAPPINGLIST arn:aws:ec2:us-east-1:XXXXXX:instance/i-XXXXXXXXXXXXXXXXX
TAGS app-name appname1
..
I only want to modify the line with RESOURCETAGMAPPINGLIST and print the other lines without modification. Then I want to print only specific fields on the matching line, like below:
arn ec2 us-east-1 XXXXXX
TAGS app-name appname1
arn ec2 us-east-1 XXXXXX
TAGS app-name appname2
arn ec2 us-east-1 XXXXXX
TAGS app-name appname1
..
I was trying to use awk's gsub command, but really could not get the -F: part to work out. Any help would be greatly appreciated; it does not matter if it's awk, sed or perl.

With awk. As input field separator I used either at least one space ( +) or (|) a colon (:).
If a row contains the string RESOURCETAGMAPPINGLIST, print columns 2, 4, 5 and 6, stop processing this row and continue with the next row. If a row does not contain RESOURCETAGMAPPINGLIST, print the complete row unchanged.
awk -F ' +|:' '/RESOURCETAGMAPPINGLIST/{print $2,$4,$5,$6; next} {print}' file
Output:
arn ec2 us-east-1 XXXXXX
TAGS app-name appname1
arn ec2 us-east-1 XXXXXX
TAGS app-name appname2
arn ec2 us-east-1 XXXXXX
TAGS app-name appname1
See: The Stack Overflow Regular Expressions FAQ

One awk idea using the default field delimiter and split():
awk '
/RESOURCETAGMAPPINGLIST/ {
    split($2, a, ":")            # split 2nd field on ":" delimiter, storing results in array a[]
    print a[1], a[3], a[4], a[5]
    next                         # skip to next line of input
}
1                                # print current line
' sample.dat
# or as a one-liner sans comments:
awk '/RESOURCETAGMAPPINGLIST/ {split($2,a,":"); print a[1],a[3],a[4],a[5]; next} 1' sample.dat
This generates:
arn ec2 us-east-1 XXXXXX
TAGS app-name appname1
arn ec2 us-east-1 XXXXXX
TAGS app-name appname2
arn ec2 us-east-1 XXXXXX
TAGS app-name appname1

Here is one way to do it:
#!/usr/bin/perl
use v5.30;      # 'say' needs Perl v5.10 or above; requesting v5.12+ also enables strict
use warnings;

my $file = "datafile.txt";    # declare file name
open( my $fh, "<", $file ) or die( "Can't open `$file`: $!\n" );    # open the file with file handle $fh

while ( my $line = <$fh> ) {                       # read the file line by line
    chomp $line;                                   # remove the line break
    if ( $line =~ m/^RESOURCETAGMAPPINGLIST/ ) {   # if the line starts with RESOURCETAGMAPPINGLIST
        my @fields = split( /\s+|:/, $line );      # split on whitespace or ':'; each field becomes an element of @fields
        say join ' ', @fields[ 1, 3, 4, 5 ];       # join the relevant fields with a single space and print
    }
    else {
        say $line;    # if the line does not start with RESOURCETAGMAPPINGLIST, simply print it
    }
}
# When the script finishes, Perl will automatically close the input file.

I assume that one keeps the fields 1,3,4,5 from the colon-separated part.
perl -wple'
s{^RESOURCETAGMAPPINGLIST\s+(.+)}
{ join " ", ( $1 =~ /([^:]+)/g )[0,2..4] }e' file
With -p the variable $_, which has the line to process, is printed after the processing.
If that keyword isn't matched the regex does nothing and $_ remains unchanged. If it matches, then the whole line matches, and $_ gets replaced with the return value of the code on the replacement side. (With the /e modifier the replacement side is evaluated as code, which extracts the colon-separated words from the rest of the line, captured in $1, and joins them with spaces.)
Or: test for the word, and either split the line and join the needed parts or print it as is
perl -wnlE'say
/^RESOURCETAGMAPPINGLIST/ ? join " ", (split /\s+|:/)[1,3..5] : $_' file
These are broken into lines so they are easier to read. They can be copy-pasted as they are (in bash), or brought onto one line. Or, this can be written far more nicely in a file, but the question seems to be asking for a command-line program.
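For reference, a minimal sketch of the same logic written out as a file (an assumption on my part: the input file is passed as an argument, and the script name is a placeholder); the field indexes match the one-liners above:
#!/usr/bin/perl
use strict;
use warnings;

# Same idea as the one-liners above: reformat RESOURCETAGMAPPINGLIST lines,
# pass every other line through untouched.
while (my $line = <>) {
    chomp $line;
    if ($line =~ /^RESOURCETAGMAPPINGLIST/) {
        # split on runs of whitespace or on colons, keep fields 1, 3, 4, 5
        my @fields = split /\s+|:/, $line;
        print join(' ', @fields[1, 3 .. 5]), "\n";
    }
    else {
        print "$line\n";
    }
}
Run it as perl reformat.pl file.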

This might work for you (GNU sed):
sed -E 's/^RESOURCETAGMAPPINGLIST *([^:]+):[^:]+:([^:]+):([^:]+):([^:]+).*/\1 \2 \3 \4/' file
Pattern match and reformat as required.

Related

Perl - Changing file name in the middle of write

I am trying to take a very large txt file (over a million lines) that I created in Perl and run it through a different statement in Perl that will essentially look something like this (note the following is shell)
a=0
b=1
while read line;
do
echo -n "" > "Write file"${b}
a=($a + 1)
while ( $a <= 5000)
do
echo $line >> "Write file"${b}
a=($a + 1)
done
a=0
b=($b + 1)
done < "read file"
Trying to size it down to 5k lines per file, and incrementing each time (filename1.txt, filename2.txt, filename3.txt, etc)
This doesn't seem to work in shell, possibly due to the size of the input file, and for the life of me I can't think of how to change what file I am writing to in the middle of the loop..
You can just do this in the shell using split.
For example:
split -l 5000 filename.txt filename.txt.
will split filename.txt into multiple files with a maximum of 5,000 lines each. The output files will be named filename.txt.aa, filename.txt.ab, filename.txt.ac, etc.
From my man split:
NAME
split -- split a file into pieces
SYNOPSIS
split [-a suffix_length] [-b byte_count[k|m]] [-l line_count] [-p pattern] [file [name]]
DESCRIPTION
The split utility reads the given file and breaks it up into files of 1000 lines each. If file is a single dash (`-') or absent, split reads from the standard input.
The options are as follows:
-a suffix_length
Use suffix_length letters to form the suffix of the file name.
-b byte_count[k|m]
Create smaller files byte_count bytes in length. If ``k'' is appended to the number, the file is split into byte_count kilobyte pieces. If ``m'' is
appended to the number, the file is split into byte_count megabyte pieces.
-l line_count
Create smaller files n lines in length.
-p pattern
The file is split whenever an input line matches pattern, which is interpreted as an extended regular expression. The matching line will be the
first line of the next output file. This option is incompatible with the -b and -l options.
If additional arguments are specified, the first is used as the name of the input file which is to be split. If a second additional argument is specified,
it is used as a prefix for the names of the files into which the file is split. In this case, each file into which the file is split is named by the prefix
followed by a lexically ordered suffix using suffix_length characters in the range ``a-z''. If -a is not specified, two letters are used as the suffix.
If the name argument is not specified, the file is split into lexically ordered files named with the prefix ``x'' and with suffixes as above.
As an aside, this is your fixed script:
#!/bin/sh
a=0
b=1
while read line; do
    if [ $a -eq 0 ]; then
        echo -n '' > out-file-${b}
    fi
    echo "$line" >> out-file-${b}
    a=$(( $a + 1 ))
    if [ $a -eq 10 ]; then
        a=0
        b=$(( $b + 1 ))
    fi
done < in-file
Tested with bash and dash.
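If you do want to stay in Perl (the question title is about changing the output file in the middle of a write loop), here is a minimal sketch under the same 5000-lines-per-file assumption; the filename pattern is only a guess based on the question:
#!/usr/bin/perl
use strict;
use warnings;

my $lines_per_file = 5000;    # assumed chunk size from the question
my $count          = 0;
my $file_number    = 0;
my $out;

while (my $line = <>) {
    # At the start, and whenever the current file is full, switch to the next output file
    if ($count % $lines_per_file == 0) {
        $file_number++;
        open $out, '>', "filename$file_number.txt"
            or die "Can't open filename$file_number.txt: $!";
    }
    print {$out} $line;
    $count++;
}
Invoke it as perl chunk.pl bigfile.txt; reopening $out on a new name is all that changing the file in the middle of the loop amounts to.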

Perl: how do I add text specifically to the second line of a file?

Trying to do this sort of thing in perl:
sed '1 a<!-- $Header: $\n Purpose: system generated file -->' -i test.xml
Add the header block and purpose to line #2 in the file for xml, shell scripts, etc...
Don't want to do this either:
`sed '1 a<!-- \$Header: \$\n Purpose: system generated file -->' -i test.xml`
But realize it's an option if absolutely necessary.
If you only pass one file, you can use the following:
perl -i -pe'
$_ .= "<!-- \$Header: \$\n Purpose: system generated file -->\n" if $. == 1;
' test.xml
If you might pass multiple files, you'll need to add a line so that $. is reset at the end of each file.
perl -i -pe'
$_ .= "<!-- \$Header: \$\n Purpose: system generated file -->\n" if $. == 1;
close(ARGV) if eof;
' test*.xml
(Note: eof() means something different from plain eof. How awful is that!)
I added line breaks for readability. The commands will work as is, but you can remove the line breaks if you so desire.
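About that eof note, a rough illustration of the difference (based on perldoc -f eof; the prints are only there to show when each test fires):
while (<>) {
    print "end of $ARGV\n"         if eof;      # no parens: true at the end of each input file
    print "end of the last file\n" if eof();    # empty parens: true only at the end of the final file
}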
Try this way:
perl -ple '++$i == 2 and $_ = "changed" # change $_ as you want' in.txt > out.txt

How do I insert lines of text into a file after a particular line in unix [duplicate]

This question already has answers here:
How do I add a line of text to the middle of a file using bash?
(6 answers)
Closed 10 years ago.
How do I insert lines of text into a file after a particular line in unix?
Background: The file is an autogenerated text file, but I manually have to edit it every time it is regenerated to add in 4 additional lines after a particular line. I can guarantee that this line will always be in the file, but I cannot guarantee exactly what line it will be on, so I want the additional lines to be added based on the position of this line rather than at a fixed row number. I want to automate this process as it is part of my build process.
I'm using Mac OS X so I can make use of unix command line tools, but I'm not very familiar with such tools and cannot work out how to do this.
EDIT
Thanks for the solutions, although I haven't managed to get them working yet:
I tried the sed solution
sed -i '/<string>1.0</string>/ a <key>CFBundleHelpBookFolder</key>\
<string>SongKongHelp</string>\
<key>CFBundleHelpBookName</key>\
<string>com.jthink.songkong.Help</string>
' /Applications/SongKong.app/Contents/Info.plist
but get error
sed: 1: "/Applications/SongKong. ...": invalid command code S
and I tried the bash solution
#!/bin/bash
while read line; do
echo "$line"
if [[ "$line" = "<string>1.0</string>"]]; then
cat mergefile.txt # or echo or printf your extra lines
fi
done < /Applications/SongKong.app/Contents/Info.plist
but got error
./buildosx4.sh: line 5: syntax error in conditional expression: unexpected token `;'
./buildosx4.sh: line 5: syntax error near `;'
./buildosx4.sh: line 5: ` if [[ "$line" = "<string>1.0</string>"]]; then'
EDIT 2
Now working, I was missing a space
#!/bin/bash
while read line; do
    echo "$line"
    if [[ "$line" = "<string>1.0</string>" ]]; then
        cat mergefile.txt # or echo or printf your extra lines
    fi
done < /Applications/SongKong.app/Contents/Info.plist
Assuming the marker line contains fnord and nothing else:
awk '1;/^fnord$/{print "foo"; print "bar";
print "baz"; print "quux"}' input >output
Another way to look at this is that you want to merge two files at some point in one of the files. If your extra four lines were in a separate file, you could make a more generic tool like this:
#!/usr/bin/awk -f
BEGIN {
    SEARCH = ARGV[1];   # Get the search string from the command line
    delete ARGV[1];     # Delete the argument, so subsequent arguments are files
}
# Collect the contents of the first file into a variable
NR == FNR {
    ins = ins $0 "\n";
    next;
}
1   # print every line in the second (or rather the non-first) file
# Once we're in the second file, if we see the match, print stuff...
$0 == SEARCH {
    printf("%s", ins);  # Using printf instead of print to avoid the extra newline
    next;
}
I've spelled this out for ease of documentation; you could obviously shorten it to something that looked more like triplee's answer. You'd invoke this like:
$ scriptname "Text to match" mergefile.txt origfile.txt > outputfile.txt
Done this way, you'd have a tool that could be used to achieve this kind of merge on different files and with different text.
Alternately, you could of course do this in pure bash.
#!/bin/bash
while read line; do
    echo "$line"
    if [[ "$line" = "matchtext" ]]; then
        cat mergefile.txt # or echo or printf your extra lines
    fi
done < origfile.txt
The problem can be solved efficiently for any filesize by this algorithm:
Read each line from the original file and print it to a tempfile.
If the last line was the marker line, print your insertion lines to the tempfile
Print the remaining lines
Rename the tempfile to the original filename.
As a Perl script:
#!perl
use strict; use warnings;
$^I = ".bak"; # create a backup file
while (<>) {
    print;
    last if /regex to determine if this is the line/;
}
print <<'END';
Yourstuff
END
print while <>; # print remaining lines
# renaming automatically done.
Testfile:
foo
bar
baz
qux
quux
Regex is /^ba/.
Usage: $ perl this_script.pl source-file
The testfile after processing:
foo
bar
Yourstuff
baz
qux
quux
Use the sed 'a' command with a regex for the line you need to match:
sed -i '/the target line looks like this/ a this is line 1\
this is line 2\
this is line 3\
this is line 4
' FILENAME

How do I use perl like sed?

I have a file that has some entries like
--ERROR--- Failed to execute the command with employee Name="shayam" Age="34"
--Successfully executed the command with employee Name="ram" Age="55"
--ERROR--- Failed to execute the command with employee Name="sam" Age="23"
--ERROR--- Failed to execute the command with employee Name="yam" Age="3"
I have to extract only the Name and Age of those for whom the command execution failed.
In this case I need to extract shayam 34 sam 23 yam 3. I need to do this in perl. Thanks a lot.
perl -p -e 's/../../g' file
Or to inline replace:
perl -pi -e 's/../../g' file
As a one-liner:
perl -lne '/^--ERROR---.*Name="(.*?)" Age="(.*?)"/ && print "$1 $2"' file
Your title doesn't make it clear what you want. Anyway...
while (<>) {
    next if !/^--ERROR/;
    if (/Name="([^"]+)"\s+Age="([^"]+)"/) {
        print $1, " ", $2, "\n";
    }
}
can do it reading from stdin; of course, you can change the reading loop to anything else, and replace the print with something that populates a hash or whatever according to your needs.
As a one liner, try:
perl -ne 'print "$1 $2\n" if /^--ERROR/ && /Name="(.*?)"\s+Age="(.*?)"/;'
This is a lot like using sed, but with Perl syntax.
The immediate question of "how do I use perl like sed?" is best answered with s2p, the sed to perl converter. Given the command line, "sed $script", simply invoke "s2p $script" to generate a (typically unreadable) perl script that emulates sed for the given set of commands.
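For example (a sketch only, following the usage described above; the substitution and file names are made up), if the sed command were sed 's/--ERROR---/FAILED/' file, the same script text handed to s2p prints an equivalent Perl program that you can then run or inspect:
s2p 's/--ERROR---/FAILED/' > sedlike.pl    # translate the sed script into a Perl program
perl sedlike.pl file                       # behaves like the original sed command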
Refer to the comments:
my @a = <>;                  # Read the entire file into an array
chomp @a;                    # Remove the trailing newlines
@a = grep { /ERROR/ } @a;    # Remove lines that do not contain ERROR
# Map with a sed-like regexp to keep only names and ages:
@a = map { s/^.*Name=\"([a-z]+)\" Age=\"([0-9]+)\".*$/$1 $2/; $_ } @a;
print join " ", @a;          # Print the array content

How can I extract all quotations in a text?

I'm looking for a SimpleGrepSedPerlOrPythonOneLiner that outputs all quotations in a text.
Example 1:
echo “HAL,” noted Frank, “said that everything was going extremely well.” | SimpleGrepSedPerlOrPythonOneLiner
stdout:
"HAL,"
"said that everything was going extremely well.”
Example 2:
cat MicrosoftWindowsXPEula.txt | SimpleGrepSedPerlOrPythonOneLiner
stdout:
"EULA"
"Software"
"Workstation Computer"
"Device"
"DRM"
etc.
(link to the corresponding text).
I like this:
perl -ne 'print "$_\n" foreach /"((?>[^"\\]|\\+[^"]|\\(?:\\\\)*")*)"/g;'
It's a little verbose, but it handles escaped quotes and backtracking a lot better than the simplest implementation. What it's saying is:
my $re = qr{
    "             # Begin it with literal quote
    (
        (?>       # prevent backtracking once the alternation has been
                  # satisfied. It either agrees or it does not. This expression
                  # only needs one direction, or we fail out of the branch
            [^"\\]    # a character that is not a dquote or a backslash
        |   \\+       # OR if a backslash, then any number of backslashes followed by
            [^"]      # something that is not a quote
        |   \\        # OR again a backslash
            (?>\\\\)* # followed by any number of *pairs* of backslashes (as units)
            "         # and a quote
        )*        # any number of *set* qualifying phrases
    )             # all batched up together
    "             # Ended by a literal quote
}x;
If you don't need that much power--say it's only likely to be dialog and not structured quotes, then
/"([^"]*)"/
probably works about as well as anything else.
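As a one-liner built on that simpler pattern (assuming plain, unnested double quotes in the input):
perl -nE 'say for /"[^"]*"/g' file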
No regexp solution will work if you have nested quotes, but for your examples this works well
$ echo \"HAL,\" noted Frank, \"said that everything was going extremely well\"
| perl -n -e 'while (m/(".*?")/g) { print $1."\n"; }'
"HAL,"
"said that everything was going extremely well"
$ cat eula.txt| perl -n -e 'while (m/(".*?")/g) { print $1."\n"; }'
"EULA"
"online"
"Software"
"Workstation Computer"
"Device"
"multiplexing"
"DRM"
"Secure Content"
"DRM Software"
"Secure Content Owners"
"DRM Upgrades"
"WMFSDK"
"Not For Resale"
"NFR,"
"Academic Edition"
"AE,"
"Qualified Educational User."
"Exclusion of Incidental, Consequential and Certain Other Damages"
"Restricted Rights"
"Exclusion des dommages accessoires, indirects et de certains autres dommages"
"Consumer rights"
grep -o "\"[^\"]*\""
This greps for " + anything except a quote, any number of times + "
The -o makes it only output the matched text, not the whole line.
grep -o '"[^"]*"' file
The option '-o' prints only the matched pattern.