Perl Pattern matching and appending [closed] - perl

Closed 10 years ago.
In perl I want to achieve the following translation:
stmt1; gosub xyz;
to
stmt1; xyz();
How can I do this?

The answers already given provide an approximate solution; this one also deals with your edge cases (missing semicolons, additional clauses after semicolons).
perl -plwe 's/\bgosub\s+([^;]+)/$1()/g'
It matches the gosub keyword followed by whitespace, captures the run of "not semicolon" characters that comes next, and replaces the whole match with the captured name followed by (). I also added the /g global modifier, as it seems likely that you'd want to do all possible replacements on a single line. Note the use of the word boundary \b to prevent partial matches, e.g. it will not replace legosub.
If the word boundary is not sufficient, e.g. it will replace 1.gosub because . causes a break between word characters, you can use a negative lookbehind instead:
perl -plwe 's/(?<![^;\s])gosub\s+([^;]+)/$1()/g'
This requires that gosub is preceded by a semicolon, by whitespace, or by nothing at all. Because the assertion is a negative lookbehind on a negated character class, it also succeeds where there is no preceding character (beginning of line).

Run from the command line on the file you want to edit (replacing file.ext); the original is kept as file.ext.bk:
perl -i.bk -pe 's/gosub (.*?);/$1();/g' file.ext

my $str = 'stmt1; gosub xyz;';
# Capture the subroutine name; the trailing semicolon is optional in the input.
$str =~ s/gosub\s+(\w+);?/$1();/;
print $str;

Related

Storing regex with '^' and '$' inside constant [closed]

Closed 6 years ago.
I revised my code and realized that I had stored the regex inside a constant and then used the constant's value for the variable.
I'm trying to store a regular expression inside a constant using the qr// operator. Everything is fine except for '^' and '$'. I need them to match beginning-of-line and end-of-line respectively.
use constant REGEX_LINE => qr/\^(\s*)(.*)\$/;
my $rx = REGEX_LINE;
Printing $rx reveals that it contains some additional stuff:
(?^:^(\s*)(.*)$)
Of course, the regex now doesn't match my data.
If you expect ^ and $ to match start and end of line,
don't escape them (or else they will match ^ and $), and
use /m (or else they will match the start and end of the string).
use constant REGEX_LINE => qr/^(\s*)(.*)$/m;
Add an escape character (\) before the $ symbol; otherwise it will be treated as part of a variable name.

Append to non-empty line that doesn't start with whitespace AND is followed, two lines down, by a non-empty line that doesn't start with whitespace

I am converting several unruly, early 90's DOS-generated text files to something more usable. I need to append a set of characters to all of the non-empty lines in said text files that don't start with whitespace AND that are followed, two lines down, by another non-empty line that doesn't start with whitespace (I will refer to all single lines of text that meet these characteristics as "target" lines). BTW, irrelevant to the problem are the characteristics of the line directly below each of the target lines.
Of interest is the fact that all of the target lines in the above-mentioned text files end with the same character. Also, the command I'm looking for needs to slot into a rather long pipeline.
Suppose I have the following file:
foo
third line foo
fifth line foo
this line starts with a space foo
this line starts with a space foo
ninth line foo
eleventh line foo
this line starts with a space foo
last line foo
I want the output to look like this:
foobar
third line foobar
fifth line foo
this line starts with a space foo
this line starts with a space foo
ninth line foobar
eleventh line foo
this line starts with a space foo
last line foo
Although I'm looking for a sed solution, awk and perl are welcome as well. All solutions must be able to be used in a pipeline. Also welcomed are solutions which handle a more general case (e.g. able to append the desired text to target lines that end in various ways, including whitespace).
Now, for the backstory:
A few days ago I asked a question similar to this one (see here). As you can see, I got some great answers. It turned out, however, that I did not fully understand my problem, so I did not ask the question that would actually solve it.
Now, I'm asking the right question!
Based on what I learned by scrutinizing the answers to the question I linked to above, I've cobbled together the following sed command
sed '1N;N;/^[^[:space:]]/s/^\([^[:space:]].*\o\)\(\n\n[^[:space:]].*\)$/\1bar\2/;P;D' infile
Ugly, yes, but it works for my humble purposes. Indeed, as my original intent with this question was to post a question, then self-answer same, you can see this sed construct posted below as one of the answers (posted by me).
I'm sure there are better ways to solve this particular problem, however...any ideas, anyone?
From your posted expected output it looks like you meant to say "is followed, two lines down, by a line that DOES NOT start with whitespace" instead of "is followed, two lines down, by a line that DOES start with whitespace".
This produces the output you show:
$ cat tst.awk
NR>2 { print p2 ((p2 ~ /^[^[:blank:]]/) && /^[^[:blank:]]/ ? "bar" : "") }
{ p2=p1; p1=$0 }
END { print p2 ORS p1 }
$ awk -f tst.awk file
foobar
third line foobar
fifth line foo
this line starts with a space foo
this line starts with a space foo
ninth line foobar
eleventh line foo
this line starts with a space foo
last line foo
It simply keeps a 2 line buffer and adds "bar" to the end of the line being printed given whatever condition you need. It will work on all POSIX awks and any others that support POSIX character classes (for the rest, change [[:blank:]] to [ \t]).
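For use in a pipeline, the same two-line buffer can be wrapped in a shell function. This is only a sketch: the name append_bar and the five-line sample input are assumptions, and [ \t] is used instead of [[:blank:]] so that it also runs on awks without POSIX character classes.

```shell
# Hypothetical wrapper around the two-line-buffer awk script above.
append_bar() {
    awk '
        # Print the line from two records back; append "bar" when both it
        # and the current line start with a non-blank character.
        NR > 2 { print p2 ((p2 ~ /^[^ \t]/ && /^[^ \t]/) ? "bar" : "") }
               { p2 = p1; p1 = $0 }
        # Flush the last two buffered lines unchanged.
        END    { print p2 ORS p1 }
    '
}

printf '%s\n' 'foo' '' 'third' ' indented' 'fifth' | append_bar
# foobar
#
# thirdbar
#  indented
# fifth
```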
You have over-analysed the problem so that your question now reads as a computer program, and you have got that program wrong. Requirements are best explained using examples and real data, so that we have some hope of rationalising the problem in our heads
This Perl program alters your algorithm so the output matches your required output
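use strict;
use warnings 'all';
# Read all input lines into an array, stripping the newlines
chomp(my @data = <>);
my $i = 0;
for ( @data ) {
    # Append when this line and the line two below both start with non-whitespace
    $_ .= 'bar' if /^\S/ and $data[$i+2] =~ /^\S/;
    ++$i;
    last if $i+2 > $#data;
}
print "$_\n" for @data;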
output
foobar
third line foobar
fifth line foo
this line starts with a space foo
this line starts with a space foo
ninth line foobar
eleventh line foo
this line starts with a space foo
last line foo
This sed one-liner seems to do the trick for the specific case outlined in the OP:
sed '1N;N;/^[^[:space:]]/s/^\([^[:space:]].*\o\)\(\n\n[^[:space:]].*\)$/\1bar\2/;P;D' infile
Thanks to the excellent clarifying information given by Benjamin W. in his answer to one of my recent questions, I was able to cobble together this one-liner that solved my specific problem. Please refer to same if you wish to gain insight into said command.

Generate List of IP Addresses via Command Line? [closed]

Closed 6 years ago.
I'm interested in using command line (possibly Perl) to generate a list of all possible IP addresses.
I've done something similar with PHP in the past by using the long2ip function and creating a list from 0 to the integer 4294967295.
Is there a way to do this in Perl instead though?
I'm basically just looking for the quickest way to generate a text file that has a list of all 4,294,967,296 possible IP addresses.
There is no need to use any modules. This is a trivial problem.
for my $i (0..255) {
    for my $j (0..255) {
        for my $k (0..255) {
            for my $l (0..255) {
                printf("%d.%d.%d.%d\n", $i,$j,$k,$l)
            }
        }
    }
}
One-liner time?
perl -MSocket=inet_ntoa -le 'print inet_ntoa(pack "N", $_) for 0..2**32-1'
Source: http://www.perlmonks.org/?node_id=786521 via quick googling.
Perl isn't strictly necessary either, of course. The following generates a quick sed script on the fly and calls it successively.
octets () { sed "h;$(for ((i=0; i<256; i++)); do printf "g;s/^/$i./p;"; done)"; }
octets <<<'' | octets | octets | octets | sed 's/\.$//'
The octets function generates 256 copies of its input with a (zero-based) line number and a dot prepended to each. (You could easily append at the end instead, of course.) In the sed scripting language, the h command copies the input to the hold space and g retrieves it back, overwriting whatever we had there before. The C-style for loop and the <<< here string are Bash extensions, so not POSIX shell.

sed or perl replace characters leaving some text intact [duplicate]

Closed 10 years ago.
Possible Duplicate:
sed replace characters leaving some text intact
How can I replace some characters using sed (or maybe perl) while leaving an unknown number intact? I.e. in a file there are multiple lines like this:
<"START=xxx">
<"START=yyy">
<"START=zzz">
The 'xxx', 'yyy' and 'zzz' are different unknown values (numbers). I want to remove the ending "> and replace it, and also replace the beginning (but that's not too difficult for me), so that in the end the file looks like this:
<START>xxx</START>
<START>yyy</START>
<START>zzz</START>
Thank you in advance!
this should do it:
sed 's;<"\([^=]\+\)=\([^"]\+\)">;<\1>\2</\1>;' file
however keep in mind that processing xml like content with line-oriented tools is not the correct way to do it, unless the format is very strict and the case focuses on a strict and well-defined subset of the formatting language.
For the fun of it, a Perl solution:
perl -pe's#<"(.+?)=(\d+)">#<$1>$2</$1>#' <file >outfile
or
perl -i -pe's#<"(.+?)=(\d+)">#<$1>$2</$1>#' file
for in-place replacement
awk -F"=" '{gsub(/\"|<|>/,"");print "<"$1">"$2"</"$1">"}' your_file
tested below:
> cat temp
<"START=xxx">
<"START=yyy">
<"START=zzz">
> awk -F"=" '{gsub(/\"|<|>/,"");print "<"$1">"$2"</"$1">"}' temp
<START>xxx</START>
<START>yyy</START>
<START>zzz</START>

How do I delete newlines ('\n', 0x0A) from non-empty lines using tr(1)?

I have a file named file1 with following content:
The answer t
o your question

A conclusive a
nswer isn’t al
ways possible.

When in doubt, ask pe
ople to cite their so
urces, or to explain

Even if we don’t agre
e with you, or tell y
ou.
I would like to convert file1 into file2. Latter should look like this:
The answer to your question

A conclusive answer isn’t always possible.

When in doubt, ask people to cite their sources, or to explain

Even if we don’t agree with you, or tell you.
If I simply execute cat file1 | tr -d "\n" > file2, all the newline characters will be deleted. How do I delete only those newline characters which are on the non-empty lines using the tr(1) utility?
perl -00 -lpe 'tr/\n//d'
-00 is Perl's "paragraph" mode, reading the input with one or more blank lines as the delimiter. -l appends the system newline character to the print command, so it's safe to delete all newlines in the input.
tr can't do this, but sed easily can
sed -ne '$!H;/^$/{x;s/\n//g;G;p;d;}' file1 > file2
This finds non-empty lines and holds them. Then, on empty lines, it removes newlines from the held data and prints the result followed by a newline. The held data is deleted and the process repeats.
EDIT:
Per @potong's comment, here's a version which doesn't require an extra blank line at the end of the file.
sed -ne 'H;/^$/{x;s/\n//g;G;p;};${x;s/\n//g;x;g;p;}' file1 > file2
If there's a character that you know doesn't appear in your input, you could do something like this:
# Assume that the input doesn't contain the '|' character at all
tr '\n' '|' < file1 | sed 's/\([^|]\)|\([^|]\)/\1\2/g' | tr '|' '\n' > file2
This replaces all newlines with the replacement character |; sed then deletes all instances of | that come after and before some other character; and finally, it replaces | back with newlines.
This may work for you:
# sed '1{h;d};H;${x;s/\([^\n]\)\n\([^\n]\)/\1\2/g;p};d' file
The answer to your question
A conclusive answer isn't always possible.
When in doubt, ask people to cite their sources, or to explain
Even if we don't agree with you, or tell you.
The newlines in file1 fall into four classes:
newline followed by another newline
newline preceded by newline
newline at the end of file
sandwiched newline
Deleting the first class by reading the entire input (the -000 option) and substituting one newline everywhere we see a pair of them (s/\n\n/\n/g) gets us
$ perl -000 -pe 's/\n\n/\n/g' file1
The answer t
o your question
A conclusive a
nswer isn’t al
ways possible.
When in doubt, ask pe
ople to cite their so
urces, or to explain
Even if we don’t agre
e with you, or tell y
ou.
That's not what we want because the first class of newlines should terminate lines in file2.
We may try to be clever and use negative look-behind to delete newlines preceded by other newlines (the second class), but the output is indistinguishable from the previous case, which makes sense because this time we're deleting the latter rather than the former in each adjoined pair of newlines.
$ perl -000 -pe 's/(?<=\n)\n//g' file1
The answer t
o your question
A conclusive a
nswer isn’t al
ways possible.
When in doubt, ask pe
ople to cite their so
urces, or to explain
Even if we don’t agre
e with you, or tell y
ou.
Even so, this still isn't what we want because newlines preceded by other newlines become the blank lines in file2.
It's obvious that we want to hang on to the newline at the end of file1.
What we want then is a program that deletes the fourth class only: each newline that is not preceded by another newline and that is followed by neither another newline nor logical end-of-input.
Using Perl's look-around assertions, specification is straightforward although perhaps a bit intimidating in appearance. "Not preceded by newline" is the negative look-behind (?<!\n). Using negative look-ahead (?!...) we don't want to see another newline or (|) the end of the input ($).
Putting it all together we get
$ perl -000 -pe 's/(?<!\n)\n(?!\n|$)//g' file1
The answer to your question
A conclusive answer isn’t always possible.
When in doubt, ask people to cite their sources, or to explain
Even if we don’t agree with you, or tell you.
Finally, to create file2, redirect the standard output.
perl -000 -pe 's/(?<!\n)\n(?!\n|$)//g' file1 >file2
You can't get that with tr by itself. tr is very handy, but is strictly a char-by-char filter, no look-ahead or look-behind.
You might be able to get your example output with sed, but it would really be painful (I think!). edit (sed master @Sorpigal proves me wrong!)
Here's a solution with awk
/home/shellter:>cat <<-EOS \
| awk 'BEGIN{RS="\n\n"}; { gsub("\n", "", $0) ;printf("%s %s", $0, "\n\n") }'
The answer t
o your question
A conclusive a
nswer isn’t al
ways possible.
When in doubt, ask pe
ople to cite their so
urces, or to explain
Even if we don’t agre
e with you, or tell y
ou.
EOS
# output
The answer to your question
A conclusive answer isnt always possible.
When in doubt, ask people to cite their sources, or to explain
Even if we dont agree with you, or tell you.
Weird, it is displaying as triple-spaced, but it is really dbl-spaced.
Awk has predefined variables that it populates for each file, and for each record of text that it reads, i.e.
RS = RecordSeparator. Normally a newline, so each line is a record, but it is a configurable value; when set to '\n\n' a record ends at a blank line, i.e. a typical paragraph separation.
$0 = the complete current record (as defined by the internal variable RS). In this problem each record is a paragraph of data.
$1 = the first field of the record (as defined by the internal variable FS, FieldSeparator, which defaults to runs of space and/or tab characters, so a line whose words are separated by 2 consecutive spaces and by 1 tab has 3 fields).
NF = Number(of)Fields in the current record (again, fields are defined by the value of FS as described above).
(There are many others besides $0, $n, NF, FS and RS.)
You can programmatically step through values like $1, $2, $3 by using a variable, like $i, where i is a variable that holds a number between 1 and NF. The leading '$' says "give me the value of field i" (i.e. $2, $3, $4 ...).
I hope this helps.
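The variables described above can be tried out with a short sketch. The function names are made up for illustration, and it uses RS = "" (awk's paragraph mode, which should work in any POSIX awk) instead of the multi-character RS="\n\n" from the answer.

```shell
# Hypothetical filter: treat each blank-line-separated paragraph as one record,
# delete the newlines inside it, and print records separated by a blank line.
join_paragraphs() {
    awk 'BEGIN { RS = ""; ORS = "\n\n" } { gsub(/\n/, ""); print }'
}

# Hypothetical filter: print the number of fields (NF) on each input line.
count_fields() {
    awk '{ print NF }'
}

printf 'The answer t\no your question\n\nA conclusive a\nnswer\n' | join_paragraphs
# The answer to your question
#
# A conclusive answer

printf 'a  b\tc\n' | count_fields
# 3
```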