Perl: how to format a string containing a tilde character "~" - perl

I have run into an issue where a perl script we use to parse a text file is omitting lines containing the tilde (~) character, and I can't figure out why.
The sample below illustrates what I mean:
#!/usr/bin/perl
use warnings;
formline " testing1\n";
formline " ~testing2\n";
formline " testing3\n";
my $body_text = $^A;
$^A = "";
print $body_text
The output of this example is:
testing1
testing3
The line containing the tilde is dropped entirely from the accumulator. This happens whether there is any text preceding the character or not.
Is there any way to print the line with the tilde treated as a literal part of the string?

~ is special in forms (see perlform) and there's no way to escape it. But you can create a field for it and populate it with a tilde:
formline " \@testing2\n", '~';
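As a minimal sketch of that workaround applied to the question's sample: the picture uses a one-character field marker (`@`) where the tilde should appear, and the tilde is passed as the field's value. Resetting `$^A` up front is my own addition.

```perl
use strict;
use warnings;

$^A = '';                          # start from an empty accumulator
formline " testing1\n";
formline " \@testing2\n", '~';     # "@" is a one-character field, filled with '~'
formline " testing3\n";
my $body_text = $^A;
$^A = '';
print $body_text;
```

The tilde now arrives as data rather than as part of the picture, so the line is no longer suppressed.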

The first argument to formline is the "picture" (template). That picture uses various characters to mean particular things. The ~ means to suppress output if the fields are blank. Since you supply no fields in your call to formline, your fields are blank and output is suppressed.
my @lines = ( '', 'x y z', 'x~y~z' );
foreach $line ( @lines ) { # forms don't use lexicals, so no my on control
write;
}
format STDOUT =
~ ID: @*
$line
.
The output doesn't have a line for the blank field because the ~ in the picture told it to suppress output when $line doesn't have anything:
ID: x y z
ID: x~y~z
Note that tildes coming from the data are just fine; they are like any other character.
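The same behaviour can be sketched with formline directly instead of a named format. This is an illustration only: the `@<<<<<<<<<` field and the `$report` variable are my own choices, not part of the answer above.

```perl
use strict;
use warnings;

my $report = '';
for my $line ( '', 'x y z', 'x~y~z' ) {
    $^A = '';
    # "~" in the picture suppresses the line when the field is empty
    formline "~ ID: \@<<<<<<<<<\n", $line;
    $report .= $^A;
}
print $report;
```

Only the two non-empty values should produce a line, and the tilde inside the data is passed through unchanged.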
Here's probably something closer to what you meant. Create a picture, @* (variable-width multiline text), and supply it with values to fill it:
while( <DATA> ) {
local $^A;
formline '@*', $_;
print $^A, "\n";
}
__DATA__
testing1
~testing2
testing3
The output shows the field with the ~:
testing1
~testing2
testing3
However, the question is very odd because the way you appear to be doing things seems like you aren't really doing what formats want to do. Perhaps you have some tricky thing where you're trying to take the picture from input data. But if you aren't going to give it any values, what are you really formatting? Consider that you may not actually want formats.
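If formats turn out not to be needed at all, one possible alternative (a sketch, assuming the goal is only to collect lines into `$body_text`) is plain string concatenation, where a tilde has no special meaning:

```perl
use strict;
use warnings;

my $body_text = '';
$body_text .= " testing1\n";
$body_text .= " ~testing2\n";   # "~" is an ordinary character in a plain string
$body_text .= " testing3\n";
print $body_text;
```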

Related

Perl: Find a match, remove the same lines, and to get the last field

Being a Perl newbie, please pardon me for asking this basic question.
I have a text file @server1 that shows a bunch of sentences (white space is the field separator) on many lines in the file.
I needed to match lines with my keyword, remove the same lines, and extract only the last field, so I have tried with:
my @allmatchedlines;
open(output1, "ssh user1@server1 cat /tmp/myfile.txt |");
while(<output1>) {
chomp;
@allmatchedlines = $_ if /mysearch/;
}
close(output1);
my @uniqmatchedline = split(/ /, @allmatchedlines);
my $lastfield = $uniqmatchedline[-1]\n";
print "$lastfield\n";
and it gives me the output showing:
1
I don't know why it's giving me just "1".
Could someone please explain why I'm getting "1" and how I can get the last field of the matched line correctly?
Thank you!
my @uniqmatchedline = split(/ /, @allmatchedlines);
You're getting "1" because split takes a scalar, not an array. An array in scalar context returns the number of elements.
You need to split on each individual line. Something like this:
my @uniqmatchedline = map { split(/ /, $_) } @allmatchedlines;
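A small self-contained sketch of the difference, using a made-up sample line (the data and the `@wrong`/`@right` names are hypothetical):

```perl
use strict;
use warnings;

my @allmatchedlines = ('the quick brown fox');

# The array is evaluated in scalar context, so split sees the string "1"
my @wrong = split / /, @allmatchedlines;

# Splitting each element separately yields the actual words
my @right = map { split / /, $_ } @allmatchedlines;

print "wrong: @wrong\n";
print "last field: $right[-1]\n";
```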
There are two issues with your code:
split is expecting a scalar value (string) to split on; if you are passing an array, it will convert the array to scalar (which is just the array length)
You did not have a way to remove same lines
To address these, the following code should work (not tested as no data):
my @allmatchedlines;
open(output1, "ssh user1\@server1 cat /tmp/myfile.txt |"); # \@ keeps @server1 from being interpolated as an array
while(<output1>) {
chomp;
push @allmatchedlines, $_ if /mysearch/; # push keeps every match instead of overwriting the array
}
close(output1);
my %existing;
my @uniqmatchedline = grep !$existing{$_}++, @allmatchedlines; #this will return the unique lines
my @lastfields = map { ((split / /, $_)[-1]) . "\n" } @uniqmatchedline ; #this maps the last field in each line into an array
print for @lastfields;
Apart from two errors in the code, I find the statement "remove the same lines and extract only the last field" unclear. Once duplicate matching lines are removed, there may still be multiple distinct sentences with the pattern.
Until a clarification comes, here is code that picks the last field from the last such sentence.
use warnings 'all';
use strict;
use List::MoreUtils qw(uniq);
my $file = '/tmp/myfile.txt';
my $cmd = "ssh user1\@server1 cat $file";
open(my $fh, '-|', $cmd) // die "Error opening $cmd: $!"; # /
my @allmatchedlines;
while (<$fh>) {
chomp;
push @allmatchedlines, $_ if /mysearch/;
}
close $fh;
my @unique_matched_lines = uniq @allmatchedlines;
my $lastfield = ( split ' ', $unique_matched_lines[-1] )[-1];
print $lastfield, "\n";
I changed to the three-argument open, with error checking. Recall that open for a process involves a fork and returns the pid, so an "error" here does not relate to what happened with the command itself. See open. (The # / merely turns off wrong syntax highlighting.) Also note that @ inside "..." indicates an array and thus needs to be escaped.
The (default) pattern ' ' used in split splits on any amount of whitespace. The regex / / turns off this behavior and splits on a single space. You most likely want to use ' '.
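To make the difference concrete, here is a short sketch with a made-up input string that has leading and repeated spaces:

```perl
use strict;
use warnings;

my $line = "  alpha   beta gamma";

my @by_whitespace   = split ' ', $line;   # leading whitespace skipped, runs collapsed
my @by_single_space = split / /, $line;   # every single space is a separator

print scalar(@by_whitespace), " fields: ", join('|', @by_whitespace), "\n";
print scalar(@by_single_space), " fields: ", join('|', @by_single_space), "\n";
```

With / / the leading and repeated spaces produce empty fields, which is rarely what is wanted when taking the last or first word of a line.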
For more comments please see the original post below.
The statement @allmatchedlines = $_ if /mysearch/; on every iteration assigns to the array, overwriting whatever has been in it. So you end up with only the last line that matched mysearch. You want push @allmatchedlines, $_ ... to get all those lines.
Also, as shown in the answer by Justin Schell, split needs a scalar so it is taking the length of @allmatchedlines – which is 1 as explained above. You should have
my @words_in_matched_lines = map { split } @allmatchedlines;
When all this is straightened out, you'll have words in the array @uniqmatchedline and if that is the intention then its name is misleading.
To get unique elements of the array you can use the module List::MoreUtils
use List::MoreUtils qw(uniq);
my @unique_elems = uniq @whole_array;
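Putting the pieces together, the whole task can be sketched on in-memory data, with no ssh call and no extra module. The sample lines and variable names below are invented for illustration, and a `%seen` hash stands in for uniq:

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for the remote file
my @input = (
    'alpha mysearch one',
    'beta other two',
    'alpha mysearch one',
    'gamma mysearch three',
);

my @matched = grep { /mysearch/ } @input;        # keep matching lines
my %seen;
my @unique  = grep { !$seen{$_}++ } @matched;    # drop duplicates, keep first-seen order
my @last_fields = map { (split ' ', $_)[-1] } @unique;

print "$_\n" for @last_fields;
```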

Append string in the beginning and the end of a line containing certain string

all
I want to know how to append string in the beginning and the end of a line containing certain string using perl?
So for example, my line contains:
%abc %efd;
and I want to append 123 at the beginning of the line and 456 at the end of the line, so it would look like this:
123 %abc %efd 456
8/30/16 UPDATE--------------------------------
So far I have done something like this:
foreach file (`find . -type f`)
perl -ne 's/^\%abc\s+(\S*)/**\%abc $1/; print;' $file > tmp; mv tmp $file
end
foreach file (`find . -type f`)
perl -ne 's/$\%def\;\s+(\S*)/\%def\;**\n $1/; print;' $file > tmp; mv tmp $file
end
so this does pretty well except that when abc and def are not in one string.
for example:
%abc
something something something
%def
this would turn out to be
%abc
something something something
%def;
which is not what I want.
Thank you
In your case, you want to add strings when a line of the file matches a certain string, which is a match-and-replace task.
First, read each line of your input file.
Second, check whether the line matches the string you are looking for.
Then replace the matched string with a new string made of the beginning string, the matched string, and the end string.
my $input_file = 'your file name here';
my $search_string = '%abc %efd';
my $add_begin = '123';
my $add_end = '456';
# Read file
open(my $IN, '<', $input_file) or die "cannot open file $input_file: $!";
# Check each line of file
while (my $row = <$IN>) {
chomp $row;
# \Q...\E quotes any regex metacharacters in the search string
$row =~ s/^(\Q$search_string\E)$/$add_begin $1 $add_end/;
print $row."\n";
}
close($IN);
Try with input file as below:
%abc %efd
asdahsd
234234
%abc
%efd
%abc%efd
You will receive the result as we expected:
123 %abc %efd 456
asdahsd
234234
%abc
%efd
%abc%efd
Modify the code as your requirement and contact me if there's any issue.
Use the /m modifier so that ^ and $ match at the beginning and end of each line rather than of the whole string.
s/^\%abc/123 $&/mg;
s/\%def$/$& 456/mg;
As perlre puts it: "Used together, as /ms, they let the "." match any character whatsoever, while still allowing "^" and "$" to match, respectively, just after and just before newlines within the string."
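A short sketch of the /m approach on the multi-line sample from the question, held in a single string. Capture groups are used here instead of `$&`, which is my own substitution:

```perl
use strict;
use warnings;

my $text = "%abc\nsomething something something\n%def\n";

$text =~ s/^(%abc)/123 $1/mg;    # with /m, ^ matches at the start of every line
$text =~ s/(%def)$/$1 456/mg;    # with /m, $ matches before every newline

print $text;
```

This only helps when the whole file (or block) is in one string; reading line by line with -n or -p never sees more than one line at a time.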
Welcome to StackOverflow. We strive to help people solve problems in their existing code and learn languages, rather than simply answer one-off questions, the solutions to which can be easily found in 101 tutorials and documentation. The type of question you've posted doesn't leave a lot of room for learning, and doesn't do much to help future learners. It would help us greatly if you could post a more complete example, including what you've tried so far to get it working.
All that being said, there are two main ways to prepend and append to a string in Perl: 1. the concatenation operator, . and 2. string interpolation.
Concatenation
Use a . to join two strings together. You can chain operations together to compose a longer string.
my $str = '%abc %efd';
$str = '123 ' . $str . ' 456';
say $str; # prints "123 %abc %efd 456" with a trailing newline
Interpolation
Enclose a string in double quotes to instruct Perl to interpolate (i.e. find and evaluate) any Perl-style variables enclosed within the string.
my $str = '%abc %efd';
$str = "123 $str 456";
say $str; # prints "123 %abc %efd 456" with a trailing newline
You'll notice that in both examples we prepended and appended to the existing string. You can also create new variable(s) to hold the result(s) of these operations. Other methods of manipulating and building strings include the printf and sprintf functions, the substr function, the join function, and regular expressions, all of which you will encounter as you continue learning Perl.
As far as looking to see if a string contains a certain substring before performing the operation, you can use the index function or a regular expression:
if (index($str, '%abc %efd') >= 0) {
# or...
if ($str =~ /%abc %efd/) {
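Combining the check with the interpolation shown earlier might look like the following sketch. The `@rows` sample data and the `@out` array are hypothetical:

```perl
use strict;
use warnings;

my @rows = ('%abc %efd', 'asdahsd', '%abc%efd');
my @out;

for my $str (@rows) {
    if (index($str, '%abc %efd') >= 0) {
        push @out, "123 $str 456";   # substring found: wrap the line
    }
    else {
        push @out, $str;             # leave other lines untouched
    }
}

print "$_\n" for @out;
```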
Remember to use strict; at the top of your Perl scripts and always (at least while you're learning) declare variables with my. If you're having trouble with the say function, you may need to add the statement use feature 'say'; to the top of your script.
You can find an index of excellent Perl tutorials at learn.perl.org. Good luck and have fun!
UPDATE Here is (I believe) a complete answer to your revised question:
find . -type f -exec perl -i.bak -pe 's/^(%abc)\s+(\S*)\s+(%def;)$/**$1 $2 $3**/' {} +
This will modify the files in place and create backup files with the extension .bak. Keep in mind that the expression \S* will only match non-whitespace characters; if you need to match strings that contain whitespace, you will need to update this expression (something like .*? might be workable for you).

what's the proper syntax to insert a multi-line substitution string and apply it toward a single array value?

Here's the scenario -- One step of the process involves fixing city names when the data is obviously misspelled, along with some basic conversions like "MTN" to "Mountain" and so forth. I've built a variable containing several substitution strings, and I'm trying to apply that set of subs on one of the input fields later down the line.
my $citysub = <<'EOF';
s/DEQUEEN/DE QUEEN/;
s/ELDORADO/EL DORADO/;
... # there are about 100 such substitution strings
EOF
...
while ($line = <INFILE>)
{
...
@field = split(/","/,$line); # it's a comma-delimited file with quoted strings; this is splitting exactly like I intend; at the end, I'll piece it back together properly
...
# the 9th field and 12th field are city names, i.e., $field[8] and $field[12]
$field[8] =~ $citysub; # this is what I'm wanting to do, but it doesn't work!
# since that doesn't work, I'm using the following, but it's much slower, obviously
$field[8] = `echo $field[8]|sed -e "$citysub"`; # external calls to system commands
So, what's the proper syntax to insert a multi-line substitution string and apply it toward a single array value?
my %citysub = ( "DEQUEEN" => "DE QUEEN", "ELDORADO" => "EL DORADO" );
for my $find ( keys %citysub ) {
my $replace = $citysub{ $find };
$field[8] =~ s/$find/$replace/g;
}
Explanation: Create a hash of "thing to match" => "thing to replace with". Then loop over that hash and run s/// with the thing to match and the thing to replace with.
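With around 100 entries, a possible variation (not what the answer above does) is to compile all keys into one alternation and look the replacement up in the hash, so each field is scanned once. The sketch below assumes the city names should match as whole words; the `MTN` entry comes from the question's description and `$city_re` is a name I made up:

```perl
use strict;
use warnings;

my %citysub = (
    'DEQUEEN'  => 'DE QUEEN',
    'ELDORADO' => 'EL DORADO',
    'MTN'      => 'Mountain',
);

# Longest keys first, so a longer name wins over a shorter one it contains
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } keys %citysub;
my $city_re = qr/\b($alternation)\b/;

my @field;
$field[8] = 'PILOT MTN';
$field[8] =~ s/$city_re/$citysub{$1}/g;
print "$field[8]\n";
```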

Perl: Replace consecutive spaces in this given scenario?

an excerpt of a big binary file ($data) looks like this:
\n1ax943021C xxx\t2447\t5
\n1ax951605B yyy\t10400\t6
\n1ax919275 G2L zzz\t6845\t6
The first 25 characters contain an article number, filled with spaces. How can I convert all spaces between the article numbers and the next column into a \x09 ? Note the one or more spaces between different parts of the article number.
I tried a workaround, but that overwrites the article number with ".{25}xxx»"
$data =~ s/\n.{25}/\n.{25}xxx/g
Anyone able to help?
Thanks so much!
Gary
You can use unpack for fixed width data:
use strict;
use warnings;
use Data::Dumper;
$Data::Dumper::Useqq=1;
print Dumper $_ for map join("\t", unpack("A25A*")), <DATA>;
__DATA__
1ax943021C xxx 2447 5
1ax951605B yyy 10400 6
1ax919275 G2L zzz 6845 6
Output:
$VAR1 = "1ax943021C\txxx\t2447\t5";
$VAR1 = "1ax951605B\tyyy\t10400\t6";
$VAR1 = "1ax919275 G2L\tzzz\t6845\t6";
Note that Data::Dumper's Useqq option prints whitespace characters in their escaped form.
Basically what I do here is take each line, unpack it, using 2 strings of space padded text (which removes all excess space), join those strings back together with tab and print them. Note also that this preserves the space inside the last string.
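For a single record, the unpack step can be sketched as follows. The record is rebuilt with sprintf so that the article number really is padded to 25 characters (the padding is an assumption based on the question's description):

```perl
use strict;
use warnings;

# Article number padded to 25 characters, followed by the tab-separated columns
my $record = sprintf "%-25s%s", '1ax919275 G2L', "zzz\t6845\t6";

# "A25" takes 25 characters and strips trailing spaces; "A*" takes the rest
my ($article, $rest) = unpack 'A25 A*', $record;

my $converted = join "\t", $article, $rest;
print "$converted\n";
```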
I interpret the question as there being a 25 character wide field that should have its trailing spaces stripped and then delimited by a tab character before the next field. Spaces within the article number should otherwise be preserved (like "1ax919275 G2L").
The following construct should do the trick:
$data =~ s/^(.{25})/{$t=$1;$t=~s! *$!\t!;$t}/emg;
That matches 25 characters from the beginning of each line in the data, then evaluates an expression for each article number by stripping its trailing spaces and appending a tab character.
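The same idea can be written with a small helper called from the /e replacement, which may be easier to read. This is a sketch only; `tab_terminate` is a hypothetical name and the sample data is rebuilt with 25-character padding:

```perl
use strict;
use warnings;

# Strip trailing spaces from the 25-character field and end it with a tab
sub tab_terminate {
    my ($field) = @_;
    $field =~ s/ +\z//;
    return "$field\t";
}

my $data = join '', map { sprintf "%-25s%s\n", @$_ } (
    [ '1ax943021C',    "xxx\t2447\t5" ],
    [ '1ax919275 G2L', "zzz\t6845\t6" ],
);

# /m lets ^ match at each line start; /e evaluates the replacement as code
$data =~ s/^(.{25})/tab_terminate($1)/emg;
print $data;
```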
Have a try with:
$data =~ s/ +/\t/g;
Not sure exactly what you want - this will match the two columns and print them out - with all the original spaces. Let me know the desired output and I will fix it for you...
#!/usr/bin/perl -w
use strict;
my @file = ('\n1ax943021C xxx\t2447\t5', '\n1ax951605B yyy\t10400\t6',
'\n1ax919275 G2L zzz\t6845\t6');
foreach (@file) {
my ($match1, $match2) = ($_ =~ /(\\n.{25})(.*)/);
print "$match1'[insertsomethinghere]'$match2\n";
}
Output:
\n1ax943021C '[insertsomethinghere]'xxx\t2447\t5
\n1ax951605B '[insertsomethinghere]'yyy\t10400\t6
\n1ax919275 G2L '[insertsomethinghere]'zzz\t6845\t6

How to merge files with line-skipping

Have two files:
file f1 has the next structure (after the # are comments which are not in the file)
SomeText1 #Section name - one word [a-zA-Z]
acd:some text #code:text - the code contains only [a-z]
opo:some another text #variable number of code:text pairs
wed:text too #in the SomeText1 section are 3 pairs
SomeText2
xxx:textttt #here only 1 code:text pair
SomeText3
zzz:texxxxxxx #here only 1 code:text pair too
and file f2 what contains in the same order as the above file the next lines:
1000:acd:opo:wed:123.44:4545.23:1233.23 #3 codes - like in the above segment 1
304:xxx:10:11:12.12 #1 code - these lines contains only
4654:zzz:0 #codes and numbers
the desired output is
SomeText1:1000:acd:opo:wed:123.44:4545.23:1233.23
acd:some text:
opo:some another text:
wed:text too:
SomeText2:304:xxx:10:11:12
xxx:textttt:
SomeText3:4654:zzz:0
zzz:texxxxxxx:
So need to add the lines from the f2 to "section name" line. The codes in every line in the f2 file are the same as the codes in the code:text pairs in the f1
Haven't no idea how to start, because
can't use the paste command because i don't have the same line-count in the both files, and
can't use join, because here aren't common keys in both files.
So, would be really happy, when someone tell me SOME ALGORITHM, how to start - and I will program it myself.
I'm offering you different approach - I provide a code, and you should figure out how it works ;) :)
paste -d':' f1 <(perl -pe '$\="\n"x($c=()=/[a-z]+/g)' <f2)
produces exactly what you want from your inputs.
EDIT - Explanation:
The solution comes from your comment that the lines contain only codes and numbers, so the codes are easy to extract from each line.
It is therefore enough to add as many empty lines after each line as it has codes.
The /[a-z]+/g matches every code and returns them.
The $c =()= is the "Rolex operator", which counts the list of matches.
The count of matched codes gives the number of empty lines needed.
The $\ = "\n" x NUMBER means: repeat the string before the x NUMBER times; e.g. with 3 codes, the "\n" (newline) character is repeated 3 times.
The newlines are assigned to the variable $\, the output record separator.
Because the -p switch processes the file line by line and prints every line as "print $_$\;", the output record separator, which contains a number of newlines, is printed after every line.
Therefore we get the empty lines.
I hope my English was good enough for the explanation.
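The counting and repetition steps can be tried in isolation. A minimal sketch using the first line of f2 (the `$count` and `$padding` names are mine):

```perl
use strict;
use warnings;

my $line = '1000:acd:opo:wed:123.44:4545.23:1233.23';

# Assigning the match list to () and then to a scalar yields the number of matches
my $count = () = $line =~ /[a-z]+/g;

# "x" repeats the string on its left $count times
my $padding = "\n" x $count;

print "codes: $count\n";
```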
Or wholly in Perl:
my $skip;
while (<$f1>) {
chomp;
my $suffix;
if ($skip--) {
$suffix = "\n";
} else {
$suffix = <$f2>;
$skip = () = $suffix =~ /[a-z]+/g;
}
print "$_:$suffix";
}
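For experimenting without real files, the loop can be run against in-memory filehandles. This is a sketch under the assumption that f1 and f2 contain exactly the sample data from the question (without the comments); it uses an explicit `$skip > 0` test and collects the result in a hypothetical `$merged` variable:

```perl
use strict;
use warnings;

my $f1_text = <<'END_F1';
SomeText1
acd:some text
opo:some another text
wed:text too
SomeText2
xxx:textttt
SomeText3
zzz:texxxxxxx
END_F1

my $f2_text = <<'END_F2';
1000:acd:opo:wed:123.44:4545.23:1233.23
304:xxx:10:11:12.12
4654:zzz:0
END_F2

# Open the strings as read-only in-memory files
open my $f1, '<', \$f1_text or die "Cannot open in-memory f1: $!";
open my $f2, '<', \$f2_text or die "Cannot open in-memory f2: $!";

my $merged = '';
my $skip   = 0;    # code:text lines still to pass through in this section
while (my $line = <$f1>) {
    chomp $line;
    my $suffix;
    if ($skip > 0) {
        $skip--;
        $suffix = "\n";                        # code:text line, nothing from f2
    }
    else {
        $suffix = <$f2>;                       # section name, take next f2 line
        $skip = () = $suffix =~ /[a-z]+/g;     # number of codes in that line
    }
    $merged .= "$line:$suffix";
}
print $merged;
```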