I have a Perl script that takes as an input a text file containing several sentences (Sentences.txt). Each sentence is separated with a white line. The script creates separate text files for each sentence in Sentences.txt. For example, Sent1.txt for the first sentence in Sentences.txt, Sent2.txt for the second sentence in Sentences.txt and so on.
The problem comes when I try to print a sentence from Sentences.txt to the corresponding separate file (SentX.txt) using the printf function and the sentence contains a % character. How can I solve this?
This is the code:
#!/usr/bin/perl -w
use strict;
use warnings;
# Separate sentences
my $sep_dir = "./sep_dir";
# Sentences.txt
my $sent = "Sentences.txt";
open my $fsent, "<", $sent or die "can not open '$sent'\n";
# read sentences
my $kont = 1;
my $previous2_line = "";
my $previous_line = "";
my $mom_line = "";
while(my $line = <$fsent>){
chomp($line);
#
$previous2_line = $previous_line;
#
$previous_line = $mom_line;
#
$mom_line = $line;
if($mom_line !~ m/^\s*$/){
# create separate sentence file
my $fitx_esal = "Sent.$kont.txt";
open my $fesal, ">", $fitx_esal or die "can not open '$fitx_esal'\n";
printf $fesal $mom_line;
close $fesal or die "can not close '$fitx_esal'.\n";
$kont++;
}
}
close $fsent or die "can not close '$sent'.\n";
If you just want to write the sentence as you found it, why not use print? It has no problem with %.
If printf is required you will need to replace every % with %%, for example using
$sentence =~ s/%/%%/g;
The f in printf stands for "format", not "file". You're missing the format parameter.
printf $fesal "%s", $mom_line;
But you could simply use
print $fesal $mom_line;
To include % in a (s)printf format, double it: %%.
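A minimal sketch of the three options mentioned in these answers. The sentence is made up for illustration, and output goes to STDOUT rather than to the SentX.txt files:

```perl
use strict;
use warnings;

# Hypothetical sentence containing a literal percent sign.
my $sentence = "Prices rose by 5% in May.";

# Option 1: print does no format processing, so % is harmless.
print $sentence, "\n";

# Option 2: give printf an explicit format and pass the text as data.
printf "%s\n", $sentence;

# Option 3: if the text itself must be the format, double every % first.
(my $escaped = $sentence) =~ s/%/%%/g;
printf "$escaped\n";
```

All three statements should print the same line; option 1 is the simplest when no formatting is needed.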
I want to read from a text file only specific text, for example:
FileExample:
1111111/first/second/third/fourth.c11111111...etc...
1111111/afirst/asecond/athird/afourth.c11111111...etc...etc
I would like to read the whole file except the part of the file from the 3rd "1" from the first "/" until the ".c" after the 4th "/" to make myself more clear I will bold the text I want my program to read and leave unbolded the part of the text I don't want my program to read.
1111111/first/second/third/fourth.c11111111...etc...etc
1111111/afirst/asecond/athird/afourth.c11111111...etc...etc
After I have done all the operations I want on the bolded text, I want to write everything to another file: the unbolded text unmodified and the bolded text with the modifications made, both in the original file order.
open my $fh1, '<', 'hex.txt';
open my $fh2, '<', 'hex2.txt';
until ( eof $fh1 or eof $fh2 ) {
my @l1 = map hex,unpack '(a2)*', <$fh1>;
my @l2 = map hex,unpack '(a2)*', <$fh2>;
my $n = @l2 > @l1 ? @l2 : @l1;
my @sum = map {
$l1[$_] + $l2[$_];
} 0 .. $n-1;
@sum = map { sprintf '%X', $_ } @sum;
open my $out, '>', 'sum.txt';
print { $out } @sum, "\n";
}
I want to add the hex values from the file hex to the hex values from the file hex2. Both files have the same structure: both have text and hex values in the same locations and have exactly the same length. I just need to understand how to tell the program to read from location1 to location2.
Convert file to hex:
{
my $input = do {
open my $in, '<', $ARGV[0];
local $/;
<$in>
};
open my $out, '>', 'hex.txt';
print $out unpack 'H*', $input;
}
Your precise criteria aren't clear. Are those digits always ones? It's a mistake to show such a very simple example when you're hoping for help. But I suggest you use split
Something like this perhaps?
use strict;
use warnings;
use feature 'say';
my $data = do {
local $/;
<DATA>;
};
$data =~ tr/\n//d;
say for split qr{\d\d\d(?:/\w+)+/\w+\.c}, $data;
__DATA__
1111111/first/second/third/fourth.c11111111...etc...
1111111/afirst/asecond/athird/afourth.c11111111...etc...etc
output
1111
11111111...etc...1111
11111111...etc...etc
I changed the input to be able to recognize what 1's it matches:
abcd111/first/second/third/fourth.cX1111111...etc...
abcd111/afirst/asecond/athird/afourth.cX1111111...etc...etc
This seems to produce the output you want
perl -pe 's=([^/]+).../.*\.c=$1='
[^/] is a character class, it matches anything that's not a slash;
+ means it must be present one or more times
putting it into parentheses makes it a "capture group", i.e. Perl will remember what matched that part.
.../ matches any three characters followed by a slash.
.* matches anything.
\.c matches a dot followed by a c.
the whole matching part (abcd in the sample input, up to the c before X) is substituted (hence the starting s) with $1, i.e. the contents of the first capture group, i.e. the abcd in the sample input.
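The same substitution can be tried in a small script. This is only a sketch: it uses braces as the delimiter instead of =, and the two sample lines are the modified input shown above.

```perl
use strict;
use warnings;

# Sample lines taken from the modified input above.
my @lines = (
    'abcd111/first/second/third/fourth.cX1111111...etc...',
    'abcd111/afirst/asecond/athird/afourth.cX1111111...etc...etc',
);

# Apply the substitution to a copy of each line and keep the result.
my @kept = map {
    (my $copy = $_) =~ s{([^/]+).../.*\.c}{$1};
    $copy;
} @lines;

print "$_\n" for @kept;
```

Because .* is greedy, the match runs to the last ".c" on the line, so this only behaves as intended when no later ".c" appears in the part that should be kept.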
I have a big file with repeated lines as follows:
#UUSM
ABCDEADARFA
+------qqq
!2wqeqs6777
I would like to output all the 'second lines' in the file. I have the following code snippet for doing this, but it's not working as expected: lines 1, 3 and 4 are in the output instead.
open(IN,"<", "file1.txt") || die "cannot open input file:$!";
while (<IN>) {
$line = $line . $_;
if ($line =~ /^\#/) {
<IN>;
#next;
my $line = $line;
}
}
print "$line";
Please help!
try this
open(IN,"<", "file1.txt") || die "cannot open input file:$!";
my $lines = "";
while (<IN>) {
$lines .= $_ if $. % 4 == 2;
}
print "$lines";
I assume what you are asking is how to print the line that comes after a line that begins with #:
perl -ne 'if (/^\#/) { print scalar <> }' file1.txt
This says, "If the line begins with #, then print the next line. Do this for all the files in the argument list." The scalar function is used here to impose a scalar context on the file handle, so that it does not print the whole file. By default print has a list context for its arguments.
If you actually want to print the second line in the file, well, that's even easier. Here's a few examples:
Using the line number $. variable, printing if it equals line number 2.
perl -ne '$. == 2 and print, close ARGV' yourfile.txt
Note that if you have multiple files, you must close the ARGV file handle to reset the counter $.. Note also the use of the lower precedence operator and will force print and close to both be bound to the conditional.
Using regular logic.
perl -ne 'print scalar <>; close ARGV;'
perl -pe '$_ = <>; close ARGV;'
Both of these use a short-circuit feature by closing the ARGV file handle when the second line is printed. If you should want to print every other line of a file, both these will do that if you remove the close statements.
perl -ne '$at = $. if /^\#/; print if $. - 1 == $at' file1.txt
Written out longhand, the above is equivalent to
open my $fh, "<", "file1.txt";
my $at_line = 0;
while (<$fh>) {
if (/^\#/) {
$at_line = $.;
}
else {
print if $. - 1 == $at_line;
}
}
If you want lines 2, 6, 10 printed, then:
while (<>)
{
print if $. % 4 == 2;
}
Where $. is the current line number — and I didn't spend the time opening and closing the file. That might be:
{
my $file = "file1.txt";
open my $in, "<", $file or die "cannot open input file $file: $!";
while (<$in>)
{
print if $. % 4 == 2;
}
}
This uses the modern preferred form of file handle (a lexical file handle), and the braces around the construct mean the file handle is closed automatically. The name of the file that couldn't be opened is included in the error message; the or operator is used so the precedence is correct (the parentheses and || in the original were fine too and could be used here, but conventionally are not).
If you want the line after a line starting with # printed, you have to organize things differently.
my $print_next = 0;
while (<>)
{
if ($print_next)
{
print $_;
$print_next = 0;
}
elsif (m/^#/)
{
$print_next = 1;
}
}
Dissecting the code in the question
The original version of the code in the question was (line numbers added for convenience):
1 open(IN,"<", "file1.txt") || die "cannot open input file:$!";
2 while (<IN>) {
3 $line = $line . $_;
4 if ($line =~ /^\#/) {
5 <IN>;
6 #next;
7 my $line = $line;
8 }
9 }
10 print "$line";
Discussion of each line:
1. OK, though it doesn't use a lexical file handle or report which file could not be opened.
2. OK.
3. Premature and misguided. This adds the current line to the variable $line before any analysis is done. If it was desirable, it could be written $line .= $_;
4. Suggests that the correct description for the desired output is not 'the second lines' but 'the line after a line starting with #'. Note that since there is no multi-line modifier on the regex, this will always match only the first line segment in the variable $line. Because of the premature concatenation, it will match on each line (because the first line of data starts with #), executing the code in lines 5-8.
5. Reads another line and discards it (a bare <IN>; outside a while condition does not assign to $_). It doesn't test for EOF, but that's harmless.
6. Comment line; no significance except to suggest some confusion.
7. my $line = $line; is a self-assignment to a new variable hiding the outer $line...mainly, this is weird and to a lesser extent it is a no-op. You are not using use strict; and use warnings; because you would have warnings if you did. Perl experts use use strict; and use warnings; to make sure they haven't made silly mistakes; novices should use them for the same reason.
8. Of itself, OK. However, the code in the condition has not really done very much. It skips the second line in the file; it will later skip the fourth, the sixth, the eighth, etc.
9. OK.
10. OK, but...if you're only interested in printing the lines after the line starting #, or only interested in printing the line numbers 4N+2 for integral N, then there is no need to build up the entire string in memory before printing each line. It will be simpler to print each line that needs printing as it is found.
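Putting these points together, a corrected version of the question's loop might look like the sketch below. It assumes the goal is "print the line after each line starting with #"; the in-memory string stands in for file1.txt, and the $printed counter is added only for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for file1.txt so the sketch is self-contained.
my $data = "#UUSM\nABCDEADARFA\n+------qqq\n!2wqeqs6777\n" x 2;
open my $in, '<', \$data or die "cannot open in-memory data: $!";

my $printed = 0;
while (my $line = <$in>) {
    next unless $line =~ /^#/;    # only header lines trigger a print
    my $next = <$in>;             # read the line that follows the header
    last unless defined $next;    # the header was the last line of input
    print $next;                  # print as found, nothing accumulated
    $printed++;
}
close $in;
```

Each wanted line is printed as soon as it is read, so nothing is built up in memory.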
Earlier I was working on a loop within a loop where, if a match was made, it would replace the entire string with one from the second loop's file. Now I have a slightly different situation: I'm trying to replace a substring from the first loop with a string from the second loop. Both are semicolon-delimited CSV files. What I'm trying to replace are special characters, converting each numerical code to the character itself. The first file looks like:
1;2;bla&#322;blabla &#261;bla;7;8
3;4;bl&#261;blabla;9;10
2;3;blablabla&#261;a&#322;8;9
and the second file has the numerical code and the corresponding character:
&#260;;Ą
&#261;;ą
&#478;;Ǟ
&#193;;Á
&#225;;á
&#194;;Â
&#322;;ł
The first semicolon in the second file belongs to the numerical code of the corresponding character and should not be used to split the file. The result should be:
1;2;blałblabla ąbla;7;8
3;4;bląblabla;9;10
2;3;blablablaąał;8;9
This is the code I have. How can i fix this?
use strict;
use warnings;
my $inputfile1 = shift || die "input/output!\n";
my $inputfile2 = shift || die "input/output!\n";
my $outputfile = shift || die "output!\n";
open my $INFILE1, '<', $inputfile1 or die "Used/Not found :$!\n";
open my $INFILE2, '<', $inputfile2 or die "Used/Not found :$!\n";
open my $OUTFILE, '>', $outputfile or die "Used/Not found :$!\n";
my $infile2_pos = tell $INFILE2;
while (<$INFILE1>) {
s/"//g;
my @elements = split /;/, $_;
seek $INFILE2, $infile2_pos, 0;
while (<$INFILE2>) {
s/"//g;
my @loopelements = split /;/, $_;
#### The problem part ####
if (($elements[2] =~ /\&\#\d{3}\;/g) and (($elements[2]) eq ($loopelements[0]))){
$elements[2] =~ s/(\&\#\d{3}\;)/$loopelements[1]/g;
print "$2. elements[2]\n";
}
#### End problem part #####
}
my $output_line = join(";", @elements);
print $OUTFILE $output_line;
#print "\n"
}
close $INFILE1;
close $INFILE2;
close $OUTFILE;
exit 0;
Assuming your character codes are standard Unicode entities, you are better off using HTML::Entities to decode them.
This program processes the data you show in your first file and ignores the second file completely. The output seems to be what you want.
use strict;
use warnings;
use HTML::Entities 'decode_entities';
binmode STDOUT, ":utf8";
while (<DATA>) {
print decode_entities($_);
}
__DATA__
1;2;bla&#322;blabla &#261;bla;7;8
3;4;bl&#261;blabla;9;10
2;3;blablabla&#261;a&#322;8;9
output
1;2;blałblabla ąbla;7;8
3;4;bląblabla;9;10
2;3;blablablaąał8;9
You split each line into @elements at every occurrence of ;, which removes the semicolons. Since they are no longer in your data, the semicolon in your regexp can never match, so no substitutions are done.
Anyway, using seek is somewhat disturbing for me. As you have a reasonable number of replacement codes (<5000), you might consider putting them into a hash:
my %subst;
while(<$INFILE2>){
/^&#(\d{3});;(.*)\n/;
$subst{$1} = $2;
}
Then we can do:
while(<$INFILE1>){
s| &\# (\d{3}) | $subst{$1} // "&#$1" |egx;
# (under /x an unescaped # starts a comment, so it is written \# in the pattern;
# the // fallback avoids substituting undef when no replacement is defined for a code)
print $OUTFILE $_;
}
We do not have to split the files or view them as CSV data if replacement should occur everywhere in INFILE1. My solution should speed things up a bit (parsing INFILE2 only once). Here I assumed your input data is correct and the number codes are not terminated by a semicolon but by length. You might want to remove the semicolon from your regexes (i.e. use m/&#\d{3}/).
If you have trouble with character encodings, you might want to open your files with :utf8 and/or use Encode or similar.
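A self-contained sketch of the hash approach. Unlike the answer above, it assumes each code is terminated by a semicolon, as in the sample data. The lookup lines, the input line and the unknown code &#999; are all made up for illustration.

```perl
use strict;
use warnings;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical lookup data in the "code;;character" layout of the second file.
my @table = ("&#261;;\x{105}\n", "&#322;;\x{142}\n");

my %subst;
for (@table) {
    # Only store a pair when the line really has the expected shape.
    $subst{$1} = $2 if /^&#(\d{3});;(.*)$/;
}

# Hypothetical input line with one known and one unknown code.
my $line = "3;4;bl&#261;blabla&#999;;9;10\n";

# Replace known codes, including their closing semicolon;
# unknown codes are put back unchanged.
$line =~ s{&#(\d{3});}{ $subst{$1} // "&#$1;" }ge;

print $line;
```

The lookup file is parsed once into %subst, and every line of the first file then needs only a single substitution pass.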
I need to edit a file; the main issue is appending text between two known lines in the file.
for example I need to append the following text
a b c d e f
1 2 3 4 5 6
bla bla
Between the first_line and the second_line
first_line=")"
second_line="NIC Hr_Nic ("
remark: first_line and second_line argument can get any line or string
How can I do this in Perl? (I am writing a bash script and need to embed the Perl code in it.)
lidia
You could read the file in as a single string and then use a regular expression to do the search and replace:
use strict;
use warnings;
# Slurp file myfile.txt into a single string
open(FILE,"myfile.txt") || die "Can't open file: $!";
undef $/;
my $file = <FILE>;
# Set strings to find and insert
my $first_line = ")";
my $second_line = "NIC Hr_Nic (";
my $insert = "hello world";
# Insert our text
$file =~ s/\Q$first_line\E\n\Q$second_line\E/$first_line\n$insert\n$second_line/;
# Write output to output.txt
open(OUTPUT,">output.txt") || die "Can't open file: $!";
print OUTPUT $file;
close(OUTPUT);
By unsetting $/ we put Perl into "slurp mode" and so can easily read the whole file into $file.
We use the s/// operator to do a search and replace using our two search lines as a pattern.
The \Q and \E tell Perl to escape the strings between them, i.e. to ignore any special characters that happen to be in $first_line or $second_line.
You could always write the output over the original file if desired.
The problem as you state it is not solvable using the -i command line option since this option processes a file one line at a time; to insert text between two specific lines you'll need to know about two lines at once.
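The need to know about two lines at once can also be met while reading line by line, by remembering the previous line. The following is only a sketch under that assumption, with made-up file content held in an array instead of a real file:

```perl
use strict;
use warnings;

# Hypothetical file content and insert text, for illustration only.
my @input       = ("foo\n", ")\n", "NIC Hr_Nic (\n", "bar\n");
my $first_line  = ")";
my $second_line = "NIC Hr_Nic (";
my $insert      = "a b c d e f\n1 2 3 4 5 6\nbla bla\n";

my @output;
my $previous = '';
for my $line (@input) {
    # Compare without the trailing newline, but keep the original line.
    (my $current = $line) =~ s/\n\z//;

    # Insert only when the two known lines are directly adjacent.
    push @output, $insert
        if $previous eq $first_line && $current eq $second_line;

    push @output, $line;
    $previous = $current;
}

print @output;
```

This compares whole lines with eq, so no regex escaping is needed, but it matches only when the known strings make up the entire line.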
Well, to concatenate strings you do
my $text = $first_line . $second_line;
or
my $text = $first_line;
$text .= $second_line;
I'm not sure if I understand your question correctly. A "before and after" example of the file content would, I think, be easier. Anyhow, Here's my take on it, using splice instead of a regular expression. We must of course know the line numbers for this to work.
Load the file into an array:
my @lines;
open F, '<', 'filename' or die $!;
push @lines, $_ for <F>;
close F;
Insert the stuff (see perldoc -f splice):
splice @lines, 1, 0, ('stuff');
..and you're done. All you need to do now is save the array again:
open F, '>', 'filename' or die $!;
print F @lines;
close F;
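To see what the splice call does on its own, here is a small sketch with made-up array contents standing in for the file's lines:

```perl
use strict;
use warnings;

# Hypothetical file content held in memory.
my @lines = ("first\n", "second\n", "third\n");

# Offset 1, length 0: remove nothing, insert before the element at index 1.
splice @lines, 1, 0, "stuff\n";

print @lines;
```

With a length of 0, splice removes nothing, so the new element ends up between the original first and second lines.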
Here is what I am trying to do:
I want to read a text file into an array of strings. I want the string to terminate when the file reads in a certain character (mainly ; or |).
For example, the following text
Would you; please
hand me| my coat?
would be put away like this:
$string[0] = 'Would you;';
$string[1] = ' please hand me|';
$string[2] = ' my coat?';
Could I get some help on something like this?
This will do it. The trick to using split while preserving the token you're splitting on is to use a zero-width lookbehind match: split(/(?<=[;|])/, ...).
Note: mctylr's answer (currently the top rated) isn't actually correct -- it will split fields on newlines, b/c it only works on a single line of the file at a time.
gbacon's answer using the input record separator ($/) is quite clever--it's both space and time efficient--but I don't think I'd want to see it in production code. Putting one split token in the record separator and the other in the split strikes me as a little too unobvious (you have to fight that with Perl ...) which will make it hard to maintain. I'm also not sure why he's deleting multiple newlines (which I don't think you asked for?) and why he's doing that only for the end of '|'-terminated records.
# open file for reading, die with error message if it fails
open(my $fh, '<', 'data.txt') || die $!;
# set file reading to slurp (whole file) mode (note that this affects all
# file reads in this block)
local $/ = undef;
my $string = <$fh>;
# convert all newlines into spaces, not specified but as per example output
$string =~ s/\n/ /g;
# split string on ; or |, using a zero-width lookbehind (?<=) to preserve char
my (@strings) = split(/(?<=[;|])/, $string);
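For a quick check without a data file, the same steps can be run on the question's sample text held in a string. This is a sketch, and the bracketed output format is only for illustration:

```perl
use strict;
use warnings;

# Sample text from the question, as one string with an embedded newline.
my $string = "Would you; please\nhand me| my coat?";

# Newlines become spaces, matching the desired output in the question.
$string =~ s/\n/ /g;

# Split at each position that is preceded by ; or |, keeping the separator.
my @strings = split /(?<=[;|])/, $string;

print "[$_]\n" for @strings;
```

The lookbehind matches a position rather than a character, so split consumes nothing and each separator stays at the end of its piece.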
One way is to inject another character, like \n, whenever your special character is found, then split on the \n:
use warnings;
use strict;
use Data::Dumper;
while (<DATA>) {
chomp;
s/([;|])/$1\n/g;
my @string = split /\n/;
print Dumper(\@string);
}
__DATA__
Would you; please hand me| my coat?
Prints out:
$VAR1 = [
'Would you;',
' please hand me|',
' my coat?'
];
UPDATE: The original question posed by James showed the input text on a single line, as shown in __DATA__ above. Because the question was poorly formatted, others edited the question, breaking the 1 line into 2. Only James knows whether 1 or 2 lines was intended.
I prefer @toolic's answer because it deals with multiple separators very easily.
However, if you wanted to overly complicate things, you could always try:
#!/usr/bin/perl
use strict; use warnings;
my @contents = ('');
while ( my $line = <DATA> ) {
last unless $line =~ /\S/;
$line =~ s{$/}{ };
if ( $line =~ /^([^|;]+[|;])(.+)$/ ) {
$contents[-1] .= $1;
push @contents, $2;
}
else {
# no separator on this line: append the whole line
$contents[-1] .= $line;
}
}
print "[$_]\n" for @contents;
__DATA__
Would you; please
hand me| my coat?
Something along the lines of
$text = <INPUTFILE>;
@string = split(/[;!]/, $text);
should do the trick more or less.
Edit: I've changed "/;!/" to "/[;!]/".
Let Perl do half the work for you by setting $/ (the input record separator) to vertical bar, and then extract semicolon-separated fields:
#!/usr/bin/perl
use warnings;
use strict;
my @string;
*ARGV = *DATA;
$/ = "|";
while (<>) {
s/\n+$//;
s/\n/ /g;
push @string => $1 while s/^(.*;)//;
push @string => $_;
}
for (my $i = 0; $i < @string; ++$i) {
print "\$string[$i] = '$string[$i]';\n";
}
__DATA__
Would you; please
hand me| my coat?
Output:
$string[0] = 'Would you;';
$string[1] = ' please hand me|';
$string[2] = ' my coat?';