Perl LWP::Simple File.txt in Array not spaces - perl

The file does not have spaces, and I need to keep each field in its own array element. The content goes into a variable; the real file is larger, but this sample is enough.
my $file = "http://www.ausa.com.ar/autopista/carteleria/plano/mime.txt";
&VPM4362=008000&VPM4381=FFFFFF&VPM4372=FFFFFF&VPM4391=008000&VPM4382=FFFF00&VPM4392=FF0000&VPM4182=FFFFFF&VPM4181=FFFF00&VPM4402=FFFFFF&VPM4401=FFFF00&VPM4412=008000&VPM4411=FF0000&VPM4422=FFFFFF&VPM4421=FFFFFF&VPM4322=FFFF00&CPMV001_1_Ico=112&CPMV001_1_1=AHORRE 15%&CPMV001_1_2=ADHIERASE AUPASS&CPMV001_1_3=AUPASS.COM.AR&CPMV002_1_Ico=0&CPMV002_1_1=ATENCION&CPMV002_1_2=RADARES&CPMV002_1_3=OPERANDO&CPMV003_1_Ico=0&CPMV003_1_1=ATENCION&CPMV003_1_2=RADARES&CPMV003_1_3=OPERANDO&CPMV004_1_Ico=255&CPMV004_1_1= &CPMV004_1_2=&CPMV004_1_3=&CPMV05 _1_Ico=0&CPMV05 _1_1=ATENCION&CPMV05 _1_2=RADARES&CPMV05 _1_3=OPERANDO&CPMV006_1_Ico=0&CPMV006_1_1=ATENCION&CPMV006_1_2=RADARES&CPMV006_1_3=OPERANDO&CPMV007_1_Ico=0&CPMV007_1_1=ATENCION&CPMV007_1_2=RADARES&CPMV007_1_3=OPERANDO&CPMV08 _1_Ico=0&CPMV08 _1_1=ATENCION&CPMV08
The code:
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
my $file = "http://www.ausa.com.ar/autopista/carteleria/plano/mime.txt";
my $mime = get($file);
my @new;
foreach my $line ($mime) {
$line =~ s/&/ /g;
push(@new, $line);
}
print "$new[0]\n";
I tried it this way, but when I print the array the whole string ends up in a single element (all together).
The output I need:
print "$new[1]\n";
VPM4381=FFFFFF

You don't want to substitute on &, you want to split on &.
@new = split /&/, $line;
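Putting that together with the original fetch, a minimal sketch might look like this (same URL as in the question; the foreach is unnecessary because $mime is a single string, get() returns undef on failure, and a leading '&' in the data would leave an empty first element):
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;

my $file = "http://www.ausa.com.ar/autopista/carteleria/plano/mime.txt";
my $mime = get($file);
die "Could not fetch $file" unless defined $mime;

# One array element per &-separated field
my @new = split /&/, $mime;
print "$new[1]\n";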

Parsing string in multiline data with positive lookbehind

I am trying to parse data like:
header1
-------
var1 0
var2 5
var3 9
var6 1
header2
-------
var1 -3
var3 5
var5 0
Now I want to get e.g. var3 for header2. What's the best way to do this?
So far I was parsing my files line-by-line via
open(FILE,"< $file");
while (my $line = <FILE>){
# do stuff
}
but I guess it's not possible to handle multiline parsing properly that way.
Now I am thinking of parsing the whole file at once, but I haven't been successful so far...
my @Input;
open(FILE,"< $file");
while (<FILE>){ @Input = <FILE>; }
if (@Input =~ /header2/){
#...
}
The easier way to handle this is "paragraph mode".
local $/ = "";
while (<>) {
my ($header, $body) = /^([^\n]*)\n-+\n(.*)/s
or die("Bad data");
my @data = map [ split ], split /\n/, $body;
# ... Do something with $header and @data ...
}
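To answer the original question (var3 under header2) with that structure, you might add something like the following inside the loop once $header and @data are known (a sketch reusing the snippet's own variable names):
if ($header eq 'header2') {
    # @data is a list of [name, value] pairs, so turn it into a hash
    my %vars = map { $_->[0] => $_->[1] } @data;
    print "var3 = $vars{var3}\n" if exists $vars{var3};
}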
The same can be achieved without messing with $/ as follows:
my @buf;
while (1) {
my $line = <>;
$line =~ s/\s+\z// if defined($line);
if (!length($line)) {
if (@buf) {
my $header = shift(@buf);
shift(@buf);
my @data = map [ split ], splice(@buf);
# ... Do something with $header and @data ...
}
last if !defined($line);
next;
}
push @buf, $line;
}
(In fact, the second snippet includes a couple of small improvements over the first.)
Quick comments on your attempt:
The while loop is useless because @Input = <FILE> places the remaining lines of the file in @Input.
@Input =~ /header2/ matches header2 against the stringification of the array, which is the stringification of the number of elements in @Input. If you want to check whether an element of @Input contains header2, you will need to loop over the elements of @Input and check them individually.
while (<FILE>){ @Input = <FILE>; }
This doesn't make much sense. "While you can read a record from FILE, read all of the data on FILE into #Input". I think what you actually want is just:
my @Input = <FILE>;
if (@Input =~ /header2/){
This is quite strange too. The binding operator (=~) expects scalar operands, so it evaluates both operands in scalar context. That means @Input will be evaluated as the number of elements in @Input. That's an integer and will never match "header2".
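If you do want to test the lines in @Input, grep does what the binding operator cannot, because it examines each element (a small sketch, assuming @Input already holds the file's lines):
# True if at least one line in @Input mentions header2
if (grep { /header2/ } @Input) {
    # ... handle the header2 section ...
}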
A couple of approaches. Firstly a regex approach.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my $file = 'file';
open my $fh, '<', $file or die $!;
my $data = join '', <$fh>;
if ($data =~ /header2.+var3 (.+?)\n/s) {
say $1;
} else {
say 'Not found';
}
The key to this is the /s on the m// operator. Without it, the two dots in the regex won't match newlines.
The other approach is more of a line by line parser.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my $file = 'file';
open my $fh, '<', $file or die $!;
my $section = '';
while (<$fh>) {
chomp;
# if the line is all word characters,
# then we've got a section header.
if ($_ !~ /\W/) {
$section = $_;
next;
}
my ($key, $val) = split;
if ($section eq 'header2' and $key eq 'var3') {
say $val;
last;
}
}
We read the file a line at a time and make a note of the section headers. For data lines, we split on whitespace and check to see if we're in the right section and have the right key.
In both cases, I've switched to a more standard approach for opening the file (a lexical filehandle, three-argument open(), and error checking with or die $!).

Check how many "," in each line in Perl [duplicate]

This question already has answers here:
Counting number of occurrences of a string inside another (Perl)
I have to check how many times "," appears in each line of a file. Does anybody have an idea how I can do it in Perl?
At the moment my code looks like this:
open($list, "<", $student_list)
while ($linelist = <$list>)
{
printf("$linelist");
}
close($list);
But I have no idea how to check how many times "," appears in each $linelist :/
Use the transliteration operator in counting mode:
my $commas = $linelist =~ y/,//;
Edited into your code:
use warnings;
use strict;
open my $list, "<", "file.csv" or die $!;
while (my $linelist = <$list>)
{
my $commas = $linelist =~ y/,//;
print "$commas\n";
}
close($list);
If you just want to count the number of somethings in a file, you don't need to read it into memory. Since you aren't changing the file, mmap would be just fine:
use File::Map qw(map_file);
map_file my $map, $filename, '<';
my $count = $map =~ tr/,//;
#! perl
# perl script.pl [file path]
use strict;
use warnings;
my $file = shift or die "No file name provided";
open(my $IN, "<", $file) or die "Couldn't open file $file: $!";
my @matches = ();
my $index = 0;
# while <$IN> will get the file one line at a time rather than loading it all into memory
while(<$IN>){
my $line = $_;
my $current_count = 0;
# match globally, meaning keep track of where the last match was
$current_count++ while($line =~ m/,/g);
$matches[$index] = $current_count;
$index++;
}
$index = 0;
for(@matches){
$index++;
print "line $index had $_ matches\n"
}
You can use the mmap Perl IO layer instead of File::Map. It is almost as efficient as the former, and it is most probably already present in your Perl installation, so you don't need to install a module. Also, using y/// is more efficient than m//g in list context.
use strict;
use warnings;
use autodie;
use constant STUDENT_LIST => 'text.txt';
open my $list, '<:mmap', STUDENT_LIST;
while ( my $line = <$list> ) {
my $count = $line =~ y/,//;
print "There is $count commas at $.. line.\n";
}
If you would like grammatically correct output, you can use Lingua::EN::Inflect in the right place:
use Lingua::EN::Inflect qw(inflect);
print inflect "There PL_V(is,$count) $count PL_N(comma,$count) at ORD($.) line.\n";
Example output:
There are 7 commas at 1st line.
There are 0 commas at 2nd line.
There is 1 comma at 3rd line.
There are 2 commas at 4th line.
There are 7 commas at 5th line.
Do you want the number of commas for each line in the file, or the number of commas in the entire file?
On a per-line basis, replace your while loop with:
my @data = <$list>;
foreach my $line (@data) {
my @chars = split //, $line;
my $count = 0;
foreach my $c (@chars) { $count++ if $c eq "," }
print "There were $count commas\n";
}

Perl - passing an array to subroutine

I'm in the process of learning Perl and am trying to write a script that takes a pattern and list of files as command line arguments and passes them to a subroutine, the subroutine then opens each file and prints the lines that match the pattern. The code below works; however, it stops after printing the lines from the first file and doesn't even touch the second file. What am I missing here?
#!/usr/bin/perl
use strict;
use warnings;
sub grep_file
{
my $pattern = shift;
my @files = shift;
foreach my $doc (@files)
{
open FILE, $doc;
while (my $line = <FILE>)
{
if ($line =~ m/$pattern/)
{
print $line;
}
}
}
}
grep_file(@ARGV);
shift removes a single element from the front of your parameter list @_ (see: http://perldoc.perl.org/functions/shift.html).
So @files can only contain one value.
Try
sub foo
{
my $one = shift @_;
my @files = @_;
print $one."\n";
print @files;
}
foo(@ARGV);
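Applied to the original grep_file, the fix might look like the sketch below; only the argument handling really changes, though it also uses a lexical filehandle and checks open for errors:
sub grep_file {
    my $pattern = shift;   # first argument: the pattern
    my @files   = @_;      # everything left over: the file names
    foreach my $doc (@files) {
        open my $fh, '<', $doc or do { warn "Can't open $doc: $!"; next };
        while (my $line = <$fh>) {
            print $line if $line =~ /$pattern/;
        }
        close $fh;
    }
}
grep_file(@ARGV);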
There is little reason to use a subroutine here. You are just putting the whole program inside a function and then calling it.
The empty <> operator will read from all the files in @ARGV in sequence, without you having to open them explicitly.
I would code your program like this
use strict;
use warnings;
my $pattern = shift;
$pattern = qr/$pattern/; # Compile the regex
while (<>) {
print if $_ =~ $pattern;
}

Perl Remove Stop Words from multiple files

I have read so many forums on how to remove stop words from files. My code removes many other things, but I also want it to remove stop words. This is how far I have got, but I don't know what I am missing. Please advise.
use Lingua::StopWords qw(getStopWords);
my $stopwords = getStopWords('en');
chdir("c:/perl/input");
@files = <*>;
foreach $file (@files)
{
open (input, $file);
while (<input>)
{
open (output,">>c:/perl/normalized/".$file);
chomp;
#####What should I write here to remove the stop words#####
$_ =~s/<[^>]*>//g;
$_ =~ s/\s\.//g;
$_ =~ s/[[:punct:]]\.//g;
if($_ =~ m/(\w{4,})\./)
{
$_ =~ s/\.//g;
}
$_ =~ s/^\.//g;
$_ =~ s/,/' '/g;
$_ =~ s/\(||\)||\\||\/||-||\'//g;
print output "$_\n";
}
}
close (input);
close (output);
The stop words are the keys of %$stopwords which have the value 1, i.e.:
my @stopwords = grep { $stopwords->{$_} } (keys %$stopwords);
It might happen to be true that the stop words are just the keys of %$stopwords, but according to the Lingua::StopWords docs you also need to check the value associated with the key.
Once you have the stop words, you can remove them with code like this:
# remove all occurrences of @stopwords from $_
for my $w (@stopwords) {
s/\b\Q$w\E\b//ig;
}
Note the use of \Q...\E to quote any regular expression meta-characters that might appear in the stop word. Even though it is very unlikely that stop words will contain meta-characters, this is a good practice to follow any time you want to represent a literal string in a regular expression.
We also use \b to match a word boundary. This helps ensure that we won't remove a stop word that occurs in the middle of another word. Hopefully this will work for you - it depends a lot on what your input text is like, i.e. whether you have punctuation characters, etc.
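If you would rather make a single pass over the text than run one substitution per stop word, you could join the quoted words into one alternation (a sketch, assuming @stopwords was built as shown above):
# One regex that matches any stop word as a whole word
my $stop_re = join '|', map { quotemeta } @stopwords;
$_ =~ s/\b(?:$stop_re)\b//gi;   # strip every stop word in one substitution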
# Always use these in your Perl programs.
use strict;
use warnings;
use File::Basename qw(basename);
use Lingua::StopWords qw(getStopWords);
# It's often better to build scripts that take their input
# and output locations as command-line arguments rather than
# being hard-coded in the program.
my $input_dir = shift @ARGV;
my $output_dir = shift @ARGV;
my @input_files = glob "$input_dir/*";
# Convert the hash ref of stop words to a regular array.
# Also quote any regex characters in the stop words.
my @stop_words = map quotemeta, keys %{getStopWords('en')};
for my $infile (@input_files){
# Open both input and output files at the outset.
# Your posted code reopened the output file for each line of input.
my $fname = basename $infile;
my $outfile = "$output_dir/$fname";
open(my $fh_in, '<', $infile) or die "$!: $infile";
open(my $fh_out, '>', $outfile) or die "$!: $outfile";
# Process the data: you need to iterate over all stop words
# for each line of input.
while (my $line = <$fh_in>){
$line =~ s/\b$_\b//ig for @stop_words;
print $fh_out $line;
}
# Close the files within the processing loop, not outside of it.
close $fh_in;
close $fh_out;
}

How can I append characters to a line in a file?

I have a CSV file that was extracted from a ticketing system (which I have no direct DB access to) and need to append a couple of columns to it from another database before creating reports from it in Excel.
I'm using Perl to pull data out of the other database and would like to just append the additional columns to the end of each line as I process the file.
Is there a way to do this without having to basically create a new file? The basic structure is:
foreach $line (@lines) {
my ($vars here....) = split (',',$line);
## get additional fields
## append new column data to line
}
You could look at DBD::CSV to treat the file as if it were a database (which would also handle escaping special characters for you).
You can use Tie::File (in the Perl core since Perl 5.8) to modify a file in place:
#!/usr/bin/perl
use strict;
use warnings;
use Tie::File;
my $file = shift;
tie my @lines, "Tie::File", $file
or die "could not open $file: $!\n";
for my $line (@lines) {
$line .= join ",", '', get_data();
}
sub get_data {
my $data = <DATA>;
chomp $data;
return split /-/, $data
}
__DATA__
1-2-3-4
5-6-7-8
You can also use in-place-editing with the @ARGV/<> trick by setting $^I:
#!/usr/bin/perl
use strict;
use warnings;
$^I = ".bak";
while (my $line = <>) {
chomp $line;
$line .= join ",", '', get_data();
print "$line\n";
}
sub get_data {
my $data = <DATA>;
chomp $data;
return split /-/, $data
}
__DATA__
1-2-3-4
5-6-7-8
Despite any nice interfaces, you eventually have to read the file line by line. You might even have to do more than that if some quoted fields can have embedded newlines. Use something that knows about CSV to avoid some of those problems. Text::CSV_XS should save you most of the hassle of the odd cases.
Consider using the -i option to edit <> files in-place.
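For instance, a minimal Text::CSV_XS sketch might look like this; the file names and the get_extra_fields() helper are placeholders for your own ticket data and database lookup:
use strict;
use warnings;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1, eol => "\n" });

open my $in,  '<', 'tickets.csv'          or die "tickets.csv: $!";
open my $out, '>', 'tickets_extended.csv' or die "tickets_extended.csv: $!";

while (my $row = $csv->getline($in)) {
    # Append the extra columns fetched from the other database
    push @$row, get_extra_fields($row->[0]);
    $csv->print($out, $row);
}

close $in;
close $out;

# Placeholder for the real database lookup
sub get_extra_fields {
    my ($ticket_id) = @_;
    return ('extra1', 'extra2');
}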