How do I modify the second column of a CSV file based on the first column? - perl

I'm new to Perl and I have a CSV file that contains e-mails and names, like this:
john@domain1.com;John
Paul@domain2.com;
Richard@domain3.com;Richard
Rob@domain4.com;
Andrew@domain5.com;Andrew
However, as you can see, a few entries/lines have the e-mail address and the ; field separator but lack the name. I need to read the file line by line, and if the name field is missing, I want to print in its place the beginning of the e-mail address, up to @domainX.com. Output example:
john@domain1.com;John
Paul@domain2.com;Paul
Richard@domain3.com;Richard
Rob@domain4.com;Rob
Andrew@domain5.com;Andrew
I'm new to Perl. I wrote the loop that reads the file line by line, like this:
#!/usr/bin/perl
use warnings;
use strict;
open (MYFILE, 'test.txt');
while (<MYFILE>) {
chomp;
}
But I'm failing to parse the entries using ; as the separator, to check whether the name field is missing, and in that case to print the beginning of the e-mail address without the domain.
Can someone please give me an example based on my code?

First, if the file may contain real CSV data (or semicolon-separated data, as in your case) with things like quoted fields, I'd strongly recommend using a dedicated Perl module such as Text::CSV to parse it.
Otherwise, a quick-and-dirty example can be:
#!/usr/bin/perl
use warnings;
use strict;
# In modern Perl, please always use the 3-arg form of open and lexical filehandles.
# They are more robust. Use 'or' (not '||') so that die actually runs if open fails.
open my $fh, '<', 'test.txt' or die "Cannot open: $!\n";
while (<$fh>) {
    chomp;
    my ($email, $name) = split(/;/, $_);
    if (!$name) {
        my ($userid, $domain) = split(/\@/, $email);
        $name = $userid;
    }
    print "$email;$name\n"; # Print to STDOUT for simplicity of example
}
close($fh);

Try:
#!/usr/bin/env perl
use strict;
use warnings;
for my $file ( @ARGV ){
    open my $in_fh, '<', $file or die "could not open $file: $!\n";
    while( my $line = <$in_fh> ){
        chomp( $line );
        my ( $email, $name ) = split m{ \; }msx, $line;
        if( ! ( defined $name && length( $name ) > 0 ) ){
            ( $name ) = split m{ \@ }msx, $email;
            $name = ucfirst( lc( $name ));
        }
        print "$email;$name\n";
    }
}
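If you prefer a regex, the same per-line fix can be written as a single substitution that only fires when nothing follows the `;`. This is a minimal sketch, not taken from either answer; the sample lines are copied from the question and held in an array instead of being read from `test.txt`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample lines from the question; in a real script these would come from <$fh>.
my @lines = ('john@domain1.com;John', 'Paul@domain2.com;', 'Rob@domain4.com;');

my @fixed;
for my $line (@lines) {
    my $copy = $line;
    # $1 = user part before '@', $2 = '@domain...'.
    # The pattern requires the line to end right after ';', so lines
    # that already have a name are left unchanged.
    $copy =~ s/^([^\@;]+)(\@[^;]*);\s*$/$1$2;$1/;
    push @fixed, $copy;
}
print "$_\n" for @fixed;
```

Unlike the second answer, this keeps the user part exactly as written (no `ucfirst(lc(...))`).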

I am not a Perl programmer, but I would first split on the space character, then iterate through the results and split each one on the semicolon. Then you can check the second element of the semicolon-split array and, if it is empty, replace it with the beginning of the first element (the part before the @). Then just reverse the process: join first with semicolons and then with spaces.
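A minimal Perl sketch of the approach just described. It assumes, as this answer does, that all records arrive on one line separated by whitespace; the input string is a hypothetical sample built from the question's data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: all records on one line, separated by spaces.
my $input = 'john@domain1.com;John Paul@domain2.com; Richard@domain3.com;Richard';

my @records;
for my $record (split ' ', $input) {            # 1. split on whitespace
    my @fields = split /;/, $record, -1;        # 2. split on ';' (-1 keeps an empty trailing field)
    if (!defined $fields[1] || $fields[1] eq '') {
        ($fields[1]) = split /\@/, $fields[0];  # 3. fill in the part before '@'
    }
    push @records, join ';', @fields;           # 4. re-join with ';'
}
my $output = join ' ', @records;                 # 5. re-join with spaces
print "$output\n";
```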

Related

how to assign data into hash from an input file

I am new to perl.
Inside my input file is :
james1
84012345
aaron5
2332111 42332
2345112 18238
wayne[2]
3505554
Question: I am not sure what the correct way is to read the input and set the name as the key and the numbers as the values. For example, "james" is the key and "84012345" is the value.
This is my code:
#!/usr/bin/perl -w
use strict;
use warnings;
use Data::Dumper;
my $input= $ARGV[0];
my %hash;
open my $data , '<', $input or die " cannot open file : $_\n";
my @names = split ' ', $data;
my @values = split ' ', $data;
@hash{@names} = @values;
print Dumper \%hash;
I'mma go over your code real quick:
#!/usr/bin/perl -w
-w is not recommended. You should use warnings; instead (which you're already doing, so just remove -w).
use strict;
use warnings;
Very good.
use Data::Dumper;
my $input= $ARGV[0];
OK.
my %hash;
Don't declare variables before you need them. Declare them in the smallest scope possible, usually right before their first use.
open my $data , '<', $input or die " cannot open file : $_\n";
You have a spurious space at the beginning of your error message and $_ is unset at this point. You should include $input (the name of the file that failed to open) and $! (the error reason) instead.
my @names = split ' ', $data;
my @values = split ' ', $data;
Well, this doesn't make sense. $data is a filehandle, not a string. Even if it were a string, this code would assign the same list to both @names and @values.
@hash{@names} = @values;
print Dumper \%hash;
My version (untested):
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
@ARGV == 1
    or die "Usage: $0 FILE\n";
my $file = $ARGV[0];
my %hash;
{
    open my $fh, '<', $file or die "$0: can't open $file: $!\n";
    local $/ = '';
    while (my $paragraph = readline $fh) {
        my @words = split ' ', $paragraph;
        my $key = shift @words;
        $hash{$key} = \@words;
    }
}
print Dumper \%hash;
The idea is to set $/ (the input record separator) to "" for the duration of the input loop, which makes readline return whole paragraphs, not lines.
The first (whitespace separated) word of each paragraph is taken to be the key; the remaining words are the values.
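To see paragraph mode in action without a file on disk, the sketch below reads from an in-memory filehandle (`open` on a scalar reference). It assumes the records in the real file are separated by blank lines, which is what the expected output implies; the sample text is the question's data with those blank lines added.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Question's data, with blank lines assumed between records.
my $text = "james1\n84012345\n\naaron5\n2332111 42332\n2345112 18238\n\nwayne[2]\n3505554\n";

open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my %hash;
{
    local $/ = '';    # paragraph mode: a record ends at one or more blank lines
    while (my $paragraph = <$fh>) {
        my @words = split ' ', $paragraph;
        my $key   = shift @words;    # first word is the key
        $hash{$key} = \@words;       # remaining words are the values
    }
}
close $fh;

$Data::Dumper::Sortkeys = 1;    # stable key order in the dump
print Dumper \%hash;
```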
You have opened a file with open() and attached the file handle to $data. The regular way of reading data from a file is to loop over each line, like so:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
my $input = $ARGV[0];
my %hash;
open my $data , '<', $input or die "Cannot open file '$input': $!\n";
while (my $line = <$data>) {
    chomp $line;    # Removes the trailing newline (\n)
    if ($line) {    # Skips empty lines
        my ($key, $value) = split ' ', $line;
        $hash{$key} = $value;
    }
}
print Dumper \%hash;
OK, +1 for using strict and warnings.
First, take a look at the $/ variable, which controls how a file is broken into records when it's read in.
$data is a filehandle; you need to use it to extract the data from the file. If the file is not too big, you can load it all into an array; if it's a large file, you can loop over it one record at a time. See the <> operator in perlop.
Looking at your code, it appears that you want to end up with the following data structure from your input file:
%hash = (
    james1 => [
        84012345,
    ],
    aaron5 => [
        2332111,
        42332,
        2345112,
        18238,
    ],
    'wayne[2]' => [
        3505554,
    ],
);
See perldsc on how to do that.
All the documentation can be read using the perldoc command which comes with Perl. Running perldoc on its own will give you some tips on how to use it and running perldoc perldoc will give you possibly far more info than you need at the moment.
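If the blank lines between records are missing (as in the data shown in the question), a hash of arrays in the perldsc style can still be built line by line. This sketch assumes that any line starting with a non-digit begins a new key and that every following line of numbers belongs to that key; the sample lines are the question's data held in an array.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Question's data, one element per line (stand-in for reading the file).
my @lines = ("james1\n", "84012345\n", "aaron5\n", "2332111 42332\n",
             "2345112 18238\n", "wayne[2]\n", "3505554\n");

my %hash;
my $key;
for my $line (@lines) {
    chomp $line;
    next if $line =~ /^\s*$/;     # skip blank lines
    if ($line =~ /^\D/) {         # assumption: a name starts with a non-digit
        $key = $line;
        $hash{$key} = [];
    }
    elsif (defined $key) {
        # append every number on the line to the current key's array
        push @{ $hash{$key} }, split ' ', $line;
    }
}

$Data::Dumper::Sortkeys = 1;
print Dumper \%hash;
```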

My perl script isn't working, I have a feeling it's the grep command

I'm trying to look up the number from each line of one file in another file,
and print the line if the other file contains that number.
#!/usr/bin/perl
open(file, "textIds.txt"); #
@file = <file>; #file looking into
# close file; #
while(<>){
$temp = $_;
$temp =~ tr/|/\t/; #puts tab between name and id
@arrayTemp = split("\t", $temp);
@found=grep{/$arrayTemp[1]/} <file>;
if (defined $found[0]){
#if (grep{/$arrayTemp[1]/} <file>){
print $_;
}
@found=();
}
print "\n";
close file;
#the input file lines have the format of
#John|7791 154
#Smith|5432 290
#Conor|6590 897
#And in the file the format is
#5432
#7791
#6590
#23140
There are some issues in your script.
Always include use strict; and use warnings;.
This would have told you about odd things in your script in advance.
Never use barewords as filehandles, as they are global identifiers. Use the three-argument open instead: open( my $fh, '<', 'textIds.txt');
use autodie; or check whether the open worked.
You read and store textIds.txt into the array @file, but later on (in your grep) you are again trying to read from that file(handle) (with <file>). As @PaulL said, this will always return nothing (false), because the file was already read.
Replacing | with tabs and then splitting at tabs is not necessary. You can split at tabs and pipes at the same time (assuming "John|7791 154" is really "John|7791\t154").
You're talking about the "input file" and the "in file" without saying exactly which is which. I assume your "textIds.txt" is the one with only the numbers, and the other input file is the one read from STDIN (with the |'s in it).
With this in mind your script could be written as:
#!/usr/bin/perl
use strict;
use warnings;
# Open 'textIds.txt' and slurp it into the array @file:
open( my $fh, '<', 'textIds.txt') or die "cannot open file: $!\n";
my @file = <$fh>;
close($fh);
# iterate over STDIN and compare with lines from 'textIds.txt':
while( my $line = <>) {
    # split "John|7791\t154" into ("John", "7791", "154"):
    my ($name, $number1, $number2) = split(/\||\t/, $line);
    # compare $number1 to each member of @file and print if found:
    if ( grep( /$number1/, @file) ) {
        print $line;
    }
}
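Because `grep` rescans the whole ID array for every input line, and a pattern like `/$number1/` also matches substrings (7791 would match 77910), a hash lookup is a common alternative. A minimal sketch, with in-memory arrays standing in for `textIds.txt` and STDIN; the `Mary` line is hypothetical and only there to show a non-match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-ins for textIds.txt and STDIN (the Mary line is hypothetical).
my @ids   = ("5432\n", "7791\n", "6590\n", "23140\n");
my @input = ("John|7791\t154\n", "Smith|5432\t290\n", "Conor|6590\t897\n", "Mary|1111\t42\n");

# Build the lookup hash once: each ID becomes a key.
my %wanted;
for my $id (@ids) {
    chomp $id;
    $wanted{$id} = 1;
}

my @matches;
for my $line (@input) {
    # split on '|' or tab; only the second field (the ID) is needed
    my (undef, $number1) = split /\||\t/, $line;
    # exact key lookup, so 7791 does not match 77910
    push @matches, $line if defined $number1 && exists $wanted{$number1};
}
print @matches;
```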

How do I find the line a word is on when the user enters text in Perl?

I have a simple text file that includes all 50 states. I want the user to enter a word and have the program return the line the specific state is on in the file or otherwise display a "word not found" message. I do not know how to use find. Can someone assist with this? This is what I have so far.
#!/bin/perl -w
open(FILENAME,"<WordList.txt"); #opens WordList.txt
my(@list) = <FILENAME>; #read file into list
my($state); #create private "state" variable
print "Enter a US state to search for: \n"; #Print statement
$line = <STDIN>; #use of STDIN to read input from user
close (FILENAME);
An alternative solution that reads only the parts of the file until a result is found, or the file is exhausted:
use strict;
use warnings;
print "Enter a US state to search for: \n";
my $line = <STDIN>;
chomp($line);
# open file with 3 argument open (safer)
open my $fh, '<', 'WordList.txt'
or die "Unable to open 'WordList.txt' for reading: $!";
# read the file until result is found or the file is exhausted
my $found = 0;
while ( my $row = <$fh> ) {
chomp($row);
next unless $row eq $line;
# $. is a special variable representing the line number
# of the currently(most recently) accessed filehandle
print "Found '$line' on line# $.\n";
$found = 1; # indicate that you found a result
last; # stop searching
}
close($fh);
unless ( $found ) {
print "'$line' was not found\n";
}
General notes:
always use strict; and use warnings; they will save you from a wide range of bugs
3 argument open is generally preferred, as well as the or die ... statement. If you are unable to open the file, reading from the filehandle will fail
$. documentation can be found in perldoc perlvar
The tool for the job is grep.
chomp ( $line ); #remove linefeeds
print "$line is in list\n" if grep { m/^\Q$line\E$/ } @list;
You could also transform your @list into a hash, and test that, using map:
chomp(@list); # strip newlines so the keys match the chomped input
my %states = map { $_ => 1 } @list;
if ( $states{$line} ) { print "$line is in list\n"; }
Note: because of the ^ and $ anchors, the grep above is an exact match (and case-sensitive), as is the hash lookup. You can easily adjust it to support fuzzier scenarios.
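Neither snippet above reports which line the state is on, which the question asks for. One way, sketched here with a hypothetical three-element `@list` in place of the file contents, is to `grep` over the array indices and add 1 (arrays are 0-based, line numbers start at 1).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the lines read from WordList.txt.
my @list = ("Alabama\n", "Alaska\n", "Arizona\n");
chomp @list;

my $line = 'Alaska';    # would come from <STDIN> (chomped) in the real script

# Indices whose element equals the input exactly.
my @hits = grep { $list[$_] eq $line } 0 .. $#list;
if (@hits) {
    print "$line is on line ", $hits[0] + 1, "\n";
} else {
    print "'$line' was not found\n";
}
```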

Parsing a CSV file and Hashing

I am trying to parse a CSV file to read in all the other zip codes. I am trying to create a hash where each key is a zip code and the value is the number of times it appears in the file. Then I want to print out the contents as Zip Code - Number. Here is the Perl script I have so far.
use strict;
use warnings;
my %hash = qw (
zipcode count
);
my $file = $ARGV[0] or die "Need CSV file on command line \n";
open(my $data, '<', $file) or die "Could not open '$file $!\n";
while (my $line = <$data>) {
chomp $line;
my @fields = split "," , $line;
if (exists($hash{$fields[2]})) {
$hash{$fields[1]}++;
}else {
$hash{$fields[1]} = 1;
}
}
my $key;
my $value;
while (($key, $value) = each(%hash)) {
print "$key - $value\n";
}
exit;
You don't say which column your zip code is in, but you are using the third field to check for an existing hash element, and then the second field to increment it.
There is no need to check whether a hash element already exists: Perl will happily create a non-existent hash element and increment it to 1 the first time you access it.
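A small sketch of that behaviour, using hypothetical zip codes in place of the CSV column. It also sorts the keys when printing, since a hash returns its keys in no particular order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical zip codes standing in for one CSV column.
my @zips = qw(12345 56789 12345 90210 12345);

my %counts;
++$counts{$_} for @zips;    # a missing key starts as undef, which ++ treats as 0

# Sort the keys for stable output.
print "$_ - $counts{$_}\n" for sort keys %counts;
```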
There is also no need to explicitly open any files passed as command line parameters: Perl will open them and read them if you use the <> operator without a file handle.
This reworking of your own program may work. It assumes the zip code is in the second column of the CSV. If it is anywhere else just change ++$hash{$fields[1]} appropriately.
use strict;
use warnings;
@ARGV or die "Need CSV file on command line \n";
my %counts;
while (my $line = <>) {
chomp $line;
my @fields = split /,/, $line;
++$counts{$fields[1]};
}
while (my ($key, $value) = each %counts) {
print "$key - $value\n";
}
Sorry if this is off-topic, but if you're on a system with the standard Unix text processing tools, you could use this command to count the number of occurrences of each value in field #2, and not need to write any code.
cut -d, -f2 filename.csv | sort | uniq -c
which will generate something like this output, where the count is listed first, and the zipcode second:
12 12345
2 56789
34 78912
1 90210

Reading Data from a file in Perl

I have a file abc.txt that has data of the form
sHost = "Arun";
sUid ="Abc";
I want to get Arun for sHost and so forth using Perl. My code:
my $filename = "abc.txt";
use strict;
use warnings;
open(my $fh, '<:encoding(UTF-8)', $filename)
or die "Could not open file '$filename' $!";
while (my $row = <$fh>)
{
chomp $row;
if ($row=~m/sHost/)
{
print $row;
}
}
The output I am getting sHost = Arun;
But I want only 'Arun'. What logic should I apply here? I am very new to Perl and Linux.
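After the chomp, change the if block to the following, and the variable $host will contain the value (declare it first with my $host; since you use strict):
if ($row =~ m/sHost = "(.*)"/) {
    $host = $1;
}
In simple terms, the text matched by the ( ) section is assigned to $1 if there is a match. See man perlre for the details.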
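A self-contained check of the capture, using the question's `sHost` line as a literal string instead of reading `abc.txt`:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $row = 'sHost = "Arun";';    # one line from abc.txt
my $host;
if ($row =~ m/sHost = "(.*)"/) {
    $host = $1;                 # text matched by the first (...) group
}
print "$host\n";
```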
To generalise this to read any key and any value, do something like this:
my %value;
while (my $row = <$fh>) {
    if ($row =~ /^(\w+)\s*=\s*"([^"]+)"/) {
        $value{$1} = $2;
    }
}
Then $value{'sHost'} will be "Arun", and so on. The \s* around = also lets lines such as sUid ="Abc"; match.
For universal config file parsing you can use following piece of code:
my %config;
if ($row =~ m/^\s*(["'`])?(\S+?)\1?\s*=\s*(["'`])?(\S+?)\3?;?$/) {
my $key = $2;
my $value = $4;
$config{$key} = $value;
}
This regex handles key/value lines where the key and/or value is bare or wrapped in one of several quote types (", ' or `; you can add your own symbols if you like), with optional leading whitespace, optional whitespace around the =, and an optional trailing semicolon. You can also change the \S+ parts to suit the keys and values you expect (\S matches any character except whitespace).
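A runnable check of this regex, with sample lines held in an array instead of a config file. One change is made on purpose: the key group is non-greedy, (\S+?), because with a greedy (\S+) a quoted key such as 'port' is captured together with its closing quote (the optional \1? then simply matches nothing). The last two sample lines are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The question's two lines plus two hypothetical variants with other quoting.
my @rows = ('sHost = "Arun";', 'sUid ="Abc";', "  'port' = 8080", 'name=`x`;');

my %config;
for my $row (@rows) {
    # Key group made non-greedy (\S+?) so a closing quote on the key is not captured.
    if ($row =~ m/^\s*(["'`])?(\S+?)\1?\s*=\s*(["'`])?(\S+?)\3?;?$/) {
        $config{$2} = $4;
    }
}
print "$_ => $config{$_}\n" for sort keys %config;
```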
use m/.*=\s*([^\s]*)/g instead of m/sHost/
use print $1 instead of print $row
Replace
if ($row=~m/sHost/)
with
if ($row=~s/sHost//)