OK, I have a file which is a list of over 5000 names, one per line; the file is a .txt file generated with Microsoft Excel.
The following code gave me an output of 1.
open FILEHANDLE, "< listname_FC2-3ss>0.txt";
chomp (my @genelist = <FILEHANDLE>);
close FILEHANDLE;
print "the number of items in the list is ";
print scalar @genelist;
I'm using a 2010 MacBook Air with Perl 5.12. I tried printing the list, and all I get is the last line of the file.
But when I tried the code on a tiny ten-name version of the file that I extracted by hand, it worked perfectly fine, so I reckon it's got something to do with the delimiter?
Please help.
Ian
Try adding
local $/ = "\r";
before reading the file. It changes the input record separator to the "\r" character; Excel on the Mac traditionally writes bare CR line endings, so with Perl's default separator ("\n") the whole file comes back as a single "line".
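A minimal sketch of the original script with that change applied (the filename and variable names are taken from the question):

local $/ = "\r";   # must be set before the file is read

# three-arg open, so the ">" in the filename isn't treated as a mode character
open FILEHANDLE, "<", "listname_FC2-3ss>0.txt" or die "can't open: $!";
chomp (my @genelist = <FILEHANDLE>);   # chomp now strips "\r" instead of "\n"
close FILEHANDLE;

print "the number of items in the list is ";
print scalar @genelist;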
I have several Company_***.csv files (although the separator is a tab, not a comma, so they should really be *.tsv files, but never mind) which contain a header plus numerous data lines, e.g.
1stHeader 2ndHeader DateHeader OtherHeaders...
111111111 SOME STRING 2020-08-01 OTHER STRINGS..
222222222 ANOT STRING 2020-08-02 OTHER STRINGS..
I have to split them according to the 3rd column, which is a date.
Each output file should be named like Company_2020_08_01.csv, Company_2020_08_02.csv and so on,
and contain the same header on the first line plus the matching rows as the following lines.
At first I thought about saving (once) the header in a single file e.g.
sed -n '1w Company_header.csv' Company_*.csv
then parsing the files with a pattern for the date (hence the headers would be skipped) e.g.
sed -n '/\t2020-[01][0-9]-[0-3][0-9]\t/w somefilename.csv' Company_*.csv
... and at last, insert the (missing) header in each generated file.
But I'm stuck at step 2: I can't find how to generate the "filename" expected by the w command dynamically, nor how to capture the date in the search pattern (apparently this is just an address, not a search-and-replace "field" as in the s/regexp/replacement/[flags] command, so you can't have capturing groups ( ) in there).
So I wonder whether this is actually doable with sed, or whether I should look at other tools, e.g. awk?
Disclaimer: I'm quite a n00b with these commands, so I'm just learning from scratch...
Perl to the rescue!
perl -e 'while (<>) {
    $h = $_, next if $. == 1;
    $. = 0 if eof;
    @c = split /\t/;
    open my $out, ">>", "Company_" . $c[2] =~ tr/-/_/r . ".csv" or die $!;
    print {$out} $h unless tell $out;
    print {$out} $_;
}' -- Company_*.csv
The diamond operator <> in scalar context reads a line from the input.
The first line of each file is stored in the variable $h; resetting $. (the line number) to 0 at the end of each file makes the $. == 1 test true again for the next file (see $. and eof).
split populates the @c array with the column values of each line.
$c[2] contains the date; using tr we translate its dashes to underscores to build the filename. open opens that file for appending.
print prints the header, but only if the file is still empty (see tell),
and then prints the current line, too.
Note that it only appends to the files, so don't forget to delete any output files before running the script again.
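For example, a hypothetical cleanup one-liner (assuming the generated names always match the Company_YYYY_MM_DD.csv shape and that no input file does):

perl -e 'unlink grep /^Company_\d{4}_\d{2}_\d{2}\.csv$/, glob "Company_*.csv"'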
I'm sure someone has an explanation as to what is happening with the following script:
Please note that the file I specify exists and is being opened. I know this because the last line of the file is output when the program runs, but it is only the last line.
A note about the .csv file: it's generated on Windows (I'm using OS X 10.7.4 with Perl 5.12.3) and uses \r line breaks. I attempted to tell Perl at the top of the script that the line break character was \r, but it does not work. I know they're \r because a grep search finds them in a text editor.
The script runs and prints only the last line of the file. If I plug in a regular expression, it will grab the first matching field from the first line and echo it fine, but I cannot iterate over the entire file.
Any clarification is appreciated as I am new to perl.
#!/usr/bin/perl
use warnings;
print "Please enter your filename:";
my ($dataline);
open(INFO,'./expensereport.csv') || die("can't open datafile: $!");
while (my $line = <INFO>) {
    chomp $line;
    print $line;
}
print $!;
The carriage returns without line feeds are causing print to overwrite each line in place, so all you see is the last one.
Run dos2unix on your input file before processing.
There are several ways to tell Perl that your input file uses Windows-style CRLF line endings; see perldoc -f binmode or perldoc -f open. For example:
open(INFO, '<:crlf', './expensereport.csv')
...
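A fuller sketch of how the question's loop might look with that layer applied (a sketch, not the poster's exact code; the filename is from the question):

#!/usr/bin/perl
use strict;
use warnings;

# the :crlf layer translates CRLF to "\n" as each line is read
open(my $info, '<:crlf', './expensereport.csv') or die "can't open datafile: $!";
while (my $line = <$info>) {
    chomp $line;          # now removes the entire line ending
    print "$line\n";
}
close $info;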
Ahh, that's clear! :)
Look, you have a file whose lines end with \r (carriage return) followed by \n (newline). chomp cuts off only the \n, and the leftover carriage return sends the cursor back to the start of the line, so you print over the same line again and again.
Use print "$line\n"; instead.
I'm trying to create a simple script to read a text file that contains records of book titles. Each record is separated by a plain old blank line (\r\n\r\n). I need to count how many records are in the file.
For example here is the input file:
record 1
some text
record 2
some text
...
I'm using a regex to check for carriage return and newline, but it fails to match. What am I doing wrong? I'm at my wits' end.
sub readInputFile {
    my $inputFile = $_[0]; # read first command-line argument as the file name
    open INPUTFILE, "+<", $inputFile or die $!; # open file
    my $singleLine;
    my @singleRecord;
    my $recordCounter = 0;
    while (<INPUTFILE>) { # loop through the input file line by line
        $singleLine = $_;
        push(@singleRecord, $singleLine); # start adding each line to a record array
        if ($singleLine =~ m/\r\n/) { # check for carriage return and newline
            $recordCounter += 1;
            createHashTable(@singleRecord); # send record off to make a hash table
            @singleRecord = (); # empty the current record to start a new record
        }
    }
    print "total records : $recordCounter \n";
    close(INPUTFILE);
}
It sounds like you are processing a Windows text file on Linux, in which case you want to open the file with the :crlf layer, which will convert all CRLF line-endings to the standard Perl \n ending.
If you are reading Windows files on a Windows platform then the conversion is already done for you, and you won't find CRLF sequences in the data you have read. If you are reading a Linux file then there are no CR characters in there anyway.
It also sounds like your records are separated by a blank line. Setting the built-in input record separator variable $/ to a null string will cause Perl to read a whole record at a time.
I believe this version of your subroutine is what you need. Note that people familiar with Perl will thank you for using lower-case letters and underscores for variable and subroutine names; mixed case is conventionally reserved for package names.
You don't show create_hash_table, so I can't tell what data it needs. I have chomped the record and split it into lines, passing a list of the record's lines with the newlines removed. It would probably be better to pass the entire record as a single string and leave create_hash_table to process it as required.
sub read_input_file {
    my ($input_file) = @_;
    open my $fh, '<:crlf', $input_file or die $!;
    local $/ = '';    # paragraph mode: read one blank-line-separated record at a time
    my $record_counter = 0;
    while (my $record = <$fh>) {
        chomp $record;    # remove the trailing blank line
        ++$record_counter;
        create_hash_table(split /\n/, $record);
    }
    close $fh;
    print "Total records : $record_counter\n";
}
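A hypothetical call site, mirroring the original's intent of taking the file name from the command line:

read_input_file($ARGV[0]);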
You can do this more succinctly by changing Perl's record-separator, which will make the loop return a record at a time instead of a line at a time.
E.g. after opening your file:
local $/ = "\r\n\r\n";
my $recordCounter = 0;
$recordCounter++ while(<INPUTFILE>);
$/ holds Perl's global record separator, and scoping it with local lets you override its value temporarily; at the end of the enclosing block it automatically reverts to its previous value.
But it sounds like the file you're processing may actually have "\n\n" record-separators, or even "\r\r". You'd need to set the record-separator correctly for whatever file you're processing.
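A toy sketch of that scoping behaviour (assuming INPUTFILE is an open handle as in the question):

{
    local $/ = "\r\n\r\n";       # record-at-a-time reads, for this block only
    my $record = <INPUTFILE>;    # reads everything up to the next blank line
}
# here $/ has reverted, so <INPUTFILE> reads single lines again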
If your files are not huge multi-gigabyte files, the easiest and safest way is to read the whole file and use the generic newline metacharacter \R.
This way, it also works if some file actually uses LF instead of CRLF (or even the old Mac standard CR).
Use it with split if you also need the actual records:
perl -ln -0777 -e 'my #records = split /\R\R/; print scalar(#records)' $Your_File
Or if you only want to count the records:
perl -ln -0777 -e 'my $count=()=/\R\R/g; print $count' $Your_File
For more details, see also my other answer to a similar question.
It's been a couple of months since I've been Perling, but I'm totally stuck on why this is happening...
I'm on OSX, if it matters.
I'm trying to transform lines in a file like
08/03/2011 01:00 PDT,1.11
into stdout lines like
XXX, 20120803, 0100, KWH, 0.2809, A, YYY
Since I'm reading a file, I want to chomp each line after it's read in. However, when I chomp, my printing gets all messed up. When I don't chomp, the printing is fine (except for the extra newline...). What's going on here?
while (<SOURCE>) {
    chomp;
    my @tokens = split(' |,'); # @tokens now [08/03/2011, 01:00, PDT, 1.11]
    my $converted_date = convertDate($tokens[0]);
    my $converted_time = convertTime($tokens[1]);
    print <<EOF;
$XXX, $converted_date, $converted_time, KWH, $tokens[3], A, YYY
EOF
}
With the chomp call in there, the output is all mixed up:
, A, YYY10803, 0100, KWH, 1.11
Without the chomp call in there, it's at least printing in the right order, but with the extra new line:
XXX, 20110803, 0100, KWH, 1.11
, A, YYY
Notice that with the chomp in there, it's as if the second part of the output is written on top of the first line. I've added the $| = 1; autoflush, but don't know what else to do here.
Thoughts? And thanks in advance....
The lines of your input end with CR LF, and you're removing only the LF. A simple solution is to use the following instead of chomp:
s/\s+\z//;
You could also use the dos2unix command line tool to convert the files before passing them to Perl.
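In context, a sketch of the question's loop with that substitution in place of chomp (SOURCE is the handle from the question):

while (<SOURCE>) {
    s/\s+\z//;                 # strips "\r\n", bare "\n", or bare "\r", plus trailing blanks
    my @tokens = split(' |,');
    # ... rest of the loop unchanged
}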
The problem is that you have DOS line-endings and are running on a Unix build of Perl.
One solution to this is to use PerlIO::eol. You may have to install it but there is no need for a use line in the program.
Then you can write
binmode $filehandle, ':raw:eol(LF)';
after which, regardless of the format or source of the file, the lines read will be terminated with the standard "\n".
I am running into a strange issue in Perl that I can't seem to find an answer for.
I have a small script that parses data from an external source (be it a file, a website, etc.). Once the data has been parsed, it saves it to a CSV file. However, when I write the file or print the data to the screen, the beginning of the string seems to be truncated. I am using strict and warnings, and I am not seeing any errors.
Here is an example:
print "Name: " . $name . "\n";
print "Type: " . $type. "\n";
print "Price: " . $price . "\n";
print "Count: " . $count . "\n";
It will return the following:
John
Blue
7.99
5
If I attempt to do it this way:
print "$name,$type,$price,$count\n";
I get the following as a result:
,7.99,5
I tried the following to see where the issue begins and get the following:
print "$name\n";
print "$name,$type\n";
print "$name,$type,$price\n";
print "$name,$type,$price,$count\n";
Results:
John
John,Blue
,7.99
,7.99,5
I am still learning Perl, but I can't seem to find out (maybe due to lack of knowledge) what is causing this. I tried debugging the script, but I did not see any special character in the price variable that would cause this.
The string in $price ends with a carriage return. This causes your terminal to move the cursor back to the start of the line, so the first two fields are overwritten by the ones that follow.
You are probably reading a Windows text file on a Unix box. Convert the file (using dos2unix, for example), or use s/\s+\z//; instead of chomp;.
If the CR made it into the middle of a string, you could use s/\r//g;.
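If you'd rather inspect the string from inside Perl than with an external tool, printf's %vd format prints each character's ordinal ($price is the variable from the question):

printf "price = %vd\n", $price;   # "7.99\r" prints as 55.46.57.57.13 - the trailing 13 is the CR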
Per @Mat's suggestion, I ran the output through hexdump -C and found there was a carriage return (indicated by the hex value 0d). Using $price =~ s/\r//g; to remove the CR from the line of text fixed the problem.
Also, the input file was in Windows format, not Unix; running dos2unix fixed that.