I am using the solution below, from an earlier answer by Alan, which works to combine text files with the pipe character. Thanks!
Merge multiple text files and append the current file name to the end of each line
#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new( { 'sep_char' => '|' } );

open my $fho, '>', 'combined.csv' or die "Error opening file: $!";

while ( my $file = <*.txt> ) {
    open my $fhi, '<', $file or die "Error opening file: $!";
    ( my $last_field = $file ) =~ s/\.[^\.]+$//;    # Strip the file extension off
    while ( my $row = $csv->getline($fhi) ) {
        $csv->combine( @$row, $last_field );    # Construct new row by appending the file name without the extension
        print $fho $csv->string, "\n";          # Write the combined string to combined.csv
    }
}
If anyone could help enhance the solution with the following three additional requirements, that would be very helpful.
1) Within my text data, some fields are enclosed in quotation marks, in the following format: |"XYZ NR 456"|
The above solution is placing this data into different columns when I open the final combined.csv file. Is there a way to ensure the data stays together within its pipe-delimited field when merged?
2) Delete any entire line on which it finds the word Place_of_Destination.
3) The current solution adds the filename at the end of each line. I also want to split the file name itself into pipe-separated fields.
My filename structure is X_INRUS6_08072013.txt. The solution adds a pipe character and the filename X_INRUS6_08072013 at the end of each line. What I also want to do is split this file name further, as follows: X | INRUS6 | 08072013
Is this possible? Thanks in advance for all the help.
As for your point 2: in order to skip those lines, you could use a 'next if', which has the effect of skipping the line concerned:
next if $line =~ /Place_of_Destination/;
or 'unless', as you wish:
unless ($line =~ /Place_of_Destination/) {
# do something
}
If I understood properly what you are trying to do, this test belongs in your second 'while' loop.
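For points 1 and 3 together, here is a minimal sketch built on the original script (the quote_char and binary settings, and the name @name_parts, are my additions, not from the original): Text::CSV's quote handling keeps pipe-delimited quoted data such as |"XYZ NR 456"| in a single field, and a plain split on underscores turns the file name into extra fields.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV;

# quote_char keeps quoted data such as |"XYZ NR 456"| together in one field;
# binary => 1 is a safety net for non-ASCII characters in the data
my $csv = Text::CSV->new( { sep_char => '|', quote_char => '"', binary => 1 } );

open my $fho, '>', 'combined.csv' or die "Error opening file: $!";

while ( my $file = <*.txt> ) {
    open my $fhi, '<', $file or die "Error opening $file: $!";
    ( my $base = $file ) =~ s/\.[^.]+\z//;      # strip the extension
    my @name_parts = split /_/, $base;          # X_INRUS6_08072013 -> X, INRUS6, 08072013
    while ( my $row = $csv->getline($fhi) ) {
        next if grep { /Place_of_Destination/ } @$row;   # requirement 2: drop the line
        $csv->combine( @$row, @name_parts ) or next;     # requirement 3: extra fields
        print $fho $csv->string, "\n";
    }
    close $fhi;
}
close $fho;
```

Because the same Text::CSV object writes the output, fields that contain the separator or quotes are re-quoted correctly instead of spilling into extra columns.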
Related
How can I split a value containing newlines (\n) in one column, extract each part to a new row, and fill in the other columns?
My Example CSV Data (data.csv)
No,Email,IP,Service,Comment
1,test@email.com,192.168.10.109,FTP
HTTP
HTTPS,,
2,webmaster@email.com,192.168.10.111,SFTP
SNMP,,
3,admin@email.com,192.168.10.112,HTTP,,
The Service column has multiple values, separated by newlines.
I want to extract them and fill in the other values in each row, to look like this:
1,test@email.com,192.168.10.110,FTP,,
1,test@email.com,192.168.10.110,HTTP,,
1,test@email.com,192.168.10.110,HTTPS,,
2,webmaster@email.com,192.168.10.111,SFTP,,
2,webmaster@email.com,192.168.10.111,SNMP,,
3,admin@email.com,192.168.10.112,HTTP,,
I tried parsing with Text::CSV; I can split the multiple IPs and services, but I don't know how to fill in the other values as in the example above.
#!/usr/bin/perl
use Text::CSV;
my $file = "data.csv";
my @csv_value;
open my $fh, '<', $file or die "Could not open $file: $!";
my $csv = Text::CSV->new;
my $sum = 0;
open(my $data, '<:encoding(utf8)', $file) or die "Could not open '$file' $!\n";
while (my $fields = $csv->getline( $data )) {
    push @csv_value, $fields;
}
close $data;
Thank you in advance for any help you can provide.
To expand on my comment
perl -ne 'if (!/^\d/){print "$line$_";} else {print $_;} /(.*,).*/; $line=$1;' file1
Use the perl command line options
e = inline command
n = implicit loop, i.e. for every line in the file do the script
Each line of the file is now in the $_ default variable
if (!/^\d/){print "$line$_";} - if the line does not start with a digit, print the $line variable (more on this later), followed by the default variable, which is the current line from the file
else {print $_;} - else just print the line
Now, after we have done this, if the line matches anything followed by a comma followed by anything, capture it with the regex brackets so it is put in $1. So for the first line, $1 will be '1,test@email.com,192.168.10.109,'
/(.*,).*/; $line=$1;
Because we do this after the line has been printed, $line always holds the leading fields (everything up to the last comma) of the most recent line that contained a comma.
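The one-liner can also be written out as an ordinary script with the same logic; this is just a readability aid (repair_lines is a made-up name, not from the original):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Expanded equivalent of:
#   perl -ne 'if (!/^\d/){print "$line$_";} else {print $_;} /(.*,).*/; $line=$1;' file1
sub repair_lines {
    my @in = @_;
    my ($prefix, @out) = ('');
    for my $row (@in) {
        if ($row =~ /^\d/) {
            push @out, $row;             # record line: keep as-is
        } else {
            push @out, $prefix . $row;   # continuation: prepend previous fields
        }
        # remember everything up to the last comma of this line, if any
        $prefix = $1 if $row =~ /(.*,)/;
    }
    return @out;
}

print repair_lines(<>) if @ARGV;         # same usage: script.pl file1
```

Note that when a continuation line has no comma, $prefix is simply left unchanged, which mirrors the one-liner's reliance on $1 keeping its value after a failed match.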
Your input CSV is broken. I would suggest fixing the generator.
With correctly formatted input CSV, you will have to enable the binary option in Text::CSV, as your Service fields contain embedded newlines.
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

# binary => 1 is needed because the Service fields contain embedded newlines
my $csv_in  = Text::CSV->new({ binary => 1 });
my $csv_out = Text::CSV->new();
$csv_out->eol("\n");

while (my $row = $csv_in->getline(\*STDIN)) {
    for my $protocol (split("\n", $row->[3])) {
        $row->[3] = $protocol;
        $csv_out->print(\*STDOUT, $row);
    }
}
exit 0;
Test with fixed input data:
$ cat dummy.csv
No,Email,IP,Service,Comment
1,test@email.com,192.168.10.109,"FTP
HTTP
HTTPS",,
2,webmaster@email.com,192.168.10.111,"SFTP
SNMP",,
3,admin@email.com,192.168.10.112,HTTP,,
$ perl dummy.pl <dummy.csv
No,Email,IP,Service,Comment
1,test@email.com,192.168.10.109,FTP,,
1,test@email.com,192.168.10.109,HTTP,,
1,test@email.com,192.168.10.109,HTTPS,,
2,webmaster@email.com,192.168.10.111,SFTP,,
2,webmaster@email.com,192.168.10.111,SNMP,,
3,admin@email.com,192.168.10.112,HTTP,,
So far in Perl, I know how to open a file for writing like so:
open my $par_fh, '>', $par_file
or die "$par_file: opening for write: $!";
print $par_fh <<PAR;
USERID=$creds
DIRECTORY=DMPDIR
USERS=$users
PAR
close $par_fh
or die "$par_file: closing after write: $!";
What I need help with now is my variable $users: in this config file I need to create a comma-separated list, USERS=joe,mary,sue,john, with no comma after the last item, built from a separate text file:
users.lst (this list can get long):
joe
mary
sue
john
Do I need to open another while loop to read in the file? If so, how do I combine that with the file handle I already have open? Can someone show me a good technique?
You can read all lines from a file into an array like
my @users = <$user_fh>;
remove all newlines at once:
chomp @users;
and then join all of them into a single string, separating each item with ,:
my $users = join ',', @users;
Then we can interpolate that as usual:
print "USERS=$users\n";
Another solution doesn't do an explicit join, but sets the $" variable. This is the string that is put between array elements when we interpolate an array:
my @array = 1..4;
print "[@array]\n"; #=> "[1 2 3 4]"
Normally, this is a single space, but we can set it to a comma:
local $" = ",";
print "USERS=@users\n";
Opening a file for reading is much the same as for writing:
open my $par_fh, '<', 'users.lst' or die 'unable to read users.lst';
You can then read in one line at a time:
my @users;
while (my $line = <$par_fh>) {
    chomp($line); # Remove newline
    push @users, $line;
}
Or all at once:
my @users = <$par_fh>;
chomp(@users); # Remove newlines from all elements
Close the file:
close($par_fh);
Create your config line:
my $output = 'USERS=' . join(',', @users);
And open and write to the file as you already have.
Do this before the PAR file processing.
my @users;
open INPUT, '<', 'users.lst' or die $!;
while (<INPUT>) {
    chomp;
    push @users, $_;
}
close INPUT;
my $user = "USERS=" . join(",", @users);
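Putting the pieces together as a complete sketch (the file names users.lst and out.par and the USERID value are placeholders for illustration, not from the original): read the list first, build the joined string, then interpolate it in the heredoc as before.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read the user list; if users.lst is missing we fall back to an empty list
my @users;
if (open my $user_fh, '<', 'users.lst') {
    chomp(@users = <$user_fh>);
    close $user_fh;
}
my $users = join ',', @users;   # e.g. joe,mary,sue,john -- no trailing comma

open my $par_fh, '>', 'out.par' or die "out.par: opening for write: $!";
print $par_fh <<PAR;
USERID=scott/tiger
DIRECTORY=DMPDIR
USERS=$users
PAR
close $par_fh or die "out.par: closing after write: $!";
```

The key point is simply ordering: the list must be read and joined before the heredoc is printed, since heredoc interpolation happens at print time.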
I want to write a script that takes a CSV file, deletes its first row and creates a new output csv file.
This is my code:
use Text::CSV_XS;
use strict;
use warnings;
my $csv = Text::CSV_XS->new({sep_char => ','});
my $file = $ARGV[0];
open(my $data, '<', $file) or die "Could not open '$file'\n";
my $csvout = Text::CSV_XS->new({binary => 1, eol => $/});
open my $OUTPUT, '>', "file.csv" or die "Can't able to open file.csv\n";
my $tmp = 0;
while (my $line = <$data>) {
    # if ($tmp==0)
    # {
    #     $tmp=1;
    #     next;
    # }
    chomp $line;
    if ($csv->parse($line)) {
        my @fields = $csv->fields();
        $csvout->print($OUTPUT, \@fields);
    } else {
        warn "Line could not be parsed: $line\n";
    }
}
At the command line I type c:\test.pl csv.csv and it doesn't create the file.csv output, but when I double-click the script it creates a blank CSV file. What am I doing wrong?
Your program isn't ideally written, but I can't tell why it doesn't work if you pass the CSV file on the command line as you have described. Do you get the errors Could not open 'csv.csv' or Can't able to open file.csv? If not then the file must be created in your current directory. Perhaps you are looking in the wrong place?
If all you need to do is to drop the first line then there is no need to use a module to process the CSV data - you can handle it as a simple text file.
If the file is specified on the command line, as in c:\test.pl csv.csv, you can read from it without explicitly opening it using the <> operator.
This program reads the lines from the input file and prints them to the output only if the line counter (the $. variable) isn't equal to one.
use strict;
use warnings;
open my $out, '>', 'file.csv' or die $!;
while (my $line = <>) {
    print $out $line unless $. == 1;
}
Hmm, you don't need any modules for this task, since CSV (comma-separated values) files are simply text files: just open the file and iterate over its lines, writing all lines to the output except the particular one (e.g. the first). Such a task (skipping the first line) is so simple that it would probably be better done with a command-line one-liner than a dedicated script.
A quick search turns up, for example, this link; there are numerous tutorials about Perl input/output operations:
http://learn.perl.org/examples/read_write_file.html
PS. Perl scripts (programs) usually are not compiled into a binary file; they are compiled, but on the fly, which is why /usr/bin/perl is called an "interpreter" rather than a "compiler" like gcc or g++. I guess what you're looking for is an editor with syntax highlighting and other development goodies; you could try Eclipse with the Perl plugin for that (cross-platform).
http://www.eclipse.org/downloads/
http://www.epic-ide.org/download.php/
This:
user@localhost:~$ cat blabla.csv | perl -ne 'print $_ if $x++; '
skips the first line (it prints only when the variable, which is incremented after each use of it, is greater than zero).
You are missing your first (and only) argument due to Windows.
I think this question will help you: @ARGV is empty using ActivePerl in Windows 7
Earlier I was working on a loop within a loop, and if a match was made it would replace the entire string from the second loop's file. Now I have a slightly different situation: I'm trying to replace a substring from the first loop with a string from the second loop. Both are CSV files, semicolon-delimited. What I'm trying to replace are special characters, going from the numerical code to the character itself. The first file looks like:
1;2;bla&#322blabla &#261bla;7;8
3;4;bl&#261blabla;9;10
2;3;blablabla&#261a&#322;8;9
and the second file has the numerical code and the corresponding character:
&#260;;Ą
&#261;;ą
&#478;;Ǟ
&#193;;Á
&#225;;á
&#194;;Â
&#322;;ł
The first semicolon in each line of the second file belongs to the numerical code of the corresponding character and should not be used to split the file. The result should be:
1;2;blałblabla ąbla;7;8
3;4;bląblabla;9;10
2;3;blablablaąał;8;9
This is the code I have. How can I fix it?
use strict;
use warnings;
my $inputfile1 = shift || die "input/output!\n";
my $inputfile2 = shift || die "input/output!\n";
my $outputfile = shift || die "output!\n";
open my $INFILE1, '<', $inputfile1 or die "Used/Not found :$!\n";
open my $INFILE2, '<', $inputfile2 or die "Used/Not found :$!\n";
open my $OUTFILE, '>', $outputfile or die "Used/Not found :$!\n";
my $infile2_pos = tell $INFILE2;
while (<$INFILE1>) {
    s/"//g;
    my @elements = split /;/, $_;
    seek $INFILE2, $infile2_pos, 0;
    while (<$INFILE2>) {
        s/"//g;
        my @loopelements = split /;/, $_;
        #### The problem part ####
        if (($elements[2] =~ /\&\#\d{3}\;/g) and (($elements[2]) eq ($loopelements[0]))) {
            $elements[2] =~ s/(\&\#\d{3}\;)/$loopelements[1]/g;
            print "$2. elements[2]\n";
        }
        #### End problem part #####
    }
    my $output_line = join(";", @elements);
    print $OUTFILE $output_line;
    #print "\n"
}
close $INFILE1;
close $INFILE2;
close $OUTFILE;
exit 0;
Assuming your character codes are standard Unicode entities, you are better off using HTML::Entities to decode them.
This program processes the data you show in your first file and ignores the second file completely. The output seems to be what you want.
use strict;
use warnings;
use HTML::Entities 'decode_entities';
binmode STDOUT, ":utf8";
while (<DATA>) {
    print decode_entities($_);
}
__DATA__
1;2;bla&#322blabla &#261bla;7;8
3;4;bl&#261blabla;9;10
2;3;blablabla&#261a&#322;8;9
output
1;2;blałblabla ąbla;7;8
3;4;bląblabla;9;10
2;3;blablablaąał8;9
You split your @elements at every occurrence of ;, which is then removed. You will not find it in your data: the semicolon in your regexp can never match, so no substitutions are done.
Anyway, using seek is somewhat disturbing to me. As you have a reasonable number of replacement codes (fewer than 5000), you might consider putting them into a hash:
my %subst;
while (<$INFILE2>) {
    /^&#(\d{3});;(.*)\n/;
    $subst{$1} = $2;
}
Then we can do:
while (<$INFILE1>) {
    s| &# (\d{3}) | $subst{$1} // "&#$1" |egx;
    # (don't try to concat undef
    # when no substitution for our code is defined)
    print $OUTFILE $_;
}
We do not have to split the files or view them as CSV data if the replacement should occur everywhere in INFILE1. My solution should also speed things up a bit, since it parses INFILE2 only once. Here I assumed your input data is correct and that the number codes are terminated by length rather than by a semicolon; you might want to remove the semicolon from your regexes accordingly (i.e. m/&#\d{3}/).
If you have trouble with character encodings, you might want to open your files with :utf8 and/or use Encode or similar.
Is there a way to easily take 3 text files and turn them into a multi-tab Excel sheet via a script?
The files are file1.txt, file2.txt, file3.txt; I would like the output to be excelsheet1.xls with 3 tabs.
You don't mention the format of the text files so real example code is difficult, but you can use Spreadsheet::WriteExcel for this. Look at the add_worksheet() method for creating new tabs.
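A minimal sketch of that approach, assuming Spreadsheet::WriteExcel is installed, using the file names from the question, and (since the format is unspecified) simply putting each line of a text file into one cell of column A on its tab:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Spreadsheet::WriteExcel;

# One workbook, one worksheet (tab) per input file
my $workbook = Spreadsheet::WriteExcel->new('excelsheet1.xls');
for my $file ('file1.txt', 'file2.txt', 'file3.txt') {
    my $sheet = $workbook->add_worksheet($file);   # tab named after the file
    open my $fh, '<', $file or next;               # skip files that don't exist
    my $row = 0;
    while (my $line = <$fh>) {
        chomp $line;
        $sheet->write($row++, 0, $line);           # column A, one line per row
    }
    close $fh;
}
$workbook->close;
```

If the files have columns, split each line first and write one value per cell instead of the whole line.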
Given that you say each line is a number followed by text, I am presuming this is two columns per row, with a space delimiting the first and second columns and only digits in the first column. If this is not true, the regex below will need to be adjusted. That said, here's some sample code.
#!/usr/bin/env perl
use strict;
use warnings;
use Spreadsheet::WriteExcel;

sub read_file {
    my $f = shift;
    my @row;
    open(my $fh, '<', $f) or die $!;
    while (<$fh>) {
        chomp;
        s/^(\d+)\s+//;    # assuming format of "1 Text here\n2 More text\n"
        if (defined $1) {
            push(@row, [$1, $_]);
        }
    }
    close($fh) or die $!;
    return @row;
}

if ($#ARGV < 1) {
    die "$0: file1 [file2 ... filen] output.xls\n";
}

my $xl = Spreadsheet::WriteExcel->new(pop);
foreach my $file (@ARGV) {
    if (-f $file) {
        my @rows = read_file($file);
        my $sheet = $xl->add_worksheet($file);
        for my $row (0 .. $#rows) {
            my @cols = @{ $rows[$row] };
            for my $col (0 .. $#cols) {
                $sheet->write($row, $col, $cols[$col]);
            }
        }
    }
}
Input files are given on the command line and processed in order, turning each one into a tab named after the file name. The output file name is given last on the command line, after one or more input file names.
EDIT: Now including the improvements FM mentioned in his comment and a trivial CLI for specifying the output file name.