How can I split a value by newline (\n) in one column, extract each part to a new row, and fill in the other columns?
My example CSV data (data.csv):
No,Email,IP,Service,Comment
1,test@email.com,192.168.10.109,FTP
HTTP
HTTPS,,
2,webmaster@email.com,192.168.10.111,SFTP
SNMP,,
3,admin@email.com,192.168.10.112,HTTP,,
The Service column holds multiple values, separated by newlines.
I want to extract each value into its own row and fill in the other columns from the original row, like this:
1,test@email.com,192.168.10.109,FTP,,
1,test@email.com,192.168.10.109,HTTP,,
1,test@email.com,192.168.10.109,HTTPS,,
2,webmaster@email.com,192.168.10.111,SFTP,,
2,webmaster@email.com,192.168.10.111,SNMP,,
3,admin@email.com,192.168.10.112,HTTP,,
I tried parsing with Text::CSV. I can only split out the multiple IP and Service values, but I don't know how to fill in the other values as in the example above.
#!/usr/bin/perl
use Text::CSV;
my $file = "data.csv";
my @csv_value;
open my $fh, '<', $file or die "Could not open $file: $!";
my $csv = Text::CSV->new;
my $sum = 0;
open(my $data, '<:encoding(utf8)', $file) or die "Could not open '$file' $!\n";
while (my $fields = $csv->getline( $data )) {
push @csv_value, $fields;
}
close $data;
Thank you in advance for any help you can provide.
To expand on my comment
perl -ne 'if (!/^\d/){print "$line$_";} else {print $_;} /(.*,).*/; $line=$1;' file1
This uses two Perl command-line options:
e = run the code given on the command line
n = implicit loop, i.e. run the code for every line of the file
Each line of the file is now in the default variable $_.
if (!/^\d/){print "$line$_";} - if the line does not start with a digit, print the $line variable (more on that below) followed by the default variable, which is the current line from the file
else {print $_;} - otherwise just print the line
After that, if the line matches anything, followed by a comma, followed by anything, the part up to and including the last comma is captured by the parentheses and ends up in $1. So for the first line $1 will be '1,test@email.com,192.168.10.109,'
/(.*,).*/; $line=$1;
Because we do this after the line has been printed, $line always holds the leading fields captured from an earlier line.
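The same idea can be written as a small script, which may be easier to follow than the one-liner. This is only a sketch: the sub name fill_rows is made up for illustration, and it assumes, like the one-liner, that every real record starts with a digit and that continuation lines never do. Unlike the one-liner, it remembers the prefix only from record lines.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the one-liner above): takes the raw
# lines and returns them with continuation lines completed.
sub fill_rows {
    my @lines  = @_;
    my $prefix = '';
    my @out;
    for my $line (@lines) {
        if ($line =~ /^\d/) {
            # A real record: remember everything up to and including the last comma.
            ($prefix) = $line =~ /^(.*,)/;
            $prefix = '' unless defined $prefix;
            push @out, $line;
        }
        else {
            # A continuation line: glue the remembered prefix in front of it.
            push @out, $prefix . $line;
        }
    }
    return @out;
}

print fill_rows(
    "1,test\@email.com,192.168.10.109,FTP\n",
    "HTTP\n",
    "HTTPS,,\n",
);
```

The header line does not start with a digit, but since no prefix has been seen at that point it passes through unchanged.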
Your input CSV is broken. I would suggest fixing the generator.
With correctly formatted input CSV you will have to enable the binary option in Text::CSV, as your data contains fields with embedded newlines.
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
# fields contain embedded newlines, which requires binary mode
my $csv_in = Text::CSV->new({ binary => 1 });
my $csv_out = Text::CSV->new();
$csv_out->eol("\n");
while (my $row = $csv_in->getline(\*STDIN)) {
for my $protocol (split("\n", $row->[3])) {
$row->[3] = $protocol;
$csv_out->print(\*STDOUT, $row);
}
}
exit 0;
Test with fixed input data:
$ cat dummy.csv
No,Email,IP,Service,Comment
1,test@email.com,192.168.10.109,"FTP
HTTP
HTTPS",,
2,webmaster@email.com,192.168.10.111,"SFTP
SNMP",,
3,admin@email.com,192.168.10.112,HTTP,,
$ perl dummy.pl <dummy.csv
No,Email,IP,Service,Comment
1,test@email.com,192.168.10.109,FTP,,
1,test@email.com,192.168.10.109,HTTP,,
1,test@email.com,192.168.10.109,HTTPS,,
2,webmaster@email.com,192.168.10.111,SFTP,,
2,webmaster@email.com,192.168.10.111,SNMP,,
3,admin@email.com,192.168.10.112,HTTP,,
I am writing a Perl program that reads a value from one file and substitutes it for a placeholder in another file. The program runs successfully, but the value doesn't get replaced. Please suggest where the error is.
use strict;
use warnings;
open(file1,"address0.txt") or die "Cannot open file.\n";
my $value;
$value=<file1>;
system("perl -p -i.bak -e 's/add/$value/ig' rough.sp");
Here the value which I want to replace exists in address0.txt file. It is a single value 1. I want to place this value in place of add in other file rough.sp.
My rough.sp looks like
Vdd 1 0 add
My address0.txt looks like
1
So output should be like
Vdd 1 0 1
Please help me out. Thanks in advance
Assuming that there is a 1:1 relationship between lines in address0.txt and rough.sp, you can proceed like this:
use strict;
use warnings;
my ($curline_1,$curline_2);
open(file1, "address0.txt") or die "Cannot open file.\n";
open(file2, "rough.sp") or die "Cannot open file.\n";
open(file3, ">out.sp") or die "Cannot open file.\n";
while (<file1>) {
$curline_1 = $_;
chomp($curline_1);
$curline_2 = <file2>;
$curline_2 =~ s/ add/ $curline_1/;
print file3 $curline_2;
}
close(file1);
close(file2);
close(file3);
exit(0);
Explanation:
The code iterates through the lines of your input files in parallel. Note that the lines read include the line terminator. Line contents from the 'address' file are taken as replacement values for the add literal in your .sp file. Line terminators from the 'address' file are removed to avoid introducing additional newlines.
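The important step is the chomp: without it, the newline read from address0.txt ends up inside the replacement. A minimal sketch of that one step, using in-memory strings instead of files; the sub name substitute_value is invented here for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: replaces the literal " add" in a line from rough.sp
# with a value read from address0.txt.
sub substitute_value {
    my ($sp_line, $value) = @_;
    chomp $value;                   # drop the line terminator read from the file
    $sp_line =~ s/ add/ $value/;    # replace the first " add" only
    return $sp_line;
}

print substitute_value("Vdd 1 0 add\n", "1\n");   # prints "Vdd 1 0 1" plus a newline
```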
Addendum:
An extension for multi-replacements might look like this:
$curline_1 = $_;
chomp($curline_1);
my @parts = split(/ +/, $curline_1); # splits the line from address0.txt into an array of strings made up of contiguous non-whitespace chars
$curline_2 = <file2>;
$curline_2 =~ s/ add/ $parts[0]/;
$curline_2 =~ s/ sub/ $parts[1]/;
# ...
Based on a mapping file, I need to search for a string and, if it is found, append the replacement string to the end of the line.
I'm traversing the mapping file line by line and appending the strings with the Perl one-liner below.
Issues:
1. Huge number of find-and-replace entries: the mapping file has about 7000 entries, and the Perl one-liner takes ~1 second per entry, which boils down to ~1 hour to complete the entire replacement.
2. Not a simple find and replace: if the string is found, the replacement string has to be appended to the end of the line.
If there is no efficient way to do this, I would even consider replacing rather than appending.
I'm on a 64-bit Windows 7 environment and I'm using ActivePerl. No *nix tools are available.
File Samples
Map.csv
findStr1,RplStr1
findStr2,RplStr2
findStr3,RplStr3
.....
findStr7000,RplStr7000
input.csv
col1,col2,col3,findStr1,....col-N
col1,col2,col3,findStr2,....col-N
col1,col2,col3,FIND-STR-NOT-EXIST,....col-N
output.csv (Expected Output)
col1,col2,col3,findStr1,....col-N,**RplStr1**
col1,col2,col3,findStr2,....col-N,**RplStr2**
col1,col2,col3,FIND-STR-NOT-EXIST,....col-N
Perl Code Snippet
One-Liner
perl -pe '/findStr/ && s/$/RplStr/' file.csv
open( INFILE, $MarketMapFile ) or die "Error occured: $!";
my @data = <INFILE>;
my $cnt=1;
foreach $line (@data) {
eval {
# Remove end of line character.
$line =~ s/\n//g;
my ( $eNodeBID, $MarketName ) = split( ',', $line );
my $exeCmd = 'perl -i.bak -p -e "/'.$eNodeBID.'\(M\)/ && s/$/,'.$MarketName.'/;" '.$CSVFile;
print "\n $cnt Repelacing $eNodeBID with $MarketName and cmd is $exeCmd";
system($exeCmd);
$cnt++;
}
}
close(INFILE);
To do this in a single pass through your input CSV, it's easiest to store your mapping in a hash. 7000 entries is not particularly huge, but if you're worried about storing all of that in memory you can use Tie::File::AsHash.
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
use Tie::File::AsHash;
tie my %replace, 'Tie::File::AsHash', 'map.csv', split => ',' or die $!;
my $csv = Text::CSV->new({ binary => 1, auto_diag => 1, eol => $/ })
or die Text::CSV->error_diag;
open my $in_fh, '<', 'input.csv' or die $!;
open my $out_fh, '>', 'output.csv' or die $!;
while (my $row = $csv->getline($in_fh)) {
push @$row, $replace{$row->[3]};
$csv->print($out_fh, $row);
}
untie %replace;
close $in_fh;
close $out_fh;
map.csv
foo,bar
apple,orange
pony,unicorn
input.csv
field1,field2,field3,pony,field5,field6
field1,field2,field3,banana,field5,field6
field1,field2,field3,apple,field5,field6
output.csv
field1,field2,field3,pony,field5,field6,unicorn
field1,field2,field3,banana,field5,field6,
field1,field2,field3,apple,field5,field6,orange
I don't recommend screwing up your CSV format by only appending fields to matching lines, so I add an empty field if a match isn't found.
To use a regular hash instead of Tie::File::AsHash, simply replace the tie statement with
open my $map_fh, '<', 'map.csv' or die $!;
my %replace = map { chomp; split /,/ } <$map_fh>;
close $map_fh;
This is untested code. It splits on plain commas, so it will not cope with quoted fields:
use strict;
use warnings;

my $mapfile   = 'Map.csv';
my $inputfile = 'input.csv';

# load the search and replace strings into memory
open(my $mapfh, '<', $mapfile) or die "Cannot open $mapfile: $!";
my %maplines;
while (my $mapline = <$mapfh>) {
    chomp $mapline;
    my ($findstr, $replstr) = split(/,/, $mapline);
    $maplines{$findstr} = $replstr;
}
close $mapfh;

open(my $ifh, '<', $inputfile) or die "Cannot open $inputfile: $!";
while (my $inputline = <$ifh>) {                 # read an input line
    chomp $inputline;                            # remove the newline
    my @input = split(/,/, $inputline, -1);      # split it into a list, keeping empty trailing fields
    if (defined $input[3] && exists $maplines{$input[3]}) {   # does this line match?
        push @input, $maplines{$input[3]};       # add the replace string to the end
    }
    print join(',', @input), "\n";               # or print to an output file
}
close($ifh);
I am using the solution below, from an earlier answer by Alan, which combines text files using the pipe character. Thanks!
Merge multiple text files and append current file name at the end of each line
#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new( { 'sep_char' => '|' } );
open my $fho, '>', 'combined.csv' or die "Error opening file: $!";
while ( my $file = <*.txt> ) {
open my $fhi, '<', $file or die "Error opening file: $!";
( my $last_field = $file ) =~ s/\.[^\.]+$//; # Strip the file extension off
while ( my $row = $csv->getline($fhi) ) {
$csv->combine( @$row, $last_field ); # Construct new row by appending the file name without the extension
print $fho $csv->string, "\n"; # Write the combined string to combined.csv
}
}
If anyone could help enhance the solution with the following 3 requirements, that would be helpful.
1) Within my text data, some values are enclosed in quotation marks, in the following format: |"XYZ NR 456"|
The solution above splits such values across different columns when I open the final combined.csv file. Is there a way to ensure each value stays together between its pipe characters when merged?
2) Delete an entire line wherever the word Place_of_Destination is found.
3) The current solution adds the file name at the end of each line. I also want to split the file name with pipe characters.
My file names are structured like X_INRUS6_08072013.txt. The solution adds a pipe character and the file name X_INRUS6_08072013 at the end of each line. What I also want is to split this file name further, as follows: X | INRUS6 | 08072013
Is this possible? Thanks in advance for all the help.
As for your point 2, in order to skip the lines, you could use a 'next if', with the effect of skipping the line concerned:
next if ($line =~ /Place_of_Destination/)
or 'unless', as your wish:
unless ($line =~ /Place_of_Destination/) {
# do something
}
If I understood properly what you are trying to do this test is to be used in your second 'while' loop.
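For point 3, the file name can be split on the underscores and the pieces appended as separate fields. A rough sketch, under the assumption that names always look like X_INRUS6_08072013.txt; the sub names name_fields and keep_line are invented here for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: "X_INRUS6_08072013.txt" -> ("X", "INRUS6", "08072013")
sub name_fields {
    my ($file) = @_;
    (my $base = $file) =~ s/\.[^.]+$//;   # strip the file extension
    return split /_/, $base;
}

# Hypothetical helper: false for lines that mention Place_of_Destination.
sub keep_line {
    my ($line) = @_;
    return $line !~ /Place_of_Destination/;
}

my @fields = name_fields('X_INRUS6_08072013.txt');
print join('|', 'a', 'b', @fields), "\n";   # prints a|b|X|INRUS6|08072013
```

In the loop from the question, this would presumably mean passing name_fields($file) to combine in place of $last_field. Since that loop works on parsed rows rather than raw lines, the skip test could be applied to the fields instead, for example: next if grep { defined $_ && /Place_of_Destination/ } @$row;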
Earlier I was working on a loop within a loop where, if a match was made, the entire string was replaced from the second file. Now I have a slightly different situation: I'm trying to replace a substring from the first file with a string from the second file. Both are semicolon-delimited CSV files. What I'm trying to replace are special characters, from the numerical code to the character itself. The first file looks like:
1;2;bla&#322;blabla &#261;bla;7;8
3;4;bl&#261;blabla;9;10
2;3;blablabla&#261;a&#322;8;9
and the second file has the numerical code and the corresponding character:
&#260;;Ą
&#261;;ą
&#478;;Ǟ
&#193;;Á
&#225;;á
&#194;;Â
&#322;;ł
The first semicolon in the second file belongs to the numerical code of the corresponding character and should not be used to split the file. The result should be:
1;2;blałblabla ąbla;7;8
3;4;bląblabla;9;10
2;3;blablablaąał;8;9
This is the code I have. How can i fix this?
use strict;
use warnings;
my $inputfile1 = shift || die "input/output!\n";
my $inputfile2 = shift || die "input/output!\n";
my $outputfile = shift || die "output!\n";
open my $INFILE1, '<', $inputfile1 or die "Used/Not found :$!\n";
open my $INFILE2, '<', $inputfile2 or die "Used/Not found :$!\n";
open my $OUTFILE, '>', $outputfile or die "Used/Not found :$!\n";
my $infile2_pos = tell $INFILE2;
while (<$INFILE1>) {
s/"//g;
my @elements = split /;/, $_;
seek $INFILE2, $infile2_pos, 0;
while (<$INFILE2>) {
s/"//g;
my @loopelements = split /;/, $_;
#### The problem part ####
if (($elements[2] =~ /\&\#\d{3}\;/g) and (($elements[2]) eq ($loopelements[0]))){
$elements[2] =~ s/(\&\#\d{3}\;)/$loopelements[1]/g;
print "$2. elements[2]\n";
}
#### End problem part #####
}
my $output_line = join(";", @elements);
print $OUTFILE $output_line;
#print "\n"
}
close $INFILE1;
close $INFILE2;
close $OUTFILE;
exit 0;
Assuming your character codes are standard Unicode entities, you are better off using HTML::Entities to decode them.
This program processes the data you show in your first file and ignores the second file completely. The output seems to be what you want.
use strict;
use warnings;
use HTML::Entities 'decode_entities';
binmode STDOUT, ":utf8";
while (<DATA>) {
print decode_entities($_);
}
__DATA__
1;2;bla&#322;blabla &#261;bla;7;8
3;4;bl&#261;blabla;9;10
2;3;blablabla&#261;a&#322;8;9
output
1;2;blałblabla ąbla;7;8
3;4;bląblabla;9;10
2;3;blablablaąał8;9
You split each line into @elements at every occurrence of ;, and split removes the delimiter. The semicolon is therefore no longer in your data, so the semicolon in your regexp can never match and no substitutions are done.
Anyway, using seek is somewhat disturbing for me. As you have a reasonable number of replacement codes (<5000), you might consider putting them into a hash:
my %subst;
while(<$INFILE2>){
/^&#(\d{3});;(.*)\n/;
$subst{$1} = $2;
}
Then we can do:
while(<$INFILE1>){
s| & \# (\d{3}) | $subst{$1} // "&#$1" |egx; # the # must be escaped under /x, or it starts a comment
# (don't try to concat undef
# when no substitution for our code is defined)
print $OUTFILE $_;
}
We do not have to split the lines or treat them as CSV data if the replacement should occur everywhere in INFILE1. My solution should speed things up a bit (INFILE2 is parsed only once). Here I assumed your input data is correct and the number codes are terminated not by a semicolon but by their length. You might want to remove the semicolon from your regexes (i.e. m/&#\d{3}/).
If you have trouble with character encodings, you might want to open your files with :utf8 and/or use Encode or similar.
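Put together, the hash approach might look like the sketch below. It assumes the mapping lines have the form &#NNN;;X (three-digit code, separator semicolon, character) and, unlike the snippet above, it also consumes an optional semicolon after the code. The sub names load_map and decode_line are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: build a code => character hash from mapping lines
# such as "&#322;;\x{142}".
sub load_map {
    my %subst;
    for my $entry (@_) {
        $subst{$1} = $2 if $entry =~ /^&#(\d{3});;(.*)$/;
    }
    return %subst;
}

# Hypothetical helper: replace every known &#NNN; in a line and leave
# unknown codes untouched.
sub decode_line {
    my ($line, $subst) = @_;
    $line =~ s/&#(\d{3})(;?)/exists $subst->{$1} ? $subst->{$1} : "&#$1$2"/ge;
    return $line;
}

my %subst = load_map("&#322;;\x{142}", "&#261;;\x{105}");
binmode STDOUT, ':encoding(UTF-8)';
print decode_line("3;4;bl&#261;blabla;9;10\n", \%subst);
```

Because the mapping file is read once into the hash, no seek back to the start of the second file is needed for each input line.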
I have several text files, that were once tables in a database, which is now disassembled. I'm trying to reassemble them, which will be easy, once I get them into a usable form. The first file, "keys.text" is just a list of labels, inconsistently formatted. Like:
Sa 1 #
Sa 2
U 328 #*
It's always letter(s), [space], number(s), [space], and sometime symbol(s). The text files that match these keys are the same, then followed by a line of text, also separated, or delimited, by a SPACE.
Sa 1 # Random line of text follows.
Sa 2 This text is just as random.
U 328 #* Continuing text...
What I'm trying to do in the code below, is match the key from "keys.text", with the same key in the .txt files, and put a tab between the key, and the text. I'm sure I'm overlooking something very basic, but the result I'm getting, looks identical to the source .txt file.
Thanks in advance for any leads or assistance!
#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;
open(IN1, "keys.text");
my $key;
# Read each line one at a time
while ($key = <IN1>) {
# For each txt file in the current directory
foreach my $file (<*.txt>) {
open(IN, $file) or die("Cannot open TXT file for reading: $!");
open(OUT, ">temp.txt") or die("Cannot open output file: $!");
# Add temp modified file into directory
my $newFilename = "modified\/keyed_" . $file;
my $line;
# Read each line one at a time
while ($line = <IN>) {
$line =~ s/"\$key"/"\$key" . "\/t"/;
print(OUT "$line");
}
rename("temp.txt", "$newFilename");
}
}
EDIT: Just to clarify, the results should retain the symbols from the keys as well, if there are any. So they'd look like:
Sa 1 # Random line of text follows.
Sa 2 This text is just as random.
U 328 #* Continuing text...
The regex seems quoted rather oddly to me. Wouldn't
$line =~ s/$key/$key\t/;
work better?
Also, IIRC, <IN1> will leave the newline on the end of your $key. chomp $key to get rid of that.
And don't put parentheses around your print args, esp when you're writing to a file handle. It looks wrong, whether it is or not, and distracts people from the real problems.
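One caveat with interpolating $key directly: keys such as "U 328 #*" contain regex metacharacters, so it is safer to quote them with \Q...\E. A small sketch of the substitution on its own; the sub name add_tab is invented here, and replacing the whitespace after the key with the TAB is an assumption about the desired output:

```perl
use strict;
use warnings;

# Hypothetical helper: put a TAB between $key and the text when $line
# starts with the key followed by whitespace.
sub add_tab {
    my ($line, $key) = @_;
    chomp $key;                           # keys read from a file still carry "\n"
    $line =~ s/^\Q$key\E\s+/$key\t/;      # \Q...\E treats the key as literal text
    return $line;
}

print add_tab("U 328 #* Continuing text...\n", "U 328 #*\n");
```

Requiring whitespace after the key also keeps a key like "Sa 2" from matching a line that starts with "Sa 20".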
if Perl is not a must, you can use this awk one liner
$ cat keys.txt
Sa 1 #
Sa 2
U 328 #*
$ cat mytext.txt
Sa 1 # Random line of text follows.
Sa 2 This text is just as random.
U 328 #* Continuing text...
$ awk 'FNR==NR{ k[$1 SEP $2];next }($1 SEP $2 in k) {$2=$2"\t"}1 ' keys.txt mytext.txt
Sa 1 # Random line of text follows.
Sa 2 This text is just as random.
U 328 #* Continuing text...
Using split rather than s/// makes the problem straightforward. In the code below, read_keys extracts the keys from keys.text and records them in a hash.
Then for all files named on the command line, available in the special Perl array @ARGV, we inspect each line to see whether it begins with a key. If not, we leave it alone, but otherwise insert a TAB between the key and the text.
Note that we edit the files in-place thanks to Perl's handy -i option:
-i[extension]
specifies that files processed by the <> construct are to be edited in-place. It does this by renaming the input file, opening the output file by the original name, and selecting that output file as the default for print statements. The extension, if supplied, is used to modify the name of the old file to make a backup copy …
The line split " ", $_, 3 separates the current line into exactly three fields. This is necessary to protect whitespace that's likely to be present in the text portion of the line.
#! /usr/bin/perl -i.bak
use warnings;
use strict;
sub usage { "Usage: $0 text-file\n" }
sub read_keys {
my $path = "keys.text";
open my $fh, "<", $path
or die "$0: open $path: $!";
my %key;
while (<$fh>) {
my($text,$num) = split;
++$key{$text}{$num} if defined $text && defined $num;
}
wantarray ? %key : \%key;
}
die usage unless @ARGV;
my %key = read_keys;
while (<>) {
my($text,$num,$line) = split " ", $_, 3;
$_ = "$text $num\t$line" if defined $text &&
defined $num &&
$key{$text}{$num};
print;
}
Sample run:
$ ./add-tab input
$ diff -u input.bak input
--- input.bak 2010-07-20 20:47:38.688916978 -0500
+++ input 2010-07-20 21:00:21.119531937 -0500
@@ -1,3 +1,3 @@
-Sa 1 # Random line of text follows.
-Sa 2 This text is just as random.
-U 328 #* Continuing text...
+Sa 1 # Random line of text follows.
+Sa 2 This text is just as random.
+U 328 #* Continuing text...
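To see what the three-argument split does on its own, here is a tiny illustration. The sub name split_key_line is made up, the sample line is taken from the question, and nothing here touches files:

```perl
use strict;
use warnings;

# Hypothetical helper: split a line into at most three fields on whitespace.
# With a limit of 3, everything after the second field stays in one piece,
# inner spaces and trailing newline included.
sub split_key_line {
    my ($line) = @_;
    return split " ", $line, 3;
}

my ($text, $num, $rest) = split_key_line("Sa 2 This text is just as random.\n");
print "text=[$text] num=[$num]\n";   # prints text=[Sa] num=[2]
print "rest=$rest";                  # prints rest=This text is just as random.
```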
Fun answers:
$line =~ s/(?<=$key)/\t/;
Where (?<=XXXX) is a zero-width positive lookbehind for XXXX. That means it matches just after XXXX without being part of the match that gets substituted.
And:
$line =~ s/$key/$key . "\t"/e;
Where the /e flag at the end means to do one eval of what's in the second half of the s/// before filling it in.
Important note: I'm not recommending either of these, they obfuscate the program. But they're interesting. :-)
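For the curious, both variants can be tried side by side. This is just a sketch with invented sub names (tab_lookbehind, tab_eval); the \Q...\E quoting is an addition not present in the snippets above, there to keep keys with metacharacters literal:

```perl
use strict;
use warnings;

# Hypothetical helper: zero-width lookbehind, so only the empty spot
# right after the key is replaced by a TAB.
sub tab_lookbehind {
    my ($line, $key) = @_;
    $line =~ s/(?<=\Q$key\E)/\t/;
    return $line;
}

# Hypothetical helper: /e evaluates the replacement as Perl code.
sub tab_eval {
    my ($line, $key) = @_;
    $line =~ s/\Q$key\E/$key . "\t"/e;
    return $line;
}

print tab_lookbehind("Sa 2 some text\n", "Sa 2");
print tab_eval("Sa 2 some text\n", "Sa 2");
```

Both calls produce the same line, with the TAB inserted directly after the key and the original space left in place.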
How about doing two separate slurps of each file. For the first file you open the keys and create a preliminary hash. For the second file then all you need to do is add the text to the hash.
use strict;
use warnings;
my $keys_file = "path to keys.txt";
my $content_file = "path to content.txt";
my $output_file = "path to output.txt";
my %hash = ();
my $keys_regex = '^([a-zA-Z]+)\s*(\d+)\s*([^\da-zA-Z\s]*)';
open my $fh, '<', $keys_file or die "could not open $keys_file";
while(<$fh>){
my $line = $_;
if ($line =~ /$keys_regex/){
my $key = $1;
my $number = $2;
my $symbol = $3;
$hash{$key}{'number'} = $number;
$hash{$key}{'symbol'} = $symbol;
}
}
close $fh;
open $fh, '<', $content_file or die "could not open $content_file";
while(<$fh>){
my $line = $_;
if ($line =~ /^([a-zA-Z]+)/){
my $key = $1;
# strip the key, number and symbols from the content_file line to leave the text
$line =~ s/^$key//;
$line =~ s/\s*\Q$hash{$key}{'number'}\E//;
$line =~ s/\s*\Q$hash{$key}{'symbol'}\E//;
$line =~ s/^\s+//g;
$hash{$key}{'text'} = $line;
}
}
close $fh;
open $fh, '>', $output_file or die "could not open $output_file";
for my $key (keys %hash){
print $fh $key . " " . $hash{$key}{'number'} . " " . $hash{$key}{'symbol'} . "\t" . $hash{$key}{'text'} . "\n";
}
close $fh;
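I haven't had a chance to test it yet, and the solution seems a little hacky with all the regexes, but it might give you an idea of something else you can try. Note that the hash is keyed on the letters only, so keys that share their letters (Sa 1 and Sa 2) will overwrite each other; keying on letters plus number would avoid that.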
This looks like the perfect place for the map function in Perl! Read the entire text file into an array, then apply map across the entire array. The only other thing you might want to do is use the quotemeta function to escape any regular expression metacharacters in your keys.
Using map keeps the code short. I also read the keys into an array so that I don't have to keep opening and closing the keys file in my loop. Every key is tried against every line, so the cost is O(keys × lines), but if your files aren't that big, it shouldn't be too bad.
#! /usr/bin/env perl
use strict;
use warnings;
open (KEYS, "keys.text")
    or die "Cannot open 'keys.text' for reading\n";
my @keys = <KEYS>;
close (KEYS);
chomp @keys;
foreach my $file (glob("*.txt")) {
    open (TEXT, "$file")
        or die "Cannot open '$file' for reading\n";
    my @textArray = <TEXT>;
    close (TEXT);
    chomp @textArray;    # newlines are added back when writing
    foreach my $key (@keys) {
        my $quoted = quotemeta $key;    # escape regex metacharacters such as * in the key
        map($_ =~ s/^$quoted/$key\t/, @textArray);
    }
    open (NEW_TEXT, ">$file.new") or
        die qq(Can't open file "$file.new" for writing\n);
    print NEW_TEXT join("\n", @textArray) . "\n";
    close (NEW_TEXT);
}