Related
How can i Split Value by Newline (\n) in some column, extract to new row and fill other column
My Example CSV Data (data.csv)
No,Email,IP,Service,Comment
1,test#email.com,192.168.10.109,FTP
HTTP
HTTPS,,
2,webmaster#email.com,192.168.10.111,SFTP
SNMP,,
3,admin#email.com,192.168.10.112,HTTP,,
In Service column has multiple value, separate by new line.
I want to extract it and fill with other value in some row look like this.
1,test#email.com,192.168.10.110,FTP,,
1,test#email.com,192.168.10.110,HTTP,,
1,test#email.com,192.168.10.110,HTTPS,,
2,webmaster#email.com,192.168.10.111,SFTP,,
2,webmaster#email.com,192.168.10.111,SNMP,,
3,admin#email.com,192.168.10.112,HTTP,,
I try to parsing with Text::CSV, I can only split multiple ip and service But i Don't known to fill other value as above example.
#!/usr/bin/perl
use Text::CSV;
my $file = "data.csv";
my #csv_value;
open my $fh, '<', $file or die "Could not open $file: $!";
my $csv = Text::CSV->new;
my $sum = 0;
open(my $data, '<:encoding(utf8)', $file) or die "Could not open '$file' $!\n";
while (my $fields = $csv->getline( $data )) {
push #csv_value, $fields;
}
close $data;
Thank you in advance for any help you can provide.
To expand on my comment
perl -ne 'if (!/^\d/){print "$line$_";} else {print $_;} /(.*,).*/; $line=$1;' file1
Use the perl command line options
e = inline command
n = implicit loop, i.e. for every line in the file do the script
Each line of the file is now in the $_ default variable
if (!/^\d/){print "$line$_";} - if the line does not start with a digit print the $line (more later) variable, followed by default variable which is the line from the file
else {print $_;} - else just print the line
Now after we've done this if the line matches anything followed by a comma followed by anything, catch it with the regex bracket so it's put in $1. So for the first line $1 will be '1,test#email.com,192.168.10.109,'
/(.*,).*/; $line=$1;
Because we do this after the first line has been printed $line will always be the previous full line.
Your input CSV is broken. I would suggest to fix the generator.
With correctly formatted input CSV you will have to enable binary option in Text::CSV as your data contains non-ASCII characters.
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
# input has non-ASCII characters
my $csv_in = Text::CSV->new({ binary => 1 });
my $csv_out = Text::CSV->new();
$csv_out->eol("\n");
while (my $row = $csv_in->getline(\*STDIN)) {
for my $protocol (split("\n", $row->[3])) {
$row->[3] = $protocol;
$csv_out->print(\*STDOUT, $row);
}
}
exit 0;
Test with fixed input data:
$ cat dummy.csv
No,Email,IP,Service,Comment
1,test#email.com,192.168.10.109,"FTP
HTTP
HTTPS",,
2,webmaster#email.com,192.168.10.111,"SFTP
SNMP",,
3,admin#email.com,192.168.10.112,HTTP,,
$ perl dummy.pl <dummy.csv
No,Email,IP,Service,Comment
1,test#email.com,192.168.10.109,FTP,,
1,test#email.com,192.168.10.109,HTTP,,
1,test#email.com,192.168.10.109,HTTPS,,
2,webmaster#email.com,192.168.10.111,SFTP,,
2,webmaster#email.com,192.168.10.111,SNMP,,
3,admin#email.com,192.168.10.112,HTTP,,
I have a Perl question. I have a file each line of this file contains a different number of As Ts Gs and Cs
The file looks like below
ATCGCTGASTGATGCTG
GCCTAGCCCTTAGC
GTTCCATGCCCATAGCCAAATAAA
I would like to add line number for each line
Then insert a \n every 6 characters and then on each of the new rows created put an
Empty space every 3 characters
Example of the output should be
Line NO 1
ATC GCT
GAS TGA
TGC TG
Line NO 2
GCC TAG
CCC TTA
GC
I have come up with the code below:
my $count = 0;
my $line;
my $row;
my $split;
open(F, "Data.txt") or die "Can't read file: $!";
open (FH, " > UpDatedData.txt") or die "Can't write new file: $!";
while (my $line = <F>) {
$count ++ ;
$row = join ("\n", ( $line =~ /.{1,6}/gs));
$split = join ("\t", ( $row =~ /.{3}/gs ));
print FH "Line NO\t$count\n$split\n";
}
close F;
close FH;
However
It gives the following out put
Line NO 1
ATC GCT
GA STG A
T GCT G
Line NO 2
GCC TAG
CC CTT A
G C
This must have something with the \n being counted as a character in this line of code
$split = join ("\t", ( $row =~ /.{3}/gs ));
Any one got any idea how to get around this problem?
Any help would be greatly appreciated.
Thanks in advance
Sinead
This should solve your problem:
use strict;
use warnings;
while (<DATA>) {
s/(.{3})(.{0,3})?/$1 $2 /g;
s/(.{7}) /$1\n/g;
printf "Line NO %d\n%s\n", $., $_;
}
__DATA__
ATCGCTGASTGATGCTG
GCCTAGCCCTTAGC
GTTCCATGCCCATAGCCAAATAAA
This is a one-liner:
perl -plwe 's/(.{3})(.{0,3})/$1 $2\n/g' data.txt
The regex looks for 3 characters (does not match newline), followed by 0-3 characters and captures both of those, then inserts a space between them and newline after.
To keep track of the line numbers, you can add
s/^/Line NO $.\n/;
Which will enumerate based on input line number. If you prefer, you can keep a simple counter, such as ++$i.
-l option will handle newlines for you.
You can also do it in two stages, like so:
perl -plwe's/.{6}\K/\n/g; s/^.{3}\K/ /gm;'
Using the \K (keep) escape sequence here to keep the matched part of the string, and then simply inserting a newline after 6 characters, and then a space 3 characters after "line beginnings", which with the /m modifier also includes newlines.
So, in short:
perl -plwe 's/.{6}\K/\n/g; s/^.{3}\K/ /gm; s/^/Line NO $.\n/;' data.txt
perl -plwe 's/(.{3})(.{0,3})/$1 $2\n/g; s/^/Line NO $.\n/;' data.txt
Another solution. Note that it uses lexical filehandles and three argument form of open.
#!/usr/bin/perl
use warnings;
use strict;
open my $IN, '<', 'Data.txt' or die "Can't read file: $!";
open my $OUT, '>', 'UpDatedData.txt' or die "Can't write new file: $!";
my $count = 0;
while (my $line = <$IN>) {
chomp $line;
$line =~ s/(...)(...)/$1 $2\n/g; # Create pairs of triples
$line =~ s/(\S\S\S)(\S{1,2})$/$1 $2\n/; # A triple plus something at the end.
$line .= "\n" if $line !~ /\n$/; # A triple or less at the end.
$count++;
print $OUT "Line NO\t$count\n$line\n";
}
close $OUT;
Earlier I was working on a loop within a loop and if a match was made it would replace the entire string from the second loop file. Now i have a slightly different situation. I'm trying to replace a substring from the first loop with a string from the second loop. They're both csv files and semicolon delimited. What i'm trying to replace are special characters: from the numerical code to the character itself The first file looks like:
1;2;blałblabla ąbla;7;8
3;4;bląblabla;9;10
2;3;blablablaąał8;9
and the second file has the numerical code and the corresponding character:
Ą;Ą
ą;ą
Ǟ;Ǟ
Á;Á
á;á
Â;Â
ł;ł
The first semicolon in the second file belongs to the numerical code of the corresponding character and should not be used to split the file. The result should be:
1;2;blałblabla ąbla;7;8
3;4;bląblabla;9;10
2;3;blablablaąał;8;9
This is the code I have. How can i fix this?
use strict;
use warnings;
my $inputfile1 = shift || die "input/output!\n";
my $inputfile2 = shift || die "input/output!\n";
my $outputfile = shift || die "output!\n";
open my $INFILE1, '<', $inputfile1 or die "Used/Not found :$!\n";
open my $INFILE2, '<', $inputfile2 or die "Used/Not found :$!\n";
open my $OUTFILE, '>', $outputfile or die "Used/Not found :$!\n";
my $infile2_pos = tell $INFILE2;
while (<$INFILE1>) {
s/"//g;
my #elements = split /;/, $_;
seek $INFILE2, $infile2_pos, 0;
while (<$INFILE2>) {
s/"//g;
my #loopelements = split /;/, $_;
#### The problem part ####
if (($elements[2] =~ /\&\#\d{3}\;/g) and (($elements[2]) eq ($loopelements[0]))){
$elements[2] =~ s/(\&\#\d{3}\;)/$loopelements[1]/g;
print "$2. elements[2]\n";
}
#### End problem part #####
}
my $output_line = join(";", #elements);
print $OUTFILE $output_line;
#print "\n"
}
close $INFILE1;
close $INFILE2;
close $OUTFILE;
exit 0;
Assuming your character codes are standard Unicode entities, you are better off using HTML::Entities to decode them.
This program processes the data you show in your first file and ignores the second file completely. The output seems to be what you want.
use strict;
use warnings;
use HTML::Entities 'decode_entities';
binmode STDOUT, ":utf8";
while (<DATA>) {
print decode_entities($_);
}
__DATA__
1;2;blałblabla ąbla;7;8
3;4;bląblabla;9;10
2;3;blablablaąał8;9
output
1;2;blałblabla ąbla;7;8
3;4;bląblabla;9;10
2;3;blablablaąał8;9
You split your #elements at every occurrence of ;, which is then removed. You will not find it in your data, the semicolon in your Regexp can never match, so no substitutions are done.
Anyway, using seek is somewhat disturbing for me. As you have a reasonable number of replacement codes (<5000), you might consider putting them into a hash:
my %subst;
while(<$INFILE2>){
/^&#(\d{3});;(.*)\n/;
$subst{$1} = $2;
}
Then we can do:
while(<$INFILE1>){
s| &# (\d{3}) | $subst{$1} // "&#$1" |egx;
# (don't try to concat undef
# when no substitution for our code is defined)
print $OUTFILE $_;
}
We do not have to split the files or view them as CSV data if replacement should occur everywhere in INFILE1. My solution should speed things up a bit (parsing INFILE2 only once). Here I assumed your input data is correct and the number codes are not terminated by a semicolon but by length. You might want to remove that from your Regexes.(i.e. m/&#\d{3}/)
If you have trouble with character encodings, you might want to open your files with :uft8 and/or use Encode or similar.
I have several text files, that were once tables in a database, which is now disassembled. I'm trying to reassemble them, which will be easy, once I get them into a usable form. The first file, "keys.text" is just a list of labels, inconsistently formatted. Like:
Sa 1 #
Sa 2
U 328 #*
It's always letter(s), [space], number(s), [space], and sometime symbol(s). The text files that match these keys are the same, then followed by a line of text, also separated, or delimited, by a SPACE.
Sa 1 # Random line of text follows.
Sa 2 This text is just as random.
U 328 #* Continuing text...
What I'm trying to do in the code below, is match the key from "keys.text", with the same key in the .txt files, and put a tab between the key, and the text. I'm sure I'm overlooking something very basic, but the result I'm getting, looks identical to the source .txt file.
Thanks in advance for any leads or assistance!
#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;
open(IN1, "keys.text");
my $key;
# Read each line one at a time
while ($key = <IN1>) {
# For each txt file in the current directory
foreach my $file (<*.txt>) {
open(IN, $file) or die("Cannot open TXT file for reading: $!");
open(OUT, ">temp.txt") or die("Cannot open output file: $!");
# Add temp modified file into directory
my $newFilename = "modified\/keyed_" . $file;
my $line;
# Read each line one at a time
while ($line = <IN>) {
$line =~ s/"\$key"/"\$key" . "\/t"/;
print(OUT "$line");
}
rename("temp.txt", "$newFilename");
}
}
EDIT: Just to clarify, the results should retain the symbols from the keys as well, if there are any. So they'd look like:
Sa 1 # Random line of text follows.
Sa 2 This text is just as random.
U 328 #* Continuing text...
The regex seems quoted rather oddly to me. Wouldn't
$line =~ s/$key/$key\t/;
work better?
Also, IIRC, <IN1> will leave the newline on the end of your $key. chomp $key to get rid of that.
And don't put parentheses around your print args, esp when you're writing to a file handle. It looks wrong, whether it is or not, and distracts people from the real problems.
if Perl is not a must, you can use this awk one liner
$ cat keys.txt
Sa 1 #
Sa 2
U 328 #*
$ cat mytext.txt
Sa 1 # Random line of text follows.
Sa 2 This text is just as random.
U 328 #* Continuing text...
$ awk 'FNR==NR{ k[$1 SEP $2];next }($1 SEP $2 in k) {$2=$2"\t"}1 ' keys.txt mytext.txt
Sa 1 # Random line of text follows.
Sa 2 This text is just as random.
U 328 #* Continuing text...
Using split rather than s/// makes the problem straightforward. In the code below, read_keys extracts the keys from keys.text and records them in a hash.
Then for all files named on the command line, available in the special Perl array #ARGV, we inspect each line to see whether it begins with a key. If not, we leave it alone, but otherwise insert a TAB between the key and the text.
Note that we edit the files in-place thanks to Perl's handy -i option:
-i[extension]
specifies that files processed by the <> construct are to be edited in-place. It does this by renaming the input file, opening the output file by the original name, and selecting that output file as the default for print statements. The extension, if supplied, is used to modify the name of the old file to make a backup copy …
The line split " ", $_, 3 separates the current line into exactly three fields. This is necessary to protect whitespace that's likely to be present in the text portion of the line.
#! /usr/bin/perl -i.bak
use warnings;
use strict;
sub usage { "Usage: $0 text-file\n" }
sub read_keys {
my $path = "keys.text";
open my $fh, "<", $path
or die "$0: open $path: $!";
my %key;
while (<$fh>) {
my($text,$num) = split;
++$key{$text}{$num} if defined $text && defined $num;
}
wantarray ? %key : \%key;
}
die usage unless #ARGV;
my %key = read_keys;
while (<>) {
my($text,$num,$line) = split " ", $_, 3;
$_ = "$text $num\t$line" if defined $text &&
defined $num &&
$key{$text}{$num};
print;
}
Sample run:
$ ./add-tab input
$ diff -u input.bak input
--- input.bak 2010-07-20 20:47:38.688916978 -0500
+++ input 2010-07-20 21:00:21.119531937 -0500
## -1,3 +1,3 ##
-Sa 1 # Random line of text follows.
-Sa 2 This text is just as random.
-U 328 #* Continuing text...
+Sa 1 # Random line of text follows.
+Sa 2 This text is just as random.
+U 328 #* Continuing text...
Fun answers:
$line =~ s/(?<=$key)/\t/;
Where (?<=XXXX) is a zero-width positive lookbehind for XXXX. That means it matches just after XXXX without being part of the match that gets substituted.
And:
$line =~ s/$key/$key . "\t"/e;
Where the /e flag at the end means to do one eval of what's in the second half of the s/// before filling it in.
Important note: I'm not recommending either of these, they obfuscate the program. But they're interesting. :-)
How about doing two separate slurps of each file. For the first file you open the keys and create a preliminary hash. For the second file then all you need to do is add the text to the hash.
use strict;
use warnings;
my $keys_file = "path to keys.txt";
my $content_file = "path to content.txt";
my $output_file = "path to output.txt";
my %hash = ();
my $keys_regex = '^([a-zA-Z]+)\s*\(d+)\s*([^\da-zA-Z\s]+)';
open my $fh, '<', $keys_file or die "could not open $key_file";
while(<$fh>){
my $line = $_;
if ($line =~ /$keys_regex/){
my $key = $1;
my $number = $2;
my $symbol = $3;
$hash{$key}{'number'} = $number;
$hash{$key}{'symbol'} = $symbol;
}
}
close $fh;
open my $fh, '<', $content_file or die "could not open $content_file";
while(<$fh>){
my $line = $_;
if ($line =~ /^([a-zA-Z]+)/){
my $key = $1;
// strip content_file line from keys/number/symbols to leave text
line =~ s/^$key//;
line =~ s/\s*$hash{$key}{'number'}//;
line =~ s/\s*$hash{$key}{'symbol'}//;
$line =~ s/^\s+//g;
$hash{$key}{'text'} = $line;
}
}
close $fh;
open my $fh, '>', $output_file or die "could not open $output_file";
for my $key (keys %hash){
print $fh $key . " " . $hash{$key}{'number'} . " " . $hash{$key}{'symbol'} . "\t" . $hash{$key}{'text'} . "\n";
}
close $fh;
I haven't had a chance to test it yet and the solution seems a little hacky with all the regex but might give you an idea of something else you can try.
This looks like the perfect place for the map function in Perl! Read in the entire text file into an array, then apply the map function across the entire array. The only other thing you might want to do is use the quotemeta function to escape out any possible regular expressions in your keys.
Using map is very efficient. I also read the keys into an array in order to not have to keep opening and closing the keys file in my loop. It's an O^2 algorithm, but if your keys aren't that big, it shouldn't be too bad.
#! /usr/bin/env perl
use strict;
use vars;
use warnings;
open (KEYS, "keys.text")
or die "Cannot open 'keys.text' for reading\n";
my #keys = <KEYS>;
close (KEYS);
foreach my $file (glob("*.txt")) {
open (TEXT, "$file")
or die "Cannot open '$file' for reading\n";
my #textArray = <TEXT>;
close (TEXT);
foreach my $line (#keys) {
chomp $line;
map($_ =~ s/^$line/$line\t/, #textArray);
}
open (NEW_TEXT, ">$file.new") or
die qq(Can't open file "$file" for writing\n);
print TEXT join("\n", #textArray) . "\n";
close (TEXT);
}
I need to edit file , the main issue is to append text between two known lines in the file
for example I need to append the following text
a b c d e f
1 2 3 4 5 6
bla bla
Between the first_line and the second_line
first_line=")"
second_line="NIC Hr_Nic ("
remark: first_line and second_line argument can get any line or string
How to do this by perl ? ( i write bash script and I need to insert the perl syntax in my script)
lidia
You could read the file in as a single string and then use a regular expression to do the search and replace:
use strict;
use warnings;
# Slurp file myfile.txt into a single string
open(FILE,"myfile.txt") || die "Can't open file: $!";
undef $/;
my $file = <FILE>;
# Set strings to find and insert
my $first_line = ")";
my $second_line = "NIC Hr_Nic (";
my $insert = "hello world";
# Insert our text
$file =~ s/\Q$first_line\E\n\Q$second_line\E/$first_line\n$insert\n$second_line/;
# Write output to output.txt
open(OUTPUT,">output.txt") || die "Can't open file: $!";
print OUTPUT $file;
close(OUTPUT);
By unsetting $/ we put Perl into "slurp mode" and so can easily read the whole file into $file.
We use the s/// operator to do a search and replace using our two search lines as a pattern.
The \Q and \E tell Perl to escape the strings between them, i.e. to ignore any special characters that happen to be in $first_line or $second_line.
You could always write the output over the original file if desired.
The problem as you state it is not solvable using the -i command line option since this option processes a file one line at a time; to insert text between two specific lines you'll need to know about two lines at once.
Well to concenate strings you do
my $text = $first_line . $second_line;
or
my $text = $first_line;
$text .= $second_line;
I'm not sure if I understand your question correctly. A "before and after" example of the file content would, I think, be easier. Anyhow, Here's my take on it, using splice instead of a regular expression. We must of course know the line numbers for this to work.
Load the file into an array:
my #lines;
open F, '<', 'filename' or die $!;
push #lines, $_ for <F>;
close F;
Insert the stuff (see perldoc -f splice):
splice #lines, 1, 0, ('stuff');
..and you're done. All you need to do now is save the array again:
open F, '>', 'filename' or die $!;
print F #lines;
close F;