How to filter columns from CSV file based on names of columns - perl

I am using the CSV data like below. I don't want to use user and timestamp from csv file. I may add few columns or delete columns.
I didnt find the any suitable method in Text CSV.
Please let me know if any method or module is available
UniqueId, Name, description, user,timestamp
1,jana,testing,janardar,12-10-2018:00:
sub _filter_common_columns_from_csv{
my $csvfile = shift;
my $CSV = Text::CSV_XS->new(
{
binary => 1,
auto_diag => 3,
allow_quotes => 0,
eol => $/
});
my $_columns ||= do {
open(my $fh, '<', $csvfile) or die $!;
my #cols = #{ $CSV->getline($fh) };
close $fh or die $!;
for (#cols) { s/^\s+//; s/\s+$//; }
\#cols;
};
my #columns = #{ $_columns };
my %deleted;
my #regexes = qw(user timestamp);
foreach my $regex (#regexes) {
foreach my $i (0 .. ($#columns - 1)) {
my $col = $columns[$i];
$deleted{$i} = $col if $col =~ /$regex/;
}
}
my #wanted_columns = grep { !$deleted{$_} } 0 .. $#columns - 1;
my $input_temp = "$ENV{HOME}/output/temp_test.csv";
open my $tem, ">",$input_temp or die "$input_temp: $!";
open(my $fh, '<', $csvfile) or die $!;
while (my $row = $CSV->getline($fh)) {
my #fields = #$row;
$CSV->print($tem, [ #fields[#wanted_columns] ]) or $CSV->error_diag;
}
close $fh or die $!;
close $tem or die $!;
return $input_temp;
}

See getline_hr
use warnings;
use strict;
use feature 'say';
use List::MoreUtils qw(any);
use Text::CSV;
my $file = shift #ARGV || die "Usage: $0 filename\n";
my #exclude_cols = qw(user timestamp);
my $csv = Text::CSV->new ( { binary => 1 } )
or die "Cannot use CSV: ".Text::CSV->error_diag ();
open my $fh, '<', $file or die "Can't open $file: $!";
my #cols = #{ $csv->getline($fh) };
my #wanted_cols = grep {
my $name = $_;
not any { $name eq $_ } #exclude_cols;
} #cols;
my $row = {};
$csv->bind_columns (\#{$row}{#cols});
while ($csv->getline($fh)) {
my #wanted_fields = #$row{ #wanted_cols };
say "#wanted_fields";
}
The syntax #$row{#wanted_cols} is for a hash slice, which returns a list of values for the keys in #wanted_cols from the hashref $row.

Actual example using Text::AutoCSV to remove given named columns from arbitrary CSV files like in your posted code (More complicated than the examples in the documentation for only writing specific columns):
#!/usr/bin/perl
use warnings;
use strict;
use Text::AutoCSV qw/remove_accents/;
sub remove_columns {
my ($infile, $outfile, $drop) = #_;
my $csv = Text::AutoCSV->new(in_file => $infile, out_file => $outfile);
# Normalize column names the same way that Text::AutoCSV does
my %drops = map { my $h = remove_accents $_;
$h =~ s/[^[:alnum:]_]//gi;
$h = uc $h;
$h => 1 } #$drop;
my #cols = grep { not exists $drops{$_} } $csv->get_fields_names;
# Hack to avoid reading the file twice.
$csv->{out_fields} = \#cols;
$csv->write();
}
remove_columns "in.csv", "out.csv", [ "user", "timestamp" ];

If you want to modify your CSV in other ways, too, and if SQL would be convenient for those modifications, then consider using DBD::CSV.
You can then open a database handle on your CSV file, select the desired columns with a SELECT query, and write the results with Text::CSV or Text::CSV_XS.
For more details, see the DBD::CSV documentation or e.g. this simple wrapper script for querying CSV files.

Related

Compare two files and write matching data from first file using perl

First file
FirstName:LastName:Location:Country:ID
FirstName1:LastName1:Location1:Country1:ID1
FirstName2:LastName2:Location2:Country2:ID2
FirstName3:LastName3:Location3:Country3:ID3
FirstName4:LastName4:Location4:Country4:ID4
Second file
FirstName:LastName:Location:Country:Old_ID
FirstName2:LastName2:Location2:Country2:Old_ID2
FirstName4:LastName4:Location4:Country4:Old_ID4
Have to compare first and second file and print matching rows with data from first file which is have new ID's.
Below script fetches me Old_ID's from second file and not the new ones from first file
use warnings;
use strict;
my $details = 'file2.txt';
my $old_details = 'file1.txt';
my %names;
open my $data, '<', $details or die $!;
while (<$data>)
{
my ($name, #ids) = split;
push #{ $names{$_} }, $name for #ids;
}
open my $old_data, '<', $old_details or die $!;
while (<$old_data>)
{
chomp;
print #{ $names{$_} // [$_] }, "\n";
}
Output:
FirstName:LastName:Location:Country:Old_ID
FirstName2:LastName2:Location2:Country2:Old_ID2
FirstName4:LastName4:Location4:Country4:Old_ID4
Expected output:
FirstName:LastName:Location:Country:ID
FirstName2:LastName2:Location2:Country2:ID2
FirstName4:LastName4:Location4:Country4:ID4
Just try this way:
use strict; # Use strict Pragma
use warnings;
my ($file1, $filecnt1, $file2, $filecnt2) = ""; #Declaring variables
$file1 = "a1.txt"; $file2 = "b1.txt"; #Sample files
readFileinString($file1, \$filecnt1); # Reading first file
readFileinString($file2, \$filecnt2); # Reading second file
$filecnt2=~s/\:Old\_ID/\:ID/g; # Replacing that difference content
my #firstfle = split "\n", $filecnt1; # Move content to array variable to compare
my #secndfle = split "\n", $filecnt2;
my %firstfle = map { $_ => 1 } #firstfle; #Mapping the array into hash variable
my #scdcmp = grep { $firstfle{$_} } #secndfle;
print join "\n", #scdcmp;
#---------------> File reading
sub readFileinString
#--------------->
{
my $File = shift;
my $string = shift;
open(FILE1, "<$File") or die "\nFailed Reading File: [$File]\n\tReason: $!";
read(FILE1, $$string, -s $File, 0);
close(FILE1);
}
#---------------> File Writing
sub writeFileinString
#--------------->
{
my $File = shift;
my $string = shift;
my #cDir = split(/\\/, $File);
my $tmp = "";
for(my $i = 0; $i < $#cDir; $i++)
{
$tmp = $tmp . "$cDir[$i]\\";
mkdir "$tmp";
}
if(-f $File){
unlink($File);
}
open(FILE, ">$File") or die "\n\nFailed File Open for Writing: [$File]\n\nReason: $!\n";
print FILE $$string;
close(FILE);
}

Perl - need to store the column values into hash

I want to create a hash with column header as hash key and column values as hash values in Perl.
For example, if my csv file has the following data:
A,B,C,D,E
1,2,3,4,5
6,7,8,9,10
11,12,13,14,15 ...
I want to create a hash as follows:
A=> 1,6,11
B=>2,7,12
c=>3,8,13 ...
So that just by using the header name I can use that column values.
Is there a way in PERL to do this? Please help me.
I was able to store required column values as array using the following script
use strict;
use warnings;
open( IN, "sample.csv" ) or die("Unable to open file");
my $wanted_column = "A";
my #cells;
my #colvalues;
my $header = <IN>;
my #column_names = split( ",", $header );
my $extract_col = 0;
for my $header_line (#column_names) {
last if $header_line =~ m/$wanted_column/;
$extract_col++;
}
while ( my $row = <IN> ) {
last unless $row =~ /\S/;
chomp $row;
#cells = split( ",", $row );
push( #colvalues, $cells[$extract_col] );
}
my $sizeofarray = scalar #colvalues;
print "Size of the coulmn= $sizeofarray";
But I want to do this to all my column.I guess Hash of arrays will be the best solution but I dont know how to implement it.
Text::CSV is a useful helper module for this sort of thing.
use strict;
use warnings;
use Text::CSV;
use Data::Dumper;
my %combined;
open( my $input, "<", "sample.csv" ) or die("Unable to open file");
my $csv = Text::CSV->new( { binary => 1 } );
my #headers = #{ $csv->getline($input) };
while ( my $row = $csv->getline($input) ) {
for my $header (#headers) {
push( #{ $combined{$header} }, shift(#$row) );
}
}
print Dumper \%combined;
Since you requested without a module - you can use split but you need to bear in mind the limitations. CSV format allows for things like commas nested in quotes. split won't handle that case very well.
use strict;
use warnings;
use Data::Dumper;
my %combined;
open( my $input, "<", "sample.csv" ) or die("Unable to open file");
my $line = <$input>;
chomp ( $line );
my #headers = split( ',', $line );
while (<$input>) {
chomp;
my #row = split(',');
for my $header (#headers) {
push( #{ $combined{$header} }, shift(#row) );
}
}
print Dumper \%combined;
Note: Both of these will effectively ignore any extra columns that don't have headers. (And will get confused by duplicate column names).
Another solution by using for loop :
use strict;
use warnings;
my %data;
my #columns;
open (my $fh, "<", "file.csv") or die "Can't open the file : ";
while (<$fh>)
{
chomp;
my #list=split(',', $_);
for (my $i=0; $i<=$#list; $i++)
{
if ($.==1) # collect the columns, if its first line.
{
$columns[$i]=$list[$i];
}
else #collect the data, if its not the first line.
{
push #{$data{$columns[$i]}}, $list[$i];
}
}
}
foreach (#columns)
{
local $"="\,";
print "$_=>#{$data{$_}}\n";
}
Output will be like this :
A=>1,6,11
B=>2,7,12
C=>3,8,13
D=>4,9,14
E=>5,10,15

Perl - have a comma separated output , want to write that in a CSV

Here is the code:
my #col= sort keys %colnames;
print "mRNA,".join(",",#col)."\n";
foreach my $row(keys %rownames){
print "$row";
foreach my $col(#col){
my $num=$mat{$col}->{$row};
$num=~s/(\.\d\d)\d+/$1/;
print ",$num";
}
print "\n";
}
Output:
mRNA,Benzopyrene12h_replica1,Benzopyrene12h_replica2
E2F1,5.01,4.72
REV1,2.76,2.67
POLK,1.21,1.87
POLH,1.49,1.56
POLI,1.94,2.45
Please help me write this output to .csv file.
Something like this might work... Combining with Miller's answer. I didn't test it, just giving you an idea. And it's defiantly could be written more cleanly and less redundant.
use strict;
use warnings;
use autodie;
my $csvFile = Text::CSV->new ( { binary => 1, eol => "\n" } )
or die "Cannot use CSV: ".Text::CSV->error_diag ();
my #col= sort keys %colnames;
my #csv;
$csv[0][0] = "mRNA,";
my #joinCol = join(",",#col);
my $i =1;
foreach (#joinCol) {
$csv[0][$i] = $_;
$i++;
}
my $k = 1;
foreach my $row(keys %rownames){
my $j = 0;
print "$row";
$csv[$k][$j] = $row;
foreach my $col(#col){
my $num=$mat{$col}->{$row};
$num=~s/(\.\d\d)\d+/$1/;
print ",$num";
$csv[$k][$j] = $num;
$j++;
}
print "\n";
$k++;
}
open $fh, '>', "new.csv" or die "Couldn't open csv file: $! \n";
for (#csv) {
$csvFile->print($fh, $_);
}
close $fh;
To write to a CSV file, use Text::CSV
use strict;
use warnings;
use autodie;
# Your Data Initialization
my %colnames; # = Something
my %rownames; # = Something else
my %mat; # = a hash of hash
# Prepare CSV
my $csv = Text::CSV->new ( { binary => 1, eol => "\n" } )
or die "Cannot use CSV: ".Text::CSV->error_diag ();
open $fh, '>', "new.csv";
my #col = sort keys %colnames;
# Output Header
$csv->print($fh, ['mRNA', #col]);
# Output Rows
for my $row (keys %rownames){
my #data = ($row);
for my $col (#col){
my $num = $mat{$col}{$row};
$num =~ s/(\.\d\d)\d+/$1/;
push #data, $num;
}
$csv->print($fh, \#data);
}
close $fh;

Reading and comparing lines in Perl

I am having trouble with getting my perl script to work. The issue might be related to the reading of the Extract file line by line within the while loop, any help would be appreciated. There are two files
Bad file that contains a list of bad IDs (100s of IDs)
2
3
Extract that contains a delimited data with the ID in field 1 (millions of rows)
1|data|data|data
2|data|data|data
2|data|data|data
2|data|data|data
3|data|data|data
4|data|data|data
5|data|data|data
I am trying to remove all the rows from the large extract file where the IDs match. There can be multiple rows where the ID matches. The extract is sorted.
#use strict;
#use warnning;
$SourceFile = $ARGV[0];
$ToRemove = $ARGV[1];
$FieldNum = $ARGV[2];
$NewFile = $ARGV[3];
$LargeRecords = $ARGV[4];
open(INFILE, $SourceFile) or die "Can't open source file: $SourceFile \n";
open(REMOVE, $ToRemove) or die "Can't open toRemove file: $ToRemove \n";
open(OutGood, "> $NewFile") or die "Can't open good output file \n";
open(OutLarge, "> $LargeRecords") or die "Can't open Large Records output file \n";
#Read in the list of bad IDs into array
#array = <REMOVE>;
#Loop through each bad record
foreach (#array)
{
$badID = $_;
#read the extract line by line
while(<INFILE>)
{
#take the line and split it into
#fields = split /\|/, $_;
my $extractID = $fields[$FieldNum];
#print "Here's what we got: $badID and $extractID\n";
while($extractID == $badID)
{
#Write out bad large records
print OutLarge join '|', #fields;
#Get the next line in the extract file
#fields = split /\|/, <INFILE>;
my $extractID = $fields[$FieldNum];
$found = 1; #true
#print " We got a match!!";
#remove item after it has been found
my $input_remove = $badID;
#array = grep {!/$input_remove/} #array;
}
print OutGood join '|', #fields;
}
}
Try this:
$ perl -F'|' -nae 'BEGIN {while(<>){chomp; $bad{$_}++;last if eof;}} print unless $bad{$F[0]};' bad good
First, you are lucky: The number of bad IDs is small. That means, you can read the list of bad IDs once, stick them in a hash table without running into any difficulty with memory usage. Once you have them in a hash, you just read the big data file line by line, skipping output for bad IDs.
#!/usr/bin/env perl
use strict;
use warnings;
# hardwired for convenience
my $bad_id_file = 'bad.txt';
my $data_file = 'data.txt';
my $bad_ids = read_bad_ids($bad_id_file);
remove_data_with_bad_ids($data_file, $bad_ids);
sub remove_data_with_bad_ids {
my $file = shift;
my $bad = shift;
open my $in, '<', $file
or die "Cannot open '$file': $!";
while (my $line = <$in>) {
if (my ($id) = extract_id(\$line)) {
exists $bad->{ $id } or print $line;
}
}
close $in
or die "Cannot close '$file': $!";
return;
}
sub read_bad_ids {
my $file = shift;
open my $in, '<', $file
or die "Cannot open '$file': $!";
my %bad;
while (my $line = <$in>) {
if (my ($id) = extract_id(\$line)) {
$bad{ $id } = undef;
}
}
close $in
or die "Cannot close '$file': $!";
return \%bad;
}
sub extract_id {
my $string_ref = shift;
if (my ($id) = ($$string_ref =~ m{\A ([0-9]+) }x)) {
return $id;
}
return;
}
I'd use a hash as follows:
use warnings;
use strict;
my #bad = qw(2 3);
my %bad;
$bad{$_} = 1 foreach #bad;
my #file = qw (1|data|data|data 2|data|data|data 2|data|data|data 2|data|data|data 3|data|data|data 4|data|data|data 5|data|data|data);
my %hash;
foreach (#file){
my #split = split(/\|/);
$hash{$split[0]} = $_;
}
foreach (sort keys %hash){
print "$hash{$_}\n" unless exists $bad{$_};
}
Which gives:
   
1|data|data|data
4|data|data|data
5|data|data|data

Retrieve first row from CSV as headers using Text::CSV

I feel like I'm missing something rather obvious, but can't find any answers in the documentation. Still new to OOP with Perl, but I'm using Text::CSV to parse a CSV for later use.
How would I go about extracting the first row and pushing the values to array #headers?
Here's what I have so far:
#!/usr/bin/perl
use warnings;
use diagnostics;
use strict;
use Fcntl ':flock';
use Text::CSV;
my $csv = Text::CSV->new({ sep_char => ',' });
my $file = "sample.csv";
my #headers; # Column names
open(my $data, '<:encoding(utf8)', $file) or die "Could not open '$file' $!\n";
while (my $line = <$data>) {
chomp $line;
if ($csv->parse($line)) {
my $r = 0; # Increment row counter
my $field_count = $csv->fields(); # Number of fields in row
# While data exists...
while (my $fields = $csv->getline( $data )) {
# Parse row into columns
print "Row ".$r.": \n";
# If row zero, process headers
if($r==0) {
# Add value to #columns array
push(#headers,$fields->[$c]);
} else {
# do stuff with records...
}
}
$r++
}
close $data;
You'd think that there would be a way to reference the existing fields in the first row.
Pretty much straight from the documentation, for example.
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV_XS;
my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ });
my $file = 'o33.txt';
open my $io, "<", $file or die "$file: $!";
my $header = $csv->getline ($io);
print join("-", #$header), "\n\n";
while (my $row = $csv->getline ($io)) {
print join("-", #$row), "\n";
}
__END__
***contents of o33.txt
lastname,firstname,age,gender,phone
mcgee,bobby,27,M,555-555-5555
kincaid,marl,67,M,555-666-6666
hofhazards,duke,22,M,555-696-6969
Prints:
lastname-firstname-age-gender-phone
mcgee-bobby-27-M-555-555-5555
kincaid-marl-67-M-555-666-6666
hofhazards-duke-22-M-555-696-6969
Update: Thinking about your problem, it may be that you want to address the data by its column name. For that, you might be able to use something (also from the docs), like this:
$csv->column_names ($csv->getline ($io));
while (my $href = $csv->getline_hr ($io)) {
print "lastname is: ", $href->{lastname},
" and gender is: ", $href->{gender}, "\n"
}
Note: You can use Text::CSV instead of Text::CSV_XS, as the former is a wrapper around the latter.
Thought I'd post my results for others.
#!/usr/bin/perl
use warnings;
use diagnostics;
use strict;
use Text::CSV;
sub read_csv {
my $csv = Text::CSV->new({ sep_char => ',' });
my $file = shift;
open(my $data, '<:encoding(utf8)', $file) or die "Could not open '$file' $!\n";
# Process Row Zero
my $header = $csv->getline ($data);
my $field_count = $csv->fields();
# Read the rest of the file
while (my $line = <$data>) {
chomp $line;
# Read line if possible
if ($csv->parse($line)) {
my $r = 0;
# While data exists...
while (my $fields = $csv->getline( $data )) {
# Parse row into columns
print Display->H2;
print "Row ".$r.": ".#$fields." columns. \n";
# Print column values
for(my $c=0; $c<#$fields; $c++) {
print #$header[$c]." : ".#$fields[$c]."\n";
}
$r++
}
}
close $data;
}
}
Cheers