Relative Record Separator in Perl - perl

I have data that looks like this:
id:40108689 --
chr22_scrambled_bysegments:10762459:F : chr22:17852459:F (1.0),
id:40108116 --
chr22_scrambled_bysegments:25375481:F : chr22_scrambled_bysegments:25375481:F (1.0),
chr22_scrambled_bysegments:25375481:F : chr22:19380919:F (1.0),
id:1 --
chr22:21133765:F : chr22:21133765:F (0.0),
So each record is introduced by a line of the form id:[some number] --
How can I read the data into a hash of arrays like this:
$VAR = { 'id:40108689' => ['chr22_scrambled_bysegments:10762459:F : chr22:17852459:F (1.0),'],
'id:40108116' => ['chr22_scrambled_bysegments:25375481:F : chr22_scrambled_bysegments:25375481:F (1.0),',
'chr22_scrambled_bysegments:25375481:F : chr22:19380919:F (1.0),'],
#...etc
}
I tried to approach this using the record separator, but I'm not sure how to generalize it:
{
local $/ = " --\n"; # How to include variable content id:[number] ?
while ($content = <INFILE>) {
chomp $content;
print "$content\n" if $content; # Skip empty records
}
}

my $result = {};
my $last_id;
while (my $line = <INFILE>) {
if ($line =~ /(id:\d+) --/) {
$last_id = $1;
next;
}
next unless $last_id; # Just in case the file doesn't start with an id line
push @{ $result->{$last_id} }, $line;
}
use Data::Dumper;
print Dumper $result;
This uses the normal record separator.
$last_id keeps track of the most recent id line and is updated whenever a new one is encountered. Every non-id line is pushed onto the array stored under the hash key of the last matched id line.
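The same idea can be tried without an input file. The following is a minimal sketch, assuming the sample data from the question is held in a string and read through an in-memory filehandle; the names %records and $in are illustrative and not from the original answer.

```perl
use strict;
use warnings;

# Sample input taken from the question, read through an in-memory filehandle.
my $data = <<'END';
id:40108689 --
chr22_scrambled_bysegments:10762459:F : chr22:17852459:F (1.0),
id:40108116 --
chr22_scrambled_bysegments:25375481:F : chr22_scrambled_bysegments:25375481:F (1.0),
chr22_scrambled_bysegments:25375481:F : chr22:19380919:F (1.0),
id:1 --
chr22:21133765:F : chr22:21133765:F (0.0),
END

open my $in, '<', \$data or die "Cannot open in-memory file: $!";

my %records;    # id => array ref of the lines that follow it
my $last_id;
while ( my $line = <$in> ) {
    chomp $line;
    if ( $line =~ /^(id:\d+) --$/ ) {
        $last_id = $1;
        $records{$last_id} = [];    # keep ids that have no data lines
        next;
    }
    next unless defined $last_id;   # ignore anything before the first id
    push @{ $records{$last_id} }, $line;
}
close $in;

for my $id ( sort keys %records ) {
    printf "%s has %d line(s)\n", $id, scalar @{ $records{$id} };
}
```

Testing with defined rather than plain truth is a small design choice here: it keeps the loop correct even for an id that happens to evaluate to false.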

Related

Split string into a hash of hashes (perl)

At the moment I'm a little confused.
I am looking for a way to turn a string with an arbitrary number of words (separated by slashes) into a nested hash.
These "strings" are output from a text database.
For example, given
"office/1/hardware/mouse/count/200"
(the next one may be longer or shorter), this must be created from it:
{
office {
1{
hardware {
mouse {
count => 200
}
}
}
}
}
Any ideas?
Work backwards. Split the string. Use the last two elements to make the inner-most hash. While more words exist, make each one the key of a new hash, with the inner hash as its value.
my $s = "office/1/hardware/mouse/count/200";
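my @word = split(/\//, $s);
# Bottom level taken explicitly
my $val = pop @word;
my $key = pop @word;
my $h = { $key => $val };
# Wrap the hash built so far in a new hash for each remaining word.
# Testing @word rather than the popped value keeps a key such as "0" from ending the loop early.
while ( @word )
{
my $key = pop @word;
$h = { $key => $h };
}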
A simple recursive function should do:
use strict;
use warnings;
use Data::Dumper;
sub foo {
my $str = shift;
my ($key, $rest) = split m|/|, $str, 2;
if (defined $rest) {
return { $key => foo($rest) };
} else {
return $key;
}
}
my $hash = foo("foo/bar/baz/2");
print Dumper $hash;
Gives output
$VAR1 = {
'foo' => {
'bar' => {
'baz' => '2'
}
}
};
But like I said in the comment: What do you intend to use this for? It is not a terribly useful structure.
If there are many lines to be read into a single hash and the lines have a variable number of fields, you have big problems and the other two answers will clobber data by either smashing sibling keys or overwriting final values. I'm supposing this because there is no rational reason to convert a single line into a hash.
You will have to walk down the hash with each field. This will also give you the most control over the process.
our $hash = {};
our $eolmark = "\000";
while (my $line = <...>) {
chomp $line;
my @fields = split /\//, $line;
my $count = @fields;
my $h = $hash;
my $i = 0;
map { (++$i == $count) ?
($h->{$_}{$eolmark} = 1) :
($h = $h->{$_} ||= {});
} @fields;
}
$h->{$_}{$eolmark} = 1
You need the special "end of line" key so that you can recognize the end of a record and still permit longer records to coexist. If you had the two records foo/bar/baz and foo/bar/baz/quux, the second would otherwise overwrite the final value of the first.
$h = $h->{$_} ||= {}
This statement is a very handy idiom to both create and populate a cache in one step and then take a shortcut reference to it. Never do a hash lookup more than once.
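To see the end-of-record marker at work, here is a minimal sketch using the two records mentioned above. It is an illustration under assumptions, not the answer's exact code: the names %tree, $EOL and $node are made up here, and the marker is set after descending into the last field, which yields the same structure.

```perl
use strict;
use warnings;

my $EOL = "\000";    # marker key, assumed never to occur as a real field
my %tree;

for my $line ( 'foo/bar/baz', 'foo/bar/baz/quux' ) {
    my $node = \%tree;
    # Walk down one level per field, creating hashes as needed.
    $node = $node->{$_} ||= {} for split m{/}, $line;
    $node->{$EOL} = 1;    # flag: a record ends at this node
}

my $baz = $tree{foo}{bar}{baz};
print "foo/bar/baz is a record\n"      if $baz->{$EOL};
print "foo/bar/baz/quux is a record\n" if $baz->{quux}{$EOL};
```

Both records survive because the shorter one is stored as a marker key inside the baz hash rather than as a scalar value that the longer record would replace.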
HTH

compare the value of same items with one exception case

A,food,75
B,car,136
A,car,69
B,shop,80
A,house,179
B,food,75
C,car,136
ECX5,flight,50
QC4,train,95
C,food,85
B,house,150
D,shop,80
EAX5,flight,50
QA4,train,75
F,movie,
It should compare the values of rows of the same type (wherever the 2nd column matches) and print the ones that differ. I want the output to look like this:
A,food,75 is not matching with B,food,75 C,food,85
A,car,69 is not matching with C,car,136 B,Car,136
A,house,179 is not matching with B,house,150
QC4,train,95 is not matching with QA4,train,75
F,movie missing value
The code I've written is below, but it's not printing in the format I want.
while (<FILE>) {
my $line = $_ ;
my @lines = split /,/, $line ;
$val{$lines[1]}{$lines[0]} = $lines[2] ;
}
foreach my $item (keys %val) {
foreach my $letter1 (keys %{$val{$item}}) {
foreach my $letter2 (keys %{$val{$item}}) {
if ( ($val{$item}{$letter1} != $val{$item}{$letter2}) && ($letter1 ne $letter2) &&
( (!defined $done{$item}{$letter1}{$letter2}) || (!defined $done{$item}{$letter2}{$letter1}) ) ) {
print "$item : $letter1, $val{$item}{$letter1}, $letter2, $val{$item}{$letter2}\n" ;
}
}
}
}
Really hard to follow the logic of your code.
But I seem to get the desired result with this:
[Edit] The code was edited as per the comments
use strict;
use warnings;
my %hash;
while(my $line=<DATA>) {
chomp $line;
my ($letter, $object, $number)=split /,/, $line;
### here we are dealing with missing values
if ($number =~ m{^\s*$}) {
print $line, "missing value\n";
next;
}
### here we dissever exceptional items apart from the others
if ($letter=~m{^E[AC]X\d$}) {
$object = "exceptional_$object";
}
$number+=0; # in case there is whitespace at the end
push @{$hash{$object}{$number}}, [$letter,$number,$line];
}
for my $object(sort keys %hash) {
my $oref = $hash{$object};
if (1==keys %$oref) {
next;
}
my $str;
for my $item (values %$oref) {
$str .= $str ? " $item->[0][2]" : "$item->[0][2] is not matching with";
}
print ($str,"\n");
}
__DATA__
A,food,75
B,car,136
A,car,69
B,shop,80
A,house,179
B,food,75
C,car,136
ECX5,flight,50
ECX4,train,95
C,food,85
B,house,150
D,shop,80
EAX5,flight,50
EAX4,train,75
F,movie,
output
F,movie,missing value
A,car,69 is not matching with B,car,136
EAX4,train,75 is not matching with ECX4,train,95
C,food,85 is not matching with A,food,75
A,house,179 is not matching with B,house,150
What the algorithm does:
Looping through the input, we remember all the lines for each unique pair of object and number.
After going through the input loop we do the following:
For each object we skip it if it has no different numbers:
if (1==keys %$oref) {
next
}
If it has, we build an output string from the first remembered line for each number of that object (that is, we omit duplicates with the same object and number); the first item in the list is followed by "is not matching with".
Also, I am reading from the special filehandle DATA, which accesses data embedded in the script. This is for convenience when demoing the code.
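The grouping step can be sketched on its own. This is a simplified illustration rather than the answer's code: it assumes a handful of rows from the question held in @rows, leaves out the missing-value and exceptional-item handling, and sorts the numbers so the output order is repeatable; the names %by_object and @report are made up here.

```perl
use strict;
use warnings;

my @rows = ( 'A,food,75', 'B,food,75', 'C,food,85', 'B,shop,80', 'D,shop,80' );

my %by_object;    # object => { number => [ lines with that number ] }
for my $line (@rows) {
    my ( $letter, $object, $number ) = split /,/, $line;
    push @{ $by_object{$object}{$number} }, $line;
}

my @report;
for my $object ( sort keys %by_object ) {
    my $numbers = $by_object{$object};
    next if scalar( keys %$numbers ) == 1;    # every row agrees, nothing to report
    # One representative line per distinct number, in numeric order.
    my ( $first, @rest ) = map { $numbers->{$_}[0] } sort { $a <=> $b } keys %$numbers;
    push @report, "$first is not matching with @rest";
}
print "$_\n" for @report;
```

With these rows only food is reported, because both shop rows carry the same number.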

Text file to hash table

I want to create a hash table for the data in my file.
The file contains a bunch of commands that are written as
===|showcommand|
Every time I see this delimiter I want to create a hash key and store the data below it as an array in the value until it sees the next delimiter.
The next delimiter will do the same thing which is to create a hash key with the delimiter name and store the data on the next lines following it into an array as a value.
my %commands;
my $name;
my $body;
while (<>) {
if (my ($new_name) = /===\|([^|]*)\|/) {
$commands{$name} = $body if defined($name);
$name = $new_name;
$body = '';
} else {
$body .= $_;
}
}
$commands{$name} = $body if defined($name);
This assumes the body of a command starts on the line after its header and stops on the line before the next header.
You probably have it working already, but here is a small addition regarding the question of how to return the hash from a function.
Here's an example:
Input file (I used the following, which I think has a structure similar to your input file):
===|showcommand|
cmd1
cmd2
cmd3
cmd4
===|testcommand|
command1
command2
command3
===|anothercommand|
another1
another2
another3
another4
Perl script:
use strict;
# Calling ReadCommandFile to build hash.
my %commands = ReadCommandFile("./commands.txt");
# ReadCommandFile - reads commands.txt and builds
# a hash.
sub ReadCommandFile   # no empty prototype: the sub takes a file name argument
{
my $file = shift;
my %hash = ();
my $name;
open(FILE, '<', $file) or die "Cannot open $file: $!";
while(<FILE>)
{
if($_ =~ /===\|(.*)\|/)
{
$name = $1;
$hash{$name} = [];
}
else
{
my $line = $_;
$line =~ s/\n$//;
push(@{$hash{$name}}, $line);
}
}
close(FILE);
return %hash;
}
As a result, you should get the following hash (output from Data::Dumper):
$VAR1 = 'anothercommand';
$VAR2 = [
'another1',
'another2',
'another3',
'another4'
];
$VAR3 = 'showcommand';
$VAR4 = [
'cmd1',
'cmd2',
'cmd3',
'cmd4'
];
$VAR5 = 'testcommand';
$VAR6 = [
'command1',
'command2',
'command3'
];
You can then access individual elements like this:
Get the third command from "showcommand":
print "\nCommand #3: " . $commands{'showcommand'}[2];
Output: cmd3
The data from the file is copied to a hash and the commands are added as an array under the respective keywords.
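An alternative is to return a reference to the hash instead of the flattened list. The following is a minimal sketch under assumptions: read_commands is a hypothetical helper name, and the input is a short in-memory string rather than commands.txt.

```perl
use strict;
use warnings;

# Hypothetical helper: parse "===|name|" sections from an open filehandle
# and return a reference to a hash of arrays.
sub read_commands {
    my ($fh) = @_;
    my ( %commands, $name );
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ /^===\|([^|]*)\|/ ) {
            $name = $1;
            $commands{$name} = [];
        }
        elsif ( defined $name ) {
            # Lines before the first header are ignored.
            push @{ $commands{$name} }, $line;
        }
    }
    return \%commands;
}

my $text = "===|showcommand|\ncmd1\ncmd2\n===|testcommand|\ncommand1\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $commands = read_commands($fh);
close $fh;

printf "%s: %s\n", $_, join( ' ', @{ $commands->{$_} } ) for sort keys %$commands;
```

Returning a reference avoids copying the whole hash on return, and individual elements are then reached with the arrow syntax, for example $commands->{showcommand}[1].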
Thanks!

awk two files based on 1st & 2nd column

I'm trying to merge a new file with an old one.
There is a unique key in the first column, then a separator '=' and a value.
If the key exists in both files, I must keep the old value and, if the new value differs, add a comment next to the line.
If the key exists only in the old file, keep it.
If the key exists only in the new file, insert it.
For example:
In the old file:
$ cat oldfile.txt
VAR NAME ONE = FOO
TWO BAR = VALUE
; this is a comment
In the new one:
$ cat newfile.txt
TWO BAR = VALUE
; this is a comment
VAR NAME ONE = BAR
NEW = DATA
Desired output :
$ cat output.txt
VAR NAME ONE = FOO
;new value:
; VAR NAME ONE = BAR
TWO BAR = VALUE
; this is a comment
NEW = DATA
I've tried diff, but it only works line by line. I'm pretty sure awk can do it, but I'm not an expert with awk. I could write something in ksh to do the job, but I'm pretty sure there is a quicker and simpler way.
Please note that the order of lines in the old and new files can change, and that I'm on AIX (Unix), not Linux.
Thanks for your help :)
EDIT:
I did not specify this in the first post: new comments must be kept if they are not already present in the old file.
Perl solution. First, it reads the new file into a hash. Then it goes over the old one and consults the hash for changes. You did not specify what to do with comments in the new file, so you will have to adjust the code at the corresponding comment.
#!/usr/bin/perl
use warnings;
use strict;
my ($oldfile, $newfile) = @ARGV;
my %new;
my $last;
open my $NEW, '<', $newfile or die $!;
while (<$NEW>) {
chomp;
if (/^;/) {
$new{$last}{comment} = $_; # What should we do with comments?
} elsif (! /=/) {
warn "Invalid new line $.\n";
} else {
my ($key, $value) = split /\s* = \s*/x, $_, 2;
$new{$key}{value} = $value;
$last = $key;
}
}
open my $OLD, '<', $oldfile or die $!;
while (<$OLD>) {
chomp;
if (/^;/) {
print "$_\n";
} elsif (my ($key, $value) = split /\s* = \s*/x, $_, 2) {
if (exists $new{$key}) {
print "$key = $value\n";
if ($new{$key}{value} ne $value) {
print ";new value:\n";
print "; $key = $new{$key}{value}\n";
}
} else {
print "$key = $value\n";
}
delete $new{$key};
} else {
warn "Invalid old line $.\n";
}
}
for my $key (keys %new) {
print "$key = $new{$key}{value}\n";
}
Using awk:
awk '
BEGIN {FS=OFS="="}
NR==FNR {
line[$1] = $2;
next
}
($1 in line) && ($2!=line[$1]) {
print $0;
print ";new value:";
print "; "$1, line[$1];
delete line[$1];
next
}
($1 in line) && ($2==line[$1]) {
print $0;
delete line[$1];
next
}1
END {
for (k in line)
print k, line[k]
}' newfile oldfile
Output:
VAR NAME ONE = FOO
;new value:
; VAR NAME ONE = BAR
TWO BAR = VALUE
; this is a comment
NEW = DATA

Ignore empty lines in a CSV file using Perl

I have to read a CSV file, abc.csv, select a few fields from it, and write them to a new CSV file, def.csv.
Below is my code. I am trying to ignore empty lines from abc.csv.
genNewCsv();
sub genNewCsv {
my $csv = Text::CSV->new();
my $aCsv = "abc.csv";
my $mCsv = "def.csv";
my $fh = FileHandle->new( $aCsv, "r" );
my $nf = FileHandle->new( $mCsv, "w" );
$csv->print( $nf, [ "id", "flops", "ub" ] );
while ( my $row = $csv->getline($fh) ) {
my $id = $row->[0];
my $flops = $row->[2];
next if ( $id =~ /^\s+$/ ); #Ignore empty lines
my $ub = "TRUE";
$csv->print( $nf, [ $id, $flops, $ub ] );
}
$nf->close();
$fh->close();
}
But I get the following error:
Use of uninitialized value $flops in pattern match (m//)
How do I ignore the empty lines in the CSV file?
I have used Stack Overflow question Remove empty lines and space with Perl, but it didn't help.
You can skip the entire row if any fields are empty:
unless(defined($id) && defined($flops) && defined($ub)){
next;
}
You tested if $id was empty, but not $flops, which is why you got the error.
You should also be able to do
unless($id && $flops && $ub){
next;
}
There, an empty string would evaluate to false.
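One caveat with the truth test is that it also rejects a field whose value is 0. The following minimal sketch compares the two checks on three made-up rows (the row contents and the names @kept_by_truth and @kept_by_defined are illustrative assumptions):

```perl
use strict;
use warnings;

# Three hypothetical rows: a normal one, one with flops 0, one with flops missing.
my @rows = ( [ 'a1', 'x', 12 ], [ 'a2', 'x', 0 ], [ 'a3', 'x', undef ] );

my ( @kept_by_truth, @kept_by_defined );
for my $row (@rows) {
    my ( $id, $flops ) = @{$row}[ 0, 2 ];
    # Truth test: drops a2 as well, because 0 is false.
    push @kept_by_truth, $id if $id && $flops;
    # Defined test: drops only a3, whose flops field is missing.
    push @kept_by_defined, $id if defined($id) && defined($flops);
}
print "truth test keeps:   @kept_by_truth\n";
print "defined test keeps: @kept_by_defined\n";
```

Which check is appropriate depends on whether 0 is a legitimate value in the column.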
Edit: Now that I think about it, what do you mean by ignore lines that aren't there?
You can also do this
my $id = $row->[0] || 'No Val'; # 'No Val' can be any value you want to use to signify that the field was empty
This gives the variable a default value if the first field evaluates to false.
Run this on your file first to get non-empty lines
sub removeblanks {
my @lines = ();
foreach (@_) {
push @lines, $_ unless( $_ =~ /^\s*$/ ); #add any other conditions here
}
return \@lines;
}
You can do the following to skip empty lines:
while (my $row = $csv->getline($fh)){
next unless grep $_, @$row;
...
You could use List::MoreUtils to check if any of the fields are defined:
use List::MoreUtils qw(any);
while(my $row = ...) {
next unless any { defined } @{ $row };
...
}
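The row filter can be tried without a CSV file. This is a minimal sketch under assumptions: the rows below are hand-written array references standing in for parsed rows, and it swaps in any from the core List::Util module (available in List::Util 1.33 and later) in place of List::MoreUtils. It treats a row as empty when no field is both defined and non-empty.

```perl
use strict;
use warnings;
use List::Util qw(any);

# Hypothetical rows standing in for what a CSV parser might return.
my @rows = (
    [ 'id1', 'n/a', '3.5' ],
    [''],                   # stand-in for a blank line parsed as one empty field
    [ '', '', '' ],         # stand-in for a line containing only separators
    [ 'id2', 'n/a', '0' ],
);

my @kept = grep {
    my $row = $_;
    any { defined($_) && length($_) } @$row;    # at least one non-empty field
} @rows;

print join( ',', @$_ ), "\n" for @kept;
```

Checking length instead of truth keeps rows whose only content is 0, which a plain grep $_ test would discard.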