I have a string: a\tb\tc\td\te
I want the first field, a, in one variable, the second field, b, in another, both c\td together in a third variable, and the last field, e, in a fourth.
If I do
my ($a,$b,$c,$d) = split(/\t/,$_,4);
$c will acquire only c and $d will acquire d\te
I can do:
my ($a,$b,$c) = split(/\t/,$_,3);
Then $c will get c\td\te
and I can somehow (how?) strip off the last value and put it in $d.
How can I achieve this?
split is good when you're keeping the order. If you're breaking the ordering like this you have a bit of a problem. You have two choices:
split according to \t and then join the ones you want.
be explicit.
an example of the first choice is:
my ($a,$b,$c1, $c2, $d) = split /\t/, $_;
my $c = "$c1\t$c2";
an example of the second choice is:
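my ($a, $b, $c, $d) = /^(.*?)\t(.*?)\t(.*?\t.*?)\t(.*?)$/;
Each set of parentheses captures exactly what you want. The non-greedy modifier (?) after the * makes each group stop at the first \t that lets the match succeed, and the ^ and $ anchors force the last group to run to the end of the line instead of matching nothing.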
Edit: if the intent is to have an arbitrary number of variables, you're best off using an array:
my @x = split /\t/, $_;
my $a = $x[0];
my $b = $x[1];
my $c = join "\t", @x[2..($#x-1)];
my $d = $x[-1];
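A slightly more compact variant of the array approach, as a minimal sketch: name the first two fields, pop the last one off, and join whatever is left. The variable names and the sample line are only illustrative, and it assumes the line has at least four fields.

```perl
use strict;
use warnings;

my $line = "a\tb\tc\td\te";   # sample input, same shape as in the question

# A limit of -1 keeps trailing empty fields, which split would otherwise drop.
my ($first, $second, @rest) = split /\t/, $line, -1;

my $last   = pop @rest;           # the final field
my $middle = join "\t", @rest;    # everything in between, tabs restored

print join('|', $first, $second, $middle, $last), "\n";
```

This avoids the index arithmetic with $#x, at the cost of a temporary array.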
You can use a regex with a negative look-ahead assertion, e.g.:
my @fields = split /\t(?!d)/, $string;
I have the following in a variable in Perl.
my $test = "* file/test/ttt/rrr/aaa/abc.fff.ter.yyy:myfilename.txt";
I want to extract
myfilename.txt
The initial path with / will change (there may be many directories which are not fixed), and I only want the last file name.
How can I do this?
I tried to use:
$filename= (split /\//, $test)[4] ;
The best solution is probably to use a module designed for this exact thing.
use File::Basename;
my $filename = basename($test);
Other solutions are likely regex based:
If you want to extract the part of a string that is after a colon, like in this string, you could do:
my ($filename) = $test =~ /:(.+)/;
Or if you want to extract a basename + extension at the end
my ($filename) = $test =~ /(\w+\.\w+)$/;
Or a split based solution
my $filename = (split /:/, $test)[-1];
You can count backward from the end of a list with a negative index. The last element of a list is -1, the next to last -2, and so on:
$filename= (split /\//, $test)[-1];
If you have your list in an array, the $#array_name is the index of the last element. That's a bit more clunky than just using -1 though:
my @array = qw( 1 3 7 );
my $last = $array[$#array];
If you don't care about the array, you can remove the last element with pop, which returns that value:
my $last = pop @array;
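Since the wanted name in the sample follows a colon while the directories are separated by slashes, one option is to split on either character and take index -1. This is only a sketch: the sample value is modelled on the question, and it assumes the file name itself contains neither / nor :.

```perl
use strict;
use warnings;

# Sample value modelled on the question; the real path may be any depth.
my $test = 'file/test/ttt/rrr/aaa/abc.fff.ter.yyy:myfilename.txt';

# Split on either separator and keep only the last piece.
my $filename = (split m{[/:]}, $test)[-1];

print "$filename\n";   # prints myfilename.txt
```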
Added one of the methods from the above answers:
$fullpath = $ARGV[0];
my ($filename) = $fullpath=~m/(?:\\|\/)(?:[^\:]+)\:([^\/\\]+)$/;
print $filename;
The above code should work on Windows or Linux.
I was wondering how to get multiple inputs from the same input line.
For example, the user inputs: 1, 2, 3 . Is there a way to split them and put them into an array.
From perlrequick:
To extract a comma-delimited list of numbers, use
$x = "1.618,2.718, 3.142";
@const = split /,\s*/, $x; # $const[0] = '1.618'
# $const[1] = '2.718'
# $const[2] = '3.142'
The ",\s*" is a regular expression meaning one comma followed by any number of spaces.
This answer is absolutely right. And if you want to account for the user adding accidental spaces before a comma too (an input like "1, 2 , 3"), you can use
split /\s*,\s*/, $inputstring
So in your case specifically, what you want is
chomp(my $inputstring = <STDIN>);
my ($a, $b, $c) = split( /\s*,\s*/, $inputstring );
chomp removes the trailing newline from the captured input. The parentheses around split's arguments are optional, but they make it clear that we are supplying arguments to split. Finally, this code only looks at the first three inputs. If you want to capture all of them, use
chomp(my $inputstring = <STDIN>);
my @inputarray = split( /\s*,\s*/, $inputstring );
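One thing the pattern above does not cover is whitespace at the very start or end of the line. A minimal sketch of a helper that trims that too; parse_list is a made-up name, and a fixed string stands in for <STDIN> so the example runs on its own:

```perl
use strict;
use warnings;

# Hypothetical helper: split a comma-separated line into trimmed fields.
sub parse_list {
    my ($line) = @_;
    chomp $line;                       # drop the trailing newline, if any
    $line =~ s/^\s+|\s+$//g;           # trim whitespace around the whole line
    return split /\s*,\s*/, $line;     # split on commas, eating nearby spaces
}

my @numbers = parse_list("1, 2 , 3\n");
print scalar(@numbers), " values: @numbers\n";   # prints "3 values: 1 2 3"
```

With real input you would call it as parse_list(scalar <STDIN>).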
I need some help decoding this Perl script. $dummy is not initialized anywhere else in the script. What does the following line mean, and what does it mean when the split function is called without any arguments?
($dummy, $class) = split;
The program checks whether a statement is the truth or a lie using a statistical classification method. Say it calculates the following numbers for "truth-sity" and "falsity"; it then checks whether the lie detector was correct.
# some code, some code...
$_ = "truth";
# more some code, some code ...
$Truthsity = 9999;
$Falsity = 2134123;
if ($Truthsity > $Falsity) {
$newClass = "truth";
} else {
$newClass = "lie";
}
($dummy, $class) = split;
if ($class eq $newClass) {
print "correct";
} elsif ($class eq "true") {
print "false neg";
} else {
print "false pos"
}
($dummy, $class) = split;
split returns a list of values. The first is put into $dummy, the second into $class, and any further values are ignored. The first variable is likely named $dummy because the author plans to ignore that value. A better option is to use undef to ignore a returned entry:
( undef, $class ) = split;
Perldoc can show you how split functions. When called without arguments, split will operate against $_ and split on whitespace. $_ is the default variable in perl, think of it as an implied "it," as defined by context.
Using an implied $_ can make short code more concise, but it's poor form to use it inside larger blocks. You don't want the reader to get confused about which 'it' you want to work with.
split ; # split it
for (@list) { foo($_) } # look at each element of list, foo it.
@new = map { $_ + 2 } @list; # look at each element of list,
# add 2 to it, put it in new list
while(<>){ foo($_)} # grab each line of input, foo it.
perldoc -f split
If EXPR is omitted, splits the $_ string. If PATTERN is also omitted, splits on
whitespace (after skipping any leading whitespace). Anything matching PATTERN
is taken to be a delimiter separating the fields. (Note that the delimiter may
be longer than one character.)
I'm a big fan of the ternary operator ? : for setting string values and of pushing logic into blocks and subroutines.
my $Truthsity = 9999;
my $Falsity = 2134123;
print test_truthsity( $Truthsity, $Falsity, $_ );
sub test_truthsity {
my ($truthsity, $falsity, $line ) = @_;
my $newClass = $truthsity > $falsity ? 'truth' : 'lie';
my (undef, $class) = split /\s+/, $line ;
my $output = $class eq $newClass ? 'correct'
: $class eq 'true' ? 'false neg'
: 'false pos';
return $output;
}
There may be a subtle bug in this version. split with no arguments is not exactly the same as split(/\s+/, $_); they behave differently if the line starts with whitespace. With an explicit /\s+/ pattern, an empty leading field is returned. split with no arguments skips the leading whitespace.
$_ = " ab cd";
my @a = split; # @a contains ( 'ab', 'cd' )
my @b = split /\s+/, $_; # @b contains ( '', 'ab', 'cd' )
From the documentation for split:
split /PATTERN/,EXPR
If EXPR is omitted, splits the $_ string. If PATTERN is also omitted,
splits on whitespace (after skipping any leading whitespace). Anything
matching PATTERN is taken to be a delimiter separating the fields.
(Note that the delimiter may be longer than one character.)
So since both the pattern and the expression are omitted, we are splitting the default variable $_ on whitespace.
The purpose of the $dummy variable is to capture the first element of the list returned from split and ignore it, because the code is only interested in the second element, which gets put into $class.
You'll have to look at the surrounding code to find out what $_ is in this context; it may be a loop variable or a list item in a map block, or something else.
If you read the documentation, you'll find that:
The default for the first operand is " ".
The default for the second operand is $_.
The default for the third operand is 0.
so
split
is short for
split " ", $_, 0
and it means:
Take $_, split its value on whitespace, ignoring leading and trailing whitespace.
The first resulting field is placed in $dummy, and the second in $class.
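To see the equivalence in action, here is a small sketch comparing the bare split, its spelled-out form, and a real /\s+/ regex on a line with leading and trailing whitespace. The sample line is made up for illustration.

```perl
use strict;
use warnings;

$_ = "  id42 truth  ";   # made-up sample line with leading and trailing spaces

my @implicit = split;               # short for split ' ', $_, 0
my @explicit = split ' ', $_, 0;    # the same call, spelled out
my @regex    = split /\s+/, $_;     # a real regex keeps the leading empty field

# Field counts for the three calls: prints "2 2 3"
print scalar(@implicit), " ", scalar(@explicit), " ", scalar(@regex), "\n";
```

The trailing whitespace makes no difference in any of the three, because a limit of 0 (the default) drops trailing empty fields.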
Based on its name, I presume you proceed to never use $dummy again, so it's simply acting as a placeholder. You can get rid of it, though.
my ($dummy, $class) = split;
can be written as
my (undef, $class) = split; # Use undef as a placeholder
or
my $class = ( split )[1]; # Use a list slice to get second item
$aa = "Main:http://google-test.com:8080/service"
(or)
$aa = "http://google-test.com:8080/service2"
I want to split this into two parts:
Main:
http://google-test.com:8080/service
But it is not working with this split:
split (/\:/,$aa,1);
You need to change the limit from 1 to 2.
perl -le 'my $aa="Main:http://google-test.com:8080/service"; my @parts = split(/:/, $aa, 2); print scalar @parts;'
From perldoc -f split:
If LIMIT is specified and positive, it represents the maximum number
of fields the EXPR will be split into,
It looks like you were trying to use it as the maximum number of times to split and not the number of parts to return.
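A quick sketch that makes the meaning of LIMIT visible by splitting the question's string three ways (the array names are just for illustration):

```perl
use strict;
use warnings;

my $aa = "Main:http://google-test.com:8080/service";

my @one = split /:/, $aa, 1;   # limit 1: at most one field, so nothing is split
my @two = split /:/, $aa, 2;   # limit 2: split at the first colon only
my @all = split /:/, $aa;      # no limit: split at every colon

printf "%d %d %d\n", scalar(@one), scalar(@two), scalar(@all);   # prints "1 2 4"
print "$two[1]\n";   # prints http://google-test.com:8080/service
```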
New question, new answer:
my ($a1, $a2) = $aa =~ m{^(\w*):?(http://.+)$};
Assuming the "Main" part can only be alphanumerics. This will also match $a1 to the empty string if "Main" is left out, which you can check for with an if statement or similar.
Split would work too, with a limit of two, as gpojd has already answered.
my ($a1, $a2) = split /:/, $aa, 2;
But then you would need to check and see what you caught in the two variables. E.g. the URL could be in either $a1 or $a2. And you might need to join them back together afterwards.
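If both input forms have to be handled, the check can be folded into the regex by making the whole "label plus colon" part optional. A rough sketch; split_label is a made-up name, it assumes the URL always starts with http://, and the // operator needs Perl 5.10 or later:

```perl
use strict;
use warnings;

# Hypothetical helper: returns (label, url); label is '' when there is none.
# Returns an empty list if the text does not look like either form.
sub split_label {
    my ($text) = @_;
    my ($label, $url) = $text =~ m{^(?:(\w+):)?(http://.+)$}
        or return;
    return ($label // '', $url);
}

for my $aa ("Main:http://google-test.com:8080/service",
            "http://google-test.com:8080/service2") {
    my ($label, $url) = split_label($aa);
    print "label='$label' url='$url'\n";
}
```

This way the URL always ends up in the second variable, so there is nothing to join back together afterwards.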
You want to split it at the colons?
try:
my #DATA;
$aa = "Main:http://google-test.com:8080/service";
@DATA = split(/:/, $aa);
Then you can access the different parts of the split using:
for ($i = 0; $i < @DATA; $i++)
{
print "data section $i value is: " . $DATA[$i] . "\n";
}
I would like to optimise this Perl sub:
push_csv($string,$addthis,$position);
for placing strings inside a CSV string.
e.g. if $string="one,two,,four"; $addthis="three"; $position=2;
then push_csv($string,$addthis,$position) will change the value of $string = "one,two,three,four";
sub push_csv {
my @fields = split /,/, $_[0]; # split original string by commas;
$_[1] =~ s/,//g; # remove commas in $addthis
$fields[$_[2]] = $_[1]; # put the $addthis string into
# the array position $position.
$_[0] = join ",", @fields; # join the array with commas back
# into the string.
}
This is a bottleneck in my code, as it needs to be called a few million times.
If you are proficient in Perl, could you take a look at it, and suggest optimisation/alternatives? Thanks in advance! :)
EDIT:
Converting to @fields and back to a string is taking time. I just thought of a way to speed it up where I have more than one sub call in a row: split once, then push more than one thing into the array, then join once at the end.
For several reasons, you should be using Text::CSV to handle these low-level CSV details. Provided that you are able to install the XS version, my understanding is that it will run faster than anything you can do in pure Perl. In addition, the module will correctly handle all sorts of edge cases that you are likely to miss.
use Text::CSV;
my $csv = Text::CSV->new;
my $line = 'foo,,fubb';
$csv->parse($line);
my @fields = $csv->fields;
$fields[1] = 'bar';
$csv->combine(@fields);
print $csv->string; # foo,bar,fubb
Keep your array as an array in the first place, not as a ,-separated string?
You might want to have a look at Data::Locations.
Or try (untested, unbenchmarked, doesn't append new fields like your original can...)
sub push_csv {
$_[1] =~ y/,//d;
$_[0] =~ s/^(?:[^,]*,){$_[2]}\K[^,]*/$_[1]/;
return;
}
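For what it's worth, here is that regex version run against the example from the question, as a small self-contained sketch. Note that \K needs Perl 5.10 or later, and that both arguments must be variables, because the sub modifies them in place through @_.

```perl
use strict;
use warnings;

# Regex-based variant from above, repeated so the example runs on its own.
sub push_csv {
    $_[1] =~ y/,//d;                               # strip commas from the new value
    $_[0] =~ s/^(?:[^,]*,){$_[2]}\K[^,]*/$_[1]/;   # overwrite field number $_[2]
    return;
}

my $string  = "one,two,,four";
my $addthis = "three";
push_csv($string, $addthis, 2);
print "$string\n";   # prints one,two,three,four
```

If the position is past the last existing field, the substitution simply does not match and the string is left unchanged.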
A few suggestions:
Use tr/,//d instead of s/,//g as it is faster. This is essentially the same as ysth's suggestion to use y/,//d
Perform split only as much as is needed. If $position = 1, and you have 10 fields, then you're wasting computation performing unnecessary splits and joins. The optional third argument to split can be leveraged to your advantage here. However, this does depend on how many consecutive empty fields you are expecting. It may not be worth it if you don't know ahead of time how many of these you have
You're quite right in wanting to perform multiple appends with one sub-call. There is no need to perform multiple splits and joins when one will do just as well
You really ought to be using Text::CSV, but here's how I would revise the implementation of your sub in pure Perl (assuming a maximum of one consecutive empty field):
sub push_csv {
my ( $items, $positions ) = @_[1..2];
# Test inputs
warn "No. of items to add & positions not equal"
and
return unless @{$items} == @{$positions};
my $maxPos; # Find the maximum position number
for my $position ( @{$positions} ) {
$maxPos ||= $position;
$maxPos = $position if $maxPos < $position;
}
my @fields = split /,/ , $_[0], $maxPos+2; # Split only as much as needed
splice ( @fields, $positions->[$_], 1, $items->[$_] ) for 0 .. $#{$items};
$_[0] = join ',' , @fields;
print $_[0],"\n";
}
Usage
use strict;
use warnings;
my $csvString = 'one,two,,four,,six';
my @missing = ( 'three', 'five' );
my @positions = ( 2, 4 );
push_csv ( $csvString, \@missing, \@positions );
print $csvString; # Prints 'one,two,three,four,five,six'
If you're hitting a bottleneck by splitting and joining a few million times... then don't split and join constantly. split each line once when it initially enters the system, pass that array (or, more likely, a reference to the array) around while doing your processing, and then do a single join to turn it into a string when you're ready for it to leave the system.
e.g.:
#!/usr/bin/env perl
use strict;
use warnings;
# Start with some CSV-ish data
my $initial_data = 'foo,bar,baz';
# Split it into an arrayref
my $data = [ split /,/, $initial_data ];
for (1 .. 1_000_000) {
# Pointless call to push_csv, just to make it run
push_csv($data, $_, $_ % 3);
}
# Turn it back into a string and display it
my $final_data = join ',', @$data;
print "Result: $final_data\n";
sub push_csv {
my ($data_ref, $value, $position) = @_;
$$data_ref[$position] = $value;
# Alternately:
# $data_ref->[$position] = $value;
}
Note that this simplifies things enough that push_csv becomes a single, rather simple, line of processing, so you may want to just do the change inline instead of calling a sub for it, especially if runtime efficiency is a key criterion - in this trivial example, getting rid of push_csv and doing it inline reduced run time by about 70% (from 0.515s to 0.167s).
Don't you think it might be easier to use arrays and splice, and only use join to create the comma separation at the end?
I really don't think using s/// repeatedly is a good idea if this is a major bottleneck in your code.