Convert 1,200,000 to the words "1.2 Million"

I am trying to use this:
$string = '1,200,000';
$pattern = '/,\d{3},\d{3}$/i';
$replacement = ' Million';
echo preg_replace($pattern, $replacement, $string);
It should return 1.2 Million,
but it just returns 1 Million.
How can I solve this problem?

I found the answer myself; it is very simple:
function number_convert($int)
{
    $value = str_replace(".", "", $int);
    $value = str_replace(",", "", $value);
    return ($value / 1000000)." Million";
}
but it only works for millions.
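The divide-by-scale trick generalizes beyond millions by trying each scale in turn. A rough Python sketch of the same idea (the `humanize` name and the scale table are mine, not from the answer above):

```python
def humanize(num_str):
    """Turn a comma-formatted number string into e.g. '1.2 Million'."""
    n = int(num_str.replace(",", ""))
    for divisor, word in ((10**9, "Billion"), (10**6, "Million"), (10**3, "Thousand")):
        if n >= divisor:
            # '%g' drops a trailing .0, so 1,000,000 renders as '1 Million'
            return f"{n / divisor:g} {word}"
    return str(n)

print(humanize("1,200,000"))  # 1.2 Million
```

The same function then covers thousands and billions without extra string surgery.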


Truncate, Convert String and set output as variable

It seems so simple. I need a cmdlet to take a two word string, and truncate the first word to just the first character and truncate the second word to 11 characters, and eliminate the space between them. So "Arnold Schwarzenegger" would output to a variable as "ASchwarzeneg"
I literally have no code. My thinking was to do something like:
$vars = $var1.Split(" ")
$var1 = ""
foreach ($var in $vars) {
    ????
}
I'm totally at a loss as to how to do this, and it seems so simple too. Any help would be appreciated.
Here is one way to do it using the index operator [ ] in combination with the range operator ..:
$vars = 'Arnold Schwarzenegger', 'Short Name'
$names = foreach ($var in $vars) {
    $i = $var.IndexOf(' ') + 1 # index after the space
    $z = $i + 10               # to slice from `$i` until `$i + 10` (11 chars)
    $var[0] + [string]::new($var[$i..$z])
}
$names
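The same slicing idea is easy to express outside PowerShell as well. A Python sketch (the `shorten` name is mine):

```python
def shorten(full_name):
    """First initial + first 11 characters of the surname, no space."""
    first, last = full_name.split(" ", 1)
    return first[0] + last[:11]

print(shorten("Arnold Schwarzenegger"))  # ASchwarzeneg
```

Slicing with `[:11]` is safe for short surnames: it simply returns the whole surname rather than raising an error.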

How to get substring of the line enclosed in double quotes

I have an input string:
ACC000121,2290,"01009900,01009901,01009902,01009903,01009904",4,5,6
If I use the split function, I get weird output.
my ($field1, $field2, $field3, $field4) = "";
while (<DATAFILE>) {
    $row = $_;
    $row =~ s/\r?\n$//;
    ($field1, $field2, $field3, $field4) = split(/,/, $row);
}
The output I am getting is:
field1 :: ACC000121
field2 :: 2290
field3 :: "01009900
field4 :: 01009901
Expected output:
field1 = ACC000121
field2 = 2290
field3 = 01009900,01009901,01009902,01009903,01009904
field4 = 4
field5 = 5
field6 = 6
I am quite weak in Perl. Please help me.
If you have CSV data, you really want to use Text::CSV to parse it. As you've discovered, parsing CSV data is usually not as trivial as just splitting on commas, and Text::CSV can handle all the edge cases for you.
use strict;
use warnings;
use Data::Dump;
use Text::CSV;

my $csv = Text::CSV->new;

while (<DATA>) {
    $csv->parse($_);
    my @fields = $csv->fields;
    dd(\@fields);
}
__DATA__
ACC000121,2290,"01009900,01009901,01009902,01009903,01009904",4,5,6
Output:
[
  "ACC000121",
  2290,
  "01009900,01009901,01009902,01009903,01009904",
  4,
  5,
  6,
]
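For comparison, Python's standard csv module handles the same quoted field out of the box, much like Text::CSV does in Perl (this is a sketch, not code from the answer):

```python
import csv
import io

line = 'ACC000121,2290,"01009900,01009901,01009902,01009903,01009904",4,5,6\n'
# csv.reader honors the double quotes, so the embedded commas stay in one field
fields = next(csv.reader(io.StringIO(line)))
print(fields)
```

Either way, the lesson is the same: reach for a real CSV parser before hand-rolling a split.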
I agree with Matt Jacob's answer — you should parse CSV with Text::CSV unless you've got a very good reason not to do so.
If you're going to deal with it using regular expressions, I think you'll do better with m// than split. For example, this seems to cover most single line CSV data variants, though it does not remove the quotes around a quoted field as Text::CSV would — that requires a separate post-processing step.
use strict;
use warnings;

sub splitter
{
    my($row) = @_;
    my @fields;
    my $i = 0;
    while ($row =~ m/((?=,)|[^",][^,]*|"([^"]|"")*")(?:,|$)/g)
    {
        print "Found [$1]\n";
        $fields[$i++] = $1;
    }
    for (my $j = 0; $j < @fields; $j++)
    {
        print "$j = [$fields[$j]]\n";
    }
}

my $row;
$row = q'ACC000121,2290,"01009900,01009901,01009902,01009903,01009904",4,5,6';
print "Row 1: $row\n";
splitter($row);
$row = q'ACC000121,",",2290,"01009900,""aux data"",01009902,01009903,01009904",,5"abc",6,""';
print "Row 2: $row\n";
splitter($row);
Obviously, that has a fair amount of diagnostic code in it. The output (from Perl 5.22.0 on Mac OS X 10.11.1) is:
Row 1: ACC000121,2290,"01009900,01009901,01009902,01009903,01009904",4,5,6
Found [ACC000121]
Found [2290]
Found ["01009900,01009901,01009902,01009903,01009904"]
Found [4]
Found [5]
Found [6]
0 = [ACC000121]
1 = [2290]
2 = ["01009900,01009901,01009902,01009903,01009904"]
3 = [4]
4 = [5]
5 = [6]
Row 2: ACC000121,",",2290,"01009900,""aux data"",01009902,01009903,01009904",,5"abc",6,""
Found [ACC000121]
Found [","]
Found [2290]
Found ["01009900,""aux data"",01009902,01009903,01009904"]
Found []
Found [5"abc"]
Found [6]
Found [""]
0 = [ACC000121]
1 = [","]
2 = [2290]
3 = ["01009900,""aux data"",01009902,01009903,01009904"]
4 = []
5 = [5"abc"]
6 = [6]
7 = [""]
In the Perl code, the match is:
m/((?=,)|[^",][^,]*|"([^"]|"")*")(?:,|$)/
This looks for and captures (in $1) either an empty field followed by a comma, or for something other than a double quote followed by zero or more non-commas, or for a double quote followed by a sequence of zero or more occurrences of "not a double quote or two consecutive double quotes" and another double quote; it then expects either a comma or end of string.
Handling multi-line fields requires a little more work. Removing the escaping double quotes also requires a little more work.
Using Text::CSV is much simpler and much less error prone (and it can handle more variants than this can).
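The same pattern translates almost verbatim to Python's re module. A sketch (names are mine; like the Perl version, it keeps the quotes around quoted fields):

```python
import re

# capture an empty field (lookahead at a comma), an unquoted field,
# or a quoted field that may contain "" escapes - then expect , or end
CSV_FIELD = re.compile(r'((?=,)|[^",][^,]*|"(?:[^"]|"")*")(?:,|$)')

row = 'ACC000121,2290,"01009900,01009901",4'
fields = [m.group(1) for m in CSV_FIELD.finditer(row)]
print(fields)
```

As in Perl, stripping the surrounding quotes and unescaping `""` would be a separate post-processing step.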

Dedup multi line records with perl

I have multi-line records in a text file I'd like to dedupe using perl:
Records are delimited by "#end-of-record" string and look like this:
CAPTAIN GIBLET'S NEWT CORRAL
555 RANDOM ST
TARDIS, CT 99999
We regret to inform you that we must repossess your pants in part due to your being 6 months late on payments. But mostly it's maliciousness. :)
TOTAL DUE: $30.00
#end-of-record
Here is my initial attempt:
#!/usr/bin/perl -w
use strict;

{
    local $/ = "#end-of-record";
    my %seen;

    while ( my $record = <> ) {
        if (not exists $seen{$record}) {
            print $record;
            $seen{$record} = 1;
        }
    }
}
This is printing out every record, including duplicate records. Where did I go wrong?
UPDATE
Above code seems to work.
gawk 'BEGIN { ORS = RS = "#end-of-record\n" } !seen[$0]++
      END { print $ORS }' yourfile
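The seen-set idea behind both answers can be sketched in Python as well (function and variable names are mine):

```python
def dedupe_records(text, delim="#end-of-record\n"):
    """Keep only the first occurrence of each record, where records
    are separated by the delimiter string - same idea as the Perl
    %seen hash and the awk seen[] array."""
    seen = set()
    out = []
    for record in text.split(delim):
        if record and record not in seen:
            seen.add(record)
            out.append(record)
    return delim.join(out) + delim if out else ""
```

Note that records must match byte-for-byte to be considered duplicates; stray whitespace differences will defeat any of these approaches.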

fast way to compare rows in a dataset

I asked this question in R and got a lot of answers, but all of them crash my 4 GB RAM computer after a few hours of running, or they take a very long time to finish.
faster way to compare rows in a data frame
Some people said that it's not a job to be done in R. Since I don't know C and I'm somewhat fluent in Perl, I'll ask here.
I'd like to know if there is a fast way to compare each row of a large dataset with the other rows, identifying the rows with a specific degree of homology. Let's say for the simple example below that I want homology >= 3.
data:
sample_1,10,11,10,13
sample_2,10,11,10,14
sample_3,10,10,8,12
sample_4,10,11,10,13
sample_5,13,13,10,13
The output should be something like:
output
sample duplicate matches
1 sample_1 sample_2 3
2 sample_1 sample_4 4
3 sample_2 sample_4 3
A match is counted when both lines have the same number in the same position.
perl -F',' -lane'
    $k = shift @F;
    for my $kk (@o) {
        $m = grep { $h{$kk}[$_] == $F[$_] } 0 .. $#F;
        $m >= 3 or next;
        print ++$i, " $kk $k $m";
    }
    push @o, $k;
    $h{$k} = [ @F ];
' file
Output:
1 sample_1 sample_2 3
2 sample_1 sample_4 4
3 sample_2 sample_4 3
This solution provides an alternative to direct comparison, which will be slow for large data amounts.
Basic idea is to build an inverted index while reading the data.
This makes comparison faster if there are a lot of different values per column.
For each row, you look up the index and count the matches - this way you only consider the samples where this value actually occurs.
You might still have a memory problem because the index gets as large as your data.
To overcome that, you can shorten the sample name and use a persistent index (using DB_File, for example).
use strict;
use warnings;
use 5.010;

my @h;
my $LIMIT_HOMOLOGY = 3;

while (my $line = <>) {
    chomp $line;
    my @arr = split /,/, $line;
    my $sample_no = shift @arr;
    my %sim;
    foreach my $i (0..$#arr) {
        my $value = $arr[$i];
        our $l;
        *l = \$h[$i]->{$value};   # alias $l to the index slot for (column, value)
        foreach my $s (@$l) {
            $sim{$s}++;
        }
        push @$l, $sample_no;
    }
    foreach my $s (keys %sim) {
        if ($sim{$s} >= $LIMIT_HOMOLOGY) {
            say "$sample_no: $s. Matches: $sim{$s}";
        }
    }
}
For 25,000 rows with 26 columns of random integer values between 1 and 100, the program took 69 seconds on my MacBook Air.
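The inverted-index idea can be sketched compactly in Python (function and variable names are mine, not from the answer):

```python
from collections import defaultdict

def find_homologous(rows, limit=3):
    """For each (column, value) pair, remember which samples had it,
    so each new row is only compared against samples that share at
    least one value - the inverted-index trick described above."""
    index = defaultdict(list)      # (column, value) -> [sample names]
    matches = []
    for name, *values in rows:
        counts = defaultdict(int)  # other sample -> number of shared positions
        for col, val in enumerate(values):
            for other in index[(col, val)]:
                counts[other] += 1
            index[(col, val)].append(name)
        matches += [(other, name, c) for other, c in counts.items() if c >= limit]
    return matches

rows = [("sample_1", 10, 11, 10, 13),
        ("sample_2", 10, 11, 10, 14),
        ("sample_3", 10, 10, 8, 12),
        ("sample_4", 10, 11, 10, 13),
        ("sample_5", 13, 13, 10, 13)]
print(find_homologous(rows))
```

On the sample data this reproduces the three expected pairs; the memory caveat from the answer applies here too, since the index grows with the data.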

Better way to extract elements from a line using perl?

I want to extract some elements from each line of a file.
Below is the line:
# 1150 Reading location 09ef38 data = 00b5eda4
I would like to extract the address 09ef38 and the data 00b5eda4 from this line.
The way I use is the simple one like below:
while ($line = <INFILE>) {
    if ($line =~ /\#\s*(\S+)\s*(\S+)\s*(\S+)\s*(\S+)\s*(\S+)\s*=\s*(\S+)/) {
        $time = $1;
        $address = $4;
        $data = $6;
        printf(OUTFILE "%s,%s,%s \n", $time, $address, $data);
    }
}
I am wondering: is there any better way to do this? Easier and cleaner?
Thanks a lot!
TCGG
Another option is to split the string on whitespace:
my ($time, $addr, $data) = (split / +/, $line)[1, 4, 7];
You could use matching with a list on the LHS, something like this:
echo '# 1150 Reading location 09ef38 data = 00b5eda4' |
perl -ne '
$,="\n";
($time, $addr, $data) = /#\s+(\w+).*?location\s+(\w+).*?data\s*=\s*(\w+)/;
print $time, $addr, $data'
Output:
1150
09ef38
00b5eda4
In Python the appropriate regex would be something like:
'[0-9]+[a-zA-Z ]*([0-9]+[a-z]+[0-9]+)[a-zA-Z ]*= ([0-9a-zA-Z]+)'
But I don't know exactly how to write it in Perl; you can search for it. If you need any explanation of this regex, I can edit this post with a more precise description.
I find it convenient to just split by one or more whitespace characters of any kind, using \s+. This way you won't have any problems if the input string has tab characters in it instead of spaces.
while ($line = <INFILE>)
{
    my ($time, $addr, $data) = (split /\s+/, $line)[1, 4, 7];
}
One note when splitting on any kind of whitespace: Perl strips trailing empty fields by default, so the newline at the end does not add an empty element to the result. A line that starts with whitespace, however, will give you an empty first element, shifting the indices by one. In most cases, unless your input has leading whitespace, there's no need to care.
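If this were Python rather than Perl, the same extraction could use named groups, which makes the intent clearer than counting positional captures (a sketch; the INFILE/OUTFILE plumbing is omitted):

```python
import re

line = "# 1150 Reading location 09ef38 data = 00b5eda4"
# anchor on the literal words 'location' and '=' instead of counting columns
m = re.match(r"#\s*(?P<time>\w+).*?location\s+(?P<addr>\w+).*?=\s*(?P<data>\w+)", line)
if m:
    print(m["time"], m["addr"], m["data"])  # 1150 09ef38 00b5eda4
```

Anchoring on keywords rather than field positions also survives extra or missing whitespace between tokens.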