Extract the £ (pound) currency symbol and the amount (56) from an html file - eclipse

Extract the £ (pound) currency symbol and the amount (56) from an html file. It is printing the amount as £56 and prints the currency as Â. How can I print only 56, without the symbol? It is working fine with a $ sign.
Part of the code:
cost= "£56"
currencySymbol = cost[0]
print (currencySymbol, cost[1:])
The output I am getting:
Â: £56

there are many ways that you can do it, you can use split, regex and one method that I did below:
Hope it helps you
import re
cost= "£560,000"
match = re.search(r'([\D]+)([\d,]+)', cost)
output = (match.group(1), match.group(2).replace(',',''))
print (output);
output -->('£', '560000')
check here (https://ideone.com/Y053Vb)

Resolved: i tried to run below code in separate file in eclipse and given error about utf-8.
i search the error and got answer, it is eclipse who is changing unicode style to avoid i used to run in python IDLE, i think we can change unicode in eclipse?.
Thanks to Martijn Pieters
[SyntaxError: Non-UTF-8 code starting with '\x91'
cost= "£56"
currencySymbol = cost[0]
print (currencySymbol, cost[1:])
#resolution :when using file use encoding
#with open('index.html', encoding="UTF-8") as productFile:

Related

Error in Linkdatagen : Use of uninitiated value $chr in concatenation (.) or string

Hi I was trying to use linkdatagen, which is a perl based tool. It requires a vcf file (using mpileup from SAMtools) and a hapmap annotation file (provided). I have followed the instructions but the moment I use the perl script provided, I get this error.
The codes I used are:
samtools mpileup -d10000 -q13 -Q13 -gf hg19.fa -l annotHapMap2U.txt samplex.bam | bcftools view -cg -t0.5 - > samplex.HM.vcf
Perl vcf2linkdatagen.pl -variantCaller mpileup -annotfile annotHapMap2U.txt -pop CEU -mindepth 10 -missingness 0 samplex.HM.vcf > samplex.brlmm
Use of uninitiated value $chr in concatenation (.) or string at vcf2linkdatagentest.pl line 487, <IN> line 1.... it goes on and on.. I have mailed the authors, and haven't heard from them yet. Can anyone here please help me? What am I doing wrong?
The perl script is :
http://bioinf.wehi.edu.au/software/linkdatagen/vcf2linkdatagen.pl
The HapMap file can be downloaded from the website mentioned below.
http://bioinf.wehi.edu.au/software/linkdatagen/
Thanks so much
Ignoring lines starting with #, vcf2linkdatagen.pl expects the first field of the first line of the VCF to contain something of the form "chrsomething", and your file doesn't meet that expectation. Examples from a comment in the code:
chr1 888659 . T C 226 . DP=26;AF1=1;CI95=1,1;DP4=0,0,9,17;MQ=49;FQ=-81 GT:PL:GQ 1/1:234,78,0:99
chr1 990380 . C . 44.4 . DP=13;AF1=7.924e-09;CI95=1.5,0;DP4=3,10,0,0;MQ=49;FQ=-42 PL 0
The warning means that a variable used in a string is not initialized (undefined). It is an indication that something might be wrong. The line in question can be traced to this statement
my $chr = $1 if ($tmp[0] =~ /chr([\S]+)/);
It is bad practice to use postfix if statements on a my statement.
As ikegami notes a workaround for this might be
my ($chr) = $tmp[0] =~ /chr([\S])/;
But since the match failed, it will likely return the same error. The only way to solve is to know more about the purpose of this variable, if the error should be fatal or not. The author has not handled this case, so we do not know.
If you want to know more about the problem, you might add a debug line such as this:
warn "'chr' value not found in the string '$tmp[0]'" unless defined $chr;
Typically, an error like this occurs when someone gives input to a program that the author did not expect. So if you see which lines give this warning, you might find out what to do about it.

Perl Code : Output not displayed properly

I have a perl code where I access multiple txt files and produce output for them.
While I run the code, the output lines on the console are overwritten.
2015-04-21:12-04-54|getFilesInInputDir| ********** name : PEPORT **********
PEPORT4-21:12-04-54|readNFormOutputFile| name :
PEPORT" is : -04-54|readNFormOutputFile| Frequency for name "
Please note, that the second and third line it should have been like
2015-04-21:12-04-54|readNFormOutputFile| name : PEPORT
2015-04-21:12-04-54|readNFormOutputFile| Frequency for name "PEPORT"
Also, after this the code stops processing my files. The code seems fine. May I know what may be the possible cause for this.
Thanks.
Seems like CR/LF versus LF issue. Convert your input from MSWin to Linux by running dos2unix or fromdos, or remove the "\r" characters from within the Perl code.
As choroba says, I guess you are reading a file on Linux that has been generated on Windows. The easiest fix is to replace chomp with s/\s+\z//or s/\p{cntrl}+\z//
Or, if trailing spaces are significant, you can use s/[\r\n]+\z// or, if you are running version 10 or later of Perl 5, s/\R\z//

How can I save Perl/Expect output that contains mixed ascii content?

I have a perl script that uses the expect library to login to a remote system. I'm getting the final output of the interaction with the before method:
$exp->before();
I'm saving this to a text file. When I use cat on the file it outputs fine in the terminal, but when I open the text file in an editor or try to process it the formatting is bizarre:
[H[2J[1;19HCIRCULATION ACTIVITY by TERMINAL (Nov 6,14)[11;1H
Is there a better way to save the output?
When I run enca it's identified as:
7bit ASCII characters
Surrounded by/intermixed with non-text data
you can remove none ascii chars.
$str1 =~ s/[^[:ascii:]]//g;
print "$str1\n";
I was able to remove the ANSI escape codes from my output by using the Text::ANSI::Util library's ta_strip() function:
my $ansi_string = $exp->before();
my $clean_string = ta_strip($ansi_string);

Perl: Problem with changing encoding in the middle of reading a file

I am using Perl to load some 'macro' files. These macros can, however, be encoded in various encodings, so there is a directive defined for users writing their macros (i.e.
#encoding iso-8859-2
at the beginning of the macro).
Every time this directive is encountered in the macro, a function setting encoding is called and looks sth like this:
sub change_encoding {
my ($file_handle, $encoding) = #_;
$file_handle->flush();
binmode($file_handle); # get rid of IO layers
binmode($file_handle,":encoding($encoding)");
}
The problem is that when I read the macro using standard
while($line = <$file_handle>){
process_macro($line);
}
I got messages saying "utf8 "\xXY" does not map to Unicode", but only if characters with diacritics is near the #encoding directive. I tried several examples and I was able to have half of the string with \xXY codes and other half of the string with correctly decoded characters, like here:
sub macro5_fn {
print "\xBElu\xBBou\xE8k\xFD k\xF9\xF2 úpěl ďábelské ódy\n";
}
If I put more comments before the function, all the characters are OK:
sub macro5_fn {
print "žluťoučký kůň úpěl ďábelské ódy\n";
}
Simply said, the number of correctly decoded characters depends on the distance of these characters from the #encoding directive, the ones that are close are not decoded correctly.
It seems to me that this is an issue of Perl and PerlIO (not) flushing the buffer. Or am I doing something wrong?
Thank you for your answers.
The problem is that <> reads more than just one line, so the next line or so is being interpreted under the old encoding before you ever see the #encoding directive for the new.
Your best bet is probably to read the file in binary mode and use the Encode module to decode each line from the current encoding.

Perl CSV without exponential

I am generating CSV, and I want to store numbers without exponential format.
Please give me some suggestion.
I tried:
I used , perfectly,
I tried single quote before the large number, so I got as expected out in CSV, but number fore it showed single quote, if I click that number then number displaying perfectly.
I tried with delimeter, that is ' quote before one trailing slash.
So far no luck.
You might try using something like the Math::BigInt module:
use Math::BigInt;
my $num = new Math::BigInt(2);
$num=$num**128;
print "$num\n";
which will output:
340282366920938463463374607431768211456