Perl autoincrement of string not working as before - perl

I have some code where I am converting some data elements in a flat file. I save the old:new values to a hash which is written to a file at the end of processing. On subsequence execution, I reload into a hash so I can reuse previously converted values on additional data files. I also save the last conversion value so if I encounter an unconverted value, I can assign it a new converted value and add it to the hash.
I had used this code before (back in Feb) on six files with no issues. I have a variable that is set to ZCKL0 (last character is a zero) which is retrieved from a file holding the last used value. I apply the increment operator
...
$data{$olddata} = ++$dataseed;
...
and the resultant value in $dataseed is 1 instead of ZCKL1. The original starting seed value was ZAAA0.
What am I missing here?

Do you use the $dataseed variable in a numeric context in your code?
From perlop:
If you increment a variable that is
numeric, or that has ever been used in
a numeric context, you get a normal
increment. If, however, the variable
has been used in only string contexts
since it was set, and has a value that
is not the empty string and matches
the pattern /^[a-zA-Z][0-9]\z/ , the
increment is done as a string,
preserving each character within its
range.

As prevously mentioned, ++ on strings is "magic" in that it operates differently based on the content of the string and the context in which the string is used.
To illustrate the problem and assuming:
my $s='ZCL0';
then
print ++$s;
will print:
ZCL1
while
$s+=0; print ++$s;
prints
1
NB: In other popular programming languages, the ++ is legal for numeric values only.
Using non-intuitive, "magic" features of Perl is discouraged as they lead to confusing and possibly unsupportable code.

You can write this almost as succinctly without relying on the magic ++ behavior:
s/(\d+)$/ $1 + 1 /e
The e flag makes it an expression substitution.

Related

What does $variable{$2}++ mean in Perl?

I have a two-column data set in a tab-separated .txt file, and the perl script reads it as FH and this is the immediate snippet of code that follows:
while(<FH>)
{
chomp;
s/\r//;
/(.+)\t(.+)/;
$uniq_tar{$2}++;
$uniq_mir{$1}++;
push#{$mir_arr{$1}},$2;
push #{$target{$2}} ,$1;
}
When I try to print any of the above 4 variables, it says the variables are uninitialized.
And, when I tried to print $uniq_tar{$2}++; and $uniq_mir{$1}++;
It just prints some numbers which I cannot understand.
I would just like to know what this part of code evaluate in general?
$uniq_tar{$2}++;
The while loop puts each line of your file, in turn, into Perl's special variable $_.
/.../ is the match operator. By default it works on $_.
/(.*)\t(.*)/ is a regular expression inside the match operator. If the regex matches what is in $_, then the bits of the matching string that are inside the two pairs of parentheses are stored in Perl's special variables $1 and $2.
You have hashes called %uniq_tar and %uniq_mir. You access individual elements in a hash using the $hashname{key}. So, $uniq_tar{$1} is finding the value in %uniq_tar associated with the key that is stored in $1 (that is - the part of your record before the first tab).
$variable++ increments the number in $variable. So $uniq_tar{$1}++ increments the value that we found in the previous paragraph.
So, as zdim says, it's a frequency counter. You read each line in the file, and extract the bits of data before and after the first tab in the line. You then increment the values in two hashes to count the number of occurences of each of the strings.

how to remove # character from national data type in cobol

i am facing issue while converting unicode data into national characters.
When i convert the Unicode data into national using national-of function, some junk character like # is appended after the string.
E.g
Ws-unicode pic X(200)
Ws-national pic N(600)
--let the value in Ws-Unicode is これらの変更は. getting from java end.
move function national-of ( Ws-unicode ,1208 ) to Ws-national.
--after converting value is like これらの変更は #.
i do not want the extra # character added after conversion.
please help me to find out the possible solution, i have tried to replace N'#' with space using inspect clause.
it worked well but failed in some specific scenario like if we have # in input from user end. in that case genuine # also converted to space.
Below is a snippet of code I used to convert EBCDIC to UTF. Before I was capturing string lengths, I was also getting # symbols:
STRING
FUNCTION DISPLAY-OF (
FUNCTION NATIONAL-OF (
WS-EBCDIC-STRING(1:WS-XML-EBCDIC-LENGTH)
WS-EBCDIC-CCSID
)
WS-UTF8-CCSID
)
DELIMITED BY SIZE
INTO WS-UTF8-STRING
WITH POINTER WS-XML-UTF8-LENGTH
END-STRING
SUBTRACT 1 FROM WS-XML-UTF8-LENGTH
What this code does is string the UTF8 representation of the EBCIDIC string into another variable. The WITH POINTER clause will capture the new length of the string + 1 (+ 1 because the pointer is positioned to the next position after the string ended).
Using this method, you should be able to know exactly how long second string is and use that string with the exact length.
That should remove the unwanted #s.
EDIT:
One thing I forgot to mention, in my case, the # signs were actually EBCDIC low values when viewing the actual hex on the mainframe
Use inspect with reverse and stop after first occurence of #

How are scalars stored 'under the hood' in perl?

The basic types in perl are different then most languages, with types being scalar, array, hash (but apparently not subroutines, &, which I guess are really just scalar references with syntactical sugar). What is most odd about this is that the most common data types: int, boolean, char, string, all fall under the basic data type "scalar". It seems that perl decides rather to treat a scalar as a string, boolean, or number based off of the operator that modifies it, implying the scalar itself is not actually defined as "int" or "String" when saved.
This makes me curious as to how these scalars are stored "under the hood", particularly in regards to it's effect on efficiency (yes I know scripting languages sacrifice efficiency for flexibility, but they still need to be as optimized as possible when flexibility concerns are not affected). It's much easier for me to store the number 65535 (which takes two bytes) then the string "65535" which takes 6 bytes, as such recognizing that $val = 65535 is storing an int would allow me to use 1/3 the memory, in large arrays this could mean fewer cache hits as well.
It's not just limited to saving memory of course. There are times when I can offer more significant optimizations if I know what type of scalar to expect. For instance if I have a hash using very large integers as keys it would be far faster to look up a value if I recognizing the keys as ints, allowing a simply modulo for creating my hash key, then if I have to run more complex hashing logic on a string that has 3 times the bytes.
So I'm wondering how perl handles these scalars under the hood. Does it store every value as a string, sacrificing the extra memory and cpu cost of constant converting string to int in the case that a scalar is always used as an int? Or does it have some logic for inference the type of scalar used to determine how to save and manipulate it?
Edit:
TJD linked to perlguts, which answers half my question. A scalar is actually stored as string, int (signed, unsigned, double) or pointer. I'm not too surprised, I had mostly expected this behavior to occur under the hood, though it's interesting to see the exact types. I'm leaving this question open though because perlguts is actually to low level. Other then telling me that 5 data types exist it doesn't specify how perl works to alternate between them, ie how perl decides which SV type to use when a scalar is saved and how it knows when/how to cast.
There are actually a number of types of scalars. A scalar of type SVt_IV can hold undef, a signed integer (IV) or an unsigned integer (UV). One of type SVt_PVIV can also hold a string[1]. Scalars are silently upgraded from one type to another as needed[2]. The TYPE field indicates the type of a scalar. In fact, arrays (SVt_AV) and hashes (SVt_HV) are really just types of scalars.
While the type of a scalar indicates what the scalar can contain, flags are used to indicate what a scalar does contain. This is stored in the FLAGS field. SVf_IOK signals that a scalar contains a signed integer, while SVf_POK indicates it contains a string[3].
Devel::Peek's Dump is a great tool for looking at the internals of scalars. (The constant prefixes SVt_ and SVf_ are omitted by Dump.)
$ perl -e'
use Devel::Peek qw( Dump );
my $x = 123;
Dump($x);
$x = "456";
Dump($x);
$x + 0;
Dump($x);
'
SV = IV(0x25f0d20) at 0x25f0d30 <-- SvTYPE(sv) == SVt_IV, so it can contain an IV.
REFCNT = 1
FLAGS = (IOK,pIOK) <-- IOK: Contains an IV.
IV = 123 <-- The contained signed integer (IV).
SV = PVIV(0x25f5ce0) at 0x25f0d30 <-- The SV has been upgraded to SVt_PVIV
REFCNT = 1 so it can also contain a string now.
FLAGS = (POK,IsCOW,pPOK) <-- POK: Contains a string (but no IV since !IOK).
IV = 123 <-- Meaningless without IOK.
PV = 0x25f9310 "456"\0 <-- The contained string.
CUR = 3 <-- Number of bytes used by PV (not incl \0).
LEN = 10 <-- Number of bytes allocated for PV.
COW_REFCNT = 1
SV = PVIV(0x25f5ce0) at 0x25f0d30
REFCNT = 1
FLAGS = (IOK,POK,IsCOW,pIOK,pPOK) <-- Now contains both a string (POK) and an IV (IOK).
IV = 456 <-- This will be used in numerical contexts.
PV = 0x25f9310 "456"\0 <-- This will be used in string contexts.
CUR = 3
LEN = 10
COW_REFCNT = 1
illguts documents the internal format of variables quite thoroughly, but perlguts might be a better place to start.
If you start writing XS code, keep in mind it's usually a bad idea to check what a scalar contains. Instead, you should request what should have been provided (e.g. using SvIV or SvPVutf8). Perl will automatically convert the value to the requested type (and warn if appropriate). API calls are documented in perlapi.
In fact, it can hold a string an either a signed integer or an unsigned integer at the same time.
All scalars (including arrays and hashes, excluding one type of scalar that can only hold undef) have two memory blocks at their base. Pointers to the scalar point to its head, which contains the TYPE field and a pointer to the body. Upgrading a scalar replaces the body of the scalar. That way, pointers to the scalar aren't invalidated by an upgrade.
An undef variable is one without any uppercase OK flags set.
The formats used by Perl for data storage are documented in the perlguts perldoc.
In short, though, a Perl scalar is stored as a SV structure containing one of a number of different types, such as an int, a double, a char *, or a pointer to another scalar. (These types are stored as a C union, so only one of them will be present at a time; the SV contains flags indicating which type is used.)
(With regard to hash keys, there's an important gotcha to note there: hash keys are always strings, and are always stored as strings. They're stored in a different type from other scalars.)
The Perl API includes a number of functions which can be used to access the value of a scalar as a desired C type. For example, SvIV() can be used to return the integer value of a SV: if the SV contains an int, that value is returned directly; if the SV contains another type, it's coerced to an integer as appropriate. These functions are used throughout the Perl interpreter for type conversions. However, there is no automatic inference of types on output; functions which operate on strings will always return a PV (string) scalar, for instance, regardless of whether the string "looks like" a number or not.
If you're curious what a given scalar looks like internally, you can use the Devel::Peek module to dump its contents.
Others have addressed the "how are scalars stored" part of your question, so I'll skip that. With regard to how Perl decides which representation of a value to use and when to convert between them, the answer is it depends on which operators are applied to the scalar. For example, given this code:
my $score = 0;
The scalar $score will be initialised with an integer value. But then when this line of code is run:
say "Your score is $score";
The double quote operator means that Perl will need a string representation of the value. So the conversion from integer to string will take place as part of the process of assembling the string argument to the say function. Interestingly, after the stringification of $score, the underlying representation of the scalar will now include both an integer and a string representation, allowing subsequent operations to directly grab the relevant value without having to convert again. If a numeric operator is then applied to the string (e.g.: $score++) then the numeric part will be updated and the (now invalid) string part will be discarded.
This is the reason why Perl operators tend to come in two flavours. For example comparing values of numbers is done with <, ==, > while performing the same comparisons with strings would be done with lt, eq, gt. Perl will coerce the value of the scalar(s) to the type which matches the operator. This is why the + operator does numeric addition in Perl but a separate operator . is needed to do string concatenation: + will coerce its arguments to numeric values and . will coerce to strings.
There are some operators that will work with both numeric and string values but which perform a different operation depending on the type of value. For example:
$score = 0;
say ++$score; # 1
say ++$score; # 2
say ++$score; # 3
$score = 'aaa';
say ++$score; # 'aaa'
say ++$score; # 'aab'
say ++$score; # 'aac'
With regard to questions of efficiency (and bearing in mind standard disclaimers about premature optimisation etc). Consider this code which reads a file containing one integer per line, each integer is validated to check it is exactly 8 digits long and the valid ones are stored in an array:
my #numbers;
while(<$fh>) {
if(/^(\d{8})$/) {
push #numbers, $1;
}
}
Any data read from a file will initially come to us as a string. The regex used to validate the data will also require a string value in $_. So the result is that our array #numbers will contain a list of strings. However, if further uses of the values will be solely in a numeric context, we could use this micro-optimisation to ensure that the array contained only numeric values:
push #numbers, 0 + $1;
In my tests with a file of 10,000 lines, populating #numbers with strings used nearly three times as much memory as populating with integer values. However as with most benchmarks, this has little relevance to normal day-to-day coding in Perl. You'd only need to worry about that in situations where you a) had performance or memory issues and b) were working with a large number of values.
It's worth pointing out that some of this behaviour is common to other dynamic languages (e.g.: Javascript will silently coerce numeric values to strings).

ValueError when converting to int lines retrieved via getline.linecache

this is my first message here, I hope I will not commit any mistake.
I am writing a python 2.7 script which performs comparisons between lines from a long list of lines provided as an external input file. Some of these lines contain just numbers, and on those I perform simple sums after their retrieval via getline.linecache.
My problem is that after a certain number of lines I am getting the error:
ValueError: invalid literal for int() with base 10
I do understand that somehow this has to do with the fact that there is some problem when I try to convert the lines retrieved to the integer type, but according to what I read each line should be retrieved from a memory database as a string, and indeed if I try to print the type of the values retrieved I get str. I printed the problematic values in order to understand why they failed to be converted to int: at first i included some semantic mistakes (I was taking some wrong lines, which were containing letters, and this of course failed to be converted to int), but still I get the error on merely numerical strings. On all of those numerical strings, I tried len(linecache.getline('input', line_n)) to see if any extra characters were present, but I just found '\n', which does not give any problems when converting from str to int.
My input file is made by a series of lines, some numerical some not; here are few lines:
1
id3021-a
1
129485768
129485769
2
id2034
102
944709842
944709848
For examples, line 4 here can be retrieved, but not converted to int. How could I convert str to int without getting errors?
I found the solution! Adding a '0' to the beginning of the string fixes the problem (I do not know why, the problematic lines were not empty):
int('0' + linecache.getline('input', line_n))
See here: Trouble converting string to int in Django/Python

How does this Perl one-liner actually work?

So, I happened to notice that last.fm is hiring in my area, and since I've known a few people who worked there, I though of applying.
But I thought I'd better take a look at the current staff first.
Everyone on that page has a cute/clever/dumb strapline, like "Is life not a thousand times too short for us to bore ourselves?". In fact, it was quite amusing, until I got to this:
perl -e'print+pack+q,c*,,map$.+=$_,74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21, 18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34'
Which I couldn't resist pasting into my terminal (kind of a stupid thing to do, maybe), but it printed:
Just another Last.fm hacker,
I thought it would be relatively easy to figure out how that Perl one-liner works. But I couldn't really make sense of the documentation, and I don't know Perl, so I wasn't even sure I was reading the relevant documentation.
So I tried modifying the numbers, which got me nowhere. So I decided it was genuinely interesting and worth figuring out.
So, 'how does it work' being a bit vague, my question is mainly,
What are those numbers? Why are there negative numbers and positive numbers, and does the negativity or positivity matter?
What does the combination of operators +=$_ do?
What's pack+q,c*,, doing?
This is a variant on “Just another Perl hacker”, a Perl meme. As JAPHs go, this one is relatively tame.
The first thing you need to do is figure out how to parse the perl program. It lacks parentheses around function calls and uses the + and quote-like operators in interesting ways. The original program is this:
print+pack+q,c*,,map$.+=$_,74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21, 18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34
pack is a function, whereas print and map are list operators. Either way, a function or non-nullary operator name immediately followed by a plus sign can't be using + as a binary operator, so both + signs at the beginning are unary operators. This oddity is described in the manual.
If we add parentheses, use the block syntax for map, and add a bit of whitespace, we get:
print(+pack(+q,c*,,
map{$.+=$_} (74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21,
18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34)))
The next tricky bit is that q here is the q quote-like operator. It's more commonly written with single quotes:
print(+pack(+'c*',
map{$.+=$_} (74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21,
18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34)))
Remember that the unary plus is a no-op (apart from forcing a scalar context), so things should now be looking more familiar. This is a call to the pack function, with a format of c*, meaning “any number of characters, specified by their number in the current character set”. An alternate way to write this is
print(join("", map {chr($.+=$_)} (74, …, -34)))
The map function applies the supplied block to the elements of the argument list in order. For each element, $_ is set to the element value, and the result of the map call is the list of values returned by executing the block on the successive elements. A longer way to write this program would be
#list_accumulator = ();
for $n in (74, …, -34) {
$. += $n;
push #list_accumulator, chr($.)
}
print(join("", #list_accumulator))
The $. variable contains a running total of the numbers. The numbers are chosen so that the running total is the ASCII codes of the characters the author wants to print: 74=J, 74+43=117=u, 74+43-2=115=s, etc. They are negative or positive depending on whether each character is before or after the previous one in ASCII order.
For your next task, explain this JAPH (produced by EyesDrop).
''=~('(?{'.('-)#.)#_*([]#!#/)(#)#-#),#(##+#)'
^'][)#]`}`]()`#.#]#%[`}%[#`#!##%[').',"})')
Don't use any of this in production code.
The basic idea behind this is quite simple. You have an array containing the ASCII values of the characters. To make things a little bit more complicated you don't use absolute values, but relative ones except for the first one. So the idea is to add the specific value to the previous one, for example:
74 -> J
74 + 43 -> u
74 + 42 + (-2 ) -> s
Even though $. is a special variable in Perl it does not mean anything special in this case. It is just used to save the previous value and add the current element:
map($.+=$_, ARRAY)
Basically it means add the current list element ($_) to the variable $.. This will return a new array with the correct ASCII values for the new sentence.
The q function in Perl is used for single quoted, literal strings. E.g. you can use something like
q/Literal $1 String/
q!Another literal String!
q,Third literal string,
This means that pack+q,c*,, is basically pack 'c*', ARRAY. The c* modifier in pack interprets the value as characters. For example, it will use the value and interpret it as a character.
It basically boils down to this:
#!/usr/bin/perl
use strict;
use warnings;
my $prev_value = 0;
my #relative = (74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21, 18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34);
my #absolute = map($prev_value += $_, #relative);
print pack("c*", #absolute);