Most efficient way to replace a line in a text document? - text-processing

I am learning to code in Unix with C. So far I have written the code to find the index of the first byte of the line that I want to replace. The problem is that sometimes the number of bytes replacing the line might be greater than the number of bytes already on the line. In this case, the code starts overwriting the next line. I came up with two standard solutions:
a) Rather than trying to edit the file in place, I could copy the entire file into memory, edit it by shifting all the bytes if necessary, and rewrite it back to the file.
b) Only copy the part from the line I want through end-of-file into memory and edit it there.
Neither suggestion scales well, and I don't want to impose any restrictions on the line size (like every line must be 50 bytes or something). Is there an efficient way to do the line replacement? Any help would be appreciated.

With text files you always have to "spool" them as the text to delete/insert/replace will nearly always be larger or smaller than what was there.
"Spooling" means to open a temp file in the file's directory, reading the original file and writing it to the temp file, stop where the replace/insert/delete starts, do your thing and copy the remainder to the output. If everything went fine, then unlink the original file and rename the new file to the old file.
P.s.: if you don't want to have restrictions on line size, then you must use fgetc/fputc to process character-by-character (no sweat; C can be pretty fast, your disks permitting).
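Here is a minimal sketch of that spooling loop, assuming the (1-based) number of the line to replace and the replacement text are already known; the function and parameter names are only illustrative:

#include <stdio.h>

/* Copy inpath to outpath, replacing line lineno (1-based) with repl.
 * Works character by character, so line length never matters. */
int spool_replace_line(const char *inpath, const char *outpath,
                       long lineno, const char *repl)
{
    FILE *in = fopen(inpath, "r");
    FILE *out = fopen(outpath, "w");
    if (!in || !out) {
        if (in) fclose(in);
        if (out) fclose(out);
        return -1;
    }

    long cur = 1;
    int c;
    while ((c = fgetc(in)) != EOF) {
        if (cur == lineno) {
            fputs(repl, out);            /* emit the replacement once */
            fputc('\n', out);
            while (c != EOF && c != '\n')
                c = fgetc(in);           /* skip the rest of the old line */
            cur++;
            continue;
        }
        fputc(c, out);
        if (c == '\n')
            cur++;
    }
    fclose(in);
    return fclose(out);
}

On success you unlink() the original and rename() the temp file over it, exactly as described above.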

I actually came across this problem last month with log files that had grown to 30 GB and one line. Utils like sed and perl wanted to consume all available memory to do anything with them at all. So technically, neither of your solutions scales well. But in practice, they're fine, with (b) being preferred. You should use fgets with a buffer size of, say, 8 kB and iterate until the last character is a newline or you've reached EOF. In my solution, I used perl's sysread function and read 16 kB chunks at a time.
From memory:
#define BUF_SZ 16383

FILE *infile;
int readmore;
char *buf = alloca(BUF_SZ + 1);   /* needs <alloca.h>; a plain malloc works just as well */

infile = fopen(...);
while (fgets(buf, BUF_SZ + 1, infile) != NULL) {
    readmore = (buf[0] != '\0' && buf[strlen(buf) - 1] != '\n');
    /* other processing
       .
       .
    */
    if (readmore) {
        /* apply different strategies for dealing with buf */
    }
}
I think the strategy really depends on what you're trying to do. If you want to remove the line or truncate it, and you only need to match the beginning of the line, then it's pretty trivial (no special code). However, if you need to do a long pattern match that might extend past the first 16 kB, then you have to do something like move the last n bytes (where n is the maximum size of your search pattern) to the beginning of buf and do the next read into &buf[n].
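As a rough sketch of that carry-over idea (the names, the chunk size and the 256-byte pattern limit here are mine, not anything standard), a search that survives chunk boundaries could look like:

#include <stdio.h>
#include <string.h>

#define CHUNK 16384

/* Scan fp for pat in CHUNK-sized reads, carrying the last strlen(pat)-1
 * bytes over so a match straddling two reads is still found.
 * Returns the file offset of the first match, or -1 if not found. */
long find_in_stream(FILE *fp, const char *pat)
{
    size_t n = strlen(pat);
    char buf[CHUNK + 256];          /* room for carry-over plus one read */
    size_t carry = 0;
    long base = 0;                  /* file offset of buf[0] */

    if (n == 0 || n > 256)
        return -1;

    for (;;) {
        size_t got = fread(buf + carry, 1, CHUNK, fp);
        if (got == 0)
            return -1;              /* EOF or read error: no match */

        size_t len = carry + got;
        for (size_t i = 0; i + n <= len; i++)
            if (memcmp(buf + i, pat, n) == 0)
                return base + (long)i;

        carry = n - 1;              /* keep the tail for the next round */
        if (carry > len)
            carry = len;
        memmove(buf, buf + len - carry, carry);
        base += (long)(len - carry);
    }
}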
You will output to a new filehandle, and when everything is done and done correctly, you unlink the first file and rename the new file to the old one. Also research mktemp for creating the temporary file in the same directory, and the atexit call for cleaning up in case there was an error.
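A hedged sketch of that cleanup arrangement, using mkstemp (the safer cousin of mktemp) and atexit; the file-name pattern and the helper name are mine:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static char tmpname[4096];

/* atexit handler: remove the temp file if we bail out before the rename */
static void cleanup_tmp(void)
{
    if (tmpname[0])
        unlink(tmpname);
}

FILE *open_temp_near(const char *origpath)
{
    /* keep the temp file in the same directory so rename() stays on one filesystem */
    snprintf(tmpname, sizeof tmpname, "%s.XXXXXX", origpath);
    int fd = mkstemp(tmpname);
    if (fd == -1) {
        tmpname[0] = '\0';
        return NULL;
    }
    atexit(cleanup_tmp);
    return fdopen(fd, "w");
}

After the final rename succeeds, clear tmpname (tmpname[0] = '\0') so the handler leaves the renamed file alone.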

The most efficient way to replace a line of text in a file depends on a number of things.[1] The primary concern when wanting an efficient search and replace is to minimize the number of file reads/writes, as file I/O is generally an order of magnitude slower than memory operations. The trivial case occurs when the search and replacement strings have exactly the same number of characters. In that case, and only in that case, can you do an in-file replacement of the text without having to write a second (or temporary) file.
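For that exact-length case, the in-place edit is just a seek and a write. A minimal sketch, assuming the byte offset of the old text has already been found (the function name is illustrative):

#include <stdio.h>
#include <string.h>

/* Overwrite strlen(newtext) bytes at offset; only valid when the
 * replacement is exactly as long as the text it replaces. */
int overwrite_at(const char *path, long offset, const char *newtext)
{
    FILE *fp = fopen(path, "r+b");      /* read/update, no truncation */
    if (!fp)
        return -1;
    size_t len = strlen(newtext);
    if (fseek(fp, offset, SEEK_SET) != 0 || fwrite(newtext, 1, len, fp) != len) {
        fclose(fp);
        return -1;
    }
    return fclose(fp);
}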
Given the file I/O costs, the most efficient (fastest) way of performing the search/replace is to either mmap the entire file or use sendfile. Both can make use of kernel-space copying of blocks of text, which generally yields a significant improvement over user-space copy operations. Neither is that difficult. The next best option is to use a buffered read to read the entire contents of the file into memory and then perform a search on the memory buffer to identify the locations (addresses) where the contents are to be changed. You can then incrementally write the buffer out to a new file, writing the replacement text at each location identified during the search of the original buffer.
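A rough sketch of the mmap route on a POSIX system (search phase only; the edited copy would still be written to a second file as described):

#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

/* Map the whole file read-only and return the offset of the first match,
 * or -1 if the search term is absent. */
long mmap_find(const char *path, const char *srch)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    struct stat st;
    if (fstat(fd, &st) == -1 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    char *map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                          /* the mapping survives the close */
    if (map == MAP_FAILED)
        return -1;

    long found = -1;
    size_t slen = strlen(srch);
    for (size_t i = 0; i + slen <= (size_t)st.st_size; i++)
        if (memcmp(map + i, srch, slen) == 0) {
            found = (long)i;
            break;
        }

    munmap(map, st.st_size);
    return found;
}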
Given the concerns in your example, there is no need to read the entire file into memory at once (even though files are rarely larger than INT_MAX bytes). With embedded systems, etc., where both memory storage and efficiency are challenges, you can simply set a block size of some arbitrary size that meets your constraints and then read the file one block at a time, performing the search/replace and handling corner cases as required (e.g. where fewer than a search-length of characters containing the first part of the search string remain in a given block, etc.).
The key is to minimize the number of times you go back to the drive for more information or write from your buffer to the drive. So generally the larger the block the better.
The following is a short minimal example that reads a given file in block-sized memory chunks as specified by the user (or as limited by the file size itself if it is smaller than the requested block size). The code simply reads a block of data from the file and uses memchr to locate each occurrence of the first character of the search term. When that character is found, memcmp is used to check a search-term-length of memory beginning at the matched character. Depending on whether the comparison matches or fails, the various indexes are updated and the search continues.
Data is output to the new file (stdout in this case) each time a matching term is found within a block of memory, or at the end of the block if no matching terms are found. This code can be further optimized and additional sanity checks can always be added. Look over the example and let me know if you have any questions. None of it is difficult, but a couple of separate indexes (e.g. the current buffer position and the last read/write position) are necessary to perform the search/replace. The example that follows uses a text file containing several paragraphs that discuss "personal injury", where "hygiene" is substituted for "injury" and the results are output to stdout.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#define BUFSZ (1 << 20) /* default max block size (1M) */
void *find_rplc_file (char *srch, char *rplc, FILE *ifp, FILE *ofp, long blksz);
int main (int argc, char **argv) {
if (argc != 4) {
fprintf (stderr, "error: insufficient input.\n"
"usage: %s infile <search> <replace>\n", argv[0]);
return 1;
}
FILE *ifp = fopen (argv[1], "rb");
if (!ifp) {
fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
return 1;
}
if (!find_rplc_file (argv[2], argv[3], ifp, stdout, BUFSZ)) {
fprintf (stderr, "error: find/replace failure.\n");
return 1;
}
putchar ('\n');
fclose (ifp);
return 0;
}
void *find_rplc_file (char *srch, char *rplc, FILE *ifp, FILE *ofp, long blksz)
{
if (!ifp || !srch || !rplc || !blksz) return NULL;
char *fb, *filebuf = NULL;
size_t offset = 0, nbytes = 0, readsz = 0, rlen, slen;
long bytecnt = 0, readpos = 0, size = 0;
rlen = strlen (rplc); /* length of search/replace text */
slen = strlen (srch);
fseek (ifp, 0, SEEK_END);
if ((size = ftell (ifp)) == -1) { /* get file length */
fprintf (stderr, "error: unable to determine file length.\n");
return NULL;
}
fseek (ifp, 0, SEEK_SET);
/* limit blksz to at most INT_MAX */
blksz = blksz > INT_MAX ? INT_MAX : blksz;
/* validate blksz does not exceed file size */
readsz = blksz > size ? size : blksz;
/* allocate memory for filebuf */
if (!(filebuf = calloc (readsz, sizeof *filebuf))) {
fprintf (stderr, "error: virtual memory exhausted.\n");
return NULL;
}
/* read entire file readsz bytes at a time */
while ((nbytes = fread (filebuf, sizeof *filebuf, readsz, ifp))) {
if (nbytes != readsz) fprintf (stderr, "warning: short read.\n");
readpos = 0; /* initialize read position & pointer */
fb = filebuf;
/* for each occurrence of 1st char of search term */
while ((fb = memchr (fb, *srch, nbytes - (fb - filebuf)))) {
/* set current offset in buffer */
offset = fb - filebuf;
/* if less than length of search term remains */
if (offset + slen > nbytes) {
/* if this was the last block, no full match can start here */
if (bytecnt + (long)nbytes >= size) break;
nbytes = offset; /* set nbytes to current offset */
/* reset file pointer to account for nbytes reduction */
fseek (ifp, bytecnt + nbytes, SEEK_SET);
goto getnext; /* read next block from here */
}
/* otherwise compare fb to search term */
if (memcmp (srch, fb, slen) == 0) {
/* if term found, write prior buffer to output file */
fwrite (filebuf + readpos, sizeof *filebuf,
offset - readpos, ofp);
/* write replacement text */
fwrite (rplc, sizeof *rplc, rlen, ofp);
/* set next readpos to 1st char following search term */
readpos = offset + slen;
fb += slen - 1; /* skip the matched term so the next search starts after it */
}
fb++; /* advance fb pointer for next memchr search */
}
getnext:
bytecnt += nbytes; /* increment bytecnt with bytes searched */
/* write remaining buffer to output file */
fwrite (filebuf + readpos, sizeof *filebuf,
nbytes - readpos, ofp);
/* check file complete */
if (bytecnt == size) break;
/* set next read size (either blksz or remaining chars < blksz) */
readsz = size - bytecnt > blksz ? blksz : size - bytecnt;
}
/* validate all bytes successfully read */
if ((long)bytecnt != size) {
fprintf (stderr, "error: file read failed.\n");
free (filebuf);
return NULL;
}
free (filebuf); /* free filebuf */
return srch; /* return something other than NULL for success */
}
Example Input
$ cat dat/damages.txt
Personal injury damage awards are unliquidated
and are not capable of certain measurement; thus, the
jury has broad discretion in assessing the amount of
damages in a personal injury case. Yet, at the same
time, a factual sufficiency review insures that the
evidence supports the jury's award; and, although
difficult, the law requires appellate courts to conduct
factual sufficiency reviews on damage awards in
personal injury cases. Thus, while a jury has latitude in
assessing intangible damages in personal injury cases,
a jury's damage award does not escape the scrutiny of
appellate review.
Because Texas law applies no physical manifestation
rule to restrict wrongful death recoveries, a
trial court in a death case is prudent when it chooses
to submit the issues of mental anguish and loss of
society and companionship. While there is a
presumption of mental anguish for the wrongful death
beneficiary, the Texas Supreme Court has not indicated
that reviewing courts should presume that the mental
anguish is sufficient to support a large award. Testimony
that proves the beneficiary suffered severe mental
anguish or severe grief should be a significant and
sometimes determining factor in a factual sufficiency
analysis of large non-pecuniary damage awards.
Search "injury" Replace "hygiene"
$ ./bin/fread_blks_min dat/damages.txt "injury" "hygiene"
Personal hygiene damage awards are unliquidated
and are not capable of certain measurement; thus, the
jury has broad discretion in assessing the amount of
damages in a personal hygiene case. Yet, at the same
time, a factual sufficiency review insures that the
evidence supports the jury's award; and, although
difficult, the law requires appellate courts to conduct
factual sufficiency reviews on damage awards in
personal hygiene cases. Thus, while a jury has latitude in
assessing intangible damages in personal hygiene cases,
a jury's damage award does not escape the scrutiny of
appellate review.
Because Texas law applies no physical manifestation
rule to restrict wrongful death recoveries, a
trial court in a death case is prudent when it chooses
to submit the issues of mental anguish and loss of
society and companionship. While there is a
presumption of mental anguish for the wrongful death
beneficiary, the Texas Supreme Court has not indicated
that reviewing courts should presume that the mental
anguish is sufficient to support a large award. Testimony
that proves the beneficiary suffered severe mental
anguish or severe grief should be a significant and
sometimes determining factor in a factual sufficiency
analysis of large non-pecuniary damage awards.
Footnote [1]: the most efficient way is to use one of the shell tools designed for the task, such as sed or awk, which both have years of development and fairly good text handling optimizations built in.

Related

char array in c in overflowed scanf operation

I am new to C programming and I am really confused by the code below:
#include <stdio.h>
#include <string.h>

int main(void)
{
    char arrstr[6];
    int i;

    printf("Enter: ");
    scanf("%s", arrstr);
    printf("arrstr is %s\n", arrstr);
    printf("length of arrstr is %zu\n", strlen(arrstr));
    for (i = 0; i < 20; i++)
    {
        printf("arrstr[%d] is %c, dec value is %d\n", i, arrstr[i], arrstr[i]);
    }
    return 0;
}
From my understanding, after the declaration of arrstr[6], the compiler will allocate 6 bytes for this char array, and considering the trailing '\0' char, 5 valid chars can be stored in the char array.
But after I run this short code, I get the result below:
The printf shows all the chars I input, no matter how long the input is. But when I use an index to check the array, I can't seem to find the extra chars in the array.
Can anyone help explain what happened?
Thanks.
Try changing your code by adding this line right after the scanf statement:
arrstr[5] = '\0';
What has happened is that the null character was overwritten by the user entry. Putting the null character back in manually gives you proper behavior for the next two lines, the printf statements.
The for loop is another matter. C does not have any kind of bounds checking, so it's up to the programmer not to overrun the bounds of an array. The values you get after that could be anything at all, as you are just reading whatever happens to be in memory past the array at that point. A standard way of avoiding this kind of mistake is to use a const int variable to declare the array size:
const int SIZE = 6;
char arrstring[SIZE];
Then also use SIZE as the limit in the for loop.
P.S. There is still a problem here with the user entry as written, because a user could theoretically enter hundreds of characters, and that would get written out of bounds, possibly causing weird bugs. There are ways to limit the amount of user entry, but it gets fairly involved; here are some Stack Overflow posts on the topic. Keep them in mind for future reference:
Limiting user entry with fgets instead of scanf
Cleaning up the stdin stream after using fgets
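For reference, a small sketch of what those posts describe, with fgets bounding the read and the newline stripped afterwards (buffer size kept at 6 to match the question):

#include <stdio.h>
#include <string.h>

int main(void)
{
    char arrstr[6];                          /* 5 chars + terminating '\0' */

    printf("Enter: ");
    if (fgets(arrstr, sizeof arrstr, stdin) == NULL)
        return 1;

    /* fgets keeps the '\n' if it fit; strip it */
    arrstr[strcspn(arrstr, "\n")] = '\0';

    printf("arrstr is %s\n", arrstr);
    printf("length of arrstr is %zu\n", strlen(arrstr));
    return 0;
}

Anything beyond 5 characters stays in stdin, which is exactly the clean-up problem the second link deals with.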

perl6: Cannot unbox 65536 bit wide bigint into native integer

I'm trying some examples from Rosetta Code and encountered an issue with the provided Ackermann example. When running it "unmodified" (I replaced the UTF-8 variable names with Latin-1 ones), I get (similar, but now copyable):
$ perl6 t/ackermann.p6
65533
19729 digits starting with 20035299304068464649790723515602557504478254755697...
Cannot unbox 65536 bit wide bigint into native integer
in sub A at t/ackermann.p6 line 3
in sub A at t/ackermann.p6 line 11
in sub A at t/ackermann.p6 line 3
in block <unit> at t/ackermann.p6 line 17
Removing the proto declaration in line 3 (by commenting out):
$ perl6 t/ackermann.p6
65533
19729 digits starting with 20035299304068464649790723515602557504478254755697...
Numeric overflow
in sub A at t/ackermann.p6 line 8
in sub A at t/ackermann.p6 line 11
in block <unit> at t/ackermann.p6 line 17
What went wrong? The program doesn't allocate much memory. Is the natural integer kind-of limited?
In the code from the Ackermann function task I replaced 𝑚 with m and 𝑛 with n for better terminal interaction (copying errors) and tried commenting out the proto declaration. I also asked Liz ;)
use v6;
proto A(Int \m, Int \n) { (state @)[m][n] //= {*} }
multi A(0, Int \n) { n + 1 }
multi A(1, Int \n) { n + 2 }
multi A(2, Int \n) { 3 + 2 * n }
multi A(3, Int \n) { 5 + 8 * (2 ** n - 1) }
multi A(Int \m, 0 ) { A(m - 1, 1) }
multi A(Int \m, Int \n) { A(m - 1, A(m, n - 1)) }
# Testing:
say A(4,1);
say .chars, " digits starting with ", .substr(0,50), "..." given A(4,2);
A(4, 3).say;
Please read JJ's answer first. It's breezy and led to this answer which is effectively an elaboration of it.
TL;DR A(4,3) is a very big number, one that cannot be computed in this universe. But raku(do) will try. As it does you will blow past reasonable limits related to memory allocation and indexing if you use the caching version and limits related to numeric calculations if you don't.
I try some examples from Rosettacode and encounter an issue with the provided Ackermann example
Quoting the task description with some added emphasis:
Arbitrary precision is preferred (since the function grows so quickly)
raku's standard integer type Int is arbitrary precision. The raku solution uses them to compute the most advanced answer possible. It only fails when you make it try to do the impossible.
When running it "unmodified" (I replaced the utf-8 variable names by latin-1 ones)
Replacing the variable names is not a significant change.
But adding the A(4,3) line shifted the code from being computable in reality to not being computable in reality.
The example you modified has just one explanatory comment:
Here's a caching version of that ... to make A(4,2) possible
Note that the A(4,2) solution is nearly 20,000 digits long.
If you look at the other solutions on that page most don't even try to reach A(4,2). There are comments like this one on the Phix version:
optimised. still no bignum library, so ack(4,2), which is power(2,65536)-3, which is apparently 19729 digits, and any above, are beyond (the CPU/FPU hardware) and this [code].
A solution for A(4,2) is the most advanced possible.
A(4,3) is not computable in practice
To quote Academic Kids: Ackermann function:
Even for small inputs (4,3, say) the values of the Ackermann function become so large that they cannot be feasibly computed, and in fact their decimal expansions cannot even be stored in the entire physical universe.
So computing A(4,3).say is impossible (in this universe).
It must inevitably lead to an overflow of even arbitrary precision integer arithmetic. It's just a matter of when and how.
Cannot unbox 65536 bit wide bigint into native integer
The first error message mentions this line of code:
proto A(Int \m, Int \n) { (state @)[m][n] //= {*} }
The state @ is an anonymous state array variable.
By default @ variables use the default concrete type for raku's abstract array type. This default array type provides a balance between implementation complexity and decent performance.
While computing A(4,2) the indexes (m and n) remain small enough that the computation completes without overflowing the default array's indexing limit.
This limit is a "native" integer (note: not a "natural" integer). A "native" integer is what raku calls the fixed width integers supported by the hardware it's running on, typically a long long which in turn is typically 64 bits.
A 64 bit wide index can handle indices up to 9,223,372,036,854,775,807.
But in trying to compute A(4,3) the algorithm generates a 65536-bit (8192 byte) wide integer index. Such an integer could be as big as 2**65536, a 19,729 decimal digit number. But the biggest index allowed is a 64 bit native integer. So unless you comment out the caching line that uses an array, then for A(4,3) the program ends up throwing the exception:
Cannot unbox 65536 bit wide bigint into native integer
Limits to allocations and indexing of the default array type
As already explained, there is no array that could be big enough to help fully compute A(4,3). In addition, a 64 bit integer is already a pretty big index (9,223,372,036,854,775,807).
That said, raku can accommodate other array implementations such as Array::Sparse so I'll discuss that briefly below because such possibilities might be of interest for other problems.
But before discussing bigger arrays, running the code below on tio.run shows the practical limits for the default array type on that platform:
my @array;
@array[2**29]++; # works
@array[2**30]++; # could not allocate 8589967360 bytes
@array[2**60]++; # Unable to allocate ... 1152921504606846977 elements
@array[2**63]++; # Cannot unbox 64 bit wide bigint into native integer
(Comment out error lines to see later/greater errors.)
The "could not allocate 8589967360 bytes" error is a MoarVM panic. It's a result of tio.run refusing a memory allocation request.
I think the "Unable to allocate ... elements" error is a raku level exception that's thrown as a result of exceeding some internal Rakudo implementation limit.
The last error message shows the indexing limit for the default array type even if vast amounts of memory were made available to programs.
What if someone wanted to do larger indexing?
It's possible to create/use other @ (does Positional) data types that support things like sparse arrays etc.
And, using this mechanism, it's possible that someone could write an array implementation that supports larger integer indexing than is supported by the default array type (presumably by layering logic on top of the underlying platform's instructions; perhaps the Array::Sparse I linked above does).
If such an alternative were called BigArray then the cache line could be replaced with:
my @array is BigArray;
proto A(Int \𝑚, Int \𝑛) { @array[𝑚][𝑛] //= {*} }
Again, this still wouldn't be enough to store interim results for fully computing A(4,3) but my point was to show use of custom array types.
Numeric overflow
When you comment out the caching you get:
Numeric overflow
Raku/Rakudo do arbitrary precision arithmetic. While this is sometimes called infinite precision it obviously isn't actually infinite but is instead, well, "arbitrary", which in this context also means "sane" for some definition of "sane".
This classically means running out of memory to store a number. But in Rakudo's case I think there's an attempt to keep things sane by switching from a truly vast Int to a Num (a floating point number) before completely running out of RAM. But then computing A(4,3) eventually overflows even a double float.
So while the caching blows up sooner, the code is bound to blow up later anyway, and then you'd get a numeric overflow that would either manifest as an out of memory error or a numeric overflow error as it is in this case.
Array subscripts use native ints; that's why you get the error in line 3, when you use the big ints as array subscripts. You might have to define a new BigArray that uses Ints as array subscripts.
The second problem arises in the ** operator: the result is a Real, and when the low-level operations returns a Num, it throws an exception.
https://github.com/rakudo/rakudo/blob/master/src/core/Int.pm6#L391-L401
So creating a BigArray might not be helpful anyway. You'll have to create your own ** too, that always works with Int, but you seem to have hit the (not so infinite) limit of the infinite precision Ints.

Giving freopen() and stderr a buffer (restricting size of a log file for iOS app)

I'm currently redirecting NSLog() output to a file using a call to freopen() from the App Delegate. I would like to restrict the log file size, but doing this-
unsigned long long fs = 3000;
while ([fileAttributes fileSize] < fs) {
freopen([FILEPATH cStringUsingEncoding:NSASCIIStringEncoding], "a+", stderr);
}
causes the app to be stuck with a black screen in an infinite loop. Is there any way I can set a buffer size for stderr, and then have a loop wherein I only continue to write to the file if the file size plus the buffer size does not exceed the limit fs?
Looks like you are stuck in that infinite for loop. Can you not do that, but just check the fileSize and dump strings once the file size is greater than fs?
Your code says:
As long as [fileAttributes fileSize] is less than fs, keep calling freopen(...).
Is that really what you want? I'd guess that you might prefer something like:
if ([fileAttributes fileSize] < fs) {
file = freopen([FILEPATH cStringUsingEncoding:NSASCIIStringEncoding], "a+", stderr);
}
fprintf(file, "something to log\n");
fclose(file);
Of course, that's not great either, because it writes log information for fs characters and then stops. Instead, it seems like you probably want to preserve the LAST fs characters. If that's the case, then a pragmatic approach would be to simply copy the last fs characters to a new file and delete the old one from time to time.
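A plain-C sketch of that approach, copying the tail of the log to a temporary file and renaming it back (the paths, the size limit and the function name are placeholders; after the rename you would also need to re-establish the freopen redirection):

#include <stdio.h>

/* Shrink the file at path so that only the last keep bytes remain. */
int trim_log(const char *path, long keep)
{
    FILE *in = fopen(path, "rb");
    if (!in)
        return -1;

    if (fseek(in, 0, SEEK_END) != 0) {
        fclose(in);
        return -1;
    }
    long size = ftell(in);
    if (size <= keep) {                 /* nothing to trim */
        fclose(in);
        return 0;
    }

    char tmp[1024];
    snprintf(tmp, sizeof tmp, "%s.tmp", path);
    FILE *out = fopen(tmp, "wb");
    if (!out) {
        fclose(in);
        return -1;
    }

    fseek(in, size - keep, SEEK_SET);
    int c;
    while ((c = fgetc(in)) != EOF)      /* copy the tail byte by byte */
        fputc(c, out);

    fclose(in);
    fclose(out);
    return rename(tmp, path);
}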

perlre length limit

From man perlre:
The "*" quantifier is equivalent to "{0,}", the "+" quantifier to "{1,}", and the "?" quantifier to "{0,1}". n and m are limited to integral values less than a preset limit defined when perl is built. This is usually 32766 on the most common platforms. The actual limit can be seen in the error message generated by code such as this:
$_ **= $_ , / {$_} / for 2 .. 42;
Ay that's ugly - Isn't there some constant I can get instead?
Edit: As daxim pointed out (and perlretut hints towards) it might be that 32767 is a magical hardcoded number. A little searching in the Perl code goes a long way, but I'm not sure how to get to the next step and actually find out where the default reg_infty or REG_INFTY is actually set:
~/dev/perl-5.12.2
$ grep -ri 'reg_infty.*=' *
regexec.c: if (max != REG_INFTY && ST.count == max)
t/re/pat.t: $::reg_infty = $Config {reg_infty} // 32767;
t/re/pat.t: $::reg_infty_m = $::reg_infty - 1;
t/re/pat.t: $::reg_infty_p = $::reg_infty + 1;
t/re/pat.t: $::reg_infty_m = $::reg_infty_m; # Surpress warning.
Edit 2: DVK is of course right: It's defined at compile time, and can probably be overridden only with REG_INFTY.
Summary: there are 3 ways I can think of to find the limit: empirical, "matching Perl tests" and "theoretical".
Empirical:
eval {$_ **= $_ , / {$_} / for 2 .. 129};
# To be truly portable, the above should ideally loop forever till $@ is true.
$@ =~ /bigger than (-?\d+) /;
print "LIMIT: $1\n";
This seems obvious enough that it doesn't require explanation.
Matches Perl tests:
Perl has a series of tests for regex, some of which (in pat.t) deal with testing this max value. So, you can approximate that the max value computed in those tests is "good enough" and follow the test's logic:
use Config;
$reg_infty = $Config {reg_infty} // 2 ** 15 - 1; # 32767
print "Test-based reg_infinity limit: $reg_infty\n";
The explanation of where in the tests this comes from is in the details below.
Theoretical: This is attempting to replicate the EXACT logic used by C code to generate this value.
This is harder than it sounds, because it's affected by 2 things: the Perl build configuration and a bunch of C #define statements with branching logic. I was able to delve fairly deeply into that logic, but was stalled on two problems: the #ifdefs reference a bunch of tokens that are NOT actually defined anywhere in the Perl code that I can find (and I don't know how to find out from within Perl what those defines' values were), and the ultimate default value (assuming I'm right and those #ifdefs always end up with the default) of #define PERL_USHORT_MAX ((unsigned short)~(unsigned)0). (The actual limit is gotten by removing 1 bit off that resulting all-ones number - details below.)
I'm also not sure how to access the number of bytes in a short from Perl for whichever implementation was used to build the perl executable.
So, even if the answer to both those questions can be found (which I'm not sure of), the resulting logic would most certainly be "uglier" and more complex than the straightforward "empirical eval-based" one I offered as the first option.
Below I will provide the details of where various bits and pieces of logic related to this limit live in the Perl code, as well as my attempts to arrive at a "theoretically correct" solution matching the C logic.
OK, here is the investigation part way through; you can complete it yourself, as I have to run, or I will complete it later:
From regcomp.c: vFAIL2("Quantifier in {,} bigger than %d", REG_INFTY - 1);
So, the limit is obviously taken from the REG_INFTY define, which is declared in regcomp.h:
/* XXX fix this description.
Impose a limit of REG_INFTY on various pattern matching operations
to limit stack growth and to avoid "infinite" recursions.
*/
/* The default size for REG_INFTY is I16_MAX, which is the same as
SHORT_MAX (see perl.h). Unfortunately I16 isn't necessarily 16 bits
(see handy.h). On the Cray C90, sizeof(short)==4 and hence I16_MAX is
((1<<31)-1), while on the Cray T90, sizeof(short)==8 and I16_MAX is
((1<<63)-1). To limit stack growth to reasonable sizes, supply a
smaller default.
--Andy Dougherty 11 June 1998
*/
#if SHORTSIZE > 2
# ifndef REG_INFTY
# define REG_INFTY ((1<<15)-1)
# endif
#endif
#ifndef REG_INFTY
# define REG_INFTY I16_MAX
#endif
Please note that SHORTSIZE is overridable via Config - I will leave details of that out but the logic will need to include $Config{shortsize} :)
From handy.h (this doesn't seem to be part of Perl source at first glance so it looks like an iffy step):
#if defined(UINT8_MAX) && defined(INT16_MAX) && defined(INT32_MAX)
#define I16_MAX INT16_MAX
#else
#define I16_MAX PERL_SHORT_MAX
#endif
I could not find ANY place which defined INT16_MAX at all :(
Someone help please!!!
PERL_SHORT_MAX is defined in perl.h:
#ifdef SHORT_MAX
# define PERL_SHORT_MAX ((short)SHORT_MAX)
#else
# ifdef MAXSHORT /* Often used in <values.h> */
# define PERL_SHORT_MAX ((short)MAXSHORT)
# else
# ifdef SHRT_MAX
# define PERL_SHORT_MAX ((short)SHRT_MAX)
# else
# define PERL_SHORT_MAX ((short) (PERL_USHORT_MAX >> 1))
# endif
# endif
#endif
I wasn't able to find any place which defined SHORT_MAX, MAXSHORT or SHRT_MAX so far. So the default of ((short) (PERL_USHORT_MAX >> 1)) it is assumed to be for now :)
PERL_USHORT_MAX is defined very similarly in perl.h, and again I couldn't find a trace of definition of USHORT_MAX/MAXUSHORT/USHRT_MAX.
Which seems to imply that it's set by default to: #define PERL_USHORT_MAX ((unsigned short)~(unsigned)0). How to extract that value from the Perl side, I have no clue - it's basically the number you get by bitwise negating an unsigned short 0, so if unsigned short is 16 bits, then PERL_USHORT_MAX will be 16 ones, and PERL_SHORT_MAX will be 15 ones, i.e. 2**15-1, i.e. 32767.
Also, from t/re/pat.t (regex tests): $::reg_infty = $Config {reg_infty} // 32767; (to illustrate where the non-default compiled in value is stored).
So, to get your constant, you do:
use Config;
my $shortsize = $Config{shortsize} // 2;
$c_reg_infty = (defined $Config{reg_infty}) ? $Config{reg_infty}
             : ($shortsize > 2)             ? 2**15-1
             :                                get_PERL_SHORT_MAX();
# Where get_PERL_SHORT_MAX() depends on logic for PERL_SHORT_MAX in perl.h
# which I'm not sure how to extract into Perl with any precision
# due to a bunch of never-seen "#define"s and unknown size of "short".
# You can probably do fairly well by simply returning 2**7-1 if shortsize==1
# and 2**15-1 otherwise.
say "REAL reg_infinity based on C headers: $c_reg_infty";

Casting Command Line Arguments?

I have a command line executable program that accepts a configuration file as its one launch argument. The configuration file contains only 2 lines:
#ConfigFile.cfg
name=geoffrey
city=montreal
Currently, the wrapper program that I'm building for this command line executable writes the configuration file to disk and then passes that written file as the argument to launch the program on the command line.
> myProgram.exe configFile.cfg
Is it possible to pass these configuration file entries as if they were the configuration file, allowing me to bypass writing the configuration file to disk, but still let the program run as if it's reading from a configuration file?
Perhaps something like the following:
> myProgram.exe configFile.cfg(name=geoffrey) configFile.cfg(city=montreal)
If you don't control the source for the program you're wrapping, and it doesn't already provide a facility to receive input in another way, you're going to find it difficult at best.
One possibility would be to intercept the file open and access calls used by the program, though this is a horrible way to do it.
It would probably involve injecting your own runtime libraries containing (for C) fopen, fclose, fread and so on, between the program and the real libraries (such as using LD_LIBRARY_PATH or something similar), and that's assuming it's not statically linked. Not something for the faint of heart.
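For illustration only, an interposed fopen of the sort described might look like this on Linux (built as a shared object and injected with LD_PRELOAD; the substituted path is a placeholder):

#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <string.h>

/* Redirect any attempt to open "configFile.cfg" to a file of our choosing. */
FILE *fopen(const char *path, const char *mode)
{
    static FILE *(*real_fopen)(const char *, const char *) = NULL;
    if (!real_fopen)
        real_fopen = (FILE *(*)(const char *, const char *))dlsym(RTLD_NEXT, "fopen");

    if (strcmp(path, "configFile.cfg") == 0)
        path = "/tmp/generated.cfg";         /* placeholder substitute */

    return real_fopen(path, mode);
}

Something like gcc -shared -fPIC -o shim.so shim.c -ldl and then LD_PRELOAD=./shim.so myProgram.exe configFile.cfg would wire it up, assuming the target is dynamically linked and actually uses fopen.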
If you're worried about people being able to see your file, there are plenty of ways to avoid that, such as creating it with rwx------ permissions in a similarly protected directory (assuming a UNIX-like OS). That's probably safer than using command line arguments, which any joker who's logged in could find out with a ps command.
If you just don't want the hassle of creating a file, I think you'll find the hassle of avoiding it is going to be so much more.
Depending on what you're really after, it wouldn't take much to put together a program that accepted arguments, wrote them to a temporary file, called the real program with that temporary file name, then deleted the file.
The configuration would still be written to disk, but it would no longer be the responsibility of your wrapper program. Something along the lines of this (a bit rough, but you should get the idea):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PROG "myProgram"

char *getTempFSpec (void);   // need to provide this.

int main (int argCount, char *argVal[]) {
    char *tmpFSpec;
    FILE *fHndl;
    int i;
    char *cmdBuff;

    tmpFSpec = getTempFSpec ();
    cmdBuff = malloc (sizeof (PROG) + 1 + strlen (tmpFSpec) + 1);
    if (cmdBuff == NULL) {
        // handle error.
        return 1;
    }

    fHndl = fopen (tmpFSpec, "w");
    if (fHndl == NULL) {
        // handle error.
        free (cmdBuff);
        return 1;
    }

    for (i = 1; i < argCount; i++)
        fprintf (fHndl, "%s\n", argVal[i]);
    fclose (fHndl);                       // flush the file before the child reads it

    sprintf (cmdBuff, "%s %s", PROG, tmpFSpec);
    system (cmdBuff);

    remove (tmpFSpec);                    // clean up the temporary file
    free (cmdBuff);
    return 0;
}
}