one-liner -- multiple file substitution produces out-of-sync results - perl

Context
perl 5.22
multi-file transformation with a perl one-liner
Overview
TrevorWattanStewie has a directory full of config files, and he wants to transform them.
The transformation operation is best understood by comparing "BEFORE" to "AFTER".
Files BEFORE
## ./configfile001.config
TrevorWattanStewie@oldmail.com;--blank--
## ./configfile002.config
TrevorWattanStewie@oldmail.com;--blank--
## ./configfile003.config
TrevorWattanStewie@oldmail.com;--blank--
## ./configfile004.config
TrevorWattanStewie@oldmail.com;--blank--
SallyWattanStewie@oldmail.com;--blank--
RickyWattanStewie@oldmail.com;--blank--
Files AFTER (Desired result)
## ./configfile001.config
TrevorWattanStewie@newmail.com;configfile001.config
## ./configfile002.config
TrevorWattanStewie@newmail.com;configfile002.config
## ./configfile003.config
TrevorWattanStewie@newmail.com;configfile003.config
## ./configfile004.config
TrevorWattanStewie@newmail.com;configfile004.config
SallyWattanStewie@newmail.com;configfile004.config
RickyWattanStewie@newmail.com;configfile004.config
Step by Step Explanation
Trevor wants to:
replace all --blank-- tokens with the name of the file currently being processed.
change every @oldmail substring to @newmail
Trevor's attempt
Trevor decides the quickest way to get the job done is with a perl one-liner.
The one-liner Trevor uses is as follows:
perl -pi -e '$curf=$ARGV[0];s/--blank--/$curf/; s/@oldmail.com/@newmail.com/;' *.config
Problem
When Trevor runs the script, the output does not meet his expectations.
The actual result is as follows:
Files AFTER (Actual result)
## ./configfile001.config
TrevorWattanStewie@oldmail.com;configfile002.config
## ./configfile002.config
TrevorWattanStewie@oldmail.com;configfile003.config
## ./configfile003.config
TrevorWattanStewie@oldmail.com;configfile004.config
## ./configfile004.config
TrevorWattanStewie@oldmail.com;
SallyWattanStewie@oldmail.com;
RickyWattanStewie@oldmail.com;
Questions
Why did Trevor's script fail to transform @oldmail to @newmail?
Why is the file numbering mismatched? The sequence numbering is off by one.

You want to use the variable $ARGV, which is the name of the currently processed file. $ARGV[0], on the other hand, is the first element of whatever is left of @ARGV, i.e. the next file in the queue, because the <> operator behind -p shifts each filename off @ARGV as it opens it; that is why every file got stamped with the following file's name and the last file got an empty string.
So s/--blank--/$ARGV/;
Also, @oldmail (etc.) will be interpolated inside the regex, as Wumpus Q. Wumbley notes.
I always run my one-liners with -wE.

Trevor didn't enable warnings, thus missing out on the explanation:
$ perl -wpi -e '$curf=$ARGV[0];s/--blank--/$curf/; s/@oldmail.com/@newmail.com/;' *.config
Possible unintended interpolation of @oldmail in string at -e line 1.
Possible unintended interpolation of @newmail in string at -e line 1.
@oldmail and @newmail are treated as arrays: the s/// operator interpolates variables, including arrays, so the (empty) arrays are interpolated and the literal text @oldmail.com is never matched. You need to escape the sigil with \@.
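Putting both fixes together, a corrected one-liner might look like this (a sketch, assuming the files to edit really are the *.config files shown above):
perl -wpi -e 's/--blank--/$ARGV/; s/\@oldmail\.com/\@newmail.com/;' *.config
$ARGV needs no helper variable, and escaping the @ sigils (plus the literal dot on the pattern side) keeps Perl from interpolating nonexistent arrays.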

Related

Variable not being recognized after "read"

-- Edit : Resolved. See answer.
Background:
I'm writing a shell script that will perform some extra actions required on our system when someone resizes a database.
The script is written in ksh (a requirement); the OS is Solaris 5.10.
The problem is with one of the checks, which verifies there's enough free space on the underlying OS.
Problem:
The check reads the df -k line for root, which is what I check in this step, and prints it to a file. I then "read" the contents into variables which I use in calculations.
Unfortunately, when I try to run an arithmetic operation on one of the variables, I get an error indicating it is null. And a debug output line I've placed after that line verifies that it is null... It lost its value...
I've tried every method of doing this I could find online; they work when I run them manually, but not inside the script.
(The file does start with #!/usr/bin/ksh.)
Code:
df -k | grep "rpool/ROOT" > dftest.out
RPOOL_NAME=""; declare -i TOTAL_SIZE=0; USED_SPACE=0; AVAILABLE_SPACE=0; AVAILABLE_PERCENT=0; RSIGN=""
read RPOOL_NAME TOTAL_SIZE USED_SPACE AVAILABLE_SPACE AVAILABLE_PERCENT RSIGN < dftest.out
\rm dftest.out
echo $RPOOL_NAME $TOTAL_SIZE $USED_SPACE $AVAILABLE_SPACE $AVAILABLE_PERCENT $RSIGN
((TOTAL_SIZE=$TOTAL_SIZE/1024))
This is the result:
DBResize.sh[11]: TOTAL_SIZE=/1024: syntax error
I'm pulling my hair out at this point; any help would be appreciated.
The code you posted cannot produce the output you posted. Most obviously, the error is signalled at line 11 but you posted fewer than 11 lines of code. The previous lines may matter. Always post complete code when you ask for help.
More concretely, the declare command doesn't exist in ksh, it's a bash thing. You can achieve the same result with typeset (declare is a bash equivalent to typeset, but not all options are the same). Either you're executing this script with bash, or there's another error message about declare, or you've defined some additional commands including declare which may change the behavior of this code.
None of this should have an impact on the particular problem that you're posting about, however. The variables created by read remain assigned until the end of the subshell, i.e. until the code hits a ), the end of a pipe (left-hand side of the pipe only in ksh), etc.
About the use of declare or typeset, note that you're only declaring TOTAL_SIZE as an integer. For the other variables, you're just assigning a value which happens to consist exclusively of digits. It doesn't matter for the code you posted, but it's probably not what you meant.
One thing that may be happening is that grep matches nothing, and therefore read reads an empty line. You should check for errors. Use set -e in scripts to exit at the first error. (There are cases where set -e doesn't catch errors, but it's a good start.)
Another thing that may be happening is that df is splitting its output onto multiple lines because the first column containing the filesystem name is too large. To prevent this splitting, pass the option -P.
Using a temporary file is fragile: the code may be executed in a read-only directory, another process may want to access the same file at the same time... Here a temporary file is useless. Just pipe directly into read. In ksh (unlike most other sh variants including bash), the right-hand side of a pipe runs in the main shell, so assignments to variables in the right-hand side of a pipe remain available in the following commands.
It doesn't matter in this particular script, but you can use a variable without $ in an arithmetic expression. Using $ substitutes a string, which can have confusing results, e.g. a='1+2'; $(($a*3)) expands to 7. Not using $ uses the numerical value (in ksh, a='1+2'; $((a*3)) expands to 9; in some sh implementations you get an error because a's value is not numeric).
#!/usr/bin/ksh
set -e
typeset -i TOTAL_SIZE=0 USED_SPACE=0 AVAILABLE_SPACE=0 AVAILABLE_PERCENT=0
df -Pk | grep "rpool/ROOT" | read RPOOL_NAME TOTAL_SIZE USED_SPACE AVAILABLE_SPACE AVAILABLE_PERCENT RSIGN
echo $RPOOL_NAME $TOTAL_SIZE $USED_SPACE $AVAILABLE_SPACE $AVAILABLE_PERCENT $RSIGN
((TOTAL_SIZE=TOTAL_SIZE/1024))
Strange...when I get rid of your "declare" line, your original code seems to work perfectly well (at least with ksh on Linux)
The code :
#!/bin/ksh
df -k | grep "/home" > dftest.out
read RPOOL_NAME TOTAL_SIZE USED_SPACE AVAILABLE_SPACE AVAILABLE_PERCENT RSIGN < dftest.out
\rm dftest.out
echo $RPOOL_NAME $TOTAL_SIZE $USED_SPACE $AVAILABLE_SPACE $AVAILABLE_PERCENT $RSIGN
((TOTAL_SIZE=$TOTAL_SIZE/1024))
print $TOTAL_SIZE
The result :
32962416 5732492 25552588 19% /home
5598
Which are the values a plain df -k returns. The variables do seem to last.
For those interested, I have figured out that it is not possible to use "read" the way I was using it.
The variable values assigned by "read" simply "do not last".
To remedy this, I have applied the less than ideal solution of using the standard "while read" format, and inside the loop, echo selected variables into a variable file.
Once said file was created, I just "loaded" it.
(pseudo code:)
LOOP START
echo "VAR_A="$VAR_A"; VAR_B="$VAR_B";" > somefile.out
LOOP END
. somefile.out
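In concrete ksh terms, that workaround looks roughly like this (a sketch; vars.out is an illustrative file name and the df/grep pipeline is copied from the question):
df -Pk | grep "rpool/ROOT" | while read RPOOL_NAME TOTAL_SIZE USED_SPACE AVAILABLE_SPACE AVAILABLE_PERCENT RSIGN
do
    # Persist the values we care about so they can be re-loaded afterwards.
    echo "TOTAL_SIZE=$TOTAL_SIZE; AVAILABLE_SPACE=$AVAILABLE_SPACE;" > vars.out
done
. ./vars.out
((TOTAL_SIZE=TOTAL_SIZE/1024))
As the first answer points out, in ksh the right-hand side of a pipe runs in the main shell, so the echo-and-source step is redundant there; it only earns its keep in shells where the while read loop runs in a subshell.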

Uppercasing filename in Makefile using sed

I am trying to convert a filename such as foo/bar/baz.proto into something like foo/bar/Baz.java in my Makefile. For this purpose, I thought I could use sed. However, it seems that the command does not work as expected:
uppercase_file = $(shell echo "$(1)" | sed 's/\(.*\/\)\(.*\)/\1\u\2/')
# generated Java sources
PROTO_JAVA_TARGETS := ${PROTO_SPECS:$(SRCDIR)/%.proto=$(JAVAGEN)/$(call uppercase_file,%).java}
When I try to run the sed command on the command line it seems to work:
~$ echo "foo/bar/baz" | sed 's/\(.*\/\)\(.*\)/\1\u\2/'
foo/bar/Baz
Any ideas why this does not work inside the Makefile?
UPDATE:
The java files are generated with the following target:
$(JAVAGEN)/%.java: $(SRCDIR)/%.proto
How can I apply the substitution also for targets?
GNU Make does not replace the % character in the replacement part of a substitution reference (which is basically syntactic sugar for patsubst) if it is part of a variable reference. I have not found this behavior described in the documentation, but you can see it implemented in the source code (the relevant function, I believe, is find_char_unquote).
I suggest moving the call out of the substitution reference, since uppercase_file obviously works properly on any file path:
PROTO_JAVA_TARGETS := $(call uppercase_file,${PROTO_SPECS:$(SRCDIR)/%.proto=$(JAVAGEN)/%.java})
If $(PROTO_SPECS) resolves not to a single element but to a list of elements, you can use foreach to call the function on every element of the processed list:
PROTO_JAVA_TARGETS := $(foreach JAVA,${PROTO_SPECS:$(SRCDIR)/%.proto=$(JAVAGEN)/%.java},$(call uppercase_file,$(JAVA)))
The java files are generated with the following target: $(JAVAGEN)/%.java: $(SRCDIR)/%.proto
How can I apply the substitution also for targets?
Since Make matches targets first, and there is no way to run sed backwards, what you need here is either to define an inverse function or to generate multiple explicit rules. I will show the latter approach.
define java_from_proto
$(call uppercase_file,$(1:$(SRCDIR)/%.proto=$(JAVAGEN)/%.java)): $1
# Whatever recipe you use.
# Use `$$@`, `$$<` and so on instead of `$@` or `$<`.
endef
$(foreach PROTO,$(PROTO_SPECS),$(eval $(call java_from_proto,$(PROTO))))
We basically generate one rule per file in $(PROTO_SPECS) using the multi-line variable syntax, and then use eval to install that rule. There is also a very similar example on this documentation page that can be helpful.
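For a single hypothetical input like $(SRCDIR)/foo/bar/baz.proto, the foreach/eval above effectively installs a rule equivalent to the following (the echo recipe is only a placeholder for your real code-generation command, and the recipe line must start with a tab):
$(JAVAGEN)/foo/bar/Baz.java: $(SRCDIR)/foo/bar/baz.proto
	@echo "would generate $@ from $<"
Note that the $$ escapes in the define collapse to a single $ once eval expands the rule, which is why a hand-written rule like this one uses plain $@ and $<.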

How to convert xml files (names matching a pattern) to json with perl?

The following converts test.xml to json:
perl -MJSON::Any -MXML::Simple -le'print JSON::Any->new()->objToJson(XMLin("/tmp/test.xml"))'
but I need to convert any xml file (for example test-1.xml, test-2.xml, test-3.xml, test-4.xml, etc.) matching the pattern /tmp/test-*.xml. If I use:
perl -MJSON::Any -MXML::Simple -le'print JSON::Any->new()->objToJson(XMLin("/tmp/test-*.xml"))'
I get the following message:
File does not exist: /tmp/test-*.xml at -e line 1
How do I do it?
There are problems with what you're trying to do:
XML::Simple isn't simple. It's for simple XML. It'll mangle your XML and give inconsistent results. See: Why is XML::Simple "Discouraged"?
XML is fundamentally more complicated than JSON, so there's no linear transformation. You need to figure out what you'd do with attributes and duplicate elements, for a start.
File does not exist: /tmp/test-*.xml at -e line 1 - means the file doesn't exist. So you're not going to get very far. But XMLin doesn't accept wildcards anyway. You'll have to process one file at a time.
The first two points are solvable, provided you accept that this cannot be a generic solution - to give a moderately general solution, we'll need an example of your source XML. But it won't be a one-liner.
You seem to be asking how to find files matching a file glob.
You could use
my @qfns = glob("/tmp/test-*.xml");
If you just want the first matching file, use
my ($qfn) = glob("/tmp/test-*.xml");
Do not use the following, since glob acts as an iterator in scalar context.
my $qfn = glob("/tmp/test-*.xml"); # XXX
You can try this using glob and map functions.
perl -MJSON::Any -MXML::Simple -le'local $,="\n"; print map { JSON::Any->new()->objToJson(XMLin($_)) } glob "/path/to/my/test*.xml"'
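If the one-liner gets unwieldy, the same idea works as a small script (a sketch reusing the modules from the question; the caveats about XML::Simple from the first answer still apply, and output goes to stdout, one JSON document per input file):
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;
use JSON::Any;

# Convert every file matching the glob pattern, one at a time.
my $json = JSON::Any->new();
for my $xml_file ( glob("/tmp/test-*.xml") ) {
    print $json->objToJson( XMLin($xml_file) ), "\n";
}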

perl memory usage when processing a file inline

I have a CGI script that's used by our employees to fetch logs from servers that they don't have direct access to. For reasons I won't go into, after a recent update to our app some of these logs now have characters like linefeeds, tabs, backslashes, etc. translated into their text equivalents. As such, I've modified the CGI script to invoke the following to convert these back to their original values:
perl -i -pe 's/\\r/\r/g && s/\\n/\n/g && s/\\t/\t/g && s/\\\//\//g' $filename
I was just informed that some people are now getting out of memory errors when they try to fetch logs that are fairly large (a few hundred MB).
My question: How does perl manage memory when an inline command like this is invoked? Is it reading the whole file in, processing it, then writing it out, or is it creating a temporary file, processing the lines from the input file one at a time then replacing the file once complete?
This is using perl 5.10.1 on a 64-bit Amazon linux instance.
The -p switch creates a while(<>){...; print} loop to iterate on each “line” in your input file.
If all of your newlines have been converted into "\\n", then your file would just be a single very long line. Therefore, your command would be loading the entire file into memory to perform your fix.
To avoid that, you'll have to intentionally buffer the file using either sysread or $/.
It would probably be easiest to create an actual script instead of a one-liner to do the work. However, if you know that all of your newlines are converted, then one simple fix would be to use $/ = "\\n"
As a secondary note, your regex chain is flawed. You're currently joining your s/// translations with the short-circuit && operator, so if an earlier substitution doesn't match on a particular line, none of the later ones are attempted. You should instead separate your substitutions with plain semicolons:
's/\\r/\r/g; s/\\n/\n/g; s/\\t/\t/g; s|\\/|/|g'
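Combining both suggestions, a memory-friendlier one-liner might look like this (a sketch, assuming every original newline really was turned into a literal \n, so setting $/ to "\\n" yields reasonably sized records):
perl -i -pe 'BEGIN { $/ = "\\n" } s/\\r/\r/g; s/\\n/\n/g; s/\\t/\t/g; s|\\/|/|g' "$filename"
Each record then ends at a literal \n token rather than at a real newline, and the s/\\n/\n/g substitution converts that trailing separator as the record is printed back out.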

zsh filename globbing/substitution

I am trying to create my first zsh completion script, in this case for the command netcfg.
Lame as it may sound, I am stuck at the first hurdle. Disclaimer: I know how to do this crudely, but I am looking for the "ZSH WAY" to do it.
I need to list the files in /etc/network.d, but only the file names, not the directory component, so I do the following:
echo $(ls /etc/network.d/*(.))
/etc/network.d/ethernet-dhcp /etc/network.d/wireless-wpa-config
What I wanted was:
ethernet-dhcp wireless-wpa-config
So I try (excuse my naivity) :
echo ${(s/*\/)$(ls /etc/network.d/*(.))}
/etc/network.d/ethernet-dhcp /etc/network.d/wireless-wpa-config
It seems that this doesn't work. I'm sure there must be some clever way of doing this by splitting into an array and taking the last part, but as I say, I'm a complete noob at this.
Any advice gratefully received.
General note: There is no need to use ls to generate the filenames. You might as well use echo some*glob. But if you want to protect the possible embedded newline characters even that is a bad idea. The first example below globs directly into an array to protect embedded newlines. The second one uses printf to generate NUL terminated data to accomplish the same thing without using a variable.
It is easy to do if you are willing to use a variable:
typeset -a entries
entries=(/etc/network.d/*(.)) # generate the list
echo ${entries#/etc/network.d/} # strip the prefix from each one
You can also do it without a variable, but the extra stuff to isolate individual entries is a bit ugly:
# From the inside, to the outside:
# * glob the entries
# * NUL terminate them into a single string
# * split at NUL
# * strip the prefix from each one
echo ${${(0)"$(printf '%s\0' /etc/network.d/*(.))"}#/etc/network.d/}
Or, if you are going to use a subshell anyway (i.e. the command substitution in the previous example), just cd to the directory so it is not part of the glob expansion (plus, you do not have to repeat the directory name):
echo ${(0)"$(cd /etc/network.d && printf '%s\0' *(.))"}
Chris Johnsen's answer is full of useful information about zsh, however it doesn't mention the much simpler solution that works in this particular case:
echo /etc/network.d/*(:t)
This is using the t history modifier as a glob qualifier.
Thanks for your suggestions guys, having done yet more reading of ZSH and coming back to the problem a couple of days later, I think I've got a very terse solution which I would like to share for your benefit.
echo ${$(print /etc/network.d/*(.)):t}
I'm used to seeing basename(1) stripping off directory components; also, you can use echo /etc/network/* to get the file listing without running the external ls program. (Running external programs can slow down completion more than you'd like; I didn't find a zsh-builtin for basename, but that doesn't mean that there isn't one.)
Here's something I hope will help:
haig% for f in /etc/network/* ; do basename $f ; done
if-down.d
if-post-down.d
if-pre-up.d
if-up.d
interfaces
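For completeness: zsh can do the basename step itself with the :t ("tail") modifier, the same modifier used in the glob-qualifier answer above, which avoids forking an external basename for every file. A minimal sketch against the same directory:
for f in /etc/network/* ; do print -r -- ${f:t} ; done
or, without a loop at all:
print -rl -- /etc/network/*(:t)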