Copying a string (passed as a command-line argument to Perl) into a text file - perl

I have a string containing text with whitespace, like:
String str = "abc xyz def";
I am passing this string as a command-line argument to a Perl script from C#:
Process p = new Process();
p.StartInfo.FileName = "c:\\perl\\bin\\perl.exe";
p.StartInfo.Arguments = "c:\\root\\run_cmd.pl " + str + " " + text_file;
In the run_cmd.pl file, I have the following:
open FILE, ">$ARGV[1]" or die "Failed opening file";
print FILE $ARGV[0];
close FILE;
When it runs, only part of the string, "abc", gets copied, because each space-separated word arrives as a separate argument and $ARGV[0] holds only the first one.
My question is: is it possible to copy the entire string, including the whitespace, into the text file?

With most programs, if you want an argument containing whitespace treated as a single argument, you need to surround it with double quotes ("),
e.g. run_cmd.pl "abc xyz def" filename
Try
p.StartInfo.Arguments = "c:\\root\\run_cmd.pl \"" + str + "\" " + text_file;
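Why the escaped double quotes work: when C# starts perl.exe without a shell, the command line is split by the C runtime's argument parser, which groups words only inside double quotes and treats backslashes specially only when they precede a quote. Below is a minimal Python sketch of building such a command line, assuming the standard Microsoft C runtime rules; the function names are hypothetical. (On newer .NET, Core 2.1 and later, ProcessStartInfo.ArgumentList does this quoting for you.)

```python
from typing import List

def quote_windows_arg(arg: str) -> str:
    """Quote one argument so the MS C runtime parser reads it back unchanged."""
    if arg and not any(ch in arg for ch in ' \t"'):
        return arg                      # nothing to protect
    out = ['"']
    backslashes = 0
    for ch in arg:
        if ch == "\\":
            backslashes += 1            # meaning depends on what follows
        elif ch == '"':
            # Double the pending backslashes, then escape the quote itself.
            out.append("\\" * (2 * backslashes + 1) + '"')
            backslashes = 0
        else:
            out.append("\\" * backslashes + ch)  # backslashes are literal here
            backslashes = 0
    # Backslashes just before the closing quote must be doubled too.
    out.append("\\" * (2 * backslashes) + '"')
    return "".join(out)

def build_command_line(args: List[str]) -> str:
    return " ".join(quote_windows_arg(a) for a in args)

print(build_command_line(["c:\\root\\run_cmd.pl", "abc xyz def", "text_file.txt"]))
# c:\root\run_cmd.pl "abc xyz def" text_file.txt
```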
Side note:
I don't know about Windows, but on Linux there are limits on the number of arguments and on the maximum length of a single argument, so you might want to consider passing the string some other way, for example by reading it from a temporary file.

It's a little bit of a hack, but
$ARGV[$#ARGV]
would be the last item in @ARGV, and
@ARGV[0 .. ($#ARGV - 1)]
would be everything before that.
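The hack above, sketched in Python (in Perl it would be roughly my $file = pop @ARGV; my $text = join ' ', @ARGV;). The function name is hypothetical, and shlex.split is used only to simulate a POSIX-style split, not the Windows one. It shows the catch: once the words have been split apart, runs of spaces cannot be recovered, so rejoining gives single spaces.

```python
import shlex
from typing import List, Tuple

def split_text_and_filename(args: List[str]) -> Tuple[str, str]:
    """Same idea as the Perl hack: the last argument is the file name and
    everything before it is glued back together with single spaces."""
    *words, filename = args
    return " ".join(words), filename

# shlex.split follows POSIX shell rules; used here only to simulate the split.
args = shlex.split("abc  xyz   def out.txt")
print(split_text_and_filename(args))  # ('abc xyz def', 'out.txt')
```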

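It's not Perl; it's how the command line is split into arguments. You need to put quotes around the argument. Note that when C# starts perl.exe directly on Windows there is no shell involved, and the runtime that splits the command line only recognizes double quotes, so single quotes won't group the words:
p.StartInfo.Arguments = "c:\\root\\run_cmd.pl \"" + str + "\" " + text_file;
If text_file comes from user input, you'll likely want to quote that, too.
(You'll also need to escape any existing double quotes in str or text_file. Under the Microsoft C runtime rules a literal quote is written as \", and any backslashes immediately before it must be doubled.)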

@meidwar said: "you might want to consider passing the string some other way, reading it from a tmp file for example"
I'd suggest you look into a piped open. See http://search.cpan.org/~jhi/perl-5.8.0/pod/perlopentut.pod#Pipe_Opens and http://perldoc.perl.org/perlipc.html#Using-open()-for-IPC
These let you send as much data as the called code can handle, and they are not subject to the limits of the OS's command line.
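The same pipe idea in the question's direction (caller to script): hand the text to the child on its standard input instead of its command line. Below is a minimal Python sketch in which a small Python child stands in for run_cmd.pl; the child script and function names are hypothetical. In C#, the equivalent would set UseShellExecute = false and RedirectStandardInput = true and then write to p.StandardInput; on the Perl side, run_cmd.pl would read the text with something like my $text = do { local $/; <STDIN> }; instead of using $ARGV[0].

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for run_cmd.pl: copy all of stdin into the file named by
# its first argument. (With "python -c", sys.argv[0] is "-c".)
CHILD_SCRIPT = (
    "import sys\n"
    "data = sys.stdin.buffer.read().decode('utf-8')\n"
    "with open(sys.argv[1], 'w', encoding='utf-8', newline='') as fh:\n"
    "    fh.write(data)\n"
)

def send_via_stdin(text: str, out_path: str) -> None:
    """Start the child and hand it `text` on stdin rather than on argv."""
    subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT, out_path],
        input=text.encode("utf-8"),  # bytes go through the pipe untouched
        check=True,                  # raise CalledProcessError if the child fails
    )

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "text_file.txt"
        send_via_stdin("abc xyz def", str(out))
        print(out.read_text(encoding="utf-8"))
```

Because only the file name travels on the command line, the text can contain spaces, quotes, or newlines and be of any length the child can read.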

Related

Perl: how to format a string containing a tilde character "~"

I have run into an issue where a Perl script we use to parse a text file is omitting lines containing the tilde (~) character, and I can't figure out why.
The sample below illustrates what I mean:
#!/usr/bin/perl
use warnings;
formline " testing1\n";
formline " ~testing2\n";
formline " testing3\n";
my $body_text = $^A;
$^A = "";
print $body_text
The output of this example is:
testing1
testing3
The line containing the tilde is dropped entirely from the accumulator. This happens whether there is any text preceding the character or not.
Is there any way to print the line with the tilde treated as a literal part of the string?
~ is special in forms (see perlform) and there's no way to escape it. But you can create a field for it and populate it with a tilde:
formline " \@testing2\n", '~';
The first argument to formline is the "picture" (template). That picture uses various characters to mean particular things. The ~ means to suppress output if the fields are blank. Since you supply no fields in your call to formline, your fields are blank and output is suppressed.
my @lines = ( '', 'x y z', 'x~y~z' );
foreach $line ( @lines ) { # forms don't use lexicals, so no my on the control variable
write;
}
format STDOUT =
~ ID: @*
$line
.
The output doesn't have a line for the blank field because the ~ in the picture told it to suppress output when $line doesn't have anything:
ID: x y z
ID: x~y~z
Note that tildes coming from the data are just fine; they are like any other character.
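The suppression rule described above can be modelled in a few lines. This is a simplified Python sketch, not Perl's implementation: it assumes only "@*" and a bare "@" count as fields and ignores field widths. It shows why " ~testing2" with no fields disappears, why supplying a field filled with '~' brings it back, and why tildes in the data are untouched.

```python
import re
from typing import List, Optional

# Toy model: only "@*" and a bare "@" count as fields (real perlform has more).
FIELD = re.compile(r"@\*|@")

def render_line(picture: str, values: List[str]) -> Optional[str]:
    """Model of the perlform '~' rule; returns None when the line is suppressed."""
    # A '~' line is dropped if all of its fields are blank; no fields counts as blank.
    if "~" in picture and all(not v.strip() for v in values):
        return None
    picture = picture.replace("~", " ")  # the '~' itself prints as a space
    it = iter(values)
    # Fill fields left to right; tildes inside the values are left alone.
    return FIELD.sub(lambda m: next(it, ""), picture)

print(render_line(" ~testing2", []))     # None
print(render_line(" @testing2", ["~"]))  #  ~testing2
```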
Here's probably something closer to what you meant. Create a picture, @* (variable-width multiline text), and supply it with values to fill it:
while( <DATA> ) {
local $^A;
formline '@*', $_;
print $^A, "\n";
}
__DATA__
testing1
~testing2
testing3
The output shows the field with the ~:
testing1
~testing2
testing3
However, the question is very odd, because the way you appear to be using formline isn't really what formats are for. Perhaps you have some tricky setup where you're trying to take the picture from input data. But if you aren't going to give it any values, what are you really formatting? Consider that you may not actually want formats.

Remove unsafe HTTP characters from a string

I have to send a bunch of string variables as payloads in an HTTP POST message using Perl.
I want to remove "unsafe" characters, such as < > “ ‘ % ; ) ( & + from my string variable.
I know I can use a regex pattern to find and replace each of these characters, but I am wondering if there's any existing Perl library that already does that.
For example, I found Apache::Util:
my $esc = Apache::Util::escape_uri($uri);
Can I use Apache::Util::escape_uri for this? Or is there a better way?
EDIT 1: As I mentioned, by unsafe I mean characters like < > “ ‘ % ; ) ( & + that can be used in SQL injection. I don't know how to describe this problem better.
EDIT 2: Here's the code I am working on (it's embedded Perl):
use CGI;
my $cgi = CGI->new();
my $param1 = $cgi->param('param1');
my $param2 = $cgi->param('param2');
my $param3 = $cgi->param('param3');
# I want to remove unsafe characters (< > “ ‘ % ; ) ( & +) from $param1, $param2 and $param3
# Q is, do I use Apache::Util::escape_uri; even if that's for removing unsafe chars from URI?
# OR do I use URI::Escape 'uri_escape';?
$script = <<__HTML__;
<script>
API.call ({
'paramA': '$param1',
'paramB': '$param2',
'paramC': '$param3'
});
</script>
__HTML__
EDIT 3: If anyone else has the same question, I ended up writing a Perl function that looks for certain characters such as "(", "{", "$", ";", etc. and removes them from the string passed to it.
The full list of characters I am removing is:
";", "(", ")", "[", "]", "{", "}", "~", "`", "/", "<", ">", "&", "|", "'", "\"", "\\"
Obviously, there's room for exclusions as well.
There is no general definition of unsafe characters, so it falls to you to determine whether any of the answers fulfill your requirement.
Looking at the source of Apache::Util, it does some very unpleasant things to its own namespace, and I wouldn't trust it. It is intended to be used as a component of mod_perl and shouldn't be used in isolation.
I think the canonical way of escaping HTTP URIs is to use the URI::Escape module
use URI::Escape 'uri_escape';
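For reference, by default uri_escape percent-encodes everything except the RFC 3986 unreserved characters (A-Z, a-z, 0-9, -, ., _, ~). Below is a rough Python equivalent using urllib.parse.quote with safe="" (Python 3.7+ treats ~ as unreserved); the wrapper name is hypothetical. Python encodes text as UTF-8 first, which is closer to URI::Escape's uri_escape_utf8. Percent-encoding only makes a value safe inside a URI; it is not a defence against SQL injection, which is handled with placeholders in the database layer.

```python
from urllib.parse import quote

def uri_escape_like(value: str) -> str:
    """Percent-encode everything except A-Z, a-z, 0-9 and - . _ ~ (RFC 3986 unreserved)."""
    # safe="" stops quote() from leaving "/" unencoded; text is encoded as UTF-8 first.
    return quote(value, safe="")

print(uri_escape_like("a<b>&c+d(e);"))  # a%3Cb%3E%26c%2Bd%28e%29%3B
```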
You'll need to provide data and code if you want more help than this.
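In the EDIT 2 code, the values end up inside a JavaScript string literal in a script block, so the escaping that matters there is JavaScript-string escaping plus protection against a premature closing script tag, rather than URI encoding. Below is a minimal Python sketch of that idea, assuming the literal is pasted in without the surrounding single quotes ('paramA': $param1_js rather than 'paramA': '$param1'); the function name is hypothetical, and in Perl the same approach would use a JSON encoder module.

```python
import json

def js_string_literal(value: str) -> str:
    """Return `value` as a double-quoted JavaScript string literal that is
    safe to paste between <script> tags (a sketch, not a full XSS defence)."""
    literal = json.dumps(value)  # adds quotes; escapes ", \, control chars, non-ASCII
    # Escape characters that could close the <script> element or start markup.
    return (literal.replace("<", "\\u003c")
                   .replace(">", "\\u003e")
                   .replace("&", "\\u0026"))

print(js_string_literal("x'</script>"))  # "x'\u003c/script\u003e"
```

Since \u003c and friends are ordinary JSON/JavaScript escapes, the browser still sees the original characters in the string value.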

terminal command: handle special characters in filename

I want to execute some commands in the terminal. I create them in Swift 3.0 and write them to a command file, but some special characters cause problems, e.g. the single quote:
mv 'Don't do it.txt' 'Don_t do it.txt'
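I use single quotes to protect other special characters, but what about single quotes themselves? How can I convert names so that every possible filename is handled correctly?
Your question is strange:
In this case you would be writing a shell script rather than a text file. Also, you are replacing single quotes in the output filename but not spaces, which should be replaced too.
Here is a solution that gives proper escaping for the input files and proper replacement (read: spaces too) for the output files:
#!/usr/bin/awk -f
# Usage: ./script.awk "Don't do it.txt" ... > rename.sh
BEGIN {
    mi = "\47"                         # a single quote
    no = "[^[:alnum:]%+,./:=@_-]"      # any character that is not shell-safe
    print "#!/bin/sh"
    while (++os < ARGC) {
        # Split the name on single quotes: quote each piece only if needed,
        # and emit each original single quote as \'
        pa = split(ARGV[os], qu, mi)
        printf "mv "
        for (ro = 1; ro <= pa; ro++) {  # numeric loop: "for (x in arr)" order is unspecified
            printf "%s", match(qu[ro], no) ? mi qu[ro] mi : qu[ro]
            if (ro < pa) printf "\\" mi
        }
        # Target name: replace every unsafe character with "_"
        gsub(no, "_", ARGV[os])
        print FS ARGV[os]
    }
}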
#!/usr/bin/awk -f
BEGIN {
mi = "\47"
no = "[^[:alnum:]%+,./:=#_-]"
print "#!/bin/sh"
while (++os < ARGC) {
pa = split(ARGV[os], qu, mi)
printf "mv "
for (ro in qu) {
printf "%s", match(qu[ro], no) ? mi qu[ro] mi : qu[ro]
if (ro < pa) printf "\\" mi
}
gsub(no, "_", ARGV[os])
print FS ARGV[os]
}
}
Result:
#!/bin/sh
mv 'dont do it!.txt' dont_do_it_.txt
mv Don\''t do it.txt' Don_t_do_it.txt
mv dont-do-it.txt dont-do-it.txt
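The quoting rule the awk script relies on, sketched in Python: inside single quotes nothing is special, so the only character that needs care is the single quote itself, written as '\'' (close the quote, add an escaped quote, reopen). The function names are hypothetical; the safe-character set mirrors the awk "no" regex but is ASCII-only, and the "--" after mv is an addition so that names starting with "-" are not read as options. Python's own shlex.quote produces an equivalent (differently spelled) result.

```python
import re
import shlex

# Characters that may appear unquoted in a POSIX shell word (same idea as the awk
# "no" regex, but ASCII-only rather than [:alnum:]).
UNSAFE = re.compile(r"[^A-Za-z0-9%+,./:=@_-]")

def sh_quote(name: str) -> str:
    """Wrap in single quotes; each embedded ' becomes '\\'' (close, escaped quote, reopen)."""
    return "'" + name.replace("'", "'\\''") + "'"

def rename_command(name: str) -> str:
    target = UNSAFE.sub("_", name)  # only safe characters remain, so no quoting needed
    # "--" ends option parsing, so names starting with "-" are not taken as options.
    return "mv -- " + sh_quote(name) + " " + target

print(rename_command("Don't do it.txt"))  # mv -- 'Don'\''t do it.txt' Don_t_do_it.txt
```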

Replace character in pig in HDInsight using powershell

My data is in the following format:
{"Foo":"ABC","Bar":"20090101100000","Quux":"{\"QuuxId\":1234,\"QuuxName\":\"Sam\"}"}
I need it to be in this format:
{"Foo":"ABC","Bar":"20090101100000","Quux":{"QuuxId":1234,"QuuxName":"Sam"}}
I'm trying to use Pig's REPLACE function to get it into the format I need,
so I tried this:
#Specify the cluster name
$clusterName = "CLUSTERNAME"
#Where the output will be saved
$statusFolder = "/tutorial/pig/status"
#Store the Pig Latin into $QueryString
$QueryString = "LOGS = LOAD 'wasb:///example/data/sample.log'as unparsedString:chararray;" +
"REPL1 = foreach LOGS REPLACE($0, '"\\{', '\\{');"
... and so on.
I receive an error at the second line (REPL1 =...)
Unexpected token '\\' in expression or statement.
This code works perfectly well when I run it over Remote Desktop.
Any help is sincerely appreciated.
Thanks
I assume you attempt to store the following string value in the variable:
REPL1 = foreach LOGS REPLACE($0, '"\\{', '\\{');
The first "interpretation" of your string is by the PowerShell parser. Since you use double-quotes ("), it's treated as an expandable string.
Since you don't escape the " inside the REPLACE() statement, the parser assumes that the string stops there.
What you're left with is:
"REPL1 = foreach LOGS REPLACE(, '"
# a valid string, $0 expanded to an empty string
\\
# two backslashes; PowerShell cannot resolve these to anything meaningful
{
# opening curly brace
', '
# a valid string literal
\\
# two backslashes; PowerShell still cannot resolve these to anything meaningful
{
# opening curly brace
');"
# unterminated string
You need to escape the " inside REPLACE(), either by doubling it ("") or by using the backtick escape sequence (`"):
$QueryString += "REPL1 = foreach LOGS REPLACE($0, '`"\\{', '\\{');"
or
$QueryString += "REPL1 = foreach LOGS REPLACE($0, '""\\{', '\\{');"
You also need to escape $0; otherwise PowerShell expands it (to an empty string, as shown above) before Pig ever sees it:
$QueryString += "REPL1 = foreach LOGS REPLACE(`$0, '""\\{', '\\{');"
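For comparison, the transformation the REPLACE calls are aiming at (turning the JSON text stored in "Quux" into a nested object) can also be done by decoding that field as JSON instead of editing characters. This is a Python sketch, not Pig or PowerShell; it assumes each input line is one complete JSON object, and the function name is hypothetical.

```python
import json

def unnest_field(line: str, field: str = "Quux") -> str:
    """Decode a field whose value is itself JSON text and re-emit the record compactly."""
    record = json.loads(line)
    if isinstance(record.get(field), str):
        record[field] = json.loads(record[field])  # JSON text -> nested object
    # Compact separators reproduce the original layout; key order is preserved.
    return json.dumps(record, separators=(",", ":"))

sample = r'{"Foo":"ABC","Bar":"20090101100000","Quux":"{\"QuuxId\":1234,\"QuuxName\":\"Sam\"}"}'
print(unnest_field(sample))
```

Decoding also undoes any other escapes inside the embedded string, which a literal character replacement would have to handle one by one.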

Using sed to remove embedded newlines

What is a sed script that will remove the "\n" character but only if it is inside "" characters (delimited string), not the \n that is actually at the end of the (virtual) line?
For example, I want to turn this file
"lalala","lalalslalsa"
"lalalala","lkjasjdf
asdfasfd"
"lalala","dasdf"
(line 2 has an embedded \n) into this one:
"lalala","lalalslalsa"
"lalalala","lkjasjdf \\n asdfasfd"
"lalala","dasdf"
(Lines 2 and 3 are now joined, and the real line feed was replaced with the character string \\n, or any other easy-to-spot character string; I'm not picky.)
I don't just want to remove every other newline as a previous question asked, nor do I want to remove ALL newlines, just those that are inside quotes. I'm not wedded to sed, if awk would work, that's fine too.
The file being operated on is too large to fit in memory all at once.
sed is an excellent tool for simple substitutions on a single line, but for anything else you should use awk, e.g.:
$ cat tst.awk
{
if (/"$/) {
print prev $0
prev = ""
}
else {
prev = prev $0 " \\\\n "
}
}
$ awk -f tst.awk file
"lalala","lalalslalsa"
"lalalala","lkjasjdf \\n asdfasfd"
"lalala","dasdf"
Below was my original answer, but after seeing @NeronLeVelu's approach of just testing for a quote at the end of the line, I realized I was doing this in a much too complicated way. You could just replace gsub(/"/,"&") % 2 below with !/"$/ and it'd work the same, but the code above is a simpler implementation of the same functionality and will also handle embedded escaped double quotes as long as they aren't at the end of a line.
$ cat tst.awk
{ $0 = saved $0; saved="" }
gsub(/"/,"&") % 2 { saved = $0 " \\\\n "; next }
{ print }
$ awk -f tst.awk file
"lalala","lalalslalsa"
"lalalala","lkjasjdf \\n asdfasfd"
"lalala","dasdf"
The above only stores 1 output line in memory at a time. It just keeps building up an output line from input lines while the number of double quotes in that output line is an odd number, then prints the output line when it eventually contains an even number of double quotes.
It will fail if you can have double quotes inside your quoted strings escaped as \", not "", but you don't show that in your posted sample input so hopefully you don't have that situation. If you have that situation you need to write/use a real CSV parser.
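The quote-parity rule described above, sketched in Python as a generator so only one output record is held in memory at a time (suitable for files too large to load whole). Like the awk, it assumes quotes inside fields are never backslash-escaped; the function name is hypothetical, and the default marker matches the awk output (space, two backslashes, n, space).

```python
import sys
from typing import Iterable, Iterator, Optional

def join_quoted_newlines(lines: Iterable[str], marker: str = " \\\\n ") -> Iterator[str]:
    """Yield records, gluing a line onto the next while its quote count is odd."""
    pending: Optional[str] = None
    for line in lines:
        text = line.rstrip("\n")
        record = text if pending is None else pending + marker + text
        if record.count('"') % 2:
            pending = record            # still inside a quoted field; keep reading
        else:
            pending = None
            yield record
    if pending is not None:
        yield pending                   # unbalanced quotes at EOF: emit rather than drop

if __name__ == "__main__":
    # Streams stdin to stdout, e.g.  python join.py < file
    for rec in join_quoted_newlines(sys.stdin):
        print(rec)
```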
sed -n ':load
/"$/ !{N
b load
}
:cycle
s/^\(\([^"]*"[^"]*"\)*\)\([^"]*"[^"]*\)\n/\1\3 \\\\n /
t cycle
p' YourFile
Load lines into the pattern space until a closing line (one ending with ") is found or the end of input is reached.
Replace any \n that comes after zero or more complete pairs of quotes followed by one unmatched opening quote, counted from the start of the buffer (i.e. any newline inside an open string), with the escaped newline. In effect, "prefix + \n" is replaced by "prefix + escaped newline".
If a substitution occurred, try again (:cycle and t cycle).
Print the result.
Continue until the end of the file.
Thanks to @Ed Morton for the remark about the escaped newline.
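The sed substitution loop can be mirrored with Python's re module to see what the regex matches. This is a sketch that assumes the whole record is already in one buffer (the sed :load step); the pattern is a transliteration of the sed one and the function name is hypothetical.

```python
import re

# Prefix = any number of complete "..." pairs, then text containing one opening
# quote; a newline right after that prefix is inside an unfinished string.
INSIDE_QUOTES_NL = re.compile(r'^((?:[^"]*"[^"]*")*)([^"]*"[^"]*)\n')

def escape_quoted_newlines(buffer: str) -> str:
    while True:
        # Replace one newline per pass with " \\n " (two backslashes, as in the sed).
        buffer, n = INSIDE_QUOTES_NL.subn(r"\1\2 \\\\n ", buffer, count=1)
        if n == 0:  # like sed's "t cycle": stop when nothing changed
            return buffer

print(escape_quoted_newlines('"lalalala","lkjasjdf\nasdfasfd"'))
```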