I am trying the following one-liner to convert a file from shiftjis encoding to utf-8 and it's not working. Any helpful smart people available?
perl -i.bak -e 'use utf8; use Encode qw(decode encode); my $ustr = Encode::decode("shiftjis",$_); my $val = Encode::encode("utf-8",$ustr); print "$val";' filename
I am pretty new to code pages and the web seems rife with all sorts of complexities on the subject. I just want a one liner. The input file and the output file appear to be the same.
You forgot the -n switch, which will iterate over each line of input, loading one line at a time into $_ and executing the code provided in the -e argument.
More concisely, you could write your program like
perl -MEncode -pi.bak -e '$_=encode("utf-8",decode("shiftjis",$_))' filename
Perl is an odd choice for this, given that there's already a standard utility for doing it:
iconv -f shift-jis -t utf-8 filename
Of course, that doesn't easily let you edit a file in-place, but there's also recode which is likewise installed on my system somehow :)...
recode shift-jis..utf-8 filename
Or use moreutils:
iconv -f shift-jis -t utf-8 filename | sponge filename
Hmm. Seems like TMTOWTDI.
Problem Background
We have several thousand large (more than 10M lines) text files of tabular data produced by a Windows machine, which we need to prepare for upload to a database.
We need to change the file encoding of these files from cp1252 to utf-8, replace any bare Unix LF sequences (i.e. \n) with spaces, then replace the DOS line end sequences ("CR-LF", i.e. \r\n) with Unix line end sequences (i.e. \n).
The dos2unix utility is not available for this task.
We initially had a bash function that packaged these operations together using iconv and sed, with iconv doing the encoding and sed dealing with the LF/CRLF sequences. I'm trying to replace part of this bash function with a perl command.
Example Code
Based on some helpful code review, I want to change this function to a perl script.
The author of the code review suggested the following perl to replace CRLF (i.e. "\r\n") with LF ("\n").
perl -g -pe 's/(?<!\r)\n/ /g; s/\r\n/\n/g;'
The explanation for why this is better than what we had previously makes perfect sense, but this line fails for me with:
Unrecognized switch: -g (-h will show valid options).
More interestingly, the author of the code review also suggests it is possible to perform the decode/recode in a perl script, too, but I am completely unsure where to start.
Questions
Please can someone explain why the suggested answer fails with Unrecognized switch: -g (-h will show valid options)?
If it helps, the line is supposed to receive piped input from iconv as follows (though I am interested in learning how to use perl to do the decoding/recoding step, too):
iconv --from-code=CP1252 --to-code=UTF-8 "$1" | \
perl -g -pe 's/(?<!\r)\n/ /g; s/\r\n/\n/g;' \
> "$2"
(Highly simplified) example input for testing:
apple|orange|\n|lemon\r\nrasperry|strawberry|mango|\n\r\n
Desired output:
apple|orange| |lemon\nrasperry|strawberry|mango| \n
Perl added the command-line switch -g in v5.36.0 as an alias for -0777, which reads the whole input at once ('gulp' or slurp mode).
This works in Perl version v5.36.0:
s=$(printf "Line 1\nStill Line 1\r\nLine 2\r\nLine 3\r\n")
perl -g -pe 's/(?<!\r)\n/ /g; s/\r\n/\n/g;' <<<"$s"
Prints:
Line 1 Still Line 1
Line 2
Line 3
With any version of perl earlier than v5.36.0, use -0777 instead:
perl -0777 -pe 's/(?<!\r)\n/ /g; s/\r\n/\n/g;' <<<"$s"
# same
BTW, the conversion you are looking for is way easier in this case with awk, since it is close to the defaults.
Just do this:
awk -v RS="\r\n" '{gsub(/\n/," ")} 1' <<<"$s"
Line 1 Still Line 1
Line 2
Line 3
Or, if you have a file:
awk -v RS="\r\n" '{gsub(/\n/," ")} 1' file
This is superior to the posted perl solution since the file is processed record by record (each block of text separated by \r\n) rather than being read into memory in its entirety. Note that a multi-character RS works in gawk and mawk; POSIX leaves its behavior unspecified.
(On Windows you may need to do awk -v RS="\r\n" -v ORS="\n" '...')
Another note:
You can get similar behavior from Perl by:
Setting the input record separator to the fixed string $/="\r\n" in a BEGIN block;
Using the -l switch so every line has the input record separator removed;
Using tr for speedy replacement of \n with ' ';
Possibly setting the output record separator, $\="\n", on Windows.
Full command:
perl -lpE 'BEGIN{$/="\r\n"} tr/\n/ /' file
The error message is about the command-line switch -g in perl -g -pe ..., not about the /g modifier on the substitutions. That modifier is valid (but useless without slurp mode, since -p reads line by line and there is only a single \n in a line anyway).
This switch simply does not exist with the perl version you are using. It was only added with perl 5.36, so you are likely using an older version. Try -0777 instead.
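On the side question of doing the recoding in Perl as well: one possible sketch, assuming the core Encode module and its cp1252 encoding name, is to decode each CRLF-terminated record and re-encode it as UTF-8 in the same pass. Because cp1252 is a single-byte encoding, splitting the raw bytes on \r\n before decoding is safe. The file names win.txt and unix.txt are placeholders.

```shell
# Hypothetical input: "café" in cp1252 (é is the single byte 0xE9, octal 351),
# with one bare LF inside the first record.
printf 'caf\351|\n|x\r\nnext\r\n' > win.txt

# Read CRLF-terminated records, recode cp1252 -> UTF-8,
# then turn any bare LF left inside the record into a space.
perl -MEncode -lpe 'BEGIN{$/="\r\n"} $_ = encode("UTF-8", decode("cp1252", $_)); tr/\n/ /' win.txt > unix.txt

# Inspect the result byte by byte.
od -c unix.txt
```

This would replace both the iconv call and the substitutions, and it works on perl versions older than 5.36 because it needs neither -g nor -0777.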
Could command-line parameters be saved to a file, and the file then be passed to perl to parse out the options? Like a response file (prefix the name with @) for some Microsoft tools.
I am trying to pass an expression to perl via the command line, like perl -e 'print "\n"', and the Windows command prompt makes using double quotes a little hard.
There are several solutions, from most to least preferable.
Write your program to a file
If your one liner is too big or complicated, write it to a file and run it. This avoids messing with shell escapes. You can reuse it and debug it and work in a real editor.
perl path\to\some_program
Command-line options to perl can be put on the #! line, which is otherwise useless on Windows. Here's an example.
#!/usr/bin/perl -i.bak -p
# -i.bak Backs up the file.
# -p Puts each line into $_ and writes out the new value of $_.
# So this changes all instances in a file of " with '.
s{"}{'}g;
Use alternative quote delimiters
Perl has a slew of alternative ways to write quotes. Use them instead. This is good for both one liners as well as things like q[<tag key='value'>].
perl -e "print qq[\n]"
Escape the quote
^ is the cmd.exe escape character. So ^" is treated as a literal quote.
perl -e "print ^"\n^""
Pretty yucky. I'd prefer using qq[] and reserve ^" for when you need to print a literal quote.
perl -e "print qq[^"\n]"
Use the ASCII code
The ASCII and UTF-8 hex code for " is 22. You can supply this to Perl with qq[\x22].
perl -e "print qq[\x22\n]"
You can read the file into a string and then use
use Getopt::Long qw(GetOptionsFromString);
$ret = GetOptionsFromString($string, ...);
to parse the options from that.
I'm looking to delete blank lines in a CSV file, using Perl.
I'm not too sure how to do this, as these lines aren't exactly "blank" (they're just a bunch of commas).
I'd also like to save the output as a file of the same name, overwriting the original.
How could I go about doing this?
edit: I can't use modules or any source code due to network restrictions...
You can do this using a simple Perl one-liner:
perl -i -ne 'print unless /^[,\s]*$/' <filename>
The -n flag assumes this loop around your program:
while(<>) {
print unless /^[,\s]*$/;
}
and the -i flag edits the input file in place.
Note: If you are worried about losing your data with -i, you can specify -i.bak and perl will automatically save the original file as <filename>.bak.
More of a command line hack,
perl -i -ne 'print if /[^,\r\n]/' file.csv
If you want to put it inside a shell script you can do this ...
#!/bin/sh
perl -i -n -e 'print $_ unless ($_ =~ /^\,+$/);' "$@"
In a file, say xyz.txt, I want to replace every occurrence of a number followed by a dot (for example 1., 2., 10., 11.) with a whitespace.
How do I compose a perl command on the command line to do this to the file, and what regex should be used?
Please help.
Thank you.
This HAS to be a Perl oneliner?
perl -i -pe 's/\d+\./ /g' <fileName>
The Perl command line options: -i is used to specify what happens to the input file. If you don't give it a file extension, the original file is lost and is replaced by the Perl munged output. For example, if I had this:
perl -i.bak -pe 's/\d+\./ /g' <fileName>
The original file would be stored with a .bak suffix and <fileName> itself would contain your output.
The -p means to enclose your Perl program in a print loop that looks SOMEWHAT like this:
while ($_ = <>) {
<Your Perl one liner>
print "$_";
}
This is a somewhat simplified explanation of what's going on. You can see the actual perl loop by doing a perldoc perlrun from the command line. The main idea is that it allows you to act on each line of a file just like sed or awk.
The -e simply contains your Perl command.
You can also use file redirection:
perl -pe 's/\d+\./ /g' < xyz.txt > xyz.txt.out
Answer (not tested):
perl -ipe "s/\d+\./ /g" xyz.txt
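Both
perl -ipe "s/\d+\./ /g" xyz.txt
and
perl -pie
fail on my system. Everything that follows -i in the same switch cluster is taken as the backup suffix, so -ipe means -i with the suffix "pe" (no -p, no -e) and -pie means -p plus -i with the suffix "e". Perl then treats the next argument as the name of a script file.
I use the following order:
perl -i -pe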
I simply wanna read in a logfile, do a search and replace, and then write out the changes to that same logfile.
What's the best practice way of doing this in Perl?
I normally code up a one liner for this:
perl -i -pe 's/some/thing/' log.file
This is often done with a one-liner:
perl -pi.bak -e "s/find/replace/g" <file>
Note the -i.bak portion -- this creates a backup file with the extension .bak. If you want to play without a net you can do this to overwrite the existing file without a backup:
perl -pi -e "s/find/replace/g" <file>
or you can use sed (I know... you asked about perl):
sed -i 's/find/replace/g' <file>