Perl Command Understanding

I have been working on product code to resolve an issue, but I am stuck on one line of code.
Can anyone help me understand exactly what this command does?
perl -MText::CSV -lne 'BEGIN{$p = Text::CSV->new()} print join "|", $p->fields() if $p->parse($_)' /home/daily/${FULL_FILENAME} > /home/output.txt
I think it's copying the file to my home location with some transformations, but I'm not sure exactly.

This is a slightly broken program that translates a comma-separated values (CSV) file to a pipe-separated values file.
The particular command-line switches are documented in perlrun. This is a "one-liner", so you can read about those to see what's going on there.
The Text::CSV module deals with CSV files, and the program is parsing a line from the file and re-outputting as a pipe-separated file.
But this program treats each line as a complete record. That might be fine for you, but at some point you might end up with a literal value that has a newline in it, like a,"b\nc",d. Reading line by line then breaks the program, since the quotes appear to be unclosed within the first line. Not only that, it blindly joins the parsed fields without considering whether any of them should be quoted. A pipe character in the data might be unlikely, but the problem isn't its rarity; it's the consequences and cost when one does show up.
The rewrite.pl example script in the related module Text::CSV_XS is a tool that could replace this one-liner. It properly reads the input and knows how to properly translate it.
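To see why line-by-line reading breaks on such data, here is a minimal shell sketch. The sample record is made up for illustration; it is one CSV record whose second field contains a newline.

```shell
# One logical CSV record (hypothetical sample data) with an embedded newline
# inside the quoted second field.
record='a,"b
c",d'

# A line-oriented reader (like perl -n) sees two physical lines, not one record.
line_count=$(printf '%s\n' "$record" | wc -l | tr -d ' ')

# The first physical line ends inside an open quote, so parsing it alone fails.
first_line=$(printf '%s\n' "$record" | head -n 1)

echo "physical lines: $line_count"
echo "first line:     $first_line"
```

A record-aware CSV reader would consume both physical lines before returning the three fields.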

Related

Problems with Wget64

Attempting to write a Wget to get and save Vanguard pricing data. So far I have 2 statements that both work correctly from the Command Line when I paste the string. When I save the string as a bat file one works and the other gives an unexpected result.
The string that works correctly in both places is:
Wget64 --output-document=C:\Users\Default\downloads\VVA_Daily_Portfolio-%DATE:~-4%-%DATE:~4,2%-%DATE:~7,2%.html "https://personal.vanguard.com/us/funds/annuities/variable"
The string that only works in the Command Line and not as a bat file is:
Wget64 --output-document=C:\Users\Default\downloads\VVA_Fund64_History-%DATE:~-4%-%DATE:~4,2%-%DATE:~7,2%.html "https://personal.vanguard.com/us/funds/tools/pricehistorysearch?radio=1&results=get&FundType=VVAP&FundIntExt=INT&FundId=0064&fundName=0064&fundValue=0064&radiobutton2=1&beginDate=03%2F01%2F2017&endDate=12%2F31%2F2017&year=#res"
Can someone help me write the script so that the expected result is achieved? I suspect that the Vanguard website can tell the difference between a Command Line and a bat file query, or that there is something inherently different between the two methods of execution.
Any assistance is appreciated. Dan

The cmd command parser behaves differently in command line and batch files. In this case, the main problem is the variable expansion. In command line when a variable does not contain a value (it is undefined), the variable read operation is not removed, but inside batch files the read operation is removed.
That means that something like echo(%thisDoesNotExist% will output (under the assumption the variable does not exist) %thisDoesNotExist% in command line and nothing in batch file.
How does this relate to your problem?
If we split your wget command into parts, you have:
Wget64
--output-document="C:\Us ... y-%DATE:~-4%-%DATE:~4,2%-%DATE:~7,2%.html"
                               ^........^ ^.........^ ^.........^
"https://pe ... h?radio=1& ... &beginDate=03%2F01%2F2017&endDate=12%2F31%2F2017&year=#res"
                                            ^....^                 ^....^
You can see where the parser tries to resolve variables: correctly in the output file name, and incorrectly (from the point of view of the command's purpose) in the URL.
You need to escape the percent signs that are not part of a variable read operation by doubling them, e.g. ... beginDate=03%%2F01%%2F2017&...

zip recursively each file in a dir, where the name of the file has spaces in it

I am quite stuck; I need to compress the content of a folder, where I have multiple files (extension .dat). I went for shell scripting.
So far I told myself that is not that hard: I just need to recursively read the content of the dir, get the name of the file and zip it, using the name of the file itself.
This is what I wrote:
for i in *.dat; do zip $i".zip" $i; done
Now when I try it I get weird behavior: each file has a name like "12/23/2012 data102 test1.dat", and when I run this sequence of commands, zip, instead of recognizing the whole file name, sees each part of the string as a separate entity, causing the whole operation to fail.
I told myself that I was doing something wrong and that the i variable was wrong, so I replaced the zip command with echo (to see the output of the i variable); the $i output is the full name of the file, not part of it.
I am totally clueless at this point about what is going on. When zip reads the variable i, it sees each piece of the string separately instead of the whole thing, while if I use echo to see the content of that variable I get the correct output.
Do I have to pass the value of the filename to zip in a different way? Since it is the content of a variable passed as parameter I was assuming that it won't matter if the string is one or has spaces in it, and I can't find in the man page the answer (if there is any in there).
Does anyone know why I get this behavior and how to fix it? Thanks!

You need to quote anything with spaces in it.
zip "$i.zip" "$i"
Generally speaking, any variable interpolation should have double quotes unless you specifically require the shell to split it into multiple tokens. The internal field separator $IFS defaults to space, tab, and newline, but you can change it to make the shell do word splitting on arbitrary separators. See any decent beginners' shell tutorial for a detailed account of the shell's quoting mechanisms.
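The difference can be made visible with a small sketch. The helper count_args is a hypothetical function written for this illustration, and the file name uses dashes because a slash cannot appear in a file name. It also suggests why echo looked fine in the question: echo joins its arguments with spaces, so three separate words print the same as one quoted string.

```shell
# Hypothetical helper: report how many arguments the command received.
count_args() { echo "$#"; }

i="12-23-2012 data102 test1.dat"

unquoted=$(count_args $i)      # word splitting on $IFS: three arguments
quoted=$(count_args "$i")      # quoted: one argument, spaces preserved

echo "unquoted: $unquoted, quoted: $quoted"
```

With the default $IFS in sh or bash, the unquoted expansion is passed as three arguments and the quoted one as a single argument.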

How can I force emacs (or any editor) to read a file as if it is in ASCII format?

I could not find this answer in the man or info pages, nor with a search here or on Google. I have a file which is, in essence, a text file, but it somehow got screwed up upon saving. (I think there are a few strange bytes at the front of the file accidentally.)
I am able to open the file, and it makes sense, using head or cat, but not using any sort of editor.
In the end, all I wish to do is open the file in emacs, delete the "messy" characters, and save it once cleaned up. The file, however, is huge, so I need something powerful like emacs to be able to open it.
Otherwise, I suppose I can try to create a script to read this in line by line, forcing the script to read it in text format, then write it. But I wanted something quick, since I won't be doing this over & over.
Thanks!
Mike

perl -i.bk -pe 's/[^[:ascii:]]//g;' file
Found this perl one liner here: http://www.perlmonks.org/?node_id=619792
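The same idea can be sketched in plain shell with tr, assuming the stray bytes are non-ASCII (for example a UTF-8 byte order mark at the start of the file). The function name strip_non_ascii and the sample input are made up for illustration.

```shell
# Delete every byte in the range 0x80-0xFF; LC_ALL=C makes tr work on raw bytes.
strip_non_ascii() { LC_ALL=C tr -d '\200-\377'; }

# Sample input: a UTF-8 byte order mark (EF BB BF) followed by ordinary text.
cleaned=$(printf '\357\273\277hello\n' | strip_non_ascii)

echo "$cleaned"
```

For a real file, something like strip_non_ascii < input > output writes a cleaned copy and leaves the original untouched. To inspect the first bytes before deleting anything, head -c 16 file | od -c prints them in a readable form.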

Try M-x find-file-literally in Emacs.

You could edit the file using hexl-mode, which lets you edit the file in hexadecimal. That would let you see precisely what those offending characters are, and remove them.
It sounds like you either got a different line ending in the file (eg: carriage returns on a *nix system) or it got saved in an unexpected encoding.

You could use strings to grab the "printable characters in file". You might have to play with the --encoding option, though I have only ever used it to grab ASCII strings from executable files.

Read and delete text between two strings in perl

I need a way to read and delete text between two different strings found in some file, then delete the two strings. Like a "cut command." I would like to have the text stored in a variable.
I saw the post about reading text between two strings, but I could not figure out how to delete it as well.
I intend to execute the stored text in bash. Efficiency is desirable. This script is not going to be used on large files, but it may be executed many times sequentially so the faster the script works the better.
The stored text will usually have special characters.
Thanks

Specify the beginning and ending strings via the environment, and the file to use on the perl command line:
export START_STRING='abc def'
export END_STRING='ghi jkl'
perl -0777 -i -wpe 's/\Q$ENV{START_STRING}\E(.*)\Q$ENV{END_STRING}\E//s and print STDERR $1' file_to_use 2>savedtext
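Since the stored text is meant to be run from bash, here is an alternative sketch that stays in the shell and uses parameter expansion instead of Perl. The function name cut_between and the marker strings are hypothetical. Unlike the greedy (.*) in the one-liner above, this stops at the first end marker after the first start marker, and it works on a string already held in a variable rather than editing a file in place.

```shell
# cut_between TEXT START END
# On success sets $middle to the text between the first START and the next END,
# and $remaining to TEXT with START, the middle, and END removed.
cut_between() {
    text=$1 start=$2 end=$3
    case $text in
        *"$start"*"$end"*) ;;              # both markers present, in order
        *) return 1 ;;                     # nothing to cut
    esac
    before=${text%%"$start"*}              # everything before the first START
    rest=${text#*"$start"}                 # everything after the first START
    middle=${rest%%"$end"*}                # up to the first END that follows
    remaining=$before${rest#*"$end"}       # text with the whole span removed
}

cut_between 'head<<BEGIN>>echo hi<<END>>tail' '<<BEGIN>>' '<<END>>'
echo "middle:    $middle"
echo "remaining: $remaining"
```

Quoting "$start" and "$end" inside the expansions makes the shell treat them as literal text, so special characters in the markers are not interpreted as glob patterns.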

Perl and reading files with different encodings

I am using a perl script to read in a file, but I'm not sure what encoding the file is in. Basically, my file is a list of book titles, but each book has other info associated with it (author, publication date, etc). So each book title is within a discrete chunk of data for the book. So I iterate through the file line by line until I find the regular expression '/Book Title: (.*)/' and take what's in the paren. Then, I create a separate .txt file with the name of the text file being my book. However, in my unix server, when I look at the name of the file, it's actually not, for example, 'LordOfTheFlies.txt' but rather 'LordOfTheFlies^M.txt'
What is this '^M'? Is that a weird end of line encoding I'm not taking into account? I tried chomp but it doesn't seem to be working. What is the best file encoding for working with perl?

It's the additional carriage return character that Windows systems insert before line feed characters (M == 13th letter, hence ASCII 13 is visualised as ^M).
It has nothing to do with file encoding, it's just the line ending policy biting you. Perl is usually good at handling line ending characters correctly, but if they occur somewhere else than the end of a line you have to do it yourself. You can use s/\r// instead of chomp() to get them out.
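If the file can be cleaned before the script reads it, the carriage returns can also be removed in the shell. This is a minimal sketch with a made-up sample line; strip_cr is a hypothetical helper name. Note that tr removes every carriage return in the input, not only those at line ends.

```shell
# Remove all carriage return characters from standard input.
strip_cr() { tr -d '\r'; }

# Sample line with a Windows-style CRLF ending.
title=$(printf 'LordOfTheFlies\r\n' | strip_cr)

echo "${title}.txt"
```

Applied to a whole file, strip_cr < windows.txt > unix.txt would produce a copy with Unix line endings (both file names are placeholders).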

Before processing the file, you need to know the encoding of the file, which is determined by the producer of the file.
That "^M" is control-M, which is a carriage return, and is not needed in Unix text files. It looks like the file was created on Windows and transferred to Unix. It can also be left in place by ftp when text files are transferred in binary mode.

Try chop instead of chomp. chomp removes only the newline character. s/\r// is also good.
For your general question, you might want to use an appropriate module for the file type you have, to make your life with Perl easier.
