I never learned Perl and am trying to understand a piece of code written by someone else. Could someone tell me what perl -pi.bak -e "tr/\n\r/ /d" test.XML means? Thanks in advance.
perl -pi.bak -e "tr/\n\r/ /d" test.XML
-p read the argument file (or standard input) line by line and print each line after the code runs (an implicit while loop around the program)
-i.bak edit the argument file in place, saving a copy of the original with the extension .bak
-e the code to run
tr/\n\r/ /d transliterate each character on the left-hand side (\n and \r) to its corresponding character on the right-hand side; \n becomes a space, and because of /d, any left-hand character without a counterpart (here \r) is deleted
So basically this will take the file test.XML, store a copy in test.XML.bak, change all newlines (\n) to spaces, delete all carriage returns (\r), and save the result as test.XML.
Be aware that you can overwrite backups by running this command multiple times. The backup is not backed up.
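For reference, the one-liner behaves roughly like the explicit script below (a sketch of the implicit -p loop only; the -i.bak backup and in-place rewrite are not reimplemented here):

#!/usr/bin/perl
# Rough expansion of: perl -p -e "tr/\n\r/ /d" FILE
# (-p wraps the code in a read/print loop over the input)
use strict;
use warnings;

while (my $line = <>) {
    $line =~ tr/\n\r/ /d;   # \n becomes a space; \r has no counterpart, so /d deletes it
    print $line;
}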
Briefly, it executes the code following -e for each line of the input file test.XML, writes the modifications back to the input file, and retains the original with a .bak suffix. I would create a sample XML file and try it out. You'll see that the file is modified in place (check the filesystem dates) while the original content is kept in the .bak copy.
The tr operator is documented in perlop; it transliterates characters. The command above is manipulating carriage returns and line feeds.
The -i switch makes Perl rename the input file so the original is retained as a backup, while -p supplies the read/print loop. See perlrun for more details.
See here for more tips/tricks relating to Perl one-liners.
It changes the newlines in test.XML to spaces (and drops the carriage returns). It will also back up the original file as test.XML.bak.
I have a list of .tex files that contain fragments which build PostScript (pspicture) pictures and can be slow to process.
There are multiple fragments across multiple files and the end delimiter is \end{pspicture}
% this is the beginning of the fragment
\begin{pspicture}(0,0)(23,5)
\rput{0}(0,3){\crdKs}
\rput(1,3){\crdtres}
\rput(5,3){\crdAh}
\rput(6,3){\crdKh}
\rput(7,3){\crdsixh}
\rput(8,3){\crdtreh}
\rput(12,3){\crdQd}
\rput(13,3){\crdeigd}
\rput(14,3){\crdsixd}
\rput(15,3){\crdfived}
\rput(16,3){\crdtwod}
\rput(20,3){\crdKc}
\rput(21,3){\crdfourc}
\end{pspicture}
I would like to extract the fragments.
I am not sure how to go about this. Can awk or sed do this?
They seem to work line by line, rather than on the whole fragment.
I am not really looking for a complete solution, just a good candidate tool.
sed -En '/^\\begin\{pspicture\}.*$/,/^\\end\{pspicture\}.*$/p' file
This uses sed with -E for extended regular expressions and -n to suppress automatic printing.
The /start/,/end/ address range selects every line from a \begin{pspicture} line through the matching \end{pspicture} line, and p prints them.
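If you would rather use Perl, which appears elsewhere in this thread, the range (flip-flop) operator does the same selection; a minimal sketch:

perl -ne 'print if /^\\begin\{pspicture\}/ .. /^\\end\{pspicture\}/' file

Like the sed version, this prints every block from the \begin{pspicture} line through the matching \end{pspicture} line.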
I'd like to download many files (about 10,000) from an FTP server. The file names are too long, and I'd like to save them with only the date in the name. For example, ABCDE201604120000-abcde.nc should become 20160412.nc.
Is it possible?
I am not sure whether wget provides similar functionality; with curl, however, one can take advantage of the relatively rich syntax it provides for specifying the URL of interest. For example:
curl \
"https://ftp5.gwdg.de/pub/misc/openstreetmap/SOTMEU2014/[53-54].{mp3,mp4}" \
-o "file_#1.#2"
will download files 53.mp3, 53.mp4, 54.mp3, 54.mp4. The output file is specified as file_#1.#2 - here, #1 is replaced by curl with the value of the sequence [53-54] corresponding to the file being downloaded. Similarly, #2 is replaced with either mp3 or mp4. Thus, e.g., 53.mp3 will be saved as file_53.mp3.
ewcz's answer works fine if you can enumerate the file names as shown in the post. However, if the filenames are difficult to enumerate, for example, because the integers are sparsely populated, this solution would result in a lot of 404 Not Found requests.
If this is the case, then it is probably better to download all the files recursively, as you have shown, and rename them afterwards. If the file names follow a fixed pattern, you can select the substring from the original name and use it as the new name. In the given example, the new file names start at position 5 and are 8 characters long. The following bash command renames all *.nc files in the current directory.
for f in *.nc; do mv "$f" "${f:5:8}.nc" ; done
If the filenames do not follow a fixed pattern and might vary in length, you can use more complex pattern substitution using sed; see this SO post for an example.
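Sticking with Perl, here is a sketch that renames each *.nc file in the current directory to the first eight-digit run found in its name. It assumes every name contains exactly one such date and that no two files share a date (otherwise a later rename would overwrite an earlier file):

perl -e 'for my $f (glob "*.nc") { if ($f =~ /(\d{8})/) { rename $f, "${1}.nc" or warn "rename $f: $!" } }'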
I need to replace a token in a file with a multi-line paragraph, which contains several line breaks when represented as a string.
If I use sed in the usual way for a string-to-string replacement, the line breaks inside the new string cause problems.
So now I want to open the file, seek to the token's location, and write the new content into the file from there, but I am not sure how to achieve that. Can anybody help?
EDIT:
Looks like I could probably read both the file and the content to be inserted into arrays and then use splice in Perl. That might not be the easiest way, though.
perl -i -pe's/token/foo\nbar\nbaz\n/g' file
You can't really insert into a file. Just like inserting into a string, you must first move the remainder of the string out of the way. With files, it's easier just to copy the entire file.
The one-liner above opens the file, deletes it, creates a new file with the same name, and then copies (with substitutions) from the still-open handle to the new handle.
It's my understanding that sed can do this too, and that it also uses -i to enable the feature.
Check out: How do I change, delete, or insert a line in a file, or append to the beginning of a file?
The easiest solutions are either to use Perl's $INPLACE_EDIT, optionally as a one-liner as demonstrated by ikegami, or perhaps to use Tie::File.
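A sketch of the Tie::File approach, assuming the token sits on a line by itself; the file name, token, and replacement lines are only illustrations:

#!/usr/bin/perl
# Replace a one-line token with a multi-line paragraph via Tie::File,
# which presents the file as an array so splice can edit it in place.
use strict;
use warnings;
use Tie::File;

my @paragraph = ('first line', 'second line', 'third line');   # hypothetical replacement text

tie my @lines, 'Tie::File', 'file.txt' or die "Cannot tie file.txt: $!";

for my $i (0 .. $#lines) {
    if ($lines[$i] eq 'TOKEN') {              # hypothetical token occupying a whole line
        splice @lines, $i, 1, @paragraph;     # swap the token line for the paragraph
        last;
    }
}

untie @lines;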
I hope you are all well.
So my question is about the procedure to open multiple raw data files that are compressed.
My file names are ordered, so I have, for example: o_equities_20080528.tas.zip o_equities_20080529.tas.zip o_equities_20080530.tas.zip ...
Thank you all in advance.
How much work this will be depends on whether:
You have enough space to extract all the files simultaneously into one folder
You need to be able to keep track of which file each record has come from (i.e. you can't tell just from looking at a particular record).
If you have enough space to extract everything and you don't need to track which records came from which file, then the simplest option is to use a wildcard infile statement, allowing you to import the records from all of your files in one data step:
infile "c:\yourdir\o_equities_*.tas" <other infile options as per individual files>;
This syntax works regardless of OS - it's a SAS feature, not shell expansion.
If you have enough space to extract everything in advance but you need to keep track of which records came from each file, then please refer to this page for an example of how to do this using the filevar option on the infile statement:
http://www.ats.ucla.edu/stat/sas/faq/multi_file_read.htm
If you don't have enough space to extract everything in advance, but you have access to 7-zip or another archive utility, and you don't need to keep track of which records came from each file, you can use a pipe filename and extract to standard output. If you're on a Linux platform then this is very simple, as you can take advantage of shell expansion:
filename cmd pipe "nice -n 19 gunzip -c /yourdir/o_equities_*.tas.zip";
infile cmd <other infile options as per individual files>;
On Windows it's the same sort of idea, but as you can't rely on shell expansion, you have to construct a separate filename for each zip file, or use some of 7-Zip's more arcane command-line options, e.g.:
filename cmd pipe "7z.exe e -an -ai!C:\yourdir\o_equities_*.tas.zip -so -y";
This will extract all files from all of the matching archives to standard output. You can narrow this down further via the 7-zip command if necessary. You will have multiple header lines mixed in with the data - you can use findstr to filter these out in the pipe before SAS sees them, or you can just choose to tolerate the odd error message here and there.
Here, the -an tells 7-zip not to read the zip file name from the command line, and the -ai tells it to expand the wildcard.
If you need to keep track of what came from where and you can't extract everything at once, your best bet (as far as I know) is to write a macro to process one file at a time, using the above techniques and add this information while you're importing each dataset.
How to rewrite a file from a shell script without any danger of truncating the file if out of disk space?
This handy perl one liner replaces all occurrences of "foo" with "bar" in a file called test.txt:
perl -pi -e 's/foo/bar/g' test.txt
This is very useful, but ...
If the file system where test.txt resides has run out of disk space, test.txt will be truncated to a zero-byte file.
Is there a simple, race-condition-free way to avoid this truncation occurring?
I would like the test.txt file to remain unchanged and the command to return an error if the file system is out of space.
Ideally the solution should be easily used from a shell script without requiring additional software to be installed (beyond "standard" UNIX tools like sed and perl).
Thanks!
In general, this can’t be done. Remember that the out-of-space condition can hit anywhere along the sequence of actions that give the appearance of in-place editing. Once the filesystem is full, perl may not be able to undo previous actions in order to restore the original state.
A safer way to use the -i switch is to use a nonempty backup suffix, e.g.,
perl -pi.bak -e 's/foo/bar/g' test.txt
This way, if something goes wrong along the way, you still have your original data.
If you want to roll your own, be sure to check the value returned from the close system call. As the Linux manual page notes,
Not checking the return value of close() is a common but nevertheless serious programming error. It is quite possible that errors on a previous write(2) operation are first reported at the final close(). Not checking the return value when closing the file may lead to silent loss of data. This can especially be observed with NFS and with disk quota.
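If you do roll your own rewrite, here is a minimal sketch in Perl that writes to a temporary file, checks every step including close(), and only then replaces the original (the file name and substitution are only illustrations):

#!/usr/bin/perl
# Rewrite test.txt via a temporary file so the original is replaced only
# after the new contents have been written and flushed successfully.
use strict;
use warnings;

my $file = 'test.txt';
open my $in, '<', $file or die "Cannot read $file: $!";
my $contents = do { local $/; <$in> };        # slurp the whole file
close $in;

$contents =~ s/foo/bar/g;

my $tmp = "$file.tmp";
open my $out, '>', $tmp or die "Cannot write $tmp: $!";
print {$out} $contents  or die "Cannot write to $tmp: $!";
close $out              or die "Cannot close $tmp: $!";   # deferred write errors (e.g. disk full) surface here

rename $tmp, $file or die "Cannot rename $tmp to $file: $!";

On POSIX filesystems the final rename is atomic, so test.txt is either left untouched or fully replaced.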
As with everything else in life, leave yourself more margin for error. Disk is cheap. Dig out the pocket change from your couch cushions and go buy yourself half a terabyte or so.
From perldoc perlrun:
-i[extension]
specifies that files processed by the "<>" construct are to be edited in-place.
It does this by renaming the input file, opening the output file by the original
name, and selecting that output file as the default for print() statements. The
extension, if supplied, is used to modify the name of the old file to make a
backup copy, following these rules:
If no extension is supplied, no backup is made and the current file is
overwritten.
[…]
Rephrased:
The backup filename is determined from the extension given to the -i switch, if one is supplied.
The original file is renamed to that backup name; the script reads from it under the new name. Renaming is atomic on most filesystems.
A new file with the original name is opened for writing. It starts at length zero and is a different file from the original (which now has a different name).
After the script has finished, if no explicit backup extension was supplied, the backup is removed and the original data is lost.
Should the system run out of drive space, the new file is the one endangered, not the original, whose data was never copied or moved, only renamed (at least on filesystems with an inode-like concept).
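The same sequence done by hand looks roughly like the sketch below; this mirrors what -i.bak does conceptually and is not Perl's actual implementation (the file name and substitution are only illustrations):

#!/usr/bin/perl
# Hand-rolled version of: perl -pi.bak -e 's/foo/bar/g' data.txt
use strict;
use warnings;

my $file   = 'data.txt';
my $backup = "$file.bak";

rename $file, $backup or die "Cannot rename $file to $backup: $!";   # atomic on most filesystems

open my $in,  '<', $backup or die "Cannot read $backup: $!";
open my $out, '>', $file   or die "Cannot create $file: $!";         # new, empty file with the original name

while (my $line = <$in>) {
    $line =~ s/foo/bar/g;            # whatever the -e code would do
    print {$out} $line;
}

close $in;
close $out or die "Cannot close $file (disk full?): $!";

If the disk fills up, the new file is the one that suffers; the original data still sits untouched in data.txt.bak.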