I pieced together this one-liner to change the values in a CSV file. It works perfectly except that it removes all the spaces. If someone could explain what I'm doing wrong, I would appreciate it.
perl -pne 's/\s+(-?\d+\.?\d*)/$1>100?1000:$1/ge' file
Everything matching the LHS of your regex
\s+(-?\d+\.?\d*)
will be replaced. That includes the whitespace matched by \s+. You can use a zero-width look-behind assertion as Matt suggested:
perl -pe 's/(?<=\s)(-?\d+\.?\d*)/$1>100?1000:$1/ge' file
or the special \K form, which will "keep" everything before the \K:
perl -pe 's/\s+\K(-?\d+\.?\d*)/$1>100?1000:$1/ge' file
Note that both -p and -n loop through every line of your input file(s), so you only need one or the other (although -p overrides -n if you do specify both). I used -p because it prints each line automatically. Details in perldoc perlrun.
I have Fortran code with global comments, which start with a double exclamation mark (i.e., !!), and personal comments, which start with a single exclamation mark (i.e., !). I just want to hide my personal comment lines (or substitute the line with another line, e.g., '! jw'). For example, the original code looks like this:
!! This is a global comment
Code..
Code..
! This is a personal comment
code... ! This is a personal comment
!! This is a global comment
code...
Then, I want to update the original code as:
!! This is a global comment
Code..
Code..
! jw
code... ! jw
!! This is a global comment
code...
I have tried to use "sed" and "awk", but I failed. Could someone help me? I would prefer "sed" over "awk", by the way.
Use a Perl one-liner with a negative lookbehind pattern:
perl -pe 's/(?<!!)!\s.*/! jw/' in_file > out_file
To change the file in-place:
perl -i.bak -pe 's/(?<!!)!\s.*/! jw/' in_file
To change multiple files in-place, for example ex*.f90 files:
perl -i.bak -pe 's/(?<!!)!\s.*/! jw/' ex*.f90
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
-i.bak : Edit input files in-place (overwrite the input file). Before overwriting, save a backup copy of the original file by appending to its name the extension .bak.
(?<!!)! : Exclamation point that is not preceded by an exclamation point.
\s : Whitespace.
.* : Any character, repeated 0 or more times.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlre: Perl regular expressions (regexes)
perldoc perlre: Perl regular expressions (regexes): Quantifiers; Character Classes and other Special Escapes; Assertions; Capture groups
perldoc perlre: Negative lookbehind
perldoc perlrequick: Perl regular expressions quick start
sed '/!!/!s/!.*/! jw/' file
/!!/! If the line does not contain !!, then
s/!.*/! jw/ replace everything from the first exclamation mark onward with ! jw.
awk 'BEGIN{FS=OFS="!"}$2{$2=" jw"}1' file
BEGIN{FS=OFS="!"} Set the field separators to !.
$2{$2=" jw"} If the 2nd field is not empty, replace it with " jw".
1 Print the line.
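A small sketch to confirm that the sed and awk versions agree on lines from the question. Note that the sed version skips any line that contains !! anywhere, so a line with code followed by a !! comment and then a personal comment would be left alone.

```shell
# Lines taken from the example in the question.
sample='!! This is a global comment
Code..
! This is a personal comment
code... ! This is a personal comment'

# sed: only touch lines that do not contain "!!".
via_sed=$(printf '%s\n' "$sample" | sed '/!!/!s/!.*/! jw/')

# awk: split on "!" and overwrite the second field when it is non-empty.
via_awk=$(printf '%s\n' "$sample" | awk 'BEGIN{FS=OFS="!"}$2{$2=" jw"}1')

printf '%s\n---\n%s\n' "$via_sed" "$via_awk"
```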
If the line starts with ! then you could do something like
sed 's/^! .*/! jw/' mycode.fortran >newcodefile.fortran
I would write the result to a new file and rename it afterwards. If you overwrite your file directly, you could end up causing problems if anything goes wrong.
The s in the string passed to sed tells it to search and replace.
The ^ means start of line, so this won't find a comment that starts further along the line.
The pattern then matches a ! followed by a space and, through .*, the rest of the line, and replaces all of it with ! jw. Without the .* only the leading "! " would be replaced and the comment text would stay.
If you just run it as:
sed 's/^! .*/! jw/' mycode.fortran
without redirecting the output to a file, it will stream the output to your console so you can see whether it's working. Then run it again with the > redirect to write a file, check the file, and do your renaming. Don't get rid of your original code file until you're completely sure it worked and didn't do anything you didn't want.
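The cautious workflow described here (write to a new file, inspect it, then rename) might look like the sketch below. It works in a throwaway directory with made-up file contents, and it uses a pattern ending in .* so that the whole comment line is replaced; the .orig backup name is just a choice for this example.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
printf '%s\n' '!! global' '! personal note' 'x = 1' > "$workdir/mycode.fortran"

# Step 1: write the result to a NEW file.
sed 's/^! .*/! jw/' "$workdir/mycode.fortran" > "$workdir/newcodefile.fortran"

# Step 2: review the changes. diff exits non-zero when files differ, hence "|| true".
diff "$workdir/mycode.fortran" "$workdir/newcodefile.fortran" || true

# Step 3: only after checking, keep a backup and move the new file into place.
cp "$workdir/mycode.fortran" "$workdir/mycode.fortran.orig"
mv "$workdir/newcodefile.fortran" "$workdir/mycode.fortran"

result=$(cat "$workdir/mycode.fortran")
backup=$(cat "$workdir/mycode.fortran.orig")
rm -rf "$workdir"
```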
I have a simple sed command that I am using to replace everything between (and including) //thistest.com-- and --thistest.com with nothing (remove the block altogether):
sudo sed -i "s#//thistest\.com--.*--thistest\.com##g" my.file
The contents of my.file are:
//thistest.com--
zone "awebsite.com" {
type master;
file "some.stuff.com.hosts";
};
//--thistest.com
As I am using # as my delimiter for the regex, I don't need to escape the / characters. I am also properly (I think) escaping the . in .com. So I don't see exactly what is failing.
Why isn't the entire block being replaced?
You have two problems:
Sed doesn't do multiline pattern matches—at least, not the way you're expecting it to. However, you can use multiline addresses as an alternative.
Depending on your version of sed, you may need to escape alternate delimiters, especially if you aren't using them solely as part of a substitution expression.
So, the following will work with your posted corpus in both GNU and BSD flavors:
sed '\#^//thistest\.com--#, \#^//--thistest\.com# d' /tmp/corpus
Note that in this version, we tell sed to match all lines between (and including) the two patterns. The opening delimiter of each address pattern is properly escaped. The command has also been changed to d for delete instead of s for substitute, and some whitespace was added for readability.
I've also chosen to anchor the address patterns to the start of each line. You may or may not find that helpful with this specific corpus, but it's generally wise to do so when you can, and doesn't seem to hurt your use case.
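Here is a sketch of the address-range delete applied to the posted block, with one extra line before and after it so the effect is visible. Those two surrounding lines are invented for the example, and the input is piped in rather than read from /tmp/corpus.

```shell
# The block from the question, wrapped in two hypothetical neighbouring lines.
corpus='options { };
//thistest.com--
zone "awebsite.com" {
type master;
file "some.stuff.com.hosts";
};
//--thistest.com
zone "other.example" { };'

# Delete every line from the opening marker through the closing marker.
remaining=$(printf '%s\n' "$corpus" | sed '\#^//thistest\.com--#,\#^//--thistest\.com#d')

printf '%s\n' "$remaining"
```

Only the two surrounding lines should be left.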
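# Slurp the whole file, then remove the block with a single s// (markers on their own lines)
sed -n -e 'H;${x;s#^\(.\)\(.*\)\1//thistest.com--.*\1//--thistest.com#\2#;p}' YourFile
# Delete the block line by line with an address range
sed -e '\#//thistest.com--#,\#//--thistest.com# d' YourFile
# Slurp the whole file, then remove the block with s// wherever the markers sit (separate lines, CR, CR/LF, ";" or all on one line)
sed -n -e '1h;1!H;${x;s#//thistest.com--.*//--thistest.com##;p}' YourFile
Notes:
The s// versions assume there is only one thistest section per file. If there are several, they remove everything from the first opening marker to the last closing marker.
The s// versions are not suitable for huge files, because they load the entire file into memory.
The first command uses the leading newline to capture the line separator, so it does not match when the block starts on the very first line of the file.
The address-range version cannot select a section that opens and closes on the same line: it looks for a line matching the first pattern to start and a following line matching the second to stop. It is, however, very efficient on big files and handles multiple sections.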
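The multi-section caveat can be illustrated with a sketch like the one below. The input with two sections is made up, and the slurp command shown is a variant without any backreference, with a ; before the closing brace for portability.

```shell
# Hypothetical input with TWO marked sections.
two_sections='keep-1
//thistest.com--
drop-a
//--thistest.com
keep-2
//thistest.com--
drop-b
//--thistest.com
keep-3'

# Address range: each section is deleted on its own, keep-2 survives.
by_address=$(printf '%s\n' "$two_sections" | sed -e '\#//thistest\.com--#,\#//--thistest\.com#d')

# Slurp + greedy s//: everything from the first opening marker to the LAST
# closing marker is removed, so keep-2 is lost and an empty line remains.
by_slurp=$(printf '%s\n' "$two_sections" | sed -n -e '1h;1!H;${x;s#//thistest\.com--.*//--thistest\.com##;p;}')

printf '%s\n---\n%s\n' "$by_address" "$by_slurp"
```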
I have a number of paragraphs that have returns at the end of a line. I do not want returns at the end of lines, I will let the layout program take care of that. I would like to remove the returns, and replace them with spaces.
The issue is that I do want returns in between paragraphs. So, if there is more than one return in a row (2, 3, etc) I would like to keep two returns.
This would allow for paragraphs with one blank line between them, while all other line formatting would be removed. The layout program could then worry about the line breaks, instead of having the breaks determined by a set number of characters, as they are now.
I would like to use Perl to accomplish this change, but am open to other methods.
example text:
This is a test.
This is just a test.
This too is a test.
This too is just a test.
would become:
This is a test. This is just a test.
This too is a test. This too is just a test.
Can this be done easily?
Using a Perl one-liner: replace runs of two or more newlines with exactly two, and replace every single newline with a space:
perl -0777 -pe 's{(\n{2})\n*|\n}{$1//" "}eg' file.txt > newfile.txt
Switches:
-0777: Slurps the entire file
-p: Creates a while(<>){...; print} loop for each “line” in your input file.
-e: Tells perl to execute the code on command line.
I came up with another solution and also wanted to explain what your regex was matching.
Matt@MattPC ~/perl/testing/8
$ cat input.txt
This is a test.
This is just a test.
This too is a test.
This too is just a test.
another test.
test.
Matt@MattPC ~/perl/testing/8
$ perl -e '$/ = undef; $_ = <>; s/(?<!\n)\n(?!\n)/ /g; s/\n{2,}/\n\n/g; print' input.txt
This is a test. This is just a test.
This too is a test. This too is just a test.
another test. test.
I basically just wrote a perl program and mashed it into a one-liner. It would normally look like this.
# First two lines read in the whole file
$/ = undef;
$_ = <>;
# This regex replaces every `\n` by a space
# if it is not preceded or followed by a `\n`
s/(?<!\n)\n(?!\n)/ /g;
# This replaces every two or more \n by two \n
s/\n{2,}/\n\n/g;
# finally print $_
print;
perl -p -i -e 's/(\w+|\s+)[\r\n]/$1 /g' abc.txt
Part of the problem here is what you are matching. (\w+|\s+) matches one or more word characters, \w being the same as [a-zA-Z0-9_] for ASCII text, OR one or more whitespace characters, \s being roughly [\t\n\f\r ].
This doesn't match your input because every line of text ends with a period right before the line break, and neither \w nor \s matches a period. The blank lines don't match either: a blank line would need at least one whitespace character in front of the line break matched by [\r\n].
I have a file 1.htm. I want to replace a letter ṣ (s with dot below). I tried with both sed and perl and it does not replace.
sed -i 's/ṣ/s/g' "1.htm"
perl -i -pe 's/ṣ/s/g' "1.htm"
Can anyone suggest what to do?
1.html (not replacing ṣ)
I have also found another strange thing: sed (same command as above) replaces the character in one file but not in the other. I am putting the links here:
replacable.html
unreplacable.html same as 1.html
Why is this happening? sed is able to replace ṣ in one file but not in the other.
You have combining characters in the HTML file. That is, the "ṣ" is really an "s" followed by a " ̣" (U+0323 COMBINING DOT BELOW). One way to fix the one-liner is:
perl -C -i -pe 's/s\x{0323}/s/g' "1.htm"
That is, turn on UTF-8 mode for input and output (-C) and explicitly write the two characters on the left side of the s///.
Another possibility is to normalize all the combining characters using Unicode::Normalize, e.g.:
perl -C -MUnicode::Normalize -Mutf8 -i -pe '$_=NFKC($_); s/ṣ/s/g' "1.htm"
But this would also normalize all the other characters in the input file, which may or may not be OK for you.
This might work for you (GNU sed):
sed 's/\o341\o271\o243/s/g' file
To find sed's octal representation of a character, use:
echo 'ṣ'| sed l
This returns (for me):
\341\271\243$
ṣ
Then use \onnn (or a combination of them) to build the correct pattern in the left-hand side (LHS) of the substitute command.
N.B. \onnn may also be used in the RHS of the substitute command.
I'm using Perl from the command line to replace duplicate spaces in a text file.
The command I use is:
perl -pi -e 's/\s+/ /g' file.csv
The problem: this procedure also removes the newlines in the resulting file.
Any idea why this occurs?
Thanks!
\s matches any whitespace character, which includes [ \f\n\r\t]. So you're also replacing newlines with single spaces.
In your case, the simplest way is to enable automatic line-ending processing with the -l flag:
perl -pi -le 's/\s+/ /g' file.csv
This way, the newline is chomped from each line before the -e code runs and appended again when the line is printed.
Will add my two cents to the previous answer.
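If you use this regexp in a Perl script itself, you can just change it to:
s/[ ]+/ /gis;
That collapses runs of spaces on every line and leaves the line endings alone, because the character class contains only the space character. The i and s modifiers are harmless here but not needed, since the pattern has no letters and no dot.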