Perl script for normalizing negative numbers

I have several large CSV files in which I would like to replace all numbers less than 100, including negative numbers, with 500 or another positive number.
I'm not a programmer, but I found a nice Perl one-liner that replaces whitespace with commas: 's/[^\S\n]+/,/g'. Is there an easy way to do this replacement as well?

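Using Windows quoting for a Perl one-liner (this variant replaces values below 100 with 100; change the replacement to 500 or any other number as needed):
perl -F/,/ -lane "print join(q{,}, map { /^[-\d.]+$/ && $_ < 100 ? 100 : $_ } @F)" input.csv > output.csv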

The following works for me, assuming there are 2 files in the directory:
test1.txt:
201,400,-1
-2.5,677,90.66,30.32
222,18
test2.txt
-1,-1,-1,99,101
3,3,222,190,-999
22,100,100,3
Using the one-liner:
perl -p -i.bak -e 's/(-?\d+\.?\d*)/$1<100?500:$1/ge' *
-p applies the search-and-replace to each line of each file.
-i.bak edits the original files in place and keeps a backup of each one with a .bak extension.
s///ge finds every number (including negative numbers), compares it with 100, and replaces it with 500 if it is smaller. g means replace every match on the line; e means the replacement part is evaluated as Perl code.
* means process all the files in the directory.
After executing this one-liner, I got 4 files in the directory:
test1.txt.bak test1.txt test2.txt.bak test2.txt
and the contents of test1.txt and test2.txt are:
test1.txt
201,400,500
500,677,500,500
222,500
test2.txt
500,500,500,500,101
500,500,222,190,500
500,100,100,500

Related

Perl - Changing file name in the middle of write

I am trying to take a very large txt file (over a million lines) that I created in Perl and run it through a different statement in Perl that will essentially look something like this (note the following is shell)
a=0
b=1
while read line;
do
echo -n "" > "Write file"${b}
a=($a + 1)
while ( $a <= 5000)
do
echo $line >> "Write file"${b}
a=($a + 1)
done
a=0
b=($b + 1)
done < "read file"
Trying to size it down to 5k lines per file, and incrementing each time (filename1.txt, filename2.txt, filename3.txt, etc)
This doesn't seem to work in shell, possibly due to the size of the input file, and for the life of me I can't think of how to change what file I am writing to in the middle of the loop..

You can just do this in the shell using split.
For example:
split -l 5000 filename.txt filename.txt.
will split filename.txt into multiple files with a maximum of 5,000 lines each. The output files will be named filename.txt.aa, filename.txt.ab, filename.txt.ac, etc.
From my man split:
NAME
split -- split a file into pieces
SYNOPSIS
split [-a suffix_length] [-b byte_count[k|m]] [-l line_count] [-p pattern] [file [name]]
DESCRIPTION
The split utility reads the given file and breaks it up into files of 1000 lines each. If file is a single dash (`-') or absent, split reads from the standard input.
The options are as follows:
-a suffix_length
Use suffix_length letters to form the suffix of the file name.
-b byte_count[k|m]
Create smaller files byte_count bytes in length. If ``k'' is appended to the number, the file is split into byte_count kilobyte pieces. If ``m'' is
appended to the number, the file is split into byte_count megabyte pieces.
-l line_count
Create smaller files n lines in length.
-p pattern
The file is split whenever an input line matches pattern, which is interpreted as an extended regular expression. The matching line will be the
first line of the next output file. This option is incompatible with the -b and -l options.
If additional arguments are specified, the first is used as the name of the input file which is to be split. If a second additional argument is specified,
it is used as a prefix for the names of the files into which the file is split. In this case, each file into which the file is split is named by the prefix
followed by a lexically ordered suffix using suffix_length characters in the range ``a-z''. If -a is not specified, two letters are used as the suffix.
If the name argument is not specified, the file is split into lexically ordered files named with the prefix ``x'' and with suffixes as above.
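
To see the naming scheme on a small scale, here is a sketch that assumes a POSIX shell; the 12-line input file and the "chunk." prefix are hypothetical, and 5 stands in for the 5000 used above:

```shell
# Work in a scratch directory so the generated pieces are easy to inspect.
tmp=$(mktemp -d)

# Hypothetical 12-line input file.
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 11 12 > "$tmp/input.txt"

# At most 5 lines per piece; "chunk." is the prefix, so split writes
# chunk.aa, chunk.ab and chunk.ac next to the input file.
split -l 5 "$tmp/input.txt" "$tmp/chunk."

# Show the line count of every piece.
wc -l "$tmp"/chunk.*
```

The last piece holds whatever is left over, here 2 lines, and concatenating the pieces in order gives back the original file.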

As an aside, this is your fixed script:
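#!/bin/sh
# Split in-file into out-file-1, out-file-2, ... with 10 lines each
# (use 5000 instead of 10 for the chunk size in the question).
a=0
b=1
while IFS= read -r line; do
    if [ "$a" -eq 0 ]; then
        # Start of a new chunk: create (or truncate) the output file.
        : > "out-file-${b}"
    fi
    printf '%s\n' "$line" >> "out-file-${b}"
    a=$(( a + 1 ))
    if [ "$a" -eq 10 ]; then
        # Chunk is full: reset the line counter and move to the next file.
        a=0
        b=$(( b + 1 ))
    fi
done < in-file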
Tested with bash and dash.

Inserting headers into multiple files

I found some command line with Perl that inserts headers into my files without going through the tedious process of inserting them one by one. Can someone walk me through the Perl aspect of this command line? I'm new to this and can't seem to find the right explanations for what I wrote.
cat header.txt | perl -0 -i -pe 'BEGIN{$h = <STDIN>}; print $h' 1*

-e
rather than providing the script in a xxxx.pl file, provide it on the command line
-p
wraps the script in a loop that reads the files named as arguments record by record (somewhat like sed) and prints the contents of $_ at the end of each iteration
the two above are combined in -pe
-i
indicates that you want to edit the file in place and write the output to the same file. In practice, Perl renames the input file and reads from this renamed version while writing to a new file with the original name
-0
redefines the end-of-record character (\n by default). With no digits after it, the separator becomes the null character, so a text file that contains no null bytes is read as a single record
1*
is the command-line argument to your script, so I guess you are modifying any file with a name that starts with 1 (you could have used *.c, or whatever, depending on the type of files you are trying to modify)
print $h
is the "main" of your script: it prints the variable $h. If $h was initialized with the content of the header file (the intent of this one-liner), then it prints the header file
BEGIN{ some code here }
this is stuff you execute before the script starts. this is where I'm stumped. this doesn't seem like valid perl code
so basically:
this will supposedly slurp the entire header file (because of -0) in the BEGIN block and store it in the variable $h
iterate over all the files specified by the wildcards at the end of the command line
for each file: print the header (print $h), then print the file itself (because of -pe)
so it's equivalent to spelling the script out:
# $h = content of the entire header file (read from STDIN in the BEGIN block)
while (<>) { # loop implied by -p, iterates over all the 1* files
# the contents of the "-e" script are inserted below
print $h; # print the header we saved
print $_; # implied by -p; since we are using -0, this prints the entire file in one shot
# end of the "-e" script: it was a single print $h statement, the second print is implied by -p
}
It's a bit hard to explain, take a look at the perlrun documentation for details (run man perlrun).
This is not a 100% complete explanation because I don't think the BEGIN block is right. I tried it on my Ubuntu machine and it complained about its syntax too.

Here's something similar, with an explanation. The program in the question doesn't run on my mac.
I needed to add the #nullable disable directive to the top of all my csharp files as part of migrating to nullable reference types.
perl -w -i -p -0777 -e 's/^/#nullable disable\n\n/' $(find . -iname '*.cs')
-w enable warnings
-i edit files in place
-p read each file block by block, printing each block after applying a perl expression. the default block size is one line
-0777 changes the default block size to the entire file
-e the perl expression to execute
The final argument uses shell command substitution to create a list of files. It passes that list of file paths to the perl command. The find command searches for files that end in .cs.
The perl program is a single substitution command. It matches the very beginning of the block and replaces (prepends, really) with "#nullable disable" and a couple new-lines.

Print line numbers after comparison

Can someone tell me the best way to print the number of differing lines between 2 files? I have 2 directories with thousands of files, and I have a Perl script that compares all files in dir1 with all files in dir2 and outputs the differences to a separate file. Now I need to add something like "Filename - # of different lines":
File1 - 8
File2 - 30
Right now I am using
my $diff = `diff -y --suppress-common-lines "$DirA/$file" "$DirB/$file"`;
But along with this I also need to print how many lines are different in each one of those 1000 files.
Sorry, this is a duplicate of my previous thread. I would be glad if a moderator could delete the previous one.

Why even use Perl?
for i in "$dirA"/*; do file="${i##*/}"; echo "$file - $(diff -y --suppress-common-lines "$i" "$dirB/$file" | wc -l)" ; done > diffs.txt
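
The loop above can be exercised on a tiny pair of directories. This is a sketch assuming a POSIX shell and a diff that supports -y and --suppress-common-lines (GNU diffutils does); the directory layout and file contents are invented for the demonstration:

```shell
tmp=$(mktemp -d)
mkdir "$tmp/dirA" "$tmp/dirB"
printf 'a\nb\nc\n' > "$tmp/dirA/File1"
printf 'a\nX\nc\n' > "$tmp/dirB/File1"
printf 'same\n'    > "$tmp/dirA/File2"
printf 'same\n'    > "$tmp/dirB/File2"

for i in "$tmp/dirA"/*; do
    file=${i##*/}    # strip the directory part, keep the file name
    # diff exits non-zero when the files differ, so "|| true" keeps strict shells going.
    count=$( { diff -y --suppress-common-lines "$i" "$tmp/dirB/$file" || true; } | wc -l )
    echo "$file - $(( count ))"
done > "$tmp/diffs.txt"

cat "$tmp/diffs.txt"
```

Each line that diff prints in this mode corresponds to one changed, added or removed line, so counting output lines with wc -l gives the per-file total.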

Perl oneliner match repeating itself

I'm trying to read a specific section of a line out of a file with Perl.
The file in question is of the following syntax.
# Sets $USER1$
$USER1$=/usr/....
# Sets $USER2$
#$USER2$=/usr/...
My oneliner is simple,
perl -ne 'm/^\$USER1\$\s*=\s*(\S*?)\s*$/m; print "$1";' /my/file
For some reason I'm getting the extraction for $1 repeated several times over, apparently once for every line in the file after my match occurs. What am I missing here?

You are executing print for every line of the file because print gets called for every line, whether the regex matches or not. Replace the first ; with an &&.

From perlre:
NOTE: Failed matches in Perl do not reset the match variables, which makes it easier to write code that tests for a series of more specific cases and remembers the best match.
Try this instead:
perl -ne 'print "$1" if m/^\$USER1\$\s*=\s*(\S*?)\s*$/m;' /my/file

$ cat test.txt
# Sets $USER1$
$USER1$=/usr/....
# Sets $USER2$
#$USER2$=/usr/...
$ perl -nle 'print if /^\$USER1/;' test.txt
$USER1$=/usr/....

Try this
perl -ne 'print "$1\n" if /^\$USER1\$\s*=\s*(.+)$/;' file

replace line with sed in csh

I am trying to change the content of a specific line in a batch of files. I thought that would be a piece of cake but for some reason, nothing happens, so I guess I am missing something.
Line 8 should have been replaced.
Here is the csh script I used:
#!/bin/csh
#
# replace context in line xxx by yyy
# 2010/05/07
set files = `ls FILENAMEPART*`
echo $files
foreach file ($files)
sed '8,8 s/1/2 /' $file
end
thanks for suggestions

sed prints the result (with the lines replaced) to stdout by default and leaves the input file untouched. Use the -i option for in-place editing, so that the changes are made directly in $file. GNU sed accepts a bare -i; BSD/macOS sed requires a backup suffix argument, which may be empty (-i '').
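
Here is a small sketch of in-place editing restricted to line 8. It is written for a POSIX shell rather than csh, and the file name and contents are hypothetical. The attached-suffix form -i.bak is used because both GNU and BSD sed accept it:

```shell
tmp=$(mktemp -d)
# Ten identical lines, so the effect on line 8 is easy to spot.
for n in 1 2 3 4 5 6 7 8 9 10; do echo "value 1"; done > "$tmp/FILENAMEPART_demo.txt"

# -i.bak edits the file in place and keeps the original as FILENAMEPART_demo.txt.bak.
# "8s/1/2/" restricts the substitution to line 8.
sed -i.bak '8s/1/2/' "$tmp/FILENAMEPART_demo.txt"

# Print lines 7 to 9 to confirm that only line 8 changed.
sed -n '7,9p' "$tmp/FILENAMEPART_demo.txt"
```

In the csh loop from the question, the equivalent change is to add the -i option to the sed call inside the foreach.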