Brute force attack test on password for file - perl

I'm trying to write a brute-force script that will work on a specific file's password.
I'm not sure how to get this code to work. This is what I have so far. This code produces the correct possible combinations for the password but I am not sure how to implement this into a brute force attack.
my @alpha = qw(a b c d e f g h i j k l m n o p q r s t u v w x y z);
my $password = $alpha[1];
my @combo = ();
for my $one (@alpha) {
    for my $two (@alpha) {
        for my $three (@alpha) {
            for my $four (@alpha) { push @combo, "$one$two$three$four\n" }
        }
    }
}
I assume I'll need to use this command somewhere; secret_file_brute.zip is the file I'm testing on.
I'm not sure how to declare the $password variable, or how to feed my generated combinations from above one by one into the place where $password appears in the command until one of them matches.
$returnVal = system("unzip -qq -o -P $password secret_file_brute.zip > /dev/null 2>&1");

I think you're trying to generate every possible password from the 26 Latin letters, right? Why not use the increment operator?
use feature 'say';
my $password = "a";
for (;;) {
    say $password;
    $password++;
}
$password will go from a to z, then from aa to zz, then from aaa to zzz, and so on, which generates every possible password made of the 26 Latin letters.
If you're only interested in four character combinations:
my $password = "aaaa";
while ( length $password < 5 ) {
    say $password;
    $password++;
}
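The increment trick above can be wrapped in a small helper to see exactly which strings it produces. This is only a sketch: `candidates` is a hypothetical name, not something from the question, and it assumes lowercase letters only.

```perl
use strict;
use warnings;

# Hypothetical helper: list every lowercase candidate from "a" up to
# "z" x $max_len, relying on Perl's magic string increment.
sub candidates {
    my ($max_len) = @_;
    my @list;
    for (my $p = 'a'; length($p) <= $max_len; $p++) {
        push @list, $p;    # "z" is followed by "aa", "zz" by "aaa", ...
    }
    return @list;
}

my @two = candidates(2);
print scalar(@two), " candidates, from $two[0] to $two[-1]\n";
```

For a maximum length of 2 this gives 26 + 26*26 = 702 candidates. For real use you would probably test each candidate inside the loop instead of collecting them in an array.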

Brute force password cracking is very inefficient, so not really useful except as proof of concept.
You've a 4 character alphabetical password, which is a fairly trivial case.
First off - you can write:
my @alpha = ( "a" .. "z" );
Generating the words as you're doing will work, but you're appending a linefeed to each one, which means whatever system command you run with them won't work.
You also might find making the attempt as you go will improve your speed, not least because you can use multiprocessing trivially for this sort of operation.
Also - you can trap the return code for system to see when you succeed. Capturing the text output of system won't help - you need to inspect $? - see: http://perldoc.perl.org/functions/system.html
Something like this maybe?
#!/usr/bin/perl
use strict;
use warnings;
use Parallel::ForkManager;
my $parallel   = 8;
my @alpha      = ( "a" .. "z" );
my $manager    = Parallel::ForkManager->new($parallel);
my $parent_pid = $$;
for my $one (@alpha) {
    for my $two (@alpha) {
        for my $three (@alpha) {
            for my $four (@alpha) {
                # In the parent, start() returns the child's PID, so move on to the next candidate
                $manager->start and next;
                system(
                    "unzip -qq -o -P $one$two$three$four secret_file_brute.zip > /dev/null 2>&1"
                );
                # $? is 0 only when unzip succeeded
                if ( not $? ) {
                    print "Password was $one$two$three$four\n";
                    # kill takes the signal first, then the PID
                    kill 'TERM', $parent_pid;
                }
                $manager->finish;
            }
        }
    }
}
$manager->wait_all_children;
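To make the advice about inspecting $? more concrete, here is a rough sketch of how the value can be taken apart. `decode_status` is a made-up helper name; the bit layout (low 7 bits for the signal, high byte for the exit code, -1 when the program could not be started) is the one described in the system documentation linked above.

```perl
use strict;
use warnings;

# Hypothetical helper: split a wait status such as $? into its parts.
sub decode_status {
    my ($status) = @_;
    return { failed_to_start => 1 } if $status == -1;
    return {
        signal    => $status & 0x7F,    # non-zero if the child was killed by a signal
        exit_code => $status >> 8,      # the program's own exit code
    };
}

# Run a child Perl that exits with code 3, then decode the result.
system($^X, '-e', 'exit 3');
my $info = decode_status($?);
print "exit code: $info->{exit_code}\n";
```

In the brute-force loop, an exit code of 0 with no signal would be the "password found" case.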

Related

perl parse command line arguments using shift command

I have a question regarding parsing command line arguments and the use of the shift command in Perl.
I wanted to use this line to launch my Perl script
/home/scripts/test.pl -a --test1 -b /path/to/file/file.txt
So I want to parse the command line arguments. This is part of my script where I do that
if ($arg eq "-a") {
    $main::john = shift(@arguments);
} elsif ($arg eq "-b") {
    $main::doe = shift(@arguments);
}
I then want to use these arguments in a $command variable that will be executed afterwards
my $var1=$john;
my $var2=$doe;
my $command = "/path/to/tool/tool --in $line --out $outputdir $var1 $var2";
&execute($command);
Now here are two problems that I encounter:
It should not be obligatory to specify -a & -b at the command line. But what happens now is that when I don't specify -a, I get the message that I'm using an uninitialized value at the line where the variable is defined
Second problem: $var2 will now equal $doe so it will be in this case /path/to/file/file.txt. However I want $var2 to be equal to --text /path/to/file/file.txt. Where should I specify this --text. It cannot be standardly in the $command, because then it will give a problem when I don't specify -b. Should I do it when I define $doe, but how then?
You should build your command string according to the contents of the variables
Like this
my $var1 = $john;
my $var2 = $doe;
my $command = "/path/to/tool/tool --in $line --out $outputdir";
$command .= " $var1" if defined $var1;
$command .= " --text $var2" if defined $var2;
execute($command);
Also
Don't use an ampersand (&) when you are calling a Perl subroutine. That hasn't been good practice for eighteen years now
Don't use package variables like $main::xxx. Lexical variables (declared with my) are almost all that is necessary
As Alnitak says in the comment, you should really be using the Getopt::Long module to avoid introducing errors into your command-line parsing
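As a rough illustration of the Getopt::Long suggestion, something along these lines might work. It is a sketch under assumptions: `build_command` is a hypothetical helper, the tool path and the --in/--out/--text options are taken from the question, and the command is returned as a list so it can be passed to system without going through a shell.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: parse the optional -a and -b options and build the
# command as a list, adding only the parts that were actually supplied.
sub build_command {
    my ($line, $outputdir, @args) = @_;
    my ($john, $doe);
    GetOptionsFromArray(\@args, 'a=s' => \$john, 'b=s' => \$doe)
        or die "Bad command-line options\n";
    my @cmd = ('/path/to/tool/tool', '--in', $line, '--out', $outputdir);
    push @cmd, $john if defined $john;            # value given with -a, if any
    push @cmd, '--text', $doe if defined $doe;    # --text only when -b was given
    return @cmd;
}

print join(' ', build_command('in.txt', 'out', '-b', '/path/to/file/file.txt')), "\n";
```

Because both options are optional, leaving out -a or -b simply leaves the matching variable undefined, and no "uninitialized value" warning is triggered.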
Getopt::Long might be an option: http://search.cpan.org/~jv/Getopt-Long-2.48/lib/Getopt/Long.pm
Regarding your sample:
You didn't say what should happen if -a or -b are missing, but defaults may solve your problem:
# Use 'john' as default if $var1 is not set
my $var1 = $john // 'john';
# Use an empty value as default if $var2 is not set
my $var2 = $doe // '';
Regarding the --text prefix:
Do you want to set it always?
my $command = "/path/to/tool/tool --in $line --out $outputdir $var1 --text $var2";
Or do you want to set it only if -b (that is, $doe) has been given?
# Prefix
my $var2 = "--text $doe";
# Prefix with default
my $var2 = defined $doe ? "--text $doe" : '';
# Same, but long format
my $var2 = ''; # Set default
if (defined $doe) {
    $var2 = "--text $doe";
}

Perl brute force attack

I am having a lot of trouble trying to create a brute force script. The password I need to crack is 1 to 4 characters long and all lowercase letters. I think I have figured out the code to generate all the possible combinations but I am not sure how to test this on a file. Any guidance or hints would be great.
$password = "aaaa";
while ( length $password < 5 ) {
    print "$password\n";
    $password++;
}
I had this similar problem. Either you are in my class or scripting classes around the country do this problem at the same time. My professor encourages forum use but we can't share answers with direct classmates at our university.
If you know me from your class by my username, then I ask that you do not use my code. Otherwise enjoy. I have commented the code since learning from working code is the best way to learn.
As long as you are using only letters you can just increment a scalar instead of nesting loops. If you do need to use other characters I bet you could just use an array of possible characters and increment through that array for each position, though let's ignore that since you seem to only need those letters =)
sub brute2()
{
print "Bruteforce Attack...\n";
print "Enter password length: "; #Prompt user for maximum length for pass
chomp(my $plen = (<>)); #Receive input and remove newline character
print "Password Length is $plen\n";
$plen++;
print "Press any key to continue.\n"; #Execute once they hit any key
if (<>)
{
my $pass = "a"; #This code assumes only letters a..z, so we just set here
while ( length $pass < $plen ) #Run check loop until we exhaust all possibilities within the maximum length
{
my $status = system("unzip -pp -o -P $pass secret_file_brute.zip > /dev/null 2>&1"); #System call to compare our password against a zip file, this will set status to the return value
print ("Attempting: $pass Return: $status\n");
if ($status == 0) #Return value of 0 means success
{
print ("Password is: $pass Return is: $status\n"); #Print correct password. I did return value also for debug
last; #Break loop since we got correct password
}
$pass++; #Increment $pass var to next iteration IE "a" to "b", "aa" to "ab", "zzz" to "aaaa" etc...
}
}
}
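The idea mentioned above, stepping through an array of allowed characters for each position when the password is not limited to a..z, could look roughly like this. It is only a sketch: `candidates` and the sample character set are made up for illustration, and for large sets you would want to test each candidate as it is produced instead of collecting them all.

```perl
use strict;
use warnings;

# Hypothetical helper: every string of length 1..$max_len over @$chars,
# produced by treating the positions like the wheels of an odometer.
sub candidates {
    my ($chars, $max_len) = @_;
    my @out;
    for my $len (1 .. $max_len) {
        my @idx = (0) x $len;                  # one index into @$chars per position
        while (1) {
            push @out, join '', map { $chars->[$_] } @idx;
            my $pos = $len - 1;                # bump the rightmost position first
            while ($pos >= 0 && ++$idx[$pos] == @$chars) {
                $idx[$pos] = 0;                # wrapped around: reset and carry left
                $pos--;
            }
            last if $pos < 0;                  # carried past the leftmost position
        }
    }
    return @out;
}

my @list = candidates(['a', 'b', '1'], 2);
print "@list\n";
```

With three characters and a maximum length of 2 this yields 3 + 9 = 12 strings, starting at "a" and ending at "11".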
According to the man page I found, unzip returns exit code 82 when it can't decrypt.
sub try {
my ($password) = @_;
system("unzip -qq -o -P $password secret_file_brute.zip >/dev/null 2>&1");
die("Can't launch unzip: $!\n") if $? == -1;
die("unzip killed by signal ".($? & 0x7F)."\n") if $? & 0x7F;
my $exit_code = $? >> 8;
die("unzip exited with error $exit_code\n") if $exit_code && $exit_code != 82;
return !$exit_code;
}
Your code does not generate all of the possible passwords (e.g. it doesn't generate aaa). The following does:
sub brute_force {
for (my $password = 'a'; length($password)<5; ++$password) {
return $password if try($password);
}
return undef;
}
The final bit is to display the results.
{
my $password = brute_force();
defined($password)
or die("Password not found\n");
print("$password\n");
}

Perl - Trouble with my unzip system call for zip file crack

I am a junior currently taking a scripting languages class that is supposed to spit us out with intermediate level bash, perl, and python in one semester. Since this class is accelerated, we speed through topics quickly and our professor endorses using forums to supplement our learning if we have questions.
I am currently working on our first assignment. The requirement is to create a very simple dictionary attack using a provided wordlist "linux.words" and a basic bruteforce attack. The bruteforce needs to compensate for any combination of 4 letter strings.
I have used print statements to check if my logic is sound, and it seems it is. If you have any suggestions on how to improve my logic, I am here to learn and I am all ears.
This is on Ubuntu v12.04 in case that is relevant.
I have tried replacing the scalar within the call with a literal word like unicorn and it runs fine; it is obviously the wrong password, and unzip returns correctly. I have done this both in the terminal and in the script itself. My professor looked over this for the 15 minutes he could spare before referring me to a forum, and said it looked good. He suspected that, since I wrote the code using Notepad++, there might be hidden characters. I rewrote the code straight in the terminal using vim and it gave the same errors shown below. The code pasted below is from vim.
My actual issue is that my system call is giving me problems. It returns the help function for unzip showing usages and other help material.
Here is my code.
#!/usr/bin/perl
use strict;
use warnings;
#Prototypes
sub brute();
sub dict();
sub AddSlashes($);
### ADD SLASHES ###
sub AddSlashes($)
{
my $text = shift;
$text =~ s/\\/\\\\/g;
$text =~ s/'/\\'/g;
$text =~ s/"/\\"/g;
$text =~ s/\\0/\\\\0/g;
return $text;
}
### BRUTEFORCE ATTACK ###
sub brute()
{
print "Bruteforce Attack...\n";
print "Press any key to continue.\n";
if (<>)
{
#INCEPTION START
my @larr1 = ('a'..'z'); #LEVEL 1 +
foreach (@larr1)
{
my $layer1 = $_; #LEVEL 1 -
my @larr2 = ('a'..'z'); #LEVEL 2 +
foreach (@larr2)
{
my $layer2 = $_; # LEVEL 2 -
my @larr3 = ('a'..'z'); #LEVEL 3 +
foreach (@larr3)
{
my $layer3 = $_; #LEVEL 3 -
my @larr4 = ('a'..'z'); #LEVEL 4 +
foreach (@larr4)
{
my $layer4 = $_;
my $pass = ("$layer1$layer2$layer3$layer4");
print ($pass); #LEVEL 4 -
}
}
}
}
}
}
### DICTIONARY ATTACK ###
sub dict()
{
print "Dictionary Attack...\n"; #Prompt User
print "Provide wordlist: ";
my $uInput = "";
chomp($uInput = <>); #User provides wordlist
(open IN, $uInput) #Bring in wordlist
or die "Cannot open $uInput, $!"; #If we cannot open file, alert
my @dict = <IN>; #Throw the wordlist into an array
foreach (@dict)
{
print $_; #Debug, shows what word we are on
#next; #Debug
my $pass = AddSlashes($_); #To store the $_ value for later use
#Check pass call
my $status = system("unzip -qq -o -P $pass secret_file_dict.zip > /dev/null 2>&1"); #Return unzip system call set to var
#Catch the correct password
if ($status == 0)
{
print ("Return of unzip is ", $status, " and pass is ", $pass, "\n"); #Print out value of return as well as pass
last;
}
}
}
### MAIN ###
dict();
exit (0);
Here is my error
See "unzip -hh" or unzip.txt for more help. Examples:
unzip data1 -x joe => extract all files except joe from zipfile data1.zip
unzip -p foo | more => send contents of foo.zip via pipe into program more
unzip -fo foo ReadMe => quietly replace existing ReadMe if archive file newer
aerify
UnZip 6.00 of 20 April 2009, by Debian. Original by Info-ZIP.
Usage: unzip [-Z] [-opts[modifiers]] file[.zip] [list] [-x xlist] [-d exdir]
Default action is to extract files in list, except those in xlist, to exdir;
file[.zip] may be a wildcard. -Z => ZipInfo mode ("unzip -Z" for usage).
-p extract files to pipe, no messages -l list files (short format)
-f freshen existing files, create none -t test compressed archive data
-u update files, create if necessary -z display archive comment only
-v list verbosely/show version info -T timestamp archive to latest
-x exclude files that follow (in xlist) -d extract files into exdir
modifiers:
-n never overwrite existing files -q quiet mode (-qq => quieter)
-o overwrite files WITHOUT prompting -a auto-convert any text files
-j junk paths (do not make directories) -aa treat ALL files as text
-U use escapes for all non-ASCII Unicode -UU ignore any Unicode fields
-C match filenames case-insensitively -L make (some) names lowercase
-X restore UID/GID info -V retain VMS version numbers
-K keep setuid/setgid/tacky permissions -M pipe through "more" pager
-O CHARSET specify a character encoding for DOS, Windows and OS/2 archives
-I CHARSET specify a character encoding for UNIX and other archives
See "unzip -hh" or unzip.txt for more help. Examples:
unzip data1 -x joe => extract all files except joe from zipfile data1.zip
unzip -p foo | more => send contents of foo.zip via pipe into program more
unzip -fo foo ReadMe => quietly replace existing ReadMe if archive file newer
aerifying
It is obviously not complete. In the main I will switch the brute(); for dict(); as needed to test. Once I get the system call working I will throw that into the brute section.
If you need me to elaborate more on my issue, please let me know. I am focused here on learning, so please add idiot proof comments to any thing you respond to me with.
First: DO NOT USE PERL'S PROTOTYPES. They don't do what you or your professor might wish they do.
Second: Don't write homebrew escaping routines such as AddSlashes. Perl has quotemeta. Use it.
Your problem is not with the specific programming language. How much time your professor has spent on your problem, how many classes you take are irrelevant to the problem. Focus on the actual problem, not all the extraneous "stuff".
Such as, what is the point of sub brute? You are not calling it in this script, it is not relevant to your problem, so don't post it. Narrow down your problem to the smallest relevant piece.
Don't prompt for the wordlist file in the body of dict. Separate the functionality into bite sized chunks so in each context you can focus on the problem at hand. Your dict_attack subroutine should expect to receive either a filehandle or a reference to an array of words. To keep memory footprint low, we'll assume it's a filehandle (so you don't have to keep the entire wordlist in memory).
So, your main looks like:
sub main {
# obtain name of wordlist file
# open wordlist file
# if success, call dict_attack with filehandle
# dict_attack returns password on success
}
Now, you can focus on dict_attack.
#!/usr/bin/perl
use strict;
use warnings;
main();
sub dict_attack {
my $dict_fh = shift;
while (my $word = <$dict_fh>) {
$word =~ s/\A\s+//;
$word =~ s/\s+\z//;
print "Trying $word\n";
my $pass = quotemeta( $word );
my $cmd = "unzip -qq -o -P $pass test.zip";
my $status = system $cmd;
if ($status == 0) {
return $word;
}
}
return;
}
sub main {
my $words = join("\n", qw(one two three four five));
open my $fh, '<', \$words or die $!;
if (my $pass = dict_attack($fh)) {
print "Password is '$pass'\n";
}
else {
print "Not found\n";
}
return;
}
Output:
C:\...> perl y.pl
Trying one
Trying two
Trying three
Trying four
Trying five
Password is 'five'
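An alternative to escaping the word at all is to avoid the shell entirely by giving system a list of arguments. The following is a sketch, not part of the answer above: `unzip_command` and `try_word` are hypothetical names, the unzip flags are the ones already used in the question, and since there is no shell there is also no > /dev/null redirection, so unzip's messages would still reach the terminal.

```perl
use strict;
use warnings;

# Hypothetical helper: build the unzip argument list for one candidate word.
sub unzip_command {
    my ($word, $zipfile) = @_;
    $word =~ s/\A\s+//;    # strip leading whitespace
    $word =~ s/\s+\z//;    # strip the trailing newline read from the wordlist
    return ('unzip', '-qq', '-o', '-P', $word, $zipfile);
}

# Hypothetical helper: true when unzip exits with status 0 for this word.
sub try_word {
    my @cmd = unzip_command(@_);
    system(@cmd);          # list form: no shell is involved, so no escaping is needed
    return $? == 0;
}

# Show the arguments that would be passed for a word containing a quote.
print join(' ', map { "[$_]" } unzip_command("it's\n", 'test.zip')), "\n";
```

Each bracketed item in the output is handed to unzip as one argument exactly as shown, so quotes or backslashes in a dictionary word cannot break the command.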

How to quickly find and replace many items on a list without replacing previously replaced items in BASH?

I want to perform many find and replace operations on some text. I have a UTF-8 CSV file containing what to find (in the first column) and what to replace it with (in the second column), arranged from longest to shortest.
E.g.:
orange,fruit2
carrot,vegetable1
apple,fruit3
pear,fruit4
ink,item1
table,item2
Original file:
"I like to eat apples and carrots"
Resulting output file:
"I like to eat fruit3s and vegetable1s."
However, I want to ensure that if one part of text has already been replaced, that it doesn't mess with text that was already replaced. In other words, I don't want it to appear like this (it matched "table" from within vegetable1):
"I like to eat fruit3s and vegeitem21s."
Currently, I am using this method which is quite slow, because I have to do the whole find and replace twice:
(1) Convert the CSV to three files, e.g.:
a.csv b.csv c.csv
orange 0001 fruit2
carrot 0002 vegetable1
apple 0003 fruit3
pear 0004 fruit4
ink 0005 item1
table 0006 item2
(2) Then, replace all items from a.csv in file.txt with the matching column in b.csv, using ZZZ around the words to make sure there is no mistake later in matching the numbers:
a=1
b=`wc -l < ./a.csv`
while [ $a -le $b ]
do
for i in `sed -n "$a"p ./b.csv`; do
for j in `sed -n "$a"p ./a.csv`; do
sed -i "s/$i/ZZZ$j\ZZZ/g" ./file.txt
echo "Instances of '"$i"' replaced with '"ZZZ$j\ZZZ"' ("$a"/"$b")."
a=`expr $a + 1`
done
done
done
(3) Then running this same script again, but to replace ZZZ0001ZZZ with fruit2 from c.csv.
Running the first replacement takes about 2 hours, but as I must run this code twice to avoid editing the already replaced items, it takes twice as long. Is there a more efficient way to run a find and replace that does not perform replacements on text already replaced?
Here's a perl solution which is doing the replacement in "one phase".
#!/usr/bin/perl
use strict;
my %map = (
orange => "fruit2",
carrot => "vegetable1",
apple => "fruit3",
pear => "fruit4",
ink => "item1",
table => "item2",
);
my $repl_rx = '(' . join("|", map { quotemeta } keys %map) . ')';
my $str = "I like to eat apples and carrots";
$str =~ s{$repl_rx}{$map{$1}}g;
print $str, "\n";
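One detail worth adding to the one-phase approach: keys %map comes back in no particular order, and with an alternation the first alternative that matches at a position wins, so if one search term is a prefix of another it may be safer to put longer terms first. A sketch of that variation, with `replace_all` as a hypothetical wrapper:

```perl
use strict;
use warnings;

# Hypothetical wrapper: replace every key of %map in $text in a single pass,
# trying longer keys before shorter ones.
sub replace_all {
    my ($text, %map) = @_;
    return $text unless %map;    # nothing to replace
    my $alternation = join '|',
        map  { quotemeta }
        sort { length($b) <=> length($a) or $a cmp $b }
        keys %map;
    # Each position of the original text is examined once, so text produced
    # by a replacement is never scanned again.
    $text =~ s/($alternation)/$map{$1}/g;
    return $text;
}

my %map = (
    orange => 'fruit2', carrot => 'vegetable1', apple => 'fruit3',
    pear   => 'fruit4', ink    => 'item1',      table => 'item2',
);
print replace_all('I like to eat apples and carrots', %map), "\n";
```

Because the substitution never rescans its own output, the "table" inside "vegetable1" is left alone.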
Tcl has a command to do exactly this: string map
tclsh <<'END'
set map {
"orange" "fruit2"
"carrot" "vegetable1"
"apple" "fruit3"
"pear" "fruit4"
"ink" "item1"
"table" "item2"
}
set str "I like to eat apples and carrots"
puts [string map $map $str]
END
I like to eat fruit3s and vegetable1s
This is how to implement it in bash (requires bash v4 for the associative array)
declare -A map=(
[orange]=fruit2
[carrot]=vegetable1
[apple]=fruit3
[pear]=fruit4
[ink]=item1
[table]=item2
)
str="I like to eat apples and carrots"
echo "$str"
i=0
while (( i < ${#str} )); do
matched=false
for key in "${!map[@]}"; do
if [[ ${str:$i:${#key}} = $key ]]; then
str=${str:0:$i}${map[$key]}${str:$((i+${#key}))}
((i+=${#map[$key]}))
matched=true
break
fi
done
$matched || ((i++))
done
echo "$str"
I like to eat apples and carrots
I like to eat fruit3s and vegetable1s
This will not be speedy.
Clearly, you may get different results if you order the map differently. In fact, I believe the order of "${!map[@]}" is unspecified, so you might want to specify the order of the keys explicitly:
keys=(orange carrot apple pear ink table)
# ...
for key in "${keys[@]}"; do
One way to do it would be to do a two-phase replace:
phase 1:
s/orange/##1##/
s/carrot/##2##/
...
phase 2:
s/##1##/fruit2/
s/##2##/vegetable1/
...
The ##1## markers should be chosen so that they don't appear in the original text or the replacements of course.
Here's a proof-of-concept implementation in perl:
#!/usr/bin/perl -w
#
my $repls = $ARGV[0];
die ("first parameter must be the replacement list file") unless defined ($repls);
my $tmpFmt = "###%d###";
open(my $replsFile, "<", $repls) || die("$!: $repls");
shift;
my @replsList;
my $i = 0;
while (<$replsFile>) {
chomp;
my ($from, $to) = /\"([^\"]*)\",\"([^\"]*)\"/;
if (defined($from) && defined($to)) {
push(@replsList, [$from, sprintf($tmpFmt, ++$i), $to]);
}
}
while (<>) {
foreach my $r (@replsList) {
s/$r->[0]/$r->[1]/g;
}
foreach my $r (@replsList) {
s/$r->[1]/$r->[2]/g;
}
print;
}
I would guess that most of your slowness is coming from creating so many sed commands, which each need to individually process the entire file. Some minor adjustments to your current process would speed this up a lot by running 1 sed per file per step.
a=1
b=`wc -l < ./a.csv`
cmd=""
while [ $a -le $b ]
do
    for i in `sed -n "$a"p ./a.csv`; do
        for j in `sed -n "$a"p ./b.csv`; do
            cmd="$cmd ; s/$i/ZZZ${j}ZZZ/g"
            echo "Instances of '"$i"' replaced with '"ZZZ${j}ZZZ"' ("$a"/"$b")."
            a=`expr $a + 1`
        done
    done
done
sed -i "$cmd" ./file.txt
Doing it twice is probably not your problem. If you managed to just do it once using your basic strategy, it would still take you an hour, right? You probably need to use a different technology or tool. Switching to Perl, as above, might make your code a lot faster (give it a try)
But continuing down the path of other posters, the next step might be pipelining. Write a little program that replaces two columns, then run that program twice, simultaneously. The first run swaps out strings in column1 with strings in column2, the next swaps out strings in column2 with strings in column3.
Your command line would be like this
cat input_file.txt | perl replace.pl replace_file.txt 1 2 | perl replace.pl replace_file.txt 2 3 > completely_replaced.txt
And replace.pl would be like this (similar to other solutions)
#!/usr/bin/perl -w
my $replace_file = $ARGV[0];
my $before_replace_colnum = $ARGV[1] - 1;
my $after_replace_colnum = $ARGV[2] - 1;
open(REPLACEFILE, $replace_file) || die("couldn't open $replace_file: $!");
my @replace_pairs;
# read in the list of things to replace
while(<REPLACEFILE>) {
chomp();
my @cols = split /\t/, $_;
my $to_replace = $cols[$before_replace_colnum];
my $replace_with = $cols[$after_replace_colnum];
push @replace_pairs, [$to_replace, $replace_with];
}
# read input from stdin, do swapping
while(<STDIN>) {
# loop over all replacement strings
foreach my $replace_pair (@replace_pairs) {
my($to_replace,$replace_with) = @{$replace_pair};
$_ =~ s/${to_replace}/${replace_with}/g;
}
print STDOUT $_;
}
A bash+sed approach:
count=0
bigfrom=""
bigto=""
while IFS=, read from to; do
read countmd5sum x < <(md5sum <<< $count)
count=$(( $count + 1 ))
bigfrom="$bigfrom;s/$from/$countmd5sum/g"
bigto="$bigto;s/$countmd5sum/$to/g"
done < replace-list.csv
sed "${bigfrom:1}$bigto" input_file.txt
I have chosen md5sum, to get some unique token. But some other mechanism can also be used to generate such token; like reading from /dev/urandom or shuf -n1 -i 10000000-20000000
An awk+sed approach:
awk -F, '{a[NR-1]="s/####"NR"####/"$2"/";print "s/"$1"/####"NR"####/"}; END{for (i=0;i<NR;i++)print a[i];}' replace-list.csv > /tmp/sed_script.sed
sed -f /tmp/sed_script.sed input.txt
A cat+sed+sed approach:
cat -n replace-list.csv | sed -rn 'H;g;s|(.*)\n *([0-9]+) *[^,]*,(.*)|\1\ns/####\2####/\3/|;x;s|.*\n *([0-9]+)[ \t]*([^,]+).*|s/\2/####\1####/|p;${g;s/^\n//;p}' > /tmp/sed_script.sed
sed -f /tmp/sed_script.sed input.txt
Mechanism:
Here, it first generates the sed script, using the csv as input file.
Then uses another sed instance to operate on input.txt
Notes:
The intermediate file generated - sed_script.sed can be re-used again, unless the input csv file changes.
####<number>#### is chosen as some pattern, which is not present in the input file. Change this pattern if required.
cat -n | is not UUOC :)
This might work for you (GNU sed):
sed -r 'h;s/./&\\n/g;H;x;s/([^,]*),.*,(.*)/s|\1|\2|g/;$s/$/;s|\\n||g/' csv_file | sed -rf - original_file
Convert the csv file into a sed script. The trick here is to replace the substitution string with one which will not be re-substituted. In this case each character in the substitution string is replaced by itself and a \n. Finally once all substitutions have taken place the \n's are removed leaving the finished string.
There are a lot of cool answers here already. I'm posting this because I'm taking a slightly different approach by making some large assumptions about the data to replace ( based on the sample data ):
Words to replace don't contain spaces
Words are replaced based on the longest, exactly matching prefix
Each word to replace is exactly represented in the csv
This is a single-pass, awk-only answer with very little regex.
It reads the "repl.csv" file into an associative array ( see BEGIN{} ), then attempts to match on prefixes of each word when the length of the word is bound by key length limits, trying to avoid looking in the associative array whenever possible:
#!/bin/awk -f
BEGIN {
while( getline repline < "repl.csv" ) {
split( repline, replarr, "," )
replassocarr[ replarr[1] ] = replarr[2]
# set some bounds on the replace word sizes
if( minKeyLen == 0 || length( replarr[1] ) < minKeyLen )
minKeyLen = length( replarr[1] )
if( maxKeyLen == 0 || length( replarr[1] ) > maxKeyLen )
maxKeyLen = length( replarr[1] )
}
close( "repl.csv" )
}
{
i = 1
while( i <= NF ) { print_word( $i, i == NF ); i++ }
}
function print_word( w, end ) {
wl = length( w )
for( j = wl; j >= 0 && prefix_len_bound( wl, j ); j-- ) {
key = substr( w, 1, j )
wl = length( key )
if( wl >= minKeyLen && key in replassocarr ) {
printf( "%s%s%s", replassocarr[ key ],
substr( w, j+1 ), !end ? " " : "\n" )
return
}
}
printf( "%s%s", w, !end ? " " : "\n" )
}
function prefix_len_bound( len, jlen ) {
return len >= minKeyLen && (len <= maxKeyLen || jlen > maxKeylen)
}
Based on input like:
I like to eat apples and carrots
orange you glad to see me
Some people eat pears while others drink ink
It yields output like:
I like to eat fruit3s and vegetable1s
fruit2 you glad to see me
Some people eat fruit4s while others drink item1
Of course, any "savings" from not looking in replassocarr go away when the words to be replaced shrink to length 1, or if the average word length is much greater than that of the words to replace.

How do I repeat a character n times in a string?

I am learning Perl, so please bear with me for this noob question.
How do I repeat a character n times in a string?
I want to do something like below:
$numOfChar = 10;
s/^\s*(.*)/' ' x $numOfChar$1/;
By default, substitutions take a string as the part to substitute. To execute code in the substitution process you have to use the e flag.
$numOfChar = 10;
s/^(.*)/' ' x $numOfChar . $1/e;
This will add $numOfChar spaces to the start of your text. To do it for every line in the text, either use the -p flag (for quick, one-line processing; note the single quotes, which stop the shell from expanding $n and $1):
cat foo.txt | perl -p -e '$n = 10; s/^(.*)/" " x $n . $1/e' > bar.txt
or, if it's part of a larger script, use the g and m modifiers (g for global, i.e. repeated substitution, and m to make ^ match at the start of each line):
$n = 10;
$text =~ s/^(.*)/' ' x $n . $1/mge;
Your regular expression can be written as:
$numOfChar = 10;
s/^(.*)/(' ' x $numOfChar).$1/e;
but - you can do it with:
s/^/' ' x $numOfChar/e;
Or without using regexps at all:
$_ = ( ' ' x $numOfChar ) . $_;
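Putting the pieces from these answers together, a small sketch of indenting every line of a multi-line string might look like this. `indent` is a hypothetical helper name; it combines the x repetition operator with the e and m modifiers described above.

```perl
use strict;
use warnings;

# Hypothetical helper: prefix every line of $text with $n spaces.
sub indent {
    my ($text, $n) = @_;
    # /m lets ^ match at the start of each line, /g repeats the substitution,
    # /e evaluates the replacement as code so that x can build the padding.
    $text =~ s/^/' ' x $n/mge;
    return $text;
}

print indent("first line\nsecond line", 4), "\n";
```

Since nothing is captured, the rest of each line is left untouched and only the padding is inserted.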
You're right. Perl's x operator repeats a string a number of times.
print "test\n" x 10; # prints 10 lines of "test"
EDIT: To do this inside a regular expression, it would probably be best (a.k.a. most maintainer friendly) to just assign the value to another variable.
my $spaces = " " x 10;
s/^\s*(.*)/$spaces$1/;
There are ways to do it without an extra variable, but it's just my $0.02 that it'll be easier to maintain if you do it this way.
EDIT: I fixed my regex. Sorry I didn't read it right the first time.