Generate word from list of characters - perl

I asked this question and then realized I had asked it incorrectly, although the answer @Zdim provided is exactly what I asked for. So now I need to change that question a bit.
my $str = 'aaaa';
print $str++, $/ while $str le 'dddd';
So the above code prints each combination from aaaa to dddd, for instance:
aaaa
aaab
aaac
...
daaa
...
dddd
However, we need to generate all possible combinations of a given set of characters, whether they are numeric, special or alphabetical characters. So if I tell the script that words are a minimum of 2 and a maximum of 4 characters long and I give an input string of:
abcdefG1234%##
it will then generate:
aa
aaa
aaaa
bb
aaab
bbbb
####
abc#
ab#1
...
So it should use each of the characters and create every possible combination, from a minimum of 2 characters to a maximum of 4 characters.
Even if I give it the entire set of alphanumeric and special characters, it should create each possible word or string within the range of 2 to 4 characters.
If we take this glob example, it is close, but it only produces the strings of length 4, not all combinations of length 2, then 3 and then 4:
print "$_\n" while glob '{A,B,C,D,#,#,a,d,e,f}' x 4;

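use feature 'say';    # say is not enabled by default

# Repeat the brace pattern $i times to get every string of length $i
for my $i (2..4) {
    say while glob '{A,B,C,D,#,#,a,d,e,f}' x $i;
}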

One way to do this is a small extension of the linked question and answer. First generate, from the given string, the sequence of ASCII codes to sample from:
perl -wE'say for map { ord($_) } split "", q(abcdefG1234%##)'
Now with that list on hand, run the code from the linked page for sequences of length 2 through 4.
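For comparison, the enumeration described in the question can be sketched outside Perl. The Python below is not taken from either answer; it is a minimal sketch that assumes repeated characters in the input (such as the doubled # in the example string) should count only once. The function name words is hypothetical.

```python
from itertools import product

def words(chars, min_len, max_len):
    """Yield every string of length min_len..max_len built from chars."""
    # Assumption: a character repeated in the input counts once; order is kept.
    alphabet = list(dict.fromkeys(chars))
    for n in range(min_len, max_len + 1):
        # product(..., repeat=n) walks all n-character tuples, repetition allowed.
        for combo in product(alphabet, repeat=n):
            yield "".join(combo)

if __name__ == "__main__":
    # Small demo: all 2- and 3-character strings over the alphabet {a, b}.
    print(list(words("ab", 2, 3)))
```

Because words is a generator, nothing is held in memory at once, which matters as the alphabet grows: the number of strings of length n is the alphabet size raised to the power n.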

Related

Using sed to replace a number located between two other numbers

I need to replace a numeric value, that occurs in a specific line of a series of config files in a pattern like this:
string number_1 number_to_replace number_2
I want to obtain something like this:
string number_1 number_replaced number_2
The difficulties I encountered are:
number_1 or number_2 can be equal to number_to_replace, so a simple replacement is not possible.
number_1 and number_2 vary between config files so I don't know them in advance.
The closest attempt I got until now is:
echo "field 4 4 4" | sed 's/\s4\s/3/'
Which outputs:
field34 4
This is close. Since I want to replace the intermediate number, I added another "\s" to try to use the known fact that the line starts with a character:
echo "field 4 4 4" | sed 's/\s\s4\s/3/'
Which gives:
field 4 4 4
So, nothing is replaced this time. How can I proceed? A somewhat detailed explanation would be ideal, because my knowledge of replacing expressions that involve patterns is nearly zero.
Thanks.
You can do something like below, which matches your exact sequence of digits as in the example. You could replace 3 with any digit of your choice.
sed 's/\([0-9]\{1,\}\)[[:space:]]\([0-9]\{1,\}\)[[:space:]]\([0-9]\{1,\}\)/\1 3 \3/'
Notice that I've used the POSIX bracket expression [[:space:]] to match the whitespace character, which should be supported in any variant of sed you are using. Note that \s is a GNU extension.
Read literally, the regex matches one or more digits, a whitespace character, one or more digits, another whitespace character, and a final run of digits. Each run of digits is captured, numbered from \1. Since your intention is to change the second number, the replacement keeps \1 and \3 and puts the value of your choice between them.
If the extra escapes make it unreadable, use the -E flag for extended regex (ERE) support. I've used the default BRE syntax here.
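The capture-and-rebuild idea in this answer is not specific to sed. As a rough illustration only, here is the same approach in Python; the names THREE_NUMBERS and replace_middle are hypothetical, and the sketch assumes the three numbers are separated by single whitespace characters, as in the example line.

```python
import re

# Three runs of digits separated by single whitespace characters,
# mirroring the BRE in the answer above.
THREE_NUMBERS = re.compile(r"([0-9]+)(\s)[0-9]+(\s)([0-9]+)")

def replace_middle(line, new_value):
    """Replace the middle number of the first 'n1 n2 n3' run in line."""
    def rebuild(m):
        # Keep both outer numbers and the original separators.
        return f"{m.group(1)}{m.group(2)}{new_value}{m.group(3)}{m.group(4)}"
    return THREE_NUMBERS.sub(rebuild, line, count=1)

if __name__ == "__main__":
    print(replace_middle("field 4 4 4", 3))  # field 4 3 4
```

Matching all three numbers at once is what makes the replacement safe when number_1 or number_2 happens to equal the value being replaced.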

Insert new line after any amount of numeric characters

I need to insert a new line, or delimiter, in a text file after a "numeric" string consisting of 10 digits, then a "-", then 1 to 4 digits...
Example:
randomtext,1234567890-1234blahblah
Should be:
randomtext,1234567890-1234, blahblah
Or:
randomtext,1234567890-1234
blahblah
Note that the first set of numbers will always be 10 characters; the numbers after the - will be either 1, 2, 3 or 4 characters.
I've used sed a lot for similar tasks, but can't find a way to work with the last set of numbers which vary from 1 to 4 characters....
I really hope someone can help!
Many thanks!
$ echo randomtext,1234567890-1234blahblah |
sed -E 's/[0-9]{10}-[0-9]{1,4}/&\n/'
randomtext,1234567890-1234
blahblah
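The same substitution can be sketched in Python, where the separator is easy to swap between a newline and ", ". This is an illustration rather than part of the answer above; the names ID_PATTERN and split_after_id are hypothetical.

```python
import re

# Exactly 10 digits, a hyphen, then 1 to 4 digits (greedy, so up to 4 are taken).
ID_PATTERN = re.compile(r"[0-9]{10}-[0-9]{1,4}")

def split_after_id(line, sep="\n"):
    """Insert sep right after every 10-digit '-' 1..4-digit run in line."""
    # m.group(0) is the whole match, like & in the sed replacement.
    return ID_PATTERN.sub(lambda m: m.group(0) + sep, line)

if __name__ == "__main__":
    print(split_after_id("randomtext,1234567890-1234blahblah"))
    print(split_after_id("randomtext,1234567890-12blahblah", sep=", "))
```

One caveat of this sketch: if more than four digits follow the hyphen, the split lands after the fourth digit, because the quantifier stops there.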

Alternating string

I am trying to use regular expressions to match against a string that starts with 7 digits, then has a "K", and then 3 more digits. For example:
1234567K890.
I currently have $_a -match '^\d{7}K\d{3}'. However, this does not work for my purposes. Does anyone have a solution?
PS C:\> "1234567K890" -match "\d{7}(k)\d{3}"
Here \d{7} matches seven digits, (k) matches the letter K (-match is case-insensitive by default), and \d{3} matches the final three digits.
Tested this, works for your example and some others:
$string = "1234567K890"
$string -match '^[0-9]{7}(k)[0-9]{3}$'
It matches exactly 7 digits, then K (casing does not matter), then exactly 3 digits. The ^ and $ anchors at the start and end of the pattern reject any extra characters, including whitespace, before or after the match; if you want those to be allowed, you can just remove the anchors.
Here's a powershell regex reference, which may help in the future.
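To make the effect of the anchors and of case-insensitivity concrete, here is a small Python sketch of the same check. It is only an illustration: Python regexes are case-sensitive by default, unlike PowerShell's -match, so the flag has to be given explicitly. The names CODE and is_valid are hypothetical.

```python
import re

# 7 digits, the letter K in either case, then 3 digits.
CODE = re.compile(r"[0-9]{7}k[0-9]{3}", re.IGNORECASE)

def is_valid(s):
    """True only if the whole string is 7 digits, K or k, then 3 digits."""
    # fullmatch plays the role of the ^ and $ anchors.
    return CODE.fullmatch(s) is not None

if __name__ == "__main__":
    for candidate in ["1234567K890", "1234567k890", "1234567K890.", " 1234567K890"]:
        print(repr(candidate), is_valid(candidate))
```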

Searching for recursive pattern within a string

Using Perl, I want to search a string of nucleotides (AGCT) for pattern of no less and no more than three nucleotides that repeat consecutively at least seven times. I need to also save that combination for print to file as well as a total count.
The pattern of these three nucleotides will be unknown in the sense that while there are only 64 possible combinations, we will not know which one will be the repeating combination.
I have two lines of thought going in my head about how to go about this:
Create a list of the possible combinations and check against that, while producing a count. This doesn't seem feasible because every three nucleotides would produce a match. And it still wouldn't solve the problem of consecutive matching.
OR Check the first three nucleotides against the next three, if matching, check the next three. If no match, shift the reading frame to the second nucleotide in the string and try the search again.
This regex ought to do the trick:
/( ([ACGT]{3}) \2{6,} )/x
Match three characters from ACGT, then repeat the capture $2 at least six additional times. The whole matched string is in $1, and its length is three times the number of repeats, so the repeat count is $n = length($1)/3.
Test:
my $regex = qr/( ([ACGT]{3}) \2{6,} )/x;
"TACGACGACGACGACGACGACGACGT" =~ $regex;
printf "Matched %s exactly %d times\n", $2, length($1)/3;
Output:
Matched ACG exactly 8 times
Looks good.
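The question also asks for a total count and for saving each repeated combination. A sketch of how the same backreference pattern could collect every run in a sequence is shown here in Python for illustration; it is not from the answer above, and the names REPEAT and find_repeats are hypothetical.

```python
import re

# A three-nucleotide unit followed by at least six more copies of itself.
REPEAT = re.compile(r"([ACGT]{3})\1{6,}")

def find_repeats(seq):
    """Return (unit, copies, start_index) for each run of 7+ tandem repeats."""
    return [
        (m.group(1), len(m.group(0)) // 3, m.start())
        for m in REPEAT.finditer(seq)
    ]

if __name__ == "__main__":
    hits = find_repeats("TACGACGACGACGACGACGACGACGT")
    for unit, copies, start in hits:
        print(f"{unit} x{copies} at index {start}")
    print("total runs:", len(hits))
```

Note that a run of a single nucleotide, such as 21 consecutive A characters, also counts as a repeated triplet under this pattern; whether that is wanted depends on the analysis.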

perl How to replace one occurrence of a character with two

I would like to replace all instances of a character with two of that character. The usual way I would do it is:
$text =~ s/a/aa/g;
I only want single instances of a character to be doubled. So aa would remain aa and not turn into aaaa.
I am thinking I have to use variables in the s/// statement but I cannot find any suitable pattern here or on the net.
Match instances of a that are not next to another a:
s/(?<!a)a(?!a)/aa/g;
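The two lookarounds can be checked on a few inputs. The sketch below uses Python's re module, which supports the same (?<!...) and (?!...) syntax, purely as an illustration; the function name double_single is hypothetical and it assumes ch is a single character.

```python
import re

def double_single(text, ch="a"):
    """Double every occurrence of ch that has no ch directly before or after it."""
    c = re.escape(ch)  # assumption: ch is one character; escape it for the regex
    # (?<!c) means "not preceded by ch", (?!c) means "not followed by ch".
    pattern = re.compile(f"(?<!{c}){c}(?!{c})")
    return pattern.sub(lambda m: ch * 2, text)

if __name__ == "__main__":
    print(double_single("a aa aaa ba"))  # aa aa aaa baa
```

Runs of two or more are left alone because each character in such a run fails at least one of the two lookaround tests.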