Longest common substring for more than two strings in PowerShell?

How can I find the matching strings in an array of strings in PowerShell?
Example:
$Arr = "1 string first",
"2 string second",
"3 string third",
"4 string fourth"
Using this example, I want this returned:
" string "
I want to use this to find matching parts of file names and then remove that part of the file name (like removing the artist's name from a set of mp3 files for example), without having to specify which part of the file name should be replaced manually.

$arr = "qdfbsqds", "fbsqdt", "bsqda"
$arr | %{
$substr = for ($s = 0; $s -lt $_.length; $s++) {
for ($l = 1; $l -le ($_.length - $s); $l++) {
$_.substring($s, $l);
}
}
$substr | %{$_.toLower()} | select -unique
} | group | ?{$_.count -eq $arr.length} | sort {$_.name.length} | select -expand name -l 1
# returns bsqd
How it works:
produce a list of all the unique (lowercased) substrings of each input string
keep only the substrings that occur as many times as there are input strings (i.e. in all input strings)
sort the remaining substrings by length
return the last (i.e. longest) one
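The same steps can also be packaged as a reusable function. The following is only a sketch: the name Get-CommonSubstring is hypothetical, it assumes PowerShell 5 or later, and unlike the pipeline above it compares case-sensitively and only enumerates the substrings of the first string (any substring common to all inputs must occur there).

```powershell
function Get-CommonSubstring {   # hypothetical helper name
    param([Parameter(Mandatory)][string[]] $Strings)

    # Step 1: collect the distinct substrings of the first string.
    $first = $Strings[0]
    $candidates = [System.Collections.Generic.HashSet[string]]::new()
    for ($s = 0; $s -lt $first.Length; $s++) {
        for ($l = 1; $l -le $first.Length - $s; $l++) {
            [void]$candidates.Add($first.Substring($s, $l))
        }
    }

    # Step 2: keep only candidates contained in every input (case-sensitive).
    $common = foreach ($c in $candidates) {
        $inAll = $true
        foreach ($str in $Strings) {
            if (-not $str.Contains($c)) { $inAll = $false; break }
        }
        if ($inAll) { $c }
    }

    # Steps 3 and 4: sort by length and return the longest.
    $common | Sort-Object -Property Length | Select-Object -Last 1
}

$common = Get-CommonSubstring "1 string first", "2 string second", "3 string third", "4 string fourth"
$common
```

If several substrings share the maximum length, which one is returned is not specified by this sketch.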

If it (the artist name, etc.) is only going to be a single word:
$Arr = "1 string first", "2 string second", "3 string third", "4 string fourth"
$common = $Arr | %{ $_.split() } | group | sort -property count | select -last 1 | select -expand name
$common = " {0} " -f $common
Update:
An implementation that seems to work for multiple words (finding the longest common run of words):
$arr = "1 string a first", "2 string a second", "3 string a third", "4 string a fourth"
$common = $arr | %{
$words = $_.split()
$noOfWords = $words.length
for($i=0;$i -lt $noOfWords;$i++){
for($j=$i;$j -lt $noOfWords;$j++){
$words[$i..$j] -join " "
}
}
} | group | sort -property count,name | select -last 1 | select -expand name
$common = " {0} " -f $common
$common

Here is a "Longest Common Substring" function for two strings in PowerShell (based on wikibooks C# example):
Function get-LongestCommonSubstring
{
Param(
[string]$String1,
[string]$String2
)
if((!$String1) -or (!$String2)){return} # nothing to compare; 'Break' outside a loop could also stop the caller
# .Net Two dimensional Array:
$Num = New-Object 'object[,]' $String1.Length, $String2.Length
[int]$maxlen = 0
[int]$lastSubsBegin = 0
$sequenceBuilder = New-Object -TypeName "System.Text.StringBuilder"
for ([int]$i = 0; $i -lt $String1.Length; $i++)
{
for ([int]$j = 0; $j -lt $String2.Length; $j++)
{
if ($String1[$i] -ne $String2[$j])
{
$Num[$i, $j] = 0
}else{
if (($i -eq 0) -or ($j -eq 0))
{
$Num[$i, $j] = 1
}else{
$Num[$i, $j] = 1 + $Num[($i - 1), ($j - 1)]
}
if ($Num[$i, $j] -gt $maxlen)
{
$maxlen = $Num[$i, $j]
[int]$thisSubsBegin = $i - $Num[$i, $j] + 1
if($lastSubsBegin -eq $thisSubsBegin)
{#if the current LCS is the same as the last time this block ran
[void]$sequenceBuilder.Append($String1[$i]);
}else{ #this block resets the string builder if a different LCS is found
$lastSubsBegin = $thisSubsBegin
$sequenceBuilder.Length = 0 #clear it
[void]$sequenceBuilder.Append($String1.Substring($lastSubsBegin, (($i + 1) - $lastSubsBegin)))
}
}
}
}
}
return $sequenceBuilder.ToString()
}
To use this for more than two strings, use it like this:
Function get-LongestCommonSubstringArray
{
Param(
[Parameter(Position=0, Mandatory=$True)][Array]$Array
)
$PreviousSubString = $Null
$LongestCommonSubstring = $Null
foreach($SubString in $Array)
{
if($LongestCommonSubstring)
{
$LongestCommonSubstring = get-LongestCommonSubstring $SubString $LongestCommonSubstring
write-verbose "Consecutive diff: $LongestCommonSubstring"
}else{
if($PreviousSubString)
{
$LongestCommonSubstring = get-LongestCommonSubstring $SubString $PreviousSubString
write-verbose "first one diff: $LongestCommonSubstring"
}else{
$PreviousSubString = $SubString
write-verbose "No PreviousSubstring yet, setting it to: $PreviousSubString"
}
}
}
Return $LongestCommonSubstring
}
get-LongestCommonSubstringArray $Arr -verbose

If I understand your question:
$Arr = "1 string first", "2 string second", "3 string third", "4 string fourth"
$Arr -match " string " | foreach {$_ -replace " string ", " "}

Related

How would you generate a unique sequence from 000 to 9ZZ

How would you generate a unique sequence from 000 to 9ZZ? Also, my Export-Csv call is not working; please see the data and export output below.
The alphanumeric sequence starts from 0 through 9 and then A through Z.
Please be advised, my PowerShell skills are a bit new. :)
$i = @()
$a = 0..9
$b = 65..90 | Foreach{[Char]$_}
$i = $a + $b
For($d = 0; $d -le $i.count; $d++){
$g = $i[$d]
For($e = 0; $e -le $i.count; $e++){
$h = $i[$e]
For($f = 0; $f -le $i.count; $f++){
$j = $i[$f]
$k = "{0}{1}{2}" -f $g, $h, $j
$k #| Export-Csv -Path .\List.csv -NoTypeInformation -Append
If($k -eq '9ZZ'){
Break
}
}
}
}
Data Output:
000
001
.
.
|
V
009
00A
.
.
00Z
00 <-- I don't get this.
010
Export:
Length
3 <-- I don't get this either.
.
.
|
v
3
Any and all help is appreciated. Thank you in advance. ;)
I would do it this way, with 3 foreach loops and a labeled break:
$digits = 0..9
$chars = (65..90).ForEach([char])
$dict = $digits + $chars
$result = :outer foreach($i in $dict)
{
foreach($x in $dict)
{
foreach($z in $dict)
{
'{0}{1}{2}' -f $i,$x,$z
if($i -eq 9 -and $x -eq 'Z' -and $z -eq 'Z')
{
break outer
}
}
}
}
$result | Export-Csv ...
This is a classic off-by-1 error - your loops run from 0 through $i.count (included), which is exactly 1 longer than $i.
Change -le $i.count to -lt $i.count in all 3 loop conditions and it'll work.
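As a small illustration of why the extra iteration produces the stray "00" line, the sketch below rebuilds the question's $i array and reads one element past the end. It assumes strict mode is not enabled, in which case an out-of-range index returns $null instead of throwing.

```powershell
$i = (0..9) + [char[]](65..90)     # 36 symbols, valid indexes 0..35
$count = $i.Count                   # 36
$pastEnd = $i[$count]               # index 36 is past the end, so this is $null
$label = "{0}{1}{2}" -f 0, 0, $pastEnd
$label                              # the $null formats as an empty string
```

With -lt instead of -le, the loop stops at index 35 ('Z') and never reaches the $null element.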
You could simplify your code by generating the full range of digits/characters up front a little differently, and then use 3 nested foreach loops instead:
$digits = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'.ToCharArray()
$ranges =
:outerLoop
foreach($a in $digits){
foreach($b in $digits){
foreach($c in $digits){
# save and output new value
($label = "${a}${b}${c}")
# exit the label generation completely if we've reached the desired upper boundary
if($label -eq '9ZZ'){ break outerLoop }
}
}
}
$ranges now contains the correct range of labels from 000 through 9ZZ.
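The "Length" column in the question's export comes from the fact that CSV export writes an object's properties, and the only property of a string is Length. A minimal sketch of one way around it follows; the column name Label is an arbitrary choice, and ConvertTo-Csv is used so that the result can be inspected without writing a file.

```powershell
$labels = '000', '001', '002'

# Strings expose only the Length property, so this yields a "Length" column of 3s.
$asStrings = $labels | ConvertTo-Csv -NoTypeInformation

# Wrapping each string in an object gives the column a name and the real value.
$rows = $labels | ForEach-Object { [pscustomobject]@{ Label = $_ } }
$asObjects = $rows | ConvertTo-Csv -NoTypeInformation
$asObjects

# Writing to disk takes the same input:
# $rows | Export-Csv -Path .\List.csv -NoTypeInformation
```

If a plain list with one label per line is all that is needed, Set-Content avoids the object wrapping entirely.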
To complement the helpful existing answers with a generalized solution for producing permutations of characters with a given number of places, using recursive helper function Get-Permutations:
function Get-Permutations {
param(
[Parameter(Mandatory)]
[string] $Chars, # e.g. '0123456789'
[Parameter(Mandatory)]
[uint] $NumPlaces # e.g. 2, to produce '00', '01', ..., '99'
)
switch ($NumPlaces) {
0 { return }
1 { [string[]] $Chars.ToCharArray() }
default {
Get-Permutations -Chars $Chars -NumPlaces ($NumPlaces-1) | ForEach-Object {
foreach ($c in $chars.ToCharArray()) { $_ + $c }
}
}
}
}
You'd call it as follows (to get '000', '001', ..., but all the way up to 'ZZZ' - you can post-filter with ... | Where-Object { $_ -le '9ZZ' })
Get-Permutations '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ' 3
Note:
In PowerShell (Core) 7+ you can create the string of characters with
-join ('0'..'9' + 'A'..'Z')
In Windows PowerShell, where you cannot use [char] instances with the range operator, .., you can use the following, more concise alternative to your approach:
-join ([char[]] (48..57) + [char[]] (65..90)), based on the characters' code points, obtained with, e.g., [int] [char] 'A'
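The Where-Object post-filter mentioned above relies on string comparison ordering digits before letters. A short sketch with a handful of hand-picked sample labels (not the full permutation list):

```powershell
$labels = '009', '00A', '9ZZ', 'A00', 'ZZZ'

# -le compares strings here, so every label up to and including '9ZZ' is kept.
$kept = $labels | Where-Object { $_ -le '9ZZ' }
$kept -join ','
```

Note that the filter still has to look at every generated label, whereas the labeled break in the other answers stops generating as soon as 9ZZ is reached.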
Similarly, you could cheat, since '009' -le '00A', and increment the string itself:
$sb = [System.Text.StringBuilder]::new()
function inc([string]$s){
[byte[]]$v = $s.ToCharArray()
$radix = $v.Count-1
$carry = $true
while($carry -and $radix -ge 0){
$carry = $false
switch(++$v[$radix]){
91{$v[$radix--] = 48; $carry = $true}
58{$v[$radix] = 65}
}
}
$null = $sb.clear()
if($carry){$null = $sb.Append('1')}
return $sb.Append([char[]]$v).ToString()
}
Describe 'Test inc function' {
It 'increments 000' {inc '000' | should be '001'}
It 'increments 001' {inc '001' | should be '002'}
It 'increments 008' {inc '008' | should be '009'}
It 'increments 009' {inc '009' | should be '00A'}
It 'increments 00A' {inc '00A' | should be '00B'}
It 'increments 00Y' {inc '00Y' | should be '00Z'}
It 'increments 00Z' {inc '00Z' | should be '010'}
It 'increments 010' {inc '010' | should be '011'}
It 'increments 0ZZ' {inc '0ZZ' | should be '100'}
It 'increments 1ZZ' {inc '1ZZ' | should be '200'}
It 'increments ZZZ' {inc 'ZZZ' | should be '1000'}
}
measure-command {for($i = '000'; $i -le '9ZZ'; $i = inc $i){$i}}
&{for($i = '000'; $i -le '9ZZ'; $i = inc $i){
$i
}} | set-content -Path .\List.csv
Edit: unnecessarily faster
Inspired by mklement0's awesome answer
$cache = @{}
function Get-Permutations{
param(
[Parameter(Mandatory=$true)][string]$Alphabet # e.g. '0123456789'
,[Parameter(Mandatory=$true)][uint32]$Length # e.g. 2, to produce '00', '01', ..., '99'
)
if($Length -eq 0)
{return}
if($Length -eq 1)
{return $Alphabet.ToCharArray()}
if($Alphabet -NotIn $cache.Keys){
$cache.$Alphabet = new-object String[][] $Length
$cache.$Alphabet[1] = $Alphabet.ToCharArray()
}elseif($cache.$Alphabet.Count -lt $Length){
$tmp = new-object String[][] $Length
[Array]::Copy($cache.$Alphabet,$tmp,$cache.$Alphabet.Count)
$cache.$Alphabet = $tmp
}
Get-Permutations-Helper $Alphabet $Length
}
function Get-Permutations-Helper{
param(
[string]$Alphabet
,[uint32]$Length
)
$TailLength = $Length-1
if($cache.$Alphabet[$TailLength] -eq $null){
$cache.$Alphabet[$TailLength] = Get-Permutations-Helper $Alphabet $TailLength
}
foreach($head in $Alphabet.ToCharArray()){
foreach($tail in $cache.$Alphabet[$TailLength]){
$head + $tail
}
}
}
Describe 'Test Get-Permutations function' {
It 'Makes an ordered ABC,2 permutation' {
$valid = @('AA','AB','AC','BA','BB','BC','CA','CB','CC')
$cache = @{}; $i = 0
Get-Permutations 'ABC' 2 | foreach-object {
$_ | should be $valid[$i++]
}
}
It 'Extends the cache' {
$cache = @{}
$null = Get-Permutations 'ABC' 2
$cache.'ABC'[1][0] | should be 'A'
$cache.'ABC'[1][0] = 'test'
$results = Get-Permutations 'ABC' 3
$cache.'ABC'[2][0] | should be 'Atest'
$results[0] | should be 'AAtest'
}
}
$digits = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'
$spin = 10
$ttl=$null;for($i=0;$i -lt $spin;$i++){
$cache = @{}
$ttl+=measure-command {
Get-Permutations $digits 3
}
};$ttl;$ttl.TotalMilliseconds/$spin
$ttl=$null;for($i=0;$i -lt $spin;$i++){
$cache = @{}
$ttl+=measure-command {
$enu = (Get-Permutations $digits 3).GetEnumerator();
while($enu.MoveNext() -and ($v = $enu.Current) -le '9ZZ'){$v}
}
};$ttl;$ttl.TotalMilliseconds/$spin
$enu = (Get-Permutations $digits 3).GetEnumerator();
&{while($enu.MoveNext() -and ($v = $enu.Current) -le '9ZZ'){
$v
}} | set-content -Path .\List.csv

How to remove the last comma in powershell?

Below is my script. I want to pass each non-null value to my $weekDays variable in comma-separated format, but without the trailing comma. Please help me with this.
$a = "sun"
$b = "mon"
$c = $null
$d = $a,$b,$c
$weekDays = $null
Foreach ($i in $d)
{
if ($i)
{
$weekDays = $i
$weekDays = $weekDays + ","
Write-Host "$weekDays"
}
}
Output: sun,mon,
I want: sun, mon
There is no need to loop through the list yourself, since the -join operator already exists for that purpose, but you need to remove the null elements first:
($d | Where-Object { $_ -ne $null }) -join ", "
Where-Object (alias where) will filter out the null elements.
If you just want to exclude the last null item then use this
$d[0..($d.Length - 2)] -join ", "
Note that your code produces the below output
sun,
mon,
and not sun,mon, in the same line. To print with new lines like that you need to use
($d | where { $_ -ne $null }) -join ",`n"
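Put together as a self-contained sketch with the question's values: the filter below uses [string]::IsNullOrEmpty, which is a slightly stricter choice than the $_ -ne $null test above because it also drops empty strings.

```powershell
$d = 'sun', 'mon', $null

# Drop null or empty entries, then join the rest. -join only places the
# separator between elements, so no trailing comma is produced.
$weekDays = ($d | Where-Object { -not [string]::IsNullOrEmpty($_) }) -join ', '
$weekDays
```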
$a = "sun"
$b = "mon"
$c = $null
$d = $a,$b,$c
$weekDays = $null
Foreach ($i in $d)
{
if ($i)
{
if (-not ([string]::IsNullOrEmpty($weekDays)))
{
$weekDays += ","
}
$weekDays = $weekDays + $i
}
}
Write-Host "$weekDays"
# Output: sun,mon
Your variable is null at start and you want to prepend a comma in all subsequent cases, that is, when the variable is no longer null.

I would like to loop this array and Insert the Value Array by Array

$number = $args[0]
$myarray = &{$h = @{}; foreach ($a in 1..$number) { $h["array$a"] = ""; } return $h; }
Write-Output $myarray
echo ============================================
if ($myarray["array1"] -eq "") {
$myarray["array1"] = "C:\myscripts\Powershell_testing\FolderA\aa.txt"
Write-Output $myarray
}
elseif ($myarray["array2"] -eq "") {
$myarray["array2"] = "C:\myscripts\Powershell_testing\FolderB\bb.txt"
Write-Output $myarray
}
elseif ($myarray["array3"] -eq "") {
$myarray["array3"] = "C:\myscripts\Powershell_testing\FolderC\cc.txt"
Write-Output $myarray
}
else {
Write-Output "Array is file exist"
}
First of all, you are not dealing with arrays at all. Your code is apparently meant to fill a Hashtable with constructed file paths as values.
Why not use a simple for loop to construct this filepath and add to the hash at the same time?
Something like:
$number = 4
$myHash = @{}
for ($i = 1; $i -le $number; $i++) {
# construct a file path and add that as value to the hashtable
$file = "C:\myscripts\Powershell_testing\Folder{0}\{1}{1}.txt" -f [char](64 + $i), [char](96 + $i)
$myHash["value$i"] = $file
}
$myHash
Output:
Name Value
---- -----
value1 C:\myscripts\Powershell_testing\FolderA\aa.txt
value3 C:\myscripts\Powershell_testing\FolderC\cc.txt
value4 C:\myscripts\Powershell_testing\FolderD\dd.txt
value2 C:\myscripts\Powershell_testing\FolderB\bb.txt
Or if you want it ordered (which a Hashtable by definition is not):
$number = 4
$myHash = [ordered]@{}
for ($i = 1; $i -le $number; $i++) {
# construct a file path and add that as value to the hashtable
$file = "C:\myscripts\Powershell_testing\Folder{0}\{1}{1}.txt" -f [char](64 + $i), [char](96 + $i)
$myHash["value$i"] = $file
}
$myHash
Output:
Name Value
---- -----
value1 C:\myscripts\Powershell_testing\FolderA\aa.txt
value2 C:\myscripts\Powershell_testing\FolderB\bb.txt
value3 C:\myscripts\Powershell_testing\FolderC\cc.txt
value4 C:\myscripts\Powershell_testing\FolderD\dd.txt
[char](64 + $i) and [char](96 + $i) give you the characters you need. ASCII 65 --> A and ASCII 97 --> a, and so on.
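The character arithmetic can be checked on its own with a quick sketch that prints the upper- and lowercase letter for the first four indexes:

```powershell
# 64 + 1 = 65 is 'A' and 96 + 1 = 97 is 'a'; each further index moves one letter on.
$pairs = 1..4 | ForEach-Object { '{0}: {1} {2}' -f $_, [char](64 + $_), [char](96 + $_) }
$pairs
```

This only yields letters for indexes 1 through 26; beyond that the code points run into other characters.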

Convert Array of Numbers into a String of Ranges

I was asking myself how easily you could convert an array of numbers like 1,2,3,6,7,8,9,12,13,15 into a single string that "minimizes" the numbers, like "1-3,6-9,12-13,15".
I am probably overthinking it, because right now I don't know how I could achieve this easily.
My Attempt:
$newArray = ""
$array = 1,2,3,6,7,8,9,12,13,15
$before
Foreach($num in $array){
If(($num-1) -eq $before){
# Here Im probably overthinking it because I don't know how I should continue
}else{
$before = $num
$newArray += $num
}
}
This should work; the code is hopefully self-explanatory:
$array = @( 1,2,3,6,7,8,9,12,13,15 )
$result = "$($array[0])"
$last = $array[0]
for( $i = 1; $i -lt $array.Length; $i++ ) {
$current = $array[$i]
if( $current -eq $last + 1 ) {
if( !$result.EndsWith('-') ) {
$result += '-'
}
}
elseif( $result.EndsWith('-') ) {
$result += "$last,$current"
}
else {
$result += ",$current"
}
$last = $current
}
if( $result.EndsWith('-') ) {
$result += "$last"
}
$result = $result.Trim(',')
$result = '"' + $result.Replace(',', '","') +'"'
$result
I have a slightly different approach, but was a little too slow to answer. Here it is:
$newArray = ""
$array = 1,2,3,6,7,8,9,12,13,15
$i = 0
while($i -lt $array.Length)
{
$first = $array[$i]
$last = $array[$i]
# while the next number is the successor increment last
while (($i + 1 -lt $array.Length) -and ($array[$i] + 1 -eq $array[$i + 1]))
{
$last = $array[++$i]
}
# if only one in the interval, output that
if ($first -eq $last)
{
$newArray += $first
}
else
{
# else output first and last
$newArray += "$first-$last"
}
# don't add the final comma
if ($i -ne $array.Length-1)
{
$newArray += ","
}
$i++
}
$newArray
Here is another approach to the problem. Firstly, you can group elements by index into a hashtable, using index - element as the key. Secondly, you need to sort the dictionary by key then collect the range strings split by "-" in an array. Finally, you can simply join this array by "," and output the result.
$array = 1, 2, 3, 6, 7, 8, 9, 12, 13, 15
$ranges = @{ }
for ($i = 0; $i -lt $array.Length; $i++) {
$key = $i - $array[$i]
if (-not ($ranges.ContainsKey($key))) {
$ranges[$key] = @()
}
$ranges[$key] += $array[$i]
}
$sequences = @()
$ranges.GetEnumerator() | Sort-Object -Property Key -Descending | ForEach-Object {
$sequence = $_.Value
$start = $sequence[0]
if ($sequence.Length -gt 1) {
$end = $sequence[-1]
$sequences += "$start-$end"
}
else {
$sequences += $start
}
}
Write-Output ($sequences -join ",")
Output:
1-3,6-9,12-13,15
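To see why index - element works as a grouping key, it helps to print the key for each element: within a run of consecutive numbers the index and the value both grow by 1, so their difference stays constant, and it drops whenever there is a gap. A small sketch, assuming the input is sorted in ascending order without duplicates:

```powershell
$array = 1, 2, 3, 6, 7, 8, 9, 12, 13, 15

# One key per element; equal keys mark members of the same consecutive run.
$keys = for ($i = 0; $i -lt $array.Length; $i++) { $i - $array[$i] }
$keys -join ','
# -1,-1,-1,-3,-3,-3,-3,-5,-5,-6
```

Because the key only ever decreases from one run to the next, sorting the keys in descending order restores the original order of the ranges.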

Replace each occurrence of string in a file dynamically

I have a text file which contains some occurrences of the string "bad". I want to replace each occurrence of "bad" with good1, good2, good3, ..., good100 and so on.
I am trying this, but it replaces all occurrences with the last number, good100:
$raw = $(gc raw.txt)
for($i = 0; $i -le 100; $i++)
{
$raw | %{$_ -replace "bad", "good$($i)" } > output.txt
}
How to accomplish this?
Try this:
$i = 1
$raw = $(gc raw.txt)
$new = $raw.split(" ") | % { $_ -replace "bad" , "good$i" ; if ($_ -eq "bad" ) {$i++} }
$new -join " " | out-file output.txt
This works if raw.txt is a single line and the word "bad" is always separated by one space " ", like this: alfa bad beta bad gamma bad (and so on...)
Edit after comment:
for multiline txt:
$i = 1
$new = @()
$raw = $(gc raw.txt)
for( $c = 0 ; $c -lt $raw.length ; $c++ )
{
$l = $raw[$c].split(" ") | % { $_ -replace "bad" , "good$i" ; if ($_ -eq "bad" ) {$i++} }
$l = $l -join " "
$new += $l
}
$new | out-file output.txt
For such things, I generally use the Regex::Replace overload that takes a MatchEvaluator:
$evaluator ={
$count++
"good$count"
}
gc raw.txt | %{ [Regex]::Replace($_,"bad",$evaluator) }
The evaluator also gets the matched groups as argument, so you can do some advanced replaces with it.
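A hedged variant of the evaluator that declares the match argument explicitly and keeps its counter in a hashtable: this is a sketch, on the assumption that a script block invoked as a delegate may run in its own child scope, where a plain $count++ might not persist between calls. The names $state and $m are illustrative only.

```powershell
$state = @{ N = 0 }    # counter holder; the hashtable object is mutated in place

$evaluator = {
    param($m)          # the Match object for the occurrence being replaced
    $state.N++
    "good$($state.N)"  # replacement text for this occurrence
}

$result = [regex]::Replace('bad x bad y bad', 'bad', $evaluator)
$result
```

Inside the script block, $m.Value holds the matched text and $m.Groups the capture groups, which is what makes more advanced replacements possible.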
Here's another way, replacing just one match at a time:
$raw = gc raw.txt | out-string
$occurrences=[regex]::matches($raw,'bad')
$regex = [regex]'bad'
for($i=1; $i -le $occurrences.count; $i++)
{
$raw = $regex.replace($raw,{"good$i"},1)
}
$raw