I have this code which works as it should:
Get-Content $path\$newName -Encoding OEM |ForEach-Object {$_ -replace '<Num:(\d{8,20})>$','$1'}| Set-Content $path\$txtName -Encoding UTF8
The string is replaced by the digits. But I would like to be able to use $1 outside the loop.
Like:
write-host $1
For example. But if I do this, nothing is output.
Any suggestions?
Thanks.
Based on VivekKumarSingh's script:
$InFile = '.\test.txt'
$OutFile= '.\test2.txt'
$RegEx = "<Num:(\d{8,20})>$"
$array = @()
Get-Content $InFile -Encoding OEM | ForEach-Object {
if ($_ -match $RegEx ){$array += $matches[1]}
$_ -replace $RegEx,"`$1"
} | Set-Content $OutFile -Encoding UTF8
$array
> gc .\test.txt
<Num:1234567890>
<Num:23456789101112>
> .\SO_50579315.ps1
1234567890
23456789101112
> gc .\test2.txt
1234567890
23456789101112
One way would be to collect the captured group in an array, like this:
$array = @()
Get-Content $path\$newName -Encoding OEM | ForEach-Object { if ($_ -match '<Num:(\d{8,20})>$') { $array += $Matches[1] }; $_ -replace '<Num:(\d{8,20})>$','$1' } | Set-Content $path\$txtName -Encoding UTF8
You can then access the captured values as $array[0], $array[1], $array[2], and so on.
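As for why Write-Host $1 prints nothing: inside the replacement string of -replace, '$1' is a regex substitution token, not a PowerShell variable, so no variable named $1 is ever set. Below is a minimal sketch of reading the capture through the automatic $Matches variable instead. The sample lines are made up for illustration and stand in for the file content.

```powershell
# Hypothetical sample input standing in for the lines of the file
$lines = '<Num:1234567890>', 'no number here', '<Num:23456789101112>'
$regex = '<Num:(\d{8,20})>$'

# -match against a single string fills the automatic $Matches hashtable;
# $Matches[1] holds the first capture group
$captured = @(foreach ($line in $lines) {
    if ($line -match $regex) { $Matches[1] }
})

# -replace applied to an array processes every element and
# leaves non-matching elements unchanged
$replaced = $lines -replace $regex, '$1'

$captured   # only the captured digits
$replaced   # all lines, with the tag replaced by its digits
```

The captured values stay available in $captured after the loop has finished.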
I'm trying to replace ALL accented letters and some strings in multiple files located in one folder. The string replacement works, but the accented-letter replacement does not.
I've multiple files located in "C:\\FilePath"
I've created a Batch file with the following code:
@echo off
Powershell.exe -executionpolicy remotesigned -File C:\Users\User\Desktop\IFCParser.ps1
pause
And IFCParser.ps1 contains all the following lines, one after the other:
Get-ChildItem -Path C:\FilePath\*.* -recurse | ForEach {If (Get-Content $_.FullName | Select-String -Pattern 'IFCBuilding') {(Get-Content $_ | ForEach {$_ -replace 'IFCBuilding', 'IFCBuildingElementProxy'}) | Set-Content $_ }}
Get-ChildItem -Path C:\FilePath\*.* -recurse | ForEach {If (Get-Content $_.FullName | Select-String -Pattern 'IFCAnotherWord') {(Get-Content $_ | ForEach {$_ -replace 'IFCAnotherWord', 'IFCBuildingElementProxy'}) | Set-Content $_ }}
The above code DOES the job when I run the bat file, but I can't get the following part to work:
Get-ChildItem -Path C:\FilePath\*.* -recurse | ForEach {If (Get-Content $_.FullName -Encoding UTF8 | Select-String 'á' -AllMatches) {(Get-Content $_ -Encoding UTF8 | ForEach {$_ -creplace 'á', 'a'}) | Set-Content $_ }}
Get-ChildItem -Path C:\FilePath\*.* -recurse | ForEach {If (Get-Content $_.FullName -Encoding UTF8 | Select-String 'é' -AllMatches) {(Get-Content $_ -Encoding UTF8 | ForEach {$_ -creplace 'é', 'e'}) | Set-Content $_ }}
Get-ChildItem -Path C:\FilePath\*.* -recurse | ForEach {If (Get-Content $_.FullName -Encoding UTF8 | Select-String 'í' -AllMatches) {(Get-Content $_ -Encoding UTF8 | ForEach {$_ -creplace 'í', 'i'}) | Set-Content $_ }}
Get-ChildItem -Path C:\FilePath\*.* -recurse | ForEach {If (Get-Content $_.FullName -Encoding UTF8 | Select-String 'ó' -AllMatches) {(Get-Content $_ -Encoding UTF8 | ForEach {$_ -creplace 'ó', 'o'}) | Set-Content $_ }}
Get-ChildItem -Path C:\FilePath\*.* -recurse | ForEach {If (Get-Content $_.FullName -Encoding UTF8 | Select-String 'ú' -AllMatches) {(Get-Content $_ -Encoding UTF8 | ForEach {$_ -creplace 'ú', 'u'}) | Set-Content $_ }}
Get-ChildItem -Path C:\FilePath\*.* -recurse | ForEach {If (Get-Content $_.FullName -Encoding UTF8 | Select-String 'Á' -AllMatches) {(Get-Content $_ -Encoding UTF8 | ForEach {$_ -creplace 'Á', 'A'}) | Set-Content $_ }}
Get-ChildItem -Path C:\FilePath\*.* -recurse | ForEach {If (Get-Content $_.FullName -Encoding UTF8 | Select-String 'É' -AllMatches) {(Get-Content $_ -Encoding UTF8 | ForEach {$_ -creplace 'É', 'E'}) | Set-Content $_ }}
Get-ChildItem -Path C:\FilePath\*.* -recurse | ForEach {If (Get-Content $_.FullName -Encoding UTF8 | Select-String 'Í' -AllMatches) {(Get-Content $_ -Encoding UTF8 | ForEach {$_ -creplace 'Í', 'I'}) | Set-Content $_ }}
Get-ChildItem -Path C:\FilePath\*.* -recurse | ForEach {If (Get-Content $_.FullName -Encoding UTF8 | Select-String 'Ó' -AllMatches) {(Get-Content $_ -Encoding UTF8 | ForEach {$_ -creplace 'Ó', 'O'}) | Set-Content $_ }}
Get-ChildItem -Path C:\FilePath\*.* -recurse | ForEach {If (Get-Content $_.FullName -Encoding UTF8 | Select-String 'Ú' -AllMatches) {(Get-Content $_ -Encoding UTF8 | ForEach {$_ -creplace 'Ú', 'U'}) | Set-Content $_ }}
I'm testing this on a file like this:
áéíóúÁÉÍÓÚÑñáéíóúÁ
ÉÍÓÚÑñáéíóúÁÉÍÓÚÑñá
éíóúÁÉÍÓÚÑñáéíóúÁÉÍÓÚÑñáéíó
úÁÉÍÓÚÑñáéíóúÁÉÍÓÚÑñ
And it stays the same, no accents removed.
I suspect something is wrong with the encoding. I have tried the -Encoding parameter on only the first Get-Content, on only the second one, and with no -Encoding at all.
By the way, I'm sure that there are more effective ways of doing this, but I'm just starting with this here and not finding one that works.
As for replacing the contents of the files in your folder, you should be able to do that using just one Get-ChildItem call.
Put this helper function on top of your script; it is used for replacing all the accented letters in the files:
function Replace-Diacritics {
Param(
[Parameter(Mandatory = $true, ValueFromPipeline = $true)]
[string] $Text
)
($Text.Normalize( [Text.NormalizationForm]::FormD ).ToCharArray() |
Where-Object {[Globalization.CharUnicodeInfo]::GetUnicodeCategory($_) -ne
[Globalization.UnicodeCategory]::NonSpacingMark }) -join ''
}
Now the rest of the code simplified:
Get-ChildItem -Path 'C:\FilePath\*.*' -File -Recurse | ForEach-Object {
$content = Get-Content -Path $_.FullName -Raw -Encoding UTF8 | Replace-Diacritics
$content -replace '\b(IFCBuilding|IFCAnotherWord)\b', 'IFCBuildingElementProxy' | Set-Content -Path $_.FullName -Encoding UTF8
}
Using your example file, the new content after calling `Replace-Diacritics` will be:
aeiouAEIOUNnaeiouA
EIOUNnaeiouAEIOUNna
eiouAEIOUNnaeiouAEIOUNnaeio
uAEIOUNnaeiouAEIOUNn
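To show what the helper does internally, here is a small sketch that builds 'á' from its code point, so the result does not depend on how the script file itself is encoded. That may matter here: one possible reason the literal 'á' patterns in the question never match (an assumption, not confirmed in this thread) is that Windows PowerShell reads a .ps1 file saved as UTF-8 without a BOM using the system ANSI code page, which garbles accented literals in the script.

```powershell
# 'á' as one precomposed character (U+00E1), built from its code point
$accented = [string][char]0x00E1

# FormD splits it into the base letter 'a' plus a combining acute accent
$decomposed = $accented.Normalize([Text.NormalizationForm]::FormD)
$codePoints = [int[]][char[]]$decomposed   # 97 (a) and 769 (U+0301)

# Dropping every NonSpacingMark character leaves only the base letter
$stripped = -join ($decomposed.ToCharArray() | Where-Object {
    [Globalization.CharUnicodeInfo]::GetUnicodeCategory($_) -ne
        [Globalization.UnicodeCategory]::NonSpacingMark
})

$decomposed.Length   # 2
$stripped            # a
```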
The -replace operator uses regex. The pattern '\b(IFCBuilding|IFCAnotherWord)\b' means: find the words 'IFCBuilding' OR 'IFCAnotherWord' as whole words (\b is a word boundary) and replace them with 'IFCBuildingElementProxy'.
If you also need this to be case-sensitive, use -creplace instead of -replace.
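A short sketch of how the word boundaries and the case-sensitive operator behave. The input strings are invented for illustration only.

```powershell
$pattern = '\b(IFCBuilding|IFCAnotherWord)\b'
$new     = 'IFCBuildingElementProxy'

# Whole word: replaced
$r1 = 'x=IFCBuilding(1)' -replace $pattern, $new

# Part of a longer word: there is no word boundary between 'g' and 'S',
# so the text is left alone
$r2 = 'IFCBuildingStorey' -replace $pattern, $new

# -replace ignores case, -creplace does not
$r3 = 'ifcbuilding' -replace  $pattern, $new
$r4 = 'ifcbuilding' -creplace $pattern, $new

$r1; $r2; $r3; $r4
```

A side effect of the \b anchors is that running the script a second time does not touch text that already reads 'IFCBuildingElementProxy', because the 'IFCBuilding' inside it is followed by another letter.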
For very large files, Get-Content may not be the cmdlet you want to use, as it reads the file into memory as a whole.
To handle those large files, using a combination of a StreamReader and a StreamWriter would be much more memory efficient (at the cost of more disk read/write actions).
Note that you cannot read a file and write to the same file simultaneously, so the code below will create a new name for the updated file by appending _New to the BaseName.
Again, start with this helper function on top:
function Replace-Diacritics {
Param(
[Parameter(Mandatory = $true, ValueFromPipeline = $true)]
[string] $Text
)
($Text.Normalize( [Text.NormalizationForm]::FormD ).ToCharArray() |
Where-Object {[Globalization.CharUnicodeInfo]::GetUnicodeCategory($_) -ne
[Globalization.UnicodeCategory]::NonSpacingMark }) -join ''
}
Get-ChildItem -Path 'C:\FilePath\*.*' -File -Recurse | ForEach-Object {
# create a StreamReader to read the file line-by-line
$reader = [System.IO.StreamReader]::new($_.FullName, [System.Text.Encoding]::UTF8)
# older PowerShell versions use:
# $reader = New-Object System.IO.StreamReader($_.FullName, [System.Text.Encoding]::UTF8)
# create a full path and filename for the updated output file
$outFile = Join-Path -Path $_.DirectoryName -ChildPath ('{0}_New{1}' -f $_.BaseName, $_.Extension)
# create a StreamWriter object to write the lines to the new output file
# The StreamWriter class by default writes files with UTF-8 encoding without a Byte-Order Mark (BOM)
$writer = [System.IO.StreamWriter]::new($outFile)
# loop through the lines of the file
while ($null -ne ($line = $reader.ReadLine())) {
if (![string]::IsNullOrWhiteSpace($line)) {
$line = ($line | Replace-Diacritics) -replace '\b(IFCBuilding|IFCAnotherWord)\b', 'IFCBuildingElementProxy'
}
$writer.WriteLine($line)
}
# clean up for next file
$writer.Flush()
$writer.Dispose()
$reader.Dispose()
}
Running a single line of code on a single file like this works as expected:
Get-ChildItem -Path C:\temp\testdata.txt | ForEach-Object {
If (Get-Content $_.FullName -Encoding UTF8 | Select-String 'á' -AllMatches) {
(Get-Content $_ -Encoding UTF8 | ForEach-Object { $_ -creplace 'á', 'a' }) | Set-Content $_ }
}
Given this, your code must be failing in the file recursion or in the execution process.
Run the script in an editor before trying to run as a batch and try adding error trapping. You can also add some logging to track down what's happening when running as batch:
Start-Transcript -Path 'c:\temp\outputlog.txt'
Try {
Get-ChildItem -Path C:\temp\testdata.txt -recurse -ErrorAction Stop | ForEach-Object {
Write-Host "Processing $_"
If (Get-Content $_.FullName -Encoding UTF8 -ErrorAction Stop | Select-String 'á' -AllMatches) {
Write-Host "Found match for á, replacing...."
(Get-Content $_ -Encoding UTF8 -ErrorAction Stop | ForEach-Object { $_ -creplace 'á', 'a' }) | Set-Content $_ -ErrorAction Stop }
}
}
Catch {
$_
}
Finally {
# runs whether or not an error occurred, so the transcript is stopped exactly once
Stop-Transcript
}
I have to process some text and have run into some difficulties.
The text in .\text.txt is formatted like this:
name,
surname,
address,
name.
surname,
address,
etc.
What I want to achieve is to join the lines that end with "," like this:
name,surname,address
name,surname,address
etc
I was working on something like this:
$content= path to the text.txt
$result= path to the result file
Get-Content -Encoding UTF8 $content | ForEach-object {
if ( $_ -match "," ) {
....join the selected lines....
}
} |Set-Content -Encoding UTF8 $result
I also need to consider that a line ending with "," may be followed by an empty line, which should become a CR in $result.
You can do this by splitting the blocks of data on the empty newlines first:
# read the content of the file as one single multiline string
$content = Get-Content -Path 'Path\To\The\file.txt' -Raw -Encoding UTF8
# split on two or more newlines and dispose of empty blocks
$content -split '(\r?\n){2,}' | Where-Object { $_ -match '\S' } | ForEach-Object {
# trim the text block, split on newline and remove the trailing commas (or dots)
# output these joined with a comma
($_.Trim() -split '\r?\n' ).TrimEnd(",.") -join ','
} | Set-Content -Path 'Path\To\The\NEW_file.txt' -Encoding UTF8
Output:
name,surname,address
name,surname,address
All your terms end with a comma, so you could use regex:
$content= "C:\test.txt"
$result= "path to the result file"
$CR = "`r`n"
$lines = Get-Content -Encoding UTF8 $content -raw
$option = [System.Text.RegularExpressions.RegexOptions]::Singleline
$lines = [regex]::new(',(?:\r?\n){2,}', $option).Replace($lines, $CR + $CR)
$lines = [regex]::new(',\r?\n', $option).Replace($lines, ",")
$lines | Out-File -FilePath $result -Encoding utf8
result:
name,surname,address
name1,surname,address
name,surname,address
name,surname,address
The piece of code below will give the required result.
$content= "Your file path"
$resultPath = "result file path"
Get-Content $content | foreach {
$data = $_
if($data -eq "address,")
{
$NewData = $data -replace ',',''
$data = $NewData + "`r`n"
}
$out = $out + $data
}
$out | Out-File $resultPath
At the moment I have this code to replace all the headers of my csv file.
$Csv = Import-Csv "$treatmentfolder\2_1_traitement.csv"
$OldColumnHeaders = "Avis,N° invent.,Cd.Srv.Cl.,NumOrdre"
$NewColumnHeaders = "avis","num_inventaire","cd_srv_cl","num_ordre"
$i=0
ForEach ($header in $OldColumnHeaders){
if ($header -ne $NewColumnHeaders[$i]){
$Csv |
Select-Object *,@{n=$NewColumnHeaders[$i]; e={$header} } -Exclude $header |
Export-Csv -NoTypeInformation "$treatmentfolder\2_1_2_traitement.csv"
(gc "$treatmentfolder\2_1_2_traitement.csv") |
% {$_ -replace '"', ""} |
out-file "$treatmentfolder\2_2_traitement.csv" -Fo -Encoding UTF8
$Csv= Import-Csv "$treatmentfolder\2_2_traitement.csv"
}
$i += 1
}
The problem is that I get an error saying that "avis" already exists as a column header, even though the two names differ in the case of the 'a'. How can I replace this header then?
There are 2 text files in the CWD, a.txt, b.txt. From a.txt, I would like to delete all lines whose first 5 characters are NOT present in b.txt as any lines' first 5 characters. (Or, stating otherwise, keep only those lines in a.txt, whose first 5 characters is present in b.txt as any lines' first 5 characters.) Content after the 5th character to the end of the line is irrelevant.
For example: a.txt
abcde000dsdsddsdsdsdsdsd
0123456xxx
kkk
xyzxyzxyzfeeeee
kkkkkkkkkkk
and b.txt:
012345aabbcc
kkkkkkkhhkkvv
nnnnnnn5777nnnn77567
Intended result (lines in a.txt whose 1-5 character is present in b.txt):
0123456xxx
kkkkkkkkkkk
When I run the code, it gives me an empty results.txt but no error messages. What am I missing?
$pattern = "^[5]"
$set1 = Get-Content -Path a.txt
$results = New-Object -TypeName System.Text.StringBuilder
Get-Content -Path b.txt | foreach {
if ($_ -match $pattern) {
[void]$results.AppendLine($_)
}
}
$results.ToString() | Out-File -FilePath .\results.txt -Encoding ascii
Your code doesn't work because your pattern doesn't match anything. The regular expression ^[5] means "the character '5' at the beginning of the string" (the square brackets define a character class), not "5 characters at the beginning of the string". The latter would be ^.{5}. Also, you never match the content of a.txt against the content of b.txt.
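The difference between the two patterns can be checked directly at the prompt. A minimal sketch with made-up test strings:

```powershell
# ^[5] only matches a string whose first character is the digit 5
$m1 = '5abc'  -match '^[5]'    # True
$m2 = 'abcde' -match '^[5]'    # False

# ^.{5} matches any string that has at least five characters
$m3 = 'abcdef' -match '^.{5}'  # True
$m4 = 'abc'    -match '^.{5}'  # False

# With a capture group, the first five characters end up in $Matches[1]
$first5 = if ('0123456xxx' -match '^(.{5})') { $Matches[1] }

$m1; $m2; $m3; $m4; $first5
```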
There are several ways to do what you want:
Extract the first 5 characters from each line of b.txt into an array and compare the lines of a.txt against that array. Esperento57's answer sort of uses this approach, but in a way that requires PowerShell v3 or newer. A variant that'll work on all PowerShell versions could look like this:
$pattern = '^(.{5}).*'
$ref = (Get-Content 'b.txt') -match $pattern -replace $pattern, '$1' |
Get-Unique
Get-Content 'a.txt' | Where-Object {
$ref -contains ($_ -replace $pattern, '$1')
} | Set-Content 'results.txt'
Since lookups in arrays are comparatively slow and don't scale well (they get significantly slower with increasing number of elements in the array) you could also put the reference values in a hashtable so you can do index lookups (which are significantly faster):
$pattern = '^(.{5}).*'
$ref = @{}
(Get-Content 'b.txt') -match $pattern -replace $pattern, '$1' |
ForEach-Object { $ref[$_] = $true }
Get-Content 'a.txt' | Where-Object {
$ref.ContainsKey(($_ -replace $pattern, '$1'))
} | Set-Content 'results.txt'
Another alternative would be to build a second regular expression from the substrings extracted from b.txt and compare the content of a.txt against that expression:
$pattern = '^(.{5}).*'
$list = (Get-Content 'b.txt') -match $pattern -replace $pattern, '$1' |
Get-Unique |
ForEach-Object { [regex]::Escape($_) }
$ref = '^({0})' -f ($list -join '|')
(Get-Content 'a.txt') -match $ref | Set-Content 'results.txt'
Note that each of these approaches will ignore lines shorter than 5 characters.
Try something like this:
$listB=get-content "c:\temp\b.txt" | where {$_.Length -gt 4} | select @{N="First5";E={$_.Substring(0, 5)}}
get-content "c:\temp\a.txt" | where {$_.Length -gt 4 -and $_.Substring(0, 5) -in $listB.First5}
If performance is a concern, consider using hashtables as an index:
$Pattern = '^(.{5}).*'
$a = @{}; $b = @{}
Get-Content -Path a.txt | Where {$_ -Match $Pattern} | ForEach {$a[$Matches[1]] = @($a[$Matches[1]] + $_)}
Get-Content -Path b.txt | Where {$_ -Match $Pattern} | ForEach {$b[$Matches[1]] = @($b[$Matches[1]] + $_)}
$a.Keys | Where {$b.Keys -Contains $_} | ForEach {$a.$_} | Set-Content results.txt
How can I avoid getting a blank line at the end of an Out-File?
$DirSearcher = New-Object System.DirectoryServices.DirectorySearcher([adsi]'')
$DirSearcher.Filter = '(&(objectClass=Computer)(!(cn=*esx*)) (!(cn=*slng*)) (!(cn=*dcen*)) )'
$DirSearcher.FindAll().GetEnumerator() | sort-object { $_.Properties.name } `
| ForEach-Object { $_.Properties.name }`
| Out-File -FilePath C:\Computers.txt
I have tried several options and none of them seem to do anything, they all still have a blank line at the end.
(get-content C:\Computers.txt) | where {$_ -ne ""} | out-file C:\Computers.txt
$file = 'C:\Computers.txt'
Get-Content $file | where {$_.Length -ne 0} | Out-File "$file`.tmp"
Move-Item "$file`.tmp" $file -Force
Use [IO.File]::WriteAllText:
[IO.File]::WriteAllText("$file`.tmp",
((Get-Content $file) -ne '' -join "`r`n"),
[Text.Encoding]::UTF8)
Often when you're looking to see if strings have no character data, you will want to use String.IsNullOrWhiteSpace():
Get-Content $file | Where-Object { ![String]::IsNullOrWhiteSpace($_) } | Out-File "$file`.tmp"
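To illustrate the difference between the two filters, and why joining the lines yourself avoids the trailing newline, here is a small in-memory sketch. The computer names are invented and nothing is written to disk.

```powershell
# Hypothetical lines, including an empty one and a whitespace-only one
$lines = 'PC01', '', '   ', 'PC02'

# Comparing against '' removes only the truly empty string
$notEmpty = @($lines -ne '')

# IsNullOrWhiteSpace also removes the whitespace-only line
$notBlank = @($lines | Where-Object { -not [string]::IsNullOrWhiteSpace($_) })

# -join puts CRLF between the lines only, so nothing follows the last one
$text = $notBlank -join "`r`n"

$notEmpty.Count          # 3
$notBlank.Count          # 2
$text.EndsWith("`n")     # False
```

Passing a string built this way to [IO.File]::WriteAllText writes it as is, whereas Out-File terminates every line it writes, including the last one.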
One way to avoid the empty line at the end of a text file in PowerShell is to write the last line with -NoNewline:
Add-Content C:\Users\e5584332\Desktop\CSS.txt "Footer | Processed unique data || $count " -NoNewline