Keep Same Encoding With Set-Content Multiple Files in PowerShell

I'm attempting to write a script to be used to migrate an application from server to server and/or from one drive letter to another drive letter. My goal is to copy the directory from one location, move it to another, and then run a script to edit all instances of the old hostname, IP address, and drive letter to reflect the new hostname, IP address, and drive letter on the new server. This appears to do exactly that:
ForEach($File in (Get-ChildItem $path\* -Include *.xml,*.config -Recurse)){
(Get-Content $File.FullName -Raw) -replace [RegEx]::Escape($oldhost),$newhost `
-replace [RegEx]::Escape($oldip),$newip `
-replace "$olddriveletter(?=:\\Application)",$newDriveLetter |
Set-Content $File.FullName -NoNewLine
}
The one problem I am having is that the files all have different types of encoding. Some ANSI, some UTF-8, some Unicode, etc. When I run the script, it saves everything as ANSI and then my application fails to work. I know how to add the encoding parameter, but is there any way to keep the same encoding on each individual file, without writing out a script specifying each individual file in the directory and the encoding that each individual file has?

That would be difficult. Unfortunately, Get-Content doesn't expose the encoding it detected as a property. Here's a script that reports the encoding when the file has a signature (BOM). You could run it first and check all the files, but note that some Windows files are Unicode without a BOM. XML files, at least, can declare their encoding, which you can list with:
get-childitem *.xml | select-string encoding
There might be a better way to load XML files; see the bottom answer to: Powershell: Setting Encoding for Get-Content Pipeline
# encoding.ps1
# https://stackoverflow.com/questions/3825390/effective-way-to-find-any-files-encoding
param([Parameter(ValueFromPipeline=$True)] $filename)
process {
$reader = [IO.StreamReader]::new($filename, [Text.Encoding]::default,$true)
$peek = $reader.Peek()
$encoding = $reader.currentencoding
$reader.close()
[pscustomobject]@{Name=split-path $filename -leaf
BodyName=$encoding.BodyName
EncodingName=$encoding.EncodingName}
}
# end encoding.ps1
PS C:\users\me> get-childitem chinese16.txt | encoding
Name BodyName EncodingName
---- -------- ------------
chinese16.txt utf-16 Unicode
Something like this will use the encoding indicated in the xml file, even if it didn't truly match beforehand. (This also makes the xml pretty.)
PS C:\users\me> [xml]$xml = get-content file.xml
PS C:\users\me> $xml.save("$pwd\file.xml")
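As a rough sketch of how the BOM detection above could be combined with the replacement loop from the question: read each file through a StreamReader with BOM detection switched on, remember its CurrentEncoding, and write the result back with that same encoding. The function name Edit-FilePreservingEncoding and its parameters are hypothetical. Files without a BOM simply fall back to whatever -FallbackEncoding says, so this does not solve the "Unicode, no BOM" case.

```powershell
# Hypothetical helper: rewrite a file in place, reusing the encoding detected from its BOM.
function Edit-FilePreservingEncoding {
    param(
        [Parameter(Mandatory)] [string] $Path,
        # Script block that receives the file text and returns the edited text.
        [Parameter(Mandatory)] [scriptblock] $Transform,
        # Assumption: used only when the file has no BOM.
        [Text.Encoding] $FallbackEncoding = [Text.Encoding]::Default
    )
    # .NET needs a full path; its working directory can differ from PowerShell's.
    $fullPath = Convert-Path -LiteralPath $Path
    $reader = [IO.StreamReader]::new($fullPath, $FallbackEncoding, $true)
    try {
        $text = $reader.ReadToEnd()
        # Only meaningful after the first read, when the BOM has been examined.
        $encoding = $reader.CurrentEncoding
    }
    finally {
        $reader.Dispose()
    }
    $newText = & $Transform $text
    [IO.File]::WriteAllText($fullPath, [string]$newText, $encoding)
}

# Possible usage with the variables from the question:
# Get-ChildItem $path\* -Include *.xml,*.config -Recurse | ForEach-Object {
#     Edit-FilePreservingEncoding -Path $_.FullName -Transform {
#         param($t) $t -replace [RegEx]::Escape($oldhost), $newhost
#     }
# }
```

Because the text never passes through Set-Content, no trailing newline is added and no -Encoding parameter has to be chosen per file.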

Use the file.exe from the git binaries to find out the encoding.
Then, add the encoding parameter to the set-content line with if else statements to meet your needs.
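ForEach($File in (Get-ChildItem $path\*)){
$FullName = $File.FullName
# Parentheses are required so that -replace applies to the file content
$Content = (Get-Content $FullName -Raw) -replace [RegEx]::Escape($oldhost),$newhost `
-replace [RegEx]::Escape($oldip),$newip `
-replace "$olddriveletter(?=:\\Application)",$newDriveLetter
# -b (brief) prints only the encoding name, without the file name prefix
$Encoding = file -b --mime-encoding $FullName
Write-Host "$FullName - $Encoding"
if($Encoding -like "utf-8*"){
Set-Content -Path $FullName -Value $Content -NoNewLine -Encoding UTF8
}
else {
# Add further branches here (e.g. utf-16le -> Unicode) to meet your needs
Set-Content -Path $FullName -Value $Content -NoNewLine
}
}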
Reference:
https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.management/set-content
http://gnuwin32.sourceforge.net/packages/file.htm
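To avoid a long if/else chain, the names that file --mime-encoding prints could be mapped to Set-Content -Encoding values in one place. This is only a sketch: the function name ConvertTo-SetContentEncoding is hypothetical, the list of names is not exhaustive, and treating everything unrecognised as Default is an assumption. Note also that file's "utf-8" does not say whether the file had a BOM, and whether Set-Content -Encoding UTF8 writes one depends on the PowerShell version.

```powershell
# Hypothetical helper: map an encoding name as printed by `file -b --mime-encoding`
# to a value accepted by Set-Content -Encoding.
function ConvertTo-SetContentEncoding {
    param([Parameter(Mandatory)] [string] $MimeEncoding)
    switch -Wildcard ($MimeEncoding.Trim().ToLowerInvariant()) {
        'utf-8*'    { 'UTF8'; break }
        'utf-16le*' { 'Unicode'; break }
        'utf-16be*' { 'BigEndianUnicode'; break }
        'us-ascii*' { 'ASCII'; break }
        # Assumption: anything else (iso-8859-1, unknown-8bit, ...) is treated as ANSI.
        default     { 'Default' }
    }
}

# Possible usage inside the loop above:
# Set-Content -Path $FullName -Value $Content -NoNewline -Encoding (ConvertTo-SetContentEncoding $Encoding)
```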

Related

Powershell command to identify all the Windows type and UNIX type text file

Git is messing up a few files: it saves the Unix-based files with LF line endings on the system.
A few of the Windows files are also saved with CR LF line endings.
I need to differentiate between Unix-based and Windows-based files.
I was able to write the code below for a single text file. It returned True, as it is a Windows file.
PS C:\Desktop\SecretSauce> (GET-CONTENT 'HIDEME.TXT' -raw) -match "\r\n$"
Question:
There are thousands of files in different formats (txt, cpp, hpp, sql), with both LF and CR LF line endings, in the same location.
I need output that lists the path of each file, the file name with extension, and True (if it is CR LF) or False (if it is LF).
When I execute this command to check multiple files, it does not return any result.
(Get-Content -Path 'C:\Desktop\SecretSauce\*.*' -raw) -match "\r\n$"
What is the best approach for this using PowerShell?
It's an expensive approach, because each file is read in full, but the following should do what you want:
Get-ChildItem -File -Path H:\Desktop\Parent_Folder\Sub-Folder2\*.* |
ForEach-Object {
[pscustomobject] @{
HasCRLF = (Get-Content -Raw -LiteralPath $_.FullName) -match '\r\n'
Name = $_.Name
FullName = $_.FullName
}
}
You'll see output such as the following:
HasCRLF Name FullName
------- ---- --------
False foo.txt H:\Desktop\Parent_Folder\Sub-Folder2\foo.txt
True bar.txt H:\Desktop\Parent_Folder\Sub-Folder2\bar.txt
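Since HasCRLF is True as soon as a single CR LF appears anywhere, a file with mixed line endings is reported the same way as a pure Windows file. If that distinction matters, counting both kinds could look like the sketch below. The function name Get-LineEndingStyle and its four result strings are made up for illustration.

```powershell
# Hypothetical helper: classify the line endings found in a string.
function Get-LineEndingStyle {
    param([Parameter(Mandatory)] [AllowEmptyString()] [string] $Text)
    # CR LF pairs
    $crlf = [regex]::Matches($Text, '\r\n').Count
    # LF characters that are not preceded by CR
    $bareLf = [regex]::Matches($Text, '(?<!\r)\n').Count
    if ($crlf -gt 0 -and $bareLf -gt 0) { 'Mixed' }
    elseif ($crlf -gt 0) { 'CRLF' }
    elseif ($bareLf -gt 0) { 'LF' }
    else { 'None' }
}

# Possible usage in place of the -match test:
# LineEndings = Get-LineEndingStyle (Get-Content -Raw -LiteralPath $_.FullName)
```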

Powershell Get-ChildItem: filtering csv files and -Recurse not working

I created a short PowerShell script to convert CSV files from Unicode to UTF-8 encoding. The script outputs new files with the original file name preceded by UTF8. I'm running into two issues:
I'm trying to run the PowerShell script only on CSV files. Currently the script runs on every file in the directory, including the PowerShell script itself (for example, it outputs a new file called UTF8pshell_script if the script is called pshell_script). The other methods I've tried for restricting the script to CSV files just end up making the script do nothing.
I'm trying to run the script on subdirectories. The first issue is that output files created from CSV files in subdirectories are completely empty. If the script is run in the same directory as the CSV file, this problem does not arise. This is not crucial, but I am also unsure how to get output files created from files in subdirectories written to those same subdirectories (currently they are written to the main directory where the PowerShell script is).
Get-Content -Encoding Unicode $_ | Out-File -Encoding UTF8
Get-ChildItem -Recurse | ForEach-Object {Get-Content -Encoding Unicode $_ | Out-File -Encoding UTF8 "UTF8$_"}
The desired output is the PowerShell script running only on CSV files and writing the output files to the same subdirectories as the files they were created from.
Get-ChildItem takes a -Filter parameter, which for files is the simple wildcard pattern. This will allow you to restrict your cmdlet to CSV files only:
Get-ChildItem -Filter *.csv
To process subdirectories, you may also use the -Recurse switch
Get-ChildItem -Filter *.csv -Recurse
Now, I'm never quite sure how $_ changes as you pass different objects through the pipe, so I'm probably not doing the next steps the most efficient way - but it will be clear what I'm trying to do:
Each file object that we find needs to be processed as follows:
Dissect it into a path and a filename: $filepath = $_.PSParentPath; $filename = $_.PSChildName
Load up the CSV: Import-CSV -Path $_.FullName
Output the new CSV with the proper encoding: Export-CSV -Path ("{0}\UTF8{1}" -f $filepath,$filename) -Encoding UTF8
So, we put it all together:
Get-ChildItem -Filter *.csv -Recurse -exclude UTF8* | ForEach-Object {
$filepath = $_.PSParentPath
$filename = $_.PSChildName
Import-CSV -Path $_.FullName |
Export-CSV -Encoding UTF8 -Path ("{0}\UTF8{1}" -f $filepath,$filename) -NoTypeInformation
}
The -Exclude UTF8* in the Get-ChildItem ensures that when you create a file, it doesn't get picked up later and re-processed. The -NoTypeInformation on the Export-CSV compensates for a stupidity built in to the cmdlet that causes an extra line with a meaningless object type name at the beginning of the file.
Depending on the original encoding (and presence of a BOM) you might have to specify an encoding also on the input side.
ForEach($Csv in (Get-ChildItem -Filter *.csv -Recurse -Exclude UTF8*)){
(Get-Content $Csv.FullName -raw) |
Set-Content -Path {Join-Path $Csv.Directory ("UTF8"+$Csv.Name)} -Encoding UTF8
}
LotPings beat me to this by 10 minutes with a virtually identical answer, but I'm leaving this for the 'passing an empty file to the pipeline' bit that I have. I also realize after the fact that you don't need a pipeline variable for that same reason, as you only need it if you pass things through the pipeline within the loop.
If all you want to do is change the encoding, I would use a ForEach($x in $y){} loop, or a ForEach-Object{} loop with a PipelineVariable on the Get-ChildItem. I'll show the latter, since I think pipeline variables are underused. I would also not read the file and pipe it to something: if the file is empty, nothing is passed down the pipeline and no new file is created.
Get-ChildItem *.csv -Recurse -PipelineVariable File | ForEach-Object{
Set-Content -Value (Get-Content $File.FullName -Encoding Unicode) -Path (Join-Path $File.Directory "UTF8$($File.Name)") -Encoding UTF8
}
If you specify the file extension at the end of the Get-ChildItem path, it will get only the files with the .csv extension.
By specifying the file path in Out-File, the output is sent to the specified directory.
Get-ChildItem -Path C:\folder\*.csv -Recurse | ForEach-Object {Get-Content -Encoding Unicode $_.FullName | Out-File -FilePath "C:\Folder\UTF8$($_.Name)" -Encoding UTF8}

Out of memory exception on [System.IO.File]::ReadAllText with large CSV

I have a simple PowerShell script that replaces "false" or "true" with "0" or "1":
$InputFolder = $args[0];
if($InputFolder.Length -lt 3)
{
Write-Host "Enter a path name as your first argument" -foregroundcolor Red
return
}
if(-not (Test-Path $InputFolder)) {
Write-Host "File path does not appear to be valid" -foregroundcolor Red
return
}
Get-ChildItem $InputFolder
$content = [System.IO.File]::ReadAllText($InputFolder).Replace("`"false`"", "`"0`"").Replace("`"true`"", "`"1`"").Replace("`"FALSE`"", "`"0`"").Replace("`"TRUE`"", "`"1`"")
[System.IO.File]::WriteAllText($InputFolder, $content)
[GC]::Collect()
This works fine for almost all files I have to amend, with the exception of one 808MB CSV.
I have no idea how many lines are in this CSV, as nothing I have will open it properly.
Interestingly, the PowerShell script will complete successfully when invoked manually via either PowerShell directly or via command prompt.
When this is launched as part of the SSIS package it's required for, that's when the error happens.
Sample data for the file:
"RowIdentifier","DateProfileCreated","IdProfileCreatedBy","IDStaffMemberProfileRole","StaffRole","DateEmploymentStart","DateEmploymentEnd","PPAID","GPLocalCode","IDStaffMember","IDOrganisation","GmpID","RemovedData"
"134","09/07/1999 00:00","-1","98","GP Partner","09/07/1999 00:00","14/08/2009 15:29","341159","BRA 871","141","B83067","G3411591","0"
Error message thrown:
I'm not tied to PowerShell - I'm open to other options. I previously had a cribbed-together C# script, but that died on smaller files than this - I'm no C# developer, so I was unable to debug it at all.
Any suggestions or help gratefully received.
Generally, avoid reading large files all at once, as you can run out of memory, as you've experienced.
Instead, process text-based files line by line - both reading and writing.
While PowerShell generally excels at line-by-line (object-by-object) processing, it is slow with files that have many lines.
Using the .NET Framework directly - while more complex - offers much better performance.
If you process the input file line by line, you cannot directly write back to it and must instead write to a temporary output file, which you can replace the input file with on success.
Here's a solution that uses .NET types directly for performance reasons:
# Be sure to use a *full* path, because .NET typically doesn't have the same working dir. as PS.
$inFile = Convert-Path $Args[0]
$tmpOutFile = [io.path]::GetTempFileName()
$tmpOutFileWriter = [IO.File]::CreateText($tmpOutFile)
foreach ($line in [IO.File]::ReadLines($inFile)) {
$tmpOutFileWriter.WriteLine(
$line.Replace('"false"', '"0"').Replace('"true"', '"1"').Replace('"FALSE"', '"0"').Replace('"TRUE"', '"1"')
)
}
$tmpOutFileWriter.Dispose()
# Replace the input file with the temporary file.
# !! BE SURE TO MAKE A BACKUP COPY FIRST.
# -WhatIf *previews* the move operation; remove it to perform the actual move.
Move-Item -Force -LiteralPath $tmpOutFile $inFile -WhatIf
Note:
UTF-8 encoding is assumed, and the rewritten file will not have a BOM. You can change this by specifying the desired encoding to the .NET methods.
As an aside: Your chain of .Replace() calls on each input line can be simplified as follows, using PowerShell's -replace operator, which is case-insensitive, so only 2 replacements are needed:
$line -replace '"false"', '"0"' -replace '"true"', '"1"'
However, while that is shorter to write, it is actually slower than the .Replace() call chain, presumably because -replace is regex-based, which incurs extra processing.
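To illustrate the note about specifying the encoding to the .NET methods: both [IO.File]::ReadLines and the StreamWriter constructor have overloads that take an encoding. The sketch below wraps the same line-by-line loop in a function so that one encoding is used for both reading and writing. The function name Convert-QuotedBooleans and the UTF-16 LE default are assumptions for illustration only, not something the question states about the file.

```powershell
# Hypothetical variant of the loop above with an explicit encoding on both sides.
function Convert-QuotedBooleans {
    param(
        # Both paths should be full paths (see the Convert-Path remark above).
        [Parameter(Mandatory)] [string] $InFile,
        [Parameter(Mandatory)] [string] $OutFile,
        # Assumption: UTF-16 LE; pass whatever encoding the input really uses.
        [Text.Encoding] $Encoding = [Text.Encoding]::Unicode
    )
    # $false = overwrite rather than append
    $writer = [IO.StreamWriter]::new($OutFile, $false, $Encoding)
    try {
        foreach ($line in [IO.File]::ReadLines($InFile, $Encoding)) {
            # -replace is case-insensitive, so two replacements cover all four spellings
            $writer.WriteLine(($line -replace '"false"', '"0"' -replace '"true"', '"1"'))
        }
    }
    finally {
        # Dispose even if a line fails, so the output file is not left locked
        $writer.Dispose()
    }
}
```

The try/finally is a design choice rather than a requirement; it only ensures the temporary file is released if the loop throws partway through.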
You could read the file in batches of lines with Get-Content -ReadCount, write them to a temp file with Out-File, then delete the old file and use Rename-Item to give the temp file the old file's name.
A few small things would still need fixing: this adds a new empty line at the end of the file, and it changes the encoding. You could try to get the current file's encoding and pass it to Out-File -Encoding.
function Replace-LargeFilesInFolder(){
Param(
[string]$DirectoryPath,
[string]$OldString,
[string]$NewString,
[string]$TempExtention = "temp",
[int]$LinesPerRead = 500
)
Get-ChildItem $DirectoryPath -File | %{
$File = $_
Get-Content $_.FullName -ReadCount $LinesPerRead |
%{
$_ -replace $OldString, $NewString |
out-file "$($File.FullName).$($TempExtention)" -Append
}
Remove-Item $File.FullName
Rename-Item "$($File.FullName).$($TempExtention)" -NewName $File.Name
}
}
Replace-LargeFilesInFolder -DirectoryPath C:\TEST -LinesPerRead 1 -OldString "a" -NewString "5"

Hebrew characters can't be used when renaming files

I'm trying to use the following code:
ls *.xml | Foreach {$i=1} {
$nonParsedXML = $_
[xml]$parsedXML = Get-Content $nonParsedXML -Encoding utf8
$title = $parsedXML.title
$nonParsedXMLwithExtension = "$($title).xml"
Rename-Item $nonParsedXML -NewName $nonParsedXMLwithExtension
}
The code tries to rename a file, and the new name is the content of a tag within the files. It works when the content of the tag is in English, but it doesn't work correctly when the content is in Hebrew - The code is renaming the file, but instead of Hebrew characters I see block characters.
In case you wonder, the problem occurs when I'm using PowerShell ISE.
When I'm using PowerShell in the command prompt, the code can't be run because it can't handle the Hebrew characters at all and produces errors.
Could you please clarify this issue? Is there a solution?
It seems to be a problem with the default file charset; try changing it to an extended charset.
http://answers.microsoft.com/en-us/windows/forum/windows_7-windows_programs/default-utf-8-encoding-for-new-notepad-documents/525f0ae7-121e-4eac-a6c2-cfe6b498712c
I hope that this article will help you.
A very nice and helpful guy gave me the solution. Here it is:
dir -Path C:\Temp\Test -Filter *.xml | ForEach-Object {
$xml = [xml](Get-Content -Path $_.FullName -Encoding UTF8)
Rename-Item -Path $_.FullName -NewName "$($xml.title).xml"
}

Using PowerShell, read multiple known file names, append text of all files, create and write to one output file

I have five .sql files and know the name of each file. For this example, call them one.sql, two.sql, three.sql, four.sql and five.sql. I want to append the text of all files and create one file called master.sql. How do I do this in PowerShell? Feel free to post multiple answers to this problem because I am sure there are several ways to do this.
My attempt does not work and creates a file with several hundred thousand lines.
PS C:\sql> get-content '.\one.sql' | get-content '.\two.sql' | get-content '.\three.sql' | get-content '.\four.sql' | get-content '.\five.sql' | out-file -encoding UNICODE master.sql
Get-Content one.sql,two.sql,three.sql,four.sql,five.sql > master.sql
Note that in Windows PowerShell > is equivalent to Out-File -Encoding Unicode. I only tend to use Out-File when I need to specify a different encoding.
There are some good answers here but if you have a whole lot of files and maybe you don't know all of the names this is what I came up with:
$vara = get-childitem -name "path"
$varb = foreach ($a in $vara) {gc "path\$a"}
example
$vara = get-childitem -name "c:\users\test"
$varb = foreach ($a in $vara) {gc "c:\users\test\$a"}
You can obviously pipe this directly into | add-content or whatever but I like to capture in variables so I can manipulate later on.
See if this works better
get-childitem "one.sql","two.sql","three.sql","four.sql","five.sql" | get-content | out-file -encoding UNICODE master.sql
I needed something similar, Chris Berry's post helped, but I think this is more efficient:
gci -name "*PathToFiles*" | gc > master.sql
The first part gci -name "*PathToFiles*" gets you your file list. This can be done with wildcards to just get your .sql files i.e. gci -name "\\share\folder\*.sql"
It then pipes to Get-Content and redirects the output to your master.sql file. As noted by Keith Hill, you can use Out-File in place of > to better control your output if needed.
I think logical way of solving this is to use Add-Content
$files = Get-ChildItem '.\one.sql', '.\two.sql', '.\three.sql', '.\four.sql', '.\five.sql'
$files | foreach { Get-Content $_ | Add-Content '.\master.sql' -encoding UNICODE }
However, Get-Content is usually very slow when reading multiple very large files. If that's your case, this article could help: http://keithhill.spaces.live.com/blog/cns!5A8D2641E0963A97!756.entry
What about:
Get-Content .\one.sql,.\two.sql,.\three.sql,.\four.sql,.\five.sql | Set-Content .\master.sql
Here is how I concatenate the sql files from the Sql folder:
# Set the current location of the script to use relative path
Set-Location $PSScriptRoot
# Concatenate all the sql files
$concatSql = Get-Content -Path .\Sql\*.sql
# Write/overwrite sql to single file
Add-Content -Path concatFile.sql -Value $concatSql
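The answers above mostly differ in which encoding the combined file ends up with (>, Out-File, Set-Content and Add-Content each have their own default). A small wrapper that always states the output encoding could look like this sketch. The name Join-TextFiles and the UTF8 default are assumptions, not something any of the answers prescribe.

```powershell
# Hypothetical helper: concatenate text files into one file with an explicit encoding.
function Join-TextFiles {
    param(
        [Parameter(Mandatory)] [string[]] $Path,
        [Parameter(Mandatory)] [string] $Destination,
        # Assumption: UTF8; use 'Unicode' to match what > produces in Windows PowerShell.
        [string] $Encoding = 'UTF8'
    )
    # Get-Content accepts several paths and emits their lines in the order given.
    Get-Content -LiteralPath $Path |
        Set-Content -LiteralPath $Destination -Encoding $Encoding
}

# Possible usage:
# Join-TextFiles -Path one.sql,two.sql,three.sql,four.sql,five.sql -Destination master.sql
```

Set-Content overwrites the destination, so running it twice does not double the content the way Add-Content would.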