Speed up Out-File across multiple items - powershell

I'm new to PowerShell, and I'm trying to work through a folder of multiple files. The files can randomly be UTF-8, UTF-8 with BOM, UTF-16 LE, UTF-16 BE, anything. I need all of them to be UTF-8 so they can be ingested by the ETL software our company uses. I tried numerous methods, and the one below is the only one I got working consistently for any encoding.
This works almost instantly for one file, but when applying the foreach it takes roughly a minute and a half for three items. I'm going to run this daily across thousands of files, so this is unreasonable.
Any help would be appreciated.
```
# Primary Variables
$Path = ".\Test\"
$Filename = "DriveThru_FileList.txt"
$Files = Get-ChildItem $Path\*.txt -Exclude ($Filename)
Foreach ($File in $Files){
    $Content = Get-Content $File
    # Note: in Windows PowerShell 5.1, "Default" is the system ANSI code page;
    # in PowerShell 7+, it is UTF-8 without a BOM
    $Content | Out-File -FilePath $File -Encoding "Default"
}
```

I think you can speed this up by using .NET methods:
$Path = ".\Test\*"
$Filename = "DriveThru_FileList.txt"
# just store the files' FullName in a string array
# when using the -Exclude parameter, you must either also use -Recurse or
# have the path end in '\*' like in this example
$Files = Get-ChildItem -Path $Path -Filter '*.txt' -Exclude $Filename | Select-Object -ExpandProperty FullName
foreach ($file in $Files) {
    # ReadAllText detects UTF-8, UTF-16 and UTF-32 byte-order marks and otherwise assumes UTF-8
    # by default, WriteAllText creates files in UTF-8 encoding without a Byte-Order Mark
    # you can add a 3rd parameter if you need a different encoding, like UTF-8 with a BOM:
    # [System.IO.File]::WriteAllText($file, ([System.IO.File]::ReadAllText($file)), [System.Text.Encoding]::UTF8)
    [System.IO.File]::WriteAllText($file, ([System.IO.File]::ReadAllText($file)))
}
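To check that the ReadAllText/WriteAllText round trip copes with the mixed encodings from the question, here is a small self-contained sketch. It uses a hypothetical demo folder under the system temp directory, writes the same text as UTF-8 with BOM, UTF-16 LE and UTF-16 BE, and then rewrites every file as UTF-8 without a BOM by passing an explicit UTF8Encoding($false). It assumes every input file either carries a byte-order mark or is already UTF-8: a BOM-less UTF-16 or ANSI file would not be detected correctly.

```powershell
# Hypothetical demo folder under the system temp directory
$demo = Join-Path ([System.IO.Path]::GetTempPath()) 'encoding-demo'
New-Item -ItemType Directory -Path $demo -Force | Out-Null

# German "Gruesse" with u-umlaut and sharp s, built from code points
# so that the encoding of this script file itself does not matter
$text = "Gr$([char]0x00FC)$([char]0x00DF)e"

# Write the same text with three different byte-order marks
[System.IO.File]::WriteAllText((Join-Path $demo 'utf8bom.txt'), $text, [System.Text.UTF8Encoding]::new($true))
[System.IO.File]::WriteAllText((Join-Path $demo 'utf16le.txt'), $text, [System.Text.Encoding]::Unicode)
[System.IO.File]::WriteAllText((Join-Path $demo 'utf16be.txt'), $text, [System.Text.Encoding]::BigEndianUnicode)

# UTF-8 without a BOM, passed explicitly so the intent is visible
$utf8NoBom = [System.Text.UTF8Encoding]::new($false)
foreach ($file in (Get-ChildItem -Path $demo -Filter '*.txt' -File).FullName) {
    # ReadAllText uses the BOM (if any) to choose the decoder
    $content = [System.IO.File]::ReadAllText($file)
    # Re-encode the decoded text as UTF-8 without a BOM
    [System.IO.File]::WriteAllText($file, $content, $utf8NoBom)
}
```

Afterwards all three files contain the identical UTF-8 bytes, with no byte-order mark at the start.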

Related

Trying to create an array of filenames

I am trying to use the PSWritePDF module to merge PDFs. I have about 64 folders, and each of them has about 20+ files that need to be merged. In the end, I would have 64 PDFs, each containing the merged files from one of the 64 folders. I have already written some code, but I am struggling to create an array of file names that I can pass to the Merge-PDF function. I know the first part of this code is redundant; I just haven't fixed it yet.
#https://github.com/EvotecIT/PSWritePDF/blob/master/Example/Example03.Merging/Example03.ps1
#This gives me the 64 folder names
$folder_NM = Get-ChildItem -Path \\main_directory\CURRENT |
    Where-Object {$_.PSIsContainer} |
    Foreach-Object {$_.Name}
#This iterates through the 64 folders
foreach ($X IN $folder_NM)
{
    #this grabs each of the 64 directories
    $main_path = join-path -path \\main_directory\CURRENT -ChildPath $X
    #This grabs the names of the pdfs in each folder
    $file_names = Get-ChildItem $main_path |
        ForEach-Object {$_.Name}
    #This is grabbing each file in the folder and giving me the formatted string I need to pass to Merge-PDF. i.e. C:\\User\Current\pdf.1
    foreach($Y in $file_names){
        $idv_files = join-path -path $main_path -ChildPath $Y
        #This is where I am stuck. I am trying to create an array with each filename comma separated. This currently just overwrites itself each time it goes through the loop.
        $arr = $idv_files -join ','
        #This is needed for mergePDF
        $OutputFile = "$maindirectory\TESTING\$X.pdf"
        #This only puts the most recent file in the output file. Thus the need for an array of file names.
        Merge-PDF -InputFile $arr -OutputFile $OutputFile
        #Debugging
        #Write-Host $arr
    }
}
Specifically, this is where I am struggling. I am getting the correct files in $idv_files, but if I pass those to Merge-PDF, I just get a PDF containing only the file that was processed last. I think I just need them comma-separated and all put into the same array so that Merge-PDF will merge them all together.
foreach($Y in $file_names){
    $idv_files = join-path -path $main_path -ChildPath $Y
    #This is where I am stuck. I am trying to create an array with each filename comma separated. This currently just overwrites itself each time it goes through the loop.
    $arr = $idv_files -join ','
Anything helps. Very new to PowerShell!
Untested, but if the function takes [string[]] as input (as in my comment), this should produce a MERGED PDF.pdf in each folder.
I would recommend testing this with a few folders containing PDF files on your local machine before running it against your file server.
# Get the Directories
$folder_NM = Get-ChildItem -Path \\main_directory\CURRENT -Directory
#This iterates through the 64 folders
foreach ($dir IN $folder_NM)
{
    # This gets you the array of PDF Files
    # (excluding a merged PDF left over from a previous run)
    $file_names = Get-ChildItem $dir.FullName -Filter *.pdf -File |
        Where-Object Name -ne 'MERGED PDF.pdf' |
        Sort-Object Name
    # Define the output file for Merged PDF
    $OutputFile = Join-Path $dir.FullName -ChildPath 'MERGED PDF.pdf'
    # If Merge-PDF takes [string[]] as input, this should work
    Merge-PDF -InputFile $file_names.FullName -OutputFile $OutputFile
}
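The underlying problem in the question is that $arr = ... inside the inner loop replaces the variable on every pass. A minimal sketch of the difference, using a hypothetical demo folder with empty placeholder .pdf files (PSWritePDF is not needed to run it): assigning the whole foreach statement to a variable collects everything the loop body emits into an array.

```powershell
# Hypothetical demo folder with three empty placeholder .pdf files
$demo = Join-Path ([System.IO.Path]::GetTempPath()) 'merge-demo'
New-Item -ItemType Directory -Path $demo -Force | Out-Null
foreach ($name in 'a.pdf', 'b.pdf', 'c.pdf') {
    New-Item -ItemType File -Path (Join-Path $demo $name) -Force | Out-Null
}

$file_names = (Get-ChildItem -Path $demo -Filter '*.pdf' -File | Sort-Object Name).Name

# Overwrites on every pass: only the last path survives the loop
foreach ($Y in $file_names) {
    $last = Join-Path -Path $demo -ChildPath $Y
}

# Collects every pass: all values emitted inside the loop become an array
$paths = foreach ($Y in $file_names) {
    Join-Path -Path $demo -ChildPath $Y
}

Write-Output "Last only: $last"
Write-Output "All ($($paths.Count)): $($paths -join ', ')"
```

The collected $paths is the kind of [string[]] value the answers pass to Merge-PDF -InputFile; that Merge-PDF accepts it this way is the same assumption the answers make.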
It appeared that you wanted the merged .pdf file to be named after the subdirectory (subdirectory name + '.pdf'). Perhaps I misunderstood. This is also UNTESTED, but it might do what you want. In Windows PowerShell 5.1 or any PowerShell Core version, testing for .PSIsContainer is not necessary, because Get-ChildItem supports the -File and -Directory switches.
[CmdletBinding()]
param ()
$RootDir = '\\main_directory\CURRENT'
# Get the subdirectory list.
Get-ChildItem -Directory -Path $RootDir |
    # Process each subdirectory.
    ForEach-Object {
        # Create an array of the .pdf files to be merged.
        $file_names = (Get-ChildItem -File -Path $_.FullName -Filter '*.pdf').FullName
        #This is needed for mergePDF
        $OutputFile = Join-Path -Path $RootDir -ChildPath $($_.Name + '.pdf')
        Write-Verbose "OutputFile is $OutputFile"
        Merge-PDF -InputFile $file_names -OutputFile $OutputFile
    }

PowerShell: search a directory for code files with text matching input from a txt file

Data mapping project: in-house system to a new vendor system. The first step is to find all occurrences of the current database field names (or column names, to be precise) in the C# .cs source files. I'm trying to use PowerShell. I have recently created PowerShell searches with Get-ChildItem and Select-String that work well, but the search string array was small and easily hard-coded inline. The application being ported, however, has a couple hundred column names and a significant amount of code. So, armed with a text file of all the column names, the pipeline would seem like a good tool to create the basic cross-reference for further analysis. However, I was not able to get the pipeline to work with an external variable anywhere other than the first step. I tried using -PipelineVariable, $_, and a global variable, and did not find anything specific after lots of searching. P.S. This is my first question on Stack Overflow, so please be kind.
Here is what I hoped would work, but no dice so far.
$inputFile = "C:\DataColumnsNames.txt"
$outputFile = "C:\DataColumnsUsages.txt"
$arr = [string[]](Get-Content $inputfile)
foreach ($s in $arr) {
    Get-ChildItem -Path "C:ProjectFolder\*" -Filter *.cs -Recurse -ErrorAction SilentlyContinue -Force |
        Select-String $s | Select-Object Path, LineNumber, line | Export-csv $outputfile
}
I did find that this will print the list once but not twice. In fact, it seems that using the variable this way causes any further pipeline steps to simply be skipped.
foreach ($s in $arr) {Write-Host $s | Write $s}
If it isn't easily possible to do this in PowerShell, my fallback is to do it in C#, although I would much rather level up in PowerShell if anyone can point me to the correct understanding of how to do things in the pipeline, or alternatively how to construct an equivalent function. It seems like such a natural fit for PowerShell.
Thanks.
You're calling Export-csv $outputfile in a loop, which rewrites the whole file in every iteration, so that only the last iteration's output will end up in the file.
While you could use -Append to iteratively append to the output file, it is worth taking a step back: Select-String can accept an array of patterns, causing a line that matches any of them to be considered a match.
Therefore, your code can be simplified as follows:
$inputFile = 'C:\DataColumnsNames.txt'
$outputFile = 'C:\DataColumnsUsages.txt'
Get-ChildItem C:\ProjectFolder -Filter *.cs -Recurse -Force -ea SilentlyContinue |
    Select-String -Pattern (Get-Content $inputFile) |
    Select-Object Path, LineNumber, line |
    Export-csv $outputfile
-Pattern (Get-Content $inputFile) passes the lines of input file $inputFile as an array of patterns to match.
By default, these lines are interpreted as regexes (regular expressions); to ensure that they're treated as literals, add -SimpleMatch to the Select-String call.
This answer to a follow-up question shows how to include the specific pattern among the multiple ones passed to -Pattern that matched on each line in the output.
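Building on the -SimpleMatch note and the idea of reporting which pattern matched, here is a small self-contained sketch. The folder, file names and column names are hypothetical. Each MatchInfo object that Select-String emits has a Pattern property holding the pattern that matched the line, so it can be selected alongside Path, LineNumber and Line.

```powershell
# Hypothetical demo folder with a column-name list and one small C# file
$demo = Join-Path ([System.IO.Path]::GetTempPath()) 'column-search-demo'
New-Item -ItemType Directory -Path $demo -Force | Out-Null
$inputFile = Join-Path $demo 'columns.txt'
$outputFile = Join-Path $demo 'usages.csv'
Set-Content -Path $inputFile -Value 'CustomerId', 'Order.Total'
Set-Content -Path (Join-Path $demo 'Sample.cs') -Value @(
    'var id = row["CustomerId"];'
    'var total = row["OrderXTotal"];'
    'var sum = row["Order.Total"];'
)

# -SimpleMatch treats each line of $inputFile literally,
# so "Order.Total" does not match "OrderXTotal" (as the regex "." would)
# Pattern records which of the patterns matched each line
Get-ChildItem -Path $demo -Filter '*.cs' -Recurse -File |
    Select-String -Pattern (Get-Content $inputFile) -SimpleMatch |
    Select-Object Path, LineNumber, Pattern, Line |
    Export-Csv -Path $outputFile -NoTypeInformation
```

Leave off -SimpleMatch if the entries in the input file really are meant as regular expressions.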
I think you want to append each occurrence to the CSV file, and you need to pass each file to Select-String. Try this:
$inputFile = "C:\DataColumnsNames.txt"
$outputFile = "C:\DataColumnsUsages.txt"
$arr = [string[]](Get-Content $inputFile)
foreach ($s in $arr) {
    Get-ChildItem -Path "C:\ProjectFolder\*" -Filter *.cs -Recurse -ErrorAction SilentlyContinue -Force | Foreach {
        # passing the file path (rather than piping Get-Content) keeps the Path property populated
        Select-String -Path $_.FullName -Pattern $s | Select-Object Path, LineNumber, line | Export-csv -Append -Path "$outputFile"
    }
}
Export-Csv -Append was not introduced until PowerShell 3.0 (Windows 8); on older versions, try this:
$inputFile = "C:\DataColumnsNames.txt"
$outputFile = "C:\DataColumnsUsages.txt"
$arr = [string[]](Get-Content $inputFile)
foreach ($s in $arr) {
    Get-ChildItem -Path "C:\ProjectFolder\*" -Filter *.cs -Recurse -ErrorAction SilentlyContinue -Force | Foreach {
        # Select-Object -Skip 1 drops the CSV header row, so the output file has no header line
        Select-String -Path $_.FullName -Pattern $s | Select-Object Path, LineNumber, line | ConvertTo-CSV -NoTypeInformation | Select-Object -Skip 1 | Out-File -Append -FilePath "$outputFile"
    }
}

How to combine the contents of multiple files in powershell

I was wondering how I would go about combining specific files using PowerShell. For example, I want to take EDIDISC.UPD, EDIACCA.UPD, EDIBRUM.UPD, etc., combine the contents of these files, and make a new file named A075MMDDYY.UPD. I would also want it to be runnable by whoever has the .UPD files on their network drive. For example, mine would be in N:\USERS\Kevin, and someone else's may be in N:\USERS\JohnDoe.
So far I only have:
Param (
    $path = (Get-Location).Path,
    $filetype = "*.UPD",
    $files = (Get-ChildItem -Filter $filetype),
    $Newfile = $path + "\Newfile.UPD"
)
$files | foreach { Get-Content $_ | Out-File -Append $Newfile -Encoding ascii }
Focusing just on the aspect of concatenating (catting) the contents of multiple files to form a new file, assuming the current directory (.):
$dir = '.'
$newfile = "$dir/Newfile.UPD"
Get-Content "$dir/*.UPD" -Exclude (Split-Path -Leaf $newFile) |
    Out-File -Append -Encoding Ascii $newFile
You can pass a wildcard expression directly to Get-Content's (implied) -Path parameter in order to retrieve the contents of all matching files.
However, since the output file is placed in the same directory and also matches the wildcard expression, it must be excluded from matching by its file name, hence the -Exclude (Split-Path -Leaf $newFile) argument (this additionally assumes that no input file has the same name as the output file).
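The answer focuses only on concatenation and leaves out the A075MMDDYY.UPD naming from the question. A sketch of one way to add it, assuming MMDDYY means the current date and that no input file name starts with A075 (so earlier outputs can be skipped by name). The demo folder and file contents are hypothetical.

```powershell
# Hypothetical demo folder standing in for a user's directory such as N:\USERS\Kevin
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'upd-demo'
New-Item -ItemType Directory -Path $dir -Force | Out-Null
Set-Content -Path (Join-Path $dir 'EDIDISC.UPD') -Value 'line from EDIDISC'
Set-Content -Path (Join-Path $dir 'EDIACCA.UPD') -Value 'line from EDIACCA'

# Assumption: MMDDYY in the requested name means today's date
$newFile = Join-Path $dir ('A075' + (Get-Date -Format 'MMddyy') + '.UPD')

# Gather the inputs first, skipping earlier A075*.UPD outputs
# (assumes no input file name starts with A075)
$inputs = Get-ChildItem -Path $dir -Filter '*.UPD' -File |
    Where-Object { $_.Name -notlike 'A075*' } |
    Sort-Object Name

# Concatenate in name order and write the output file in one go
$inputs | Get-Content | Set-Content -Path $newFile -Encoding Ascii
```

In a real run, $dir would be the user's own folder, for example (Get-Location).Path as in the question's Param block, which lets each person run the same script from their own N:\USERS\... directory.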

Print a substring from many files to a single text File

I need to extract the 20th to 30th letters from many files in a single folder and print the results to a text file.
I understand I need to use Substring(20,10), but I'm failing at handling multiple files.
I created a working script for single file use:
$var = Get-Content -Path C:\testfiles\1.txt
$var.Substring(20,10)
But now i need to handle multiple files.
Any help?
This might work
#Path to directory
$DirPath = "C:\Files"
#get all text files from above path
$Files = Get-ChildItem -Path $DirPath -Filter "*.txt"
#path to text file where strings are stored
$DesFile = "C:\String.txt"
#loop through all files and save to desfile
foreach ($txt in $Files)
{
    $var = Get-Content -Path $txt.FullName
    $var.Substring(20,10) | Add-Content -Path $DesFile
}
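Substring(20, 10) returns the 10 characters starting at zero-based index 20 (the 21st through 30th characters) and throws an exception if the string is shorter than 30 characters. A sketch that guards against short files, assuming for illustration that the characters are counted from the start of each file's whole content read with -Raw; the folder and file names are hypothetical.

```powershell
# Hypothetical demo folder with one long and one short text file
$DirPath = Join-Path ([System.IO.Path]::GetTempPath()) 'substring-demo'
New-Item -ItemType Directory -Path $DirPath -Force | Out-Null
Set-Content -Path (Join-Path $DirPath '1.txt') -Value 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
Set-Content -Path (Join-Path $DirPath '2.txt') -Value 'too short'

# Result file kept outside $DirPath so it is never picked up as an input
$DesFile = Join-Path ([System.IO.Path]::GetTempPath()) 'substring-demo-result.txt'

$results = foreach ($txt in (Get-ChildItem -Path $DirPath -Filter '*.txt' -File)) {
    # -Raw reads the whole file as one string instead of an array of lines
    $text = Get-Content -Path $txt.FullName -Raw
    if ($text.Length -ge 30) {
        $text.Substring(20, 10)
    } else {
        # Warnings go to the warning stream, so they do not end up in $results
        Write-Warning "$($txt.Name) has fewer than 30 characters; skipped"
    }
}
$results | Set-Content -Path $DesFile
```

Collecting the loop output and writing it once with Set-Content also means a rerun replaces the result file instead of appending to it.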

Using a ForEach loop to reverse csv files

What I need to do is take a number of text files (CSVs, all in one directory) and create files where the entire content is reversed line by line, but with the header kept at the top instead of ending up at the bottom.
I am able to take one file (by name), copy the first line, and create a new file with just that line in it. Then I take the original file minus the first line, read it into an array, and reverse it. I then append that to the file that only has the header. It works fine, except for the output name, which I'd like to be [file-REV.csv], but so far I've only gotten to [file.csv-REV]...
So, once I had that working, I thought it was time to have the program find all of the csv's in the directory and loop through them, creating a reverse file for each.
This is what I have so far:
cd c:\users\$([Environment]::UserName)\working
$Path = "c:\users\$([Environment]::UserName)\working\"
ForEach ($file in Get-ChildItem -Filter *.csv) {
    Get-Content $file -totalcount 1 | Set-Content .\$file-REV.csv
    $flip = (get-content | select -Skip 1)
    [array]::Reverse($flip)
    $flip | add-content "$file-REV.csv"
}
Here is the message I receive when executing the script:
cmdlet Get-Content at command pipeline position 1
Supply values for the following parameters:
Path[0]:
I've tried putting in the entire path, Get-Content -Path c:\users\jmurphy\working\, and then it complains that it can't find the path.
A couple of things. First, you are defining the folder to work in as a variable, so use that in Get-ChildItem. (I changed the name to $Folder out of habit, because Path is already a variable in the Environment scope, $env:Path. Also, $env: is a quicker way to read variables from the Environment scope.)
$Folder = "C:\Users\$env:UserName\working\"
$Files = Get-ChildItem $Folder -Filter *.csv
Second, you'll want to use the FullName property of the objects Get-ChildItem returns, because that's the full path of each file.
ForEach ($File in $Files) {
So you'll want to use that full path to the file in your Get-Content
Get-Content $File.Fullname -totalcount 1 | Set-Content .\$($file.Basename)-REV.csv
and you'll want to use $File again as the path to the file when you call Get-Content again:
$Flip = (Get-Content $File.Fullname | Select -Skip 1)
BaseName is the property of the objects returned by Get-ChildItem that holds the file name without its extension. You can force evaluation of a variable's property inside double quotes by enclosing it in $().
$Flip | Add-Content "$($file.Basename)-REV.csv"
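A tiny sketch of that $() rule, using a hypothetical [pscustomobject] with a BaseName property in place of a real file object. It also shows why an unquoted or plainly quoted $file followed by text does not pick up the property.

```powershell
# A stand-in object with a BaseName property, like the file objects Get-ChildItem returns
$file = [pscustomobject]@{ BaseName = 'report' }

# Without $(), only $file is expanded; ".BaseName" stays literal text
$plain = "$file.BaseName-REV.csv"

# With $(), the whole property access is evaluated before it is inserted
$sub = "$($file.BaseName)-REV.csv"

Write-Output $plain
Write-Output $sub
```

Only $sub yields report-REV.csv; $plain ends with the literal text .BaseName-REV.csv.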
All together the script should look like this:
$Folder = "C:\Users\$env:UserName\working\"
$Files = Get-ChildItem $Folder -Filter *.csv
ForEach ($File in $Files) {
    Get-Content $File.Fullname -totalcount 1 | Set-Content .\$($file.Basename)-REV.csv
    # @() ensures $Flip is an array, even if the file has only a header line
    $Flip = @(Get-Content $File.Fullname | select -Skip 1)
    [array]::Reverse($Flip)
    $Flip | Add-Content "$($file.basename)-REV.csv"
}
Joe, you are doing great at learning PowerShell. You just need to harness the power of object orientation a little more.
You already know that Get-ChildItem will return an object about the file(s). What you need to know are the members included in that object. Use Get-Member to find that out.
$file = Get-ChildItem .\t.csv
$file | Get-Member
Once you find an interesting member, see what its value is.
$file.Name
$file.Extension
$file.BaseName
From this, you can construct a new file name.
$newFilename = $file.Basename + '-REV' + $file.Extension
Put this before the Get-Content line. Then use $newFilename in both the Set-Content and Add-Content calls.
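Putting the pieces from both answers together, here is a sketch of the whole loop using the $newFilename idea. Two details go beyond the answers and are assumptions: the output is written next to the input with Join-Path rather than to the current location, and files whose BaseName already ends in -REV are skipped so that a rerun does not reverse them again. The demo folder and data are hypothetical.

```powershell
# Hypothetical demo folder standing in for C:\Users\<name>\working
$Folder = Join-Path ([System.IO.Path]::GetTempPath()) 'reverse-csv-demo'
New-Item -ItemType Directory -Path $Folder -Force | Out-Null
Set-Content -Path (Join-Path $Folder 't.csv') -Value 'id,name', '1,a', '2,b', '3,c'

# Skip earlier outputs so a rerun does not reverse the -REV files again
# (assumes no real input name ends in -REV)
$Files = Get-ChildItem -Path $Folder -Filter '*.csv' -File |
    Where-Object { $_.BaseName -notlike '*-REV' }

foreach ($file in $Files) {
    # Build the output name from BaseName and Extension, next to the input file
    $newFilename = Join-Path $Folder ($file.BaseName + '-REV' + $file.Extension)
    $lines = @(Get-Content -Path $file.FullName)
    # Everything after the header, reversed in place
    $body = @($lines | Select-Object -Skip 1)
    [array]::Reverse($body)
    # Header first, then the reversed data lines
    (@($lines[0]) + $body) | Set-Content -Path $newFilename
}
```

Writing the header and body with a single Set-Content call avoids the separate Set-Content/Add-Content steps, so each output file is created in one write.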