How to parse through folders and files using PowerShell?

I am trying to construct a script that moves through specific folders and the log files in it, and filters the error codes. After that it passes them into a new file.
I'm not really sure how to do that with for loops, so I'll leave my code below.
If someone could tell me what I'm doing wrong, that would be greatly appreciated.
$file_name = Read-Host -Prompt 'Name of the new file: '
$path = 'C:\Users\user\Power\log_script\logs'
Add-Type -AssemblyName System.IO.Compression.FileSystem
function Unzip
{
    param([string]$zipfile, [string]$outpath)
    [System.IO.Compression.ZipFile]::ExtractToDirectory($zipfile, $outpath)
}
if ([System.IO.File]::Exists($path)) {
    Remove-Item $path
    Unzip 'C:\Users\user\Power\log_script\logs.zip' 'C:\Users\user\Power\log_script'
} else {
    Unzip 'C:\Users\user\Power\log_script\logs.zip' 'C:\Users\user\Power\log_script'
}
$folder = Get-ChildItem -Path 'C:\Users\user\Power\log_script\logs\LogFiles'
$files = foreach($logfolder in $folder) {
    $content = foreach($line in $files) {
        if ($line -match '([ ][4-5][0-5][0-9][ ])') {
            echo $line
        }
    }
}
$content | Out-File $file_name -Force -Encoding ascii
Inside the LogFiles folder are three more folders each containing log files.
Thanks

Expanding on a comment above about recursing the folder structure, and then actually retrieving the content of the files, you could try something like this:
$allFiles = Get-ChildItem -Path 'C:\Users\user\Power\log_script\logs\LogFiles' -Recurse -File # -File (PSv3+) keeps directories out of the list
# iterate the files
$allFiles | ForEach-Object {
    # iterate the content of each file, line by line
    Get-Content $_ | ForEach-Object {
        if ($_ -match '([ ][4-5][0-5][0-9][ ])') {
            echo $_
        }
    }
}

It looks like your inner loop iterates a collection ($files) that doesn't exist yet. You assign $files to the output of a foreach(...) loop, then try to nest another loop over $files inside it. At that point $files isn't available to be looped.
Regardless, the issue is you are never reading the content of your log files. Even if you managed to loop through the output of Get-ChildItem, you need to look at each line to perform the match.
Obviously I cannot completely test this, but I see a few issues and have rewritten as below:
$file_name = Read-Host -Prompt 'Name of the new file'
$path = 'C:\Users\user\Power\log_script\logs'
$Pattern = '([ ][4-5][0-5][0-9][ ])'
if ( [System.IO.File]::Exists( $path ) ) { Remove-Item $path }
Expand-Archive 'C:\Users\user\Power\log_script\logs.zip' 'C:\Users\user\Power\log_script'
Select-String -Path 'C:\Users\user\Power\log_script\logs\LogFiles\*' -Pattern $Pattern |
Select-Object -ExpandProperty line |
Out-File $file_name -Force -Encoding ascii
Note: Select-String cannot recurse on its own.
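If you need to reach the log files inside those subfolders, one option (a sketch, assuming the same LogFiles root and $Pattern as above) is to let Get-ChildItem do the recursion and hand the files to Select-String:
# Sketch: recurse with Get-ChildItem (-File needs PowerShell 3.0+), let Select-String read each file
Get-ChildItem -Path 'C:\Users\user\Power\log_script\logs\LogFiles' -Recurse -File |
    Select-String -Pattern $Pattern |
    Select-Object -ExpandProperty Line |
    Out-File $file_name -Force -Encoding ascii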
I'm not sure you need to write your own UnZip function. PowerShell has the Expand-Archive cmdlet which can at least match the functionality thus far:
Expand-Archive -Path <SourceZipPath> -DestinationPath <DestinationFolder>
Note: The -Force parameter allows it to overwrite the destination files if they are already present, which may be a substitute for testing whether the file exists and deleting it if it does.
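For example (same paths as above; -Force on Expand-Archive requires PowerShell 5.0 or later):
Expand-Archive -Path 'C:\Users\user\Power\log_script\logs.zip' -DestinationPath 'C:\Users\user\Power\log_script' -Force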
If you are going to test for the file, that section of code can be simplified to:
if ( [System.IO.File]::Exists( $path ) ) { Remove-Item $path }
Unzip 'C:\Users\user\Power\log_script\logs.zip' 'C:\Users\user\Power\log_script'
This is because you were going to run the UnZip command regardless...
Note: You could also use Test-Path for this.
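For instance (a sketch; note that $path above points at a folder, and unlike [System.IO.File]::Exists, Test-Path also returns true for folders, so Remove-Item may need -Recurse):
if ( Test-Path $path ) { Remove-Item $path -Recurse }
Unzip 'C:\Users\user\Power\log_script\logs.zip' 'C:\Users\user\Power\log_script'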
Also, there are numerous ways to get the matching lines; here are a couple of extra samples:
Get-ChildItem -Path 'C:\Users\user\Power\log_script\logs\LogFiles' |
ForEach-Object{
    ( Get-Content $_.FullName ) -match $Pattern
    # Using match in this way will echo the lines that matched from each run of
    # Get-Content. If nothing matched nothing will output on that iteration.
} |
Out-File $file_name -Force -Encoding ascii
This approach will read the entire file into an array before running the match on it. For large files that may pose a memory issue; however, it enables the clever use of -match.
OR:
Get-ChildItem -Path 'C:\Users\user\Power\log_script\logs\LogFiles' |
Get-Content |
ForEach-Object{ If( $_ -match $Pattern ) { $_ } } |
Out-File $file_name -Force -Encoding ascii
Note: You don't need the alias echo or the cmdlet it resolves to, Write-Output; any value you don't capture or redirect is sent to the pipeline automatically.
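For example, inside the loop these three lines all emit the current line in exactly the same way:
# equivalent ways of sending the current line down the pipeline
echo $_
Write-Output $_
$_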

UPDATE: After futzing around a bit and trying different things, I finally got it to work.
I'll include the code below just for demonstration purposes.
Thanks everyone
$start = Get-Date
"`n$start`n"
$file_name = Read-Host -Prompt 'Name of the new file: '
Out-File $file_name -Force -Encoding ascii
Expand-Archive -Path 'C:\Users\User\Power\log_script\logs.zip' -Force
$i = 1
$folders = Get-ChildItem -Path 'C:\Users\User\Power\log_script\logs\logs\LogFiles' -Name -Recurse -Include *.log
foreach($item in $folders) {
    $files = 'C:\Users\User\Power\log_script\logs\logs\LogFiles\' + $item
    foreach($file in $files){
        $content = Get-Content $file
        Write-Progress -Activity "Filtering..." -Status "File $i of $($folders.Count)" -PercentComplete (($i / $folders.Count) * 100)
        $i++
        $output = foreach($line in $content) {
            if ($line -match '([ ][4-5][0-5][0-9][ ])') {
                Add-Content -Path $file_name -Value $line
            }
        }
    }
}
$end = Get-Date
$time = [int]($end - $start).TotalSeconds
Write-Output ("Runtime: " + $time + " Seconds" -join ' ')

Related

Concatenating Output from Folder

I have thousands of PDF documents that I am trying to comb through and pull out only certain data. I have successfully created a script that goes through each PDF, puts its content into a .txt, and then the final .txt is searched for the requested information. The only part I am stuck on is trying to combine all the data from each PDF into this .txt file. Currently, each successive PDF simply overwrites the previous data, and the search is only performed on the final PDF in the folder. How can I alter this set of code to allow each bit of information to be concatenated into the .txt instead of overwriting it?
$all = Get-Childitem -Path $file1 -Recurse -Filter *.pdf
foreach ($f in $all){
    $outfile = $f -join ', '
    $text = convert-PDFtoText $outfile
}
Here is my entire script for reference:
Start-Process powershell.exe -Verb RunAs {
    function convert-PDFtoText {
        param(
            [Parameter(Mandatory=$true)][string]$file
        )
        Add-Type -Path "C:\ps\itextsharp.dll"
        $pdf = New-Object iTextSharp.text.pdf.pdfreader -ArgumentList $file
        for ($page = 1; $page -le $pdf.NumberOfPages; $page++){
            $text = [iTextSharp.text.pdf.parser.PdfTextExtractor]::GetTextFromPage($pdf,$page)
            Write-Output $text
        }
        $pdf.Close()
    }
    $content = Read-Host "What are we looking for?: "
    $file1 = Read-Host "Path to search: "
    $all = Get-Childitem -Path $file1 -Recurse -Filter *.pdf
    foreach ($f in $all){
        $outfile = $f -join ', '
        $text = convert-PDFtoText $outfile
    }
    $text | Out-File "C:\ps\bulk.txt"
    Select-String -Path C:\ps\bulk.txt -Pattern $content | Out-File "C:\ps\select.txt"
    Start-Sleep -Seconds 60
}
Any help would be greatly appreciated!
To capture the output of all convert-PDFtoText calls in a single output file, use a single pipeline with the ForEach-Object cmdlet:
Get-ChildItem -Path $file1 -Recurse -Filter *.pdf |
ForEach-Object { convert-PDFtoText $_.FullName } |
Out-File "C:\ps\bulk.txt"
A tweak to your convert-PDFtoText function would allow for a more concise and efficient solution:
Make convert-PDFtoText accept Get-ChildItem input directly from the pipeline:
function convert-PDFtoText {
    param(
        [Alias('FullName')]
        [Parameter(Mandatory, ValueFromPipelineByPropertyName)]
        [string] $file
    )
    begin {
        Add-Type -Path "C:\ps\itextsharp.dll"
    }
    process {
        $pdf = New-Object iTextSharp.text.pdf.pdfreader -ArgumentList $file
        for ($page = 1; $page -le $pdf.NumberOfPages; $page++) {
            [iTextSharp.text.pdf.parser.PdfTextExtractor]::GetTextFromPage($pdf,$page)
        }
        $pdf.Close()
    }
}
This then allows you to simplify the command at the top to:
Get-ChildItem -Path $file1 -Recurse -Filter *.pdf |
convert-PDFtoText |
Out-File "C:\ps\bulk.txt"

Fix script to allow a scanning of all fields for every row to find if that field contains a target folder name

This is a follow-up extension to a previous question.
Since the input csv file is inconsistent, I cannot use that in this way.
The folder to move to is not always in the same column, so any code trying to use that as input will hit the problem of the value not corresponding to the folder I want to move to. It simply reads garbage ("spam") where it expects the folder name.
The only way to do it is by examining all fields of every row to find whether a field contains a target folder name that can be used. That means a LOT of Test-Path calls.
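Something like this per-row, per-field check is what I have in mind (just a rough sketch on my part, assuming the same Title, FileName, Link headers and that a field is usable when a folder with that name already exists under C:\temp\):
$sourcePath = 'C:\temp\'
Import-Csv 'C:\temp\temp.csv' -Header Title,FileName,Link -Delimiter ';' |
ForEach-Object {
    # look at every field of the row, not just one fixed column
    foreach ($value in $_.PSObject.Properties.Value) {
        if ([string]::IsNullOrWhiteSpace($value)) { continue }
        $candidate = Join-Path -Path $sourcePath -ChildPath $value
        if (Test-Path -LiteralPath $candidate -PathType Container) {
            "Row field '$value' matches existing folder: $candidate"
        }
    }
}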
This is the offending part of the code:
Foreach ($fileName in $_.Group.FileName) {
$ValidFileName = $filename -replace $invalid
$targetFile = Join-Path -Path $TargetFolder -ChildPath $fileName
The error shows it is explicitly being told to move those folders when I iterate through the FileName column with Foreach ($fileName in $_.Group.FileName) {.. ; it is reading: (1959 10) Showcase Presents n 22, (1959 12) Showcase Presents n 23, alfa, da definire.
My request: since that examination of all fields for every row is required, I am asking for an edit of this script. If this means a LOT of Test-Path lines, I suppose there is no alternative.
However, the script below doesn't create folders or move anything, so my request is to try to fix it:
$csvpath = 'C:\temp\temp.csv'
$invalid = "[{0}]" -f [RegEx]::Escape(([IO.Path]::GetInvalidFileNameChars() -join ''))
$sourcePath = 'C:\temp\'
Import-Csv C:\temp\temp.csv -Header Title,FileName,Link -Delimiter ';' |
Group-Object Title |
Foreach {
    # I prefer to add a trailing slash to folder names
    $TargetFolder = Join-Path -Path $sourcePath -ChildPath (($_.Name -replace $invalid)+'\')
    # We don't have to create the new folders, because -Force will create them for us
    Foreach ($fileName in $_.Group.FileName) {
        $ValidFileName = $filename -replace $invalid
        $targetFile = Join-Path -Path $TargetFolder -ChildPath $fileName
        # Write your values to the console - Make sure the folder is what it should be
        Write-Output "Moving '$targetFile' to '$TargetFolder'"
        Move-Item $targetFile $TargetFolder -Force -WhatIf
    }
}
I was hesitant to even comment on this, since you're not taking any of our advice from the previous posts and are just repeating what we say in a new question. We are here to help with your code, not to write it for you, but learning isn't a crime, so I will just assume you're unfamiliar with PowerShell in general.
$csvpath = 'C:\tomp\temp.csv'
#$invalid = "[{0}]" -f [RegEx]::Escape(([IO.Path]::GetInvalidFileNameChars() -join ''))
$sourcePath = 'C:\tomp\'
Import-Csv $csvpath -Header Title,FileName,Link -Delimiter ',' |
Group-Object Title |
ForEach-Object `
    -Begin {
        $FoldersLocation = Get-ChildItem -Path C:\tomp -Directory
        $Source_Folder = ($FoldersLocation.Parent.FullName | Select-Object -Unique)
        # Create array list to hold both columns
        [System.Collections.ArrayList]$CombinedRows = @()
    } `
    -Process {
        # Create folder if it doesn't exist
        $Destination = Join-Path -Path $Source_Folder -ChildPath $_.Name
        if ((Test-Path -Path $Destination) -eq $false) {
            "Created: $Destination"
            #$null = New-Item -Path ($FoldersLocation.Parent.FullName | Select -Unique) -Name $_.Name -ItemType Directory
        }
        #region Combine two columns
        Foreach ($fileName in $_.Group.FileName) {
            $null = $CombinedRows.Add($fileName)
        }
        Foreach ($link in $_.Group.Link) {
            $null = $CombinedRows.Add($link)
        }
        #endregion
    } `
    -End {
        Foreach ($name in $FoldersLocation) {
            if ($name.Name -in $CombinedRows) {
                if ($name.Name -match "Showcase"){
                    # No need for output when supplying the -verbose switch
                    "Moving: '$($Name.FullName)' to 'C:\tomp\Lanterna Verde - Le Prime Storie'"
                    # Move-Item -Path $Name.FullName -Destination "C:\tomp\Lanterna Verde - Le Prime Storie"
                }
                else {
                    # No need for output when supplying the -verbose switch
                    "Moving: '$($Name.FullName)' to 'C:\tomp\Batman DC'"
                    # Move-Item -Path $Name.FullName -Destination "C:\tomp\Batman DC"
                }
            }
        }
    }
All it comes down to is control-of-flow logic: if a certain condition is met, do this; otherwise do something else. I added very few inline comments, as most of it is pretty self-explanatory.
I recommend reading up on some PowerShell instead of hoping others will do everything for you.

Replace the text for all files in a Directory

I have written the conditional script below to go through the files in a directory and replace one piece of text, but only in files that contain the word 'Health'.
cd -Path "\\shlhfilprd08\Direct Credits\Temp2"
ForEach ($file in (Get-ChildItem -Path "\\shlhfilprd08\Direct Credits\Temp2"))
{
    $filecontent = Get-Content -path $file -First 1
    if($filecontent -like '*Health*'){$filecontent = $filecontent -replace 'TEACHERF','UniHlth '}
    Set-Content $file.PSpath -Value $filecontent
}
I ran into two issues:
1. If $filecontent -like '*Health*', it replaces the word in the first row but deletes the other rows along with the replacement. I do not want that to happen.
2. I'm getting a "Set-Content to path ... is denied" error for files whose content does not contain the 'Health' text.
Can you try this?
cd -Path "\\shlhfilprd08\Direct Credits\Temp2"
$configFiles = Get-ChildItem . *.config -rec
foreach ($file in $configFiles)
{
    (Get-Content $file.PSPath) |
        Foreach-Object { $_ -replace "TEACHERF", "UniHlth " } |
        Set-Content $file.PSPath
}
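If you also want to keep your original condition of only touching files that actually mention 'Health', you could gate the rewrite with Select-String (a sketch; adjust the *.config filter to whatever file types you really have):
cd -Path "\\shlhfilprd08\Direct Credits\Temp2"
foreach ($file in (Get-ChildItem . *.config -rec))
{
    # only rewrite files whose content contains 'Health'
    if (Select-String -Path $file.FullName -Pattern 'Health' -Quiet)
    {
        (Get-Content $file.PSPath) |
            Foreach-Object { $_ -replace "TEACHERF", "UniHlth " } |
            Set-Content $file.PSPath
    }
}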
I would try this; it worked for me on a small file.
(Make a small copy of a few of the files into a new folder and test it there.)
$path = "\\shlhfilprd08\Direct Credits\Temp2"
$replace ="TEACHERF" #word to be replaced
$by = "UniHlth " #by this word (change $replace by $by)
gci $path -file | %{
foreach($line in $(Get-content $_.Fullname)){
if($line -like $replace){
$newline = $line.Replace($($replace),$($by))
Set-Content $_.FullName $newline
}
}
}

Count characters for each line

I am new to Windows PowerShell. Please, would you be so kind as to give me some code or information on how to write a program that will do the following for all *.txt files in a folder:
1. Count the characters in each line of the file.
2. If the length of a line exceeds 1024 characters, create a subfolder within that folder and move the file there (that's how I will know which files have lines of over 1024 characters).
I've tried this through VB and VBA (which is more familiar to me), but I want to learn some new cool stuff!
Many thanks!
Edit: I found a partial piece of code as a starting point:
$fileDirectory = "E:\files";
foreach($file in Get-ChildItem $fileDirectory)
{
    # Processing code goes here
}
OR
$fileDirectory = "E:\files";
foreach($line in Get-ChildItem $fileDirectory)
{
    if($line.length -gt 1023){
        # mkdir and mv to subfolder!
    }
}
If you are willing to learn, why not start here.
You can use the Get-Content cmdlet in PowerShell to get this information from your files: http://blogs.technet.com/b/heyscriptingguy/archive/2013/07/06/powertip-counting-characters-with-powershell.aspx and Getting character count for each row in text doc.
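For example, this reports the length of the longest line in a single file (the path here is just a placeholder):
Get-Content 'E:\files\sample.txt' | Measure-Object -Property Length -Maximum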
With your second edit I did see some effort so I would like to help you.
$path = "D:\temp"
$lengthToNotExceed = 1024
$longFiles = Get-ChildItem -Path $path -File |
    Where-Object {(Get-Content $_.FullName | Measure-Object -Maximum Length | Select-Object -ExpandProperty Maximum) -ge $lengthToNotExceed}
$longFiles | ForEach-Object{
    $target = "$($_.Directory)\$lengthToNotExceed\"
    If(!(Test-Path $target)){New-Item $target -ItemType Directory -Force | Out-Null}
    Move-Item $_.FullName -Destination $target
}
You could make this a one-liner, but it would be unnecessarily complicated. Use Measure-Object on the array returned by Get-Content (the array being, more or less, a string array). In PowerShell, strings have a Length property, which is what we query.
That returns the maximum line length in the file. We use Where-Object to keep only the files whose maximum line length meets the threshold.
Then, for each matched file, we move it to a subdirectory in the same location as the file. If no subfolder exists, we create it.
Caveats:
You need at least PowerShell 3.0 for the -File switch. Without it, you can add another clause to the Where-Object: -not $_.PSIsContainer.
This would perform poorly on files with a large number of lines.
Here's my comment above indented and line broken in .ps1 script form.
$long = @()
foreach ($file in gci *.txt) {
    $f = 0
    gc $file | %{
        if ($_.length -ge 1024) {
            if (-not($f)) {
                $f = 1
                $long += $file
            }
        }
    }
}
$long | %{
    $dest = @($_.DirectoryName, '\test') -join ''
    [void](ni -type dir $dest -force)
    mv $_ -dest (@($dest, '\', $_.Name) -join '') -force
}
I was also mentioning labels and breaks there. Rather than $f=0 and if (-not($f)), you can break out of the inner loop with break like this:
$long = @()
foreach ($file in gci *.txt) {
    :inner foreach ($line in gc $file) {
        if ($line.length -ge 1024) {
            $long += $file
            break inner
        }
    }
}
$long | %{
    $dest = @($_.DirectoryName, '\test') -join ''
    [void](ni -type dir $dest -force)
    mv $_ -dest (@($dest, '\', $_.Name) -join '') -force
}
Did you happen to notice the two different ways of looping? There's the foreach statement, and then there's command | %{} (the ForEach-Object cmdlet), where the current item is represented by $_.
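For example, these two lines list the same file names; the first uses the foreach statement, the second the ForEach-Object cmdlet through its % alias:
foreach ($file in gci *.txt) { $file.Name }
gci *.txt | %{ $_.Name }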

Using Powershell to replace multiple strings in multiple files & folders

I have a list of strings in a CSV file. The format is:
OldValue,NewValue
223134,875621
321321,876330
....
and the file contains a few hundred rows (each OldValue is unique). I need to process changes over a number of text files in a number of folders and subfolders. My best guess at the number of folders, files, and lines of text is: 15 folders, around 150 text files in each folder, and approximately 65,000 lines of text per folder (between 400-500 lines per text file).
I will make 2 passes at the data, unless I can do it in one. First pass is to generate a text file I will use as a check list to review my changes. Second pass is to actually make the change in the file. Also, I only want to change the text files where the string occurs (not every file).
I'm using the following Powershell script to go through the files & produce a list of the changes needed. The script runs, but is beyond slow. I haven't worked on the replace logic yet, but I assume it will be similar to what I've got.
# replace a string in a file with powershell
[reflection.assembly]::loadwithpartialname("Microsoft.VisualBasic") | Out-Null
Function Search {
    # Parameters $Path and $SearchString
    param ([Parameter(Mandatory=$true, ValueFromPipeline = $true)][string]$Path,
           [Parameter(Mandatory=$true)][string]$SearchString
    )
    try {
        #.NET FindInFiles Method to Look for file
        [Microsoft.VisualBasic.FileIO.FileSystem]::GetFiles(
            $Path,
            [Microsoft.VisualBasic.FileIO.SearchOption]::SearchAllSubDirectories,
            $SearchString
        )
    } catch { $_ }
}
if (Test-Path "C:\Work\ListofAllFilenamesToSearch.txt") { # if file exists
Remove-Item "C:\Work\ListofAllFilenamesToSearch.txt"
}
if (Test-Path "C:\Work\FilesThatNeedToBeChanged.txt") { # if file exists
Remove-Item "C:\Work\FilesThatNeedToBeChanged.txt"
}
$filefolder1 = "C:\TestFolder\WorkFiles"
$ftype = "*.txt"
$filenames1 = Search $filefolder1 $ftype
$filenames1 | Out-File "C:\Work\ListofAllFilenamesToSearch.txt" -Width 2000
if (Test-Path "C:\Work\FilesThatNeedToBeChanged.txt") { # if file exists
Remove-Item "C:\Work\FilesThatNeedToBeChanged.txt"
}
(Get-Content "C:\Work\NumberXrefList.CSV" |where {$_.readcount -gt 1}) | foreach{
$OldFieldValue, $NewFieldValue = $_.Split("|")
$filenamelist = (Get-Content "C:\Work\ListofAllFilenamesToSearch.txt" -ReadCount 5) #|
foreach ($j in $filenamelist) {
#$testvar = (Get-Content $j )
#$testvar = (Get-Content $j -ReadCount 100)
$testvar = (Get-Content $j -Delimiter "\n")
Foreach ($i in $testvar)
{
if ($i -imatch $OldFieldValue) {
$j + "|" + $OldFieldValue + "|" + $NewFieldValue | Out-File "C:\Work\FilesThatNeedToBeChanged.txt" -Width 2000 -Append
}
}
}
}
$FileFolder = (Get-Content "C:\Work\FilesThatNeedToBeChanged.txt" -ReadCount 5)
Get-ChildItem $FileFolder -Recurse |
    select -ExpandProperty fullname |
    foreach {
        if (Select-String -Path $_ -SimpleMatch $OldFieldValue -Debug -Quiet) {
            (Get-Content $_) |
                ForEach-Object {$_ -replace $OldFieldValue, $NewFieldValue } |
                Set-Content $_ -WhatIf
        }
    }
In the code above, I've tried several things with Get-Content - default, with -ReadCount, and -Delimiter - in an attempt to avoid an out of memory error.
The only thing I have control over is the length of the old & new replacement strings file. Is there a way to do this in Powershell? Is there a better option/solution? I'm running Windows 7, Powershell version 3.0.
Your main problem is that you're reading each file over and over again to change each of the terms. You need to invert the loops: iterate the files on the outside and the replacement terms on the inside. Also, pre-load the CSV. Something like:
$filefolder1 = "C:\TestFolder\WorkFiles"
$ftype = "*.txt"
$filenames = gci -Path $filefolder1 -Filter $ftype -Recurse
$replaceValues = Import-Csv -Path "C:\Work\NumberXrefList.CSV"
foreach ($file in $filenames) {
    $contents = Get-Content -Path $file
    foreach ($replaceValue in $replaceValues) {
        $contents = $contents -replace $replaceValue.OldValue, $replaceValue.NewValue
    }
    Copy-Item $file "$file.old"
    Set-Content -Path $file -Value $contents
}