I have several csv and txt files in a directory with data in them. I need to truncate the data from all of these files but leave the header in each.
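You can use the following script. It works as long as every file has more than one line; for a single-line file, Get-Content returns a plain string, so [0] would yield only its first character.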
$files = dir .\* -include ('*.csv', '*.txt')
foreach ($file in $files) {
$firstline = (get-content $file)[0]
set-content $file -Value $firstline
}
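The "more than one line" caveat comes from how indexing behaves on a string versus an array. A minimal sketch, using a hypothetical header line in place of a real file:

```powershell
# Simulates what Get-Content returns for a file with exactly one line:
# a plain string, not an array (the header text is made up for illustration).
$oneLine = 'Name,Value'

$wrong = $oneLine[0]      # indexing a string yields its first character
$right = @($oneLine)[0]   # @() forces an array, so [0] is the whole line

"$wrong"   # N
"$right"   # Name,Value
```

Wrapping the call as @(Get-Content $file)[0] should therefore make the script safe for single-line files as well.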
You do not need to read the whole file just to capture the first line.
Get-ChildItem -Path 'D:\Test' -File | Where-Object { $_.Extension -match '\.(csv|txt)$' } | ForEach-Object {
    # only read the first line using -TotalCount
    ($_ | Get-Content -TotalCount 1) | Set-Content -Path $_.FullName
}
The above could produce empty or whitespace-only files if the top line is empty or contains only whitespace.
In that case, the best option to quickly truncate these files to the first NON-EMPTY line would be:
Get-ChildItem -Path 'D:\Test' -File | Where-Object { $_.Extension -match '\.(csv|txt)$' } | ForEach-Object {
    $newcontent = switch -Regex -File $_.FullName {
        '\S' { $_ ; break }   # output the first line that is not empty or whitespace-only and exit the switch
    }
    # write back to the file
    $newcontent | Set-Content -Path $_.FullName
}
P.S. Using the -Filter parameter on Get-ChildItem would be faster, but unfortunately it accepts only ONE file pattern, like '*.csv'.
If you need recursion (searching subfolders as well), you could use the -Include parameter, which accepts an array of file patterns. However, for that to work, you also need to add the -Recurse switch OR have the path end in \*.
-Include is not as fast as -Filter; it is about the same speed as the Where-Object clause in the examples above.
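As a rough illustration of the -Include variant without recursion, here is a sketch that builds a scratch folder in the temp directory (the folder and file names are made up for the demo) and lets the path end in \*:

```powershell
# Scratch folder with hypothetical file names, only to demonstrate the behaviour.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
$null = New-Item -Path $dir -ItemType Directory
foreach ($name in 'a.csv', 'b.txt', 'c.log') {
    Set-Content -Path (Join-Path $dir $name) -Value 'header'
}

# The path ends in \* so that -Include is applied without -Recurse.
$found = Get-ChildItem -Path (Join-Path $dir '*') -Include '*.csv', '*.txt' -File

$found.Name   # lists a.csv and b.txt, but not c.log

# clean up the scratch folder
Remove-Item -LiteralPath $dir -Recurse -Force
```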
cd 'A:\P\E\D'
$files = Get-ChildItem . *.CSV -rec
ForEach ($file in $files) {
(Get-Content $file -Raw) | ForEach-Object {
*some simple code*
} | Set-Content $file
}
How to modify this powershell script to locate only files starting with letters A/a to O/o and ending with .csv in specified directory cd?
I thought the solution below would work, but the test file M_K_O_X.CSV stored in the cd directory was not found and modified. The solution above will find and modify the file. It's possible that I have the regex expression wrong or the problem is somewhere else? I tried also this regex -- "[A-O]..CSV"
cd 'A:\P\E\D'
$files = Get-ChildItem . -rec | Where-Object { $_.Name -like "[a-oA-O]*.*.CSV" }
ForEach ($file in $files) {
(Get-Content $file -Raw) | ForEach-Object {
*some simple code*
} | Set-Content $file
}
Looking at your wildcard pattern, it seems you have an extra *. that shouldn't be there:
'M_K_O_X.CSV' -like '[a-oA-O]*.*.CSV' # False
'M_K_O_X.CSV' -like '[a-oA-O]*.CSV' # True
In this case you could simply use the -Include parameter, which supports character ranges. Also, because PowerShell is case-insensitive by default, [a-oA-O]*.CSV can be reduced to [a-o]*.CSV:
Get-ChildItem 'A:\P\E\D' -Recurse -Include '[a-o]*.csv' | ForEach-Object {
($_ | Get-Content -Raw) | ForEach-Object {
# *some simple code*
} | Set-Content -LiteralPath $_.FullName
}
As commented, I would use the standard wildcard -Filter to select all files with a .csv extension,
then pipe to a Where-Object clause in which you can use the regex -match operator:
$files = Get-ChildItem -Path 'A:\P\E\D' -Filter '*.csv' -File -Recurse |
Where-Object { $_.Name -match '^[a-o]' }
foreach ($file in $files) {
# switch `-Raw` makes Get-Content return a single multiline string, so no need for a loop
$content = Get-Content -Path $file.FullName -Raw
# *some simple code manipulating $content*
$content | Set-Content -Path $file.FullName
}
However, if these are valid csv files, I would not recommend pure textual manipulation. Instead, use Import-Csv -Path $file.FullName and work on the properties of each of the objects returned.
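A minimal sketch of that object-based approach. The sample data and the column names Name and Qty are assumptions made up for illustration; a temp file stands in for one of the real csv files:

```powershell
# Hypothetical sample data written to a temp file.
$path = Join-Path ([System.IO.Path]::GetTempPath()) ('sample_{0}.csv' -f [System.Guid]::NewGuid())
'Name,Qty', 'Apple,1', 'Pear,2' | Set-Content -Path $path

$rows = Import-Csv -Path $path
foreach ($row in $rows) {
    # Import-Csv returns every value as a string, so cast before doing arithmetic
    $row.Qty = [int]$row.Qty * 10
}
# write the modified objects back as csv
$rows | Export-Csv -Path $path -NoTypeInformation

$check = Import-Csv -Path $path   # re-read to inspect the result
Remove-Item -LiteralPath $path
```

Working on properties this way keeps quoting and delimiters intact, which plain text replacement cannot guarantee.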
I have this file structure
In PowerShell my location is set to Folder. SubSubFolders has a lot of xml files, and I want to add a line there only if content of version.txt file is a and that line doesn't exist there already.
I was able to figure out how to change an xml file in a particular SubSubFolder, but I can't do it when I start in the Folder folder and have to take the version into consideration.
#here I need to add: only if version.txt content of xml file in parent folder is "a"
$files = Get-ChildItem -Filter *blah.xml -Recurse | Where{!(Select-String -SimpleMatch "AdditionalLine" -Path $_.fullname -Quiet)} | Format-Table FullName
foreach($file in $files)
{
(Get-Content $file.FullName | Foreach-Object { $_
if ($_ -match "AdditionalLineAfterThisLine")
{
"AdditionalLine"
}
}) | Set-Content $file.FullName
}
If I understand you correctly, you're looking for the following:
$files = (
Get-ChildItem -Filter *blah.xml -Recurse |
Where-Object{
-not ($_ | Select-String -SimpleMatch "AdditionalLine" -Quiet) -and
(Get-Content -LiteralPath "$($_.DirectoryName)/../version.txt") -eq 'a'
}
).FullName
Note that the assumption is that the version.txt file contains just one line. If it contains multiple lines, the -eq 'a' operation would act as a filter and return all lines whose content is 'a', which in the implied Boolean context of -and would yield $true if one or more such lines, potentially among others, exist.
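The filtering behavior of -eq with an array on the left-hand side can be seen in isolation. In this sketch the arrays stand in for what Get-Content would return; the values are made up:

```powershell
$oneLine   = 'a'             # what Get-Content returns for a one-line file
$manyLines = 'b', 'a', 'c'   # ... and for a multi-line file
$noMatch   = 'b', 'c'

$r1 = [bool]($oneLine -eq 'a')     # scalar comparison           -> True
$r2 = [bool]($manyLines -eq 'a')   # filter returns the one 'a'  -> True
$r3 = [bool]($noMatch -eq 'a')     # filter returns nothing      -> False

# A stricter variant that only looks at the first line:
$strict = @($manyLines)[0] -eq 'a' # first line is 'b'           -> False
```

If only the first line of version.txt should count, something like (Get-Content -LiteralPath $path -TotalCount 1) -eq 'a' would be one way to express that.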
I am trying to do a search for specific file within a directory with the following command:
gci -recurse -path "E:\" | select-string "searchContent" | select path
doing so gave me an insufficient memory error. I have seen other posts recommending piping it into foreach-object, but I couldn't figure out how to get it to work in my scenario. Any assistance appreciated!
When you read each file as a whole (a single multiline string), your search can be much faster than when testing line by line.
You could also speed things up significantly by using a filename pattern as a filter for the Get-ChildItem cmdlet. If, for instance, you only want to search through .txt files, add -Filter '*.txt'.
In any case, append the -File switch so Get-ChildItem won't pass DirectoryInfo objects to the rest of the code.
Try:
# since we use regular expression operator `-match`, escape the word or phrase you need to find
$searchContent = [regex]::Escape('whateveryouarelookingfor')
$result = Get-ChildItem -Path 'E:\' -Recurse -File | ForEach-Object {
if ((Get-Content -Path $_.FullName -Raw) -match $searchContent) { $_.FullName }
}
A bit faster than using ForEach-Object {..} would be to use a foreach() loop instead (this skips the processing time needed to pipe the results):
# since we use regular expression operator `-match`, escape the word or phrase you need to find
$searchContent = [regex]::Escape('whateveryouarelookingfor')
$result = foreach ($file in (Get-ChildItem -Path 'E:\' -Recurse -File)) {
if ((Get-Content -Path $file.FullName -Raw) -match $searchContent) { $file.FullName }
}
Now you can display the full path and filenames on screen
$result
and save it as text file on disk
$result | Set-Content -Path ('X:\FilesContaining_{0}.txt' -f $searchContent)
Just assign the result to a variable, then loop over it with foreach. Note that this matches on the file name, not on the file content:
$files = gci -recurse -path "E:\"
foreach ($fileName in $files)
{
if ($fileName.Name -like "*searchContent*")
{
write-host $fileName.Name
}
}
I feel this should consume less memory. Can't tell for sure but you can let me know. The concept is the same but using [System.IO.StreamReader].
Note: This will keep on looking for all files it can find, if you need the loop to stop at first finding then a new condition should be added.
foreach ($file in Get-ChildItem -Recurse -Path "E:\" -File)
{
    $reader = [System.IO.StreamReader]::new($file.FullName)
    try
    {
        while (-not $reader.EndOfStream)
        {
            if ($reader.ReadLine() -match 'searchContent')
            {
                $file.FullName
                break
            }
        }
    }
    finally
    {
        # always release the file handle, even if reading fails
        $reader.Dispose()
    }
}
I have about 2500 CSV files each around 20MB in terms of file size. I am trying to filter out certain rows from each file and save that to a new file.
So, if i have :
File 1 :
Row1
Row2
Row3
File 2 :
Row2
Row3
and so on..
If I filter all files and select "Row2" as the filter text, the new folder should contain all the files, each with only the rows that match the filter text.
Looking through some forums, I came up with the following, which might help me filter the rows, but I'm not sure how to do it recursively, and I also don't know whether this method is fast enough. Any help is appreciated.
Get-Content "C:\Path to file" | Where{$_ -match "Rowfiltertext*"} | Out-File "Path to Out file"
I'm using windows so I guess Powershell type of solution would be the best here.
The text to be filtered will always be in the first column.
Thanks
Siddhant
Here are two fast ways of searching for a string inside (text) files:
1) using switch
$searchPattern = [regex]::Escape('Rowfiltertext') # for safety escape regex special characters
$sourcePath = 'X:\Path\To\The\Csv\Files'
$outputPath = 'X:\FilteredCsv.txt'
# if you also need to search inside subfolders, append -Recurse to the Get-ChildItem cmdlet
Get-ChildItem -Path $sourcePath -Filter '*.csv' -File | ForEach-Object {
# iterate through the lines in the file and output the ones that match the search pattern
switch -Regex -File $_.FullName {
$searchPattern { $_ }
}
} | Set-Content -Path $outputPath # add -PassThru to also show on screen
2) using Select-String
$searchPattern = [regex]::Escape('Rowfiltertext') # for safety escape regex special characters
$sourcePath = 'X:\Path\To\The\Csv\Files'
$outputPath = 'X:\FilteredCsv.txt'
# if you also need to search inside subfolders, append -Recurse to the Get-ChildItem cmdlet
Get-ChildItem -Path $sourcePath -Filter '*.csv' -File | ForEach-Object {
($_ | Select-String -Pattern $searchPattern).Line
} | Set-Content -Path $outputPath # add -PassThru to also show on screen
In case you want to output a new csv file for every original file, use:
3) using switch
$searchPattern = [regex]::Escape('Rowfiltertext') # for safety escape regex special characters
$sourcePath = 'X:\Path\To\The\Csv\Files'
$outputPath = 'X:\FilteredCsv'
if (!(Test-Path -Path $outputPath -PathType Container)) {
$null = New-Item -Path $outputPath -ItemType Directory
}
# if you also need to search inside subfolders, append -Recurse to the Get-ChildItem cmdlet
(Get-ChildItem -Path $sourcePath -Filter '*.csv' -File) | ForEach-Object {
# create a full target filename for the filtered output csv
$outFile = Join-Path -Path $outputPath -ChildPath ('New_{0}' -f $_.Name)
# iterate through the lines in the file and output the ones that match the search pattern
$result = switch -Regex -File $_.FullName {
$searchPattern { $_ }
}
$result | Set-Content -Path $outFile # add -PassThru to also show on screen
}
4) using Select-String
$searchPattern = [regex]::Escape('Rowfiltertext') # for safety escape regex special characters
$sourcePath = 'X:\Path\To\The\Csv\Files'
$outputPath = 'X:\FilteredCsv'
# make sure the output folder exists, as in option 3
if (!(Test-Path -Path $outputPath -PathType Container)) {
    $null = New-Item -Path $outputPath -ItemType Directory
}
# if you also need to search inside subfolders, append -Recurse to the Get-ChildItem cmdlet
(Get-ChildItem -Path $sourcePath -Filter '*.csv' -File) | ForEach-Object {
    # create a full target filename for the filtered output csv
    $outFile = Join-Path -Path $outputPath -ChildPath ('New_{0}' -f $_.Name)
    ($_ | Select-String -Pattern $searchPattern).Line | Set-Content -Path $outFile # add -PassThru to also show on screen
}
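Since the question states that the filter text is always in the first column, the pattern could be anchored so that matches in later columns are ignored. A sketch, assuming comma-delimited files whose first field is not quoted; the sample lines and the $needle variable are hypothetical:

```powershell
$needle  = 'Row2'   # hypothetical filter text
# '^' anchors at the start of the line; '(,|$)' requires the field to end there
$pattern = '^' + [regex]::Escape($needle) + '(,|$)'

$lines = 'Row2,10,abc', 'Row1,Row2,xyz', 'Row22,5,def', 'Row2'

# with an array on the left, -match returns the matching elements
$kept = $lines -match $pattern

$kept   # Row2,10,abc and Row2
```

The resulting $pattern could be used in place of $searchPattern in any of the four variants above.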
Hope that helps
Re. "fast enough method":
Get-Content can be very slow on large files, because by default it sends every line down the pipeline as a separate object.
You could use "System.IO.StreamReader" instead, i.e. read the complete file content into a string, then split this string into rows, e.g.:
# $Csv is assumed to be a FileInfo object, e.g. the current item from Get-ChildItem
[System.IO.FileStream]$objFileStream = New-Object System.IO.FileStream($Csv.FullName, [System.IO.FileMode]::Open, [System.IO.FileAccess]::Read, [System.IO.FileShare]::ReadWrite)
[System.IO.StreamReader]$objStreamReader = New-Object System.IO.StreamReader($objFileStream, [System.Text.Encoding]::UTF8)
$strFileContent = $objStreamReader.ReadToEnd()
# disposing the reader also closes the underlying stream
$objStreamReader.Dispose()
$objFileStream.Dispose()
# split on CRLF or LF so both Windows and Unix line endings are handled
[string[]]$arrFileContent = $strFileContent -split '\r?\n'
How can I change the following code to look at all the .log files in the directory and not just the one file?
I need to loop through all the files and delete all lines that do not contain "step4" or "step9". Currently this will create a new file, but I'm not sure how to use the for each loop here (newbie).
The actual files are named like this: 2013 09 03 00_01_29.log. I'd like the output files to either overwrite them, or to have the SAME name, appended with "out".
$In = "C:\Users\gerhardl\Documents\My Received Files\Test_In.log"
$Out = "C:\Users\gerhardl\Documents\My Received Files\Test_Out.log"
$Files = "C:\Users\gerhardl\Documents\My Received Files\"
Get-Content $In | Where-Object {$_ -match 'step4' -or $_ -match 'step9'} | `
Set-Content $Out
Give this a try (pick one of the two Set-Content lines, depending on whether you want to overwrite or create new files):
Get-ChildItem "C:\Users\gerhardl\Documents\My Received Files" -Filter *.log |
    Foreach-Object {
        $content = Get-Content $_.FullName
        # filter and save content to the original file
        $content | Where-Object { $_ -match 'step[49]' } | Set-Content $_.FullName
        # filter and save content to a new file next to the original (not in the current directory)
        $content | Where-Object { $_ -match 'step[49]' } | Set-Content (Join-Path $_.DirectoryName ($_.BaseName + '_out.log'))
    }
To get the content of a directory you can use
$files = Get-ChildItem "C:\Users\gerhardl\Documents\My Received Files\"
Then you can loop over this variable as well:
for ($i=0; $i -lt $files.Count; $i++) {
$outfile = $files[$i].FullName + "out"
Get-Content $files[$i].FullName | Where-Object { ($_ -match 'step4' -or $_ -match 'step9') } | Set-Content $outfile
}
An even easier way to put this is the foreach loop (thanks to #Soapy and #MarkSchultheiss):
foreach ($f in $files){
$outfile = $f.FullName + "out"
Get-Content $f.FullName | Where-Object { ($_ -match 'step4' -or $_ -match 'step9') } | Set-Content $outfile
}
If you need to loop through a directory recursively for a particular kind of file, use the command below, which selects all files of the doc file type:
$fileNames = Get-ChildItem -Path $scriptPath -Recurse -Include *.doc
If you need to filter on multiple types, use the command below.
$fileNames = Get-ChildItem -Path $scriptPath -Recurse -Include *.doc,*.pdf
The $fileNames variable now acts as an array that you can loop over to apply your business logic.
Other answers are great, I just want to add... a different approach usable in PowerShell:
Install GNUWin32 utils and use grep to view the lines / redirect the output to file http://gnuwin32.sourceforge.net/
This overwrites the new file every time:
grep "step[49]" logIn.log > logOut.log
This appends the log output, in case you overwrite the logIn file and want to keep the data:
grep "step[49]" logIn.log >> logOut.log
Note: to be able to use GNUWin32 utils globally you have to add the bin folder to your system path.