Find lines with specific characters recursively - powershell

I am searching for all lines with '.png' and '.jpg' strings in them across multiple folders of TXT files.
Tried:
(Get-ChildItem K:\FILES -Recurse -Include '*.txt') | ForEach-Object {
(Get-Content $_) -match '\.png','\.jpg' | out-file K:\Output.txt
}
but it does not output anything. No error either. I did something similar recently and it was working. I am scratching my head wondering what am I doing wrong here...

By placing your Out-File call inside the ForEach-Object script block, you're rewriting your output file in full for every input file, so that the last input file's results - which may be none - end up as the sole content of the file.
The immediate fix is to move the Out-File call to its own pipeline segment, so that it receives all output, across all files:
Get-ChildItem K:\FILES -Recurse -Include '*.txt' |
ForEach-Object {
#(Get-Content $_) -match '\.png', '\.jpg'
} |
Out-File K:\Output.txt
Note: Technically, adding -Append to your Out-File call inside the ForEach-Object could have worked too, but this approach should be avoided:
Every Out-File call must open and close the output file, which makes the operation much slower.
You need to ensure that there is no preexisting output file beforehand - otherwise you'll end up appending to that file's existing content.
However, consider speeding up your command with the help of Select-String:
Get-ChildItem K:\FILES -Recurse -Include '*.txt' |
Select-String -Pattern '\.png', '\.jpg' |
ForEach-Object Line |
Out-File K:\Output.txt
Note:
In PowerShell (Core) 7+, you can use the -Raw switch with Select-String, which directly outputs only the text of all matching lines, in which case ForEach-Object Line isn't needed.
If you want to prefix each matching line with the source file path:
Get-ChildItem K:\FILES -Recurse -Include '*.txt' |
Select-String -Pattern '\.png', '\.jpg' |
ForEach-Object { '{0}: {1}' -f $_.Path, $_.Line } |
Out-File K:\Output.txt
Note: If you pipe Select-String output directly (without -Raw or ForEach-Object Line) to Out-File (or if you use >), you'll get similar output (even including a character position), but with limitations:
You'll get a blank line at the top and the bottom of the file.
Long line texts may be truncated.
The reason is that Out-File and its virtual alias > send the for-display representations of the input objects to the output file, which aren't meant for programmatic processing and can incur truncation of the data based on the line length (number of columns) of the current console window.

Related

Powershell how to get-content across several subfolders

I'm working on a script to output some data from multiple files based on a string search. It outputs the string found, followed by the following six characters. I can get this to work for an exact location. However, I want to search across files inside multiple subfolders in the path. Using the below script, I get PermissionDenied errors...
[regex] $pattern = '(?<=(a piece of text))(?<chunk>.*)'
Get-Content -Path 'C:\Temp\*' |
ForEach-Object {
if ($_ -match $pattern) {
$smallchunk = $matches.chunk.substring(0, 6)
}
}
"$smallchunk" | Out-File 'C:\Temp\results.txt'
If I change -Path to one of the subfolders, it works fine, but I need it to go inside each subfolder and execute the get-content.
e.g., look inside...
C:\Temp\folder1\*
C:\Temp\folder2\*
C:\Temp\folder3\*
And so on...
Following up on boxdog's suggestion of Select-String, the only limitation would be folder recursion. Unfortunately, Select-String only allows the searching of multiple files in one directory.
So, the way around this is piping the output of Get-ChildItem with a -Recurse switch into Select-String:
$pattern = "(?<=(a piece of text))(?<chunk>.*)"
Get-ChildItem -Path "C:\Temp\" -Exclude "results.txt" -File -Recurse |
Select-String -Pattern $pattern |
ForEach-Object -Process {
$_.Matches[0].Groups['chunk'].Value.Substring(0,6)
} | Out-File -FilePath "C:\Temp\results.txt"
If there's a need for the result to be saved to $smallchunk you can still do so inside the loop if need be.
Abraham Zinala's helpful answer is the best solution to your problem, because letting Select-String search your files' content is faster and more memory-efficient than reading and processing each line with Get-Content.
As for what you tried:
Using the below script I get PermissionDenied errors...
These stem from directories being among the file-system items output by Get-ChildItem, which Get-Content cannot read.
If your files have distinct filename extensions that your directories don't, one option is to pass them to the (rarely used with Get-Content) -Include parameter; e.g.:
Get-Content -Path C:\Temp\* -Include *.txt, *.c
However, as with Select-String, this limits you to a single directory's content, and it doesn't allow you to limit processing to files fundamentally, if extension-based filtering isn't possible.
For recursive listing, you can use Get-ChildItem with -Recurse, as in Abraham's answer, and pipe the file-info objects to Get-Content:
Get-ChildItem -Recurse C:\Temp -Include *.txt, *.c | Get-Content
If you want to simply limit output to files, whatever their name is, use the -File switch (similarly, -Directory limits output to directories):
Get-ChildItem -File -Recurse C:\Temp | Get-Content

Powershell Get -ChildItem: filtering csv files and -Recurse not working

I created a short powershell script to convert csv files from Unicode to UTF-8 encoding. My script outputs new files with the the original file name preceded by UTF8. I'm running into two issues:
I'm trying to only run the powershell script on csv files. Currently the script runs on every file in the directory, including the powershell script (it outputs a new file called UTF8pshell_script if the powershell script was called pshell_script for example). The other methods where I've tried to only run the script on csv files just end up making the script not do anything.
I'm trying to run the script on sub-directories. The first issue is that output files created from csv files in subdirectories have no content inside them whatsoever. If the script is ran in the same directory as the csv file this problem does not arise. This is not crucial but I am also uncertain how to get output files created from those in subdirectories to be outputted in the same subdirectories (currently they are outputted in the main directory where the powershell script is).
as
Get-Content -Encoding Unicode $_ | Out-File -Encoding UTF8
Get-ChildItem -Recurse | ForEach-Object {Get-Content -Encoding Unicode $_ | Out-File -Encoding UTF8 "UTF8$_"}
The desired output is the powershell script running on only csv files, and outputting files to the same subdirectories where the files they were created form are.
Get-ChildItem takes a -Filter parameter, which for files is the simple wildcard pattern. This will allow you to restrict your cmdlet to CSV files only:
Get-ChildItem -Filter *.csv
To process subdirectories, you may also use the -Recurse switch
Get-ChildItem -Filter *.csv -Recurse
Now, I'm never quite sure how $_ changes as you pass different objects through the pipe, so I'm probably not doing the next steps the most efficient way - but it will be clear what I'm trying to do:
Each file object that we find needs to be processed as follows:
Dissect it into a path and a filename: $filepath = $_.PSParentPath; $filename = $_.PSChildName
Load up the CSV: Import-CSV -Path $_
Output the new CSV with the proper encoding: Export-CSV -Path ("{0}\UTF8{1}" -f $filepath,$filename) -Encoding UTF8
So, we put it all together:
Get-ChildItem -Filter *.csv -Recurse -exclude UTF8* | ForEach-Object {
$filepath = $_.PSParentPath
$filename = $_.PSChildName
Import-CSV -Path $_ |
Export-CSV -Encoding UTF8 -Path ("{0}\UTF8{1}" -f $filepath,$filename) -NoTypeInformation
}
The -Exclude UTF8* in the Get-ChildItem ensures that when you create a file, it doesn't get picked up later and re-processed. The -NoTypeInformation on the Export-CSV compensates for a stupidity built in to the cmdlet that causes an extra line with a meaningless object type name at the beginning of the file.
Depending on the original encoding (and presence of a BOM) you might have to specify an encoding also on the input side.
ForEach($Csv in (Get-ChildItem -Filter *.csv -Recurse -Exclude UTF8*)){
(Get-Content $Csv.FullName -raw) |
Set-Content -Path {Join-Path $Csv.Directory ("UTF8"+$Csv.Name)} -Encoding UTF8
}
LotPings beat me to this by 10 minutes with a virtually identical answer, but I'm leaving this for the 'passing an empty file to the pipeline' bit that I have. I also realize after the fact that you don't need a pipeline variable for that same reason, as you only need it if you pass things through the pipeline within the loop.
If all you want to do is change the encoding I would use a ForEach($x in $y){} loop, or a ForEach-Object{} loop with a PipelineVariable on the Get-ChildItem. I'll show that since I think pipeline variables are under used. I would also not read the file and pipe it to something, since if the file is empty you won't create a new file as nothing is passed down the pipeline.
Get-ChildItem *.csv -Recurse -PipelineVariable File | ForEach-Object{
Set-Content -Value (Get-Content $File.FullName -Encoding Unicode) -Path {Join-Path $File.Directory "UTF8$($File.Name)"} -Encoding UTF8
}
if you specify the file extension at the end of Get-ChildItem.
This will get only the files with the .csv extension.
By specifying the File path in Out-File it will send it to the specified directory.
Get-ChildItem -Path C:\folder\*.csv -Recurse | ForEach-Object {Get-Content -Encoding Unicode $_ | Out-File -FilePath C:\Folder -Encoding UTF8 "UTF8$_"}

One Line PowerShell Batch Rename based on File Contents

I have a list of EDI text files with specific text in them. Currently in order for our custom scripting to convert them into an SQL table, we need to be able to see the X12 file type in the filename. Because we are using SQL script to get the files into tables this solution needs to be a one line solution. We have a definition table of client files which specify which field terminator and file types to look for so we will be later substitute those values into the one line solution to be executed individually. I am currently looking at Powershell (v.3) to do this for maximum present and future compatibility. Also, I am totally new to Powershell, and have based my script generation on posts in this forum.
Files example
t.text.oxf.20170815123456.out
t.text.oxf.20170815234567.out
t.text.oxf.20170815345678.out
t.text.oxf.20170815456789.out
Search strings to find within files: (To find EDI X12 file type uniquely, which may be duplicated within the same file n times)
ST*867
ST*846
ST~867
ST~846
ST|867
ST|846
Here is what I have so far which does not show itself doing anything with the whatif parameter:
(Get-ChildItem .\ -recurse | Select-String -pattern 'ST~867' -SimpleMatch).Path | Foreach -Begin {$i=1} -Process {Rename-Item -LiteralPath $_ -NewName ($_ -replace 'out$','867.out' -f $i++) -whatif}
The fist part:
(Get-ChildItem .\ -recurse | Select-String -pattern 'ST~867' -SimpleMatch).Path
Simply gets a list of paths that we need to input to be renamed
The second part after the | pipe:
Foreach -Begin {$i=1} -Process {Rename-Item -LiteralPath $_ -NewName ($_ -replace '\.out','.867.out' -f $i++) -whatif}
will supposedly loop through that list and rename the files adding the EDI type to the end of the file. I have tried 'out$','867.out' with no change.
Current Errors:
The first part shows duplicated path elements probably because there are multiple Transaction Set Headers in the files, is there any way to force it to be unique?
The command does not show any Errors (red text) but with the whatif parameter shows that it does not rename any files (tried running it without as well).
1) remove duplicates using -List switch in Select-String
2) you need to really pipe the objects into the for loop
Try this?
Select-String -Path .\*.out -pattern 'ST~867' -SimpleMatch -List | Select-Object Path | ForEach-Object { Rename-Item $_.path ($_.path -replace 'out$','867.out') }

Select-Object omitting output from Select-String powershell

I wrote a Powershell script that looks for a string (such as ERROR) in log files and grabs those lines and outputs those lines to a file, for simpler reading and such (the industry I'm in has VERY large log files), but I'm having an issue. Before, when the (relevant) part of the code looked like this:
Select-String -Path "$file" -Pattern "$string" -CaseSensitive | Out-File -filepath $filepath
It would output the file path, the line number, and then the actual line, making for a very cluttered file. Well I only needed the line and the line number, so I did this:
Select-String -Path "$file" -Pattern "$string" -CaseSensitive | Select-Object -Property LineNumber,Line | Out-File -filepath $filepath
Which would return lines looking like this:
978 2017-07-10 10:46:11,288 ERROR [Music...
That is the line number then the line, with the line only totaling 35 characters.
Before I piped Select-String to Select-Object, the script would output the whole line, but now with Select-Object it omits some output. I tried adding -verbose parameters to both Select-String and Select-Object, but that did nothing.
Can you try this :
Select-String -Path "test.xml" -Pattern "ERROR" -CaseSensitive | ft -Property LineNumber,Line -Wrap | Out-File -FilePath c:\out.txt
The reason for your problem is screen buffer length(increasing powershell screen buffer width) ,you can change it as well but the above snippet is simpler and effective

How do I remove carriage returns from text file using Powershell?

I'm outputting the contents of a directory to a txt file using the following command:
$SearchPath="c:\searchpath"
$Outpath="c:\outpath"
Get-ChildItem "$SearchPath" -Recurse | where {!$_.psiscontainer} | Format-Wide -Column 1'
| Out-File "$OutPath\Contents.txt" -Encoding ASCII -Width 200
What I end up with when I do this is a txt file with the information I need, but it adds numerous carriage returns I don't need, making the output harder to read.
This is what it looks like:
c:\searchpath\directory
name of file.txt
name of another file.txt
c:\searchpath\another directory
name of some file.txt
That makes a txt file that requires a lot of scrolling, but the actual information isn't that much, usually a lot less than a hundred lines.
I would like for it to look like:
c:\searchpath\directory
nameoffile.txt
c:\searchpath\another directory
another file.txt
This is what I've tried so far, not working
$configFiles=get-childitem "c:\outpath\*.txt" -rec
foreach ($file in $configFiles)
{
(Get-Content $file.PSPath) |
Foreach-Object {$_ -replace "'n", ""} |
Set-Content $file.PSPath
}
I've also tried 'r but both options leave the file unchanged.
Another attempt:
Select-String -Pattern "\w" -Path 'c:\outpath\contents.txt' | foreach {$_.line}'
| Set-Content -Path c:\outpath\contents2.txt
When I run that string without the Set-content at the end, it appears exactly as I need it in the ISE, but as soon as I add the Set-Content at the end, it once agains carriage returns where I don't need them.
Here's something interesting, if I create a text file with a few carriage returns and a few tabs, then if I use the same -replace script I've been using, but uset to replace the tabs, it works perfect. Butr and n do not work. It's almost as though it doesn't recognize them as escape characters. But if I addr and `n in the txt file then run the script, it still doesn't replace anything. Doesn't seem to know what to do with it.
Set-Content adds newlines by default. Replacing Set-Content by Out-File in your last attempt in your question will give you the file you want:
Select-String -Pattern "\w" -Path 'c:\outpath\contents.txt' | foreach {$_.line} |
Out-File -FilePath c:\outpath\contents2.txt
It's not 'r (apostrophe), it's a back tick: `r. That's the key above the tab key on the US keyboard layout. :)
You can simply avoid all those empty lines by using Select-Object -ExpandProperty Name:
Get-ChildItem "$SearchPath" -Recurse |
Where { !$_.PSIsContainer } |
Select-Object -ExpandProperty Name |
Out-File "$OutPath\Contents.txt" -Encoding ASCII -Width 200
... if you don't need the folder names.