First I would like to generate a directory listing for all text files in a directory. Next take each of those text files and get-contents. Then I want to go through the contents and search for all text files in another directory which share contents with the first file and output those corresponding matches to a file named after the source file.
I'm very new at all of this and realize I'm missing some serious fundamental chunks of knowledge. What little scripting experience I have is in Javascript which doesn't seem entirely transferable. (Although programming is programming I'm told.)
This is what I have so far:
$max = get-content h:test1\one.txt | Measure-Object
$A = get-content h:test1\one.txt
For($i=0; $i -lt $max.count ; $i++){
select-string h:test2\*.txt -pattern $($A[$i]) | Format-Table | Out-File ($i + '.txt')
}
I'm hoping for something like:
$max = get-content $files[i] | Measure-Object
$A = get-content files[i]
For($j=0; $j -lt $max.count ; $j++){
select-string h:test2\*.txt -pattern $($A[$j]) | Format-Table | Out-File($files[i].basename + $j + '.txt')
}
Any and all help would be extremely appreciated,
Kurtis
So, for example:
Book 1 (one.txt)
The capital of France is Paris.
The population of Paris is twelve.
Book 2 (two.txt)
France is a beautiful country.
The capital of France is Paris.
I basically want a report of the fact that two.txt shares a line with one.txt.
First I would like to generate a directory listing for all text files in a directory
Here's how:
$textFiles1 = dir -Path C:\Books1 -Filter *.txt
$textFiles2 = dir -Path C:\Books2 -Filter *.txt
Next take each of those text files and get-contents.
I want to see whether any lines from the first book are in any of the other books.
Here's an algorithm to do this (tested):
foreach ($textFile in $textFiles1) {
    # Use FullName so the file is found regardless of the current directory
    $lines = Get-Content -Path $textFile.FullName
    foreach ($line in $lines) {
        foreach ($textFile2 in $textFiles2) {
            $lines2 = Get-Content -Path $textFile2.FullName
            if ($lines2 -contains $line) {
                $matchMessage = 'Line: "{0}" is duplicated in "{1}".' -f $line, $textFile2
                $matchMessage | Out-File C:\report.txt -Encoding UTF8 -Append
            }
        }
    }
}
notepad C:\report.txt
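The nested loops above re-read every file in the second folder once for each line of the first book. Below is a minimal sketch of a variant that reads each file only once and looks lines up in a hashtable. Get-SharedLine and its parameter names are hypothetical, blank lines are skipped as an assumption, and the comparison is case-insensitive (as with -contains) because PowerShell hashtable literals ignore case.

```powershell
# Hypothetical helper: report lines that a source file shares with files in another folder.
function Get-SharedLine {
    param(
        [string] $SourceDir,   # folder holding the "first" books (placeholder name)
        [string] $SearchDir    # folder holding the books to compare against
    )
    # Read every candidate file once, up front.
    $candidates = foreach ($f in Get-ChildItem -Path $SearchDir -Filter *.txt) {
        [pscustomobject]@{ Name = $f.Name; Lines = @(Get-Content -LiteralPath $f.FullName) }
    }
    foreach ($src in Get-ChildItem -Path $SourceDir -Filter *.txt) {
        # Index the non-blank lines of the source file for fast lookups.
        $seen = @{}
        foreach ($line in Get-Content -LiteralPath $src.FullName) {
            if ($line.Trim()) { $seen[$line] = $true }
        }
        foreach ($c in $candidates) {
            foreach ($line in $c.Lines) {
                if ($line -and $seen.ContainsKey($line)) {
                    # One output object per shared line.
                    [pscustomobject]@{ Source = $src.Name; MatchedFile = $c.Name; Line = $line }
                }
            }
        }
    }
}
```

Because the function emits objects rather than text, the results can be piped to Export-Csv, or grouped by Source to write one report per source file.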
OK let's break this down:
First I would like to generate a directory listing for all text files in a directory. Next take each of those text files and get-contents.
dir *.txt | get-content
Then I want to go through the contents and search for all text files
in another directory which share contents with the first file
OK, now we pipe all that text to select-string (first filtering out all empty strings with ?{$_}):
dir *.txt | get-content | ?{$_} | %{select-string -path searchPath\*.txt -pattern "$_" -simple}
and output those corresponding matches to a file named after the
source file.
So now it gets tricky because we have to go back and track our source file name. We do this by wrapping our query in a foreach (i.e. %{} ):
dir *.txt | %{ $sourceFile = $_; get-content $_ | ?{$_} | %{select-string -path searchPath\*.txt -pattern "$_" -simple} | out-file "$sourceFile.results" }
Data mapping project, in-house system to new vendor system. The first step is to find all occurrences of the current database field names (column names, to be precise) in the C# .cs source files. I'm trying to use PowerShell. I have recently created PS searches with Get-ChildItem and Select-String that work well, but the search-string array was small and easily hard-coded inline. The application being ported, however, has a couple hundred column names and a significant amount of code. So, armed with a text file of all the column names, the pipeline would seem like a good tool to create the basic cross-reference for further analysis. However, I was not able to get the pipeline to work with an external variable anywhere other than the first step. I tried using -PipelineVariable, $_, and a global variable. I did not find anything specific after lots of searching. P.S. This is my first question on Stack Overflow, be kind please.
Here is what I hoped would work, but no dice so far.
$inputFile = "C:\DataColumnsNames.txt"
$outputFile = "C:\DataColumnsUsages.txt"
$arr = [string[]](Get-Content $inputfile)
foreach ($s in $arr) {
Get-ChildItem -Path "C:ProjectFolder\*" -Filter *.cs -Recurse -ErrorAction SilentlyContinue -Force |
Select-String $s | Select-Object Path, LineNumber, line | Export-csv $outputfile
}
Did find that this will print the list one time but not twice. In fact it seems using the variable in this way results in processing simply skipping any further pipeline steps.
foreach ($s in $arr) {Write-Host $s | Write $s}
If it isn't possible to do this easily in PowerShell, my fallback is to do it in C#, although I would much rather level up with PowerShell if anyone can point me to the correct understanding of how to do things in the pipeline, or alternatively construct an equivalent function. It seems like such a natural fit for PowerShell.
Thanks.
You're calling Export-csv $outputfile in a loop, which rewrites the whole file in every iteration, so that only the last iteration's output will end up in the file.
While you could use -Append to iteratively append to the output file, it is worth taking a step back: Select-String can accept an array of patterns, causing a line that matches any of them to be considered a match.
Therefore, your code can be simplified as follows:
$inputFile = 'C:\DataColumnsNames.txt'
$outputFile = 'C:\DataColumnsUsages.txt'
Get-ChildItem C:\ProjectFolder -Filter *.cs -Recurse -Force -ea SilentlyContinue |
Select-String -Pattern (Get-Content $inputFile) |
Select-Object Path, LineNumber, line |
Export-csv $outputfile
-Pattern (Get-Content $inputFile) passes the lines of input file $inputFile as an array of patterns to match.
By default, these lines are interpreted as regexes (regular expressions); to ensure that they're treated as literals, add -SimpleMatch to the Select-String call.
This answer to a follow-up question shows how to include the specific pattern among the multiple ones passed to -Pattern that matched on each line in the output.
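As a minimal sketch of the points above: the lines of the names file are passed as an array to -Pattern, -SimpleMatch treats them as literals, and the Pattern property of each match object is included so the report shows which name was found. Find-ColumnUsage and its parameter names are hypothetical, and skipping blank lines in the names file is an added assumption.

```powershell
# Hypothetical helper; $NamesFile and $SourceDir are placeholder parameters.
function Find-ColumnUsage {
    param(
        [string] $NamesFile,   # text file with one column name per line
        [string] $SourceDir    # root folder containing the .cs files
    )
    # Drop blank lines so they are not passed along as empty patterns.
    $patterns = @(Get-Content -LiteralPath $NamesFile | Where-Object { $_.Trim() })

    Get-ChildItem -Path $SourceDir -Filter *.cs -Recurse |
        Select-String -Pattern $patterns -SimpleMatch |
        Select-Object Path, LineNumber, Pattern, Line
}
```

The output can then be written once, outside any loop, for example with `Find-ColumnUsage C:\DataColumnsNames.txt C:\ProjectFolder | Export-Csv C:\DataColumnsUsages.txt -NoTypeInformation`.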
I think you want to append each occurrence to the csv file. And you need to get the content of the file. Try this:
$inputFile = "C:\DataColumnsNames.txt"
$outputFile = "C:\DataColumnsUsages.txt"
$arr = [string[]](Get-Content $inputFile)
foreach ($s in $arr) {
    Get-ChildItem -Path "C:\ProjectFolder\*" -Filter *.cs -Recurse -ErrorAction SilentlyContinue -Force | ForEach-Object {
        # Search the file directly so that Path and LineNumber refer to the file
        Select-String -Path $_.FullName -Pattern $s | Select-Object Path, LineNumber, Line | Export-Csv -Append -Path $outputFile
    }
}
Export-Csv's -Append parameter was introduced in PowerShell v3.0 (Windows 8); on earlier versions, try this instead:
$inputFile = "C:\DataColumnsNames.txt"
$outputFile = "C:\DataColumnsUsages.txt"
$arr = [string[]](Get-Content $inputFile)
foreach ($s in $arr) {
    Get-ChildItem -Path "C:\ProjectFolder\*" -Filter *.cs -Recurse -ErrorAction SilentlyContinue -Force | ForEach-Object {
        # Skip the CSV header row and append the data rows as plain text
        Select-String -Path $_.FullName -Pattern $s | Select-Object Path, LineNumber, Line | ConvertTo-Csv -NoTypeInformation | Select-Object -Skip 1 | Out-File -Append -FilePath $outputFile
    }
}
I have a text file that looks like this:
Data I'm NOT looking for
More data that doesn't matter
Even more data that I don't
&Start/Finally the data I'm looking for
&Data/More data that I need
&Stop/I need this too
&Start/Second batch of data I need
&Data/I need this too
&Stop/Okay now I'm done
Ending that I don't need
Here is what the output needs to be:
File1.txt
&Start/Finally the data I'm looking for
&Data/More data that I need
&Stop/I need this too
File2.txt
&Start/Second batch of data I need
&Data/I need this too
&Stop/Okay now I'm done
I need to do this for every file in a folder (sometimes there will be multiple files that need to be filtered). The file names can be incrementing, e.g. File1.txt, File2.txt, File3.txt.
This is what I have tried with no luck:
ForEach-Object{
$text -join "`n" -split '(?ms)(?=^&START)' -match '^&START' |
Out-File B:\PowerShell\$filename}
Thanks!
Looks like you were pretty close: your code correctly extracted the paragraphs of interest, but intra-paragraph out-filtering of non-&-starting lines was missing, and you needed to write to paragraph-specific output files:
$text -join "`n" -split '(?m)(?=^&Start)' -match '^&Start' |
ForEach-Object { $ndx=0 } { $_ -split '\n' -match '^&' | Out-File "File$((++$ndx)).txt" }
This creates sequentially numbered files starting with File1.txt for every paragraph of interest.
To do it for every file in a folder, with output filenames using fixed naming scheme File<n> across all input files (and thus cumulative numbering):
Get-ChildItem -File . | ForEach-Object -Begin { $ndx=0 } -Process {
(Get-Content -Raw $_) -split '(?m)(?=^&Start)' -match '^&Start' |
ForEach-Object { $_ -split '\n' -match '^&' | Out-File "File$((++$ndx)).txt" }
}
To do it for every file in a folder, with output filenames based on the input filenames and numbering per input file (PSv4+, due to use of -PipelineVariable):
Get-ChildItem -File . -PipelineVariable File | ForEach-Object {
(Get-Content -Raw $_) -split '(?m)(?=^&Start)' -match '^&Start' |
ForEach-Object {$ndx=0} { $_ -split '\n' -match '^&' | Out-File "$($File.Name)$((++$ndx)).txt" }
}
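To make the splitting technique used in this answer easier to follow, here is a minimal sketch that runs it on an in-memory string instead of files. The sample text is made up for illustration, and the line split uses '\r?\n' (an assumption, so that Windows line endings do not leave stray carriage returns).

```powershell
# Made-up sample text with two blocks of interest surrounded by noise.
$text = @'
Data I'm NOT looking for
&Start/first block
&Data/more
&Stop/end of first
noise between blocks
&Start/second block
&Stop/end of second
trailing noise
'@

# 1. Split in front of every line that begins with &Start (zero-width lookahead).
# 2. Keep only the chunks that begin with &Start.
# 3. Inside each chunk, keep only the lines that begin with &.
$blocks = $text -split '(?m)(?=^&Start)' -match '^&Start' | ForEach-Object {
    # The leading comma emits each block as a single array object.
    ,@($_ -split '\r?\n' -match '^&')
}

# $blocks[0] holds the three lines of the first block, $blocks[1] the two lines of the second.
```

Each element of $blocks can then be written to its own file, which is what the Out-File calls in the answer do.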
You posted a second question (against the rules) and it was deleted, but here is my quick answer to it. I hope it helps and gives you a better sense of how PS works:
$InputFile = "C:\temp\test\New folder (3)\File1.txt"
# get file content
$a=Get-Content $InputFile
# loop for every line in range 2 to last but one
for ($i=1; $i -lt ($a.count-1); $i++)
{
#getting the string part between & and /, and constructing the output file name
$OutFile = "$(Split-Path $InputFile)\$(($a[$i] -split '/')[0] -replace '&','').txt"
$a[0]| Out-File $OutFile #creating output file and write first line in it
$a[$i]| Out-File $OutFile -Append #write info line
$a[-1]| Out-File $OutFile -Append #write last line
}
Something like this?
$i=0
gci -path "C:\temp\ExplodeDir" -file | %{ (get-content -path $_.FullName -Raw).Replace("`r`n`r`n", ";").Replace("`r`n", "~").Split(";") | %{if ($_ -like "*Start*") {$i++; ($_ -split "~") | out-file "C:\temp\ResultFile\File$i.txt" }} }
Example Text
DESKTOP HARD DRIVE (S/N:9VMJ31W0)
I would like to find the text in the file
9VMJ31W0
Additional Example Text
SERVER HARD DRIVE (S/N:3NM2Y5HB)
SERVER HARD DRIVE (S/N:3NM2YXBD)
SERVER HARD DRIVE (S/N:6SD1MZFE)
SERVER HARD DRIVE (S/N:3NM2YX1Q)
SERVER HARD DRIVE (S/N:6SD1E8SA)
SERVER HARD DRIVE (S/N:3NM305ZQ)
SERVER HARD DRIVE (S/N:B365P760VG2F)
SERVER HARD DRIVE (S/N:B365P760VG54)
I would like the output file to read something like this
3NM2Y5HB
3NM2YXBD
6SD1MZFE
3NM2YX1Q
6SD1E8SA
3NM305ZQ
B365P760VG2F
B365P760VG54
then output this to a file in PowerShell preferably.
The files will be located in a specific folder, searching sub folders would be awesome but not necessary.
The output would be a single multiple lined .txt file.
Does anyone have an example file I could use to perform this? I found lots of things similar but nothing I was able to actually complete the whole task with.
#Clear output variable
$Output = @()
#Get your files
$Files = Get-ChildItem -Recurse -Path "*" -Exclude "Output.txt"
#Loop through files
$Files | ForEach-Object {
#Use Regular expression to match the desired serial number string
$Matched = Get-Content $_.FullName | Select-String -AllMatches 'S\/N:([A-Za-z0-9]*)'
#Loop through the matched strings
$Matched | ForEach-Object {
#Save to $Output the grouped (inner) string i.e. you want "9VMJ31W0" not "S/N:9VMJ31W0"
$Output += $_.Matches.Groups[1].Value
}
}
#Write output to file
$Output | Out-File Output.txt
A little searching pointed me to powershell's SubString method when dealing with strings. See this page for more information about strings.
PS C:\Scripts\updates> $f = gc C:\Scripts\p.txt
PS C:\Scripts\updates> $f
DESKTOP HARD DRIVE (S/N:9VMJ31W0)
DESKTOP HARD DRIVE (S/N:9VMJ31W1)
DESKTOP HARD DRIVE (S/N:9VMJ31W2)
PS C:\Scripts\updates> $f | GM
(Truncated)
Substring Method string Substring(int startIndex), str
PS C:\Scripts\updates> $f.substring(24,8) | out-file C:\Temp\HDDSerials.txt
PS C:\Scripts\updates> Get-Content C:\Temp\HDDSerials.txt
9VMJ31W0
9VMJ31W1
9VMJ31W2
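Note that Substring(24,8) relies on the serial number always starting at offset 24 and always being 8 characters long, which holds for the DESKTOP lines but not for the SERVER lines in the question. A minimal sketch of a position-independent variant, still using only string methods, follows. Get-SerialFromLine is a hypothetical helper name, and it assumes the serial sits between "S/N:" and the next ")".

```powershell
# Hypothetical helper: extract the text between "S/N:" and the following ")".
function Get-SerialFromLine {
    param([string] $Line)
    $marker = 'S/N:'
    $start = $Line.IndexOf($marker)
    if ($start -lt 0) { return }           # no serial on this line: emit nothing
    $start += $marker.Length               # move past the marker itself
    $end = $Line.IndexOf(')', $start)
    if ($end -lt 0) { $end = $Line.Length } # no closing parenthesis: take the rest
    $Line.Substring($start, $end - $start)
}

# Get-SerialFromLine 'DESKTOP HARD DRIVE (S/N:9VMJ31W0)'      returns 9VMJ31W0
# Get-SerialFromLine 'SERVER HARD DRIVE (S/N:B365P760VG2F)'   returns B365P760VG2F
```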
For starters, this will output all the lines that contain "DESKTOP HARD DRIVE" in all the files in "D:\MyFolder" and subfolders (-Recurse) and write them to "D:\MyFolder\Output.txt" (overwriting any existing file).
Get-ChildItem -Recurse -Path "D:\MyFolder" -Exclude "Output.txt" |
% {Get-Content $_.FullName | Where-Object {$_ -like '*DESKTOP HARD DRIVE*'} |
Select-Object} | Out-File "D:\MyFolder\Output.txt"
Best to send output to a separate folder, or use the -Exclude to exclude it from being processed.
Regular Expressions are the way to do this as it doesn't matter where in the file the Serial number is, it will find it:
#Clear output variable
$Output = @()
#Get your files
$Files = Get-ChildItem -Recurse -Path "*" -Exclude "Output.txt"
#Loop through files
$Files | ForEach-Object {
#Use Regular expression to match the desired serial number string
$Matched = Get-Content $_.FullName | Select-String -AllMatches 'S\/N:([A-Za-z0-9]*)'
#Loop through the matched strings
$Matched | ForEach-Object {
#Save to $Output the grouped (inner) string i.e. you want "9VMJ31W0" not "S/N:9VMJ31W0"
$Output += $_.Matches.Groups[1].Value
}
}
#Write output to file
$Output | Out-File Output.txt
If you want to more specifically match "DESKTOP HARD DRIVE (S/N:9VMJ31W0)" and not just "S/N:9VMJ31W0" then you change the matches to:
Select-String -AllMatches 'DESKTOP HARD DRIVE \(S\/N:([A-Za-z0-9]*)\)'
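To see what the capture group returns without touching any files, the same idea can be tried on a few in-memory strings. This is a minimal sketch: the sample lines are adapted from the question, the third one is made up to show two serials on one line, and the pattern uses + instead of * (an assumption, so that an empty serial is not matched).

```powershell
# Sample input; the last line is invented to demonstrate -AllMatches.
$lines = @(
    'SERVER HARD DRIVE (S/N:3NM2Y5HB)',
    'no serial on this line',
    'SERVER HARD DRIVE (S/N:B365P760VG2F) and SPARE (S/N:6SD1MZFE)'
)

# Each MatchInfo object carries a Matches collection; group 1 is the part in parentheses.
$serials = $lines |
    Select-String -Pattern 'S/N:([A-Za-z0-9]+)' -AllMatches |
    ForEach-Object { $_.Matches } |
    ForEach-Object { $_.Groups[1].Value }

# $serials now contains 3NM2Y5HB, B365P760VG2F and 6SD1MZFE.
```

Enumerating $_.Matches before reading Groups[1] picks up every serial on a line, not only the first one.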
Here is a one-liner:
Select-String -Path C:\temp\files\*.txt -Exclude output.txt -Pattern '(?<=S/N:)\w+(?=\))' -AllMatches |
Select-Object -ExpandProperty matches |
Select-Object -ExpandProperty Value |
Out-File -FilePath C:\temp\files\output.txt -Append
This uses a lookbehind to find the text after S/N: and a lookahead for the closing parenthesis ).
Note: this assumes that your text is stored in text files *.txt
I have a list of strings in a CSV file. The format is:
OldValue,NewValue
223134,875621
321321,876330
....
and the file contains a few hundred rows (each OldValue is unique). I need to process changes over a number of text files in a number of folders & subfolders. My best guess of the number of folders, files, and lines of text are - 15 folders, around 150 text files in each folder, with approximately 65,000 lines of text in each folder (between 400-500 lines per text file).
I will make 2 passes at the data, unless I can do it in one. First pass is to generate a text file I will use as a check list to review my changes. Second pass is to actually make the change in the file. Also, I only want to change the text files where the string occurs (not every file).
I'm using the following Powershell script to go through the files & produce a list of the changes needed. The script runs, but is beyond slow. I haven't worked on the replace logic yet, but I assume it will be similar to what I've got.
# replace a string in a file with powershell
[reflection.assembly]::loadwithpartialname("Microsoft.VisualBasic") | Out-Null
Function Search {
# Parameters $Path and $SearchString
param ([Parameter(Mandatory=$true, ValueFromPipeline = $true)][string]$Path,
[Parameter(Mandatory=$true)][string]$SearchString
)
try {
#.NET FindInFiles Method to Look for file
[Microsoft.VisualBasic.FileIO.FileSystem]::GetFiles(
$Path,
[Microsoft.VisualBasic.FileIO.SearchOption]::SearchAllSubDirectories,
$SearchString
)
} catch { $_ }
}
if (Test-Path "C:\Work\ListofAllFilenamesToSearch.txt") { # if file exists
Remove-Item "C:\Work\ListofAllFilenamesToSearch.txt"
}
if (Test-Path "C:\Work\FilesThatNeedToBeChanged.txt") { # if file exists
Remove-Item "C:\Work\FilesThatNeedToBeChanged.txt"
}
$filefolder1 = "C:\TestFolder\WorkFiles"
$ftype = "*.txt"
$filenames1 = Search $filefolder1 $ftype
$filenames1 | Out-File "C:\Work\ListofAllFilenamesToSearch.txt" -Width 2000
if (Test-Path "C:\Work\FilesThatNeedToBeChanged.txt") { # if file exists
Remove-Item "C:\Work\FilesThatNeedToBeChanged.txt"
}
(Get-Content "C:\Work\NumberXrefList.CSV" |where {$_.readcount -gt 1}) | foreach{
$OldFieldValue, $NewFieldValue = $_.Split("|")
$filenamelist = (Get-Content "C:\Work\ListofAllFilenamesToSearch.txt" -ReadCount 5) #|
foreach ($j in $filenamelist) {
#$testvar = (Get-Content $j )
#$testvar = (Get-Content $j -ReadCount 100)
$testvar = (Get-Content $j -Delimiter "\n")
Foreach ($i in $testvar)
{
if ($i -imatch $OldFieldValue) {
$j + "|" + $OldFieldValue + "|" + $NewFieldValue | Out-File "C:\Work\FilesThatNeedToBeChanged.txt" -Width 2000 -Append
}
}
}
}
$FileFolder = (Get-Content "C:\Work\FilesThatNeedToBeChanged.txt" -ReadCount 5)
Get-ChildItem $FileFolder -Recurse |
select -ExpandProperty fullname |
foreach {
if (Select-String -Path $_ -SimpleMatch $OldFieldValue -Debug -Quiet) {
(Get-Content $_) |
ForEach-Object {$_ -replace $OldFieldValue, $NewFieldValue }|
Set-Content $_ -WhatIf
}
}
In the code above, I've tried several things with Get-Content - default, with -ReadCount, and -Delimiter - in an attempt to avoid an out of memory error.
The only thing I have control over is the length of the old & new replacement strings file. Is there a way to do this in Powershell? Is there a better option/solution? I'm running Windows 7, Powershell version 3.0.
Your main problem is that you're reading the file over and over again to change each of the terms. You need to invert the looping of the replace terms and looping of the files. Also, pre-load the csv. Something like:
$filefolder1 = "C:\TestFolder\WorkFiles"
$ftype = "*.txt"
$filenames = gci -Path $filefolder1 -Filter $ftype -Recurse
$replaceValues = Import-Csv -Path "C:\Work\NumberXrefList.CSV"
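foreach ($file in $filenames) {
    # Use FullName so the file is found regardless of the current directory
    $contents = Get-Content -Path $file.FullName
    foreach ($replaceValue in $replaceValues) {
        $contents = $contents -replace $replaceValue.OldValue, $replaceValue.NewValue
    }
    # Keep a backup of the original before overwriting it
    Copy-Item $file.FullName "$($file.FullName).old"
    Set-Content -Path $file.FullName -Value $contents
}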
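The question also asks for a checklist pass and for untouched files to be left alone. Here is a minimal sketch that combines both, assuming the CSV has the OldValue,NewValue header shown in the question. Update-TextFile, its parameters, and the -Apply switch are hypothetical names. It uses literal, case-sensitive string replacement rather than -replace, applies the pairs in CSV order (so a NewValue that equals a later OldValue would be replaced again), and writes files with .NET's default UTF-8 encoding, which may differ from the original encoding.

```powershell
# Hypothetical helper: lists the replacements each file needs and, with -Apply, writes them.
function Update-TextFile {
    param(
        [string] $Folder,    # root folder to search (placeholder name)
        [string] $CsvPath,   # CSV with an OldValue,NewValue header
        [switch] $Apply      # without this switch nothing is written
    )
    # Load the replacement pairs once.
    $map = @(Import-Csv -Path $CsvPath)
    $files = Get-ChildItem -Path $Folder -Filter *.txt -Recurse |
        Where-Object { -not $_.PSIsContainer }

    foreach ($file in $files) {
        # Read each file once, as a single string.
        $text = [System.IO.File]::ReadAllText($file.FullName)
        $changed = $false
        foreach ($pair in $map) {
            # String.Contains/Replace are literal and case-sensitive (no regex involved).
            if ($text.Contains($pair.OldValue)) {
                $text = $text.Replace($pair.OldValue, $pair.NewValue)
                $changed = $true
                # Emit one checklist row per file/value pair.
                [pscustomobject]@{
                    File     = $file.FullName
                    OldValue = $pair.OldValue
                    NewValue = $pair.NewValue
                }
            }
        }
        # Only files that actually changed are backed up and rewritten.
        if ($Apply -and $changed) {
            Copy-Item -LiteralPath $file.FullName -Destination ($file.FullName + '.old')
            [System.IO.File]::WriteAllText($file.FullName, $text)
        }
    }
}
```

Running it without -Apply produces only the checklist, which can be piped to Export-Csv for review; running it again with -Apply makes the changes.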
How can I change the following code to look at all the .log files in the directory and not just the one file?
I need to loop through all the files and delete all lines that do not contain "step4" or "step9". Currently this will create a new file, but I'm not sure how to use the for each loop here (newbie).
The actual files are named like this: 2013 09 03 00_01_29.log. I'd like the output files to either overwrite them, or to have the SAME name, appended with "out".
$In = "C:\Users\gerhardl\Documents\My Received Files\Test_In.log"
$Out = "C:\Users\gerhardl\Documents\My Received Files\Test_Out.log"
$Files = "C:\Users\gerhardl\Documents\My Received Files\"
Get-Content $In | Where-Object {$_ -match 'step4' -or $_ -match 'step9'} | `
Set-Content $Out
Give this a try:
Get-ChildItem "C:\Users\gerhardl\Documents\My Received Files" -Filter *.log |
    ForEach-Object {
        $content = Get-Content $_.FullName

        # Option 1: filter and save the content back to the original file
        $content | Where-Object {$_ -match 'step[49]'} | Set-Content $_.FullName

        # Option 2: filter and save the content to a new file
        # (created in the current directory, because only a file name is given)
        $content | Where-Object {$_ -match 'step[49]'} | Set-Content ($_.BaseName + '_out.log')
    }
To get the content of a directory you can use
$files = Get-ChildItem "C:\Users\gerhardl\Documents\My Received Files\"
Then you can loop over this variable as well:
for ($i=0; $i -lt $files.Count; $i++) {
$outfile = $files[$i].FullName + "out"
Get-Content $files[$i].FullName | Where-Object { ($_ -match 'step4' -or $_ -match 'step9') } | Set-Content $outfile
}
An even easier way to put this is the foreach loop (thanks to @Soapy and @MarkSchultheiss):
foreach ($f in $files){
$outfile = $f.FullName + "out"
Get-Content $f.FullName | Where-Object { ($_ -match 'step4' -or $_ -match 'step9') } | Set-Content $outfile
}
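One detail of the loops above: appending "out" to FullName produces names such as "Test_In.logout". If the suffix should go before the extension instead, a minimal sketch could look like the following. Export-FilteredLog and its parameters are hypothetical names, and skipping files whose base name already ends in "_out" is an added assumption so that a second run does not process its own output.

```powershell
# Hypothetical helper: write "<name>_out.log" next to each "<name>.log".
function Export-FilteredLog {
    param(
        [string] $Folder,
        [string] $Pattern = 'step[49]'   # regex for the lines to keep
    )
    # Snapshot the input list first so files written below are not picked up mid-run.
    $logs = @(Get-ChildItem -Path $Folder -Filter *.log |
        Where-Object { -not $_.PSIsContainer -and $_.BaseName -notlike '*_out' })

    foreach ($log in $logs) {
        # "2013 09 03 00_01_29.log" becomes "2013 09 03 00_01_29_out.log" in the same folder.
        $target = Join-Path $log.DirectoryName ($log.BaseName + '_out.log')
        Get-Content -LiteralPath $log.FullName |
            Where-Object { $_ -match $Pattern } |
            Set-Content -LiteralPath $target
        $target   # emit the path that was written
    }
}
```

For example, `Export-FilteredLog "C:\Users\gerhardl\Documents\My Received Files"` would leave the original logs untouched and return the paths of the new files.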
If you need to loop through a directory recursively for a particular kind of file, use the command below, which selects all files of the .doc type:
$fileNames = Get-ChildItem -Path $scriptPath -Recurse -Include *.doc
If you need to filter on multiple types, use the command below:
$fileNames = Get-ChildItem -Path $scriptPath -Recurse -Include *.doc,*.pdf
Now the $fileNames variable acts as an array that you can loop over to apply your business logic.
Other answers are great, I just want to add... a different approach usable in PowerShell:
Install GNUWin32 utils and use grep to view the lines / redirect the output to file http://gnuwin32.sourceforge.net/
This overwrites the new file every time:
grep "step[49]" logIn.log > logOut.log
This appends the log output, in case you overwrite the logIn file and want to keep the data:
grep "step[49]" logIn.log >> logOut.log
Note: to be able to use GNUWin32 utils globally you have to add the bin folder to your system path.