Scan txt file for multiple strings and save the following lines - powershell

I have a problem that I am trying to solve; however, due to my nonexistent PowerShell knowledge, it is proving harder than I hoped, so any help would be appreciated.
The problem can be simplified as:
Find a string in a text file
Extract the information on the row after that string
Store the information in a variable
Find a second string in the text file and repeat the procedure
Store both extracted lines in a new file, or delete everything else in the text file.
I then need to do this for approximately 20k files. Ideally, the information would sit under its keyword, comma-delimited, so that I can import it into other systems.
My files look something like this:
random words
that are unimportant
Keyword
FirstlineofNumbersthatIwanttoExtract
random words again that are unimportant
Secondkeyword
SecondLineOfNumbersThatIWantToExtract
end of the file
However, the lines I want to extract are not on the same row in every file. I would like the output to be something like:
Keyword, SecondKeyword
FirstLineOfNumbersThatIWantToExtract, SecondLineOfNumbersThatIWantToExtract
And that's it. This is how far I got:
[System.IO.DirectoryInfo]$folder = 'C:\users\xx\Desktop\mappcent3'
foreach ($file in ($folder.EnumerateFiles())) {
if ($file.Extension -eq '.txt') {
$content = Get-Content $file
$FirstRegex = 'KeyWordOne
(.+)$'
$First_output = "\1"
$test = Select-String -Path $file.FullName -Pattern $FirstRegex
}
}

This does something similar to what you are asking. It requires PowerShell 3.0 or later.
$path = 'C:\users\xx\Desktop\mappcent3'
$firstKeyword = "Keyword"
$secondKeyword = "Secondkeyword"
$resultsPath = "C:\Temp\results.csv"
Get-ChildItem $path -Filter "*.txt" | ForEach-Object {
    # Read the file in
    $fileContents = Get-Content $_.FullName
    # Line after the first match of the first keyword ($null if the keyword is missing)
    $firstKeywordData = ($fileContents | Select-String -Pattern $firstKeyword -Context 0,1 -SimpleMatch).Context.PostContext | Select-Object -First 1
    # Line after the first match of the second keyword ($null if the keyword is missing)
    $secondKeywordData = ($fileContents | Select-String -Pattern $secondKeyword -Context 0,1 -SimpleMatch).Context.PostContext | Select-Object -First 1
    # Create a new object with the details gathered
    [pscustomobject][ordered]@{
        File = $_.FullName
        FirstKeywordData = $firstKeywordData
        SecondKeywordData = $secondKeywordData
    }
} | Export-CSV $resultsPath -NoTypeInformation
Select-String does most of the magic here. We take advantage of its -Context parameter, which captures lines before and after each match; we only want the line that follows, hence 0,1. We wrap the results in a custom object, which we can then export to a CSV file.
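To see exactly what -Context 0,1 hands back, here is a minimal sketch run against an in-memory copy of the question's sample file instead of a real .txt file. The anchored pattern '^Secondkeyword$' is our own choice, not part of the answer above, made so that only the exact keyword line matches.

```powershell
# In-memory stand-in for one of the question's text files (sample content only).
$lines = @(
    'random words'
    'that are unimportant'
    'Keyword'
    'FirstlineofNumbersthatIwanttoExtract'
    'random words again that are unimportant'
    'Secondkeyword'
    'SecondLineOfNumbersThatIWantToExtract'
    'end of the file'
)

# -Context 0,1 keeps no lines before and one line after each match;
# that following line is exposed as .Context.PostContext (a string array).
$match = $lines |
    Select-String -Pattern '^Secondkeyword$' -Context 0,1 |
    Select-Object -First 1

$nextLine = $match.Context.PostContext[0]
$nextLine
```

With a real file, the same call works with Select-String -Path instead of piping the lines in.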
Keyword Overlap
Beware that your keywords can overlap and produce odd results in your output files. Select-String is case-insensitive by default, so in your sample Keyword also matches the Secondkeyword line, and the result set will reflect that.
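A small sketch of the overlap and one way around it, assuming each keyword sits alone on its own line: anchoring the regex with ^ and $ (our assumption, not something the answer prescribes) stops Keyword from also hitting Secondkeyword.

```powershell
$lines = 'Keyword', 'first data', 'Secondkeyword', 'second data'

# -SimpleMatch is a case-insensitive substring test by default,
# so 'Keyword' is also found inside 'Secondkeyword'.
$loose = @($lines | Select-String -Pattern 'Keyword' -SimpleMatch -Context 0,1)

# Anchoring the regex to the whole line avoids the overlap.
$strict = @($lines | Select-String -Pattern '^Keyword$' -Context 0,1)

$loose.Count                        # 2
$strict.Count                       # 1
$strict[0].Context.PostContext[0]   # first data
```

If a keyword can share its line with other text, -CaseSensitive or a more specific regex would be the alternative to full-line anchoring.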
If you just want to write back to the original file, you can easily do that as well:
"$firstKeywordData,$secondKeywordData" | Set-Content $_.FullName
Or something similar.

The Select-String cmdlet has a -Context parameter that makes it easy to extract lines before or after the line on which there's a match.
You can use Export-Csv to export to the format you require (although with 20K files you may want to write each file's results directly to its own output file, as below).
foreach ($file in Get-ChildItem C:\users\xx\Desktop\mappcent3 | Where-Object {-not $_.PsIsContainer})
{
    $FirstKeyword = 'FirstKeyword'
    $FirstLine = Select-String -Path $file.FullName -Pattern $FirstKeyword -Context 0,1 | Select-Object -ExpandProperty Context -First 1 | Select-Object -ExpandProperty PostContext
    $SecondKeyword = 'SecondKeyword'
    $SecondLine = Select-String -Path $file.FullName -Pattern $SecondKeyword -Context 0,1 | Select-Object -ExpandProperty Context -First 1 | Select-Object -ExpandProperty PostContext
    # One small CSV per input file, named <BaseName>_keywords.txt next to the original
    New-Object psobject -Property @{$FirstKeyword=$FirstLine; $SecondKeyword=$SecondLine} | Export-Csv (Join-Path $file.DirectoryName ($file.BaseName + '_keywords.txt')) -NoTypeInformation
}

Related

Extract the end of every lines from multiple files

I have several txt files distributed in several sub-folders.
This is what a file looks like:
Data file. version 01.10.
1
8
*
DAT\Trep\Typ10
DAT\Trep\Typ12
DAT\Trep\Typ13
What I would like to do is extract only the part after the last "\" in order to get something like this:
Typ10 FileName.txt Path
Typ12 FileName.txt Path
Typ13 FileName.txt Path
...
I tried the following:
Get-ChildItem -Path 'D:\MyData\*.txt' -Recurse | ForEach-Object {Write-Output $_; $data=Get-Content $_}
$data = $data -replace '.*\\'
$data
It works well for a single file, but not for several (-Recurse).
Being a PowerShell beginner, I can't figure out how to improve my script.
I also tried adding this to get the result shown above, but that doesn't work either:
Select-Object -Property @{Name = 'Result list'; Expression = { $data }}, Filename, Path
Thanks in advance for your kind help.
Use Select-String:
Get-ChildItem -Path D:\MyData\*.txt -Recurse -File |
Select-String '^.+\\(.+)' |
ForEach-Object {
    [pscustomobject]@{
        Result = $_.Matches.Groups[1].Value
        FileName = $_.FileName
        Path = $_.Path
    }
}
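A quick way to check what the pattern '^.+\\(.+)' captures is to run it over the question's sample lines in memory, as in this sketch; with real files, the FileName and Path properties shown above are filled in as well.

```powershell
# Sample lines from the question's data file.
$lines = @(
    'Data file. version 01.10.'
    '1'
    '8'
    '*'
    'DAT\Trep\Typ10'
    'DAT\Trep\Typ12'
    'DAT\Trep\Typ13'
)

# '^.+\\(.+)': the greedy '.+' runs up to the LAST backslash, and the
# capture group (.+) takes whatever follows it. Lines without a backslash
# do not match at all, so the header lines are skipped automatically.
$results = $lines |
    Select-String '^.+\\(.+)' |
    ForEach-Object { $_.Matches[0].Groups[1].Value }

$results   # Typ10, Typ12, Typ13
```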
As for your desire to exclude certain folders during recursive traversal:
Unfortunately, Get-ChildItem -Exclude only excludes the matching folders themselves, not also their content. There are two relevant feature requests to potentially overcome this limitation in the future:
GitHub issue #4126 asks for path patterns to be supported too in the future.
GitHub issue #15159 proposes a new subtree-exclusion parameter, such as -ExcludeRecursive.
For now, a different approach with post-filtering based on Where-Object is required, using folder names Folder1 and Folder2 as examples:
Get-ChildItem -Path D:\MyData\*.txt -Recurse |
Where-Object FullName -NotLike *\Folder1\* |
Where-Object FullName -NotLike *\Folder2\* |
Select-String '^.+\\(.+)' |
ForEach-Object {
    [pscustomobject]@{
        Result = $_.Matches.Groups[1].Value
        FileName = $_.FileName
        Path = $_.Path
    }
}
For a more flexible, cross-platform approach based on regex matching (which is invariably more complex), see the bottom section of this answer.
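If the list of excluded folders grows, one possible generalization (our own sketch, not from the answer) keeps the names in an array and tests each path against all of them. The paths below are hypothetical strings; with real input you would apply the same Where-Object block to $_.FullName from Get-ChildItem.

```powershell
# Hypothetical paths standing in for the FullName of each Get-ChildItem result.
$paths = @(
    'D:\MyData\a.txt'
    'D:\MyData\Folder1\b.txt'
    'D:\MyData\Sub\Folder2\c.txt'
    'D:\MyData\Folder10\d.txt'
)

# Folder names whose whole subtree should be skipped.
$excluded = 'Folder1', 'Folder2'

$kept = $paths | Where-Object {
    $path = $_
    # Keep the path only if none of the excluded names appears as a full path segment.
    -not ($excluded | Where-Object { $path -like "*\$_\*" })
}
$kept
```

The backslashes around each name in the wildcard keep Folder10 from being dropped together with Folder1.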

Powershell search directory for code files with text matching input a txt file

Data mapping project: in-house system to a new vendor system. The first step is to find all occurrences of the current database field names (or, to be precise, column names) in the C# .cs source files, and I am trying to use PowerShell. I have recently created PowerShell searches with Get-ChildItem and Select-String that work well, but there the search string array was small and easily hard-coded inline. The application being ported has a couple hundred column names and a significant amount of code, so, armed with a text file of all the column names, the pipeline would seem like a good tool to create the basic cross-reference for further analysis. However, I was not able to get the pipeline to work with an external variable anywhere other than the first step. I tried using -PipelineVariable, $_ and a global variable, and did not find anything specific after lots of searching. P.S. This is my first question on Stack Overflow, so please be kind.
Here is what I hoped would work, but no dice so far:
$inputFile = "C:\DataColumnsNames.txt"
$outputFile = "C:\DataColumnsUsages.txt"
$arr = [string[]](Get-Content $inputfile)
foreach ($s in $arr) {
Get-ChildItem -Path "C:ProjectFolder\*" -Filter *.cs -Recurse -ErrorAction SilentlyContinue -Force |
Select-String $s | Select-Object Path, LineNumber, line | Export-csv $outputfile
}
I did find that this prints the list once, but not twice. In fact, it seems that using the variable this way makes processing simply skip any further pipeline steps.
foreach ($s in $arr) {Write-Host $s | Write $s}
If this isn't easily possible in PowerShell, my fallback is to do it in C#, although I would much rather level up with PowerShell if anyone can point me to the correct understanding of how to do things in the pipeline, or alternatively construct an equivalent function. It seems like such a natural fit for PowerShell.
Thanks.
You're calling Export-csv $outputfile in a loop, which rewrites the whole file in every iteration, so that only the last iteration's output will end up in the file.
While you could use -Append to append to the output file iteratively, it is worth taking a step back: Select-String can accept an array of patterns, and a line that matches any of them is considered a match.
Therefore, your code can be simplified as follows:
$inputFile = 'C:\DataColumnsNames.txt'
$outputFile = 'C:\DataColumnsUsages.txt'
Get-ChildItem C:\ProjectFolder -Filter *.cs -Recurse -Force -ea SilentlyContinue |
Select-String -Pattern (Get-Content $inputFile) |
Select-Object Path, LineNumber, line |
Export-csv $outputfile
-Pattern (Get-Content $inputFile) passes the lines of input file $inputFile as an array of patterns to match.
By default, these lines are interpreted as regexes (regular expressions); to ensure that they're treated as literals, add -SimpleMatch to the Select-String call.
This answer to a follow-up question shows how to include the specific pattern among the multiple ones passed to -Pattern that matched on each line in the output.
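The sketch below runs the answer's single Select-String call over a few hypothetical in-memory lines to show what the array of patterns plus -SimpleMatch produces. Each MatchInfo result also carries a Pattern property naming the pattern that matched, which is one way to tell the column names apart in the output. Path is left out here because for piped-in strings it only reads InputStream.

```powershell
# Hypothetical column names and C# source lines, kept in memory for the demo.
$columnNames = 'CustomerId', 'OrderDate'
$sourceLines = @(
    'var id = row["CustomerId"];'
    'var total = row["Amount"];'
    'var when = row["OrderDate"];'
)

# One Select-String call with an array of patterns; -SimpleMatch treats each
# name literally, and each result's Pattern property records which name matched.
$hits = @($sourceLines |
    Select-String -Pattern $columnNames -SimpleMatch |
    Select-Object LineNumber, Pattern, Line)

$hits
```

Against real files, adding Pattern to the Select-Object list in the answer's pipeline would put the matching column name into the CSV.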
I think you want to append each occurrence to the CSV file, and you need to search the content of each file. Try this:
$inputFile = "C:\DataColumnsNames.txt"
$outputFile = "C:\DataColumnsUsages.txt"
$arr = [string[]](Get-Content $inputFile)
foreach ($s in $arr) {
    Get-ChildItem -Path "C:\ProjectFolder\*" -Filter *.cs -Recurse -ErrorAction SilentlyContinue -Force | ForEach-Object {
        # Search the file itself so that Path holds the real file path
        Select-String -Path $_.FullName -Pattern $s | Select-Object Path, LineNumber, Line | Export-Csv -Append -Path $outputFile -NoTypeInformation
    }
}
Export-Csv -Append was only introduced in PowerShell 3.0 (Windows 8); on older versions, try this instead:
$inputFile = "C:\DataColumnsNames.txt"
$outputFile = "C:\DataColumnsUsages.txt"
$arr = [string[]](Get-Content $inputFile)
foreach ($s in $arr) {
    Get-ChildItem -Path "C:\ProjectFolder\*" -Filter *.cs -Recurse -ErrorAction SilentlyContinue -Force | ForEach-Object {
        # ConvertTo-Csv emits a header row on every call; Select-Object -Skip 1 drops it before appending
        Select-String -Path $_.FullName -Pattern $s | Select-Object Path, LineNumber, Line | ConvertTo-Csv -NoTypeInformation | Select-Object -Skip 1 | Out-File -Append -FilePath $outputFile
    }
}
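Note that skipping the header on every write leaves the resulting file with no header row at all. The sketch below is one way to keep a single header: it writes the header only while the output file does not exist yet. The file name and the one-property stand-in objects are hypothetical, and since it relies on Test-Path, ConvertTo-Csv, Select-Object -Skip and Add-Content, it does not need Export-Csv -Append.

```powershell
# Temporary output file for the demo (hypothetical name and location).
$outputFile = Join-Path ([System.IO.Path]::GetTempPath()) 'usages-demo.csv'
if (Test-Path $outputFile) { Remove-Item $outputFile }

foreach ($name in 'A.cs', 'B.cs') {
    # Stand-in for one batch of Select-String results.
    $row = New-Object PSObject -Property @{ Path = $name }
    $csvLines = $row | ConvertTo-Csv -NoTypeInformation
    if (Test-Path $outputFile) {
        # The file already has a header row, so drop the repeated one.
        $csvLines = $csvLines | Select-Object -Skip 1
    }
    $csvLines | Add-Content -Path $outputFile
}

Get-Content $outputFile
```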

Import .csv to create a list of filenames and corresponding owners

I am working on creating a script that will read a .csv document containing a single column of filenames (one per cell) and search a larger folder for each of the files matching the filenames provided and identify the 'owner' using:
(get-acl $file).owner
Currently I have several bits of code that can do individual parts, but I am having a hard time tying it all together. Ideally, a user can simply enter file names into the .csv file, then run the script to output a second .csv or .txt identifying each file name and its owner.
The CSV will be formatted as below (ASINs is the header):
ASINs
B01M8N1D83.MAIN.PC_410
B01M14G0JV.MAIN.PC_410
Pull file names without header:
$images = Get-Content \\path\ASINs.csv | Select -skip 1
Find images in larger folder to pull full filename/path (not working):
ForEach($image in $images) {
$images.FullName | ForEach-Object
{
$ASIN | Get-ChildItem -Path $serverPath -Filter *.jpg -Recurse -ErrorAction SilentlyContinue -Force | Set-Content \\path\FullNames.csv
}
}
At that point I would like to use the full file paths provided by FullNames.csv to pull the owners from the files in their native location using the above mentioned:
(get-acl $file).owner
Does anyone have any ideas how to tie these together into one fluid script?
EDIT
I was able to get the following to work without the loop, reading one of the filenames, but I need it to loop as there are multiple filenames.
New CSV Format:
BaseName
B01LVVLSCM.MAIN.PC_410
B01LVY65AN.MAIN.PC_410
B01MAXORH6.MAIN.PC_410
B01MTGEMEE.MAIN.PC_410
New Script:
$desktopPath = [System.Environment]::GetFolderPath([System.Environment+SpecialFolder]::Desktop)
$images = $desktopPath + '\Get_Owner'
Get-ChildItem -Path $images | Select BaseName | Export-Csv $desktopPath`\Filenames.csv -NoTypeInformation
$serverPath = 'C:\Users\tuggleg\Desktop\Archive'
$files = Import-Csv -Path $desktopPath`\Filenames.csv
While($true) {
ForEach ($fileName in $files.BaseName)
{
Get-ChildItem -Path $serverPath -Filter "*$fileName*" -Recurse -ErrorAction 'SilentlyContinue' |
Select-Object -Property @{
Name='Owner'
Expression={(Get-Acl -Path $_.FullName).Owner}
},'*' |
Export-Csv -Path $desktopPath`\Owners.csv -NoTypeInformation
}
}
Any ideas on the loop issue? Thanks everyone!
This example assumes your CSV contains partial file names. It searches $serverPath recursively and filters for those partial names.
Example.csv
"ASINs"
"B01M8N1D83.MAIN.PC_410"
"B01M14G0JV.MAIN.PC_410"
Code.ps1
$Files = Import-Csv -Path '.\Example.csv'
# $serverPath is the root folder to search, as in the question.
# -Append (PowerShell 3.0+) adds each iteration's rows instead of overwriting the file.
ForEach ($FileName in $Files.ASINs)
{
    Get-ChildItem -Path $serverPath -Filter "*$FileName*" -Recurse -ErrorAction 'SilentlyContinue' |
    Select-Object -Property @{
        Name='Owner'
        Expression={(Get-Acl -Path $_.FullName).Owner}
    },'*' |
    Export-Csv -Path '\\path\FullNames.csv' -NoTypeInformation -Append
}
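Calling Export-Csv inside the loop writes the file once per file name: without -Append each call overwrites the previous rows, and with -Append a re-run keeps adding to the CSV left over from the last run. An alternative sketch sends the output of every iteration down one pipeline and calls Export-Csv once. The folder, file names and CSV location are hypothetical and created by the script itself; the Owner column is left out so the sketch does not depend on Windows ACLs, but the calculated property from the answer can be added back to the Select-Object call unchanged.

```powershell
# Hypothetical folder and file names, created only for this demo.
$serverPath = Join-Path ([System.IO.Path]::GetTempPath()) 'asin-demo'
New-Item -ItemType Directory -Path $serverPath -Force | Out-Null
'B01M8N1D83.MAIN.PC_410.jpg', 'B01M14G0JV.MAIN.PC_410.jpg', 'other.jpg' |
    ForEach-Object { New-Item -ItemType File -Path (Join-Path $serverPath $_) -Force | Out-Null }

$partialNames = 'B01M8N1D83.MAIN.PC_410', 'B01M14G0JV.MAIN.PC_410'
$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'asin-demo.csv'

# Emit the matches of every iteration into ONE pipeline, then export once,
# instead of calling Export-Csv (and overwriting the file) inside the loop.
$partialNames | ForEach-Object {
    Get-ChildItem -Path $serverPath -Filter "*$_*" -Recurse -ErrorAction SilentlyContinue |
        Select-Object Name, FullName
} | Export-Csv -Path $outFile -NoTypeInformation

Import-Csv $outFile | Select-Object -ExpandProperty Name
```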

Using Powershell to export to CSV with columns

I am trying to write a simple script that puts the name of each file and the contents of that text file into a CSV file. I am able to pull in all of the information well enough, but it's not splitting into different columns in the CSV file. When I open the CSV file in Excel, everything is in the first column, and I need the two pieces of information in separate columns. So far my working code is as follows:
$Data = Get-ChildItem -Path c:path -Recurse -Filter *.txt |
where {$_.lastwritetime -gt (Get-Date).addDays(-25)}
$outfile = "c:path\test.csv"
rm $outfile
foreach ($info in $Data) {
$content = Get-Content $info.FullName
echo "$($info.BaseName) , $content" >> $outfile
}
I figured out how to separate the information by rows, but I need it by columns. I'm new to PowerShell and can't seem to get past this little speed bump. Any input would be greatly appreciated. Thanks in advance!
Output:
Itm# , TextContent
Itm2 , NextTextContent
What I need:
Itm# | Text Content |
Itm2 | NextTextContent |
Except for a few syntax errors, your code appears to work as expected, so I suspect you are having issues with the text import in Excel. I touched up your code a bit, but it is functionally the same as what you had.
$Data = Get-ChildItem -Path "C:\temp" -Recurse -Filter *.txt |
Where-Object {$_.LastWriteTime -gt (Get-Date).addDays(-25)}
$outfile = "C:\temp\test.csv"
If(Test-Path $outfile){Remove-Item $outfile -Force}
foreach ($info in $Data) {
$content = Get-Content $info.FullName
"$($info.BaseName) , $content" | Add-Content $outfile
}
I don't know which version of Excel you have, but look for the Text Import Wizard.
Do you mean something like this?
Get-ChildItem -Path C:\path -Recurse -Filter *.txt |
Where-Object { $_.LastWriteTime -gt (Get-Date).AddDays(-25) } | ForEach-Object {
    New-Object PSObject -Property @{
        "Itm#" = $_.FullName
        # Join the lines; an array value would be written to the CSV as "System.Object[]"
        "TextContent" = (Get-Content $_.FullName) -join ' '
    } | Select-Object 'Itm#', 'TextContent'
} | Export-Csv List.csv -NoTypeInformation
Depending on your regional settings, Excel may expect ; rather than , as the CSV delimiter, and it will then put comma-delimited data into a single column.
I always use the -Delimiter parameter of Export-Csv or ConvertTo-Csv to set ; as the delimiter.
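A sketch of the two-column idea with hypothetical in-memory rows: each CSV column comes from one object property, multi-line file content is joined into a single string first, and -Delimiter ';' produces semicolon-separated output for Excel setups that expect it.

```powershell
# Hypothetical rows standing in for (file name, file content) pairs.
$rows = @(
    [pscustomobject]@{ 'Itm#' = 'Itm1'; TextContent = @('line one', 'line two') -join ' ' }
    [pscustomobject]@{ 'Itm#' = 'Itm2'; TextContent = 'NextTextContent' }
)

# One property per column; ';' as the delimiter.
$csv = $rows | ConvertTo-Csv -NoTypeInformation -Delimiter ';'
$csv
```

Export-Csv accepts the same -Delimiter parameter when writing straight to a file.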

Loop through a CSV Column

I have a short script in which I recursively search for a string and write out some results. However, I have hundreds of strings to search for, so I would like to grab each value from a CSV file, use it as my search string, and then move on to the next row.
Here is what I have:
function searchNum {
#I would like to go from manual number input to auto assign from CSV
$num = Read-Host 'Please input the number'
get-childitem "C:\Users\user\Desktop\SearchFolder\input" -recurse | Select-String -pattern "$num" -context 2 | Out-File "C:\Users\user\Desktop\SearchFolder\output\output.txt" -width 300 -Append -NoClobber
}
searchNum
How can I run through a CSV to assign the $num value for each line?
Do you have a CSV with several columns, one of which you want to use as search values? Or do you have a "regular" text file with one search pattern per line?
In case of the former, you could read the file with Import-Csv:
$filename = 'C:\path\to\your.csv'
$searchRoot = 'C:\Users\user\Desktop\SearchFolder\input'
foreach ($pattern in (Import-Csv $filename | % {$_.colname})) {
Get-ChildItem $searchRoot -Recurse | Select-String $pattern -Context 2 | ...
}
In case of the latter a simple Get-Content should suffice:
$filename = 'C:\path\to\your.txt'
$searchRoot = 'C:\Users\user\Desktop\SearchFolder\input'
foreach ($pattern in (Get-Content $filename)) {
Get-ChildItem $searchRoot -Recurse | Select-String $pattern -Context 2 | ...
}
I assume you need something like this:
$csvFile = Get-Content -Path "myCSVfile.csv"
foreach($line in $csvFile)
{
$lineArray = $line.Split(",")
if ($lineArray -and $lineArray.Count -gt 1)
{
#Do a search num with the value from the csv file
searchNum -num $lineArray[1]
}
}
This reads a CSV file and calls your function for each line, passing the second value on the CSV line as the parameter. It assumes searchNum is changed to accept a -num parameter (for example with param($num)) instead of calling Read-Host.
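Combining both answers, here is a sketch of searchNum that takes the number as a parameter and is called once per CSV row. Two assumptions: the CSV has a column named Number, and a hypothetical -InputLines parameter replaces the recursive file search so the sketch runs without any files. Reading rows with Import-Csv or ConvertFrom-Csv, rather than splitting on commas by hand, also copes with quoted fields.

```powershell
# Variant of the question's function that takes the number as a parameter
# instead of prompting with Read-Host. $InputLines is a hypothetical parameter
# so the sketch runs without files; the real script would pipe
# Get-ChildItem ... -Recurse into Select-String instead.
function searchNum {
    param(
        [string]$num,
        [string[]]$InputLines
    )
    $InputLines | Select-String -Pattern $num -SimpleMatch -Context 2
}

# Hypothetical CSV content with a 'Number' column; ConvertFrom-Csv parses it
# the same way Import-Csv parses a file.
$rows = 'Id,Number', '1,4711', '2,9000' | ConvertFrom-Csv
$searchLines = 'a', 'order 4711 shipped', 'b', 'c', 'ref 9000'

$found = foreach ($row in $rows) {
    searchNum -num $row.Number -InputLines $searchLines
}
$found | ForEach-Object { $_.Line }
```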