I am using the following script, which iterates through hundreds of text files looking for matches of a regex pattern. I need to add a second data point to the array that tells me which file each match was found in.
In the script below, the [Regex]::Matches($str, $Pattern) | % { $_.Value } piece returns multiple rows per file, which cannot easily be written to a file.
What I would like to know is: how would I output a two-column CSV file, with one column for the file name (which should be $_.FullName) and one column for the regex results? The code I have so far is below.
$FolderPath = "C:\Test"
$Pattern = "(?i)(?<=\b^test\b)\s+(\w+)\S+"
$Lines = @()
Get-ChildItem -Recurse $FolderPath -File | ForEach-Object {
$_.FullName
$str = Get-Content $_.FullName
$Lines += [Regex]::Matches($str, $Pattern) |
% { $_.Value } |
Sort-Object |
Get-Unique
}
$Lines = $Lines.Trim().ToUpper() -replace '[\r\n]+', ' ' -replace ";", '' |
Sort-Object |
Get-Unique # Cleaning up data in array
I can think of two ways, but the simplest is to use a hashtable (dictionary). Another way is to create PSObjects to fill your $Lines variable. I am going with the simple way, so you only need one variable, the hashtable.
$FolderPath = "C:\Test"
$Pattern = "(?i)(?<=\b^test\b)\s+(\w+)\S+"
$Results = @{}
Get-ChildItem -Recurse $FolderPath -File |
ForEach-Object {
$str = Get-Content $_.FullName
$Line = [regex]::matches($str,$Pattern) | % { $_.Value } | Sort-Object | Get-Unique
$Line = $Line.Trim().ToUpper() -Replace '[\r\n]+', ' ' -Replace ";",'' | Sort-Object | Get-Unique # Cleaning up data in array
$Results[$_.FullName] = $Line
}
$Results.GetEnumerator() | Select @{L="File";E={$_.Key}}, @{L="Matches";E={$_.Value -join ', '}} | Export-Csv -NoType -Path <Path to save CSV>
Your results will be in $Results. $Results.Keys contains the file paths and $Results.Values contains the matches. You can look up the results for a particular file by its key, $Results["File path"]; if the key does not exist, the lookup returns $null rather than raising an error.
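The other approach mentioned above, filling the result with objects, could look roughly like the sketch below. It emits one object per unique match, so each CSV row holds a file path and a single match. The function name Get-FileMatch, the File and Match column names, and the paths in the usage line are made up for illustration. Get-Content -Raw is used so that each file is searched as one string, which is an assumption about what you want.

```powershell
# Hypothetical helper: returns one object per unique match per file.
function Get-FileMatch {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [string] $Pattern
    )
    Get-ChildItem -LiteralPath $Path -Recurse -File | ForEach-Object {
        $file = $_.FullName
        $text = Get-Content -LiteralPath $file -Raw   # whole file as a single string
        if (-not $text) { return }                    # skip empty files
        [regex]::Matches($text, $Pattern) |
            ForEach-Object { $_.Value.Trim().ToUpper() -replace ';', '' } |
            Sort-Object -Unique |
            ForEach-Object { [pscustomobject]@{ File = $file; Match = $_ } }
    }
}
```

Usage would then be something like: Get-FileMatch -Path 'C:\Test' -Pattern '(?im)^test\s+\w+' | Export-Csv -NoTypeInformation -Path 'C:\temp\matches.csv'. The pattern shown is only a placeholder; substitute your own.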
Related
I have some log files.
Some of the UPDATE SQL statements are getting errors, but not all.
I need to know all the statements that are getting errors so I can find the pattern of failure.
I can sort all the log files and get the unique lines, like this:
$In = "C:\temp\data"
$Out1 = "C:\temp\output1"
$Out2 = "C:\temp\output2"
Remove-Item $Out1\*.*
Remove-Item $Out2\*.*
# Get the log files from the last 90 days
Get-ChildItem $In -Filter *.log | Where-Object {$_.LastWriteTime -gt (Get-Date).AddDays(-90)} |
Foreach-Object {
$content = Get-Content $_.FullName
#filter and save content to a file
$content | Where-Object {$_ -match 'STATEMENT'} | Sort-Object -Unique | Set-Content $Out1\$_
}
# merge all the files, sort unique, write to output
Get-Content $Out1\* | Sort-Object -Unique | Set-Content $Out2\output.txt
Works great.
But some of the logs have a leading date-time stamp in the first 24 characters. I need to strip that out; otherwise all those lines count as unique.
If it helps, all the files either have the leading timestamp or they don't. The lines are not mixed within a single file.
Here is what I have so far:
# Get the log files from the last 90 days
Get-ChildItem $In -Filter *.log | Where-Object {$_.LastWriteTime -gt (Get-Date).AddDays(-90)} |
Foreach-Object {
$content = Get-Content $_.FullName
#filter and save content to a file
$s = $content | Where-Object {$_ -match 'STATEMENT'}
# strip datetime from front if exists
If (Where-Object {$s.Substring(0,1) -Match '/d'}) { $s = $s.Substring(24) }
$s | Sort-Object -Unique | Set-Content $Out1\$_
}
# merge all the files, sort unique, write to output
Get-Content $Out1\* | Sort-Object -Unique | Set-Content $Out2\output.txt
But it just writes the lines out without stripping the leading characters.
Regex /d should be \d (\ is the escape character in general, and character-class shortcuts such as d for a digit[1] must be prefixed with it).
Use a single pipeline that passes the Where-Object output to a ForEach-Object call where you can perform the conditional removal of the numeric prefix.
$content |
Where-Object { $_ -match 'STATEMENT' } |
ForEach-Object { if ($_[0] -match '\d') { $_.Substring(24) } else { $_ } } |
Set-Content $Out1\$_
Note: Strictly speaking, \d matches everything that the Unicode standard considers a digit, not just the ASCII-range digits 0 to 9; to limit matching to the latter, use [0-9].
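To see both points in isolation, here is a small sketch with made-up sample lines. It assumes every timestamped line is at least 24 characters long, since Substring(24) throws on shorter strings.

```powershell
# Two hypothetical log lines: the first carries a 24-character timestamp prefix.
$lines = @(
    '2018-01-23 15:55:14.000 STATEMENT: UPDATE t SET a = 1'
    'STATEMENT: UPDATE t SET a = 1'
)

# Strip the prefix only from lines that start with an ASCII digit.
$stripped = $lines | ForEach-Object {
    if ($_ -match '^[0-9]') { $_.Substring(24) } else { $_ }
}
$stripped | Sort-Object -Unique   # a single line remains

# \d versus [0-9]: U+0663 is ARABIC-INDIC DIGIT THREE.
$arabicThree = [string][char]0x0663
$arabicThree -match '\d'      # True
$arabicThree -match '[0-9]'   # False
```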
I have multiple text files and I need to find and count specific unique words in those files.
For example, we need to find how many users logged in during a certain time period across multiple log files.
I have created the following code. It works fine for a small number of files, but for many larger files it takes too much time:
$A =Get-Content C:\Users\XXXXXXX\Documents\Python\Test\*.log | ForEach-Object { $wrds=$_.Split(" "); foreach ($i in $wrds) { Write-Output $i } } | Sort-Object | Get-Unique | select-string -pattern "AAA" -CaseSensitive -SimpleMatch
Is it possible to fine-tune this to run faster?
If I understand correctly, you would like to find certain user logins occurring in many log files, based on your use of Select-String.
# an array of usernames to search for
$users = 'user1', 'user2', 'userX'
# create a regex from this array by joining the values with regex 'OR' (the pipe symbol)
[regex]$regex = ($users | ForEach-Object { [regex]::Escape($_)}) -join '|'
# or if you need whole string matches instead of allowing partial matches, use
# [regex]$regex = '\b({0})\b' -f (($users | ForEach-Object { [regex]::Escape($_)}) -join '|')
# get a list of all log files
$logFiles = Get-ChildItem -Path 'C:\Users\XXXXXXX\Documents\Python\Test' -Filter '*.log' -File
# loop through the list of log files and find the matches in each of them
$result = foreach ($file in $logFiles) {
$allmatches = $regex.Matches(($file | Get-Content -Raw))
$logins = @($allmatches.Value | Select-Object -Unique)
if ($logins.Count) {
[PsCustomObject]@{
LogFile = $file.FullName
LoginCount = $logins.Count
Users = $logins -join ', '
}
}
}
# visual output
$result | Out-GridView -Title 'Login search results'
# or save as CSV file
$result | Export-Csv -Path 'X:\somewhere\results.csv' -NoTypeInformation
I have a CSV file which is structured like this:
"SA1";"21020180123155514000000000000000002"
"SA2";"21020180123155514000000000000000002";"210"
"SA4";"21020180123155514000000000000000002";"210";"200000001"
"SA5";"21020180123155514000000000000000002";"210";"200000001";"140000001";"ZZ"
"SA1";"21020180123155522000000000000000002"
"SA2";"21020180123155522000000000000000002";"210"
"SA4";"21020180123155522000000000000000002";"210";"200000001"
"SA5";"21020180123155522000000000000000002";"210";"200000001";"140000671";"ZZ"
"SA1";"21020180123155567000000000000000002"
"SA2";"21020180123155567000000000000000002";"210"
"SA4";"21020180123155567000000000000000002";"210";"200000001"
"SA5";"21020180123155567000000000000000002";"210";"200000001";"140000001";"ZZ"
So the Value in the second field (separator ';') marks the data which belongs together and value 140000001 or 140000671 is the trigger.
So the result should be:
1st file: 140000001.txt
"SA1";"21020180123155514000000000000000002"
"SA2";"21020180123155514000000000000000002";"210"
"SA4";"21020180123155514000000000000000002";"210";"200000001"
"SA5";"21020180123155514000000000000000002";"210";"200000001";"140000001";"ZZ"
"SA1";"21020180123155567000000000000000002"
"SA2";"21020180123155567000000000000000002";"210"
"SA4";"21020180123155567000000000000000002";"210";"200000001"
"SA5";"21020180123155567000000000000000002";"210";"200000001";"140000001";"ZZ"
2nd file: 140000671.txt
"SA1";"21020180123155522000000000000000002"
"SA2";"21020180123155522000000000000000002";"210"
"SA4";"21020180123155522000000000000000002";"210";"200000001"
"SA5";"21020180123155522000000000000000002";"210";"200000001";"140000671";"ZZ"
For now I found a snippet which splits the big file by the second field:
$src = "C:\temp\ORD001.txt"
$dstDir = "C:\temp\files\"
Remove-Item -Path "$dstDir\\*"
$header = Get-Content -Path $src | select -First 1
Get-Content -Path $src | select -Skip 1 | foreach {
$file = "$(($_ -split ";")[1]).txt"
Write-Verbose "Writing to $file"
$file = $file.Replace('"',"")
if (-not (Test-Path -Path $dstDir\$file))
{
Out-File -FilePath $dstDir\$file -InputObject $header -Encoding ascii
}
$file -replace '"', ""
Out-File -FilePath $dstDir\$file -InputObject $_ -Encoding ascii -Append
}
For the rest I'm standing in the dark.
Please help.
The Import-CSV cmdlet will work here, if you don't already know about it. I would use that, as it returns all the rows as different objects in an array, with the properties being the column values. And you don't have to manually remove the quotes and such. Assuming the second column is a date time value, and should be unique for each group of 4 consecutive rows, then this will work:
$src = "C:\temp\ORD001.txt"
$dstDir = "C:\temp\files\"
Remove-Item -Path "$dstDir\*"
$csv = Import-CSV $src -Delimiter ';'
$DateTimeGroups = $csv | Group-Object -Property 'ColumnTwoHeader'
foreach ($group in $DateTimeGroups) {
$filename = $group.Group.'ColumnFiveHeader' | select -Unique
$group.Group | Export-CSV "$dstDir\$filename.txt" -Append -NoTypeInformation
}
However, this will break if two of those "groups of 4 consecutive rows" have the same value for the second column and the fifth column. There isn't a way to fix this unless you are certain that there will always be 4 consecutive rows in each time group. In which case:
$src = "C:\temp\ORD001.txt"
$dstDir = "C:\temp\files\"
Remove-Item -Path "$dstDir\*"
$csv = Import-CSV $src -Delimiter ';'
if ($csv.count % 4 -ne 0) {
Write-Error "CSV does not have a proper number of rows. Attempting to continue will be bad :)"
return
}
for ($i = 0 ; $i -lt $csv.Count ; $i=$i+4) {
$group = $csv[$i..($i+3)]   # four rows: indices $i through $i+3
$group | Export-Csv "$dstDir\$($group[3].'ColumnFiveHeader').txt" -Append -NoTypeInformation
}
Just be sure to replace ColumnTwoHeader and ColumnFiveHeader with the appropriate values.
If performance is not a concern, combining Import-Csv / Export-Csv with Group-Object allows the most concise, direct expression of your intent, using PowerShell's ability to convert CSV to objects and back:
$src = "C:\temp\ORD001.txt" # Input CSV file
$dstDir = "C:\temp\files" # Output directory
# Delete previous output files, if necessary.
Remove-Item -Path "$dstDir\*" -WhatIf
# Import the source CSV into custom objects with properties named for the columns.
# Note: The assumption is that your CSV header line defines columns "Col1", "Col2", ...
Import-Csv $src -Delimiter ';' |
# Group the resulting objects by column 2
Group-Object -Property Col2 |
ForEach-Object { # Process each resulting group.
# Determine the output filename via the group's last row's column 5 value.
$outFile = '{0}\{1}.txt' -f $dstDir, $_.Group[-1].Col5
# Append the group at hand to the target file.
$_.Group | Export-Csv -Append -Encoding Ascii $outFile -Delimiter ';' -NoTypeInformation
}
Note:
The assumption - in line with your sample data - is that it is always the last row in a group of lines sharing the same column-2 value whose column 5 contains the root of the output filename (e.g., 140000001)
Sorry, but I don't have a header row. It's a semicolon-separated text file for an interface.
You can simply read the file with Get-Content, and then search for the trigger in the line.
I hope this small example can help:
$file = Get-Content CSV_File.txt
$140000001 = @()
$140000671 = @()
$bTrig = @()
foreach($line in $file){
$bTrig += $line
if($line -match ';"140000001";'){
$140000001 += $bTrig
$bTrig = @()
}
elseif($line -match ';"140000671";'){
$140000671 += $bTrig
$bTrig = @()
}
}
}
if($bTrig.Count -ne 0){Write-Warning "No trigger for $bTrig"}
$140000001 | Out-File 140000001.txt -Encoding ascii
$140000671 | Out-File 140000671.txt -Encoding ascii
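If the set of trigger values is not known in advance, the same buffering idea can be generalized so that nothing is hard-coded. The following is only a sketch: the function name Split-ByTrigger and its parameters are hypothetical, and it assumes, as in the sample data, that the trigger is the fifth field and appears only on the last line of each block.

```powershell
# Hypothetical helper: buffers lines until one has a trigger field, then files
# the whole buffer under that trigger value. Returns a hashtable keyed by trigger.
function Split-ByTrigger {
    param(
        [Parameter(Mandatory)] [string[]] $Line,
        [int] $TriggerIndex = 4   # zero-based index of the trigger field (fifth field)
    )
    $buffer = [System.Collections.Generic.List[string]]::new()
    $result = @{}
    foreach ($l in $Line) {
        $buffer.Add($l)
        $fields = $l -split ';'
        if ($fields.Count -gt $TriggerIndex) {
            $key = $fields[$TriggerIndex].Trim('"')
            if (-not $result.ContainsKey($key)) {
                $result[$key] = [System.Collections.Generic.List[string]]::new()
            }
            $result[$key].AddRange($buffer)   # copies the buffered block
            $buffer.Clear()
        }
    }
    if ($buffer.Count) { Write-Warning "No trigger for $($buffer.Count) trailing line(s)" }
    $result
}
```

It could then be used along these lines, with the paths taken from the question: $groups = Split-ByTrigger -Line (Get-Content 'C:\temp\ORD001.txt'), followed by foreach ($key in $groups.Keys) { $groups[$key] | Out-File "C:\temp\files\$key.txt" -Encoding ascii }.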
I have a text file containing some data as follows:
test|wdthe$muce
check|muce6um#%
How can I check for a particular string like test and retrieve the text after the | symbol to a variable in a PowerShell script?
And also,
If, say, there is a variable $from = test@abc.com, how can I search the file using the text before the "@"?
this may be one possible solution
$filecontents = @'
test|wdthe$muce
check|muce6um#%
'@.split("`n")
# instead of the above, you would use this with the path of the file
# $filecontents = get-content 'c:\temp\file.txt'
$hash = @{}
$filecontents | ? {$_ -notmatch '^(?:\s+)?$'} | % {
$split = $_.Split('|')
$hash.Add($split[0], $split[1])
}
$result = [pscustomobject]$hash
$result
# and to get just what is inside 'test'
$result.test
Note: this only works if each key appears just once in the file, because $hash.Add() throws on a duplicate key. If you get an error, try this other method:
$search = 'test'
$filecontents | ? {$_ -match "^$search\|"} | % {
$_.split('|')[1]
}
First you need to read the text from the file.
$content = Get-Content "c:\temp\myfile.txt"
Then you want to grab the post-pipe portion of each matching line.
$postPipePortion = $content | Foreach-Object {$_.Substring($_.IndexOf("|") + 1)}
And because it's PowerShell you could also daisy-chain it together instead of using variables:
Get-Content "C:\temp\myfile.txt" | Foreach-Object {$_.Substring($_.IndexOf("|") + 1)}
The above assumes that you happen to know every line will include a | character. If this is not the case, you need to select out only the lines that do have the character, like this:
Get-Content "C:\temp\myfile.txt" | Select-String -SimpleMatch "|" | Foreach-Object {$_.Line.Substring($_.Line.IndexOf("|") + 1)}
(You need to use the $_.Line instead of just $_ now because Select-String returns MatchInfo objects rather than strings.)
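One thing to keep in mind with Select-String: its pattern is interpreted as a regular expression by default, and | is the regex alternation operator, so a bare "|" matches every line. Either escape it ('\|') or pass -SimpleMatch to search for the literal character. A quick sketch with made-up sample lines:

```powershell
$sample = 'test|wdthe$muce', 'no pipe here'

# As a regex, '|' is an empty alternation and matches every line.
$asRegex = @($sample | Select-String '|')
$asRegex.Count     # 2

# With -SimpleMatch the pattern is taken literally.
$asLiteral = @($sample | Select-String -SimpleMatch '|')
$asLiteral.Count   # 1

# Text after the pipe for the lines that really contain one.
$asLiteral | ForEach-Object { $_.Line.Substring($_.Line.IndexOf('|') + 1) }
```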
Hope that helps. Good luck.
gc input.txt |? {$_ -match '^test'} |% { $_.split('|') | select -Index 1 }
or
sls '^test' -Path input.txt |% { $_.Line.Split('|') | select -Index 1 }
or
sls '^test' input.txt |% { $_ -split '\|' | select -Ind 1 }
or
(gc input.txt).Where{$_ -match '^test'} -replace '.*\|'
or
# Borrowing @Anthony Stringer's answer shape, but different
# code, and guessing names for what you're doing:
$users = @{}
Get-Content .\input.txt | ForEach {
if ($_ -match "(?<user>.*)\|(?<pwd>.*)") {
$users[$matches.user]=$matches.pwd
}
}
$users = [pscustomobject]$users
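For the second part of the question (searching by the text before the @ in $from), one possible sketch is below. It assumes the part before the @ is exactly the key that appears before the | in the file; the variable names $name and $value are made up for illustration.

```powershell
$filecontents = 'test|wdthe$muce', 'check|muce6um#%'   # sample data from the question
$from = 'test@abc.com'

# Text before the @ becomes the key to look for.
$name = $from.Split('@')[0]

# Keep only the line that starts with "<key>|" and return what follows the pipe.
$value = $filecontents |
    Where-Object { $_.StartsWith("$name|") } |
    ForEach-Object { $_.Substring($name.Length + 1) }

$value   # wdthe$muce
```

Using StartsWith and Substring here avoids regex escaping problems if the key or the stored value contains characters such as $ or |.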
I have a short script in which I am recursively searching for a string and writing out some results. However, I have hundreds of strings to search for, so I would like to grab the value from a CSV file, use it as my search string, and move on to the next row.
Here is what I have:
function searchNum {
#I would like to go from manual number input to auto assign from CSV
$num = Read-Host 'Please input the number'
get-childitem "C:\Users\user\Desktop\SearchFolder\input" -recurse | Select-String -pattern "$num" -context 2 | Out-File "C:\Users\user\Desktop\SearchFolder\output\output.txt" -width 300 -Append -NoClobber
}
searchNum
How can I run through a CSV to assign the $num value for each line?
Do you have a CSV with several columns, one of which you want to use as search values? Or do you have a "regular" text file with one search pattern per line?
In case of the former, you could read the file with Import-Csv:
$filename = 'C:\path\to\your.csv'
$searchRoot = 'C:\Users\user\Desktop\SearchFolder\input'
foreach ($pattern in (Import-Csv $filename | % {$_.colname})) {
Get-ChildItem $searchRoot -Recurse | Select-String $pattern -Context 2 | ...
}
In case of the latter a simple Get-Content should suffice:
$filename = 'C:\path\to\your.txt'
$searchRoot = 'C:\Users\user\Desktop\SearchFolder\input'
foreach ($pattern in (Get-Content $filename)) {
Get-ChildItem $searchRoot -Recurse | Select-String $pattern -Context 2 | ...
}
I assume you need something like this
$csvFile = Get-Content -Path "myCSVfile.csv"
foreach($line in $csvFile)
{
$lineArray = $line.Split(",")
if ($lineArray -and $lineArray.Count -gt 1)
{
#Do a search num with the value from the csv file
searchNum -num $lineArray[1]
}
}
This will read a CSV file and call your function for each line. The parameter passed will be the second item on each CSV line. Note that searchNum must be changed to accept a -num parameter instead of prompting with Read-Host.
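A parameterized version of the function might look like the sketch below. The -searchRoot and -outFile parameters are additions for illustration, with defaults taken from the paths in the question; -NoClobber is dropped here on the assumption that appending to an existing file is what you want.

```powershell
# Sketch: searchNum takes the search value as a parameter instead of prompting.
function searchNum {
    param(
        [Parameter(Mandatory)] [string] $num,
        [string] $searchRoot = 'C:\Users\user\Desktop\SearchFolder\input',
        [string] $outFile    = 'C:\Users\user\Desktop\SearchFolder\output\output.txt'
    )
    Get-ChildItem $searchRoot -Recurse -File |
        Select-String -Pattern $num -Context 2 |
        Out-File $outFile -Width 300 -Append
}
```

With that in place, a CSV with a header row could drive it like this, where colname stands for whatever your column is actually called: Import-Csv 'C:\path\to\your.csv' | ForEach-Object { searchNum -num $_.colname }.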