I have some log files.
Some of the UPDATE SQL statements are getting errors, but not all.
I need to know all the statements that are getting errors so I can find the pattern of failure.
I can sort all the log files and get the unique lines, like this:
$In = "C:\temp\data"
$Out1 = "C:\temp\output1"
$Out2 = "C:\temp\output2"
Remove-Item $Out1\*.*
Remove-Item $Out2\*.*
# Get the log files from the last 90 days
Get-ChildItem $In -Filter *.log | Where-Object {$_.LastWriteTime -gt (Get-Date).AddDays(-90)} |
Foreach-Object {
$content = Get-Content $_.FullName
#filter and save content to a file
$content | Where-Object {$_ -match 'STATEMENT'} | Sort-Object -Unique | Set-Content $Out1\$_
}
# merge all the files, sort unique, write to output
Get-Content $Out1\* | Sort-Object -Unique | Set-Content $Out2\output.txt
Works great.
But some of the logs have a date-time stamp in the first 24 characters of each line. I need to strip that out; otherwise all those lines are unique.
If it helps, all the files either have the leading timestamp or they don't. The lines are not mixed within a single file.
Here is what I have so far:
# Get the log files from the last 90 days
Get-ChildItem $In -Filter *.log | Where-Object {$_.LastWriteTime -gt (Get-Date).AddDays(-90)} |
Foreach-Object {
$content = Get-Content $_.FullName
#filter and save content to a file
$s = $content | Where-Object {$_ -match 'STATEMENT'}
# strip datetime from front if exists
If (Where-Object {$s.Substring(0,1) -Match '/d'}) { $s = $s.Substring(24) }
$s | Sort-Object -Unique | Set-Content $Out1\$_
}
# merge all the files, sort unique, write to output
Get-Content $Out1\* | Sort-Object -Unique | Set-Content $Out2\output.txt
But it just writes the lines out without stripping the leading characters.
Regex /d should be \d (\ is the escape character in general, and character-class shortcuts such as d for a digit[1] must be prefixed with it).
Use a single pipeline that passes the Where-Object output to a ForEach-Object call where you can perform the conditional removal of the numeric prefix.
$content |
Where-Object { $_ -match 'STATEMENT' } |
ForEach-Object { if ($_[0] -match '\d') { $_.Substring(24) } else { $_ } } |
Set-Content $Out1\$_
[1] Strictly speaking, \d matches everything that the Unicode standard considers a digit, not just the ASCII-range digits 0 to 9; to limit matching to the latter, use [0-9].
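To make the prefix stripping and the \d versus [0-9] distinction concrete, here is a minimal sketch. The two sample lines and the exact timestamp layout are assumptions made up for illustration; only the 24-character prefix length comes from the question.

```powershell
# Hypothetical sample lines: one with a 24-character timestamp prefix, one without.
$lines = @(
    '2024-01-15 10:22:33.123 STATEMENT: UPDATE t SET a = 1',
    'STATEMENT: UPDATE t SET a = 1'
)

# Strip the prefix only when the line starts with an ASCII digit and is long enough.
$stripped = $lines | ForEach-Object {
    if ($_ -match '^[0-9]' -and $_.Length -ge 24) { $_.Substring(24) } else { $_ }
}

# After stripping, both lines are identical, so only one survives.
$unique = @($stripped | Sort-Object -Unique)
$unique

# \d also matches non-ASCII digits such as ARABIC-INDIC DIGIT THREE (U+0663).
$arabicThree = [string][char]0x0663
$arabicThree -match '\d'      # True
$arabicThree -match '[0-9]'   # False
```

The length check is an extra guard that is not in the answer above; Substring(24) throws on lines shorter than 24 characters.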
Related
How do I select every other line from a file?
For example, my file contains the strings
string1
string2
string3
string4
I want to get
string2
string4
I tried it this way:
Get-Content -Path "E:\myfile.txt" | Select-String
but I don't know how to do this with Select-String.
If you literally want to select these two lines, then I guess this is the shortest way to do that:
(Get-Content -Path "E:\myfile.txt")[1,3]
or
Get-Content -Path "E:\myfile.txt" | Select-Object -Index 1,3
However, if you mean you want to select only the even numbered lines from the file, you could do this:
# return only the even lines (for odd lines, start the loop at $i = 0)
$text = Get-Content -Path "E:\myfile.txt"; for ($i = 1; $i -lt @($text).Count; $i+=2) { $text[$i] }
Or by using Select-String
# return only the even lines (for odd lines, remove the ! exclamation mark)
(Select-String -Path "E:\myfile.txt" -Pattern '.*' | Where-Object {!($_.LineNumber % 2)}).Line
Get-Content -Path "~\Desktop\strings.txt" | Select-String -Pattern "string2|string4"
You can use the Where-Object cmdlet to filter a stream of objects (strings in this case):
Get-Content -Path "E:\myfile.txt" | Where-Object {$_ -match '[24]$'}
# or
Get-Content -Path "E:\myfile.txt" | Where-Object {$_ -like '*[24]'}
# or
Get-Content -Path "E:\myfile.txt" | Where-Object {$_.EndsWith('2') -or $_.EndsWith('4')}
If you want only even-numbered lines from the file:
Get-Content -Path "E:\myfile.txt" | Where-Object {$_.ReadCount % 2 -eq 0}
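The ReadCount filter works because Get-Content attaches a 1-based line number to each string it emits. A small sketch, assuming a throwaway file in the temp folder (the file name is made up for this demo):

```powershell
# Write a hypothetical four-line sample file to the temp folder.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'readcount-demo.txt'
Set-Content -Path $path -Value 'string1', 'string2', 'string3', 'string4'

# ReadCount is 1 for the first line, 2 for the second, and so on.
$even = @(Get-Content -Path $path | Where-Object { $_.ReadCount % 2 -eq 0 })
$odd  = @(Get-Content -Path $path | Where-Object { $_.ReadCount % 2 -eq 1 })

$even   # string2, string4
$odd    # string1, string3

Remove-Item -Path $path
```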
There are 2 text files in the CWD, a.txt and b.txt. From a.txt, I would like to delete all lines whose first 5 characters are NOT present in b.txt as any line's first 5 characters. (Or, stated otherwise, keep only those lines in a.txt whose first 5 characters are present in b.txt as any line's first 5 characters.) Content after the 5th character to the end of the line is irrelevant.
For example: a.txt
abcde000dsdsddsdsdsdsdsd
0123456xxx
kkk
xyzxyzxyzfeeeee
kkkkkkkkkkk
and b.txt:
012345aabbcc
kkkkkkkhhkkvv
nnnnnnn5777nnnn77567
Intended result (lines in a.txt whose first 5 characters are present in b.txt):
0123456xxx
kkkkkkkkkkk
When I run the code, it gives me an empty results.txt, but no error messages. What am I missing?
$pattern = "^[5]"
$set1 = Get-Content -Path a.txt
$results = New-Object -TypeName System.Text.StringBuilder
Get-Content -Path b.txt | foreach {
if ($_ -match $pattern) {
[void]$results.AppendLine($_)
}
}
$results.ToString() | Out-File -FilePath .\results.txt -Encoding ascii
Your code doesn't work because your pattern doesn't match anything. The regular expression ^[5] means "the character '5' at the beginning of the string" (the square brackets define a character class), not "5 characters at the beginning of the string". The latter would be ^.{5}. Also, you never match the content of a.txt against the content of b.txt.
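The difference between the two patterns is easy to check at the prompt. A quick sketch using lines from the question, plus one made-up string ('5abc') that starts with the digit 5:

```powershell
# ^[5] only matches a literal '5' at the start of the string.
$classOnFive  = '5abc' -match '^[5]'          # True
$classOnInput = '0123456xxx' -match '^[5]'    # False

# ^.{5} matches any five characters at the start of the string.
$fiveOnInput = '0123456xxx' -match '^.{5}'    # True
$fiveOnShort = 'kkk' -match '^.{5}'           # False (line is too short)

# A capture group extracts those five characters.
$prefix = '0123456xxx' -replace '^(.{5}).*', '$1'
$prefix   # 01234
```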
There are several ways to do what you want:
Extract the first 5 characters from each line of b.txt into an array and compare the lines of a.txt against that array. Esperento57's answer sort of uses this approach, but in a way that requires PowerShell v3 or newer. A variant that'll work on all PowerShell versions could look like this:
$pattern = '^(.{5}).*'
$ref = (Get-Content 'b.txt') -match $pattern -replace $pattern, '$1' |
Get-Unique
Get-Content 'a.txt' | Where-Object {
$ref -contains ($_ -replace $pattern, '$1')
} | Set-Content 'results.txt'
Since lookups in arrays are comparatively slow and don't scale well (they get significantly slower with increasing number of elements in the array) you could also put the reference values in a hashtable so you can do index lookups (which are significantly faster):
$pattern = '^(.{5}).*'
$ref = @{}
(Get-Content 'b.txt') -match $pattern -replace $pattern, '$1' |
ForEach-Object { $ref[$_] = $true }
Get-Content 'a.txt' | Where-Object {
$ref.ContainsKey(($_ -replace $pattern, '$1'))
} | Set-Content 'results.txt'
Another alternative would be to build a second regular expression from the substrings extracted from b.txt and compare the content of a.txt against that expression:
$pattern = '^(.{5}).*'
$list = (Get-Content 'b.txt') -match $pattern -replace $pattern, '$1' |
Get-Unique |
ForEach-Object { [regex]::Escape($_) }
$ref = '^({0})' -f ($list -join '|')
(Get-Content 'a.txt') -match $ref | Set-Content 'results.txt'
Note that each of these approaches will ignore lines shorter than 5 characters.
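For comparison, here is a sketch of the same lookup idea without regular expressions, run against the sample data from the question held in arrays instead of files. Using a HashSet is my own substitution, not part of the answer above, and it assumes a PowerShell host where that .NET type is available. Note that its default comparison is case-sensitive, unlike -contains.

```powershell
# Sample data copied from the question (in-memory stand-ins for a.txt and b.txt).
$a = 'abcde000dsdsddsdsdsdsdsd', '0123456xxx', 'kkk', 'xyzxyzxyzfeeeee', 'kkkkkkkkkkk'
$b = '012345aabbcc', 'kkkkkkkhhkkvv', 'nnnnnnn5777nnnn77567'

# Collect the 5-character prefixes of b; lines shorter than 5 characters are skipped.
$prefixes = New-Object 'System.Collections.Generic.HashSet[string]'
foreach ($line in $b) {
    if ($line.Length -ge 5) { [void]$prefixes.Add($line.Substring(0, 5)) }
}

# Keep the lines of a whose prefix is in the set.
$result = @($a | Where-Object {
    $_.Length -ge 5 -and $prefixes.Contains($_.Substring(0, 5))
})
$result   # 0123456xxx, kkkkkkkkkkk
```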
Try something like this:
$listB=get-content "c:\temp\b.txt" | where {$_.Length -gt 4} | select @{N="First5";E={$_.Substring(0, 5)}}
get-content "c:\temp\a.txt" | where {$_.Length -gt 4 -and $_.Substring(0, 5) -in $listB.First5}
If performance is a concern, consider using hashtables as an index:
$Pattern = '^(.{5}).*'
$a = @{}; $b = @{}
Get-Content -Path a.txt | Where {$_ -Match $Pattern} | ForEach {$a[$Matches[1]] = @($a[$Matches[1]] + $_)}
Get-Content -Path b.txt | Where {$_ -Match $Pattern} | ForEach {$b[$Matches[1]] = @($b[$Matches[1]] + $_)}
$a.Keys | Where {$b.ContainsKey($_)} | ForEach {$a[$_]} | Set-Content results.txt
I am using the following script that iterates through hundreds of text files looking for specific instances of the regex expression within. I need to add a second data point to the array, which tells me the object the pattern matched in.
In the below script the [Regex]::Matches($str, $Pattern) | % { $_.Value } piece returns multiple rows per file, which cannot be easily output to a file.
What I would like to know is, how would I output a 2 column CSV file, one column with the file name (which should be $_.FullName), and one column with the regex results? The code of where I am at now is below.
$FolderPath = "C:\Test"
$Pattern = "(?i)(?<=\b^test\b)\s+(\w+)\S+"
$Lines = @()
Get-ChildItem -Recurse $FolderPath -File | ForEach-Object {
$_.FullName
$str = Get-Content $_.FullName
$Lines += [Regex]::Matches($str, $Pattern) |
% { $_.Value } |
Sort-Object |
Get-Unique
}
$Lines = $Lines.Trim().ToUpper() -replace '[\r\n]+', ' ' -replace ";", '' |
Sort-Object |
Get-Unique # Cleaning up data in array
I can think of two ways, but the simplest is to use a hashtable (dict). Another way is to create psobjects to fill your Lines variable. I am going to go with the simple way so that you only need one variable, the hashtable.
$FolderPath = "C:\Test"
$Pattern = "(?i)(?<=\b^test\b)\s+(\w+)\S+"
$Results = @{}
Get-ChildItem -Recurse $FolderPath -File |
ForEach-Object {
$str = Get-Content $_.FullName
$Line = [regex]::matches($str,$Pattern) | % { $_.Value } | Sort-Object | Get-Unique
$Line = $Line.Trim().ToUpper() -Replace '[\r\n]+', ' ' -Replace ";",'' | Sort-Object | Get-Unique # Cleaning up data in array
$Results[$_.FullName] = $Line
}
$Results.GetEnumerator() | Select @{L="Folder";E={$_.Key}}, @{L="Matches";E={$_.Value}} | Export-Csv -NoType -Path <Path to save CSV>
Your results will be in $Results. $Results.Keys contains the file paths (the Folder column in the CSV), and $Results.Values contains the results from the expression. You can reference the results for a particular file by its key, $Results["File path"]. A key that does not exist returns $null.
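The psobject route mentioned above gives one CSV row per match, which avoids a known wrinkle: Export-Csv writes an array-valued property as its type name (System.Object[]) rather than its elements. Below is a minimal sketch, assuming PowerShell v3 or newer. The temp folder, the file names, and the simplified pattern are all made up for the demo; it is not the pattern from the question.

```powershell
# Build a hypothetical test folder with two small files.
$folder = Join-Path ([System.IO.Path]::GetTempPath()) ("regex-demo-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $folder | Out-Null
Set-Content -Path (Join-Path $folder 'one.txt') -Value 'test alpha', 'other', 'test beta'
Set-Content -Path (Join-Path $folder 'two.txt') -Value 'nothing here'

# Simplified stand-in pattern: the word following "test" at the start of a line.
$pattern = '(?im)^test\s+(\w+)'

# Emit one object per match, carrying the file name alongside the matched text.
$rows = @(Get-ChildItem -Path $folder -File | ForEach-Object {
    $file = $_
    $text = Get-Content -Path $file.FullName -Raw
    [regex]::Matches($text, $pattern) | ForEach-Object {
        [pscustomobject]@{ File = $file.Name; Match = $_.Groups[1].Value.ToUpper() }
    }
})

# Two columns, one row per match; swap ConvertTo-Csv for Export-Csv to write a file.
$csv = $rows | ConvertTo-Csv -NoTypeInformation
$csv

Remove-Item -Path $folder -Recurse
```

Files without any match (two.txt here) simply contribute no rows.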
I have a massive amount of .nc files (text files) where I need to change different lines based on their line number and content.
Example:
So far I have:
Get-ChildItem I:\temp *.nc -recurse | ForEach-Object {
$c = ($_ | Get-Content)
$c = $c -replace "S355J2","S235JR2"
$c = $c.GetType() | Format-Table -AutoSize
$c = $c -replace $c[3],$c[4]
[IO.File]::WriteAllText($_.FullName, ($c -join "`r`n"))
}
This is not working, however, since it returns only a few PowerShell lines to each file, instead of the original (changed) content.
I don't know what you expect $c = $c.GetType() | Format-Table -AutoSize to do, but it most likely doesn't do whatever it is you're expecting.
If I understand your question correctly you essentially want to
remove the line pos,
replace the code S355J2 with S235JR2, and
remove a section SI if it exists.
The following code should work:
Get-ChildItem I:\temp *.nc -Recurse | ForEach-Object {
(Get-Content $_.FullName | Out-String) -replace 'pos\r\n\s+' -replace 'S355J2', 'S235JR2' -replace '(?m)^SI\r\n(\s+.*\n)+' |
Set-Content $_.FullName
}
Out-String mangles the content of the input file into a single string, and the daisy-chained replacement operations modify that string before it's written back to the file. The expression (?m)^SI\r\n(\s+.*\n)+ matches a line beginning with SI and followed by one or more indented lines. The (?m) modifier is to allow matching start-of-line in a multiline string, otherwise ^ would only match the beginning of the string.
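The chained replacements can be tried on an in-memory string first. The sample text below is a guess at the file layout (the example from the question is not shown), so treat it purely as an illustration. The \r\n in the patterns assumes Windows line endings, which is what Out-String produces on Windows.

```powershell
# Hypothetical file content with CRLF line endings.
$crlf = "`r`n"
$text = ('ST', '  pos', '  S355J2', 'SI', '  a', '  b', 'EN', '') -join $crlf

# Without (?m), ^ only matches at the very start of the string.
$plain     = $text -match '^SI'        # False
$multiline = $text -match '(?m)^SI'    # True

# Same chain as in the answer: drop "pos", swap the code, remove the SI section.
$result = $text -replace 'pos\r\n\s+' -replace 'S355J2', 'S235JR2' -replace '(?m)^SI\r\n(\s+.*\n)+'

# What should remain: ST, the replaced code line, and EN.
$expected = ('ST', '  S235JR2', 'EN', '') -join $crlf
$result -eq $expected   # True
```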
Edit: If you need to replace variable text in one line with the text from the following line (thus duplicating that line), you're indeed better off working with an array for that. Array indices are zero-based, so $txt[3] is the 4th line and $txt[4] the 5th; adjust the indices to the lines you mean. Delay the mangling of the string array until after that replacement:
Get-ChildItem I:\temp *.nc -Recurse | ForEach-Object {
$txt = @(Get-Content $_.FullName)
$txt[3] = $txt[4]
($txt | Out-String) -replace 'S355J2', 'S235JR2' -replace '(?m)^SI\r\n(\s+.*\n)+' |
Set-Content $_.FullName
}
I would like to get content from files in a folder (ignoring the header lines, since some file may ONLY contain the header). But in the output, I would like to include the filename from which the line is read. So far, I have the following:
Get-ChildItem | Get-Content | Where { $_ -notlike "HEADER_LINE_TEXT" } | Out-File -FilePath output_text.txt
I've tried to work with creating a variable in the Where block, $filename=$_.BaseName, and using it in the output, but this didn't work.
EDIT:
I ended up with the following:
Get-ChildItem -Path . |
Where-Object { $_.FullName -like "*records.txt"; $fname=$_FullName; } |
Get-Content |
Select-Object { ($fname + "|" + $_.Trim()) } |
Where { $_ -notlike "*HEADER_LINE_TEXT*" } |
Format-Table -HideTableHeaders |
Out-File -FilePath output_text.txt
This looks lengthy, and can probably be made shorter and clearer. Can someone help with cleaning this up a bit? I'll either post the solution, or vote for a cleaner solution, if one is posted. Thanks.
This looks like a case where it would be more readable not to make it a one-liner, at the cost of a little additional memory usage.
$InputFolder = "C:\example"
$OutputFile = "C:\example\output_text.txt"
$Files = Get-ChildItem $InputFolder | Where-Object { $_.FullName -like "*records.txt"}
Foreach ($File in $Files) {
$FilteredContent = Get-Content $File.FullName | Where-Object {$_ -notlike "*HEADER_LINE_TEXT*"}
$Output = $FilteredContent | Foreach-Object { "$($File.FullName)|$($_.Trim())" }
$Output | Out-File $OutputFile -Append
}
If you are going to go one-liner style for brevity, you could cut down on length by using positional parameters and aliases.
Here are a couple other changes:
No need for the second semicolon in your first where block.
I think your variable wasn't working because you were missing the period between $_ and fullname.
Format-Table isn't needed because you already have the string you want to output
You can optimize a little by moving the second where earlier so that you don't trim() on lines you are just going to filter
Looks like you want to use foreach instead of select
Removed the + operator for string concatenation, instead using $() to evaluate the expression inside the string
gci . |
? { $_.FullName -like "*records.txt"; $fname=$_.FullName } |
% { gc $_.FullName } |
? { $_ -notlike "*HEADER_LINE_TEXT*" } |
% { "$fname|$($_.Trim())" } |
Out-File output_text.txt
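If setting $fname as a side effect inside the Where block feels fragile, a nested ForEach-Object keeps the file in an ordinary variable. A sketch under assumptions: the temp folder and file names are invented for the demo, and it prints the short file name rather than the full path to keep the output readable.

```powershell
# Hypothetical input: one file with data rows, one containing only the header.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ("records-demo-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $dir | Out-Null
Set-Content -Path (Join-Path $dir 'a_records.txt') -Value 'HEADER_LINE_TEXT', ' row1 ', 'row2'
Set-Content -Path (Join-Path $dir 'b_records.txt') -Value 'HEADER_LINE_TEXT'

$lines = @(Get-ChildItem -Path $dir -Filter '*records.txt' | ForEach-Object {
    $file = $_   # remember the current file for the inner pipeline
    Get-Content -Path $file.FullName |
        Where-Object { $_ -notlike '*HEADER_LINE_TEXT*' } |
        ForEach-Object { '{0}|{1}' -f $file.Name, $_.Trim() }
})
$lines   # a_records.txt|row1, a_records.txt|row2

Remove-Item -Path $dir -Recurse
```

Piping $lines to Out-File would produce the same kind of output file as the one-liner; use $file.FullName in the format string if the full path is wanted.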