Read numbers from multiple files and sum

Read numbers from multiple files and sum - powershell

I have a logfile C:\temp\data.log
It contains the following data:
totalSize = 222,6GB
totalSize = 4,2GB
totalSize = 56,2GB
My goal is to extract the numbers from the file and sum them, including the digits after the comma. So far this works only if my regex ignores the part after the comma and captures just the number in front of it. The other problem appears when the file contains a single line, as in the example below: the number 222 is then split across three files, each containing the digit 2. If the log file contains two or more lines, the script sums them as it should, as long as I don't use the value after the comma.
totalSize = 222,6GB
Here is the regex fragment I would append to the existing $regex variable to include the part after the comma:
[,](\d{1,})
I haven't included it in the script, because the values then no longer sum up properly.
The whole script is below:
#Create path variable to store contents grabbed from $log_file
$extracted_strings = "C:\temp\amount.txt"
#Create path variable to read from original file
$log_file = "C:\temp\data.log"
#Read data from file $log_file
Get-Content -Path $log_file | Select-String "(totalSize = )" | out-file $extracted_strings
#Create path variable to write only numbers to file $output_numbers
$output_numbers = "C:\temp\amountresult.log"
#Create path variable to write to file jobblog1
$joblog1_file = "C:\temp\joblog1.txt"
#Create path variable to write to file jobblog2
$joblog2_file = "C:\temp\joblog2.txt"
#Create path variable to write to file jobblog3
$joblog3_file = "C:\temp\joblog3.txt"
#Create path variable to write to file jobblog4
$joblog4_file = "C:\temp\joblog4.txt"
#Create path variable to write to file jobblog5
$joblog5_file = "C:\temp\joblog5.txt"
#Create pattern variable to read with select string
$regex = "[= ](\d{1,})"
select-string -Path $extracted_strings -Pattern $regex -AllMatches | % { $_.Matches } | % { $_.Value } > $output_numbers
(Get-Content -Path $output_numbers)[0..0] -replace '\s' > $joblog1_file
(Get-Content -Path $output_numbers)[1..1] -replace '\s' > $joblog2_file
(Get-Content -Path $output_numbers)[2..2] -replace '\s' > $joblog3_file
(Get-Content -Path $output_numbers)[3..3] -replace '\s' > $joblog4_file
(Get-Content -Path $output_numbers)[4..4] -replace '\s' > $joblog5_file
$jobdata0 = (Get-Content -Path $joblog1_file)
$jobdata1 = (Get-Content -Path $joblog2_file)
$jobdata2 = (Get-Content -Path $joblog3_file)
$jobdata3 = (Get-Content -Path $joblog4_file)
$jobdata4 = (Get-Content -Path $joblog5_file)
$result = $jobdata0 + $jobdata1 + $jobdata2 + $jobdata3 + $jobdata4
$result
So my questions are:
How can I get this to work when the file C:\temp\data.log contains only one line, without that single number being split across multiple files? It should also keep working with multiple lines, as it does now.
And how can I include the values after the comma in the calculation?
The result when I run this script should be 282. Is it perhaps also possible to shorten the script?

Assuming $log_file has contents like the example above:
Get-Content $log_file | Where-Object{$_ -match "\d+(,\d+)?"} |
ForEach-Object{[double]($matches[0] -replace ",",".")} |
Measure-Object -Sum |
Select-Object -ExpandProperty sum
Match all of the lines that have numerical values with optional commas. I am assuming they could be optional as I do not know how whole numbers appear. Replace the comma with a period and cast as a double. Using measure object we sum up all the values and expand the result.
Not the only way to do it but it is simple enough to understand what is going on.
You can always wrap the above in a loop so that you can use it for multiple files. Get-ChildItem "C:\temp\" -Filter "job*" | ForEach-Object... etc.
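As a self-contained illustration of the pipeline above, here is a minimal sketch that runs against an in-memory array instead of a file. The variable names $lines, $sum and $single are only for this example, and the pattern assumes each value follows "= " as in the sample data. Because nothing is written to intermediate files, the single-line case from the question behaves the same as the multi-line case.

```powershell
# Sample data from the question, held in memory (assumption: one value per line).
$lines = @(
    'totalSize = 222,6GB'
    'totalSize = 4,2GB'
    'totalSize = 56,2GB'
)

# Capture the number after "= ", with an optional ",digits" part,
# swap the decimal comma for a period, and cast to [double].
$sum = $lines |
    ForEach-Object {
        if ($_ -match '= (\d+(?:,\d+)?)') {
            [double]($Matches[1] -replace ',', '.')
        }
    } |
    Measure-Object -Sum |
    Select-Object -ExpandProperty Sum

# The same pipeline with a single input line.
$single = 'totalSize = 222,6GB' |
    ForEach-Object {
        if ($_ -match '= (\d+(?:,\d+)?)') {
            [double]($Matches[1] -replace ',', '.')
        }
    } |
    Measure-Object -Sum |
    Select-Object -ExpandProperty Sum

$sum      # approximately 283 (floating-point sum of 222.6, 4.2 and 56.2)
$single   # 222.6
```

To read from the real log instead, $lines could be replaced with Get-Content $log_file.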

Matt's helpful answer shows a concise and effective solution.
As for what you tried, and why a line with a single token such as 222,6 can result in multiple outputs from this command:
select-string -Path $extracted_strings -Pattern $regex -AllMatches |
% { $_.Matches } | % { $_.Value } > $output_numbers
Your regex, [= ](\d{1,}), does not explain the symptom, but just \d{1,} would, because that would capture 222 and 6 separately, due to -AllMatches.
[= ](\d{1,}) probably doesn't do what you want, because [= ] matches a single character that can be either a = or a space; with your sample input, this would only ever match the space before the numbers.
To match characters in sequence, simply place them next to each other: = (\d{1,})
Also note that even though you're enclosing \d{1,} in (...) to create a capture group, your later code doesn't actually use what that capture group matched; use (...) only if you need it for precedence (in which case you can even opt out of subexpression capturing with (?:...)) or if you do have a need to access what the subexpression matched.
That said, you could actually utilize a capture group here (an alternative would be to use a look-behind assertion), which allows you to both match the leading =<space> for robustness and extract only the numeric token of interest (saving you the need to trim whitespace later).
If we simplify \d{1,} to \d+ and append ,\d+ to also match the number after the comma, we get:
= (\d+,\d+)
The [System.Text.RegularExpressions.Match] instances returned by Select-String then allow us to access what the capture group captured, via the .Groups property (the following simplified example also works with multiple input lines):
> 'totalSize = 222,6GB' | Select-String '= (\d+,\d+)' | % { $_.Matches.Groups[1].Value }
222,6
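The look-behind alternative mentioned above could be sketched as follows. The variable name $value is only for illustration; the pattern assumes the number is always preceded by "= " and always has a comma part, as in the sample line.

```powershell
# Look-behind variant: '(?<== )' asserts that "= " precedes the number
# without making it part of the match, so no capture group is needed.
$value = 'totalSize = 222,6GB' |
    Select-String -Pattern '(?<== )\d+,\d+' |
    ForEach-Object { $_.Matches[0].Value }

$value   # 222,6
```

With this form the whole match is the numeric token, so .Matches[0].Value can be used directly instead of .Groups[1].Value.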
On a side note: your code contains a lot of repetition that could be eliminated with arrays and pipelines; for instance:
$joblog1_file = "C:\temp\joblog1.txt"
$joblog2_file = "C:\temp\joblog2.txt"
$joblog3_file = "C:\temp\joblog3.txt"
$joblog4_file = "C:\temp\joblog4.txt"
$joblog5_file = "C:\temp\joblog5.txt"
could be replaced with (create an array of filenames, using a pipeline):
$joblog_files = 1..5 | % { "C:\temp\joblog$_.txt" }
and
$jobdata0 = (Get-Content -Path $joblog1_file)
$jobdata1 = (Get-Content -Path $joblog2_file)
$jobdata2 = (Get-Content -Path $joblog3_file)
$jobdata3 = (Get-Content -Path $joblog4_file)
$jobdata4 = (Get-Content -Path $joblog5_file)
$result = $jobdata0 + $jobdata1 + $jobdata2 + $jobdata3 + $jobdata4
could then be replaced with (pass the array of filenames to Get-Content):
$result = Get-Content $joblog_files

Related

Duplicate lines in a text file multiple times based on a string and alter duplicated lines

SHORT: I am trying to duplicate lines in all files in a folder based on a certain string and then replace original strings in duplicated lines only.
Contents of the original text file (there are double quotes in the file):
"K:\FILE1.ini"
"K:\FILE1.cfg"
"K:\FILE100.cfg"
I want to duplicate the entire line 4 times only if a string ".ini" is present in a line.
After duplicating the line, I want to change the string in those duplicated lines (original line stays the same) to: for example, ".inf", ".bat", ".cmd", ".mov".
So the expected result of the script is as follows:
"K:\FILE1.ini"
"K:\FILE1.inf"
"K:\FILE1.bat"
"K:\FILE1.cmd"
"K:\FILE1.mov"
"K:\FILE1.cfg"
"K:\FILE100.cfg"
Those files are small, so using streams is not necessary.
I am at the beginning of my PowerShell journey, but thanks to this community, I already know how to replace string in files recursively:
$directory = "K:\PS"
Get-ChildItem $directory -file -recurse -include *.txt |
ForEach-Object {
(Get-Content $_.FullName) -replace ".ini",".inf" |
Set-Content $_.FullName
}
but I have no idea how to duplicate certain lines multiple times and handle multiple string replacements in those duplicated lines.
Yet ;)
Could you point me in the right direction?

To achieve this with the operator -replace you can do:
#Define strings to replace pattern with
$2replace = @('.inf','.bat','.cmd','.mov','.ini')
#Get files, use filter instead of include = faster
get-childitem -path [path] -recurse -filter '*.txt' | %{
    $cFile = $_
    #add new strings to array newData
    $newData = @(
        #Read file
        get-content $_.fullname | %{
            #If line matches .ini
            If ($_ -match '\.ini'){
                $cstring = $_
                #Add new strings
                $2replace | %{
                    #Output new strings
                    $cstring -replace '\.ini',$_
                }
            }
            #output current string
            Else{
                $_
            }
        }
    )
    #Write to disk
    $newData | set-content $cFile.fullname
}
This gives you the following output:
$newdata
"K:\FILE1.inf"
"K:\FILE1.bat"
"K:\FILE1.cmd"
"K:\FILE1.mov"
"K:\FILE1.ini"
"K:\FILE1.cfg"
"K:\FILE100.cfg"

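The answer above emits the original .ini line last. If the original line should come first, as in the question's expected result, a minimal sketch on in-memory lines could look like this (the names $extensions, $lines and $expanded are assumptions for this example, not part of the answer):

```powershell
# Replacement extensions for the duplicated lines (the original .ini line is emitted separately).
$extensions = '.inf', '.bat', '.cmd', '.mov'

# Sample content from the question, including the double quotes.
$lines = @(
    '"K:\FILE1.ini"'
    '"K:\FILE1.cfg"'
    '"K:\FILE100.cfg"'
)

$expanded = foreach ($line in $lines) {
    # Always output the original line first.
    $line
    # For .ini lines, add one altered copy per replacement extension.
    if ($line -match '\.ini') {
        foreach ($ext in $extensions) {
            $line -replace '\.ini', $ext
        }
    }
}

$expanded
```

The same loop body could replace the inner ForEach-Object in the answer, with Get-Content supplying the lines and Set-Content writing $expanded back.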
Powershell Files fetch

I am looking for some help creating a PowerShell script.
I have a folder with lots of files, and I need only those files whose content meets both of the following conditions:
it must contain a string matching any of the patterns listed in file1 (the content of file1 is -IND 23042528525 or INDE 573626236 or DSE3523623; there can be more strings like these)
it must also contain a date between 03152022 and 03312022, in the format mmddyyyy.
The files could be old, so the creation time is irrelevant.
The result should then be saved as a CSV containing the paths of the files that fulfill the above two conditions.
Currently I am using the command below, which only gives me the files fulfilling the first condition.
$table = Get-Content C:\Users\username\Downloads\ISIN.txt
Get-ChildItem `
-Path E:\data\PROD\server\InOut\Backup\*.txt `
-Recurse |
Select-String -Pattern ($table)|
Export-Csv C:\Users\username\Downloads\File_Name.csv -NoTypeInformation

To test if a file contains a certain keyword from a range of keywords, you can use regex for that. If you also want to find at least one valid date in format 'MMddyyyy' in that file, you need to do some extra work.
Try below:
# read the keywords from the file. Ensure special characters are escaped and join them with '|' (regex 'OR')
$keywords = (Get-Content -Path 'C:\Users\username\Downloads\ISIN.txt' | ForEach-Object {[regex]::Escape($_)}) -join '|'
# create a regex to capture the date pattern (8 consecutive digits)
$dateRegex = [regex]'\b(\d{8})\b' # \b means word boundary
# and a datetime variable to test if a found date is valid
$testDate = Get-Date
# set two variables to the start and end date of your range (dates only, times set to 00:00:00)
$rangeStart = (Get-Date).AddDays(1).Date # tomorrow
$rangeEnd = [DateTime]::new($rangeStart.Year, $rangeStart.Month, 1).AddMonths(1).AddDays(-1) # end of the month
# find all .txt files and loop through. Capture the output in variable $result
$result = Get-ChildItem -Path 'E:\data\PROD\server\InOut\Backup' -Filter '*.txt' -File -Recurse |
ForEach-Object {
$content = Get-Content -Path $_.FullName -Raw
# first check if any of the keywords can be found
if ($content -match $keywords) {
# now check if a valid date pattern 'MMddyyyy' can be found as well
$dateFound = $false
$match = $dateRegex.Match($content)
while ($match.Success -and !$dateFound) {
# we found a matching pattern. Test if this is a valid date and if so
# set the $dateFound flag to $true and exit the while loop
if ([datetime]::TryParseExact($match.Groups[1].Value,
'MMddyyyy',[CultureInfo]::InvariantCulture,
[System.Globalization.DateTimeStyles]::None,
[ref]$testDate)) {
# check if the found date is in the set range
# this tests INCLUDING the start and end dates
$dateFound = ($testDate -ge $rangeStart -and $testDate -le $rangeEnd)
}
$match = $match.NextMatch()
}
# finally, if we also successfully found a date pattern, output the file
if ($dateFound) { $_.FullName }
elseif ($content -match '\bUNKNOWN\b') {
# here you output again, because unknown was found instead of a valid date in range
$_.FullName
}
}
}
# result is now either empty or a list of file fullnames
$result | set-content -Path 'C:\Users\username\Downloads\MatchedFiles.txt'
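The script above takes its range from the current date. For the fixed range in the question (03152022 to 03312022), the date check on its own could be sketched like this. The function name Test-ContainsDateInRange and its parameters are hypothetical, introduced only for this example.

```powershell
# Hypothetical helper: returns $true if $Text contains at least one valid
# 'MMddyyyy' date between $RangeStart and $RangeEnd (both inclusive).
function Test-ContainsDateInRange {
    param(
        [string]$Text,
        [datetime]$RangeStart,
        [datetime]$RangeEnd
    )
    $parsed = [datetime]::MinValue
    # Look at every standalone run of exactly 8 digits.
    foreach ($m in [regex]::Matches($Text, '\b\d{8}\b')) {
        $ok = [datetime]::TryParseExact($m.Value, 'MMddyyyy',
            [cultureinfo]::InvariantCulture,
            [System.Globalization.DateTimeStyles]::None,
            [ref]$parsed)
        if ($ok -and $parsed -ge $RangeStart -and $parsed -le $RangeEnd) {
            return $true
        }
    }
    return $false
}

# Range taken from the question.
$from = [datetime]::ParseExact('03152022', 'MMddyyyy', [cultureinfo]::InvariantCulture)
$to   = [datetime]::ParseExact('03312022', 'MMddyyyy', [cultureinfo]::InvariantCulture)

$inRange    = Test-ContainsDateInRange -Text 'INDE 573626236 booked 03202022' -RangeStart $from -RangeEnd $to
$outOfRange = Test-ContainsDateInRange -Text 'INDE 573626236 booked 04012022' -RangeStart $from -RangeEnd $to
$inRange      # True
$outOfRange   # False
```

The 9-digit number in the sample text is skipped because the pattern only accepts runs of exactly 8 digits between word boundaries.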

PowerShell - Find and replace multiple patterns to anonymize file

I need your help. I have a log.txt file with various data in it which I have to anonymize.
I would like to retrieve all the "strings" matching predefined patterns and replace each of them with another value. What is important is that each new string matching the same pattern (with a different value from the previous ones) should be replaced by the predefined value with a counter increased by 1 (e.g. "orderID = 123ABC" becomes "orderID = order1" and "orderID=456ABC" becomes "orderID=order2").
There are more than 20 patterns to search for, so it is not possible to put them all on a single line.
My idea is:
Define "patterns.txt" file
Define "replace.txt" file ("pattern" value and replacement value)
Search for all "patterns" in the log file, the result will be ARRAY
Find the unique entries in that ARRAY
Get the "replacement" value for each unique entry in the ARRAY
Replace all occurrences in log.txt. The tricky part here is that any occurrence of the same type (but different value from the previous one) needs to be incremented by (+1) in order to be different from the one before.
Example of what I have:
<requestID>qwerty1-qwerty2-qwerty3</requestID>
<requestID>12345a-12345b-12345c</requestID>
<requestID>qwerty1-qwerty2-qwerty3</requestID>
<requestID>qwerty1-qwerty2-qwerty3</requestID>
<orderID>012345ABCDE</orderID>
<orderID>012345ABCDE</orderID>
<orderID>ABCDE012345</orderID>
<orderID>ABCDE012345</orderID>
<keyId>XYZ123</keyId>
<keyId>ABC987</keyId>
<keyId>XYZ123</keyId>
Desired result:
<requestID>Request-1</requestID>
<requestID>Request-2</requestID>
<requestID>Request-1</requestID>
<requestID>Request-1</requestID>
<orderID>Order-1</orderID>
<orderID>Order-1</orderID>
<orderID>Order-2</orderID>
<orderID>Order-2</orderID>
<keyId>Key-1</keyId>
<keyId>Key-2</keyId>
<keyId>Key-1</keyId>
For the moment I managed only to find the unique values per type:
$N = "C:\FindAndReplace\input.txt"
$Patterns = "C:\FindAndReplace\pattern.txt"
(Select-String $N -Pattern '<requestID>\w{6}-\w{6}-\w{6}</requestID>').Matches.Value | Sort-Object -Descending -Unique
(Select-String $N -Pattern '<orderID>\w{20}</orderID>').Matches.Value | Sort-Object -Descending -Unique
(Select-String $N -Pattern '<keyId>\w{8}</keyId>').Matches.Value | Sort-Object -Descending -Unique
Thanks in advance for any suggestion on how to progress.

Your patterns don't match your sample data. I've corrected the patterns to accommodate the actual sample data.
It seems a simple hash table per type would fulfill the need to keep track of matches and counts. If we process the log file with a switch statement using the -Regex and -File parameters we can work on each line at a time. The logic for each is
Check if the current match exists in the specific type's match array.
If not, add it with its replacement value (type-count) and increment the count.
If it does exist, use the already defined replacement value.
Capture all the output in a variable and then write it out to file when done.
Create the example log file
$log = New-TemporaryFile
@'
<requestID>qwerty1-qwerty2-qwerty3</requestID>
<requestID>12345a-12345b-12345c</requestID>
<requestID>qwerty1-qwerty2-qwerty3</requestID>
<requestID>qwerty1-qwerty2-qwerty3</requestID>
<orderID>012345ABCDE</orderID>
<orderID>012345ABCDE</orderID>
<orderID>ABCDE012345</orderID>
<orderID>ABCDE012345</orderID>
<keyId>XYZ123</keyId>
<keyId>ABC987</keyId>
<keyId>XYZ123</keyId>
'@ | Set-Content $log -Encoding UTF8
Define "tracker" variables for each type containing the count and a matches array
$Request = @{
    Count = 1
    Matches = @()
}
$Order = @{
    Count = 1
    Matches = @()
}
$Key = @{
    Count = 1
    Matches = @()
}
Read and process the log file line by line
$output = switch -Regex -File $log {
    '<requestID>(\w{6,7}-\w{6,7}-\w{6,7})</requestID>' {
        if(!$Request.matches.($matches.1))
        {
            $Request.matches += @{$matches.1 = "Request-$($Request.count)"}
            $Request.count++
        }
        $_ -replace $matches.1,$Request.matches.($matches.1)
    }
    '<orderID>(\w{11})</orderID>' {
        if(!$Order.matches.($matches.1))
        {
            $Order.matches += @{$matches.1 = "Order-$($Order.count)"}
            $Order.count++
        }
        $_ -replace $matches.1,$Order.matches.($matches.1)
    }
    '<keyId>(\w{6})</keyId>' {
        if(!$Key.matches.($matches.1))
        {
            $Key.matches += @{$matches.1 = "Key-$($Key.count)"}
            $Key.count++
        }
        $_ -replace $matches.1,$Key.matches.($matches.1)
    }
    default {$_}
}
$output | Set-Content $log -Encoding UTF8
The $log file now contains
<requestID>Request-1</requestID>
<requestID>Request-2</requestID>
<requestID>Request-1</requestID>
<requestID>Request-1</requestID>
<orderID>Order-1</orderID>
<orderID>Order-1</orderID>
<orderID>Order-2</orderID>
<orderID>Order-2</orderID>
<keyId>Key-1</keyId>
<keyId>Key-2</keyId>
<keyId>Key-1</keyId>

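Since the question mentions more than 20 patterns, one possible variation on the approach above is to drive the same bookkeeping from a table of rules rather than one switch branch per type. The following is only a sketch under that assumption: the $rules structure, its Label/Pattern/Seen keys and the look-around patterns are made up for this example and work on in-memory lines.

```powershell
# Shortened sample data from the question.
$lines = @(
    '<requestID>qwerty1-qwerty2-qwerty3</requestID>'
    '<requestID>12345a-12345b-12345c</requestID>'
    '<requestID>qwerty1-qwerty2-qwerty3</requestID>'
    '<orderID>012345ABCDE</orderID>'
    '<orderID>ABCDE012345</orderID>'
    '<keyId>XYZ123</keyId>'
    '<keyId>ABC987</keyId>'
    '<keyId>XYZ123</keyId>'
)

# One rule per type: a label, a pattern matching only the value between
# the tags, and a hashtable remembering the replacements handed out so far.
$rules = @(
    @{ Label = 'Request'; Pattern = '(?<=<requestID>)[^<]+(?=</requestID>)'; Seen = @{} }
    @{ Label = 'Order';   Pattern = '(?<=<orderID>)[^<]+(?=</orderID>)';     Seen = @{} }
    @{ Label = 'Key';     Pattern = '(?<=<keyId>)[^<]+(?=</keyId>)';         Seen = @{} }
)

$anonymized = foreach ($line in $lines) {
    foreach ($rule in $rules) {
        if ($line -match $rule.Pattern) {
            $value = $Matches[0]
            # First time this value is seen: assign the next number for its type.
            if (-not $rule.Seen.ContainsKey($value)) {
                $rule.Seen[$value] = '{0}-{1}' -f $rule.Label, ($rule.Seen.Count + 1)
            }
            # Literal (non-regex) replacement of the value in the line.
            $line = $line.Replace($value, $rule.Seen[$value])
        }
    }
    $line
}

$anonymized
```

Adding a new type then means adding one entry to $rules; the rules could also be loaded from a file such as the patterns.txt the question proposes.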
PowerShell search for text and adding a line

Sorry for the long post. Wanted to explain in detail.
I'm trying to achieve three things and very nearly there. Probably a school boy error. Tried nested loops etc but could not get it working.
It appears I need to split the $resultszone array.
Search for specific areas within file. In the example below, it's the section after \zones\, test1.in-addr.arpa, test2.in-addr.arpa, etc.
Copy and trim content after area found. In first example, just test1.in-addr.arpa (Removing the beginning "\" and end "]"
Add a line including the area found (example test1.in-addr.arpa), to below the line containing "Type".
Example source file:
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\DNS Server\Zones\test1.in-addr.arpa]
"Type"=dword:00000001
"SecureSecondaries"=dword:00000002
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\DNS Server\Zones\test2.in-addr.arpa]
"Type"=dword:00000001
"SecureSecondaries"=dword:00000002
Expected result
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\DNS Server\Zones\test1.in-addr.arpa]
"Type"=dword:00000001
"DatabaseFile"="test1.in-addr.arpa.dns"
"SecureSecondaries"=dword:00000002
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\DNS Server\Zones\test2.in-addr.arpa]
"Type"=dword:00000001
"DatabaseFile"="test2.in-addr.arpa.dns"
"SecureSecondaries"=dword:00000002
I've managed to achieve all using the code below, except it adds a line including all the results from area found, for every section.
For example:
"DatabaseFile"="test1.in-addr.arpa test2.in-addr.arpa
#Get FileName Path
$FileName = "C:\temp\test.conf"
#Search for pattern in file and trim to desired format.
#Store array in $resultsZone
$resultszone = Select-String -Path "c:\temp\test.conf" -Pattern '(?<=Zones)(.*)' |
select -expa matches |
select -expa value |
% { $_.Trim("\]") }
# Get contents of file
(Get-Content $FileName) | ForEach-Object {
#Start Loop to find area of File to insert line
$_ # send the current line to output
if ($_ -match "type") {
#Add Line after the selected pattern (type) including area trimmed
"""DatabaseFile" + """=""" + $resultszone + ".dns" + """"
}
} | Set-Content C:\temp\elctest.conf

I think this achieves what you're looking for:
$FileName = "C:\Temp\test.conf"
Get-Content $FileName | ForEach-Object {
$Match = ($_ | Select-String -pattern '(?<=Zones\\)(.*)').matches.value
if ($Match) { $LastMatch = ($Match).Trim("\]") }
$_
if ($LastMatch -and $_ -match 'type') {
"""DatabaseFile" + """=""" + $LastMatch + ".dns" + """"
}
} | Set-Content C:\Temp\elctest.conf
The fix is that we do the Select-String within the loop against each line, and then store when it matches in another variable (named $LastMatch) so that when we reach the line where we want to insert the previous time it matched, we have it.
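The "remember the last zone, then insert after Type" idea can also be tried without touching any files. The sketch below works on in-memory lines; the names $lines, $zone and $patched are only for illustration, and the pattern assumes each section header ends with \Zones\<name>].

```powershell
# Sample registry export lines from the question.
$lines = @(
    '[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\DNS Server\Zones\test1.in-addr.arpa]'
    '"Type"=dword:00000001'
    '"SecureSecondaries"=dword:00000002'
    '[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\DNS Server\Zones\test2.in-addr.arpa]'
    '"Type"=dword:00000001'
    '"SecureSecondaries"=dword:00000002'
)

$zone = $null
$patched = foreach ($line in $lines) {
    # Remember the zone name whenever a section header is seen.
    if ($line -match '\\Zones\\([^\]]+)\]$') {
        $zone = $Matches[1]
    }
    # Pass the current line through unchanged.
    $line
    # After a "Type" line, add the DatabaseFile entry for the current zone.
    if ($zone -and $line -match '^"Type"=') {
        '"DatabaseFile"="{0}.dns"' -f $zone
    }
}

$patched
```

Capturing the name with a group avoids the separate Trim step, since the leading backslash and trailing bracket are never part of the capture.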

Performing A String Operation in a -replace Expression

I'm trying to use String.Substring() to replace every string with its substring from a certain position. I'm having a hard time figuring out the right syntax for this.
$dirs = Get-ChildItem -Recurse $path | Format-Table -AutoSize -HideTableHeaders -Property @{n='Mode';e={$_.Mode};width=50}, @{n='LastWriteTime';e={$_.LastWriteTime};width=50}, @{n='Length';e={$_.Length};width=50}, @{n='Name';e={$_.FullName -replace "(.:.*)", "*($(str($($_.FullName)).Substring(4)))*"}} | Out-String -Width 40960
I'm referring to the following expression
e={$_.FullName -replace "(.:.*)", "*($(str($($_.FullName)).Substring(4)))*"}}
The substring from the 4th character isn't replacing the Full Name of the path.
The paths in question are longer than 4 characters.
The output is just empty for the Full Name when I run the script.
Can someone please help me out with the syntax
EDIT
The unaltered list of strings (as Get-ChildItem recurses) would be
D:\this\is\where\it\starts
D:\this\is\where\it\starts\dir1\file1
D:\this\is\where\it\starts\dir1\file2
D:\this\is\where\it\starts\dir1\file3
D:\this\is\where\it\starts\dir1\dir2\file1
The $_.FullName will therefore take on the value of each of the strings listed above.
Given an input like D:\this\is or D:\this\is\where, then I'm computing the length of this input (including the delimiter \) and then replacing $_.FullName with a substring beginning from the nth position where n is the length of the input.
If input is D:\this\is, then length is 10.
Expected output is
\where\it\starts
\where\it\starts\dir1\file1
\where\it\starts\dir1\file2
\where\it\starts\dir1\file3
\where\it\starts\dir1\dir2\file1

If you want to remove a particular prefix from a string you can do so like this:
$prefix = 'D:\this\is'
...
$_.FullName -replace ('^' + [regex]::Escape($prefix))
To remove a prefix of a given length you can do something like this:
$len = 4
...
$_.FullName -replace "^.{$len}"
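Both variants can be tried on plain strings before wiring them into a calculated property. In this sketch the $paths array stands in for the FullName values and is an assumption made for the example, as are the $byPrefix and $byLength names.

```powershell
$prefix = 'D:\this\is'

# Stand-ins for $_.FullName values from Get-ChildItem.
$paths = @(
    'D:\this\is\where\it\starts'
    'D:\this\is\where\it\starts\dir1\file1'
)

# Variant 1: remove the literal prefix (escaped so that '\' and '.' are not treated as regex syntax).
$byPrefix = $paths | ForEach-Object { $_ -replace ('^' + [regex]::Escape($prefix)) }

# Variant 2: remove the first N characters, where N is the prefix length.
$len = $prefix.Length
$byLength = $paths | ForEach-Object { $_ -replace "^.{$len}" }

$byPrefix   # \where\it\starts and \where\it\starts\dir1\file1
$byLength   # same two strings
```

The first variant leaves a path untouched if it does not start with the prefix, while the second removes the first N characters regardless of what they are.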

When having trouble, simplify:
This function will do what you are apparently trying to accomplish:
Function Remove-Parent {
param(
[string]$Path,
[string]$Parent)
$len = $Parent.length
$Path.SubString($Len)
}
The following is not the way you likely would use it but does demonstrate that the function returns the expected results:
@'
D:\this\is\where\it\starts
D:\this\is\where\it\starts\dir1\file1
D:\this\is\where\it\starts\dir1\file2
D:\this\is\where\it\starts\dir1\file3
D:\this\is\where\it\starts\dir1\dir2\file1
'@ -split "`n" | ForEach-Object { Remove-Parent $_ 'D:\This\Is' }
# Outputs
\where\it\starts
\where\it\starts\dir1\file1
\where\it\starts\dir1\file2
\where\it\starts\dir1\file3
\where\it\starts\dir1\dir2\file1
Just call the function with the current path ($_.fullname) and the "prefix" you are expecting to remove.
The function above is doing this strictly on 'length' but you could easily adapt it to match the actual string with either a string replace or a regex replace.
Function Remove-Parent {
param(
[string]$Path,
[string]$Parent
)
$remove = [regex]::Escape($Parent)
$Path -replace "^$remove"
}
The output was the same as above.
