Powershell script to sum only specific records from file(s)

I am trying to work with a directory full of files.
I want to find specific rows within each file,
extract a numeric value from those rows,
and then sum up all these values across all the files in the directory.
It would look like this...
File1.txt
bread:123
ham:456
eggs:789
File2.txt
bread:999
mayo:789
eggs:123
and so on...
I want to find the row with eggs, extract the number, and sum these numbers together across files.
I found this script in other posts, but it's only segments, and I still have trouble understanding how to use pipes, variables and braces.
dir . -filter "*.txt" -Recurse -name | foreach{(GC $_).Count} | measure-object -sum
#?
Get-Content | Select-String -Pattern "eggs*"
#?
$record -split ":"
I want the script to say "eggs = 912" which would be 123 + 789 = 912

Here is a possible solution:
$pattern = 'eggs'
$sum = Get-ChildItem . -File -Recurse -Filter *.txt |
Get-Content |
Where-Object { $_ -match $pattern } |
ForEach-Object { ($_ -split ':')[1] } |
Measure-Object -Sum |
ForEach-Object Sum
"$pattern = $sum"
Output:
eggs = 912
Get-ChildItem finds all files recursively that match the filter
Get-Content reads each line of every file and passes that on in the pipeline
Where-Object includes only lines that match the given RegEx pattern
The ForEach-Object line splits the line at the colon (:) and extracts the substring at array index [1], i.e. the number.
Measure-Object accumulates all numbers (it converts strings to double, if necessary). Internally, it creates a variable in its begin block, accumulates the pipeline input to this variable in its process block and outputs the variable value in its end block.
The last ForEach-Object line is necessary because Measure-Object actually outputs an object with a Sum property, but we only want the value of that property, not the entire object. If you removed that line, you would have to write "$pattern = $($sum.Sum)" instead, to access the Sum property of the result object.
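To make the begin/process/end description of Measure-Object more concrete, here is a minimal sketch that does the same accumulation by hand with ForEach-Object's -Begin, -Process and -End blocks. The two sample lines stand in for the output of the Get-Content | Where-Object stage, and $total is an illustrative name; this mirrors the idea only and is not Measure-Object's actual source code.

```powershell
# Sample input standing in for the lines that survive Where-Object
$lines = 'eggs:789', 'eggs:123'

# -Begin runs once (create the accumulator), -Process runs once per line,
# -End runs once after the last line (output the accumulated value)
$sum = $lines | ForEach-Object -Begin { $total = 0 } -Process {
    # Take the part after ':' and convert it to a number before adding
    $total += [double] (($_ -split ':')[1])
} -End { $total }

"eggs = $sum"   # eggs = 912
```

Measure-Object does this bookkeeping for you, which is why the original pipeline is shorter.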

You can treat them as CSV files, using : as the delimiter. Import-Csv doesn't take wildcards for the file name, so the files are listed explicitly here:
import-csv file1.txt,file2.txt -Delimiter : -Header item,amount |
where item -eq eggs | measure -sum amount
Count : 2
Average :
Sum : 912
Maximum :
Minimum :
StandardDeviation :
Property : amount
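Since Import-Csv won't expand *.txt here, one possible workaround (a sketch, not part of the answer above) is to let Get-ChildItem find the files and call Import-Csv once per file. The temporary folder and the two sample files are created only for the demonstration; point $dir at your own directory instead.

```powershell
# Hypothetical temp folder holding the two sample files from the question
$dir = Join-Path ([IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null
Set-Content (Join-Path $dir 'File1.txt') 'bread:123','ham:456','eggs:789'
Set-Content (Join-Path $dir 'File2.txt') 'bread:999','mayo:789','eggs:123'

# Get-ChildItem expands the wildcard; Import-Csv then reads each file in turn
$result = Get-ChildItem -Path $dir -Filter *.txt |
    ForEach-Object { Import-Csv -LiteralPath $_.FullName -Delimiter ':' -Header 'item','amount' } |
    Where-Object item -eq 'eggs' |
    Measure-Object -Property amount -Sum

$result.Sum   # 912

# Clean up the demonstration folder
Remove-Item $dir -Recurse
```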

Related

Export the matched output to a CSV as Column 1 and Column 2 using Powershell

I have the code below to match the patterns and save the results to a CSV file. I need to save the matches of regex1 and regex2 as col1 and col2 in the CSV instead of saving everything in the first column.
$inputfile = ( Get-Content D:\Users\naham1224\Desktop\jil.txt )
$FilePath = "$env:USERPROFILE\Desktop\jil2.csv"
$regex1 = "(insert_job: [A-Za-z]*_*\S*)"
$regex2 = "(machine: [A-Z]*\S*)"
$inputfile |
Select-String -Pattern $regex2,$regex1 -AllMatches |
ForEach-Object {$_.matches.groups[1].value} |
Add-Content $FilePath
The input file (input.txt) contains:
/* ----------------- AUTOSYS_DBMAINT ----------------- */
insert_job: AUTOSYS_DBMAINT job_type: CMD
command: %AUTOSYS%\bin\DBMaint.bat
machine: PWISTASASYS01
owner: svc.autosys@cbs
permission:
date_conditions: 1
days_of_week: su,mo,tu,we,th,fr,sa
start_times: "03:30"
description: "Runs DBmaint process on AE Database - if fails - MTS - will run next scheduled time"
std_out_file: ">$$LOGS\dbmaint.txt"
std_err_file: ">$$LOGS\dbmaint.txt"
alarm_if_fail: 0
alarm_if_terminated: 0
send_notification: 0
notification_msg: "Check DBMaint output in autouser.PD1\\out directory"
notification_emailaddress: jnatal@cbs.com
/* ----------------- TEST_ENV ----------------- */
insert_job: TEST_ENV job_type: CMD
command: set
machine: PWISTASASYS01
owner: svc.autosys@cbs
permission:
date_conditions: 1
days_of_week: su,mo,tu,we,th,fr,sa
start_times: "03:30"
description: "output env"
std_out_file: ">C:\Users\svc.autosys\Documents\env.txt"
std_err_file: ">C:\Users\svc.autosys\Documents\env.txt"
alarm_if_fail: 1
alarm_if_terminated: 1
Current output: (screenshot not available)
Expected output: (screenshot not available)
I have tried various ways to do this, but with no luck. Any suggestions and help are greatly appreciated.
Here is how I would do this:
$inputPath = 'input.txt'
$outputPath = 'output.csv'
# RegEx patterns to extract data.
$patterns = @(
'(insert_job): ([A-Za-z]*_*\S*)'
'(machine): ([A-Z]*\S*)'
)
# Create an ordered Hashtable to collect columns for one row.
$row = [ordered] @{}
# Loop over all occurrences of the patterns in the input file
Select-String -Path $inputPath -Pattern $patterns -AllMatches | ForEach-Object {
# Extract key and value from current match
$key = $_.matches.Groups[ 1 ].Value
$value = $_.matches.Value
# Save one column of current row.
$row[ $key ] = $value
# If we have all columns of current row, output it as PSCustomObject.
if( $row.Count -eq $patterns.Count ) {
# Convert hashtable to PSCustomObject and output (implicitly)
[PSCustomObject] $row
# Clear Hashtable in preparation for next row.
$row.Clear()
}
} | Export-Csv $outputPath -NoTypeInformation
Output CSV:
"insert_job","machine"
"insert_job: AUTOSYS_DBMAINT","machine: PWISTASASYS01"
"insert_job: TEST_ENV","machine: PWISTASASYS01"
Remarks:
Using Select-String with parameter -Path we don't have to read the input file beforehand.
An ordered Hashtable (a dictionary) is used to collect all columns, until we have an entire row to output. This is the crucial step to produce multiple columns instead of outputting all data in a single column.
Converting the Hashtable to a PSCustomObject is necessary because Export-Csv expects objects, not dictionaries.
While the CSV looks like your "expected output" and you possibly have good reason to expect it like that, in a CSV file the values normally shouldn't repeat the column names. To remove the column names from the values, simply replace $value = $_.matches.Value with $value = $_.matches.Groups[ 2 ].Value, which results in output like this:
"insert_job","machine"
"AUTOSYS_DBMAINT","PWISTASASYS01"
"TEST_ENV","PWISTASASYS01"
As for what you have tried:
Add-Content writes only plain text files from string input. While you could use it to create CSV files, you would have to add separators and escape strings all by yourself, which is easy to get wrong and more hassle than necessary. Export-Csv, on the other hand, takes objects as input and takes care of all the CSV format details automatically.
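A minimal in-memory sketch of the Groups[ 2 ] variant from the remarks: it uses a few sample lines instead of input.txt, and ConvertTo-Csv instead of Export-Csv so the resulting CSV lines can be inspected directly. $sampleLines, $match and $csv are illustrative names, not part of the answer above.

```powershell
# A few lines from the sample input (hypothetical stand-in for input.txt)
$sampleLines = @(
    'insert_job: AUTOSYS_DBMAINT job_type: CMD'
    'machine: PWISTASASYS01'
    'insert_job: TEST_ENV job_type: CMD'
    'machine: PWISTASASYS01'
)
$patterns = @(
    '(insert_job): ([A-Za-z]*_*\S*)'
    '(machine): ([A-Z]*\S*)'
)
# Ordered, so the column order follows the order in which keys are added
$row = [ordered] @{}
$csv = $sampleLines | Select-String -Pattern $patterns | ForEach-Object {
    $match = $_.Matches[0]
    # Group 1 = column name, group 2 = value without the 'name: ' prefix
    $row[$match.Groups[1].Value] = $match.Groups[2].Value
    if ($row.Count -eq $patterns.Count) {
        [PSCustomObject] $row   # emit one complete row (values are copied)
        $row.Clear()            # start collecting the next row
    }
} | ConvertTo-Csv -NoTypeInformation

$csv
```

Swapping ConvertTo-Csv for Export-Csv $outputPath -NoTypeInformation writes the same lines to a file.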
As zett42 mentioned, Add-Content is not the best fit for this. Since you are looking for multiple values separated by commas, Export-Csv is something you can use. Export-Csv takes objects from the pipeline, converts them to lines of comma-separated property values, adds a header line, and saves the result to a file.
I took a little bit of a different approach here with my solution. I've combined the different regex patterns into one which will give us one match that contains both the job and machine names.
$outputPath = "$PSScriptRoot\output.csv"
# one regex to match both job and machine in separate matching groups
$regex = '(?s)insert_job: (\w+).+?machine: (\w+)'
# Filter for input files
$inputfiles = Get-ChildItem -Path $PSScriptRoot -Filter input*.txt
# Loop through each file
$inputfiles |
ForEach-Object {
$path = $_.FullName
Get-Content -Raw -Path $path | Select-String -Pattern $regex -AllMatches |
ForEach-Object {
# Loop through each match found in the file.
# Should be 2, one for AUTOSYS_DBMAINT and another for TEST_ENV
$_.Matches | ForEach-Object {
# Create objects with the values we want that we can output to csv file
[PSCustomObject]@{
# remove next line if not needed in output
InputFile = $path
Job = $_.Groups[1].Value # 1st matching group contains job name
Machine = $_.Groups[2].Value # 2nd matching group contains machine name
}
}
}
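} | Export-Csv $outputPath -NoTypeInformation # Pipe our objects to Export-Csv; -NoTypeInformation suppresses the #TYPE line that Windows PowerShell 5.1 writes by default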
Contents of output.csv
"InputFile","Job","Machine"
"C:\temp\powershell\input1.txt","AUTOSYS_DBMAINT","PWISTASASYS01"
"C:\temp\powershell\input1.txt","TEST_ENV","PWISTATEST2"
"C:\temp\powershell\input2.txt","AUTOSYS_DBMAINT","PWISTASAPROD1"
"C:\temp\powershell\input2.txt","TEST_ENV","PWISTATTEST1"
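The combined regex works because of two details: (?s) lets . match newline characters, so a single match can span from the insert_job line to the machine line, and the lazy .+? stops at the first machine: after each job. A small sketch on an in-memory string (standing in for Get-Content -Raw output) compares this against a pattern without (?s) and against a greedy .+; $text, $withoutS and $greedy are illustrative names.

```powershell
# Sample text read as one string, as Get-Content -Raw would return it
$text = @'
insert_job: AUTOSYS_DBMAINT job_type: CMD
command: %AUTOSYS%\bin\DBMaint.bat
machine: PWISTASASYS01
insert_job: TEST_ENV job_type: CMD
command: set
machine: PWISTASASYS01
'@

# (?s) lets '.' match newlines; the lazy '.+?' stops at the first 'machine: '
$regex = '(?s)insert_job: (\w+).+?machine: (\w+)'
$pairs = [regex]::Matches($text, $regex) | ForEach-Object {
    [PSCustomObject] @{ Job = $_.Groups[1].Value; Machine = $_.Groups[2].Value }
}

# Without (?s), '.' stops at the end of the line, so nothing reaches 'machine:'
$withoutS = [regex]::Matches($text, 'insert_job: (\w+).+?machine: (\w+)').Count

# A greedy '.+' runs to the LAST 'machine: ', merging both jobs into one match
$greedy = [regex]::Matches($text, '(?s)insert_job: (\w+).+machine: (\w+)').Count

$pairs
$withoutS   # 0
$greedy     # 1
```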

Powershell Select-String Problem with variable

I have a CSV list of numbers, and I would like to use Select-String to return the name of the file that each number is in.
When I do this
$InvoiceList = Import-CSV "C:\invoiceList.csv"
Foreach ($invoice in $InvoiceList)
{
$orderFilename = $FileList | Select-String -Pattern "3505343956" | Select Filename,Pattern
$orderFilename
}
It gives me a response. I realize it is in a loop, so I get the response many times, but it is what I would like:
Order# 199450619.pdf.txt 3505343956
Order# 199450619.pdf.txt 3505343956
But, when I run this:
$InvoiceList = Import-CSV "C:\invoiceList.csv"
Foreach ($invoice in $InvoiceList)
{
$orderFilename = $FileList | Select-String -Pattern "$invoice" | Select Filename,Pattern
$orderFilename
}
or this
$InvoiceList = Import-CSV "C:\invoiceList.csv"
Foreach ($invoice in $InvoiceList)
{
$orderFilename = $FileList | Select-String -Pattern $invoice | Select Filename,Pattern
$orderFilename
}
I get nothing in return.
I know there is data in $invoice, because if I just output $invoice, I get all the invoice numbers that are in the CSV.
What am I doing wrong?
Since $InvoiceList contains the output of an Import-Csv call, it contains custom objects with properties named for the CSV columns, not strings.
Therefore, you must explicitly access the property that contains the invoice number (as a string) in order to use it as search pattern with Select-String.
Assuming that the property / column name of interest is InvoiceNum (adjust as needed):
Foreach ($invoice in $InvoiceList.InvoiceNum) { # Note the .InvoiceNum
$FileList | Select-String -Pattern $invoice | Select Filename,Pattern
}
Note:
Even though $InvoiceList contains an array of objects, PowerShell allows you to access a property that only exists on the elements of the array (.InvoiceNum here), and get the elements' property values as a result - this convenient feature is called member-access enumeration.
However, note that Select-String's -Pattern parameter accepts an array of search patterns, so you can shorten your command as follows, which also improves performance:
$FileList |
Select-String -Pattern (Import-Csv C:\invoiceList.csv).InvoiceNum |
Select-Object Filename,Pattern
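To see why "$invoice" finds nothing, you can inspect the pattern string that a whole row object turns into. A sketch under assumptions: the CSV content is hypothetical (built with ConvertFrom-Csv, with a single InvoiceNum column), and one line of text stands in for the searched files.

```powershell
# Hypothetical CSV content with a single InvoiceNum column
$InvoiceList = 'InvoiceNum', '3505343956', '1234567890' | ConvertFrom-Csv
$line = 'Order# 199450619.pdf.txt 3505343956'

# Expanding a whole row object gives '@{InvoiceNum=3505343956}', which is
# then used as the search pattern and does not occur in $line
$rowAsText = "$($InvoiceList[0])"
$noHit = $line | Select-String -Pattern $rowAsText

# The property values are the plain numbers; member-access enumeration
# turns $InvoiceList.InvoiceNum into an array of all invoice numbers
$hit = $line | Select-String -Pattern $InvoiceList.InvoiceNum

$rowAsText     # @{InvoiceNum=3505343956}
$hit.Pattern   # 3505343956
```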

Count unique numbers in CSV (PowerShell or Notepad++)

How can I find the count of unique numbers in a CSV file? When I use the following command in PowerShell ISE
1,2,3,4,2 | Sort-Object | Get-Unique
I can get the unique numbers but I'm not able to get this to work with CSV files. If for example I use
$A = Import-Csv C:\test.csv | Sort-Object | Get-Unique
$A.Count
it returns 0. I would like to count unique numbers for all the files in a given folder.
My data looks similar to this:
Col1,Col2,Col3,Col4
5,,7,4
0,,9,
3,,5,4
And the result should be 6 unique values (preferably written inside the same CSV file).
Or would it be easier to do it with Notepad++? So far I have found examples only on how to count the unique rows.
You can try the following (PSv3+):
PS> (Import-CSV C:\test.csv |
ForEach-Object { $_.psobject.properties.value -ne '' } |
Sort-Object -Unique).Count
6
The key is to extract all property (column) values from each input object (CSV row), which is what $_.psobject.properties.value does;
-ne '' filters out empty values.
Note that, given that Sort-Object has a -Unique switch, you don't need Get-Unique (you need Get-Unique only if your input already is sorted).
That said, if your CSV file is structured as simply as yours, you can speed up processing by reading it as a text file (PSv2+):
PS> (Get-Content C:\test.csv | Select-Object -Skip 1 |
ForEach-Object { $_ -split ',' -ne '' } |
Sort-Object -Unique).Count
6
Get-Content reads the CSV file as an array of lines (strings).
Select-Object -Skip 1 skips the header line.
$_ -split ',' -ne '' splits each line into values by commas and weeds out empty values.
As for what you tried:
Import-CSV C:\test.csv | Sort-Object | Get-Unique:
Fundamentally, Sort-Object emits the input objects as a whole (just in sorted order), it doesn't extract property values, yet that is what you need.
Because no -Property argument is passed to Sort-Object to base the sorting on, it compares the custom objects that Import-Csv emits as a whole, by their .ToString() values, which happen to be empty[1], so they all compare the same, and in effect no sorting happens.
Similarly, Get-Unique also determines uniqueness by .ToString() here, so that, again, all objects are considered the same and only the very first one is output.
[1] This may be surprising, given that using a custom object in an expandable string does yield a value: compare $obj = [pscustomobject] @{ foo = 'bar' }; $obj.ToString(); '---'; "$obj". This inconsistency is discussed in this GitHub issue.
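The question also asks about counting the unique numbers in every file of a folder. A hedged sketch that applies the Import-Csv approach above to each *.csv file and reports one count per file; the temporary folder and test.csv are created only for the demonstration, and writing the count back into the CSV is not attempted here.

```powershell
# Hypothetical folder with one sample CSV file (data from the question)
$dir = Join-Path ([IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null
Set-Content (Join-Path $dir 'test.csv') 'Col1,Col2,Col3,Col4','5,,7,4','0,,9,','3,,5,4'

# One result object per CSV file in the folder
$counts = Get-ChildItem -Path $dir -Filter *.csv | ForEach-Object {
    $file = $_   # keep the file, since $_ is reused by the inner pipeline
    # All non-empty column values of all rows
    $values = Import-Csv -LiteralPath $file.FullName |
        ForEach-Object { $_.psobject.Properties.Value -ne '' }
    [PSCustomObject] @{
        File        = $file.Name
        UniqueCount = @($values | Sort-Object -Unique).Count
    }
}

$counts   # test.csv  6

# Clean up the demonstration folder
Remove-Item $dir -Recurse
```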

Is there a way to use Select-String to print the line number and the matched value (not the line)?

I am using PowerShell to search several files for a specific match to a regular expression. Because it is a regular expression, I want to see only what I have programmed my regex to accept, along with the line number at which it matched.
I then want to take the matched value and the line number and create an object to output to an excel file.
I can get each item with separate Select-String statements, but then they aren't matched up with each other:
Select-String -Path $pathToFile -Pattern '(?<={\n\s*Box\s=\s")14\d{3}(?=",)' |
Select LineNumber, Matches.Value
#Will only print out the lineNumber
Select-String -Path $pathToFile -Pattern '(?<={\n\s*Box\s=\s")14\d{3}(?=",)' |
Foreach {$_.matches} | Select value
#Will only print matched value and can't print linenumber
Can anyone help me get both the line number and the matched value?
Edit: Just to clarify what I am doing
$files = Get-ChildItem $directory -Include *.vb,*.cs -Recurse
$colMatchedFiles = @()
foreach ($file in $files) {
$fileContent = Select-String -Path $file -Pattern '(?<={\n\s*Box\s=\s")14\d{3}(?=",)' |
Select-Object LineNumber, @{Name="Match"; Expression={$_.Matches[0].Groups[1].Value}}
write-host $fileContent #just for checking if there is anything
}
This still does not get anything, it just outputs a bunch of blank lines
Edit: What I am expecting to happen is for this script to search the content of all the files in the directory and find the lines that match the regular expression. Below is what I would expect for output for each file in the loop
LineNumber Match
---------- -----
324 15
582 118
603 139
... ...
File match sample:
{
Box = "5015",
Description = "test box 1"
}....
{
Box = "5118",
Description = "test box 2"
}...
{
Box = "5139",
Description = "test box 3"
}...
Example 1
Select the LineNumber and group value for each match. Example:
$sampleData = @'
prefix 1 A B suffix 1
prefix 2 A B suffix 2
'@ -split "`n"
$sampleData | Select-String '(A B)' |
Select-Object LineNumber,
@{Name="Match"; Expression={$_.Matches[0].Groups[1].Value}}
Example 2
Search *.vb and *.cs for files containing the string Box = "<n>", where <n> is some number, and output the filename, line number of the file, and the number on the box = lines. Sample code:
Get-ChildItem $pathToFiles -Include *.cs,*.vb -Recurse |
Select-String 'box = "(\d+)"' |
Select-Object Path,
LineNumber,
@{Name="Match"; Expression={$_.Matches[0].Groups[1].Value -as [Int]}}
This returns output like the following:
Path LineNumber Match
---- ---------- -----
C:\Temp\test1.cs 2 5715
C:\Temp\test1.cs 6 5718
C:\Temp\test1.cs 10 5739
C:\Temp\test1.vb 2 5015
C:\Temp\test1.vb 6 5118
C:\Temp\test1.vb 10 5139
Example 3
Now that we know that we want the line before the match to contain {, we can use the -Context parameter with Select-String. Example:
Get-ChildItem $pathToFiles -Include *.cs,*.vb -Recurse |
Select-String 'box = "(\d+)"' -Context 1 | ForEach-Object {
# Line prior to match must contain '{' character
if ( $_.Context.DisplayPreContext[0] -like "*{*" ) {
[PSCustomObject] @{
Path = $_.Path
LineNumber = $_.LineNumber
Match = $_.Matches[0].Groups[1].Value
}
}
}
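A likely reason the lookbehind pattern from the question returns nothing: Select-String tests each line separately, so a lookbehind that needs the { and the line break from the previous line can never succeed. The sketch below shows this on four lines from the question's sample; it swaps the question's 14\d{3} for \d+ so that it fits the sample numbers, and $withLookbehind and $result are illustrative names.

```powershell
# Four lines from the question's sample file
$lines = @(
    '{'
    '    Box = "5015",'
    '    Description = "test box 1"'
    '}'
)

# Select-String tests each line on its own, so a lookbehind that needs
# the '{' from the previous line (via \n) can never succeed
$withLookbehind = $lines | Select-String -Pattern '(?<={\n\s*Box\s=\s")\d+(?=",)'

# Matching within a single line and capturing the number works
$result = $lines | Select-String -Pattern 'Box\s=\s"(\d+)"' |
    Select-Object LineNumber, @{ Name = 'Match'; Expression = { $_.Matches[0].Groups[1].Value } }

$result   # LineNumber 2, Match 5015
```

If the preceding { really matters, the -Context check in Example 3 covers it without a multi-line regex.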

Display Multiple pipeline Values Of Same Get-Content

I am trying to extract string values from a CSV file.
$path = "product.csv"
Get-Content $path | Select-String -AllMatches -Pattern "[^\x00-\x79]"
I can successfully grab the strings; however, I wish to display the line numbers followed by the string values.
Example Output:
LineNo String
1 a
2 b
3 c
I successfully got the line numbers using the command below. How should I combine it with the first command so that the output looks like the example output?
Get-Content $path | Select-String -AllMatches -Pattern "[^\x00-\x79]" | Select-Object LineNumber
If you want the entire line, select the Line property:
... |Select-Object LineNumber,Line
If you only want the part of the line that was matched by the pattern, you'll need a calculated property to grab the Value from the Matches property:
... |Select-Object LineNumber,@{Name='String';Expression={$_.Matches.Value}}
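With -AllMatches, one line can yield several matches, so $_.Matches.Value may be an array. A small sketch that joins them into one string per line; the sample lines are hypothetical (non-ASCII characters are built with [char] codes to avoid script-encoding issues), and joining with a comma is just one possible choice.

```powershell
# Hypothetical sample lines: 'café' and 'naïve façade' built from char codes
$lines = @(
    'plain text'
    "caf$([char]0xE9)"
    "na$([char]0xEF)ve fa$([char]0xE7)ade"
)

# Same pattern as in the question; join multiple matches on one line with ','
$result = $lines | Select-String -AllMatches -Pattern '[^\x00-\x79]' |
    Select-Object LineNumber, @{ Name = 'String'; Expression = { $_.Matches.Value -join ',' } }

$result   # line 2: é, line 3: ï,ç
```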