Read specific text from text files - powershell

I wrote a PowerShell script to compare two text files. In file1 the data is organised. But in file2 the data is not organised. I usually organize data manually. But now the data is increased. I need to automate organising using PowerShell.
PowerShell has to read the data between two special characters. For example, given <****#, it has to read only ****. This pattern repeats n times.

Use a regular expression <(.*?)# to match the relevant substring. .*? matches all characters between < and the next occurrence of # (non-greedy/shortest match). The parentheses put the match in a capturing group, so it can be referenced later.
Select-String -Path 'C:\path\to\file2.txt' -Pattern '<(.*?)#' -AllMatches |
Select-Object -Expand Matches |
ForEach-Object { $_.Groups[1].Value }
$_.Groups[1] refers to the first capturing group in a match.
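To try the pattern without touching a file, the same regular expression can be applied to an in-memory string. This is only a minimal sketch: the sample text is made up, and [regex]::Matches is used in place of Select-String so that the example is self-contained.

```powershell
# Hypothetical sample text standing in for the contents of file2.txt.
$sample = 'abc<first#def<second#ghi'

# [regex]::Matches returns every match; Groups[1] is the text between < and #.
$values = [regex]::Matches($sample, '<(.*?)#') |
    ForEach-Object { $_.Groups[1].Value }

$values
```

$values then holds one string per <...# occurrence, in the order in which they appear.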

Related

Powershell - Scan for multiple strings in multiple files

I am having issues resolving a revision to the script below.
This script will take in a number of key words, either manually added or read from a file.
It will output the data when it finds a match by listing the File Name, Line number and the search word. Unfortunately if I'm searching for multiple words it has to scan the files for each separate word. That means if I have 20 search words, it will open, scan, close each file 20 times. Once for each word.
That is not good, as it takes time, and I still have to trawl through the output to total how many matches there are per file.
Every change I make is disastrous as it prints every single search word without knowing what word was the match or worse it fails to run.
Would anyone be able to help me alter the script to scan the files once for ALL the search words and list out only the matches in a readable way like the output below?
$searchWords = "Address", "City", "State"
Foreach ($sw in $searchWords)
{
    Get-Childitem -Path "C:\Workspace\src" -Recurse -include "*.cbl" |
        Select-String -Pattern "$sw" |
        Select Path,LineNumber,@{n='SearchWord';e={$sw}}
}
-Pattern accepts an array of patterns, and which of those patterns caused a given match can be accessed via the .Pattern property of Select-String's output objects:[1]
Get-Childitem -Path "C:\Workspace\src" -Recurse -include "*.cbl" |
    Select-String -Pattern "Address", "City", "State" |
    Select Path, LineNumber, @{n='SearchWord';e={$_.Pattern}}
Note: I'm passing the search words as a literal array here, for brevity; in your code, simply replace "Address", "City", "State" with $searchWords (without enclosure in "...").
As an aside: Using -Filter instead of -Include can speed up your command, and, given that your command arguments don't contain spaces or other metacharacters, quoting them is optional:
Get-Childitem -Path C:\Workspace\src -Recurse -Filter *.cbl |
    Select-String -Pattern Address, City, State |
    Select Path, LineNumber, @{n='SearchWord';e={$_.Pattern}}
[1] Note: Only the first of the specified patterns that matches on a given line, in the order specified, is reported as the matching pattern for that line - even if others would match too. Even -AllMatches doesn't change that - it would only report multiple matches per line for that first pattern.
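If every search word on a line needs to be reported, one possible workaround for the limitation described in note [1] is to combine the words into a single alternation pattern and use -AllMatches. The sketch below uses made-up in-memory lines instead of the .cbl files, and it assumes the search words contain no regex metacharacters (otherwise they would need escaping, for example with [regex]::Escape).

```powershell
# Hypothetical input lines standing in for file content.
$lines = 'Address and City on one line', 'State only'

# With separate patterns, only the first matching pattern is reported per line.
$firstOnly = $lines | Select-String -Pattern 'Address', 'City', 'State'

# With one alternation pattern plus -AllMatches, every matching word is reported.
$allWords = $lines |
    Select-String -Pattern 'Address|City|State' -AllMatches |
    ForEach-Object {
        $info = $_
        foreach ($m in $info.Matches) {
            [pscustomobject]@{ Line = $info.Line; SearchWord = $m.Value }
        }
    }

$allWords
```

With file input, the Path and LineNumber properties of each match object can be added to the output object in the same way, and piping the result to Group-Object Path would give a count of matches per file.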

How to pull strings from multiple .txt files based on file name and append them to a new file?

so basically I have a series of .txt files following the same nomenclature, such as:
1111210803_3CE_080977851__006908818__21110300013442021110420211105_20211110_120447_35418862_820
1111210933_3CE_006908818__2111040001442021110520211108_20211110_120447_35418860_820
The names of all these files always start with the date, i.e. 111121. Inside these files, you have lines of strings. I am interested in pulling a specific string from the first line of each of these files. Here is an example of the first line:
123456789012345678901234567890123 I 696969CCHKCTX 12345678901 DA 22758287
In particular, I am interested in the 696969CCHKCTX string. All files will have some numbers followed by the CCHKCTX value. I want to pull the 696969 part of the 696969CCHKCTX string from each .txt file and append them all into a new file.
If possible, I would like to sum those strings and add the appropriate decimal place, as they are to actually be dollar values ie 696969 is actually representing 6969.69 and the last two numbers in that string always represent the cent amount. This rule applies to all the .txt files. I want to be able to apply this to all files that are the same date (ie all files starting with 111121)
How would I go about this?
Try the following, which combines Get-ChildItem, Group-Object, and ForEach-Object, as well as the -replace operator:
Get-ChildItem -File | # get files of interest; add path / filter as needed.
  Group-Object { $_.Name.Substring(0, 6) } | # group by shared date prefix
    ForEach-Object {
      $firstLines = $_.Group | Get-Content -First 1 # get all 1st lines
      # Extract the cents amounts and sum them.
      $sumCents = 0.0
      $firstLines.ForEach({
        $sumCents += [double] ($_ -replace '.+\b(\d+)CCHKCTX\b.+', '$1')
      })
      # Output an object with the date prefix and the sum dollar amount.
      [pscustomobject] @{
        Date = $_.Name
        Sum = $sumCents / 100
      }
    }
The above outputs a table-formatted representation to the display. You can save it to a file with > / Out-File, for instance, though it's better to use a structured text format for later processing, such as Export-Csv.
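To check the extraction step in isolation, the -replace expression can be run against the sample first line from the question. This is only a sketch of that one step; the variable names are chosen here for illustration.

```powershell
# The sample first line quoted in the question.
$line = '123456789012345678901234567890123 I 696969CCHKCTX 12345678901 DA 22758287'

# Replace the whole line with just the digits that precede CCHKCTX.
$cents = [double] ($line -replace '.+\b(\d+)CCHKCTX\b.+', '$1')

# The last two digits are cents, so divide by 100 to get dollars.
$dollars = $cents / 100
$dollars
```

If a line does not contain a CCHKCTX value, -replace returns the line unchanged and the [double] cast fails, so it may be worth filtering the first lines with -match 'CCHKCTX' before summing.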

select-string -pattern wildcard for a specific field

I'm still pretty new to PowerShell. I'm not sure how to fix this or even what I am doing wrong.
My ultimate goal is to pull codes of 5 groups of 5 characters, groups of characters are delimited by -, from a long txt file. Example JTI45-534YS-PKQN6-MSE9S-2PFNM. There are multiple of these and I need to pull them all out of a file at once.
I'm trying multiple different variations on
Select-String .\reducedCodes.txt -Pattern "*-*-*-*-*"
or
Select-String .\reducedCodes.txt -Pattern "?????-?????-?????-?????-?????"
Thanks in advance.
Since the string you are looking for appears to be alphanumeric, you could use the regex word-character class, which is denoted by \w (note that \w also matches the underscore). Since there are five in a row, you could use \w{5}, and the groups are separated by the - character. Select-String normally gives you the lines containing the matches; since you want just the matches, you can use the Matches property, where Value is the full match. Also note the Groups property: if you put each \w{5} inside (), you can get at the individual groups.
(Select-String .\reducedCodes.txt -Pattern '\w{5}-\w{5}-\w{5}-\w{5}-\w{5}').Matches.Value
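Without -AllMatches, Select-String reports only the first match on each line, so a line holding several codes would yield just one. The sketch below assumes that can happen and uses a made-up in-memory line in place of reducedCodes.txt; it also swaps \w for the stricter [A-Z0-9] so that underscores are not accepted (Select-String is case-insensitive by default, so lowercase letters still match).

```powershell
# Hypothetical line containing two codes; the second code is invented for illustration.
$text = 'x JTI45-534YS-PKQN6-MSE9S-2PFNM y ABCDE-12345-FGHIJ-67890-KLMNO z'

# Four "5 characters plus hyphen" groups followed by a final group of 5.
$pattern = '\b(?:[A-Z0-9]{5}-){4}[A-Z0-9]{5}\b'

$codes = ($text | Select-String -Pattern $pattern -AllMatches).Matches.Value
$codes
```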

Add quotes to each column in a CSV via Powershell

I am trying to create a PowerShell script which wraps quotes around each column of the file on export to CSV. However, the Export-Csv cmdlet only places these where they are needed, i.e. where the text has a space or similar within it.
I have tried to use the following to wrap the quotes on each line but it ends up wrapping three quotes on each column.
$r.SURNAME = '"'+$r.SURNAME+'"';
Is anyone able to share how to force these on each column of the file? So far I can only find info on stripping them out.
Thanks
Perhaps a better approach would be to convert to CSV (not export), add the quotes with a simple regex, and then pipe the result out to a file.
Assuming you are exporting the whole object $r:
$r | ConvertTo-Csv -NoTypeInformation `
| % { $_ -replace ',(.*?),',',"$1",' } `
| Select -Skip 1 | Set-Content C:\temp\file.csv
The Select -Skip 1 removes the header. If you want to keep the header, remove that step from the pipeline.
To clarify what the regex expression is doing:
Match: ,(.*?),
Explanation: This will match the section of each line that has a comma followed by any number of characters (.*) without being greedy (?, which basically means it will only match the minimum number of characters needed to complete the match) and that finally ends with a comma. The parentheses hold everything between the two commas in a match variable to be used later in the replace.
Replace: ,"$1",
Explanation: The $1 holds the match between the two parentheses mentioned above. I am surrounding it with quotes and re-adding the commas: since I matched on those as well, they must be put back or they are simply consumed. Please note that while the match portion of the -replace can be in double quotes without an issue, the replace portion must be surrounded by single quotes, or the $1 gets interpreted by PowerShell as a PowerShell variable and not a match variable.
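One thing to watch with the comma-based match is that each match consumes both of its surrounding commas, so the first and last fields on a line are never matched and adjacent fields cannot both match. As a sketch of an alternative that avoids matching commas altogether, each line can be built field by field. The sample rows below are made up, and doubling embedded quotes is an assumption about the CSV dialect that is wanted.

```powershell
# Hypothetical sample rows standing in for $r.
$r = @(
    [pscustomobject]@{ SURNAME = 'Smith'; CITY = 'New York' }
    [pscustomobject]@{ SURNAME = 'O"Neil'; CITY = 'Leeds' }
)

$quotedLines = foreach ($row in $r) {
    # Wrap every property value in quotes, doubling any embedded quotes.
    $fields = foreach ($prop in $row.psobject.Properties) {
        '"{0}"' -f ("$($prop.Value)" -replace '"', '""')
    }
    $fields -join ','
}

$quotedLines
```

The resulting lines can then be piped to Set-Content; no header line is produced, which matches the Select -Skip 1 behaviour of the pipeline above.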
You can also use the following code:
$r.SURNAME = "`"$($r.SURNAME)`""
I have cheated to get what I want by re-parsing the file through the following - guess that it acts as a simple find and replace on the file.
Get-Content C:\Data\Downloads\file2.csv |
    ForEach-Object { $_ -replace '"""', '"' } |
    Set-Content C:\Data\Downloads\file3.csv
Thanks for the help on this.

Powershell extract values from file

I have a log file that has a lot of fields listed in it. I want to extract the fields out of the file, but I don't want to search through the file line by line.
I have my pattern:
$pattern="Hostname \(Alias\):(.+)\(.+Service: (.+)"
This will give me the two values that I need. I know that if I have a string, and I'm looking for one match I can use the $matches array to find the fields. In other words, If I'm looking at a single line in the file using the string variable $line, I can extract the fields using this code.
if($line -match $pattern){
    $var1=$matches[1]
    $var2=$matches[2]
}
But how can I get these values without searching line by line? I want to pass the whole file as a single string, and add the values that I am extracting to two different arrays.
I'm looking for something like
while($filetext -match $pattern){
$array1+=$matches[1]
$array2+=$matches[2]
}
But this code puts me in an infinite loop if there is even one match. So is there a nextMatch function I can use?
PowerShell 2.0 addressed this limitation by adding the -AllMatches parameter to the Select-String cmdlet e.g.:
$filetext | Select-String $pattern -AllMatches |
Foreach {$_.Matches | Foreach {$_.Groups[1] | Foreach {$_.Value}}}