search first "x" amount of characters in a line, and output entire line - powershell

I have a text file that could contain up to 1000 lines of data in the following format:
14410:3012669|EU14410|20/01/2017||||1|6|4|OUT FROM UNDER||22/02/2017 04:01:47|22/02/2017 21:19:52
14:3012670|EU016271751|20/01/2017||||2|6|4|BLOCK BET|\\acis-prod\Pictures\Entry\EU01627.jpg|22/02/2017 04:02:02|22/02/2017 21:19:52
301111:3012671|EU016275|20/01/2017||||2|6|4|VITAE MEDICAL CLINIC|\\tm-prod\Pictures\Entry\EU01.jpg|22/02/2017 04:02:11|22/02/2017 21:19:53
each line will start with the following format
"set of characters up to max of 8":"set of characters unlimited max"
I want to search the characters ONLY up until the first colon. Those characters could contain any amount up to a maximum of 8. (hopefully shown well in my examples above) I'm trying to search those first characters, up to the ":" of each line to see if it contains a string, and return the whole line. still new to powershell so I've only tried a simple select:
$path = "C:\Users\ME\Desktop\acsep22\acsnic-20170222_233324.done"
Get-ChildItem $path -recurse | Select-String -pattern ("14410","3011981","3011982") | out-file $logfile |format-list
which works - but I didn't take into account that the string could also appear twice in the same line ( though unrelated to the first 7 characters)
for example:
14410:3012669|EU14410|
contains 14410 twice, they're unrelated in terms of their significance and I only want to search and return based on the first number
could somebody help me achieve this or could some one point me toward the cmdlet that would help?
I've tried various searches online (and via the Microsoft online resource) but a lot of results are more to do with "return the first X amount of characters" rather than "search using only the first X amount and return line"
Kind Regards

You could use a simple Where-Object filter to check whether the string before the : is one of the strings you expect:
$strings = '14410','3011981','3011982'
Get-Content $path |Where-Object {$strings -contains ($_ -split ':')[0]}
This is probably the most PowerShell-idiomatic approach.
If you want to use Select-String, you'll have to construct a regex pattern that will match on strings that start with one of the strings and then a colon:
$strings = '14410','3011981','3011982'
$pattern = '^(?:{0}):' -f ($strings -join '|') # pattern is now '^(?:14410|3011981|3011982):'
Select-String -Path $path -Pattern $pattern
If you just want the bare string itself from the output, grab the Line property from the objects returned by Select-String:
Select-String -Path $path -Pattern $pattern |Select-Object -Expand Line
or
Select-String -Path $path -Pattern $pattern |ForEach-Object Line
The pattern above uses a non-capturing group (?:pattern-goes-here) to match any one of the strings, at the start ^ of a string, followed by :.
Both solutions will work with an arbitrary number of strings
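As a quick way to see the difference between matching anywhere and matching only the part before the first colon, here is a minimal sketch that runs both approaches against in-memory lines. The sample lines are shortened from the question, and the third one is a made-up variation in which 14410 appears only after the colon. The [regex]::Escape() call is an extra safeguard that is not in the answer above; it only matters if the search strings could contain regex metacharacters.

```powershell
# Sample data: shortened lines from the question; the third line is hypothetical
# and contains 14410 only AFTER the first colon, so it must not be returned.
$lines = @(
    '14410:3012669|EU14410|20/01/2017'
    '14:3012670|EU016271751|20/01/2017'
    '301111:3012671|EU14410|20/01/2017'
)
$strings = '14410', '3011981', '3011982'

# Approach 1: compare the text before the first colon against the list.
# The ", 2" limits the split to two parts, so later colons are left alone.
$bySplit = $lines | Where-Object { $strings -contains ($_ -split ':', 2)[0] }

# Approach 2: anchored regex built from the (escaped) search strings.
$escaped = $strings | ForEach-Object { [regex]::Escape($_) }
$pattern = '^(?:{0}):' -f ($escaped -join '|')
$byRegex = $lines | Select-String -Pattern $pattern | ForEach-Object Line

$bySplit
$byRegex
```

Both variables should end up holding only the first sample line, since that is the only one whose prefix is in the list.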

Related

Use Select-String to get the single word matching pattern from files

I am trying to get only the word when using Select-String, but instead it is returning the whole string
Select-String -Path .\*.ps1 -Pattern '-Az' -Exclude "Get-AzAccessToken","-Azure","Get-AzContext"
I want to get all words in all .ps1 files that contain '-Az', for example 'New-AzHierarchy'
Select-String outputs objects of type Microsoft.PowerShell.Commands.MatchInfo by default, which supplement the whole line (input object) on which a match was found (.Line property) with metadata about the match (in PowerShell (Core) 7+, you can use -Raw to directly output the matching lines (input objects) only).
Note that in the default display output, it appears that only the matching lines are printed, with PowerShell (Core) 7+ now highlighting the part that matched the pattern(s).
Select-String's -Include / -Exclude parameters do not modify what patterns are matched; instead, they modify the -Path argument to further narrow down the set of input files. Since a wildcard expression as part of the -Path argument is usually sufficient, these parameters are rarely used.
Therefore:
Use the objects in the .Matches collection property of Select-String's output objects to access the part of the line that actually matched the given pattern(s).
Since you want to capture entire command names that contain substring -Az, such as New-AzHierarchy, you must use a regex pattern that also captures the relevant surrounding characters: \w+-Az\w+
The simplest way to exclude specific matches is to filter them out afterwards, using a Where-Object call.
# Note: -AllMatches ensures that if there are *multiple* matches
# on a single line, they are all reported.
Select-String -Path .\*.ps1 -Pattern '\w+-Az\w+' -AllMatches |
ForEach-Object { $_.Matches.Value } |
Where-Object { $_ -notin 'Get-AzAccessToken', '-Azure', 'Get-AzContext' }
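To try the same technique without any .ps1 files on disk, here is a small sketch that pipes in-memory strings to Select-String instead of using -Path. The sample lines and the cmdlet names in them are invented for illustration; only the pattern and the Where-Object filter come from the answer above.

```powershell
# Hypothetical script lines; single quotes keep $t from being expanded.
$sampleLines = @(
    '$t = Get-AzAccessToken; New-AzHierarchy -Name x'
    'Get-AzContext | Out-Null'
    'Remove-AzResourceGroup -Name rg'
)

$names = $sampleLines |
    Select-String -Pattern '\w+-Az\w+' -AllMatches |
    ForEach-Object { $_.Matches.Value } |                      # one string per match
    Where-Object { $_ -notin 'Get-AzAccessToken', 'Get-AzContext' }

$names
```

With this input, the two excluded names are dropped and the remaining matches are New-AzHierarchy and Remove-AzResourceGroup.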

Powershell, how to capture argument(s) of Select-String and include with matched output

Thanks to @mklement0 for the help with getting this far with answer given in Powershell search directory for code files with text matching input a txt file.
The below Powershell works well for finding the occurrences of a long list of database field names in a source code folder.
$inputFile = 'C:\DataColumnsNames.txt'
$outputFile = 'C:\DataColumnsUsages.txt'
Get-ChildItem C:\ProjectFolder -Filter *.cs -Recurse -Force -ea SilentlyContinue |
Select-String -Pattern (Get-Content $inputFile) |
Select-Object Path, LineNumber, line |
Export-csv $outputfile
However, many lines of source code have multiple matches, especially ADO.NET SQL statements with a lot of field names on one line. If the field name argument was included with the matching output the results will be more directly useful with less additional massaging such as lining up everything with the original field name list. For example if there is a source line "BatchId = NewId" it will match field name list item "BatchId". Is there an easy way to include in the output both "BatchId" and "BatchId = NewId"?
Played with the matches object but it doesn't seem to have the information. Also tried Pipeline variable like here but X is null.
$inputFile = 'C:\DataColumnsNames.txt'
$outputFile = 'C:\DataColumnsUsages.txt'
Get-ChildItem C:\ProjectFolder -Filter *.cs -Recurse -Force -ea SilentlyContinue |
Select-String -Pattern (Get-Content $inputFile -PipelineVariable x) |
Select-Object $x, Path, LineNumber, line |
Export-csv $outputFile
Thanks.
The Microsoft.PowerShell.Commands.MatchInfo instances that Select-String outputs have a Pattern property that reflects the specific pattern among the (potential) array of patterns passed to -Pattern that matched on a given line.
The caveat is that if multiple patterns match, .Pattern only reports the pattern among those that matched that is listed first among them in the -Pattern argument.
Here's a simple example, using an array of strings to simulate lines from files as input:
'A fool and',
'his barn',
'are soon parted.',
'foo and bar on the same line' |
Select-String -Pattern ('bar', 'foo') |
Select-Object Line, LineNumber, Pattern
The above yields:
Line                         LineNumber Pattern
----                         ---------- -------
A fool and                            1 foo
his barn                              2 bar
foo and bar on the same line          4 bar
Note how 'bar' is listed as the Pattern value for the last line, even though 'foo' appeared first in the input line, because 'bar' comes before 'foo' in the pattern array.
To reflect the actual pattern that appears first on the input line in a Pattern property, more work is needed:
Formulate your array of patterns as a single regex using alternation (|), wrapped as a whole in a capture group ((...)) - e.g., '(bar|foo)')
Note: The expression used below, '({0})' -f ('bar', 'foo' -join '|'), constructs this regex dynamically, from an array (the array literal 'bar', 'foo' here, but you can substitute any array variable or even (Get-Content $inputFile)); if you want to treat the input patterns as literals and they happen to contain regex metacharacters (such as .), you'll need to escape them with [regex]::Escape() first.
Use a calculated property to define a custom Pattern property that reports the capture group's value, which is the first among the values encountered on each input line:
'A fool and',
'his barn',
'are soon parted.',
'foo and bar on the same line' |
Select-String -AllMatches -Pattern ('({0})' -f ('bar', 'foo' -join '|')) |
Select-Object Line, LineNumber,
@{ n='Pattern'; e={ $_.Matches[0].Groups[1].Value } }
This yields (abbreviated to show only the last match):
Line                         LineNumber Pattern
----                         ---------- -------
...
foo and bar on the same line          4 foo
Now, 'foo' is properly reported as the matching pattern.
To report all patterns found on each line:
Switch -AllMatches is required to tell Select-String to find all matches on each line, represented in the .Matches collection of the MatchInfo output objects.
The .Matches collection must then be enumerated (via the .ForEach() collection method) to extract the capture-group value from each match.
'A fool and',
'his barn',
'are soon parted.',
'foo and bar on the same line' |
Select-String -AllMatches -Pattern ('({0})' -f ('bar', 'foo' -join '|')) |
Select-Object Line, LineNumber,
@{ n='Pattern'; e={ $_.Matches.ForEach({ $_.Groups[1].Value }) } }
This yields (abbreviated to show only the last match):
Line                         LineNumber Pattern
----                         ---------- -------
...
foo and bar on the same line          4 {foo, bar}
Note how both 'foo' and 'bar' are now reported in Pattern, in the order encountered on the line.
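The note about [regex]::Escape() can be illustrated with a short sketch. The patterns and input lines below are invented: 'a.b' contains a regex metacharacter, so without escaping it would also match 'axb'. Joining the values into a single string for the Pattern column is a choice made here for readability in CSV output, not something the answer above prescribes.

```powershell
# Hypothetical literal search terms; 'a.b' must match only a literal dot.
$patterns = 'a.b', 'foo'

# Escape each term, then combine them into one capture group: (a\.b|foo)
$regex = '({0})' -f (($patterns | ForEach-Object { [regex]::Escape($_) }) -join '|')

$result = 'axb and foo', 'a.b here' |
    Select-String -AllMatches -Pattern $regex |
    Select-Object Line,
        @{ n = 'Pattern'; e = { $_.Matches.ForEach({ $_.Groups[1].Value }) -join ', ' } }

$result
```

Here the first line should report only foo (the escaped a\.b does not match axb), and the second line should report a.b.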
The solid information and examples from @mklement0 were enough to point me in the right direction for researching and understanding more about Powershell and the object pipeline and calculated properties.
I was finally able to achieve my goal of cross-referencing a list of table and field names to the C# code base. The input file is simply table and field names, pipe delimited. (One of the glitches I had was not using pipe in the split; it was a visual error that took a while to finally see, so check for that.) The output is the table name, field name, code file name, line number and actual line. It's not perfect but much better than manual effort for a few hundred fields! And now there are possibilities for further automation in the data mapping and conversion project. I thought about writing a C# utility program, but that might have taken just as long to figure out and implement, and would have been much more cumbersome than a working Powershell script.
The key for me at this point is "working"! My first deeper dive into the abstruse world of Powershell. The key points of my solution are the use of the calculated property to get the table and field names in the output, realization that expressions can be used in certain places like to build a Pattern and that the pipeline is passing only certain specific objects after each command (maybe that is too restricted of a view but it's better than what I had before).
Hope this helps someone in future. I could not find any examples close enough to get over the hump and so asked my first ever stackoverflow questions.
$inputFile = "C:\input.txt"
$outputFile = "C:\output.csv"
$results = Get-Content $inputfile
foreach ($i in $results) {
Get-ChildItem -Path "C:\ProjectFolder" -Filter *.cs -Recurse -ErrorAction SilentlyContinue -Force |
Select-String -Pattern $i.Split('|')[1] |
Select-Object @{ n='Pattern'; e={ $i.Split('|')[0], $i.Split('|')[1] -join '|'} }, Filename, LineNumber, line |
Export-Csv $outputFile -Append}

Need powershell command to detect matching line with * character to match any number of string

Below command will find partial matching line from a file and remove it
Set-Content -Path "file.config" -Value (get-content -Path "file.config" | Select-String -Pattern 'entry=67889_$d_*_0.1' -NotMatch)
Command without * works fine but with * in between to match any string does not work and no output is obtained
I think you may have two problems going on here. First, if you have a variable $d, you need to write ${d}_ instead of $d_; since _ is a valid character in variable names, $d_ is parsed as a variable named d_. The pattern must also be in double quotes, because variables are not expanded inside single-quoted strings. Second, * in a regex means any number of the preceding character; if you want a total wildcard, use .* instead, since . matches any character.
Select-String -Pattern "entry=67889_${d}_.*_0.1"
Note that this also means you need to escape the . in 0.1, i.e. 0\.1, if you need it to match only a period character.
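A minimal sketch of these points, using in-memory lines instead of file.config. The value of $d and the sample lines are assumptions made for illustration. The pattern is in double quotes so that ${d} is expanded, and the period in 0.1 is escaped.

```powershell
$d = 'abc'   # hypothetical value; the question does not say what $d holds

# Double quotes expand ${d}; the braces stop PowerShell from reading $d_ as the name.
# The backslash is not special in PowerShell strings, so \. reaches the regex engine as-is.
$pattern = "entry=67889_${d}_.*_0\.1"

$lines = @(
    'entry=67889_abc_anything_0.1'   # matches the pattern, so -NotMatch drops it
    'entry=67889_abc_x_0A1'          # kept: "0A1" is not a literal "0.1"
    'other=1'                        # kept: unrelated line
)

# -NotMatch returns the lines that do NOT match; .Line gives back the plain text.
$kept = $lines | Select-String -Pattern $pattern -NotMatch | ForEach-Object Line
$kept
```

If $d itself could contain regex metacharacters, wrapping it in [regex]::Escape($d) before building the pattern would be a reasonable extra precaution.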

Extract lines matching a pattern from all text files in a folder to a single output file

I am trying to extract each line starting with "%%" in all files in a folder and then copy those lines to a separate text file. I am currently using this PowerShell code, but I am not getting any results.
$files = Get-ChildItem "folder" -Filter *.txt
foreach ($file in $files)
{
if ($_ -like "*%%*")
{
Set-Content "Output.txt"
}
}
I think that mklement0's suggestion to use Select-String is the way to go. Adding to his answer, you can pipe the output of Get-ChildItem into the Select-String so that the entire process becomes a Powershell one liner.
Something like this:
Get-ChildItem "folder" -Filter *.txt | Select-String -Pattern '^%%' | Select -ExpandProperty line | Set-Content "Output.txt"
The Select-String cmdlet offers a much simpler solution (PSv3+ syntax):
(Select-String -Path folder\*.txt -Pattern '^%%').Line | Set-Content Output.txt
Select-String accepts a filename/path pattern via its -Path parameter, so, in this simple case, there is no need for Get-ChildItem.
If, by contrast, your input file selection is recursive or uses more complex criteria, you can pipe Get-ChildItem's output to Select-String, as demonstrated in Dave Sexton's helpful answer.
Note that, according to the docs, Select-String by default assumes that the input files are UTF-8-encoded, but you can change that with the -Encoding parameter; also consider the output encoding discussed below.
Select-String's -Pattern parameter expects a regular expression rather than a wildcard expression.
^%% only matches literal %% at the start (^) of a line.
Select-String outputs [Microsoft.PowerShell.Commands.MatchInfo] objects that contain information about each match; each object's .Line property contains the full text of an input line that matched.
Set-Content Output.txt sends all matching lines to single output file Output.txt
In Windows PowerShell, Set-Content uses the system's legacy Windows codepage (an 8-bit single-byte encoding - even though the documentation mistakenly claims that ASCII files are produced); PowerShell (Core) 6+ defaults to BOM-less UTF-8 instead.
If you want to control the output encoding explicitly, use the -Encoding parameter; e.g., ... | Set-Content Output.txt -Encoding Utf8.
By contrast, in Windows PowerShell, >, the output redirection operator, always creates UTF-16LE files (an encoding PowerShell calls Unicode), as does Out-File by default (which can be changed with -Encoding); in PowerShell (Core) 6+, both default to BOM-less UTF-8.
Also note that > / Out-File apply PowerShell's default formatting to the input objects to obtain the string representation to write to the output file, whereas Set-Content treats the input as strings (calls .ToString() on input objects, if necessary). In the case at hand, since all input objects are already strings, there is no difference (except for the character encoding, potentially).
As for what you've tried:
$_ inside your foreach ($file in $files) refers to a file (a [System.IO.FileInfo] object), so you're effectively evaluating your wildcard expression *%%* against the input file's name rather than its contents.
Aside from that, wildcard pattern *%%* will match %% anywhere in the input string, not just at its start (you'd have to use %%* instead).
The Set-Content "Output.txt" call is missing input, because it is not part of a pipeline and, in the absence of pipeline input, no -Value argument was passed.
Even if you did provide input, however, output file Output.txt would get rewritten as a whole in each iteration of your foreach loop.
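For comparison, here is one way the loop-based attempt could be repaired while keeping its overall shape. This is only a sketch: it creates its own throwaway folder and sample files under the temp directory (all names and contents are invented), reads each file's lines with Get-Content, and writes the output file once, after the loop, so it is not overwritten on each iteration.

```powershell
# Set up a scratch folder with two hypothetical input files.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([System.IO.Path]::GetRandomFileName())
New-Item -ItemType Directory -Path $dir | Out-Null
Set-Content -Path (Join-Path $dir 'a.txt') -Value '%% first', 'plain', '  %% indented'
Set-Content -Path (Join-Path $dir 'b.txt') -Value 'x %% middle', '%% second'

# The output gets a different extension so the *.txt filter never picks it up.
$out = Join-Path $dir 'Output.log'

# Collect matching lines from every file; '%%*' matches only at the start of a line.
$matched = foreach ($file in Get-ChildItem -Path $dir -Filter *.txt) {
    Get-Content -Path $file.FullName | Where-Object { $_ -like '%%*' }
}

# Write once, after the loop.
$matched | Set-Content -Path $out
Get-Content -Path $out
```

Only the two lines that begin with %% should end up in the output file; the indented one and the one with %% in the middle are skipped. The scratch folder is left in place so the result can be inspected.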
First you have to use Get-Content in order to get the content of the file. Then you do the string match and, based on that, write the matching lines to the output file. Use Get-Content and put another loop inside the foreach to iterate over all the lines in the file.
I hope this logic helps you
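ls *.txt | %{
    $f = $_
    # read the file line by line; the inner % (ForEach-Object) was missing
    gc $f.fullname | %{
        # StartsWith() returns a Boolean, so no comparison is needed
        if($_.StartsWith("%%")){
            $_ >> Output.txt
        }#end if
    }#end gc
}#end ls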
Alias
ls - Get-ChildItem
gc - Get-Content
% - ForEach-Object
$_ - Current object in the pipeline
>> - Redirection construct
# - Comment
http://ss64.com/ps/

Extract specific data

Please help. I am trying to extract multiple filenames from the following .xml file. I then need to copy the list of files from one folder to another. A part of the XML I have posted below:
<component>
<altname>HP Broadcom Online Firmware Upgrade Utility for VMware 5.x</altname>
<filename>CP021404.scexe</filename>
<name>HP Broadcom Online Firmware Upgrade Utility for VMware 5.x</name>
<description>This package contains vSphere 5.1 and VMware </description>
<component>
<component>
<altname>Online ROM Flash - Power Management Controller </altname>
<filename>CP021615.scexe</filename>
I used Windows PowerShell as below and got the output, but the output contains filenames (CP021404.scexe, CP021614.scexe below), line# and symbol still in it. What am I doing wrong on my first PS attempt?
PowerShell
$input_path = 'C:\PowerShell\hpsum_inventory.xml'
$output_file = 'C:\powershell\hpsum_inventory-o.xml'
$regex = ".exe"
select-string -Path $input_path -Pattern $regex -AllMatches > $output_file
Output
PowerShell\hpsum_inventory.xml:8: <filename>CP021404.scexe</filename>
PowerShell\hpsum_inventory.xml:18: <filename>CP021614.scexe</filename>
The problem is that you're using a RegEx match and the period character in RegEx matches any character except Line Feed/New Line characters, so it's matching any character followed by 'exe'. Really what you want to do is read the file as XML, and just output the <filename> nodes.
$input_path = 'C:\PowerShell\hpsum_inventory.xml'
$output_file = 'C:\powershell\hpsum_inventory-o.xml'
$regex = "exe$"
(Select-Xml -Path $input_path -XPath //filename).node.InnerText | ?{$_ -match $regex} | out-file $output_file
Edit: Ok, you need to incorporate that into a string, that's easy enough. We'll add a ForEach loop (I use the alias % for that) to the last line to insert the file name into a string.
(Select-Xml -Path $input_path -XPath //filename).node.InnerText | ?{$_ -match $regex} | %{"copy c:\powershell\$_ x:\firmware\"} | out-file $output_file
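The XML-based approach can be tried without the original file by passing a string to Select-Xml's -Content parameter. The sketch below assumes a well-formed document with a made-up root element named components and a third, non-.scexe entry added purely to show the filter working; the real inventory file may be structured differently.

```powershell
# Hypothetical, well-formed stand-in for hpsum_inventory.xml.
$xml = @'
<components>
  <component><filename>CP021404.scexe</filename><name>A</name></component>
  <component><filename>CP021615.scexe</filename><name>B</name></component>
  <component><filename>readme.txt</filename><name>C</name></component>
</components>
'@

# //filename selects every <filename> element; InnerText is its text content.
$fileNames = (Select-Xml -Content $xml -XPath '//filename').Node.InnerText |
    Where-Object { $_ -match 'exe$' }   # keep only names ending in "exe"

$fileNames
```

Because the values come from the parsed nodes, there is no file path, line number, or tag text to strip off afterwards.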
Edit2: Ok, so you want the knowledge in general of how to match text in a file. Can do! Select string will do what you want actually, it just wasn't the best method in general for the example you gave earlier. This gets a bit more interesting, since you need to be familiar with RegEx matching patterns, but other than that it's fairly straight forward. You want to use the -Pattern match again, but let me suggest a better pattern:
"filename>(.*?)<"
That looks for the filename tag, including closing > on it, and grabs everything up to the next < character. The () denote a capturing group, so the rest is ignored as far as the capture goes. Then we pipe to a ForEach loop, and for each line that it finds that matches we select the Matches property, and the second Group property of that (the first contains the whole text, including the filename> and < bits). So it looks like this:
$input_path = 'C:\PowerShell\hpsum_inventory.xml'
$output_file = 'C:\powershell\hpsum_inventory-o.xml'
$regex = "filename>(.*?)<"
select-string -Path $input_path -Pattern "filename>(.*?)<"|%{$_.matches.groups[1].value}
Now that only gets the file names. If we want to incorporate the rest of your thing about inserting it into text you enclose the part in the ForEach loop inside a sub-expression $() and then put that into your double quoted string like such:
select-string -Path $input_path -Pattern "filename>(.*?)<"|%{"copy c:\powershell\$($_.matches.groups[1].value) x:\firmware"}|Out-File $output_file
Personally I would suggest not doing that directly as it limits you. I'd collect the data in an array, then pipe that array into a process that does what you want, but then at least you have the collection so you can do with it what you want.
$input_path = 'C:\PowerShell\hpsum_inventory.xml'
$output_file = 'C:\powershell\hpsum_inventory-o.xml'
$regex = "filename>(.*?)<"
$Filenames = select-string -Path $input_path -Pattern "filename>(.*?)<"|%{$_.matches.groups[1].value}
$Filenames|%{"copy c:\powershell\$_ x:\firmware"}|Out-File $output_file
Why do it that way? What if you don't want to over-write something? Then you can do something like:
$Filenames|?{$_ -notin (GCI X:\firmware -file|select -expand name)}|%{"copy c:\powershell\$_ x:\firmware"}|Out-File $output_file
For your collection of serial numbers, try the regex pattern of:
"Serial Number: (\S*)"
In RegEx there are a few escaped characters that have special meaning, and capitalizing them inverts that meaning. \s means whitespace, so spaces, tabs, what not. Doing it as a capital means something that is NOT whitespace. The asterisk means however many of the previous thing (not whitespace) it can find. So this looks for 'Serial Number: ' and then captures everything after that until it reaches the end of the line or encounters whitespace. Check out this link to see how it works.
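Here is a small sketch of that serial-number pattern in action, using invented input lines. It follows the same Matches / Groups[1] approach as the filename example above.

```powershell
# Hypothetical report lines.
$lines = @(
    'Serial Number: ABC123 (rack 4)'
    'Model: X'
    'Serial Number: ZX-9'
)

# Groups[0] is the whole match; Groups[1] is the (\S*) capture,
# i.e. everything after "Serial Number: " up to the next whitespace.
$serials = $lines |
    Select-String -Pattern 'Serial Number: (\S*)' |
    ForEach-Object { $_.Matches[0].Groups[1].Value }

$serials
```

The capture stops at the first whitespace, so the trailing "(rack 4)" on the first line is not included.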