Equivalent of grep -o in PowerShell

Which PowerShell command replaces grep -o (which displays only the matched portion of a line instead of the whole line)?
I tried Select-Object, but it always displays the full line.
For example, given this line:
<a id="tm_param1_text1_item_1" class="tm_param1param2_param3 xxx_zzz qqq_rrrr_www_vv_no_empty" >eeee <span id="ttt_xxx_zzz">0</span></a>
I run this command:
cat filename | grep -o '>[0-9]' | grep -o '[0-9]'
Output: 0
When I use Select-Object, I always see the full line.

One way is:
$a = '<a id="tm_param1_text1_item_1" class="tm_param1param2_param3 xxx_zzz qqq_rrrr_www_vv_no_empty" >eeee <span id="ttt_xxx_zzz">0</span></a>'
$a -match '>([0-9])<' # returns True and populates the $matches automatic variable
$matches[1] # returns 0 (the first capture group)

To select strings in text, use Select-String rather than Select-Object. It returns a MatchInfo object, and you can access the individual matches through its Matches property:
$a = '<a id="tm_param1_text1_item_1" class="tm_param1param2_param3 xxx_zzz qqq_rrrr_www_vv_no_empty" >eeee <span id="ttt_xxx_zzz">0</span></a>'
($a | select-string '>[0-9]').matches[0].value # returns >0

In PowerShell v3:
sls .\filename -pattern '^[0-9]' -AllMatches | % matches | % value
Explanation:
sls is an alias for Select-String. It takes a filename/path as well as a pattern as parameters and produces MatchInfo objects, each of which has a Matches property.
% matches selects the Matches property of each result, i.e. all matches regardless of file etc.
% value selects the Value of each match.

The solutions that have been proposed so far only produce the first match from each line. To fully emulate the behavior of grep -o (which produces every match from each line) something like this is required:
Get-Content filename | Select-String '>([0-9])' -AllMatches |
Select-Object -Expand Matches | % { $_.Groups[1].Value }
Select-String -AllMatches returns all matches from an input string.
Select-Object -Expand Matches "disconnects" matches from the same line, so that each submatch can be selected via $_.Groups[1]. Without this expansion you would have to index into each line's Matches collection yourself, e.g. $_.Matches[1].Groups[1] for the submatch of the second match on a line.
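To see the difference between the first match and every match per line, here is a small sketch that runs the same pipeline against in-memory strings instead of a file. The sample lines and the $lines and $digits variable names are made up for illustration.

```powershell
# Two hypothetical input lines; the first one contains two matches.
$lines = @(
    '<span id="a">0</span><span id="b">7</span>',
    '<span id="c">3</span>'
)

# -AllMatches collects every match per line; expanding Matches yields
# one Match object per hit, and Groups[1] is the captured digit.
$digits = $lines |
    Select-String -Pattern '>([0-9])' -AllMatches |
    ForEach-Object { $_.Matches } |
    ForEach-Object { $_.Groups[1].Value }

$digits   # 0, 7, 3 (one value per match, like grep -o)
```

Dropping -AllMatches from this pipeline should leave only the first digit of each line.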

Related

How can I write the output of Select-String in powershell without the line numbers?

I am using powershell to filter a textfile using a regular expression. To do that I am using the following command:
Select-String -Pattern "^[0-9]{2}[A-Z]{2}[a-z]{5}" -CaseSensitive rockyou.txt > filter.txt
The issue, however, is that the output written to filter.txt prefixes each matched string with the name of the original file followed by the line number, e.g.:
rockyou.txt:12345:abcdefg
rockyou.txt:12345:abcdefg
rockyou.txt:12345:abcdefg
How can I make it omit the line numbers?
Select-String outputs an object per match, and each has a Line property containing the original line in which the match occurred. You can grab only the Line value, like so:
... |Select-String ... |Select-Object -ExpandProperty Line |Out-File filter.txt
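As a self-contained sketch of that idea, the snippet below writes three made-up lines to a temporary file (the file name sample-words.txt is an assumption, standing in for rockyou.txt) and keeps only the Line property of each match.

```powershell
# Hypothetical stand-in for rockyou.txt, created in the temp folder.
$tmp = Join-Path ([System.IO.Path]::GetTempPath()) 'sample-words.txt'
Set-Content -Path $tmp -Value '01ABcdefg', '99zzzzzzz', '12CDabcde'

# Each MatchInfo carries Path, LineNumber and Line; expanding Line
# leaves plain strings without the file name or line number.
$lines = Select-String -Path $tmp -Pattern '^[0-9]{2}[A-Z]{2}[a-z]{5}' -CaseSensitive |
    Select-Object -ExpandProperty Line

$lines   # 01ABcdefg, 12CDabcde
Remove-Item $tmp
```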
This way seems to work. Set-Content saves the string version of the MatchInfo object without any extra blank lines, as opposed to Out-File or ">".
get-content rockyou.txt | select-string '^[0-9]{2}[A-Z]{2}[a-z]{5}' -ca |
set-content filter.txt
get-content filter.txt
01ABcdefg
It occurred to me you might still want the filename:
select-string '^[0-9]{2}[A-Z]{2}[a-z]{5}' rockyou.txt -ca |
% { $_.filename + ':' + $_.line } > filter.txt
cat filter.txt
rockyou.txt:01ABcdefg

Flatten LaTeX file with PowerShell

I would like to make a simple PowerShell script that:
Takes an input .tex file
replaces occurrences of \input{my_folder/my_file} with the file content itself
outputs a new file
My first step is to match the different file names so as to import them, although the following code outputs not only the file names but also \include{file1}, \include{file2}, etc.
$ms = Get-Content ms.tex -Raw
$environment = "input"
$inputs = $ms | Select-String "\\(?:input|include)\{([^}]+)\}" -AllMatches | Foreach {$_.matches}
Write-Host $inputs
I thought using the parentheses would create a capture group, but this fails. Can you explain to me why, and what is the proper way to get just the filenames instead of the full match?
On regex101 this regexp \\(?:input|include)\{([^}]+)\} seems to work fine.
You are looking for positive lookbehind and positive lookahead:
@'
Some line
\input{my_folder/my_file}
Other line
'@ | Select-String '(?<=\\input{)[^}]+(?=})' -AllMatches | Foreach {$_.matches}
Result
Groups : {0}
Success : True
Name : 0
Captures : {0}
Index : 18
Length : 17
Value : my_folder/my_file
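The capture group in the question's pattern does work as well. Write-Host prints each Match object as its full matched text, while the file name sits in Groups[1]. Here is a minimal sketch of reading that group, using a made-up here-string in place of ms.tex (the $tex and $names variables and the \include line are assumptions for illustration):

```powershell
# Hypothetical stand-in for the contents of ms.tex.
$tex = @'
Some line
\input{my_folder/my_file}
\include{chapters/intro}
Other line
'@

# Groups[0] is the whole match; Groups[1] is the text inside the braces.
$names = $tex |
    Select-String -Pattern '\\(?:input|include)\{([^}]+)\}' -AllMatches |
    ForEach-Object { $_.Matches } |
    ForEach-Object { $_.Groups[1].Value }

$names   # my_folder/my_file, chapters/intro
```

Unlike the lookbehind pattern, this version also covers \include without repeating the prefix.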

Display Multiple pipeline Values Of Same Get-Content

I am trying to extract matching strings from a CSV file.
$path = "product.csv"
Get-Content $path | Select-String -AllMatches -Pattern "[^\x00-\x79]"
I can successfully grab the strings; however, I would like to display the line number followed by the matched string.
Example Output:
LineNo String
1 a
2 b
3 c
I did manage to grab the line number using the command below. How should I combine it with the first command so that the output looks like the example output?
Get-Content $path | Select-String -AllMatches -Pattern "[^\x00-\x79]" | Select-Object LineNumber
If you want the entire line, select the Line property:
... |Select-Object LineNumber,Line
If you only want the part of the line that was matched by the pattern, you'll need a calculated property to grab the Value from the Matches property:
... |Select-Object LineNumber,@{Name='String';Expression={$_.Matches.Value}}
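A small sketch of the calculated property on in-memory strings follows. The sample rows, the $rows and $table names, and the pattern [^\x00-\x7F] (any non-ASCII character) are assumptions for illustration, so substitute your own file and pattern. The matched characters of a line are joined into one string so that lines with several matches still give a single value.

```powershell
# Hypothetical rows; [char] codes avoid depending on the script file's encoding.
$rows = 'plain ascii', "caf$([char]0xE9)", "na$([char]0xEF)ve"

# For piped input, LineNumber counts the input strings.
$table = $rows |
    Select-String -Pattern '[^\x00-\x7F]' -AllMatches |
    Select-Object LineNumber, @{ Name = 'String'; Expression = { $_.Matches.Value -join '' } }

$table
```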

Select-String sometimes results in "System.Object[]"

I'm working on a script that combines parts of two text files. These files are not too large (about 2000 lines each).
I'm seeing strange output from select-string that I don't think should be there.
Here are samples of my two files:
CC.csv - 2026 lines
LS126L47L6/1L2#519,07448,1,B
LS126L47L6/1R1-1#503,07449,1,B
LS126L47L6/1L3#536,07450,1,B
LS126L47L6/2R1#515,07451,1,B
LS126L47L6/10#525,07452,1,B
LS126L47L6/1L4#538,07453,1,B
GI.txt - 1995 lines
07445,B,SH,1
07446,B,SH,1
07448,B,SH,1
07449,B,SH,1
07450,B,SH,1
07451,B,SH,1
07452,B,SH,1
07453,B,SH,1
07454,B,SH,1
And here's a sample of the output file:
output in myfile.csv
LS126L47L6/3R1#516,07446,1,B
LS126L47L6/1L2#519,07448,1,B
LS126L47L6/1R1-1#503,07449,1,B
System.Object[],B
LS126L47L6/2R1#515,07451,1,B
This is the script I'm using:
sc ./myfile.csv "col1,col2,col3,col4"
$mn = gc cc.csv | select -skip 1 | % {$_.tostring().split(",")[1]}
$mn | % {
$a = (gc cc.csv | sls $_ ).tostring() -replace ",[a-z]$", ""
if (gc GI.txt | sls $_ | select -first 1)
{$b = (gc GI.txt | sls $_ | select -first 1).tostring().split(",")[1]}
else {$b = "NULL"
write-host "$_ is not present in GI file"}
$c = $a + ',' + $b
ac ./myfile.csv -value $c
}
The $a variable is where I am sometimes seeing the returned string as System.Object[]
Any ideas why? Also, this script takes quite some time to finish. Any tips for a newb on how to speed it up?
Edit: I should add that I've taken one line from the cc.csv file, saved in a new text file, and run through the script in console up through assigning $a. I can't get it to return "system.object[]".
Edit 2: After following the advice below and trying a couple of things, I've noticed that if I run
$mn | %{(gc cc.csv | sls $_).tostring()}
I get System.Object[].
But if I run
$mn | %{(gc cc.csv | sls $_)} | %{$_.tostring()}
It comes out fine. Go figure.
The problem is caused by a change in multiplicity of matches. If there are multiple matching elements an Object[] array (of MatchInfo elements) is returned; a single matching element results in a single MatchInfo object (not in an array); and when there are no matches, null is returned.
Consider these results, when executed against the "cc.csv" test-data supplied:
# matches many
(gc cc.csv | Select-String "LS" ).GetType().Name # => Object[]
# matches one
(gc cc.csv | Select-String "538").GetType().Name # => MatchInfo
# matches none
(gc cc.csv | Select-String "FAIL") # => null
Calling ToString() on an Object[] yields "System.Object[]", whereas calling it directly on a single MatchInfo object yields the more useful text of the matched line.
The immediate problem can be fixed by appending | Select -First 1 to the Select-String pipeline, which results in a MatchInfo being returned for the first two cases. Select-String will still search the entire input - extra results are simply discarded.
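The three cases can be reproduced without the data files. This sketch uses a few made-up strings (the $data variable and its contents are assumptions) and also shows wrapping the pipeline in @(), which gives an array in every case, so that Count and indexing behave the same way for zero, one, or many matches.

```powershell
# Hypothetical stand-in for a few lines of cc.csv.
$data = 'LS1,07448,1,B', 'LS2,07449,1,B', 'XX3,07450,1,B'

$many = $data | Select-String 'LS'     # two matches
$one  = $data | Select-String 'XX3'    # one match
$none = $data | Select-String 'FAIL'   # no match

$many.GetType().Name   # Object[]
$one.GetType().Name    # MatchInfo
$null -eq $none        # True

# @() normalizes the result to an array regardless of match count.
$counts = @($data | Select-String 'LS').Count,
          @($data | Select-String 'XX3').Count,
          @($data | Select-String 'FAIL').Count
$counts   # 2, 1, 0
```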
However, it seems like the look-back into "cc.csv" (with the Select-String) could be eliminated entirely as that is where $_ originally comes from. Here is a minor [untested] adaptation, of what it may look like:
gc cc.csv | Select -Skip 1 | %{
$num = $_.Split(",")[1]
$a = $_ -Replace ",[a-z]$", ""
# This is still O(m*n) and could be improved with a hash/set probe.
$gc_match = Select-String $num -Path GI.txt -SimpleMatch | Select -First 1
if ($gc_match) {
# Use of "Select -First 1" avoids the initial problem; but
# it /may/ be more appropriate for an error to indicate data problems.
# (Likewise, an error in the original may need further investigation.)
$b = $gc_match.ToString().Split(",")[1]
} else {
$b = "NULL"
Write-Host "$_ is not present in GI file"
}
$c = $a + ',' + $b
ac ./myfile.csv -Value $c
}
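The hash probe mentioned in the comment could look roughly like the sketch below. It is untested against the real files and makes one assumption that differs from the script above: the number is looked up as the first column of GI.txt rather than searched for anywhere in the line. The in-memory arrays and the $ccLines, $giLines and $giByNumber names are made up for illustration.

```powershell
# Hypothetical stand-ins for cc.csv (header already skipped) and GI.txt.
$ccLines = 'LS126L47L6/1L2#519,07448,1,B',
           'LS126L47L6/1L3#536,07450,1,B',
           'LS126L47L6/9#999,09999,1,B'
$giLines = '07448,B,SH,1', '07450,B,SH,1'

# Read GI once and index it: key = first column, value = second column.
$giByNumber = @{}
foreach ($line in $giLines) {
    $fields = $line.Split(',')
    if (-not $giByNumber.ContainsKey($fields[0])) {
        $giByNumber[$fields[0]] = $fields[1]   # keep the first occurrence
    }
}

# One hashtable lookup per cc line instead of rescanning GI each time.
$result = foreach ($line in $ccLines) {
    $num = $line.Split(',')[1]
    $a = $line -replace ',[a-z]$', ''
    $b = if ($giByNumber.ContainsKey($num)) { $giByNumber[$num] } else { 'NULL' }
    "$a,$b"
}

$result
```

With real files, the two arrays would come from Get-Content, and $result could be written in a single Add-Content call rather than one call per line.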

Return first matching line

I have an XML command that returns a list of URLs, example
PS > $xml.rss.channel.item.link
http://example.com/20140704.exe
http://example.com/20140704.tar.xz
http://example.com/20140624.exe
http://example.com/20140624.tar.xz
http://example.com/20140507.tar.xz
From this list, I would like to return the first .tar.xz line. I have this command:
$xml.rss.channel.item.link | ? {$_ -match '.tar.xz'} | select -first 1
But I would prefer a command with only one pipe if possible.
You don't need a pipe at all:
(Select-Xml -Xml $xml -XPath "(//link[contains(.,'.tar.xz')])[1]").Node.InnerText
Note: XPath is case-sensitive. If that is an issue, you can use a trick with the translate() function to make the comparison ignore case.
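A sketch of the translate() trick, assuming a small made-up feed in place of the real $xml: translate() maps every uppercase letter in the node text to lowercase before contains() compares it. The $upper, $lower, $xpath and $first names are illustrative.

```powershell
# Hypothetical feed with an upper-case extension.
[xml]$xml = @'
<rss><channel>
<item><link>http://example.com/20140704.exe</link></item>
<item><link>http://example.com/20140704.TAR.XZ</link></item>
</channel></rss>
'@

$upper = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
$lower = 'abcdefghijklmnopqrstuvwxyz'

# XPath 1.0 has no lower-case(), so translate() does the case folding.
$xpath = "(//link[contains(translate(., '$upper', '$lower'), '.tar.xz')])[1]"
$first = (Select-Xml -Xml $xml -XPath $xpath).Node.InnerText

$first   # http://example.com/20140704.TAR.XZ
```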
A different way using two pipes
$xml.rss.channel.item.link | Select-String .tar.xz | select -first 1
One pipe
($xml.rss.channel.item.link | Select-String .tar.xz)[0]
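One caveat with the Select-String variants: the pattern is a regular expression by default, so each dot in .tar.xz matches any character. A sketch with made-up links (the $links list and the star-xz.zip URL are assumptions) shows the difference that -SimpleMatch makes:

```powershell
# Hypothetical list; the second entry only matches when "." means "any character".
$links = 'http://example.com/20140704.exe',
         'http://example.com/star-xz.zip',
         'http://example.com/20140704.tar.xz'

# Regex match: "." is a wildcard, so "star-xz" qualifies.
$regexFirst = ($links | Select-String '.tar.xz' | Select-Object -First 1).Line

# Literal match: -SimpleMatch treats the pattern as plain text.
$literalFirst = ($links | Select-String -SimpleMatch '.tar.xz' | Select-Object -First 1).Line

$regexFirst     # http://example.com/star-xz.zip
$literalFirst   # http://example.com/20140704.tar.xz
```

Escaping the dots ('\.tar\.xz') should give the same result as -SimpleMatch here.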