I am using PowerShell to filter a text file with a regular expression. To do that I am using the following command:
Select-String -Pattern "^[0-9]{2}[A-Z]{2}[a-z]{5}" -CaseSensitive rockyou.txt > filter.txt
The issue, however, is that when it writes the matches to filter.txt, it prefixes each matched string with the name of the original file followed by the line number, e.g.:
rockyou.txt:12345:abcdefg
rockyou.txt:12345:abcdefg
rockyou.txt:12345:abcdefg
How can I make it omit the line numbers?
Select-String outputs one MatchInfo object per matching line, and each has a Line property containing the original line in which the match occurred. You can grab only the Line value, like so:
... |Select-String ... |Select-Object -ExpandProperty Line |Out-File filter.txt
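A self-contained sketch of the Line approach, assuming a small hypothetical word list in the temp directory instead of rockyou.txt (both file names are made up for the demo). Set-Content is used for the output file here; Out-File works too, but in Windows PowerShell 5.1 it writes UTF-16 by default.

```powershell
# Hypothetical sample data standing in for rockyou.txt
$src = Join-Path ([IO.Path]::GetTempPath()) 'sample-wordlist.txt'
$dst = Join-Path ([IO.Path]::GetTempPath()) 'filter.txt'
Set-Content -Path $src -Value @('01ABcdefg', '12abcdefg', 'xx', '99ZZqwert')

# Select-String emits one MatchInfo per matching line; keep only its Line property
Select-String -Path $src -Pattern '^[0-9]{2}[A-Z]{2}[a-z]{5}' -CaseSensitive |
    Select-Object -ExpandProperty Line |
    Set-Content -Path $dst

# filter.txt now holds the bare lines, without "file:line:" prefixes
Get-Content -Path $dst
```

'12abcdefg' is skipped because -CaseSensitive makes [A-Z] reject lower-case letters.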
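This way seems to work. Because the lines are piped in from Get-Content, the MatchInfo objects carry no file name, so their string form is just the matching line. Set-Content writes that string without any extra blank lines, as opposed to Out-File or ">".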
Get-Content rockyou.txt | Select-String '^[0-9]{2}[A-Z]{2}[a-z]{5}' -CaseSensitive |
Set-Content filter.txt
Get-Content filter.txt
01ABcdefg
It occurred to me you might still want the filename:
Select-String '^[0-9]{2}[A-Z]{2}[a-z]{5}' rockyou.txt -CaseSensitive |
% { $_.Filename + ':' + $_.Line } > filter.txt
cat filter.txt
rockyou.txt:01ABcdefg
I would like to make a simple PowerShell script that:
Takes an input .tex file
replaces occurrences of \input{my_folder/my_file} with the file content itself
outputs a new file
My first step is to match the different file names so that I can import them. However, the following code outputs not only the file names but the full matches \include{file1}, \include{file2}, etc.
$ms = Get-Content ms.tex -Raw
$environment = "input"
$inputs = $ms | Select-String "\\(?:input|include)\{([^}]+)\}" -AllMatches | Foreach {$_.matches}
Write-Host $inputs
I thought the parentheses would create a capture group, but this doesn't work. Can you explain why, and what the proper way is to get just the file names instead of the full match?
On regex101 this regexp \\(?:input|include)\{([^}]+)\} seems to work fine.
You are looking for a positive lookbehind and a positive lookahead:
@'
Some line
\input{my_folder/my_file}
Other line
'@ | Select-String '(?<=\\input{)[^}]+(?=})' -AllMatches | Foreach {$_.matches}
Result
Groups : {0}
Success : True
Name : 0
Captures : {0}
Index : 18
Length : 17
Value : my_folder/my_file
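The capture group in the original pattern does work: Matches holds Match objects, and printing a Match shows its whole Value, which is why \include{file1} appears. Each Match also has a Groups collection whose element 1 is the text captured by the parentheses. The sketch below assumes a hypothetical temp folder containing ms.tex and my_folder/my_file.tex; it reads Groups[1].Value and then reuses the same group with [regex]::Replace to inline each referenced file, which is the overall goal of the question.

```powershell
# Hypothetical layout: a temp folder holding ms.tex and my_folder/my_file.tex
$root = Join-Path ([IO.Path]::GetTempPath()) 'tex-inline-demo'
New-Item -ItemType Directory -Path (Join-Path $root 'my_folder') -Force | Out-Null
Set-Content -Path (Join-Path $root 'my_folder/my_file.tex') -Value 'Included text' -NoNewline
Set-Content -Path (Join-Path $root 'ms.tex') -Value "Some line`n\input{my_folder/my_file}`nOther line" -NoNewline

$pattern = '\\(?:input|include)\{([^}]+)\}'
$ms = Get-Content -Path (Join-Path $root 'ms.tex') -Raw

# The parentheses DO create group 1; read Groups[1].Value instead of printing the Match
$names = $ms | Select-String $pattern -AllMatches |
    ForEach-Object { $_.Matches } |
    ForEach-Object { $_.Groups[1].Value }
$names

# Replace each command with the referenced file's content.
# Assumption: the name has no extension, so '.tex' is appended as LaTeX does.
$expanded = [regex]::Replace($ms, $pattern, {
    param($m)
    Get-Content -Path (Join-Path $root ($m.Groups[1].Value + '.tex')) -Raw
})
Set-Content -Path (Join-Path $root 'ms_expanded.tex') -Value $expanded -NoNewline
```

The scriptblock is converted to a MatchEvaluator delegate, so it runs once per match. This sketch does not handle nested \input commands or missing files.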
I am trying to get the string values in a CSV file.
$path = "product.csv"
Get-Content $path | Select-String -AllMatches -Pattern "[^\x00-\x79]"
I can successfully grab the strings, but I would like to display the line numbers followed by the string values.
Example Output:
LineNo String
1 a
2 b
3 c
I did manage to grab the line numbers using the command below. How can I combine it with the first command so that the output looks like the example output?
Get-Content $path | Select-String -AllMatches -Pattern "[^\x00-\x79]" | Select-Object LineNumber
If you want the entire line, select the Line property:
... |Select-Object LineNumber,Line
If you only want the part of the line that was matched by the pattern, you'll need a calculated property to grab the Value from the Matches property:
... |Select-Object LineNumber,@{Name='String';Expression={$_.Matches.Value}}
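A self-contained sketch of the calculated-property approach, using hypothetical rows in place of product.csv. It assumes the intent is to find non-ASCII characters and therefore uses [^\x00-\x7F]; note that \x79 is "y", so the question's class [^\x00-\x79] also matches z, {, |, } and ~. The -join '' keeps the String column to a single value when one line has several matches.

```powershell
# Hypothetical rows standing in for product.csv; [char] codes avoid script-encoding issues
$lines = @('id,name', "1,caf$([char]0xE9)", '2,plain', "3,na$([char]0xEF)ve")

# [^\x00-\x7F] matches any non-ASCII character (assumed to be the intent)
$rows = $lines | Select-String -AllMatches -Pattern '[^\x00-\x7F]' |
    Select-Object LineNumber, @{ Name = 'String'; Expression = { $_.Matches.Value -join '' } }

$rows | Format-Table -AutoSize
```

With piped input, LineNumber counts input objects, so the header row is line 1.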
I'm working on a script that combines parts of two text files. These files are not too large (about 2000 lines each).
I'm seeing strange output from Select-String that I don't think should be there.
Here are samples of my two files:
CC.csv - 2026 lines
LS126L47L6/1L2#519,07448,1,B
LS126L47L6/1R1-1#503,07449,1,B
LS126L47L6/1L3#536,07450,1,B
LS126L47L6/2R1#515,07451,1,B
LS126L47L6/10#525,07452,1,B
LS126L47L6/1L4#538,07453,1,B
GI.txt - 1995 lines
07445,B,SH,1
07446,B,SH,1
07448,B,SH,1
07449,B,SH,1
07450,B,SH,1
07451,B,SH,1
07452,B,SH,1
07453,B,SH,1
07454,B,SH,1
And here's a sample of the output file:
output in myfile.csv
LS126L47L6/3R1#516,07446,1,B
LS126L47L6/1L2#519,07448,1,B
LS126L47L6/1R1-1#503,07449,1,B
System.Object[],B
LS126L47L6/2R1#515,07451,1,B
This is the script I'm using:
sc ./myfile.csv "col1,col2,col3,col4"
$mn = gc cc.csv | select -skip 1 | % {$_.tostring().split(",")[1]}
$mn | % {
$a = (gc cc.csv | sls $_ ).tostring() -replace ",[a-z]$", ""
if (gc GI.txt | sls $_ | select -first 1)
{$b = (gc GI.txt | sls $_ | select -first 1).tostring().split(",")[1]}
else {$b = "NULL"
write-host "$_ is not present in GI file"}
$c = $a + ',' + $b
ac ./myfile.csv -value $c
}
The $a variable is where I sometimes see the returned string as System.Object[].
Any ideas why? Also, this script takes quite some time to finish. Any tips for a newb on how to speed it up?
Edit: I should add that I've taken one line from the cc.csv file, saved it in a new text file, and run it through the script in the console up to the assignment of $a. I can't get it to return "System.Object[]".
Edit 2: After following the advice below and trying a couple of things, I've noticed that if I run
$mn | %{(gc cc.csv | sls $_).tostring()}
I get System.Object[].
But if I run
$mn | %{(gc cc.csv | sls $_)} | %{$_.tostring()}
It comes out fine. Go figure.
The problem is caused by a change in multiplicity of matches. If there are multiple matching elements an Object[] array (of MatchInfo elements) is returned; a single matching element results in a single MatchInfo object (not in an array); and when there are no matches, null is returned.
Consider these results, when executed against the "cc.csv" test-data supplied:
# matches many
(gc cc.csv | Select-String "LS" ).GetType().Name # => Object[]
# matches one
(gc cc.csv | Select-String "538").GetType().Name # => MatchInfo
# matches none
(gc cc.csv | Select-String "FAIL") # => null
The result of calling ToString on an Object[] is "System.Object[]", whereas calling it directly on a MatchInfo object returns the matching line (prefixed with the file path and line number when the input came from a file rather than from the pipeline).
The immediate problem can be fixed by appending | Select -First 1 to the Select-String call, which results in a single MatchInfo being returned for the first two cases. In PowerShell 2.0, Select-String still searched the entire input and the extra results were simply discarded; since PowerShell 3.0, Select-Object -First stops the upstream command once it has the requested number of objects.
However, it seems the lookup back into "cc.csv" (the inner Select-String) could be eliminated entirely, since that is where $_ originally comes from. Here is a minor, untested adaptation of what it might look like:
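A small sketch of the multiplicity issue and two ways around it, using two sample rows from cc.csv as pipeline input. Reading .Line instead of calling ToString() also avoids depending on how MatchInfo formats itself, and wrapping a result in @() always yields an array whatever the match count.

```powershell
# Two sample rows standing in for cc.csv
$cc = @(
    'LS126L47L6/1L2#519,07448,1,B'
    'LS126L47L6/1L4#538,07453,1,B'
)

# Many matches: the result is an Object[] and ToString() returns the type name
$many = ($cc | Select-String 'LS').ToString()

# One match: a single MatchInfo; with pipeline input its ToString() is just the line
$one = ($cc | Select-String '538').ToString()

# Select-Object -First 1 yields at most one MatchInfo; .Line gives the raw string
$first = ($cc | Select-String 'LS' | Select-Object -First 1).Line

# @() normalizes "no match" to an empty array instead of $null
$count = @($cc | Select-String 'FAIL').Count
```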
gc cc.csv | Select -Skip 1 | %{
$num = $_.Split(",")[1]
$a = $_ -Replace ",[a-z]$", ""
# This is still O(m*n) and could be improved with a hash/set probe.
$gc_match = Select-String $num -Path GI.txt -SimpleMatch | Select -First 1
if ($gc_match) {
# Use of "Select -First 1" avoids the initial problem; but
# it /may/ be more appropriate for an error to indicate data problems.
# (Likewise, an error in the original may need further investigation.)
$b = $gc_match.ToString().Split(",")[1]
} else {
$b = "NULL"
Write-Host "$num is not present in GI file"
}
$c = $a + ',' + $b
ac ./myfile.csv -Value $c
}
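On the speed question, the comment about a hash probe can be sketched like this: read GI.txt once into a hashtable keyed on its first field, then make a single pass over cc.csv. In-memory sample arrays stand in for the two files, and the header row in cc.csv is an assumption based on the question's select -skip 1. Note that this keys on the exact first field of GI.txt, which is slightly stricter than the substring search above.

```powershell
# Sample rows standing in for cc.csv (assumed header first) and GI.txt
$ccLines = @(
    'col1,col2,col3,col4'
    'LS126L47L6/1L2#519,07448,1,B'
    'LS126L47L6/1L4#538,07453,1,B'
    'LS126L47L6/9X9#999,09999,1,B'
)
$giLines = @('07448,B,SH,1', '07453,B,SH,1')

# Build the lookup once: first field of GI.txt -> second field (first occurrence wins)
$gi = @{}
foreach ($line in $giLines) {
    $fields = $line.Split(',')
    if (-not $gi.ContainsKey($fields[0])) { $gi[$fields[0]] = $fields[1] }
}

# One pass over cc.csv; each lookup is a hash probe instead of a file scan
$output = foreach ($line in ($ccLines | Select-Object -Skip 1)) {
    $num = $line.Split(',')[1]
    $a = $line -replace ',[a-z]$', ''
    if ($gi.ContainsKey($num)) {
        $b = $gi[$num]
    } else {
        $b = 'NULL'
        Write-Host "$num is not present in GI file"
    }
    "$a,$b"
}
$output
```

Writing the result with a single Set-Content call (for example Set-Content ./myfile.csv -Value (@('col1,col2,col3,col4') + $output)) also avoids reopening the output file for every row, as Add-Content in a loop does.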
I have an XML expression that returns a list of URLs, for example:
PS > $xml.rss.channel.item.link
http://example.com/20140704.exe
http://example.com/20140704.tar.xz
http://example.com/20140624.exe
http://example.com/20140624.tar.xz
http://example.com/20140507.tar.xz
From this list, I would like to return the first .tar.xz line. I have this command:
$xml.rss.channel.item.link | ? {$_ -match '.tar.xz'} | select -first 1
But I would prefer a command with only one pipe if possible.
You don't need a pipe at all:
(Select-Xml -Xml $xml -XPath "(//link[contains(.,'.tar.xz')])[1]").Node.InnerText
Note: XPath is case-sensitive. If that is an issue, you can use the translate() function to lowercase the text before comparing, so the match ignores case.
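A sketch of the translate() trick, assuming a small hypothetical feed in which one link uses an upper-case extension. XPath 1.0 has no lower-case() function, so translate() maps each listed upper-case letter to its lower-case counterpart; only the letters in the two strings are affected.

```powershell
# Hypothetical feed; note the upper-case extensions on the first two items
[xml]$xml = @'
<rss><channel>
  <item><link>http://example.com/20140704.EXE</link></item>
  <item><link>http://example.com/20140704.TAR.XZ</link></item>
  <item><link>http://example.com/20140624.tar.xz</link></item>
</channel></rss>
'@

$upper = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
$lower = 'abcdefghijklmnopqrstuvwxyz'

# translate() lowercases the link text before contains() runs; [1] keeps the first hit
$xpath = "(//link[contains(translate(., '$upper', '$lower'), '.tar.xz')])[1]"
$url = (Select-Xml -Xml $xml -XPath $xpath).Node.InnerText
$url
```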
A different way using two pipes
$xml.rss.channel.item.link | Select-String .tar.xz | select -first 1
One pipe
($xml.rss.channel.item.link | Select-String .tar.xz)[0]
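With PowerShell 4.0 or later, the intrinsic .Where() method can do this with no pipe at all. Sample strings stand in for $xml.rss.channel.item.link here. Unlike the Select-String versions, this returns the URL string rather than a MatchInfo object, and -like treats the dot literally instead of as a regex wildcard.

```powershell
# Sample values standing in for $xml.rss.channel.item.link
$links = @(
    'http://example.com/20140704.exe'
    'http://example.com/20140704.tar.xz'
    'http://example.com/20140624.tar.xz'
)

# 'First' mode stops at the first element for which the filter is true
$first = $links.Where({ $_ -like '*.tar.xz' }, 'First')[0]
$first
```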