Matching lines in file from list - powershell

I have two text files, Text1.txt and Text2.txt
Text2.txt is a list of keywords, one keyword per line. I want to read from Text1.txt and any time a keyword in the Text2.txt list shows up, pipe that entire line of text to a new file, output.txt
Without using Text2.txt I figured out how to do it manually in PowerShell.
Get-Content .\Text1.txt | Where-Object {$_ -match 'CAPT'} | Set-Content output.txt
That seems to work, it searches for "CAPT" and returns the entire line of text, but I don't know how to replace the manual text search with a variable that pulls from Text2.txt
Any ideas?

Using some simple regex you can make a alternative matching string from all the keywords in the file Text2.txt
$pattern = (Get-Content .\Text2.txt | ForEach-Object{[regex]::Escape($_)}) -Join "|"
Get-Content .\Text1.txt | Where-Object {$_ -match $pattern} | Set-Content output.txt
In case your keywords have special regex characters we need to be sure they are escaped here. The .net regex method Escape() handles that.
This is not an efficient approach for large files but it is certainly a simple method. If your keywords were all similar like CAPT CAPS CAPZ then we could improve it but I don't think it would be worth it depending how often the keywords change.
Changing the pattern
If you wanted to just match the first 4 characters from the lines in your input file that is just a matter of making a change in the loop.
$pattern = (Get-Content .\Text2.txt | ForEach-Object{[regex]::Escape($_.Substring(0,4))}) -Join "|"

Related

How to remove white space using powershell in multiple text file in the same folder?

folder name is: c:\home\alltext\
inside has: 2 text files with different names(each text contents extra whitespace that I want to trim)
text1.txt
text2.txt
I don't want to use notepad++ and do one by one text.txt if I have more than 2 command.
I tried PowerShell it returns both text1 and text2 together in same one text.txt.
How can I trim them in one command and return individual txt?
This is my command:
(get-content c:\home\alltext\*.txt).trim() -ne '' | Set-content c:\home\alltext\*.txt
You need to process the input files one by one:
Get-ChildItem c:\home\alltext*.txt | ForEach-Object {
Set-Content -LiteralPath $_.FullName -Value (($_ | Get-Content).Trim() -ne '')
}
Note that PowerShell never preserves the original character encoding when reading text files, so you may have to use the -Encoding parameter with Set-Content.
As for what you tried:
(get-content c:\home\alltext*.txt).trim() -ne '' streams the non-blank lines of all files matching wildcard expression c:\home\alltext*.txt, across file boundaries.
Perhaps surprisingly, not only does Set-Content's (positionally implied) -Path parameter accept wildcard expressions too, it writes the same content (the stringified versions of whatever input it receives) to whatever files happen to match that wildcard expression.
This problematic behavior is discussed in GitHub issue #6729; unfortunately, it was decided to retain the current behavior.

Issues merging multiple CSV files in Powershell

I found a nifty command here - http://www.stackoverflow.com/questions/27892957/merging-multiple-csv-files-into-one-using-powershell that I am using to merge CSV files -
Get-ChildItem -Filter *.csv | Select-Object -ExpandProperty FullName | Import-Csv | Export-Csv .\merged\merged.csv -NoTypeInformation -Append
Now this does what it says on the tin and works great for the most part. I have 2 issues with it however, and I am wondering if there is a way they can be overcome:
Firstly, the merged csv file has CRLF line endings, and I am wondering how I can make the line endings just LF, as the file is being generated?
Also, it looks like there are some shenanigans with quote marks being added/moved around. As an example:
Sample row from initial CSV:
"2021-10-05"|"00:00"|"1212"|"160477"|"1.00"|"3.49"LF
Same row in the merged CSV:
"2021-10-05|""00:00""|""1212""|""160477""|""1.00""|""3.49"""CRLF
So see that the first row has lost its trailing quotes, other fields have doubled quotes, and the end of the row has an additional quote. I'm not quite sure what is going on here, so any help would be much appreciated!
For dealing with the quotes, the cause of the “problem” is that your CSV does not use the default field delimiter that Import-CSV assumes - the C in CSV stands for comma, and you’re using the vertical bar. Add the parameter -Delimiter "|" to both the Import-CSV and Export-CSV cmdlets.
I don’t think you can do anything about the line-end characters (CRLF vs LF); that’s almost certainly operating-system dependent.
Jeff Zeitlin's helpful answer explains the quote-related part of your problem well.
As for your line-ending problem:
As of PowerShell 7.2, there are no PowerShell-native features that allow you to control the newline format of file-writing cmdlets such as Export-Csv.
However, if you use plain-text processing, you can use multi-line strings built with the newline format of interest and save / append them with Set-Content and its -NoNewLine switch, which writes the input strings as-is, without a (newline) separator.
In fact, to significantly speed up processing in your case, plain-text handling is preferable, since in essence your operation amounts to concatenating text files, the only twist being that the header lines of all but the first file should be skipped; using plain-text handling also bypasses your quote problem:
$tokenCount = 1
Get-ChildItem -Filter *.csv |
Get-Content -Raw |
ForEach-Object {
# Get the file content and replace CRLF with LF.
# Include the first line (the header) only for the first file.
$content = ($_ -split '\r?\n', $tokenCount)[-1].Replace("`r`n", "`n")
$tokenCount = 2 # Subsequent files should have their header ignored.
# Make sure that each file content ends in a LF
if (-not $content.EndsWith("`n")) { $content += "`n" }
# Output the modified content.
$content
} |
Set-Content -NoNewLine ./merged/merged.csv # add -Encoding as needed.

Batch File to Find and Replace in text file using whole word only?

I am writing a script which at one point has to check in a text file and remove certain strings. So far I have this:
powershell -Command "(gc myFile.txt) -replace 'foo', 'bar' | Out-File -encoding ASCII myFile.txt"
The only problem is that that can find and replace but will not remove the line all together.
The second problem is that say I am removing the line that has Mark, it needs to not remove a line that has something like Markus.
I don't know if this is possible with the powershell interface?
Your current code will only replace foo with bar, this is what replace does.
Removing the whole line if it matches requires a different approach, almost backwards, as you can use notmatch to output any lines that do not match you filter - effectively removing them.
Also using regex word boundaries will then only match Mark but not Markus:
(Get-Content file.txt) | Where-Object {$_ -notmatch "\bMark\b"} | Set-Content file.txt

Remove start and end spaces in specific csv column

I am trying to remove start and end spaces in column data in CSV file. I've got a solution to remove all spaces in the csv, but it's creating non-readable text in description column.
Get-Content –path test.csv| ForEach-Object {$_.Trim() -replace "\s+" } | Out-File -filepath out.csv -Encoding ascii
e.g.
'192.168.1.2' ' test-1-TEST' 'Ping Down at least 1 min' '3/11/2017' 'Unix Server' 'Ping' 'critical'
'192.168.1.3' ' test-2-TEST' ' Ping Down at least 3 min' '3/11/2017' 'windows Server' 'Ping' 'critical'
I only want to remove space only from ' test-1-TEST' and not from 'Ping Down at least 1 min'. Is this possible?
"IP","ServerName","Status","Date","ServerType","Test","State"
"192.168.1.2"," test-1-TEST","Ping Down at least 1 min","3/11/2017","Unix Server","Ping","critical"
"192.168.1.3"," test-2-TEST"," Ping Down at least 3 min","3/11/2017","windows Server","Ping","critical"
For example file above:
Import-Csv C:\folder\file.csv | ForEach-Object {
$_.ServerName = $_.ServerName.trim()
$_
} | Export-Csv C:\folder\file2.csv -NoTypeInformation
Replace ServerName with the name of the Column you want to remove spaces from (aka trim).
If your CSV does not have header (which means its not a true CSV) and/or you want to better preserve the original file structure and formatting you could try to expand on your regex a little.
(Get-Content c:\temp\test.txt -Raw) -replace "(?<=')\s+(?=[^' ])|(?<=[^' ])\s+(?=')"
That should remove all leading and trailing spaces inside the quoted values. Not the delimeters themselves.
Read the file in as one string. Could be bad idea depending on file size. Not required as the solution is not dependent on that. Can still be read line be line with the same transformation achieving the same result. Use two replacements that are similar. First is looking for spaces that exist after a single quote but not followed by another quote or space. Second is looking for spaces before a quote that are not preceded by a quote or space.
Just wanted to give a regex example. You can look into this with more detail and explanation at regex101.com. There you will see an alternation pattern instead of two separate replacements.
(Get-Content c:\temp\test.txt -Raw) -replace "(?<=')\s+(?=[^' ])|(?<=[^' ])\s+(?=')"
The first example is a little easier on the eyes.
I was having issues consistently replicating this but if you are having issues with it replacing newlines as well then you can just do the replacement one line at a time and that should work as well.
(Get-Content c:\temp\test.txt) | Foreach-Object{
$_ -replace "(?<=')\s+(?=[^' ])|(?<=[^' ])\s+(?=')"
} | Set-Content c:\temp\test.txt

Extract specific data

Please help. I am trying to extract multiple filenames from the following .xml file. I then need to copy the list of files from one folder to another. A part of the XML I have posted below:
<component>
<altname>HP Broadcom Online Firmware Upgrade Utility for VMware 5.x</altname>
<filename>CP021404.scexe</filename>
<name>HP Broadcom Online Firmware Upgrade Utility for VMware 5.x</name>
<description>This package contains vSphere 5.1 and VMware </description>
<component>
<component>
<altname>Online ROM Flash - Power Management Controller </altname>
<filename>CP021615.scexe</filename>
I used Windows PowerShell as below and got the output, but the output contains filenames (CP021404.scexe, CP021614.scexe below), line# and symbol still in it. What am I doing wrong on my first PS attempt?
PowerShell
$input_path = ‘C:\PowerShell\hpsum_inventory.xml’
$output_file = ‘C:\powershell\hpsum_inventory-o.xml’
$regex = ".exe"
select-string -Path $input_path -Pattern $regex -AllMatches > $output_file
Output
PowerShell\hpsum_inventory.xml:8: <filename>CP021404.scexe</filename>
PowerShell\hpsum_inventory.xml:18: <filename>CP021614.scexe</filename>
The problem is that you're using a RegEx match and the period character in RegEx matches any character except Line Feed/New Line characters, so it's matching any character followed by 'exe'. Really what you want to do is read the file as XML, and just output the <filename> nodes.
$input_path = ‘C:\PowerShell\hpsum_inventory.xml’
$output_file = ‘C:\powershell\hpsum_inventory-o.xml’
$regex = "exe$"
(Select-Xml -Path $input_path -XPath //filename).node.InnerText | ?{$_ -match $regex} | out-file $output_file
Edit: Ok, you need to incorporate that into a string, that's easy enough. We'll add a ForEach loop (I use the alias % for that) to the last line to insert the file name into a string.
(Select-Xml -Path $input_path -XPath //filename).node.InnerText | ?{$_ -match $regex} | %{"copy c:\powershell\$_ x:\firmware\"} | out-file $output_file
Edit2: Ok, so you want the knowledge in general of how to match text in a file. Can do! Select string will do what you want actually, it just wasn't the best method in general for the example you gave earlier. This gets a bit more interesting, since you need to be familiar with RegEx matching patterns, but other than that it's fairly straight forward. You want to use the -Pattern match again, but let me suggest a better pattern:
"filename>(.*?)<"
That looks for the filename tag, including closing > on it, and grabs everything up to the next < character. The () denote a capturing group, so the rest is ignored as far as the capture goes. Then we pipe to a ForEach loop, and for each line that it finds that matches we select the Matches property, and the second Group property of that (the first contains the whole text, including the filename> and < bits). So it looks like this:
$input_path = 'C:\PowerShell\hpsum_inventory.xml'
$output_file = 'C:\powershell\hpsum_inventory-o.xml'
$regex = "filename>(.*?)<"
select-string -Path $input_path -Pattern "filename>(.*?)<"|%{$_.matches.groups[1].value}
Now that only gets the file names. If we want to incorporate the rest of your thing about inserting it into text you enclose the part in the ForEach loop inside a sub-expression $() and then put that into your double quoted string like such:
select-string -Path $input_path -Pattern "filename>(.*?)<"|%{"copy c:\powershell\$($_.matches.groups[1].value) x:\firmware"}|Out-File $output_file
Personally I would suggest not doing that directly as it limits you. I'd collect the data in an array, then pipe that array into a process that does what you want, but then at least you have the collection so you can do with it what you want.
$input_path = 'C:\PowerShell\hpsum_inventory.xml'
$output_file = 'C:\powershell\hpsum_inventory-o.xml'
$regex = "filename>(.*?)<"
$Filenames = select-string -Path $input_path -Pattern "filename>(.*?)<"|%{$_.matches.groups[1].value}
$Filenames|%{"copy c:\powershell\$_ x:\firmware"}|Out-File $output_file
Why do it that way? What if you don't want to over-write something? Then you can do something like:
$Filenames|?{$_ -notin (GCI X:\firmware -file|select -expand name)}|%{"copy c:\powershell\$_ x:\firmware"}|Out-File $output_file
For your collection of serial numbers, try the regex pattern of:
"Serial Number: (\S*)"
In RegEx there are a few escaped characters that have special meaning, and capitalizing them inverts that meaning. \s means whitespace, so spaces, tabs, what not. Doing it as a capital means something that is NOT whitespace. The asterisk means however many of the previous thing (not whitespace) it can find. So this looks for 'Serial Number: ' and then captures everything after that until it reaches the end of the line or encounters whitespace. Check out this link to see how it works.