PowerShell/Batch files: Verify that a file contains at least one entry from a list of strings

Here is my current issue: I have a list of 1,800 customer numbers (e.g. 123456789). I need to determine which of these numbers show up in another, much larger (4 GB) file. The larger file is a fixed-width file of all customer information. I know how I would do this in SQL, but as I said, it's a flat file.
When searching for individual numbers, I was using a command I found elsewhere on this site which worked very well:
get-content CUSTOMERINFO.txt -ReadCount 1000 | foreach { $_ -match "123456789" }
However, I do not have the expertise to translate this into another command, or a batch file, which would load list.txt and search all lines in customerinfo.txt for the requisite strings.
Time is not a major constraint, as this is running on a test server and will be a once-off project.
Thank you very much for any help you can provide.

I appreciate everyone's help. Everybody gave me helpful information that let me get to my final solution. Special thanks to the person who asked if this was a code-writing request, because it made me realize I needed to just write some code.
For anyone else who runs into the same problem, here is the code I ended up using:
# $matches is an automatic variable in PowerShell, so use a different name for the list
$entries = Get-Content .\list.txt
foreach ($entry in $entries)
{
    $results = Get-Content FiletoSearch -ReadCount 1000 | foreach { $_ -match $entry }
    if ($results -eq $null) {
        $entry      # not found: emit the number that was searched for
    }
    else {
        "found"     # at least one line matched
    }
}
This gives a 'found' entry for everything that was found (which is information I don't need), and gives back the value searched for when it's not found (which is information I do need).
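If you only care about the numbers that were not found, a small variation collects just those into a file (a sketch based on the code above; NotFound.txt is a placeholder name):
$entries = Get-Content .\list.txt
$notFound = foreach ($entry in $entries)
{
    $results = Get-Content FiletoSearch -ReadCount 1000 | foreach { $_ -match $entry }
    if ($results -eq $null) { $entry }   # keep only the numbers with no match
}
$notFound | Set-Content .\NotFound.txt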

The -match operator can work over multiple values; you can separate them with a bar (|) character, which regex treats as alternation.
e.g.
get-content CUSTOMERINFO.txt -ReadCount 1000 | foreach { $_ -match "DEF|YZ" }
You can also read the contents of a file and join its lines with a character of your choice. So if list.txt is a list of values to search, such as
DEF
XY
Then you can read it and convert it to a bar-separated list using the join operator:
(Get-Content list.txt) -join "|"
Put them together and you should have your solution:
$listSearch = (Get-Content list.txt) -join "|";
get-content CUSTOMERINFO.txt -ReadCount 1000 | foreach { $_ -match $listSearch}
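Note that -match treats the joined list as a regular expression, so this works as-is for plain customer numbers; if your values could contain regex metacharacters, it is safer to escape each entry before joining (a minimal sketch):
$listSearch = (Get-Content list.txt | ForEach-Object { [regex]::Escape($_) }) -join "|"
get-content CUSTOMERINFO.txt -ReadCount 1000 | foreach { $_ -match $listSearch }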

Related

PowerShell to change the sample rate of a CSV file from a data logger

I have CSV data from a data logger that collected the info every second rather than every 15 minutes, and I made a script to export every 900th entry. The script works for smaller CSV files (up to 80 MB), but I have one file at 3.6 GB, and it doesn't work.
I looked online and found better methods to increase the speed (don't have .NET, and haven't been able to get StreamReader to work).
Here is the script:
$file = Import-Csv z:\csv\input_file.csv -Header A,B,C,D,E,F
$counter = 0
ForEach ($item in $file)
{
    $counter++
    If ($counter -lt 900)
    {
    }
    Else
    {
        Write-Output "$item" | Out-File "z:\csv\output_file.csv" -Append
        $counter = 0
    }
}
Any ideas/optimizations are greatly appreciated.
Thanks.
You can skip reading it as a CSV, and just read it as text. Then loop through iterating by 900 at a time, and output those lines.
$file = Get-Content z:\csv\input_file.csv -ReadCount 1000
For ($i = 0; $i -le $file.count; $i = $i + 900) {
    $file[$i] | Add-Content z:\csv\output_file.csv
}
I'm sure there are probably other optimizations that could be made, but that's a simple way to speed things up.
Edit: OK, so -ReadCount behaves a little differently than I had anticipated. When set to a number other than 0 or 1, it creates an array of arrays of strings, basically [array[string[]]]. At that point there are two options: either use -ReadCount 0 to read the entire file at once, or, better yet, read 900 lines at a time, output only the first line of each set, and pass that directly down the pipe to Set-Content.
Get-Content z:\csv\input_file.csv -ReadCount 900 | %{$_[0]} | Set-Content z:\csv\output_file.csv
So that will read the file into memory 900 lines at a time, and then pass only the first line from each series down the pipe, and output that to the output file.
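If Get-Content is still too slow or memory-hungry on the 3.6 GB file, a plain StreamReader/StreamWriter loop keeps memory use flat. This is only a sketch using the paths from the question, and it assumes there is no header row that needs special handling:
$reader = [System.IO.File]::OpenText('z:\csv\input_file.csv')
$writer = [System.IO.File]::CreateText('z:\csv\output_file.csv')
try {
    $counter = 0
    while (-not $reader.EndOfStream) {
        $line = $reader.ReadLine()
        # keep the first line of every block of 900
        if ($counter % 900 -eq 0) { $writer.WriteLine($line) }
        $counter++
    }
}
finally {
    $reader.Close()
    $writer.Close()
}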

Pulling a substring for each line in file

Using PowerShell, I am simply trying to pull 15 characters, starting from the 37th position, of any record that begins with a 6. I'd like to loop through and generate a record for each instance so it can later be put into an output file. But I can't seem to hit the correct syntax just to return the 15 characters; I know I am missing something obvious. I've been at this for a while. Here is my script:
$content = Get-Content -Path .\tmfhsyst*.txt | Where-Object { $_.StartsWith("6") }
foreach ($line in $contents)
{
$val102 = $line.substring(36,15)
}
write-output $val102
Just as Bill_Stewart pointed out, you need to move your Write-Output line inside the ForEach loop. A possibly better way to do it would just be to pipe it:
Get-Content -Path .\tmfhsyst*.txt | Where-Object { $_.StartsWith("6") } | foreach{$_.substring(36,15)}
That should give you the output you desired.
Using Substring() has the disadvantage that it will raise an error if the string is shorter than start index + substring length. You can avoid this with a regular expression match:
Get-Content -Path .\tmfhsyst*.txt | % { if ($_ -match '^6.{35}(.{15})') { $matches[1] } }
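If you would rather stick with Substring(), a length check avoids the same error (a sketch assuming the layout from the question: 15 characters starting at the 37th position, so lines at least 51 characters long):
Get-Content -Path .\tmfhsyst*.txt |
    Where-Object { $_.StartsWith("6") -and $_.Length -ge 51 } |
    ForEach-Object { $_.Substring(36, 15) }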

Get all lines containing a string in a huge text file - as fast as possible?

In PowerShell, how can I read, as fast as possible, the last line (or all the lines) containing a specific string in a huge text file (about 200,000 lines / 30 MB)?
I'm using :
get-content myfile.txt | select-string -pattern "my_string" -encoding ASCII | select -last 1
But it's very, very slow (about 16-18 seconds).
I did tests without the last pipe ("select -last 1"), but it takes the same time.
Is there a faster way to get the last occurrence (or all occurrences) of a specific string in a huge file?
Perhaps that's simply the time it needs...
Or is there any possibility to read the file faster from the end, since I only want the last occurrence?
Thanks
Try this:
get-content myfile.txt -ReadCount 1000 |
foreach { $_ -match "my_string" }
That will read your file in chunks of 1000 records at a time, and find the matches in each chunk. This gives you better performance because you aren't wasting a lot of CPU time on memory management, since there are only 1000 lines at a time in the pipeline.
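If you only need the last occurrence, you can still pipe the chunked matches through Select-Object (a sketch building on the command above):
Get-Content myfile.txt -ReadCount 1000 |
    foreach { $_ -match "my_string" } |
    Select-Object -Last 1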
Have you tried:
gc myfile.txt | % { if($_ -match "my_string") {write-host $_}}
Or, you can create a "grep"-like function:
function grep($f, $s) {
    gc $f | % { if ($_ -match $s) { write-host $_ } }
}
Then you can just issue: grep myfile.txt "my_string"
$reader = New-Object System.IO.StreamReader("myfile.txt")
$lines = @()
if ($reader -ne $null) {
    while (!$reader.EndOfStream) {
        $line = $reader.ReadLine()
        if ($line.Contains("my_string")) {
            $lines += $line
        }
    }
    $reader.Close()
}
$lines | Select-Object -Last 1
Have you tried using [System.IO.File]::ReadAllLines()? This method is more "raw" than the PowerShell-esque method, since we're plugging directly into the Microsoft .NET Framework types.
$Lines = [System.IO.File]::ReadAllLines("myfile.txt");
[Regex]::Matches($Lines, 'my_string_pattern');
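If what you actually want is the matching lines themselves (and the last one in particular), the same array can be filtered directly; a minimal sketch:
# Note: ReadAllLines resolves a relative path against the process working directory, not $PWD
$Lines = [System.IO.File]::ReadAllLines("myfile.txt")
$Lines -match 'my_string_pattern' | Select-Object -Last 1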
I wanted to extract the lines that contained "failed" and also write those lines to a new file, so I will add the full command for this:
get-content log.txt -ReadCount 1000 |
foreach { $_ -match "failed" } | Out-File C:\failes.txt

Fastest way to parse thousands of small files in PowerShell

I have over 16000 inventory log files ranging in size from 3-5 KB on a network share.
Sample file looks like this:
## System Info
SystemManufacturer:=:Dell Inc.
SystemModel:=:OptiPlex GX620
SystemType:=:X86-based PC
ChassisType:=:6 (Mini Tower)
## System Type
isLaptop=No
I need to put them into a DB, so I started parsing them and creating a custom object for each that I can later use to check for duplicates, normalize, etc.
The initial parse with a code snippet like the one below took about 7.5 minutes.
Foreach ($invlog in $invlogs) {
    $content = gc $invlog.FullName -ReadCount 0
    foreach ($line in $content) {
        if ($line -match '^#|^\s*$') { continue }
        $invitem, $value = $line -split ':=:'
        [PSCustomObject]@{Name = $invitem; Value = $value}
    }
}
I started optimizing it, and after several rounds of trial and error I ended up with this, which takes 2 minutes and 4 seconds:
Foreach ($invlog in $invlogs) {
    foreach ($line in ([System.IO.File]::ReadLines("$($invlog.FullName)") -match '^\w')) {
        $invitem, $value = $line -split ':=:'
        [PSCustomObject]@{name = $invitem; Value = $value}  # 2.04 mins
    }
}
I also tried using a hash instead of a PSCustomObject, but to my surprise it took much longer (5 minutes 26 seconds):
Foreach ($invlog in $invlogs) {
    $hash = @{}
    foreach ($line in ([System.IO.File]::ReadLines("$($invlog.FullName)") -match $propertyline)) {
        $invitem, $value = $line -split ':=:'
        $hash[$invitem] = $value  # 5.26 mins
    }
}
What would be the fastest method to use here?
See if this is any faster:
Foreach ($invlog in $invlogs) {
    @(gc $invlog.FullName -ReadCount 0) -notmatch '^#|^\s*$' |
        foreach {
            $invitem, $value = $_ -split ':=:'
            [PSCustomObject]@{Name = $invitem; Value = $value}
        }
}
The -match and -notmatch operators, when applied to an array, return all the elements that satisfy the match, so you can eliminate having to test every line individually for the lines you want to exclude.
Are you really wanting to create a PS Object for every line, or just one for every file?
If you want one object per file, see if this is any quicker:
The multi-line regex eliminates the line array, and a filter is used in place of the foreach to create the hash entries.
$regex = [regex]'(?ms)^(\w+):=:([^\r]+)'
filter make-hash { @{$_.groups[1].value = $_.groups[2].value} }
Foreach ($invlog in $invlogs) {
    $regex.matches([io.file]::ReadAllText($invlog.fullname)) | make-hash
}
The objective of switching to the multi-line regex and [io.file]::ReadAllText() is to simplify what PowerShell is doing with the file input internally. The result of [io.file]::ReadAllText() is a string object, which is a much simpler type of object than the array of strings that [io.file]::ReadAllLines() produces, and requires less overhead to construct internally. A filter is essentially just the Process block of a function: it runs once for every object that comes to it from the pipeline, so it emulates the action of ForEach-Object but actually runs slightly faster (I don't know the internals well enough to tell you exactly why).
Both of these changes require more coding and only result in a marginal increase in performance. In my testing, switching to the multi-line regex gained about 0.1 ms per file, and changing from ForEach-Object to the filter another 0.1 ms. You probably don't see these techniques used very often because of the low return compared to the additional coding work required, but it becomes significant when you start to multiply those fractions of a millisecond by 160K iterations.
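Under those assumptions, if you want one object per file rather than one hashtable per key-value pair, the one-entry hashtables coming out of make-hash can be merged before emitting (a sketch building on the filter above):
Foreach ($invlog in $invlogs) {
    $merged = @{}
    # combine the one-entry hashtables produced by make-hash into a single hashtable per file
    $regex.Matches([io.file]::ReadAllText($invlog.FullName)) | make-hash |
        ForEach-Object { $merged += $_ }
    [pscustomobject]$merged
}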
Try this:
Foreach ($invlog in $invlogs) {
    $output = @{}
    foreach ($line in ([IO.File]::ReadLines("$($invlog.FullName)") -ne '')) {
        if ($line.Contains(":=:")) {
            $item, $value = $line.Split(":=:") -ne ''
            $output[$item] = $value
        }
    }
    New-Object PSObject -Property $output
}
As a general rule, regex is sometimes cool but usually slower than plain string operations.
Wouldn't you want an object per system, and not per key-value pair? :S
Like this... By replacing Get-Content with the .NET method you could probably save some time.
Get-ChildItem -Filter *.txt -Path <path to files> | ForEach-Object {
    $ht = @{}
    Get-Content $_ | Where-Object { $_ -match ':=:' } | ForEach-Object {
        $ht[($_ -split ':=:')[0].Trim()] = ($_ -split ':=:')[1].Trim()
    }
    [pscustomobject]$ht
}
ChassisType SystemManufacturer SystemType SystemModel
----------- ------------------ ---------- -----------
6 (Mini Tower) Dell Inc. X86-based PC OptiPlex GX620
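Since the goal is to load these into a database, the resulting objects can go straight to Export-Csv for a bulk import (a sketch; inventory.csv is just a placeholder path):
# Same pipeline as above, captured and exported; <path to files> and inventory.csv are placeholders.
$inventory = Get-ChildItem -Filter *.txt -Path <path to files> | ForEach-Object {
    $ht = @{}
    Get-Content $_ | Where-Object { $_ -match ':=:' } | ForEach-Object {
        $ht[($_ -split ':=:')[0].Trim()] = ($_ -split ':=:')[1].Trim()
    }
    [pscustomobject]$ht
}
$inventory | Export-Csv -Path inventory.csv -NoTypeInformation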

Powershell: Search data in *.txt files to export into *.csv

First of all, this is my first question here. I often come here to browse existing topics, but now I'm stuck on my own problem, and I haven't found a helpful resource so far. My biggest concern is that it won't work in PowerShell at all. At the moment I'm trying to build a small PowerShell tool to save myself a lot of time. For those who don't know cw-sysinfo, it is a tool that collects information about any host system (e.g. hardware ID, product key and the like) and generates *.txt files.
My point is, if you have 20, 30 or 80 servers in a project, it takes a huge amount of time to browse all the files, look for just the lines you need, and put them together in a *.csv file.
What I have working is more like the basis of the tool: it browses all *.txt files in a specific path and checks for my keywords. And here is the problem: I can only search for the words prior to those I really need, as seen below:
Operating System: Windows XP
Product Type: Professional
Service Pack: Service Pack 3
...
I don't know how I can tell PowerShell to search for the "Product Type:" line and pick up the following "Professional" instead. Later on, with keys or serial numbers, it will be the same problem, which is why I can't simply search for "Standard" or "Professional".
I placed my keywords ($controls) in an extra file that I can attach to the project folders, so I don't need to edit the PowerShell script each time. The code looks like this:
Function getStringMatch
{
    # Loop through the project directory
    Foreach ($file In $files)
    {
        # Check all keywords
        ForEach ($control In $controls)
        {
            $result = Get-Content $file.FullName | Select-String $control -quiet -casesensitive
            If ($result -eq $True)
            {
                $match = $file.FullName
                # Write the filename according to the entry
                "Found : $control in: $match" | Out-File $output -Append
            }
        }
    }
}
getStringMatch
I think this is the kind of thing you need. I've changed Select-String to not use the -quiet option; this returns a Matches object, one of whose properties is the line. I then split the line on the ':' and trim any spaces. The results are placed into a new PSObject, which in turn is added to an array. The array is put back on the pipeline at the end.
I also moved the call to Get-Content to avoid reading each file more than once.
# Create an array for results
$results = @()
# Loop through the project directory
Foreach ($file In $files)
{
    # load the content once
    $content = Get-Content $file.FullName
    # Check all keywords
    ForEach ($control In $controls)
    {
        # find the line containing the control string
        $result = $content | Select-String $control -casesensitive
        If ($result)
        {
            # tidy up the results and add to the array
            $line = $result.Line -split ":"
            $results += New-Object PSObject -Property @{
                FileName = $file.FullName
                Control  = $line[0].Trim()
                Value    = $line[1].Trim()
            }
        }
    }
}
# return the results
$results
Adding the results to a CSV is just a case of piping the results to Export-Csv:
$results | Export-Csv -Path "results.csv" -NoTypeInformation
If I understand your question correctly, you want some way to parse each line from your report files and extract values for certain "keys". Here are a few lines to give you an idea of how you could proceed. The example is for one file, but it can be generalized very easily.
$config = Get-Content ".\config.txt"
# The stuff you are searching for
$keys = @(
    "Operating System",
    "Product Type",
    "Service Pack"
)
foreach ($line in $config)
{
    $keys | % {
        $regex = "\s*?$($_)\:\s*(?<value>.*?)\s*$"
        if ($line -match $regex)
        {
            $value = $matches.value
            Write-Host "Key: $_`t`tValue: $value"
        }
    }
}
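To get from there to the CSV file the question asks for, one approach is to collect one object per report file and pipe the collection to Export-Csv. This is only a sketch: $reportFiles, the *.txt filter and report.csv are assumed names, and it reuses the $keys list from above:
$reportFiles = Get-ChildItem -Path . -Filter *.txt
$results = foreach ($file in $reportFiles) {
    $props = @{ FileName = $file.FullName }
    foreach ($line in (Get-Content $file.FullName)) {
        foreach ($key in $keys) {
            # escape the key in case it contains regex metacharacters
            if ($line -match "^\s*$([regex]::Escape($key))\s*:\s*(?<value>.*?)\s*$") {
                $props[$key] = $matches.value
            }
        }
    }
    New-Object PSObject -Property $props
}
$results | Export-Csv -Path report.csv -NoTypeInformation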