Need to extract 8 & 9 digit file numbers from 40,000 emails using PowerShell

I am attempting to extract 8- and 9-digit file numbers from 40,000 emails that have been saved as .txt files. The file numbers can appear anywhere in the emails (it's not a standard form), but should always be 8 or 9 digits long. The file numbers can also be formatted several different ways, like: xxx xx xxxx, xxx-xx-xxxx, xxxxxxxxx, and for 8-digit numbers: YY YYY YYY, YY-YYY-YYY, YYYYYYYY. I created a PowerShell script that reads the text file, extracts the file numbers matching those patterns, and saves them to a .csv file.
Problems: if there is any text preceding the file # on the line, the script fails to grab the file #. It also grabs additional text (on the same line after the file #). I need only exact matches to the set patterns.
The solution does not need to be in PowerShell; if there is a better solution in VBScript I'm also open to that.
The current script is below:
$Num = @()
$Num += Select-String -Path "$PSSCRIPTROOT\text.txt" -Pattern '\d{8}$|^\d{2}\s\d{3}\s\d{3}$|^\d{2}-\d{3}-\d{3}$'
$Num += Select-String -Path "$PSSCRIPTROOT\text.txt" -Pattern '\d{9}$|^\d{3}\s\d{2}\s\d{4}$|^\d{3}-\d{2}-\d{4}$'
ForEach ($Result in $Num){
$Found = $Result.ToString().Split(":")
$o = new-object PSObject
$o | add-member NoteProperty "FoundOnLine" $Found[2]
$o | add-member NoteProperty "Number" $Found[3]
$o | export-csv "$PSscriptroot\FoundNumbers.csv" -notypeinformation -Append
Write-Output $o
}
PLEASE HELP!

This should do the trick actually ...
$File = "$PSSCRIPTROOT\text.txt"
$Pattern = '\d\d(\s|-)*\d(\s|-)*\d(\s|-)*\d{4,5}'
Select-String -Path $File -Pattern $Pattern -AllMatches |
Select-Object -ExpandProperty Matches |
Select-Object -ExpandProperty Value
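The pattern above has two gaps worth knowing about. It requires a run of at least four consecutive digits at the end, so the spaced and hyphenated 8-digit forms (12 345 678, 12-345-678) never match; and because it has no boundary, in "123 45 6789" it matches only "23 45 6789", dropping the first digit. Below is a stricter sketch, assuming exactly one space or hyphen (or none) between groups and the same separator throughout one number. The function name Get-FileNumber and the output columns FileName, LineNumber and Number are hypothetical, chosen to cover the question's CSV requirement across many files.

```powershell
function Get-FileNumber {
    param([string[]]$Path)   # one or more saved-email .txt files

    # 9 digits: 123 45 6789, 123-45-6789, 123456789
    # 8 digits: 12 345 678,  12-345-678,  12345678
    # (?<!\d) and (?!\d) reject digits that belong to a longer number;
    # \1 and \2 force the same separator (space, hyphen or none) within one number.
    $pattern = '(?<!\d)(?:\d{3}([ -]?)\d{2}\1\d{4}|\d{2}([ -]?)\d{3}\2\d{3})(?!\d)'

    Select-String -Path $Path -Pattern $pattern -AllMatches |
        ForEach-Object {
            $hit = $_
            # one output row per match, even when a line holds several numbers
            foreach ($m in $hit.Matches) {
                [pscustomobject]@{
                    FileName   = $hit.Filename
                    LineNumber = $hit.LineNumber
                    Number     = $m.Value
                }
            }
        }
}

# Usage (hypothetical folder): scan every saved email and write a single CSV.
# Get-FileNumber -Path (Get-ChildItem 'C:\Emails' -Filter *.txt).FullName |
#     Export-Csv 'C:\Emails\FoundNumbers.csv' -NoTypeInformation
```

Only the matched text is returned, so surrounding words on the same line are not included, and text before the number no longer prevents a match.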

Related

Get a logfile for a specific date

I want to save to "C:\logFiles" on my computer the entries for a specific date range from a log file generated by a program on another PC.
The path of the log file is "C:\Sut\Stat\03-2021.log".
Example: the file "C:\Sut\Stat\03-2021.Sutwin.log" contains all the logs for the month of March, but I only want the logs for the last 7 days, from 19-03-2021 to 26-03-2021.
I found this script on the internet, but it doesn't work for me and I need some help:
An example of the .log file is shown in the attached screenshots.
My PC name: c01234
Name of the PC containing the log file: c06789
File to read the information from: 03-2021.Sutwin.log (exists on PC c06789)
I want to copy only the content of the last 7 days into a folder named Week11_LogFile on my PC c01234.
$log = "2015-05-09T06:39:34 Some information here
2015-05-09T06:40:34 Some information here
" -split "`n" | Where {$_.trim()}
#using max and min value for the example so all correct dates will comply
$upperLimit = [datetime]::MaxValue #replace with your own date
$lowerLimit = [datetime]::MinValue #replace with your own date
$log | foreach {
$dateAsText = ($_ -split '\s',2)[0]
try
{
$date = [datetime]::Parse($dateAsText)
if (($lowerLimit -lt $date) -and ($date -lt $upperLimit))
{
$_ #output the current item because it belongs to the requested time frame
}
}
catch
{
#date is malformed (maybe the line is empty or there is a typo), skip it
#[datetime]::Parse reports a FormatException for bad text, which catch [InvalidOperationException] would not catch
}
}
Based on your images, your log files look like simple tab-delimited files.
Assuming that's the case, this should work:
# Import the data as a tab-delimited file and add a DateTime column with a parsed value
$LogData = Import-Csv $Log -Delimiter "`t" |
Select-Object -Property *, @{n='DateTime';e={[datetime]::ParseExact($_.Date + $_.Time, 'dd. MMM yyHH:mm:ss', $null)}}
# Filter the data, drop the DateTime column, and write the output to a new tab-delimited file
$LogData | Where-Object { ($lowerLimit -lt $_.DateTime) -and ($_.DateTime -lt $upperLimit) } |
Select-Object -Property * -ExcludeProperty DateTime |
Export-Csv $OutputFile -Delimiter "`t" -NoTypeInformation
The primary drawback here is that in Windows PowerShell (v5.1 and below) the exported fields are always quoted. On PowerShell 7 and higher you can use -UseQuotes Never to prevent the fields from being wrapped in double quotes, if that's important.
The only other drawback is that if these log files are huge, it will take a long time to import and process them. You may be able to improve performance by running everything as a single pipeline, like so:
Import-Csv $Log -Delimiter "`t" |
Select-Object -Property *, @{n='DateTime';e={[datetime]::ParseExact($_.Date + $_.Time, 'dd. MMM yyHH:mm:ss', $null)}} |
Where-Object { ($lowerLimit -lt $_.DateTime) -and ($_.DateTime -lt $upperLimit) } |
Select-Object -Property * -ExcludeProperty DateTime |
Export-Csv $OutputFile -Delimiter "`t" -NoTypeInformation
But if the log files are extremely large then you may run into unavoidable performance problems.
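To try the filter without the real files, here is a self-contained sketch. The Date and Time column names and the 'dd. MMM yyHH:mm:ss' format come from the answer above; the sample rows and the English month abbreviations are assumptions (the real format is only visible in the question's screenshots), so InvariantCulture is passed to ParseExact instead of $null, which would use the current culture's month names.

```powershell
# Hypothetical tab-delimited log rows with a header line
$rows = @(
    "Date`tTime`tMessage"
    "18. Mar 21`t23:59:59`ttoo early"
    "19. Mar 21`t08:00:00`tkept"
    "25. Mar 21`t17:30:00`tkept"
    "27. Mar 21`t00:00:01`ttoo late"
)

$lowerLimit = [datetime]'2021-03-19'
$upperLimit = [datetime]'2021-03-27'   # exclusive, so all of 26 March is included
$culture    = [Globalization.CultureInfo]::InvariantCulture

$kept = $rows | ConvertFrom-Csv -Delimiter "`t" |
    # add a parsed DateTime column built from the Date and Time columns
    Select-Object -Property *, @{n='DateTime';e={[datetime]::ParseExact($_.Date + $_.Time, 'dd. MMM yyHH:mm:ss', $culture)}} |
    Where-Object { ($lowerLimit -lt $_.DateTime) -and ($_.DateTime -lt $upperLimit) } |
    # -Property * is needed for -ExcludeProperty to take effect in Windows PowerShell 5.1
    Select-Object -Property * -ExcludeProperty DateTime

$kept | ConvertTo-Csv -Delimiter "`t" -NoTypeInformation
```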
It's a shame your example of a line in the log file does not reveal the exact date format.
2015-05-09 could be yyyy-MM-dd or yyyy-dd-MM, so I'm guessing it's yyyy-MM-dd in the code below.
# this is the UNC path where the log file is to be found
# you need permissions of course to read that file from the remote computer
$remotePath = '\\c06789\C$\Sut\Stat\03-2021.log' # or use the computer's IP address instead of its name
$localPath = 'C:\logFiles\Week11_LogFile.log' # the output file
# set the start date for the week you are interested in
$startDate = Get-Date -Year 2021 -Month 3 -Day 19
# build an array of formatted dates for an entire week
$dates = for ($i = 0; $i -lt 7; $i++) { '{0:yyyy-MM-dd}' -f $startDate.AddDays($i) }
# create a regex string from that using an anchor '^' and the dates joined with regex OR '|'
$regex = '^({0})' -f ($dates -join '|')
# read the log file and select all lines starting with any of the dates in the regex
((Get-Content -Path $remotePath) | Select-String -Pattern $regex).Line | Set-Content -Path $localPath
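A self-contained sketch of the same date-prefix filter, run on a few hypothetical in-memory lines instead of the remote file, so the regex can be checked before pointing it at \\c06789. It reuses the answer's $startDate, $dates and $regex construction unchanged.

```powershell
# Hypothetical log lines standing in for Get-Content -Path $remotePath
$lines = @(
    '2021-03-18T23:59:00 before the week'
    '2021-03-19T06:39:34 first day'
    '2021-03-25T12:00:00 last day'
    '2021-03-26T00:00:01 after the week'
)

$startDate = Get-Date -Year 2021 -Month 3 -Day 19
# 7 dates: 2021-03-19 through 2021-03-25
$dates = for ($i = 0; $i -lt 7; $i++) { '{0:yyyy-MM-dd}' -f $startDate.AddDays($i) }
# e.g. ^(2021-03-19|2021-03-20|...|2021-03-25)
$regex = '^({0})' -f ($dates -join '|')

$selected = ($lines | Select-String -Pattern $regex).Line
$selected
```

Note that a 7-iteration loop covers 19 to 25 March. The question asks for 19 to 26 March, which is 8 calendar dates, so changing the condition to $i -le 7 would include the 26th as well.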

Powershell script to return search results from a list of keywords

I have a project called 'SFO104' and a list of serial numbers, e.g. 5011849, 5011850, etc. I have to search for 500+ serial numbers to see if they exist in any other documents not relating to the project name SFO104 or the PO number 114786.
I was thinking of outputting the search results to a CSV for each serial number searched, but the code below isn't working.
$searchWords = gc C:\Users\david.craven\Documents\list.txt
$results = @()
Foreach ($sw in $searchWords)
{
$files = gci -path C:\Users\david.craven\Dropbox\ -filter "*$sw*" -recurse | select FullName
foreach ($file in $files)
{
$object = New-Object System.Object
$object | Add-Member -Type NoteProperty -Name SearchWord -Value $sw
$object | Add-Member -Type NoteProperty -Name FoundFile -Value $file
$results += $object
}
}
$results | Export-Csv C:\Users\david.craven\Documents\results.csv -NoTypeInformation
The image below shows my search of the serial number 5011849 and the results returned correspond to project SFO104 which is as expected.
Your code works; the file is getting populated. However, what you have specified does not have the headers defined as in your screenshot. Also, what does that list.txt look like? My searchlist.txt is a single-column file:
Hello
client
Using your code as is, changing only the file path and name and slightly modifying how the file name is accessed, gives these results...
$searchWords = gc 'D:\Scripts\searchlist.txt'
$results = @()
Foreach ($sw in $searchWords)
{
$files = gci -path d:\temp -filter "*$sw*" -recurse
foreach ($file in $files)
{
$object = New-Object System.Object
$object | Add-Member -Type NoteProperty -Name SearchWord -Value $sw
$object | Add-Member -Type NoteProperty -Name FoundFile -Value $file.FullName
$results += $object
}
}
$results | Export-Csv d:\temp\searchresults.csv -NoTypeInformation
# Results
# psEdit -filenames 'd:\temp\searchresults.csv'
SearchWord FoundFile
---------- ---------
Hello D:\temp\Duplicates\PowerShellOutput.txt
Hello D:\temp\Duplicates\BeforeRename1\PowerShellOutput.txt
Hello D:\temp\Duplicates\PoSH\PowerShellOutput.txt
Hello D:\temp\Duplicates\Text\PowerShellOutput.txt
client D:\temp\Client.txt
client D:\temp\Duplicates\CertLabClients_v1.ps1
client D:\temp\Duplicates\Check Logon Server for Client.ps1
client D:\temp\Duplicates\Create Wireless Hosted Networks in Windows Clients.ps1
...
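The same file-name search can be written without Add-Member and without growing an array with +=, by emitting [pscustomobject] rows straight from the loop. A minimal sketch that keeps the SearchWord and FoundFile columns; the function name Find-FileByName is hypothetical. Like the code above, -Filter matches on file names, not file contents.

```powershell
function Find-FileByName {
    param(
        [string[]]$SearchWords,  # e.g. the lines of searchlist.txt
        [string]$Root            # folder tree to search
    )
    foreach ($sw in $SearchWords) {
        # -Filter matches the file *name*; -File skips directories
        Get-ChildItem -Path $Root -Filter "*$sw*" -Recurse -File |
            ForEach-Object {
                [pscustomobject]@{
                    SearchWord = $sw
                    FoundFile  = $_.FullName
                }
            }
    }
}

# Usage with the paths from the answer above:
# Find-FileByName -SearchWords (Get-Content 'D:\Scripts\searchlist.txt') -Root 'D:\temp' |
#     Export-Csv 'D:\temp\searchresults.csv' -NoTypeInformation
```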
Update for OP
Since you are using a comma-separated list, you need to break it into separate items. I changed my file to this:
Hello,client
You cannot match on that layout unless you are trying to match the whole string. So, if I split the above this way...
$searchWords = (gc 'D:\Scripts\searchlist.txt') -split ','
... the results are the same as shown before.
Update for the OP
Example, test with this (a different rough approach)...
Foreach ($sw in $searchWords)
{
Get-Childitem -Path "d:\temp" -Recurse -include "*.txt","*.csv" |
Select-String -Pattern "$sw" |
Select Path,LineNumber,@{n='SearchWord';e={$sw}}
}
LineNumber was only added to show where the string was located. Also note that your code, and what I provide here, will only work for text files such as .txt and .csv.
If you plan to search .doc, .docx, .xls or .xlsx files, that means a lot more code, because you have to use the native applications (Word, Excel) to open and read these files.
This means using the COM Object model for each of those file types in your code. As discussed and shown here:
How do I make powershell search a Word document for wildcards and return the word it found?
You'd need to do a similar thing for Excel or PowerPoint, and PDF files require an add-on.
Update for OP
Like I said, I put this together quickly, so it is a bit rough (no error handling, etc.), but I did test it using my input file and target folder tree, and it does work.
# This is what my input looks like
Hello,client
595959, 464646
LIC
Running the code should have given you the results below, using only .txt and .csv files. Using any other file type will error by design: as noted above, you cannot use this approach for non-text files without using the native application for that file type.
$searchWords = ((gc 'D:\Scripts\searchlist.txt') -split ',').Trim()
Foreach ($sw in $searchWords)
{
Get-Childitem -Path "d:\temp" -Recurse -include "*.txt","*.csv" |
Select-String -Pattern "$sw" |
Select Path,LineNumber,@{n='SearchWord';e={$sw}}
}
Path LineNumber SearchWord
---- ---------- ----------
D:\temp\Duplicates\BeforeRename1\PsGet.txt 157 Hello
...
D:\temp\Duplicates\PoSH\PsGet.txt 157 Hello
...
D:\temp\Duplicates\BeforeRename1\PoSH-Get-Mo... 108 client
D:\temp\Duplicates\BeforeRename1\Powershell ... 12 client
D:\temp\Duplicates\BeforeRename1\Powershell ... 15 client
D:\temp\Duplicates\BeforeRename1\PsGet.txt 454 client
...
D:\temp\newfile.txt 4 client
D:\temp\MyFile.txt 5 595959
D:\temp\ProcessNames.csv 4 595959
D:\temp\Duplicates\Text\JSON-CSS.txt 30 464646
D:\temp\Duplicates\JSON-CSS.txt 30 464646
D:\temp\MyFile.txt 5 464646
D:\temp\ProcessNames.csv 4 464646
D:\temp\Duplicates\BeforeRename1\GetSetScree... 7 LIC
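Two refinements to the content search above may help with 500+ serial numbers. Select-String -Pattern treats each search word as a regular expression, so a word containing characters such as . or ( is interpreted as regex syntax; -SimpleMatch makes it a literal search. Enumerating the folder tree once, instead of once per word, also avoids repeated directory walks. A sketch under those assumptions; the function name Find-WordInFile is hypothetical, and it still only works for text files.

```powershell
function Find-WordInFile {
    param(
        [string[]]$SearchWords,
        [string]$Root,
        [string[]]$Include = @('*.txt', '*.csv')
    )
    # Walk the folder tree once and reuse the list for every search word
    $files = @(Get-ChildItem -Path $Root -Recurse -File -Include $Include)
    if ($files.Count -eq 0) { return }

    foreach ($sw in $SearchWords) {
        # -SimpleMatch: treat $sw as literal text, not a regex
        $files | Select-String -Pattern $sw -SimpleMatch |
            Select-Object Path, LineNumber, @{n='SearchWord'; e={$sw}}
    }
}

# Usage (paths from the answer above):
# $words = ((Get-Content 'D:\Scripts\searchlist.txt') -split ',').Trim()
# Find-WordInFile -SearchWords $words -Root 'D:\temp' |
#     Export-Csv 'D:\temp\searchresults.csv' -NoTypeInformation
```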

I use -NoTypeInformation so why do I get header back when using Out-File?

I filtered this file, data1.csv, by date:
2017.11.1,09:55,1.1,1.2,1.3,1.4,1
2017.11.2,09:55,1.5,1.6,1.7,1.8,2
I don't get a header with -NoTypeInformation:
$CutOff = (Get-Date).AddDays(-2)
$filePath = "data1.csv"
$Data = Import-Csv $filePath -Header Date,Time,A,B,C,D,E
$Data2 = $Data | Where-Object {$_.Date -as [datetime] -gt $Cutoff} | convertto-csv -NoTypeInformation -Delimiter "," | % {$_ -replace '"',''}
But when rewriting with Out-File
$Data2 | Out-File "data2.csv" -Encoding utf8 -Force
I get the header back, as data2.csv contains:
Date,Time,A,B,C,D,E
2017.11.2,09:55,1.5,1.6,1.7,1.8,2
Why do I have Date,Time,A,B,C,D,E ?
-NoTypeInformation is not about the column header but about the type information line describing the objects in the file. Remove it to see what shows up. From Microsoft:
Omits the type information header from the output. By default, the string in the output contains #TYPE followed by the fully-qualified name of the object type.
Emphasis mine.
CSVs need headers. That is why it is making one. If you don't want to see the header in the output use Select-Object -Skip 1 to remove it.
$Data |
Where-Object {$_.Date -as [datetime] -gt $Cutoff} |
ConvertTo-CSV -NoTypeInformation -Delimiter "," |
Select-Object -Skip 1 |
% {$_ -replace '"'}
I would not pipe Out-File to itself. You could pipe to Set-Content here just as well.
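A self-contained check of the behaviour described above, using the two sample rows from the question: ConvertTo-Csv writes the column-header line whether or not -NoTypeInformation is given, and Select-Object -Skip 1 is what removes it.

```powershell
$rows = @(
    '2017.11.1,09:55,1.1,1.2,1.3,1.4,1'
    '2017.11.2,09:55,1.5,1.6,1.7,1.8,2'
)
$Data = $rows | ConvertFrom-Csv -Header Date,Time,A,B,C,D,E

# The first line is always the column header;
# -NoTypeInformation only suppresses the separate #TYPE line in Windows PowerShell 5.1
$withHeader = $Data | ConvertTo-Csv -NoTypeInformation
$withHeader[0]

# Dropping the first line and the quotes leaves only the data rows
$noHeader = $withHeader | Select-Object -Skip 1 | ForEach-Object { $_ -replace '"' }
$noHeader
```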
I am guessing this whole process is to keep the source file in the same state just with some lines filtered out based on date. You could skip most of this just by parsing the date out in each line.
$threshold = (Get-Date).AddDays(-2)
$filePath = "c:\temp\bagel.txt"
(Get-Content $filePath) | Where-Object{
$date,$null=$_.Split(",",2)
[datetime]$date -gt $threshold
} | Set-Content $filePath
Now you don't have to worry about PowerShell CSV object structure or output since we act on the raw data of the file itself.
That will take each line of the input file and filter it out if the parsed date is not later than the threshold. Change the encoding on the input and output cmdlets as you see necessary. What $date,$null=$_.Split(",",2) does is split the line on the first comma into 2 parts. The first part becomes $date, and since this is just a filtering condition, the rest of the line is dumped into $null.
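A self-contained version of that raw-line filter, run on the two sample lines from the question. It assumes the dates are always written year.month.day, so it parses them with ParseExact and the format 'yyyy.M.d' rather than relying on the [datetime] cast; the fixed threshold replaces (Get-Date).AddDays(-2) only so the result is repeatable.

```powershell
$lines = @(
    '2017.11.1,09:55,1.1,1.2,1.3,1.4,1'
    '2017.11.2,09:55,1.5,1.6,1.7,1.8,2'
)
$threshold = [datetime]'2017-11-01T12:00:00'   # fixed for the example
$culture   = [Globalization.CultureInfo]::InvariantCulture

$kept = $lines | Where-Object {
    # Split on the first comma only: the date, and the rest of the line (discarded)
    $date, $null = $_.Split(',', 2)
    [datetime]::ParseExact($date, 'yyyy.M.d', $culture) -gt $threshold
}
$kept
```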
Properly-formed CSV files must have column headers. Your use of -NoTypeInformation in generating the CSV does not affect column headers; instead, it affects whether the PowerShell object type information is included. If you Export-CSV without -NoTypeInformation, the first line of your CSV file will have a line that looks like #TYPE System.PSCustomObject, which you don't want if you're going to open the CSV in a spreadsheet program.
If you subsequently Import-CSV, the headers (Date, Time, A, B, C) are used to create the fields of a PSObject, so that you can refer to them using the standard dot notation (e.g., $CSV[$line].Date).
The ability to specify -Header on Import-CSV is essentially a "hack" to allow the cmdlet to handle files that are comma-separated, but which did not include column headers.

How can I shift column values and add new ones in a CSV

I have to create a new column in my CSV data with PowerShell.
Here is my code:
$csv = Import-Csv .\test1.csv -Delimiter ';'
$NewCSVObject = @()
foreach ($item in $csv)
{
$NewCSVObject += $item | Add-Member -name "ref" -value " " -MemberType NoteProperty
}
$NewCSVObject | export-csv -Path ".\test2.csv" -NoType
$csv | Export-CSV -Path ".\test2.csv" -NoTypeInformation -Delimiter ";" -Append
When I open the file, the column is there but on the right, and I would like it on the left, like column A. I also don't know if I can export the two objects in one line like this (it doesn't work):
$csv,$NewCSVObject | Export-CSV -Path ".\test2.csv" -NoTypeInformation -Delimiter ";" -Append
The input file (It would have more lines than just the one):
A B C D E F G H
T-89 T-75 T-22 Y-23 Y-7 Y-71
The current output file:
A B C D E F G H
Y-23 Y-7 Y-71 ref: ref2:
The expected result in the Excel table displays "ref:" and "ref2:" before the product columns:
A B C D E F G H
ref: T-89 T-75 T-22 ref2: Y-23 Y-7 Y-71
This might be simpler if we just treat the file as a flat text file and save it in CSV format. You could use the CSV objects and shift the values into other columns, but that is not really necessary. Your approach of adding columns via Add-Member does not accomplish this goal, because it adds new columns and would not match your desired output. Export-CSV also expects all objects written to one file to have the same properties; you were mixing objects with different properties, which gave your unexpected results.
This is a verbose way of doing this. You could shorten this easily with something like regular expressions (see below). I opted for this method since it is a little easier to follow what is going on.
# Equivalent to Get-Content $filepath. This just shows what I am doing and is a portable solution.
$fileContents = "A;B;C;D;E;F;G;H",
"T-89;T-75;T-22;Y-23;Y-7;Y-71",
"T-89;T-75;T-22;Y-23;Y-7;Y-71"
$newFile = "C:\temp\csv.csv"
# Write the header to the output file.
$fileContents[0] | Set-Content $newFile
# Process the rest of the lines.
$fileContents | Select-Object -Skip 1 | ForEach-Object{
# Split the line into its elements
$splitLine = $_ -split ";"
# Rejoin the elements. adding the ref strings
(@("ref:") + $splitLine[0..2] + "ref2:" + $splitLine[3..5]) -join ";"
} | Add-Content $newFile
What the last line inside the loop is doing is concatenating an array: it starts with "ref:", adds the first 3 elements of the split line, then "ref2:" and the remaining elements. That new array is joined on semicolons and sent down the pipe to be written to the file.
If you are willing to give regex a shot this could be done with less code.
$fileContents = Get-Content "C:\source\file\path.csv"
$newFile = "C:\new\file\path.csv"
$fileContents[0] | Set-Content $newFile
($fileContents | Select-Object -Skip 1) -replace "((?:.*?;){3})(.*)",'ref:;$1ref2:;$2' | Add-Content $newFile
What that does is split each line after the first at its 3rd semicolon. The replacement string is built from the ref strings and the matched content.
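A quick way to confirm that the array approach and the regex approach produce the same line, using one sample row from the answer. The replacement string is in single quotes so that PowerShell passes $1 and $2 through to the regex engine instead of expanding them as variables.

```powershell
$line = 'T-89;T-75;T-22;Y-23;Y-7;Y-71'

# Array approach: split, then rebuild with the ref markers in front of each group of three
$splitLine = $line -split ';'
$viaArray  = (@('ref:') + $splitLine[0..2] + 'ref2:' + $splitLine[3..5]) -join ';'

# Regex approach: $1 = first three fields (with their semicolons), $2 = the rest
$viaRegex  = $line -replace '((?:.*?;){3})(.*)', 'ref:;$1ref2:;$2'

$viaArray
$viaRegex
$viaArray -eq $viaRegex
```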
You can use Select-Object to specify order.
Assuming your headers are A-H (I know from the code that instead of A it should be ref, but I'm not sure whether T-89 etc. are your other headers):
$NewCSVObject | Select-Object A,B,C,D,E,F,G,H | Export-Csv -Path ".\test2.csv" -NoType
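Select-Object emits properties in the order they are listed, and Export-Csv writes columns in that order. A sketch that puts a new ref column first by listing a calculated property before *; the column name ref and the value "ref:" come from the question, while the three-column sample row is made up.

```powershell
$csv = 'A;B;C', 'T-89;T-75;T-22' | ConvertFrom-Csv -Delimiter ';'

# The calculated property is listed first, so ref becomes the first column
$reordered = $csv | Select-Object -Property @{n='ref'; e={'ref:'}}, *

$out = $reordered | ConvertTo-Csv -Delimiter ';' -NoTypeInformation
$out
```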

Using powershell, how do I extract a 7-digit number from a subject-line (of an email ), regular expressions?

I have the following code which lists the first 5 items in the Inbox folder (of Outlook).
How would I extract only the number portion of it (say, arbitrary 7-digit numbers embedded within other text)? Then, using PowerShell commands, I'd like to take those extracted numbers and dump them to a CSV file (so they can easily be incorporated into an existing spreadsheet I use).
Here's what I tried:
$outlook = new-object -com Outlook.Application
$sentMail = $outlook.Session.GetDefaultFolder(6) # == olFolderInbox
$sentMail.Items | select -last 10 TaskSubject # ideally, grabbing first 20
$matches2 = "\d+$"
$res = gc $sentMail.Items | ?{$_ -match $matches2 | %{ $_ -match $matches2 | out-null; $matches[1] }
but this does not run correctly; instead it leaves me at the continuation prompt, waiting for more input, like so:
>>
>>
>>
Do I need to perhaps create a separate variable in between the 1st part and 2nd part?
Not sure what the $matches variable is for but try to replace your last line with something like below.
For Subject Line Items:
$sentMail.Items | % { $_.TaskSubject | Select-String -Pattern '^\d{3}-\d{3}-\d{4}' | % {([string]$_).Substring(0,12)} }
For Message Body Items:
$sentMail.Items | % { ($_.Body).Split("`n") | Select-String -Pattern '^\d{3}-\d{3}-\d{4}' |% {([string]$_).Substring(0,12)} }
Here is a reference to Select-String, which I use pretty often.
https://technet.microsoft.com/library/hh849903.aspx
Here is a reference to the Phone number portion which I have never used but found pretty cool.
http://blogs.technet.com/b/heyscriptingguy/archive/2011/03/24/use-powershell-to-search-a-group-of-files-for-phone-numbers.aspx
Good luck!
Here is an edited version for 7-digit extraction from the subject line. This assumes the number has a space on each side, but it can be modified if necessary. You may also want to change how many items are scanned: raise the -First 100 value, or remove -First to scan every item.
$outlook = New-Object -com Outlook.Application
$Mail = $outlook.Session.GetDefaultFolder(6) # Folder Inbox
$Mail.Items | select -First 100 TaskSubject |
% { $_.TaskSubject | Select-String -Pattern '\s\d{7}\s'} |
% {((Select-String -InputObject $_ -Pattern '\s\d{7}\s').Line).split(" ") |
% {if(($_.Length -eq 7) -and ($_ -match '\d{7}')) {$_ | Out-File -FilePath "C:\Temp\SomeFile.csv" -Append}}}
Some of this you have already addressed / figured out but I wanted to explain the issues with your current code.
If you expect multiple matches and want to return them, you need to use Select-String with the -AllMatches parameter. The regex in your example is currently looking for a sequence of digits at the end of the subject. That would only return one match, so let's look at the issues with your code.
$sentMail.Items | select -last 10 TaskSubject
You are filtering the last 10 items but you are not storing those for later use so they would merely be displayed on screen. We cover a solution later.
One of the primary reasons for using -match is to get the Boolean value that is returned for code like if blocks and where clauses. You can still use it in the way you intended. Looking at the current code in question:
$res = gc $sentMail.Items | ?{$_ -match $matches2 | %{ $_ -match $matches2 | out-null; $matches[1] }
The two big issues with this are, first, that you are calling Get-Content (gc) on the items. Get-Content is for reading file data, which $sentMail.Items is not. Second, you have a malformed Where block. A Where block passes data to the output stream based on a true or false condition. Your statement ?{$_ -match $matches2 | %{ $_ -match $matches2 | out-null; $matches[1] } won't do this... at least not well.
$outlook = new-object -com Outlook.Application
$sentMail = $outlook.Session.GetDefaultFolder(6) # == olFolderInbox
$matches2 = "\d+$"
$sentMail.Items | select -last 10 -ExpandProperty TaskSubject | ?{$_ -match $matches2} | %{$Matches[0]}
Take the last 10 email subjects and check whether any of them match the regex string $matches2. If they do, return the matched string to standard output.
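For the original goal (every 7-digit number anywhere in the subject, written to a CSV), -AllMatches can be combined with lookarounds so that only standalone 7-digit runs are returned, without assuming spaces around the number. A sketch on hypothetical subject strings; in practice they would come from the Outlook items as in the answers above, e.g. $sentMail.Items | Select-Object -Last 10 -ExpandProperty TaskSubject. The Number column name is an assumption.

```powershell
# Hypothetical subjects standing in for the Outlook items' subject text
$subjects = @(
    'Order 1234567 shipped'
    'Re: 7654321 and 1111111'
    'Invoice 12345678 (8 digits, ignored)'
    'No number here'
)

$numbers = $subjects |
    # (?<!\d) and (?!\d): exactly 7 digits, not part of a longer number
    Select-String -Pattern '(?<!\d)\d{7}(?!\d)' -AllMatches |
    ForEach-Object { $_.Matches } |
    ForEach-Object { [pscustomobject]@{ Number = $_.Value } }

$numbers
# $numbers | Export-Csv 'C:\Temp\SomeFile.csv' -NoTypeInformation
```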