Parse .rtf file, output email addresses in .csv file?
I have an .rtf file containing a bunch of email addresses, I need this parsed so that I can compare a .csv file to active users in Active Directory.
Basically I want what is to the left of "#my.domain.com"
$finds = Select-String -Path "path\to\my.rtf" -Pattern "#my.domain.com" | ForEach-Object {$_.Matches}
$finds | Select-Object -First 1 | ft *
This of course gives me one result so that I don't have alot of output.
I only manage to get matches or the complete line.
I've tried adding something along the line of
$finds = Select-String -Path "path\to\my.rtf" -Pattern "\w.#my.domain.com"
This gives me the very two last letters in the addresses.
If I keep adding dots to the "wildcard"
-Pattern "\w.....#my.domain.com"
I also get a ton of numbers/characters (.rtf formatting) for addresses that contains fewer characters.
How do I do this?
EDIT: I will update the question as soon as I've found a solution. As of now I'm trying with regular expressions.
Example:
-Pattern "\w*?#my.domain.com"
$mPattern = "[a-zA-Z0-9._%+-]+#[a-zA-Z0-9.-]+(\.[a-zA-Z]{2,4})"
$lines = get-content "path\to\your.rtf"
foreach($line in $lines){
([regex]::MAtch($rtfInput, $mpattern, "IgnoreCase ")).value }
This code worked for me. My inital code but with a new search pattern.
$finds = Select-String -Path "path\to\my.rtf" -Pattern "[a-zA-Z0-9._%+-]+#[a-zA-Z0-9.-]+(\.[a-zA-Z]{2,4})" | ForEach-Object {$_.Matches}
$finds | Select-Object -First 10 | ft *
Thanks!
Related
Hello and thank you for your time. Here is what I am looking to do. I have several log files that I need to search through. I do this by using Get-ChildItem -Path C:\mylogfiles\*.log | Select-String -Pattern 'MyTextHere' However, now I want to complicate my life and only select text that is between single quotes in the log files.
Here is a sample of my log file:
This is some sample text in my log file. It has a lot of garbage that I don't want to see. However, it has text that I want to find, and if found I would like it to save just the selected text to a CSV file. I want to copy everything that is between single quotes. Here comes the text 'Please copy this text that is between the single quotes'
Any idea how I would go about doing this?
The following combines Select-String with ForEach-Object to extract only the phrases of interest (the parts of the line that matched the regex), wraps them in a [pscustomobject] instance with a .Phrase property and exports the results with Export-Csv:
Select-String -Path C:\mylogfiles\*.log -AllMatches -Pattern "'.*?'" |
ForEach-Object {
foreach ($phrase in $_.Matches.Value) {
[pscustomobject] #{ Phrase = $phrase.Trim("'") }
}
} |
Export-Csv -NoTypeInformation -Encoding utf8 result.csv
Note: If there can only ever be at most one phrase of interest per line, you can omit the -AllMatches switch and replace the ForEach-Object call with the following Select-String call, which uses a calculated property:
# ... |
Select-Object -Property #{ Name='Phrase'; Expression={ $_.Matches.Value.Trim("'") } } |
# ...
I've created the following small script to remove 2++ strings from a CSV.
Each row is a log of a given person and a answer they give.
The CSV has X columns.
The column named FIRST identifies the person.
What I need to do is when I delete a row matching the answer, I also need to delete the person from the whole CSV if it had one of the two strings.
What I've made so far, removes the row of people having the answers but the person is still left in the overall CSV with other answers. I want to remove the person fully if the questions have been answered.
Can somebody help me out with making the addition or changes to make this happen?
INPUT File
FIRST,LAST,ADDR,ADDR2,GENDER,HOME,WORK
1,N/A,N/A,N/A,N/A,BAF,N/A
10005,JAS,AA,N/A,,ZAV,N/A
10007,JADE,BB,N/A,OMA,N/A,N/A
10007,JADE,N/A,RAV,N/A,N/A,N/A
10011,KIAH,N/A,N/A,BALI,BB,N/A
SCRIPT
$CSVfile = "C:\Temp\Test\Test.csv"
$CSVfile_filtered = "C:\Temp\Test\Test.csv"
$regex001 = "AA"
$regex002 = "BB"
$filterArray = #($regex001,$regex002)
Get-Content $CSVfile | Select-String -pattern $filterArray -notmatch | Set-Content $CSVfile_filtered
The file should then remove 10005, 10011 and both lines of 10007. But my version only removes one of the 10007 since it only matches one of the two patterns.
Using more of PowerShell's built-in cmdlets can make this a little easier to manage.
# Assuming searching only properties ADDR and ADDR2
$filter = 'AA','BB'
# Grouping by First and Last values to easily remove duplicates
# -match uses regex so | is needed for an OR of multiple items
Import-Csv Test.csv | Group-Object First,Last |
Where {!($_.Group.ADDR,$_.Group.ADDR2 -match ($filter -join '|'))} |
Foreach-Object Group |
Export-Csv output.csv -NoType
You would think strictly using text manipulation would be simpler, but it adds other scenarios to consider:
You will need to track users that have duplicate entries and potentially back track to remove them (if not grouping). This could require reading the file contents twice.
Your header row could match the string you want to filter so you will need to add it to the output if filtering removes it.
Keeping the scenarios above in mind, you can still use a grouping concept:
$filter = 'AA','BB'
$file = Get-Content Test.csv
# $file[0] is the header row
# -split string uses regex and splits at the second comma
# -split results' [0] element is First,Last values
$file[0],($file |
Select-Object -Skip 1 |
Group-Object {($_ -split '(?<=^[^,]*,[^,]*),')[0]} |
where {!($_.Group -match ($filter -join '|'))} |
Foreach-Object Group) | Set-Content output.csv
If I got it right you could do something like this:
$SearchPattern = 'AA', 'BB'
$INPUTCSV = #'
FIRST,LAST,ADDR,ADDR2,GENDER,HOME,WORK
1,N/A,N/A,N/A,N/A,BAF,N/A
10005,JAS,AA,N/A,,ZAV,N/A
10007,JADE,BB,N/A,OMA,N/A,N/A
10007,JADE,N/A,RAV,N/A,N/A,N/A
10011,KIAH,N/A,N/A,BALI,BB,N/A
'# | ConvertFrom-Csv
$ActualSearchPattern =
$INPUTCSV |
Where-Object {
$_.LAST -in $SearchPattern -or
$_.ADDR -in $SearchPattern -or
$_.ADDR2 -in $SearchPattern -or
$_.GENDER -in $SearchPattern -or
$_.HOME -in $SearchPattern -or
$_.Work -in $SearchPattern
} |
Select-Object -ExpandProperty FIRST
$INPUTCSV |
Where-Object -Property FIRST -NotIn -Value $ActualSearchPattern |
Format-Table -AutoSize
There might be more sophisticated or more elegant ways but I cannot think about one at the moment. ;-)
There is a nice PowerShell module you can use to manipulate the content of a csv or xlsx file: ImportExcel
This give you a lot of options to manipulate the sheets, columns etc.
I'm trying to get the output of two separate files although I'm stuck on the wild card or contains select-string search from file A (Names) in file B (name-rank).
The contents of file A is:
adam
george
william
assa
kate
mark
The contents of file B is:
12-march-2020,Mark-1
12-march-2020,Mark-2
12-march-2020,Mark-3
12-march-2020,william-4
12-march-2020,william-2
12-march-2020,william-7
12-march-2020,kate-54
12-march-2020,kate-12
12-march-2020,kate-44
And I need to match on every occurrence of the names after the '-' so my ordered output should look like this which is a combination of both files as the output:
mark
Mark-1
Mark-2
Mark-3
william
william-2
william-4
william-7
Kate
kate-12
kate-44
kate-54
So far I only have the following and I'd be grateful for any pointers or assistance please.
import-csv (c:\temp\names.csv) |
select-string -simplematch (import-csv c:\temp\names-rank.csv -header "Date", "RankedName" | select RankedName) |
set-content c:\temp\names-and-ranks.csv
I imagine the select-string isn't going to be enough and I need to write a loop instead.
The data you give in the example does not give you much to work with, and the desired output is not that intuitive, most of the time with Powershell you would like to combine the data in to a much richer output at the end.
But anyway, with what is given here and what you want, the code bellow will get what you need, I have left comments in the code for you
$pathDir='C:\Users\myUser\Downloads\trash'
$names="$pathDir\names.csv"
$namesRank="$pathDir\names-rank.csv"
$nameImport = Import-Csv -Path $names -Header names
$nameRankImport= Import-Csv -Path $namesRank -Header date,rankName
#create an empty array to collect the result
$list=#()
foreach($name in $nameImport){
#get all the match names
$match=$nameRankImport.RankName -like "$($name.names)*"
#add the name from the First list
$list+=($name.names)
#if there are any matches, add them too
if($match){
$list+=$match
}
}
#Because its a one column string, Export-CSV will now show us what we want
$list | Set-Content -Path "$pathDir\names-and-ranks.csv" -Force
For this I would use a combination of Group-Object and Where-Object to first group all "RankedName" items by the name before the dash, then filter on those names to be part of the names we got from the 'names.csv' file and output the properties you need.
# read the names from the file as string array
$names = Get-Content -Path 'c:\temp\names.csv' # just a list of names, so really not a CSV
# import the CSV file and loop through
Import-Csv -Path 'c:\temp\names-rank.csv' -Header "Date", "RankedName" |
Group-Object { ($_.RankedName -split '-')[0] } | # group on the name before the dash in the 'RankedName' property
Where-Object { $_.Name -in $names } | # use only the groups that have a name that can be found in the $names array
ForEach-Object {
$_.Name # output the group name (which is one of the $names)
$_.Group.RankedName -join [environment]::NewLine # output the group's 'RankedName' property joined with a newline
} |
Set-Content -Path 'c:\temp\names-and-ranks.csv'
Output:
Mark
Mark-1
Mark-2
Mark-3
william
william-4
william-2
william-7
kate
kate-54
kate-12
kate-44
I have a csv file with two columns and multiple rows, which has the information of files with folder location and its corresponding size, like below
"Folder_Path","Size"
"C:\MSSQL\DATA\UsersData\FTP.txt","21345"
"C:\MSSQL\DATA\UsersData\Norman\abc.csv","78956"
"C:\MSSQL\DATA\UsersData\Market_Database\123.bak","1234456"
What i want do is remove the "C:\MSSQL\DATA\" part from every row in the csv and keep the rest of the folder path after starting from UsersData and all other data intact as this info is repetitive. So my csv should like this below.
"Folder_Path","Size"
"UsersData\FTP.txt","21345"
"UsersData\Norman\abc.csv","78956"
"UsersData\Market_Database\123.bak","1234456"
What i am running is as below
Import-Csv ".\abc.csv" |
Select-Object -Property #{n='Folder_Path';e={$_.'Folder_Path'.Split('C:\MSSQL\DATA\*')[0]}}, * |
Export-Csv '.\output.csv' -NTI
Any help is appreciated!
Seems like a job for a simple string replace:
Get-Content "abc.csv" | foreach { $_.replace("C:\MSSQL\DATA\", "") | Set-Content "output.csv"
or:
[System.IO.File]::WriteAllText("output.csv", [System.IO.File]::ReadAllText("abc.csv" ).Replace("C:\MSSQL\DATA\", ""))
This should work:
Import-Csv ".\abc.csv" |
Select-Object -Property #{n='Folder_Path';e={$_.'Folder_Path' -replace '^.*\\(.*\\.*)$', '$1'}}, Size |
Export-Csv '.\output.csv' -NoTypeInformation
I have a directory on a server called 'servername'. In that directory, I have subdirectories whose name is a date. In those date directories, I have about 150 .csv file audit logs.
I have a partially working script that starts from inside the date directory, enumerates and loops through the .csv's and searches for a string in a column. Im trying to get it to export the row for each match then go on to the next file.
$files = Get-ChildItem '\\servername\volume\dir1\audit\serverbeingaudited\20180525'
ForEach ($file in $files) {
$Result = If (import-csv $file.FullName | Where {$_.'path/from' -like "*01May18.xlsx*"})
{
$result | Export-CSV -Path c:\temp\output.csv -Append}
}
What I am doing is searching the 'path\from' column for a string - like a file name. The column contains data that is always some form of \folder\folder\folder\filename.xls. I am searching for a specific filename and for all instances of that file name in that column in that file.
My issue is getting that row exported - export.csv is always empty. Id also like to start a directory 'up' and go through each date directory, parse, export, then go on to the next directory and files.
If I break it down to just one file and get it out of the IF it seems to give me a result so I think im getting something wrong in the IF or For-each but apparently thats above my paygrade - cant figure it out....
Thanks in advance for any assistance,
RichardX
The issue is your If block, when you say $Result = If () {$Result | ...} you are saying that the new $Result is equal what's returned from the if statement. Since $Result hasn't been defined yet, this is $Result = If () {$null | ...} which is why you are getting a blank line.
The If block isn't even needed. you filter your csv with Where-Object already, just keep passing those objects down the pipeline to the export.
Since it sounds like you are just running this against all the child folders of the parent, sounds like you could just use the -Recurse parameter of Get-ChildItem
Get-ChildItem '\\servername\volume\dir1\audit\serverbeingaudited\' -Recurse |
ForEach-Object {
Import-csv $_.FullName |
Where-Object {$_.'path/from' -like "*01May18.xlsx*"}
} | Export-CSV -Path c:\temp\output.csv
(I used a ForEach-Object loop rather than foreach just demonstrate objects being passed down the pipeline in another way)
Edit: Removed append per Bill_Stewart's suggestion. Will write out all entries for the the recursed folders in the run. Will overwrite on next run.
I don't see a need for appending the CSV file? How about:
Get-ChildItem '\\servername\volume\dir1\audit\serverbeingaudited\20180525' | ForEach-Object {
Import-Csv $_.FullName | Where-Object { $_.'path/from' -like '*01May18.xlsx*' }
} | Export-Csv 'C:\Temp\Output.csv' -NoTypeInformation
Assuming your CSVs are in the same format and that your search text is not likely to be present in any other columns you could use a Select-String instead of Import-Csv. So instead of converting string to object and back to string again, you can just process as strings. You would need to add an additional line to fake the header row, something like this:
$files = Get-ChildItem '\\servername\volume\dir1\audit\serverbeingaudited\20180525'
$result = #()
$result += Get-Content $files[0] -TotalCount 1
$result += ($files | Select-String -Pattern '01May18\.xlsx').Line
$result | Out-File 'c:\temp\output.csv'