PowerShell - creating hashtables from large text files and searching

I'm working with a hashtable which I've built using a list of 3.5 million IP addresses stored in CSV format, and I am trying to search through this table using wildcards.
The CSV is MaxMind's list of IPs, which I convert to a hashtable using the following code:
[System.IO.File]::ReadLines("C:\temp\iptest.csv") | ForEach-Object {
    $data = $_.split(',')
    $ht = @{"geoname_id" = "$($data[1])"; "registered_country_geoname_id" = "$($data[2])"}
    $name = $($data[0])
    $mainIPHhash.add($name, $ht)
}
The code just pulls out the CIDR and its corresponding City/Country code.
This works well, and builds the table in a little over two minutes, but the issue I am now facing is searching this hashtable for wild card entries.
If I search for a complete CIDR, the search happens in milliseconds
$mainIPHhash.item("1.0.0.0/24")
Measure-Command reports - TotalSeconds : 0.0001542
But if I need to do a wildcard search, it has to loop through the hashtable looking for keys that match the pattern, which takes a long time!
$testingIP = "1.0.*"
$mainIPHhash.GetEnumerator() | Where-Object { $_.key -like $testingIP }
Measure-Command reports - TotalSeconds : 33.3016279
Is there a better way to search for wildcard entries in hashtables?
Cheers
Edit:
Using a regex search, I can get it down to 19 seconds, but that is still woefully slow:
$findsStr = "^$(($testingIP2).split('.')[0])" + "\." + "$(($testingIP2).split('.')[1])" + "\."
$mainIPHhash.GetEnumerator() | ForEach-Object { if ($_.Key -match $findsStr) { <# Do stuff #> } }
The above takes the first two octets of the IP address, and uses regex to find them in the hashtable.
Days : 0
Hours : 0
Minutes : 0
Seconds : 19
Milliseconds : 733
Ticks : 197339339
TotalDays : 0.000228402012731481
TotalHours : 0.00548164830555556
TotalMinutes : 0.328898898333333
TotalSeconds : 19.7339339
TotalMilliseconds : 19733.9339

You can take the list of keys and apply -like or -match to the whole collection. Either should be faster than a Where-Object clause:
$mainIPHhash.Keys -like '1.0.*'
$mainIPHhash.Keys -match '^1\.0\.'
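For what it's worth, here is a minimal illustration (with made-up keys, not the real data set) of why this works: applied to a collection, -like and -match return the matching elements rather than a Boolean, so no explicit enumeration is needed.
$keys = '1.0.0.0/24', '1.0.1.0/24', '2.3.4.0/24'
$keys -like '1.0.*'      # returns 1.0.0.0/24 and 1.0.1.0/24
$keys -match '^1\.0\.'   # same result via regex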

Another solution may be to use Group-Object:
$contentcsv = Import-Csv "C:\temp\iptest.csv" -Header Name, geoname_id, registered_country_geoname_id | Group-Object Name
$contentcsv | Where-Object Name -like '1.0.*'
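Note that Group-Object wraps the original rows in a Group property, so - as a sketch, assuming the CSV layout from the question - you would reach the underlying records with something like:
($contentcsv | Where-Object Name -like '1.0.*').Group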

Related

Removing SamAccountName from group after N time

I am looking to find a way to remove a user from a group after a specific amount of time.
Via the link below I found that you can list users that were added 10 or more days ago:
https://gallery.technet.microsoft.com/scriptcenter/Find-the-time-a-user-was-a0bfc0cf#content
As an output I get the example below:
ModifiedCount : 2
DomainController : DC3
LastModified : 5/4/2013 6:48:06 PM
Username : joesmith
State : ABSENT
Group : CN=Domain Admins,CN=Users,DC=Domain,DC=Com
I would like to return SamAccountName instead of Username.
I was trying to look at the code and I know it has something to do with the variable $pattern, but I am not good enough at PowerShell to see it at first sight.
Looking at that code, the Username property IS the SamAccountName.
However, if you want to change that label, you can either simply change it on line 106 from
Username = [regex]::Matches($rep.context.postcontext,"CN=(?<Username>.*?),.*") | ForEach {$_.Groups['Username'].Value}
into:
SamAccountName = [regex]::Matches($rep.context.postcontext,"CN=(?<Username>.*?),.*") | ForEach {$_.Groups['Username'].Value}
Or change the label in the objects returned afterwards with a calculated property:
$returnedObjects | Select-Object @{Name = 'SamAccountName'; Expression = {$_.Username}}, * -ExcludeProperty Username

Exporting ranges of rows from large text files

I have about 5 GB of log data that I need to filter down to find matching rows, and then include +/- 75 rows around each match. If the format of the data is important: it is broken XML which is missing some tags.
My code to find the rows with matches:
$ExampleFile = [System.IO.File]::ReadLines("C:\temp\filestomove\ExampleLog.txt")
$AccountNumber = "*123456789*"
$LineCount = 0
$RowsToExport = @()
foreach ($line in $ExampleFile) {
    if ($line -like "*$AccountNumber*") {
        $RowsToExport += $LineCount
    }
    $LineCount += 1
}
The above code does the job fairly quickly; it manages about a MB of log per second, which is a speed I can live with since it's a one-time job.
What I am struggling with, is exporting the matched rows in a way that is not very slow.
My Current code for that looks something like this:
foreach ($row in $RowsToExport) {
    $IndexRangeHigh = [int]$row + 75
    $IndexRangeLow = [int]$row - 75
    $ExampleFile | Select-Object -Index ($IndexRangeLow..$IndexRangeHigh) | Out-File C:\temp\Example.txt -Append
}
That takes a really long time. I have my doubts about using select -Index, as I suspect it is very slow.
Measure-command on above gave me the following result on a 50MB test file:
TotalDays : 0,00354806909375
TotalHours : 0,08515365825
TotalMinutes : 5,109219495
TotalSeconds : 306,5531697
TotalMilliseconds : 306553,1697
While reading the file and matching the rows only took me 55 seconds.
To sum everything up to a question:
How can I export a range of rows from a large variable? Is there another method I can use to select rows from the $ExampleFile variable instead of using select -Index ($IndexRangeLow..$IndexRangeHigh)?
PowerShell has a cmdlet (Select-String) that allows extracting text before and/or after a match.
Select-String -Path 'C:\path\to\your.log' -Pattern '123456789' -Context 75
The output of Select-String is an object with several properties, so additional code is required if you need the matching lines in text form:
... | ForEach-Object {
    $pre = $_.Context.PreContext | Out-String
    $post = $_.Context.PostContext | Out-String
    "{0}{1}`n{2}" -f $pre, $_.Line, $post
}
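Putting the pieces together and appending to a file as in the question (path and pattern are placeholders), a sketch could look like this:
Select-String -Path 'C:\path\to\your.log' -Pattern '123456789' -Context 75 |
    ForEach-Object {
        $pre = $_.Context.PreContext | Out-String
        $post = $_.Context.PostContext | Out-String
        "{0}{1}`n{2}" -f $pre, $_.Line, $post
    } | Out-File 'C:\temp\Example.txt' -Append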

PowerShell: ConvertFrom-Json to export multiple objects to csv

As you probably understand from the title, I'm new to PowerShell, having a hard time even trying to describe my question. So please forgive my terminology.
Scenario
I am using PowerShell to query the audit log of Office 365. The cmdlet Search-UnifiedAuditLog returns "multiple sets of objects"(?), one of which has an array of other objects(?). The output is JSON if I got this right.
Here is an example of what is returned (I will call it one "Set of Objects"):
RunspaceId : xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
RecordType : AzureActiveDirectoryStsLogon
CreationDate : 21/02/2017 12:05:23
UserIds : xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Operations : UserLoggedIn
AuditData : {"CreationTime":"2017-02-21T12:05:23","Id":"{"ID":"00000000000000000","Type":3}],"ActorContextId":"xxxxxxxxxxxxxxxxxxxxxxxxx","ActorIpAddress":"xxxxxxxxxxxxx","InterSystemsId":"xxxxxxxxxxxxxxxxx","IntraSystemId":"000000000000-000000-000","Target":[{"ID":"00-0000-0000-c000-000000000000","Type":0}],"TargetContextId":"xxxxxxxxxxxxxxxxxx","ApplicationId":"xxxxxxxxxxxxxxxxxxxxxxxxxx"}
ResultIndex : 1
ResultCount : 16
Identity : xxxxxxxxxxxxxxxxxxxxxxxxxxx
IsValid : True
ObjectState : Unchanged
Now, I want some of the content of the AuditData line exported to a csv (normally containing much more data than copied here). This works fine for one "set of objects" (like the one above). To do this I use:
$LogOutput = Search-UnifiedAuditLog -StartDate 2/20/2017 -EndDate 2/23/2017 -ResultSize 1
$ConvertedOutput = ConvertFrom-Json -InputObject $LogOutput.AuditData
$ConvertedOutput | Select-Object CreationTime,UserId,ClientIP | Export-Csv -Path "C:\users\some.user\desktop\users.csv"
-ResultSize 1 returns one result instead of multiple "sets of objects". ConvertFrom-Json does not work if I remove -ResultSize.
So the question is:
Can I loop through all the "set of objects" and convert from json and have this exported line-by-line on a csv? Resulting in something like this:
UserId,Activity,UserIP
this@user.com, loggedIn, 10.10.10.10
that@user.org, accessedFile, 11.11.11.11
A pedagogic answer would be very, very much appreciated. Many thanks!
Instead of relying on -ResultSize, try using:
Search-UnifiedAuditLog <args> | Select-Object -ExpandProperty AuditData | ConvertFrom-Json
This forwards only the AuditData property into ConvertFrom-Json and ignores the rest of the object returned by Search-UnifiedAuditLog.
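As a rough end-to-end sketch built from the code already in the question (dates, property names and output path are taken from there, adjust as needed):
Search-UnifiedAuditLog -StartDate 2/20/2017 -EndDate 2/23/2017 |
    Select-Object -ExpandProperty AuditData |
    ConvertFrom-Json |
    Select-Object CreationTime, UserId, ClientIP |
    Export-Csv -Path "C:\users\some.user\desktop\users.csv" -NoTypeInformation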

A better way of calculating top-32 file sizes for DFS folder staging size

A common task when setting up a DFS replica is to determine the size of the 32-largest files in the replicated folder - the sum of these should be the minimum size of the staging area, according to current best practice.
A method of finding and calculating this top-32 file size is given in a Technet blog: https://blogs.technet.microsoft.com/askds/2011/07/13/how-to-determine-the-minimum-staging-area-dfsr-needs-for-a-replicated-folder/
It relies on using Get-ChildItem to find all files and their sizes in a path, sort by size, discard all but the 32 largest, and then calculate the sum.
It's fine when you have a limited number of files in your path, but there are serious drawbacks when indexing a folder that has hundreds of thousands, if not millions of files. The process dumps everything into memory while it's executing - in my sample, it consumes over 2GB of virtual memory - and takes a long time, even when the individual files are quite small. The memory remains allocated until the PS instance is closed.
PS C:\> measure-command { (get-childitem F:\Folder\with\966693\items -recurse |
sort-object length -descending | select-object -first 32 |
measure-object -property length -sum).sum }
Days : 0
Hours : 0
Minutes : 6
Seconds : 6
Milliseconds : 641
Ticks : 3666410633
TotalDays : 0.00424353082523148
TotalHours : 0.101844739805556
TotalMinutes : 6.11068438833333
TotalSeconds : 366.6410633
TotalMilliseconds : 366641.0633
I'd be surprised if you could speed up Get-ChildItem much, unless you could avoid building [IO.FileInfo] objects for every file (.NET [System.IO.Directory]::EnumerateFiles, maybe?).
But you might be able to reduce the memory requirements by not keeping all the results, only the ongoing N largest (100 in this example, but adjust to balance memory against performance), e.g.
$BufferSize = 100
$FileSizes = New-Object System.Collections.ArrayList
Get-ChildItem 'd:\downloads' -Force -Recurse -File | ForEach {
    $null = $FileSizes.Add($_.Length)
    if ($FileSizes.Count -gt $BufferSize)
    {
        # Sort() is ascending, so trimming from the front keeps the largest lengths
        $FileSizes.Sort()
        $FileSizes.RemoveRange(0, ($BufferSize - 32))
    }
}
($FileSizes | Sort-Object -Descending | Select-Object -First 32 | Measure-Object -Sum).Sum / 1GB
Added -Force parameter to gci in case some of the biggest files are hidden.
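For comparison, a hypothetical variant of the same idea (not from the original answer, just a sketch) that never holds more than 33 lengths at a time, trading a tiny sort on each overflow for a minimal footprint:
$top32 = New-Object System.Collections.ArrayList
Get-ChildItem 'F:\Folder\with\966693\items' -File -Recurse -Force | ForEach-Object {
    $null = $top32.Add($_.Length)
    if ($top32.Count -gt 32) {
        $top32.Sort()        # ascending
        $top32.RemoveAt(0)   # drop the current smallest
    }
}
($top32 | Measure-Object -Sum).Sum / 1GB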
With a slight tweak - instantiating a System.Collections.ArrayList to store the list of file lengths - the time to execute the query over the same directory is nearly halved. You're not constantly creating/destroying a standard fixed-sized array as you add a new item to it.
Memory usage for the Powershell process for this sample remains at less than 900MB. I also like having a variable to set to $null if I want to reuse the PS console.
Measure-Command {
    $total = New-Object System.Collections.ArrayList
    gci F:\Folder\with\966693\items -File -r | ForEach { $total.Add($_.length) > $null }
    (($total | sort -descending | select -first 32 | measure-object -sum).sum/1GB)
}
Days : 0
Hours : 0
Minutes : 3
Seconds : 34
Milliseconds : 215
Ticks : 2142159038
TotalDays : 0.00247935073842593
TotalHours : 0.0595044177222222
TotalMinutes : 3.57026506333333
TotalSeconds : 214.2159038
TotalMilliseconds : 214215.9038
Tidier multi-line version:
$total = New-Object System.Collections.ArrayList
gci F:\Folder\with\966693\items -file -r | ForEach { $total.Add($_.length)>$null }
($total | sort -descending | select -first 32 | measure-object -sum).sum/1GB

Throwing error intermittently

I am getting a weird error, and only intermittently, while executing my script. The error is:
Method invocation failed because [System.Object[]] does not contain a method named 'op_Subtraction'.
The line in which I get this error is:
$LineNr = $dbsnap_file | Select-String -Pattern $check | Select-Object -ExpandProperty LineNumber
$del = $dbsnap_file[$LineNr-13] -split ':' | Select-Object -Last 1
$dbsnap_file is gc (some_file). The file contents look like:
AllocatedStorage : 5
AvailabilityZone : us-west-1a
DBInstanceIdentifier : test-multisite
DBSnapshotIdentifier : test-multisite-2015-09-03-04-15
Encrypted : False
Engine : mysql
EngineVersion : 5.6.19a
InstanceCreateTime : 12/19/2014 5:19:26 AM
Iops : 0
KmsKeyId :
LicenseModel : general-public-license
MasterUsername : root
OptionGroupName : default:mysql-5-6
PercentProgress : 100
Port : 3306
SnapshotCreateTime : 9/2/2015 11:15:36 PM
SnapshotType : automated
$check has a value like test-multisite-2015-09-03-04-15, so what I get as $del is the SnapshotCreateTime.
I am receiving this error intermittently; sometimes it works, sometimes not. Can someone please guide me on what the issue might be?
As CB was saying, Select-String will return all matches. You were expecting only one, and the code was built around that assumption. The error you are getting is fairly explicit:
[System.Object[]] does not contain a method named 'op_Subtraction'
You were trying to subtract 13 from an array of objects instead of an integer. As discussed in chat, it turned out the issue was that your source file contained duplicated data.
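The failure mode is easy to reproduce in isolation (the line numbers here are made up, just to show the shape of the problem):
$LineNr = 5, 42   # Select-String matched twice, so this is an array
$LineNr - 13      # Method invocation failed ... 'op_Subtraction'
5 - 13            # a single integer subtracts fine: -8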
The solution in this case was to clean your source. If you are comfortable with assumptions, you can also address this issue by updating the select:
$LineNr = $dbsnap_file | Select-String -Pattern $check | Select-Object -First 1 -ExpandProperty LineNumber
That will ensure only one is returned, the caveat being that you are ignoring real data. So verifying the source and the contents of $LineNr are the solutions I would recommend here.