I have an array (exported from a CSV file) with ~100k lines. The file size is ~22 MB on disk.
All I need is to find the line with some data, process it, and load it into MSSQL (I need to sync the CSV data with MSSQL).
The problem is that the search takes almost 1 second (~ TotalMilliseconds : 655.0788)!
$csv.Where({$_.'Device UUID' -eq 'lalala'})
Is there any way to speed this up?
Load all 100K rows into a hashtable, using the Device UUID property as the key - this will make it much faster to locate a row than by iterating the whole array with .Where({...}):
$deviceTable = @{}
Import-Csv .\path\to\device_list.csv |ForEach-Object {
$deviceTable[$_.'Device UUID'] = $_
}
This will now take significantly less than 1 second:
$matchingDevice = $deviceTable['lalala']
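If the same Device UUID could ever appear on more than one row, the plain hashtable above would silently keep only the last row per UUID. As a minimal sketch (assuming duplicates are possible; the file name is just a placeholder), you can let Group-Object build the lookup table instead, with each value holding all matching rows:
# Build a hashtable keyed by 'Device UUID'; each value is the array of all rows sharing that UUID.
$deviceGroups = Import-Csv .\path\to\device_list.csv |
    Group-Object -Property 'Device UUID' -AsHashTable -AsString
# Still an O(1) lookup, now returning every row with that UUID.
$matchingDevices = $deviceGroups['lalala']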
If you only need one or a few lookups, you can consider the following alternative to Mathias R. Jessen's helpful answer. Note that, like Mathias' solution, it requires reading all rows into memory at once:
# Load all rows into memory.
$allRows = Import-Csv file.csv
# Get the *index* of the row with the column value of interest.
# Note: This lookup is case-SENSITIVE.
$rowIndex = $allRows.'Device UUID'.IndexOf('lalala')
# Retrieve the row of interest by index, if found.
($rowOfInterest = if ($rowIndex -ne -1) { $allRows[$rowIndex] })
Once the rows are loaded into memory (as [pscustomobject] instances, which itself won't be fast), the array lookup - via member-access enumeration - is reasonably fast, thanks to .NET performing the (linear) array search, using the System.Array.IndexOf() method.
The problem with your .Where({ ... }) approach is that iteratively calling a PowerShell script block ({ ... }) many times is computationally expensive.
It comes down to the following trade-off:
Either: Spend more time up front to build up a data structure ([hashtable]) that allows efficient lookup (Mathias' answer)
Or: Read the file more quickly, but spend more time on each lookup (this answer).
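If you want to quantify that trade-off on your own data, a minimal sketch (file name and UUID value are placeholders) that times both phases with Measure-Command could look like this:
# Phase 1: up-front cost of building the hashtable (subsequent lookups are near-instant).
Measure-Command {
    $deviceTable = @{}
    Import-Csv .\device_list.csv | ForEach-Object { $deviceTable[$_.'Device UUID'] = $_ }
}
# Phase 2: per-lookup cost of the index-based search against the in-memory array.
$allRows = Import-Csv .\device_list.csv
Measure-Command {
    $rowIndex = $allRows.'Device UUID'.IndexOf('lalala')
    if ($rowIndex -ne -1) { $allRows[$rowIndex] }
}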
Playing around with the sqlite shell:
'Device UUID' | set-content file.csv
1..2200kb | % { get-random } | add-content file.csv # 1.44 sec, 25mb
'.mode csv
.import file.csv file' | sqlite3 file # 2.92 sec, 81mb
# last row
'select * from file where "device uuid" = 2143292650;' | sqlite3 file
# 'select * from file where "device uuid" > 2143292649 and "device uuid" < 2143292651;' | sqlite3 file
2143292650
(history)[-1] | % { $_.endexecutiontime - $_.startexecutiontime }
Days : 0
Hours : 0
Minutes : 0
Seconds : 0
Milliseconds : 570
Ticks : 5706795
TotalDays : 6.60508680555556E-06
TotalHours : 0.000158522083333333
TotalMinutes : 0.009511325
TotalSeconds : 0.5706795
TotalMilliseconds : 570.6795
# 34 ms after this:
# 'create index deviceindex on file("device uuid");' | sqlite3 file
# with ".timer on", it's 1ms, after the table is loaded
Related
I have a requirement to convert a .csv file containing data like this:
100,3
101,2
102,4
to a .csv file containing this:
100
100
100
101
101
102
102
102
102
I've written a macro in Excel that does this, but the requirement is to carry it out against ~1 million records, which crashes Excel.
Does anybody have a Powershell solution for this?
Assuming the CSV does not contain headers, I'd do the following:
Import-Csv tally.txt -Header Number,Tally |ForEach-Object {
,$_.Number * $_.Tally
} |Set-Content output.txt
The expression ,"100" * "2" will cause PowerShell to produce an array consisting of 2 copies of the string "100" - exactly the kind of expansion we want!
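A quick way to see that expansion in isolation (the values here are purely illustrative):
# The leading comma wraps "100" in a single-element array; * "2" then repeats it.
,"100" * "2"
# Output:
# 100
# 100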
This is my first time reaching out, as I am stuck on something and have been scratching my head for over a week now. It is worth saying that I only started with PowerShell a few months ago and I love using it for my scripts, but apparently my skills still need improving. I am unable to find a simple and elegant solution that would extract a log from a clearly defined start line until the first empty line (CR/LF) or time stamp that follows.
I am attaching the log I am trying to extract the data from. To specify the problem and give some more details about the log lines: they can vary in number, the end line of each log can also vary, and the time stamp is different for each log, depending on the time the test was executed.
cls
# Grab the profile system path
$userProfilePath = $env:LOCALAPPDATA
# Define log path
$logPath = "$userProfilePath\DWIO\logs\IOClient.txt"
# Define the START log line matching string
# This includes the tests that PASS and FAIL
$logStartLine = " TEST "
# Find all START log lines matching the string and grab their line number
$StartLine = (Get-Content $logPath | select-string $logStartLine)
#Get content from file
foreach ($start in $StartLine) {
# Extract the date time stamp from every starting line
$dateStamp = ($start -split ' ')[0]
#Regex pattern to compare two strings
$pattern = "(.*)$dateStamp"
#Perform the operation
$result = [regex]::Match($file,$pattern).Groups[1].Value
Write-Host $result
}
The log format is like:
08-31 16:32:20 INFO - [IOBridgeThread - mPerformAndComputeIntegrityCheck] - BridgeAsyncCall - mPerformAndComputeIntegrityCheck Result = TEST PASSED
Average Camera Temperature :40.11911°C
Module 0
Nb Points: 50673 pts (>32500)
Noise:
AMD: 0.00449238 mm (<0.027)
STD DEV: 0.006961088 mm
Dead camera: false
Module 1
Nb Points: 53809 pts (>40000)
Noise:
AMD: 0.0055302843 mm (<0.027)
STD DEV: 0.00869096 mm
Dead camera: false
Module consistency
Weak module: false
M0 to M1
Distance: 0.007857603 mm (<0.015)
Angle: 0.022567615 degrees (<0.07)
Target
Position: 0.009392071 mm (<5.0)
Angle: 0.54686683 degrees (<5.0)
Intensity: 120.35959
08-31 16:32:20 INFO - [cIOScannerService RUNNING] - Scanner State is now Scan-Ready
The issue is that the line at the end of every log is different, and the log lines themselves vary, so the only logical way to achieve the correct extraction is to match the first line, which always contains " TEST ", and then grab the log up to the first timestamp that follows, or the empty line which also appears at the end of every log.
I am just not sure how to achieve that, and the code I have is returning no/empty matches; however, if I echo $StartLine it correctly shows the log starting lines.
You can match the first line that starts with a date time like format and contains TEST in the line. Then capture in group 1 all the content that does not start with a date time like format.
(?m)^\d{2}-\d{2} \d{2}:\d{2}:\d{2}.*\bTEST\b.*\r?\n((?:(?!\d{2}-\d{2} \d{2}:\d{2}:\d{2}).*(?:\r?\n|$))*)
Explanation
(?m) Inline modifier for multiline
^ Start of line
\d{2}-\d{2} \d{2}:\d{2}:\d{2}.*\bTEST\b.* Match a date time like pattern followed by TEST in the line
\r?\n Match a newline
( Capture group 1
(?: Non capture group
(?!\d{2}-\d{2} \d{2}:\d{2}:\d{2}).*(?:\r?\n|$) If the line does not start with a date time like pattern, match the whole line followed by either a newline or the end of the line
)* Close non capture group and repeat 0+ times
) Close group 1
See a regex101 demo and a .NET regex demo (click on the Table tab) and a powershell demo
You can use Get-Content -Raw to get the contents of a file as one string.
$textIOClient = Get-Content -Raw "$userProfilePath\DWIO\logs\IOClient.txt"
$pattern = "(?m)^\d{2}-\d{2} \d{2}:\d{2}:\d{2}.*\bTEST\b.*\r?\n((?:(?!\d{2}-\d{2} \d{2}:\d{2}:\d{2}).*(?:\r?\n|$))*)"
Select-String $pattern -input $textIOClient -AllMatches | Foreach-Object {$_.Matches} | Foreach-Object {$_.Groups[1].Value}
I found an approach I really loved in this answer elsewhere on the site:
PowerShell - Search String in text file and display until the next delimeter
Using that, I wrote a little code around it to show you how to use the results:
$itemCount = 1
$Server = ""
$Data = @()
$Collection = @()
Switch(GC C:\temp\stackTestlog.txt){
{[String]::IsNullOrEmpty($Server) -and !([String]::IsNullOrWhiteSpace($_))}{$Server = $_;Continue}
{!([String]::IsNullOrEmpty($Server)) -and !([String]::IsNullOrEmpty($_))}{$Data+="`n$_";Continue}
{[String]::IsNullOrEmpty($_)}{$Collection+=[PSCustomObject]@{Server=$Server;Data=$Data};Remove-Variable Server; $Data=@()}
}
If(!([String]::IsNullOrEmpty($Server))){$Collection+=[PSCustomObject]@{Server=$Server;Data=$Data};Remove-Variable Server; $Data=@()}
if(($null -eq $collection) -or ($Collection.Count -eq 0)){
Write-Warning "Could not parse file"
}
else{
Write-Output "Found $($collection.Count) members"
ForEach($item in $Collection){
#add additional code here if you need to do something with each parsed log entry
Write-Output "Item # $itemCount $($item.Server) records"
Write-Host $item.Data -ForegroundColor Cyan
$itemCount++
}
}
You can extend this at the line with the comment, and then remove the Write-Output and Write-Host lines too.
Here's what it looks like in action.
Found 2 members
Item #1 08-31 16:32:20 INFO - [IOBridgeThread - mPerformAndComputeIntegrityCheck] - BridgeAsyncCall - mPerformAndCompu
teIntegrityCheck Result = TEST PASSED records
Average Camera Temperature :40.11911°C
#abridged...
Item #2 blahblahblah
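For example, a minimal sketch of such an extension (the output folder is just a placeholder), replacing the Write-Output and Write-Host calls with an export of each parsed entry:
ForEach($item in $Collection){
    # Write each parsed log entry to its own text file; $itemCount doubles as the file name suffix.
    $item.Data | Set-Content "C:\temp\entry_$itemCount.txt"
    $itemCount++
}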
I'm trying to format an array into a string.
What I'm doing is this:
$PysicalMemory | Format-Table @{n="Capacity(GB)";e={$_.Capacity/1GB}}, Speed
This gives me the output in this form:
Capacity(GB) Speed
------------ -----
4 1600
4 1600
But I would like to format it in a single string like this, but I have no luck:
4GB1600/4GB1600
this requires a slightly different method than you used, but it DOES give the output you seem to want & is easily tweaked ...
$CIM_RAM = @(Get-CimInstance CIM_PhysicalMemory)
$RAM_Info = foreach ($CR_Item in $CIM_RAM)
{
'{0}GB{1}Mhz' -f ($CR_Item.Capacity / 1GB), $CR_Item.Speed
}
$RAM_Info -join '/'
output = 2GB800Mhz/2GB800Mhz/2GB800Mhz/2GB800Mhz
yes, my ddr2 ram is really that slow. [grin]
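If you prefer a one-liner, the same idea can be compressed into a single pipeline; a rough sketch (same cmdlet, just condensed):
# Format each module as <GB>GB<Speed>Mhz, then join the pieces with '/'.
(Get-CimInstance CIM_PhysicalMemory | ForEach-Object { '{0}GB{1}Mhz' -f ($_.Capacity / 1GB), $_.Speed }) -join '/'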
I am trying to update users' AD account properties with values imported from a CSV file.
The problem is that some of the properties, like department, allow strings of a maximum length of 64, which is less than what is provided in the file, where values can be up to 110 characters.
I have found and adopted solution provided by TroyBramley in this thread - How to replace multiple strings in a file using PowerShell (thank You Troy).
It works fine, but... well, after all the replacements take place, the text is less meaningful than the original.
For example, original text First Department of something1 something2 something3 something4 would result in 1st Dept of sth1 sth2 sth3 sth4
I'd like to have control over the process so I can stop it when the length of the string drops just under the limit allowed by the AD property.
By the way, I'd also like to have a choice of which replacement happens first, second, and so on.
I put elements in a hashtable alphabetically but it seems that they are not processed this way. I can't figure out the pattern.
I can see a solution that replaces the strings one by one, checking the length after each replacement, but with almost 70 strings that leads to a huge portion of code. Maybe there is a simpler way?
You can iterate the replacement list until the string reaches the MaxLength defined.
## Q:\Test\2018\06\26\SO_51042611.ps1
$Original = "First Department of something1 something2 something3 something4"
$list = New-Object System.Collections.Specialized.OrderedDictionary
$list.Add("First","1st")
$list.Add("Department","Dept")
$list.Add("something1","sth1")
$list.Add("something2","sth2")
$list.Add("something3","sth3")
$list.Add("something4","sth4")
$MaxLength = 40
ForEach ($Item in $list.GetEnumerator()){
$Original = $Original -Replace $Item.Key,$Item.Value
If ($Original.Length -le $MaxLength){Break}
}
"{0}: {1}" -f $Original.Length,$Original
Sample output with $MaxLength set to 40
37: 1st Dept of sth1 sth2 sth3 something4
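As a side note (not part of the original snippet), the same ordered list can also be built with the [ordered] hashtable literal, which reads a bit more compactly:
# [ordered] preserves insertion order, so the replacements run in exactly the order listed.
$list = [ordered]@{
    'First'      = '1st'
    'Department' = 'Dept'
    'something1' = 'sth1'
    'something2' = 'sth2'
    'something3' = 'sth3'
    'something4' = 'sth4'
}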
Please note that this data has been cleaned to remove identifying information, and considerable white space has been removed from between the commas in order to aid readability. Lastly, at the end of the TYPE column there is an additional line saying how many lines were exported, which hopefully will be ignored by the script.
TYPE ,DATE ,TIME ,STREET ,CROSS-STREET ,X-COORD ,Y-COORD
459 ,2015-05-03 00:00:00.000,00:58:35,FOO DR ,A RD/B CT , 0.0, 0.0
488 ,2015-05-03 00:00:00.000,02:31:54,BAR AV ,C ST/D ST , 0.0, 0.0
I am attempting to import this CSV using Import-Csv and convert the TYPE numeric codes into different strings. An example would be that 459 becomes Apple, 488 becomes Banana, and so forth. I have created a hash with the TYPE numbers as the keys and the values being what I want them changed to.
So my issue is really two-fold: I have so far been unable to get the TYPE CSV column to import into the script (I've been trying an array for the most part), and I am not sure of the best way to build the logic to check the array data against my hash keys and replace it with the appropriate value.
# declare filename to modify
$strFileName="test.csv"
# import the type data into its own array
$imported_CSV = Import-Csv $strFileName
# populate hash
$conversion_Hash = @{
187 = Homicide;
211 = Robbery;
245 = Assault;
451 = Arson;
459 = Burglary;
484 = Larceny;
487 = Grand Theft;
488 = Petty Theft;
10851 = Stolen Vehicle;
HS = Drug;
}
# perform the conversion
foreach ($record in $imported_CSV)
{
$conversion_Hash[$record.Type]
}
This has no logic and just contains the code that was presented in the answer below. Note that I addressed that it doesn't work in the comments below.
I think this is an example of what you are looking for:
$hashTable = @{459= Apple; 488= Banana;}
$csv = import-csv <file>
foreach($record in $csv)
{
$hashTable[$record.Type] #returns hash value
}
Output:
Apple
Banana
So we have several little issues here. The two big ones are your source file and the fact that your hashtable keys are integers and not strings.
# declare filename to modify
$strFileName="c:\temp\point.csv"
# import the type data into its own array
$imported_CSV = (Get-Content $strFileName) -replace "\s*,\s*","," | ConvertFrom-Csv
# populate hash
$conversion_Hash = @{
"187" = "Homicide";
"211" = "Robbery";
"245" = "Assault";
"451" = "Arson";
"459" = "Burglary";
"484" = "Larceny";
"487" = "Grand Theft";
"488" = "Petty Theft";
"10851" = "Stolen Vehicle";
"HS" = "Drug";
}
# perform the conversion
foreach ($record in $imported_CSV)
{
$conversion_Hash[$record.Type]
}
Output from naughty people
Burglary
Petty Theft
I don't know if your source file looks like it does in your question, but there is a bunch of whitespace there that will be giving you a hassle. Namely, you don't have a TYPE column but a "TYPE " column (note the trailing spaces). Same goes for the other columns. The data is affected as well: it's not 459 but "459 " (with trailing spaces).
To fix that, I read the file and replace all whitespace surrounding the commas with just the comma.
TYPE,DATE,TIME,STREET,CROSS-STREET,X-COORD,Y-COORD
459,2015-05-03 00:00:00.000,00:58:35,FOO DR,A RD/B CT,0.0,0.0
488,2015-05-03 00:00:00.000,02:31:54,BAR AV,C ST/D ST,0.0,0.0
If your data already looks like that, then you need to be careful posting this stuff in your question. On to the other issue, with your comparison.
You will see I have quoted almost everything in that hashtable. I had to for the values, as they were being taken as commands otherwise. I also quoted the keys, as the CSV table contains strings and not integers. I would have just cast to [int] to avoid the whole issue, but one of your keys is called "HS", which does not look like a number to me :).
What I might have done
Just to play a little, I might have added another note property called TypeAsString to the list, which would add a column.
# perform the conversion
$imported_CSV | ForEach-Object{
$_ | Add-Member -MemberType NoteProperty -Name "TypeAsString" -Value $conversion_Hash[$_.Type] -PassThru
}
So the output from one item would look like this
TYPE : 459
DATE : 2015-05-03 00:00:00.000
TIME : 00:58:35
STREET : FOO DR
CROSS-STREET : A RD/B CT
X-COORD : 0.0
Y-COORD : 0.0
TypeAsString : Burglary
I could have made a more dynamic property like a script property, so that changes in $conversion_Hash are updated instantly, but this should suffice for what you need.
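For completeness, a minimal sketch of that script-property variant (not from the original answer; it assumes $conversion_Hash stays in scope whenever the property is read):
# A ScriptProperty is evaluated on every read, so it always reflects the current $conversion_Hash.
$imported_CSV | ForEach-Object{
    $_ | Add-Member -MemberType ScriptProperty -Name "TypeAsString" -Value {$conversion_Hash[$this.TYPE]} -PassThru
}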