I have an annoying report output (curse be to HP) that I can't shape with a query - it's all or nothing, more or less. I would like to take two lines from each "chunk" of output and construct an array from them. I figured it would be a simple split() operation, but no such luck. A sample of the output looks like this:
Medium identifier : 1800010a:54bceddd:1d8c:0007
Medium label : [ARJ170L6] ARJ170L6
Location : [TapeLibrary: 24]
Medium Owner : wfukut01
Status : Poor
Blocks used [KB] : 2827596544
Blocks total [KB] : 2827596544
Usable space [KB] : 1024
Number of writes : 16
Number of overwrites : 4
Number of errors : 0
Medium initialized : 19 January 2015, 11:43:32
Last write : 26 April 2016, 21:02:12
Last access : 26 April 2016, 21:02:12
Last overwrite : 24 April 2016, 04:48:55
Protected : Permanent
Write-protected : No
Medium identifier : 1800010a:550aa81e:3a0c:0006
Medium label : [ARJ214L6] ARJ214L6
Location : External
Medium Owner : wfukut01
Status : Poor
Blocks used [KB] : 2904963584
Blocks total [KB] : 2904963584
Usable space [KB] : 0
Number of writes : 9
Number of overwrites : 7
Number of errors : 0
Medium initialized : 19 March 2015, 10:42:45
Last write : 30 April 2016, 22:14:19
Last access : 30 April 2016, 22:14:19
Last overwrite : 29 April 2016, 13:41:35
Protected : Permanent
Write-protected : No
What would be ideal is if the final output of this work would create an array somewhat similar to this:
Location     UsableSpace
--------     -----------
External     0
TapeLibrary  1024
So I can (for example) query the output so that I can do operations on the data within the array:
$myvar | where-object { $_.Location -eq "TapeLibrary" }
Perhaps there are better approaches? I would be more than happy to hear them!
If the command is not a PowerShell cmdlet, as in Kolob Canyon's answer, then you will need to parse the text. Here's an inelegant example that uses -match and a regex to find the lines containing Location and Usable space [KB] and capture the word characters after the colon.
((Get-Content C:\Example.txt -Raw) -split 'Medium identifier') | ForEach-Object {
    [void]($_ -match 'Location\s+:\s(.*?(\w+).*)\r')
    $Location = @($Matches.Values | Where-Object {$_ -notmatch '\W'})[0]
    [void]($_ -match 'Usable\sspace\s\[KB\]\s+:\s(.*?(\w+).*)\r')
    $UsableSpace = @($Matches.Values | Where-Object {$_ -notmatch '\W'})[0]
    if ($Location -or $UsableSpace) {
        [PSCustomObject]@{
            Location    = $Location
            UsableSpace = $UsableSpace
        }
    }
}
As this is extremely fragile and inelegant, it's much better to interact with an object wherever possible.
Assuming the data is as regular as it looks, you could use multiple assignment to extract the data from the array as in:
$data = 1, 2, "ignore me", 3, 10, 22, "ignore", 30
$first, $second, $null, $third, $data = $data
where the first, second and fourth array elements go into the variables, "ignore me" gets discarded into $null, and the remaining data goes back into $data. In your case, this would look like:
# Read the file into an array
$data = Get-Content data.txt

# Utility to fix up a data row
function FixUp ($s)
{
    ($s -split ' : ')[1].Trim()
}

# Loop until all of the data is processed
while ($data)
{
    # Extract the current record using multiple assignment
    # $null is used to eat the blank lines
    $identifier, $null, $label, $location, $owner, $status,
    $used, $total, $space, $writes, $overwrites,
    $errors, $initialized, $lastwrite, $lastaccess,
    $lastOverwrite, $protected, $writeprotected,
    $null, $null, $data = $data

    # Convert it into a custom object
    [PSCustomObject] [ordered] @{
        Identifier     = FixUp $identifier
        Label          = FixUp $label
        Location       = FixUp $location
        Owner          = FixUp $owner
        Status         = FixUp $status
        Used           = FixUp $used
        Total          = FixUp $total
        Space          = FixUp $space
        Write          = FixUp $writes
        OverWrites     = FixUp $overwrites
        Errors         = FixUp $errors
        Initialized    = FixUp $initialized
        LastWrite      = [datetime] (FixUp $lastwrite)
        LastAccess     = [datetime] (FixUp $lastaccess)
        LastOverWrite  = [datetime] (FixUp $lastOverwrite)
        Protected      = FixUp $protected
        WriteProtected = FixUp $writeprotected
    }
}
Once you have the data extracted, you can format it any way you want
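For instance, assuming the loop's output is captured in a variable (the name $media here is purely for illustration), reproducing the Location/UsableSpace view from the question is a one-liner:
# Hypothetical capture of the loop above: $media = while ($data) { ... }
$media | Select-Object Location, Space | Format-Table -AutoSize
$media | Where-Object { $_.Location -match 'TapeLibrary' }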
That looks like a very regular pattern, so I'd say there are three typical approaches to this.
First: your own bucket-fill-trigger-empty parser: load lines into a bucket until you reach the next trigger ("Medium identifier"), then empty the bucket out to the pipeline and start a new one.
Something like:
$bucket = @{}

foreach ($line in Get-Content -LiteralPath C:\path\data.txt)
{
    # if full, empty bucket to pipeline
    if ($line -match '^Medium identifier')
    {
        if ($bucket.Count) { [PSCustomObject]$bucket }   # skip the initial, still-empty bucket
        $bucket = @{}
    }

    # add line to bucket (unless it's blank)
    if (-not [string]::IsNullOrWhiteSpace($line))
    {
        $left, $right = $line.Split(':', 2)
        $bucket[$left.Trim()] = $right.Trim()
    }
}

# empty last item to pipeline
if ($bucket.Count) { [PSCustomObject]$bucket }
Adjust to taste for identifying numbers, dates, etc.
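As one small, hypothetical example of such an adjustment, the bucket-filling branch could convert digit-only values to integers before storing them:
# Sketch only: store numeric fields as numbers rather than strings
$left, $right = $line.Split(':', 2)
$value = $right.Trim()
if ($value -match '^\d+$') { $value = [int64]$value }
$bucket[$left.Trim()] = $value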
Second: a multiline regex. I tried, but couldn't get it to work. It would look something like:
# Not working, but for example:
$r = @'
Medium identifier : (?<MediumIdentifier>.*)
\s*
Write-protected : (?<WriteProtected>.*)
Blocks used [KB] : (?<BlockesUsed>.*)
Medium label : (?<MediumLabel>.*)
Last write : (?<LastWrite>.*)
Medium Owner : (?<MediumOwner>.*)
Usable space [KB] : (?<UsableSpaceKB>.*)
Number of overwrites : (?<NumberOfOverwrites>.*)
Last overwrite : (?<LastOverwrite>.*)
Medium identifier : (?<MediumIdentifier>.*)
Blocks total [KB] : (?<BlocksTotalKB>.*)
Number of errors : (?<NumberOfErrors>.*)
Medium initialized : (?<MediumInitialized>.*)
Status : (?<Status>.*)
Location : (?<Location>.*)
Protected : (?<Protected>.*)
Number of writes : (?<NumberOfWrites>.*)
Last access : (?<LastAccess>.*)
\s*
'@
[regex]::Matches((Get-Content C:\work\a.txt -Raw), $r,
    [System.Text.RegularExpressions.RegexOptions]::IgnoreCase +
    [System.Text.RegularExpressions.RegexOptions]::Singleline
)
Third: ConvertFrom-String - see http://www.lazywinadmin.com/2014/09/powershell-convertfrom-string-and.html or https://blogs.technet.microsoft.com/ashleymcglone/2016/09/14/use-the-new-powershell-cmdlet-convertfrom-string-to-parse-klist-kerberos-ticket-output/ - then, after you've made the template:
Get-Content data.txt | ConvertFrom-String -TemplateFile .\template.txt
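For a rough idea only (untested - see the linked posts for the exact template syntax), such a template marks example values with named placeholders, with the record-starting property flagged by *:
Medium identifier : {MediumIdentifier*:1800010a:54bceddd:1d8c:0007}
Location : {Location:[TapeLibrary: 24]}
Usable space [KB] : {UsableSpaceKB:1024}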
The easiest way is to use the command itself and select certain properties. If the command is a PowerShell cmdlet, it should return an object.
$output = Some-HPCommand | select 'Medium label', 'Location'
Then you can access specific properties:
$output.'Medium label'
$output.Location
If you can provide the exact command, I can write this more accurately.
The biggest issue when people are learning PowerShell is that they treat output like a string. Everything in PowerShell is object-oriented, and once you begin to think in terms of objects, it becomes much easier to process data; in other words, always try to handle output as objects or arrays of objects. It will make your life a hell of a lot easier.
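If you're not sure what a cmdlet emits, inspect it rather than parse it; a quick sketch, reusing the hypothetical command name from above:
Some-HPCommand | Get-Member                              # list the properties the objects expose
Some-HPCommand | Select-Object -First 1 | Format-List *  # dump one object's values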
If each section is in the same format, i.e. the Usable space line is always 5 lines below Location, then you can use Select-String in combination with the -Context parameter. Something like this:
Select-String .\your_file.txt -Pattern '(?<=Location\s*:\s).*' -Context 0, 5 | % {
    New-Object psobject -Property @{
        Location    = (($_.Matches[0] -replace '\[|\]', '') -split ':')[0]
        UsableSpace = ($_.Context.PostContext[4] -replace '^\D+(\d+)$', '$1')
    }
}
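For clarity: -Context 0, 5 keeps the five lines following each match in $_.Context.PostContext, so PostContext[4] is the Usable space [KB] line, five lines below Location in the sample.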
I am trying to parse a very large log file that consists of space-delimited text across about 16 fields. Unfortunately the app logs a blank line between each legitimate one (effectively doubling the lines I must process). It also causes fields to shift, because it uses a space both as the delimiter and for empty fields. I couldn't get around this in LogParser. Fortunately PowerShell lets me reference fields from the end as well, making it easier to get the later fields affected by the shift.
After a bit of testing with smaller sample files, I've determined that processing line by line as the file is streaming with Get-Content natively is slower than just reading the file completely using Get-Content -ReadCount 0 and then processing from memory. This part is relatively fast (<1min).
The problem comes when processing each line, even though it's in memory. It is taking hours for a 75MB file with 561178 legitimate lines of data (minus all the blank lines).
I'm not doing much in the code itself. I'm doing the following:
Splitting line via space as delimiter
One of the fields is an IP address that I am reverse DNS resolving, which is obviously going to be slow. So I have wrapped this into more code to create an in-memory arraylist cache of previously resolved IPs and pulling from it when possible. The IPs are largely the same so after a few hundred lines, resolution shouldn't be an issue any longer.
Saving the needed array elements into my pscustomobject
Adding pscustomobject to arraylist to be used later.
During the loop I'm tracking how many lines I've processed and outputting that info in a progress bar (I know this adds extra time but not sure how much). I really want to know progress.
All in all, it's processing some 30-40 lines per second, but obviously this is not very fast.
Can someone offer alternative methods/objectTypes to accomplish my goals and speed this up tremendously?
Below are some samples of the log with the field shift (Note this is a Windows DNS Debug log) as well as the code below that.
10/31/2022 12:38:45 PM 2D00 PACKET 000000B25A583FE0 UDP Snd 127.0.0.1 6c94 R Q [8385 A DR NXDOMAIN] AAAA (4)pool(3)ntp(3)org(0)
10/31/2022 12:38:45 PM 2D00 PACKET 000000B25A582050 UDP Snd 127.0.0.1 3d9d R Q [8081 DR NOERROR] A (4)pool(3)ntp(3)org(0)
NOTE: the issue in this case being [8385 A DR NXDOMAIN] (4 fields) vs [8081 DR NOERROR] (3 fields)
Other examples would be the "R Q" where sometimes it's " Q".
$Logfile = "C:\Temp\log.txt"
[System.Collections.ArrayList]$LogEntries = @()
[System.Collections.ArrayList]$DNSCache = @()
# Initialize log iteration counter
$i = 1
# Get Log data. Read entire log into memory and save only lines that begin with a date (ignoring blank lines)
$LogData = Get-Content $Logfile -ReadCount 0 | % {$_ | ? {$_ -match "^\d+\/"}}
$LogDataTotalLines = $LogData.Length
# Process each log entry
$LogData | ForEach-Object {
    $PercentComplete = [math]::Round(($i/$LogDataTotalLines * 100))
    Write-Progress -Activity "Processing log file . . ." -Status "Processed $i of $LogDataTotalLines entries ($PercentComplete%)" -PercentComplete $PercentComplete

    # Split line using space, including sequential spaces, as delimiter.
    # NOTE: Due to how the app logs events, some fields may be blank, so the split can yield a different number of columns. Fortunately the fields we desire
    # are in static positions not affected by this, except for the last 2, which can be referenced backwards with -2 and -1.
    $temp = $_ -Split '\s+'

    # Resolve DNS name of IP address for later use and cache into arraylist to avoid DNS lookup for same IP as we loop through log
    If ($DNSCache.IP -notcontains $temp[8]) {
        $DNSEntry = [PSCustomObject]@{
            IP      = $temp[8]
            DNSName = Resolve-DnsName $temp[8] -QuickTimeout -DnsOnly -ErrorAction SilentlyContinue | Select -ExpandProperty NameHost
        }
        # Add DNSEntry to DNSCache collection
        $DNSCache.Add($DNSEntry) | Out-Null
        # Set resolved DNS name to that which came back from the Resolve-DnsName cmdlet. NOTE: value could be blank.
        $ResolvedDNSName = $DNSEntry.DNSName
    } Else {
        # DNSCache contains resolved IP already. Find and use it.
        $ResolvedDNSName = ($DNSCache | ? {$_.IP -eq $temp[8]}).DNSName
    }

    $LogEntry = [PSCustomObject]@{
        Datetime      = $temp[0] + " " + $temp[1] + " " + $temp[2] # Combines first 3 fields: Date, Time, AM/PM
        ClientIP      = $temp[8]
        ClientDNSName = $ResolvedDNSName
        QueryType     = $temp[-2] # Second to last entry of array
        QueryName     = ($temp[-1] -Replace "\(\d+\)",".") -Replace "^\.","" # Last entry of array. Replace any "(#)" characters with period and remove first period for friendly name
    }

    # Add LogEntry to LogEntries collection
    $LogEntries.Add($LogEntry) | Out-Null

    $i++
}
Here is a more optimized version you can try.
What changed?:
Removed Write-Progress, especially because it's not known whether Windows PowerShell is used; Write-Progress has a big performance impact in PowerShell versions below 6
Changed $DNSCache to Generic Dictionary for fast lookups
Changed $LogEntries to Generic List
Switched from Get-Content to switch -Regex -File
$Logfile = 'C:\Temp\log.txt'
$LogEntries = [System.Collections.Generic.List[psobject]]::new()
$DNSCache = [System.Collections.Generic.Dictionary[string, psobject]]::new([System.StringComparer]::OrdinalIgnoreCase)

# Process each log entry
switch -Regex -File ($Logfile) {
    '^\d+\/' {
        # Split line using space, including sequential spaces, as delimiter.
        # NOTE: Due to how the app logs events, some fields may be blank, so the split can yield a different number of columns. Fortunately the fields we desire
        # are in static positions not affected by this, except for the last 2, which can be referenced backwards with -2 and -1.
        $temp = $_ -Split '\s+'

        $ip = [string] $temp[8]
        $resolvedDNSRecord = $DNSCache[$ip]

        if ($null -eq $resolvedDNSRecord) {
            $resolvedDNSRecord = [PSCustomObject]@{
                IP      = $ip
                DNSName = Resolve-DnsName $ip -QuickTimeout -DnsOnly -ErrorAction Ignore | select -ExpandProperty NameHost
            }

            $DNSCache[$ip] = $resolvedDNSRecord
        }

        $LogEntry = [PSCustomObject]@{
            Datetime      = $temp[0] + ' ' + $temp[1] + ' ' + $temp[2] # Combines first 3 fields: Date, Time, AM/PM
            ClientIP      = $ip
            ClientDNSName = $resolvedDNSRecord.DNSName
            QueryType     = $temp[-2] # Second to last entry of array
            QueryName     = ($temp[-1] -Replace '\(\d+\)', '.') -Replace '^\.', '' # Last entry of array. Replace any "(#)" characters with period and remove first period for friendly name
        }

        # Add LogEntry to LogEntries collection
        $LogEntries.Add($LogEntry)
    }
}
If it's still slow, there is also the option of using Start-ThreadJob as a multithreading approach, with the lines chunked (say, 10000 per job).
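A rough, untested sketch of that idea, assuming the filtered lines are in $LogData as in the question's code (note the DNS cache is no longer shared, so each job would have to resolve its own IPs):
$chunkSize = 10000
$jobs = for ($start = 0; $start -lt $LogData.Count; $start += $chunkSize) {
    $end = [Math]::Min($start + $chunkSize, $LogData.Count) - 1
    Start-ThreadJob -ArgumentList (, $LogData[$start..$end]) -ScriptBlock {
        param($lines)
        foreach ($line in $lines) {
            $temp = $line -split '\s+'
            # same per-line parsing as above, trimmed down for the sketch
            [PSCustomObject]@{
                ClientIP  = $temp[8]
                QueryType = $temp[-2]
                QueryName = ($temp[-1] -replace '\(\d+\)', '.') -replace '^\.', ''
            }
        }
    }
}
$LogEntries = $jobs | Receive-Job -Wait -AutoRemoveJob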
I have an input file with the below contents:
27/08/2020 02:47:37.365 (-0516) hostname12 ult_licesrv ULT 5 LiceSrv Main[108 00000 Session 'session1' (from 'vmpms1\app1#pmc21app20.pm.com') request for 1 additional licenses for module 'SA-XT' - 1 licenses have been allocated by concurrent usage category 'Unlimited' (session module usage now 1, session category usage now 1, total module concurrent usage now 1, total category usage now 1)
27/08/2020 02:47:37.600 (-0516) hostname13 ult_licesrv ULT 5 LiceSrv Main[108 00000 Session 'sssion2' (from 'vmpms2\app1#pmc21app20.pm.com') request for 1 additional licenses for module 'SA-XT-Read' - 1 licenses have been allocated by concurrent usage category 'Floating' (session module usage now 2, session category usage now 2, total module concurrent usage now 1, total category usage now 1)
27/08/2020 02:47:37.115 (-0516) hostname141 ult_licesrv CMN 5 Logging Housekee 00000 Deleting old log file 'C:\Program Files\PMCOM Global\License Server\diag_ult_licesrv_20200824_011130.log.gz' as it exceeds the purge threashold of 72 hours
27/08/2020 02:47:37.115 (-0516) hostname141 ult_licesrv CMN 5 Logging Housekee 00000 Deleting old log file 'C:\Program Files\PMCOM Global\License Server\diag_ult_licesrv_20200824_021310.log.gz' as it exceeds the purge threashold of 72 hours
27/08/2020 02:47:37.625 (-0516) hostname150 ult_licesrv ULT 5 LiceSrv Main[108 00000 Session 'session1' (from 'vmpms1\app1#pmc21app20.pm.com') request for 1 additional licenses for module 'SA-XT' - 1 licenses have been allocated by concurrent usage category 'Unlimited' (session module usage now 2, session category usage now 1, total module concurrent usage now 2, total category usage now 1)
I need to generate an output file like below:
Date,time,hostname,session_module_usage,session_category_usage,module_concurrent_usage,total_category_usage
27/08/2020,02:47:37.365 (-0516),hostname12,1,1,1,1
27/08/2020,02:47:37.600 (-0516),hostname13,2,2,1,1
27/08/2020,02:47:37.115 (-0516),hostname141,0,0,0,0
27/08/2020,02:47:37.115 (-0516),hostname141,0,0,0,0
27/08/2020,02:47:37.625 (-0516),hostname150,2,1,2,1
The output data order is: Date,time,hostname,session_module_usage,session_category_usage,module_concurrent_usage,total_category_usage.
Put 0,0,0,0 if there is no entry for session_module_usage, session_category_usage, module_concurrent_usage, total_category_usage.
I need to get content from the input file and write the output to another file.
Update
I have created a file input.txt in F drive and pasted the log details into it.
Then I form an array by splitting the file content on newlines, like below.
$myList = (Get-Content -Path F:\input.txt) -split '\n'
Now I have 5 items in my array $myList. Then I replaced the multiple blank spaces with a single blank space and formed a new array by splitting each element on the blank space. Then I printed array elements 0 to 3. Now I need to add the end values (session_module_usage, session_category_usage, module_concurrent_usage, total_category_usage).
PS C:\Users\user> $myList = (Get-Content -Path F:\input.txt) -split '\n'
PS C:\Users\user> $myList.Length
5
PS C:\Users\user> for ($i = 0; $i -le ($myList.length - 1); $i += 1) {
>> $newList = ($myList[$i] -replace '\s+', ' ') -split ' '
>> $newList[0]+','+$newList[1]+' '+$newList[2]+','+$newList[3]
>> }
27/08/2020,02:47:37.365 (-0516),hostname12
27/08/2020,02:47:37.600 (-0516),hostname13
27/08/2020,02:47:37.115 (-0516),hostname141
27/08/2020,02:47:37.115 (-0516),hostname141
27/08/2020,02:47:37.625 (-0516),hostname150
If you really need to filter on the granularity that you're looking for, then you may need to use regex to filter the lines.
This would assume that the rows have similarly labeled lines before the values you're looking for, so keep that in mind.
[System.Collections.ArrayList]$filteredRows = @()
$log = Get-Content -Path C:\logfile.log

foreach ($row in $log) {
    # Each value is pulled straight from the current row ($row); looking the row
    # back up with $log.IndexOf($row) would be redundant (and slow on large logs).
    $date = ([regex]::Match($row, '^\d+\/\d+\/\d+')).Value
    $time = ([regex]::Match($row, '\d+:\d+:\d+\.\d+\s\(\S+\)')).Value
    $hostname = ([regex]::Match($row, '(?<=\d\d\d\d\) )\w+')).Value

    $sessionModuleUsage = ([regex]::Match($row, '(?<=session module usage now )\d')).Value
    if (!$sessionModuleUsage) {
        $sessionModuleUsage = 0
    }

    $sessionCategoryUsage = ([regex]::Match($row, '(?<=session category usage now )\d')).Value
    if (!$sessionCategoryUsage) {
        $sessionCategoryUsage = 0
    }

    $moduleConcurrentUsage = ([regex]::Match($row, '(?<=total module concurrent usage now )\d')).Value
    if (!$moduleConcurrentUsage) {
        $moduleConcurrentUsage = 0
    }

    $totalCategoryUsage = ([regex]::Match($row, '(?<=total category usage now )\d')).Value
    if (!$totalCategoryUsage) {
        $totalCategoryUsage = 0
    }

    $hash = [ordered]@{
        Date                    = $date
        time                    = $time
        hostname                = $hostname
        session_module_usage    = $sessionModuleUsage
        session_category_usage  = $sessionCategoryUsage
        module_concurrent_usage = $moduleConcurrentUsage
        total_category_usage    = $totalCategoryUsage
    }

    $rowData = New-Object -TypeName 'psobject' -Property $hash
    $filteredRows.Add($rowData) > $null
}

$csv = $filteredRows | ConvertTo-Csv -NoTypeInformation -Delimiter "," | ForEach-Object {$_ -replace '"', ''}
$csv | Out-File C:\results.csv
What essentially needs to happen is that we Get-Content the log, which returns an array with one item per line.
Once we have the rows, we need to grab the values via regex
Since you want zeroes in some of the items if those values don't exist, I have if statements that assign '0' if the regex returns nothing
Finally, we add each filtered item to a PSObject and append that object to an array of objects in each iteration.
Then export to a CSV.
You can probably pick apart the lines with a regex and substrings easily enough. Basically something like the following:
# Iterate over the lines of the input file
Get-Content F:\input.txt |
    ForEach-Object {
        # Extract the individual fields
        $Date = $_.Substring(0, 10)
        $Time = $_.Substring(12, $_.IndexOf(')') - 11)
        $Hostname = $_.Substring(34, $_.IndexOf(' ', 34) - 34)
        $session_module_usage = 0
        $session_category_usage = 0
        $module_concurrent_usage = 0
        $total_category_usage = 0
        if ($_ -match 'session module usage now (\d+), session category usage now (\d+), total module concurrent usage now (\d+), total category usage now (\d+)') {
            $session_module_usage = $Matches[1]
            $session_category_usage = $Matches[2]
            $module_concurrent_usage = $Matches[3]
            $total_category_usage = $Matches[4]
        }
        # Create custom object with those properties
        New-Object PSObject -Property @{
            Date                    = $Date
            time                    = $Time
            hostname                = $Hostname
            session_module_usage    = $session_module_usage
            session_category_usage  = $session_category_usage
            module_concurrent_usage = $module_concurrent_usage
            total_category_usage    = $total_category_usage
        }
    } |
    # Ensure column order in output
    Select-Object Date,time,hostname,session_module_usage,session_category_usage,module_concurrent_usage,total_category_usage |
    # Write as CSV - without quotes
    ConvertTo-Csv -NoTypeInformation |
    ForEach-Object { $_ -replace '"' } |
    Out-File F:\output.csv
Whether to pull the date, time, and host name from the line with substrings or a regex is probably a matter of taste. The same goes for how strictly the format must be matched, though to me that mostly depends on how rigid the format actually is. For more free-form things, where different lines would match different regexes or multiple lines make up a single record, I also quite like switch -Regex to iterate over the lines.
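A minimal sketch of that switch -Regex style, using the usage-counter pattern from this question:
switch -Regex -File F:\input.txt {
    'session module usage now (\d+)' {
        # $Matches[1] holds the captured count for this line
        $Matches[1]
    }
    default {
        # lines without usage counters land here
    }
}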
I have a list of PDF filenames that need to be parsed and ultimately sent to a SQL table, with the parsed-out pieces each in their own column. How would I split based on a dash '-' and ultimately get it into a table?
What cmdlets would you start with to split on a character? I need to split based on the dash '-'.
Thanks for the help.
Example File Names:
tester-2458-full_contact_snapshot-20200115_1188.pdf
tester-2458-limited_contact_snapshot-20200119_9330.pdf
Desired Results:
There is also a -split operator.
https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_split
A basic example, if you have the file names in a $FilePaths array:
foreach ($filepath in $FilePaths)
{
    $parts = $filepath -split '-'
    [pscustomobject]@{"User" = $parts[0]; "AppID" = $parts[1]; "FileType" = $parts[2]; "FilePath" = $filepath }
}
Use $variable.Split('-'), which will return a string array with a length equal to however many elements are produced by the split operation.
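Applied to one of the sample names, for example:
$parts = 'tester-2458-full_contact_snapshot-20200115_1188.pdf'.Split('-')
$parts.Length   # 4
$parts[0]       # tester
$parts[3]       # 20200115_1188.pdf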
yet another way is to use regex & named capture groups. [grin]
what it does ...
creates a set of file name strings to work with
when ready to use real data, remove the entire #region/#endregion block and use either (Get-ChildItem).Name or another method that gives you plain strings.
iterates thru the collection of file name strings
uses $Null = to suppress the False/True output of the -match call
does a regex match with named capture groups
uses the $Matches automatic variable to plug the captured values into the desired properties of a [PSCustomObject]
sends that PSCO out to the $Results collection
displays that on screen
sends it to a CSV for later use
the code ...
#region >>> fake reading in a list of file names
# in real life, use (Get-ChildItem).Name
$InStuff = @'
tester-2458-full_contact_snapshot-20200115_1188.pdf
tester-2458-limited_contact_snapshot-20200119_9330.pdf
'@ -split [System.Environment]::NewLine
#endregion >>> fake reading in a list of file names

$Results = foreach ($IS_Item in $InStuff)
{
    $Null = $IS_Item -match '^(?<User>.+)-(?<AppId>.+)-(?<FileType>.+)-(?<Date>.+)\.pdf$'
    [PSCustomObject]@{
        User     = $Matches.User
        AppId    = $Matches.AppId
        FileType = $Matches.FileType
        Date     = $Matches.Date
        FileName = $IS_Item
    }
}

# display on screen
$Results

# send to CSV file
$Results |
    Export-Csv -LiteralPath "$env:TEMP\JM1_-_FileReport.csv" -NoTypeInformation
output to screen ...
User : tester
AppId : 2458
FileType : full_contact_snapshot
Date : 20200115_1188
FileName : tester-2458-full_contact_snapshot-20200115_1188.pdf
User : tester
AppId : 2458
FileType : limited_contact_snapshot
Date : 20200119_9330
FileName : tester-2458-limited_contact_snapshot-20200119_9330.pdf
content of the C:\Temp\JM1_-_FileReport.csv file ...
"User","AppId","FileType","Date","FileName"
"tester","2458","full_contact_snapshot","20200115_1188","tester-2458-full_contact_snapshot-20200115_1188.pdf"
"tester","2458","limited_contact_snapshot","20200119_9330","tester-2458-limited_contact_snapshot-20200119_9330.pdf"
I have a text output which shows the runtimes of each selection I make in a script.
Check Level: 0, 38.99607466333333
Check Level: 1, 60.93540646055553
etc.
What I'd like to do is have some Read-Host lines showing a choice of which level to go to, and next to each choice the variable showing how long it takes on average, i.e. 'CheckLevel 1 takes 60 minutes'.
The following script works, but I can't help thinking there's a better alternative:
$CheckLevel0 = Get-Content $RuntimeFile | Where {$_ -like "Check Level: 0,*"}
$CheckLevel1 = Get-Content $RuntimeFile | Where {$_ -like "Check Level: 1,*"}
$CheckLevel2 = Get-Content $RuntimeFile | Where {$_ -like "Check Level: 2,*"}
$CheckLevel3 = Get-Content $RuntimeFile | Where {$_ -like "Check Level: 3,*"}
$CheckLevel4 = Get-Content $RuntimeFile | Where {$_ -like "Check Level: 4,*"}
$CheckLevel5 = Get-Content $RuntimeFile | Where {$_ -like "Check Level: 5,*"}
Ideally, I'd expect to have all the $CheckLevelx variables populated with one or two lines... I've tried all sorts.
Whilst gvee's solution is simple and elegant, it doesn't work if you'd like to show a menu that includes execution times too.
That being said, you are right on track about a simpler solution. Whenever one has more than, say, three variables named value0, value1, ... valueN, it's usually time to use a data structure. An array would be the obvious choice, and quite often a hashtable would do too. Via .NET, there are many types for more specialized needs.
If you need to do more complex processing with the data file, consider preprocessing it. Let's use a regex and a hashtable, like so:
# Some dummy data. Note the duplicate entry for level 1
$d = ('Check Level: 0, 38.99607466333333',`
      'Check Level: 1, 60.93540646055553',`
      'Check Level: 2, 34.43543543967473',`
      'Check Level: 1, 99.99990646055553')

# A regular expression to match strings
$rex = [regex]::new('Check Level: (\d+),.*')

# Populate a hashtable with contents
$ht = @{}
$d | % {
    $level = $rex.Match($_).Groups[1].Value
    $line = $rex.Match($_).Groups[0].Value
    if ($ht.ContainsKey($level)) {
        # Handle duplicates here.
        $ht[$level] = $line
    }
    else {
        $ht.Add($level, $line)
    }
}
# Print the hashtable in key order.
$ht.GetEnumerator() | sort
Name Value
---- -----
0 Check Level: 0, 38.99607466333333
1 Check Level: 1, 99.99990646055553
2 Check Level: 2, 34.43543543967473
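To build the menu the question describes on top of that hashtable, a minimal sketch (rounding to whole minutes purely for display):
foreach ($entry in ($ht.GetEnumerator() | Sort-Object Name)) {
    $minutes = [math]::Round([double](($entry.Value -split ',\s*')[1]))
    "CheckLevel $($entry.Name) takes $minutes minutes"
}
$choice = Read-Host 'Which check level would you like to run?'
$ht[$choice]   # the stored line for the chosen level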
I get the below output from a PowerShell query. I don't have access to the server to run the query, so I have no option to influence the output format.
Example:
Name : folderC
FullName : D:\folderA\folderB\folderC
Length :
CreationTime : 2/8/2014 11:12:58 AM
LastAccessTime: 2/8/2014 11:12:58 AM
Name : filename.txt
FullName : D:\folderA\folderB\filename.txt
Length : 71560192
CreationTime : 11/25/2015 3:10:43 PM
LastAccessTime: 11/25/2015 3:10:43 PM
How can I format the above content into something more usable, maybe a table format like so:
Name|FullName|Length|CreationTime|LastAccessTime
I think you need to split the text into records, then replace the colons with equals signs so that you can use ConvertFrom-StringData to turn each record into a hashtable, which you can then feed into New-Object to convert into an object. Outputting the object as pipe-separated data can then be done with ConvertTo-Csv. Something like so:
$x = @"
Name : folderC
FullName : D:\folderA\folderB\folderC
Length : 0
CreationTime : 2/8/2014 11:12:58 AM
LastAccessTime : 2/8/2014 11:12:58 AM
Name : filename.txt
FullName : D:\folderA\folderB\filename.txt
Length : 71560192
CreationTime : 11/25/2015 3:10:43 PM
LastAccessTime : 11/25/2015 3:10:43 PM
"@

($x -split '[\r\n]+(?=Name)') | % {
    $_ -replace '\s+:\s+', '='
} | % {
    $_ | ConvertFrom-StringData
} | % {
    New-Object psobject -Property $_
} | ConvertTo-Csv -Delimiter '|' -NoTypeInformation
As @alroc notes in a comment on the question, it is possible that objects are available to the OP, given that they state that the output is "from a PowerShell query" - if so, simple reformatting of the object array using the usual cmdlets is an option.
By contrast, this answer assumes that only a text representation, as printed in the question, is available.
Dave Sexton's answer is a simpler and more elegant choice, if:
the input has no empty values (the OP's sample input does).
the input file is small enough to be read into memory as a whole.
Consider the approach below to avoid the issues above and/or if you want more control over how the input is converted into custom objects, notably with respect to creating properties with types other than [string]: extend the toObj() function below (as written, all properties are also just strings).
Get-Content File | % `
    -begin {
        function toObj([string[]] $lines) {
            $keysAndValues = $lines -split '(?<=^[^ :]+)\s*: '
            $htProps = @{}
            for ($i = 0; $i -lt $keysAndValues.Count; $i += 2) {
                $htProps.($keysAndValues[$i]) = $keysAndValues[$i+1]
            }
            return [PSCustomObject] $htProps
        }
        $lines = @()
    } `
    -process {
        if ($_.Trim() -ne '') {
            $lines += $_
        } else {
            if ($lines) { toObj $lines }
            $lines = @()
        }
    } `
    -end {
        if ($lines) { toObj $lines }
    } | Format-Table
Explanation:
Uses ForEach-Object (%) with separate begin, process, and end blocks.
The -begin block, executed once at the beginning:
Defines helper function toObj(), which converts a block of contiguous nonempty input lines to a single custom object.
toObj() splits an array of lines into an array of contiguous key-value elements, then converts that array to a hashtable, which is in turn converted to a custom object.
Initializes array $lines, which will store the lines of a single block of contiguous nonempty input lines.
The -process block, executed for each input line:
If the input line at hand is nonempty: Adds it to the current block of contiguous nonempty input lines stored in array $lines.
Otherwise: Submits the current block to toObj() for conversion to a custom object, and then resets the $lines array to start the next block. In effect, toObj() is invoked for each paragraph (run of nonempty lines).
The -end block, executed once at the end:
Submits the last paragraph to toObj() for conversion to a custom object.
Finally, the resulting array of custom objects is passed to Format-Table.
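If you want the pipe-separated layout from the question rather than a table, the trailing Format-Table can be swapped for the same ConvertTo-Csv trick used in Dave Sexton's answer, e.g.:
# ... same pipeline as above, but ending in:
} | ConvertTo-Csv -Delimiter '|' -NoTypeInformation | ForEach-Object { $_ -replace '"' }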