How to convert filecontent using powershell - powershell

I have a log file with a weird format that I would like to convert to a table. Each line contains multiple key-value pairs (the same pairs on each row). I want to convert these rows so that each property becomes a column in a table containing the value from the row.
Note that the original log file contains 39 properties on each row and the log file is about 80MB.
Example rows:
date=2019-12-02 srcip=8.8.8.8 destip=8.8.4.4 srcintf="port2"
date=2019-12-01 srcip=8.8.8.8 destip=8.8.4.4 srcintf="xyz abc"
date=2019-12-03 srcip=8.8.8.8 destip=8.8.4.4 srcintf="port2"
date=2019-12-05 srcip=8.8.8.8 destip=8.8.4.4 srcintf="port2"
date=2019-12-07 srcip=8.8.8.8 destip=8.8.4.4 srcintf="port2"
I have tried:
Get-Content .\testfile.log | select -First 10 | ConvertFrom-String | select p1, p2, p3 | ft | Format-Wide
But this will not turn the property names into column names. In this example I want p1 to be date, p2 srcip, and p3 destip, with the leading key= part removed from each value.
Anyone have any tips or creative ideas how to convert this to a table?

ConvertFrom-String provides separator-based parsing as well as heuristics-based parsing based on templates containing example values. The separator-based parsing applies automatic type conversions you cannot control, and the template language is poorly documented, with the exact behavior hard to predict - it's best to avoid this cmdlet altogether. Also note that it's not available in PowerShell [Core] v6+.
Instead, I suggest an approach based on the switch statement[1] and the -split operator to create a collection of custom objects ([pscustomobject]) representing the log lines:
# Use $objects = switch ... to capture the generated objects in a variable.
switch -File .\testfile.log { # Loop over all file lines
default {
$oht = [ordered] @{ } # Define an aux. ordered hashtable
foreach ($keyValue in -split $_) { # Loop over key-value pairs
$key, $value = $keyValue -split '=', 2 # Split pair into key and value
$oht[$key] = $value -replace '^"|"$' # Add to hashtable with "..." removed
}
[pscustomobject] $oht # Convert to custom object and output.
}
}
Note:
The above assumes that your values have no embedded spaces; if they do, more work is needed - see next section.
To capture the generated custom objects in a variable, simply use $objects = switch ...
With two or more log lines, $objects becomes an [object[]] array of [pscustomobject] instances. If you want to ensure that $objects also becomes an array even if there happens to be just one log line, use [array] $objects = switch ... ([array] is effectively the same as [object[]]).
To directly send the output objects through the pipeline to other cmdlets, enclose the switch statement in & { ... }
With your sample input, this yields:
date srcip destip srcintf
---- ----- ------ -------
2019-12-02 8.8.8.8 8.8.4.4 port2
2019-12-01 8.8.8.8 8.8.4.4 port2
2019-12-03 8.8.8.8 8.8.4.4 port2
2019-12-05 8.8.8.8 8.8.4.4 port2
2019-12-07 8.8.8.8 8.8.4.4 port2
Variant with support for values with embedded spaces inside "..." (e.g., srcintf="port 2"):
switch -file .\testfile.log {
default {
$oht = [ordered] @{ }
foreach ($keyValue in $_ -split '(\w+=(?:[^"][^ ]*|"[^"]*"))' -notmatch '^\s*$') {
$key, $value = $keyValue -split '=', 2
$oht[$key] = $value -replace '^"|"$'
}
[pscustomobject] $oht
}
}
Note that there's no support for embedded escaped " instances (e.g., srcintf="port \"2\"" won't work).
Explanation:
$_ -split '(\w+=(?:[^"][^ ]*|"[^"]*"))' splits by a regex that matches key=valueWithoutSpaces and key="value that may have spaces" tokens and, by virtue of enclosing the expression in (...) (creating a capture group), includes these "separators" in the tokens that -split outputs (by default, separators aren't included).
-notmatch '^\s*$' then weeds out empty and all-spaces tokens from the result (the "data tokens", which aren't of interest in our case), leaving effectively just the key-value pairs.
$key, $value = $keyValue -split '=', 2 splits the given key-value token by = into at most 2 tokens, and uses a destructuring assignment to assign the key and the value to separate variables.
$oht[$key] = $value -replace '^"|"$' adds an entry to the aux. hashtable with the key and value at hand, where -replace '^"|"$' uses the -replace operator to remove " from the beginning and end of the value, if present.
[1] switch -File is a flexible and much faster alternative to processing a file line by line with a combination of Get-Content and ForEach-Object.
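To see the tokenization step in isolation, here is a minimal sketch that applies the regex from the variant above to a single line taken from the question's sample data. The variable names ($line, $tokens, $obj) are chosen purely for illustration.

```powershell
# One sample line from the question, with a quoted value that contains a space.
$line = 'date=2019-12-01 srcip=8.8.8.8 destip=8.8.4.4 srcintf="xyz abc"'

# Split by key=value tokens; the capture group makes -split return the tokens
# themselves, and -notmatch drops the empty / whitespace-only leftovers.
$tokens = $line -split '(\w+=(?:[^"][^ ]*|"[^"]*"))' -notmatch '^\s*$'

$oht = [ordered] @{ }
foreach ($token in $tokens) {
    $key, $value = $token -split '=', 2    # split at the first = only
    $oht[$key] = $value -replace '^"|"$'   # strip enclosing double quotes
}
$obj = [pscustomobject] $oht
$obj
```

$tokens holds four key-value strings, and $obj.srcintf contains xyz abc without the enclosing double quotes.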

What you could do is turn each line into a hashtable of key-value pairs by passing it to ConvertFrom-StringData. There are a couple of caveats with this approach. To keep it simple, it assumes your source data is space-delimited, so it would break if your real data contained spaces inside values (which can be mitigated). The other obvious caveat is that you can't guarantee property order, hence the explicit Select-Object.
Get-Content c:\temp\so.txt | ForEach-Object{
[PSCustomObject](($_ -split " ") -join "`r`n" | ConvertFrom-StringData)
} | Select-Object date, srcip, destip, srcintf
Output:
date srcip destip srcintf
---- ----- ------ -------
2019-12-02 8.8.8.8 8.8.4.4 "port2"
2019-12-01 8.8.8.8 8.8.4.4 "port2"
2019-12-03 8.8.8.8 8.8.4.4 "port2"
2019-12-05 8.8.8.8 8.8.4.4 "port2"
2019-12-07 8.8.8.8 8.8.4.4 "port2"
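As the output shows, ConvertFrom-StringData keeps the double quotes around values. A minimal sketch of how they could be stripped afterwards, using one sample line from the question ($line, $table and $obj are illustrative names, and the Trim() step is an addition not present in the answer above):

```powershell
$line = 'date=2019-12-02 srcip=8.8.8.8 destip=8.8.4.4 srcintf="port2"'

# Put each key=value pair on its own line, as ConvertFrom-StringData expects.
$table = ($line -split ' ') -join "`n" | ConvertFrom-StringData

# The hashtable is unordered, so impose the column order explicitly.
$obj = [pscustomobject] $table | Select-Object date, srcip, destip, srcintf

# Remove the enclosing double quotes from the one quoted value.
$obj.srcintf = $obj.srcintf.Trim('"')
$obj
```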

OK, for the purposes of discussion, I am going to assume the following:
The data is in a file PSDATA.TXT
There are no spaces in the data other than the spaces separating the name-value pairs.
It is acceptable for the resulting tabular data to treat all the values as strings.
Given that...
Get-Content -Path PSDATA.TXT |
ForEach-Object {$_ -replace ' ','";' -replace '=','="' -replace '""','"'} |
ForEach-Object {New-Object PSObject -Property (Invoke-Expression ("[Ordered]@{{{0}}}" -f $_))}
... will generate a table where each line in the file becomes a PSObject with fields taking their names from the name in each name-value pair, and the associated value being the value of the field, as a string. [Ordered] requires PowerShell v3 or later; on older versions you have to omit it, with the side effect that the fields in the PSObject are not necessarily in the same order as in the file.
If you wanted to have an array of these PSObjects for further processing, you could wrap the whole line above in a variable assignment, e.g., $A=(«that whole thing above, on one line»), and if you wanted to send it to a CSV file, you could just add | Export-CSV -path NewCSVFile.CSV to the end.
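To make the -replace chain easier to follow, this sketch runs it on one sample line from the question and keeps the intermediate string in a variable. The names $literalBody and $source are illustrative only. Since Invoke-Expression executes whatever text it is given, this approach is only reasonable for log files you trust.

```powershell
$line = 'date=2019-12-02 srcip=8.8.8.8 destip=8.8.4.4 srcintf="port2"'

# Turn the space-separated pairs into the body of a hashtable literal:
#   ' ' -> '";'  closes each value and separates the entries
#   '=' -> '="'  opens each value
#   '""' -> '"'  undoes the doubled quote on values that were already quoted
$literalBody = $line -replace ' ', '";' -replace '=', '="' -replace '""', '"'

# {{ and }} are escaped braces for the -f operator.
$source = '[Ordered]@{{{0}}}' -f $literalBody

$obj = New-Object PSObject -Property (Invoke-Expression $source)
$obj
```

For this line, $literalBody is date="2019-12-02";srcip="8.8.8.8";destip="8.8.4.4";srcintf="port2".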

I would prefer a datatable, so you easily can sort, filter, merge etc. the logfile:
$logFilePath = 'C:\test\test.log'
$dt = New-Object system.Data.DataTable
[void]$dt.Columns.Add('P1',[string]::empty.GetType() )
[void]$dt.Columns.Add('P2',[string]::empty.GetType() )
[void]$dt.Columns.Add('P3',[string]::empty.GetType() )
foreach( $line in [System.IO.File]::ReadLines($logFilePath) )
{
$tokenArray = $line -split '[= ]'
$row = $dt.NewRow()
$row.P1 = $tokenArray[1]
$row.P2 = $tokenArray[3]
$row.P3 = $tokenArray[5]
[void]$dt.Rows.Add( $row )
}
$dt
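As a rough illustration of the sorting and filtering mentioned above, the following sketch builds a small DataTable by hand and queries it with DataTable.Select(). The two rows are made-up sample values, and $sorted and $filtered are illustrative names.

```powershell
$dt = New-Object System.Data.DataTable
[void] $dt.Columns.Add('P1', [string])
[void] $dt.Columns.Add('P2', [string])

# Two hand-written rows standing in for parsed log lines.
foreach ($pair in @(@('2019-12-02', '8.8.8.8'), @('2019-12-01', '8.8.4.4'))) {
    $row = $dt.NewRow()
    $row.P1 = $pair[0]
    $row.P2 = $pair[1]
    [void] $dt.Rows.Add($row)
}

# Select(filterExpression, sort): an empty filter returns all rows, sorted by P1.
$sorted = $dt.Select('', 'P1 ASC')

# Select(filterExpression): only the rows whose P2 column matches.
$filtered = $dt.Select("P2 = '8.8.4.4'")
```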

Related

How to make netstat output's headings show properly in out-gridview?

when I use:
netstat -f | out-gridview
in PowerShell 7.3, I get the window, but it has only one column, which is a string. I don't know why it's not properly creating a column for each of the headings such as Proto, Local Address, etc.
How can I fix this?
While commenter Toni makes a good point to use Get-NetTCPConnection | Out-GridView instead, this answer addresses the question as asked.
To be able to show output of netstat in grid view, we have to parse its textual output into objects.
Fortunately, all fields are separated by at least two space characters, so after replacing these with comma, we can simply use ConvertFrom-CSV (thanks to an idea of commenter Doug Maurer).
netstat -f |
# Skip unwanted lines at the beginning
Select-Object -skip 3 |
# Replace two or more white space characters by comma, except at start of line
ForEach-Object { $_ -replace '(?<!^)\s{2,}', ',' } |
# Convert into an object and add it to grid view
ConvertFrom-Csv | Out-GridView
For a detailed explanation of the RegEx pattern used with the -replace operator, see this RegEx101 demo page.
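The effect of the pattern can also be checked on a couple of hand-written lines. The two lines below are made-up stand-ins for netstat output (two leading spaces, columns separated by two or more spaces), not captured from a real run.

```powershell
# Hypothetical netstat-style lines.
$lines = @(
    '  Proto  Local Address          Foreign Address        State'
    '  TCP    127.0.0.1:5040         example-host:0         LISTENING'
)

# Runs of 2+ whitespace characters become commas; the negative lookbehind
# (?<!^) keeps the two-space indentation at the start of the line untouched.
$csvLines = $lines | ForEach-Object { $_ -replace '(?<!^)\s{2,}', ',' }

$rows = $csvLines | ConvertFrom-Csv
$rows.State
```

Note that the single space inside Local Address survives, because only runs of two or more whitespace characters are replaced.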
This is the code of my original answer, which is functionally equivalent. I'll keep it as an example of how choosing the right tool for the job can greatly simplify code.
$headers = @()
# Skip first 3 lines of output which we don't need
netstat -f | Select-Object -skip 3 | ForEach-Object {
# Split each line into columns
$columns = $_.Trim() -split '\s{2,}'
if( -not $headers ) {
# First line is the header row
$headers = $columns
}
else {
# Create an ordered hashtable
$objectProperties = [ordered] @{}
$i = 0
# Loop over the columns and use the header columns as property names
foreach( $key in $headers ) {
$objectProperties[ $key ] = $columns[ $i++ ]
}
# Convert the hashtable into an object that can be shown by Out-GridView
[PSCustomObject] $objectProperties
}
} | Out-GridView

Powershell: Import-csv, rename all headers

In our company there are many users and many applications with restricted access, and a database that records those accesses. I don't have access to that database, but what I do have is an automatically generated (once a day) CSV file with all accesses of all my users. I want them to have a chance to check their access situation, so I am writing a simple PowerShell script for this purpose.
CSV:
user;database1_dat;database2_dat;database3_dat
john;0;0;1
peter;1;0;1
I can do:
import-csv foo.csv | where {$_.user -eq $user}
But this will show me the original ugly headers (with the "_dat" suffix). Can I delete the last four characters from every header that ends with "_dat" when I can't predict how many headers there will be tomorrow?
I am aware of calculated property like:
Select-Object @{ expression={$_.database1_dat}; label='database1' }
but I have to know all column names for that, as far as I know.
Am I condemned to "over-engineer" it with a separate function that builds the whole "calculated property expression" dynamically from scratch, or is there a simple way I am missing?
Thanks :-)
Assuming that file foo.csv fits into memory as a whole, the following solution performs well:
If you need a memory-throttled - but invariably much slower - solution, see Santiago Squarzon's helpful answer or the alternative approach in the bottom section.
$headerRow, $dataRows = (Get-Content -Raw foo.csv) -split '\r?\n', 2
# You can pipe the result to `where {$_.user -eq $user}`
ConvertFrom-Csv ($headerRow -replace '_dat(?=;|$)'), $dataRows -Delimiter ';'
Get-Content -Raw reads the entire file into memory, which is much faster than reading it line by line (the default).
-split '\r?\n', 2 splits the resulting multi-line string into two: the header line and all remaining lines.
Regex \r?\n matches a newline (both a CRLF (\r\n) and a LF-only newline (\n))
, 2 limits the number of tokens to return to 2, meaning that splitting stops once the 1st token (the header row) has been found, and the remainder of the input string (comprising all data rows) is returned as-is as the last token.
The multi-assignment $headerRow, $dataRows = ... then stores the first token (the header line) in $headerRow and the second token (all data rows, as a single multi-line string) in $dataRows.
$headerRow -replace '_dat(?=;|$)'
-replace '_dat(?=;|$)' uses a regex to remove any _dat column-name suffixes (followed by a ; or the end of the string); if substring _dat only ever occurs as a name suffix (not also inside names), you can simplify to -replace '_dat'
ConvertFrom-Csv directly accepts arrays of strings, so the cleaned-up header row and the string with all data rows can be passed as-is.
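A small sketch of why the lookahead matters, using the header from the question plus one made-up header (my_data_dat) in which _dat also occurs inside the name. The data rows are passed as separate strings here rather than as one multi-line string, purely to keep the sketch self-contained.

```powershell
# Header from the question: every _dat suffix is removed.
$header = 'user;database1_dat;database2_dat;database3_dat'
$cleanHeader = $header -replace '_dat(?=;|$)'

# Hypothetical header where _dat also appears inside a name.
$tricky = 'my_data_dat;x'
$withLookahead    = $tricky -replace '_dat(?=;|$)'   # suffix only
$withoutLookahead = $tricky -replace '_dat'          # every occurrence

# Feed the cleaned header plus the question's data rows to ConvertFrom-Csv.
$rows = $cleanHeader, 'john;0;0;1', 'peter;1;0;1' | ConvertFrom-Csv -Delimiter ';'
$rows | Where-Object user -eq 'peter'
```

With the lookahead, my_data_dat becomes my_data; without it, the name is mangled to mya.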
Alternative solution: algorithmic renaming of an object's properties:
Note: This solution is slow, but may be an option if you only extract a few objects from the CSV file.
As you note in the question, use of Select-Object with calculated properties is not an option in your case, because you neither know the column names nor their number in advance.
However, you can use a ForEach-Object command in which you use .psobject.Properties, an intrinsic member, for reflection on the input objects:
Import-Csv -Delimiter ';' foo.csv | where { $_.user -eq $user } | ForEach-Object {
# Initialize an aux. ordered hashtable to store the renamed
# property name-value pairs.
$renamedProperties = [ordered] @{}
# Process all properties of the input object and
# add them with cleaned-up names to the hashtable.
foreach ($prop in $_.psobject.Properties) {
$renamedProperties[($prop.Name -replace '_dat(?=.|$)')] = $prop.Value
}
# Convert the aux. hashtable to a custom object and output it.
[pscustomobject] $renamedProperties
}
You can do something like this:
$textInfo = (Get-Culture).TextInfo
$headers = (Get-Content .\test.csv | Select-Object -First 1).Split(';') |
ForEach-Object {
$textInfo.ToTitleCase($_) -replace '_dat'
}
$user = 'peter'
Get-Content .\test.csv | Select-Object -Skip 1 |
ConvertFrom-Csv -Delimiter ';' -Header $headers |
Where-Object User -EQ $user
User Database1 Database2 Database3
---- --------- --------- ---------
peter 1 0 1
Not super efficient but does the trick.

Powershell output in one line

I'm pretty new when it comes to scripting with PowerShell (or to scripting in general). The problem I have is that I've got a bunch of variables I want to output on one line. Here is simplified (not the original) code:
$a = 1
$b = 2
$c = $a; $b;
Write-output $c
The output looks like this:
1
2
You may guess how I want the output to look:
12
I've searched the net for a solution but nothing seems to work. What am I doing wrong?
Right now you're only assigning $a to $c and then outputting $b separately - use the @() array subexpression operator to create $c instead:
$c = @($a; $b)
Then, use the -join operator to concatenate the two values into a single string:
$c -join ''
You can make things easier on yourself using member access or Select-Object to retrieve property values. Once the values are retrieved, you can then manipulate them.
It is not completely clear what you really need, but the following is a blueprint of how to get the desired system data from your code.
# Get Serial Number
$serial = Get-CimInstance CIM_BIOSElement | Select-Object -Expand SerialNumber
# Serial Without Last Digit
$serialMinusLast = $serial -replace '.$'
# First 7 characters of Serial Number
# Only works when serial is 7 or more characters
$serial.Substring(0,7)
# Always works
$serial -replace '(?<=^.{7}).*$'
# Get Model
$model = Get-CimInstance Win32_ComputerSystem | Select-Object -Expand Model
# Get First Character and Last 4 Characters of Model
$modelSubString = $model -replace '^(.).*(.{4})$','$1$2'
# Output <model substring - serial number substring>
"{0}-{1}" -f $modelSubString,$serialMinusLast
# Output <model - serial number>
"{0}-{1}" -f $model,$serial
Using the syntax $object | Select-Object -Expand Property will retrieve the value of Property only due to the use of -Expand or -ExpandProperty. You could opt to use member access, which uses the syntax $object.Property, to achieve the same result.
If you have an array of elements, you can use the -join operator to create a single string of those array elements.
$array = 1,2,3,'go'
# single string
$array -join ''
The string format operator -f can be used to join components into a single string. It allows you to easily add extra characters between the substrings.
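Applied to the two variables from the question, the two operators could be used as follows. This is a minimal sketch; $joined and $formatted are illustrative names, and the hyphen separator is just an example of an extra character.

```powershell
$a = 1
$b = 2

# -join with an empty separator concatenates the array elements.
$joined = @($a; $b) -join ''

# -f fills the numbered placeholders and lets you add characters in between.
$formatted = '{0}-{1}' -f $a, $b

$joined
$formatted
```

$joined is the string 12 and $formatted is 1-2.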

Cleanup huge text file containing domain

I have a database that contains a log of domains listed in the following manner:
.youtube.com
.ziprecruiter.com
0.etsystatic.com
0.sparkpost.com
00.mail.ne1.yahoo.com
00072e01.pphosted.com
00111b01.pphosted.com
001d4f01.pphosted.com
011.mail.bf1.yahoo.com
1.amazonaws.com
How would I go about cleaning them up using PowerShell or grep (I'd rather use PowerShell), so that they contain only the root domain with the .com extension, removing whatever word and . come before that?
I'm thinking the best way to do this is a query that looks for dots from right to left and removes the second dot and whatever comes after it. For example, for 1.amazonaws.com we remove the second dot from the right and whatever is after it.
i.e.
youtube.com
ziprecruiter.com
etsystatic.com
yahoo.com
pphosted.com
amazonaws.com
You can read each line into an array of strings with Get-Content, Split on "." using Split(), get the last two items with [-2,-1], then join the array back up using -join. We can then retrieve unique items using -Unique from Select-Object.
Get-Content -Path .\database_export.txt | ForEach-Object {
$_.Split('.')[-2,-1] -join '.'
} | Select-Object -Unique
Or using Select-Object -Last 2 to fetch the last two items, then piping to Join-String.
Get-Content -Path .\database_export.txt | ForEach-Object {
$_.Split('.') | Select-Object -Last 2 | Join-String -Separator '.'
} | Select-Object -Unique
Output:
youtube.com
ziprecruiter.com
etsystatic.com
sparkpost.com
yahoo.com
pphosted.com
amazonaws.com
You can use the String.Trim() method to clean leading and trailing dots, then use the regex -replace operator to remove everything but the top- and second-level domain name:
$strings = Get-Content database_export.txt
@($strings |ForEach-Object Trim '.') -replace '.*?(\w+\.\w+)$','$1' |Sort-Object -Unique
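A quick way to check the pattern without a file is to run it on a few of the sample entries from the question. In this sketch $samples and $roots are illustrative names. Because \w does not match a hyphen, a name such as my-site.com would be cut down to site.com, so the pattern would need adjusting if the real data contains hyphenated domains.

```powershell
# A few entries taken from the question's sample data.
$samples = '.youtube.com', '00.mail.ne1.yahoo.com', '00072e01.pphosted.com', '011.mail.bf1.yahoo.com'

# Trim stray dots, keep only the last two dot-separated labels, de-duplicate.
# The lazy .*? consumes as little as possible before the final word.word pair.
$roots = @($samples | ForEach-Object Trim '.') -replace '.*?(\w+\.\w+)$', '$1' |
    Sort-Object -Unique

$roots
```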
here is yet another method. [grin]
what it does ...
creates an array of strings to work with
when ready to do this for real, remove the entire #region/#endregion section and use Get-Content to load the file.
iterates thru the $InStuff collection of strings
splits the current item on the dots
grabs the last two items in the resulting array
joins them with a dot
outputs the new string to the $Results collection
shows that on screen
the code ...
#region >>> fake reading in a text file
# in real life, use Get-Content
$InStuff = @'
.youtube.com
.ziprecruiter.com
0.etsystatic.com
0.sparkpost.com
00.mail.ne1.yahoo.com
00072e01.pphosted.com
00111b01.pphosted.com
001d4f01.pphosted.com
011.mail.bf1.yahoo.com
1.amazonaws.com
'@ -split [System.Environment]::NewLine
#endregion >>> fake reading in a text file
$Results = foreach ($IS_Item in $InStuff)
{
$IS_Item.Split('.')[-2, -1] -join '.'
}
$Results
output ...
youtube.com
ziprecruiter.com
etsystatic.com
sparkpost.com
yahoo.com
pphosted.com
pphosted.com
pphosted.com
yahoo.com
amazonaws.com
please note that this code expects the strings to be more-or-less-valid URLs. i can think of invalid ones that end with a dot ... and those would fail. if you need to deal with such, add the needed validation code.
another idea ... if the file is large [tens of thousands of strings], you may want to use the ForEach-Object pipeline cmdlet [as shown by RoadRunner] to save RAM at the expense of speed.

Format table from output of a string

I have tried different ways but am not able to format the data into a table:
$str1 = "First string"
$str2 = "Sec string"
$str3 = "third str"
$str4 = "fourth string"
$str = "$str1 $str2 `r`n"
$str+= "$str3 $str4"
write-host $str | Format-Table
I am looking to create output like below:
First string Sec string
third str fourth string
In order to use Format-Table as intended, you need objects with properties rather than mere strings:
$str -split "`r`n" | ForEach-Object {
# Initialize a custom object whose properties will reflect
# the input line's tokens (column values).
$obj = New-Object PSCustomObject; $i = 0
# Add each whitespace-separated token as a property.
foreach ($token in -split $_) {
Add-Member -InputObject $obj -NotePropertyName ('col' + ++$i) -NotePropertyValue $token
}
# Output the custom object.
$obj
} | Format-Table -HideTableHeaders
$str -split "`r`n" splits the multi-line string into individual lines and sends them through the pipeline one by one.
The ForEach-Object command constructs a custom object from each line whose properties are the whitespace-separated tokens on the line, as described in the comments; the property names - which don't matter for the output - are auto-generated as col1, col2, ...
Note: This does not match your desired output exactly in that each space (run of whitespace) is treated as a separator. If you wanted to treat the original $str1, $str2, ... variable values (e.g., First string) each as a single column value, you'd have to make assumptions about how to tokenize the line.
For instance, if the assumption is that 2 consecutive words form a single value, replace -split $_ above with $_ -split '(\w+ \w+) ?' -ne ''
If you didn't want to rely on assumptions, you'd have to construct your input strings with embedded quoting so as to unambiguously indicate token boundaries (the code would then have to be modified to parse the embedded quoting correctly).
Format-Table then displays the custom objects in tabular form, with columns properly aligned; -HideTableHeaders suppresses the header line (the auto-generated property names).
With your sample input, the above yields the following, produced without -HideTableHeaders so as to better illustrate what the code does:
col1 col2 col3 col4
---- ---- ---- ----
First string Sec string
third str fourth string
Ditto, but with the 2-consecutive-words splitting logic:
col1 col2
---- ----
First string Sec string
third str fourth string
As for what you tried:
Do not use Write-Host to produce data output: Write-Host output (by default) goes to the console and bypasses the pipeline, so that Format-Table receives no input and has no effect here.
That said, even if Format-Table did receive input (by using $str by itself, without Write-Host, i.e.: $str | Format-Table), it would have no (visible) effect on strings, which are always rendered as-is.
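If the individual values are still available as separate variables, as they are in the question, one way to sidestep the tokenizing problem entirely is to build the objects directly instead of first concatenating everything into one string. This is a different technique from the string-splitting approach above, sketched here with the illustrative property names col1 and col2:

```powershell
$str1 = 'First string'
$str2 = 'Sec string'
$str3 = 'third str'
$str4 = 'fourth string'

# One object per output row; each variable becomes one column value,
# so embedded spaces are never ambiguous.
$rows = @(
    [pscustomobject] @{ col1 = $str1; col2 = $str2 }
    [pscustomobject] @{ col1 = $str3; col2 = $str4 }
)

$rows | Format-Table -HideTableHeaders
```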