Get substring of a value when using Import-Csv in PowerShell

I have a PowerShell script that imports a CSV file, filters out rows from two columns and then concatenates a string and exports to a new CSV file.
Import-Csv "redirect_and_canonical_chains.csv" |
Where { $_."Number of Redirects" -gt 1} |
Select {"Redirect 301 ",$_.Address, $_."Final Address"} |
Export-Csv "testing-export.csv" –NoTypeInformation
This all works fine. However, for the $_.Address value I want to strip the protocol, sub-domain and domain using the following regex:
^(?:https?:\/\/)?(?:[^@\/\n]+@)?(?:www\.)?([^:\/\n]+)
On its own this matches as I want, but I am not sure of the best way to apply it when selecting the data (should I use $Matches, -replace, etc.?) or whether I should do it after importing.
Any advice greatly appreciated!
Many thanks
Mike

The best place to do it would be in the select clause, as in:
select Property1,Property2,@{name='NewProperty';expression={$_.Property3 -replace '<regex>',''}}
That's what a calculated property is: you give the name, and the way to create it. Your regex might need revision to work with PowerShell, though.
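As a minimal sketch of that calculated-property approach applied to the question's columns: the row below is made-up sample input standing in for the real CSV, and the property name Path is an arbitrary choice, not something from the original script.

```powershell
# Hypothetical sample row standing in for redirect_and_canonical_chains.csv
$rows = @'
Address,Final Address,Number of Redirects
http://www.example.org/old/page,https://www.example.org/new/page,2
'@ | ConvertFrom-Csv

# The question's regex, written without the unneeded escaping of '/'
$regex = '^(?:https?://)?(?:[^@/\n]+@)?(?:www\.)?([^:/\n]+)'

# 'Path' is a calculated property: a name plus a script block that computes the value
$result = $rows | Select-Object @{
    Name       = 'Path'
    Expression = { $_.Address -replace $regex, '' }
}, 'Final Address'

$result
```

Because the output is still an object with named columns, it can be piped to Export-Csv as in the original script.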

I've realized now that I can just use .Replace in the following way :)
Select {"Redirect 301 ",$_.Address.Replace('http://', 'testing'), $_."Final Address"}
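One caveat worth keeping in mind: the .Replace() method does a literal, case-sensitive replacement, whereas the -replace operator is regex-based and case-insensitive by default. A small sketch with a made-up address shows the difference:

```powershell
# Hypothetical value with an upper-case scheme
$address = 'HTTP://www.example.org/more/stuff'

# .Replace() is literal and case-sensitive, so 'http://' does not match here
$viaMethod = $address.Replace('http://', '')

# -replace treats the pattern as a regex and ignores case by default
$viaOperator = $address -replace '^https?://', ''

$viaMethod
$viaOperator
```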

Based on follow-up comments, the intent behind your Select[-Object] call was to create a single string with space-separated entries from each input object.
Note that use of Export-Csv then makes no sense, because it will create a single Length column with the input strings' length rather than output the strings themselves.
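The Length-column effect can be reproduced without touching the file system by using ConvertTo-Csv, which formats its input the same way Export-Csv does. The two input strings here are made up for illustration:

```powershell
# Hypothetical already-joined strings, as the pipeline would produce them
$strings = 'Redirect 301 /a /b', 'Redirect 301 /c /d'

# Strings are objects whose only property is Length, so that is what gets serialized
$csv = $strings | ConvertTo-Csv -NoTypeInformation

$csv
```

The result contains a "Length" header followed by the length of each string, not the strings themselves.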
In a follow-up comment you posted a solution that used Write-Host to produce the output string, but Write-Host is generally the wrong tool to use, unless the intent is explicitly to write to the display only, thereby bypassing PowerShell's output streams and thus the ability to send the output to other commands, capture it in a variable or redirect it to a file.
Here's a fixed version of your command, which uses the -join operator to join the elements of a string array to output a single, space-separated string:
$sampleCsvInput = [pscustomobject] @{
    Address = 'http://www.example.org/more/stuff';
    'Final Address' = 'more/stuff2'
}
$sampleCsvInput | ForEach-Object {
    # No trailing space in the first element: -join already inserts the separator.
    "Redirect 301",
    ($_.Address -replace '^(?:https?://)?(?:[^@/\n]+@)?(?:www\.)?([^:/\n]+)', ''),
    $_.'Final Address' -join ' '
}
Note that , - PowerShell's array-construction operator - has higher precedence than the -join operator, so the -join operation indeed joins all 3 preceding array elements.
The above yields the following string:
Redirect 301 /more/stuff more/stuff2
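Putting the pieces together, a sketch of the whole pipeline might look like the following. The sample rows, the [int] cast on the redirect count, and the output file name testing-export.txt in the temp folder are all assumptions made for illustration; the real script would start from Import-Csv instead.

```powershell
$regex = '^(?:https?://)?(?:[^@/\n]+@)?(?:www\.)?([^:/\n]+)'

# Hypothetical rows standing in for Import-Csv "redirect_and_canonical_chains.csv"
$rows = @'
Address,Final Address,Number of Redirects
http://www.example.org/a,/b,2
http://www.example.org/c,/d,1
'@ | ConvertFrom-Csv

$lines = $rows |
    # CSV values are strings; the cast makes the comparison numeric
    Where-Object { [int] $_.'Number of Redirects' -gt 1 } |
    ForEach-Object {
        'Redirect 301', ($_.Address -replace $regex, ''), $_.'Final Address' -join ' '
    }

# Plain strings belong in a text file, so Set-Content replaces Export-Csv here
$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'testing-export.txt'
$lines | Set-Content -LiteralPath $outFile
```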

Related

Having problem with split method using powershell

I have an XML file that contains a line like this:
<!--<__AMAZONSITE id="-123456780" instance ="CATZ00124"__/>-->
I need the id and instance values from that line,
i.e. -123456780 and CATZ00124, in two separate variables.
Below is the sample code I have tried:
$xmlfile = 'D:\Test\sample.xml'
$find_string = '__AMAZONSITE'
$array = @((Get-Content $xmlfile) | select-string $find_string)
Write-Host $array.Length
foreach ($commentedline in $array)
{
Write-Host $commentedline.Line.Split('id=')
}
I am getting below result:
<!--<__AMAZONSITE
"-123456780"
nstance
"CATZ00124"__/>
The preferred way still is to use XML tools for XML files.
As long as a line with AMAZONSITE and instance is unique in the file, this could do:
## Q:\Test\2019\09\13\SO_57923292.ps1
$xmlfile = 'D:\Test\sample.xml' # '.\sample.xml' #
## see following RegEx live and with explanation on https://regex101.com/r/w34ieh/1
$RE = '(?<=AMAZONSITE id=")(?<id>[\d-]+)" instance ="(?<instance>[^"]+)"'
if((Get-Content $xmlfile -raw) -match $RE){
$AmazonSiteID = $Matches.id
$Instance = $Matches.instance
}
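For completeness, here is a sketch of what the XML-tools route could look like. Since the data sits inside an XML comment, the parser only gets you as far as the comment text, and a regex is still needed for the attribute values. The surrounding document (root and item elements) is invented for the example; for a real file you would load it with [xml] (Get-Content -Raw $xmlfile) instead.

```powershell
# Hypothetical document; only the comment line comes from the question
[xml] $doc = @'
<root>
  <!--<__AMAZONSITE id="-123456780" instance ="CATZ00124"__/>-->
  <item name="unrelated"/>
</root>
'@

# '//comment()' is an XPath expression that selects every comment node
$sites = foreach ($comment in $doc.SelectNodes('//comment()')) {
    if ($comment.Value -match '__AMAZONSITE id="(?<id>[^"]+)" instance ="(?<instance>[^"]+)"') {
        [pscustomobject] @{ id = $Matches.id; instance = $Matches.instance }
    }
}

$sites
```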
LotPings' answer sensibly recommends using a regular expression with capture groups to extract the substrings of interest from each matching line.
You can incorporate that into your Select-String call for a single-pipeline solution (the assumption is that the XML comments of interest are all on a single line each):
# Define the regex to use with Select-String, which both
# matches the lines of interest and captures the substrings of interest
# ('id' and 'instance' attributes) via capture groups, (...)
$regex = '<!--<__AMAZONSITE id="(.+?)" instance ="(.+?)"__/>-->'
Select-String -LiteralPath $xmlfile -Pattern $regex | ForEach-Object {
# Output a custom object with properties reflecting
# the substrings of interest reported by the capture groups.
[pscustomobject] @{
id = $_.Matches.Groups[1].Value
instance = $_.Matches.Groups[2].Value
}
}
The result is an array of custom objects that each have an .id and .instance property with the values of interest (which is preferable to setting individual variables); in the console, the output would look something like this:
id         instance
--         --------
-123456780 CATZ00124
-123456781 CATZ00125
-123456782 CATZ00126
As for what you tried:
Note: I'm discussing your use of .Split(), though for extracting a substring, as is your intent, .Split() is not the best tool, given that it is only the first step toward isolating the substring of interest.
As LotPings notes in a comment, in Windows PowerShell, $commentedline.Line.Split('id=') causes the String.Split() method to split the input string by any of the individual characters in split string 'id=', because the method overload that Windows PowerShell selects takes a char[] value, i.e. an array of characters, which is not your intent.
You could rectify this as follows, by forcing use of the overload that accepts string[] (even though you're only passing one string), which also requires passing an options argument:
$commentedline.Line.Split([string[]] 'id=', 'None') # OK, splits by whole string
Note that in PowerShell Core the logic is reversed, because .NET Core introduced a new overload with just [string] (with an optional options argument), which PowerShell Core selects by default. Conversely, this means that if you do want by-any-character splitting in PowerShell Core, you must cast the split string to [char[]].
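To sidestep the edition difference altogether, you can make the choice of overload explicit with a cast in both cases. The sketch below uses the line from the question; the variable names are arbitrary.

```powershell
$line = '<!--<__AMAZONSITE id="-123456780" instance ="CATZ00124"__/>-->'

# Whole-string split: the [string[]] cast plus an options argument selects the string[] overload
$byString = $line.Split([string[]] 'id=', [System.StringSplitOptions]::None)

# Any-character split: the [char[]] cast selects the char[] overload, so 'i', 'd' and '=' each act as separators
$byChars = $line.Split([char[]] 'id=')

$byString.Count
$byChars.Count
```

With the explicit casts, the first call yields two parts and the second yields six (including two empty strings and the truncated 'nstance ' seen in the question's output), regardless of edition.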
On a general note, PowerShell has the -split operator, which is regex-based and offers much more flexibility than String.Split() - see this answer.
Applied to your case:
$commentedline.Line -split 'id='
While id= is interpreted as a regex by -split, that makes no difference here, given that the string contains no regex metacharacters (characters with special meaning); if you do want to safely split by a literal substring, use [regex]::Escape('...') as the RHS.
Note that -split is case-insensitive by default, as PowerShell generally is; however, you can use the -csplit variant for case-sensitive matching.
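A short sketch of those three points, using made-up input strings:

```powershell
$line = 'ID=1;id=2;Id=3'   # hypothetical input with mixed-case separators

# -split ignores case, so all three variants of 'id=' act as separators
$insensitive = $line -split 'id='

# -csplit is case-sensitive, so only the lower-case 'id=' is a separator
$sensitive = $line -csplit 'id='

# '.' is a regex metacharacter; [regex]::Escape() makes it a literal dot
$literal = 'a.b.c' -split [regex]::Escape('.')

$insensitive.Count
$sensitive.Count
$literal.Count
```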

Stripping Data From a String In Powershell

I'm pulling the hostnames from all computers in an AD domain and the current command formats it in url form with the hostname at the end. I just need the hostnames so I'd like to strip everything to the left of the last forward slash.
(([adsi]"WinNT://$((Get-WMIObject Win32_ComputerSystem).Domain)").Children).Where({$_.schemaclassname -eq 'computer'}) | %{ $_.Path }
It's outputting as it should, but I only need the hostname: instead of WinNT://subdomain.somedomain.local/hostname I'd like to get just hostname, which I would then redirect to an output file.
You can use the -Split operator to help retrieve the data:
"WinNT://subdomain.somedomain.local/hostname" -Split "/" | Select-Object -Last 1
-Split "/" separates the value into an array of substrings using / as a delimiter. You can access the resulting parts using array indexes or Select-Object. Since you want the last value, you could alternatively access [-1] index of the resulting array (("WinNT://subdomain.somedomain.local/hostname" -Split "/")[-1]).
See About Split for more information and examples.
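Applied to a whole list of paths, the same idea could be sketched like this. The two paths and the output file name hostnames.txt are placeholders; in the real script the paths would come from the ADSI query's $_.Path values.

```powershell
# Hypothetical paths standing in for the output of the ADSI query
$paths = @(
    'WinNT://subdomain.somedomain.local/host1'
    'WinNT://subdomain.somedomain.local/host2'
)

# Split each path on '/' and keep only the last segment
$hostnames = $paths | ForEach-Object { ($_ -split '/')[-1] }

# Write the bare hostnames to a file (the temp folder is used here only for the demo)
$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'hostnames.txt'
$hostnames | Set-Content -LiteralPath $outFile
```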
Just posting another option, and something else that may be useful. You can also split strings by their last index, which is the last time a character appears in it. From there you can use the Substring method to select the remainder of the string.
$lio = "WinNT://subdomain.somedomain.local/hostname".LastIndexOf('/')
"WinNT://subdomain.somedomain.local/hostname".Substring($lio + 1) # +1 to not include the slash
You can see all the methods for a string here
For things like this, I would also suggest looking at the ActiveDirectory module. You can run Get-ADComputer and select specific fields really easily.

Combining test string from CSV with

I have a CSV file containing user aliases in first.last name format.
I am attempting to pull these aliases one-by-one and combine them with our domain, to create their email address.
Here is my code:
$CSV = Import-CSV "\\this\is\thepath\to\the\csv.csv"
$domain = "@domain.co.uk"
$CSV.Alias | ForEach-Object (
Write-Host ($CSV.Alias + $domain)
)
The output I need is:
John.Doe@domain.co.uk Jane.Doe@domain.co.uk John.Smith@domain.co.uk
However, this is what's being output:
John.Doe Jane.Doe John.Smith @domain.co.uk
Try this:
$CSV = Import-CSV '\\this\is\thepath\to\the\csv.csv'
$domain = '@domain.co.uk'
$CSV.Alias | ForEach-Object {
    Write-Host ($_ + $domain)
}
If I apply it to this input file:
Alias
John.Doe
Jane.Doe
John.Smith
I get this output.
John.Doe@domain.co.uk
Jane.Doe@domain.co.uk
John.Smith@domain.co.uk
All I did was to correct two bugs in your attempt, the two bugs that were pointed out in the comments to the question: ForEach-Object needs a script block in braces rather than parentheses, and inside it the current item is $_, not $CSV.Alias. Plus I made one cosmetic change.
The cosmetic change was to use single quotes in a couple of places where you used double quotes. If you don't need the transformations that double quotes invoke, don't use them. Use single quotes instead. This is just defensive coding.
I kept the parentheses surrounding the argument to Write-Host. They are required here: without them, Write-Host $_ + $domain is parsed as three separate arguments and prints John.Doe + @domain.co.uk.
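If the addresses are needed for anything beyond display, it may be more useful to emit them as pipeline output rather than through Write-Host, so they can be captured or exported. A sketch under that assumption, with the sample aliases inlined in place of the real CSV path and $emails as an arbitrary variable name:

```powershell
# Inline sample data standing in for Import-Csv on the real file
$CSV = @'
Alias
John.Doe
Jane.Doe
John.Smith
'@ | ConvertFrom-Csv

$domain = '@domain.co.uk'

# The concatenated string is simply output, so it can be captured in a variable
$emails = $CSV.Alias | ForEach-Object { $_ + $domain }

$emails
```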

Reformat column names in a csv with PowerShell

Question
How do I reformat an unknown CSV column name according to a formula or subroutine (e.g. rename column " Arbitrary Column Name " to "Arbitrary Column Name" by running a trim or regex or something) while maintaining data?
Goal
I'm trying to more or less sanitize columns (the names) in a hand-produced (or at least hand-edited) csv file that needs to be processed by an existing PowerShell script. In this specific case, the columns have spaces that would be removed by a call to [String]::Trim(), or which could be ignored with an appropriate regex, but I can't figure a way to call or use those techniques when importing or processing a CSV.
Short Background
Most files and columns have historically been entered into the CSV properly, but recently a few columns were being dropped during processing; I determined it was because the files contained a space (e.g., Select-Object was being told to get "RFC", but Import-CSV retrieved "RFC ", so no matchy-matchy). Telling the customer to enter it correctly by hand (though preferred and much simpler) is not an option in this case.
Options considered
I could manually process the text of the file, but that is a messy and error prone way to re-invent the wheel. I wonder if there's a syntax with Select-Object that would allow a softer match for column names, but I can't find that info.
The closest I have come conceptually is using a calculated property in the call to Select-Object to rename the column, but I can only find ways to rename a known column to another known column. So, this would require enumerating the columns and matching them exactly (preferred) or a softer match (like comparing after trimming or matching via regex as a fallback) with expected column names, then creating a collection of name mappings to use in constructing calculated properties from that information to select into a new object.
That seems like it would work, but it's more work than I'd prefer, and I can't help but hope that there's a simpler way I haven't been able to find via Google. Maybe I should try Bing?
Sample File
Let's say you have a file.csv like this:
" RFC "
"1"
"2"
"3"
Code
Now try to run the following:
$CSV = Get-Content file.csv -First 2 | ConvertFrom-Csv
$FixedHeaders = $CSV.PSObject.Properties.Name.Trim(' ')
Import-Csv file.csv -Header $FixedHeaders |
Select-Object -Skip 1 -Property RFC
Output
You will get this output:
RFC
---
1
2
3
Explanation
First we use Get-Content with parameter -First 2 to get the first two lines. Piping to ConvertFrom-Csv will allow us to access the headers with PSObject.Properties.Name. Use Import-Csv with the -Header parameter to use the trimmed headers. Pipe to Select-Object and use -Skip 1 to skip the original headers.
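The same idea can be sketched entirely in memory, which also shows that it extends to more than one column. The second column (" Other Col ") and the variable names are made up for the example; with a real file, $lines would come from Get-Content file.csv.

```powershell
# Hypothetical file content; each element is one line of the CSV
$lines = '" RFC "," Other Col "', '"1","a"', '"2","b"'

# Parse the header plus one data row just to discover the raw column names
$probe = $lines | Select-Object -First 2 | ConvertFrom-Csv

# Trim every header name (member enumeration applies .Trim() to each one)
$headers = $probe.PSObject.Properties.Name.Trim()

# Skip the original header line and parse the rest with the cleaned names
$data = $lines | Select-Object -Skip 1 | ConvertFrom-Csv -Header $headers

$data
```

Skipping the header line before parsing, rather than after, means no stray row containing the old header text ever reaches the output.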
I'm not sure about comparisons in terms of efficiency, but I think this is a little more hardened, and imports the CSV only once. You might be able to use @lahell's approach and Get-Content -raw, but this was done and it works, so I'm gonna leave it to the community to determine which is better...
#import the CSV
$rawCSV = Import-Csv $Path
#get actual header names and map to their reformatted versions
$CSVColumns = @{}
$rawCSV |
    Get-Member |
    Where-Object {$_.MemberType -eq "NoteProperty"} |
    Select-Object -ExpandProperty Name |
    Foreach-Object {
        #add a mapping to the original from a trimmed and whitespace-reduced version of the original
        $CSVColumns.Add(($_.Trim() -replace '(\s)\s+', '$1'), "$_")
    }
#Create the array of names and calculated properties to pass to Select-Object
$SelectColumns = @(
    $CSVColumns.GetEnumerator() |
        Foreach-Object {
            #emit the name or calculated property directly (not wrapped in a script block)
            if ($CSVColumns.Values -contains $_.Key) {$_.Key}
            else { @{Name = $_.Key; Expression = $CSVColumns[$_.Key]} }
        }
)
$FormattedCSV = $rawCSV |
    Select-Object $SelectColumns
This was hand-copied to a computer where I don't have the rights to run it, so there might be an error - I tried to copy it correctly
You can use gocsv (https://github.com/DataFoxCo/gocsv) to see the headers of the CSV. You can then rename the headers, behead the file, swap columns, join, merge, or apply any number of other transformations.

Powershell extract values from file

I have a log file that has a lot of fields listed in it. I want to extract the fields out of the file, but I don't want to search through the file line by line.
I have my pattern:
$pattern="Hostname \(Alias\):(.+)\(.+Service: (.+)"
This will give me the two values that I need. I know that if I have a string, and I'm looking for one match I can use the $matches array to find the fields. In other words, If I'm looking at a single line in the file using the string variable $line, I can extract the fields using this code.
if($line -match $pattern){
$var1=$matches[1]
$var2=$matches[2]
}
But how can I get these values without searching line by line? I want to pass the whole file as a single string, and add the values that I am extracting to two different arrays.
I'm looking for something like
while($filetext -match $pattern){
$array1+=$matches[1]
$array2+=$matches[2]
}
But this code puts me in an infinite loop if there is even one match. So is there a nextMatch function I can use?
PowerShell 2.0 addressed this limitation by adding the -AllMatches parameter to the Select-String cmdlet e.g.:
$filetext | Select-String $pattern -AllMatches |
Foreach {$_.Matches | Foreach {$_.Groups[1] | Foreach {$_.Value}}}
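Since the question asks for the two captured values in two separate arrays, here is a sketch that does that. Note that it uses the .NET [regex]::Matches method rather than Select-String, and that the two log lines are invented, because the question does not show the real log format.

```powershell
# Hypothetical log content; the real file's layout is an assumption
$filetext = @'
Hostname (Alias): web01 (prod) Service: http
Hostname (Alias): db01 (prod) Service: sql
'@

$pattern = 'Hostname \(Alias\):(.+)\(.+Service: (.+)'

# [regex]::Matches returns every match in the string, not just the first
$found = [regex]::Matches($filetext, $pattern)

# Trim() removes surrounding spaces and any trailing carriage return
$array1 = @($found | ForEach-Object { $_.Groups[1].Value.Trim() })
$array2 = @($found | ForEach-Object { $_.Groups[2].Value.Trim() })

$array1
$array2
```

Because . does not match a newline by default, each log line produces its own match even though the whole file is passed as one string.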