Reading strings from text files using switch -regex returns null element - powershell

Question:
The intention of my script is to filter out the name and phone number from both text files and add them into a hash table with the name being the key and the phone number being the value.
The problem I am facing is
$name = $_.Current is returning $null, as a result of which my hash is not getting populated.
Can someone tell me what the issue is?
Contents of File1.txt:
Lori
234 east 2nd street
Raleigh nc 12345
9199617621
lori@hotmail.com
=================
Contents of File2.txt:
Robert
2531 10th Avenue
Seattle WA 93413
2068869421
robert@hotmail.com
Sample Code:
$hash = @{}
Switch -regex (Get-content -Path C:\Users\svats\Desktop\Fil*.txt)
{
'^[a-z]+$' { $name = $_.current}
'^\d{10}' {
$phone = $_.current
$hash.Add($name,$phone)
$name=$phone=$null
}
default
{
write-host "Nothing matched"
}
}
$hash

Remove the current property reference from $_:
$hash = @{}
Switch -regex (Get-content -Path C:\Users\svats\Desktop\Fil*.txt)
{
'^[a-z]+$' {
$name = $_
}
'^\d{10}' {
$phone = $_
$hash.Add($name, $phone)
$name = $phone = $null
}
default {
Write-Host "Nothing matched"
}
}
$hash
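To see the fix in isolation, here is a minimal sketch that feeds the sample lines to switch from an in-memory array instead of from files. The $lines array is an assumption standing in for the contents of the two files, and the phone number is stored with an indexer rather than .Add() so that a repeated name does not throw.

```powershell
# Sample lines standing in for File1.txt and File2.txt (assumed, inlined for the demo).
$lines = @(
    'Lori', '234 east 2nd street', 'Raleigh nc 12345', '9199617621', 'lori@hotmail.com',
    'Robert', '2531 10th Avenue', 'Seattle WA 93413', '2068869421', 'robert@hotmail.com'
)

$hash = @{}
switch -Regex ($lines) {
    # Inside a switch branch, $_ is the current input element itself (here: a string).
    '^[a-z]+$' { $name = $_ }
    '^\d{10}$' { $hash[$name] = $_ }
}
$hash
```

Because switch -Regex matches case-insensitively by default, the lowercase character class also matches capitalized names such as Lori.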

Mathias R. Jessen's helpful answer explains your problem and offers an effective solution:
It is the automatic variable $_ / $PSItem itself that contains the current input object; which properties $_ / $PSItem has therefore depends on the input object's specific type. Here each input object is a line of text (a string), which has no Current property, so $_.Current evaluates to $null.
Aside from that, there's potential for making the code both less verbose and more efficient:
# Initialize the output hashtable.
$hash = @{}
# Create the regex that will be used on each input file's content.
# (?...) sets options: i ... case-insensitive; m ... ^ and $ match
# the beginning and end of every *line*.
$re = [regex] '(?im)^([a-z]+|\d{10})$'
# Loop over each input file's content (as a whole, thanks to -Raw).
Get-Content -Raw File*.txt | foreach {
# Look for name and phone number.
$matchColl = $re.Matches($_)
if ($matchColl.Count -eq 2) { # Both found, add hashtable entry.
$hash.Add($matchColl.Value[0], $matchColl.Value[1])
} else {
Write-Host "Nothing matched."
}
}
# Output the resulting hashtable.
$hash
A note on the construction of the .NET [System.Text.RegularExpressions.Regex] object (or [regex] for short), [regex] '(?im)^([a-z]+|\d{10})$':
Embedding matching options IgnoreCase and Multiline as inline options i and m directly in the regex string ((?im)) is convenient, in that it allows using simple cast syntax ([regex] ...) to construct the regular-expression .NET object.
However, this syntax may be obscure and, furthermore, not all matching options are available in inline form, so here's the more verbose, but easier-to-read equivalent:
$re = New-Object regex -ArgumentList '^([a-z]+|\d{10})$', 'IgnoreCase, Multiline'
Note that the two options must be specified comma-separated, as a single string, which PowerShell translates into the bit-OR-ed values of the corresponding enumeration values.
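The two construction styles can be compared side by side. The following is a small sketch, assuming a sample string with LF line endings; the variable names are made up for the illustration.

```powershell
# Sample text with LF line endings (assumed input for the demo).
$text = "Lori`n234 east 2nd street`n9199617621"

# Inline options: (?im) turns on IgnoreCase and Multiline inside the pattern.
$inline = [regex] '(?im)^([a-z]+|\d{10})$'

# Same pattern, with the options passed as a single comma-separated string.
$explicit = New-Object regex -ArgumentList '^([a-z]+|\d{10})$', 'IgnoreCase, Multiline'

# Both objects should find the name line and the phone-number line.
$fromInline   = $inline.Matches($text).Value
$fromExplicit = $explicit.Matches($text).Value
$fromInline
$fromExplicit
```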

Another solution: use ConvertFrom-String (a Windows PowerShell 5.x cmdlet):
$template=@'
{name*:Lori}
{street:234 east 2nd street}
{city:Raleigh nc 12345}
{phone:9199617621}
{mail:lori@hotmail.com}
{name*:Robert}
{street:2531 10th Avenue}
{city:Seattle WA 93413}
{phone:2068869421}
{mail:robert@hotmail.com}
{name*:Robert}
{street:2531 Avenue}
{city:Seattle WA 93413}
{phone:2068869421}
{mail:robert@hotmail.com}
'@
Get-Content -Path "c:\temp\file*.txt" | ConvertFrom-String -TemplateContent $template | select name, phone

Related

Check if a condition is met by a line within a TXT but "in an advanced way"

I have a TXT file with 1300 megabytes (huge thing). I want to build code that does two things:
Every line contains a unique ID at the beginning. For all lines with the same unique ID, I want to check whether the conditions are met for that "group" of IDs. (This tells me for how many lines with the unique ID X all conditions have been met.)
If the script is finished I want to remove all lines from the TXT where the condition was met (see 2). So I can rerun the script with another condition set to "narrow down" the whole document.
After few cycles I finally have a set of conditions that applies to all lines in the document.
It seems that my current approach is very slow (one cycle needs hours). My final result is a set of conditions that apply to all lines of the document.
If you find an easier way to do that, feel free to recommend.
Help is welcome :)
Code so far (does not fulfill everything from 1 & 2):
foreach ($item in $liste)
{
# Check Conditions
if ( ($item -like "*XXX*") -and ($item -like "*YYY*") -and ($item -notlike "*ZZZ*")) {
# Add a line to a document to see which lines match condition
Add-Content "C:\Desktop\it_seems_to_match.txt" "$item"
# Retrieve the unique ID from the line and feed array.
$array += $item.Split("/")[1]
# Remove the line from final document
$liste = $liste -replace $item, ""
}
}
# Pipe the "new cleaned" list somewhere
$liste | Set-Content -Path "C:\NewListToWorkWith.txt"
# Show me the counts
$array | group | % { $h = @{} } { $h[$_.Name] = $_.Count } { $h } | Out-File "C:\Desktop\count.txt"
Demo Lines:
images/STRINGA/2XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGA/3XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGB/4XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGB/5XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGC/5XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
Performance considerations:
Add-Content "C:\Desktop\it_seems_to_match.txt" "$item"
Try to avoid wrapping cmdlet pipelines: calling Add-Content inside the loop opens and closes the output file once for every matching line.
See also: Mastering the (steppable) pipeline
$array += $item.Split("/")[1]
Try to avoid using the increase assignment operator (+=) to create a collection: arrays have a fixed size, so each += builds a new, larger array and copies all existing elements.
See also: Why should I avoid using the increase assignment operator (+=) to create a collection
$liste = $liste -replace $item, ""
This is a very expensive operation, considering that you are reassigning (copying) a long list ($liste) with each iteration.
Besides, it is bad practice to change an array that you are currently iterating over.
$array | group | ...
Group-Object is a rather slow cmdlet; it is better to collect (or count) the items on the fly (where you do $array += $item.Split("/")[1]) using a hashtable, something like:
$Name = $item.Split("/")[1]
if (!$HashTable.Contains($Name)) { $HashTable[$Name] = [Collections.Generic.List[String]]::new() }
$HashTable[$Name].Add($Item)
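Putting these points together, a single pass can count the matches per ID and collect the lines that remain for the next cycle. This is only a sketch under assumptions: the sample lines are hypothetical (shortened so that some of them actually satisfy the question's XXX/YYY/ZZZ conditions), and $counts and $remaining are names made up for the example.

```powershell
# Hypothetical sample lines; in practice they would be read from the large file.
$lines = @(
    'images/STRINGA/1_XXX_YYY.jpg',
    'images/STRINGA/2_XXX_YYY.jpg',
    'images/STRINGB/3_XXX_YYY_ZZZ.jpg',
    'images/STRINGC/4_XXX_YYY.jpg'
)

$counts    = @{}                                               # ID -> number of matching lines
$remaining = [System.Collections.Generic.List[string]]::new()  # lines kept for the next cycle

foreach ($line in $lines) {
    if ($line -like '*XXX*' -and $line -like '*YYY*' -and $line -notlike '*ZZZ*') {
        $id = $line.Split('/')[1]
        # A missing key yields $null, which casts to 0.
        $counts[$id] = [int]$counts[$id] + 1
    }
    else {
        $remaining.Add($line)
    }
}

$counts
$remaining
```

For a file of this size, the lines could be streamed with [System.IO.File]::ReadLines() instead of being held in an array, and $remaining written out once with a single Set-Content call after the loop.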
To minimize memory usage, it may be better to read one line at a time and check whether it already exists. In the code below I used StringReader; you can replace it with StreamReader to read from a file. I'm checking whether the entire string exists, but you may want to split the line. Notice that the input contains duplicates but the dictionary does not. See the code below:
$rows= @"
images/STRINGA/2XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGA/3XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGB/4XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGB/5XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGC/5XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGA/2XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGA/3XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGB/4XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGB/5XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGC/5XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
"@
$dict = [System.Collections.Generic.Dictionary[int, System.Collections.Generic.List[string]]]::new();
$reader = [System.IO.StringReader]::new($rows)
while(($row = $reader.ReadLine()) -ne $null)
{
$hash = $row.GetHashCode()
if($dict.ContainsKey($hash))
{
#check if list contains the string
if($dict[$hash].Contains($row))
{
#string is a duplicate
}
else
{
#add string to dictionary value if it is not in list
$list = $dict[$hash]
$list.Add($row)
}
}
else
{
#add new hash value to dictionary
$list = [System.Collections.Generic.List[string]]::new();
$list.Add($row)
$dict.Add($hash, $list)
}
}
$dict
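If only duplicate detection on whole lines is needed, a HashSet[string] is a simpler alternative to the hand-built hash-code dictionary; note that this swaps in a different data structure than the one used above. A minimal sketch with assumed sample rows:

```powershell
# Sample input with one duplicate (assumed data).
$rows = 'a/1.jpg', 'b/2.jpg', 'a/1.jpg'

$seen   = [System.Collections.Generic.HashSet[string]]::new()
$unique = [System.Collections.Generic.List[string]]::new()

foreach ($row in $rows) {
    # Add() returns $true only if the string was not in the set yet.
    if ($seen.Add($row)) { $unique.Add($row) }
}
$unique
```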

Powershell - combine switch statement and loop

I have the piece of code presented below; it takes values from a JSON file. $Json.Names is an array of strings. I would like to avoid duplicated lines like $Json.Names[0].Name {$Json.Names[0].Name; break}. A second issue is that the array $Json.Names can have a different length: it can have 6 elements, or more, or fewer. I want to make this switch statement more flexible. I tried a for loop and a while loop, but in this case they don't help me. Is there a clever way to make this code more flexible and avoid the duplicated lines mentioned above?
$Json = Get-Content "$path" | out-string | ConvertFrom-Json
$Name = switch ($Member) {
$Json.Names[0].Name {$Json.Names[0].Name; break}
$Json.Names[1].Name {$Json.Names[1].Name; break}
$Json.Names[2].Name {$Json.Names[2].Name; break}
$Json.Names[3].Name {$Json.Names[3].Name; break}
$Json.Names[4].Name {$Json.Names[4].Name; break}
$Json.Names[5].Name {$Json.Names[5].Name; break}
$Json.Names[6].Name {$Json.Names[6].Name; break}
default {"Unknown Name"}
}
Assuming this structure:
$json = [pscustomobject]@{names = [pscustomobject]@{name ='joe'},
[pscustomobject]@{name ='john'},
[pscustomobject]@{name ='james'}}
Assuming $member is a single name, you can say
$name = $json.names.name -eq $member # an array of one
$name would be an empty array if there's no match.
if (! $name) { $name = 'Unknown Name' }
Or, with the null-coalescing assignment operator of PowerShell 7 (which only takes effect if $name is actually $null, not an empty array):
$name ??= 'Unknown Name'
You also may want to make a hashtable of the names.
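The hashtable idea could look roughly like this. It is a sketch that assumes the structure shown above; $lookup and the sample value of $member are hypothetical.

```powershell
# Assumed structure, matching the example above.
$json = [pscustomobject]@{
    Names = @(
        [pscustomobject]@{ Name = 'joe' }
        [pscustomobject]@{ Name = 'john' }
        [pscustomobject]@{ Name = 'james' }
    )
}

# Build the lookup table once; PowerShell hashtable keys are case-insensitive by default.
$lookup = @{}
foreach ($entry in $json.Names) { $lookup[$entry.Name] = $entry }

$member = 'john'   # hypothetical input
$name = if ($lookup.ContainsKey($member)) { $lookup[$member].Name } else { 'Unknown Name' }
$name
```

The table works for any number of entries, and each lookup is a single key access instead of a comparison against every element.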
You can take advantage of property enumeration and simplify your code pretty significantly:
# define our default
$Name = "Unknown Name"
# define the list of names
$Json = Get-Content "$path" | ConvertFrom-Json
$Names = $Json.Names.Name
# update $Name if applicable
if($Names -contains $member){
$Name = $member
}

What does '@{}' mean in PowerShell?

I have line of scripts for review here, I noticed variable declaration with a value:
function readConfig {
Param([string]$fileName)
$config = @{}
Get-Content $fileName | Where-Object {
$_ -like '*=*'
} | ForEach-Object {
$key, $value = $_ -split '\s*=\s*', 2
$config[$key] = $value
}
return $config
}
I wonder what @{} means in $config = @{}?
@{} in PowerShell defines a hashtable, a data structure for mapping unique keys to values (in other languages this data structure is called "dictionary" or "associative array").
@{} on its own defines an empty hashtable, that can then be filled with values, e.g. like this:
$h = @{}
$h['a'] = 'foo'
$h['b'] = 'bar'
Hashtables can also be defined with their content already present:
$h = @{
'a' = 'foo'
'b' = 'bar'
}
Note, however, that when you see similar notation in PowerShell output, e.g. like this:
abc: 23
def: @{"a"="foo";"b"="bar"}
that is usually not a hashtable, but the string representation of a custom object.
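The difference between a real hashtable and the string form of a custom object can be checked directly. A small sketch with made-up variable names:

```powershell
# A real hashtable: entries can be added, counted and looked up by key.
$h = @{ a = 'foo'; b = 'bar' }
$h['c'] = 'baz'
$hasA  = $h.ContainsKey('a')
$count = $h.Count
foreach ($key in $h.Keys) { "$key = $($h[$key])" }

# A custom object: when converted to a string, it is rendered in a hashtable-like notation.
$obj = [pscustomobject]@{ a = 'foo'; b = 'bar' }
$asString = "$obj"
$asString
```

Despite the similar notation, $obj is not a hashtable: its members are properties, and the notation only appears when the object is rendered as text.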
The meaning of @{} can be seen in different ways.
If the @{} is empty, an empty hash table is defined.
But if there is something between the curly brackets, it can be used in the context of a splatting operation.
Hash Table
Splatting
I think there is no need to explain what a hash table is.
Splatting is a method of passing a collection of parameter values to a command as a unit.
# Keys must match parameter names of the target command (here: Write-Host).
$prints = @{
Object = "John Doe"
ForegroundColor = "Red"
}
Write-Host @prints
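Splatting binds each hashtable key to the parameter of the same name, so the keys have to be valid parameter names of the target command. A minimal sketch, using a hypothetical Show-Person function that is defined only for this illustration:

```powershell
# Hypothetical function, defined only for this illustration.
function Show-Person {
    param([string]$Name, [int]$Age, [string]$HairColor)
    "$Name is $Age years old and has $HairColor hair."
}

$person = @{
    Name      = 'John Doe'
    Age       = 18
    HairColor = 'Red'
}

# @person (not $person) splats the entries as -Name, -Age and -HairColor.
$line = Show-Person @person
$line
```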
Hope it helps! BR
Edit:
Regarding the updated code from the questioner the answer is
It defines an empty hash table.
Be aware that Get-Content has its own parameters!
THE MOST IMPORTANT 1:
[-Raw]

How do I change foreach to for in PowerShell?

I want to check whether a word exists in a text file and print "match" or "not match". My 1st text file contains xxaavv6J, my 2nd file contains 6J6SCa.yB.
When there is a match, the output looks like this:
Match found:
Match found:
Match found:
Match found:
Match found:
Match found: 6J
Match found:
Match found:
Match found:
My expectation is to print just "match" or "not match".
$X = Get-Content "C:\Users\2.txt"
$Data = Get-Content "C:\Users\d.txt"
$Split = $Data -split '(..)'
$Y = $X.Substring(0, 6)
$Z = $Y -split '(..)'
foreach ($i in $Z) {
foreach ($j in $Split) {
if ($i -like $j) {
Write-Host ("Match found: {0}" -f $i, $j)
}
}
}
The operation -split '(..)' does not produce the result you think it does. If you take a look at the output of the following command you'll see that you're getting a lot of empty results:
PS C:\> 'xxaavv6J' -split '(..)' | % { "-$_-" }
--
-xx-
--
-aa-
--
-vv-
--
-6J-
--
Those empty values are the additional matches you're getting from $i -like $j.
I'm not quite sure why -split '(..)' gives you any non-empty values in the first place, because I would have expected it to produce 5 empty strings for an input string "xxaavv6J". Apparently it has to do with the grouping parentheses, since -split '..' (without the grouping parentheses) actually does behave as expected. Looks like with the capturing group the captured matches are returned on top of the results of the split operation.
Anyway, to get the behavior you want replace
... -split '(..)'
with
... |
Select-String '..' -AllMatches |
Select-Object -Expand Matches |
Select-Object -Expand Value
You can also replace the nested loop with something like this:
foreach ($i in $Z) {
if ($Split -contains $i) {
Write-Host "Match found: ${i}"
}
}
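Combined, the two suggestions might look like the following sketch. It uses [regex]::Matches() in place of Select-String to collect the two-character chunks, and inline strings stand in for the file contents; the variable names are assumptions.

```powershell
$first  = 'xxaavv6J'    # sample content of the first file
$second = '6J6SCa.yB'   # sample content of the second file

# Collect consecutive two-character chunks; a trailing single character is ignored.
$pairsFirst  = [regex]::Matches($first,  '..').Value
$pairsSecond = [regex]::Matches($second, '..').Value

# One line of output per chunk of the first string.
$report = foreach ($pair in $pairsFirst) {
    if ($pairsSecond -contains $pair) { "Match found: $pair" } else { "Not matched: $pair" }
}
$report
```

Since no empty strings are produced, there are no spurious "Match found:" lines, and -contains replaces the inner loop.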
A slightly different approach using regex '.Match()' should also do it.
I have added a lot of explaining comments for you:
$Test = Get-Content "C:\Users\2.txt" -Raw # Read as single string. Contains "xxaavv6J"
$Data = (Get-Content "C:\Users\d.txt") -join '' # Read as array and join the lines with an empty string.
# This will remove Newlines. Contains "6J6SCa.yB"
# Split the data and make sure every substring has two characters
# In each substring, the regex special characters need to be Escaped.
# When this is done, we join the substrings together using the pipe symbol.
$Data = ($Data -split '(.{2})' | # split on every two characters
Where-Object { $_.Length -eq 2 } | # don't care about any left over character
ForEach-Object { [Regex]::Escape($_) } ) -join '|' # join with the '|' which is an OR in regular expression
# $Data is now a string to use with regular expression: "6J|6S|Ca|\.y"
# Using '.Match()' works Case-Sensitive. To have it compare Case-Insensitive, we do this:
$Data = '(?i)' + $Data
# See if we can find one or more matches
$regex = [regex]$Data
$match = $regex.Match($Test)
# If we have found at least one match:
if ($match.Success) {
while ($match.Success) {
# matched text: $match.Value
# match start: $match.Index
# match length: $match.Length
Write-Host ("Match found: {0}" -f $match.Value)
$match = $match.NextMatch()
}
}
else {
Write-Host "Not Found"
}
Result:
Match found: 6J
Further to Ansgar Wiechers' excellent answer: if you are running Windows PowerShell 4.0 or above, you could apply the .Where() method described in Kirk Munro's exhaustive article ForEach and Where magic methods:
With the release of Windows PowerShell 4.0, two new “magic” methods
were introduced for collection types that provide a new syntax for
accessing ForEach and Where capabilities in Windows PowerShell.
These methods are aptly named ForEach and Where. I call
these methods “magic” because they are quite magical in how they work
in PowerShell. They don’t show up in Get-Member output, even if you
apply -Force and request -MemberType All. If you roll up your
sleeves and dig in with reflection, you can find them; however, it
requires a broad search because they are private extension methods
implemented on a private class. Yet even though they are not
discoverable without peeking under the covers, they are there when you
need them, they are faster than their older counterparts, and they
include functionality that was not available in their older
counterparts, hence the “magic” feeling they leave you with when you
use them in PowerShell. Unfortunately, these methods remain
undocumented even today, almost a year since they were publicly
released, so many people don’t realize the power that is available in
these methods.
…
The Where method
Where is a method that allows you to filter a collection of objects.
This is very much like the Where-Object cmdlet, but the Where
method is also like Select-Object and Group-Object as well,
includes several additional features that the Where-Object cmdlet
does not natively support by itself. This method provides faster
performance than Where-Object in a simple, elegant command. Like
the ForEach method, any objects that are output by this method are
returned in a generic collection of type
System.Collections.ObjectModel.Collection`1[psobject].
There is only one version of this method, which can be described as
follows:
Where(scriptblock expression[, WhereOperatorSelectionMode mode[, int numberToReturn]])
As indicated by the square brackets, the expression script block is
required and the mode enumeration and the numberToReturn integer
argument are optional, so you can invoke this method using 1, 2, or 3
arguments. If you want to use a particular argument, you must provide
all arguments to the left of that argument (i.e. if you want to
provide a value for numberToReturn, you must provide values for
mode and expression as well).
Applied to your case (using the simplest variant Where(scriptblock expression) of the .Where() method):
$X = '6J6SCa.yB' # Get-Content "C:\Users\2.txt"
$Data = 'xxaavv6J' # Get-Content "C:\Users\d.txt"
$Split = ($Data -split '(..)').Where({$_ -ne ''})
$Y = $X.Substring(0, 6)
$Z = ($Y -split '(..)').Where{$_ -ne ''} # without parentheses
For instance, Ansgar's example changes as follows:
PS > ('xxaavv6J' -split '(..)').Where{$_ -ne ''} | % { "-$_-" }
-xx-
-aa-
-vv-
-6J-
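The optional mode and numberToReturn arguments from the quoted signature can be tried on the same data. A short sketch, with filter expressions chosen only for illustration:

```powershell
$pairs = ('xxaavv6J' -split '(..)').Where({ $_ -ne '' })

# 'First' mode with a count: stop after the first two elements that pass the filter.
$firstTwo = $pairs.Where({ $_ -match '^[a-z]+$' }, 'First', 2)

# 'Split' mode: returns two collections, passing elements first, the rest second.
$letters, $others = $pairs.Where({ $_ -match '^[a-z]+$' }, 'Split')

$firstTwo
$letters
$others
```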

How can I use PowerShell to expand placeholders in a template string using values read from an INI file?

values.ini looks like
[default]
A=1
B=2
C=3
foo.txt looks like
Now is the %A% for %a% %B% men to come to the %C% of their %c%
I want to use Powershell to search for all of the %x% values in values.ini and then replace every matching instance in foo.txt with the corresponding value, case insensitively; generating the following:
Now is the 1 for 1 2 men to come to the 3 of their 3
Assuming PowerShell version 3.0 or newer, you can use the ConvertFrom-StringData cmdlet to parse the key-value pair in your ini file, but you'll need to filter out the [default] directive:
# grab relevant lines from file
$KeyValPairs = Get-Content .\values.ini | Where {$_ -like "*=*" }
# join strings together as one big string
$KeyValPairString = $KeyValPairs -join [Environment]::NewLine
# create hashtable/dictionary from string with ConvertFrom-StringData
$Dictionary = $KeyValPairString |ConvertFrom-StringData
You can then use the [regex]::Replace() method to do a lookup against the dictionary for each match you want to replace:
Get-Content .\foo.txt |ForEach-Object {
[Regex]::Replace($_, '%(\p{L}+)%', {
param($Match)
# look term up in dictionary
return $Dictionary[$Match.Groups[1].Value]
})
}
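The lookup-and-replace step can be tried without any files. In this sketch, in-memory strings taken from the question's samples stand in for values.ini and foo.txt, and the dictionary is built by hand in a plain @{} hashtable (whose keys are case-insensitive) rather than with ConvertFrom-StringData.

```powershell
# In-memory stand-ins for values.ini and foo.txt (assumed, taken from the question's samples).
$iniLines = '[default]', 'A=1', 'B=2', 'C=3'
$template = 'Now is the %A% for %a% %B% men to come to the %C% of their %c%'

# Build the lookup table by hand; keys of a @{} hashtable are case-insensitive.
$dictionary = @{}
foreach ($line in ($iniLines -like '*=*')) {
    $key, $value = $line -split '\s*=\s*', 2
    $dictionary[$key] = $value
}

# The script block acts as the match evaluator: it receives each match
# and returns the replacement text.
$result = [regex]::Replace($template, '%(\p{L}+)%', {
    param($match)
    $dictionary[$match.Groups[1].Value]
})
$result
```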
To complement Mathias R. Jessen's excellent answer with alternative approaches that also take the later requirement change of limiting values to a specific INI-file section into account (PSv2+, except for Get-Content -Raw; in PSv2, use (Get-Content ...) -join "`n" instead.)
Using PsIni\Get-IniContent and [environment]::ExpandEnvironmentVariables():
# Translate key-value pairs from the section of interest
# into environment variables.
# After this command, the following environment variables are defined:
# $env:A, with value 1 (cmd.exe equivalent: %A%)
# $env:B, with value 2 (cmd.exe equivalent: %B%)
# $env:C, with value 3 (cmd.exe equivalent: %C%)
$section = 'default' # Specify the INI-file section of interest.
(Get-IniContent values.ini)[$section].GetEnumerator() |
ForEach-Object { Set-Item "env:$($_.Name)" -Value $_.Value }
# Read the template string as a whole from file foo.txt, and expand the
# environment-variable references in it, using the .NET framework.
# With the sample input, this yields
# "Now is the 1 for 1 2 men to come to the 3 of their 3".
[environment]::ExpandEnvironmentVariables((Get-Content -Raw foo.txt))
The 3rd-party Get-IniContent cmdlet, which conveniently reads an INI file (*.ini) into a nested, ordered hashtable, can easily be installed with Install-Module PsIni from an elevated console (alternatively, add -Scope CurrentUser), if you have PS v5+ (or v3 or v4 with PackageManagement installed).
This solution takes advantage of the fact that the placeholders (e.g., %a%) look like cmd.exe-style environment-variable references.
Note the assumptions and caveats:
All ini-file keys / placeholder names are legal environment-variable names.
Preexisting variables may be overwritten, which can be problematic with names such as PATH.
Cross-platform caveat: on Unix-like platforms, environment-variable references are case-sensitive, so the solution won't work the same there.
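The expansion mechanism itself can be seen in a tiny sketch. The variable name GR_DEMO_VALUE is hypothetical and chosen to be unlikely to clash with an existing environment variable.

```powershell
# Hypothetical variable name, chosen to be unlikely to clash with an existing one.
Set-Item 'env:GR_DEMO_VALUE' -Value '1'

# %NAME% references to defined variables are expanded; undefined names are left untouched.
$expanded = [Environment]::ExpandEnvironmentVariables('Now is the %GR_DEMO_VALUE% for %GR_NOT_DEFINED%')
$expanded

# Clean up so the demo leaves no trace in the process environment.
Remove-Item 'env:GR_DEMO_VALUE'
```

Placeholders without a corresponding variable stay in the text as they are, which makes missing INI keys easy to spot in the output.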
Using custom INI-file parsing and [environment]::ExpandEnvironmentVariables():
If installing a module for INI-file parsing is not an option, the following solution uses a - rather complex - regular expression to extract the section of interest via the -replace operator.
$section = 'default' # Specify the INI-file section of interest.
# Get all non-empty, non-comment lines from the section using a regex.
$sectLines = (Get-Content -Raw values.ini) -replace ('(?smn)\A.*?(^|\r\n)\[' + [regex]::Escape($section) + '\]\r\n(?<sectLines>.*?)(\r\n\[.*|\Z)'), '${sectLines}' -split "`r`n" -notmatch '(^;|^\s*$)'
# Define the key-value pairs as environment variables.
$sectlines | ForEach-Object { $tokens = $_ -split '=', 2; Set-Item "env:$($tokens[0].Trim())" -Value $tokens[1].Trim() }
# Read the template string as a whole, and expand the environment-variable
# references in it, as before.
[environment]::ExpandEnvironmentVariables((Get-Content -Raw foo.txt))
I found a simpler solution using this INI script called Get-IniContent.
#read from Setup.ini
$INI = Get-IniContent .\Setup.ini
$sec="setup"
#REPLACE VARIABLES
foreach($c in Get-ChildItem -Path .\Application -Recurse -Filter *.config)
{
Write-Output $c.FullName
Write-Output $c.DirectoryName
$configFile = Get-Content $c.FullName -Raw
foreach($v in $INI[$sec].Keys)
{
$k = '%'+$v+'%'
$match = [regex]::IsMatch($configFile, $k)
if($match)
{
$configFile = $configFile -ireplace [regex]::Escape($k), $INI[$sec][$v]
}
}
Set-Content $c.FullName -Value $configFile
}