Construct a formatted list of strings in PowerShell - powershell

I have several lists consisting of strings like this (imagine them as a tree of sort):
$list1:
data-pool
data-pool.house 1
data-pool.house 1.door2
data-pool.house 1.door3
data-pool.house2
data-pool.house2.door 1
To make them easier to parse as a tree, how can I indent them based on how many . characters occur, while dropping the repeated text earlier in the line? For example:
data-pool
  house 1
    door2
    door3
  house2
    door 1
The way I approached it, counting the occurrences of . with .split('.').Length-1 to determine the number of indents needed and then adding the spaces using .IndexOf('.') and .Substring(0, <position>), feels overly complicated. Or maybe I just can't wrap my head around how to do it in a less complicated way.

I think this should work as long as the depth grows by at most one level from one line to the next. It will not look "pretty" if, for example, the current node has n elements and the next node has n+2 or more, because only the last element of each line is printed.
To put it into perspective, using this list as an example:
$list = @'
data-pool
data-pool.house 1
data-pool.house 1.door2
data-pool.house 1.door3
data-pool.house 1.door4.something1
data-pool.house2
data-pool.house2.door 1
data-pool.house2.door 2.something1
data-pool.house2.door 3.something1.something2
data-pool.house3
data-pool.house3.door 1
data-pool.house3.door 2
'@ -split '\r?\n'
The function indent takes one line of the list and splits it using . as the delimiter. If the split yields a single element, the line is output unchanged. Otherwise, $IndentType (two spaces by default) is repeated once per level, where the level is the number of elements in the split array minus 1, and the result is prepended to the last element of the split array.
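function indent {
    param(
        [string]$Value,
        [string]$IndentType = '  '
    )
    # Split on literal dots; the last element is the node name.
    $out = $Value -split '\.'
    $level = $out.Count - 1
    # No indentation for a top-level node, otherwise one $IndentType per level.
    '{0}{1}' -f ($null,($IndentType*$level))[[int]($out.Count -gt 1)], $out[-1]
}
$list.ForEach({ indent $_ })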
Sample:
data-pool
  house 1
    door2
    door3
      something1
  house2
    door 1
      something1
        something2
  house3
    door 1
    door 2
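A shorter variant of the same idea, offered as a sketch rather than a drop-in replacement: the -replace operator can swap every leading "segment plus dot" for one indent unit, which leaves only the last element behind the indentation. The function name Format-TreeLine and the $Indent parameter are made up for this illustration.

```powershell
function Format-TreeLine {
    param(
        [string]$Value,
        [string]$Indent = '  '   # hypothetical default: two spaces per level
    )
    # Each "text." prefix is replaced by one indent unit, so a line with
    # n dots ends up with n indent units followed by its last element.
    $Value -replace '[^.]*\.', $Indent
}

$sample = 'data-pool', 'data-pool.house 1', 'data-pool.house 1.door2'
$formatted = $sample | ForEach-Object { Format-TreeLine $_ }
$formatted
```

Like the function above, this prints only the last element of each line, so it assumes every parent node has its own line in the list.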

The approach to get the last element of the string is below:
## String
$string = "data-pool.house 1"
## split the string into an array at every "."
$split = $string.Split(".")
## return the last element of the array
Write-Host $split[-1] -ForegroundColor Green
Then, to test against each string:
$myArray = ("data-pool", "data-pool.house 1", "data-pool.house 1.door2", "data-pool.house 1.door3", "data-pool.house2", "data-pool.house2.door 1")
ForEach($Name in $myArray) {
    ## String
    $Name = $Name.ToString()
    $string = $Name
    $split = $string.Split(".")
    ## return the last element of the array
    Write-Host $split[-1] -ForegroundColor Green
}

Related

Truncate, Convert String and set output as variable

It seems so simple. I need a cmdlet to take a two word string, and truncate the first word to just the first character and truncate the second word to 11 characters, and eliminate the space between them. So "Arnold Schwarzenegger" would output to a variable as "ASchwarzeneg"
I literally have no code. My thinking was to
$vars=$var1.split(" ")
$var1=""
foreach $var in $vars{
????
}
I'm totally at a loss as to how to do this, and it seems so simple too. Any help would be appreciated.
Here is one way to do it using the index operator [ ] in combination with the range operator ..:
$vars = 'Arnold Schwarzenegger', 'Short Name'
$names = foreach($var in $vars) {
    $i = $var.IndexOf(' ') + 1  # index after the space
    $z = $i + 10                # to slice from `$i` until `$i + 10` (11 chars)
    $var[0] + [string]::new($var[$i..$z])
}
$names
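An alternative sketch that splits the name into its two words first and then uses Substring, assuming the input always contains exactly two words separated by whitespace (as the question states). The function name Get-ShortName is hypothetical.

```powershell
function Get-ShortName {
    param([string]$FullName)
    # Split into at most two parts at the first run of whitespace.
    $first, $second = $FullName -split '\s+', 2
    # Substring throws if asked for more characters than exist, so cap at the word length.
    $length = [Math]::Min(11, $second.Length)
    $first.Substring(0, 1) + $second.Substring(0, $length)
}

$shortNames = 'Arnold Schwarzenegger', 'Short Name' | ForEach-Object { Get-ShortName $_ }
$shortNames
```

The explicit length cap is the main difference from the range-based version, which quietly ignores out-of-range indices instead.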

How to find the positions of all instances of a string in a specific line of a txt file?

Say that I have a .txt file with lines of multiple dates/times:
5/5/2020 5:45:45 AM
5/10/2020 12:30:03 PM
And I want to find the position of all slashes in one line, then move on to the next.
So for the first line I would want it to return the value:
1 3
And for the second line I would want:
1 4
How would I go about doing this?
I currently have:
$firstslashpos = Get-Content .\Documents\LoggedDates.txt | ForEach-Object{
$_.IndexOf("/")}
But that gives me only the first "/" on each line, and gives me that result for all lines at once. I need it to loop where I can figure out the space between each "/" for each line.
Sorry if I worded this badly.
You can indeed use the String.IndexOf() method for this!
function Find-SubstringIndex
{
    param(
        [string]$InputString,
        [string]$Substring
    )
    $indices = @()
    # start at position zero
    $offset = 0
    # Keep calling IndexOf() to find the next occurrence of the substring
    # stop when IndexOf() returns -1
    while(($i = $InputString.IndexOf($Substring, $offset)) -ne -1){
        # Keep track of the index at which the substring was found
        $indices += $i
        # Update the offset, we'll want to start searching for the next index _after_ this one
        $offset = $i + $Substring.Length
    }
    # Output the collected indices to the caller
    return $indices
}
Now you can do:
Get-Content listOfDates.txt | ForEach-Object {
    $indices = Find-SubstringIndex -InputString $_ -Substring '/'
    Write-Host "Found slash at indices: $($indices -join ',')"
}
A concise solution is to use [regex]::Matches(), which finds all matches of a given regular expression in a given string and returns a collection of match objects that also indicate the index (character position) of each match:
# Create a sample file.
@'
5/5/2020 5:45:45 AM
5/10/2020 12:30:03 PM
'@ > sample.txt
Get-Content sample.txt | ForEach-Object {
    # Get the indices of all '/' instances.
    $indices = [regex]::Matches($_, '/').Index
    # Output them as a list (string), separated with spaces.
    "$indices"
}
The above yields:
1 3
1 4
Note:
Input lines that contain no / instances at all will result in empty lines.
If, rather than strings, you want to output the indices as arrays (collections), use , [regex]::Matches($_, '/').Index as the only statement in the ForEach-Object script block. The unary form of ,, the array constructor operator, ensures (by way of a transient auxiliary array) that the collection returned by the method call is output as a whole. If you omit the ,, the indices are output one by one, resulting in a flat array when collected in a variable.
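Since the question also asks for the distance between the slashes on each line, here is a small sketch that builds on the index list. The sample line is hard-coded and the variable names are illustrative; "gap" here is taken to mean the difference between consecutive positions.

```powershell
$line = '5/10/2020 12:30:03 PM'
# @() keeps the result an array even when there is only one match (or none).
$indices = @([regex]::Matches($line, '/') | ForEach-Object { $_.Index })
# Difference between each index and the one before it.
$gaps = @(for ($k = 1; $k -lt $indices.Count; $k++) {
    $indices[$k] - $indices[$k - 1]
})
"indices: $indices; gaps: $gaps"
```

For a whole file, the same statements could go inside the ForEach-Object script block with $_ in place of $line.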

Count matches between specific substrings

I have a string
1AAAAaaa>###_1BBbbbbbbb>###_2CCCCCCCCccccc
Data blocks begins with "a number" and end with >.
I need to calculate in how many of those blocks lower case letters outnumber upper case.
As an answer I want to get
there are x places between number and >, where lowercase is over 50%.
I understand how to do it for the whole string, but not for the separate regions.
You can separate each target section of the string into an array using split.
Then iterate through the array and do your count.
my $string = 'AAAAaaa>1BBbbbbbbb>2CCCCCCCCccccc>3DDDDDDDDDddd>4FFFFfffffff>';
my @targets = split(/(?=\d+\w+>)/, $string);
my $successes = 0;
foreach my $target (@targets){
    my $target_lc = $target =~ tr/a-z//;
    my $target_uc = $target =~ tr/A-Z//;
    if($target_lc > $target_uc){
        $successes++;
    }
}
print $successes;
OUTPUT = 2
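For comparison with the PowerShell threads on this page, a rough equivalent of the same counting logic is sketched below. It is not part of the original answer, and it assumes a block is "one or more digits, then everything up to the next >", so the leading AAAAaaa> in the sample string (which has no number in front) is skipped rather than tested.

```powershell
$string = 'AAAAaaa>1BBbbbbbbb>2CCCCCCCCccccc>3DDDDDDDDDddd>4FFFFfffffff>'
$successes = 0
# A block starts with digits and runs up to the next '>'.
foreach ($block in [regex]::Matches($string, '\d+[^>]*>')) {
    # .NET regex matching is case-sensitive by default.
    $lower = [regex]::Matches($block.Value, '[a-z]').Count
    $upper = [regex]::Matches($block.Value, '[A-Z]').Count
    if ($lower -gt $upper) { $successes++ }
}
$successes
```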

Extract the nth to nth characters of an string object

I have a filename and I wish to extract two portions of this and add into variables so I can compare if they are the same.
$name = FILE_20161012_054146_Import_5785_1234.xml
So I want...
$a = 5785
$b = 1234
if ($a = $b) {
# do stuff
}
I have tried to extract the 36th up to the 39th character
Select-Object {$_.Name[35,36,37,38]}
but I get
{5, 7, 8, 5}
Have considered splitting but looks messy.
There are several ways to do this. One of the most straightforward, as PetSerAl suggested, is with .Substring():
$_.name.Substring(35,4)
Another way is with square brackets, as you tried to do, but that gives you an array of [char] objects, not a string. You can use -join to turn it back into a string, and a range to make the indexing easier:
$_.name[35..38] -join ''
For what you're doing, matching a pattern, you could also use a regular expression with capturing groups:
if ($_.name -match '_(\d{4})_(\d{4})\.xml$') {
    if ($Matches[1] -eq $Matches[2]) {
        # ...
    }
}
This way can be very powerful, but you need to learn more about regex if you're not familiar. In this case it's looking for an underscore _ followed by 4 digits (0-9), followed by an underscore, and four more digits, followed by .xml at the end of the string. The digits are wrapped in parentheses so they are captured separately to be referenced later (in $Matches).
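As a runnable illustration of the pattern described above, the sketch below applies it to the file name from the question. The named groups first and second are an optional convenience added here for readability; the numbered groups work the same way.

```powershell
$fileName = 'FILE_20161012_054146_Import_5785_1234.xml'
if ($fileName -match '_(?<first>\d{4})_(?<second>\d{4})\.xml$') {
    # $Matches holds the captures of the last successful -match.
    $first  = $Matches['first']
    $second = $Matches['second']
    $same   = $first -eq $second
    "first=$first second=$second same=$same"
}
```

Because the pattern is anchored to the end of the name, it does not depend on the character positions of the two numbers.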
Yet another approach: each of the following four expressions returns the substring 1234.
$FileName = "FILE_20161012_054146_Import_5785_1234.xml"
# $FileName
$FileName.Substring(33,4)                  # Substring method (zero-based)
-join $FileName[33..36]                    # indexing from the beginning (zero-based)
-join $FileName[-8..-5]                    # reverse indexing:
                                           # e.g. $FileName[-1] returns the last character
$FileArr = $FileName.Split([char[]]"_.")   # Split at _ and . (depends only on the filename "pattern template");
                                           # the [char[]] cast selects the Split(char[]) overload explicitly
$FileArr[$FileArr.Count - 2]               # does not depend on the lengths of tokens

Using PowerShell To Count Sentences In A File

I am having an issue with my PowerShell Program counting the number of sentences in a file I am using. I am using the following code:
foreach ($Sentence in (Get-Content file))
{
    $i = $Sentence.Split("?")
    $n = $Sentence.Split(".")
    $Sentences += $i.Length
    $Sentences += $n.Length
}
The total number of sentences I should get is 61 but I am getting 71, could someone please help me out with this? I have Sentences set to zero as well.
Thanks
foreach ($Sentence in (Get-Content file))
{
    $i = $Sentence.Split("[?\.]")
    $Sentences = $i.Length
}
I edited your code a bit.
The . that you were using needs to be escaped, otherwise Powershell recognises it as a Regex dotall expression, which means "any character"
So you should split the string on "[?\.]" or similar.
When counting sentences, what you are looking for is where each sentence ends. Splitting, though, returns a collection of sentence fragments around those end characters, with the ends themselves represented by the gaps between elements. Therefore, the number of sentences equals the number of gaps, which is one less than the number of fragments in the split result.
Of course, as Keith Hill pointed out in a comment above, the actual splitting is unnecessary when you can count the ends directly.
foreach( $Sentence in (Get-Content test.txt) ) {
    # Split at every occurrence of '.' and '?', and count the gaps.
    # The [char[]] cast selects the Split(char[]) overload explicitly.
    $Split = $Sentence.Split( [char[]]'.?' )
    $SplitSentences += $Split.Count - 1
    # Count every occurrence of '.' and '?'.
    $Ends = [char[]]$Sentence -match '[.?]'
    $CountedSentences += $Ends.Count
}
Contents of test.txt file:
Is this a sentence? This is a
sentence. Is this a sentence?
This is a sentence. Is this a
very long sentence that spans
multiple lines?
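With the sample text above, the sentence ends can also be counted in one pass over the whole text instead of line by line. The sketch below uses a here-string in place of the file so that it is self-contained; with a real file, Get-Content -Raw would supply the text. It counts only . and ? as sentence ends, as in the code above.

```powershell
$text = @'
Is this a sentence? This is a
sentence. Is this a sentence?
This is a sentence. Is this a
very long sentence that spans
multiple lines?
'@
# Every '.' or '?' is treated as one sentence end.
$sentenceCount = [regex]::Matches($text, '[.?]').Count
$sentenceCount
```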
Also, to clarify the remarks on Vasili's answer: the PowerShell -split operator interprets its delimiter as a regular expression by default, while the .NET Split method only works with literal values.
For example:
'Unclosed [bracket?' -split '[?]' treats [?] as a regular expression character class and matches the ? character, returning the two strings 'Unclosed [bracket' and ''
'Unclosed [bracket?'.Split( '[?]' ) calls the Split(char[]) overload in Windows PowerShell and splits at each [, ?, and ] character, returning the three strings 'Unclosed ', 'bracket', and ''. PowerShell 7+ may bind a Split(string) overload instead, so cast the argument to [char[]] to get this behavior there.
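The difference between the operator and the method can be checked with a short sketch. The [char[]] cast is an addition here, used so that the method call selects the character-array overload explicitly regardless of PowerShell version.

```powershell
$text = 'Unclosed [bracket?'
# -split: '[?]' is a regex character class that matches only '?'.
$regexParts = $text -split '[?]'
# .Split(char[]): '[', '?' and ']' are each literal separator characters.
$charParts = $text.Split([char[]]'[?]')
"-split gives $($regexParts.Count) parts; .Split() gives $($charParts.Count) parts"
```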