Exclude index in powershell - powershell

I have a very simple requirement: removing a couple of lines from a file. I found a little help on the net showing how to make use of -Index. Suppose I want to select the 5th line; I use
Get-Content file.txt | Select -Index 4
Similarly, what if I don't need the 5th and 6th lines? How would the statement change?
Get-Content file.txt | Select -Index -ne 4
I tried using -ne between -Index and the number. It did not work, and neither did "!=".
Also, the code below gives no error but does not produce the desired output:
$tmp = $test | where {$_.Index -ne 4,5 }

Pipeline elements do not have an automatic Index property, but you can add one if you wish:
function Add-Index {
begin {
$i=-1
}
process {
Add-Member Index (++$i) -InputObject $_ -PassThru
}
}
Then you can apply filtering by Index:
Get-Content file.txt | Add-Index | Where-Object Index -notin 4,5

Don't know about the Index property or parameter, but you can also achieve it like this :
$count = 0
$exclude = 4, 5
Get-Content "G:\input\sqlite.txt" | % {
if($exclude -notcontains $count){ $_ }
$count++
}
EDIT :
The ReadCount property holds the information you need :)
$exclude = 0, 1
Get-Content "G:\input\sqlite.txt" | Where-Object { $_.ReadCount -NotIn $exclude }
WARNING: as pointed out by @PetSerAl and @Matt, ReadCount starts at 1, not at 0 like array indexes, so the 5th and 6th lines are 5 and 6 here.
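A minimal sketch of the ReadCount approach applied to the original question (dropping the 5th and 6th lines). The temporary file name and its ten sample lines are assumptions made up for the demo, and -notin requires PowerShell 3.0 or later:

```powershell
# Assumption: a throwaway ten-line sample file, created only for this demo.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'readcount-demo.txt'
1..10 | ForEach-Object { "line $_" } | Set-Content -Path $path

# ReadCount is 1-based, so the 5th and 6th lines are 5 and 6 (not 4 and 5).
$exclude = 5, 6
$kept = Get-Content -Path $path | Where-Object { $_.ReadCount -notin $exclude }
$kept

# Clean up the sample file.
Remove-Item -Path $path
```

This relies on Get-Content's default behaviour of emitting one line at a time; if you pass a larger -ReadCount value, lines arrive in batches and the filter no longer sees them individually.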

Try this:
get-content file.txt | select -index (0..3 + 5..10000)
It's a bit of a hack, but it works. Downside is that building the range takes some time. Also, adjust the 10000 to make sure you get the whole file.
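One way to avoid guessing the upper bound is to build the index list from the actual line count. This is only a sketch: $lines stands in for the output of Get-Content file.txt, and the variable names are made up for illustration.

```powershell
# Stand-in for: $lines = Get-Content file.txt
$lines = 1..10 | ForEach-Object { "line $_" }

# Zero-based indexes to drop (the 5th and 6th lines).
$skip = 4, 5

# Every valid index except the ones in $skip.
$keep = 0..($lines.Count - 1) | Where-Object { $_ -notin $skip }

$result = $lines | Select-Object -Index $keep
$result
```

The trade-off is that the whole file has to be read into memory first so that its length is known.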

Convert the content to an ArrayList and use its RemoveRange(int index, int count) method:
[System.Collections.ArrayList]$text = gc C:\file.txt
$text.RemoveRange(4,1)
$text
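Note that RemoveRange(4,1) removes only the 5th line. A small sketch for dropping both the 5th and 6th lines, using in-memory sample data in place of the file (the sample lines are an assumption for the demo):

```powershell
# Stand-in for: [System.Collections.ArrayList]$text = Get-Content C:\file.txt
$lines = 1..10 | ForEach-Object { "line $_" }
[System.Collections.ArrayList]$text = $lines

# Start at zero-based index 4 (the 5th line) and remove 2 elements.
$text.RemoveRange(4, 2)
$text
```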

Related

PowerShell: Find unique values from multiple CSV files

let's say that I have several CSV files and I need to check a specific column and find values that exist in one file, but not in any of the others. I'm having a bit of trouble coming up with the best way to go about it as I wanted to use Compare-Object and possibly keep all columns and not just the one that contains the values I'm checking.
So I do indeed have several CSV files and they all have a Service Code column, and I'm trying to create a list for each Service Code that only appears in one file. So I would have "Service Codes only in CSV1", "Service Codes only in CSV2", etc.
Based on some testing and a semi-related question, I've come up with a workable solution, but with all of the nesting and For loops, I'm wondering if there is a more elegant method out there.
Here's what I do have:
$files = Get-ChildItem -LiteralPath "C:\temp\ItemCompare" -Include "*.csv"
$HashList = [System.Collections.Generic.List[System.Collections.Generic.HashSet[String]]]::New()
For ($i = 0; $i -lt $files.Count; $i++){
$TempHashSet = [System.Collections.Generic.HashSet[String]]::New([String[]](Import-Csv $files[$i])."Service Code")
$HashList.Add($TempHashSet)
}
$FinalHashList = [System.Collections.Generic.List[System.Collections.Generic.HashSet[String]]]::New()
For ($i = 0; $i -lt $HashList.Count; $i++){
$UniqueHS = [System.Collections.Generic.HashSet[String]]::New($HashList[$i])
For ($j = 0; $j -lt $HashList.Count; $j++){
#Skip the check when the HashSet would be compared to itself
If ($j -eq $i){Continue}
$UniqueHS.ExceptWith($HashList[$j])
}
$FinalHashList.Add($UniqueHS)
}
It seems a bit messy to me using so many different .NET references, and I know I could make it cleaner with a tag to say using namespace System.Collections.Generic, but I'm wondering if there is a way to make it work using Compare-Object which was my first attempt, or even just a simpler/more efficient method to filter each file.
I believe I found an "elegant" solution based on Group-Object, using only a single pipeline:
# Import all CSV files.
Get-ChildItem $PSScriptRoot\csv\*.csv -File -PipelineVariable file | Import-Csv |
# Add new column "FileName" to distinguish the files.
Select-Object *, @{ label = 'FileName'; expression = { $file.Name } } |
# Group by ServiceCode to get a list of files per distinct value.
Group-Object ServiceCode |
# Filter by ServiceCode values that exist only in a single file.
# Sort-Object -Unique takes care of possible duplicates within a single file.
Where-Object { ( $_.Group.FileName | Sort-Object -Unique ).Count -eq 1 } |
# Expand the groups so we get the original object structure back.
ForEach-Object Group |
# Format-Table requires sorting by FileName, for -GroupBy.
Sort-Object FileName |
# Finally pretty-print the result.
Format-Table -Property ServiceCode, Foo -GroupBy FileName
Test Input
a.csv:
ServiceCode,Foo
1,fop
2,fip
3,fap
b.csv:
ServiceCode,Foo
6,bar
6,baz
3,bam
2,bir
4,biz
c.csv:
ServiceCode,Foo
2,bla
5,blu
1,bli
Output
FileName: b.csv
ServiceCode Foo
----------- ---
4 biz
6 bar
6 baz
FileName: c.csv
ServiceCode Foo
----------- ---
5 blu
Looks correct to me. The values 1, 2 and 3 are duplicated between multiple files, so they are excluded. 4, 5 and 6 exist only in single files, while 6 is a duplicate value only within a single file.
Understanding the code
Maybe it is easier to understand how this code works, by looking at the intermediate output of the pipeline produced by the Group-Object line:
Count Name Group
----- ---- -----
2 1 {@{ServiceCode=1; Foo=fop; FileName=a.csv}, @{ServiceCode=1; Foo=bli; FileName=c.csv}}
3 2 {@{ServiceCode=2; Foo=fip; FileName=a.csv}, @{ServiceCode=2; Foo=bir; FileName=b.csv}, @{ServiceCode=2; Foo=bla; FileName=c.csv}}
2 3 {@{ServiceCode=3; Foo=fap; FileName=a.csv}, @{ServiceCode=3; Foo=bam; FileName=b.csv}}
1 4 {@{ServiceCode=4; Foo=biz; FileName=b.csv}}
1 5 {@{ServiceCode=5; Foo=blu; FileName=c.csv}}
2 6 {@{ServiceCode=6; Foo=bar; FileName=b.csv}, @{ServiceCode=6; Foo=baz; FileName=b.csv}}
Here the Name contains the unique ServiceCode values, while Group "links" the data to the files.
From here it should already be clear how to find values that exist only in single files. If duplicate ServiceCode values within a single file wouldn't be allowed, we could even simplify the filter to Where-Object Count -eq 1. Since it was stated that dupes within single files may exist, we need the Sort-Object -Unique to count multiple equal file names within a group as only one.
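The filter step can be tried in isolation on a few in-memory objects. This is only a sketch: the rows below are a hand-picked subset of the test input, and the variable names are made up for illustration.

```powershell
# A few rows as they would look after the FileName column has been added.
$rows = @(
    [pscustomobject]@{ ServiceCode = '1'; FileName = 'a.csv' }
    [pscustomobject]@{ ServiceCode = '1'; FileName = 'c.csv' }
    [pscustomobject]@{ ServiceCode = '4'; FileName = 'b.csv' }
    [pscustomobject]@{ ServiceCode = '6'; FileName = 'b.csv' }
    [pscustomobject]@{ ServiceCode = '6'; FileName = 'b.csv' }
)

# Keep only the groups whose rows all come from one file.
# @(...) guarantees an array, so .Count is reliable even for a single name.
$singleFile = $rows | Group-Object ServiceCode |
    Where-Object { @($_.Group.FileName | Sort-Object -Unique).Count -eq 1 }

# ServiceCode 1 appears in two files and is dropped; 4 and 6 remain.
$singleFile.Name
```

Filtering on Count -eq 1 instead would wrongly drop ServiceCode 6 here, because it appears twice within b.csv.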
It is not completely clear what you expect as an output.
If this is just the ServiceCodes that intersect then this is actually a duplicate with:
Comparing two arrays & get the values which are not common
Union and Intersection in PowerShell?
But taking that you actually want the related object and files, you might use this approach:
$HashTable = @{}
ForEach ($File in Get-ChildItem .\*.csv) {
ForEach ($Object in (Import-Csv $File)) {
$HashTable[$Object.ServiceCode] = $Object |Select-Object *,
@{ n='File'; e={ $File.Name } },
@{ n='Count'; e={ $HashTable[$Object.ServiceCode].Count + 1 } }
}
}
$HashTable.Values |Where-Object Count -eq 1
Here is my take on this fun exercise, I'm using a similar approach as yours with the HashSet but adding [System.StringComparer]::OrdinalIgnoreCase to leverage the .Contains(..) method:
using namespace System.Collections.Generic
# Generate Random CSVs:
$charset = 'abABcdCD0123xXyYzZ'
$ran = [random]::new()
$csvs = @{}
foreach($i in 1..50) # Create 50 CSVs for testing
{
$csvs["csv$i"] = foreach($z in 1..50) # With 50 Rows
{
$index = (0..2).ForEach({ $ran.Next($charset.Length) })
[pscustomobject]@{
ServiceCode = [string]::new($charset[$index])
Data = $ran.Next()
}
}
}
# Get Unique 'ServiceCode' per CSV:
$result = @{}
foreach($key in $csvs.Keys)
{
# Get all unique `ServiceCode` from the other CSVs
$tempHash = [HashSet[string]]::new(
[string[]]($csvs[$csvs.Keys -ne $key].ServiceCode),
[System.StringComparer]::OrdinalIgnoreCase
)
# Filter the unique `ServiceCode`
$result[$key] = foreach($line in $csvs[$key])
{
if(-not $tempHash.Contains($line.ServiceCode))
{
$line
}
}
}
# Test if the code worked,
# If something is returned from here means it didn't work
foreach($key in $result.Keys)
{
$tmp = $result[$result.Keys -ne $key].ServiceCode
foreach($val in $result[$key])
{
if($val.ServiceCode -in $tmp)
{
$val
}
}
}
I was able to get the unique items as follows:
# Get all items of CSVs in a single variable with adding the file name at the last column
$CSVs = Get-ChildItem "C:\temp\ItemCompare\*.csv" | ForEach-Object {
$CSV = Import-CSV -Path $_.FullName
$FileName = $_.Name
$CSV | Select-Object *,@{N='Filename';E={$FileName}}
}
Foreach($line in $CSVs){
$ServiceCode = $line.ServiceCode
$file = $line.Filename
if (!($CSVs | where {$_.ServiceCode -eq $ServiceCode -and $_.filename -ne $file})){
$line
}
}

Using Powershell to compare two files and then output only the different string names

So I am a complete beginner at Powershell but need to write a script that will take a file, compare it against another file, and tell me what strings are different in the first compared to the second. I have had a go at this but I am struggling with the outputs as my script will currently only tell me on which line things are different, but it also seems to count lines that are empty too.
To give some context for what I am trying to achieve, I would like to have a static file of known good Windows processes ($Authorized) and I want my script to pull a list of current running processes, filter by the process name column so to just pull the process name strings, then match anything over 1 character, sort the file by unique values and then compare it against $Authorized, plus finally either outputting the different process strings found in $Processes (to the ISE Output Pane) or just to output the different process names to a file.
I have spent today attempting the following in Powershell ISE and also Googling around to try and find solutions. I heard 'fc' is a better choice instead of Compare-Object but I could not get that to work. I have thus far managed to get it to work but the final part where it compares the two files it seems to compare line by line, for which would always give me false positives as the line position of the process names in the file supplied would change, furthermore I only want to see the changed process names, and not the line numbers which it is reporting ("The process at line 34 is an outlier" is what currently gets outputted).
I hope this makes sense, and any help on this would be very much appreciated.
Get-Process | Format-Table -Wrap -Autosize -Property ProcessName | Out-File c:\users\me\Desktop\Processes.txt
$Processes = 'c:\Users\me\Desktop\Processes.txt'
$Output_file = 'c:\Users\me\Desktop\Extracted.txt'
$Sorted = 'c:\Users\me\Desktop\Sorted.txt'
$Authorized = 'c:\Users\me\Desktop\Authorized.txt'
$regex = '.{1,}'
select-string -Path $Processes -Pattern $regex |% { $_.Matches } |% { $_.Value } > $Output_file
Get-Content $Output_file | Sort-Object -Unique > $Sorted
$dif = Compare-Object -ReferenceObject $(Get-Content $Sorted) -DifferenceObject $(get-content $Authorized) -IncludeEqual
$lineNumber = 1
foreach ($difference in $dif)
{
if ($difference.SideIndicator -ne "==")
{
Write-Output "The Process at Line $linenumber is an Outlier"
}
$lineNumber ++
}
Remove-Item c:\Users\me\Desktop\Processes.txt
Remove-Item c:\Users\me\Desktop\Extracted.txt
Write-Output "The Results are Stored in $Sorted"
From the length and complexity of your script, I feel like I'm missing something, but your description seems clear
Running process names:
$ProcessNames = @(Get-Process | Select-Object -ExpandProperty Name)
.. which aren't blank:
$ProcessNames = $ProcessNames | Where-Object {$_ -ne ''}
List of authorised names from a file:
$AuthorizedNames = Get-Content 'c:\Users\me\Desktop\Authorized.txt'
Compare:
$UnAuthorizedNames = $ProcessNames | Where-Object { $_ -notin $AuthorizedNames }
optional output to file:
$UnAuthorizedNames | Set-Content out.txt
or in the shell:
@(gps).Name -ne '' |? { $_ -notin (gc authorized.txt) } | sc out.txt
1 2 3 4 5 6 7 8
1. @() forces something to be an array, even if it only returns one thing
2. gps is a default alias of Get-Process
3. using .Property on an array takes that property value from every item in the array
4. using an operator on an array filters the array by whether the items pass the test
5. ? is an alias of Where-Object
6. -notin tests if one item is not in a collection
7. gc is an alias of Get-Content
8. sc is an alias of Set-Content
You should use Set-Content instead of Out-File and > because it handles character encoding nicely, and they don't. And because Get-Content/Set-Content sounds like a memorable matched pair, and Get-Content/Out-File doesn't.
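If you would rather stay with Compare-Object, as in the question, it can report the differing names directly: each result carries the value in InputObject and the side in SideIndicator, so no line counter is needed. A sketch with made-up in-memory lists standing in for Authorized.txt and the running process names:

```powershell
# Hypothetical sample data; in practice these would come from
# Get-Content Authorized.txt and (Get-Process).Name.
$authorized = 'explorer', 'svchost', 'winlogon'
$running    = 'svchost', 'explorer', 'notepad', 'svchost', ''

# Drop blanks and duplicates before comparing.
$names = $running | Where-Object { $_ -ne '' } | Sort-Object -Unique

# '=>' marks values present only in the difference set (the running names).
$unauthorized = Compare-Object -ReferenceObject $authorized -DifferenceObject $names |
    Where-Object { $_.SideIndicator -eq '=>' } |
    Select-Object -ExpandProperty InputObject

$unauthorized
```

Here winlogon is authorized but not running, so it shows up with '<=' and is filtered out; only notepad is left.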

Count tabs per line and return the lines with too many tabs

Looking for a PowerShell script that looks in a text file for rows that have too many (or too few) tabs.
I found this PowerShell script that does exactly what I want (almost).
This counts the number of tabs per row:
Get-Content test.txt | ForEach-Object {
($_ | Select-String `t -all).matches | Measure-Object | Select-Object count
}
Can someone extend/modify/re-write this to return only the rows (with row numbers) that have more than, or less than, X number of tabs per row?
Don't use Get-Content before piping to Select-String, you'll lose contextual information about each line.
Instead, use the -Path parameter with Select-String:
$Tabs = Select-String -Path .\test.txt -Pattern "`t" -AllMatches
$Tabs |Select-Object LineNumber,Line,@{Name='TabCount';Expression={ $_.Matches.Count }}
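To return only the lines where the number of tabs is greater than $x, use Where-Object. The match objects in $Tabs have no TabCount property (it exists only on the Select-Object output above), so test $_.Matches.Count directly:
$x = 3
$Tabs |Where-Object { $_.Matches.Count -gt $x } | Select-Object -ExpandProperty Line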
If you just want a quick overview of the distribution, you could also use Group-Object:
Get-Content .\test.txt | Group-Object { "{0} tabs" -f [regex]::Matches($_,"`t").Count }
Lots of ways to do this. Get-Content works just fine for me and we create a custom object that you can then filter as desired.
Get-Content test.txt | ForEach-Object{
New-Object PSObject -Property @{
Line = $_
LineNumber = $_.ReadCount
NumberofTabs = [regex]::matches($_,"`t").count
}
}
Use the .net regex method to count the tabs returned and populate a value based on the result.
NumberofTabs LineNumber Line
------------ ---------- ----
8 1 ;lkjasfdsa
8 2 asdfasdf
4 3 asdfasdfasdfa
2 4 fasdfjasdlfjas;l
Now you can use PowerShell to filter as you see fit.
} | Where-Object { $_.NumberofTabs -ne 4}
So if 4 were the expected number, line 3 would be omitted from the results.
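Putting the pieces together, here is a self-contained sketch that reports every line whose tab count differs from an expected value, along with its line number. The sample lines and the $expected value are assumptions for the demo; in practice $lines would come from Get-Content test.txt.

```powershell
# Sample data: 2, 1, 0 and 3 tabs respectively.
$lines = "a`tb`tc", "a`tb", "abc", "a`tb`tc`td"
$expected = 2

$lineNumber = 0
$bad = foreach ($line in $lines) {
    $lineNumber++
    # Count the tab characters in this line.
    $tabCount = [regex]::Matches($line, "`t").Count
    if ($tabCount -ne $expected) {
        [pscustomobject]@{ LineNumber = $lineNumber; TabCount = $tabCount; Line = $line }
    }
}

$bad
```

Because every line is inspected, lines with no tabs at all are reported too, which a Select-String search for the tab character would skip since it only returns matching lines.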

Using powershell, how do I extract a 7-digit number from a subject-line (of an email ), regular expressions?

I have the following code which lists the first 5 items in the Inbox folder (of Outlook).
How would I extract only the number portion of it (say, arbitrary 7-digit numbers embedded within other text)? Then, using PowerShell commands, I'd really like to take those extracted numbers and dump them to a CSV file (so they can be easily incorporated into an existing spreadsheet I use).
Here's what I tried :
$outlook = new-object -com Outlook.Application
$sentMail = $outlook.Session.GetDefaultFolder(6) # == olFolderInbox
$sentMail.Items | select -last 10 TaskSubject # ideally, grabbing first 20
$matches2 = "\d+$"
$res = gc $sentMail.Items | ?{$_ -match $matches2 | %{ $_ -match $matches2 | out-null; $matches[1] }
But this does not run correctly; instead it leaves me hanging at the awaiting-input prompt, like so:
>>
>>
>>
Do I need to perhaps create a separate variable in between the 1st part and 2nd part?
Not sure what the $matches variable is for but try to replace your last line with something like below.
For Subject Line Items:
$sentMail.Items | % { $_.TaskSubject | Select-String -Pattern '^\d{3}-\d{3}-\d{4}' | % {([string]$_).Substring(0,12)} }
For Message Body Items:
$sentMail.Items | % { ($_.Body).Split("`n") | Select-String -Pattern '^\d{3}-\d{3}-\d{4}' |% {([string]$_).Substring(0,12)} }
Here is a reference to Select-String, which I use pretty often.
https://technet.microsoft.com/library/hh849903.aspx
Here is a reference to the Phone number portion which I have never used but found pretty cool.
http://blogs.technet.com/b/heyscriptingguy/archive/2011/03/24/use-powershell-to-search-a-group-of-files-for-phone-numbers.aspx
Good luck!
Here is an edited version for 7 digit extraction via subject line. This assumes the number has a space on each side but can be modified a bit if necessary. You may also want to adjust the depth by changing the -First portion to Select * or just making 100 deeper in range.
$outlook = New-Object -com Outlook.Application
$Mail = $outlook.Session.GetDefaultFolder(6) # Folder Inbox
$Mail.Items | select -First 100 TaskSubject |
% { $_.TaskSubject | Select-String -Pattern '\s\d{7}\s'} |
% {((Select-String -InputObject $_ -Pattern '\s\d{7}\s').Line).split(" ") |
% {if(($_.Length -eq 7) -and ($_ -match '\d{7}')) {$_ | Out-File -FilePath "C:\Temp\SomeFile.csv" -Append}}}
Some of this you have already addressed / figured out but I wanted to explain the issues with your current code.
If you expect multiple matches and want to return those, then you would need to use Select-String with the -AllMatches parameter. Your regex, in your example, is currently looking for a sequence of digits at the end of the subject. That would only return one match, so let's look at the issues with your code.
$sentMail.Items | select -last 10 TaskSubject
You are filtering the last 10 items but you are not storing those for later use so they would merely be displayed on screen. We cover a solution later.
One of the primary reasons for using -match is to get the Boolean value that is returned for code like if blocks and where clauses. You can still use it in the way you intended. Looking at the current code in question:
$res = gc $sentMail.Items | ?{$_ -match $matches2 | %{ $_ -match $matches2 | out-null; $matches[1] }
There are two big issues with this. First, you are calling Get-Content (gc) on the items; Get-Content is for reading file data, which $sentMail.Items is not. Second, you have a large Where block. Where blocks pass data to the output stream based on a true or false condition. Your malformed statement ?{$_ -match $matches2 | %{ $_ -match $matches2 | out-null; $matches[1] } won't do this... at least not well. It is also missing a closing brace, which is why the shell sits at the >> continuation prompt waiting for more input.
$outlook = new-object -com Outlook.Application
$sentMail = $outlook.Session.GetDefaultFolder(6) # == olFolderInbox
$matches2 = "\d+$"
$sentMail.Items | select -last 10 -ExpandProperty TaskSubject | ?{$_ -match $matches2} | %{$Matches[0]}
Take the last 10 email subjects and check whether any of them match the regex string $matches2. For each one that does, return the matched text to standard output.
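For the 7-digit case with CSV output, something along these lines may work. It is a sketch only: the subject strings are made-up stand-ins for the TaskSubject values, the output path is a temporary file chosen for the demo, and the pattern assumes "7-digit number" means exactly seven digits with no digit directly before or after.

```powershell
# Hypothetical subjects standing in for $sentMail.Items | select -ExpandProperty TaskSubject
$subjects = @(
    'Ticket 1234567 opened'
    'No number here'
    'Re: order 7654321 and 1112223 shipped'
    'Too long 12345678'
)

# Exactly seven digits, not preceded or followed by another digit.
$pattern = '(?<!\d)\d{7}(?!\d)'

$numbers = foreach ($subject in $subjects) {
    foreach ($match in [regex]::Matches($subject, $pattern)) {
        [pscustomobject]@{ Subject = $subject; Number = $match.Value }
    }
}

# Assumption: a throwaway output file in the temp folder.
$csvPath = Join-Path ([System.IO.Path]::GetTempPath()) 'numbers-demo.csv'
$numbers | Export-Csv -Path $csvPath -NoTypeInformation
$imported = @(Import-Csv -Path $csvPath)
Remove-Item -Path $csvPath

$numbers
```

Export-Csv writes a header row and quotes the values, so the file opens cleanly in a spreadsheet; Out-File -Append, by contrast, just writes one bare value per line.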

Count number of files in each subfolder, ignoring files with certain name

Consider the following directory tree
ROOT
BAR001
foo_1.txt
foo_2.txt
foo_ignore_this_1.txt
BAR001_a
foo_3.txt
foo_4.txt
foo_ignore_this_2.txt
foo_ignore_this_3.txt
BAR001_b
foo_5.txt
foo_ignore_this_4.txt
BAR002
baz_1.txt
baz_ignore_this_1.txt
BAR002_a
baz_2.txt
baz_ignore_this_2.txt
BAR002_b
baz_3.txt
baz_4.txt
baz_5.txt
baz_ignore_this_3.txt
BAR002_c
baz_ignore_this_4.txt
BAR003
lor_1.txt
The structure will always be like this, so no deeper subfolders. I'm working on a script to count the number of files:
for each BARXXX folder
for each BARXXX_Y folder
textfiles with "ignore_this" in the name, should be ignored in the count
For the example above, this would result into:
Folder Filecount
---------------------
BAR001 2
BAR001_a 2
BAR001_b 1
BAR002 1
BAR002_a 1
BAR002_b 3
BAR002_c 0
BAR003 1
I now have:
Function Filecount {
param(
[string]$dir
)
$childs = Get-ChildItem $dir | where {$_.Attributes -eq 'Directory'}
Foreach ($childs in $child) {
Write-Host (Get-ChildItem $dir | Measure-Object).Count;
}
}
Filecount -dir "C:\ROOT"
(Not ready yet but building) This however, does not work. $child seems to be empty. Please tell me what I'm doing wrong.
Well, to start, you're running ForEach ($childs in $child), this syntax is backwards, so that will cause you some issues! If you swap it, so that you're running:
ForEach ($child in $childs)
You'll get the following output:
>2
>2
>1
>1
>1
>3
>0
Alright, I'm back now with the completed answer. For one, instead of using Write-Host, I'm using a PowerShell custom object to let PowerShell do the hard work for me. I'm setting FolderName to $child.Name, and then running Get-ChildItem on $child.FullName to get the file count. I've added an extra parameter called $ignoreme, which takes a wildcard pattern (with asterisks) matching the names you want to ignore.
Here's the complete answer now. Keep in mind that my file structure was a bit different than yours, so my file count is different at the bottom as well.
Function Filecount {
param(
[string]$dir="C:\TEMP\Example",
[string]$ignoreme = "*_*"
)
$childs = Get-ChildItem $dir | where {$_.Attributes -eq 'Directory'}
Foreach ($child in $childs) {
[pscustomobject]@{FolderName=$child.Name;ItemCount=(Get-ChildItem $child.FullName | ? Name -notlike $ignoreme | Measure-Object).Count}
}
}
>Filecount | ft -AutoSize
>FolderName ItemCount
>---------- ---------
>BAR001 2
>BAR001_A 1
>BAR001_b 2
>BAR001_C 0
>BAR002 0
>BAR003 0
If you're using PowerShell v 2.0, use this method instead.
Function Filecount {
param(
[string]$dir="C:\TEMP\Example",
[string]$ignoreme = "*_*"
)
$childs = Get-ChildItem $dir | where {$_.Attributes -eq 'Directory'}
Foreach ($child in $childs) {
$ObjectProperties = @{
FolderName=$child.Name
# v2.0 has no simplified Where-Object syntax, so use a script block
ItemCount=(Get-ChildItem $child.FullName | ? { $_.Name -notlike $ignoreme } | Measure-Object).Count}
New-Object PSObject -Property $ObjectProperties
}
}
I like that way of creating an object 1RedOne, haven't seen that before, thanks.
We can improve the performance of the code in a few ways: by using the Filter Left principle (filtering inside the cmdlet or its provider is generally more efficient than filtering afterwards in the pipeline), by performing fewer loops, and by removing an unnecessary step:
Function Filecount
{
param
(
[string]$dir = ".",
[parameter(mandatory=$true)]
[string]$ignoreme
)
Get-ChildItem -Recurse -Directory -Path $dir | ForEach-Object `
{
[pscustomobject]@{FolderName=$_.Name;ItemCount=(Get-ChildItem -Recurse -Exclude "*$ignoreme*" -Path $_.FullName).count}
}
}
So, firstly we can use the -Directory switch of Get-ChildItem in the top-level directory (this is available in v3.0 and above; v2.0 does not have it).
Then we can pipe the output of this directly in to the next loop, without storing it first.
Then we can replace another Where-Object with a provider -Exclude.
Finally, we can remove the Measure-Object as a simple count of the array will do:
Filecount "ROOT" "ignore_this" | ft -a
FolderName ItemCount
---------- ---------
BAR001 2
BAR001_a 2
BAR001_b 1
BAR002 1
BAR002_a 1
BAR002_b 3
BAR002_c 0
BAR003 1
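To try this kind of count without touching real data, the following sketch builds a small throwaway tree in the temp folder and counts files per folder. The folder layout is a cut-down assumption based on the question's example, and it counts only the files directly inside each folder, so it gives the same answer whether the BARXXX_Y folders are nested or siblings. It needs PowerShell 3.0 or later for -Directory and -File.

```powershell
# Assumption: a throwaway tree under the temp folder, created only for this demo.
$root = Join-Path ([System.IO.Path]::GetTempPath()) ("filecount-demo-" + [guid]::NewGuid())
$layout = @{
    'BAR001'   = 'foo_1.txt', 'foo_2.txt', 'foo_ignore_this_1.txt'
    'BAR002_c' = 'baz_ignore_this_4.txt'
}
foreach ($folder in $layout.Keys) {
    $dir = New-Item -ItemType Directory -Path (Join-Path $root $folder) -Force
    foreach ($name in $layout[$folder]) {
        New-Item -ItemType File -Path (Join-Path $dir.FullName $name) | Out-Null
    }
}

# One object per folder; count only the files directly inside it,
# skipping any whose name contains "ignore_this".
$counts = Get-ChildItem -Path $root -Recurse -Directory | ForEach-Object {
    $files = @(Get-ChildItem -LiteralPath $_.FullName -File |
        Where-Object { $_.Name -notlike '*ignore_this*' })
    [pscustomobject]@{ FolderName = $_.Name; ItemCount = $files.Count }
}

$counts | Sort-Object FolderName

# Clean up the demo tree.
Remove-Item -LiteralPath $root -Recurse -Force
```

Wrapping the inner pipeline in @(...) keeps the count at 0 for a folder such as BAR002_c, where every file is ignored.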
Cheers Folks!