We have a directory that contains many subdirectories (one per day), each holding several files. Unfortunately, files can be resent, so a file for 2020-01-01 can be resent on 2020-02-03 (with a slightly different filename, since a timestamp is appended to the filename). The structure looks something like this:
TopDir
20200801
AFile_20200801_20200801150000 (Timestamped 2020-08-01 15:00:00)
BFile_20200801_20200801150000
CFile_20200801_20200801150000
20200802
AFile_20200802_20200801150000
BFile_20200802_20200801150000
CFile_20200802_20200801150000
AFile_20200801_20200802150000 (Timestamped 2020-08-02 15:00:00)
So the AFile of 2020-08-01 has been resent on 2020-08-02 at 3 PM.
I am now trying to retrieve a list with the most recent file per day, so I built an array and populated it with all files below TopDir (recursively). So far so good, all files are found:
$path = "Y:\";
$FileArray = @()
$FileNameArray = @()
$FileArrayCounter = 0
foreach ($item in Get-ChildItem $path -Recurse)
{
if ($item.Extension -ne "")
{
$StringPart1, $StringPart2, $StringPart3, $StringPart4 = $item.Name.Split('_');
$FileNameShort = "{0}_{1}_{2}" -f $StringPart1.Trim(), $StringPart2.Trim(), $StringPart3.Trim();
$FileNameShort = $FileNameShort.Trim().ToUpper();
$FileArray += @{FileID = $FileArrayCounter; FileNameShort = $FileNameShort; FileName = $item.Name; FullName = $item.FullName; LastWriteTime = $item.LastWriteTime};
$FileArrayCounter ++;
}
}
$FileArray = $FileArray | sort FileNameShort; #@{Expression={"FileNameShort"}; Ascending=$True} #, @{Expression={"LastWriteTime"}; Descending=$True}
foreach($f in $FileArray)
{
Write-host($f.FileNameShort, $f.LastWriteTime)
}
Write-host($FileArrayCounter.ToString() + " files found");
The newly added column "FileNameShort" includes a substring of the filename. With this done, I receive two Rows for AFile_20200801:
AFile_20200801, AFile_20200801_20200801150000, ...
AFile_20200801, AFile_20200801_20200802150000, ...
However, when I try to sort my array (see above code), the output is NOT sorted by name. Instead I receive something like the following:
AFile_20200801
CFile_20200802
AFile_20200801
BFile_20200801
What I want to achieve is a sorting by FileNameShort ASCENDING and LastWriteTime DESCENDING.
What am I missing here?
Your sort does not work because $FileArray is an array of hash tables. The syntax Sort FileNameShort binds FileNameShort to the -Property parameter of Sort-Object. However, a hash table has no property called FileNameShort; FileNameShort is a key, not a property. You can see this if you run $FileArray[0] | Get-Member.
If you create them as custom objects, the simple sort syntax works.
$FileArray += [pscustomobject]@{FileID = $FileArrayCounter; FileNameShort = $FileNameShort; FileName = $item.Name; FullName = $item.FullName; LastWriteTime = $item.LastWriteTime}
$FileArray | Sort FileNameShort # This will sort correctly
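To get the ordering the question asks for (name ascending, timestamp descending) and then the newest file per name, something along these lines should work once the entries are custom objects. This is only a sketch: the sample rows and the $rows, $sorted and $latest variable names are made up for illustration.

```powershell
# Hypothetical sample data; the names and timestamps are invented for illustration.
$rows = @(
    [pscustomobject]@{ FileNameShort = 'CFILE_20200802'; LastWriteTime = [datetime]'2020-08-01 15:00:00' }
    [pscustomobject]@{ FileNameShort = 'AFILE_20200801'; LastWriteTime = [datetime]'2020-08-01 15:00:00' }
    [pscustomobject]@{ FileNameShort = 'BFILE_20200801'; LastWriteTime = [datetime]'2020-08-01 15:00:00' }
    [pscustomobject]@{ FileNameShort = 'AFILE_20200801'; LastWriteTime = [datetime]'2020-08-02 15:00:00' }
)

# Name ascending, then timestamp descending within each name.
$sorted = $rows | Sort-Object -Property @{ Expression = 'FileNameShort'; Descending = $false },
                                        @{ Expression = 'LastWriteTime'; Descending = $true }

# The newest entry now comes first within each name, so the first element
# of every group is the most recent file for that short name.
$latest = $sorted | Group-Object -Property FileNameShort | ForEach-Object { $_.Group[0] }

$latest | Format-Table FileNameShort, LastWriteTime
```

Grouping after sorting keeps the logic in two readable steps; with very many files a hashtable keyed by FileNameShort would avoid the sort entirely.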
As an aside, I do not recommend using += to add elements to an array. It is best either to output the results inside your loop and save the loop's output, or to create a list that has an .Add() method. The problem with += is that arrays have a fixed size, so each += creates a new array and copies the existing elements plus the new item into it. As the array grows, this becomes increasingly slow. See below for a more efficient example.
$FileArray = foreach ($item in Get-ChildItem $path -Recurse)
{
if ($item.Extension -ne "")
{
$StringPart1, $StringPart2, $StringPart3, $StringPart4 = $item.Name.Split('_');
$FileNameShort = "{0}_{1}_{2}" -f $StringPart1.Trim(), $StringPart2.Trim(), $StringPart3.Trim();
$FileNameShort = $FileNameShort.Trim().ToUpper();
# Outputting custom object here
[pscustomobject]@{FileID = $FileArrayCounter; FileNameShort = $FileNameShort; FileName = $item.Name; FullName = $item.FullName; LastWriteTime = $item.LastWriteTime};
$FileArrayCounter ++;
}
}
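The other option mentioned above, a list with an .Add() method, could look roughly like this. It is a sketch only: the $fileList and $names variables and the sample file names (including the .txt extension) are assumptions, standing in for the Get-ChildItem results.

```powershell
# A generic list grows in place, so Add() does not copy the existing elements.
$fileList = [System.Collections.Generic.List[object]]::new()

# Hypothetical file names standing in for the Get-ChildItem results.
$names = 'AFile_20200801_20200801150000.txt', 'AFile_20200801_20200802150000.txt'

foreach ($name in $names) {
    # Split into prefix, day and the remaining timestamp part.
    $prefix, $day, $stamp = $name.Split('_')
    $fileList.Add([pscustomobject]@{
        FileNameShort = ('{0}_{1}' -f $prefix, $day).ToUpper()
        FileName      = $name
    })
}

$fileList.Count
```

Capturing the loop output, as in the example above, is usually the shortest form; an explicit list is handy when items are added from several places.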
I just found the solution:
$FileArray = $FileArray | sort @{Expression={[string]$_.FileNameShort}; Ascending=$True}, @{Expression={[datetime]$_.LastWriteTime}; Descending=$True}
Still, I don't know why the first sort did not work as expected.
Related
I have a TXT file of about 1,300 megabytes (a huge thing). I want to build code that does two things:
Every line contains a unique ID at the beginning. For all lines with the same unique ID, I want to check whether the conditions are met for that "group" of IDs. (This tells me for how many lines with unique ID X all conditions have been met.)
When the script is finished, I want to remove all lines from the TXT where the condition was met (see 2), so I can rerun the script with another condition set to "narrow down" the whole document.
After a few cycles I finally have a set of conditions that applies to all lines in the document.
My current approach seems very slow (one cycle takes hours). My final result is a set of conditions that apply to all lines of code.
If you find an easier way to do that, feel free to recommend.
Help is welcome :)
Code so far (does not fulfill everything from 1 & 2):
foreach ($item in $liste)
{
# Check Conditions
if ( ($item -like "*XXX*") -and ($item -like "*YYY*") -and ($item -notlike "*ZZZ*")) {
# Add a line to a document to see which lines match condition
Add-Content "C:\Desktop\it_seems_to_match.txt" "$item"
# Retrieve the unique ID from the line and feed array.
$array += $item.Split("/")[1]
# Remove the line from final document
$liste = $liste -replace $item, ""
}
}
# Pipe the "new cleaned" list somewhere
$liste | Set-Content -Path "C:\NewListToWorkWith.txt"
# Show me the counts
$array | group | % { $h = @{} } { $h[$_.Name] = $_.Count } { $h } | Out-File "C:\Desktop\count.txt"
Demo Lines:
images/STRINGA/2XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGA/3XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGB/4XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGB/5XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGC/5XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
performance considerations:
Add-Content "C:\Desktop\it_seems_to_match.txt" "$item"
Try to avoid wrapping cmdlet pipelines: calling Add-Content inside the loop opens and closes the file on every iteration.
See also: Mastering the (steppable) pipeline
$array += $item.Split("/")[1]
Try to avoid using the increase assignment operator (+=) to create a collection
See also: Why should I avoid using the increase assignment operator (+=) to create a collection
$liste = $liste -replace $item, ""
This is a very expensive operation, considering that you are reassigning (copying) a long list ($liste) on each iteration.
Besides, it is bad practice to change an array while you are iterating over it.
$array | group | ...
Group-Object is a rather slow cmdlet; it is better to collect (or count) the items on the fly (where you do $array += $item.Split("/")[1]) using a hashtable, something like:
$Name = $item.Split("/")[1]
if (!$HashTable.Contains($Name)) { $HashTable[$Name] = [Collections.Generic.List[String]]::new() }
$HashTable[$Name].Add($Item)
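Putting these points together, a single pass that counts matches per ID and keeps the non-matching lines might look like the sketch below. The temp-file paths, the shortened sample lines and the three -like patterns are placeholders I made up; substitute your own conditions.

```powershell
# Hypothetical input file built from demo-style lines so the sketch is self-contained.
$inputPath = Join-Path ([System.IO.Path]::GetTempPath()) 'demo_lines.txt'
$keepPath  = Join-Path ([System.IO.Path]::GetTempPath()) 'demo_lines_remaining.txt'
@(
    'images/STRINGA/2XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg'
    'images/STRINGA/3XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_AAAA5.jpg'
    'images/STRINGB/4XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_AAAA5.jpg'
) | Set-Content -Path $inputPath

$counts    = @{}                                              # ID -> number of matching lines
$remaining = [System.Collections.Generic.List[string]]::new() # lines that did not match

# ReadLines streams the file line by line instead of loading it all at once.
foreach ($line in [System.IO.File]::ReadLines($inputPath)) {
    if ($line -like '*XXX*' -and $line -like '*YY0*' -and $line -notlike '*ZZZ*') {
        $id = $line.Split('/')[1]
        $counts[$id] = 1 + $counts[$id]   # a missing key reads as $null, which adds as 0
    }
    else {
        $remaining.Add($line)
    }
}

# Write the leftover lines once, after the loop, instead of rewriting the list per match.
$remaining | Set-Content -Path $keepPath
$counts
```

The input is never modified while it is being read, and the output file is written exactly once, which avoids both the repeated copying and the per-line Add-Content calls.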
To minimize memory usage, it may be better to read one line at a time and check whether it already exists. In the code below I used StringReader; you can replace it with StreamReader to read from a file. I'm checking whether the entire string exists, but you may want to split the line. Notice that I have duplicates in the input but not in the dictionary. See the code below:
$rows= #"
images/STRINGA/2XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGA/3XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGB/4XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGB/5XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGC/5XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGA/2XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGA/3XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGB/4XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGB/5XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGC/5XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
"#
$dict = [System.Collections.Generic.Dictionary[int, System.Collections.Generic.List[string]]]::new();
$reader = [System.IO.StringReader]::new($rows)
while(($row = $reader.ReadLine()) -ne $null)
{
$hash = $row.GetHashCode()
if($dict.ContainsKey($hash))
{
#check if list contains the string
if($dict[$hash].Contains($row))
{
#string is a duplicate
}
else
{
#add string to dictionary value if it is not in list
$list = $dict[$hash] # the indexer already returns the list; it has no .Value property
$list.Add($row)
}
}
else
{
#add new hash value to dictionary
$list = [System.Collections.Generic.List[string]]::new();
$list.Add($row)
$dict.Add($hash, $list)
}
}
$dict
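If exact-duplicate detection is all that is needed, a HashSet does the hash-then-compare bookkeeping itself. The following is a sketch of that swapped-in technique, not the code above; the shortened sample lines and the $lines, $seen and $unique names are made up.

```powershell
# Shortened, hypothetical sample lines; the third repeats the first.
$lines = @(
    'images/STRINGA/2X.jpg'
    'images/STRINGB/4X.jpg'
    'images/STRINGA/2X.jpg'
)

$seen   = [System.Collections.Generic.HashSet[string]]::new()
$unique = foreach ($line in $lines) {
    # Add() returns $true only the first time a value is inserted.
    if ($seen.Add($line)) { $line }
}

$unique
```

The result keeps the first occurrence of each line in input order.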
I'm trying to insert my CSV into my SQL Server database, and I'm wondering how I can remove the last three characters from the CSV GID column and then assign the result to my $CSVHold1 variable.
My CSV file look like this
GID Source Type Message Time
KLEMOE http://google.com Od Hello 12/22/2022
EEINGJ http://facebook.com Od hey 12/22/2022
Basically, I'm trying to get only the first three characters of GID (the values are six characters long) and pass that value to my $CSVHold1 variable.
$CSVImport = Import-CSV $Global:ErrorReport
ForEach ($CSVLine1 in $CSVImport) {
$CSVHold1 = $CSVLine1.GID | ForEach-Object { $_.$GID = $_.$GID.subString(0, $_.$GID.Length - 3); $_ }
$CSVGID1 = $CSVLine1.GID
$CSVSource1 = $CSVLine1.Source
$CSVTYPE1 = $CSVLine1.TYPE
$CSVMessage1 = $CSVLine1.Message
}
I'm trying to do like above but some reason I'm getting an error.
You cannot call a method on a null-valued expression.
Your original line 3 was/is not valid syntax as Santiago pointed out.
$CSVHold1 = $CSVLine1.GID | ForEach-Object { $_.$GID = $_.$GID.subString(0, $_.$GID.Length - 3); $_ }
You are calling $_.$GID but you're wanting $_.GID
You also don't need to pipe the object into a loop to achieve what it seems you are asking.
#!/usr/bin/env powershell
$csvimport = Import-Csv -Path $env:HOMEDRIVE\Powershell\TestCSVs\test1.csv
##$CSVImport = Import-CSV $Global:ErrorReport
ForEach ($CSVLine1 in $CSVImport) {
$CSVHold1 = $CSVLine1.GID.SubString(0, $CSVLine1.GID.Length - 3)
$CSVGID1 = $CSVLine1.GID
$CSVSource1 = $CSVLine1.Source
$CSVTYPE1 = $CSVLine1.TYPE
$CSVMessage1 = $CSVLine1.Message
Write-Output -InputObject ('Changing {0} to {1}' -f $CSVLine1.gid, $CSVHold1)
}
Using your sample data, the above outputs:
C:> . 'C:\Powershell\Scripts\dchero.ps1'
Changing KLEMOE to KLE
Changing EEINGJ to EEI
Lastly, be aware that the SubString method will throw if the length of $CSVLine1.GID is less than 3, because the requested length would be negative.
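One way to guard against short values is to clamp the length before calling Substring. This is a sketch: Remove-TrailingChars is a hypothetical helper name, and returning an empty string for short input is my assumption about the desired behavior.

```powershell
# Hypothetical helper; the name and the empty-string result for short input are assumptions.
function Remove-TrailingChars {
    param(
        [string]$Text,
        [int]$Count = 3
    )
    # Clamp the kept length at zero so short input does not throw.
    $keep = [Math]::Max(0, $Text.Length - $Count)
    $Text.Substring(0, $keep)
}

Remove-TrailingChars -Text 'KLEMOE'   # KLE
Remove-TrailingChars -Text 'AB'       # empty string
```

Inside the loop this would be used as $CSVHold1 = Remove-TrailingChars -Text $CSVLine1.GID.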
I have two large CSVs to compare. Both CSVs are basically data from the same system, one day apart. There are around 12k rows and 30 columns.
The aim is to identify what column data has changed for primary key(#ID).
My idea was to loop through the CSVs to identify which rows have changed and dump these into a separate CSV. Once done, I loop through the changed rows again and identify the exact change in each column.
$NewCSV = Import-Csv -Path ".\Data_A.csv"
$OldCSV = Import-Csv -Path ".\Data_B.csv"
foreach ($LineNew in $NewCSV)
{
ForEach ($LineOld in $OldCSV)
{
If($LineNew -eq $LineOld)
{
Write-Host $LineNew, " Match"
}else{
Write-Host $LineNew, " Not Match"
}
}
}
But as soon as I run the loop, it takes forever for 12k rows. I was hoping there is a more efficient, quicker way to compare large files in PowerShell.
Well, you can give this a try. I'm not claiming it will be fast, for the reasons vonPryz has already pointed out, but it should give you a good side-by-side view of what has changed from OldCsv to NewCsv.
Note: Those cells that have the same value on both CSVs will be ignored.
$NewCSV = Import-Csv -Path ".\Data_A.csv"
$OldCSV = Import-Csv -Path ".\Data_B.csv" | Group-Object ID -AsHashTable -AsString
$properties = $newCsv[0].PSObject.Properties.Name
$result = foreach($line in $NewCSV)
{
if($ref = $OldCSV[$line.ID])
{
foreach($prop in $properties)
{
if($line.$prop -ne $ref.$prop)
{
[pscustomobject]@{
ID = $line.ID
Property = $prop
OldValue = $ref.$prop
NewValue = $line.$prop
}
}
}
continue
}
Write-Warning "ID $($line.ID) could not be found on Old Csv!!"
}
As vonPryz hints in the comments, you've written an algorithm with quadratic time complexity (O(n²) in Big-O notation): every time the input size doubles, the number of computations performed increases 4-fold.
To avoid this, I'd suggest using a hashtable or other dictionary type to hold each data set, and use the primary key from the input as the dictionary key. This way you get constant-time lookup of corresponding records, and the time complexity of your algorithm becomes near-linear (O(2n + k)):
$NewCSV = @{}
Import-Csv -Path ".\Data_A.csv" |ForEach-Object {
$NewCSV[$_.ID] = $_
}
$OldCSV = @{}
Import-Csv -Path ".\Data_B.csv" |ForEach-Object {
$OldCSV[$_.ID] = $_
}
Now that we can efficiently resolve each row by its ID, we can inspect the whole of the data sets with an independent loop over each:
foreach($entry in $NewCSV.GetEnumerator()){
if(-not $OldCSV.ContainsKey($entry.Key)){
# $entry.Value is a new row, not seen in the old data set
continue # nothing to compare against, so skip to the next entry
}
$newRow = $entry.Value
$oldRow = $OldCSV[$entry.Key]
# do the individual comparison of the rows here
}
Do another loop like the one above, but with the roles of $NewCSV and $OldCSV swapped (iterate over $OldCSV and look each key up in $NewCSV), to detect deletions.
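For reference, here is how the pieces could fit together end to end. It is only a sketch under stated assumptions: the inline sample data (columns ID, Name, City) stands in for Data_B.csv and Data_A.csv, and the $old, $new, $changes and $deleted names and the shape of the output objects are my own choices.

```powershell
# Inline sample data standing in for Data_B.csv (old) and Data_A.csv (new).
$old = @{}
@'
ID,Name,City
1,Ann,Oslo
2,Bob,Rome
3,Cid,Bern
'@ | ConvertFrom-Csv | ForEach-Object { $old[$_.ID] = $_ }

$new = @{}
@'
ID,Name,City
1,Ann,Oslo
2,Bob,Paris
4,Dee,Kyiv
'@ | ConvertFrom-Csv | ForEach-Object { $new[$_.ID] = $_ }

# First pass: rows that were added or whose column values differ.
$changes = foreach ($id in $new.Keys) {
    if (-not $old.ContainsKey($id)) {
        [pscustomobject]@{ ID = $id; Change = 'Added'; Property = $null; OldValue = $null; NewValue = $null }
        continue
    }
    foreach ($prop in $new[$id].PSObject.Properties.Name) {
        if ($new[$id].$prop -ne $old[$id].$prop) {
            [pscustomobject]@{
                ID = $id; Change = 'Modified'; Property = $prop
                OldValue = $old[$id].$prop; NewValue = $new[$id].$prop
            }
        }
    }
}

# Second pass with the roles swapped: rows that disappeared.
$deleted = foreach ($id in $old.Keys) {
    if (-not $new.ContainsKey($id)) { $id }
}

$changes | Sort-Object ID | Format-Table
"Deleted IDs: $($deleted -join ', ')"
```

Each row is looked up by key rather than scanned for, so the work grows with the number of rows times the number of columns instead of with the square of the row count.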
I need help with loop processing an array of arrays. I have finally figured out how to do it, and I am doing it as such...
$serverList = $1Servers,$2Servers,$3Servers,$4Servers,$5Servers
$serverList | % {
% {
Write-Host $_
}
}
I can't get it to process correctly. What I'd like to do is create a CSV from each array, and title the lists accordingly. So 1Servers.csv, 2Servers.csv, etc... The thing I can not figure out is how to get the original array name into the filename. Is there a variable that holds the list object name that can be accessed within the loop? Do I need to just do a separate single loop for each list?
You can try :
$1Servers = "Mach1","Mach2"
$2Servers = "Mach3","Mach4"
$serverList = $1Servers,$2Servers
$serverList | % {$i=0}{$i+=1;$_ | % {New-Object -Property @{"Name"=$_} -TypeName PsCustomObject} |Export-Csv "c:\temp\$($i)Servers.csv" -NoTypeInformation }
I take each list and create new objects that I export to a CSV file. The way I create the file name is not so nice: I don't take the variable name, I just recreate it, so if your list is not sorted it will not work.
It would perhaps be more efficient if you store your servers in a hash table :
$1Servers = @{Name="1Servers"; Computers="Mach1","Mach2"}
$2Servers = @{Name="2Servers"; Computers="Mach3","Mach4"}
$serverList = $1Servers,$2Servers
$serverList | % {$name=$_.name;$_.computers | % {New-Object -Property @{"Name"=$_} -TypeName PsCustomObject} |Export-Csv "c:\temp\$($name).csv" -NoTypeInformation }
Much like JPBlanc's answer, I kinda have to kludge the filename... (FWIW, I can't see how you can get that out of the array itself).
I did this example w/ foreach instead of foreach-object (%). Since you have actual variable names you can address w/ foreach, it seems a little cleaner, if nothing else, and hopefully a little easier to read/maintain:
$1Servers = "apple.contoso.com","orange.contoso.com"
$2Servers = "peach.contoso.com","cherry.contoso.com"
$serverList = $1Servers,$2Servers
$counter = 1
foreach ( $list in $serverList ) {
$fileName = "{0}Servers.csv" -f $counter++
"FileName: $fileName"
foreach ( $server in $list ) {
"-- ServerName: $server"
}
}
I was able to resolve this issue myself. Because I wasn't able to get the object name through, I just changed the nature of the object. So now my server lists consist of two columns, one of which is the name of the list itself.
So...
$1Servers += [pscustomobject] @{
Servername = $entry.Servername
Domain = $entry.Domain
}
Then...
$serverList = $usaServers,$devsubServers,$wtencServers,$wtenclvServers,$pcidevServers
Then I am able to use that second column to name the lists within my foreach loop.
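Another way to carry the list name along is to key the lists by name in an ordered dictionary, so the name is available as the key inside the loop. A sketch, assuming a temp-folder output location and the made-up $serverLists and $outDir variables:

```powershell
# Keys are the list names; [ordered] keeps them in insertion order.
$serverLists = [ordered]@{
    '1Servers' = 'Mach1', 'Mach2'
    '2Servers' = 'Mach3', 'Mach4'
}

# Hypothetical output folder under the temp directory.
$outDir = Join-Path ([System.IO.Path]::GetTempPath()) 'serverlists'
New-Item -ItemType Directory -Path $outDir -Force | Out-Null

foreach ($entry in $serverLists.GetEnumerator()) {
    # The dictionary key doubles as the file name.
    $csvPath = Join-Path $outDir ('{0}.csv' -f $entry.Key)
    $entry.Value |
        ForEach-Object { [pscustomobject]@{ Name = $_ } } |
        Export-Csv -Path $csvPath -NoTypeInformation
}

Get-ChildItem -Path $outDir -Filter '*.csv' | Select-Object -ExpandProperty Name
```

A variable's name is not attached to the array it holds, which is why the name has to travel with the data in some form like this.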
Is there a function, method, or language construction allowing to retrieve a single column from a multi-dimensional array in Powershell?
$my_array = @()
$my_array += ,@(1,2,3)
$my_array += ,@(4,5,6)
$my_array += ,@(7,8,9)
# I currently use that, and I want to find a better way:
foreach ($line in $my_array) {
[array]$single_column += $line[1] # fetch column 1
}
# now $single_column contains only 2 and 5 and 8
My final goal is to find non-duplicated values from one column.
Sorry, I don't think anything like that exists. I would go with:
@($my_array | foreach { $_[1] })
To quickly find unique values I tend to use hashtables keys hack:
$UniqueArray = @($my_array | foreach -Begin {
$unique = @{}
} -Process {
$unique.($_[1]) = $null
} -End {
$unique.Keys
})
Obviously it has its limitations...
To extract one column:
$single_column = $my_array | foreach { $_[1] }
To extract any columns:
$some_columns = $my_array | foreach { ,@($_[2],$_[1]) } # any order
To find non-duplicated values from one column:
$unique_array = $my_array | foreach {$_[1]} | sort-object -unique
# caveat: the resulting array is sorted,
# so BartekB has a better solution if sorting is a problem
I tried @BartekB's solution and it worked for me. But for the unique part I did the following.
@($my_array | foreach { $_[1] } | select -Unique)
I am not very familiar with powershell but I am posting this hoping it helps others since it worked for me.
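If the original order matters and the array is large, a HashSet gives order-preserving de-duplication in one pass. This is a sketch of that alternative; the $grid data (with a repeated value in column 1) and the variable names are made up.

```powershell
# Hypothetical jagged array; column 1 contains the value 2 twice.
$grid = @(1, 2, 3), @(4, 2, 6), @(7, 8, 9)

# Column 1 of every row.
$column = foreach ($row in $grid) { $row[1] }

# Order-preserving de-duplication: Add() is $true only for unseen values.
$seen   = [System.Collections.Generic.HashSet[object]]::new()
$unique = foreach ($value in $column) { if ($seen.Add($value)) { $value } }

"Column: $($column -join ', ')"
"Unique: $($unique -join ', ')"
```

For small arrays, select -Unique or sort-object -unique as shown above is simpler and just as good.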