So far I have a hash table with 2 values in it. Right now the code below exports all the unique lines and gives me a count of how many times each line was referenced across hundreds of XML files. This is one part.
I now need to find out which subfolder had the XML file in it that has that unique line referenced in the hash table. Is this possible?
$ht = @{}
Get-ChildItem -recurse -Filter *.xml | Get-Content | %{$ht[$_] = $ht[$_]+1}
$ht
# To export to CSV:
$ht.GetEnumerator() | select key, value | Export-Csv D:\output.csv
To get the file path into your output, you need to capture it in a variable in the first pipeline stage.
Is this something similar to what you need?
$ht = @{}
Get-ChildItem -recurse -Filter *.xml | %{$path = $_.FullName; Get-Content $path} | % { $ht[$_] = $ht[$_] + $path + ";"}
The code above will return a hash table in "config line" = "semicolon-separated list of paths" format.
EDIT:
If you need to return three elements (unique line, count, and an array of paths where it was found), it gets more complicated. Here is code that returns an array of PSObjects; each one holds the info for one unique line found in the XML files.
$ht = @()
$files = Get-ChildItem -Recurse -Filter *.xml
foreach ($file in $files) {
    $path = $file.FullName
    $lines = Get-Content $path
    foreach ($line in $lines) {
        if ($match = $ht | where { $_.Line -eq $line }) {
            $match.Count = $match.Count + 1
            $match.Paths += $path
        } else {
            $ht += New-Object PSObject -Property @{
                Count = 1
                Paths = @(,$path)
                Line  = $line }
        }
    }
}
$ht
I'm sure it can be shortened and optimized, but hopefully it is enough to get you started.
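One possible optimization, as a sketch (my addition, untested against your data): key a hashtable on the line itself, so each lookup is constant-time instead of scanning the array with where for every line:
# Sketch: hashtable keyed by line, so lookups are O(1)
$index = @{}
Get-ChildItem -Recurse -Filter *.xml | ForEach-Object {
    $path = $_.FullName
    foreach ($line in (Get-Content $path)) {
        if ($index.ContainsKey($line)) {
            $index[$line].Count++
            $index[$line].Paths += $path
        } else {
            $index[$line] = New-Object PSObject -Property @{
                Line  = $line
                Count = 1
                Paths = @(,$path)
            }
        }
    }
}
$index.Values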
I'm trying to get the script to read the first line of files.txt, grab the requested metadata, output the .xml, then move on to the next line and repeat.
I expect each line to produce its own individual file with the metadata, and the next line to do the same.
Currently it creates all the individual files, but the data is combined and duplicated across them.
files.txt contains the full paths of the files whose metadata is being collected, e.g.
D:\data\testscript.ps1
D:\data\workingfile.doc
C:\Windows\temp\output.txt
Filesv2.txt contains the filenames for the XML output, index-aligned with files.txt, e.g.
D_data_testscript.ps1
D_data_workingfile.doc
C_Windows_temp_output.txt
$logdir = "C:\Users\gnome\Documents"
$inputPath = Get-Content -Path "C:\Users\gnome\Documents\files.txt"
$inputSave = Get-Content -Path "C:\Users\gnome\Documents\filesv2.txt"

#Get-*
$hash = Get-FileHash -Path $inputPath
$acl = Get-Acl -Path $inputPath | Select-Object *
$metadata = Get-ChildItem -Path $inputPath | Select-Object *

#Loop each directory in $inputPath
#ForEach ($path in $inputPath){
$output = ForEach ($path in $inputPath){
    Write-Host Checking $path
    ForEach($inputSave in $inputSave){
        @{
            #$log = "$logdir\$inputSave.xml"
            sha256Hash = $hash
            acl = $acl
            metadata = $metadata
        }
        $output | Export-Clixml "$logdir\test1_$inputSave.xml"
    }
}
From your comment, files.txt stores full paths and filenames, and filesv2.txt has new names for these files, according to some naming convention, to be used for the output XML filenames.
Having both arrays separate from each other in separate files is somewhat accident-prone, because all that links a file name to its convention name is the index in both arrays.
The code below first creates a Hashtable from these arrays, assuming the indices match and both arrays have the same number of elements:
$logdir = "C:\Users\gnome\Documents"
$inputPath = @(Get-Content -Path "C:\Users\gnome\Documents\files.txt")   # full path and filenames
$inputSave = @(Get-Content -Path "C:\Users\gnome\Documents\filesv2.txt") # naming convention for the output

# create a Hashtable where the input from files.txt is key and the naming convention for the output xml is value
$filesHash = @{}
for ($i = 0; $i -lt $inputPath.Count; $i++) {
    $filesHash[$inputPath[$i]] = $inputSave[$i]
}

# now iterate
$filesHash.GetEnumerator() | ForEach-Object {
    Write-Host Checking $_.Key
    $output = [PsCustomObject]@{
        sha256Hash = Get-FileHash -Path $_.Key -Algorithm SHA256
        acl        = Get-Acl -Path $_.Key
        metadata   = Get-Item -Path $_.Key
    }
    $outFile = Join-Path -Path $logdir -ChildPath ('{0}.xml' -f $_.Value)
    $output | Export-Clixml -Path $outFile
}
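Since the index is the only thing linking the two files, a cheap sanity check before building the Hashtable may be worth adding (my addition, not part of the answer above):
# Guard: abort early if files.txt and filesv2.txt are out of sync
if ($inputPath.Count -ne $inputSave.Count) {
    throw "files.txt and filesv2.txt must contain the same number of lines."
}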
A little background on my intent:
The purpose of the first function, GetFileNames, is to read through directories in a main folder that contains digital information on a number of facilities, and return the list of files with their BaseName.
The second function, SplitFileNames, takes the output of GetFileNames and splits each name into 3 parts using the underscore as a delimiter. The names of the files are all structured like this: SITECODE_FACNUM_FILETYPE, for example DKFX_00099_MAP. Each of the 3 parts forms an individual column that I then import into an Access database.
Also, I've never used PowerShell before this, and the experience I've had so far is basically a combination of reverse engineering and splicing code from a number of sources, plus obviously some of my own writing.
My questions/respectful requests are:
I'm almost certain there's a better way to do what I'm trying to accomplish, and that I just don't have a solid enough understanding of the material I've gone through to make it happen. So I would definitely appreciate any recommendations for improvement.
I also need the hyperlink information contained in FullName as a column as well, but unfortunately I could never get it to work correctly, since I have to split only the BaseName into 3 pieces.
Thank you!
$targetPath = "C:\Users\mattm\Google Drive\TestDatabase\"
$outputPath_1 = "C:\Users\mattm\Google Drive\Powershell Scripts\Facilities Database Scanner\outputScan_1.csv"
$outputPath_2 = "C:\Users\mattm\Google Drive\Powershell Scripts\Facilities Database Scanner\outputScan_2.csv"
$delimPath = "_"
Function GetFileNames([string]$path, [string]$outputFile) {
    $list = Get-ChildItem $path -Recurse | where {!$_.PSIsContainer}
    $list | Select-Object BaseName | Export-Csv -NoTypeInformation $outputFile
}
GetFileNames $targetPath $outputPath_1

Function SplitFileNames([string]$inputFile, [string]$outputFile) {
    $inputData = Get-Content $inputFile | select -Skip 1
    $array = @()
    $outArray = @()
    $inputData | Foreach{
        $elements = $_.split($delimPath)
        $array += ,@($elements[0], $elements[1], $elements[2])
    }
    Foreach($value in $array){
        $outArray += New-Object PSObject -Property @{
            'SiteCode'       = $value[0]
            'FacilityNumber' = $value[1]
            'FileTypeCode'   = $value[2]
        }
    }
    $outArray | Select-Object "SiteCode","FacilityNumber","FileTypeCode" | ConvertTo-Csv -NoTypeInformation | % {$_ -replace '"',""} | Out-File $outputFile -Force -Encoding ascii
}
SplitFileNames $outputPath_1 $outputPath_2
You might take a little time to learn the basics of PowerShell. Reverse engineering might not be the best teacher for this. ;-) Start with something like this:
$Path = 'Enter your path here'
Get-ChildItem -Path $Path -Recurse -File |
    ForEach-Object{
        $One,$Two,$Three = $_.BaseName -split '_'
        [PSCustomObject][ordered]@{
            SiteCode  = $One
            FacNum    = $Two
            FileType  = $Three
            BaseName  = $_.BaseName
            Hyperlink = $_.FullName
        }
    }
I'm extremely happy with the results from this new version!
$moduleCustomUIMessage ='path to UI module'
$outputFile_3 = 'output path'
$Path = 'target path of directory to scan'
Import-Module $moduleCustomUIMessage
$results = Get-ChildItem -Path $Path -Recurse | where {!$_.PSIsContainer} |
    ForEach-Object{
        $One,$Two,$Three = $_.BaseName -split '_'
        [PSCustomObject]@{
            SiteCode  = $One
            FacNum    = $Two
            FileType  = $Three
            BaseName  = $_.BaseName
            Hyperlink = $_.FullName
        }
    } | Select-Object SiteCode, FacNum, FileType, BaseName, Hyperlink | Export-Csv -NoTypeInformation $outputFile_3
CustomUIMessage $outputFile_3
I am using the following script, which iterates through hundreds of text files looking for matches of the regex pattern within. I need to add a second data point to the array that tells me which file the pattern matched in.
In the below script the [Regex]::Matches($str, $Pattern) | % { $_.Value } piece returns multiple rows per file, which cannot be easily output to a file.
What I would like to know is: how would I output a two-column CSV file, one column with the file name (which should be $_.FullName) and one column with the regex results? The code showing where I am now is below.
$FolderPath = "C:\Test"
$Pattern = "(?i)(?<=\b^test\b)\s+(\w+)\S+"
$Lines = @()
Get-ChildItem -Recurse $FolderPath -File | ForEach-Object {
    $_.FullName
    $str = Get-Content $_.FullName
    $Lines += [Regex]::Matches($str, $Pattern) |
        % { $_.Value } |
        Sort-Object |
        Get-Unique
}
$Lines = $Lines.Trim().ToUpper() -replace '[\r\n]+', ' ' -replace ";", '' |
    Sort-Object |
    Get-Unique # Cleaning up data in array
I can think of two ways, but the simplest is to use a hashtable (dict). The other way is to create PSObjects to fill your $Lines variable. I am going to go with the simple way, so you only need one variable: the hashtable.
$FolderPath = "C:\Test"
$Pattern = "(?i)(?<=\b^test\b)\s+(\w+)\S+"
$Results = @{}
Get-ChildItem -Recurse $FolderPath -File |
    ForEach-Object {
        $str = Get-Content $_.FullName
        $Line = [regex]::Matches($str,$Pattern) | % { $_.Value } | Sort-Object | Get-Unique
        $Line = $Line.Trim().ToUpper() -replace '[\r\n]+', ' ' -replace ";",'' | Sort-Object | Get-Unique # Cleaning up data in array
        $Results[$_.FullName] = $Line
    }
$Results.GetEnumerator() | Select @{L="File";E={$_.Key}}, @{L="Matches";E={$_.Value}} | Export-Csv -NoType -Path <Path to save CSV>
Your results will be in $Results. $Results.Keys contains the file paths, and $Results.Values holds the matches from the expression. You can reference the results for a particular file by its key, $Results["file path"]; of course this will error if the key does not exist.
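For completeness, the PSObject route mentioned above could look something like this (a sketch; it assumes you want all matches for a file joined into a single CSV cell):
# Alternative sketch: emit one object per file instead of filling a hashtable
$Lines = Get-ChildItem -Recurse $FolderPath -File | ForEach-Object {
    $str = Get-Content $_.FullName
    $matched = [regex]::Matches($str, $Pattern) | % { $_.Value } | Sort-Object | Get-Unique
    New-Object PSObject -Property @{
        File    = $_.FullName
        Matches = ($matched -join '; ')
    }
}
$Lines | Select File, Matches | Export-Csv -NoTypeInformation -Path <Path to save CSV>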
I am trying to append binary AFP files into one file. When I use my code below, the same file gets written three times instead of the three files being appended into one file. Why would the value of $bytes not change? Using Get-Content was unsuccessful, as it caused errors in the AFP file.
$dira = "D:\User1\Desktop\AFPTest\"
$list = Get-ChildItem $dira -Filter *.afp -Recurse | % { $_.FullName } | Sort-Object
foreach($afpFile in $list){
    $bytes = [System.IO.File]::ReadAllBytes($afpFile)
    [io.file]::WriteAllBytes("D:\User1\Desktop\AFPTest\Content.afp",$bytes)
}
The script below is from after I made a change to accumulate $bytes into a $data variable and then write out $data.
$dira = "D:\User1\Desktop\AFPTest\"
$list = Get-ChildItem $dira -Filter *.afp -Recurse | % { $_.FullName } | Sort-Object -Descending
foreach($afpFile in $list){
    Write-Host $afpFile
    $bytes = [System.IO.File]::ReadAllBytes($afpFile)
    $data += $bytes
}
[io.file]::WriteAllBytes("D:\User1\Desktop\AFPTest\Content.afp",$bytes)
I attempted to combine them manually by reading each of the three files into its own variable and then adding them to the $data array, but the same issue of the repeated image happens. The code is below.
$dira = "D:\User1\Desktop\AFPTest\"
$list = Get-ChildItem $dira -Filter *.afp -Recurse | % { $_.FullName } | Sort-Object
$file3 = [System.IO.File]::ReadAllBytes("D:\User1\Desktop\AFPTest\000001.afp")
$file2 = [System.IO.File]::ReadAllBytes("D:\User1\Desktop\AFPTest\000002.afp")
$file1 = [System.IO.File]::ReadAllBytes("D:\User1\Desktop\AFPTest\000003.afp")
$data = $file1 + $file2
[io.file]::WriteAllBytes("D:\User1\Desktop\AFPTest\AFP.afp",$data)
WriteAllBytes() always creates a new file. You want to append. Try this:
...
$bytes = @()
foreach($afpFile in $list) {
    $bytes += [System.IO.File]::ReadAllBytes($afpFile)
}
[io.file]::WriteAllBytes("D:\User1\Desktop\AFPTest\Content.afp",$bytes)
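If the AFP files are large, a streaming variant avoids buffering everything in memory; here is a sketch using .NET file streams (my addition, same $list as above):
# Sketch: copy each source file straight into the target stream
$out = [System.IO.File]::Open("D:\User1\Desktop\AFPTest\Content.afp", [System.IO.FileMode]::Create)
try {
    foreach ($afpFile in $list) {
        $in = [System.IO.File]::OpenRead($afpFile)
        try     { $in.CopyTo($out) }
        finally { $in.Close() }
    }
}
finally {
    $out.Close()
}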
I have 6 files which are created dynamically (so I don't know their contents). I need to compare these 6 files (more precisely, compare one file with the 5 others) and see which entries in file 1 match the other 5. The entries that match should be saved; the others need to be deleted.
I coded something like the below, but it is deleting everything (including the entries that match).
$lines = Get-Content "C:\snaps.txt"
$check1 = Get-Content "C:\Previous_day_latest.txt"
$check2 = Get-Content "C:\this_week_saved_snaps.txt"
$check3 = Get-Content "C:\all_week_latest_snapshots.txt"
$check4 = Get-Content "C:\each_month_latest.txt"
$check5 = Get-Content "C:\exclusions.txt"
foreach($l in $lines)
{
    if(($l -notmatch $check1) -and ($l -notmatch $check2) -and ($l -notmatch $check3) -and ($l -notmatch $check4))
    {
        Remove-Item -Path "C:\$l.txt"
    }
    else
    {
        #nothing
    }
}
foreach($ch in $check5)
{
    Remove-Item -Path "C:\$ch.txt"
}
The contents of the 6 files are as shown below:
$lines
testinstance-01-07-15-08-00
testinstance-10-07-15-23-00
testinstance-13-02-15-13-00
testinstance-15-06-15-23-00
testinstance-19-01-15-23-00
testinstance-23-05-15-20-00
testinstance-27-03-15-23-00
testinstance-28-02-15-23-00
testinstance-29-07-15-08-00
testinstance-30-04-15-23-00
testinstance-30-06-15-23-00
testinstance-31-01-15-23-00
testinstance-31-12-14-23-00
$check1
testinstance-29-07-15-08-00
$check2
testinstance-23-05-15-20-00
testinstance-27-03-15-23-00
$check3
testinstance-01-07-15-23-00
testinstance-13-02-15-13-00
testinstance-19-01-15-23-00
$check4
testinstance-28-02-15-23-00
testinstance-30-04-15-23-00
testinstance-30-06-15-23-00
testinstance-31-01-15-23-00
$check5
testinstance-31-12-14-23-00
I've read about Compare-Object, but I'm not sure how it can be implemented in my case, since the contents of all 5 files will be different and all of those contents should be saved from deletion. Can someone please guide me to achieve what I described? Any help would be really appreciated.
I would create an array of the files to check, so you can simply add new files without modifying other parts of the script.
I use the where cmdlet, which keeps only the lines that are also in the reference file, using the -in operator, and finally I overwrite the file:
$referenceFile = 'C:\snaps.txt'
$compareFiles = @(
    'C:\Previous_day_latest.txt',
    'C:\this_week_saved_snaps.txt',
    'C:\all_week_latest_snapshots.txt',
    'C:\each_month_latest.txt',
    'C:\exclusions.txt'
)

# get the content of the reference file
$referenceContent = (gc $referenceFile)

foreach ($file in $compareFiles)
{
    # get the content of the file to check
    $content = (gc $file)
    # keep only the lines that also appear in the reference file and save the result
    $content | where { $_ -in $referenceContent } | sc $file
}
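Note that the -in operator requires PowerShell 3.0 or later; on older versions the same filter can be written with -contains (an equivalent sketch):
# PowerShell 2.0 equivalent of the -in filter above
$content | where { $referenceContent -contains $_ } | sc $file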
You can use the -contains operator to compare array contents. If you read all the files you want to check into a single array, you can compare that with the reference file:
$lines = Get-Content "C:\snaps.txt"
$check1 = "C:\Previous_day_latest.txt"
$check2 = "C:\this_week_saved_snaps.txt"
$check3 = "C:\all_week_latest_snapshots.txt"
$check4 = "C:\each_month_latest.txt"
$check5 = "C:\exclusions.txt"
$checklines = @()
(1..5) | ForEach-Object {
    $comp = Get-Content (Get-Variable "check$_").Value
    $checklines += $comp
}
$matches = $lines | ? { $checklines -contains $_ }
If you switch the -contains to -notcontains you'll see the three lines that don't match
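For example (a minimal sketch using the same variables):
# Lines from snaps.txt that appear in none of the check files
$lines | ? { $checklines -notcontains $_ }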
The other answers here are great, but I wanted to show that Compare-Object can still work; you just need to use it in a loop. To show something else as well, I included a simple use of Join-Path for building the array of checks. Basically, we save some typing for when you move your files to a production area: you update one path instead of many.
$rootPath = "C:\"
$fileNames = "Previous_day_latest.txt", "this_week_saved_snaps.txt", "all_week_latest_snapshots.txt", "each_month_latest.txt", "exclusions.txt"
$lines = Get-Content (Join-Path $rootPath "snaps.txt")
$checks = $fileNames | ForEach-Object{Join-Path $rootPath $_}
ForEach($check in $checks){
    Compare-Object -ReferenceObject $lines -DifferenceObject (Get-Content $check) -IncludeEqual |
        Where-Object{$_.SideIndicator -eq "=="} |
        Select-Object -ExpandProperty InputObject |
        Set-Content $check
}
So we take each file path and use Compare-Object in a loop comparing each to the $lines array. Using -IncludeEqual we find the lines that both files share and write those back to the file.
Depending on how many check files you have and where they live, it might be easier to build the $checks array with this line:
$checks = Get-ChildItem "C:\" -Filter "*.txt" | Select-Object -Expand FullName
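One caveat with that shortcut (my note, not part of the original answer): the wildcard would also pick up snaps.txt itself, so the reference file should be excluded:
# Exclude the reference file so it is not compared against itself
$checks = Get-ChildItem "C:\" -Filter "*.txt" |
    Where-Object { $_.Name -ne "snaps.txt" } |
    Select-Object -ExpandProperty FullName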