I'm performing some pretty straightforward checksums on our Linux boxes, but I now need to recreate something similar for our Windows users. To give me a single checksum, I just run:
md5sum *.txt | awk '{ print $1 }' | md5sum
I'm struggling to recreate this in Windows, either with a batch file or PowerShell. The closest I've got is:
Get-ChildItem $path -Filter *.txt |
Foreach-Object {
$hash = Get-FileHash -Algorithm MD5 -Path ($path + "\" + $_) | Select -ExpandProperty "Hash"
$hash = $hash.tolower() #Get-FileHash returns checksums in uppercase, linux in lower case (!)
Write-host $hash
}
This will print the same per-file checksums to the console as the Linux command, but piping that back to Get-FileHash to get a single output matching the Linux equivalent is eluding me. Writing to a file gets me stuck with carriage-return differences.
Streaming as a string back to Get-FileHash doesn't return the same checksum:
$String = Get-FileHash -Algorithm MD5 -Path (Get-ChildItem -path $files -Recurse) | Select -ExpandProperty "Hash"
$stringAsStream = [System.IO.MemoryStream]::new()
$writer = [System.IO.StreamWriter]::new($stringAsStream)
$writer.Write($String)
$writer.Flush()
$stringAsStream.Position = 0
Get-FileHash -Algorithm MD5 -InputStream $stringAsStream
Am I over-engineering this? I'm sure this shouldn't be this complicated! TIA
You need to reference the .Hash property on the object returned by Get-FileHash. If you want output similar to md5sum, you can also use Select-Object to curate this:
# Get filehashes in $path with similar output to md5sum
$fileHashes = Get-ChildItem $path -File | Get-FileHash -Algorithm MD5
# Once you have the hashes, you can reference the properties as follows
# .Algorithm is the hashing algo
# .Hash is the actual file hash
# .Path is the full path to the file
foreach( $hash in $fileHashes ){
"$($hash.Algorithm):$($hash.Hash) ($($hash.Path))"
}
For each file in $path, the above foreach loop will produce a line similar to:
MD5:B4976887F256A26B59A9D97656BF2078 (C:\Users\username\dl\installer.msi)
The algorithm, hash, and filenames will obviously differ based on your selected hashing algorithm and filesystem.
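If you also need md5sum-style lowercase output, or the asker's single combined checksum (the MD5 of the per-file hashes joined by LF), a minimal sketch could look like this; it assumes $path is already defined and that the file set and order match the Linux side, which the other answers discuss in detail:

```
# Sketch: md5sum-style combined checksum (MD5 of the lowercase per-file
# hashes, one per line, each line LF-terminated as md5sum/awk emit them).
$hashes = Get-ChildItem $path -Filter *.txt -File |
    Get-FileHash -Algorithm MD5 |
    ForEach-Object { $_.Hash.ToLower() }
$combined = ($hashes -join "`n") + "`n"   # md5sum ends every line with LF
$bytes  = [System.Text.Encoding]::ASCII.GetBytes($combined)
$stream = [System.IO.MemoryStream]::new($bytes)
(Get-FileHash -Algorithm MD5 -InputStream $stream).Hash.ToLower()
$stream.Dispose()
```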
The devil is in the details:
(already known) Get-FileHash returns checksums in uppercase while Linux md5sum uses lowercase (!);
The FileSystem provider's -Filter *.txt is case-insensitive in PowerShell, while in Linux it depends on the nocaseglob option: if set (shopt -s nocaseglob), Bash matches filenames case-insensitively when performing filename expansion; otherwise (shopt -u nocaseglob), filename matching is case-sensitive;
Order: Get-ChildItem output is ordered according to the Unicode collation algorithm, while in Linux the *.txt glob is expanded in the order of the LC_COLLATE category (LC_COLLATE="C.UTF-8" on my system).
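The collation difference is easy to observe: culture-aware Sort-Object and an ordinal comparer can order the same names differently. (Only a sketch: C.UTF-8 collates by UTF-8 bytes, which for characters outside the BMP is not identical to UTF-16 ordinal order; that is why the script below converts names to UTF-8 before sorting.)

```
$names = 'B.txt', 'a.txt', 'A.txt'
# Culture-aware sort (Sort-Object default): typically a.txt, A.txt, B.txt
$names | Sort-Object
# Ordinal sort by UTF-16 code units: A.txt, B.txt, a.txt
$sorted = [string[]]$names   # the cast produces a new array
[Array]::Sort($sorted, [System.StringComparer]::Ordinal)
$sorted
```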
In the following (partially commented) script, three # Test blocks demonstrate my debugging steps to the final solution:
Function Get-StringHash {
    [OutputType([System.String])]
    param(
        # named or positional: a string
        [Parameter(Position=0)]
        [string]$InputObject
    )
    $stringAsStream = [System.IO.MemoryStream]::new()
    $writer = [System.IO.StreamWriter]::new($stringAsStream)
    $writer.Write($InputObject)
    $writer.Flush()
    $stringAsStream.Position = 0
    Get-FileHash -Algorithm MD5 -InputStream $stringAsStream |
        Select-Object -ExpandProperty Hash
    $writer.Close()
    $writer.Dispose()
    $stringAsStream.Close()
    $stringAsStream.Dispose()
}
function ConvertTo-Utf8String {
    [OutputType([System.String])]
    param(
        # named or positional: a string
        [Parameter(Position=0, Mandatory = $false)]
        [string]$InputObject = ''
    )
    begin {
        $InChars = [char[]]$InputObject
        $InChLen = $InChars.Count
        $AuxU_8 = [System.Collections.ArrayList]::new()
    }
    process {
        for ($ii = 0; $ii -lt $InChLen; $ii++) {
            if ( [char]::IsHighSurrogate( $InChars[$ii]) -and
                 (1 + $ii) -lt $InChLen -and
                 [char]::IsLowSurrogate( $InChars[1 + $ii]) ) {
                # combine the surrogate pair into a single code point
                $s = [char]::ConvertFromUtf32(
                    [char]::ConvertToUtf32( $InChars[$ii], $InChars[1 + $ii]))
                $ii++
            } else {
                $s = $InChars[$ii]
            }
            [void]$AuxU_8.Add(
                ([System.Text.Encoding]::UTF8.GetBytes($s) |
                    ForEach-Object { '{0:X2}' -f $_ }) -join ''
            )
        }
    }
    end { $AuxU_8 -join '' }
}
# Set variables
$hashUbuntu = '5d944e44149fece685d3eb71fb94e71b'
$hashUbuntu <# copied from 'Ubuntu 20.04 LTS' in Wsl2:
cd `wslpath -a 'D:\\bat'`
md5sum *.txt | awk '{ print $1 }' | md5sum | awk '{ print $1 }'
<##>
$LF = [char]0x0A # Line Feed (LF)
$path = 'D:\Bat' # testing directory
$filenames = 'D:\bat\md5sum_Ubuntu_awk.lst'
<# obtained from 'Ubuntu 20.04 LTS' in Wsl2:
cd `wslpath -a 'D:\\bat'`
md5sum *.txt | awk '{ print $1 }' > md5sum_Ubuntu_awk.lst
md5sum md5sum_Ubuntu_awk.lst | awk '{ print $1 }' # for reference
<##>
# Test #1: is `Get-FileHash` the same (beyond character case)?
$hashFile = Get-FileHash -Algorithm MD5 -Path $filenames |
Select-Object -ExpandProperty Hash
$hashFile.ToLower() -ceq $hashUbuntu
# Test #2: is `$stringToHash` well-defined? is `Get-StringHash` the same?
$hashArray = Get-Content $filenames -Encoding UTF8
$stringToHash = ($hashArray -join $LF) + $LF
(Get-StringHash -InputObject $stringToHash) -eq $hashUbuntu
# Test #3: another check: is `Get-StringHash` the same?
Push-Location -Path $path
$filesInBashOrder = bash.exe -c "ls -1 *.txt"
$hashArray = $filesInBashOrder |
Foreach-Object {
$hash = Get-FileHash -Algorithm MD5 -Path (
Join-Path -Path $path -ChildPath $_) |
Select-Object -ExpandProperty "Hash"
$hash.tolower()
}
$stringToHash = ($hashArray -join $LF) + $LF
(Get-StringHash -InputObject $stringToHash) -eq $hashUbuntu
Pop-Location
# Solution - ordinal order assuming `LC_COLLATE="C.UTF-8"` in Linux
Push-Location -Path $path
$hashArray = Get-ChildItem -Filter *.txt -Force -ErrorAction SilentlyContinue |
Where-Object {$_.Name -clike "*.txt"} | # only if `shopt -u nocaseglob`
Sort-Object -Property { (ConvertTo-Utf8String -InputObject $_.Name) } |
Get-FileHash -Algorithm MD5 |
Select-Object -ExpandProperty "Hash" |
Foreach-Object {
$_.ToLower()
}
$stringToHash = ($hashArray -join $LF) + $LF
(Get-StringHash -InputObject $stringToHash).ToLower() -ceq $hashUbuntu
Pop-Location
Output (tested on 278 files): .\SO\69181414.ps1
5d944e44149fece685d3eb71fb94e71b
True
True
True
True
Related
I would like to run a PowerShell script that can be supplied a directory name by the user; it will then check that directory, its subdirectories, and all file contents of those directories to compare whether they are identical to each other. There are 8 servers that should all have identical files and contents. The code below does not appear to be doing what I intended. I have seen the use of Compare-Object, Get-ChildItem, and Get-FileHash, but have not found the right combination that I am certain actually accomplishes the task. Any and all help is appreciated!
$35 = "\\server1\"
$36 = "\\server2\"
$37 = "\\server3\"
$38 = "\\server4\"
$45 = "\\server5\"
$46 = "\\server6\"
$47 = "\\server7\"
$48 = "\\server8\"
do{
Write-Host "|1 : New |"
Write-Host "|2 : Repeat|"
Write-Host "|3 : Exit |"
$choice = Read-Host -Prompt "Please make a selection"
switch ($choice){
1{
$App = Read-Host -Prompt "Input Directory Application"
}
2{
#rerun
}
3{
exit; }
}
$c35 = $35 + "$App" +"\*"
$c36 = $36 + "$App" +"\*"
$c37 = $37 + "$App" +"\*"
$c38 = $38 + "$App" +"\*"
$c45 = $45 + "$App" +"\*"
$c46 = $46 + "$App" +"\*"
$c47 = $47 + "$App" +"\*"
$c48 = $48 + "$App" +"\*"
Write-Host "Comparing Server1 -> Server2"
if((Get-ChildItem $c35 -Recurse | Get-FileHash | Select-Object Hash,Path).hash -eq (Get-ChildItem $c36 -Recurse | Get-FileHash | Select-Object Hash,Path).hash){"Identical"}else{"NOT Identical"}
Write-Host "Comparing Server1 -> Server3"
if((Get-ChildItem $c35 -Recurse | Get-FileHash | Select-Object Hash,Path).hash -eq (Get-ChildItem $c37 -Recurse | Get-FileHash | Select-Object Hash,Path).hash){"Identical"}else{"NOT Identical"}
Write-Host "Comparing Server1 -> Server4"
if((Get-ChildItem $c35 -Recurse | Get-FileHash | Select-Object Hash,Path).hash -eq (Get-ChildItem $c38 -Recurse | Get-FileHash | Select-Object Hash,Path).hash){"Identical"}else{"NOT Identical"}
Write-Host "Comparing Server1 -> Server5"
if((Get-ChildItem $c35 -Recurse | Get-FileHash | Select-Object Hash,Path).hash -eq (Get-ChildItem $c45 -Recurse | Get-FileHash | Select-Object Hash,Path).hash){"Identical"}else{"NOT Identical"}
Write-Host "Comparing Server1 -> Server6"
if((Get-ChildItem $c35 -Recurse | Get-FileHash | Select-Object Hash,Path).hash -eq (Get-ChildItem $c46 -Recurse | Get-FileHash | Select-Object Hash,Path).hash){"Identical"}else{"NOT Identical"}
Write-Host "Comparing Server1 -> Server7"
if((Get-ChildItem $c35 -Recurse | Get-FileHash | Select-Object Hash,Path).hash -eq (Get-ChildItem $c47 -Recurse | Get-FileHash | Select-Object Hash,Path).hash){"Identical"}else{"NOT Identical"}
Write-Host "Comparing Server1 -> Server8"
if((Get-ChildItem $c35 -Recurse | Get-FileHash | Select-Object Hash,Path).hash -eq (Get-ChildItem $c48 -Recurse | Get-FileHash | Select-Object Hash,Path).hash){"Identical"}else{"NOT Identical"}
} until ($choice -eq 3)
Here is an example function that compares one reference directory against multiple difference directories efficiently. It does so by comparing the most easily available information first and stopping at the first difference.
Get all relevant information about the files in the reference directory once, including hashes (though this could be optimized further by computing hashes only when necessary).
For each difference directory, compare in this order:
file count - if different, then obviously directories are different
relative file paths - if not all paths from difference directory can be found in reference directory, then directories are different
file sizes - should be obvious
file hashes - hashes only need to be calculated if files have equal size
Function Compare-MultipleDirectories {
param(
[Parameter(Mandatory)] [string] $ReferencePath,
[Parameter(Mandatory)] [string[]] $DifferencePath
)
# Get basic file information recursively by calling Get-ChildItem with the addition of the relative file path
Function Get-ChildItemRelative {
param( [Parameter(Mandatory)] [string] $Path )
Push-Location $Path # Base path for Get-ChildItem and Resolve-Path
try {
Get-ChildItem -File -Recurse |
Select-Object FullName, Length, @{ n = 'RelativePath'; e = { Resolve-Path $_.FullName -Relative } }
} finally {
Pop-Location
}
}
Write-Verbose "Reading reference directory '$ReferencePath'"
# Create hashtable with all infos of reference directory
$refFiles = @{}
Get-ChildItemRelative $ReferencePath |
Select-Object *, @{ n = 'Hash'; e = { (Get-FileHash $_.FullName -Algorithm MD5).Hash } } |
ForEach-Object { $refFiles[ $_.RelativePath ] = $_ }
# Compare content of each directory of $DifferencePath with $ReferencePath
foreach( $diffPath in $DifferencePath ) {
Write-Verbose "Comparing directory '$diffPath' with '$ReferencePath'"
$areDirectoriesEqual = $false
$differenceType = $null
$diffFiles = Get-ChildItemRelative $diffPath
# Directories must have same number of files
if( $diffFiles.Count -eq $refFiles.Count ) {
# Find first different path (if any)
$firstDifferentPath = $diffFiles | Where-Object { -not $refFiles.ContainsKey( $_.RelativePath ) } |
Select-Object -First 1
if( -not $firstDifferentPath ) {
# Find first different content (if any) by file size comparison
$firstDifferentFileSize = $diffFiles |
Where-Object { $refFiles[ $_.RelativePath ].Length -ne $_.Length } |
Select-Object -First 1
if( -not $firstDifferentFileSize ) {
# Find first different content (if any) by hash comparison
$firstDifferentContent = $diffFiles |
Where-Object { $refFiles[ $_.RelativePath ].Hash -ne (Get-FileHash $_.FullName -Algorithm MD5).Hash } |
Select-Object -First 1
if( -not $firstDifferentContent ) {
$areDirectoriesEqual = $true
}
else {
$differenceType = 'Content'
}
}
else {
$differenceType = 'FileSize'
}
}
else {
$differenceType = 'Path'
}
}
else {
$differenceType = 'FileCount'
}
# Output comparison result
[PSCustomObject]@{
ReferencePath = $ReferencePath
DifferencePath = $diffPath
Equal = $areDirectoriesEqual
DiffCause = $differenceType
}
}
}
Usage example:
# compare each of directories B, C, D, E, F against A
Compare-MultipleDirectories -ReferencePath 'A' -DifferencePath 'B', 'C', 'D', 'E', 'F' -Verbose
Output example:
ReferencePath DifferencePath Equal DiffCause
------------- -------------- ----- ---------
A B True
A C False FileCount
A D False Path
A E False FileSize
A F False Content
The DiffCause column tells you why the function considers the directories different.
Note:
Select-Object -First 1 is a neat trick to stop searching after we get the first result. It is efficient because it doesn't process all input and then drop everything except the first item; it actually cancels the pipeline once the first item has been found.
Alternatively, Group-Object RelativePath -AsHashTable could build the same kind of hashtable of file information, keyed by the RelativePath property for quick lookups, in place of the manual ForEach-Object loop.
Empty sub directories are ignored, because the function only looks at files. E. g. if reference path contains some empty directories but difference path does not, and the files in all other directories are equal, the function treats the directories as equal.
I've chosen the MD5 algorithm because it is faster than the default SHA-256 algorithm used by Get-FileHash, but it is insecure: someone could deliberately craft a different file that has the same MD5 hash as the original. In a trusted environment this won't matter, though. Remove -Algorithm MD5 if you need a more secure comparison.
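The pipeline-cancelling behaviour of Select-Object -First 1 is easy to verify with a small counter; here it ends at 1 rather than 3, because the upstream ForEach-Object stops running once the first item is through:

```
$processed = 0
1..3 | ForEach-Object { $processed++; $_ } | Select-Object -First 1
"Processed $processed item(s)"   # Processed 1 item(s)
```

(ForEach-Object script blocks run in the caller's scope, so the counter variable is updated directly.)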
A simple place to start:
compare (dir -r dir1) (dir -r dir2) -Property name,length,lastwritetime
You can also add -PassThru to see the original objects, or -IncludeEqual to see the equal elements. The order of each array doesn't matter without -SyncWindow. I'm assuming all the LastWriteTime values are in sync, to the millisecond. Don't assume you can skip specifying the properties to compare. See also Comparing folders and content with PowerShell
I was looking into calculated properties, e.g. for the relative path, but it looks like you can't name them here, even in PowerShell 7. I'm chopping off the first four path elements, 0..3.
compare (dir -r foo1) (dir -r foo2) -Property length,lastwritetime,@{e={($_.fullname -split '\\')[4..$_.fullname.length] -join '\'}}
length lastwritetime ($_.fullname -split '\\')[4..$_.fullname.length] -join '\' SideIndicator
------ ------------- ---------------------------------------------------------- -------------
16 11/12/2022 11:30:20 AM foo2\file2 =>
18 11/12/2022 11:30:20 AM foo1\file2 <=
I'm trying to automate gci to work on each row of a config file, where each row has the path as its first column, followed by a list of files. Something like this:
C:\Users\*\AppData\Roaming\* *.dll
C:\Test file.txt,file2.txt
This means that gci will search for:
*.dll in C:\Users\*\AppData\Roaming\*
file.txt in C:\Test
file2.txt in C:\Test
In order to do this, I'm creating the where condition dynamically in the script below. Here is the PS script I'm using:
foreach($line in Get-Content .\List.txt) {
try {
$path,$files = $line.split(' ')
$files = $files.split(',')
}
catch {
$path = $line
$files = "*.*"
}
if([string]::IsNullOrEmpty($files)){
$files = "*.*"
}
$filter = $files -join(" -or `$_.Name` -like ")
$filter = "`$_.Name` -like " + $filter
echo "Searching Path: $path, Pattern: $filter" | out-file -append -encoding ASCII -filepath .\result.txt
if ($path.Contains("*"))
{
gci -Path $path -Recurse | Where {$filter} | Select -ExpandProperty FullName | Out-String -Width 2048 | out-file -append -encoding UTF8 -filepath .\result.txt
}
else
{
gci -Path $path | Where {$filter} | Select -ExpandProperty FullName | Out-String -Width 2048 | out-file -append -encoding UTF8 -filepath .\result.txt
}
}
The problem is that the where filter is not considered; all files are returned.
First attempt, as suggested in the comments:
foreach($line in Get-Content .\List.txt) {
try {
$path,$files = $line.split(' ')
$files = $files.split(',')
}
catch {
$path = $line
$files = "*.*"
}
if([string]::IsNullOrEmpty($files)){
$files = "*.*"
}
$filter = $files -join(" -or `$_.Name -like ")
$filter = "`$_.Name -like " + $filter
$gciParams = @{
Path = $Path
Recurse = $Path.Contains('*')
}
"Searching Path: $path, Pattern(s): [$($files -join ',')]" | Add-Content -Path .\result.txt -Encoding ASCII
Get-ChildItem @gciParams | Where $filter | Select -ExpandProperty FullName | Add-Content -Path .\result.txt -Encoding UTF8
}
If you want to create a piece of code and defer execution of it until later, you need a Script Block.
A Script Block literal in PowerShell is just {}, so for constructing script block to filter based on a single comparison, you'd want to define $filter like this:
$filter = {$_.Name -like $filter}
At which point you can pass it directly as an argument to Where-Object:
Get-ChildItem $path |Where-Object $filter
... but since you want to test against multiple wildcard patterns, we'll need to write a slightly different filtering routine:
$filter = {
# Store file name of file we're filtering
$FileName = $_.Name
# Test ALL the patterns in $files and see if at least 1 matches
$files.Where({$FileName -like $_}, 'First').Count -eq 1
}
Since the $filter block now references $files to get the patterns, we can simplify your loop as:
foreach($line in Get-Content .\List.txt) {
try {
$path,$files = $line.split(' ')
$files = $files.split(',')
}
catch {
$path = $line
$files = "*.*"
}
if([string]::IsNullOrEmpty($files)){
$files = "*.*"
}
$gciParams = @{
Path = $Path
Recurse = $Path.Contains('*')
}
"Searching Path: $path, Pattern(s): [$($files -join ',')]" | Add-Content -Path .\result.txt -Encoding ASCII
Get-ChildItem @gciParams | Where $filter | Select -ExpandProperty FullName | Add-Content -Path .\result.txt -Encoding UTF8
}
Note that we no longer need to redefine $filter every time the loop runs - the condition is based on the value of $files at runtime, so you can define $filter once before entering the loop and then reuse it every time.
The "trick" with using @gciParams (which allows us to remove the big if/else block) is known as splatting, but you could achieve the same result with Get-ChildItem -Path:$Path -Recurse:$Path.Contains('*') :)
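As a minimal standalone illustration of splatting (the directory and filter here are just example values):

```
# Collect parameters in a hashtable...
$gciParams = @{
    Path   = 'C:\Windows'
    Filter = '*.exe'
    File   = $true
}
# ...then splat with @ instead of $; this is equivalent to:
# Get-ChildItem -Path 'C:\Windows' -Filter '*.exe' -File
Get-ChildItem @gciParams | Select-Object -First 3 -ExpandProperty Name
```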
I have been attempting to replicate the output of the Unix command find . -type f -exec md5sum {} + to compute hashes for all files in a directory and its subdirectories in PowerShell (i.e. two columns: MD5 hashes in lowercase and relative file paths). Is there a better way to end lines with a bare line feed, instead of a carriage return plus line feed, than the following code?
$Result = Get-FileHash -Algorithm MD5 -Path (Get-ChildItem "*.*" -Recurse) | Select-Object @{N='Hash';E={$_.Hash.ToLower()}},@{N='Path';E={$_.Path | Resolve-Path -Relative}}
($Result | Format-Table -AutoSize -HideTableHeaders | Out-String).Trim() + "`n" -replace ' *\r\n',"`n" | Set-Content -NoNewline -Encoding ascii $ENV:USERPROFILE\Desktop\MYFOLDER\hashes.txt
The first line returns the hashes in lowercase and the relative paths.
The second line uses Format-Table to exclude the column headings (with the option -HideTableHeaders plus it uses the option -AutoSize to avoid truncating long relative file paths).
Out-String converts the output to a string and .Trim() removes the preceding and trailing blank lines.
Presently, I add a line feed with + "`n", then use -replace ' *\r\n',"`n" to do the replacement and eliminate any extra spaces at the end of each line. Finally, Set-Content uses the -NoNewline option to avoid appending a newline with a carriage return.
I've seen these similar questions: "Replace CRLF using powershell" and "In PowerShell how to replace carriage return".
Instead of replacing all "\r\n" newlines afterwards, this approach builds a list of strings and joins them together with the wanted newline character:
$path = 'YOUR ROOT DIRECTORY'
$algo = 'MD5'
$list = Get-ChildItem -Path $Path -File -Recurse | Get-FileHash -Algorithm $algo | ForEach-Object {
"{0} {1}" -f $_.Hash.ToLower(), (Resolve-Path $_.Path -Relative)
}
Set-Content -Path 'D:\checksums.txt' -Value ($list -join "`n") -NoNewline -Encoding Ascii -Force
Another way of doing this could be to use a StringBuilder object together with its AppendFormat method:
$path = 'YOUR ROOT DIRECTORY'
$algo = 'MD5'
$sb = New-Object -TypeName System.Text.StringBuilder
Get-ChildItem -Path $Path -File -Recurse | Get-FileHash -Algorithm $algo | ForEach-Object {
[void]$sb.AppendFormat("{0} {1}`n", $_.Hash.ToLower(), (Resolve-Path $_.Path -Relative))
}
# trim off the final "`n"
[void]$sb.Remove($sb.Length - 1, 1)
Set-Content -Path 'D:\checksums.txt' -Value $sb.ToString() -NoNewline -Encoding Ascii -Force
I am trying to make a script in PowerShell that recursively analyzes a directory and gets the MD5 hashes of all files, including files inside any directories nested within the first one given.
After that, I want to compare all the hashes against each other to see which ones are copies, and then give an option to delete these copies or not.
At the moment I have this:
$UserInput=Read-Host
Get-ChildItem -Path $UserInput -Recurse
$someFilePath = $UserInput
$md5 = New-Object -TypeName System.Security.Cryptography.MD5CryptoServiceProvider
$hash = [System.BitConverter]::ToString($md5.ComputeHash([System.IO.File]::ReadAllBytes($someFilePath)))
$hash
The main problem is in the hash part: I get an error when calling ReadAllBytes.
I am also considering creating an array so that when I compare the hashes, if they are equal, I can put the copies in an array, making them easier to delete.
What do you think? (I am also not sure whether I am using $someFilePath, MD5, or the hash correctly.)
If targeting PowerShell 5.1 on Windows 10, I'd use the Get-FileHash cmdlet and then group them by hash using the Group-Object cmdlet:
$UserInput = Read-Host
$DuplicateFiles = Get-ChildItem -Path $UserInput -Recurse -File |Group {($_|Get-FileHash).Hash} |Where Count -gt 1
foreach($FileGroup in $DuplicateFiles)
{
Write-Host "These files share hash $($FileGroup.Name)"
$FileGroup.Group.FullName |Write-Host
}
Try this:
$fileHashes = Get-ChildItem -Path $myFilePath -Recurse -File | Get-Filehash -Algorithm MD5
$doubles = $fileHashes | Group hash | ? {$_.count -gt 1} | % {$_.Group}
foreach($item in $doubles) {
Write-Output $item
}
Just do it
Get-ChildItem -Path $UserInput -Recurse -File | Get-FileHash | Group Hash | Where Count -gt 1
Short version :
gci -Path $UserInput -R -File | Get-FileHash | Group Hash | ? Count -gt 1
I am new to Windows PowerShell. Please, would you be so kind as to give me some code or information on how to write a program which will do the following for all *.txt files in a folder:
1. Count the characters in each line of the file
2. If the length of a line exceeds 1024 characters, create a subfolder within that folder and move the file there (that's how I will know which files have over 1024 characters per line)
I've tried though VB and VBA (this is more familiar to me), but I want to learn some new cool stuff!
Many thanks!
Edit: I found some part of a code that is beginning
$fileDirectory = "E:\files";
foreach($file in Get-ChildItem $fileDirectory)
{
# Processing code goes here
}
OR
$fileDirectory = "E:\files";
foreach($line in Get-ChildItem $fileDirectory)
{
if($line.length -gt 1023){ <# mkdir and mv to subfolder! #> }
}
If you are willing to learn, why not start here.
You can use the Get-Content command in PS to get some information of your files. http://blogs.technet.com/b/heyscriptingguy/archive/2013/07/06/powertip-counting-characters-with-powershell.aspx and Getting character count for each row in text doc
With your second edit I did see some effort so I would like to help you.
$path = "D:\temp"
$lengthToNotExceed = 1024
$longFiles = Get-ChildItem -Path $path -File |
Where-Object {(Get-Content($_.Fullname) | Measure-Object -Maximum Length | Select-Object -ExpandProperty Maximum) -ge $lengthToNotExceed}
$longFiles | ForEach-Object{
$target = "$($_.Directory)\$lengthToNotExceed\"
If(!(Test-Path $target)){New-Item $target -ItemType Directory -Force | Out-Null}
Move-Item $_.FullName -Destination $target
}
You can make this a one-liner, but it would be unnecessarily complicated. Use Measure-Object on the array returned by Get-Content (the array being, more or less, a string array; in PowerShell, strings have a Length property which we query).
That returns the maximum line length in the file. We use Where-Object to filter only those files whose maximum line length meets the threshold.
Then, for each matched file, we attempt to move it to a subdirectory in the same location as the file. If no subfolder exists, we make it.
Caveats:
You need at least PowerShell 3.0 for the -File switch. In place of that, you can update Where-Object with another clause: -not $_.PSIsContainer
This would perform poorly on files with a large number of lines.
Here's my comment above indented and line broken in .ps1 script form.
$long = @()
foreach ($file in gci *.txt) {
$f=0
gc $file | %{
if ($_.length -ge 1024) {
if (-not($f)) {
$f=1
$long += $file
}
}
}
}
$long | %{
$dest = @($_.DirectoryName, '\test') -join ''
[void](ni -type dir $dest -force)
mv $_ -dest (@($dest, '\', $_.Name) -join '') -force
}
I was also mentioning labels and breaks there. Rather than $f=0 and if (-not($f)), you can break out of the inner loop with break like this:
$long = @()
foreach ($file in gci *.txt) {
:inner foreach ($line in gc $file) {
if ($line.length -ge 1024) {
$long += $file
break inner
}
}
}
$long | %{
$dest = @($_.DirectoryName, '\test') -join ''
[void](ni -type dir $dest -force)
mv $_ -dest (@($dest, '\', $_.Name) -join '') -force
}
Did you happen to notice the two different ways of calling foreach? There's the verbose foreach command, and then there's command | %{} where the iterative item is represented by $_.
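A minimal side-by-side of the two styles (% is the alias of ForEach-Object, and $_ is the current pipeline item):

```
$nums = 1, 2, 3
# foreach statement: evaluates the whole collection, names the loop variable
foreach ($n in $nums) { $n * 2 }
# ForEach-Object cmdlet: streams items through the pipeline one at a time
$nums | ForEach-Object { $_ * 2 }    # same output: 2, 4, 6
```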