Extract zip file and concatenate the extracted content - powershell

I have multiple zip files that I want to unzip, concatenating their contents into a single file. I do not need the individual unzipped files, so I would prefer that they never get created if possible.
I need to do this in powershell and only have access to version 2.0.
I am currently using 7-Zip to perform the unzipping. If I unzip without attempting to concatenate the output, the following command leaves me with all the extracted files.
# all the zip files (10000 of them)
$logs = Get-ChildItem $folders[$i] -filter "*$stake).zip"
foreach ($log in $logs) {
Write-Host $log
& '7z' e $log
}
I tried to concatenate all the files with the following command, but I still end up with all the extracted files, and combined.txt contains repetitions of the text shown below. Please advise. Thanks.
# all the zip files (10000 of them)
$logs = Get-ChildItem $folders[$i] -filter "*$stake).zip"
foreach ($log in $logs) {
Write-Host $log
& '7z' e $log >> combined.txt
}
Text being repeated
7-Zip [64] 16.04 : Copyright (c) 1999-2016 Igor Pavlov : 2016-10-04
Scanning the drive for archives: 1 file, 7796 bytes (8 KiB)
Extracting archive: 0000_0000_0000 Game Log (name at 86% staking
25).zip
-- Path = 0000_0000_0000 Game Log (name at 86% staking 25).zip Type = zip Physical Size = 7796
Further clarifications:
I have 10000 zip files.
Upon extraction, each file content = "Hello".
I want to concatenate all these file contents into 1 single file.
Thus single file content will be -
"Hello"
"Hello"
"Hello"
... 10000 times
All I want is this single file which has the concatenated data.

AFAIK ZIP files need to be extracted before they can be read. Even in the case of ZIP browsers, when you "open" a file in it the file is extracted to a temporary location first.
So that's basically what you need to be doing:
$logs = Get-ChildItem $folders[$i] -filter "*$stake).zip"
New-Item -Name "Temp" -ItemType Directory | Out-Null
$output = @()
foreach ($log in $logs) {
    Write-Host $log
    # extract into the Temp folder; 7-Zip's console output is discarded
    & '7z' e $log.FullName "-oTemp" | Out-Null
    Get-ChildItem "Temp" | Foreach-Object {
        $output += Get-Content $_.FullName
        Remove-Item $_.FullName
    }
}
Remove-Item "Temp"
$output | Out-File "FullLog.txt"
This goes through each file, extracts it (ignores output of 7zip as it's informational only), reads the content of the extracted file then deletes it. Afterwards it cleans up and writes the total output to a file.

Related

ZIP a folder of files into multiple "stand-alone" ZIP files of maximum size

I have the following problem: I'm writing a Powershell script that, after generating a lot of PDF files (each in a separate subfolder of a main folder), needs to zip them.
Here is the difficulty/problem:
The destination where those ZIP files need to be unzipped (a Moodle website) has a filesize limit of 50MB, hence I need to generate multiple ZIP files of this maximum size.
Each ZIP file, however, needs to be "standalone", i.e. it must be unzipped by itself (automatically by the website), without requiring the presence of the other files.
Here below is what I've tried so far:
Direct Powershell approach:
Compress-Archive -Path "SourceFolder" -DestinationPath "Result.zip"
This however only generates a single ZIP file (of "huge" dimension).
7-Zip (command line tool) approach:
7za.exe a -v50m "Result.zip" "SourceFolder"
This correctly generates a lot of 50MB zip files, however of the form "Result.zip.001 , Result.zip.002" which, taken alone, cannot be uncompressed as individual zip files.
Can somebody suggest a way to achieve my goal, i.e. to separate those files into individual ZIP files no larger than 50MB each (of course each ZIP will in general be smaller than 50MB, since the PDF files have arbitrary sizes, but all are < 50MB for sure)? The result should look like:
"Result1.zip" [49MB] , "Result2.zip" [47MB] , ... , Result16.zip [48MB], Result17.zip[7MB]
Thank you very much for any suggestion! :)
I believe you are looking for something like this:
$filesFolderPath = "Path of where the PDFs are"
$archivePath = "Path to where the archive will be"
$files = Get-ChildItem -Path $filesFolderPath -Recurse
foreach ($file in $files) {
$fileName = $file.Name
$filePath = $file.FullName
Compress-Archive -Path $filePath -DestinationPath "$($archivePath)\$($fileName).zip"
}
This script will basically get the list of the files (assuming you only have PDFs in the location) and will archive each one.
If you want to be sure it only archives the PDFs please change this line:
$files = Get-ChildItem -Path $filesFolderPath -Recurse | Where-Object { $_.Extension -eq ".pdf" }
UPDATE (changed the script to archive multiple files together as long as their combined size isn't over 50MB):
$filesFolderPath = "Path of where the PDFs are"
$archivePath = "Path to where the archive will be"
$files = Get-ChildItem -Path $filesFolderPath -Recurse | Where-Object { $_.Extension -eq ".pdf" }
$filesToArchive = @()
$archiveSize = 0
$counter = 0
foreach ($file in $files) {
$fileSize = [Math]::Round(($file.Length / 1MB), 2)
# Check if the combined size of the files to be archived exceeds 50 MB
if (($archiveSize + $fileSize) -gt 49) {
# Create the archive if the combined size exceeds 50 MB
$counter++
Compress-Archive -Path $filesToArchive.FullName -DestinationPath "$($archivePath)\Archive-$counter.zip"
$filesToArchive = @()
$archiveSize = 0
}
# Add the file to the list of files to be archived
$filesToArchive += $file
$archiveSize += $fileSize
}
# Create the final archive if there are any remaining files to be archived
if ($filesToArchive.Count -gt 0) {
$counter++
Compress-Archive -Path $filesToArchive.FullName -DestinationPath "$($archivePath)\Archive-$counter.zip"
}
UPDATE 2 (added a warning, and a file that by itself exceeds the 50MB limit is now archived on its own).
All you have to do for this update is replace the foreach statement with the code below.
foreach ($file in $files) {
$fileSize = [Math]::Round(($file.Length / 1MB), 2)
# Check if the file is bigger than 49MB to archive it separately and write a warning
if ($fileSize -gt 49) {
$counter++
Compress-Archive -Path $file.FullName -DestinationPath "$($archivePath)\Archive-$counter.zip"
Write-Warning "The archive number '$($counter)' has a single file bigger than 50MB"
} else {
# Check if the combined size of the files to be archived exceeds 50 MB
if (($archiveSize + $fileSize) -gt 49) {
# Create the archive if the combined size exceeds 50 MB
$counter++
Compress-Archive -Path $filesToArchive.FullName -DestinationPath "$($archivePath)\Archive-$counter.zip"
$filesToArchive = @()
$archiveSize = 0
}
# Add the file to the list of files to be archived
$filesToArchive += $file
$archiveSize += $fileSize
}
}
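The batching rule used in the updates above can be tried out on its own, without touching any real PDFs. This is only a sketch: Split-IntoSizedBatches is a made-up helper name, the sizes are synthetic, and it measures uncompressed size, which is a conservative stand-in for the final ZIP size since PDFs usually compress very little.

```powershell
# Hypothetical helper (not part of the answer above): groups items into batches
# whose summed Length stays at or below $MaxBytes.
function Split-IntoSizedBatches {
    param(
        [Parameter(Mandatory = $true)] [object[]] $Items,
        [long] $MaxBytes = 49MB
    )
    $batch = @()
    $batchBytes = 0
    foreach ($item in $Items) {
        # Start a new batch when adding this item would overflow the current one
        if ($batch.Count -gt 0 -and ($batchBytes + $item.Length) -gt $MaxBytes) {
            , $batch          # the leading comma emits the batch as a single array
            $batch = @()
            $batchBytes = 0
        }
        $batch += $item
        $batchBytes += $item.Length
    }
    # Emit whatever is left over
    if ($batch.Count -gt 0) { , $batch }
}

# Demo with synthetic sizes instead of real files
$demo = 30MB, 15MB, 10MB, 48MB, 2MB | ForEach-Object {
    New-Object PSObject -Property @{ Length = $_ }
}
$batches = @(Split-IntoSizedBatches -Items $demo)
$batches | ForEach-Object { ($_ | Measure-Object Length -Sum).Sum / 1MB }
```

With real files you would pass the output of Get-ChildItem as -Items and hand each batch's FullName values to Compress-Archive. Note that a single item larger than the limit still ends up in a batch of its own here, so the warning from UPDATE 2 is still worth keeping.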

How do I most efficiently move, rename files and log this action?

I have the following CSV list (in reality 1000s of lines):
needle,code
123456,AB
121212,BB
33333333,CVV
And I have a directory (C:\old_files) containing PDF files (again, 1000s in reality):
dsadsadsa.343222.dsads23213jkjl.saddsa.pdf
dsadsadsa.123456.dsads23213jkjl.saddsa.pdf
dsadsadsa.111111.dsads23213jkjl.saddsa.pdf
dsadsadsa.33333333.dsads23213jkjl.saddsa.pdf
dsadsadsa.33333333.fsdgdsfdsfdsf.dsad.pdf
For each needle in the CSV:
I have to see if there is a PDF containing that needle (there might be 0 or more matches)
If there is a match, I have to
make a copy of the file into a separate folder (D:\new_files)
rename the copied file by prepending the respective code to the name
write an entry into the log.
For the example, I have a match for 123456 and 2 for 33333333, so I have to move a copy of these files into D:\new_files and rename them into:
AB.dsadsadsa.123456.dsads23213jkjl.saddsa.pdf
CVV.dsadsadsa.33333333.dsads23213jkjl.saddsa.pdf
CVV.dsadsadsa.33333333.fsdgdsfdsfdsf.dsad.pdf
The logfile would look like (format needle,code,oldfilepath,newfilepath):
123456,AB,C:\old_files\dsadsadsa.123456.dsads23213jkjl.saddsa.pdf,D:\new_files\AB.dsadsadsa.123456.dsads23213jkjl.saddsa.pdf
33333333,CVV,C:\old_files\dsadsadsa.33333333.dsads23213jkjl.saddsa.pdf,D:\new_files\CVV.dsadsadsa.33333333.dsads23213jkjl.saddsa.pdf
33333333,CVV,C:\old_files\dsadsadsa.33333333.fsdgdsfdsfdsf.dsad.pdf,D:\new_files\CVV.dsadsadsa.33333333.fsdgdsfdsfdsf.dsad.pdf
It is important that I only loop over the files in the directory once, because iterating through all files in a ForEach loop for each needle takes way too long. So with thanks to this forum I'm building a hashtable first:
$pairs = @{}
Import-CSV .\data.csv | ForEach-Object { $pairs[$_.needle] = $_.code+"." }
Get-ChildItem "C:\old_files" | Rename-Item -NewName { "D:\new_files\" + $pairs[$_.Name.Split('.')[1]] + $_.Name }
My first problem here: I am unable to move the file into the new folder.
Q1 How do I properly copy a file from C:\old_files into D:\new_files and rename it?
My second problem: I don't understand how I can add code to the above code.
Q2 How can I create the logfile for each match (and therefore: copied and renamed file)?
You need to actually check if you have a match before copying the matching file.
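Get-ChildItem "C:\old_files" | ForEach-Object {
    # -split takes a regular expression, so the dot must be escaped
    $n = ($_.Name -split '\.')[1]
    if ($n -and $pairs[$n]) {
        $oldname = $_.FullName
        # $pairs[$n] already ends with a dot (see the hashtable setup in the question)
        $newname = Join-Path 'D:\new_files' ($pairs[$n] + $_.Name)
        Copy-Item $oldname $newname
    }
}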
Do the logging after the copy operation:
Copy-Item $oldname $newname
if ($?) {
# log success information here
} else {
# log error information here
}
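Putting the two pieces together, a self-contained sketch of the copy-and-log loop could look like the following. The temporary folders, the dummy file names and the copy.log name are assumptions made purely so the example can run anywhere; the log line follows the needle,code,oldfilepath,newfilepath format from the question.

```powershell
# Sketch only: folder names, file names and the log name are made up for the demo.
$root    = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
$oldDir  = Join-Path $root 'old_files'
$newDir  = Join-Path $root 'new_files'
$logFile = Join-Path $root 'copy.log'
$null = New-Item -ItemType Directory -Path $oldDir
$null = New-Item -ItemType Directory -Path $newDir
'a.123456.x.pdf', 'a.111111.x.pdf', 'a.33333333.y.pdf' | ForEach-Object {
    Set-Content -Path (Join-Path $oldDir $_) -Value 'dummy'
}

# needle -> code lookup, as it would be built from the CSV in the question
$pairs = @{ '123456' = 'AB'; '121212' = 'BB'; '33333333' = 'CVV' }

# One pass over the directory; each file does a single hashtable lookup
Get-ChildItem $oldDir | Where-Object { -not $_.PSIsContainer } | ForEach-Object {
    $needle = $_.Name.Split('.')[1]
    if ($needle -and $pairs.ContainsKey($needle)) {
        $code    = $pairs[$needle]
        $newPath = Join-Path $newDir ($code + '.' + $_.Name)
        Copy-Item -Path $_.FullName -Destination $newPath
        if ($?) {
            # Log format: needle,code,oldfilepath,newfilepath
            Add-Content -Path $logFile -Value ($needle, $code, $_.FullName, $newPath -join ',')
        }
    }
}
Get-Content $logFile
```

Here the hashtable stores the bare code and the dot is added when the new name is built, which keeps the logged code free of the trailing dot.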

Code for automating incremented ZIP compression?

I'm trying to ZIP a folder of 800 pictures, with each ZIP file containing 10 or fewer pictures, so I should end up with 80 ZIP files. If anyone knows the BAT file code to do this, I would be very appreciative. I also do NOT want to delete the files after they've been zipped.
I know that I'll probably be using 7-Zip, but I just can't seem to find an answer for this anywhere. Thanks!
Try the following PowerShell:
# Setup variables (Change)
$ZipFolder = "T:\YourFolder\WithFiles\ToZip"
$7Zip = "C:\Program Files\7-Zip\7z.exe"
$NewZipsFolder = "T:\FolderToPut\AllOfThe\ZipsIn"
# Script Variables
$pendingFiles = @()
$fileNumber = 1
# Get a list of all the files to be zipped
Get-ChildItem $ZipFolder | Sort-Object FullName | ForEach-Object { $pendingFiles += $_.FullName }
# While there are files still to zip
While($pendingFiles){
# Select first 10 files to zip and zip them
$ToZip = $pendingFiles | Select -First 10
& $7Zip "a" "$NewZipsFolder\File-$fileNumber.7z" $ToZip
# Remove first 10 zipped files from pending files array
$pendingFiles = $pendingFiles | Where-Object { $ToZip -notcontains $_ }
$fileNumber++
}
This will create a list of all the files that need to be zipped, then zip them up in batches of 10 files using 7z.exe (7-Zip).
Note: For the variables $ZipFolder & $NewZipsFolder do not put a trailing backslash on the folder paths (\).
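As an alternative to repeatedly filtering the pending list, the batches can also be cut by index. The following is a rough sketch, not the answer's own method: Split-IntoChunks is a hypothetical helper name and the picture names are invented, with the actual 7z call left as a comment so the snippet runs without 7-Zip installed.

```powershell
# Hypothetical helper: splits a list into consecutive chunks of at most $Size items.
function Split-IntoChunks {
    param([object[]] $Items, [int] $Size = 10)
    for ($start = 0; $start -lt $Items.Count; $start += $Size) {
        $end = [Math]::Min($start + $Size, $Items.Count) - 1
        # The leading comma emits each chunk as one array instead of unrolling it
        , @($Items[$start..$end])
    }
}

# Invented file names standing in for the output of Get-ChildItem
$files  = 1..25 | ForEach-Object { "picture$($_).jpg" }
$chunks = @(Split-IntoChunks -Items $files -Size 10)

$number = 1
foreach ($chunk in $chunks) {
    # In the real script this line would be the 7-Zip call, e.g.
    #   & $7Zip a "$NewZipsFolder\File-$number.zip" $chunk
    Write-Output ("File-{0}.zip gets {1} files" -f $number, $chunk.Count)
    $number++
}
```

Each file is touched once this way, which matters little for 800 pictures but avoids the repeated -notcontains scans on much larger folders.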
You could store a list of files in PowerShell using something along the lines of
$fileList = Get-Item -Path "C:\MyPhotosDir\*"
Then set an alias for 7zip
set-alias sz "$env:ProgramFiles\7-Zip\7z.exe"
Then create a loop with a counter along the lines of
$i = 1
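foreach ($file in $fileList) {
    # Build the archive name: files 1-10 go to archive 1, files 11-20 to archive 2, and so on
    $archiveNumber = [math]::Floor(($i - 1) / 10) + 1
    $archivePath = "C:\MyPhotoArchive$archiveNumber.7z"
    sz a -t7z $archivePath $file.FullName
    $i++
}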
I have been writing in VB for a short while, so apologies if the PowerShell syntax is a bit off. Essentially that should add 10 files to "C:\MyPhotoArchive1", 10 files to "C:\MyPhotoArchive2", and so on. I haven't added files to an archive using 7-Zip for a long time, but I think the call just uses the a command, which adds files to an archive and creates the archive when needed.

How do I select files in a folder based on part of filename and zip them in Powershell?

I'm fairly new to Powershell(using Powershell 2.0 btw) and am trying to make a script that does several things(this is my 3rd script or so). I have most things in place but the last thing remaining is to group files of different types (xml, tfw and tif) in a folder, based on the first part of the filename(first three characters) and then zip these files into several zip-files with name like the first 3 characters, either in the same location or in a new one.
Sample of folder content:
001.tif
001.tfw
001.metadata.xml
002.tif
002.tfw
002.metadata.xml
003.tif
003.tfw
003.metadata.xml
003_svel.tif
003_svel.tfw
003_svel.metadata.xml
Wanted result:
001.zip containing 001.tif, 001.tfw, 001.metadata.xml
002.zip containing 002.tif, 002.tfw, 002.metadata.xml
003.zip containing 003.tif, 003.tfw, 003.metadata.xml, 003_svel.tif,
003_svel.tfw and 003_svel.metadata.xml
I have installed 7-zip to do the zipping and am using the commandline version. I've used 7-zip local on some testfiles and got it to work, but then it was only tif-files. I have a source folder where I search for the latest created folder and then process the files in it.
This is what I have so far(Powershell 2.0):
$dir_source = "c:\Test"
$new_folder = Get-ChildItem $dir_source -Recurse |
Where { $_.PSIsContainer} |
Sort-Object LastWriteTime -Descending |
Select-Object -ExpandProperty FullName -First 1
Get-ChildItem $new_folder -recurse -Exclude metafile.xml |
Group-Object {$_.Name.Substring(0,3)}
This gives me a list of the files in the latest created folder, grouped by the first 3 characters of the filename. It also shows which files are in each group.
Like below:
Count Name Group
----- ---- -----
3 003 {C:\Test\20150708 063255_B\003.metafile.xml, C:\Test\20150708 063255_B\003.tfw, C:\Test\20150708 063255_B\003.tif}
6 004 {C:\Test\20150708 063255_B\004.metafile.xml, C:\Test\20150708 063255_B\004.tfw, C:\Test\20150708 063255_B\004.tif,C:\Test...
6 009 {C:\Test\20150708 063255_B\009.metafile.xml, C:\Test\20150708 063255_B\009.tfw, C:\Test\20150708 063255_B\009.tif,C:\Test...
Now my next step is to take these groups and zip them, ideally creating the zip files in a different destination directory (I believe I can change this when setting the $directory variable in the script below).
foreach ($group in $dataset) {
$name = $file.name
$directory = $file.DirectoryName
$zipFile = $file.Name + ".zip"
sz a -t7z "$directory\$zipfile" "$directory\$name"
This last code is causing some trouble. I either get the message:
7-Zip (A) 9.20 Copyright (c) 1999-2010 Igor Pavlov 2010-11-18 Error:
c:\Test\Dest_test460.zip is not supported archive System error:
Incorrect function.
,or
WARNING: Cannot find 1 file 7-Zip (A) 9.20 Copyright (c) 1999-2010
Igor Pavlov 2010-11-18 Scanning \460: WARNING: The system cannot
find the file specified.
,or it starts zipping all files in my user profile into a zip file, depending on the changes I make to the $group value. I believe there are one or more basic errors in my script causing this, and this is where I'm asking for some help. It may be that I am approaching this the wrong way by first grouping the files I want and then trying to zip them?
Anyone that can see my error or give me some hint to what I have to do?
Thanks for your time!
Lee Holmes' New-ZipFile does the job. He has two versions: one uses ICSharpCode.SharpZipLib.dll to compress, and the other does not require it. I wrapped the second one into a function:
Function New-ZipFile {
param(
## The name of the zip archive to create
$Path = $(throw "Specify a zip file name"),
## Switch to delete the zip archive if it already exists.
[Switch] $Force
)
Set-StrictMode -Version 3
## Create the Zip File
$zipName = $executionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Path)
## Check if the file exists already. If it does, check
## for -Force - generate an error if not specified.
if(Test-Path $zipName)
{
if($Force)
{
Remove-Item $zipName -Force
}
else
{
throw "Item with specified name $zipName already exists."
}
}
## Add the DLL that helps with file compression
Add-Type -Assembly System.IO.Compression.FileSystem
try
{
## Open the Zip archive
$archive = [System.IO.Compression.ZipFile]::Open($zipName, "Create")
## Go through each file in the input, adding it to the Zip file
## specified
foreach($file in $input)
{
## Skip the current file if it is the zip file itself
if($file.FullName -eq $zipName)
{
continue
}
## Skip directories
if($file.PSIsContainer)
{
continue
}
$item = $file | Get-Item
$null = [System.IO.Compression.ZipFileExtensions]::CreateEntryFromFile(
$archive, $item.FullName, $item.Name)
}
}
finally
{
## Close the file
$archive.Dispose()
$archive = $null
}
}
To use it for example:
dir c:\folder -Recurse | New-ZipFile -Path c:\temp\folder.zip
The Source file(for the one that use the ICSharpCode): http://poshcode.org/2202
Use my Previous Answer Function New-ZipFile and use with this one:
$FolderName = "C:\temp"
$Files = dir $FolderName
$prfx = @()
foreach ($file in $files)
{
$prfx += $file.Name.Substring(0,3)
}
$prfx = $prfx | Group
foreach ($Prf in $prfx)
{
$prf = $prf.name.ToString()
$Files | ? { $_.Name.StartsWith($prf) } | New-ZipFile -Path "$foldername\$prf.zip"
}
For your example it will output 3 zip files, as you want.
It always uses the first 3 characters of the file name; you can change this in the line $prfx += $file.Name.Substring(0,3) if you need something different.
Good Luck
I could not get the suggested solution to work; I had problems configuring ICSharpCode. I also wanted to use 7-Zip, since it is still being actively updated.
I ended up copying my files to temp folders based on the filenames and then zipping each folder. After that I delete the temp folders with their files. Ugly code, but it does the job.
# Create folder based on filename and copy files into respective folder
Get-ChildItem $new_folder -Filter *.* | Where-Object {!$_.PSIsContainer} | Foreach-Object{
$dest = Join-Path $_.DirectoryName $_.Name.SubString(0,3)
if(!(Test-Path -Path $dest -PathType Container))
{
$null = md $dest
}
$_ | Copy-Item -Destination $dest -Force
}
# Create zip-file of each folder
dir $new_folder | Where-Object { $_.PSIsContainer } | ForEach-Object { sz a -tzip -mx9 "$dir_dest\$($_.Name).zip" $_.FullName }
# Delete temp-folders
dir $new_folder | Where-Object { $_.PSIsContainer } | Remove-Item -Recurse
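For reference, the Group-Object approach from the question can also be made to work without temp folders. The loop in the question iterates with $group but reads from $file, which is never set; each group object exposes its key as Name and its members as Group. Below is a minimal sketch on plain file-name strings (the names come from the question's sample, and the $plan hashtable is just an illustrative way to show what would go into each archive).

```powershell
# File names taken from the sample in the question; plain strings keep the demo self-contained
$names = '001.tif', '001.tfw', '001.metadata.xml',
         '003.tif', '003_svel.tif', '003_svel.tfw'

# Group by the first three characters of the name
$groups = $names | Group-Object { $_.Substring(0, 3) }

$plan = @{}
foreach ($group in $groups) {
    # $group.Name is the 3-character prefix; $group.Group holds the members.
    # With real FileInfo objects you would pass ($group.Group | ForEach-Object { $_.FullName })
    # to 7-Zip here instead of storing the names.
    $plan["$($group.Name).zip"] = @($group.Group)
}

$plan.Keys | Sort-Object | ForEach-Object { "{0} <- {1}" -f $_, ($plan[$_] -join ', ') }
```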

How do I make powershell script transverse zip files and report based off select-string -pattern

I have the following that is working but I need to also have the ability to read the contents of compressed file (zip)
function Search-Files {
param ([string[]]$Servers, [string]$SearchPath, [string]$SearchItem, [string[]]$LogName)
ForEach ($Server in $Servers) {
if ($LogName -eq $null) {
dir -Path \\$server\$SearchPath -Recurse -Force -ErrorAction SilentlyContinue -WarningAction SilentlyContinue | Select-String -pattern $SearchItem -ErrorAction SilentlyContinue -WarningAction SilentlyContinue | Select-Object Filename, Path, Matches, LineNumber
}
Else {
dir -Path \\$server\$SearchPath -Recurse -Force -ErrorAction SilentlyContinue -WarningAction SilentlyContinue | ? {$_.Name -match $LogName} | Select-String -pattern $SearchItem -ErrorAction SilentlyContinue -WarningAction SilentlyContinue | Select-Object Filename, Path, Matches, LineNumber
}
}
}
Currently I get the following output, which is what I would like to get for zip files as well:
ip.ininlog \CO200197L\C$\Temp\Test\Test\ip\ip.ininlog {3030872954} 136594
I have found the following just not sure how to proceed to get them implemented
Grep File in Zip
List File in Zip
I need the ability to traverse all zip files that are stored in a directory.
Sample of Directory Structure
2014-07-01 - root
zip.zip
zip_1.zip
zip_2.zip
etc
If you have .NET Framework 4.5 installed, you can use its built-in ZIP support to extract files to a temporary path and run the selection on the temporary file. If 4.5 is not available, I recommend using SharpCompress (https://sharpcompress.codeplex.com/), which works in a similar way.
The following code snippet demonstrates extracting a ZIP archive into a temporary file, running the selection process from your script and the cleanup after the extraction. You can significantly simplify the code by extracting the entire ZIP file at once (just use ExtractToDirectory() on the archive) if it contains only the files you are seeking.
# import .NET 4.5 compression utilities
Add-Type -As System.IO.Compression.FileSystem;
# the input archive
$archivePath = "C:\sample.zip";
# open archive for reading
$archive = [System.IO.Compression.ZipFile]::OpenRead($archivePath);
try
{
# enumerate all entries in the archive, which includes both files and directories
foreach($archiveEntry in $archive.Entries)
{
# if the entry is not a directory (which ends with /)
if($archiveEntry.FullName -notmatch '/$')
{
# get temporary file -- note that this will also create the file
$tempFile = [System.IO.Path]::GetTempFileName();
try
{
# extract to file system
[System.IO.Compression.ZipFileExtensions]::ExtractToFile($archiveEntry, $tempFile, $true);
# create PowerShell backslash-friendly path from ZIP path with forward slashes
$windowsStyleArchiveEntryName = $archiveEntry.FullName.Replace('/', '\');
# run selection
Get-ChildItem $tempFile | Select-String -pattern "yourpattern" | Select-Object @{Name="Filename";Expression={$windowsStyleArchiveEntryName}}, @{Name="Path";Expression={Join-Path $archivePath (Split-Path $windowsStyleArchiveEntryName -Parent)}}, Matches, LineNumber
}
finally
{
Remove-Item $tempFile;
}
}
}
}
finally
{
# release archive object to prevent leaking resources
$archive.Dispose();
}
If you have multiple ZIP files in the directory, you can enumerate them as follows (using your example script):
$zipArchives = Get-ChildItem -Path \\$server\$SearchPath -Recurse "*.zip";
foreach($zipArchive in $zipArchives)
{
$archivePath = $zipArchive.FullName;
...
}
You can place the demo code in ... or move it to a PowerShell function.
Sometimes it is not desirable to extract a zip entry as a file. Instead it may be preferable to work with the file in memory. Extracting a zip entry containing XML or JSON text so it can be parsed in memory is an example.
Here is a technique that will allow you to do this. This example assumes there is a Zip entry with a name ending in .json and it is this file which is to be retrieved. Clearly the idea can be modified to handle different cases.
This code should work with any version of the .NET Framework that includes the System.IO.Compression.ZipFile class (4.5 or later).
try
{
# import .NET 4.5 compression utilities
Add-Type -As System.IO.Compression.FileSystem;
# A variable to hold the recovered JSON content
$json = $null
$zip = [IO.Compression.ZipFile]::OpenRead($zipFileName)
$zip.Entries |
Where-Object { $_.Name.EndsWith(".json") } |
ForEach-Object {
# Use a MemoryStream to hold the inflated file content
$memoryStream = New-Object System.IO.MemoryStream
# Read the entry
$file = $_.Open()
# Copying inflates the entry content
$file.CopyTo($memoryStream)
# Make sure the entry is closed
$file.Dispose()
# After copying, the cursor will be at the end of the stream
# so set the position to the beginning or there will be no output
$memoryStream.Position = 0
# Use a StreamReader because it allows the content to be
# read as a string in one go
$reader = New-Object System.IO.StreamReader($memoryStream)
# Read the content as a string
$json = $reader.ReadToEnd()
# Close the reader and memory stream
$reader.Dispose()
$memoryStream.Dispose()
}
# Finally close the zip file. This is necessary
# because the zip file does not get closed automatically
$zip.Dispose()
# Do something with the JSON in memory
if ( $json -ne $null )
{
$objects = $json | ConvertFrom-Json
}
}
catch
{
# Report errors
}
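The same idea can be written a little more compactly by wrapping the entry stream directly in a StreamReader, which skips the intermediate MemoryStream. This is a sketch under a few assumptions: Get-ZipEntryText is a made-up helper name, the demo archive and its entry names are invented, and it needs .NET 4.5 or later, so it will not load in a default PowerShell 2.0 session.

```powershell
Add-Type -AssemblyName System.IO.Compression
Add-Type -AssemblyName System.IO.Compression.FileSystem

# Hypothetical helper: returns the text of every entry whose name matches $NamePattern.
function Get-ZipEntryText {
    param([string] $ZipPath, [string] $NamePattern = '.*')
    $zip = [System.IO.Compression.ZipFile]::OpenRead($ZipPath)
    try {
        foreach ($entry in $zip.Entries) {
            # Directory entries have an empty Name, so they are skipped
            if ($entry.Name -and $entry.Name -match $NamePattern) {
                # Reading from the entry stream inflates the content on the fly
                $reader = New-Object System.IO.StreamReader($entry.Open())
                try { $reader.ReadToEnd() } finally { $reader.Dispose() }
            }
        }
    }
    finally { $zip.Dispose() }
}

# Demo: build a small archive in the temp folder, then read it back
$zipPath = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString() + '.zip')
$zip = [System.IO.Compression.ZipFile]::Open($zipPath, 'Create')
try {
    foreach ($n in 'a.json', 'b.txt') {
        $writer = New-Object System.IO.StreamWriter($zip.CreateEntry($n).Open())
        $writer.Write("content of $n")
        $writer.Dispose()   # only one entry may be open at a time in Create mode
    }
}
finally { $zip.Dispose() }

$texts = @(Get-ZipEntryText -ZipPath $zipPath -NamePattern '\.json$')
$texts
Remove-Item $zipPath
```

Because the helper emits one string per matching entry, its output can be piped straight into ConvertFrom-Json, Select-String or Out-File, depending on whether you want to parse, search or concatenate the contents.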