Open multiple HTML files and save as XLSX using PowerShell - powershell

How do I open multiple HTML files (tabular format) and save them in Excel XLSX format from Windows PowerShell ISE? Simply renaming the file extension removes all the formatting. The script below worked with a single file, but I need help with the looping part.
$FolderPath = 'C:\Users\abcd\Desktop\New folder'
$FilePaths = get-childitem $FolderPath -recurse | where {$_.extension -eq ".html"}
foreach($FilePath in $FilePaths)
{
$Workbook = $Excel.Workbooks.Open($FilePath)
$Excel.Visible = $true
$Excel.DisplayAlerts = $False
$OutFile = 'C:\Users\abcd\Desktop\New folder\...xlsx' #Need same file names
$xlSLSXType = 51
$workBook.SaveAs("$OutFile",$xlSLSXType)
}
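A minimal sketch of the missing loop, assuming Excel is installed on the machine. The function names Get-XlsxPath and Convert-HtmlToXlsx are made up for illustration. Compared with the script above, it creates the Excel COM object once before the loop, passes the file's FullName to Open, and derives each output name from the input name so the .xlsx lands next to the .html with the same base name.

```powershell
# Hypothetical helper: same folder and base name, but with an .xlsx extension.
function Get-XlsxPath {
    param([Parameter(Mandatory)][string]$HtmlPath)
    [System.IO.Path]::ChangeExtension($HtmlPath, '.xlsx')
}

# Hypothetical wrapper around the loop; requires Excel on the machine.
function Convert-HtmlToXlsx {
    param([Parameter(Mandatory)][string]$FolderPath)

    $xlOpenXMLWorkbook = 51   # XlFileFormat value for .xlsx
    $excel = New-Object -ComObject Excel.Application
    $excel.Visible = $false
    $excel.DisplayAlerts = $false
    try {
        Get-ChildItem -LiteralPath $FolderPath -Filter *.html -Recurse -File | ForEach-Object {
            # Open() needs the full path as a string, not the FileInfo object
            $workbook = $excel.Workbooks.Open($_.FullName)
            $workbook.SaveAs((Get-XlsxPath -HtmlPath $_.FullName), $xlOpenXMLWorkbook)
            $workbook.Close($false)
        }
    }
    finally {
        # Always shut Excel down, even if one of the files fails
        $excel.Quit()
        [void][System.Runtime.InteropServices.Marshal]::ReleaseComObject($excel)
    }
}
```

It would be called as Convert-HtmlToXlsx -FolderPath 'C:\Users\abcd\Desktop\New folder'.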

Related

I have a script that manipulates an .xlsx file. How do I loop it over all files within a folder?

I have a script that updates a query in an Excel file:
$filePath = "C:\Scripts\SheetToRefresh.xlsx"
$excelObj = New-Object -ComObject Excel.Application
$excelObj.Visible = $true
$workBook = $excelObj.Workbooks.Open($filePath)
$workSheet = $workBook.Sheets.Item("Data")
$workSheet.Select()
$workBook.RefreshAll()
$workBook.Save()
Original script comes from here
Now I need to loop it over a folder. I came up with:
$files = Get-ChildItem "C:\path" -Filter *.xlsx
foreach ($f in $files){
}
but I am struggling with changing the filename for each file (I'm a newbie with PowerShell).
Let's break down what needs to happen:
Before:
Open Excel
Enumerate files
During, for each file:
Open workbook
Run the relevant part of your existing script
Save and close workbook
After:
Close Excel
So, let's start by moving the "Before" actions to the top of your new script:
# Open Excel
$excelObj = New-Object -ComObject Excel.Application
$excelObj.Visible = $true
# Enumerate files
$files = Get-ChildItem "C:\path" -Filter *.xlsx
Now we need to move the relevant parts of the existing script into the new loop. To get the full path of the file object returned by Get-ChildItem, use the FullName property:
foreach($file in $files){
# Open workbook from $file
$workBook = $excelObj.Workbooks.Open($file.FullName)
# Refresh query results
$workSheet = $workBook.Sheets.Item("Data")
$workSheet.Select()
$workBook.RefreshAll()
# Save updated workbook to file
$workBook.Save()
# Close workbook
$workBook.Close()
}
And finally we just need to quit Excel:
$excelObj.Quit()

Modify a .csv file in powershell automatically

I am trying to create a PowerShell script that performs a few steps:
When I put an .xlsx file into a specific folder, the script converts it to CSV. So far I have this:
$ErrorActionPreference = 'Stop'
Function Convert-CsvInBatch
{
[CmdletBinding()]
Param
(
[Parameter(Mandatory=$true)][String]$Folder
)
$ExcelFiles = Get-ChildItem -Path $Folder -Filter *.xlsx -Recurse
$excelApp = New-Object -ComObject Excel.Application
$excelApp.DisplayAlerts = $false
$ExcelFiles | ForEach-Object {
$workbook = $excelApp.Workbooks.Open($_.FullName)
$csvFilePath = $_.FullName -replace "\.xlsx$", ".csv"
$workbook.SaveAs($csvFilePath, [Microsoft.Office.Interop.Excel.XlFileFormat]::xlCSV)
$workbook.Close()
}
# Release Excel Com Object resource
$excelApp.Workbooks.Close()
$excelApp.Visible = $true
Start-Sleep 5
$excelApp.Quit()
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($excelApp) | Out-Null
}
#
# 0. Prepare the folder path which contains all excel files
$FolderPath = "C:\exacthpath"
Convert-CsvInBatch -Folder $FolderPath
The columns in the file are still there, so I want to remove them and separate the values with a ';' instead, like:
H;1;43;185;
At this point I'm stuck. I can import it into Powershell like:
Import-Csv -Path 'C:\folder\filename.csv' | ForEach-Object {
$_
}
I get this look, and the most important task is here, in the first row only:
H;1;43;185;
This should be modified into:
H;01;43;185
the rest should be left untouched.
Afterwards I need to export it back into a CSV file, like:
Export-Csv -Path 'C:\folder\modified_filename.csv'
But this whole process should be combined into one single PowerShell script, which performs the above steps on its own. So in short:
identify any .xlsx file, regardless of its name
convert it into .csv
modify the layout of the document, to separate the columns with a ";"
modify the first line to read 'H;01;43;185' (this is a static line, it will always look like this)
save the created file as a final .csv file
Can you help me combine and optimize the above scripts so that PowerShell performs the modification too? Here is an example of the final content of such a file (it usually has more than 1000 lines):
H;01;43;185
D;111;3;1042;2
D;222;3;1055;3
D;333;3;1085;1
T;3;;;
Any help is highly appreciated.
Regards,
Armin
If, as you say in your comment, your Excel already creates a CSV with the semicolon as delimiter, you can do this inside the loop, just below $workbook.Close():
# read the file created by Excel as string array
$data = Get-Content $csvFilePath
# overwrite the file with just the new header
Set-Content -Path $csvFilePath -Value 'H;01;43;185'
# add the rest of the data to the file
$data[1..($data.Count -1)] | Add-Content -Path $csvFilePath
P.S. I would delete the lines
$excelApp.Visible = $true
Start-Sleep 5
because I don't see the need to have Excel show itself and pause the function for 5 seconds. Instead, keep Excel hidden, so it will work a lot faster, by adding
$excelApp.Visible = $false
right after you have created the $excelApp
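The header replacement can also be wrapped in a small function. This is only a sketch, and the name Set-FirstLine is made up for illustration. It uses Select-Object -Skip 1 instead of the index range $data[1..($data.Count -1)], because the range 1..0 counts downward rather than being empty, so a file with a single line would not be handled as intended.

```powershell
# Hypothetical helper: replace the first line of a text file, keep the rest untouched.
function Set-FirstLine {
    param(
        [Parameter(Mandatory)][string]$Path,
        [Parameter(Mandatory)][string]$NewFirstLine
    )
    # Read every line into memory first, because the same file is overwritten below
    $lines = @(Get-Content -LiteralPath $Path)
    # New first line, followed by all original lines except the first
    $output = @($NewFirstLine) + @($lines | Select-Object -Skip 1)
    # Note: Set-Content uses the default encoding of the PowerShell version in use
    Set-Content -LiteralPath $Path -Value $output
}
```

Inside the loop it would be called as Set-FirstLine -Path $csvFilePath -NewFirstLine 'H;01;43;185', after the workbook has been closed.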

Powershell script to modify a .dat file within a zip folder

I am new to PowerShell and have an issue. I need a PowerShell script to modify the first line of a .dat file within a zip folder. The zip folder will then need to be renamed, but the file within has to keep the same name.
Any help will be appreciated!
I have the following code. It is reading the latest zip file from one directory and copying it into a new worker2 directory. This is working fine. I am trying to open the file and modify the first line. However the file is blank so the code is not copying into the file.
$today = get-date -Format yyyyMMdd
robocopy "C:\Tcc_Touchpoints\Tcc_Touchpoints\data\fusion\Worker\" "C:\Tcc_Touchpoints\Tcc_Touchpoints\data\fusion\Worker2\" /s /maxage:$today
$file = gci C:\Tcc_Touchpoints\Tcc_Touchpoints\data\fusion\Worker2\ | sort LastWriteTime | select -last 1
$file2 = "C:\Tcc_Touchpoints\Tcc_Touchpoints\data\fusion\Worker2\" + $file
$zipfileName = $file2
$fileToEdit = "Worker.dat"
$path = $zipfileName + '\' + $fileToEdit
$contents = Get-Content $fileToEdit #-path $path
$contents
Add-Type -assembly System.IO.Compression.FileSystem
$zip = [System.IO.Compression.ZipFile]::Open($zipfileName,"Update")
#$zip = [System.IO.Compression.ZipFile]::Open($path,"Update")
$robotsFile = $zip.Entries.Where({$_.name -eq $fileToEdit})
$desiredFile = [System.IO.StreamWriter]($robotsFile).Open()
$desiredFile.BaseStream.SetLength(0)
$desiredFile -replace 'SET PURGE_FUTURE_CHANGES Y','SET PURGE_FUTURE_CHANGES N'
$desiredFile.Write($contents)
$desiredFile.Flush()
$desiredFile.Close()
# Write the changes and close the zip file
$zip.Dispose()
Write-Host "zip file updated"
Your code is not far off. The following edits a single plain text entry in a ZIP file in-place.
Be sure to use the correct text encoding (i.e. the encoding your Worker.dat is actually in). Not specifying the encoding when working with text files will lead to mangled data at some point, so always play it safe there.
using namespace System.Text
using namespace System.IO
using namespace System.IO.Compression
Add-Type -Assembly System.IO.Compression.FileSystem
$fileToEdit = "Worker.dat"
$fileEncoding = [Encoding]::UTF8
$zipFileName = "C:\path\to\your\file.zip"
$zip = [ZipFile]::Open($zipfileName, "Update")
$entry = $zip.Entries.Where({$_.name -eq $fileToEdit}) | Select-Object -First 1
$reader = [StreamReader]::new($entry.Open(), $fileEncoding)
$currentText = $reader.ReadToEnd()
$reader.Dispose()
$newText = $currentText -replace "\}$"," }"
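$stream = $entry.Open()
$stream.SetLength(0) # truncate first, so no stale bytes remain if the new text is shorter
$writer = [StreamWriter]::new($stream, $fileEncoding)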
$writer.Write($newText)
$writer.Dispose()
$zip.Dispose()
This does not set the last modified date of the ZIP entry. You can (probably) do this by assigning a new value to the $entry.LastWriteTime property.
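Applied to the question's Worker.dat, the same idea could be sketched as a function. The name Edit-ZipTextEntry and its parameters are made up for illustration. It truncates the entry before writing, so a shorter replacement leaves no stale bytes behind, and it updates the entry's LastWriteTime. The default encoding is assumed to be UTF-8 without a byte order mark; pass a different one if Worker.dat is stored differently. Use a full path for the ZIP file, since .NET does not follow PowerShell's current location.

```powershell
Add-Type -AssemblyName System.IO.Compression
Add-Type -AssemblyName System.IO.Compression.FileSystem

# Hypothetical helper: regex-replace text inside one entry of a ZIP file, in place.
function Edit-ZipTextEntry {
    param(
        [Parameter(Mandatory)][string]$ZipPath,
        [Parameter(Mandatory)][string]$EntryName,
        [Parameter(Mandatory)][string]$Pattern,
        [Parameter(Mandatory)][AllowEmptyString()][string]$Replacement,
        # Assumption: UTF-8 without BOM
        [System.Text.Encoding]$Encoding = [System.Text.UTF8Encoding]::new($false)
    )
    $zip = [System.IO.Compression.ZipFile]::Open($ZipPath, 'Update')
    try {
        $entry = $zip.Entries | Where-Object { $_.Name -eq $EntryName } | Select-Object -First 1
        if (-not $entry) { throw "Entry '$EntryName' not found in '$ZipPath'." }

        # Read the current text, then release the entry stream
        $reader = [System.IO.StreamReader]::new($entry.Open(), $Encoding)
        try { $text = $reader.ReadToEnd() } finally { $reader.Dispose() }

        $newText = $text -replace $Pattern, $Replacement

        # Reopen the entry, empty it, and write the new text
        $stream = $entry.Open()
        $stream.SetLength(0)
        $writer = [System.IO.StreamWriter]::new($stream, $Encoding)
        try { $writer.Write($newText) } finally { $writer.Dispose() }

        $entry.LastWriteTime = [DateTimeOffset]::Now
    }
    finally {
        # Disposing the archive writes the changes back to the ZIP file
        $zip.Dispose()
    }
}
```

For the question's case the call would look like Edit-ZipTextEntry -ZipPath $zipfileName -EntryName 'Worker.dat' -Pattern 'SET PURGE_FUTURE_CHANGES Y' -Replacement 'SET PURGE_FUTURE_CHANGES N'. The entry keeps its name, so the ZIP file itself can be renamed afterwards with Rename-Item.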

create csv from xls using powershell

I want to create a PowerShell script which creates a CSV file from an .xls file, but I don't know exactly how to do this in PowerShell without VBA.
So far I have this:
ConvertTo-Csv "C:\Users\Me\TestsShella\test.xlsx" | Out-File Q:\test\testShella.csv
But it doesn't work.
With Excel present on the running machine use it as a COM-object:
## Q:\Test\2019\01\31\SO_54461362.ps1
$InFile = Get-Item "$($Env:USERPROFILE)\TestsShella\test.xlsx"
$OutFile= $InFile.FullName.replace($InFile.Extension,".csv")
$Excel = new-object -ComObject "Excel.Application"
$Excel.DisplayAlerts = $True
$Excel.Visible = $False # $True while testing
$WorkBook = $Excel.Workbooks.Open($InFile.FullName)
$WorkBook.SaveAs($OutFile, 6) # 6 -> type csv
$WorkBook.Close($True)
$Excel.Quit()
[void][System.Runtime.Interopservices.Marshal]::ReleaseComObject($Excel)
Depending on the locale (decimal point/comma), the CSV file will be either comma or semicolon separated.
Without Excel being installed, use the already suggested module ImportExcel:
$InFile = Get-Item "$($Env:USERPROFILE)\TestsShella\test.xlsx"
$OutFile= $InFile.FullName.replace($InFile.Extension,".csv")
Import-Excel $Infile.FullName | Export-Csv $OutFile -NoTypeInformation
This yields a .csv file with all fields double quoted and comma separated.
There is a prebuilt library for this:
https://www.powershellgallery.com/packages/ImportExcel/5.4.4
You will then have the Import-Excel cmdlet available, so you can import the workbook, convert it to CSV and then export it.
Maybe this could work:
rename-item -Path "C:\Users\Me\TestsShella\test.xlsx" -NewName "item.csv"
You will get a message when you open the CSV, because the content of the file is still in XLSX format.

How to search a word in a docx file with powershell?

I have to examine all .docx files in a folder and display the names of the files which contain the word I pass as a parameter. How can I do this in PowerShell?
Try something like this:
#Instance of word
$Word=New-Object -ComObject Word.Application
$Word.visible = $False
#take list of .docx
Get-ChildItem "c:\temp" -file -Filter "*.docx" | %{
$Filename=$_.FullName
#open file and take content of word file
$Document=$Word.documents.open($Filename, $false, $true)
$range = $document.content
#if content have your word, print path of word file
If($range.Text -like "*tot*"){
$Filename
}
$word.Documents.Close($false)
}
#quit Word so no hidden WINWORD process is left running
$Word.Quit()
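If Word is not installed, a different technique may be enough for a simple search. This is not the COM approach used above. A .docx file is a ZIP archive whose main text is normally stored in word/document.xml, so the text can be read directly. The sketch below takes the word as a parameter, as the question asks. The function names are made up for illustration. It assumes the XML is UTF-8 and ignores headers, footers and footnotes, which are stored in other entries.

```powershell
Add-Type -AssemblyName System.IO.Compression
Add-Type -AssemblyName System.IO.Compression.FileSystem

# Hypothetical helper: $true if the main document text contains $Word (case-insensitive).
function Test-DocxContainsText {
    param(
        [Parameter(Mandatory)][string]$Path,
        [Parameter(Mandatory)][string]$Word
    )
    $zip = [System.IO.Compression.ZipFile]::OpenRead($Path)
    try {
        $entry = $zip.GetEntry('word/document.xml')
        if (-not $entry) { return $false }
        $reader = [System.IO.StreamReader]::new($entry.Open(), [System.Text.Encoding]::UTF8)
        try { $xml = $reader.ReadToEnd() } finally { $reader.Dispose() }
    }
    finally {
        $zip.Dispose()
    }
    # Paragraph ends become line breaks, all other tags are dropped,
    # so text split across several runs is joined back together
    $text = $xml -replace '</w:p>', "`n" -replace '<[^>]+>', ''
    # Decode XML entities such as &amp;
    $text = [System.Net.WebUtility]::HtmlDecode($text)
    return ($text.IndexOf($Word, [System.StringComparison]::OrdinalIgnoreCase) -ge 0)
}

# Hypothetical wrapper: print the full names of matching .docx files in a folder.
function Find-DocxContainingText {
    param(
        [Parameter(Mandatory)][string]$Folder,
        [Parameter(Mandatory)][string]$Word
    )
    Get-ChildItem -LiteralPath $Folder -Filter *.docx -File |
        Where-Object { Test-DocxContainsText -Path $_.FullName -Word $Word } |
        ForEach-Object { $_.FullName }
}
```

It would be called as Find-DocxContainingText -Folder 'c:\temp' -Word 'tot'.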