Convert xlsx to CSV without using Excel - powershell

I get the following error:
Cannot index into a null array.
At C:\tmp\Folder\excel\output\net45\test.ps1:14 char:1
+ $Data = $Reader.AsDataSet().Tables[0].Rows
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidOperation: (:) [], RuntimeException
+ FullyQualifiedErrorId : NullArray
# Zero based index. The second row has index 1.
$StartRow = 2
# Input File
$InputFileName = "C:\tmp\Folder\excel\output\net20\test.xlsx"
# Output File
$OutputFileName = "C:\tmp\Folder\excel\output\net20\SomeFile.csv"
# Path to Excel.dll is saved (downloaded from http://exceldatareader.codeplex.com/)
$DllPath = "C:\tmp\Folder\excel\output\net45\Excel.4.5.dll"
[void]([Reflection.Assembly]::LoadFrom($DllPath))
$Stream = New-Object IO.FileStream($InputFileName, "Open", "Read")
$Reader = [Excel.ExcelReaderFactory]::CreateBinaryReader($Stream)
$Data = $Reader.AsDataSet().Tables[0].Rows
# Read the column names. Order should be preserved
$Columns = $Data[$StartRow].ItemArray
# Sort the remaining data into an object using the specified columns
$Data[$($StartRow + 1)..$($Data.Count - 1)] | % {
# Create an object
$Output = New-Object Object
# Read each column
for ($i = 0; $i -lt $Columns.Count; $i++) {
$Output | Add-Member NoteProperty $Columns[$i] $_.ItemArray[$i]
}
# Leave it in the output pipeline
$Output
} | Export-CSV $OutputFileName -NoType

You're calling the binary reader, which is meant for the legacy .xls format, on an Open XML file (.xlsx). Try using [Excel.ExcelReaderFactory]::CreateOpenXmlReader($Stream) instead.
This works for me:
$DllPath = 'C:\Excel.DataReader.45\Excel.4.5.dll';
$FilePath = 'C:\Students.xlsx';
$FileMode = [System.IO.FileMode]::Open;
$FileAccess = [System.IO.FileAccess]::Read;
Add-Type -Path $DllPath;
$FileStream = New-Object -TypeName System.IO.FileStream $FilePath, $FileMode, $FileAccess;
$ExcelDataReader = [Excel.ExcelReaderFactory]::CreateOpenXmlReader($FileStream);
$ExcelDataReader.IsFirstRowAsColumnNames = $true;
$ExcelDataSet = $ExcelDataReader.AsDataSet();
$ExcelDataReader.Dispose();
$FileStream.Close();
$FileStream.Dispose();
$ExcelDataSet.Tables | Format-Table -AutoSize
If you're still having trouble, you might consider using the Microsoft.ACE.OLEDB.12.0 provider, which you install separately from Office. There's some doc here.
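To get from the loaded data set to the CSV file the question asks for, something along these lines may work. It is only a sketch: the table below is a hand-built stand-in for $ExcelDataSet.Tables[0], and the column names, rows, and output path are made up for illustration.

```powershell
# Stand-in for $ExcelDataSet.Tables[0]; column names and rows are invented.
$table = New-Object System.Data.DataTable 'Students'
[void]$table.Columns.Add('Name')
[void]$table.Columns.Add('Grade')
[void]$table.Rows.Add('Ann', '7')
[void]$table.Rows.Add('Bob', '8')

# DataRow objects carry extra properties (RowState, Table, ...), so select
# only the real column names before exporting.
$columnNames = @($table.Columns | ForEach-Object { $_.ColumnName })

# Hypothetical output path in the temp folder.
$csvPath = Join-Path ([System.IO.Path]::GetTempPath()) 'Students.csv'

$table.Rows |
    Select-Object -Property $columnNames |
    Export-Csv -Path $csvPath -NoTypeInformation

# Read the file back to check what was written.
$roundTrip = @(Import-Csv -Path $csvPath)
```

With IsFirstRowAsColumnNames set as in the answer, the column names come from the first worksheet row, so no manual header handling should be needed.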

I read "Convert XLS to CSV on command line" and "convert-xlsx-file-to-csv-using-batch" when I had a similar problem. See whether they help.

Related

How to loop through column values from a table and create folders via powershell

I'm trying to achieve the following via powershell:
I have a table(TBL_DDL) with 5 columns (CATALOG,SCHEMA,OBJECT_TYPE,OBJECT_NAME,DDL)
Now I'm extracting data from this table and trying to create a folder structure on the C: drive by concatenating the first 4 columns (CATALOG, SCHEMA, OBJECT_TYPE, OBJECT_NAME), and then exporting the data in the DDL column to a txt file.
For example: C:\"CATALOG"\"SCHEMA"\"OBJECT_TYPE"\"OBJECT_NAME"\DDL.txt
I'm trying to achieve this via powershell. Can anyone help me please?
$SqlCmd = 'snowsql -c example -d tu_test -s public -q "select catalog,schema,OBJECT_TYPE,OBJECT_NAME,DDL from SF_TBL_DDL limit 2"'
$MultiArray = @(Invoke-Expression $SqlCmd)
$dt = New-Object System.Data.Datatable
[void]$dt.Columns.Add("CATALOG")
[void]$dt.Columns.Add("SCHEMA")
$Output = foreach ($Object in $MultiArray)
{
foreach ($SCHEMA in $Object.SCHEMA)
{
$someother = New-Object -TypeName psobject -Property @{CATALOG = $Object.CATALOG; SCHEMA = $SCHEMA}
$nRow = $dt.NewRow()
$nRow.CATALOG = $someother.CATALOG
$nRow.SCHEMA = $someother.SCHEMA
$dt.Rows.Add($nRow)
}
}
$dt.row.count
At the moment, i'm getting 0 rows in $dt.
Cheers
You can use a System.Data.DataTable object to pull your result set and then loop through it to perform the required operation.
Here the GetTableValues function retrieves the table values, and the following cmdlets create the directory and the file:
New-Item -ItemType "directory" -Path $dirPath
New-Item -ItemType "file" -Path $filePath
Complete code looks like this
function GetTableValues(){
$DBConnectionString = "<Your DB connection string>";
$sqlConn = new-object System.Data.SqlClient.sqlConnection $DBConnectionString;
$sqlConn.Open();
$sqlCommand = $sqlConn.CreateCommand();
$sqlCommand.CommandText = "select catalog,[schema],OBJECT_TYPE,OBJECT_NAME,DDL from TBL_DDL"; ##Put your correct query here
$result = $sqlCommand.ExecuteReader();
$table = New-Object System.Data.DataTable;
$table.Load($result);
$sqlConn.Close();
return $table;
}
$tableValue = GetTableValues;
foreach ($Row in $tableValue)
{
$filePath = "C:\" + $Row.catalog.TrimEnd() + "\" + $Row.schema.TrimEnd() + "\" + $Row.OBJECT_TYPE.TrimEnd() + "\" + $Row.OBJECT_NAME.TrimEnd() + "\" + $Row.DDL.TrimEnd() + ".txt"
$dirPath = "C:\" + $Row.catalog.TrimEnd() + "\" + $Row.schema.TrimEnd() + "\" + $Row.OBJECT_TYPE.TrimEnd() + "\" + $Row.OBJECT_NAME.TrimEnd()
New-Item -ItemType "directory" -Path $dirPath ##Creates directory
New-Item -ItemType "file" -Path $filePath ##Creates file in $dirPath directory
}
This works perfectly fine for me.
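Note that the question asks for a file named DDL.txt that holds the DDL text, whereas the loop above uses the DDL value as the file name. A minimal sketch of the former, assuming the rows are already in memory: the sample rows and the temp-folder root are invented for illustration, and in practice the root would be C:\ and the rows would come from the query.

```powershell
# Hypothetical sample rows standing in for the query result.
$rows = @(
    [pscustomobject]@{ CATALOG = 'TU_TEST'; SCHEMA = 'PUBLIC'; OBJECT_TYPE = 'TABLE'
                       OBJECT_NAME = 'CUSTOMERS'; DDL = 'create table CUSTOMERS (ID int);' }
    [pscustomobject]@{ CATALOG = 'TU_TEST'; SCHEMA = 'PUBLIC'; OBJECT_TYPE = 'VIEW'
                       OBJECT_NAME = 'V_CUSTOMERS'; DDL = 'create view V_CUSTOMERS as select 1;' }
)

# Assumed root folder; a temp location keeps the sketch harmless to run.
$root = Join-Path ([System.IO.Path]::GetTempPath()) 'ddl-export'

foreach ($row in $rows) {
    # Build CATALOG\SCHEMA\OBJECT_TYPE\OBJECT_NAME one segment at a time.
    $dirPath = $root
    foreach ($part in $row.CATALOG, $row.SCHEMA, $row.OBJECT_TYPE, $row.OBJECT_NAME) {
        $dirPath = Join-Path $dirPath $part.Trim()
    }

    # -Force creates missing parent folders and does not fail if the folder exists.
    New-Item -ItemType Directory -Path $dirPath -Force | Out-Null

    # Write the DDL text into DDL.txt inside that folder.
    Set-Content -Path (Join-Path $dirPath 'DDL.txt') -Value $row.DDL
}
```

Object names containing characters that are invalid in Windows paths would need to be sanitized first; the sketch does not handle that.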

Powershell Loop to Write Password Protected Files

I'm trying to read excel files into Powershell, open, password protect them and write them back. I can do it individually but within a loop the script fails:
#working individually
$f = ("C:my\path\Out Files\1234dv.xlsx")
$outfile = $f.FullName + "out"
$xlNormal = -4143
$xl = new-object -comobject excel.application
$xl.Visible = $True
$xl.DisplayAlerts = $False
$wb = $xl.Workbooks.Open($f)
$a = $wb.SaveAs("C:my\path\Out Files\test.xls",$xlNormal,"test")
$a = $xl.Quit()
$a = Release-Ref($ws)
$a = Release-Ref($wb)
$a = Release-Ref($xl)
#not working in loop, error after
function Release-Ref ($ref) {
([System.Runtime.InteropServices.Marshal]::ReleaseComObject(
[System.__ComObject]$ref) -gt 0)
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()
}
foreach ($f in Get-ChildItem "C:\my\path\Out Files"){
$ff = $f
$outfile = $f.FullName + "out"
$xlNormal = -4143
$xl = new-object -comobject excel.application
$xl.Visible = $True
$xl.DisplayAlerts = $False
$wb = $xl.Workbooks.Open($ff)
$a = $wb.SaveAs("C:\my\path\Out Files\test.xls",$xlNormal,"test")
$a = $xl.Quit()
$a = Release-Ref($ws)
$a = Release-Ref($wb)
$a = Release-Ref($xl)
}
Sorry, we couldn't find 1234dv.xlsx. Is it possible it was moved, renamed or deleted?
At line:16 char:5
+ $wb = $xl.Workbooks.Open($ff)
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : OperationStopped: (:) [], COMException
+ FullyQualifiedErrorId : System.Runtime.InteropServices.COMException
COM object that has been separated from its underlying RCW cannot be used.
At line:17 char:5
+ $a = $wb.SaveAs("C:\my\path ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : OperationStopped: (:) [], InvalidComObjectException
+ FullyQualifiedErrorId : System.Runtime.InteropServices.InvalidComObjectException
That error repeats for all four test files I'm working with.
I'm not really familiar with Powershell so I relied on MS docs, and I couldn't password protect the files in python so thought this would be easier. I know this doesn't address the password yet either but trying to get the loop to work first. Any help would be greatly appreciated. Thank you.
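You should use
$wb = $xl.Workbooks.Open($ff.FullName)
to give Excel the full file path. Otherwise, $ff is a FileInfo object where a string (path) is required. The error message shows that only the bare file name, 1234dv.xlsx, reached Excel, which then could not locate the file.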
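The difference between the FileInfo object and its FullName can be checked without starting Excel. The sketch below is an illustration under assumptions: the temp folder and placeholder file are invented, and the per-file SavePath naming is only a suggestion, added because the loop in the question saves every workbook to the same test.xls.

```powershell
# Hypothetical demo folder containing one placeholder file.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'fullname-demo'
New-Item -ItemType Directory -Path $dir -Force | Out-Null
Set-Content -Path (Join-Path $dir '1234dv.xlsx') -Value 'placeholder'

$plan = @(
    foreach ($f in Get-ChildItem -Path $dir -Filter *.xlsx) {
        [pscustomobject]@{
            # Get-ChildItem returns FileInfo objects, not strings.
            IsFileInfo = $f -is [System.IO.FileInfo]
            # FullName is the complete path string to hand to Workbooks.Open.
            OpenPath   = $f.FullName
            # One output file per input instead of a shared test.xls (assumed naming).
            SavePath   = Join-Path $f.DirectoryName ($f.BaseName + '_protected.xls')
        }
    }
)

$plan | Format-List
```

Inside the real loop, OpenPath would go to $xl.Workbooks.Open and SavePath to $wb.SaveAs.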
Slightly off topic for your question, but not for your intent:
From a security perspective, an .xls password is not security; it is merely an annoyance.
If you need security, I suggest you use something like Azure Information Protection, which lets you encrypt the file and share it securely only with those who need access.
You still need to create your .xls or .xlsx files (or any other file, for that matter); then you can simply loop over them in PowerShell:
PS C:\>foreach ($file in (Get-ChildItem -Path \\server1\Docs -Recurse -Force |
where {!$_.PSIsContainer} |
Where-Object {$_.Extension -eq ".xls"})) {
Protect-RMSFile -File $file.PSPath -InPlace -DoNotPersistEncryptionKey All -TemplateID "e6ee2481-26b9-45e5-b34a-f744eacd53b0" -OwnerEmail "IT@Contoso.com"
}
https://learn.microsoft.com/en-us/powershell/module/azureinformationprotection/protect-rmsfile?view=azureipps

Opening Large Set of Word Documents With Powershell - Automation

I am in the process of assigning a footer to hundreds of word documents with their current filepath. Here is my code, which does the job:
I plan to have $Word.Visible set to false, but it isn't for now for debugging purposes.
This gets all the word docs in a directory, adds footer with their file path, then saves and closes.
I am trying to handle a case like this:
I just want to skip this, or possibly force open and continue. Not sure the best way to go about this, however, and am seeking some help.
Thanks,
Elijah
Set-ExecutionPolicy bypass;
$path = 'somepath';
$documents = Get-ChildItem -Path $path *.docx -Recurse -Force
$filepaths = foreach ($document in $documents) {$document.fullname}
$Word = New-Object -ComObject Word.application;
$Word.Visible = $true;
foreach ($filepath in $filepaths){
$Doc = $Word.Documents.OpenNoRepairDialog($filepath);
$Doc.Unprotect();
$Selection = $Word.Selection;
$Doc.ActiveWindow.ActivePane.View.SeekView = 4;
$Selection.ParagraphFormat.Alignment = 1;
$Selection.TypeText($filepath);
$Doc.Save();
$Doc.Close();
}
$Word.Quit();
Edit1:
I've made an edit where it adds the dynamic field object for the file path, rather than just typing in the file path, that way if you happen to move the file, the file path can be updated to the new path. You will have to press F9 while selecting the footer in word, but this is the best you can do without making a macro and saving the file as a .docm.
Here is the amended code:
$documents = Get-ChildItem -path *docx -recurse -force
$filepaths = foreach($document in $documents){$document.FullName}
Set-Variable -Name wdFieldFileName -Value 29 -Option constant -Force -ErrorAction SilentlyContinue
$word = New-Object -ComObject Word.Application
#$word.Visible = $true
foreach($filepath in $filepaths){
$doc = $word.Documents.Open($filepath)
$sections = $doc.Sections
$item1 = $sections.Item(1)
$footer = $item1.Footers.Item(1)
$range = $footer.Range
$doc.Fields.Add($range, $wdFieldFileName, '\p')
$doc.Save()
$doc.Close()
}
$word.Quit()
I am still running into the error window when trying to open documents that Word diagnoses as corrupted or "in need of repair".
Passing in multiple arguments to the Open() method does not yield results as expected. Here is an example:
Exception calling "Open" with "16" argument(s): "Type mismatch. (Exception from HRESULT: 0x80020005 (DISP_E_TYPEMISMATCH))"
At line:1 char:1
+ $doc = $word.Documents.Open($filepath, $False, $False, $False, $null, ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [], MethodInvocationException
+ FullyQualifiedErrorId : ComMethodTargetInvocation

Is $doc a reserved word/reference in Powershell?

I've inherited and am running the Powershell script below, and I'm having one problem interpreting it and applying some of its functionality to other scripts I'm trying to write.
In particular, I'm looking at the following lines of code:
ForEach ($doc in $srcfiles) {
saveas-document -docs $doc
}
From the program functionality, I know that each instance of a document in $srcfiles is being assigned to the variable $doc, and that variable is being passed to the saveas-document function as the input value. However, I'm not sure where $doc is coming from. Does this statement declare the $doc variable on the fly? Or is it a Powershell reserved word that represents the document object in my source path? Also, does the -docs switch in essence declare that $doc is equal to the $docs variable expected by the function? I need some help understanding HOW this works so I can apply that knowledge to other projects.
$global:word = new-object -comobject word.application
$word.Visible = $False
# PATHS
$backupPath = "\\Server\path\to\source\files\"
$srcfiles = Get-ChildItem $backupPath -filter "*htm.*"
$dPath = "\\Server\path\to\desitination\files\"
$htmPath = $dPath + "HT\" # Data path for HTML
$docPath = $dPath + "DO\" # Data path for *.DOC
$doxPath = $dPath + "DX\" # Data path for *.DOCX
$txtPath = $dPath + "TX\" # Data path for *.TXT
$rtfPath = $dPath + "RT\" # Data path for *.RTF
# SAVE FORMATS
$saveFormatDoc = [Enum]::Parse([Microsoft.Office.Interop.Word.WdSaveFormat], 0);
$saveFormatTxt = [Enum]::Parse([Microsoft.Office.Interop.Word.WdSaveFormat], 4);
$saveFormatRTF = [Enum]::Parse([Microsoft.Office.Interop.Word.WdSaveFormat], 6);
$saveFormatDox = [Enum]::Parse([Microsoft.Office.Interop.Word.WdSaveFormat], 16);
$saveFormatXML = [Enum]::Parse([Microsoft.Office.Interop.Word.WdSaveFormat], 14);
# Convert Documents
function saveas-document ($docs) {
$savepath = "$docPath$($docs.BaseName)"
"Converting to $savepath.doc"
$opendoc.saveas([ref]"$savepath", [ref]$saveFormatDoc)
$opendoc.close()
"Success with conversions."
" "
}
#
ForEach ($doc in $srcfiles) {
saveas-document -docs $doc
}
#
$word.quit()
Run the following commands, and read through the output:
Get-Help about_foreach -ShowWindow
Get-Help about_functions -ShowWindow
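In short, $doc is not a reserved word: foreach creates whatever variable name appears before "in" and assigns each element of the collection to it in turn, and -docs is simply the function's $docs parameter addressed by name. A small sketch of both points, where Show-Document and the file names are hypothetical stand-ins for saveas-document and the real source files:

```powershell
# Hypothetical stand-in for saveas-document: one parameter named $docs.
function Show-Document ($docs) {
    # Inside the function, the value passed via -docs is available as $docs.
    "Received: $docs"
}

# Made-up list standing in for the Get-ChildItem result.
$srcfiles = 'a.htm', 'b.htm'

# foreach declares $doc on the fly; any other variable name would work too.
$results = foreach ($doc in $srcfiles) {
    # -docs binds the current $doc to the function's $docs parameter.
    Show-Document -docs $doc
}

$results
```

The foreach statement does not create a new scope, so after the loop $doc still holds the last element it was assigned.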

Shortcut to add files into local sql database

I would like to be able to right click on a file(s) and "send-to" a local MSSQL database. The details are that I would like to store the file contents in "contents" column and the file name in the "filename" column ... how novel :)
*In most cases the file contents are HTML.
It seems like it should be possible through windows shell/SQL Shell using a shortcut to a command in the "shell:sendto" folder.
[System.Reflection.Assembly]::LoadWithPartialName('Microsoft.SqlServer.SMO') | Out-Null
$Server1 = New-Object ("Microsoft.SqlServer.Management.Smo.Server") 'SQLSERVER'
$Server1.databases["DB"].tables["Table"].rowcount
$RowCount = $server1.databases["DB"].tables["Table"].rowcount.ToString()
$TotalRecords = [int]$RowCount
$wc = New-Object system.net.WebClient
$url = ""
$files = @(Get-ChildItem c:\test\*.*)
"Number of files $($files.length)"
# Errors out when no files are found
if($files.length -lt 1) { return }
foreach($file1 in $files) {
# $txt = Get-Content($file1)
# $txt = $txt.Replace("'", "''")
# Write-Host $file1.name + " - - " + $Txt
$url1 = $url + $file1
Write-Host("URL is " + $url1)
$webpage = $wc.DownloadData($url1)
$string = [System.Text.Encoding]::ASCII.GetString($webpage)
$string = $string.Replace("'", "''")
Invoke-SqlCmd -ServerInstance SERVER -Query "Insert into DATABASE.dbo.Table(text,filename) Values ('$string','$file1')"}
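For the "send-to" part, Windows starts the shortcut target with the selected file paths as command-line arguments, so a script can pick them up from $args. The sketch below covers only the step of pairing each file name with its contents; Get-FileRecord is a hypothetical helper name, and the database insert itself is left out.

```powershell
# Hypothetical helper: turns file paths into objects with the two values
# the question wants to store (file name and file contents).
function Get-FileRecord {
    param(
        [string[]]$Path
    )

    foreach ($p in $Path) {
        $item = Get-Item -LiteralPath $p
        [pscustomobject]@{
            FileName = $item.Name
            # -Raw returns the whole file as one string instead of an array of lines.
            Contents = Get-Content -LiteralPath $item.FullName -Raw
        }
    }
}

# In a script launched from the SendTo folder, the selected files arrive in $args:
# $records = @(Get-FileRecord -Path $args)
```

Each record could then be passed to the insert statement, preferably through query parameters rather than string concatenation so that quotes in the HTML need no manual escaping.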