PowerShell: Implementing an IDataReader wrapper around StreamReader

I am trying to load extremely large CSV files into SQL Server using PowerShell. The code also has to apply on-the-fly regex replacements and allow for various delimiters, EOR, and EOF markers. For maintainability, I would really like all of this logic to live in PowerShell without importing assemblies.
To be efficient, I know I need to use the SqlBulkCopy class. But all of the PowerShell examples I see fill a DataTable and pass that, which is not possible for me because of the file size.
I am pretty sure I need to wrap StreamReader in an IDataReader and then pass that to SqlBulkCopy. I found a couple of great examples of this implemented in C#:
http://archive.msdn.microsoft.com/FlatFileDataReader
http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader
Is it possible to accomplish this functionality in native PowerShell without importing a C# assembly? I am specifically having a hard time converting the abstract class wrapper.
This is the code I have so far; it does not expose an IDataReader and breaks on memory limits.
function Get-CSVDataReader()
{
    param (
        [string]$path
    )
    $parsedData = New-Object 'System.Collections.Generic.List[string[]]'
    #List<string[]> parsedData = new List<string[]>()
    $sr = New-Object IO.StreamReader($path)
    while ($line = $sr.ReadLine())
    {
        #regex replace and other logic here
        $parsedData.Add($line.Split(','))
    }
    ,$parsedData #if this was an IDataReader, the comma keeps it from exploding
}
$MyReader = Get-CSVDataReader('This should not fill immediately. It needs a Read Method.')
Thanks a bunch for the help.

If all you want to do is use a DataReader with SqlBulkCopy, you could use the ACE drivers, which come with Office 2007/2010 and are also available as a separate download, to open an OLEDB connection to the CSV file, open a reader, and call WriteToServer:
$ServerInstance = "$env:computername\sql1"
$Database = "tempdb"
$tableName = "psdrive"
$ConnectionString = "Server={0};Database={1};Integrated Security=True;" -f $ServerInstance,$Database
$filepath = "C:\Users\Public\bin\"
get-psdrive | export-csv ./psdrive.csv -NoTypeInformation -Force
$connString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=`"$filepath`";Extended Properties=`"text;HDR=yes;FMT=Delimited`";"
$qry = 'select * from [psdrive.csv]'
$conn = new-object System.Data.OleDb.OleDbConnection($connString)
$conn.open()
$cmd = new-object System.Data.OleDb.OleDbCommand($qry,$conn)
$dr = $cmd.ExecuteReader()
$bulkCopy = new-object ("Data.SqlClient.SqlBulkCopy") $connectionString
$bulkCopy.DestinationTableName = $tableName
$bulkCopy.WriteToServer($dr)
$dr.Close()
$conn.Close()
#CREATE TABLE [dbo].[psdrive](
# [Used] [varchar](1000) NULL,
# [Free] [varchar](1000) NULL,
# [CurrentLocation] [varchar](1000) NULL,
# [Name] [varchar](1000) NULL,
# [Provider] [varchar](1000) NULL,
# [Root] [varchar](1000) NULL,
# [Description] [varchar](1000) NULL,
# [Credential] [varchar](1000) NULL,
# [DisplayRoot] [varchar](1000) NULL
#)
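On the "native PowerShell" part of the question: with no ColumnMappings set, SqlBulkCopy is, as far as I know, only guaranteed to call Read(), FieldCount and GetValue() on the reader, so nearly everything else can be stubbed. PowerShell's own class syntax does not support the indexers that IDataRecord requires, so the closest maintainable compromise is usually an inline Add-Type block that keeps the C# inside the .ps1 file. Below is a minimal sketch; the type name, path, and column count are illustrative, and any member SqlBulkCopy turns out to need can be filled in the same way:

```powershell
# Sketch only: most members throw, because the simple WriteToServer path
# does not call them. Fill in whatever your scenario complains about.
Add-Type -ReferencedAssemblies System.Data -TypeDefinition @'
using System;
using System.Data;
using System.IO;

public class CsvDataReader : IDataReader
{
    private readonly StreamReader _sr;
    private readonly int _fieldCount;
    private string[] _row;

    public CsvDataReader(string path, int fieldCount)
    {
        _sr = new StreamReader(path);
        _fieldCount = fieldCount;
    }

    public bool Read()
    {
        string line = _sr.ReadLine();
        if (line == null) return false;
        _row = line.Split(',');   // apply regex replacements here
        return true;
    }

    public int FieldCount { get { return _fieldCount; } }
    public object GetValue(int i) { return _row[i]; }

    public void Close() { _sr.Close(); }
    public void Dispose() { _sr.Dispose(); }
    public bool IsClosed { get { return false; } }
    public int Depth { get { return 0; } }
    public int RecordsAffected { get { return -1; } }
    public bool NextResult() { return false; }
    public DataTable GetSchemaTable() { throw new NotImplementedException(); }

    // Benign metadata stubs; the rest are never hit in the simple case.
    public string GetName(int i) { return "Column" + i; }
    public Type GetFieldType(int i) { return typeof(string); }
    public object this[int i] { get { return GetValue(i); } }
    public object this[string name] { get { throw new NotImplementedException(); } }
    public int GetOrdinal(string name) { throw new NotImplementedException(); }
    public string GetDataTypeName(int i) { throw new NotImplementedException(); }
    public int GetValues(object[] values) { throw new NotImplementedException(); }
    public bool GetBoolean(int i) { throw new NotImplementedException(); }
    public byte GetByte(int i) { throw new NotImplementedException(); }
    public long GetBytes(int i, long o, byte[] b, int off, int len) { throw new NotImplementedException(); }
    public char GetChar(int i) { throw new NotImplementedException(); }
    public long GetChars(int i, long o, char[] b, int off, int len) { throw new NotImplementedException(); }
    public Guid GetGuid(int i) { throw new NotImplementedException(); }
    public short GetInt16(int i) { throw new NotImplementedException(); }
    public int GetInt32(int i) { throw new NotImplementedException(); }
    public long GetInt64(int i) { throw new NotImplementedException(); }
    public float GetFloat(int i) { throw new NotImplementedException(); }
    public double GetDouble(int i) { throw new NotImplementedException(); }
    public string GetString(int i) { throw new NotImplementedException(); }
    public decimal GetDecimal(int i) { throw new NotImplementedException(); }
    public DateTime GetDateTime(int i) { throw new NotImplementedException(); }
    public IDataReader GetData(int i) { throw new NotImplementedException(); }
    public bool IsDBNull(int i) { return false; }
}
'@

# Illustrative usage: streams row by row, no DataTable in memory
$reader = New-Object CsvDataReader 'C:\data\huge.csv', 9
$bulkCopy = New-Object Data.SqlClient.SqlBulkCopy $connectionString
$bulkCopy.DestinationTableName = 'dbo.MyTable'
$bulkCopy.WriteToServer($reader)
$reader.Close()
```

This is still a compiled C# assembly under the hood, but it lives in the same script file, which may satisfy the maintenance requirement.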

I'm importing large CSVs into a DataTable and performing batch updates after 1 million rows.
if ($dt.Rows.Count -eq 1000000) {
    $bulkCopy.WriteToServer($dt)
    $dt.Clear()
}
Here is the link where I detail my own script on my blog, but the above code outlines the basic concept. My PowerShell script took 4.x minutes to import 9 million rows from a 1.1 GB CSV. The script relied on SqlBulkCopy, [System.IO.File]::OpenText and a datatable.
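The snippet above is the core of it; a fuller sketch of the combination described (variable names and the column setup are illustrative, and the final partial batch needs its own flush):

```powershell
# Sketch: stream the file with OpenText, buffer rows in a DataTable,
# and flush to SQL Server every million rows via SqlBulkCopy.
$bulkCopy = New-Object Data.SqlClient.SqlBulkCopy $connectionString
$bulkCopy.DestinationTableName = $tableName
$dt = New-Object System.Data.DataTable
1..9 | ForEach-Object { [void]$dt.Columns.Add() }   # columns must match the destination

$reader = [System.IO.File]::OpenText($csvPath)
while ($null -ne ($line = $reader.ReadLine())) {
    [void]$dt.Rows.Add($line.Split(','))
    if ($dt.Rows.Count -eq 1000000) {
        $bulkCopy.WriteToServer($dt)
        $dt.Clear()
    }
}
if ($dt.Rows.Count -gt 0) { $bulkCopy.WriteToServer($dt) }   # flush the last batch
$reader.Close()
```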

Related

Is it possible to pass a DataTable as an argument to a PowerShell script?

I'm trying to create a PowerShell script that inserts a DataTable into SQL via WriteToServer...
This script is called by a Power Automate Desktop automation.
So... I cannot pass my DataTable as an argument :(
%dt% is the DataTable variable which needs to be used inside the PowerShell script.
This is my dilemma: it is interpreted as a string or something like that.
#Invoke-Sqlcmd connection string parameters
$params = @{'server'='SQLEXPRESS';'Database'='Db'}
Write-Output %dt%
#Variable to hold output as data-table
$dataTable = %dt% | Out-DataTable
#Define Connection string
$connectionString = "Data Source=DSQLEXPRESS; Integrated Security=SSPI;Initial Catalog=Db"
#Bulk copy object instantiation
$bulkCopy = new-object ("Data.SqlClient.SqlBulkCopy") $connectionString
#Define the destination table
$bulkCopy.DestinationTableName = "dbo.__SALES"
#load the data into the target
$bulkCopy.WriteToServer($dataTable)
#Query the target table to see for output
Invoke-Sqlcmd @params -Query "SELECT * FROM dbo.__SALES" | Format-Table -AutoSize
Thanks!
UPDATE
No longer need to pass an argument - I create the DataTable inside the script.
Thanks again!
Work-around: create the datatable inside the script
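A minimal sketch of that work-around (the column names, values, and table name are made up for illustration):

```powershell
# Build the DataTable inside the script instead of passing %dt% in
$dataTable = New-Object System.Data.DataTable
[void]$dataTable.Columns.Add('Product')
[void]$dataTable.Columns.Add('Amount', [int])
[void]$dataTable.Rows.Add('Widget', 42)

$bulkCopy = New-Object Data.SqlClient.SqlBulkCopy $connectionString
$bulkCopy.DestinationTableName = 'dbo.__SALES'
$bulkCopy.WriteToServer($dataTable)
```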

I have some code to download all the tables in my database to CSV; is there a way to specify row separators?

Currently the code uses a comma as the column separator and a newline as the row separator.
This is an issue because some of the data in the tables are paragraphs, which already include commas and new lines.
I want to be able to use a delimiter with multiple characters, but that returns an error:
Cannot bind parameter 'Delimiter'. Cannot convert value '/~/' to type 'System.Char'
$server = "(server)\instance"
$database = "DBName"
$tablequery = "SELECT name from sys.tables"
#Declare connection variables
$connectionTemplate = "Data Source={0};Integrated Security=SSPI;Initial Catalog={1};"
$connectionString = [string]::Format($connectionTemplate, $server, $database)
$connection = New-Object System.Data.SqlClient.SqlConnection
$connection.ConnectionString = $connectionString
$command = New-Object System.Data.SqlClient.SqlCommand
$command.CommandText = $tablequery
$command.Connection = $connection
#Load up the Tables in a dataset
$SqlAdapter = New-Object System.Data.SqlClient.SqlDataAdapter
$SqlAdapter.SelectCommand = $command
$DataSet = New-Object System.Data.DataSet
$SqlAdapter.Fill($DataSet)
$connection.Close()
# Loop through all tables and export a CSV of the Table Data
foreach ($Row in $DataSet.Tables[0].Rows)
{
    $queryData = "SELECT * FROM [$($Row[0])]"
    #Specify the output location of your dump file
    $extractFile = "C:\temp\backups\$($Row[0]).csv"
    $command.CommandText = $queryData
    $command.Connection = $connection
    $SqlAdapter = New-Object System.Data.SqlClient.SqlDataAdapter
    $SqlAdapter.SelectCommand = $command
    $DataSet = New-Object System.Data.DataSet
    $SqlAdapter.Fill($DataSet)
    $connection.Close()
    $DataSet.Tables[0] | Export-Csv $extractFile -Delimiter '/~/'
}
Export-Csv and ConvertTo-Csv correctly handle newline and comma characters
It is not a problem that the data could potentially contain commas and/or newlines.
If you look at the CSV "specification" (written in quotation marks here because a lot of usage of the term CSV does not refer to anything following this specification), you'll see that a field in a CSV file can be enclosed in quotation characters. If the data of a field contains a quotation character, the delimiter character, or a newline character, the field must be enclosed in quotation characters. If the data contains a quotation character, that quotation character should be doubled.
This will all be handled correctly by the ConvertTo-Csv and the Export-Csv cmdlets.
$obj = New-Object PSObject -Property ([ordered]@{
    FirstColumn = "First Value";
    SecondColumn = "Second value, including a comma";
    ThirdColumn = "Third Value including two`nnewline`ncharacters";
    FourthColumn = 'Fourth value including a " character'}
)
$obj | ConvertTo-Csv -NoTypeInformation
This will give us the following output:
"FirstColumn","SecondColumn","ThirdColumn","FourthColumn"
"First Value","Second value, including a comma","Third Value including two
newline
characters","Fourth value including a "" character"
Which is correctly handled according to the CSV specification.
So you do not need to worry about the data containing comma characters or newline characters, since it is handled by the CSV format.
Open a CSV file in Excel
I don't know what your current problem with the data is, but I'm guessing you're trying to open the resulting file in Excel and seeing incorrect data. This is because Excel unfortunately doesn't open .csv files as... well... CSV files.
One way (there might be more ways) to open it in Excel is to go on the Data tab and in the "Get & Transform Data" section press the "From Text/CSV" button. This way, Excel should open the file correctly according to the CSV standard.

How to insert strings into a table with their UTF-8 encodings?

I am trying to upload some string values into an Oracle table by means of PowerShell. However, when I upload the strings directly, some characters show up as ? in the table.
Actually, I first parse a text and retrieve some results through regex as below:
if($wiki_link -match "http:\/\/en\.wikipedia\.org\/wiki\/(.*)") {$city = $matches[1]}
Then I want to upload this $city variable into a table as below:
[System.Reflection.Assembly]::LoadWithPartialName("System.Data.OracleClient")
$connectionString = "Data Source=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(Host=xxxxxxxxx)(Port=1521)))(CONNECT_DATA=(SERVER = DEDICATED) (SERVICE_NAME =xxxxx)));user id=xxxxxx;password=xxxxx"
$connection = New-Object System.Data.OracleClient.OracleConnection($connectionString)
$connection.Open()
$cmd2=$connection.CreateCommand()
$cmd2.CommandText="insert into mehmet.goo_region (city) values ('$city')"
$rdr2=$cmd2.ExecuteNonQuery()
When I apply this method, the city named Elâzığ appears as Elaz?? in the table cell.
I guess I have to convert string into UTF-8 but I could not find a solution through web.
Thanks in advance...
Try this, it should work:
$u = New-Object System.Text.UTF8Encoding
$s = $u.GetBytes("YourStringGoesHere")
$u.GetString($s) ## this is your UTF-8 string
So your code becomes
$u = New-Object System.Text.UTF8Encoding
$s = $u.GetBytes($city)
$utf8city = $u.GetString($s)
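If the characters still come through as ?, binding the value as a command parameter rather than interpolating it into the SQL string is also worth trying, since the provider then passes the string through as Unicode (and the target column would need to be NVARCHAR2 to hold characters like ı and ğ; the column type here is an assumption). A sketch against the same $connection:

```powershell
# Parameterized insert: the value is bound, not spliced into the SQL text
$cmd2 = $connection.CreateCommand()
$cmd2.CommandText = "insert into mehmet.goo_region (city) values (:city)"
$param = New-Object System.Data.OracleClient.OracleParameter(":city", $city)
[void]$cmd2.Parameters.Add($param)
[void]$cmd2.ExecuteNonQuery()
```

As a side benefit, this also avoids SQL injection from whatever the regex captured.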

file IO, is this a bug in Powershell?

I have the following code in Powershell
$filePath = "C:\my\programming\Powershell\output.test.txt"
try
{
    $wStream = New-Object IO.FileStream $filePath, [System.IO.FileMode]::Append, [IO.FileAccess]::Write, [IO.FileShare]::Read
    $sWriter = New-Object System.IO.StreamWriter $wStream
    $sWriter.WriteLine("test")
}
I keep getting error:
Cannot convert argument "1", with value: "[IO.FileMode]::Append", for
"FileStream" to type "System.IO.FileMode": "Cannot convert value
"[IO.FileMode]::Append" to type "System.IO.FileMode" due to invalid
enumeration values. Specify one of the following enumeration values
and try again. The possible enumeration values are "CreateNew, Create,
Open, OpenOrCreate, Truncate, Append"."
I tried the equivalent in C#,
FileStream fStream = null;
StreamWriter stWriter = null;
try
{
    fStream = new FileStream(@"C:\my\programming\Powershell\output.txt", FileMode.Append, FileAccess.Write, FileShare.Read);
    stWriter = new StreamWriter(fStream);
    stWriter.WriteLine("hahha");
}
it works fine!
What's wrong with my PowerShell script? BTW, I am running on PowerShell:
Major Minor Build Revision
----- ----- ----- --------
3 2 0 2237
Another way would be to use just the names of the values and let PowerShell cast them to the target type:
New-Object IO.FileStream $filePath, 'Append', 'Write', 'Read'
When using the New-Object cmdlet and the target type's constructor takes parameters, you should either use the -ArgumentList parameter (of New-Object) or wrap the parameters in parentheses - I prefer to wrap my constructors with parens:
# setup some convenience variables to keep each line shorter
$path = [System.IO.Path]::Combine($Env:TEMP,"Temp.txt")
$mode = [System.IO.FileMode]::Append
$access = [System.IO.FileAccess]::Write
$sharing = [IO.FileShare]::Read
# create the FileStream and StreamWriter objects
$fs = New-Object IO.FileStream($path, $mode, $access, $sharing)
$sw = New-Object System.IO.StreamWriter($fs)
# write something and remember to call Dispose to clean up the resources
$sw.WriteLine("Hello, PowerShell!")
$sw.Dispose()
$fs.Dispose()
New-Object cmdlet online help: http://go.microsoft.com/fwlink/?LinkID=113355
Yet another way could be to enclose the enums in parens:
$wStream = new-object IO.FileStream $filePath, ([System.IO.FileMode]::Append), `
([IO.FileAccess]::Write), ([IO.FileShare]::Read)
If your goal is to write to a log file or text file, you could also use the cmdlets PowerShell provides for this:
Get-Help Out-File -Detailed
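For example, an append this way is a one-liner and replaces the whole FileStream/StreamWriter dance (reusing $filePath from the question):

```powershell
# Appends a line to the file; the cmdlet handles open/close for you
"test" | Out-File -FilePath $filePath -Append
# or, equivalently
Add-Content -Path $filePath -Value "test"
```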

Converting accdb to csv with powershell

I am trying to convert some Excel (.xlsx) and Access (.accdb) files to CSV files.
I quickly found a way to do this with Excel, but now I cannot find any helpful documentation on converting .accdb files.
So far I have:
$adOpenStatic = 3
$adLockOptimistic = 3
$objConnection = New-Object -com "ADODB.Connection"
$objRecordSet = New-Object -com "ADODB.Recordset"
$objConnection.Open("Provider = Microsoft.ACE.OLEDB.12.0; Data Source = " + $Filepath)
$objRecordset.Open("Select * From TableName",$objConnection,$adOpenStatic, $adLockOptimistic)
#Here I need some way to either saveas .csv or loop through
#each row and pass to csv.
$objRecordSet.Close()
$objConnection.Close()
Any Ideas?
I would be willing to do this with another language (VB, Java, PHP) if anyone knows a way.
If you use .NET rather than COM, it's a lot easier. Here's some code to handle Excel XLSX files:
#Even with Excel 2010 installed, needed to install ACE:
#http://www.microsoft.com/downloads/en/details.aspx?FamilyID=c06b8369-60dd-4b64-a44b-84b371ede16d&displaylang=en
#Be careful about executing in the "right" version, x86 vs. x64
#Change these settings as needed
$filepath = 'C:\Users\u00\Documents\backupset.xlsx'
#Comment/Uncomment connection string based on version
#Connection String for Excel 2007:
$connString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=`"$filepath`";Extended Properties=`"Excel 12.0 Xml;HDR=YES`";"
#Connection String for Excel 2003:
#$connString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=`"$filepath`";Extended Properties=`"Excel 8.0;HDR=Yes;IMEX=1`";"
$qry = 'select * from [backupset$]'
$conn = new-object System.Data.OleDb.OleDbConnection($connString)
$conn.open()
$cmd = new-object System.Data.OleDb.OleDbCommand($qry,$conn)
$da = new-object System.Data.OleDb.OleDbDataAdapter($cmd)
$dt = new-object System.Data.dataTable
[void]$da.fill($dt)
$conn.close()
$dt | export-csv ./test.csv -NoTypeInformation
If you want to stick with the ADODB COM object:
# loop through all records - do work on each record to convert it to CSV
$objRecordset.Open("Select * FROM Tablename", $objConnection,$adOpenStatic,$adLockOptimistic)
$objRecordset.MoveFirst()
do {
    # do your work to get each field and convert this item to CSV
    # fields available thru: $objRecordset.Fields['fieldname'].Value
    $objRecordset.MoveNext()
} while ($objRecordset.EOF -eq $false)
$objRecordset.Close()
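One way to fill in the per-record work is sketched below. It is illustrative only: field values are joined with commas and no quoting/escaping is applied, so values containing commas would need the quoting treatment described in the Export-Csv answer above, and the output file name is made up.

```powershell
# Collect the field names once, then emit one comma-joined line per record
$fields = @($objRecordset.Fields | ForEach-Object { $_.Name })
$csvLines = New-Object System.Collections.Generic.List[string]
$csvLines.Add($fields -join ',')
$objRecordset.MoveFirst()
while (-not $objRecordset.EOF) {
    $values = foreach ($f in $fields) { $objRecordset.Fields.Item($f).Value }
    $csvLines.Add($values -join ',')
    $objRecordset.MoveNext()
}
$csvLines | Set-Content -Path .\TableName.csv
```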