Visual Studio Data Load from flat file to Postgres Database - postgresql

I am using Visual Studio to transfer data from a CSV file to a Postgres database. The database is installed on Windows Server 2012 and I'm using my local machine to transfer the data. My process runs successfully without throwing any errors, but somehow it's not loading all the rows of my CSV file. The file contains 382,363 rows, but when I check my database only 26,000 rows have been loaded.
I loaded the CSV directly through the Postgres import wizard and it successfully loaded all 382,363 rows, but when I load the data through Visual Studio it only loads 26,000 rows without throwing any error; I just get two warning messages. Has anyone faced this issue, and if so, how do I solve it?
Pasting the entire output of my process below:
SSIS package "C:\Users\Shivam SARIN\source\repos\Integration Services Project3\Integration Services Project3\Package.dtsx" starting.
Information: 0x4004300A at Data Flow Task, SSIS.Pipeline: Validation phase is beginning.
Information: 0x4004300A at Data Flow Task, SSIS.Pipeline: Validation phase is beginning.
Warning: 0x80049304 at Data Flow Task, SSIS.Pipeline: Warning: Could not open global shared memory to communicate with performance DLL; data flow performance counters are not available. To resolve, run this package as an administrator, or on the system's console.
Warning: 0x80047076 at Data Flow Task, SSIS.Pipeline: The output column "T_CTRY_DESTINATION" (115) on output "Flat File Source Output" (6) and component "Flat File Source" (2) is not subsequently used in the Data Flow task. Removing this unused output column can increase Data Flow task performance.
Information: 0x40043006 at Data Flow Task, SSIS.Pipeline: Prepare for Execute phase is beginning.
Information: 0x40043007 at Data Flow Task, SSIS.Pipeline: Pre-Execute phase is beginning.
Information: 0x402090DC at Data Flow Task, Flat File Source [2]: The processing of file "C:\Users\Shivam SARIN\Documents\Excel-csv\MS 2018 Q3.csv" has started.
Information: 0x4004300C at Data Flow Task, SSIS.Pipeline: Execute phase is beginning.
Information: 0x402090DE at Data Flow Task, Flat File Source [2]: The total number of data rows processed for file "C:\Users\Shivam SARIN\Documents\Excel-csv\MS 2018 Q3.csv" is 382364.
Information: 0x402090DF at Data Flow Task, OLE DB Destination [275]: The final commit for the data insertion in "OLE DB Destination" has started.
Information: 0x402090E0 at Data Flow Task, OLE DB Destination [275]: The final commit for the data insertion in "OLE DB Destination" has ended.
Information: 0x40043008 at Data Flow Task, SSIS.Pipeline: Post Execute phase is beginning.
Information: 0x402090DD at Data Flow Task, Flat File Source [2]: The processing of file "C:\Users\Shivam SARIN\Documents\Excel-csv\MS 2018 Q3.csv" has ended.
Information: 0x4004300B at Data Flow Task, SSIS.Pipeline: "OLE DB Destination" wrote 382363 rows.
Information: 0x40043009 at Data Flow Task, SSIS.Pipeline: Cleanup phase is beginning.
SSIS package "C:\Users\Shivam SARIN\source\repos\Integration Services Project3\Integration Services Project3\Package.dtsx" finished: Success.
The program '[21956] DtsDebugHost.exe: DTS' has exited with code 0 (0x0).

In my experience with Postgres, using the OLE DB destination to insert rows was incredibly slow, and odd things like the ones you describe above seemed to happen. It's slow because the provider does not support bulk insert operations.
My suggestion is to call psql with an Execute Process Task. This will be much faster and you will be using tooling that is native to Postgres.
More on that here: https://www.postgresql.org/docs/9.6/app-psql.html
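For reference, here is a minimal sketch of the kind of load psql can run for this file; the table name ms_2018_q3 is hypothetical, and it must be created beforehand with columns matching the CSV.
-- Client-side load through psql (the CSV lives on the local machine), e.g. launched from an Execute Process Task:
--   psql -h <server> -d <database> -U <user> -c "\copy ms_2018_q3 FROM 'C:\Users\Shivam SARIN\Documents\Excel-csv\MS 2018 Q3.csv' CSV HEADER"
-- Server-side equivalent, if the file is first copied to a path the Postgres service account can read:
COPY ms_2018_q3 FROM '<path visible to the server>' CSV HEADER;
Either way the whole file is loaded in a single bulk operation; \copy streams the file over the client connection, so the server never needs direct access to your local disk.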

Related

Instant logging and file generation in Db2

I am currently using the Community version on a Linux server and have configured the db2audit process,
which generates audit files at the configured location. A user then has to manually run the db2audit archive command to archive the log files, run the db2audit extract command to extract the archived files into flat ASCII files, and then load those files into the respective tables.
Only then can we analyze the logs by querying the tables. This whole process requires a lot of manual intervention.
Question: is there any configuration setting or utility with which we can generate log files that include the SQL statement event, host, session id, timestamp, and so on, instantly and automatically?
I need an instant logging mechanism that generates flat files for any SQL execution or any event triggered at the database level in Db2 on a Linux server.
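For reference, the manual chain described above (archive, then extract to delimited files) can also be driven from SQL through Db2's audit administrative routines, which makes the steps scriptable and schedulable. This is only a sketch with placeholder paths; verify the exact parameter lists against the documentation for your Db2 version.
-- Archive the active audit logs (placeholder path; -2 means all members).
CALL SYSPROC.AUDIT_ARCHIVE('/db2audit/archive/', -2);
-- Extract the archived files into delimited flat files that can then be loaded into the audit tables
-- (delimiter, target directory, source directory, file mask, event options -- check your version's signature).
CALL SYSPROC.AUDIT_DELIM_EXTRACT(NULL, '/db2audit/extract/', '/db2audit/archive/', 'db2audit.db.%', NULL);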

Executing a Batch service in Azure Data Factory using a Python script

Hi, I've been trying to execute a custom activity in ADF that receives a CSV file from container (A); after further transformation of the data set, the transformed DataFrame is stored in another CSV file in the same container (A).
I've written the transformation logic in Python and have it stored in the same container (A).
The error arises here: when I execute the pipeline, it returns the error *can't find the specified file*.
There is nothing wrong with the connections. Is anything wrong with the Batch account or pools?
Can anyone tell me where to place the Python script?
Install Azure Batch Explorer and make sure to choose the proper configuration for the virtual machine (dsvm-windows), which will ensure Python is already in place on the virtual machine where your code runs.
This video explains the steps:
https://youtu.be/_3_eiHX3RKE

Move file after transaction commit

I just started using Spring Batch and I don't know how I can implement my business need.
The behavior is quite simple: I have a directory where files are saved. My batch should detect those files, import them into my database, and move each file to a backup directory (or to an error directory if the data can't be saved).
So I create chunks of one file each. The reader retrieves them and the processor imports the data.
I read that Spring Batch creates a global transaction for the whole chunk, and that only the ChunkListener is called outside the transaction. That seems to be OK, but its input parameter is a ChunkContext. How can I retrieve the file handled in the chunk? I don't see where it is stored in the ChunkContext.
I need to be sure the database has accepted the inserts before deciding where the file should be moved; that's why I need to do this after the commit.
Here is how you can proceed:
Create a service (based on a file system watching API or something like Spring Integration directory polling) that launches a batch job for the new file
The batch job can use a chunk-oriented step to read data and write it to the database. In that job, you can use a job/step execution listener or a separate step to move files to the backup/error directory according to the success or failure of the previous step.

Can the Oracle GoldenGate adapter read a trail file generated by an Extract or Pump process?

I am researching OGG. Can the Oracle GoldenGate adapter read a trail file generated by an Extract or Pump process? Most examples I can find show the OGG adapter working with the replication process on the delivery server; I want to apply the adapter (whether via flat file, JMS, or the Java API) directly to the Extract or Pump process.
The trail file is just an intermediate internal Oracle GoldenGate format which is used to store the data. The source is processed by the Extract process.
The Extract process can read the data from:
a database transaction log,
a JMS Queue or a flat file (through GoldenGate for Application Adapters)
The Extract process produces as output:
trail files (to be processed by Replicat processes)
XML files
SQL files
If you set a Trail file as the output, then the trail can be processed by:
another Extract (called Data Pump) to transfer the trail files to another location
a Replicat process - to apply the transactions to a destination
Later on, the Replicat process can read the trails (produced by Extract) and apply the transactions to:
a database,
a JMS Queue or flat file (through GoldenGate for Application Adapters),
a Hadoop target (through GoldenGate for Big Data)
The trail files have an internal, proprietary Oracle format and cannot be produced by non-Oracle programs. You can read their content using the logdump utility.

Import Excel 2010 into SQL Server

I use Excel to collect & configure data, then import it into SQL Server 2012 for storage.
So far I've been using the SQL Server Import & Export Wizard, but it is a pain to manually set it up constantly. Since I'm using Express, of course it won't allow me to save, or even view, the actual commands to transfer the data.
I tried to set up a linked server, per How to use Excel with SQL Server linked servers and distributed queries, but get the following error:
The linked server has been created but failed a connection test. Do you want to keep the linked server?
An exception occurred while executing a Transact-SQL statement or batch. (Microsoft.SqlServer.ConnectionInfo)
Cannot initialize the data source object of OLE DB provider "Microsoft.Jet.OLEDB.4.0" for linked server "FLTST".
OLE DB provider "Microsoft.Jet.OLEDB.4.0" for linked server "FLTST" returned message "Unspecified error". (Microsoft SQL Server, Error: 7303)
I thought perhaps the Excel version number was the problem, since the web page is from 2005, so I tried with:
Excel 8.0 (Excel 2002) as shown on the page
Excel 12.0 (Excel 2007) which is what the wizard seems to use
Excel 14.0 (Excel 2010) what I actually have
All of those gave me identical results.
Next I tried the distributed query as shown at Import excel file to SQL Server Express (again with different variations of the provider string):
USE ExTest
SELECT * INTO TstTbl FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 14.0;Database=c:\ExTest.xlsm', [Contacts])
go
Which gives me the following error:
OLE DB provider "Microsoft.Jet.OLEDB.4.0" for linked server "(null)" returned message "Unspecified error".
Msg 7303, Level 16, State 1, Line 3
Cannot initialize the data source object of OLE DB provider "Microsoft.Jet.OLEDB.4.0" for linked server "(null)".
Instead of going to SQL Server & pulling the data in, should I stay in Excel & push it over?
What am I doing wrong?
PS: Please don't tell me to convert it to a csv file! I'm trying to do fewer steps, not more!
Having had similar issues to the ones in your question, I have done some research on this. My issue is not yet fully resolved, but I think I can get you one step further. Although the question is old, perhaps someone else needs the help.
By running:
SELECT *
FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0', 'Excel 12.0 Xml;HDR=YES;Database=P:\Path\File.xlsx','SELECT * FROM [Sheet1$]');
GO
I get the following error message:
Msg 15281, Level 16, State 1, Line 19
SQL Server blocked access to STATEMENT 'OpenRowset/OpenDatasource' of component 'Ad Hoc Distributed Queries' because this component is turned off as part of the security configuration for this server. A system administrator can enable the use of 'Ad Hoc Distributed Queries' by using sp_configure. For more information about enabling 'Ad Hoc Distributed Queries', search for 'Ad Hoc Distributed Queries' in SQL Server Books Online.
To resolve that I run the following:
sp_configure 'show advanced options', 1
RECONFIGURE
GO
sp_configure 'ad hoc distributed queries', 1
RECONFIGURE
GO
But I get a new error message:
Msg 7302, Level 16, State 1, Line 19
Cannot create an instance of OLE DB provider "Microsoft.ACE.OLEDB.12.0" for linked server "(null)".
To rectify that I run:
EXEC sp_MSSet_oledb_prop N'Microsoft.ACE.OLEDB.12.0', N'AllowInProcess', 1
GO
EXEC sp_MSSet_oledb_prop N'Microsoft.ACE.OLEDB.12.0', N'DynamicParameters', 1
GO
But I get this error instead:
Msg 7438, Level 16, State 1, Line 19
The 32-bit OLE DB provider "Microsoft.ACE.OLEDB.12.0" cannot be loaded in-process on a 64-bit SQL Server.
In my case I have asked the IT department to install a 64-bit version of Excel on the server, and I hope that will be the end of the technical problems when importing from Excel.
To clean up afterwards I disable the settings I just enabled:
EXEC sp_MSSet_oledb_prop N'Microsoft.ACE.OLEDB.12.0', N'AllowInProcess', 0
GO
EXEC sp_MSSet_oledb_prop N'Microsoft.ACE.OLEDB.12.0', N'DynamicParameters', 0
GO
sp_configure 'ad hoc distributed queries', 0
RECONFIGURE
GO
sp_configure 'show advanced options', 0
RECONFIGURE
GO
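Putting the pieces together: once a 64-bit ACE provider is available, the SELECT ... INTO from the question should look roughly like the sketch below. The path, target table, and sheet name are taken from the question; note that a worksheet is usually addressed as [Contacts$], and an .xlsm file may need the 'Excel 12.0 Macro' extended property instead of 'Excel 12.0 Xml'.
USE ExTest
SELECT * INTO TstTbl
FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
    'Excel 12.0 Macro;HDR=YES;Database=c:\ExTest.xlsm',
    'SELECT * FROM [Contacts$]');
GO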
Create an SSIS package with an Excel source connection manager; the destination is your SQL Server Express instance, via an OLE DB destination.
When you create the Excel connection manager, you can just point it at one existing Excel file.
Define a user variable, like User::sourceFile, which is used to supply the Excel file's full path.
After the Excel connection manager is created, right-click -> Properties -> find the "Expressions" property and assign your [User::sourceFile] variable there.
Create one simple data flow from your source to the destination.
Save and debug your SSIS package; make sure all credentials work and data flows into the destination table. Note: don't save sensitive data in your package encrypted by the machine key.
Each time you need to load a new file, use DTEXEC to execute the package and override the variable.
good luck