I am doing 2 separate SQL queries on separate databases / connections in a Powershell script. The goal is to export the results of both requests into a single CSV file.
What I am doing now is:
# Create a data table for Clients
$ClientsTable = new-object "System.Data.DataTable"
# Create text commands
$ClientsCommand1 = $connection1.CreateCommand()
$ClientsCommand1.CommandText = $ClientsQuery1
$ClientsCommand2 = $connection2.CreateCommand()
$ClientsCommand2.CommandText = $ClientsQuery2
# Get Clients results
$ClientsResults1 = $ClientsCommand1.ExecuteReader()
$ClientsResults2 = $ClientsCommand2.ExecuteReader()
# Load Clients in data table
$ClientsTable.Load($ClientsResults1)
$ClientsTable.Load($ClientsResults2)
# Export Clients data table to CSV
$ClientsTable | export-csv -Encoding UTF8 -NoTypeInformation -delimiter ";" "C:\test\clients.csv"
where $connection1 and $connection2 are opened System.Data.SqlClient.SqlConnection.
Both requests work fine and both output data with exactly the same columns names. If I export the 2 results sets to 2 separate CSV files, all is fine.
But loading the results in the data table as above fails with the following message:
Failed to enable constraints. One or more rows contain values violating non-null, unique, or foreign-key constraints.
If instead I switch the order in which I load data into the data tables, like
$ClientsTable.Load($ClientsResults2)
$ClientsTable.Load($ClientsResults1)
(load second results set before the first one), then the error goes away and my CSV is generated without any problem with the data from the 2 requests. I cannot think of why appending data in one way, or the other, would trigger this error, or work fine.
Any idea?
I'm skeptical reversing the order works. More likely, it's doing something like appending to the csv file that was already created from the first attempt.
It is possible, though, that different primary key definitions from the original data could produce the results you're seeing. Datatable.Load() can do unexpected things when pulling data from an additional sort. It will try to MERGE the data rather than simply append it, using different matching strategies depending on the overload and argument. If the primary key used for the one of the tables causes nothing match and no records to merge, but the primary key for the table matched everything, that might explain it.
If you want to just append the results, what you want to do instead is Load() the first result into the datatable, export to CSV, clear the table, load the second result into the table, and then export again in append mode.
Related
I am working on a solution that involves merging two queries in Power Query to retrieve a single data table back to Excel. The first query is always populated but the other query comes from an ERP and might be empty (empty table) from time to time.
Appending the two queries involves making the header names the same in the two queries before the appending takes place. As the second query sometimes results in an empty table, the error arises in the steps when Power Query is modifying the header names in the second table (it cannot modify the header names as there are no headers).
"Error message: Expression.Error: The column 'PartMtl_Company' of the table wasn't found.
Details: PartMtl_Company" where the PartMtl_Company is the leftmost column in my table.
I am kind of thinking that I would need to evaluate whether the second table is empty and skip the renaming steps if that is the case. I assume merging the populated first table with an empty table would cause no problem and would only result in the first table. I have tried to look around for a suitable M-code but have not come across such.
I'm thinking you might be able to use Table.RowCount to solve this. Something along the lines of:
= if Table.RowCount(Table2) > 0 then...
You would modify the headers only if there is data in the second table. Same goes for the appending of the tables: you would only append if there is data in the second table, since you won't have renamed any headers otherwise.
Thank you Marc! That did the trick.
In the end, I wrote some in the lines of
= if Table.RowCount(Table2) > 0 then... (code that works on a non-empty table) ...else Table2
, which returns the empty table if it is empty to begin with. Appending the second table into the first table did not throw an error but returned only the first table like planned.
I am trying to read in a large CSV with millions of rows for testing. I know that I can treat the CSV as a database using the provider Microsoft.ACE.OLEDB.12.0
Using a small data set I am able to read the row contents positionally using .GetValue(int). I am having a tough time finding a better was to read the data (assuming there even is one.). If I know the column names before hand this is easy. However if I didn't know them I would have to read in the first line of the file to get that data which seems silly.
#"
id,first_name,last_name,email,ip_address
1,Edward,Richards,erichards0#businessweek.com,201.133.112.30
2,Jimmy,Scott,jscott1#clickbank.net,103.231.149.144
3,Marilyn,Williams,mwilliams2#chicagotribune.com,52.180.157.43
4,Frank,Morales,fmorales3#google.ru,218.175.165.205
5,Chris,Watson,cwatson4#ed.gov,75.251.1.149
6,Albert,Ross,aross5#abc.net.au,89.56.133.54
7,Diane,Daniels,ddaniels6#washingtonpost.com,197.156.129.45
8,Nancy,Carter,ncarter7#surveymonkey.com,75.162.65.142
9,John,Kennedy,jkennedy8#tumblr.com,85.35.177.235
10,Bonnie,Bradley,bbradley9#dagondesign.com,255.67.106.193
"# | Set-Content .\test.csv
$conn = New-Object System.Data.OleDb.OleDbConnection("Provider=Microsoft.ACE.OLEDB.12.0;Data Source='C:\Users\Matt';Extended Properties='Text;HDR=Yes;FMT=Delimited';")
$cmd=$conn.CreateCommand()
$cmd.CommandText="Select * from test.csv where first_name like '%n%'"
$conn.open()
$data = $cmd.ExecuteReader()
$data | ForEach-Object{
[pscustomobject]#{
id=$_.GetValue(0)
first_name=$_.GetValue(1)
last_name=$_.GetValue(2)
ip_address=$_.GetValue(4)
}
}
$cmd.Dispose()
$conn.Dispose()
Is there a better way to deal with the output from $cmd.ExecuteReader()? Finding hard to get information for a CSV import. Most of the web deals with exporting to CSV using this provider from a SQL database. The logic here would be applied to a large CSV so that I don't need to read the whole thing in just to ignore most of the data.
I should have looked closer on TechNet for the OleDbDataReader Class. There are a few methods and properties that help understand the data returned from the SQL statement.
FieldCount: Gets the number of columns in the current row.
So if nothing else you know how many columns your rows have.
Item[Int32]: Gets the value of the specified column in its native format given the column ordinal.
Which I can use to pull back the data from each row. This appears to work the same as GetValue().
GetName(Int32): Gets the name of the specified column.
So if you don't know what the column is named this is what you can use to get it from a given index.
There are many other methods and some properties but those are enough to shed light if you are not sure what data is contained within a csv (assuming you don't want to manually verify before hand). So, knowing that, a more dynamic way to get the same information would be...
$data | ForEach-Object{
# Save the current row as its own object so that it can be used in other scopes
$dataRow = $_
# Blank hashtable that will be built into a "row" object
$properties = #{}
# For every field that exists we will add it name and value to the hashtable
0..($dataRow.FieldCount - 1) | ForEach-Object{
$properties.($dataRow.GetName($_)) = $dataRow.Item($_)
}
# Send the newly created object down the pipeline.
[pscustomobject]$properties
}
$cmd.Dispose()
$conn.Dispose()
Only downside of this is that the columns will likely be output in not the same order as the originating CSV. That can be address by saving the row names in a separate variable and using a Select at the end of the pipe. This answer was mostly trying to make sense of the column names and values returned.
I have a PostgreSQL database. I had to extend an existing, big table with a few more columns.
Now I need to fill those columns. I tought I can create an .csv file (out of Excel/Calc) which contains the IDs / primary keys of existing rows - and the data for the new, empty fields. Is it possible to do so? If it is, how to?
I remember doing exactly this pretty easily using Microsoft SQL Management Server, but for PostgreSQL I am using PG Admin (but I am ofc willing to switch the tool if it'd be helpfull). I tried using the import function of PG Admin which uses the COPY function of PostgreSQL, but it seems like COPY isn't suitable as it can only create whole new rows.
Edit: I guess I could write a script which loads the csv and iterates over the rows, using UPDATE. But I don't want to reinvent the wheel.
Edit2: I've found this question here on SO which provides an answer by using a temp table. I guess I will use it - although it's more of a workaround than an actual solution.
PostgreSQL can import data directly from CSV files with COPY statements, this will however only work, as you stated, for new rows.
Instead of creating a CSV file you could just generate the necessary SQL UPDATE statements.
Suppose this would be the CSV file
PK;ExtraCol1;ExtraCol2
1;"foo",42
4;"bar",21
Then just produce the following
UPDATE my_table SET ExtraCol1 = 'foo', ExtraCol2 = 42 WHERE PK = 1;
UPDATE my_table SET ExtraCol1 = 'bar', ExtraCol2 = 21 WHERE PK = 4;
You seem to work under Windows, so I don't really know how to accomplish this there (probably with PowerShell), but under Unix you could generate the SQL from a CSV easily with tools like awk or sed. An editor with regular expression support would probably suffice too.
I am looking for the easiest way to manually dump a subset of records in an OpenEdge database table in the Progress ".d" file format.
The best way I can imagine is creating an extra test database with the identical schema as the source database, and then copying the subset of records over to the test database using FOR EACH and BUFFER-COPY statements. Then just export the data from the test database using the Dump Data and Definitions Table Contens (.d file )... menu option.
That seems like a lot of trouble. If you can identify the subset of records in order to do the BUFFER-COPY than you should also be able to:
OUTPUT TO VALUE( "table.d" ).
FOR EACH table NO-LOCK WHERE someCondition:
EXPORT table.
END.
OUTPUT CLOSE.
Which is, essentially, what the dictionary "dump data" .d file is less a few lines of administrivia at the bottom which can be safely omitted for most purposes.
What is the best way to delete from a table using Talend?
I'm currently using a tELTJDBCoutput with the action on Delete.
It looks like Talend always generate a DELETE ... WHERE EXISTS (<your generated query>) query.
So I am wondering if we have to use the field values or just put a fixed value of 1 (even in only one field) in the tELTmap mapping.
To me, putting real values looks like it useless as in the where exists it only matters the Where clause.
Is there a better way to delete using ELT components?
My current job is set up like so:
The tELTMAP component with real data values looks like:
But I can also do the same thing with the following configuration:
Am I missing the reason why we should put something in the fields?
The following answer is a demonstration of how to perform deletes using ETL operations where the data is extracted from the database, read in to memory, transformed and then fed back into the database. After clarification, the OP specifically wants information around how this would differ for ELT operations
If you need to delete certain records from a table then you can use the normal database output components.
In the following example, the use case is to take some updated database and check to see which records are no longer in the new data set compared to the old data set and then delete the relevant rows in the old data set. This might be used for refreshing data from one live system to a non live system or some other usage case where you need to manually move data deltas from one database to another.
We set up our job like so:
Which has two tMySqlConnection components that connect to two different databases (potentially on different hosts), one containing our new data set and one containing our old data set.
We then select the relevant data from the old data set and inner join it using a tMap against the new data set, capturing any rejects from the inner join (rows that exist in the old data set but not in the new data set):
We are only interested in the key for the output as we will delete with a WHERE query on this unique key. Notice as well that the key has been selected for the id field. This needs to be done for updates and deletes.
And then we simply need to tell Talend to delete these rows from the relevant table by configuring our tMySqlOutput component properly:
Alternatively you can simply specify some constraint that would be used to delete the records as if you had built the DELETE statement manually. This can then be fed in as the key via a main link to your tMySqlOutput component.
For instance I might want to read in a CSV with a list of email addresses, first names and last names of people who are opting out of being contacted and then make all of these fields a key and connect this to the tMySqlOutput and Talend will generate a DELETE for every row that matches the email address, first name and last name of the records in the database.
In the first example shown in your question:
you are specifically only selecting (for the deletion) products where the SOME_TABLE.CODE_COUNTRY is equal to JS_OPP.CODE_COUNTRY and SOME_TABLE.FK_USER is equal to JS_OPP.FK_USER in your where clause and then the data you send to the delete statement is setting the CODE_COUNTRY equal to JS_OPP.CODE_COUNTRY and FK_USER equal to JS_OPP.CODE_COUNTRY.
If you were to put a tLogRow (or some other output) directly after your tELTxMap you would be presented with something that looks like:
.----------+---------.
| tLogRow_1 |
|=-----------+------=|
|CODE_COUNTRY|FK_USER|
|=-----------+------=|
|GBR |1 |
|GBR |2 |
|USA |3 |
'------------+-------'
In your second example:
You are setting CODE_COUNTRY to an integer of 1 (your database will then translate this to a VARCHAR "1"). This would then mean the output from the component would instead look like:
.------------.
|tLogRow_1 |
|=-----------|
|CODE_COUNTRY|
|=-----------|
|1 |
|1 |
|1 |
'------------'
In your use case this would mean that the deletion should only delete the rows where the CODE_COUNTRY is equal to "1".
You might want to test this a bit further though because the ELT components are sometimes a little less straightforward than they seem to be.