I have been tasked with performing some calculations on a CSV file in SSIS using T-SQL. Once I am finished, I need to save the project as an ISPAC file so that users outside my business can run it too; however, their SQL connection managers will obviously differ from mine. To that end, is it possible to use T-SQL to modify a CSV inside SSIS without using a connection manager to a SQL database? If so, which tasks would I use to achieve this? Many thanks.
EDIT: To clarify what I need to achieve: I am tasked with adding additional data to a CSV file, for example concatenating forename and surname, or deriving new columns such as month number and month name from a date string, and then writing the result to a new CSV location. I am only allowed to use T-SQL and SSIS to achieve this. Once complete, I need to send the SSIS project file to another organisation to use, so I cannot hardcode any of my connections or assume they have any of the databases I will use.
My initial question was whether all of this can be done inside SSIS using T-SQL, but I think I have the answer to that now. I plan to parameterise the connection string, the source file location and the output file, and to use the master and tempdb databases to run the SQL that adds the additional data. They will need SQL Server at their end, but the package won't make any assumptions about which databases they have.
I am new to Talend and need guidance on the scenario below:
We have a set of 10 JSON files, each with a different structure/schema, that need to be loaded into 10 different tables in a Redshift database.
Is there a way to write a generic script/job that iterates through each file and loads it into the database?
For example:
File Name: abc_< date >.json
Table Name: t_abc
File Name: xyz< date >.json
Table Name: t_xyz
and so on..
Thanks in advance
With the Talend Enterprise version you can benefit from dynamic schema. However, in my experience JSON files are usually somewhat nested structures, so you would have to figure out how to flatten them; once that's done it becomes a 1:1 load. With Open Studio this will not work, because dynamic schema is missing.
Basically what you could do is write some Java code that transforms your JSON into CSV. Then either use psql from the command line or, if your Talend ships a new enough PostgreSQL JDBC driver, invoke the client-side \COPY from it to load the data. If the column order of your file matches that of the database table, it should work without needing to specify how many columns you have, so it's dynamic, but the data never "flows" through Talend.
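As a rough illustration of the JSON-to-CSV step, here is a minimal sketch. It is written in Python rather than Java purely for brevity, and the names `flatten` and `json_records_to_csv`, the `_` key separator and the column list are all hypothetical choices for the example, not anything Talend or Redshift prescribes.

```python
import csv
import io
import json


def flatten(record, parent_key="", sep="_"):
    """Flatten nested dicts into a single-level dict with joined keys."""
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, prefixing child keys with the parent key.
            flat.update(flatten(value, name, sep))
        elif isinstance(value, list):
            # Assumption: lists are kept as JSON text in a single column.
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat


def json_records_to_csv(records, columns):
    """Return CSV text for the records, using the given column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for record in records:
        writer.writerow(flatten(record))
    return buffer.getvalue()
```

The column list is passed in explicitly so that the CSV column order can be made to match the target table, which is what the load step described above relies on.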
A really not cool, but theoretically possible, solution: if Redshift supports JSON (Postgres does), you can create a staging table with two columns: filename and content. Once the whole content is in this staging table, you could write an INSERT ... SELECT statement that transforms the JSON into a tabular format and inserts it into the final table.
However, with your toolset you probably have no choice other than to load these files with one job per file, and I'd suggest one dedicated job for each file. Each job would look for its own files and be triggered/scheduled individually, or be part of a bigger job that scans the folders and triggers the right job for the right file.
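For the "scan the folders and trigger the right job" variant, the routing from file name to table could look roughly like this. It is only a sketch in Python: `target_table` is a hypothetical helper, and it assumes the date part of the file name is a run of digits, optionally preceded by an underscore, which the question does not actually specify.

```python
import re

# Assumption: names look like abc_20240131.json or xyz20240131.json,
# i.e. letters, an optional underscore, then a digits-only date.
FILE_PATTERN = re.compile(r"^(?P<name>[A-Za-z]+)_?\d+\.json$")


def target_table(filename):
    """Return the table name for a file, or None if the name does not match."""
    match = FILE_PATTERN.match(filename)
    if match is None:
        return None
    return "t_" + match.group("name").lower()
```

Files that return `None` can be skipped or logged, so an unexpected file in the folder does not get loaded into the wrong table.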
I am new to SSIS and would like some help creating a package for a specific task. My data is stored remotely in a MySQL database and is downloaded to a SQL Server 2014 database. I want to create a package where I can enter two dates that are compared against the created/modified date of each record on a number of tables, giving me a snapshot comparison of the MySQL data and the SQL Server data, so that I can see whether any rows are missing from my local SQL Server database or need to be updated. Some tables have no dates, so for those I just want a record count of what, if anything, is missing between the two. If this is better achieved through T-SQL, I am happy to hear other suggestions or be pointed to sites where something similar has been done.
In relation to your query, Tab:
"Hi Tab, at the moment our master data is stored in a MySQL database, and the data was downloaded to a SQL Server database as a one-off. I have an SSIS package that uses the MAX ID, which can be found on most of the tables, to work out which records are new and then just downloads or updates them. What I want to do is run separate checks on the tables to make sure that nothing has been missed during the download and that everything is in sync. In an ideal world I would like to pass a date range, say a calendar week, into an SSIS package or T-SQL stored procedure, which would then check for any differences between the remote MySQL database tables and the local SQL tables. It does not currently have to do anything but identify issues; correcting them may come later, or changes would need to be made to the existing sync package. Hope this makes more sense."
Thanks P
To do this, you need to implement a Type 1 Slowly Changing Dimension type data flow in SSIS. There are a number of ways to do this, including a built in transformation aptly called the Slowly Changing Dimension transformation. Whilst this is easy to set up, it is a pain to maintain and it runs horrendously slowly.
There are numerous ways to set this up using other transformations or even SQL merge statements which are detailed here: https://bennyaustin.wordpress.com/2010/05/29/alternatives-to-ssis-scd-wizard-component/
I would recommend that you use Lookup transformations, as they perform better than the Slowly Changing Dimension transformation and offer better diagnostics and error handling than the better-performing SQL MERGE statement.
Before you do this you will need to add a CHECKSUM or HASHBYTES column to your SQL data for ease of comparison with the incoming MySQL data.
In short, calculate some sort of repeatable checksum as the data is downloaded into your SQL Server, then use this in an SSIS Lookup, matching on the row key, to check for changes. Where the checksum value is different for the same row it needs updating and where there is no matching row key in your SQL Data you need to insert the new row.
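The comparison logic itself is simple enough to sketch outside SSIS. The Python below is only an illustration of the idea, not what the Lookup transformation does internally; `row_checksum`, `classify_rows`, the use of SHA-256 and the column names in the usage note are all assumptions made for the example.

```python
import hashlib


def row_checksum(row, columns):
    """Build a repeatable checksum from the chosen columns of a row."""
    # A separator that is unlikely to appear in the data keeps ("ab", "c")
    # and ("a", "bc") from hashing to the same value.
    joined = "\x1f".join("" if row[c] is None else str(row[c]) for c in columns)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def classify_rows(source_rows, target_checksums, key, columns):
    """Split source rows into those to insert and those to update.

    target_checksums maps row key -> checksum already stored locally.
    """
    to_insert, to_update = [], []
    for row in source_rows:
        existing = target_checksums.get(row[key])
        if existing is None:
            to_insert.append(row)   # no matching key locally: new row
        elif existing != row_checksum(row, columns):
            to_update.append(row)   # same key, different checksum: changed row
    return to_insert, to_update
```

Here `source_rows` would be the incoming MySQL rows and `target_checksums` the key-to-checksum pairs read from SQL Server. The checksum must be computed the same way on both sides, over the same columns in the same order, or every row will appear changed.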
I am very new to PowerShell and was wondering whether it is possible to accomplish the following. It is based on a requirement to change SQL passwords every 3 months and, at the same time, update all the websites whose web.config files contain connection strings so that they use the new SQL password.
Ideally the new SQL passwords, server names and databases would be stored in a file (SQL table, XML file, CSV, etc.) that the script reads in. The script then changes the SQL passwords based on the content of a table that contains the names of the servers and databases. Immediately after that, the script updates the connection strings in the web.config files of about 20 websites to the new password. I guess I need a table to hold the paths to the web.config files too. It then repeats the process, changing another SQL password and updating another 20 websites.
Nice-to-haves would be creating a backup copy of each web.config before modifying it, and wrapping the script in a transaction in case it encounters an error.
Thanks
I have several tables I'm importing from ODBC using the import script step. Currently, I have an import script for each and every table. This is becoming unwieldy as I now have nearly 200 different tables.
I know I can calculate the SQL statement to say something like "Select * from " & $TableName. However, I can't figure out how to set the target table without specifying it in the script. Please, tell me I'm being dense and there is a good way to do this!
Thanks in advance for your assistance,
Nicole Willson
Integrated Research
Unfortunately, the target table of an import has to be hard coded in FileMaker up through version 12 if you're using the Import Records script step. I can think of a workaround to this, but it's rather convoluted and if you're importing a large number of records, would probably significantly increase the time to import them.
The workaround would be to not use the Import Records script step, but to script the creation of records and the population of data into fields yourself.
First of all, the success of this would depend on how you're using ODBC. As far as I can think, it would only work if you're using ODBC to create shadow tables within FileMaker so that FileMaker can access the ODBC database via other script steps. I'm not an expert with the other ODBC facilities of FileMaker, so I don't know if this workaround would be helpful in other cases.
So, if you have a shadow table into the remote ODBC database, then you can use a script something like the following. The basic idea is to have two sets of layouts, one for the shadow tables that information is coming from and another for the FileMaker tables that the information needs to go to. Loop through this list, pulling information from the shadow table into variables (or something like the dictionary library I wrote which you can find at https://github.com/chivalry/filemaker-dictionary). Then go to the layout linked to the target table, create a record and populate the fields.
This isn't a novice technique, however. In addition to using variables and loops, you're also going to have to use FileMaker's design functions to determine the source and destination of each field and Set Field By Name to put the data in the right place. But as far as I can tell, it's the only way to dynamically target tables for importing data.
I have written a VBScript to extract data from Active Directory into a record set. I'm now wondering what the most efficient way is to transfer the data into a SQL database.
I'm torn between:
Writing it to an Excel file and then firing an SSIS package to import it, or...
Within the VBScript, iterating through the dataset in memory and submitting 3000+ INSERT commands to the SQL database.
Would the latter option result in 3000+ round trips communicating with the database and therefore be the slower of the two options?
Sending an insert row by row is always the slowest option. This is what is known as Row by Agonizing Row or RBAR. You should avoid that if possible and take advantage of set based operations.
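To make the contrast concrete, here is a small sketch of handing the whole batch over in one call and one transaction instead of issuing a separate statement per row. It uses Python's built-in `sqlite3` with an in-memory database purely as a stand-in so that it runs anywhere; it is not SQL Server, there is no network round trip to measure, and the table and column names (`ad_users`, `account`, `display_name`) are made up for the example.

```python
import sqlite3


def make_connection():
    """Create an in-memory database with a hypothetical target table."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE ad_users (account TEXT PRIMARY KEY, display_name TEXT)"
    )
    return connection


def insert_batch(connection, rows):
    """Insert all rows in one call inside a single transaction."""
    with connection:  # commits on success, rolls back on error
        connection.executemany(
            "INSERT INTO ad_users (account, display_name) VALUES (?, ?)", rows
        )
    return connection.execute("SELECT COUNT(*) FROM ad_users").fetchone()[0]
```

Against a real SQL Server the same idea would be expressed with whatever bulk mechanism the client offers; the point is only that the rows travel as one unit rather than as 3000+ individual commands.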
Your other option, writing to an intermediate file, is a good one. I agree with @Remou in the comments that you should probably pick CSV rather than Excel if you choose this option.
I would propose a third option. You already have the design in VB, contained in your VBScript. You should be able to convert this easily to a Script Component in SSIS. Create an SSIS package, add a Data Flow task, add a Script Component (as a data source) to the flow, write your fields out to the output buffer, and then add a SQL destination. This saves you the step of writing to an intermediate file. It is also more secure, as your AD data is never on disk in plaintext during the process.
You don't mention how often this will run or if you have to run it within a certain time window, so it isn't clear that performance is even an issue here. "Slow" doesn't mean anything by itself: a process that runs for 30 minutes can be perfectly acceptable if the time window is one hour.
Just write the simplest, most maintainable code you can to get the job done and go from there. If it runs in an acceptable amount of time then you're done. If it doesn't, then at least you have a clean, functioning solution that you can profile and optimize.
If you already have it in a dataset and you are on SQL Server 2008+, create a user-defined table type and send the whole dataset in as an atomic unit.
And if you go the SSIS route, I have a post covering Active Directory as an SSIS Data Source