Initially I thought of this approach: create a full web app that would allow uploading the files into the Data Store, then Data Flow would do the ETL by reading the files from the Data Store and putting the transformed data into Cloud SQL, and then a separate section would allow passing a query and writing the results to a CSV file in the Data Store.
However, I want to keep it as simple as possible, so my idea is to create a Perl script in the cloud that does the ETL, plus another script that takes a SQL query as an argument and outputs the results as a CSV file into the Data Store. This second script would be invoked remotely. The idea is to run it without having to install the whole stack on each client (Google SQL proxy, etc.): you would just execute a local script with the arguments to be passed to the remote script.
Can this be done? If so, how? And beyond that, does this approach make sense?
Thanks!!
Every time a .csv file appears in blob storage, I have to create the DDL for it manually on Azure SQL. The data type is based on the values in each field.
The file has 400 columns, and doing this manually takes a lot of time.
Could someone please suggest how to automate this using a stored procedure or script, so that when we execute it, it creates the TABLE (or the DDL script) based on the file in blob storage?
I am not sure if this is possible, or whether there is a better way to handle such a scenario.
I would appreciate your valuable suggestions.
Many Thanks
This can be achieved in multiple ways. Since you mentioned automating it, you can use an Azure Function.
First, create a function that reads the CSV file from blob storage:
Read a CSV Blob file in Azure
Then add the code to generate the DDL statement:
Uploading and Importing CSV File to SQL Server
The Azure Function can be scheduled, or triggered whenever new files are added to blob storage.
If this is a once-a-day kind of requirement and can also be done manually, you can download the file from blob storage and use the 'Import Flat File' functionality in SSMS: just point it at the CSV file and it creates the schema based on the existing column values.
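Another option, if you would rather stay inside Azure SQL itself (the 'SP or script' route the question mentions), is to read the file header with OPENROWSET and build the CREATE TABLE from it. This is only a rough sketch: the external data source, file path and table name are placeholders, it assumes a plain comma-separated header with no quoted or embedded commas, and it makes every column NVARCHAR(MAX) rather than inferring types from the values.

    -- Rough sketch: build a CREATE TABLE statement from the CSV header row.
    -- Assumes a database scoped credential and an external data source of
    -- TYPE = BLOB_STORAGE already exist; names and paths below are placeholders.

    DECLARE @raw NVARCHAR(MAX), @header NVARCHAR(MAX), @ddl NVARCHAR(MAX);

    -- Pull the file from blob storage as a single value.
    SELECT @raw = BulkColumn
    FROM OPENROWSET(
            BULK 'incoming/myfile.csv',        -- path within the container (placeholder)
            DATA_SOURCE = 'MyBlobStorage',     -- external data source name (placeholder)
            SINGLE_CLOB) AS f;

    -- Keep only the first (header) line.
    SET @header = LEFT(@raw, CHARINDEX(CHAR(10), @raw + CHAR(10)) - 1);
    SET @header = REPLACE(@header, CHAR(13), N'');

    -- Turn "col1,col2,col3" into "[col1] NVARCHAR(MAX), [col2] NVARCHAR(MAX), [col3] NVARCHAR(MAX)".
    SET @ddl = N'CREATE TABLE dbo.Staging_MyFile (['
             + REPLACE(@header, N',', N'] NVARCHAR(MAX), [')
             + N'] NVARCHAR(MAX));';

    SELECT @ddl AS GeneratedDdl;     -- review the generated DDL first
    -- EXEC sys.sp_executesql @ddl;  -- then execute it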
I am tasked with performing some calculations on a CSV file in SSIS using T-SQL. Once I am done I need to save the project as an ISPAC file so that users outside my business can run the package too; however, their SQL connection managers will obviously differ from mine. To that end, is it possible to use T-SQL to modify a CSV inside SSIS without actually using a connection manager to a SQL database? If so, which tasks would I use to achieve this? Many thanks.
EDIT: To clarify what I need to achieve: basically I am tasked with adding extra data to a CSV file, for example concatenating forename and surname, and creating new columns such as month numbers and names from a date string, then outputting the result to a new CSV location. I am only allowed to use T-SQL and SSIS to achieve this. Once complete, I need to send the SSIS project file to another organisation to use, therefore I cannot hardcode any of my connections or assume they hold any of the databases I will use.
My initial question was whether all this can be done inside SSIS using T-SQL, but I think I have the answer to that now. I plan to parameterise the connection string, source file location, and output file, and use the master and temp databases to run the SQL that adds the extra data, so they will need SQL Server on their end but the package won't make any assumptions about which databases they have.
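Roughly, the kind of T-SQL I have in mind for the derived columns is sketched below; the column names and date style are just illustrative, since the real CSV layout will be mapped in the data flow.

    -- Rough sketch of the T-SQL the package would run (in tempdb) to add the
    -- derived columns. Forename, Surname and OrderDate are invented names.

    CREATE TABLE #src (
        Forename  NVARCHAR(100),
        Surname   NVARCHAR(100),
        OrderDate NVARCHAR(20)        -- the date arrives as text from the CSV
    );

    -- ... the package populates #src here (e.g. a data flow or BULK INSERT) ...

    SELECT
        Forename,
        Surname,
        Forename + N' ' + Surname                             AS FullName,
        MONTH(TRY_CONVERT(date, OrderDate, 103))              AS MonthNumber,  -- 103 = dd/mm/yyyy; adjust to the real format
        DATENAME(MONTH, TRY_CONVERT(date, OrderDate, 103))    AS MonthName
    FROM #src;

    DROP TABLE #src;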
I have some Perl scripts on a Unix-based server that access a common text file containing server IPs and login credentials, which are used to log in and perform routine operations on those servers. Currently, these scripts are run manually at different times.
I would like to know: if I cron these scripts to execute at the same time, will it cause any issues with accessing data from the text file (file locking?), since all the scripts will essentially be accessing the data file at the same time?
Also, is there a better way to do it (without using a DB, since I can't due to some server restrictions)?
It depends on the kind of access.
There is no problem with reading the data file from multiple processes. If you want to update the data file while it might be being read, it's better to do it atomically (e.g. write a new version under a different name, then rename it over the original).
Similar to SQL script to create insert script, I need to generate a list of INSERT statements from a table to load into another database (SQLite), like the dump command. I do this for sync purposes.
I have limitations because this will run on a cloud server without access to the filesystem, so I need to do this in the DB (I can do it in the app server; I'm asking whether it is possible to do it in the DB directly).
In the app server, I load a DataTable, walk its field names and data types, and build the INSERTs... I wonder if there is a way to do the same in the DB...
I am not entirely sure whether this helps, but you could use a simple ETL tool like Pentaho Kettle. I used it once for a similar task and it did not take me more than 10 minutes. You can also schedule the jobs. I am not sure whether it is supported at the database level, though.
Thanks,
Shankar
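If it has to happen in the DB directly, and assuming the source database is SQL Server (the linked question suggests so), one way is to walk the column metadata and build a SELECT whose output is one INSERT statement per row. A rough sketch with placeholder table names: it needs SQL Server 2017+ for STRING_AGG, it emits every value as a quoted string (SQLite's type affinity usually copes with that), and QUOTENAME returns NULL for values longer than 128 characters, so long text columns would need different handling.

    -- Build a query that emits one INSERT statement per row of the source table.
    DECLARE @source SYSNAME = N'dbo.MyTable';   -- table to export (placeholder)
    DECLARE @target SYSNAME = N'MyTable';       -- table name used in the generated INSERTs
    DECLARE @cols NVARCHAR(MAX), @vals NVARCHAR(MAX), @sql NVARCHAR(MAX);

    SELECT
        @cols = STRING_AGG(CONVERT(NVARCHAR(MAX), QUOTENAME(c.name)), N', ')
                    WITHIN GROUP (ORDER BY c.column_id),
        @vals = STRING_AGG(CONVERT(NVARCHAR(MAX),
                    N'ISNULL(QUOTENAME(CONVERT(NVARCHAR(4000), ' + QUOTENAME(c.name)
                    + N'), CHAR(39)), ''NULL'')'), N' + N'', '' + ')
                    WITHIN GROUP (ORDER BY c.column_id)
    FROM sys.columns AS c
    WHERE c.object_id = OBJECT_ID(@source);

    SET @sql = N'SELECT N''INSERT INTO ' + @target + N' (' + @cols + N') VALUES ('' + '
             + @vals
             + N' + N'');'' FROM ' + @source + N';';

    EXEC sys.sp_executesql @sql;   -- each row of the result set is one INSERT statement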
I have written a VBScript to extract data from Active Directory into a record set. I'm now wondering what the most efficient way is to transfer the data into a SQL database.
I'm torn between:
Writing it to an Excel file and then firing an SSIS package to import it, or...
Within the VBScript, iterating through the dataset in memory and submitting 3000+ INSERT commands to the SQL database
Would the latter option result in 3000+ round trips communicating with the database and therefore be the slower of the two options?
Sending inserts row by row is always the slowest option. This is what is known as Row By Agonizing Row, or RBAR. You should avoid it if possible and take advantage of set-based operations.
Your other idea, writing to an intermediate file, is a good option. I agree with @Remou in the comments that you should probably pick CSV rather than Excel if you go that route.
I would propose a third option. You already have the design in VB contained in your VBScript, and you should be able to convert it easily to a Script Component in SSIS. Create an SSIS package, add a Data Flow task, add a Script Component (as a data source {example here}) to the flow, write your fields out to the output buffer, and then add a SQL destination, saving yourself the step of writing to an intermediate file. This is also more secure, as your AD data is never on disk in plaintext during the process.
You don't mention how often this will run or if you have to run it within a certain time window, so it isn't clear that performance is even an issue here. "Slow" doesn't mean anything by itself: a process that runs for 30 minutes can be perfectly acceptable if the time window is one hour.
Just write the simplest, most maintainable code you can to get the job done and go from there. If it runs in an acceptable amount of time then you're done. If it doesn't, then at least you have a clean, functioning solution that you can profile and optimize.
If you already have the data in a dataset and you're on SQL Server 2008+, create a user-defined table type and send the whole dataset in as an atomic unit.
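A minimal sketch of what that looks like on the SQL Server side (the type, procedure, table and column names are invented here; match them to the AD attributes being extracted):

    -- Table-valued parameter route (SQL Server 2008+).
    CREATE TYPE dbo.AdUserTableType AS TABLE
    (
        SamAccountName NVARCHAR(256),
        DisplayName    NVARCHAR(256),
        Mail           NVARCHAR(256)
    );
    GO

    CREATE PROCEDURE dbo.ImportAdUsers
        @Users dbo.AdUserTableType READONLY   -- the whole extract arrives as one parameter
    AS
    BEGIN
        SET NOCOUNT ON;

        -- One set-based insert instead of 3000+ individual round trips.
        INSERT INTO dbo.AdUser (SamAccountName, DisplayName, Mail)
        SELECT SamAccountName, DisplayName, Mail
        FROM @Users;
    END
    GO

On the client side this typically means using ADO.NET (passing a DataTable as a parameter of SqlDbType.Structured); as far as I know classic ADO does not expose table-valued parameters, so the VBScript piece would need to hand its recordset over to something that does.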
And if you go the SSIS route, I have a post covering Active Directory as an SSIS Data Source