I have some codes in SQL developer in which we are pulling data from oracle server and then we are doing too many updation and calculation using sub-queries. In some sub-queries we used join also. Finally we have a single block of codes with multiple sub-queries inside it. Now we need to run this code every month to update data for our new created table, so we want to automated this process and i don't know how to automate these codes in SQL developer
Related
My ssis package has an oledb source which joins oracle and sql server to get source data and loads it into sql server oledb destination. Earlier we were using linked server for this purpose but we cannot use linked server anymore.
So I am taking the data from sql server and want to return it to the in clause of the oracle query which i am keeping as sql command oledb source.
I tried parsing an object type variable from sql server and putting it into the in clause of oracle query in oledb source but i get error that oracle cannot have more than 1000 literals in the in statement. So basically I think I have to do something like this:
select * from oracle.db where id in (select id from sqlserver.db).
Since I cannot use linked server so i was thinking if I could have a temp table which can be used throughout the package.
I tried out another way of using merge join in ssis. but my source data set is really large and the merge join is returning fewer rows than expecetd. I am badly stuck at this point. I have tried a number if things nothung seems to be working.
Can someone please help. Any help will be greatly appreciated.
A couple of options to try.
Lookup:
My first instinct was a Lookup Task, but that might not be a great solution depending on the size of your data sets, since all of the records from both tables have to pulled over the wire and stored in memory on the SSIS server. But if you were able to pull off a Merge Join, then a Lookup should also work, but it might be slow.
Set an OLE DB Source to pull the Oracle data, without the WHERE clause.
Set a Lookup to pull the id column from your SQL Server table.
On the General tab of the Lookup, under Specify how to handle rows with no matching entries, select Redirect rows to no-match output.
The output of the Lookup will just be the Oracle rows that found a matching row in your SQL Server query.
Working Table on the Oracle server
If you have the option of creating a table in the Oracle database, you could create a Data Flow Task to pipe the results of your SQL Server query into a working table on the Oracle box. Then, in a subsequent Data Flow, just construct your Oracle query to use that working table as a filter.
Probably follow that up with an Execute SQL Task to truncate that working table.
Although this requires write access to Oracle, it has the advantage of off-loading the heavy lifting of the query to the database machine, and only pulling the rows you care about over the wire.
I am new to SSIS and am after some assistance in creating an SSIS package to do a specific task. My data is stored remotely within a MySQL Database and this is downloaded to a SQL Server 2014 Database. What I want to do is the following, create a package where I can enter 2 dates that can be compared against the create date/date modified per record on a number of tables to give me a snap shot and compare the MySQL Data to the SQL Data so that I can see if there are any rows that are missing from my local SQL Database or if any need to be updated. Some tables have no dates so I just want to see a record count on what is missing if anything between the 2. If this is better achieved through TSQL I am happy to hear about other suggestions or sites to look at where things have been done similar.
In relation to your query Tab :
"Hi Tab, What happens at the moment is our master data is stored in a MySQL Database, the data was then downloaded to a SQL Server Database as a one off. What happens at the moment is I have a SSIS package that uses the MAX ID which can be found on most of the tables to work out which records are new and just downloads them or updates them. What I want to do is run separate checks on the tables to make sure that during the download nothing has been missed and everything is within sync. In an ideal world I would like to pass in to a SSIS package or tsql stored procedure a date range, shall we say calender week, this would then check for any differences between the remote MySQL database tables and the local SQL tables. It does not currently have to do anything but identify issues, correcting them may come later or changes would need to be made to the existing sync package. Hope his makes more sense."
Thanks P
To do this, you need to implement a Type 1 Slowly Changing Dimension type data flow in SSIS. There are a number of ways to do this, including a built in transformation aptly called the Slowly Changing Dimension transformation. Whilst this is easy to set up, it is a pain to maintain and it runs horrendously slowly.
There are numerous ways to set this up using other transformations or even SQL merge statements which are detailed here: https://bennyaustin.wordpress.com/2010/05/29/alternatives-to-ssis-scd-wizard-component/
I would recommend that you use Lookup transformations as they perform better than the Slowly Changing Dimension transformation but offer better diagnostics and error handling than the better performing SQL merge statement.
Before you do this you will need to add a Checksum or Hashbytes column to your SQL data for ease of comparison with the incoming MySQL data.
In short, calculate some sort of repeatable checksum as the data is downloaded into your SQL Server, then use this in an SSIS Lookup, matching on the row key, to check for changes. Where the checksum value is different for the same row it needs updating and where there is no matching row key in your SQL Data you need to insert the new row.
I use INavigor system for ad-hoc data extraction from the DB2 database. Only issue is that when it comes to automation. Is there a way I could automate the SQL code to be run on a specific time? I know there is Advance Job Sheduler but I'm not sure how the SQL can be added to the Sheduler. Any one who can help?
IBM added a Run SQL Statements (RUNSQL) CL command at v7.1.
Prior to that, you could store SQL statements in source files and run them with the Run SQL Statements (RUNSQLSTM) command.
Neither of the above allow an SQL Select to be run by itself. For data extraction, you'd want INSERT INTO tbl (SELECT <...> FROM <...>)
For reporting SELECTs, your best bet is to create a Query Manager query (*QMQRY object) and form (*QMFORM object) via Start DB2 UDB Query Manager (STRQM); which can then be run by the Start Query Management Query (STRQMQRY) command. Query Manager (QM) is SQL based, unlike the older Query/400 product. QM manual is here
One last option, is the db2 utility available via QShell.
Don't waste effort creating a day late going out of business because the job scheduler hasn't updated the file system.
Real businesses need real time data.
Just make a SQL view on the iseries that pulls the info you need together.
Query the view externally in real time. Even if you need last 30 days or last month or year to date. These are all simple views to create.
I am trying to run RAWSQL_REAL("select sum(amount_us)from gbsa_dpo_itg.Fact_tblHistoryData_new where qtr_data='Q42014'") in calculated field and I am getting error message ERROR 2133: Aggregate function calls cannot contain subqueries.
I am using tableau 8.3.3 and HP Vertica database live connection to tableau
When I run the same query in custom sql it is working fine
pleas help in this
thanks in advance
Read the manual about these functions, look under reference, functions
You don't generally pass an entire SQL string to execute in isolation. Instead, they are useful for writing expressions or calling non standard functions that your server may provide, which are embedded within the SQL that Tableau generates. So first learn to use Tableau to get the effect you want, and then resort to Raw SQL functions in the rare case where you need to access some database server specific feature.
There is no reason that you would need Raw SQL to get the information above using Tableau. You could put amount_us on the row shelf and qtr_data on the filter shelf, and Tableau would generate a similar query.
If you are doing this to combine data from multiple queries, first learn about calculated fields and data blending.
I am new to Tableau, and having performance issues and need some help. I have a query that joins several large tables. I am using a live data connection to a MySQL db.
The issue I am having is that it is not applying the filter criteria before asking MySQL for the data. So it is essentially doing a SELECT * from my query and not applying the filter criteria to the where clause. It pulls all the data from MySQL db back to Tableau, then throws away the un-needed data based on my filter criteria. My two main filter criteria are on account_id and a date range.
I can cleanly get a list of the accounts from just doing a select from my account table to populate the filter list, then need to know how to apply that selection when it goes to pull the data from the main data query from MySQL.
To apply a filter at the data source first, try using context filters.
Performance can also be improved by using extracts.
I would personally use an extract, go into your MySQL DB Back-end, run the query, and a CREATE TABLE extract1 AS statement, or whatever you want to call your data table.
When you import this table into Tableau it will already have a SELECT * of your aggregate data in the workbook. From here your query efficiency will be increased ten fold.
Unfortunately, it's going to take awhile for Tableau processing time + mySQL backend DB query time = Ntime to process your data.
Try the extracts...
I've been struggling with the very same thing. I have found that the tableau extracts aren't any faster than pulling directly from a SQL table. What I have done is within SQL created tables that already have the filtered data in them, so the Select * will have only the needed data. The downside to this is it takes up more space on the server, but this isn't a problem on my side.
For the Large Data sets Tableau recommend using an Extract.
An extract will create a snapshot of the data that you are connected with and processing on this data will be faster than a live connection.
All the charts and visualization will load faster and saves your time, each time when you go to the Dashboard.
For the filters that you are using to filter the data-set will work faster in an extract connection. But to get the latest data you have to refresh the extract or schedule a refresh in the server ( if you are uploading the report to server).
There are multiple type of filters available in Tableau, the use of which depends on your application, context filters and global filters can be use to filter the whole set of data.