Passing multiple variables with NVARCHAR(MAX) in SSIS - tsql

The starting point is a table in which product groups are defined. These are later to be used to limit the sales data to be loaded for certain products.
ProductGroup
------------
1
2
The table that is to be restricted during loading is on a different server and does not know these product groups; it works with a unique ProductNumberID instead. Identifying the relevant ProductNumberIDs is therefore a multi-step process: with the ProductGroups I get ProductGroupIDs from the table ProductGroups, with the ProductGroupIDs I get ProductIDs from the table Products, and with the ProductIDs I finally get ProductNumberIDs from the table ProductNumbers. Using STRING_AGG, I concatenate the rows into a single field, write the result into a variable and pass it into the next Execute SQL Task. Unfortunately, at some point in this cascade I exceed the maximum length allowed for VARCHAR/NVARCHAR, and NVARCHAR(MAX) is unfortunately not accepted for the variables. I would need a series of statements like this:
SELECT
STRING_AGG(CONVERT(NVARCHAR(MAX),ProductGroupID), ',') AS ProductGroupID
FROM ProductGroups
WHERE ProductGroup IN (?)
This would then be stored in a variable and passed to the next Execute SQL Task:
SELECT
STRING_AGG(CONVERT(NVARCHAR(MAX),ProductID), ',') AS ProductID
FROM Products
WHERE ProductGroupID IN (?)
And so on.
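For completeness, the last step of the cascade (taking the table and column names from the description above) would presumably look like this:
SELECT
STRING_AGG(CONVERT(NVARCHAR(MAX),ProductNumberID), ',') AS ProductNumberID
FROM ProductNumbers
WHERE ProductID IN (?)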
I am at a loss so any help is much appreciated.

Related

How to Link 2 Sheets that have the same fields

I am looking for some help with linking 2 sheets that share a number of filters I have set up, but whose data sits in separate tables. The reason for keeping them separate is that I have a number of aggregated columns that are different for the 2 tables, and I will be building more sheets as I go along.
The filters that are the same within the 2 sheets are the following:
we_date
product
manager
patch
Through the data manager I managed to create an association between the 2 tables for we_date, but from reading on this site and other searches on Google I can't create any further associations between these tables, and this is where I am stuck.
The 2 sheets now allow me to filter using we_date, but if I use the filters for product, manager or patch, nothing happens on my 2nd sheet because they are not linked.
Currently in my data load editor I have 2 sections of select queries like the following:
// Table1
QUALIFY *;
w:
SELECT *
FROM table1;
UNQUALIFY *;

// Table2
QUALIFY *;
w_c:
SELECT *
FROM table2;
UNQUALIFY *;
I would really appreciate if somebody could advise a fix on the issue I am having.
In Qlik, fields with identical names in different tables are automatically associated.
When you call QUALIFY *, you're actually renaming all fields and explicitly telling Qlik NOT to associate them.
Take a look at the Qlik Sense documentation on Qualify *:
The automatic join between fields with the same name in different tables can be suspended by means of the qualify statement, which qualifies the field name with its table name. If qualified, the field name(s) will be renamed when found in a table. The new name will be in the form of tablename.fieldname. Tablename is equivalent to the label of the current table, or, if no label exists, to the name appearing after from in LOAD and SELECT statements.
We can use as to manually reassign field names.
SELECT customer_id, private_info as "private_info_1", favorite_dog from table1;
SELECT customer_id, private_info as "private_info_2", car from table2;
Or, we can correctly use Qualify. Example:
table1 and table2 have a customer_id field and a private_info field. We want the customer_id field to be the associative value, and private_info not to be. We would use QUALIFY on private_info, which Qlik would then rename based on table name.
QUALIFY private_info;
SELECT * from table1;
SELECT * from table2;
The resulting field names would then be: customer_id (associated), table1.private_info, and table2.private_info.
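Applied to the load script from the question, a minimal sketch (the field names are taken from the question; verify they match the actual source tables) could be to qualify everything and then un-qualify only the shared filter fields so that only those associate:
QUALIFY *;
UNQUALIFY we_date, product, manager, patch;

w:
SELECT *
FROM table1;

w_c:
SELECT *
FROM table2;

UNQUALIFY *;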

Most efficient way to DECODE multiple columns -- DB2

I am fairly new to DB2 (and SQL in general) and I am having trouble finding an efficient method to DECODE columns.
Currently, the database has a number of tables, most of which store a significant number of their columns as numeric codes; these numbers correspond to a lookup table that holds the real values. We are talking about 9,500 different values (e.g. '502 = yes' or '1413 = Graduate Student').
Normally I would just add a WHERE clause and show where they are equal, but since there are 20-30 columns that need to be decoded per table, I can't really do this (that I know of).
Is there a way to effectively just display the corresponding value from the other table?
Example:
SELECT TEST_ID, DECODE(TEST_STATUS, 5111, 'Approved', 5112, 'In Progress') TEST_STATUS
FROM TEST_TABLE
The above works fine, but I have to manually look up the numbers and review them to build the statements. As I mentioned, some tables have 20-30 columns that would need this, AND some need DECODE statements that would have 12-15 conditions.
Is there anything that would allow me to do something simpler like:
SELECT TEST_ID, DECODE(TEST_STATUS = *TableWithCodeValues*) TEST_STATUS
FROM TEST_TABLE
EDIT: Also, to be more clear, I know I can do a ton of INNER JOINS, but I wasn't sure if there was a more efficient way than that.
From a logical point of view, I would consider splitting the lookup table into several domain/dimension tables. Not sure if that is possible to do for you, so I'll leave that part.
As mentioned in my comment I would stay away from using DECODE as described in your post. I would start by doing it as usual joins:
SELECT a.TEST_STATUS
, b.TEST_STATUS_DESCRIPTION
, a.ANOTHER_STATUS
, c.ANOTHER_STATUS_DESCRIPTION
, ...
FROM TEST_TABLE as a
JOIN TEST_STATUS_TABLE as b
ON a.TEST_STATUS = b.TEST_STATUS
JOIN ANOTHER_STATUS_TABLE as c
ON a.ANOTHER_STATUS = c.ANOTHER_STATUS
JOIN ...
If things are too slow there are a couple of things you can try:
Create a statistical view that can help determine cardinalities from the joins (may help the optimizer create a better plan):
https://www.ibm.com/support/knowledgecenter/sl/SSEPGG_9.7.0/com.ibm.db2.luw.admin.perf.doc/doc/c0021713.html
If your license admits it, you can experiment with Materialized Query Tables (MQT). Note that there is a penalty for modifications of the base tables, so if you have more of an OLTP workload, this is probably not a good idea:
https://www.ibm.com/developerworks/data/library/techarticle/dm-0509melnyk/index.html
A third option, if your lookup table is fairly static, is to cache the lookup table in the application: read the TEST_TABLE from the database and look up descriptions in the application. A further improvement may be to add triggers that invalidate the cache when the lookup table is modified.
If you don't want to do all these joins, you could create your own LOOKUP function.
create or replace function lookup(IN_ID INTEGER)
returns varchar(32)
deterministic reads sql data
begin atomic
declare OUT_TEXT varchar(32);--
set OUT_TEXT=(select text from test.lookup where id=IN_ID);--
return OUT_TEXT;--
end;
With a table TEST.LOOKUP like
create table test.lookup(id integer, text varchar(32))
containing some id/text pairs, this will return the text value corresponding to an id, or NULL if it is not found.
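Used against the example from the question, a query would then look something like this (a sketch only, reusing the column names mentioned above):
SELECT TEST_ID, lookup(TEST_STATUS) AS TEST_STATUS
FROM TEST_TABLE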
With your mentioned 10k id/text pairs and an index on the ID field, this shouldn't be a performance issue, as that amount of data should easily be cached in the corresponding bufferpool.

Improve dynamic SQL query performance or filter records another way

Preliminaries:
Our application can read data from an attached client SQL Server 2005 or 2008 database but must make no changes to it, apart from using temp tables. We can create tables in our own database on their server.
The solution must work in SQL Server 2005.
The Schema:
Here is a simplified idea of the schema.
Group - Defines characteristics of a group of locations
Location - Defines characteristics of one geographic location. It links to the Group table.
GroupCondition - Links to a Group. It defines measures that apply to a subset of locations belonging to that group.
GroupConditionCriteria - Links to GroupCondition table. It names attributes, values, relational operators and boolean operators for a single phrase in a where clause. The named attributes are all fields of the Location table. There is a sequence number. Multiple rows in the GroupConditionCriteria must be strung together in proper sequence to form a full filter condition. This filter condition is implicitly restricted to those Locations that are part of the group associated with the GroupCondition. Location records that satisfy the filter criteria are "Included" and those that do not are "Excluded".
The Goal:
Many of our existing queries get attributes from the location table. We would like to join to something (table, temp table, query, CTE, openquery, UDF, etc.) that will give us the GroupCondition information for those Locations that are "Included". (A location could be included in more than one rule, but that is a separate issue.)
The schema for what I want is:
CREATE TABLE #LocationConditions
(
[PolicyID] int NOT NULL,
[LocID] int NOT NULL,
[CONDITIONID] int NOT NULL,
[Satisfies Condition] bit NOT NULL,
[Included] smallint NOT NULL
)
PolicyID identifies the group, LocID identifies the Location, CONDITIONID identifies the GroupCondition, [Satisfies Condition] is 1 if the filter includes the location record. (Included is derived from a different rule table with forced overrides of the filter condition. Not important for this discussion.)
Size of Problem:
My best effort so far can create such a table, but it is slow. For the current database I am testing, there are 50,000 locations affected (either included or excluded) by potentially matching rules (GroupConditions). The execution time is 4 minutes. If we do a periodic refresh and use a permanent table, this could be workable, but I am hoping for something faster.
What I tried:
I used a series of CTEs, one of which is recursive, to concatenate the several parts of the filter condition into one large filter condition. As an example of such a condition:
(STATECODE = 'TX' AND COUNTY = 'Harris County') OR STATECODE = 'FL'
There can be from one to five fields mentioned in the filter condition, and any number of parentheses used to group them. The operators that are supported are <, <=, >, >=, =, <>, AND and OR.
Once I have the condition, it is still a text string, so I create an insert statement (that will have to be executed dynamically):
insert into #LocationConditions
SELECT
1896,
390063,
38,
case when (STATECODE = 'TX' AND COUNTY = 'Harris County') OR STATECODE = 'FL' then 1
else 0
end,
1
FROM Location loc
WHERE loc.LocID = 390063
I first add the insert statements to their own temp table, called #InsertStatements, then loop through them with a cursor. I execute each insert using EXEC.
CREATE TABLE #InsertStatements
(
[Insert Statement] nvarchar(4000) NOT NULL
)
-- Skipping over Lots of complicated CTE's to add to #InsertStatements
DECLARE @InsertCmd nvarchar(4000)
DECLARE InsertCursor CURSOR FAST_FORWARD
FOR
SELECT [Insert Statement]
FROM #InsertStatements
OPEN InsertCursor
FETCH NEXT FROM InsertCursor
INTO @InsertCmd
WHILE @@FETCH_STATUS = 0
BEGIN
--PRINT @InsertCmd
EXEC(@InsertCmd)
FETCH NEXT FROM InsertCursor
INTO @InsertCmd
END
CLOSE InsertCursor
DEALLOCATE InsertCursor
SELECT *
FROM #LocationConditions
ORDER BY PolicyID, LocID
As you can imagine, executing 50,000 dynamic SQL inserts is slow. How can I speed this up?
Do you have to insert each row individually? Can't you use something like this:
insert into #LocationConditions
SELECT
PolicyID,
LocID,
CONDITIONID,
case when (STATECODE = 'TX' AND COUNTY = 'Harris County') OR STATECODE = 'FL' then 1
else 0
end,
Included
FROM Location loc
You didn't show how you were creating your insert statements, so I can't tell if the condition is dependent on each row or not.
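If the filter text really only varies per GroupCondition (not per location), a hedged middle ground between the two approaches is one dynamic INSERT ... SELECT per condition instead of one per location. This is only a sketch: #ConditionFilters, its FilterText column, and the loc.PolicyID link back to the group are hypothetical names standing in for whatever the existing CTEs produce.
-- Build one INSERT ... SELECT that evaluates the condition for every
-- location in the group, instead of one insert per location.
-- #ConditionFilters(PolicyID, CONDITIONID, FilterText) and loc.PolicyID
-- are hypothetical; substitute the real output of the CTEs.
DECLARE @sql nvarchar(4000)

SELECT @sql = N'insert into #LocationConditions
SELECT ' + CAST(PolicyID AS nvarchar(10)) + N', loc.LocID, '
         + CAST(CONDITIONID AS nvarchar(10)) + N',
case when ' + FilterText + N' then 1 else 0 end,
1
FROM Location loc
WHERE loc.PolicyID = ' + CAST(PolicyID AS nvarchar(10))
FROM #ConditionFilters
WHERE CONDITIONID = 38   -- example condition id from the question

EXEC(@sql)
Looping over #ConditionFilters this way would cut the number of dynamic statements from one per location to one per GroupCondition.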

define a computed column reference another table

I have two database tables, Team (ID, NAME, CITY, BOSS, TOTALPLAYER) and
Player (ID, NAME, TEAMID, AGE). The relationship between the two tables is one-to-many: one team can have many players.
I want to know: is there a way to define the TOTALPLAYER column in the Team table as a computed column?
For example, if there are 10 players whose TEAMID is 1, then the row in the Team table whose ID is 1 should have a TOTALPLAYER value of 10. If I add a player, the TOTALPLAYER value goes up to 11; I don't want to assign the value explicitly, I want it generated by the database. Does anyone know how to realize this?
Thx in advance.
BTW, the database is SQL Server 2008 R2
Yes, you can do that - you need a function to count the players for the team, and use that in the computed column:
CREATE FUNCTION dbo.CountPlayers (@TeamID INT)
RETURNS INT
AS BEGIN
DECLARE @PlayerCount INT
SELECT @PlayerCount = COUNT(*) FROM dbo.Player WHERE TeamID = @TeamID
RETURN @PlayerCount
END
and then define your computed column:
ALTER TABLE dbo.Team
ADD TotalPlayers AS dbo.CountPlayers(ID)
Now whenever you select, that function is called for each team being selected. The value is not persisted in the Team table - it's calculated on the fly each time you select from the Team table.
Since its value isn't persisted, the question really is: does it need to be a computed column on the table, or could you just use the stored function to compute the number of players when needed?
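If the function-only route is enough, the query side might look something like this sketch (reusing the names defined above):
SELECT t.ID, t.NAME, t.CITY, t.BOSS, dbo.CountPlayers(t.ID) AS TotalPlayers
FROM dbo.Team t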
You don't have to store the total in the table -- it can be computed when you do a query, something like:
SELECT t.ID, t.NAME, t.CITY, t.BOSS, COUNT(p.ID) AS num_players
FROM Team t LEFT JOIN Player p ON t.ID = p.TEAMID
GROUP BY t.ID, t.NAME, t.CITY, t.BOSS;
This will create an additional column "num_players" in the query, which will be a count of the number of players on each team, if any.

Create a query to select two columns; (Company, No. of Films) from the database

I have created a database as part of university assignment and I have hit a snag with the question in the title.
Most likely I am being asked to find out how many films each company has made, which suggests to me a GROUP BY query. But I have no idea where to begin. It is only a two-mark question but the syntax is not clicking in my head.
My schema is:
CREATE TABLE Movie
(movieID CHAR(3) ,
title CHAR(36),
year NUMBER,
company CHAR(50),
totalNoms NUMBER,
awardsWon NUMBER,
DVDPrice NUMBER(5,2),
discountPrice NUMBER(5,2))
There are other tables but at first glance I don't think they are relevant to this question.
I am using sqlplus10
The answer you need comes from three basic SQL concepts, I'll step through them with you. If you need more assistance to create an answer from these hints, let me know and I can try to keep guiding you.
Group By
As you mentioned, SQL offers a GROUP BY clause that can help you.
A SQL Query utilizing GROUP BY would look like the following.
SELECT list, fields, aggregate(value)
FROM tablename
--WHERE goes here, if you need to restrict your result set
GROUP BY list, fields
A GROUP BY query can only return fields listed in the GROUP BY clause, or aggregate functions acting on each group.
Aggregate Functions
Your homework question also needs an Aggregate function called Count. This is used to count the results returned. A simple query like the following returns the count of all records returned.
SELECT Count(*)
FROM tablename
The two can be combined, allowing you to get the Count of each group in the following way.
SELECT list, fields, count(*)
FROM tablename
GROUP BY list, fields
Column Aliases
Another answer also tried to introduce you to SQL column aliases, but they did not use SQLPLUS syntax.
SELECT Count(*) as count
...
SQLPLUS column alias syntax is shown below.
SELECT Count(*) "count"
...
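Putting the three pieces together against the Movie table from the question, the result might look something like this (the column label is just an example):
SELECT company, COUNT(*) "No. of Films"
FROM Movie
GROUP BY company;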
I'm not going to provide you the SQL, but instead a way to think about it.
What you want to do is select where the company matches and count the total rows returned. That count is the number of films made by the specified company.
Hope that points you in the right direction.
Select company, count(*) AS count
from Movie
group by company
select * group by company won't work in Oracle.