Define a computed column that references another table - SQL Server 2008 R2

I have two database tables, Team (ID, NAME, CITY, BOSS, TOTALPLAYER) and
Player (ID, NAME, TEAMID, AGE). The relationship between the two tables is one-to-many: one team can have many players.
I want to know whether there is a way to define the TOTALPLAYER column in the Team table as a computed column.
For example, if there are 10 players whose TEAMID is 1, then the row in the Team table whose ID is 1 should have 10 in its TOTALPLAYER column. If I add a player, the TOTALPLAYER value goes up to 11. I don't want to assign the value explicitly; I'd like the database to generate it. Does anyone know how to achieve this?
Thanks in advance.
BTW, the database is SQL Server 2008 R2.

Yes, you can do that - you need a function to count the players for the team, and use that in the computed column:
CREATE FUNCTION dbo.CountPlayers (@TeamID INT)
RETURNS INT
AS BEGIN
    DECLARE @PlayerCount INT
    SELECT @PlayerCount = COUNT(*) FROM dbo.Player WHERE TeamID = @TeamID
    RETURN @PlayerCount
END
and then define your computed column:
ALTER TABLE dbo.Team
ADD TotalPlayers AS dbo.CountPlayers(ID)
Now when you select from the table, that function is called every time, for each team being selected. The value is not persisted in the Team table - it's calculated on the fly each time you select from the Team table.
Since its value isn't persisted, the real question is: does it need to be a computed column on the table at all, or could you just use the stored function to compute the number of players when needed?
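For example, a minimal sketch of using the function directly instead of the computed column (based on the definitions above):
-- count the players of team 1 without touching the computed column
SELECT dbo.CountPlayers(1) AS TotalPlayers;
-- or per team, alongside the other Team columns
SELECT ID, NAME, dbo.CountPlayers(ID) AS TotalPlayers
FROM dbo.Team;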

You don't have to store the total in the table -- it can be computed when you do a query, something like:
SELECT t.ID, t.NAME, COUNT(p.ID) AS num_players
FROM dbo.Team t
LEFT JOIN dbo.Player p ON t.ID = p.TEAMID
GROUP BY t.ID, t.NAME;
This adds a num_players column to the query result, holding the count of players on each team (zero if the team has none).

Related

Passing multiple variables with NVARCHAR(MAX) in SSIS

The starting point is a table in which product groups are defined. These are later to be used to limit the sales data to be loaded for certain products.
ProductGroup
1
2
The table that is to be restricted during loading is on a different server and does not know these product groups; it works with a unique ProductNumberID instead. Identifying these requires a multi-step process: with the ProductGroups I get ProductGroupIDs from the table ProductGroups, with the ProductGroupIDs I get ProductIDs from the table Products, and with the ProductIDs I finally get ProductNumberIDs from the table ProductNumbers. Using STRING_AGG, I concatenate the rows into a single field, write the result into a variable, and pass it into the next script of an Execute SQL task. Unfortunately, at some point in this cascade I exceed the maximum length allowed for VARCHAR/NVARCHAR, and NVARCHAR(MAX) is unfortunately not accepted for the variables. I would need a series of statements like this:
SELECT
STRING_AGG(CONVERT(NVARCHAR(MAX),ProductGroupID), ',') AS ProductGroupID
FROM ProductGroups
WHERE ProductGroup IN (?)
This would then be stored in a variable and passed to the next script task:
SELECT
STRING_AGG(CONVERT(NVARCHAR(MAX),ProductID), ',') AS ProductID
FROM Products
WHERE ProductGroupID IN (?)
And so on.
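Following the same pattern, the final step in the cascade would look something like this (table and column names are taken from the description above, so treat them as assumptions):
SELECT
STRING_AGG(CONVERT(NVARCHAR(MAX),ProductNumberID), ',') AS ProductNumberID
FROM ProductNumbers
WHERE ProductID IN (?)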
I am at a loss so any help is much appreciated.

Many to Many Table - Performance is bad

The following tables are given:
CREATE TABLE player (
  id       serial PRIMARY KEY,
  name     varchar(100),
  birthday date,
  country  varchar(3)
);

CREATE TABLE club (
  id      serial PRIMARY KEY,
  name    varchar(100),
  country varchar(3)
);

CREATE TABLE playersinclubs (
  id        serial PRIMARY KEY,
  player_id integer,  -- has an index
  club_id   integer,  -- has an index
  joined    date,
  "left"    date      -- quoted: LEFT is a reserved word
);
Every player has a row in table player (with his attributes). Likewise, every club has an entry in table club.
For every stint in his career, a player has an entry in table playersinclubs (n:m) with the date he joined and, optionally, the date he left the club.
My main problem is the performance of these tables. Table player has over 10 million entries. If I want to display the history of a club with all the players who played for this club, my select looks like the following:
SELECT * FROM player
JOIN playersinclubs ON player.id = playersinclubs.player_id
JOIN club ON club.id = playersinclubs.club_id
WHERE club.id = 3;
But with this massive number of players, a sequential scan on table player is executed. This selection takes a lot of time.
Before I implemented some new features in my app, every player had exactly one team (only today's teams and players).
So I didn't have the table playersinclubs. Instead, I had a team_id column in table player, and I could select the players of a team directly from table player with the WHERE clause team_id = 3.
Does anyone have performance tips for my database structure to speed up these selections?
Most importantly, you need an index on playersinclubs(club_id, player_id). The rest is details (that may still make quite a difference).
You need to be precise about your actual goals. You write:
all the players who played for this club:
You don't need to join to club for this at all:
SELECT p.*
FROM playersinclubs pc
JOIN player p ON p.id = pc.player_id
WHERE pc.club_id = 3;
And you don't need columns from playersinclubs in the output either, which is a small gain for performance - unless it allows an index-only scan on playersinclubs, in which case it may be substantial.
How does PostgreSQL perform ORDER BY if a b-tree index is built on that field?
You probably don't need all columns of player in the result, either. Only SELECT the columns you actually need.
The PK on player provides the index you need on that table.
You need an index on playersinclubs(club_id, player_id), but do not make it unique unless players are not allowed to join the same club a second time.
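A minimal sketch of that index (the index name is arbitrary):
CREATE INDEX playersinclubs_club_player_idx ON playersinclubs (club_id, player_id);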
If players can join multiple times and you just want a list of "all players", you also need to add a DISTINCT step to fold duplicate entries. You could just:
SELECT DISTINCT p.* ...
But since you are trying to optimize performance: it's cheaper to eliminate dupes early:
SELECT p.*
FROM (
SELECT DISTINCT player_id
FROM playersinclubs
WHERE club_id = 3
) pc
JOIN player p ON p.id = pc.player_id;
Maybe you really want all entries in playersinclubs and all columns of the table, too. But your description says otherwise. Query and indexes would be different.
Closely related answer:
Find overlapping date ranges in PostgreSQL
The tables look fine and so does the query. So let's see what the query is supposed to do:
Select the club with ID 3. One record that can be accessed via the PK index.
Select all playersinclubs records for club ID 3. So we need an index starting with this column. If you don't have it, create it.
I suggest:
create unique index idx_playersinclubs on playersinclubs(club_id, player_id, joined);
This would be the table's unique business key. I know that in many databases with technical IDs these unique constraints are not established, but I consider this a flaw in those databases and would always create these constraints/indexes.
Use the player IDs obtained this way and select the players accordingly. We can get the player ID from the playersinclubs records, but it is also the second column in our index, so the DBMS may choose one or the other to perform the join. (It will probably use the column from the index.)
So maybe it is simply that the above index does not exist yet.
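If in doubt, a quick way to verify is to look at the plan, for example (using the simplified query from the first answer):
EXPLAIN ANALYZE
SELECT p.*
FROM playersinclubs pc
JOIN player p ON p.id = pc.player_id
WHERE pc.club_id = 3;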

Improve dynamic SQL query performance or filter records another way

Preliminaries:
Our application can read data from an attached client SQL Server 2005 or 2008 database but makes no changes to it, apart from using temp tables. We can create tables in our own database on their server.
The solution must work in SQL Server 2005.
The Schema:
Here is a simplified idea of the schema.
Group - Defines characteristics of a group of locations
Location - Defines characteristics of one geographic location. It links to the Group table.
GroupCondition - Links to a Group. It defines measures that apply to a subset of locations belonging to that group.
GroupConditionCriteria - Links to GroupCondition table. It names attributes, values, relational operators and boolean operators for a single phrase in a where clause. The named attributes are all fields of the Location table. There is a sequence number. Multiple rows in the GroupConditionCriteria must be strung together in proper sequence to form a full filter condition. This filter condition is implicitly restricted to those Locations that are part of the group associated with the GroupCondition. Location records that satisfy the filter criteria are "Included" and those that do not are "Excluded".
The Goal:
Many of our existing queries get attributes from the location table. We would like to join to something (table, temp table, query, CTE, openquery, UDF, etc.) that will give us the GroupCondition information for those Locations that are "Included". (A location could be included in more than one rule, but that is a separate issue.)
The schema for what I want is:
CREATE TABLE #LocationConditions
(
[PolicyID] int NOT NULL,
[LocID] int NOT NULL,
[CONDITIONID] int NOT NULL,
[Satisfies Condition] bit NOT NULL,
[Included] smallint NOT NULL
)
PolicyID identifies the group, LocID identifies the Location, CONDITIONID identifies the GroupCondition, [Satisfies Condition] is 1 if the filter includes the location record. (Included is derived from a different rule table with forced overrides of the filter condition. Not important for this discussion.)
Size of Problem:
My best effort so far can create such a table, but it is slow. For the current database I am testing, there are 50,000 locations affected (either included or excluded) by potentially matching rules (GroupConditions). The execution time is 4 minutes. If we do a periodic refresh and use a permanent table, this could be workable, but I am hoping for something faster.
What I tried:
I used a series of CTEs, one of which is recursive, to concatenate the several parts of the filter condition into one large filter condition. As an example of such a condition:
(STATECODE = 'TX' AND COUNTY = 'Harris County') OR STATECODE = 'FL'
There can be from one to five fields mentioned in the filter condition, and any number of parentheses used to group them. The operators that are supported are <, <=, >, >=, =, <>, AND and OR.
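For illustration, a heavily simplified sketch of such a recursive concatenation, assuming GroupConditionCriteria exposes a ConditionID, a Seq number and a pre-rendered Phrase fragment per row (the real table stores attribute, operator and value separately, so this only shows the shape, not my actual code):
WITH Ordered AS (
    SELECT ConditionID, Phrase,
           ROW_NUMBER() OVER (PARTITION BY ConditionID ORDER BY Seq) AS rn
    FROM GroupConditionCriteria
),
Built AS (
    -- anchor: first fragment of each condition
    SELECT ConditionID, rn, CAST(Phrase AS nvarchar(4000)) AS FilterText
    FROM Ordered
    WHERE rn = 1
    UNION ALL
    -- recursive step: append the next fragment in sequence order
    SELECT o.ConditionID, o.rn,
           CAST(b.FilterText + N' ' + o.Phrase AS nvarchar(4000))
    FROM Built b
    JOIN Ordered o ON o.ConditionID = b.ConditionID AND o.rn = b.rn + 1
)
SELECT b.ConditionID, b.FilterText
FROM Built b
WHERE b.rn = (SELECT MAX(rn) FROM Ordered o WHERE o.ConditionID = b.ConditionID);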
Once I have the condition, it is still a text string, so I create an insert statement (that will have to be executed dynamically):
insert into #LocationConditions
SELECT
1896,
390063,
38,
case when (STATECODE = 'TX' AND COUNTY = 'Harris County') OR STATECODE = 'FL' then 1
else 0
end,
1
FROM Location loc
WHERE loc.LocID = 390063
I first add the insert statements to their own temp table, called #InsertStatements, then loop through them with a cursor. I execute each insert using EXEC.
CREATE TABLE #InsertStatements
(
[Insert Statement] nvarchar(4000) NOT NULL
)
-- Skipping over Lots of complicated CTE's to add to #InsertStatements
DECLARE @InsertCmd nvarchar(4000)
DECLARE InsertCursor CURSOR FAST_FORWARD
FOR
SELECT [Insert Statement]
FROM #InsertStatements
OPEN InsertCursor
FETCH NEXT FROM InsertCursor
INTO @InsertCmd
WHILE @@FETCH_STATUS = 0
BEGIN
--PRINT @InsertCmd
EXEC(@InsertCmd)
FETCH NEXT FROM InsertCursor
INTO @InsertCmd
END
CLOSE InsertCursor
DEALLOCATE InsertCursor
SELECT *
FROM #LocationConditions
ORDER BY PolicyID, LocID
As you can imagine, executing 50,000 dynamic SQL inserts is slow. How can I speed this up?
Do you have to insert each row individually? Can't you use
insert into #LocationConditions
SELECT
PolicyID,
LocID,
CONDITIONID,
case when (STATECODE = 'TX' AND COUNTY = 'Harris County') OR STATECODE = 'FL' then 1
else 0
end,
Included
FROM Location loc
? You didn't show how you were creating your insert statements, so I can't tell if it's dependent on each row or not.
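Building on that idea, here is a hedged sketch of doing one dynamic, set-based insert per GroupCondition instead of one per location (the literal values are the sample values from the question; in practice they would come from the CTEs that build the filter text):
-- one dynamic statement per GroupCondition, inserting all matching locations at once
DECLARE @PolicyID int, @ConditionID int
DECLARE @FilterText nvarchar(4000), @sql nvarchar(4000)

SET @PolicyID = 1896
SET @ConditionID = 38
SET @FilterText = N'(STATECODE = ''TX'' AND COUNTY = ''Harris County'') OR STATECODE = ''FL'''

SET @sql = N'INSERT INTO #LocationConditions
                 (PolicyID, LocID, CONDITIONID, [Satisfies Condition], Included)
             SELECT ' + CAST(@PolicyID AS nvarchar(10)) + N', loc.LocID, '
                      + CAST(@ConditionID AS nvarchar(10)) + N',
                 CASE WHEN ' + @FilterText + N' THEN 1 ELSE 0 END, 1
             FROM Location loc'
-- plus a WHERE clause restricting loc to the group's locations
-- (the linking column between Location and Group is not named in the question)

EXEC (@sql)
This executes one statement per condition rather than 50,000, which should be dramatically faster.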

need help to write a trigger

I have three tables:
Orders:
orderid, valueid, valuesdesc
Customers:
customerid, customdesc
Groups:
groupid, groupdesc
valueid - the id of a customer or a group
valuesdesc - must be filled with the description matching the inserted valueid, taken from Customers or Groups depending on what (customer or group) the user selected in the client.
So, when the client sends an insert query for Orders, it contains the orderid for the new order and the valueid. On the client side I know what the user selected: a group or a customer.
What I need: when a new row is inserted into Orders, the corresponding valuesdesc for its valueid should be looked up in Customers or Groups and written into the valuesdesc column.
My idea is to insert, along with the new order record, a valuesdesc containing a key - a placeholder value that picks the right dictionary (Customers or Groups) - but I unfortunately don't know yet how to create that trigger.
First, I must say this is a very uncommon design. When you are inserting a record into Orders, you use valuesdesc to hint at the table that valueid references. But as soon as valuesdesc gets filled by the trigger, valueid essentially loses any meaning. That is, you haven't shown us any other way to tell a reference to Customers from a reference to Groups. So, you could just as well drop valueid altogether and use valuesdesc to pass both the 'dictionary' hint and the reference within that table.
Other than that, because valueid can reference more than one table, you cannot use a foreign key constraint on it.
Anyway, in your present design you could try this:
CREATE TRIGGER Orders_UpdateValuesdesc
ON Orders
FOR INSERT
AS
UPDATE o
SET valuesdesc = COALESCE(c.customdesc, g.groupdesc)
FROM Orders o
INNER JOIN inserted i ON o.orderid = i.orderid
LEFT JOIN Customers c ON i.valuesdesc = 'Customers'
AND i.valueid = c.customerid
LEFT JOIN Groups g ON i.valuesdesc = 'Groups'
AND i.valueid = g.groupid
The strings 'Customers' and 'Groups' are meant as the dictionary specifiers. You replace them with the actual values.
The trigger uses LEFT JOIN to join both Customers and Groups, then fills Orders.valuesdesc with either customdesc or groupdesc, depending on which one is not null.
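For example, a hypothetical insert from the client side (the literal 'Customers' is the dictionary hint the trigger looks for; the values are made up):
-- the client passes the dictionary hint in valuesdesc
INSERT INTO Orders (orderid, valueid, valuesdesc)
VALUES (1001, 42, 'Customers');
-- after the trigger fires, valuesdesc holds the customdesc of customer 42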

SQLite - a smart way to remove and add new objects

I have a table in my database, and I want each row in the table to have a unique id and the rows to be numbered sequentially.
For example: I have 10 rows, each with an id, starting at 0 and ending at 9. When I remove a row from the table, say row number 5, a "hole" occurs. And when I add more data afterwards, the "hole" is still there.
It is important for me to know the exact number of rows and to have data at every row number, so I can access my table by position.
Is there a way in SQLite to do this? Or do I have to manage the removing and adding of data manually?
Thank you in advance,
Ilya.
It may be worth considering whether you really want to do this. Primary keys usually should not change through the lifetime of the row, and you can always find the total number of rows by running:
SELECT COUNT(*) FROM table_name;
That said, the following trigger should "roll down" every ID number whenever a delete creates a hole:
CREATE TRIGGER sequentialize_ids AFTER DELETE ON table_name FOR EACH ROW
BEGIN
UPDATE table_name SET id=id-1 WHERE id > OLD.id;
END;
I tested this on a sample database and it appears to work as advertised. If you have the following table:
id name
1 First
2 Second
3 Third
4 Fourth
And delete where id=2, afterwards the table will be:
id name
1 First
2 Third
3 Fourth
This trigger can take a long time and has very poor scaling properties (it takes longer for each row you delete and each remaining row in the table). On my computer, deleting 15 rows at the beginning of a 1000 row table took 0.26 seconds, but this will certainly be longer on an iPhone.
I strongly suggest that you re-think your design. In my opinion you're asking for trouble in the future (e.g. if you create another table and want to have some relations between the tables).
If you want to know the number of rows just use:
SELECT count(*) FROM table_name;
If you want to access rows in the order of id, just define this field using PRIMARY KEY constraint:
CREATE TABLE test (
id INTEGER PRIMARY KEY,
...
);
and get rows using ORDER BY clause with ASC or DESC:
SELECT * FROM table_name ORDER BY id ASC;
Sqlite creates an index for the primary key field, so this query is fast.
I think that you would be interested in reading about LIMIT and OFFSET clauses.
The best source of information is the SQLite documentation.
If you don't want to take Stephen Jennings's very clever but performance-killing approach, just query a little differently. Instead of:
SELECT * FROM mytable WHERE id = ?
Do:
SELECT * FROM mytable ORDER BY id LIMIT 1 OFFSET ?
Note that OFFSET is zero-based, so you may need to subtract 1 from the variable you're indexing in with.
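For instance, a quick sketch: to fetch the row at 1-based position 5, you would pass 4 as the offset:
SELECT * FROM mytable ORDER BY id LIMIT 1 OFFSET 4;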
If you want to reclaim deleted row ids, the VACUUM command or pragma may be what you seek:
http://www.sqlite.org/faq.html#q12
http://www.sqlite.org/lang_vacuum.html
http://www.sqlite.org/pragma.html#pragma_auto_vacuum