Table cardinality information in Firebird - metadata

I am new to Firebird and I am messing around in it's meta data to get some information about the table structure and etc.
My problem is that I can't seem to find some information about estimated table cardinality. Is there a way to get this information from Firebird?
Edit:
By cardinality i mean the number of rows in a table :) and for my use the select count(*) is not an option.

You can use an aproximative method, using the selectivity of primary key like this:
SELECT
R.RDB$RELATION_NAME TABLENAME,
(
CASE
WHEN I.RDB$STATISTICS = 0 THEN 0
ELSE 1 / I.RDB$STATISTICS
END) AS COUNTRECORDS8
FROM RDB$RELATIONS R
JOIN RDB$RELATION_CONSTRAINTS C ON (R.RDB$RELATION_NAME = C.RDB$RELATION_NAME AND C.RDB$CONSTRAINT_TYPE = 'PRIMARY KEY')
JOIN RDB$INDICES I ON (I.RDB$RELATION_NAME = C.RDB$RELATION_NAME AND I.RDB$INDEX_NAME = C.RDB$INDEX_NAME)

To get the number of rows in a table you use the COUNT() function as in any other SQL DB, ie
SELECT count(*) FROM table;

Why not use a query:
select count(distinct field_name)/(count(field_name) + 0.0000) from table_name
The closer result to 1 the higher cardinality of a specified column.

Related

Using two COUNT in SELECT returns the same values

SELECT user_posts.id,
COUNT(user_post_comments.post_id) as number_of_comments,
COUNT(user_post_reactions.post_id) as number_of_reactions
FROM user_posts
LEFT JOIN user_post_comments
ON (user_posts.id = user_post_comments.post_id)
LEFT JOIN user_post_reactions
ON (user_posts.id = user_post_reactions.post_id)
WHERE user_posts.user_id = '850e6511-2f30-472d-95a1-59a02308b46a'
group by user_posts.id
I have this query for getting the number of comments and reactions from another table by post_id
current output screenshot
To caluclate number of comments and reactions just use subqueries. No need to join and group by.
SELECT user_posts.id,
( select COUNT(*) from user_post_comments
where user_posts.id = user_post_comments.post_id
) as number_of_comments,
( select COUNT(*) from user_post_reactions
where user_posts.id = user_post_reactions.post_id
) as number_of_reactions
FROM user_posts
WHERE user_posts.user_id = '850e6511-2f30-472d-95a1-59a02308b46a'
If both joins return non-null rows, what you get for each count is the product of the number of rows from each for a given user_posts.id. One way you could fix that is by counting distinct identifiers for each table, e.g.
COUNT(DISTINCT user_post_comments.id) as number_of_comments
(Assuming "id" exists as a primary key on that table). This may not be spectacularly efficient, but is relatively simple.

Nested SQL Query Optimization

is there any better way to write this query to be optimized?
SELECT * FROM data d
WHERE d.id IN (SELECT max(d1.id)
FROM data d1
WHERE d1.name='A'
AND d1.version='2')
I am not so good with SQL.
With PostgreSQL v13, you can do it like this:
SELECT * FROM data
WHERE name = 'A'
AND version = '2'
ORDER BY is DESC
FETCH FIRST 1 ROWS WITH TIES;
That will give you all rows where id is the maximum.
If id is unique, you can use FETCH FIRST 1 ROWS ONLY or LIMIT 1, which will also work with older PostgreSQL versions.
Apart from other answers that are equally interesting / correct, IN is typically a non-performant keyword. You can remove it by using a slightly different way of writing your own query:
SELECT * FROM data d
WHERE d.name = 'A' and d.version = '2' and
d.id = (SELECT max(d1.id) FROM data d1 WHERE d1.name='A' AND d1.version='2')

Simple SELECT, but adding JOIN returns too many rows

The query below returns 9,817 records. Now, I want to SELECT one more field from another table. See the 2 lines that are commented out, where I've simply selected this additional field and added a JOIN statement to bind this new columns. With these lines added, the query now returns 649,200 records and I can't figure out why! I guess something is wrong with my WHERE criteria in conjunction with the JOIN statement. Please help, thanks.
SELECT DISTINCT dbo.IMPORT_DOCUMENTS.ITEMID, BEGDOC, BATCHID
--, dbo.CATEGORY_COLLECTION_CATEGORY_RESULTS.CATEGORY_ID
FROM IMPORT_DOCUMENTS
--JOIN dbo.CATEGORY_COLLECTION_CATEGORY_RESULTS ON
dbo.CATEGORY_COLLECTION_CATEGORY_RESULTS.ITEMID = dbo.IMPORT_DOCUMENTS.ITEMID
WHERE (BATCHID LIKE 'IC0%' OR BATCHID LIKE 'LP0%')
AND dbo.IMPORT_DOCUMENTS.ITEMID IN
(SELECT dbo.CATEGORY_COLLECTION_CATEGORY_RESULTS.ITEMID FROM
CATEGORY_COLLECTION_CATEGORY_RESULTS
WHERE SCORE >= .7 AND SCORE <= .75 AND CATEGORY_ID IN(
SELECT CATEGORY_ID FROM CATEGORY_COLLECTION_CATS WHERE COLLECTION_ID IN (11,16))
AND Sample_Id > 0)
AND dbo.IMPORT_DOCUMENTS.ITEMID NOT IN
(SELECT ASSIGNMENT_FOLDER_DOCUMENTS.Item_Id FROM ASSIGNMENT_FOLDER_DOCUMENTS)
One possible reason is because one of your tables contains data at lower level, lower than your join key. For example, there may be multiple records per item id. The same item id is repeated X number of times. I would fix the query like the below. Without data knowledge, Try running the below modified query.... If output is not what you're looking for, convert it into SELECT Within a Select...
Hope this helps....
Try this SQL: SELECT DISTINCT a.ITEMID, a.BEGDOC, a.BATCHID, b.CATEGORY_ID FROM IMPORT_DOCUMENTS a JOIN (SELECT DISTINCT ITEMID FROM CATEGORY_COLLECTION_CATEGORY_RESULTS WHERE SCORE >= .7 AND SCORE <= .75 AND CATEGORY_ID IN (SELECT DISTINCT CATEGORY_ID FROM CATEGORY_COLLECTION_CATS WHERE COLLECTION_ID IN (11,16)) AND Sample_Id > 0) B ON a.ITEMID =b.ITEMID WHERE a.(a.BATCHID LIKE 'IC0%' OR a.BATCHID LIKE 'LP0%') AND a.ITEMID NOT IN (SELECT DIDTINCT Item_Id FROM ASSIGNMENT_FOLDER_DOCUMENTS)

T-SQL query one table, get presence or absence of other table value

I'm not sure what this type of query is called so I've been unable to search for it properly. I've got two tables, Table A has about 10,000 rows. Table B has a variable amount of rows.
I want to write a query that gets all of Table A's results but with an added column, the value of that column is a boolean that says whether the result also appears in Table B.
I've written this query which works but is slow, it doesn't use a boolean but rather a count that will be either zero or one. Any suggested improvements are gratefully accepted:
SELECT u.number,u.name,u.deliveryaddress,
(SELECT COUNT(productUserid)
FROM ProductUser
WHERE number = u.number and productid = #ProductId)
AS IsInPromo
FROM Users u
UPDATE
I've run the query with actual execution plan enabled, I'm not sure how to show the results but various costs are:
Nested Loops (left semi join): 29%]
Clustered Index scan (User Table): 41%
Clustered Index Scan (ProductUser table): 29%
NUMBERS
There are 7366 users in the users table and currently 18 rows in the productUser table (although this will change and could be in the thousands)
You can use EXISTS to short circuit after the first row is found rather than COUNT-ing all matching rows.
SQL Server does not have a boolean datatype. The closest equivalent is BIT
SELECT u.number,
u.name,
u.deliveryaddress,
CASE
WHEN EXISTS (SELECT *
FROM ProductUser
WHERE number = u.number
AND productid = #ProductId) THEN CAST(1 AS BIT)
ELSE CAST(0 AS BIT)
END AS IsInPromo
FROM Users u
RE: "I'm not sure what this type of query is called". This will give a plan with a semi join. See Subqueries in CASE Expressions for more about this.
Which management system are you using?
Try this:
SELECT u.number,u.name,u.deliveryaddress,
case when COUNT(p.productUserid) > 0 then 1 else 0 end
FROM Users u
left join ProductUser p on p.number = u.number and productid = #ProductId
group by u.number,u.name,u.deliveryaddress
UPD: this could be faster using mssql
;with fff as
(
select distinct p.number from ProductUser p where p.productid = #ProductId
)
select u.number,u.name,u.deliveryaddress,
case when isnull(f.number, 0) = 0 then 0 else 1 end
from Users u left join fff f on f.number = u.number
Since you seem concerned about performance, this query can perform faster as this will cause index seek on both tables versus an index scan:
SELECT u.number,
u.name,
u.deliveryaddress,
ISNULL(p.number, 0) IsInPromo
FROM Users u
LEFT JOIN ProductUser p ON p.number = u.number
WHERE p.productid = #ProductId

T-SQL Need help to optimize table value function

I need help to optmize the SQL logic in one of my functions. Please, note that I am not able to use store procedure.
Here is my table. It will be initialized using #MainTable that contains a lot of records.
DECLARE TABLE #ResultTable
(
ResultValue INT
)
These are tables that stores some parameters - they can be emty too.
DECLARE TABLE #ParameterOne (ParameterOne INT)
DECLARE TABLE #ParameterTwo (ParameterOne NVARCHAR(100))
...
DECLARE TABLE #ParameterN(ParameterN TINYINT)
Now, I need to join a lot of tables to my #MainTable in order to select from it only some of its records.
The selected records depend on the information stored in the parameters table.
So, my current solution is:
INSERT INTO ResultTable(ResultValue)
SELECT ResultValue
FROM MainTable M
INNER JOIN #MainOne MO
ON M.ID=MO.ID
....
INNER JOIN #MainN MN
ON M.IDN=MN.ID
WHERE (EXISTS (SELECT 1 FROM #ParameterOne WHERE ParameterOne=MO.ID) OR NOT EXISTS (SELECT 1 FROM #ParameterOne))
AND
...
AND
(EXISTS (SELECT 1 FROM #ParameterN WHERE ParameterN=MN.Name) OR NOT EXISTS (SELECT 1 FROM #ParameterN ))
So, the idea is to add the records only if they match the current criteria from the parameters tables.
Because I am not able to use procedure to build dynamic query I am using the WHERE clause with combinations of EXISTS and NOT EXISTS for each parameter table.
The problem is that it works slower when I am adding more and more parameters table. Is there an other way to do this without using a lot of IF/ELSE statements checking what parameter table has records - it will make the function a lot bigger and difficult for read.
And ideas and advices are welcomed.
Good question.
Try the following one:
INSERT INTO ResultTable(ResultValue)
SELECT ResultValue
FROM MainTable M
INNER JOIN (SELECT * FROM #MainOne WHERE (EXISTS (SELECT 1 FROM #ParameterOne WHERE ParameterOne=#MainOne.ID) OR NOT EXISTS (SELECT 1 FROM #ParameterOne))) MO
ON M.ID=MO.ID
....
INNER JOIN (SELECT * FROM #MainN WHERE (EXISTS (SELECT 1 FROM #ParameterN WHERE ParameterOne=#MainN.Name OR NOT EXISTS (SELECT 1 FROM #ParameterN))) MO
ON M.IDN=MN.ID
Advantages:
Result of the JOIN is more quickly, because it does not process all data (it is already filtered)
It looks more simple for adjusting