Easy way to identify required fields in a table - tsql

Scenario: Table with over 100 fields (not my doing... I inherited this)
Only 50 of these fields are required to be displayed on a web site.
They want to maintain the other 50 fields for historical purposes.
There is a possibility that some of the not required fields may become required sometime in the future.
Problem: I'm looking for a way to easily identify the 50 required fields so that I can pull the field names with a query.
Pseudo Query: Select FieldNames from TableName where Required = Yes
Is there a setting I could change?
What about using Extended Properties?
Thanks in advance for any direction you can provide.

Unless I'm missing a nuance to your question, use the INFORMATION_SCHEMA table for COLUMNS. This query identifies all the columns in table dbo.dummy that are required.
SELECT
IC.COLUMN_NAME
FROM
INFORMATION_SCHEMA.COLUMNS IC
WHERE
IC.TABLE_SCHEMA = 'dbo'
AND IC.TABLE_NAME = 'dummy'
AND IC.IS_NULLABLE = 'NO'
After doing more thinking, perhaps you wanted a generic query that would grab all the required columns and then build out the select query. This query covers that possible request:
DECLARE
    @hax varchar(max)
,   @schemaName sysname
,   @tableName sysname
SELECT
    @schemaName = 'dbo'
,   @tableName = 'dummy'
; WITH A AS
(
    -- this query identifies all the columns that are not nullable
    SELECT
        IC.TABLE_SCHEMA + '.' + IC.TABLE_NAME AS tname
    ,   IC.COLUMN_NAME
    FROM
        INFORMATION_SCHEMA.COLUMNS IC
    WHERE
        IC.TABLE_SCHEMA = @schemaName
        AND IC.TABLE_NAME = @tableName
        AND IC.IS_NULLABLE = 'NO'
)
, COLUMN_SELECT (column_list) AS
(
    -- this query concatenates all the column names
    -- returned by the above
    SELECT STUFF((SELECT '], [' + A.Column_Name
    FROM A
    FOR XML PATH('')),1, 2, '')
)
-- Use the above to build a query string
SELECT DISTINCT
    @hax = 'SELECT ' + CS.column_list + '] FROM ' + A.tname
FROM
    A
    CROSS APPLY
    COLUMN_SELECT CS
-- invoke the query
EXECUTE (@hax)

How about creating a view that only has the required fields?

I am not sure if I understand the question correctly. Is this what you are looking for? The code is in MS SQL.
select t.name as TABLE_NAME, c.name as COLUMN_NAME, c.is_nullable
from sys.tables t
inner join sys.columns c on c.object_id = t.object_id
WHERE t.name = '<TableName>'
and c.is_nullable = 0

There's no flag you can put on a field to determine whether it's relevant or not -- that's what the SELECT list is for. A couple of ideas...
1) Split the historical data out into a separate table, with a one-to-one relationship to the source table.
2) Re-name the historical fields in your table as "OBSOLETE_" + fieldname. This will at least give you a quick visual reference for when you're writing your sql.
3) Create a view. Big drawback to this one would be that you can take some big performance hits as soon as you try to use the view as a table in other queries. But if you're just pulling off it directly without joining it, you should be fine.

We use separate metatables describing all tables and columns in the database. We store information like the friendly name (for example, the 'username' column should be displayed to the user as 'User name'), formatting, etc. You could use this approach to store information about required columns.
We have tried object extended properties (sp_addextendedproperty etc.), but metatable(s) solution came up better for us.
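A minimal sketch of the metatable idea, written in Python with an in-memory SQLite database as a stand-in so it can be run anywhere. The table name `column_meta`, its columns, and the sample rows are all hypothetical; the answer above does not describe the actual schema.

```python
import sqlite3

# In-memory SQLite is only a stand-in for the real database.
conn = sqlite3.connect(":memory:")

# Hypothetical metatable: one row per (table, column) with display metadata.
conn.execute(
    """
    CREATE TABLE column_meta (
        table_name    TEXT NOT NULL,
        column_name   TEXT NOT NULL,
        friendly_name TEXT,
        is_required   INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (table_name, column_name)
    )
    """
)
conn.executemany(
    "INSERT INTO column_meta VALUES (?, ?, ?, ?)",
    [
        ("dummy", "username", "User name", 1),
        ("dummy", "legacy_code", "Legacy code", 0),
        ("dummy", "email", "Email address", 1),
    ],
)


def required_columns(conn: sqlite3.Connection, table_name: str) -> list[str]:
    """Return the names of the columns flagged as required for one table."""
    rows = conn.execute(
        "SELECT column_name FROM column_meta "
        "WHERE table_name = ? AND is_required = 1 "
        "ORDER BY column_name",
        (table_name,),
    )
    return [row[0] for row in rows]


print(required_columns(conn, "dummy"))  # ['email', 'username']
```

Promoting a historical column to required later is then a one-row update in the metatable rather than a schema change.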

Within TSQL this is not easy as you cannot dynamically build the columns in the select line nor the alias name for those columns. The parser and query optimizer need some stuff to be static. Is it an ASP.NET web site? In your development environment (e.g. C#) you could dynamically build the query.
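As a rough illustration of building the query in application code, here is a Python sketch (the same idea applies in C#). The function names and the sample column list are hypothetical; in practice the list could come from a metadata table, extended properties, or INFORMATION_SCHEMA.COLUMNS.

```python
def quote_ident(name: str) -> str:
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def build_select(schema: str, table: str, columns: list[str]) -> str:
    """Return a SELECT statement that lists only the given columns."""
    if not columns:
        raise ValueError("at least one column is required")
    column_list = ", ".join(quote_ident(c) for c in columns)
    return f"SELECT {column_list} FROM {quote_ident(schema)}.{quote_ident(table)}"


# Hypothetical list of required columns.
required = ["CustomerId", "FirstName", "LastName"]
print(build_select("dbo", "dummy", required))
# SELECT [CustomerId], [FirstName], [LastName] FROM [dbo].[dummy]
```

Quoting every identifier keeps odd column names (spaces, reserved words) from breaking the generated statement.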

SQL Pivot using a subquery in FOR

Using SQL Server 2016 and referring to this article:
https://www.sqlshack.com/dynamic-pivot-tables-in-sql-server/
That article uses this pivot:
SELECT * FROM (
SELECT
[Student],
[Subject],
[Marks]
FROM Grades
) StudentResults
PIVOT (
SUM([Marks])
FOR [Subject]
IN (
[Mathematics],
[Science],
[Geography]
)
) AS PivotTable
How can you change the query so that the Subjects ([Mathematics], [Science], [Geography]) don't have to be hardcoded in the query?
Can you rather get the Subject list using a subquery? How do you get the FOR to work with a query like this?
...
FOR [Subject]
IN (
SELECT subject FROM grades WHERE student = "Jacob"
)
How can you change the query so that the Subjects ([Mathematics], [Science], [Geography]) don't have to be hardcoded in the query?
You can't; you'll have to form the SQL as a string and execute it dynamically.
SQL makes it easy to have a variable number of columns (you just write more words in a SELECT), which also makes it easy to forget that columns are like properties of an object (and an entire row is like an instance of an object); they aren't something that varies dynamically every time you run a program. As a Person, you don't have a Name this week and not next week.
The number of columns output from a query isn't meant to vary; the number of rows is. If you want variable numbers of attributes, you'll have to form them as rows and then have your front end behave differently to account for them (i.e. don't do the pivot). If you can't do this because you have no front end, and you really do need a varying number of columns, you have to write a different SQL statement each time (which you can do by concatenating together a new SQL string and EXECing it, but be under no illusions: it works because it's a totally different SQL statement, the programmatic equivalent of you editing your hardcoded query and re-running it).
It looks something like (not tested - consider this pseudocode):
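DECLARE @cols NVARCHAR(MAX), @sql NVARCHAR(MAX);

-- Build the quoted, comma-separated subject list.
-- STRING_AGG needs SQL Server 2017+, so on 2016 use STUFF + FOR XML PATH.
SELECT @cols = STUFF((
    SELECT ',' + QUOTENAME(x.Subject)
    FROM (SELECT DISTINCT Subject FROM Grades) x
    FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 1, '');

-- Splice the list into the IN (...) clause of the pivot.
SET @sql = N'
SELECT * FROM (
    SELECT [Student], [Subject], [Marks]
    FROM Grades
) StudentResults
PIVOT (
    SUM([Marks])
    FOR [Subject] IN (' + @cols + N')
) AS PivotTable;';

-- Run the generated statement.
EXEC sys.sp_executesql @sql;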
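If there is a front end, the same string building can happen there instead. The following is a minimal Python sketch under that assumption; `build_pivot_sql` and `quotename` are hypothetical helper names, and the subject list is assumed to have been fetched beforehand with something like SELECT DISTINCT Subject FROM Grades.

```python
def quotename(name: str) -> str:
    """Mimic T-SQL QUOTENAME: wrap in brackets and double any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def build_pivot_sql(subjects: list[str]) -> str:
    """Return the pivot query text with the IN list built from the subjects."""
    unique = sorted(set(subjects))
    if not unique:
        raise ValueError("at least one subject is required")
    in_list = ", ".join(quotename(s) for s in unique)
    return (
        "SELECT * FROM (\n"
        "    SELECT [Student], [Subject], [Marks]\n"
        "    FROM Grades\n"
        ") StudentResults\n"
        "PIVOT (\n"
        "    SUM([Marks])\n"
        f"    FOR [Subject] IN ({in_list})\n"
        ") AS PivotTable"
    )


# Duplicates are removed and the list is sorted before it is spliced in.
print(build_pivot_sql(["Science", "Mathematics", "Geography", "Science"]))
```

Either way, each distinct set of subjects produces a different SQL statement, which is the point made above.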

TSQL order by but first show these

I'm researching a dataset.
And I just wonder if there is a way to order like below in 1 query
Select * From MyTable where name ='international%' order by id
Select * From MyTable where name != 'international%' order by id
So first show all international items, then the names that don't start with international.
My question is not about adding columns to make this work, using multiple DBs, or writing a larger T-SQL script to clone a DB into a new order.
I just wonder if anything after WHERE or ORDER BY can be tricked into doing this.
You can use expressions in the ORDER BY:
Select * From MyTable
order by
CASE
WHEN name like 'international%' THEN 0
ELSE 1
END,
id
(From your narrative, it also sounded like you wanted like, not =, so I changed that too)
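To see the CASE-in-ORDER-BY ordering in action without a SQL Server instance, here is a small Python sketch that runs the same query shape against an in-memory SQLite table. SQLite is only a stand-in here, and the sample names are made up for illustration.

```python
import sqlite3

# In-memory SQLite as a stand-in; ids are assigned 1..4 in insert order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO MyTable (name) VALUES (?)",
    [
        ("international something",),
        ("ACME",),
        ("international waffles",),
        ("ABC Co.",),
    ],
)

# Rows matching the prefix sort into group 0, the rest into group 1;
# id breaks ties inside each group.
rows = conn.execute(
    """
    SELECT id, name
    FROM MyTable
    ORDER BY CASE WHEN name LIKE 'international%' THEN 0 ELSE 1 END, id
    """
).fetchall()

for row in rows:
    print(row)
# (1, 'international something')
# (3, 'international waffles')
# (2, 'ACME')
# (4, 'ABC Co.')
```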
Another way (slightly cleaner and a tiny bit faster):
-- Sample Data
DECLARE @mytable TABLE (id INT IDENTITY, [name] VARCHAR(100));
INSERT @mytable([name])
VALUES('international something' ),('ACME'),('international waffles'),('ABC Co.');
-- solution
SELECT t.*
FROM @mytable AS t
ORDER BY -PATINDEX('international%', t.[name]), t.id;
Note too that you can add a persisted computed column for -PATINDEX('international%', t.[name]) to speed things up.

T-SQL: Find column match within a string (LIKE but different)

Server: SQL Server 2008 R2
I apologize in advance, as I'm not sure of the best way to verbalize the question. I'm receiving a string of email addresses and I need to see if, within that string, any of the addresses exist as a user already. The query that obviously doesn't work is shown below, but hopefully it helps to clarify what I'm looking for:
SELECT f_emailaddress
FROM tb_users
WHERE f_emailaddress LIKE '%user1@domain.com,user2@domain.com%'
I was hoping SQL had an "InString" operator that would check for matches "within the string", but my Google abilities must be weak today.
Any assistance is greatly appreciated. If there simply isn't a way, I'll have to dig in and do some work in the codebehind to split each item in the string and search on each one.
Thanks in advance,
Beems
Split the input string and use an IN clause.
To split the CSV into rows, use this:
SELECT Ltrim(Rtrim(( Split.a.value('.', 'VARCHAR(100)') )))
FROM (SELECT Cast ('<M>'
+ Replace('user1@domain.com,user2@domain.com', ',', '</M><M>')
+ '</M>' AS XML) AS Data) AS A
CROSS APPLY Data.nodes ('/M') AS Split(a)
Now use the above query in where clause.
SELECT f_emailaddress
FROM tb_users
WHERE f_emailaddress IN(SELECT Ltrim(Rtrim(( Split.a.value('.', 'VARCHAR(100)') )))
FROM (SELECT Cast ('<M>'
+ Replace('user1@domain.com,user2@domain.com', ',', '</M><M>')
+ '</M>' AS XML) AS Data) AS A
CROSS APPLY Data.nodes ('/M') AS Split(a))
Or you can use an inner join:
SELECT u.f_emailaddress
FROM tb_users u
JOIN (SELECT Ltrim(Rtrim(( Split.a.value('.', 'VARCHAR(100)') ))) AS f_emailaddress -- alias needed for the ON clause
FROM (SELECT Cast ('<M>'
+ Replace('user1@domain.com,user2@domain.com', ',', '</M><M>')
+ '</M>' AS XML) AS Data) AS A
CROSS APPLY Data.nodes ('/M') AS Split(a)) B
ON u.f_emailaddress = B.f_emailaddress
You first need to split the CSV list into a temp table and then use that to INNER JOIN with your existing table, as that will act as a filter.
You cannot use CONTAINS unless you have created a Full Text index on that table and column, which I doubt is the case here.
For example:
CREATE TABLE #EmailAddresses (Email NVARCHAR(500) NOT NULL);
INSERT INTO #EmailAddresses (Email)
SELECT split.Val
FROM dbo.Splitter(@IncomingListOfEmailAddresses) split;
SELECT usr.f_emailaddress
FROM tb_users usr
INNER JOIN #EmailAddresses tmp
ON tmp.Email = usr.f_emailaddress;
Please note that the reference to "dbo.Splitter" is a placeholder for whatever string splitter you already have or might get. Please do not use any splitter that makes use of a WHILE loop. The best options are either the SQLCLR- or XML- based ones. The XML-based ones are generally fast but do have some issues with encoding if the string to be split has special XML characters such as &, <, or ". If you want a quick and easy SQLCLR-based splitter, you can download the Free version of the SQL# library (which I am the creator of, but this feature is in the free version) which contains String_Split and String_Split4k (for when the input is always <= 4000 characters).
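If the split ends up happening in the code-behind instead, a parameterized IN list avoids any string splitting on the server. Below is a minimal Python sketch of that approach; SQLite stands in for SQL Server so the example is runnable, the helper names are hypothetical, and the `?` placeholder style is an assumption that depends on the database driver in use.

```python
import sqlite3


def split_addresses(csv: str) -> list[str]:
    """Split a comma-separated string, trimming blanks and dropping empty items."""
    return [part.strip() for part in csv.split(",") if part.strip()]


def find_existing(conn: sqlite3.Connection, csv: str) -> list[str]:
    """Return the addresses from the CSV that already exist in tb_users."""
    addresses = split_addresses(csv)
    if not addresses:
        return []
    # One placeholder per address; values are bound, never concatenated.
    placeholders = ", ".join("?" for _ in addresses)
    sql = (
        "SELECT f_emailaddress FROM tb_users "
        f"WHERE f_emailaddress IN ({placeholders}) "
        "ORDER BY f_emailaddress"
    )
    return [row[0] for row in conn.execute(sql, addresses)]


# Sample data for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tb_users (f_emailaddress TEXT)")
conn.executemany(
    "INSERT INTO tb_users VALUES (?)",
    [("user1@domain.com",), ("user3@domain.com",)],
)

print(find_existing(conn, "user1@domain.com, user2@domain.com"))
# ['user1@domain.com']
```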
SQL has a CONTAINS predicate and an IN operator. You can use either of those to accomplish your task; both are documented on MSDN. Hope this helps.
CONTAINS
CONTAINS looks for rows whose values contain the term you provide, somewhat like LIKE '%myValue%', but it only works on columns that have a full-text index, and it matches whole words or phrases rather than arbitrary substrings:
SELECT f_emailaddress
FROM tb_users
WHERE CONTAINS (f_emailaddress, 'user1@domain.com');
IN
IN will return matches for any values in the provided comma delimited list. They need to be exact matches however. You can't provide partial terms.
SELECT f_emailaddress
FROM tb_users
WHERE f_emailaddress IN ('user1@domain.com','user2@domain.com')
As far as splitting each of the values out into separate strings, have a look at the StackOverflow question found HERE. This might point you in the proper direction.
You can try it like this (not tested).
Before using this, make sure that you have created a Full Text index on that table and column.
Replace each comma with OR (AND would match only rows containing both addresses) and wrap each address in double quotes, then:
SELECT id, email
FROM t
WHERE CONTAINS(email, '"user1@domain.com" OR "user2@domain.com"');
--prepare temp table for testing
DECLARE @tb_users AS TABLE
(f_emailaddress VARCHAR(100))
INSERT INTO @tb_users
( f_emailaddress)
VALUES ( 'user1@domain.com' ),
( 'user2@domain.com' ),
( 'user3@domain.com' ),
( 'user4@domain.com' )
--Your query
--(the surrounding commas stop a shorter address from matching inside a longer one)
SELECT f_emailaddress
FROM @tb_users
WHERE ',' + 'user1@domain.com,user2@domain.com' + ',' LIKE '%,' + f_emailaddress + ',%'

Postgres: Find number of distinct values for each column

I am trying to find the number of distinct values in each column of a table. Declaratively that is:
for each column of table xyz
run_query("SELECT COUNT(DISTINCT column) FROM xyz")
Finding the column names of a table is shown here.
SELECT column_name
FROM information_schema.columns
WHERE table_name = 'xyz'
However, I don't manage to merge the count query inside. I tried various queries, this one:
SELECT column_name, thecount
FROM information_schema.columns,
(SELECT COUNT(DISTINCT column_name) FROM myTable) AS thecount
WHERE table_name=myTable
is syntactically not allowed (reference to column_name in the nested query not allowed).
This one seems erroneous too (timeout):
SELECT column_name, count(distinct column_name)
FROM information_schema.columns, myTable
WHERE table_name=myTable
What is the right way to get the number of distinct values for each column of a table with one query?
Article SQL to find the number of distinct values in a column talks about a fixed column only.
In general, SQL expects the names of items (fields, tables, roles, indices, constraints, etc) in a statement to be constant. That many database systems let you examine the structure through something like information_schema does not mean you can plug that data into the running statement.
You can however use the information_schema to construct new SQL statements that you execute separately.
First consider your original problem.
CREATE TABLE foo (a numeric, b numeric, c numeric);
INSERT INTO foo(a,b,c)
VALUES (1,1,1), (1,1,2), (1,1,3), (1,2,1), (1,2,2);
SELECT COUNT(DISTINCT a) "distinct a",
COUNT(DISTINCT b) "distinct b",
COUNT(DISTINCT c) "distinct c"
FROM foo;
If you know the name of all of your columns when you are writing the query, that is sufficient.
If you are seeking data for an arbitrary table, you need to construct the SQL statement via SQL (I've added plenty of whitespace so you can see the different levels involved):
SELECT 'SELECT ' || STRING_AGG(   'COUNT (DISTINCT '
                               || column_name
                               || ') "'
                               || column_name
                               || '"',
                               ',')
                 || ' FROM foo;'
FROM information_schema.columns
WHERE table_name='foo';
That, however, is just the text of the necessary SQL statement. Depending on how you are accessing PostgreSQL, it might be easy for you to feed that into a new query; if you are keeping everything inside PostgreSQL, you will have to resort to one of the integrated procedural languages. An excellent (though complex) discussion of the issues may provide guidance.
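For the client-side route, a two-step sketch in Python might look like the following: read the column names, build the COUNT(DISTINCT ...) statement, then run it. SQLite is used purely as a runnable stand-in, so the column names come from PRAGMA table_info here; against PostgreSQL they would come from information_schema.columns instead. The helper names are hypothetical.

```python
import sqlite3


def quote_ident(name: str) -> str:
    """Double-quote an identifier, doubling any embedded double quote."""
    return '"' + name.replace('"', '""') + '"'


def distinct_counts(conn: sqlite3.Connection, table: str, columns: list[str]) -> dict:
    """Build and run one SELECT that counts distinct values per column."""
    select_list = ", ".join(f"COUNT(DISTINCT {quote_ident(c)})" for c in columns)
    row = conn.execute(f"SELECT {select_list} FROM {quote_ident(table)}").fetchone()
    return dict(zip(columns, row))


# Same sample data as the foo table above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (a NUMERIC, b NUMERIC, c NUMERIC)")
conn.executemany(
    "INSERT INTO foo VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 1), (1, 2, 2)],
)

# Step 1: discover the column names (SQLite-specific stand-in).
columns = [row[1] for row in conn.execute("PRAGMA table_info(foo)")]

# Step 2: feed them into the generated statement.
print(distinct_counts(conn, "foo", columns))
# {'a': 1, 'b': 2, 'c': 3}
```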

How can I check the type of object associated with an object_id? (SQL Server 2012)

I've searched through StackOverflow and Google for a while and didn't find anything too similar, so here's my problem:
I'm currently writing a stored procedure to check that every column in a database named 'Sequence' has an associated constraint ensuring the value is >=1. However, my current method returns all objects containing 'Sequence', not just tables (i.e. get/set/delete stored procedures that contain 'Sequence').
Here is my current code, which works, but I feel is a dirty solution:
SELECT DISTINCT
'The Sequence column of the ' + CAST(OBJECT_NAME([AC].[object_id]) AS NVARCHAR(255)) + ' table is missing a Sequence>=1 constraint.' AS MESSAGE
FROM [sys].[all_columns] AC
LEFT JOIN [sys].[check_constraints] CC
ON [CC].[parent_object_id] = [AC].[object_id]
AND [CC].[name] LIKE '%Sequence'
AND [CC].[definition] LIKE '%Sequence]>=(1))'
WHERE [AC].[name] = 'Sequence'
AND [CC].[name] IS NULL
AND OBJECT_NAME([AC].[object_id]) NOT LIKE '%Get%'
AND OBJECT_NAME([AC].[object_id]) NOT LIKE '%Set%'
AND OBJECT_NAME([AC].[object_id]) NOT LIKE '%Delete%'
Specifically, my question is: Given [sys].[all_columns].[object_id], is there an easy way to check if the given object is a table versus a stored procedure?
Any help or advice on this would be greatly appreciated! General code cleanup is welcome too; I'm relatively new to T-SQL, so this is probably not the most efficient way to go about it.
Thanks,
Andrew
You can restrict the search to tables by using the sys.tables view, and look only at check constraints attached to the Sequence column:
select quotename(schema_name(t.schema_id)) + '.' + quotename(t.name)
from sys.tables t
join sys.columns c on c.object_id = t.object_id
left join sys.check_constraints cs on cs.parent_object_id = t.object_id and cs.parent_column_id = c.column_id
and cs.definition like '%Sequence]>=(1))'
where c.name = 'Sequence' and cs.object_id is NULL
This should give you the tables that have a Sequence column but either have no constraint on it or have a constraint that is not defined according to the rule specified.