meta: why do I have to specify a group by clause - tsql

Just curious why I really have to specify a GROUP BY clause when I use a function that requires one (an aggregate function, I couldn't remember the general name), e.g. SUM().
Because if I use one of those, I have to list every column that isn't aggregated in the GROUP BY clause.
Why doesn't SQL just automatically group on all columns that aren't passed to an aggregate function? It seems redundant: as soon as I'm using an aggregate, I'm grouping on every other column that isn't.
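For example, with a hypothetical orders table, I end up writing:
SELECT customer_id, order_date, SUM(amount) AS total
FROM orders
GROUP BY customer_id, order_date  -- every non-aggregated column from the select list repeated here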

Probably for the same reason a C compiler would not automatically assume and insert a variable declaration when you use one that has not been previously declared. There are programming languages which do that sort of thing; SQL is not one of them.
Editors, on the other hand, may be aware of this and at least auto-complete the functionally dependent parts of the syntax for you. Oracle SQL Developer will by default automatically append a GROUP BY clause as soon as it detects you're writing a select column list that needs one. IMO this is a pain, and I usually keep it turned off, but that is as far as you'll get - at the IDE/editor level.
Edit: Based on your last comment, there is an option in MySQL (not Microsoft's T-SQL) meant to relax the rule by implementing optional feature T301 of the SQL:1999 standard. I think this is exactly what you're after:
MySQL 5.7.5 and up implements detection of functional dependence. If the ONLY_FULL_GROUP_BY SQL mode is enabled (which it is by default), MySQL rejects queries for which the select list, HAVING condition, or ORDER BY list refer to nonaggregated columns that are neither named in the GROUP BY clause nor are functionally dependent on them.
Source: https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html
Could not find much information on the status of this feature in future versions of T-SQL, though. The only reference is this, with the very cryptic remark that T-SQL would "partially support this feature".
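To illustrate the functional-dependence rule with hypothetical tables (MySQL 5.7.5+ behaviour): if orders.id is the primary key, MySQL accepts the query below even though customer_name is not listed in the GROUP BY, because it is functionally dependent on the grouped key; T-SQL would still reject it.
SELECT o.id, o.customer_name, SUM(i.amount) AS order_total
FROM orders o
JOIN order_items i ON i.order_id = o.id
GROUP BY o.id  -- customer_name is functionally dependent on the primary key o.id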


Is it possible to prevent the reading and/or setting of a field value with DBIx Class?

I'm working in a project that uses Catalyst and DBIx::Class.
I have a requirement where, under a certain condition, users should not be able to read or set a specific field in a table (e.g. the last_name field in a list of users that will be presented and may be edited by the user).
Instead of applying the conditional logic to every part of the project where that table field is read or set, and risking that existing or future code misses the logic, is it possible to implement the logic directly in the DBIx::Class-based module, so that it never returns or changes the value of that field when the condition is met?
I've been trying to find the answer, and I'm still reading, but I'm somewhat new to DBIx::Class and its documentation. Any help would be highly appreciated. Thank you!
I'd use an around Moose method modifier on the column accessor generated by DBIC.
This won't be a real security solution, as you can still access the data without the Result class, for example when using HashRefInflator.
The same goes for calling get_column.
Real security would be at the database level with column level security and not allowing the database user used by the application to fetch that field.
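As a rough sketch of the database-level approach (assuming PostgreSQL-style column privileges; the table and role names here are hypothetical):
REVOKE SELECT ON users FROM app_user;
GRANT SELECT (id, first_name, email) ON users TO app_user;  -- last_name deliberately not granted
GRANT UPDATE (first_name, email) ON users TO app_user;      -- same idea for writes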
Another solution I can think of is an additional Result class for that table that doesn't include the column, maybe even making it the default and only using the one that includes the column when the user has a special role.

CHECK length-limit constraint on a text field instead of using varchar(n)

SQL, and PostgreSQL 9+ in particular, has many ways to do the same thing... But in many circumstances (see the Context Notes section for a rationale) we need to "cut diversity" and opt for a standard way.
There is a tendency to adopt the text data type instead of varchar. It would be "the standard way to express strings" in PostgreSQL (!), avoiding time lost in project discussions and casts between similar formats...
But how do we use text while preserving a size-limit constraint?
I use CHECK(char_length(field)<N) and have no problem changing the limit in a live environment, so it is perhaps the best way... Is it?
Some variations: in general, what is the best choice?
1. In CREATE TABLE:
1.1. CHECK after the data type, just like a default value definition. Is this the best practice?
1.2. CHECK after all column definitions. Usual for multi-column declarations like CHECK(char_length(col1)<N1 AND char_length(col2)<N2).
1.2.1. Some people also like to express all the individual CHECKs afterwards, so as not to "pollute" the column declarations.
2. Use in a trigger: is there some advantage?
3. Other ways... any other relevant one?
1.1, 1.2, 2 or 3: what is the best practice? (Variations 1.1 and 1.2 are illustrated in the sketch after this list.)
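To make 1.1 and 1.2 concrete (a sketch with hypothetical table and column names):
-- 1.1: column constraint, written right after the data type
CREATE TABLE t1 (
    name text CHECK (char_length(name) < 100)
);
-- 1.2: table constraint, written after all column definitions
CREATE TABLE t2 (
    col1 text,
    col2 text,
    CHECK (char_length(col1) < 100 AND char_length(col2) < 200)
);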
CONTEXT NOTES
In projects and teams with KISS or convention-over-configuration demands, we need "good practice" recommendations... I was looking for one, in the context of CREATE TABLE ... text/varchar and project maintenance. There is no unbiased "good practice" recommendation on the horizon: Stack Overflow votes are the only reasonable record of this kind of recommendation.
Convention scope
(edit) For individual use, of course, as @ConsiderMe commented, "no matter what you choose, as long as you stick with it throughout the entire time there will be no problem with it".
This question, on the other hand, is about "SQL community" or "PostgreSQL community" best-practice conventions.
I like to keep code as short as possible, so I'd go with length(string) in the CHECK constraint. I do not see a particular use for char_length in this case - it just takes up more "code space".
Internally, they both map to textlen anyway.
You should be careful with characters that take more than 1 byte. In that case I would use octet_length. As an example, consider the character ą, which returns 1 when asked for length and 2 when asked for octet_length. It has been a pain doing migrations between database systems with different length enforcement.
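For example (assuming a UTF-8 encoded database):
SELECT length('ą') AS chars, octet_length('ą') AS bytes;  -- returns 1 and 2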
I believe that a good source for "best practices" would be to follow documentation.
It says that a CHECK constraint written inline with a column is a column constraint, which is bound to that particular column.
It also mentions the table constraint, which is written separately from any column definition and can enforce data correctness across several columns.
Basically, in the projects I'm involved in, I follow this rule for readability and maintenance purposes.
I wouldn't even consider creating a trigger for such things. To me they are designed for much more complex tasks; I don't see a reason to enforce simple data-correctness rules in triggers.
I can't think of any other solution that would be as basic as the standard ones and still do its simple job like those mentioned above.
The Depesz article on which this reasoning was based is outdated. The only argument against varchar(N) was that changing N required a table rewrite, and as of Postgres 9.2, this is no longer the case.
This gives varchar(N) a clear advantage, as increasing N is basically instantaneous, while changing the CHECK constraint on a text field will involve re-checking the entire table.
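A rough comparison of the two maintenance operations (hypothetical names, Postgres 9.2+):
-- widening the varchar: metadata-only change, effectively instantaneous
ALTER TABLE t ALTER COLUMN name TYPE varchar(200);
-- replacing the CHECK: the ADD re-checks every row (unless added NOT VALID and validated later)
ALTER TABLE t DROP CONSTRAINT name_len_chk;
ALTER TABLE t ADD CONSTRAINT name_len_chk CHECK (char_length(name) < 200);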

CHECK CONSTRAINT text is changed by SQL Server. How do I avoid or work around this?

When I define a CHECK CONSTRAINT on a table, I find that the condition clause stored can be different from what I entered.
Example:
Alter table T1 add constraint C1 CHECK (field1 in (1,2,3))
Looking at what is stored:
select cc.Definition from sys.check_constraints cc
inner join sys.objects o on o.object_id = cc.parent_object_id
where o.name = 'T1' and cc.name = 'C1';
I see:
([field1]=(3) OR [field1]=(2) OR [field1]=(1))
Whilst these are equivalent, they are not the same text.
(A similar behaviour occurs when using a BETWEEN clause).
My reason for wishing this did not happen is that I am trying to programmatically ensure that all my CHECK constraints are correct, by comparing the text I would use to define the constraint with what is stored in sys.check_constraints - and, if they differ, dropping and recreating the constraint.
However, in these cases, they are always different and so the program would always think it needs to recreate the constraint.
Question is:
Is there any known reason why SQL Server does this translation? Is it just removing a bit of syntactic sugar and storing the clause in a simpler form?
Is there a way to avoid the behaviour (other than to write my constraint clauses in the long form to match what SQL Server would change it to)?
Is there another way to tell if my check constraint is 'out of date' and needs recreating?
Is there any known reason why SQL Server does this translation? Is it just removing a bit of syntactic sugar and storing the clause in a simpler form?
I'm not aware of any reasons documented in Books Online, or elsewhere. However, my guess is that it's normalized for purposes that are internal to SQL Server. It might allow SQL Server to be a bit lenient about how the expression is defined (such as using Database for a column name), while guaranteeing that the column names are always appropriately escaped for whatever engine needs to parse the expression (i.e., [Database]).
Is there a way to avoid the behaviour (other than to write my constraint clauses in the long form to match what SQL Server would change it to)?
Probably not. But if your constraints aren't terribly complicated, is re-writing the constraint clauses in the long form such a bad idea?
Is there another way to tell if my check constraint is 'out of date' and needs recreating?
Before I answer this directly, I'd point out that there's a bit of programming philosophy involved here. The API that SQL Server provides for the text of a CHECK constraint only guarantees that you'll get something equivalent to the original expression. While you could certainly build some fancy methods to try to ensure that you'll always be able to reproduce SQL Server's normalized version of the expression, there's no guarantee that Microsoft won't change its normalization rules in the future. And indeed, there's probably no guarantee that two equivalent expressions will always be normalized identically!
So, I'd first advise you to re-examine your architecture, and see if you can accomplish the same result without having to rely on undocumented API behavior.
Having said that, there are a number of methods outlined in this question (and answer).
Another alternative, which is a bit more brute-force but perhaps acceptable, would be to always assume that the expression is "out of date" and simply drop/re-create the constraint every time you check. Unless you're expecting these constraints to frequently become out of date (or the tables are quite large), this seems a decent solution. You could probably even run it in a transaction, so that if the new constraint is already violated, you simply roll back the transaction and report the error.
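A minimal sketch of that approach, reusing the table and constraint names from the question (if adding the constraint fails because existing rows violate it, roll back instead of committing):
BEGIN TRANSACTION;
ALTER TABLE T1 DROP CONSTRAINT C1;
ALTER TABLE T1 WITH CHECK ADD CONSTRAINT C1 CHECK (field1 IN (1, 2, 3));
COMMIT TRANSACTION;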

Prepared statements on Select statements

I’m just starting to convert all of my site’s code into prepared statements for that extra security cushion but I find myself running into the same questions.
After some reading, I’ve decided to use prepared statements on all select queries; however, I’m not sure whether all of the variables in these queries need to be passed as “parameters” in the prepared statement.
For example:
Where some_column IS NULL
Where some_column = $_SESSION['some-session-var']
Where some_column IN ($someArray)
Also, is there some way to give each condition a “name” rather than using the question mark? I feel like I’ve seen this before in documentation, but I’ve had no luck finding it since.
For example: Where city_name = :cityName. If so, how would I go about binding the parameters here?
Thanks,
Evan
Yes. All data going to the query should be added via placeholders.
Otherwise there will be no security at all.
Prepared statements are quite limited, though, and support only scalar values, so your first and third examples require extra coding (examples can be found in plenty under the tag).
The named placeholders you mentioned belong to PDO; mysqli doesn't support them.
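To sketch what that means for your three conditions (positional placeholders shown; for the IN clause the application has to generate one placeholder per array element):
WHERE some_column IS NULL          -- a constant, nothing to bind
WHERE some_column = ?              -- the session value is bound as one parameter
WHERE some_column IN (?, ?, ?)     -- one placeholder per element of the array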

variable table or column names in a function

I'm trying to search all tables and columns in a database, a la here. The suggested technique is to construct SQL query strings and then EXEC them. This works well, as a stored procedure. (Another example of variable table/column names is here. Again, EXEC is used to execute "dynamic SQL".)
However, my app requires that I do this in a function, not an SP. (Our development framework has trouble obtaining results from an SP.) But in a function, at least on SQL Server 2008 R2, you can't use EXEC; I get this error:
Invalid use of a side-effecting operator 'INSERT EXEC' within a function.
According to the answer to this post, apparently by a Microsoft developer, this is by design; it has nothing to do with the INSERT, only the fact that when you execute dynamically-constructed SQL code, the parser cannot guarantee a lack of side effects. Therefore it won't allow you to create such a function.
So... is there any way to iterate over many tables/columns within a function?
I see from BOL that
The following statements are valid in a function: ...
EXECUTE statements calling extended stored procedures.
Huh - How could extended SP's be guaranteed side-effect free?
But that doesn't help me anyway:
The extended stored procedure, when it is called from inside a
function, cannot return result sets to the client. Any ODS APIs that
return result sets to the client will return FAIL. The extended stored
procedure could connect back to an instance of SQL Server; however, it
should not try to join the same transaction as the function that
invoked the extended stored procedure.
Since we need the function to return the results of the search, an ESP won't help.
I don't really want to get into extended SP's anyway: adding another programming language to the mix would complicate our development environment more than it's worth.
I can think of a few solutions right now, none of which is very satisfactory:
First call an SP that produces the needed data and puts it in a table, then select from a function that merely reads the result from that table; this could be trouble if the search takes a while and two users' searches overlap (see the sketch after this list). Or,
Have the application (not the function) generate a long query naming every table and column name from the db. I wonder if the JDBC driver can handle a query that long. Or,
Have the application (not the function) generate a long series of short queries naming every table and column name from the db. This will make the overall search a lot slower.
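A rough sketch of the first option (all object names hypothetical; it assumes the existing dynamic-SQL search procedure, called dbo.SearchAllColumns here, returns table name, column name and matched value):
CREATE TABLE dbo.SearchResults (TableName sysname, ColumnName sysname, MatchValue nvarchar(max));
GO
CREATE PROCEDURE dbo.RunSearch @term nvarchar(100)
AS
BEGIN
    DELETE FROM dbo.SearchResults;                    -- shared table: two overlapping searches would clash
    INSERT INTO dbo.SearchResults (TableName, ColumnName, MatchValue)
    EXEC dbo.SearchAllColumns @term;                  -- INSERT ... EXEC is allowed in a procedure, just not in a function
END;
GO
CREATE FUNCTION dbo.GetSearchResults()
RETURNS TABLE
AS RETURN (SELECT TableName, ColumnName, MatchValue FROM dbo.SearchResults);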
Thanks for any suggestions.
P.S. Upon further searching, I stumbled across this question which is closely related. It has no answers.
Update: No longer needed
I think this question is still valid, and we may again have a situation where we need it. However, I don't need an answer anymore for the present problem. After much trial-and-error I managed to get our application framework to retrieve row results from the RDBMS via the JDBC driver from the stored procedure. Therefore getting the thing to work as a function is unnecessary.
But if anyone posts an answer here that helps with the stated problem, I will be happy to upvote and/or accept it as appropriate.
An SP is basically a predefined SQL statement with some add-ons.
So if you had (pseudocode):
CREATE PROCEDURE SP_DoSomething
AS
BEGIN
    SELECT * FROM MyTable;
END
and you can't use the SP, then you just execute the underlying SQL directly, as in SELECT * FROM MyTable.
As for that naff SQL code: for a start, you could join the tables view to the columns view with a WHERE clause, which would get rid of that line-by-line IF stuff.
Ask another question, like "How could this be improved?" - there's lots of scope for more attempts than mine.
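Something along the lines of that "join table to column" suggestion (a sketch; the filter on the column name is just an example):
SELECT t.name AS TableName, c.name AS ColumnName
FROM sys.tables t
JOIN sys.columns c ON c.object_id = t.object_id
WHERE c.name LIKE '%customer%';  -- or filter on data types, schemas, etc.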