On page 34 of 70-461 Querying Microsoft SQL Server 2012 it says that an identifier is regular if:
The rules say that the first character must be a letter in the range
A through Z (lower or uppercase), underscore (_), at sign (@), or
number sign (#). Subsequent characters can include letters, decimal
numbers, at sign, dollar sign ($), number sign, or underscore.
However on pg 271 it says:
Even though you can embed special characters such as @, #, and $ in
an identifier for a schema, table, or column name, that action makes
the identifier delimited, no longer regular.
So, to clarify: does having a special character like '$' in an identifier keep it regular or not?
Having $ after the first character is part of the specification that defines a regular identifier and will not require the use of a delimiter.
I found the definition in SQL Server 2008 R2 Identifiers to be clearer than the one from page 34. It is essentially the same as the one on page 271, but with more detail.
Either you have misquoted pg 271 of the book, or your version is different than mine and has an error:
If you embed special characters other than @, #, and $ in an
identifier for a schema, table, or column name, that action makes the
identifier delimited, no longer regular.
Here is a regular expression that will match a string that complies with the definition:
^[\p{Letter}_@#][\p{Letter}\p{Number}_@#$]*$
Regex for flavors without Unicode support:
^[a-zA-Z_@#][a-zA-Z\d_@#$]*$
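As a rough illustration, the ASCII-only form of the rule quoted from page 34 can be tried out in Python. This is only a sketch: the function name `is_regular_identifier` is made up for the example, the pattern ignores Unicode letters, and it does not check for reserved keywords, which SQL Server also excludes from regular identifiers.

```python
import re

# ASCII-only approximation of the "regular identifier" rule:
# first character: letter, underscore, at sign, or number sign;
# later characters: letters, digits, at sign, dollar sign, number sign, underscore.
REGULAR_IDENTIFIER = re.compile(r"[A-Za-z_@#][A-Za-z0-9_@#$]*")


def is_regular_identifier(name: str) -> bool:
    """Return True if the whole name fits the pattern (hypothetical helper)."""
    return REGULAR_IDENTIFIER.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["Sales$2012", "@total", "#temp", "$amount", "Order Details", "2ndTable"]:
        print(candidate, is_regular_identifier(candidate))
```

Under this reading, a `$` is acceptable anywhere except in the first position, while a space or a leading digit makes the name need delimiters.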
Disclaimer:
I have found several examples in this site that address questions/problems similar to mine, though I was unfortunately not able to figure out the modifications that would need to be introduced to fit my needs.
The "Problem":
I have a list of servers (VMs) that have their UUID embedded as part of the name. I need to get rid of that in order to obtain the "pure/clean" server name. Now, the problem is precisely that: I need to get rid of the UUID (which has a very specific and constant format, more details on this below) and ONLY that, nothing else.
The UUID - as you might already know or have noticed - has a specific and constant format which consists of the following parts:
It starts with a dash (-).
Which is followed by a subset of 8 alphanumeric characters (letters are always lowercase).
Which is followed by a dash (-).
Which is followed by a subset of 4 alphanumeric characters (letters are always lowercase).
Which is followed by a dash (-).
Which is followed by a subset of 4 alphanumeric characters (letters are always lowercase).
Which is followed by a dash (-).
Which is followed by a subset of 4 alphanumeric characters (letters are always lowercase).
Which is followed by a dash (-).
Which is followed by a subset of 12 alphanumeric characters (letters are always lowercase).
Samples of results achieved using "my" """"code"""":
In this case the result is the expected one:
echo PRODSERVER0022-872151c8-1a75-43fb-9b63-e77652931d3f | sed 's/-[a-z0-9]*//g'
PRODSERVER0022
In this case the result is the expected one too:
echo PRODSERVER0022-872151c8-1a75-43fb-9b63-e77652931d3f_OLD | sed 's/-[a-z0-9]*//g'
PRODSERVER0022_OLD
Expected result: PRODSERVER0022-OLD
echo PRODSERVER0022-872151c8-1a75-43fb-9b63-e77652931d3f-OLD | sed 's/-[a-z0-9]*//g'
PRODSERVER0022
Expected result: PRODSERVER00-22
echo PRODSERVER00-22-872151c8-1a75-43fb-9b63-e77652931d3f-old | sed 's/-[a-z0-9]*//g'
PRODSERVER00
I know that, within the sed universe, a . means "any character", while a * means "any number of the preceding character". However, what I would need in this case, as I see it at least, is a way to tell sed to do the replacement only if this specific sequence is present (8 alphanumeric characters [any, but specifically 8, not more, not less]; followed by a dash, then followed by 4 alphanumeric characters [any, but specifically 4, not more, not less], etc..). So, the question would be: Is there a regex construction (or a combination [through piping I guess] of several of them, if it has to be the case) that can achieve the expected results in this case?
Note that even though servers may have additional dashes (-) as part of their names, the resulting substrings will never be exactly 8 or exactly 4 characters long. They might, however, be 12 characters long. That would initially match the last group of the UUID, but such a substring will not be at the end of the string, so the two 12-character substrings can still be told apart (and it will not be a problem at all if there is a regex that can get rid of the UUID as a whole).
Try this to match the UUID.
-[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}
Embed it in the sed command line in the usual way. As Benjamin W. has said, we need to use extended regular expressions.
sed -E 's/-[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}//g'
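For anyone who wants to check the pattern outside of sed, here is a small Python sketch that applies the same expression to the sample names from the question. The helper name `strip_uuid` is just for illustration.

```python
import re

# Same pattern as in the sed command: a dash followed by the
# 8-4-4-4-12 groups of lowercase hexadecimal characters.
UUID_PART = re.compile(
    r"-[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}"
)


def strip_uuid(name: str) -> str:
    """Remove every embedded UUID (with its leading dash) from name."""
    return UUID_PART.sub("", name)


if __name__ == "__main__":
    samples = [
        "PRODSERVER0022-872151c8-1a75-43fb-9b63-e77652931d3f",
        "PRODSERVER0022-872151c8-1a75-43fb-9b63-e77652931d3f_OLD",
        "PRODSERVER0022-872151c8-1a75-43fb-9b63-e77652931d3f-OLD",
        "PRODSERVER00-22-872151c8-1a75-43fb-9b63-e77652931d3f-old",
    ]
    for sample in samples:
        print(strip_uuid(sample))
```

Because every group has a fixed length, parts of the server name such as `-22` or `-old` are left alone: they are never followed by the full 8-4-4-4-12 sequence.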
Normally, for simple character strings, a leading backtick does the trick.
Example: `abc
However, if the string has some special characters, such as space, this will not work.
Example: `$"abc def"
Example: `$"BAT-3Kn.BK"
What are the rules for when `$"" is required?
Simple syntax for symbols can be used when the symbol consists of alphanumeric characters, dots (.), colons (:), and (non-leading) underscores (_). In addition, slashes (/) are allowed when there is a colon before them. Everything else requires the `$"" syntax.
The book 'Q for mortals', which is available online, has a section discussing datatypes. For symbols it states:
A symbol can include arbitrary text, including text that cannot be
directly entered from the console – e.g., embedded blanks and special
characters such as back-tick. You can manufacture a symbol from any
text by casting the corresponding list of char to a symbol. (You will
need to escape special characters into the string.) See §6.1.5 for
more on casting.
q)`$"A symbol with blanks and `"
`A symbol with blanks and `
The essential takeaway here is that converting a string to a symbol is required when special characters are involved. In the examples you have given both space " " and hyphen "-" are characters that cannot be directly placed into a symbol type.
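To make the rule from the first answer concrete, here is a Python sketch that encodes it literally: alphanumerics, dots, colons, non-leading underscores, and slashes only after a colon. It is only a reading of that answer, not a port of the q parser, so edge cases may differ from what q actually accepts. The function name `needs_string_cast` is invented for the example.

```python
def needs_string_cast(text: str) -> bool:
    """Return True if text would need the `$"..." form under the stated rule.

    Assumption: the rule is exactly as worded in the answer; the real q
    tokenizer may treat some edge cases differently.
    """
    seen_colon = False
    for index, char in enumerate(text):
        if (char.isascii() and char.isalnum()) or char == ".":
            continue
        if char == ":":
            seen_colon = True
            continue
        if char == "_" and index > 0:
            continue  # underscores are fine unless they lead
        if char == "/" and seen_colon:
            continue  # slashes are fine once a colon has appeared
        return True
    return False


if __name__ == "__main__":
    for text in ["abc", "abc def", "BAT-3Kn.BK", ":/tmp/file", "_abc"]:
        print(repr(text), needs_string_cast(text))
```

Both examples from the question are flagged, the first because of the space and the second because of the hyphen.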
Googling I've found this DB2 Function declaration:
CREATE FUNCTION QGPL.SPLIT (
@Data VARCHAR(32000),
@Delimiter VARCHAR(5)
)
What does the @ symbol before the variable name mean?
Regards,
Pedro
The @ character is simply the first character of the SQL identifier [variable name] naming the parameter defined for the arguments of the User Defined Function (UDF); slightly reformatted [because at first glance I thought that revision might make the at-symbols appear more conspicuously to be part of the name, though now I think probably not]:
CREATE FUNCTION QGPL.SPLIT
( @Data VARCHAR(32000)
, @Delimiter VARCHAR(5)
) returns ...
Put simply, the use of the @ character in an identifier is highly discouraged. Although such variant characters are supported in standard object naming, they can cause great pain and difficulties, including some that are insurmountable:
http://www.ibm.com/support/knowledgecenter/api/content/ssw_ibm_i_71/db2/rbafzch2iden.htm
Identifiers
An identifier is a token used to form a name. An identifier in an SQL statement is an SQL identifier, a system identifier, or a host identifier.
Note: $, @, #, and all other variant characters should not be used in identifiers because the code points used to represent them vary depending on the CCSID of the string in which they are contained. If they are used, unpredictable results may occur. [...]
[Edit-addendum 17May2015]
http://www.ibm.com/support/knowledgecenter/api/content/nl/en-us/SSEPGG_10.5.0/com.ibm.db2.luw.admin.dbobj.doc/doc/c0004625.html
Naming rules in a multiple national language environment
The basic character set that can be used in database names consists of the single-byte uppercase and lowercase Latin letters (A…Z, a…z), the Arabic numerals (0…9) and the underscore character (_).
This list is augmented with three special characters (@, #, and $) to provide compatibility with host database products. Use special characters @, #, and $ with care in a multiple national language environment because they are not included in the multiple national language host (EBCDIC) invariant character set. Characters from the extended character set can also be used, depending on the code page that is being used. If you are using the database in a multiple code page environment, you must ensure that all code pages support any elements from the extended character set you plan to use.
[...]
[/Edit-addendum 17May2015]
What are the rules for naming methods and variables in Scala, especially when mixing symbols and letters using _? For instance, why are _a_, a_+, __a, __a__a__a__+, ___ valid names, but _a_+_a or _a_+_ are not?
It's in the very first section of the Scala Language Specification:
There are three ways to form an identifier. First, an identifier can start with a letter which can be followed by an arbitrary sequence of letters and digits. This may be followed by underscore ‘_‘ characters and another string composed of either letters and digits or of operator characters.
It's not entirely clear from this, but the operator characters cannot be followed by anything else. Seen here (the pattern for the end of the identifier):
idrest ::= {letter | digit} [‘_’ op]
_a_+_a and _a_+_ are illegal because they have another letter or underscore following the operator characters. However, they are legal if you surround them with back quotes.
scala> val `_a_+_` = 1
_a_+_: Int = 1
scala> val `_a_+_a` = 1
_a_+_a: Int = 1
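The `idrest` rule can be approximated with a regular expression to see why the names from the question fall on either side. The sketch below is in Python and makes several assumptions: it is ASCII-only, it treats `_` and `$` as letters (as the spec's lexical grammar does), the operator-character set is a hand-picked subset, and it does not exclude reserved words or the lone `_` token.

```python
import re

# Assumed ASCII subset of Scala's operator characters.
OP = r"[!#%&*+\-/:<=>?@\\^|~]"
# The spec counts '_' and '$' as letters.
LETTER = r"[A-Za-z_$]"

# letter {letter | digit} ['_' op]   or   op
PLAIN_ID = re.compile(rf"{LETTER}(?:{LETTER}|[0-9])*(?:_{OP}+)?|{OP}+")


def is_plain_identifier(name: str) -> bool:
    """Return True if name fits the approximated plain-identifier grammar."""
    return PLAIN_ID.fullmatch(name) is not None


if __name__ == "__main__":
    for name in ["_a_", "a_+", "__a", "__a__a__a__+", "___", "_a_+_a", "_a_+_"]:
        print(name, is_plain_identifier(name))
```

The optional `_` plus operator group sits at the very end of the pattern, which is exactly why nothing may follow the operator characters.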
From here:
There are three ways to form an identifier. First, an identifier can
start with a letter which can be followed by an arbitrary sequence of
letters and digits. This may be followed by underscore ‘_‘ characters
and another string composed of either letters and digits or of
operator characters. Second, an identifier can start with an operator
character followed by an arbitrary sequence of operator characters.
The preceding two forms are called plain identifiers. Finally, an
identifier may also be formed by an arbitrary string between
back-quotes (host systems may impose some restrictions on which
strings are legal for identifiers). The identifier then is composed of
all characters excluding the backquotes themselves.
You can also see in the link the grammar of the language.
I need to bookmark parts of a document from the name of paragraphs but the name of a paragraph is not always a valid name for a bookmark name. I have not found on Google or MSDN an exhaustive list of limitations for bookmark names.
What special characters are forbidden?
The only thing I found is that the length must not exceed 40 characters.
If you are familiar with regular expressions, I would say it is
^(?!\d)\w{1,40}$
Where \w refers to the range of Unicode word characters, which also contain the underscore and the digits from 0-9.
In plain English: The bookmark name must...
be between 1 and 40 characters long
consist of any combination of Unicode letters, digits, underscores
not start with a digit
not contain any kind of white space or punctuation
As stated in the comments, bookmark names beginning with an underscore are treated as hidden. They will not appear in the regular user interface, but they can be used from VBA code. It is not possible to create bookmarks that begin with an underscore via the regular user interface, but you can do it through VBA code with Bookmarks.Add().
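Since the goal was to derive bookmark names from paragraph names, here is a Python sketch that validates a name against the pattern above and turns arbitrary text into a candidate name. Both function names and the "B" prefix are made up for the example, and Python's `\w` is assumed to be close enough to what Word accepts; that has not been verified against Word itself.

```python
import re

# The pattern from the answer, applied to the whole string via fullmatch.
BOOKMARK_NAME = re.compile(r"(?!\d)\w{1,40}")


def is_valid_bookmark_name(name: str) -> bool:
    """Check a name against the 1-40 word characters, no leading digit rule."""
    return BOOKMARK_NAME.fullmatch(name) is not None


def to_bookmark_name(paragraph: str) -> str:
    """Derive a candidate bookmark name from paragraph text (hypothetical helper)."""
    # Collapse runs of non-word characters into one underscore and drop
    # underscores at the ends so the bookmark is not treated as hidden.
    cleaned = re.sub(r"\W+", "_", paragraph).strip("_")
    # Prefix with a letter if the result is empty or starts with a digit.
    if not cleaned or re.match(r"\d", cleaned):
        cleaned = "B" + cleaned
    return cleaned[:40]


if __name__ == "__main__":
    for text in ["1. Introduction & Scope", "Summary", "   "]:
        name = to_bookmark_name(text)
        print(repr(text), "->", name, is_valid_bookmark_name(name))
```

Note that two different paragraphs can map to the same name after cleaning and truncation, so a real script would also need to handle collisions.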