Some people in my company have gone to great lengths to remove & characters from data and measure names in our Tabular models. I wasn't around when they made this decision, but it destroys readability in our financial reporting. Instead of R&D and SG&A in our statements, we have RD and SGA.
The offenders are no longer around to answer for their crimes. I am trying to convince my co-workers to re-add the &, but they won't budge without some idea why this was done in the first place. My best guess is that a consultant told them not to use & in models. I think they meant in object names only, but our team got carried away. I've been able to find this page that says & was a reserved character in Compatibility Level 1100, but that goes back to SQL Server 2012! I think our lowest environment is SSAS 2017.
Am I missing anything or can we re-add & to our data and measure names? Is there any reason you would avoid the use of & anywhere in an SSAS tabular model? Links to documentation appreciated!
https://learn.microsoft.com/en-us/analysis-services/multidimensional-models/olap-physical/object-naming-rules-analysis-services?view=asallproducts-allversions
Exceptions: When Reserved Characters are Allowed
As noted, databases of a specific modality and compatibility level can have object names that include reserved characters. Dimension attribute, hierarchy, level, measure and KPI object names can include reserved characters, for tabular databases (1103 or higher) that allow the use of extended characters:
Server mode and database compatibility level Reserved characters allowed?
Databases can have a ModelType of default. Default is equivalent to multidimensional, and thus does not support the use of reserved characters in column names.
Related
We're developing a REST API for our platform. Let's say we have organisations and projects, and projects belong to organisations.
After reading this answer, I would be inclined to use numerical ID's in the URL, so that some of the URLs would become (say with a prefix of /api/v1):
/organisations/1234
/organisations/1234/projects/5678
However, we want to use the same URL structure for our front end UI, so that if you type these URLs in the browser, you will get the relevant webpage in the response instead of a JSON file. Much in the same way you see relevant names of persons and organisations in sites like Facebook or Github.
Using this, we could get something like:
/organisations/dutchpainters
/organisations/dutchpainters/projects/nightwatch
It looks like Github actually exposes their API in the same way.
The advantages and disadvantages I can come up with for using names instead of IDs for URL definitions, are the following:
Advantages:
More intuitive URLs for end users
1 to 1 mapping of front end UI and JSON API
Disadvantages:
Have to use unique names
Have to take care of conflict with reserved names, such as count, so later on, you can still develop an API endpoint like /organisations/count and actually get the number of organisations instead of the organisation called count.
Especially the latter one seems to become a potential pain in the rear. Still, after reading this answer, I'm almost convinced to use the string identifier, since it doesn't seem to make a difference from a convention point of view.
My questions are:
Did I miss important advantages / disadvantages of using strings instead of numerical IDs?
Did Github develop their string-based approach after their platform matured, or did they know from the start that it would imply some limitations (like the one I mentioned earlier, it seems that they did not implement such functionality)?
It's common to use a combination of both:
/organisations/1234/projects/5678/nightwatch
where the last part is simply ignored but used to make the url more readable.
In your case, with multiple levels of collections you could experiment with this format:
/organisations/1234/dutchpainters/projects/5678/nightwatch
If somebody writes
/organisations/1234/germanpainters/projects/5678/wanderer
it would still map to the rembrandt, but that should be ok. That will leave room for editing the names without messing up url:s allready out there. Also, names doesn't have to be unique if you don't really need that.
Reserved HTTP characters: such as “:”, “/”, “?”, “#”, “[“, “]” and “#” – These characters and others are “reserved” in the HTTP protocol to have “special” meaning in the implementation syntax so that they are distinguishable to other data in the URL. If a variable value within the path contains one or more of these reserved characters then it will break the path and generate a malformed request. You can workaround reserved characters in query string parameters by URL encoding them or sometimes by double escaping them, but you cannot in path parameters.
https://www.serviceobjects.com/blog/path-and-query-string-parameter-calls-to-a-restful-web-service
Numerical consecutive IDs are not recommended anymore because it is very easy to guess records in your database and some might use that to obtain info they do not have access to.
Numerical IDs are used because the in the database it is a fixed length storage which makes indexing easy for the database. For example INT has 4 bytes in MySQL and BIGINT is 8 bytes so the number have the same length in memory (100 in INT has the same length as 200) so it is very easy to index and search for records.
If you have a lot of entries in the database then using a VARCHAR field to index is a bad idea. You should use a fixed width field like CHAR(32) and fill the difference with spaces but you have to add logic in your program to treat the differences when searching the database.
Another idea would be to use slugs but here you should take into consideration the fact that some records might have the same slug, depends on what are you using to form that slug. https://en.wikipedia.org/wiki/Semantic_URL#Slug
I would recommend using UUIDs since they have the same length and resolve this issue easily.
I have a large number of strings containing a product name and a few other properties (size, volume, age, etc). But the strings are not standardized at all. Product names might be misspelled, volume might be in a different notation (0.5l, 1/2 liter, 500ml, etc). The number of variations is limited though, there are for instance only a few hundred products. What tools can I use to analyze each string and tell me if it contains certain tokens? My guess is that some sort of learning mechanism would be useful, but I'm not sure which tools would offer just that. I've looked at ElasticSearch, but I'm not sure if that's the way to go. All my data is currently in a PostgreSQL db and I've looked at pg_grm as well. Again, not sure if that fits my need.
One solution I've been thinking about is maintaining a list of proper keywords and, per string, see if the string contains any of the keywords. I'm not sure if this would work and, if it would, how to efficiently and effectively implement it in postgresql
EDIT
Here are a few example lines I'm trying to extract keywords from:
wine Bardolo red 1L 12b 12%
La Tulipe, 13* box 3 bottles, 2005
Great Johnny Walker 7CL 22% red label
Wisky Jonny Walken .7 Red limited editon
I've done quite some searching by now but have yet to find a proper way to solve this problem.
I've used pg_trgm extension for similar task (I was comparing misspelled address lines and company names) along with clustering algorithm (may be not needed in your case).
It's done it's job with some data preparations (regexp replacements).
May be not very easy but I'm sure it's possible to solve your problem too. And index support in pg_trgm is great.
I am wary of using quoted identifiers in PostgreSQL when my "business" table or column names happen to be reserved words. Are there any standard or best-practice mangling approaches? I would like to have a consistent mangling mechanism as there's also some code-generation in place and need to be able to automatically translate back and forth between mangled names used in PostgreSQL and non-mangled names used in code.
The simplest, and probably best, way to "mangle" terms to guarantee all names are not reserved words is to prefix them with an underscore character.
The best way is to not use reserved words - let the DBA chose the names for any clashes between business terms and DB terms.
Also consider terms that might clash with reserved words in your application language.
I like to reserve the underscore prefix (_name) for parameters and variables in server-side functions to rule out conflicts. I have seen that a lot lately among people working with functions.
For identifiers of database objects I pick the names by hand. Working with non-english terms (reduced to ASCII letters) avoids most of the conflicts.
Another alternative is to use the plural form for table names, since most reserved words are in singular form (a few exceptions!).
Most 1-letter prefixes would do the job without exception. Like: n_name. But since I hand-pick identifiers, there is no need for automation.
Anything is good that avoids the error-prone need for double-quotes. I never use reserved words even if Postgres would allow them.
I recently found myself using some rather lengthy names for the tables and views involved in a development piece, which got me wondering whether it's possible to create client/database/server level aliases for objects.
Say for example I have a view named dbo.vAlphaBetaGammaDelta . Is there a way (with or without Intellisense) to create a reference to it named dbo.vABGD ?
If not, would there be any downfalls to creating a view of a view or single table aside from maintenance necessary if/when the table schema changes?
I should note that these aliases/views would not be intended for use in other objects, but for alleviation and prevention of carpal tunnel during day-to-day troubleshooting and delving xD
SQL Server allows for the creation of synonyms. That seems to be what you are looking for: http://msdn.microsoft.com/en-us/library/ms177544.aspx
However, as #MitchWheat mentioned. this seems to be going in the wrong direction. There are a few quite good SSMS plugins available that provide auto completion of long object names (e.g. SQL Prompt). Incidentally those products have trouble with synonyms...
There are many cases where you would like to have synonyms.
Let me state just one for start:
You have a well defined hypothetical name of a table: GlobalStatisticalRecord. Hundreds of lines of code and objects (keys, indexes, etc.) in SQL and elsewhere are referring to this table name.
After 5 years of usage, the abbreviation GSR was accustomed not only among the technical people, but also among the business users. So, to stress again, GSR is now even more recognizable than GlobalStatisticalRecord. However, for the new people that come into the technical team, it is good to keep the name GlobalStatisticalRecord as a table name, since it nicely describes what is the table all about. Now, when writing a quick adhoc query - and that may not be from your tool of choice with all the Intellisense features you are accustom to - then these aliases are really saving your time (and "life" at 2am in the morning when you are frantically trying to diagnose a production problem).
Please, if you never faced a case when you would need this, just don't assume that there is none.
I stressed the adhoc adjective, since I agree that in permanent queries (stored procedures, etc.), for the reasons you pointed out, it is advisable to use the full table names.
Recently there was a big debate during a code reveiw session on the use of constants.
The developers had used constants for the following purposes:
Each and every message key used in the i18N application was declared as a constant. The application contained around 3000 message keys and hence the same number of constants.
Each and every database column name was declared as a constant. There were around 5000 column names and still counting..
Does it make sense to have such a huge number of constants in any application?
IMHO, common sense should prevail. Message keys just don't need to be declared as constants. We already have one level of indirection - why add one more?
Reg. database column names, I have mixed opinions. If a column is being used in multiple classes, does it make sense to declare it as a global constant?
Please pour in with your thoughts...
If I18N message keys aren't defined as constants, how do you enforce consistency? How do you automatically differentiate between a typo and a missing value? How do you audit to make sure that all I18N keys are fulfilled in each new language file?
As to database columns, you could definitely use some indirection - if your application knows about column names, you've got a binding problem. So there, you might consider a config file with the actual column names - but of course, you would want to refer to the column names by symbolic keys, which should be defined as auditable constants, just like the I18N keys.
I think is a good practice to put message keys used for i18N as constants.
I don't see much benefits in doing the same for the DB columns, if you have a well designed persistence layer.
This depends on the programming language, I think.
In PHP it's not uncommon to ude defines aka contants for such things, while I'd not use this in Java or C#.
In most projects we tried to extract the SQL to templates, so not only the table and column names were configurable but the whole sql statement. We used velocity for basic templating mechanics like variables, small loops,...
Regarding the language constants:
Another layer doesn't make much sense to me, but you hav eto choose your identifiers for the language translation carefully. Using the whole english sentence as key may end up in a lot of work for the translators if you fix the wording for example in the english sentence without changing the meaning. So all translators would have to update their files.
If the constant is used in multiple places and the compiler really catches the problem, yes.