What is the easiest way to compare cube data with data mart data? - TSQL

I was trying to compare the cube's data with that of the data mart using a T-SQL query. It's hard for me to find where the columns used in the cube are coming from: they come from the data source view, but how do I know which underlying table each one maps to?
Help appreciated

Are you familiar with SQL Server Business Intelligence? Very generically speaking, the "columns" that you're seeing are coming from the fact table(s), sliced across one or more dimensions (as defined by the dimension tables).
Your best bet is to inspect the cube via SQL Server Business Intelligence Development Studio (BIDS), if you know how to use it. This link should get you started if you're a little shaky with using BIDS to design SSAS cubes.
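Once you know which fact table a measure group points at (the measure group's properties in BIDS, or the DSV, will tell you the source table), a simple T-SQL aggregation over that table is usually enough to reconcile against the cube browser. A minimal sketch, assuming hypothetical FactSales/DimDate tables - substitute whatever your DSV actually maps to:

-- Hypothetical fact/dimension names; replace with the tables your DSV points at.
SELECT d.CalendarYear,
       SUM(f.SalesAmount) AS TotalSales
FROM dbo.FactSales AS f
JOIN dbo.DimDate   AS d ON d.DateKey = f.DateKey
GROUP BY d.CalendarYear
ORDER BY d.CalendarYear;

Browse the cube by the same attribute (Calendar Year here) and the totals should match row for row; where they don't, you've found the table and column to dig into.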

Related

Database design: Postgres or EAV to hold semi-structured data

I was given the task of deciding whether our technology stack is adequate to complete the project at hand or whether we should change it (and, if so, to which technologies exactly).
The problem is that I'm just a SQL Server DBA and I have a few days to come up with a solution...
This is what our client wants:
They want a web application to centralize pharmaceutical research studies, separated into topics (or projects, in their jargon). These studies are sent as CSV files and are roughly structured as follows:
Project (just a name for the project)
Segment (could be behavioral, toxicology, etc. There is a finite set of about 10 segments; each CSV file holds one segment)
Mandatory fixed fields (a small set of fields that are always present, like Date, subject IDs, etc. These will be the PKs).
Dynamic fields (could be anything here, but always as a key/value pair, and there shouldn't be more than 200 fields)
Whatever files (images, PDFs, etc.) that are associated with the project.
At the moment, they just want to store these files and retrieve them through a simple search mechanism.
They don't want to crunch the numbers at this point.
98% of the files have a couple of thousand lines, but there's a 2% with a couple of million rows (and around 200 fields).
This is what we are developing so far:
The back-end is SQL Server 2008 R2. I've designed EAVs for each segment (before anything else, please keep in mind that this is not our first EAV design; it worked well before with less data), and the mid-tier/front-end is PHP 5.3 with the Laravel 4 framework and Bootstrap.
The issue we are experiencing is that PHP chokes on the big files. It can't insert into SQL in a timely fashion when there are more than 100k rows, because there's a lot of pivoting involved and, on top of that, PHP needs to fetch all the field IDs first before it can start inserting.
I'll explain: this is necessary because the client wants some control over the field names. We created a repository of all the possible fields to try to minimize ambiguity problems; fields named, for instance, "Blood Pressure", "BP", "BloodPressure" or "Blood-Pressure" should all be stored under the same name in the database. So, to minimize the issue, the user has to insert his CSV fields into another table first, which we called the properties table. This doesn't completely solve the problem, but as he's inserting the fields he sees possible matches already inserted: when the user types in "blood", a panel shows all the fields already used that contain the word blood. If the user decides it's the same thing, he has to change the CSV header to the existing field name.
Anyway, all this is to explain that it's not a simple EAV structure and there's a lot of back and forth of IDs.
This issue is giving us second thoughts about our technology stack, but we have limitations on our possible choices: I have only worked with relational DBs so far (only SQL Server, actually) and the other guys know only PHP. I guess a full MS stack is out of the question.
It seems to me that a non-SQL approach would be best. I read a lot about MongoDB but honestly, I think it would be a super steep learning curve for us, and if they want to start crunching the numbers or even to have some reporting capabilities, I guess Mongo wouldn't be up to that. I'm reading about PostgreSQL, which is relational, and its well-known hstore type. So here is where my questions start:
Would you guys think that Postgres would be a better fit than SQL Server for this project?
Would we be able to convert the csv files into JSON objects or whatever to be stored into HStore fields and be somewhat queryable?
Are there any issues with Postgres sitting on a Windows box? I don't think our client has Linux admins. Nor do we, for that matter...
Is its licensing free for commercial applications?
Or should we stick with what we have and try to sort the problem out with staging tables, bulk inserts, or another technique that relies on the back-end to do the heavy lifting? (Rough sketch below.)
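For reference, the kind of back-end-heavy approach I mean looks roughly like this. All the table, column and file names below are placeholders, and it assumes the file has already been unpivoted to key/value rows (a wide staging table plus UNPIVOT would be the alternative):

-- Hypothetical names throughout (stg_Segment, C:\inbox\batch.csv, Properties, eav_Segment).
CREATE TABLE dbo.stg_Segment (
    SubjectId   varchar(50)   NOT NULL,
    MeasureDate date          NOT NULL,
    FieldName   varchar(200)  NOT NULL,
    FieldValue  varchar(4000) NULL
);

BULK INSERT dbo.stg_Segment
FROM 'C:\inbox\batch.csv'
WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');

-- Resolve field names against the properties repository in one set-based pass,
-- instead of PHP fetching every field ID before it can insert.
INSERT INTO dbo.eav_Segment (SubjectId, MeasureDate, PropertyId, FieldValue)
SELECT s.SubjectId, s.MeasureDate, p.PropertyId, s.FieldValue
FROM dbo.stg_Segment AS s
JOIN dbo.Properties  AS p ON p.PropertyName = s.FieldName;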
Sorry for the long post and thanks for your input guys, I appreciate all answers as I'm pulling my hair out here :)

Difference between BIRT data cubes and Mondrian (Pentaho) data cubes

I have used the Eclipse BIRT data cube but didn't gain any performance improvement over a huge data set. Would I get any performance improvement if I used Mondrian data cubes?
What is the main difference between the Eclipse BIRT data cube and Mondrian data cubes?
Different engine. Mondrian cubes come from Pentaho. A ROLAP cube won't necessarily give you an immediate huge improvement - what is it you're looking for? The main advantages of cubes are caching (which will obviously help - but is no use when cold) and the ability to process the data in a businesslike way.
That said, Mondrian is very good and has a lot of performance-related features - especially aggregate (agg) tables. Mondrian is also used in a lot of other BI suites and follows industry standards (olap4j/MDX/XMLA, etc.)
Perhaps if you have a specific data problem, more info on that could help devise a solution!
You may also want to look at an analytic DB; combine Mondrian or any cube with one of those and you'll fly.
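To give a flavour of what aggregate tables buy you (all names below are made up), the idea is a pre-summarised copy of the fact table at a coarser grain, which the engine reads instead of re-summing millions of raw rows. Roughly, in SQL:

-- Hypothetical fact_sales table; one aggregate row per product per month.
SELECT product_id,
       time_month_id,
       SUM(sales_amount) AS sales_amount,
       COUNT(*)          AS fact_count
INTO agg_sales_product_month
FROM fact_sales
GROUP BY product_id, time_month_id;

The aggregate table then typically has to be declared in the Mondrian schema so the engine can route matching queries to it.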

Blank measures in SSAS cube even when data is in the source table

I've been working on a cube that has data in the fact table that is browsable through the DSV, but after being processed it seems to have blank values for all the measures. The common suggestions include checking that the CALCULATE command is still visible in the Calculations tab (it is there, and it's not commented out or anything) and checking that there's actually data for the cube (there is). The partitions are set for the whole table, so it's not like data is being filtered out. Interestingly, if I redeploy the same SSAS project, and therefore effectively the same code, to a brand new SSAS database, then it's fine (the measures show up). I'm wondering if someone can throw some light on why this behaviour occurs?
Any help much appreciated!
I would suggest checking the cube size first. You can use the following recipe: http://www.ssas-info.com/analysis-services-scripts/1197-powershell-script-to-list-info-about-ssas-databases
Depending on the settings, it is possible that you have missing/mismatched dimension keys. That can cause your data not to be loaded.
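A quick sanity check for that (fact and dimension names here are placeholders; repeat for each dimension key in the measure group) is to look for fact rows whose keys have no match in the dimension table:

-- Hypothetical FactSales/DimCustomer names; any rows returned here are orphans.
SELECT f.CustomerKey, COUNT(*) AS OrphanRows
FROM dbo.FactSales AS f
LEFT JOIN dbo.DimCustomer AS d ON d.CustomerKey = f.CustomerKey
WHERE d.CustomerKey IS NULL
GROUP BY f.CustomerKey;

Depending on the cube's key error configuration, such rows can be silently discarded during processing, which would leave the measures looking blank even though the source table has data.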

Why build an SSAS Cube?

I was just searching for the best explanations of and reasons for building an OLAP cube from relational data. Is it all about performance and query optimization?
It would be great if you could give links or point out the best explanations and reasons for building a cube. We can do everything from the relational database that we can do from the cube, and the cube is just faster at showing results. Are there any other explanations or reasons?
There are many reasons why you should use a cube for analytical processing.
Speed. OLAP warehouses are read-only infrastructures providing queries around 10 times faster than their OLTP counterparts. See wiki.
Multiple data integration. With a cube you can easily use multiple data sources and do minimal work, with many automated tasks (especially when you use SSIS), to integrate them into a single analysis system. See the ELT process.
Minimum code. That is, you need not write queries. Even though you can write MDX - the language of cubes in SSAS - BI Studio does most of the hard work for you. On a project I am working on, at first we used SSRS to provide reports for the client. The queries were long and hard to write and took days to implement. Their SSAS equivalent reports took us half an hour to make, writing only a few simple queries to transform some data.
A cube provides reports and drill up/down/through without the need to write additional queries. The end user can traverse the dimensions automatically, as the aggregations are already stored in the warehouse. This helps, as the users of the cube need only traverse its dimensions to produce their own reports, without writing queries.
It is part of Business Intelligence. When you build a cube it can be fed to many new technologies and help in the implementation of BI solutions.
I hope this helps.
If you want a top level view, use OLAP. Say you have millions of rows detailing product sales and you want to know your monthly sales totals.
If you want bottom-level detail, use OLTP (e.g. SQL). Say you have millions of rows detailing product sales and want to examine one store's sales on one particular day to find potential fraud.
OLAP is good for big numbers. You wouldn't use it to examine string values, really...
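To make the monthly-sales example concrete (table names are illustrative), the cube is essentially pre-computing and caching an aggregation like this, so nobody has to run it against millions of rows at query time:

-- Hypothetical FactSales/DimDate star schema.
SELECT d.CalendarYear,
       d.MonthNumberOfYear,
       SUM(f.SalesAmount) AS MonthlySales
FROM dbo.FactSales AS f
JOIN dbo.DimDate   AS d ON d.DateKey = f.DateKey
GROUP BY d.CalendarYear, d.MonthNumberOfYear;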
It's a bit like asking why use Java/C++ when we can do everything with assembly language ;-) Building a cube (apart from performance) gives you the MDX language; this language has higher-level concepts than SQL and is better suited to analytic tasks. Perhaps this question gives more info.
My 2 centavos.

SSAS - Where do you start using it?

I'm in the middle of designing an SSAS db. I get the theory and the use of this stuff. Here's the thing: I've got a logging database that logs interesting order statuses for which I would like to measure the time to complete. I've got these tables (not implemented; rough DDL sketch below) to measure status times:
time_dimension
user_dimension
status_dimension
status_fact - dimension references and timeInStatus measure
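Roughly, the DDL I have in mind is the following (column names are placeholders; the grain of the fact is one row per order status change):

CREATE TABLE dbo.time_dimension (
    time_key      int  PRIMARY KEY,
    calendar_date date NOT NULL
);
CREATE TABLE dbo.user_dimension (
    user_key  int          PRIMARY KEY,
    user_name varchar(100) NOT NULL
);
CREATE TABLE dbo.status_dimension (
    status_key  int         PRIMARY KEY,
    status_name varchar(50) NOT NULL
);
CREATE TABLE dbo.status_fact (
    time_key     int NOT NULL REFERENCES dbo.time_dimension(time_key),
    user_key     int NOT NULL REFERENCES dbo.user_dimension(user_key),
    status_key   int NOT NULL REFERENCES dbo.status_dimension(status_key),
    timeInStatus int NOT NULL  -- e.g. seconds spent in the status
);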
So my question is, do I create a regular database and stage these things up for an SSIS task to pull into an SSAS db, or do I just create an SSAS db and describe the regular db with SSAS?
Naturally I'm new at this, but this type of analysis has been an interest of mine for a looong time! Your help is appreciated.
If your source DB (the logging one) is really nicely normalized around the data you need, you can probably get away without the stage.
Performance may suffer, development may suffer, etc. I think a DW (staging) db is almost a necessity to fully leverage SSAS, though...