SQL Server partitioned table - sql-server-2008-r2

I am preparing for the 70-451 exam. Here is a question I came across:
You are a database developer. You plan to design a database solution by using SQL Server 2008. The database will contain a table named Claims. The Claims table will contain a large amount of data. You plan to partition the data into the following categories:
Open claims
Claims closed before January 1, 2005
Claims closed between January 1, 2005 and December 31, 2007
Claims closed from January 1, 2008 till date
The close_date field in the Claims table is a date data type and is populated only if the claim has been closed. You need to design a partition function to segregate records into the defined categories.
What should you do?
A Create a RANGE RIGHT partition function by using the values 20051231, 20071231, and 20080101.
B Create a RANGE RIGHT partition function by using the values 20051231, 20071231, and NULL.
C Create a RANGE LEFT partition function by using the values 20051231, 20071231, and 20080101.
D Create a RANGE LEFT partition function by using the values 20051231, 20071231, and NULL.
Can someone answer this?

I've looked at this a few times, and I can't see any of them being right.
The partition for claims closed before Jan 1, 2005 is not generated by any of them, since the first partition value in every answer is 20051231. Whether LEFT or RIGHT is used is then immaterial: every value up to 31 Dec 2005 falls in a single partition, and LEFT/RIGHT only determines whether that boundary date itself is included.
I would have expected a LEFT with 20041231, or a RIGHT with 20050101, to be in the mix somewhere.
If the answers all started with 20041231 instead of 20051231, then I would take answer D as correct. Either the question has a typo, or the test does.
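For anyone who wants to see it concretely, here is a minimal sketch of what answer D would look like with that corrected first boundary (the function name and the use of the date type are my own, not from the question); a NULL boundary with RANGE LEFT puts the NULL close_date rows, i.e. the open claims, into the first partition:
CREATE PARTITION FUNCTION pfClaims (date)
AS RANGE LEFT FOR VALUES (NULL, '20041231', '20071231');
-- Partition 1: close_date IS NULL         -> open claims
-- Partition 2: close_date <= 2004-12-31   -> closed before January 1, 2005
-- Partition 3: 2005-01-01 to 2007-12-31   -> closed between January 1, 2005 and December 31, 2007
-- Partition 4: close_date >= 2008-01-01   -> closed from January 1, 2008 till date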

I had the exam this week and this question came up. I flagged the question with a comment about the unrelated 20051231 date.

How to create a "bucket" variable in SAS from ranges given in another table

I am trying to create a bucket variable in SAS that will split transactions into various buckets. However, depending on the retailer where the transactions occurred, the buckets have different lengths and end points. For example, Bucket 1 for Retailer 1 is from June 2017 to July 2018, while for Retailer 2 it is from January 2018 to November 2018. The retailers, bucket labels, and end points for the buckets are stored in an Excel file which I have imported successfully. The transactions are stored in a separate table with retailer information and a "date incurred" column. I am struggling to create a bucket variable in the transactions table. Does SAS allow for conditional logic when merging, like "if the transaction date is between these two dates assign this bucket value"? Is merging even the best way to add the bucket info to the transactions table?
Thank you so much for your help - this is my first ever Stack Overflow question, and I am teaching myself SAS for the first time. Please let me know what other information I can provide to make answering this question easier!
Add the condition to the SQL join, for example:
transactions a
left join
buckets b on a.Retailer = b.Retailer
and a.TransactionDate between b.BucketStart and b.BucketEnd
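A fuller sketch in PROC SQL, assuming hypothetical dataset and column names (transactions, buckets, and a Bucket label column on the imported Excel table):
proc sql;
  create table transactions_bucketed as
  select a.*,
         b.Bucket  /* hypothetical label column from the imported bucket table */
  from transactions a
  left join buckets b
    on a.Retailer = b.Retailer
   and a.TransactionDate between b.BucketStart and b.BucketEnd;
quit;
Transactions that fall outside every bucket for their retailer will simply come through with a missing Bucket value.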

Tableau Target vs actual - data drop issue

I am having a serious issue in Tableau when comparing the current month's value to the current month's target. I am currently using data blending for this purpose, but the issue isn't solved even if I use Tableau relationship joining. I have 3 join clauses: branch ID, staff ID, and month/year. Please see below.
Ex - January target - 95,000
Actual - 126,000
But when the 3 join clauses are created, the data drops to 111,900.
The reason is that in January, even when the allocated branch ID and month match, the other staff ID clauses drop off. That means that even though table A has all branches, staff IDs and the date key, the txn table has only one staff ID matching for January. If all 3 matches are not satisfied, the data drops. How can I solve this issue? I need to show the total value of 126,000 in front of 95,000, not 111,900.
Hope anyone can help.
Many thanks
This was achieved by unioning the tables.
Seems like this could have been solved by changing the join type from an inner join to a right join.
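As a rough SQL sketch of that join-direction point (table and column names here are made up, and it assumes one target row per branch/staff/month): an inner join on all three keys drops the actual rows whose staff ID has no allocated target, which is what shrinks 126,000 to 111,900, whereas driving the join from the transaction side keeps every actual row.
SELECT t.branch_id,
       t.staff_id,
       t.month_key,
       t.actual_amt,
       g.target_amt  -- NULL where no target was allocated for that staff/month
FROM txns t
LEFT JOIN targets g
  ON g.branch_id = t.branch_id
 AND g.staff_id = t.staff_id
 AND g.month_key = t.month_key;
Summing actual_amt over this result still gives the full 126,000 for January; with the target table as the left table in Tableau's join dialog, the right join suggested above is the equivalent.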

Compute rolling average across years while displaying data split by year

The dashboard in the linked workbook shows a table with sales split by year on the top. Below, there's a table with the rolling average of the last 4 weeks, including the current. It's set to show NULL if there are not enough data points. I'd like for it to compute the first January 2018 value based on the current week and 3 full weeks from the end of 2017. Carrying that concept forward, all NULLs from 2018 onward will be eliminated. The NULLs for the first 5 weeks of 2017 will be the only NULL values. The average should always be computed on a full 4 weeks (28 days) even when week 53 doesn't contain 7 days.
How can I write a calculation to achieve what's described above?
I've tried putting the WINDOW_AVG function inside an LOD expression, but that's not allowed. I've also tried using FIXED, and even FIXED inside WINDOW_AVG.
Here's one of my attempts:
{FIXED [Week_int]:
WINDOW_AVG(SUM([Sales]), -4, 0)
}
It returns this error: "Error: Level of detail expressions cannot contain table calculations or the ATTR function"
Here's the data structure. It includes one value of Sales per day.
Basically, I created dummy data in Excel by generating dates (from 1-1-2017 to 2-2-2021) and filling in some random values (uniform dist * 5000) against them.
I added WEEK(Date) to Columns and YEAR(Date) to Rows, as in your screenshot, and put SUM(Value) on the Text marks card.
Thereafter, I added a table calculation: Moving Average, edited to use the previous 4 values and next 0 values (check 'current value' if you want to include the current record), with 'Null if there are not enough values' checked (your requirement). Then I set Compute Using to Specific Dimensions and changed the order of the fields below, dragging Year above Week (Table Across then Down will also create the same view).
You should be able to get the desired view.
Regarding your query about the number of days in the week, Tableau handles it automatically if you have chosen the week datepart.
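For reference, the quick table calculation described above corresponds roughly to a calculated field like the one below (a sketch using the [Sales] measure from the question; adjust the -3/-4 offset depending on whether you count the current week), computed using Specific Dimensions with Year above Week:
// rolling average of the current week plus the previous three, computed along the week axis
WINDOW_AVG(SUM([Sales]), -3, 0)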
Edit: I verified this in Excel; the method works correctly.
See the average of the first 28 values in Excel:
and the view built in Tableau:
Here's the corrected dashboard hosted on Tableau Public.

Inaccurate COUNT DISTINCT Aggregation with Date dimension in Google Data Studio

When I aggregate values in Google Data Studio with a date dimension on a PostgreSQL Connector, I see buggy behaviour. The symptom is that performing COUNT(DISTINCT) returns the same value as COUNT():
My theory is that it has something to do with the aggregation on the data occurring after the count has already happened. If I attempt the exact same aggregation on the same data in an exported CSV instead of directly from a PostgreSQL Connector Data Source, the issue does not reproduce:
My PostgreSQL Connector is connecting to Amazon Redshift (jdbc:postgresql://*******.eu-west-1.redshift.amazonaws.com) with the following custom query:
SELECT
userid,
submissionid,
date
FROM mytable
Workaround
If I stop using the default date field for the Date Dimension and aggregate my own dates directly within the SQL query (date_byweek), the COUNT(DISTINCT) aggregation works as expected:
SELECT
userid,
submissionid,
to_char(date,'YYYY-IW') as date_byweek
FROM mytable
While this workaround solves my immediate problem, it sucks because I miss out on all the date functionality provided by Data Studio (Hierarchy Drill Down, Date Range filtering, etc.). Not to mention it reduces my confidence in what else may be "buggy" within the product 😞
How to Reproduce
If you'd like to re-create the issue, using the following data as a PostgreSQL Data Source should suffice:
> SELECT * FROM mytable
userid submissionid
-------- -------------
1 1
2 2
1 3
1 4
3 5
> COUNT(DISTINCT userid) -- ERROR: Returns 5 when data source is PostgreSQL
> COUNT(DISTINCT userid) -- EXPECTED: Returns 3 when data source is CSV (exported from same PostgreSQL query above)
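For comparison (assuming the sample table above), running the distinct count directly against the source returns the expected value, so the discrepancy only shows up once Data Studio aggregates with the date dimension:
SELECT COUNT(DISTINCT userid) FROM mytable;  -- returns 3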
I'm happy to report that as of Sep 17 2020, there's a workaround.
Data Studio added the DATETIME_TRUNC function (see here https://support.google.com/datastudio/answer/9729685?), which allows you to add a custom field that truncates the original date to whatever granularity you want, without causing the distinct bug.
Attempting to set the display granularity in the report still causes the bug (i.e., you'll still see Oct 1 2020 12:00:00 instead of Oct 2020).
This can be solved by creating a SECOND custom field, which just returns the first; you can then add IT to the report, change the display granularity, and everything will work OK.
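As a sketch, the two custom fields could look like this (the field names are made up, and WEEK is just one of the granularities DATETIME_TRUNC accepts):
date_week:          DATETIME_TRUNC(date, WEEK)
date_week_display:  date_week
You then add date_week_display to the report and change the display granularity on that field.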
I have the same issue with the MySQL Connector, but my problem was solved when I changed the date field format in the DB from DATETIME (YYYY-MM-DD HH:MM:SS) to INT (Unix timestamp). After connecting this table to Google Data Studio, I set the type for this field to Date (YYYYMMDD) and everything works as expected. Hope this may help you :)
In this Google forum there is a curious solution by Damien Choizit that involves combining your data source with itself. It works well for me.
https://support.google.com/datastudio/thread/13600719?hl=en&msgid=39060607
It says:
I figured out a solution in my case: I used a Blend Data joining twice the same data source with corresponding join key(s), then I specified a data range dimension only on the left side and selected the columns I wanted to CTD aggregate as "dimensions" (and not metric!) on the right side.

How to get all missing days between two dates

I will try to explain the problem on an abstract level first:
I have X amount of data as input, which will always have a DATE field. Previously, the dates that came in as input were (after some processing) written to a table as output. Now I am asked to output both the input dates and every date between the minimum date received and one year from that moment. If there was originally no input for some day between those two dates, all fields must be written as 0, or equivalent.
Example: I have two inputs, one with '18/03/2017' and another with '18/03/2018'. I now need to create output data for all the missing dates between '18/03/2017' and '18/03/2018'. So, output '19/03/2017' with every field set to 0, and the same for the 20th, the 21st, and so on.
I know how to do this programmatically, but in PowerCenter I do not. I've been told to do the following (which I have done, but I would like to know of a better method):
Get the minimum date, day0. Then, with an Aggregator, create 365 fields, each holding day0+1, day0+2, and so on, to create an artificial year.
After that we do several transformations (sorting the dates, a union between them) to get the data ready for a Joiner. The idea of the Joiner is to do a Full Outer Join between the original data and the data we got from the previous Aggregator, which will have all fields set to 0.
Then a Router sends the data that had actual dates (fields without nulls) to one group and the rows where all fields are null to another group; those null fields are then given a 0 and finally written to a table.
I am wondering how this can be achieved while, for starters, removing the need to add 365 days to a date. If I were to do this same process for 10 years instead of one, the task would get ridiculous very quickly.
I was wondering about an XOR type of operation, or some other function, that would cut the number of steps needed for what I (maybe wrongly) feel is a simple task. Currently I need 5 steps just to know which dates are missing between two dates: a minimum and one year from that point.
I have tried to be as clear as possible, but if I failed at any point please let me know!
I'm not sure what the aggregator is supposed to do?
The same with the 'full outer' join? A normal join on a constant port is fine :)
Can you calculate the needed number of 'duplicates' before the Joiner? In that case a lookup configured to return 'all rows' and a less-than-or-equal predicate can help make the mapping much more readable.
In any case you will need a helper table (or file) with a sequence of numbers between 1 and the number of potential duplicates (or more).
I use our time dimension in the warehouse, which has one row per day from 1753-01-01 for the next 200,000 days, and a primary integer column with values from 1 and up...
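As a sketch of that helper-table idea in SQL (names are made up: numbers is the helper with an integer column n starting at 1, src is the input with the DATE field), the less-than-or-equal predicate caps how many rows get generated (365 here, one per day of the artificial year):
-- date + integer arithmetic below is Oracle-style; other databases need DATEADD or interval syntax
SELECT MIN(s.datefield) + num.n - 1 AS generated_date
FROM src s
CROSS JOIN numbers num
WHERE num.n <= 365
GROUP BY num.n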
You've identified that you know how to do this programmatically, and to be fair this problem is more suited to that sort of solution... but that doesn't exclude PowerCenter by any means: just feed the 2 dates into a Java transformation, apply some code to produce all dates between them, and output a record for each. The Java transformation is ideal for record generation.
OK... so you could override your Source Qualifier to achieve this in the selection query itself (I'm giving an Oracle-based example as it's what I'm used to, and I'm assuming your input data is from a table). I looked up the CONNECT BY syntax here:
SQL to generate a list of numbers from 1 to 100
SELECT MIN(tablea.DATEFIELD) + levquery.n - 1 AS Port1
FROM tablea, (SELECT LEVEL n FROM DUAL CONNECT BY LEVEL <= 365) levquery
GROUP BY levquery.n
(Check whether the query works for you - I don't have access to a PC to test it at the minute.)