Decimal: Integral number too large in Redshift COPY - amazon-redshift

I'm getting the following error from Redshift.
Decimal: Integral number too large
This happens when inserting the following CSV line:
2015-03-20,A_M300X250CONTENT_INT_ADSENSE,3443,3443,1.4,13,,
The error is being thrown by 1.4.
The definition of that column is this:
schemaName | tablename | column | type | encoding | distkey | sortkey | notnull
-----------|-----------|-----------------|--------------|----------|----------|---------|---------
public | partners | revenue_partner | numeric(7,7) | none | false | 0 | false
This copy worked fine when the type was numeric(7,2), but I need to change it to fix a rounding error.

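numeric(7,7) means the total number of digits (the precision) is 7 and all 7 are allocated to the scale, i.e. to digits after the decimal point, leaving none for the integer part. If you want 7 integer digits and 7 decimal digits, you need numeric(14,7).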

Reading the docs http://docs.aws.amazon.com/redshift/latest/dg/r_Numeric_types201.html
It looks like a numeric(7,7) data type can only store values between -1 and 1 (exclusive) with up to 7 digits after the decimal point. The second number (the scale) is the number of digits allowed after the decimal point, and the first number minus the second (precision minus scale) is the number of digits allowed before it.
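To see the precision/scale arithmetic from both answers concretely, here is a small Python sketch using the standard decimal module. The helper name fits_numeric is hypothetical, and it assumes that surplus fractional digits are rounded half-up to the scale before the range check, so only the integer part can overflow; it illustrates the rule, it is not Redshift's own validation code.

```python
from decimal import Decimal, ROUND_HALF_UP

def fits_numeric(value: str, precision: int, scale: int) -> bool:
    """Return True if value can be stored as NUMERIC(precision, scale).

    Assumption (not taken from Redshift's docs): digits beyond the scale are
    rounded half-up first, so only the integer part can make the value too large.
    """
    quantum = Decimal(1).scaleb(-scale)            # e.g. scale 7 -> 1E-7
    d = Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)
    integer_digits = precision - scale             # digits allowed before the point
    # Anything whose magnitude reaches 10 ** integer_digits needs more integer digits.
    return abs(d) < Decimal(10) ** integer_digits

for p, s in [(7, 7), (7, 2), (14, 7)]:
    print(f"numeric({p},{s}) accepts 1.4: {fits_numeric('1.4', p, s)}")
```

With precision 7 and scale 7 there are 7 - 7 = 0 integer digits available, so 1.4 is rejected, while numeric(7,2) and numeric(14,7) both accept it.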

Related

Update telephone number format to include country identifier

As a beginner in SQL, I am trying to get the telephone fields into the format: '"+" + country identifier + telephone number'.
UPDATE public.contact
SET phone_number = CASE WHEN (country_code = 'FR')
AND phone_number NOT LIKE '+33%'
AND phone_number <> NULL
THEN CONCAT('+33', phone_number)
WHEN (country_code = 'GB') AND phone_number NOT LIKE '+44%'
AND phone_number <> NULL
THEN CONCAT('+44', phone_number)
ELSE phone_number
END;
I want to update the telephone number format to include the country identifier, e.g. 0606080905 -> +33606080905 if country_code = 'FR'. I am looking for a faster and less complex way than what I did.
You can do this with a regular expression using regexp_replace.
Suppose your data looks like this, in a table called 'numbers':
+----------+--------------+
| country | phone |
+----------+--------------+
| FR | 0606080905 |
| FR | +33606080906 |
| GB | 0123456789 |
| GB | +44987654321 |
| GB | NULL |
+----------+--------------+
Then the following update would replace the leading 0 with the country code +33 for all numbers that do not start with a +xx and have FR as country.
UPDATE numbers
SET phone = REGEXP_REPLACE(trim(phone), '^(0)', '+33')
WHERE country = 'FR'
Explained:
the ^ means start of the string
the (0) is the match that gets replaced (leading zero)
the +33 is the string that is used to replace it
the trim() is just added for safety, in case there are leading spaces
NULL phone numbers won't be affected, as they do not match
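The same anchored replacement can be tried outside the database. This Python sketch uses re.sub as a stand-in for PostgreSQL's REGEXP_REPLACE; the function name add_prefix is hypothetical, and str.strip() removes all surrounding whitespace, slightly more than SQL trim(), which by default removes only spaces.

```python
import re
from typing import Optional

def add_prefix(phone: Optional[str], prefix: str = "+33") -> Optional[str]:
    """Sketch of REGEXP_REPLACE(trim(phone), '^(0)', prefix)."""
    if phone is None:
        return None  # NULL in, NULL out: the row is effectively unchanged
    # ^ anchors the match at the start, so only a leading 0 is replaced.
    # A function as the replacement keeps re from interpreting backslashes in prefix.
    return re.sub(r"^(0)", lambda _m: prefix, phone.strip(), count=1)

for raw in ["0606080905", "+33606080906", " 0606080905", None]:
    print(repr(raw), "->", repr(add_prefix(raw)))
```

Numbers that already start with +33 never match ^(0), which is why no extra condition is needed for them.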
You could now handle each country as before, with a CASE WHEN or something similar for each possibility. But since the expression is always the same, an easier way is to keep your country codes and their prefixes in a separate table, 'mapping':
+----------+--------+
| country | prefix |
+----------+--------+
| FR | +33 |
| GB | +44 |
+----------+--------+
You could then do
UPDATE numbers n
SET phone = REGEXP_REPLACE(trim(phone), '^(0)', prefix)
FROM mapping m
WHERE m.country = n.country
and update all your numbers in one go:
+----------+--------------+
| country | phone |
+----------+--------------+
| FR | +33606080905 |
| FR | +33606080906 |
| GB | +44123456789 |
| GB | +44987654321 |
| GB | NULL |
+----------+--------------+
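The UPDATE ... FROM join can be modelled in Python with a dictionary playing the role of the 'mapping' table. This is a hypothetical in-memory sketch (update_all is an illustrative name), not how PostgreSQL executes the statement; rows whose country has no mapping entry, or whose phone is NULL, are left unchanged, as they would be by the join and the regex.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, Optional[str]]

# Hypothetical in-memory copies of the 'numbers' and 'mapping' tables.
numbers: List[Row] = [
    ("FR", "0606080905"), ("FR", "+33606080906"),
    ("GB", "0123456789"), ("GB", "+44987654321"), ("GB", None),
]
mapping: Dict[str, str] = {"FR": "+33", "GB": "+44"}

def update_all(rows: List[Row], prefixes: Dict[str, str]) -> List[Row]:
    """Apply each country's prefix to a leading 0, like the UPDATE ... FROM above."""
    result = []
    for country, phone in rows:
        prefix = prefixes.get(country)
        if prefix is None or phone is None:
            result.append((country, phone))  # no matching mapping row, or NULL phone
            continue
        stripped = phone.strip()
        # Same effect as REGEXP_REPLACE(trim(phone), '^(0)', prefix).
        new_phone = prefix + stripped[1:] if stripped.startswith("0") else stripped
        result.append((country, new_phone))
    return result

for row in update_all(numbers, mapping):
    print(row)
```

Keeping the prefixes as data means adding a country is a new mapping row rather than another CASE branch.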
EDIT: Previously, I had this needlessly complicated answer. You may need something like this if your phone number patterns are more diverse...
The following update would replace the leading 0 with the country code +33 for all numbers that do not start with a +xx and have FR as country.
UPDATE numbers
SET phone = REGEXP_REPLACE(trim(phone), '^(?<![+\d{2}])(0)', '+33')
WHERE country = 'FR'
Explained:
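the (?<![+\d{2}]) is a negative lookbehind assertion meant to make sure the regex only matches if there is no + followed by two digits before the 0 (strictly, the square brackets make it a character class that checks a single preceding character, and right after ^ nothing can precede the match anyway, which is why the simpler pattern works just as well)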
the (0) is the match that gets replaced
the +33 is the string that is used to replace it
the trim() is just added for safety, in case there are leading spaces
NULL phone numbers won't be affected, as they do not match
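A quick way to confirm that the lookbehind adds nothing is to run both patterns side by side. This sketch uses Python's re module and assumes its lookbehind semantics match PostgreSQL's for this particular pattern; the sample values are hypothetical FR-style numbers.

```python
import re

SIMPLE = r"^(0)"
LOOKBEHIND = r"^(?<![+\d{2}])(0)"  # the earlier, more complicated pattern

samples = ["0606080905", "+33606080906", "+330606080905"]

# count=1 mirrors REGEXP_REPLACE without the 'g' flag (first match only).
simple_results = [re.sub(SIMPLE, "+33", s, count=1) for s in samples]
lookbehind_results = [re.sub(LOOKBEHIND, "+33", s, count=1) for s in samples]

for s, a, b in zip(samples, simple_results, lookbehind_results):
    print(f"{s:>14} -> {a:>14} | {b:>14}")
```

Both columns come out identical: at position 0 there is no preceding character, so the negative lookbehind always succeeds.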
That's about as simple as it gets.
The only way I can imagine to speed up processing is to add a WHERE condition that avoids updating the rows that don't have to be modified.
You could also run several such statements in parallel, where each modifies a different part of the table.
As mentioned in the comment, <> NULL is never true (it evaluates to NULL); use IS NOT NULL instead.
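Both points (the NULL comparison and restricting the UPDATE with a WHERE clause) can be checked with a throwaway database. This sketch uses SQLite through Python's built-in sqlite3 module as a stand-in for PostgreSQL: the NULL behaviour is standard SQL three-valued logic, but the '+33' || substr(...) rewrite is a simplified, hypothetical update chosen because SQLite has no REGEXP_REPLACE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# '<>' against NULL yields NULL (unknown), which WHERE and CASE treat as not true.
ne_null = conn.execute("SELECT '0606080905' <> NULL").fetchone()[0]
# IS NOT NULL is the predicate that actually tests for a non-NULL value.
is_not_null = conn.execute("SELECT '0606080905' IS NOT NULL").fetchone()[0]

conn.execute("CREATE TABLE numbers (country TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO numbers VALUES (?, ?)",
    [("FR", "0606080905"), ("FR", "+33606080906"), ("GB", None)],
)
# The WHERE clause restricts the UPDATE to rows that actually need a change.
cur = conn.execute(
    "UPDATE numbers SET phone = '+33' || substr(phone, 2) "
    "WHERE country = 'FR' AND phone LIKE '0%'"
)
updated_rows = cur.rowcount
rows = conn.execute("SELECT country, phone FROM numbers ORDER BY rowid").fetchall()

print(ne_null, is_not_null, updated_rows, rows)
```

Only one of the three rows is touched; the already-prefixed number and the NULL phone are skipped by the WHERE clause.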

String splitting and operations on only some results

I have strings that look like this:
schedulestart | event_labels
2018-04-04 | 9=TTR&11=DNV&14=SWW&26=DNV&2=QQQ&43=FTW
That is how it looks in the database. I have code that relies on this string being in this format to display a schedule with events with those labels on those days.
Now I find myself needing to break down the string in postgres for reporting/analysis, and I can't really pull out the string and parse it in another language, so I have to stick to postgres.
I've figured out a way to unpack the string so my results look like this:
User ID | Schedule Start | Unpacked String
2 | 2018-04-04 | TTR
2 | 2018-04-04 | 9
2 | 2018-04-04 | DNV
2 | 2018-04-04 | 11
2 | 2018-04-04 | SWW
2 | 2018-04-04 | 14
2 | 2018-04-04 | DNV
2 | 2018-04-04 | 26
select schedulestart, unnest(string_to_array(unnest(string_to_array(event_labels, '&')), '=')) from table;
Now what I need is a way to actually perform an interval calculation (e.g. 2018-04-04 + '11 days'::interval). I can do that if I only get a list of numbers, but I also need to bind each result to its string. So the goal is an output like this:
eventdate | event_label
2018-04-12 | TTR
2018-04-20 | DNV
Where eventdate is the schedule start + which day of the schedule the event is on. I'm not sure how to take the unpacked string I created and use it to perform date calculations, and tie it to the string.
I've considered doing only one unnest, so that it's 11=TTR and 14=DNV, but I'm not sure how to take that to my desired result either. Is there a way to read a string until you reach a certain character, and then use that in calculations, and then read every character past a certain character in a string into a new column?
I'm aware completely rewriting how this is handled would be ideal, but I did not initially write it, and I don't have the time or means to rewrite the ~20 locations this is used.
Here is your table (I added a userid column):
CREATE TABLE test(userid INTEGER, schedulestart DATE, event_labels VARCHAR);
And input data:
INSERT INTO test(userid,schedulestart , event_labels) VALUES
(2,DATE '2018-04-04', '9=TTR&11=DNV&14=SWW&26=DNV&2=QQQ&43=FTW');
And finally the solution:
SELECT
userid,
(schedulestart + (SPLIT_PART(kv,'=',1)||' days')::INTERVAL)::DATE AS eventdate,
SPLIT_PART(kv,'=',2) AS event_label
FROM (
SELECT
userid,schedulestart,
REGEXP_SPLIT_TO_TABLE(event_labels, '&') AS kv
FROM test
WHERE userid = 2
) a
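For checking expected results, the same split-then-add logic can be written in a few lines of Python. expand_events is a hypothetical helper; it assumes every pair is well-formed as day=LABEL with an integer day.

```python
from datetime import date, timedelta
from typing import List, Tuple

def expand_events(schedulestart: date, event_labels: str) -> List[Tuple[date, str]]:
    """Turn '9=TTR&11=DNV&...' into (event date, label) pairs."""
    rows = []
    for kv in event_labels.split("&"):      # like REGEXP_SPLIT_TO_TABLE(event_labels, '&')
        day, _, label = kv.partition("=")  # text before / after the first '='
        rows.append((schedulestart + timedelta(days=int(day)), label))
    return rows

for eventdate, label in expand_events(date(2018, 4, 4), "9=TTR&11=DNV&14=SWW&26=DNV&2=QQQ&43=FTW"):
    print(eventdate.isoformat(), "|", label)
```

partition() matches SPLIT_PART(kv, '=', 1) and SPLIT_PART(kv, '=', 2) as long as the label itself contains no '='.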

Cassandra: how to search key columns with DECIMAL?

I understand Cassandra is designed for string-based key/value pairs.
I need a Cassandra table with decimal keys. Is there any way to search the keys by a range of numeric values, like keys between 3 and 6 (inclusive)?
Sample Key Column
1
3.3
6.345
9
10
2.5
Let's try this out. Assume a simple table with a decimal key, and a text value.
CREATE TABLE decimalRangePK (dec decimal, value text, PRIMARY KEY (dec));
In this case, dec is my partition key. It is also my only key, as there is no clustering key. After INSERTing some data, here is what I have:
aploetz#cqlsh:stackoverflow> SELECT * FROM decimalrangepk ;
dec | value
------+-------
2.5 | ghi
6.35 | abc
9 | def
3.2 | 3.2
1 | 1
3.3 | 3.3
10 | ten
(7 rows)
So I assume that you are trying a range query on your partition key, like this:
aploetz#cqlsh:stackoverflow> SELECT * FROM decimalrangeck WHERE dec>=3.3 AND dec<=9;
InvalidRequest: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"
As you can see, this doesn't work: Cassandra cannot execute a range query on a partition key. However, because clustering keys are used to enforce on-disk sort order (within a partition), you can execute a range query on a clustering key.
In this next example, I'll try this again. But this time I will partition my data by date, like this:
CREATE TABLE decimalRangeCK (dateBucket text, dec decimal, value text,
PRIMARY KEY (dateBucket,dec));
After inserting some rows, I'll query the table and it will look slightly different:
aploetz#cqlsh:stackoverflow> SELECT * FROM decimalrangeck ;
datebucket | dec | value
------------+------+-------
20151108 | 1 | 1
20151108 | 3.2 | 3.2
20151110 | 2.5 | ghi
20151110 | 10 | ten
20151109 | 1 | 1
20151109 | 3.3 | 3.3
20151109 | 6.35 | abc
20151109 | 9 | def
(8 rows)
Now I can run a range query on dec, as long as I also provide a partition key:
aploetz#cqlsh:stackoverflow> SELECT * FROM decimalrangeck WHERE datebucket='20151109'
AND dec>=3.3 AND dec<=9;
datebucket | dec | value
------------+------+-------
20151109 | 3.3 | 3.3
20151109 | 6.35 | abc
20151109 | 9 | def
(3 rows)
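Why the clustering key supports range queries can be pictured with a toy model: each partition holds its rows sorted by the clustering key, so once the partition is known, a range is just a contiguous slice. This Python sketch is a conceptual illustration only (table, insert and range_query are hypothetical names), not Cassandra's storage engine; Decimal keys mirror the decimal column type and the rows are the ones shown above.

```python
from bisect import bisect_left, bisect_right
from decimal import Decimal

# partition key (datebucket) -> list of (dec, value) kept sorted by dec
table = {}

def insert(datebucket, dec, value):
    """Upsert a row, keeping the partition sorted by the clustering key."""
    rows = table.setdefault(datebucket, [])
    key = Decimal(dec)
    keys = [k for k, _ in rows]
    i = bisect_left(keys, key)
    if i < len(rows) and rows[i][0] == key:
        rows[i] = (key, value)  # same primary key: overwrite
    else:
        rows.insert(i, (key, value))

def range_query(datebucket, low, high):
    """Inclusive range on dec; the partition key must be given, as in CQL."""
    rows = table.get(datebucket, [])
    keys = [k for k, _ in rows]
    start = bisect_left(keys, Decimal(low))
    end = bisect_right(keys, Decimal(high))
    return rows[start:end]

for bucket, dec, value in [
    ("20151108", "1", "1"), ("20151108", "3.2", "3.2"),
    ("20151110", "2.5", "ghi"), ("20151110", "10", "ten"),
    ("20151109", "1", "1"), ("20151109", "3.3", "3.3"),
    ("20151109", "6.35", "abc"), ("20151109", "9", "def"),
]:
    insert(bucket, dec, value)

print(range_query("20151109", "3.3", "9"))
```

A query without a datebucket has no sorted list to slice, which is the intuition behind Cassandra rejecting range restrictions on the partition key.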
As you can see, picking a good partition key is very important. High cardinality, unique partition keys are great for data distribution, but don't really give you a whole lot of query flexibility.

Calculate time range in org-mode table

Given a table that has a column of time ranges e.g.:
| <2015-10-02>--<2015-10-24> |
| <2015-10-05>--<2015-10-20> |
....
how can I create a column showing the results of org-evaluate-time-range?
If I attempt something like:
#+TBLFM: $2='(org-evaluate-time-range $1)
the 2nd column is populated with
Time difference inserted
in every row.
It would also be nice to generate the same result from two different columns with, say, start date and end date instead of creating one column of time ranges out of those two.
If you have your date range split into 2 columns, a simple subtraction works and returns number of days:
| <2015-10-05> | <2015-10-20> | 15 |
| <2013-10-02 08:30> | <2015-10-24> | 751.64583 |
#+TBLFM: $3=$2-$1
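The subtraction in that formula is ordinary date arithmetic in days; this Python sketch with the standard datetime module reproduces the two values in the table (days_between is a hypothetical helper name).

```python
from datetime import datetime

def days_between(start: datetime, end: datetime) -> float:
    """Difference end - start in days, including the fractional part from times."""
    return (end - start).total_seconds() / 86400

print(days_between(datetime(2015, 10, 5), datetime(2015, 10, 20)))                   # 15.0
print(round(days_between(datetime(2013, 10, 2, 8, 30), datetime(2015, 10, 24)), 5))  # 751.64583
```

The 08:30 start time is what produces the fractional part: 752 days minus 8.5 hours.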
Using org-evaluate-time-range is also possible, and you get a nice formatted output:
| <2015-10-02>--<2015-10-24> | 22 days |
| <2015-10-05>--<2015-10-20> | 15 days |
| <2015-10-22 Thu 21:08>--<2015-08-01> | 82 days 21 hours 8 minutes |
#+TBLFM: $2='(org-evaluate-time-range)
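For comparison, here is a rough Python approximation of the formatted output shown above. describe_range is a hypothetical helper that only imitates the layout in this table (absolute difference, zero units omitted, always plural); it is not org-mode's implementation.

```python
from datetime import datetime

def describe_range(a: datetime, b: datetime) -> str:
    """Absolute difference as 'N days H hours M minutes', omitting zero parts."""
    delta = abs(b - a)
    hours, rest = divmod(delta.seconds, 3600)
    minutes = rest // 60
    parts = [(delta.days, "days"), (hours, "hours"), (minutes, "minutes")]
    return " ".join(f"{n} {unit}" for n, unit in parts if n)

print(describe_range(datetime(2015, 10, 2), datetime(2015, 10, 24)))
print(describe_range(datetime(2015, 10, 22, 21, 8), datetime(2015, 8, 1)))
```

Taking the absolute value matches the third row, where the later timestamp comes first.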
Note that the only optional argument that org-evaluate-time-range accepts is a flag to indicate insertion of the result in the current buffer, which you don't want.
How this function, called without arguments, gets the correct time range when evaluated is a complete mystery to me; pure magic(!)

Cognos Calculate Variance Crosstab (Dimensional)

This is very similar to Cognos Calculate Variance Crosstab (Relational), but my data source is dimensional.
I have a simple crosstab such as this:
| 04-13-2013 | 04-13-2014
---------------------------------------
Sold | 75 | 50
Purchased | 10 | 15
Repaired | 33 | 44
Filter: The user selects 1 date and then we include that date plus 1 year ago.
Dimension: The date is the day level in a YQMD Hierarchy.
Measures: We are showing various measures from a Measure Dimension.
Sold
Purchased
Repaired
Here is what it looks like in Report Studio:
| <#Day#> | <#Day#>
---------------------------------------
<#Sold#> | <#1234#> | <#1234#>
<#Purchased#> | <#1234#> | <#1234#>
<#Repaired#> | <#1234#> | <#1234#>
I want to be able to calculate the variance as a percentage between the two time periods for each measure like this.
| 04-13-2013 | 04-13-2014 | Var. %
-----------------------------------------------
Sold | 75 | 50 | -33%
Purchased | 10 | 15 | 50%
Repaired | 33 | 44 | 33%
I added a Query Expression to the right of the <#Day#> as shown below, but I cannot get the variance calculation to work.
| <#Day#> | <#Variance#>
---------------------------------------
<#Sold#> | <#1234#> | <#1234#>
<#Purchased#> | <#1234#> | <#1234#>
<#Repaired#> | <#1234#> | <#1234#>
These are the expressions I've tried and the results that I get:
An expression that is hard-coded works, but only for that one measure:
total(case when [date] = 2014-04-13 then [Sold] end)
/
total(case when [date] = 2013-04-13 then [Sold] end)
-1
I thought CurrentMember and prevMember might work, but they produce blank cells:
CurrentMember( [YQMD Hierarchy] )
/
prevMember(CurrentMember([YQMD Hierarchy]))
-1
I think it is because prevMember returns blank:
prevMember(CurrentMember([YQMD Hierarchy]))
Using only CurrentMember gives a total of both columns:
CurrentMember([YQMD Hierarchy])
What expression can I use to take advantage of my dimensional model and add a column with % variance?
These are the pages I used for research:
Variance reporting in Report Studio on Cognos 8.4?
Calculations that span dimensions - PDF
IBM Cognos 10 Report Studio: Creating Consumer-Friendly Reports
I hope there is a better way to do this. I finally found a resource that describes one approach to this problem. Using the tail and head functions, we can get to the first and last periods, and thereby calculate the % variance.
item(tail(members([Day])),0)
/
item(head(members([Day])),0)
-1
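Outside Cognos, the head/tail expression reduces to dividing the last period's value by the first and subtracting 1. This Python sketch (head_tail_variance is a hypothetical name) applies that to the sample crosstab data from the question, assuming the periods are already in hierarchy order.

```python
def head_tail_variance(series):
    """series: (period, value) pairs in hierarchy order; returns last/first - 1."""
    first_value = series[0][1]   # like item(head(members([Day])), 0)
    last_value = series[-1][1]   # like item(tail(members([Day])), 0)
    return last_value / first_value - 1  # a zero first value would raise ZeroDivisionError

crosstab = {
    "Sold": [("04-13-2013", 75), ("04-13-2014", 50)],
    "Purchased": [("04-13-2013", 10), ("04-13-2014", 15)],
    "Repaired": [("04-13-2013", 33), ("04-13-2014", 44)],
}

for measure, series in crosstab.items():
    print(f"{measure}: {head_tail_variance(series):.0%}")
```

This reproduces the -33%, 50% and 33% in the desired Var. % column.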
This idea came from IBM Cognos BI – Using Dimensional Functions to Determine Current Period.
Example 2 – Find Current Period by Filtering on Measure Data
If the OLAP or DMR data source has been populated with time periods into the future (e.g. end of year or future years), then the calculation of current period is more complicated. However, it can still be determined by finding the latest period that has data for a given measure.
item(tail(filter(members([sales_and_marketing].[Time].[Time].[Month]),
tuple([Revenue], currentMember([sales_and_marketing].[Time].[Time]))
is not null), 1), 0)
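The filter/tail idea from that IBM example can likewise be sketched in Python: keep only the periods that have a value for the measure, then take the last one. current_period and the sample months are hypothetical; None stands in for a null measure value.

```python
from typing import Optional, Sequence, Tuple

def current_period(months: Sequence[Tuple[str, Optional[float]]]) -> Optional[str]:
    """Last month whose measure is not null, like item(tail(filter(..., ... is not null), 1), 0)."""
    with_data = [month for month, revenue in months if revenue is not None]
    return with_data[-1] if with_data else None

# Hypothetical data: the time dimension is populated to year end, but revenue stops in March.
months = [("2015-01", 120.0), ("2015-02", 95.5), ("2015-03", 130.25),
          ("2015-04", None), ("2015-05", None), ("2015-06", None)]
print(current_period(months))  # 2015-03
```

Filtering before taking the tail is what keeps future, still-empty periods from being picked as "current".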