I am developing for multiple SQL Server database versions, so I cannot use TRY_CONVERT, as SQL Server 2008 R2 must be supported.
I have several tables that share the same set of values, many of which are numeric, but some of which are not. (Yuck, a bad design decision, but it's legacy code that is very hard to change.)
I need to get a list of all of the unique numeric values, as integers, that fall within a certain range.
So I have SQL that looks sort of like this:
SELECT C1, C2_AS_INT FROM (
SELECT C1, CAST( LTRIM(RTRIM(C2)) AS INT ) AS C2_AS_INT FROM T1 WHERE ISNUMERIC(C2) = 1 --We are accepting of any problems with ISNUMERIC
UNION
SELECT C1, CAST( LTRIM(RTRIM(C2)) AS INT ) AS C2_AS_INT FROM T2 WHERE ISNUMERIC(C2) = 1
) AS C2_AS_INT_QUERY
This is fine and it works (although I'm unsure what datatype C2_AS_INT ends up with in the outer query).
However, when I add a WHERE clause to it, I get an error saying that a conversion from nvarchar to int failed.
SELECT C1, C2_AS_INT FROM (
SELECT C1, CAST( LTRIM(RTRIM(C2)) AS INT ) AS C2_AS_INT FROM T1 WHERE ISNUMERIC(C2) = 1 --We are accepting of any problems with ISNUMERIC
UNION
SELECT C1, CAST( LTRIM(RTRIM(C2)) AS INT ) AS C2_AS_INT FROM T2 WHERE ISNUMERIC(C2) = 1
) AS C2_AS_INT_QUERY
WHERE C2_AS_INT >= 1
It seems like the WHERE clause is being applied to the inner queries instead of to the outer query, where I would have expected the value to already have been cast to an INT.
Any suggestions on how to fix this error?
For all intents and purposes, I was able to move the original query into a table-valued function. I could then apply the WHERE clause to the result of the UDF as expected.
While I'm marking this as the accepted answer, I'm still interested in knowing if there's a technique that can be applied at the query level to get this to work.
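For reference, a common query-level workaround on pre-2012 versions (this is just a sketch, not from the original thread) is to fold the ISNUMERIC guard into a CASE expression around the CAST, so the outer filter can never touch an unconverted value; non-numeric rows simply become NULL and drop out of the WHERE:
SELECT C1, C2_AS_INT
FROM
(
SELECT C1,
       CASE WHEN ISNUMERIC(LTRIM(RTRIM(C2))) = 1
            THEN CAST( LTRIM(RTRIM(C2)) AS INT ) END AS C2_AS_INT
FROM T1
UNION
SELECT C1,
       CASE WHEN ISNUMERIC(LTRIM(RTRIM(C2))) = 1
            THEN CAST( LTRIM(RTRIM(C2)) AS INT ) END AS C2_AS_INT
FROM T2
) AS C2_AS_INT_QUERY
WHERE C2_AS_INT >= 1 -- NULLs (non-numeric rows) fail this predicate and are filtered out
CASE generally evaluates its WHEN test before the THEN expression, so the cast stays tied to the guard regardless of how the optimizer pushes the predicate. It still inherits ISNUMERIC's quirks (values such as '$' or '1e4' pass the check but will not cast to INT), which the question already accepts.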
I have a table T, shown below, with 1 billion records. Currently, this table has no primary key or indexes.
create table T(
day_c date,
str_c varchar2(20),
comm_c varchar2(20),
src_c varchar2(20)
);
some sample data:
insert into T
select to_date('20171011','yyyymmdd') day_c,'st1' str_c,'c1' comm_c,'s1' src_c from dual
union
select to_date('20171012','yyyymmdd'),'st1','c1','s1' from dual
union
select to_date('20171013','yyyymmdd'),'st1','c1','s1' from dual
union
select to_date('20171014','yyyymmdd'),'st1','c1','s2' from dual
union
select to_date('20171015','yyyymmdd'),'st1','c1','s2' from dual
union
select to_date('20171016','yyyymmdd'),'st1','c1','s2' from dual
union
select to_date('20171017','yyyymmdd'),'st1','c1','s1' from dual
union
select to_date('20171018','yyyymmdd'),'st1','c1','s1' from dual
union
select to_date('20171019','yyyymmdd'),'st1','c1','s1' from dual
union
select to_date('20171020','yyyymmdd'),'st1','c1','s1' from dual;
The expected result is to generate the date ranges for the changes in column src_c.
I have the following code snippet, which provides the desired result. However, it is slow, as the cost of running LAG and LEAD over the table is quite high.
WITH EndsMarked AS (
SELECT
day_c,str_c,comm_c,src_c,
CASE WHEN src_c= LAG(src_c,1) OVER (ORDER BY day_c)
THEN 0 ELSE 1 END AS IS_START,
CASE WHEN src_c= LEAD(src_c,1) OVER (ORDER BY day_c)
THEN 0 ELSE 1 END AS IS_END
FROM T
), GroupsNumbered AS (
SELECT
day_c,str_c,comm_c,
src_c,
IS_START,
IS_END,
COUNT(CASE WHEN IS_START = 1 THEN 1 END)
OVER (ORDER BY day_c) AS GroupNum
FROM EndsMarked
WHERE IS_START=1 OR IS_END=1
)
SELECT
str_c,comm_c,src_c,
MIN(day_c) AS GROUP_START,
MAX(day_c) AS GROUP_END
FROM GroupsNumbered
GROUP BY str_c,comm_c, src_c,GroupNum
ORDER BY groupnum;
Output:
STR_C  COMM_C  SRC_C  GROUP_START  GROUP_END
st1    c1      s1     11-OCT-17    13-OCT-17
st1    c1      s2     14-OCT-17    16-OCT-17
st1    c1      s1     17-OCT-17    20-OCT-17
Any suggestions to speed this up?
Oracle database: 12c
SGA memory: 20 GB
Total CPUs: 22
Do you order by day_c only, or do you need to partition by str_c and comm_c first? It seems so; in which case I am not sure your query is correct, and Sentinel's solution will need to be adjusted accordingly.
Then:
For some reason (which escapes me), it appears that the match_recognize clause (available only since Oracle 12.1) is faster than analytic functions, even when the work done seems to be the same.
In your problem: (1) you must read 1 billion rows from disk, which can't be done faster than the hardware allows (do you REALLY need to do this for all 1 billion rows, or could you archive a large portion of the table, perhaps after performing this identification of GROUP_START and GROUP_END?); and (2) you must order the data by day_c no matter what method you use, and that is time consuming.
With that said, the tabibitosan method (see Sentinel's answer) will be faster than the start-of-group method (which is close to, but simpler than what you currently have).
The match_recognize solution, which will probably be faster than any solution based on analytic functions, looks like this:
select str_c, comm_c, src_c, group_start, group_end
from t
match_recognize(
partition by str_c, comm_c
order by day_c
measures x.src_c as src_c,
first(day_c) as group_start,
last(day_c) as group_end
pattern ( x y* )
define y as src_c = x.src_c
)
-- Add ORDER BY clause here, if needed
;
Here is a quick explanation of how this works; for developers who are not familiar with match_recognize, I provided links to a few good tutorials in a Comment below this Answer.
The match_recognize clause partitions the input rows by str_c and comm_c and orders them by day_c. So far this is exactly the same work that analytic functions do.
Then in the PATTERN and DEFINE clauses I declare and define two "classes" of rows, which will be flagged as X and Y, respectively. X is any row (there are no restrictions on it in the DEFINE clause). However, Y is restricted: it must have the same src_c as the last X row preceding it.
So, in each partition, reading from the earliest row to the latest, I am looking for any number of matches, where a match consists of an arbitrary row (marked X) followed by as many Y rows as possible, where Y means "same src_c as the first row in this match". So this will identify sequences of rows where src_c did not change.
For each match that is found, the clause will output the src_c value from the X row (which is the same, really, for all the rows in that match), and the first and the last value in the day_c column for that match. That is what we need to put in the SELECT clause of the overall query.
You can eliminate one CTE by using the Tabibitosan (traveler) method:
with Groups as (
select t.*
, row_number() over (order by day_c)
- row_number() over (partition by str_c
, comm_c
, src_c
order by day_c) GroupNum
from t
)
select str_c
, comm_c
, src_c
, min(day_c) GROUP_START
, max(day_c) GROUP_END
from Groups
group by str_c
, comm_c
, src_c
, GroupNum
I currently have the following query:
WITH History AS (
SELECT
kz.*,
kz.__$operation AS operation,
map.tran_begin_time as beginT,
map.tran_end_time as endT
FROM cdc.fn_cdc_get_all_changes_dbo_EXT_GeolObject_KategZalezh(sys.fn_cdc_get_min_lsn('dbo_EXT_GeolObject_KategZalezh'), sys.fn_cdc_get_max_lsn(), 'all') AS kz
INNER JOIN [cdc].[lsn_time_mapping] map
ON kz.[__$start_lsn] = map.start_lsn
where kz.GUID_BalanceHC_Zalezh = 'DDA9AB3A-A0AF-4623-9362-0000C8C83D63'
),
UnpivotedValues AS(
SELECT guid, GUID_another, field, val, operation, beginT, endT
FROM History
UNPIVOT ( [val] FOR field IN
(
area,
oilwidthmin,
oilwidthmax,
efectivwidthmin,
efectivwidthmax,
etc...
))t
),
UnpivotedWithLastValue AS (
SELECT
*,
--Use LAG() to get the last value for the same field
LAG(val, 1) OVER (PARTITION BY guid, GUID_another, field ORDER BY BeginT) LastVal
FROM UnpivotedValues
)
SELECT * FROM UnpivotedWithLastValue WHERE val <> LastVal OR LastVal IS NULL ORDER BY guid
This query returns the changed values for a single table that has CDC (Change Data Capture) enabled.
I want to create a stored procedure that receives the columns to be unpivoted, and the cdc function (e.g. cdc.fn_cdc_get_all_...) as parameters and returns the result set.
The results for these tables must be joined into one report.
In my case parameter 1 is cdc.fn_cdc_get_all_changes_dbo_EXT_GeolObject_KategZalezh(sys.fn_cdc_get_min_lsn('dbo_EXT_GeolObject_KategZalezh'), sys.fn_cdc_get_max_lsn(), 'all'). This is the CDC function.
How should I pass the list of fields that I want in the result? What should that string look like?
Also, is there a way to do this without dynamic SQL? Dynamic SQL is not the best solution for performance.
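For the first part of the question (passing the capture instance and the column list), the usual route is dynamic SQL executed with sp_executesql. The following is only a rough sketch under my own assumptions: the procedure and parameter names are made up, the inputs are not validated, and the unpivoted columns are assumed to share a data type (UNPIVOT requires that):
CREATE PROCEDURE dbo.usp_GetCdcChanges_Sketch
    @CaptureInstance sysname,         -- hypothetical, e.g. N'dbo_EXT_GeolObject_KategZalezh'
    @UnpivotColumns  nvarchar(max)    -- hypothetical, e.g. N'area, oilwidthmin, oilwidthmax'
AS
BEGIN
    DECLARE @sql nvarchar(max);
    -- Both the CDC function name and the UNPIVOT column list have to be spliced into the
    -- statement text, which is why dynamic SQL is hard to avoid for this shape of query.
    SET @sql = N'
        SELECT field, val, [__$operation] AS operation
        FROM cdc.fn_cdc_get_all_changes_' + @CaptureInstance + N'(
                 sys.fn_cdc_get_min_lsn(''' + @CaptureInstance + N'''),
                 sys.fn_cdc_get_max_lsn(), ''all'') AS kz
        UNPIVOT ( val FOR field IN (' + @UnpivotColumns + N') ) AS u;';
    EXEC sys.sp_executesql @sql;
END
The alternative below avoids dynamic SQL entirely by turning each row into XML and reading its attributes, so the column list never has to be passed in at all.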
As you know, SQL Server is declarative by design and does not support macro substitution.
UNPIVOT would clearly be more performant, but here is a simplified example of a dynamic unpivot which does not require dynamic SQL, only a little XML.
Example
Let's assume your table/results look like this:
You may notice that we only specify the key fields to EXCLUDE in the final WHERE clause.
Declare #YourData table (ID int,Active bit,First_Name varchar(50),Last_Name varchar(50),EMail varchar(50),Salary decimal(10,2))
Insert into #YourData values
(1,1,'John','Smith','john.smith@email.com',85600),
(2,0,'Jane','Doe' ,'jane.doe@email.com',83200)
;with cte as (
-- Replace with your Complex Query
Select * from #YourData
)
Select A.ID
,A.Active
,C.*
From cte A
Cross Apply (Select XMLData=cast((Select A.* for XML RAW) as xml)) B
Cross Apply (
Select Item = attr.value('local-name(.)','varchar(100)')
,Value = attr.value('.','varchar(max)')
From XMLData.nodes('/row') C1(n)
Cross Apply C1.n.nodes('./@*') C2(attr)
Where attr.value('local-name(.)','varchar(100)') not in ('ID','Active')
) C
Returns ID and Active as ordinary columns, plus one Item/Value row per remaining column (First_Name, Last_Name, EMail, Salary) for each source row.
My basic question has to do with updating multiple columns at once from specified values in my query. The reason I want to do this is that I am updating my values from a ginormous table, so I only want to query it once in order to reduce run time. Here is an example of a SELECT statement that returns the value I want for just one of the columns I need to update:
select a.Value
from Table1
left outer join
(
select ID, FilterCol1, FilterCol2, Value
from Table2
) a on a.ID = Table1.ID
where {Condition1a on FilterCol1}
and {Condition2a on FilterCol2}
In order to update multiple columns at once, I would like to be able to do something like this (but it returns NULL):
Update T1
set T1Value1 = (select a.Value where {Condition1a on FilterCol1}
and {Condition2a on FilterCol2})
,T1Value2 = (select a.Value where {Condition1b on FilterCol1}
and {Condition2b on FilterCol2})
from Table1 T1
left outer join
(
select ID, FilterCol1, FilterCol2, Value
from Table2
) a on a.ID = T1.ID
Any help figuring this out would be greatly appreciated, let me know if you have any questions or if I made any errors. Thanks!
EDIT: I think I have identified the problem, but I'm not sure of a solution yet. Seeing the issue requires a little more context: the select from Table2 is actually an UNPIVOT on a wide table. This means that when the left outer join is applied, there are multiple rows for a given ID. What the CASE statement that Earl suggested seems to be doing (and I assume the same happens with the WHERE clause) is comparing my conditions against only the first of the rows from a. Since my conditions are meant to determine which of the rows from a is chosen, they always evaluate to false for that first row (I know this from the data), hence my perpetual NULL values. Does anyone know of a workaround that looks at the other rows in a?
UPDATE T1
SET T1Value1 = CASE WHEN (FilterCol1 = Condition1a AND FilterCol2 = Condition2a) THEN a.Value END,
T1Value2 = CASE WHEN (FilterCol1 = Condition1b AND FilterCol2 = Condition2b) THEN a.Value END
FROM Table1 T1
left outer join
(
select ID, FilterCol1, FilterCol2, Value
from Table2
) a on a.ID = T1.ID
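Regarding the EDIT above (the unpivoted subquery yields several rows per ID, so a straight CASE only ever tests one of them): a common workaround, sketched here with the same placeholder conditions rather than real column tests, is to collapse the subquery to one row per ID with conditional aggregation before joining:
UPDATE T1
SET T1Value1 = a.Value1,
    T1Value2 = a.Value2
FROM Table1 T1
left outer join
(
    -- One row per ID: each MAX picks the Value from whichever unpivoted row
    -- satisfies that column's filter conditions (NULL if none does).
    select ID,
           MAX(CASE WHEN {Condition1a on FilterCol1}
                     AND {Condition2a on FilterCol2} THEN Value END) AS Value1,
           MAX(CASE WHEN {Condition1b on FilterCol1}
                     AND {Condition2b on FilterCol2} THEN Value END) AS Value2
    from Table2
    group by ID
) a on a.ID = T1.ID
Because the aggregation scans all rows for each ID, the conditions are no longer tested against just the first row that happens to join.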
If I try to UNION (or INTERSECT or EXCEPT) common table expressions, I get a syntax error near the UNION. If instead of using the CTEs I put the queries into the union directly, everything works as expected.
I can work around this, but for some more complicated queries, using CTEs makes things much more readable. I also just don't like not knowing why something is failing.
As an example, the following query works:
SELECT *
FROM
(
SELECT oid, route_group
FROM runs, gpspoints
WHERE gpspoints.oid = runs.start_point_oid
UNION
SELECT oid, route_group
FROM runs, gpspoints
WHERE gpspoints.oid = runs.end_point_oid
) AS allpoints
;
But this one fails with:
ERROR: syntax error at or near "UNION"
LINE 20: UNION
WITH
startpoints AS
(
SELECT oid, route_group
FROM runs, gpspoints
WHERE gpspoints.oid = runs.start_point_oid
),
endpoints AS
(
SELECT oid, route_group
FROM runs, gpspoints
WHERE gpspoints.oid = runs.end_point_oid
)
SELECT *
FROM
(
startpoints
UNION
endpoints
) AS allpoints
;
The data being UNIONed together is identical but one query fails and the other does not.
I'm running PostgreSQL 9.3 on Windows 7.
The problem is that CTEs are not direct text substitutions, and a UNION b is not valid SELECT syntax. The SELECT keyword is a mandatory part of the grammar, and the syntax error is raised before the CTEs are even taken into account.
This is why
SELECT * FROM a
UNION
SELECT * FROM b
works; the syntax is valid, and the CTEs (represented by a and b) are then used in table position (via with_query_name).
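Applied to the failing query above, the fix is simply to SELECT from each CTE; a sketch of the corrected statement:
WITH
startpoints AS
(
SELECT oid, route_group
FROM runs, gpspoints
WHERE gpspoints.oid = runs.start_point_oid
),
endpoints AS
(
SELECT oid, route_group
FROM runs, gpspoints
WHERE gpspoints.oid = runs.end_point_oid
)
SELECT oid, route_group FROM startpoints
UNION
SELECT oid, route_group FROM endpoints
;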
At least in SQL Server, I can easily do this: create two CTEs and do a SELECT from each, combined with a UNION:
WITH FirstNames AS
(
SELECT DISTINCT FirstName FROM Person
), LastNames AS
(
SELECT DISTINCT LastName FROM Person
)
SELECT * FROM FirstNames
UNION
SELECT * FROM LastNames
Not sure if this works in Postgres, too - give it a try!
I am wondering if there is an easy way, a function, or some other method to return data from a query with the following results.
I have a SQL Server 2008 R2 Express database with a table that contains numeric data in a given column, say column T.
I am given a value X in code and would like to return up to three records: the record where column T equals my value X, plus the records immediately before and after it, and nothing else. The sort is done on column T. The record before may not exist (beginning of the table); likewise, if X matches the last record, then the record after would not exist (end of the table).
The value of X may not exist in the table.
This is, I think, similar to getting a range of results in numerical order.
Any help or direction in solving this would be greatly appreciated.
Thanks again,
It might not be the optimal solution, but:
SELECT T
FROM theTable
WHERE T = X
UNION ALL
SELECT *
FROM
(
SELECT TOP 1 T
FROM theTable
WHERE T > X
ORDER BY T
) blah
UNION ALL
SELECT *
FROM
(
SELECT TOP 1 T
FROM theTable
WHERE T < X
ORDER BY T DESC
) blah2
DECLARE @x int = 100
;WITH t as
(
select ROW_NUMBER() OVER (ORDER BY T ASC) AS row_nm,*
from YourTable
)
, t1 as
(
select *
from t
WHERE T = @x
)
select *
from t
CROSS APPLY t1
WHERE t.row_nm BETWEEN t1.row_nm -1 and t1.row_nm + 1