Copying data from VARCHAR2 to NUMBER in the same table - Oracle 10g

I have two columns and I need to copy the data from column VISITSAUTHORIZED to NEWVISITS. When I use the command below to copy the data, I get the error message "invalid number". Can anyone correct this?
VISITSAUTHORIZED VARCHAR2(10)
NEWVISITS NUMBER(8)
SQL> update patientinsurance set NEWVISITS=VISITSAUTHORIZED ;
ERROR at line 1:
ORA-01722: invalid number

It depends what kind of data you have in your old column. If it is all consistently formatted then you might be able to do:
update patientinsurance
set newvisits = to_number(visitsauthorized, '<format model>');
But it sounds more likely that you have something less easy to deal with (the joys of storing data as the wrong datatype, which I assume is what you're now correcting). If there are rogue characters then you could perhaps use translate to get rid of them, but you'd have to wonder about the integrity of the data and the values you end up with.
You can do something like this to display all the values that can't be converted, which may give you an idea of the best way to proceed - if there are only a few you might be able to correct them manually before re-running your update:
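set serveroutput on
declare
  -- same type as the target column, so values too large for NUMBER(8)
  -- are reported as well as non-numeric ones
  newvisits patientinsurance.newvisits%type;
  number_format_exception exception;
  -- ORA-06502: numeric or value error, raised here for bad strings and oversized values
  pragma exception_init(number_format_exception, -6502);
begin
  for r in (select id, visitsauthorized from patientinsurance) loop
    begin
      newvisits := to_number(r.visitsauthorized);
    exception
      when number_format_exception then
        dbms_output.put_line(sqlcode || ' ID ' || r.id
          || ' value ' || r.visitsauthorized);
    end;
  end loop;
end;
/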
This assumes you have a unique identifier column called ID; change that as appropriate for your table, obviously.
Another approach is to convert the numbers that are valid and skip over the rest, which you can do with an error logging table:
exec dbms_errlog.create_error_log(dml_table_name => 'PATIENTINSURANCE');
merge into patientinsurance target
using (select id, visitsauthorized from patientinsurance) source
on (target.id = source.id)
when matched then
update set target.newvisits = source.visitsauthorized
log errors into err$_patientinsurance reject limit unlimited;
You can then query the error table to see what failed:
select id, visitsauthorized, ora_err_number$
from err$_patientinsurance;
Or see which records in your main table have newvisits still null. Analysing your data should probably be the first step though.
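To see what the error-logging approach does row by row, here is a minimal Python sketch of the same idea: convert what can be converted and record the rest with a reason, roughly what LOG ERRORS ... REJECT LIMIT UNLIMITED gives you. It assumes rows arrive as (id, text) pairs, and Python's Decimal parsing only approximates Oracle's conversion rules; the function name split_convertible is hypothetical.

```python
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP

def split_convertible(rows, precision=8):
    """Sketch of 'update what converts, log the rest' for a NUMBER(precision) column.

    rows: iterable of (id, text) pairs, e.g. (id, visitsauthorized).
    Returns (converted, rejected): converted maps id -> int or None,
    rejected lists (id, text, reason), much like an err$_ table.
    Decimal parsing only approximates Oracle's TO_NUMBER rules.
    """
    converted, rejected = {}, []
    limit = 10 ** precision
    for row_id, text in rows:
        if text is None:              # NULL converts to NULL
            converted[row_id] = None
            continue
        try:
            value = Decimal(text.strip())
        except InvalidOperation:
            rejected.append((row_id, text, "invalid number"))
            continue
        if not value.is_finite():     # 'NaN' and 'Infinity' parse but are not usable
            rejected.append((row_id, text, "invalid number"))
            continue
        if abs(value) >= limit:       # check before rounding: huge values can't be quantized
            rejected.append((row_id, text, "value larger than specified precision"))
            continue
        # NUMBER(8) has scale 0, so fractions are rounded rather than rejected
        whole = int(value.quantize(Decimal(1), rounding=ROUND_HALF_UP))
        if abs(whole) >= limit:       # e.g. 99999999.5 rounds up to 100000000
            rejected.append((row_id, text, "value larger than specified precision"))
            continue
        converted[row_id] = whole
    return converted, rejected

converted, rejected = split_convertible([(1, "12"), (2, "abc"), (3, "1.5")])
print(converted)  # {1: 12, 3: 2}
print(rejected)   # [(2, 'abc', 'invalid number')]
```

The rejected list plays the part of the error table: inspect it, fix those rows by hand, and re-run.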
If you want to strip out all non-numeric characters and treat whatever is left as a number then you can change the merge to do:
...
update set target.newvisits = regexp_replace(source.visitsauthorized,
'[^[:digit:]]', null)
But then you probably don't need the merge; you can just do:
update patientinsurance set newvisits = regexp_replace(visitsauthorized,
'[^[:digit:]]', null);
This will also strip out any group or decimal separators. That might not matter, particularly as you're inserting into a NUMBER(8) column, but note that '1.5' would become 15. You could preserve the separators by changing the pattern to '[^[:digit:].,]', though that could still cause other problems (a value like '1,000' would then fail to convert).
You can also do this with translate if the regex is too slow.
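To make the separator caveat concrete, here is a small Python sketch of the strip-non-digits approach. It is an illustration, not Oracle's behaviour in every detail: [0-9] stands in for [:digit:], and an all-non-digit input becomes None because Oracle treats the resulting empty string as NULL. The helper name is hypothetical.

```python
import re

def strip_non_digits(text):
    """Sketch of regexp_replace(text, '[^[:digit:]]', null) followed by conversion.

    Oracle treats '' as NULL, so input with no digits at all becomes None.
    """
    if text is None:
        return None
    digits = re.sub(r"[^0-9]", "", text)
    return int(digits) if digits else None

for raw in ["12 visits", "1,000", "1.5", "n/a"]:
    print(repr(raw), "->", strip_non_digits(raw))
# '12 visits' -> 12
# '1,000' -> 1000
# '1.5' -> 15
# 'n/a' -> None
```

The '1.5' -> 15 line is the silent change to watch for before running the update.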

Related

What's the easiest way to find rows of data that are causing errors in Query?

I'm getting an error when I run a query due to a couple of lines of my source data having decimals when they should not. I need to pinpoint these specific rows so I can correct them in the source data but I do not know how to find them. Is there an easy way to pinpoint the rows that are causing the error?
It appears your data is improperly typed: you have a text or varchar column for what should be a numeric type. That needs to be corrected, but first you must fix the data. You can isolate the incorrect rows by selecting those that fail a regular expression defining a valid value. Since you did not post sample data or table descriptions, the example below uses made-up values, not yours.
Example:
- For digits only: column_name !~ '^[0-9]+$'
- For a positive or negative integer: column_name !~ '^(\+|-)?[0-9]+$'
with test (id,num_val) as
( values (1,'abc'),(2,'18'),(3,'1.7'),(4,'-12') )
select id, num_val, 'Fails Just Digits'
from test
where num_val !~ '^[0-9]+$'
union
select id, num_val, 'Fails Pos/Neg Integer'
from test
where num_val !~ '^(\+|-)?[0-9]+$'
order by 1,3;
Result, by id:
- 1 ('abc'): fails both
- 2 ('18'): passes both, so it is not in the output
- 3 ('1.7'): fails both
- 4 ('-12'): fails Just Digits only
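The same check is easy to run outside the database, for example to validate values before they are loaded. A Python sketch using the two patterns from the answer; fullmatch replaces the ^...$ anchors because Python's $ also matches just before a trailing newline. The names are illustrative.

```python
import re

# fullmatch requires the whole string to match, like the ^...$ anchors
JUST_DIGITS = re.compile(r"[0-9]+")
SIGNED_INT = re.compile(r"(\+|-)?[0-9]+")

test = [(1, "abc"), (2, "18"), (3, "1.7"), (4, "-12")]

def failures(rows):
    """Return (id, value, label) for each pattern a row fails, ordered like the query."""
    out = []
    for row_id, value in rows:
        if not JUST_DIGITS.fullmatch(value):
            out.append((row_id, value, "Fails Just Digits"))
        if not SIGNED_INT.fullmatch(value):
            out.append((row_id, value, "Fails Pos/Neg Integer"))
    return out

for row in failures(test):
    print(row)
```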

How can I change all occurrences of a particular value in any column in PostgreSQL?

I have three different values in my database that represent a null: an actual null, an empty string, and a string {x:Null}. This value appears across multiple columns.
{x:Null} is normalized on the web front-end, so all these values look exactly the same although they end up ordered differently in a sort. How can I write a query that will take these values and make them actual nulls across every column and every table?
Bonus points if you can tell me how to make sure these other empty values are always inserted as nulls going forward. (Disclaimer: I have no power to grant any actual bonus points. ;)
You can query the information_schema to get a list of all tables and columns with a string type.
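SELECT table_name, column_name
FROM information_schema.columns
JOIN information_schema.tables USING (table_schema, table_name)
WHERE table_type = 'BASE TABLE'
AND table_schema NOT IN ('pg_catalog', 'information_schema')
AND data_type IN ('text', 'character', 'character varying');
NOTE: check first which values data_type actually contains in your database; PostgreSQL reports char(n) as 'character' and varchar(n) as 'character varying'. The join to information_schema.tables keeps views and system schemas out of the list.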
Then I would write a small program to update each column in each table. Here it is sketched out in Perl.
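# $sth: the information_schema query above, already prepared and executed
while ( my ($table, $column) = $sth->fetchrow_array ) {
    # quote_identifier (not quote) escapes names as identifiers, not string literals
    my $q_table  = $dbh->quote_identifier($table);
    my $q_column = $dbh->quote_identifier($column);
    # qq[] interpolates $q_table/$q_column; q[] would send them literally
    $dbh->do(qq[
        UPDATE $q_table
        SET $q_column = NULL
        WHERE $q_column = '{x:Null}'
        OR $q_column = ''
    ]);
}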
Be sure to quote $table and $column as identifiers (DBI's quote_identifier), not as string literals (quote).
Going forward, you'll have to set constraints on each and every column. You can use information_schema.columns for this as well. Something like:
ALTER TABLE $q_table ADD CHECK ($q_column NOT IN ('{x:Null}', ''))
You could use a trigger to change the values to NULL, but I don't like data stores that silently change basic data for application purposes.
For new columns and tables, you'll have to remember to add that constraint. Same caveats about data_type apply.
However, it's probably a bad idea to say that no column can ever be an empty string. You might want to be a bit more selective.
Another thing to note: NULL is a funny thing; it's neither true nor false. You might be better off deciding that an empty string is the thing to set empty values to.
I don't think this approach is maintainable. It's scribbling an application rule all over the data layer. What if you have some data that doesn't follow that rule? And it will have to be continuously maintained for any new data schema added. Perhaps instead you should put this at your ORM layer. Or write a few stored procedures to take care of this.
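A minimal sketch of that application-layer idea in Python, assuming records reach the data layer as dicts just before an INSERT or UPDATE. The placeholder set and the function name are hypothetical; adjust them to whatever your front end actually produces.

```python
# Values the application treats as "no value" (hypothetical list)
NULL_PLACEHOLDERS = {"", "{x:Null}"}

def normalize_nulls(record):
    """Return a copy of record with placeholder strings replaced by None.

    Run this in the application/ORM layer before writing, so the
    database only ever receives real NULLs.
    """
    return {
        key: None if isinstance(value, str) and value in NULL_PLACEHOLDERS else value
        for key, value in record.items()
    }

print(normalize_nulls({"nick": "{x:Null}", "city": "", "name": "Ann", "age": 0}))
```

Keeping the rule in one function like this means new tables get it for free, which is the maintainability point made above.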
Using the information_schema.columns table, write a procedural-language routine that iterates through all applicable tables and columns, executing UPDATE ... SET column = NULL ... WHERE column IN ('', '{x:Null}') for each eligible column.
As for inserting these values as NULL going forward, you would have to set triggers on your tables to intercept these values and replace them with NULL.
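In PostgreSQL itself such a routine would usually be a PL/pgSQL DO block that builds each statement with format('%I', ...) and runs it with EXECUTE. As a sketch of just the statement-building step, here it is in Python; the quoting rule (wrap in double quotes, double any embedded double quote) is PostgreSQL's identifier syntax, while the function names are hypothetical.

```python
def quote_ident(name):
    """Quote a PostgreSQL identifier: wrap in double quotes, double embedded ones.

    Unlike PostgreSQL's quote_ident(), this always quotes, which is harmless.
    """
    return '"' + name.replace('"', '""') + '"'

def null_out_statements(columns):
    """Build one UPDATE per (schema, table, column) row from information_schema.columns."""
    stmts = []
    for schema, table, column in columns:
        col = quote_ident(column)
        stmts.append(
            f"UPDATE {quote_ident(schema)}.{quote_ident(table)} "
            f"SET {col} = NULL WHERE {col} IN ('', '{{x:Null}}');"
        )
    return stmts

for stmt in null_out_statements([("public", "users", "nick")]):
    print(stmt)
```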
I don't think there is any query that would do this thing for every table and every column. In principle, what you want to do is
UPDATE table SET column=NULL WHERE column='' OR column='{x:Null}';
You could try selecting data from the pg_attribute and pg_class catalogs to get the table and column names, and then generating the queries automatically. Be sure to select only those columns that contain textual data.
What if somebody has entered a genuine string '{x:Null}'? You would then change it into NULL.
However, you made a real mistake by letting the situation get as bad as it currently is. You should always normalize data before putting it into a database.

How can I generate a unique string per record in a table in Postgres?

Say I have a table like posts, which has typical columns like id, body, created_at. I'd like to generate a unique string with the creation of each post, for use in something like a url shortener. So maybe a 10-character alphanumeric string. It needs to be unique within the table, just like a primary key.
Ideally there would be a way for Postgres to handle both of these concerns:
generate the string
ensure its uniqueness
And they must go hand-in-hand, because my goal is to not have to worry about any uniqueness-enforcing code in my application.
I don't claim the following is efficient, but it is how we have done this sort of thing in the past.
CREATE FUNCTION make_uid() RETURNS text AS $$
DECLARE
new_uid text;
done bool;
BEGIN
done := false;
WHILE NOT done LOOP
new_uid := md5(''||now()::text||random()::text);
done := NOT exists(SELECT 1 FROM my_table WHERE uid=new_uid);
END LOOP;
RETURN new_uid;
END;
$$ LANGUAGE PLPGSQL VOLATILE;
make_uid() can be used as the default for a column in my_table. Something like:
ALTER TABLE my_table ADD COLUMN uid text NOT NULL DEFAULT make_uid();
md5(''||now()::text||random()::text) can be adjusted to taste. You could consider encode(...,'base64') except some of the characters used in base-64 are not URL friendly.
All existing answers are WRONG because they rely on a SELECT to check whether a generated code is already taken. Let us assume that we need a unique code per record at insert time. Imagine two concurrent INSERTs happening at the same time (which happens more often than you think) that both generate the same code: at the moment of each SELECT that code did not exist in the table, so one instance will INSERT and the other will fail.
First, let us create a table with a code column and add a unique index:
CREATE TABLE my_table
(
code TEXT NOT NULL
);
CREATE UNIQUE INDEX ON my_table (lower(code));
Then we need a function or procedure (you can also use the code inside a trigger) that 1. generates a new code, 2. tries to insert a new record with that code, and 3. if the insert fails, tries again from step 1:
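CREATE OR REPLACE PROCEDURE my_table_insert()
AS $$
DECLARE
    new_code TEXT;
BEGIN
    LOOP
        -- 8 hex characters from an MD5 of the current time and a random number
        new_code := LOWER(SUBSTRING(MD5(''||NOW()::TEXT||RANDOM()::TEXT) FOR 8));
        BEGIN
            INSERT INTO my_table (code) VALUES (new_code);
            EXIT;  -- inserted: leave the loop
        EXCEPTION WHEN unique_violation THEN
            NULL;  -- the unique index rejected a duplicate: try another code
        END;
    END LOOP;
END;
$$ LANGUAGE PLPGSQL;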
Unlike the other solutions in this thread, this one does not fail when two concurrent inserts generate the same code.
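The key property is that the unique index, not a prior SELECT, decides whether a code is free. Here is a runnable Python sketch of the same generate, insert, retry-on-violation loop. It uses SQLite only so it can run stand-alone (sqlite3.IntegrityError plays the role of unique_violation, and the expression index mirrors the lower(code) index); the function names are hypothetical.

```python
import hashlib
import random
import sqlite3

def create_table(conn):
    conn.execute("CREATE TABLE my_table (code TEXT NOT NULL)")
    # expression index, like CREATE UNIQUE INDEX ON my_table (lower(code))
    conn.execute("CREATE UNIQUE INDEX my_table_code_key ON my_table (lower(code))")

def make_code(length=8):
    """Random hex code, similar to LOWER(SUBSTRING(MD5(...) FOR 8))."""
    return hashlib.md5(str(random.random()).encode()).hexdigest()[:length]

def insert_with_retry(conn, make=make_code, max_attempts=100):
    """Generate a code and INSERT it; on a unique violation, try a fresh one."""
    for _ in range(max_attempts):
        code = make()
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute("INSERT INTO my_table (code) VALUES (?)", (code,))
            return code
        except sqlite3.IntegrityError:
            continue  # collision: loop and generate another code
    raise RuntimeError("no free code found")

conn = sqlite3.connect(":memory:")
create_table(conn)
print(insert_with_retry(conn))
```

The max_attempts cap is a design choice not in the original procedure: it turns a full code space into an error instead of an endless loop.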
Use a Feistel network. This technique works efficiently to generate unique random-looking strings in constant time without any collision.
For a version with about 2 billion possible strings (2^31) of 6 letters, see this answer.
For a 63 bits version based on bigint (9223372036854775808 distinct possible values), see this other answer.
You may change the round function as explained in the first answer to introduce a secret element to have your own series of strings (not guessable).
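For illustration, here is an independent Python sketch of the technique (not the code from the linked answers): a 3-round Feistel network over 32-bit values, followed by a fixed-width base-62 encoding. Feed it a sequence value, for example from nextval, and uniqueness follows because the Feistel construction is a permutation. The round function, keys, alphabet and 6-character width are all arbitrary choices, and this is an obfuscation, not a cryptographic cipher.

```python
import string

MASK16 = 0xFFFF
ALPHABET = string.digits + string.ascii_letters  # 62 URL-safe characters
KEYS = (0x1234, 0xBEEF, 0x0F0F)  # hypothetical round keys; keep your own secret

def _round(half, key):
    """Round function: any deterministic 16-bit mix works; this one is arbitrary."""
    return ((half * 0x9E37 + key) ^ (half >> 5)) & MASK16

def feistel32(n, keys=KEYS):
    """Permute 0 <= n < 2**32: distinct inputs always give distinct outputs."""
    if not 0 <= n < 2**32:
        raise ValueError("n must fit in 32 bits")
    left, right = n >> 16, n & MASK16
    for k in keys:
        left, right = right, left ^ _round(right, k)
    return (left << 16) | right

def feistel32_inverse(n, keys=KEYS):
    """Undo feistel32 by running the rounds backwards."""
    left, right = n >> 16, n & MASK16
    for k in reversed(keys):
        left, right = right ^ _round(left, k), left
    return (left << 16) | right

def to_code(n, width=6):
    """Fixed-width base-62 string; 62**6 > 2**32, so 6 characters always suffice."""
    if not 0 <= n < 62 ** width:
        raise ValueError("n does not fit in the requested width")
    chars = []
    for _ in range(width):
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))

# e.g. code = to_code(feistel32(sequence_value)) for consecutive sequence values
print([to_code(feistel32(i)) for i in range(1, 4)])
```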
The easiest way is probably to use a sequence to guarantee uniqueness
(and append a fixed x-digit random number after the sequence value):
CREATE SEQUENCE test_seq;
CREATE TABLE test_table (
id bigint NOT NULL DEFAULT (nextval('test_seq')::text || (LPAD(floor(random()*100000000)::text, 8, '0')))::bigint,
txt TEXT
);
insert into test_table (txt) values ('1');
insert into test_table (txt) values ('2');
select id, txt from test_table;
However, this wastes a huge part of the key space. (Note: the maximum bigint is 9223372036854775807, so if you append an 8-digit random number, the sequence can only go up to 92233720367, about 92 billion records. Though 8 digits are probably not necessary. Also check the maximum integer for your programming environment!)
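The ceiling on the sequence value follows from a little arithmetic: the id is the sequence value times 10^d plus a d-digit suffix, and the largest suffix must still fit in a bigint. A Python sketch (the function name is illustrative):

```python
BIGINT_MAX = 2**63 - 1  # 9223372036854775807

def max_sequence_value(suffix_digits):
    """Largest n such that n followed by any suffix_digits-digit number fits in a bigint:
    n * 10**d + (10**d - 1) <= BIGINT_MAX."""
    scale = 10 ** suffix_digits
    return (BIGINT_MAX - (scale - 1)) // scale

print(max_sequence_value(8))  # 92233720367
```

Each digit removed from the suffix multiplies the usable sequence range by ten.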
Alternatively, you can use varchar for the id and convert the above number with to_hex(), or to base 36 as in the question below (but if you use base 36, try not to expose it to customers, to avoid funny strings showing up!):
PostgreSQL: Is there a function that will convert a base-10 int into a base-36 string?
Check out a blog post by Bruce. This gets you part of the way there. You will have to make sure the value doesn't already exist. Maybe concatenate the primary key to it?
Generating Random Data Via Sql
"Ever need to generate random data? You can easily do it in client applications and server-side functions, but it is possible to generate random data in sql. The following query generates five lines of 40-character-length lowercase alphabetic strings:"
SELECT
(
SELECT string_agg(x, '')
FROM (
SELECT chr(ascii('a') + floor(random() * 26)::integer)
FROM generate_series(1, 40 + b * 0)
) AS y(x)
)
FROM generate_series(1,5) as a(b);
Use the primary key of your data. If you really need an alphanumeric unique string, you can base-36 encode the key. PostgreSQL has no built-in base-36 function, but you can use this user-defined one.
Example:
select base36_encode(generate_series(1000000000,1000000010));
GJDGXS
GJDGXT
GJDGXU
GJDGXV
GJDGXW
GJDGXX
GJDGXY
GJDGXZ
GJDGY0
GJDGY1
GJDGY2
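To show what the encoding does, here is a Python sketch that reproduces the sample output above (uppercase digits, no padding). The Python function names are illustrative; decoding uses the built-in int(s, 36), which accepts either case.

```python
import string

DIGITS36 = string.digits + string.ascii_uppercase  # 0-9 then A-Z

def base36_encode(n):
    """Encode a non-negative integer in base 36, most significant digit first."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    out = []
    while n:
        n, rem = divmod(n, 36)
        out.append(DIGITS36[rem])
    return "".join(reversed(out))

def base36_decode(s):
    return int(s, 36)

print(base36_encode(1000000000))  # GJDGXS
```

Because the encoding is reversible, the primary key can be recovered from the string, so no extra uniqueness check is needed.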

Sequence Generators in T-SQL

We have an Oracle application that uses a standard pattern to populate surrogate keys. We have a series of extrinsic rows (that have specific values for the surrogate keys) and other rows that have intrinsic values.
We use the following Oracle trigger snippet to determine what to do with the Surrogate key on insert:
IF :NEW.SurrogateKey IS NULL THEN
SELECT SurrogateKey_SEQ.NEXTVAL INTO :NEW.SurrogateKey FROM DUAL;
END IF;
If the supplied surrogate key is null then get a value from the nominated sequence, else pass the supplied surrogate key through to the row.
I can't seem to find an easy way to do this in T-SQL. There are all sorts of approaches, but none of them use the notion of a sequence generator the way Oracle and other databases with SQL-standard sequences do.
Anybody know of a really efficient way to do this in SQL Server T-SQL? By the way, we're using SQL Server 2008 if that's any help.
You may want to look at IDENTITY. This gives you a column for which the value will be determined when you insert the row.
This may mean that you have to insert the row, and determine the value afterwards, using SCOPE_IDENTITY().
There is also an article on simulating Oracle Sequences in SQL Server here: http://www.sqlmag.com/Articles/ArticleID/46900/46900.html?Ad=1
Identity is one approach, although it will generate unique identifiers at a per table level.
Another approach is to use unique identifiers, in particular NEWSEQUENTIALID(), which ensures the generated id is always bigger than the last one generated on that machine (until Windows restarts). The problem with this approach is that you are no longer dealing with integers.
The closest way to emulate the oracle method is to have a separate table with a counter field, and then write a user defined function that queries this field, increments it, and returns the value.
Here is a way to do it using a table to store your last sequence number. The stored proc is very simple; most of the stuff in there is because I'm lazy and don't like surprises should I forget something. Here it is:
----- Create the sequence value table.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[SequenceTbl]
(
[CurrentValue] [bigint]
) ON [PRIMARY]
GO
-----------------Create the stored procedure
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
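CREATE procedure [dbo].[sp_NextInSequence](@SkipCount BigInt = 1)
AS
BEGIN
    BEGIN TRANSACTION
    DECLARE @NextInSequence BigInt;
    IF NOT EXISTS
    (
        SELECT
            CurrentValue
        FROM
            SequenceTbl
    )
        INSERT INTO SequenceTbl (CurrentValue) VALUES (0);
    -- UPDLOCK with HOLDLOCK makes a second caller wait here; HOLDLOCK alone
    -- lets two callers read the same value and then deadlock on the UPDATE
    SELECT TOP 1
        @NextInSequence = ISNULL(CurrentValue, 0) + 1
    FROM
        SequenceTbl WITH (UPDLOCK, HOLDLOCK);
    -- reserve @SkipCount values, starting at @NextInSequence
    UPDATE SequenceTbl WITH (UPDLOCK)
    SET CurrentValue = @NextInSequence + (@SkipCount - 1);
    COMMIT TRANSACTION
    -- a procedure's return status is an int: switch to an OUTPUT parameter
    -- once the sequence can exceed 2147483647
    RETURN @NextInSequence
END;
GO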
--------Use the stored procedure in Sql Manager to retrive a test value.
declare @NextInSequence BigInt
exec @NextInSequence = sp_NextInSequence;
--exec @NextInSequence = sp_NextInSequence <skipcount>;
select NextInSequence = @NextInSequence;
-----Show the current table value.
select * from SequenceTbl;
The astute will notice that there is a parameter (optional) for the stored proc. This is to allow the caller to reserve a block of ID's in the instance that the caller has more than one record that needs a unique id - using the SkipCount, the caller need make only a single call for however many IDs are needed.
The entire "IF NOT EXISTS...INSERT INTO..." block can be removed if you remember to insert a record when the table is created. If you also insert that record with a value (your seed value, a number which will never be used as an ID), you can also remove the ISNULL(...) portion of the select and just use CurrentValue + 1.
Now, before anyone makes a comment, please note that I am a software engineer, not a dba! So, any constructive criticism concerning the use of "Top 1", "With (HoldLock)" and "With (UPDLock)" is welcome. I don't know how well this will scale but this works OK for me so far...
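The read-then-increment-under-lock pattern, including block reservation via the skip count, can be tried out with a runnable Python sketch. SQLite stands in for SQL Server here: BEGIN IMMEDIATE takes the write lock before the read, which is the job the UPDLOCK/HOLDLOCK hints are meant to do. The counter row is seeded at creation, as the paragraph above suggests. Function names are hypothetical.

```python
import sqlite3

def create_sequence_table(conn, seed=0):
    """Create the counter table with its single row already in place."""
    conn.execute("CREATE TABLE SequenceTbl (CurrentValue INTEGER NOT NULL)")
    conn.execute("INSERT INTO SequenceTbl (CurrentValue) VALUES (?)", (seed,))

def next_in_sequence(conn, skip_count=1):
    """Reserve skip_count consecutive values and return the first one.

    Expects a connection opened with isolation_level=None, so the explicit
    BEGIN/COMMIT below control the transaction.
    """
    if skip_count < 1:
        raise ValueError("skip_count must be at least 1")
    conn.execute("BEGIN IMMEDIATE")  # write lock before reading CurrentValue
    try:
        (current,) = conn.execute("SELECT CurrentValue FROM SequenceTbl").fetchone()
        conn.execute("UPDATE SequenceTbl SET CurrentValue = ?", (current + skip_count,))
        conn.execute("COMMIT")
    except BaseException:
        conn.execute("ROLLBACK")
        raise
    return current + 1

conn = sqlite3.connect(":memory:", isolation_level=None)
create_sequence_table(conn)
print(next_in_sequence(conn))     # 1
print(next_in_sequence(conn, 5))  # 2 (caller owns 2..6)
print(next_in_sequence(conn))     # 7
```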

How do I avoid using cursors in Sybase (T-SQL)?

Imagine the scene: you're updating some legacy Sybase code and come across a cursor. The stored procedure builds up a result set in a #temporary table which is all ready to be returned, except that one of the columns isn't terribly human-readable; it's an alphanumeric code.
What we need to do is figure out the possible distinct values of this code, call another stored procedure to cross-reference these discrete values, and then update the result set with the newly deciphered values:
declare c_lookup_codes cursor for
select distinct lookup_code
from #workinprogress
open c_lookup_codes
while(1=1)
begin
    fetch c_lookup_codes into @lookup_code
    -- @@sqlstatus: 0 = row fetched, 1 = error, 2 = no more rows
    if @@sqlstatus <> 0
    begin
        break
    end
    exec proc_code_xref @lookup_code, @xref_code OUTPUT
    update #workinprogress
    set xref = @xref_code
    where lookup_code = @lookup_code
end
close c_lookup_codes
deallocate cursor c_lookup_codes
Now then, whilst this may give some folks palpitations, it does work. My question is, how best would one avoid this kind of thing?
NB: for the purposes of this example you can also imagine that the result set is in the region of 500k rows, that there are 100 distinct values of lookup_code, and finally, that it is not possible to have a table holding the xref values, as the logic in proc_code_xref is too arcane.
You have to have an XRef table if you want to take out the cursor. Assuming you know the 100 distinct lookup values (and that they're static), it's simple to generate one by calling proc_code_xref 100 times and inserting the results into a table.
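The shape of that approach, sketched in Python: call the expensive lookup once per distinct code, then apply the results to every row in one pass, the equivalent of a single joined UPDATE against the xref table. The lookup callable is a hypothetical stand-in for proc_code_xref.

```python
def build_xref(codes, lookup):
    """Call the expensive lookup once per distinct code (about 100 calls
    for 500k rows in the question's scenario) and keep the results."""
    return {code: lookup(code) for code in set(codes)}

def apply_xref(rows, xref):
    """One pass over the result set using the prebuilt table, instead of
    one UPDATE per code."""
    for row in rows:
        row["xref"] = xref[row["lookup_code"]]
    return rows

rows = [{"lookup_code": c} for c in ["AB", "CD", "AB"]]
xref = build_xref((r["lookup_code"] for r in rows), lambda code: code.lower())
print(apply_xref(rows, xref))
```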
Unless you are willing to duplicate the code in the xref proc, there is no way to avoid using a cursor.
They say that if you must use a cursor, you must have done something wrong ;-) Here's a solution without a cursor:
declare @lookup_code char(8)
select distinct lookup_code
into #lookup_codes
from #workinprogress
while 1=1
begin
    select @lookup_code = lookup_code from #lookup_codes
    if @@rowcount = 0 break
    exec proc_code_xref @lookup_code, @xref_code OUTPUT
    -- apply the result, as the cursor version does
    update #workinprogress
    set xref = @xref_code
    where lookup_code = @lookup_code
    delete #lookup_codes
    where lookup_code = @lookup_code
end