A row might contain data such as 30 > 50, or 170 > 40, etc.
How can I evaluate this varchar column data to determine whether the expression it represents is true or false?
SQL Server 2008 R2 and above.
If you MUST keep your data in this (painful) form, then your best bet will probably be to parse the string into its individual parts. Something like the following steps:
Parse string into operand1, operator, operand2
Cast operand1 and operand2 to int
Probably go into some painful case statement to apply the correct operator based on what you parsed out
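A minimal T-SQL sketch of those steps, assuming every value has the exact form '<int> <op> <int>' with single spaces (the table and column names t and expr are made up):

-- parse out lhs/op/rhs, cast the operands, then apply the operator
SELECT expr,
       CASE op
           WHEN '>'  THEN CASE WHEN lhs >  rhs THEN 1 ELSE 0 END
           WHEN '<'  THEN CASE WHEN lhs <  rhs THEN 1 ELSE 0 END
           WHEN '='  THEN CASE WHEN lhs =  rhs THEN 1 ELSE 0 END
           WHEN '>=' THEN CASE WHEN lhs >= rhs THEN 1 ELSE 0 END
           WHEN '<=' THEN CASE WHEN lhs <= rhs THEN 1 ELSE 0 END
       END AS is_true
FROM t
CROSS APPLY (SELECT CHARINDEX(' ', expr) AS p1) a
CROSS APPLY (SELECT CHARINDEX(' ', expr, p1 + 1) AS p2) b
CROSS APPLY (SELECT CAST(LEFT(expr, p1 - 1) AS int)            AS lhs,
                    SUBSTRING(expr, p1 + 1, p2 - p1 - 1)       AS op,
                    CAST(SUBSTRING(expr, p2 + 1, 8000) AS int) AS rhs) c;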
I would personally recommend finding a way to calculate this before you insert into the database. What you are storing and what you need are very far apart right now.
I wish to store many (N ~ 150) boolean values representing web app "environment" variables.
What is the proper way to store them?
creating N columns and one (1) row of data,
creating two (2) or three (3) columns (id smallserial, name varchar(255), value boolean) with N rows of data,
by using the jsonb data type,
by using an array data type,
by using a bit string, bit varying(n),
by another way (please advise).
Note: names may be long.
Thanks in advance!
Could you perhaps use a bit string? https://www.postgresql.org/docs/7.3/static/datatype-bit.html (Set the nth bit to 1 when the nth attribute would have been "true".)
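A tiny sketch of that idea (the table and column names are made up); substring works on bit strings, so the nth flag can be read back directly:

CREATE TABLE env_flags (
    id    smallint PRIMARY KEY,
    flags bit varying(150) NOT NULL
);

INSERT INTO env_flags VALUES (1, B'101');

-- read the 2nd flag (bit positions are 1-based for substring)
SELECT substring(flags from 2 for 1) = B'1' AS flag_2
FROM env_flags
WHERE id = 1;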
It depends on how you want to access them in normal usage.
Do you need to access one value at a time? In that case jsonb is a really good way: it is easy and quick to find a record. Or do you need to get all of them in one call? In that case bit string types are the best, but you need to be really careful about ordering and transcription when writing and reading.
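For the one-value-at-a-time case, a minimal jsonb sketch might look like this (table and key names are made up):

CREATE TABLE app_env (
    id    smallint PRIMARY KEY,
    flags jsonb NOT NULL
);

INSERT INTO app_env VALUES (1, '{"use_the_force": true, "debug_mode": false}');

-- read one flag
SELECT flags -> 'use_the_force' FROM app_env WHERE id = 1;

-- flip one flag in place (jsonb_set requires PostgreSQL 9.5+)
UPDATE app_env
SET flags = jsonb_set(flags, '{debug_mode}', 'true')
WHERE id = 1;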
Any of the options will do, depending on your circumstances. There is little need to optimise storage if you have only 150 values. Unless, of course, there can be a very large number of these sets of 150 values, or you are working in a very restricted environment like an embedded system (in which case a full-blown database client is probably not what you're looking for).
There is no definite answer here, but I will give you a few guidelines to consider, from experience:
You don't want an anonymous string of values that is interpreted only in code. When you change anything later on, your 1101011 or 0x12f08a will become a fascinatingly enigmatic problem.
When the number of your fields starts to grow, you will regret storing them all in a single cell on a single row, because you will either be developing some obscure SQL or transferring a larger-than-needed dataset from the server.
When you feel that boolean values are really not enough, you start to wonder if there is a possibility to store something else too.
Settings and environmental properties are seldom subject to processor or data intensive processing, so follow the easiest path.
As my recommendation, based on the given information and some educated guessing, you'll probably want to store your information in a table like:
key (string)   | set_idx (integer) | value (string)
---------------+-------------------+---------------
use.the.force  | 1899              | 1
home.directory | 1899              | /home/dvader
use.the.force  | 1900              | 0
home.directory | 1900              | /home/yoda
Converting a 1 to boolean true is cheap, and if you have only one set of values, you can ignore the set index.
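A hedged PostgreSQL sketch of that table (names are hypothetical); the '1'::boolean cast is what makes the conversion cheap:

CREATE TABLE settings (
    key     varchar(255) NOT NULL,
    set_idx integer      NOT NULL,
    value   varchar(255) NOT NULL,
    PRIMARY KEY (key, set_idx)
);

-- read one boolean-ish setting from one set
SELECT value::boolean
FROM settings
WHERE key = 'use.the.force' AND set_idx = 1899;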
I am working on a database that (hopefully) will end up using a primary key with both numbers and letters in its values to track lots of agricultural product. Because the product is weighed at more than one facility, I have no option but to keep the same base number and append letters to denote split portions of each lot. The problem is that after I create record number 99, record 100 suddenly sorts directly under 10, because the IDs are compared as text. This makes it difficult to maintain consistency and forces me to add a second, strictly numeric ID (for which I use "autonumber" as the data type) just to keep things sorted. Either way I need the alphanumeric lot ID, so having two IDs for the same lot can be confusing for anyone inputting values into the form. Is there a way around this that I am just not seeing?
If you're using a query as the data source, you may try to sort by the string converted to a number, something like:
SELECT id, field1, field2, ..
FROM YourTable
ORDER BY CLng(YourAlphaNumericField)
Edit: you may also try the Val function instead of CLng; it should not fail on non-numeric input.
Why not format your key properly before saving? E.g. "0000099". You will avoid a costly conversion later.
Alternatively, you could use 2 fields as the composite PK. One with the Number (as Long) and one with the Location (as String).
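A hedged sketch of that composite key in Access DDL (table and field names are hypothetical); sorting on the numeric part first keeps 100 after 99:

CREATE TABLE Lots (
    LotNumber  LONG     NOT NULL,
    LotPortion TEXT(10) NOT NULL,
    CONSTRAINT pk_Lots PRIMARY KEY (LotNumber, LotPortion)
);

SELECT LotNumber, LotPortion
FROM Lots
ORDER BY LotNumber, LotPortion;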
I need to know whether the results of a SQL query have changed between two executions.
The solution I came up with is to calculate and compare a hash value based on the ResultSet content.
What is the preferred way?
There is no special hashCode method for ResultSet that is calculated from all the retrieved data. You definitely cannot use the default hashCode method.
To be 100% sure that you take every change in the data into account, you would have to retrieve all columns of all rows from the ResultSet one by one and calculate a hash code over them in some way (e.g. put everything into a single String and take its hashCode).
But that is a very time-consuming operation. I would propose executing an extra query that calculates a hash sum by itself. For example, it could return the count of rows and the sum of all columns/rows... or something like that.
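A minimal sketch of that extra-query idea on SQL Server (my_table is a hypothetical name; CHECKSUM can collide, so treat the result as a heuristic rather than a guarantee):

-- run before and after; if either value differs, the data changed
SELECT COUNT(*)                  AS row_count,
       CHECKSUM_AGG(CHECKSUM(*)) AS data_checksum
FROM my_table;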
I'm using Amazon Redshift as my data warehouse.
I have a field (field1) of type string. Some of the strings start with four numbers and others with letters:
'test alpha'
'1382 test beta'
I want to filter out rows where the string does not start with four numbers.
Looking at the Redshift documentation, I don't believe isnumber or isnumeric are available as functions. It seems that LIKE is the best possibility.
I tried
where left(field1, 4) like '[0-9][0-9][0-9][0-9]'
This did not work, and from the link below it seems that Redshift may not support that:
https://forums.aws.amazon.com/message.jspa?messageID=439850
Is there an error in the where clause? If not, and that clause isn't supported in Redshift, is there a way to filter? I was thinking of using cast:
cast(left(field1,4) as integer)
and then skipping the row if it generated an error, but I'm not sure how to do this in Amazon Redshift. Or is there some other proxy for an isnumeric filter?
Thanks
Try something like:
where field1 ~ '^[0-9]{4}'
It will match any string that starts with 4 digits.
Although a long time has passed since this question was asked, I have not found an adequate response, so I feel obliged to share my solution, which works fine on my Redshift cluster today (March 2016).
The UDF is:
create or replace function isnumeric (aval VARCHAR(20000))
returns bool
IMMUTABLE
as $$
try:
    int(aval)
except:
    return False
else:
    return True
$$ language plpythonu;
Usage would be:
select isnumeric(mycolumn), * from mytable
where isnumeric(mycolumn)=false
It looks like what you are looking for is the SIMILAR TO function:
where left(field,4) similar to '[0-9]{4}'
See the Redshift doc.
It seems that Redshift doesn't support any of the following:
where left(field1,4) like '[0-9][0-9][0-9][0-9]'
where left(field1,4) ~ '^[0-9]{4}'
where left(field1,4) like '^[0-9]{4}'
What does seem to work is:
where left(field1,4) between 0 and 9999
This returns all rows that start with four numeric characters.
It seems that even though field1 is of type string, BETWEEN interprets left(field1,4) as a single integer when the string characters are numeric (and does not give an error when they are not). I'll follow up if I find a problem. For instance, I don't deal with anything less than 1000, so I assume, but am not sure, that '0001' is interpreted as 1.
Per Amazon, the POSIX-style ~ regex expressions are slow:
https://docs.aws.amazon.com/redshift/latest/dg/pattern-matching-conditions.html
Using their own REGEXP_* functions seems to be faster.
https://docs.aws.amazon.com/redshift/latest/dg/String_functions_header.html
For checking just a true/false for integers, I've been using the following with success:
REGEXP_COUNT(my_field_to_check, '^[0-9]+$') > 0
REGEXP_COUNT returns 1 here if the value is entirely numeric and 0 otherwise.
where regexp_instr(field1,'^[0-9]{4}') = 0
will remove rows starting with 4 digits (regexp_instr returns 1 here for rows where field1 starts with 4 digits).
We have tried the following, and it worked for most of our scenarios:
columnn ~ '^[-]{0,1}[0-9]{1,}[.]{0,1}[0-9]{0,}$'
This will match positive, negative, integer, and float numbers.
Redshift should support SIMILAR TO:
WHERE field1 SIMILAR TO '[0-9]{4}%'
This reads as: where field1 starts with 4 characters in the range 0-9, followed by anything else.
I'm creating result paging based on the first letter of a certain nvarchar column, not the usual paging on the number of results.
And now I'm faced with a challenge: whether to filter results using the LIKE operator or the equality (=) operator.
select *
from table
where name like @firstletter + '%'
vs.
select *
from table
where left(name, 1) = @firstletter
I've tried searching the net for a speed comparison between the two, but it's hard to find any results, since most search results are related to LEFT JOINs rather than the LEFT function.
"Left" vs "Like" -- one should always use "Like" when possible where indexes are implemented because "Like" is not a function and therefore can utilize any indexes you may have on the data.
"Left", on the other hand, is function, and therefore cannot make use of indexes. This web page describes the usage differences with some examples. What this means is SQL server has to evaluate the function for every record that's returned.
"Substring" and other similar functions are also culprits.
Your best bet would be to measure the performance on real production data rather than trying to guess (or ask us). That's because performance can sometimes depend on the data you're processing, although in this case it seems unlikely (but I don't know that, hence why you should check).
If this is a query you will be doing a lot, you should consider another (indexed) column which contains the lowercased first letter of name and have it set by an insert/update trigger.
This will, at the cost of a minimal storage increase, make this query blindingly fast:
select * from table where name_first_char_lower = @firstletter
That's because most databases are read far more often than written, and this will amortise the cost of the calculation (done only on writes) across all reads.
It introduces redundant data but it's okay to do that for performance as long as you understand (and mitigate, as in this suggestion) the consequences and need the extra performance.
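A hedged sketch of that idea on SQL Server, using a persisted computed column instead of a trigger (it achieves the same precomputation; table and column names are hypothetical):

-- precompute and index the lowercased first letter
ALTER TABLE mytable
    ADD name_first_char_lower AS LOWER(LEFT(name, 1)) PERSISTED;

CREATE INDEX ix_mytable_first_char
    ON mytable (name_first_char_lower);

-- the paging query then becomes an index seek
SELECT * FROM mytable WHERE name_first_char_lower = @firstletter;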
I had a similar question, and ran tests on both. Here is my code.
where (VOUCHER like 'PCNSF%'
or voucher like 'PCLTF%'
or VOUCHER like 'PCACH%'
or VOUCHER like 'PCWP%'
or voucher like 'PCINT%')
Returned 1434 rows in 1 min 51 seconds.
vs
where (LEFT(VOUCHER,5) = 'PCNSF'
or LEFT(VOUCHER,5)='PCLTF'
or LEFT(VOUCHER,5) = 'PCACH'
or LEFT(VOUCHER,4)='PCWP'
or LEFT (VOUCHER,5) ='PCINT')
Returned 1434 rows in 1 min 27 seconds.
My data is faster with the LEFT(VOUCHER,5) version. As an aside, my overall query does hit some indexes.
I would always suggest using the LIKE operator when the search column is indexed. I tested the above query in my production environment with select count(column_name) from table_name where left(column_name,3)='AAA' OR left(column_name,3)='ABA' OR ... up to 9 OR clauses. The count showed 7,301,477 records: 4 seconds with LEFT versus 1 second with LIKE, i.e. where column_name like 'AAA%' OR column_name like 'ABA%' OR ... up to 9 LIKE clauses.
Calling a function in a where clause is not best practice. See http://blog.sqlauthority.com/2013/03/12/sql-server-avoid-using-function-in-where-clause-scan-to-seek/
Entity Framework Core users
You can use EF.Functions.Like(columnName, searchString + "%") instead of columnName.StartsWith(...) and you'll get just a LIKE in the generated SQL instead of all this 'LEFT' craziness!
Depending on your needs, you will probably need to preprocess searchString (e.g. escaping the % and _ wildcard characters).
See also https://github.com/aspnet/EntityFrameworkCore/issues/7429
This function isn't present in Entity Framework (non-Core) EntityFunctions, so I'm not sure how to do it for EF6.