I have an alphanumeric column in my DB2 table. I'm searching for results that fall between two user-entered values.
Sample Data
ABC300
ABC2002
CDEF200
ABC429
UOH250
Sample SQL Query
SELECT VALUE
FROM TABLE
WHERE VALUE BETWEEN 'ABC200' AND 'ABC700'
Returned Values
ABC300
ABC2002
ABC429
ABC2002 is an unwanted result. I understand why the query is returning it: this is a string comparison, so 'ABC2002' sorts after 'ABC200' (because of the extra trailing '2') and before 'ABC700' (because '2' < '7' at the fourth character), which puts it inside the range.
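You can confirm this by checking the two halves of the BETWEEN against the literals (DB2's VALUES statement simply evaluates an expression):
-- both comparisons are true, so 'ABC2002' falls inside the range
VALUES CASE WHEN 'ABC2002' > 'ABC200' AND 'ABC2002' < 'ABC700'
            THEN 'inside range' ELSE 'outside range' END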
I know about PATINDEX but I'm using DB2 and there is no equivalent.
I've tried using TRANSLATE like this:
WHERE TRANSLATE(LOWER(VALUE), '', 'abcdefghijklmnopqrstuvwxyz')
BETWEEN TRANSLATE(LOWER('ABC200'), '', 'abcdefghijklmnopqrstuvwxyz')
AND TRANSLATE(LOWER('ABC700'), '', 'abcdefghijklmnopqrstuvwxyz')
And like this:
WHERE TRANSLATE(LOWER(VALUE), '', 'abcdefghijklmnopqrstuvwxyz')
BETWEEN 200 AND 700
Neither gives the desired results.
The alphabetic prefix is not a fixed value or fixed length.
Any ideas? Thank you.
EDIT
OK, after explaining the problem here I was able to solve it (see: rubber duck debugging). Here is what I did:
SELECT VALUE
FROM TABLE
WHERE TRIM(TRANSLATE(VALUE, '', ' 0123456789'))
      BETWEEN TRIM(TRANSLATE(UPPER(#VALUE_LO), '', ' 0123456789'))
          AND TRIM(TRANSLATE(UPPER(#VALUE_HI), '', ' 0123456789'))
  AND TRIM(TRANSLATE(VALUE, '', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'))
      BETWEEN INT(TRIM(TRANSLATE(UPPER(#VALUE_LO), '', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')))
          AND INT(TRIM(TRANSLATE(UPPER(#VALUE_HI), '', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')))
If you want this functionality, you should split this column into two separate columns. A character column should store the prefix, and a numeric column the number.
The problem here is that you need to perform two different comparison operations:
A string comparison on the prefix.
A numeric comparison on the numeric portion.
You can't squeeze both of these things out of a single comparison.
You should probably change the underlying data permanently, if possible. The problem arises because you are storing two pieces of information (a category and a rank) in a single field, which goes against good database design.
But even if you can't, this is still the right approach: use a subquery (here, a common table expression) to generate the two fields.
with col_split as (
    select
        -- TRANSLATE blanks out the stripped characters; TRIM removes the padding blanks
        trim(translate(value, '', '1234567890')) as prefix,
        cast(trim(translate(lower(value), '', 'abcdefghijklmnopqrstuvwxyz')) as int) as number
    from table
)
select *
from col_split
where prefix = 'ABC'
  and number between 200 and 700
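If you do get the chance to change the schema permanently, a minimal sketch of the split might look like this (the column names prefix and num are made up, table is the same placeholder name used above, and the backfill assumes every value follows the letters-then-digits pattern shown in the question):
ALTER TABLE table ADD COLUMN prefix VARCHAR(10);
ALTER TABLE table ADD COLUMN num INTEGER;

-- backfill: TRANSLATE blanks out the unwanted characters, TRIM strips the blanks
UPDATE table
SET prefix = TRIM(TRANSLATE(UPPER(value), '', '0123456789')),
    num = INT(TRIM(TRANSLATE(UPPER(value), '', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')));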
Related
I am working on a database that (hopefully) will end up using a primary key with both numbers and letters in the values to track lots of agricultural product. Due to the way the weighing of product takes place at more than one facility, I have no option but to keep the same base number and add letters to it to denote split portions of each lot of product. The problem is that after I create record number 99, record number 100 suddenly sorts directly under 10. This makes it difficult to maintain consistency and forces me to replace the alphanumeric lot ID with a strictly numeric value (for which I use "AutoNumber" as the data type) just to keep things sorted. Either way, I need the alphanumeric lot ID, so having two IDs for the same lot can be confusing for anyone entering values into the form. Is there a way around this that I am just not seeing?
If you're using a query as the data source, you can try sorting by the string converted to a number, something like:
SELECT id, field1, field2, ..
FROM YourTable
ORDER BY CLng(YourAlphaNumericField)
Edit: you may also try the Val function instead of CLng; it should not fail on non-numeric input.
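A hedged sketch of that variant (YourTable is a placeholder); Val parses the leading digits and returns 0 when the string does not start with a number, so it will not choke on alphanumeric input the way CLng can:
SELECT id, field1, field2, ..
FROM YourTable
ORDER BY Val(YourAlphaNumericField)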
Why not properly format your key before saving? E.g., "0000099". You will avoid a costly conversion later.
Alternatively, you could use two fields as a composite PK: one with the number (as Long) and one with the location (as String).
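A minimal sketch of that composite key in Access DDL (table and column names are made up for illustration):
CREATE TABLE Lots (
    LotNumber LONG,
    Location TEXT(10),
    CONSTRAINT pk_lots PRIMARY KEY (LotNumber, Location)
);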
$result = mysqli_query($link, "INSERT INTO mytable...
    `friends`,
    `friend1`,
    `friend2`,
    `friend3`)
    VALUES (NULL, '$friends',
    '$friends[0]',
    '$friends[1]',
    '$friends[2]')");
Using cloneya.js to duplicate fields, I get an array value for a set of 3 names. Posting to MySQL, I get all three names in the first field (friends) but only the first, second, and third letters of the first name in the subsequent fields (friend1-3). How can I insert each name into its own field?
$friends is not an array in your case; it's just a string. That means $friends[0] is the first character of that string, $friends[1] is the second character, and so on. You must send the friends under a different variable name, so that you have $friends as a string and $otherName as an array which you can use in your SQL query.
Keep in mind to use prepared statements when your SQL queries depend on variables. Also consider converting your tables to 3NF.
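On the 3NF point, a hedged sketch of what a normalized layout could look like (table and column names are made up); each friend becomes its own row instead of a numbered column:
CREATE TABLE person (
    id   INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);

-- one row per friend instead of friend1..friend3 columns
CREATE TABLE friendship (
    person_id   INT NOT NULL,
    friend_name VARCHAR(100) NOT NULL,
    FOREIGN KEY (person_id) REFERENCES person (id)
);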
How would you search for the longest match within a varchar variable? For example, table GOB has entries as follows:
magic_word | prize
===================
sh         | $0.20
sha        | $0.40
shaz       | $0.60
shaza      | $1.50
I would like to write a plpgsql function that takes amongst other arguments a string as input (e.g. shazam), and returns the 'prize' column on the row of GOB with the longest matching substring. In the example shown, that would be $1.50 on the row with magic_word shaza.
All the function boilerplate I can handle; it's just the matching bit I can't find an elegant solution for. I'm guessing it's probably really easy, but I am scratching my head. I don't know the input string at the start, as it will be derived from the result of a query on another table.
Any ideas?
Simple solution
SELECT magic_word
FROM gob
WHERE 'shazam' LIKE (magic_word || '%')
ORDER BY magic_word DESC
LIMIT 1;
This works because the longest match sorts last - so I sort DESC and pick the first match.
I am assuming from your example that you want to match left-anchored, from the beginning of the string. If you want to match anywhere in the string (which is more expensive and even harder to back up with an index), use:
...
WHERE 'shazam' LIKE ('%' || magic_word || '%')
...
SQL Fiddle.
Performance
The query is not sargable. It might help quite a bit if you had additional information, like a minimum length, that you could base an index on to reduce the number of rows to consider. The criterion needs to get you down to less than ~5% of the table to be effective, so initials (a natural minimum pick) may or may not be useful, but two or three letters at the start might help quite a bit.
In fact you could optimize this iteratively. Something along the line of:
Try a partial index of words with 15 letters+
If not found, try 12 letters+
If not found, try 9 letters+
...
A simple case of what I outlined in this related answer on dba.SE:
Can spatial index help a “range - order by - limit” query
Another approach would be to use a trigram index. You'd need the additional module pg_trgm for that. Normally you would search with a short pattern in a table with longer strings. But trigrams work for your reverse approach, too, with some limitations. Obviously you couldn't match a string with just two characters in the middle of a longer string using trigrams ... Test for corner cases.
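Installing the module itself is a one-liner (assuming the contrib packages are available on your server):
CREATE EXTENSION IF NOT EXISTS pg_trgm;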
There are a number of answers here on SO with more information. Example:
Effectively query on column that includes a substring
Advanced solution
Consider the solution under this closely related question for a whole table of search strings. Implemented with a recursive CTE:
Longest Prefix Match
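For a single input string, a non-recursive simplification of the same idea is to probe every possible prefix of the input with an equality match, each of which can use a plain index on magic_word. The prefixes are spelled out here for 'shazam'; in a plpgsql function they would be generated from the input:
SELECT prize
FROM gob
WHERE magic_word IN (left('shazam', 1), left('shazam', 2), left('shazam', 3),
                     left('shazam', 4), left('shazam', 5), left('shazam', 6))
ORDER BY length(magic_word) DESC   -- longest surviving prefix wins
LIMIT 1;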
How about:
1.
select max(FOO.matchingValue)
from
(
    select magic_word as matchingValue
    from T
    where substr('abracadabra', 1, length(magic_word)) = magic_word
) as FOO
2.
select prize
from T
join
(
    select max(FOO.matchingValue) as MaxValue
    from
    (
        select magic_word as matchingValue
        from T
        where substr('abracadabra', 1, length(magic_word)) = magic_word
    ) as FOO
) as BAR
on BAR.MaxValue = T.magic_word
I've come across full text search in postgres in the last few days, and I am a little confused about indexing when searching across multiple columns.
The postgres docs talk about creating a ts_vector index on concatenated columns, like so:
CREATE INDEX pgweb_idx ON pgweb
USING gin(to_tsvector('english', title || ' ' || body));
which I can search like so:
... WHERE
(to_tsvector('english', title||' '||body) @@ to_tsquery('english', 'foo'))
However, if I wanted to sometimes search just the title, sometimes just the body, and sometimes both, I would need 3 separate indexes. And if I added in a third column, that could potentially be 6 indexes, and so on.
An alternative which I haven't seen in the docs is to index the two columns separately and then just use a normal WHERE...AND query:
... WHERE
(to_tsvector('english', title) @@ to_tsquery('english','foo'))
AND
(to_tsvector('english', body) @@ to_tsquery('english','foo'))
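For this second form to use indexes, there would presumably be one GIN expression index per column, matching the expressions in the WHERE clause:
CREATE INDEX pgweb_title_idx ON pgweb USING gin (to_tsvector('english', title));
CREATE INDEX pgweb_body_idx ON pgweb USING gin (to_tsvector('english', body));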
Benchmarking the two on ~1 million rows shows basically no difference in performance.
So my question is:
Why would I want to concatenate the columns in a single index like this, rather than just indexing the columns individually? What are the advantages/disadvantages of both?
My best guess is that if I knew in advance I would only ever want to search both columns together (never one at a time), I would only need the one concatenated index, which would use less memory.
Edit
moved to: https://dba.stackexchange.com/questions/15412/postgres-full-text-search-with-multiple-columns-why-concat-in-index-and-not-at
1. Using one index is easier / faster for the DB.
2. It will be quite difficult to properly rank results when using two indexes.
3. You can assign relative weights to columns when creating a single index, so that a match in the title is worth more than a match in the body.
4. You are searching for a single word here; what happens if you search for several and they appear separately in different columns?
To answer the question of the implementation of #3, please see https://www.postgresql.org/docs/9.1/textsearch-controls.html:
a weight is one of the letters A, B, C, or D
UPDATE tt SET ti =
setweight(to_tsvector(coalesce(title,'')), 'A') ||
setweight(to_tsvector(coalesce(keyword,'')), 'B') ||
setweight(to_tsvector(coalesce(abstract,'')), 'C') ||
setweight(to_tsvector(coalesce(body,'')), 'D');
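To use those weights at query time, a ranking function reads them back out of the tsvector; a minimal sketch, assuming ti is the weighted tsvector column from the snippet above:
SELECT title, ts_rank(ti, query) AS rank
FROM tt, to_tsquery('foo & bar') AS query
WHERE ti @@ query
ORDER BY rank DESC
LIMIT 10;
This also covers point 4: a query like 'foo & bar' matches even when the two words appear in different columns, because they all end up in the same tsvector.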
Just to be clear, I cannot use a CLR UDF for this, and SUBSTRING and CHARINDEX alone just don't cut the mustard.
We have a faux account management system with accounts being sub-accounts of others, better described here (with tables too :) )
Now, assuming I have an account 2.4.1.3 (obviously, the parent becomes 2.4.1) and I wanted to extract the 'prefix' 2.4.1 so that I may create another sibling account with the next ID in line (assume 2.4.1.4), how would I go about splitting such a string in T-SQL?
Of course, a similar approach can be applied to child accounts, but that's just butterscotch for this sundae.
Try something like this:
DECLARE @accountno VARCHAR(50) = '2.4.1.3'

SELECT
    REVERSE(@accountno),
    CHARINDEX('.', REVERSE(@accountno)),
    SUBSTRING(@accountno, 1, LEN(@accountno) - CHARINDEX('.', REVERSE(@accountno)))
That third element in the SELECT statement should be the one that extracts the "prefix" 2.4.1 from your account number string.
Basically, what I do is reverse the string and then look for the first occurrence of the dot ('.'); the first dot in the reversed string is the last one in the original string, and that's the point you want to extract up to.
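Building on that, a hedged sketch of producing the next sibling ID from the extracted pieces (this simply increments the last segment of the given account number; finding the true next free ID would still mean checking the existing rows):
DECLARE @accountno VARCHAR(50) = '2.4.1.3'

-- everything before the last dot: '2.4.1'
DECLARE @prefix VARCHAR(50) =
    SUBSTRING(@accountno, 1, LEN(@accountno) - CHARINDEX('.', REVERSE(@accountno)))

-- everything after the last dot: '3'
DECLARE @lastpart VARCHAR(50) =
    RIGHT(@accountno, CHARINDEX('.', REVERSE(@accountno)) - 1)

-- next sibling: '2.4.1.4'
SELECT @prefix + '.' + CAST(CAST(@lastpart AS INT) + 1 AS VARCHAR(10)) AS next_sibling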