Advanced usage of "like" statement in sqlite - iphone

I am now working on the sqlite in iPhone platform and dealing with the database with some strange entities. One of the column in this DB called type, which allows multiple values (multiple types). The data in this column is like this:
1|9|20|31|999
The table is like:
ID TYPE
------------------------
1 1|9|20|31|999
2 5|13|15|30|990
3 6|7|45|46|57
When the user want to select the data with the type 9, it needs select the above data because the entity contains 9. And I use the following statement to execute:
SELECT id
FROM table
WHERE type LIKE '%' || ? || '%'
The problem is that the data with the type 999 and without type 9 will also be selected. Also, if the user wants to select type 1, the data with type 11, 12, 13...etc will also be selected.
I tried to use the statement:
SELECT id
FROM table
WHERE type LIKE '[^0123456789]' || ? || '[^0123456789]'
But it can't select any data.
What can I do to select the data with correct type?
(The database cannot be changed because of the company requirement)

You'd need a regular expression that matches only 'yournumber|', '|yournumber|', '|yournumber', or a combination of the above. I see no way an optimizer could use an index on that kind of thing short of a specialized index that would work similarly to full-text indexes.
This is almost hopeless, unless you have a trivial amount of data (or way too many processing resources), it will not scale.
If the amount of data is small, consider just selecting like '%number%', and do some post-processing in you application code.
Otherwise, normalize you data.

You could try this
SELECT id
FROM table
WHERE '|' + type + '|' LIKE '%|' + ? + '|%'
(I use the + operator here because it's easier to read but you could also use || instead)
It's an easy solution without regex. Indexes don't help anything here (this is the same for regex solution) the query will be slow with huge data.
I had a similar question I asked myself a while ago. Also check it's answer

I'm not too much familiar with SQLite but what about:
SELECT id FROM table WHERE type REGEX '(^9\|.*)|(.*\|9\|.*)|(.*\|9$)';
assuming that you are looking for the type 9
Then to look for type 999 it would be:
SELECT id FROM table WHERE type REGEX '(^999\|.*)|(.*\|999\|.*)|(.*\|999$)';
Note that you might want to tweak the expression if you may potentially have a single type value in the type field because in that case I'd suppose that there wouldn't be any | character.

Related

format issue in scala, while having wildcards characters

I have a sql query suppose
sqlQuery="select * from %s_table where event like '%holi%'"
listCity=["Bangalore","Mumbai"]
for (city<- listCity){
print(s.format(city))
}
Expected output:
select * from Bangalore_table where event like '%holi%'
select * from Mumbai_table where event like '%holi%'
Actual output:
unknown format conversion exception: Conversion='%h'
Can anyone let me how to solve this, instead of holi it could be anything iam looking for a generic solution in scala.
If you want the character % in a formatting string you need to escape it by repeating it:
sqlQuery = "select * from %s_table where event like '%%holi%%'"
More generally I would not recommend using raw SQL. Instead, use a library to access the database. I use Slick but there are a number to choose from.
Also, having different tables named for different cities is really poor database design and will cause endless problems. Create a single table with an indexed city column and use WHERE to select one or more cities for inclusion in the query.

Is there a SQLite for Swift method for more complex ORDER BY statements?

I have a query similar to the following that I'd like to perform on an sqlite database:
SELECT name
FROM table
WHERE name LIKE "%John%"
ORDER BY (CASE WHEN name = "John" THEN 1 WHEN name LIKE "John%" THEN 2 ELSE 3 END),name LIMIT 10
I'd like to use SQLite for Swift to chain the query together, but I'm stumped as to how to (or if it is even possible to) use the .order method.
let name = "John"
let filter = "%" + name + "%"
table.select(nameCOL).filter(nameCOL.like(filter)).order(nameCOL)
Gets me
SELECT name
FROM table
WHERE name LIKE %John%
ORDER BY name
Any ideas on how to add to the query to get the more advanced sorting where names that start with John come first, followed by names with John in them?
I saw the sqlite portion of the solution here: SQLite LIKE & ORDER BY Match query
Now I'd just like to implement it using SQlite for Swift
Seems it may be too restrictive for that given the limited examples, anyone else have any experience with more complicated ORDER BY clauses?
Thanks very much.
Sqlite.swift can handle that pretty cleanly with two .order statements:
let name = "John"
let filter = "%" + name + "%"
table.select(nameCol).filter(nameCol.like(filter))
.order(nameCol.like("\(name)%").desc
.order(nameCol)
The .order statements are applied in the order they are listed, the first being primary.
The filter will reduce the results to only those with "John" in them. SQlite.swift can do many complex things. I thought I would need a great deal of raw sql queries when I ported 100s of complex queries over to it, but I have yet to use raw sql.

Efficient way to find ordered string's exact, prefix and postfix match in PostgreSQL

Given a table name table and a string column named column, I want to search for the word word in that column in the following way: exact matches be on top, followed by prefix matches and finally postfix matches.
Currently I got the following solutions:
Solution 1:
select column
from (select column,
case
when column like 'word' then 1
when column like 'word%' then 2
when column like '%word' then 3
end as rank
from table) as ranked
where rank is not null
order by rank;
Solution 2:
select column
from table
where column like 'word'
or column like 'word%'
or column like '%word'
order by case
when column like 'word' then 1
when column like 'word%' then 2
when column like '%word' then 3
end;
Now my question is which one of the two solutions are more efficient or better yet, is there a solution better than both of them?
Your 2nd solution looks simpler for the planner to optimize, but it is possible that the first one gets the same plan as well.
For the Where, is not needed as it is covered by ; it might confuse the DB to do 2 checks instead of one.
But the biggest problem is the third one as this has no way to be optimized by an index.
So either way, PostgreSQL is going to scan your full table and manually extract the matches. This is going to be slow for 20,000 rows or more.
I recommend you to explore fuzzy string matching and full text search; looks like that is what you're trying to emulate.
Even if you don't want the full power of FTS or fuzzy string matching, you definitely should add the extension "pgtrgm", as it will enable you to add a GIN index on the column that will speedup LIKE '%word' searches.
https://www.postgresql.org/docs/current/pgtrgm.html
And seriously, have a look to FTS. It does provide ranking. If your requirements are strict to what you described, you can still perform the FTS query to "prefilter" and then apply this logic afterwards.
There are tons of introduction articles to PostgreSQL FTS, here's one:
https://www.compose.com/articles/mastering-postgresql-tools-full-text-search-and-phrase-search/
And even I wrote a post recently when I added FTS search to my site:
https://deavid.wordpress.com/2019/05/28/sedice-adding-fts-with-postgresql-was-really-easy/

T-SQL speed comparison between LEFT() vs. LIKE operator

I'm creating result paging based on first letter of certain nvarchar column and not the usual one, that usually pages on number of results.
And I'm not faced with a challenge whether to filter results using LIKE operator or equality (=) operator.
select *
from table
where name like #firstletter + '%'
vs.
select *
from table
where left(name, 1) = #firstletter
I've tried searching the net for speed comparison between the two, but it's hard to find any results, since most search results are related to LEFT JOINs and not LEFT function.
"Left" vs "Like" -- one should always use "Like" when possible where indexes are implemented because "Like" is not a function and therefore can utilize any indexes you may have on the data.
"Left", on the other hand, is function, and therefore cannot make use of indexes. This web page describes the usage differences with some examples. What this means is SQL server has to evaluate the function for every record that's returned.
"Substring" and other similar functions are also culprits.
Your best bet would be to measure the performance on real production data rather than trying to guess (or ask us). That's because performance can sometimes depend on the data you're processing, although in this case it seems unlikely (but I don't know that, hence why you should check).
If this is a query you will be doing a lot, you should consider another (indexed) column which contains the lowercased first letter of name and have it set by an insert/update trigger.
This will, at the cost of a minimal storage increase, make this query blindingly fast:
select * from table where name_first_char_lower = #firstletter
That's because most database are read far more often than written, and this will amortise the cost of the calculation (done only for writes) across all reads.
It introduces redundant data but it's okay to do that for performance as long as you understand (and mitigate, as in this suggestion) the consequences and need the extra performance.
I had a similar question, and ran tests on both. Here is my code.
where (VOUCHER like 'PCNSF%'
or voucher like 'PCLTF%'
or VOUCHER like 'PCACH%'
or VOUCHER like 'PCWP%'
or voucher like 'PCINT%')
Returned 1434 rows in 1 min 51 seconds.
vs
where (LEFT(VOUCHER,5) = 'PCNSF'
or LEFT(VOUCHER,5)='PCLTF'
or LEFT(VOUCHER,5) = 'PCACH'
or LEFT(VOUCHER,4)='PCWP'
or LEFT (VOUCHER,5) ='PCINT')
Returned 1434 rows in 1 min 27 seconds
My data is faster with the left 5. As an aside my overall query does hit some indexes.
I would always suggest to use like operator when the search column contains index. I tested the above query in my production environment with select count(column_name) from table_name where left(column_name,3)='AAA' OR left(column_name,3)= 'ABA' OR ... up to 9 OR clauses. My count displays 7301477 records with 4 secs in left and 1 second in like i.e where column_name like 'AAA%' OR Column_Name like 'ABA%' or ... up to 9 like clauses.
Calling a function in where clause is not a best practice. Refer http://blog.sqlauthority.com/2013/03/12/sql-server-avoid-using-function-in-where-clause-scan-to-seek/
Entity Framework Core users
You can use EF.Functions.Like(columnName, searchString + "%") instead of columnName.startsWith(...) and you'll get just a LIKE function in the generated SQL instead of all this 'LEFT' craziness!
Depending upon your needs you will probably need to preprocess searchString.
See also https://github.com/aspnet/EntityFrameworkCore/issues/7429
This function isn't present in Entity Framework (non core) EntityFunctions so I'm not sure how to do it for EF6.

Parameterized SQL Columns?

I have some code which utilizes parameterized queries to prevent against injection, but I also need to be able to dynamically construct the query regardless of the structure of the table. What is the proper way to do this?
Here's an example, say I have a table with columns Name, Address, Telephone. I have a web page where I run Show Columns and populate a select drop-down with them as options.
Next, I have a textbox called Search. This textbox is used as the parameter.
Currently my code looks something like this:
result = pquery('SELECT * FROM contacts WHERE `' + escape(column) + '`=?', search);
I get an icky feeling from it though. The reason I'm using parameterized queries is to avoid using escape. Also, escape is likely not designed for escaping column names.
How can I make sure this works the way I intend?
Edit:
The reason I require dynamic queries is that the schema is user-configurable, and I will not be around to fix anything hard-coded.
Instead of passing the column names, just pass an identifier that you code will translate to a column name using a hardcoded table. This means you don't need to worry about malicious data being passed, since all the data is either translated legally, or is known to be invalid. Psudoish code:
#columns = qw/Name Address Telephone/;
if ($columns[$param]) {
$query = "select * from contacts where $columns[$param] = ?";
} else {
die "Invalid column!";
}
run_sql($query, $search);
The trick is to be confident in your escaping and validating routines. I use my own SQL escape function that is overloaded for literals of different types. Nowhere do I insert expressions (as opposed to quoted literal values) directly from user input.
Still, it can be done, I recommend a separate — and strict — function for validating the column name. Allow it to accept only a single identifier, something like
/^\w[\w\d_]*$/
You'll have to rely on assumptions you can make about your own column names.
I use ADO.NET and the use of SQL Commands and SQLParameters to those commands which take care of the Escape problem. So if you are in a Microsoft-tool environment as well, I can say that I use this very sucesfully to build dynamic SQL and yet protect my parameters
best of luck
Make the column based on the results of another query to a table that enumerates the possible schema values. In that second query you can hardcode the select to the column name that is used to define the schema. if no rows are returned then the entered column is invalid.
In standard SQL, you enclose delimited identifiers in double quotes. This means that:
SELECT * FROM "SomeTable" WHERE "SomeColumn" = ?
will select from a table called SomeTable with the shown capitalization (not a case-converted version of the name), and will apply a condition to a column called SomeColumn with the shown capitalization.
Of itself, that's not very helpful, but...if you can apply the escape() technique with double quotes to the names entered via your web form, then you can build up your query reasonably confidently.
Of course, you said you wanted to avoid using escape - and indeed you don't have to use it on the parameters where you provide the ? place-holders. But where you are putting user-provided data into the query, you need to protect yourself from malicious people.
Different DBMS have different ways of providing delimited identifiers. MS SQL Server, for instance, seems to use square brackets [SomeTable] instead of double quotes.
Column names in some databases can contain spaces, which mean you'd have to quote the column name, but if your database contains no such columns, just run the column name through a regular expression or some sort of check before splicing into the SQL:
if ( $column !~ /^\w+$/ ) {
die "Bad column name [$column]";
}