Postgres: optimize query where column is in string

Let's say I have a table like so:
webpages
id | url
------------------------
1 | http://example.com/path
2 | example2.biz/another/path
3 | https://www.example3.net/another/path
And I want to search which webpages' url column is a substring of an input string. I know I can do it like this:
SELECT id FROM webpages WHERE STRPOS('http://example.com/path/to/some/content', url) > 0;
Expected result: 1
But I'm not sure how I might optimize this kind of query to run faster. Is there a way?
I found this article, which seems to suggest there isn't, though I'm not sure that's still true, as it's from over a decade ago.
https://www.postgresql.org/message-id/046801c96b06%242cb14280%248613c780%24%40r%40sbcglobal.net
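
To make the direction of the test explicit, here is a minimal Python model of the predicate. It is not a Postgres optimization, only a sketch of what the query has to compute for every row. It assumes the input string carries the scheme (`http://...`), since otherwise row 1's full url does not occur inside it; the `webpages` list and the `ids_contained_in` helper are illustrative names.

```python
# In-memory stand-in for the webpages table from the question.
webpages = [
    (1, "http://example.com/path"),
    (2, "example2.biz/another/path"),
    (3, "https://www.example3.net/another/path"),
]


def ids_contained_in(input_string, rows):
    """Return ids of rows whose url occurs somewhere inside input_string."""
    # The input is the haystack and the stored url is the needle,
    # i.e. the same direction as STRPOS(input_string, url) > 0.
    return [row_id for row_id, url in rows if url in input_string]


result = ids_contained_in("http://example.com/path/to/some/content", webpages)
print(result)
```

Every row is inspected here, which is also what the SQL query does without an index that supports this kind of lookup.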

Related

Very slow from query to c.JSON

I'm querying my database using Gorm and then using gin's c.JSON to marshal the structs to json.
It's a large query without that many results (< 100k rows), and I'm having an issue with the time it takes (6-10 seconds) to marshal the data.
I have no idea where to start to resolve the issue.
[2019-07-02 14:41:04] [946.63ms] SELECT big slow query
[62861 rows affected or returned ]
[GIN] 2019/07/02 - 14:41:11 | 200 | 7.92347114s | ip | GET /api/date/2019-05-30
[2019-07-02 14:40:44] [660.47ms] SELECT big slow query
[7583 rows affected or returned ]
[GIN] 2019/07/02 - 14:40:54 | 200 | 10.841096216s | ip | GET /api/dailies
[2019-07-02 14:43:49] [154.13ms] SELECT simple query
[11 rows affected or returned ]
[GIN] 2019/07/02 - 14:43:49 | 200 | 158.256792ms | ip | GET /api/dailycount
As you can see, queries 1 and 2 complete in 600-900 ms. That is slow, but it can be optimised separately.
The issue is that the server responses take 7.9 s and 10.8 s!
For the smaller query there isn't much difference, but I don't understand why this is happening.
The Go code for one of the routes is pretty straightforward, and it is similar for all routes:
var alertList []AlertJson
dbInstance.Debug().Raw("SELECT big query").Scan(&alertList)
c.JSON(http.StatusOK, gin.H{"alerts": alertList})
10.2 seconds for the second query, with 7583 rows to marshal, seems pretty insane to me.

It turned out there was heavy lag at my server provider. The 10 seconds is in fact the time it took me to receive the data, as Peter mentioned.
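
One way to rule serialization in or out is to time it on its own, separately from the query and from the network transfer. The sketch below does this in Python rather than Go; the analogous Go check would time `json.Marshal` on the slice outside the handler. The row shape is hypothetical, since the fields of `AlertJson` are not shown in the question.

```python
import json
import time

# Hypothetical rows standing in for the 7583 AlertJson structs; the field
# names are made up for illustration.
rows = [{"id": i, "message": f"alert {i}", "level": i % 5} for i in range(7583)]

# Time only the serialization step, with no database or network involved.
start = time.perf_counter()
payload = json.dumps({"alerts": rows})
elapsed = time.perf_counter() - start

print(f"serialized {len(rows)} rows into {len(payload)} bytes in {elapsed:.4f}s")
```

If this step alone takes far less time than the whole request, the remaining time is being spent elsewhere, for example in delivering the response to the client.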

Using NSPredicate to refer other NSPredicate rules

Let's say I have a Core Data database for NSPredicate rules.
enum PredicateType: Int {
case beginswith
case endswith
case contains
}
My database looks like this:
+------+-----------+
| Type | Content |
+------+-----------+
| 0 | Hello |
| 1 | end |
| 2 | somevalue |
| 0 | end |
+------+-----------+
I have the content "This is end". How can I query Core Data to check whether any rule matches this content? It should find the second entry in the table:
+------+-----------+
| Type | Content |
+------+-----------+
| 1 | end |
+------+-----------+
but shouldn't find
+------+-----------+
| Type | Content |
+------+-----------+
| 0 | end |
+------+-----------+
Because in this sentence end is not at the beginning.
Currently I fetch all values, create a predicate from each Content and Type, and query the database again, which I believe is a big overhead.

The way you are doing it now is correct. You first need to build your predicates (which in your case is a complex operation that also requires fetching) and then run each one to see which matches.
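
As a rough model of that fetch-then-evaluate approach, here is a Python sketch (not Core Data code). The rule table is copied from the question; `rule_matches` and `matching_rules` are illustrative names, and plain string methods stand in for NSPredicate.

```python
from enum import IntEnum


class PredicateType(IntEnum):
    BEGINSWITH = 0
    ENDSWITH = 1
    CONTAINS = 2


# Rules as (type, content) pairs, copied from the table in the question.
rules = [
    (PredicateType.BEGINSWITH, "Hello"),
    (PredicateType.ENDSWITH, "end"),
    (PredicateType.CONTAINS, "somevalue"),
    (PredicateType.BEGINSWITH, "end"),
]


def rule_matches(rule_type, rule_content, text):
    """Evaluate a single rule against the text."""
    if rule_type == PredicateType.BEGINSWITH:
        return text.startswith(rule_content)
    if rule_type == PredicateType.ENDSWITH:
        return text.endswith(rule_content)
    return rule_content in text


def matching_rules(text, all_rules):
    """Return every rule that the text satisfies, in table order."""
    return [(t, c) for t, c in all_rules if rule_matches(t, c, text)]


matches = matching_rules("This is end", rules)
print(matches)
```

For "This is end", only the endswith rule with content "end" is returned; the beginswith rule with the same content is not.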
I wouldn't be so quick to assume that there is a huge overhead with this. If your data set is small (<300), I suspect there would be no problem at all. If you are experiencing problems, then (and only then!) you should start optimizing.
If you see the app running too slowly, use Instruments to find where the issue is. There are two places where I could see performance issues: 1) fetching all the predicates from the database and 2) running all of the predicates.
If you want to make the fetching faster, I would recommend using an NSFetchedResultsController. While it is generally used to keep data in sync with a table view, it can be used for any data that you want to have current at any time. With the controller you do a single fetch, and then it monitors Core Data and keeps itself up to date. Then when you need all of the predicates, instead of doing a fetch you simply access the controller's fetchedObjects property.
If you find that running all the predicates takes a long time, you can speed up beginsWith and endsWith with a clever use of binary search. Keep two arrays of custom predicate objects, one sorted alphabetically and the other with all the reversed strings sorted alphabetically. To find which strings the content begins with, use indexOfObject:inSortedRange:options:usingComparator: to find the relevant objects. I don't know how you can improve contains. You could see if running string methods on the objects is faster than NSPredicate methods. You could also try running the predicates concurrently on a background thread.
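
The sorted-array idea can be sketched in Python with the standard `bisect` module. This is one possible reading of the suggestion, not the only one: it assumes one binary search per prefix length of the content, and it handles endsWith by reversing both the rule strings and the content. The helper name `prefixes_present` is made up for illustration.

```python
from bisect import bisect_left


def prefixes_present(sorted_strings, text):
    """Return every entry of sorted_strings that is a prefix of text."""
    found = []
    for length in range(1, len(text) + 1):
        candidate = text[:length]
        # Binary search for an exact match of this prefix.
        i = bisect_left(sorted_strings, candidate)
        if i < len(sorted_strings) and sorted_strings[i] == candidate:
            found.append(candidate)
    return found


# beginsWith rules sorted as-is; endsWith rules reversed, then sorted.
begins_rules = sorted(["Hello", "end"])
ends_rules_reversed = sorted(s[::-1] for s in ["end"])

text = "This is end"
begins_hits = prefixes_present(begins_rules, text)
# Reverse the text to search the reversed rules, then restore each hit.
ends_hits = [s[::-1] for s in prefixes_present(ends_rules_reversed, text[::-1])]
print(begins_hits, ends_hits)
```

Each lookup costs one binary search per character of the content instead of one string comparison per rule, which only pays off when the number of rules is large.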
Again, you shouldn't do any of this unless you find that you need to. If your dataset is small, then the way you are doing it now is fine.

Limit results on OR condition in Sphinx

I am trying to limit results by somehow grouping them.
This query attempt should make things clear:
@namee ("Cameras") limit 5| @namee ("Mobiles") limit 5| @namee ("Washing Machine") limit 5| @namee ("Graphic Cards") limit 5
where namee is the column.
Basically, I am trying to limit results based upon specific criteria.
Is this possible? Is there any alternative way of doing what I want to do?
I am on Sphinx 2.2.9.

There is no Sphinx syntax to do this directly.
The easiest would be to run 4 separate queries and 'UNION' them in the application itself. Performance isn't going to be terrible.
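
A minimal sketch of that application-side union, in Python. The `search` function is a hypothetical stand-in for whatever client call runs a single Sphinx query with its own limit; here it filters an in-memory list so the example runs on its own.

```python
# Hypothetical documents standing in for the Sphinx index; "namee" is the
# field name from the question.
names = ["Cameras"] * 8 + ["Mobiles"] * 3 + ["Washing Machine"] * 6
documents = [{"id": i, "namee": name} for i, name in enumerate(names)]


def search(term, limit):
    """Stand-in for one Sphinx query with its own LIMIT (assumed helper)."""
    hits = [doc for doc in documents if doc["namee"] == term]
    return hits[:limit]


def search_each(terms, per_term_limit):
    """Run one query per term and concatenate the results in term order."""
    combined = []
    for term in terms:
        combined.extend(search(term, per_term_limit))
    return combined


terms = ["Cameras", "Mobiles", "Washing Machine", "Graphic Cards"]
results = search_each(terms, per_term_limit=5)
print(len(results))
```

Because each term is queried separately, giving different terms different limits is a one-line change, which is not the case for the single-query approach.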
... If you REALLY want to do it in Sphinx, you can exploit a couple of tricks to get close, but it gets very complicated.
You would need to create 4 separate indexes (or up to as many terms as you need!), each with the same data but with the field called something different (they duplicate each other!). You would also need an attribute on each one (more on why later).
source str1 {
sql_query = SELECT id, namee AS field1, 1 as idx FROM ...
sql_attr_uint = idx
}
source str2 {
sql_query = SELECT id, namee AS field2, 2 as idx FROM ...
sql_attr_uint = idx
}
... etc
Then create a single distributed index over the 4 indexes.
Then you can run a single query to get all results kinda magically unioned...
MATCH('@@relaxed @field1 ("Cameras") | @field2 ("Mobiles") | @field3 ("Washing Machine") | @field4 ("Graphic Cards")')
(The @@relaxed is important, as the fields are different: the matches must come from different indexes.)
Now to limiting them... Because each keyword match must come from a different index, and each index has a unique attribute, the attribute identifies which term matched.
In Sphinx there is a nice GROUP N BY, where you only get a certain number of results for each attribute value, so you could do (putting all that together):
SELECT *,WEIGHT() AS weight
FROM dist_index
WHERE MATCH('@@relaxed @field1 ("Cameras") | @field2 ("Mobiles") | @field3 ("Washing Machine") | @field4 ("Graphic Cards")')
GROUP 4 BY idx
ORDER BY weight DESC;
simples eh?
(Note it only works if you want 4 from each index; if you want different limits, it is much more complicated!)
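
To illustrate what a per-group limit means, here is a small Python model of keeping at most N rows per `idx` value. It is a sketch of the idea only, not of how Sphinx implements GROUP N BY; in particular it assumes rows within a group are ranked by weight, and the sample rows are invented.

```python
from collections import defaultdict


def top_n_per_group(rows, n):
    """Keep the n highest-weight rows for each idx value."""
    kept = defaultdict(list)
    for row in sorted(rows, key=lambda r: r["weight"], reverse=True):
        if len(kept[row["idx"]]) < n:
            kept[row["idx"]].append(row)
    result = [row for group in kept.values() for row in group]
    # Final ordering across groups, highest weight first.
    return sorted(result, key=lambda r: r["weight"], reverse=True)


# Invented rows: even ids fall in idx 1, odd ids in idx 2.
rows = [{"id": i, "idx": 1 + i % 2, "weight": 100 - i} for i in range(12)]
limited = top_n_per_group(rows, n=4)
print([(r["idx"], r["id"]) for r in limited])
```

The single `n` applies to every group, which mirrors the limitation noted above: the same count is taken from each index.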

Postgres array fields: find where array contains value

Currently I have a table schema that looks like this:
| id | visitor_ids | name |
|----|-------------|----------------|
| 1 | {abc,def} | Chris Houghton |
| 2 | {ghi} | Matt Quinn |
The visitor_ids are all GUIDs, I've just shortened them for simplicity.
A user can have multiple visitor ids, hence the array type.
I have a GIN index created on the visitor_ids field.
I want to be able to lookup users by a visitor id. Currently we're doing this:
SELECT *
FROM users
WHERE visitor_ids && array['abc'];
The above works, but it's really really slow at scale - it takes around 45ms which is ~700x slower than a lookup by the primary key. (Even with the GIN index)
Surely there's got to be a more efficient way of doing this? I've looked around and wasn't able to find anything.
Possible solutions I can think of could be:
The current query is just bad and needs improving
Using a separate user_visitor_ids table
Something smart with special indexes
Help appreciated :)

I tried the second solution - 700x faster. Bingo.
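
The separate-table layout can be sketched as follows. This uses Python's built-in `sqlite3` as a stand-in for Postgres so that it runs on its own; the `user_visitor_ids` name comes from the question, while the exact columns and the assumption that each visitor id belongs to exactly one user (hence the primary key on `visitor_id`) are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- One row per visitor id; the primary key gives an indexed equality lookup.
    CREATE TABLE user_visitor_ids (
        visitor_id TEXT PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id)
    );
    INSERT INTO users VALUES (1, 'Chris Houghton'), (2, 'Matt Quinn');
    INSERT INTO user_visitor_ids VALUES ('abc', 1), ('def', 1), ('ghi', 2);
""")


def find_user(visitor_id):
    """Look up the user that owns a visitor id, or None if there is none."""
    return conn.execute(
        "SELECT u.id, u.name FROM users u "
        "JOIN user_visitor_ids v ON v.user_id = u.id "
        "WHERE v.visitor_id = ?",
        (visitor_id,),
    ).fetchone()


user = find_user("abc")
print(user)
```

The lookup becomes a plain equality match on an indexed column followed by a primary-key join, which is the same kind of access path as a lookup by primary key.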
I feel like this is an unsolved problem, however. What's the point in adding arrays to Postgres when the performance is so bad, even with indexes?

How to match rows with one or more words in query, but without any words not in query?

I have a table in a MySQL database that has a list of comma separated tags in it.
I want users to be able to enter a list of comma separated tags and then use Sphinx or MySQL to select rows that have at least one of the tags in the query but not any tags the query doesn't have.
The query can have additional tags that are not in the rows, but the rows should not be matched if they have tags not in the query.
I either want to use Sphinx or MySQL to do the searching.
Here's an example:
creatures:
----------------------------
| name | tags |
----------------------------
| cat | wily,hairy |
| dog | cute,hairy |
| fly | ugly |
| bear | grumpy,hungry |
----------------------------
Example searches:
wily,hairy <-- should match cat
cute,hairy,happy <-- should match dog
happy,cute <-- no match (dog has hairy)
ugly,yuck,gross <-- should match fly
hairy <-- no match (dog has cute, cat has wily)
grumpy <-- no match (bear has hungry)
grumpy,hungry <-- should match bear
wily,grumpy,hungry <-- should match bear
Is it possible to do this with Sphinx or MySQL?
To reiterate, the query will be a list of comma separated tags and rows that have at least one of the entered tags but not any tags the query doesn't have should be selected.

The Sphinx expression ranker should be able to do this.
sphinxQL> SELECT *, WEIGHT() AS w FROM index
WHERE MATCH('@tags "cute hairy happy"/1') AND w > 0
OPTION ranker=expr('IF(word_count>=tags_len,1,0)');
Basically, you want the number of matched tags never to be less than the number of tags in the row.
Note this just gives all matching documents a weight of 1; if you want more elaborate ranking (e.g. to match other keywords), it gets more complicated.
You need index_field_lengths enabled on the index to get the tags_len attribute.
(The same concept is obviously possible in MySQL, probably using FIND_IN_SET to do the matching, and either a second column to store the number of tags or computing it with, say, the REPLACE function.)
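
Whichever engine does the search, the rule itself is that a row matches when its tag set is non-empty and every one of its tags appears in the query. The Python sketch below models that rule so an implementation can be checked against the examples from the question; `matching_rows` is an illustrative name and the data is the `creatures` table.

```python
# The creatures table from the question, with tags split into sets.
creatures = {
    "cat": {"wily", "hairy"},
    "dog": {"cute", "hairy"},
    "fly": {"ugly"},
    "bear": {"grumpy", "hungry"},
}


def matching_rows(query, rows):
    """Return names whose tags are all present in the comma separated query."""
    query_tags = {tag.strip() for tag in query.split(",") if tag.strip()}
    # A row matches when it has at least one tag and none outside the query.
    return [name for name, tags in rows.items() if tags and tags <= query_tags]


for query in ["wily,hairy", "happy,cute", "wily,grumpy,hungry"]:
    print(query, "->", matching_rows(query, creatures))
```

This is the same condition as "matched tag count is at least the row's tag count": if every tag of the row is matched, none of its tags can be missing from the query.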
Edit to add details about multiple fields:
sphinxQL> SELECT *, WEIGHT() AS w FROM index
WHERE MATCH('@tags "cute hairy happy"/1 @tags2 "one two three"/1') AND w = 2
OPTION ranker=expr('SUM(IF(word_count>=IF(user_weight=2,tags2_len,tags_len),1,0))'),
field_weights=(tags=1,tags2=2);
The SUM function is run for each field in turn, so you need to use the user_weight system to be able to distinguish which field is currently being enumerated.
