How to constrain a query for number of items in ObjectStorage property in Extbase?

I want to build queries in my Extbase repository that filter objects by the number of items they contain; items is of type ObjectStorage in the model.
I tried to fetch objects with at least one item using the query below, but it obviously doesn't work, because greaterThan() compares the column value and won't count the items in the ObjectStorage.
$x = 0; // any number
$query = $this->createQuery();
$constraints = [];
$constraints[] = $query->equals('deleted', 0);
$constraints[] = $query->equals('hidden', 0);
// this compares the raw field value, not the number of related records
$constraints[] = $query->greaterThan('items', $x);
return $query->matching($query->logicalAnd($constraints))->execute();
So how can I do this?
I was thinking about a raw SQL statement, but how could I add it to the constraints?
I don't want to do everything in SQL.

I don't see how you could do it without either using $query->statement or iterating over the whole result without the "count" constraint and then throwing away the records with fewer items than $x (apart from pgampe's approach, which was posted at the exact same time).
Which approach you take depends on performance considerations. If you have a small result set, doing a foreach over the QueryResult wouldn't hurt much. But since every result is converted into an object in the process, you get a big overhead that way when you have a lot of records.
Keep in mind that using $query->statement does not make you lose the ability to work with the records as domain objects, so it's not a big deal to go this way.
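For illustration, a minimal sketch of the $query->statement route; the table and column names (tx_myext_domain_model_parent, tx_myext_domain_model_item, the item.parent foreign key) are assumptions based on typical Extbase conventions, not from the question:
$query = $this->createQuery();
$query->statement(
    'SELECT parent.* FROM tx_myext_domain_model_parent parent'
    . ' WHERE parent.deleted = 0 AND parent.hidden = 0'
    . ' AND (SELECT COUNT(*) FROM tx_myext_domain_model_item item'
    . ' WHERE item.parent = parent.uid) > ?',
    [$x]
);
return $query->execute(); // rows are still mapped to domain objects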

You can create a custom constraint that does this yourself. Have a look at how it is done in \TYPO3\CMS\Extbase\Persistence\Generic\Qom\QueryObjectModelFactory.

Related

Apex query optimization

I am trying this query:
List<Account> onlyRRCustomer = [SELECT
ac.rr_First_Name__c,
ac.rr_Last_Name__c,
ac.rr_National_Insurance_Number__c,
ac.id,
ac.rr_Date_of_Birth__c
FROM
Account ac
WHERE
ac.rr_National_Insurance_Number__c IN :uniqueNiInputSet
AND RecordTypeId = :recordTypeId];
It gives me an error:
SELECT ac.rr_First_Name__c, ac.rr_Last_Name__c, ac.rr_National_Insurance_Number__c,
ac.id, ac.rr_Date_of_Birth__c
FROM Account ac
WHERE (ac.rr_National_Insurance_Number__c = :tmpVar1 AND RecordTypeId = :tmpVar2)
10:12:05.0 (11489528)|EXCEPTION_THROWN|[49]|System.QueryException: Non-selective query against large object type (more than 200000 rows). Consider an indexed filter or contact salesforce.com about custom indexing.
I understand that uniqueNiInputSet.size() is ~50, so that part isn't the issue, but the record type filter might match many more records.
So, if I change the position, will that work? That is, put the record type first and then the NI set in the WHERE clause. Is there a defined order in which Salesforce evaluates WHERE conditions, so that it would first look only at the 50 members and then, within those 50, search for the particular record type?
That just means that the script is taking too long to execute. You may need to move this to a @future method or execute it using Database.Batchable.
I don't think the order matters in SOQL; I think it's just trying to return too many records.
A non-selective query means you are performing a query against a table that has a large number of records and your query is not specific enough. You can work with Salesforce support to try to resolve this, either through the creation of additional backend indexes or by making the query more selective.
To be honest, your query looks very selective already, you're not using LIKE or IN. You should also put your most selective conditions first (resulting in a more focused query against your records).
I know it shouldn't matter, but I would also move your conditions out of the parentheses.
If there are any other fields you can filter on, that may help. Sometimes, you have to actually create new fields and populate them just to help make your queries more selective.
Also, if rr_National_Insurance_Number__c is a formula field, you will want to change it to a text field and populate it via workflow or Apex instead. Formula fields require additional time on the servers to calculate.
SELECT rr_First_Name__c, rr_Last_Name__c, rr_National_Insurance_Number__c, id, rr_Date_of_Birth__c
FROM Account
WHERE new_custom_field__c = TRUE
AND rr_National_Insurance_Number__c = :tmpVar1
AND RecordTypeId = :tmpVar2
Your query is non-selective. For a standard index, the selectivity threshold is 30% of the first million records and 15% of the records beyond the first million, up to a maximum of 1 million records in total. For an "AND" query, each individual WHERE criterion must itself be selective; see this quick reference cheat sheet. In general, try making
rr_National_Insurance_Number__c
an external ID, which makes Salesforce index it by default, and retry your query. Record types are already indexed by default. If the result is still non-selective because of the number of results returned, try limiting the scope of the query using a field like CreatedDate.
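For example, a sketch of the original query scoped by CreatedDate; the 90-day window is an arbitrary illustration, not a recommendation:
List<Account> onlyRRCustomer = [SELECT Id, rr_First_Name__c, rr_Last_Name__c,
        rr_National_Insurance_Number__c, rr_Date_of_Birth__c
    FROM Account
    WHERE rr_National_Insurance_Number__c IN :uniqueNiInputSet
    AND RecordTypeId = :recordTypeId
    AND CreatedDate = LAST_N_DAYS:90];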

100 columns vs Array of length 100

I have a table with 100+ values corresponding to each row, so I'm exploring different ways to store them.
Without any indexes, would I lose anything if I store these 100 values in an integer[] column in postgresql? As compared to storing them in separate columns.
Plus, since we can add indexes to array elements,
CREATE INDEX test_index on test ((foo[1]));
Would there be a performance difference queries using such an index as compared to regular index on a column?
As far as I've read, this performance difference comes into the picture with arrays of variable-length elements, but I'm not sure about fixed-length ones.
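For concreteness, here is a sketch of the two setups I'm comparing (hypothetical tables):
-- 100 separate columns, with a regular index on one of them
CREATE TABLE test_cols (id int PRIMARY KEY, foo1 int, foo2 int /* ... foo100 int */);
CREATE INDEX test_cols_foo1_idx ON test_cols (foo1);
-- a single integer[] column, with an expression index on one element
CREATE TABLE test_arr (id int PRIMARY KEY, foo int[]);
CREATE INDEX test_arr_foo1_idx ON test_arr ((foo[1]));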
Don't take the lazy way.
Storing 100 or more values as an array is fine if treating them as an array makes sense for your application and your data.
If you need to query for specific elements of the array, though, this design is not good regardless of performance, and you should use columns. That will also help the day you have to delete a "column" in the middle or redesign the structure.
Anyway, as Frank wrote in the comments, if the values are all of the same type (and have the same meaning), consider modelling them in a separate table.
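A minimal sketch of that normalized alternative, assuming a parent table test with an id primary key (all names invented for illustration):
CREATE TABLE test_value (
    test_id  int NOT NULL REFERENCES test (id),
    position int NOT NULL CHECK (position BETWEEN 1 AND 100),
    value    int NOT NULL,
    PRIMARY KEY (test_id, position)
);
-- looking up a specific element becomes a plain indexed lookup
SELECT value FROM test_value WHERE test_id = 42 AND position = 1;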

ORMLite foreigncollection - object searching

I'm using ORMLite and SQLite as my database, and I'm working on an Android app.
1) I'm searching for a particular object in the ForeignCollection as follows.
Collection<PantryCheckLine> pantryCheckLinesCollection = pantryCheck.getPantryCheckLines();
Iterator<PantryCheckLine> iterator = pantryCheckLinesCollection.iterator();
while (iterator.hasNext()) {
    pantryCheckLine = iterator.next();
    // I'm searching for a particular object match
}
2) Alternatively, I can query the relevant table directly and identify the item that way.
What I'm asking is: of these two methods, which one will be faster?
Depends a bit on the particulars of your situation.
If your ForeignCollection is eager-fetched, then your loop will not have to do any database transactions and will, for small collections, probably be faster than doing the query.
However, if your collection is lazy-loaded, then iterating through the collection will go back to the database anyway, so you might as well do the precise query.
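For the lazy-loaded case, a sketch of what the precise query could look like; the DAO and the column names ("pantryCheck_id", "productId") are assumptions, so adjust them to your schema:
// assumes a Dao<PantryCheckLine, Integer> pantryCheckLineDao
PantryCheckLine match = pantryCheckLineDao.queryBuilder()
        .where()
        .eq("pantryCheck_id", pantryCheck.getId())
        .and()
        .eq("productId", productId)
        .queryForFirst();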

T-SQL speed comparison between LEFT() vs. LIKE operator

I'm creating result paging based on the first letter of a certain nvarchar column, rather than the usual paging on the number of results.
I'm now faced with the choice of whether to filter results using the LIKE operator or the equality (=) operator.
select *
from table
where name like @firstletter + '%'
vs.
select *
from table
where left(name, 1) = @firstletter
I've tried searching the net for a speed comparison between the two, but it's hard to find any results, since most search results relate to LEFT JOINs rather than the LEFT function.
"Left" vs "Like" -- one should always use "Like" when possible where indexes are implemented because "Like" is not a function and therefore can utilize any indexes you may have on the data.
"Left", on the other hand, is function, and therefore cannot make use of indexes. This web page describes the usage differences with some examples. What this means is SQL server has to evaluate the function for every record that's returned.
"Substring" and other similar functions are also culprits.
Your best bet would be to measure the performance on real production data rather than trying to guess (or ask us). That's because performance can sometimes depend on the data you're processing, although in this case it seems unlikely (but I don't know that, hence why you should check).
If this is a query you will be doing a lot, you should consider another (indexed) column which contains the lowercased first letter of name and have it set by an insert/update trigger.
This will, at the cost of a minimal storage increase, make this query blindingly fast:
select * from table where name_first_char_lower = @firstletter
That's because most database are read far more often than written, and this will amortise the cost of the calculation (done only for writes) across all reads.
It introduces redundant data but it's okay to do that for performance as long as you understand (and mitigate, as in this suggestion) the consequences and need the extra performance.
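A sketch of one way to set that up in T-SQL; the answer suggests a trigger, but a persisted computed column (shown here, with invented names) achieves the same without one:
ALTER TABLE mytable
    ADD name_first_char_lower AS LOWER(LEFT(name, 1)) PERSISTED;
CREATE INDEX ix_mytable_first_char ON mytable (name_first_char_lower);
-- the paging query can then seek directly on the index
select * from mytable where name_first_char_lower = @firstletter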
I had a similar question, and ran tests on both. Here is my code.
where (VOUCHER like 'PCNSF%'
or voucher like 'PCLTF%'
or VOUCHER like 'PCACH%'
or VOUCHER like 'PCWP%'
or voucher like 'PCINT%')
Returned 1434 rows in 1 min 51 seconds.
vs
where (LEFT(VOUCHER,5) = 'PCNSF'
or LEFT(VOUCHER,5)='PCLTF'
or LEFT(VOUCHER,5) = 'PCACH'
or LEFT(VOUCHER,4)='PCWP'
or LEFT (VOUCHER,5) ='PCINT')
Returned 1434 rows in 1 min 27 seconds.
My data is faster with the LEFT 5 version. As an aside, my overall query does hit some indexes.
I would always suggest using the LIKE operator when the search column has an index. I tested the above query in my production environment with select count(column_name) from table_name where left(column_name,3) = 'AAA' OR left(column_name,3) = 'ABA' OR ... up to 9 OR clauses. My count returned 7,301,477 records in 4 seconds with LEFT and in 1 second with LIKE, i.e. where column_name like 'AAA%' OR column_name like 'ABA%' OR ... up to 9 LIKE clauses.
Calling a function in the WHERE clause is not a best practice. See http://blog.sqlauthority.com/2013/03/12/sql-server-avoid-using-function-in-where-clause-scan-to-seek/
Entity Framework Core users
You can use EF.Functions.Like(columnName, searchString + "%") instead of columnName.StartsWith(...) and you'll get just a LIKE in the generated SQL instead of all this 'LEFT' craziness!
Depending upon your needs you will probably need to preprocess searchString.
See also https://github.com/aspnet/EntityFrameworkCore/issues/7429
This function isn't present in EntityFunctions in Entity Framework (non-Core), so I'm not sure how to do it in EF6.

What's a real world example of something you would represent with a hash?

I'm just trying to get a grip on when you would need to use a hash and when it might be better to use an array. What kind of real-world object would a hash represent, say, in the case of strings?
I believe sometimes a hash is referred to as a "dictionary", and I think that's a good example in itself. If you want to look up the definition of a word, it's nice to just do something like:
definition['pernicious']
Instead of trying to figure out the correct numeric index that the definition would be stored at.
This answer assumes that by "hash" you're basically just referring to an associative array.
I think you're looking at things in the wrong direction. It is not the object which determines whether you should use a hash, but the manner in which you access it. A common use of a hash is a lookup table. If your objects are strings and you want to check whether they exist in a Dictionary, looking them up will (assuming the hash works properly) be O(1). With a sorted array and binary search, the time would instead be O(log n), which may not be acceptable.
Thus, hashes are ideal for use with Dictionaries (hashmaps), sets (hashsets), etc.
They are also a useful way of representing an object without storing the object itself (for passwords).
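For instance, in PHP, using the built-in password API:
$hash = password_hash('s3cret', PASSWORD_DEFAULT); // store the hash, not the password
$ok = password_verify('s3cret', $hash);            // true if the password matches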
The phone book - key = name, value = phone number.
I also think of the old World Book Encyclopedias (actual books). Each article is "hashed" into a single book (cat goes in the "C" volume).
Any time you have data that is well served by a 1-to-1 map.
For example, grades in a class:
"John Smith" => "B+"
"Jacob Jenkens" => "C"
etc
In general, hashes are used to find things fast: a hash map can be used to associate one thing with another quickly, and a hash set just stores things "fast".
Please also consider the complexity and cost of the hash function when deciding whether a hash container or an ordinary less-than (tree) container is better: the additional size of the hash value, the time needed to compute a "perfect" hash, and the time needed for a 1:1 comparison at the end in case of a hash collision may in fact add up to more than just walking a tree structure of logarithmic depth using less-than comparisons.
When you need to associate one variable with another. There isn't a "type limit" to what can be a key/value in a hash.
Hashes have many uses. Aside from cryptographic uses, they are commonly used for quick lookups of information. To get similarly quick lookups using an array, you would need to keep the array sorted and then use a binary search. With a hash you get the fast lookup without having to sort. This is why most scripting languages implement hashing under one name or another (dictionaries, et al.).
I use one often for a "dictionary" of settings for my app.
Setting | Value
I load them from the database or a config file into a hashtable for use by my app.
Works well, and is simple.
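A minimal PHP sketch of that idea (the row and field names are assumed):
$settings = [];
foreach ($rows as $row) { // rows read from the DB or a parsed config file
    $settings[$row['setting']] = $row['value'];
}
echo $settings['theme']; // O(1) lookup by setting name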
One example could be zip code associated with an area, city or any postal address.
A good example is a cache with lots of elements in it. You have some identifier by which you want to look up a value (say a URL, where you want to find the corresponding cached web page). You want these lookups to be as fast as possible and don't want to search through all the stored pages every time some URL is requested. A hash table is a great data structure for a problem like this.
One real-world example I just wrote is adding up the amounts people spent on meals when filing expense reports. I needed a daily total with no idea how many items would exist on a particular day and no idea what the date range of the expense report would be. There are restrictions on how much a person can expense, with many variables (what city, weekend, etc...).
The hash table was the perfect tool for this. The key was the date, the value was the receipt amount (converted to USD). The receipts could come in any order; I just kept fetching the value for that date and adding to it until the job was done. Displaying was easy as well.
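A minimal PHP sketch of that daily-totals idea (field names invented):
$totals = [];
foreach ($receipts as $receipt) { // receipts can arrive in any order
    $day = $receipt['date'];      // e.g. '2013-05-02'
    if (!isset($totals[$day])) {
        $totals[$day] = 0;
    }
    $totals[$day] += $receipt['amount_usd']; // running total per day
}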
(PHP code)
$david = new stdClass();
$david->name = "david";
$david->age = 12;
$david->id = 1;
$david->title = "manager";
$joe = new stdClass();
$joe->name = "joe";
$joe->age = 17;
$joe->id = 2;
$joe->title = "employee";
$users = [];
// option 1: store the users by numeric index
$users[] = $david;
$users[] = $joe;
// option 2: store the users by title
$users[$david->title] = $david;
$users[$joe->title] = $joe;
now the question: who is the manager?
answer:
$users["manager"]