I’m working on a bespoke extension in TYPO3 version 8.
In my extension I want to display three items in the front end in a random order out of a result set of at least ten or so.
As an example:
(Select only three people at random and pass them to a view)
My first thought was to look at the QueryBuilder and create a custom query. Looking at other similar posts, it would seem that Extbase's query builder doesn't carry a RAND() function (with good reason).
Would it be better to look at using Fluid and the iterator viewhelper to help me show items in a random order? Or can it be achieved using the QueryBuilder?
The Extbase repository query API does not offer any random ordering. However, the new Doctrine-based QueryBuilder available since TYPO3 8 LTS can be used to get random results: https://docs.typo3.org/typo3cms/CoreApiReference/latest/ApiOverview/Database/QueryBuilder/Index.html
On this QueryBuilder you can use addSelectLiteral() to add a RAND() function to the database query. But make sure that your database software supports this function, since TYPO3 can be connected to other database systems that may not support RAND().
Usage example:
$queryBuilder = GeneralUtility::makeInstance(\TYPO3\CMS\Core\Database\ConnectionPool::class)
    ->getQueryBuilderForTable('tx_yourext_domain_model_example');
$rows = $queryBuilder
    ->select('*')
    ->from('tx_yourext_domain_model_example')
    ->addSelectLiteral('RAND() AS randomnumber')
    ->orderBy('randomnumber')
    ->setMaxResults(3)
    ->execute()
    ->fetchAll();
To create some Extbase Model Records from the result you can use the DataMapper.
$dataMapper = GeneralUtility::makeInstance(\TYPO3\CMS\Extbase\Persistence\Generic\Mapper\DataMapper::class);
$result = $dataMapper->map(\You\Yourext\Domain\Model\Example::class, $rows);
I don't think there is a built-in way in TYPO3, Fluid, or the QueryBuilder to do this. How I would solve it depends on how many records there are in total.
If it's just a few dozen, I'd probably select them all, use PHP's shuffle() function to sort them randomly, and then show the first 3.
If there can be hundreds or more records, I'd probably do a count and pick 3 distinct random numbers between 0 and the count. You can then run 3 separate queries using setFirstResult() (the random number) and setMaxResults(1).
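A minimal plain-PHP sketch of both ideas, using an array as a stand-in for the fetched result set (the variable names are illustrative, not from any TYPO3 API):

```php
<?php
// Stand-in for the full result set you would fetch from the repository.
$rows = range(1, 50);

// Few records: shuffle everything, show the first 3.
shuffle($rows);
$randomThree = array_slice($rows, 0, 3);

// Many records: pick 3 distinct random offsets; each offset would become
// one query with setFirstResult($offset) and setMaxResults(1).
$count = count($rows);
$offsets = [];
while (count($offsets) < 3) {
    $offsets[rand(0, $count - 1)] = true; // array keys keep the offsets distinct
}
$picked = [];
foreach (array_keys($offsets) as $offset) {
    $picked[] = $rows[$offset];
}
```

Using the offsets as array keys is a cheap way to avoid drawing the same offset twice, which would otherwise return duplicate records.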
I do not use VHS in TYPO3 v9. I use the current seconds and simple math to pick 1 random element. For example, I get a random iterator index into the {news} array through:
<f:variable name="sec4random" value="{f:format.date(date:'0 seconds', format:'s')}"/>
<f:variable name="count4random" value="{f:count(subject: news)}"/>
Random index for iterator: {sec4random % count4random}
To access the newsItem with the found index, you could add the following:
<f:variable name="newsindex" value="{sec4random % count4random}"/>
<f:variable name="newsItem" value="{news.{newsindex}}"/>
Now {newsItem} contains the random single news out of the {news} array.
Suppose I were to ask: show me all the tags and questions related to 'data-structures'. I would like to see related tags like '2-3-4-trees', 'binary-search-trees', 'dfs' etc., and not see many questions filtered on a language criterion. If I tried finding the most commonly occurring bigrams with data-structures, then even C++ and Python would come up, but I would rather see core data-structure tags.
How do I go about implementing this?
What I have tried is to get the tags that occur most often with data-structures by querying on data-structures first. Then I also went and looked at all the tags and fetched their common partners to see if data-structures occurs. If they occur pairwise, I assume they are strongly associated. But how can I take this logic one step further to fetch more related tags?
From what I understood, you already have code that gives you "related" tags for a given tag!
If so, to get rid of languages and leave only "strongly" related tags, you need to find (using the exact same code/query you already have) the tags related to the "programming-languages" tag and exclude them from your result!
Bingo! (hopefully)
P.S.
When I did this, the initial list below:
algorithm java c++ c
python linked-list tree arrays
c# binary-tree binary-search-tree dictionary
database graph sorting javascript
performance list recursion stack
hash hashtable time-complexity queue
hashmap
got "trimmed" to
linked-list tree arrays binary-tree
binary-search-tree dictionary database graph
sorting list recursion stack
hash hashtable time-complexity queue
hashmap
so the entries below got removed:
algorithm java c++ c
python c# javascript performance
I'm defining a ratio:
(how many times a_tag has shown up with b_tag) /
(how many times b_tag has shown up without a_tag)
Results for 'data-structures' look like the results you are looking for:
#standardSQL
CREATE TEMPORARY FUNCTION targetTag() AS ("data-structures");
WITH split_tags AS (
  SELECT SPLIT(tags, '|') tags
  FROM `bigquery-public-data.stackoverflow.posts_questions`
  WHERE tags LIKE '%|%'
)
SELECT *, ROUND(c/notc,4) ratio_related
FROM (
  SELECT tag, COUNT(*) c, (
    SELECT COUNT(*)
    FROM split_tags b, UNNEST(tags) btag
    WHERE targetTag() NOT IN UNNEST(b.tags) AND btag=tag
  ) notc
  FROM split_tags, UNNEST(tags) tag
  WHERE targetTag() IN UNNEST(tags)
  GROUP BY 1
)
WHERE notc > 0
AND c + notc > 20
ORDER BY 4 DESC
LIMIT 100
Here is my challenge with Sphinx sorting, where I have vendors who pay for premium placement and those who don't:
I already do a multi-level order including the PaidVendorStatus which is either 0 or 1 as:
order by PaidVendorStatus,Weight()
So in essence I end up with multiple sort groups:
PaidVendorStatus=1, Weight1
....
PaidVendorStatus=1, WeightN
PaidVendorStatus=0, Weight1
...
PaidVendorStatus=0, WeightN
The problem is I have three goals:
Randomly prioritize each vendor in any given sort group
Have each vendor's 'odds' of being randomly assigned top position be equal regardless of how many records they have returned in the group (so if Vendor A has 50 results and VendorB has 2 results they still both have 50% odds of being randomly assigned any given spot)
Ideally, maintain the same results order in any given search (so that if the user searches again, the same order will be displayed)
I've tried various solutions:
Select CRC32(Vendor) as RANDOM...Order by PaidVendorStatus,Weight(),RANDOM
which solves 2 and 3, except that due to the nature of CRC32 it ALWAYS puts the same vendor first (and second, third, etc.), so in essence it does not solve the issue at all.
I tried making a Sphinx sql_attr_string in my Sphinx configuration which was a concatenation of Vendor and the record Title (Select... concat(Vendor,Title) as RANDOMIZER...) and then used that to randomize:
Select CRC32(RANDOMIZER) as RANDOM...
which solves 1 and 3, as now the Title field gets thrown into the randomization mix so that the same vendor does not always get first billing. However, it fails at 2 since in essence I am only sorting by Title, and thus Vendor B with two results now has a very low chance of being sorted first.
In an ideal world, naturally, I could just order this way:
Order by PaidVendorStatus,Weight(),RAND(Vendor)
but that is not possible.
Any thoughts on this appreciated. I did, by the way, check out this thread on UDFs (as per Barry Hunter's suggestion), but unless I am misunderstanding it entirely (possible), it does not seem to be the solution for this problem.
Well, one idea is:
SELECT * FROM (
    SELECT *, uniqueserial(vendor_id) AS sorter FROM index WHERE MATCH(...)
    ORDER BY PaidVendorStatus DESC, Weight() DESC LIMIT 1000
) ORDER BY sorter DESC, WEIGHT() DESC
This exploits Sphinx's 'multiple sort' function via a pseudo subquery.
It works because the inner query is sorted by PaidVendorStatus first, so those items come first, which in turn affects the order in which uniqueserial() is called.
It's NOT really 'randomising' the results as such; it seems you are just randomising them to mix up the vendors (so a single vendor doesn't dominate the results). uniqueserial() works by 'spreading' a particular vendor's results out - the results will tend to cycle through the vendors.
This is tricky, as it exploits a relatively undocumented Sphinx feature: subqueries.
For the UDF see http://svn.geograph.org.uk/svn/modules/trunk/sphinx/
I still don't have an answer for your biased random (as in 2.),
but I just remembered another feature that can help with 3.: you can supply a specific seed to the random number generator. Typically random generators are seeded from the current time, which gives ever-changing values, but using a specific seed makes the order repeatable.
The seed is a number, though, so you need a predictable but changing number. Could CRC the query?
... Sphinx doesn't support expressions in OPTION, so you would have to calculate the hash in the app:
<?php
$query = $db->Quote($_GET['q']);
$crc = crc32($query);
$sql = "SELECT id,IDIV(WEIGHT(),100) as i,RAND() as r FROM index WHERE MATCH($query)
ORDER BY PaidVendorStatus DESC,i DESC,r ASC OPTION random_seed=$crc";
If you wanted the results to evolve only slowly, add the current date, so each day gives a new selection...
$crc = crc32($query.date('Ymd'));
User.find(:all, :order => "RANDOM()", :limit => 10) was the way I did it in Rails 3.
User.all(:order => "RANDOM()", :limit => 10) is how I thought Rails 4 would do it, but this is still giving me a Deprecation warning:
DEPRECATION WARNING: Relation#all is deprecated. If you want to eager-load a relation, you can call #load (e.g. `Post.where(published: true).load`). If you want to get an array of records from a relation, you can call #to_a (e.g. `Post.where(published: true).to_a`).
You'll want to use the order and limit methods instead. You can get rid of the all.
For PostgreSQL and SQLite:
User.order("RANDOM()").limit(10)
Or for MySQL:
User.order("RAND()").limit(10)
As the random function differs between databases, I would recommend using the following code:
User.offset(rand(User.count)).first
Of course, this is useful only if you're looking for only one record.
If you want to get more than one, you could do something like:
User.offset(rand(User.count) - 10).limit(10)
The - 10 is to ensure you get 10 records in case rand returns a number greater than count - 10.
Keep in mind you'll always get 10 consecutive records.
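One gap in the snippet above: rand(User.count) - 10 can go negative when rand returns a small number. Here is a plain-Ruby simulation of a clamped version, with an array standing in for the table (no ActiveRecord involved):

```ruby
# Stand-in for the table: 25 "records" with ascending values.
rows = (1..25).to_a
count = rows.size

# Clamp so the offset never goes negative when rand returns a small
# number; the result is still a *consecutive* block of 10 records.
offset = [rand(count) - 10, 0].max
picked = rows[offset, 10]
```

In a real app the last two lines would become `User.offset(offset).limit(10)`.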
I think the best solution is really ordering randomly in the database.
But if you need to avoid the database-specific random function, you can use the pluck and shuffle approach.
For one record:
User.find(User.pluck(:id).shuffle.first)
For more than one record:
User.where(id: User.pluck(:id).sample(10))
I would suggest making this a scope as you can then chain it:
class User < ActiveRecord::Base
scope :random, -> { order(Arel::Nodes::NamedFunction.new('RANDOM', [])) }
end
User.random.limit(10)
User.active.random.limit(10)
While not the fastest solution, I like the brevity of:
User.ids.sample(10)
The .ids method yields an array of User IDs and .sample(10) picks 10 random values from this array.
I strongly recommend this gem for random records, which is specially designed for tables with lots of rows:
https://github.com/haopingfan/quick_random_records
All other answers perform badly with a large database, except this gem:
quick_random_records only cost 4.6ms totally.
the accepted answer User.order('RAND()').limit(10) cost 733.0ms.
the offset approach cost 245.4ms totally.
the User.all.sample(10) approach cost 573.4ms.
Note: my table only has 120,000 users. The more records you have, the larger the performance difference will be.
UPDATE:
Perform on table with 550,000 rows
Model.where(id: Model.pluck(:id).sample(10)) cost 1384.0ms
gem: quick_random_records only cost 6.4ms totally
For MySQL this worked for me:
User.order("RAND()").limit(10)
You could call .sample on the records, like: User.all.sample(10)
The answer of @maurimiranda User.offset(rand(User.count)).first is not good in case we need 10 random records, because User.offset(rand(User.count) - 10).limit(10) will return a sequence of 10 records from a random position; they are not "totally random", right? So we would need to call that function 10 times to get 10 truly random records.
Besides that, offset is also not good if the random function returns a high value. If your query looks like offset: 10000, limit: 20, it generates 10,020 rows and throws away the first 10,000 of them, which is very expensive. So calling offset/limit 10 times is not efficient.
So I think that if we just want one random user, User.offset(rand(User.count)).first may be better (at least we can improve it by caching User.count).
But if we want 10 random users or more, then User.order("RAND()").limit(10) should be better.
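The "call it 10 times" idea from the paragraph above can be sketched on a plain array standing in for the table (in a real app each offset would be one `offset(o).limit(1)` query):

```ruby
# Stand-in for the table: 500 "records".
rows = (1..500).to_a
count = rows.size

# Draw 10 distinct random offsets, then take one "record" per offset.
offsets = (0...count).to_a.sample(10)
picked = offsets.map { |o| rows[o] }
```

Sampling the offsets without replacement avoids returning the same record twice, which the naive repeated-rand version would not.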
Here's a quick solution.. currently using it with over 1.5 million records and getting decent performance. The best solution would be to cache one or more random record sets, and then refresh them with a background worker at a desired interval.
Create a random_records_helper.rb file:
module RandomRecordsHelper
  def random_user_ids(n)
    # NOTE: this assumes IDs are contiguous from 1..count (no gaps from
    # deleted rows), and rand can repeat, so duplicate IDs are possible.
    user_ids = []
    user_count = User.count
    n.times { user_ids << rand(1..user_count) }
    user_ids
  end
end
in the controller:
@users = User.where(id: random_user_ids(10))
This is much quicker than the .order("RANDOM()").limit(10) method - I went from a 13 sec load time down to 500ms.
Is it possible to compare two database fields in the Query API? For example, I want to compare the fields tstamp and crdate like:
SELECT * FROM tt_content WHERE tstamp > crdate;
In the Query API I could not find a solution. Getting all records and comparing the fields in a loop is not a performant way, because there could be over 2 million records (in my real case).
Thanks for your help.
The only way I can think of (and that the query builder supports) is to directly supply the statement. It'll look like this:
$query = $contentElementRepository->createQuery();
$query->statement('SELECT * FROM tt_content WHERE tstamp > crdate');
$matchingContentElements = $query->execute();
This probably breaks the database abstraction layer, so use it with caution. statement() has a second parameter where you can put parameters, in case you need some user input in the query.
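One alternative that stays inside the abstraction layer, sketched here untested: instead of the Extbase query, use the TYPO3 8 Doctrine QueryBuilder, whose expression builder can compare two quoted identifiers rather than an identifier and a bound value:

```php
$queryBuilder = GeneralUtility::makeInstance(\TYPO3\CMS\Core\Database\ConnectionPool::class)
    ->getQueryBuilderForTable('tt_content');
$rows = $queryBuilder
    ->select('*')
    ->from('tt_content')
    ->where(
        // Compare column against column by quoting the right-hand side
        // as an identifier instead of binding it as a value.
        $queryBuilder->expr()->comparison(
            $queryBuilder->quoteIdentifier('tstamp'),
            '>',
            $queryBuilder->quoteIdentifier('crdate')
        )
    )
    ->execute()
    ->fetchAll();
```

Note this returns plain rows, not Extbase objects; you would still need the DataMapper if you want model instances.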
Maybe there is another way to do this which I don't know, I'd be really interested in it myself.
I would like to perform the function of a Calculate Sum rule with a Script rule in ABBYY FlexiCapture, because I want to perform the calculation only when an if statement evaluates to true.
I have tried the following in a C# script rule:
IFields AllTaxes = Context.Field("cu_location_taxes").Rows;
which gives me the error "Field is not a table."
I have also tried
IFields AllTaxes = Context.Field("cu_location_taxes").Children;
which gives me the error "Cannot access fields not specified in rule settings.", even though I have added the repeating group cu_location_taxes to the C# script rule.
Once I am able to get them into some kind of array, list, or IFields variable, I would like to sum the children's values in some way. I am open to doing this with JScript or C#.
The reasons for the errors you are facing can be found in the ABBYY FlexiCapture Help.
In the description of the IField class you can find the following descriptions of properties:
Rows - A set of table rows. Unavailable for non-table fields.
Well, it seems that "cu_location_taxes" is not a table. You said it is a repeating group.
Children - Child items of the field (cells for tables). Unavailable in script rules.
But as I understand it, you are using script rules exactly.
To achieve correct results, try to use the Items property of exactly the fields that you are summing.
For example, you have a repeating group that contains number fields field_1 and field_2. And you want to calculate the sum of all field_1 instances.
Then you can use the following code (JScript):
sum = 0;
for (i = 0; i < this.Field("field_1").Items.Count; ++i)
{
    sum += this.Field("field_1").Items.Item(i).Value;
}
Also do not forget to add field_1 to the available fields in your rule settings.
Hope this will help.
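Since the question mentioned C# as an option, the same loop might look roughly like this in a C# script rule. This is an untested sketch mirroring the JScript above against the same Items/Item(i)/Value members; adjust the field name and numeric conversion to your project:

```csharp
// Sum all instances of the number field "field_1" in the repeating group.
double sum = 0;
var items = Context.Field("field_1").Items;
for (int i = 0; i < items.Count; ++i)
{
    // Value comes back as object; convert before accumulating.
    sum += Convert.ToDouble(items.Item(i).Value);
}
```

As with the JScript version, field_1 must be listed in the rule's available fields.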