We log activity on our website using MongoDB, but now that our traffic has grown we're struggling to pull the reports we want: we hit timeouts. Each user session has its own record, which includes the following attributes -
{
  channel: 'adwords',
  device: {
    type: 'mobile'
  },
  abTestArms: [
    'test1_arm2',
    'test3_arm4'
  ],
  userEventCounters: {
    clickOnRedButton: 3,
    adjustPriceSlider: 4,
    clickOnBlueButton: 2,
    useSearchBox: 4
  }
}
We run lots of A/B tests and want to find out how different versions affect user activity. So we might run queries like -
db.sessions.count({channel:'bing','device.type':'tablet',abTestArms:'test1_arm1','userEventCounters.useSearchBox':{$exists:true}})
Or
db.sessions.count({channel:'bing','device.type':'tablet',abTestArms:'test1_arm1'})
We used to use aggregation pipelines, but they started timing out, so now we're building the results up bit by bit - but even like this we hit timeouts.
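For illustration, a per-arm counting pipeline along these lines (a rough sketch, not our exact code; only the field names come from the document above) is the sort of thing that was timing out:
db.sessions.aggregate([
  { $match: { channel: 'bing', 'device.type': 'tablet' } },
  { $unwind: '$abTestArms' },
  { $group: {
      _id: '$abTestArms',
      sessions: { $sum: 1 },
      // sessions where the user used the search box at least once
      usedSearchBox: { $sum: { $cond: [ { $gt: [ '$userEventCounters.useSearchBox', 0 ] }, 1, 0 ] } }
  } }
])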
We've tried various indexes, for example compound indexes like
{channel:1, device:1, abTestArms:1, userEventCounters:1}
The only thing we haven't tried is creating lots of compound indexes like
{channel:1, device:1, abTestArms:1, 'userEventCounters.hoverOverAd':1}
The issue with this is that there are lots of user events that we track, and we don't want 40+ compound indexes like this. Also, we'd have to create new indexes whenever we start tracking a new event.
We currently keep about 3M sessions in our DB, and the bigger count queries normally return counts in the hundreds of thousands. This can't be an uncommon thing to do; ideally I'd like to be able to do this on the fly (as we did when traffic levels were small). What am I missing?
I want to count certain esports events, rolling up results by tournament, tournament stage, last 5 and last 10 games, and all matches.
I'd like to get a single query that I could put into a materialised view - new events don't happen too often, so even a query that takes a few seconds would be perfectly fine for our DB.
The picture of what I want to get:
The metrics are very different (tower kills, dragon kills, first tower kills, etc.) and I would like to avoid copy-paste as much as I can.
This logic could be easily implemented within application code, but I'd like to give the database a try.
Unfortunately, I don't have any working code yet, as the only solution I'm aware of today is to create unions of more or less copy-pasted select statements.
I'm looking for something like GROUP BY ROLLUP tournamentId, {limit 5}, {limit 10}, but it seems there is no very straightforward way to do this.
Any hints on how this could be implemented?
Thanks!
I have a real-time workflow for creating unique numbers. This workflow gets a numeric field from my custom entity, increases it by 1, and updates it for the next use.
I want to run this workflow on multiple records.
Running in on-demand mode it works fine and I get unique numbers, but in "Record is Created" mode it does not work correctly and I get repeated numbers.
What do I have to do?
This approach won't work: when the workflow runs automatically (on record creation) it runs multi-threaded, e.g. two users create two records and two instances of the workflow start. As there is no locking mechanism, you end up with duplicated numbers.
I'm guessing this isn't happening when running on demand because you are running as a single user.
You will need to implement a custom auto number approach, such as Auto Number for DynamicsCRM.
Disclaimer: I work for Gap Consulting who produce the tool linked above.
We are using the plugin https://goodies.pixabay.com/jquery/tag-editor/demo.html for our autocomplete feature. We load the source with 3,500 items. Performance gets very bad when the user starts typing: the autocomplete loads the filtered result after 6 to 8 seconds.
What alternative approaches can we take for autocomplete with up to 4,000 items?
Appreciate your response!
Are you using the minLength option of autocomplete?
On their homepage, they have something like this:
$('#my_textarea').tagEditor({ autocomplete: { 'source': '/url/', minLength: 3 } });
This effectively means that the user has to enter at least 3 characters before autocomplete kicks in. Doing so will usually reduce the number of results from the autocomplete to a more sane count (maybe 20-30).
However, this might not necessarily be your problem. First you should figure out whether it's your server that has a problem responding quickly (you can use your browser's developer toolbar to see how long the request takes to complete).
If the request takes 6-8 seconds, then you will have to optimize your server's code. On the other hand, if the response is quick but tagEditor needs a long time to build the suggestion list, the problem is that it might not be optimized for so many suggestions. In that case, the ultimate solution would be to rewrite the autocompletion module yourself or patch the existing one to scale better to your needs.
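If the server turns out to respond quickly, one more thing worth trying (just a sketch, reusing the URL and element id from the demo snippet above, and assuming your endpoint can return the full list as a JSON array) is to fetch the whole list once and pass it to the widget as a local array, so there is no round trip per keystroke and filtering happens in the browser:
$.getJSON('/url/', function (allTags) {  // one request for the full list of ~3500 items
    $('#my_textarea').tagEditor({
        autocomplete: {
            source: allTags,   // local array: jQuery UI filters it client-side
            minLength: 3       // still require 3 characters to keep the suggestion list short
        }
    });
});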
Do you go back to the server every time the user types in something to get the matching results?
I am using Spring with Ehcache, which gets all the items from the database and stores them in the server cache when the server is started. Whenever the user types, the cached data is used, which returns the results within a few milliseconds. Someone else recommended this to me. Below is an example of it:
http://www.mkyong.com/ehcache/ehcache-hello-world-example/
I am using the jQuery autocomplete feature with 2,500 items without any issue.
Here is a link where it is being used: http://www.all4sportsonline.com
I built a website with the CMS TYPO3. Everything works fine, but I have a performance issue: the frontend is generated very slowly. So I decided to install the extension "T3Profiler" to analyze the problem.
In the profiler I find queries like:
SELECT content FROM cf_cache_hash WHERE identifier = 'f33c135b63eac6bb7194edab51f3c57a' AND cf_cache_hash.expires >= 1441015330 LIMIT 1
Such queries take 90.000 - 600.000 ms. Why are these selects so slow? How can I solve my issue?
Can someone give me a hint?
Sometimes when system tables like sys_log, *_cache_* and others grow to huge sizes, querying them becomes slower and slower... As they are often accessed during the common rendering process, they can become real performance killers.
There are several workarounds for this:
Add scheduler tasks to clean these tables regularly, e.g. sys_log and history entries can be purged after, say, 30 days, especially when the system is in a dev state and many changes are made every day.
Check what fills the tables - e.g. some extension may add several hundred log entries to sys_log, for example when some method doesn't get an expected argument; if it's used in a loop over a collection of a hundred items, the logger has to write the error several hundred times per request (!). Fix the code to avoid such situations.
Make sure that all table structures are correct via Install Tool > Compare current database with specification.
Finally, use your database GUI to optimize and/or repair tables.
In addition to biesior's answer, another tip is to store caches outside of your TYPO3 database, because the DB is already stressed enough by data selection.
For TYPO3 7.x I do it like this in AdditionalConfiguration.php:
$redisCacheOptions = [
'hostname' => 'localhost',
'port' => 6379,
'database' => 2,
'password' => '******',
];
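// TYPO3 caches that will be routed to Redis instead of the database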
$cacheConfigurations = [
'cache_hash',
'cache_imagesizes',
'cache_pages',
'cache_pagesection',
'cache_rootline',
'extbase_datamapfactory_datamap',
'extbase_object',
'extbase_reflection',
'extbase_typo3dbbackend_queries',
'extbase_typo3dbbackend_tablecolumns'
];
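// Point each listed cache at the Redis backend; the '+' merge below keeps any
// options already configured while letting the Redis connection settings win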
foreach ($cacheConfigurations as $cacheConfiguration) {
$GLOBALS['TYPO3_CONF_VARS']['SYS']['caching']['cacheConfigurations'][$cacheConfiguration]['backend'] = \TYPO3\CMS\Core\Cache\Backend\RedisBackend::class;
$GLOBALS['TYPO3_CONF_VARS']['SYS']['caching']['cacheConfigurations'][$cacheConfiguration]['options'] =
$redisCacheOptions + (array)$GLOBALS['TYPO3_CONF_VARS']['SYS']['caching']['cacheConfigurations'][$cacheConfiguration]['options'];
}
I set up and tear down my MongoDB database during functional tests.
One of my models makes use of GridFS, and I am going to run that test (which also calls setup and teardown). Suppose we start out with a clean, empty database called test_repoapi:
python serve.py testing.ini
nosetests -a 'write-file'
The second time I run the test, I am getting this:
OperationFailure: command SON([('filemd5', ObjectId('518ec7d84b8aa41dec957d3c')), ('root', u'fs')]) failed: need an index on { files_id : 1 , n : 1 }
If we look at the client:
> use test_repoapi
switched to db test_repoapi
> show collections
fs.chunks
system.indexes
users
Here is the log: http://pastebin.com/1adX4svG
There are three kinds of timestamps:
(1) the top one is from when I first launched the web app
(2) anything before 23:06:27 is from the first iteration
(3) everything else is from the second iteration
As you can see, I did issue commands to drop the database. Two possible explanations:
(1) the web app holds two active connections to the database, and
(2) some kind of "lock" prevents the index from being fully created. Also note that fs.files was not recreated.
The workaround is to stop the web app, start it again, and run the test; then the error does not appear.
By the way, I am using Mongoengine as my ODM in my web app.
Any thoughts on this?
We used to have a similar issue with mongoengine failing to recreate indexes after drop_collection() during tests, because it failed to realise that dropping a collection also drops its indexes. But that was happening with normal collections and a rather ancient version of mongoengine (a call to QuerySet._reset_already_indexed() fixed it for us - but we haven't needed that since 0.6).
Maybe this is another case of mongoengine internally keeping track of which indexes have been created, and it's just failing to realize that the database/collection vanished and those indexes must be recreated? FWIW, using drop_collection() between tests is working for us, and that includes GridFS.
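For what it's worth, the index that error asks for is just the standard GridFS chunks index, so a possible stopgap (a sketch using the default GridFS collection name, with the key spec taken straight from the error message) is to recreate it by hand from the mongo shell between runs:
> use test_repoapi
switched to db test_repoapi
> db.fs.chunks.ensureIndex({ files_id: 1, n: 1 }, { unique: true })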