MongoDB log: extent 0:55000 and can't find plugin [desc]

I am using Meteor with a separate MongoDB running on Windows. I ran MongoDB as a service.
My MongoDB log is full of the following:
Mon Mar 04 14:15:36 [conn19] info DFM::findAll(): extent 0:55000 was empty, skipping ahead. ns:webfm.graphdata
Mon Mar 04 14:15:38 [conn17] info DFM::findAll(): extent 0:55000 was empty, skipping ahead. ns:webfm.graphdata
Mon Mar 04 14:15:40 [conn16] info DFM::findAll(): extent 0:55000 was empty, skipping ahead. ns:webfm.graphdata
Mon Mar 04 14:15:43 [conn18] warning: can't find plugin [desc]
Mon Mar 04 14:15:43 [conn19] info DFM::findAll(): extent 0:55000 was empty, skipping ahead. ns:webfm.graphdata
Mon Mar 04 14:15:49 [conn18] info DFM::findAll(): extent 0:55000 was empty, skipping ahead. ns:webfm.graphdata
Mon Mar 04 14:16:14 [conn16] warning: can't find plugin [desc]
Mon Mar 04 14:16:20 [conn17] info DFM::findAll(): extent 0:55000 was empty, skipping ahead. ns:webfm.graphdata
Mon Mar 04 14:16:24 [conn16] warning: can't find plugin [desc]
Mon Mar 04 14:16:32 [conn20] info DFM::findAll(): extent 0:60000 was empty, skipping ahead. ns:webfm.history
Mon Mar 04 14:16:34 [conn16] warning: can't find plugin [desc]
From what I can find, the "findAll(): extent 0:55000..." message seems to be related to my application frequently removing data. Is that correct?
How about the can't find plugin [desc]? What's that plugin? How can I fix it?

From what I can find, the "findAll(): extent 0:55000..." message seems to be related to my application frequently removing data. Is that correct?
Per Nick's answer, the message extent 0:55000 was empty, skipping ahead is related to your frequent deletions. This is a warning emitted when skipping an empty data extent, with 0:55000 being the extent location. The space will normally be reused as you add more data to that collection.
If you are frequently adding and deleting this collection and disk space is of concern, you could also consider:
enabling the usePowerOf2Sizes flag for this collection (MongoDB 2.2 or newer) for more effective reuse of deleted space (see the sketch after this list)
saving temporary collections in a separate database, and dropping the database when done (rather than running frequent compacts or repairs)
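As a sketch of the first option, assuming the webfm.graphdata namespace from your logs, you could enable the flag with collMod and then confirm it via the collection stats (userFlags should report 1 afterwards):
use webfm
// Enable power-of-2 record allocation so freed space is reused more readily
db.runCommand({ collMod: "graphdata", usePowerOf2Sizes: true })
// Verify the flag took effect
db.graphdata.stats().userFlags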
How about the can't find plugin [desc]? What's that plugin? How can I fix it?
This typically means you have an invalid index definition, and MongoDB can't find an index plugin to handle it. In this example, I expect you have an indexed field declared with a direction of desc rather than -1.
You can check the index definitions with db.collection.getIndexes():
use webfm
db.graphdata.getIndexes()
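If getIndexes() does show a definition like { myfield: "desc" }, a sketch of the fix (myfield is only a placeholder for whatever field the output reports) is to drop the bad index and recreate it with a numeric direction:
use webfm
// Drop the invalid index using the key pattern (or name) reported by getIndexes()
db.graphdata.dropIndex({ myfield: "desc" })
// Recreate it with a proper descending direction
db.graphdata.ensureIndex({ myfield: -1 })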

The message is in fact informational and harmless. It tells you that a whole extent was empty. This can happen if you deleted a lot of data recently.
You can use the compact command to defragment and compact a collection (basically re-writing it to disk and re-creating the indexes on the collection), which would get rid of the log message. Please be aware that compact is a resource-heavy operation.
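For reference, a sketch of running compact against the collection from the logs; on MMAPv1 it blocks operations on the database while it runs, so schedule it for a maintenance window:
use webfm
// Rewrites the collection's data files and rebuilds its indexes
db.runCommand({ compact: "graphdata" })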

I'm not absolutely sure this is a problem. You might be running mongod with too much verbosity, so it's showing excessive detail in its warnings and processes for debugging purposes. This is a good thing, because if there's a problem or bug, the issue can be identified.
But if you want to remove it:
Decrease the number of 'v's or remove them in your startup command.
If you use a config file, change whichever of these you have to false (a runtime alternative is sketched after this list):
vvvvv = false
vvvv = false
vvv = false
vv = false
v = false
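If you would rather not restart mongod, a sketch of checking and lowering the verbosity at runtime via the logLevel server parameter (0 is the default and least verbose):
// Read the current verbosity level
db.adminCommand({ getParameter: 1, logLevel: 1 })
// Set it back to the default
db.adminCommand({ setParameter: 1, logLevel: 0 })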

Related

MongoDB data corruption on a replica set

I am working with a MongoDB database running in a replica set.
Unfortunately, I noticed that the data appears to be corrupted.
There should be over 10,000 documents in the database. However, there are several thousand records that are not being returned in queries.
The total count DOES show the correct total.
db.records.find().count()
10793
And some records are returned when querying by RecordID (a custom sequence integer).
db.records.find({"RecordID": 10049})
{ "_id" : ObjectId("5dfbdb35c1c2a400104edece")
However, when querying for records that I know for a fact should exist, nothing is returned.
db.records.find({"RecordID": 10048})
db.records.find({"RecordID": 10047})
db.records.find({"RecordID": 10046})
The issue appears to be very sporadic, and in some cases entire ranges of records are missing; for example, the entire range from RecordID 1500 to 8000 is missing.
Questions: What could be the cause of the issue? What can I do to troubleshoot this issue further and recover the corrupted data? I looked into running repairDatabase but that is for standalone instances only.
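Before reaching for a restore, a sketch of two checks that might narrow this down (the host name below comes from the replication output, <dbname> is a placeholder, and validate locks the collection while it runs):
// Count documents in the missing range on the current member
db.records.find({ RecordID: { $gte: 1500, $lte: 8000 } }).count()
// Run a full validation of the collection's data and indexes
db.records.validate(true)
// Repeat the count against each secondary to see whether all members agree, e.g.:
//   mongo --host node2-examplehost.com:27017 --eval 'rs.slaveOk(); db.getSiblingDB("<dbname>").records.find({ RecordID: { $gte: 1500, $lte: 8000 } }).count()'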
UPDATE:
More info on replication:
rs.printReplicationInfo()
configured oplog size: 5100.880859375MB
log length start to end: 14641107secs (4066.97hrs)
oplog first event time: Wed Mar 03 2021 05:21:25 GMT-0500 (EST)
oplog last event time: Thu Aug 19 2021 17:19:52 GMT-0400 (EDT)
now: Thu Aug 19 2021 17:20:01 GMT-0400 (EDT)
rs.printSecondaryReplicationInfo()
source: node2-examplehost.com:27017
syncedTo: Thu Aug 19 2021 17:16:42 GMT-0400 (EDT)
0 secs (0 hrs) behind the primary
source: node3-examplehost.com:27017
syncedTo: Thu Aug 19 2021 17:16:42 GMT-0400 (EDT)
0 secs (0 hrs) behind the primary
UPDATE 2:
We did a restore from a backup and somehow it looks like it fixed the issue.

MongoDB SECONDARY becoming RECOVERING at nighttime

I am running a conventional MongoDB Replica Set consisting of 3 members (member1 in datacenter A, member2 and member3 in datacenter B).
member1 is the current PRIMARY and I added members 2 and 3 via rs.add(). They performed their initial sync and became SECONDARY soon after. Everything is fine all day long, and the replication delay of both members is 0 seconds, until 2 AM at night.
Now: every night at around 2 AM, both members shift into the RECOVERING state and stop replicating entirely, which leads to a replication delay of hours when I look at rs.printSlaveReplicationInfo() in the morning. Around 2 AM there are no massive inserts or maintenance tasks that I know of.
I get the following log entries on the PRIMARY:
2015-10-09T01:59:38.914+0200 [initandlisten] connection accepted from 192.168.227.209:59905 #11954 (37 connections now open)
2015-10-09T01:59:55.751+0200 [conn11111] warning: Collection dropped or state deleted during yield of CollectionScan
2015-10-09T01:59:55.869+0200 [conn11111] warning: Collection dropped or state deleted during yield of CollectionScan
2015-10-09T01:59:55.870+0200 [conn11111] getmore local.oplog.rs cursorid:1155433944036 ntoreturn:0 keyUpdates:0 numYields:1 locks(micros) r:32168 nreturned:0 reslen:20 134ms
2015-10-09T01:59:55.872+0200 [conn11111] end connection 192.168.227.209:58972 (36 connections now open)
And, which is more interesting, I get the following log entries on both SECONDARYs:
2015-10-09T01:59:55.873+0200 [rsBackgroundSync] repl: old cursor isDead, will initiate a new one
2015-10-09T01:59:55.873+0200 [rsBackgroundSync] replSet syncing to: member1:27017
2015-10-09T01:59:56.065+0200 [rsBackgroundSync] replSet error RS102 too stale to catch up, at least from member1:27017
2015-10-09T01:59:56.066+0200 [rsBackgroundSync] replSet our last optime : Oct 9 01:59:23 5617035b:17f
2015-10-09T01:59:56.066+0200 [rsBackgroundSync] replSet oldest at member1:27017 : Oct 9 01:59:23 5617035b:1af
2015-10-09T01:59:56.066+0200 [rsBackgroundSync] replSet See http://dochub.mongodb.org/core/resyncingaverystalereplicasetmember
2015-10-09T01:59:56.066+0200 [rsBackgroundSync] replSet error RS102 too stale to catch up
2015-10-09T01:59:56.066+0200 [rsBackgroundSync] replSet RECOVERING
What is also striking: the start of the oplog "resets" itself every night at around 2 AM:
configured oplog size: 990MB
log length start to end: 19485secs (5.41hrs)
oplog first event time: Fri Oct 09 2015 02:00:33 GMT+0200 (CEST)
oplog last event time: Fri Oct 09 2015 07:25:18 GMT+0200 (CEST)
now: Fri Oct 09 2015 07:25:26 GMT+0200 (CEST)
I am not sure whether this is correlated with the issue. I am also wondering how such a small gap (Oct 9 01:59:23 5617035b:17f <-> Oct 9 01:59:23 5617035b:1af) can make the members too stale.
Could this also be a server (VM host) time issue, or is it something completely different? (Why is the first oplog event being "reset" every night instead of "shifting" to a timestamp like NOW minus 24 hrs?)
What can I do to investigate and avoid this?
Upping the oplog size should solve this (per our comments).
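For anyone on MongoDB 3.6 or newer who hits the same RS102 error, a sketch of resizing the oplog online rather than via the older manual procedure (size is in megabytes; run it against each member):
// Grow this member's oplog to roughly 16 GB
db.adminCommand({ replSetResizeOplog: 1, size: 16384 })
// Confirm the new oplog window
rs.printReplicationInfo()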
Some references for others who run into this issue
Workloads that Might Require a Larger Oplog Size
Error: replSet error RS102 too stale to catch up link1 & link2

CQ/AEM Dispatcher does not flush Binaries

Our application imports binaries (mostly PDF) from a legacy system and stores them on a page together with some metadata.
If there is a change, the page automatically gets activated. We see the replication events in the replication log, and an invalidate event is also logged on the dispatcher. But there is no eviction entry, and thus the old binary is still cached.
We also have HTML pages next to these container pages for the binaries, and they work as expected. Here are the two log entries, the successful HTML and the unsuccessful PDF:
OK:
[Thu Jul 03 09:26:33 2014] [D] [27635(24)] Found farm website for localhost:81
[Thu Jul 03 09:26:33 2014] [D] [27635(24)] checking [/dispatcher/invalidate.cache]
[Thu Jul 03 09:26:33 2014] [I] [27635(24)] Activation detected: action=Activate [/content/offering/s2p/en/offerings/documents/Swiss_Mandate_Line/Review/test]
[Thu Jul 03 09:26:33 2014] [I] [27635(24)] Touched /app/C2Z/dyn/c2zcqdis/docroot/.stat
[Thu Jul 03 09:26:33 2014] [I] [27635(24)] Evicted /app/C2Z/dyn/c2zcqdis/docroot/content/offering/s2p/en/offerings/documents/Swiss_Mandate_Line/Review/test.html
[Thu Jul 03 09:26:33 2014] [D] [27635(24)] response.status = 200
[Thu Jul 03 09:26:33 2014] [D] [27635(24)] response.headers[Server] = "Communique/2.6.3 (build 5221)"
[Thu Jul 03 09:26:33 2014] [D] [27635(24)] response.headers[Content-Type] = "text/html"
[Thu Jul 03 09:26:33 2014] [D] [27635(24)] cache flushed
[Thu Jul 03 09:26:33 2014] [I] [27635(24)] "GET /dispatcher/invalidate.cache" 200 13 2ms
Not OK:
[Thu Jul 03 09:30:45 2014] [D] [27635(24)] Found farm website for localhost:81
[Thu Jul 03 09:30:45 2014] [D] [27635(24)] checking [/dispatcher/invalidate.cache]
[Thu Jul 03 09:30:45 2014] [I] [27635(24)] Activation detected: action=Activate [/content/offering/s2p/en/offerings/documents/Swiss_Mandate_Line/Review/as2p_vvm_ch_gl_fix_chf__pdf]
[Thu Jul 03 09:30:45 2014] [I] [27635(24)] Touched /app/C2Z/dyn/c2zcqdis/docroot/.stat
[Thu Jul 03 09:30:45 2014] [D] [27635(24)] response.status = 200
[Thu Jul 03 09:30:45 2014] [D] [27635(24)] response.headers[Server] = "Communique/2.6.3 (build 5221)"
[Thu Jul 03 09:30:45 2014] [D] [27635(24)] response.headers[Content-Type] = "text/html"
[Thu Jul 03 09:30:45 2014] [D] [27635(24)] cache flushed
[Thu Jul 03 09:30:45 2014] [I] [27635(24)] "GET /dispatcher/invalidate.cache" 200 13 1ms
The PDF in this case is stored in a node called 'download' directly below the jcr:content node. Its HTML container is never requested directly, so it is not available on the dispatcher. A user therefore requests the file directly:
/content/offering/s2p/en/offerings/documents/Swiss_Mandate_Line/Review/as2p_vvm_ch_gl_fix_chf__pdf/jcr%3acontent/download/file.res/as2p_vvm_ch_gl_fix_chf_.pdf
In the dispatcher.any we flush all HTML pages on activation, but not the binaries. For testing, we added an allow rule for *.pdf, but that didn't help either.
/invalidate
  {
  /0000
    {
    /glob "*"
    /type "deny"
    }
  /0001
    {
    /glob "*.html"
    /type "allow"
    }
  }
In my opinion, the invalidate call should just delete the whole folder:
/content/offering/s2p/en/offerings/documents/Swiss_Mandate_Line/Review/as2p_vvm_ch_gl_fix_chf__pdf
Any ideas why our binaries do not get flushed?
UPDATE: In another post the statfileslevel property in dispatcher.any is mentioned. In our environment it is commented out. Could that be the problem? Sadly, I don't fully understand how it is supposed to work. Is the level counted from the wwwroot or from the page that is activated?
It looks like your problem with dispatcher flushing is that the path the file is being served from is using jcr%3acontent when it should use _jcr_content.
Dispatcher flushing deletes the folder _jcr_content under the path that is being flushed. It does not delete jcr%3acontent (urldecoded as jcr:content). So you should instead serve the pdf using this URL:
/content/offering/s2p/en/offerings/documents/Swiss_Mandate_Line/Review/as2p_vvm_ch_gl_fix_chf__pdf/_jcr_content/download/file.res/as2p_vvm_ch_gl_fix_chf_.pdf
This would then cache the pdf file under:
{CACHEROOT}/content/offering/s2p/en/offerings/documents/Swiss_Mandate_Line/Review/as2p_vvm_ch_gl_fix_chf__pdf/_jcr_content/download/file.res/as2p_vvm_ch_gl_fix_chf_.pdf
Then, when this path is flushed, the dispatcher deletes the subdirectory _jcr_content under the flushed path:
/content/offering/s2p/en/offerings/documents/Swiss_Mandate_Line/Review/as2p_vvm_ch_gl_fix_chf__pdf
To go into more detail: when you issue a flush request for the path above, the following files and directories are deleted (a manual flush test is sketched after this list):
/content/offering/s2p/en/offerings/documents/Swiss_Mandate_Line/Review/as2p_vvm_ch_gl_fix_chf__pdf.* where * is a wildcard
/content/offering/s2p/en/offerings/documents/Swiss_Mandate_Line/Review/as2p_vvm_ch_gl_fix_chf__pdf/_jcr_content
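To see exactly what the dispatcher evicts, you can also fire an invalidation by hand and then look at the cache root; a sketch using the host, port and paths from the logs above (CQ-Action and CQ-Handle are the standard dispatcher invalidation headers):
# Manually trigger an invalidation for the binary's parent page
curl -H "CQ-Action: Activate" -H "CQ-Handle: /content/offering/s2p/en/offerings/documents/Swiss_Mandate_Line/Review/as2p_vvm_ch_gl_fix_chf__pdf" -H "Content-Length: 0" http://localhost:81/dispatcher/invalidate.cache
# Then check whether the _jcr_content subdirectory was removed from the cache
ls /app/C2Z/dyn/c2zcqdis/docroot/content/offering/s2p/en/offerings/documents/Swiss_Mandate_Line/Review/as2p_vvm_ch_gl_fix_chf__pdf/_jcr_content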
See slide 23 in this presentation for details on how flushing works:
http://www.slideshare.net/andrewmkhoury/aem-cq-dispatcher-caching-webinar-2013
I'm not sure if this is the root cause, but what I suspect you need to do is go to localhost:4503/etc/replication/agents.publish.html (note: this is a publish instance; you can do it on the author and replicate the replication agents and so on, but for the purposes of the proof of concept, just do it directly on the publisher).
Then go to your dispatcher flush agent, and click on edit settings.
Go to the triggers panel.
Make sure that the "On Receive" trigger is checked. This enables chain replication, meaning that when an asset is published, it is immediately deleted from the dispatcher cache, causing a miss on the next request and thus pulling a fresh copy from the publish instance.
Note that this kind of flushing is distinct from the stats file level flushing, which only flushes a directory, rather than a fully qualified path to the asset.
By the way, it's not the stats file level. The statfileslevel defaults to 0 when it is commented out, which invalidates everything below the root. What you seem to be looking for is an active delete of the cache. This is possible, as Dave just outlined to me for an unrelated problem in this post:
Is it possible to recursively flush directories in the CQ5/AEM apache dispatcher?
An approach would be to create a flush interceptor: essentially a custom servlet on the publisher. You would then configure the normal flush replication agent to make a call to the local servlet on the publisher.
The servlet then detects whether it would need to delete the directory, or any particular files within. It can transform the flush path to the required path, and instead of a FLUSH action, use a DELETE action.
It would still be very important to send the flush to the normal dispatcher location.
Hope this helps.

MongoDB index creation goes past 100% and appears to loop forever

I have a MongoDB collection with ~5.5M records. My attempts to index it, whether on a single field or with a compound index, fail: the indexing process proceeds normally, but instead of stopping when it reaches 100%, it goes past 100% and just continues. I've left it running for 10 hours and it never finished.
The fields I try to index on are longs or doubles.
I'm running the latest MongoDB version on x64 Windows.
Am I right to think that this is abnormal behaviour? Any ideas what I can do?
Wed Sep 05 10:22:37 [conn1] 415000000/5576219 7442%
Wed Sep 05 10:22:48 [conn1] 417000000/5576219 7478%
Wed Sep 05 10:22:59 [conn1] 419000000/5576219 7514%
Per helpful advice from mongodb-users list:
This was likely due to running out of disk space, which corrupted the database.
What I did was clear up disk space, then run "mongodump --repair" followed by "mongorestore".
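If someone runs into the same symptom, a quick sketch of checking the two usual suspects from the mongo shell before assuming corruption: watch the index build's progress and compare the database size against free disk space.
// In-progress operations; an index build reports its progress in the "msg" field
db.currentOp()
// Data and storage sizes for the current database, to compare against free disk space
db.stats()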

Default Index Controller Not Being Called With New Zend Studio Project

I have just purchased a license for Zend Studio 9. I have only a minimal amount of experience with the Zend framework, and no previous experience with Zend Studio. I am using http://framework.zend.com/manual/en/ as a tutorial on the framework and have browsed through the resources located at http://www.zend.com/en/products/studio/resources for help with the studio software.
My main problem is that after creating a new Zend project with Zend Studio, I'm not seeing the initial welcome message. Here are the steps I am using:
I've already installed the Zend Server and confirmed that web apps are working (made some test files, they all parsed correctly).
Create a new project with Zend Studio.
a. File->New->Local PHP Project
b. For location, I am using C:\Program Files\Zend\Apache2\htdocs.
c. For version I used the default "Zend Framework 1.11.11 (Built-in)"
I go to http://localhost:81/projectname. Instead of the default index controller being called, I just see my directory structure.
Additional info:
OS: Windows 7
PHP version: 5.3
ERROR LOGS:
[Wed Nov 30 14:32:30 2011] [warn] Init: Session Cache is not configured [hint: SSLSessionCache]
[Wed Nov 30 14:32:30 2011] [warn] pid file C:/Program Files (x86)/Zend/Apache2/logs/httpd.pid overwritten -- Unclean shutdown of previous Apache run?
[Wed Nov 30 14:32:30 2011] [notice] Digest: generating secret for digest authentication ...
[Wed Nov 30 14:32:30 2011] [notice] Digest: done
[Wed Nov 30 14:32:31 2011] [notice] Apache/2.2.16 (Win32) mod_ssl/2.2.16 OpenSSL/0.9.8o configured -- resuming normal operations
[Wed Nov 30 14:32:31 2011] [notice] Server built: Aug 8 2010 16:45:53
[Wed Nov 30 14:32:31 2011] [notice] Parent: Created child process 13788
[Wed Nov 30 14:32:32 2011] [warn] Init: Session Cache is not configured [hint: SSLSessionCache]
[Wed Nov 30 14:32:32 2011] [notice] Digest: generating secret for digest authentication ...
[Wed Nov 30 14:32:32 2011] [notice] Digest: done
[Wed Nov 30 14:32:33 2011] [notice] Child 13788: Child process is running
[Wed Nov 30 14:32:33 2011] [notice] Child 13788: Acquired the start mutex.
[Wed Nov 30 14:32:33 2011] [notice] Child 13788: Starting 64 worker threads.
[Wed Nov 30 14:32:33 2011] [notice] Child 13788: Starting thread to listen on port 10081.
[Wed Nov 30 14:32:33 2011] [notice] Child 13788: Starting thread to listen on port 81.
If you navigate to http://localhost:81/projectname/index/index, does the correct screen load?
If so:
Check that the .htaccess file in your public directory contains the correct rewrite rules for Zend Framework.
Check your httpd.conf file and make sure index.php is added to the DirectoryIndex directive (a sketch of both is shown after this list).
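For reference, a sketch of what those two pieces typically look like for a Zend Framework 1 application (treat this as a template to compare against your own files, not your exact configuration):
# public/.htaccess - the standard ZF1 rewrite rules routing requests to index.php
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} -s [OR]
RewriteCond %{REQUEST_FILENAME} -l [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^.*$ - [NC,L]
RewriteRule ^.*$ index.php [NC,L]
# httpd.conf - make sure index.php is listed in DirectoryIndex (and AllowOverride permits the .htaccess to be read)
DirectoryIndex index.php index.html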
I think the solution is going to be the second bullet, but let me know what you find and I can help further if that doesn't work. Make sure to restart Apache after you make any changes to httpd.conf.
Otherwise, report any errors you see when you access the controller directly, and check Apache's error_log file to see if you get any errors.