I have created 4 mountpoint disks on a Windows OS, and I need to copy files up to a threshold value (say, 50 GB).
I tried vdbench. It works fine, but it throws an exception at the end.
compratio=4
dedupratio=1
dedupunit=256k
* Host Definition section
hd=default,user=Administator,shell=vdbench,jvms=1
hd=localhost,system=localhost
********************************************************************************
* Storage Definition section
fsd=fsd1,anchor=C:\UnMapTest-Volume1\disk1\,depth=1,width=1,files=1,size=5g
fsd=fsd2,anchor=C:\UnMapTest-Volume2\disk2\,depth=1,width=1,files=1,size=5g
fwd=fwd1,fsd=fsd*,operation=write,xfersize=1m,fileio=sequential,fileselect=random,threads=10
rd=rd1,fwd=fwd1,fwdrate=max,format=yes,elapsed=1h,interval=1
Below is the exception from vdbench. Because of it, my calling script fails.
05:29:14.287 Message from slave localhost-0:
05:29:14.289 file=C:\UnMapTest-Volume1\disk1\\vdb.1_1.dir\vdb_f0001.file,busy=true
05:29:14.290 Thread: FwgThread write C:\UnMapTest-Volume1\disk1\ rd=rd1 For loops: None
05:29:14.291
05:29:14.292 last_ok_request: Thu Dec 28 05:28:57 PST 2017
05:29:14.292 Duration: 16.92 seconds
05:29:14.293 consecutive_blocks: 10001
05:29:14.294 last_block: FILE_BUSY File busy
05:29:14.294 operation: write
05:29:14.295
05:29:14.296 Do you maybe have more threads running than that you have
05:29:14.296 files and therefore some threads ultimately give up after 10000 tries?
05:29:14.300 *
05:29:14.301 ******************************************************
05:29:14.302 * Slave localhost-0 aborting: Too many thread blocks *
05:29:14.302 ******************************************************
05:29:14.303 *
05:29:21.235
05:29:21.235 Slave localhost-0 prematurely terminated.
05:29:21.235
05:29:21.235 Slave aborted. Abort message received:
05:29:21.235 Too many thread blocks
05:29:21.235
05:29:21.235 Look at file localhost-0.stdout.html for more information.
05:29:21.735
05:29:21.735 Slave localhost-0 prematurely terminated.
05:29:21.735
java.lang.RuntimeException: Slave localhost-0 prematurely terminated.
at Vdb.common.failure(common.java:335)
at Vdb.SlaveStarter.startSlave(SlaveStarter.java:198)
at Vdb.SlaveStarter.run(SlaveStarter.java:47)
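(For what it's worth, the abort message above hints at the likely cause: my parameter file runs the FWD with threads=10 while each FSD defines only files=1, so threads pile up on the same file until they give up. A hedged variant of the fwd line that keeps the thread count at or below the number of files, e.g.:

fwd=fwd1,fsd=fsd*,operation=write,xfersize=1m,fileio=sequential,fileselect=random,threads=1

or raising files= in the FSDs so every thread has a file to work on, would presumably avoid the FILE_BUSY giveup.)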
I am using PowerShell on a Windows machine. If some other tool such as Diskspd has a way to fill data up to a threshold, please let me know.
I found the answer myself.
I have done this using Diskspd.exe as shown below.
The following command fills 50 GB of data in the given disk folder (-c50G creates a 50 GB test file, -b4K sets a 4 KiB block size, -t2 uses two threads):
.\diskspd.exe -c50G -b4K -t2 C:\UnMapTest-Volume1\disk1\testfile1.dat
It is much simpler than vdbench for my requirement.
Caution: Diskspd does not write real (incompressible) data, so the consumed space does not show up in the reported disk usage on the array side.
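If the array needs to see data it cannot deduplicate or compress away, here is a minimal sketch that fills a file with pseudo-random bytes up to the threshold (assuming Python is available on the host; the path, file name, and chunk size are illustrative):

import os

# Fill a file with pseudo-random (incompressible) bytes until it
# reaches the 50 GB threshold. Path and chunk size are illustrative.
target = r"C:\UnMapTest-Volume1\disk1\fill.dat"
threshold = 50 * 1024**3     # 50 GB
chunk = 4 * 1024 * 1024      # 4 MiB per write

with open(target, "wb") as f:
    written = 0
    while written < threshold:
        f.write(os.urandom(chunk))
        written += chunk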
I'm using:
TYPO3 6.2
ke_search 2.2
Everything works fine except the indexing process. To be precise:
If I index manually (with the backend module), it's OK, no error messages.
If I run the scheduler indexing task manually, it's OK, no error messages.
If I run the scheduler with the php typo3/cli_dispatch.phpsh scheduler command, then I get this error:
Fatal error: Allowed memory size of 16777216 bytes exhausted (tried to
allocate 87 bytes) in
/path_to_my_website/typo3/sysext/core/Classes/Cache/Frontend/VariableFrontend.php on line 99
For your information:
my PHP memory_limit setting is 128M (note that 16777216 bytes is only 16 MB, so the CLI run seems to use a different limit than my web setting).
Other tasks are OK.
After this error appears on my console, the scheduler task is locked.
I can't figure out what's wrong.
EDIT: I flushed the frontend caches + general caches + system caches. If I run the scheduler via the console one more time, this is the new error I get:
Fatal error: Allowed memory size of 16777216 bytes exhausted (tried to
allocate 12288 bytes) in
/path_to_my_website/typo3/sysext/core/Classes/Database/QueryGenerator.php
on line 1265
EDIT 2: if I disable all my indexer configurations, everything goes well. But if I enable even one configuration, the PHP error returns.
Here is one of the indexer files:
OrientDB version - 2.2.26
Cluster - 3-node setup, readQuorum = 2, writeQuorum = "majority", ridBag.embeddedToSbtreeBonsaiThreshold = 2147483647
Nodes - CentOS 7.0, 24 cores and 96 GB RAM
Gremlin-scala/tinkerpop APIs are used for querying and inserting.
This code works fine on a single-node setup.
The code checks for an existing vertex in the graph. If the vertex does not exist, the insert operations are batched and sent to the DB within a transaction (a sketch of this pattern follows).
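For illustration, the check-then-insert step can be expressed as a single TinkerPop traversal; this is a sketch in gremlin-python rather than gremlin-scala, and the endpoint, label, and property names are hypothetical:

from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.graph_traversal import __
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

# Hypothetical Gremlin Server endpoint; the real code uses gremlin-scala.
g = traversal().withRemote(
    DriverRemoteConnection("ws://localhost:8182/gremlin", "g"))

# "Get or create": reuse the vertex if it already exists, else add it.
v = (g.V().has("item", "key", "abc123")
      .fold()
      .coalesce(__.unfold(),
                __.addV("item").property("key", "abc123"))
      .next())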
I see the following warnings in the OrientDB log on all three nodes:
2017-09-15 16:37:31:025 WARNI [dev2] Timeout (852567ms) on waiting for synchronous responses from nodes=[dev1, dev3, dev2] responsesSoFar=[] request=(id=1.354 task=record_read(#65:22)) [ODistributedDatabaseImpl]
2017-09-15 16:52:18:239 WARNI [dev2] Timeout (1049042ms) on waiting for synchronous responses from nodes=[dev1, dev3, dev2] responsesSoFar=[] request=(id=1.568 task=record_read(#63:24)) [ODistributedDatabaseImpl]
2017-09-15 17:25:22:477 WARNI [dev2] Timeout (1984236ms) on waiting for synchronous responses from nodes=[dev1, dev3, dev2] responsesSoFar=[] request=(id=1.889 task=record_read(#63:24)) [ODistributedDatabaseImpl]
There is no problem with the network, and the firewall is disabled on all three nodes.
Are these logs related to the problem?
What else should I check to fix it?
We have an Orion instance that crashes about once every day or two.
In /var/log/contextBroker/contextBroker-service.out I have found:
log directory: '/var/log/contextBroker'
*** glibc detected *** /usr/bin/contextBroker: corrupted double-linked list: 0x00007f0ed92e3f70 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x75f4e)[0x7f0eecdeaf4e]
/lib64/libc.so.6(+0x763d3)[0x7f0eecdeb3d3]
/lib64/libc.so.6(+0x78c88)[0x7f0eecdedc88]
/usr/lib64/libstdc++.so.6(_ZNSsD1Ev+0x39)[0x7f0eed6404c9]
/usr/bin/contextBroker(_Z9jsonParseP14ConnectionInfoPKcRKSsP8JsonNodeP9ParseData+0x539)[0x56fb99]
/usr/bin/contextBroker(_Z9jsonTreatPKcP14ConnectionInfoP9ParseData11RequestTypeRKSsPP11JsonRequest+0x17d)[0x56cf0d]
/usr/bin/contextBroker(_Z12payloadParseP14ConnectionInfoP9ParseDataP11RestServicePP10XmlRequestPP11JsonRequestP18JsonDelayedReleaseRSt6vectorISsSaISsEE+0x3f2)[0x564012]
/usr/bin/contextBroker(_Z11restServiceP14ConnectionInfoP11RestService+0x126c)[0x5654bc]
/usr/bin/contextBroker[0x55cbb6]
/usr/bin/contextBroker[0x55f987]
/usr/lib64/libmicrohttpd.so.10(+0x5599)[0x7f0eee1cf599]
/usr/lib64/libmicrohttpd.so.10(MHD_connection_handle_idle+0x518)[0x7f0eee1d0078]
/usr/lib64/libmicrohttpd.so.10(+0xc3c8)[0x7f0eee1d63c8]
/lib64/libpthread.so.0(+0x7a51)[0x7f0eec957a51]
/lib64/libc.so.6(clone+0x6d)[0x7f0eece5d93d]
And in /var/log/contextBroker/contextBroker-service.out.old the following:
log directory: '/var/log/contextBroker'
*** glibc detected *** /usr/bin/contextBroker: free(): invalid next size (fast): 0x00007fe6d4262110 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x75f4e)[0x7fe6e8e9cf4e]
/lib64/libc.so.6(+0x78cf0)[0x7fe6e8e9fcf0]
/usr/bin/contextBroker(_ZN20ContextElementVector7releaseEv+0x2fa)[0x5f4a4a]
/usr/bin/contextBroker(_Z17postUpdateContextP14ConnectionInfoiRSt6vectorISsSaISsEEP9ParseDatab+0x1472)[0x4d6692]
/usr/bin/contextBroker(_Z11restServiceP14ConnectionInfoP11RestService+0x6d6)[0x564926]
/usr/bin/contextBroker[0x55cbb6]
/usr/bin/contextBroker[0x55f987]
/usr/lib64/libmicrohttpd.so.10(+0x5599)[0x7fe6ea281599]
/usr/lib64/libmicrohttpd.so.10(MHD_connection_handle_idle+0x518)[0x7fe6ea282078]
/usr/lib64/libmicrohttpd.so.10(+0xc3c8)[0x7fe6ea2883c8]
/lib64/libpthread.so.0(+0x7a51)[0x7fe6e8a09a51]
/lib64/libc.so.6(clone+0x6d)[0x7fe6e8f0f93d]
Data is sent to Orion in batches every 5 minutes:
a request with around 500 contextElements
a request with around 10 contextElements
a request with a single contextElement
Orion has only 2 subscriptions (AFAIK) which send the data to Proton-CEP.
The Orion version is:
[centos#orion ~]$ /usr/bin/contextBroker --version
0.25.0 (git version: a8cf800d4e9fdd7b4293a886490c40309a5bb58c)
Copyright 2013-2015 Telefonica Investigacion y Desarrollo, S.A.U
Is there anything we can do to debug the issue?
Taking into account the user's input, Orion seems to be running below the recommended CPU and RAM thresholds (see the recommendations). Thus, it would probably run better with more resources (e.g. 2 vCPUs and 4 GB RAM), especially if MongoDB runs on the same machine as Orion (MongoDB is known to be a memory-intensive process).
I am executing the C# code below:
// 'coll' is assumed to be a collection from the legacy (1.x) C# driver, e.g.
// obtained via server.GetDatabase("test").GetCollection<BsonDocument>("log").
int ctr = 0;
for (;;)
{
    Console.WriteLine("Doc# {0}", ctr++);

    // Build one small log document per iteration.
    BsonDocument log = new BsonDocument();
    log["type"] = "auth";
    BsonDateTime time = new BsonDateTime(DateTime.Now);
    log["when"] = time;
    log["user"] = "staticString";
    BsonBoolean bol = BsonBoolean.False;
    log["res"] = bol;

    coll.Insert(log);
}
When I run it against a MongoDB instance (version 2.0.2) on a virtual 64-bit Linux machine with just 512 MB of RAM, I get about 5k inserts with 1-2 faults, as reported by mongostat after a few minutes.
When the same code is run against a MongoDB instance (version 2.0.2) on a physical Windows machine with 8 GB of RAM, I get 2.5k inserts with about 80 faults, as reported by mongostat after a few minutes.
Why are more faults occurring on Windows? I can see the following message in the logs:
[DataFileSync] FlushViewOfFile failed 33 file
Journaling is disabled on both instances.
Also, are 5k inserts on a virtual machine with 1-2 faults a good enough speed, or should I be expecting better insert performance?
Looks like this is a known issue - https://jira.mongodb.org/browse/SERVER-1163
The page fault counter on Windows is in fact the total page fault count, which includes both hard and soft page faults:
Process : Page Faults/sec. This is an indication of the number of page faults that
occurred due to requests from this particular process. Excessive page faults from a
particular process are an indication usually of bad coding practices. Either the
functions and DLLs are not organized correctly, or the data set that the application
is using is being called in a less than efficient manner.
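As an aside, the same combined (hard + soft) counter can be inspected per process; here is a sketch using psutil, assuming it is installed (the num_page_faults field is Windows-specific):

import psutil

# On Windows, num_page_faults counts BOTH soft and hard page faults,
# which is why the number can look alarmingly high.
p = psutil.Process()          # the current process; any PID works
print(p.memory_info().num_page_faults)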
I've been banging my head on this one for a few days, and hope that somebody out there will have some insight.
I've written a streaming MapReduce job in Perl that is prone to having one or two reduce tasks take an extremely long time to execute. This is due to a natural asymmetry in the data: some of the reduce keys have over a million rows, while most have only a few dozen.
I've had problems with long tasks before, and I've been incrementing counters throughout to ensure that map reduce doesn't time them out. But now they are failing with an error message I hadn't seen before:
java.io.IOException: Task process exit with nonzero status of 137.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
This is not the standard timeout error message, but the error code 137 = 128+9 suggests that my reducer script received a kill -9 from Hadoop. The tasktracker log contains the following:
2011-09-05 19:18:31,269 WARN org.mortbay.log: Committed before 410 getMapOutput(attempt_201109051336_0003_m_000029_1,7) failed :
org.mortbay.jetty.EofException
at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:787)
at org.mortbay.jetty.AbstractGenerator$Output.blockForOutput(AbstractGenerator.java:548)
at org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:569)
at org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:946)
at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:646)
at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:577)
at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2940)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:324)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
Caused by: java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:72)
at sun.nio.ch.IOUtil.write(IOUtil.java:43)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
at org.mortbay.io.nio.ChannelEndPoint.flush(ChannelEndPoint.java:169)
at org.mortbay.io.nio.SelectChannelEndPoint.flush(SelectChannelEndPoint.java:221)
at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:721)
... 24 more
2011-09-05 19:18:31,289 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.92.8.202:50060, dest: 10.92.8.201:46436, bytes: 7340032, op: MAPRED_SHUFFLE, cliID: attempt_201109051336_0003_m_000029_1
2011-09-05 19:18:31,292 ERROR org.mortbay.log: /mapOutput
java.lang.IllegalStateException: Committed
at org.mortbay.jetty.Response.resetBuffer(Response.java:994)
at org.mortbay.jetty.Response.sendError(Response.java:240)
at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2963)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:324)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
I've been looking around the forums, and it sounds like the warnings are commonly found in jobs that run without error and can usually be ignored. The error makes it look like the reducer lost contact with the map output, but I can't figure out why. Does anyone have any ideas?
Here's a potentially relevant fact: these long tasks were making my job take days where it should take minutes. Since I can live without the output from one or two keys, I tried to implement a ten-minute timeout in my reducer as follows:
eval {
    local $SIG{ALRM} = sub {
        print STDERR "Processing of new merchant names in $prev_merchant_zip timed out...\n";
        print STDERR "reporter:counter:tags,failed_zips,1\n";
        die "timeout";
    };
    alarm 600;
    # Code that could take a long time to execute
    alarm 0;
};
die $@ if $@ && $@ !~ /timeout/;    # re-throw anything other than our timeout
This timeout code works like a charm when I test the script locally, but the strange mapreduce errors started after I introduced it. However, the failures seem to occur well after the first timeout, so I'm not sure if it's related.
Thanks in advance for any help!
Two possibilities come to mind:
RAM usage: if a task uses too much RAM, the OS can kill it (after horrible swapping, etc.).
Are you using any non-reentrant libraries? Maybe the timer is being triggered at an inopportune point in a library call.
Exit code 137 is a typical sign of the infamous OOM killer. You can easily check for it by using the dmesg command and looking for messages like this:
[2094250.428153] CPU: 23 PID: 28108 Comm: node Tainted: G C O 3.16.0-4-amd64 #1 Debian 3.16.7-ckt20-1+deb8u2
[2094250.428155] Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2 03/04/2015
[2094250.428156] ffff880773439400 ffffffff8150dacf ffff881328ea32f0 ffffffff8150b6e7
[2094250.428159] ffff881328ea3808 0000000100000000 ffff88202cb30080 ffff881328ea32f0
[2094250.428162] ffff88107fdf2f00 ffff88202cb30080 ffff88202cb30080 ffff881328ea32f0
[2094250.428164] Call Trace:
[2094250.428174] [<ffffffff8150dacf>] ? dump_stack+0x41/0x51
[2094250.428177] [<ffffffff8150b6e7>] ? dump_header+0x76/0x1e8
[2094250.428183] [<ffffffff8114044d>] ? find_lock_task_mm+0x3d/0x90
[2094250.428186] [<ffffffff8114088d>] ? oom_kill_process+0x21d/0x370
[2094250.428188] [<ffffffff8114044d>] ? find_lock_task_mm+0x3d/0x90
[2094250.428193] [<ffffffff811a053a>] ? mem_cgroup_oom_synchronize+0x52a/0x590
[2094250.428195] [<ffffffff8119fac0>] ? mem_cgroup_try_charge_mm+0xa0/0xa0
[2094250.428199] [<ffffffff81141040>] ? pagefault_out_of_memory+0x10/0x80
[2094250.428203] [<ffffffff81057505>] ? __do_page_fault+0x3c5/0x4f0
[2094250.428208] [<ffffffff8109d017>] ? put_prev_entity+0x57/0x350
[2094250.428211] [<ffffffff8109be86>] ? set_next_entity+0x56/0x70
[2094250.428214] [<ffffffff810a2c61>] ? pick_next_task_fair+0x6e1/0x820
[2094250.428219] [<ffffffff810115dc>] ? __switch_to+0x15c/0x570
[2094250.428222] [<ffffffff81515ce8>] ? page_fault+0x28/0x30
You can easily check whether the OOM killer is enabled (via the overcommit setting):
$ cat /proc/sys/vm/overcommit_memory
0
Basically, the OOM killer tries to kill the process that eats the largest share of memory. And you probably don't want to disable it:
The OOM killer can be completely disabled with the following command.
This is not recommended for production environments, because if an
out-of-memory condition does present itself, there could be unexpected
behavior depending on the available system resources and
configuration. This unexpected behavior could be anything from a
kernel panic to a hang depending on the resources available to the
kernel at the time of the OOM condition.
sysctl vm.overcommit_memory=2
echo "vm.overcommit_memory=2" >> /etc/sysctl.conf
The same situation can happen if you use, for example, cgroups to limit memory. When a process exceeds the given limit, it gets killed without warning.
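To see where 137 comes from, here is a small sketch (Python on Linux) that kills a child with signal 9 and inspects the status; a shell, and Hadoop's task runner, reports the same death as 128 + 9 = 137:

import signal
import subprocess
import sys

# Start a child that would sleep for a minute, then kill it with SIGKILL.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
child.send_signal(signal.SIGKILL)
child.wait()

# subprocess encodes "killed by signal N" as returncode -N;
# a shell reports it as 128 + N, i.e. 137 here.
print(child.returncode)    # -9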
I got this error too. I lost a day on it and found it was an infinite loop somewhere in the code.