Kernel crash - NULL pointer dereference when calling DEVICE_WRITE from KTHREAD in a USB device driver

I'm writing a simple USB driver to drive a stepper motor, based on the USB Skeleton 2.2 driver, on kernel 3.8. The basic version runs properly. As an enhancement, I introduced a KTHREAD to call DEVICE_WRITE (skel_write()), so that the driver stays available for other tasks and requests.
Call sequence: USER (request) -> DEVICE_IOCTL -> KTHREAD -> DEVICE_WRITE.
In this scenario, I call DEVICE_WRITE multiple times from the KTHREAD in a loop. The first few iterations work fine, but after some iterations the kernel crashes. If DEVICE_WRITE is called directly instead, it works fine. The log file shows this error:
Dec 30 01:15:14 mit kernel: [ 962.316843] device_write(efed1180,2,10),ioused : 1
Dec 30 01:15:14 mit kernel: [ 962.316900] data : 0, motor_cnt : 2, master_counter : 20
Dec 30 01:15:14 mit kernel: [ 962.366498] data : 1, motor_cnt : 2, master_counter : 21
Dec 30 01:15:14 mit kernel: [ 962.416116] Write over, going for sleep
Dec 30 01:15:14 mit kernel: [ 962.416125] file : efed1180,data : 2,i : 11
Dec 30 01:15:14 mit kernel: [ 962.416128] device_write(efed1180,2,10),ioused : 1
Dec 30 01:15:14 mit kernel: [ 962.416166] BUG: unable to handle kernel NULL pointer dereference at (null)
Dec 30 01:15:14 mit kernel: [ 962.416254] IP: [] skel_write+0xd7/0x360 [usbstep]
Dec 30 01:15:14 mit kernel: [ 962.416294] *pdpt = 0000000000000000* pde= f0002accf0002acc
Dec 30 01:15:14 mit kernel: [ 962.416332] Oops: 0000 [#1] SMP
Dec 30 01:15:14 mit kernel: [ 962.416363] Modules linked in: usbstep(OF) parport_pc(F) ppdev(F) bnep rfcomm bluetooth snd_hda_codec_hdmi uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev snd_hda_codec_idt coretemp snd_hda_intel kvm snd_hda_codec snd_hwdep(F) snd_pcm(F) snd_page_alloc(F) joydev(F) snd_seq_midi(F) snd_seq_midi_event(F) snd_rawmidi(F) hp_wmi lib80211_crypt_tkip snd_seq(F) snd_seq_device(F) snd_timer(F) sparse_keymap radeon wl(POF) lib80211 ttm drm_kms_helper cfg80211 drm hp_accel lis3lv02d mei input_polldev wmi i2c_algo_bit video(F) intel_ips mac_hid snd(F) lpc_ich soundcore(F) microcode(F) lp(F) parport(F) psmouse(F) serio_raw(F) r8169 ahci(F) libahci(F) [last unloaded: usbstep]
Dec 30 01:15:14 mit kernel: [ 962.416866] Pid: 2997, comm: mitesh Tainted: PF O 3.8.0-26-generic #38-Ubuntu Hewlett-Packard HP ProBook 4520s/1411
Dec 30 01:15:14 mit kernel: [ 962.416928] EIP: 0060:[] EFLAGS: 00010287 CPU: 2
Dec 30 01:15:14 mit kernel: [ 962.416960] EIP is at skel_write+0xd7/0x360 [usbstep]
Dec 30 01:15:14 mit kernel: [ 962.416989] EAX: f0665b84 EBX: 00000014 ECX: 000000d0 EDX: 00000014
Dec 30 01:15:14 mit kernel: [ 962.417024] ESI: f0665b40 EDI: 00000000 EBP: efddbf40 ESP: efddbf04
Dec 30 01:15:14 mit kernel: [ 962.417059] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Dec 30 01:15:14 mit kernel: [ 962.417089] CR0: 8005003b CR2: 00000000 CR3: 019d1000 CR4: 000007f0
Dec 30 01:15:14 mit kernel: [ 962.417124] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Dec 30 01:15:14 mit kernel: [ 962.417158] DR6: ffff0ff0 DR7: 00000400
Dec 30 01:15:14 mit kernel: [ 962.417181] Process mitesh (pid: 2997, ti=efdda000 task=f0bed9b0 task.ti=efdda000)
Dec 30 01:15:14 mit kernel: [ 962.417223] Stack:
Dec 30 01:15:14 mit kernel: [ 962.417236] f0665b84 efddbf58 efddbf58 0000000a 00000001 efddbf58 00000000 efddbf40
Dec 30 01:15:14 mit kernel: [ 962.417301] c1609d81 00000002 f06c5d40 00000014 0000000c efddbf58 f1487408 efddbf6c
Dec 30 01:15:14 mit kernel: [ 962.417398] f8585546 00000000 efed1180 efddbf58 0000000b f6eb0032 aa092dff f6eb7ebc
Dec 30 01:15:14 mit kernel: [ 962.417475] Call Trace:
Dec 30 01:15:14 mit kernel: [ 962.417504] [] ? printk+0x4d/0x4f
Dec 30 01:15:14 mit kernel: [ 962.417559] [] tele+0x86/0xc0 [usbstep]
Dec 30 01:15:14 mit kernel: [ 962.417618] [] ? skel_write+0x360/0x360 [usbstep]
Dec 30 01:15:14 mit kernel: [ 962.417691] [] kthread+0x94/0xa0
Dec 30 01:15:14 mit kernel: [ 962.417744] [] ? __hrtimer_start_range_ns+0x2e0/0x460
Dec 30 01:15:14 mit kernel: [ 962.417819] [] ret_from_kernel_thread+0x1b/0x28
Dec 30 01:15:14 mit kernel: [ 962.417886] [] ? kthread_create_on_node+0xc0/0xc0
Dec 30 01:15:14 mit kernel: [ 962.417951] Code: c0 89 c6 0f 84 83 01 00 00 83 c3 0a b8 00 0e 00 00 81 fb 00 0e 00 00 b9 d0 00 00 00 0f 46 c3 89 45 f0 8d 46 44 8b 55 f0 89 04 24 <8b> 07 e8 52 9f ee c8 85 c0 89 45 e4 0f 84 0f 01 00 00 8d 47 54
Dec 30 01:15:14 mit kernel: [ 962.418433] EIP: [] skel_write+0xd7/0x360 [usbstep] SS:ESP 0068:efddbf04
Dec 30 01:15:14 mit kernel: [ 962.418530] CR2: 0000000000000000
Dec 30 01:15:14 mit kernel: [ 962.433930] ---[ end trace 63245eeeb64414aa ]---
Here is the code. KTHREAD:
int tele(void *__tele_data) {
struct tele_data *tele_data = __tele_data;
int i=0;
char *dptr=NULL;
char numb[4];
sprintf(numb,"%d",tele_data->num);
dptr=numb;
for(i=0;i<30;i++) {
is_ioctl_used=1;
printk("file : %p,data : %s,i : %d\n", tele_data->file,dptr,i);
skel_write(tele_data->file,(char *)dptr, 10, 0);
printk("Write over, going for sleep\n");
}
return 0;
}
DEVICE_WRITE:
static ssize_t skel_write(struct file *file, const char *user_buffer,
size_t count, loff_t *ppos)
{
struct usb_skel *dev;
int retval = 0,i = 0,motor_count,dir=0;
struct urb *urb = NULL;
char *buf = NULL;
char *buf1 = NULL;
size_t writesize = min(count+10, (size_t)MAX_TRANSFER);
printk(KERN_INFO "device_write(%p,%s,%d),ioused : %d\n", file, user_buffer, count,is_ioctl_used);
dev = file->private_data;
// verify that we actually have some data to write
if (count == 0)
goto exit;
/*
* limit the number of URBs in flight to stop a user from using up all
* RAM
*/
if (!(file->f_flags & O_NONBLOCK)) {
if (down_interruptible(&dev->limit_sem)) {
retval = -ERESTARTSYS;
goto exit;
}
} else {
if (down_trylock(&dev->limit_sem)) {
retval = -EAGAIN;
goto exit;
}
}
spin_lock_irq(&dev->err_lock);
retval = dev->errors;
if (retval < 0) {
// any error is reported once
dev->errors = 0;
// to preserve notifications about reset
retval = (retval == -EPIPE) ? retval : -EIO;
}
spin_unlock_irq(&dev->err_lock);
if (retval < 0)
goto error;
/* create a urb, and a buffer for it, and copy the data to the urb */
buf1=(char *)kmalloc(sizeof(char)*20,GFP_KERNEL); //Allocate 2nd buffer.
if(is_ioctl_used) { //Whether the write function is called from IOCTL or Directly (echo > /dev/stepper)
sprintf(buf1,user_buffer);
} else {
if (copy_from_user(buf1, user_buffer,count)) {
retval = -EFAULT;
goto error;
}
}
motor_count=simple_strtol(buf1,NULL,10);
if(motor_count<0) { //Rotation counts of stepper motor.
motor_count=motor_count * -1; //If motor_count<0 then rotate in anti-clock direction.
dir=1;
}
urb = usb_alloc_urb(0, GFP_KERNEL);
if (!urb) {
retval = -ENOMEM;
goto error;
}
buf = usb_alloc_coherent(dev->udev, writesize, GFP_KERNEL,
&urb->transfer_dma);
if (!buf) {
retval = -ENOMEM;
goto error;
}
/* this lock makes sure we don't submit URBs to gone devices */
mutex_lock(&dev->io_mutex);
if (!dev->interface) { /* disconnect() was called */
mutex_unlock(&dev->io_mutex);
retval = -ENODEV;
goto error;
}
/* initialize the urb properly */
usb_fill_int_urb(urb, dev->udev,
usb_sndintpipe(dev->udev, dev->bulk_out_endpointAddr),
buf, writesize, skel_write_bulk_callback, dev,dev->bInterval);
urb->transfer_flags |= URB_NO_TRANSFER_DMA_MAP;
usb_anchor_urb(urb, &dev->submitted);
for(i=0;i<motor_count;i++) { //Loop to rotate motor based on counts.
printk("data : %d, motor_cnt : %d, master_counter : %d\n",ptr->data,motor_count,master_counter);
if(dir==0) ptr=ptr->next;
else ptr=ptr->prev;
// Fill the buffers.
buf[0]=0x01;
buf[1]=0;
buf[2]=ptr->data;
/* send the data out the bulk port */
retval = usb_submit_urb(urb, GFP_KERNEL);
if (retval) {
dev_err(&dev->interface->dev,
"%s - failed submitting write urb, error %d\n",
__func__, retval);
mutex_unlock(&dev->io_mutex);
goto error_unanchor;
}
if(++master_counter && master_counter > 47) master_counter=0;
/*
* release our reference to this urb, the USB core will eventually free
* it entirely
*/
mdelay(50); //Delay is required to match with motor speed.
}
mutex_unlock(&dev->io_mutex);
usb_free_coherent(dev->udev, writesize, buf, urb->transfer_dma);
kfree(buf1);
usb_free_urb(urb);
is_ioctl_used=0;
return writesize;
error_unanchor:
usb_unanchor_urb(urb);
error:
if (urb) {
usb_free_coherent(dev->udev, writesize, buf, urb->transfer_dma);
usb_free_urb(urb);
}
up(&dev->limit_sem);
exit:
return retval;
}
I'm new to kernel programming and might be missing something.

I don't know if this is the root cause of your problem, but it seems like you have a number of issues in your tele() function:
int tele(void *__tele_data) {
struct tele_data *tele_data = __tele_data;
int i=0;
char *dptr=NULL;
char numb[4];
sprintf(numb,"%d",tele_data->num);
Here, you sprintf() the number into the numb buffer. What is the range of tele_data->num? Would it ever take more than 4 characters (including the terminating NUL character)? Also, you're not recording how many bytes were printed in the buffer. Seems like you'd want to know that for use below...
dptr=numb;
Okay, so now dptr points to numb. Which means it points to a character buffer that has a maximum of 4 bytes, but...
for(i=0;i<30;i++) {
is_ioctl_used=1;
printk("file : %p,data : %s,i : %d\n", tele_data->file,dptr,i);
skel_write(tele_data->file,(char *)dptr, 10, 0);
In the skel_write() line above, you're requesting 10 bytes to be written. That's 6 more than is available. So you could be smashing the stack here.
I'm not convinced it's your only issue, but it does appear to be a problem.
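To make the size mismatch described above concrete, here is a minimal userspace sketch (plain C, not kernel code and not the asker's driver). It formats the number with snprintf() into a buffer sized for any 32-bit int and returns the length actually written, so the caller can pass that length instead of a hard-coded 10. The helper name format_count() and the constant NUM_BUF_LEN are assumptions made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical size: enough for any 32-bit int in decimal, including
 * the sign and the terminating NUL ("-2147483648" is 11 characters). */
#define NUM_BUF_LEN 12

/*
 * Format num into buf (capacity cap bytes) and return the number of
 * characters written, excluding the NUL. Returns -1 if the text would
 * not fit; the caller must not use the buffer in that case.
 */
static int format_count(char *buf, size_t cap, int num)
{
    int n;

    assert(buf != NULL && cap > 0);
    n = snprintf(buf, cap, "%d", num);
    if (n < 0 || (size_t)n >= cap)
        return -1;              /* failed, or output was truncated */
    return n;
}
```

A caller would declare `char numb[NUM_BUF_LEN];`, call `format_count(numb, sizeof(numb), num)`, and pass the returned length as the byte count. With the original 4-byte buffer, any value of 1000 or more (or -100 or less) no longer fits. The kernel provides snprintf() and scnprintf() as well, so the same bounded-formatting idea carries over to driver code.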
Just a couple of other minor things to point out. You don't need the cast on dptr in the skel_write() line... it's already a char *. Be wary of casting as it can hide an unintentional mismatch of types if the type of the variable changes. Also, the indentation in your code is all over the place. I realize you're just learning, but get in the habit of good practices here. It's really hard to read through your skel_write() implementation. There are likely a few other issues there, and something as simple as correct indentation can help readers understand the flow, and potentially see the issue.
Finally, don't give up. Kernel programming is hard: there are lots of moving parts, concurrency, locking, caching, and a very asynchronous style of programming. OTOH, you're down close to the bare metal of your processor and system, and it's quite rewarding.

Mit,
You may need to scrutinize the kthread's behaviour with respect to the operations you are performing inside it.
See: In what context Kernel Thread runs in Linux?


Socket error on client <clientid>, disconnecting with Paho MQTT-SN Gateway and ESP8266 Client

I'm trying to test MQTT-SN.
I'm using Mosquitto Broker, Paho MQTT-SN Gateway and this library (https://github.com/S3ler/arduino-mqtt-sn-client) for the clients.
I'm using an esp8266 as a client.
With this client, I can connect, subscribe, and receive from subscribed topics, but I can't publish to topics:
memset(buffer, 0x0, buffer_length);
mqttSnClient.publish(buffer, publishTopicName , qos);
Every time I try to publish with this client, Mosquitto gives me
Socket error on client <clientid>, disconnecting
And my client disconnects from the Broker.
Any clues?
EDIT1
Client Code
#include <ESP8266WiFi.h>
#include <WiFiUdp.h>
#include "WiFiUdpSocket.h"
#include "MqttSnClient.h"
#include <NTPClient.h>
const char* ssid = "example";
const char* password = "example1";
long utcOffsetInSeconds = -10800;
// Define NTP Client to get time
WiFiUDP ntpUDP;
NTPClient timeClient(ntpUDP, "pool.ntp.org", utcOffsetInSeconds);
#define buffer_length 10
char buffer[buffer_length + 1];
uint16_t buffer_pos = 0;
IPAddress gatewayIPAddress(192, 168, 0, 106);
uint16_t localUdpPort = 10000;
WiFiUDP udp;
WiFiUdpSocket wiFiUdpSocket(udp, localUdpPort);
MqttSnClient<WiFiUdpSocket> mqttSnClient(wiFiUdpSocket);
const char* clientId = "hamilton12";
char* subscribeTopicName = "ESP8266/123";
char* publishTopicName = "ESP8266/123";
int8_t qos = 1;
void mqttsn_callback(char *topic, uint8_t *payload, uint16_t length, bool retain) {
timeClient.update();
Serial.print("Received - Topic: ");
Serial.print(topic);
Serial.print(" Payload: ");
for (uint16_t i = 0; i < length; i++) {
char c = (char) * (payload + i);
Serial.print(c);
}
Serial.print(" Length: ");
Serial.print(length);
Serial.print(" Received Timestamp milliseconds: ");
Serial.print(timeClient.getHours());
Serial.print(":");
Serial.print(timeClient.getMinutes());
Serial.print(":");
Serial.println(timeClient.getSeconds());
}
void setup() {
Serial.begin(115200);
delay(10);
Serial.println();
Serial.print("Connecting to ");
Serial.println(ssid);
/* Explicitly set the ESP8266 to be a WiFi-client, otherwise, it by default,
would try to act as both a client and an access-point and could cause
network-issues with your other WiFi-devices on your WiFi-network. */
WiFi.mode(WIFI_STA);
WiFi.begin(ssid, password);
while (WiFi.status() != WL_CONNECTED) {
delay(500);
Serial.print(".");
}
Serial.println("");
Serial.println("WiFi connected");
Serial.println("IP address: ");
Serial.println(WiFi.localIP());
Serial.print("Starting MqttSnClient - ");
mqttSnClient.setCallback(mqttsn_callback);
if (!mqttSnClient.begin()) {
Serial.print("Could not initialize MQTT-SN Client ");
while (true) {
Serial.println(".");
delay(1000);
}
}
Serial.println(" ready!");
}
void convertIPAddressAndPortToDeviceAddress(IPAddress& source, uint16_t port, device_address& target) {
// IPAdress 0 - 3 bytes
target.bytes[0] = source[0];
target.bytes[1] = source[1];
target.bytes[2] = source[2];
target.bytes[3] = source[3];
// Port 4 - 5 bytes
target.bytes[4] = port >> 8;
target.bytes[5] = (uint8_t) port ;
}
void loop() {
if (!mqttSnClient.is_mqttsn_connected()) {
#if defined(gatewayHostAddress)
IPAddress gatewayIPAddress;
if (!WiFi.hostByName(gatewayHostAddress, gatewayIPAddress, 20000)) {
Serial.println("Could not lookup MQTT-SN Gateway.");
return;
}
#endif
device_address gateway_device_address;
convertIPAddressAndPortToDeviceAddress(gatewayIPAddress, localUdpPort, gateway_device_address);
Serial.print("MQTT-SN Gateway device_address: ");
printDeviceAddress(&gateway_device_address);
if (!mqttSnClient.connect(&gateway_device_address, clientId, 180) ) {
Serial.println("Could not connect MQTT-SN Client.");
delay(1000);
return;
}
Serial.println("MQTT-SN Client connected.");
//mqttSnClient.set_mqttsn_connected();
if (!mqttSnClient.subscribe(subscribeTopicName, qos)){
Serial.println("Cant subscribe");
}
Serial.println("Subscribed");
}
//It never enters this IF
if (Serial.available() > 0) {
buffer[buffer_pos++] = Serial.read();
if (buffer[buffer_pos - 1] == '\n') {
// only qos -1, 0, 1 are supported
if (!mqttSnClient.publish(buffer, publishTopicName , qos)) {
Serial.println("Could not publish");
}
Serial.println("Published");
memset(buffer, 0x0, buffer_length);
buffer_pos = 0;
}
}
//Uncommenting this line will give socket error
//mqttSnClient.publish(buffer, publishTopicName , qos);
mqttSnClient.loop();
}
etc/mosquitto/mosquitto.conf
pid_file /var/run/mosquitto.pid
persistence true
persistence_location /var/lib/mosquitto/
log_dest file /var/log/mosquitto/mosquitto.log
#include_dir /etc/mosquitto/conf.d
connection_messages true
log_timestamp true
log_dest stderr
log_type error
log_type warning
log_type debug
allow_anonymous true
gateway.conf
BrokerName=192.168.0.106
BrokerPortNo=1883
BrokerSecurePortNo=8883
#
# When AggregatingGateway=YES or ClientAuthentication=YES,
# All clients must be specified by the ClientList File
#
ClientAuthentication=NO
AggregatingGateway=NO
QoS-1=NO
Forwarder=NO
#ClientsList=/path/to/your_clients.conf
PredefinedTopic=NO
#PredefinedTopicList=/path/to/your_predefinedTopic.conf
#RootCAfile=/etc/ssl/certs/ca-certificates.crt
#RootCApath=/etc/ssl/certs/
#CertsFile=/path/to/certKey.pem
#PrivateKey=/path/to/privateKey.pem
GatewayID=1
GatewayName=PahoGateway-01
KeepAlive=900
#LoginID=your_ID
#Password=your_Password
# UDP
GatewayPortNo=10000
MulticastIP=225.1.1.1
MulticastPortNo=1884
# UDP6
GatewayUDP6Bind=FFFF:FFFE::1
GatewayUDP6Port=10000
GatewayUDP6Broadcast=FF02::1
GatewayUDP6If=wpan0
# XBee
Baudrate=38400
SerialDevice=/dev/ttyUSB0
ApiMode=2
# LOG
ShearedMemory=NO;
EDIT2
Terminal running mosquitto
hamilton@hamilton-note:~$ mosquitto
1574806892: mosquitto version 1.4.15 (build date Tue, 18 Jun 2019 11:42:22 -0300) starting
1574806892: Using default config.
1574806892: Opening ipv4 listen socket on port 1883.
1574806892: Opening ipv6 listen socket on port 1883.
1574806900: New connection from 192.168.0.106 on port 1883.
1574806900: New client connected from 192.168.0.106 as hamilton123 (c1, k46080).
1574806900: Socket error on client hamilton123, disconnecting.
^C1574806911: mosquitto version 1.4.15 terminating
Terminal running Paho Gateway
hamilton@hamilton-note:~/Downloads$ ./MQTT-SNGateway
***************************************************************************
* MQTT-SN Transparent Gateway
* Part of Project Paho in Eclipse
* (http://git.eclipse.org/c/paho/org.eclipse.paho.mqtt-sn.embedded-c.git/)
*
* Author : Tomoaki YAMAGUCHI
* Version: 1.3.1
***************************************************************************
20191126 192134.372 PahoGateway-01 has been started.
ConfigFile: ./gateway.conf
PreDefFile: ./predefinedTopic.conf
SensorN/W: UDP Multicast 225.1.1.1:1884 Gateway Port 10000
Broker: 192.168.0.106 : 1883, 8883
RootCApath: (null)
RootCAfile: (null)
CertKey: (null)
PrivateKey: (null)
20191126 192140.660 CONNECT <--- hamilton123 12 04 04 01 B4 00 68 61 6D 69 6C 74 6F 6E 31 32 33 00
20191126 192140.660 CONNECT ===> hamilton123 10 17 00 04 4D 51 54 54 04 02 B4 00 00 0B 68 61 6D 69 6C 74 6F 6E 31 32 33
20191126 192140.874 CONNACK <=== hamilton123 20 02 00 00
20191126 192140.874 CONNACK ---> hamilton123 03 05 00
20191126 192140.879 SUBSCRIBE 0200 <--- hamilton123 11 12 20 02 00 45 53 50 38 32 36 36 2F 31 32 33 00
20191126 192140.879 SUBSCRIBE 0200 ===> hamilton123 82 10 02 00 00 0B 45 53 50 38 32 36 36 2F 31 32 33 01
20191126 192140.879 SUBACK 0200 <=== hamilton123 90 03 02 00 01
20191126 192140.879 SUBACK 0200 ---> hamilton123 08 13 20 00 01 02 00 00
20191126 192140.883 PUBLISH 0300 <--- hamilton123 08 0C 22 00 01 03 00 00
20191126 192140.884 PUBLISH 0300 ===> hamilton123 32 07 00 02 00 01 03 00 00
^C20191126 192149.215 BrokerSendTask stopped.
20191126 192149.215 PacketHandleTask stopped.
20191126 192149.215 ClientSendTask stopped.
20191126 192149.386 BrokerRecvTask stopped.
20191126 192150.158 ClientRecvTask stopped.
20191126 192150.215 MQTT-SN Gateway stoped
Thank you for the help Dalton Cézane.
But I found the problem in an open issue in the client's library:
Having trouble with your example WiFiUdpMqttSnClient program in that
it does not successfully publish the test messages. I'm using
paho-mqtt-sn gateway.
I'm bashing around in the dark a bit but I think this is because it
publishes the messages with the flag TopicIdType set to 2. I think it
should be zero (normal) because it's not pre-registered nor is it a
short topic.
In file MqttSnClient.h line 216 the call to send_publish has
short_topic set to true. But that's not all; in file mqttsn_messages.h
around line 215 if short_topic flag is false it sets the flag to
predefined. I've removed the latter 'else' clause so the flag is set
to zero and I can now publish successfully.
I suspect my hack is not a complete solution but I hope it helps you
resolve this issue.
This comment was made by @nottledim, big thanks!
Now I can publish without a problem using my esp8266.
Just leaving this here in case anyone else has the same problem.
Link to the issue: https://github.com/S3ler/arduino-mqtt-sn-client/issues/3
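For anyone who wants to check the flag in their own gateway log, here is a small sketch in plain C (not the library's code) that decodes the flags byte of an MQTT-SN PUBLISH frame. It assumes the MQTT-SN v1.2 layout, where TopicIdType occupies bits 1..0 and QoS occupies bits 6..5 of the flags byte, and a frame with a one-byte length field. All function and enum names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* TopicIdType values (bits 1..0 of the flags byte), per MQTT-SN v1.2. */
enum topic_id_type {
    TOPIC_NORMAL     = 0x00, /* normal (registered) topic id */
    TOPIC_PREDEFINED = 0x01, /* pre-defined topic id */
    TOPIC_SHORT_NAME = 0x02  /* two-character short topic name */
};

/* Extract the TopicIdType field from a flags byte. */
static enum topic_id_type flags_topic_id_type(uint8_t flags)
{
    return (enum topic_id_type)(flags & 0x03);
}

/* Extract the raw 2-bit QoS field (bits 6..5); 0b11 encodes QoS -1. */
static unsigned flags_qos_bits(uint8_t flags)
{
    return (flags >> 5) & 0x03u;
}

/* Return the flags byte of a PUBLISH frame with a 1-byte length field.
 * Layout: [0] = length, [1] = message type (0x0C = PUBLISH), [2] = flags. */
static uint8_t publish_flags(const uint8_t *msg, size_t len)
{
    assert(msg != NULL && len >= 3 && msg[1] == 0x0C);
    return msg[2];
}

/* Rebuild a flags byte with TopicIdType replaced and all other bits kept. */
static uint8_t flags_with_topic_id_type(uint8_t flags, enum topic_id_type t)
{
    return (uint8_t)((flags & ~0x03u) | (uint8_t)t);
}
```

Applied to the PUBLISH frame in the gateway log above (08 0C 22 00 01 03 00 00), the flags byte is 0x22, which decodes to QoS 1 with TopicIdType 2 (short topic name). That is consistent with the diagnosis quoted from the issue; with TopicIdType set to 0 the same flags byte would be 0x20.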

MMC crash in Windows 10

On my Windows 10 machine, mmc.exe crashes when I try to add the certificate snap-in. When I click "OK" after adding the snap-in (computer account, local computer), the message "Microsoft Management Console has stopped working" appears and I am offered a debug option.
There are no further error messages.
I have tried to run "sfc /scannow" and found nothing to repair.
MMC is crashing because of SqlManager.dll from SQL Server 2014 RTM (2014.0120.2000.08 ((SQL14_RTM).140220-1924)):
*******************************************************************************
* *
* Exception Analysis *
* *
*******************************************************************************
DUMP_CLASS: 2
DUMP_QUALIFIER: 400
CONTEXT: (.ecxr)
rax=0000000000000000 rbx=0000000072f3df90 rcx=000000000000000a
rdx=0000000072f3df90 rsi=0000000000000000 rdi=0000000080000010
rip=00007ffcb524a030 rsp=000000000f13ec18 rbp=000000001339d408
r8=000000000f13eb78 r9=000000001339d408 r10=0000000000000000
r11=000000000f13ebe0 r12=0000000000220a5e r13=0000000000000090
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei pl zr na po nc
cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000246
ntdll!RtlFailFast2:
00007ffc`b524a030 cd29 int 29h
Resetting default scope
FAULTING_IP:
ntdll!RtlFailFast2+0
00007ffc`b524a030 cd29 int 29h
EXCEPTION_RECORD: (.exr -1)
ExceptionAddress: 00007ffcb524a030 (ntdll!RtlFailFast2)
ExceptionCode: c0000409 (Security check failure or stack buffer overrun)
ExceptionFlags: 00000001
NumberParameters: 1
Parameter[0]: 000000000000000a
Subcode: 0xa FAST_FAIL_GUARD_ICALL_CHECK_FAILURE
PROBLEM_CLASSES:
ID: [0n262]
Type: [FAIL_FAST]
Class: Primary
Scope: DEFAULT_BUCKET_ID (Failure Bucket ID prefix)
BUCKET_ID
Name: Add
Data: Omit
PID: [Unspecified]
TID: [Unspecified]
Frame: [0]
ID: [0n256]
Type: [GUARD_ICALL_CHECK_FAILURE]
Class: Addendum
Scope: DEFAULT_BUCKET_ID (Failure Bucket ID prefix)
BUCKET_ID
Name: Add
Data: Omit
PID: [Unspecified]
TID: [Unspecified]
Frame: [0]
ID: [0n92]
Type: [AVRF]
Class: Addendum
Scope: DEFAULT_BUCKET_ID (Failure Bucket ID prefix)
BUCKET_ID
Name: Add
Data: Omit
PID: [0x3be8]
TID: [0x3cf0]
Frame: [0] : ntdll!RtlFailFast2
BUGCHECK_STR: FAIL_FAST_GUARD_ICALL_CHECK_FAILURE_AVRF
DEFAULT_BUCKET_ID: FAIL_FAST_GUARD_ICALL_CHECK_FAILURE_AVRF
PRIMARY_PROBLEM_CLASS: FAIL_FAST
STACK_TEXT:
00 ntdll!RtlFailFast2
01 ntdll!RtlpHandleInvalidUserCallTarget
02 ntdll!LdrpHandleInvalidUserCallTarget
03 user32!UserCallWinProcCheckWow
04 user32!DispatchClientMessage
05 user32!_fnDWORD
06 ntdll!KiUserCallbackDispatcherContinue
07 win32u!NtUserDestroyWindow
08 SqlManager_72c70000!CEventRegWnd::~CEventRegWnd
09 SqlManager_72c70000!CRT_INIT
0a SqlManager_72c70000!CRT_INIT
0b verifier!AVrfpStandardDllEntryPointRoutine
0c ntdll!LdrpCallInitRoutine
0d ntdll!LdrpProcessDetachNode
0e ntdll!LdrpUnloadNode
0f ntdll!LdrpDecrementModuleLoadCountEx
10 ntdll!LdrUnloadDll
11 KERNELBASE!FreeLibrary
12 combase!CClassCache::CDllPathEntry::CFinishObject::Finish
13 combase!CClassCache::CFinishComposite::Finish
14 combase!CClassCache::CleanUpDllsForApartment
15 combase!CCCleanUpDllsForApartment
16 combase!FinishShutdown::__l2::<lambda_ac39365968346bea08de70a73a47183a>::operator()
17 combase!ObjectMethodExceptionHandlingAction<<lambda_ac39365968346bea08de70a73a47183a> >
18 combase!FinishShutdown
19 combase!ApartmentUninitialize
1a combase!wCoUninitialize
1b combase!CoUninitialize
1c verifier!AVrfpCoUninitialize
1d mmcndmgr!MMC21ADDREMOVEUI::CAboutInfoThread::ThreadProc
1e msvcrt!_callthreadstartex
1f msvcrt!_threadstartex
20 verifier!AVrfpStandardThreadFunction
21 kernel32!BaseThreadInitThunk
22 ntdll!RtlUserThreadStart
So update SQL Server to the latest Service Pack and hotfix rollup update. Then import the uninstall .reg file to disable dump creation and Application Verifier.

JVM Crash due to `EXCEPTION_ACCESS_VIOLATION` in org.infinispan.util.concurrent.jdk8backported.LongAdder

This is happening on JBoss EAP 6.1. With a single user it works fine, but as soon as 2 or 3 concurrent users start interacting with the application, the JVM crashes.
The JVM is crashing with EXCEPTION_ACCESS_VIOLATION; the full trace is below:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x024c8266, pid=5136, tid=8736
#
# JRE version: 6.0_45-b06
# Java VM: Java HotSpot(TM) Client VM (20.45-b01 mixed mode windows-x86 )
# Problematic frame:
# J org.infinispan.util.concurrent.jdk8backported.LongAdder.add(J)V
#
# If you would like to submit a bug report, please visit:
# http://java.sun.com/webapps/bugreport/crash.jsp
#
--------------- T H R E A D ---------------
Current thread (0x6602dc00): JavaThread "http-ds-7071bc90200f..corp.in/10.112.70.75:8080-6" daemon [_thread_in_Java, id=8736, stack(0x663a0000,0x663f0000)]
siginfo: ExceptionCode=0xc0000005, reading address 0x8ddb6987
Registers:
EAX=0x00000002, EBX=0xffffffff, ECX=0x00000001, EDX=0x0ddb6988
ESP=0x663ee3c0, EBP=0x663eea58, ESI=0x0ddb60a0, EDI=0xffffffff
EIP=0x024c8266, EFLAGS=0x00010293
Top of Stack: (sp=0x663ee3c0)
0x663ee3c0: 0ddb4830 00000000 663eea58 024d2b44
0x663ee3d0: 0ddb1610 0ddb1610 663eea58 024d7368
0x663ee3e0: 0ddb4830 00000000 0ddb1610 0ddb6be0
0x663ee3f0: 22239910 0ddb1610 2223aff0 22239910
0x663ee400: 0ddb60a0 2223c338 2127c660 24bb6ca8
0x663ee410: 24c1f6e0 0ddb6070 00000002 0ddb6988
0x663ee420: 4ac892b3 0ddb1610 663eea58 024d5bf4
0x663ee430: 00000000 00000000 663eea58 024c5a0c
Instructions: (pc=0x024c8266)
0x024c8246: 0c 00 00 00 b8 01 00 00 00 8b f7 e9 5b 00 00 00
0x024c8256: 8b bc 24 80 00 00 00 8b 9c 24 84 00 00 00 3b 02
0x024c8266: f2 0f 10 82 ff ff ff 7f 66 0f 7e c6 66 0f 73 d0
0x024c8276: 20 66 0f 7e c0 8b d6 8b c8 03 d7 13 cb 89 34 24
Register to memory mapping:
EAX=0x00000002 is an unknown value
EBX=0xffffffff is an unknown value
ECX=0x00000001 is an unknown value
EDX=
[error occurred during error reporting (printing register info), id 0xc0000005]
Stack: [0x663a0000,0x663f0000], sp=0x663ee3c0, free space=312k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
J org.infinispan.util.concurrent.jdk8backported.LongAdder.add(J)V
j org.infinispan.CacheImpl.put(Ljava/lang/Object;Ljava/lang/Object;JLjava/util/concurrent/TimeUnit;JLjava/util/concurrent/TimeUnit;Ljava/util/EnumSet;Ljava/lang/ClassLoader;)Ljava/lang/Object;+24
j org.infinispan.CacheImpl.put(Ljava/lang/Object;Ljava/lang/Object;JLjava/util/concurrent/TimeUnit;JLjava/util/concurrent/TimeUnit;)Ljava/lang/Object;+12
j org.infinispan.CacheSupport.put(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;+17
j org.infinispan.AbstractDelegatingCache.put(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;+6
j org.infinispan.spring.provider.SpringCache.put(Ljava/lang/Object;Ljava/lang/Object;)V+6
j com.hmtp.security.server.util.SessionTokenValidator.refreshAccessTime(Ljava/lang/String;)V+25
j com.hmtp.security.server.util.SessionTokenValidator.checkAndRenewToken([Ljava/lang/String;)Lorg/springframework/security/web/authentication/rememberme/PersistentRememberMeToken;+174
j com.hmtp.security.server.auth.MultiTenantRememberMeServices.processAutoLoginCookie([Ljava/lang/String;Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)Lorg/springframework/security/core/userdetails/UserDetails;+5
j org.springframework.security.web.authentication.rememberme.AbstractRememberMeServices.autoLogin(Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)Lorg/springframework/security/core/Authentication;+64
j com.hmtp.common.security.server.BrownstoneRememberMeAuthenticationFilter.doFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;Ljavax/servlet/FilterChain;)V+20
J org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;)V
J org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;)V
J org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;)V
J org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;)V
J org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;)V
J org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;)V
J org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;)V
j org.springframework.security.web.FilterChainProxy.doFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;Ljavax/servlet/FilterChain;)V+40
j org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(Ljavax/servlet/Filter;Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;Ljavax/servlet/FilterChain;)V+5
j org.springframework.web.filter.DelegatingFilterProxy.doFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;Ljavax/servlet/FilterChain;)V+71
J org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;)V
J org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;)V
j org.apache.catalina.core.StandardContextValve.invoke(Lorg/apache/catalina/connector/Request;Lorg/apache/catalina/connector/Response;)V+200
j org.jboss.as.web.session.ClusteredSessionValve.handleRequest(Lorg/apache/catalina/connector/Request;Lorg/apache/catalina/connector/Response;Lorg/jboss/servlet/http/HttpEvent;Z)V+61
j org.jboss.as.web.session.ClusteredSessionValve.invoke(Lorg/apache/catalina/connector/Request;Lorg/apache/catalina/connector/Response;)V+26
j org.jboss.as.web.session.JvmRouteValve.invoke(Lorg/apache/catalina/connector/Request;Lorg/apache/catalina/connector/Response;)V+33
j org.jboss.as.web.session.LockingValve.invoke(Lorg/apache/catalina/connector/Request;Lorg/apache/catalina/connector/Response;)V+43
j org.jboss.as.web.security.SecurityContextAssociationValve.invoke(Lorg/apache/catalina/connector/Request;Lorg/apache/catalina/connector/Response;)V+402
J org.apache.catalina.core.StandardHostValve.invoke(Lorg/apache/catalina/connector/Request;Lorg/apache/catalina/connector/Response;)V
j org.apache.catalina.core.StandardEngineValve.invoke(Lorg/apache/catalina/connector/Request;Lorg/apache/catalina/connector/Response;)V+42
j org.apache.catalina.connector.CoyoteAdapter.service(Lorg/apache/coyote/Request;Lorg/apache/coyote/Response;)V+188
j org.apache.coyote.http11.Http11Processor.process(Ljava/net/Socket;)Lorg/apache/tomcat/util/net/JIoEndpoint$Handler$SocketState;+349
j org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Ljava/net/Socket;)Lorg/apache/tomcat/util/net/JIoEndpoint$Handler$SocketState;+65
j org.apache.tomcat.util.net.JIoEndpoint$Worker.run()V+128
j java.lang.Thread.run()V+11
v ~StubRoutines::call_stub
V [jvm.dll+0xfb88b]
V [jvm.dll+0x18d551]
V [jvm.dll+0xfba31]
V [jvm.dll+0xfba8b]
V [jvm.dll+0xb5e89]
V [jvm.dll+0x119b74]
V [jvm.dll+0x14217c]
C [msvcr71.dll+0x9565] endthreadex+0xa0
C [kernel32.dll+0x4ee6c] BaseThreadInitThunk+0x12
C [ntdll.dll+0x6399b] RtlInitializeExceptionChain+0xef
C [ntdll.dll+0x6396e] RtlInitializeExceptionChain+0xc2
VM state:synchronizing (normal execution)
VM Mutex/Monitor currently owned by a thread: ([mutex/lock_event])
[0x00dd8360] Safepoint_lock - owner thread: 0x01a68c00
[0x00dd83c8] Threads_lock - owner thread: 0x01a68c00
Heap
def new generation total 400384K, used 175418K [0x03ae0000, 0x1ed50000, 0x1ed80000)
eden space 355904K, 47% used [0x03ae0000, 0x0df2c4f0, 0x19670000)
from space 44480K, 16% used [0x1c1e0000, 0x1c8e2540, 0x1ed50000)
to space 44480K, 0% used [0x19670000, 0x19670000, 0x1c1e0000)
tenured generation total 889536K, used 111605K [0x1ed80000, 0x55230000, 0x552e0000)
the space 889536K, 12% used [0x1ed80000, 0x25a7d7f0, 0x25a7d800, 0x55230000)
compacting perm gen total 102144K, used 102123K [0x552e0000, 0x5b6a0000, 0x652e0000)
the space 102144K, 99% used [0x552e0000, 0x5b69ad60, 0x5b69ae00, 0x5b6a0000)
No shared spaces configured.
Code Cache [0x01ae0000, 0x02cb8000, 0x03ae0000)
total_blobs=9369 nmethods=9146 adapters=163 free_code_cache=14876096 largest_free_block=384
VM Arguments:
jvm_args: -Dprogram.name=standalone.bat -Xms1303M -Xmx1303M -XX:MaxPermSize=256M -Djava.net.preferIPv4Stack=true -Djboss.modules.system.pkgs=org.jboss.byteman -agentlib:jdwp=transport=dt_socket,address=8787,server=y,suspend=n -Dorg.jboss.boot.log.file=C:\jboss-eap-6.1\standalone\log\server.log -Dlogging.configuration=file:C:\jboss-eap-6.1\standalone/configuration/logging.properties
java_command: C:\jboss-eap-6.1\jboss-modules.jar -mp C:\jboss-eap-6.1\modules -jaxpmodule javax.xml.jaxp-provider org.jboss.as.standalone -Djboss.home.dir=C:\jboss-eap-6.1
Launcher Type: SUN_STANDARD
Environment Variables:
JAVA_HOME=C:\Program Files\Java\jdk1.6.0_45
PATH=C:\ProgramData\Oracle\Java\javapath;C:\Program Files\Java\jdk1.6.0_45\bin;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\apache-maven-3.2.2\bin;C:\Program Files\TortoiseSVN\bin;D:\apache-ant-1.9.4\bin
USERNAME=neeraj.ar
OS=Windows_NT
PROCESSOR_IDENTIFIER=x86 Family 6 Model 23 Stepping 10, GenuineIntel
--------------- S Y S T E M ---------------
OS: Windows 7 Build 7601 Service Pack 1
CPU:total 2 (2 cores per cpu, 1 threads per core) family 6 model 23 stepping 10, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1
Memory: 4k page, physical 3338764k(929876k free), swap 15625020k(11901304k free)
vm_info: Java HotSpot(TM) Client VM (20.45-b01) for windows-x86 JRE (1.6.0_45-b06), built on Mar 26 2013 13:40:03 by "java_re" with MS VC++ 7.1 (VS2003)
time: Thu Jul 23 18:41:10 2015
elapsed time: 256 seconds
The crash happens in C1-compiled code on the instruction
f20f1082ffffff7f movsd xmm0, qword ptr [edx+0x7fffffff]
which represents a volatile long load that has not been patched.
I believe this is JVM bug JDK-6965570 or its duplicate JDK-7004258.
It was fixed long ago in JDK 6u60, but you seem to be using a much older JDK (1.6.0_45, according to the vm_info line).
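As a rough illustration, the version check can be automated by reading the `vm_info` line of the crash log. This is only a sketch: the helper names are made up for this example, and the 6u60 threshold is simply the one quoted in this answer, not something the code verifies.

```python
import re

# Assumption: threshold copied from the answer (fix reported in JDK 6u60).
FIXED_MAJOR, FIXED_UPDATE = 6, 60

def parse_jre_version(vm_info: str):
    """Return (major, update) from a HotSpot vm_info line, e.g. (6, 45) for 1.6.0_45."""
    match = re.search(r"JRE \(1\.(\d+)\.\d+_(\d+)", vm_info)
    if match is None:
        raise ValueError("no legacy 1.x.y_z JRE version found")
    return int(match.group(1)), int(match.group(2))

def predates_fix(vm_info: str) -> bool:
    """True if the JRE is a JDK 6 build older than the update named in the answer."""
    major, update = parse_jre_version(vm_info)
    return major == FIXED_MAJOR and update < FIXED_UPDATE

# vm_info line taken from the crash log in the question.
vm_info = ('vm_info: Java HotSpot(TM) Client VM (20.45-b01) for windows-x86 '
           'JRE (1.6.0_45-b06), built on Mar 26 2013 13:40:03 by "java_re" '
           'with MS VC++ 7.1 (VS2003)')
print(parse_jre_version(vm_info), predates_fix(vm_info))
```

For the log in the question this reports update 45, which is older than the update the answer names.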

MongoDB crashes on Map/Reduce

I have been using MongoDB as primary storage for more than 1.5 TB of data since last year. Everything was fine, but I recently ran a map-reduce job against a collection of 14,000,000 documents and my production instance went down.
Please take a look at the details:
My config:
Ubuntu 12.04.5 LTS, MongoDB 2.6.4, LVM (2 HDD, 1.5TB+ free of 3TB+ total), 24GB RAM (almost all free)
The Mongo config is the default (except for the logpath and dbpath parameters).
Mongo log:
2014-08-28T07:33:41.147+0400 [DataFileSync] flushing mmaps took 16177ms for 777 files
2014-08-28T07:33:44.004+0400 [conn13] M/R: (1/3) Emit Progress: 9920300
2014-08-28T07:33:47.178+0400 [conn13] M/R: (1/3) Emit Progress: 9928100
2014-08-28T07:33:50.004+0400 [conn13] M/R: (1/3) Emit Progress: 9967800
2014-08-28T07:33:53.115+0400 [conn13] M/R: (1/3) Emit Progress: 10007800
2014-08-28T07:33:56.009+0400 [conn13] M/R: (1/3) Emit Progress: 10048800
2014-08-28T07:33:59.050+0400 [conn13] M/R: (1/3) Emit Progress: 10091200
2014-08-28T07:34:02.530+0400 [conn13] M/R: (1/3) Emit Progress: 10102300
2014-08-28T07:34:05.510+0400 [conn13] M/R: (1/3) Emit Progress: 10102400
2014-08-28T07:34:08.932+0400 [conn13] SEVERE: Invalid access at address: 0x7cc8b2fe70b4
2014-08-28T07:34:08.983+0400 [conn13] SEVERE: Got signal: 7 (Bus error).
Backtrace:0x11e6111 0x11e54ee 0x11e55df 0x7f5a7031ecb0 0xf29cad 0xf32f28 0xf32770 0x8b601f 0x8b693a 0x982885 0x988485 0x9966d8 0x9a3355 0xa2889a 0xa29ce2 0xa2bea6 0xd5dd6d 0xb9fe62 0xba1440 0x770aef
mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x11e6111]
mongod() [0x11e54ee]
mongod() [0x11e55df]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0) [0x7f5a7031ecb0]
mongod(_ZN5mongo16NamespaceDetails5allocEPNS_10CollectionERKNS_10StringDataEi+0x1bd) [0xf29cad]
mongod(_ZN5mongo19SimpleRecordStoreV111allocRecordEii+0x68) [0xf32f28]
mongod(_ZN5mongo17RecordStoreV1Base12insertRecordEPKcii+0x60) [0xf32770]
mongod(_ZN5mongo10Collection15_insertDocumentERKNS_7BSONObjEbPKNS_16PregeneratedKeysE+0x7f) [0x8b601f]
mongod(_ZN5mongo10Collection14insertDocumentERKNS_7BSONObjEbPKNS_16PregeneratedKeysE+0x22a) [0x8b693a]
mongod(_ZN5mongo2mr5State12_insertToIncERNS_7BSONObjE+0x85) [0x982885]
mongod(_ZN5mongo2mr5State14reduceInMemoryEv+0x175) [0x988485]
mongod(_ZN5mongo2mr5State35reduceAndSpillInMemoryStateIfNeededEv+0x148) [0x9966d8]
mongod(_ZN5mongo2mr16MapReduceCommand3runERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0xcc5) [0x9a3355]
mongod(_ZN5mongo12_execCommandEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x3a) [0xa2889a]
mongod(_ZN5mongo7Command11execCommandEPS0_RNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0x1042) [0xa29ce2]
mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x6c6) [0xa2bea6]
mongod(_ZN5mongo11newRunQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0x22ed) [0xd5dd6d]
mongod() [0xb9fe62]
mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x580) [0xba1440]
mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x9f) [0x770aef]
After my first run of that map-reduce, I ran db.repairDatabase(), but the second attempt (after the repair) crashed in the same way. Now I have no idea how to get my map-reduce done.
Any ideas, folks?
Having investigated the issue, I came up with a couple of things:
As suggested in the comments, I took a look at MongoDB JIRA ticket SERVER-12849 and double-checked my logs.
/var/log/syslog says:
kernel: [1349503.760215] ata6.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x0
Aug 28 08:18:41 overlord kernel: [1349503.760253] ata6.00: irq_stat 0x40000008
Aug 28 08:18:41 overlord kernel: [1349503.760281] ata6.00: failed command: READ FPDMA QUEUED
Aug 28 08:18:41 overlord kernel: [1349503.760318] ata6.00: cmd 60/08:00:10:48:92/00:00:84:00:00/40 tag 0 ncq 4096 in
Aug 28 08:18:41 overlord kernel: [1349503.760318] res 41/40:08:10:48:92/00:00:84:00:00/00 Emask 0x409 (media error)
Aug 28 08:18:41 overlord kernel: [1349503.760411] ata6.00: status: { DRDY ERR }
Aug 28 08:18:41 overlord kernel: [1349503.760437] ata6.00: error: { UNC }
Aug 28 08:18:41 overlord kernel: [1349503.788325] ata6.00: configured for UDMA/133
Aug 28 08:18:41 overlord kernel: [1349503.788340] sd 5:0:0:0: [sdb] Unhandled sense code
Aug 28 08:18:41 overlord kernel: [1349503.788343] sd 5:0:0:0: [sdb]
Aug 28 08:18:41 overlord kernel: [1349503.788345] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Aug 28 08:18:41 overlord kernel: [1349503.788348] sd 5:0:0:0: [sdb]
Aug 28 08:18:41 overlord kernel: [1349503.788350] Sense Key : Medium Error [current] [descriptor]
Aug 28 08:18:41 overlord kernel: [1349503.788353] Descriptor sense data with sense descriptors (in hex):
Aug 28 08:18:41 overlord kernel: [1349503.788355] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Aug 28 08:18:41 overlord kernel: [1349503.788365] 84 92 48 10
Aug 28 08:18:41 overlord kernel: [1349503.788370] sd 5:0:0:0: [sdb]
Aug 28 08:18:41 overlord kernel: [1349503.788373] Add. Sense: Unrecovered read error - auto reallocate failed
Aug 28 08:18:41 overlord kernel: [1349503.788376] sd 5:0:0:0: [sdb] CDB:
Aug 28 08:18:41 overlord kernel: [1349503.788377] Read(10): 28 00 84 92 48 10 00 00 08 00
Aug 28 08:18:41 overlord kernel: [1349503.788387] end_request: I/O error, dev sdb, sector 2224179216
Aug 28 08:18:41 overlord kernel: [1349503.788434] ata6: EH complete
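The sector number in the `end_request` line can be cross-checked against the `Read(10)` command bytes logged just before it. A minimal sketch, assuming the standard READ(10) layout (opcode 0x28 in byte 0, big-endian LBA in bytes 2-5, big-endian transfer length in bytes 7-8); the function name is invented for this example:

```python
def decode_read10(cdb_hex: str):
    """Decode a SCSI READ(10) CDB given as space-separated hex bytes.

    Returns (lba, blocks): the starting logical block address and the
    number of blocks requested.
    """
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")   # bytes 7-8: transfer length
    return lba, blocks

# CDB bytes copied from the syslog excerpt above.
lba, blocks = decode_read10("28 00 84 92 48 10 00 00 08 00")
print(lba, blocks)  # 2224179216 8
```

The decoded LBA equals the sector reported in the I/O error, and 8 blocks of 512 bytes match the 4096-byte NCQ read that failed.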
It looks like /dev/sdb is the culprit, so let's check its SMART status (as suggested in the JIRA ticket):
SMART Error Log Version: 1
ATA Error Count: 135 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 135 occurred at disk power-on lifetime: 11930 hours (497 days + 2 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: WP at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 00 08 ff ff ff 4f 00 49d+12:01:35.512 WRITE FPDMA QUEUED
60 00 08 ff ff ff 4f 00 49d+12:01:33.380 READ FPDMA QUEUED
ea 00 00 00 00 00 a0 00 49d+12:01:33.294 FLUSH CACHE EXT
61 00 00 ff ff ff 4f 00 49d+12:01:33.292 WRITE FPDMA QUEUED
ea 00 00 00 00 00 a0 00 49d+12:01:33.153 FLUSH CACHE EXT
Error 134 occurred at disk power-on lifetime: 11930 hours (497 days + 2 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: WP at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 00 08 ff ff ff 4f 00 49d+11:17:00.189 WRITE FPDMA QUEUED
61 00 10 ff ff ff 4f 00 49d+11:17:00.189 WRITE FPDMA QUEUED
61 00 28 ff ff ff 4f 00 49d+11:17:00.188 WRITE FPDMA QUEUED
61 00 08 ff ff ff 4f 00 49d+11:17:00.188 WRITE FPDMA QUEUED
61 00 08 ff ff ff 4f 00 49d+11:17:00.188 WRITE FPDMA QUEUED
Error 133 occurred at disk power-on lifetime: 11930 hours (497 days + 2 hours)
When the command that caused the error occurred, the device was active or idle.
So, as we can see, there are errors on /dev/sdb. As a final check, I copied all the data to another host and ran the original map-reduce script there.
The result: success.
So MongoDB is fine in my case. It seems that "Bus error" entries in the MongoDB log are a signal that it is time to check your hardware.
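For anyone who hits the same Bus error, a quick first pass over the kernel log can show whether a disk is reporting read failures. The sketch below only matches the `end_request: I/O error` wording seen in my syslog; other kernel versions phrase this message differently, so treat the pattern as an assumption.

```python
import re
from collections import Counter

# Pattern matches the message format shown in the syslog excerpt above.
IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")

def count_io_errors(lines):
    """Count 'end_request: I/O error' kernel messages per block device."""
    counts = Counter()
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Sample lines copied from the syslog excerpt above.
sample = [
    "Aug 28 08:18:41 overlord kernel: [1349503.788377] Read(10): 28 00 84 92 48 10 00 00 08 00",
    "Aug 28 08:18:41 overlord kernel: [1349503.788387] end_request: I/O error, dev sdb, sector 2224179216",
    "Aug 28 08:18:41 overlord kernel: [1349503.788434] ata6: EH complete",
]
print(dict(count_io_errors(sample)))
```

On a real machine the same function can be fed an open file object, for example `count_io_errors(open("/var/log/syslog"))`.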

Sending a trap with Perl's Net::SNMP

I'm trying to send a trap as part of a larger Perl script. I've copied the trap-sending code to another file and am running it by itself. The code seems to think the trap is sent successfully, yet I'm not seeing the trap on either of the machines where I have a trap listener running.
Here's the code:
#! /usr/local/bin/perl
use strict;
use warnings;
use Net::SNMP;
#messy hardcoding
my $snmp_target = '192.168.129.50';
#my $snmp_target = '10.200.6.105'; # Server running trap listener
my $enterprise = '1.3.6.1.4.1.27002.1';
my ($sess, $err) = Net::SNMP->session(
    -hostname => $snmp_target,
    -version  => 1, # trap() requires v1
);
if (!defined $sess) {
    print "Error connecting to target " . $snmp_target . ": " . $err . "\n";
    exit 1; # standalone copy: there is no enclosing loop for next
}
my @vars = qw();
my $varcounter = 1;
push (@vars, $enterprise . '.' . $varcounter);
push (@vars, OCTET_STRING);
push (@vars, "Test string");
my $result = $sess->trap(
    -varbindlist  => \@vars,
    -enterprise   => $enterprise,
    -specifictrap => 1,
);
if (! $result)
{
print "An error occurred sending the trap: " . $sess->error();
}
EDIT: Added $sess->debug(255) call, here's the output:
debug: [440] Net::SNMP::Dispatcher::_event_insert(): created new head and tail [ARRAY(0x1af1fea8)]
debug: [687] Net::SNMP::Message::send(): transport address 192.168.129.50:161
debug: [2058] Net::SNMP::Message::_buffer_dump(): 70 bytes
[0000] 30 44 02 01 00 04 06 70 75 62 6C 69 63 A4 37 06 0D.....public.7.
[0016] 09 2B 06 01 04 01 81 D2 7A 01 40 04 C0 A8 81 85 .+......z.@.....
[0032] 02 01 06 02 01 01 43 01 00 30 1B 30 19 06 0A 2B ......C..0.0...+
[0048] 06 01 04 01 81 D2 7A 01 01 04 0B 54 65 73 74 20 ......z....Test
[0064] 73 74 72 69 6E 67 string
debug: [517] Net::SNMP::Dispatcher::_event_delete(): deleted [ARRAY(0x1af1fea8)], list is now empty
EDIT: Can anyone running a trap listener try this code on their machine and let me know if it works?
EDIT: Tried it from my MBP, with the same result. Then I noticed that the debug info says it is sending to port 161. I forced the -port => 162 parameter, and it works. That leaves me with a couple of questions:
Why does the trap sender default to 161?
I get this error when I run with debug on. What does it mean?
error: [97] Net::SNMP::Transport::IPv4::UDP::agent_addr(): Failed to disconnect: Address family not supported by protocol family
Fixed by changing the -port setting from the default 161 to 162, the standard SNMP trap port.
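The debug dump in the question can also be decoded by hand to confirm that the packet itself was a well-formed SNMPv1 trap, which leaves the destination port as the only problem. The following is a minimal sketch rather than a full BER parser: it handles only single-byte lengths (enough for this 70-byte packet), and all function names are invented for the example.

```python
# Hex bytes copied from the Net::SNMP debug dump in the question.
DUMP = """
30 44 02 01 00 04 06 70 75 62 6C 69 63 A4 37 06
09 2B 06 01 04 01 81 D2 7A 01 40 04 C0 A8 81 85
02 01 06 02 01 01 43 01 00 30 1B 30 19 06 0A 2B
06 01 04 01 81 D2 7A 01 01 04 0B 54 65 73 74 20
73 74 72 69 6E 67
"""

def read_tlv(buf, pos):
    """Read one BER element at pos; returns (tag, body, next_pos)."""
    tag, length = buf[pos], buf[pos + 1]
    if length & 0x80:
        raise ValueError("long-form length not supported in this sketch")
    start = pos + 2
    return tag, buf[start:start + length], start + length

def decode_oid(body):
    """Decode a BER OBJECT IDENTIFIER body into dotted notation."""
    arcs = [body[0] // 40, body[0] % 40]
    value = 0
    for byte in body[1:]:
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:          # high bit clear: last byte of this arc
            arcs.append(value)
            value = 0
    return ".".join(map(str, arcs))

def decode_v1_trap(packet):
    """Decode the header fields of an SNMPv1 trap message."""
    _, message, _ = read_tlv(packet, 0)         # outer SEQUENCE
    _, version, pos = read_tlv(message, 0)
    _, community, pos = read_tlv(message, pos)
    pdu_tag, pdu, _ = read_tlv(message, pos)    # 0xA4 is the v1 Trap-PDU
    _, enterprise, pos = read_tlv(pdu, 0)
    _, agent, pos = read_tlv(pdu, pos)
    _, generic, pos = read_tlv(pdu, pos)
    _, specific, pos = read_tlv(pdu, pos)
    return {
        "version": version[0],                  # 0 means SNMPv1
        "community": community.decode("ascii"),
        "pdu_tag": hex(pdu_tag),
        "enterprise": decode_oid(enterprise),
        "agent_addr": ".".join(map(str, agent)),
        "generic_trap": generic[0],
        "specific_trap": specific[0],
    }

trap = decode_v1_trap(bytes.fromhex("".join(DUMP.split())))
print(trap)
```

The decoded enterprise OID and specific trap number match the values set in the Perl script, so the message content was fine and only the port needed changing.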