MongoDB crashes on Map/Reduce - mongodb

I have been using MongoDB as my primary storage for 1.5 TB+ of data since last year. Everything was fine, but recently I decided to run a map-reduce job against a collection of 14,000,000 documents, and my production instance went down.
Here are the details.
My config:
Ubuntu 12.04.5 LTS, MongoDB 2.6.4, LVM (2 HDD, 1.5TB+ free of 3TB+ total), 24GB RAM (almost all free)
Mongo config is default (except logpath and dbpath parameters)
Mongo log:
2014-08-28T07:33:41.147+0400 [DataFileSync] flushing mmaps took 16177ms for 777 files
2014-08-28T07:33:44.004+0400 [conn13] M/R: (1/3) Emit Progress: 9920300
2014-08-28T07:33:47.178+0400 [conn13] M/R: (1/3) Emit Progress: 9928100
2014-08-28T07:33:50.004+0400 [conn13] M/R: (1/3) Emit Progress: 9967800
2014-08-28T07:33:53.115+0400 [conn13] M/R: (1/3) Emit Progress: 10007800
2014-08-28T07:33:56.009+0400 [conn13] M/R: (1/3) Emit Progress: 10048800
2014-08-28T07:33:59.050+0400 [conn13] M/R: (1/3) Emit Progress: 10091200
2014-08-28T07:34:02.530+0400 [conn13] M/R: (1/3) Emit Progress: 10102300
2014-08-28T07:34:05.510+0400 [conn13] M/R: (1/3) Emit Progress: 10102400
2014-08-28T07:34:08.932+0400 [conn13] SEVERE: Invalid access at address: 0x7cc8b2fe70b4
2014-08-28T07:34:08.983+0400 [conn13] SEVERE: Got signal: 7 (Bus error).
Backtrace:0x11e6111 0x11e54ee 0x11e55df 0x7f5a7031ecb0 0xf29cad 0xf32f28 0xf32770 0x8b601f 0x8b693a 0x982885 0x988485 0x9966d8 0x9a3355 0xa2889a 0xa29ce2 0xa2bea6 0xd5dd6d 0xb9fe62 0xba1440 0x770aef
mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x11e6111]
mongod() [0x11e54ee]
mongod() [0x11e55df]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0) [0x7f5a7031ecb0]
mongod(_ZN5mongo16NamespaceDetails5allocEPNS_10CollectionERKNS_10StringDataEi+0x1bd) [0xf29cad]
mongod(_ZN5mongo19SimpleRecordStoreV111allocRecordEii+0x68) [0xf32f28]
mongod(_ZN5mongo17RecordStoreV1Base12insertRecordEPKcii+0x60) [0xf32770]
mongod(_ZN5mongo10Collection15_insertDocumentERKNS_7BSONObjEbPKNS_16PregeneratedKeysE+0x7f) [0x8b601f]
mongod(_ZN5mongo10Collection14insertDocumentERKNS_7BSONObjEbPKNS_16PregeneratedKeysE+0x22a) [0x8b693a]
mongod(_ZN5mongo2mr5State12_insertToIncERNS_7BSONObjE+0x85) [0x982885]
mongod(_ZN5mongo2mr5State14reduceInMemoryEv+0x175) [0x988485]
mongod(_ZN5mongo2mr5State35reduceAndSpillInMemoryStateIfNeededEv+0x148) [0x9966d8]
mongod(_ZN5mongo2mr16MapReduceCommand3runERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0xcc5) [0x9a3355]
mongod(_ZN5mongo12_execCommandEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x3a) [0xa2889a]
mongod(_ZN5mongo7Command11execCommandEPS0_RNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0x1042) [0xa29ce2]
mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x6c6) [0xa2bea6]
mongod(_ZN5mongo11newRunQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0x22ed) [0xd5dd6d]
mongod() [0xb9fe62]
mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x580) [0xba1440]
mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x9f) [0x770aef]
After the first run of that map-reduce, I ran db.repairDatabase(), but the second attempt (after the repair) crashed in the same way. Now I have no idea how to get my map-reduce job done.
Any ideas, folks?

After investigating the issue, I came up with a couple of findings.
As suggested in the comments, I looked at MongoDB JIRA ticket SERVER-12849 and double-checked my logs.
/var/log/syslog says:
kernel: [1349503.760215] ata6.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x0
Aug 28 08:18:41 overlord kernel: [1349503.760253] ata6.00: irq_stat 0x40000008
Aug 28 08:18:41 overlord kernel: [1349503.760281] ata6.00: failed command: READ FPDMA QUEUED
Aug 28 08:18:41 overlord kernel: [1349503.760318] ata6.00: cmd 60/08:00:10:48:92/00:00:84:00:00/40 tag 0 ncq 4096 in
Aug 28 08:18:41 overlord kernel: [1349503.760318] res 41/40:08:10:48:92/00:00:84:00:00/00 Emask 0x409 (media error)
Aug 28 08:18:41 overlord kernel: [1349503.760411] ata6.00: status: { DRDY ERR }
Aug 28 08:18:41 overlord kernel: [1349503.760437] ata6.00: error: { UNC }
Aug 28 08:18:41 overlord kernel: [1349503.788325] ata6.00: configured for UDMA/133
Aug 28 08:18:41 overlord kernel: [1349503.788340] sd 5:0:0:0: [sdb] Unhandled sense code
Aug 28 08:18:41 overlord kernel: [1349503.788343] sd 5:0:0:0: [sdb]
Aug 28 08:18:41 overlord kernel: [1349503.788345] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Aug 28 08:18:41 overlord kernel: [1349503.788348] sd 5:0:0:0: [sdb]
Aug 28 08:18:41 overlord kernel: [1349503.788350] Sense Key : Medium Error [current] [descriptor]
Aug 28 08:18:41 overlord kernel: [1349503.788353] Descriptor sense data with sense descriptors (in hex):
Aug 28 08:18:41 overlord kernel: [1349503.788355] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Aug 28 08:18:41 overlord kernel: [1349503.788365] 84 92 48 10
Aug 28 08:18:41 overlord kernel: [1349503.788370] sd 5:0:0:0: [sdb]
Aug 28 08:18:41 overlord kernel: [1349503.788373] Add. Sense: Unrecovered read error - auto reallocate failed
Aug 28 08:18:41 overlord kernel: [1349503.788376] sd 5:0:0:0: [sdb] CDB:
Aug 28 08:18:41 overlord kernel: [1349503.788377] Read(10): 28 00 84 92 48 10 00 00 08 00
Aug 28 08:18:41 overlord kernel: [1349503.788387] end_request: I/O error, dev sdb, sector 2224179216
Aug 28 08:18:41 overlord kernel: [1349503.788434] ata6: EH complete
It looks like /dev/sdb is the culprit. Let's check the SMART status (as suggested in the JIRA ticket):
SMART Error Log Version: 1
ATA Error Count: 135 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 135 occurred at disk power-on lifetime: 11930 hours (497 days + 2 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: WP at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 00 08 ff ff ff 4f 00 49d+12:01:35.512 WRITE FPDMA QUEUED
60 00 08 ff ff ff 4f 00 49d+12:01:33.380 READ FPDMA QUEUED
ea 00 00 00 00 00 a0 00 49d+12:01:33.294 FLUSH CACHE EXT
61 00 00 ff ff ff 4f 00 49d+12:01:33.292 WRITE FPDMA QUEUED
ea 00 00 00 00 00 a0 00 49d+12:01:33.153 FLUSH CACHE EXT
Error 134 occurred at disk power-on lifetime: 11930 hours (497 days + 2 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: WP at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 00 08 ff ff ff 4f 00 49d+11:17:00.189 WRITE FPDMA QUEUED
61 00 10 ff ff ff 4f 00 49d+11:17:00.189 WRITE FPDMA QUEUED
61 00 28 ff ff ff 4f 00 49d+11:17:00.188 WRITE FPDMA QUEUED
61 00 08 ff ff ff 4f 00 49d+11:17:00.188 WRITE FPDMA QUEUED
61 00 08 ff ff ff 4f 00 49d+11:17:00.188 WRITE FPDMA QUEUED
Error 133 occurred at disk power-on lifetime: 11930 hours (497 days + 2 hours)
When the command that caused the error occurred, the device was active or idle.
So, as we can see, there are errors on /dev/sdb. As a final check, I copied all the data to another host and ran the original map-reduce script there.
It succeeded.
So MongoDB is fine in my case. It seems that "Bus error" entries in the MongoDB log are a signal that it is time to check your hardware.
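The syslog cross-check described above can be sketched in Python. This is a minimal sketch under assumptions: the function name `find_io_errors` and the regular expression are illustrative, matched only to the `end_request: I/O error, dev sdb, sector ...` line quoted earlier. Other kernel versions may word this message differently.

```python
import re

# Matches kernel lines such as:
#   end_request: I/O error, dev sdb, sector 2224179216
# The pattern is an assumption based on the syslog excerpt in this post.
IO_ERROR = re.compile(r"I/O error, dev (\w+), sector (\d+)")

def find_io_errors(lines):
    """Return a list of (device, sector) tuples found in the given log lines."""
    hits = []
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            hits.append((match.group(1), int(match.group(2))))
    return hits

sample = [
    "Aug 28 08:18:41 overlord kernel: [1349503.788373] Add. Sense: Unrecovered read error - auto reallocate failed",
    "Aug 28 08:18:41 overlord kernel: [1349503.788387] end_request: I/O error, dev sdb, sector 2224179216",
]
errors = find_io_errors(sample)
print(errors)
```

Sector 2224179216 is 0x84924810 in hex, the same value as the `84 92 48 10` bytes in the sense data and in the Read(10) CDB, so all three log lines point at the same location on the disk.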

Enabling 802.11w mode with hostapd

I'm trying to set up a Wi-Fi access point on a Raspberry Pi 3B+ with 802.11w enabled.
Kernel version: Linux efb-ap-0 4.19.66-Re4son-v7+ #1 SMP Sun Aug 18 22:25:39 AEST 2019 armv7l GNU/Linux
Driver: brcmfmac
hostapd (Deb package): 2:2.9-1 armel
During the 4-way handshake, wpa_supplicant immediately disconnects at message 3/4, with the following logs:
wlan0: WPA: IE in 3/4 msg does not match with IE in Beacon/ProbeResp (src=b8:27:eb:3b:3f:0e)
WPA: RSN IE in Beacon/ProbeResp - hexdump(len=28): 30 1a 01 00 00 0f ac 04 01 00 00 0f ac 04 01 00 00 0f ac 06 c0 00 00 00 00 0f ac 06
WPA: RSN IE in 3/4 msg - hexdump(len=26): 30 18 01 00 00 0f ac 04 01 00 00 0f ac 04 02 00 00 0f ac 02 00 0f ac 06 c0 00
Comparing the 3/4 msg hexdump and the Beacon hexdump in Wireshark shows that the Beacon contains the following additional fields that are not in the 3/4 msg: PMKID Count (0x00 00) + PMKID List + Group Management Cipher Suite (0x00 0f ac 06).
Why does the 3/4 msg not match the Beacon? Is this an issue in hostapd, in the driver, or in the communication between hostapd and the driver?
Thanks for any information about that.
You can find below the hostapd.conf content:
interface=wlan0
driver=nl80211
logger_syslog=-1
logger_syslog_level=2
auth_algs=1
wpa_pairwise=CCMP
rsn_pairwise=CCMP
wpa=2
hw_mode=g
ieee80211w=2
ssid=XXXXXXXXXX
channel=1
wpa_key_mgmt=WPA-PSK-SHA256
wpa_passphrase=XXXXXXXXXX
And the wpa_supplicant.conf used to connect:
ctrl_interface=DIR=/var/run/
network={
ssid="XXXXXXXX"
proto=RSN
scan_ssid=1
key_mgmt=WPA-PSK-SHA256
pairwise=CCMP
psk="XXXXXXXX"
ieee80211w=2
}
Note: this thread is a duplicate of a message I posted on the hostap mailing list, which received no answer: http://lists.infradead.org/pipermail/hostap/2019-November/040764.html
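The two RSN IE hexdumps from the wpa_supplicant log can also be decoded field by field without Wireshark. The following is a minimal Python sketch, assuming the standard RSN element layout (version, group cipher, pairwise list, AKM list, capabilities, then optional PMKID list and group management cipher). The function name `parse_rsn_ie` and the dictionary keys are illustrative, and there is no error handling for truncated elements.

```python
import struct

def parse_rsn_ie(hexdump):
    """Decode an RSN information element given as a space-separated hexdump."""
    data = bytes.fromhex(hexdump)
    elem_id, length = data[0], data[1]
    body = data[2:2 + length]
    pos = 0

    def take(n):
        nonlocal pos
        chunk = body[pos:pos + n]
        pos += n
        return chunk

    def u16():
        # All 16-bit fields in the element are little-endian.
        return struct.unpack("<H", take(2))[0]

    def suites(count):
        # Each suite selector is a 3-byte OUI plus a 1-byte type.
        return [take(4).hex() for _ in range(count)]

    out = {"element_id": elem_id, "length": length}
    out["version"] = u16()
    out["group_cipher"] = take(4).hex()
    out["pairwise"] = suites(u16())
    out["akm"] = suites(u16())
    out["capabilities"] = u16()
    # The remaining fields are optional and may be absent.
    if pos < len(body):
        out["pmkid_count"] = u16()
        out["pmkids"] = [take(16).hex() for _ in range(out["pmkid_count"])]
    if pos < len(body):
        out["group_mgmt_cipher"] = take(4).hex()
    return out

beacon = parse_rsn_ie("30 1a 01 00 00 0f ac 04 01 00 00 0f ac 04 01 00 "
                      "00 0f ac 06 c0 00 00 00 00 0f ac 06")
msg3 = parse_rsn_ie("30 18 01 00 00 0f ac 04 01 00 00 0f ac 04 02 00 "
                    "00 0f ac 02 00 0f ac 06 c0 00")
print(beacon)
print(msg3)
```

Decoded this way, the two elements differ in more than the trailing fields. The Beacon advertises a single AKM suite (00-0F-AC:6, PSK with SHA-256), while the 3/4 msg lists two (00-0F-AC:2, PSK, and 00-0F-AC:6). Both carry the same capabilities value 0x00c0, which has the "management frame protection required" and "capable" bits set.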

Unrecognized status byte in Midi file

I've been working with MIDI files for some time and I'm stuck on what looks like an unknown status byte. According to the Standard MIDI File format, there is no such thing. Can someone tell me what the 3 bytes "00 a040" mean? I know that "00" is the delta time, and 0xa0 should be a status byte, if I understood correctly. The last 3 bytes on line 18 are the only part I don't understand so far. After those 3 bytes come the text meta event bytes, starting with "00 ff01".
MIDI file, lines 18 to 19:
ff 51 03 09 cc 90 00 c0 00 00 b0 07 64 00 0a 40
00 ff 01 20 62 64 63 61 34 32 36 64 31 30 34 61
The SMF specification says:
Running status is used: status bytes of MIDI channel messages may be omitted if the preceding event is a MIDI channel message with the same status.
So these bytes can be decoded as follows:
ff 51 03 09 cc 90: meta event: set tempo, 9CC90h = 642192 µs per quarter note
00: delta time
c0 00: set program 0 (piano) on channel 0
00: delta time
b0 07 64: set controller 7 (volume) to value 100
00: delta time
0a 40: running status (repeat B0h); set controller 10 (pan) to value 64
00: delta time
ff 01 20 ...: meta event: text: "bdca426d104a..."
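The decoding above can be reproduced with a small running-status parser. This is a minimal Python sketch, not a full SMF reader: it assumes the excerpt starts at an event whose delta time has already been consumed (as the quoted bytes do), it does not handle sysex events, and the names `read_varlen`, `DATA_LEN` and `decode_events` are illustrative.

```python
def read_varlen(data, pos):
    """Read a MIDI variable-length quantity; return (value, new_pos)."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80 == 0:
            return value, pos

# Number of data bytes per channel message, keyed by the status high nibble.
DATA_LEN = {0x8: 2, 0x9: 2, 0xA: 2, 0xB: 2, 0xC: 1, 0xD: 1, 0xE: 2}

def decode_events(data):
    """Decode (delta, kind, status_or_type, payload) tuples from a track excerpt."""
    events = []
    pos = 0
    running = None  # last channel status byte seen
    delta = 0
    while pos < len(data):
        byte = data[pos]
        if byte == 0xFF:
            # Meta event: FF, type, variable-length size, payload.
            meta_type = data[pos + 1]
            length, pos = read_varlen(data, pos + 2)
            events.append((delta, "meta", meta_type, bytes(data[pos:pos + length])))
            pos += length
            running = None  # meta events cancel running status
        else:
            if byte & 0x80:
                # Explicit status byte: remember it for later events.
                running = byte
                pos += 1
            # Otherwise this is a data byte and the previous status is reused.
            n = DATA_LEN[running >> 4]
            events.append((delta, "channel", running, bytes(data[pos:pos + n])))
            pos += n
        if pos < len(data):
            delta, pos = read_varlen(data, pos)
    return events

excerpt = bytes.fromhex("ff 51 03 09 cc 90 00 c0 00 00 b0 07 64 00 0a 40")
events = decode_events(excerpt)
for event in events:
    print(event)
```

The last decoded event carries status B0h even though no status byte precedes `0a 40` in the file, which is exactly the running-status case. The excerpt stops before the text meta event because its payload is truncated in the quoted dump.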

MimeMultipart count is zero when an email is read using JavaMail

My application sends an email to an Exchange mail server. The mail server is configured with a third-party application that routes the email to an agent, and the agent replies to that email. The application reads the agent's reply from the mailbox that was used to send the email.
The email sending code is below:
Message mimeMessage = new MimeMessage(session);
mimeMessage.setFrom(new InternetAddress(from));
mimeMessage.addRecipient(Message.RecipientType.TO, new InternetAddress(to));
mimeMessage.setSubject(subject);
mimeMessage.setContent(emailText,"text/plain");
mimeMessage.setReplyTo(replyToAddress);
Transport.send(mimeMessage);
This works perfectly. When the agent's reply is received, the application reads it as follows:
if (message.isMimeType("multipart/MIXED")) {
    logger.info("Email MIME Type is: multipart/MIXED");
    MimeMultipart multipart = (MimeMultipart) message.getContent();
    logger.info("Content type = " + multipart.getContentType());
    int count = multipart.getCount();
}
The content type is "multipart/mixed" but the count is 0, which means there are no parts in this email.
I need to set the following system property:
System.setProperty("mail.mime.multipart.allowempty", "true");
If it is not set, multipart.getCount() throws "missingBoundryException".
Why is that?
I can see that the agent's reply is not empty.
The email was sent with content type text/plain, so why is the reply's type multipart/mixed?
Is this due to invalid formatting of the email by the third-party application, and what is the workaround?
Below is a screenshot of the agent's reply.
Below is the raw MIME content:
Received: from sociaminer.host (192.168.1.29) by thirdpartHost
(192.168.1.53) with Microsoft SMTP Server (TLS) id 14.1.218.12; Thu, 19 Jan
2017 17:06:26 +0500
To: hafiz <hafiz#bla.bla>
Message-ID: <hassan.MESSAGEID#bla.bla>
In-Reply-To: <CF72F94#bla.bla>
References: <CF72F945A#bla.bla>
Subject: Re: 1122+50
Content-Type: multipart/mixed;
boundary="----=_Part_127_14151461.1484827604583"
From: <reply#bla.bla>
Return-Path: reply#bla.bla
Date: Thu, 19 Jan 2017 17:06:26 +0500
X-MS-Exchange-Organization-AuthSource: bla.bla
X-MS-Exchange-Organization-AuthAs: Internal
X-MS-Exchange-Organization-AuthMechanism: 06
X-Originating-IP: [SocialMinerIP]
MIME-Version: 1.0
------=_Part_127_14151461.1484827604583
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: 7bit
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">Reply to 50<br>
<blockquote><hr>
<b>From:</b> hafiz <hafiz#bla.bla><br><b>Sent:</b> Thursday, January 19, 2017 5:05 PM<br><b>To:</b> testing2 <testing2#bla.bla><br><b>Subject:</b> 1122+50<br>
<html dir="ltr">
<head>
<style type="text/css" id="owaParaStyle"></style>
</head>
<body fpstyle="1" ocsi="0">
<div style="direction: ltr;font-family: Tahoma;color: #000000;font-size: 10pt;">Testing 50</div>
</body>
</html>
</blockquote>
------=_Part_127_14151461.1484827604583--
The JavaMail debug output is below:
DEBUG: setDebug: JavaMail version 1.4.7
DEBUG: getProvider() returning javax.mail.Provider[STORE,imap,com.sun.mail.imap.IMAPStore,Oracle]
DEBUG IMAP: mail.imap.fetchsize: 16384
DEBUG IMAP: mail.imap.ignorebodystructuresize: false
DEBUG IMAP: mail.imap.statuscachetimeout: 1000
DEBUG IMAP: mail.imap.appendbuffersize: -1
DEBUG IMAP: mail.imap.minidletime: 10
DEBUG IMAP: disable AUTH=PLAIN
DEBUG IMAP: enable STARTTLS
DEBUG IMAP: trying to connect to host "Echange IP", port 143, isSSL false
* OK The Microsoft Exchange IMAP4 service is ready.
A0 CAPABILITY
* CAPABILITY IMAP4 IMAP4rev1 LOGINDISABLED STARTTLS UIDPLUS CHILDREN IDLE NAMESPACE LITERAL+
A0 OK CAPABILITY completed.
DEBUG IMAP: protocolConnect login, host=192.168.1.53, user=hafiz#bla.bla, password=<non-null>
A1 STARTTLS
A1 OK Begin TLS negotiation now.
A2 CAPABILITY
* CAPABILITY IMAP4 IMAP4rev1 AUTH=NTLM AUTH=GSSAPI AUTH=PLAIN UIDPLUS CHILDREN IDLE NAMESPACE LITERAL+
A2 OK CAPABILITY completed.
DEBUG IMAP: AUTH: NTLM
DEBUG IMAP: AUTH: GSSAPI
DEBUG IMAP: AUTH: PLAIN
DEBUG IMAP: AUTHENTICATE NTLM command trace suppressed
DEBUG NTLM: type 1 message: 4E 54 4C 4D 53 53 50 00 01 00 00 00 03 A2 00 00 00 00 00 00 23 00 00 00 03 00 03 00 20 00 00 00 31 39 32
DEBUG NTLM: type 3 message: 4E 54 4C 4D 53 53 50 00 03 00 00 00 18 00 18 00 68 00 00 00 18 00 18 00 80 00 00 00 00 00 00 00 40 00 00 00 22 00 22 00 40 00 00 00 06 00 06 00 62 00 00 00 00 00 00 00 98 00 00 00 01 82 00 00 68 00 61 00 66 00 69 00 7A 00 40 00 65 00 66 00 6C 00 61 00 62 00 2E 00 6C 00 6F 00 63 00 61 00 6C 00 31 00 39 00 32 00 3B 5E 2B 86 67 49 E3 01 C9 9E F2 CA ED 54 21 11 81 89 94 C6 EC E0 26 E3 BA DB E7 5A F4 CA 28 17 7C 0E 8A 08 18 B5 5A 4E 72 4F C5 7F 52 64 FA 76
DEBUG IMAP: AUTHENTICATE NTLM command result: A3 OK AUTHENTICATE completed.
A4 CAPABILITY
* CAPABILITY IMAP4 IMAP4rev1 AUTH=NTLM AUTH=GSSAPI AUTH=PLAIN UIDPLUS CHILDREN IDLE NAMESPACE LITERAL+
A4 OK CAPABILITY completed.
DEBUG IMAP: AUTH: NTLM
DEBUG IMAP: AUTH: GSSAPI
DEBUG IMAP: AUTH: PLAIN
DEBUG IMAP: connection available -- size: 1
A5 SELECT INBOX
* 40 EXISTS
* 0 RECENT
* FLAGS (\Seen \Answered \Flagged \Deleted \Draft $MDNSent)
* OK [PERMANENTFLAGS (\Seen \Answered \Flagged \Deleted \Draft $MDNSent)] Permanent flags
* OK [UNSEEN 39] Is the first unseen message
* OK [UIDVALIDITY 436] UIDVALIDITY value
* OK [UIDNEXT 46] The next unique identifier value
A5 OK [READ-WRITE] SELECT completed.
A6 SEARCH UNSEEN ALL
* SEARCH 39
A6 OK SEARCH completed.
A7 SEARCH UNSEEN ALL
* SEARCH 39
A7 OK SEARCH completed.
main INFO emailToSms.EmailReader - 1 unread emails read from inbox.
A8 STORE 39 +FLAGS (\Seen)
* 39 FETCH (FLAGS (\Seen))
A8 OK STORE completed.
A9 FETCH 39 (BODY.PEEK[HEADER])
* 39 FETCH (BODY[HEADER] {851}
MIME-Version: 1.0
Received: from HOST (IP) by HOST
(192.168.1.53) with Microsoft SMTP Server (TLS) id 14.1.218.12; Thu, 19 Jan
2017 17:06:26 +0500
To: hafiz <hafiz#bla.bla>
Message-ID: <hassan.B69E3DD110000159000004A73F57FEE3.1484827604448.cisco-ccp#bla.bla>
In-Reply-To: <CF72F945A1ED2E438A53A11DA9415F65A0E981#Expert.bla.bla>
References: <CF72F945A1ED2E438A53A11DA9415F65A0E981#Expert.bla.bla>
Subject: Re: 1122+50
Content-Type: multipart/mixed;
boundary="----=_Part_127_14151461.1484827604583"
From: <testing2#bla.bla>
Return-Path: testing2#bla.bla
Date: Thu, 19 Jan 2017 17:06:26 +0500
X-MS-Exchange-Organization-AuthSource: Expert.bla.bla
X-MS-Exchange-Organization-AuthAs: Internal
X-MS-Exchange-Organization-AuthMechanism: 06
X-Originating-IP: [IP]
)
A9 OK FETCH completed.
A10 FETCH 39 (ENVELOPE INTERNALDATE RFC822.SIZE)
* 39 FETCH (ENVELOPE ("Thu, 19 Jan 2017 17:06:26 +0500" "Re: 1122+50" ((NIL NIL "testing2" "bla.bla")) NIL NIL (("hafiz" NIL "hafiz" "bla.bla")) NIL NIL "<CF72F945A1ED2E438A53A11DA9415F65A0E981#Expert.bla.bla>" "<hassan.B69E3DD110000159000004A73F57FEE3.1484827604448.cisco-ccp#bla.bla>") INTERNALDATE "19-Jan-2017 17:06:26 +0500" RFC822.SIZE 1250)
A10 OK FETCH completed.
A11 FETCH 39 (BODYSTRUCTURE)
* 39 FETCH (BODYSTRUCTURE ("multipart" "mixed" ("boundary" "----=_Part_127_14151461.1484827604583") NIL NIL 7BIT 0 NIL NIL NIL NIL))
A11 OK FETCH completed.
DEBUG IMAP: IMAPProtocol noop
A12 NOOP
A12 OK NOOP completed.
This is a bug in Microsoft Exchange. Report this bug to Microsoft and upgrade to a newer version or newer service pack if possible in case they've already fixed it.
Exchange is returning the BODYSTRUCTURE information for the message as if it were a single part message when in fact it is a multipart message. This is a violation of the IMAP protocol spec.
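You can use the workaround in the JavaMail FAQ: make a local copy of the message with the MimeMessage copy constructor, so that JavaMail parses the message content itself instead of relying on the BODYSTRUCTURE returned by the server.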
Also, you might want to upgrade to a newer version of JavaMail - 1.4.7 is pretty old, the current version is 1.5.6.
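As a cross-check outside JavaMail, the raw message can be parsed on its own to confirm that it really does contain a part. The following is a minimal Python sketch using the standard `email` package. The trimmed headers and shortened HTML body are assumptions made for brevity; the boundary string is the one from the question.

```python
from email import message_from_string

# Trimmed copy of the raw MIME from the question (assumed shortening:
# most headers and most of the HTML body are left out).
raw = (
    "MIME-Version: 1.0\n"
    "Subject: Re: 1122+50\n"
    "Content-Type: multipart/mixed;\n"
    ' boundary="----=_Part_127_14151461.1484827604583"\n'
    "\n"
    "------=_Part_127_14151461.1484827604583\n"
    'Content-Type: text/html; charset="utf-8"\n'
    "Content-Transfer-Encoding: 7bit\n"
    "\n"
    "Reply to 50<br>\n"
    "------=_Part_127_14151461.1484827604583--\n"
)

msg = message_from_string(raw)
parts = msg.get_payload()  # a list of sub-messages for multipart content
part_count = len(parts)
print(msg.get_content_type(), part_count, parts[0].get_content_type())
```

A client that parses the raw bytes finds one text/html part, while the BODYSTRUCTURE response in the debug output lists none. That is consistent with the server-side structure information being wrong rather than the message itself.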

APDU: "Conditions of use not satisfied" (69 85) while calculate signature

With a Gemalto smart card (IAS ECC), I would like to compute a signature using a private key stored on the card. For this, I use the following APDU commands:
// Verify PIN
00 20 00 01 04 31 32 33 34
-> 90 00
// Create a context for security operation
00 22 41 B6 06 84 01 84 80 01 12
-> 90 00
// Set the hash of the document
00 2A 90 A0 14 HASH OF DOCUMENT
-> 69 85
// Calculating the signature
00 2A 9E 9A 80
-> 69 85
My problem is the following: the last two commands return the status word "69 85", meaning "Conditions of use not satisfied".
I have already tried several solutions, but I always get the same error. How can I resolve it? What can this code mean?
After some tests, I discovered something interesting. When I replace the CLA byte "00" with "10", the smart card returns a different response:
// Create a context for security operation
00 22 41 B6 06 84 01 84 80 01 12
// Verify PIN
00 20 00 01 04 31 32 33 34
// Calculating the signature (I replace "00" by "10")
10 2A 9E 9A 23 30 21 30 09 06 05 2B 0E 03 02 1A 05 00 04 14 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F 10 12 13 14 15
I don't know whether this is the right solution: the smart card returns "90 00", but it should return the content of my signature!
Thank you for your help!
Best regards
You are getting SW 6985 for
// Set the hash of the document
00 2A 90 A0 14 HASH OF DOCUMENT
-> 69 85
This is because you have not set the correct context in the current security environment.
Let me explain.
First, you performed the VERIFY PIN command, which was successful:
// Verify PIN
00 20 00 01 04 31 32 33 34
-> 90 00
Then you performed the MSE SET command, where you set the security context. For this you have to understand how the SE works (please refer to section 3.5 of IAS ECC v1.01).
At personalisation time, the personaliser agent creates SDOs (Secure Data Objects) inside the card. References to these SDOs are given in the SE (Security Environment) in the form of CRTs (Control Reference Templates).
// Create a context for security operation
00 22 41 B6 06 84 01 84 80 01 12
-> 90 00
Generally speaking, the MSE SET command will return SW 9000 even if the SDO reference is wrong: it returns SW 6A80 only when the template is wrong, not when the reference is wrong. (The SDO reference is passed in tag 84.)
After that, you performed the PSO HASH command:
// Set the hash of the document
00 2A 90 A0 14 HASH OF DOCUMENT
-> 69 85
Here the card returns SW 6985 (Conditions of use not satisfied). This indicates that the algorithm or SDO reference used for calculating the hash may be wrong, which is probably the case because the SDO reference sent in the MSE SET command is not available on the card.
Detecting an error that originates in MSE SET can be tricky, since the command returns SW 9000.
In this type of situation, you have to check the personalisation file carefully and match the MSE SET command against the SDO references and supported algorithms.
It may be useful to put the default context (e.g., cryptographic algorithms or security operations) into the current SE in order to reduce the number of MSE SET exchanges.
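To compare the MSE SET command against the personalisation file, it helps to split it into its header and TLV objects. The following is a minimal Python sketch, assuming a short APDU with a one-byte Lc and flat one-byte-tag, one-byte-length TLVs, which is what the command in the question uses. The helper names `parse_apdu` and `parse_simple_tlv` are illustrative.

```python
def parse_apdu(hexstr):
    """Split a short command APDU (with a data field) into header and data."""
    raw = bytes.fromhex(hexstr)
    cla, ins, p1, p2, lc = raw[:5]
    data = raw[5:5 + lc]
    return {"cla": cla, "ins": ins, "p1": p1, "p2": p2, "lc": lc, "data": data}

def parse_simple_tlv(data):
    """Parse a flat sequence of one-byte-tag, one-byte-length TLV objects."""
    items = []
    pos = 0
    while pos < len(data):
        tag, length = data[pos], data[pos + 1]
        items.append((tag, data[pos + 2:pos + 2 + length]))
        pos += 2 + length
    return items

# MSE SET command from the question.
mse_set = parse_apdu("00 22 41 B6 06 84 01 84 80 01 12")
tlvs = parse_simple_tlv(mse_set["data"])
for tag, value in tlvs:
    print(f"tag {tag:02X}: {value.hex().upper()}")
```

For this command the sketch yields tag 84 with value 84 and tag 80 with value 12. The first is the SDO reference mentioned above, and the second is the algorithm identifier. These are the two values to look up in the personalisation file.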

!mlocks hung interpretation help needed

I am trying to investigate a hang with WinDbg and want to know whether my assumptions are right. When I run the command
!mlocks, I get the following:
0:000> !mlocks
Examining SyncBlocks...
Scanning for ReaderWriterLock instances...
Scanning for holders of ReaderWriterLock locks...
Scanning for ReaderWriterLockSlim instances...
Scanning for holders of ReaderWriterLockSlim locks...
Examining CriticalSections...
ClrThread DbgThread OsThread LockType Lock LockLevel
--------------------------------------------------------------------------
0x640064 -1 0xffffffff RWLock 000000000339a338 Writer
0x6 7 0x1ea8 thinlock 000000000343ddd8 (recursion:0)
When executing !rwlock, I get the following:
0:000> !rwlock 000000000339a338
WriterThread: 0x640064 (DEAD) WriterLevel: 115 WaitingWriterCount: 0
WriterEvent: 0
WaitingWriterThreadIds: None
ReaderCount: 116
CurrentReaderThreadIds:
WaitingReaderCount: 576
ReaderEvent: 80400002
WaitingReaderThreadIds:
*This lock has 116 orphaned reader locks.
0:007> !rwlock
Address ReaderCount WaitingReaderCount WriterThread WaitingWriterCount
...
000000000339a338 116 576 0x640064 0
...
00000000053f0688 568 499 -- 6
...
When I run !dlk:
0:000> !dlk
Examining SyncBlocks...
Scanning for ReaderWriterLock instances...
Scanning for holders of ReaderWriterLock locks...
Scanning for ReaderWriterLockSlim instances...
Scanning for holders of ReaderWriterLockSlim locks...
Examining CriticalSections...
Scanning for threads waiting on SyncBlocks...
Scanning for threads waiting on ReaderWriterLock locks...
Scanning for threads waiting on ReaderWriterLocksSlim locks...
Scanning for threads waiting on CriticalSections...
No deadlocks detected.
no deadlock is detected.
I found this on Tess's blog
My question is whether these threads have anything to do with my hanging application,
and what the scenario could be.
What does it mean that the thread is DEAD? Can a dead thread hold a lock? Or do I have to look elsewhere for the root cause of the hang?
Please help me interpret this output.
Regards,
Bernhard
Here is some additional info:
0:000> dc 000000000339a338
00000000`0339a338 f2073268 000007fe 00000000 80000000 h2..............
00000000`0339a348 f2066960 000007fe 00000010 00650047 `i..........G.e.
00000000`0339a358 00480074 0073006f 00410074 00640064 t.H.o.s.t.A.d.d.
00000000`0339a368 00650072 00730073 00730065 00000000 r.e.s.s.e.s.....
00000000`0339a378 00000000 00000000 00000000 00000000 ................
00000000`0339a388 f2066960 000007fe 0000001c 00650066 `i..........f.e.
00000000`0339a398 00300038 003a003a 00380034 00310038 8.0.:.:.4.8.8.1.
00000000`0339a3a8 0034003a 00660039 003a0063 00300039 :.4.9.f.c.:.9.0.
0:000> db 000000000339a338
00000000`0339a338 68 32 07 f2 fe 07 00 00-00 00 00 00 00 00 00 80 h2..............
00000000`0339a348 60 69 06 f2 fe 07 00 00-10 00 00 00 47 00 65 00 `i..........G.e.
00000000`0339a358 74 00 48 00 6f 00 73 00-74 00 41 00 64 00 64 00 t.H.o.s.t.A.d.d.
00000000`0339a368 72 00 65 00 73 00 73 00-65 00 73 00 00 00 00 00 r.e.s.s.e.s.....
00000000`0339a378 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
00000000`0339a388 60 69 06 f2 fe 07 00 00-1c 00 00 00 66 00 65 00 `i..........f.e.
00000000`0339a398 38 00 30 00 3a 00 3a 00-34 00 38 00 38 00 31 00 8.0.:.:.4.8.8.1.
00000000`0339a3a8 3a 00 34 00 39 00 66 00-63 00 3a 00 39 00 30 00 :.4.9.f.c.:.9.0.
0:000> !mdt 000000000339a338
Can't get name for module 000007ff0068c3c0. Error = 0x80070057.
Can't get name for module 000007ff00791908. Error = 0x80070057.
Can't get name for module 000007ff0068c3c0. Error = 0x80070057.
Can't get name for module 000007ff00791908. Error = 0x80070057.
000000000339a338 (System.Threading.ReaderWriterLock)
_hWriterEvent:8000000000000000 (System.IntPtr)
_hReaderEvent:000007fef2066960 (System.IntPtr)
_hObjectHandle:0065004700000010 (System.IntPtr)
_dwState:0x480074 (System.Int32)
_dwULockID:0x73006f (System.Int32)
_dwLLockID:0x410074 (System.Int32)
_dwWriterID:0x640064 (System.Int32)
_dwWriterSeqNum:0x650072 (System.Int32)
_wWriterLevel:0x0073 (System.Int16)
0:000> !mdt 000000000343ddd8
Can't get name for module 000007ff0068c3c0. Error = 0x80070057.
Can't get name for module 000007ff00791908. Error = 0x80070057.
000000000343ddd8 (System.Collections.Generic.LinkedList`1[[TAU.GuiAccess.PopupHandler.ClientInfo, TAU.GuiAccess.PopupHandler]])
head:000000000823e148 (System.Collections.Generic.LinkedListNode`1[[TAU.GuiAccess.PopupHandler.ClientInfo, TAU.GuiAccess.PopupHandler]])
count:0x1 (System.Int32)
version:0x3b (System.Int32)
_syncRoot:NULL (System.Object)
siInfo:NULL (System.Runtime.Serialization.SerializationInfo)
0:000> !do 000000000343ddd8
Name: System.Collections.Generic.LinkedList`1[[TAU.GuiAccess.PopupHandler.ClientInfo, TAU.GuiAccess.PopupHandler]]
MethodTable: 000007ff009ce218
EEClass: 000007ff009dda20
Size: 48(0x30) bytes
File: C:\Windows\Microsoft.Net\assembly\GAC_MSIL\System\v4.0_4.0.0.0__b77a5c561934e089\System.dll
Fields:
MT Field Offset Type VT Attr Value Name
000007ff009ced28 4000586 8 ...ss.PopupHandler]] 0 instance 000000000823e148 head
000007fef206c848 4000587 20 System.Int32 1 instance 1 count
000007fef206c848 4000588 24 System.Int32 1 instance 59 version
000007fef2065ab8 4000589 10 System.Object 0 instance 0000000000000000 _syncRoot
000007fef208a1b8 400058a 18 ...SerializationInfo 0 instance 0000000000000000 siInfo
ThinLock owner 6 (0000000000000000), Recursive 0
The "DEAD" indication means that there is no corresponding OS thread. When a managed thread is created, a managed thread object is created and assigned a thread ID. On Windows, there is currently always a 1:1 mapping between managed and native threads, so there is also an OS thread ID assigned. When a thread terminates, the association between the managed thread object and the native thread is obviously broken. However, it takes a period of time before the managed thread object gets cleaned up. A thread in this state is listed by sosex as "DEAD" and is listed as XXXX in the !sos.threads output.
In this case, the CLR thread ID (00640064) looks suspicious. It looks curiously like Unicode text 'dd'. There may be corruption around the managed lock address. Look around with the debugger's 'dc' or 'db' commands to see if some text has overwritten the thread ID field.
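The suspicion about text can be checked directly from the !mdt output in the question. The following is a minimal Python sketch that packs the dumped field values back into little-endian bytes and decodes them as UTF-16. The field order follows the !mdt listing; treating the fields as contiguous in memory is an assumption.

```python
import struct

# Field values copied from the !mdt output in the question.
fields = [
    ("<I", 0x00480074),  # _dwState
    ("<I", 0x0073006F),  # _dwULockID
    ("<I", 0x00410074),  # _dwLLockID
    ("<I", 0x00640064),  # _dwWriterID
    ("<I", 0x00650072),  # _dwWriterSeqNum
    ("<H", 0x0073),      # _wWriterLevel
]

# Rebuild the raw bytes as they sit in memory, then read them as text.
raw = b"".join(struct.pack(fmt, value) for fmt, value in fields)
text = raw.decode("utf-16-le")
print(text)  # prints tHostAddres
```

The decoded text is a fragment of the "GetHostAddresses" string visible in the dc and db dumps at the same address. That suggests the "lock" fields, including the writer thread ID 0x640064, are being read from string data rather than from a live ReaderWriterLock.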