What is the SAM LUN format in the iSCSI protocol?

When I read the source code of SPDK, I found two forms of fmt_lun in the function spdk_scsi_lun_id_fmt_to_int. What do these two forms mean? And since fmt_lun is said to comply with the SAM LUN format, what is the SAM LUN format?
uint64_t
spdk_scsi_lun_id_int_to_fmt(int lun_id)
{
    uint64_t fmt_lun, method;

    if (SPDK_SCSI_DEV_MAX_LUN <= 0x0100) {
        /* below 256 */
        method = 0x00U;
        fmt_lun = (method & 0x03U) << 62;
        fmt_lun |= ((uint64_t)lun_id & 0x00ffU) << 48;
    } else if (SPDK_SCSI_DEV_MAX_LUN <= 0x4000) {
        /* below 16384 */
        method = 0x01U;
        fmt_lun = (method & 0x03U) << 62;
        fmt_lun |= ((uint64_t)lun_id & 0x3fffU) << 48;
    } else {
        /* XXX */
        fmt_lun = 0;
    }

    return fmt_lun;
}

You'll get the best answer to your question by going to the original specification. You can obtain a copy of the SCSI Architecture Model spec from t10.org here: https://www.t10.org/members/w_sam5.htm. That's the "SAM" from the SAM LUN format.
SCSI has had to adapt to numerous advancements in computing hardware over the years. Back in the mid-1980s, even if you could have predicted how storage would change over the decades, the protocol itself still needed to be useful on the comparatively tiny computers of the day. And so you see a lot of this kind of thing in the SCSI world, as you do with LUN encoding. At some point a need arose for more than 256 LUNs. Thankfully, the engineers had built an addressing-method field in from the beginning. Naturally, the first method was 0. To maintain compatibility with existing systems, they created method 1, which allows for up to 16,384 LUNs.
SAM-5 defines four different addressing methods:
0: Peripheral device addressing method
1: Flat space addressing method
2: Logical unit addressing method
3: Extended logical unit addressing method
I've only seen the first two out in the wild, but I'm sure there are devices out there that use methods 2 and 3.
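For completeness, here is a minimal decoding sketch that reverses the encoding shown in the question for methods 0 and 1. It is illustrative only: the function name is made up, and this is not the SPDK implementation.

#include <stdint.h>

/* Illustrative only: extract the addressing method from the top two bits of
 * the first LUN level, then recover the integer LUN id for methods 0 and 1. */
static int fmt_lun_to_int(uint64_t fmt_lun)
{
    uint64_t method = (fmt_lun >> 62) & 0x03U;

    if (method == 0x00U) {
        /* peripheral device addressing: 8-bit LUN */
        return (int)((fmt_lun >> 48) & 0x00ffU);
    } else if (method == 0x01U) {
        /* flat space addressing: 14-bit LUN */
        return (int)((fmt_lun >> 48) & 0x3fffU);
    }

    /* methods 2 and 3 are not handled in this sketch */
    return -1;
}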

Related

Why are the hex numbers for big endian different than little endian?

#include <stdio.h>

typedef unsigned char *byte_pointer;

void show_bytes(byte_pointer start, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++)
        printf(" %.2x", start[i]);
    printf("\n");
}

void show_int(int x)
{
    show_bytes((byte_pointer) &x, sizeof(int));
}

void show_float(float x)
{
    show_bytes((byte_pointer) &x, sizeof(float));
}

void show_pointer(void *x)
{
    show_bytes((byte_pointer) &x, sizeof(void *));
}

int main(void)
{
    int a = 0x12345678;
    byte_pointer ap = (byte_pointer) &a;
    show_bytes(ap, 3);
    return 0;
}
(Solutions according to the CS:APP book)
Big endian: 12 34 56
Little endian: 78 56 34
I know systems have different conventions for storage allocation, but if two systems use the same convention yet differ in endianness, why are the hex values different?
Endianness is an issue that arises when we use more than one storage location for a value/type, which we do because some things won't fit in a single storage location.
As soon as we use multiple storage locations for a single value that gives rise to the question of:  What part of the value will we store in each storage location?
The first byte of a two-byte item will have a lower address than the second byte; in particular, the address of the second byte will be at +1 from the address of the first byte.
Storing a two-byte item in two bytes of storage, do we store the most significant byte first and the least significant byte second, or vice versa?
We choose to use directly consecutive bytes for the two bytes of the two-byte item, so no matter which (endian) way we choose to store such an item, we refer to the whole two-byte item by the lower address (the address of its first byte).
We can express these storage choices with formulas, where item[0] refers to the first byte and item[1] refers to the second byte.
item[0] = value >> 8 // also value / 256
item[1] = value & 0xFF // also value % 256
value = (item[0]<<8) | item[1] // also item[0]*256 | item[1]
--vs--
item[0] = value & 0xFF // also value % 256
item[1] = value >> 8 // also value / 256
value = item[0] | (item[1]<<8) // also item[0] | item[1]*256
The first set of formulas is for big endian, and the second for little endian.
By these formulas, it doesn't matter what order we access memory as to whether item[0] first, then item[1], or vice versa, or both at the same time (common in hardware), as long as the formulas for one endian are consistently used.
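As a small, self-contained illustration of the two formula sets above (the 16-bit value 0x1234 is just an example):

#include <stdio.h>

int main(void)
{
    unsigned short value = 0x1234;
    unsigned char big[2], little[2];

    /* big endian: most significant byte at the lower address */
    big[0] = value >> 8;
    big[1] = value & 0xFF;

    /* little endian: least significant byte at the lower address */
    little[0] = value & 0xFF;
    little[1] = value >> 8;

    printf("big-endian bytes:    %.2x %.2x\n", big[0], big[1]);
    printf("little-endian bytes: %.2x %.2x\n", little[0], little[1]);

    /* either layout reassembles to the same value with the matching formula */
    unsigned short from_big    = (big[0] << 8) | big[1];
    unsigned short from_little = little[0] | (little[1] << 8);
    printf("reassembled: %.4x %.4x\n", from_big, from_little);
    return 0;
}

The reassembled value is the same either way; only the stored byte order differs.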
If the item in question is a four-byte value, then there are 4! (i.e. 24) possible byte orderings, though only two of them are truly sensible.
For efficiency, the hardware offers us multibyte memory access in one instruction (and with one reference, namely to the lowest address of the multibyte item), and therefore, the hardware itself needs to define and consistently use one of the two possible/reasonable orderings.
If the hardware did not offer multibyte memory access, then the ordering would be entirely up to the software program itself to define (accessing memory one byte at a time), and the program could choose big or little endian, even differently for each variable, as long as it consistently accesses the multiple bytes of memory in the same manner to reassemble the values stored there.
In a similar manner, when we define a structure of multiple items (e.g. struct point { int x; int y; }), software chooses whether x comes first or y comes first in memory ordering. However, since programmers (and compilers) will still choose to use hardware instructions to access an individual field such as x in one go, the hardware's endianness still governs the byte order within each field.
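To illustrate that last point, here is a small sketch (the values are arbitrary): the compiler decides that x precedes y in memory, but the byte order inside each int is still the hardware's.

#include <stdio.h>
#include <string.h>

struct point { int x; int y; };

int main(void)
{
    struct point p = { 0x11223344, 0x55667788 };
    unsigned char bytes[sizeof p];
    memcpy(bytes, &p, sizeof p);

    for (size_t i = 0; i < sizeof p; i++)
        printf(" %.2x", bytes[i]);
    printf("\n");   /* e.g. 44 33 22 11 88 77 66 55 on a little-endian machine */
    return 0;
}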

What initialises the contents of the STM32's USB BTABLE when the __HAL_RCC_USB_CLK_ENABLE() macro is executed in HAL_PCD_MspInit()?

I have used STM32CubeMX/IDE to generate a USB HID project for the STM32F3DISCOVERY board.
The USB BTABLE register is zero, indicating that the BTABLE is at the start of the Packet Memory Area.
(I zero the whole PMA at program start, to avoid stale values.)
Just before the execution of the __HAL_RCC_USB_CLK_ENABLE macro (in HAL_PCD_MspInit() in usbd_conf.c), the values of the BTABLE (at index zero onwards in the PMA) are:
After that macro is executed, the values are:
The macro expands to:
do { \
    volatile uint32_t tmpreg; \
    ((((RCC_TypeDef *) ((0x40000000UL + 0x00020000UL) + 0x00001000UL))->APB1ENR) |= ((0x1UL << (23U)))); \
    /* Delay after an RCC peripheral clock enabling */ \
    tmpreg = ((((RCC_TypeDef *) ((0x40000000UL + 0x00020000UL) + 0x00001000UL))->APB1ENR) & ((0x1UL << (23U)))); \
    (void)tmpreg; \
} while(0U)
How does this macro cause the BTABLE to be initialised?
(I need pma[12] to be 0x100 instead of 0x0 as I want to use endpoint 3 for the HID interface in a composite device. I am using this simple HID device to test the use of a different endpoint. Changing 0x81 to 0x83 in USBD_LL_Init() and #define HID_EPIN_ADDR are not sufficient to change the value of pma[12]. The incorrect TX pointer at pma[12] is used and corrupt data is observed in wireshark.)
Update:
If I add code to manually set pma[12] to 0x100:
HAL_StatusTypeDef HAL_PCDEx_PMAConfig(PCD_HandleTypeDef *hpcd,
                                      uint16_t ep_addr,
                                      uint16_t ep_kind,
                                      uint32_t pmaadress)
...
    /* Here we check if the endpoint is single or double Buffer */
    if (ep_kind == PCD_SNG_BUF)
    {
        /* Single Buffer */
        ep->doublebuffer = 0U;
        /* Configure the PMA */
        ep->pmaadress = (uint16_t)pmaadress;

        // correct PMA BTABLE
        uint32_t *btable = (uint32_t *) USB_PMAADDR; // Test this.
        if (ep->is_in) {
            btable[ep->num * 4] = pmaadress;
        }
    }
The value at pma[12] does get set, but it later gets overwritten.
__HAL_RCC_USB_CLK_ENABLE() enables the clock for the USB block. Before the clock is enabled, all of the peripheral's locations read as zeroes. After the clock is enabled, the actual PMA content becomes visible: whatever was written there before reset, or random garbage left over from power-up. So executing __HAL_RCC_USB_CLK_ENABLE() has nothing to do with your problem.
I don't know where the TX buffer address for endpoint 3 gets overwritten, but I guess the Cube sets it when it decides to send data on the endpoint. I am not familiar with the Cube; does it have an API to send a USB packet?
Also, double-check that your pma array has the right definition. On F1, and likely on F3, there is a 2-byte value at each 32-bit location.
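To illustrate that layout, here is a sketch of how a single BTABLE entry could be written on such parts. This is an assumption to check against your device's reference manual, not HAL code: pma_write16 is a made-up helper, and the fallback base address should be confirmed in your device header. The idea is that each 16-bit PMA word occupies a 32-bit slot from the CPU's point of view, so a value at PMA offset N is written at USB_PMAADDR + 2*N.

#include <stdint.h>

#ifndef USB_PMAADDR
#define USB_PMAADDR 0x40006000UL  /* typical PMA base; confirm in your device header */
#endif

/* Write one 16-bit PMA word, given its offset in PMA bytes. */
static void pma_write16(uint16_t pma_offset, uint16_t value)
{
    volatile uint32_t *slot =
        (volatile uint32_t *)(USB_PMAADDR + ((uint32_t)pma_offset * 2U));
    *slot = value;  /* only the low 16 bits are meaningful */
}

/* Example: set the ADDR_TX entry of endpoint 3 (BTABLE offset 8*3 = 24) to 0x100: */
/* pma_write16(24, 0x100); */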
UPD: Sorry, I saw this question first, but your real problem is why the TX address gets overwritten or is not set up correctly.

Variable sized i2c reads on Raspberry Pi

I am trying to interface an A71CH with a Raspberry Pi 3 over I2C. The device requires repeated starts, and when a read request is made, the first byte the device sends is always the length of the whole message. When making a read, instead of reading a fixed-size message, I want to read the first byte and then send a NACK to the slave once the number of bytes indicated by that first byte has been received. I used the following code but could not get the results I expected, because it only reads one byte and then sends a NACK, as you can see below.
struct i2c_rdwr_ioctl_data packets;
struct i2c_msg messages[2];
int r = 0;
int i = 0;

if (bus != I2C_BUS_0) // change if bus 0 is not the correct bus
{
    printf("axI2CWriteRead on wrong bus %x (addr %x)\n", bus, addr);
}

messages[0].addr  = axSmDevice_addr;
messages[0].flags = 0;
messages[0].len   = txLen;
messages[0].buf   = pTx;

// NOTE:
// By setting the 'I2C_M_RECV_LEN' bit in 'messages[1].flags' one ensures
// the I2C Block Read feature is used.
messages[1].addr  = axSmDevice_addr;
messages[1].flags = I2C_M_RD | I2C_M_RECV_LEN | I2C_M_IGNORE_NAK;
messages[1].len   = 256;
messages[1].buf   = pRx;
messages[1].buf[0] = 1;

// NOTE:
// By passing the two message structures via the packets structure as
// a parameter to the ioctl call one ensures a Repeated Start is triggered.
packets.msgs  = messages;
packets.nmsgs = 2;

// Send the request to the kernel and get the result back
r = ioctl(axSmDevice, I2C_RDWR, &packets);
Is there any way that allows me to make variable sized i2c reads? What can I do to make it work? Thanks for looking.
The Raspberry Pi doesn't support SMBus block reads; the only way to overcome this is to bit-bang I2C on the GPIO pins. As @Ian Abbott mentioned above, I managed to modify pigpio's bbI2CZip function to fit my need by checking the first byte of the received message and updating the read length afterwards.
I had a similar issue with the rpi3. I wanted to read exactly 32 bytes of data from a register on a slave device, but i2c_smbus_read_block_data() was returning -71 and errno 71 EPROTO.
The solution was to use i2c_smbus_read_i2c_block_data() instead of i2c_smbus_read_block_data().
/* Until kernel 2.6.22, the length is hardcoded to 32 bytes. If you
   ask for less than 32 bytes, your code will only work with kernels
   2.6.23 and later. */
extern __s32 i2c_smbus_read_i2c_block_data(int file, __u8 command, __u8 length,
                                           __u8 *values);
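For reference, here is a minimal usage sketch. The bus number, slave address 0x48 and register 0x00 are placeholders; depending on your distribution, the helper is declared in <i2c/smbus.h> from libi2c or in an older userspace copy of <linux/i2c-dev.h>.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/i2c-dev.h>
#include <i2c/smbus.h>

int main(void)
{
    int file = open("/dev/i2c-1", O_RDWR);   /* bus 1 on most Pis */
    if (file < 0) { perror("open"); return 1; }

    if (ioctl(file, I2C_SLAVE, 0x48) < 0) {  /* 0x48: placeholder slave address */
        perror("I2C_SLAVE");
        return 1;
    }

    __u8 buf[32];
    __s32 n = i2c_smbus_read_i2c_block_data(file, 0x00, 32, buf);  /* 0x00: placeholder register */
    if (n < 0) { perror("i2c_smbus_read_i2c_block_data"); return 1; }

    for (__s32 i = 0; i < n; i++)
        printf("%02x ", buf[i]);
    printf("\n");
    close(file);
    return 0;
}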

Reducing LUT utilization in a Vivado HLS design (RSA cryptosystem using Montgomery multiplication)

A question/problem for anyone experienced with Xilinx Vivado HLS and FPGA design:
I need help reducing the utilization numbers of a design within the confines of HLS (i.e. can't just redo the design in an HDL). I am targeting the Zedboard (Zynq 7020).
I'm trying to implement 2048-bit RSA in HLS, using the Tenca-Koç multiple-word radix-2 Montgomery multiplication algorithm, shown below (more algorithm details here):
I wrote this algorithm in HLS and it works in simulation and in C/RTL cosim. My algorithm is here:
#include "ap_int.h"

#define MWR2MM_m 2048 // Bit-length of operands
#define MWR2MM_w 8    // word size
#define MWR2MM_e 257  // number of words per operand

// Type definitions
typedef ap_uint<1> bit_t;              // 1-bit scan
typedef ap_uint< MWR2MM_w > word_t;    // 8-bit words
typedef ap_uint< MWR2MM_m > rsaSize_t; // m-bit operand size

/*
 * Multiple-word radix 2 Montgomery multiplication using carry-propagate adder
 */
void mwr2mm_cpa(rsaSize_t X, rsaSize_t Yin, rsaSize_t Min, rsaSize_t* out)
{
    // extend operands to 2 extra words of 0
    ap_uint<MWR2MM_m + 2*MWR2MM_w> Y = Yin;
    ap_uint<MWR2MM_m + 2*MWR2MM_w> M = Min;
    ap_uint<MWR2MM_m + 2*MWR2MM_w> S = 0;

    ap_uint<2> C = 0; // two carry bits
    bit_t qi = 0;     // an intermediate result bit

    // Store concatenations in a temporary variable to eliminate HLS compiler warnings about shift count
    // (w+2 bits wide: an 8-bit word sum plus a 2-bit carry)
    ap_uint<MWR2MM_w + 2> temp_concat = 0;

    // scan X bit by bit
    for (int i = 0; i < MWR2MM_m; i++)
    {
        qi = (X[i]*Y[0]) xor S[0];

        // C gets top two bits of temp_concat, j'th word of S gets bottom 8 bits of temp_concat
        temp_concat = X[i]*Y.range(MWR2MM_w-1,0) + qi*M.range(MWR2MM_w-1,0) + S.range(MWR2MM_w-1,0);
        C = temp_concat.range(9,8);
        S.range(MWR2MM_w-1,0) = temp_concat.range(7,0);

        // scan Y and M word by word, for each bit of X
        for (int j = 1; j <= MWR2MM_e; j++)
        {
            temp_concat = C + X[i]*Y.range(MWR2MM_w*j+(MWR2MM_w-1), MWR2MM_w*j) + qi*M.range(MWR2MM_w*j+(MWR2MM_w-1), MWR2MM_w*j) + S.range(MWR2MM_w*j+(MWR2MM_w-1), MWR2MM_w*j);
            C = temp_concat.range(9,8);
            S.range(MWR2MM_w*j+(MWR2MM_w-1), MWR2MM_w*j) = temp_concat.range(7,0);
            S.range(MWR2MM_w*(j-1)+(MWR2MM_w-1), MWR2MM_w*(j-1)) = (S.bit(MWR2MM_w*j), S.range(MWR2MM_w*(j-1)+(MWR2MM_w-1), MWR2MM_w*(j-1)+1));
        }

        S.range(S.length()-1, S.length()-MWR2MM_w) = 0;
        C = 0;
    }

    // if the final partial sum is greater than the modulus, bring it back to the proper range
    if (S >= M)
        S -= M;

    *out = S;
}
Unfortunately, the LUT utilization is huge.
This is problematic because I need to be able to fit multiple of these blocks in hardware as axi4-lite slaves.
Could someone please provide a few suggestions as to how I can reduce the LUT utilization, WITHIN THE CONFINES OF HLS?
I've already tried the following:
Experimenting with different word lengths
Switching the top-level inputs to arrays so they are BRAM (i.e. not using ap_uint<2048>, but instead ap_uint<MWR2MM_w> foo[MWR2MM_e])
Experimenting with all sorts of directives: compartmentalizing into multiple inline functions, dataflow architecture, resource limits on lshr, etc. (a sketch of the pragma style I mean follows this list)
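For reference, the kind of pragma experiments referred to above look roughly like this. This is a sketch with made-up names (small_mac, acc), not my actual multiplier, using standard Vivado HLS pragma syntax:

// Sketch only: a toy loop showing where the directives were applied.
void small_mac(const unsigned char x[257], const unsigned char y[257],
               unsigned short out[257])
{
    static unsigned short acc[257];                        // made-up scratch buffer
#pragma HLS RESOURCE variable=acc core=RAM_1P_BRAM         // keep the buffer in one BRAM
#pragma HLS ALLOCATION instances=mul limit=1 operation     // share a single multiplier

    for (int j = 0; j < 257; j++) {
#pragma HLS PIPELINE II=1                                  // pipeline the word loop
        acc[j] = (unsigned short)(x[j] * y[j]);
        out[j] = acc[j];
    }
}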
However, nothing really drives the LUT utilization down in a meaningful way. Is there a glaringly obvious way that I could reduce the utilization that is apparent to anyone?
In particular, I've seen papers on implementations of the mwr2mm algorithm that use only one DSP block and one BRAM. Is this even worth attempting to implement using HLS? Or is there no way that I can actually control the resources that the algorithm is mapped to without describing it in HDL?
Thanks for the help.

How to store data larger than 128 bytes in JavaCard

I can't write data at an index above 128 in a byte array.
The code is given below.
private void Write1(APDU apdu) throws ISOException
{
    apdu.setIncomingAndReceive();
    byte[] apduBuffer = apdu.getBuffer();
    byte j = (byte) apduBuffer[4];   // Return incoming bytes, let's take 160
    Buffer1 = new byte[j];           // initialize an array with size 160
    for (byte i = 0; i < j; i++)
        Buffer1[(byte) i] = (byte) apduBuffer[5 + i];
}
It gives me error 6F 00 (which means end of file reached).
I am using:
smart card type = contact card
Java Card 2.2.2 with JCOP, using APDUs
Your code contains several problems:
As already pointed out by 'pst', you are using a signed byte value, which only works up to 127; use a short instead.
You are creating a new buffer Buffer1 on every call of your Write1 method. On Java Card there is usually no automatic garbage collection, so memory allocation should only be done once, when the applet is installed. If you only want to process the data in the APDU buffer, just use it from there. And if you want to copy data from one byte array into another, better use javacard.framework.Util.arrayCopy(..).
You are calling apdu.setIncomingAndReceive(); but ignoring the return value. The return value gives you the number of bytes of data you can read.
The following code is from the API docs and shows the common way:
short bytesLeft = (short) (buffer[ISO7816.OFFSET_LC] & 0x00FF);
if (bytesLeft < (short) 55) ISOException.throwIt(ISO7816.SW_WRONG_LENGTH);
short readCount = apdu.setIncomingAndReceive();
while (bytesLeft > 0) {
    // process bytes in buffer[5] to buffer[readCount+4];
    bytesLeft -= readCount;
    readCount = apdu.receiveBytes(ISO7816.OFFSET_CDATA);
}
short j = (short) (apdu_buffer[ISO7816.OFFSET_LC] & 0xFF);
Elaborating on pst's answer: a byte can hold 2^8, i.e. 256, distinct values. But if you are working with signed numbers, they wrap around instead. So 128 will actually be -128, 129 will be -127, and so on.
Update: While the following answer is "valid" for normal Java, please refer to Robert's answer for Java Card-specific information, as well as additional concerns/approaches.
In Java a byte has values in the range [-128, 127] so, when you say "160", that's not what the code is really giving you :)
Perhaps you'd like to use:
int j = apduBuffer[4] & 0xFF;
That "upcasts" the value apduBuffer[4] to an int while treating the original byte data as an unsigned value.
Likewise, i should also be an int (to avoid a nasty overflow-and-loop-forever bug), and the System.arraycopy method could be handy as well...
(I have no idea if that is the only/real problem -- or if the above is a viable solution on a Java Card -- but it sure is a problem and aligns with the "128 limit" mentioned.)
Happy coding.