Pymodbus: write_coil also writes in discrete input register? - modbus

I have a question regarding the pymodbus module and its functionality. My server works just fine, and so, I assume, does the datastore I implemented.
Here is the code for my server and datastore:
import asyncio

from pymodbus.datastore import (
    ModbusSequentialDataBlock,
    ModbusServerContext,
    ModbusSlaveContext,
)
from pymodbus.server import StartTcpServer
from pymodbus.device import ModbusDeviceIdentification
from pymodbus.version import version

def run_full_sync_server():
    print("Creating sequential Datastore.")
    # Creating continuing Datablock without gaps.
    datablock = ModbusSequentialDataBlock(0, [0] * 20)  # starting address, 20 registers initialized to 0
    print("Creating slave Context.")
    # Creating one slave context. Allows the server to respond to requests with its unit id.
    slave_context = ModbusSlaveContext(
        di=datablock,   # 'di' - Discrete Input Initializer
        co=datablock,   # 'co' - Coils Initializer
        hr=datablock,   # 'hr' - Holding Register Initializer
        ir=datablock,   # 'ir' - Input Registers Initializer
        unit=1,         # SlaveID
        zero_mode=False,
    )
    print("Creating Server Identity.")
    # Creates Server Information.
    identity = ModbusDeviceIdentification(
        info_name={
            "VendorName": "XXX",
            "ProductCode": "ModbusKassow",
            "VendorUrl": "XXX",
            "ProductName": "Synchronous Py",
            "ModelName": "Pymodbus Server",
            "MajorMinorRevision": version.short(),
        }
    )
    print("Starting Server.")
    server_context = ModbusServerContext(slaves=slave_context, single=True)
    server = StartTcpServer(context=server_context, identity=identity, address=("0.0.0.0", 502))
    # return server

if __name__ == "__main__":
    asyncio.create_task(run_full_sync_server())
I am working with Spyder at the moment, therefore I need a separate asyncio instance to get the server running.
As for writing coils/registers, the module provides some client functions, but whenever I use write_coil, the exact same values appear in my discrete inputs datastore (run the script two times).
from pymodbus.client import ModbusTcpClient

client = ModbusTcpClient('127.0.0.1', port=502)
client.connect()  # establish the TCP connection before issuing requests

discrete_inputs = client.read_discrete_inputs(0, 14)  # start reading from address 0
discrete_inputs.setBit(3, 1)  # set bit at address 3 to 1 (only modifies the local response object)
discrete_inputs.setBit(1, 1)
print(discrete_inputs.getBit(0))
print(discrete_inputs.getBit(1))
print(discrete_inputs.bits[:])
print()

client.write_coil(10, True)
client.write_coil(12, True)
client.write_coil(15, True)
reading_coil = client.read_coils(0, 14)  # start reading from address 0
print(reading_coil.bits[:])
I thought that the discrete inputs and the coils were separate datastores because they were initialized separately?
Also, why does discrete_inputs.bits[:] return a list of only 16 bool values, when in the server script I initialized the datastore with a list of 20 values?
Whenever I run the script twice, my coil values will appear in the discrete input datastore:
[False, True, False, True, False, False, False, False, False, False, True, False, True, False, False, False]
I know that each register/coil can only store 16 bits, which corresponds to the list of 16 values I get from the read function.
I tried the code above and was expecting the datastores to be separate.
I was also assuming that maybe I am only writing into one register, because I get exactly 16 bits returned.
If that's the case, however, I do not know how to access the other coil addresses.

I thought that the discrete inputs and discrete coils were separate datastores because they were initialized separately?
They are; but you are creating a single store and then assigning that same store to each element. Try something like (showing two ways of accomplishing this):
datablock = ModbusSequentialDataBlock(0, [0] * 20)
datablock2 = ModbusSequentialDataBlock(0, [0] * 20)

slave_context = ModbusSlaveContext(
    di=datablock,                               # 'di' - Discrete Input Initializer
    co=ModbusSequentialDataBlock(0, [0] * 20),  # 'co' - Coils Initializer
    hr=datablock2,                              # 'hr' - Holding Register Initializer
    ir=ModbusSequentialDataBlock(0, [0] * 20),  # 'ir' - Input Registers Initializer
    unit=1,                                     # SlaveID
    zero_mode=False,
)
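As a quick check (a minimal sketch, assuming the fixed server above is running on localhost), a coil write should no longer show up in the discrete inputs:
from pymodbus.client import ModbusTcpClient

client = ModbusTcpClient('127.0.0.1', port=502)
client.connect()
client.write_coil(10, True)
print(client.read_coils(0, 14).bits)            # bit 10 is True
print(client.read_discrete_inputs(0, 14).bits)  # all False: the stores are now independent
client.close()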
I know that each register/coil can only store 16 bits, which corresponds to the list of 16 values I get from the read function.
I think you may be confusing the Modbus protocol and the PyModbus implementation. A Modbus coil is one bit (The PyModbus datastore may end up storing that one bit in a 16-bit variable but that is an implementation detail).
As per the docs:
The coils in the response message are packed as one coil per bit of the data field.
Status is indicated as 1= ON and 0= OFF. The LSB of the first data byte contains the output addressed in the query. The other coils follow toward the high order end of this byte, and from low order to high order in subsequent bytes.
If the returned output quantity is not a multiple of eight, the remaining bits in the final data byte will be padded with zeros (toward the high order end of the byte). The Byte Count field specifies the quantity of complete bytes of data.
You are requesting 14 bits (client.read_discrete_inputs(0, 14)), so the output will be rounded up to 2 bytes (16 bits).
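For illustration, the returned bit count is just the requested count rounded up to whole bytes (a sketch of the arithmetic, not of pymodbus internals):
import math

requested = 14
byte_count = math.ceil(requested / 8)  # 2 complete bytes on the wire
print(byte_count * 8)                  # 16 bit slots in the decoded response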

Related

Trying to use Distributed data parallel on GANs but getting runtime error about an inplace operation

I am trying to train a GAN on a machine with 3 GPUs using distributed data parallel.
Before wrapping my model in DDP everything works fine, but when I wrap it, it gives me the following runtime error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [128]] is at version 5; expected version 4 instead.
I cloned every tensor related to the gradient to find the inplace operation (if there is any) but I could not find it.
The part of the code with the problem is as follows:
import numpy as np
import torch
import torch.distributed as dist
from torch.autograd import Variable
from torch.nn.parallel import DistributedDataParallel as DDP

# setup(), get_dataloader(), Generator, Discriminator and weights_init_normal
# are defined elsewhere in the project.

Tensor = torch.cuda.FloatTensor

# ----------
#  Training
# ----------
def train_gan(rank, world_size, opt):
    print(f"Running basic DDP example on rank {rank}.")
    setup(rank, world_size)
    if rank == 0:
        get_dataloader(rank, opt)
    dist.barrier()
    print(f"Rank {rank}/{world_size} training process passed data download barrier.\n")
    dataloader = get_dataloader(rank, opt)

    # Loss function
    adversarial_loss = torch.nn.BCELoss()

    # Initialize generator and discriminator
    generator = Generator()
    discriminator = Discriminator()

    # Initialize weights
    generator.apply(weights_init_normal)
    discriminator.apply(weights_init_normal)

    generator.to(rank)
    discriminator.to(rank)
    generator_d = DDP(generator, device_ids=[rank])
    discriminator_d = DDP(discriminator, device_ids=[rank])

    # Optimizers
    # Since we are computing the average of several batches at once (an effective batch size of
    # world_size * batch_size) we scale the learning rate to match.
    optimizer_G = torch.optim.Adam(generator_d.parameters(), lr=opt.lr * opt.world_size, betas=(opt.b1, opt.b2))
    optimizer_D = torch.optim.Adam(discriminator_d.parameters(), lr=opt.lr * opt.world_size, betas=(opt.b1, opt.b2))

    losses = []
    for epoch in range(opt.n_epochs):
        for i, (imgs, _) in enumerate(dataloader):
            # Adversarial ground truths
            valid = Variable(Tensor(imgs.shape[0], 1).fill_(1.0), requires_grad=False).to(rank)
            fake = Variable(Tensor(imgs.shape[0], 1).fill_(0.0), requires_grad=False).to(rank)

            # Configure input
            real_imgs = Variable(imgs.type(Tensor)).to(rank)

            # -----------------
            #  Train Generator
            # -----------------
            optimizer_G.zero_grad()

            # Sample noise as generator input
            z = Variable(Tensor(np.random.normal(0, 1, (imgs.shape[0], opt.latent_dim)))).to(rank)

            # Generate a batch of images
            gen_imgs = generator_d(z)

            # Loss measures generator's ability to fool the discriminator
            g_loss = adversarial_loss(discriminator_d(gen_imgs), valid)
            g_loss.backward()
            optimizer_G.step()

            # ---------------------
            #  Train Discriminator
            # ---------------------
            optimizer_D.zero_grad()

            # Measure discriminator's ability to classify real from generated samples
            real_loss = adversarial_loss(discriminator_d(real_imgs), valid)
            fake_loss = adversarial_loss(discriminator_d(gen_imgs.detach()), fake)
            d_loss = ((real_loss + fake_loss) / 2).to(rank)
            d_loss.backward()
            optimizer_D.step()
I encountered a similar error when trying to train a GAN with DistributedDataParallel.
I noticed the problem was coming from BatchNorm layers in my discriminator.
Indeed, DistributedDataParallel synchronizes the batchnorm parameters at each forward pass (see the doc), thereby modifying the variable inplace, which causes problems if you have multiple forward passes in a row.
Converting my BatchNorm layers to SyncBatchNorm did the trick for me:
discriminator = torch.nn.SyncBatchNorm.convert_sync_batchnorm(discriminator)
discriminator = DDP(discriminator, device_ids=[rank])
You probably want to do it anyway when using DistributedDataParallel.
Alternatively, if you don't want to use SyncBatchNorm, you can set the broadcast_buffers parameter to False, but I don't think you really want to do that, as it means your batch norm stats will not be synchronized among processes.
discriminator = DDP(discriminator, device_ids=[rank], broadcast_buffers=False)
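Putting it together, here is a sketch of the full pattern using the names from the question (convert the layers before wrapping the module in DDP):
import torch
from torch.nn.parallel import DistributedDataParallel as DDP

# assumes `discriminator` and `rank` as in the question's train_gan()
discriminator = torch.nn.SyncBatchNorm.convert_sync_batchnorm(discriminator)
discriminator = discriminator.to(rank)
discriminator_d = DDP(discriminator, device_ids=[rank])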

Transferring arrays/classes/records between locales

In a typical N-Body simulation, at the end of each epoch, each locale would need to share its own portion of the world (i.e. all bodies) with the rest of the locales. I am working on this with a local-view approach (i.e. using on Loc statements). I encountered some strange behaviours that I couldn't make sense of, so I decided to make a test program, in which things got more complicated. Here's the code to replicate the experiment.
proc log(args...?n) {
  writeln("[locale = ", here.id, "] [", datetime.now(), "] => ", args);
}

const max: int = 50000;

record stuff {
  var x1: int;
  var x2: int;

  proc init() {
    this.x1 = here.id;
    this.x2 = here.id;
  }
}

class ctuff {
  var x1: int;
  var x2: int;

  proc init() {
    this.x1 = here.id;
    this.x2 = here.id;
  }
}

class wrapper {
  // The point is that total size (in bytes) of data in `r`, `c` and `a` are the same here,
  // because the record and the class hold two ints per index.
  var r: [{1..max / 2}] stuff;
  var c: [{1..max / 2}] owned ctuff?;
  var a: [{1..max}] int;

  proc init() {
    this.a = here.id;
  }
}

proc test() {
  var wrappers: [LocaleSpace] owned wrapper?;
  coforall loc in LocaleSpace {
    on Locales[loc] {
      wrappers[loc] = new owned wrapper();
    }
  }
  // rest of the experiment further down.
}
Two interesting behaviours happen here.
1. Moving data
Now, each instance of wrapper in the array wrappers should live on its own locale. Specifically, the references (wrappers) will live on locale 0, but the internal data (r, c, a) should live on the respective locale. So we try to move some data from locale 1 to locale 3, as such:
on Locales[3] {
  var timer: Timer;
  timer.start();
  var local_stuff = wrappers[1]!.r;
  timer.stop();
  log("get r from 1", timer.elapsed());
  log(local_stuff);
}

on Locales[3] {
  var timer: Timer;
  timer.start();
  var local_c = wrappers[1]!.c;
  timer.stop();
  log("get c from 1", timer.elapsed());
}

on Locales[3] {
  var timer: Timer;
  timer.start();
  var local_a = wrappers[1]!.a;
  timer.stop();
  log("get a from 1", timer.elapsed());
}
Surprisingly, my timings show that:
Regardless of the size (const max), the time to send the array of ints and the array of records stays constant, which doesn't make sense to me. I even checked with chplvis, and the size of the GET actually increases, but the time stays the same.
The time to send the class field increases with the size, which makes sense, but it is quite slow and I don't know which case to trust here.
2. Querying the locales directly.
To demystify the problem, I also query the .locale.id of some variables directly. First, we query the data, which we expect to live in locale 2, from locale 2:
on Locales[2] {
  var wrappers_ref = wrappers[2]!; // This is always 1 GET from 0, okay.
  log("array",
      wrappers_ref.a.locale.id,
      wrappers_ref.a[1].locale.id
  );
  log("record",
      wrappers_ref.r.locale.id,
      wrappers_ref.r[1].locale.id,
      wrappers_ref.r[1].x1.locale.id,
  );
  log("class",
      wrappers_ref.c.locale.id,
      wrappers_ref.c[1]!.locale.id,
      wrappers_ref.c[1]!.x1.locale.id
  );
}
And the result is:
[locale = 2] [2020-12-26T19:36:26.834472] => (array, 2, 2)
[locale = 2] [2020-12-26T19:36:26.894779] => (record, 2, 2, 2)
[locale = 2] [2020-12-26T19:36:27.023112] => (class, 2, 2, 2)
Which is expected. Yet, if we query the locale of the same data on locale 1, then we get:
[locale = 1] [2020-12-26T19:34:28.509624] => (array, 2, 2)
[locale = 1] [2020-12-26T19:34:28.574125] => (record, 2, 2, 1)
[locale = 1] [2020-12-26T19:34:28.700481] => (class, 2, 2, 2)
This implies that wrappers_ref.r[1].x1 lives on locale 1, even though it should clearly be on locale 2. My only guess is that by the time .locale.id is executed, the data (i.e. the .x1 of the record) has already been moved to the querying locale (1).
So, all in all, the second part of the experiment led to a secondary question, whilst not answering the first part.
NOTE: all experiments are run with -nl 4 in the chapel/chapel-gasnet docker image.
Good observations, let me see if I can shed some light.
As an initial note, any timings taken with the gasnet Docker image should be taken with a grain of salt since that image simulates the execution across multiple nodes using your local system rather than running each locale on its own compute node as intended in Chapel. As a result, it is useful for developing distributed memory programs, but the performance characteristics are likely to be very different than running on an actual cluster or supercomputer. That said, it can still be useful for getting coarse timings (e.g., your "this is taking a much longer time" observation) or for counting communications using chplvis or the CommDiagnostics module.
With respect to your observations about timings, I also observe that the array-of-class case is much slower, and I believe I can explain some of the behaviors:
First, it's important to understand that any cross-node communications can be characterized using a formula like alpha + beta*length. Think of alpha as representing the basic cost of performing the communication, independent of length. This represents the cost of calling down through the software stack to get to the network, putting the data on the wire, receiving it on the other side, and getting it back up through the software stack to the application there. The precise value of alpha will depend on factors like the type of communication, choice of software stack, and physical hardware. Meanwhile, think of beta as representing the per-byte cost of the communication where, as you intuit, longer messages necessarily cost more because there's more data to put on the wire, or potentially to buffer or copy, depending on how the communication is implemented.
In my experience, the value of alpha typically dominates beta for most system configurations. That's not to say that it's free to do longer data transfers, but that the variance in execution time tends to be much smaller for longer vs. shorter transfers than it is for performing a single transfer versus many. As a result, when choosing between performing one transfer of n elements vs. n transfers of 1 element, you'll almost always want the former.
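To put rough numbers on that, here is a small sketch with purely illustrative values for alpha and beta (not measured Chapel figures):
# purely illustrative: assumed alpha/beta, 50,000 8-byte elements
alpha = 1e-6   # per-message overhead, seconds
beta = 1e-9    # per-byte cost, seconds
n = 50_000

one_bulk_transfer = alpha + beta * (n * 8)  # ~0.0004 s: alpha paid once
n_small_transfers = n * (alpha + beta * 8)  # ~0.05 s: alpha paid n times
print(one_bulk_transfer, n_small_transfers)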
To investigate your timings, I bracketed your timed code portions with calls to the CommDiagnostics module as follows:
resetCommDiagnostics();
startCommDiagnostics();
...code to time here...
stopCommDiagnostics();
printCommDiagnosticsTable();
and found, as you did with chplvis, that the number of communications required to localize the array of records or array of ints was constant as I varied max, for example:
locale   get   execute_on
0        0     0
1        0     0
2        0     0
3        21    1
This is consistent with what I'd expect from the implementation: That for an array of value types, we perform a fixed number of communications to access array meta-data, and then communicate the array elements themselves in a single data transfer to amortize the overheads (avoid paying multiple alpha costs).
In contrast, I found that the number of communications for localizing the array of classes was proportional to the size of the array. For example, for the default value of 50,000 for max, I saw:
locale   get     put     execute_on
0        0       0       0
1        0       0       0
2        0       0       0
3        25040   25000   1
I believe the reason for this distinction relates to the fact that c is an array of owned classes, in which only a single class variable can "own" a given ctuff object at a time. As a result, when copying the elements of array c from one locale to another, you're not just copying raw data, as with the record and integer cases, but also performing an ownership transfer per element. This essentially requires setting the remote value to nil after copying its value to the local class variable. In our current implementation, this seems to be done using a remote get to copy the remote class value to the local one, followed by a remote put to set the remote value to nil, hence, we have a get and put per array element, resulting in O(n) communications rather than O(1) as in the previous cases. With additional effort, we could potentially have the compiler optimize this case, though I believe it will always be more expensive than the others due to the need to perform the ownership transfer.
I tested the hypothesis that owned classes were resulting in the additional overhead by changing your ctuff objects from being owned to unmanaged, which removes any ownership semantics from the implementation. When I do this, I see a constant number of communications, as in the value cases:
locale   get   execute_on
0        0     0
1        0     0
2        0     0
3        21    1
I believe this represents the fact that once the language has no need to manage the ownership of the class variables, it can simply transfer their pointer values in a single transfer again.
Beyond these performance notes, it's important to understand a key semantic difference between classes and records when choosing which to use. A class object is allocated on the heap, and a class variable is essentially a reference or pointer to that object. Thus, when a class variable is copied from one locale to another, only the pointer is copied, and the original object remains where it was (for better or worse). In contrast, a record variable represents the object itself, and can be thought of as being allocated "in place" (e.g., on the stack for a local variable). When a record variable is copied from one locale to the other, it's the object itself (i.e., the record's fields' values) which are copied, resulting in a new copy of the object itself. See this SO question for further details.
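As a rough analogy in Python (not Chapel semantics, just the reference-vs-value distinction):
import copy

# "class-like": copying the variable copies only the reference
a = [1, 2]
b = a
b[0] = 99
print(a)   # [99, 2] -- both names refer to the same object

# "record-like": copying the variable copies the object itself
r1 = [1, 2]
r2 = copy.copy(r1)
r2[0] = 99
print(r1)  # [1, 2] -- r1 is untouched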
Moving on to your second observation, I believe that your interpretation is correct, and that this may be a bug in the implementation (I need to stew on it a bit more to be confident). Specifically, I think you're correct that what's happening is that wrappers_ref.r[1].x1 is being evaluated, with the result being stored in a local variable, and that the .locale.id query is being applied to the local variable storing the result rather than the original field. I tested this theory by taking a ref to the field and then printing locale.id of that ref, as follows:
ref x1loc = wrappers_ref.r[1].x1;
...x1loc.locale.id...
and that seemed to give the right result. I also looked at the generated code which seemed to indicate that our theories were correct. I don't believe that the implementation should behave this way, but need to think about it a bit more before being confident. If you'd like to open a bug against this on Chapel's GitHub issues page, for further discussion there, we'd appreciate that.

Variable sized i2c reads Raspberry

I am trying to interface an A71CH with a Raspberry Pi 3 over i2c. The device requires repeated starts, and when a read request is made, the first byte the device sends is always the length of the whole message. When making a read, instead of reading a fixed-size message, I want to read the first byte and then send a NACK signal to the slave once the number of bytes indicated by that first byte has been received. I used the following code but could not get the results I expected, because it only reads one byte and then sends a NACK signal, as you can see below.
struct i2c_rdwr_ioctl_data packets;
struct i2c_msg messages[2];
int r = 0;
int i = 0;

if (bus != I2C_BUS_0) // change if bus 0 is not the correct bus
{
    printf("axI2CWriteRead on wrong bus %x (addr %x)\n", bus, addr);
}

messages[0].addr = axSmDevice_addr;
messages[0].flags = 0;
messages[0].len = txLen;
messages[0].buf = pTx;

// NOTE:
// By setting the 'I2C_M_RECV_LEN' bit in 'messages[1].flags' one ensures
// the I2C Block Read feature is used.
messages[1].addr = axSmDevice_addr;
messages[1].flags = I2C_M_RD | I2C_M_RECV_LEN | I2C_M_IGNORE_NAK;
messages[1].len = 256;
messages[1].buf = pRx;
messages[1].buf[0] = 1;

// NOTE:
// By passing the two message structures via the packets structure as
// a parameter to the ioctl call one ensures a Repeated Start is triggered.
packets.msgs = messages;
packets.nmsgs = 2;

// Send the request to the kernel and get the result back
r = ioctl(axSmDevice, I2C_RDWR, &packets);
Is there any way that allows me to make variable sized i2c reads ? What can I do to make it work ? Thanks for looking.
The Raspberry Pi doesn't support SMBus Block Reads; the only way to overcome this is to bit-bang the protocol on the GPIO pins. As @Ian Abbott mentioned above, I managed to modify the bbI2CZip function to fit my needs by checking the first byte of the received message and updating the read length afterwards.
I had a similar issue with the rpi3. I wanted to read exactly 32 bytes of data from a register on a slave device, but i2c_smbus_read_block_data() was returning -71 and errno 71 EPROTO.
The solution was to use i2c_smbus_read_i2c_block_data() instead of i2c_smbus_read_block_data().
/* Until kernel 2.6.22, the length is hardcoded to 32 bytes. If you
ask for less than 32 bytes, your code will only work with kernels
2.6.23 and later. */
extern __s32 i2c_smbus_read_i2c_block_data(int file, __u8 command, __u8 length,
__u8 *values);

In UVM RAL, a reg defined with no reset value: set/update of '0' data on it won't trigger a bus transaction

In the RAL file, I have something like:
class ral_reg_AAA_0 extends uvm_reg;
    rand uvm_reg_field R2Y;

    constraint R2Y_default {
    }

    function new(string name = "AAA_0");
        super.new(name, 32, build_coverage(UVM_NO_COVERAGE));
    endfunction: new

    virtual function void build();
        this.R2Y = uvm_reg_field::type_id::create("R2Y",,get_full_name());
        this.R2Y.configure(this, 12, 4, "RW", 0, 12'h0, 0, 1, 1);
    endfunction: build

    `uvm_object_utils(ral_reg_AAA_0)
endclass : ral_reg_AAA_0
You can see that R2Y is configured with has_reset = 0; in the real RTL its value is 'X' by default.
But if I use the set/update mechanism to write this register with data 0, which equals the reset value of R2Y (even though has_reset = 0), RAL seems to treat m_mirrored == m_desired, so there is no bus transaction for this register access. For example:
env.regmodel.AAA_0.R2Y.set(0);
env.regmodel.AAA_0.update(status,UVM_FRONTDOOR);
Does that make sense? I thought that no matter which value I set on these kinds of regs, a bus transaction should always happen.
PS: the mirrored and desired values are 2-state vectors, and even if a reg field is configured with no reset value, its initial m_mirrored value is still 0. If the RTL reset value is 'x' and, for instance, there are 10 regs in the design and I want to randomly pick any number of them and write them with random values (0 is of course also a legal value), it seems I will miss the registers that get a '0' value in this case.
I am using a workaround for now: flushing all regs to 0 with a 'write' RAL method. It meets my expectation, with some extra overhead on the bus.
The internal representations of the mirrored and desired values are 2-state vectors and are stored on a per field basis. This means that when created, your R2Y field's mirrored value is 0. By setting the desired value to 0, the values are still the same, which is why no bus transaction is started. If you want to force a bus transaction, just use the write(...) method:
env.regmodel.AAA_0.write(status, 0); // this will write the value '0' to the register
If you still want to use set(...) to play with the register fields, then you could try something like:
env.regmodel.AAA_0.R2Y.set(0);
env.regmodel.AAA_0.write(status, env.regmodel.AAA_0.get());

PyBrain how to interpret the results from net.activate?

I've trained a network on PyBrain for the purpose of classification and am ready to fire away with specific input. However, when I do
classes = ['apple', 'orange', 'peach', 'banana']
data = ClassificationDataSet(len(input), 1, nb_classes=len(classes), class_labels=classes)
data._convertToOneOfMany( ) # recommended by PyBrain
fnn = buildNetwork( data.indim, 5, data.outdim, outclass=SoftmaxLayer )
trainer = BackpropTrainer( fnn, dataset=data, momentum=m, verbose=True, weightdecay=wd)
trainer.trainUntilConvergence(maxEpochs=80)
# stop training and start using my trained network here
output = fnn.activate(input)
As expected, I get a numeric value for "output", but is there a way to determine the predicted class label directly? Even if there's not one, how can I map the value of "output" to my class label? Thank you for your help.
When you say you get a numeric value for "output", do you mean a scalar (that is, not an array)? From my understanding, you should have gotten an array of four values (i.e. as many as you have output classes). The biggest value in that array corresponds to the index of the class. I don't know if PyBrain provides a utility function to extract that, but you can do it like this:
class_index = max(xrange(len(output)), key=output.__getitem__)
class_name = classes[class_index]
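Alternatively, assuming output is a NumPy array (which activate() returns), numpy's argmax does the same thing in one step:
import numpy as np

class_index = int(np.argmax(output))
class_name = classes[class_index]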
Incidentally, you omitted the step in which you actually fill the data in the dataset.