Counting instructions in Python code to predict a PLC's scan time

I am writing Python code that will later be implemented on a PLC.
I would like to know how long it would take the PLC to run it.
So far, the documentation for the PLC's CPU gives me:
Boolean → 0.08 μs/instruction
Move Word → 1.7 μs/instruction
Real math → 2.3 μs/instruction
But I need a few more details, for example:
if A == B then: C = (D+1)*2
How would you count that?
To me:
- 2 Booleans (the if and the A == B comparison)
- 1 Move Word (moving the result into C)
- 2 Real math (one addition and one multiplication)
Is that correct ?
Thank you

Your logic makes sense to me, but I do not know why you would want to do this. The processors used in PLCs have gotten so fast that scan time is of almost no concern.
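To make the arithmetic concrete, here is a minimal Python sketch of the estimate. The per-instruction timings are the ones quoted in the question; the instruction mix is the asker's own breakdown, so treat it as an assumption rather than vendor data.

# Per-instruction timings quoted from the CPU documentation, in microseconds.
COST_US = {"boolean": 0.08, "move_word": 1.7, "real_math": 2.3}

# Instruction mix for: if A == B then: C = (D+1)*2
# (the asker's breakdown: 2 Booleans, 1 Move Word, 2 Real math)
MIX = {"boolean": 2, "move_word": 1, "real_math": 2}

estimate_us = sum(COST_US[kind] * count for kind, count in MIX.items())
print(f"Estimated execution time: {estimate_us:.2f} us")  # 0.16 + 1.7 + 4.6 = 6.46 us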

Related

Designing an Asynchronous Down Counter (13 to 1 and reset) using JK-FF with PRN and CLRN

I decided to mess around with asynchronous counters and tried to create an asynchronous down counter using JK-FFs which counts from 13 down to 1 and wraps around. However, I ran into various problems.
My RESET signal expression: Q0.Q1.Q2.Q3 + (Q1.Q2.Q3)' where Q0 is LSB and Q3 is MSB
My circuit is as follows:
However, when I simulated the circuit, it gave me the wrong results.
I hope I have described my problem in enough detail; if there is anything I missed, please correct me. Thank you very much, and have a wonderful day.
I tried reconnecting my reset signal from PRN to CLRN and vice versa, and I have also tried using T-FFs, SR-FFs, and D-FFs (the implementations were different).
I found a workaround for this problem, though I would not consider it an optimal solution. However, it does the job for now, and I look forward to receiving insights on the issues I faced.
Instead of designing the RESET signal and having the circuit loop through undefined states, I designed a MOD-13 counter which goes through all 13 states from 0000 to 1100 and maps its output to the desired values (13 down to 1, respectively).
MOD-13 counters are easier to design, as they prevent undefined states from being reached so easily, and the circuits responsible for mapping the states to the desired values are simple combinational circuits which are also easy to design (the mapping is sketched below).
Of course, there exists a much better way to do this, but I am not able to implement it at the moment. With that being said, I am always open to discussion, and I look forward to seeing what kind of errors I accumulated during implementation. Have a wonderful day!
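To illustrate the mapping block described in this workaround, here is a small Python sketch of the intended truth table, assuming state k of the MOD-13 counter should display the value 13 - k. It is just a sanity check of the mapping, not HDL.

# MOD-13 counter states 0..12 (0000..1100) mapped to the displayed values 13..1.
print("state  Q3 Q2 Q1 Q0  ->  value  D3 D2 D1 D0")
for state in range(13):
    value = 13 - state
    q = [(state >> i) & 1 for i in (3, 2, 1, 0)]   # counter outputs, MSB first
    d = [(value >> i) & 1 for i in (3, 2, 1, 0)]   # mapped display bits, MSB first
    print(f"{state:5d}   {q[0]}  {q[1]}  {q[2]}  {q[3]}   ->  {value:5d}   {d[0]}  {d[1]}  {d[2]}  {d[3]}")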

Register Ranges in HLSL?

I am currently refactoring a large chunk of old code and have finally dived into the HLSL section, where my knowledge is minimal due to being out of practice. I've come across some documentation online that specifies which registers are to be used for which purposes:
t – for shader resource views (SRV)
s – for samplers
u – for unordered access views (UAV)
b – for constant buffer views (CBV)
This part is pretty self-explanatory. If I want to create a constant buffer, I can just declare it as:
cbuffer LightBuffer: register(b0) { };
cbuffer CameraBuffer: register(b1) { };
cbuffer MaterialBuffer: register(b2) { };
cbuffer ViewBuffer: register(b3) { };
However, coming from the world of MIPS assembly, I can't help but wonder if there are finite, restricted ranges on these. For example, temporary registers are restricted to the range t0 - t7 in MIPS assembly. In the case of HLSL I haven't been able to find any documentation on this topic, as everything seems to point to assembly languages and microprocessors (such as the 8051, if you'd like a random topic to read up on).
Is there a set range for the four register types in HLSL, or do I just continue as far as needed in a sequential fashion and let the underlying assembly handle the messy details?
Note
I have answered this question partially, as I am unable to find a range for u at the moment; however, if someone has a better, more detailed answer than what I've arrived at through testing, feel free to post it and I will mark it as the correct answer. I will leave this question open until December 1st, 2018 to give others a chance to provide a better answer for future readers.
Resource slot counts for D3D11 (the D3D12 case expands on these) are specified on the Resource Limits MSDN page.
The ones of interest for you here are:
D3D11_COMMONSHADER_INPUT_RESOURCE_REGISTER_COUNT (which is t) = 128
D3D11_COMMONSHADER_SAMPLER_SLOT_COUNT (which is s) = 16
D3D11_COMMONSHADER_CONSTANT_BUFFER_HW_SLOT_COUNT (which is b) = 15, but one slot is reserved to store constant data coming from the shaders themselves (if you have a large static const array, for example)
The u case is different, as it depends on the feature level (and, to be honest, is a vendor/OS-version mess):
D3D_FEATURE_LEVEL_11_1 or greater: 64 slots.
D3D_FEATURE_LEVEL_11_0: always 8 (some cards/drivers support 64, but you need at least Windows 8 for that; it might also be available on Windows 7 with the Platform Update). I do not recall a way to test whether 64 is supported (many NVIDIA cards in the 700 range do, for example).
D3D_FEATURE_LEVEL_10_1: either 0 or 1; there is a way to check whether compute is supported.
You need to perform a feature check:
// CheckFeatureSupport also needs the size of the feature-data struct.
D3D11_FEATURE_DATA_D3D10_X_HARDWARE_OPTIONS checkData = {};
d3dDevice->CheckFeatureSupport(D3D11_FEATURE_D3D10_X_HARDWARE_OPTIONS, &checkData, sizeof(checkData));
BOOL computeSupport = checkData.ComputeShaders_Plus_RawAndStructuredBuffers_Via_Shader_4_x;
Please note that on some OS/driver versions I have had this flag return TRUE even though the feature was not supported (Intel did that on Windows 7/8), so in that case the only reliable solution was to try to create a small raw/byte-address buffer or a structured buffer and check the HRESULT.
As a side note, feature level 10 or below corresponds to quite old configurations nowadays, so except for rare scenarios you can probably safely ignore it (I leave it here for information purposes).
Since these types of questions usually involve a long wait, I tested the b register by attempting to create a cbuffer in register b51. This failed as I expected, and luckily SharpDX spat out an exception stating that the maximum is 14. So, for the sake of future readers, I tested all four register types and am posting back the ranges I found to work.
b has a range of b0 - b13.
s has a range of s0 - s15.
t has a range of t0 - t127.
u has a range I have not yet been able to determine.
At the moment, I cannot pin down a range for the u register, as I have no examples of it in my code and have never actually used it. If someone comes along who does have an example usage, feel free to test it and update this post for future readers.
I did find a contradiction to my findings above in the documentation linked in my question; it has an example using a t register above the range noted in this answer:
Texture2D a[10000] : register(t0);
Texture2D b[10000] : register(t10000);
ConstantBuffer<myConstants> c[10000] : register(b0);
Note
I would like to point out that I am using the SharpDX version of the HLSL compiler and so I am unsure if these ranges vary from compiler to compiler; I heavily doubt that they do, but you can never be too sure until you try to exceed them. GLSL may be the same due to being similar to HLSL, but it could also be very different.

Problems with implementation of a 0000-9999 counter on an FPGA (seven-segment)

EDIT 1
Okay, I couldn't post a long comment (I am new to the website, so please accept my apologies), so I am editing my earlier question. I have tried to implement multiplexing in two attempts:
- 2nd attempt
- 3rd attempt
In the 2nd attempt I tried to send the seven-segment variables of each module to the module one step above it, and when they all reach the final top module I multiplex them. There is also a clock module which generates one clock for the units module (which makes the units place change twice per second) and one clock for multiplexing (switching between the displays 500 times per second). Of course, I read that my board has a clock frequency of 50 MHz, so these clock calculations are based on that figure.
In the 3rd attempt I did the same thing in one single module. See the 2nd attempt first and then the 3rd one.
Both give errors right after synthesis, along with lots of unfamiliar warnings.
EDIT 2
I have been able to synthesize and implement the program in attempt 4 (which I am not allowed to post since my reputation is low) by using the save flag for variables, variables1, variables2, and variables3 (which were giving warnings about unused pins), but the program doesn't run on the FPGA; it simply shows the number 3777. There are also still warnings about "combinatorial loops" for some things related to some variables (I am sorry, I am new to all this Verilog stuff), but you can see all of them in attempt 3 as well.
You cannot implement counters with loops. Neither can you implement cascaded counters with nested loops.
Writing HDL is not writing software! Please read a book or tutorial on VHDL or Verilog to learn how to design basic hardware circuits. There is also the Synthesis and Simulation Guide 14.4 (UG626) from Xilinx; have a look at page 88.
Edit1:
Now it's possible to access your zip file without any Dropbox credentials, and I have looked into your project. Here are my comments on your code.
I'll number my bullets for better reference:
1. Your project has 4 mostly identical UCF files. The only difference is that they assign different anode control signals to the same pin location. This will cause errors in the post-synthesis steps (multiple nets assigned to one pin). Normally, simple projects have only one UCF file.
2. The Nexys 2 board has a 4-digit 7-segment display with common cathodes and switchable common anodes. In total these are 8 + 4 wires to control. A time-multiplexing circuit is needed to switch through every digit of your 4-digit output vector at a rate of 25 Hz < f < 1 kHz (see the divider sketch below).
3. Choosing a nested hierarchy is not so good. One major drawback is having to pass many signals from every level up to the topmost level to connect them to the FPGA pins. I would suggest a top-level module and 4 counters on level one. The top-level module can also provide the time-multiplexing circuit and the binary to 7-segment encoding.
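As a rough sketch of the divider arithmetic behind the multiplexing rate in point 2, assuming the 50 MHz board clock mentioned in the question (the example rates are arbitrary points inside the suggested 25 Hz to 1 kHz window):

# Clock-divider counts for the time-multiplexing circuit, assuming a 50 MHz board clock.
BOARD_CLOCK_HZ = 50_000_000
for digit_rate_hz in (25, 250, 1000):            # example per-digit switching rates
    divider = BOARD_CLOCK_HZ // digit_rate_hz    # count this many board-clock cycles per digit
    print(f"{digit_rate_hz:4d} Hz per digit -> divide the 50 MHz clock by {divider}")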

LuaJIT FFI callback performance

The LuaJIT FFI docs mention that calling from C back into Lua code is relatively slow and recommend avoiding it where possible:
Do not use callbacks for performance-sensitive work: e.g. consider a numerical integration routine which takes a user-defined function to integrate over. It's a bad idea to call a user-defined Lua function from C code millions of times. The callback overhead will be absolutely detrimental for performance.
For new designs avoid push-style APIs (C function repeatedly calling a callback for each result). Instead use pull-style APIs (call a C function repeatedly to get a new result). Calls from Lua to C via the FFI are much faster than the other way round. Most well-designed libraries already use pull-style APIs (read/write, get/put).
However, they don't give any sense of how much slower callbacks from C are. If I have some code that I want to speed up that uses callbacks, roughly how much of a speedup could I expect if I rewrote it to use a pull-style API? Does anyone have any benchmarks comparing implementations of equivalent functionality using each style of API?
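For concreteness, here is a minimal sketch of the two API shapes the quoted docs contrast. It is written in Python only for brevity, and the integrate_* helpers are made up for illustration; they are not part of any benchmark referenced here.

# Push style: the routine drives the loop and invokes a user callback once per sample.
def integrate_push(f, a, b, n, on_sample):
    h = (b - a) / n
    for i in range(n):
        on_sample(f(a + i * h) * h)

# Pull style: the caller drives the loop and pulls each sample in turn.
def integrate_pull(f, a, b, n):
    h = (b - a) / n
    for i in range(n):
        yield f(a + i * h) * h

samples = []
integrate_push(lambda x: x * x, 0.0, 1.0, 1000, samples.append)
push_total = sum(samples)
pull_total = sum(integrate_pull(lambda x: x * x, 0.0, 1.0, 1000))
print(push_total, pull_total)  # both approximate the integral of x^2 over [0, 1]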
On my computer, a function call from LuaJIT into C has an overhead of 5 clock cycles (notably, just as fast as calling a function via a function pointer in plain C), whereas calling from C back into Lua has a 135-cycle overhead, 27x slower. That being said, a program that required a million calls from C into Lua would only add ~100 ms of overhead to its runtime; while it might be worth it to avoid FFI callbacks in a tight loop that operates on mostly in-cache data, the overhead of callbacks invoked, say, once per I/O operation is probably not going to be noticeable compared to the overhead of the I/O itself.
$ luajit-2.0.0-beta10 callback-bench.lua
C into C 3.344 nsec/call
Lua into C 3.345 nsec/call
C into Lua 75.386 nsec/call
Lua into Lua 0.557 nsec/call
C empty loop 0.557 nsec/call
Lua empty loop 0.557 nsec/call
$ sysctl -n machdep.cpu.brand_string
Intel(R) Core(TM) i5-3427U CPU @ 1.80GHz
Benchmark code: https://gist.github.com/3726661
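As a rough sanity check of the cycle arithmetic in that answer, here is a small Python sketch assuming the 1.80 GHz clock listed in the benchmark output; the 5-cycle and 135-cycle figures are the answer's own numbers.

# Convert the quoted per-call cycle overheads into wall-clock time at 1.80 GHz.
CLOCK_HZ = 1.80e9
CALLS = 1_000_000
for label, cycles in (("Lua into C", 5), ("C into Lua", 135)):
    seconds_per_call = cycles / CLOCK_HZ
    print(f"{label}: ~{seconds_per_call * 1e9:.1f} ns/call, "
          f"~{seconds_per_call * CALLS * 1e3:.0f} ms per {CALLS:,} calls")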
Since this issue (and LJ in general) has been the source of great pain for me, I'd like to toss some extra information into the ring, in hopes that it may assist someone out there in the future.
'Callbacks' are Not Always Slow
The LuaJIT FFI documentation, when it says 'callbacks are slow,' is referring very specifically to the case of a callback created by LuaJIT and passed through the FFI to a C function that expects a function pointer. This is completely different from other callback mechanisms; in particular, it has entirely different performance characteristics compared to calling a standard lua_CFunction that uses the API to invoke a callback.
With that said, the real question is then: when do we use the Lua C API to implement logic that involves pcall et al, vs. keeping everything in Lua? As always with performance, but especially in the case of a tracing JIT, one must profile (-jp) to know the answer. Period.
I have seen situations that looked similar yet fell on opposite ends of the performance spectrum; that is, I have encountered code (not toy code, but rather production code in the context of writing a high-perf game engine) that performs better when structured as Lua-only, as well as code (that seems structurally-similar) that performs better upon introducing a language boundary via calling a lua_CFunction that uses luaL_ref to maintain handles to callbacks and callback arguments.
Optimizing for LuaJIT without Measurement is a Fool's Errand
Tracing JITs are already hard to reason about, even if you're an expert in static language perf analysis. They take everything you thought you knew about performance and shatter it to pieces. If the concept of compiling recorded IR rather than compiling functions doesn't already annihilate one's ability to reason about LuaJIT performance, then the fact that calling into C via the FFI is more-or-less free when successfully JITed, yet potentially an order-of-magnitude more expensive than an equivalent lua_CFunction call when interpreted...well, this for sure pushes the situation over the edge.
Concretely, a system that you wrote last week that vastly out-performed a C equivalent may tank this week because you introduced an NYI in trace-proximity to said system, which may well have come from a seemingly-orthogonal region of code, and now your system is falling back and obliterating performance. Even worse, perhaps you're well-aware of what is and isn't an NYI, but you added just enough code to the trace proximity that it exceeded the JIT's max recorded IR instructions, max virtual registers, call depth, unroll factor, side trace limit...etc.
Also, note that, while 'empty' benchmarks can sometimes give a very general insight, it is even more important with LJ (for the aforementioned reasons) that code be profiled in context. It is very, very difficult to write representative performance benchmarks for LuaJIT, since traces are, by their nature, non-local. When using LJ in a large application, these non-local interactions become tremendously impactful.
TL;DR
There is exactly one person on this planet who really and truly understands the behavior of LuaJIT. His name is Mike Pall.
If you are not Mike Pall, do not assume anything about LJ behavior and performance. Use -jv (verbose; watch for NYIs and fallbacks), -jp (profiler! Combine with jit.zone for custom annotations; use -jp=vf to see what percentage of your time is being spent in the interpreter due to fallbacks), and, when you really need to know what's going on, -jdump (trace IR & ASM). Measure, measure, measure. Take generalizations about LJ performance characteristics with a grain of salt unless they come from the man himself or you've measured them in your specific usage case (in which case, after all, it's not a generalization). And remember, the right solution might be all in Lua, it might be all in C, it might be Lua -> C through the FFI, it might be Lua -> lua_CFunction -> Lua... you get the idea.
Coming from someone who has been fooled time and time again into thinking that he understood LuaJIT, only to be proven wrong the following week, I sincerely hope this information helps someone out there :) Personally, I simply no longer make 'educated guesses' about LuaJIT. My engine outputs the jv and jp logs for every run, and they are the 'word of God' for me with respect to optimization.
Two years later, I redid the benchmarks from Miles' answer, for the following reasons:
To see if they have improved with newer hardware and newer LuaJIT versions
To add tests for functions with parameters and return values. The callback documentation mentions that, apart from the call overhead, parameter marshalling also matters:
[...] the C to Lua transition itself has an unavoidable cost, similar to a lua_call() or lua_pcall(). Argument and result marshalling add to that cost [...]
To check the difference between PUSH style and PULL style.
My results, on an Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz:
operation reps time(s) nsec/call
C into Lua set_v 10000000 0.498 49.817
C into Lua set_i 10000000 0.662 66.249
C into Lua set_d 10000000 0.681 68.143
C into Lua get_i 10000000 0.633 63.272
C into Lua get_d 10000000 0.650 64.990
Lua into C call(void) 100000000 0.381 3.807
Lua into C call(int) 100000000 0.381 3.815
Lua into C call(double) 100000000 0.415 4.154
Lua into Lua 100000000 0.104 1.039
C empty loop 1000000000 0.695 0.695
Lua empty loop 1000000000 0.693 0.693
PUSH style 1000000 0.158 158.256
PULL style 1000000 0.207 207.297
The code for these results is here.
Conclusion: C callbacks into Lua have a really big overhead when used with parameters (which is what you almost always do), so they really shouldn't be used in performance-critical spots. You can use them for I/O or user input, though.
I am a bit surprised there is so little difference between PUSH/PULL styles, but maybe my implementation is not among the best.
There is a significant performance difference, as shown by these results:
LuaJIT 2.0.0-beta10 (Windows x64)
JIT: ON CMOV SSE2 SSE3 SSE4.1 fold cse dce fwd dse narrow loop abc sink fuse
n Push Time Pull Time Push Mem Pull Mem
256 0.000333 0 68 64
4096 0.002999 0.001333 188 124
65536 0.037999 0.017333 2108 1084
1048576 0.588333 0.255 32828 16444
16777216 9.535666 4.282999 524348 262204
The code for this benchmark can be found here.

Which is more efficient: binary & or logical &&

When all values are boolean, doesn't the binary & operate on more bits than the logical &&?
For example
if ( foo == "Yes" & 2 != 0 & 6 )
or
if ( foo == "Yes" && 2 != 0 && 6 )
(thinking PHP specifically, but any language will do)
How many bits it operates on is irrelevant: the processor can do it in a single instruction anyway (normally). The crucial difference is the short-circuiting of &&: for anything on the right-hand side that is not trivial to evaluate, && is faster, assuming a language where && works this way, like C or Java (and apparently PHP, though I know too little about it to be sure).
On the other hand, the branch that this short-circuit requires might also slow things down; however, I'm quite sure today's compilers are smart enough to optimize that away.
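To make the short-circuit point concrete, here is a tiny sketch in Python rather than PHP (the semantics carry over: 'and' short-circuits, while '&' always evaluates both operands):

def expensive_check():
    print("expensive_check evaluated")
    return True

flag = False
logical = flag and expensive_check()   # short-circuits: expensive_check() is never called
bitwise = flag & expensive_check()     # evaluates both operands: expensive_check() always runs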
It depends on the logic of your code.
It's better to use && when you mean a logical AND and & when you mean a bitwise AND; that is what these operators are for. Don't optimize prematurely, because:
- The speed of your application might be fine without optimization
- You might optimize one part of your application and gain only an irrelevant speed increase
- Your code will be more complicated, and it will be harder to work on in later versions
- If you have a speed issue, it will probably not be caused by this choice of operator
- Programming is difficult enough in itself, and this kind of hack will make your code unreadable; your teammates will not understand it, and the overall efficiency of your team will be reduced by the resulting ugly code