I am working on a map-matching/trajectory matching project. What I am unsure after reading a number of research papers is what is the most efficient data structure to store the road network (described by a weighted directed graph) so as to facilitate real time searching (Fast). I am getting things like grids, MTrees, Quadtrees...do I need a database in the backend for these. I am working on MATLAB at the moment but can shift languages. What are the programming languages used in actual satellite navigators.
Help will be much appreciated
There are many such index structures like K-D Trees, R-Trees etc. that can be built on the data that you store on the database. You can store your data on SQL Server and use these indexes. You might also want to take a look at SQL Server Spatial tools that helps to perform Spatial queries on your data.
Are there any technologies that can take raw semi-structured, schema-less big data input (say from HDFS or S3), perform near-real-time computation on it, and generate output that can be queried or plugged in to BI tools?
If not, is anyone at least working on it for release in the next year or two?
There are some solutions with big semistructured input and queried output, but they are usually
unique
expensive
secret enough
If you are able to avoid direct computations using neural networks or expert systems, you will be close enough to low latency system. All you need is a team of brilliant mathematicians to make a model of your problem, a team of programmers to realize it in code and some cash to buy servers and get needed input/output channels for them.
Have you taken a look at Splunk? We use it to analyze Windows Event Logs and Splunk does an excellent job indexing this information to allow for fast querying of any string that appears in the data.
My Data
It's primarily monitoring data, passed in the form of Timestamp: Value, for each monitored value, on each monitored appliance. It's regularly collected over many appliances and many monitored values.
Additionally, it has the quirky feature of many of these data values being derived at the source, with the calculation changing from time to time. This means that my data is effectively versioned, and I need to be able to simply call up only data from the most recent version of the calculation. Note: This is not versioning where the old values are overwritten. I simply have timestamp cutoffs, beyond which the data changes its meaning.
My Usage
Downstream, I'm going to have various undefined data mining/machine learning uses for the data. It's not really clear yet what those uses are, but it is clear that I will be writing all of the downstream code in Python. Also, we are a very small shop, so I can really only deal with so much complexity in setup, maintenance, and interfacing to downstream applications. We just don't have that many people.
The Choice
I am not allowed to use a SQL RDBMS to store this data, so I have to find the right NoSQL solution. Here's what I've found so far:
Cassandra
Looks totally fine to me, but it seems like some of the major users have moved on. It makes me wonder if it's just not going to be that much of a vibrant ecosystem. This SE post seems to have good things to say: Cassandra time series data
Accumulo
Again, this seems fine, but I'm concerned that this is not a major, actively developed platform. It seems like this would leave me a bit starved for tools and documentation.
MongoDB
I have a, perhaps irrational, intense dislike for the Mongo crowd, and I'm looking for any reason to discard this as a solution. It seems to me like the data model of Mongo is all wrong for things with such a static, regular structure. My data even comes in (and has to stay in) order. That said, everybody and their mother seems to love this thing, so I'm really trying to evaluate its applicability. See this and many other SE posts: What NoSQL DB to use for sparse Time Series like data?
HBase
This is where I'm currently leaning. It seems like the successor to Cassandra with a totally usable approach for my problem. That said, it is a big piece of technology, and I'm concerned about really knowing what it is I'm signing up for, if I choose it.
OpenTSDB
This is basically a time-series specific database, built on top of HBase. Perfect, right? I don't know. I'm trying to figure out what another layer of abstraction buys me.
My Criteria
Open source
Works well with Python
Appropriate for a small team
Very well documented
Has specific features to take advantage of ordered time series data
Helps me solve some of my versioned data problems
So, which NoSQL database actually can help me address my needs? It can be anything, from my list or not. I'm just trying to understand what platform actually has code, not just usage patterns, that support my super specific, well understood needs. I'm not asking which one is best or which one is cooler. I'm trying to understand which technology can most natively store and manipulate this type of data.
Any thoughts?
It sounds like you are describing one of the most common use cases for Cassandra. Time series data in general is often a very good fit for the cassandra data model. More specifically many people store metric/sensor data like you are describing. See:
http://rubyscale.com/blog/2011/03/06/basic-time-series-with-cassandra/
http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra
http://engineering.rockmelt.com/post/17229017779/modeling-time-series-data-on-top-of-cassandra
As far as your concerns with the community I'm not sure what is giving you that impression, but there is quite a large community (see irc, mailing lists) as well as a growing number of cassandra users.
http://www.datastax.com/cassandrausers
Regarding your criteria:
Open source
Yes
Works well with Python
http://pycassa.github.com/pycassa/
Appropriate for a small team
Yes
Very well documented
http://www.datastax.com/docs/1.1/index
Has specific features to take advantage of ordered time series data
See above links
Helps me solve some of my versioned data problems
If I understand your description correctly you could solve this multiple ways. You could start writing a new row when the version changes. Alternatively you could use composite columns to store the version along with the timestamp/value pair.
I'll also note that Accumulo, HBase, and Cassandra all have essentially the same data model. You will still find small differences around the data model in regards to specific features that each database offers, but the basics will be the same.
The bigger difference between the three will be the architecture of the system. Cassandra takes its architecture from Amazon's Dynamo. Every server in the cluster is the same and it is quite simple to setup. HBase and Accumulo or more direct clones of BigTable. These have more moving parts and will require more setup/types of servers. For example, setting up HDFS, Zookeeper, and HBase/Accumulo specific server types.
Disclaimer: I work for DataStax (we work with Cassandra)
I only have experience in Cassandra and MongoDB but my experience might add something.
So your basically doing time based metrics?
Ok if I understand right you use the timestamp as a versioning mechanism so that you query per a certain timestamp, say to get the latest calculation used you go based on the metric ID or whatever and get ts DESC and take off the first row?
It sounds like a versioned key value store at times.
With this in mind I probably would not recommend either of the two I have used.
Cassandra is too rigid and it's too heirachal, too based around how you query to the point where you can only make one pivot of graph data from (I presume you would wanna graph these metrics) the columfamily which is crazy, hence why I dropped it. As for searching (which Facebook use it for, and only that) it's not that impressive either.
MongoDB, well I love MongoDB and I am an elite of the user group and it could work here if you didn't use a key value storage policy but at the end of the day if your mind is not set and you don't like the tech then let me be the very first to say: don't use it! You will be no good at a tech that you don't like so stay away from it.
Though I would picture this happening in Mongo much like:
{
_id: ObjectID(),
metricId: 'AvailableMessagesInQueue',
formula: '4+5/10.01',
result: NaN
ts: ISODate()
}
And you query for the latest version of your calculation by:
var results = db.metrics.find({ 'metricId': 'AvailableMessagesInQueue' }).sort({ ts: -1 });
var latest = results.getNext();
Which would output the doc structure you see above. Without knowing more of exactly how you wish to query and the general servera and app scenario etc thats the best I can come up with.
I fond this thread on HBase though: http://mail-archives.apache.org/mod_mbox/hbase-user/201011.mbox/%3C5A76F6CE309AD049AAF9A039A39242820F0C20E5#sc-mbx04.TheFacebook.com%3E
Which might be of interest, it seems to support the argument that HBase is a good time based key value store.
I have not personally used HBase so do not take anything I say about it seriously....
I hope I have added something, if not you could try narrowing your criteria so we can answer more dedicated questions.
Hope it helps a little,
Not a plug for any particular technology but this article on Time Series storage using MongoDB might provide another way of thinking about the storage of large amounts of "sensor" data.
http://www.10gen.com/presentations/mongodc-2011/time-series-data-storage-mongodb
Axibase Time-Series Database
Open source
There is a free Community Edition
Works well with Python
https://github.com/axibase/atsd-api-python. There are also other language wrappers, for example ATSD R client.
Appropriate for a small team
Built-in graphics and rule engine make it productive for building an in-house reporting, dashboarding, or monitoring solution with less coding.
Very well documented
It's hard to beat IBM redbooks, but we're trying. API, configuration, and administration is documented in detail and with examples.
Has specific features to take advantage of ordered time series data
It's a time-series database from the ground-up so aggregation, filtering and non-parametric ARIMA and HW forecasts are available.
Helps me solve some of my versioned data problems
ATSD supports versioned time-series data natively in SE and EE editions. Versions keep track of status, change-time and source changes for the same timestamp for audit trails and reconciliations. It's a useful feature to have if you need clean, verified data with tracing. Think energy metering, PHMR records. ATSD schema also supports series tags, which you could use to store versioning columns manually if you're on CE edition or you need to extend default versioning columns: status, source, change-time.
Disclosure - I work for the company that develops ATSD.
I would like to identify and analyze different machine instruction executed and required clock cycle for each of them, throughout running of a code.
Is there any way to do this simply? Dynamic binary translation might be a way but i am looking for more easier mechanism.
Thanks in advance
If you are programming, consider using a performance analysis tool such as a profiling tool, such as Intel VTune (http://en.wikipedia.org/wiki/VTune), or oprof.
It is much less common for most programmers to have access to a cycle accurate simulator, although in the embedded space it is quite common.
Dynamic binary translation is probably NOT a good way to measure your program at individual instruction granularity. DBT tools like http://www.pintool.org/ do allow you to insert code to read timers. You could do this around individual instructions is way too slow, and the instrumentation adds too much overhead. But doing this at function granularity can be okay. Basic block granularity, i.e. every branch, borderline.
Bottom line: try a profiling tool like VTune first. Then go looking for a cycle accurate simulator.
It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 9 years ago.
How do I learn PLC programming? Would it differ greatly for different brands of PLCs? Is ladder programming the same as PLC programming?
I did a lot of PLC programming, and now do quite a bit of .NET programming. It's very dangerous to make the switch either way, because a lot of the skills that you think should be transferrable (patterns and such) lead you very far astray.
The biggest difference that I tell people is that PC program code should be written as if other programmers are the audience, but PLC programs (ladder logic) must be written as if maintenance people are the audience. Maintainance in most facilities (particularly manufacturing) frequently connect directly to PLCs and in online mode they can watch the code execute graphically to figure out what's wrong.
For instance, if an output isn't turning on, they'll type the output electrical device ID into the find function of the programming software, find that output coil, and start tracing back from there looking for issues. One of the frequent mistakes that some PLC programmers make is to "map" their I/O into a structure (in PLCs, these are called user-defined types), and they use a copy instruction to move all the inputs or outputs over to the structure at once. Makes sense from a PC programming perspective, but it makes the maintenance person want to kill you. Typically the programming software provides a cross reference feature where they can specify that output coil, and it will tell them everywhere in the program that it's used. If you use a copy instruction to move 10 words of I/O into a 10 word data structure, he's got to sit there and count bits to figure out which bit in the source of the copy maps to which bit on the destination side of the copy. True, comments can help, but there's a problem with that too... PLCs store the whole program and allow you to upload the program from it in an emergency if you need to troubleshoot and you don't have a copy of the original program. The problem is that for space reasons, the PLC doesn't store the comments. So if the line is down, it's costing $5000 per minute in downtime, and a guy runs out there with a laptop, he might have to do a quick upload without comments and try to troubleshoot it. Having those copy instructions in there, wasting 10 minutes of his time, just cost the company $50,000 in downtime. These are the things you have to be aware of when writing PLC programs.
Some other tips: some PLCs have support for FOR loops. Never use them. For the same reason above, they make the code very difficult to troubleshoot for a maintenance person. This is because if you have one piece of code in the PLC that gets scanned more than once per scan (like the contents of a loop), then when you go into online debugging mode, the software can't show you the values for each of 10 loops that executed this scan, so you really have no idea what value you're looking at. Then you have to write all this tricky code to pull the loop values for a specific loop index out into some other tags (variables) that you can monitor. That's just one more impedance to fixing the problem in an emergency. Using a subroutine more than once per scan suffers from the same problem.
Indirect addressing (what we would call Arrays) are very difficult for maintenance people to understand. It's generally OK to use them when you're dealing with recipe management (storing and retrieving values for how to build your part) but you should try to stay away from it in the control part of the program.
In PC programming, of course we seek to re-use code as much as possible. However, in PLCs and control systems, downtime is extremely expensive, and hardware is expensive. Memory is cheap, and actually PLC programmers are cheap. Therefore, it's expected that if you have 10 identical things on your machine (like conveyor drives or something) that you will have 10 different files (subroutines), one for each drive, and each drive will have its own variables associated with them: e.g. Drive1_Run, Drive2_Run, Drive3_Run, etc. This is going to feel very "wrong" to you when you come from a PC programming background, but this is all because of the points I've made above. When you're in a downtime situation, and someone says that Drive 3 isn't working, you crack open the laptop, go to the file for Drive 3 and you look at the Run output rung. You start troubleshooting from there, while the program is executing. There's no breakpoints (the program never stops).
Good luck on your endeavors. I wrote up some more insights from my years of programming PLCs, if you want to check them out.
You can learn PLC programming from various sources on the internet, one of which is this(wikibooks) or this
The program that you write will be pretty much the same across different brands of PLCs for LLDs (Ladder Logic Diagrams) unless you use PLC specific functions. But there will be much more differences if you use some language like IL (Instruction List). But once you have written the program, the format of storage and execution differs widely across brands
Ladder logic is one of the 5 programming languages for PLC, the others being FBD (Function block diagram), ST (Structured text, similar to the Pascal programming language), IL (Instruction list, similar to assembly language) and SFC (Sequential function chart). These are just various representations of the programming language, various flavours if you will. But usually, a given brand supports only one of these. In USA, LLDs are widely used, while in Europe, ILs are more popular.
Ladder, often call LD is one of several language styles defined in ISO 61131 automation programming standard. Others are SFC (sequential flow chart), FBD (functional block diagram), ST (structured text), and IL (instruction list). IL is similar to assembler and very few people use it. ST is a text based programming much like early versions of BASIC. It is not often used either. LD is designed to resemble relay contacts off an electrical control panel (which many PLC replaced). FBD looks more like a circuit diagram. SFC is basically a flow chart.
Some PLC support all, other only some, or even one. While LD is the most common, FBD and SFC are gain popularity.
Different brands do use slightly different programming languages. They are usually similar enough that once you understand one brand, you can work with any of them, but you cannot directly take code from one PLC and using on another brand.
The answers given so far are pretty on target. One thing I found that PLCs have a split personality when it comes to their langauges and setup. Their core design is to give the electrical guys a flexible means of setting up control logic for their overall design. PLCs are basically a bunch of input and a bunch of outputs and how they are connected is controlled by the software you load into the device.
One of the emphasis of the languages that are used for PLCs is that they are accessible to people coming from an electrical background. So the idioms and structures seem counter intuitive for a person used to high level languages or even assembly languages. Ladder Logic for example is very accessible for electrical folks.
However in recent years PLCs have been supporting a multitude of languages for maximum flexibility. However in my opinion the handful of PLCs I worked are very lacking in terms of being a programming environment. Simple things like assigning variable names to memory location are often not designed into the language being used. The ones that are easy to work are often not the most cost effective for the job.
Despite these handicaps they are excellent for simplifying complex electrical systems. If you are working with others on a project, you will find that your knowledge of programming will help the project solve thorny programs. I was able to take a 100 rung ladder logic program and rewrite it into a third of the rungs. Once I was able to learn the ladder logic language I was able implement various optimizations that reduced the complexity of the program.
One tip is that you will need to learn about latching. Sometimes you will need to store or hold some output and unless you have a latch it the result will disappear the next cycle. Once you understand the issue it become clear but at first it was a great source of frustration for me.
PLC programming should be viewed as implementation activity of PLC software engineering output, unless you are using PLC as purely part of alternative components to mechanical or electrical solutions.
With this as basis, PLC programming environment is typically IEC61131 driven, gauranteed cycle time, "pre-emptive" realtime, no need to handle realtime OS related issues, continuous code scanning, non-program-pointer, different concept from typical computer task spawning kind of multi-tasking. Code execution is naturally atomic, no need to use monitors between tasks.
Each of the languages has its closeness to how conceivable is your code to the logic model you want to implement.
Ladder has its basic concept on electrical power flow interlocking style. Code resolution within single network is either horizontal or vertical scanning (your can find resource on this topic from manufacturer or other sites). If your code has single scan resolution nature and is within one network, some unconceivable behavior can be due to scanning type (important to remember that ladder is only emulation of electrical circuit, it is still sequential in execution).
FBD or function block diagram was electronic signal flow but today can be data flow depending on type of PLC. FBD shows clearer execution sequence quite similar to horizontal scanning ladder in scanning sequence. Today, FBD is typically used as container for object function blocks, although dependency implementation and visual similarity to process model is dependent on PLC type.
Literal is very similar to BASIC, but syntax only; execution is still scan-through. Literal language is good for mathematical calculation. For high level implementation, methods or derivation of attributes within object can be easier using Literal. State machine programming using English-like state representation or constants makes program very readable.
Statement list looks similar to assembly mnemonics but again execution is still scan-through and not program pointer. It is strong in bit operation and parenthesis-styled discrete logics. It can be a very efficient language to use with proper structuring and commenting.
SFC or sequential flow chart is a complementary language for sequence implementation. SFC has inherent rules on action block activation, state transitions, parellel sequence activation and merging. However, complex exception branching or concurrent action management can make implementation complicated and flow chart difficult to read.
PLC system management on IO handling, communication, hot-standby is hardware configuration effort, and is product dependent. Generally, can be treated separately from software engineering. However, data related to PLC system management are of "located" (independent data addressing area) type, good data modeling approach in software engineering can help in manageability of system data.
The Online PLC Simulator may be useful.
You can use Structured Text (ST) which consists of a series of instructions which, as determined in high level languages, ("IF..THEN..ELSE") or in loops (WHILE..DO) can be executed.
I find it better than Ladder as it is close to standard programming language.
I had a little of PLC programming on University. It seemed to me, to be a one level lower than assembly, but device we were using wasn't the newest one.
I belive you need to have a PLC driver, but I would first look for simulators and read more about it before buying.
Allen-Bradley has a free dos based software PLC, specifically for training. You can probably find it if you go to their site, or Google it. It's used to teach PLC programming in schools.
For a beginner trying to learn ladder logic, the best way is to attend free online training at http://plcs.net
PLC is the term used for the devices that use ladder logic. The devices that are programmed in more typical programming languages are generally called microcontrollers. However, there are some of us that on occasion lump them all under the PLC name. :-) Not sure how much ladder logic varies, but microcontroller code can vary significantly.