Is it possible to model event data (e.g. live events at a concert hall that is already mapped as a location) as part of OpenStreetMap metadata?
If yes, what is the relevant data schema and REST API?
If no (since this may be considered out of scope), are there other prominent collaborative projects that gather (pointers to) events in a common, machine-readable data format?
So far I have seen several aggregators (e.g. apps) that use OpenStreetMap for the more "static" geographic data and add their own proprietary layers on top for storing the more "dynamic" event data (e.g. when a live concert takes place, or when a street is closed to traffic due to construction). I am wondering whether such proprietary layers are strictly needed, with all the open data out there.
Well, there is a general consensus in the OSM community that we only add geodata about mid- to long-term objects.
http://wiki.openstreetmap.org/wiki/Best_practices
But of course the community is very dynamic, and yes, there are people who contribute geodata about XXL local events (Burning Man, ...) or objects with a short lifetime (Christmas markets, ...). Be aware that this is controversial, though, especially since we lack a single, agreed-upon time model:
http://wiki.openstreetmap.org/wiki/Comparison_of_life_cycle_concepts
No idea if there are mashups that geolocate events and return the data via a public API under a good license. Usually people prefer Facebook, Google+, ..., so I guess it's not that easy to crawl that stuff legally.
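If you do go the route of separate layers, the (static) venue geodata itself is easy to pull from OSM. A minimal sketch using the public Overpass API to fetch venues in a bounding box; the tag choice and the idea of joining your own event feed by OSM node id are assumptions for illustration, not an official schema:

```typescript
// Sketch: pull (static) venue geodata from the public Overpass API and
// keep the (dynamic) event layer in your own store, joined by OSM node id.
// The amenity tag and bounding box are examples; adapt them to your case.

interface OverpassElement {
  id: number;
  lat: number;
  lon: number;
  tags?: Record<string, string>;
}

async function fetchVenues(bbox: string): Promise<OverpassElement[]> {
  const query = `
    [out:json][timeout:25];
    node["amenity"="theatre"](${bbox});
    out body;`;
  const res = await fetch("https://overpass-api.de/api/interpreter", {
    method: "POST",
    body: "data=" + encodeURIComponent(query),
  });
  const json = await res.json();
  return json.elements as OverpassElement[];
}

// Usage: venues come from OSM; your events reference them by node id.
fetchVenues("48.19,16.32,48.23,16.42").then((venues) => {
  for (const v of venues) console.log(v.id, v.tags?.name, v.lat, v.lon);
});
```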
Currently I am trying to develop a multi-layered Leaflet map using GeoJSON (for countries and states/provinces) and CSV (for city data). I want it to go all the way down to city level, and that requires layers of country, state/province, and city data. I have all the country data I need in the format I require (GeoJSON), and I have a decent source for city data in CSV format.
However, I only have states/provinces for the USA, Canada, Brazil, and Australia. I have been looking around but haven't been able to find a reliable source such as NaturalEarth (which is where I initially got my states/provinces data from).
Does anyone have a resource they could point me towards? Even if it is multiple sources, hopefully I can merge them together in mapshaper or other open-source applications. I've been looking for the past month, but I am new to geographic visualization, so I don't know the good places to look yet.
Thank you so much for any help.
You can try BBBike exports of OpenStreetMap data. You will probably have to export bit by bit, as there is a size limit on what you can export.
I also recommend OpenStreetMap. OSM itself uses NaturalEarth for the higher (zoomed-out) layers.
I would download their database (for testing, just download a small country; later you can download a continent at a time).
Then I would select the features I'm interested in. In your case that is the boundaries (regions, districts, cities/municipalities) and city/hamlet names. Then you should look at the tools they have (and at what you need). My simple and stupid way would be to import those features into a GIS database and then use it to get the data, but you may find shortcuts and use the data directly without importing it into a database.
Check the OSM wiki, the Downloading data and Planet.osm pages, and the other linked pages about tools. You may need to look at the Map Features page to know which features you want (if only to discard most of the data).
If you want to use the tiles (pre-rendered images), you may need to read the terms of service. It may be easier to download all the data and render it on your own server, or to buy a service that offers you tiles (all covered in the wiki).
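Once you have your boundaries exported as GeoJSON and your cities as CSV, wiring the layers together in Leaflet is short. A minimal sketch, assuming file names ("states.geojson", "cities.csv") and CSV columns (name, lat, lon) that you would adapt to your own data:

```typescript
// Minimal sketch of the layered setup described above, using Leaflet.
// File names and CSV columns are assumptions -- adapt to your own data.
import * as L from "leaflet";

const map = L.map("map").setView([40, -95], 4);
L.tileLayer("https://tile.openstreetmap.org/{z}/{x}/{y}.png", {
  attribution: "&copy; OpenStreetMap contributors",
}).addTo(map);

// State/province boundaries from a GeoJSON file.
fetch("states.geojson")
  .then((r) => r.json())
  .then((geojson) => L.geoJSON(geojson, { style: { weight: 1 } }).addTo(map));

// City points from a CSV file with a header row: name,lat,lon
fetch("cities.csv")
  .then((r) => r.text())
  .then((csv) => {
    const cityLayer = L.layerGroup().addTo(map);
    for (const row of csv.trim().split("\n").slice(1)) {
      const [name, lat, lon] = row.split(",");
      L.marker([Number(lat), Number(lon)]).bindPopup(name).addTo(cityLayer);
    }
  });
```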
Event sourcing and CQRS are great because they free developers from being stuck with one pre-modelled database that they have to work with for the lifetime of the application, unless there is a big data migration project. CQRS and ES also have other benefits, like a scalable event store, audit logging, etc., which are already covered all over the internet.
But what are the disadvantages ?
Here are some disadvantages that I can think of after researching and writing small demo apps:
Complex: Some people say ES is complex. But I'd say having a complex application is better than a complex database model on which you can only run very restricted queries through a query language (multiple joins, indexes, etc.). Some programming languages, such as Scala, have very rich collection libraries that are flexible enough to express seriously complex aggregations, and there is also Apache Spark, which makes it easy to query distributed collections. Databases, on the other hand, will always be restricted to their query language's capabilities, and distributing databases is harder than distributing application code (just deploy another instance on another machine!).
High disk space usage: The event store might end up using a lot of disk space to store events. But we could schedule a clean-up every few weeks, create snapshots, and maybe store historical events locally on an external HD, just in case we need the old events in the future?
High memory usage: The state of every domain object is stored in memory, which might increase RAM usage, and we all know how expensive RAM is. BIG PROBLEM!! because I'm poor! Any solution to this? Maybe use SQLite instead of storing state in memory? But am I then making things more complex by introducing multiple SQLite instances into my application?
Longer boot-up time: On failure or software upgrade, boot-up is slow, depending on the number of events. But we can use snapshots to solve this?
Eventual consistency: A problem for some applications. Imagine if Facebook used event sourcing with CQRS for storing posts; considering how busy Facebook's system is, if I published a post I might only see it the next day :)
Serialized events in the event store: Event stores hold events as serialized objects, which means we can't query the contents of events in the event store (which is discouraged anyway). And we won't be able to add another attribute to an event in the future. A solution would be to store events as JSON objects instead of binary-serialized ones? But is that a good idea? Or add new event types to support changes to the original event object?
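To make that last point concrete, here is a small sketch of JSON events carrying an explicit version field plus an "upcast" step on read; the event name and fields are invented for illustration:

```typescript
// Sketch: events stored as JSON with an explicit version, so new
// attributes can be added without breaking old events. The event name
// and fields are made up for illustration.
interface OrderPlacedV1 { version: 1; orderId: string; amount: number; }
interface OrderPlacedV2 { version: 2; orderId: string; amount: number; currency: string; }
type OrderPlaced = OrderPlacedV1 | OrderPlacedV2;

// "Upcast" old events to the current shape when reading the stream.
function upcast(event: OrderPlaced): OrderPlacedV2 {
  if (event.version === 1) {
    // Old events predate the currency field; assume a default.
    return { ...event, version: 2, currency: "USD" };
  }
  return event;
}

const stored = '{"version":1,"orderId":"o-42","amount":99.5}';
console.log(upcast(JSON.parse(stored) as OrderPlaced));
```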
Can someone please comment on the disadvantages I brought up here, correct me if I am wrong, and suggest any others I may have missed?
Here is my take on this.
CQRS + ES can make things a lot simpler in complex software systems, by giving you rich domain objects, simple data models, history tracking, better visibility into concurrency problems, scalability, and much more. It does require a different way of thinking about systems, so it can be difficult to find qualified developers. But CQRS makes it simpler to separate responsibilities across developers. For example, a junior developer can work purely on the read side without having to touch the business logic.
Copies of data will require more disk space, for sure. But storage is relatively cheap these days. It may require the IT support team to do more backups and to plan how to restore the system in case things go wrong. However, server virtualization these days makes for a more streamlined workflow. Also, it is much easier to create redundancy in a system that has no monolithic database.
I do not consider higher memory usage a problem. Business object hydration should be done on demand. Objects should not keep references to events that have already been persisted, and event hydration should happen only when persisting data. On the read side you do not have the Entity -> DTO -> ViewModel conversions that usually happen in tiered systems, and you do not have the kind of object change tracking that full-featured ORMs do. Most systems perform significantly more reads than writes.
Longer boot-up time can be a slight problem if you are using multiple heterogeneous databases, due to the initialization of the various data contexts. However, if you use something simple like ADO.NET to interact with the event store and a micro-ORM for the read side, the system will "cold start" faster than any full-featured ORM. The important thing here is not to over-complicate how you access the data; that is actually a problem CQRS is supposed to solve. And as I said before, the read side should be modeled for the views and should not carry the overhead of re-mapping data.
Two-phase commit can work well for systems that do not need to scale to thousands of users, in my experience. You would need to choose databases that work well with a distributed transaction coordinator; PostgreSQL can work well for separate read and write models, for example. If the system needs to scale to a high number of concurrent users, it has to be designed with eventual consistency in mind. There are also cases where you have aggregate roots or context boundaries that do not use CQRS, precisely to avoid eventual consistency; that makes sense for the non-collaborative parts of the domain.
You can query events in a serialized format like JSON or XML if you choose the right database for the event store, but that should be done only for analytics. Nothing inside the system should query the event store by anything other than the aggregate root id and the event type. That data would be indexed and live outside the serialized event.
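A minimal sketch of that access pattern, with an in-memory array standing in for the database table (the column names are assumptions):

```typescript
// Sketch of the access pattern above: the aggregate id and event type
// live as indexed columns *outside* the serialized payload, which stays
// an opaque JSON blob. An in-memory array stands in for the database.
interface StoredEvent {
  aggregateId: string;  // indexed
  eventType: string;    // indexed
  sequence: number;
  payload: string;      // serialized JSON, never queried by the system
}

const eventStore: StoredEvent[] = [];

function append(aggregateId: string, eventType: string, data: unknown): void {
  const sequence = eventStore.filter((e) => e.aggregateId === aggregateId).length;
  eventStore.push({ aggregateId, eventType, sequence, payload: JSON.stringify(data) });
}

// The only query shape the system itself should need:
function load(aggregateId: string, eventType?: string): StoredEvent[] {
  return eventStore.filter(
    (e) => e.aggregateId === aggregateId && (!eventType || e.eventType === eventType)
  );
}

append("order-1", "OrderPlaced", { amount: 10 });
append("order-1", "OrderShipped", { carrier: "DHL" });
console.log(load("order-1").map((e) => e.eventType)); // ["OrderPlaced", "OrderShipped"]
```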
Just to comment on point 5. I've been told that Facebook does use ES with Eventual Consistency, which is why you can sometimes see a post disappear and reappear after you've posted it.
Usually the read model your browser is accessing is located "close" to you, but after you make a post the SPA switches over to a read model that is close to your write model. The proximity between the write model (events) and that read model means you get to see your own post.
However, 15 minutes later your SPA switches back to the first, closer read model. If the event containing your post hasn't yet propagated to that read model, you'll see your own post disappear, only to reappear some time later.
I know it's been almost 3 years since this question was asked, but this article may still be useful for someone. The key points are:
Scaling with snapshots
Visibility of data
Schema changing
Dealing with complex domains
Need to explain it to most new team members
Event sourcing and CQRS are great because they free developers from being stuck with one pre-modeled database that they have to work with for the lifetime of the application, unless there is a big data migration project.
This is a big misconception. Relational databases were invented exactly for the evolution of the model (thanks to simple two-dimensional tables, as opposed to pre-defined hierarchical structures). With views and procedures ensuring the encapsulation of data access, the logical and the physical model can evolve independently. This is also why SQL defines DDL and DML in the same language. Some RDBMS also allow all those evolutions to be versioned and deployed online (continuous delivery), such as Oracle Edition-Based Redefinition.
Big data structures, by contrast, are predefined and can be read only with the code developed for that structure. That is fine when they are consumed immediately, but you will have a hard time reading them 10 years later without the exact version of the code, and its language compiler or interpreter.
I hope I am not too late to try to give an answer. In recent months I've done a lot of research on this subject, with the goal of implementing a production-grade solution for some parts of my architecture where ES makes sense.
Complex: actually, it should not be complex; its mission is to be deadly simple. How? By pushing all the complexity from the business-logic code down into infrastructure code. Data access should be done by frameworks, which are not mature enough yet: there is still no clear winner in the ES/CQRS race, maybe because it is still a niche/hipster approach(?), so some teams roll their own solution or adopt a ready-made technology such as Axon.
High disk space usage: I would say more: potentially infinite disk usage. But if you go towards ES, you also have very good reasons to tolerate this apparent drawback. Here are some of them:
Audit logs: the datastore is an event log, as we already know. Financial apps, or any mission- or safety-critical app, may need a centralized audit log that states who did what at which moment. ES provides this capability out of the box... and you can also decorate your event entries with business-meaningful metadata (e.g. a transaction id correlated with some API consumer identity, a severity level of the operation, ...).
High concurrency: there are systems where logical resource state is mutated by many clients concurrently: games, IoT platforms, and so on. Logging events instead of mutating a state representation can be a smart way to provide a total order of events. The alternative is to delegate the synchronization to the DB, but that is not what you want if you're going for ES.
Analytics: let's say you have a lot of data with a lot of business value, but you don't yet know which. For years we have extracted knowledge from application data by translating its organization into different information models (OLAP cubes). The event store provides something similar out of the box once again: event logs are the rawest form of representation of information, and you can process them in many ways, in batch or by reacting to stored events.
High memory usage: I think it should be about the same, once you have built your projections.
Longer boot-up time: if the read side caches its projections and "remembers" the last applied event, it does not need to re-apply the entire event sequence (a small sketch of this follows after this list). Snapshots are a mitigation, but if you need a lot of snapshots maybe ES was a bad choice in the first place. I think this problem is minor in microservice ecosystems, where boot time can be masked without service interruption. In fact, you get the most out of ES/CQRS when you apply it to microservices.
Eventual consistency: blame the CAP theorem for this, not ES. Many non-ES/CQRS systems have to deal with it too, and there are a lot of scenarios where it is not a real problem; those are the scenarios where ES fits well. And you can mix ES and non-ES services in the same platform.
Serialized events in the event store: if it's important to have a non-serialized event representation, you could use a document-oriented DB. But if you do this in order to query event payloads, you are missing the point of ES/CQRS. ES means moving all data manipulation from the DB side to the application tier, where every piece changes fast and everything is stateless. This enhances scalability and fault tolerance and gives you the means to shape the organization of your team, for example by letting the frontend guy/girl easily write his/her BFF in JavaScript.
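Here is the boot-up idea from above as a minimal sketch: a projection that persists a snapshot (its state plus the last applied sequence number) and on restart only replays events after that point. The event types and state shape are made up for illustration:

```typescript
// Sketch for the boot-up point above: resume a projection from a snapshot
// instead of replaying every event. Names and shapes are illustrative.
interface Event { sequence: number; type: string; data: any; }
interface Snapshot { lastSequence: number; state: Record<string, number>; }

function rebuild(snapshot: Snapshot | null, events: Event[]): Snapshot {
  const state = snapshot ? { ...snapshot.state } : {};
  let lastSequence = snapshot ? snapshot.lastSequence : -1;
  for (const e of events) {
    if (e.sequence <= lastSequence) continue; // already applied
    if (e.type === "Credited") {
      state[e.data.account] = (state[e.data.account] ?? 0) + e.data.amount;
    }
    lastSequence = e.sequence;
  }
  return { lastSequence, state };
}

const events: Event[] = [
  { sequence: 0, type: "Credited", data: { account: "a", amount: 5 } },
  { sequence: 1, type: "Credited", data: { account: "a", amount: 7 } },
];
const snap = rebuild(null, events.slice(0, 1)); // snapshot after event 0
console.log(rebuild(snap, events));             // resumes at event 1
```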
I hope to put these principles into practice with good results and to draw on the benefits of this exciting approach.
I have hundreds of thousands of PDFs that are presently stored in the filesystem. I have a custom application that, as an afterthought to its actual purpose, provides access to these PDFs. I would like to take the "storage & retrieval" part out of the custom application and use an open-source document storage backend.
Access to the PDF store should be via a REST API, so that users would not need a custom client for basic document browsing and viewing. Programs that store PDFs should also be able to work via the REST API. They would provide the actual binary or ASCII data plus structured metadata, which could later be used in retrieval.
A typical query for retrieval would be "give me all documents that were created between days X and Y with document types A or B".
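Purely to illustrate that query shape (the endpoint and parameter names below are hypothetical, not from any particular product):

```typescript
// Illustration only: the endpoint and parameter names are hypothetical,
// just to pin down the kind of query the store would need to answer.
const params = new URLSearchParams({
  createdAfter: "2023-01-01",
  createdBefore: "2023-03-31",
  docType: "A,B",
});

fetch(`https://docstore.example.org/api/documents?${params}`)
  .then((r) => r.json())
  .then((docs) => console.log(docs));
```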
My research into whether such a storage backend exists has come up empty. Do any of you know of a system that provides these features? Open source preferred; reasonably priced systems considered.
I am not looking for advice on how to "roll my own" using available technologies. Rather, I'm trying to find out whether that can be avoided. Many thanks in advance.
What you describe sounds like a document management or asset management system, of which there are many, and many of them work with PDF files. I have some fleeting experience with commercial offerings such as Xinet (http://www.northplains.com/xinet - now apparently acquired) or Elvis (http://www.elvisdam.com). Both might fit your requirements, but they're probably too big and likely too expensive.
Have you looked at Alfresco? It is an open-source alternative I came into contact with years ago while serving on a selection committee. As far as I remember, it definitely goes in the direction of what you are looking for, and since it is open source it might fit that angle as well: http://www.alfresco.com.
I understand that XMPP is used in chat services, but it seems to be more generally useful than that. Can someone list some scenarios and examples where you would consider using XMPP, and the pros and cons of it versus other approaches?
I know that Dropbox uses it for its file-sharing system on Android (possibly it does on other platforms too).
Cons: much more verbose than binary protocols (more bandwidth).
Pros: a wide variety of already-implemented clients and servers, and a wide range of already-implemented mechanisms for reliability, scalability, security, presence, RPC, federation, custom components, mail, VoIP... the list is very, very long. Even if you need something different, and you know where to touch, you can extend it to your needs, inheriting all the already-implemented features.
We had a project collecting information, e.g. wind direction, temperature, stock and forex prices, etc. Every sensor is a Jabber "user".
This allows us to detect whether a sensor is online, offline, or problematic. Sensors also publish their readings to a pubsub node, which distributes them to collectors.
Human users can also interact with a particular sensor by querying it; the sensor returns the data in a human-friendly format.
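The sensor-side publish can be quite small. A sketch using the xmpp.js library (@xmpp/client) and a XEP-0060 pubsub publish stanza; the server addresses, node name, and payload format are assumptions:

```typescript
import { client, xml } from "@xmpp/client";

// Hypothetical sensor account and servers -- substitute your own.
const xmpp = client({
  service: "xmpp://jabber.example.com:5222",
  username: "wind-sensor-01",
  password: "secret",
});

xmpp.on("online", async () => {
  // Publish one reading to a pubsub node (XEP-0060 <publish/> stanza).
  await xmpp.iqCaller.request(
    xml("iq", { type: "set", to: "pubsub.jabber.example.com" },
      xml("pubsub", { xmlns: "http://jabber.org/protocol/pubsub" },
        xml("publish", { node: "sensors/wind" },
          xml("item", {},
            xml("reading", { xmlns: "urn:example:sensor" },
              JSON.stringify({ direction: 270, speedKts: 12.4 }))))))
  );
  await xmpp.stop();
});

xmpp.start().catch(console.error);
```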
We use it for chatrooms, and for distributing sports results to users watching live events.
Google Buzz and Facebook Chat are built on it.
I am using Scantron Cognition Enterprise at work to capture data from scanned forms. Building these forms is tedious at best; it would be nice to have a library of pre-built objects to use. Unfortunately, documentation and online resources are scarce.
Does anyone have pointers to resources for this tool?
Hey Jason, believe it or not, Scantron is STILL the standard, but this is not the Scantron you probably remember. Although OMR (bubble) forms are still used extensively in education, there are many more advanced technologies available to add to them today.
Concerning Cognition, I looked through the available tags and these would fit:
"document-imaging" - Cognition is a document-imaging product and can feed images and index values into most commercially available document storage applications.
"OCR" - Optical Character Recognition, or reading machine print.
"ICR" - Intelligent Character Recognition, or reading handwriting, usually in a constrained print format (one letter per box, like a credit card application).
"datacollection" - the key purpose of Cognition is data collection.
However, there is no tag for "OMR" - Optical Mark Recognition, or reading bubble choices, similar to the basic Scantron forms of the past. Also, I could not find one for "Key From Image", another purpose that Cognition is used for.
I am a Cognition user, as well as someone who markets it, and I know there are a large number of users in North America. Many corporations use Cognition for sensitive HR functions, so they might not have their usage posted in a searchable format. Many other organizations use it for safety inspections, insurance data entry, and also for testing and surveys: basically anywhere you have a large number of paper forms and need all of the data quickly entered into a database. Because many users run Cognition for sensitive applications, they are not likely to share, but I can share a few forms I have; you could also contact your Scantron rep, who might have something to share as well. I have some decent ICR fields built for name, e-mail, address, etc. ICR fields work best when you build in your own dictionary or database look-ups. OMR fields are the hard ones to build, but I have a few of these as well. The easiest way to share them is to send you a form that already has the field built into it. You can build your own lookups from txt, xls, or db files.