Resources for Scantron Cognition Enterprise? - document-imaging

I am using Scantron Cognition Enterprise at work to capture data from scanned forms. Building these forms is tedious at best, especially when it would be nice to have a library of pre-built objects to use. Unfortunately, documentation and on-line resources are scarce.
Does anyone have any pointers to find some resources for this tool?

Hey Jason, believe it or not, Scantron is STILL the standard, but this is not the Scantron you probably remember. Although OMR (bubble) forms are still used extensively in education, there are a lot more advanced technologies available to be added to them today.
Concerning Cognition, I looked through the available tags and these would fit:
"document-imaging" - Cognition is a document imaging product and can feed images and index values into most commercially available document storage applications.
"OCR" - Optical Character Recognition, or reading machine print.
"ICR" - Intelligent Character Recognition, or reading handwriting, usually in a constrained print format (one letter per box, like a credit card application).
"datacollection" - the key purpose of Cognition is data collection.
However, there is not a tag for "OMR" - Optical Mark Recognition, or reading bubble choices, similar to the basic Scantron forms of the past. Also, I could not find one for "Key From Image", another purpose that Cognition is used for.
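Cognition's own OMR engine isn't described in this thread, so as a generic illustration of what "reading bubble choices" involves, here is a minimal Python sketch. It assumes the imaging software has already measured how dark each bubble is (a fraction from 0.0 to 1.0); the function name and the min_fill/min_gap thresholds are hypothetical, not Cognition settings.

```python
from typing import Optional, Sequence


def read_bubble_choice(
    darkness: Sequence[float],
    labels: str = "ABCDE",
    min_fill: float = 0.5,   # hypothetical: below this, a bubble counts as empty
    min_gap: float = 0.2,    # hypothetical: winner must beat runner-up by this much
) -> Optional[str]:
    """Return the chosen label, '*' for an ambiguous/multiple mark, or None if blank.

    darkness[i] is the fraction (0.0-1.0) of dark pixels inside bubble i.
    """
    if len(darkness) != len(labels):
        raise ValueError("need one darkness value per bubble label")
    # Rank bubbles from darkest to lightest.
    ranked = sorted(range(len(darkness)), key=lambda i: darkness[i], reverse=True)
    best = ranked[0]
    if darkness[best] < min_fill:
        return None  # nothing filled in enough: treat the question as blank
    runner_up = darkness[ranked[1]] if len(ranked) > 1 else 0.0
    if darkness[best] - runner_up < min_gap:
        return "*"  # two bubbles look similarly filled: flag for human review
    return labels[best]
```

For example, read_bubble_choice([0.05, 0.82, 0.10, 0.04, 0.07]) returns "B", while two similarly dark bubbles return "*" so a person can check the sheet.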
I am a Cognition user as well as someone who markets it, and I know that there are a large number of users in North America. Many corporations use Cognition for sensitive HR functions and so might not have their usage of it posted in a searchable format. Many other organizations use it for safety inspections, insurance data entry, and testing and surveys: basically anywhere you have a large number of paper forms and need all of the data quickly entered into a database. Because so many users have sensitive applications, they are not likely to share, but I can share a few of the ones I have. You could also contact your Scantron rep, who might have something to share as well. I have some decent ICR fields built for name, e-mail, address, etc. The ICR fields work best when you build in your own dictionary or database lookups. The OMR fields are the hard ones to build, but I have a few of these as well. The easiest way to share these is to send you a form that already has the field built into it. You can build your own lookups from txt, xls, or db files.
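Cognition's own lookup configuration isn't shown in this thread, so the following is not its API. It is a generic Python sketch of why a dictionary helps ICR: a raw handwriting read such as SPRINGFIELO gets snapped to the nearest valid entry. It assumes a plain-text lookup file with one valid value per line; the function names and the 0.75 cutoff are illustrative choices, not product settings.

```python
import difflib
from pathlib import Path
from typing import List, Optional


def load_dictionary(path: Path) -> List[str]:
    """Load one valid value per line from a plain-text lookup file (blank lines skipped)."""
    with path.open(encoding="utf-8") as fh:
        return [line.strip().upper() for line in fh if line.strip()]


def correct_icr_value(raw: str, dictionary: List[str], cutoff: float = 0.75) -> Optional[str]:
    """Snap a raw ICR read to the closest dictionary entry, or None if nothing is close enough."""
    matches = difflib.get_close_matches(raw.strip().upper(), dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

With a dictionary of ["SPRINGFIELD", "SHELBYVILLE"], correct_icr_value("Springfielo", ...) yields "SPRINGFIELD", while a read with no close match comes back as None and can be routed to manual keying.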

Related

Creating a database of many products

I am currently creating an inventory app for iPhone, using Parse, that lets companies keep track of all of their tools, supplies, and inventory. When a user/company adds a new item, I'd like to give them the option to search a pre-made database of items. For example, a construction company adding a simple Dewalt drill battery would search the pre-made database for "Dewalt #DC9096 18V XRP 2.4A Battery", or an office would search for pencils by brand/serial number/name. I am looking for a simple way to make a database, or even a table, containing products from multiple brands, including their prices, product specifications, website for ordering more, company website, warranty phone number, etc. I have considered parsing all of the retail websites for this information, but I don't know the legalities behind it, and if the websites change I'd need to update my code. If there is ANY (easier/better) way to do this, assistance or direction would be great!
Thanks always
I would not go down the route of trying to parse websites; that will be a huge pain in the neck and impossible to maintain unless you have extensive resources (and, as you mention, it probably violates most sites' terms of service anyway). Your best bet would be to hook into existing product databases via an API, such as Google's Search API for Shopping, or maybe Amazon's API. Here's where you can start if you want to use Google:
https://developers.google.com/shopping-search/
Hopefully that gets you going in the right direction.
Edit: Here's a list of a lot more shopping APIs that could be good options:
http://www.programmableweb.com/apis/directory/1?apicat=Shopping
If you did find yourself needing to parse many different vendor websites (we'd call this "screen scraping") and you have the legal right to do so, you should use a tool like SelectorGadget to get your XPaths; it's much faster, easier, and less error-prone than doing it by hand.
If you're doing more than a couple websites, though, you'll probably find that you'll have to update the scraping rules pretty often, it definitely won't be a set-and-forget operation.
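To make the maintenance point concrete, here is a minimal Python sketch of rule-based extraction. The RULES paths are made up for illustration (in practice you'd get them from a tool like SelectorGadget), and it uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath and needs well-formed markup; real-world HTML usually calls for a more lenient HTML parser.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Hypothetical per-site rules: field name -> ElementTree-style XPath.
RULES = {
    "name": ".//h1[@class='product-title']",
    "price": ".//span[@class='price']",
}


def scrape(xhtml: str, rules: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Apply each rule to the page; a field is None when its rule no longer matches."""
    root = ET.fromstring(xhtml)
    result: Dict[str, Optional[str]] = {}
    for field, path in rules.items():
        node = root.find(path)
        result[field] = node.text.strip() if node is not None and node.text else None
    return result
```

When the vendor redesigns the page (say the title becomes an h2 and the price span disappears), every field silently comes back as None, which is exactly why these rules need regular attention.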

PDF Storage System with REST API

I have hundreds of thousands of PDFs that are presently stored in the filesystem. I have a custom application that, as an afterthought to its actual purpose, provides access to these PDFs. I would like to take the "storage & retrieval" part out of the custom application and use an OpenSource document storage backend.
Access to the PDF store should be via a REST API, so that users would not need a custom client for basic document browsing and viewing. Programs that store PDFs should also be able to work via the REST API. They would provide the actual binary or ASCII data plus structured metadata, which could later be used in retrieval.
A typical query for retrieval would be "give me all documents that were created between days X and Y with document types A or B".
My research into whether such a storage backend exists has come up empty. Do any of you know a system that provides these features? Open source preferred; reasonably priced systems considered.
I am not looking for advice on how to "roll my own" using available technologies. Rather, I'm trying to find out whether that can be avoided. Many thanks in advance.
What you describe sounds like a document management or asset management system, of which there are many, and many work with PDF files. I have some fleeting experience with commercial offerings such as Xinet (http://www.northplains.com/xinet - now acquired, apparently) or Elvis (http://www.elvisdam.com). Both might fit your requirements, but they're probably too big and likely too expensive.
Have you looked at Alfresco? This is an open source alternative I came into contact with years ago while being on the board of a selection committee. As far as I remember it definitely goes in the direction of what you are looking for and it is open source so might fit that angle as well: http://www.alfresco.com.
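Alfresco implements the CMIS standard, which includes a SQL-like query language, so the example retrieval ("created between days X and Y with document types A or B") maps onto a single query. The Python sketch below only builds such a query string. The type name my:document and property my:docType are hypothetical stand-ins for whatever custom content model you define, the backslash escaping follows my reading of CMIS QL string literals (check the spec), and how you submit the query (Alfresco's CMIS endpoint or a client library such as Apache Chemistry) is not shown.

```python
from datetime import datetime, timezone
from typing import Iterable


def cmis_string(value: str) -> str:
    """Quote a CMIS QL string literal, backslash-escaping backslashes and quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def cmis_timestamp(dt: datetime) -> str:
    """Format a timezone-aware datetime as a CMIS QL TIMESTAMP literal in UTC."""
    utc = dt.astimezone(timezone.utc)
    millis = utc.microsecond // 1000
    return f"TIMESTAMP '{utc:%Y-%m-%dT%H:%M:%S}.{millis:03d}Z'"


def build_query(
    start: datetime,
    end: datetime,
    doc_types: Iterable[str],
    from_type: str = "my:document",     # hypothetical custom document type
    type_property: str = "my:docType",  # hypothetical metadata property
) -> str:
    """Documents created in [start, end) whose type property is one of doc_types."""
    types = ", ".join(cmis_string(t) for t in doc_types)
    return (
        f"SELECT cmis:objectId, cmis:name FROM {from_type}"
        f" WHERE cmis:creationDate >= {cmis_timestamp(start)}"
        f" AND cmis:creationDate < {cmis_timestamp(end)}"
        f" AND {type_property} IN ({types})"
    )
```

Pass timezone-aware datetimes; a naive datetime would be interpreted as local time by astimezone.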

Shopping cart framework that supports multiple vendors?

I'm searching for a shopping cart or web store framework that supports multiple vendors.
There are many, many shopping cart frameworks out there: that page lists a couple of hundred. In spite of the comparisons on that page, supporting multiple vendors isn't a comparison item, probably because it's a rare requirement. Separately from that page, I have evaluated a few of what appear to be the top frameworks, and none that I evaluated supported this feature. Which carts would you recommend?
Commercial is okay, although I would prefer open source.
Platform (Windows, Linux, ASP.Net, PHP, Ruby... Minix, Fortran... :)) doesn't matter.
A system where I manually add vendors who request it (instead of them freely being able to sign up) is also okay, if there's a store where that's possible but freely joining up isn't built in yet.
Rationale: I'd like to create an app-store like website. "App store" is a close analogy: it won't sell apps, but it will sell digital goods and I'd like anyone to be able to sell their item on the store. It's this second requirement, multiple vendors selling through the store, that I'm finding hard to satisfy.
I've used multiple shopping cart frameworks (a lot of them broken), and my favorite (which just so happens to support multiple vendors) is PrestaShop. It's free, open source, and supports all that you asked for. Is this the framework you were looking for?
-JXP
The Wikipedia page you cited lists multiple vendor support as a column in Other Features, along with features that are pertinent to your search.
This question otherwise requires domain knowledge and likely requires multiple answers. The best I can do is offer the bounded set of software that competes directly within this space, at least according to Wikipedia.
The easiest solution for achieving your stated goal of allowing multiple people to sell on your site, while exercising fine-grained control over who can and cannot do so, is perhaps using WPMU's MarketPress in tandem with BuddyPress or WordPress Multisite. I'm not a die-hard fan of WordPress, per se, but it might be an expedient way for you to get to a minimum viable product and validate your idea before spending the time and/or cash to custom-build it from the ground up, or laboring ad nauseam over tweaking an existing framework. MarketPress is a good plug-in that'll give you many of the features of a full-fledged e-commerce framework, and BuddyPress, of course, will allow you to set up individual vendors with their own sites under your brand. The two work together. More on MarketPress at:
http://premium.wpmudev.org/project/e-commerce/installation/
Another alternative is Jimdo's PagePartners. I haven't used it, but it looks intriguing. I like their design sensibilities and their stated business ethos. This might be a viable option, too. The caveat: it's not white-label. More info about Jimdo's PagePartners here:
http://www.jimdo.com/pagepartner/faq/
Finally, another interesting CMS to explore is SetSeed. I think it'll allow you to launch a separate site for each vendor via a central hub you control, while maintaining your branding within each. How any remuneration would then flow back to you for setting up an individual vendor's store would be up to you to figure out. This is a fairly new CMS, and it looks like it's evolving smartly and rapidly. If you require some customization to approach more specifically what you ask for, now might be a good time to reach out to the developer, but you might be able to think of an effective way to adapt it for your use right out of the box.
http://setseed.com/multi-site-cms/setseed-hub/
Unfortunately, none of the above is open source, but, again, the ease with which you could get to a functional site approximating your idea may offset that drawback. Jimdo is an open-source contributor, however, so even an e-mail to them might begin a fruitful dialogue. If nothing else, check out each of the above: it may influence how you search for other solutions, and it will at least provide some models for your own thinking or for discussions with other developers. The shopping cart is an integrated feature, I believe, in all of the above cases. For giving your vendors the capacity to deliver digital goods (e-books, MP3s, etc.), check out Fetchapp.com. Very cool app, and very easy to set up; it could probably be rolled into one of the above frameworks, which would handle individual vendor profiles and/or sub-domains.
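Whichever framework you pick, you'll still have to decide how money flows between vendors and the store. As a purely hypothetical model (not how MarketPress, SetSeed, or any other product does it), one simple approach is to group a multi-vendor cart by vendor and hold back a flat commission from each vendor's subtotal. The CartLine type, the "_store" key, and the flat rate are all illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP
from typing import Dict, List


@dataclass
class CartLine:
    vendor: str
    sku: str
    price: Decimal
    qty: int = 1


def split_payouts(lines: List[CartLine], commission_rate: Decimal) -> Dict[str, Decimal]:
    """Group a cart by vendor and compute what each vendor is owed.

    The store keeps commission_rate of each vendor's subtotal (rounded to cents);
    the hypothetical key "_store" holds the store's total cut.
    """
    subtotals: Dict[str, Decimal] = defaultdict(Decimal)
    for line in lines:
        subtotals[line.vendor] += line.price * line.qty
    payouts: Dict[str, Decimal] = {}
    store_cut = Decimal("0.00")
    for vendor, subtotal in subtotals.items():
        fee = (subtotal * commission_rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
        payouts[vendor] = subtotal - fee
        store_cut += fee
    payouts["_store"] = store_cut
    return payouts
```

Decimal avoids binary floating-point rounding surprises with currency; a real store would also need refunds, taxes, and payment-processor fees, none of which are modeled here.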

Searching for a document format: flowing layout + page control

I am bouncing around the idea of creating a custom document versioning system for business rule manuals. These manuals are broken up into outlined sections, one rule per section, numbered in various ways (1.1, 1.2, etc.). Many manuals contain the same rule for different locations in the country (down to the state/county level); however, many locations will have different versions of a rule depending on business needs or whatnot.
My thought is to create a system which will manage versions of each section/rule separately. This would make this mess much easier to maintain (think hundreds of manuals times hundreds of rules), and it would make fielding query requests from management much quicker.
Ok, it's a fairly easy and straightforward design to this point. Now for the monkey wrench. These rules are regulated by government agencies, so they must be submitted to and approved by state agencies. In doing this, many states require only the exact pages which are updated for each request to be submitted for approval. Once they are approved, these pages will get a new effective date and the rest of the manual will remain the same. There are business reasons for this process.
So my choice of document format has to allow for a flowing layout much like Word; however, I need to be able to programmatically determine the page range of each section and whether changes or additions will cause a repagination.
The most complex layout will contain only tables, headers/footers, and a table of contents. I have thought about using OOXML, but I don't see a way to determine pagination without loading Word which is something I would prefer to avoid. I could create my own pagination algorithm, but that sounds a lot like reinventing the wheel.
Can anyone offer pointers to a solution whether it is an open document format, a book, or something else? Thank you for taking the time to read this.
If you want a truly modular document, then DocBook might be worth a look. It has all the rich formatting you need, but it does take a bit of work. It really depends on who's doing the authoring and what tools they're comfortable using. DocBook is a rich markup language, and you can work anywhere from the base plain-text file to a number of WYSIWYG editors, e.g. Arbortext.
It's not Word though - which might be enough to put your authors off!
If you did go with DocBook, you would maintain each document section in a separate text file so your versioning solution would work well. DocBook can produce output in a number of formats simultaneously so you could have an HTML version, an OOXML version, and a PDF version produced from the same source. A PDF version of each changed section might be appropriate to send to government agencies for approval.
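One way to tie per-section files to your versioning idea is to make each location's manual a thin master file that pulls in a chosen version of each rule section via XInclude. The Python sketch below generates such a master as a DocBook 5 article; the file-naming scheme (rules/1.1-v3.xml) and the build_manual helper are hypothetical, and you'd still need a DocBook toolchain that resolves XInclude to produce the HTML/PDF output.

```python
import xml.etree.ElementTree as ET
from typing import List

DOCBOOK_NS = "http://docbook.org/ns/docbook"
XI_NS = "http://www.w3.org/2001/XInclude"
# Serialize DocBook as the default namespace and XInclude with the usual "xi" prefix.
ET.register_namespace("", DOCBOOK_NS)
ET.register_namespace("xi", XI_NS)


def build_manual(title: str, section_files: List[str]) -> str:
    """Return a DocBook 5 <article> that includes one file per rule section, in order."""
    article = ET.Element(f"{{{DOCBOOK_NS}}}article", {"version": "5.0"})
    ET.SubElement(article, f"{{{DOCBOOK_NS}}}title").text = title
    for href in section_files:
        # Each hypothetical file holds one <section> for one version of one rule.
        ET.SubElement(article, f"{{{XI_NS}}}include", {"href": href})
    return ET.tostring(article, encoding="unicode")
```

A location that differs on one rule then differs in only one line of its master file, for example rules/1.2-v2.xml instead of rules/1.2-v1.xml.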
On pagination, you could make life a lot easier for yourself by not having continuous page numbers. Use section or chapter based page numbering, e.g. page I-1, I-2, ..., II-1, II-2.
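Here is a small Python sketch of why chapter-based numbering contains repagination. It assumes you already know how many pages each chapter renders to (from whatever tool produces the PDF) and that each chapter starts on a new page; it then compares which chapters' page labels change when chapter II grows by one page. All function names are illustrative.

```python
from itertools import accumulate
from typing import List

ROMAN = [(10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]


def to_roman(n: int) -> str:
    """Roman numeral for small chapter numbers (1-39 is plenty here)."""
    out = ""
    for value, symbol in ROMAN:
        while n >= value:
            out += symbol
            n -= value
    return out


def continuous_labels(counts: List[int]) -> List[List[str]]:
    """Page labels per chapter when pages are numbered 1..N straight through."""
    ends = list(accumulate(counts))
    return [[str(p) for p in range(end - c + 1, end + 1)] for c, end in zip(counts, ends)]


def chapter_labels(counts: List[int]) -> List[List[str]]:
    """Page labels per chapter with chapter-based numbering: I-1, I-2, ..., II-1, ..."""
    return [[f"{to_roman(i)}-{p}" for p in range(1, c + 1)] for i, c in enumerate(counts, start=1)]


def changed_chapters(before: List[List[str]], after: List[List[str]]) -> List[int]:
    """1-based chapter numbers whose page labels differ (same chapter count assumed)."""
    return [i for i, (b, a) in enumerate(zip(before, after), start=1) if b != a]
```

Going from page counts [2, 3, 1] to [2, 4, 1], continuous numbering changes the labels of chapters 2 and 3, so chapter 3's unchanged pages would also need resubmission; with chapter-based numbering only chapter 2 changes.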

Is there a well known classifier library?

I'm crawling data from the internet without classifying it.
Is there such a library you would recommend?
EDIT
I'm crawling job listings from other websites, and I need to group them into different industries.
To sort unlabelled data into groups, you want clustering, not classification. The most complete machine learning library is the Java-based Weka. You'll probably want to start by extracting text from the web pages (remove script and style elements completely, strip other tags), and then running the text through the StringToWordVector filter before performing clustering.
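Weka itself is Java, so this Python sketch only illustrates the preprocessing step described above: drop script and style elements entirely, strip the remaining tags, and turn the text into word counts, similar in spirit to what StringToWordVector produces (it is not a reimplementation of that filter). The clustering itself would then run on these vectors, for example in Weka.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from typing import List


class TextExtractor(HTMLParser):
    """Collect visible text, dropping <script> and <style> contents entirely."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def page_text(html: str) -> str:
    """Visible text of a page with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def word_vector(text: str) -> Counter:
    """Lower-cased word counts: a simple bag-of-words representation."""
    return Counter(re.findall(r"[a-z]+", text.lower()))
```

In practice you would also drop very common words and weight terms (e.g. TF-IDF) before clustering, since raw counts let words like "the" dominate.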
My current employer developed a system to categorize web pages. There were not any useful libraries that we could find so we had to do our own. We do not license ours out.
I can give you some hints. Spam analyzers classify email into Junk or Not Junk. You can use the same tools, such as Bayesian filters, CRM-114, etc., to do your own classification of any text, including web pages.
You will have to watch the results of these very carefully and give them a lot of human feedback. You can often find keyword sets that will score very well for you. Finding those keyword sets will take time and effort, and they will change somewhat over time.
You will have to write code to divide web pages into topic sections because most pages are not all one thing. There are ad frames, navigation and other things.
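To show the idea behind the Bayesian spam-filter approach, here is a minimal multinomial naive Bayes classifier in Python with add-one smoothing. It is not CRM-114's algorithm or any particular spam filter's, and unlike clustering it needs labelled training examples; the job texts and industry labels below are made up.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def tokens(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.doc_counts: Counter = Counter()
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab = set()

    def train(self, examples: Iterable[Tuple[str, str]]) -> None:
        """examples are (text, label) pairs, e.g. a job listing and its industry."""
        for text, label in examples:
            self.doc_counts[label] += 1
            words = tokens(text)
            self.word_counts[label].update(words)
            self.vocab.update(words)

    def classify(self, text: str) -> Optional[str]:
        """Return the label with the highest log-probability (None if untrained)."""
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # class prior
            for w in tokens(text):
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

As the answer says, the quality comes from the human feedback loop: misclassified listings get corrected labels and are fed back into train.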