Why is there no .traineddata file in eng - tesseract

The question is as the title suggests: Why is there no eng.traineddata file in the folder eng?
I downloaded all the languages as a zip (I did not see any other option) from here and unzipped langdata-master.zip. From there, I navigated to the eng folder, but it did not contain the eng.traineddata file that many people suggested should be there. Is there some download I am missing?
Thanks!

The trained data files are available in the tessdata project, not in the langdata project you looked in.
The langdata project contains the source training data that was used to create the trained data files. It's useful for people who want to make changes and build their own trained data files.
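Once you have eng.traineddata from the tessdata project saved in a local folder, a minimal Python sketch (assuming pytesseract and Pillow are installed and the tesseract binary is on your PATH; "sample.png" and "./tessdata" are just illustrative names):

    # Point tesseract at a local folder holding the downloaded eng.traineddata.
    from PIL import Image
    import pytesseract

    text = pytesseract.image_to_string(
        Image.open("sample.png"),              # any image to OCR
        lang="eng",                            # matches eng.traineddata
        config='--tessdata-dir "./tessdata"',  # folder with the .traineddata file
    )
    print(text)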

Related

Importing source files and folders into IAR Workbench

I have a couple of source files in a certain folder structure in my file system. I want to use this structure for a project in the IAR Workbench. Thinking of Eclipse, this would be so easy! But in the IAR Workbench, the folders become "Groups", which are only a kind of virtual folder. The Workbench doesn't care about folders.
Is there some easy and fast way to import them?
Up to now I have had to add each group manually and then add the files to the groups, and that's really annoying!
Is there maybe a tool to generate a proper project file (*.ewp) from a file/folder structure?
This would help me a lot!
You should have a look at the IAR Project > Add Project Connection command.
Although IAR doesn't seem to have any public documentation on the xml syntax (at least I couldn't find any), you can find Infineon DAVE (Config.xml) and Freescale PE (ProjectInfo.xml) files if you search around. These can be used as examples to figure out the syntax and write your own xml files for one of these interfaces, which lets you specify where all your c, h, assembly and library files are, wherever they may be in your file system. They also allow you to define preprocessor includes for the compiler/assembler, and DAVE allows you to define a path variable, which is also very useful.
See: https://mcuoneclipse.com/2013/11/01/iar-arm-v6-7-comes-with-improved-processor-expert-support/
I have modified a DAVE Config.xml file and found it EXTREMELY useful for managing and migrating even just a handful of project files. For example, to upgrade to a new release where all files have a new directory root, you just change a single line in the xml file (defining the new root), and all source files, compiler includes, etc. are updated to the new level. No more manually editing the preprocessor includes or replacing all the files in the project. And no more fiddling around with ../../ file-system navigation; you specify directly (or indirectly via a path variable) where the files are, rather than relative to wherever your project happens to be. VERY NICE.
IAR should consider opening this up (documenting it) for general users, as it is very useful for project management and migration. While they're at it, they should also consider generalizing the xml syntax a little bit: allow definition of IAR group heading names, allow specifying the linker file name, and definitely allow multiple xml files to be included (connected), so that subprojects can be easily added or removed without affecting the other subprojects' definition files, and a few basic things like that.
If they were to do a bang-up job on this, they might consider allowing most/all aspects of IAR project configuration that might be required by the subproject to be defined in these xml files; then entire (sub)projects could just be plopped down anywhere and be up and running extremely quickly (OK, just let me dream a bit :)
For anyone who happens upon this you may want to check out https://github.com/IARSystems/project-migration-tools. They have a tool for pulling in file trees here.
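If you'd rather roll your own, here is a hypothetical sketch of the kind of generator asked about above: it walks a source tree and emits nested <group>/<file> elements in the style found inside .ewp files. IAR does not document the format, so the element names here are inferred from existing project files; compare the output against a real .ewp before trusting it.

    import os
    import xml.etree.ElementTree as ET

    def add_tree(parent, folder):
        for name in sorted(os.listdir(folder)):
            path = os.path.join(folder, name)
            if os.path.isdir(path):
                group = ET.SubElement(parent, "group")
                ET.SubElement(group, "name").text = name  # becomes an IAR "Group"
                add_tree(group, path)
            elif name.endswith((".c", ".h", ".s")):
                file_el = ET.SubElement(parent, "file")
                # $PROJ_DIR$ is IAR's variable for the project directory
                ET.SubElement(file_el, "name").text = "$PROJ_DIR$\\" + path

    root = ET.Element("project")
    add_tree(root, "src")  # your source root
    ET.ElementTree(root).write("generated.ewp", encoding="UTF-8")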

Packaging multiple applications together

I have a project that requires me to add another application to the package.
This application will act as a proxy, such as the one described in the BBMSDKDemoProxy sample project. I'd like the user to download one package and have both applications installed, with my main application launched via the proxy.
The problem is that I don't know exactly what steps to follow to achieve this. The project will be distributed via the app world, but I'd like to know how to do this via a website too.
I've found a link stating that you simply add both applications in a single .zip and upload that to the app world, but I want to be sure about this.
Any help would be greatly appreciated, thanks.
For AppWorld: just include the extra cod file in the file set to be uploaded. If you are uploading a zip then include the extra cod in the zip content.
For Desktop/BES: you can include the cod file alongside the other files and manually edit the .alx to add an entry for the new module. I would not recommend doing this unless you have a good understanding of the alx format and the different elements in the descriptor.
For OTA downloads: you'd place the new cod file with the other cod files (if it contains sibling cods, you'd publish the siblings instead). Then you can manually edit the .jad file to add the new module(s).
Of these 3 options, only the first one is safe. Manually editing the alx or jad is tricky, and it is very easy to make mistakes. If you need files for Desktop/BES or OTA installs, I'd add a new library project as #preetam suggested in the comments.
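For reference, the OTA edit described above amounts to appending numbered RIM-COD entries to the .jad. A rough Python sketch (the key names follow the usual BlackBerry OTA convention, and the file names are made up; verify against a jad produced by your own tooling before deploying):

    import os

    def add_module(jad_path, cod_name, index):
        # Append RIM-COD-URL-n / RIM-COD-Size-n entries for an extra module.
        size = os.path.getsize(cod_name)
        with open(jad_path, "a") as jad:
            jad.write("RIM-COD-URL-%d: %s\n" % (index, cod_name))
            jad.write("RIM-COD-Size-%d: %d\n" % (index, size))

    # e.g. the proxy application as the second module
    add_module("MyApp.jad", "ProxyApp.cod", 2)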

Custom Eclipse (CDT) project layout, different from folder structure

A good hello to you fellow Stackoverflow people.
I am stuck with a small dilemma here.
At my work we used to work with UltraEdit projects, but we want to migrate to Eclipse CDT. (We're not using its compiler/build options; we need an external SDK for that.)
On the hard disk we have a specific folder structure to keep things separate between the two teams, namely the "productcode" + "applicationcode" group and the "drivercode" group.
Both groups have their own folder where they place source code:
application
drivercode
productcode
The file names are given a specific prefix denoting the "layer" they belong to:
os (operating system)
application
system
unit
component
IO
hardware
All of these files (except for the application layer, which is only allowed in the application folder) can be in either the productcode or drivercode folder.
In UltraEdit all of these files are grouped under their respective layer. So our project has the following folders:
0 Operating System
1 Application Layer
2 System Safety Layer
3 Unit Layer
4 Component Layer
5 IO Layer
6 Hardware Layer
Generic
XML
The virtual folder "0 Operating System" holds all os_xxx files from the real folders "drivercode" and "productcode", and the same goes for layers 2, 3, 4, 5 and 6.
TL;DR:
Is it possible to get the same (virtual) folder structure within Eclipse CDT?
To make things more complex, this whole folder structure is divided over 3 projects, e.g. proj-1, proj-2 and proj-3, and there is also a shared folder that holds code shared among the projects.
I had a similar situation. Rather than a bunch of hunt-and-peck for linked resources, which tends to break the ability to reuse the .*project files elsewhere, I made a "workspace setup" script that just symlinked the sources into the directories where their projects were. That way the default Eclipse mechanisms (build all source within a tree) just work out of the box.
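Something along these lines, as a minimal Python sketch (the prefix-to-layer mapping is my guess from the question; adapt it to your own naming):

    # Symlink prefixed sources from the real folders into per-layer folders
    # inside the Eclipse project, so the default "build everything under the
    # project" mechanism sees them.
    import os

    LAYERS = {
        "os_":   "0 Operating System",
        "app_":  "1 Application Layer",
        "sys_":  "2 System Safety Layer",
        "unit_": "3 Unit Layer",
        "comp_": "4 Component Layer",
        "io_":   "5 IO Layer",
        "hw_":   "6 Hardware Layer",
    }

    def link_sources(src_dirs, project_dir):
        for src in src_dirs:
            for name in os.listdir(src):
                for prefix, layer in LAYERS.items():
                    if name.startswith(prefix):
                        dest = os.path.join(project_dir, layer)
                        os.makedirs(dest, exist_ok=True)
                        os.symlink(os.path.abspath(os.path.join(src, name)),
                                   os.path.join(dest, name))
                        break

    link_sources(["productcode", "drivercode"], "proj-1")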
I have found one way, but it is quite cumbersome.
I can create the structure I want using linked resource folders and files.
However, this means I need to go through all the dialogs per folder/file in order to add them to the list. I hope there is another way, though, so I'll not accept my own answer as of yet.
Eclipse CDT plays well with existing projects.
I guess you probably also have a manually written Makefile? Then you only need to use File -> Import -> C/C++ -> Existing Code as Makefile Project.
This will leave all your source where it was, and team members who prefer not to use Eclipse can still use whatever they want and build from the command line.

Should I put my output files in source control?

I've been asked to put every single file in my project under source control, including the database file (not the schema, the complete file).
This seems wrong to me, but I can't explain why. Every resource I find about source control tells me not to put generated output files in a source control system. And I understand: they're not "source" files.
However, I've been presented with the following reasoning:
Who cares? We have plenty of bandwidth.
I don't mind having to resolve a conflict each time I get the latest revision, it's just one click
It's so much more convenient than having to think about good ignore files
Also, if I have to add an external DLL file to the bin folder now, I can't forget to put it in source control, as the bin folder is no longer ignored.
The simple solution for the last bullet point is to add the file to a libraries folder and reference it from the project.
Please explain if and why putting generated output files under source control is wrong.
You haven't explained what "the database file" is.
I would certainly include 3rd party libraries in source control, as they're necessary for the build and it's good to have a way of reproducing a build at a later time with the library versions you used at that particular moment. But yes, those libraries should be included from a "libraries" folder rather than the output directory.
I wouldn't generally include my own libraries built from the sources elsewhere in the same repository - although I have been in situations where that's been worth doing, where some projects didn't use the "latest and greatest" version of a common library, but just occasionally updated.
The most important practical argument I'd give against including everything, in a world where disk, processor and network are considered free and instantaneous, is that it makes it harder to tell what really changed for any given commit. It's easier to look down a list of 3 source files than 3 source files and 150 binaries from the obj/bin directories.
Generated output files (in general) are "dangerous" in a VCS because:
what you need to version is how to regenerate them: the day you actually need to update them, chances are you won't remember how to do it
they can contain some privately generated file which makes them work on the committer's desktop, but not on a client's ("works on my machine" TM syndrome)
some generated files are not easily stored as deltas (binaries especially), making them consume lots of space (and the topic of cleaning up that space will come up someday...)
External libraries are not generated directly by your project, and can be put in a VCS, although external repositories like a public Maven repo are better at this kind of management.
Do we also put in compiled object files such as class files, executables and DLLs built from our source? What about when we're doing serious volume testing and that database becomes many gigabytes or terabytes in size?
The clue is in the name: it's Source Code Management System.
I can understand the simplicity of putting everything in; it makes it less likely that a developer will forget some important file. But if you're doing regular automated builds, then surely that would get picked up anyway?
I think the key phrase is here:
"It's so much more convenient than having to think about good ignore files"
Are you explicitly forbidden from having good ignore files? My guess is that you are already excluding .exe and .class (or whatever) files. Suppose you did take the trouble to exclude your database; would that be a problem? Why? It's a conscious action that you are choosing to take for the common good. In Eclipse it's a couple of seconds' work to add a new file type to the workspace's CVS ignore rules for all projects.
A rule of "No Ignore Files" is almost self-evidently absurd. Once you have the freedom to have some ignore files, why not just use them intelligently to exclude the DB? Who is inconvenienced? Only yourself, if anyone, and you're prepared to do the extra work.
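For what it's worth, such an ignore list is only a few pattern lines; for example (the patterns, including the database file name, are illustrative):

    *.exe
    *.class
    *.dll
    bin/
    obj/
    mydatabase.mdf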

Organizing Master Graphics Files For A Web Application (Photoshop / Gimp)

Some of us programmers have to deal with graphics files every once in a while...
Making quick fixes to existing graphics
Throwing together a quick sprite as a prototype
Creating an icon because the designer is too busy
I don't think anybody should ever be editing web graphics (jpg / gif / png) directly. They should edit the master file and export over top of the web graphics.
Right now all the psd files in our system are on the designers' or various other people's drives, and if a programmer ever needs to make a proper change, they can't do it without high overhead.
How should I approach solving this impediment?
Save the *.psd / *.xcf files right alongside the web graphics (*.jpg, *.png, *.gif) so they're easy to find. Either train the graphic designers in source control or have somebody responsible for integration. I tend to like this because it makes everything very discoverable, and you can just have your web publishing system skip *.psd and *.xcf. However, I can also see a coder not liking 2 MB psd files crufting up their repository and having to download them on every svn update.
Create a separate source control repository or network share for master graphics files.
Continue as is...
Any other suggestions?
I always treat the PSDs used on the projects I work on as source files, and the jpegs and gifs that are exported from those PSDs as binaries. The same goes for Word files that turn into PDFs.
My trunk normally has two subfolders:
trunk
/source
/deploy
Everything that doesn't go up onto a server belongs in the source folder, and everything else belongs in deploy. If I create a database export that serves as a backup, I store it in a sub-folder of the source folder (since it doesn't go onto the server).
There are purists who claim that binary files do not belong in version control. I say that is BS. A PSD is a source file for your jpegs and gifs in the same way a .java file is the source of a .class file. I say, treat it as such.
I'd go for option 2. As you say, the overhead of large PSDs in your source repository is probably too high for comfort!
Have you looked into scripting the Photoshop export so that the "change PSD file" -> "updated graphics on website" workflow is as compact and easy as possible? That would make the whole enterprise really attractive for everyone IMO, especially if it's set to run as a build step on commit.
EDIT: Photoshop seems to support JavaScript as a scripting language, so getting a satisfying result should be fairly straightforward: http://morris-photographics.com/photoshop/tutorials/scripting1.html
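On the GIMP side of the question, the same idea can be scripted with Python-Fu. A sketch of re-exporting a master .xcf as a flattened .png, so the web graphic is always regenerated from the source file (file names are illustrative; run it from GIMP's Python-Fu console or batch mode, and note the PDB calls are from GIMP 2.x):

    from gimpfu import pdb

    def export_png(xcf_path, png_path):
        image = pdb.gimp_file_load(xcf_path, xcf_path)  # load the master file
        drawable = pdb.gimp_image_flatten(image)        # merge layers for export
        pdb.file_png_save(image, drawable, png_path, png_path,
                          0,              # no interlacing
                          9,              # max compression
                          1, 1, 1, 1, 1)  # default PNG chunk flags
        pdb.gimp_image_delete(image)

    export_png("logo.xcf", "logo.png")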