hypercard data file types and start patterns needed - recovery

I am trying to recover deleted HyperCard data files.
I need to know both file extentions and "start Patterns" in the files, so that we can recover the deleted HyperCard data files.

By "data files", do you mean "stack" document files?
File extensions were not relevant to HyperCard "stacks" on Macintosh systems before OSX.
The 4-character file type identifier STAK was used by the Mac OS to associate HyperCard files ("stacks") with the HyperCard application, consistent with the then method of association. (Accordingly, file extensions were not used for this purpose by Mac OS prior to OSX; the filename including extension had no impact on its association to an application).
Accordingly, the first characters in a HyperCard stack were a bunch of Mac-specific characters including the substring "STAK". (the rest of those Mac characters won't paste here).
Have you considered how you will recover both data fork and resource fork? This may involve different challenges depending on the filesystem you are using, or how the original files were stored until now, because modern filesystems do not store data forks and resource forks in the same way that HyperCard "stack" files (and other documents from that era) used to.
Are you running an old OS in an emulator for this recovery task? It may make it easier.
In case it helps, you can also often get the "stack script" by opening a HyperCard "stack" file in your texteditor of choice.

Related

Postgresql fulltext search for Czech language (no default language config)

I am trying to setup fulltext search for Czech language. I am little bit confused, because I see some cs_cz.affix and cs_cz.dict files inside tsearch_data folder, but there is no Czech language configuration (it's probably not shipped with Postgres).
So should I create one? Which dics do I have to create/config? Is there some support for Czech language at all?
Should I use all possible dicts? (Synonym Dictionary, Thesaurus Dictionary, Ispell Dictionary, Snowball Dictionary)
I am able to create Czech configuration for ispell dict and it works fine, bud I am not sure if it's enough (just ispell configuration).
Thanks a lot I tried to read https://www.postgresql.org/docs/9.5/static/textsearch.html but I am little bit confused.
I have never tried it, but you should be able to create a Czech Snowball stemmer as long as you are ready to compile PostgreSQL from source.
There is an explanation in src/backend/snowball/README:
The files under src/backend/snowball/libstemmer/ and
src/include/snowball/libstemmer/ are taken directly from their libstemmer_c
distribution, with only some minor adjustments of file inclusions. Note
that most of these files are in fact derived files, not master source.
The master sources are in the Snowball language, and are available along
with the Snowball-to-C compiler from the Snowball project. We choose to
include the derived files in the PostgreSQL distribution because most
installations will not have the Snowball compiler available.
To update the PostgreSQL sources from a new Snowball libstemmer_c
distribution:
Copy the *.c files in libstemmer_c/src_c/ to src/backend/snowball/libstemmer
with replacement of "../runtime/header.h" by "header.h", for example
for f in libstemmer_c/src_c/*.c
do
sed 's|\.\./runtime/header\.h|header.h|' $f >libstemmer/`basename $f`
done
(Alternatively, if you rebuild the stemmer files from the master Snowball
sources, just omit "-r ../runtime" from the Snowball compiler switches.)
Copy the *.c files in libstemmer_c/runtime/ to
src/backend/snowball/libstemmer, and edit them to remove direct inclusions
of system headers such as <stdio.h> – they should only include "header.h".
(This removal avoids portability problems on some platforms where <stdio.h>
is sensitive to largefile compilation options.)
Copy the *.h files in libstemmer_c/src_c/ and libstemmer_c/runtime/
to src/include/snowball/libstemmer. At this writing the header files
do not require any changes.
Check whether any stemmer modules have been added or removed. If so, edit
the OBJS list in Makefile, the list of #include's in dict_snowball.c, and the
stemmer_modules[] table in dict_snowball.c.
The various stopword files in stopwords/ must be downloaded
individually from pages on the snowball.tartarus.org website.
Be careful that these files must be stored in UTF-8 encoding.
Now there is a Czech Snowball stemmer available here, it was contributed to the project. There is no stop word dictionary available, but I am sure you can either find one or create one yourself.
The real work would be to install Snowball and use the Snowball-to-C compiler to create the C and header files to add to the PostgreSQL source.
These files should then remain stable, so it shouldn't be difficult to upgrade to a new PostgreSQL version.
If you are willing to do the work, but don't want to patch PostgreSQL and build it from source every time, you could also consider submitting a patch to PostgreSQL. As long as the stemmer works fine, I don't expect that you will much resistance there (but the patch submission process is still tedious).

Which source control uses a "s." prefix on its filenames?

I found what appears to be an old source repository for some source code that I need to resurrect. But I have no idea what source control tools were used to generate and manage this source repository. In the directory, all of the files have a "s." prefixed to the file name. Without knowing the format in these files, I cannot manually extract the source code with any degree of accuracy. And even if I did, manually extracting the source code would be very time consuming and error prone.
What source/version control system prefixes its source files with "s." when it stores the source file in its repository directory?
How can I effectively extract the latest source code from this repository directory?
The s. prefix is characteristic of SCCS, the Source Code Control System. The code for that is probably still proprietary, but GNU has the CSSC project which can manipulate SCCS files. It tracks changes per-file in revisions, known as 'deltas'.
SCCS is the official revision control system for POSIX; you can find the commands documented on the Open Group site (but the file format is not specified there, AFAICT):
admin
delta
get
prs
rmdel
sact
unget
val
what
The file format is not specified by POSIX. The manual page for get says:
The SCCS files shall be files of an unspecified format.
The original SCCS command set included some extras not recorded by POSIX:
cdc — change delta commentary (for changing the checkin comments for a delta)
comb — combine, effectively for merging deltas
help — no prefix; the wasn't any other help program at the time. Commands generate error codes such as cm3 and help interpreted them.
sccsdiff — difference between two deltas of a file
Most systems now have a single command, sccs, which takes the operation name and then options. Often, the files were placed into an ./SCCS/ subdirectory and extracted from that as required, and the sccs front-end would handle name expansion, adding s. or SCCS/s. to the start of the file names.
For extracting the latest version of the source code, use get.
get s.*
sccs get s.*
These will get the default version of each file, and the default default is the latest version of the file.
If you need to make changes, use:
get -e s.filename.c
...make changes...
delta -y'Why you made the changes' s.filename.c
get s.filename.c
Note that the files 'lose' the s. prefix for the working file names, rather like RCS (Revision Control System) files lose the ,v suffix for the working file names. If you've not come across that, accept that it was different when SCCS and RCS were created, back in the late 70s or early 80s.
SCCS uses an s. prefix. But it might not be the only one!
I never knew this knowledge would come in useful some day!

Archiving succesive beta versions : how to save harddisk space?

I archive successive versions of an in-progress work :
MySoftware-v1.01beta.rar [2 GB]
MySoftware-v1.02beta.rar [2 GB]
MySoftware-v1.03beta.rar [2 GB]
MySoftware-v1.04beta.rar [2 GB]
etc.
Lots of files are modified, so it's not possible to backup only modified files : most of the files are modified each time.
How can do a .rar file that only saves the "difference" (should I use something like "patch" or "diff" ? -> I never used them). There are lots of "difference" tool, okay, but the result file won't be a .rar, it will only be a "difference file" : so each time I would like to re-open such an archive, I'll have to "de-diff" it and only THEN I will have a .rar again.
I'm on Windows, and if possible, I'd like to use winrar or command-line tool (it would be great if no third party software is needed).
Thanks a lot in advance!
You say 90% of your product is .wav files. Since diff on two wav files that are different is likely to produce huge differences, this is not likely to save you any space. Nor are .wav files really compressible, so zip or rar likely doesn't help much, either.
However, if, like most of us programmers, you derive your next version of the product from the previous one, by mostly retaining files unchanged (whether that be source or be .wav files), then what you really want to do is simply store, for each version, the files that changed. This is called "de-duplication" in the backup/compression world.
You can organize a complicated scheme your self to do this. (e.g., your self-suggested "do this with winrar"). But if you use a decent "source control system" (SVN or GIT would be fine), this will happen automatically as you checkin changed (and don't re-checkin unchanged) files. These tools work by keeping track of "differences" between versions; you can tell the tools to track text ("diff") style differences, or simply store the entire thing.
Also, since your individual versions occupy 2GB, I'd go waste $100 on a 2 or 4 terabyte (external) drive. That should last you in worst case through some 1000 iterations. (SVN/GIT will likely extended this a lot further).
You should really be using a source control system. A popular one is called 'git'. There are many others, each with their own strengths and weaknesses and the debate about which is 'best' is long and tedious.
Source control systems take care of storing and managing revisions of your files. The actual methods vary, but as a programmer who uses version control you 'check in' files for storage and version control, 'tag' them with revision numbers and then 'check out' files for modifying.
If you've ever downloaded source off the Internet using 'svn' or 'cvs', that's the type of thing I mean.
The source control system usually uses some sort of difference system to only store differences between modified files. Its purpose is to save you from having to even think about copying and backing up files - all you have to do is ensure your 'repository' is backed up correctly.
Also, as an added advantage you can make changes to source files and always have backups in case your changes need reverting. So suppose you want to try out a new file handling system you can use the source control system to create a testing (or whatever you want to call it) 'branch' and do all your changes in there without damaging a working copy of your software. If the changes are good you can then 'merge' the changes into the non testing branch of your repository.

Sync (Diff) of the remote binary file

Usually both files are availble for running some diff tool but I need to find the differences in 2 binary files when one of them resides in the server and another is in the mobile device. Then only the different parts can be sent to the server and file updated.
There is the bsdiff tool. Debian has a bsdiff package, too, and there are high-level programming language interfaces like python-bsdiff.
I think that a jailbreaked iPhone, Android or similar mobile device can run bsdiff, but maybe you have to compile the software yourself.
But note! If you use the binary diff only to decide which part of the file to update, better use rsync. rsync has a built-in binary diff algorithm.
You're probably using the name generically, because diff expects its arguments to be text files.
If given binary files, it can only say they're different, not what the differences are.
But you need to update only the modified parts of binary files.
This is how the Open Source program called Rsync works, but I'm not aware of any version running on mobile devices.
To find the differences, you must compare. If you cannot compare, you cannot compute the minimal differences.
What kind of changes do you do to the local file?
Inserts?
Deletions?
Updates?
If only updates, ie. the size and location of unchanged data is constant, then a block-type checksum solution might work, where you split the file up into blocks, compute the checksum of each, and compare with a list of previous checksums. Then you only have to send the modified blocks.
Also, if possible, you could store two versions of the file locally, the old and modified.
Sounds like a job for rsync. See also librsync and pyrsync.
Cool thing about the rsync algorithm is that you don't need both files to be accessible on the same machine.

Hash of an .exe file

I'm wondering whether I will ever get a different result when producing a checksum on an .exe file before and then while or after running that file. I'm more concerned with common practice (such as producing a SHA hash of popular app like firefox.exe) than with boundary cases, but both are interesting. Thanks.
The hash of a file should be constant for as long as the file is identical (i.e. contains only the same bytes, in the same order). It's very rare to find applications that rewrite their on-disk representation at runtime, so the hash should be constant. There are self-modifying programs, but they tend to operate on the in-memory loaded copy of their code, rather than the disk copy.
Edit: We should consider "Self-updating" applications, but these tend to launch a little helper program to download and update the core application. It's difficult (especially on Windows) to update an execution whilst it's running. UNIX systems tend to operate Copy on Write systems, so it's possible that a software update might change your executable under your feet - but again, this is a "corner case".
The hash will only change if the exe changes. That will only happen if the app modifies itself, which isn't going to happen on windows without the app restarting. Firefox might update itself (including a restart), but apart from such cases, the hash will remain the same.
The hash will change if the file changes.
EXE files rarely change on their own. firefox.exe would change if the user updates to a new version.
You can check the "date modified" attribute of an EXE file (like firefox.exe) after running it to see whether it has changed, but you'll probably find it hasn't.
If you mean the modification of the last access time, don't worry, it's stored at the filesystem level, not within the file so the hash will remain the same.