Generate a torrent/magnet link from a single file in a torrent collection - metadata

I was wondering if it is possible, given a torrent collection (i.e. a torrent containing multiple files), to extract a single one, generating an almost-new torrent/magnet link to download only that single file but using the same source (announce, etc.), instead of downloading the whole torrent and then selecting what to download or not.
Thanks for any hint.

2019 Update: Yes, you now can! In 2017 a draft BEP was released that covers exactly the behaviour the question asks about for magnet URIs. This is great, as it creates a standard that keeps a consistent info_hash between a magnet URI pointing to the multi-file torrent and a magnet URI pointing to a single file within it. They will share a swarm, which means you can, as the question asks, "[generate] an almost new torrent/magnet link to download only that single file but using the same source".
The draft BEP:
http://www.bittorrent.org/beps/bep_0053.html BEP 53: "Magnet URI extension - Select specific file indices for download"
Example URI to request files 0, 2, 4 and the inclusive range 6 through to 8:
magnet:?xt=urn:btih:HASH&dn=NAME&tr=TRACKER&so=0,2,4,6-8
And the draft BEP is making its way into bittorrent libraries:
https://gitlab.com/proninyaroslav/libretorrent/tags/1.9 LibreTorrent 1.9 2018-NOV-26
https://github.com/webtorrent/webtorrent/issues/1395 Webtorrent 0.100.0 2018-MAY-23
2013-MAY-03 Original Answer:
Sometimes yes, but not often, and the resulting swarm has no peers.
Firstly, you need the original .torrent file, so if you only have a magnet URI you need to resolve that to a .torrent using DHT. Any bittorrent library that supports magnet URIs has the code for that task.
Once you have the .torrent, you then need the hashes relating to the file you're interested in. The info dictionary's pieces value is one very long string: each 20 bytes is the SHA-1 hash of one piece of the torrent. Piece length is fixed for a torrent, typically between 256KB and 1MB. If the file starts at exactly a piece boundary, and is either sized as a multiple of the piece length or is the last file in the torrent, then you can reuse these hashes. You can then create a new .torrent file with that information and generate a new magnet URI from it, re-using the announce or using a new one.
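To make the alignment check concrete, here is a minimal Perl sketch. It uses no bittorrent library; the inputs (the file's byte offset within the torrent's concatenated data, its length, the piece length, and the raw pieces string from the info dictionary) are assumed to have been extracted already by whatever bencode decoder you use:

use strict;
use warnings;

# Return the 20-byte piece hashes covering a file, but only when they
# can be reused verbatim in a new single-file torrent.
sub reusable_piece_hashes {
    my ($file_offset, $file_length, $piece_length, $pieces, $is_last_file) = @_;

    # The file must start exactly on a piece boundary...
    return undef if $file_offset % $piece_length;

    # ...and must either fill whole pieces or be the last file in the torrent.
    return undef unless $file_length % $piece_length == 0 || $is_last_file;

    my $first_piece = $file_offset / $piece_length;
    my $num_pieces  = int(($file_length + $piece_length - 1) / $piece_length);

    # Each piece hash is 20 bytes of SHA-1, concatenated in piece order.
    return substr($pieces, $first_piece * 20, $num_pieces * 20);
}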
Torrent info structure: https://wiki.theory.org/BitTorrentSpecification#Metainfo_File_Structure
Being lucky enough to get that alignment is improbable: with a piece length between 256KB and 1MB, and a file able to start anywhere within a piece, you have roughly a 1/262,144 to 1/1,048,576 chance of landing exactly on a boundary. If you can't re-use the hashes, you need to generate new ones, which means you can't re-use the .torrent and would have to download the files to compute the new piece hashes.
The killer is that, in the end, the created torrent has a different info_hash. The info_hash is the hash of the info dictionary describing the torrent; it previously described many files and now describes a single file, so it's a new torrent and there's no-one to leech from. Peers collect into swarms based on the info_hash, and if you create a new torrent from one file of a multi-file torrent, the peers of the multi-file torrent don't know about it and won't be available to leech from.
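The info_hash relationship is easy to see in code. A hedged sketch (Digest::SHA is a core Perl module; $bencoded_info is assumed to be the raw bencoded bytes of the info dictionary, exactly as they appear in the .torrent file):

use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

# The info_hash is the SHA-1 of the raw bencoded 'info' dictionary,
# byte for byte as it appears in the .torrent file. Change one byte of
# the file list and the hash - and therefore the swarm - changes.
binmode STDIN;
my $bencoded_info = do { local $/; <STDIN> };
print "info_hash: ", sha1_hex($bencoded_info), "\n";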
Even if you're lucky enough to get the right piece offsets, you create a torrent that doesn't have anyone sharing the file.
So, could you instead re-use the magnet URI and just specify a file name within the torrent? No - the BEP that describes how BitTorrent uses magnet URIs doesn't cover this behaviour: http://www.bittorrent.org/beps/bep_0009.html

Migrating from itext2 to itext7

Years ago, I wrote a small app in itext2 to gather reports on a weekly basis and concatenate them into one PDF. The app used com.lowagie.text.pdf.PdfCopy to copy and merge the PDFs. And it worked fine. Performed exactly as expected.
A few weeks ago I looked into migrating the application to iText 7. To that end, I used the copyPagesTo method of com.itextpdf.kernel.pdf.PdfDocument. When run on the same file set, this produces warnings like:
WARN PdfNameTree - Name "section.1" already exists in the name tree; old value will be replaced by the new one.
When I click on the link to "section.1" in the first document of the merged PDF, I am taken to "section.1" of the last document. Not what I expected, and not what happens when using the itext2 app. In the PDFs produced by itext2, if I click on the link to "section.1" of the first document in the combined PDF, I am taken to section 1 of the first document.
There is a hint in the Javadocs for copyPagesTo, saying:
If outlines destination names are the same in different documents, all such outlines will lead to a single location in the resultant document. In this case iText will log a warning. This can be avoided by renaming destinations names in the source document.
There is, however, no explanation of how this should be done. I find it odd that this should be necessary in itext7 when it wasn't in itext2.
Is there a simple way to get around this problem?
I've also tried the Sejda desktop app and it produces correct results, but I would prefer to automate the process through a batch script.
My guess is iText 2 didn't even know it might be a problem.
If iText can't deduplicate destination names for you, the procedure is roughly:
Follow /Catalog -> /Names -> /Dests in each document to find the destination name tree.
Deduplicate the names by adding suffixes. Remember that a name with a suffix added might be equal to an existing name in the same or another document. Be careful!
Now you can rewrite the destination name trees. Since you have only added suffixes, you can do this in place: the lexicographic ordering of the names is unaltered, so the search tree structure is not broken.
Now rewrite the destination links in each PDF to the new names: for example, any dictionary entry with key /Dest, or any /D in a /GoTo action.
After all this preprocessing, the files will merge without name clashes.
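The name tree walking and link rewriting are specific to your PDF library, but the deduplication step is generic. A rough Perl sketch of just that step (the function name is hypothetical; each argument is the list of destination names found in one document, and the suffix separator you pick matters for the tree-ordering point above):

use strict;
use warnings;

# Build one rename map per document so that every destination name is
# globally unique after merging. Suffixed candidates are checked against
# everything seen so far, per the warning above.
sub dedup_destination_names {
    my @docs = @_;      # each element: arrayref of names in one document
    my %taken;          # every name claimed so far, across all documents
    my @rename_maps;

    for my $names (@docs) {
        my %map;
        for my $name (@$names) {
            my $candidate = $name;
            my $n = 1;
            $candidate = $name . "-" . $n++ while $taken{$candidate};
            $taken{$candidate} = 1;
            $map{$name} = $candidate if $candidate ne $name;
        }
        push @rename_maps, \%map;
    }
    return @rename_maps;    # apply map i when rewriting /Dest and /GoTo /D in document i
}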
(I know all this because I've just implemented it for my own PDF software. It's slightly hairy stuff, but not intractable.)
If you like, I can provide a development version of cpdf with this functionality for you to test.

Recovering data from Firebird database partially-encrypted by ransomware

Our test server was hacked and ransomware (Cry36) was installed, for which there is no solution to date. We also didn't keep any snapshots up to date (lesson learned).
Since it's only a test server, I am not too worried, but we had stored a bunch of work in our Firebird DB (v2.5) which I would like to save.
Looking at the database in a hex editor, I can see that the data is encrypted up until offset 00006430.
Looking at the structure of the firebird database it says that all the headers are encrypted (Header page, PIP,..., Data page).
All the data is still there.
I've tried gfix and even copying the headers from an older version of the db. But while that does fix the db, the headers are wrong and most of the new pages are removed.
Does anyone have any idea how to restore the database or extract the tables?
Regards
I have used this method to restore files encrypted on hard drives by any ransomware, by renaming the file in question back to its original filename and extension. You may be able to apply the same method to revert the data or database files back to their pre-encrypted versions.
From my testing:
the ransomed file is compressed and/or simply renamed; the encryption is either not actually applied (only implied), or the containing/renamed file is encrypted but the original file is never touched. Simply rename it back to the original and you can access the file as you could before the attack. Example:
This is the Ransomed file:
Adobe Acrobat XI Pro 11.0.20.zip.id[42AF04FF-2275].[supportcrypt2019#cock.li].Adame
This is the Ransomed file, renamed and fixed:
Adobe Acrobat XI Pro 11.0.20.zip
The removed portion of the FileName is:
.id[42AF04FF-2275].[supportcrypt2019#cock.li].Adame
Upon renaming the file, you will be prompted for approval to change the file type for which the file will be opened (back to its original state) and which application will open it (its original designation, as determined by the file extension after the filename). The reason the file doesn't work when ransomed is the final extension in the renaming scheme: in this case .Adame is not a real file type but a made-up one, so no program will or can open it. Thus, the file cannot be opened as named.
You would need to do this for each file individually. Could you post more information on the database file and the encryption? This should work for you as well, since the ransom methodology should be the same. I cannot identify the naming scheme used on your system without more information pertaining to unusual or new/unidentified portions appended throughout your instance.
For renaming multiple files, you could try an application such as "Advanced Renamer" for bulk processing, or a short script like the sketch below.
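A Perl sketch assuming the suffix pattern shown above (adjust the regex to whatever your infection appended, and test on copies first):

use strict;
use warnings;
use File::Copy qw(move);

# Strip the ransomware-appended ".id[...].[...].Adame" suffix from every
# file in a directory, restoring the original names.
my $dir = shift @ARGV or die "usage: $0 <directory>\n";

opendir my $dh, $dir or die "can't open $dir: $!";
for my $file (readdir $dh) {
    next unless -f "$dir/$file";
    (my $original = $file) =~ s/\.id\[[^\]]+\]\.\[[^\]]+\]\.Adame$//;
    next if $original eq $file;    # no ransom suffix found
    move("$dir/$file", "$dir/$original")
        or warn "rename of $file failed: $!";
}
closedir $dh;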

WebSphere MQ binary files

This might be a question that can't be answered, due to the nature of the external tool I am using (lack of documentation).
Basically, I am using a tool that pushes and pulls messages from the queue - more precisely, it pushes and pulls files. It worked perfectly for text files, but when I tried pushing and then pulling a binary file, the pulled one was corrupted: its size increased in comparison with the original file (a 1.33 ratio).
For example moving a zip file wouldn't work...
I suppose it has something to do with the tool's configuration; the only settings that can be changed regarding the problem are the CCSID and encoding (UTF-8, Base16, etc.). I tried playing with both, unfortunately without success.
Tried using the following CCSIDs: 65535, 1208, 819
and encodings : UTF-8, Base16, Base64
In every case the binary file was corrupted after pulling it from the queue. I'm not entirely sure how the tool accomplishes the transfer; it's written in Java. I'm also new to MQ, so I tried searching IBM's docs for the correct options, but I haven't found anything that makes more sense than 65535 and Base16, yet it still doesn't work. Could anyone with more experience with MQ tell me if playing with these options makes sense at all in this case and, if so, suggest what CCSID and encoding I can try to accomplish what I've described above?
More information is really needed, but my suspicion is that you are putting the message on the queue as a text message and playing around with encodings and CCSIDs to try to get it right. You really need to know how the 'Java' app achieves this - is it using JMS (e.g. JMSBytesMessage) or base Java (something like setMessageData)?
At a high level, there is a header on a message (the MD) which 'describes' the data - the MD format field. If you say the data is a string, then MQ can convert between codepages should the getter request it, etc. Put a tiny binary file into a message on a queue, and browse the queue with amqsbcg or the GUI - what is the MD format field? What headers are on the payload - anything like RFH2s?
Post the code used, to give us a clue - or at least the amqsbcg output.

LibXML: Comment-out a block of Elements

Is there a way to add/initiate a comment (e.g. $dom->createComment ...) such that it comments out an entire block of XML tags? Basically, I want to turn off the content between the comment markers.
For example, it would look like this:
<TT>
<AA>keep</AA>
<!-- comment to blocking
<BB>hideme1</BB>
<CC>hideme2</CC>
-->
<DD>d's content is good</DD>
</TT>
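For reference, one way to get that output with XML::LibXML - a sketch, not the only approach - is to serialize the nodes you want to hide, then replace them with a single comment node holding that serialization (note that an XML comment may not contain "--", so it has to be escaped):

use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(string => <<'XML');
<TT>
<AA>keep</AA>
<BB>hideme1</BB>
<CC>hideme2</CC>
<DD>d's content is good</DD>
</TT>
XML

# Serialize the nodes to hide, then swap them for one comment node.
my @hide = $doc->findnodes('/TT/BB | /TT/CC');
my $text = join "\n", map { $_->toString } @hide;
$text =~ s/--/- -/g;    # "--" is illegal inside an XML comment

my $comment = $doc->createComment("\n$text\n");
my $parent  = $hide[0]->parentNode;
$parent->insertBefore($comment, $hide[0]);
$parent->removeChild($_) for @hide;

print $doc->toString;

Reversing it is the mirror image: find the comment, parse its contents back into nodes, and replace the comment with them.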
Actually, this question is a precursor to my attempt to figure out a method to mark up/label/identify the changes to an XML file in support of new client software functionality, while keeping the ability to remove/back out these XML changes in the rare event the client needs to fall back to the previous software version (and no, I can't simply point back to the original XML file, because the client is allowed to make minor modifications to existing node text values). This is all going to be controlled via a Perl script and LibXML's core modules (I can't use modules the client doesn't have).
So basically, I've identified three possible types of XML changes resulting from new client sw functionality:
1.) ADD new element node(s) (typically to support new sw functionality)
2.) DELETE element node(s), or blocks of them (would be rare, but nevertheless a possibility)
3.) CHANGE node text values (rare, but the new sw may require a new value)
For all three types, the client needs the ability to back out the changes. One thing I was thinking of using is ATTRIBUTES, since the existing XML files don't use them. For example, for an ADD change type, I could include an attribute like ADD="sw version 4.1". This way, if the node needs to be removed, I could simply have the Perl script find those attribute strings and delete the nodes (using LibXML methods). Same thing with the CHANGE type - I could use an attribute like CHG="newvalue_oldvalue", then again use straight Perl (or LibXML) to switch the value back based on the contents of the attribute. The DELETE change type is giving me a problem though (as well as the others, lol!). I want to be able to "keep" the deleted lines in the XML file, solely for use if the sw falls back a version (at some later point the Perl script could eventually clean up/delete them).
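To make the back-out idea concrete, here is a rough sketch of that pass with XML::LibXML (the file name and the ADD/CHG attribute conventions are this question's own, purely hypothetical):

use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(location => 'config.xml');

# Back out ADDed nodes: remove any element carrying an ADD attribute.
$_->unbindNode for $doc->findnodes('//*[@ADD]');

# Back out CHANGEd values: restore the old value stashed in CHG.
for my $node ($doc->findnodes('//*[@CHG]')) {
    my (undef, $old) = split /_/, $node->getAttribute('CHG'), 2;
    $node->removeChildNodes;
    $node->appendText($old);
    $node->removeAttribute('CHG');
}

$doc->toFile('config.xml');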
I know this is a lot; I'm new to LibXML (but not to Perl). I was just wondering if any of you have thoughts on how to go about it, or have seen anything resembling this kind of request... I'd be grateful for any kind of advice! Thank you.

How can I tell if two image files are the same in Perl?

I have a Perl script I wrote for my own personal use that fetches image files from a website periodically. It then saves these images to a folder. These image files are quite often the same from fetch to fetch, and I'd like to avoid saving duplicates if I can.
My question: What would be the best way to compare/check if they are the same?
My only real thought so far is to open a file handle to the existing one, md5 it, md5 the $response->content from the fetch, and then compare them. Would that work?
Is there a better way?
EDIT:
Wow, already tons of great suggestions. Does it help if I tell you that this script runs daily via cron? I.e., it is guaranteed to always run at the exact same time every day. Also: I'm looking at the last-modified headers on some of these, and they don't look 100% accurate; there are some with a last-modified of over a week ago when I know the image is more recent than that. I'm assuming that's because the image file itself hasn't been modified on the server since then... which doesn't help me much...
Don't open and hash the stored image each time - stash the hash alongside the image when you store it. Compare sizes as well.
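A minimal sketch of the stash-the-hash idea (Digest::MD5 is a core module; the .md5 sidecar file name is just this example's convention - call it as save_if_new($path, $response->content)):

use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Compare fetched content against a hash stashed next to the stored
# image, so the old file never has to be re-read and re-hashed.
sub save_if_new {
    my ($path, $content) = @_;
    my $new_hash  = md5_hex($content);
    my $hash_file = "$path.md5";

    if (-e $hash_file) {
        open my $fh, '<', $hash_file or die "can't read $hash_file: $!";
        chomp(my $old_hash = <$fh>);
        return 0 if $old_hash eq $new_hash;    # duplicate - skip it
    }

    open my $out, '>', $path or die "can't write $path: $!";
    binmode $out;
    print {$out} $content;
    close $out;

    open my $hf, '>', $hash_file or die "can't write $hash_file: $!";
    print {$hf} "$new_hash\n";
    close $hf;
    return 1;    # new or changed image was saved
}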
Don't issue a GET request straight away; do a HEAD first and compare the size, last modification date and any ETags to what you got last time.
There are a number of HTTP headers you can use for this. If you save the time that you last retrieved the file, you can do a conditional GET with:
If-Modified-Since: <date>
Or, if the server returns an ETag header with the response, you can store that with the image (or a collection of all of the ETags you have seen for that image), and do:
If-None-Match: <all of your etags here>
If the server supports conditional gets, then you will get a "304 Not Modified" response, with no body.
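A sketch of the conditional GET with LWP::UserAgent ($url, $last_etag and $last_fetched stand for whatever you stored on the previous run):

use strict;
use warnings;
use LWP::UserAgent;

my ($url, $last_etag, $last_fetched) = @ARGV;

my $ua  = LWP::UserAgent->new;
my $res = $ua->get(
    $url,
    ($last_etag    ? ('If-None-Match'     => $last_etag)    : ()),
    ($last_fetched ? ('If-Modified-Since' => $last_fetched) : ()),
);

if ($res->code == 304) {
    print "Not modified - nothing to download\n";
}
elsif ($res->is_success) {
    # New content; remember these headers for the next run.
    my $etag          = $res->header('ETag');
    my $last_modified = $res->header('Last-Modified');
    print "Fetched ", length($res->content), " bytes\n";
}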
Yep, that sounds right.
Depending on how you're getting the file and how frequently, you might also be able to check for HTTP 304 Not Modified and save yourself the download.
md5 would work, but you'd still have to pull the file. Is there any useful metadata in the HTTP headers - Content-Length, Cache-Control directives, ETags, etc.?
There's also the nice fdupes tool for this purpose. I don't know what system you're using or what systems the tool can be built for.