How should the 'System Use' field be interpreted in a 'Directory Record'?

In the ECMA 119 specifications (freely available here), I am trying to understand how to fetch the content of the System Use field:
How is one supposed to compute the length of the System Use field, i.e. how is the value of LEN_SU (shown in the left column) found?

The value of LEN_SU is given implicitly. BP 1 holds the total number of bytes in the directory record (LEN_DR), and BP 33 holds the length of the file identifier (LEN_FI). LEN_SU is whatever remains of the directory record after the 33 fixed bytes, the file identifier, and the padding byte (present only when LEN_FI is even).
Hence
LEN_SU = LEN_DR - (33 + LEN_FI + padding), where padding is 1 if LEN_FI is even and 0 otherwise
From the spec.:
Padding Field [BP (34 + LEN_FI)]
This field shall be present in the Directory Record only if the number in the Length of the File Identifier field is an even number.
System Use [BP (LEN_DR - LEN_SU + 1) to LEN_DR]
This field shall be optional. If present, this field shall be reserved for system use. Its content is not specified by this Standard. If necessary, so that the Directory Record comprises an even number of bytes, a (00) byte shall be added to terminate this field.
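As a rough illustration of the formula above, here is a minimal Python sketch. It assumes `record` holds the raw bytes of exactly one Directory Record, starting at BP 1; the function name `system_use_field` is made up for this example and is not part of any library.

```python
def system_use_field(record: bytes) -> bytes:
    """Return the System Use bytes of one ECMA-119 Directory Record."""
    len_dr = record[0]    # BP 1: Length of Directory Record (LEN_DR)
    len_fi = record[32]   # BP 33: Length of File Identifier (LEN_FI)
    # The padding byte is present only when LEN_FI is even.
    padding = 1 if len_fi % 2 == 0 else 0
    start = 33 + len_fi + padding   # 0-based offset of the System Use field
    len_su = len_dr - start         # LEN_SU, given implicitly
    if len_su < 0 or len_dr > len(record):
        raise ValueError("inconsistent directory record")
    return record[start:len_dr]
```

An empty result means the record carries no System Use field. Any trailing (00) byte added to make the record length even is returned as part of the field, as the quoted text describes.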

Maximum length of headers

I am interested in the maximum length of the header name, header value.
And are there any restrictions on the maximum number of parameters?
None of the relevant specifications define a maximum length for a header name or value, but rfc5321 section 4.5.3.1.6 states that the maximum line length is 1000 octets (aka 1000 bytes) including the terminating <CR><LF> sequence.
How does that affect maximum header name/value lengths, you might ask?
It doesn't affect the maximum header value length at all, because rfc5322 section 3.2.2 defines CFWS (comments and folding white space), which is used in the BNF grammar definitions for headers. Folding lets a header value span any number of lines, so its total length is effectively unbounded.
That said, while there is no explicit maximum length for a header field name, there is a practical one.
The maximum line length is 1000 octets (including the terminating <CR><LF> sequence).
The recommended maximum line length is 78 octets (see rfc5322 section 2.1.1).
The syntactical definition of a header looks like this:
optional-field  =   field-name ":" unstructured CRLF
field-name      =   1*ftext
ftext           =   %d33-57 /          ; Printable US-ASCII
                    %d59-126           ;  characters not including
                                       ;  ":".
(where optional-field is any header field that is not pre-defined in the specification such as "To", "From", "Date", "Subject", etc). This syntax definition can be found in rfc5322 section 3.6.8.
Header field names cannot be folded (as seen by the syntax definition).
Since it must be possible to represent a header field name and the colon (":") all within 998 bytes (1000 bytes minus the <CR><LF> sequence), we can safely conclude that the maximum length of a header field name is 997 bytes (or, since header field names are constrained to US-ASCII, 997 characters) and SHOULD be constrained to fit within a recommended maximum line length of 78 bytes, meaning the maximum header field name SHOULD be a maximum of 77 bytes/characters.
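To make the arithmetic concrete, here is a small Python sketch that checks a field name against the ftext rule and the two line-length limits. The function name `check_field_name` and its return labels are invented for this illustration.

```python
MAX_LINE = 998          # hard limit per line, excluding CRLF
RECOMMENDED_LINE = 78   # recommended limit per line, excluding CRLF

def check_field_name(name: str) -> str:
    """Classify a header field name as 'invalid', 'too long', 'long' or 'ok'."""
    # ftext: printable US-ASCII 33..126, excluding ':' (58).
    if not name or any(not (33 <= ord(c) <= 126) or c == ":" for c in name):
        return "invalid"
    # The name plus the colon must fit on a single, unfolded line.
    if len(name) + 1 > MAX_LINE:
        return "too long"
    if len(name) + 1 > RECOMMENDED_LINE:
        return "long"
    return "ok"
```

With these limits, a 77-character name is the longest that still yields "ok", and a 997-character name is the longest that is not "too long".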

How do I check if a file is *mostly* identical with another?

I need to use Powershell to check if two files are the same but with the following restriction: there are eight specific bytes in the first 2K that are allowed to be different (if you're interested, it's certain timestamp bytes in the superblock of an ext4 image).
The code I found on Stack Overflow (obviously) for doing full checks is as follows:
$md5 = New-Object -TypeName System.Security.Cryptography.MD5CryptoServiceProvider
$hash = [System.BitConverter]::ToString(
$md5.ComputeHash([System.IO.File]::ReadAllBytes("fspec.bin")))
This gives me the hash of the entire file but what I really need is:
the first 2K of the file as a byte array so I can check specifics; and
the checksum of the remainder of the file to check equality.
The System.IO.File class has ReadAllBytes but does not appear to have the capacity to read a section of the file, nor seek to a specific place.
I have attempted to read in the byte array and use array slicing to get the parts as follows:
$restOfFile = [System.IO.File]::ReadAllBytes("fspec")
$firstTwoK = $restOfFile[0..2048]
$restOfFile = $restOfFile[2048..$restOfFile.Length]
# Then:
# 1. Check bytes in firstTwoK.
# 2. Check MD5 of all bytes in restOfFile.
Unfortunately, the fact that it's a 750M file is causing problems:
Array dimensions exceeded supported range.
At C:\testprog\testprog.ps1:42 char:1
+ ${devBytes} = ${devBytes}[2048..${devBytes}.Length]
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : OperationStopped: (:) [], OutOfMemoryException
+ FullyQualifiedErrorId : System.OutOfMemoryException
Is there a functional way to do what I need?
Use one of the types derived from System.Security.Cryptography.HashAlgorithm and call the ComputeHash overload that takes an offset and a count. For checking file uniqueness, MD5 is still fine to use, though you can use a stronger algorithm if you choose as well:
$fileBytes = [System.IO.File]::ReadAllBytes("C:\path\to\file.ext")
$md5 = [System.Security.Cryptography.MD5]::Create()
$fileHashAfterOffset = $md5.ComputeHash( $fileBytes, 2KB, $fileBytes.Length - 2KB )
The first argument of ComputeHash is the file as a Byte[]. The second argument is the offset (i.e. the number of leading bytes to skip when generating the hash), and the third argument is how many bytes you want to evaluate. In this case, we want the rest of the file, so we take the total number of bytes in the $fileBytes array and subtract the offset from it.
Using 2KB is shorthand to get the number of bytes in 2 kilobytes.
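If holding the whole file in memory is a concern, the same idea can be streamed: read the first 2K as bytes and feed the remainder to the hash in chunks. Below is a minimal sketch of that approach, in Python rather than PowerShell. The names `split_digest`, `mostly_identical` and `ignored_offsets` are made up for this example.

```python
import hashlib

HEADER_SIZE = 2048

def split_digest(path, header_size=HEADER_SIZE, chunk_size=1 << 20):
    """Return (first header_size bytes, MD5 hex digest of the rest of the file)."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        header = f.read(header_size)
        # Hash the remainder one chunk at a time so memory use stays small.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return header, md5.hexdigest()

def mostly_identical(path_a, path_b, ignored_offsets):
    """Compare two files, ignoring the given byte offsets within the header."""
    head_a, rest_a = split_digest(path_a)
    head_b, rest_b = split_digest(path_b)
    if rest_a != rest_b or len(head_a) != len(head_b):
        return False
    return all(a == b
               for i, (a, b) in enumerate(zip(head_a, head_b))
               if i not in ignored_offsets)
```

Here `ignored_offsets` would be the set of the eight byte positions that are allowed to differ.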

How to parse logs and mask specific characters using Powershell

I have a problem that I really hope to get some help with.
It's rather complex but I will try and keep my explanation as simple and objective as possible. In a nutshell, I have log files that contain thousands of lines. Each line consists of information like date/time, source, type and message.
In this case the message contains a variable size ...999 password that I need to mask. Basically the message looks something like this (it's an ISO message):
year-day-month 00:00:00,computername,source, info,rx 0210 22222222222222333333333333333333444444444444444444444444455555008PASSWORD6666666666666666677777777777777777777777ccccdddddddddddffffffffffffff
For each line I need to zero in on password length identifier (008) do a count on it and then proceed to mask the number of following characters, which would be PASSWORD in this case. I would change it to something like XXXXXXXX instead so once done the line would look like this:
year-day-month 00:00:00,computername,source, info,rx 0210 22222222222222333333333333333333444444444444444444444444455555008XXXXXXXX6666666666666666677777777777777777777777ccccdddddddddddffffffffffffff
I honestly have no idea how to start doing this with PowerShell. I need to loop through each line in the log file, and identify the number of characters to mask.
I've kept this high level as a starting point, there are some other complexities that I hope to figure out at a later time, like the fact that there are different types of messages and depending on the type the password length starts at another character position. I might be able to build on my aforementioned question first but if anyone understands what I mean then I would appreciate some help or tips about that too.
Any help is appreciated.
Thanks!
Additional information to original post:
Firstly, thank you to everyone for your answers thus far, it's been greatly appreciated. Now that I have a baseline for how your answers are being formulated based on my information I feel I need to provide some more details.
1) There was a question about whether or not the password starting position is fixed and the logic behind it.
The password position is not fixed. In an ISO message (which these are) the position of the password, like that of all other information in the message, depends on the data elements present in the message, which in turn are indicated by the bitmap. The bitmap is also part of the message. So in my case, I need to script additional logic above and beyond the answers provided to come full circle.
2) This is what I know and these are the steps I hope to accomplish with the script.
What I know:
- There are 3 different msg types that contain passwords. I've figured out where the starting position of the password is for each msg type based on the bitmap and the data elements present.
For example 0210 contains one in this case:
year-day-month 00:00:00,computername,source, info,rx 0210 22222222222222333333333333333333444444444444444444444444455555008PASSWORD6666666666666666677777777777777777777777ccccdddddddddddffffffffffffff
What I need to do:
Pass the log file to the script
For each line in the log identify if the line has a msg type that contains a password
If the message type contains a password then determine the length of the password by reading the 3 digits preceding the password ("ans ...999" which means alphanumeric - special with length max of 999 and 3 digit length info). Let's say the character position of the password would be 107 in this case for argument's sake, so we know to read the 3 numbers before it.
Starting at the character position of the password, mask the number of characters required with XXX. Loop through log until complete.
It does seem as though you're indicating the position of the password and the length of the password will vary. As long as you have the '008' and something like '666' to indicate a starting and stopping point something like this should work.
$filePath = '.\YourFile.log'
(Get-Content $filePath) | ForEach-Object {
    $startIndex = $_.IndexOf('008') + 3
    $endIndex = $_.IndexOf('666', $startIndex)
    $passwordLength = $endIndex - $startIndex
    $passwordToReplace = $_.Substring($startIndex,$passwordLength)
    $obfuscation = New-Object 'string' -ArgumentList 'X', $passwordLength
    $_.Replace($passwordToReplace, $obfuscation)
} | Set-Content $filePath
If the file is too large to load into memory, you will have to use StreamReader and StreamWriter to write the content to a new file and then delete the old one.
Assuming a fixed position where the password-length field starts, based on your sample line (if that position is variable, as you've hinted at, you need to tell us more):
$line = '22222222222222333333333333333333444444444444444444444444455555008PASSWORD6666666666666666677777777777777777777777ccccdddddddddddffffffffffffff'
$posStart = 62 # fixed 0-based pos. where length-of-password field starts
$pwLenFieldLen = 3 # length of length-of-password field
$pwLen = [int] $line.SubString($posStart, $pwLenFieldLen) # extract password length
$pwSubstitute = 'X' * $pwLen # determine the password replacement string
# replace the password with all Xs
$line -replace "(?<=^.{$($posStart + $pwLenFieldLen)}).{$pwLen}(?=.*)", $pwSubstitute
Note: This is not the most efficient way to do it, but it is concise.
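For comparison, the same length-prefixed masking can be done with plain slicing instead of a regex. This is only a sketch in Python, assuming the 0-based position of the length field is already known for the message type; the function name `mask_password` and its parameters are hypothetical.

```python
def mask_password(line: str, len_pos: int, len_width: int = 3, mask: str = "X") -> str:
    """Mask a length-prefixed password whose length field starts at len_pos."""
    # Read the fixed-width length field, e.g. "008" -> 8.
    pw_len = int(line[len_pos:len_pos + len_width])
    pw_start = len_pos + len_width
    pw_end = pw_start + pw_len
    if pw_end > len(line):
        raise ValueError("password length runs past the end of the line")
    # Keep everything before and after the password, replace the password itself.
    return line[:pw_start] + mask * pw_len + line[pw_end:]
```

Handling several message types would then come down to choosing `len_pos` per line before calling the function, for example from a lookup keyed on the message type.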

Modelica combiTimeTable

I have a few questions regarding CombiTimeTables. I tried to import a txt file (3 columns: time first, plus 2 columns of measured data) into a CombiTimeTable.
- Does the txt file have to have the following header: #1; double K(x,y)?
- Is it right that the table name in the CombiTimeTable has to be the same as the variable name after double (in my case K)?
- I get errors if I try to connect 2 outputs of the table (column 1 and column 2). Do I have to specify how many columns I want to import?
And: why do I have to use "/" instead of "\" in the path?
Modelica Code:
Modelica.Blocks.Sources.CombiTimeTable combiTimeTable(
  tableOnFile=true,
  tableName="K",
  fileName="D:/test.txt");
Thank you very much!
The standard text file format for CombiTables is:
#1
double K(4,3)
0 1 10
1 3 20
2 5 30
3 7 40
In this case, note that the "tableName" parameter I would set as a modifier to the CombiTable (or CombiTimeTable) is "K". And yes, the numbers in parentheses tell the tool the dimensions of the data, so in this case 4 rows and 3 columns.
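If the measured data comes from a script, the file can be generated in that layout. A minimal Python sketch, assuming the rows are already available as equal-length sequences of numbers; the helper name `format_combi_table` is invented for this example.

```python
def format_combi_table(name, rows):
    """Format rows (equal-length sequences of numbers) as a CombiTable text file."""
    n_cols = len(rows[0])
    if any(len(r) != n_cols for r in rows):
        raise ValueError("all rows must have the same number of columns")
    # Header: "#1", then the table name with (rows, columns).
    lines = ["#1", f"double {name}({len(rows)},{n_cols})"]
    lines += [" ".join(str(v) for v in row) for row in rows]
    return "\n".join(lines) + "\n"
```

Writing the returned string to the path given in "fileName" reproduces the example file above when called with the name "K" and the same four rows.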
Regarding the path separator: the backslash "\" is the path separator on Windows, whereas the forward slash "/" is the path separator on Unix-like systems (e.g. Linux). The issue is that the backslash is also the escape character in string literals, in Modelica as in most languages. So for example "\n" indicates a new line and "\t" indicates a tab, so if my file name string was "D:\nextfolder\table.txt", it would actually be read as something like:
D:
extfolder able.txt
Depending on your Modelica simulation tool however it might correct this. So if you used a file selection dialog box to choose your file, the tool should automatically switch the file separator character to the forward slash "/" and your text would look like:
combiTimeTable(
  tableOnFile=true,
  tableName="K",
  fileName="D:/nextfolder/table.txt",
  columns=2:3)
If you are getting errors in your connect statement, I would guess you might have forgotten the "columns" parameter. Its default value is derived from the "table" parameter (which by default is empty: zero rows by two columns), not from the data in the file. So when you are reading data from a file you need to set it explicitly, e.g. columns=2:3 for two outputs.

How can I convert the tiger hash values from the official implementations into the form used by Direct Connect?

I am trying to implement a Direct Connect Client, and I am currently stuck at a point where I need to hash the files in order to be able to upload them to other clients.
All other clients require TTHL (Tiger Tree Hashing Leaves) support for verification of the downloaded data, so I searched for implementations of the algorithm and found tiger-hash-python.
I have implemented a routine that uses the hash function from before, and is able to hash large files, according to the logic specified in Tree Hash EXchange format (THEX) (basically, the tree diagram is the important part on that page).
However, the value produced by it is similar to those shown on Wikipedia, a hex digest, but is different from those shown in the DC clients I'm using for reference.
I have been unable to find out how the hex digest form is converted to this other one (39 characters, A-Z, 0-9). Could someone please explain how that is done?
Well ... I tried what Paulo Ebermann said, using the following functions:
# Python 3 version; assumes the tiger-hash-python script is importable as "tiger"
import base64
import math
import tiger

def strdivide(seq, length):
    result = []
    # Number of blocks is ceil(len(seq) / length); the last block may be shorter.
    for i in range(int(math.ceil(len(seq) / length))):
        result.append(seq[i*length:(i+1)*length])
    return result

def dchash(data):
    result = tiger.hash(data)  # From the aforementioned tiger-hash-python script, 48-char hex digest
    # Representation transform: reverse the byte order within each 16-hex-char (8-byte) word
    result = "".join("".join(strdivide(result[i:i+16], 2)[::-1]) for i in range(0, 48, 16))
    bits = bytes.fromhex(result)  # Converting every 2 hex characters into 1 byte
    result = base64.b32encode(bits).decode("ascii")  # Result will be 40 characters
    return result[:-1]  # Leaving behind the trailing '='
The TTH for an empty file was found to be 8B630E030AD09E5D0E90FB246A3A75DBB6256C3EE7B8635A, which after the transformation specified here, becomes 5D9ED00A030E638BDB753A6A24FB900E5A63B8E73E6C25B6. Base-32 encoding this result yielded LWPNACQDBZRYXW3VHJVCJ64QBZNGHOHHHZWCLNQ, which was found to be what DC++ generates.
The only mention of the format of the hash in the Direct Connect protocol I found is on the $SR page on the NMDC Protocol wiki:
For files containing TTH, the <hub_name> parameter is replaced with TTH:<base32_encoded_tth_hash> (ref: TTH_Hash).
So, it is Base32-encoding. This is defined in RFC 4648 (and some earlier ones), section 6.
Basically, it uses the capital letters A-Z and the decimal digits 2 to 7. One base32 digit represents 5 bits, while one base16 (hexadecimal) digit represents only 4.
This means that every 5 hex digits map to 4 base32 digits. A Tiger hash (192 bits) needs 39 base32 digits; the official encoding pads this to 40 characters with a trailing =, which seems to be omitted if you say that there are always 39 characters.
I'm not sure of an implementation of a conversion from hex (or bytes) to base32, but it shouldn't be too complicated with a lookup table and some bit-shifting.
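For what it's worth, here is a minimal Python sketch of such a conversion using a lookup table and bit-shifting. The function name `hex_to_base32` is made up for this example, and the input is assumed to be a hex string already in the byte order the client expects. In practice Python's standard `base64.b32encode` does the same job, apart from adding the = padding.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"  # RFC 4648 base32 alphabet

def hex_to_base32(hex_digest: str) -> str:
    """Convert a hex string to unpadded RFC 4648 base32."""
    data = bytes.fromhex(hex_digest)
    out = []
    buffer = 0      # holds the bits not yet emitted
    bit_count = 0   # number of valid bits in buffer
    for byte in data:
        buffer = (buffer << 8) | byte
        bit_count += 8
        # Emit one base32 digit for every complete group of 5 bits.
        while bit_count >= 5:
            bit_count -= 5
            out.append(ALPHABET[(buffer >> bit_count) & 0x1F])
        buffer &= (1 << bit_count) - 1  # drop the bits already emitted
    if bit_count:
        # Left-align the leftover bits in a final 5-bit group.
        out.append(ALPHABET[(buffer << (5 - bit_count)) & 0x1F])
    return "".join(out)
```

Fed the transformed empty-file digest quoted above (5D9ED00A030E638BDB753A6A24FB900E5A63B8E73E6C25B6), this yields the 39-character value LWPNACQDBZRYXW3VHJVCJ64QBZNGHOHHHZWCLNQ.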