What is multicodec and how is it related to multihash?

I don't have any background with this subject.
To try to understand them better, I read:
Multihash
CIDv1: Multicodec prefix
From what I understand, the multihash is the algorithm used to hash the value (one way), so we can't go back (we can't decode the hash back to the value).
Questions
I don't understand, in simple terms, what multicodec is, and whether it's related to decoding the hash back to a value (which makes no sense to me).
What is the motivation for the multicodec prefix?

The multicodec is related to decoding the value the hash points to, if that makes it easier to understand. Don't worry, no magic hash decoding is happening ;). Remember, we're making CIDs, and we can use CIDs to look up content. But then we have the question of "how do we decode this data we just retrieved?", and that is the problem the multicodec solves for us. Reading From Data to Data Structures might help clear up some confusion.
The multicodec prefix allows IPFS to evolve to support new and different encodings for the data that's actually put into IPFS. This refers to IPLD, and you can actually find the answer you're looking for under Links (with information about the codecs under Codecs):
For links we use a CID. A CID is an extension of multihash, in fact a multihash is part of a CID. We simply add a codec to a multihash that tells us what format the data is in (JSON, CBOR, Bitcoin, Ethereum, etc). This way, we can actually link between data in different formats and any link to data anyone ever gives us can be decoded so that it can become more than just a series of bytes.
CID is a standard that anyone can implement, even people that have no other interest in IPLD beyond the need for hash links to different data types can use it.
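To make the layering concrete, here is a rough byte-level sketch in Scala of how a CIDv1 is assembled from a codec number plus a multihash. The table values used (0x12 for sha2-256, 0x71 for dag-cbor, 0x01 for CID version 1) come from the multicodec table, but the code is only an illustration under those assumptions, not a replacement for a real CID library:

import java.security.MessageDigest

// Unsigned LEB128 varint, the encoding multiformats uses for its prefix numbers.
def varint(n: Int): Array[Byte] = {
  val out = scala.collection.mutable.ArrayBuffer.empty[Byte]
  var v = n
  var more = true
  while (more) {
    val b = v & 0x7f
    v = v >>> 7
    more = v != 0
    out += (if (more) b | 0x80 else b).toByte
  }
  out.toArray
}

val data   = "hello world".getBytes("UTF-8")
val digest = MessageDigest.getInstance("SHA-256").digest(data)

// multihash = <hash-function-code><digest-length><digest>; 0x12 = sha2-256
val multihash = varint(0x12) ++ varint(digest.length) ++ digest

// CIDv1 = <cid-version><content-codec><multihash>; 0x71 = dag-cbor
val cidV1 = varint(0x01) ++ varint(0x71) ++ multihash

The extra codec number is the whole difference between "a hash of some bytes" and "a link to content we know how to decode once we fetch it".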


Reading & writing text in Scala, getting the encoding right?

I'm reading and writing some text files in Scala. As a complete beginner in the language, I wanted to make sure to find the right way to do it, e.g. get the encoding right.
Most of the stuff I found (also on SO) recommends I use io.Source.fromFile. However, after trying it out like so, reading a UTF-8 file:
val user_list = Source.fromFile("usernames.txt").getLines.toList
val user_list = Source.fromFile("usernames.txt", enc="UTF8").getLines.toList
I looked at the docs but was left with some questions.
Get the encoding right:
The docs show that I can set an encoding in Source.fromFile, as I tried above. Looking at the docs for Codec and the types listed there, I was wondering if those are all my codec options - is there e.g. no UTF-16, big-endian vs. little-endian, etc.?
I am slightly obsessed with this since it used to trip me up in Python a lot. Is this less of a concern with Scala for some reason?
Get the reading in right:
All the examples I looked at used the getLines method and post-processed it with mkString or toList, etc. Is there any advantage to that over just reading in the entire file (my files are small) in one go?
Get the writing out right:
Every source I could find tells me that Scala has no file writing function and to use the Java FileWriter. I was surprised by this - is this still accurate?
Looking at it I feel the question might be a little broad for SO, so I'd be happy to take it back if it does not meet the requirements. At this point, I'm not struggling with specific examples but rather trying to set things up in a way I don't get in trouble later.
Thanks!
Scala only has a basic IO API in the standard library. For the most part you just use the Java APIs. The fact that a decent API already exists in Java is probably why the Scala team is not prioritizing a robust, fully featured IO API of its own.
There are also third-party Scala libraries you could use. Better Files is a Scala file API that I've never used but have heard good things about, and fs2 provides functional, streaming IO. I'm sure there are others out there as well.
For encoding, there are many possible encodings available. It's just that only a couple of the most common ones are exposed as static fields; the rest you typically access through Codec("Encoding Name"). Most APIs will also let you pass a String directly instead of needing to get a Codec instance first. The Codec is really just a wrapper over java.nio.charset.Charset. You can run java.nio.charset.Charset.availableCharsets() to see all of the encodings available on your system.
As far as reading goes, if the files are small you can load them fully into memory if you prefer. The only reason not to is to avoid the extra memory use of loading the entire file at once when reading line by line is enough. You may want to use Vector instead of List for efficiency reasons (Vector is better in many cases and should probably be preferred as a default collection, but old habits die hard and most people/guides seem to default to List; that's a whole other topic).
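For example, here is a small sketch of both points; the file name and the UTF-16 encoding below are just placeholders:

import scala.io.{Codec, Source}
import java.nio.charset.Charset
import scala.jdk.CollectionConverters._

// List every encoding this JVM knows about, not just the static fields on Codec.
Charset.availableCharsets().keySet().asScala.foreach(println)

// Any of those names can be wrapped in a Codec (or passed where a String is accepted).
implicit val codec: Codec = Codec("UTF-16BE")

// Read line by line, or slurp the whole (small) file in one go.
val lines = Source.fromFile("usernames.txt").getLines().toVector
val whole = Source.fromFile("usernames.txt").mkString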

Approaches to programming application-level protocols?

I'm doing some simple socket programming in C#. I am attempting to authenticate a user by reading the username and password from the client console, sending the credentials to the server, and returning the authentication status from the server. Basic stuff. My question is, how do I ensure that the data is in a format that both the server and client expect?
For example, here's how I read the user credentials on the client:
Console.WriteLine("Enter username: ");
string username = Console.ReadLine();
Console.WriteLine("Enter plassword: ");
string password = Console.ReadLine();
StreamWriter clientSocketWriter = new StreamWriter(new NetworkStream(clientSocket));
clientSocketWriter.WriteLine(username + ":" + password);
clientSocketWriter.Flush();
Here I am delimiting the username and password with a colon (or some other symbol) on the client side. On the server I simply split the string using ":" as the token. This works, but it seems sort of... unsafe. Shouldn't there be some sort of delimiter token that is shared between client and server so I don't have to just hard-code it in like this?
It's a similar matter for the server response. If the authentication is successful, how do I send a response back in a format that the client expects? Would I simply send a "SUCCESS" or "AuthSuccessful=True/False" string? How would I ensure the client knows what format the server sends data in (other than just hard-coding it into the client)?
I guess what I am asking is how to design and implement an application-level protocol. I realize it is sort of unique to your application, but what is the typical approach that programmers generally use? Furthermore, how do you keep the format consistent? I would really appreciate some links to articles on this matter as well.
Rather than reinvent the wheel, why not code up an XML schema and send and receive XML "files"?
Your messages will certainly be longer, but with gigabit Ethernet and ADSL this hardly matters these days. What you do get is a protocol where all the issues of character sets and complex data structures have already been solved, plus an embarrassment of tools and libraries to support and ease your development.
I highly recommend using plain ASCII text if at all possible.
It makes bugs much easier to detect and fix.
Some common, machine-readable ASCII text protocols (roughly in order of complexity):
netstring
Tab Delimited Tables
Comma Separated Values (CSV) (strings that include both commas and double-quotes are a little awkward to handle correctly)
INI file format
property list format
JSON
YAML Ain't Markup Language
XML
The world is already complicated enough, so I try to use the least-complex protocol that would work.
Sending two user-generated strings from one machine to another -- netstrings is the simplest protocol on my list that would work for that, so I would pick netstrings.
(netstrings will work fine even if the user types in a few colons or semi-colons or double-quotes or tabs -- unlike other formats that choke on certain commonly-typed characters).
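To show how little machinery netstrings need, here is a rough sketch of the framing (in Scala just for illustration; it assumes Java 9+ for readNBytes and skips error handling):

// netstring framing: "<byte-length>:<payload>," e.g. "5:hello,"
def encodeNetstring(payload: Array[Byte]): Array[Byte] =
  s"${payload.length}:".getBytes("US-ASCII") ++ payload ++ Array(','.toByte)

def decodeNetstring(in: java.io.InputStream): Array[Byte] = {
  // read the ASCII length digits up to the ':' separator
  val len = Iterator.continually(in.read().toChar).takeWhile(_ != ':').mkString.toInt
  val payload = in.readNBytes(len)      // exactly len bytes, whatever characters they contain
  require(in.read() == ',', "missing trailing ','")
  payload
}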
I agree that it would be nice if there existed some way to describe a protocol in a single shared file such that both the server and the client could somehow "#include" or otherwise use that protocol.
Then when I fix a bug in the protocol, I could fix it in one place, recompile both the server and the client, and then things would Just Work -- rather than digging through a bunch of hard-wired constants on both sides.
Kind of like the way well-written C code and C++ code uses function prototypes in header files so that the code that calls the function on one side, and the function itself on the other side, can pass parameters in a way that both sides expect.
Tell me if you discover anything like that, OK?
Basically, you're looking for a standard. "The great thing about standards is that there are so many to choose from". Pick one and go with it, it's a lot easier than rolling your own. For this particular situation, look into Apache "basic" authentication, which joins the username and password and base64-encodes it, as one possibility.
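For reference, the Basic scheme just joins the two values with a colon and base64-encodes the result; a quick sketch (in Scala, though the idea is identical in C#, and the credentials are made up):

import java.util.Base64
import java.nio.charset.StandardCharsets

// HTTP Basic authentication header value: "Basic " + base64("username:password")
val credentials = "alice:secret"
val header = "Basic " + Base64.getEncoder.encodeToString(
  credentials.getBytes(StandardCharsets.UTF_8))
// => "Basic YWxpY2U6c2VjcmV0"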
I have worked with two main approaches.
The first is an ASCII-based protocol.
An ASCII-based protocol is usually built around a set of text commands that terminate with some defined delimiter (like a carriage return or semicolon), or around a text format such as XML or JSON. If your protocol is a command-based protocol where there is not a lot of data being transferred back and forth, then this is the best way to go.
FIND\r
DO_SOMETHING\r
It has the advantage of being easy to read and understand because it is text based.
The disadvantage (it may not be a problem, but it can be) is that an unknown number of bytes can be transferred back and forth between the client and the server. So if you need to know exactly how many bytes are being sent and received, this may not be the type of protocol you want.
The other type of protocol is binary-based, with fixed-size message lengths sent in a header. This has the advantage that the client knows exactly how much data to expect. It can also potentially save you bandwidth depending on what you're sending across, although ASCII can save space too; it depends on your application's requirements. The disadvantage of a binary-based protocol is that it is difficult to understand just by looking at it, requiring you to constantly refer to the documentation.
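As a rough illustration of the binary approach (sketched in Scala rather than C#, with an assumed 4-byte big-endian length header), framing looks something like this:

import java.io.{DataInputStream, DataOutputStream}

// A minimal length-prefixed frame: a fixed 4-byte length header,
// followed by exactly that many payload bytes.
def writeFrame(out: DataOutputStream, payload: Array[Byte]): Unit = {
  out.writeInt(payload.length)    // the receiver knows exactly how much to read
  out.write(payload)
  out.flush()
}

def readFrame(in: DataInputStream): Array[Byte] = {
  val len = in.readInt()
  val buf = new Array[Byte](len)
  in.readFully(buf)               // blocks until the whole payload has arrived
  buf
}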
In practice, I tend to mix both strategies in protocols I have defined based on my application's requirements.

Advice on modeling behaviors/concepts not commonly thought to be persisted

I am looking for advice and experience related to the best way to REST-ify behaviors/concepts that are not "commonly persisted" - in other words, resource-ifying transformative or side-effect behaviors.
First, let me say that I know there's no complete test for REST-ful-ness - and actually I don't care. I find the CRUD-over-the-Web notion of REST very intuitive and it works well on most of the data I'm providing access to. At the same time, I am not worried about straying from anyone's bible on how to perfectly REST-ify something. I'm in search of whatever the best compromise between practical and REST-intuitive is for the cases that don't fit so well. Clearly, consistency is the main goal, but intuitiveness helps ensure that.
Given that, let me dive into the details of what I'm after.
Since REST is inherently resource-oriented, it is easy to model things commonly persisted - what is less clear is how to model behaviors/concepts that are not commonly persisted, especially those which have side effects or are purely transformational.
For example, take a stackoverflow.com question. It can be Created, Updated, Read, and Deleted. Everyone can relate to that and it all makes sense under REST.
But now consider something like a translation - for example, I want to build a REST API for my service that translates an English sentence to Spanish.
There are at least two ways to address the translation scenario. I could:
Look at a translation invocation as the creation of a "translation instance" (which doesn't happen to be persisted, but could be), in which case a POST /Translation (i.e., a create) actually makes a lot of sense. This is particularly the case if the translation service requires the URL of something to be translated (since the contents of that URL can change over time).
Look at translation as really the act of querying a larger dictionary of known answers, in which case a GET /Translation (i.e., a read) might be more appropriate. This is especially tempting if the translation service requires just the text of the sentence to be translated. No one would expect the translation of a static sentence to change over time. You could also argue that it could be cacheable, which GET would lend itself towards.
This same dilemma can crop up for other actions which primarily have side effects (e.g., sending an SMS or an e-mail) and are less commonly associated with persistence of the data.
My approach thus far has been to essentially "Order"-ify these cases, that is, looking at everything as though one were placing an order. More generally, I think this is just converting verbs (translate) into nouns (translation order), which would seem to be one way of REST-ifying such things.
Can anyone share a better, more intuitive, way to approach the modeling of actions/behaviors that are not commonly assumed to be persisted?
The "Would it hurt to do more than once?"-test perhaps?
In your examples - what would happen if you are asked to translate the same text twice? Not really much. Sending an sms twice is likely to be a bad thing though.
The two options you describe are valid, and you have already highlighted the scenarios in which one is better than the other. The only other option I would suggest is to use POST with a "processing resource", e.g.
POST /translator?from=en&to=fr
Content-Type: text/plain
Body: The sentence to be translated
=>
200 OK
La phrase à traduire
One nice thing about "processing resources" is that they may or may not have persistent side effects. Obviously the disadvantage is that intermediaries can't cache the response.

How safe is information contained within iPhone app compiled code?

I was discussing this with some friends and we began to wonder about this. Could someone gain access to URLs or other values that are contained in the actual Objective-C code after they purchase your app?
Our initial feeling was no, but I wondered if anyone out there had definitive knowledge one way or the other?
I do know that .plist files are readily available.
Examples could be things like:
- URL values kept in a string
- API key and secret values
Yes, strings and information are easily extractable from compiled applications using the strings tool (see here), and it's actually even pretty easy to extract class information using class-dump-x (check here).
Just some food for thought.
Edit: one easy, albeit insecure, way of keeping your secret information hidden is obfuscating it, or cutting it up into small pieces.
The following code:
NSString *string = #"Hello, World!";
will produce "Hello, World!" using the strings tool.
Writing your code like this:
NSString *string = #"H";
string = [stringByAppendingString:#"el"];
string = [stringByAppendingString:#"lo"];
...
will show the characters typed, but not necessarily in order.
Again: easy to do, but not very secure.
When you purchase an app it is saved on your hard disk as "FooBar.ipa"; that file is actually in Zip format. You can unzip it and inspect the contents, including searching for strings in the executable. Try it! Constant values in your code are not compressed, encrypted, or scrambled in any way.
I know this has already been answered, but I want to give my own suggestion too.
Again, please remember that no obfuscation technique is ever 100% safe, so none of these is truly secure, but often they are "good enough" (depending on what you want to obfuscate). This means that a determined cracker will be able to read your strings anyway, but these techniques may stop the "casual cracker".
My other suggestion is to "crypt" the strings with a simple XOR. This is incredibly fast, and does not require any authorization if you are selling the app through the App Store (it does not fall into the categories of algorithms that require authorization for exporting them).
There are many snippets around for doing a XOR in Cocoa, see for example: http://iphonedevsdk.com/forum/iphone-sdk-development/11352-doing-an-xor-on-a-string.html
The key you use could be any string, be it a meaningless sequence of characters/bytes or something meaningful to confuse readers (e.g. the name of a method, such as "stringWithContentsOfFile:usedEncoding:error:").
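The XOR itself is tiny in any language. Here is the idea as a sketch (in Scala rather than Cocoa, with made-up key and secret values), relying on the fact that XOR-ing with the same key twice returns the original bytes:

// Obfuscation only, not real cryptography: xor(xor(data, key), key) == data
def xorWithKey(data: Array[Byte], key: Array[Byte]): Array[Byte] =
  data.zipWithIndex.map { case (b, i) => (b ^ key(i % key.length)).toByte }

val key       = "stringWithContentsOfFile:usedEncoding:error:".getBytes("UTF-8")
val secret    = "https://api.example.com/v1/secret".getBytes("UTF-8")
val obscured  = xorWithKey(secret, key)                         // ship this blob in the binary
val recovered = new String(xorWithKey(obscured, key), "UTF-8")  // the original string again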

POST/GET bindings in Racket

Is there a built-in way to get at POST/GET parameters in Racket? extract-binding and friends do what I want, but there's a dire note attached about potential security risks related to file uploads which concludes
Therefore, we recommend against their use, but they are provided for compatibility with old code.
The best I can figure is (and forgive me in advance)
(bytes->string/utf-8 (binding:form-value (bindings-assq (string->bytes/utf-8 "[field_name_here]") (request-bindings/raw req))))
but that seems unnecessarily complicated (and it seems like it would suffer from some of the same bugs documented in the Bindings section).
Is there a more-or-less standard, non-buggy way to get the value of a POST/GET-variable, given a field name and request? Or better yet, a way of getting back a collection of the POST/GET values as a list/hash/a-list? Barring either of those, is there a function that would do the same, but only for POST variables, ignoring GETs?
extract-binding is bad because it is case-insensitive, is very messy for fields that are returned multiple times, doesn't have a way of dealing with file uploads, and automatically assumes everything is UTF-8, which isn't necessarily true. If you can accept those problems, feel free to use it.
The snippet you wrote works when the data is UTF-8 and when the field is returned only once. You can define it as a function and avoid writing it out many times.
In general, I recommend using formlets to deal with forms and their values.
Now your questions...
"Is there a more-or-less standard, non-buggy way to get the value of a POST/GET-variable, given a field name and request?"
What you have is the standard thing, although you wrongly assume that there is only one value. When there are multiple, you'll want to filter the bindings on the field name. Similarly, you don't need to turn the value into a string, you can leave it as bytes just fine.
"Or better yet, a way of getting back a collection of the POST/GET values as a list/hash/a-list?"
That's what request-bindings/raw does. It is a list of binding? objects. It doesn't make sense to turn it into a hash due to multiple value returns.
"Barring either of those, is there a function that would do the same, but only for POST variables, ignoring GETs?"
The Web server hides the difference between POSTs and GETs from you. You can inspect the URI and the raw POST data to recover them, but you'd have to parse them yourself. I don't recommend it.
Jay