I recently discovered extprot. extprot is a "long-term binary encoding for cross-language communication and long-term serialization". In short, it is a protocol that can be used to encode messages exchanged between two different computers, similar to ONC-RPC, XML-RPC, ASN.1, Corba, Java RMI, SOAP or Google's Protocol Buffers, amongst others. And of course, extprot is Free software under the MIT license.

extprot has interesting properties:

  • it is extensible: one can add new data types without breaking compatibility with previous versions of the encoding;
  • it is descriptive: one describes the message format and a compiler produces encoding and decoding code for a specific programming language;
  • it is self-delimiting: each message or part of a message carries its own length, allowing a decoder to skip it if it is not needed or not known;
  • it is self-describing: each part of a message carries a basic type, allowing the message to be decoded even if the overall message description is unknown.
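
To illustrate the self-delimiting and self-describing properties, here is a minimal Python sketch. It uses a toy framing of my own (one byte of type code, one byte of length, then the payload), which is an assumption made for illustration and not extprot's actual wire format. The names `encode_field`, `decode_known_fields` and the type codes are hypothetical.

```python
# Toy framing, NOT extprot's actual wire format: each field is
# <1-byte type code> <1-byte payload length> <payload>.
KNOWN_TYPES = {0x01: "utf8"}  # hypothetical type codes this decoder understands


def encode_field(type_code: int, payload: bytes) -> bytes:
    """Frame a payload with its type code and its length."""
    if len(payload) > 255:
        raise ValueError("toy format limits payloads to 255 bytes")
    return bytes([type_code, len(payload)]) + payload


def decode_known_fields(buf: bytes) -> list:
    """Return the decoded known fields, skipping unknown ones by their length."""
    out = []
    pos = 0
    while pos < len(buf):
        type_code, length = buf[pos], buf[pos + 1]
        payload = buf[pos + 2 : pos + 2 + length]
        if KNOWN_TYPES.get(type_code) == "utf8":
            out.append(payload.decode("utf-8"))
        # Unknown type: the length alone is enough to step over the field.
        pos += 2 + length
    return out
```

A decoder written against an older message description can thus read a newer message and simply step over the fields it does not understand.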

Overall, I found the description and rationale behind extprot quite interesting. I share the same views as Mauricio Fernández, extprot's author, regarding the compactness and efficiency of binary protocols compared to textual ones, like XML-RPC or SOAP. Having used ONC-RPC in a project of my own, I find the descriptive aspect quite useful, and I have verified experimentally that a binary encoding is much more network-efficient than a textual one. Compared to ONC-RPC, extprot adds the self-describing aspect and supports certain data types specific to OCaml, like sum types or polymorphic types.

However, extprot is a non-standard, proprietary protocol.

After looking at the binary encoding of the protocol, I made a few observations:

  1. The wire_type within the prefix of each piece of data is encoded over 4 bits, i.e. 16 values. 10 values are already used, leaving only 6, not counting the use of the least significant bit. I would suggest using a slightly larger encoding, over 5 or 6 bits;
  2. In the encoding of signed integers as vints, the sign bit (bit 63) is moved to bit position zero, in effect limiting values to 64 bits. It might be better to allow encoding integers of 128 bits or more in the future;
  3. The endianness of Bits32 and Bits64_long integers is not specified;
  4. There are no unsigned 32-bit and 64-bit integers. They are necessary so as not to lose the semantics of those integers, even if one could encode them as vints;
  5. There are no 32-bit floats.
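
Observation 2 can be made concrete with a short Python sketch. I assume here that signed vints use the usual "zigzag" mapping, (n << 1) ^ (n >> 63), followed by a base-128 variable-length encoding with the least significant group first. Both points are my reading of the encoding rather than a quote from the specification, and the function names are mine.

```python
MASK64 = (1 << 64) - 1


def zigzag64(n: int) -> int:
    """Map a signed 64-bit integer to an unsigned one: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    if not -(1 << 63) <= n < (1 << 63):
        raise ValueError("value does not fit in 64 bits")
    # Python's >> on negative ints is arithmetic, so n >> 63 is 0 or -1.
    return ((n << 1) ^ (n >> 63)) & MASK64


def unzigzag64(z: int) -> int:
    """Inverse of zigzag64: the low bit holds the sign."""
    return (z >> 1) ^ -(z & 1)


def encode_varint(z: int) -> bytes:
    """Base-128 encoding, low group first; the high bit means 'more bytes follow'."""
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)
```

The variable-length part has no size limit of its own. It is the fixed shift by 63 in the sign mapping that ties the scheme to 64-bit values.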

I'm not sure of the rationale behind the current scheme, except that it is very OCaml-oriented (which is not a bad point for me ;-). In light of those remarks, I would suggest the following modifications:

  • Drop the Bits32 and Bits64_long types and encode all integers (except 8-bit integers and booleans) as vints, so as to simplify the encoding. As a side effect, there is no need to specify the endianness of integers;
  • Increase the wire_type size to 6 bits (64 values);
  • Introduce vint tags for signed_int32, unsigned_int32, signed_int64 and unsigned_int64. It might impact performance a little, but I'm pretty sure the overhead is negligible on modern processors with carefully crafted code;
  • The encoding/decoding of signed vints depends on the tag value, so that we have (n >> 63) for signed_int64, (n >> 31) for signed_int32, etc.;
  • Possibly, introduce float_32bits, even if I don't see how they could be mapped onto OCaml's 64-bit floats.
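
The suggestions above can be sketched in Python as follows. This is only an illustration of the proposal, not existing extprot code: the wire_type codes are hypothetical, and I assume the prefix is laid out as the tag shifted left above the wire_type bits.

```python
WIRE_BITS = 6  # proposed width of wire_type (64 values instead of 16)

# Hypothetical wire_type codes for the proposed integer kinds.
SIGNED_INT32, UNSIGNED_INT32, SIGNED_INT64, UNSIGNED_INT64 = 16, 17, 18, 19


def zigzag(n: int, bits: int) -> int:
    """Width-dependent sign mapping: (n >> 31) for 32 bits, (n >> 63) for 64 bits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lo <= n <= hi:
        raise ValueError(f"value does not fit in {bits} bits")
    return ((n << 1) ^ (n >> (bits - 1))) & ((1 << bits) - 1)


def unzigzag(z: int) -> int:
    """Inverse mapping; it does not depend on the width."""
    return (z >> 1) ^ -(z & 1)


def pack_prefix(tag: int, wire_type: int, wire_bits: int = WIRE_BITS) -> int:
    """Assumed layout: the tag sits above the low wire_bits bits of the prefix."""
    if tag < 0 or not 0 <= wire_type < (1 << wire_bits):
        raise ValueError("tag or wire_type out of range")
    return (tag << wire_bits) | wire_type


def unpack_prefix(prefix: int, wire_bits: int = WIRE_BITS) -> tuple:
    """Split a prefix back into (tag, wire_type)."""
    return prefix >> wire_bits, prefix & ((1 << wire_bits) - 1)
```

With a 6-bit wire_type, a prefix holding a small tag still fits in a single vint byte only for tag 0 and tag 1, so widening the field has a small cost in message size that would need to be weighed against the extra type codes.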

extprot is an interesting proposal, combining the (relative) simplicity and efficiency of ONC-RPC's binary encoding with the added bonus of being self-describing, like XML-based protocols such as XML-RPC or SOAP. However, extprot is not a standard, which limits its attractiveness compared to other protocols like ASN.1 or Corba.

A real comparison of those different protocols would be quite interesting, but this is too much work for this simple blog entry.