When I made providence I remembered quite a lot of the tooling around protocol buffers that I had available when I worked at Google, including a data converter, a data inspector, streaming tools and a distributed filesystem utility that could parse a remote protobuf data file based on a local IDL. Well, enough about the awesome tooling of past jobs; I wanted all of that to at least be possible with providence.

Then I decided to do all of that, and created some tooling. Dropped it and created it again. A couple of times. Getting this right is pretty important if you want to have it used widely. What I show here is the current 0.0.2 version of providence. See here for more about providence.

I ended up with two tools in addition to the compiler itself: the data converter (pvd) and the RPC tool (pvdrpc).

Data Converter

The data converter is a small program that reads a thrift IDL and a binary (or non-binary) serialized data file that follows that IDL structure, and then outputs the result in some readable (or other binary) data format. For example:

Given the thrift IDL in thrift/test.thrift:

struct MyData {
  1: string text
  2: i32 sequence
  3: list<string> tags
}

And given a binary file test.data with a set of data entries in test.MyData format, we could do something like this:

cat test.data | pvd -I thrift/ test.MyData

Which should produce the output:

{
  "text": "not a test at all",
  "sequence": 144
}
{
  "text": "test 2",
  "tags": [
    "first",
    "second"
  ]
}
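
Note that the output above is a sequence of JSON objects written back to back, not a single JSON array, so handing the whole output to a regular JSON parser in one go would fail. As a minimal sketch of how a script could consume such a stream (assuming the output looks exactly like the example above; the function name iter_json_objects is made up for illustration and is not part of providence):

```python
import json


def iter_json_objects(text):
    """Yield each JSON value from a string of back-to-back JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # Skip whitespace (including newlines) between documents.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # raw_decode parses one value and returns the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# The sample is copied from the pvd output shown above.
sample = """{
  "text": "not a test at all",
  "sequence": 144
}
{
  "text": "test 2",
  "tags": [
    "first",
    "second"
  ]
}"""

entries = list(iter_json_objects(sample))
for entry in entries:
    print(entry.get("text"), entry.get("sequence"), entry.get("tags"))
```

Fields that are not set in an entry are simply absent from the JSON, which is why the sketch reads them with get() instead of indexing.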

Input and output can be specified to point directly to files (no shell piping needed), and the input and output serialization format can be specified too.

pvd -i fast_binary,file:test.data -o pretty -I thrift/ test.MyData

Which should read the data file serialized with the FastBinarySerializer format and print it out with the simple “pretty printer” format.

RPC Tool

I don’t have as many remote distributed filesystems to work on, but thrift has a pretty widely used RPC protocol system, which if used correctly can be pretty nice. But testing a server with various client settings and data against a moving thrift definition can be quite the hassle, as you need to create a real client to do thrift RPC…

The providence RPC tool connects to a remote service (currently only through an HTTP servlet endpoint) and sends a service call as an actual remote procedure call. The service call can be parsed from a file the same way as the data files above. For example:

[
  "test",
  1,
  0,
  {
    "request": {
      "test": "my test",
      "sequence": 44,
      "tags": [
        "first",
        "second"
      ]
    }
  }
]

That is the RPC call data, stored in req.json, and this is the shell command:

cat req.json | pvdrpc -I thrift/ test.MyService http://localhost:8080/test_service

Which would then:

  • Parse the thrift IDL in thrift/ directory, pick out the test.MyService service from there.
  • Read the req.json request call (which needs to follow the desired input format’s service call syntax), and use that to send the actual RPC call to the remote service at http://localhost:8080/test_service.
  • Parse the response (in whatever format it is returned) and print it to standard output so the user can read it. E.g.:
[
  "test",
  2,
  0,
  {
    "success": {
      "test": "my test",
      "sequence": 44,
      "tags": [
        "first",
        "second"
      ]
    }
  }
]

Or, if a non-200 HTTP response is received, it will print out the error message it received.

Note that in the two json structures shown there is the serialized service call data in addition to the request and response wrappers around the actual request and response objects. The format here is:

["${method}", ${type}, ${sequence}, ${wrapper}]

Where the type can be 1: request, 2: response, 3: exception, and 4: oneway. The sequence number is something the transport layer (http, client etc) can use to match response messages to the correct caller, and the wrapper is either the request params wrapper, or the response wrapper.
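
To make the four-element layout concrete, here is a small sketch that reads a service call in the JSON form shown above and gives the parts names. It is only an illustration of the format as described here, not how providence parses it; ServiceCall and parse_service_call are hypothetical names.

```python
import json
from typing import Any, Dict, NamedTuple

# Message type numbers as listed in the text above.
MESSAGE_TYPES = {1: "request", 2: "response", 3: "exception", 4: "oneway"}


class ServiceCall(NamedTuple):
    method: str
    kind: str                 # one of the MESSAGE_TYPES values
    sequence: int
    wrapper: Dict[str, Any]   # request params wrapper or response wrapper


def parse_service_call(text: str) -> ServiceCall:
    """Parse '["method", type, sequence, wrapper]' into a ServiceCall."""
    data = json.loads(text)
    if not isinstance(data, list) or len(data) != 4:
        raise ValueError("expected a JSON array with exactly 4 elements")
    method, type_id, sequence, wrapper = data
    if not isinstance(type_id, int) or type_id not in MESSAGE_TYPES:
        raise ValueError(f"unknown message type: {type_id!r}")
    return ServiceCall(method, MESSAGE_TYPES[type_id], sequence, wrapper)


request = parse_service_call(
    '["test", 1, 0, {"request": {"sequence": 44, "tags": ["first", "second"]}}]'
)
response = parse_service_call(
    '["test", 2, 0, {"success": {"sequence": 44, "tags": ["first", "second"]}}]'
)

# A caller can pair a response with its request on method and sequence.
matches = (request.method, request.sequence) == (response.method, response.sequence)
print(request.kind, response.kind, matches)
```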

  • The request params wrapper message is a generated thrift struct that contains all the method params as field values. E.g. the method ResponseStruct my_test(1: i32 num, 2: string text) would become equivalent to the struct:

      struct my_test___params {
        1: i32 num
        2: string text
      }
    
  • The response wrapper is a union that wraps the return type (put in a field named success) together with each possible exception. To make conflicts with declared field IDs impossible, the success field gets the ID 0, which you are not allowed to declare yourself. E.g.:

      union my_test___response {
        0: ResponseStruct success
        1: MyException ex1
      }
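
The naming and field layout of the two wrappers can be sketched as a pair of small functions that print the equivalent IDL. This is only an illustration of the rules described in the list above; providence builds these wrappers internally and does not necessarily emit IDL text, the function names are made up, and methods without a return value are not covered because the text does not describe them.

```python
from typing import List, Tuple

# (field id, thrift type, field name)
Field = Tuple[int, str, str]


def params_struct(method: str, params: List[Field]) -> str:
    """IDL text equivalent to the request params wrapper for a method."""
    lines = [f"struct {method}___params {{"]
    lines += [f"  {fid}: {ftype} {name}" for fid, ftype, name in params]
    lines.append("}")
    return "\n".join(lines)


def response_union(method: str, return_type: str, exceptions: List[Field]) -> str:
    """IDL text equivalent to the response wrapper for a method."""
    lines = [f"union {method}___response {{"]
    # The return value always goes in field 0, named success.
    lines.append(f"  0: {return_type} success")
    lines += [f"  {fid}: {ftype} {name}" for fid, ftype, name in exceptions]
    lines.append("}")
    return "\n".join(lines)


params_idl = params_struct("my_test", [(1, "i32", "num"), (2, "string", "text")])
response_idl = response_union(
    "my_test", "ResponseStruct", [(1, "MyException", "ex1")]
)
print(params_idl)
print(response_idl)
```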
    

Future Work

For the future there are a number of things I want to update in these tools, and in the providence library that supports them. Most notably:

  • Support the thrift standard TFramedTransport client-server RPC calls (mainly for the pvdrpc command). E.g. using the thrift://host:port URL scheme.

And for providence in general:

  • Add support for the various annotations that are already supported by the old C++ thrift compiler. For java it may be stuff like class access, final and class names.
  • Add support for generating plain jackson POJOs instead of classes that link to providence (or thrift for that matter) as a whole. Being able to define jackson POJOs with proper annotations from thrift IDL may be nice. (These should still be immutable objects with builders.)
  • Write complete client and server libraries, specifically for using the google-http-client (porting the one used in testing pvdrpc) and Servlet / service code for javax.servlet. E.g. using jetty or netty servlet interfaces.

The library has also yet to be tested for service inheritance, which may or may not work at all.

Packaging

I have currently only made a Debian package for the tools, which you can download [here] for the 0.0.2 version (no warranty given). It contains the pvd and pvdrpc tools described above, in addition to the pvdc providence compiler.