Thursday, July 14, 2016

Microservices: from development to production


Let’s face it, microservices sound great, but they’re sure hard to set up and get going. There are service gateways to consider, setting up service discovery, consolidated logging, rolling updates, resiliency concerns… the list is almost endless. Distributed systems benefit the business, not so much the developer.

Until now.

Whatever you think of sbt, the primary build tool of Lagom, it is a powerful beast. As such we’ve made it do the heavy lifting of packaging, loading and running your entire Lagom system, including Cassandra, with just one simple command:


sbt> install


This “install” command will introspect your project and its sub-projects, generate configuration, package everything up, load it into a local ConductR cluster and then run it all. Just. One. Command. Try doing that with your >insert favourite build tool here<!

Lower level commands also remain available so that you can package, load and run individual services on a local ConductR cluster in support of getting everything right before pushing to production.

Lagom is aimed at making the developer productive when developing microservices. The ConductR integration now carries that same goal through to production.

Please watch the 8 minute video for a comprehensive demonstration, and be sure to visit the “Lagom for production” documentation in order to keep up to date with your production options. While we aim for Lagom to run with your favourite orchestration tool, we think you’ll find the build integration for ConductR hard to beat. Finally, you can focus on your business problem, and not the infrastructure to support it in production.

 Enjoy!

Tuesday, July 5, 2016

Developers need to care about resiliency

Disclaimer: I'm the technical lead for Lightbend ConductR - a tool that focuses on managing distributed systems with a key goal of resiliency.

I've been doing a reasonable amount of travelling over the past few years. Overall I enjoy it; I don't think that I'm away that much that it has become painful - may be 3-6 international flights per year.

One of the things you hear about when travelling is missing an international connection. I've been fortunate in that this has happened just once; a couple of weeks ago in fact.

The airline was British Airways (BA), and they did a really good job of trying to make up time given that flights out of Heathrow were causing delays across Europe. Thus my flight from Berlin TXL to London LHR was about two hours late. I missed my Sydney SYD flight from LHR. The BA staff did a great job of putting me up in a hotel overnight and getting me on to the next available flight. Honestly, from a staff perspective, BA were fantastic in fact.

What was frustrating though was that it took about two hours to arrange the accomodation and flight booking. This was also considering that I didn't have to queue for long and that a staff member attended to me in a reasonable time frame. The problem was the BA computer system.

Apparently BA have some new system. There were IT staff walking around helping the front-of-house staff get into the system and deal with its incapacity to handle any load. The IT staff were frustrated. The front-of-house staff were frustrated. I was frustrated (although not as frustrated as a Business Class passenger in front of me who felt that his ticket meant that BA should treat him like royalty!).

BA's computer system had an amazing effect on all concerned, except most likely the people that wrote it. In my opinion the original developers should be there supporting the front-of-house staff. They should feel the pain that they have inflicted.

I'm sure that you have similar stories to share, where computer systems have failed you miserably. Computer systems will of course fail, that's natural, but it is the fact that their degree of failure is considered acceptable that is the problem. Computer systems should not fail to the extent that they do. Your airline reservation system, online banking site, or whatever it is, it should be more reliable that it probably has been.

The problem is that developers do not understand that building-in resilience to their software is more important than most other things. As my colleague, Jonas Bonér has stated many times, "without resiliency nothing else matters". He's so right. Why is it then that developers just don't get this?

My answer to that is that many developers just see what they do as a job, and they don't really care about what they do. Putting that aside though, creating and then indeed managing distributed systems, a key requirement for resiliency, is harder than not; not hard, but harder and developers are lazy (btw: in case you don't realise this, I'm a developer!).

We need systems that are resilient. We therefore need developers to care about resiliency. The more that developers care about resiliency, the more tools and technologies we'll see appearing in support of it. I strongly feel that it all starts with the developer though.

I imagine a world where, given the inevitability of missing flight connections, I can wait in a queue for no longer than 10 minutes, be handled within another 10 minutes and then sleep off the tiredness and inconvenience of waiting another 24 hours for my next flight. The developer just needs to start caring in order for this to happen. Make he or she responsible for managing their system in production and they'll start caring, I guarantee it.

Here's a language/tool agnostic starting point for you if you are a developer that cares enough to have read this far: The Reactive Manifesto.

Thanks!