Sunday, July 26, 2015

Jurassic Delivery

Who didn't love it when Jurassic Park hit in 1993?  Hot damn if I didn't own that on VHS as soon as it came out.  Velociraptors. T-Rexes. 3-D GUI Unix operating systems.  It was the shit.  There have been some downturns, but when Jurassic World came out with Chris Pratt in a leading role I couldn't pass it up.  Clearly things had been taken to the next level and I needed to submit and enjoy.  I saw it later than most and, despite the reviews I heard, I thoroughly enjoyed it.

I love it when humans try to control the uncontrollable and pay the price.  If you're going to build a dinosaur park... I don't care how much effort you put into trying to control the dinosaurs... you are going to lose.  There's something very gratifying for me in watching these wannabe puppeteers suffer the bite of the T-Rex they genetically engineered, bred fierce, and sought to tame.

Software delivery is a vicious dinosaur.  You start with an idea.  So innocent, so harmless, but requiring so much care and nurturing.  You grow it gradually.  It begins to require more and different kinds of care to keep progressing.  As your little dino grows you realize that he's getting unruly and you need to build some safeguards into your system and delivery process.  Maybe you need to add some tests, some automation.  Before you're done with any of that... BAM, a major bug hits.  Your dino's fully formed teeth are capable of biting through the steel cable you engineered to keep him in.  So we'll introduce a stronger, thicker cable, AND let's electrify it.

Lather. Rinse. Repeat.

We all face significant challenges that we need to solve on behalf of our customers.  There will be an endless string of problems that we run into along the way.  We'll solve some of those by leveraging any combination of libraries, frameworks, and solutions - some well-known, some well-understood, some cutting edge, and some not well-understood.

You have options.  Better-known technologies likely have a larger support base.  They probably also have more and better documentation, and a community actively finding and solving the same problems you'll hit.  On the flip side, better-known, better-understood technologies may not be solving the newest problems, or may not be solving them in the most effective way.

Cutting-edge technologies are likely solving or streamlining more problems, or more significant ones.  They're a jump forward of sorts.  Maybe you can solve the problem in far fewer lines of code.  Maybe concurrency is simplified immensely.  Maybe deployment and scalability become low-hanging fruit.  Being newer, though, such a technology is likely not as well understood, and possibly not as well documented.  Certainly the adoption level is lower, which means community support is going to be lower too.

The former might result in a more manageable, tamable stegosaurus.  It's less rapid than other approaches, but consistent.  Your stegosaurus is going to eat, sleep, and shit in a fairly predictable manner.  It's not going to bite you, and any fires it may start will be manageable.

The latter might result in an unmanageable, unpredictable T-Rex.  It's fast, vicious, and will bite your head off as soon as it gets the opportunity.

Remember when Dr. Grant and the kids were running through the field as a flock of dinos ran past them?


Those Gallimimus seem reasonable.  Extremes are rarely the right answer.  Maybe a Gallimimus software delivery pipeline is the proper middle ground?

Whether you realize it or not, you own the characteristics of your delivery pipeline.  The series of choices you made since the inception of your idea created your pipeline.  It's not done though.  You are constantly adjusting and molding it.

Investing in change... in new and valuable technology... is important.  After all, who doesn't want to avoid solving solved problems, and leap forward?  But we must treat it as an investment.  If it's the right business decision, do it, but do it intelligently.  Don't take on a conversion or technological change expecting stegosaurus-like outcomes.  You might have a raptor on your hands.  You have no business taking on a change like this without investing in understanding the ways it can bite you and accounting for them.  If you think you're going to convert from a COBOL stack to a Java stack, a .NET stack to a Scala stack, an on-prem stack to a cloud stack, or make any other kind of major conversion, expect unpredictability.  Likewise, don't breed a T-Rex / velociraptor hybrid without expecting some casualties.

As you move forward make intentional decisions about the state of your pipeline.  Consider:

  • Your current state
  • The relative newness of the thing you're evaluating.  Is it well-known and understood?
  • The learning level and talent of your engineering organization
  • The trust level your management team has for your engineering organization

Maybe the right thing for your organization is to engineer your own dinosaur.  But mayyyybe not an Indominus Rex.  Maybe a stegoraptor though?




Tuesday, July 21, 2015

Thoughts on DevOps

DevOps, like most paradigm-shifting buzzwords, has become an overloaded, muddled term.  I've been thinking about this a lot lately, and here are some uncategorized thoughts on this rapidly evolving area of software delivery.

(Most of these statements should probably start with... "regardless of where you are today.")

  • Focus on customers.  The culture and organizational change associated with DevOps should make everyone involved in the delivery of a solution (including infrastructure roles) more aligned and accountable to the customer.
  • I view this primarily as a movement to apply engineering practices to infrastructure and operations management: versioning, automation, and testing (see the sketch after this list).
  • DevOps is about dependency removal, in much the same way that agile is.  There's no better way to remove dependencies than to add the function of that dependency to teams requiring it. Add people with the skills, or grow the skill set within the team.
  • I believe one maximizes agility by removing all dependencies and allowing a team to create, manage, and run its entire stack.  
  • Teams running their entire stack leaves the potential for similar, possibly duplicate effort across teams.  Duplication is evil, but here it's the price you pay for agility.
  • A centralized "DevOps" team is completely reasonable in my view, but it should not be how an organization starts.  The formation of a central DevOps team should be a conscious decision to follow the DRY principle - to remove duplication that has emerged organically.  As well, the team must not become a bottleneck as teams evolve / change. 
  • Ownership boundaries are clearer with fewer dependencies.  If a dev team owns the app code and ops owns the infrastructure, who addresses an unclear problem near the boundary?
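
To make the engineering-practices bullet concrete, here is a rough sketch of treating infrastructure verification like application testing, using the Python testinfra library.  The nginx service and port are hypothetical stand-ins for whatever your servers actually run.

    # test_webserver.py -- run with: py.test test_webserver.py
    # A minimal infrastructure test using the testinfra library.
    # "host" is the fixture testinfra provides for the target machine.

    def test_nginx_is_installed(host):
        # The web server package actually made it onto the box
        assert host.package("nginx").is_installed

    def test_nginx_is_running_and_enabled(host):
        # The service is up now and will survive a reboot
        service = host.service("nginx")
        assert service.is_running
        assert service.is_enabled

    def test_port_80_is_listening(host):
        # The server is accepting connections
        assert host.socket("tcp://0.0.0.0:80").is_listening

Check a file like this into version control next to your Puppet or Chef code and run it on every change, and infrastructure starts getting the same safety net we expect for application code.
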
Still thinking...

Wednesday, February 25, 2015

The Feature Toggle Antipattern

The move toward a more agile software delivery model requires the adoption of improved technical practices.  One of the first is generally the concept of, and tooling associated with, continuous integration (CI).  Adopting CI practices yields other challenges, one of which is partially complete features.  Many features take much longer to complete than a best-practice integration cycle allows.

Feature toggles are a very useful way to solve this problem.  By employing this concept you can effectively decouple commits and code integration from the release of a feature in that code.  This is very powerful.  We can now get the benefits of continuous integration without the obvious issue of exposing a partially completed feature.
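
Here's a minimal sketch of the concept in Python.  The flag name and checkout functions are invented for illustration; real implementations typically back the flags with configuration or a database rather than a hard-coded dict.

    # A bare-bones feature toggle: the new code ships with every
    # integration, but stays dark until the flag is flipped to True.
    FEATURE_FLAGS = {
        "new_checkout_flow": False,  # merged and deployed, not yet released
    }

    def is_enabled(flag_name):
        """Return True if the named feature is toggled on."""
        return FEATURE_FLAGS.get(flag_name, False)

    def legacy_checkout(cart):
        return "charged %d via the legacy flow" % sum(cart)

    def new_checkout(cart):
        return "charged %d via the new flow" % sum(cart)

    def checkout(cart):
        # The partially built feature is integrated but safely hidden.
        if is_enabled("new_checkout_flow"):
            return new_checkout(cart)
        return legacy_checkout(cart)

    print(checkout([10, 25]))  # -> charged 35 via the legacy flow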

As with all things, we can take this concept too far.  Martin Fowler advocates avoiding feature toggles as a way to hide things in production:
Your first choice should be to break the feature down so you can safely introduce parts of the feature into the product. The advantages of doing this are the same ones as any strategy based on small, frequent releases. You reduce the risk of things going wrong and you get valuable feedback on how users actually use the feature that will improve the enhancements you make later.
He simply suggests embracing your agility.  The need for these toggles means you're already releasing to production more frequently than you can complete features.  Why not break the work down further and learn from each release?  Martin is suggesting we avoid features that take longer than the release cycle - break them down instead.  A frequent production release cycle is a good thing; don't attempt to solve this by releasing less frequently...

Not everyone may be able to accomplish this easily, though.  It's a worthy goal to work toward over time.  In the meantime, teams need to monitor for over-reliance on these toggles.  A couple of things to watch out for (see the audit sketch after this list):
  1. Completed, but not released, features.  If you've completed a feature you should be ready to release it.  If not, what could you have been working on instead that could ship today and add customer value today?
  2. The number of hidden features.  If #1 is a problem, this is likely also a problem.  However, this can also manifest if you have too much work in progress (WIP).  Reducing WIP can drive feature completion, therefore releasability, and therefore customer value.
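
One lightweight way to watch for both: record when each toggle was created and whether its feature is complete, then audit regularly.  A sketch, with an invented registry format and an arbitrary threshold:

    from datetime import date, timedelta

    # Hypothetical registry: flag name -> (created_on, feature_complete)
    TOGGLES = {
        "new_checkout_flow": (date(2015, 1, 5), True),
        "beta_search": (date(2015, 2, 10), False),
    }

    MAX_HIDDEN_AGE = timedelta(days=30)  # arbitrary; tune to your release cycle

    def audit_toggles(today):
        """Flag completed-but-unreleased features and long-lived toggles."""
        for name, (created_on, complete) in sorted(TOGGLES.items()):
            if complete:
                print("WARNING: %s is complete but still hidden -- release it" % name)
            elif today - created_on > MAX_HIDDEN_AGE:
                age = (today - created_on).days
                print("WARNING: %s has been hidden for %d days" % (name, age))

    audit_toggles(date(2015, 2, 25))
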
If taken to an extreme the number and scope of features that are hidden in production can reach cumbersome levels.  I call this the Feature Toggle Antipattern.  In its worst form agile teams lose sight of their stockpile of not-yet-released features, even releasing (i.e. no longer hiding) features less frequently than in a typical waterfall project.

In a waterfall project there is a clear beginning and end, often to a fault.  That's one of the things that agile overcomes very successfully.  Aligning teams around a product and driving features through that team eliminates the on / off nature of waterfall projects - the imminent big bang.

With frequent releases (no big bang) it's easy to lose sight of the customer value that's hiding behind your toggles.  When it's time to actually release those features you could end up with a big (detoggling) bang that dwarfs its waterfall alternative.  There are other losses too, beyond the opportunity cost of that withheld customer value.  If you encounter issues, or as you get feedback from customers, you'll need to change and adapt.  Many of those features were developed long ago, and you'll pay the full cost of context switching and refamiliarization.

If you're going to use feature toggles to deliver your product, make sure you avoid this antipattern.  You'll avoid reintroducing many of the heartaches that drove you to embrace agile in the first place.

Saturday, January 31, 2015

The Cloud Decision

For some there may not even be a debate when it comes to the cloud.  The flexibility and scalability it offers small companies - ones that don't want to build their own datacenter and aren't sure about the size of their customer base (their viral coefficient, even) - is invaluable.  For larger companies the evaluation is generally more difficult.  The primary driver is usually cost, and that savings must be measured against all the other change that's necessary - in security, architecture, and governance, to name just a few.  Any good startup (or smaller company) has a strong culture of innovation.  After all, that's how startups start.  Innovation is a critical element of the cloud decision-making process that too easily gets lost in the evaluation for larger companies.  This, and other value-generating endeavors, are amplified by a service-enabled infrastructure - particularly one where self-service is encouraged.

Looking at the cloud decision from the CIO level alone is just too high.  Evaluating there is going to miss the most significant benefits.  I imagine the typical evaluation goes something like this:

  • Cost.  In the cloud we can pay for what we use, and not worry about underutilized assets. Ok, that's a +.
  • We'll really need to ramp up security.
  • Let's do it!

It may be true that an organization will lower costs by doing just this.  However, there is an enormous amount of lost opportunity in making this move so naively.

If you have an on-premise datacenter you probably have dedicated infrastructure teams.  You've probably also built up processes for development teams to interact with those infrastructure teams. Focusing on cost and ignoring the self-service, service-enabled nature of cloud providers might cause you to reimplement your existing datacenter, architecture, and processes in the cloud, forgoing the majority of the benefits.

Let's take a simple example I heard recently, one that illustrates a company recognizing the value of the self-service model and reaping the benefits.  A developer at a larger company supported an existing, cumbersome process for making regularly released files available to external parties.  Once her releasable artifact was built, she sent it to another team and notified them of its availability.  At that point the other team would "approve" its release and put it on an external-facing FTP site.  The process took several days on average.

In their cloud migration / implementation this developer was empowered to use the available services.  She recognized that the cloud storage solution now available could easily replace the FTP site and the to-be-released artifact could be automatically sent to the storage solution directly from the build process. The net result was an automated, reliable process with immediate results rather than a multi-day lead time.
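
On AWS, for instance, that final build step can be as small as this sketch using the boto3 library.  The artifact path and bucket name here are hypothetical.

    import boto3

    # Last step of the build: push the releasable artifact straight to an
    # externally readable S3 bucket -- no hand-off, no multi-day wait.
    s3 = boto3.client("s3")
    s3.upload_file(
        Filename="build/output/release-1.4.2.zip",  # hypothetical artifact
        Bucket="example-external-releases",         # hypothetical bucket
        Key="releases/release-1.4.2.zip",
    )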

There are two reasons this succeeded.

  1. The developer was intimately familiar with the process - enough that she could recognize and implement the improvement.
  2. There was a conscious decision to empower her; to allow her access to cloud services.  Her organization could easily have restricted access to the storage solution such that only the FTP team (or another infrastructure team) had access.
An enormous benefit of moving to the cloud is its service-enabled nature.  Organizations with manual processes and hand-offs have all the opportunity in the world to take advantage of this.  There is a key though.  The true benefit only occurs when there is a reduction in dependencies.  Reduction. In. Dependencies. This must mean that requests are not necessary; that teams can "request" infrastructure via a console or API, on-demand, and not depend on an external entity.  

The on-premise datacenter versus cloud provider decision is a difficult one.  It's one that I do not think should be made lightly, and one that should not be made for cost reasons alone.  Organizations need to make sure they recognize the real benefits and take advantage of them.  This can be a very large change.  In many cases it's a culture change, an architecture change, and a governance change.  Think through what it means to reduce dependencies.  Roles may change, and skill sets may be challenged.  This requires great leadership, trust, and maturity to accomplish successfully.  I'll end with a sobering statistic from VMware:
63 percent of Amazon AWS projects are considered failed, compared to 57 percent of projects on Rackspace and 44 percent of Microsoft Azure projects.

Saturday, January 10, 2015

Single Responsibility Principle 2.0

I see this concept coming back a lot as of late, at least in my ongoing learning and discovery of good design and architecture.  The Single Responsibility Principle is one of the SOLID principles for good object oriented design and development. I think the SRP, and likely many other principles, have become applicable at higher levels of abstraction.

Some amazing advances in approaches to infrastructure and configuration management, deployment, and scaling have hit our industry hard in recent years.  Historically, before virtual machine technology was really the norm, there was still a need and desire to consolidate applications and services to run on a minimal physical footprint.  After all, you want to use the hardware you've purchased effectively.  You don't want to run a beefy server at 10% utilization all day, so you load it up to more fully utilize it.  That drove a lot of coupling at the infrastructure level of our architectures.

When VM technology became heavily adopted this started to become less of an issue.  You could use the same physical hardware, but host many logical VMs on it.  Thus, you've separated more of the concerns, thereby further decoupling co-hosted applications.  This improves architecture, but the tradeoff is increased complexity and load on infrastructure teams.  If we're going to build more single-purpose servers then we'll need more VMs.  This spawned the need for greater automation in the infrastructure space.  Along came tools like Puppet, Chef, SaltStack, and Ansible, which have done an amazing job fulfilling exactly this need.  Write your infrastructure as code, version it, and use it to provision infrastructure on demand.

We now have tools that enable us to rethink our approach to design and architecture in support of the single responsibility principle.  In the early 2000s, when Uncle Bob wrote about the SRP, he raised visibility to a question we should be asking ourselves as we design and change classes.  It's now 2015 and our tools have advanced incredibly.  We need to ask ourselves this same question as we design all layers of our systems.
How can we leverage the SRP to reduce coupling not just in our classes, but in our application / service design, and in our infrastructure design?  
Modern tools enable, and frankly necessitate, that we ask ourselves this question across our entire stack.  It's really this thinking that drives teams and organizations to microservice architectures.  A class that is designed with a single responsibility changes for one reason and one reason alone.  This keeps classes small, and thus easier to change.  Services should be easy to change as well.  What better way to enable that than to give each service its own single purpose, and therefore a small, changeable implementation?
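
As a quick Python illustration of the single-reason-to-change idea (the reporting domain is invented), here's a class with two responsibilities split into two classes with one each:

    # Before: two reasons to change -- the report's content and its delivery.
    class Report:
        def generate(self, data):
            return "\n".join(str(row) for row in data)

        def email(self, body, recipient):
            print("sending to %s: %s" % (recipient, body))

    # After: each class changes for exactly one reason.
    class ReportGenerator:
        def generate(self, data):
            return "\n".join(str(row) for row in data)

    class ReportMailer:
        def send(self, body, recipient):
            print("sending to %s: %s" % (recipient, body))

The same split, applied a level up, is what turns one application with many responsibilities into several small services with one responsibility apiece.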

I have been particularly interested in Docker lately.  I believe it adds a tremendous amount of value in this space.  Despite great tools like Puppet, Chef, etc., their approach is still bulkier.  With Docker's lightweight containers the creation of individual service or application images is fast, and spawning them is even faster.  You can start a Docker container just as fast as your service itself can start.  Docker also really shines in its simple mechanism for linking containers together so they can interact.  If you have a typical web server, app server, database architecture it does not take long to get those pieces Dockerized and running together, linked, in a Docker environment.  I really like what Fig did to simplify this even further.
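
As a rough sketch of how little ceremony this takes, here's a database and a web app wired together using the Docker SDK for Python.  The image names and network are placeholders, and Fig expresses the same wiring declaratively in a few lines of YAML.

    import docker

    # Stand up a database and an app container on a shared network,
    # so the app can reach the database by name.
    client = docker.from_env()
    client.networks.create("app-net")

    client.containers.run(
        "postgres:latest", name="db", network="app-net", detach=True)

    client.containers.run(
        "example/web-app:latest",        # hypothetical application image
        name="web", network="app-net", detach=True,
        environment={"DB_HOST": "db"},   # resolved via the shared network
        ports={"8080/tcp": 8080},
    )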

Many of us likely have some work to do to improve our applications and systems in light of the SRP.  There won't be any shortage of work there.  The good news is that there is little holding us back when it comes to available tooling.  It's there, and most of it is free.  Despite the challenges, it seems to me that this is worth some thought, and worth moving toward.  More easily changed services will yield business agility, and therefore customer satisfaction.