Release Management’s Traffic Overview Explained

I wanted to write another post to help clear up some functionality in Release Management for Visual Studio 2013. Many teams that I have introduced to Release Management (RM) see the traffic overview page and wonder how that data is actually calculated. What are the App/Traffic/Fail metrics to watch, and why are they important? The pro tip here is that if you hover over these numbers in the tool, you'll see a better description.

Each of your Release Paths (pipelines) is listed down this page, and each of its stages is shown. You can get a sense of how many steps are required for an application to reach its ultimate destination (in this case "prod-client"), which is the right-most stage for any given pipeline.

Under each stage, you have:

  • App – the number of applications that have been released to that stage (all time). Note that you can have multiple applications going through a shared Release Path; if two applications follow the same approval chain and environments, have them share a Release Path.
  • Traffic – the number of releases in the last 5 days that have been released to that stage (or attempted to reach it but failed).
  • Fail – the number of failed releases to that stage in the last 5 days.

Assuming you're like most development shops, you should see higher numbers on the left than on the right.

So what can you do on this screen?

If you double-click on any of the stages, you'll get more details about the apps that have transitioned through that Release Path. It shows the all-time traffic through each of those stages (not just the last 5 days).

Release Management Approvals Explained

Microsoft’s Release Management for Visual Studio 2013 has a lot of great functionality around authorization and security for releases that are being promoted through environments within the enterprise. One of the great strengths of Release Management is its ability to track actions being performed around deployment in an auditable fashion. Every single approval and manual step performed by humans is captured by the tools (assuming you’re respecting the rules of the game), and every single automated action has an owner that is tracked as that automation is performed in a company’s IT environments.

A Release Path in Release Management defines the environments through which applications will be promoted. It contains the definition of which environments are relevant and who will participate in that promotion process. The release path below is a simple one that contains only a Dev and a Prod stage:

For each environment below, one can define the relevant approvers in the workflow. I'm highlighting what I think are the most important of these:

Acceptance: This person or group of people is responsible for accepting a new release into that environment. During the release process, Release Management (RM) will optionally email everyone named in that box, and if any one of them approves it, then Release Management will start the automated deployment process.

Validation: This person or group of people is responsible for indicating, generally, that the automated deployment worked. Perhaps as part of the process, the team decides to run a quick smoke test against the environment before considering it “validated.” The team has the ability to define what “validated” means and it could vary by team and environment. It could mean that the QA team has run a regression test, or perhaps the developers have performed a quick smoke test. In this case, as in the case of Acceptors, if any one member of the group approves, then the release is considered validated for that environment.

Approvers: After an environment is validated, there's another step called Approval. This is separate from validation in that it's not indicating that the release works in some fashion; rather, it's the chance for a person or team to approve that the release is worthy of being promoted further. For a development environment, the development team may be signifying that the release is worthy of moving to QA or PROD. This Approval step is different from the other two kinds in that multiple people or groups can be added, and all of those listed must approve before the release is considered "Approved" in that environment.
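To make the difference in sign-off rules concrete, here is a minimal sketch of the logic described above. This is my own modeling purely for illustration, not Release Management's actual API: Acceptance and Validation pass with any one member of the group, while Approval requires every listed person or group.

```python
# Illustrative model of RM's sign-off rules (not a real RM API).

def accepted_or_validated(signoffs: set, group: set) -> bool:
    """Acceptance and Validation: any single member of the group suffices."""
    return len(signoffs & group) > 0

def approved(signoffs: set, approver_entries: list) -> bool:
    """Approval: every listed person/group must have at least one sign-off."""
    return all(len(signoffs & set(entry)) > 0 for entry in approver_entries)

# Example: one tester accepting is enough...
print(accepted_or_validated({"tester1"}, {"tester1", "tester2"}))   # True
# ...but Approval needs both the Testing Leads AND the Product Owner.
print(approved({"testlead1"}, [{"testlead1", "testlead2"}, {"po"}]))  # False
```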

 

Below is an example chart that I’ve used to map out roles and responsibilities in the release process. This chart was created for a traditionally managed IT organization with separate QA and DEV teams.

Acceptance

  • Development: [All Developers] – Recommend leaving this automated to lower the friction of deploying to a development environment.
  • Testing: [Testing Leads] – A small group is recommended for this acceptance, to prevent the testing team losing work due to errant releases to the Test environment.
  • Production: [Infrastructure Team Leads] – This team is required to indicate that production is ready for a release and that the change management process was respected.

Validation

  • Development: [All Developers] – Recommend leaving this automated to prevent friction for the development team.
  • Testing: [All Developers] – Developers would likely run a quick smoke test on the QA environment to ensure it's in working order before testers start performing their tests.
  • Production: [All Testers] – Whoever has the skills to confirm that the release successfully made it to the environment.

Approval

  • Development: [Development Leads] – Agree that the release is good enough for QA to begin testing.
  • Testing: [Testing Leads] and [Product Owner] – Both sign-offs are required before the release is readied for production.
  • Production: [Product Owner] – Signs off that the release was a success and new features are working as expected.


I have noticed a few patterns that seem worth calling out:

  • Validation and Approval can be confusing. Both take place after the deployment, but they mean very different things: Validation indicates that the release is good for the current environment, while Approval indicates that the release is good enough for a future environment.
  • Acceptance and Approval are both gatekeepers that prevent releasing a build to a new environment. When teams use both, it can be confusing which one tracks "ready for next." I'd suggest using Approval, particularly between Test and Prod, to mean "this release is ready," while Acceptance means "the environment is ready" for the new release.

I'd be curious if you have any other patterns of approvals/disapprovals, or if you have used these states in ways other than defined above. Please comment and share.

Recreating TFS Shelvesets in New Environments

Invariably, when you work in the ALM field long enough, you get your share of source code migration projects. Lately, I've had a few of those, many of them between two separate Team Foundation Server instances. In the world of source code migration, consultants will often recommend either a tip-only migration or the use of automated tools to essentially replay all the changesets/check-ins from the old system into the new one. A tip-only migration means getting the latest code from the old system and checking that latest version into the new one. Although typically far simpler, this approach of checking in a single latest version does have its complications. Lately, I've worked with several clients that had invested in shelvesets that needed to be brought over. In those cases, the teams had access to both the old TFS server and the new one at the same time. I wrote up some instructions so the developers could easily bring over their own shelvesets at their convenience before the old system was taken down.

This approach is not automated and requires the users' participation, but it is a decent option for small teams.

What you need is a developer’s local workstation with workspaces mapped to both the old and the new TFS servers.

  1. Ensure that a local workspace for the old environment is mapped and synced to the version of code that your shelveset is based on (often this is "latest"). At this point, TFS believes that the local workstation has that version of the code.
  2. In Windows Explorer, navigate to that location and delete everything under the workspace mappings so that the folder itself is empty. TFS may detect that as a bunch of deletes, but that's okay.
  3. Unshelve the shelveset to be migrated. At this point, the old workspace's folders should contain only the shelveset files.
  4. Paste that set of files into the new workspace in the right place in the folder structure.
  5. If using VS 2012 or VS 2013, open the Pending Changes window and be sure to promote the changes, which should now show up as pending changes.
  6. If your original shelveset included any deletions, perform those deletions in the new workspace as well.
  7. Shelve the changes.

At this point, you should have a shelveset mapped to the correct folder path in your new environment ready to be used by the developer.
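For those who prefer the command line, steps 1, 3, and 7 have tf.exe equivalents. Below is a small sketch that assembles those commands; the workspace paths, shelveset name, and owner shown are hypothetical, and steps 2 and 4 through 6 (deleting, copying, and promoting) remain manual either way.

```python
# Build the tf.exe commands for the scripted portions of the steps above.
# All names/paths are placeholders; substitute your own servers and folders.
def shelveset_migration_commands(shelveset, owner, old_workspace, new_workspace):
    return [
        # Step 1: bring the old workspace to the version the shelveset is based on.
        f'cd /d "{old_workspace}" && tf get /recursive',
        # Step 3: unshelve into the (now emptied) old workspace.
        f'tf unshelve "{shelveset};{owner}"',
        # Step 7: after pasting/promoting, shelve against the new server.
        f'cd /d "{new_workspace}" && tf shelve "{shelveset}" /recursive',
    ]

for cmd in shelveset_migration_commands(
        "MyShelveset", "DOMAIN\\developer", r"C:\src\old", r"C:\src\new"):
    print(cmd)
```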

Promote the Bits or Promote the Code

Release management in the enterprise has design patterns just like software does. A design pattern in software is a way to communicate recurring structures of code and logic in application architectures. Design patterns help technologists in various roles communicate intent and structure succinctly: if both people are aware of a particular pattern, a huge shortcut in communication can reasonably be taken. Invariably in my consulting career, when doing a tools-focused engagement, we start to talk about how to do automated builds and deployments. One of the first decisions is whether or not to couple build and deployment, and this is a "release" pattern that enterprise teams are choosing all the time.

Promote the Code

This pattern was commonly described earlier as "Source Code Promotion Modeling": essentially, having a branch for every environment and creating a build for every branch that deploys to that environment. This can be represented by the diagram below, showing the artifacts that would typically be created in this situation.

This has traditionally been less expensive to implement. It requires branching and merging skills, which are ubiquitous among developers, and only a very simple configuration strategy for builds. The downsides are two-fold: (1) code is recompiled in each environment, so what was tested in a lower environment is not itself promoted; at best, the same source code gets recompiled; and (2) depending on the implementation, even the source code cannot be guaranteed to match the lower environments if the process relies on merging, where two-way merges can happen.

Promote the Bits

In this world, the source code is compiled once but released/deployed potentially many times. The result is that the compiled binaries that were tested in a lower environment are identical to the ones being promoted to the higher environments. Typically, the application is just re-configured in each of the required environments. The downside, traditionally, has been that this is more sophisticated and thus more expensive to adopt in enterprises that are just taking on new patterns. What is changing, however, is that the cost to move to this pattern has started to decrease in the Microsoft community due to Microsoft's entry into the market with "Release Management for Visual Studio." Having an anointed Microsoft solution has, in reality, pushed this scenario more and more into the mainstream of enterprise software development.
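To make the contrast concrete, here is a toy sketch of the two patterns. The branch names and the compile/configure/deploy functions are hypothetical placeholders of my own, not tied to any real build system; the point is only where compilation happens.

```python
# Promote the Code: every environment has its own branch and its own build.
def promote_the_code(env):
    branch = {"dev": "Dev", "qa": "QA", "prod": "Main"}[env]  # hypothetical branches
    binaries = f"compile({branch})"        # recompiled per environment
    return f"deploy({binaries}, {env})"

# Promote the Bits: compile once, then reconfigure and deploy the same artifact.
def promote_the_bits(envs):
    artifact = "compile(Main)"             # single build output, promoted as-is
    return [f"deploy(configure({artifact}, {env}), {env})" for env in envs]

print(promote_the_code("qa"))              # the QA binaries are a fresh compile
print(promote_the_bits(["dev", "qa"]))     # every environment gets identical bits
```

Note how in the first function the binaries differ per environment, which is exactly why what was tested below is not what ships above.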

Neither of these concepts is new. Using Team Foundation Server and Release Management, both scenarios are fairly easy and cost-effective to implement. I have found that these two phrases, Promote the Bits and Promote the Code, have definitely simplified the conversations with my clients and allowed us to move on to the things that do in fact vary from client to client. If you're small and looking for some quick wins, Promote the Code can work. If you're subject to regulatory compliance such as SOX, Promote the Bits looks to be the answer.

Deliberate Risk Taking in Software Teams

Risk is a difficult concept to understand and deal with. As an industry, when building software, we talk about lowering our overall risk to increase our chances of success. So many things can potentially go wrong on projects that we are simply conditioned to remove those possibilities as much as possible. In many cases, I would agree with that approach, but just as in the field of Enterprise Risk Management (ERM), our industry needs more nuanced ways of looking at risk in order to move forward and strike the right balance between debilitating risk aversion and wild-west gunslinging. Today this is managed by our industry gurus, our project managers, and our scrum masters using intuition and experience. Perhaps if we formalized our thinking about risk, we could increase our overall competency beyond our sages of wisdom.

ERM categorizes risks into two categories: ancillary and core.

  • A core risk is one that the company wants to take on, often because it is the reason the company is in business and earning revenues. The business is said to be exploiting this risk and deriving profits from doing so. If you're a consulting company, such as Polaris Solutions, you really want to take on as many fixed-bid projects as you can, as long as you know they are the types of projects your company can do well. Similarly, a staff augmentation firm that might not have strong engagement/project management competencies may want to avoid those same engagements. The consulting group should take on the risk because, in theory, it is compensated for doing so and knows it can deliver on or ahead of schedule. Or consider a cloud hosting provider like Microsoft Azure: people pay Microsoft to offload their risk of outages and infrastructure complexity. If Microsoft hires a lot of really smart people who do infrastructure management, it wants the risks associated with that work because it can charge a price for them. In theory, its competency is keeping the datacenter functioning well.
  • An ancillary risk is one that the company would love to reduce, mitigate, or offload to a third party whenever it can. Companies extend credit to their customers all the time through "accounts receivable," which are promises by customers to pay invoices within 30/60/90/etc. days. There is always a risk that a customer might not pay their invoice. Most companies don't like that risk, which is why many have shortened payment terms or outsource collections from the outset to someone who is good at collecting. At my company, Polaris, there's the risk that an employee gets hurt on the job. I don't personally like having that risk, so we buy (in addition to compliance reasons) workers' compensation insurance, which prevents me from having to pay the medical bills directly. That's an ancillary risk that I don't want, and it is not core to what we do.

These concepts can apply to software development teams as well. Unfortunately, most of us treat all risks the same way: they are things we want to remove from our paths. Some examples of risks that are indeed ancillary for most enterprise development teams:

  • Laptop / hard drive failure results in source code getting lost.
  • Changing business conditions resulting in changing requirements (*note on this in my next post)
  • Dependencies on other teams’ output not getting built in time
  • Dependencies on third party libraries that are unexpectedly failing
  • Not considering production-like hardware when testing and sizing applications, resulting in scalability challenges

If those are ancillary risks, then what would be a core risk to the team? A software development team exists to build software, and therefore the core risks often surround the software itself. Some examples might be:

  • An experiment in new design patterns results in slower code
  • An architecture spike didn’t produce a successful POC
  • Some new feature might be more complicated than we thought it was going to be

These share a commonality: they concern the actual product, and they are sometimes within the team's own control. I've seen agile and waterfall teams unnecessarily pad estimates to reduce their risk of missing commitments to the project or product. By padding estimates, especially on an agile team, they ensure that fewer user stories are accepted by the team. Those teams almost always meet their commitments and then have time left over. If this culture persists, the team takes on less and less over time. Their overall value to the business shrinks as they are marginalized from key product enhancements. In extreme cases, when left unchecked, this leads to team members being "let go."

In this post, I'm not arguing that you should take on every commitment you can, but I am pushing for us to recognize when our teams are starting to look "risk averse" and are playing safe beyond what is acceptable. As members of the team, we can recognize this most easily, but a trained scrum master can recognize it through the empirical data on sprints. The moral of the story is that padding estimates is not avoiding the same type of risk as adding another hard drive to prevent hardware failure. So think differently about these cases, and let's ensure that our teams keep striking a good balance between too much and too little core risk.

What do you think – do you see any core risks that your team is silently mitigating?

Getting Real ROI (return on investment) on ALM

As you come into work on a Monday, you find your team is about to commit to a significant architectural refactoring of the application. You talk to the senior developer and ask "why?" The answer you hear back is as unsatisfying as it is obscure. Phrases such as "separation of concerns" and "single responsibility principle" are mentioned faster than you can drink your first cup of coffee. Scenarios like this happen every day. How do you validate whether this is the right thing to do?

We founded Polaris Solutions because we were (and still are) very passionate about making the IT landscape a better place. We help our customers by providing advice, implementation, and knowledge transfer on the principles that help an IT team or organization be successful. As part of that process, we like to listen to our customers and truly understand how they get their work done before we set out to help with any change. We call that the "Assessment."

An ALM Assessment is meant to (1) understand the current situation, (2) identify any really good practices that can be built upon, and (3) find areas of investment that may result in increased maturity and success for the team(s). As part of the results, which vary based upon immediate and longer-term needs, a roadmap can be created consisting of modifications to process, tools, and practices/people that should improve things. Most importantly, managers should take those assessments and apply an ROI paradigm to them. Questions like "Which of these improvements should have a real positive ROI?" and "Which ones have the highest?" come to mind, and they should.

As an example, if it's recommended that the team adopt automated builds or deployments, how much will this improve the team? I'd recommend considering what the ROI on automated deployment might actually be. To get there, take a high-level view of what the ROI should be, and then, like good computer scientists, divide and conquer the problem.

  • ROI = (Gain from Investment – Cost of Investment) / Cost of Investment

Implementing automated build and deployment can be estimated and priced. The gain from doing so is harder to estimate because it represents the removal of the cost of doing things the "old way."

  • Gain = Cost of Manually Building Today + Cost of Outages from Manual Process
  • Gain = Cost of Single Manual Build * Frequency + Cost of Outages
  • Gain = Hourly Salary * Duration of Single Manual Build * Frequency + Lost Revenue * Probability of Outages

An over-simplified example of ROI for a single year:

  • ROI = (40K – 10K) / 10K = 3.0

If we spend 10K to remove 40K of costs to our company, this is a win. Should we do it first?

Maybe – it depends on where this stacks up against other initiatives within the IT organization. Perhaps better requirements analysis or superior unit testing might have larger gains for the organization. One major point here: it is very hard to treat this as an exact science, because the benefits of some of the practices mentioned in this article are tough to quantify. When doing an exercise like this, keep in mind that no matter what, this is an estimate, not an actual. So the first challenge is figuring out what your variables are, and then putting assumptions in for them. I find this is an opportunity for analysis paralysis, but when you don't know something, start making those assumptions and documenting them.
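As a sketch, here is the arithmetic above with assumptions plugged in. Every input number below is an invented placeholder chosen to reproduce the post's 40K/10K example; substitute and document your own.

```python
# ROI formula from the post: (gain - cost) / cost
def roi(gain, cost):
    return (gain - cost) / cost

# Hypothetical assumptions (document yours!):
hourly_salary = 60.0             # fully loaded $/hour of whoever builds manually
manual_build_hours = 2.0         # duration of one manual build + deployment
builds_per_year = 250            # frequency
lost_revenue_per_outage = 20_000.0
expected_outages = 0.5           # outages/year attributable to the manual process

# Gain = Hourly Salary * Duration * Frequency + Lost Revenue * Probability of Outages
gain = (hourly_salary * manual_build_hours * builds_per_year
        + lost_revenue_per_outage * expected_outages)   # 30,000 + 10,000 = 40,000
cost = 10_000.0                  # estimated cost to implement the automation

print(roi(gain, cost))           # -> 3.0
```

Once the variables are explicit like this, arguing about the assumptions (and refining them) is far more productive than arguing about the conclusion.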

This exercise, I believe, is an important one. Unfortunately, few think about it in practical terms. The phrases we all hear are poor substitutes for it: "It's faster," "It's newer," "It's more integrated." Those are good positioning statements, but ROI analysis in your organization should start with them, not end with them.

Speaking at the Visual Studio 2013 Launch in Chicago

As you might already know, Microsoft is increasing the cadence at which it releases Visual Studio, and this year Visual Studio 2013 is coming out. I am really excited to be presenting in Chicago with Jeff Fattic at the launch event for Visual Studio 2013. I will be talking about how to use Agile in the enterprise and make it scale for you. To register for the launch event, see the MS Events site: https://msevents.microsoft.com/CUI/EventDetail.aspx?EventID=1032569677&Culture=en-US&community=0. This will be on November 20th at the Drury Conference Center near Oak Brook.

Efficient Testing Tour – Lab Management Slides for STL and KC

I just finished my portion of the Efficient Testing Tour with Microsoft. I want to thank everyone who came out to see us talk over the past two days in St. Louis and Kansas City. Overall, we had great discussions in both places, and I wanted to make sure I shared my slides in case anyone wants to refer to them in the future. Here they are: