
QCon London 2010: Day 1

For the first time in several years, I had the chance to attend a software conference in the city where I live. I’ve benefited from many InfoQ articles in the past couple of years, and watched recordings of some excellent talks from previous QCon events, so I jumped at the opportunity to attend QCon London 2010. It is being held in the Queen Elizabeth II Conference Centre, conveniently located a short walk from Canonical’s London office.

Whenever I attend conferences, I can’t help taking note of which operating systems are in use, and this tells me something about the audience. I was surprised to notice that in addition to the expected Mac and Windows presence, there was a substantial Ubuntu contingent and some Fedora as well.

Today’s tracks included two of particular interest to me at the moment: “Dev and Ops: A single team” and the unfortunately gendered “Software Craftsmanship”.

Jason Gorman: Beyond Masters and Apprentices

A Scalable, Peer-led Model For Building Good Habits In Large & Diverse Development Teams

Jason explained the method he uses to coach software developers.
I got a bad seat on the left side of the auditorium, where it was hard to see the slides because they were blocked by the lectern, so I may have missed a few points.

He began by outlining some of the primary factors which make software more difficult to change over time:

  • Readability: developers spend a lot of their time trying to understand code that they (or someone else) have written
  • Complexity: as well as making code more difficult to understand, complexity increases the chance of errors. More complex code can fail in more ways.
  • Duplication: when code is duplicated, it’s more difficult to change because we need to keep track of the copies and often change them all
  • Dependencies and the “ripple effect”: highly interdependent code is more difficult to change, because a change in one place requires corresponding changes elsewhere
  • Regression Test Assurance: I didn’t quite follow how this fit into the list, to be honest. Regression tests are supposed to make it easier to change the code, because errors can be caught more easily.

He then outlined the fundamental principles of his method:

  • Focus on Learning over Teaching – a motivated learner will find their own way, so focus on enabling them to pull the lesson rather than pushing it to them (“there is a big difference between knowing how to do something and being able to do it”)
  • Focus on Ability over Knowledge – learn by doing, and evaluate progress through practice as well (“how do you know when a juggler can juggle?”)

…and went on to outline the process from start to finish:

  1. Orientation, where peers agree on good habits related to the subject being learned. The goal seemed to be to draw out knowledge from the group, allowing them to define their own school of thought with regard to how the work should be done. In other words, learn to do what they know, rather than trying to inject knowledge.
  2. Practice programming, trying to exercise these habits and learn “the right way to do it”
  3. Evaluation through peer review, where team members pair up and observe each other. Over the course of 40-60 hours, they watch each other program and check off where they are observed practicing the habits.
  4. Assessment, where learners practice a time-boxed programming exercise, which is recorded. The focus is on methodical correctness, not speed of progress. Observers watch the recording (which only displays the code), and note instances where the habit was not practiced. The assessment is passed only if fewer than three errors are noticed.
  5. Recognition, which comes through a certificate issued by the coach, but also through admission to a networking group on LinkedIn, promoting peer recognition

Jason noted that this method of assessing was good practice in itself, helping learners to practice pairing and observation in a rigorous way.

After the principal coach coaches a pilot group, the pilot group goes on to coach others while studying the next stage of material.

To conclude, Jason gave us a live demo of the assessment technique, by launching Eclipse and writing a simple class using TDD live on the projector. The audience were provided with worksheets containing a list of the habits to observe, and instructed to note instances where he did not practice them.

Julian Simpson: Siloes are for farmers

Production deployments using all your team

After a brief introduction to the problems targeted by the devops approach, Julian offered some advice on how to do it right.

He began with the people issues, reminding us of Weinberg’s second law, which is “no matter what they tell you, it’s always a people problem”.

His people tips:

  • In keeping with a recent trend, he criticized email as a severely flawed communication medium, best avoided
  • Respect everyone
  • Have lunch with people on the other side of the wall
  • Discuss your problems with other groups (don’t just ask for a specific solution)
  • Invite everyone to stand-ups and retrospectives
  • Co-locate the sysadmins and developers (Thomas Allen)

Next, a few process suggestions:

  • Avoid code ownership generally (or rather, promote joint/collective ownership)
  • Pair developers with sysadmins
  • It’s done when the code is in production (I would rephrase as: it’s not done until the code is in production)

and then tools:

  • Teach your sysadmins to use version control
  • Help your developers write performant code
  • Help developers with managing their dev environment
  • Run your deploy scripts via continuous integration (leading toward continuous deployment)
  • Use Puppet or Chef (useful as a form of documentation as well as deployment tools, and on developer workstations as well as servers)
  • Integrate monitoring and continuous integration (test monitoring in the development environment)
  • Deliver code as OS packages (e.g. RPM, DEB)
  • Separate binaries and configuration (see the sketch after this list)
  • Harden systems immediately and enable logging for tuning security configuration (i.e. configure developer workstations with real security, making the development environment closer to production)
  • Give developers access to production logs and data
  • Re-create the developer environment often (to clear out accumulated cruft)
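
To make the “separate binaries and configuration” point concrete, here is a minimal sketch of what that separation can look like: defaults ship with the code, while a system-level file owned by the configuration management tool overrides them. The path and names are hypothetical, and Python is my choice of illustration language, not something from the talk.

    # Ship sensible defaults with the code, but let the system-level file,
    # owned by Puppet/Chef rather than the package, override them.
    # The path and section name below are hypothetical.
    import configparser
    from pathlib import Path

    DEFAULTS = {"db_host": "localhost", "log_level": "info"}
    SYSTEM_CONF = Path("/etc/myapp/myapp.conf")

    def load_config():
        config = dict(DEFAULTS)
        if SYSTEM_CONF.exists():
            parser = configparser.ConfigParser()
            parser.read(SYSTEM_CONF)
            if parser.has_section("myapp"):
                config.update(parser["myapp"])
        return config

    print(load_config())

With this split, the same package can be promoted unchanged from development to production, and the environment-specific settings live where the configuration management tool can document and enforce them.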

I agreed with a lot of what was said, objected to some, and lacked clarity on a few points. I think this kind of material is well suited to a multi-way BOF style discussion rather than a presentation format, and would have liked more opportunity for discussion.

Lars George and Fabrizio Schmidt: Social networks and the Richness of Data

Getting distributed web services done with NoSQL

Lars and Fabrizio described the general “social network problem”, and how they went about solving it. This problem space involves the processing, aggregation and dissemination of notifications for a very high volume of events, as commonly seen in social networking websites such as Facebook and Twitter, which connect people to each other to share updates. Apparently simple functionality, such as displaying the most recent updates from one’s “friends”, quickly becomes complex at scale.

As an example of the magnitude of the problem, they explained that they process 18 million events per day, and that in the course of storing and sharing these across the social graph, some operations peak as high as 150,000 per second. (18 million events per day is only around 200 per second on average; the peaks come from the fan-out involved in sharing each event across the graph.) Such large and rapidly changing data sets represent a serious scaling challenge.

They originally built a monolithic, synchronous system called Phoenix, based on:

  • LAMP frontends: Apache+PHP+APC (500 of them)
  • Sharded MySQL multi-master databases (150 of them)
  • memcache nodes with 1TB+ (60 of them)

They then added asynchronous services alongside this to handle things like Twitter and mobile devices, using Java (Tomcat) and RabbitMQ. The web frontend would send out AMQP messages, which would then be picked up by the asynchronous services, which would (where applicable) communicate back to Phoenix through an HTTP API call.
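
To make that flow concrete, here is a minimal sketch of the publish/consume pattern in Python using the pika AMQP client; the exchange and queue names are placeholders of mine, not details from the talk.

    import json
    import pika

    # Producer side (the web frontend): publish an event and move on.
    connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
    channel = connection.channel()
    channel.exchange_declare(exchange="site-events", exchange_type="fanout")
    channel.basic_publish(
        exchange="site-events",
        routing_key="",
        body=json.dumps({"type": "status_update", "user_id": 42}),
    )

    # Consumer side (an asynchronous service, e.g. a mobile notifier):
    channel.queue_declare(queue="mobile-notifier", durable=True)
    channel.queue_bind(exchange="site-events", queue="mobile-notifier")

    def handle(ch, method, properties, body):
        event = json.loads(body)
        # ... notify the device, or call back into Phoenix over HTTP ...
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(queue="mobile-notifier", on_message_callback=handle)
    channel.start_consuming()

The key property is that the frontend never waits on the slow work; it fires a message and returns, and the asynchronous services absorb the latency.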

When the time came to re-architect their activity stream, they identified the following requirements:

  • endless scalability
  • storage- and cloud-independent
  • fast
  • flexible and extensible data model

This led them to an architecture based on:

  • Nginx + Janitor
  • Embedded Jetty + RESTeasy
  • NoSQL storage backends (no fewer than three: Redis, Voldemort and Hazelcast)

They described this architecture in depth. The things which stood out for me were:

  • They used different update strategies (push vs. pull) depending on the level of fan-out for the node (i.e. number of “friends”); see the first sketch after this list
  • They implemented a time-based activity filter which recorded a global timeline, from minutes out to days. Rather than traversing all of the user’s “friends” looking for events, they just scan the most recent events to see if their friends appear there.
  • They created a distributed, scalable concurrent ID generator based on Hazelcast, which uses distributed locking to assign ranges to nodes, so that nodes can then quickly (locally) assign individual IDs; see the second sketch after this list
  • It’s interesting how many of the off-the-shelf components had native scaling, replication, and sharding features. This sort of thing is effectively standard equipment now.
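
The first two points above combine naturally into a hybrid strategy: push events to followers when fan-out is small, and let readers of high-fanout accounts pull from the recent global timeline. Here is a rough sketch of that idea in Python, with invented thresholds and plain in-memory structures standing in for their Redis/Voldemort stores:

    from collections import deque

    FANOUT_THRESHOLD = 1000   # invented cutoff between "push" and "pull" users
    RECENT_WINDOW = 100_000   # how many events the global timeline keeps

    inboxes = {}              # user_id -> events pushed to that user
    follower_counts = {}      # author_id -> follower count at publish time
    recent_events = deque(maxlen=RECENT_WINDOW)  # global timeline, newest last

    def publish(author_id, followers, event):
        """Push to each follower only when fan-out is small; otherwise
        rely on readers pulling from the global timeline."""
        follower_counts[author_id] = len(followers)
        recent_events.append((author_id, event))
        if len(followers) <= FANOUT_THRESHOLD:
            for user_id in followers:
                inboxes.setdefault(user_id, []).append(event)

    def timeline(user_id, friends):
        """Low-fanout friends' events are already in the inbox; high-fanout
        friends are covered by scanning the recent global timeline rather
        than traversing every friend's history."""
        pulled = [event for author, event in recent_events
                  if author in friends
                  and follower_counts.get(author, 0) > FANOUT_THRESHOLD]
        return inboxes.get(user_id, []) + pulled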
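
The ID generator in the third point is essentially a block-allocation (hi-lo) scheme: pay for coordination once per range, then assign IDs locally. A minimal sketch follows, with an ordinary threading.Lock standing in for the Hazelcast distributed lock they used:

    import threading

    BLOCK_SIZE = 1000  # invented; IDs handed to a node per reservation

    class RangeCoordinator:
        """Stand-in for the shared counter the talk put behind a Hazelcast
        distributed lock; here an ordinary threading.Lock plays that role."""
        def __init__(self):
            self._lock = threading.Lock()
            self._next = 0

        def reserve_block(self, size):
            with self._lock:  # in production: a distributed lock
                start = self._next
                self._next += size
                return start

    class NodeIdGenerator:
        """Each node reserves a whole range once, then assigns individual
        IDs locally and cheaply, without further coordination."""
        def __init__(self, coordinator):
            self._coordinator = coordinator
            self._current = self._end = 0
            self._local = threading.Lock()

        def next_id(self):
            with self._local:
                if self._current == self._end:  # range exhausted: reserve more
                    self._current = self._coordinator.reserve_block(BLOCK_SIZE)
                    self._end = self._current + BLOCK_SIZE
                self._current += 1
                return self._current - 1

    node = NodeIdGenerator(RangeCoordinator())
    print([node.next_id() for _ in range(3)])  # [0, 1, 2]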

Their list of lessons learned:

  • Start benchmarking and profiling your app early
  • A fast and easy deployment keeps motivation high
  • Configure Voldemort carefully (especially on large heap machines)
  • Read the mailing lists of the NoSQL system you use
  • No solution in docs? – read the sources
  • At some point stop discussing and just do it

Andres Kitt: Building Skype

Learnings from almost five years as a Skype Architect

Andres began with an overview of Skype, which serves about 800,000 registered users per employee (521 million users vs. 650 employees). Their core team is based in Estonia. Their main functionality is peer-to-peer, but they do need substantial server infrastructure (PHP, C, C++, PostgreSQL) for things like peer-to-peer supporting glue, e-commerce and SIP integration. Skype uses PostgreSQL heavily in some interesting ways, in a complex multi-tiered architecture of databases and proxies.

His first lesson was that technical rules of thumb can lead us astray. It is always tempting to use patterns that have worked for us previously, in a different project, team or company, but they may not be right for another context. They can and should be used as a starting point for discussion, but not presumed to be the solution.

Second, he emphasized the importance of paying attention to functional architecture, not only technical architecture. As an example, he showed how the Skype web store, which sells only four products (SkypeIn, SkypeOut, voicemail, and subscription bundles of the previous three), became incredibly complex because no one was responsible for its functional architecture. Complex functional architecture leads to complex technical architecture, which is undesirable, as he noted in his next point.

Keep it simple: minimize functionality, and minimize complexity. He gave an example of how their queuing system’s performance and scalability were greatly enhanced by removing functionality (the guarantee to deliver messages exactly once), which enabled the simplification of the system.
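
He didn’t show code, but the trade-off is a familiar one: exactly-once delivery requires expensive coordination, whereas at-least-once delivery plus idempotent consumers is much simpler. A minimal sketch of the consumer side of that trade, with a made-up message shape (this illustrates the general principle, not Skype’s actual system):

    processed_ids = set()  # in production: a persistent store keyed by message ID

    def apply_update(payload):
        print("applying:", payload)  # stand-in for the real work

    def handle(message):
        """With at-least-once delivery, redeliveries will happen; the
        consumer makes them harmless by remembering seen message IDs."""
        if message["id"] in processed_ids:
            return  # duplicate redelivery: safe to ignore
        apply_update(message["payload"])
        processed_ids.add(message["id"])

    handle({"id": 1, "payload": "hello"})
    handle({"id": 1, "payload": "hello"})  # redelivered; processed only once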

He also shared some organizational learnings, which I appreciated. Maybe my filters are playing tricks on me, but it seems as if more and more discussion of software engineering is focusing on organizing people. I interpret this as a sign of growing maturity in the industry, which (as Andres noted) has its roots in a somewhat asocial culture.

He noted that architecture needs to fit your organization. Designs need to be measured primarily by how well they solve business problems, rather than by beauty or elegance.

He stressed the importance of communication, a term which I think is becoming so overused and diluted in organizations that it is not very useful. It’s used to refer to everything from roles and responsibilities, to personal relationships, to cultural norming, and more. In the case of Skype, what Andres learned was the importance of organizing and empowering people to facilitate alignment, information flow and understanding between different parts of the business. Skype evolved an architecture team which interfaces between (multiple) business units and (multiple) engineering teams, helping each to understand the other and taking responsibility for the overall system design.

Conclusion

Overall, I thought the day’s talks gave me new insight into how Internet applications are being developed and deployed in the real world today. They affirmed some of what I’ve been wondering about, and gave me some new things to think about as well. I’m looking forward to tomorrow.


Written by Matt Zimmerman

March 10, 2010 at 17:44

Ubuntu 10.04 LTS: How we get there

The development of Ubuntu 10.04 has been underway for nearly two months now, and will produce our third long-term support (LTS) release in April. Rick Spencer, desktop engineering manager, summarized what’s ahead for the desktop team, and a similar update will be coming soon from Jos Boumans, our new engineering manager for the server team.

What I want to talk about, though, is not the individual projects we’re working on. I want to explain how the whole thing comes together, and what’s happening behind the scenes to make 10.04 LTS different from other Ubuntu releases.

Changing the focus

Robbie Williamson, engineering manager for the foundations team, has captured the big picture in the LTS release plan, the key elements of which are:

Merge from Debian testing

By merging from Debian testing, rather than the usual unstable, we aim to avoid regressions early in the release cycle which tend to block development work. So far, Lucid has been surprisingly usable in its first weeks, compared to previous Ubuntu releases.

Add fewer features

By starting fewer development projects, and opting for more testing projects over feature projects, we will free more time and energy for stabilization. This approach will help us to discover regressions earlier, and to fix them earlier as well. This doesn’t mean that Ubuntu 10.04 won’t have bugs (with hundreds of millions of lines of source code, there is no such thing as a bug-free system), but we believe it will help us to produce a system which is suitable for longer-term use by more risk-averse users.

Avoid major infrastructure changes

We will bring in less bleeding-edge code from upstream than usual, preferring to stay with more mature components. Where a major transition is brewing upstream, we will probably opt to defer it to the next Ubuntu cycle. While this might delay some new functionality slightly, we believe the additional stability is well worth it for an LTS release.

Extend beta testing

With less breakage early in the cycle, we plan to enter beta early. Traditionally, the beta period is when we receive the most user feedback, so we want to make the most of it. We’ll deliver a usable, beta-quality system substantially earlier than in 9.10, and our more adventurous users will be able to upgrade at that point with a reasonable expectation of stability.

Freeze with Debian

With Debian “squeeze” expected to freeze in March, Ubuntu and Debian will be stabilizing on similar timelines. This means that Debian and Ubuntu developers will be attacking the same bugs at the same time, creating more opportunities to join forces.

Staying on course

In addition, we’re rolling out some new tools and techniques to track our development work, which were pioneered by the desktop team in Ubuntu 9.10. We believe this will help us to stay on course, and to make adjustments earlier when needed. Taking some pages from the Agile software development playbook, we’ll be planning in smaller increments and tracking our progress using burn-down charts. As always, we aim to make Ubuntu development as transparent as possible, so all of this information is posted publicly where everyone can see how we’re doing.
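
A burn-down chart is simply remaining work plotted against time, compared with an ideal straight line that reaches zero at the deadline. A minimal sketch of the bookkeeping, with invented numbers:

    def ideal_line(total_points, num_days):
        """Straight line from the full backlog down to zero at the deadline."""
        step = total_points / (num_days - 1)
        return [total_points - step * day for day in range(num_days)]

    # Invented example: 40 points of work over a 5-day iteration, with the
    # team's actual remaining estimate recorded at the end of each day.
    actual_remaining = [40, 38, 30, 22, 12]
    for day, (ideal, actual) in enumerate(zip(ideal_line(40, 5), actual_remaining)):
        status = "behind" if actual > ideal else "on track"
        print(f"day {day}: ideal {ideal:5.1f}, actual {actual:3d} ({status})")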

Delivering for users

By making these changes, we aim to deliver the balance of stability and features that our users expect from an Ubuntu LTS release. In particular, we want folks to feel confident deploying Ubuntu 10.04 in settings where it will be actively maintained for a period of years.

Written by Matt Zimmerman

December 23, 2009 at 09:00

Amplify Your Effectiveness (AYE) Conference: Day 1

Morning Session: Project Patterns

I chose to attend a session entitled “Is this the Way We Really Want to do Things? Seeing Project Patterns and Changing What You Don’t Like” (Johanna Rothman). My goal was to explore the causes of the troublesome patterns I see in projects at work. In particular, I see:

  • Too many projects starting up at once
  • Projects being instantiated without enough consideration for the probability of success (“is this a good idea?” rather than “can we realistically achieve this?”)
  • Key people finding out too late about projects which affect them

All of these patterns lead to increased project risk, communication bottlenecks, low motivation, and high stress.

In the session, we conducted a simulation of a team, with engineers, a project manager, a senior manager and a customer. I took on the role of the senior manager.

In the course of the simulation, we received requirements from the customer, implemented them, and delivered products. While the team was working on implementation, I talked with the customer about what was coming next: what would happen when we delivered, what the next project would be, and so on. Part of the simulation was that I had to be separated from the group while they were working.

When we delivered the first batch of products, and the customer was happy with them, it was time to decide what to do next. We gave the customer a choice of two projects we had discussed, one of which was similar to the previous one (but larger scale and more involved), while the other was different. Despite repeated attempts, we could not persuade the customer to prioritize one over the other.

So, I decided that we should change gears and start work on the “different” project. It seemed to be of greater economic value to the customer, and simpler to execute. One of the engineers disagreed with this decision, but didn’t explain why. The project manager seemed to agree, and I left the team to work. They produced a prototype, which the customer liked, and with a few small changes it was accepted as a finished product.

To my surprise, though, I found out later that the team was in fact working on both projects at once, delivering two different types of products. The decision hadn’t actually been made. These unexpected products were delivered to the customer, but didn’t meet the expanded requirements, and that work was wasted.

The debrief which followed was unfortunately too short, and I didn’t feel that we were able to fully explore what the simulation had revealed. The project manager indicated that he hadn’t understood the decision to have been made, pointing to a communication failure.

This reminded me that while we often think of a decision as an event which happens at a point in time, it is more commonly a process of change, which takes time and must be revisited in order to check progress and evaluate. A decision is really just an idea for a change, and there is more work to be done in order to implement that idea. This can be true even when there is a very explicit “decision point”: it still takes time for that message to be received, interpreted and accepted.

One of the tangents we followed during the debrief had to do with how humans think about numbers. Jerry asked each member of the group to write down a random number, and then we wrote them all on a flipchart. They were: 8, 75, 47, 72, 45, 44, 32, 6, 13 and 47. This reminded me of the analyses of election results which indicate fraud: numbers that people make up follow predictable patterns, quite unlike truly random draws, and that difference is what statistical fraud tests look for.

Afternoon Session: Saying No

After lunch, I decided to attend Jerry Weinberg’s session, Saying No That Really Means No. This was much larger than the morning session, with over 40 people sitting in a large circle.

The subject of discussion was the variety of difficulties that people face in saying “no” to things which don’t seem right for them. For example, saying “yes” to a project which is doomed to failure. This seemed like a good follow-on to the morning’s exercise.

Jerry began by asking the audience to name some of their difficulties, and tell stories of times when they had trouble saying “no”. One of these stories was role-played and analyzed as an example. Most of the time, though, was filled with storytelling and discussion.

This is a deeply complex topic, because this problem is rooted in self-image, social norms, egocentrism, misperception, and other cognitive phenomena. There was no key insight for me, just a reinforcement of the necessity of self-awareness. The only way to avoid patterns like this is to notice when they are happening, and that can be challenging, especially in a stressful situation.

Once you realize what’s happening, there are all sorts of tools which can be applied to the problem: negotiation, requests for help, problem-solving, even simple inaction.

Written by Matt Zimmerman

November 10, 2009 at 04:34