We'll see | Matt Zimmerman

a potpourri of mirth and madness

Posts Tagged ‘Web’

Where’s your data center?

Thanks to the tremendous growth of “social” applications over the past five years, we have our pick of services for collecting, saving and sharing our experiences online. We each have collections of photos, contacts, messages and more, spread across multiple popular services like Twitter, Facebook, LinkedIn, as well as many less popular services which address particular needs or preferences. We’re also producing a wealth of “exhaust data” through our browsing history, mobile sensors, transactions and other activity streams that we rarely if ever examine directly.

This ecosystem is becoming so complex that it’s easy to lose track of what you’ve created, shared or seen. We need new tools to manage this complexity, to make the most of the wealth of information and connections available to us through various services. John Battelle calls these “metaservices”, and points to growth in the number of connections between the services we use.

I expect that this next age of information tools will center around data rather than services. Data is a common denominator for these online experiences, a bridge across disparate services, technologies, social graphs, and life cycles. Personal data, in particular, has this property: the only thing that links together your photos on Flickr, Facebook, Picasa and Twitpic is…you.

So where’s your “data center”? I don’t anticipate the emergence of a single service where you do everything. There will continue to be innovation in the form of new and specialized services which meet a particular need very well. There won’t be a single service which is everything to everybody.

Instead, I foresee us wanting to track, save, use and control all of our “stuff” across the web. That’s why my new colleagues and I are working to make that possible.

There’s open source code available on github, a vibrant IRC channel (#lockerproject on Freenode), and lots more I’d like to write about it. But it’s time to get back to work for now…


Written by Matt Zimmerman

August 9, 2011 at 15:53

Why I’m excited about joining Singly

This summer, I’ll be taking a bit of time off, moving back to San Francisco and starting a new job. I can’t wait to get back to work. Here’s why.

Me and my data


I have a singular relationship with my data. I have a copy of every email I’ve sent since I first got an Internet email address in 1994 (82,000 messages and counting). I have even older files downloaded from BBSes, and passed between friends on floppy disks. Chat logs, text messages, voicemail…I hold onto them all. Anything which is relevant to me personally, I tend to save.

This must seem banal to people who are first getting online today. In the age of Gmail and Flickr, it’s easy to assume that all of your data will be preserved indefinitely, with little or no effort on your part. But for me, it has been hard work over the years, because I’ve done it myself. I’ve carried my data with me to countless new computers, operating systems, storage technologies, file formats and cities. Everywhere I’ve lived, I’ve brought it with me. Physically.

Really?

Why do I do this? Why have I gone to such trouble for a collection of bits? Especially now, why is most of my data still at home?

One pragmatic answer is that I can simply do more with my data when I have a copy. I can work with it using any software I want, including software that I write myself. I don’t have to worry about whether I can transfer it from one web service to another. I’m never stuck using yesterday’s services because my data is never trapped in them. My personal data is always available to me, always raw, ready and waiting for the next wave of software to come along. When it does, I can load my data into it and keep going. The fact that Facebook and Google disagree over sharing their users’ data doesn’t bother me in the least.

Another reason is that I want to be in control of it. I decide who to share my data with, and when. Some of it, I prefer not to share at all, with any person or company, and I have that choice. Even if a powerful government wants to access my data, I am afforded certain protection under the law, at least in the countries where I’ve lived. If I turned over my data to service providers, my choices and protection would likely be much more limited.

I have a deeper emotional attachment to my data as well. Enfolded within that vast pattern of bits is some part of my self. By sharing my personal data with other people, I show them something of who I am. Increasingly, my personal data is part of my identity. This is more than just a state of mind: it’s been shown that even our “non-identifying” personal data can reveal who we are.

In other words, it’s not just “my data”—it’s “me data”.

Singly

I’m joining Singly because I want to take this concept much further, and combine people, data and software into a different shape with people at the center.

Today, we are creating vastly greater amounts of personal data, and it’s stored in many more places. We leave our trail on the Internet in the form of activity streams, messages and content, spread across different web sites, each with their own inscrutable terms of service and (if we’re lucky) their own API. These disconnected silos prevent us from using all of this information effectively.

Meanwhile, we want—and need—to connect with each other in more ways than ever before. We need applications which can connect us, through our personal data, to the services we need.

Singly is building the technology to make this possible. It will be designed with the deepest respect for the relationship that we have with our personal data, and with a vision for truly personal computing.

Singly is…

  • A team of passionate people, dedicated to a vision for personal data
  • Building an open source data locker, which aggregates and stores your personal data from around the web and ensures that it’s always available to you
  • Enabling developers to create powerful distributed applications based on this data, without having to deal with the complexity of multiple web services APIs
  • Providing secure hosting services for personal data lockers
  • Hiring! We’re looking for people with deep experience in security and cryptography, cloud infrastructure and user experience, as well as software engineering generalists

This opportunity is a great fit for my interests and experience. Singly aims to be the commercial part of a vibrant open source community, and I’m looking forward to building on what I’ve learned in Debian, Canonical and Ubuntu to help make it a success.

I’ll have lots more to say about it as time goes on. Meanwhile, if you’re interested in following what we’re doing, here’s where:

Written by Matt Zimmerman

May 27, 2011 at 17:13

Three ways for Ubuntu to help developers

Developers are a crucial part of any successful software platform. In the same way that an operating system is “just” a means for people to use applications, a platform is “just” a means for developers to create applications and make them available to people.

There are three primary ways in which Ubuntu can help developers do their work. They are all related, but distinct, and so we should consider them individually:

1. Developing for Ubuntu

Today, Ubuntu bundles thousands of free software applications, for both clients and servers, most of which are packaged by Debian.

Ubuntu also carries certifications for a variety of third-party ISV software, both open source and proprietary, which are coordinated through Canonical’s partner program.

In both of these cases, many of these applications are actually developed on other platforms, and ported to Ubuntu, either by the free software community or by the creators of the software.

2. Developing on Ubuntu

Ubuntu is already quite popular among developers, who mainly run Desktop Edition on their workstations. They might be developing:

  • web applications (with server-side and browser components)
  • portable applications (e.g. using Java, or Adobe AIR)
  • mobile applications (e.g. for Android or iOS)
  • native applications, which might target Ubuntu Desktop Edition itself, or support multiple platforms through a framework like Qt

3. Distributing through Ubuntu

Like other modern operating systems, Ubuntu isn’t just a platform where applications run, but also a system for finding and installing applications. Starting with APT, which originated in Debian, we’ve added Software Center, the ISV partner repository, and various other capabilities in this area. They all help to connect developers with users, facilitating distribution of software to users, and feedback to developers.
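As one concrete illustration of that “finding and installing” role, here is a hedged sketch of driving APT programmatically with python-apt (assumed to be installed). The package name “hello” is only an example, and the script must run as root because it really installs the package:

#!/usr/bin/python

# A sketch only: assumes python-apt is available and must be run as root.

import apt

cache = apt.Cache()
pkg = cache["hello"]              # find the application in the archive
print pkg.name, "-", pkg.candidate.summary

if pkg.is_installed:
    print pkg.name, "is already installed"
else:
    pkg.mark_install()            # select it, along with its dependencies
    cache.commit()                # download and install through APT
    print "installed", pkg.name

The same cache and commit machinery is what Software Center and the command-line tools build on, which is why improvements here reach developers and users through every channel at once.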

So, where should we focus?

Some developers might be interested in all three of these, while others might only care about one or two.

However, most of the developer improvements we could make in Ubuntu would only address one of these areas.

For this reason, I think it’s important that we consider the question of the relative importance of these three developer scenarios. Given that we want Ubuntu to flourish as a platform, how would we prioritize them?

I have my own ideas, which I’ll write about in subsequent posts, but here’s your chance to tell me what you think. :-)

Written by Matt Zimmerman

November 26, 2010 at 11:32

Posted in Uncategorized


Embracing the Web

The web offers a compelling platform for developing modern applications. How can free software benefit more from web technology, and at the same time promote more software freedom on the web? What would the world be like if FLOSS web applications were as plentiful and successful as traditional FLOSS applications are today?

Web architecture

The web, as a collection of interlinked hypertext documents available on the Internet, has been well established for over a decade. However, the web as an application architecture is only just hitting its stride. With modern tools and frameworks, it’s relatively straightforward to build rich applications with browser-oriented frontends and HTTP-accessible backends.
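As a minimal sketch of that split, here is a browser-oriented frontend and an HTTP-accessible backend in one small program. It uses only the Python standard library; the paths and messages are invented for illustration:

#!/usr/bin/python

# A sketch of a trivial client/server web application: the frontend is
# HTML/JavaScript delivered to the browser, the backend is a JSON endpoint.

import json
from BaseHTTPServer import HTTPServer, BaseHTTPRequestHandler

PAGE = """<html><body>
<h1>Frontend</h1>
<div id="out">loading...</div>
<script>
  var xhr = new XMLHttpRequest();
  xhr.onload = function () {
    document.getElementById("out").textContent =
        JSON.parse(xhr.responseText).message;
  };
  xhr.open("GET", "/api/greeting");
  xhr.send();
</script>
</body></html>"""

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/api/greeting":
            # the backend: data over HTTP
            body, ctype = json.dumps({"message": "Hello from the backend"}), "application/json"
        else:
            # the frontend: markup and script for the browser
            body, ctype = PAGE, "text/html"
        self.send_response(200)
        self.send_header("Content-Type", ctype)
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    print "Serving on http://localhost:8000/"
    HTTPServer(("", 8000), Handler).serve_forever()

Point a browser at http://localhost:8000/ and the page fetches its greeting from the /api/greeting endpoint over HTTP—the whole client/server pattern in a few dozen lines.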

This architecture has its limitations, of course: browser compatibility nightmares, limited offline capabilities, network latency, performance challenges, server-side scalability, a complicated multimedia story, and so on. Most of these are slowly but surely being addressed or ameliorated as web technology improves.

However, for a large class of applications, these limitations are easily outweighed by the advantages: cross-platform support, instantaneous upgrades, global availability, etc. The web enables developers to reach the largest audience of users with the most compelling functionality, and simplifies users’ lives by giving them immediate access to their digital lives from anywhere.

Some web advocates would go so far as to say that if an application can be built for the web, it should be built for the web because it will be more successful. It’s no surprise that new web applications are being developed at a staggering rate, and I expect this trend to continue.

So what?

This trend represents a significant threat, and a corresponding opportunity, to free software. Relatively few web applications are free software, and relatively few free software applications are built for the web. Therefore, the momentum which is leading developers and users to the web is also leading them (further) away from free software.

Traditionally, pragmatists have adopted free software applications because they offered immediate gratification: it’s much faster and easier to install a free software application than to buy a proprietary one. The SaaS model of web applications offers the same (and better) immediacy, so free software has lost some of its appeal among pragmatists, who instead turn to proprietary web applications. Why install and run a heavyweight client application when you can just click a link?

Many web applications—perhaps even a majority—are built using free software, but are not themselves free. A new generation of developers share an appreciation for free software tools and frameworks, but see little value in sharing their own software. To these developers, free software is something you use, not something you make.

Free software cannot afford to ignore the web. Instead, we should embrace the web more completely, more powerfully, and more effectively than proprietary systems do.

What would that look like?

In my view, a FLOSS client platform which fully embraced the web would:

  • treat web applications as first-class citizens. The web would not be just another application, represented by a browser, but more like a native application runtime. Web applications could feel much more “native” while still preserving the advantages of a web-style user experience. There would be no web browser: that’s a tool for legacy systems to run web applications within a compatibility environment.
  • provide a seamless experience for developers to build web applications. It would be as fast and easy to develop a trivial client/server web application as it is to write “Hello, world!” in PyGTK using Quickly. For bonus points, it would be easy to develop and run web applications locally, and then deploy directly to a PaaS or IaaS cloud.
  • empower the user to manage their applications and data regardless of where they are hosted. Traditional operating systems act as a connecting fabric for local applications, providing a shared namespace, file store and IPC mechanisms, but web applications are lacking this. The web’s security model requires that applications are thoroughly sandboxed from each other, but a mediating operating system could connect them in meaningful ways, just as web browsers store cookies and passwords for various websites while protecting them from each other.

Imagine a world where free web applications are as plentiful and malleable as free native applications are today. Developers would be able to branch, test and submit patches to them.

What about Chrome OS?

Chrome OS is a step in the right direction, but doesn’t yet realize this vision. It’s a traditional operating system which is stripped down and focused on running one application (a web browser) very, very well. In some ways, it elevates web applications to first-class status, though its paradigm is still fundamentally that of a web browser.

It is not designed for development, but for consuming the web. Developers who want to create and deploy web applications must use a more traditional operating system to do so.

It does not put the end user in control. On the contrary, the user is almost entirely dependent on SaaS applications for all of their needs.

Although it is constructed using free software, it does not seem to deliver the principles or benefits of software freedom to the web itself.

How?

Just as free software was bootstrapped on proprietary UNIX, the present-day web is fertile ground for the development of free web applications. The web is based on open standards. There are already excellent web development tools, web application frameworks and server software which are FLOSS. Leading-edge web browsers like Firefox and Chrome/Chromium, where much web innovation is happening today, are already open source.

This is a huge head start toward a free web. I think what’s missing is a client platform which catalyzes the development and use of FLOSS web applications.

Written by Matt Zimmerman

July 26, 2010 at 10:43

We’ve packaged all of the free software…what now?

Today, virtually all of the free software available can be found in packaged form in distributions like Debian and Ubuntu. Users of these distributions have access to a library of thousands of applications, ranging from trivial to highly sophisticated software systems. Developers can find a vast array of programming languages, tools and libraries for constructing new applications.

This is possible because we have a mature system for turning free software components into standardized modules (packages). Some software is more difficult to package and maintain, and I’m occasionally surprised to find something very useful which isn’t packaged yet, but in general, the software I want is packaged and ready before I realize I need it. Even the “long tail” of niche software is generally packaged very effectively.

Thanks to coherent standards, sophisticated management tools, and the principles of software freedom, these packages can be mixed and matched to create complete software stacks for a wide range of devices, from netbooks to supercomputing clusters. These stacks are tightly integrated, and can be tested, released, maintained and upgraded as a unit. The Debian system is unparalleled for this purpose, which is why Ubuntu is based on it. The vision, for a free software operating system which is highly modular and customizable, has been achieved.

Rough edges

This is a momentous achievement, and the Debian packaging system fulfills its intended purpose very well. However, there are a number of areas where it introduces friction, because the package model doesn’t quite fit some new problems. Most of these are becoming more common over time as technology evolves and changes shape.

  • Embedded systems need to be pared down to the essentials to minimize storage, distribution, computation and maintenance costs. Standardized packaging introduces excessive code, data and interdependency which make the system larger than necessary. Tight integration makes it difficult to bootstrap the system from scratch for custom hardware. Projects like Embedded Debian aim to adapt the Debian system to be more suitable for use in these environments, to varying degrees of success. Meanwhile, smart phones will soon become the most common type of computer globally.
  • Data, in contrast to software, has simple requirements. It just needs to be up to date and accessible to programs. Packaging and distributing it through the standardized packaging process is awkward, doesn’t offer tangible benefits, and introduces overhead. There have been extensive debates in Debian about how to handle large data sets. Meanwhile, this problem is becoming increasingly important as data science catalyzes a new wave of applications.
  • Client/server and other types of distributed applications are notoriously tricky to package. The packaging system works within the context of a single OS instance, and so relationships which span multiple OS instances (e.g. a server application which depends on a database running on another server) are not straightforward. Meanwhile, the web has become a first-class application development platform, and this kind of interdependency is extremely common on both clients and servers.
  • Cross-platform applications such as Firefox, Chromium and OpenOffice.org have long struggled with packaging. In order to be portable, they tend to bundle the components they depend on, such as libraries. Packagers strive for normalization, and want these applications to use the packaged versions of these libraries instead. Application developers build, test and ship one set of dependencies, but their users receive a different stack when they use the packaged version of the application. Developers on both sides are in constant tension as they expect their configuration to be the canonical one, and want it to be tightly integrated. Cross-platform application developers want to provide their own, application-specific cross-platform update mechanism, while distributions want to use the same mechanism for all their components.
  • Virtual appliances aim to combine application and operating system into a portable bundle. While a modular OS is definitely called for, appliances face some of the same problems as embedded systems as they need to be minimized. Furthermore, the appliance becomes a component in itself, and requires metadata, distribution mechanisms and so on. If someone wants to “install” a virtual appliance, how should that work? Packaging them up as .debs doesn’t make much sense for the same reasons that apply to large data sets. I haven’t seen virtual appliances really taking off, but I expect cloud to change that.
  • Runtime libraries for languages such as Perl, Python and Ruby provide their own packaging systems, which manage dependencies and other metadata, installation, upgrades and removal in a standardized way. Because these operate independently of the OS package manager, all sorts of problems arise. Projects such as GoboLinux have attempted to tie them together, to varying degrees of success. Meanwhile, each new programming language we invent comes with a different, incompatible package manager, and distribution developers need to spend time repackaging them into their preferred format.

Why are we stuck?

I suppose it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail.
– Abraham Maslow

The packaging ecosystem is very strong. Not only do we have powerful tools for working with packages, we also benefit from packages being a well-understood concept, and having established processes for developing, exchanging and talking about them. Once something is packaged, we know what it is and how to work with it, and it “fits” into everything else. So, it is tempting to package everything in sight, as we already know how to make sense of packages. However, this may not always be the right tool for the job.

Various attempts have been made to extend the packaging concept to make it more general, for example:

  • Portage, of Gentoo fame, offers impressive flexibility by building packages with a custom configuration, tailored for the needs of the target system.
  • Conary, from rPath, offers finer-grained dependencies, powerful revision control and object-oriented build recipes.
  • Nix provides a consistent build and runtime environment, ensuring that programs are run with the same dependencies used to build them, by keeping the relevant versions installed. I don’t know much about it, but it sounds like all dependencies implicitly refer to an exact version.

Other package managers aim to solve a specific problem, such as providing lightweight package management for embedded systems, or lazy dependency installation, or fixing the filesystem hierarchy. There is a long list of package managers, operating at various levels, which solve different problems.

Most of these systems suffer from an important fundamental tradeoff: they are designed to manage the entire system, from the kernel through applications, and so they must be used wholesale in order to reap their full benefit. In other words, in their world, everything is a package, and anything which is not a package is out of scope. Therefore, each of these systems requires a separate collection of packages, and each time we invent a new one, its adherents set about packaging everything in the new format. It takes a very long time to do this, and most of them lose momentum before a mature ecosystem can form around them.

This lock-in effect makes it difficult for new packaging technologies to succeed.

Divide and Conquer

No single package management framework is flexible enough to accommodate all of the needs we have today. Even more importantly, a generic solution won’t account for the needs we will have tomorrow. I propose that in order to move forward, we must make it possible to solve packaging problems separately, rather than attempting to solve them all within a single system.

  • Decouple applications from the platform. Debian packaging is an excellent solution for managing the network of highly interdependent components which make up the core of a modern Linux distribution. It falls short, however, for managing the needs of modern applications: fast-moving, cross-platform and client/server (especially web). Let’s stop trying to fit these square pegs into round holes, and adopt a different solution for this space, preferably one which is comprehensible and useful to application developers so that they can do most of the work.
  • Treat data as a service. It’s no longer useful to package up documentation in order to provide local copies of it on every Linux system. The web is a much, much richer and more effective solution to that problem. The same principle is increasingly applicable to structured data. From documents and contacts to anti-virus signatures and PCI IDs, there’s much better data to be had “out there” on the web than “down here” on the local filesystem. A small sketch of this idea follows this list.
  • Simplify integration between packaging systems in order to enable a heterogeneous model. When we break the assumption that everything is a package, we will need new tools to manage the interfaces between different types of components. Applications will need to introspect their dependency chain, and system management tools will need to be able to interrogate applications. We’ll need thoughtfully designed interfaces which provide an appropriate level of abstraction while offering sufficient flexibility to solve many different packaging problems. There is unarguably a cost to this heterogeneity, but I believe it would easily outweigh the shortcomings of our current model.
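To make the “data as a service” point concrete, here is a minimal sketch that prefers a fresh copy of the PCI ID database from the web and falls back to the locally packaged file. The upstream URL and the local path are assumptions for illustration, not a specification:

#!/usr/bin/python

# Data as a service, sketched: fetch live data when possible, fall back to
# the packaged copy when offline. URL and path are illustrative assumptions.

import urllib2

PCI_IDS_URL = "https://pci-ids.ucw.cz/v2.2/pci.ids"   # assumed upstream location
LOCAL_PCI_IDS = "/usr/share/misc/pci.ids"             # assumed packaged fallback

def load_pci_ids():
    """Return the pci.ids text, preferring the live copy on the web."""
    try:
        return urllib2.urlopen(PCI_IDS_URL, timeout=10).read()
    except Exception, e:
        print "falling back to local copy:", e
        return open(LOCAL_PCI_IDS).read()

if __name__ == "__main__":
    data = load_pci_ids()
    print "loaded %d lines of PCI ID data" % len(data.splitlines())

The application always gets the freshest data available, and the packaged copy becomes a cache rather than the source of truth.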

But I like things how they are!

We don’t have a choice. The world is changing around us, and distributions need to evolve with it. If we don’t adapt, we will eventually give way to systems which do solve these problems.

Take, for example, modern web browsers like Firefox and Chromium. Arguably the most vital application for users, the browser is coming under increasing pressure to keep up with the breakneck pace of innovation on the web. The next wave of real-time collaboration and multimedia applications relies on the rapid development of new capabilities in web browsers. Browser makers are responding by accelerating deployment in the field: both aggressively push new releases to their users. A report from Google found that Chrome upgrades 97% of their users within 21 days of a new release, and Firefox 85% (both impressive numbers). Mozilla recently changed their maintenance policies, discontinuing maintenance of stable releases and forcing Ubuntu to ship new upstream releases to users.

These applications are just the leading edge of the curve, and the pressure will only increase. Equally powerful trends are pressing server applications, embedded systems, and data to adapt as well. The ideas I’ve presented here are only one possible way forward, and I’m sure there are more and better ideas brewing in distribution communities. I’m sure that I’m not the only one thinking about these problems.

Whatever it looks like in the end, I have no doubt that change is ahead.

Written by Matt Zimmerman

July 6, 2010 at 15:31

Ubuntu 10.10 (Maverick) Developer Summit

I spent last week at the Ubuntu Developer Summit in Belgium, where we kicked off the 10.10 development cycle.

Due to our time-boxed release cycle, not everything discussed here will necessarily appear in Ubuntu 10.10, but this should provide a reasonable overview of the direction we’re taking.

Presentations

While most of our time at UDS is spent in small group meetings to discuss specific topics, there are also a handful of presentations to convey key information and stimulate creative thinking.

A few of the more memorable ones for me were:

  • Mark Shuttleworth talked about the desktop, in particular the introduction of the new Unity shell for Ubuntu Netbook Edition
  • Fanny Chevalier presented Diffamation, a tool for visualizing and navigating the history of a document in a very flexible and intuitive way
  • Rick Spencer talked about the development process for 10.10 and some key changes in it, including a greater focus on meeting deadlines for freezes (and making fewer exceptions)
  • Stefano Zacchiroli, the current Debian project leader, gave an overview of how Ubuntu and Debian developers are working together today, and how this could be improved. He has posted a summary on the debian-project mailing list.

The talks were all recorded, though they may not all be online yet.

Foundations

The Foundations team provides essential infrastructure, tools, packages and processes which are central to the development of all Ubuntu products. They make it possible for the desktop and server teams to focus on their areas of expertise, building on a common base system and development procedures.

Highlights from their track:

Desktop

The desktop team manages both Desktop Edition and Netbook Edition, on a mission to provide a top-notch experience to users across a range of client computing devices.

Highlights from their track:

Server/Cloud

The server team is charging ahead with making Ubuntu the premier server OS for cloud computing environments.

Highlights from their track:

ARM

Kiko Reis gave a talk introducing ARM and the corresponding opportunity for Ubuntu. The ARM team ran a full track during the week on all aspects of their work, from the technical details of the kernel and toolchain, to the assembly of a complete port of Netbook Edition 10.10 for several ARM platforms.

Kernel

The kernel team provided essential input and support for the above efforts, and also held their own track where they selected 2.6.35 as their target version, agreed on a variety of changes to the Ubuntu kernel configuration, and created a plan for providing backports of newer kernels to LTS releases to support deployment on newer hardware.

Security

Like the kernel team, the security team provided valuable input into the technical plans being developed by other teams, and also organized a security track to tackle some key security topics such as clarifying the duration of maintenance for various collections of packages, and the ongoing development of AppArmor and Ubuntu’s AppArmor profiles.

QA

The QA team focuses on testing, test automation and bug management throughout the project. While quality is everyone’s responsibility, the QA team helps to coordinate these activities across different teams, establish common standards, and maintain shared infrastructure and tools.

Highlights from their track include:

Design

The design team organized a track at UDS for the first time this cycle, and team manager Ivanka Majic gave a presentation to help explain its purpose and scope.

Toward the end of the week, I joined in a round table discussion about some of the challenges faced by the team in engaging with the Ubuntu community and building support for their work. This is a landmark effort in mating professional design with a free software community, and there is still much to learn about how to do this well.

Community

The community track discussed the usual line-up of events, outreach and advocacy programs, organizational tools, and governance housekeeping for the 10.10 cycle, as well as goals for improving the translation of Ubuntu and related resources into many languages.

One notable project is an initiative to aggressively review patches submitted to the bug tracker, to show our appreciation for these contributions by processing them more quickly and completely.

Written by Matt Zimmerman

May 17, 2010 at 12:01

A farewell to Facebook

Actually, on second thought, this isn’t a farewell. There is no goodwill on either side of this bitter parting. Facebook doesn’t care about my privacy or consent, and I don’t care about their shady and anti-competitive profiteering. It’s over between us.

So, here’s what I did:

  1. made a list of all of my connections on Facebook (via copy and paste from a browser window)
  2. confirmed that I have alternate contact information for all of them, either by email or other social media
  3. deleted all of my photo albums
  4. deleted all of the information from all sections of my profile
  5. removed all applications
  6. deleted everything under “My Links”
  7. deleted all of my notes (this was tedious, as my blog was syndicated there)
  8. followed the WikiHow instructions to delete my account

Obviously, I can’t really control whether they hold onto my data in spite of this. This should provide a clear statement of intent on my part, though, if it comes up in future shenanigans.

Written by Matt Zimmerman

May 16, 2010 at 12:52

Posted in Uncategorized


On Cloud

If you’re convinced that “cloud” is a useless buzzword, being used to describe everything under the sun, old and new, then…well, you’re right—that is, except for the “useless” part. It’s true that this word is being (ab)used in many different products, services and technologies, which do not seem to relate to each other in any concrete way. “Cloud” in the abstract is being defined in many different ways, based on different fundamental characteristics of “cloud technology”. Nonetheless, there is something genuinely important going on here, and this is my view of what it’s about.

The hype

Countless businesses which only had “web sites” or “web applications” in 2008 now call them “cloud services”. They aren’t delivering any new benefits to their users, and they haven’t redesigned their infrastructure. What do they mean by this?

Cloud is defined, some say, by where your data is kept, and “cloud” means storing your data on remote servers instead of internal ones. Millions of Gmail and Flickr users have been “in the cloud” without even knowing it, and so has everyone who reads their mail over IMAP, or uses voicemail. In short, cloud is just data that is somewhere else.


Last September, I watched a presentation on “Cloud computing” where Symantec pitched their anti-virus product as having something like three different “cloud computing” technologies. These, it turned out, all involved downloading files over the Internet. In this view, cloud is essentially about the network, derived from our habit of using a drawing of a cloud in our network diagrams for so many years. Wikipedia currently shares this view, as its Cloud computing article opens with “Cloud computing is Internet-based computing”. No Internet, no cloud.

Maybe cloud is just SaaS, and any service provided on a pay-per-use basis over the Internet is a “cloud” service. It would seem that cloud is about paying for things, especially on a “utility” basis. The cloud is computing by the hour; it’s something you buy.

Others maintain that “cloud” is just virtualization, so if you’re using KVM or VMWare, you’ve got a cloud already. In this view, cloud is about operating systems, and whether they’re running on real or virtual hardware. Cloud means deploying applications as virtual appliances, and creating new servers entirely in software. If your servers only have one OS on them, then they’re not cloud-ready.

If this vagueness and contradiction irritates you, then you’re not alone. It bugs me, and a lot of other people who are getting on with developing, using and providing technology and services. Cloud is not just a new name for these familiar technologies. In fact, it isn’t a technology at all. I don’t think it even makes sense to categorize technologies as “cloud” and “non-cloud”, though some may be more “cloudy” than others.

Nonetheless, there is some meaning and validity in each of these interpretations of cloud. So what is it?

A different perspective


Cloud is a transition, a trend, a paradigm shift. It isn’t something which exists or not: it is happening, and we won’t understand its essential nature until it’s over. Simon Wardley explains this, in presentation format, much better than I could in this blog post, so if you haven’t already, I suggest that you take 14 minutes of your time and go and watch him do his thing.

This change doesn’t have a clear beginning or an end. Early on, it was a disconnected set of ideas without any identity. We’re now somewhere in the middle of the bell curve, with enough insight to give it a name, and sufficient momentum to guess at what happens next. In the end, it will be absorbed into “the way things are”.

What’s going on?

So, what is actually changing? The key trends I see are:

  1. People and organizations are becoming more comfortable relying on resources which do not exist in any particular place. Rather than storing precious metals under our beds, we entrust our finances to banks, which store our data and provide us with services which let us “use” our “money”. In computing terms, we’re no longer enamored with keeping all of “our” programs and “our” data on “our” hard drive: as it turns out, it’s not very safe there after all, and in some ways we can actually exercise more control over programs and data “out there” than “in here”. We can no longer hoard our gold, or drive off would-be thieves with a pistol, but that wasn’t a productive way to spend our energies anyway. While we can’t put our hands on our money, we can know exactly what’s happening with it from minute to minute, and use it in a variety of ways at a moment’s notice.
  2. IT products are ceding ground to services. In many cases, we no longer need to build software, or even buy it: we can simply use it. As computing needs become better understood, and can be met by commodity products, it becomes more feasible to develop services to meet them on-demand. This is why we see a spike in acronyms ending in “aaS”. Simon explains the progression in detail in his talk, linked above.
  3. Hardware, software and data are being reorganized according to new models, in order to provide these services effectively. Architectural patterns, such as “hardware/OS/framework/application” are giving way to new ones, like “hardware/OS/IaaS/OS/PaaS/SaaS/Internet/web browser/OS/hardware”. These are still evolving, and the lines between them are blurry. It’s a bit early to say what the dominant patterns will be, but system, network, data and application architecture are all being transformed. No single technology or architecture defines cloud, but virtualization (at the infrastructure level) and the web (at the application level) both seem to resonate strongly with “cloudy” design patterns.

These trends are reinforcing and accelerating each other, driving information technologies and businesses in a common direction. That, in a nutshell, is what cloud is all about.

So what?

This transformation is disruptive in many different ways, but the angles which most interest me at the moment are:

  • Operating systems – Cloud seems to indicate further commoditization of operating systems. We will have more operating systems than ever before (thanks to virtualization and IaaS), but we probably won’t think about them as much (as in software appliances). I think that the cloud world will want operating systems which are free, standard and highly customizable, which is potentially a great opportunity for Ubuntu and other open systems.
  • Software freedom – As Eben Moglen and others have pointed out, this trend has significant consequences for the free software movement. As the shape of software changes, our principles of freedom must evolve as well. What does it mean to have the freedom to “run” a program in the cloud? To copy it? To change it? What other protections might people need in order to exercise these freedoms in the future?
  • DevOps – I see a strong resonance between cloud—which is blurring the lines between hardware and software, infrastructure and applications, network and computer—and DevOps, which is bringing together the people who currently work in these different categories. Cloud means that we can no longer afford to treat these elements separately, and must work together to find better approaches to developing and deploying software, knocking down barriers and mapping new territory. Fortunately, cloud also means developing powerful new tools which will power this revolution.
  • Clients – A majority of cloud related activity seems to be focused on what happens to back-end servers, but what about the computers we actually touch, on our desks, and in our pockets? How will they change? Will we end up reverting to a highly centralized computing model, where clients are strictly limited to front-end user interface processing (e.g. a web browser)? Will clients “join” the cloud and be providers of computing resources, not only consumers? What new types of devices will we need in order to make the most of cloud?

I’d like to explore some of these topics in future posts.

Written by Matt Zimmerman

April 22, 2010 at 14:00

Interviewed by Ubuntu Turkey

As part of my recent Istanbul visit, I was interviewed by Ubuntu Turkey. They’ve now published the interview in Turkish. With their permission, the original English interview has been published on Ubuntu User by Amber Graner.

Written by Matt Zimmerman

April 19, 2010 at 15:15

Introducing the jonometer

a learning experiment using Python, Twitter, CouchDB and desktopcouch

For a while now, I’ve been wanting to do a programming project using CouchDB and third-party web service APIs. I started out with an application to sync Launchpad bug data into CouchDB so that I could analyze it locally, a bit like Bug Hugger. It quickly got too complex for my spare time, and stalled. I’d still like to pick it up someday when I can devote more time to it.

More recently, I had noticed that Jono seemed to be having a rocking good time, sending a lot of awesome tweets about jams. This was only conjecture, though, and I needed hard data to quantify just how strong these influences were.

Now, this was a project I could get done in an evening of hacking and learning.

Version One

First, I threw together this quick proof of concept to learn the Twitter API and get some tantalizing preliminary data. Behold version 1.0 of the jonometer:

#!/usr/bin/python

# python-twitter

import sys
import twitter
import re

username = 'jonobacon'
updates_wanted = 100
patterns = ['rock', 'awesome', 'jam']

class Counter:
    """A simple accumulator which counts matches of a regex"""

    def __init__(self, pattern):
        self.pattern = pattern
        self.regex = re.compile(pattern, re.I)
        self.count = 0

    def update(self, s):
        """Increment count if the string s matches the pattern"""
        if self.regex.search(s):
            self.count += 1

def main():
    client = twitter.Api()
    counters = map(Counter, patterns)
    updates_found = 0
    for update in client.GetUserTimeline(username, updates_wanted):
        updates_found += 1
        for counter in counters:
            counter.update(update.GetText())

    for counter in counters:
        print counter.pattern, counter.count

if __name__ == '__main__':
    sys.exit(main())

The output looked like this:

rock 5
awesome 6
jam 10

In other words, about 5% of Jono’s recent tweets were rocking, another 6% were awesome, and a whopping 10% were jamming! I was definitely onto something, but I had to find out more.

One of the shortcomings of this quick prototype is that it would download the data from Twitter every time I ran it. This meant that it was fairly slow (about 2 seconds for 100 tweets), which is inconvenient for experimenting with different patterns, and that I wouldn’t want to try it with larger data sets (say, thousands of tweets, or multiple people).

Version Two

Enter CouchDB, the darling of the NoSQL crowd: fast, scalable and simple, it was just what I wanted for the next version of the jonometer. I replaced the Counter objects with a single Database, which stores all of the tweets in CouchDB. This was incredibly simple to do, because python-twitter provides an .AsDict() method which returns a tweet as a dictionary object, and CouchDB can store this type of data structure directly into the database. Easy!

Each time the jonometer is run, it downloads all of the new tweets since the previous run. In order to do this, it needs to keep track of the most recent tweet ID it has seen, so that it can pick up where it left off. I had originally planned to store a record in the database with the sync state, but after Stuart reminded me that Gwibber does much the same thing, I followed its example and instead calculated it using a view. Each row in the “maxid” view records the highest tweet ID seen for a particular user:

The maxid view

  Key          Value
  jonobacon    10743678774

…so although the jonometer is currently Jono-specific, it could be extended easily.

For the core functionality, I created a view called “matches” to count how many tweets match each pattern. For each key (username and pattern), there is a row in this view which records how many tweets from that user matched that pattern:

The matches view

  Key                        Value
  ["jonobacon", null]        100
  ["jonobacon", "Awesome"]   6
  ["jonobacon", "Jam"]       10
  ["jonobacon", "Rock"]      5

The null pattern is used to keep a count of the total number of tweets for that user.

Once the data is loaded, the runtime for the CouchDB version is only about 0.3 seconds, including the Python interpreter startup as well as checking Twitter to see if there are new tweets. I doubled the size of the database to 200 tweets (which was about all Twitter would give me in one batch), and this didn’t change measurably. If I’ve done all of this right, it should scale easily up to thousands of tweets. Awesome!

Adding or changing a pattern currently requires manually deleting the view so that it can be re-created. There is probably an established pattern for dealing with this, but I don’t know what it is yet.

Here’s the code for version 2:

#!/usr/bin/python

# python-twitter
# python-desktopcouch

import sys
import twitter
import re
from desktopcouch.records.server import CouchDatabase
from desktopcouch.records.record import Record

username = 'jonobacon'
# title string : JavaScript regex
patterns = { 'Rock' : 'rock',
        'Awesome' : 'awesome',
        'Jam' : 'jam' }

class Database(CouchDatabase):
    design_doc = "jonometer"
    database_name = "jonometer"

    def __init__(self, patterns):
        """patterns is a dictionary of (title string, JavaScript regex)"""

        CouchDatabase.__init__(self, self.database_name, create=True)
        self.patterns = patterns.copy()

        # set up maxid view
        if not self.view_exists("maxid", self.design_doc):
            mapfn = '''function(doc) { emit(doc.user.screen_name, doc.id); }'''
            viewfn = '''function(key, values, rereduce) {
    return Math.max.apply(Math, values);
}'''
            self.add_view("maxid", mapfn, viewfn, self.design_doc)

        # set up a view to count occurrences of each pattern
        if not self.view_exists("matches", self.design_doc):

            mapfn = '''
function(doc) {
    emit([doc.user.screen_name, null], 1);

    var pattern = null;
    var pattern_name = null;
'''

            mapfn += ''.join(['''   
    pattern = "%s";
    pattern_name = "%s";
    if (new RegExp(pattern, "i").exec(doc.text)) {
        emit([doc.user.screen_name, pattern_name], 1);
    }
    ''' % (pattern, pattern_name)
       for pattern_name, pattern in self.patterns.items()])

            mapfn += '}'

            viewfn = '''function(key, values, rereduce) { return sum(values); }'''
            self.add_view("matches", mapfn, viewfn, self.design_doc)

    def maxid(self, username):
        """Return the highest known tweet ID for the specified user"""

        view = self.execute_view("maxid", self.design_doc)
        result = view[username].rows
        if len(result) > 0:
            return result[0].value
        return None

    def count_matches(self, username, pattern_name=None):
        """Return the number of tweets from username which match 
        the specified pattern.

        If no pattern is specified, count all tweets."""

        assert pattern_name is None or pattern_name in self.patterns
        view = self.execute_view("matches", self.design_doc)
        result = view[[username, pattern_name]].rows
        if len(result) > 0:
            return result[0].value

def main():
    client = twitter.Api()
    db = Database(patterns)

    maxid = db.maxid(username)
    if maxid:
        timeline = client.GetUserTimeline(username, since_id=maxid)
    else:
        timeline = client.GetUserTimeline(username, count=100)

    for tweet in timeline:
        print "new:", tweet.GetText()
        record = Record(tweet.AsDict(),
                record_type='https://mdzlog.alcor.net/tag/jonometer')
        record_id = db.put_record(record)

    for pattern in patterns:
        print pattern, db.count_matches(username, pattern)
    print "total", db.count_matches(username)

if __name__ == '__main__':
    sys.exit(main())

Written by Matt Zimmerman

March 19, 2010 at 23:41