Posts Tagged ‘Web’
Where’s your data center?
Thanks to the tremendous growth of “social” applications over the past five years, we have our pick of services for collecting, saving and sharing our experiences online. We each have collections of photos, contacts, messages and more, spread across multiple popular services like Twitter, Facebook and LinkedIn, as well as many less popular services which address particular needs or preferences. We’re also producing a wealth of “exhaust data” through our browsing history, mobile sensors, transactions and other activity streams that we rarely if ever examine directly.
This ecosystem is becoming so complex that it’s easy to lose track of what you’ve created, shared or seen. We need new tools to manage this complexity, to make the most of the wealth of information and connections available to us through various services. John Battelle calls these “metaservices”, and points to growth in the number of connections between the services we use.
I expect that this next age of information tools will center around data rather than services. Data is a common denominator for these online experiences, a bridge across disparate services, technologies, social graphs, and life cycles. Personal data, in particular, has this property: the only thing that links together your photos on Flickr, Facebook, Picasa and Twitpic is…you.
So where’s your “data center”? I don’t anticipate the emergence of a single service where you do everything. There will continue to be innovation in the form of new and specialized services which meet a particular need very well. There won’t be a single service which is everything to everybody.
Instead, I foresee us wanting to track, save, use and control all of our “stuff” across the web. That’s why my new colleagues and I are working to make that possible.
There’s open source code available on GitHub, a vibrant IRC channel (#lockerproject on Freenode), and lots more I’d like to write about it. But it’s time to get back to work for now…
Three ways for Ubuntu to help developers
Developers are a crucial part of any successful software platform. In the same way that an operating system is “just” a means for people to use applications, a platform is “just” a means for developers to create applications and make them available to people.
There are three primary ways in which Ubuntu can help developers do their work. They are all related, but distinct, and so we should consider them individually:
1. Developing for Ubuntu
Today, Ubuntu bundles thousands of free software applications, for both clients and servers, most of which are packaged by Debian.
Ubuntu also carries certifications for a variety of third-party ISV software, both open source and proprietary, which are coordinated through Canonical’s partner program.
In both of these cases, many of these applications are actually developed on other platforms, and ported to Ubuntu, either by the free software community or by the creators of the software.
2. Developing on Ubuntu
Ubuntu is already quite popular among developers, who mainly run Desktop Edition on their workstations. They might be developing:
- web applications (with server-side and browser components)
- portable applications (e.g. using Java, or Adobe AIR)
- mobile applications (e.g. for Android or iOS)
- native applications, which might target Ubuntu Desktop Edition itself, or support multiple platforms through a framework like Qt
3. Distributing through Ubuntu
Like other modern operating systems, Ubuntu isn’t just a platform where applications run, but also a system for finding and installing applications. Starting with APT, which originated in Debian, we’ve added Software Center, the ISV partner repository, and various other capabilities in this area. They all help to connect developers with users, facilitating distribution of software to users, and feedback to developers.
So, where should we focus?
Some developers might be interested in all three of these, while others might only care about one or two.
However, most of the developer improvements we could make in Ubuntu would only address one of these areas.
For this reason, I think it’s important that we consider the question of the relative importance of these three developer scenarios. Given that we want Ubuntu to flourish as a platform, how would we prioritize them?
I have my own ideas, which I’ll write about in subsequent posts, but here’s your chance to tell me what you think. :-)
Embracing the Web
The web offers a compelling platform for developing modern applications. How can free software benefit more from web technology, and at the same time promote more software freedom on the web? What would the world be like if FLOSS web applications were as plentiful and successful as traditional FLOSS applications are today?
Web architecture
The web, as a collection of interlinked hypertext documents available on the Internet, has been well established for over a decade. However, the web as an application architecture is only just hitting its stride. With modern tools and frameworks, it’s relatively straightforward to build rich applications with browser-oriented frontends and HTTP-accessible backends.
This architecture has its limitations, of course: browser compatibility nightmares, limited offline capabilities, network latency, performance challenges, server-side scalability, complicated multimedia story, and so on. Most of these are slowly but surely being addressed or ameliorated as web technology improves.
However, for a large class of applications, these limitations are easily outweighed by the advantages: cross-platform support, instantaneous upgrades, global availability, etc. The web enables developers to reach the largest audience of users with the most compelling functionality, and simplifies users’ lives by giving them immediate access to their digital lives from anywhere.
Some web advocates would go so far as to say that if an application can be built for the web, it should be built for the web because it will be more successful. It’s no surprise that new web applications are being developed at a staggering rate, and I expect this trend to continue.
So what?
This trend represents a significant threat, and a corresponding opportunity, to free software. Relatively few web applications are free software, and relatively few free software applications are built for the web. Therefore, the momentum which is leading developers and users to the web is also leading them (further) away from free software.
Traditionally, pragmatists have adopted free software applications because they offered immediate gratification: it’s much faster and easier to install a free software application than to buy a proprietary one. The SaaS model of web applications offers the same (and better) immediacy, so free software has lost some of its appeal among pragmatists, who instead turn to proprietary web applications. Why install and run a heavyweight client application when you can just click a link?
Many web applications—perhaps even a majority—are built using free software, but are not themselves free. A new generation of developers share an appreciation for free software tools and frameworks, but see little value in sharing their own software. To these developers, free software is something you use, not something you make.
Free software cannot afford to ignore the web. Instead, we should embrace the web more completely, more powerfully, and more effectively than proprietary systems do.
What would that look like?
In my view, a FLOSS client platform which fully embraced the web would:
- treat web applications as first-class citizens. The web would not be just another application, represented by a browser, but more like a native application runtime. Web applications could feel much more “native” while still preserving the advantages of a web-style user experience. There would be no web browser: that’s a tool for legacy systems to run web applications within a compatibility environment.
- provide a seamless experience for developers to build web applications. It would be as fast and easy to develop a trivial client/server web application as it is to write “Hello, world!” in PyGTK using Quickly (see the sketch after this list). For bonus points, it would be easy to develop and run web applications locally, and then deploy directly to a PaaS or IaaS cloud.
- empower the user to manage their applications and data regardless of where they are hosted. Traditional operating systems act as a connecting fabric for local applications, providing a shared namespace, file store and IPC mechanisms, but web applications are lacking this. The web’s security model requires that applications are thoroughly sandboxed from each other, but a mediating operating system could connect them in meaningful ways, just as web browsers store cookies and passwords for various websites while protecting them from each other.
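To give a rough idea of what “trivial” means here, the sketch below is a minimal client/server “Hello, world!” web application using only the Python standard library. The module, port number and page content are arbitrary illustrative choices, not a prescription for how such a platform would actually work:

#!/usr/bin/python
# A minimal client/server web application: an HTTP backend serving a tiny
# HTML frontend.  The port number and page content are placeholders.
import BaseHTTPServer

class HelloHandler(BaseHTTPServer.BaseHTTPRequestHandler):
    def do_GET(self):
        # Respond to every GET request with a one-line HTML page
        self.send_response(200)
        self.send_header('Content-Type', 'text/html')
        self.end_headers()
        self.wfile.write('<h1>Hello, world!</h1>')

if __name__ == '__main__':
    BaseHTTPServer.HTTPServer(('', 8000), HelloHandler).serve_forever()

Run it and point a browser at http://localhost:8000/ to see the page. The point is that a web-embracing platform should make the real-world equivalent of this as effortless as Quickly makes a PyGTK “Hello, world!”.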
Imagine a world where free web applications are as plentiful and malleable as free native applications are today. Developers would be able to branch, test and submit patches to them.
What about Chrome OS?
Chrome OS is a step in the right direction, but doesn’t yet realize this vision. It’s a traditional operating system which is stripped down and focused on running one application (a web browser) very, very well. In some ways, it elevates web applications to first-class status, though its paradigm is still fundamentally that of a web browser.
It is not designed for development, but for consuming the web. Developers who want to create and deploy web applications must use a more traditional operating system to do so.
It does not put the end user in control. On the contrary, the user is almost entirely dependent on SaaS applications for all of their needs.
Although it is constructed using free software, it does not seem to deliver the principles or benefits of software freedom to the web itself.
How?
Just as free software was bootstrapped on proprietary UNIX, the present-day web is fertile ground for the development of free web applications. The web is based on open standards. There are already excellent web development tools, web application frameworks and server software which are FLOSS. Leading-edge web browsers like Firefox and Chrome/Chromium, where much web innovation is happening today, are already open source.
This is a huge head start toward a free web. I think what’s missing is a client platform which catalyzes the development and use of FLOSS web applications.
We’ve packaged all of the free software…what now?
Today, virtually all of the free software available can be found in packaged form in distributions like Debian and Ubuntu. Users of these distributions have access to a library of thousands of applications, ranging from trivial to highly sophisticated software systems. Developers can find a vast array of programming languages, tools and libraries for constructing new applications.
This is possible because we have a mature system for turning free software components into standardized modules (packages). Some software is more difficult to package and maintain, and I’m occasionally surprised to find something very useful which isn’t packaged yet, but in general, the software I want is packaged and ready before I realize I need it. Even the “long tail” of niche software is generally packaged very effectively.
Thanks to coherent standards, sophisticated management tools, and the principles of software freedom, these packages can be mixed and matched to create complete software stacks for a wide range of devices, from netbooks to supercomputing clusters. These stacks are tightly integrated, and can be tested, released, maintained and upgraded as a unit. The Debian system is unparalleled for this purpose, which is why Ubuntu is based on it. The vision, for a free software operating system which is highly modular and customizable, has been achieved.
Rough edges
This is a momentous achievement, and the Debian packaging system fulfills its intended purpose very well. However, there are a number of areas where it introduces friction, because the package model doesn’t quite fit some new problems. Most of these are becoming more common over time as technology evolves and changes shape.
- Embedded systems need to be pared down to the essentials to minimize storage, distribution, computation and maintenance costs. Standardized packaging introduces excessive code, data and interdependency which make the system larger than necessary. Tight integration makes it difficult to bootstrap the system from scratch for custom hardware. Projects like Embedded Debian aim to adapt the Debian system to be more suitable for use in these environments, to varying degrees of success. Meanwhile, smart phones will soon become the most common type of computer globally.
- Data, in contrast to software, has simple requirements. It just needs to be up to date and accessible to programs. Packaging and distributing it through the standardized packaging process is awkward, doesn’t offer tangible benefits, and introduces overhead. There have been extensive debates in Debian about how to handle large data sets. Meanwhile, this problem is becoming increasingly important as data science catalyzes a new wave of applications.
- Client/server and other types of distributed applications are notoriously tricky to package. The packaging system works within the context of a single OS instance, and so relationships which span multiple OS instances (e.g. a server application which depends on a database running on another server) are not straightforward. Meanwhile, the web has become a first-class application development platform, and this kind of interdependency is extremely common on both clients and servers.
- Cross-platform applications such as Firefox, Chromium and OpenOffice.org have long struggled with packaging. In order to be portable, they tend to bundle the components they depend on, such as libraries. Packagers strive for normalization, and want these applications to use the packaged versions of these libraries instead. Application developers build, test and ship one set of dependencies, but their users receive a different stack when they use the packaged version of the application. Developers on both sides are in constant tension as they expect their configuration to be the canonical one, and want it to be tightly integrated. Cross-platform application developers want to provide their own, application-specific cross-platform update mechanism, while distributions want to use the same mechanism for all their components.
- Virtual appliances aim to combine application and operating system into a portable bundle. While a modular OS is definitely called for, appliances face some of the same problems as embedded systems, since they need to be minimized. Furthermore, the appliance becomes a component in itself, and requires metadata, distribution mechanisms and so on. If someone wants to “install” a virtual appliance, how should that work? Packaging them up as .debs doesn’t make much sense for the same reasons that apply to large data sets. I haven’t seen virtual appliances really taking off, but I expect cloud to change that.
- Runtime libraries for languages such as Perl, Python and Ruby provide their own packaging systems, which manage dependencies and other metadata, installation, upgrades and removal in a standardized way. Because these operate independently of the OS package manager, all sorts of problems arise. Projects such as GoboLinux have attempted to tie them together, to varying degrees of success. Meanwhile, each new programming language we invent comes with a different, incompatible package manager, and distribution developers need to spend time repackaging them into their preferred format.
Why are we stuck?
I suppose it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail.
– Abraham Maslow
The packaging ecosystem is very strong. Not only do we have powerful tools for working with packages, we also benefit from packages being a well-understood concept, and having established processes for developing, exchanging and talking about them. Once something is packaged, we know what it is and how to work with it, and it “fits” into everything else. So, it is tempting to package everything in sight, as we already know how to make sense of packages. However, this may not always be the right tool for the job.
Various attempts have been made to extend the packaging concept to make it more general, for example:
- Portage, of Gentoo fame, offers impressive flexibility by building packages with a custom configuration, tailored for the needs of the target system.
- Conary, from rPath, offers finer-grained dependencies, powerful revision control and object-oriented build recipes.
- Nix provides a consistent build and runtime environment, ensuring that programs are run with the same dependencies used to build them, by keeping the relevant versions installed. I don’t know much about it, but it sounds like all dependencies implicitly refer to an exact version.
Other package managers aim to solve a specific problem, such as providing lightweight package management for embedded systems, or lazy dependency installation, or fixing the filesystem hierarchy. There is a long list of package managers of various levels which solve different problems.
Most of these systems suffer from an important fundamental tradeoff: they are designed to manage the entire system, from the kernel through applications, and so they must be used wholesale in order to reap their full benefit. In other words, in their world, everything is a package, and anything which is not a package is out of scope. Therefore, each of these systems requires a separate collection of packages, and each time we invent a new one, its adherents set about packaging everything in the new format. It takes a very long time to do this, and most of them lose momentum before a mature ecosystem can form around them.
This lock-in effect makes it difficult for new packaging technologies to succeed.
Divide and Conquer
No single package management framework is flexible enough to accommodate all of the needs we have today. Even more importantly, a generic solution won’t account for the needs we will have tomorrow. I propose that in order to move forward, we must make it possible to solve packaging problems separately, rather than attempting to solve them all within a single system.
- Decouple applications from the platform. Debian packaging is an excellent solution for managing the network of highly interdependent components which make up the core of a modern Linux distribution. It falls short, however, for managing the needs of modern applications: fast-moving, cross-platform and client/server (especially web). Let’s stop trying to fit these square pegs into round holes, and adopt a different solution for this space, preferably one which is comprehensible and useful to application developers so that they can do most of the work.
- Treat data as a service. It’s no longer useful to package up documentation in order to provide local copies of it on every Linux system. The web is a much, much richer and more effective solution to that problem. The same principle is increasingly applicable to structured data. From documents and contacts to anti-virus signatures and PCI IDs, there’s much better data to be had “out there” on the web than “down here” on the local filesystem.
- Simplify integration between packaging systems in order to enable a heterogeneous model. When we break the assumption that everything is a package, we will need new tools to manage the interfaces between different types of components. Applications will need to introspect their dependency chain, and system management tools will need to be able to interrogate applications. We’ll need thoughtfully designed interfaces which provide an appropriate level of abstraction while offering sufficient flexibility to solve many different packaging problems. There is unarguably a cost to this heterogeneity, but I believe it would easily outweigh the shortcomings of our current model.
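To illustrate, here is a rough, hypothetical sketch of what “introspecting a dependency chain” could look like from the application side, using pkg_resources (part of setuptools) to walk the requirements of an installed Python distribution. The package name is only an example, and this covers just one packaging system, which is exactly the kind of boundary a heterogeneous model would have to bridge:

#!/usr/bin/python
# Sketch: an application inspecting its own Python-level dependency chain.
# 'python-twitter' is only an example; any installed distribution would do.
import pkg_resources

def walk_requirements(name, depth=0, seen=None):
    """Recursively print an installed distribution and its requirements."""
    if seen is None:
        seen = set()
    if name.lower() in seen:
        return
    seen.add(name.lower())
    dist = pkg_resources.get_distribution(name)
    print '  ' * depth + '%s %s' % (dist.project_name, dist.version)
    for req in dist.requires():
        walk_requirements(req.project_name, depth + 1, seen)

if __name__ == '__main__':
    walk_requirements('python-twitter')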
But I like things how they are!
We don’t have a choice. The world is changing around us, and distributions need to evolve with it. If we don’t adapt, we will eventually give way to systems which do solve these problems.
Take, for example, modern web browsers like Firefox and Chromium. Arguably the most vital application for users, the browser is coming under increasing pressure to keep up with the breakneck pace of innovation on the web. The next wave of real-time collaboration and multimedia applications relies on the rapid development of new capabilities in web browsers. Browser makers are responding by accelerating deployment in the field: both aggressively push new releases to their users. A report from Google found that Chrome upgrades 97% of their users within 21 days of a new release, and Firefox 85% (both impressive numbers). Mozilla recently changed their maintenance policies, discontinuing maintenance of stable releases and forcing Ubuntu to ship new upstream releases to users.
These applications are just the leading edge of the curve, and the pressure will only increase. Equally powerful trends are pressing server applications, embedded systems, and data to adapt as well. The ideas I’ve presented here are only one possible way forward, and I’m sure there are more and better ideas brewing in distribution communities. I’m sure that I’m not the only one thinking about these problems.
Whatever it looks like in the end, I have no doubt that change is ahead.
Ubuntu 10.10 (Maverick) Developer Summit
I spent last week at the Ubuntu Developer Summit in Belgium, where we kicked off the 10.10 development cycle.
Due to our time-boxed release cycle, not everything discussed here will necessarily appear in Ubuntu 10.10, but this should provide a reasonable overview of the direction we’re taking.
Presentations
While most of our time at UDS is spent in small group meetings to discuss specific topics, there are also a handful of presentations to convey key information and stimulate creative thinking.
A few of the more memorable ones for me were:
- Mark Shuttleworth talked about the desktop, in particular the introduction of the new Unity shell for Ubuntu Netbook Edition
- Fanny Chevalier presented Diffamation, a tool for visualizing and navigating the history of a document in a very flexible and intuitive way
- Rick Spencer talked about the development process for 10.10 and some key changes in it, including a greater focus on meeting deadlines for freezes (and making fewer exceptions)
- Stefano Zacchiroli, the current Debian project leader, gave an overview of how Ubuntu and Debian developers are working together today, and how this could be improved. He has posted a summary on the debian-project mailing list.
The talks were all recorded, though they may not all be online yet.
Foundations
The Foundations team provides essential infrastructure, tools, packages and processes which are central to the development of all Ubuntu products. They make it possible for the desktop and server teams to focus on their areas of expertise, building on a common base system and development procedures.
Highlights from their track:
- Early on in the week, they conducted a retrospective to discuss how things went during the 10.04 cycle and how we can improve in the future
- One of their major projects has been providing revision control for all of Ubuntu’s source code, and they talked last week about what’s next
- We’re aiming to provide btrfs as an install-time option in 10.10
- In order to keep Ubuntu moving forward, the foundations team is always on the lookout for stale bits which we don’t need to keep around anymore. At UDS, they discussed culling unattended packages, retiring the IA64 and SPARC ports and other spring cleaning
- There was a lot of discussion about Upstart, including its further development, implications for servers, desktops and the kernel, and the migration of server init scripts to upstart jobs
- After maintaining two separate x86 boot loaders for years, it looks like we may be ready to replace isolinux with GRUB2 on Ubuntu CDs
Desktop
The desktop team manages both Desktop Edition and Netbook Edition, on a mission to provide a top-notch experience to users across a range of client computing devices.
Highlights from their track:
- A key theme for 10.10 is to help developers to create applications for Ubuntu, by providing a native development environment, improving Quickly, improving desktopcouch, making it easier to get started with desktopcouch, and enabling developers to deliver new applications to Ubuntu users continuously
- With more and more touch screen devices appearing, Ubuntu will grow some new features to support touch oriented applications
- The web browser is a staple application for Ubuntu, and as such we are always striving for the best experience for our users. The team is looking ahead to Chromium, using apport to improve browser bug reports, and providing a web-oriented document capability via Zoho
- Building on work done in 10.04, we will aim to make simple things simple for basic photo editing
- Security-conscious users may rest easier knowing that the X window system will run without root privileges where kernel modesetting is supported
Server/Cloud
The server team is charging ahead with making Ubuntu the premier server OS for cloud computing environments.
Highlights from their track:
- Providing more powerful tools for managing Ubuntu in EC2 and Ubuntu Enterprise Cloud infrastructure, including boot-time configuration, image and instance management, and kernel upgrades
- Improving Ubuntu Enterprise Cloud by adding new Eucalyptus features (such as LXC container support, monitoring, rapid provisioning, and load balancing). If you ever wanted to run a UEC demo from a USB stick, that’s possible too.
- Providing packaged solutions for cloud building blocks such as Hadoop and Pig, Drupal, ehcache, Spring, various NoSQL databases, web frameworks, and more
- Providing turn-key solutions for free software applications like Alfresco and Kolab
- Making Puppet easier to deploy, easier to configure, and easier to scale in the cloud
ARM
Kiko Reis gave a talk introducing ARM and the corresponding opportunity for Ubuntu. The ARM team ran a full track during the week on all aspects of their work, from the technical details of the kernel and toolchain, to the assembly of a complete port of Netbook Edition 10.10 for several ARM platforms.
Kernel
The kernel team provided essential input and support for the above efforts, and also held their own track where they selected 2.6.35 as their target version, agreed on a variety of changes to the Ubuntu kernel configuration, and created a plan for providing backports of newer kernels to LTS releases to support deployment on newer hardware.
Security
Like the kernel team, the security team provided valuable input into the technical plans being developed by other teams, and also organized a security track to tackle some key security topics such as clarifying the duration of maintenance for various collections of packages, and the ongoing development of AppArmor and Ubuntu’s AppArmor profiles.
QA
The QA team focuses on testing, test automation and bug management throughout the project. While quality is everyone’s responsibility, the QA team helps to coordinate these activities across different teams, establish common standards, and maintain shared infrastructure and tools.
Highlights from their track include:
- There was a strong sense of broadening and deepening our testing efforts, mobilizing testers for specific testing projects, streamlining the ISO testing process by engaging Ubuntu derivatives and fine-tuning ISO test cases, and reactivating the community-based laptop testing program
- In support of this effort, there will be projects to improve test infrastructure, including enabling tests to target specific hardware and tracking test results in Launchpad
- There is a continuous effort to improve high-volume processing of bug reports, and two focus areas for this cycle will be tracking regressions (as these are among the most painful bugs for users) and improving our response to kernel bugs (as the kernel faces some special challenges in bug management)
Design
The design team organized a track at UDS for the first time this cycle, and team manager Ivanka Majic gave a presentation to help explain its purpose and scope.
Toward the end of the week, I joined in a round table discussion about some of the challenges faced by the team in engaging with the Ubuntu community and building support for their work. This is a landmark effort in mating professional design with the free software community, and there is still much to learn about how to do this well.
Community
The community track discussed the usual line-up of events, outreach and advocacy programs, organizational tools, and governance housekeeping for the 10.10 cycle, as well as goals for improving the translation of Ubuntu and related resources into many languages.
One notable project is an initiative to aggressively review patches submitted to the bug tracker, to show our appreciation for these contributions by processing them more quickly and completely.
On Cloud
If you’re convinced that “cloud” is a useless buzzword, being used to describe everything under the sun, old and new, then…well, you’re right—that is, except for the “useless” part. It’s true that this word is being (ab)used to describe many different products, services and technologies, which do not seem to relate to each other in any concrete way. “Cloud” in the abstract is being defined in many different ways, based on different fundamental characteristics of “cloud technology”. Nonetheless, there is something genuinely important going on here, and this is my view of what it’s about.
The hype
Countless businesses which only had “web sites” or “web applications” in 2008 now call them “cloud services”. They aren’t delivering any new benefits to their users, and they haven’t redesigned their infrastructure. What do they mean by this?
Cloud is defined, some say, by where your data is kept, and “cloud” means storing your data on remote servers instead of internal ones. Millions of Gmail and Flickr users have been “in the cloud” without even knowing it, and so has everyone who reads their mail over IMAP, or uses voicemail. In short, cloud is just data that is somewhere else.
Last September, I watched a presentation on “Cloud computing” where Symantec pitched their anti-virus product as having something like three different “cloud computing” technologies. These, it turned out, all involved downloading files over the Internet. In this view, cloud is essentially about the network, derived from our habit of using a drawing of a cloud in our network diagrams for so many years. Wikipedia currently shares this view, as its Cloud computing article opens with “Cloud computing is Internet-based computing”. No Internet, no cloud.

Maybe cloud is just SaaS, and any service provided on a pay-per-use basis over the Internet is a “cloud” service. It would seem that cloud is about paying for things, especially on a “utility” basis. The cloud is computing by the hour; it’s something you buy.
Others maintain that “cloud” is just virtualization, so if you’re using KVM or VMware, you’ve got a cloud already. In this view, cloud is about operating systems, and whether they’re running on real or virtual hardware. Cloud means deploying applications as virtual appliances, and creating new servers entirely in software. If your servers only have one OS on them, then they’re not cloud-ready.
If this vagueness and contradiction irritates you, then you’re not alone. It bugs me, and a lot of other people who are getting on with developing, using and providing technology and services. Cloud is not just a new name for these familiar technologies. In fact, it isn’t a technology at all. I don’t think it even makes sense to categorize technologies as “cloud” and “non-cloud”, though some may be more “cloudy” than others.
Nonetheless, there is some meaning and validity in each of these interpretations of cloud. So what is it?
A different perspective
Cloud is a transition, a trend, a paradigm shift. It isn’t something which exists or not: it is happening, and we won’t understand its essential nature until it’s over. Simon Wardley explains this, in presentation format, much better than I could in this blog post, so if you haven’t already, I suggest that you take 14 minutes of your time and go and watch him do his thing.

This change doesn’t have a clear beginning or an end. Early on, it was a disconnected set of ideas without any identity. We’re now somewhere in the middle of the bell curve, with enough insight to give it a name, and sufficient momentum to guess at what happens next. In the end, it will be absorbed into “the way things are”.
What’s going on?
So, what is actually changing? The key trends I see are:
- People and organizations are becoming more comfortable relying on resources which do not exist in any particular place. Rather than storing precious metals under our beds, we entrust our finances to banks, which store our data and provide us with services which let us “use” our “money”. In computing terms, we’re no longer enamored with keeping all of “our” programs and “our” data on “our” hard drive: as it turns out, it’s not very safe there after all, and in some ways we can actually exercise more control over programs and data “out there” than “in here”. We can no longer hoard our gold, or drive off would-be thieves with a pistol, but that wasn’t a productive way to spend our energies anyway. While we can’t put our hands on our money, we can know exactly what’s happening with it from minute to minute, and use it in a variety of ways at a moment’s notice.
- IT products are ceding ground to services. In many cases, we no longer need to build software, or even buy it: we can simply use it. As computing needs become better understood, and can be met by commodity products, it becomes more feasible to develop services to meet them on-demand. This is why we see a spike in acronyms ending in “aaS”. Simon explains the progression in detail in his talk, linked above.
- Hardware, software and data are being reorganized according to new models, in order to provide these services effectively. Architectural patterns, such as “hardware/OS/framework/application” are giving way to new ones, like “hardware/OS/IaaS/OS/PaaS/SaaS/Internet/web browser/OS/hardware”. These are still evolving, and the lines between them are blurry. It’s a bit early to say what the dominant patterns will be, but system, network, data and application architecture are all being transformed. No single technology or architecture defines cloud, but virtualization (at the infrastructure level) and the web (at the application level) both seem to resonate strongly with “cloudy” design patterns.
These trends are reinforcing and accelerating each other, driving information technologies and businesses in a common direction. That, in a nutshell, is what cloud is all about.
So what?
This transformation is disruptive in many different ways, but the angles which most interest me at the moment are:
- Operating systems – Cloud seems to indicate further commoditization of operating systems. We will have more operating systems than ever before (thanks to virtualization and IaaS), but we probably won’t think about them as much (as in software appliances). I think that the cloud world will want operating systems which are free, standard and highly customizable, which is potentially a great opportunity for Ubuntu and other open systems.
- Software freedom – As Eben Moglen and others have pointed out, this trend has significant consequences for the free software movement. As the shape of software changes, our principles of freedom must evolve as well. What does it mean to have the freedom to “run” a program in the cloud? To copy it? To change it? What other protections might people need in order to exercise these freedoms in the future?
- DevOps – I see a strong resonance between cloud—which is blurring the lines between hardware and software, infrastructure and applications, network and computer—and DevOps, which is bringing together the people who currently work in these different categories. Cloud means that we can no longer afford to treat these elements separately, and must work together to find better approaches to developing and deploying software, knocking down barriers and mapping new territory. Fortunately, cloud also means developing powerful new tools which will power this revolution.
- Clients – A majority of cloud related activity seems to be focused on what happens to back-end servers, but what about the computers we actually touch, on our desks, and in our pockets? How will they change? Will we end up reverting to a highly centralized computing model, where clients are strictly limited to front-end user interface processing (e.g. a web browser)? Will clients “join” the cloud and be providers of computing resources, not only consumers? What new types of devices will we need in order to make the most of cloud?
I’d like to explore some of these topics in future posts.
Interviewed by Ubuntu Turkey
As part of my recent Istanbul visit, I was interviewed by Ubuntu Turkey. They’ve now published the interview in Turkish. With their permission, the original English interview has been published on Ubuntu User by Amber Graner.
Introducing the jonometer
a learning experiment using Python, Twitter, CouchDB and desktopcouch
For a while now, I’ve been wanting to do a programming project using CouchDB and third-party web service APIs. I started out with an application to sync Launchpad bug data into CouchDB so that I could analyze it locally, a bit like Bug Hugger. It quickly got too complex for my spare time, and stalled. I’d still like to pick it up someday when I can devote more time to it.
More recently, I was noticing that Jono seemed to be having a rocking good time lately, sending a lot of awesome tweets about jams. This was only conjecture, though, and I needed hard data. I needed to quantify just how strong these influences were.
Now, this was a project I could get done in an evening of hacking and learning.
Version One
First, I threw together this quick proof of concept to learn the Twitter API and get some tantalizing preliminary data. Behold version 1.0 of the jonometer:
#!/usr/bin/python
# python-twitter

import sys
import twitter
import re

username = 'jonobacon'
updates_wanted = 100
patterns = ['rock', 'awesome', 'jam']

class Counter:
    """A simple accumulator which counts matches of a regex"""

    def __init__(self, pattern):
        self.pattern = pattern
        self.regex = re.compile(pattern, re.I)
        self.count = 0

    def update(self, s):
        """Increment count if the string s matches the pattern"""
        if self.regex.search(s):
            self.count += 1

def main():
    client = twitter.Api()
    counters = map(Counter, patterns)
    updates_found = 0

    for update in client.GetUserTimeline(username, updates_wanted):
        updates_found += 1
        for counter in counters:
            counter.update(update.GetText())

    for counter in counters:
        print counter.pattern, counter.count

if __name__ == '__main__':
    sys.exit(main())
The output looked like this:
rock 5
awesome 6
jam 10
In other words, about 5% of Jono’s recent tweets were rocking, another 6% were awesome, and a whopping 10% were jamming! I was definitely onto something, but I had to find out more.
One of the shortcomings of this quick prototype is that it would download the data from Twitter every time I ran it. This meant that it was fairly slow (about 2 seconds for 100 tweets), which is inconvenient for experimenting with different patterns, and that I wouldn’t want to try it with larger data sets (say, thousands of tweets, or multiple people).
Version Two
Enter CouchDB, the darling of the NoSQL crowd: fast, scalable and simple, it was just what I wanted for the next version of the jonometer. I replaced the Counter objects with a single Database, which stores all of the tweets in CouchDB. This was incredibly simple to do, because python-twitter provides an .AsDict() method which returns a tweet as a dictionary object, and CouchDB can store this type of data structure directly into the database. Easy!
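In condensed form, that step looks roughly like this (the same calls appear again in the full listing below; the record_type URL is just the identifier used for these records):

# Fetch one tweet with python-twitter and store it in desktopcouch.
# This is a condensed sketch of the step described above; the full
# jonometer code below does the same thing for a whole timeline.
import twitter
from desktopcouch.records.server import CouchDatabase
from desktopcouch.records.record import Record

db = CouchDatabase('jonometer', create=True)
client = twitter.Api()
for tweet in client.GetUserTimeline('jonobacon', count=1):
    record = Record(tweet.AsDict(),
                    record_type='https://mdzlog.alcor.net/tag/jonometer')
    db.put_record(record)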
Each time the jonometer is run, it downloads all of the new tweets since the previous run. In order to do this, it needs to keep track of the most recent tweet ID it has seen, so that it can pick up where it left off. I had originally planned to store a record in the database with the sync state, but after Stuart reminded me that Gwibber does much the same thing, I followed its example and instead calculated it using a view. Each row in the “maxid” view records the highest tweet ID seen for a particular user:
Key | Value |
---|---|
jonobacon | 10743678774 |
…so although the jonometer is currently Jono-specific, it could be extended easily.
For the core functionality, I created a view called “matches” to count how many tweets match each pattern. For each key (username and pattern), there is a row in this view which records how many tweets from that user matched that pattern:
Key | Value |
---|---|
["jonobacon", null] | 100 |
["jonobacon", "Awesome"] | 6 |
["jonobacon", "Jam"] | 10 |
["jonobacon", "Rock"] | 5 |
The null pattern is used to keep a count of the total number of tweets for that user.
Once the data is loaded, the runtime for the CouchDB version is only about 0.3 seconds, including the Python interpreter startup as well as checking Twitter to see if there are new tweets. I doubled the size of the database to 200 tweets (which was about all Twitter would give me in one batch), and this didn’t change measurably. If I’ve done all of this right, it should scale easily up to thousands of tweets. Awesome!

Adding or changing a pattern currently requires manually deleting the view so that it can be re-created. There is probably an established pattern for dealing with this, but I don’t know what it is yet.
Here’s the code for version 2:
#!/usr/bin/python
# python-twitter
# python-desktopcouch

import sys
import twitter
import re

from desktopcouch.records.server import CouchDatabase
from desktopcouch.records.record import Record

username = 'jonobacon'

# title string : JavaScript regex
patterns = {
    'Rock'    : 'rock',
    'Awesome' : 'awesome',
    'Jam'     : 'jam'
}

class Database(CouchDatabase):
    design_doc = "jonometer"
    database_name = "jonometer"

    def __init__(self, patterns):
        """patterns is a dictionary of (title string, JavaScript regex)"""
        CouchDatabase.__init__(self, self.database_name, create=True)
        self.patterns = patterns.copy()

        # set up maxid view
        if not self.view_exists("maxid", self.design_doc):
            mapfn = '''function(doc) {
                emit(doc.user.screen_name, doc.id);
            }'''
            viewfn = '''function(key, values, rereduce) {
                return Math.max.apply(Math, values);
            }'''
            self.add_view("maxid", mapfn, viewfn, self.design_doc)

        # set up a view to count occurrences of each pattern
        if not self.view_exists("matches", self.design_doc):
            mapfn = '''
            function(doc) {
                emit([doc.user.screen_name, null], 1);
                var pattern = null;
                var pattern_name = null;
            '''
            mapfn += ''.join(['''
                pattern = "%s";
                pattern_name = "%s";
                if (new RegExp(pattern, "i").exec(doc.text)) {
                    emit([doc.user.screen_name, pattern_name], 1);
                }
            ''' % (pattern, pattern_name)
                for pattern_name, pattern in self.patterns.items()])
            mapfn += '}'
            viewfn = '''function(key, values, rereduce) {
                return sum(values);
            }'''
            self.add_view("matches", mapfn, viewfn, self.design_doc)

    def maxid(self, username):
        """Return the highest known tweet ID for the specified user"""
        view = self.execute_view("maxid", self.design_doc)
        result = view[username].rows
        if len(result) > 0:
            return result[0].value
        return None

    def count_matches(self, username, pattern_name=None):
        """Return the number of tweets from username which match the
        specified pattern.  If no pattern is specified, count all tweets."""
        assert pattern_name is None or pattern_name in self.patterns
        view = self.execute_view("matches", self.design_doc)
        result = view[[username, pattern_name]].rows
        if len(result) > 0:
            return result[0].value

def main():
    client = twitter.Api()
    db = Database(patterns)

    maxid = db.maxid(username)
    if maxid:
        timeline = client.GetUserTimeline(username, since_id=maxid)
    else:
        timeline = client.GetUserTimeline(username, count=100)

    for tweet in timeline:
        print "new:", tweet.GetText()
        record = Record(tweet.AsDict(),
                        record_type='https://mdzlog.alcor.net/tag/jonometer')
        record_id = db.put_record(record)

    for pattern in patterns:
        print pattern, db.count_matches(username, pattern)
    print "total", db.count_matches(username)

if __name__ == '__main__':
    sys.exit(main())