We'll see | Matt Zimmerman

a potpourri of mirth and madness

Posts Tagged ‘Web

On Cloud

If you’re convinced that “cloud” is a useless buzzword, being used to describe everything under the sun, old and new, then…well, you’re right—that is, except for the “useless” part. It’s true that this word is being (ab)used in many different products, services and technologies, which do not seem to relate to each other in any concrete way. “Cloud” in the abstract is being defined in many different ways, based on different fundamental characteristics of “cloud technology”. Nonetheless, there is something genuinely important going on here, and this is my view of what it’s about.

The hype

Countless businesses which only had “web sites” or “web applications” in 2008 now call them “cloud services”. They aren’t delivering any new benefits to their users, and they haven’t redesigned their infrastructure. What do they mean by this?

Cloud is defined, some say, by where your data is kept, and “cloud” means storing your data on remote servers instead of internal ones. Millions of Gmail and Flickr users have been “in the cloud” without even knowing it, and so has everyone who reads their mail over IMAP, or uses voicemail. In short, cloud is just data that is somewhere else.


Last September, I watched a presentation on “Cloud computing” where Symantec pitched their anti-virus product as having something like three different “cloud computing” technologies. These, it turned out, all involved downloading files over the Internet. In this view, cloud is essentially about the network, derived from our habit of using a drawing of a cloud in our network diagrams for so many years. Wikipedia currently shares this view, as its Cloud computing article opens with “Cloud computing is Internet-based computing”. No Internet, no cloud.

Maybe cloud is just SaaS, and any service provided on a pay-per-use basis over the Internet is a “cloud” service. It would seem that cloud is about paying for things, especially on a “utility” basis. The cloud is computing by the hour; it’s something you buy.

Others maintain that “cloud” is just virtualization, so if you’re using KVM or VMware, you’ve got a cloud already. In this view, cloud is about operating systems, and whether they’re running on real or virtual hardware. Cloud means deploying applications as virtual appliances, and creating new servers entirely in software. If your servers only have one OS on them, then they’re not cloud-ready.

If this vagueness and contradiction irritates you, then you’re not alone. It bugs me, and a lot of other people who are getting on with developing, using and providing technology and services. Cloud is not just a new name for these familiar technologies. In fact, it isn’t a technology at all. I don’t think it even makes sense to categorize technologies as “cloud” and “non-cloud”, though some may be more “cloudy” than others.

Nonetheless, there is some meaning and validity in each of these interpretations of cloud. So what is it?

A different perspective


Cloud is a transition, a trend, a paradigm shift. It isn’t something which exists or not: it is happening, and we won’t understand its essential nature until it’s over. Simon Wardley explains this, in presentation format, much better than I could in this blog post, so if you haven’t already, I suggest that you take 14 minutes of your time and go and watch him do his thing.

This change doesn’t have a clear beginning or an end. Early on, it was a disconnected set of ideas without any identity. We’re now somewhere in the middle of the bell curve, with enough insight to give it a name, and sufficient momentum to guess at what happens next. In the end, it will be absorbed into “the way things are”.

What’s going on?

So, what is actually changing? The key trends I see are:

  1. People and organizations are becoming more comfortable relying on resources which do not exist in any particular place. Rather than storing precious metals under our beds, we entrust our finances to banks, which store our data and provide us with services which let us “use” our “money”. In computing terms, we’re no longer enamored with keeping all of “our” programs and “our” data on “our” hard drive: as it turns out, it’s not very safe there after all, and in some ways we can actually exercise more control over programs and data “out there” than “in here”. We can no longer hoard our gold, or drive off would-be thieves with a pistol, but that wasn’t a productive way to spend our energies anyway. While we can’t put our hands on our money, we can know exactly what’s happening with it from minute to minute, and use it in a variety of ways at a moment’s notice.
  2. IT products are ceding ground to services. In many cases, we no longer need to build software, or even buy it: we can simply use it. As computing needs become better understood, and can be met by commodity products, it becomes more feasible to develop services to meet them on-demand. This is why we see a spike in acronyms ending in “aaS”. Simon explains the progression in detail in his talk, linked above.
  3. Hardware, software and data are being reorganized according to new models, in order to provide these services effectively. Architectural patterns, such as “hardware/OS/framework/application” are giving way to new ones, like “hardware/OS/IaaS/OS/PaaS/SaaS/Internet/web browser/OS/hardware”. These are still evolving, and the lines between them are blurry. It’s a bit early to say what the dominant patterns will be, but system, network, data and application architecture are all being transformed. No single technology or architecture defines cloud, but virtualization (at the infrastructure level) and the web (at the application level) both seem to resonate strongly with “cloudy” design patterns.

These trends are reinforcing and accelerating each other, driving information technologies and businesses in a common direction. That, in a nutshell, is what cloud is all about.

So what?

This transformation is disruptive in many different ways, but the angles which most interest me at the moment are:

  • Operating systems – Cloud seems to indicate further commoditization of operating systems. We will have more operating systems than ever before (thanks to virtualization and IaaS), but we probably won’t think about them as much (as in software appliances). I think that the cloud world will want operating systems which are free, standard and highly customizable, which is potentially a great opportunity for Ubuntu and other open systems.
  • Software freedom – As Eben Moglen and others have pointed out, this trend has significant consequences for the free software movement. As the shape of software changes, our principles of freedom must evolve as well. What does it mean to have the freedom to “run” a program in the cloud? To copy it? To change it? What other protections might people need in order to exercise these freedoms in the future?
  • DevOps – I see a strong resonance between cloud—which is blurring the lines between hardware and software, infrastructure and applications, network and computer—and DevOps, which is bringing together the people who currently work in these different categories. Cloud means that we can no longer afford to treat these elements separately, and must work together to find better approaches to developing and deploying software, knocking down barriers and mapping new territory. Fortunately, cloud also means developing powerful new tools which will power this revolution.
  • Clients – A majority of cloud-related activity seems to be focused on what happens to back-end servers, but what about the computers we actually touch, on our desks, and in our pockets? How will they change? Will we end up reverting to a highly centralized computing model, where clients are strictly limited to front-end user interface processing (e.g. a web browser)? Will clients “join” the cloud and be providers of computing resources, not only consumers? What new types of devices will we need in order to make the most of cloud?

I’d like to explore some of these topics in future posts.


Written by Matt Zimmerman

April 22, 2010 at 14:00

Interviewed by Ubuntu Turkey

As part of my recent Istanbul visit, I was interviewed by Ubuntu Turkey. They’ve now published the interview in Turkish. With their permission, the original English interview has been published on Ubuntu User by Amber Graner.

Written by Matt Zimmerman

April 19, 2010 at 15:15

Introducing the jonometer

a learning experiment using Python, Twitter, CouchDB and desktopcouch

For a while now, I’ve been wanting to do a programming project using CouchDB and third-party web service APIs. I started out with an application to sync Launchpad bug data into CouchDB so that I could analyze it locally, a bit like Bug Hugger. It quickly got too complex for my spare time, and stalled. I’d still like to pick it up someday when I can devote more time to it.

More recently, I noticed that Jono seemed to be having a rocking good time, sending a lot of awesome tweets about jams. This was only conjecture, though, and I needed hard data. I needed to quantify just how strong these influences were.

Now, this was a project I could get done in an evening of hacking and learning.

Version One

First, I threw together this quick proof of concept to learn the Twitter API and get some tantalizing preliminary data. Behold version 1.0 of the jonometer:


# python-twitter

import sys
import twitter
import re

username = 'jonobacon'
updates_wanted = 100
patterns = ['rock', 'awesome', 'jam']

class Counter:
    """A simple accumulator which counts matches of a regex"""

    def __init__(self, pattern):
        self.pattern = pattern
        self.regex = re.compile(pattern, re.I)
        self.count = 0

    def update(self, s):
        """Increment count if the string s matches the pattern"""
        if self.regex.search(s):
            self.count += 1

def main():
    client = twitter.Api()
    counters = map(Counter, patterns)
    updates_found = 0
    for update in client.GetUserTimeline(username, updates_wanted):
        updates_found += 1
        # feed the text of each tweet to every counter
        for counter in counters:
            counter.update(update.GetText())

    for counter in counters:
        print counter.pattern, counter.count

if __name__ == '__main__':
    main()

The output looked like this:

rock 5
awesome 6
jam 10

In other words, about 5% of Jono’s recent tweets were rocking, another 6% were awesome, and a whopping 10% were jamming! I was definitely onto something, but I had to find out more.

One of the shortcomings of this quick prototype is that it would download the data from Twitter every time I ran it. This meant that it was fairly slow (about 2 seconds for 100 tweets), which is inconvenient for experimenting with different patterns, and that I wouldn’t want to try it with larger data sets (say, thousands of tweets, or multiple people).
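To try out different patterns without downloading anything, the counting logic can be run against a fixed list of strings. This is a minimal sketch in Python 3 (the listing above is Python 2); the sample tweets and the `count_matches` helper are invented for illustration and are not part of the jonometer.

```python
import re

# Hypothetical sample data standing in for downloaded tweets.
SAMPLE_TWEETS = [
    "Rocking out at the Ubuntu global jam!",
    "What an awesome release party",
    "Writing docs today",
    "Jam session tonight, it will be AWESOME",
]


def count_matches(texts, patterns):
    """Return {pattern: number of texts containing a case-insensitive match}."""
    regexes = {p: re.compile(p, re.I) for p in patterns}
    return {p: sum(1 for t in texts if rx.search(t)) for p, rx in regexes.items()}


if __name__ == "__main__":
    counts = count_matches(SAMPLE_TWEETS, ["rock", "awesome", "jam"])
    for pattern, count in counts.items():
        # express each count as a share of all sample tweets
        share = 100.0 * count / len(SAMPLE_TWEETS)
        print("%s %d (%.0f%%)" % (pattern, count, share))
```

Like the `Counter` class, this counts each tweet at most once per pattern, however many times the pattern occurs in it.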

Version Two

Enter CouchDB, the darling of the NoSQL crowd: fast, scalable and simple, it was just what I wanted for the next version of the jonometer. I replaced the Counter objects with a single Database, which stores all of the tweets in CouchDB. This was incredibly simple to do, because python-twitter provides an .AsDict() method which returns a tweet as a dictionary object, and CouchDB can store this type of data structure directly into the database. Easy!

Each time the jonometer is run, it downloads all of the new tweets since the previous run. In order to do this, it needs to keep track of the most recent tweet ID it has seen, so that it can pick up where it left off. I had originally planned to store a record in the database with the sync state, but after Stuart reminded me that Gwibber does much the same thing, I followed its example and instead calculated it using a view. Each row in the “maxid” view records the highest tweet ID seen for a particular user:

The maxid view
Key        Value
jonobacon  10743678774

…so although the jonometer is currently Jono-specific, it could be extended easily.
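What the “maxid” view computes can be imitated in plain Python. This is only a rough illustration of the map/reduce idea, not a description of how CouchDB evaluates views; the function names, the second tweet ID and the second user are made up, and the code is Python 3.

```python
from collections import defaultdict

# Hypothetical documents shaped like the stored tweets (only the fields
# the view looks at are included).
DOCS = [
    {"id": 10743678774, "user": {"screen_name": "jonobacon"}},
    {"id": 10741000000, "user": {"screen_name": "jonobacon"}},
    {"id": 10700000001, "user": {"screen_name": "someone_else"}},
]


def map_doc(doc):
    """Emit (screen_name, tweet id) for one document."""
    yield doc["user"]["screen_name"], doc["id"]


def maxid_view(docs):
    """Group emitted values by key and reduce each group with max()."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_doc(doc):
            grouped[key].append(value)
    return {key: max(values) for key, values in grouped.items()}


def new_tweets(timeline, since_id):
    """Keep only tweets newer than since_id (all of them if since_id is None)."""
    return [t for t in timeline if since_id is None or t["id"] > since_id]
```

Here `maxid_view(DOCS)["jonobacon"]` plays the role of the sync state: passing it to `new_tweets` picks up where the previous run left off.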

For the core functionality, I created a view called “matches” to count how many tweets match each pattern. For each key (username and pattern), there is a row in this view which records how many tweets from that user matched that pattern:

The matches view
Key                       Value
["jonobacon", null]       100
["jonobacon", "Awesome"]  6
["jonobacon", "Jam"]      10
["jonobacon", "Rock"]     5

The null pattern is used to keep a count of the total number of tweets for that user.
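The shape of the “matches” view can also be sketched in plain Python. Again, this is an illustration under assumptions rather than the jonometer itself: the sample documents are invented, Python tuples stand in for the JSON array keys, and the code is Python 3.

```python
import re
from collections import Counter

# Title -> regex, as in the jonometer's pattern table.
PATTERNS = {"Rock": "rock", "Awesome": "awesome", "Jam": "jam"}

# Hypothetical documents; only the fields used by the view are present.
DOCS = [
    {"text": "This jam is awesome", "user": {"screen_name": "jonobacon"}},
    {"text": "Rock on", "user": {"screen_name": "jonobacon"}},
    {"text": "Packing for a trip", "user": {"screen_name": "jonobacon"}},
]


def map_doc(doc, patterns):
    """Emit one row for the tweet itself plus one row per matching pattern."""
    user = doc["user"]["screen_name"]
    yield (user, None), 1  # the null-pattern row counts every tweet
    for name, pattern in patterns.items():
        if re.search(pattern, doc["text"], re.I):
            yield (user, name), 1


def matches_view(docs, patterns):
    """Sum the emitted values per key, as a summing reduce function would."""
    totals = Counter()
    for doc in docs:
        for key, value in map_doc(doc, patterns):
            totals[key] += value
    return dict(totals)
```

Dividing the count for a pattern by the count for the `None` key gives the share of a user's tweets which match that pattern.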

Once the data is loaded, the runtime for the CouchDB version is only about 0.3 seconds, including the Python interpreter startup as well as checking Twitter to see if there are new tweets. I doubled the size of the database up to 200 (which was about all Twitter would give me in one batch), and this didn’t change measurably. If I’ve done all of this right, it should scale easily up to thousands of tweets. Awesome! Adding or changing a pattern currently requires manually deleting the view so that it can be re-created. There is probably an established pattern for dealing with this, but I don’t know what it is yet.
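One possible way to avoid deleting the view by hand, offered as an untested idea rather than an established CouchDB pattern, is to derive the view name from the pattern set. A changed pattern then produces a new name, so a check such as `view_exists` would report the view as missing and it would be created afresh. The `view_name_for` helper below is hypothetical, is written in Python 3, and does not talk to CouchDB; views left over from old pattern sets would still need to be cleaned up.

```python
import hashlib
import json


def view_name_for(patterns, prefix="matches"):
    """Return a view name that changes whenever the patterns change."""
    # sort_keys makes the serialization independent of dictionary ordering
    canonical = json.dumps(patterns, sort_keys=True)
    digest = hashlib.sha1(canonical.encode("utf-8")).hexdigest()[:8]
    return "%s_%s" % (prefix, digest)
```

The same patterns always map to the same name, so repeated runs with an unchanged pattern table would keep reusing the existing view.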

Here’s the code for version 2:


# python-twitter
# python-desktopcouch

import sys
import twitter
import re
from desktopcouch.records.server import CouchDatabase
from desktopcouch.records.record import Record

username = 'jonobacon'
# title string : JavaScript regex
patterns = { 'Rock' : 'rock',
        'Awesome' : 'awesome',
        'Jam' : 'jam' }

class Database(CouchDatabase):
    design_doc = "jonometer"
    database_name = "jonometer"

    def __init__(self, patterns):
        """patterns is a dictionary of (title string, JavaScript regex)"""

        CouchDatabase.__init__(self, self.database_name, create=True)
        self.patterns = patterns.copy()

        # set up maxid view
        if not self.view_exists("maxid", self.design_doc):
            mapfn = '''function(doc) { emit(doc.user.screen_name, doc.id); }'''
            viewfn = '''function(key, values, rereduce) {
    return Math.max.apply(Math, values);
}'''
            self.add_view("maxid", mapfn, viewfn, self.design_doc)

        # set up a view to count occurrences of each pattern
        if not self.view_exists("matches", self.design_doc):

            # every tweet emits a row with a null pattern, used for the total
            mapfn = '''
function(doc) {
    emit([doc.user.screen_name, null], 1);

    var pattern = null;
    var pattern_name = null;
'''

            # append one JavaScript test per pattern
            mapfn += ''.join(['''
    pattern = "%s";
    pattern_name = "%s";
    if (new RegExp(pattern, "i").exec(doc.text)) {
        emit([doc.user.screen_name, pattern_name], 1);
    }
''' % (pattern, pattern_name)
                for pattern_name, pattern in self.patterns.items()])

            mapfn += '}'

            viewfn = '''function(key, values, rereduce) { return sum(values); }'''
            self.add_view("matches", mapfn, viewfn, self.design_doc)

    def maxid(self, username):
        """Return the highest known tweet ID for the specified user"""

        view = self.execute_view("maxid", self.design_doc)
        result = view[username].rows
        if len(result) > 0:
            return result[0].value
        return None

    def count_matches(self, username, pattern_name=None):
        """Return the number of tweets from username which match 
        the specified pattern.

        If no pattern is specified, count all tweets."""

        assert pattern_name is None or pattern_name in self.patterns
        view = self.execute_view("matches", self.design_doc)
        result = view[[username, pattern_name]].rows
        if len(result) > 0:
            return result[0].value
        # no row for this key means that no tweets matched
        return 0

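def main():
    client = twitter.Api()
    db = Database(patterns)

    # fetch only tweets newer than the newest one already stored;
    # on the first run, start with the most recent 100
    maxid = db.maxid(username)
    if maxid:
        timeline = client.GetUserTimeline(username, since_id=maxid)
    else:
        timeline = client.GetUserTimeline(username, count=100)

    for tweet in timeline:
        print "new:", tweet.GetText()
        # The second argument is missing from this listing as published
        # (presumably the record_type, which desktopcouch requires).
        # The URL below is only a placeholder so that the call is complete;
        # it is not the value the jonometer originally used.
        record = Record(tweet.AsDict(),
                        record_type="http://example.com/jonometer/tweet")
        record_id = db.put_record(record)

    for pattern in patterns:
        print pattern, db.count_matches(username, pattern)
    print "total", db.count_matches(username)

if __name__ == '__main__':
    main()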

Written by Matt Zimmerman

March 19, 2010 at 23:41

Optimizing my social network

I’ve been working to better organize my online social network so as to make it more useful to me and to the people I know.

I use each social networking tool in a different way, and tailor the content and my connections accordingly. I don’t connect with all of the same people everywhere. I am particularly annoyed by social networks which abuse the word “friend” to mean something wholly different than it means in the rest of society. If I’m not someone’s “friend” on a certain website, it doesn’t mean that I don’t like them. It just means that the information I exchange with them fits better somewhere else.

Here is the arrangement I’ve ended up with:

  • If you just want to hear bits and pieces about what I’m up to, you can follow me on identi.ca, Twitter or FriendFeed. My identi.ca and Twitter feeds have the same content, though I check @-replies on identi.ca more often.
  • If you’re interested in the topics I write about in more detail, you can subscribe to my blog.
  • If you want to follow what I’m reading online, you can subscribe to my Google Reader feed.
  • If (and only if) we’ve worked together (i.e. we have worked cooperatively on a project, team, problem, workshop, class, etc.), then I’d like to connect with you on LinkedIn. LinkedIn also syndicates my blog and Twitter.
  • If you know me “in real life” and want to share your Facebook content with me, you can connect with me on Facebook. I try to limit this to a manageable number of connections, and will periodically drop connections where the content is not of much interest to me so that my feed remains useful. Don’t take it personally (see the start of this post). Virtually everything I post on my Facebook account is just syndicated from other public sources above anyway. I no longer publish any personal content to Facebook due to their bizarre policies around this.

Written by Matt Zimmerman

March 18, 2010 at 17:28

QCon London 2010: Day 3

The tracks which interested me today were “How do you test that?”, which dealt with scenarios where testing (especially automation) is particularly challenging, and “Browser as a Platform”, which is self-explanatory.

Joe Walker: Introduction to Bespin, Mozilla’s Web Based Code Editor

I didn’t make it to this talk, but Bespin looks very interesting. It’s “a Mozilla Labs Experiment to build a code editor in a web browser that Open Web and Open Source developers could love”.

I experimented briefly with the Mozilla hosted instance of Bespin. It seems mostly oriented for web application development, and still isn’t nearly as nice as desktop editors. However, I think something like this, combined with Bazaar and Launchpad, could make small code changes in Ubuntu very fast and easy to do, like editing a wiki.

Doron Reuveni: The Mobile Testing Challenge

Why Mobile Apps Need Real-World Testing Coverage and How Crowdsourcing Can Help

Doron explained how the unique testing requirements of mobile handset applications are well suited to a crowdsourcing approach. As the founder of uTest, he explained their approach to connecting their customers (application vendors) with a global community of testers with a variety of mobile devices. Customers evaluate the quality of the testers’ work, and this data is used to grade them and select testers for future testing efforts in a similar domain. The testers earn money for their efforts, based on test case coverage (starting at about $20 each), bug reports (starting at about $5 each), and so on. Their highest performers earn thousands per month.

uTest also has a system, uTest Remote Access, which allows developers to “borrow” access to testers’ devices temporarily, for the purpose of reproducing bugs and verifying fixes. Doron gave us a live demo of the system, which (after verifying a code out of band through Skype) displayed a mockup of a BlackBerry device with the appropriate hardware buttons and a screenshot of what was displayed on the user’s screen. The updates were not quite real-time, but were sufficient for basic operation. He demonstrated taking a picture with the phone’s camera and seeing the photo within a few seconds.

Dylan Schiemann: Now What?

Dylan did a great job of extrapolating a future for web development based on the trend of the past 15 years. He began with a review of the origin of web technologies, which were focused on presentation and layout concerns, then on to JavaScript, CSS and DHTML. At this point, there was clear potential for rich applications, though there were many roadblocks: browser implementations were slow, buggy or nonexistent, security models were weak or missing, and rich web applications were generally difficult to engineer.

Things got better as more browsers came on the scene, with better implementations of CSS, DOM, XML, DHTML and so on. However, we’re still supporting an ancient implementation in IE. This is a recurring refrain among web developers, for whom IE seems to be the bane of their work. Dylan added something I hadn’t heard before, though, which was that Microsoft states that anti-trust restrictions were a major factor which prevented this problem from being fixed.

Next, there was an explosion of innovation around Ajax and related toolkits, faster JavaScript implementations, infrastructure as a service, and rich web applications like Gmail, Google Maps, Facebook, etc.

Dylan believes that web applications are what users and developers really want, and that desktop and mobile applications will fall by the wayside. App stores, he says, are a short term anomaly to avoid the complexities of paying many different parties for software and services. I’m not sure I agree on this point, but there are massive advantages to the web as an application platform for both parties. Web applications are:

  • fast, easy and cheap to deploy to many users
  • relatively affordable to build
  • relatively easy to link together in useful ways
  • increasingly remix-able via APIs and code reuse

There are tradeoffs, though. I have an article brewing on this topic which I hope to write up sometime in the next few weeks.

Dylan pointed out that different layers of the stack exhibit different rates of change: browsers are slowest, then plugins (such as Flex and Silverlight), then toolkits like Dojo, and finally applications which can update very quickly. Automatically updating browsers are accelerating this, and Chrome in particular values frequent updates. This is good news for web developers, as this seems to be one of the key constraints for rolling out new web technologies today.

Dylan feels that technological monocultures are unhealthy, and prefers to see a set of competing implementations converging on standards. He acknowledged that this is less true where the monoculture is based on free software, though this can still inhibit innovation somewhat if it leads to everyone working from the same point of view (by virtue of sharing a code base and design). He mentioned that de facto standardization can move fairly quickly; if 2-3 browsers implement something, it can start to be adopted by application developers.

Comparing the different economics associated with browsers, he pointed out that Mozilla is dominated by search through the chrome (with less incentive to improve the rendering engine), Apple is driven by hardware sales, and Google by advertising delivered through the browser. It’s a bit of a mystery why Microsoft continues to develop Internet Explorer.

Dylan summarized the key platform considerations for developers:

  • choice and control
  • taste (e.g. language preferences, what makes them most productive)
  • performance and scalability
  • security

and surmised that the best way to deliver these is through open web technologies, such as HTML 5, which now offers rich media functionality including audio, video, vector graphics and animations. He closed with a few flashy demos of HTML 5 applications showing what could be done.

Written by Matt Zimmerman

March 12, 2010 at 17:14

Multivac emerging

Science fiction writer Isaac Asimov envisioned a computer called Multivac, powerful enough to process all of the planet’s data. Humanity painstakingly collects massive quantities of information to submit to Multivac on a daily basis, in exchange for the opportunity to ask questions of it.

With so much information at its disposal, Multivac is capable of amazing feats of analysis and prediction, which guide humanity to resolving global problems of war, poverty and so on.

The corporate mission statement of Google, Inc. is to organize the world’s information and make it universally accessible and useful. Google constantly processes information from the web, books, and photographic imagery from space and from the surface of the planet. Its famously simple search interface invites humans to ask it about anything, and it provides instantaneous answers in the form of references to information it has collected.

Google is not yet capable, in general, of providing meaningful answers to natural language questions, though research is ongoing, and systems like Wolfram Alpha hint at more abstract manipulation of data at a global scale.

We seem to be edging closer to Asimov’s vision of Multivac. What would you ask Multivac, given the opportunity? How will our future reality differ from science fiction?

Written by Matt Zimmerman

November 8, 2009 at 18:37

Posted in Uncategorized


Google Voice

I’ve been experimenting with Google Voice while traveling in the US. I would have tried it sooner, but it isn’t very UK-friendly at present.

The good:

  • Free SMS to US mobiles from the browser
  • Convenient browsing and searching of received/placed calls, SMS and voicemail
  • Initiation of phone calls from the browser
  • Unified contacts database with my Android phone (and Gmail, though I don’t use it)
  • Simple call routing, so I can use a fixed number when I’m in the US even though I usually pick up a new prepaid SIM on each trip
  • Ability to choose a phone number through searching, to find one which is easy to remember
  • Speech-to-text of voicemails (maybe just good enough to be useful)

The bad or missing:

  • Requires a data connection on Android (problematic e.g. when roaming or using a prepaid/pay-as-you-go SIM), though it falls back gracefully to non-Google-Voice service
  • Calls outside the US cost money (Vonage and other VOIP providers offer this for free)
  • Calls can apparently only be placed between POTS lines (no softphone functionality)
  • Yet another place to set my time zone. Being able to selectively block calls while I’m asleep abroad would be the killer feature for me
  • Caller ID doesn’t seem fully integrated in Android: it sometimes looks like I’m on a call with myself

The boring:

  • Free calls within the US: people still pay for this?
  • Voicemail: people still leave voicemail?

The “ah” moment came when someone gave me a phone number on IRC, and I copy/pasted it into my browser to call them.

Written by Matt Zimmerman

September 5, 2009 at 20:26

Posted in Uncategorized
