Community by the Numbers, Part III: Power Laws
In my first article in this series I talked about community numbers: how the sizes of groups ultimately affect their success (or failure). However what I discussed only offers up the most rudimentary explanation of the dynamics, and that is because typically not all of the members of a group are equally involved.
In order to better define who constitutes the tightly-knit "participant community" upon which the group thresholds act, we have to study power laws which let us measure the intensity of individuals' involvement in a group.
An Overview of Power Laws
The best-known power law is probably the Pareto principle, which is otherwise known as the "80/20 law." It's been overused throughout the years; Pareto's actual law only said that 80% of the wealth would be held by 20% of the population.
However, it offers a fine example of how power laws work. They generally describe a discrepancy between intensity and population: inevitably, some people do a lot more of the work in any social situation. Other examples include Zipf's Law, which suggests that the frequency of a word's usage is inversely proportional to its ranking among words (making the second-ranked word appear half as often as the first, the third a third as often, etc.), and the long tail, which talks about selling a very large number of items, each in a very small individual quantity.
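To make the Zipf's Law relationship concrete, here is a minimal sketch in Python. It assumes the idealized form of the law (an exponent of exactly 1), and the function name is purely illustrative.

```python
from fractions import Fraction


def zipf_relative_frequency(rank: int) -> Fraction:
    """Frequency of the word at `rank`, relative to the top-ranked word.

    Assumes the idealized form of Zipf's Law, where frequency is
    proportional to 1 / rank.
    """
    if rank < 1:
        raise ValueError("rank starts at 1")
    return Fraction(1, rank)


if __name__ == "__main__":
    for rank in range(1, 5):
        # Prints 1, 1/2, 1/3, 1/4 for ranks 1 through 4.
        print(rank, zipf_relative_frequency(rank))
```

Real word counts only approximate this curve, so treat the output as the shape of the law rather than a prediction.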
For online communities, which have been the focus of most of my studies on the topic of community sizes, I've found that the participation inequality power rule is very apt.
This term comes from Will Hill of AT&T Laboratories, who said, "A major reason why user-contributed content rarely turns into a true community is that all aspects of Internet use are characterized by severe participation inequality." It's often equated with the 1% law, though I like to be more precise and say that 90% of an online community tends to be lurkers, 9% tends to be intermittent participants, and 1% tends to be active participants.
These values heavily influence the sizes of online communities that are larger than the tightly-knit community group thresholds that I previously discussed.
Power Laws & Group Thresholds
When I wrote about tightly-knit communities in my first article, I didn't consider the degree of participation. That's certainly an entirely valid model for some types of groups. Corporations, for example, ideally should be entirely filled with active participants, while Skotos' online game Castle Marrach also fits into the category due to the implicit requirements it creates for participation. There are some challenges to growing this type of community, since you're only searching for a specific type of high-energy participant, but they can be overcome if you offer sufficient incentive (such as a salary or a lot of internal feedback).
However, most communities, and in particular, online communities, will not fall into this category, and thus when we're looking at group thresholds, we have to measure them against the number of active participants, not against the number of total members. Thus, for groups which allow for non-participation, we'll often measure 10% (or maybe 1%) of the group size against the group thresholds.
RPGnet, one of the community sites that Skotos runs, offers a good example of this. We regularly see monthly uniques of approximately 200,000 users. However, we probably have about 20,000 active registered users, confirming the lurker:participant ratio. When we recognize that only 2,000 of those are particularly active participants and that they're divided among 6 successful forums, we start to see how community numbers that actually match the group thresholds can gel.
You can reverse this approach and look at active participants first. During some recent consulting for a local non-profit organization with 60 active online members, I was able to infer that their broader community was around 6000, which turned out to fairly accurately predict the total number of people who came to their live events over the course of a year.
Generally, this logic can be applied to a community of any size. You first measure whether it's an all-participant community or one that matches an existing power law, and then you use the corrected community number to truly measure which of the group thresholds may apply to it.
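The correction described here is simple arithmetic. The following is a minimal sketch in Python, assuming the rough 90/9/1 split; the function names are hypothetical, and the percentages are rules of thumb rather than measured constants.

```python
# Rough 90/9/1 participation split, in whole percentage points.
# These are rules of thumb, not measured constants.
INTERMITTENT_PCT = 9
ACTIVE_PCT = 1


def participation_breakdown(total_members: int) -> dict:
    """Split a total membership into lurkers, intermittent, and active."""
    active = round(total_members * ACTIVE_PCT / 100)
    intermittent = round(total_members * INTERMITTENT_PCT / 100)
    lurkers = total_members - active - intermittent
    return {"lurkers": lurkers, "intermittent": intermittent, "active": active}


def estimate_total_from_active(active_members: int) -> int:
    """Reverse the approach: infer the broader community from its active core."""
    return active_members * 100 // ACTIVE_PCT


if __name__ == "__main__":
    site = participation_breakdown(200_000)
    print(site)  # {'lurkers': 180000, 'intermittent': 18000, 'active': 2000}
    print(site["intermittent"] + site["active"])  # 20000 participants of any kind
    print(estimate_total_from_active(60))  # 6000
```

The 200,000 and 60 figures are the RPGnet and non-profit numbers quoted in this article; the "active" count is the corrected community number to compare against the group thresholds.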
Power Laws & Leaders
The power laws can also help you to measure the number of leaders in a community. Inevitably all of your participants will become leaders of some sort, while your high-level participants will become the top-tier leaders.
I noted this in my first discussions of group threshold. In a group of 7 members, you can reasonably expect to have one higher level participant, and thus the one leader that we saw naturally appear. Similarly in a Judas group of 13, there's the opportunity for more than one leader to appear, creating the possibility for the first hierarchical conflicts.
Understanding your count of leaders can help you see how to grow groups. For example, when I first created iPhoneWebDev I had to put in an immense amount of effort to grow the community. This is because, under participation inequality, I had to grow the group by 10 members before I got even the least amount of help increasing the content of the group, and I had to grow it by 100 members before I had someone who was doing as much work as I was to create content.
At 100 members, with my first active participant, we continued to grow, but we were both working hard and felt rather lonely.
I finally saw the group stabilize, then take off on its own, when it hit 600-700 members, and that shows how beautifully the power laws work hand-in-hand with the group thresholds. With 700 members, I could reasonably expect there to be 7 leaders. In other words, I had a committee of leaders: the perfect size for a starting working group.
From my experience with other online groups, if iPhoneWebDev grows to over 10,000 members, I can expect that there will be some transition issues. As the core active community members exceed 100 people I will start having some Non-Exclusive Dunbar Number problems, typically social contract failures. These can be solved by either adding some hierarchy (appointing some people to be official "staff"), or by starting to break the group into sub-communities.
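The growth milestones in this section can be sketched as follows. This is only an illustration in Python, assuming roughly 1 in 10 members contributes at all and 1 in 100 becomes a leader; both function names are made up for the example.

```python
def expected_participants(members: int) -> int:
    """Members likely to contribute at all (about 1 in 10)."""
    return members // 10


def expected_leaders(members: int) -> int:
    """Members likely to be highly active leaders (about 1 in 100)."""
    return members // 100


if __name__ == "__main__":
    for size in (10, 100, 700, 10_000):
        print(size, expected_participants(size), expected_leaders(size))
    # 10 1 0        -> the first occasional helper
    # 100 10 1      -> the first peer doing comparable work
    # 700 70 7      -> a working-group-sized committee of leaders
    # 10000 1000 100 -> a core large enough to strain a loosely bound group
```

Integer division is used deliberately: a community of 99 members is not expected to have produced its first leader yet.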
Varying the Power Laws
In my first article, I noted that it's possible to expend additional energy to make tightly-knit groups able to function effectively at non-optimal sizes. It is similarly possible for the values of the participation inequality sized groups to change by expending more energy. Conversely, a drain on energy may decrease this ratio.
For example when I used to run AOL forums I would frequently reward first-time participants with free time (at that time worth $5 an hour) if they asked good questions or offered valuable input. CompuServe similarly offered constructive feedback by telling users how many responses they'd received to a new comment when they logged back in, encouraging them to leave lurker status. More energy in the community — driven either by the moderators, good social software design, or by a greater commitment by its members — can allow you to increase the active participant percentage, maybe times 2, or even 4, but even with a lot of effort not by an order of magnitude.
As a group grows in size, I believe the participation inequality worsens. A huge Yahoo! group with a million members might have moved from a 90/9/1 ratio to 95/4.5/0.5. I suspect this is because the energy required to change the participation inequality numbers is so large as to not be economical.
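To see what that shift means in head counts, here is a small Python sketch. The helper name is hypothetical, and both ratios are the illustrative figures from the paragraph above, not measurements.

```python
def tier_counts(total_members: int, ratio: tuple) -> tuple:
    """Apply a (lurker, intermittent, active) percentage ratio to a group size."""
    if abs(sum(ratio) - 100) > 1e-9:
        raise ValueError("ratio must sum to 100 percent")
    return tuple(round(total_members * pct / 100) for pct in ratio)


if __name__ == "__main__":
    print(tier_counts(1_000_000, (90, 9, 1)))      # (900000, 90000, 10000)
    print(tier_counts(1_000_000, (95, 4.5, 0.5)))  # (950000, 45000, 5000)
```

Under these assumed ratios, the million-member group ends up with half as many active participants as the standard split would suggest.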
There are also some interesting interrelations between the numbers of people at the various levels of participation. Though discovering 100 new members has a good chance of adding 10 new participants, 1 of whom is very active, my experience has been that things trickle-down in the other direction as well: that adding 1 new high-level participant can lead to the creation of 9 medium-level participants and 90 lurkers (though don't let that suggest that all of your effort should be expended on the high-level participants only).
Looking at Participation Inequality
Here is a close look at four online communities, using the quantcast.com metrics service, where you can see some participation inequality in action:
From this you can see that a typical online community site shows the normal 90%/9%/1% participation inequality. RPGnet shows a slightly better than average participation inequality due to its longevity and the quality of the community. ObesityHealth shows evidence of a great community with its 4% active participants, probably because you have to be very committed if you are going to have bariatric surgery. Last is an example of a relatively unhealthy community that is unable to sustain its active participants.
You do have to be careful when analyzing quantcast numbers if you see active participants of greater than 6%: in almost all cases, if you look deeper, it is because there is some restriction that keeps people from lurking, either a fee or some other type of gateway, causing a distortion in the statistics.
Conclusion
Multiple factors influence the success (or failure) of a community. As we saw in my first article on community numbers, the first factor is the differing group thresholds of community sizes. In my second article, I show that personal limits on the number of people you can have intimacy and trust with are an important factor. In this article I show that larger groups are subject to the power law of participation inequality, causing a small fraction of a community to be subject to group thresholds. In all three articles I show how expending energy can allow you to change the numbers, but with limits.
I hope this discussion of community numbers will give you some tools to look at the communities you are in, or are trying to build, and to better understand how to make them more successful.
Some other posts about the Dunbar Number and group size issues:
- 2004-03: The Dunbar Number as a Limit to Group Sizes (also some really good comments)
- 2005-02: Dunbar Triage: Too Many Connections
- 2005-03: Dunbar, Altruistic Punishment, and Meta-Moderation
- 2005-07: Cheers: Belongingness and Para-Social Relationships
- 2005-08: Dunbar & World of Warcraft
- 2005-10: Dunbar Number & Group Cohesion
- 2008-09: Community by the Numbers, Part One: Group Thresholds
- 2008-11: Community by the Numbers, Part II: Personal Circles
My bookmarks to various papers and websites on this topic are available at delicious.com/ChristopherA under some of the following tags:
- participation inequality - more specifics on participation inequality.
- power laws - everything I have on the topic of power laws, including participation inequality.
If you have any links on this topic that you would like to share with me, tag them for:ChristopherA and I'll take a look.
Illustrations by Nancy Margulies. Many thanks to Shannon Appelcline and F. Randall Farmer for their assistance with this series.
Posted on March 19, 2009 at 01:46 AM in Social Software, Web/Tech
Community by the Numbers, Part II: Personal Circles
In my previous post, I talked about the limits on sizes of tightly-knit communities. These group limits are closely related to a number of interesting personal limits, and are often confused with them.
Unlike the group limits, personal limits actually measure something different: the number of connections that an individual can hold. They're yet another thing that you must consider when thinking about communities of people.
Personal Limits
The Support Circle: This is the number of individuals that you seek advice, support, or help from in times of severe emotional or financial stress. In most societies, the average size of an individual's Support Circle is 3-5. These people are the core of your intimate social network and most typically are also kin. In sociology papers this is often called the "support clique".
The Sympathy Circle: This is larger than the Support Circle: it is the number of people that you go to for sympathy and also those people whose death would be devastating to you. The Sympathy Circle typically is in the range of 10-15 people, but can vary widely from as few as 7 to as many as 20. The Sympathy Circle often may be made up of kin, but usually includes some peers as well.
In sociology papers the Sympathy Circle is also known as a "sympathy group", but I wanted to avoid the term "group", as it implies that all the members of a Sympathy Circle are connected. Instead, members of your Sympathy Circle will have additional people in their own Sympathy Circles that are not part of your own.
An interesting issue with the Sympathy Circle is that, as a personal limit, 10-15 is a typical size. However, if you bring those people all together in one place, they will likely become a Judas-Number-sized group, with all of the problems associated with that size.
The Trust Circle: These are the people that you have some type of intimate connection to. One study measured it as the people that you would send a family Christmas card to, while another simply tested emotional closeness.
In pre-Friendster days the Trust Circle would be those people that you considered your "friends"; however, today the meaning of that term has begun to change. In my own usage, your Trust Circle is the set of people that you have strong ties to and that in some measure you can trust. I have also called the Trust Circle your personal "intimate social network".
The size of different individuals' Trust Circles can vary widely (40-200), but some studies show that the mean is on the low side of 150. This has led a number of researchers to compare this number with the Exclusive Dunbar Number of 150. However, I believe that this is a mistake; they are related, but in today's society members of your Trust Circle are rarely in the same mutual group.
The Emotional Circle
I personally define your Emotional Circle as the total number of people that you can have some type of non-mutual emotional connection with, most likely spread across numerous groups of all sorts. You "like" them in some way, but do not necessarily have to have strong ties to them.
In academia this threshold is called "social channel capacity". One study, using two different methods of estimation, suggests that it falls right around 290, though I like to describe this number as "just short of 300." As I wrote in Dunbar Triage, many people confuse this number with the Dunbar Number (and in fact I have in some of my older pieces). However, like the Trust Circle, it's a distinct entity.
Emotional Circle size can vary quite a bit from individual to individual. Some people might have half the average capacity, and others considerably more — which is much more variation than you see among the sizes of smaller personal thresholds.
Some of those variations are individual, but some are societal. As I wrote in Cheers: Belongingness and Para-Social Relationships, I believe that our modern era of television causes us to create para-social relationships with imaginary characters who we nonetheless become emotionally involved with, and thus might reduce our social channel capacity.
Is our Emotional Circle smaller today because of TV, or is it larger because online communities can help to remind us of our emotional connections to other people? That's a topic that probably deserves more study.
An interesting point to make is that the people who are in your Emotional Circle, but are not in your Trust Circle, are your "weak ties" in social network terms. What is important about weak ties is that studies show (pdf) that opportunities and knowledge flow to you much more through weak ties than through the more insular strong ties of your trust circle.
The Familiar Stranger
Outside of our Emotional Circle is a larger, more tenuous circle: those people whose faces you recognize, but who you know nothing more about. These are your "Familiar Strangers".
Studies show that the percentage of familiar strangers in your vicinity has a real impact on your willingness to take risks. If you are in a new place with no one that you recognize, you'll avoid eye contact and will generally be unwilling to approach strangers. In a place where there are a lot of people that you've seen before (say in your favorite cafe, at a conference, or in the lunchroom of a large company), you'll be much more willing to take risks, such as asking questions, or sitting down next to someone to eat lunch.
I haven't been able to find any studies showing how many people we can recognize, but for some people the number is much larger than the number of people in their Emotional Circle, probably well over a thousand. However, there is also a lot more variance: some people are face-blind or near face-blind, and have a difficult time even recognizing friends.
There could also be some interesting research looking more closely at social network software. I find it fascinating that the professionally-oriented social network LinkedIn resisted supporting photos in profiles for so long yet ultimately gave in, as well as how other social network software companies have attempted to require "real" photos of people rather than allowing "fakesters" or avatars.
Crossing the Circles
I've used the term "circles" throughout this article because it's a great metaphor for these levels of personal involvement. They can literally be thought of as concentric circles of people getting further and further away from an individual.
However, if you want to consider them with an even more graphic bent, think of these circles as the ridge lines of a topographical map. An individual sits at the center, and around him lie many other people, fading slowly away as the distance increases.
Winding through these topographical lines, like forests or rivers, are geographies of physical and emotional connection.
Kin are one of the most interesting geographies, because they lie all across the map. There's a clump of them in the innermost circles, but there are also many who lie in the realm of Familiar Strangers, including those cousins and great-aunts who you only see at family gatherings, and whom you know nothing about.
There are also forces being exerted upon the circles, acting like gravity to draw people together. They are the forces of trust, influence, and more. Their pulls are greatest toward the center, across your Circles of Support and Sympathy, but as people move farther away, these forces drop off quickly.
Thus, though I've described them as circles, with strict boundaries, we should also see these personal connections as fluid entities, a regular ecosystem of personal community.
Conclusion
Whereas the group thresholds that I discussed in my last article define the limits placed on community group size, the personal limits described herein instead define the limits placed on how many people an individual can know with various degrees of intimacy.
Perhaps there are societies where these two things might be the same. A true survival community might contain everyone a person knows, and thus he could draw out all his personal circles across that community canvas. However, in our modern era they're much more likely to be distinct, with an individual interacting with the members of his circles of acquaintances through numerous different group communities.
With this bifurcation of personal and group community limits, we have to briefly stop and ask a few questions. How do they relate? What can personal limits tell us about efficient community creation? Does founding a group upon a personal circle make its growth easier or harder? Conversely, what type of communities lead naturally to the creation of intimate circles?
Herein I've simply outlined personal thresholds as a contrast to group thresholds. The exploration of how these limits interact is worthy of additional study.
In my next article, "Community by the Numbers, Part III: Power Laws", I will talk about how both group thresholds and personal thresholds have a role in larger, less tightly-knit groups.
Some other posts about the Dunbar Number and group size issues:
- 2004-03: The Dunbar Number as a Limit to Group Sizes (also some really good comments)
- 2005-02: Dunbar Triage: Too Many Connections
- 2005-03: Dunbar, Altruistic Punishment, and Meta-Moderation
- 2005-07: Cheers: Belongingness and Para-Social Relationships
- 2005-08: Dunbar & World of Warcraft
- 2005-10: Dunbar Number & Group Cohesion
- 2008-09: Community by the Numbers, Part One: Group Thresholds
My bookmarks to various papers and websites on this topic are available at delicious.com/ChristopherA under some of the following tags:
- personal circles - everything I have on the topic.
- familiar strangers - those people you recognize by face.
If you have any links on this topic that you would like to share with me, tag them for:ChristopherA and I'll take a look.
Illustrations by Nancy Margulies, photo by davitydave. Many thanks to Shannon Appelcline and F. Randall Farmer for their assistance with this series.
Posted on November 25, 2008 at 12:44 PM in Social Software, Web/Tech
Community by the Numbers, Part One: Group Thresholds
We often think of communities as organic creatures, which come into existence and grow on their own. However, the truth is they are fragile blossoms. Although many communities surely germinate and bloom on their own, purposefully creating communities can take a tremendous amount of hard work, and one factor their success ultimately depends upon is their numbers.
If a community is too small you'll often have insufficient critical mass to sustain it. Conversely, if it's too large you can end up with a community that's too noisy, too cliquey, or otherwise problematic. These optimal and sub-optimal community sizes appear in strata, like discrete layers of rock. For a community to advance from one stratum to the next often takes immense energy.
We can analyze these community sizes in three ways. In this first article I'm going to talk about numerical group thresholds that have been observed in various sizes of tightly-knit communities, while in its sequel I'm going to talk about personal thresholds and how they relate to group thresholds. In my final post, I'm going to consider how power laws and inequalities of participation further complicate these simple values in the creation of larger communities. Together these three articles constitute what I call "Community by the Numbers," a theory of community size.
Though I'm going to point to some studies which support these numbers, in general my goal here isn't to try and prove this theory of community size numbers, but rather to lay the theory out completely.
Tightly-Knit Group Thresholds
Groups can clearly exist at any size, from a partnership of two, on upward. However what I'm going to write about here are the threshold values: the ideal numbers where a community seems to function best, and the less than ideal numbers at which a community begins to grow unstable, remaining so until a new threshold number is reached.
I'm also specifically talking about groups that are both tightly-knit and participatory communities. Clearly Ford Motor Company, with 250,000 employees, doesn't match any of these group thresholds. But any self-contained community within Ford probably will (and in fact, it will probably be either a "Working Group" or a "Non-Exclusive Dunbar Group", both terms I'll explain below). Similarly, a non-corporate community that doesn't require everyone to participate won't work quite the same as a community that does require participation from each member (though that's again the topic of the third article in this series).
7, "The Working Group".
This community size probably runs from about 4-9 members, but 7 is a pretty good average, and one that shows up in multiple studies. This number may well relate to the general rule of seven (original paper), which suggests that 7 is a number that the brain can easily and intuitively comprehend.
It has become increasingly clear that a tightly-knit group of 7 is the first group size which is truly an optimal community size. Groups below this size can function effectively, but risk not having enough manpower to deliver a result that everyone is happy with, or having insufficient viewpoints to avoid groupthink.
Seven is not only an optimal size for a wide variety of corporate and government committees, it is also a healthy size for a small business and even a good size for a party of close friends. More importantly, 7 is a very comfortable group size as it "feels" relatively natural. At this size members find it easy to get to know the other members of the group, and they're able to function well together in a very intuitive and organic fashion.
An interesting example of this group size is the modern infantry "squad", which consists of two fire teams of 4 people, and a squad leader, for a total of 9 people. Each fire team is large enough to function on its own, but together the group of 9 can still have effective small group dynamics.
It is typically at this size that the first signs of leadership in a group informally emerge, but the leadership usually isn't overbearing at this level, nor does there tend to be any rebellion against it — perhaps because the group may be too small to elicit multiple leaders.
13—"The Judas Number". A group size of 13 doesn't represent a threshold ideal value, but rather a threshold nadir. It is one of the points where groups can change behavior and risk becoming dysfunctional. There's one of these nadirs beyond every group threshold, where the previously harmonious group dynamics become more difficult. I've chosen to highlight this specific number because it's a point that small communities often hit, particularly as entrepreneurial organizations try to grow above their startup beginnings.
(I should note that 13 isn't a precise number, but rather one offered because it's in the right range and because it's poetically easy to remember. The exact number occurs somewhere between 9 and 25, but I suspect it is worst in the range of 12-15.)
In a group of this community size no one ever feels like they get a fair share of time. Studies show that at this size participants underestimate the amount of time they contributed to the conversation, and thus will come out feeling like they were unfairly ignored despite having a fair share of the conversation. Groups of this size risk people being lumped into categories and ceasing to be trusted as individuals. In addition, problems start with the development of "too many chiefs," yet there is not enough variety of non-chiefs for them to direct. Furthermore, multiple leaders may struggle for hierarchical status, increasing the conflict in an already troublesome group.
If your community is unfortunately stuck at this nadir, one of two things usually occurs.
Most commonly, the group shrinks. This could be because participants unhappy with the group dynamics abandon it; or it could occur in a more organized way with the unwieldy large group breaking into two or more smaller groups. For example, a terrible group of 13 could become two more functional groups of 6 and 7.
Alternatively, more energy could be expended. This could be in the form of more formal organization, rewards for participation, or more time to be casual and socialize in order to shake off the tensions of this size group. Though these efforts don't usually change the size of the group, they can improve its dynamics.
Energy could also be spent to help push the group up to the next threshold, though this growth could also occur naturally, for example if the group focuses on a topic of particular interest that causes new people to continually be added. In addition, growing a group to a new threshold often requires the efforts of more than one leader.
A group size of 13 isn't necessarily bad, just more difficult. Anthropological studies show that primitive hunting tribes often temporarily broke into "bands" of this size — my presumption is that the value of having that many people hunting together outweighed the social costs of the group. It is interesting that most juries are made up of groups this size. I believe that the social dynamics of this size of group with all new members creates some tension among the jurors, which may serve justice to make sure that all sides are considered by the jury without falling into groupthink. However, from my experience, the interpersonal conflict in a jury can also slow down the deliberation process and cause much frustration among the participants.
50—"The Non-Exclusive Dunbar Number". More properly this group size falls in the range of 25-75 participants, but it seems to feel the most natural in the range of 50-60. Studies of the sizes of guilds in online games support this hypothesis. For instance, based on graphs of the guild sizes in Ultima Online, groups have a median of 61 members. Similar numbers hold true in studies of a more recent game, World of Warcraft.
I call this value the "Non-Exclusive Dunbar Number" because it matches the lower end of a threshold that Robin Dunbar set for group sizes. However, at this size it applies to mostly non-exclusive groupings, which includes the above mentioned online guilds, many employee communities, and the majority of social gatherings that manage to rise above the size of a Working Group. Groups of this size can be serious or take up a lot of time, but in general they are not exclusive — they don't tend to be the only group that individual participants are involved in.
90—"The Dunbar Valley". As Non-Exclusive Dunbar Number communities grow, they reach a point where increased time obligations and the noise of socialization required to keep the group cohesive require a much more serious commitment from the participants. Like the Judas Number, the Dunbar Valley is a threshold nadir where more energy is required to keep a tightly-knit community together; either the community agrees to a higher level of commitment and grows to the next level, or the community splits apart.
I've found this to be true when growing a small business, where it is too small for any middle-management, but the sub-groups are too large for one person to manage effectively. I've also seen this with more ephemeral groups, such as when a small conference that worked well at 60 participants tries to grow and finds that at 100 participants it can't sustain a high enough intimacy level.
Another illustration of the Dunbar Valley is the history of the ancient Roman "century", a grouping that was originally 100 soldiers. However, as the years went by, centuries tended to decrease in numbers to only include 70 or 80 soldiers. This might well be due to Non-Exclusive Dunbar constraints: even in a very devoted group of military men, there was still the need for relationships with other century groups, with support staff, and with camp followers, ultimately lowering the attention that could be spent on the century itself.
150—"The Exclusive Dunbar Number". Robin Dunbar got much of the discussion of group thresholds started with his article, "Co-Evolution Of Neocortex Size, Group Size And Language In Humans." However, as I've written previously, and as I've described in this article, Dunbar's group threshold of 150 applies more to groups that are highly incentivized and relatively exclusive and whose goal is survival.
Dunbar makes this obvious by the statement that such a grouping "would require as much as 42% of the total time budget to be devoted to social grooming."
The result of the grooming requirement is that communities bounded by the Exclusive Dunbar Number are relatively few. You will find hunter/gatherer and other subsistence societies where this is a natural tribe size. You'll also find these group sizes in terrorist and mafia organizations.
Clearly, as we step up toward higher group thresholds, more and more time is required simply to keep the group going. You see this in depictions of mafia life: in the TV series The Sopranos, a lot of time is spent dining, hanging out, and drinking together. That is part of the 42% social grooming time required for so intense a survival group.
It is possible for a large company to force groups up to this size by expending lots of energy (which is to say money) to keep it healthy. Apple did this during the invention of the Macintosh, the first OS X operating system, and the iPhone, but the intensity required of such large teams is not sustainable for long periods of time.
Without that extra energy, few modern tightly-knit communities can reach this threshold, or else they can't hold it for very long. Instead they fracture into groups of individual interest (even if they continue to "meet" in the same real-world or online forum), which are more than likely to be bounded by the Non-Exclusive Dunbar number.
Given the difficulty in even arriving at the Exclusive Dunbar number, it may well be the highest limit of all for a tightly-knit community. Beyond this limit, communities are less cohesive, less trusted, and less participatory (and are the topic of my third article in this series).
Conclusion
There are many different ways to measure groups, and one is by counting their members. As I've discussed here, the number of members can have a huge impact on whether communities are successful or not. Thus, as community organizers, social software engineers, game designers, or sociologists interested in community dynamics, we must ultimately consider group thresholds and group nadirs in order to understand how to create cohesive communities, rather than groups that fly apart.
In my next article I'm going to talk about thresholds that are personal, rather than group-oriented.
Some other posts about the Dunbar Number and group size issues:
- 2004-03: The Dunbar Number as a Limit to Group Sizes (also some really good comments)
- 2005-02: Dunbar Triage: Too Many Connections
- 2005-03: Dunbar, Altruistic Punishment, and Meta-Moderation
- 2005-07: Cheers: Belongingness and Para-Social Relationships
- 2005-08: Dunbar & World of Warcraft
- 2005-10: Dunbar Number & Group Cohesion
- 2008-11: Community by the Numbers, Part II: Personal Circles
My bookmarks to various papers and websites on this topic are available at delicious.com/ChristopherA under some of the following tags:
- group threshold - everything I have on the topic
- workinggroup - on small groups such as committees
- dunbar number - on larger groups such as tribes
If you have any links on this topic that you would like to share with me, tag them for:ChristopherA and I'll take a look.
Many thanks to Shannon Appelcline and F. Randall Farmer for their assistance with this series.
Posted on September 24, 2008 at 01:53 PM in Social Software, Web/Tech
New Blog for Ephemera
This blog has been quiet lately as I've been doing a lot of work in the last year on the iPhone. I've been speaking at conferences like eComm 2008 (presentation, video from panel) and writing a book on the iPhone with my co-author Shannon Appelcline called iPhone in Action: Introduction to Web and SDK Development (first two chapters free). I am also one of the organizers for the upcoming iPhoneDevCamp 2, a MacHack-style conference on August 1st-3rd, and I am working on some social software apps for the iPhone.
I've been reluctant to post too many of these off-topic posts on this blog, as I like the length, quality, and high signal-to-noise ratio of the posts that I've written for it. I do plan to continue to offer more social software and social media posts, including followups to my Collective Choice articles, as well as updates on my popular Dunbar Number posts. But I don't want to spam my readers here with notes on iPhone Apps, my thoughts on movies, circuses, the odd conferences I speak at, etc.
So I've been trying to figure out how to cover some of the other things that I am up to without losing my readers here, and I decided to create a new blog for my shorter and more transitory thoughts: Christopher Allen's Ephemera Blog. So far I have posted:
- Max 9 pages, thus Max 144 Apps on iPhone OS 2.0
- First Look at AirMe App for iPhone
- Trying Out TypePad's Blog It Web App
- iPhone TypePad App Second Pass
- iPhone TypePad App First Look
- Ephemera Blog
The RSS feed for this blog is here.
If you are interested in some of the other things I am looking at, you can follow my del.icio.us bookmarks, or read my Google Shared Items, watch my Twitter posts, or see them all combined together in either FriendFeed or Plaxo Pulse.
Posted on July 13, 2008 at 01:22 PM in Weblogs
In Seoul for the Social Web
I'm in Seoul, South Korea this week for the 13th Global Forum on Business Driven Action Learning and Executive Development, where I'm presenting on the topic of how to get involved with the Social Web.
My presentation is an offshoot of an odd sideline of mine, executive blog and social web coaching. Basically, many times over the last couple of years I've been asked by colleagues and friends to help them with the social web. I've always been honored to spend the time to teach them how to blog, better manage the noise of the web by using a feed reader, how to participate in social networks, and how to improve their personal brand.
Increasingly, their peers are now asking me to do the same for them. So I've been doing this more and more over the last year on a consulting basis, as their social networks appear to be saying that I have a lot to offer.
I've twice now been out to Cincinnati where I've been working with Drew Boyd at Johnson & Johnson, where I've been coaching him on his blog and personal brand. He has been blogging now for almost six months at Innovation in Practice with some success. Now he is referring me to other executives at Johnson & Johnson who need similar coaching, or need strategic advice on how to deal with specific social web initiatives within Johnson & Johnson.
Most recently, Drew has asked me to join him in Seoul to speak at the Global Forum, which is a worldwide gathering of executive trainers. There I am presenting a synopsis of what I teach, and how they can begin to learn how to teach their staff how to do the same. My trip has been sponsored by Johnson & Johnson, so it has been a lot of fun to be here. I get to both dive into deep discussions of methodologies for teaching inside business, as well as learn about the Korean and other international business cultures and how they are different from the United States.
During the evenings I've had a chance to wander the streets of downtown Seoul. It is very different than wandering through streets in Europe, not only because of the different culture and language, but also because all of the signs are in Hangul, the Korean character set. Very few signs use the Roman alphabet, so it is often very difficult to figure out what is what without walking in. Like Japan, there are some signs in English to represent a brand or a style, but not many. Even China has more signs in English than Korea does. The last time I was in China I was surprised by how almost every important sign was in both Chinese and English, so much so that I was beginning to learn Chinese characters just by association! Despite the language problems everyone is nice and it feels safe to wander in these neighborhoods.
I return on Monday, and I'll make a copy of my final presentation available here.
Posted on June 25, 2008 at 01:18 AM in Weblogs
iPhoneDevCamp and Hack-a-Thon
I feel privileged and honored to have been part of the iPhoneDevCamp this last weekend. Over 380 iPhone developers came out to the Adobe Campus in San Francisco to help each other make the best possible web pages and webapps for the iPhone.
I was the keynote speaker on Saturday and Master of Ceremonies for the MacHack-style Hack-a-Thon Demo on Sunday.
At the Hack-a-Thon almost 50 iPhone web applications were demonstrated to an enthusiastic audience. Take a look at Tilt, a game that takes advantage of the iPhone's motion sensor, PickleView, which is a same-time live baseball game enhancer, and The Pool, an attractive social game of water droplets hitting a pool. What is remarkable about these applications is not just the quality, but that each of them was written over just the weekend by a small team of 3-4 people who hadn't met each other before Friday!
Prizes were awarded after the Hack-a-Thon based on the spirit of openness, contribution, sharing, and participation. Prizes included 3 iPhones and some very expensive Adobe software. In particular Joe Hewitt, of Firebug fame, was honored for his positive contributions, generous spirit, and wonderful iPhone UI example code. During the demonstrations, more than one person praised Joe, saying that his assistance, his code, or his debugger made their apps possible. Personally, I think about one-third of the web apps presented used some of his code.
Building on my experience with the same-time collaboration tool SynchroEdit, and the Skotos web-based games, I worked remotely with Kalle from Sweden and Erwin from Kansas to present an AJAX chat application called iLace. I am particularly proud of how well this little web application performs and how well it works using the iPhone UI. In particular, I think its melding of text entry and chat message receipt and its response to changes between portrait and landscape modes are very good examples of what can be done for chat on the iPhone. Source code is available!
My keynote presentation slides are now available in .pdf and .mov. I'm told a live recording of the session and an .mp3 will be available soon.
Over the last few weeks an online developer community that I started at WWDC called iPhoneWebDev has grown to over 650 members. It's now the best place to get online support for building iPhone web pages and webapps. I'd like to keep the momentum from the iPhoneDevCamp going forward on this list, so if you are interested in developing for the iPhone, check out the example code and join the discussion today!
Posted on July 8, 2007 at 11:19 PM in Games, iPhone, User Interface, Web/Tech
Getting Ready for the iPhone
I've been excited about the web capabilities of the upcoming iPhone for some time. As a reluctant laptop user ("oh, my aching shoulders"), there is real appeal to me in a better portable web browser. I have tried most of the PDA and cellphone browsers to date, and none offer more than a poor cousin to the web that we experience on the desktop.
Instead, the iPhone offers a desktop-class browser. There is no transcoding, nor any subset of HTML such as WML. Full web pages are rendered in the small display, and when you "double-tap" with your finger the section you touch is expanded to a more readable size. The video available at the Apple website shows this capability in use.
Because of the iPhone's upcoming June 29th release, I decided to participate in this week's Apple WWDC conference for Macintosh developers. A number of announcements about the iPhone were made there, and a number of technical sessions on the iPhone and iPhone-related technologies were held. Together, the iPhone demonstrations at the public keynote and elsewhere at WWDC offered some real promise for when the phone is released.
The biggest announcement at the public keynote was that there will not be an SDK for building native iPhone apps; instead, the only way for third parties to get involved is to create web applications optimized for the iPhone. This came as a big disappointment to the majority of developers participating at WWDC. However, as someone who has been involved lately in creating AJAX/Web 2.0 apps, I was less unhappy.
The other significant announcement at the keynote was that a Safari 3.0 beta for both Mac and Windows was being released and that a third Safari platform would be released on June 29th: inside the iPhone. This means that web 2.0 applications created to work with Safari on the Mac will likely also work on the iPhone.
Since SynchroEdit, an open-source simultaneous web editor (in the style of SubEthaEdit) for Firefox that I produced last year, is one of the most sophisticated AJAX/Web 2.0 applications, I dug deeper at various WWDC sessions to see if it might be possible to make SynchroEdit work on the iPhone.
One of the biggest things that SynchroEdit needs in order to function is DOM Mutation Events. At a party for WebKit (the open source code underpinnings of Safari's web renderer) and in questions after a session at WWDC it was confirmed that these are available to Safari 3.0 and presumably the iPhone.
The other key ability that SynchroEdit requires is WYSIWYG editing. This was terribly broken in Safari 2.0, but I saw many demonstrations of it working in Safari 3.0, so I don't anticipate any problems with this.
SynchroEdit also requires AJAX and in particular the XMLHttpRequest function, and the keynote clearly said that this was available.
The final thing that SynchroEdit needs is the ability to keep the browser at readystate==3, i.e. not "finish" sending the page, so that we can continue to interactively pass updates to users as they arrive, without creating a new connection for every message. It is not clear if this will be supported on the iPhone, but there are ways to work around it.
So, in principle, it appears that we should be able to make SynchroEdit work on the iPhone. I am not sure that many iPhone users need SynchroEdit, but as an example of a very sophisticated web technology that should work on that platform, it shows the potential for what might be possible.
Because of this technological capability I've decided to begin investigating what types of social software apps could be highly useful on the iPhone and aren't being served by the existing web 2.0 community. I am also going to continue investigating the technical issues of developing web apps for the iPhone.
If you are interested as well, I invite you to participate in the new iPhoneWebDev community. It should be a great resource for everyone interested in getting in on the ground floor with this new web technology. I have also begun tagging relevant web pages in del.icio.us with the tag iphonewebdev—I hope that others will begin to use this tag as well.
I have quite a bit more I'd like to write about specific iPhone technology, but unfortunately I have to wait until the WWDC confidentiality expires on June 29th with the release of the iPhone, so keep an eye out here for more details.
Posted on June 15, 2007 at 08:06 PM in iPhone, Social Software, User Interface, Web/Tech
Collective Choice: Experimenting with Ratings
by Christopher Allen & Shannon Appelcline
[This is the fourth in a series of articles on collective choice, co-written by my colleague Shannon Appelcline. It will be jointly posted in Shannon's Trials, Triumphs & Trivialities online games column at Skotos.]
Last year in Collective Choice: Rating Systems we took a careful look at eBay and other websites that collect ratings, and used those systems as examples to highlight a number of theories about how to make rating systems more useful.
We suggested three main methods for improving rating systems:
Granular Ratings: Based on the clumping of ratings to high values, we believed that ratings could be made more useful by increasing the size of a rating scale. Most rating scales are 5-point ranges, so we suggested a 10-point range instead.
Distinct Ratings: Raters can be somewhat arbitrary in how they rate items, varying both from each other and even from themselves (usually over multiple sessions). Thus we believed that providing explicit statements of what each number meant could improve ratings.
Statistical Ratings: Finally we stated that in low volumes ratings could be biased by various quirks of data entry, either malevolent or not, and that ratings could be improved with strong statistical methods being used to polish up data and automatically keep "bad" data in line with "good".
In the year since we wrote that article we've decided to practice what we preach and have rolled out an entirely new rating system called The RPGnet Gaming Index. We've applied all of the above theories and thus far it looks like they're not only working, but that they're actually providing better rating systems than previous ones we've used at the RPGnet site.
In this article we're going to step through the data we've collected from this experience and see how it applies to our theory: first by looking at our previous RPGnet rating system, then by looking at the new system, and finally by examining the data from these two systems and comparing their results. We've also run into some unexpected troubles along the way, and we'll talk about those too.
The RPGnet Reviews System
RPGnet is our gaming site for tabletop roleplaying—games like Dungeons & Dragons and Vampire: The Masquerade. We purchased it in 2001 from the original owners. One of the benefits of RPGnet was that it had a very large community. As of today it sports one of the top-100 forums on the Internet, with over 1000 simultaneous users regularly logging in. However, because of its maturity, we also inherited many existing systems.
One of these was the RPGnet Reviews System which gave individual users the ability to review gaming products—mostly role-playing games, but also board games, books, DVDs, and a smattering of related products.
Most of these reviews are submitted by average readers who just want to talk about a product that they like (or don't), though a fair percentage are instead submitted by staff reviewers. (Overall at least 26% of our reviews are based on publisher "comp" copies, and thus may be considered largely professional, while the other 74% may or may not be.) The large community size of RPGnet applies to the Reviews System as well: currently it features 8,505 published reviews.
Looking at the RPGnet Reviews through our three filters we find the following:
Granularity. The ratings from our existing reviews aren't as granular as we'd like. We have a theoretical scale of 2-10, but that's based upon a Style rating of 1-5 and a Substance rating of 1-5.
| Rating | Style | Substance | % |
| 1 | 81 | 225 | 1.8% |
| 2 | 732 | 651 | 8.1% |
| 3 | 2364 | 1777 | 24.3% |
| 4 | 3618 | 3525 | 42.0% |
| 5 | 1709 | 2326 | 23.7% |
Approximately 90% of ratings fall in the 3-5 range, and thus our scale is more limited than the 2-10 range would indicate. Further, 42.9% of reviews rate Style and Substance exactly the same, suggesting that not everyone sees a difference between these two elements. On the whole this scale isn't as bad as a single 5-point scale, but it also isn't a real 10-point scale, and the two orthogonal types of comparison don't necessarily provide a coherent description of a product.
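As a quick check, the percentage column and the 90% figure can be reproduced from the counts in the table. This is only a sketch: it assumes the % column pools the Style and Substance counts together, which is our reading of the table rather than something stated outright, and the variable names are made up for the example.

```python
# Counts copied from the Style and Substance columns of the table above.
style = {1: 81, 2: 732, 3: 2364, 4: 3618, 5: 1709}
substance = {1: 225, 2: 651, 3: 1777, 4: 3525, 5: 2326}

# Assumption: the % column pools both kinds of rating into one total.
total = sum(style.values()) + sum(substance.values())

# Share of all ratings given at each score, 1 through 5.
share = {score: (style[score] + substance[score]) / total for score in style}

# Share of ratings that land in the 3-5 range.
high_share = sum(share[score] for score in (3, 4, 5))

for score in sorted(share):
    print(score, f"{share[score]:.1%}")
print("3-5 range:", f"{high_share:.1%}")
```

Under that assumption the per-score shares round to the values in the % column, and the 3-5 range comes out at roughly 90%.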
Distinctiveness. Conversely, the review ratings are fairly distinct because the Review System provides an explanation of what each rating number means. For example the five Substance ratings are: I Wasted My Money (1); Sparse (2); Average (3); Meaty (4); Excellent(5). The descriptions could be better, but hopefully they connect to some users in meaningful ways, and help them to rate consistently.
Statistics. Our review ratings have no statistical basis. These values are used entirely unfiltered.
On the whole, the existing RPGnet Reviews embodied slightly less than half of what we wanted to see in a rating system: some improvement over a simple 5-point scale; some effort put into making individual ratings distinct; and nothing statistical.
There is room for improvement, however, as we'll see when we analyze this system more fully.
The RPGnet Gaming Index
Our newer system is the RPGnet Gaming Index. It doesn't supersede our Reviews, but instead offers a complementary look at the roleplaying field. The Index is essentially an RPG industry database. It contains individual entries for many different gamebooks—currently 5248—and allows registered users to rate each of them. Those ratings are then turned into averages by various mathematical formulas on a nightly basis and the roleplaying games in our index are then ranked.
The large size of RPGnet has allowed us to very quickly turn our ideas of a Gaming Index into reality. Just six months after release we have:
- 5248 well-written Index entries
- 5908 different editions
- 4240 authors
- 4478 covers
- 360 different game systems
- 345 series
- 10142 individual ratings
Most of the ratings are clumped around the best and worst games, with many less popular games unrated as of yet. Four different items have at least 80 ratings each (Call of Cthulhu, Exalted, Nobilis, and Unknown Armies). Our average rating is 6.79. Ratings above 7.82 are in the 99th percentile, ratings above 7.21 are in the 90th percentile, and ratings below 6.53 are beneath the 10th percentile.
(For more info on the creation of the RPG Index, and how to encourage user generated content, see Shannon's articles, "Managing User Creativity", Part One and Part Two.)
The RPGnet Index also handles some unusual situations, such as when a game book contains other game books as part of an anthology or compilation. For instance, the 8-book compilation In Search of Adventure has a composite rating of 6.57 which is partially based upon the individual adventures that make it up.
Granularity: The first thing we did was provide a 10-point scale for this new system.
Distinctiveness: We also made sure each point of the scale was clearly defined. Currently the points of our scale are: Worthless (1), Poor (2), Some Flaws (3), Almost Average (4), Average (5), Above Average (6), Good (7), Very Good (8), Outstanding (9), and One of the Best Ever (10).
We made some mistakes in our original release of our "distinctive" titles, and we discovered this had real effects on the user input, telling us that these title labels are meaningful to users.
First, we initially labeled 6 as "average", to mirror the rating system for our existing Reviews, rather than setting 5 to be average. But as we noted in our first article, people like to be nice, and thus they tend to rate on the good side of a scale. Changing the label for our definition of average from 6 to 5 has slowly started dropping the average of all ratings down as a result (providing more breadth, a topic we'll talk about more shortly).
Second, two of our original distinctive titles were at odds with the others. Our original "2" value said that the game had "a few useful elements" and our original "9" value said that it was the "best of the year". The 2 was much more specific than any of our other terms and the 9 created a comparative query that was very different from anything else. Overall our ratings conformed to a bell curve centered between 6 and 7, but we saw very clear dropouts in our curve at 2 and 9, telling us that we'd made mistakes in those terms, and that people were less willing to use them as a result. Since we've made the change to our current set of titles those two discontinuities have disappeared.
Statistics. Finally, we fully integrated statistics into our new Index by using two main methods: bayesian weights and trust.
We explained bayesian weights pretty fully in our previous article. Here's what we said then:
The idea behind a bayesian average is that you normalize ratings by pushing them toward the average rating for your site, and you do that more for items with fewer ratings than those with more ratings. The basic formula looks like this:
b(r) = [ W(a) * a + W(r) * r ] / [ W(a) + W(r) ]
r = average rating for an item
W(r) = weight of that rating, which is the number of ratings
a = average rating for your collection
W(a) = weight of that average, which is an arbitrary number, but should be higher if you generally expect to have more ratings for your items; 100 is used here, for a database which expects many ratings per item
b(r) = new bayesian rating
Say three "shill" users had come onto your site and rated a brand new indie film a "10" because the producer asked them to. However, you use a bayesian average with a weight of 100, and thus 3 ratings won't move the movie very far from the average site rating of 6.50:
b(r) = [100 * 6.50 + 3 * 10] / (100 + 3)
b(r) = 680 / 103
b(r) = 6.60
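For readers who prefer code, here is a minimal sketch of that formula. The function and parameter names are our own for the example; only the arithmetic comes from the formula above.

```python
def bayesian_rating(item_avg, item_count, site_avg, site_weight):
    """Pull an item's average rating toward the site-wide average.

    item_avg    -- r, the plain average rating for the item
    item_count  -- W(r), the number of ratings the item has received
    site_avg    -- a, the average rating across the whole collection
    site_weight -- W(a), the arbitrary weight given to that average
    """
    return (site_weight * site_avg + item_count * item_avg) / (site_weight + item_count)


# The "shill" example: three ratings of 10 against a site average of 6.50.
shilled = bayesian_rating(item_avg=10, item_count=3, site_avg=6.50, site_weight=100)
print(round(shilled, 2))  # 6.6
```

Lowering `site_weight` to 25 with the same inputs gives 6.875, so a smaller weight lets a handful of ratings move an item further from the site average.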
We implemented bayesian weights exactly as we'd detailed, but with a lower weight of 25. Since then we've accrued over 10,000 ratings in the database, and we can probably start thinking about cranking that weight up, another topic we'll return to.
Our trust-based algorithms suggest that some ratings are better than others, and should thus be more trusted (and thus more weighted when we calculate the average rating of an item). Though bayesian weights have been used before, we're not aware of other systems that weight ratings based on trust.
The calculation of trust is very simple:
Weight = 0 if #ratings(user) <= 2
Otherwise Weight = #ratings(user) / 50 to a maximum of 2
Weight *= 2, to a maximum of 4, if the user included a comment
This was based on the idea that the average good rater would rate 25 different items and the average great rater would rate at least 50. Additionally, we believed that ratings with comments were more likely to be thoughtful than those without.
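The three rules above can be sketched as a single function. This is one literal reading of the rules, not the code running on the site, and the function and argument names are made up for the example.

```python
def trust_weight(user_rating_count, has_comment):
    """Weight for one rating, based on how much the rater has contributed.

    user_rating_count -- total number of ratings this user has entered
    has_comment       -- True if this particular rating includes a comment
    """
    # Users with two or fewer ratings are not counted at all.
    if user_rating_count <= 2:
        return 0.0
    # Weight grows with activity, capped at 2.
    weight = min(user_rating_count / 50, 2.0)
    # A comment doubles the weight, capped at 4.
    if has_comment:
        weight = min(weight * 2, 4.0)
    return weight


for count, commented in [(2, True), (25, False), (50, True), (100, False), (500, True)]:
    print(count, commented, trust_weight(count, commented))
```

Read this way, a user needs 100 ratings to reach the uncommented cap of 2, and the cap of 4 is only reachable with a comment.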
That, overall, is a quick picture of what we've done with the RPGnet Gaming Index. Some of these ideas were laid out from the start, and others have been tuned as we progressed.
So how did we do, particularly in comparison to our existing RPGnet Reviews System?
The Comparison
One of our goals in improving rating systems has been to widen the range of possible input. As we noted earlier we discovered that 90% of our RPGnet Reviews Ratings were in the 3-5 range, and only 10% in the 1-2 range.
Generally, we can measure the success of widening a range by seeing whether the average rating of a database moves toward the true average. For the purposes of a 10-point scale from 1-10, that's a desired value of 5.5. That generally means we're looking for our average rating to decrease because people tend to rate high.
The following table compares the average results of Reviews ratings and Index ratings.
| Database | Average |
| Converted Reviews | 7.25 |
| Massaged Reviews | 7.29 |
| Unweighted Index | 7.10 |
| Weighted Index | 6.78 |
Here's what the categories in the above chart represent:
Converted Reviews: The Style + Substance of the Reviews, converted from its 2-10 scale to a 1-10 scale:
$rating = average($style) + average($substance);
$rating = ($rating * 1.125) - 1.25;
Massaged Reviews: The Style + Substance of the Reviews, with Substance given double weight over Style because we think that more closely reflects the intentions of the reviewer, converted from its 2-10 scale to a 1-10 scale:
$rating = (average($style) + 2*average($substance))/1.5;
$rating = ($rating * 1.125) - 1.25;
Unweighted Index: Index ratings exactly as users have entered into our Gaming Index:
$rating = average($index-rating);
Weighted Index: Index ratings adjusted by the weight of each individual rating, which is based on user trust and inclusion of comments:
$rating = average($index-rating*$index-weight)/average($index-weight);
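The formulas above are shorthand, so here is a small Python sketch of the same arithmetic. The function names are our own, and the inputs are assumed to be already-averaged Style and Substance scores (for the Reviews) or parallel lists of ratings and trust weights (for the Index).

```python
def to_ten_point(combined):
    """Map a combined 2-10 score onto a 1-10 scale (2 -> 1, 10 -> 10)."""
    return combined * 1.125 - 1.25


def converted_review(avg_style, avg_substance):
    """Style plus Substance, rescaled to 1-10."""
    return to_ten_point(avg_style + avg_substance)


def massaged_review(avg_style, avg_substance):
    """Substance counted double, brought back to 2-10, then rescaled to 1-10."""
    return to_ten_point((avg_style + 2 * avg_substance) / 1.5)


def weighted_index(ratings, weights):
    """Trust-weighted average; returns None if no rating carries any weight."""
    total_weight = sum(weights)
    if total_weight == 0:
        return None
    return sum(r * w for r, w in zip(ratings, weights)) / total_weight


print(converted_review(1, 1), converted_review(5, 5))  # 1.0 10.0
print(weighted_index([10, 6], [0.0, 2.0]))             # 6.0
```

Dividing the average of rating times weight by the average weight is the same as dividing the two sums, which is why `weighted_index` works with sums directly. In the last line a rating of 10 from a user with zero trust has no effect on the result.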
Our average rating, which is our criterion for success, decreased somewhat from the Reviews System to the Gaming Index, and it decreased much more dramatically when we introduced our trust systems.
The following chart shows a typical example of how review and index ratings differ, using the venerable Dungeons & Dragons Player's Handbook as an example:


For this book the median rating from reviews-only is 8, and the median from index-only is 7. A one-to-two point drop in median rating from reviews to index was consistent in all of our most-rated games other than those which were rated a "10" in both places.
We believe that this initial success of our unweighted Gaming Index can be attributed to the slightly better granularity (a 10-point scale versus two 5-point scales) and our improved distinctiveness (based on better naming of the rating levels). Whether this holds true will ultimately be played out as the Index grows.
However we have no doubt that our statistical approach to the index data, when we moved from our unweighted Index to our weighted Index, is providing even better results. We had theorized that users who input more and who include comments would provide "better" data, and by our criteria of the average of the ratings moving toward 5.5 that seems to be borne out. The following table looks at the information a bit more precisely, by comparing average ratings as total number of ratings increases over several ranges:
| # of Ratings | Average w/Comment | Average w/o Comment |
| 1-2 | 8.55 | 8.88 |
| 3-24 | 8.08 | 8.16 |
| 25-49 | 7.32 | 7.11 |
| 50-99 | 7.14 | 7.03 |
| 100+ | 6.17 | 6.99 |
This table fairly definitively shows that base maxim: that the breadth of the ratings, and thus their quality, increases the more ratings a user makes. The improved quality of ratings with comments is less definitive. Among the vast mass of users the two values are pretty close, and sometimes the reverse of what we expect, but for the best and the worst users, ratings with comments seem to be better than those without. This latter point is another one that we'll have to continue to monitor as the Index grows beyond its current total of 10,000 ratings.
The other major element of our statistical approach to the Index is our bayesian weight. The following chart shows a top-ten chart for roleplaying games calculated via four different methodologies: our Reviews; our Index with no weighting; our Index with a 25 bayesian weighting (as it currently stands); and our Index with a 50 bayesian weighting:
| # | Reviews-Only | 0-weight Index | 25-weight Index | 50-weight Index |
| 1 | Delta Green: Countdown | The Chronicles of Talislanta | Delta Green: Countdown | Delta Green |
| 2 | Nobilis | Wildside | Spirit of the Century | Delta Green: Countdown |
| 3 | Castle Falkenstein | Devil's Due | Delta Green | Unknown Armies |
| 4 | Vimary Sourcebook | Lodges: The Faithful | Unknown Armies | Call of Cthulhu |
| 5 | Liber Servitorum | Apocalypse | Call of Cthulhu | Nobilis |
| 6 | Ork! | Earthdawn Gamemaster's Compendium | Nobilis | Spirit of the Century |
| 7 | GURPS Russia | Into the Badlands | Pendragon | Over the Edge |
| 8 | GURPS Reign of Steel | Earthdawn Player's Compendium | Over the Edge | Pendragon |
| 9 | Cudgel's Compendium | Chronicle of the Black Labyrinth | Mutants & Masterminds | Mutants & Masterminds |
| 10 | Corum | The Spell Book | Pulp Hero | Vimary Sourcebook |
We actually did do a little bit of statistical analysis on the Reviews. On our first try to produce this chart we got a random clump of reviews that were 5/5 from a much larger pool, so we further ordered them by descending total count of reviews; as a result you're seeing a better selection of ranked reviews than a truly unstatistical sampling would allow. We did the same for the unweighted Index (which clumped a number of results at "10"), except we further ordered items at the same weight by decreasing number of views (another statistical decision).
Clearly, deciding which of these lists is "right" is a much more subjective measure than the mathematical analysis we were able to apply to earlier problems. However, most roleplayers would tell you that the unweighted Reviews and Index lists are terrible. The top 5 items in the Reviews list actually aren't bad for a starting list of good games—but only because we did the aforementioned statistical ordering. Before that we just had a random listing of gaming items. Even with our attempts at quickie statistical analysis the unweighted Index is still quite bad, with only Talislanta regularly showing up on other "best" lists.
The problem is the ability of one person to come in and rate an item a "10" (or a "5"/"5"), thereby making that item more highly rated than any item which has an actual consensus of ratings. Of our unweighted top Reviews only the top three had more than 2 reviews and the rest had 2. Not surprisingly those top three were the best fits to a typical top-ten list. Of the unweighted Index only the top three had more than 1 rating, and the rest had 1. Our single good pick was in those top three.
Our 25-weight Index, which is what we currently use, has been generally accepted by the RPGnet community as a good marker of what's good and what's not. However, there have been two items on it which some percentage of people disagree with: Spirit of the Century and Pulp Hero. It's instructive to see that when we increase to a 50-weight Index, Spirit of the Century drops (even more notably than depicted here, because its actual rating goes from .01 behind first place to .16 behind first place) and Pulp Hero disappears entirely.
The questions of what to set your bayesian weight to, when to increase it, and what maximum value to set it to are all relatively unstudied, and thus we don't have good answers to them. As we pass 10,000 ratings we're considering upping the bayesian value to 50. We expect that 100 will be our ultimate value when the Index is fully mature. However, if we increase the weight too far, an older, less-rated game will never be able to get enough weight to get out of the doldrums.
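To make the effect of the bayesian weight concrete, here is a minimal sketch of one common form of Bayesian average, in which an item's real ratings are averaged together with `weight` phantom ratings placed at the overall mean. This is an assumption for illustration only: the exact formula behind the Index isn't given here, and the function name, the prior mean of 7.27, and the sample ratings are all hypothetical inputs.

```python
def bayesian_average(ratings, weight, prior_mean):
    """Average `ratings` together with `weight` phantom ratings at `prior_mean`.

    Assumed formula (a common Bayesian average), not necessarily the Index's own.
    """
    return (weight * prior_mean + sum(ratings)) / (weight + len(ratings))


PRIOR_MEAN = 7.27  # assumed overall mean rating, used only for illustration

one_vote = [10]        # a single person rating an item "10"
consensus = [9] * 40   # forty raters who all agree on a 9

for weight in (0, 25, 50, 100):
    print(
        weight,
        round(bayesian_average(one_vote, weight, PRIOR_MEAN), 2),
        round(bayesian_average(consensus, weight, PRIOR_MEAN), 2),
    )
```

With a weight of 0 the single "10" beats the forty-person consensus; with any substantial weight it no longer does. The same arithmetic shows the trade-off: as the weight grows, even the well-rated item gets pulled closer to the overall mean.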
Conclusion
We're by no means done with this ratings experiment. Though we're pleased and impressed with the growth of the RPGnet Index thus far, by next year we hope that the Index will include the vast majority of all games in print (as opposed to somewhat less than half now) and that our 10,000 ratings will grow to 50,000 or more. This will allow us to offer even more definitive answers to our questions.
In the meantime we're still mucking with our statistics and facing new problems. Some of the newest:
- What to do about drive-by ratings: Our trust algorithm does a good job of making drive-by ratings, where a publisher points his audience to an item in our site, mostly irrelevant, but there's some concern that they could have more effect in the long run.
- How to incorporate our review ratings in our index ratings: It seems a shame to waste the thousands of reviews that have been written -- and indeed currently they're calculated into a composite rating we use in the Index -- but we're realizing that people have very different purposes for writing reviews and inputting ratings, which may result in some of the upward skew we see on the review side of things. Ultimately we need to decide whether they're just too different or whether our statistical massaging is enough to incorporate those reviews into a composite Index rating.
- How to pick some of our numbers: As we already noted, we don't have good formulas for when to choose which bayesian weights. Likewise we've been guessing at which values to use for the trust-based weighting of our raters. Originally we set our desired rating count to 100 for good raters and 200 for great raters, but we've since dropped those to 50 for good and 100 for great based upon the real numbers of ratings that users were making. Again, we'd prefer to derive an actual formula for this type of calculation.
Shannon has discussed some of these issues more in his recent article More Thoughts About Ratings.
Despite unanswered questions, we still feel good about the basic ideas we laid out in our article last year. We have no doubt that giving our ratings a statistical basis has dramatically improved them and evidence thus far suggests that both granularity and distinctiveness have been helpful as well.
Related articles from this blog:
- 2005-12: Systems for Collective Choice
- 2005-12: Collective Choice: Rating Systems
- 2006-01: Collective Choice: Competitive Ranking Systems
- 2006-08: Using 5-Star Rating Systems
Related articles from Shannon Appelcline's Trials, Triumphs & Trivialities:
- #192: Managing User Creativity, Part One
- #193: Managing User Creativity, Part Two
- #196: Collective Choice: Ratings, Who Do You Trust?
- #198: Collective Choice: More Thoughts About Ratings
Posted on January 1, 2007 at 10:38 PM in Social Software, User Interface, Web/Tech | Permalink | Comments (1) | TrackBack (0)
Speaking about SynchroEdit at WikiWednesday
I will be speaking tonight at WikiWednesday on the topic of Same Time, Different Place Editing, and will be demonstrating SynchroEdit integration with MediaWiki and EditThisPagePHP.
If you are interested, see you tonight (Wednesday) at 6-8pm, at Socialtext.
Posted on December 6, 2006 at 02:34 PM in User Interface, Web/Tech | Permalink | Comments (0) | TrackBack (0)
Ratings: Who Do You Trust?
My colleague, Shannon Appelcline, has been working on a game rating system for RPGnet. This has resulted in real-world application of the principles for designing rating systems which we've previously discussed in our Collective Choice articles. Shannon's newest article, Ratings, Who Do You Trust? offers a look at weighting ratings based on reliability.

On the RPGnet Gaming Index we've put this all together to form a tree of weighted ratings that answer the question, who do you trust?
Here's how we measured each type of trust, and what we did about it:
Volume of Ratings for an Item. Introduce a bayesian weight to offset the variability of items with low-volume ratings.
Volume of Ratings by a User. Give each user a weight based on his volume of contribution which is applied to his ratings.
Depth of Content by a User. Give each rating a weight based on the depth of thought implicit in the rating which is applied to that rating.
These all get put together to create our final ratings for the Gaming Index, with each user's individual rating for an item getting multiplied by its user weight and its content weight, and then all of that averaged with the other user ratings and the bayesian weight too. The result is in no way intuitive, but users don't really need to understand the back end of a rating system. Conversely we hope it's accurate, or at least more accurate than would otherwise be true given the relatively low volume of ratings we've collected thus far.
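As a rough illustration of how these three kinds of trust could be combined, the sketch below multiplies each rating by its user weight and content weight, then averages the result together with the bayesian weight. It is only one plausible reading of the description above: the `Rating` class, the `index_rating` function, and the sample weights are hypothetical, and the real Index calculation may differ.

```python
from dataclasses import dataclass


@dataclass
class Rating:
    score: float           # the user's rating for the item
    user_weight: float     # trust earned from the user's volume of ratings
    content_weight: float  # trust from the depth of content behind this rating


def index_rating(ratings, bayesian_weight, prior_mean):
    """Weighted average of ratings plus `bayesian_weight` phantom ratings.

    Assumption: a rating's effective weight is user_weight * content_weight.
    """
    total = bayesian_weight * prior_mean
    weight_sum = float(bayesian_weight)
    for r in ratings:
        w = r.user_weight * r.content_weight
        total += w * r.score
        weight_sum += w
    return total / weight_sum if weight_sum else prior_mean


# The same "10" counts for more when it comes from a trusted, detailed rater.
trusted = index_rating([Rating(10, 2.0, 1.5)], bayesian_weight=25, prior_mean=7.0)
drive_by = index_rating([Rating(10, 0.5, 1.0)], bayesian_weight=25, prior_mean=7.0)
print(round(trusted, 2), round(drive_by, 2))
```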
Here are some of Shannon's earlier discussions about the design behind the new "user content" based RPGnet Gaming Index:
- Encouraging User Creativity - A look at the "XP" system which has helped to incentivize the creation of the database at the heart of the ratings.
- Managing User Creativity, Part Two - An examination of the nuts and bolts of RPGnet's Gaming Index database.
Related articles from this blog:
- 2005-12: Systems for Collective Choice
- 2005-12: Collective Choice: Rating Systems
- 2006-01: Collective Choice: Competitive Ranking Systems
- 2006-08: Using 5-Star Rating Systems
- 2007-01: Experimenting with Ratings
Related articles from Shannon Appelcline's Trials, Triumphs & Trivialities:
- #196: Collective Choice: Ratings, Who Do You Trust?
- #198: Collective Choice: More Thoughts About Ratings
Posted on September 14, 2006 at 04:28 PM in Games, Social Software, User Interface, Web/Tech, Weblogs | Permalink | Comments (2) | TrackBack (1)
Dunbar Number Presentation at MeshForum 2006
Last May I did an abbreviated version of my Dunbar Number talk at MeshForum 2006. An MP3 podcast of that talk is now available at IT Conversations or can be downloaded here (10mb).
If you'd like to follow along, here is a pdf copy of my presentation slides (10mb).
The biggest addition to what I've written about before is some discussion about different kinds of social software and what size groups they seem to be appropriate for.
Some other posts about the Dunbar Number and group size issues:
Posted on August 31, 2006 at 11:52 AM in Science, Social Software, Web/Tech | Permalink | Comments (0) | TrackBack (2)
Using 5-Star Rating Systems
In Collective Choice: Rating Systems I discuss ratings scales of various sorts, from eBay's 3-point scale to RPGnet's double 5-point scale, and BoardGame Geek's 10-point scale.
Of the various ratings scales, 5-point scales are probably the most common on the Internet. You can find them not just in my own RPGnet, but also on Amazon, Netflix, and iTunes, as well as many other sites and services. Unfortunately 5-point rating scales also face many challenges in their use, and different studies suggest different flaws with this particular methodology.
First, one study using Amazon data has shown that many undetailed ratings (where the rater isn't required to add any additional information other than the rating they select) show a bimodal distribution. In other words the distribution of ratings tends to cluster around two different numbers (e.g., 1 and 5) rather than offering a normal distribution where the ratings cluster around a single value (e.g., 3). Thus the median of these ratings is not an accurate reflection of product quality, but instead is a statement of conflicting opinions.
Second, our own study using RPGnet data has shown that many detailed ratings (where the rater does add additional information, in this case a full review) offer a normal distribution, but one that is biased toward the high end of the scale. On RPGnet, for example, we discovered that 90% of the ratings on this 5-point scale were 3 or higher, with an average around 4.
Randy Farmer of Yahoo suggests that this scale limitation is particularly troublesome for fan-based ratings, such as those found on episodic TV sites:
Only the fans of a show evaluate the episodes, and being fans, will never rate an episode one or two stars, ever. I've seen this attempted over and over on the net with the same results every time: Each episode of a show is 4-stars +/- .5 stars. This goes all the way back to the Babylon-5 website, probably the first source for this kind of data.
(And indeed, the TV episode TKO, from Babylon 5's first season, is considered an entirely atrocious episode by even the fans. Yet it has a 6.1 of 10 "Fair" rating on tv.com.)
Thus even when a bimodal distribution is not a problem, on a 5-point scale the upward bias often results in only 2 or 3 meaningful data points. This is problematic because it minimizes differentiation. In many cases, a 5-star rating system where most of the ratings are either 3 or 4 is actually no better than just a thumbs-up/thumbs-down rating system.
However, given that 5-point scales are probably here to stay, we are forced to make the best use of them we can.
First, we need to provide raters with incentives, so that they provide meaningful ratings. We've already seen that this can be done by requesting detailed ratings: when a person takes the time to write text, and knows that his name will be attached to it, he generally does a better job in his rating. There are other possible incentive techniques as well, such as RPGnet's new XP System.
Second, we need to provide means for a 5-point scale to become more meaningful by encouraging raters to use not just the top half of the scale, but the bottom half as well. One method to accomplish this is to make ratings distinct -- as I briefly mentioned in my previous article on this topic -- and encourage standards so that an "average" rating is 2 or 3, not 4.
As an example of how to accomplish both of these goals with already existing 5-point rating scales, I've detailed my own experiences with using ratings on two popular services -- iTunes and Amazon. By providing myself with incentives and making my use of ratings very distinctive, I have created more meaningful and useful output for myself.
Music Ratings - iTunes
Apple's iTunes software offers you the ability to rate individual songs with a 0-5 Star rating. If you use iTunes with an iPod, you can change the rating of a song on your iPod and the change will be reflected in your iTunes database the next time you sync your iPod. The "Shuffle Songs" feature available on more modern iPods has an option to have songs with higher ratings be played more often. A very powerful feature, Smart Playlists, can dynamically create sophisticated playlists based on ratings. All of this makes rating music on iTunes very useful.
After Shannon and I wrote our Rating Systems article, I examined the ratings in my iTunes catalog. Using Alastair's fabulous XSLT iTunes rating statistics tool, I discovered that the ratings I had created in iTunes were clearly biased too high, matching the pattern we'd described. I had far too many songs rated with 4 Stars, and almost nothing rated 1 or 2. This made my ratings less useful.
Here are some statistics from your iTunes Library: 4172 tracks, 412 (10%) rated

| | Number | % of rated | Cumulative % of rated (actual) | Cumulative % of rated (target) | Shortfall |
|---|---|---|---|---|---|
| Tracks rated 5 stars: | 112 | 27 | 27 | 5 | -22 |
| Tracks rated 4 stars: | 183 | 44 | 72 | 15 | -57 |
| Tracks rated 3 stars: | 92 | 22 | 94 | 50 | -44 |
| Tracks rated 2 stars: | 22 | 5 | 99 | 90 | -9 |
| Tracks rated 1 star: | 3 | 1 | 100 | | |
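The percentage, cumulative, and shortfall columns above can be reproduced from the raw track counts. The sketch below is my own reconstruction, not the statistics tool itself: it assumes the shortfall is simply the target cumulative percentage minus the actual one, and the function and variable names are hypothetical.

```python
def rating_stats(counts, targets):
    """Build (stars, number, % of rated, cumulative %, target %, shortfall) rows.

    counts:  {stars: number of tracks with that rating}
    targets: {stars: target cumulative % of rated tracks}
    Assumption: shortfall = target cumulative % - actual cumulative %.
    """
    total = sum(counts.values())
    rows, running = [], 0
    for stars in sorted(counts, reverse=True):  # 5 stars first, as in the table
        running += counts[stars]
        pct = round(100 * counts[stars] / total)
        cumulative = round(100 * running / total)
        target = targets.get(stars)
        shortfall = None if target is None else target - cumulative
        rows.append((stars, counts[stars], pct, cumulative, target, shortfall))
    return rows


counts = {5: 112, 4: 183, 3: 92, 2: 22, 1: 3}  # track counts from the table
targets = {5: 5, 4: 15, 3: 50, 2: 90}          # target column from the table

for row in rating_stats(counts, targets):
    print(row)
```

A large negative shortfall at the top of the scale is the signature of the upward bias: 72% of my rated tracks sat at 4 stars or better, against a target of 15%.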
So over the last few months I've completely revamped my iTunes ratings. Since I can't change the user interface, I've changed my behavior. I'm also taking advantage of two other fields: "checked" which I use to give more distinctiveness to my ratings, and "play count" which shows whether or not I've listened to something through to the end.
Here are the criteria I used:
Rated 5 - Exemplars
: Only my most favorite songs are rated 5. They have to meet the following criteria: they make me feel good or excite me no matter how often I listen to them, I can typically listen to them often without getting tired of them, and they are the best of their particular genre.
Rated 4 - Great
: There is only a small difference between a song that is rated 4 and one that is rated 5 in my ratings -- typically a 4 doesn't excite me or make me smile quite as much, or it isn't necessarily an exemplar of its genre. However, I can still typically listen to these songs often without getting tired of them. Items that are rated 4 and 5 are ones that I carry on my iPod Shuffle.
Rated 4 - Great (Unchecked)
: There are a few songs that I do consider to be great, but that I only want to play when I'm in the mood for them, or I want to only play in a specific order, or they "don't play well" with other music. For instance I love the song "The Highwayman" by Loreena McKennitt, however, it is over 10 minutes long and I just don't want to hear that type of song unless I'm in the mood for it. Other examples are the 12 songs that make up Mussorgsky's "Pictures at an Exhibition" -- I want them played in order when I do play them, and I really don't want them played in the middle of my other songs. Unfortunately, iTunes does not let you select only unchecked items, so I don't have a Smart Playlist for these; instead I keep them in a regular playlist.
Rated 3 - Good
: These are songs I like. Typically I can play them regularly but not too often. Songs rated 3-5 go on my iPod Nano.
Rated 3 - Good (Unchecked)
: There is a lot of music that I think is Good, but I don't want to play all the time. I have a large catalog of sound tracks from movies. All but a few of those tracks are in this category. Again, iTunes does not let you select only unchecked items in a Smart Playlist, so I have several regular playlists for these items.
Rated 2 - Ok
: I have very diverse musical tastes, starting with jazz, various ethnic and world music, and also including quite a bit of pop, rap, R&B, punk, and metal that I enjoy. I don't enjoy them all the time -- but I do like them to pop up every once in a while for variety. So I rate these 2 and leave them checked. I have an old 40GB iPod that I take on long trips, and it stores everything I have that is checked and rated 2-5.
Rated 2 - Ok (Unchecked)
: Some songs are OK, but I really have to be in the mood specifically for that song. Listening to Jimmy Buffett's "Margaritaville" can be a guilty pleasure on a lazy summer day at the beach, but it isn't something I want to regularly listen to. I have a number of special playlists for songs rated like this.
Rated 1 - Don't Like
: These are the songs that I don't like. They're just not my style. Many are still quality music; they just don't work for me. I do keep most of these for completeness -- it might just be one or two songs on the album, and I want to keep the album complete. Or I keep it in case my tastes change. But in general, once something is rated 1 Star, I'll probably never listen to it again.
Rated 1 - Trash (Unchecked)
: These are songs that not only do I not like, they just are not good music. I don't like most rap music, but I can tell that most are still quality. Some are junk -- these I rate 1 and uncheck, and are candidates for deletion the next time I purge my collection.
Unrated & Listened, play count > 0
: If I've listened to something through to the end, but haven't rated it yet, it shows up in this Smart Playlist. Periodically I check this Smart Playlist, sort by play count, and try to rate everything that I've listened to more than once.
Unrated & Unlistened, play count = 0
: This is the default when a new song is added to my library. So any song that is unrated, checked, and has a play count of 0 shows up in my "Unrated & Unlistened" Smart Playlist. When I'm in the mood for variety, I go through this playlist and rate songs.
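The criteria above amount to a handful of filter rules over three fields: rating, checked, and play count. Here is a minimal sketch of those rules as plain predicates. This is not how iTunes Smart Playlists are actually defined; the `Song` class and function names are hypothetical, and I'm assuming the Shuffle and Nano lists take only checked songs, since the unchecked ones live in regular playlists.

```python
from dataclasses import dataclass


@dataclass
class Song:
    title: str
    rating: int      # 0 means unrated, otherwise 1-5 stars
    checked: bool
    play_count: int  # only increases when a song plays through to the end


def on_shuffle(s):          # iPod Shuffle: rated 4-5 (assumed: checked only)
    return s.checked and s.rating >= 4


def on_nano(s):             # iPod Nano: rated 3-5 (assumed: checked only)
    return s.checked and s.rating >= 3


def on_40gb(s):             # old 40GB iPod: checked and rated 2-5
    return s.checked and s.rating >= 2


def unrated_listened(s):    # played to the end at least once, not yet rated
    return s.rating == 0 and s.play_count > 0


def unrated_unlistened(s):  # new arrivals: unrated, checked, never played
    return s.rating == 0 and s.checked and s.play_count == 0


library = [
    Song("The Highwayman", 4, False, 3),   # great, but only when in the mood
    Song("Margaritaville", 2, False, 1),   # OK, unchecked
    Song("Newly added track", 0, True, 0),
]
print([s.title for s in library if on_shuffle(s)])
print([s.title for s in library if unrated_unlistened(s)])
```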
Modifying my rating system in this way has caused my average rating for music to change from around 4 to somewhere between 2 and 3. It will probably, over time, become closer to 2 as I rate more of my collection. This gives me a lot of distinctiveness so that I can create Smart Playlists that work well for me.
Here are some statistics from your iTunes Library: 6519 tracks, 726 (11%) rated

| | Number | % of rated | Cumulative % of rated (actual) | Cumulative % of rated (target) | Shortfall |
|---|---|---|---|---|---|
| Tracks rated 5 stars: | 74 | 10 | 10 | 5 | -5 |
| Tracks rated 4 stars: | 144 | 20 | 30 | 15 | -15 |
| Tracks rated 3 stars: | 211 | 29 | 59 | 50 | -9 |
| Tracks rated 2 stars: | 270 | 37 | 96 | 90 | -6 |
| Tracks rated 1 star: | 27 | 4 | 100 | | |
Obviously rating a large music collection can become a chore -- you don't want to spend your limited music listening time always fine tuning your ratings. So I have some approaches that make it easier for me to rate my music with less effort:
First, I sorted my catalog by my old ratings, and modified everything down by 1, starting with everything rated 2 becoming 1, 3 becoming 2, etc. This gave me a good base to start with.
Next I created Smart Playlists for each rating, i.e. "Rating 5 - Exemplar" with "Match only checked songs" and "Live updating" checked. I then added "Play Count" as a column to my view, and sorted by it. This gave me the songs that I played the most and least, and I adjusted some songs up and down accordingly.
Then I created a new Smart Playlist that simply plays songs rated 3 to 5, limiting the list to the first 100 GB selected at random (i.e. everything, in random order), and saved this Smart Playlist as "Plays Well With Others". I play this on occasion in the background, and when I hear something that jars me I know something isn't rated right. Thus without a lot of effort I can change ratings for songs that no longer fit their rating, or uncheck items where the rating was appropriate but it "didn't play well with others".
I try to be aware when I'm using my iPod of what a song's rating is, and change it if it seems wrong. The next time I sync the iPod my ratings will be adjusted in my iTunes catalog.
I also try to be aware of Play Count -- this number only goes up if you play a song to the end. So even if I'm not able to take a look at the rating (for instance when I'm in a car), I can at least forward to the next song. Periodically I review the play counts for songs that I've rated and consider moving them up and down accordingly. Of course, this means that I have to be careful and not let the iPod keep running when I'm not listening.
A tip for those of you that do put a lot of effort into your iTunes ratings: I've learned the hard way that unlike most song information, the rating is NOT stored in the song itself, so if your iTunes database gets corrupted, or you move your music to another server, you'll lose all your ratings. One way to avoid this is to periodically back up your ratings into a field that is stored in the song itself. I personally use the "Grouping" field as it is rarely used: I select all songs with the same rating, click on "Get Info", and change the Grouping field to "My Rating: 5 Stars".
I only have 11% of my collection rated so far, but using this system I'm finding it a lot easier to manage my ratings. I'm already getting many benefits from it -- I'm playing my music more often, my iPods typically have the music I want on them, and various music discovery services can use my ratings to help me identify new music I might enjoy. This provides the incentive to keep me entering meaningful ratings.
Book Ratings - Amazon
Amazon also uses a 5-Star rating system, and your ratings can be used by Amazon to help you find books that you might like. Though I like to support my local bookstores, it is this feature that brings me back to Amazon time and again. Whenever I browse through Amazon and see a book I've already read I try to take the time to update my rating.
Amazon has a number of different tools to assist you in your ratings. If you are an Amazon customer, you can go to Improve Your Recommendations: Edit Items You Own and see all the books that you've purchased and quickly rate them with a nice AJAX interface. You can also review items that you've already rated, whether or not you own them, at Improve Your Recommendations: Edit Items You've Rated.
Amazon has also recently added a very nice web service called Your Media Library that can be used to help manage your media library of books, music, and DVDs. I personally only have used it to manage my books and DVDs, as I find rating albums useless -- it is songs that I prefer to rate.
After browsing through my ratings to date, I discovered the same flaws I found in iTunes -- my ratings typically were too high; most were a 4. This is particularly encouraged by the popup when your cursor is over the Stars: "1 - I hate it, 2 - I don't like it, 3 - It's Ok, 4 - I like it, and 5 - I love it". I suspect if I use the same trick that I use for iTunes of making a rating of 2 Stars mean "Ok" I could potentially cause the recommendation engine to be less effective (though it could possibly make it better, I don't know). So I am being much more brutal with my ratings and pushing many more down to 3, so that my ratings of 4 and 5 have more meaning.
5 Stars
: These have to be the exemplars -- the best books I've ever read, would be glad to read again, would be proud to show off on my best bookshelf, and will buy extra copies to give to friends.
4 Stars
: These have to be really good books -- most of them I'm willing to read again and I promote them by offering to loan them to my more discriminating friends. Although I may keep them on my bookshelf I'd rather give them to a friend than sell them at a used book store.
3 Stars
: These are decent books, and I do share them with my voracious reader friends. But I don't push them and I'm much more likely to sell them at a used bookstore than keep them on my shelf. This is the rating that I significantly underused previously, and I'm finding that the key discriminator for me so far is how much I feel like recommending a book to friends who are more discriminating readers.
2 Stars
: This rating is where the Amazon rating system fails the most -- these are supposed to be books that "I don't like"; however, most of the time I don't buy books that I probably wouldn't like, much less read them, so I have very few in this category. Instead, I've decided this category is for books that are just not quite good enough, or are slightly disappointing. Not bad, or disliked, but just somewhat disappointing.
1 Star
: This is where I put the books that I don't like, or worse, I hate. Not many here, but I'm willing to risk more than many people are so I have some. Also books go here that just don't fit my interest, like romance novels that get recommended to me because I like some crossover fantasy-romance authors.
Since I started more accurately rating my books at Amazon, I've found their suggestions for other books to read to be more accurate. Thus I am getting value from rating these books, and I have incentive to continue to make the effort.
Conclusion
Offering an incentive for people to rate is important for ratings of all sorts, with both individual gain and status recognition being powerful motivators.
However, the easiest technique for making a 5-point rating scale more useful is to make it "distinct". If a user has a more specific meaning for each rating, ratings will slowly settle toward a truer average, and thus more of each rating scale will be used. We've also tried this technique recently on RPGnet, with our new Gaming Index; and thus far our new 10-point scale -- which has distinct meanings for each number -- is averaging 7.27. That's still a fair amount above the scale's midpoint of 5.5, but at least it's below the 8+ rating that our old double 5-point scale resulted in.
Often you, as a consumer of rating systems, will be making use of rating scales designed by others, rather than those you're designing yourself. For those cases it often makes sense to design your own rules for what each number means, and to do so in such a way that your median sits at the midpoint of the scale, rather than toward one of the extremes. When you do, even if you're using a tight 5-point scale you'll end up with enough differentiation for it to actually be more meaningful than a thumbs up or a thumbs down.
Related articles from this blog:
- 2005-12: Systems for Collective Choice
- 2005-12: Collective Choice: Rating Systems
- 2006-01: Collective Choice: Competitive Ranking Systems
- 2007-01: Experimenting with Ratings
Related articles from Shannon Appelcline's Trials, Triumphs & Trivialities:
- #192: Managing User Creativity, Part One
- #193: Managing User Creativity, Part Two
- #196: Collective Choice: Ratings, Who Do You Trust?
- #198: Collective Choice: More Thoughts About Ratings
Posted on August 11, 2006 at 08:49 AM in Books, Music, Social Software, User Interface, Web/Tech | Permalink | Comments (6) | TrackBack (1)
BayChi Talk Next Tuesday
I will be speaking next Tuesday (July 11th) at the monthly meeting of BayCHI, the San Francisco Bay Area Chapter of ACM SIGCHI (Computer Human Interface Special Interest Group), along with Michael H. Goldhaber.
The synopsis of my topic is:
The Dunbar Number, Unstructured Trust, and Why Groups Don't Scale
We are relying increasingly on internet-mediated social software tools for our day-to-day interaction with other people. To design this type of software, we must better understand the psychology and social dynamics of individuals in groups. Awareness of what makes us human is now often as important to the success of the software as is understanding software architecture and code. One particular sociological factor, the Dunbar Number, is useful in understanding why groups don't scale at different group sizes. A deeper awareness of why groups have different behaviors, the nature of unstructured trust, and which current tools appear to work best at different scales, can give guidance to both the online facilitator and the social software designer.
Michael H. Goldhaber will be speaking on:
The Real Nature of the Emerging Attention Economy: Seen As a New Level in the Massively Multiplayer Game Known as Western Culture:
Think of the human world as a Massively Multiple Interactive Game (which it is). As interactions change and increase, we are passing to a new level, something that hasn’t really happened to the same depth for centuries. The rules, fundamental values, and just about everything else are diverging from what was familiar in the level characterized by the exchange of Money, the prevalence of Markets and the dominance of Industrial production of standardized goods (call this MMI). The new level also depends on human abilities and desires, but now what matters most is our strictly limited abilities to pay attention and our much greater (on average) desires to receive it. The full passage will take many decades, but we are already well along.
The location is PARC's George E. Pake Auditorium in Palo Alto, and the talk starts at 7:30. There is also an optional dinner beforehand nearby at 5:30.
I hope to see some of you there!
Posted on July 6, 2006 at 01:58 PM in Social Software | Permalink | Comments (1) | TrackBack (2)
Flames: Emotional Amplification of Text
I've been a moderator/host/forum leader for various bulletin boards and other online communities since the early 1980s; first on CompuServe, later on GEnie and AOL, and then professionally in the early days of Consensus Development. One of the behaviors that happens in online communities and that I rarely see elsewhere is flaming -- where one member writes an extremely inappropriate, typically passionately worded attack on another. Flaming behavior can hurt an online community.
It is commonly thought that flames occur because "there is very little proper policing done on the Internet" but I believe this to be false. Instead, I believe that it is the consequence of the medium primarily existing as text.
In fact, what you'll observe if you study individual flames is that they typically start as an escalation of emotion, only spiraling later into passionate and personal arguments. The only way to stop flames from destroying a community is to break this cycle.
So I've taught all my staffers over the years my ideas on what causes the cycle of flames, and how to avoid them. One particular piece of advice that I give is in regards to how emotions are amplified in the online text medium.
This happens for several different reasons:
Since text is lacking tonal and visual context, we have a tendency to over-interpret any emotional content that does exist (link to paper). In fact, we may have no better than a random chance of correctly interpreting the emotional tone of ironic vs sincere text in a message (link to Epley/Kruger paper).
In addition, we tend to respond to someone's emotional state by expressions of similar intensity (this phenomenon is known as Emotional Contagion). And the higher the level of intensity of our emotions, the less our ability to be empathetic (link to paper).
These tendencies lead into a vicious feedback cycle.
- One person starts with a very trivial or subtle emotional context, say irony.
- This is interpreted at a higher level of emotions, such as sarcasm.
- A reply is made at a similar level of emotion, for example being sarcastic.
- This, in turn, is interpreted at an even higher level of emotion, maybe a mild insult.
- In turn this is replied to at a similarly intense level.
- A flame is born!
Thus I now find that there are certain words and phrases that I avoid using when responding to people online. I have to be very careful with irony and sarcasm, and when I use them I include symbols such as smilies to give the emotional context that is missing from the text. I find that even the slightest hint of blame will be over-interpreted. I avoid the words "should" and "didn't", never tell someone that they forgot something, etc.
My online community staffers have found understanding this cycle an important tool in moderating the communities they lead.
See Also:
2005-07: Extrapolative Hostility in the Online Medium
(This is an update/rewrite of what I originally wrote in Dave Winer's UserLand discussion group back in September of 2000).
Posted on February 13, 2006 at 01:07 PM in Social Software, Web/Tech | Permalink | Comments (10) | TrackBack (3)
On Being an Angel
In the last month or so I've received a number of links to Life With Alacrity as a venture capital blog, and to myself as a venture capitalist.
However, I don't consider myself a venture capitalist. Instead, I am what is known as an "angel investor".
This week has also seen a new topic enter the blog zeitgeist: the topic of reforming or reinventing venture capital. This topic was initially raised by Dave Winer, followed by Robert Scoble, Doc Searls, Jeff Nolan, Michael Arrington, Thatedeguy and many more.
All types of venture investment -- seed, angel, venture, and institutional alike -- carry with them great risks and great rewards. But before we can reinvent venture capital and related venture funding methods like angel capital, we need to understand how they work.
So What is a Venture Capitalist?
A venture capitalist is a partner or associate in a venture capital management firm, which manages money on behalf of large institutional investors.
Basically, a large institutional investor (such as a pension fund or an insurance company) can statistically afford to invest a small part of their portfolio -- perhaps from 1% to as much as 5% -- in high-risk, long-term investments. If they lose the money outright, their other more stable investments have a good chance of making up the loss. But if the high risk investment does well, they can substantially improve their IRR (internal rate of return). To a certain extent they can't lose if they are careful. So these institutional investors invest in a number of types of high-risk funds, including such investments as venture capital funds.
A venture capital management company will manage one or more of these funds, investing in private companies. These VC management firms operate off of a management fee, from 2% to 3% of the capital invested to date. Thus all of the salaries for the staff of a VC management firm are paid, even if the investments are a failure. In addition, if any of the investments are successful, the VC management company earns 20% off of the top of the gain (called a "carry"), which is distributed to all of the full partners in the VC management firm, and sometimes a little of it to the associates.
It is the VC associates that do the brunt of the work for a VC management firm. They make a good salary, but the real return is if they are able to do well in identifying, managing, and selling new startups; then they are invited to become a partner the next time the VC management firm raises a fund. Then if the fund that they are a partner in does well, they can make a true fortune, or even start their own VC management firm.
However, the odds are against the VC associate. It's common wisdom that an associate can't easily manage more than 7 firms at a time. Other common wisdom says that 1 in 5 investments will survive to break even and that 1 in 20 will "make the fund", i.e. pay for all the losses in the other 19 investments. Some newer firms say 1 in 10, but I'll go with the older, more conservative numbers. Thus associates are incentivized to try to manage more than 7 investments and to be smarter than their peers in the firm, so that at least one of their investments will be the 1 in 20 that makes the fund. This makes it easier for the associate to become a partner in the future, as at best 1 in 3 or 1 in 5 associates becomes a partner. Cutthroat competition between associates exists in some firms. This pressure often adds to the perception that associates don't give enough attention to companies in their portfolio; they want their startups to do well, but the odds are that it is another company that will make it, or a startup managed by peer associates. So they divide their attention. This is not unrelated to the Dunbar Triage problem.
Another problem that VC management firms face is the number of investments they are able to effectively handle. If there are 5-6 associates and 2-4 partners, there is probably a max of 50 investments that they have time to manage. If they are managing a $500 million fund, that means that they have to invest at least $10M in a company, but in fact that is more likely to be $25M over time. If 1 in 20 makes the fund, that $25M has to give a return of $250M. Thus when entrepreneurs complain that VCs will not invest in their company, it is often because the VCs can't figure out how to invest a minimum of $25M and turn out at the end with $250M. A related problem is that a startup with a successful business model that could grow into a profitable company with $50M in annual revenues will be encouraged to take a more risky route so that it can go public, which requires a minimum of $100-200M in annual revenues.
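The fund arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function names are my own, and it makes the simplifying assumption that every investment other than the "fund makers" returns nothing, so the winners only have to repay the fund's capital.

```python
# Back-of-the-envelope check of the fund arithmetic described above.
# Function names are illustrative; the figures are the ones quoted in the text.

def minimum_check_size(fund_size: float, max_investments: int) -> float:
    """Smallest average investment that still deploys the whole fund."""
    return fund_size / max_investments


def required_winner_return(fund_size: float, max_investments: int,
                           hit_rate: float) -> float:
    """Amount each 'fund-making' investment must return to repay the fund,
    assuming (simplistically) that all other investments return nothing."""
    expected_winners = max_investments * hit_rate
    return fund_size / expected_winners


fund = 500e6   # $500 million fund
slots = 50     # most investments the firm has time to manage

print(f"Minimum average check: ${minimum_check_size(fund, slots) / 1e6:.0f}M")
print(f"Needed from each winner: "
      f"${required_winner_return(fund, slots, 1 / 20) / 1e6:.0f}M")
```

Under these assumptions the minimum check matches the $10M floor, and each winner has to return on the order of $200M just to repay the fund, which is in the same range as the $250M figure.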
There is a lot of variety in VC management firms; some VCs have smaller funds under management, others give their associates more of a share, others have different management fees or carry percentages, and most specialize in some way: either vertically in a particular field, or horizontally in a particular stage of investment. For instance, there are some VC firms known as mezzanine firms that only invest in your company right before they think it can go public.
This is the way most VC management firms work. Periodically a new VC management firm will explore and push the limits of the above boundary conditions, but the more edges they attempt, the more likely they will fail.
My Three Angels
So what is an angel investor? I learned a lot of what I know from the 3 angel investors that invested in my software startup, Consensus Development.
Gifford Pinchot -- Partner Angel
Gifford Pinchot, with his wife Libba, was my first angel investor in Consensus Development. We met at a Maxis meeting where Gifford had been asked to facilitate the formation of a new startup to create simulation software. We left the meeting frustrated with its results, but Gifford liked what he heard about my broader vision. Gifford flew me to San Diego, where we walked the beach and discussed my vision for collaborative software. He liked what he heard, and later in the month flew me to his home in Connecticut, where I stayed for a month in a barn guest house near his home while we worked on our first business plan.
Gifford only invested a low 5 figures, which got me started. However, it wasn't his money that was his most valuable contribution -- it was his time. Over the years, as Chairman of Consensus Development, he probably put 5-10% of his time into working with me, talking to me, advising me, and coaching me. When our first software effort, InfoLog (a folksonomy tagging program like del.icio.us that was a decade too early), failed, he didn't walk away and instead encouraged me to continue. I dug deeper into the problem, discovered that trust and security were a key obstacle, and created a profitable consulting business. Gifford also encouraged me when I said we were going to take the risk of dropping all of our profitable consulting and focusing on a product, SSL Plus. Later, when this company was being shopped around to various buyers, Gifford spent lots of time doing due diligence, and ultimately came on half-time as CEO so that I could concentrate on selling the business.
In the end, Gifford earned probably 7 figures on his initial 5 figure investment, close to a hundred-fold return on the dollars he invested. However, his real investment was the time he spent with me -- almost 10 years of never giving up.
Scott Loftesness -- Seed Angel
I met Scott Loftesness when he was the executive vice president at Visa International. We learned of each other through CompuServe, where we both were sysops in the 80s. I did some consulting for him at Visa in the groupware area over a couple of years and we grew to respect and trust each other. I came to him when I branched out from groupware consulting and began to include consulting on cryptographic security. I'd seen an opportunity--I had a potential contract from RSA Data Security to be a distributor of RSAREF--but in order to take advantage of this opportunity I needed some seed capital.
Scott invested over twice what Gifford invested, but still 5 figures. However, like Gifford, what I gained from my association with Scott was a lot more than the seed capital. He had a respected name in the industry -- a friend at Visa USA told me "Scott is where all innovation at Visa flows from." He joined my board of directors, supported our risky choice to drop all groupware and cryptographic consulting to focus on our SSL project, helped tremendously in doing due diligence on potential buyers, and was pivotal to the negotiations to close our final sale of Consensus Development.
In the end, Scott Loftesness also did quite well in his investment in Consensus Development. His involvement in the day-to-day operation of Consensus Development was significantly less than Gifford's, but he was always around to support and advise us when we needed him.
Jim Bidzos -- Hands-Off Angel
Jim Bidzos was the CEO of RSA Data Security, whose firm had a critical patent on almost all meaningful cryptographic security. Over the years I did a lot of consulting for him to support various projects like RSAREF in standards, to create client tools for their Certificate Services Division, and to help with the founding of VeriSign.
One day I told Jim that RSAREF would never be successful in his goal of promoting the RSA algorithm in security standards as long as it could only be sold through RSA salespeople. They preferred to sell RSA's premier toolkit, BSAFE. I somewhat jokingly proposed that maybe Consensus Development should sell it instead. To my surprise, he agreed.
A couple of years later I leveraged the fact that Consensus Development had the only RSA toolkit available other than RSA's own to get the contract to develop the reference implementation of SSL 3.0 for Netscape. I took this Netscape contract back to Jim and said that I needed some investment to make this successful. He invested a middle six figures in Consensus Development in return for a percentage that was roughly equivalent to that of Gifford and Scott, but because of his involvement as CEO of RSA Data Security he could not be on our board of directors.
After this investment, Jim had very little to do with Consensus Development. In fact, he had spread his angel money so widely in the cryptographic security industry that he was also invested in a couple of our competitors. In the end his investment was worth roughly 10 times what he invested, but the cachet of being able to tell others that Jim Bidzos was an investor made Consensus Development much more "legitimate", which also added significant value to us.
Founding of Alacrity Ventures
After I left Certicom, the company that had purchased my firm, Consensus Development (see Bad Business of Fear for more info), I wondered what I should do next. I could theoretically retire if I abandoned the Bay Area, but I was not ready for that and I thought I had maybe enough capital to start one more business of my own instead. Under a non-compete from Certicom, I was not sure what type of non-cryptographic business I wanted to start. So I decided that one thing I could do was some angel investing. In part this was to make money, but a larger part of it was that I enjoyed working with entrepreneurs. I wanted to do for others what Gifford Pinchot had done for me.
I did some study about how venture economics works and how angels and venture capital firms invest, and became concerned. I saw that being an angel investor is in many ways much harder than being a venture capitalist.
One of the biggest challenges is that angels share all the problems of the institutional investor, of the VC management firm, and of the VC associate.
The first challenge is deciding how much to invest. The institutional investors only risk 1%-5% of their capital. If I limited myself to that amount I could maybe invest in a couple of companies. I decided I was still young and could risk investing more.
The second problem was no management fee -- unlike a VC firm, angels don't get a management fee to cover salaries, legal fees, and other expenses.
The third problem was my time. Most angels still work for a living -- being an angel investor is part-time, whereas a venture capitalist typically works full-time. If only 1 in 20 investments "make the fund", but I could at most manage 7 investments, that meant that I had a 2/3 chance of losing my entire investment. I might be able to argue that for some kinds of businesses I might be more informed than the average VC, and thus might be able to make better choices, but not that much better.
The key, I decided, was to work with at least 2 other angel investors. That would theoretically allow us to invest in 21 companies, diversify our portfolios, and split the work. I approached my first angel investor, Gifford Pinchot, and he agreed to be one of the partners. The second was Harold Shattuck, who had done some due diligence and operations consulting for Consensus Development, and had been a VC once before, but enjoyed being closer to the actual building of a new company with some operating interaction. I was the managing partner for files and accounting, but we all brought to the table our "deal flow", performed due diligence together, and worked closely with each other.
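The odds behind that reasoning can be sketched quickly. The snippet below is an illustration rather than a model of real portfolios: it assumes each investment independently has a 1 in 20 chance of being a "fund maker", and the function name is my own. It also equates "no fund maker" with the bad outcome, which ignores the 1 in 5 investments that break even.

```python
# Probability of ending up with no "fund-making" investment at all,
# assuming each investment independently succeeds with probability hit_rate.

def chance_of_no_winner(n_investments: int, hit_rate: float = 1 / 20) -> float:
    """Probability that none of n independent investments makes the fund."""
    return (1 - hit_rate) ** n_investments


solo = chance_of_no_winner(7)     # one angel managing 7 investments
pooled = chance_of_no_winner(21)  # three angels pooling 21 investments

print(f"7 investments:  {solo:.0%} chance of no fund maker")
print(f"21 investments: {pooled:.0%} chance of no fund maker")
```

With 7 investments the chance of having no winner is about 70%, close to the 2/3 figure above; spreading across 21 investments brings it down to roughly one in three.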
Lessons from Alacrity Ventures
Alacrity Ventures is over 6 years old, and I have learned many lessons from it.
First, I feel that we did a good job selecting our investments, during a time in which being an angel investor was very difficult. I discovered that Gifford, Harold and I were really good at due diligence; our differing skills -- Gifford's in coaching and evaluating the management team, Harold's in operations and business models, and mine in technology -- truly complemented each other.
For a long time I could say that the good news was that out of 13 investments, all but 1 were still in business. However, we were never able to invest in the 21 investments that we planned because we discovered a significant problem in angel investing: the VC.
The angel investor can only really afford to invest early on, as a seed investor, or in an early investment round such as series A. However, the firms we invested in needed more money along the way; in fact, almost all firms need money at more than one point. The venture climate at the time was such that the VCs required in their term sheets that previous investment rounds lose their liquidation preferences, and ultimately their investment.
Let me give a specific example -- we invested in a first round of an enterprise software company in 2000 that is still around today. In 2002 they needed more money, and because of the difficulty in getting VC investment, the lead VC insisted that the preferences from the previous rounds be removed, effectively making us common stock, unless we participated in this subsequent round. We reluctantly did invest some more, but because we don't have the funds that a VC has, we were only able to protect some of our preferred stock. A year and a half later, the software company needed more money, and the VC did it again. This time, all our stock was converted to common. Now it is 2006, and the company might be acquired this year; however the VCs, because of their liquidation preferences, will get the first $65 million (or more). As I doubt the firm is worth more than $50M, we will not get anything, nor will any of the other founders that are no longer involved with the firm.
This has repeated itself over and over again. We made a decent choice and did our due diligence well, but subsequent VC investors have pushed us out. A few of our ventures have failed outright. That is understandable given our original 1 in 20 expectations. But what we didn't expect was how difficult it was going to be to participate in the upside. Yes, we had preferences in our early rounds that should have protected us, but they didn't.
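To see why being converted to common stock matters so much, here is a minimal sketch of how exit proceeds are split when preferred holders are paid first. It is deliberately simplified and the function name is hypothetical: it treats the preferences as a single stack paid before anything reaches common holders, and ignores participation rights, conversion, and multiple series.

```python
# Simplified liquidation waterfall: preferred holders are paid their full
# preference before common holders receive anything.

def split_exit_proceeds(sale_price: float,
                        preference_stack: float) -> tuple[float, float]:
    """Return (amount paid to preferred holders, amount left for common)."""
    to_preferred = min(sale_price, preference_stack)
    to_common = sale_price - to_preferred
    return to_preferred, to_common


# The example from the text: $65M of preferences, a company worth about $50M.
preferred, common = split_exit_proceeds(50e6, 65e6)
print(f"Preferred holders receive ${preferred / 1e6:.0f}M, "
      f"common holders receive ${common / 1e6:.0f}M")
```

In this sketch the common holders see nothing until the sale price exceeds the full preference stack, which is the position we ended up in once our stock was converted to common.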
So of our 13 investments, only 2 remain that may "make the fund": a very innovative high-tech titanium powder manufacturer ITT, and a high-tech manufacturer of ceramic devices Vapore. But even as these two investments survive, they are still vulnerable to requiring additional investment and possibly forcing us out.
Of the rest: one of our early investments sold to VeriSign at a 50% premium, our investment in Salon.com will give us a small return, MG Taylor paid off its loan, and Skotos may someday pay back its original investment. The other 8 are being written off as a loss.
Advice to Angels
So in spite of the odds, you still want to become an angel investor? Here is some advice...
Collaborate with other angels: Going it alone is dangerous -- there are a number of angel investor networks, such as Gathering of Angels, Band of Angels and others listed in the Directory of Angel-Investor Networks. Be careful, though: the enthusiasm of others can be contagious -- don't always go with the herd.
Do your own due diligence: I can't emphasize this enough. Talk to the entrepreneurs and meet their staff. Read their business plan and tear it apart. Find the hidden assumptions. Understand their business model. It needs to feel realistic. Try to get more eyes on the job: different people see different things. Don't follow others; they may have different investment criteria than your own.
Be an advisor first: If the entrepreneurs don't listen to your advice, don't invest. If you have to invest to become an advisor, invest only a small amount, or have part of the money be contingent on a meaningful goal.
Guard your upside: When negotiating terms, don't worry about the downside. It is the VCs that need items on the term sheet for when things go wrong -- what you need to guard is for when things go right. Watch for changes in the executive staff -- they may be incentivized differently than you are.
Consider a secured loan: Somewhat contrary to the "guard your upside" advice, rather than investing only in stock, consider investing via a secured loan as well. The security can be not only on hard company assets, but also on intellectual property such as copyrights, trademarks or patents. Your return will be lower on the loan, but if you can get all of your investment back early and get a small percentage of the company, it can be a good way to balance risk. Just remember to file the property documents to make sure that the assets are properly secured, and be prepared that someday you may own that asset.
Save $2 for every $1: Almost every company you invest in, even if successful, will need additional funding. Make sure that you keep on hand $2 for every $1 initially invested. This will also help keep you from being squeezed out by later VC investors.
Invest in acquisition targets: Let the VCs take companies public -- the companies that you should be interested in are the companies that will eventually be acquired. Creating an acquisition target requires the management to think differently -- coach them to do so.
Understand the founder's dilemma: There are many founder's dilemmas; however, one is particularly important to the angel investor. A founder may be incentivized to sell sooner than his early investors. Remember that most often, the only significant asset that a founder has is his company. If the founder has an opportunity to sell early and buy a house, he might, even if it may not be enough return on investment for the risk that the angel took. Find ways to keep your interest aligned with that of the founders, which may even include buying some stock directly from the founder.
Consider alternative exits: There are lots of boutique opportunities that are too small for VCs. I know of a local Berkeley software company that was number one in their market, but too small to go public. They had $20M in annual revenues, and profits of almost $10M, but little opportunity for growth -- early investors could have gotten their money back in dividends rather than sale of the company.
Time the cycle: We didn't invest at the ideal time for the angel investor. We picked well considering the times, but had we waited for a few years it would have been easier. Not to say that timing is everything; we'd have lost our titanium powder opportunity if we'd waited for better market timing.
Respect people: Treat the people you invest in like a paying client. Respect their time and concerns.
Be prepared that the plan will change: I've never been involved with a business where the business plan doesn't significantly change. As an angel investor you need to help your businesses to plan for those changes.
Advice to Entrepreneurs
So you want investment from an angel investor? Some advice...
Recognize the odds: The angel investor is taking a substantial risk investing in your company -- you need to be able to show a scenario where the investor might be able to make 10x or 20x their investment. So if you are looking for $100K, you need to show how the angel can ultimately have stock worth $1M to $2M.
Consider their advice: Angel investors may not always be right, but show them that you are listening. If you use angels for more than just a source of money, you'll get a lot more value from them.
Draft your business plan: An angel investor does not need as complete a business plan as a VC does, but they need to see how you think. You should clearly identify what the product or service is, who is going to buy it, what is the marketplace that those buyers may find it in, what differentiates your product or service and why your team is good enough to deliver. Angel investors know that your plan will change, probably drastically, but if they understand your thinking process they can be more confident that your company will survive change.
It takes time: Don't count on the money from an angel investor (or any investor) until you get the check. Investors are always selecting from a number of choices, often very competitive choices. No matter how optimistic you are, it will likely take 6 months or more to raise angel money.
Team with many hats: Angel investors don't recruit new team members for you. You don't necessarily have to have your whole team in place, but there at least needs to be someone who has experience managing, someone with development experience, someone with marketing experience, and someone with sales experience. Whatever team is there, they need to be able to juggle all of those hats. Financial, HR, and administrative positions can all be part-time or farmed out.
Advice to Venture Capitalists
Value the angel investor: The angel investor serves a point in the marketplace that you are not able to serve. Rather than driving them out, find some way for them to continue to participate so that they can find other ventures for you.
Angels are not VCs: The angel investor can't afford to invest in later rounds -- their model is different than yours. It may make sense to force participation in subsequent rounds by other VCs, but carve out some room for angels.
The Future of Alacrity Ventures
Though I've enjoyed some aspects of being an angel investor, I enjoy working with creative people to innovate new products more. I expect to spend most of my time in the next few years continuing to explore social software and collaboration tools, and the new product opportunities that may evolve from them.
Thus I expect that any future angel investments I make will be more along the lines of Gifford's style of investment in Consensus Development: a small investment of money and a large investment of time. Harold and Gifford both feel the same way. Currently we plan to continue monitoring our existing investments, but don't plan any new investments unless we can take a more active role in the firm -- for instance Harold is a board member in Vapore.
Gifford is now dedicating his life to building a better world by transforming business education. He is a co-founder and President of the Bainbridge Graduate Institute, which provides an MBA program integrating sustainability, green economics, the internet, and open source within a traditional MBA program. Because BGI is an open source school, he helps other schools to use its curriculum. Check out his blog entry on Angel Philanthropy.
If there's one thing we've learned from six years of angel investing, one thing that may be more valuable than all the nuts and bolts I describe here, it's that Gifford Pinchot's partner-style of angel investment is what suits our investing style, not Jim Bidzos' style of hands-off angel investing, and that's a lesson that we're going to carry forward with Alacrity Ventures.
Posted on January 31, 2006 at 06:17 PM in Business, Social Software, Web/Tech | Permalink | Comments (13) | TrackBack (6)



