Tuesday 22 August 2017

CollabPRES: Local news for an Internet age


(This is an essay I wrote on December 30th, 2005; it's a dozen years old. Please bear this in mind when reading it; things on the Internet do change awfully fast. I'm republishing it now because it contains a lot of ideas I want to develop over the next few weeks.)

The slow death of newsprint

Local newspapers have always depended heavily on members of the community, largely unpaid, writing content. As advertising increasingly migrates to other media and the economic environment for local newspapers gets tighter, this dependency on volunteer contributors can only grow.

At the same time, major costs on local papers are printing and distribution. In the long run, local news must move to some form of electronic delivery; but for the present, a significant proportion of the readership is aging and technology-averse, and will continue to prefer flattened dead trees.

Approaches to local Internet news sites

I've been building systems to publish local news on the Internet for six years now. In that time, most local news media have developed some form of Internet presence. Almost without exception, these Internet news sites have been modelled (as the ones I've written have) on the traditional model of a local newspaper or magazine: an editor and some journalists have written all the content, and presented it, static and unalterable, before the waiting public.

I've become less and less convinced by this model. Newer Internet systems in other areas than local news have been exploiting the technology in much more interesting ways. And reading recent essays by other people involved in this game whom I respect, it seems they're thinking the same. So how can we improve this?

PRES

PRES is a web application I built six years ago to serve small news sites. It's reasonably good, stable and reliable, but there's nothing particularly special about PRES, and it stands here merely as an example of a software system for driving a relatively conventional Internet news site. It's just a little bit more sophisticated than a 'blog' engine.

PRES revolves around the notion of an editor – who can approve stories; authors – who can contribute stories; subscribers – who are automatically alerted, by email, to new stories in their areas of interest, and who may be able to contribute responses or comments to stories; and general readers, who simply read the stories on the Web. It organises stories into a hierarchy of 'categories', each of which may have different presentation. Within each category one article may be nominated by the editor as the 'lead article', always appearing at the top of the category page. Other articles are listed in chronological order, the most recent first. Only eight 'non-lead' stories are by default shown in any category, so articles 'age out' of the easily navigated part of the website automatically as new articles are added.
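The category-page rule described above can be sketched in a few lines of Python. This is an illustration only, not PRES's actual code: the `Article` class, the `category_page` function and the way the lead article is passed in are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_NON_LEAD = 8  # default number of non-lead stories shown per category


@dataclass
class Article:
    title: str
    published: int  # any sortable timestamp; larger means more recent


def category_page(lead: Optional[Article],
                  others: List[Article],
                  limit: int = MAX_NON_LEAD) -> List[Article]:
    """Lead article first (if any), then the most recent non-lead articles."""
    newest_first = sorted(others, key=lambda a: a.published, reverse=True)
    page = [lead] if lead is not None else []
    return page + newest_first[:limit]
```

Older stories are never deleted here; they simply fall off the end of the list as newer ones arrive.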

PRES also offers flexible presentation and does a number of other useful news-site related things such as automatically generating syndication feeds, automatically integrating Google advertising, and providing NITF (News Industry Text Format) versions of all articles; but all in all it's nothing very special, and it's included here mainly to illustrate one model of providing news.

This model is 'top down'. The editor – a special role – determines what is news, by approving stories. The authors – another special role – collect and write the news. The rest of the community are merely consumers. The problem with this approach is that it requires a significant commitment of time from the editor, particularly, and the authors; and that it isn't particularly sensitive to user interest.

WIKI

A 'wiki' is a collaborative, user-edited web site. In a classic wiki there are no special roles; every reader is equal and has equal power to edit any part of the web site. This model has been astoundingly successful; very large current wiki projects include Wikipedia, which, at the time of writing, after only five years, has over 2 million articles in over 100 languages. The general quality of these articles is very high, and I have come to use Wikipedia to get a first overview about subjects of which I know little. It is, by far, more comprehensive and more up to date than any print encyclopedia; and it is so precisely because it makes it so easy for people who are knowledgeable about a particular subject to contribute both original articles and corrections to existing articles.

However, as with all systems, in this strength is its weakness. Wikipedia allows anyone to contribute; it allows anyone to contribute anonymously, or simply by creating an account; and in common with many other web sites it has no means of verifying the identity of the person behind any account. It treats all edits as being equal (with, very exceptionally, administrative overrides). Wikipedia depends, then, on the principle that people are by and large well motivated and honest; and given that most people are by and large well motivated and honest, this works reasonably well.

But some people are not well motivated or honest, and Wikipedia is very vulnerable to malice, sabotage and vandalism, and copes poorly with controversial topics. Particular cases involve the malicious posting of information the author knows to be untrue. In May of this year an anonymous user, since identified, edited an article about an elderly and respected American journalist to suggest that he had been involved in the assassinations of John F and Robert Kennedy. This went uncorrected for several months, and led to substantial controversy about Wikipedia in the US.

Similarly, many articles concern things about which people hold sharply different beliefs. A problem with this is that two groups of editors, with different beliefs, persistently change the article, see-sawing it between two very different texts. Wikipedia's response to this is to lock such articles, and to head them with a warning, 'the neutrality of this article is disputed'. An example here at the time of writing is the article about Abu Bakr, viewed by Sunni Muslims as the legitimate leader of the faithful after the death of Mohamed, and by Shia Muslims as a usurper.

However, these problems are not insurmountable, and, indeed, Wikipedia seems to be coping with them very well by developing its own etiquette and rules of civil society.

Finally, using a WIKI is a little intimidating for the newcomer, since special formatting of the text is needed to make (e.g.) links work.

The Wiki model is beginning to be applied to news, not least by Wikipedia's sister project WikiNews. This doesn't seem to me yet to be working well (see 'Interview with Jimmy Wales' in further reading). Problems include that most contributors are reporting, second hand, information they have gleaned from other news sources; and that attempting to produce one global user-contributed news system is out of scale with the level of commitment yet available, and with the organising capabilities of the existing software.

This doesn't mean, of course, that a local NewsWiki could not be successful; indeed, I believe it could. Local news is by definition what's going on around people; local people are the first-hand sources. And it takes only a relatively small commitment from a relatively small group of people to put local news together.

Karma and Webs of Trust

The problem of who to trust as a contributor is, of course, not unique to a wiki; on the contrary it has been best tackled so far, I believe, by the discussion system Slashcode, developed to power the Slashdot.org discussion site. Slashcode introduces two mechanisms for scoring user-contributed content which are potentially useful to a local news system. The first is 'karma', a general score of the quality of a user's contributions. Trusted users (i.e., to a first approximation, those with high 'karma') are, from time to time, given 'moderation points'. They can spend these points by 'moderating' contributions – marking them as more, or less, valuable. The author of a contribution that is marked as valuable is given an increment to his or her karma; the author of a contribution marked down loses karma. When a new contribution is posted its initial score depends on the karma of its author. This automatic calculation of 'karma' is of course not essential to a karma-based system. Karma points could simply be awarded by an administrative process; but it illustrates that automatic karma is possible.
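A minimal sketch of the karma loop just described, in Python. It is not Slashcode's implementation: the class names, the one-point increments, the karma threshold of 10 and the starting scores of 1 and 2 are all assumptions chosen to keep the example small.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    karma: int = 0
    moderation_points: int = 0


@dataclass
class Contribution:
    author: User
    score: int


def post(author: User) -> Contribution:
    """New contributions start higher for authors with good karma.

    The threshold (10) and starting scores (1 or 2) are illustrative only.
    """
    return Contribution(author=author, score=2 if author.karma >= 10 else 1)


def moderate(moderator: User, contribution: Contribution, up: bool) -> None:
    """Spend one moderation point to mark a contribution up or down."""
    if moderator.moderation_points < 1:
        raise ValueError("no moderation points left")
    moderator.moderation_points -= 1
    delta = 1 if up else -1
    contribution.score += delta          # the contribution itself is re-scored
    contribution.author.karma += delta   # and its author gains or loses karma
```

How moderation points are handed out to trusted users is left open here; that could be automatic or administrative, as the text notes.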

The other mechanism is the 'web of trust'. Slashcode's implementation of the web of trust idea is fairly simple and basic: any user can make any other user a 'friend' or a 'foe', and can decide to modify the scores of contributions by friends, friends of friends, foes, and foes of friends. For example, I modify contributions by my 'friends' +3, which tends to bring them to the top of the listing so I'm likely to see them. I modify contributions by 'friends of friends' by +1, so I'm slightly more likely to see them. I modify contributions by 'foes' by -3, so I'm quite unlikely to see them.
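The friend and foe modifiers above can be sketched as follows. The +3, +1 and -3 values come from the example in the text; the function name, the dictionary layout and the order in which relationships are checked are assumptions, and 'foes of friends' are left out because the text gives no modifier for them.

```python
from typing import Dict, Set

FRIEND, FRIEND_OF_FRIEND, FOE = 3, 1, -3  # modifiers from the example above


def adjusted_score(base: int, reader: str, author: str,
                   friends: Dict[str, Set[str]],
                   foes: Dict[str, Set[str]]) -> int:
    """Score of a contribution as one particular reader would see it."""
    my_friends = friends.get(reader, set())
    if author in my_friends:
        return base + FRIEND
    if author in foes.get(reader, set()):
        return base + FOE
    # Only one further step is followed: friends of friends.
    if any(author in friends.get(f, set()) for f in my_friends):
        return base + FRIEND_OF_FRIEND
    return base
```

Because the last check uses `any`, being a friend of several of the reader's friends earns no more than being a friend of one.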

Slashdot's web of trust, of course, only operates if a user elects to trust other people, only operates at two steps' remove (i.e. friends and friends of friends, but not friends of friends of friends) and is not additive (i.e. if you're a friend of three friends of mine, you aren't any more trusted than if you're a friend of only one friend of mine). Also, I can't qualify the strength of my trust for another user: I can't either say “I trust Andrew 100%, but Bill only 70%”, or say “I trust Andrew 100% when he's talking about agriculture, but only 10% when he's talking about rural transport”.

So to take these issues in turn. There's no reason why there should not be a 'default' web of trust, either maintained by an administrator or maintained automatically. And similarly, there's no reason why an individual's trust relationships should not be maintained at least semi-automatically.

Secondly, trust relationships can be subject specific, and thus webs of trust can be subject specific. If Andrew is highly trusted on agriculture, and Andrew trusts Bill highly on agriculture, then it's highly likely that Bill is trustworthy on agriculture. But if Andrew is highly trusted on agriculture, and Andrew trusts Bill highly on cars, it doesn't necessarily imply that Bill is to be trusted on cars. If a news site is divided into subject specific sections (as most are) it makes sense that the subjects for the trust relationships should be the same as for the sections.
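One way to sketch subject-specific trust is to key every trust relationship by subject and only follow chains that stay within one subject. The essay does not say how trust should combine along a chain; multiplying the two strengths, limiting the chain to one intermediary, and the name `derived_trust` are all assumptions made for this example.

```python
from typing import Dict, Tuple

# (truster, trustee, subject) -> strength of trust between 0.0 and 1.0
Trust = Dict[Tuple[str, str, str], float]


def derived_trust(trust: Trust, reader: str, target: str, subject: str) -> float:
    """Trust of reader in target on one subject, directly or via one intermediary."""
    direct = trust.get((reader, target, subject))
    if direct is not None:
        return direct
    best = 0.0
    for (truster, intermediary, subj), weight in trust.items():
        if truster == reader and subj == subject:
            # The intermediary's opinion only counts on the same subject.
            onward = trust.get((intermediary, target, subject), 0.0)
            best = max(best, weight * onward)
    return best
```

With this rule, trusting Andrew on agriculture lends weight to Andrew's view of Bill on agriculture, but says nothing about Bill on cars.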

What is news?

So what is news? News is what is true, current and interesting. Specifically it is what is interesting to your readers. Thus it is possible to tune the selection of content in a news-site by treating page reads as voting, and giving more frequently read articles more priority (or, a slightly more sophisticated variant of the same idea, giving pages a score based on a formula computed from the number of reads and a 'rate this page' scoring input).

The problem with a simple voting algorithm is that if you prioritise your front page (or subcategory – 'inner' pages) by reads, then your top story is simply your most read story, and top stories will tend to lock in (since they are what a casual reader sees first). There has to be some mechanism to attract very new stories to the attention of readers, so that they can start to be voted on. And there has to be some mechanism to value more recent reads higher than older ones.
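One possible way to value recent reads more highly than older ones is to let each read's weight decay with age. The sketch below uses exponential decay with a 24-hour half-life; both the decay formula and the half-life are assumptions for illustration, not something the essay specifies.

```python
from typing import Iterable


def article_score(read_ages_hours: Iterable[float],
                  half_life_hours: float = 24.0) -> float:
    """Sum of reads, each weighted by 0.5 ** (age / half_life).

    A read made just now counts as 1.0; a read one half-life ago counts
    as 0.5; older reads fade towards zero.
    """
    return sum(0.5 ** (age / half_life_hours) for age in read_ages_hours)
```

Under this rule a story with three reads from a week ago scores lower than a story with a single read from the last hour, which works against the 'lock in' effect described above.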

So your front page needs to comprise an ordered list of your currently highest scoring articles, and a list of your 'breaking news' articles – the most recently added to your system. How do you determine an ordering for these recent stories?

They could simply be the most recent N articles submitted. However, there is a risk that the system could be 'spammed' by one contributor submitting large numbers of essentially similar articles. Since, without sophisticated text analysis, it is difficult to automatically determine whether articles are 'essentially similar', it might be reasonable to suggest that only highly trusted contributors should be able to have more than one article in the 'breaking news' section at a time.
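The 'breaking news' rule suggested above might look like this in Python. The `Submission` type, the `trusted` set and the default of five items are hypothetical; how a contributor comes to count as 'highly trusted' is decided elsewhere.

```python
from typing import Iterable, List, NamedTuple, Set


class Submission(NamedTuple):
    title: str
    author: str
    submitted: float  # timestamp; larger means more recent


def breaking_news(submissions: Iterable[Submission],
                  trusted: Set[str], n: int = 5) -> List[Submission]:
    """Most recent n submissions, at most one each from untrusted authors."""
    seen: Set[str] = set()
    result: List[Submission] = []
    for s in sorted(submissions, key=lambda s: s.submitted, reverse=True):
        if s.author in seen and s.author not in trusted:
            continue  # this untrusted author already has a slot
        seen.add(s.author)
        result.append(s)
        if len(result) == n:
            break
    return result
```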

The next issue is who should be able to contribute to, and who to edit, stories. The wiki experience suggests that the answer to both these things should be 'everyone', with the possible proviso that, to prevent sabotage and vandalism, you should probably require that users identify themselves to the system before being allowed to contribute.

As Robin Miller says:

“No matter how much I or any other reporter or editor may know about a subject, some of the readers know more. What's more, if you give those readers an easy way to contribute their knowledge to a story, they will.”

Consequently, creating and editing new stories should be easy, and available to everyone. Particularly with important, breaking stories, new information may be becoming available all the time, and some new information will become available to people who are not yet trusted contributors. How, then, do you prevent a less well informed but highly opinionated contributor overwriting an article by a highly trusted one?

'Show newer, less trusted versions'

In wikis it is normal to hold all the revisions of an article. What is novel in what I am suggesting here is that rather than by default showing the newest revision of an article, as wikis typically do, by default the system should show the newest revision by the most trusted contributor according to the web of trust of the reader for the subject area the article is in, if (s)he has one, or else according to the default web of trust for that subject. If there are newer revisions in the system, a link should be shown entitled 'show newer, less trusted versions'. Also, when a new revision of a story is added to the system, email should be automatically sent to the most trusted previous contributor to the article according to the default web of trust, and to the sub-editor of the section if there is one, or else to the contributor(s) most trusted in that section.

All this means that casual users will always see the most trusted information, but that less casual users will be able to see breaking, not yet trusted edits, and that expert contributors will be alerted to new information so that they can (if they choose) 'endorse' the new revisions and thus make them trusted.
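The selection rule above can be sketched as follows. The `trust` dictionary stands in for whichever web of trust applies (the reader's own, or the default for the subject). Treating an endorsed revision as being as trusted as its most trusted endorser is an assumption about what 'endorse' should mean; the essay only says that endorsement makes a revision trusted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Revision:
    number: int  # higher means newer
    author: str
    endorsers: List[str] = field(default_factory=list)


def revision_trust(rev: Revision, trust: Dict[str, float]) -> float:
    """Trust in a revision: the best of its author and its endorsers."""
    return max(trust.get(name, 0.0) for name in [rev.author] + rev.endorsers)


def default_revision(revisions: List[Revision],
                     trust: Dict[str, float]) -> Optional[Revision]:
    """Newest revision among those with the highest trust."""
    if not revisions:
        return None
    top = max(revision_trust(r, trust) for r in revisions)
    return max((r for r in revisions if revision_trust(r, trust) == top),
               key=lambda r: r.number)


def newer_less_trusted(revisions: List[Revision],
                       shown: Revision) -> List[Revision]:
    """Revisions behind the 'show newer, less trusted versions' link."""
    return [r for r in revisions if r.number > shown.number]
```

Once a trusted contributor endorses a newer revision, it becomes the default and the link disappears.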

Maintaining the Web of Trust

Whenever a contributor endorses the contribution of another contributor, that's a strong indication of trust. Of course, you may think that a particular contribution is valuable without thinking that its author is generally reliable. So your trust for another contributor should not simply be a measure of your recent endorsement of their work. Furthermore, we need to provide simple mechanisms for people who are not highly ranked contributors to maintain their own personal web of trust.

Fortunately, if we're already thinking of a 'rate this page' control, HTML gives us the rather neat but rarely used image control, a rectangular image which returns the X, Y co-ordinates of where it was clicked. This could easily be used to construct a one-click control which scores 'more trusted/less trusted' on one axis, and 'more interesting/less interesting' on the other.
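On the server side, turning a click into two ratings is a small calculation. An HTML image input submits the click position as `name.x` and `name.y`; the sketch below assumes those values have already been parsed into integers. The function name, the choice of which axis means what, and the -1.0 to +1.0 scale are illustrative assumptions.

```python
from typing import Tuple


def click_to_ratings(x: int, y: int, width: int, height: int) -> Tuple[float, float]:
    """Map a click on a width x height image to (trust, interest) ratings.

    Left edge is 'less trusted' (-1.0), right edge 'more trusted' (+1.0);
    top edge is 'more interesting' (+1.0), bottom edge 'less interesting' (-1.0).
    Assumes the image is at least 2 pixels in each direction.
    """
    if not (0 <= x < width and 0 <= y < height):
        raise ValueError("click outside the image")
    trust = 2 * x / (width - 1) - 1
    interest = 1 - 2 * y / (height - 1)  # y grows downwards on screen
    return trust, interest
```

A click in the top right corner therefore records 'more trusted and more interesting' in a single action.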

Design of CollabPRES

CollabPRES is a proposal for a completely new version of PRES with some of the features of a WIKI and an advanced web of trust system. There will still be a privileged role: an Administrator will be able to create and manage categories (sections) and will be able to remove articles and to remove privileges from other users in exceptional circumstances. An article will not exist as a record in itself but as a collection of revisions. Each revision will be tagged with its creator (a contributor) and with an arbitrary number of endorsers (also contributors). In order to submit or edit an article, or to record an opinion of the trustworthiness of an article, a contributor must first log in and identify themselves to the system. Contributors will not be first class users authenticated against the RDBMS but second class users authenticated against the application. There will probably not be a threaded discussion system, as, seeing the article itself is editable, a separate mechanism seems unnecessary.

Whether contributors are by default allowed to upload photographs will be an administrative decision for the site administrator. Where contributors are not by default permitted to upload images, the administrator will be able to grant that privilege to particular contributors.

In order to make it easier for unsophisticated users to add and edit stories, it will be possible to upload a pre-prepared text, HTML, OpenOffice, or (ideally, if possible) MSWord file as an alternative to editing text in an HTML textarea control.

To be successful, CollabPRES must have means of integrating both local and national advertising into the output. At present this paper does not address that need.

Finally, there must still be an interaction between the website and the printed page, because many of the consumers of local news still want hard copy, and will do for at least some years to come.

Whereas in most current local papers the website is at best an adjunct to the printed paper, CollabPRES turns that on its head by generating the layout of the printed paper automatically from the content currently on the website. At some point on press day, the system will use XSL to transform CollabPRES's native XML formats into PostScript, PDF, or whatever SGML format the paper's desktop publishing software uses, generating the full content of the paper in a ready-to-print format, ready to be printed to film and exposed onto the litho plate. If the transform has been set up correctly to the paper's house style, there should be no need for any human intervention at all.

Obviously editors may not want to be muscled out of this process and may still want to have the option of some final manual adjustment of layout; but that should no longer be the role of the editor of a local paper in a CollabPRES world. Rather, the role of the editor must be to go out and recruit, encourage and advise volunteer contributors, cover (or employ reporters to cover) those stories which no volunteers are interested in, and monitor the quality of contributions to the system, being the contributor of last resort, automatically 100% trusted, who may tidy up any article.

CollabPRES and the local news enterprise

Technology is not a business plan. Technology is just technology. But technology can support a business plan. Local news media need two things, now. They need to lower their costs. And they need to engage their communities. CollabPRES is designed to support these needs. It provides a mechanism for offloading much of the gathering and authoring of news to community volunteers. It automates much of the editing and prioritisation of news. But it implies a whole new way of working for people in the industry, and the issue of streamlining the flow of advertising from the locality and from national campaigns into the system still needs to be addressed.

Inspiration


  1. PRES - of historical interest only, now.
  2. Wikipedia 
  3. WikiNews (see also interview with Jimmy Wales, founder of WikiMedia)
  4. Robin Miller's essay 'A Recipe for Newspaper Survival in the Internet Age'

Creative Commons Licence
The fool on the hill by Simon Brooke is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License