Dan Heller's Photography Business Blog Industry analysis from www.danheller.com

The photography world -- the business, the culture, the art, the politics, the technology.

Location: Santa Cruz, California, United States

Thursday, May 03, 2007

Keywording Follow-up

Since my last posting on the role that photo keywords will play in the future of stock photography, I've had a few emails that warrant further discussion. One was from a photographer who shoots for Fotolia:

On [fotolia], photographers need to list their keywords in order of pertinence. The most relevant keyword at the top and the least at the bottom.

Top and bottom of what? The photo's IPTC keyword set? That set is not defined to have an "order", so all the keywords you put there are subject to re-ordering by any application that happens to read (or, more importantly, re-write) that data. If you happen to use a keywording application that preserves the order in which you place keywords, that is an anomaly, and it isn't guaranteed to persist in the photo.
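
To illustrate why relying on order is fragile, here is a minimal Python sketch. It assumes a hypothetical application (not any real IPTC tool) that treats keywords as an unordered set and writes them back alphabetically; the function name `rewrite_keywords` is invented for illustration.

```python
def rewrite_keywords(keywords):
    """Simulate a hypothetical app that treats keywords as an unordered set
    and writes them back in alphabetical order."""
    return sorted(set(keywords))


# The photographer's intended order: most relevant keyword first.
ranked = ["sunset", "beach", "california", "orange"]
rewritten = rewrite_keywords(ranked)

print(rewritten)                      # ['beach', 'california', 'orange', 'sunset']
print(rewritten == ranked)            # False: the ranking is gone
print(set(rewritten) == set(ranked))  # True: the keywords themselves survive
```

The keywords are all still there after the round trip, but the "pertinence" ranking they were supposed to carry is not.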

Once again, the assumption is IPTC data. If an agency were to have you use another (proprietary) app that writes these keywords somewhere else, that calls into question the usefulness of such an app. Keywords are only valuable if they stay with your images.

Regardless of how Fotolia is doing this, their policy of "ordering keywords" is either a serious design flaw, or it's part of a broader method for "locking in" users to keep them from going to other agencies. Such tactics are actually being employed by a number of microstocks in their efforts to discourage photographers from leaving. (Another tactic is to have photographers use the agency's version of a model release, which makes it impractical for photographers to use multiple agencies, at least for those photos.)

Back to keywording... the point is that many agencies are already grappling with the problem of optimizing search results for users, and in so doing, are trying to come up with ways to use existing standards or resources to solve the problem. That's fine, if it can work, but as we've seen, it's hard to resist the tendency to be proprietary in an effort to keep photographers from leaving, and that's where things go awry for everyone. The worse problem is that if no one comes up with a universal solution, the problem will perpetuate to the point where keywording itself devolves into uselessness. To explain how that can happen, we have this excerpt:

I'm guilty of hitting a Thesaurus from time to time to get massive amounts of words but I've since learned that since my 2,000 images can easily be lost in a sea of millions of photo's I needed to be very careful about how I approached it. My main plan is to come up with one major keyword and blast it. I put the main word as the title, the subject and my top keyword.

This is a prime example of a site giving photographers an incentive to manipulate the system in their own favor, which keeps de facto usage conventions from ever evolving and eventually causes the system to break down entirely. First, even with the best intentions, not everyone who keywords their photos will be consistent in their use of words like "future" or "moody" or "sexy" for any given image. Second, it doesn't take long for anyone who submits a photo of a woman to see their image come up more often when "sexy" is in the title than when it isn't. Without a central policy (and a policing mechanism to ensure compliance), the quality of search results goes down across the board as more and more users "cheat." As less honorable people learn the same methods for rising higher in the search results, they too will do the same with an even broader wordset, yielding more and more hits for themselves while degrading the quality of search results even further. This perpetuates until finally there are no really useful keywords in images anymore, and the usefulness of the site diminishes.

This is exactly what happened with traditional search engines--which is why they stopped looking at a web page's keywords meta tag. Those tags became too unreliable as an indicator of what a page's content "is." At least the search engines had a backup plan: parse the text in the document to derive meaning. That's not possible with images.

Here we see one of the most basic examples of game theory: competitors are given incentives and disincentives to either cooperate or compete, based on weighing the risks and benefits of their actions. In this context, "cooperation" is defined as assigning keywords in ways that allow the search engine to perform well (give the user quality results). If the system isn't designed to encourage cooperation rather than manipulation, the system is designed poorly. And that's the inherent flaw in current keywording practice and in the way microstocks are set up. Photographers' only incentive is to do whatever they can to make their images rise to the top. This means that there is absolutely no benefit to "properly" keywording images. Ever. And as photographers continue to rush in, the problem gets less manageable all the time. This will eventually degrade search results in a way that will either turn buyers away, or give quality photographers less incentive to stay, or both. Because the era of microstock agencies is so new, the marketplace has yet to catch up to it. So, there's still time to fix this problem.

(For economics students reading this, this is exactly why a "totally free-market model" doesn't work. There must be policies to ensure fair competition, and those policies must be uniformly and reliably enforced. Of course, it's not so easy to do it in practice, but that doesn't mean you don't try.)

There are only two ways to address this problem. One is the method employed by Getty and Corbis, et al., who manage keywording entirely in-house. Though the photographer may submit images with his initial keyword selections, those keywords are stripped out and placed into the site's central databank. The team responsible for keywording reviews and revises all photos and their keywords, orders and categorizes them, and places the results into a hierarchical database store that's far more intelligent than a flat storage mechanism such as IPTC. So long as the searching mechanism looks in that centralized file--and does so intelligently--there's nothing inherently wrong with this approach. Its only "cost" is that it requires a staff to manage the process. This isn't necessarily a big cost, but if you're a microstock house charging $1/image, it may not only be very costly, but also severely curtail your resources for growth.

On this point, I refer back to my previous posting, where I called for an extension to the standard for how keywords are represented so they can preserve hierarchical representation. For example, "people:men:old" is a SINGLE keyword with hierarchy built in. Here, at least, much of the manpower currently supplied by a "staff" can be handed back to the photographer, or to the contributing community at large.
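
As a rough sketch of what a hierarchical keyword buys you, the Python below parses a keyword like "people:men:old" into levels and matches a search against it by path prefix. The colon separator comes from the example above; the prefix-matching rule and the function names are my own assumptions for illustration, not part of any existing standard.

```python
def parse_keyword(keyword, sep=":"):
    """Split a hierarchical keyword such as 'people:men:old' into its levels."""
    return tuple(part.strip().lower() for part in keyword.split(sep) if part.strip())


def matches(keyword, query, sep=":"):
    """Return True if the query path is a prefix of the keyword's path.

    Assumed rule: a search for 'people:men' finds 'people:men:old',
    but a search for 'men' alone does not, because 'men' is not a top level.
    """
    kw, q = parse_keyword(keyword, sep), parse_keyword(query, sep)
    return len(q) > 0 and kw[:len(q)] == q


photo_keywords = ["people:men:old", "places:europe:italy"]

print(parse_keyword("people:men:old"))
# ('people', 'men', 'old')
print([k for k in photo_keywords if matches(k, "people:men")])
# ['people:men:old']
print([k for k in photo_keywords if matches(k, "men")])
# []
```

Because the hierarchy travels inside the keyword itself, a flat store can still hold it, and the structure survives even if an application re-orders the keyword list.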

The other way to address the problem is to change the incentive/reward model, starting with a realistic policy on keywording, and backing it up with a rule-based intelligent search mechanism (as outlined in my previous blog posting). Under such a system, those who keyword the best and most effectively will make more sales, and those who try to skirt around the system (by over-keywording or mis-keywording) will eventually drop down the results list. As the site's overall search quality rises, so does the business, and along with it come more photographers and even more incentive to cooperate.
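
To make the incentive shift concrete, here is a minimal Python sketch of one possible rule. It is an assumption for illustration only, not the rule set from my previous posting: each matching keyword counts for less the more keywords a photo carries, so padding a photo dilutes every match. The `score` function and the sample keywords are hypothetical.

```python
def score(photo_keywords, query_terms):
    """Hypothetical relevance rule: matches divided by total keyword count,
    so every extra keyword on a photo dilutes the value of each match."""
    keywords = {k.lower() for k in photo_keywords}
    if not keywords:
        return 0.0
    hits = sum(1 for term in query_terms if term.lower() in keywords)
    return hits / len(keywords)


careful = ["woman", "portrait", "studio", "smiling"]
padded = careful + ["sexy", "business", "future", "moody",
                    "beach", "travel", "family", "success"]

query = ["woman", "portrait"]
ranked = sorted(
    [("careful", score(careful, query)), ("padded", score(padded, query))],
    key=lambda item: item[1],
    reverse=True,
)
print([name for name, _ in ranked])  # ['careful', 'padded']
```

Under a rule like this, the padded photo still shows up for more queries, but it ranks below the carefully keyworded one on each of them. A rule this simple does nothing about outright mis-keywording, which would need a separate check.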

This model still puts the task of keywording into the hands of the photographer rather than a centralized staff, so the problem can't be entirely eliminated, but a more realistic cost/benefit system mitigates it significantly. To address the remaining liability of leaving the keywording task solely in the hands of the photographer, I turn to some feedback I got from others.

One microstock agency told me they were going to similarly crowd-source their photographers to keyword everyone else's photos too. The theory is similar to that of the wiki model, where the aggregate of input and opinions generally yields quality results.

While this model works for sites like Wikipedia, the difference here is that the "contributor" has an incentive to bump up the standing of their own photos and to degrade that of others. As long as that incentive exists, the system's integrity is unstable. (Imagine how the quality of Wikipedia entries would degrade if contributors were paid based on whether their version of an entry, rather than the previous author's, was the one used.)

Still, while the idea of crowdsourcing the keywording task may have its problems, the idea itself has some merit. For example, providing a mechanism for non-contributors to do the work deserves some thought. Giving non-photographers an incentive to come to a photo site to keyword photos is a marketing challenge, but it's only the biggest of a very small series of hurdles to clear. I say this having realized, to my surprise, how many millions of people are willing to surf sites that are even less interesting, more boring, and with even less to do. In other words, nothing surprises me, so it wouldn't be a stretch to assume a good, creative marketing person could come up with a way to get people to keyword images in a way that doesn't give them an incentive to make one photographer's images rank better than another's.

During one brainstorm session that went far too long into the night, I actually had a weird idea: make a multi-player game out of keywording images. The ability to gain experience points, and the opportunity to trade or monetize those points, is a proven model among millions of otherwise intelligent (but bored) brains. Quality and quantity rise to the top, with photos getting keywords based on the aggregate assignments weighted by experience points and other merits. The "pay" would be based on reaching tiers in experience points and other incentives and merits. It's almost like Skee-Ball at an old-style boardwalk arcade, or a carnival, where you get tickets for winning at games and turn them in for prizes. More experienced users would oversee the newcomers, and so on.
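
Here is a minimal Python sketch of how "aggregate assignments weighted by experience points" might work. Everything specific in it is an assumption for illustration: the `aggregate_keywords` function, the 50% support threshold, and the sample point values are all hypothetical.

```python
from collections import defaultdict


def aggregate_keywords(assignments, threshold=0.5):
    """Keep keywords whose experience-weighted support meets the threshold.

    assignments: list of (experience_points, keywords) pairs, one per player.
    threshold: fraction of all experience points that must back a keyword.
    """
    total = sum(points for points, _ in assignments)
    if total <= 0:
        return []
    support = defaultdict(float)
    for points, keywords in assignments:
        # Count each player's vote for a keyword once, weighted by their points.
        for keyword in {k.lower() for k in keywords}:
            support[keyword] += points
    return sorted(k for k, s in support.items() if s / total >= threshold)


votes = [
    (900, ["lighthouse", "coast", "fog"]),  # veteran player
    (300, ["lighthouse", "coast"]),         # intermediate player
    (50, ["sexy", "lighthouse"]),           # newcomer gaming the system
]
print(aggregate_keywords(votes))  # ['coast', 'fog', 'lighthouse']
```

The newcomer's bogus keyword carries only 50 of 1,250 points, so it never reaches the photo, while keywords backed by experienced players do.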

Ok, maybe it's a tad far-fetched, but other far-fetched ideas include the PC, the Internet, car rentals at airports, using oil instead of steam to power vehicles... Oh, and the printing press. Johannes Gutenberg almost went out of business because no one thought anyone would ever want a "printed book." With the web these days, that prediction might turn out to be true in the end after all. But it doesn't mean there wasn't economic opportunity to be had in the interim 560 years.