Tag Archives: information-architecture


Notes on “Creating a Controlled Vocabulary”

Creating a Controlled Vocabulary

 Fast, Karl, Fred Leise and Mike Steckel (2003)

 

This was a good rundown of the general process of creating a controlled vocabulary, but a lot of this seems pretty apparent to me. I guess I shouldn’t assume that this stuff is obvious, though, given how many companies make web sites or intranets without really bothering to find out how their users use vocabulary for their domain, or even establishing a vocabulary, for that matter.

The two most important points, to me, are number 5, “Establish a record of the rules you are using if you are creating a large thesaurus” and number 8, “Go back and refine. What can be improved?” In fact, I think the whole notion of controlled vocabulary is misguided if there’s no clear rationale for it and no ongoing effort to update and maintain the terms. Language in any field is constantly changing, and the pace of change seems to keep accelerating. Anyone building a directory of Internet services in 1989 would have left off the World Wide Web, and any list about self-publishing on the web made in 1998 would probably have left off the term “blog.” How useful would those pick lists be today?

Controlled vocabulary can be damaging if there’s no mechanism for change, or if that mechanism is left unused. I don’t know why, but humanity seems to have some undying urge to compile the things around us into grand lists and hierarchies that are supposed to encompass all of what is or ever has been, ignoring our complete ignorance of what the future will bring. It’s not that classification in and of itself is bad; it’s that there’s a tendency to get to the “end” and say, “there, it’s done, and set in stone forever.”

 

 

 

Notes on “Systems of Knowledge Organization for Digital Libraries: Beyond Traditional Authority Files”

Systems of Knowledge Organization for Digital Libraries:

Beyond Traditional Authority Files

(G Hodge – 2000)

One thing I liked was this definition:

“A KOS serves as a bridge between the user’s information need and the material in the collection. With it, the user should be able to identify an object of interest without prior knowledge of its existence.”

 

I like the notion that a KOS helps users find resources they’re not even aware of. I think that’s an important goal.

 

An impression I get from a lot of LIS people is a mild disdain for the web. Obviously the web is in many ways unstructured and can be difficult to use in ways that library systems are not. At one point the article states that “Someone recently compared the Web with a large room filled with books that were scattered all over the floor.”

 

The description above is an example of the kind of lame metaphors this disdain fosters. If the web is a large room filled with books, it is the largest room that has ever existed; the vast majority of books are available virtually for free; and although they are scattered all over the floor, thousands of people will freely provide you with maps to find books on certain subjects, and everyone is provided with magical binoculars that let them see deep inside books and find a single phrase.

 

I’m not saying that bringing better standards to the web and devising better KOSs to organize web resources is bad, just that it seems like many LIS people take the existence of the web for granted.

 

One thing mentioned throughout this article is the high cost of indexing and cataloging, or of merging different cataloging schemes together. I think the costs may be exaggerated in some ways. For example, if you wish to catalog web resources for educators and for medical professionals, two groups that probably have different terminology for similar concepts, you don’t need to pay thousands of grad students to index everything under one scheme and then again under the other. Instead, develop a mapping system that translates between the two types of terminology. The mapping system would be a big project and would have to be very robust, but once it’s built it can run behind the scenes whenever anyone does any kind of searching. The article mentions cases where this has been done (with MeSH terms, for example) but insists that it is a high-cost venture.
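
To make the mapping idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the term pairs, the CROSSWALK table, and the function names are hypothetical and do not come from the article. A real mapping system would be far larger and would have to handle terms that only partly overlap.

```python
from collections import defaultdict

# Hypothetical crosswalk: each pair links an everyday (educator) term
# to a clinical term for the same concept. Illustrative data only.
CROSSWALK = [
    ("heart attack", "myocardial infarction"),
    ("high blood pressure", "hypertension"),
    ("nosebleed", "epistaxis"),
]


def build_index(pairs):
    """Map every term (lowercased) to the set of equivalent terms in either scheme."""
    index = defaultdict(set)
    for a, b in pairs:
        a, b = a.lower(), b.lower()
        index[a].update({a, b})
        index[b].update({a, b})
    return index


def expand_query(term, index):
    """Return the term plus its known equivalents; unknown terms pass through unchanged."""
    key = term.strip().lower()
    return sorted(index.get(key, {key}))


INDEX = build_index(CROSSWALK)

# The searcher types a term from either scheme; the search runs on all equivalents.
print(expand_query("Heart Attack", INDEX))
print(expand_query("hypertension", INDEX))
```

The point of the sketch is that resources get indexed once, under whichever scheme is handy, and the translation happens at search time.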

 

Similarly, what’s wrong with using the users of the indexing system as the workforce? Logs of search terms and phrases and how they are used together can be analyzed. Users can be tracked to see which titles or abstracts they click on when searching for certain terms, how long they spend on each resource, etc. Users can even be asked to rate resources and search results. If you are in the market for a hard drive or digital camera, I recommend you go to bizrate.com, pricegrabber.com, or any of a dozen services that allow users to rate both products and merchants, making it easy to find a good LCD monitor at a reputable dealer despite the massive anonymity of the Internet and the ease of creating fly-by-night stores or selling junk merchandise online. Something similar could be done to winnow out junk information and organize information resources.
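
As a rough sketch of the log analysis described above, the Python below ranks resources for each search term by how often a click held the user's attention. The log format, the resource IDs, and the 30-second cutoff are all assumptions made up for the example, not anything taken from the article or from a real system.

```python
from collections import defaultdict

# Hypothetical log records: (search term, resource clicked, seconds spent on it).
LOG = [
    ("hard drive", "res-1", 120),
    ("hard drive", "res-2", 5),
    ("hard drive", "res-1", 90),
    ("hard drive", "res-3", 45),
    ("disk", "res-1", 60),
]

# Assumed cutoff: shorter visits are treated as "clicked, then bounced".
MIN_DWELL_SECONDS = 30


def rank_resources(log, min_dwell=MIN_DWELL_SECONDS):
    """For each search term, rank resources by the number of clicks that met the dwell cutoff."""
    counts = defaultdict(lambda: defaultdict(int))
    for term, resource, seconds in log:
        if seconds >= min_dwell:
            counts[term.lower()][resource] += 1
    return {
        term: sorted(by_resource, key=lambda r: (-by_resource[r], r))
        for term, by_resource in counts.items()
    }


for term, resources in rank_resources(LOG).items():
    print(term, resources)
```

Explicit user ratings could be folded in the same way, as one more signal per term and resource pair.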