Learning Metonymy | lessons from emerson’s school

March 22, 2009

academic commons: new media and teaching

Filed under: digital humanities, pedagogy — waldo @ 8:41 pm

March 19, 2009

birkerts on the kindle

Filed under: digital humanities, metonymy — waldo @ 2:40 pm

Birkerts has a recent post on the Atlantic. Re-covering old ground: the page to screen transfer. But even more clearly, re-hearsing the blatant problem of his Gutenberg Argument. Here, he makes a case for the ‘page’ as context, and the ‘screen’ as the flattening of context. And yet, in his own print book, in the pages of Gutenberg (as even my students in English 101 wonder) there is very little Gutenberg, very little context for understanding the history of print, or even of texts. Birkerts venerates the aura of the book: the aura that is removed from its context. This may be a reverse of Benjamin; read more closely, it seems to me, it is the real insight: reproduction brings us not just access, but greater access to the creative process (now more intertwined with reproduction, but process nonetheless): the reader is ready to turn into a writer.

Whitman wants context through access—and understands that context comes at a price for the page: specimen daze; the page of convulsiveness.

Birkerts wants the book ripped from its context, presumed pristine; the book, as he puts it, beyond revision. Birkerts wants the page as metaphor (window); what he doesn’t like is the screen as metonymy (contiguous with the process of its production).

March 10, 2009

devon think

Filed under: Bush, digital humanities — waldo @ 4:26 pm


January 29, 2005

Tool For Thought

This week’s edition of the Times Book Review features an essay that I wrote about the research system I’ve used for the past few years: a tool for exploring the couple thousand notes and quotations that I’ve assembled over the past decade — along with the text of finished essays and books. I suspect there will be a number of you curious about the technical details, so I’ve put together a little overview here, along with some specific observations. For starters, though, go read the essay and then come back once you’ve got an overview.

The software I use now is called DevonThink, and I’m sorry to report that it is only available for Mac OS X. (I know there are a number of advanced search tools available for Windows, so I’m sure most of what I describe here could be reproduced — I just don’t know enough about the search tools on that platform to recommend anything.)

I talked in the Times essay about using the tool as a springboard for new ideas and inspiration. Here’s what that process looks like in practice. This is the window that shows me an overview of part of my “research library” in DevonThink:

screen1.jpg

These are all books that I have transcribed digital passages from over the past 10 years or so; you can see how many quotes I have for each book in the little number in parentheses after each title. Oftentimes I’ll start the exploration with a straightforward keyword search, in this case: “urban ecosystem.” I plug that in, and get back one result, a short quote from Manuel DeLanda’s excellent A Thousand Years of Nonlinear History.

screen2.jpg

This is where it gets interesting. I take that quote, and click on the “see also” button, which generates an instant list of other documents or quotes that have some semantic connection to the original one. I can see a few words from the entry, along with the author and book title.

screen3.jpg

I find another, more elaborate quote from DeLanda in that bunch:

screen4.jpg

And then I perform a “see also” on that quote. I get back a few pointers to essays that I’ve actually written (and completely forgotten about), including a review of an E.O. Wilson book on biodiversity that I wrote about three years ago. Ultimately, I end up with this wonderful quote from Jane Jacobs that draws an explicit analogy between natural and man-made ecosystems. The whole process takes me no more than a minute.

screen5.jpg

Over the past few years of working with this approach, I’ve learned a few key principles. The system works for three reasons:

1) The DevonThink software does a great job at making semantic connections between documents based on word frequency.

2) I have pre-filtered the results by selecting quotes that interest me, and by archiving my own prose. The signal-to-noise ratio is so high because I’ve eliminated 99% of the noise on my own.

3) Most of the entries are in a sweet spot where length is concerned: between 50 and 500 words. If I had whole eBooks in there, instead of little clips of text, the tool would be useless.
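
To make reason #1 concrete, here is a minimal sketch of what a word-frequency “see also” might look like. It is not DevonThink’s actual algorithm, which this post does not describe; it assumes plain term counts compared with cosine similarity, and the names `term_frequencies`, `cosine_similarity`, and `see_also` are hypothetical.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Lowercase the text and count how often each word occurs."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[word] * b[word] for word in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def see_also(query_text, notes, top_n=3):
    """Rank notes (a dict of title -> text) by similarity to query_text.

    Assumption: raw word counts stand in for whatever weighting a real
    tool would use. Notes sharing no words with the query are dropped.
    """
    query_tf = term_frequencies(query_text)
    scored = [
        (cosine_similarity(query_tf, term_frequencies(text)), title)
        for title, text in notes.items()
    ]
    scored = [pair for pair in scored if pair[0] > 0]
    # Highest score first; ties broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top_n]
```

With short clips in the 50 to 500 word range, a few shared words move the score a lot, which is one way to read reason #3: in a whole book, nearly every word shows up somewhere, and the counts stop discriminating.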

I think #3 is the point that needs to be drilled home to people working on desktop search. It’s been hidden from us largely because the web itself is broken up into pages that are often in that 500 word sweet spot. Think about the difference between Google and Google Desktop: Google gives you URLs in return for your search request; Google Desktop gives you files (and email messages or web pages where appropriate.) On the web, a URL is an appropriate search result because it’s generally the right scale: a single web page generally doesn’t include that much information (and of course a blog post even less.) So the page Google serves up is often very tightly focused on the information you’re looking for.

But files are a different matter. Think of all the documents you have on your machine that are longer than a thousand words: business plans, articles, ebooks, pdfs of product manuals, research notes, etc. When you’re making an exploratory search through that information, you’re not looking for the files that include the keywords you’ve identified; you’re looking for specific sections of text — sometimes just a paragraph — that relate to the general theme of the search query. If I do a Google Desktop search for “Richard Dawkins” I’ll get dozens of documents back, but then I have to go through and find all the sections inside those documents that are relevant to Dawkins, which saves me almost no time.

So the proper unit for this kind of exploratory, semantic search is not the file, but rather something else, something I don’t quite have a word for: a chunk or cluster of text, something close to those little quotes that I’ve assembled in DevonThink. If I have an eBook of Manuel DeLanda’s on my hard drive, and I search for “urban ecosystem” I don’t want the software to tell me that an entire book is related to my query. I want the software to tell me that these five separate paragraphs from this book are relevant. Until the tools can break out those smaller units on their own, I’ll still be assembling my research library by hand in DevonThink.
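
As a rough illustration of returning paragraphs rather than files, the sketch below splits a document on blank lines and reports only the paragraphs that contain every query word. It is a plain keyword match, the simplest stand-in for the semantic matching described above, and the function names are hypothetical.

```python
import re


def split_paragraphs(text):
    """Split on blank lines and drop empty pieces."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def matching_paragraphs(text, query):
    """Return (index, paragraph) for each paragraph containing every query word.

    Assumption: paragraphs are separated by blank lines, and a match
    means all query words appear somewhere in the paragraph.
    """
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return []
    hits = []
    for index, paragraph in enumerate(split_paragraphs(text)):
        words = set(re.findall(r"\w+", paragraph.lower()))
        if all(term in words for term in terms):
            hits.append((index, paragraph))
    return hits
```

The result is a list of specific passages with their positions in the book, instead of a single verdict that the whole file is relevant.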

I wonder whether it might be possible to have software create those smaller clippings on its own: you’d feed the program an entire e-book, and it would break it up into 200-1000 word chunks of text, based on word frequency and other cues (chapter or section breaks, perhaps). Already DevonThink can take a large collection of documents and group them into categories based on word use, so theoretically you could do the same kind of auto-classification within a document. It still wouldn’t have the pre-filtered property of my curated quotations, but it would make it far more productive to just dump a whole eBook into my digital research library.
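
A minimal sketch of that automatic clipping idea, under simplifying assumptions: it uses only paragraph breaks (blank lines) as cues, not word frequency or chapter structure, and it groups consecutive paragraphs until a chunk falls in the 200-1000 word range. The function name `chunk_text` and its parameters are hypothetical.

```python
import re


def chunk_text(text, min_words=200, max_words=1000):
    """Group consecutive paragraphs into chunks of roughly min_words..max_words.

    Caveats of this sketch: a single paragraph longer than max_words
    becomes its own oversized chunk, and the final chunk may be shorter
    than min_words.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks, current, count = [], [], 0
    for paragraph in paragraphs:
        n = len(paragraph.split())
        # Close the current chunk if adding this paragraph would overshoot.
        if current and count + n > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(paragraph)
        count += n
        # Close the chunk as soon as it is long enough.
        if count >= min_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Chunks produced this way could then be indexed individually, though they would still lack the hand-picked quality of curated quotations.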

The other thing that would be fascinating would be to open up these personal libraries to the external world. That would be a lovely combination of old-fashioned book-based wisdom, advanced semantic search technology, and the personality-driven filters that we’ve come to enjoy in the blogosphere. I can imagine someone sitting down to write an article about complexity theory and the web, and saying, “I bet Johnson’s got some good material on this in his ‘library.’” (You wouldn’t be able to pull down the entire database, just query it, so there wouldn’t be any potential for intellectual property abuse.) I can imagine saying to myself: “I have to write this essay on taxonomies, so I’d better sift through Weinberger’s library, and that chapter about power laws won’t be complete without a visit to Shirky’s database.”

These extra features would be wonderful, but the truth is I’m thrilled to have the software work as well as it does in its existing form. I’ve been fantasizing about precisely this kind of tool for nearly twenty years now, ever since I lost an entire semester building a Hypercard-based app for storing my notes during my sophomore year of college. There’s a longstanding assumption that the modern, web-enabled PC is the realization of the Memex, but if you go back and look at Bush’s essay, he was describing something more specific — a personal research tool that would learn as you interacted with it. That’s what I think about whenever I use this system to stumble across a genuinely useful new idea: finally, I have a Memex!

Posted by sberlin at January 29, 2005 08:43 AM
