David Rumsey is a map geek in the truest sense of the term. At 150,000 maps, his private collection is one of the largest in the United States. He is a pioneer in map digitization, and has made over 29,000 maps from his collection freely available in digital form on his website. David Rumsey is a board member of The Long Now Foundation, Yale Library Associates, and the Stanford University Library, among others.
Late in 2011 I had the chance to sit down with Mr. Rumsey to discuss the future of map digitization, the shifting role of libraries, and how science and the humanities can work together to help us manage and understand the vast quantity of information that we are producing as a society.
Gettliffe: You’ve used the terms “close read” and “distant read” in your discussions of maps. What do those terms mean?
Rumsey: The whole notion of a close read and a distant read comes out of the analysis of books. The close read is a more traditional read of a book from cover to cover. A distant read would be using computers to analyze the content of a million books, looking for overall patterns.
Franco Moretti, the scholar at Stanford who more or less coined the terms, wrote a book called “Graphs, Maps, Trees” before the Internet and global digitization. It was an analysis of a thousand books in which he was trying to see them all together, not in terms of a traditional humanities analysis of style or theme, but literally looking at occurrences of words.
Gettliffe: How does the concept of a distant read apply to maps?
Rumsey: A map is a text, just spread out spatially or on many pages in an atlas going in all directions. I think it’s why I love them so much. While I love to read books, I really like that text going everywhere. It’s just the way my brain is.
So how could you then do distant reading of maps with computers? With texts you can do optical character recognition (OCR) of every character and then transform that into a searchable text; but can you do that with maps? Conceptually you could but it’s a lot harder. So that’s a project I’ve been working on for several years. How could we OCR maps? Because if we could, the results would be phenomenal. It isn’t just a matter of being able to read the text faster. With maps you simply have information that is nowhere else.
The other kind of distant read that I’ve been exploring is putting together large groups of maps visually. Then you start to see patterns. For example, we joined together 674 maps of a German survey from the 1890s into one gigantic image. It’s a little different when you’re doing it with visual materials because you’re using computers but you’re also using your eyes.
Gettliffe: It seems like there are incredible challenges in the first method of distant reading you described. Whereas in words and letters you have a pretty stable symbolic system, at least for a given point in time, every single map has a different set of symbols that mean different things.
Rumsey: It gets you really excited about how great our brains are! We can look at a map and know that this is a river, this is a broad leaf forest, this is a town, and this is a province. We learn the ontology of the system. Of course we may have to look at the key a little bit. Every complicated map has a legend. The German maps have a big legend card that says, “these are the symbols for railroads,” “these are the symbols for broad leaf forests.” So you read that and then bingo, you’re doing what you want the computer to be able to do. I can’t believe that it won’t be possible, because it’s a mixture of OCR and image recognition.
Gettliffe: So you would have to include the ability for the computer to read the legend. But how consistent are legends?
Rumsey: They’re not! Exactly. So you’re going to wind up having analysis of a particular set of maps. So initially you’ll say “where is it worth doing this hard work?” and you’ll choose national map sets where there are enough similarities.
Gettliffe: What’s your intuition about where the ability for computers to read maps will lead us?
Rumsey: A couple of obvious things. It will allow us to create historical gazetteers. A gazetteer is a dictionary of names of places. They exist now but the biggest challenge has been showing changes in names of places over time. It may sound a little arcane but it’s actually incredibly important for historians and genealogists.
In addition there’s the whole question of land use, human and natural. There’s a group of environmental historians in France who have used my Cassini survey maps that I have in Google Earth to measure in analog the entire forest cover of Germany in the 1700s and then compare it to today. Today we get that through satellite imagery. So: land use, natural coverage, historical gazetteers, roads, railroads, all the information that is in maps that you just won’t find elsewhere because it’s either highly spatial or simply not around. The idea is that it would fill in the gaps.
Gettliffe: Would it be accurate to say that what you come out with is a framework or a fabric with which you can start to piece together stories?
Rumsey: I think that’s right. See, I’m primarily a librarian by nature, so I like to structure information in ways that others who are different from me, particularly scholars and historians, can find useful. So I can’t tell you exactly how they would make use of it, but my hunch is that it would be very useful.
Now I’ve been around technology long enough to see that some things we think will happen really do. Other things that we think will happen don’t happen at all, or they happen quite differently from the way we thought. I think you could make an argument that this OCR, or rendering maps into digital form that can be analyzed by computation, is way too complex. It may be that it’s 10-15 years away, and that’s an eternity at the current pace of change.
The other thing we can do is simply make these maps much more readable by humans and trust in our own ability to actually absorb and read them. Perhaps we can’t literally measure all the forest in Germany, although this French team was willing to spend the time to do it manually. Perhaps all we’ll have is an impression in our brains of what the forest was like, and that in itself will be sufficient.
Gettliffe: It seems to me that there’s a natural tension between the concepts of a close read and a distant read. A close read seems to be more of a traditional humanities type of analysis, while a distant read sounds like a much more scientific approach. They have different values associated with them, and it seems like there would be the potential for them to come into conflict in terms of how we prioritize, store and manage information.
Rumsey: I think they exist side by side. In my map world I do both all the time. I’m looking at a particular map and really exploring it up close, and then I’m pulling back and searching across a whole group of maps and trying to get a sense of trends. They should reinforce each other over time.
The close read vs. distant read issue brings up a lot of the issues around technology and humanity, and those are really wonderful issues. For me it goes back to graduate school at Yale and a whole relationship with art and technology. I got my bachelor’s in studio art at Yale College, in film, and then did a three-year master’s in fine art doing art and technology. I formed a collaborative group of 7 people that included the head of the Yale electronics lab. This was 1967, which is pretty early in computation. We essentially built interactive environments that programmed light and sound and space using video. Our group had two names. “Yale Research Associates of the Arts” was the name to get grants, because it had this sensible sound to it, but our real name was PULSA. That was a 7-year involvement for me, and it was all about technology and art, and using technology in art, and the issues that creates.
Gettliffe: One of the things I’ve noticed that comes up as a massive obstacle to potential synergy between science and the humanities is the sense that there’s trauma imposed on one side or the other; oftentimes on the humanities side of things there’s a sense that science or close logical thinking devalues creativity and direct experience.
Rumsey: Yes, the scientific method doesn’t lead from the heart; it’s a matter of proof and replicating results.
Gettliffe: Exactly. And there’s often a sense of devaluation on both sides. People who see things in a more artistic free-flowing kind of way also often say and do things that are interpreted by the other side as devaluing objective truth.
Rumsey: They have to compromise. At the time PULSA was formed we were all in our twenties. We took very seriously the idea of community and actually all lived together. We found the compromises we had to make very enlightening and wonderful and useful, but it took a lot of discussion and wine and time together just to get people to relax and share.
Currently that divide is a lot more promising because it’s a lot grayer. You have technologists and computer scientists who view themselves as very creative people. The whole world is getting closer to humanist values, and at the same time humanists, because they’re engaging with computation, are really technologists themselves. I think the way you bridge the gap is you have those skill sets and ideas literally cross over, rather than having specializations at both ends and then trying to speak.
In the library world we’re acutely aware of the need for technology skills in the education of the 21st century librarian. It just has to happen. The new librarian has to be a computer scientist as well, and needs to understand the whole language of search and digitization.
Gettliffe: Do you feel that that shift is occurring?
Rumsey: I do. I think the problem is well known now. Some librarians are engaging in it more readily than others, but it’s definitely front and center, both in terms of education of the next generation of librarians and at the libraries themselves.
I was at a board meeting of the Council on Library and Information Resources (CLIR) in November. We’re essentially a think tank for major research libraries, supported by Harvard, Yale, the Library of Congress, and others. We help libraries transition to the digital age. One of the great pieces of fun we had at the board meeting was to go to Culpeper, Virginia, to the Library of Congress’ special sub-library for audio and visual information. It’s unbelievable. They’re there digitizing all the old films, LPs, CDs, and tapes. They’re also pulling down a hundred channels of live TV every day and archiving it. All that is going to enable technology to analyze culture.
Gettliffe: It sounds like the domain of libraries is expanding with our ability to store and analyze digital information. What challenges does that raise?
Rumsey: One of the biggest challenges for some of these major research libraries is catering to the contrasting modes of research and consumption. At Stanford, which is the library I’m the most closely involved with, we have to manage millions of volumes of books in the “legacy library” that are still being read. There’s no question that the digital side of information seeking is growing fast, but Stanford can’t just put all those physical books at a storage facility somewhere because when that possibility was raised a lot of scholars said “No, we still need some books on campus. We need to be able to browse stacks. It’s how we think. We need to be able to see the edges of the books. We want to go down those rows, and we want the serendipity of finding a book that may have been misplaced.”
And then you have to ask, “Where are the students and the faculty and the graduate students? What do they need?” It winds up that libraries are social spaces, though I hate to use the word social because there’s not much talking; they just like to work together. You can see it. They’re working on their computers in various reading rooms and they’re not talking, but there’s still something going on with them being together. It sounds a little ethereal.
Another challenge is that as a librarian you have to figure out what to save. Often what you think is of interest now won’t be of interest in 50 years, and then in 100 years it’ll be of interest again.
The problem for librarians now is that there’s such abundance. The abundance of digital information, video, audio, emails, e-books is astounding. So the question is what to keep. The current paradigm is to keep everything because storage is getting cheaper and smaller, but that’s only going to last so long. The science information coming down is huge. Astronomers looking at the universe are producing petabytes of information. You can’t keep it all because it becomes like the one-to-one map of the universe; where are you going to put it? So this is hard. I don’t know the answer to it. It’s one of the interesting challenges that are out there.
Gettliffe: How has the whole digitization push impacted the way people read?
Rumsey: There have been a number of things in the press lately stating that traditional book sales are actually up. With all the worry that e-books were going to replace books, it doesn’t seem to be happening. What’s wonderful is that more people are reading everything. E-books are pulling people into reading, and then they’re often getting a physical copy as well. Some of the publishers are putting out hardcover editions now with very beautiful bindings. It’s like they’re rediscovering the art of bookmaking because they’re seeing “oh, our strength is in the physical object; let’s make it beautiful!”
Gettliffe: It’s as if the contrast between physical and digital is reawakening an awareness of what’s really valuable about a physical book.
Rumsey: It’s a good illustration that contrast is healthy. I think it’s been great there.