Database Management | Paul R. Katz

In a post about the wonders of ABBYY FineScanner back in May, I promised to write about another pillar of my archival process: the database management program DevonThink Pro Office. Like ABBYY FineScanner, it’s quite pricey ($149.95 after a 150-hour test-drive), but after nearly 15 months together, I can’t imagine my life without it.

I should say at the outset that I can claim no particular expertise with regard to this program. I have no doubt that someone with more technical skill could wring much more from it than I can. I should also note that the program is only available for Mac — I know, I know — so if you haven’t been sucked into the Apple vortex, this post won’t be of much use to you. But my fellow Mac-owning archival researchers looking to build a digital database may find something of value in the ensuing description of the DevonThink process I’ve come to rely on over the past year.

When I fire up the program and open my Dissertation database, I’m met with the menu you see below to the left. At the top are a few items: Inbox, the default repository for new files I drag into the program; Tags, which I don’t really use; Mobile Sync, a reception point for items that come in through the DevonThink ToGo mobile app; and Evernote, which receives clippings I make with the Evernote app. (You’ll find a bit more on these last two at the bottom of the post.) All of these came with the program or with apps I connected to it, as did the four items at the bottom of the list (All Images, All PDF Documents, Duplicates, and Orphaned Files). Everything in between, though, I created myself.

The header labeled Archives is where I put the documents I scan and the notes I take on them, organized by country and then by archive. Books/Articles is where I take notes on secondary sources; it’s also organized geographically. For Others holds documents unrelated to my own project that may interest friends and colleagues. Internet (Clippings/Links) is where I sort stray news articles and websites of interest. Logistics is home to information about the infrastructure of academic life: fellowships, grants, conference funding, seminars, and the like. Notebook is where I take notes and organize documents in ways that cut across multiple archives. Random/Interesting is self-explanatory, and Teaching Aids is where I put things that may help me teach all of this when I’m back home.

When I’m at the archive itself, the Archives header is, unsurprisingly, where most of the action is. Let’s imagine I’m spending the day at Argentina’s National Library. The “Biblioteca Nacional” folder has three subfolders, which correspond to the three divisions I’ve used so far: Archivo (Archive), Historia Oral (Oral History), and Libros (Books). As I work through an archival collection, I’ll create a subheader for the collection, then one for each box I consult, and finally one for each archival folder of interest.

Let’s say I’m working with the Silvio Frondizi Subcollection, on which my recent post, Revolutionary Human Rights, was based. More specifically, I’m looking through a folder from Box 7, labeled “Movimiento Nacional contra la Represión y la Tortura 1/2” (National Movement against Repression and Torture, 1/2; see below). When I come across a document I want to take note of, I’ll create a Rich Text Format (RTF) file in the corresponding folder. I title it first with the document’s date, as precisely as I know or can approximate it, and then with either its title or a phrase that better conveys its use. (I mark my own date approximations with question marks.) In the body of the RTF, I’ll include any document or page numbers I may need for later citation, followed by whatever thoughts have come to mind. When I have general observations about a folder, a box, or an entire archival collection, I’ll create a separate RTF in the corresponding place in the database titled “0 Overall” and take notes there. (The leading 0 makes sure the file jumps to the top of the alphanumeric heap, ahead of the date-titled notes.)
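To make the naming convention concrete, here is a small Python sketch that builds note names the way described above (date first, a question mark after an approximated date, then a title) and sorts them as plain text. It assumes year-first dates, so that alphabetical order doubles as chronological order. Apart from “0 Overall” and “1972? Ellos son torturados,” the titles are hypothetical, and `note_name` is an illustrative helper, not part of DevonThink. DevonThink’s own sort may differ in details, but a leading 0 should land first either way.

```python
def note_name(when: str, title: str, approximate: bool = False) -> str:
    """Date first, a '?' after an approximated date, then the title."""
    marker = "?" if approximate else ""
    return f"{when}{marker} {title}"


notes = [
    note_name("1973-02", "Hypothetical communiqué", approximate=True),
    note_name("1972", "Ellos son torturados", approximate=True),
    "0 Overall",  # folder-level observations
    note_name("1971-05", "Hypothetical flyer"),
]

# Case-insensitive text sort, roughly what an alphabetical list view does
ordered = sorted(notes, key=str.casefold)

for name in ordered:
    print(name)
# 0 Overall
# 1971-05 Hypothetical flyer
# 1972? Ellos son torturados
# 1973-02? Hypothetical communiqué
```

Because “0” sorts before any year beginning with “1,” the overview note always heads the list, and the dated notes fall into chronological order beneath it.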

*The contents of the folder, “Movimiento Nacional contra la Represión y la Tortura 1/2.” At right, above the line, an alphabetized list of the files it contains. Below it, a space to scroll through them. At left, nesting drop-down menus organized by Country, then Archive, Collection, Box, and Folder.*

If a document is worth copying and I am permitted to take photos, I’ll scan it with my cell phone and run OCR on it to produce a text-searchable PDF, which I give the same name as the related RTF I’ve just created in DevonThink. Then, when I get home, I can upload the PDFs from my phone and easily sort them into their corresponding DevonThink folders. As a final step, I’ll right-click the PDF, choose “Copy Item Link,” and paste that permalink into the RTF (see below). That turns the RTF into an all-purpose base of operations, which I can then use as a building block for subsequent indexing.
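The shared-name rule is what makes the end-of-day filing quick: each PDF can be matched to its note by name alone. The Python sketch below illustrates that matching logic; `pair_by_name` and the file names are hypothetical, and in the workflow described here the sorting happens by hand inside DevonThink. A side benefit of the approach is that a PDF whose name was mistyped shows up as unmatched.

```python
from pathlib import PurePath


def pair_by_name(rtf_names, pdf_names):
    """Match each PDF to the RTF note sharing its name (minus the extension).

    Returns a dict of RTF -> PDF plus a list of PDFs with no matching note.
    """
    notes_by_stem = {PurePath(name).stem: name for name in rtf_names}
    pairs, unmatched = {}, []
    for pdf in pdf_names:
        stem = PurePath(pdf).stem
        if stem in notes_by_stem:
            pairs[notes_by_stem[stem]] = pdf
        else:
            unmatched.append(pdf)
    return pairs, unmatched


# Hypothetical file names from a day's work
rtfs = ["1972? Ellos son torturados.rtf", "0 Overall.rtf"]
pdfs = ["1972? Ellos son torturados.pdf", "1972 Untitled flyer.pdf"]

pairs, unmatched = pair_by_name(rtfs, pdfs)
print(pairs)      # {'1972? Ellos son torturados.rtf': '1972? Ellos son torturados.pdf'}
print(unmatched)  # ['1972 Untitled flyer.pdf']
```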

*Copying the item link to the PDF for “1972? Ellos son torturados….” I will then paste this permalink into the identically named RTF.*

What kind of indexing? Sometimes an archival collection is already organized in ways that make sense for my research. The Silvio Frondizi Subcollection, for instance, groups documents chronologically and by organization or project, which is exactly how I want them. On the level of the collection itself, then, there’s no need for further reshuffling.

But other collections aren’t arranged in ways that are helpful to my work. This is particularly true of police and military archives, which typically operate through master indexes of names but are physically organized into vast collections based on other considerations, such as reporting unit or jurisdiction. I want to preserve this original system of organization, both because I will need to specify where I found the documents that I ultimately cite, and because each security organ’s proprietary system is a window onto the repressive logics I am trying to understand. But relying exclusively on these original systems would greatly hobble my ability to draw connections across the archive and to conceptualize it in ways that correspond to my arguments.

In these cases, I create archive-specific indexes that meet my own thematic needs. Take the political police files held at the Arquivo Público do Estado de São Paulo (APESP, the Public Archive of the State of São Paulo), where I worked for hundreds of hours from March to May, and which I drew on for this earlier post about torture and São Paulo’s armed Left. After finishing at APESP, I created an RTF titled “0 APESP Index.” The index features a couple dozen topics grouped under five major headings: Police/Military, Armed Groups, Anti-Torture/Human Rights Groups/Campaigns, Links to Other Countries, and Torture Topics. Within each heading, I added as many subheadings as necessary, such as “Resisting Torture” and “Testimonies” under “Torture Topics.” I then went through the full list of RTF files I had created at APESP one by one, right-clicking each to copy its item link and pasting that link into the “0 APESP Index” RTF wherever it seemed to fit (see below). Helpfully, even if I move the linked RTFs around or modify their content or titles, the links still work.
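The index’s shape (headings, subheadings, and lists of item links) maps naturally onto nested dictionaries. The Python sketch below models it and also illustrates, under a simplifying assumption, why links survive moves and renames: if a link points to a note’s permanent identifier rather than to its title or folder, changing those doesn’t break it. The `x-devonthink-item://` prefix is an assumed link format used for illustration; the note titles, folder paths, and helper functions are hypothetical, and this is a toy model, not a description of how DevonThink works internally.

```python
import uuid
from dataclasses import dataclass

LINK_PREFIX = "x-devonthink-item://"  # assumed link format, for illustration


@dataclass
class Note:
    title: str
    location: str


# A toy database: each note is stored under a permanent identifier
database: dict[str, Note] = {}


def add_note(title: str, location: str) -> str:
    """Store a note and return a link built from its identifier."""
    uid = str(uuid.uuid4()).upper()
    database[uid] = Note(title, location)
    return LINK_PREFIX + uid


def resolve(link: str) -> Note:
    """Follow a link back to its note, wherever the note now lives."""
    return database[link.removeprefix(LINK_PREFIX)]


# Headings -> subheadings -> item links, following the index's structure
index: dict[str, dict[str, list[str]]] = {
    "Police/Military": {},
    "Armed Groups": {},
    "Anti-Torture/Human Rights Groups/Campaigns": {},
    "Links to Other Countries": {},
    "Torture Topics": {"Resisting Torture": [], "Testimonies": []},
}

link = add_note("1971? Hypothetical testimony", "Archives/Brazil/APESP/Hypothetical folder")
index["Torture Topics"]["Testimonies"].append(link)

# Rename and move the note; the link stored in the index still resolves
note = resolve(link)
note.title = "1971-08 Hypothetical testimony (redated)"
note.location = "Archives/Brazil/APESP/Another hypothetical folder"

print(resolve(index["Torture Topics"]["Testimonies"][0]).title)
# 1971-08 Hypothetical testimony (redated)
```

Keying the database by identifier is the design choice that matters here: titles and locations become ordinary, editable fields, while the identifier, and any link built from it, stays fixed.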

*The APESP index, at right. To the left, the organizational system used by São Paulo’s political police.*

(Because I take reasonably thorough notes while in the archive, with a future index of just this sort in mind, the whole indexing process is quite a bit less arduous than it might sound. In this instance, it took about an hour and a half to catalogue the 150 or so PDF files I’d created at APESP. To my mind, it’s a worthwhile investment given the organizational and analytic power it unlocks. To be fair, though, this sort of thing is fun for me to a degree that sometimes disturbs even me.)

Archive-specific indexes aren’t the only kind I use DevonThink to build. The second kind is the thematic index, and these fill the Notebook portion of my database. Here, I keep running compilations of links to documents I come across that relate to specific organizations, individuals, places, or themes. For instance, the armed Peronist group Montoneros is of particular interest to me. When I come across a document that pertains to this group, I copy and paste its item link into the “Montoneros” RTF in my Notebook (see below). I hope that, as I move into the writing stage, these indexes will serve as proto-outlines and also help me with the macro-level organization of the dissertation and related articles.

*A thematic index, for Argentina’s Montoneros.*

This description of my process hasn’t touched on many of the features that set DevonThink apart, so allow me to mention them briefly. With DevonThink, you can:

— Import photos and merge them instantly into multi-page PDFs, which can then be run through OCR
— Take notes on documents and PDFs
— “Replicate” files so that identical copies sit in various places at once, yet an edit to any is an edit to all
— Sync to your phone or tablet using DevonThink ToGo (a product with which I’m less satisfied than I am with DevonThink Pro Office)
— Import directly from Evernote (which has far better web-clipping capabilities than DevonThink ToGo)
— Develop customized workflows using Automator
— Create “smart groups” based on tags, keywords, or full text
— Enjoy powerful search functionality, including a concordance

This last feature alone is, to me, worth DevonThink’s purchase price. While no OCR output is perfectly searchable, on the whole it works pretty well, especially when supplemented by the keyword-driven notes I take in the linked RTFs. The result is that when I have only an inkling of a document in mind, I can almost always find it quickly. Full-database searches, moreover, sometimes turn up parallels and connections I wouldn’t have anticipated. I’d never create a thematic index without running one first.
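A concordance is essentially a word-frequency list for the whole database, typically paired with a view of each word in its context. The Python sketch below shows the idea on two hypothetical note bodies; it is not DevonThink’s implementation, and `concordance` and `keyword_in_context` are illustrative names. Python’s `\w` pattern matches accented letters, so Spanish and Portuguese words such as “Represión” are counted as whole words.

```python
import re
from collections import Counter


def concordance(notes: dict[str, str]) -> Counter:
    """Count every word across all note bodies (case-insensitive)."""
    counts = Counter()
    for text in notes.values():
        counts.update(word.casefold() for word in re.findall(r"\w+", text))
    return counts


def keyword_in_context(notes: dict[str, str], keyword: str, width: int = 25):
    """Return (note name, surrounding text) for each whole-word match."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    hits = []
    for name, text in notes.items():
        for match in pattern.finditer(text):
            start = max(0, match.start() - width)
            hits.append((name, text[start:match.end() + width]))
    return hits


# Hypothetical note bodies, keyed by note title
notes = {
    "1972? Ellos son torturados": "Flyer denouncing the torture of political prisoners.",
    "Montoneros": "Links to documents on the Montoneros and on torture campaigns.",
}

counts = concordance(notes)
print(counts["torture"])  # 2
for name, context in keyword_in_context(notes, "torture"):
    print(f"{name}: ...{context}...")
```

A keyword-in-context listing like this is one way a full-database search can surface connections across archives before any thematic index exists.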

In closing, I want to stress that the process I’ve described here is not something I could have created whole cloth at the outset, even after consulting the many academia-focused posts I found online. (Though this one in particular, by historian Rachel Leow, did serve as an extremely helpful jumping-off point.) Rather, it’s a method that could only have grown, trial-and-error style, out of my intensive use of the program during a sustained period of primary research, and I’m sure it will keep changing as my work advances. If you end up going the DevonThink route, I expect your system will look different from mine; indeed, that’s the idea!

I hope these words and screenshots prove useful to someone. If you’re that person, or if you have any questions or have found anything here to be unclear, please do let me know!


DevonThink for Archival Research