A bit of why and how – or, History in the digital age

posted May 27, 2013, 7:44 PM by Aiala Levy   [ updated May 27, 2013, 7:47 PM ]

originally posted on January 1, 2012

So why 1854 in the previous post if the dissertation begins in 1890?

1. The thorough historian's justification: context is key.  If I don't understand the politics of theater prior to 1890, how can I highlight changes, discover exceptions, or spot continuities?  What may seem to be representative of turn-of-the-twentieth-century modernization/modernity (or progress/civilization, to use the more popular terms of the day) may have actually already been in practice or in thought long before.  Or maybe not.  

2. The practical historian's justification: researching the establishment of the Theatro São José would make for a perfect short paper for a fascinating panel in an interesting conference conveniently taking place in São Paulo in July.  And it's a theater that still existed after 1890 so studying its origins wouldn't be such a digression, right?

The truth, of course, is that the historian never knows exactly where the sources will lead her.  Or at least I never do.  Nor do I want to; part of the fun of History with a capital "H"--the study of history, the doing, the finding, the interpreting, the writing--is precisely the mystery and surprise.  Like a cartographer embarking on an expedition, I may not be forging a new path and may even have a general sense of what form my final map may take, but the details and precise measurements are lacking.  If I knew ahead of time what lay in store, there'd be no point in undertaking the project.  

This time around, my sources led me to the Theatro São José, the province/state-funded theater that burned down in 1898.  Tales of embezzlement, mismanagement, and unheeded warnings surrounded this public-private enterprise that was intended to bring civilization to a backwater provincial capital.  Yet, despite the theater's shortcomings, in terms of both architecture and programming, its destruction was bemoaned by São Paulo's press and added urgency to the demand for a municipal theater.  All of this I gleaned from the major dailies of 1890s São Paulo: A Platéa, A Noite, and O Estado de São Paulo.  I was now hooked and wanted to learn more.

So how does one get to 1854 from the 1890s, especially given the lack of available periodicals from the earlier period?  (An absence felt more acutely now that the Arquivo Público do Estado de São Paulo is closed through at least March 2012.)  One helpful resource has been the Acervo Histórico da Assembleia Legislativa do Estado de São Paulo (the Historical Archive of the São Paulo State Legislative Assembly), an intimate archive with welcoming and knowledgeable staff (I was even offered cake and coffee!).  Much of the archive's holdings are digitized, and those documents unclear or unavailable online were (re)scanned for me at the director's suggestion.  Other digitized files too large for uploading online were graciously shared via an 8GB USB flash drive (BYOFD).  I was impressed.

The beauty of digitization is the ability to search.  Keyword queries obviously have their limitations.  First, the source must have words to search.  Second, the querier must already understand and anticipate the diction and spelling likely to be used.  Third, OCR (optical character recognition) technology is still less than perfect; depending on the condition and typeface of a given document, "t" may be confused for "f," "8" for "B," or a character ignored altogether.  But in the grand scheme of things, searchability has thus far proven to be more of a boon than a hindrance.

At the Assembly (ALESP) Archive, the online database of digitized documents can be searched by keyword, date, and location.  While the staff-created descriptions occasionally have minor inaccuracies or spelling errors, you can't beat the ease of typing in "theatro," downloading a dozen PDF copies of handwritten manuscripts, and then reading and annotating these beautiful documents in the comfort of your own home.  Link the PDF files to a bibliographic record on Zotero and, voilà, history's at your fingertips and ready to use!

The handwritten documents from the ALESP Documentos database may not be word searchable themselves, but the typewritten anais (annals of legislative sessions) are.  This is incredibly helpful when plodding through a 500-page text, much of it petty arguments stemming from bruised egos (OK, so some of it is entertaining).  Enter "S. José" or "theatro" (or just "heat" to compensate for the difficult "t" while avoiding other words) and you'll at the very least pick up a legislative trail.  From there you can find other potential keywords (for example, an impresario's name or a bill number) and figure out what other dates your topic of interest may appear in the annals.  Suddenly you feel like Sherlock Holmes.
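For the curious, here is a rough sketch of how that kind of OCR-forgiving query could be automated once the annals are saved as plain text.  It is only an illustration, not anything the ALESP site offers: the confusion table, the function names, and the idea of a dictionary of page texts are all my own assumptions, and the table covers only the "t"/"f" and "8"/"B" mix-ups mentioned earlier.

```python
import re

# Hypothetical confusion table: each character maps to the set of
# characters the OCR might have produced in its place.
OCR_CONFUSIONS = {"t": "tf", "f": "ft", "8": "8B", "b": "b8"}


def ocr_tolerant_pattern(keyword):
    """Build a case-insensitive regex that accepts known OCR mix-ups."""
    parts = []
    for ch in keyword.lower():
        alternatives = OCR_CONFUSIONS.get(ch)
        if alternatives:
            # e.g. "t" becomes the character class [tf]
            parts.append("[" + re.escape(alternatives) + "]")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def find_pages(pages, keyword):
    """Return the page numbers whose text matches the keyword.

    `pages` is assumed to be a dict of {page_number: page_text}.
    """
    pattern = ocr_tolerant_pattern(keyword)
    return [number for number, text in pages.items() if pattern.search(text)]
```

With this, a search for "theatro" would also catch a page where the OCR produced "fheatro," which a plain keyword query would miss.  It does nothing for characters the OCR dropped altogether, so the shorter-keyword trick still has its place.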

But what to do about the thousands of pages of legislative debates that are neither digitized nor indexed?  I discovered the following algorithm for their efficient examination:

1. Run keyword queries of all legislation passed by the province of São Paulo during my studied period, available on the ALESP website.  Some keywords I've successfully used thus far: theatro, teatro, Quartim, dramatica (no accent marks).

2. Note each bill's date(s), sponsor(s), and related draft/proposal (projeto) number(s) as available.

3. Work backwards from the bill's date in the records, using the sponsor's name, bill/proposal number, and keywords as signposts when skimming.
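If the bills and the annals were both available as text, the three steps above could be sketched roughly as follows.  This is a minimal illustration under my own assumptions, not a tool that exists: the `Bill` record, the function names, and the dictionary of session texts keyed by date are all hypothetical.  Only the four keywords come from my actual searches.

```python
import unicodedata
from dataclasses import dataclass
from datetime import date

# Keywords from step 1, written without accent marks.
KEYWORDS = ["theatro", "teatro", "quartim", "dramatica"]


def strip_accents(text):
    """Lowercase the text and drop accent marks ("dramática" -> "dramatica")."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


@dataclass
class Bill:
    """Hypothetical record of one piece of passed legislation (step 2)."""
    passed: date
    sponsor: str
    projeto: str
    text: str


def matching_bills(bills, keywords=KEYWORDS):
    """Step 1: keep the bills whose text contains any keyword."""
    return [b for b in bills if any(k in strip_accents(b.text) for k in keywords)]


def signposts(bill, keywords=KEYWORDS):
    """Step 2: collect the sponsor, projeto number, and keywords to watch for."""
    return [strip_accents(bill.sponsor), strip_accents(bill.projeto)] + list(keywords)


def sessions_to_read(bill, sessions):
    """Step 3: walk backwards from the bill's date, flagging sessions
    that mention any signpost.  `sessions` is assumed to be {date: text}."""
    marks = signposts(bill)
    earlier = sorted((d for d in sessions if d <= bill.passed), reverse=True)
    return [d for d in earlier if any(m in strip_accents(sessions[d]) for m in marks)]
```

In practice step 3 is done by eye, skimming a bound volume, so the code is best read as a description of what I'm looking for on each page rather than as a replacement for the skimming.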

The disadvantage of this method is that I run the risk of missing a relevant debate that did not materialize into legislation.  Most of the unindexed and undigitized annals at the ALESP archive, however, are not original minutes (there was no official typographer prior to 1868 or so) but edited compilations, meaning that they're relatively quick to read and that they're lacking many discussions.

No method is foolproof, but I can't help but be excited by the opportunities enabled by text analysis technology, which grows more sophisticated every day.  Text analysis, of course, requires digitization, which not only aids the researcher of the present but will also prove invaluable in the future as historical materials continue to deteriorate.  Undoubtedly, a certain magic is lost with the conversion of touchable, smellable objects to series of 0s and 1s, and reading manuscripts at home cannot substitute for the knowledge and community gained by interacting with staff and scholars at a brick-and-mortar archive.  Nevertheless, I'm glad digitization is being taken seriously not only by the Assembly Archive but also by the State Public Archive, the Edgard Leuenroth Archive at UNICAMP, the Brasiliana collection at USP, and the Museu Lasar Segall, among others.  Thanks, guys, obrigada!