Chronicling America: Historic American Newspapers online

What I want to begin with is asking where you’ve been my whole life, because I spent a large part of my life sitting in front of World War II-era microfilm machines, turning cranks in rapture, really, while entire years of my youth passed by. I went to graduate school not knowing I was going to be a historian, and one day I made the mistake of reading a newspaper on microfilm. It had never occurred to me that this was possible, or that microfilm existed outside of old James Bond movies. It took me maybe 30 minutes to make a fundamental discovery that has shaped everything I’ve done for the last 35 years: oh my God, this stuff actually happened, and it all happened at the same time, and people didn’t know they were living in history, and they didn’t know how things were going to turn out. What are we going to do about that? My mom, who had been a 5th-grade teacher for 30 years, asked me when I went to graduate school to study history, “Well, what for? We already know what happened.” And I read those newspapers and said, “Oh no, we don’t.” And so from that moment in front of the artifacts that you folks are doing so much to share with the world, my life changed. A whole 600-page book grew from that: I was looking for lynchings and discovered Coca-Cola and football games on the same pages, and I thought, “OK, how would you tell that story?” And then, when I realized the big difference it had made in my life, I had 160 of my students at my previous institution hit the microfilm machines and chronicle what they found in six months of a newspaper.
They would cuss me and tell me, “This is hard on my eyes, it’s dark in there,” all those kinds of things, and then at the end of it they would say that seeing what was in those newspapers was worth any number of books they could have read, and certainly worth my entire semester of lectures. So then I wrote a book for which I read every newspaper I could find from the 1890s in the American South, which meant (you remember what this was like) going to the library and persuading people: “We need this newspaper from Shreveport in 1892. Other people will use it, they really will. Please buy this for me.” And they fell for it, and I happily used the papers. Then I actually bought my own microfilm machine so I could be a complete nerd on weekends and at night, partly because I had made the mistake of trying to eat M&M’s at a library microfilm machine to stretch my attention span, and the librarian said, “Mr. Ayers, you’re setting a terrible example for the students.” It’s microfilm, and they’re M&M’s; they melt in your mouth, not in your hand. Really, it’s going to be OK. Of course, now you go to libraries and it’s “Another latte!” But back then it was not that way.

So then, as you heard briefly, I thought up the Valley of the Shadow back in 1991, and then the World Wide Web came along and we said, “We’ll build the thing.” And I don’t get to tell this one very often: we started with SGML, which we thought stood for Sounds Good, Maybe Later. Maybe Later turned out to be HTML, and by 1993 we were able to build sites, and I knew what needed to be on there. It needed to be the newspapers that had changed my life.
So I said, “Surely we can OCR these babies in 1993.” You couldn’t. But we could scan them all; we could digitize them all. And we did that. We didn’t know then that something called the PDF file was going to be invented, so we had Group 4 fax-compressed TIFFs, and we had to figure out some kind of viewer that would show them on the computer screens of the time. Those all just choked when you called up a page of a newspaper. They couldn’t handle it, but we persisted.
So then we transcribed 10,000 pages of newspapers ourselves for the Valley of the Shadow. Well, “ourselves” being not me exactly, but graduate students who were paid nicely for it by NEH grants, thank you very much. Then we XML-tagged them, and it’s still, I think, one of the larger collections of XML-tagged newspapers, covering these two communities in the American Civil War. Then I wrote a book in which you could go to the digital version and see every newspaper, letter, or diary on which it was based. Then we wrote an article for the American Historical Review that claims to be the first natively digital peer-reviewed article, and the fact that, ten years later, it is also the last one is one of the reasons I’m before you today. That’s a little discouraging. Here’s my beef: you folks are helping us through the most profound social change of our time, and surely scholars can think of more to do with the work you’re doing than we have so far. That’s why I jumped at this chance, because I’ve been trying to do my part in all this. Here’s a piece of it that I’ve managed, probably driving some people crazy. I thought, wouldn’t it be great if we could share this possibility of reading all these newspapers with lots of people? So this is something I first thought
of at the University of Virginia and then took with me when I went to the University of Richmond in 2007. Students in classes all across the country (as it turns out, the continent) can do research in it. Instead of getting a paper back with “B+, great” on it and throwing it away, so that all that work is for naught, wouldn’t it be great if all those people who read six months of a newspaper were actually able to preserve what they found and share it with other people? So I’ll just do a quick search for “newspapers” and see what we’ve got. Here are 191 articles about newspapers, and you can see the idea: each one is a monograph, but only one paragraph long. It’s got everything a monograph has in it. It has original sources, it has secondary context, it makes an argument, you can go back to where it came from, and it all fits together. Now, I have no idea what that is; we’re not going to look at that. I don’t want you paying attention to that; listen to me while I’m talking. That’s always the danger of showing things. You can map it geographically, and we’re still expanding all that.

That was devised before Chronicling America really came online, and ever since, I’ve been trying to figure out exactly what to do with it. One of my former students, Andrew Torget, at the University of North Texas, has done a project, and I’m sure all of you have seen this, that goes through the Texas newspapers, graphs them and the quality of their text, shows them on maps, and then shows the percentage of the words that are good words. And then Stanford University has mapped, using the Chronicling America metadata, the spread of newspapers across the United States.
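The “percentage of the words that are good words” is a simple OCR-quality measure. Below is a minimal Python sketch. It assumes that a “good word” means a token found in a reference word list; the talk does not say how the Texas project defined the term. The tiny `lexicon` is a hypothetical stand-in for a full dictionary.

```python
import re


def good_word_ratio(text, lexicon):
    """Fraction of alphabetic tokens in `text` that appear in `lexicon`.

    Assumption: `lexicon` is a set of lowercase dictionary words. Garbled
    OCR produces many non-words, so a low ratio suggests poor OCR.
    """
    tokens = re.findall(r"[A-Za-z]+", text)
    if not tokens:
        return 0.0
    good = sum(1 for token in tokens if token.lower() in lexicon)
    return good / len(tokens)


lexicon = {"the", "union", "army", "is", "near"}  # hypothetical word list
print(good_word_ratio("The Union army is near", lexicon))   # 1.0
print(good_word_ratio("Tlie Uuion army is near", lexicon))  # 0.6
```

Real OCR text also contains proper names, archaic spellings, and abbreviations that a plain dictionary misses. A ratio like this is therefore better read as a relative signal across titles or years than as an absolute error rate.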
All those things are really cool. What I’m trying to think about is what we do now, what else we might do with it. Voting America maps votes across the United States from 1840 to 2008, and many of those returns came from newspapers originally. We’re doing a new digital edition of the Paullin atlas, the Atlas of the Historical Geography of the United States, which will be out in a few weeks from the Digital Scholarship Lab. When we looked through it, it said that most of this information came from newspapers or the Library of Congress. So things that we now just think of as being there were not always there. They were recorded by newspapers and then transcribed into the iconic maps of American history. Some things that you might not think of as newspapers turn out to be newspapers, too. With the sesquicentennial of the Civil War coming, I thought it would be great if people had a chance to really look in the records themselves to see why people said they seceded. In Virginia, 152 delegates came from all the counties of Virginia to Richmond and debated how they were going to save the United States. Most of them were Unionists when they arrived. We call it the Secession Convention now, but they didn’t know it was the Secession Convention. And they talked, and then they talked and talked. And then they talked some more. It was all written down by newspaper reporters and recorded in the newspapers of Richmond. Then, in 1965, all of that was gathered into a four-volume, 3,000-page set published by the Library of Virginia.
So when the sesquicentennial came along, I thought, wouldn’t it be great if we actually made that available, so that people could see what’s in those 3,000 pages? So it reads all 3,000 pages and finds every time “slave” with an asterisk (so slavery, slaveholders) is mentioned, and everybody knows that the Civil War was fought over states’ rights, which is why slavery is only mentioned 1,432 times in all of that. But these are newspapers that then became something else, and now we’re trying to find ways to look inside them and find the patterns. I did it as a map; you can do it as a time plot; it can show you the frequency with which they mention the word, or anything that has to do with slavery. So those are newspapers as well. Sometimes newspapers are not immediately visible, but they underpin most of what we know about the American past, and certainly the 19th-century American past. And I am grateful for that. So what we’re doing now is trying
to think about making a kind of scholarship that would be worthy of the work that you folks are doing. One thing that we’re doing is... actually, I think I’ll show something else instead. I think the most sophisticated work so far is being done by my colleague Rob Nelson at the University of Richmond. In Mining the Dispatch, he took the Richmond Dispatch, which had been tagged in XML with an IMLS grant, and is doing topic modeling on it. He’ll be coming out with a new article comparing it to the New York Times. Now, I love this: we got that by buying the CD-ROM that they sell for 50 bucks, and it has ASCII text of all the New York Times articles from the Civil War era. So there it was, in clean text, and now he’s doing the comparison. You all know what topic modeling is: basically, the computer reads the newspapers and finds the patterns within them without knowing what the words mean. So if you look at fugitive slave ads, the computer says, “Well, I notice all these words: negro, years, reward, boy, man, name, jail, delivery, give, left, delivered,” and apparently those are runaway slave ads. Rob lets you adjust the chart for various thresholds at which it recognizes these, and then he has the ads themselves. This is the critical thing: how do we look at the big patterns and not lose the thing that makes the work that you do so important? The newspapers are windows into the soul of America. Every page is interesting. Even the ads, really.
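The idea that the computer “finds the patterns without knowing what the words mean” can be made concrete with a small sketch. This one uses a widely used topic model, latent Dirichlet allocation (LDA), fitted by collapsed Gibbs sampling in plain Python. The talk does not say which model or software Mining the Dispatch used, so treat this as an illustration of the general idea only. The function name, the toy documents, and the hyperparameters `alpha` and `beta` are all assumptions chosen for the example.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics=2, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    docs: list of token lists. Returns one list per topic of (word, count)
    pairs, most frequent first. Nothing here knows what any word means.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]                 # tokens per (doc, topic)
    topic_word = [defaultdict(int) for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                               # tokens per topic
    assignments = []

    # Start with a random topic for every token.
    for d, doc in enumerate(docs):
        z = [rng.randrange(num_topics) for _ in doc]
        for w, k in zip(doc, z):
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Take this token out of the counts before resampling it.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized P(topic t | all other assignments).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return [
        sorted(((w, c) for w, c in counts.items() if c > 0), key=lambda wc: -wc[1])
        for counts in topic_word
    ]


# Hypothetical toy "ads" and "war news"; a real run would use OCR text.
docs = [
    "reward jail boy delivered reward years".split(),
    "army battle regiment cannon battle".split(),
    "reward jail man name delivered left".split(),
    "regiment army cannon general battle".split(),
]
for k, topic in enumerate(lda_gibbs(docs, num_topics=2, iterations=100)):
    print(k, [w for w, _ in topic[:4]])
```

Each token is repeatedly reassigned to a topic in proportion to two things: how common that topic already is in its document, and how common the word already is in that topic. Words that keep appearing together (reward, jail, delivered) therefore tend to drift into the same topic. A real corpus would need stopword removal and many more iterations. One plausible reading of the adjustable threshold Ayers mentions is a cutoff on each ad's share of a topic, which this sketch does not compute.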
You can look at those, and I know I’m preaching to the converted here, but it’s wonderful to assign this, or to go to a general audience and show them these things, and just watch their eyes open as they realize what these newspapers are. Now Rob is comparing these and seeing how the northern and southern vocabularies of war changed, how they ebbed and flowed with politics or with battles. It’s remarkable, but the crucial thing is that you’re always able to go back to the original record. So the challenge before us right now is: how do we combine the power of the analog and the power of the digital? And you folks are doing that on a scale that no one else is. You are using your remarkable ingenuity and skill to make it possible not just to sit in front of a microfilm machine and hope that you come across a record of a crime, which is what I did for two years for my dissertation.
You feel a little ghoulish: “Yes! No, I mean, I feel bad for you, but all right, some evidence,” you know? Imagine doing all that, and then doing this for everything else. What would it have meant, and what does it mean right now, for the dissertations and books being written that Chronicling America exists? Now, I have to say that I recently published an essay (not that one, but the one that was on the screen when I left it) called “Does Digital Scholarship Have a Future?” and I’m sure it’s going to come around. Let’s see if we can find it. There we are. I started to pretend I hadn’t written it and just say some things to you, but then I thought, well, no, maybe I’ll just go ahead and show you. Here’s what I said. Basically, it’s a jeremiad.
“Though the recent popularity of the phrase ‘digital scholarship’ reflects impressive interdisciplinary ambition and coherence, two crucial elements remain in short supply in the emerging field. First, the number of scholars willing to commit themselves and their careers to digital scholarship has not kept pace with institutional opportunities.” By which I mean you all. You’ve given us incredible tools. And I keep saying, “Hey, everybody, let’s go write something that really takes advantage of that full capacity.” Instead, you’ll read a book or a journal article and it’s got a newspaper in it. If you figure the odds, their just stumbling over that would have been fairly small; you know they found it through search, through Chronicling America, and it doesn’t say that. It just cites the newspaper. Right? So in some ways the contributions that you’re making (I see people smiling) are hidden, because here’s the thing. Well, let’s read more and see what the thing is. “Second, today few scholars are trying, as they did earlier in the web’s history,” by which I mean way back in the early ’90s, “to re-imagine the form as well as the substance of scholarship. In some ways, scholarly innovation has become domesticated with the very ubiquity of the web bringing a lowered sense of excitement, possibility and urgency. These two deficiencies form a reinforcing cycle. The diminished sense of possibility weakens the incentive for scholars to take risks, and the unwillingness to take risks limits the impact of the excitement generated by boldly innovative projects.”
So what I’m saying to my colleagues is: wow, look at this. Look at all these newspapers we have for American history. Surely we can think of something cool and new to do with them. I remember when I first started talking to people about this World Wide Web thing, one of my slightly older colleagues said, “Isn’t this a bit like the hula hoop?” I said, “No, this is a bit like television; it’s going to change everything pretty soon.” And now we don’t even think of it as using technology when we use Google Earth or Skype. Ten years ago, if you were a faculty member, you were trying to get people to adopt technology; now they just take it for granted.
But we skipped the stage where we had the idea that maybe we could do something ourselves with it, instead of just shipping around PDFs of the kind of scholarship we could have written 30 years ago. Maybe we can do new kinds of scholarship if we think about the capacity that’s inherent in things like Chronicling America. So what might that look like? Well, we’ve tried to do one project at the DSL in Richmond that has some of the elements of what we call generative scholarship. This is Visualizing Emancipation. It’s an animation, which I won’t play, because experience shows that your eyes will follow any moving object over here rather than listening to me. The blue dots are everywhere the United States Army was across the entire era of the Civil War. The red dots are the places where African Americans interacted with that army. A large part of this comes from the original records of the War of the Rebellion, 128 volumes that have been digitized, geo-indexed, and animated. But we used a lot of other sources and source types as well, and one of them is newspapers. So you can see here evidence that we drew from the newspapers, many of them from Chronicling America, by looking for runaway ads. What this shows is the reaction of enslaved people to an unexpected moment of freedom. You hear the Union Army is somewhere nearby; maybe you can make your way to it. Tragically, one of the things that you see is that the outcome often was not what they had hoped. Often it was abuse. Often it was conscription. Often it was being dragged back into slavery if the Confederate Army caught you, and so forth. This wouldn’t have been possible five years ago. And what’s also new is the question my colleague Scott Nesbit, who actually made this, is writing about: what would scholarship that describes this look like? We’re used to taking quotes and weaving them together into a story, but what if you start with a moving image that shows enormous complexity over an area the size of continental Europe, the behaviors of millions of people? What would scholarship look like if you did that? Will it be a journal article? Will it be blog entries? And you’ll see “Add an event” up at the top, which means that we’ll be able to add crowdsourcing to this. It also says “Data download”: following the wonderful example of Chronicling America, we want to be able to share all the information behind this. So we’re trying to make a scholarship worthy of the gift that you’ve given us.
We’re trying to think about how we would rise to the possibility of plenitude, of the multitude of things that you’ve given us. I’m sometimes chagrined when I look in the newspaper and people seem to think that history moves forward when you dig up a body to see if it actually has arsenic in it, right? That’s how we make new discoveries in history. Or somebody finds a box of old letters under their grandmother’s bed after she passes away. But you folks are giving us billions of words, boxes that we have yet to open, and there are patterns within them that we’ve only begun to glimpse. So I would like to let you know that there are people out here who are so grateful for all that you’re doing. We’re using what you’re giving us, in the way that you’re giving it to us, now. How much faster could I have written books if I could actually have looked for what I was looking for, rather than just wandering around until I stumbled over it, and there it is? At the same time, I’m not giving up the serendipity, the surprise that comes from reading the whole page, from seeing the simultaneity. So things that would have seemed like science fiction when I was first starting out in all this, you’ve created. And now I want you to help us have the courage to write the kind of scholarship that can do justice to it. I think we need to be thinking about what Chronicling America makes visible to us: the distribution, spread, and decline of different kinds of newspapers, but also what they can tell us and what they cannot. They feel like the voice of the past, but we have to understand their political affiliations and all that; race and gender are obvious, but kind of taken for granted sometimes. So the point is that I hope we’ll have occasion going forward for you all to say, “You know, I’ve spent a lot of time with these. Here’s what I think a great study of them could look like. Here’s what I think a form of scholarship that’s actually native to the web could look like.” Not a book; that doesn’t mean anything against books, but it does mean that we’re living in the middle of the most profound social transformation of our time. You’re a part of it. Scholarship is kind of standing on the edge: you watch entertainment, video, music, and books be transformed, but scholarship hasn’t really figured out a way yet to use that power not just to disseminate what we would be writing anyway, but to re-imagine what scholarship might be. You folks have had that imagination; you’re doing the work that allows us to dream big, and I just wanted to come here today and thank you. Thank you very much. [Applause]

Who wants to go first?

I’m ready. I’m Errol Somay, Director of the Virginia Newspaper Project at the Library of Virginia. My question to you, Dr. Ayers, is: why not? Why isn’t there more scholarship or energy behind this? Is it just because it’s so overwhelming?

If you’d read my essay, Errol, “Does Digital Scholarship Have a Future?” There are many reasons; the risk-reward balance doesn’t look right. It’s pretty clear what you need to do to get tenure, and this isn’t it. Right? So I’m trying to build a bridge from both sides. I was a dean for a long time, judging a lot of tenure files, and now I read all the tenure files. You can see that scholarship is a very clear thing. Scholarship is a contribution to an ongoing conversation. It’s not just any thought you happen to have about a subject. The reason we spend so much time in graduate school bringing people up on the literature is so they can make a meaningful contribution to that conversation. To the extent that you’re not doing that, it’s not scholarship, and you’re not going to get tenure. OK? So the trick is for us to think about how you make an argument with digital sources and through digital means that you couldn’t make otherwise. The most obvious answer is to show a lot of stuff. I’ve tried that. It takes a long time, and people still don’t think of it as scholarship unless it makes an identifiable argument that can be tested, so that people can say, “OK, now we know this; we can go on to the next thing.” And here’s the irony: presidents love this, because it’s something for us to talk about; deans love it; department chairs love it. Well, not so much the department chairs. Departments say, “Oh. That looks kind of like teaching or service, which are both good, but they’re not scholarship.” So on one hand, people keep saying we need to change the standards of scholarship, and I say, yeah, but you’ve kind of got to realize this is the game we signed up for: to contribute to ongoing conversations. There are lots of ways to do that. Does yours? On the other hand, relax. You can contribute to an ongoing conversation through a really brilliant blog entry as well as through a journal article. So both sides need to recognize what they can bring to it. It’s nobody’s fault; it’s just that there are very few incentives to do it. The main incentive comes once you’ve tried it: you see that people out there are using your work online in far greater numbers than you could have reached otherwise, and that people all over the world can instantly have access to it, and you start saying, “Oh, this is why I would do this.” What I worry about is this: I gave a talk at the Library of Congress maybe 10 years ago, and a young woman stood up at the end of it and said, “You know, that makes me want to cry.” I said, “Why is that?” She said, “Because we would love to do the stuff that you’re telling us to, but they won’t let us.” And the trick is figuring out who “they” is, and sometimes “they” is “we.”

Hi, I’m Millie Fries from Iowa. I come to this project as a middle school and high school educator, so first, thank you for speaking a language I understand, without acronyms, because I didn’t understand those.
What I’m wondering is why we would ever need another middle school or high school history textbook if we can teach with tools like the ones you showed us. We would have kids so engaged in history that we’d never have to sell the subject again, or have someone ask, “Well, we already know what happened.” Why can’t we have some kind of template, where here are the topics you want to cover and here are the resources: the newspapers, the primary sources? Can we move to something like that, and completely away from textbooks?

That’s a great question. I taught a class at UR this spring called
Touching the Past, which is about all the different ways that the past is present in our lives: video games, movies. I had them watch how much history was on television for a week, and compare the Bancroft Prize-winning books with the bestselling books on Amazon; they felt sorry for us. I asked them to write their first essay about their experience with history in school up to that point, and they all said the same thing: “I loved my history teachers. I hated my textbook.” The textbook seems to be the thing people equate history with. I meet people on airplanes, I tell them I’m a historian, and they say, “I always hated history, the names and dates.” Yeah, I know, right? But why is that? It’s because we’ve domesticated it to the textbook. So to answer your question, there’s no reason we can’t, if we make enough cool projects. Believe me, I’ve worked with teachers, and they would much prefer to have things where kids can discover the truth for themselves rather than double-column, shrink-wrapped, processed, corporatized textbooks, of which I’ve written one. This is so much better. In all honesty, Visualizing Emancipation is meant to let people in middle school or high school actually imagine the most profound social change in this nation’s history, the end of the perpetual bondage of 4 million people. Rather than reading that first there was the 13th Amendment and then the 14th Amendment, maybe you could click on each one of those red dots and get the story of an individual person trying to make himself or herself free. That’s the great promise.

I will end with this. Chronicling America is what we should be doing: a great democratizing effort, and if scholars can think of other ways to present it, to channel it, it connects. The other thing the kids told me is that local history is what first got them interested and kept them connected to history. We know that’s what people really care about. The work you’re doing is a way to make that bridge to the local. What I’m talking about are tools like Visualizing Emancipation that also let us see how the local might connect with state, regional, national, even international patterns. So if we’re going to make it useful (we know people love it), we need to give them tools that let them see what the larger story is. But I do believe Chronicling America and the great documentation projects at the Library of Congress are a great gift, one that we’re going to figure out pretty soon how to rise to. And that’s my message today: have faith in us and give us ideas. We’d love to be your allies and partners. Thanks again. [Applause]

I wanted to talk about a project that I hope is making good on the incredible amount of work that has gone into assembling historic newspapers in Chronicling America and that, I hope, is not imperiling my tenure case. But we shall see. This project investigates the antebellum culture of reprinting. So, really briefly, I want to talk about what I mean by that. As many of you will know if you’ve worked with 19th-century newspapers, they were a kind of all-purpose medium. They didn’t contain only what we think of as news. They also included poetry and fiction. They included travel narratives, and for a lot of readers they were the primary vehicle through which they got all kinds of content.
The image isn’t great, but you can see here a poem by Whittier, a short story (a temperance story), an advertisement, and, further down the page, some news. This is also before the rise of most modern copyright law, so texts would circulate freely within the system of 19th-century newspapers. If I were an editor in St. Louis, I would subscribe to newspapers in New York, Boston, and Philadelphia, and when they came in, I would browse through them. If there was anything I thought my readers might like, or anything that would fill a certain number of column inches I needed filled, I would simply take it and reprint it. Sometimes I would change aspects of it: sometimes I would remove the author’s name, or put the name of one of my own authors on the piece. Sometimes I would change it to suit my readership, or to suit my political bent, or something of that nature.
Here you can actually see a poem by the Scottish poet Charles Mackay whose title changes as it moves around the country. Its first line changes too, and, as we’re going to talk about a little later, the new version, not Mackay’s version, is actually the one that becomes the most popular and is turned into a song that was apparently beloved by Abraham Lincoln, although I think that is more anecdote than established fact. Some scholars have written about this culture of reprinting, most famously and perhaps most powerfully Meredith McGill, who writes about how this culture of reprinting formed the basis not only for newspapers but for magazines and even books, which were reprinted as well. But in her book, American Literature and the Culture of Reprinting, she actually talks about how hard it is to get at the reprinted texts. Basically, she says 19th-century newspapers aren’t all that well indexed, so the only way to get at them is through bibliographies that have been compiled by other scholars. Or you can do what Ed talked about and read all the newspapers: sit down, read them, and sort of index them yourself as you go. When you digitize the newspapers, things get a little bit better, because you can search them, right? If you know that a text was popular, you can go in here, search for it, and find more instances of it, and this is actually what got me started down this whole crazy path. I found a Nathaniel Hawthorne story that had been reprinted, went to some digital archives, and started searching, and in about three days of searching I uncovered three times as many copies of that story as the best bibliography of Nathaniel Hawthorne listed, and I knew I was on to something. The problem is that with just a basic search interface, you can only find the things that you already know are there, because you need keywords to find those copies. So getting at the texts that were really popular during the 19th century, but that we’ve lost track of, is impossible through basic search. This is where I began to talk with my new colleague, David Smith, to see if we could solve that problem using the data. So David’s going to talk a little bit about some of the
work he’s done. So our initial approach to this problem needs
to solve a couple of technical issues with this, solve a couple of technical issues due
to the fact that you’ve been digitizing so much data. We made it a little easy on
ourselves by starting out with the period covered by Meredith McGill and the work that
Ryan had done before, just the 1860 and before period. There’s nothing in principle that
limits us to that but it allows us a relatively small playground in which to get started,
you know a mere 41,000 issues of 132 different newspapers. So there are a couple of features
of this data which some of you here are more familiar with than anyone else. One thing,
there are no breaks between the articles so if the point is to find reprinted stories,
poems and so forth, those stories and poems run together into the rest of the content
of the newspaper. So at a high level, what are we going to do? What we’d like to do
is say, hey, there’s this passage, this article in the Journal Extra, that matches up with
this article in the Jeffersonian and that also matches up with that article in the Cleveland
Morning Leader, and we actually close the loop; ideally we want this cluster
to say this text was reprinted three times or 100 times, whatever it is on this slide. But
even to start on this problem, we need to solve the computational problem of finding these
pairs among 41,000 newspaper issues, just in the antebellum period, which means
that in theory a brute-force approach would require 874 million pairwise comparisons
between issues of newspapers. And not only that: because we don’t know where the boundaries
of the articles are, we need to search every cell of this grid, which indicates the beginning
point where two newspapers might start to match up and the ending point where
two newspapers might stop matching up. And then, by hypothesis, there’s this other stuff
in the corners: ads that aren’t the same,
other articles that aren’t the same in the two newspapers. And again, the
final couple of problems will be very familiar
to people in this room. There are species of reprinting that are not interesting from
the point of view of viral texts; they’re just reprinting that happens in the normal
course of doing business as a newspaper: having a masthead in the upper left, right
there, or the National Republican newspaper’s manifesto that they reprint
every week, or advertisements. There’s a standing ad for a certain oculist that appears
every week in the Vermont newspaper, or in every newspaper in Nashville, or something
like that. That doesn’t mean that we don’t want reprints within the same newspaper. It’s
interesting to know, right, that a paper printed “The Raven” in 1848 and again
in 1852; we just don’t want this kind of boilerplate reprinting. The final problem
we want to solve is again of great interest to everyone here, which is the state of the
optical character recognition. This is why you show the page images in a lot of the interface.
But we’re working with what we’ve got, and I’ll have a few more
remarks about that later. So as a computer scientist, speaking again about making one’s
tenure case, how do I explain working on this project from the point of view of my field?
What’s interesting about this from the computer science point of view? First of all, a lot
of the work on finding duplicate text has been in a very different setup, mostly on
the web: for instance, removing duplicate webpages from a search engine
index so you save space and so the same results don’t show up again in
search results, or detecting plagiarism by students in programming assignments
or writing assignments. In those cases most of the document is being
copied, which is very different from what we have here. At the other end, people have worked
on much smaller bits of text being reprinted, sometimes called meme tracking by Jure Leskovec
and others, where they’re talking about very short, quotable, viral phrases
of three or five or 10 words, and again, that’s not what we’re dealing with. We need to
search for text that might be a few hundred or a few thousand words, but still much smaller than
the whole document. Unlike some other work, we don’t know the boundaries, and we have the
problem of wanting to ignore very close duplicates in the same newspaper, as well as the noisy
OCR. So at a high level, as I sketched out, we want to first approximately detect these
pairs of newspapers that might contain reprinted texts, then approximately find the high-confidence
regions, the passages that are actually being reprinted, and then link these passages together
into clusters of viral networks. How are we going to do that? We adopt a strategy that
at its basic level is familiar to anybody who has built an inverted index. But rather
than building an inverted index of single terms, we build one over sequences of terms.
Here I’ll show an example of building an inverted index of 5-grams. But we also
build indexes of longer n-grams, as well as of ones that aren’t contiguous sequences
of five words. So how do we build that index? Well, we run through each document (here are
three example documents, 1, 2 and 3) and get the first five words. So those five words
appear in document 1. The next five words, the second sequence of five,
also appear in document 1, and so on. With cleaner text, we can afford to use longer n-grams, which
filter out some of those spurious matches earlier, and we can use gapped n-grams. What
we’d like to see is that if a text was reprinted widely, as this article announcing the completion
of the first transatlantic cable was, a few n-grams appear
in multiple texts in that actual cluster. That n-gram appears in the first and the
third documents but not in the second, due to an OCR error. That n-gram appears in the
first and the third. That n-gram appears in the first
and the second. One thing to note here, however, is that we can ignore a
lot of the index terms, which we couldn’t do if we were just searching for
an individual document. We can ignore all of the terms that appear only once, which
by Zipf’s law is going to cut our index in half; by definition, we’re only looking
for things that are repeated. So we’ll take that index and, instead of organizing it by
the order in which these n-grams appear in the text, we will organize it by
n-gram and then by document pair, which allows us to see which document
pairs share a lot of text and which share only a little. By “document” here I mean
an entire newspaper issue. So newspapers 1 and 2 share only that one n-gram in this example.
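The indexing step just described can be sketched in Python. This is a minimal illustration under stated assumptions, not the project’s code: documents are whitespace-tokenized strings; a “gapped” n-gram is taken here to mean every `gap`-th word (one possible reading); an n-gram is dropped when it occurs in only one document; `max_df` is a hypothetical cutoff standing in for the filter on very common fixed phrases; and `series` is a hypothetical mapping used to skip pairs of issues from the same newspaper.

```python
from collections import defaultdict
from itertools import combinations


def ngrams(words, n=5, gap=1):
    """Yield n-grams as tuples. With gap > 1, take every gap-th word
    (an assumed, simple form of 'gapped' n-gram)."""
    span = (n - 1) * gap + 1
    for i in range(len(words) - span + 1):
        yield tuple(words[i:i + span:gap])


def build_index(docs, n=5, gap=1):
    """Inverted index: n-gram -> set of document (issue) ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in ngrams(text.lower().split(), n, gap):
            index[gram].add(doc_id)
    return index


def shared_ngram_counts(index, max_df=None, series=None):
    """Re-organize the index by document pair: (a, b) -> number of shared n-grams.

    max_df: hypothetical cap on how many documents an n-gram may appear in.
    series: hypothetical dict doc_id -> newspaper title, used to skip same-paper pairs.
    """
    pair_counts = defaultdict(int)
    for doc_ids in index.values():
        if len(doc_ids) < 2:
            continue  # seen in only one document: cannot signal a reprint
        if max_df is not None and len(doc_ids) > max_df:
            continue  # too widespread: treat as a stock phrase
        for a, b in combinations(sorted(doc_ids), 2):
            if series is not None and series[a] == series[b]:
                continue  # same newspaper series: likely boilerplate
            pair_counts[(a, b)] += 1
    return dict(pair_counts)


# Toy usage: three tiny "issues" sharing parts of one story.
docs = {
    1: "the cable is complete and the queen sends greetings",
    2: "the cable is complete and we rejoice",
    3: "news the cable is complete and the queen sends greetings today",
}
counts = shared_ngram_counts(build_index(docs))
print(counts)  # {(1, 2): 1, (1, 3): 5, (2, 3): 1}
```

As in the slide example, issues 1 and 3 share several n-grams while issue 2 shares only one with each; the pairs with high counts are the candidates worth aligning.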
Newspapers 1 and 3 share a couple of them. We’re going to prune out pairs of issues from
the same series to cut out this boilerplate. We’re going to cut out n-grams that are
just very common, fixed phrases in the language. After this process,
we’ve got ourselves down to comparing only 15 million pairs of newspaper issues in
this corpus, or less than 2% of the total number of pairs that we would have to compute if
we’d done this by brute force. So now we’ll talk about finding the reprinted passages
inside, and we’ll use an algorithm related to one that you might have seen,
so-called edit distance, or Levenshtein distance. It’s often used, for instance, in
finding cognates between two languages: couleur and color, where
French inserts a u vis-à-vis the American spelling, e goes to o, and so forth. And the
nice thing about dynamic programming alignment algorithms like this is that they
let you search the exponential number of paths, the possible ways that two newspapers
could line up, in only quadratic time, due to the work of Edsger Dijkstra and Vladimir Levenshtein.
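For readers who haven’t met it, here is the textbook dynamic program (Wagner-Fischer) for Levenshtein distance, the relative mentioned above. It fills a table one row at a time, so it runs in time proportional to the product of the two lengths instead of enumerating every possible alignment. This is a generic sketch of the idea, not the project’s alignment code.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("couleur", "color"))  # 3
```

The distance of 3 for the cognate example comes from dropping both u’s and substituting o for e.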
Anyway, we don’t want an alignment of the whole newspaper issue;
we only want the part in between. To speed that up, we anchor the alignment at
the points, the n-grams, that we already found, and then we have these pairwise alignments.
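Finding only “the part in between” is what local alignment does. Below is a minimal Smith-Waterman sketch over word tokens. It is a swapped-in textbook technique, not necessarily the project’s exact scoring, and it leaves out the n-gram anchoring described above (it simply scores the whole grid). The `match`, `mismatch` and `gap` scores are arbitrary assumptions.

```python
def local_align(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment over two token lists.

    Returns (best_score, (a_start, a_end), (b_start, b_end)) with
    half-open spans, so a[a_start:a_end] is the aligned passage in a.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets an alignment start anywhere: this makes it local.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        score = H[i][j]
        if score == H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            i, j = i - 1, j - 1
        elif score == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, (i, best_pos[0]), (j, best_pos[1])


a = "ad ad the cable is complete ad ad".split()
b = "news news the cable is complete more".split()
print(local_align(a, b))  # (8, (2, 6), (2, 6))
```

Both spans point at “the cable is complete”. The surrounding ads and unrelated words, the material “in the corners,” are excluded.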
To cut a long story short, from this point on we use single-link clustering and find
these connected components within that graph. So what did we find, what does it look like?
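Single-link clustering over the pairwise alignments amounts to finding connected components: any two passages joined by a chain of aligned pairs land in the same cluster. A small union-find sketch follows; the passage ids are made up for illustration.

```python
def single_link_clusters(pairs):
    """Group items into connected components given (a, b) links, using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(x, y):
        root_x, root_y = find(x), find(y)
        if root_x != root_y:
            parent[root_y] = root_x

    for a, b in pairs:
        union(a, b)

    components = {}
    for item in list(parent):
        components.setdefault(find(item), set()).add(item)
    return sorted((sorted(c) for c in components.values()), key=lambda c: c[0])


# Hypothetical passage ids: p1~p2 and p2~p3 were aligned, so p1 and p3 cluster together.
links = [("p1", "p2"), ("p2", "p3"), ("p4", "p5")]
print(single_link_clusters(links))  # [['p1', 'p2', 'p3'], ['p4', 'p5']]
```

Chaining is what lets a cluster grow beyond direct matches: p1 and p3 need never have been aligned with each other.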
Well, here’s an example of James Buchanan’s farewell address, where he’s sort of concerned,
talking about how horrible it is that slavery is tearing the country apart, and
obviously this is widely reprinted in the corpus. Even though there are a lot of
differences in the OCR transcription, we can find four out of a cluster of 30 different
examples of this in the collection. Or temperance stories by T. S. Arthur. So that’s
the stage at which we turn it back over to Ryan and some of our other colleagues for
analysis. OK. You still with us? Excellent. So what does this mean for me? Well, what this
means for me is an incredible corpus to work with. At the moment we’re working with
about 392,000 texts; in terms of clusters, we have several thousand clusters of widely reprinted texts
from the 19th century. The exciting thing for me as a scholar of the period is that
the vast majority of these are just not texts that literary scholars have ever really paid
any attention to either because they’re obscure, because they’re anonymously written,
because they don’t fit the genre categories that literary scholars tend to work with.
So what are the things that we’re finding? These examples are drawn from our top 20 of
the most widely reprinted things in the corpus that we’ve generated. Unsurprisingly a fair
number of political speeches, but what’s interesting is that the political speeches
get contextualized in widely different ways. They go viral but the different newspapers
that print them are using them for very different purposes to support different causes. This
is Washington’s farewell speech that goes viral actually on two separate occasions in
the years leading up to the Civil War and for different reasons. I’m happy to dig
into more of these in the Q&A if you want, because I’m going to kind of fly through them.
The other thing that we get is a lot of news, unsurprisingly. David mentioned the message
that Queen Victoria sends on the completion of the transatlantic cable, and actually this
also goes viral twice. The reason it goes viral twice is that the first time, the telegraph
operators don’t actually transcribe the entire message, and the newspapers reprint
it along with a lot of commentary about how rude the queen was in her message. Then
they reprint it again with the entire message, saying “Oops, our bad”: she wasn’t actually
rude, we just didn’t get the whole message. So it actually sort of circulates around the
country twice in two different versions. We also get an awful lot of stories, fiction, sentimental
stories in particular: lots of stories of husbands and wives and children. We know that
this is a popular genre in the 19th century, and we see it reflected in a lot of the things
that go viral. What I find really interesting about a lot of these is that in the newspaper
they get framed in a very particular way. I don’t know quite what to call them actually.
I’ve been working with a few possibilities, such as anecdotes, because often these stories are
not unlike the thing that you get from your cousin in your email: “Did you hear that
story about…?” By which I mean they’re framed not quite as fiction and not quite
as news. The following letter was found by a husband, written by his wife, after she passed away,
and it’s this very sentimental letter. But of course it doesn’t say what his name was
or what her name was or where they lived or anything that would allow you to track it
down, so you can’t immediately go and verify whether this story
took place. Lots of travel narratives; travel narratives are very popular. This is a lovely
one about sailing through the Paris sewers in canoes, which I guess
would be fun. And lots of jokes, unsurprisingly. Lots of jokes in your Facebook feed, lots
of jokes in 19th-century newspapers. This one is about a young husband who decides
he’s going to put his foot down and assert his authority over his wife and then is summarily
disciplined for doing so. So what else can we do with these? And what’s exciting is
when we put this newspaper data into conversation with other kinds of data, we can learn some
really great things about 19th-century print culture writ large. And so one of these, the
Newberry Library in Chicago provides an atlas of historical county boundaries, so what did
the country look like at various points in time? When I first started experimenting
with this data in our GIS, I brought in Chronicling America data about the founding of newspapers
and I overlaid that with the historical county boundary data from the Newberry Library to
get this map that shows the spread of newspapers and the growth of political boundaries in
the country from the beginning all the way to the year 2000 which is when the data ends.
Just as a broad visualization, it’s fascinating. I love the way that the newspapers sort of
strike out for the territories. They appear and then the political boundaries follow closely
from them. I need to change the color, it looks a little bit too much like an AT&T commercial
at the end but I like this visualization. We can also bring in historical census data
and we can do things like try and get snapshots of the potential readership for particular
stories. We know where they were printed. We have the historical county boundaries.
If we bring the census data in, then we can maybe learn something about who might have
been reading different stories and whether those audiences were different. So here are
just a few John Greenleaf Whittier poems, again these are all from our top 20 most viral
texts. Another poem, that Charles Mackay poem. This piece on the affectionate spirit, which is really a
little bit of parental advice that tells dads that it’s okay to show affection toward
their children: you will not ruin them by giving them a hug, basically. And then this really
self-indulgent article about how much smarter kids will be if they read a newspaper, which
was widely printed in the newspapers. At the broadest level, we can get a sense of how
many reprintings we’ve discovered in the data thus far (and I mean, we’re only working
with what’s in Chronicling America right now) and of the approximate population within,
say, five miles of where those were reprinted; we could also do 10 miles or two miles. But
the census data is actually much richer than this, so you can dig in and see how
many literate people lived within five miles of where this was printed. If you have an
abolitionist piece, what was the slave population like near where it was printed? Here is one interesting
geographic visualization, based just on the data thus far: this is the speech that the
abolitionist John Brown gives at his sentencing hearing. It goes viral; it’s widely reprinted.
But alone among everything in our top 20 most reprinted texts, the John Brown speech is
not printed in Kansas and it is not printed in Nebraska, which were the two places that
were fighting at the moment about slavery. It actually is printed a little bit in the
South. One might assume it probably wasn’t printed in the South, but it was in the
South and not in the Midwest, at least in the papers that we’re looking at now, and I have
to add that caveat whenever we talk about this stuff. There are also lovely collections
of historical maps and you can bring these into conversation with your data. This is
from the David Rumsey collection, an 1843 traveler’s map of the United States. This
was a map that included railroad lines, post roads, things of that nature for people trying
to get around the country. And this was actually what got me thinking about geography in the
first place. I geo-rectified this map, which means bringing it into alignment with modern
coordinate systems, and I overlaid on it some of the print histories that I had been working
with, and I immediately noticed this close correlation between print histories
and the railroad networks. This is perhaps not shocking (population, rail, print, they’re
all going to follow the same paths), but I had not really been thinking about transportation
networks before I visualized this, and this got me thinking about it. So then I asked,
is there good data about historical transportation networks? And it turns out that the University
of Nebraska has a brilliant project, Railroads in the Making of Modern America, where they
actually provide GIS data for the transportation network at various points in the history of
the country. And so we can bring that data in, again, into conversation. This is the
railroad network in 1861, or sorry, that was ’55, and then ’61. And we can also do
some time visualizations so that you see the spread of the rail network along with different
printings of stories. And what’s interesting here is that there are some stories that seem
very closely aligned with the transportation network and there are others that seem to
not be so closely aligned. This Charles Mackay poem for instance seems to appear primarily
in places that are not on the rail network and that’s maybe an interesting thing to
dig into. Most of these visualizations suggest further research. They suggest a connection
and then you want to dig in and find out is this really a connection and what’s going
on here. You want to learn more about it. We don’t have to watch all these. I just
threw them all in there because I like them. And I’m going to let David talk about some
of the modeling that he’s been doing. Right. You’ve seen some qualitative visualizations that we’ve been doing;
what we’d like to do, and are currently working on, is also
quantitative evaluation, to make sure that we’re finding a significant
number of the things that were actually being reprinted. We’re currently working on some
manual cluster construction: here are some texts that we know were reprinted; let’s
manually find all of them and check that we’re getting them. We’d like to
build models to try to distinguish between texts that did or did not go viral, to characterize
different kinds of texts that went viral, or perhaps to characterize some of these
more usual or unusual genres. The one thing to note at a high level on the quantitative
evaluation is that just by looking at large clusters of very long texts, we’re able,
without a lot of labor, to find the very long texts that are being reprinted.
Not surprising, right? There are just more opportunities for those n-grams to match
and to overcome the problems with the OCR. As the reprinted texts get shorter and shorter,
say down to 1,000 matching characters, it takes more computational effort to
find them. Again, not surprising. Another interesting quantitative thing to note about these data
is the time lag between the initial text and the last text in a cluster, or, say, the
median text in a cluster. How long did it take different kinds of texts to travel around
the country? If you plot the distribution of these median time lags, you see there are
two peaks here. This is on a log scale, which means that the first peak
is around two or three weeks. There are certain texts, newsy ones one might speculate,
that travel very fast. But then there’s another peak out here at seven on that scale, at around
three years. You can also plot this across time: as the years go on, newspapers become
better at retailing faster texts. Communication is getting better, and there are just more
texts and more newspapers. Again, not surprising to all of you who work on this data. So is
there anything different about these fast and slow texts? Well, if we fit a regression
model to the texts of these fast and slow clusters, you find that articles that travel
fast tend to use terms like, not surprisingly in this period, Texas, Mexico, Zachary
Taylor (say, the Mexican War), and also things to do with trials: corpse, cases and so forth,
whereas the slow texts are airier and more relaxing: love, young, earth, awoke, benevolence, behold,
bright, woman. Some other work that we’re currently doing, which doesn’t
have any results yet, digs into these individual clusters to try to actually trace
the chain of transmission using good old-fashioned text-critical tools, or cladistics and stemmatology.
Can we account for some of these missing bridge texts? Can we use statistical
inference to distinguish OCR errors from editorial changes that might indicate that two texts
were jointly influenced? And finally can we think about modeling the network? Can we actually
get a quantitative correlation between these clusters of reprinting and the railroad network
or the network of papers that shared a political view or a religious view or a social view
like the editors are brothers-in-law or something like that? So I’ll close just with a couple
of remarks about moving beyond Chronicling America. The questions that this corpus has
allowed us to ask are applicable to lots of other areas. For instance, source criticism
of the immense literature that comes out of the Civil War or any area in history. Things
like Grant’s memoirs are going to get reused in different ways by different historians.
Or another project that I’m working on now with a political scientist, tracking policy
ideas in bills. The sad fact, if you know Congress, is that most bills fail. And if
you’re in the minority, say a Democrat in 2005, the bill that you introduce is probably
going to fail. Most people just look at the bills that pass. The question is, are
there ideas in those bills that fail that show up again in bills that do pass, perhaps
a little bit later or in the same session? And at an even higher
level of granularity, can we use these networks of texts to do better search? The short answer
is yes: you want to retrieve clusters, not just individual passages in texts. So I’ll
close with a newspaper that’s not in Chronicling America because it’s from the wrong country.
It’s the Economist from 1871, which maybe gives a name to what we’re doing: “Some of
the philosophers should turn from the invention of electrometers, galvanometers, hygrometers
and so forth to the far more difficult problem of inventing a mode of measuring the intensity
and diffusion of political wishes and convictions.” So how do they diffuse? I know we’re
running out of time, so I’m going to do this really quickly, but the final kind of modeling
that I’ve been doing with our data is network modeling. Reprinted texts can be a pretty direct
indication of influence. So we have all of this data
about texts that were shared between different publications and if we take that, we can use
it to model the networks of influence during the antebellum period. So here, in what you’re
looking at (I’m going to zoom in in a second), the circles, the nodes in this network,
are individual newspapers, and the lines between them are shared reprints. If two newspapers
share one reprint, there’s a very thin line between them. If they share hundreds of reprints
with one another, then there’s a very thick line between them. The colors indicate communities.
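The edge weights in a graph like this can be derived directly from the clusters: each pair of newspapers gains one unit of weight for every reprinted text they both carried. The sketch below assumes each cluster is given as a list of newspaper labels (the labels are hypothetical). It only builds the weighted edges; the community detection that produces the colors is left to dedicated network software.

```python
from collections import Counter
from itertools import combinations


def reprint_network(clusters):
    """Build weighted, undirected edges between newspapers.

    clusters: iterable of collections of newspaper labels, one collection per
    reprinted text. Returns a Counter mapping (paper_a, paper_b), with
    paper_a < paper_b, to the number of texts both papers printed.
    """
    weights = Counter()
    for papers in clusters:
        for a, b in combinations(sorted(set(papers)), 2):
            weights[(a, b)] += 1
    return weights


# Hypothetical labels, one list per reprinted text (cluster).
clusters = [
    ["paper_VT", "paper_MO", "paper_TN"],
    ["paper_VT", "paper_MO"],
    ["paper_TN", "paper_OH"],
]
edges = reprint_network(clusters)
print(edges.most_common(1))  # [(('paper_MO', 'paper_VT'), 2)]
```

Sorting edges by weight is one quick way to surface unusually strong pairings between papers that are far apart geographically.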
These communities are groups of newspapers that shared a lot of the same texts; the network
software is figuring out that they are possible communities. And what’s
very interesting thus far about the experiments I’ve done with the network visualizations
is that they really are indicating these fascinating connections between newspapers that would
be very hard to get at if you were just reading the newspapers. There are communities that
emerge that are not geographic, that span wide geographies. David alluded to one. There
was this incredibly clear connection that came out in the network visualization between
a newspaper in Vermont and a newspaper in Missouri, which is quite a span in the 1840s,
but there was this incredibly strong connection between them, so we asked one of the graduate students
working on the project to dig into this: what’s going on here? And she discovered that the
editors were brothers-in-law, and that they were probably just sharing a lot of newspapers
and copying from each other frequently. And so the network graphs have been very suggestive.
They’ve also been overturning some of the presuppositions we tend to make. Because
you can’t read all the newspapers, scholars often read only certain newspapers. Newspapers
in New York and Boston and Philadelphia get an inordinate amount of attention, frankly,
and in our network visualizations, we’re finding that there are newspapers in Nashville
that are incredibly central to the reprinting during the period, kind of brokers of information.
The Nashville Union American came up this morning. The Nashville Union American is incredibly
important, in our data set at least, as a kind of broker of reprinted text. So what are our
next steps? Our next step is that we want more data. It’s very incomplete at this
moment. You know what’s in Chronicling America, right? Every time there’s a new batch, we
perform the analysis again and we’re finding new connections. We’re finding new reprinted
texts. We’ve also started conversations, this is perhaps something not to say in this
gathering, but we’ve started conversations with some of the commercial archives of 19th-century
periodicals, to try and get access to their data. They are not so forthcoming as Chronicling
America, which will be a shock to you. And we’ve started to annotate the data and I
wanted to point out the incredible work especially of Abby Mullen here, but also of Matthew Williamson.
These are two history grad students who are working on this project with us. Abby has
compiled an incredible amount of data about these newspapers. I hope that this eventually
has a home on Chronicling America, to be honest with you. It covers editorial tenure: what happened
to various editors? Things like wives who took over newspapers after their husbands died, while their husbands’
names were still on the masthead, or something of this nature.
And she is building out the political affiliations of all of these different
newspapers, which often shift midstream from one party to another. It’s a thorny problem,
as we’re learning. And so we’re annotating the data, and we’re building a web interface
for the project. At the moment it’s just a placeholder website, but within
a few months at least some of our preliminary findings will be available there, and some of
the data will be browsable and searchable there. We just want to thank the NEH, who
gave us a grant to do this project, and also the NU Lab, which is our intellectual home
at Northeastern. Thank you.

I’ve been at the Library for over 20 years. I started as a work-study student and
worked my way up. But that’s enough about me; we’re here to talk about genealogy and
how you do genealogy at the Library of Congress. And later I’ll give some examples of how
I’ve used Chronicling America to locate things relevant to my own family. So I have
15 minutes and I know I’m right before the break, so I’m going to get started. OK. Like I
said, my name is Ahmed Johnson, and I’m a reference librarian at the Library of Congress in the
Local History and Genealogy Reading Room. Just a little background and information about
the Library of Congress: the Library of Congress was established as a legislative library in
1800. Of course, the British came around and burned the Capitol in 1814. So what did we
do? We purchased Thomas Jefferson’s personal library, which contained over 6,000 volumes.
A true Renaissance man, he had books about everything, right? And I think there’s actually an exhibit
at the Library of Congress right now where you can see those various types, and it may
be available online as well. What do we have? We have three buildings: the Adams Building,
the Jefferson Building and the Madison Building. We have 21 reading rooms and seven overseas
offices. OK. What do we have at the Library of Congress? We have every book ever published,
right? No. Impossible. I get that all the time, “You have everything ever published?”
No, we don’t, that would be impossible. But what do we have? We have over 151 million
items in our collections. Not all books, actually we have about 23 million books and we have
over 117 million non-classified special items. What are special items? Newspapers, of course.
Manuscripts, telephone books, sheet music, posters, photographs and so forth. So we’re
not just books. We add about 10,000 items a day, and supposedly if you lined all our
collections up, you could travel from Washington, DC, to Milwaukee, Wisconsin, over 525 miles.
So it’s a massive collection, right? OK, what about reference services? These statistics are
from 2012. We welcomed more than 1.7 million on-site visitors. I talk to most of them, because
everyone wants to find out their family history, right? We also provided reference services
to over 550 individuals via telephone, written correspondence and, I’m sorry, electronic
correspondence. What do we have at the Library of Congress’s local history and genealogy
reading room? Well, we have over 60,000 genealogies, and when I say “genealogies” what exactly
do I mean? Family histories. Someone publishes the information, sends in a copy to the Library
of Congress or we receive a copy via copyright. We also have over 100,000 local histories.
And when I say “local histories,” I don’t mean local for just the Washington, DC, area.
People come into the reading room all the time and they are confused by that. “When
you say local, do you mean just local for the Washington, DC, area?” No, it’s for
the entire country. So if you know what county your relatives lived in, you can search our
catalogue and see what books we have relating to your family and where they lived. And keep
in mind we’re not an archive or a repository for unpublished materials. There are exceptions
to everything I say: we’re not an archive for unpublished material, but we have newspapers,
we have manuscripts and so forth. But primarily, when you’re looking for family histories
and local histories, we have published materials. So keep that in mind. OK, our staff. We have
specialists in everything from African American (yours truly) to British Isles, Canadian, Hispanic,
Scandinavian and so forth. Now, when I say all these things, don’t think it’s a different
person for each subject area. Times are hard, people are taking on more duties. I’m probably
going to be Hispanic by next week. But we can answer questions about city directories,
the origin of names, maritime history, migration and immigration as well as biography and others.
How do you do genealogy? This is really basic. I know you’re not here to get a course in
genealogy, but I always like to provide this because this is what I do. I always suggest
that you begin with yourself and work backwards. We all have two parents, four grandparents,
eight great-grandparents, right? So don’t start with great-grandpa or great-grandma.
Start with yourself and work your way up. You may find connections further up the line,
right? And you want to document the vital records. What are vital records? Birth, death,
marriage, sometimes divorce records. You want to document your information with census records,
which go back to 1790. Interview your oldest living relative: Grandma or great-grandma may have interesting
things to say about her life and other things that were going on before you were
here. Then you want to look at things lying around the basement and attic and in the trunks
and so forth. Now once you do that, you want to get out into the community: county courthouses,
state archives, a genealogical society or historical society in the area where your family lived.
Not where you live now; you may not find much there unless you stayed in the same location
your relatives came from. After exhausting all your sources at home, like I just said,
you venture out to the community, county courthouse and so forth, and then you trace
your family back to the 1790 census, which is the first census of the United States.
Did I skip one? OK, this is our home page. This is the best access point for information
about our services and collections. As you can see here, we have links. I think the best
link on this page is the Ask a Librarian link. That allows you to submit a question directly
to one of us, our reference librarians. Now, we won’t answer your question for you,
but we’ll lead you in the right direction and tell you where to go to find your information.
Oftentimes people say to me, “My great-grandfather came from here. Give me everything you have
on him.” That’s not going to work, right? You do your own research; we’ll tell you where
to find it. It may be the Library of Congress, it may not be; we may refer you to the National
Archives and other places. We also have biographies, bibliographies and guides, including guides on how
to search our library’s catalogue, which is available online. Why would you come to
the Library of Congress? Usually people come to the Library of Congress to use our subscription
databases. How many of you are familiar with We have it at the Library of
Congress for free. Free is always good, right? Free at the Library of Congress. We also have
others. We have over 300 subscription databases at the Library of Congress. So oftentimes,
people come to the Library of Congress to use our subscription databases. We also have
Heritage Quest and many others. What can you do from home? We have an excellent website
called American Memory, which has digital collections, and as you can see, you can browse
by topic: African American, government, law, immigration, American Expansion and so forth.
For the purposes of this talk, I chose immigration and as you can see for immigration we have
13 collections. All of these use keyword searches, so you can just put in a name or
put in a location. See what you get, make it your own. As an example, I selected
California, 1849-1900. Why were people going to California during that time? Gold, they
were looking for gold, right? You can search this collection. Similar to Chronicling America,
you can put in keyword searches, you can search by subjects, you can search by titles. Because
genealogy is not just about names, dates and locations, right? It’s about what made people
do the things they did. What made them move from one location to another? Back during
these times, people had a shared existence. It was more communal, people tended to go
to the same churches, attend the same schools and so forth. So I say all of that to tell
you that you may not find your relative here, but you may find instances of why they may
have come to California during this time, OK? So once again, I can’t guarantee that
you’re going to find something directly related to your relative, but you may find
something very interesting about that time period. Also, my family’s from the Washington,
DC, area; I’m a fourth-generation native Washingtonian, so this collection is really great for
me. Like the other one, it’s searchable by keyword, and it has information from the
1600s to 1925. Let’s talk about Chronicling America.
I use this database daily. I could tell you hundreds of stories about information I’ve
found for researchers, so I’m delighted to be asked to come here to speak.
I’m really hot up here right now, though. I think it’s the bright lights. But anyway,
Chronicling America is great because, like I just mentioned, genealogists are usually
interested in names, dates, locations, vital records, births, marriages and deaths. Newspapers
have obituaries, so we’re always looking for newspapers. For the first search I
conducted (as you all know, you can search by state or by a particular
newspaper), I selected the Shenandoah Herald, which is in Woodstock, Virginia, and I located
an obituary for a Thelma Dysart. I selected it because one of the big shots at the Library
of Congress has the last name Dizard, though it’s not spelled that way, so it doesn’t quite work.
This is a story about a two-year-old who died, really tragic. But the only reason I picked
it was that the second in charge at the Library of Congress is named Bob Dizard, so
I thought maybe I could find something interesting. I think he’s from Virginia.
But as I mentioned earlier, my family is from the Washington, DC, area. So I went to DC
newspapers, and look at what I found: The Open Forum. My second great-grandfather’s name
was Hiram S. Haywood. Now, I had found information about him in many documents, including that he was a
fireman. I found the date he got married; I found all kinds of information. But if you
look at this article here, it tells you about his personality. The Open Forum: this
is a letter to the editor in which he wrote about how they wanted more money. He talks about
the price of beef stock being up to 15 cents a pound because of inflation,
and yet they didn’t get any more money. So he’s pleading his case, and he titles
it “To the Men in Charge, Washington, DC.” Great stuff, right? So now I have this gem
from Chronicling America about my second great-grandfather. This was in the Washington Herald in 1913.
And I have another example: real estate transfers. During this time period, lots of
information about real estate appeared in newspapers. Oh, you know what, let me go back
one slide. Another thing I saw was that his occupation was fireman: an African American fireman in 1913.
This blew my mind. I didn’t know we had African American firemen in 1913. He
wasn’t the fireman that you think of today. He was the person, like you said, who would
light the gas lamps and, I can’t think of the word, stoke the coal
for heat in the buildings and so forth. So I found out through this article exactly what
he did, and he actually died doing this. But the next thing I was able to locate was a
real estate transfer. Once again, Hiram S. Haywood: Lot 102, Square 5113, $10,
with a 50-cent stamp. My family still owns that property on Sheriff Road, which is in
the Deanwood section of Washington, DC, so that was another great find. I remember doing an
oral history with my great-aunt, and she mentioned an amusement park that they used to visit
as kids called Suburban Gardens. So what did I do? I went to Chronicling America
and did the same kind of search, Washington, DC, and I got 288 hits for Suburban Gardens.
Suburban Gardens was the first black-owned and operated amusement park in DC, and there
has not been an amusement park in DC since. I got all kinds of information about this
park. I only have 15 minutes, so I can’t show you everything I found, but Cab Calloway
performed there. They even reported on when they bought their first roller coaster and how
it cost $30,000. And that was in 1920; I believe it opened in 1920 or 1921. So, the reason
I show these examples (and it’s really hot up here, so I’m trying to
deal with it as best I can) is that, overall, I wanted to provide you with a brief history
of the Library of Congress, tell you how massive our collections are, talk about some of our
digital collections that you can use from home, tell you about a few things
that you can do at the Library of Congress, and then provide you with some examples of why
we just love Chronicling America. Now, in closing, I would just like to say that, as a librarian at
the Library of Congress, I know we have so much; it’s so massive, right? You can
get caught up in the newspapers alone, or the photographs, the maps and so forth. But many of the
paper newspapers aren’t indexed, so when you have a database like this that allows you to search,
it makes things so much faster, as the gentleman was stating earlier. You
can do so much more in so much less time. I hated microfilm. I hated
looking at microfilm, and I’m fairly young, so I can imagine how my older clientele, who are
usually the ones doing genealogical research, felt. So every time I can use this database,
I dive right in. So thank you very much.

3 thoughts on “Chronicling America: Historic American Newspapers online”

  1. I love the fact that these newspapers are available for us. Technically, it could be a little more user-friendly, though. I went to and it is easy to clip the articles and save them. Believe it or not, I just discovered the digital newspapers on Chronicling America and am having a bit of trouble seeing the thumbnails and search instructions. But aside from that, I consider the publication of these digital images a major miracle for researchers like me.

  2. 😄😃😅😆😆😇😇😇😇😑😕😬😡😉😉😠😃😯😯👙👙👘💉💣🚪🔮🔮📜🚪💣💣💣💶💶🚪🚬💊📩📦🎎🎎📫📭📩🎶🎶🎷🎹🎹🎸🎸🎷🎻🎻🎻📀📻🎯t
