Research Around OCR Errors When Using Newspapers in Genealogy

The quality of the images of old newspapers that you search on newspaper websites dramatically impacts the results that you will receive. The reason is that these websites run optical character recognition (OCR) scans on the old newspapers so that computers can index the images. Today we’re going to talk about how to overcome some of the problems associated with the OCR scans of newspapers.

Howdy, I’m Devon Noel Lee with Family History Fanatics, where we help you climb your family tree, understand DNA, and write the stories of your ancestors along the way. Today we’re gonna continue our newspaper series, and if you’ve missed any of our previous newspaper research videos, be sure to check the end screen at the end of this video or the description below this video for the links that you need to go check them out.

Today we’re talking about how to overcome OCR challenges in newspapers and we’re going to use, but you can use any newspaper research website. We’re looking at a typical entry in a newspaper, in
this case we’re looking at society news and we can see an article where Mrs.
Alexander B. Hustin or Huston, H-u-s-t-o-n, is having a reception and has invited a number of people to attend. As you look through, you’re gonna see some smudges, some even more smudges, even more smudges, and those smudges make it difficult for the optical character recognition, the OCR program, to scan these documents, rapidly read them, and turn them into digital text. Now as we continue to scan further down the page you’re gonna see more of that black smudging, the beginning of the blurring of the letters, but as you get way down to the bottom it is really, really hard to read the names of the people that attended this event. And the question is, what if your ancestors’ names are right here?

Let me demonstrate how that OCR does not pick up some of the names that we see here in the newspaper article. Let’s type in Albert Carey and press find. So, even though this page is a little bit blurry and there’s that black hue behind it kind of smudging out the details, the OCR was able to pick up that name. But say I was searching for Maude Stone because I didn’t know that Maude was named Mrs. Albert Carey; I didn’t know that she married. Watch what happens: the OCR did not recognize Stone, it only recognizes Maude.

So one of the strategies you’re going to apply when you’re researching in newspapers is you’re going to sometimes search for just their last name, and sometimes just for their first name. And yeah, you’re gonna get a lot of results, but make sure you narrow down to a time and a place and then just be willing to check out the results. It is worth it when you’re trying to crack brick walls because you never know where the information is hiding, but for general interest I can understand if you don’t go down this route.

Before we continue with the rest of this video, be sure to subscribe to our channel and hit the bell to be notified of future videos.

So
I hope that was an interesting look at the actual newspaper, seeing how blurry pages, faded, scrunched-up writing, and these types of things in the newspapers contribute to OCR problems. Now how do you get around them?

One way is to use OCR substitutes. There are common letters that look the same when the computer is reading a document. You will often find a B will be read as an H rather than the B, or the c will be an e, or an e will be the c, and vice versa, so develop a list of substitutes. If you’ll check the description below, I’ll have a link to a specific article on my blog that’s going to give you a list of letter substitutes that you can implement in your OCR searches. Be sure to check it out.

So when you’re searching for a name or a business or some of the other keywords that we mentioned in a previous video, what you need to do is substitute letters. So, let’s say you’re looking for the last name that you see on this screen. Well, we have c’s, we have h’s, we have b’s, we have e’s. Let’s play around with different substitutes, so maybe you’ll put this in. Now I don’t even know how to say that name. The first one looks like Schibler, this one looks like it’s Skiblyn maybe, but to the OCR it looks the same. So, try the different substitutes. You may try this one, Shiblin; it’s very similar to Schibler but the R got messed up. And how about this one? So go ahead and play around with different letter substitutions to see if you can track down your ancestors in newspapers when you previously thought
there were no articles for them.

The next thing you’re going to want to try is cluster research. Cluster research is when you are researching a cluster of individuals, maybe a group of people who were in the Eastern Star Association. You know your grandmother was part of the Eastern Star, but you also knew a couple of other club members, so by searching for those club members you might find articles about your grandmother because the OCR couldn’t pick up her name, but it’s sitting there in the newspaper. You can also look for other family members. Sometimes the article for a different family member will lead you to an article that has all of the family members in it. And you definitely want to look for friends. When you see people serving together in different organizations, maybe they’re neighbors or witnesses on common documents, then you want to go search for those people in newspapers and see if your ancestor pops up there in a newspaper clipping about their friends and it happens to name them as well. So in this case you may be looking for Charles Lee Miliey or Miles, it’s really hard to read, and you know that John Henry Martin is often in association with them; they’re friends.
So be sure to check out cluster research.

The next method is the stumble-upon method. Now, Elissa Scalise Powell says that hope is not a research strategy, but sometimes you’ll find some really cool articles because you go into the newspaper looking for one thing, and because you take time to read some of the other articles that appear on the page, you might find some extra details about people you weren’t searching for at that moment but they’re part of your overall research plan. So let’s take a look. So again, let’s say we were looking for this article about Mrs. Alexander B. Hustin or Huston and you just happen to see a name over here on this side, this name right here, Mrs. D-e-g-i-s-c-h-e-r, Degischer. Well, maybe you knew that these two people were related or connected, or maybe it happens to be that Mrs. Alexander was on one branch of your tree and then Degischer was on the other branch of your tree and you were reading the news from the town that you’re researching in and happened
to pick up both branches from the same day in the same newspaper.

Now the next strategy for OCR workarounds is to just read everything. Maybe you’re looking for your ancestor’s obituary and you know when they died, and you just start reading all of the obituaries that appear within a couple of weeks after your ancestor’s death. It’s worth a try, have fun. I’m not particularly gonna do it unless there’s a serious brick wall that I’m trying to break down and I haven’t solved it any other way, but don’t discount that sometimes you just have to read. You can scan and browse and things of that nature, but sometimes you can’t get around the OCR; you just have to read. Now if you’re unwilling to read everything from the time and place your ancestors lived, then you’re gonna have to come to terms with the fact that you’re gonna miss out on some cool discoveries. So instead, why don’t you search for low-hanging fruit? Look for just birth, marriage, and death announcements as well as maybe some land transfers and probate announcements. Other than that, you’re probably just gonna leave the rest of the newspaper alone.

So I hope these strategies of searching for clusters, using OCR substitution, or my previous video of searching for keywords help you find really great stories in newspapers. But my question of the day is this: how do you get around OCR problems in newspapers? I love that in our community we can learn together, so share your tips. Once again, I’m Devon Noel Lee with Family History Fanatics. To watch more videos about newspapers check here, and to find our latest video check here.
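
If typing out letter substitutions by hand sounds tedious, the idea from the video can be roughed out in a few lines of Python. This is only a sketch: it uses just the two look-alike pairs mentioned above (b/h and c/e), and the names `LOOKALIKES`, `ocr_variants`, and `max_swaps` are made up for this example rather than taken from any newspaper website. It prints spellings you could paste into a search box one at a time.

```python
from itertools import product

# Look-alike pairs named in the video: b/h and c/e.
# Assumption: everything is lowercased; extend this table with your own list.
LOOKALIKES = {"b": "h", "h": "b", "c": "e", "e": "c"}


def ocr_variants(name, max_swaps=2):
    """Return spellings of `name` with at most `max_swaps` look-alike letters swapped."""
    lowered = name.lower()
    # For each letter, the choices are the letter itself plus its look-alike, if any.
    options = [(ch, LOOKALIKES[ch]) if ch in LOOKALIKES else (ch,) for ch in lowered]
    variants = set()
    for combo in product(*options):
        swaps = sum(1 for new, old in zip(combo, lowered) if new != old)
        if swaps <= max_swaps:
            variants.add("".join(combo))
    return sorted(variants)


if __name__ == "__main__":
    for spelling in ocr_variants("Schibler", max_swaps=1):
        print(spelling)
```

Keeping `max_swaps` small matters, because every extra swappable letter doubles the number of possible spellings, and you still have to check each result by eye.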

5 thoughts on “Research Around OCR Errors When Using Newspapers in Genealogy”

  1. Great tips, as usual. 😊
    Like you, I really only try to go in super hard when it is a brick wall relative, but the way I work it is as follows:
    1. First I have to find at least one mention of my person in a paper local and contemporary to them.
    2. Once I am sure it is my person (I know it is the area or an event tied to them) I go to the article so that that whole page is open, not just the clipping I made.
    3. I click on the newspaper link at the top and it opens up to page-month-year-paper, etc. I click back so that I can search only that paper in a certain year or maybe just the paper itself. Also, it shows me what is available for that paper as far as dates.
    4. In the search bar, I type in the person’s name so that it gives me all search results for that name in that paper. I know I’ll have at least one but I’m doing this to get to the next step.
    5. On the results page, I adjust the range of dates to what I know is relevant. I know some people would just put in the life span but then you might end up missing where the person is named in an obituary of a relative that is after your person and might give you clues. But I do set the start of the range to around 10 years after their birth.
    6. Then I start playing with their name – the way it is spelled with OCR, nicknames, and incorrect spellings.

    Recently, I found a whole world of information for a relative who was called William in his later years, but went by Will, Bill, and Willie in his early years. If I had not done my search that way, I wouldn’t have found out he was sent away to a detention center and then got arrested for escaping from the superintendent! Lol! I even found a letter the paper published from him to his sister because he was traveling in Africa and they thought that was interesting to publish.
    So, my way depends on that paper that I know reported on the person in order for me to mine it in various ways.

  2. Very useful tips. I don't subscribe to, but OCR obits (partial) are showing up on Ancestry searches, and they are sometimes rather garbled. For example, the OCR SW gets counties vs states mixed up (Washington County vs Washington state). So I do additional work to make sure I make the correct choice. Sometimes I can find the complete obit on familysearch.
    A question: are there any holiday specials on the price for subbing to My local library doesn't subscribe to it. THANKS Devon!

  3. One that I recently found useful for finding females (when you don't know their married name, or even when you do, but OCR is messing up the collection of the first and last name) is to search for "nee Smith" (with quotes), while using what others have suggested for limiting locales, newspapers, or date ranges. Sometimes (easy for unusual surnames, overwhelming for common surnames) you find other female relatives that you didn't even realize were there!

    I appreciate your sharing the use of OCR letter substitution! (boy, that would be helpful if I had the patience to try the permutations… But good to know!) And if you find a relative that comes to visit often, I use that person's name in that locale. Or a part of my ancestor's first name and the other person's (if ONE of those names is unusual, and there's frequent contact)

    I wish Ancestry (owners of would implement wildcard character searches, where S* would get all words beginning with the letter S (which is impractical), but with some foreign names, it might be useful to search for Schie* to find variable surname spellings on the end.

    Or mid-name variations (or OCR messes) Shim*an would get Shimerman, Shimean, Shiman (* is 0 or more characters).

    Or the use of a question mark for single letter substitution. Buch??ld would get Buchwald, Buchauld, Buchweld, Buchwold. Or useful for ?ather?ne for Catherine, Katherine, Catherane or Katherene, etc.

  4. Great video, Devon! I have seen the smudges but I did not realize that this causes problems with the OCR so that is good information!
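
The wildcard idea from the third comment can be tried on text you already have on your own computer, such as a list of names copied from a page's OCR text. The sketch below is only an illustration using Python's standard `fnmatch` module, where `*` matches zero or more characters and `?` matches exactly one. The function name `wildcard_filter` is invented for this example, and nothing here implies that any newspaper website supports these patterns.

```python
from fnmatch import fnmatchcase


def wildcard_filter(pattern, words):
    """Return the words that match `pattern`, ignoring upper/lower case.

    `*` stands for zero or more characters, `?` for exactly one character.
    """
    lowered_pattern = pattern.lower()
    return [word for word in words if fnmatchcase(word.lower(), lowered_pattern)]


if __name__ == "__main__":
    # Hypothetical list of surnames pulled from OCR text.
    surnames = ["Shimerman", "Shimean", "Shiman", "Sherman", "Buchwald", "Buchauld"]
    print(wildcard_filter("Shim*an", surnames))
    print(wildcard_filter("Buch??ld", surnames))
```

Matching on lowercased text is a choice made here so that capitalization differences in the OCR output do not hide a hit.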
