Week 9 Reflection-Text Mining

Of all the digital tools discussed this week, I find Google’s H-Bot the most impressive. Its format is fairly simple (ask a question, get an answer), and it may seem somewhat redundant, since an ordinary internet search can turn up more information about a topic than a single answer to a specific question. The technology behind it, however, is very refined. The ability to link verbs with their past, present, and future tenses is still under development. In many programs, text mining is largely WYSIWYG: a word is matched exactly as it appears, regardless of context. The word “state,” for example, can carry many different meanings depending on context, and computers have to be trained to recognize those differences. H-Bot’s ability to link verbs and tenses together, and to associate words like “when” and “where” with quantitative and geographic information, is a real breakthrough in developing programs that can scan the millions of sources already available more efficiently and return more accurate answers to queries.
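H-Bot’s actual internals are not described here, so what follows is only a minimal Python sketch of the general idea of tying a question word to an expected kind of answer: a “when” question looks for years, a “where” question looks for capitalized place names after “in” or “at,” and the most frequent candidate across a set of search snippets ranks first. The `ANSWER_PATTERNS` table, the `candidate_answers` function, the regular expressions, and the sample sentences are all hypothetical simplifications, not H-Bot’s code.

```python
import re
from collections import Counter

# Hypothetical mapping from a question word to a regex for the kind of answer it expects.
# These patterns are deliberately crude illustrations, not H-Bot's real rules.
ANSWER_PATTERNS = {
    # "when" questions: look for four-digit years from 1000 to 2099
    "when": re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b"),
    # "where" questions: look for a capitalized name following lowercase "in" or "at"
    "where": re.compile(r"\b(?:in|at) ([A-Z][a-z]+(?: [A-Z][a-z]+)*)"),
}


def candidate_answers(question, snippets):
    """Return (answer, count) pairs of the type implied by the question's first word,
    most frequent first; an empty list if the question word is not recognized."""
    words = question.split()
    if not words:
        return []
    pattern = ANSWER_PATTERNS.get(words[0].lower())
    if pattern is None:
        return []
    counts = Counter()
    for snippet in snippets:
        # findall returns the captured group (the year or the place name) for each match
        counts.update(pattern.findall(snippet))
    return counts.most_common()


# Invented example sentences standing in for search-result snippets.
SAMPLE_SNIPPETS = [
    "Abraham Lincoln was born in 1809 in Kentucky.",
    "Lincoln, born in 1809, grew up in Indiana.",
    "In 1809 Lincoln was born near Hodgenville.",
]

if __name__ == "__main__":
    print(candidate_answers("When was Abraham Lincoln born?", SAMPLE_SNIPPETS))
    # [('1809', 3)]
    print(candidate_answers("Where was Abraham Lincoln born?", SAMPLE_SNIPPETS))
    # [('Kentucky', 1), ('Indiana', 1)]
```

The “where” result shows how quickly such shortcuts break down: the sketch cannot tell a birthplace from the place Lincoln grew up, which is exactly the kind of context a more refined tool has to learn to handle.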

Of course, I’ve spoken in the past about my experiences with text mining and about current projects we are working on that use these digital technologies, and it is clear that these technologies are far from perfect. As I stated last week, Google appears to be the company at the forefront of developing programs like the Ngram Viewer and H-Bot, since many programs rely on search algorithms and platforms that Google invented or popularized. Even the imperfections of these programs give digital historians something to keep refining as they work to make large amounts of data more easily searchable.

If there were one theme to tie digital history together, it would certainly be progression. Rudimentary programs have been expanded, and more complex programming has been added to make them behave in a more human-like way. At the same time that the programs are becoming more complex, the interfaces that let lay people use these search tools and database software are becoming simpler and more straightforward. The computational processes are hidden “under the hood,” so more people can use these tools for historical research. The democratization of these tools certainly goes hand in hand with this progression: from simple programs built on simple code but with sometimes complicated or confusing interfaces, to complex programs that are visually simple for the user to work with.


One Response to Week 9 Reflection-Text Mining

  1. pamsc says:

    The other part of the story is which database to apply the tools to. Is the web (or even all books published in a period) the most useful data?
