Text Analysis and Word Counts (TAWC)

By Adam D. I. Kramer

adik [at] uoregon [dot] edu

Addresses mangled to thwart spam.


TAWC is a Perl program designed to count words in digital text, distributed under the GNU General Public License. It works in two phases:

Pre-processing
In the pre-processing phase, TAWC reads digital text in the form of individual files, delimited spreadsheets, or MySQL database columns. Users can specify which sections of the digital text are relevant, e.g., which parts of the text correspond to names and which parts correspond to text that should be analyzed. This allows for easy processing of digital text such as transcripts or online log files such as those from mIRC or AIM.
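
As an illustration of separating names from analyzable text, here is a minimal sketch in Python. It is not TAWC's own Perl code. The log-line layout (an optional "[HH:MM]" timestamp, a speaker name in angle brackets, then the utterance) and the function names split_log_line and collect_utterances are assumptions made for this example only.

```python
import re

# Assumed (hypothetical) line layout: optional "[HH:MM]" or "[HH:MM:SS]"
# timestamp, "<speaker>", then the utterance text.
LINE_PATTERN = re.compile(
    r"^(?:\[\d{1,2}:\d{2}(?::\d{2})?\]\s*)?<(?P<name>[^>]+)>\s*(?P<text>.*)$"
)


def split_log_line(line):
    """Return (speaker, utterance), or None if the line does not fit the layout."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    return match.group("name"), match.group("text")


def collect_utterances(lines):
    """Group utterance text by speaker, skipping lines that do not match
    (for example join or quit notices)."""
    by_speaker = {}
    for line in lines:
        parsed = split_log_line(line)
        if parsed is None:
            continue
        name, text = parsed
        by_speaker.setdefault(name, []).append(text)
    return by_speaker
```

With this layout, a line such as "[12:01] <alice> hello there" yields the name "alice" and the text "hello there", and only the text would be passed on for counting.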
Analysis
In the analysis phase, TAWC separates the digital text read in during the pre-processing phase and counts how many words occur within it. Researchers can define any number of word categories, each containing a list of words or word patterns, and TAWC counts how many words and word patterns from each category are used in the digital text.
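
Category counting of this kind can be sketched in Python as follows. This is an illustration, not TAWC's implementation. The pattern syntax (a trailing "*" matching any word ending), the tokenization rule, and the function names are all assumptions made for this example.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def compile_entry(entry):
    """Turn a category entry into a regex. Assumed syntax: a trailing '*'
    matches any word ending; anything else must match the whole word."""
    if entry.endswith("*"):
        return re.compile(re.escape(entry[:-1].lower()) + r"[a-z']*")
    return re.compile(re.escape(entry.lower()))


def count_categories(text, categories):
    """Return (total word count, dict of match counts per category).

    A word is counted at most once per category, even if it matches
    several entries in that category."""
    compiled = {
        name: [compile_entry(entry) for entry in entries]
        for name, entries in categories.items()
    }
    tokens = tokenize(text)
    counts = {name: 0 for name in compiled}
    for token in tokens:
        for name, patterns in compiled.items():
            if any(pattern.fullmatch(token) for pattern in patterns):
                counts[name] += 1
    return len(tokens), counts
```

For example, with a hypothetical "positive" category containing "happy" and "happi*", the words "happy" and "happily" would each add one to that category's count, while the total word count covers every token in the text.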

Obtaining TAWC


This research was supported by:


Page updated 9/1/04