Text Analysis and Word Counts
(TAWC)
By Adam D. I. Kramer
adik [at] uoregon [dot] edu
Addresses mangled to thwart spam.
TAWC is a Perl program, distributed under the GNU public license,
that counts words in digital text. It does so in two phases:
- Pre-processing
- In the pre-processing phase, TAWC reads digital text in the form of
individual files, delimited spreadsheets, or MySQL database columns. Users
can specify which sections of the text are relevant, e.g., which parts
correspond to names and which parts correspond to text that should be
analyzed. This makes it easy to process transcripts or online log files
such as those from mIRC or AIM.
- Analysis
- In the analysis phase, TAWC splits the digital text read in during the
pre-processing phase into words and counts them. Researchers can define any
number of word categories, each containing a list of words or word patterns,
and TAWC will count how many words and word patterns from each category are
used in the digital text.
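To make the two phases concrete, here is a minimal sketch in Python. It is not TAWC's own Perl code, and none of the names in it come from TAWC. The "<name> message" log format, the example categories, and the trailing "*" wildcard for word patterns are all assumptions made for illustration; consult the TAWC manual for the formats and pattern syntax the program actually accepts.

```python
import re
from collections import Counter

# Assumed chat-log format (hypothetical): "<speaker> message text".
LINE_RE = re.compile(r"^<(?P<name>[^>]+)>\s*(?P<text>.*)$")

# Hypothetical word categories. A trailing "*" is an assumed convention
# meaning "any word starting with this prefix".
CATEGORIES = {
    "positive": ["happy", "good", "love*"],
    "negative": ["sad", "bad", "hat*"],
}


def preprocess(lines):
    """Phase 1: yield (speaker, text) pairs, skipping non-matching lines."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            yield match.group("name"), match.group("text")


def compile_category(patterns):
    """Turn a list of words and prefix patterns into one whole-word regex."""
    parts = []
    for pattern in patterns:
        if pattern.endswith("*"):
            parts.append(re.escape(pattern[:-1]) + r"\w*")
        else:
            parts.append(re.escape(pattern))
    return re.compile(r"^(?:" + "|".join(parts) + r")$")


def analyze(pairs, categories):
    """Phase 2: count total words and per-category matches per speaker."""
    compiled = {name: compile_category(p) for name, p in categories.items()}
    totals = {}
    for speaker, text in pairs:
        words = re.findall(r"[a-z']+", text.lower())
        counts = totals.setdefault(speaker, Counter())
        counts["words"] += len(words)
        for word in words:
            for name, regex in compiled.items():
                if regex.match(word):
                    counts[name] += 1
    return totals


log = [
    "<ann> I love this, it is so good",
    "<bob> I hated the bad ending",
    "*** bob has quit",  # status line: no speaker, so it is skipped
    "<ann> loved it anyway",
]
results = analyze(preprocess(log), CATEGORIES)
for speaker, counts in sorted(results.items()):
    print(speaker, dict(counts))
```

Keeping the name/text split separate from the counting step mirrors the two-phase design described above: the same analysis can then run on text that came from a file, a spreadsheet column, or a database column.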
Obtaining TAWC
- The current version of TAWC is 0.8. This is a pre-release version. It
is not guaranteed to do anything! If you want to use it, you are
strongly encouraged to try it out. Please send any questions, comments, or suggestions on the
program or the manual to Adam at one of the email
addresses above. Support is not guaranteed, but hey, it could happen.
- You may click here to download TAWC. You
may have to right-click and choose "save as," lest your browser attempt to
display the Perl code.
- The TAWC manual is also available
for download.
This research was supported by:
Page updated 9/1/04