Four different readability statistics are calculated within analyze. Readability statistics assess variables such as the average number of words per sentence, the average word length, and the number of syllables per word to derive a formulaic estimate of the ``readability'' of the text. They do not take into account less quantifiable elements such as semantic content, grammatical correctness, or meaning. Thus, there is no guarantee that a text a readability test identifies as easy to understand is actually readable. However, in practice, real documents that the tests identify as ``easy to read'' have been found likely to be easier to comprehend at a structural level.
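The four tests used in the analyze function are the Automated Readability Index (ARI), the Kincaid formula, the Coleman-Liau formula, and the Flesch Reading Ease score.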
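Of the four, only the Flesch Reading Ease formula can be read directly from the bc code that analyze uses. Writing w for the average number of words per sentence and s for the average number of syllables per word, it is shown below with a worked example; the figures (100 words, 5 sentences, 150 syllables) are illustrative only and do not come from a real file.

```latex
\[
\text{Flesch} = 206.835 - 84.6\,s - 1.015\,w,
\qquad w = \frac{\text{words}}{\text{sentences}},
\quad s = \frac{\text{syllables}}{\text{words}}
\]
\[
w = \frac{100}{5} = 20, \quad s = \frac{150}{100} = 1.5, \quad
\text{Flesch} = 206.835 - 84.6(1.5) - 1.015(20) = 206.835 - 126.9 - 20.3 = 59.635
\]
```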
For example, for the file rap-bat.wc the output reads:

    File rap-bat.wc contains: 243 words 95 lines 1768 characters

Sentences are counted using a custom awk script, explained in ``Spanning multiple lines''. Next, the number of letters is established by stripping the white space from the file and counting the remaining characters, and the number of syllables is estimated using another awk script. Finally, these values are fed into four calculations that use bc, the arbitrary-precision calculator supplied with SCO OpenServer.
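The counting steps described above can be sketched as small shell functions. This is a minimal sketch, not the code in analyze: the function names count_words, count_letters, count_sentences, and count_syllables are hypothetical, and the sentence and syllable counters are naive stand-ins (runs of sentence-ending punctuation, and groups of vowels) rather than the awk scripts this section refers to.

```shell
# Hypothetical helpers: simple stand-ins, NOT the awk scripts used by analyze.

# Words: wc -w may pad its output with spaces, so strip them.
count_words() {
    wc -w < "$1" | tr -d ' '
}

# Letters: delete all white space, then count the characters that remain.
count_letters() {
    tr -d ' \t\n' < "$1" | wc -c | tr -d ' '
}

# Sentences (naive stand-in): count runs of . ! or ? that are
# followed by white space or by the end of the line.
count_sentences() {
    awk '{ n += gsub(/[.!?]+([ \t]|$)/, "&") } END { print n + 0 }' "$1"
}

# Syllables (naive stand-in): count groups of vowels in each word,
# giving every word at least one syllable.
count_syllables() {
    awk '{
        for (i = 1; i <= NF; i++) {
            w = tolower($i)
            n = gsub(/[aeiouy]+/, "", w)
            total += (n > 0) ? n : 1
        }
    } END { print total + 0 }' "$1"
}
```

Each function prints a single number, so a result can be captured in a shell variable, for example sentences=`count_sentences myfile`, ready to be substituted into a bc here-document. As in the method described above, count_letters removes only white space, so punctuation is counted along with letters.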
bc is a simple programming language for calculations; it recognizes a syntax similar to that of C or awk, and supports variables and functions. It is fully described in bc(C). It is used here because, unlike the shell's eval command, it can handle non-integer arithmetic: numbers with a decimal point are not truncated, although the number of digits kept after the decimal point in a division is set by bc's scale variable, which defaults to 0. Because bc reads its commands from standard input, the basic readability variables are substituted into a here-document that is fed to bc, and the output is captured in another shell variable. For example:
```sh
Flesch=`bc << %%
w = ($wordcount / $sentences)    /* average words per sentence */
s = ($sylcount / $wordcount)     /* average syllables per word */
206.835 - 84.6 * s - 1.015 * w
%%
`
```

analyze also prints the results of the tests, as follows:
    ARI = -10.43
    Kincaid= -7.01
    Coleman-Liau = -17.00
    Flesch Reading Ease = 184.505

Depending on the setting of $LOG (the variable that controls file logging), the output is printed either to the terminal alone or to both the terminal and a logfile whose name is set by the variable $LOGFILE.