A PARADIGM FOR PROGRAMMING STYLE RESEARCH

Paul W. Oman Computer Science Dept. University of Idaho Moscow, Idaho 83843 (208) 885-6589
Curtis R. Cook Computer Science Dept. Oregon State University Corvallis, Oregon 97331 (503) 754-3273
Abstract
Programming style guidelines and automated coding style analyzers have been developed without a solid experimental or theoretical basis. In this paper we make a distinction between typographic style characteristics and underlying structural style content and show that this distinction aids in assessing the influence of style factors. This distinction permits straightforward identification of specific style factors and a better understanding of their effect on program comprehension. The results of our studies have a direct impact on automated coding style assessment programs, programming standards, program maintainability, and code formatting tools.

INTRODUCTION

Programming style is an intuitive and elusive concept. It is highly individualistic and easily recognized, yet difficult to define or quantify. The goal of programming style is to make a program clear, easy to understand, and thereby easy to work with.

Publications on programming style often present either a set of rules for good programming [7,8,9] or descriptions of programs for automated style analysis [2,12,13]. Books on programming style usually contain a set of style rules and guidelines based on analogous writing style guidelines for English. Kernighan and Plauger's The Elements of Programming Style [7] is the best known of these books. Automated style analyzers are essentially "style grading" programs that quantify program characteristics thought to represent programming style and compute a style score as a weighted sum of the factors.

Although the authors of style guides and style analyzers may have difficulty in defining style, they have no difficulty enumerating style rules and identifying what they believe to be important programming style characteristics. Authors choose stylistic factors commonly thought to influence style with little or no supporting evidence demonstrating the importance or effect of that characteristic on program comprehension and maintainability.
Typically their rules are general, and occasionally contradictory, with no guidelines on how to resolve conflicts between rules. In automated style analyzers the choice of factors and corresponding weights is entirely subjective. It is not clear that these programs are measuring program style. For example, one would expect some relation between good programming style and program errors. Harrison and Cook [6] found no relation between Berry and Meekings' style score [2] and the error proneness of a collection of C modules. Furthermore, although many style scores are based (in part) on software complexity metrics, there is no clear understanding of the relationship between style and complexity. For example, Evangelist [5] demonstrated that application of Kernighan and
SIGPLAN Notices, Vol. 23, No. 12
Plauger's style rules had unpredictable effects on five common software complexity metrics. His results contradict Arthur's later claims [1,p. 175] that application of the Kernighan and Plauger rules always reduces complexity.

We believe that programming style is a multi-faceted concept that is not captured by a collection of rules or by a single style score. Instead we propose that programming style is best analyzed by separating stylistic factors into classes and then studying the effects of these factors on program comprehension and maintainability.

We have found it useful to break programming style factors into two classes: those pertaining to typographic arrangement and those measuring the structural content of the code. Factors belonging in the typographic category include level and method of indentation, line length, detail and placement of comments, placement of blank lines, use of embedded spaces, identifier length, module length, and formats for type and data declarations. The structural style category includes factors for modularity, use of labels and gotos, use of constant definitions, use of included files, use of literals, methods of type and data declarations, use of library functions, level of nesting, control flow, information flow, operator and operand usage, and other factors related to program complexity.

Conspicuously absent from this paradigm are the more nebulous characteristics of style: the evaluation of economy, simplicity, readability, and other such traits. We recognize the importance and impact of these characteristics and suggest that controlled studies need to be directed at each of these factors in turn. We do not claim that our operational classification and corresponding list of factors completely captures programming style. Rather, our paradigm represents a modest start at formalizing an approach and methodology used for research on programming style.
The primary focus of the research reported in this paper is the isolation of stylistic factors and their effects on program comprehension. We present results from a series of controlled experiments that show the benefits of our approach. We then discuss several direct applications of stylistic concerns to programming tools such as language-directed editors and pretty-printers.

What is "programming style"?

There is general agreement that programming style is an elusive concept with intuitive elements that make it difficult to define and quantify. The following illustrate the diversity in attempts to define programming style.

"The 'elegance' or 'style' of a program is sometimes considered a nebulous attribute that is somehow unquantifiable; a programmer has an instinctive feel for a 'good' or 'bad' program ..." [2,p.80]

"Programming style brings to mind the ways that a creative programmer brings clarity, maintainability, testability, reliability, and efficiency to the coding of a module." [1,p. 173]

Programming style shares many similarities with literary writing style. Effective writing is more than observing established conventions for spelling, grammar, or sentence structure. It is the "perception and judgment" a writer exercises in selecting "from equally correct expressions, the one best suited to his material, audience, and intention" [3]. Many books about programming style have rules based on English literature style rules. For example, the form and approach used in Kernighan and Plauger's The Elements of Programming Style is based on Strunk and White's Elements of Style [16].
Programming style rules are goal directed; one rule may stress code efficiency while another addresses code readability. Often these rules are at odds with each other; but, for the most part, the underlying purpose is to produce code that is clear and easily understood without sacrificing performance. Since the purpose of employing programming style standards is to promote clarity and simplify structure, stylistic factors should be investigated with a view toward determining how specific style factors affect comprehension and complexity. Hence, there are four progressive stages of programming style analysis: (1) determining the effects of stylistic factors on comprehension and complexity, (2) establishing programming style standards (relative to local constraints) based on the results from 1, (3) gauging the adequacy and completeness of the standards, and (4) measuring the degree of conformity between the code and the standards. Note that it is only the last stage that suggests the need for automated style analyzers, and yet this is the area where much work has been concentrated. Also note that language-directed editors and pretty-printers operate on formatting rules for arranging code, although there are no established standards for doing so. It appears that only the first and last stages are being addressed without proper attention to the middle steps.

Previous studies on programming style

Two types of studies dominate programming style research: (1) investigations into the effects of program style on program comprehension, and (2) development of automated program style analyzers. There are several good representative studies looking at the effects of stylistic variants on program comprehension. Miara et al. [11] investigated the effect of different indentation levels on students' ability to read programs and answer short questions about the programs.
They concluded that two to four space indentation was optimal, while six space indentation actually interfered with comprehension. Woodfield, Dunsmore, and Shen [17] conducted a study on the effects of comments and modularity on program comprehension. Their results show that comments and method of modularity both affect students' ability to comprehend a program as measured by a 20 question objective test.

The above studies are cited as examples demonstrating the utility of studying the impact of specific stylistic factors on program comprehension. The accepted paradigm for this type of research is controlled administration of a comprehension test across treatment groups receiving a variety of program styles. However, results from many previous studies have been clouded by improper methodology [4]. We suggest that such studies have inadequately controlled stylistic variants within and across treatment groups, and therefore have had difficulty establishing the true cause of observed differences. This may partially explain why some earlier studies on indentation, as reported in [15], are at odds with Miara's later findings. Contradictory evidence on the usefulness of mnemonic names, chunking, nesting, commenting, and other style factors can also be found in the programming style literature. Proper control can be achieved by recognizing different classes of style factors and then manipulating specific factors across treatment groups. In the following section we present two straightforward experiments where different code styles were given to groups of computer science students and significant comprehension differences were obtained.

The other major line of programming style research has concentrated on the development of automated program style analyzers. These style "graders" calculate a single style score that is a weighted sum of the counts of various program characteristics. The counts used to compute the sum are either fixed or specified by the operator.
All programs combine measures of typographic characteristics (e.g. indentation, blank lines) with those of structure (e.g. control flow, modularity). While some of these studies have informally recognized a distinction between classes of style factors, none have studied them separately.

One of the first automated style analyzers was proposed by Rees [13], who described a Pascal source code style grader based on ten factors: average line length, comments, indentation, blank lines, embedded spaces, modularity, variety of reserved words, identifier length, variety of identifier names, and the use of labels and gotos. Each factor was quantified and assigned a positive or negative weight, with the sum of the weighted factors giving a percent score. The weights and trigger-points (maximum and minimum values that trigger the assignment of weighted scores for each factor) were established by adjusting the parameters until the automated style analyzer awarded "A" grades to good programs.

Rees' style analysis methodology was implemented on a Unix system by Rosenthal, who distributed his source code through the SIGPLAN Notices correspondence column [14]. His style grader is essentially the same as Rees'. Meekings later published a modified version of the style grader [10] and then Berry and Meekings adapted it to work on C source code [2]. The Berry and Meekings style analyzer calculates measurements on the same factors used by Rees, except that the "variety of identifiers" measure was replaced by counts of included files and a "percentage of constant definitions" measure has been added. Other minor changes were made to the manner in which the metrics were calculated, but for the most part, the Berry and Meekings style analyzer perpetuates the same style assessment methodology established by Rees.
In their implementation of a FORTRAN source code style analyzer, Redish and Smyth used 33 factors for the automated evaluation of students' programs [12]. The 33 measures can be grouped as follows: commenting (4), indentation (1), block sizes (2), statement labels and formats (7), counts of names and statements (6), array declarations (2), control flow and nesting measures (7), blank lines (1), operator count (1), operand count (1), and parameterization (1). Their AUTOMARK program, similar in purpose to those described by Rees and by Berry and Meekings, calculates a score and percent for each of thirty factors summed into a final score. Additionally, their ASSESS program is used to obtain nonnumeric evaluation of ten style factors plus comments on indentation, commenting, and label usage. Although the list of factors used by ASSESS and AUTOMARK represents the most complete assessment of style used to date, the scoring system is primarily an adaptation of the Rees maximum and minimum trigger-point methodology with factor weights assigned by the instructor.
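The weighted-sum, trigger-point scoring scheme described above can be sketched in a few lines. The sketch below is a deliberately simplified illustration, not the scoring function of Rees, Berry and Meekings, or Redish and Smyth: the all-or-nothing scoring rule, the `Factor` type, and every factor name, weight, and trigger point shown are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Factor:
    """One style factor with hypothetical scoring parameters."""
    name: str
    weight: float  # points awarded when the measurement is acceptable
    low: float     # minimum trigger point
    high: float    # maximum trigger point


def factor_score(factor, value):
    """Award the full weight inside [low, high], nothing outside.

    Assumption: published graders may interpolate near the trigger
    points; this sketch uses a simple all-or-nothing rule.
    """
    return factor.weight if factor.low <= value <= factor.high else 0.0


def style_score(factors, measurements):
    """Return a percent score: earned weight over total available weight."""
    total = sum(f.weight for f in factors)
    earned = sum(factor_score(f, measurements[f.name]) for f in factors)
    return 100.0 * earned / total


if __name__ == "__main__":
    # Hypothetical weights and trigger points, chosen only for illustration.
    factors = [
        Factor("avg_line_length", 15, 10, 60),
        Factor("comment_pct", 10, 5, 40),
        Factor("indentation_pct", 25, 8, 50),
    ]
    measured = {"avg_line_length": 32, "comment_pct": 2, "indentation_pct": 20}
    print(style_score(factors, measured))
```

Every number in such a scheme is chosen by the grader's author or operator, which is why the resulting score reflects those choices at least as much as it reflects the program being graded.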
Classes of Programming Style Factors

In the analysis of previous research on programming style several things become apparent: (1) "Style" is a multi-dimensional assessment, with little agreement on definition, effects, and characteristics. (2) Some stylistic factors are easy to measure while others can only be operationally defined and approximated. (3) Style characteristics affect program comprehension and therefore impact program maintenance. This includes both structural and typographic factors.

To better understand programming style we have found it helpful to distinguish between typographic style and the underlying structural style of the algorithm. The typographic characteristics represent the physical layout of the code. They do not, in any way, affect the performance of the code but may affect the maintainability of the code. On the other hand, the structural style characteristics are dominated by the programming constructs (e.g. looping and branching structures, modularity) and affect both performance and maintainability.
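To make the two classes concrete, the following sketch measures a few typographic factors and one structural factor separately for the same Pascal-like fragment laid out two ways. The function names, the fragment, and the crude `begin`/`end` counting rule are all assumptions made for illustration; none of this is a measurement procedure taken from the studies cited here.

```python
def typographic_metrics(source):
    """Measure layout only: blank lines, line length, and indentation."""
    lines = source.splitlines()
    nonblank = [ln for ln in lines if ln.strip()]
    if not nonblank:
        raise ValueError("source contains no code lines")
    return {
        "blank_line_pct": 100.0 * (len(lines) - len(nonblank)) / len(lines),
        "avg_line_length": sum(len(ln) for ln in nonblank) / len(nonblank),
        "avg_indent": sum(len(ln) - len(ln.lstrip(" ")) for ln in nonblank)
        / len(nonblank),
    }


def max_nesting(source):
    """Measure structure only: deepest begin/end nesting level.

    Assumption: a crude token count that ignores comments and strings.
    """
    depth = deepest = 0
    for token in source.lower().split():
        word = token.strip(";.")
        if word == "begin":
            depth += 1
            deepest = max(deepest, depth)
        elif word == "end":
            depth -= 1
    return deepest


# The same hypothetical fragment, unindented and indented.
FLAT = "begin\nif x > 0 then\nbegin\ny := 1;\nend;\nend."
INDENTED = "begin\n  if x > 0 then\n  begin\n    y := 1;\n  end;\nend."

if __name__ == "__main__":
    for label, text in (("flat", FLAT), ("indented", INDENTED)):
        print(label, typographic_metrics(text), max_nesting(text))
```

Re-indenting the fragment changes the typographic measurements but leaves the nesting depth untouched, which is the sense in which the two classes can be varied independently across treatment groups.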
STYLE AND COMPREHENSION EXPERIMENTS

The importance and influence of structural style characteristics on program comprehension is widely accepted and has been demonstrated by many experiments. Typographic style characteristics are considered much less important and influential. Further, the relationship between structural and typographic style has been obscured by results from studies that failed to properly isolate relevant stylistic factors [4,15]. The following two experiments demonstrate the necessity and utility of studying stylistic characteristics with this distinction in mind.

Comparing Kernighan & Plauger's nested IF statements

We conducted an experiment comparing three versions of nested IF statements taken from Kernighan & Plauger's Elements of Programming Style [7,p. 123]. The three versions are listed in Figure 1. Kernighan & Plauger present version 1 as an example of an "ill-chosen layout," and suggest that simple reformatting can improve understandability. Version 2 is their typographic rearrangement of version 1; the only changes are method of indentation and use of embedded spaces. They then argue that a further transformation yields a better version analogous to a CASE statement. By combining logical conditions they eliminate one nested IF-THEN-ELSE and derive version 3. Versions 2 and 3 are written in the same typographic style (i.e. the use of indentation and embedded spaces is the same). Hence, the difference between versions 1 and 2 is purely typographic, while the difference between 2 and 3 is predominately structural.

A simple comprehension test, consisting of five short answer and two subjective questions, was designed and administered to 36 junior-level computer science students. The experiment was conducted during the first ten minutes of class. Subjects were randomly assigned into three treatment groups of 12 students, each group receiving one version of the nested IF statements.
They were asked to read the IF statements, answer the five questions that followed, record the time necessary to do so, and then subjectively rate the indentation and structure of the code. Thus, the independent variable was the "style" of the nested IF statements (Kernighan and Plauger's three versions). Dependent measures for each subject were: score (0 to 5 points), time required to answer the five questions (0 to 5 minutes), and subjective ratings for indentation and structure on a 5 point forced-choice scale (1 = very poor, 5 = very good). The materials distributed to subjects consisted of a page of instructions followed by a listing of the IF statements and the test questions.

Average scores, times, and ratings for all three groups are shown in Table 1. In support of Kernighan & Plauger, both scores and times improve across the three versions. This is illustrated by the plot of scores and times in Figure 2. Interestingly, the subjective ratings show virtually no change across the styles. Univariate and multivariate analysis of variance showed a significant difference for the time measure (F=4.99, p<.01, d.f.=2,33). Although the trend is for scores to improve across the versions, these differences are not strong enough to report significance. However, these results demonstrate that stylistic variations affect program comprehension. We emphasize that differences between versions 1 and 2 are purely typographic, while differences between versions 2 and 3 are structural.
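For readers unfamiliar with the reported statistic, the degrees of freedom follow directly from the design: three treatment groups give 3 - 1 = 2 between-group degrees of freedom, and 36 subjects in three groups give 36 - 3 = 33 within-group degrees of freedom. The sketch below shows how a one-way analysis of variance F ratio is computed. The function name and the sample values are hypothetical; the raw data from the experiment are not reproduced here.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of each observation around its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


if __name__ == "__main__":
    # Hypothetical completion times (minutes) for three small groups.
    times = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
    print(one_way_anova(times))
```

With three groups of 12 subjects the same function yields degrees of freedom of 2 and 33, and with three groups of 11 it yields 2 and 30.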
Comparing code formatters and structural style

To further demonstrate the impact of style on comprehension, and to illustrate the implications for code formatting tools like language-directed editors and pretty-printers, we repeated the above experiment using three different versions of IF statements. In this experiment, version 1 was Kernighan and Plauger's original nested IF statements formatted by
Macintosh Pascal, a syntax-directed Pascal editor/interpreter for the Apple Macintosh. Version 2 is the "best" of the Kernighan & Plauger versions (version 3 in the previous experiment). Version 3 is a sequential implementation of version 2 formed by eliminating the ELSE clauses and repeating logical conditions as necessary. Version 3 has no nested statements and thus represents the most primitive (inefficient) implementation but, as will be seen, the best in terms of program comprehension. Figure 3 contains a listing of all three versions.

The differences between version 1 and version 2 are typographic and structural style combined. The indentation, use of embedded spacing, and logical structure are all different (however slightly). Version 3 was written in the same typographic style as version 2 and therefore the difference between the two is primarily structural.

The same comprehension test with these new versions was then administered to 33 junior-level computer science students. Procedures were identical to the first experiment except that treatment groups consisted of 11 students. Again, the independent variable was the "style" of the IF statements and the dependent variables were score, time, and subjective ratings.

Average scores, times, and ratings for all three groups are shown in Table 2. As expected, both scores and times improve across the three versions as illustrated in Figure 4. Again, the subjective ratings show little change across the styles. Analysis of the data showed significant differences for both score (F=5.02, p<.01, d.f.=2,30) and time (F=3.12, p<.05, d.f.=2,30). What is surprising is the relatively large time and score differences for such a small task.
Implications for standards, maintainability, and code formatting

The results of the two experiments reaffirm the importance of structural style characteristics and clearly demonstrate that typographic style characteristics are more than cosmetic. The improvement in comprehension observed in the above experiments seems attributable to improved spatial arrangement provided by typographic and structural reformatting. In each successive version there is a refinement toward an arrangement that displays the 1:1 relationship between logical conditions and subsequent action. This 1:1 relationship first takes shape in Kernighan and Plauger's second version and becomes more apparent in their third version. The sequential version, which specifically states each condition, makes the 1:1 correspondence completely obvious. This refinement in clarity corresponds exactly with our observed improvements in comprehension.

Many formatting tools are designed to produce code that is pretty (i.e. pleasing to the eye) with little or no attempt to improve the visual cues that programmers use to recognize the underlying code structure. Kernighan and Plauger's first version and the Macintosh Pascal version are examples of this style. These versions display the nested structure of the IF statements but obscure the 1:1 relationship between logical condition and action. In this instance, recognition of the 1:1 relationship provides the key to understanding the structure, while nesting is almost irrelevant. It should be noted that, to the best of our knowledge, no existing language-directed editor or pretty-printer can effectively use embedded spaces and indentation to clearly display this 1:1 relationship (i.e. produce Kernighan and Plauger's second or third versions).

Another point needs to be raised concerning the difference between Kernighan and Plauger's third version and the sequential implementation of that same code.
The improvement in comprehension comes at the cost of machine efficiency. This is evidence of the efficiency versus maintainability trade-off (analogous to the classic time versus space trade-off) which has occasionally been discussed but rarely demonstrated.
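The three structural forms discussed above can be illustrated with a small hypothetical example. The functions below are not Kernighan and Plauger's code, and they are written in Python rather than in the language of the original listings; the range-check task and all names are assumptions chosen only to show a nested form, a CASE-like form with combined conditions, and a sequential form with no ELSE clauses.

```python
def nested_form(x, lo, hi):
    """Nested IF-THEN-ELSE: the structure follows the nesting."""
    if x >= lo:
        if x <= hi:
            action = "in range"
        else:
            action = "too high"
    else:
        action = "too low"
    return action


def case_like_form(x, lo, hi):
    """CASE-like chain: one condition per action, tested in order."""
    if x < lo:
        action = "too low"
    elif x > hi:
        action = "too high"
    else:
        action = "in range"
    return action


def sequential_form(x, lo, hi):
    """Sequential form: no ELSE clauses, each condition stated in full.

    Every test is evaluated on every call, which is the efficiency cost
    paid for making the condition-to-action pairing explicit.
    """
    action = None
    if x < lo:
        action = "too low"
    if x >= lo and x > hi:
        action = "too high"
    if x >= lo and x <= hi:
        action = "in range"
    return action


if __name__ == "__main__":
    for x in (0, 5, 12):
        print(x, nested_form(x, 1, 10), case_like_form(x, 1, 10),
              sequential_form(x, 1, 10))
```

All three functions return the same result for every input, so the choice among them is purely one of style: the sequential form evaluates three compound conditions per call where the nested form evaluates at most two simple ones.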
CONCLUSIONS

We have attempted to present a different view of the factors influencing programming style and a new approach to program style research. By considering typographic and structural style factors separately we were better able to assess their influence and determine their utility. Automated program style analyzers greatly oversimplify style as they combine arbitrary factors thought to influence style, arbitrarily create a range for each factor, and assign an arbitrary weight to each factor in order to compute a single style score. Hence, style graders are too subjective and confound the effects of different classes of factors. Our experimental results have several implications for style analyzers and code formatting programs:

1. Typographic style rules and guidelines should be established that best reflect the underlying structure and support the task to be performed. While typographic factors have no impact on code efficiency, they do affect program comprehension. Folklore and published style guidelines should be re-examined in light of our experimental results.

2. Structural style rules and guidelines need to be viewed in terms of the maintainability/efficiency trade-off. Standards will differ with site and application.

3. Useful code formatting tools must be more sophisticated than those presently available. The combination of typographic and structural rearrangement provides a powerful mechanism with which to improve code maintainability. Simplistic pretty-printers and language-directed editors are inadequate and may, in fact, decrease maintainability by obscuring structural cues.

Our paradigm suggests that more work needs to be directed toward the analysis of how particular stylistic factors affect program comprehension and program maintainability. Then, direction should be focused toward the development of stylistic principles to aid programmers.
Once stylistic standards are established and evaluated, automated style analyzers and code formatters can be developed with a strong theoretic and applied base of principles.
References

1. Arthur, L.J. Measuring Programmer Productivity and Software Quality. John Wiley, New York, NY, 1985.
2. Berry, R.E. and Meekings, B.A.E. A Style Analysis of C Programs. Communications of the ACM, 28, 1 (Jan. 1985), 80-88.
3. Birk, N.P. and Birk, G.B. Understanding and Using English. Odyssey Press, New York, NY, 1959.
4. Brooks, R.E. Studying Programmer Behavior Experimentally: The Problems of Proper Methodology. Communications of the ACM, 23, 4 (Apr. 1980), 207-213.
5. Evangelist, M. Program Complexity and Programming Style. Proceedings of the International Conference on Data Engineering (Los Angeles, CA, Apr. 24-27). IEEE, Silver Spring, MD, 1984, pp. 534-541.
6. Harrison, W. and Cook, C. A Note on the Berry-Meekings Style Metric. Communications of the ACM, 29, 2 (Feb. 1986), 123-133.
7. Kernighan, B.W. and Plauger, P.J. The Elements of Programming Style. McGraw-Hill, New York, NY, 1974.
8. Ledgard, H.F. and Chmura, L.J. FORTRAN With Style: Programming Proverbs. Hayden Book Co., Rochelle Park, NJ, 1978.
9. Marca, D. Some Pascal Style Guidelines. ACM SIGPLAN Notices, 16, 4 (Apr. 1981), 70-80.
10. Meekings, B. Style Analysis of Pascal Programs. ACM SIGPLAN Notices, 18, 9 (Sept. 1983), 45-54.
11. Miara, R.J., Musselman, J., Navarro, J. and Shneiderman, B. Program Indentation and Comprehensibility. Communications of the ACM, 26, 11 (Nov. 1983), 861-867.
12. Redish, K.A. and Smyth, W.F. Program Style Analysis: A Natural By-Product of Program Compilation. Communications of the ACM, 29, 2 (Feb. 1986), 126-133.
13. Rees, M.J. Automatic Assessment Aids for Pascal Programs. ACM SIGPLAN Notices, 17, 10 (Oct. 1982), 33-42.
14. Rosenthal, D. In correspondence from the members. ACM SIGPLAN Notices, 18, 3 (Mar. 1983), 4-5.
15. Sheil, B.A. The Psychological Study of Programming. Computing Surveys, 13, 1 (Mar. 1981), 101-120.
16. Strunk, W. and White, E.B. The Elements of Style. MacMillan, New York, NY, 1959.
17. Woodfield, S., Dunsmore, H., and Shen, V. The Effect of Modularization and Comments on Program Comprehension. In Proceedings of the Fifth International Conference on Software Engineering (San Diego, CA, Mar. 9-12). IEEE CS Press, Los Alamitos, CA, 1981, pp. 215-223.