FACTS FROM FIGURES: DO THE LATTER SPEAK FOR THEMSELVES?

The course evaluation program is a serious undertaking, and the students who entrepreneured the program are to be commended. Like all entrepreneurial innovations, however, this one must also be subjected to the "market test." The student body as a whole is the market place. The concern I have is for the set of criteria students will use in deciding on the value to them of such a program. In this article, all I want to do is to raise some questions about the "value of the product." The students can then formulate their own criteria of judgment.

The evaluation card has seven categories. For each category, the student is to select a number from one to seven. This number is supposed to indicate the extent to which the instructor fulfilled the particular task being evaluated. The number four, you are advised, is the average. The questions one should raise are the following. Is there a real concept of the "average" prepared lecture? The evaluation question does not ask how frequently, in the course of, say, 40 lectures, the instructor was prepared or not prepared. It asks the degree of preparedness and asks the student to indicate this by a number. Are these cardinal or ordinal numbers? If four reflects average preparedness, and if this is a real substantive concept with cardinal significance, then three reflects 25 per cent less than the average. If the concept has only ordinal significance, then four has no "average" meaning, and three simply indicates that the lecture was less prepared than "some" lecture that the student has in his mind, is using as a standard, and gives the number four to because the instructions say so.

Will It Work?

Take a simple illustration. Student A has in mind a lecture from a past instructor and gives the present instructor a six. Student B has had few courses at the University, so he uses (unconsciously) the personality of his father as a basis to judge the personality of the instructor, then evaluates the preparedness of the lecture, and gives it a two. Student C uses some other basis to determine preparedness and gives the lecture a four.

What kind of information can you get from the numbers two, four, six when the basis of comparison is different? Even if we made the heuristic assumption that all students will in fact judge on the basis of a more or less general consensus of what constitutes a well-prepared lecture, what is the meaning of two, four, six? Do these results say that one-third of the students think the lecture was 50 per cent below par, one-third think it was par, and one-third think it was 50 per cent above par?

What about the extreme values? In a class of 35, say the results are 35 ones. How much information do we have? All we can infer is that 35 students chose the lowest number in the scale. Since all 35 did this, this is what makes one significant. But significant of what? If the standards of comparison are widely dispersed, then 35 ones would mean the lecture was below the lowest standard of a widely dispersed group of standards. Intuitively we may conclude that the ones are significant. They do, however, lack cardinal precision.

Good Evaluation Impossible

If we know that the standards are tightly grouped, i.e. we accept the heuristic assumption, then 35 ones could simply reflect a lecture that is slightly below the standard. They still lack cardinal precision. The ones cannot be thought of as being 75 per cent below par. It would be just as appropriate to divide each number in the scale one to seven by 100 and talk about an average of four-hundredths. The extreme values one or seven, if the entire class votes for them, convey admittedly slightly more information than does a mixture, but what exactly is being conveyed is still ordinal (lacking a real basis) and quite vague.

Evaluation Not Valid

The main thrust of my argument is that a frequency distribution of each category for a given course has no validity, for the numbers of the scale have no interpersonal comparability, except under the heuristic assumption that a common standard is being used by each student. And even if that were the case, the scale could only validly read -1, 0, 1 for below par, par, or above par. I have not taken the time to go through all seven categories, but the same questions and arguments apply to them also. Other points of criticism, dealing with the methodology of the program rather than the idea itself, could be made, but these points all relate to the same analytical fallacy of using numbers as if they necessarily have cardinal significance. I invite all students to think seriously about the assumptions behind the present evaluation program.

J. P. Gander
Economics Faculty
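The ordinal-versus-cardinal point can be put computationally. The sketch below (hypothetical ratings; the relabeling map is invented for illustration) shows that any order-preserving relabeling of the one-to-seven scale carries exactly the same ordinal information, yet produces a different arithmetic mean — so the "average of four" is an artifact of the labels, not of what the students reported:

```python
# A hypothetical class of six students rates a lecture on the 1-to-7 scale.
ratings = [2, 4, 6, 4, 1, 7]

# Any strictly increasing relabeling of the scale preserves every
# "higher than / lower than" comparison, i.e. all ordinal content.
relabel = {1: 1, 2: 2, 3: 4, 4: 10, 5: 20, 6: 50, 7: 100}
relabeled = [relabel[r] for r in ratings]

order_preserved = all(
    (x < y) == (relabel[x] < relabel[y])
    for x in range(1, 8)
    for y in range(1, 8)
)
print(order_preserved)  # True: no ordinal comparison has changed

# The arithmetic mean, however, depends entirely on which labels were
# chosen, so it is meaningful only if the numbers are cardinal.
print(sum(ratings) / len(ratings))      # 4.0 under the original labels
print(sum(relabeled) / len(relabeled))  # roughly 28.8 under the new labels
```

Since both labelings encode the same responses, any conclusion drawn from the mean of one and contradicted by the mean of the other cannot be a fact about the lecture.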