Knowledge assessment and grading represent important instructional stages aimed at determining the achievement of learning objectives and learning standards among pupils. As such, they play a significant role in ensuring the quality of the educational system.
In Slovenia, internal grading plays a significant selective role in pupils’ progression through primary school and towards higher levels of education. At the same time, it influences the formation of their educational and career aspirations. Because it plays a decisive role in pupils’ further education, internal grading must be valid, reliable, objective and sensitive.
In the first part of the empirical research, we examined methodological aspects of internal assessment and grading by analysing primary school teachers’ opinions and practices regarding internal and external assessment and grading. In the second part, we analysed teachers’ grading of written exams in Slovene and Mathematics taken by fourth-grade primary school pupils. In the first part, data were collected with a questionnaire survey. The sample consisted of 882 primary school teachers from different Slovenian regions: 381 class teachers and 501 subject teachers. In the second part, data were collected from a sample of 24 class teachers, each of whom graded the written exams of five fourth-graders in either Slovene or Mathematics on two separate occasions. The written exams had been prepared by a class teacher for the internal grading of pupils in her class. In the grading process, each rater had to set her own assessment criteria. All participants also evaluated the written exams in terms of item difficulty and item types, the clarity, precision and unambiguity of the instructions, and the maximum score assigned to each item. Additionally, each of them wrote a reflective essay on her assessment and grading practices.
The results showed that primary school teachers rate numerical grades more highly than descriptive grades in terms of their informative and motivational value and their measurement characteristics. Similarly, they rate internal grading above external grading, attributing higher validity, reliability and objectivity to the former. Both class teachers and subject teachers take into account components other than knowledge when grading, such as pupils’ verbal and writing skills, their effort, and their active participation in class. A considerable share of teachers also incorporates various other components into grades (e.g., pupils’ work habits and their attitudes towards the learning content), and an even larger share would do so if they were free to decide. Including components that cannot in themselves be graded objectively lowers the validity and reliability of grading. We therefore propose reconsidering those curricular learning standards whose achievement cannot be graded objectively and which consequently yield grades of low validity and reliability.
The teachers reported using a variety of assessment strategies. They support alternative assessment as a supplement to traditional assessment strategies, but they are unsure of its measurement characteristics and do not agree that it should replace traditional strategies. Most teachers stated that they present assessment criteria to their pupils on various occasions. They are generally familiar with the criteria used by other teachers at their school and believe these criteria match their own. Most also reported that they frequently collaborate with colleagues at their school when preparing written exams.
Despite significant differences in opinions and practices between class teachers and subject teachers, between teachers of different grade levels and subjects, and between teachers with different lengths of work experience, their answers reflect an effort to ensure good measurement characteristics of internal grading, as well as high confidence that they actually achieve this.
In the second part of the research, the analysis of grading sensitivity generally indicated that the second grading was more sensitive than the first. Stricter raters graded more sensitively than more lenient ones. According to the intraclass correlation coefficient (ICC), the average score of an individual exam across all raters proved highly objective (i.e., it showed high interrater reliability) for the Slovene exam and even more so for the Mathematics exam. This finding of higher objectivity for Mathematics than for languages is consistent with similar findings in the literature. For both subjects, the second grading proved more objective than the first.

The objectivity of a single rater differed between the Slovene and Mathematics exams in both the first and the second grading. For the Slovene exam, single-rater objectivity in the first grading was poor in terms of both absolute agreement and consistency. The second grading yielded sufficiently consistent scores, but not grades. In both gradings, the numerical grades were inconsistent and biased in terms of the absolute agreement between an individual rater’s grade and the average grade from all raters for the same exam. As expected, single-rater grading was more objective for Mathematics than for Slovene. An individual rater’s scoring was consistent on both occasions, and in the second grading the assigned scores also showed high absolute agreement with the average scores from all raters for the same exam. As with Slovene, however, the assignment of numerical grades in Mathematics was inconsistent on both occasions, and an individual rater’s grade agreed poorly with the average grade from all raters.
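The contrasts drawn above, between consistency and absolute agreement and between a single rater and the average of all raters, correspond to four forms of the two-way intraclass correlation coefficient. The abstract does not state which ICC variants or software were used, so the following Python sketch is only an illustration under that assumption: it uses the commonly applied two-way ANOVA formulas, a hypothetical helper `icc_two_way`, and hypothetical scores (five exams, two raters) rather than the study’s data.

```python
from statistics import fmean


def icc_two_way(scores):
    """Hypothetical helper: two-way ICC forms for an n-exams x k-raters table.

    scores[i][j] is the score that rater j gave to exam i.
    Consistency ignores systematic differences between raters' means;
    absolute agreement counts them as disagreement.
    """
    n, k = len(scores), len(scores[0])
    grand = fmean(x for row in scores for x in row)
    row_means = [fmean(row) for row in scores]  # one mean per exam
    col_means = [fmean(scores[i][j] for i in range(n)) for j in range(k)]  # per rater

    # Partition the total sum of squares into exams, raters and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)             # between exams
    msc = ss_cols / (k - 1)             # between raters (leniency/strictness)
    mse = ss_err / ((n - 1) * (k - 1))  # residual

    return {
        "consistency_single": (msr - mse) / (msr + (k - 1) * mse),
        "agreement_single": (msr - mse)
        / (msr + (k - 1) * mse + k * (msc - mse) / n),
        "consistency_average": (msr - mse) / msr,
        "agreement_average": (msr - mse) / (msr + (msc - mse) / n),
    }


if __name__ == "__main__":
    # Hypothetical data: rater 2 is uniformly 2 points more lenient than rater 1.
    table = [[10, 12], [14, 16], [18, 20], [22, 24], [26, 28]]
    print({name: round(value, 3) for name, value in icc_two_way(table).items()})
    # → {'consistency_single': 1.0, 'agreement_single': 0.952,
    #    'consistency_average': 1.0, 'agreement_average': 0.976}
```

In this toy example the two raters rank and space the exams identically, so consistency is perfect, while the constant leniency offset lowers absolute agreement. This mirrors how a rater can be consistent and still produce biased scores or grades. The sketch does not guard against degenerate tables (for example, identical scores for every exam).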
Rater reliability was also higher for the Mathematics exams than for the Slovene exams: in Mathematics, only one of thirteen raters was unreliable, whereas in Slovene, seven of eleven raters were unreliable. Despite differences in grading procedures, a comparison with the results of existing studies, in which raters used uniform assessment criteria, leads to the conclusion that the rater reliability of the teachers in our research was good for Mathematics, whereas for Slovene it was lower but still comparable to the results of previous studies.
The present research is the first study in Slovenia to address internal grading in primary school comprehensively. It examines teachers’ views on their own assessment and grading and their opinions about national assessment, as well as the assessment strategies they use and the circumstances surrounding assessment and grading, and it analyses the classroom grading of written exams.
The present results confirm some previous findings on internal grading and identify aspects of internal grading that need to be reconsidered and reconceptualised to improve its measurement characteristics. Based on our findings, we propose extensions of our research methodology and procedures and present guidelines for improving the measurement characteristics of internal grading at the level of the education system, the individual school and the individual teacher.