Muhammad Kamal Uddin, Rowshan Ara Parvez, Tasnuva Tazrin Mullick
and Md. Ahsan Habib
University of Dhaka, Bangladesh
Pages: 71 – 82
Multiple Choice Questions (MCQs) are used globally to assess students’ knowledge
and progress because they offer wide content coverage in a short time and in the form
of an objective assessment. In the Secondary and Higher Secondary (SSC and HSC)
Examinations in Bangladesh, MCQs account for 25% to 40% of the marks.
Admission tests in tertiary education (e.g., public universities and medical colleges)
set minimum Grade Point Averages (GPAs) in the SSC and HSC examinations as
application criteria. Furthermore, GPAs are given varying weights in determining
merit positions for enrolment. It appears that MCQ tests play a crucial role in
high-stakes public examinations and higher education admission tests. However, the MCQs in
the HSC Examination have never been evaluated since their introduction. Additionally, the
number of MCQs was suddenly reduced by 10% in 2018 without examining the
trade-off between content coverage and content quality in using MCQs. The present
study, therefore, addressed the issue by assessing and comparing reliability, validity,
item difficulty, item discrimination, and person-item maps across two changeover years
(2016 and 2018) and three educational streams (science, humanities, and business),
adopting Classical Test Theory (CTT) and Item Response Theory (IRT) approaches.
The reliability coefficient was consistently good in 2016 but poor in 2018 for all three
streams. The concurrent validity coefficient (correlation between MCQ and CQ scores) was
below the optimal value (0.55, as recommended by Smith & Smith, 2005) and was fairly
similar across years and streams. The mean difficulty was mostly moderate irrespective
of year and stream; mean discriminability was mostly fair to good for 2016 but poor to
fair for 2018, and was highest for science, followed by humanities and then business
studies. The
person-item map, which plots the ability of persons (students) against the difficulty of
items (questions), was found to exhibit a poor fit in general. Overall, the findings indicate
that MCQ testing is a moderately valid method and do not support the whimsical MCQ
reduction as a justified step.
It is concluded that greater importance should be placed on imparting rigorous training
to test makers on educational measurement and psychometrics on a regular basis. The
findings have implications for teachers, educators, and policy makers.
Keywords: Multiple Choice Questions, Classical Test Theory, Item Response Theory,
Item Difficulty, Item Discrimination, Person-Item Map
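
For readers unfamiliar with the CTT indices named above, the following is a minimal Python sketch of how item difficulty, item discrimination, and a reliability coefficient can be computed from a scored response matrix. It is an illustration only and is not the authors' analysis code. The function names and the toy data are hypothetical, and the choice of KR-20 as the reliability coefficient and of the corrected item-total correlation as the discrimination index are assumptions, since the abstract does not state which statistics were used.

```python
from statistics import mean, pvariance


def item_difficulty(responses):
    """CTT difficulty: proportion of examinees answering each item correctly."""
    n_items = len(responses[0])
    return [mean(row[j] for row in responses) for j in range(n_items)]


def _pearson(x, y):
    """Pearson correlation; returns NaN if either variable has no variance."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else float("nan")


def item_discrimination(responses):
    """Corrected item-total correlation: item score vs. total on the other items."""
    n_items = len(responses[0])
    result = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]
        result.append(_pearson(item, rest))
    return result


def kr20(responses):
    """Kuder-Richardson 20 reliability for dichotomously scored (0/1) items."""
    k = len(responses[0])
    p = item_difficulty(responses)
    total_variance = pvariance([sum(row) for row in responses])
    item_variance_sum = sum(pi * (1 - pi) for pi in p)
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Toy data (invented): 6 examinees (rows) by 4 items (columns), 1 = correct.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

print("difficulty:", item_difficulty(responses))
print("discrimination:", item_discrimination(responses))
print("KR-20:", kr20(responses))
```

Higher difficulty values mean easier items, and discrimination values near zero or below flag items that fail to separate stronger from weaker examinees. The IRT person-item map requires fitting a latent-trait model and is outside the scope of this sketch.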