| |
Testing and Assessment Glossary of Terms
|
| A |
| ability / trait parameter |
In item response theory (IRT), a theoretical value indicating the level of a participant on the ability or trait measured by the test; analogous to the concept of true score in classical test theory. |
| ability testing |
The use of standardised tests to evaluate the current performance of a person in some defined domain of cognitive, psychomotor, or physical functioning. |
| absolute score interpretation |
The level of an individual's or group's competence in some defined criterion domain as inferred from the test score. |
| accommodation |
A reasonable modification in an assessment instrument or its administration made to compensate for the effects of a qualified disability without altering the purpose of the assessment instrument. |
| accountability |
Responsibility of a certification board, advisory committee, or other sponsor of a certification program to its stakeholders to demonstrate the efficacy and fairness of certification policies, procedures and assessment instruments. |
| accreditation |
A status awarded by a certification agency to a candidate that has demonstrated compliance with the standards set forth in the certification program. |
| acculturation |
The process whereby individuals from one culture adopt the characteristics and values of another culture with which they have come in contact. |
| achievement levels / proficiency levels |
Descriptions of student or adult competency in a particular subject area, usually defined as ordered categories on a continuum, often labeled from "basic" to "advanced," that constitute broad ranges for classifying performance. |
| adaptive testing |
A sequential form of individual testing in which successive items in the test are chosen based primarily on the psychometric properties and content of the items and the participant's response to previous items. |
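As a rough illustration of the selection loop described above, the following Python sketch picks, at each step, the unused item whose difficulty is closest to a running ability estimate, then moves the estimate up or down according to the response. The item bank, the fixed step size, and the function name are assumptions made for illustration only; operational adaptive tests typically use IRT-based estimation and content constraints instead.

```python
def run_adaptive_test(item_difficulty, answer_fn, n_items, start=0.0, step=0.5):
    """Toy adaptive test: returns (items administered in order, final estimate).

    item_difficulty: dict mapping item id -> difficulty on an arbitrary scale (assumed).
    answer_fn: callable(item_id) -> True if the participant answers correctly.
    """
    ability = start
    administered = []
    for _ in range(n_items):
        remaining = [i for i in item_difficulty if i not in administered]
        if not remaining:
            break
        # Choose the unused item whose difficulty best matches the current estimate.
        item = min(remaining, key=lambda i: abs(item_difficulty[i] - ability))
        administered.append(item)
        # Fixed-step update: a simple stand-in for a real ability estimator.
        ability += step if answer_fn(item) else -step
    return administered, ability


bank = {"q1": -1.0, "q2": -0.5, "q3": 0.0, "q4": 0.5, "q5": 1.0}
# Hypothetical participant who answers every item of difficulty <= 0.5 correctly.
items, estimate = run_adaptive_test(bank, lambda i: bank[i] <= 0.5, n_items=3)
print(items, estimate)
```

Each response changes which item is presented next, which is the defining feature of the approach.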
| adjusted validity / reliability coefficient |
A validity or reliability coefficient -- most often, a product-moment correlation -- that has been adjusted to offset the effects of differences in score variability resulting from different populations. See restriction of range or variability. |
| ADL |
In 1997, the US DoD and the White House Office of Science and Technology Policy launched the Advanced Distributed Learning initiative. ADL was targeted from the very beginning at Web-based education. Its work is coordinated with other organisations such as IEEE, IMS and AICC. This joint work produced the Sharable Content Object Reference Model (SCORM), which includes a reference model for sharable educational software objects, a runtime environment and a content aggregation model. |
| administrative independence |
An organisational structure for the governance of a certification program that ensures control over the essential certification and re-certification decisions without being subject to approval by or undue influence from any other body. See Autonomy. |
| advisory committee |
A group of individuals appointed or elected to recommend and implement policy related to certification program operation. |
| age equivalent |
The chronological age, within a defined population, for which a given score is the average score. Thus, if children ten years and six months of age have a median score of 17 on a test, the score 17 is said to have an age equivalent of 10-6. |
| AICC |
The Aviation Industry CBT Committee is the natural response to the educational standardisation challenge from one of the largest users of educational software. The activities of the AICC are targeted, among others, to the definition of software and hardware requirements for student computers, needed peripherals, multimedia formats for course contents, and user interface properties. |
| ALIC |
The Advanced Learning Infrastructure Consortium (ALIC) is a collaborative effort of the Japanese government, industry and academic professionals working in the e-learning field. |
| alternate forms |
Two or more versions of a test that are considered interchangeable, in that they measure the same constructs, are intended for the same purposes, and are administered using the same directions. Alternate forms is a generic term used to refer to any of three categories. Parallel forms have equal raw score means, equal standard deviations, and equal correlations with other measures for any given population. Equivalent forms do not have the statistical similarity of parallel forms, but the dissimilarities in raw score statistics are compensated for in the conversions to derived scores or in form-specific norm tables. Comparable forms are highly similar in content but the degree of statistical similarity has not been demonstrated. |
| analytic scoring procedure |
A procedure in which the judgment of each critical dimension of performance is undertaken separately and the resultant values are combined for an overall score. In some instances, scores on the separate dimensions may also be used in interpreting performance. |
| anchor test |
A common set of items administered with each of two or more different forms of a test for the purpose of equating the scores of these forms. |
| answer key |
The key that describes the scoring scenario for a question or test. |
| appeal |
Request by applicant, candidate or certified person for reconsideration of any adverse decision made by the certification body related to her/his desired certification status. |
| applicant |
An individual who declares interest in earning a credential offered by a certification program, usually through a request for information and the submission of materials. See Candidate. |
| ARIADNE |
The Alliance of Remote Instructional Authoring and Distribution Networks for Europe was part of the European Commission's fourth framework program. The main working fields of this alliance include computer networks for education and learning, methodologies for the development, management and reuse of educational contents, syllabus definition for computer based training, and educational metadata. |
| assessment |
Any systematic method of obtaining evidence from tests, examinations, questionnaires, surveys and collateral sources used to draw inferences about characteristics of people, objects, or programs for a specific purpose. |
| assessment instrument |
The methods for determining if candidates possess the necessary knowledge and/or skills related to the purpose of the certification. |
| attention assessment |
The process of collecting data and making an appraisal of a person's ability to focus on the relevant stimuli in a situation. The assessment may be directed at mechanisms involved in arousal, sustained attention, selective attention and vigilance, or limitation in the capacity to attend to incoming information. |
| authoring system |
A generic name for one or more computer programs that allow a user to author and edit items (i.e. questions, choices, correct answer, scoring scenarios and outcomes) and maintain test definitions (i.e. how items are delivered with a test). |
| automated narrative report |
A programmed, computer-generated interpretation of an examinee's test scores or test score profile, corresponding to the level of each score or by the interrelationships among the scores, and based on empirical data and/or expert judgment. |
| autonomy |
Control over all essential certification and re-certification decisions without being subject to approval by or undue influence from any other body. See Administrative Independence. |
|
|
| B |
| battery |
A set of tests standardised on the same population, so that norm-referenced scores on the several tests can be compared or used in combination for decision making. |
| bias |
In a statistical context, bias is a systematic error in a test score. In a fairness context, bias may refer to the inappropriateness of content in the assessment instrument, either in terms of its irrelevance, overemphasis, exclusion or under-representation, or in terms of irrelevant components in the construct of test scores. Fairness bias usually favours one group of participants over another. In an eligibility context, bias refers to the inappropriateness or irrelevance of requirements for certification or re-certification if they are not reasonable prerequisites for competence in a profession, occupation, role or for product use and support. See Fairness. |
| bilingual |
The characteristic of being relatively proficient in two languages. |
| blue-print |
A document that contains information about the assessment, including its stakeholders, the intended candidates, eligibility, job analysis, the conditions under which the assessment must be conducted, content domains, and other information to ensure that assessments are valid, equivalent and unbiased. |
| bubble sheets |
Paper forms that contain printed circles (i.e. bubbles), and other guide marks, to prompt a participant to fill in the form for later scanning by an optical mark reader. |
|
|
| C |
| CAA |
Computer Assisted Assessment. A common term used to describe the use of computers to support assessments. |
| CAL |
Computer Aided Learning. A common term used to describe the use of computers to support learning. |
| CAT |
Computer Adaptive Testing. A method by which a computer selects the range of questions to be asked based on the performance of the participant on previous questions. |
| calibration |
The process of setting the test score scale, including mean, standard deviation, and possibly shape of score distribution, so that scores on a scale have the same relative meaning as scores on a related scale. |
| candidate |
An individual who has met the eligibility qualifications for, but has not yet earned, a credential awarded through a certification program or a person that participates in a test, assessment or exam by answering questions. See Applicant. |
| CBA |
Computer Based Assessment. A common term used to describe the use of computers to deliver, mark, score, and analyse assessments. |
| CBL |
Computer Based Learning. A common term used to describe the use of computers to support learning. |
| CEN |
The European Committee for Standardisation (Comité Européen de Normalisation, CEN). |
| CEN ISSS |
The European Committee for Standardisation (Comité Européen de Normalisation, CEN) hosts the Information Society Standardisation System (ISSS) subcommittee. |
| CEN ISSS LT |
Educational standardisation activities at CEN ISSS take place within the Learning Technologies Workshop (CEN/ISSS/LT). The main efforts are devoted to reuse and interoperation of educational resources, educational collaboration, metadata for educational contents, and learning process quality, all with European cultural diversity in mind. |
| certificant |
An individual who has earned a credential awarded through a certification program. |
| certificate |
A written statement or document from the certification agency confirming the competence of an individual. |
| certification |
A process, often voluntary, by which individuals who have demonstrated the level of knowledge and skill required in the profession, occupation, role or the competent use or support of a product, are identified to the public and other stakeholders. See also licensing, credentialing. |
| certification agency |
The organisational or administrative unit that sponsors a certification program. See also licensing, credentialing. |
| certification body |
The organisational or administrative unit that sponsors a certification program and maintains certification records. See Registration Body. |
| certification board |
A group of individuals appointed or elected to govern one or more certification programs as well as the certification agency, and responsible for all certification decision making, including governance. |
| certification committee |
A group of individuals appointed or elected to recommend and implement policy related to certification program operation. |
| certification process |
All activities by which a body establishes that a person fulfils specified competence requirements, including application, evaluation, decision on certification, surveillance and recertification, use of certificates and logos/marks. |
| certification processing |
The process of matching an individual's accomplishments against the requirements for a certification program, and awarding certifications when all requirements have been met. |
| certification program |
The standards, policies, procedures, assessment instruments and related products and activities through which individuals are publicly identified as qualified in a profession, occupation, role or for the competent use or support of a product. |
| certification scheme |
Specific certification requirements related to specified categories of persons to which the same particular standards and rules, and the same procedures apply. |
| certification system |
Set of procedures and resources for carrying out the certification process as per a certification scheme, leading to the issue of a certificate of competence including maintenance. |
| choice |
One of the possible responses that a participant might select. Choices contain the correct answer/s and distracters. |
| class mean |
The average score for all participants in a class for a particular test. |
| class standard deviation |
The standard deviation of the scores achieved by participants within a class for a particular test. |
| classical test theory |
The view that an individual's observed score on a test is the sum of a true score component for the participant, plus an independent measurement error component. A few simple premises about these components lead to important relationships among validity, reliability, and other test score statistics. |
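The decomposition can be checked numerically. The Python sketch below uses made-up true scores and errors, chosen so that the errors have mean zero and are uncorrelated with the true scores. Under those premises the observed-score variance equals the sum of the two component variances, and reliability can be expressed as the ratio of true-score variance to observed-score variance, a standard result of the theory. All values are illustrative assumptions.

```python
from statistics import pvariance

true_scores = [10, 10, 20, 20]
errors = [-1, 1, -1, 1]  # mean zero and uncorrelated with the true scores
observed = [t + e for t, e in zip(true_scores, errors)]  # X = T + E

var_true = pvariance(true_scores)
var_error = pvariance(errors)
var_observed = pvariance(observed)

# With independent errors, observed variance = true variance + error variance.
reliability = var_true / var_observed
print(var_true, var_error, var_observed, round(reliability, 3))
```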
| classification accuracy |
The degree to which neither false positive nor false negative categorisations and diagnoses occur when a test is used to classify an individual or event. See sensitivity and specificity. |
| coaching |
Planned short-term instructional activities in which prospective participants take part prior to the test administration for the primary purpose of increasing their test scores. Coaching typically includes simple practice, instruction on test-taking strategies, and so forth. Activities that approximate the instruction provided by regular school curricula or training programs are not typically referred to as coaching. |
| coefficient alpha |
An internal consistency reliability coefficient based on the number of parts into which the test is partitioned (e.g., items, subtests, or raters), the interrelationships of the parts, and the total test score variance. Also called Cronbach's alpha, and, for dichotomous items, KR 20. |
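A minimal Python sketch of the usual computation follows, assuming a rectangular score matrix with one row per participant and one column per part. The function name and the sample data are illustrative. The formula is the standard one: alpha = k/(k-1) × (1 - sum of part variances / total score variance).

```python
from statistics import pvariance


def coefficient_alpha(scores):
    """scores: one row per participant, one column per part (item, subtest, or rater)."""
    k = len(scores[0])  # number of parts
    part_variances = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(part_variances) / total_variance)


# Four participants, three dichotomous items (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = coefficient_alpha(responses)
print(alpha)  # 0.75
```

Because the items here are dichotomous, the value is the same as KR 20 for this data set.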
| commentary |
Comments, remarks and observations that clarify terms, provide examples of practice that help explain a standard, or offer suggestions regarding evidence that must be documented to demonstrate compliance. |
| composite score |
A score that combines several scores by a specified formula. |
| computer assisted assessment |
A common term used to describe the use of computers to support assessments. |
| computerised adaptive test |
A method by which a computer selects the range of questions to be asked based on the performance of the participant on previous questions. See adaptive testing. |
| computer based assessment |
A common term used to describe the use of computers to deliver, mark, score, and analyse assessments. |
| computer based mastery test |
An adaptive test administered by computer that indicates whether or not the participant has mastered a certain domain. The test is not designed to provide scores indicating degree of mastery, but only whether the test performance was above or below some specified level. |
| conditional measurement error variance |
The variance of measurement errors that affect the scores of examinees at a specified test score level; the square of the conditional standard error of measurement. |
| conditional standard error of measurement |
The standard deviation of measurement errors that affect the scores of examinees at a specified test score level. |
| confidence interval |
An interval between two values on a score scale within which, with specified probability, a score or parameter of interest lies. The term is also used in these standards to designate Bayesian credibility intervals that define the probability that the unknown parameter falls in the specified interval. |
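One common application is an interval around an observed score built from the standard error of measurement. The Python sketch below assumes normally distributed errors and the classical estimate SEM = SD × sqrt(1 - reliability). The function name and the example figures are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def score_interval(observed, sd, reliability, level=0.95):
    """Return (lower, upper) bounds around an observed score."""
    sem = sd * sqrt(1 - reliability)           # standard error of measurement
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided critical value
    return observed - z * sem, observed + z * sem


# Hypothetical scale: SD of 15, reliability of 0.91, observed score of 100.
low, high = score_interval(100, sd=15, reliability=0.91)
print(round(low, 2), round(high, 2))
```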
| content domain |
The set of organised categories characterising subject matter under which behaviours, knowledge, skills, abilities, attitudes, and other characteristics may be represented in specifications for assessment instruments by which items are classified. |
| content standard |
A statement of the theoretical concept or characteristic that a test is designed to measure. |
| construct domain |
The set of interrelated attributes (e.g., behaviours, attitudes, values) which are included under a construct label. A test typically samples from this construct domain. |
| construct equivalence |
The extent to which the construct measured by one test is essentially the same as the construct measured by another test. Also, the degree to which a construct measured by a test in one cultural or linguistic group is comparable to the construct measured by the same test in a different cultural or linguistic group. |
| construct irrelevance |
The extent to which test scores are influenced by factors that are irrelevant to the construct that the test is intended to measure. Such extraneous factors distort the meaning of test scores from what is implied in the proposed interpretation. |
| construct underrepresentation |
The extent to which a test fails to capture important aspects of the construct that the test is intended to measure. In this situation, the meaning of test scores is narrower than the proposed interpretation implies. |
| constructed response item |
An exercise for which examinees must create their own responses or products rather than choose a response from an enumerated set. |
| continuing competence |
The ability to provide service at specified levels of knowledge and skill, not only at the time of initial certification but throughout an individual’s professional career. See Re-certification and Continuing Education. |
| continuing education |
Activities, often short courses, that certified professionals engage in to receive credit for the purpose of maintaining continuing competence and renewing certification. See Re-certification and Continuing Competence. |
| convergent evidence |
Evidence based on the relationship between test scores and other measures of the same construct. |
| corrective scoring |
A calculation used to offset the effects of guessing in objective tests. |
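The glossary does not say which calculation is meant. One widely used version, shown here as an assumption, subtracts a fraction of the wrong answers from the number right: corrected = R - W / (k - 1), where k is the number of choices per question and omitted questions are ignored.

```python
def corrected_score(n_right, n_wrong, n_choices):
    """Correction for guessing: R - W / (k - 1). Omitted items are not counted."""
    return n_right - n_wrong / (n_choices - 1)


# 40 right and 12 wrong on four-choice questions.
print(corrected_score(40, 12, 4))  # 36.0
```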
| credentialing |
Granting, by some authority, a person a credential, such as a certificate, license, or diploma, that signifies a certain level of competence in some domain of knowledge or activity. |
| criterion domain |
See construct domain: the construct domain of a variable used as a criterion. |
| criterion-referenced score interpretation |
A score interpretation that does not depend upon the score's rank within, or relationship to, the distribution of scores for other examinees. Examples of criterion-referenced interpretations include comparison to cut scores, interpretations based on expectancy tables, and domain-referenced score interpretations. |
| criterion-referenced test |
A test that allows its users to make score interpretations in relation to a functional performance level, as distinguished from those interpretations that are made in relation to the performance of others. See also domain-referenced test. |
| cross-validation |
A procedure in which an empirically derived scoring system or set of weights from one sample is applied to a second sample in order to investigate the stability of prediction of the scoring system or weights. |
| cut score |
A specified point on a score scale, such that scores at or above that point are interpreted differently from scores below that point. Sometimes there is only one cut score, dividing the range of possible scores into "passing" and "failing" or "mastery" and "nonmastery" regions. Sometimes two or more cut scores may be used to define three or more score categories, as in establishing performance standards. See also performance standards. |
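The "at or above" rule can be sketched in Python as follows. The cut points and category labels are hypothetical. A single cut score is simply the case of a one-element list with two labels.

```python
from bisect import bisect_right


def classify(score, cut_scores, labels):
    """cut_scores must be sorted ascending; labels has one more entry than cut_scores."""
    # bisect_right counts the cut scores that are <= score, so a score exactly
    # at a cut point falls into the higher category ("at or above").
    return labels[bisect_right(cut_scores, score)]


cuts = [50, 70, 85]
levels = ["below basic", "basic", "proficient", "advanced"]
print(classify(70, cuts, levels))           # proficient
print(classify(49, [50], ["fail", "pass"]))  # fail
```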
|
|
| D |
| database |
A collection of information/data, often organised within tables, within a computer's mass storage system. Databases are structured in a way to provide for rapid search and retrieval by computer software. The following databases are used by testing systems: item, test definition, scheduling and results. |
| DCMI |
The Dublin Core Metadata Initiative (DCMI) is an open forum engaged in the development of interoperable online metadata standards that support a broad range of purposes and business models. DCMI's activities include consensus-driven working groups, global workshops, conferences, standards liaison, and educational efforts to promote widespread acceptance of metadata standards and practices. |
| delivery channel |
One or more testing centres, usually managed by a delivery provider (i.e. an organisation that provides candidate scheduling services, computers, proctoring services, and the space in which to conduct a computerised test). |
| delivery provider |
An organisation that provides candidate scheduling services, computers, proctoring services, and the space in which to conduct a computerised test. |
| derived score |
A score to which raw scores are converted by numerical transformation (e.g., conversion of raw scores to percentile ranks or standard scores). |
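The two conversions mentioned can be sketched in Python as below. The percentile rank convention used here (percentage of scores below, plus half of those tied) is one of several in use and is an assumption. The function names and the data are illustrative.

```python
from statistics import mean, pstdev


def standard_score(raw, scores):
    """z-score: distance of a raw score from the group mean in standard deviation units."""
    return (raw - mean(scores)) / pstdev(scores)


def percentile_rank(raw, scores):
    """Percentage of scores below `raw`, counting half of the scores tied with it."""
    below = sum(s < raw for s in scores)
    equal = sum(s == raw for s in scores)
    return 100 * (below + 0.5 * equal) / len(scores)


group = [10, 20, 20, 30, 40]
print(round(standard_score(40, group), 3))
print(percentile_rank(20, group))  # 40.0
```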
| distracter |
One of the choices, that a participant may select, that is not the correct answer. |
| diagnostic and intervention decisions |
Decisions based upon inferences derived from psychological test scores as part of an assessment of an individual. See also intervention. |
| diagnostic assessment |
Primarily used to identify needs and to determine prior knowledge of participants. Diagnostic assessments usually occur prior to a learning experience. |
| differential item functioning |
A statistical property of a test item in which different groups of participants have different rates of correct item response, conditional upon total test score or equivalent measure. |
| difficulty |
A statistical property, sometimes known as facility, indicating the difficulty level of a question on a scale from 0.0 to 1.0. It is calculated as the average score for the question divided by the maximum achievable score. A facility of 0.0 means that the question is very hard (no-one got it right) and 1.0 means that it is very easy (no-one got it wrong). A value of 0.5 is considered ideal. |
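The calculation described above is a single division. A minimal Python sketch, with an illustrative function name and made-up scores:

```python
def facility(question_scores, max_score):
    """Average score on the question divided by the maximum achievable score."""
    return sum(question_scores) / (len(question_scores) * max_score)


# Four participants on a one-mark question: three answered correctly.
print(facility([1, 0, 1, 1], max_score=1))  # 0.75
# Partial credit on a two-mark question.
print(facility([2, 1, 0, 1], max_score=2))  # 0.5
```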
| discipline |
A formal, published process for the enforcement of standards governing the professional behaviour (i.e., ethics) of certificants. |
| discriminant evidence |
Evidence based on the relationship between test scores and measures of different constructs. |
| discrimination |
Discrimination refers to the formula used for calculating the potential of a question to distinguish between stronger and weaker students. It is the statistical correlation between the question score and the test score, ranging from -1.0 to +1.0. A high correlation (close to +1.0) means that the question is measuring the same thing as the test. A low correlation means that there is little relationship between participants getting the question right and getting a good score in the test. A negative correlation indicates that participants getting the question right generally get a bad overall test score. |
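A minimal Python sketch of this correlation, written out with the Pearson product-moment formula so that it has no dependencies. The function name and the sample scores are illustrative. Whether the question's own score should first be removed from the test total is a refinement the definition above does not address.

```python
from math import sqrt


def discrimination(question_scores, test_scores):
    """Pearson correlation between scores on one question and total test scores."""
    n = len(question_scores)
    mean_q = sum(question_scores) / n
    mean_t = sum(test_scores) / n
    cov = sum((q - mean_q) * (t - mean_t)
              for q, t in zip(question_scores, test_scores))
    ss_q = sum((q - mean_q) ** 2 for q in question_scores)
    ss_t = sum((t - mean_t) ** 2 for t in test_scores)
    return cov / sqrt(ss_q * ss_t)


totals = [9, 7, 5, 3]
print(round(discrimination([1, 1, 0, 0], totals), 3))  # stronger participants got it right
print(round(discrimination([0, 0, 1, 1], totals), 3))  # weaker participants got it right
```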
| documentation |
The body of literature (e.g., tests manuals, manual supplements, research reports, publications, user's guides, etc.) made available by publishers and test authors to support test use. |
| domain-referenced test |
A test that allows users to estimate the amount of a specified content domain that an individual has learned. Domains may be based on sets of instructional objectives, for example. See also criterion-referenced tests and content-related evidence of validity. |
| domain sampling |
The process of selecting test items to represent a specified universe of performance. |
| drag-and-drop question |
A response style where the participant indicates their selection by using a mouse or pointing device to drag and drop graphic elements that illustrate their choice(s). |
|
|
| E |
| EdNA |
Education Network Australia (EdNA) aims to promote the Internet as a supporting tool for computer-based learning among the Australian educational community, from students to content providers. |
| eligibility requirements |
Published criteria, often benchmarks for education, training and experience, with which applicants must demonstrate compliance in order to qualify for the certification. |
| empirical keying |
The strategy of using empirical relationships between individual test items and the criterion of interest as the basis for test scoring. |
| equated forms |
Two or more test forms constructed to the same explicit content and statistical specifications and administered under identical procedures (alternate forms); through statistical adjustments, the scores on the alternate forms have been placed on a common scale. |
| equating |
A statistical process used to convert scores on two or more alternate forms of an assessment instrument to a common score for purposes of comparability and equivalence. |
| equivalent forms |
See alternate forms. |
| error of measurement |
The difference between an observed score and the corresponding true score or proficiency. See also standard error of measurement and true score. |
| essay response |
A response style where the participant enters an essay in response to the stimulus. |
| essential element |
A statement that is directly related to the blue-print and specifies what a certification program must do to fulfill the requirement of the blue-print. |
| evaluation |
The process that assesses a person’s achievements (fulfillment of the requirements of the scheme) and/or the effectiveness of learning experiences. |
| examination |
A method or procedure to assess an individual's knowledge, skills and abilities. Such procedures may involve written or oral responses, or observation of the candidate performing tasks. |
| examiner |
A person deemed by the certifying agency to possess the relevant technical and personal qualifications to conduct an examination as part of the certification process. |
|
|
| F |
| facility |
A statistical property, sometimes known as difficulty, indicating the difficulty level of a question on a scale from 0.0 to 1.0. It is calculated as the average score for the question divided by the maximum achievable score. A facility of 0.0 means that the question is very hard (no-one got it right) and 1.0 means that it is very easy (no-one got it wrong). A value of 0.5 is considered ideal. |
| factor |
In measurement theory, a statistically derived, hypothetical dimension that accounts for part of the intercorrelations among tests. Strictly, the term refers to a statistical dimension defined by a factor analysis, but it is also commonly used to denote the psychological construct associated with the dimension. Single-factor tests presumably assess only one construct; multi-factor tests measure two or more constructs. |
| factor analysis |
Any of several statistical methods of analysing the intercorrelations or covariances among variables by constructing hypothetical factors, which are fewer in number than the original variables. The analysis indicates how much of the variation in scores on each original measure can be accounted for by each of the hypothetical factors. |
| factorial structure |
The set of factors obtained in a factor analysis. |
| fairness |
The principle that all applicants and candidates will be treated in an equitable manner throughout the entire certification process. See Bias. |
| false negative |
In classification or selection, an error in which an individual is assessed or predicted not to meet the criteria for inclusion in a particular group but in truth does (or would) meet these criteria. See sensitivity and specificity. |
| false positive |
In classification or selection, an error in which an individual is assessed or predicted to meet the criteria for inclusion in a particular group but in truth does not (or would not) meet these criteria. See sensitivity and specificity. |
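The two error types, and the sensitivity and specificity indices referred to above, can be tallied as in the Python sketch below. It uses the standard definitions sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The function name and the data are hypothetical.

```python
def sensitivity_specificity(actual, predicted):
    """actual / predicted: parallel lists of booleans (True = meets the criteria)."""
    tp = sum(a and p for a, p in zip(actual, predicted))          # true positives
    fn = sum(a and not p for a, p in zip(actual, predicted))      # false negatives
    fp = sum(not a and p for a, p in zip(actual, predicted))      # false positives
    tn = sum(not a and not p for a, p in zip(actual, predicted))  # true negatives
    return tp / (tp + fn), tn / (tn + fp)


truth = [True, True, True, False, False, False, False, True]
assessed = [True, True, False, False, False, True, False, True]
sens, spec = sensitivity_specificity(truth, assessed)
print(sens, spec)  # 0.75 0.75
```

In this data set there is one false negative (third person) and one false positive (sixth person).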
| feedback |
Feedback is the term used when a stimulus is provided to a participant according to their responses within an assessment. Feedback is normally provided at an item, topic, and/or assessment level. |
| field test |
A test administration used to check the adequacy of testing procedures, generally including test administration, test responding, test scoring, and test reporting. A field test is generally more extensive than a pilot test. See pilot test. |
| fill-in-the-blanks |
A response style where the participant completes a phrase by entering a word, words or a number. |
| flag |
An indicator attached to a test score, a test item, or other entity to indicate a special status. A flagged test score generally signifies a score obtained in a modified, non-standard test administration. A flagged test item signifies an item with undesirable characteristics, such as excessive differential item functioning. |
| focus group |
An evaluation activity consisting of a semi-structured discussion with a group of people. Focus groups made up of stakeholders are used to inform test designers of the significance of each topic to be covered within a certification exam. |
| formative assessment |
An assessment that has a primary objective of providing prescriptive feedback (item, topic and/or assessment level) to a participant. |
| frequency analysis |
Frequency analysis measures the number of times a particular distracter, or combination of distracters, was selected by a group of participants. |
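For a single multiple-choice question, the count can be sketched in Python with `collections.Counter`. The response data, the answer key, and the function name are made up for illustration.

```python
from collections import Counter


def distracter_frequencies(responses, correct):
    """Count how often each incorrect choice was selected."""
    counts = Counter(responses)
    return {choice: n for choice, n in counts.items() if choice != correct}


selected = ["A", "C", "B", "C", "C", "D", "A", "C"]
freq = distracter_frequencies(selected, correct="C")
print(freq)  # {'A': 2, 'B': 1, 'D': 1}
```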
| functional equivalence |
The degree to which similar activities or behaviours have the same functions in different cultural or linguistic groups. |
|
|
| G |
| gain score |
The difference between the score on a test and the score on an earlier administration of the same or an equivalent test. |
| GEM |
Project GEM, Gateway to Educational Materials, provides a unified framework for the publication and location of educational resources available through the Internet. This project was born in 1997 as a special project within ERIC Clearinghouse on Information & Technology. |
| generalisability coefficient |
An index formed as the ratio of (a) the sum of variances that are considered components of test score variance in the setting under study to (b) the foregoing sum plus the weighted sum of variances attributable to various error sources in this setting. Such indices, which arise from the application of generalisability theory, are typically interpreted in the same manner as reliability coefficients. |
| generalisability theory |
An extension of classical reliability theory and methodology in which analysis of variance is used to estimate variance components that indicate the magnitude of errors from specified sources. The analysis is used to evaluate the generalisability of scores beyond the specific sample of items, persons, and observational conditions that were studied. |
| grade equivalent score |
The school grade level for which a given score is the real or estimated median or mean. |
| graphical hotspot question |
A response style where the participant indicates their selection by using a mouse or pointing device on a graphic display. |
|
|
| H |
| high-stakes test |
A test whose results have important, direct consequences for the examinees, programs, or institutions tested. |
| holistic scoring |
A method of obtaining a score on a test, or a test item, that results from an overall judgment of performance using specified criteria. |
| hotspot response |
A response style where the participant indicates their selection by using a mouse or pointing device on a graphic display. |
|
|
| I |
| IEEE LTSC |
The Learning Technology Standards Committee (LTSC) of the IEEE covers practically all aspects of computer-based education. Its main objective is to develop technical standards, recommended practices and guidelines for software components, tools, technologies and design methods that facilitate the development, implementation, maintenance and interoperation of educational systems. |
| IMS |
The IMS is a member-funded global consortium that develops and promotes the use of specifications for online learning resources, systems, products, and services. |
| informed consent |
The written agreement of a person, or that person's legal custodian, for some procedure to be performed on or by the individual, such as taking a test. |
| intelligence test |
A psychological or educational test designed to measure intellectual processes in accord with some evidence-based theory of intelligence. |
| interested party(ies) |
The various individuals and groups with an interest in the quality, governance, and operation of a certification program, such as the public, employers, customers, clients, and third-party payers. See Stakeholders. |
| internal consistency coefficient |
An index of the reliability of test scores derived from the statistical interrelationships among item responses or among scores on separate parts of a test. |
| internal structure |
In test analysis, the factorial structure of item responses. (See factorial structure) |
| inter-rater agreement |
The consistency of rater judgments of the work or performance of people; sometimes referred to as inter-rater reliability, although the typical index of agreement does not reflect variation in the performance of participants from one sample or occasion to another. |
| intervention planning |
The activity of a practitioner that involves the development of a treatment protocol. |
| inventory |
A questionnaire or checklist, usually in the form of a self-report, that elicits information about an individual's personal opinions, interests, attitudes, preferences, personality characteristics, motivations, and typical reactions to situations and problems. |
| invigilator |
An individual who supervises a written examination/test to maintain a fair and consistent testing environment, but takes no part in the examination process. See Proctor. |
| ISO |
The International Organization for Standardization. |
| ISO/IEC |
The International Organization for Standardization and the International Electrotechnical Commission. |
| ISO/IEC JTC1 SC36 |
Subcommittee 36 of the first Joint Technical Committee of the International Organization for Standardization and the International Electrotechnical Commission (ISO/IEC JTC1 SC36) was launched in 1999 to cover all aspects of standardisation in the field of learning technologies. Its focus is on interoperability, not only at the technical level but also taking social and cultural issues into account. |
| item |
A general term referring to an individual problem or question, together with its choices, correct answer, scoring scenarios and outcomes, used within a test. |
| item analysis |
The process of studying the responses to questions delivered in the pilot study or prototype in order to select the best questions in terms of facility and discrimination. |
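As a rough sketch of the two statistics named above, facility can be taken as the proportion of correct responses and discrimination as the difference in that proportion between the highest- and lowest-scoring participants. The upper-lower index and the 27% group size used here are assumptions chosen for illustration, as are the function names and the data.

```python
def facility(item_scores):
    """Proportion of participants who answered the item correctly (scores are 0 or 1)."""
    return sum(item_scores) / len(item_scores)

def discrimination(item_scores, total_scores, fraction=0.27):
    """Upper-lower index: proportion correct in the top group minus the bottom group."""
    n = len(item_scores)
    k = max(1, round(n * fraction))  # size of each comparison group
    order = sorted(range(n), key=lambda i: total_scores[i])  # ascending by total score
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item_scores[i] for i in upper) / k
    p_lower = sum(item_scores[i] for i in lower) / k
    return p_upper - p_lower

# Made-up pilot data: one item's 0/1 scores and each participant's total test score.
item = [1, 1, 1, 0, 1, 0, 0, 0, 1, 0]
totals = [9, 8, 8, 7, 6, 5, 4, 3, 2, 1]
item_facility = facility(item)
item_discrimination = discrimination(item, totals)
```

An item with facility near 0 or 1, or with discrimination near zero or negative, would normally be a candidate for revision or removal.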
| item bank |
The system by which test items are maintained, stored and classified to facilitate item review, item development and examination assembly. |
| item characteristic curve |
A function relating the probability of a certain item response, usually a correct response, to the level of the attribute measured by the item. Also called item response curve. |
| item pool |
The aggregate of items from which a test or test scale's items are selected during test development, or the total set of items from which a particular test is selected for a participant during adaptive testing. |
| item prompt |
The question, stimulus, or instructions that direct the efforts of examinees in formulating their responses to a constructed-response exercise. |
| item response theory (IRT) |
A theory of test performance that emphasises the relationship between mean item score (P) and level (θ) of the ability or trait measured by the item. In the case of an item scored 0 (incorrect response) or 1 (correct response), the mean item score equals the proportion of correct responses. In most applications, the mathematical function relating P to θ is assumed to be a logistic function that closely resembles the cumulative normal distribution. |
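One common logistic form is the two-parameter logistic model, sketched below. The choice of this particular model, the parameter names `a` (discrimination) and `b` (difficulty), and the scaling constant of about 1.702 are assumptions made for illustration; the ability level is written `theta`.

```python
import math
from statistics import NormalDist

def icc_2pl(theta, a=1.0, b=0.0):
    """Probability of a correct response at ability theta (two-parameter logistic)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# With a scaling constant of about 1.702, the logistic curve stays close to
# the cumulative normal distribution over the whole ability range.
grid = [x / 10 for x in range(-40, 41)]
max_gap = max(abs(icc_2pl(t, a=1.702) - NormalDist().cdf(t)) for t in grid)
```

When `theta` equals `b` the probability of a correct response is 0.5, and larger values of `a` make the curve steeper around that point.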
| item type or format |
The structure of a problem that stimulates a candidate to respond within an assessment instrument (e.g. drag-and-drop, essay, fill-in-the-blank, hot-spot, multiple choice, multiple-response, numeric, open-ended, selection, short answer). |
|
|
| J |
| job analysis |
Any of several methods used singly or in combination to identify the tasks performed on a job or the knowledge, skills, abilities, and other personal characteristics relevant to job performance. |
| job task analysis |
See job analysis |
| JSR 168 |
Java Specification Request 168 (JSR 168) defines a standard interface that addresses the areas of content aggregation, personalisation, presentation, and security for portlets implemented for the Java platform and defines the contract between a portlet and its container. |
|
|
| K |
| key |
An element of an item that details the correct choice(s) to allow the item to be graded correctly. |
|
|
| L |
| learning outcomes |
The intended product from the process of learning. |
| licensing |
The issuing, usually by a government agency, of a credential indicating competence in some profession or client-centered activity. See also certification, credentialing. |
| Likert scale |
A method to prompt a respondent to express their opinion on a statement being presented. Likert scales are often 4-point scales (strongly agree, agree, disagree, strongly disagree) or 5-point scales (strongly agree, agree, neutral, disagree, strongly disagree), but sometimes offer as many as 10 potential choices. |
| lykert scale |
See Likert scale. |
| local evidence |
Evidence (usually related to reliability or validity) collected for a specific set of participants in a single institution or at a specific location. |
| local norms |
Norms by which test scores are referred to a specific, limited reference population (locale, organisation, or institution); local norms are not intended as representative of populations beyond that setting. |
| local setting |
The organisation or institution where a test is used. |
| low-stakes test |
A test whose results have only minor or indirect consequences for examinees, programs, or institutions tested. |
|
|
| M |
| mandated tests |
Tests that are administered because of a mandate from an external authority. |
| mastery test |
A test designed to indicate that the participant has or has not mastered some domain of knowledge or skill. Mastery is generally indicated by a passing score or cut score. See cut score. |
| matrix sampling |
A measurement format in which a large set of test items is organised into a number of relatively short item sets, each of which is randomly assigned to a subsample of participants, thereby avoiding the need to administer all items to all examinees in a program evaluation. |
| mean |
Arithmetic average of some scores, i.e. the sum of the scores divided by the number of scores. |
| measurement error variance |
That portion of the observed score variance attributable to one or more sources of measurement error; the square of the standard error of measurement. [2-Feldt] |
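The relationship to the standard error of measurement can be sketched numerically. The formula used here, SEM = SD × sqrt(1 − reliability), is the classical test theory estimate and is an assumption of this example, as are the function names and the values.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """Classical estimate: observed-score SD times sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

def measurement_error_variance(sd, reliability):
    """Error variance is the square of the standard error of measurement."""
    return standard_error_of_measurement(sd, reliability) ** 2

# Hypothetical test with an observed-score SD of 10 and reliability of 0.91.
sem = standard_error_of_measurement(10.0, 0.91)
error_variance = measurement_error_variance(10.0, 0.91)
```

Here the standard error of measurement is about 3 score points, so the error variance is about 9 of the 100 units of observed score variance.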
| moderator variable |
In regression analysis, a variable that serves to explain, at least in part, the correlation of two other variables. |
| multi-factor test |
An instrument that measures two or more constructs which are less than perfectly correlated. |
| multimedia |
Graphics, animation, audio, and video presented by a computer. |
| multiple choice |
A response style where the participant selects one choice from several to indicate their opinion as to the correct answer. |
| multiple response |
A response style where the participant selects more than one choice from several to indicate their opinion as to the correct answers. Multiple response questions have answer keys that describe various combinations of choices being right or wrong, with different possible outcomes for the different combinations of selections. |
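One way to picture such an answer key is as a lookup from each combination of choices to its outcome. The sketch below assumes the outcome is a numeric score and that any combination not listed scores zero; the function name, the key and the scores are hypothetical.

```python
def score_multiple_response(selected, answer_key, default=0):
    """Look up the outcome for the exact combination of choices selected."""
    return answer_key.get(frozenset(selected), default)

# Hypothetical key: full credit for A and C together, partial credit for either alone.
answer_key = {
    frozenset({"A", "C"}): 2,
    frozenset({"A"}): 1,
    frozenset({"C"}): 1,
}

full = score_multiple_response({"C", "A"}, answer_key)
partial = score_multiple_response(["A"], answer_key)
none = score_multiple_response({"B"}, answer_key)
```

Using `frozenset` makes the lookup independent of the order in which the participant selected the choices.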
|
|
| N |
| neuropsychodiagnosis |
Classification or description of inferred central nervous system status on the basis of neuropsychological assessment. |
| neuropsychological assessment |
A specialised type of psychological assessment designed to generate hypotheses and inferences about normal or pathological processes affecting the central nervous system and the resulting psychological and behavioural functions or dysfunctions. |
| normalised standard score |
A derived test score in which a numerical transformation has been chosen so that the score distribution closely approximates a normal distribution, for some specific population. |
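One possible transformation of this kind maps each raw score to its mid-rank proportion in the reference group and then to the corresponding point on a normal curve. The sketch below assumes that approach and a target scale with mean 50 and standard deviation 10; both are illustrative choices, not the only ones in use.

```python
from statistics import NormalDist

def normalised_scores(scores, mean=50.0, sd=10.0):
    """Map raw scores to a scale whose distribution approximates a normal curve."""
    n = len(scores)
    result = []
    for s in scores:
        below = sum(1 for x in scores if x < s)
        equal = sum(1 for x in scores if x == s)
        # Mid-rank proportion, always strictly between 0 and 1.
        p = (below + 0.5 * equal) / n
        z = NormalDist().inv_cdf(p)
        result.append(mean + sd * z)
    return result

# Made-up raw scores for a small reference group.
scaled = normalised_scores([1, 2, 3, 4, 5])
```

The middle raw score lands on the scale mean, and scores equally far above and below the median land symmetrically around it.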
| norm-referenced test interpretation |
A score interpretation based on a comparison of a participant's performance to the performance of other people in a specified reference population. |
| norms |
Statistics or tabular data that summarise the distribution of test performance for one or more specified groups, such as participants of various ages or grades. Norms are usually designed to represent some larger population, such as participants throughout the country. The group of examinees represented by the norms is referred to as the reference population. |
| numeric response |
A response style where the participant enters a number to indicate their choice. |
|
|
| O |
| objective testing |
A style of testing that measures the participant's knowledge of objective facts, the correct answers to which are known in advance. |
| OCR |
Optical Character Recognition. A method whereby a computer can recognise text and other marks that have been scanned. |
| OMR |
Optical Mark Reader. A device that scans paper forms (normally bubble sheets) and recognises the marks made on the form. |
| operational use |
The actual use of a test, after initial test development has been completed, to inform an interpretation, decision, or action based, in part, upon test scores. |
| outcome |
The event that will occur after a question or questions have been answered (e.g. the item is scored or feedback is provided). |
| outcome evaluation |
The activity of a practitioner that evaluates the efficacy of an intervention. |
|
|
| P |
| parallel forms |
See alternate forms. |
| participant |
A person who participates in a testing, assessment or survey process by answering questions. |
| participant mean |
The mean of the percentage scores achieved by candidates. Used to determine the validity of choices within an item by examining the choices selected by the higher- and/or lower-scoring candidates. |
| percentile |
The score on a test below which a given percentage of scores fall. |
| percentile rank |
The percentage of scores in a specified distribution that fall below the point at which a given score lies. |
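A direct reading of this definition, as a minimal sketch (the function name and the data are hypothetical, and tied scores are not counted as falling below):

```python
def percentile_rank(scores, score):
    """Percentage of scores in the distribution that fall strictly below the given score."""
    below = sum(1 for s in scores if s < score)
    return 100.0 * below / len(scores)

# Made-up distribution of ten test scores.
scores = [50, 60, 60, 70, 80, 90, 90, 90, 95, 100]
rank_of_90 = percentile_rank(scores, 90)
```

Five of the ten scores fall below 90, so a score of 90 has a percentile rank of 50 in this distribution.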
| performance assessments |
Product- and behaviour-based measurements based on settings designed to emulate real-life contexts or conditions in which specific knowledge or skills are actually applied. |
| performance domain |
The set of organised categories characterising a role or job under which tasks and associated knowledge and/or skills may be represented in the job analysis. |
| performance standard |
An objective definition of a certain level of performance in some domain in terms of a cut score or a range of scores on the score scale of a test measuring proficiency in that domain. Also, sometimes, a statement or description of a set of operational tasks exemplifying a level of performance associated with a more general content standard; the statement may be used to guide judgments about the location of a cut score on a score scale. |
| personality assessment |
A specialised type of psychological assessment relating to inferred normal or abnormal personality dimensions. |
| pilot test |
A test administered to a representative sample of participants solely for the purpose of determining the properties of the test. See field test. |
| policy |
The principles, plan or procedures established by an agency, institution, or government, generally with the intent of reaching a long-term goal. |
| portal |
A portal is a Web-based application that provides personalisation, single sign-on, and content aggregation from different sources and hosts the presentation layer of information systems. |
| portfolio assessments |
Systematic collections of educational or work products that are typically collected over time. |
| portlet |
A portlet is a Web component, usually managed by a container, that processes requests and generates dynamic content. Portals use portlets as pluggable user interface components to provide a presentation layer to information systems. |
| practice analysis |
See Job Analysis |
| practitioner |
In the context of psychological or neuropsychological assessment, an appropriately qualified interpreter of psychological test results and relevant collateral information. |
| precision of measurement |
A general term that refers to the reliability of a measure, or its sensitivity to measurement error. |
| predictor domain |
The construct domain of a construct used as a predictor. See construct domain. |
| pretest |
An administration of test items to a representative sample of participants solely for the purpose of determining the characteristics of the items. |
| proctor |
An individual who supervises a written examination/test to maintain a fair and consistent testing environment, but takes no part in the examination process. See Invigilator. |
| program evaluation |
The collection of systematic evidence to determine the extent to which a planned set of procedures obtains particular effects. |
| program norms |
See user norms. |
| PROMETEUS |
PROmoting Multimedia access to Education and Training in EUropean Society is a European initiative that brings together more than 400 institutions involved in computer-based education. |
| proposed interpretation |
A summary, or a set of illustrations, of the intended meaning of test scores, based on the construct(s) or concept(s) the test is designed to measure. |
| psychodiagnosis |
Formalisation or classification of functional mental health status based on psychological assessment. See neuropsychodiagnosis. |
| psychological assessment |
A comprehensive examination of psychological functioning that involves collecting, evaluating, and integrating test results and collateral information, and reporting information about an individual. Various methods may be used to acquire information during a psychological assessment: administering, scoring and interpreting tests and inventories; behavioural observation; client and third-party interviews; and analysis of prior educational, occupational, medical, and psychological records. |
| psychological testing |
Any procedure that involves the use of tests or inventories to assess particular psychological constructs of an individual. |
| psychometric |
Properties of the items and test such as the distribution of item difficulty and discrimination indices. |
| psychometric analysis |
The analysis of the items and test, such as the distribution of item difficulty and discrimination indices. |
| psychometrician |
A qualified person who analyses the psychometrics of a test or item. |
| public member |
A representative of the consumers of services provided by a defined certificant population, serving on the governing body of a certification program. |
| publish |
To release and make public, in hardcopy, electronic, or web-based formats, an assessment by publishing from the development system to the production or release system to make it widely available. |
|
|