Research article
Open Access
Published: 18 May 2007

Questionnaire discrimination: (re)-introducing coefficient δ

BMC Medical Research Methodology volume 7, Article number: 19 (2007)

Abstract

Background

Questionnaires are used routinely in clinical research to measure health status and quality of life. Questionnaire measurements are traditionally formally assessed by indices of reliability (the degree of measurement error) and validity (the extent to which the questionnaire measures what it is supposed to measure). Neither of these indices assesses the degree to which the questionnaire is able to discriminate between individuals, an important aspect of measurement. This paper introduces and extends an existing index of a questionnaire's ability to distinguish between individuals, that is, the questionnaire's discrimination.

Methods

Ferguson (1949) [1] derived an index of test discrimination, coefficient δ, for psychometric tests with dichotomous (correct/incorrect) items. In this paper a general form of the formula, δ_G, is derived for the more general class of questionnaires allowing for several response choices. The calculation and characteristics of δ_G are then demonstrated using questionnaire data (GHQ-12) from the 2003–2004 British Household Panel Survey (N = 14761). Coefficients for reliability (α) and discrimination (δ_G) are computed for two commonly used GHQ-12 coding methods: dichotomous coding and four-point Likert-type coding.

Results

Both scoring methods were reliable (α > 0.88). However, δ_G was substantially lower (0.73) for the dichotomous coding of the GHQ-12 than for the Likert-type method (δ_G = 0.96), indicating that the dichotomous coding, although reliable, failed to discriminate between individuals.

Conclusion

Coefficient δ_G was shown to have decisive utility in distinguishing between the cross-sectional discrimination of two equally reliable scoring methods. Ferguson's δ has been neglected in discussions of questionnaire design and performance, perhaps because it has not been implemented in software and was restricted to questionnaires with dichotomous items, which are rare in health care research. It is suggested that the more general formula introduced here is reported as δ_G, to avoid the implication that items are dichotomously coded.

Background

Questionnaire measures are routinely used in clinical research as measures of health status and quality of life [2] as well as other outcomes such as mood, stress, satisfaction and so on. The theory underlying the use of questionnaires as instruments of measurement is predominantly psychometric [3], and in keeping with this tradition the measurement properties of such questionnaires are reported as indices of reliability and validity. The reliability coefficient (for instance, Cronbach's α) estimates the degree of measurement error in the data, and hence the reproducibility of the measurements. Validity refers to the degree to which the questionnaire measures what is intended to be measured, and this is usually inferred from the degree to which the questionnaire agrees with other criteria.

Reliability and validity of measurement are of course paramount for good-quality data, but the degree to which a measurement instrument is capable of discerning differences between individuals is also a fundamental aspect of measurement theory [4]. For a questionnaire to be useful in assessing health status, it must be able to distinguish between individuals who differ in health status, and fail to distinguish between those who do not. A questionnaire that failed to distinguish real differences would be unlikely to be valid, and hence discrimination is a necessary but not sufficient condition of validity. The concept described here as 'discrimination' is also referred to as 'discriminatory ability' [3] but should not be confused with discriminant validity, item discrimination or discriminant functions.

A little-reported statistic, Ferguson's δ [1], quantifies the extent to which a measure can distinguish between cases. The statistic is conceptually simple. It is the ratio of observed differences to the theoretical maximum possible number of differences. When all possible scores occur with the same frequency, the scale is maximally discriminating and the index is 1.0. Ferguson demonstrated that a normal distribution of test scores would yield a coefficient of around 0.9, and a rectangular distribution, 1.0. Skewed distributions result in fewer discriminations and hence lower values of δ, reaching a minimum of 0.0 when no discriminations at all are made and every respondent has the same score.
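The ratio can be made concrete by counting pairs of respondents. The following is only a sketch of that counting argument; the symbol s (the number of distinct possible scores) is introduced here for illustration and is not the paper's notation. Two respondents are discriminated when their scores differ, so with n respondents, of whom f_i obtain score i:

$\text{observed discriminations} = \binom{n}{2} - \sum_{i} \binom{f_{i}}{2} = \frac{n^{2} - \sum_{i} f_{i}^{2}}{2}$

$\text{maximum discriminations} = \frac{n^{2} - n^{2}/s}{2} \quad \text{(all } s \text{ scores equally frequent, } f_{i} = n/s \text{)}$

The factor of one half cancels in the ratio, which is 1 when every score is equally frequent and 0 when all respondents share a single score.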

That this statistic has not been more widely used may be due to the limiting assumption that the measure comprises dichotomous items (e.g. incorrect/correct), with each response coded as 0 or 1. Most health status questionnaires use polytomous scales, typically 5- or 7-point Likert-type scales (e.g. Strongly disagree, Disagree, Not sure, Agree, Strongly agree). Researchers wishing to compute δ would therefore be forced to dichotomise item responses in order to compute the statistic.

As noted above, discrimination does not ensure validity: a high δ indicates that something is being discriminated, but not necessarily the thing intended. As Guilford [5] points out, any discussion of discrimination must take place within the more problematic context of validity. Interestingly, Guilford also suggested that the goals of maximising both discrimination and reliability may be incompatible. High reliability is sometimes claimed when the measure is constructed of highly-correlated items. As well as potentially limiting the validity of the resulting scale by excluding uncorrelated but valid items, this will tend to decrease discrimination. Depending on the circumstances it may be desirable to improve discrimination by increasing the heterogeneity of the questionnaire items at the cost of reliability (although reliability should not fall below an acceptable level). Hence discrimination should be a key consideration of questionnaires at the design stage.

The rest of this paper develops the original formula for δ to allow for the computation of the statistic for questionnaire measures with polytomous items. The resulting general formula applies equally well to dichotomous and polytomous scales. The utility of the statistic will then be demonstrated using data from the 12-item General Health Questionnaire (GHQ-12) [6], which may be coded as the sum of 12 dichotomous items (known as 0011 coding) or of 12 items with four response categories (known as 0123 coding).

Methods

Ferguson's formula for δ assumes that the test comprises one or more items, each with only two response categories: incorrect or correct. The items are therefore dichotomous and coded as 0 or 1, respectively. The definitional formula for δ is:

$δ = \frac{n^{2} - \sum_{i = 0}^{k} f_{i}^{2}}{n^{2} - \frac{n^{2}}{k + 1}}$

(1)

In which: n = sample size

f_i = frequency of score i

k = number of questionnaire items
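As an illustration of equation (1), the following minimal Python sketch computes δ from a list of summed scores. The function name `ferguson_delta` and the toy data are hypothetical, chosen for this example; this is not the author's R implementation.

```python
from collections import Counter

def ferguson_delta(total_scores, k):
    """Ferguson's delta for k dichotomous (0/1) items, using the definitional formula (1)."""
    n = len(total_scores)
    if n == 0 or k < 1:
        raise ValueError("need at least one respondent and one item")
    if any(s < 0 or s > k for s in total_scores):
        raise ValueError("summed scores must lie in 0..k")
    freqs = Counter(total_scores)  # f_i: number of respondents with summed score i
    sum_sq = sum(f * f for f in freqs.values())
    return (n**2 - sum_sq) / (n**2 - n**2 / (k + 1))

# Hypothetical data: 3 items, so the possible summed scores are 0..3.
uniform = [0, 1, 2, 3] * 5   # every score equally frequent
constant = [2] * 20          # every respondent has the same score
print(ferguson_delta(uniform, 3))   # 1.0
print(ferguson_delta(constant, 3))  # 0.0
```

The two toy samples reproduce the extremes of the coefficient: equal frequencies across all possible scores and a single shared score.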

This definitional formula has been further modified [5, 7]. Guilford simplifies it to a computational formula as follows [5]:

$δ = \frac{(k + 1) (n^{2} - \sum_{i = 0}^{k} f_{i}^{2})}{k n^{2}}$

(2)
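The step from the definitional formula (1) to the computational formula (2) is purely algebraic. As a brief check, factor n² out of the denominator of (1):

$n^{2} - \frac{n^{2}}{k + 1} = n^{2} \, \frac{(k + 1) - 1}{k + 1} = \frac{k n^{2}}{k + 1}$

Dividing the numerator of (1) by this quantity gives (2):

$δ = \frac{n^{2} - \sum_{i = 0}^{k} f_{i}^{2}}{k n^{2} / (k + 1)} = \frac{(k + 1) (n^{2} - \sum_{i = 0}^{k} f_{i}^{2})}{k n^{2}}$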

The simplification offered by Cliff [7] is not presented here due to notational differences between his paper and Ferguson's. Both modifications maintain the assumption that items are dichotomous.

Thus specified, δ ranges from zero to one. When δ = 0.0, the questionnaire has minimal discrimination, and this occurs when all respondents have the same scale score, that is, the questionnaire fails to discriminate any respondent from any other respondent. When δ = 1.0, the questionnaire has maximal discrimination since all possible scores occur with the same frequency.

As noted, all current formulae depend on the questionnaire items being dichotomous and coded as 0 or 1. This ensures that all summed scale scores fall within the range 0..k, and that the maximum number of different summed scores is k+1. Attempts to compute δ for polytomous item measures fail because the summed scale scores no longer fall within the range 0..k, and the length of the summed scale is no longer fixed at k+1. The summed scale range of polytomous item measures will vary according to the number of response categories as well as the number of items.
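A small numerical sketch shows one way the dichotomous formula breaks down. It assumes, for illustration only, 12 items coded 0..3, every summed score equally frequent, and the sum of squared frequencies taken over all observed scores rather than stopping at k. Under those assumptions the computational formula for dichotomous items returns a value above its supposed maximum of 1.

```python
from fractions import Fraction

k, m = 12, 4                    # 12 items, four categories coded 0..3
n_scores = k * (m - 1) + 1      # summed scores run 0..36, i.e. 37 possible values

# With equal frequencies the sum of f_i^2 equals n^2 / n_scores, so n cancels out
# of the dichotomous computational formula (k + 1)(n^2 - sum f_i^2) / (k n^2).
naive_delta = Fraction(k + 1, k) * (1 - Fraction(1, n_scores))
print(n_scores, naive_delta, float(naive_delta))
```

The result is 39/37, roughly 1.05, because the normalising constant still assumes only k+1 = 13 possible scores.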

A modified formula to take into account polytomous item measures is presented below. To distinguish the resulting statistic from the strictly dichotomous form, I propose appending the subscript G (δ_G, for Generalised δ). Hence when δ is cited, it may be assumed that the measure comprises either dichotomous or dichotomised items, and when δ_G is cited, it may be assumed that the measure comprises polytomous items. δ_G may be applied to dichotomous scales; the older δ may not be applied to polytomous scales.

If we consider a questionnaire scale comprising k items, each having m response categories coded 0..m-1, the possible range of scores is 0..k(m-1). For instance, a scale comprising 12 items with four responses per item would have a scale range of 0..36, hence:

$δ_{G} = \frac{n^{2} - \sum_{i = 0}^{k (m - 1)} f_{i}^{2}}{n^{2} - \frac{n^{2}}{1 + k (m - 1)}}$

(3)

Where: n = sample size

f_i = frequency of score i

k = number of questionnaire items

m = number of response categories per item (the length of the item response scale)

Modifying the simplified equation (2):

$δ_{G} = \frac{(1 + k (m - 1)) (n^{2} - \sum_{i = 0}^{k (m - 1)} f_{i}^{2})}{n^{2} k (m - 1)}$

(4)

Note that for dichotomous items, m = 2, so k(m-1) = k. Hence for dichotomous items δ_G = δ.
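The generalised coefficient can be sketched in a few lines of Python. This is an illustrative reading of equation (4), not the author's R function; the name `delta_g` and the toy data are assumptions made for this example. The final lines check numerically that the m = 2 case agrees with the dichotomous computational formula (2).

```python
from collections import Counter

def delta_g(total_scores, k, m):
    """Generalised delta (equation 4) for k items, each coded 0..m-1."""
    n = len(total_scores)
    top = k * (m - 1)  # maximum possible summed score
    if n == 0 or top < 1:
        raise ValueError("need respondents, at least one item and m >= 2")
    if any(s < 0 or s > top for s in total_scores):
        raise ValueError("summed scores must lie in 0..k(m-1)")
    sum_sq = sum(f * f for f in Counter(total_scores).values())
    return (1 + top) * (n**2 - sum_sq) / (n**2 * top)

# Hypothetical data: 2 items with 3 categories each, so summed scores are 0..4.
print(delta_g([0, 1, 2, 3, 4] * 4, k=2, m=3))  # 1.0 (all scores equally frequent)

# With m = 2 the value matches the dichotomous formula (k + 1)(n^2 - sum f^2) / (k n^2).
scores = [0, 0, 1, 2, 3, 3]
dichotomous = (3 + 1) * (6**2 - 10) / (3 * 6**2)  # the sum of f_i^2 is 4 + 1 + 1 + 4 = 10
print(abs(delta_g(scores, k=3, m=2) - dichotomous) < 1e-12)  # True
```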

The modified formula for δ_G has been implemented in the statistical software package R as function delta.g, with 95% confidence limits bootstrapped by resampling with replacement [see Additional file 1]. For those researchers without access to R, a simple spreadsheet is available to compute coefficient δ_G from frequency tables. This may be obtained from the author, and further implementations are being developed for other platforms.
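For readers who want to see the resampling idea in code, the following Python sketch bootstraps confidence limits for δ_G. It is not the author's `delta.g` function: the percentile method, the number of resamples, the function names and the simulated data are all assumptions made for this example, since the text specifies only resampling with replacement.

```python
import random
from collections import Counter

def delta_g(total_scores, k, m):
    """Generalised delta for k items, each coded 0..m-1 (equation 4)."""
    n = len(total_scores)
    top = k * (m - 1)
    sum_sq = sum(f * f for f in Counter(total_scores).values())
    return (1 + top) * (n**2 - sum_sq) / (n**2 * top)

def bootstrap_limits(total_scores, k, m, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap limits: resample respondents with replacement."""
    rng = random.Random(seed)
    n = len(total_scores)
    estimates = sorted(
        delta_g(rng.choices(total_scores, k=n), k, m) for _ in range(n_boot)
    )
    lower = estimates[int((1 - level) / 2 * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 + level) / 2 * n_boot))]
    return lower, upper

# Simulated summed scores for illustration only; these are NOT the BHPS data.
sim = random.Random(1)
scores = [sum(sim.randint(0, 3) for _ in range(12)) for _ in range(500)]
print(delta_g(scores, k=12, m=4), bootstrap_limits(scores, k=12, m=4))
```

Fixing the seed makes the limits reproducible, which is convenient when the interval is reported alongside the point estimate.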

Having derived the general formula for δ it should prove useful to demonstrate the calculation and properties of the coefficient. The 2004 British Household Panel Survey [8] sampled 14761 individuals from the general population (sampling details, protocol and data are available at the survey website [9]). As part of this survey respondents completed the 12-item General Health Questionnaire, a self-report measure of psychiatric morbidity. The data were obtained for an ongoing study of the measurement properties of the GHQ-12 in a general UK sample (usage ID: 21697) and are used here for demonstration purposes only.

The GHQ-12 comprises twelve statements (items) with four responses per item and may be scored dichotomously (0011) or polytomously (0123) [5]. From equation (3), for dichotomous scoring, k = 12 and m = 2 and for polytomous scoring, k = 12 and m = 4. There has been much debate over the relative benefits of these and other coding schemes, principally over the establishment of threshold values for clinical severity, but for the purposes of this discussion I will focus on the effect of scoring method on reliability and discrimination. To this end, the reliability of each scoring method was estimated using Cronbach's α and the discrimination by δ_G.
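The two coding schemes can be sketched as weight tables applied to the chosen response option. In this hypothetical Python example a response is stored as the index of the option selected (0 for the first option through 3 for the fourth); that storage format, the names `GHQ_CODINGS` and `ghq12_total`, and the example respondents are assumptions for illustration and are not taken from the survey files.

```python
# Weights applied to the four response options under each coding scheme.
GHQ_CODINGS = {
    "0011": (0, 0, 1, 1),  # dichotomous scoring, summed range 0..12
    "0123": (0, 1, 2, 3),  # four-point scoring, summed range 0..36
}

def ghq12_total(responses, coding):
    """Sum the coded values of 12 responses, each an option index 0..3."""
    if len(responses) != 12:
        raise ValueError("the GHQ-12 has exactly 12 items")
    weights = GHQ_CODINGS[coding]
    return sum(weights[r] for r in responses)

respondent = [0, 1, 1, 2, 0, 3, 1, 0, 2, 1, 0, 1]  # hypothetical answers
print(ghq12_total(respondent, "0011"))  # 3
print(ghq12_total(respondent, "0123"))  # 12

# Two hypothetical respondents who tie under 0011 but differ under 0123.
first_option, second_option = [0] * 12, [1] * 12
print(ghq12_total(first_option, "0011"), ghq12_total(second_option, "0011"))  # 0 0
print(ghq12_total(first_option, "0123"), ghq12_total(second_option, "0123"))  # 0 12
```

The last two lines show how dichotomous coding can collapse respondents with different answers onto the same summed score.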

Results

As can be seen in Figure 1, the distribution of the GHQ-12 score was greatly affected by the scoring method. Polytomous coding produced a slightly skewed distribution but one with clearly defined tails (skew = 1.3, SE = 0.02), with discrimination δ_G = 0.96 (actual value: 0.957; bootstrapped 95% CL: 0.956, 0.959) and reliability of α = 0.88. Dichotomous coding resulted in a highly-skewed distribution (skew = 1.86, SE = 0.02) with 54.2% of the sample scoring the scale minimum: this lack of discrimination was reflected in the value of δ_G = 0.73 (actual value: 0.731; bootstrapped 95% CL: 0.723, 0.739). Reliability was α = 0.89. The two scoring methods were highly correlated (r = 0.90, p < 0.001).

Discussion

The results demonstrate the utility of δ_G in distinguishing between discriminating and undiscriminating questionnaires. In terms of reliability the two scoring methods were indistinguishable, since Cronbach's alpha was 0.88 for polytomous scoring and 0.89 for dichotomous coding. We would conclude on this basis that the scales were equally reliable. The two methods yielded highly correlated scores (r = 0.9): this implies that the coding method did not greatly affect the validity of measurement since the two methods would be likely to correlate equally well with any external criterion.

Consideration of δ_G would, however, lead us to a different conclusion, since dichotomous coding produced a scale with a lower index of discrimination (δ_G = 0.73) than polytomous coding (δ_G = 0.96). Dichotomous coding substantially reduced the ability of the GHQ-12 to distinguish between individuals compared to the four-point coding. Both coding methods resulted in a skewed distribution, but the dichotomous coding resulted in more than half of the sample scoring the same (zero): in effect the questionnaire could not distinguish any difference between these cases. Hence, the discrimination of the questionnaire was compromised, and the degree to which it was compromised was quantified by δ_G.
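The cost of the pile-up at zero can be bounded with a short calculation. The Python sketch below is an illustration rather than part of the original analysis: it takes the reported 54.2% at the scale minimum and assumes the best case for the rest of the sample, namely that the remaining respondents are spread evenly over the other possible scores. The function name is hypothetical.

```python
def max_delta_given_floor(p_floor, k, m):
    """Largest delta_G attainable when a share p_floor of respondents sits on one
    score and the remainder is spread evenly over the other k(m-1) scores."""
    top = k * (m - 1)
    rest = 1.0 - p_floor
    # Sum of squared score proportions; equation (4) divided through by n^2.
    sum_sq_props = p_floor**2 + top * (rest / top) ** 2
    return (1 + top) * (1 - sum_sq_props) / top

print(round(max_delta_given_floor(0.542, k=12, m=2), 3))  # 0.746
```

Under these assumptions the dichotomous coding could not have exceeded roughly 0.75 however the remaining scores were distributed, which is consistent with the reported value of 0.731.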

Conclusion

This paper attempts to reintroduce coefficient δ as an index of questionnaire discrimination. The coefficient is non-parametric, making no assumptions of the data, and is conceptually simple, being the ratio of observed discriminations to the maximum possible number of discriminations. The general form δ_G is useful for the evaluation and design of the majority of questionnaire measures, that is, those comprising several items with the same number of response categories. It is simple to further modify the formula to take into account scales comprising items with different numbers of responses, such as the SF-36. The statistic may also be used for single-item measures.
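One plausible way to make that further modification, offered here only as a sketch and not as the author's formula, is to replace k(m-1) by the sum of (m_j - 1) over items, where m_j is the number of response categories of item j and each item is coded 0..m_j-1. The function name `delta_mixed` and the toy data below are hypothetical.

```python
from collections import Counter

def delta_mixed(total_scores, categories_per_item):
    """Sketch of delta for items with differing numbers of categories.
    Assumes item j is coded 0..m_j-1, so summed scores span 0..sum(m_j - 1)."""
    n = len(total_scores)
    top = sum(m_j - 1 for m_j in categories_per_item)  # maximum summed score
    if n == 0 or top < 1:
        raise ValueError("need respondents and at least one item with m_j >= 2")
    if any(s < 0 or s > top for s in total_scores):
        raise ValueError("summed scores must lie in 0..sum(m_j - 1)")
    sum_sq = sum(f * f for f in Counter(total_scores).values())
    return (1 + top) * (n**2 - sum_sq) / (n**2 * top)

# Three hypothetical items with 2, 3 and 5 categories: summed scores span 0..7.
print(delta_mixed(list(range(8)) * 3, [2, 3, 5]))  # 1.0

# A single five-category item is the special case of a one-element list.
print(delta_mixed([0, 1, 2, 3, 4] * 2, [5]))  # 1.0
```

When every item has the same number of categories m, the maximum summed score is again k(m-1), so this sketch reduces to the general form given in equation (4).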

It is hoped that researchers will now report and seek to maximise both coefficients of reliability and discrimination when evaluating and designing questionnaire measures. Consideration of the discrimination of a questionnaire should lead to an improvement in the quality of measurement: this should result in greater understanding of the characteristics of different questionnaires in different populations, and also allow questionnaires to be compared and selected on characteristics other than reliability.

The comparative neglect of Ferguson's δ and its lack of generality have resulted in an absence of studies to elucidate its sampling distribution and other characteristics, in particular its relationship to validity, reliability and effect size. Further studies of these characteristics will be forthcoming.

References

1. Ferguson GA: On the theory of test discrimination. Psychometrika. 1949, 14: 61-68. 10.1007/BF02290141.
2. Gummesson C, Atroshi I, Ekdahl C: Performance of health-status scales when used selectively or within multi-scale questionnaire. BMC Medical Research Methodology. 2003, 3: 3. 10.1186/1471-2288-3-3.
3. Kline P: The handbook of psychological testing. 2000, Routledge, London, 2nd edition.
4. Hand DJ: Statistics and the theory of measurement (with discussion). Journal of the Royal Statistical Society, Series A. 1996, 159: 445-492.
5. Guilford JP: Psychometric Methods. 1954, McGraw-Hill, New York.
6. Goldberg DP, Williams P: A User's Guide to the General Health Questionnaire. 1988, Windsor: NFER-Nelson.
7. Cliff N: A theory of consistency of ordering generalizable to tailored testing. Psychometrika. 1977, 42: 375-399. 10.1007/BF02293657.
8. Taylor MF, Brice J, Buck N, Prentice-Lane E: British Household Panel Survey User Manual Volume A: Introduction, Technical Report and Appendices. 2006, Colchester: University of Essex.
9. British Household Panel Survey (BHPS). [http://www.iser.essex.ac.uk/ulsc/bhps]

Pre-publication history

The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/7/19/prepub

Author information

Affiliations

King's College London, Department of Psychology (at Guy's), Institute of Psychiatry, London, UK

Department of Primary Care & Public Health, Brighton & Sussex Medical School, Brighton, UK

Brighton & Sussex University Hospitals NHS Trust, Royal Sussex County Hospital, Brighton, UK

Matthew Hankins

Corresponding author

Correspondence to Matthew Hankins.

Additional information

Competing interests

The author(s) declare that they have no competing interests.

Authors' contributions

MH is the sole author.

Electronic supplementary material

12874_2007_202_MOESM1_ESM.doc

Additional File 1: R code and examples. The file contains R code for computing coefficient delta and bootstrapped 95% confidence limits. (DOC 30 KB)

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Hankins, M. Questionnaire discrimination: (re)-introducing coefficient δ. BMC Med Res Methodol 7, 19 (2007). https://doi.org/10.1186/1471-2288-7-19

Received: 15 January 2007
Accepted: 18 May 2007
Published: 18 May 2007
DOI: https://doi.org/10.1186/1471-2288-7-19

Keywords

Response Category
Questionnaire Item
Questionnaire Measure
British Household Panel Survey
Item Discrimination

Source: https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/1471-2288-7-19
