Validity – Types, Examples and Guide

Validity

Definition:

Validity refers to the extent to which a concept, measure, or study accurately represents the meaning or reality it is intended to capture. It is a fundamental concept in research and assessment, concerned with the soundness and appropriateness of the conclusions, inferences, or interpretations made on the basis of the data or evidence collected.

Research Validity

Research validity refers to the degree to which a study accurately measures or reflects what it claims to measure. In other words, research validity concerns whether the conclusions drawn from a study are based on accurate, reliable and relevant data.

Validity is a concept used in logic and research methodology to assess the strength of an argument or the quality of a research study. It refers to the extent to which a conclusion or result is supported by evidence and reasoning.

How to Ensure Validity in Research

Ensuring validity in research involves several steps and considerations throughout the research process. Here are some key strategies to help maintain research validity:

Clearly Define Research Objectives and Questions

Start by clearly defining your research objectives and formulating specific research questions. This helps focus your study and ensures that you are addressing relevant and meaningful research topics.

Use appropriate research design

Select a research design that aligns with your research objectives and questions. Different types of studies, such as experimental, observational, qualitative, or quantitative, have specific strengths and limitations. Choose the design that best suits your research goals.

Use reliable and valid measurement instruments

If you are measuring variables or constructs, ensure that the measurement instruments you use are reliable and valid. This involves using established and well-tested tools or developing your own instruments through rigorous validation processes.

Ensure a representative sample

When selecting participants or subjects for your study, aim for a sample that is representative of the population you want to generalize to. Consider factors such as age, gender, socioeconomic status, and other relevant demographics to ensure your findings can be generalized appropriately.

Address potential confounding factors

Identify potential confounding variables or biases that could impact your results. Implement strategies such as randomization, matching, or statistical control to minimize the influence of confounding factors and increase internal validity.
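
As a small, hypothetical illustration of the randomization strategy mentioned above, the sketch below randomly assigns participants to a treatment or control group. The participant IDs and group sizes are placeholders, not part of any study described here.

```python
# Minimal sketch: simple random assignment of participants to conditions, one way to
# balance unmeasured confounders across groups. Participant IDs are placeholders.
import random

participants = [f"P{i:02d}" for i in range(1, 21)]
random.seed(7)  # fixed seed only so this illustration is reproducible
random.shuffle(participants)

half = len(participants) // 2
treatment_group = participants[:half]
control_group = participants[half:]

print("Treatment:", treatment_group)
print("Control:  ", control_group)
```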

Minimize measurement and response biases

Be aware of measurement biases and response biases that can occur during data collection. Use standardized protocols, clear instructions, and trained data collectors to minimize these biases. Employ techniques like blinding or double-blinding in experimental studies to reduce bias.

Conduct appropriate statistical analyses

Ensure that the statistical analyses you employ are appropriate for your research design and data type. Select statistical tests that are relevant to your research questions and use robust analytical techniques to draw accurate conclusions from your data.
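
As a minimal sketch of matching the analysis to the design and data type, the example below assumes a simple two-group comparison of a continuous outcome and switches between a parametric and a non-parametric test based on a normality check. The simulated scores and the 0.05 cut-off are illustrative assumptions.

```python
# Minimal sketch: matching the statistical test to the data. Assumes a two-group
# comparison of a continuous outcome; the simulated scores are placeholders.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
group_a = rng.normal(loc=52, scale=8, size=40)  # e.g., outcome under condition A
group_b = rng.normal(loc=48, scale=8, size=40)  # e.g., outcome under condition B

# Check the normality assumption before choosing a test.
normal_a = stats.shapiro(group_a).pvalue > 0.05
normal_b = stats.shapiro(group_b).pvalue > 0.05

if normal_a and normal_b:
    result = stats.ttest_ind(group_a, group_b)      # parametric: independent-samples t-test
else:
    result = stats.mannwhitneyu(group_a, group_b)   # non-parametric alternative

print(result)
```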

Consider external validity

While it may not always be possible to achieve high external validity, be mindful of the generalizability of your findings. Clearly describe your sample and study context to help readers understand the scope and limitations of your research.

Peer review and replication

Submit your research for peer review by experts in your field. Peer review helps identify potential flaws, biases, or methodological issues that can impact validity. Additionally, encourage replication studies by other researchers to validate your findings and enhance the overall reliability of the research.

Transparent reporting

Clearly and transparently report your research methods, procedures, data collection, and analysis techniques. Provide sufficient details for others to evaluate the validity of your study and replicate your work if needed.

Types of Validity

There are several types of validity that researchers consider when designing and evaluating studies. Here are some common types of validity:

Internal Validity

Internal validity relates to the degree to which a study accurately identifies causal relationships between variables. It addresses whether the observed effects can be attributed to the manipulated independent variable rather than confounding factors. Threats to internal validity include selection bias, history effects, maturation of participants, and instrumentation issues.

External Validity

External validity concerns the generalizability of research findings to the broader population or real-world settings. It assesses the extent to which the results can be applied to other individuals, contexts, or timeframes. Factors that can limit external validity include sample characteristics, research settings, and the specific conditions under which the study was conducted.

Construct Validity

Construct validity examines whether a study adequately measures the intended theoretical constructs or concepts. It focuses on the alignment between the operational definitions used in the study and the underlying theoretical constructs. Construct validity can be threatened by issues such as poor measurement tools, inadequate operational definitions, or a lack of clarity in the conceptual framework.

Content Validity

Content validity refers to the degree to which a measurement instrument or test adequately covers the entire range of the construct being measured. It assesses whether the items or questions included in the measurement tool represent the full scope of the construct. Content validity is often evaluated through expert judgment, reviewing the relevance and representativeness of the items.

Criterion Validity

Criterion validity determines the extent to which a measure or test is related to an external criterion or standard. It assesses whether the results obtained from a measurement instrument align with other established measures or outcomes. Criterion validity can be divided into two subtypes: concurrent validity, which examines the relationship between the measure and the criterion at the same time, and predictive validity, which investigates the measure’s ability to predict future outcomes.
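
To make this concrete, here is a minimal sketch of how concurrent and predictive validity are often checked with correlation coefficients. The score lists are made-up placeholders, not data from any study described here.

```python
# Minimal sketch: checking concurrent and predictive validity with correlation
# coefficients. All score lists are made-up placeholders for the same ten people.
from scipy import stats

new_measure     = [12, 18, 25, 31, 22, 15, 28, 35, 20, 27]
established_now = [14, 20, 24, 33, 21, 13, 30, 36, 19, 29]  # concurrent criterion
outcome_later   = [55, 61, 70, 83, 66, 52, 78, 88, 60, 74]  # predictive criterion

concurrent_r, concurrent_p = stats.pearsonr(new_measure, established_now)
predictive_r, predictive_p = stats.pearsonr(new_measure, outcome_later)

print(f"Concurrent validity: r = {concurrent_r:.2f} (p = {concurrent_p:.3f})")
print(f"Predictive validity: r = {predictive_r:.2f} (p = {predictive_p:.3f})")
```

What counts as an acceptably strong correlation depends on the field and the purpose of the measure.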

Face Validity

Face validity refers to the degree to which a measurement or test appears, on the surface, to measure what it intends to measure. It is a subjective assessment based on whether the items seem relevant and appropriate to the construct being measured. Face validity is often used as an initial evaluation before conducting more rigorous validity assessments.

Importance of Validity

Validity is crucial in research for several reasons:

  • Accurate Measurement: Validity ensures that the measurements or observations in a study accurately represent the intended constructs or variables. Without validity, researchers cannot be confident that their results truly reflect the phenomena they are studying. Validity allows researchers to draw accurate conclusions and make meaningful inferences based on their findings.
  • Credibility and Trustworthiness: Validity enhances the credibility and trustworthiness of research. When a study demonstrates high validity, it indicates that the researchers have taken appropriate measures to ensure the accuracy and integrity of their work. This strengthens the confidence of other researchers, peers, and the wider scientific community in the study’s results and conclusions.
  • Generalizability: Validity helps determine the extent to which research findings can be generalized beyond the specific sample and context of the study. By addressing external validity, researchers can assess whether their results can be applied to other populations, settings, or situations. This information is valuable for making informed decisions, implementing interventions, or developing policies based on research findings.
  • Sound Decision-Making: Validity supports informed decision-making in various fields, such as medicine, psychology, education, and social sciences. When validity is established, policymakers, practitioners, and professionals can rely on research findings to guide their actions and interventions. Validity ensures that decisions are based on accurate and trustworthy information, which can lead to better outcomes and more effective practices.
  • Avoiding Errors and Bias: Validity helps researchers identify and mitigate potential errors and biases in their studies. By addressing internal validity, researchers can minimize confounding factors and alternative explanations, ensuring that the observed effects are genuinely attributable to the manipulated variables. Validity assessments also highlight measurement errors or shortcomings, enabling researchers to improve their measurement tools and procedures.
  • Progress of Scientific Knowledge: Validity is essential for the advancement of scientific knowledge. Valid research contributes to the accumulation of reliable and valid evidence, which forms the foundation for building theories, developing models, and refining existing knowledge. Validity allows researchers to build upon previous findings, replicate studies, and establish a cumulative body of knowledge in various disciplines. Without validity, the scientific community would struggle to make meaningful progress and establish a solid understanding of the phenomena under investigation.
  • Ethical Considerations: Validity is closely linked to ethical considerations in research. Conducting valid research ensures that participants’ time, effort, and data are not wasted on flawed or invalid studies. It upholds the principle of respect for participants’ autonomy and promotes responsible research practices. Validity is also important when making claims or drawing conclusions that may have real-world implications, as misleading or invalid findings can have adverse effects on individuals, organizations, or society as a whole.

Examples of Validity

Here are some examples of validity in different contexts:

  • Example 1: All men are mortal. John is a man. Therefore, John is mortal. This argument is logically valid because the conclusion follows logically from the premises.
  • Example 2: If it is raining, then the ground is wet. The ground is wet. Therefore, it is raining. This argument is not logically valid because there could be other reasons for the ground being wet, such as watering the plants.
  • Example 1: In a study examining the relationship between caffeine consumption and alertness, the researchers use established measures of both variables, ensuring that they are accurately capturing the concepts they intend to measure. This demonstrates construct validity.
  • Example 2: A researcher develops a new questionnaire to measure anxiety levels. They administer the questionnaire to a group of participants and find that it correlates highly with other established anxiety measures. This indicates good construct validity for the new questionnaire.
  • Example 1: A study on the effects of a particular teaching method is conducted in a controlled laboratory setting. The findings of the study may lack external validity because the conditions in the lab may not accurately reflect real-world classroom settings.
  • Example 2: A research study on the effects of a new medication includes participants from diverse backgrounds and age groups, increasing the external validity of the findings to a broader population.
  • Example 1: In an experiment, a researcher manipulates the independent variable (e.g., a new drug) and controls for other variables to ensure that any observed effects on the dependent variable (e.g., symptom reduction) are indeed due to the manipulation. This establishes internal validity.
  • Example 2: A researcher conducts a study examining the relationship between exercise and mood by administering questionnaires to participants. However, the study lacks internal validity because it does not control for other potential factors that could influence mood, such as diet or stress levels.
  • Example 1: A teacher develops a new test to assess students’ knowledge of a particular subject. The items on the test appear to be relevant to the topic at hand and align with what one would expect to find on such a test. This suggests face validity, as the test appears to measure what it intends to measure.
  • Example 2: A company develops a new customer satisfaction survey. The questions included in the survey seem to address key aspects of the customer experience and capture the relevant information. This indicates face validity, as the survey seems appropriate for assessing customer satisfaction.
  • Example 1: A team of experts reviews a comprehensive curriculum for a high school biology course. They evaluate the curriculum to ensure that it covers all the essential topics and concepts necessary for students to gain a thorough understanding of biology. This demonstrates content validity, as the curriculum is representative of the domain it intends to cover.
  • Example 2: A researcher develops a questionnaire to assess career satisfaction. The questions in the questionnaire encompass various dimensions of job satisfaction, such as salary, work-life balance, and career growth. This indicates content validity, as the questionnaire adequately represents the different aspects of career satisfaction.
  • Example 1: A company wants to evaluate the effectiveness of a new employee selection test. They administer the test to a group of job applicants and later assess the job performance of those who were hired. If there is a strong correlation between the test scores and subsequent job performance, it suggests criterion validity, indicating that the test is predictive of job success.
  • Example 2: A researcher wants to determine if a new medical diagnostic tool accurately identifies a specific disease. They compare the results of the diagnostic tool with the gold standard diagnostic method and find a high level of agreement. This demonstrates criterion validity, indicating that the new tool is valid in accurately diagnosing the disease.

Where to Write About Validity in a Thesis

In a thesis, discussions related to validity are typically included in the methodology and results sections. Here are some specific places where you can address validity within your thesis:

Research Design and Methodology

In the methodology section, provide a clear and detailed description of the measures, instruments, or data collection methods used in your study. Discuss the steps taken to establish or assess the validity of these measures. Explain the rationale behind the selection of specific validity types relevant to your study, such as content validity, criterion validity, or construct validity. Discuss any modifications or adaptations made to existing measures and their potential impact on validity.

Measurement Procedures

In the methodology section, elaborate on the procedures implemented to ensure the validity of measurements. Describe how potential biases or confounding factors were addressed, controlled, or accounted for to enhance internal validity. Provide details on how you ensured that the measurement process accurately captures the intended constructs or variables of interest.

Data Collection

In the methodology section, discuss the steps taken to collect data and ensure data validity. Explain any measures implemented to minimize errors or biases during data collection, such as training of data collectors, standardized protocols, or quality control procedures. Address any potential limitations or threats to validity related to the data collection process.

Data Analysis and Results

In the results section, present the analysis and findings related to validity. Report any statistical tests, correlations, or other measures used to assess validity. Provide interpretations and explanations of the results obtained. Discuss the implications of the validity findings for the overall reliability and credibility of your study.

Limitations and Future Directions

In the discussion or conclusion section, reflect on the limitations of your study, including limitations related to validity. Acknowledge any potential threats or weaknesses to validity that you encountered during your research. Discuss how these limitations may have influenced the interpretation of your findings and suggest avenues for future research that could address these validity concerns.

Applications of Validity

Validity is applicable in various areas and contexts where research and measurement play a role. Here are some common applications of validity:

Psychological and Behavioral Research

Validity is crucial in psychology and behavioral research to ensure that measurement instruments accurately capture constructs such as personality traits, intelligence, attitudes, emotions, or psychological disorders. Validity assessments help researchers determine if their measures are truly measuring the intended psychological constructs and if the results can be generalized to broader populations or real-world settings.

Educational Assessment

Validity is essential in educational assessment to determine if tests, exams, or assessments accurately measure students’ knowledge, skills, or abilities. It ensures that the assessment aligns with the educational objectives and provides reliable information about student performance. Validity assessments help identify if the assessment is valid for all students, regardless of their demographic characteristics, language proficiency, or cultural background.

Program Evaluation

Validity plays a crucial role in program evaluation, where researchers assess the effectiveness and impact of interventions, policies, or programs. By establishing validity, evaluators can determine if the observed outcomes are genuinely attributable to the program being evaluated rather than extraneous factors. Validity assessments also help ensure that the evaluation findings are applicable to different populations, contexts, or timeframes.

Medical and Health Research

Validity is essential in medical and health research to ensure the accuracy and reliability of diagnostic tools, measurement instruments, and clinical assessments. Validity assessments help determine if a measurement accurately identifies the presence or absence of a medical condition, measures the effectiveness of a treatment, or predicts patient outcomes. Validity is crucial for establishing evidence-based medicine and informing medical decision-making.

Social Science Research

Validity is relevant in various social science disciplines, including sociology, anthropology, economics, and political science. Researchers use validity to ensure that their measures and methods accurately capture social phenomena, such as social attitudes, behaviors, social structures, or economic indicators. Validity assessments support the reliability and credibility of social science research findings.

Market Research and Surveys

Validity is important in market research and survey studies to ensure that the survey questions effectively measure consumer preferences, buying behaviors, or attitudes towards products or services. Validity assessments help researchers determine if the survey instrument is accurately capturing the desired information and if the results can be generalized to the target population.

Limitations of Validity

Here are some limitations of validity:

  • Construct Validity: Limitations of construct validity include the potential for measurement error, inadequate operational definitions of constructs, or the failure to capture all aspects of a complex construct.
  • Internal Validity: Limitations of internal validity may arise from confounding variables, selection bias, or the presence of extraneous factors that could influence the study outcomes, making it difficult to attribute causality accurately.
  • External Validity: Limitations of external validity can occur when the study sample does not represent the broader population, when the research setting differs significantly from real-world conditions, or when the study lacks ecological validity, i.e., the findings do not reflect real-world complexities.
  • Measurement Validity: Limitations of measurement validity can arise from measurement error, inadequately designed or flawed measurement scales, or limitations inherent in self-report measures, such as social desirability bias or recall bias.
  • Statistical Conclusion Validity: Limitations in statistical conclusion validity can occur due to sampling errors, inadequate sample sizes, or improper statistical analysis techniques, leading to incorrect conclusions or generalizations.
  • Temporal Validity: Limitations of temporal validity arise when the study results become outdated due to changes in the studied phenomena, interventions, or contextual factors.
  • Researcher Bias: Researcher bias can affect the validity of a study. Biases can emerge through the researcher’s subjective interpretation, influence of personal beliefs, or preconceived notions, leading to unintentional distortion of findings or failure to consider alternative explanations.
  • Ethical Validity: Limitations can arise if the study design or methods involve ethical concerns, such as the use of deceptive practices, inadequate informed consent, or potential harm to participants.

How to Determine the Validity and Reliability of an Instrument

By: Yue Li

Validity and reliability are two important factors to consider when developing and testing any instrument (e.g., content assessment test, questionnaire) for use in a study. Attention to these considerations helps to ensure the quality of your measurement and of the data collected for your study.

Understanding and Testing Validity

Validity refers to the degree to which an instrument accurately measures what it intends to measure. Three common types of validity for researchers and evaluators to consider are content, construct, and criterion validities.

  • Content validity indicates the extent to which items adequately measure or represent the content of the property or trait that the researcher wishes to measure. Subject-matter expert review is often a good first step in instrument development for assessing content validity in relation to the area or field you are studying.
  • Construct validity indicates the extent to which a measurement method accurately represents a construct (e.g., a latent variable or phenomenon that can’t be measured directly, such as a person’s attitude or belief) and produces an observation distinct from that produced by a measure of another construct. Common methods for assessing construct validity include, but are not limited to, factor analysis, correlation tests, and item response theory models (including the Rasch model); a factor-analysis sketch follows this list.
  • Criterion-related validity indicates the extent to which the instrument’s scores correlate with an external criterion (i.e., usually another measurement from a different instrument), either at present (concurrent validity) or in the future (predictive validity). A common measure of this type of validity is the correlation coefficient between the two measures.
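
Below is a minimal, hypothetical sketch of one of the construct-validity checks named above: an exploratory factor analysis run on simulated questionnaire items. The item structure and the use of scikit-learn's FactorAnalysis are assumptions made for illustration, not part of the original guidance.

```python
# Minimal sketch: exploratory factor analysis as one construct-validity check.
# Simulates a wide-format item matrix (rows = respondents, columns = items) where
# items 0-2 are written to tap one construct and items 3-5 another.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n = 300
trait_1 = rng.normal(size=n)  # latent construct 1
trait_2 = rng.normal(size=n)  # latent construct 2

def noise():
    return rng.normal(scale=0.5, size=n)

items = np.column_stack([
    trait_1 + noise(), trait_1 + noise(), trait_1 + noise(),  # intended for construct 1
    trait_2 + noise(), trait_2 + noise(), trait_2 + noise(),  # intended for construct 2
])

fa = FactorAnalysis(n_components=2, random_state=0).fit(items)
# Rows = items, columns = factors; items written for the same construct should
# load heavily on the same factor.
print(np.round(fa.components_.T, 2))
```

If items written for the same construct load on the same factor, that pattern is one piece of evidence for construct validity.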

Oftentimes, when developing, modifying, and interpreting the validity of a given instrument, researchers and evaluators test for evidence of several different forms of validity collectively, rather than viewing or testing each type individually (e.g., see Samuel Messick’s work regarding validity).

Understanding and Testing Reliability

Reliability refers to the degree to which an instrument yields consistent results. Common measures of reliability include internal consistency, test-retest, and inter-rater reliabilities.

  • Internal consistency reliability looks at the consistency between the scores of individual items on an instrument and the scores of the set of items, or subscale, to which they belong; a subscale typically consists of several items that measure a single construct. Cronbach’s alpha is one of the most common methods for checking internal consistency reliability. Group variability, score reliability, number of items, sample size, and the difficulty level of the instrument can also affect the Cronbach’s alpha value.
  • Test-retest reliability measures the correlation between scores from one administration of an instrument to another, usually within an interval of two to three weeks. Unlike pre-post tests, no treatment occurs between the first and second administrations of the instrument when assessing test-retest reliability. A similar type of reliability, called alternate forms, involves using slightly different forms or versions of an instrument to see whether the different versions yield consistent results.
  • Inter-rater reliability checks the degree of agreement among raters (i.e., those completing items on an instrument). It matters whenever more than one rater is involved, for example when several people conduct classroom observations with an observation protocol or score an open-ended test using a rubric or other standard protocol. Kappa statistics, correlation coefficients, and the intra-class correlation coefficient (ICC) are among the most commonly reported measures of inter-rater reliability (see the sketch after this list).
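
Below is a minimal sketch of two of the reliability checks named in the list above: a test-retest correlation and Cohen's kappa for inter-rater agreement. The score and rating arrays are illustrative placeholders.

```python
# Minimal sketch: two reliability checks on made-up data. time_1/time_2 are total
# scores from two administrations of the same instrument (test-retest); rater_a/
# rater_b are category codes two observers assigned to the same responses.
from scipy import stats
from sklearn.metrics import cohen_kappa_score

# Test-retest reliability: correlation between the two administrations.
time_1 = [22, 30, 27, 35, 18, 26, 31, 24]
time_2 = [24, 29, 28, 34, 20, 25, 33, 23]
r, p = stats.pearsonr(time_1, time_2)
print(f"Test-retest r = {r:.2f}")

# Inter-rater reliability: agreement between two raters beyond chance.
rater_a = [1, 0, 2, 2, 1, 0, 1, 2]
rater_b = [1, 0, 2, 1, 1, 0, 1, 2]
print(f"Cohen's kappa = {cohen_kappa_score(rater_a, rater_b):.2f}")
```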

Developing a valid and reliable instrument usually requires multiple iterations of piloting and testing, which can be resource intensive. Therefore, when available, I suggest using already established valid and reliable instruments, such as those published in peer-reviewed journal articles. However, even when using these instruments, you should re-check validity and reliability, using the methods of your study and your own participants’ data, before running additional statistical analyses. This process will confirm that the instrument performs as intended in your study and with the population you are studying, even if these are not identical to the purpose and population for which the instrument was initially developed. Below are a few additional, useful readings to further inform your understanding of validity and reliability.

Resources for Understanding and Testing Validity and Reliability

  • American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1985). Standards for educational and psychological testing. Washington, DC: Authors.
  • Bond, T. G., & Fox, C. M. (2001). Applying the Rasch model: Fundamental measurement in the human sciences. Mahwah, NJ: Lawrence Erlbaum.
  • Cronbach, L. (1990). Essentials of psychological testing. New York, NY: Harper & Row.
  • Carmines, E., & Zeller, R. (1979). Reliability and validity assessment. Beverly Hills, CA: Sage Publications.
  • Messick, S. (1987). Validity. ETS Research Report Series, 1987: i–208. doi: 10.1002/j.2330-8516.1987.tb00244.x
  • Liu, X. (2010). Using and developing measurement instruments in science education: A Rasch modeling approach. Charlotte, NC: Information Age.

Grad Coach

Validity & Reliability In Research

A Plain-Language Explanation (With Examples)

By: Derek Jansen (MBA) | Expert Reviewer: Kerryn Warren (PhD) | September 2023

Validity and reliability are two related but distinctly different concepts within research. Understanding what they are and how to achieve them is critically important to any research project. In this post, we’ll unpack these two concepts as simply as possible.

This post is based on our popular online course, Research Methodology Bootcamp, in which we unpack the basics of methodology using straightforward language and plenty of examples.

Overview: Validity & Reliability

  • The big picture
  • Validity 101
  • Reliability 101 
  • Key takeaways

First, The Basics…

First, let’s start with a big-picture view and then we can zoom in to the finer details.

Validity and reliability are two incredibly important concepts in research, especially within the social sciences. Both validity and reliability have to do with the measurement of variables and/or constructs – for example, job satisfaction, intelligence, productivity, etc. When undertaking research, you’ll often want to measure these types of constructs and variables and, at the simplest level, validity and reliability are about ensuring the quality and accuracy of those measurements .

As you can probably imagine, if your measurements aren’t accurate or there are quality issues at play when you’re collecting your data, your entire study will be at risk. Therefore, validity and reliability are very important concepts to understand (and to get right). So, let’s unpack each of them.


What Is Validity?

In simple terms, validity (also called “construct validity”) is all about whether a research instrument accurately measures what it’s supposed to measure.

For example, let’s say you have a set of Likert scales that are supposed to quantify someone’s level of overall job satisfaction. If this set of scales focused purely on one dimension of job satisfaction, say pay satisfaction, it would not be a valid measurement, as it captures only one aspect of a multidimensional construct; pay satisfaction alone is just one contributing factor toward overall job satisfaction.


Oftentimes in quantitative studies, the way in which the researcher or survey designer interprets a question or statement can differ from how the study participants interpret it. Given that respondents don’t have the opportunity to ask clarifying questions when taking a survey, it’s easy for these sorts of misunderstandings to crop up. Naturally, if the respondents are interpreting the question in the wrong way, the data they provide will be pretty useless. Therefore, ensuring that a study’s measurement instruments are valid – in other words, that they are measuring what they intend to measure – is incredibly important.

There are various types of validity and we’re not going to go down that rabbit hole in this post, but it’s worth quickly highlighting the importance of making sure that your research instrument is tightly aligned with the theoretical construct you’re trying to measure. In other words, you need to pay careful attention to how the key theories within your study define the thing you’re trying to measure – and then make sure that your survey presents it in the same way.

For example, sticking with the “job satisfaction” construct we looked at earlier, you’d need to clearly define what you mean by job satisfaction within your study (and this definition would of course need to be underpinned by the relevant theory). You’d then need to make sure that your chosen definition is reflected in the types of questions or scales you’re using in your survey. Simply put, you need to make sure that your survey respondents are perceiving your key constructs in the same way you are. Or, even if they’re not, that your measurement instrument is capturing the necessary information that reflects your definition of the construct at hand.



What Is Reliability?

As with validity, reliability is an attribute of a measurement instrument – for example, a survey, a weight scale or even a blood pressure monitor. But while validity is concerned with whether the instrument is measuring the “thing” it’s supposed to be measuring, reliability is concerned with consistency and stability. In other words, reliability reflects the degree to which a measurement instrument produces consistent results when applied repeatedly to the same phenomenon, under the same conditions.

As you can probably imagine, a measurement instrument that achieves a high level of consistency is naturally more dependable (or reliable) than one that doesn’t – in other words, it can be trusted to provide consistent measurements. And that, of course, is what you want when undertaking empirical research. If you think about it within a more domestic context, just imagine if you found that your bathroom scale gave you a different number every time you hopped on and off of it – you wouldn’t feel too confident in its ability to measure the variable that is your body weight.

It’s worth mentioning that reliability also extends to the person using the measurement instrument. For example, if two researchers use the same instrument (let’s say a measuring tape) and they get different measurements, there’s likely an issue in terms of how one (or both) of them are using the measuring tape. So, when you think about reliability, consider both the instrument and the researcher as part of the equation.

As with validity, there are various types of reliability and various tests that can be used to assess the reliability of an instrument. A popular one that you’ll likely come across for survey instruments is Cronbach’s alpha, which is a statistical measure that quantifies the degree to which items within an instrument (for example, a set of Likert scales) measure the same underlying construct. In other words, Cronbach’s alpha indicates how closely related the items are and whether they consistently capture the same concept.
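
To show what Cronbach's alpha is actually computing, here is a minimal sketch that implements the standard formula directly; the response matrix is a made-up placeholder, not data from this post.

```python
# Minimal sketch: Cronbach's alpha computed directly from its standard formula.
# Assumes a wide-format matrix (rows = respondents, columns = items on one scale);
# the response matrix below is a made-up placeholder.
import numpy as np

def cronbach_alpha(item_scores):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)"""
    item_scores = np.asarray(item_scores, dtype=float)
    k = item_scores.shape[1]
    item_variances = item_scores.var(axis=0, ddof=1)
    total_variance = item_scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

responses = np.array([
    [4, 5, 4, 4],
    [2, 1, 2, 2],
    [5, 5, 4, 5],
    [3, 3, 2, 3],
    [4, 4, 5, 4],
])
print(f"Cronbach's alpha = {cronbach_alpha(responses):.2f}")
```

Values closer to 1 indicate greater internal consistency; many fields treat roughly 0.7 or higher as acceptable, though conventions vary.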


Recap: Key Takeaways

Alright, let’s quickly recap to cement your understanding of validity and reliability:

  • Validity is concerned with whether an instrument (e.g., a set of Likert scales) is measuring what it’s supposed to measure
  • Reliability is concerned with whether that measurement is consistent and stable when measuring the same phenomenon under the same conditions.

In short, validity and reliability are both essential to ensuring that your data collection efforts deliver high-quality, accurate data that help you answer your research questions. So, be sure to always pay careful attention to the validity and reliability of your measurement instruments when collecting and analysing data. As the adage goes, “rubbish in, rubbish out” – make sure that your data inputs are rock-solid.


Validity In Psychology Research: Types & Examples

By Saul Mcleod, PhD, and Olivia Guy-Evans, MSc

In psychology research, validity refers to the extent to which a test or measurement tool accurately measures what it’s intended to measure. It ensures that the research findings are genuine and not due to extraneous factors.

Validity can be categorized into different types based on internal and external validity.

The concept of validity was formulated by Kelley (1927, p. 14), who stated that a test is valid if it measures what it claims to measure. For example, a test of intelligence should measure intelligence and not something else (such as memory).

Internal and External Validity In Research

Internal validity refers to whether the effects observed in a study are due to the manipulation of the independent variable and not some other confounding factor.

In other words, there is a causal relationship between the independent and dependent variables.

Internal validity can be improved by controlling extraneous variables, using standardized instructions, counterbalancing, and eliminating demand characteristics and investigator effects.

External validity refers to the extent to which the results of a study can be generalized to other settings (ecological validity), other people (population validity), and over time (historical validity).

External validity can be improved by setting experiments more naturally and using random sampling to select participants.

Types of Validity In Psychology

Two main categories of validity are used to assess the validity of the test (i.e., questionnaire, interview, IQ test, etc.): Content and criterion.

  • Content validity refers to the extent to which a test or measurement represents all aspects of the intended content domain. It assesses whether the test items adequately cover the topic or concept.
  • Criterion validity assesses the performance of a test based on its correlation with a known external criterion or outcome. It can be further divided into concurrent (measured at the same time) and predictive (measuring future performance) validity.

Table: the different types of validity

Face Validity

Face validity is simply whether the test appears (at face value) to measure what it claims to. This is the least sophisticated measure of content-related validity, and is a superficial and subjective assessment based on appearance.

Tests wherein the purpose is clear, even to naïve respondents, are said to have high face validity. Accordingly, tests wherein the purpose is unclear have low face validity (Nevo, 1985).

A direct measurement of face validity is obtained by asking people to rate the validity of a test as it appears to them. This rater could use a Likert scale to assess face validity.

For example:

  • The test is extremely suitable for a given purpose
  • The test is very suitable for that purpose
  • The test is adequate
  • The test is inadequate
  • The test is irrelevant and, therefore, unsuitable

It is important to select suitable people to rate a test (e.g., questionnaire, interview, IQ test, etc.). For example, individuals who actually take the test would be well placed to judge its face validity.

Also, people who work with the test could offer their opinion (e.g., employers, university administrators). Finally, the researcher could use members of the general public with an interest in the test (e.g., parents of testees, politicians, teachers, etc.).

The face validity of a test can be considered a robust construct only if a reasonable level of agreement exists among raters.

It should be noted that the term face validity should be avoided when the rating is done by an “expert,” as content validity is more appropriate.

Having face validity does not mean that a test really measures what the researcher intends to measure, only that, in the judgment of raters, it appears to do so. Consequently, it is a crude and basic measure of validity.

A test item such as “ I have recently thought of killing myself ” has obvious face validity as an item measuring suicidal cognitions and may be useful when measuring symptoms of depression.

However, the implication of items on tests with clear face validity is that they are more vulnerable to social desirability bias. Individuals may manipulate their responses to deny or hide problems or exaggerate behaviors to present a positive image of themselves.

It is possible for a test item to lack face validity but still have general validity and measure what it claims to measure. This is good because it reduces demand characteristics and makes it harder for respondents to manipulate their answers.

For example, the test item “ I believe in the second coming of Christ ” would lack face validity as a measure of depression (as the purpose of the item is unclear).

This item appeared on the first version of The Minnesota Multiphasic Personality Inventory (MMPI) and loaded on the depression scale.

Because most of the original normative sample of the MMPI were good Christians, only a depressed Christian would think Christ is not coming back. Thus, for this particular religious sample, the item does have general validity but not face validity.

Construct Validity

Construct validity assesses how well a test or measure represents and captures an abstract theoretical concept, known as a construct. It indicates the degree to which the test accurately reflects the construct it intends to measure, often evaluated through relationships with other variables and measures theoretically connected to the construct.

Construct validity was introduced by Cronbach and Meehl (1955). It refers to the extent to which a test captures a specific theoretical construct or trait, and it overlaps with some of the other aspects of validity.

Construct validity does not concern the simple, factual question of whether a test measures an attribute.

Instead, it is about the complex question of whether test score interpretations are consistent with a nomological network involving theoretical and observational terms (Cronbach & Meehl, 1955).

To test for construct validity, it must be demonstrated that the phenomenon being measured actually exists. So, the construct validity of a test for intelligence, for example, depends on a model or theory of intelligence.

Construct validity entails demonstrating the power of such a construct to explain a network of research findings and to predict further relationships.

The more evidence a researcher can demonstrate for a test’s construct validity, the better. However, there is no single method of determining the construct validity of a test.

Instead, different methods and approaches are combined to present the overall construct validity of a test. For example, factor analysis and correlational methods can be used.

Convergent validity

Convergent validity is a subtype of construct validity. It assesses the degree to which two measures that theoretically should be related are related.

It demonstrates that measures of similar constructs are highly correlated. It helps confirm that a test accurately measures the intended construct by showing its alignment with other tests designed to measure the same or similar constructs.

For example, suppose there are two different scales used to measure self-esteem:

Scale A and Scale B. If both scales effectively measure self-esteem, then individuals who score high on Scale A should also score high on Scale B, and those who score low on Scale A should score similarly low on Scale B.

If the scores from these two scales show a strong positive correlation, then this provides evidence for convergent validity because it indicates that both scales seem to measure the same underlying construct of self-esteem.
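
A minimal sketch of that Scale A / Scale B check, with made-up scores standing in for real respondents:

```python
# Minimal sketch: convergent validity as the correlation between Scale A and
# Scale B for the same respondents. The scores are made-up placeholders.
from scipy import stats

scale_a = [31, 24, 40, 28, 35, 22, 38, 30]
scale_b = [29, 25, 39, 26, 36, 20, 40, 31]

r, p = stats.pearsonr(scale_a, scale_b)
print(f"Convergent validity: r = {r:.2f} (p = {p:.3f})")
# A strong positive correlation is consistent with both scales measuring the
# same underlying construct.
```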

Concurrent Validity (i.e., occurring at the same time)

Concurrent validity evaluates how well a test’s results correlate with the results of a previously established and accepted measure, when both are administered at the same time.

It helps in determining whether a new measure is a good reflection of an established one without waiting to observe outcomes in the future.

If the new test is validated by comparison with a currently existing criterion, we have concurrent validity.

Very often, a new IQ or personality test might be compared with an older but similar test known to have good validity already.

Predictive Validity

Predictive validity assesses how well a test predicts a criterion that will occur in the future. It measures the test’s ability to foresee the performance of an individual on a related criterion measured at a later point in time. It gauges the test’s effectiveness in predicting subsequent real-world outcomes or results.

For example, a prediction may be made on the basis of a new intelligence test that high scorers at age 12 will be more likely to obtain university degrees several years later. If the prediction is borne out, then the test has predictive validity.

Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281-302.

Hathaway, S. R., & McKinley, J. C. (1943). Manual for the Minnesota Multiphasic Personality Inventory. New York: Psychological Corporation.

Kelley, T. L. (1927). Interpretation of educational measurements. New York: Macmillan.

Nevo, B. (1985). Face validity revisited. Journal of Educational Measurement, 22(4), 287-293.



17.4.1 Validity of instruments

Validity has to do with whether the instrument is measuring what it is intended to measure. Empirical evidence that patient-reported outcomes (PROs) measure the domains of interest allows strong inferences regarding validity. To provide such evidence, investigators have borrowed validation strategies from psychologists, who for many years have struggled with determining whether questionnaires assessing intelligence and attitudes really measure what is intended.

Validation strategies include:

content-related: evidence that the items and domains of an instrument are appropriate and comprehensive relative to its intended measurement concept(s), population and use;

construct-related: evidence that relationships among items, domains, and concepts conform to a priori hypotheses concerning logical relationships that should exist with other measures or characteristics of patients and patient groups; and

criterion-related (for a PRO instrument used as diagnostic tool): the extent to which the scores of a PRO instrument are related to a criterion measure.

Establishing validity involves examining the logical relationships that should exist between assessment measures. For example, we would expect that patients with lower treadmill exercise capacity generally will have more shortness of breath in daily life than those with higher exercise capacity, and we would expect to see substantial correlations between a new measure of emotional function and existing emotional function questionnaires.

When we are interested in evaluating change over time, we examine correlations of change scores. For example, patients who deteriorate in their treadmill exercise capacity should, in general, show increases in dyspnoea, whereas those whose exercise capacity improves should experience less dyspnoea. Similarly, a new emotional function measure should show improvement in patients who improve on existing measures of emotional function. The technical term for this process is testing an instrument’s construct validity.

Review authors should look for, and evaluate the evidence of, the validity of PROs used in their included studies. Unfortunately, reports of randomized trials and other studies using PROs seldom review evidence of the validity of the instruments they use, but review authors can gain some reassurance from statements (backed by citations) that the questionnaires have been validated previously.

A final concern about validity arises if the measurement instrument is used with a different population, or in a culturally and linguistically different environment, than the one in which it was developed (typically, use of a non-English version of an English-language questionnaire). Ideally, one would have evidence of validity in the population enrolled in the randomized trial, and PRO measures would be re-validated in each study using whatever data are available for the validation, for instance, other endpoints measured. Authors should note, in evaluating evidence of validity, when the population assessed in the trial is different from that used in validation studies.


Reliability vs Validity in Research | Differences, Types & Examples

Published on 3 May 2022 by Fiona Middleton. Revised on 10 October 2022.

Reliability and validity are concepts used to evaluate the quality of research. They indicate how well a method, technique, or test measures something. Reliability is about the consistency of a measure, and validity is about the accuracy of a measure.

It’s important to consider reliability and validity when you are creating your research design, planning your methods, and writing up your results, especially in quantitative research.

Understanding reliability vs validity

Reliability and validity are closely related, but they mean different things. A measurement can be reliable without being valid. However, if a measurement is valid, it is usually also reliable.

What is reliability?

Reliability refers to how consistently a method measures something. If the same result can be consistently achieved by using the same methods under the same circumstances, the measurement is considered reliable.

What is validity?

Validity refers to how accurately a method measures what it is intended to measure. If research has high validity, that means it produces results that correspond to real properties, characteristics, and variations in the physical or social world.

High reliability is one indicator that a measurement is valid. If a method is not reliable, it probably isn’t valid.

However, reliability on its own is not enough to ensure validity. Even if a test is reliable, it may not accurately reflect the real situation.

Validity is harder to assess than reliability, but it is even more important. To obtain useful results, the methods you use to collect your data must be valid: the research must be measuring what it claims to measure. This ensures that your discussion of the data and the conclusions you draw are also valid.


How are reliability and validity assessed?

Reliability can be estimated by comparing different versions of the same measurement. Validity is harder to assess, but it can be estimated by comparing the results to other relevant data or theory. Methods of estimating reliability and validity are usually split up into different types.

Types of reliability

Different types of reliability can be estimated through various statistical methods.

Types of validity

The validity of a measurement can be estimated based on three main types of evidence. Each type can be evaluated through expert judgement or statistical methods.

To assess the validity of a cause-and-effect relationship, you also need to consider internal validity (the design of the experiment) and external validity (the generalisability of the results).

How to ensure validity and reliability in your research

The reliability and validity of your results depend on creating a strong research design, choosing appropriate methods and samples, and conducting the research carefully and consistently.

Ensuring validity

If you use scores or ratings to measure variations in something (such as psychological traits, levels of ability, or physical properties), it’s important that your results reflect the real variations as accurately as possible. Validity should be considered in the very earliest stages of your research, when you decide how you will collect your data.

  • Choose appropriate methods of measurement

Ensure that your method and measurement technique are of high quality and targeted to measure exactly what you want to know. They should be thoroughly researched and based on existing knowledge.

For example, to collect data on a personality trait, you could use a standardised questionnaire that is considered reliable and valid. If you develop your own questionnaire, it should be based on established theory or the findings of previous studies, and the questions should be carefully and precisely worded.

  • Use appropriate sampling methods to select your subjects

To produce valid generalisable results, clearly define the population you are researching (e.g., people from a specific age range, geographical location, or profession). Ensure that you have enough participants and that they are representative of the population.

Ensuring reliability

Reliability should be considered throughout the data collection process. When you use a tool or technique to collect data, it’s important that the results are precise, stable, and reproducible.

  • Apply your methods consistently

Plan your method carefully to make sure you carry out the same steps in the same way for each measurement. This is especially important if multiple researchers are involved.

For example, if you are conducting interviews or observations, clearly define how specific behaviours or responses will be counted, and make sure questions are phrased the same way each time.

  • Standardise the conditions of your research

When you collect your data, keep the circumstances as consistent as possible to reduce the influence of external factors that might create variation in the results.

For example, in an experimental setup, make sure all participants are given the same information and tested under the same conditions.

Where to write about reliability and validity in a thesis

It’s appropriate to discuss reliability and validity in various sections of your thesis, dissertation, or research paper. Showing that you have taken them into account in planning your research and interpreting the results makes your work more credible and trustworthy.


Validity in research: a guide to measuring the right things

Last updated 27 February 2023 | Reviewed by Cathy Heath

Validity is necessary for all types of studies ranging from market validation of a business or product idea to the effectiveness of medical trials and procedures. So, how can you determine whether your research is valid? This guide can help you understand what validity is, the types of validity in research, and the factors that affect research validity.


  • What is validity?

In the most basic sense, validity is the quality of being based on truth or reason. Valid research strives to eliminate the effects of unrelated information and of the circumstances under which evidence is collected.

Validity in research is the ability to conduct an accurate study with the right tools and conditions to yield acceptable and reliable data that can be reproduced. Researchers rely on carefully calibrated tools for precise measurements. However, collecting accurate information can be more of a challenge.

To achieve and maintain validity, studies must be conducted in environments that don't sway the results. Validity can be compromised by asking the wrong questions or relying on limited data.

Why is validity important in research?

Research is used to improve life for humans. Every product and discovery, from innovative medical breakthroughs to advanced new products, depends on accurate research to be dependable. Without it, the results couldn't be trusted, and products would likely fail. Businesses would lose money, and patients couldn't rely on medical treatments. 

While wasting money on a lousy product is a concern, a lack of validity paints a much grimmer picture in fields such as medicine or the production of automobiles and airplanes, for example. Whether you're launching an exciting new product or conducting scientific research, validity can determine success and failure.

  • What is reliability?

Reliability is the ability of a method to yield consistency. If the same result can be consistently achieved by using the same method to measure something, the measurement method is said to be reliable. For example, a thermometer that shows the same temperatures each time in a controlled environment is reliable.

While high reliability is a part of measuring validity, it's only part of the puzzle. If the reliable thermometer hasn't been properly calibrated and reliably measures temperatures two degrees too high, it doesn't provide a valid (accurate) measure of temperature. 

Similarly, if a researcher uses a thermometer to measure weight, the results won't be accurate because it's the wrong tool for the job. 
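To make the distinction concrete, here is a minimal simulation sketch (the temperatures, noise level, and variable names are invented for illustration): a miscalibrated thermometer whose readings cluster tightly (reliable) but sit two degrees above the true value (not valid).

```python
import random
import statistics

random.seed(1)

TRUE_TEMP = 20.0  # actual temperature of the controlled environment

# A miscalibrated thermometer: very little random noise (consistent readings),
# but a systematic +2 degree bias.
readings = [TRUE_TEMP + 2.0 + random.gauss(0, 0.05) for _ in range(10)]

spread = statistics.stdev(readings)           # small spread -> high reliability
bias = statistics.mean(readings) - TRUE_TEMP  # large bias   -> low validity

print(f"spread of readings: {spread:.2f} degrees (consistent, so reliable)")
print(f"average error:      {bias:.2f} degrees (systematically wrong, so not valid)")
```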

  • How are reliability and validity assessed?

While measuring reliability is a part of measuring validity, there are distinct ways to assess both measurements for accuracy. 

How is reliability measured?

Reliability is assessed through measures of consistency and stability, including:

Consistency and stability of the same measure when repeated multiple times under the same conditions

Consistency and stability of the measure across different test subjects

Consistency and stability of results from different parts of a test designed to measure the same thing

How is validity measured?

Since validity refers to how accurately a method measures what it is intended to measure, it can be difficult to assess. Validity can be estimated by comparing research results to other relevant data or theories, for example through:

The adherence of a measure to existing knowledge of how the concept is measured

The ability to cover all aspects of the concept being measured

The relation of the result in comparison with other valid measures of the same concept
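As a rough illustration of the last point, a new measure can be correlated with scores from an established measure of the same concept collected from the same participants; a strong positive correlation supports (but does not by itself prove) validity. The paired scores below are invented for illustration.

```python
from scipy.stats import pearsonr  # numpy.corrcoef works equally well here

# Hypothetical paired scores: the new questionnaire vs an established measure
# of the same concept, one pair per participant.
new_measure = [12, 15, 9, 20, 17, 11, 14, 18]
established = [30, 35, 24, 45, 40, 27, 33, 41]

r, p = pearsonr(new_measure, established)
print(f"correlation with the established measure: r = {r:.2f} (p = {p:.3f})")
```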

  • What are the types of validity in a research design?

Research validity is broadly grouped into two categories: internal and external. Yet this grouping doesn't clearly define the different types of validity, which can be divided into seven distinct groups.

Face validity : A test that appears valid simply because of the appropriateness or relevance of the testing method, included information, or tools used.

Content validity : The determination that the measure used in research covers the full domain of the content.

Construct validity : The assessment of the suitability of the measurement tool to measure the activity being studied.

Internal validity : The assessment of how your research environment affects measurement results. This is where other factors can’t explain the extent of an observed cause-and-effect response.

External validity : The extent to which the study will be accurate beyond the sample and the level to which it can be generalized in other settings, populations, and measures.

Statistical conclusion validity: The determination of whether a relationship exists between procedures and outcomes (appropriate sampling and measuring procedures along with appropriate statistical tests).

Criterion-related validity : A measurement of the quality of your testing methods against a criterion measure (like a “gold standard” test) that is measured at the same time.

  • Examples of validity

Like different types of research and the various ways to measure validity, examples of validity can vary widely. These include:

A questionnaire may be considered valid because each question addresses specific and relevant aspects of the study subject.

In a brand assessment study, researchers can use comparison testing to verify the results of an initial study. For example, the results from a focus group response about brand perception are considered more valid when the results match that of a questionnaire answered by current and potential customers.

A test to measure a class of students' understanding of the English language contains reading, writing, listening, and speaking components to cover the full scope of how language is used.

  • Factors that affect research validity

Certain factors can affect research validity in both positive and negative ways. By understanding the factors that improve validity and those that threaten it, you can enhance the validity of your study. These include:

Random selection of participants vs. the selection of participants that are representative of your study criteria

Blinding with interventions the participants are unaware of (like the use of placebos)

Manipulating the experiment by inserting a variable that will change the results

Randomly assigning participants to treatment and control groups to avoid bias

Following specific procedures during the study to avoid unintended effects

Conducting a study in the field instead of a laboratory for more accurate results

Replicating the study with different factors or settings to compare results

Using statistical methods to adjust for inconclusive data

What are the common validity threats in research, and how can their effects be minimized or nullified?

Research validity can be difficult to achieve because of internal and external threats that produce inaccurate results. These factors can jeopardize validity.

History: Events that occur between an early and later measurement

Maturation: The passage of time in a study can include data on actions that would have naturally occurred outside of the settings of the study

Repeated testing: The outcome of repeated tests can change the outcome of subsequent tests

Selection of subjects: Unconscious bias which can result in the selection of uniform comparison groups

Statistical regression: Choosing subjects based on extremes doesn't yield an accurate outcome for the majority of individuals

Attrition: When the sample group is diminished significantly during the course of the study

Maturation: When subjects mature during the study, and natural maturation is attributed to the effects of the study

While some validity threats can be minimized or wholly nullified, removing all threats from a study is impossible. For example, random selection can remove unconscious bias and statistical regression. 

Researchers can even hope to avoid attrition by using smaller study groups. Yet, smaller study groups could potentially affect the research in other ways. The best practice for researchers to prevent validity threats is through careful environmental planning and reliable data-gathering methods.

  • How to ensure validity in your research

Researchers should be mindful of the importance of validity in the early planning stages of any study to avoid inaccurate results. Researchers must take the time to consider tools and methods as well as how the testing environment matches closely with the natural environment in which results will be used.

The following steps can be used to ensure validity in research:

Choose appropriate methods of measurement

Use appropriate sampling to choose test subjects

Create an accurate testing environment

How do you maintain validity in research?

Accurate research is usually conducted over a period of time with different test subjects. To maintain validity across an entire study, you must take specific steps to ensure that gathered data has the same levels of accuracy. 

Consistency is crucial for maintaining validity in research. When researchers apply methods consistently and standardize the circumstances under which data is collected, validity can be maintained across the entire study.

Is there a need for validation of the research instrument before its implementation?

An essential part of validity is choosing the right research instrument or method for accurate results. Consider the thermometer that is reliable but still produces inaccurate results. You're unlikely to achieve research validity without activities like instrument calibration and checks of content and construct validity.

  • Understanding research validity for more accurate results

Without validity, research can't provide the accuracy necessary to deliver a useful study. By getting a clear understanding of validity in research, you can take steps to improve your research skills and achieve more accurate results.



Uncomplicated Reviews of Educational Research Methods

  • Instrument, Validity, Reliability


Part I: The Instrument

Instrument is the general term that researchers use for a measurement device (survey, test, questionnaire, etc.). To help distinguish between instrument and instrumentation, consider that the instrument is the device and instrumentation is the course of action (the process of developing, testing, and using the device).

Instruments fall into two broad categories, researcher-completed and subject-completed, distinguished by those instruments that researchers administer versus those that are completed by participants. Researchers choose which type of instrument, or instruments, to use based on the research question.

Usability refers to the ease with which an instrument can be administered, interpreted by the participant, and scored/interpreted by the researcher. Example usability problems include:

  • Students are asked to rate a lesson immediately after class, but there are only a few minutes before the next class begins (problem with administration).
  • Students are asked to keep self-checklists of their after school activities, but the directions are complicated and the item descriptions confusing (problem with interpretation).
  • Teachers are asked about their attitudes regarding school policy, but some questions are worded poorly which results in low completion rates (problem with scoring/interpretation).

Attending to validity and reliability concerns (discussed below) will help alleviate usability issues. For now, we can identify five usability considerations:

  • How long will it take to administer?
  • Are the directions clear?
  • How easy is it to score?
  • Do equivalent forms exist?
  • Have any problems been reported by others who used it?

It is best to use an existing instrument, one that has been developed and tested numerous times, such as those found in the Mental Measurements Yearbook. We will turn to why next.

Part II: Validity

Validity is the extent to which an instrument measures what it is supposed to measure and performs as it is designed to perform. It is rare, if not impossible, for an instrument to be 100% valid, so validity is generally measured in degrees. As a process, validation involves collecting and analyzing data to assess the accuracy of an instrument. There are numerous statistical tests and measures to assess the validity of quantitative instruments, which generally involves pilot testing. The remainder of this discussion focuses on external validity and content validity.

External validity is the extent to which the results of a study can be generalized from a sample to a population. Establishing external validity for an instrument, then, follows directly from sampling. Recall that a sample should be an accurate representation of a population, because the total population may not be available. An instrument that is externally valid helps obtain population generalizability, or the degree to which a sample represents the population.

Content validity refers to the appropriateness of the content of an instrument. In other words, do the measures (questions, observation logs, etc.) accurately assess what you want to know? This is particularly important with achievement tests. Consider that a test developer wants to maximize the validity of a unit test for 7th grade mathematics. This would involve taking representative questions from each of the sections of the unit and evaluating them against the desired outcomes.
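One minimal way to operationalise that coverage check is to tag each draft item with the unit section it samples and verify that every section is adequately represented. The sections, required counts, and items below are invented for illustration.

```python
# Hypothetical blueprint: minimum number of items required from each section
# of the 7th grade mathematics unit.
blueprint = {"fractions": 3, "decimals": 3, "ratios": 2, "geometry": 2}

# Each draft test item is tagged with the section it was drawn from.
draft_items = ["fractions", "fractions", "decimals", "ratios", "ratios",
               "geometry", "fractions", "decimals"]

counts = {section: draft_items.count(section) for section in blueprint}
gaps = {s: needed - counts[s] for s, needed in blueprint.items() if counts[s] < needed}

print("items per section:", counts)
print("sections still under-represented:", gaps or "none")
```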

Part III: Reliability

Reliability can be thought of as consistency. Does the instrument consistently measure what it is intended to measure? It is not possible to calculate reliability exactly; however, there are four general estimators that you may encounter in reading research:

  • Inter-Rater/Observer Reliability : The degree to which different raters/observers give consistent answers or estimates.
  • Test-Retest Reliability : The consistency of a measure evaluated over time.
  • Parallel-Forms Reliability: The reliability of two tests constructed the same way, from the same content.
  • Internal Consistency Reliability: The consistency of results across items, often measured with Cronbach’s Alpha.
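For example, internal consistency is commonly summarised with Cronbach’s alpha, which compares the variance of the individual items with the variance of the total score. A minimal sketch follows (the response matrix is invented for illustration):

```python
import numpy as np

# Hypothetical responses: rows are respondents, columns are items on the same scale.
scores = np.array([
    [4, 5, 4, 4],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 3],
    [4, 4, 5, 5],
])

k = scores.shape[1]
item_variances = scores.var(axis=0, ddof=1)       # variance of each item
total_variance = scores.sum(axis=1).var(ddof=1)   # variance of the summed scores

alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
print(f"Cronbach's alpha = {alpha:.2f}")  # values near 1 indicate high internal consistency
```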

Relating Reliability and Validity

Reliability is directly related to the validity of the measure. There are several important principles. First, a test can be considered reliable, but not valid. Consider the SAT, used as a predictor of success in college. It is a reliable test (high scores relate to high GPA), though only a moderately valid indicator of success (due to the lack of structured environment – class attendance, parent-regulated study, and sleeping habits – each holistically related to success).

Second, validity is more important than reliability. Using the above example, college admissions may consider the SAT a reliable test, but not necessarily a valid measure of other quantities colleges seek, such as leadership capability, altruism, and civic involvement. The combination of these aspects, alongside the SAT, is a more valid measure of the applicant’s potential for graduation, later social involvement, and generosity (alumni giving) toward the alma mater.

Finally, the most useful instrument is both valid and reliable. Proponents of the SAT argue that it is both. It is a moderately reliable predictor of future success and a moderately valid measure of a student’s knowledge in Mathematics, Critical Reading, and Writing.

Part IV: Validity and Reliability in Qualitative Research

Thus far, we have discussed instrumentation as related to mostly quantitative measurement. Establishing validity and reliability in qualitative research can be less precise, though participant/member checks, peer evaluation (another researcher checks the researcher’s inferences based on the instrument; Denzin & Lincoln, 2005), and multiple methods (keyword: triangulation) are convincingly used. Some qualitative researchers reject the concept of validity due to the constructivist viewpoint that reality is unique to the individual and cannot be generalized. These researchers argue for a different standard for judging research quality. For a more complete discussion of trustworthiness, see Lincoln and Guba’s (1985) chapter.


Reliability and Validity – Definitions, Types & Examples

Published by Alvin Nicolas on August 16th, 2021; revised on October 26, 2023

A researcher must test the collected data before drawing any conclusions. Every research design needs to be concerned with reliability and validity to measure the quality of the research.

What is Reliability?

Reliability refers to the consistency of the measurement. Reliability shows how trustworthy the score of the test is. If the collected data shows the same results after being tested using various methods and sample groups, the information is reliable. Reliability is a precondition for validity, but a reliable method does not automatically produce valid results.

Example: If you weigh yourself on a weighing scale throughout the day, you’ll get the same results. These are considered reliable results obtained through repeated measures.

Example: If a teacher gives her students the same math test and repeats it the next week with the same questions, and she gets the same scores, then the reliability of the test is high.

What is Validity?

Validity refers to the accuracy of the measurement. Validity shows how suitable a specific test is for a particular situation. If the results are accurate according to the researcher’s situation, explanation, and prediction, then the research is valid.

If the method of measuring is accurate, then it’ll produce accurate results. A method that is not reliable cannot be valid; however, a method that is reliable is not necessarily valid.

Example:  Your weighing scale shows different results each time you weigh yourself within a day even after handling it carefully, and weighing before and after meals. Your weighing machine might be malfunctioning. It means your method had low reliability. Hence you are getting inaccurate or inconsistent results that are not valid.

Example: Suppose a questionnaire is distributed among a group of people to check the quality of a skincare product, and the same questionnaire is repeated with many groups. If you get the same responses from the various participants, the questionnaire has high reliability, which supports (though does not by itself establish) its validity.

Most of the time, validity is difficult to measure even though the process of measurement is reliable. It isn’t easy to interpret the real situation.

Example: If the weighing scale shows the same result, let’s say 70 kg, each time, even though your actual weight is 55 kg, the scale is malfunctioning. Because it shows consistent results it is reliable, but because those results are wrong it is not valid: the method has high reliability and low validity.

Internal Vs. External Validity

One of the key features of randomised designs is that they have significantly high internal and external validity.

Internal validity  is the ability to draw a causal link between your treatment and the dependent variable of interest. It means the observed changes should be due to the experiment conducted, and any external factor should not influence the  variables .

Example: age, level, height, and grade.

External validity  is the ability to identify and generalise your study outcomes to the population at large. The relationship between the study’s situation and the situations outside the study is considered external validity.



Threats to Internal Validity

Threats to External Validity

How to Assess Reliability and Validity

Reliability can be measured by comparing the consistency of the procedure and its results. There are various methods to measure validity and reliability. Reliability can be measured through various statistical methods depending on the type of reliability, as explained below:

Types of Reliability

Types of Validity

As we discussed above, the reliability of a measurement alone cannot determine its validity. Validity is difficult to measure even if the method is reliable. The following types of tests are conducted to measure validity.


How to Increase Reliability?

  • Use an appropriate questionnaire to measure the competency level.
  • Ensure a consistent environment for participants
  • Make the participants familiar with the criteria of assessment.
  • Train the participants appropriately.
  • Analyse the research items regularly to avoid poor performance.

How to Increase Validity?

Ensuring validity is also not an easy job. A proper method to ensure validity is given below:

  • Reactivity should be minimised as the first concern.
  • The Hawthorne effect should be reduced.
  • The respondents should be motivated.
  • The intervals between the pre-test and post-test should not be lengthy.
  • Dropout rates should be avoided.
  • The inter-rater reliability should be ensured.
  • Control and experimental groups should be matched with each other.

How to Implement Reliability and Validity in your Thesis?

According to experts, it is helpful to implement the concepts of reliability and validity, especially in a thesis or dissertation.

Frequently Asked Questions

What are reliability and validity in research?

Reliability in research refers to the consistency and stability of measurements or findings. Validity relates to the accuracy and truthfulness of results, measuring what the study intends to. Both are crucial for trustworthy and credible research outcomes.

What is validity?

Validity in research refers to the extent to which a study accurately measures what it intends to measure. It ensures that the results are truly representative of the phenomena under investigation. Without validity, research findings may be irrelevant, misleading, or incorrect, limiting their applicability and credibility.

What is reliability?

Reliability in research refers to the consistency and stability of measurements over time. If a study is reliable, repeating the experiment or test under the same conditions should produce similar results. Without reliability, findings become unpredictable and lack dependability, potentially undermining the study’s credibility and generalisability.

What is reliability in psychology?

In psychology, reliability refers to the consistency of a measurement tool or test. A reliable psychological assessment produces stable and consistent results across different times, situations, or raters. It ensures that an instrument’s scores are not due to random error, making the findings dependable and reproducible in similar conditions.

What is test retest reliability?

Test-retest reliability assesses the consistency of measurements taken by a test over time. It involves administering the same test to the same participants at two different points in time and comparing the results. A high correlation between the scores indicates that the test produces stable and consistent results over time.

How to improve reliability of an experiment?

  • Standardise procedures and instructions.
  • Use consistent and precise measurement tools.
  • Train observers or raters to reduce subjective judgments.
  • Increase sample size to reduce random errors.
  • Conduct pilot studies to refine methods.
  • Repeat measurements or use multiple methods.
  • Address potential sources of variability.

What is the difference between reliability and validity?

Reliability refers to the consistency and repeatability of measurements, ensuring results are stable over time. Validity indicates how well an instrument measures what it’s intended to measure, ensuring accuracy and relevance. While a test can be reliable without being valid, a valid test must inherently be reliable. Both are essential for credible research.

Are interviews reliable and valid?

Interviews can be both reliable and valid, but they are susceptible to biases. The reliability and validity depend on the design, structure, and execution of the interview. Structured interviews with standardised questions improve reliability. Validity is enhanced when questions accurately capture the intended construct and when interviewer biases are minimised.

Are IQ tests valid and reliable?

IQ tests are generally considered reliable, producing consistent scores over time. Their validity, however, is a subject of debate. While they effectively measure certain cognitive skills, whether they capture the entirety of “intelligence” or predict success in all life areas is contested. Cultural bias and over-reliance on tests are also concerns.

Are questionnaires reliable and valid?

Questionnaires can be both reliable and valid if well-designed. Reliability is achieved when they produce consistent results over time or across similar populations. Validity is ensured when questions accurately measure the intended construct. However, factors like poorly phrased questions, respondent bias, and lack of standardisation can compromise their reliability and validity.



Reliability and validity: Importance in Medical Research

Affiliations.

  • 1 Al-Nafees Medical College,Isra University, Islamabad, Pakistan.
  • 2 Fauji Foundation Hospital, Foundation University Medical College, Islamabad, Pakistan.
  • PMID: 34974579
  • DOI: 10.47391/JPMA.06-861

Reliability and validity are among the most important and fundamental domains in the assessment of any measuring methodology for data collection in good research. Validity is about what an instrument measures and how well it does so, whereas reliability concerns the truthfulness in the data obtained and the degree to which any measuring tool controls random error. The current narrative review was planned to discuss the importance of reliability and validity of data-collection or measurement techniques used in research. It describes and explores comprehensively the reliability and validity of research instruments and also discusses different forms of reliability and validity with concise examples. An attempt has been made to give a brief literature review regarding the significance of reliability and validity in medical sciences.

Keywords: Validity, Reliability, Medical research, Methodology, Assessment, Research tools.



Neag School of Education

Educational Research Basics by Del Siegle

Instrument Validity

Validity (a concept map in the original shows the various types of validity): An instrument is valid only to the extent that its scores permit appropriate inferences to be made about (1) a specific group of people for (2) specific purposes.

An instrument that is a valid measure of third graders’ math skills probably is not a valid measure of high school calculus students’ math skills. An instrument that is a valid predictor of how well students might do in school may not be a valid measure of how well they will do once they complete school. So we never say that an instrument is valid or not valid; we say it is valid for a specific purpose with a specific group of people. Validity is specific to the appropriateness of the interpretations we wish to make with the scores.

In the reliability section, we discussed a scale that consistently reported a weight of 15 pounds for someone. While it may be a reliable instrument, it is not a valid instrument to determine someone’s weight in pounds. Likewise, a measuring tape is a valid instrument to determine people’s height, but it is not a valid instrument to determine their weight.

There are three general categories of instrument validity.

Content-Related Evidence (also known as Face Validity): Specialists in the content measured by the instrument are asked to judge the appropriateness of the items on the instrument. Do they cover the breadth of the content area (does the instrument contain a representative sample of the content being assessed)? Are they in a format that is appropriate for those using the instrument? A test that is intended to measure the quality of science instruction in fifth grade should cover material covered in the fifth grade science course in a manner appropriate for fifth graders. A national science test might not be a valid measure of local science instruction, although it might be a valid measure of national science standards.

Criterion-Related Evidence: Criterion-related evidence is collected by comparing the instrument with some future or current criterion, thus the name criterion-related. The purpose of an instrument dictates whether predictive or concurrent validity is warranted.

– Predictive Validity: If an instrument is purported to measure some future performance, predictive validity should be investigated. A comparison must be made between the instrument and some later behavior that it predicts. Suppose a screening test for 5-year-olds is purported to predict success in kindergarten. To investigate predictive validity, one would give the prescreening instrument to 5-year-olds prior to their entry into kindergarten. The children’s kindergarten performance would be assessed at the end of kindergarten, and a correlation would be calculated between the screening instrument scores and the kindergarten performance scores.

– Concurrent Validity: Concurrent validity compares scores on an instrument with current performance on some other measure. Unlike predictive validity, where the second measurement occurs later, concurrent validity requires a second measure at about the same time. Concurrent validity for a science test could be investigated by correlating scores for the test with scores from another established science test taken about the same time. Another way is to administer the instrument to two groups who are known to differ on the trait being measured by the instrument. One would have support for concurrent validity if the scores for the two groups were very different. An instrument that measures altruism should be able to discriminate those who possess it (nuns) from those who don’t (homicidal maniacs). One would expect the nuns to score significantly higher on the instrument.
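As a rough sketch of the known-groups approach just described (the scores and group labels are invented for illustration), one can compare the score distributions of two groups expected to differ on the trait; a clear separation supports concurrent validity.

```python
from scipy.stats import ttest_ind
from statistics import mean

# Hypothetical altruism scores for two groups expected to differ on the trait.
group_high = [42, 45, 39, 47, 44, 41]   # group expected to score high
group_low  = [18, 22, 25, 20, 19, 23]   # group expected to score low

t_stat, p_value = ttest_ind(group_high, group_low)
print(f"mean difference: {mean(group_high) - mean(group_low):.1f} points")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # clear separation supports concurrent validity
```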

Construct-Related Evidence: Construct validity is an on-going process.

– Discriminant Validity: An instrument does not correlate significantly with variables from which it should differ.

– Convergent Validity: An instrument correlates highly with other variables with which it should theoretically correlate.

Del Siegle, Ph.D., Neag School of Education – University of Connecticut, www.delsiegle.info


Method of preparing a document for survey instrument validation by experts


Validation of a survey instrument is an important activity in the research process. Face validity and content validity, though qualitative methods, are essential steps in validating how far the survey instrument can measure what it is intended to measure. These techniques are used both in scale development and in questionnaires that may contain multiple scales. In face and content validation, a survey instrument is usually validated by experts from academia and practitioners from the field or industry. Researchers face challenges in conducting a proper validation because of the lack of an appropriate method for communicating the requirements and receiving the feedback.

In this paper, the authors develop a template that could be used for the validation of survey instruments.

In the instrument development process, after the item pool is generated, the template is completed and sent to the reviewer. The reviewer will be able to give the necessary feedback through the template, which will help the researcher improve the instrument.


Method details

Introduction

Survey instruments or questionnaires are the most popular data collection tools because of their many advantages. Collecting data from a large population in a limited time and at a lower cost, convenience to respondents, anonymity, lack of interviewer bias, and standardization of questions are some of the benefits. However, an important disadvantage of a questionnaire is poor data quality due to incomplete and inaccurate questions, wording problems and a poor development process. These problems are critical and can be avoided or mitigated [14].

To ensure the quality of the instrument, using a previously validated questionnaire is useful. This will save time and resources in the development process and in testing its reliability and validity. However, there can be situations wherein a new questionnaire is needed [5]. Whenever a new scale or questionnaire needs to be developed, following a structured method will help us to develop a quality instrument. There are many approaches to scale development, and all of them include stages for testing reliability and validity.

Even though there is much literature available on reliability and validity procedures, many researchers struggle to operationalize the process. Collingridge [8] wrote in the Methodspace blog of Sage Publications that he repeatedly asked professors how to validate the questions in a survey and unfortunately did not get an answer. Most of the time, researchers send the completely designed questionnaire with the actual measurement scale without providing adequate information for the reviewers to provide proper feedback. This paper is an effort to develop a document template that can capture the feedback of the expert reviewers of the instrument.

This paper is structured as follows: Section 1 provides the introduction to the need for a validation format for research; the fundamentals of validation and the factors involved in validation from various literature studies are discussed in Section 2. Section 3 presents the methodology used in framing the validation format. Section 4 provides the results of the study. Section 5 explains how the format can be used and how the feedback can be processed. Finally, Section 6 concludes the paper with a note on its contribution.

Review of literature

A questionnaire is explained as “an instrument for the measurement of one or more constructs by means of aggregated item scores, called scales” [21]. A questionnaire can be placed on a continuum from unstructured to structured [14]. A structured questionnaire will “have a similar format, are usually statements, questions, or stimulus words with structured response categories, and require a judgment or description by a respondent or rater” [21]. Research in social science with a positivist paradigm began in the 19th century. The first use of a questionnaire is attributed to the Statistical Society of London as early as 1838. Berthold Sigismund proposed the first guidelines for questionnaire development in 1856, which provided a definite plan for the questionnaire method [13]. In 1941, the British Association for the Advancement of Science’s acceptance of quantitative measures for sensory events [26] supported a much more pervasive application of questionnaires in research, alongside instruments such as the Guttman scale [15], the Thurstone scale [27] and the Likert scale [18].

Carpenter [6] argued that scholars do not follow the best practices in the measurement building procedure. The author claims that “the defaults in the statistical programs, inadequate training and numerous evaluation points can lead to improper practices”. Many researchers have proposed techniques for scale development. We trace the prominent methods from the literature. Table 1 presents various frameworks in scale development.

Frameworks of Scale development.

Reeves and Marbach-Ad [22] argued that the quantitative aspect of social science research differs from the natural sciences in how phenomena are quantified using instruments. Bollen [4] explained that a social science instrument measures latent variables that are not directly observed but are inferred from observable behaviour. Because of this characteristic of social science measures, there is a need to ensure that what is being measured is actually the intended phenomenon.

The concepts of reliability and validity evolved as early as 1896 with Pearson. Validity theory from 1900 to 1950 basically dealt with the alignment of test scores with other measures, which was operationally tested by correlation. Validity theory was refined during the 1950s to include criterion, content and construct validity. Correlation of the test measure with an accurate criterion score is criterion validity. In 1955, criterion validity was divided into concurrent validity and predictive validity. Content validity provides “domain relevance and representativeness of the test instrument”. The concept of construct validity was introduced in 1954 and received increasing emphasis, and from 1985 it took a central place as the appropriate test of validity. The new millennium saw a change in the perspectives of validity theory. Contemporary validity theory is a metamorphosis of epistemological and methodological perspectives. Argument-based approaches and consequences-based validity are some of the new concepts that are evolving [24].

The American Educational Research Association (AERA), the American Psychological Association (APA) and the National Council on Measurement in Education (NCME) jointly developed the ‘Standards for educational and psychological testing’. In the Standards, validity is described as “the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests” [1].

Based on the ‘Standards’, validity tests are classified by the type of evidence. Standards 1.11 to 1.25 describe the various kinds of evidence used to test validity [1].

Types of validity.

(Source: [1])

Souza et al. [25] argued that “there is no statistical test to assess specifically the content validity; usually researchers use a qualitative approach, through the assessment of an experts committee, and then, a quantitative approach using the content validity index (CVI).”

Worthington and Whittaker [29] conducted a content analysis on new scales developed between 1995 and 2004. They specifically focused on the use of Exploratory and Confirmatory Factor Analysis (EFA & CFA) procedures in the validation of the scales. They argued that though the post-tests in the validation procedure, which are usually based on factor-analytic techniques, are more scientific and rigorous, the preliminary steps are necessary. Mistakes committed in the initial stages of scale development lead to problems in the later stages.

Messick [20] proposed six distinguishable notions of construct validity for educational and psychological measurements. Among the six, the foremost one is content validity, which looks at the relevance of the content, representativeness and technical quality. In a similar way, Oosterveld et al. [21] developed a taxonomy of questionnaire design directed towards psychometric aspects. The taxonomy introduces the following questionnaire design methods: (1) coherent, (2) prototypical, (3) internal, (4) external, (5) construct and (6) facet design technique. These methods are related “to six psychometric features guiding them face validity, process validity, homogeneity, criterion validity, construct validity and content validity”. The authors presented these methods under four stages: (1) concept review, (2) item generation, (3) scale development and (4) evaluation. After the definition of the construct in the first stage, the item pool is developed. The item production stage “comprises an item review by judges, e.g., experts, or potential respondents, and a pilot administration of the preliminary questionnaire, the results of which are subsequently used for refinement of the items”.

What needs to be checked?

This paper mainly focuses on the expert validation done under the face validity and content validity stages. Martinez [19] provides a clear distinction between content validity and face validity. “Face validity requires an examination of a measure and the items of which it is composed as sufficient and suitable ‘on its face’ for capturing a concept. A measure with face validity will be visibly relevant to the concept it is intended to measure, and less so to other concepts”. Though face validity is the quick and excellent first step for assessing the appropriateness of measure to capture the concept, it is not sufficient. It needs to be interpreted along with other forms of measurement validity.

“Content validity focuses on the degree to which a measure captures the full dimension of a particular concept. A measure exhibiting high content validity is one that encompasses the full meaning of the concept it is intended to assess” [19] . An extensive review of literature and consultation with experts ensures the validity of the content.

From the review of various literature studies, we arrive at the details of the validation that needs to be done by experts. Domain or subject-matter experts from both academia and industry, a person with expertise in the construct being developed, people familiar with the target population on whom the instrument will be used, users of the instrument, data analysts and those who take decisions based on the scores of the test are recommended as experts. Experts are consulted during the concept development stage and the item generation stage. Experts provide feedback on the content, sensitivity and standard settings [10].

During the concept development stage, experts provide inputs on the definition of the constructs, relating it to the domain and also check with the related concepts. At the item generation stage, experts validate the representativeness and significance of each item to the construct, accuracy of each item in measuring the concept, inclusion or deletion of elements, logical sequence of the items, and scoring models. Experts also validate how the instrument can measure the concept among different groups of respondents. An item is checked for its bias to specific groups such as gender, minority groups and linguistically different groups. Experts also provide standard scores or cutoff scores for decision making [10] .

The second set of reviewers who are experts in questionnaire development basically check the structural aspects of the instrument in terms of common errors such as double-barreled, confusing and leading questions. This also includes language experts, even if the questionnaire is developed in a popular language like English. Other language experts are required in case the instrument involves translation.

There were many attempts to standardize the validation of the questionnaire. Forsyth et al. [11] developed a Forms Appraisal model, which was an exhaustive list of problems that occur in a questionnaire item. This was found to be tiresome for experts. Fowler and Roman [12] developed an ‘Interviewer Rating Form’, which allowed experts to comment on three qualities: (1) trouble reading the question, (2) respondent not understanding the meaning or ideas in the question and (3) respondent having difficulty in providing an answer. The experts had to code as ‘ A ’ for ‘No evidence of a problem’, ‘ B ’ for ‘Possible problem’ and ‘ C ’ for ‘Definite Problem’. Willis and Lessler [28] developed a shorter version of the coding scheme for evaluation of questionnaire items called “Question appraisal system (QAS)”. This system evaluates each item on 26 problem areas under seven heads. The expert needs to just code ‘Yes’ or ‘No’ for each item. Akkerboom and Dehue [2] developed a systematic review of a questionnaire for an interview and self-completion questionnaire with 26 problems items categorized under eight problem areas.

Hinkin [16] recommended a "best practices" of “clearly cite the theoretical literature on which the new measures are based and describe the manner in which the items were developed and the sample used for item development”. The author claims that “in many articles, this information was lacking, and it was not clear whether there was little justification for the items chosen or if the methodology employed was simply not adequately presented”.

Further to the qualitative analysis of the items, recent developments include quantitative assessments of the items. “The content adequacy of a set of newly developed items is assessed by asking respondents to rate the extent to which items corresponded with construct definitions” [16]. Souza et al. [25] suggest using the Content Validity Index (CVI) for the quantitative approach. Experts evaluate every item on a four-point scale, in which “1 = non-equivalent item; 2 = the item needs to be extensively revised so equivalence can be assessed; 3 = equivalent item, needs minor adjustments; and 4 = totally equivalent item”. The CVI for an item is calculated by counting the ratings of 3 or 4 and dividing by the total number of ratings, so the CVI value is the proportion of judges who agree with an item; an index value of at least 0.80, and preferably higher than 0.90, is accepted.
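Following that description, here is a minimal sketch of the item-level CVI calculation (the expert ratings are invented for illustration): each expert rates every item on the four-point scale, and an item’s CVI is the proportion of experts who rate it 3 or 4.

```python
# Hypothetical ratings: for each item, one rating per expert on the 1-4 scale
# described above (3 or 4 means the item is judged equivalent/relevant).
ratings = {
    "item_1": [4, 4, 3, 4, 3],
    "item_2": [2, 3, 4, 3, 2],
    "item_3": [4, 4, 4, 4, 3],
}

def item_cvi(scores):
    """Proportion of experts who rate the item 3 or 4."""
    return sum(1 for s in scores if s >= 3) / len(scores)

for item, scores in ratings.items():
    cvi = item_cvi(scores)
    verdict = "acceptable" if cvi >= 0.80 else "needs revision"
    print(f"{item}: I-CVI = {cvi:.2f} ({verdict})")
```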

Information to be provided to the experts

The problems with conducting face validity and content validity checks may be attributed to both the scale developer and the reviewer. Scale developers do not convey their requirements to the experts properly, and experts are also not sure about what is expected by the researcher. Therefore, a format is developed that captures the information required for scale validation from both the researcher and the experts.

Covering letter

A covering letter is an important part of sending a questionnaire for review. It can help in persuading a reviewer to support the research. It should be short and simple. A covering letter first invites the experts to the review and conveys esteem to the expert. Even if the questionnaire for review is handed over personally, having a covering letter will serve as instructions for the review process and the expectations from the reviewer.

Boateng et al. [3] recommended that the researcher specify the purpose of the construct or questionnaire being developed; justifying the development of a new instrument by confirming that no existing instruments serve the purpose is crucial. If there are any similar instruments, the researcher should explain how the proposed one differs from the existing instruments.

The covering letter can mention the maximum time required for the review and any compensation that the expert will be awarded. This will motivate the reviewer to contribute their expertise and effort. Instructions on how to complete the review process, what aspects are to be checked, the coding systems and how to give the feedback are also provided in the covering letter. The covering letter ends with a thank-you note in advance and is personally signed by the instrument developer. Information on further contact details can also be provided at the end of the covering letter.

Introduction to research

Boateng et al. [3] proposed that it is an essential step to articulate the domain(s) before any validation process. They recommend that “the domain being examined should be decided upon and defined before any item activity. A well-defined domain will provide a working knowledge of the phenomenon under study, specify the boundaries of the domain, and ease the process of item generation and content validation”.

In the introduction section, the research problem being addressed, existing theories, the proposed theory or model that will be investigated, and the list of variables/concepts that are to be measured can be elaborated. Guion [30] argued that for those who do not accept content validity on the basis of the operational definition alone, five conditions offer a tentative answer: “(1) the content domain should be grounded in behavior with a commonly accepted meaning, (2) the content domain must be defined in a manner that is not open to more than one interpretation, (3) the content domain must be related to the purposes of measurement, (4) qualified judges must agree that the domain has been sufficiently sampled and (5) the response content must be dependably observed and evaluated.” Therefore, the information provided in the ‘Introduction’ section will be helpful to the expert in doing a content validation as the first step.

Construct-wise item validation

After the need for the measure or the survey instrument is communicated, the domain is validated. The next step is to validate the items. Validation may be done for developing a scale for a single concept or as a questionnaire with multiple concepts of measure. For a multiple construct instrument, the validation is done construct-wise.

In an instrument with multiple constructs, the Introduction provides information at the theory level. The domain validation is done to assess the relevance of the theory to the problem. In the next section, the domain validation is done at the variable level. Similar to the Introduction, details about the construct are provided. The definition of the construct, the source of the definition, a description of the concept, and the operational definition are shared with the experts. Experts will validate the construct by relating it to the relevant domain. If the conceptualization and definition are not properly done, it will result in poor evaluation of the items.

New items are developed by deductive or inductive methods. In deductive methods, items are generated from already existing scales and indicators through literature review. In the inductive technique, items are generated through direct observation, individual interviews, focus group discussions and exploratory research. It is necessary to convey how the item was generated to the expert reviewer. Even when an item or a scale is adopted unaltered, it becomes necessary to validate it to assess its relevance to a particular culture or region. Even in such situations, it is necessary to inform the reviewer about the source of the items.

Experts review each item and the construct as a whole. For each item, the item code, the item statement, the measurement scale, the source of the item and a description of the item are provided. In informing the source of the item, there are three options. When the item is adopted as it is from previous scales, the source can be provided. If the item is adapted by modifying an earlier item, the source and the original item can be given along with a description of the modification made. If the item is developed by induction, the item source can be mentioned. First, experts evaluate each item to assess whether it represents the domain of the construct and provide their evaluation on a 4-point or 3-point scale. When multiple experts are used for the validation process, this score can also be used for quantitative evaluation. The quality parameters of the item are further evaluated. Researchers may choose the questionnaire appraisal scheme from the many different systems available. An open remarks column is provided for experts to give any feedback that is not covered by the format. A comments section is provided at the end of the construct validation section where the experts can give feedback such as underrepresentation of the construct by the items.

Validation of demography items

In the same way, the information regarding each of the demography items that will be required in the questionnaire is also included in the format. Finally, space for the expert to comment on the entire instrument is also provided. The template of the evaluation form is provided in the Appendix.

Inferring the feedback

Since the feedback is qualitative, a mathematical or statistical approach is not required to interpret the review. The researcher can retain, remove or modify the statements of the questionnaire according to whether the experts rate them as essential, not essential or in need of modification. Because we have recommended using the quality parameters of the QAS for describing problems and issues, the researcher will get a precise idea of what needs to be corrected. Remarks by the experts carry additional information in the form of comments or suggestions that are easy to follow when revising the items. General comments at the end of each scale or construct provide suggestions on adding further items to the construct.

Despite the various frameworks available to researchers for developing survey instruments, the quality of such instruments is often not at the desirable level. Content validation of the measuring instrument is an essential requirement of every research study, and a rigorous process of expert validation can avoid problems at a later stage. However, researchers are disadvantaged when operationalising the instrument review process and are challenged with communicating the background information and collecting the feedback. This paper is an attempt to design a standard format for the expert validation of survey instruments. Through a literature review, the expectations of the expert review for validation are identified: the domain of the construct, relevance, accuracy, inclusion or deletion of items, sensitivity, bias, and structural aspects such as language issues and double-barreled, negative, confusing and leading questions need to be validated by the experts. A format is designed with a covering page containing an invitation to the experts, a description of their role, and an introduction to the research and the instrument. Information regarding the scale and the list of scale items is provided on the subsequent pages. The demographic questions are also included for validation. The expert review format provides standard communication and feedback between the researcher and the expert reviewer that can help in developing a rigorous, high-quality survey instrument.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.


Supplementary material associated with this article can be found, in the online version, at doi: 10.1016/j.mex.2021.101326 .

Appendix. Supplementary materials

Open Access

Peer-reviewed

Research Article

The development and structural validity testing of the Person-centred Practice Inventory–Care (PCPI-C)

Brendan George McCormack, Paul F. Slater, Fiona Gilmour, Denise Edgar, Stefan Gschwenter, Sonyia McFadden, Ciara Hughes, Val Wilson, Tanya McCance

Affiliations: Faculty of Medicine and Health, Susan Wakil School of Nursing and Midwifery/Sydney Nursing School, The University of Sydney, Camperdown Campus, New South Wales, Australia; Institute of Nursing and Health Research, Ulster University, Belfast, Northern Ireland; Division of Nursing, Queen Margaret University, Edinburgh, Scotland; Nursing and Midwifery Directorate, Illawarra Shoalhaven Local Health District, New South Wales, Australia; Division of Nursing Science with Focus on Person-Centred Care Research, Karl Landsteiner University of Health Sciences, Krems, Austria; Prince of Wales Hospital, South East Sydney Local Health District, New South Wales, Australia

Published: May 10, 2024 | https://doi.org/10.1371/journal.pone.0303158

Person-centred healthcare focuses on placing the beliefs and values of service users at the centre of decision-making and creating the context for practitioners to do this effectively. Measuring the outcomes arising from person-centred practices is complex and challenging and often adopts multiple perspectives and approaches. Few measurement frameworks are grounded in an explicit person-centred theoretical framework.

In the study reported in this paper, the aim was to develop a valid and reliable instrument to measure the experience of person-centred care by service users (patients)–The Person-centred Practice Inventory-Care (PCPI-C).

Based on the ‘person-centred processes’ construct of an established Person-centred Practice Framework (PCPF), a service user instrument was developed to complement existing instruments informed by the same theoretical framework, the PCPF. An exploratory sequential mixed methods design was used to construct and test the instrument, working with international partners and service users in Scotland, Northern Ireland, Australia and Austria. A three-phase approach was adopted to the development and testing of the PCPI-C. Phase 1, Item Selection: following an iterative process, a list of 20 items was agreed upon by the research team for use in phase 2 of the project. Phase 2, Instrument Development and Refinement: development of the PCPI-C was undertaken through two stages. Stage 1 involved three sequential rounds of data collection using focus groups in Scotland, Australia and Northern Ireland; Stage 2 involved distributing the instrument to members of a global community of practice for person-centred practice for review and feedback, as well as refinement and translation through one-to-one interviews in Austria. Phase 3, Testing Structural Validity of the PCPI-C: a sample of 452 service users took part in this phase of the study. Service users participating in existing cancer research in the UK, Malta, Poland and Portugal, as well as in care homes research in Austria, completed the draft PCPI-C. Data were collected over a 14-month period (January 2021-March 2022). Descriptive statistics and measures of dispersion were generated for all items to help inform subsequent analysis. Confirmatory factor analysis was conducted using maximum likelihood robust extraction to test the 5-factor model of the PCPI-C.

The testing of the PCPI-C resulted in a final 18 item instrument. The results demonstrate that the PCPI-C is a psychometrically sound instrument, supporting a five-factor model that examines the service user’s perspective of what constitutes person-centred care.

Conclusion and implications

This new instrument is generic in nature and so can be used to evaluate how person-centredness is perceived by service users in different healthcare contexts and at different levels of an organisation. Thus, it brings a service user perspective to an organisation-wide evaluation framework.

Citation: McCormack BG, Slater PF, Gilmour F, Edgar D, Gschwenter S, McFadden S, et al. (2024) The development and structural validity testing of the Person-centred Practice Inventory–Care (PCPI-C). PLoS ONE 19(5): e0303158. https://doi.org/10.1371/journal.pone.0303158

Editor: Nabeel Al-Yateem, University of Sharjah, UNITED ARAB EMIRATES

Received: January 26, 2023; Accepted: April 20, 2024; Published: May 10, 2024

Copyright: © 2024 McCormack et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: Data cannot be shared publicly for ethical reasons. Data are available from the Ulster University Institutional Data Access / Ethics Committee (contact via email at [email protected]) for researchers who meet the criteria for access to confidential data.

Funding: The author(s) received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Person-centred healthcare focuses on placing the beliefs and values of service users at the centre of decision-making and creating the context for practitioners to do this effectively. Person-centred healthcare goes beyond other models of shared decision-making as it requires practitioners to work with service users (patients) as actively engaged partners in care [ 1 ]. It is widely agreed that person-centred practice has a positive influence on the care experiences of all people associated with healthcare, service users and staff alike. International evidence shows that person-centred practice has the capacity to have a positive effect on the health and social care experiences of service users and staff [ 1 – 4 ]. Person-centred practice is a complex health care process and exists in the presence of respectful relationships, attitudes and behaviours [ 5 ]. Fundamentally, person-centred healthcare can be seen as a move away from neo-liberal models towards the humanising of healthcare delivery, with a focus on the development of individualised approaches to care and interventions, rather than seeing people as ‘products’ that need to be moved through the system in an efficient and cost-effective way [ 6 ].

Person-centred healthcare is underpinned by philosophical and theoretical constructs that frame all aspects of healthcare delivery, from the macro-perspective of policy and organisational practices to the micro-perspective of person-to-person interaction and experience of healthcare (whether as professional or service user) and so is promoted as a core attribute of the healthcare workforce [ 1 , 7 ]. However, Dewing and McCormack [ 8 ] highlighted the problems of the diverse application of concepts, theories and models all under the label of person-centredness, leading to a perception of person-centred healthcare being poorly defined, non-specific and overly generalised. Whilst person-centredness has become a well-used term globally, it is often used interchangeably with other terms such as ’woman-centredness’ [ 9 ], ’child-centredness’ [ 10 ], ’family-centredness’ [ 11 ], ’client-centredness’ [ 12 ] and ’patient-centredness’ [ 13 ]. In their review of person-centred care, Harding et al [ 14 ] identified three fundamental ‘stances’ that encompass person-centred care— Person-centred care as an overarching grouping of concepts : includes care based on shared-decision making, care planning, integrated care, patient information and self-management support; Person-centred care emphasising personhood : people being immersed in their own context and a person as a discrete human being; Person-centred care as partnership : care imbued with mutuality, trust, collaboration for care, and a therapeutic relationship.

Harding et al. adopt the narrow focus of ’care’ in their review, and others contend that for person-centred care to be operationalised there is a need to understand it from an inclusive whole-systems perspective [ 15 ] and as a philosophy to be applied to all persons. This inclusive approach has enabled the principles of person-centredness to be integrated at different levels of healthcare organisations and thus enable its embeddedness in health systems [ 16 – 19 ]. This inclusive approach is significant as person-centred care is impossible to sustain if person-centred cultures do not exist in healthcare organisations [ 20 , 21 ].

McCance and McCormack [ 5 ] developed the Person-centred Practice Framework (PCPF) to highlight the factors that affect the delivery of person-centred practices. McCormack and McCance published the original person-centred nursing framework in 2006. The Framework has evolved over two decades of research and development activity into a transdisciplinary framework and has made a significant contribution to the landscape of person-centredness globally. Not only does it enable the articulation of the dynamic nature of person-centredness, recognising complexity at different levels in healthcare systems, but it offers a common language and a shared understanding of person-centred practice. The Person-centred Practice Framework is underpinned by the following definition of person-centredness:

[A]n approach to practice established through the formation and fostering of healthful relationships between all care providers, service users and others significant to them in their lives. It is underpinned by values of respect for persons, individual right to self-determination, mutual respect and understanding. It is enabled by cultures of empowerment that foster continuous approaches to practice development [16].

The Person-centred Practice Framework (Fig 1) comprises five domains: the macro context reflects the strategic and political factors that influence the development of person-centred cultures; prerequisites focus on the attributes of staff; the practice environment focuses on the context in which healthcare is experienced; the person-centred processes focus on ways of engaging that are necessary to create connections between persons; and the outcome is the result of effective person-centred practice. The relationships between the five domains of the Person-centred Practice Framework are represented pictorially: to reach the centre of the framework, strategic and policy frames of reference need to be attended to, then the attributes of staff must be considered as a prerequisite to managing the practice environment and to engaging effectively through the person-centred processes. This ordering ultimately leads to the achievement of the outcome, the central component of the framework. It is also important to recognise that there are relationships and overlap between the constructs within each domain.

Fig 1. The Person-centred Practice Framework. https://doi.org/10.1371/journal.pone.0303158.g001

In 2015, Slater et al. [22] developed an instrument for staff to use to measure person-centred practice, the Person-centred Practice Inventory-Staff (PCPI-S). The PCPI-S is a 59-item, self-report measure of health professionals’ perceptions of their person-centred practice. The items in the PCPI-S relate to seventeen constructs across three domains of the PCPF (prerequisites, practice environment and person-centred processes). The PCPI-S has been widely used, translated into multiple languages and has undergone extensive psychometric testing [23-28].

No instrument exists to measure service users’ perspectives of person-centred care that is based on an established person-centred theoretical framework or that is designed to compare with service providers’ perceptions of it. In an attempt to address this gap in the evidence base, this study set out to develop such a valid and reliable instrument. The PCPI-C focuses on the person-centred processes domain, with the intention of measuring service users’ experiences of person-centred care. The person-centred processes are the components of care that directly affect service users’ experiences. The person-centred processes enable person-centred care outcomes to be achieved and include working with the person’s beliefs and values, sharing decision-making, engaging authentically, being sympathetically present and working holistically. Based on the ‘person-centred processes’ construct of the PCPF and relevant items from the PCPI-S, a version for service users was developed.

This paper describes the processes used to develop and test the instrument–The Person-centred Practice Inventory-Care (PCPI-C). The PCPI-C has the potential to enable healthcare services to understand service users’ experience of care and how they align with those of healthcare providers.

Materials and methods

The aim of this research was to develop and test the face validity of a service users’ version of the person-centred practice inventory–The Person-centred Practice Inventory-Care.

The development and testing of the instrument was guided by the instrument development principles of Boateng et al [ 29 ] ( Fig 2 ) and reported in line with the COSMIN guidelines for instrument testing [ 30 , 31 ]. An exploratory sequential mixed methods design was used to construct and test the instrument [ 29 , 30 ] working with international partners and service users. A three-phase approach was adopted to the development and testing of the PCPI-C. As phases 1 and 2 intentionally informed phase 3 (the testing phase), these two phases are included here in our description of methods.

Fig 2. https://doi.org/10.1371/journal.pone.0303158.g002

Ethical approval

Ethics approval was sought and gained for each phase of the study and across each of the participating sites. For phase 2 of the study, a generic research protocol was developed and adapted for use by the Scottish, Australian and Northern Irish teams to apply for local ethical approval. In Scotland, ethics approval was gained from Queen Margaret University Edinburgh Divisional Research Ethics Committee; in Australia, ethics approval was gained from The University of Wollongong and in Northern Ireland ethics approval was gained from the Research Governance Filter Committee, Nursing and Health Research, Ulster University. For phase 3 of the study, secondary analysis of an existing data set was undertaken. For the original study from which this data was derived (see phase 3 for details), ethical approval was granted by the UK Office of Research Ethics Committee Northern Ireland (ORECNI Ref: FCNUR-21-019) and Ulster University Research Ethics Committee. Additional local approvals were obtained for each partner site as required. In addition, a data sharing agreement was generated to facilitate sharing of study data between European Union (EU) sites and the United Kingdom (UK).

Phase 1 –Item selection

An initial item pool for the PCPI-C was identified by <author initials to be added after peer-review> by selecting items from the ‘person-centred processes’ sub-scale of the PCPI-S ( Table 1 ). Sixteen items were extracted, and the wording of the statements was adjusted to reflect a service-user perspective. Additional items were identified (n = 4) to fully represent the construct from a service-user perspective. A final list of 20 items was agreed upon and this 20-item questionnaire was used in Phase 2 of the instrument development.

Table 1. https://doi.org/10.1371/journal.pone.0303158.t001

Phase 2 –Instrument development and refinement

Testing the validity of the PCPI-C was undertaken through three sequential rounds of data collection using focus groups in Scotland, Australia and Northern Ireland. The purpose of these focus groups was to work with service users to share and compare understandings and views of their experiences of healthcare and to consider these experiences in the context of the initial set of PCPI-C items generated in phase 1 of the study. These countries were selected because the lead researchers had established relationships with healthcare partners who were willing to host the research. The inclusion of multiple countries provided different perspectives from service users who used different health services. In Scotland, a convenience sample of service users (n = 11) attending a palliative care day centre of a local hospice was selected. In Australia, a cancer support group for people living with a cancer diagnosis (n = 9) was selected, and in Northern Ireland, people with lived experience who were attending a community group hosted by a cancer charity (n = 9) were selected. All service users were current users of healthcare, so the challenge of memory recall was avoided. The type of conditions/health problems of participants was not the primary concern; instead, we targeted persons who had recent experiences of the health system. The three centres selected were known to the researchers in those geographical areas and relationships were already established, which helped with gaining access to potential participants. Whilst the research team had potential access to other centres in each country, it was evident at focus group 3 that no significant new issues were being identified from the participants, and we therefore agreed not to conduct further rounds of refinement.

A focus group guide was developed (Fig 3). Participants were invited to draw on their experiences as users of the service, particularly remembering what they saw, the way they felt and what they imagined was happening [32]. The participants were invited to complete the PCPI-C independently, and the purpose of the exercise was reiterated, i.e. to think about how each question of the PCPI-C reflected their own experiences and their answers to the questions. Following completion of the questionnaire, participants were asked to comment on each of the 20 questions in the PCPI-C, with a specific focus on their understanding of the question, what they thought about when they read it, and any suggestions to improve readability. The focus group concluded with a discussion of the overall usability of the PCPI-C. Each focus group was audiotaped and the audio recordings were transcribed in full. The facilitators of the focus group then listened to the audio recordings alongside the transcripts, identified the common issues that arose from the discussions and noted them against each of the questions in the draft PCPI-C. Revisions were made to the questions in accordance with the comments and recommendations of the participants. At the end of the analysis phase of each focus group, a table of comments and recommendations mapped to the questions in the instrument was compiled and sent to the whole research team for review and consideration. The comments and recommendations were reviewed by the research team and amendments were made to the draft PCPI-C. The amended draft was then used in the next focus group until a final version was agreed. Focus group 1 was held in Scotland, focus group 2 in Australia and focus group 3 in Northern Ireland. Table 2 presents a summary of the feedback from the final focus group.

Fig 3. https://doi.org/10.1371/journal.pone.0303158.g003

Table 2. https://doi.org/10.1371/journal.pone.0303158.t002

A final stage of development involved distributing the agreed version of the PCPI-C to members of ‘The International Community of Practice for Person-centred Practice’ (PcP-ICoP) for review and feedback. The PcP-ICoP is an international community of higher education, health and care organisations and individuals who are committed to advancing knowledge in the field of person-centredness. No significant changes to the distributed version were suggested by the PcP-ICoP members, but several members requested permission to translate the instrument into their national language. PcP-ICoP members at the University of Vienna, who were leading a large research project with nursing homes in the region of Lower Austria, agreed to undertake a parallel translation project as a priority so they could use the PCPI-C in their research project. The instrument was culturally and linguistically adapted to the nursing home setting in an iterative process by the Austrian research team in collaboration with the international research team. Data were collected through face-to-face interviews by trained research staff. Residents of five nursing homes for older persons in Lower Austria were included. All residents who did not have a cognitive impairment and who were not physically unable to complete the questionnaire because of ill-health (n = 235) were included; 71% of these residents (n = 167) completed the questionnaire. Whilst formal ethical approval for non-intervention studies is not required in Austria, the team sought informed consent from participants. Particular attention was paid throughout the interviews to ensure the ongoing consent of residents through carefully guided conversations.

Phase 3: Testing structural validity of the PCPI-C

The aim of this phase was to test the structural validity of the PCPI-C using confirmatory factor analysis with an international sample of service users. The PCPI-C comprises 20 items measured on a 5-point scale ranging from ‘strongly disagree’ to ‘strongly agree’. The 20 items represent the 5 constructs comprising the final model to be tested, which is outlined in Table 3.

Table 3. https://doi.org/10.1371/journal.pone.0303158.t003

A sample of 452 participants was selected for this phase of the study. The sample comprised two groups. Group 1 (n = 285) were service users with cancer (breast, urological and other) receiving radiotherapy in four cancer treatment centres in four European countries: the UK, Malta, Poland and Portugal. These service users were participants in a wider SAFE EUROPE ( www.safeeurope.eu ) project exploring the education and professional migration of therapeutic radiographers in the European Union. In the UK, a study information poster with a link to the PCPI-C via a Qualtrics © survey was disseminated via UK cancer charity social media websites. Service user information and consent were embedded in the online survey and presented to the participant following the study link. At the non-UK sites, hard-copy English versions of the surveys were available in clinical departments, where a convenience sampling approach was used, inviting everyone in their final few days of radiotherapy to participate. The ‘DeepL Translator’ software (DeepL GmbH, Cologne, Germany) was used to make the necessary terminology adaptations for both the questionnaire and the participant information sheet across the various countries. Fluent speakers based at the participating sites, who were members of the SAFE EUROPE project team, confirmed the accuracy of this process by checking the translated versions against the original English version. Participants were provided with study information and had at least 24 hours to decide if they wished to participate. Willing participants were then invited to provide written informed consent by the local study researcher. The study researcher provided the hard-copy survey to the service user but did not engage with or assist them during completion. Service users were informed they could take the survey home for completion if they wished. Completed surveys were returned to a drop box in the department or returned by post (data collected May 2021-March 2022). Group 2 (n = 125) were residents in nursing homes in Lower Austria. No participating resident had a cognitive impairment, and all were physically able to complete the questionnaire. Data were collected through face-to-face interviews by trained research staff (data collected January 2021-March 2021).

Statistical analysis

Descriptive statistics and measures of dispersion were generated for all items to help inform the subsequent analysis. The appropriateness of conducting factor analysis was assessed using the Kaiser-Meyer-Olkin Measure of Sampling Adequacy and Bartlett's Test of Sphericity. Inter-item correlations were generated to examine for collinearity prior to the full analysis. Confirmatory factor analysis was conducted using maximum likelihood robust extraction to test the 5-factor model.
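
The paper does not state which software was used for these checks. As one possible way of reproducing them, the sketch below uses the Python `factor_analyzer` package; the synthetic data frame stands in for the real PCPI-C item responses and is an assumption for illustration only.

```python
# Sketch: assessing suitability for factor analysis (KMO and Bartlett's test).
# Assumes item responses sit in a pandas DataFrame with one column per item;
# the random data below is a placeholder for the real 452 x 20 response matrix.
import numpy as np
import pandas as pd
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

rng = np.random.default_rng(0)
items = pd.DataFrame(rng.integers(1, 6, size=(452, 20)),
                     columns=[f"item_{i + 1}" for i in range(20)])

chi_square, p_value = calculate_bartlett_sphericity(items)
kmo_per_item, kmo_overall = calculate_kmo(items)

print(f"Bartlett's test: chi2 = {chi_square:.2f}, p = {p_value:.4f}")
print(f"Overall KMO = {kmo_overall:.3f}")  # values closer to 1 support factorability
```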

Acceptable fit statistics were set at a root mean square error of approximation (RMSEA) of 0.05 or below, an upper bound of the 90% confidence interval for the RMSEA below 0.08, a comparative fit index (CFI) of 0.95 or higher, and a standardised root mean square residual (SRMR) below 0.05 [33-35]. Internal consistency was measured using Cronbach's alpha scores for the factors in the accepted factor model.
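
A small helper can make these acceptance thresholds explicit when screening CFA output. The function below simply encodes the cut-offs stated above; the values passed in the example call are illustrative, not the study's results.

```python
# Sketch: applying the fit-statistic thresholds described above to CFA output.
def acceptable_fit(rmsea, rmsea_ci_upper, cfi, srmr):
    """Return True only if all indices meet the cut-offs used in this study."""
    return (rmsea <= 0.05
            and rmsea_ci_upper < 0.08
            and cfi >= 0.95
            and srmr < 0.05)

# Illustrative values only.
print(acceptable_fit(rmsea=0.048, rmsea_ci_upper=0.062, cfi=0.957, srmr=0.041))  # True
```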

The model was re-specified using the modification indices provided in the statistical output until an acceptable fit was achieved and statistically significant relationships were identified. All re-specifications of the model were guided by principles of (1) meaningfulness (a clear theoretical rationale); (2) transitivity (if A is correlated with B, and B with C, then A should correlate with C); and (3) generality (if there is a reason for correlating one pair of errors, then all pairs to which that reason applies should also be correlated) [36].

The following modification acceptance criteria were applied:

  • The items were initially fitted to first-order factors.
  • Correlated error variances were permitted where all items measured the same unidimensional construct.
  • Only statistically significant relationships were retained, to produce as parsimonious a model as possible.
  • Factor loadings above 0.40 were required to provide a strong emergent factor structure.

Factor loading scores were interpreted using Comrey and Lee's [37] guidelines (>.71 = excellent, >.63 = very good, >.55 = good, >.45 = fair and >.32 = poor), and the acceptable factor loading given the sample size (n = 452) was set at >0.3 [33, 38].
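
The banding above can be expressed directly in code. The sketch below labels standardised loadings using Comrey and Lee's bands and the >0.3 retention threshold; the item codes and loading values are invented for illustration.

```python
# Sketch: labelling standardised factor loadings with Comrey and Lee's bands
# and the >0.3 retention threshold applied in this study.
def loading_quality(loading):
    a = abs(loading)
    if a > 0.71:
        return "excellent"
    if a > 0.63:
        return "very good"
    if a > 0.55:
        return "good"
    if a > 0.45:
        return "fair"
    if a > 0.32:
        return "poor"
    return "below interpretive range"

hypothetical_loadings = {"v01": 0.78, "v11": 0.36, "v13": 0.38}  # illustrative only
for item, loading in hypothetical_loadings.items():
    decision = "retain" if abs(loading) > 0.3 else "drop"
    print(f"{item}: {loading:.2f} ({loading_quality(loading)}) -> {decision}")
```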

Results and discussion

Demographic details.

The sample of 452 participants represented an international sample of respondents drawn from five countries: UK (14.6%, n = 66), Portugal (47.8%, n = 216), Austria (27.7%, n = 125), Malta (6.6%, n = 30) and Poland (3.3%, n = 15). Table 4 outlines the demographic characteristics of the sample. The final sample of 452 participants provides an acceptable respondent-to-item ratio of approximately 22:1 [33].

Table 4. https://doi.org/10.1371/journal.pone.0303158.t004

The mean scores indicate that respondents scored the items neutrally. The measures of skewness and kurtosis were acceptable and satisfied the conditions of normality of distribution for further psychometric testing. Examination of the Kaiser-Meyer-Olkin measure (0.947) and Bartlett's test of sphericity (4431.68, df = 190, p = 0.00) indicated the acceptability of performing factor analysis on the items. Cronbach's alpha scores for each of the constructs confirm the acceptability and unidimensionality of each construct.

Examination of the correlation matrix between items shows a range of 0.144 to 0.740, indicating breadth in the areas of care the questionnaire items address, as well as no issues of collinearity. The original measurement model was examined using maximum likelihood extraction, and it had mixed fit statistics. All factor loadings (except for items 11 and 13) were above the threshold of 0.4 (Table 3). Six further modifications were introduced into the original model based on the highest-scored modification indices until the fit statistics were deemed acceptable (see Table 5 for model fit statistics and Fig 4 for item correlated errors). Two item correlated error modifications were within factors and four were between factors. The accepted model factor structure is displayed in Fig 4.

Fig 4. https://doi.org/10.1371/journal.pone.0303158.g004

Table 5. https://doi.org/10.1371/journal.pone.0303158.t005

Measuring person-centred care is a complex and challenging endeavour [39]. In a review of existing measures of person-centred care, DeSilva [39] identified that whilst there are many tools available to measure person-centred care, there was no agreement about which tools were most worthwhile. The complexity of measurement is further reinforced by the multiplicity of terms used that imply a person-centred approach being adopted without explicitly setting out the meaning of the term. Further, person-centred care is multifaceted and comprises a multitude of methods that are held together by a common philosophy of care and organisational goals that focus on service users having the best possible (personalised) experience of care. As DeSilva suggested, “it is a priority to understand what ‘person-centred’ means. Until we know what we want to achieve, it is difficult to know the most appropriate way to measure it” (p 3). However, it remains the case that many of the methods adopted are poorly specified and not embedded in clear conceptual or theoretical frameworks [40, 41]. A clear advantage of the study reported here is that the PCPI-C is embedded in a theoretical framework of person-centredness (the PCPF) that clearly defines what we mean by person-centred practice. The PCPI-C is explicitly informed by the ‘person-centred processes’ domain of the PCPF, which has an explicit focus on the care processes used by healthcare workers in providing healthcare to service-users.

In the development of the PCPI-C, initial items were selected from the Person-centred Practice Inventory-Staff (PCPI-S), and these items are directly connected with the person-centred processes domain of the PCPF. The PCPI-S has been translated, validated and adopted internationally [23-28] and so provides a robust, theoretically informed starting point for the development of the PCPI-C. This starting point contributed to the initial acceptability of the instrument to participants in the focus groups. Like DeSilva [39], McCormack et al [42] and McCormack [41] have argued that measuring person-centred care in isolation from the evaluation of the impact of contextual factors on the care experienced is a limited exercise. As McCormack [41] suggests, “Evaluating person-centred care as a specific intervention or group of interventions, without understanding the impact of these cultural and contextual factors, does little to inform the quality of a service” (p1). Using the PCPI-C alongside other instruments such as the PCPI-S helps to generate contrasting perspectives from healthcare providers and healthcare service users, informed by clear definitions of terms that can be integrated in quality improvement and practice development programmes. The development of the PCPI-C was conducted in line with good practice guidelines in instrument development [29] and underpinned by an internationally recognised person-centred practice theoretical framework, the PCPF [5]. The PCPI-C provides a psychometrically robust tool to measure service users’ perspectives of person-centred care as part of an integrated and multi-faceted approach to evaluating person-centredness more generally in healthcare organisations.

With the advancement of Patient Reported Outcome Measures (PROMs) [43, 44] and Patient Reported Experience Measures (PREMs) [45], and the World Health Organization (WHO) [15] emphasis on the development of people-centred and integrated health systems, greater emphasis has been placed on developing measures to determine the person-centredness of care experienced by service users. Several instruments have been developed to measure the effectiveness of person-centred care in specific services, such as mental health [45], primary care [46, 47], aged care [48, 49] and community care [50]. However, only one other instrument adopts a generic approach to evaluating service users’ experiences of person-centred care [51]. The work of Fridberg et al (the Generic Person-centred Care Questionnaire (GPCCQ)) is grounded in the Gothenburg Centre for Person-centred Care (GPCC) concept of person-centredness: patient narrative, partnership and documentation. Whilst there are clear connections between the GPCCQ and the PCPI-C, a strength of the PCPI-C is that it is set in a broader system of evaluation that views person-centredness as a whole-system issue, with all parts of the system needing to be consistent in the concepts used, definitions of terms and approaches to evaluation. Whilst the PCPI-S evaluates how person-centredness is perceived at different levels of the organisation, using the same theoretical framework and the same definition of terms, the PCPI-C brings a service user perspective to an organisation-wide evaluation framework.

A clear strength of this study lies in the methods employed in phase 2. Capturing service user experiences of healthcare has become an important part of the evaluation of effectiveness. Service user experience evaluation methodologies adopt a variety of methods that aim to capture key transferable themes across patient populations, supported by granular detail of individual specific experience [43]. This kind of service evaluation depends on systematically capturing a variety of experiences across different service-user groups. In the research reported here, service users from a variety of services, including palliative care and cancer services, in three countries engaged in the focus group discussions and were able to discuss their experiences of care freely and consider them in the context of the questionnaire items. The use of focus groups in three different countries enabled different cultural perspectives to be considered in the way participants engaged with discussions and considered the relevance of items and their wording. The sequential approach allowed three rounds of refinement of the items, which enabled the most relevant wording to be achieved. The range of comments and depth of feedback prevented ‘knee-jerk’ changes being made based on one-off comments; instead, it was possible to compare and contrast the comments and feedback and achieve a more considered outcome. The cultural relevance of the instrument was reinforced through its translation into German in Austria, as few changes were made to the original wording in the translation process. This approach combined the capturing of individual lived experience with the systematic generation of key themes that can assist with the systematic evaluation of healthcare services. Further, adopting this approach provides a degree of confidence to users of the PCPI-C that it represents real service-user experiences.

The factorial validity of the instrument was supported by the findings of the study. The modified model's fit indices suggest a good model fit for the sample [31, 34, 35]. The Comparative Fit Index (CFI) falls short of the threshold of >0.95; however, it is above 0.93, which is considered an acceptable level of fit [52]. Examination of the alpha scores confirms the reliability (internal consistency) of each construct [53]. All factor loadings were statistically significant and above the acceptable criterion of 0.3 recommended for the sample size [38]. All but two of the loadings (v11, ‘Staff don’t assume they know what is best for me’, and v13, ‘My family are included in decisions about my care only when I want them to be’) were above the loadings considered good to excellent [37]. At the construct level, previous research by McCance et al [54] showed that all five constructs of the person-centred processes domain of the Person-centred Practice Framework carried equal significance in shaping how person-centred practice is delivered, and this is borne out by the confirmation of a 5-factor model in this study. However, it is also probable that there is a degree of overlap between items across the constructs, reflected in the two items with lower loadings. Other items in the PCPI-C address perspectives on shared decision-making and family engagement, and it was therefore concluded that, based on the theoretical model and statistical analysis, these two items could be removed without compromising the comprehensiveness of the scale, resulting in a final 18-item version of the PCPI-C (available on request).

Whilst a systematic approach to the development of the PCPI-C was adopted, and we engaged with service users in several care settings in different countries, further research is required in the psychometric testing of the instrument across differing conditions and settings and with culturally diverse samples. Whilst the sample does provide an acceptable respondent-to-item ratio, and the sample contains international respondents, the model structure was not examined separately across international settings. Likewise, further research is required across service users with differing conditions and clinical settings. Whilst this is a limitation of the study reported here, the psychometric testing of an instrument is a continuous process and further testing of the PCPI-C is welcomed.

Conclusions

This paper has presented the systematic approach adopted to develop and test a theoretically informed instrument for measuring service users’ perspectives of person-centred care. The instrument is one of the first that is generic and theory-informed, enabling it to be applied as part of a comprehensive and integrated framework of evaluation at different levels of healthcare organisations. Whilst the instrument has good statistical properties, ongoing testing is recommended.

Acknowledgments

The authors of this paper acknowledge the significant contributions of all the service users who participated in this study.

  • 2. Institute of Medicine Committee on Quality of Health Care in America (2001) Crossing the Quality Chasm : A New Health System for the 21st Century . Washington: National Academies Press. https://nap.nationalacademies.org/catalog/10027/crossing-the-quality-chasm-a-new-health-system-for-the (Accessed 20/1/2023).
  • 5. McCance T. and McCormack B. (2021) The Person-centred Practice Framework, in McCormack B, McCance T, Martin S, McMillan A, Bulley C (2021) Fundamentals of Person-centred Healthcare Practice . Oxford. Wiley. PP: 23–32 https://www.perlego.com/book/2068078/fundamentals-of-personcentred-healthcare-practice-pdf?utm_source=google&utm_medium=cpc&gclid=Cj0KCQjwtrSLBhCLARIsACh6Rmj9sarf1IjwEHCseXMsPLGeUTTQlJWYL6mfQEQgwO3lnLkUU9Gb0A8aAgT1EALw_wcB .
  • 7. Nursing and Midwifery Council (2018) Future Nurse : Standards of Proficiency for Registered Nurses . London. Nursing and Midwifery Council. https://www.nmc.org.uk/globalassets/sitedocuments/standards-of-proficiency/nurses/future-nurse-proficiencies.pdf (Accessed 20/1/2023).
  • 14. Harding E., Wait S. and Scrutton J. (2015) The State of Play in Person-centred Care : A Pragmatic Review of how Person-centred Care is Defined , Applied and Measured . London: The Health Policy Partnership.
  • 16. McCormack B. and McCance T. (2017) Person-centred Practice in Nursing and Health Care : Theory and Practice . Chichester, UK: Wiley-Blackwell.
  • 17. Buetow S. (2016) Person-centred Healthcare : Balancing the Welfare of Clinicians and Patients . Oxford: Routledge.
  • 32. Kruger R. A. and Casey M. A., (2000). Focus groups : A practical guide for applied research . 3rd ed. Thousand Oaks, CA: Sage Publications.
  • 33. Kline P., (2014). An easy guide to factor analysis . Oxfordshire. Routledge.
  • 34. Byrne B.M., (2013). Structural equation modeling with Mplus : Basic concepts , applications , and programming . Oxfordshire. Routledge.
  • 35. Wang J. and Wang X., (2019). Structural equation modeling : Applications using Mplus . New Jersey. John Wiley & Sons.
  • 37. Comrey A.L. and Lee H.B., (2013). A first course in factor analysis . New York. Psychology press.
  • 38. Hair J.F., Black W.C., Babin B.J., Anderson R.E. and Tatham R.L., (2018). Multivariate Data Analysis. 8th Edn., New Jersey. Pearson Prentice Hall.
  • 39. DeSilva (2014) Helping measure person-centred care : A review of evidence about commonly used approaches and tools used to help measure person-centred care . London, The Health Foundation. https://www.health.org.uk/publications/helping-measure-person-centred-care .
  • 42. McCormack B, McCance T and Maben J (2013) Outcome Evaluation in the Development of Person-Centred Practice In B McCormack, K Manley and A Titchen (2013) Practice Development in Nursing (Vol 2 ) . Oxford. Wiley-Blackwell Publishing. Pp: 190–211.
  • 43. Irish Platform for Patients Organisations Science & Industry (IPPOSI) (2018). Patient-centred outcome measures in research & healthcare : IPPOSI outcome report . Dublin, Ireland: IPPOSI. https://www.ipposi.ie/wp-content/uploads/2018/12/PCOMs-outcome-report-final-v3.pdf (Accessed 20/1/2023).

  • Open access
  • Published: 15 May 2024

Evaluating the psychometric properties of the simplified Chinese version of PROMIS-29 version 2.1 in patients with hematologic malignancies

  • Qianqian Zhang 1,2,
  • Jinying Zhao 1,2,
  • Yating Liu 1,2,
  • Yan Cui 1,2,
  • Wen Wang 1,2,
  • Junjie Li 1,2,
  • Yanxia Liu 1,2,
  • Fei Tian 1,2,
  • Zhixin Wang 1,2,
  • Huijuan Zhang 1,2,
  • Guiying Liu 1,2,
  • Qiuhuan Li 4,
  • Tingyu Hu 5,
  • Wen Zhang 6 &
  • Wenjun Xie 1,2

Scientific Reports volume 14, Article number: 11153 (2024)


The Patient-Reported Outcomes Measurement Information System 29-item Profile version 2.1 (PROMIS-29 V2.1) is a widely utilized self-reported instrument for assessing health outcomes from the patients’ perspectives. This study aimed to evaluate the psychometric properties of the PROMIS-29 V2.1 Chinese version among patients with hematological malignancy. Conducted as a cross-sectional study, this research was approved by the Medical Ethical Committee of the Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College (registration number QTJC2022002-EC-1). We employed convenience sampling to enroll eligible patients with hematological malignancy from four tertiary hospitals in Tianjin, Shandong, Jiangsu, and Anhui provinces in China between June and August 2023. Participants were asked to complete a socio-demographic information questionnaire, the PROMIS-29 V2.1, and the Functional Assessment of Cancer Therapy-General (FACT-G). We assessed the reliability, ceiling and floor effects, and the structural, convergent, discriminant and criterion validity of the PROMIS-29 V2.1. A total of 354 patients with a mean age of 46.93 years were included in the final analysis. The reliability of the PROMIS-29 V2.1 was affirmed, with Cronbach’s α for the domains ranging from 0.787 to 0.968. Except for sleep disturbance, the other six domains showed ceiling effects: physical function (26.0%), anxiety (37.0%), depression (40.4%), fatigue (18.4%), social roles (18.9%) and pain interference (43.2%). Criterion validity was supported by significant correlations between the PROMIS-29 V2.1 and FACT-G scores, as determined by the Spearman correlation test (P < 0.001). Confirmatory factor analysis (CFA) indicated a good model fit, with indices of χ2/df (2.602), IFI (0.960), and RMSEA (0.067). The Average Variance Extracted (AVE) values for the seven dimensions of the PROMIS-29 V2.1, ranging from 0.500 to 0.910, demonstrated satisfactory convergent validity. Discriminant validity was confirmed by ideal √AVE values. The Chinese version of the PROMIS-29 V2.1 profile has been validated as an effective instrument for assessing symptoms and functions in patients with hematological malignancy, underscoring its reliability and applicability in this specific patient group.

Introduction

Hematological Malignancy (HM) represents a complex group of highly malignant tumor diseases that are challenging to treat. According to 2020 WHO statistics, the incidence rate of leukemia in China was 5.9 per 100,000, while the rates for non-Hodgkin lymphoma, multiple myeloma and Hodgkin lymphoma were 6.4, 0.47 and 1.5 per 100,000, respectively 1. Patients afflicted with HM often grapple with a myriad of physical, psychological, and social challenges, exacerbated by both the disease and its associated treatments. In the realm of cancer care, the frequent and precise assessment of symptoms is paramount. Patient-reported outcome (PRO) measures have emerged as a gold standard, offering invaluable insights into patients’ subjective experiences and overall quality of life 2. These tools are instrumental in fostering enhanced patient-nurse communication, enabling systematic monitoring, and facilitating tailored management of patients’ symptoms, thereby promoting patient-centered care 3, 4, 5, 6.

The Patient-Reported Outcomes Measurement Information System (PROMIS), an initiative by the National Institutes of Health, is renowned for its innovative self-report measures designed to evaluate the physical, mental, and social facets of health and well-being 7 . The versatility and comprehensiveness of PROMIS have garnered significant attention, marking it as a pivotal tool in the holistic assessment of individual health 2 , 8 , 9 . PROMIS includes item banks that can be administered using computer-adaptive testing, short forms for individual domains, and profiles that yield information about multiple domains for use in clinical trials, observational studies, and clinical practice 7 .

The PROMIS-29 V2.1, in particular, stands out for its robust design, aimed at addressing the gap in universal and generalizable measures for assessing core patient-reported symptoms and functional domains in individuals with chronic diseases 10 . Developed through meticulous processes including literature review, Item Response Theory (IRT) analysis, and expert reviews, the PROMIS-29 V2.1 ensures a comprehensive and standardized evaluation of patients’ health statuses 10 .

Although the PROMIS-29 V2.1 has been translated into Chinese by the PROMIS National Center-China (PNC-China), its application in the context of HM remains limited. There is a conspicuous absence of validation studies exploring the efficacy and reliability of PROMIS-29 V2.1 among HM patients. Given the critical need for nuanced assessments of physical, social, and mental health in this demographic, validating the PROMIS-29 V2.1 could not only enhance clinical practices but also pave the way for international comparative studies.

In light of this, our study is poised to conduct an exhaustive psychometric evaluation of the Chinese version of PROMIS-29 V2.1 among a selected cohort of HM patients in mainland China. We aim to delineate its reliability, validity, and potential applications in this specific medical and cultural context.

Study design

This multicenter cross-sectional study received approval from the Medical Ethical Committee of the Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College (registration number QTJC2022002-EC-1). We adhered to the Consensus-based standards for the selection of health measurement instruments (COSMIN) guidelines to evaluate the psychometric properties of the Chinese version of the PROMIS-29 V2.1 among hematological malignancy patients.

Setting and sample

Patients were conveniently sampled from the hematology departments of four tertiary hospitals across Tianjin, Shandong, Jiangsu, and Anhui provinces in China, between June and August 2023. Based on the 5-10:1 case-to-variable ratio for psychometric evaluation and accounting for a potential 20% invalid sample rate, we aimed for a sample size between 174 and 348 and successfully included 354 cases 9,11. The sample size was also sufficient to perform stable and precise model estimation by confirmatory factor analysis (CFA) 11.
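
Assuming the 20% allowance for invalid responses is applied multiplicatively to the item-based range, the target figures can be reproduced as follows; this is only a restatement of the arithmetic above.

```python
# Sketch: target sample-size range for 29 items at 5-10 cases per item,
# inflated by 20% to allow for invalid responses.
n_items = 29
lower = n_items * 5 * 1.2    # 174.0
upper = n_items * 10 * 1.2   # 348.0
print(int(lower), int(upper))  # 174 348
```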

Patients were eligible for the study if they met the following criteria: (a) aged 18 or older, (b) had a diagnosis of hematological malignancy, including leukemia, lymphoma, myeloma, myelodysplastic neoplasms and myeloproliferative neoplasms, (c) were able to speak Mandarin and read Chinese, and (d) signed an informed consent form. Patients with psychiatric illness, cognitive impairment or a diagnosis of another cancer type were excluded.

Socio-demographic information questionnaire

A sociodemographic information questionnaire was developed to collect sociodemographic and clinical data including gender, age, residential location, education level, marital status, job, employment status, health insurance, average monthly family income, primary caregiver, diagnosis, time since diagnosis, medical costs, treatment phase and medical treatment. Patients self-reported sociodemographic data, while trained nurse researchers extracted clinical data from medical records.

PROMIS-29 V2.1

The Chinese version of the PROMIS-29 V2.1, authorized by PNC-China, was used in this study. The PROMIS-29 V2.1 consists of 29 items measuring seven health and function domains (physical function, anxiety, depression, fatigue, sleep disturbance, ability to participate in social roles and activities, and pain interference), plus a single pain intensity item. Except for the single pain intensity item, all domains include 4 items and are answered on a five-point Likert scale from 1 to 5. The pain intensity item is answered with a 0 to 10 numeric rating scale ranging from 0 (without pain) to 10 (worst pain imaginable). Item scores in each domain were summed and transformed onto a T-score metric: values of 50 (SD = 10) indicate the mean of the U.S. general population ( http://www.healthmeasures.net ) 7. For physical function and social roles, higher scores indicate better functioning and quality of life (QOL). For depression, anxiety, fatigue, pain interference, pain intensity, and sleep disturbance, a higher score indicates more serious implications of disease 7.
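
PROMIS raw-score-to-T-score conversion relies on the domain-specific scoring tables published at healthmeasures.net. The sketch below shows only the mechanics of such a lookup; the table values are invented placeholders, and real scoring must use the official tables.

```python
# Sketch: converting a PROMIS domain raw score (sum of four 1-5 items) to a
# T-score via a lookup table. The table below is a HYPOTHETICAL placeholder,
# not the official HealthMeasures conversion values.
HYPOTHETICAL_ANXIETY_TABLE = {4: 40.3, 5: 48.0, 6: 51.2, 7: 53.7, 8: 55.8}  # raw -> T

def domain_t_score(item_scores, conversion_table):
    raw = sum(item_scores)           # e.g. four items each scored 1-5
    return conversion_table[raw]

print(domain_t_score([1, 2, 2, 1], HYPOTHETICAL_ANXIETY_TABLE))  # 51.2
```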

The FACT-G is one of the most frequently used questionnaires for measuring health-related quality of life (HRQOL) in patients with cancer. It comprises four subscales: physical wellbeing (PWB, 7 items, 0–28), social/family wellbeing (SWB, 7 items, 0–28), emotional wellbeing (EWB, 6 items, 0–24), and functional wellbeing (FWB, 7 items, 0–28) 12. All items in the FACT-G use a five-point rating scale (0 = not at all, 1 = a little bit, 2 = somewhat, 3 = quite a bit, and 4 = very much). Twelve items (PWB1 to PWB7, EWB1, and EWB3 to EWB6) are reverse-scored. The total score of the scale is 108, and the higher the score, the higher the quality of life 12,13.
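
As a rough illustration of the scoring logic, the sketch below reverse-scores the listed items on the 0-4 response scale (reverse score = 4 minus the raw response) and sums across items; the example responses are invented, and official FACIT scoring guidance should be followed in practice.

```python
# Sketch: FACT-G style scoring with reverse-scored items on the 0-4 scale.
# Item identifiers follow the subscale naming above; responses are illustrative.
REVERSE_ITEMS = {f"PWB{i}" for i in range(1, 8)} | {"EWB1", "EWB3", "EWB4", "EWB5", "EWB6"}

def fact_g_total(responses):
    """responses: dict mapping item id -> raw 0-4 response (all 27 items in practice)."""
    total = 0
    for item, raw in responses.items():
        total += (4 - raw) if item in REVERSE_ITEMS else raw
    return total  # higher totals indicate better quality of life (maximum 108)

example = {"PWB1": 1, "PWB2": 0, "EWB1": 2, "SWB1": 3, "FWB1": 4}  # partial, illustrative
print(fact_g_total(example))  # 16
```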

Data collection

Eligible patients were enrolled during hospitalization by the trained nurse researchers at each study site, who had received training regarding the study process to ensure the standardization of the data collection. All the participants were informed about the purpose and procedures of the study, and verbal consent was obtained before data collection. In addition, participants were informed of the voluntary nature of participation, participants’ rights, and the confidentiality of the data. Participants could choose to complete the survey either on paper or using web-based questionnaires based on their preferences. Data on every respondent were collected only once. The participants were required to return the questionnaire immediately after completion. To express gratitude, all participants were given a bottle of hand sanitizer after completing the questionnaire.

Data analysis

Analyses were conducted using IBM SPSS version 21.0 and IBM SPSS Amos Graphics (version 26.0). All significance tests were 2-tailed, with p < 0.05 considered significant.

Descriptive statistics were calculated for sample characteristics and study variables, in which continuous variables were analyzed by means and standard deviations, and categorical variables were described by counts and percentages. The PROMIS-29 V2.1 raw scores were transformed into T-scores based on the PROMIS guidelines ( http://www.healthmeasures.net ). Ceiling or floor effects were identified if more than 15% of responses were at the best or the worst possible score, respectively. Reliability was assessed via Cronbach’s α coefficient, Composite Reliability (CR) and split-half reliability.
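
The 15% rule can be checked per domain with a few lines of code. The sketch below is a generic illustration; the synthetic scores and the choice of "best" and "worst" values are assumptions (for symptom domains the best possible score is the minimum).

```python
# Sketch: flagging ceiling and floor effects for one domain using the 15% rule.
import numpy as np

def ceiling_floor(scores, best, worst, threshold=0.15):
    """Share of respondents at the best/worst possible score for a domain."""
    scores = np.asarray(scores)
    pct_best = float(np.mean(scores == best))
    pct_worst = float(np.mean(scores == worst))
    return {"pct_best": pct_best, "ceiling_effect": pct_best > threshold,
            "pct_worst": pct_worst, "floor_effect": pct_worst > threshold}

# Synthetic raw scores for a four-item symptom domain (range 4-20, lower = better).
rng = np.random.default_rng(1)
demo_scores = rng.integers(4, 21, size=354)
print(ceiling_floor(demo_scores, best=4, worst=20))
```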

Criterion validity was determined by correlating the PROMIS-29 V2.1 domains with similar constructs in the FACT-G, using Spearman correlation coefficients. Confirmatory factor analysis (CFA) with maximum likelihood estimation was carried out to examine the construct validity of the PROMIS-29 V2.1 domains. Goodness of model fit was evaluated with the χ²/degrees of freedom ratio (χ²/df), root mean square error of approximation (RMSEA), goodness-of-fit index (GFI), comparative fit index (CFI), incremental fit index (IFI), normed fit index (NFI), and Tucker–Lewis index (TLI). An acceptable CFA model should have χ²/df < 3, RMSEA < 0.08, and GFI, CFI, IFI, NFI, and TLI > 0.9 14 . The average variance extracted (AVE) and its square root (√AVE) were used to assess convergent validity and discriminant validity.
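
The CFA itself was fitted in AMOS; the sketch below only illustrates the two rules stated in this paragraph, computing a Spearman criterion correlation with SciPy and checking the stated fit-index thresholds. The function names and example values are hypothetical.

```python
from scipy.stats import spearmanr

def criterion_correlation(promis_domain_scores, factg_dimension_scores):
    """Spearman correlation between a PROMIS-29 domain and the matching FACT-G dimension."""
    rho, p_value = spearmanr(promis_domain_scores, factg_dimension_scores)
    return rho, p_value

def acceptable_cfa_fit(chi2, df, rmsea, gfi, cfi, ifi, nfi, tli):
    """Apply the thresholds stated above: chi2/df < 3, RMSEA < 0.08,
    and GFI, CFI, IFI, NFI, TLI all > 0.9."""
    return (chi2 / df < 3
            and rmsea < 0.08
            and all(index > 0.9 for index in (gfi, cfi, ifi, nfi, tli)))

# Illustrative use with made-up scores for ten patients
promis_pf = [38, 42, 45, 50, 33, 41, 47, 39, 52, 44]
factg_pwb = [14, 18, 20, 24, 10, 17, 22, 15, 26, 19]
print(criterion_correlation(promis_pf, factg_pwb))
print(acceptable_cfa_fit(chi2=260.2, df=100, rmsea=0.067, gfi=0.85,
                         cfi=0.96, ifi=0.96, nfi=0.94, tli=0.95))  # False: GFI <= 0.9
```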

Ethics approval and consent to participate

All participants signed written informed-consent forms and completed the questionnaires at their earliest convenience. Ethical approval was granted by the Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College (No. QTJC2022002-EC-1).

Sample characteristics

A total of 400 questionnaires were distributed. Of the eligible patients approached, 29 did not consent to participate, while 371 agreed to be involved. A further 17 questionnaires were excluded because the participants circled the same response choice for every question, leaving 354 questionnaires for the final analysis. The average age of the patients was 46.93 years. A majority of the participants were male (57.3%), married (78.8%), and unemployed (78.2%). In terms of education, the largest group had completed high school or an equivalent level of education (39.5%). Most participants were covered by employee health insurance (60.5%), and the most common income bracket was ¥3001–¥5000 per month (25.7%). Clinically, leukemia was the most common diagnosis, accounting for 43.8% of patients. A substantial portion (32.8%) had been diagnosed less than 6 months earlier, and the majority (83.6%) were undergoing treatment at the time of the survey. See Table 1 for details.

Reliability analysis

For the reliability analysis, internal consistency coefficients, CR, and the split-half coefficient were calculated. Reliability was excellent for the overall PROMIS-29 V2.1, with a Cronbach’s α of 0.965 and a split-half coefficient of 0.927. Across the seven PROMIS-29 V2.1 domains, Cronbach’s α ranged from 0.787 (sleep disturbance) to 0.968 (pain interference and intensity) and CR ranged from 0.778 (sleep disturbance) to 0.976 (pain interference and intensity), all above the threshold of 0.70, indicating sufficient reliability. See Tables 2 and 6.

Descriptive statistics, ceiling, and floor statistics

Regarding the mean T-scores of the PROMIS-29 V2.1, except for physical function (41.31 ± 11.85) and the ability to participate in social roles and activities (47.64 ± 11.38), the scores of the other five domains were significantly above the reference level according to the PROMIS guidelines ( http://www.healthmeasures.net ). See Table 3.

Floor effects reflect the percentage of respondents who report the worst possible score; ceiling effects reflect the percentage who report the best possible score. As noted in the Methods, ceiling or floor effects were identified if more than 15% of responses fell at the best or the worst possible score; for physical function and social roles higher scores indicate better functioning and QOL, whereas for depression, anxiety, fatigue, pain interference, pain intensity, and sleep disturbance higher scores indicate more serious implications of disease. As shown in Table 3, all domains except sleep disturbance showed ceiling effects: physical function (26.0%), anxiety (37.0%), depression (40.4%), fatigue (18.4%), social roles (18.9%), and pain interference (43.2%).

Criterion validity

Normality tests showed that the PROMIS-29 V2.1 and FACT-G scores were not normally distributed, so Spearman correlation analysis was used. The absolute values of the correlation coefficients between the PROMIS-29 V2.1 domain scores and the corresponding FACT-G dimensions ranged from 0.156 to 0.752 (p < 0.001), indicating satisfactory criterion validity. See Table 4.

Construct validity

In our analysis, the PROMIS-29 V2.1 demonstrated excellent construct validity among patients with HM, as evidenced by a χ²/df of 2.602, an IFI of 0.960, and an RMSEA of 0.067. While the GFI was slightly below the ideal threshold at 0.850, the other indices, including the AGFI, NFI, CFI, and TLI, ranged from 0.937 to 0.960, supporting a good model fit (Table 5). The revised model is illustrated in Fig. 1.

Figure 1. Confirmatory factor analysis model for the PROMIS-29 V2.1 (F1–F7: anxiety, depression, physical function, fatigue, sleep disturbance, ability to participate in social roles and activities, and pain interference, respectively).

Convergent validity

The average variance extracted (AVE) is calculated from the squared standardized factor loadings of a domain’s items and represents how much of the items’ variance the latent variable explains. In general, the larger the AVE, the better the latent variable explains its corresponding items and, conversely, the better the items express the properties of the latent variable. Convergent validity is considered good when AVE > 0.5 15 and acceptable when AVE is between 0.36 and 0.5 15 . In this study, the AVE values for the seven PROMIS-29 V2.1 domains ranged from 0.500 to 0.910, and the factor loadings, which reflect the relationships between the items and their respective constructs, were notably high across most domains, indicating satisfactory convergent validity. See Table 6.
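
For reference, the usual formula for the AVE of a domain with k items and standardized loadings λᵢ is the mean of the squared loadings (a standard definition, stated here for clarity rather than quoted from the article):

```latex
\mathrm{AVE} \;=\; \frac{1}{k}\sum_{i=1}^{k} \lambda_i^{2}
```

With this definition, AVE > 0.5 means that, on average, a domain’s items share more variance with their latent construct than with measurement error.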

Discriminant validity

In this study, the seven PROMIS-29 V2.1 domains were significantly correlated with one another (p < 0.01), yet every absolute correlation coefficient was smaller than the corresponding √AVE, indicating that the latent variables are related but still clearly distinguishable, i.e., the instrument shows ideal discriminant validity. See Table 7.

This study is pioneering in its endeavor to evaluate the psychometric properties of the Chinese version of the PROMIS-29 V2.1 profile among patients with HM. Our findings affirm the reliability and validity of this instrument in capturing the multifaceted health status, encompassing physical, mental, and social dimensions, of this specific patient group.

Regarding reliability, Cronbach’s alpha is considered an adequate measure of internal consistency 16 . Composite reliability (CR) reflects whether all items of a latent variable consistently measure that latent variable; values above 0.70 indicate good CR 17 . Compared with Cronbach’s α, CR incorporates the different factor loadings of the individual items, so its estimate is generally closer to the true internal consistency of the scale 17 . In this study, both the Cronbach’s α and the CR of all domains were close to or exceeded the more stringent criterion of 0.9, providing evidence of high internal consistency reliability.
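
For reference, the composite reliability of a domain with k items and standardized loadings λᵢ is commonly computed as (again a standard formula, not taken from the article):

```latex
\mathrm{CR} \;=\;
\frac{\left(\sum_{i=1}^{k}\lambda_i\right)^{2}}
     {\left(\sum_{i=1}^{k}\lambda_i\right)^{2} + \sum_{i=1}^{k}\left(1-\lambda_i^{2}\right)}
```

Because the loadings enter the numerator directly, items that load more strongly contribute more to CR, which is why CR tracks the factor structure more closely than Cronbach’s α does.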

The T-scores derived from the PROMIS-29 V2.1 highlighted an apparent diminution in physical function and social participation compared to the reference group. This underscores a pronounced impairment in physical activities and social engagement. The results were similar to those of patients with breast cancer 2 and systemic sclerosis 18 .

Evidence of floor and ceiling effects was observed in some PROMIS-29 V2.1 domains, as has also been noted in other PROMIS validation projects 8 , 19 . Floor and ceiling effects describe the proportion of respondents who achieve the worst or the best possible score and reflect the shape of the score distribution 16 . Floor or ceiling effects are considered present if more than 15% of respondents achieve the worst or the best possible score, respectively 16 , 20 .

In our study, a significant proportion of participants reported minimal symptoms in the anxiety, depression, fatigue, and pain domains, in line with general population trends. The pronounced ceiling effects in every domain except sleep disturbance could be attributed to the fact that a majority of our sample were undergoing treatment, which may have amplified these effects. Nevertheless, this would not be problematic when identifying those with poor physical performance, and such limitations may not arise in a future sample including more patients at different stages of the disease.

Criterion validity was demonstrated by the instrument’s correlations of varying strength with the FACT-G. Criterion validity refers to the extent to which scores on a particular instrument relate to a gold standard 16 . Current studies on PROs or QOL in people with HM usually use the FACT-G as the assessment tool 21 , 22 . Spearman correlation coefficients > 0.50 were considered strong, 0.30–0.50 moderate, and < 0.30 weak 15 . In this study, the PROMIS-29 V2.1 domains showed adequate correlations with all corresponding dimensions of the FACT-G (p < 0.01).

CFA showed that the Chinese version of the PROMIS-29 V2.1 had good evidence of construct validity in patients with HM, supporting the presence of the seven domains. According to the goodness-of-fit criteria, a model fits well when χ²/df < 3, IFI > 0.9, and RMSEA < 0.08 after modification; in addition, the five fit indices GFI, AGFI, NFI, CFI, and TLI all lie between 0 and 1, with values closer to 1 indicating better fit 15 . The goodness-of-fit indices for the original domain structure of the PROMIS-29 V2.1 were high, and the instrument also showed satisfactory convergent and discriminant validity. These results underscore the robust structure of the PROMIS-29 V2.1 in capturing the multifaceted health outcomes of patients with HM.

Convergent validity, evaluated here with the AVE index, means that items measuring the same underlying domain should belong to the same dimension and should correlate highly with one another 15 . In this study, the AVE values for all seven domains of the PROMIS-29 V2.1 were examined, offering insight into the measure’s convergent validity among patients with HM. These findings underscore the instrument’s robustness in capturing the intended constructs with minimal measurement error, attesting to its utility in this patient population. The consistency of the factor loadings further strengthens confidence in the PROMIS-29 V2.1’s ability to offer reliable, nuanced insights into the multifaceted health outcomes of patients with HM.

Discriminant validity evaluates the extent to which a construct is distinct from other constructs, ensuring that it is not highly correlated with variables from which it should theoretically differ 15 . In this context, it is assessed by comparing the √AVE for each construct with the correlations between that construct and the others; ideal discriminant validity is achieved when the √AVE for each construct is greater than its highest correlation with any other construct 15 . In our study, the PROMIS-29 V2.1 demonstrated excellent discriminant validity among patients with hematologic malignancies. For instance, while there was a notable correlation between anxiety and depression (r = 0.900, p < 0.01), the √AVE values for these constructs were 0.934 and 0.937, respectively, exceeding the correlation coefficient. This pattern was consistent across all construct pairs, underscoring the instrument’s ability to distinguish between different aspects of patients’ health and wellbeing. These findings affirm the multidimensionality of the PROMIS-29 V2.1 and its applicability in capturing a broad spectrum of health outcomes among patients with hematologic malignancies without conflating distinct constructs.
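
This comparison is the Fornell–Larcker criterion; a minimal sketch, assuming an AVE vector and an inter-construct correlation matrix as inputs (the function and variable names are illustrative), is:

```python
import numpy as np

def fornell_larcker_satisfied(ave_values, correlation_matrix):
    """Fornell-Larcker check: the square root of each construct's AVE must exceed
    that construct's absolute correlations with every other construct."""
    sqrt_ave = np.sqrt(np.asarray(ave_values, dtype=float))
    corr = np.asarray(correlation_matrix, dtype=float)
    n = len(sqrt_ave)
    for i in range(n):
        for j in range(n):
            if i != j and abs(corr[i, j]) >= min(sqrt_ave[i], sqrt_ave[j]):
                return False
    return True

# Two-construct example using the anxiety/depression figures quoted above:
# r = 0.900 with sqrt(AVE) values of 0.934 and 0.937, so the criterion holds.
ave = [0.934 ** 2, 0.937 ** 2]
corr = [[1.0, 0.900],
        [0.900, 1.0]]
print(fornell_larcker_satisfied(ave, corr))  # True
```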

To sum up, these findings reinforce the utility of the Chinese version of the PROMIS-29 V2.1 as a reliable tool, mirroring the intricate nuances of patients’ experiences and outcomes. This congruence in outcomes underscores the PROMIS-29 V2.1’s potential as a pivotal tool in both clinical and research settings for this patient population.

Limitations

However, this study has several limitations. First, the participant pool, though multicentric, was confined to tertiary hospitals in China, warranting caution in extrapolating these findings to broader settings and populations. Second, the cross-sectional design precludes insight into the instrument’s responsiveness and interpretability across varying clinical states, marking an avenue for future longitudinal studies. Third, this study does not examine how the questionnaire performs in pre- and post-treatment patient populations, which we plan to explore in future work.

This study meticulously evaluated the psychometric properties of the Chinese version of the PROMIS-29 V2.1 in patients with HM, utilizing a comprehensive, multicenter sample. Our findings affirm that this version of the PROMIS-29 V2.1 is a valid and reliable instrument, adept at measuring a spectrum of symptoms and functional attributes in patients with HM. However, the evolution of this instrument’s applicability does not end here. Future studies should consider incorporating item response theory (IRT) methodologies; this approach would facilitate a nuanced, item-level analysis of performance, enhancing the precision and applicability of the instrument. In conclusion, our study not only underscores the psychometric properties of the Chinese version of the PROMIS-29 V2.1 but also paves the way for its widespread adoption in assessing and monitoring symptoms and functions among Chinese patients with HM.

Data availability

All data generated or analyzed during this study are included in this published article.

Ferlay, J. et al. Global Cancer Observatory: Cancer Today (International Agency for Research on Cancer, 2020).


Cai, T. et al. Validity and reliability of the Chinese version of the Patient-Reported Outcomes Measurement Information System adult profile-57 (PROMIS-57). Health Qual. Life Outcomes 20 (1), 95 (2022).


Goswami, P. et al. HM-PRO: A novel patient-reported outcome measure in hematological malignancy for use in clinical practice. Blood 130 , 2176 (2017).

Thompson, C. A. et al. Association between patient-reported outcomes and physical activity measured on the apple watch in patients with hematological malignancies. Blood 130 , 2179 (2017).

Goswami, P. et al. Translating the science of patient reported outcomes into practice: Meaningfulness of HM-PRO scores in patients with hematological malignancies. Blood 138 , 4860 (2018).


Cordoba, R. et al. EUROQoL-5D as a valid patient-reported outcome measurement (PROM) tool to predict health-related quality of life (HRQoL) and survival in patients with hematological malignancies. J. Clin. Oncol. 38S , e19141 (2020).

Hays, R. D. et al. PROMIS®-29 v2.0 profile physical and mental health summary scores. Qual. Life Res. 27 (7), 1885–1891 (2018).

Huang, W. et al. Preliminary evaluation of the Chinese version of the patient-reported outcomes measurement information system 29-item profile in patients with aortic dissection. Health Qual. Life Outcomes 20 (1), 94 (2022).

Cai, T. et al. Psychometric evaluation of the PROMIS social function short forms in Chinese patients with breast cancer. Health Qual. Life Outcomes 19 (1), 149 (2021).

Cella, D. et al. PROMIS® adult health profiles: Efficient short-form measures of seven health domains. Value Health 22 (5), 537–544 (2019).

Kahn, J. H. Factor analysis in counseling psychology research, training, and practice: Principles, advances, and applications. Couns. Psychol. 34 (5), 684–718 (2006).

Meregaglia, M. et al. Mapping health-related quality of life scores from FACT-G, FAACT, and FACIT-F onto preference-based EQ-5D-5L utilities in non-small cell lung cancer cachexia. Eur. J. Health Econ. 20 (2), 181–193 (2019).


Iravani, K. et al. Assessing whether EORTC QLQ-30 and FACT-G measure the same constructs of quality of life in patients with total laryngectomy. Health Qual. Life Outcomes 16 (1), 183 (2018).

Onde, D. & Alvarado, J. M. Reconsidering the conditions for conducting confirmatory factor analysis. Span. J. Psychol. 23 , e55 (2020).

Minglong, W. Structural Equation Modeling-Operation and application of AMOS 2nd edn. (Chongqing University Press, 2010).

Terwee, C. B. et al. Quality criteria were proposed for measurement properties of health status questionnaires. J. Clin. Epidemiol. 60 (1), 34–42 (2007).

Wu, M. Structural Equation Models: Operation and Application of AMOS 1st edn. (Chongqing University Press, 2022).

Morrisroe, K. et al. Validity of the PROMIS-29 in a large Australian cohort of patients with systemic sclerosis. J. Scleroderma Relat. Disord. 2 (3), 188–195 (2017).

Mcmullen, K. et al. Validation of PROMIS-29 domain scores among adult burn survivors: A National Institute on Disability, Independent Living, and Rehabilitation Research Burn Model System Study. J. Trauma Acute Care Surg. 92 (1), 213–222 (2022).

Gulledge, C. M. et al. What are the floor and ceiling effects of patient-reported outcomes measurement information system computer adaptive test domains in orthopaedic patients? A systematic review. Arthroscopy 36 (3), 901 (2020).

Sakellari, I. et al. A prospective study of incidence, clinical and quality of life consequences of oral mucositis post palifermin prophylaxis in patients undergoing high-dose chemotherapy and autologous hematopoietic cell transplantation. Ann. Hematol. 94 (10), 1733–1740 (2015).


Hudson, K. E. et al. The surprise question and identification of palliative care needs among hospitalized patients with advanced hematologic or solid malignancies. J. Palliat. Med. 21 (6), 789–795 (2018).


Acknowledgements

We thank all patients and the data collection team who participated in this study.

This work was supported by the Chinese Nursing Association Research Project (ZHKY202115), CAMS Innovation Fund for Medical Sciences (CIFMS) (2022-I2M-C&T-B-093), Special Research Fund for Central Universities, Peking Union Medical College (3332023063), The fourth batch of Management research projects of Hematology Hospital, Chinese Academy of Medical Sciences (GL2309, GL2314).

Author information

Authors and affiliations.

State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Haihe Laboratory of Cell Ecosystem, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin, 300020, China

Qianqian Zhang, Jinying Zhao, Yating Liu, Yan Cui, Wen Wang, Junjie Li, Yanxia Liu, Fei Tian, Zhixin Wang, Huijuan Zhang, Guiying Liu & Wenjun Xie

Tianjin Institutes of Health Science, Tianjin, 301600, China

The First Affiliated Hospital of the University of Science and Technology of China (Anhui Provincial Hospital), Anhui, 230001, China

Qilu Hospital of Shandong University, Shandong, 250012, China

The Affiliated Hospital of Xuzhou Medical University, Jiangsu, 221002, China

School of Nursing, Fudan University, 305 Fenglin Road, Shanghai, 200032, China


Contributions

WJX designed the research. QQZ analyzed data and wrote the manuscript; JYZ, YTL, YC, WW, JJL, YXL, FT, ZXW, HJZ, GYL, YW, QHL, TYH collected patient data and managed database. WZ contributed to data processing and critically edited the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Wenjun Xie .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Zhang, Q., Zhao, J., Liu, Y. et al. Evaluating the psychometric properties of the simplified Chinese version of PROMIS-29 version 2.1 in patients with hematologic malignancies. Sci Rep 14 , 11153 (2024). https://doi.org/10.1038/s41598-024-61835-4


Received : 03 November 2023

Accepted : 10 May 2024

Published : 15 May 2024

DOI : https://doi.org/10.1038/s41598-024-61835-4


Keywords
  • Psychometric evaluation
  • Hematological malignancy
  • Patient-reported outcomes


The development of scientific literacy test instruments on Newton's law materials for high school students

  • Wati, Mustika
  • Mahtari, Saiyidah

The lack of use of scientific literacy instruments in schools leaves students unfamiliar with scientific literacy, so research and development of test instruments that can measure students' scientific literacy abilities is needed. This research aims to describe the validity, reliability, difficulty level, and discriminating power of the test instruments. This development research uses the Borg and Gall adaptation model with 199 senior high school students in Banjarmasin as test subjects. Design validation, validity, reliability, difficulty level, and discriminating power were analyzed using Aiken validity, classical test theory, and item response theory with the Rasch program. The results showed: (1) design validation based on Aiken validity and instrument validity was valid, (2) instrument reliability met excellent criteria, (3) the difficulty level of the instrument fell in the difficult and moderate categories, and (4) the discriminating-power results identified one question with poor criteria and three biased questions, so three items were discarded. It can be concluded that the scientific literacy test instrument can be used to measure scientific literacy skills and evaluate student learning outcomes.

  • PHYSICS EDUCATION

ORIGINAL RESEARCH article

Adaptation of the Internet Business Self-Efficacy Scale for Peruvian students with a commercial profile (provisionally accepted)

  • 1 Peruvian Union University, Peru
  • 2 Scientific University of the South, Peru


Introduction: Given the lack of instruments to evaluate the sense of efficacy regarding entrepreneurial capacity in Peruvian university students, this study aims to translate into Spanish, adapt, and validate the Internet Entrepreneurial Self-efficacy Scale in Peruvian university students with a commercial profile. Method: An instrumental study was conducted in which 743 students between 18 and 42 years old from commercially oriented programs (Administration, Accounting, Economics, and related majors) in the three regions of Peru (Coast, Mountains, Jungle) participated. Content-based validity was analyzed with Aiken's V coefficient, reliability with Cronbach's alpha coefficient, and internal structure through confirmatory factor analysis. Results: A back-translation was achieved in the appropriate time and context. All items proved to be valid (V > .70), and the reliability of the instrument was very good (α = 0.96). The confirmatory factor analysis evaluated the three-dimensional structure of the instrument and found an adequate fit (χ²(87) = 279.6, p < .001, CFI = .972, RMSEA = .049, SRMR = .025), corroborating the original internal structure. Complementary analyses found the instrument to be invariant across sex and university. Finally, it shows significant correlations with scales that measure similar constructs. Conclusions: The Internet Entrepreneurial Self-efficacy Scale shows adequate psychometric properties; therefore, it can be used as a management tool to analyze the entrepreneurial capacity of university students with a commercial profile. These findings allow universities to evaluate the entrepreneurial capabilities of students who can promote sustainable businesses, which in turn strengthens the relationship between the university, the state, and companies.

Keywords: entrepreneurial self-efficacy, Validation study, Entrepreneurship, university students, Peru

Received: 26 Jan 2024; Accepted: 13 May 2024.

Copyright: © 2024 Torres-Miranda, Ccama, Niño Valiente, Turpo Chaparro, Castillo-Blanco and Mamani-Benito. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Mx. Oscar Mamani-Benito, Peruvian Union University, Lima, 05000, Peru

