
For any measure of interest and any given question or set of questions, there are a vast number of ways to construct questions. While the guiding principle should be the particular purposes of the research, there are better and worse questions for any particular operationalization. How, then, do we evaluate the measures?

Two of the major criteria of evaluation in any type of measurement or observation are:

Whether we are measuring what we intend to measure.
Whether the same measurement procedure yields the same results.


These two principles are validity and reliability. Reliability is concerned with questions of stability and consistency - does the same measurement tool yield stable and consistent results when repeated over time? Think about measurement procedures in other contexts - in construction or woodworking, a tape measure is a highly reliable measuring instrument.

Say you have a piece of hardwood that is 2 1/2 feet long. You measure it once with the tape measure - you get a measurement of 2 1/2 feet. Measure it again and you get 2 1/2 feet. Measure it repeatedly and you consistently get a measurement of 2 1/2 feet. The tape measure yields reliable results.

Validity refers to the extent to which we are measuring what we hope to measure (and what we think we are measuring). To continue with the example of measuring the piece of wood, a tape measure that has been manufactured with exact spacing for inches, feet, etc., should yield valid results as well. Measuring this piece of wood with a "good" tape measure should produce a correct measurement of the wood's length.

To apply these concepts to social research, we want to use measurement tools that are both reliable and valid. We want questions that yield consistent responses when asked multiple times - this is reliability. Similarly, we want questions that get accurate responses from respondents - this is validity.


Reliability refers to a condition in which a measurement process yields consistent scores (given an unchanged measured phenomenon) over repeated measurements. Perhaps the most straightforward way to assess reliability is to ensure that measures fulfill the following three criteria of reliability. Measures that are high in reliability should exhibit all three.

Test-Retest Reliability

When a researcher administers the same measurement tool multiple times - asks the same question, follows the same research procedures, etc. - does he/she get consistent results, assuming that there has been no change in whatever he/she is measuring? This is really the simplest method for assessing reliability - when a researcher asks the same person the same question twice ("What's your name?"), does he/she get back the same results both times? If so, the measure has test-retest reliability. Measurement of the piece of lumber discussed earlier has high test-retest reliability.

Inter-Item Reliability

This is a criterion that applies to cases where multiple items are used to measure a single concept. In such cases, answers to a set of questions designed to measure some single concept (e.g., altruism) should be associated with each other.
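One common summary of inter-item reliability is Cronbach's alpha, which compares the variance of individual items to the variance of the total score. This is a minimal sketch using hypothetical responses; by convention, alpha of roughly 0.7 or higher is taken to mean the items hang together.

```python
# Hypothetical responses (1-5) from 5 respondents to 3 questions
# all intended to tap a single concept such as altruism.
items = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 5, 5],
    [3, 4, 3],
    [1, 2, 2],
]

def variance(xs):
    """Sample variance of a list of numbers."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

k = len(items[0])  # number of items in the index
item_vars = [variance([row[i] for row in items]) for i in range(k)]
total_var = variance([sum(row) for row in items])

# Cronbach's alpha: high values indicate good inter-item reliability.
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(round(alpha, 2))
```

For these invented data alpha is above 0.9, i.e. responses to the three questions are strongly associated, as we would want for an index of a single concept.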

Interobserver Reliability

Interobserver reliability concerns the extent to which different interviewers or observers using the same measure get equivalent results. If different observers or interviewers use the same instrument to score the same thing, their scores should match. For example, the interobserver reliability of an observational assessment of parent-child interaction is often evaluated by showing two observers a videotape of a parent and child at play. These observers are asked to use an assessment tool to score the interactions between parent and child on the tape. If the instrument has high interobserver reliability, the scores of the two observers should match.
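Agreement between two observers can be quantified as raw percent agreement, or as Cohen's kappa, which corrects that agreement for what the observers would match on by chance. The codes and scores below are hypothetical, invented to illustrate the calculation.

```python
# Two hypothetical observers independently code the same 10 taped
# parent-child interactions as "warm" (W) or "cold" (C).
obs_a = list("WWCWCWWCWW")
obs_b = list("WWCWWWWCWC")

n = len(obs_a)
# Raw percent agreement: fraction of interactions coded the same.
p_o = sum(a == b for a, b in zip(obs_a, obs_b)) / n

# Chance agreement: probability both observers assign the same
# code by accident, given how often each observer uses each code.
codes = set(obs_a) | set(obs_b)
p_e = sum((obs_a.count(c) / n) * (obs_b.count(c) / n) for c in codes)

# Cohen's kappa: 1 is perfect agreement, 0 is chance-level.
kappa = (p_o - p_e) / (1 - p_e)
print(round(p_o, 2), round(kappa, 2))
```

Here the observers agree on 8 of 10 interactions (0.8), but kappa is noticeably lower (about 0.52), showing why chance-corrected agreement is the more demanding standard.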


To reiterate, validity refers to the extent to which we are measuring what we hope to measure (and what we think we are measuring). How do we assess the validity of a set of measurements? A valid measure should satisfy four criteria.

Face Validity

This criterion is an assessment of whether a measure appears, on the face of it, to measure the concept it is intended to measure. This is a bare-minimum assessment - if a measure cannot satisfy this criterion, then the other criteria are inconsequential. We can think of observational measures of behavior that would have face validity. For example, striking out at another person would have face validity as an indicator of aggression. Similarly, offering help to a stranger would meet the criterion of face validity for helping. However, asking people about their favorite movie to measure racial prejudice has little face validity.

Content Validity

Content validity concerns the extent to which a measure adequately represents all facets of a concept. Consider a series of questions that serve as indicators of depression (don't feel like eating, lost interest in things usually enjoyed, etc.). If there were other kinds of common behaviors that mark a person as depressed that were not included in the index, then the index would have low content validity because it did not adequately represent all facets of the concept.

Criterion-Related Validity

Criterion-related validity applies to instruments that have been developed for usefulness as indicators of a specific trait or behavior, either now or in the future. For example, consider the driving test as a social measurement that has pretty good predictive validity. That is to say, an individual's performance on a driving test correlates well with his/her driving ability.

Construct Validity

But for many things we want to measure, there is not necessarily a relevant criterion available. In this case, we turn to construct validity, which concerns the extent to which a measure is related to other measures as specified by theory or previous research. Does a measure stack up with other variables the way we expect it to? A good example of this type of validity comes from early self-esteem studies - self-esteem refers to a person's sense of self-worth or self-respect. Clinical observations in psychology had shown that people who had low self-esteem often also had depression. Therefore, to establish the construct validity of the self-esteem measure, the researchers showed that those with higher scores on the self-esteem measure had lower depression scores, while those with low self-esteem had higher rates of depression.

Validity and also Reliability Compared

So what is the relationship between validity and reliability? The two do not necessarily go hand-in-hand.

At best, we have a measure that has both high validity and high reliability. It yields consistent results in repeated application and it accurately reflects what we hope to represent.


It is possible to have a measure that has high reliability but low validity - one that is consistent in getting bad information or consistent in missing the mark. It is also possible to have one that has low reliability and low validity - inconsistent and not on target.

Finally, it is not possible to have a measure that has low reliability and high validity - you can't really get at what you want or what you're interested in if your measure fluctuates wildly.