WRITING CENTER "INFORMATION PICK-UP" (S.S.C.C.)
DEFINITION OF BASIC STATISTICS TERMINOLOGY
AVERAGE: central tendency of group performance--based on mean (X), median (mdn) and mode
BELL CURVE: normal distribution of data (theoretical) when the most frequently occurring scores are grouped in the middle of the plotted frequency distribution, and as scores move away from the middle in both directions (higher or lower than the average), they decrease in frequency to an equal degree
BIASED SAMPLING: a situation in which not every person in the test population had an equal chance of being selected for the sample (i.e. random sampling was not achieved); bias = persistent or systematic error in a study
CATEGORICAL OR QUALITATIVE DISTRIBUTION: statistical data summarized in table form and grouped into non-numerical categories
CHANCE: the supposed impersonal purposeless determiner of unaccountable happenings (Freund); random occurrence
CONCEPT: an abstraction, a generalization of specific qualities drawn from specifics
CONTROL GROUP: the "no treatment" group of a research study against which the "treatment" groups are compared; a method used to counter the "placebo" effect, in which a group's expectations may affect results
CORRELATION: a measure of the degree of relationship between two or more variables
CORRELATION COEFFICIENT (Pearson Product Moment Correlation Coefficient or the Pearson "r"): (r) an index that measures the degree of linear (straight line) relationship between two variables; the size of "r" can range from +1 through 0 to -1; this is the most commonly used measure of correlation; r = 0 means there is no linear relationship between the two variables
RELATIONSHIPS BETWEEN VARIABLES: (Newman and Newman): "If the correlation between two variables is +1 or -1, a perfect relationship exists between these two variables. In reality there are very few, if any, perfect relationships. The only one that comes to mind is the correlation between birth and death. If a person is born, he or she will eventually die; therefore you can predict perfectly without error that anyone born will die. That is, the correlation between birth and death is perfect."
The positive (+) or negative sign (-) indicates the direction of the relationship and not the magnitude. The magnitude (strength of the correlation) of the relationship is indicated by the number, regardless of the direction. The closer to the whole number 1, the greater the magnitude. A positive sign means that if one increases, the other also increases; if one decreases, the other also decreases. A negative sign means that if one increases, the other decreases; if the one decreases, the other increases.
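The calculation behind the Pearson "r" can be sketched as follows. This is a minimal illustration, not a replacement for a statistics package; the function name pearson_r and the data (hours, scores, errors) are made up for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of the cross-products of each pair's deviations from the means
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    # Note: if either variable never varies, this division fails (r is undefined)
    return cross / math.sqrt(ss_x * ss_y)

hours = [1, 2, 3, 4, 5]          # hypothetical hours studied
scores = [52, 60, 68, 76, 84]    # rises steadily as hours rise
errors = [20, 16, 12, 8, 4]      # falls steadily as hours rise

r_pos = pearson_r(hours, scores)   # 1.0: perfect positive relationship
r_neg = pearson_r(hours, errors)   # -1.0: perfect negative relationship
print(r_pos, r_neg)
```

Both results have the same magnitude (1); only the sign, and therefore the direction of the relationship, differs.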
DECISION THEORY (Bayesian decision theory): the concept that one must account for all the consequences which can arise when we base decisions on statistical data
DISTRIBUTION: an arrangement of statistical data that shows how many items, or what parts of the data, go into the different intervals or categories into which the data are grouped, the way statistical data falls on a table (numerical or quantitative distributions) or graph or curve (as a way of summation of statistical information)
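As a small sketch of a numerical distribution, the scores below are grouped into intervals ten points wide and counted. The scores and the interval width are assumptions chosen only for illustration.

```python
from collections import Counter

scores = [52, 67, 71, 74, 78, 81, 83, 85, 88, 94]   # hypothetical test scores
width = 10                                           # assumed interval width

# Map each score to the lower bound of its interval, then count per interval
counts = Counter((s // width) * width for s in scores)

for low in sorted(counts):
    print(f"{low}-{low + width - 1}: {counts[low]}")
```

Each printed row shows an interval and how many scores fall into it, which is the table form of the distribution.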
EMPIRICAL: objective observations or tests, based on verifiable fact or proof
ERROR VARIANCE: uncontrolled variance, considered a function or result of random variation in measures due to chance
EVALUATION: the sum total of the measures of a given population; the total plan for treating all measures (and findings) from a given test; analysis or summary of data
EXPERIMENT: any process by which data is obtained through the observation of uncontrolled events in nature or controlled procedures in a lab, with observations of the outcomes
EXTERNAL VALIDITY: the extent to which the findings of a study can be generalized to other people, groups, investigations--in a social or larger context
HEURISTIC: the potential value of scientific research for further discovery or investigation
HOMOGENEITY: degree of similarity or "commonality" among items; similarity in terms of the examined characteristics in a group
HYPOTHESIS: a statement of the relationship between two or more variables in an "if then" postulate form
IMPROPER INFERENCE: drawing an unfounded or erroneous conclusion from a set of data
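INTERVAL: scale whose numbers represent equal units of measure between points; values on such a scale can be added and subtracted, but because the scale has no absolute 0, they cannot meaningfully be multiplied or divided (compare RATIO)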
LAW OF LARGE NUMBERS: the idea that as the size of a random sample increases, the sample becomes more representative of the population (e.g. the sample mean tends to fall closer to the population mean)
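A simulation can illustrate the law of large numbers. The sketch below rolls a fair six-sided die (an assumed example, not taken from the sources) and compares the sample mean with the known population mean of 3.5 for samples of increasing size.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable; the value is arbitrary

def mean_of_rolls(n):
    """Average of n simulated rolls of a fair six-sided die."""
    return sum(random.randint(1, 6) for _ in range(n)) / n

population_mean = 3.5  # (1 + 2 + 3 + 4 + 5 + 6) / 6

for n in (10, 1_000, 100_000):
    sample_mean = mean_of_rolls(n)
    gap = abs(sample_mean - population_mean)
    print(n, round(sample_mean, 3), round(gap, 3))
```

The gap between sample mean and population mean tends to shrink as n grows, although any single small sample may happen to land close by chance.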
MEAN: mean = the sum of scores arrived at by adding all the scores together DIVIDED BY the number of subjects or scores being added
MEASUREMENT: the assessment of a certain phenomenon or theory in quantitative terms
MEDIAN: the middle point of a range of scores that have been put in order from lowest to highest or highest to lowest (rank ordered), the number at which an equal number of scores fall above and below it (Note: If there is an even number of scores, the mean of the two middle scores is calculated and this new number is used as the "median.")
MODE: most frequently occurring score in a distribution of scores, usually the least frequently used estimate of central tendency as it is the least stable (and is likely to change easily and drastically from sample to sample), least representative of the average of group performance
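The three measures of central tendency defined above (MEAN, MEDIAN, MODE) can be sketched in a few lines. The function names and the scores are made up for illustration; Python's built-in statistics module offers equivalent functions.

```python
from collections import Counter

def mean(scores):
    """Sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)

def median(scores):
    """Middle score after rank ordering; mean of the two middle scores if even."""
    ordered = sorted(scores)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def modes(scores):
    """All scores tied for most frequent (a distribution can have several)."""
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)

scores = [60, 70, 70, 80, 90, 110]   # hypothetical scores, even count

print(mean(scores))     # 80.0
print(median(scores))   # 75.0  (mean of the middle scores 70 and 80)
print(modes(scores))    # [70]
```

In this set the three measures give three different "averages", which is why it matters which one a report is using.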
NULL HYPOTHESIS: a premise that suggests that there is no relationship or correlation between two or more variables
NUMERICAL OR QUANTITATIVE DISTRIBUTION: data grouped according to its numerical size or according to numerical groups
OBJECTIVITY: a method of measurement that is not influenced by the researcher's biases or prior beliefs or expectations
PERCENTILE RANK: method of expressing individual scores in terms of their standing among the total group of scores; "a particular numerical percentile rank indicates the percentage of scores which fall below the given rank" (Newman and Newman)
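Following the quoted definition, a percentile rank can be sketched as the percentage of scores falling below a given score. The function name and data are illustrative assumptions; some textbooks also count half of any tied scores, which this sketch does not do.

```python
def percentile_rank(scores, score):
    """Percentage of scores in the group that fall below the given score."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)

scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]   # hypothetical group
pr = percentile_rank(scores, 85)
print(pr)   # 60.0: six of the ten scores fall below 85
```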
PLACEBO EFFECT: a reaction to a placebo (a substance having no pharmacological effect but given to placate a patient who supposes it to be a medicine; a pharmacologically inactive substance or a sham procedure administered as a control in testing the efficacy of a drug or course of action) manifested by a lessening of symptoms or the production of anticipated side effects (Random House Webster's College Dictionary)
PROBABILITY: the proportion of times that an event occurs over the long run if the experiment is repeated many times under uniform conditions; a projection into the future of the likelihood of a certain event occurring; the sum of the probabilities of the outcomes that comprise the event
1) The probability of an impossible event = 0.
2) The probability of a certain event = 1.
3) The probability of any event must be no less than zero and no greater than 1.
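The three rules above, and the idea that an event's probability is the sum of the probabilities of its outcomes, can be checked on a small example. The sketch assumes two fair dice, so all 36 outcomes are equally likely; the function name probability is made up for the example.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice (assumed example)
outcomes = list(product(range(1, 7), repeat=2))
p_each = Fraction(1, len(outcomes))

def probability(event):
    """Sum the probabilities of the outcomes that make up the event."""
    return sum(p_each for outcome in outcomes if event(outcome))

p_seven = probability(lambda o: sum(o) == 7)         # 1/6
p_impossible = probability(lambda o: sum(o) == 13)   # 0: cannot happen
p_certain = probability(lambda o: sum(o) >= 2)       # 1: always happens

print(p_seven, p_impossible, p_certain)
```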
PROBABILITY THEORY: the mathematical study of chance, which underlies statistical inference and is applied in the behavioral, natural, medical and social sciences, education, business, psychology, history and other fields
QUARTILE DEVIATION: estimates variability in a set of data by finding the 25th percentile (Q1) and the 75th percentile (Q3) points in a distribution of scores, then subtracting Q1 from Q3 and dividing the difference by 2 (giving half the spread of the middle 50% of the scores); the median can be called Q2 since it is the point in the distribution where half of the scores fall above it and half fall below it
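A quartile deviation can be sketched with Python's built-in statistics module (version 3.8 or later). Textbooks and software use slightly different rules for locating Q1 and Q3; the "inclusive" method chosen here is an assumption, and the scores are made up.

```python
import statistics

scores = [2, 4, 6, 8, 10, 12, 14, 16, 18]   # hypothetical scores

# quantiles(..., n=4) returns the three cut points Q1, Q2 (the median), Q3
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")

quartile_deviation = (q3 - q1) / 2

print(q1, q2, q3)            # 6.0 10.0 14.0
print(quartile_deviation)    # 4.0
```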
RANDOMIZATION: assignment of subjects, objects, treatments, so that each has an equal chance of being assigned by using random procedure
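Random assignment can be sketched by shuffling the list of subjects and splitting it in half. The subject labels and the two-group design are assumptions made for illustration.

```python
import random

subjects = ["S01", "S02", "S03", "S04", "S05", "S06", "S07", "S08"]  # hypothetical IDs

shuffled = subjects[:]       # copy so the original list is left unchanged
random.shuffle(shuffled)     # every ordering is equally likely

half = len(shuffled) // 2
treatment_group = shuffled[:half]
control_group = shuffled[half:]

print("treatment:", treatment_group)
print("control:  ", control_group)
```

Because the split follows a random shuffle, each subject has the same chance of landing in either group.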
RANGE: the highest score in the distribution minus the lowest score in the distribution, seen as the simplest and least accurate measure of variability
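A short sketch shows why the range is easy to compute but a rough guide to variability: a single extreme score changes it sharply. The function name and scores are made up.

```python
def score_range(scores):
    """Highest score minus lowest score."""
    return max(scores) - min(scores)

typical = [70, 72, 75, 78, 80]          # hypothetical scores
with_outlier = [70, 72, 75, 78, 100]    # same group, one extreme score

print(score_range(typical))        # 10
print(score_range(with_outlier))   # 30
```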
RANDOM SAMPLE: method of selecting a sample of a population (a "statistical universe") so that every member in that group or population being sampled has an equal chance of being chosen or drawn
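Drawing a simple random sample can be sketched with the standard random module. The population of 100 ID numbers and the sample size of 10 are assumptions for the example.

```python
import random

population = list(range(1, 101))   # hypothetical ID numbers 1 through 100

# random.sample draws without replacement; each member is equally likely to be chosen
sample = random.sample(population, 10)

print(sorted(sample))
```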
RATIO: scale that has equal intervals and an absolute 0; "one can add, subtract, multiply, and divide and one can say something is twice as much as something else, for example, a ruler." (Newman and Newman)
RELIABILITY: the consistency of the test measure (i.e. the test, no matter what it is measuring, will produce the same value or one very close to it every time it is used). For example: A very reliable test has a high test-retest reliability rate--in which the same test takers fall in the same score range upon re-testing.
1) Increasing the number of items on a test will increase reliability.
2) Objective methods of scoring will increase test reliability.
3) Having a test measure a particular concept is likely to increase the test reliability. (However, the items should not be interconnected or interdependent.)
4) The items of a test must be approximately equivalent in item difficulty.
5) Tests should be administered in a standardized manner, without much variability or subjectivity. (Newman and Newman)
SKEWED DISTRIBUTION: any distribution in which most of the scores are closer to one end or the other (a non-normal or non-symmetrical distribution), with a positive skew with more scores on the lower end of the distribution (e.g. more low scores than high scores) and a negative skew with more scores on the high end of the distribution (e.g. more high scores than low scores)
STANDARD DEVIATION: the average amount each individual score differs from the mean of its group, calculated by subtracting the mean from each score and squaring the differences to eliminate negative numbers; the sum of the squares is then divided by the number of people (or scores) and the square root is taken of this quotient; this is the most frequently used estimate of variability
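The steps in this definition can be followed one at a time. The sketch divides by the number of scores, as the definition states; many packages divide by the number of scores minus one when estimating from a sample, which gives a slightly larger value. The function name and scores are made up.

```python
import math

def standard_deviation(scores):
    """Standard deviation computed by dividing by the number of scores."""
    n = len(scores)
    mean = sum(scores) / n
    # Subtract the mean from each score and square the difference
    squared_diffs = [(s - mean) ** 2 for s in scores]
    # Divide the sum of squares by the number of scores, then take the square root
    return math.sqrt(sum(squared_diffs) / n)

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores, mean = 5
sd = standard_deviation(scores)
print(sd)   # 2.0
```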
STATISTICAL MODEL: mathematical formulas applied to sets of numbers to find certain results
STATISTICAL SIGNIFICANCE: unlikely to occur by chance alone (where chance is defined operationally by some alpha level--which is a subjective decision--usually at .05 or .01 or .001); a result is called significant when the probability of obtaining it by simple chance is smaller than the chosen alpha level
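A worked example may help show how the alpha level enters the decision. The sketch assumes a coin is flipped 10 times and lands heads 9 times, then asks how likely 9 or more heads would be if the coin were fair. The scenario and the one-sided comparison are illustrative assumptions.

```python
from math import comb

flips = 10
observed_heads = 9

# Probability of 9 or more heads in 10 flips of a fair coin:
# count the favorable sequences and divide by all 2**10 possible sequences
favorable = sum(comb(flips, k) for k in range(observed_heads, flips + 1))
p_value = favorable / 2 ** flips

print(round(p_value, 4))   # 0.0107  (11 out of 1024)

for alpha in (0.05, 0.01, 0.001):
    print(alpha, "significant" if p_value < alpha else "not significant")
```

The same result is significant at the .05 level but not at .01 or .001, which is why the alpha level has to be chosen before looking at the data.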
SUBSTANTIVE HYPOTHESIS (research hypothesis): suggests that a relationship between two or more variables exists, an idea which is usually to be tested through research
TARGET POPULATION: the group about which information is being sought and from which the sample is drawn
THEORY: a set of constructs and definitions asserting the relationship among certain events or variables in order to predict and explain the relationships between the variables (e.g. as in a cause-and-effect or other relationship)
TREND ANALYSIS: a curve fitting technique in which one tries to reflect the relationship between factors over a number of repeated measures or observations, with some future predictability
VALIDITY: the degree to which a test measures what it was designed to measure
1) face validity: students' reactions to the test (the least accurate estimate of a test's validity)
2) content validity (definition or logical validity): the demonstration of how representative the test items are of the content or subject matter which the test supposedly measures
3) concurrent validity: how well a test correlates with another test that has already had its validity measured; "known group validity" applies a test to a group for which a certain factor/aspect has been proven by another test to see if the same findings are discovered via this other test measure
4) predictive validity: this predicts into the future on the basis of test results (inferential), and then checks that prediction. If the prediction is correct, or if the statistician can predict better than chance, the test has predictive validity; the more accurate the prediction, the better the predictive validity
* Predictive validity and concurrent validity--when taken together--have criterion validity, empirical validity and statistical validity (accuracy and usefulness)
5) construct validity: conglomeration of all other types of validity in seeing how closely a test achieves what it "tests"
VARIABILITY: the degree to which the scores differ or vary around the central tendency; the more representative the measure of central tendency is for scores, the less the variability; the less representative the central tendency measure, the more the variability; variability is based on the measures of range, quartile deviation, and standard deviation
Sources:
Freund, John E. Statistics: A First Course: 3rd Edition. Englewood Cliffs:
Prentice-Hall, Inc., 1981.
Hamburg, Morris. Basic Statistics: A Modern Approach, Second Edition.
New York: Harcourt Brace Jovanovich, Inc., 1979, 1974.
Mansfield, Edwin. Basic Statistics with Applications. New York: W.W.
Norton & Company, 1986.
Newman, Isadore and Carole Newman. Conceptual Statistics for
Beginners: Second Edition. Lanham: University Press of America,
Inc., 1994.
Note: EXCEL is an excellent program available on campus PCs that includes a "Wizard" feature which helps automatically make graphics and tables from statistical information; these can be imported into a word-processing file. Talk to your computer lab assistants to see how this may be done.
(Revised 1998)