WRITING CENTER "INFORMATION PICK-UP"  (S.S.C.C.)

 

DEFINITION OF BASIC STATISTICS TERMINOLOGY

 

AVERAGE:  a measure of the central tendency of group performance, expressed as the mean (X̄), the median (Mdn), or the mode

 

BELL CURVE:  the (theoretical) normal distribution of data, in which the most frequently occurring scores are grouped in the middle of the plotted frequency distribution and, as scores move away from the middle in either direction (higher or lower than the average), they decrease in frequency to an equal degree on both sides

 

BIASED SAMPLING:  a situation in which not every member of the target population had an equal chance of being selected for the sample (i.e., random sampling was not achieved); bias is a persistent or systematic error in a study

 

CATEGORICAL OR QUALITATIVE DISTRIBUTION:  statistical data summarized in table form and grouped into non-numerical categories

 

CHANCE:  the supposed impersonal, purposeless determiner of unaccountable happenings (Freund); random occurrence

 

CONCEPT:  an abstraction; a generalization of qualities drawn from specific instances

 

CONTROL GROUP:  the "no treatment" group of a research study, against which the "treatment" groups are compared; using a control group is also a method of countering the "placebo" effect, in which a group's expectations may affect results

 

CORRELATION:  a measure of the degree of relationship between two or more variables

 

CORRELATION COEFFICIENT (Pearson Product Moment Correlation Coefficient, or the Pearson "r"):  an index (r) that measures the degree of linear (straight-line) relationship between two variables; the value of "r" can range from +1 through 0 to -1; this is the most commonly used measure of correlation; r = 0 means there is no linear relationship between the two variables

 

            RELATIONSHIPS BETWEEN VARIABLES:  (Newman and Newman):  "If the correlation between two variables is +1 or -1, a perfect relationship exists between these two variables.  In reality there are very few, if any, perfect relationships.  The only one that comes to mind is the correlation between birth and death.  If a person is born, he or she will eventually die; therefore you can predict perfectly without error that anyone born will die.  That is, the correlation between birth and death is perfect."

 

            The positive (+) or negative (-) sign indicates the direction of the relationship, not its magnitude.  The magnitude (strength) of the relationship is indicated by the number, regardless of the sign:  the closer the value is to +1 or -1, the greater the magnitude.  A positive sign means that as one variable increases, the other also increases, and as one decreases, the other also decreases.  A negative sign means that as one variable increases, the other decreases, and as one decreases, the other increases.
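As a sketch of how r is computed, the Python function below applies the standard Pearson formula: the sum of the cross-products of each pair's deviations from the two means, divided by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and the study-hours data are hypothetical, chosen only for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient for paired scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists of at least 2 scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from each mean
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]                     # hypothetical hours of study
print(pearson_r(hours, [52, 60, 71, 80, 88]))  # close to +1 (strong positive)
print(pearson_r(hours, [10, 8, 6, 4, 2]))      # -1.0 (perfect negative)
```

In the second pair every extra hour goes with a two-point drop, so the relationship is perfectly linear and negative. If either variable has no spread at all, r is undefined and this sketch raises ZeroDivisionError. Python 3.10 and later also provide `statistics.correlation` for the same calculation.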

 

DECISION THEORY (Bayesian decision theory):  the concept that one must account for all the consequences that can arise when decisions are based on statistical data

 

DISTRIBUTION:  an arrangement of statistical data that shows how many items, or what parts of the data, fall into each of the intervals or categories into which the data are grouped; the way statistical data are laid out in a table (numerical or quantitative distributions) or on a graph or curve, as a way of summarizing statistical information
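A numerical frequency distribution can be sketched by counting how many scores fall into each class interval. This version assumes whole-number scores and equal-width intervals; the function name `frequency_table` and the scores are hypothetical.

```python
from collections import Counter

def frequency_table(scores, width):
    """Group whole-number scores into class intervals of equal width and count them."""
    # Map each score to the lower bound of its interval (e.g. 67 -> 60 for width 10)
    counts = Counter((s // width) * width for s in scores)
    # Return (interval_low, interval_high, count) rows in ascending order
    return [(low, low + width - 1, counts[low]) for low in sorted(counts)]

scores = [55, 62, 67, 71, 73, 75, 78, 81, 84, 92]  # hypothetical test scores
for low, high, n in frequency_table(scores, 10):
    print(f"{low}-{high}: {'*' * n}")               # a simple text "graph"
```

For a categorical (qualitative) distribution, the same `Counter` class can count non-numerical categories directly, for example `Counter(["A", "B", "A"])`.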

 

EMPIRICAL:  based on objective observation or testing; grounded in verifiable fact or proof

 

ERROR VARIANCE:  uncontrolled variance, considered a function or result of random variation in measures due to chance

 

EVALUATION:  the sum total of the measures of a given population; the total plan for treating all measures (and findings) from a given test; analysis or summary of data

 

EXPERIMENT:  any process by which data are obtained, either by observing uncontrolled events in nature or by carrying out controlled procedures in a lab and observing the outcomes

 

EXTERNAL VALIDITY:  the extent to which a study's findings can be generalized to other people, groups, and investigations, in a social or larger context

 

HEURISTIC:  the potential value of scientific research for further discovery or investigation

 

HOMOGENEITY:  degree of similarity or "commonality" among items; similarity in terms of the examined characteristics in a group

 

HYPOTHESIS:  a statement of the relationship between two or more variables, in the form of an "if . . . then" postulate

 

IMPROPER INFERENCE:  drawing an unfounded or erroneous conclusion from a set of data

 

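INTERVAL:  a scale of numbers that represent equal units of measure between points but that has no absolute zero; differences between values can be added and subtracted, but ratios are not meaningful (e.g., on the Fahrenheit temperature scale, 80 degrees is not "twice as hot" as 40 degrees)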

 

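LAW OF LARGE NUMBERS:  the principle that as the size of a random sample increases, its statistics (such as the sample mean, or the proportion of times an event occurs) tend to come closer to the true population values, so the sample becomes more representative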
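The law can be illustrated by simulation: flip a simulated fair coin more and more times and the proportion of heads tends to settle near the true value of 0.5. The helper name `proportion_heads`, the seed, and the sample sizes are arbitrary choices for this sketch.

```python
import random

def proportion_heads(n, seed=1):
    """Flip a simulated fair coin n times and return the proportion of heads."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n))
    return heads / n

# Small samples can stray far from 0.5; large samples rarely do
for n in (10, 100, 1_000, 100_000):
    print(n, proportion_heads(n))
```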

 

MEAN:  the sum of all the scores (found by adding the scores together) DIVIDED BY the number of subjects or scores being added
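The definition translates directly into code. The helper name `mean` is hypothetical; Python's standard `statistics.mean` does the same job.

```python
def mean(scores):
    """Arithmetic mean: the sum of all scores divided by the number of scores."""
    if not scores:
        raise ValueError("mean requires at least one score")
    return sum(scores) / len(scores)

print(mean([70, 80, 90, 100]))  # 85.0
```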

 

MEASUREMENT:  the assessment of a certain phenomenon or theory in quantitative terms

 

MEDIAN:  the middle point of a range of scores that have been put in order from lowest to highest or highest to lowest (rank ordered); the number above and below which an equal number of scores fall  (Note:  If there is an even number of scores, the median is the mean of the two middle scores, so the "median" may be a new number that does not appear among the scores.)
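A minimal sketch of the rule, including the even-count case from the note above; the helper name `median` is hypothetical (the standard library's `statistics.median` behaves the same way).

```python
def median(scores):
    """Middle score after rank ordering; mean of the two middle scores if n is even."""
    if not scores:
        raise ValueError("median requires at least one score")
    ordered = sorted(scores)          # rank order from lowest to highest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # Even number of scores: average the two middle scores
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 3, 9]))      # 7
print(median([7, 3, 9, 4]))   # 5.5, a value not among the scores
```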

 

MODE:  the most frequently occurring score in a distribution of scores; usually the least frequently used estimate of central tendency because it is the least stable (it is likely to change easily and drastically from sample to sample) and the least representative of the average of group performance
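The sketch below returns every score tied for the highest frequency, since a distribution can have more than one mode. The helper name `modes` is hypothetical; Python 3.8 and later provide `statistics.multimode` for the same purpose. The two examples differ by a single score, which shows how unstable the mode can be.

```python
from collections import Counter

def modes(scores):
    """Return every score that occurs with the highest frequency (ties included)."""
    if not scores:
        raise ValueError("mode requires at least one score")
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(s for s, c in counts.items() if c == top)

print(modes([2, 3, 3, 5, 7]))  # [3]
print(modes([2, 3, 3, 5, 5]))  # [3, 5] after changing one score (7 -> 5)
```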

 

NULL HYPOTHESIS:  a premise that suggests that there is no relationship or correlation between two or more variables

 

NUMERICAL OR QUANTITATIVE DISTRIBUTION:  data grouped according to its numerical size or according to numerical groups

 

OBJECTIVITY:  the quality of a measurement method that is not influenced by the researcher's biases, prior beliefs, or expectations

 

PERCENTILE RANK:  method of expressing individual scores in terms of their standing among the total group of scores; "a particular numerical percentile rank indicates the percentage of scores which fall below the given rank" (Newman and Newman)
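Following the Newman and Newman wording (the percentage of scores that fall below a given score), a percentile rank can be sketched as below. Some texts also count half of any scores tied with the given score, so results may differ slightly by convention; the helper name and the scores are hypothetical.

```python
def percentile_rank(score, scores):
    """Percentage of scores in the group that fall below the given score."""
    if not scores:
        raise ValueError("need a non-empty group of scores")
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)

group = [55, 62, 67, 71, 73, 75, 78, 81, 84, 92]  # hypothetical class scores
print(percentile_rank(78, group))  # 60.0: six of the ten scores are below 78
```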

 

PLACEBO EFFECT:  a reaction to a placebo (a substance having no pharmacological effect but given to placate a patient who supposes it to be a medicine; a pharmacologically inactive substance or a sham procedure administered as a control in testing the efficacy of a drug or course of action) manifested by a lessening of symptoms or the production of anticipated side effects (Random House Webster's College Dictionary)

 

PROBABILITY:  the proportion of times that an event occurs over the long run if the experiment is repeated many times under uniform conditions; a projection into the future of the likelihood of a certain event occurring; the probability of an event is the sum of the probabilities of the outcomes that make up the event

 

1)  The probability of an impossible event = 0.

2)  The probability of a certain event = 1.

3)  The probability of any event must be no less than zero and no greater than 1.
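The long-run definition and the rules above can be illustrated with a simulated fair die: each face's proportion stays between 0 and 1, the proportions for all six faces add up to 1 (some face is certain to show), and the event "even number" is the sum of the proportions for 2, 4, and 6. The helper name `relative_frequencies`, the seed, and the number of rolls are assumptions of this sketch.

```python
import random
from collections import Counter

def relative_frequencies(n_rolls, seed=7):
    """Roll a simulated fair die n_rolls times; return each face's proportion."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return {face: counts[face] / n_rolls for face in range(1, 7)}

freqs = relative_frequencies(60_000)
p_even = freqs[2] + freqs[4] + freqs[6]  # event = sum of its outcomes
print(freqs)
print("P(even) is about", p_even)
print("P(some face shows) =", round(sum(freqs.values()), 10))  # 1.0
```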

 

PROBABILITY THEORY:  the mathematical study of chance, applied to issues in the behavioral, natural, educational, medical, and social sciences, as well as business, psychology, history, and other fields; it provides the basis for statistical inference

 

QUARTILE DEVIATION (semi-interquartile range):  estimates variability in a set of data by locating the 25% point (Q1) and the 75% point (Q3) in a distribution of scores, subtracting Q1 from Q3, and dividing the difference by 2, which gives half the distance between the two quartiles; the median can be called Q2, since it is the point in the distribution where half of the scores fall above it and half fall below it
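A short sketch using Python's `statistics.quantiles` (Python 3.8 and later), which returns the three cut points Q1, Q2, and Q3. Its default "exclusive" method is only one of several textbook conventions for locating quartiles, so a hand calculation may give slightly different values; the helper name and the scores are hypothetical.

```python
import statistics

def quartile_deviation(scores):
    """Semi-interquartile range: (Q3 - Q1) / 2."""
    q1, _, q3 = statistics.quantiles(scores, n=4)  # [Q1, Q2 (median), Q3]
    return (q3 - q1) / 2

group = [55, 62, 67, 71, 73, 75, 78, 81, 84, 92]  # hypothetical scores
print(statistics.quantiles(group, n=4))  # [65.75, 74.0, 81.75]
print(quartile_deviation(group))         # 8.0
```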

 

RANDOMIZATION:  the assignment of subjects, objects, or treatments by a random procedure, so that each has an equal chance of being assigned to any given group or condition
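One simple random procedure is to shuffle the list of subjects and then deal them out to the groups in turn, which gives every subject the same chance of landing in each group and keeps group sizes nearly equal. The function name `randomly_assign`, the group labels, and the subject names are hypothetical.

```python
import random

def randomly_assign(subjects, groups=("treatment", "control"), seed=None):
    """Shuffle subjects, then deal them out in turn so groups get near-equal shares."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)  # every ordering is equally likely
    assignment = {g: [] for g in groups}
    for i, subject in enumerate(shuffled):
        assignment[groups[i % len(groups)]].append(subject)
    return assignment

names = ["Ann", "Ben", "Cal", "Dee", "Eve", "Fay"]
print(randomly_assign(names, seed=42))
```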

 

RANGE:  the highest score in the distribution minus the lowest score in the distribution, seen as the simplest and least accurate measure of variability
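The range is a one-line calculation; the helper name `score_range` is hypothetical.

```python
def score_range(scores):
    """Highest score minus lowest score."""
    if not scores:
        raise ValueError("range requires at least one score")
    return max(scores) - min(scores)

print(score_range([55, 62, 67, 71, 92]))   # 37
print(score_range([55, 62, 67, 71, 120]))  # 65: one extreme score changes it a lot
```

Because only the two most extreme scores enter the calculation, a single unusual score can change the range drastically, which is why it is considered the least accurate measure of variability.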

 

RANDOM SAMPLE:  method of selecting a sample of a population (a "statistical universe") so that every member in that group or population being sampled has an equal chance of being chosen or drawn
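Python's `random.sample` draws a simple random sample without replacement, in which every member of the population has the same chance of being chosen. The roster of ID numbers, the sample size, and the seed are assumptions of this sketch.

```python
import random

population = list(range(1, 501))       # hypothetical roster: student IDs 1-500
rng = random.Random(2024)              # seeded only so the example is repeatable
sample = rng.sample(population, k=25)  # simple random sample without replacement
print(sorted(sample))
```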

 

RATIO:  scale that has equal intervals and an absolute 0; "one can add, subtract, multiply, and divide and one can say something is twice as much as something else, for example, a ruler." (Newman and Newman)

 

RELIABILITY:  the consistency of a test measure; i.e., the test, no matter what it is measuring, will produce the same value, or one very close to it, every time it is used  (For example:  A very reliable test has a high test-retest reliability, in which the same test takers fall in the same score range upon retesting.)

 

1)  Increasing the number of items on a test will increase reliability.

2)  Objective methods of scoring will increase test reliability.

3)  Having a test measure a particular concept is likely to increase the test reliability.  (However, the items should not be interconnected or interdependent.)

4)  The items of a test must be approximately equivalent in item difficulty.

5)  Tests should be administered in a standardized manner, without much variability or subjectivity.  (Newman and Newman)

 

SKEWED DISTRIBUTION:  any distribution in which most of the scores are closer to one end or the other (a non-normal, non-symmetrical distribution); a positively skewed distribution has more scores at the lower end (i.e., more low scores than high scores, with the tail extending toward the high end), and a negatively skewed distribution has more scores at the high end (i.e., more high scores than low scores, with the tail extending toward the low end)
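One common numerical index of skew (not mentioned in the sources above, so treat it as an added illustration) is the moment coefficient of skewness: the mean cubed deviation divided by the mean squared deviation raised to the 1.5 power. Its sign shows the direction of the skew. The helper name and both data sets are hypothetical; the second set is a mirror image of the first.

```python
def skewness(scores):
    """Moment coefficient of skewness: m3 / m2**1.5 (population moments)."""
    n = len(scores)
    m = sum(scores) / n
    m2 = sum((x - m) ** 2 for x in scores) / n  # mean squared deviation
    m3 = sum((x - m) ** 3 for x in scores) / n  # mean cubed deviation
    if m2 == 0:
        raise ValueError("all scores are equal; skewness is undefined")
    return m3 / m2 ** 1.5

low_heavy = [50, 52, 53, 55, 55, 56, 58, 70, 85, 98]  # many low scores, a few high
high_heavy = [2, 15, 30, 42, 44, 45, 45, 47, 48, 50]  # many high scores, a few low
print(skewness(low_heavy))   # positive: tail toward the high end
print(skewness(high_heavy))  # negative: tail toward the low end
```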

 

STANDARD DEVIATION:  roughly, the average amount by which each individual score differs from the mean of its group; it is calculated by subtracting the mean from each score and squaring the differences to eliminate negative numbers; the squared differences are summed, the sum of the squares is divided by the number of people (or scores), and the square root of this quotient is taken; this is the most frequently used estimate of variability
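The steps above can be sketched as follows; the helper name `standard_deviation` is hypothetical. Dividing by the number of scores, as described here, matches Python's `statistics.pstdev`; when estimating a population's spread from a sample, many texts divide by one less than the number of scores instead (`statistics.stdev`).

```python
import math

def standard_deviation(scores):
    """Population SD: square root of the mean of squared deviations from the mean."""
    n = len(scores)
    if n == 0:
        raise ValueError("need at least one score")
    mean = sum(scores) / n
    squared_diffs = [(x - mean) ** 2 for x in scores]  # squaring removes negative signs
    return math.sqrt(sum(squared_diffs) / n)           # divide by n, then take the root

print(standard_deviation([2, 4, 4, 4, 5, 5, 7, 9]))  # 2.0
```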

 

STATISTICAL MODEL:  mathematical formulas applied to sets of numbers to find certain results

 

STATISTICAL SIGNIFICANCE:  the property of a result that is unlikely to have occurred by chance alone, where "unlikely" is defined operationally by an alpha level (a subjective decision made by the researcher), usually .05, .01, or .001; a result is statistically significant when the probability of obtaining it by chance alone is no greater than the chosen alpha level
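As an illustration (the choice of test is ours, not the sources'), a one-sided exact binomial calculation asks how likely a result at least this extreme would be if only chance were at work, and compares that probability with alpha. The helper name `p_at_least` and the coin-flip scenario are hypothetical.

```python
from math import comb

def p_at_least(k, n, p=0.5):
    """Probability of k or more successes in n trials if only chance (p) is at work."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

alpha = 0.05                 # chosen by the researcher before the study
p_value = p_at_least(9, 10)  # e.g., 9 of 10 coin flips came up heads
print(round(p_value, 4), "significant" if p_value <= alpha else "not significant")
# prints: 0.0107 significant
```

By contrast, 7 heads in 10 flips would occur by chance about 17% of the time, which is not significant at the .05 level.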

 

SUBSTANTIVE HYPOTHESIS (research hypothesis):  suggests that a relationship between two or more variables exists, an idea which is usually to be tested through research

 

TARGET POPULATION:  the whole group about which information is sought and to which findings are meant to be generalized; the sample is drawn from this group

 

THEORY:  a set of constructs and definitions asserting the relationship among certain events or variables in order to predict and explain the relationships between the variables (e.g. as in a cause-and-effect or other relationship)

 

TREND ANALYSIS:  a curve-fitting technique in which one tries to describe the relationship between factors over a number of repeated measures or observations, allowing some prediction of future values
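The simplest case of curve fitting is a straight-line (linear) trend fitted by least squares; the sketch below fits one to observations taken at equally spaced times and projects one step ahead. The function name `linear_trend` and the enrollment figures are hypothetical, and real trend analyses often use curved models instead.

```python
def linear_trend(ys):
    """Least-squares straight line through observations taken at times 1, 2, ..., n."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two observations")
    xs = range(1, n + 1)
    mean_x = (n + 1) / 2
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx                    # change per time period
    intercept = mean_y - slope * mean_x  # line passes through (mean_x, mean_y)
    return slope, intercept

enrollment = [410, 432, 455, 470, 498]  # hypothetical counts for five terms
slope, intercept = linear_trend(enrollment)
print(f"trend: {slope:.1f} per term; projected term 6: {intercept + slope * 6:.0f}")
# prints: trend: 21.4 per term; projected term 6: 517
```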

 

VALIDITY:  the degree to which a test measures what it was designed to measure

 

1)  face validity:  students' reactions to the test, i.e., whether the test appears to measure what it claims to measure (the least accurate estimate of a test's validity)

2)  content validity (definition or logical validity):  the demonstration of how representative the test items are of the content or subject matter which the test supposedly measures

3)  concurrent validity:  how well a test correlates with another test whose validity has already been established; "known-group validity" applies the test to a group already shown by another test to have a certain factor or characteristic, to see whether the new test yields the same finding

4)  predictive validity:  predicts into the future on the basis of test results (inferential) and then checks that prediction; the more often the prediction is correct, or the better than chance the statistician can predict, the greater the predictive validity

*  Predictive validity and concurrent validity, taken together, are called criterion validity (also empirical or statistical validity), which concerns a test's accuracy and usefulness

5)  construct validity:  a conglomeration of all the other types of validity, used to judge how closely a test measures the construct (the theoretical trait or concept) it is intended to measure

 

VARIABILITY:  the degree to which scores differ or vary around the central tendency; the more representative the measure of central tendency is of the scores, the less the variability, and the less representative it is, the more the variability; variability is measured by the range, the quartile deviation, and the standard deviation

 

Sources: 

Freund, John E.  Statistics:  A First Course.  3rd ed.  Englewood Cliffs:  Prentice-Hall, Inc., 1981.

Hamburg, Morris.  Basic Statistics:  A Modern Approach.  2nd ed.  New York:  Harcourt Brace Jovanovich, Inc., 1979, 1974.

Mansfield, Edwin.  Basic Statistics with Applications.  New York:  W. W. Norton & Company, 1986.

Newman, Isadore, and Carole Newman.  Conceptual Statistics for Beginners.  2nd ed.  Lanham:  University Press of America, Inc., 1994.

 

Note:  Excel is an excellent program, available on campus PCs, that includes a "Wizard" feature which helps create graphs and tables automatically from statistical information; these can then be imported into a word-processing file.  Ask your computer lab assistants how this may be done.

(Revised 1998)