Contingency Tables
Gerard E.

A contingency table is a table of counts. A two-dimensional contingency table is formed by classifying subjects by two variables. One variable determines the row categories; the other variable defines the column categories.

The combinations of row and column categories are called cells. To use the statistical methods usually applied to such tables, each subject must fall into one and only one row category and one and only one column category. Such categories are said to be exclusive and exhaustive.

Exclusive means the categories don't overlap, so a subject falls into only one category. Exhaustive means that the categories include all possibilities, so there's a category for everyone. Often, categories can be made exhaustive by creating a catch-all such as "Other" or by changing the definition of those being studied to include only the available categories.
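A minimal sketch of classifying subjects into cells, assuming hypothetical records of (college type, eating-disorder status) pairs; the category names and helper functions are illustrative, not from the source. The row mapping sends anything other than public or private to an "other" catch-all so the rows are exhaustive, and each subject increments exactly one cell so the categories are exclusive. An unrecognized response is rejected rather than silently dropped.

```python
from collections import Counter

# Hypothetical category sets; "other" is the catch-all that makes the
# college-type (row) categories exhaustive.
ROW_CATEGORIES = ("public", "private", "other")
COL_CATEGORIES = ("eating disorder", "no eating disorder")


def classify_college(college_type: str) -> str:
    """Map a raw college type to exactly one row category (exclusive),
    sending anything unrecognized to the catch-all (exhaustive)."""
    return college_type if college_type in ("public", "private") else "other"


def build_table(subjects):
    """Count (row, column) pairs; each subject lands in exactly one cell."""
    counts = Counter()
    for college_type, disorder_status in subjects:
        row = classify_college(college_type)
        if disorder_status not in COL_CATEGORIES:
            raise ValueError(f"unclassifiable response: {disorder_status!r}")
        counts[(row, disorder_status)] += 1
    # Return a dense table that includes cells with zero counts.
    return {r: {c: counts[(r, c)] for c in COL_CATEGORIES} for r in ROW_CATEGORIES}


# Hypothetical records.
subjects = [
    ("public", "no eating disorder"),
    ("private", "eating disorder"),
    ("community", "no eating disorder"),  # falls into "other"
    ("public", "eating disorder"),
]
table = build_table(subjects)
```

Because every subject is counted once, the cell counts always add up to the number of subjects, which is a quick check that the categories really are exclusive and exhaustive.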

Also, the observations must be independent. This can be a problem when, for example, families are studied, because members of the same family are more similar than individuals from different families. The analysis of such data is beyond the current scope of these notes.

Textbooks often devote a chapter or two to the comparison of two proportions (the percentage of high school males and females with eating disorders, for example) using techniques similar to those for comparing two means.
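The notes do not give a formula, so here is a minimal sketch of one standard technique of this kind, the pooled two-proportion z-test. Like the two-sample z-test for means, it divides a difference in sample proportions by its standard error. The counts below are invented for illustration.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int):
    """Pooled two-sample z-test of H0: p1 == p2.

    Returns (difference in sample proportions, z statistic, two-sided p-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion assumed under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p1 - p2, z, p_value


# Hypothetical counts: 12 of 200 females and 4 of 200 males with eating disorders.
diff, z, p = two_proportion_z_test(12, 200, 4, 200)
```

With these made-up counts, z is about 2.04 and the two-sided p-value is roughly 0.04. The pooled standard error is used because it is the variance implied by the null hypothesis. A confidence interval for the difference would normally use the unpooled standard error instead.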

When plots are made from two continuous variables where one is an obvious response to the other (for example, cholesterol level as a response to saturated fat intake), standard practice is to put the response (cholesterol) on the vertical Y axis and the carrier (fat intake) on the horizontal X axis.

For tables of counts, it is becoming common practice for the row categories to specify the populations or groups and the column categories to specify the responses. For example, in studying the association between smoking and disease, the row categories would be the categories of smoking status, while the columns would denote the presence or absence of disease.

This is in keeping with A. Ehrenberg's observation that it is easier to make a visual comparison of values in the same column than in the same row.

Sampling Schemes

There are many ways to generate tables of counts.

Three of the most common sampling schemes are:

1. Unrestricted (Poisson) sampling: Collect data until the sun sets, the money runs out, or fatigue sets in.
2. Sampling with the grand total fixed (multinomial sampling): Collect data on a predetermined number of individuals and classify them according to the two classification variables.
3. Sampling with one set of marginal totals fixed (compound multinomial sampling): Collect data on a predetermined number of individuals from each category of one of the variables and classify them according to the other variable. This approach is useful when some of the categories are rare and might not be adequately represented if the sampling were unrestricted or only the grand total were fixed.

For example, suppose you wished to assess the association between tobacco use and a rare disease. It would be better to take fixed numbers of subjects with and without the disease and examine them for tobacco use.
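The three schemes differ only in what is held fixed. A minimal simulation sketch makes this concrete, assuming hypothetical cell probabilities for tobacco use (rows) by disease status (columns). The probabilities and function names are illustrative, not from the source. To stay within the standard library, Poisson counts come from a small hand-written generator using Knuth's multiplication method.

```python
import math
import random

# Hypothetical cell probabilities: rows = tobacco use, columns = disease status.
CELL_PROBS = {("user", "disease"): 0.02, ("user", "no disease"): 0.28,
              ("non-user", "disease"): 0.01, ("non-user", "no disease"): 0.69}
ROWS = ("user", "non-user")


def poisson(rng: random.Random, mean: float) -> int:
    """Draw one Poisson count with Knuth's multiplication method.

    Multiplies uniforms until the product falls to exp(-mean) or below;
    adequate for the modest means used here.
    """
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def unrestricted(rng, expected_total):
    """Poisson sampling: every cell count is random; no total is fixed."""
    return {cell: poisson(rng, expected_total * p) for cell, p in CELL_PROBS.items()}


def grand_total_fixed(rng, n):
    """Multinomial sampling: n subjects, each classified into one cell."""
    cells = list(CELL_PROBS)
    draws = rng.choices(cells, weights=[CELL_PROBS[c] for c in cells], k=n)
    return {cell: draws.count(cell) for cell in cells}


def one_margin_fixed(rng, n_per_group):
    """Compound multinomial sampling: a fixed number of subjects with and
    without the disease, each then classified by tobacco use."""
    table = {}
    for disease in ("disease", "no disease"):
        # Relative weights within a disease group are the conditional
        # probabilities of tobacco use given that disease status.
        weights = [CELL_PROBS[(r, disease)] for r in ROWS]
        draws = rng.choices(ROWS, weights=weights, k=n_per_group)
        for r in ROWS:
            table[(r, disease)] = draws.count(r)
    return table


rng = random.Random(1)
t_poisson = unrestricted(rng, 400)
t_multinomial = grand_total_fixed(rng, 400)
t_compound = one_margin_fixed(rng, 50)
```

Only the last scheme guarantees 50 subjects with the rare disease. Under the other two, the expected number with the disease is just 12 of 400 with these made-up probabilities.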

Each sampling scheme results in a table of counts. It is impossible to determine which sampling scheme was used merely by looking at the data.

Yet the sampling scheme is important, because some things easily estimated from one scheme are impossible to estimate from the others. The more that is specified by the sampling scheme, the fewer things can be estimated from the data.

If sampling occurs with only the grand total fixed, then any population proportion of interest can be estimated.

For example, we can estimate the population proportion of individuals with eating disorders, the proportion attending public colleges, the proportion attending public colleges without an eating disorder, and so on.
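A minimal sketch of those estimates, using a hypothetical table of 1,000 students. The counts are invented for illustration and do not come from the source. Every estimate is a count divided by the grand total. That ratio is meaningful here only because the grand total was the only quantity the design fixed. `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

# Hypothetical counts from 1,000 students sampled with only the grand total fixed.
table = {
    "public":  {"eating disorder": 15, "no eating disorder": 585},
    "private": {"eating disorder": 5,  "no eating disorder": 395},
}


def grand_total(t):
    return sum(sum(row.values()) for row in t.values())


def cell_proportion(t, row, col):
    """Estimated population proportion in one cell."""
    return Fraction(t[row][col], grand_total(t))


def row_proportion(t, row):
    """Estimated population proportion in one row category (row total / N)."""
    return Fraction(sum(t[row].values()), grand_total(t))


def col_proportion(t, col):
    """Estimated population proportion in one column category (column total / N)."""
    return Fraction(sum(r[col] for r in t.values()), grand_total(t))


with_disorder = col_proportion(table, "eating disorder")              # 20/1000
public = row_proportion(table, "public")                              # 600/1000
public_no_ed = cell_proportion(table, "public", "no eating disorder")  # 585/1000
```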

Suppose, due to the rarity of eating disorders, 50 individuals with eating disorders and 50 individuals without eating disorders are studied. Many population proportions can no longer be estimated from the data.

It's hardly surprising we can't estimate the proportion of the population with eating disorders. If we choose to look at 50 individuals with eating disorders and 50 without, we obviously shouldn't be able to estimate the population proportion that suffers from eating disorders.

Is it as obvious that we cannot estimate the proportion of the population that attends private colleges? We cannot, if there is an association between eating disorders and type of college. The proportion of the sample attending private colleges mixes the two disease groups in the 50:50 ratio fixed by the design rather than in the population's unknown ratio.
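A minimal numerical sketch of this point, using hypothetical figures not taken from the source. Suppose 60% of students with an eating disorder attend private colleges, 30% of those without do, and the prevalence of eating disorders is 2%. By the law of total probability, the population proportion is P(private | ED) P(ED) + P(private | no ED) P(no ED). The 50-and-50 design instead weights the two groups by its own fixed sample sizes.

```python
def population_private(p_priv_ed: float, p_priv_no_ed: float, prevalence: float) -> float:
    """Population proportion attending private colleges (law of total probability)."""
    return p_priv_ed * prevalence + p_priv_no_ed * (1 - prevalence)


def design_private(p_priv_ed: float, p_priv_no_ed: float,
                   n_ed: int = 50, n_no_ed: int = 50) -> float:
    """Expected sample proportion attending private colleges when the numbers
    with and without eating disorders are fixed by the design."""
    return (p_priv_ed * n_ed + p_priv_no_ed * n_no_ed) / (n_ed + n_no_ed)


# Hypothetical conditional probabilities and prevalence.
truth = population_private(0.6, 0.3, 0.02)  # 0.6*0.02 + 0.3*0.98 = 0.306
sample = design_private(0.6, 0.3)           # (0.6*50 + 0.3*50) / 100 = 0.45
```

The two answers agree only when P(private | ED) equals P(private | no ED), that is, when there is no association. They would also agree if the design happened to match the unknown prevalence.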


