Inferential analysis is often conducted to examine the relationship between variables or to estimate population parameters. There is a wide variety of inferential methods of analysis. The most common include:
1. Cross-tabulation
Cross-tabulation is often used to examine the relationship between two categorical (nominal) variables. The data are presented in a tabular layout, with the rows representing the categories of the first variable and the columns representing the categories of the second variable. The association between the two variables is usually tested with the chi-square test.
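As an illustration of how a cross-tabulation of two categorical variables is tested for association, the sketch below computes the Pearson chi-square statistic by hand. The table (gender by smoking status) and its counts are hypothetical, invented only for this example. In practice a statistics package would also report the p-value.

```python
# Hypothetical 2x2 cross-tabulation (made-up counts):
# rows = gender, columns = smoking status.
observed = [
    [20, 30],  # male:   smoker, non-smoker
    [10, 40],  # female: smoker, non-smoker
]

def chi_square(table):
    """Return the Pearson chi-square statistic and its degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

stat, dof = chi_square(observed)
print(f"chi-square = {stat:.3f}, df = {dof}")
```

The larger the statistic relative to its degrees of freedom, the stronger the evidence that the row and column variables are related.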
2. The t-test
The t-test is used to examine the relationship between an independent categorical (nominal or ordinal) variable and a dependent numeric (discrete or continuous) variable. It is used only when the independent categorical variable has two categories. The test compares the mean of the dependent variable across the two categories of the independent variable. For instance, the t-test may be used to compare mean mathematics scores across the two gender categories (male vs. female). The t-test is a parametric test; hence, it can be used only when the data fulfil the assumptions for parametric tests, such as normal distribution of the dependent variable. If the data do not meet these assumptions, a non-parametric equivalent such as the Mann-Whitney U test is used. There are two types of t-test:
- The independent sample t-test
- The paired sample t-test
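The difference between the two types of t-test can be sketched as follows. This is a minimal illustration using only the Python standard library; the scores are hypothetical, and the independent version assumes equal variances in the two groups (the pooled-variance form). A statistics package would normally be used, and it would also return the p-value.

```python
import math
import statistics

def independent_t(a, b):
    """Pooled-variance t statistic and df for two independent groups."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a) +
                  (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se, na + nb - 2

def paired_t(first, second):
    """t statistic and df for two measurements on the same subjects."""
    diffs = [y - x for x, y in zip(first, second)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se, len(diffs) - 1

# Hypothetical mathematics scores for two separate groups of students.
male = [62, 70, 58, 75, 65]
female = [68, 74, 71, 80, 67]
t_ind, df_ind = independent_t(male, female)

# Hypothetical scores for the same five students measured twice.
before = [60, 65, 70, 55, 80]
after = [66, 67, 75, 60, 82]
t_pair, df_pair = paired_t(before, after)

print(f"independent: t = {t_ind:.2f}, df = {df_ind}")
print(f"paired:      t = {t_pair:.2f}, df = {df_pair}")
```

The independent version compares the means of two different groups, while the paired version tests whether the mean of the within-subject differences is zero.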
3. Analysis of Variance (ANOVA)
Analysis of variance (ANOVA) is similar to the t-test, but it is used where the independent categorical variable has more than two categories. For instance, it may be used to compare the mathematics scores of students in three or more classes. Class (a categorical variable) becomes the independent variable, while mathematics score (a numeric variable) becomes the dependent variable. ANOVA is also a parametric test; hence, the data must fulfil the assumptions for parametric tests for it to be used.
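The comparison of three classes described above can be sketched as a one-way ANOVA computed by hand. The class scores below are hypothetical, and the function only returns the F statistic and its degrees of freedom; obtaining the p-value would require an F distribution, which a statistics package provides.

```python
import statistics

def one_way_anova(groups):
    """Return the F statistic with between- and within-group df."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                     for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical mathematics scores for three classes.
class_a = [60, 65, 70]
class_b = [70, 75, 80]
class_c = [80, 85, 90]

f_stat, df_b, df_w = one_way_anova([class_a, class_b, class_c])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F means the class means differ by more than the spread of scores within the classes would suggest.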
4. Correlation Analysis
This method examines the relationship between two numeric (discrete or continuous) variables. For instance, correlation analysis may be used to examine the relationship between students' performance in mathematics and their performance in science (both variables are continuous). Correlation tells you about the strength as well as the direction of the relationship. It gives a correlation coefficient (r) between -1 and 1; a coefficient closer to 0 suggests a weak relationship, while a coefficient closer to -1 or 1 suggests a strong relationship. The sign of the coefficient gives the direction. A positive coefficient indicates that the two variables have a positive relationship (when one increases, the other also increases), while a negative coefficient indicates a negative relationship (when one increases, the other decreases).
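To make the strength and direction of a correlation concrete, here is a small sketch that computes the Pearson correlation coefficient by hand. All three variables and their values are hypothetical, chosen so that one pair is positively related and the other negatively related.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data for five students.
maths = [55, 60, 65, 70, 80]
science = [50, 62, 63, 72, 78]
days_absent = [10, 8, 7, 4, 1]

r_pos = pearson_r(maths, science)      # strong positive relationship
r_neg = pearson_r(maths, days_absent)  # strong negative relationship
print(f"maths vs science:     r = {r_pos:.2f}")
print(f"maths vs days absent: r = {r_neg:.2f}")
```

With these made-up values the first coefficient is about 0.96 and the second about -0.99: both relationships are strong, but they run in opposite directions.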
5. Regression Analysis
Regression is an advanced method of inferential analysis that enables the examination of the relationship between multiple variables (multivariate analysis). It is used for two main purposes:
- Identifying a set of variables that predicts an outcome (the dependent variable). For instance, regression analysis may be used to identify a set of variables that predict the development of obesity (BMI of 30 or greater).
- Identifying which variables have a significant relationship with the outcome variable. Significance is indicated by the t-test for each beta estimate, while the magnitude and sign of the estimate show the strength and direction of the relationship. Unlike bivariate methods of analysis, which only show the association between two variables, regression analysis controls for other variables that can also affect the outcome variable. This gives stronger support for a cause-effect interpretation, although it does not by itself prove causation.
Regression analysis yields several statistical measures. The first is R-squared, which indicates the goodness of fit of the regression model: the proportion of variation in the outcome (dependent) variable that is explained by the predictor (independent) variables. For instance, the regression analysis in our example sought to determine the factors that predict the BMI of the participants. The R-squared is 0.162, which suggests that the predictors in that model explain 16.2% of the variation in the participants' BMI.
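The meaning of R-squared can be illustrated with a one-predictor regression fitted by ordinary least squares. The ages and BMI values below are hypothetical and are not the data behind the 0.162 figure above. A real analysis with several predictors would use a statistics package.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for a single predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(x, y, slope, intercept):
    """Share of the variation in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot

# Hypothetical ages and BMI values for five participants.
age = [20, 30, 40, 50, 60]
bmi = [22, 25, 24, 28, 27]

slope, intercept = fit_line(age, bmi)
r2 = r_squared(age, bmi, slope, intercept)
print(f"BMI = {intercept:.2f} + {slope:.2f} * age, R-squared = {r2:.3f}")
```

Here R-squared is about 0.741, meaning that age accounts for roughly 74% of the variation in BMI in this made-up sample.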
Another measure produced when one runs a regression model is the F-test (ANOVA), which tells whether the predictor variables in the model, taken together, have a significant relationship with the outcome variable. In our example, the F-test gave a p-value of less than 0.001. This indicates that the relationship between BMI and the predictors is statistically significant.
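The overall F statistic is tied to R-squared through a standard identity, sketched below. The R-squared of 0.162 comes from the example in the text, but the sample size and the number of predictors are not given there, so the values of n and k used here are assumptions for illustration only.

```python
def overall_f(r2, n, k):
    """Overall F statistic for a regression with k predictors and n cases.

    Uses the identity F = (R^2 / k) / ((1 - R^2) / (n - k - 1)).
    """
    df_model = k
    df_residual = n - k - 1
    f_stat = (r2 / df_model) / ((1 - r2) / df_residual)
    return f_stat, df_model, df_residual

# R-squared from the text; n = 500 and k = 5 are assumed, not from the source.
f_stat, df_model, df_residual = overall_f(r2=0.162, n=500, k=5)
print(f"F({df_model}, {df_residual}) = {f_stat:.2f}")
```

The identity shows why even a modest R-squared can give a highly significant F-test when the sample is large.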
The third type of measure comprises the beta coefficients and their associated t-tests. The beta coefficients describe the relationship between each predictor variable and the outcome variable while holding the other predictors constant. In our example, the beta coefficient table shows that some variables, such as age and ethnicity, have a statistically significant relationship with BMI, while others, such as gender, do not.