Study Notes: Bivariate Data

Free study material for CA Foundation Business Mathematics, Logical Reasoning and Statistics — Direction Tests

Bivariate data analysis involves studying the relationship between two variables to understand how one variable changes with respect to another.

## Core concept

Bivariate data consists of two variables observed on the same unit of observation. Unlike univariate analysis (a single variable), bivariate analysis explores:

- Correlation: strength and direction of linear association between two quantitative variables
- Regression: predicting one variable (dependent) from another (independent)
- Association: relationship between categorical or mixed-type variables

Key distinction: Correlation measures *association*; regression predicts one variable from the other.

## Classification of bivariate data

| Type | Variables | Tool | Example |
|------|-----------|------|---------|
| Quantitative–Quantitative | Both continuous/discrete | Scatter plot, Correlation, Regression | Height vs Weight |
| Qualitative–Qualitative | Both categorical | Contingency table, Chi-square | Gender vs Product preference |
| Mixed | One quantitative, one categorical | Box plot, one-way ANOVA | Sales by Region |

## Correlation analysis

Karl Pearson's Correlation Coefficient (r): $$r = \frac{\sum(x - \bar{x})(y - \bar{y})}{\sqrt{\sum(x - \bar{x})^2 \sum(y - \bar{y})^2}}$$

- Range: −1 to +1
- r = +1: Perfect positive correlation (as x increases, y increases proportionally)
- r = −1: Perfect negative correlation (as x increases, y decreases proportionally)
- r = 0: No linear correlation (a non-linear relationship may still exist)
- Rule of thumb: |r| > 0.7 is strong; 0.3 ≤ |r| ≤ 0.7 is moderate; |r| < 0.3 is weak
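The formula and the rule-of-thumb bands above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function names `pearson_r` and `strength` and the sample data are hypothetical, and treating values of exactly 0.3 or 0.7 as "moderate" is an assumption.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's r, computed from deviations about the means."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of products of deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator pieces: sums of squared deviations
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / sqrt(sxx * syy)

def strength(r):
    """Rule-of-thumb label; boundary handling at 0.3 and 0.7 is an assumption."""
    size = abs(r)
    if size > 0.7:
        return "strong"
    if size >= 0.3:
        return "moderate"
    return "weak"

# Hypothetical data in which y is exactly twice x
r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
print(round(r, 4), strength(r))
```

Because the sample data lie exactly on a rising straight line, the sketch reports a perfect positive correlation.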

Spearman's Rank Correlation Coefficient (ρ): Used when data is ordinal or skewed. $$\rho = 1 - \frac{6\sum d^2}{n(n^2-1)}$$ where *d* = difference between the two ranks of a pair, *n* = number of pairs. This form of the formula assumes there are no tied ranks.
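As a rough sketch of how the rank formula is applied, the snippet below ranks each variable, takes the rank differences *d*, and substitutes into the expression for ρ. The helper names `ranks` and `spearman_rho` and the scores are hypothetical, and the code assumes no tied values (it raises an error otherwise).

```python
def ranks(values):
    """Rank 1 = smallest value. Assumes no ties."""
    if len(set(values)) != len(values):
        raise ValueError("tied values: the simple formula does not apply")
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rho(xs, ys):
    """Spearman's rho via 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    rank_x, rank_y = ranks(xs), ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical scores given by two judges to five contestants
judge_a = [10, 20, 30, 40, 50]
judge_b = [1, 3, 2, 5, 4]
print(spearman_rho(judge_a, judge_b))
```

Here the rank differences are 0, −1, 1, −1, 1, so Σd² = 4 and ρ = 1 − 24/120 = 0.8.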

## Regression analysis (CA Foundation scope)

Simple Linear Regression: Equation of line of best fit. $$\hat{y} = a + bx$$

where:

- b (slope) = $\frac{\sum(x - \bar{x})(y - \bar{y})}{\sum(x - \bar{x})^2}$
- a (intercept) = $\bar{y} - b\bar{x}$

Interpretation:

- *a*: value of *y* when *x* = 0
- *b*: change in *y* for one-unit increase in *x*

- Line of regression of y on x: Used to predict *y* from *x*.
- Line of regression of x on y: Used to predict *x* from *y*.
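The two regression lines are generally different lines, which a short sketch can make concrete. The function `regression_line` and the data below are hypothetical; the line of x on y is obtained here simply by swapping the roles of the two variables. The final check uses the standard identity that the product of the two slopes equals r².

```python
def regression_line(xs, ys):
    """Least-squares line predicting ys from xs; returns (a, b) for y_hat = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("slope is undefined when the predictor is constant")
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b

# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a_yx, b_yx = regression_line(x, y)   # y on x: predict y from x
a_xy, b_xy = regression_line(y, x)   # x on y: predict x from y

print(f"y on x: y_hat = {a_yx:.2f} + {b_yx:.2f}x")
print(f"x on y: x_hat = {a_xy:.2f} + {b_xy:.2f}y")
print(f"product of slopes (equals r squared): {b_yx * b_xy:.2f}")
```

For this data the y-on-x slope is 0.6 while the x-on-y slope is 1.0, so rearranging one equation does not give the other.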

## Worked example

Data: Advertising spend (₹ lakhs) vs Sales (₹ crores) for 5 companies.

| Ad Spend, x (₹ lakhs) | Sales, y (₹ crores) |
|---|---|
| 2 | 5 |
| 3 | 7 |
| 4 | 8 |
| 5 | 10 |
| 6 | 12 |

Calculate: Pearson's *r* and regression equation.

$\bar{x} = 4$, $\bar{y} = 8.4$

Deviations: $x-\bar{x} = -2, -1, 0, 1, 2$ and $y-\bar{y} = -3.4, -1.4, -0.4, 1.6, 3.6$

$\sum(x-\bar{x})(y-\bar{y}) = 6.8 + 1.4 + 0 + 1.6 + 7.2 = 17$

$\sum(x-\bar{x})^2 = 4 + 1 + 0 + 1 + 4 = 10$

$\sum(y-\bar{y})^2 = 11.56 + 1.96 + 0.16 + 2.56 + 12.96 = 29.2$

$r = \frac{17}{\sqrt{10 \times 29.2}} = \frac{17}{\sqrt{292}} \approx 0.995$ (very strong positive correlation)

$b = \frac{17}{10} = 1.7$; $a = 8.4 - 1.7(4) = 1.6$

Regression equation: $\hat{y} = 1.6 + 1.7x$

*Interpretation*: For every ₹1 lakh increase in ad spend, predicted sales increase by ₹1.7 crore.
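Hand arithmetic in an example like this is easy to get wrong, so it can be worth recomputing each sum directly from the table. The sketch below does that for the five (x, y) pairs given above; the function name `summarise` and the dictionary keys are hypothetical choices made for illustration.

```python
from math import sqrt

ad_spend = [2, 3, 4, 5, 6]    # x, in ₹ lakhs (from the table)
sales = [5, 7, 8, 10, 12]     # y, in ₹ crores (from the table)

def summarise(xs, ys):
    """Return the sums of deviations, Pearson's r, and the y-on-x line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    b = sxy / sxx              # slope of y on x
    a = y_bar - b * x_bar      # intercept of y on x
    r = sxy / sqrt(sxx * syy)  # Pearson's r
    return {"x_bar": x_bar, "y_bar": y_bar, "sxy": sxy,
            "sxx": sxx, "syy": syy, "r": r, "a": a, "b": b}

# Print each quantity so it can be compared with the hand calculation
for name, value in summarise(ad_spend, sales).items():
    print(f"{name}: {value:.4f}")
```

Comparing the printed values line by line with the worked steps is a quick way to catch a slip in any single deviation or product.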

## Common exam applications

- Predict sales/demand from advertising spend or price
- Test whether correlation is statistically significant
- Compare strength of two correlations
- Identify outliers in scatter plots
- Use regression for forecasting

## Common mistakes

- Confusing correlation with causation: *r* = 0.8 means association, not that x causes y
- Ignoring data type: Pearson's *r* requires quantitative data; use Spearman's for ranks
- Extrapolating beyond data range: Regression is unreliable outside observed values
- Misreading regression direction: *y* = *a* + *b*(*x*) predicts *y* from *x*, not vice versa

