# Macroeconometrics 1: The Basic Building Blocks


Brendan Epstein, Ph.D.

Johns Hopkins University

Contents

1 Preliminaries
2 Series
3 Logs
&nbsp;&nbsp;3.1 An Example
4 Matrix Algebra
&nbsp;&nbsp;4.1 Addition
&nbsp;&nbsp;4.2 Scalar Multiplication
&nbsp;&nbsp;4.3 Subtraction
&nbsp;&nbsp;4.4 Matrix Multiplication
&nbsp;&nbsp;4.5 Laws of Matrix Algebra
&nbsp;&nbsp;4.6 Transpose
&nbsp;&nbsp;4.7 Special Kinds of Matrices
&nbsp;&nbsp;&nbsp;&nbsp;4.7.1 Square Matrix
&nbsp;&nbsp;&nbsp;&nbsp;4.7.2 Column Matrix
&nbsp;&nbsp;&nbsp;&nbsp;4.7.3 Row Matrix
&nbsp;&nbsp;&nbsp;&nbsp;4.7.4 Diagonal Matrix
&nbsp;&nbsp;&nbsp;&nbsp;4.7.5 Upper-Triangular Matrix
&nbsp;&nbsp;&nbsp;&nbsp;4.7.6 Lower-Triangular Matrix
&nbsp;&nbsp;&nbsp;&nbsp;4.7.7 Symmetric Matrix
&nbsp;&nbsp;&nbsp;&nbsp;4.7.8 Idempotent Matrix
&nbsp;&nbsp;&nbsp;&nbsp;4.7.9 Permutation Matrix
&nbsp;&nbsp;&nbsp;&nbsp;4.7.10 Power Matrices
&nbsp;&nbsp;&nbsp;&nbsp;4.7.11 Inverse Matrix
&nbsp;&nbsp;4.8 Determinants
&nbsp;&nbsp;4.9 Eigenvalues
&nbsp;&nbsp;4.10 Linear Independence and Rank of a Matrix
&nbsp;&nbsp;&nbsp;&nbsp;4.10.1 Linear Independence
&nbsp;&nbsp;&nbsp;&nbsp;4.10.2 Rank
&nbsp;&nbsp;&nbsp;&nbsp;4.10.3 Properties of Rank
&nbsp;&nbsp;4.11 Trace
5 Classical Linear Regression (CLR) Model Recap
&nbsp;&nbsp;5.1 CLR Model in Matrix Form
&nbsp;&nbsp;&nbsp;&nbsp;5.1.1 Representation
&nbsp;&nbsp;&nbsp;&nbsp;5.1.2 Estimation
&nbsp;&nbsp;5.2 CLR Model Finite-Sample Assumptions and Properties
&nbsp;&nbsp;&nbsp;&nbsp;5.2.1 Assumptions
&nbsp;&nbsp;&nbsp;&nbsp;5.2.2 Properties
6 A Primer on First-Order Difference Equations

These lecture notes closely and sometimes literally follow sections from: Dowling, Edward T. Introduction to Mathematical Economics. Schaum's Outlines, 3rd ed., McGraw Hill, 2001; Nicholson, Walter. Microeconomic Theory: Basic Principles and Extensions. South-Western College Pub, 9th ed., 2004; Simon, Carl P., and Lawrence Blume. Mathematics for Economists. New York: Norton, 1994; and Sydsaeter, Knut, and Peter J. Hammond. Mathematics for Economic Analysis. Prentice Hall, 1995.

1 Preliminaries

These lecture notes begin with a review of series and logs. Series are very important for the theory of time series analysis, and logs are a very helpful tool for applications.

The notes then dive into matrix algebra. A prominent modeling methodology in Macroeconometrics is the vector autoregression (VAR), which falls into the category of multivariate time series analysis. The word "vector" on its own conveys the fact that this topic in Macroeconometrics is best thought of in matrix format. Because the theory underlying VARs can be understood as a fairly straightforward extension of univariate time series analysis, we will make use of matrix algebra quite a bit. Hence the section in these lecture notes devoted to this topic.

Thereafter, building on the matrix-algebra development, the notes recap key assumptions of the classical linear regression model and cast the model from the vantage point of matrix algebra.

The notes come to an end by providing a primer on first-order difference equations. The heart of time series analysis involves lagged dependent variables entering estimation equations as independent variables. Understanding the implications of this modeling framework hinges on understanding first-order difference equations.


2 Series

Consider $n > 0$ numbers $a, ak, ak^2, \ldots, ak^{n-1}$. Each term is obtained by multiplying the previous one by a constant $k$. Suppose we wish to find the sum

$$s_n = a + ak + ak^2 + \ldots + ak^{n-2} + ak^{n-1},$$

which is called a finite geometric series with quotient $k$. To find the sum $s_n$ of this series, first multiply both sides of the immediately preceding equation by $k$ to obtain

$$k s_n = ak + ak^2 + ak^3 + \ldots + ak^{n-1} + ak^n.$$

Subtracting this last equation from the former yields

$$s_n - k s_n = a - ak^n$$

because all other terms cancel (you should verify this yourself).

- If $k = 1$, then all the terms in $s_n$ are equal to $a$ and therefore $s_n = n \cdot a$.
- For $k \neq 1$,

$$s_n - k s_n = a - ak^n \;\rightarrow\; (1 - k) s_n = a (1 - k^n) \;\rightarrow\; s_n = a \frac{1 - k^n}{1 - k},$$

(where "$\rightarrow$" denotes "implies") which is the summation formula for a finite geometric series.

Now, let's consider an infinite geometric series

$$s_n = a + ak + ak^2 + \ldots + ak^{n-2} + ak^{n-1} + \ldots;$$

from before we know that in the finite case

$$s_n = a \frac{1 - k^n}{1 - k} \quad \text{if } k \neq 1.$$

- What happens to this expression as $n$ tends to infinity? The answer evidently depends only on $k^n$ because only this term depends on $n$. In fact, $k^n$ tends to 0 if $-1 < k < 1$, but $k^n$ does not tend to any limit if $k > 1$ or $k \leq -1$. (If you are not yet convinced that this claim is true, study the cases $k = 2$, $k = -1$, $k = 1/2$, $k = -1/2$, and $k = -2$.) Hence, it follows that if $|k| < 1$, where $|k|$ denotes the absolute value of $k$, then the sum $s_n$ of the first $n$ terms will tend to the limit $a/(1 - k)$ as $n$ tends to infinity. We let this limit be the definition of the infinite sum under consideration, and we say that this infinite series converges in this case. If $|k| \geq 1$, we say that the infinite sum under consideration diverges, and a divergent series has no (finite) sum. Divergence is obvious if $|k| > 1$. When $k = 1$, then $s_n = na$, which tends to $+\infty$ if $a > 0$ or to $-\infty$ if $a < 0$. When $k = -1$, then $s_n$ is $a$ when $n$ is odd but 0 when $n$ is even; again, there is no limit as $n$ tends to infinity.
- To summarize:

$$s_n = a + ak + ak^2 + \ldots + ak^{n-2} + \ldots + ak^{n-1} + \ldots = \sum_{n=1}^{\infty} a k^{n-1} = \frac{a}{1 - k} \quad \text{if } |k| < 1.$$
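As a quick sanity check of these formulas, the following Python sketch compares a brute-force partial sum with the closed form and with the infinite-sum limit; the particular values of $a$, $k$, and $n$ are arbitrary assumptions chosen for illustration:

```python
# Numerical check of the finite and infinite geometric-series formulas.
a, k, n = 3.0, 0.5, 20   # illustrative values, not from the notes

# Brute-force partial sum: a + a*k + a*k**2 + ... + a*k**(n-1)
s_n = sum(a * k**j for j in range(n))

# Closed-form finite sum (valid for k != 1)
s_n_formula = a * (1 - k**n) / (1 - k)

# Infinite-sum limit (valid for |k| < 1)
s_inf = a / (1 - k)

print(s_n, s_n_formula, s_inf)  # first two agree; both approach a/(1-k) = 6.0
```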

3 Logs

In what follows immediately below, $x$ and $y$ can be variables or constants. Logarithms are such that (log denotes any kind of logarithm and ln denotes the natural logarithm; ln is the most common type of logarithm used in Macroeconomics and International Macroeconomics):

- $\ln(xy) = \ln(x) + \ln(y)$ (so the log of a product is equal to the sum of the logs).
- $\ln(x/y) = \ln(x) - \ln(y)$ (so the log of a ratio is equal to the difference of the logs).
- $\ln(x^y) = y \ln(x)$ (so the log of a variable or constant raised to the power of another variable or constant is equal to the product of the exponent times the log of the base).
- $\ln(1 + x) \approx x$. This approximation is very precise when $x$ is small (meaning less than about 0.1), and still reasonably good for somewhat larger $x$. This approximation is extremely useful because it enables us to use log changes to approximate growth rates, as shown immediately below.

If a variable $x_t$ grows at rate $g_t$ between period $t-1$ and period $t$, then:

$$\frac{x_t}{x_{t-1}} - 1 = g_t \;\rightarrow\; \frac{x_t}{x_{t-1}} = 1 + g_t$$

$$\rightarrow\; \ln\left(\frac{x_t}{x_{t-1}}\right) = \ln(1 + g_t) \quad \text{(taking logs of both sides)}$$

$$\rightarrow\; \ln(x_t) - \ln(x_{t-1}) = \ln(1 + g_t) \quad \text{(using the log properties from above)}$$

$$\rightarrow\; d\ln(x_t) = \ln(1 + g_t) \quad \text{(the change in a log can simply be stated as } d\ln(x_t)\text{)}$$

$$\rightarrow\; d\ln(x_t) \approx g_t \quad \text{(using the log approximation from above)},$$

where subscripts denote the time period. Therefore, the log change of a variable $x$ is approximately equal to the variable's growth rate. It immediately follows that the percent change of a variable $x$ can be approximated as $100 \cdot d\ln(x_t) = 100\left[\ln(x_t) - \ln(x_{t-1})\right]$.
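To see how tight the approximation is, here is a minimal Python check; the sample growth rates are assumed values for illustration:

```python
import math

# Check that d ln(x_t) = ln(1 + g_t) is close to g_t for small growth rates.
for g in [0.01, 0.05, 0.10, 0.25]:
    log_change = math.log(1 + g)
    print(f"g = {g:.2f}   ln(1+g) = {log_change:.4f}   error = {g - log_change:.4f}")
```

The error grows with $g$, which is why the approximation is most trusted for growth rates below roughly 10 percent.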

3.1 An Example

You may know that in macroeconomic contexts growth rates are typically quoted at "annual frequency." Consider any variable $Z_t$ for which you have data available at less-than-yearly frequency. In particular, say you have data available at a frequency of $n$ observations per year. Then, the annualized growth rate $g_t^Z$ of this variable is:

$$g_t^Z = \left(\frac{Z_t}{Z_{t-1}}\right)^n - 1.$$

So, for instance, if you had data at monthly frequency, the annualized monthly growth rate of $Z_t$ is

$$g_t^Z = \left(\frac{Z_t}{Z_{t-1}}\right)^{12} - 1,$$

since there are 12 months in a year; similarly, if you had data at quarterly frequency, the annualized quarterly growth rate of $Z_t$ is

$$g_t^Z = \left(\frac{Z_t}{Z_{t-1}}\right)^{4} - 1,$$

since there are 4 quarters in a year; etc.

In turn, the term "yearly growth rate," in particular when referred to as related to GDP, can refer to two things. The fourth-quarter over fourth-quarter growth of GDP, so, for instance,

$$g_t^{GDP} = \frac{GDP_{2015:Q4}}{GDP_{2014:Q4}} - 1 \approx \ln\left(GDP_{2015:Q4}\right) - \ln\left(GDP_{2014:Q4}\right),$$

where $GDP_{2015:Q4}$ is the level of GDP in the fourth quarter of the year 2015 and $GDP_{2014:Q4}$ is the level of GDP in the fourth quarter of the year 2014. Or, the year-over-year growth of GDP, so, for instance,

$$g_t^{GDP} = \frac{GDP_{2015}}{GDP_{2014}} - 1 \approx \ln\left(GDP_{2015}\right) - \ln\left(GDP_{2014}\right).$$

Another frequently cited growth rate in macroeconomics is that of prices, i.e., inflation, since a majority of central banks around the world have adopted inflation targets. As you know, the aggregate price level is usually measured by some sort of consumer price index (CPI). Annualization of inflation is, of course, analogous to the preceding examples. One thing to keep in mind, though, is that central bank inflation targets are usually cast from the vantage point of what's called a "12-month inflation target," meaning that if, say, a central bank has an inflation target of 2 percent, then its measure of success is whether

$$100 \left(\frac{CPI_t}{CPI_{t-12}} - 1\right) \approx 100 \left[\ln\left(CPI_t\right) - \ln\left(CPI_{t-12}\right)\right]$$

is approximately equal to 2 percent every month, where the CPI data is assumed to be at monthly frequency.
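A minimal Python sketch of both calculations follows; the CPI levels used are made-up numbers for illustration only:

```python
import math

# Annualizing a one-period growth rate observed at frequency n (n periods per year):
# g = (Z_t / Z_{t-1})**n - 1.
def annualized_growth(z_now, z_prev, n):
    """Annualized growth rate from two consecutive observations at frequency n."""
    return (z_now / z_prev) ** n - 1

cpi_now, cpi_prev = 251.2, 250.8                   # hypothetical consecutive monthly CPI levels
print(annualized_growth(cpi_now, cpi_prev, 12))    # annualized monthly inflation

# 12-month inflation instead compares CPI_t with its level a year earlier,
# and is well approximated by 100*(ln CPI_t - ln CPI_{t-12}).
cpi_year_ago = 246.5                               # hypothetical level 12 months earlier
print(100 * (math.log(cpi_now) - math.log(cpi_year_ago)))
```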

4 Matrix Algebra

A matrix is simply a rectangular array of numbers. The size of a matrix is indicated by the number of its rows and the number of its columns. A matrix with $k$ rows and $n$ columns, where $k$ and $n$ are positive integers, is called a $k \times n$ (read as "k by n") matrix. The number in row $i$ and column $j$ is called the $(i,j)$th entry (read as the "i, jth entry"), where $i$ and $j$ are positive integers.

4.1 Addition

Only matrices of the same size can be added, i.e., matrices with the same numbers of rows and columns. The resulting sum is a matrix of the same size as those being added. The $(i,j)$th entry of the sum of two matrices is simply the sum of the $(i,j)$th entries of the matrices being added. For example, consider two $k \times n$ matrices $A$ and $B$; then:

$$A + B = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & a_{ij} & \vdots \\ a_{k1} & \cdots & a_{kn} \end{bmatrix} + \begin{bmatrix} b_{11} & \cdots & b_{1n} \\ \vdots & b_{ij} & \vdots \\ b_{k1} & \cdots & b_{kn} \end{bmatrix} = \begin{bmatrix} a_{11} + b_{11} & \cdots & a_{1n} + b_{1n} \\ \vdots & a_{ij} + b_{ij} & \vdots \\ a_{k1} + b_{k1} & \cdots & a_{kn} + b_{kn} \end{bmatrix}.$$

The matrix 0, whose entries are all zero, is an additive identity, since $A + 0 = A$ for all matrices $A$. Of course, for this addition to take place, the 0 matrix under consideration must have the same number of rows and columns as the matrix $A$ under consideration. For example, consider the $k \times n$ matrix $A$; then:

$$A + 0 = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & a_{ij} & \vdots \\ a_{k1} & \cdots & a_{kn} \end{bmatrix} + \begin{bmatrix} 0_{11} & \cdots & 0_{1n} \\ \vdots & 0_{ij} & \vdots \\ 0_{k1} & \cdots & 0_{kn} \end{bmatrix} = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & a_{ij} & \vdots \\ a_{k1} & \cdots & a_{kn} \end{bmatrix}.$$

- Whenever clarification about matrix dimensions is needed, the matrix's dimensions are noted as a subscript under the matrix's name. For instance, in the preceding case we could have written the zero matrix in question as $0_{k \times n}$. Also, if this zero matrix were a square matrix with dimensions $n \times n$, we could simply write $0_n$.

4.2 Scalar Multiplication

Matrices can be multiplied by scalars. This operation is called scalar multiplication. For instance, the product of the $k \times n$ matrix $A$ and the scalar $r$ is created by multiplying each entry of $A$ by $r$:

$$rA = r \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & a_{ij} & \vdots \\ a_{k1} & \cdots & a_{kn} \end{bmatrix} = \begin{bmatrix} r a_{11} & \cdots & r a_{1n} \\ \vdots & r a_{ij} & \vdots \\ r a_{k1} & \cdots & r a_{kn} \end{bmatrix}.$$

4.3 Subtraction

Consider a $k \times n$ matrix $A$; since $-A$ is what one adds to $A$ to obtain 0, then:

$$-1 \cdot \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & a_{ij} & \vdots \\ a_{k1} & \cdots & a_{kn} \end{bmatrix} = \begin{bmatrix} -a_{11} & \cdots & -a_{1n} \\ \vdots & -a_{ij} & \vdots \\ -a_{k1} & \cdots & -a_{kn} \end{bmatrix}.$$

Furthermore, consider two $k \times n$ matrices $A$ and $B$. Since $A - B$ is just shorthand for $A + (-B)$, then:

$$A - B = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & a_{ij} & \vdots \\ a_{k1} & \cdots & a_{kn} \end{bmatrix} - \begin{bmatrix} b_{11} & \cdots & b_{1n} \\ \vdots & b_{ij} & \vdots \\ b_{k1} & \cdots & b_{kn} \end{bmatrix} = \begin{bmatrix} a_{11} - b_{11} & \cdots & a_{1n} - b_{1n} \\ \vdots & a_{ij} - b_{ij} & \vdots \\ a_{k1} - b_{k1} & \cdots & a_{kn} - b_{kn} \end{bmatrix}.$$

4.4 Matrix Multiplication

Not all matrices can be multiplied together, and the order in which matrices are multiplied can matter.

The product of two matrices $A$ and $B$ exists in the order $A \cdot B$ if and only if the number of columns of $A$ equals the number of rows of $B$.

Assuming $A$ is $k \times m$ and $B$ is $m \times n$, so the multiplication can be carried out, the $(i,j)$th entry of $A \cdot B$ is given by

$$\sum_{h=1}^{m} a_{ih} b_{hj}.$$

For example, let $A$ be a $3 \times 2$ matrix and $B$ be a $2 \times 2$ matrix; since the number of columns of $A$ equals the number of rows of $B$, the product $A \cdot B$ exists:

$$A \cdot B = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix} \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} = \begin{bmatrix} a_{11}b_{11} + a_{12}b_{21} & a_{11}b_{12} + a_{12}b_{22} \\ a_{21}b_{11} + a_{22}b_{21} & a_{21}b_{12} + a_{22}b_{22} \\ a_{31}b_{11} + a_{32}b_{21} & a_{31}b_{12} + a_{32}b_{22} \end{bmatrix}.$$

- In this case the product taken in reverse order, that is, $B \cdot A$, is not defined.

Note that if $A$ is a $k \times m$ matrix and $B$ is an $m \times n$ matrix, then the product $A \cdot B$ will be a $k \times n$ matrix. As such, the product inherits the number of its rows from $A$ and the number of its columns from $B$.

The $n \times n$ matrix $I$,

$$I_n = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix},$$

with $a_{ii} = 1$ and $a_{ij} = 0$ for all $i \neq j$, has the property that for any $m \times n$ matrix $A$,

$$A \cdot I_n = A,$$

and for any $n \times l$ matrix $B$,

$$I_n \cdot B = B.$$

The matrix $I$ is called the identity matrix because it is a multiplicative identity for matrices, just like the scalar 1 is for numbers.
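The dimension rules above are easy to check numerically. Here is a short Python/NumPy sketch; the matrix entries are arbitrary assumed values:

```python
import numpy as np

# Dimension rules: A (3x2) times B (2x2) is defined; B times A is not.
A = np.array([[1., 2.], [3., 4.], [5., 6.]])   # 3 x 2
B = np.array([[1., 0.], [2., 1.]])             # 2 x 2

print(A @ B)          # defined: result is 3 x 2
# B @ A would raise a ValueError, since the shapes (2,2) and (3,2) are not aligned.

# The identity matrix is a multiplicative identity.
I2 = np.eye(2)
print(np.allclose(A @ I2, A))   # True
```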

4.5 Laws of Matrix Algebra

Associative laws:

$$(A + B) + C = A + (B + C);$$

$$(A \cdot B) \cdot C = A \cdot (B \cdot C).$$

Commutative law for addition:

$$A + B = B + A.$$

Distributive laws:

$$A \cdot (B + C) = A \cdot B + A \cdot C;$$

$$(A + B) \cdot C = A \cdot C + B \cdot C.$$


The one important law which numbers always satisfy but matrices do not is the commutative law for multiplication.

- Although for any scalars $a$ and $b$ it is true that $ab = ba$, it is not in general true that $A \cdot B = B \cdot A$ for two matrices $A$ and $B$, even when both products are defined.

4.6 Transpose

The transpose of a $k \times n$ matrix $A$ is the $n \times k$ matrix obtained by interchanging the rows and columns of $A$. This matrix is often written as $A'$. As such, the $(i,j)$th entry of $A$ becomes the $(j,i)$th entry of $A'$. For example,

$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}' = \begin{bmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \\ a_{13} & a_{23} \end{bmatrix}.$$

The following rules are fairly straightforward to verify:

$$(A + B)' = A' + B';$$

$$(A - B)' = A' - B';$$

$$(A')' = A;$$

$$(rA)' = rA';$$

$$(AB)' = B'A',$$

where $r$ is a scalar and $A$ and $B$ are conformable matrices.
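These rules can be spot-checked numerically; the sketch below uses randomly drawn square matrices, an assumption made so that all the products are defined:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3))

# (AB)' = B'A', not A'B'
print(np.allclose((A @ B).T, B.T @ A.T))   # True
print(np.allclose((A + B).T, A.T + B.T))   # True
print(np.allclose(A.T.T, A))               # True
```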

4.7 Special Kinds of Matrices

We now take a look at some important kinds of $k \times n$ matrices.

4.7.1 Square Matrix

$k = n$; that is, an equal number of rows and columns.

4.7.2 Column Matrix

Also referred to as a column vector. In this case, $n = 1$, so the matrix has only one column. For example:

$$\begin{bmatrix} a_{11} \\ a_{21} \\ a_{31} \end{bmatrix}.$$

4.7.3 Row Matrix

Also referred to as a row vector. In this case, $k = 1$, so the matrix has only one row. For example:

$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \end{bmatrix}.$$

4.7.4 Diagonal Matrix

$k = n$ and $a_{ij} = 0$ for $i \neq j$; that is, a square matrix in which all nondiagonal entries are 0. For example,

$$\begin{bmatrix} a_{11} & 0 \\ 0 & a_{22} \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} a_{11} & 0 & 0 \\ 0 & a_{22} & 0 \\ 0 & 0 & a_{33} \end{bmatrix}.$$

4.7.5 Upper-Triangular Matrix

$a_{ij} = 0$ if $i > j$; that is, a (usually square) matrix in which all entries below the (main) diagonal are 0. For example:

$$\begin{bmatrix} a_{11} & a_{12} \\ 0 & a_{22} \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & 0 & a_{33} \end{bmatrix}.$$

4.7.6 Lower-Triangular Matrix

$a_{ij} = 0$ if $i < j$; that is, a (usually square) matrix in which all entries above the (main) diagonal are 0. For example:

$$\begin{bmatrix} a_{11} & 0 \\ a_{21} & a_{22} \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} a_{11} & 0 & 0 \\ a_{21} & a_{22} & 0 \\ a_{31} & a_{32} & a_{33} \end{bmatrix}.$$

4.7.7 Symmetric Matrix

$A' = A$; that is, $a_{ij} = a_{ji}$ for all $i$ and $j$. These matrices are necessarily square. For example:

$$\begin{bmatrix} a & b \\ b & d \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 5 \\ 3 & 5 & 6 \end{bmatrix}.$$

4.7.8 Idempotent Matrix

A square matrix $A$ for which $A \cdot A = A$, such as $A = I$ or:

$$\begin{bmatrix} 5 & -5 \\ 4 & -4 \end{bmatrix}.$$

Indeed, note that in this case:

$$\begin{bmatrix} 5 & -5 \\ 4 & -4 \end{bmatrix} \begin{bmatrix} 5 & -5 \\ 4 & -4 \end{bmatrix} = \begin{bmatrix} 5 \cdot 5 + (-5) \cdot 4 & 5 \cdot (-5) + (-5)(-4) \\ 4 \cdot 5 + (-4) \cdot 4 & 4 \cdot (-5) + (-4)(-4) \end{bmatrix} = \begin{bmatrix} 25 - 20 & -25 + 20 \\ 20 - 16 & -20 + 16 \end{bmatrix} = \begin{bmatrix} 5 & -5 \\ 4 & -4 \end{bmatrix}.$$

4.7.9 Permutation Matrix

A square matrix of 0s and 1s in which each row and each column contains exactly one 1. For example:

$$\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

4.7.10 Power Matrices

Suppose $A$ is an $n \times n$ matrix. Then, the powers of $A$ are given by:

$$A^0 = I, \quad A^1 = A, \quad A^2 = AA, \quad A^3 = AAA, \quad \text{etc.}$$

4.7.11 Inverse Matrix

Let $A$ be an $n \times n$ matrix and $B$ be an $n \times n$ matrix. The matrix $B$ is an inverse of $A$ if:

$$A \cdot B = B \cdot A = I.$$

- An $n \times n$ matrix has at most one inverse.

Let $A$ be a $k \times n$ matrix. The $n \times k$ matrix $B$ is a right inverse of $A$ if:

$$A \cdot B = I.$$

Let $A$ be a $k \times n$ matrix. The $n \times k$ matrix $C$ is a left inverse of $A$ if:

$$C \cdot A = I.$$

- If $A$ has a right inverse $B$ and a left inverse $C$, then $A$ is invertible, and $B = C = A^{-1}$.

Let $x$ be a column vector of $n$ variables,

$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix},$$

$c$ be a column vector of $n$ constants,

$$c = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix},$$

and $A$ be an $n \times n$ matrix. Then,

$$Ax = c$$

amounts to a system of $n$ equations in $n$ unknowns. If $A$ is invertible, then there is a unique solution to the above-noted system of linear equations, and it is given by:

$$x = A^{-1} c.$$

Let $A$ and $B$ be square invertible matrices. Then:

- $\left(A^{-1}\right)^{-1} = A$.
- $\left(A'\right)^{-1} = \left(A^{-1}\right)'$.
- $A \cdot B$ is invertible, and $(A \cdot B)^{-1} = B^{-1} A^{-1}$.

If $A$ is invertible, then:

- $A^m$ is invertible for any integer $m$ and $\left(A^m\right)^{-1} = \left(A^{-1}\right)^m = A^{-m}$.
- For any integers $r$ and $s$, $A^r A^s = A^{r+s}$.
- For any scalar $r \neq 0$, $rA$ is invertible and $(rA)^{-1} = r^{-1} A^{-1}$.
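A minimal numerical illustration of solving $Ax = c$ follows; the particular $2 \times 2$ system is an assumed example, not taken from the notes:

```python
import numpy as np

# Solving A x = c when A is invertible: x = A^{-1} c.
A = np.array([[2., 1.],
              [1., 3.]])
c = np.array([5., 10.])

x = np.linalg.inv(A) @ c         # textbook formula x = A^{-1} c
x_solve = np.linalg.solve(A, c)  # numerically preferable equivalent

print(x, x_solve)                # both give [1., 3.]
```

In practice `np.linalg.solve` is preferred over forming the inverse explicitly, but both implement the same result.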

4.8 Determinants

The determinant of a matrix is a scalar that "determines" whether the matrix in question is "nonsingular" or not. In particular, for any square matrix, its determinant is such that the square matrix is nonsingular if and only if its determinant is not zero. (A singular matrix is not invertible.)

A $1 \times 1$ matrix is just a scalar. As such, consider the scalar $a$. Its inverse exists if and only if $a$ is nonzero, so it is natural to define the determinant of such a matrix to be just the scalar $a$:

$$\det[a] = a$$

(an alternative notation for $\det[a]$ is $|a|$, not to be confused with the absolute value of $a$, although clarification may be needed depending on context).

For a $2 \times 2$ matrix

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix},$$

its determinant is defined as:

$$\det \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = a_{11} \det(a_{22}) - a_{12} \det(a_{21}) = a_{11} a_{22} - a_{12} a_{21},$$

where the first equality is useful for conceptualizing the determinant of higher-order matrices.

Let $A$ be an $n \times n$ matrix. Let $A_{ij}$ be the $(n-1) \times (n-1)$ submatrix obtained by deleting row $i$ and column $j$ from $A$. Then, the scalar

$$M_{ij} = \det\left(A_{ij}\right)$$

is called the $(i,j)$th minor of $A$, and the scalar

$$C_{ij} = (-1)^{i+j} M_{ij}$$

is called the $(i,j)$th cofactor of $A$. A cofactor is a signed minor. Note that $M_{ij} = C_{ij}$ if $(i + j)$ is even and $M_{ij} = -C_{ij}$ if $(i + j)$ is odd. As such,

$$\det[A] = \det \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = a_{11} \det(a_{22}) - a_{12} \det(a_{21}) = a_{11} M_{11} - a_{12} M_{12} = a_{11} C_{11} + a_{12} C_{12},$$

which is useful for motivating the derivation of the determinant of a $3 \times 3$ matrix.

The determinant of a $3 \times 3$ matrix

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$$


is given by:

$$\det \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = a_{11} C_{11} + a_{12} C_{12} + a_{13} C_{13} = a_{11} M_{11} - a_{12} M_{12} + a_{13} M_{13}$$

$$= a_{11} \det \begin{bmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{bmatrix} - a_{12} \det \begin{bmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{bmatrix} + a_{13} \det \begin{bmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix}.$$

- Note that the $j$th term on the right-hand side of the definition is $a_{1j}$ times the determinant of the submatrix obtained by deleting row 1 and column $j$ from $A$. The term is preceded by a plus sign if $1 + j$ is even and by a minus sign if $1 + j$ is odd.

The determinant of an $n \times n$ matrix $A$ is given by

$$\det[A] = |A| = a_{11} C_{11} + a_{12} C_{12} + \ldots + a_{1n} C_{1n} = a_{11} M_{11} - a_{12} M_{12} + \ldots + (-1)^{n+1} a_{1n} M_{1n}.$$
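The recursive cofactor expansion just described translates directly into code. The following Python sketch is illustrative only (cofactor expansion is far too slow for large matrices in practice) and checks the result against `numpy.linalg.det`; the test matrix is an assumed example:

```python
import numpy as np

def det_cofactor(A):
    """Determinant by cofactor expansion along the first row (illustrative, O(n!))."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        # Submatrix A_{1j}: delete row 0 and column j
        sub = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        total += (-1) ** j * A[0, j] * det_cofactor(sub)  # (-1)**(1+j) with 0-based j
    return total

M = [[1., 2., 3.], [4., 5., 6.], [7., 8., 10.]]
print(det_cofactor(M), np.linalg.det(M))  # both: -3.0
```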

4.9 Eigenvalues

Let $A$ be an $n \times n$ matrix. Then, the scalar $e$ is an eigenvalue of $A$ if there exists a nonzero vector $x$ such that

$$Ax = ex,$$

in which case $x$ is an eigenvector of $A$ (associated with $e$).

It should be noted that if $x$ is an eigenvector associated with the eigenvalue $e$, then $cx$ is another eigenvector for every scalar $c \neq 0$.

Eigenvalues and eigenvectors are also called characteristic values and characteristic vectors, respectively.

Note that

$$Ax = ex \;\rightarrow\; (A - eI)x = 0.$$

Define $p(e) \equiv \det[A - eI]$. Then,

$$p(e) = 0$$

is called the characteristic equation (or eigenvalue equation) of $A$.

From the definition of a determinant, it follows that $p(e)$ is a polynomial in $e$. The roots of this characteristic polynomial are the eigenvalues of $A$.
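A quick numerical check using `numpy.linalg.eig`; the $2 \times 2$ matrix below is an assumed example whose characteristic polynomial $p(e) = e^2 - 4e + 3$ has roots 3 and 1:

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])

eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)                    # eigenvalues 3 and 1 (order may vary)

# Verify A x = e x for the first eigenpair
e, x = eigvals[0], eigvecs[:, 0]
print(np.allclose(A @ x, e * x))  # True
```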

4.10 Linear Independence and Rank of a Matrix

4.10.1 Linear Independence

Let $\{x_1, x_2, \ldots, x_r\}$ be a set of $n \times 1$ vectors. These are linearly independent vectors if and only if

$$\alpha_1 x_1 + \alpha_2 x_2 + \ldots + \alpha_r x_r = 0 \tag{1}$$

implies that $\alpha_1 = \alpha_2 = \ldots = \alpha_r = 0$. If equation (1) holds for a set of scalars that are not all zero, then $\{x_1, x_2, \ldots, x_r\}$ is linearly dependent.

The statement that $\{x_1, x_2, \ldots, x_r\}$ is linearly dependent is equivalent to saying that at least one vector in this set can be written as a linear combination of the others.

4.10.2 Rank

Let $A$ be an $n \times m$ matrix. The rank of a matrix $A$, denoted $\text{rank}(A)$, is the maximum number of linearly independent columns of $A$.

If $A$ is an $n \times m$ matrix and $\text{rank}(A) = m$, then $A$ has full column rank.

If $A$ is an $n \times m$ matrix, its rank can be at most $m$. A matrix has full column rank if its columns form a linearly independent set. For example, the $3 \times 2$ matrix

$$\begin{bmatrix} 1 & 3 \\ 2 & 6 \\ 0 & 0 \end{bmatrix}$$

can have at most rank 2. In fact, its rank is only 1 because the second column is 3 times the first column.
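`numpy.linalg.matrix_rank` confirms the rank of the example above:

```python
import numpy as np

# Second column is 3 times the first, so the rank is 1 rather than 2.
A = np.array([[1., 3.],
              [2., 6.],
              [0., 0.]])
print(np.linalg.matrix_rank(A))   # 1
```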

4.10.3 Properties of Rank

$\text{rank}(A') = \text{rank}(A)$.

If $A$ is $n \times k$, then $\text{rank}(A) \leq \min\{n, k\}$, where $\min$ is the "minimum operator" and in this case selects the minimum of the set $\{n, k\}$.

If $A$ is $k \times k$ and $\text{rank}(A) = k$, then $A$ is nonsingular.

4.11 Trace

The trace of a matrix is a very simple operation defined only for square matrices. For any $n \times n$ matrix $A$, the trace of matrix $A$, denoted $\text{tr}(A)$, is the sum of its diagonal elements. Mathematically,

$$\text{tr}(A) = \sum_{i=1}^{n} a_{ii}.$$

The trace of a matrix has the following properties:

- $\text{tr}(I_n) = n$.
- $\text{tr}(A') = \text{tr}(A)$.
- $\text{tr}(A + B) = \text{tr}(A) + \text{tr}(B)$.
- $\text{tr}(\alpha A) = \alpha \, \text{tr}(A)$, where $\alpha$ is a scalar.
- $\text{tr}(A \cdot B) = \text{tr}(B \cdot A)$, where $A$ is $n \times m$ and $B$ is $m \times n$.
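A quick numerical spot-check of two of these properties; the matrix sizes and random entries are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 3))   # n x m
B = rng.normal(size=(3, 4))   # m x n

print(np.isclose(np.trace(A @ B), np.trace(B @ A)))   # True: tr(AB) = tr(BA)
print(np.trace(np.eye(5)))                            # 5.0:  tr(I_n) = n
```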

5 Classical Linear Regression (CLR) Model Recap

Recall that an estimator is:

- Unbiased if, on average, it hits the true parameter value. That is, the mean of the sampling distribution of the estimator is equal to the true parameter value.
- Consistent if, as the sample size increases, the estimates (produced by the estimator) "converge" to the true value of the parameter being estimated. To be slightly more precise, consistency means that, as the sample size increases, the sampling distribution of the estimator becomes increasingly concentrated at the true parameter value.
- NOTE: Unbiasedness is a statement about the expected value of the sampling distribution of the estimator. Consistency is a statement about "where the sampling distribution of the estimator is going" as the sample size increases.
- Efficient if the sampling distribution of the estimator being used has the smallest variance among the class of estimators being considered. For instance, even if we are dealing with a set of biased estimators that are nonetheless consistent, then among these biased and consistent estimators the efficient one would be the one that has the smallest variance.

5.1 CLR Model in Matrix Form

5.1.1 Representation

In the remainder of this section we will use the $t$ subscript to index observations and $n$ to denote the sample size. Then, the multiple linear regression model with $k$ parameters is written as follows:

$$y_t = \beta_1 + \beta_2 x_{2,t} + \beta_3 x_{3,t} + \ldots + \beta_k x_{k,t} + \varepsilon_t \tag{2}$$

for $t = 1, 2, \ldots, n$, where: $y_t$ is the dependent variable for observation $t$; $x_{j,t}$ for $j = 2, 3, \ldots, k$ are the independent variables; and $\beta$ is the Greek letter "beta."

For each $t$ define a $1 \times k$ vector

$$x_t = \begin{bmatrix} 1 & x_{2,t} & \cdots & x_{k,t} \end{bmatrix}$$

and let

$$\beta = \begin{bmatrix} \beta_1 & \beta_2 & \cdots & \beta_k \end{bmatrix}'$$

be the $k \times 1$ vector of all parameters. Then, we can write equation (2) as:

$$y_t = x_t \beta + \varepsilon_t \tag{3}$$

for $t = 1, 2, \ldots, n$.

We can write equation (3) in full matrix notation by appropriately defining data vectors and matrices. Let $y_t$ denote the $n \times 1$ vector on $y$: the $t$th element of $y_t$ is $y_t$. Let $X_t$ be the $n \times k$ matrix of observations on the explanatory variables. In other words, the $t$th row of $X_t$ consists of the vector $x_t$. Equivalently, the $(t,j)$th element of $X_t$ is simply $x_{j,t}$:

$$X_t = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} 1 & x_{2,1} & x_{3,1} & \cdots & x_{k,1} \\ 1 & x_{2,2} & x_{3,2} & \cdots & x_{k,2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{2,n} & x_{3,n} & \cdots & x_{k,n} \end{bmatrix}.$$

Of course, because the regression model under consideration involves a constant parameter $\beta_1$, we have $x_{1,t} = 1 \;\forall t$ ($\forall$ means "for all" in mathematics jargon). Then, we can write equation (3) for all $n$ observations in matrix notation:

$$y_t = X_t \beta + \varepsilon_t, \tag{4}$$

where:

$$\varepsilon_t = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}.$$

5.1.2 Estimation

Estimation of $\beta$ proceeds by minimizing the sum of squared residuals. Define the sum of squared residuals function for any possible $k \times 1$ parameter vector $b$ as:

$$SSR(b) \equiv \sum_{t=1}^{n} (y_t - x_t b)^2,$$

where $\Sigma$ is the Greek letter "capital sigma," which is used to represent the summation operator; here the sum runs from $t = 1$ through $t = n$. The $k \times 1$ vector of ordinary least squares estimates,

$$\hat{\beta} = \begin{bmatrix} \hat{\beta}_1 & \hat{\beta}_2 & \cdots & \hat{\beta}_k \end{bmatrix}' = \begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \\ \vdots \\ \hat{\beta}_k \end{bmatrix},$$

minimizes $SSR(b)$ over all possible $k \times 1$ vectors $b$. This is a problem in multivariable calculus.

For $\hat{\beta}$ to minimize the sum of squared residuals, it must solve the first-order condition:

$$\frac{\partial SSR(\hat{\beta})}{\partial b} = 0. \tag{5}$$

To be clear, this condition says that the partial derivative of $SSR(\cdot)$ with respect to $b$ (denoted using the calculus "partial" symbol $\partial$), evaluated at $\hat{\beta}$, equals zero.

More concretely, using the power and chain rules from calculus,

$$\frac{\partial SSR}{\partial b} = \frac{\partial \sum_{t=1}^{n} (y_t - x_t b)^2}{\partial b} = -2 \sum_{t=1}^{n} (y_t - x_t b)\, x_t,$$

which is, of course, a $1 \times k$ vector. As such, equation (5) is equivalent to:

$$\sum_{t=1}^{n} x_t' \left(y_t - x_t \hat{\beta}\right) = 0 \tag{6}$$

(where we have divided by $-2$ and then taken the transpose). We can write this first-order condition as:

$$\sum_{t=1}^{n} \left(y_t - \hat{\beta}_1 - \hat{\beta}_2 x_{2,t} - \ldots - \hat{\beta}_k x_{k,t}\right) = 0$$

$$\sum_{t=1}^{n} x_{2,t} \left(y_t - \hat{\beta}_1 - \hat{\beta}_2 x_{2,t} - \ldots - \hat{\beta}_k x_{k,t}\right) = 0$$

$$\vdots$$

$$\sum_{t=1}^{n} x_{k,t} \left(y_t - \hat{\beta}_1 - \hat{\beta}_2 x_{2,t} - \ldots - \hat{\beta}_k x_{k,t}\right) = 0.$$

Using the properties of matrix algebra, this system of equations can of course be written as

$$X_t' \left(y_t - X_t \hat{\beta}\right) = 0 \tag{7}$$

or

$$\left(X_t' X_t\right) \hat{\beta} = X_t' y_t. \tag{8}$$

It can be shown that equation (8) always has at least one solution.

Assuming that the $k \times k$ symmetric matrix $X_t' X_t$ is nonsingular, we can premultiply both sides of equation (8) by $\left(X_t' X_t\right)^{-1}$ to solve for the CLR estimator $\hat{\beta}$:

$$\hat{\beta} = \left(X_t' X_t\right)^{-1} X_t' y_t. \tag{9}$$

This expression is the critical equation for matrix analysis of the multiple linear regression model and, of course, corresponds to the ordinary least squares (OLS) procedure. The assumption that $X_t' X_t$ is invertible is equivalent to the assumption that $\text{rank}(X_t) = k$, which means that the columns of $X_t$ must be linearly independent.

NOTE: it is tempting to simplify the equation for $\hat{\beta}$ as follows:

$$\hat{\beta} = \left(X_t' X_t\right)^{-1} X_t' y_t = X_t^{-1} \left(X_t'\right)^{-1} X_t' y_t = X_t^{-1} y_t.$$

The problem, though, is that $X_t$ is usually not a square matrix, and so it cannot be inverted. In other words, we cannot write $\left(X_t' X_t\right)^{-1} = X_t^{-1} \left(X_t'\right)^{-1}$ unless $n = k$, a case that virtually never arises in practice.
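The closed-form estimator in equation (9) is easy to verify numerically. Below is a minimal Python/NumPy sketch on simulated data; the sample size, coefficient values, and noise scale are all assumptions chosen for illustration:

```python
import numpy as np

# OLS: beta-hat = (X'X)^{-1} X'y, the formula in equation (9), on simulated data.
rng = np.random.default_rng(42)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # intercept + 2 regressors
beta_true = np.array([1.0, 0.5, -2.0])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

beta_hat = np.linalg.inv(X.T @ X) @ (X.T @ y)
print(beta_hat)                                  # close to [1.0, 0.5, -2.0]

# The first-order condition (7): residuals are orthogonal to the regressors.
resid = y - X @ beta_hat
print(np.allclose(X.T @ resid, 0, atol=1e-8))    # True
```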

The $n \times 1$ vectors of OLS fitted values and residuals are given, respectively, by:

$$\hat{y}_t = X_t \hat{\beta}$$

and

$$\hat{\varepsilon}_t = y_t - \hat{y}_t = y_t - X_t \hat{\beta}.$$

Of course, from equation (7) and the definition of $\hat{\varepsilon}_t$, it follows that the first-order condition that defines $\hat{\beta}$ is the same as:

$$X_t' \hat{\varepsilon}_t = 0. \tag{10}$$

Because the first column of $X_t$ consists entirely of ones, equation (10) implies that the OLS residuals always sum to zero when an intercept is included in the equation and that the sample covariance between each independent variable and the OLS residuals is zero.

The sum of squared residuals can be written as:

$$SSR = \sum_{t=1}^{n} \hat{\varepsilon}_t^2 = \hat{\varepsilon}_t' \hat{\varepsilon}_t = \left(y_t - X_t \hat{\beta}\right)' \left(y_t - X_t \hat{\beta}\right). \tag{11}$$

All of the previous is better understood using full-fledged matrix algebra, which is why we're putting emphasis on this; in fact, matrix algebra, believe it or not, makes the study of multivariate time series much, much, much(!) simpler than it would be otherwise... trust me!

- Let's start with

$$y_t = X_t \hat{\beta} + \varepsilon_t,$$

which implies that

$$\varepsilon_t = y_t - X_t \hat{\beta}.$$

- The estimator $\hat{\beta}$ therefore minimizes the sum of squared residuals (which is a scalar):

$$\varepsilon_t' \varepsilon_t = \left(y_t - X_t \hat{\beta}\right)' \left(y_t - X_t \hat{\beta}\right) = y_t' y_t - \hat{\beta}' X_t' y_t - y_t' X_t \hat{\beta} + \hat{\beta}' X_t' X_t \hat{\beta},$$

which follows by the rules of matrix algebra studied earlier. Therefore,

$$\varepsilon_t' \varepsilon_t = \left(y_t - X_t \hat{\beta}\right)' \left(y_t - X_t \hat{\beta}\right) = y_t' y_t - 2 \hat{\beta}' X_t' y_t + \hat{\beta}' X_t' X_t \hat{\beta},$$

where the second line follows from the fact that the transpose of a scalar is just that same scalar, that is,

$$y_t' X_t \hat{\beta} = \left(y_t' X_t \hat{\beta}\right)' = \hat{\beta}' X_t' y_t.$$

- We now need to take the derivative of the sum of squared residuals with respect to $\hat{\beta}$ and set it to zero to get the estimator that minimizes this sum (the proof that this is a minimum is beyond the scope and needs of this class). Thus, the first-order condition is

$$\frac{\partial \varepsilon_t' \varepsilon_t}{\partial \hat{\beta}} = \underbrace{\frac{\partial y_t' y_t}{\partial \hat{\beta}}}_{=0} - 2 \frac{\partial \hat{\beta}' X_t' y_t}{\partial \hat{\beta}} + \frac{\partial \hat{\beta}' X_t' X_t \hat{\beta}}{\partial \hat{\beta}} = 0.$$

- Note that

$$\frac{\partial \hat{\beta}'}{\partial \hat{\beta}} = \begin{bmatrix} \dfrac{\partial \hat{\beta}'}{\partial \hat{\beta}_1} \\ \vdots \\ \dfrac{\partial \hat{\beta}'}{\partial \hat{\beta}_k} \end{bmatrix} = \begin{bmatrix} \dfrac{\partial \begin{bmatrix} \hat{\beta}_1 & \hat{\beta}_2 & \cdots & \hat{\beta}_k \end{bmatrix}}{\partial \hat{\beta}_1} \\ \vdots \\ \dfrac{\partial \begin{bmatrix} \hat{\beta}_1 & \hat{\beta}_2 & \cdots & \hat{\beta}_k \end{bmatrix}}{\partial \hat{\beta}_k} \end{bmatrix} = \begin{bmatrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \end{bmatrix} = I_k.$$

- Putting everything together so far,

$$\frac{\partial \varepsilon_t' \varepsilon_t}{\partial \hat{\beta}} = \underbrace{\frac{\partial y_t' y_t}{\partial \hat{\beta}}}_{=0} - 2 \underbrace{\frac{\partial \hat{\beta}'}{\partial \hat{\beta}} X_t' y_t}_{= I_k X_t' y_t = X_t' y_t} + \frac{\partial \hat{\beta}' X_t' X_t \hat{\beta}}{\partial \hat{\beta}} = 0.$$

- To understand how to deal with the last term above, recall that $X_t$ is $n \times k$, $\hat{\beta}$ is $k \times 1$, and therefore $\hat{\beta}'$ is $1 \times k$. Define the $k \times k$ symmetric matrix $S \equiv X_t' X_t$ with typical element $s_{ij}$. It follows that $\hat{\beta}' X_t' X_t \hat{\beta}$ is the scalar quadratic form

$$\hat{\beta}' S \hat{\beta} = \sum_{i=1}^{k} \sum_{j=1}^{k} \hat{\beta}_i \, s_{ij} \, \hat{\beta}_j.$$

Then, differentiating with respect to a typical element $\hat{\beta}_l$ and using the symmetry of $S$ (i.e., $s_{il} = s_{li}$) gives

$$\frac{\partial \hat{\beta}' S \hat{\beta}}{\partial \hat{\beta}_l} = \sum_{j=1}^{k} s_{lj} \hat{\beta}_j + \sum_{i=1}^{k} \hat{\beta}_i s_{il} = 2 \sum_{j=1}^{k} s_{lj} \hat{\beta}_j.$$

It follows that stacking these $k$ partial derivatives into a column vector yields

$$\frac{\partial \hat{\beta}' X_t' X_t \hat{\beta}}{\partial \hat{\beta}} = 2 S \hat{\beta} = 2 X_t' X_t \hat{\beta}.$$

- Putting this final piece into our earlier derivative, we then have

$$\frac{\partial \varepsilon_t' \varepsilon_t}{\partial \hat{\beta}} = \underbrace{\frac{\partial y_t' y_t}{\partial \hat{\beta}}}_{=0} - 2 \underbrace{\frac{\partial \hat{\beta}'}{\partial \hat{\beta}} X_t' y_t}_{= X_t' y_t} + \underbrace{\frac{\partial \hat{\beta}' X_t' X_t \hat{\beta}}{\partial \hat{\beta}}}_{= 2 X_t' X_t \hat{\beta}} = 0.$$

Thus,

$$\frac{\partial \varepsilon_t' \varepsilon_t}{\partial \hat{\beta}} = -2 X_t' y_t + 2 X_t' X_t \hat{\beta} = 0,$$

and it follows that

$$X_t' X_t \hat{\beta} = X_t' y_t.$$

Therefore,

$$\hat{\beta} = \left(X_t' X_t\right)^{-1} X_t' y_t.$$

5.2 CLR Model Finite-Sample Assumptions and Properties

5.2.1 Assumptions

The CLR model consists of five basic assumptions about the way in which the observations are generated.

Assumption 1. The dependent variable can be calculated as a linear function of a specific set of independent variables, plus a disturbance term. The unknown coefficients of this linear function form the vector $\beta$ and are assumed to be constants. In other words, the model can be written as in equation (4), where: $y_t$ is an observed $n \times 1$ vector; $X_t$ is an $n \times k$ observed matrix; and $\varepsilon_t$ is an $n \times 1$ vector of unobserved errors or disturbances. There can be several violations of this assumption. For instance:

- Wrong regressors: the omission of relevant independent variables or the inclusion of irrelevant independent variables.
- Nonlinearity: when the relationship between the dependent and independent variables is not linear.
- Changing parameters: when the parameters do not remain constant during the period in which data were collected.

Assumption 2. The expected value of the disturbance term is conditionally zero; that is, the mean of the distribution from which the disturbance is drawn is zero conditional on knowledge of the matrix $X_t$. Mathematically,

$$E(\varepsilon_t \mid X_t) = 0 \;\; \forall t$$

(recall that the symbol $\forall$ means "for all" in mathematical jargon). This assumption imposes strict exogeneity (as explained later) on the explanatory variables, which, for instance, rules out explanatory variables whose future values are correlated with the error term. In particular, this assumption eliminates lagged dependent variables entering the regression equation as regressors. Given assumption 2, we can condition on the regressors when computing the expected value of $\hat{\beta}$. Of note, in a time series context, this assumption translates into the time series of disturbances being white noise, as defined just below.

- The problem with violating assumption 2 is most easily assessed by assuming that the equation in question is instead estimated by rearranging it: removing the nonzero mean from the error term and adding it to the intercept term. This situation creates an estimating equation obeying all the CLR model assumptions; in particular, the mean of the new error term is zero. The only problem is that ordinary least squares estimation gives an unbiased estimate of the new intercept, which is the sum of the original intercept and the mean of the original error term; it is therefore a biased estimate of the original intercept (the bias being exactly equal to the mean of the original error term). Thus, the only implication of this violation of the second assumption of the CLR model is that the OLS estimate of the intercept is biased; the slope coefficient estimates are unaffected. This biased estimate is often welcomed by the econometrician since, for prediction purposes, s/he would want to incorporate the mean of the error term into the prediction.

Assumption 3. The disturbance terms all have the same variance and are not correlated with one another, which is referred to as the disturbances being "spherical" (absent one or the other condition, the disturbances are said to be "nonspherical").

- These characteristics are usually described in terms of the variance-covariance matrix of the disturbance vector. If we have a disturbance vector of length $n$, then the variance-covariance matrix of the disturbances is a matrix with $n$ rows and $n$ columns.
- The diagonal terms are the variances of the individual disturbances, and the off-diagonal terms are the covariances between them. Each diagonal term gives the variance of the disturbance associated with one of the sample observations (i.e., the first diagonal term gives the variance of the disturbance associated with the first observation, and the last diagonal term gives the variance of the disturbance associated with the $n$th observation). If all these diagonal terms are the same, the disturbances are said to have uniform variance or to be "homoskedastic." If the diagonal terms are not all the same, the disturbances are said to be "heteroskedastic"; the disturbance term is then thought of as being drawn from a different distribution for each observation.
- Each off-diagonal element of the variance-covariance matrix gives the covariance between the disturbances associated with two of the sample observations. If all these off-diagonal terms are zero, the disturbances are said to be uncorrelated, which means that in repeated samples there is no tendency for the disturbance associated with one observation to be related to the disturbance associated with any other. If the off-diagonal terms are not all zero, the disturbances are said to be "autocorrelated"; the disturbance term for one observation is correlated with the disturbance term for another observation.
- In a time series context it is common to encounter autocorrelated errors, heteroskedasticity, or both. The presence of heteroskedasticity or autocorrelated errors does not create bias or consistency issues in estimating coefficients whenever all other assumptions of the CLR model hold. However, OLS estimators are inefficient. In addition, the standard errors are biased when heteroskedasticity is present, which leads to bias in test statistics and confidence intervals.

- NOTE: To qualify as white noise, the random variable $\varepsilon_t$ must satisfy three conditions (recall that the symbol $\forall$ means "for all" in mathematical jargon, and $\sigma$ is the Greek letter "sigma"):

Its mean must be zero:

$$E(\varepsilon_t) = 0 \;\; \forall t;$$

Its variance must be finite and constant:

$$E\left(\varepsilon_t^2\right) = \text{var}(\varepsilon_t) = \sigma_\varepsilon^2 \;\; \forall t \quad \left(\text{in vector notation, } E(\varepsilon_t \varepsilon_t') = \sigma_\varepsilon^2 I\right);$$

It must be uncorrelated with past or future values of $\varepsilon_t$:

$$E\left(\varepsilon_t \varepsilon_j\right) = \text{cov}(\varepsilon_t, \varepsilon_j) = 0 \;\; \forall t \neq j.$$

Assumption 4. The observations on the independent variables can be considered fixed in repeated samples; that is, it is possible to redraw the sample with the same independent variable values.

- Two important econometric problems correspond to violations of this assumption: errors in variables, i.e., errors in measuring the independent variables; and autoregression, i.e., using a lagged value of the dependent variable as an independent variable. Of course, autoregression is very common in time series analysis.

Assumption 5. The fifth assumption of the CLR model is that the number of observations is greater than the number of independent variables and that there are no exact linear relationships between the independent variables. Although this is viewed as an assumption in general, for a specific case it can easily be checked, so that it need not be assumed. The problem of multicollinearity (two or more independent variables being approximately linearly related in the sample data) is associated with this assumption. Under this assumption $X_t' X_t$ is nonsingular, and so $\hat{\beta}$ is unique and can be written as in equation (9). In other words, the matrix $X_t$ has rank $k$ per the notation used earlier in these notes.

When these five assumptions hold, OLS is considered the optimal estimator. That said, time series econometrics will require careful application of otherwise standard econometric techniques since, as noted above, key CLR model assumptions are easily violated in time series contexts.

5.2.2 Properties

Theorem 1 (Unbiasedness of OLS). Under the CLR model assumptions (1), (2), (3), and (5), the OLS estimator $\hat{\beta}$ is an unbiased estimator of $\beta$.

Proof. Use assumption (1) to write:

$$\hat{\beta} = \left(X_t' X_t\right)^{-1} X_t' y_t = \left(X_t' X_t\right)^{-1} X_t' \left(X_t \beta + \varepsilon_t\right)$$

$$= \left(X_t' X_t\right)^{-1} \left(X_t' X_t\right) \beta + \left(X_t' X_t\right)^{-1} X_t' \varepsilon_t = \beta + \left(X_t' X_t\right)^{-1} X_t' \varepsilon_t, \tag{12}$$

where we use the fact that $\left(X_t' X_t\right)^{-1} \left(X_t' X_t\right) = I_k$. Taking the expectation conditional on $X_t$ gives:

$$E\left(\hat{\beta} \mid X_t\right) = E\left[\beta + \left(X_t' X_t\right)^{-1} X_t' \varepsilon_t \mid X_t\right] = \beta + E\left[\left(X_t' X_t\right)^{-1} X_t' \varepsilon_t \mid X_t\right]$$

$$= \beta + \left(X_t' X_t\right)^{-1} X_t' E(\varepsilon_t \mid X_t) = \beta + \left(X_t' X_t\right)^{-1} X_t' \cdot 0 = \beta,$$

because $E(\varepsilon_t \mid X_t) = 0$ under the CLR model's assumption (2). And invertibility of $X_t' X_t$ follows by the CLR model's assumption (5).

Theorem 2 (Variance-covariance matrix of the OLS estimator). Under the CLR model's assumptions (1) through (5),

$$\text{var}\left(\hat{\beta} \mid X_t\right) = \sigma_\varepsilon^2 \left(X_t' X_t\right)^{-1}. \tag{13}$$

Proof. From equation (12) we have:

$$\text{var}\left(\hat{\beta} \mid X_t\right) = E\left[\left(\hat{\beta} - E\left(\hat{\beta} \mid X_t\right)\right)\left(\hat{\beta} - E\left(\hat{\beta} \mid X_t\right)\right)' \,\middle|\, X_t\right]$$

$$= E\left[\left(\hat{\beta} - \beta\right)\left(\hat{\beta} - \beta\right)' \,\middle|\, X_t\right] \quad \text{(by Theorem 1)}$$

$$= E\left[\left(X_t' X_t\right)^{-1} X_t' \varepsilon_t \left(\left(X_t' X_t\right)^{-1} X_t' \varepsilon_t\right)' \,\middle|\, X_t\right] \quad \text{(substituting in a rearrangement of equation (12))}$$

$$= E\left[\left(X_t' X_t\right)^{-1} X_t' \varepsilon_t \varepsilon_t' \left[\left(X_t' X_t\right)^{-1} X_t'\right]' \,\middle|\, X_t\right] \quad \text{(by transpose properties)}$$

$$= E\left[\left(X_t' X_t\right)^{-1} X_t' \varepsilon_t \varepsilon_t' X_t \left[\left(X_t' X_t\right)'\right]^{-1} \,\middle|\, X_t\right] \quad \text{(by transpose properties)}$$

$$= E\left[\left(X_t' X_t\right)^{-1} X_t' \varepsilon_t \varepsilon_t' X_t \left(X_t' X_t\right)^{-1} \,\middle|\, X_t\right] \quad \text{(since } X_t' X_t \text{ is symmetric)}$$

$$= \left(X_t' X_t\right)^{-1} X_t' E\left(\varepsilon_t \varepsilon_t' \mid X_t\right) X_t \left(X_t' X_t\right)^{-1}$$

$$= \left(X_t' X_t\right)^{-1} X_t' \,\text{var}(\varepsilon_t \mid X_t)\, X_t \left(X_t' X_t\right)^{-1}.$$

Now, use the CLR model's assumption (3) to obtain:

$$\text{var}\left(\hat{\beta} \mid X_t\right) = \left(X_t' X_t\right)^{-1} X_t' \left(\sigma_\varepsilon^2 I_n\right) X_t \left(X_t' X_t\right)^{-1} = \sigma_\varepsilon^2 \left(X_t' X_t\right)^{-1} X_t' X_t \left(X_t' X_t\right)^{-1} = \sigma_\varepsilon^2 \left(X_t' X_t\right)^{-1}.$$

Equation (13) means that the variance of $\hat{\beta}_j$ (conditional on $X_t$) is obtained by multiplying $\sigma_\varepsilon^2$ by the $j$th diagonal element of $\left(X_t' X_t\right)^{-1}$. This equation also tells us how to obtain the covariance between any two OLS estimates: multiply $\sigma_\varepsilon^2$ by the appropriate off-diagonal element of $\left(X_t' X_t\right)^{-1}$.

Theorem 3 (Gauss-Markov). Under the CLR model's assumptions (1) through (5), $\hat{\beta}$ is the best linear unbiased estimator (BLUE).

Proof. Any other linear estimator of $\beta$ can be written as:

$$\tilde{\beta} = A_t' y_t, \tag{14}$$

where $A_t$ is an $n \times k$ matrix. In order for $\tilde{\beta}$ to be unbiased conditional on $X_t$, $A_t$ can consist only of nonrandom numbers and functions of $X_t$. (For example, $A_t$ cannot be a function of $y_t$.) To see what further restrictions on $A_t$ are needed, write:

$$\tilde{\beta} = A_t' \left(X_t \beta + \varepsilon_t\right) = \left(A_t' X_t\right) \beta + A_t' \varepsilon_t. \tag{15}$$

Then,

$$E\left(\tilde{\beta} \mid X_t\right) = E\left[\left(A_t' X_t\right) \beta + A_t' \varepsilon_t \mid X_t\right] = E\left[\left(A_t' X_t\right) \beta \mid X_t\right] + E\left(A_t' \varepsilon_t \mid X_t\right)$$

$$= \left(A_t' X_t\right) \beta + A_t' E(\varepsilon_t \mid X_t) \quad \text{(since } A_t \text{ is a function of } X_t\text{)}$$

$$= \left(A_t' X_t\right) \beta \quad \text{(since } E(\varepsilon_t \mid X_t) \text{ equals } 0\text{)}.$$

For $\tilde{\beta}$ to be an unbiased estimator of $\beta$, it must be true that $E\left(\tilde{\beta} \mid X_t\right) = \beta$ for all $k \times 1$ vectors $\beta$; that is,

$$\left(A_t' X_t\right) \beta = \beta \tag{16}$$

for all $k \times 1$ vectors $\beta$. Because $A_t' X_t$ is a $k \times k$ matrix, equation (16) holds if and only if $A_t' X_t = I_k$. Equations (14) and (16) characterize the class of linear, unbiased estimators for $\beta$. Next, from equation (15) we have:

$$\text{var}\left(\tilde{\beta} \mid X_t\right) = A_t' \left[\text{var}(\varepsilon_t \mid X_t)\right] A_t = \sigma_\varepsilon^2 A_t' A_t$$

by the CLR model's assumption (3). Therefore,

$$\text{var}\left(\tilde{\beta} \mid X_t\right) - \text{var}\left(\hat{\beta} \mid X_t\right) = \sigma_\varepsilon^2 \left[A_t' A_t - \left(X_t' X_t\right)^{-1}\right]$$

$$= \sigma_\varepsilon^2 \left[A_t' A_t - A_t' X_t \left(X_t' X_t\right)^{-1} X_t' A_t\right] \quad \text{(because } A_t' X_t \text{ equals } I_k\text{)}$$

$$= \sigma_\varepsilon^2 \left[A_t' - A_t' X_t \left(X_t' X_t\right)^{-1} X_t'\right] A_t$$

$$= \sigma_\varepsilon^2 A_t' \left[I_n - X_t \left(X_t' X_t\right)^{-1} X_t'\right] A_t$$

$$= \sigma_\varepsilon^2 A_t' M_t A_t,$$

where $M_t \equiv I_n - X_t \left(X_t' X_t\right)^{-1} X_t'$. Because $M_t$ is symmetric and idempotent, $A_t' M_t A_t$ is positive semi-definite for any $n \times k$ matrix $A_t$. ("Positive semi-definite" means that $z_t' \left(A_t' M_t A_t\right) z_t \geq 0$ for every nonzero column vector $z_t$.) This establishes that the OLS estimator $\hat{\beta}$ is BLUE.

It can be shown that the unbiased estimator of the error variance $\sigma_\varepsilon^2$ can be written as:

$$\hat{\sigma}_\varepsilon^2 = \frac{\hat{\varepsilon}_t' \hat{\varepsilon}_t}{n - k},$$

where we have labeled the explanatory variables so that there are $k$ total parameters, including the intercept.

Theorem 4 (Unbiasedness of $\hat{\sigma}_\varepsilon^2$). Under assumptions (1) through (5) of the CLR model, $\hat{\sigma}_\varepsilon^2$ is an unbiased estimator of $\sigma_\varepsilon^2$, so that $E\left(\hat{\sigma}_\varepsilon^2 \mid X_t\right) = \sigma_\varepsilon^2$ for all $\sigma_\varepsilon^2 > 0$.

Proof. Write

$$\hat{\varepsilon}_t = y_t - X_t \hat{\beta} = y_t - X_t \left(X_t' X_t\right)^{-1} X_t' y_t = M_t y_t = M_t \varepsilon_t,$$

where $M_t \equiv I_n - X_t \left(X_t' X_t\right)^{-1} X_t'$, and the last equality follows because $M_t X_t = 0$. Because $M_t$ is symmetric and idempotent,

$$\hat{\varepsilon}_t' \hat{\varepsilon}_t = \varepsilon_t' M_t' M_t \varepsilon_t = \varepsilon_t' M_t \varepsilon_t.$$

Because $\varepsilon_t' M_t \varepsilon_t$ is a scalar, it equals its "trace." Therefore,

$$E\left(\varepsilon_t' M_t \varepsilon_t \mid X_t\right) = E\left[\text{tr}\left(\varepsilon_t' M_t \varepsilon_t\right) \mid X_t\right] = E\left[\text{tr}\left(M_t \varepsilon_t \varepsilon_t'\right) \mid X_t\right] = \text{tr}\, E\left[\left(M_t \varepsilon_t \varepsilon_t'\right) \mid X_t\right]$$

$$= \text{tr}\left[M_t E\left(\varepsilon_t \varepsilon_t' \mid X_t\right)\right] = \text{tr}\left(M_t \sigma_\varepsilon^2 I_n\right) = \sigma_\varepsilon^2 \,\text{tr}(M_t) = \sigma_\varepsilon^2 (n - k),$$

where tr is the trace operator, which calculates the trace of its input. Note that the last equality above follows from:

$$\text{tr}(M_t) = \text{tr}(I_n) - \text{tr}\left[X_t \left(X_t' X_t\right)^{-1} X_t'\right] = n - \text{tr}\left[\left(X_t' X_t\right)^{-1} X_t' X_t\right] = n - \text{tr}(I_k) = n - k.$$

Therefore,

$$E\left(\hat{\sigma}_\varepsilon^2 \mid X_t\right) = \frac{E\left(\varepsilon_t' M_t \varepsilon_t \mid X_t\right)}{n - k} = \sigma_\varepsilon^2.$$

6 A Primer on First-Order Di§erence Equations

A difference equation expresses a relationship between a dependent variable and a lagged independent variable (or variables) that changes at discrete intervals of time. For example, $I_t = f(Y_{t-1})$, where $I$ and $Y$ are measured at the end of each year.

The order of a difference equation is determined by the greatest number of periods lagged. As such: a first-order difference equation expresses a time lag of one period; a second-order, two periods; etc.

The change in a variable $y$ as $t$ changes from time period $t$ to time period $t + 1$ is called the first difference of $y$. It is written as:

$$\frac{dy}{dt} = dy_{t+1} = y_{t+1} - y_t. \tag{17}$$

One way to solve difference equations is called the iterative method. For instance, let

$$y_{t+1} = b y_t, \tag{18}$$

where $b$ is a constant, and let the initial value of $y$ be given by $y_0$ (a constant). Then,

$$y_{t+1} = b y_t \;\rightarrow\; y_t = b y_{t-1} \;\rightarrow\; y_{t-1} = b y_{t-2},$$

etc. Iterative backward substitution implies that:

$$y_{t+1} = b \underbrace{(b y_{t-1})}_{= y_t} = b^2 y_{t-1} = b^2 \underbrace{(b y_{t-2})}_{= y_{t-1}} = b^3 y_{t-2},$$

etc. Note that

$$y_{t+1} = b^3 y_{t-2} \;\rightarrow\; y_t = b^3 y_{t-3} \;\rightarrow\; y_t = b^s y_{t-s}.$$

So, when $t = s$ we have:

$$y_t = b^t y_0,$$

which traces back the value of $y_t$ in any period $t$ to its initial value weighted by a function of the constant $b$. Therefore, the constant $b$ plays a critical role in determining the evolution of the variable $y$.
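The equivalence between iterating the difference equation and the closed form $y_t = b^t y_0$ can be checked in a few lines of Python; the values of b, y0, and T below are assumptions for illustration:

```python
# Iterating y_{t+1} = b*y_t and comparing with the closed form y_t = b**t * y_0.
b, y0, T = 0.8, 5.0, 10

y = y0
for t in range(1, T + 1):
    y = b * y                      # one backward-substitution step
    print(t, y, b**t * y0)         # the two columns agree (up to floating-point rounding)
```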

Given a first-order difference equation which is linear (i.e., all the variables are raised to the first power and there are no cross products),

$$y_t = a + b y_{t-1}, \tag{19}$$

where $a$ and $b$ are constants, the "general formula for a definite solution" is:

$$y_t = \left(y_0 - \frac{a}{1 - b}\right) b^t + \frac{a}{1 - b} \tag{20}$$

when $b \neq 1$, and

$$y_t = y_0 + at \tag{21}$$

when $b = 1$. If no initial condition is given, an arbitrary constant $A$ is used for $y_0 - \frac{a}{1-b}$ in equation (20) and for $y_0$ in equation (21). This situation is called the "general solution."

Therefore, equation (20) can be expressed in the general form

$$y_t = c + A b^t, \tag{22}$$

where $A = y_0 - \frac{a}{1-b}$ and $c = \frac{a}{1-b}$. Here, $A b^t$ is called the complementary function and $c$ is called the particular solution. The particular solution expresses the intertemporal equilibrium level of $y$, while the complementary function represents the deviations from that equilibrium.

Equation (22) will be dynamically stable, therefore, only if the complementary function $A b^t \rightarrow 0$ (in this case the symbol $\rightarrow$ is used to denote "tends to" in the mathematical sense of limits) as $t \rightarrow \infty$. All depends on the base $b$. Assuming $A = 1$ and $c = 0$ for the moment, the exponential expression $b^t$ will generate 7 different time paths depending on the value of $b$, as illustrated further below. If $|b| > 1$ the time path will explode and move farther and farther away from equilibrium. If $|b| < 1$ the time path will be damped and move toward equilibrium. If $b < 0$, the time path will oscillate between positive and negative values. If $b > 0$ the time path will be nonoscillating. Note that if $A \neq 1$ the value of this multiplicative constant will scale up or down the magnitude of $b^t$, but will not change the basic pattern of movement. If $A = -1$ a mirror image of the time path of $b^t$ with respect to the horizontal axis will be produced. If $c \neq 0$ the vertical intercept of the graph is affected, and the graph shifts up or down accordingly. You should convince yourself that all of these scenarios are true by reproducing the graphs further below for different values of $b$ (and $A$). In particular, because in the equation $y_t = b^t$, $b$ can range from $-\infty$ to $+\infty$:

- If $b > 1$, $b^t$ increases at an increasing rate as $t$ increases, thus moving farther away from the horizontal axis. This is illustrated in panel (a) of the figure below, which is a step function representing changes at discrete intervals of time, not a continuous function.
- If $b = 1$, $b^t = 1$ for all values of $t$. This is represented by a horizontal line in panel (b) of the figure below.
- If $0 < b < 1$, then $b$ is a positive fraction and $b^t$ decreases as $t$ increases, drawing closer and closer to the horizontal axis but always remaining positive, as shown in panel (c) of the figure below.
- If $b = 0$, then $b^t = 0$ for all values of $t$, as shown in panel (d) of the figure below.
- If $-1 < b < 0$, then $b$ is a negative fraction and $b^t$ will alternate in sign and draw closer and closer to the horizontal axis as $t$ increases, as shown in panel (e) of the figure below.
- If $b = -1$, then $b^t$ oscillates between $+1$ and $-1$, as shown in panel (f) of the figure below.
- If $b < -1$, then $b^t$ will oscillate and move farther and farther away from the horizontal axis, as shown in panel (g) of the figure below.

In short: if $|b| > 1$ the time path explodes; if $|b| < 1$ the time path converges; if $b > 0$ the time path is nonoscillating; and if $b < 0$ the time path oscillates.
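The following Python sketch tabulates $b^t$ for representative values of $b$ and flags each case according to the summary above; the particular values of $b$ are assumptions chosen to cover the seven cases:

```python
# Classifying the time path of b**t for several values of b.
for b in [2.0, 1.0, 0.5, 0.0, -0.5, -1.0, -2.0]:
    path = [b**t for t in range(6)]               # first few values of b**t
    converges = abs(b) < 1                        # damped toward equilibrium
    oscillates = b < 0                            # alternates in sign
    print(f"b = {b:+.1f}  path = {[round(p, 3) for p in path]}  "
          f"converges = {converges}  oscillates = {oscillates}")
```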
