Macroeconometrics 1:
The Basic Building Blocks
Brendan Epstein, Ph.D.
Johns Hopkins University
Contents

1 Preliminaries
2 Series
3 Logs
3.1 An Example
4 Matrix Algebra
4.1 Addition
4.2 Scalar Multiplication
4.3 Subtraction
4.4 Matrix Multiplication
4.5 Laws of Matrix Algebra
4.6 Transpose
4.7 Special Kinds of Matrices
4.7.1 Square Matrix
4.7.2 Column Matrix
4.7.3 Row Matrix
4.7.4 Diagonal Matrix
4.7.5 Upper-Triangular Matrix
4.7.6 Lower-Triangular Matrix
4.7.7 Symmetric Matrix
4.7.8 Idempotent Matrix
4.7.9 Permutation Matrix
4.7.10 Power Matrices
4.7.11 Inverse Matrix
4.8 Determinants
4.9 Eigenvalues
4.10 Linear Independence and Rank of a Matrix
4.10.1 Linear Independence
4.10.2 Rank
4.10.3 Properties of Rank
4.11 Trace
5 Classical Linear Regression (CLR) Model Recap
5.1 CLR Model in Matrix Form
5.1.1 Representation
5.1.2 Estimation
5.2 CLR Model Finite-Sample Assumptions and Properties
5.2.1 Assumptions
5.2.2 Properties
6 A Primer on First-Order Difference Equations

These lecture notes closely and sometimes literally follow sections from: Dowling, Edward T. Introduction to Mathematical Economics. Schaum's Outlines, 3rd ed., McGraw-Hill, 2001; Nicholson, Walter. Microeconomic Theory: Basic Principles and Extensions. South-Western College Pub, 9th ed., 2004; Simon, Carl P., and Lawrence Blume. Mathematics for Economists. New York: Norton, 1994; Sydsaeter, Knut, and Peter J. Hammond. Mathematics for Economic Analysis. Prentice Hall, 1995.
1 Preliminaries
• These lecture notes begin with a review of series and logs. Series are very important for the theory of time series analysis, and logs are a very helpful tool in applications.

• The notes then dive into matrix algebra. A prominent modeling methodology in Macroeconometrics is the vector autoregression (VAR), which falls into the category of multivariate time series analysis. The word "vector" on its own conveys the fact that this topic in Macroeconometrics is best thought of in matrix format. Because the theory underlying VARs can be understood as a fairly straightforward extension of univariate time series analysis, we will make use of matrix algebra quite a bit; hence the section in these lecture notes devoted to this topic.

• Thereafter, building on the matrix-algebra development, the notes recap key assumptions of the classical linear regression model and cast the model from the vantage point of matrix algebra.

• The notes end with a primer on first-order difference equations. The heart of time series analysis involves lagged dependent variables entering estimation equations as independent variables. Understanding the implications of this modeling framework hinges on understanding first-order difference equations.
2 Series
• Consider the n > 0 numbers $a, ak, ak^2, \ldots, ak^{n-1}$. Each term is obtained by multiplying the previous one by a constant k. Suppose we wish to find the sum
$$s_n = a + ak + ak^2 + \ldots + ak^{n-2} + ak^{n-1},$$
which is called a finite geometric series with quotient k. To find the sum $s_n$ of this series, first multiply both sides of the immediately preceding equation by k to obtain
$$k s_n = ak + ak^2 + ak^3 + \ldots + ak^{n-1} + ak^n.$$
Subtracting this last equation from the former yields
$$s_n - k s_n = a - ak^n$$
because all other terms cancel (you should verify this yourself).
  – If k = 1, then all the terms in $s_n$ are equal to a and therefore $s_n = n \cdot a$.
  – For $k \neq 1$,
$$s_n - k s_n = a - ak^n \;\rightarrow\; (1-k)\,s_n = a\,(1-k^n) \;\rightarrow\; s_n = a\,\frac{1-k^n}{1-k},$$
(where "$\rightarrow$" denotes "implies"), which is the summation formula for a finite geometric series.

• Now, let's consider an infinite geometric series
$$s_n = a + ak + ak^2 + \ldots + ak^{n-2} + ak^{n-1} + \ldots;$$
from before we know that in the finite case
$$s_n = a\,\frac{1-k^n}{1-k} \quad \text{if } k \neq 1.$$
  – What happens to this expression as n tends to infinity? The answer evidently depends only on $k^n$ because only this term depends on n. In fact, $k^n$ tends to 0 if $-1 < k < 1$, but $k^n$ does not tend to any limit if $k > 1$ or $k \leq -1$. (If you are not yet convinced that this claim is true, study the cases $k = 2$, $k = 1$, $k = 1/2$, $k = -1/2$, and $k = -2$.) Hence, it follows that if $|k| < 1$, where $|k|$ denotes the absolute value of k, then the sum $s_n$ of the first n terms will tend to the limit $a/(1-k)$ as n tends to infinity. We let this limit be the definition of the infinite sum under consideration and we say that this infinite series converges in this case. If $|k| \geq 1$, we say that the infinite sum under consideration diverges; a divergent series has no (finite) sum. Divergence is obvious if $|k| > 1$. When k = 1, then $s_n = na$, which tends to $+\infty$ if a > 0 or to $-\infty$ if a < 0. When k = -1, then $s_n$ is a when n is odd, but 0 when n is even; again, there is no limit as n tends to infinity.
  – To summarize:
$$s_n = a + ak + ak^2 + \ldots + ak^{n-2} + ak^{n-1} + \ldots = \sum_{n=1}^{\infty} a k^{n-1} = \frac{a}{1-k} \quad \text{if } |k| < 1.$$
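To make the summation formula concrete, here is a minimal numerical check (an addition to these notes, with illustrative values only) that compares the brute-force sum of a finite geometric series with the closed-form formula and with the infinite-series limit a/(1 − k):

    import numpy as np

    a, k = 3.0, 0.5          # first term and quotient (|k| < 1, so the infinite series converges)
    n = 20                   # number of terms in the finite series

    terms = a * k ** np.arange(n)          # a, ak, ak^2, ..., ak^(n-1)
    brute_force = terms.sum()              # direct summation
    formula = a * (1 - k ** n) / (1 - k)   # s_n = a(1 - k^n)/(1 - k)
    limit = a / (1 - k)                    # a/(1 - k), the infinite-series limit

    print(brute_force, formula, limit)     # the first two match; both approach the limit as n grows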
3 Logs
• In what follows immediately below, x and y can be variables or constants. Logarithms are such that (log denotes any kind of logarithm and ln denotes the natural logarithm; ln is the most common type of logarithm used in Macroeconomics and International Macroeconomics):
  – $\ln(x \cdot y) = \ln(x) + \ln(y)$ (so the log of a product is equal to the sum of the logs).
  – $\ln(x/y) = \ln(x) - \ln(y)$ (so the log of a ratio is equal to the difference of the logs).
  – $\ln(x^y) = y \cdot \ln(x)$ (so the log of a variable or constant raised to the power of another variable or constant is equal to the product of the exponent times the log of the base).
  – $\ln(1+x) \approx x$. This approximation is very precise when x is small (meaning about less than 0.1), and still very good for larger x. This approximation is extremely useful because it enables us to use log changes to approximate growth rates, as shown immediately below.

• If a variable $x_t$ grows at rate $g_t$ between periods t and t − 1, then:
$$\frac{x_t}{x_{t-1}} - 1 = g_t$$
$$\rightarrow\; \frac{x_t}{x_{t-1}} = 1 + g_t$$
$$\rightarrow\; \ln\!\left(\frac{x_t}{x_{t-1}}\right) = \ln(1+g_t) \quad \text{(taking logs of both sides)}$$
$$\rightarrow\; \ln(x_t) - \ln(x_{t-1}) = \ln(1+g_t) \quad \text{(using the log properties from above)}$$
$$\rightarrow\; d\ln(x_t) = \ln(1+g_t) \quad \text{(the change in a log can simply be stated as } d\ln(x_t))$$
$$\rightarrow\; d\ln(x_t) \approx g_t \quad \text{(using the log properties from above)},$$
where subscripts denote the time period. Therefore, the log change of a variable x is approximately equal to the variable's growth rate. It immediately follows that the percent change of a variable x can be approximated as $100 \cdot d\ln(x_t)$.
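As a quick illustration (added here, not part of the original notes), the following snippet compares the exact growth rate with the log-change approximation for a few growth rates; it uses only numpy, and the numbers are made up:

    import numpy as np

    for g in [0.01, 0.05, 0.10, 0.25]:
        x_prev, x_curr = 100.0, 100.0 * (1 + g)       # x_{t-1} and x_t when x grows at rate g
        log_change = np.log(x_curr) - np.log(x_prev)  # d ln(x_t) = ln(x_t) - ln(x_{t-1})
        print(f"g = {g:.2f}   d ln(x) = {log_change:.4f}   error = {abs(log_change - g):.4f}")

    # The approximation error is tiny for small g (about 0.00005 when g = 0.01)
    # and grows as g gets larger (about 0.027 when g = 0.25).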
3.1 An Example
• You may know that in macroeconomic contexts growth rates are typically quoted at "annual frequency." Consider any variable $Z_t$ for which you have data available at less-than-yearly frequency. In particular, say you have data available for this variable at frequency n. Then, the annualized growth rate $g_t^Z$ of this variable is:
$$g_t^Z = \left(\frac{Z_t}{Z_{t-1}}\right)^{n} - 1.$$
So, for instance, if you had data at monthly frequency, the annualized monthly growth rate of $Z_t$ is
$$g_t^Z = \left(\frac{Z_t}{Z_{t-1}}\right)^{12} - 1,$$
since there are 12 months in a year; similarly, if you had data at quarterly frequency, the annualized quarterly growth rate of $Z_t$ is
$$g_t^Z = \left(\frac{Z_t}{Z_{t-1}}\right)^{4} - 1,$$
since there are 4 quarters in a year; etc.

• In turn, the term "yearly growth rate," in particular when referred to as related to GDP, can refer to two things. The fourth-quarter over fourth-quarter growth of GDP, so, for instance,
$$g_t^{GDP} = \frac{GDP_{2015:Q4}}{GDP_{2014:Q4}} - 1 \approx \ln(GDP_{2015:Q4}) - \ln(GDP_{2014:Q4}),$$
where $GDP_{2015:Q4}$ is the level of GDP in the fourth quarter of the year 2015 and $GDP_{2014:Q4}$ is the level of GDP in the fourth quarter of the year 2014. Or, the year-over-year growth of GDP, so, for instance,
$$g_t^{GDP} = \frac{GDP_{2015}}{GDP_{2014}} - 1 \approx \ln(GDP_{2015}) - \ln(GDP_{2014}).$$

• Another frequently cited growth rate in macroeconomics is that of prices, i.e., inflation, since a majority of central banks around the world have adopted inflation targets. As you know, the aggregate price level is usually measured by some sort of consumer price index (CPI). Annualization of inflation is, of course, analogous to the preceding examples. One thing to keep in mind, though, is that central bank inflation targets are usually cast from the vantage point of what's called a "12-month inflation target," meaning that if, say, a central bank has an inflation target of 2 percent, then its measure of success is whether
$$100 \cdot \left(\frac{CPI_t}{CPI_{t-12}} - 1\right) \approx 100 \cdot \left[\ln(CPI_t) - \ln(CPI_{t-12})\right]$$
is approximately equal to 2 percent every month, where the CPI data is assumed to be at monthly frequency.
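A small worked example (an addition to these notes, with made-up index levels) showing how a one-month change is annualized and how the 12-month log-change approximation to inflation works:

    import numpy as np

    # Annualizing a single monthly change: CPI rises from 250.0 to 250.5 in one month.
    monthly_gross = 250.5 / 250.0
    annualized = monthly_gross ** 12 - 1          # (Z_t / Z_{t-1})^12 - 1
    print(f"annualized monthly inflation: {100 * annualized:.2f} percent")

    # 12-month inflation: compare CPI today with CPI twelve months earlier.
    cpi_now, cpi_year_ago = 255.0, 250.0
    exact = 100 * (cpi_now / cpi_year_ago - 1)
    approx = 100 * (np.log(cpi_now) - np.log(cpi_year_ago))
    print(f"12-month inflation: exact {exact:.2f}, log approximation {approx:.2f}")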
4 Matrix Algebra
• A matrix is simply a rectangular array of numbers. The size of a matrix is indicated by the number of its rows and the number of its columns. A matrix with k rows and n columns, where k and n are positive integers, is called a $k \times n$ (read as "k by n") matrix. The number in row i and column j is called the (i, j)th entry (read as the "i jth entry"), where i and j are positive integers.
4.1 Addition
• Only matrices of the same size can be added, i.e., matrices with the same numbers of rows and columns. The resulting sum is a matrix with the same size as those that are being added. The (i, j)th entry of the sum of two matrices is simply the sum of the (i, j)th entries of the matrices being added. For example, consider two $k \times n$ matrices A and B; then:
$$A + B = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & a_{ij} & \vdots \\ a_{k1} & \cdots & a_{kn} \end{bmatrix} + \begin{bmatrix} b_{11} & \cdots & b_{1n} \\ \vdots & b_{ij} & \vdots \\ b_{k1} & \cdots & b_{kn} \end{bmatrix} = \begin{bmatrix} a_{11}+b_{11} & \cdots & a_{1n}+b_{1n} \\ \vdots & a_{ij}+b_{ij} & \vdots \\ a_{k1}+b_{k1} & \cdots & a_{kn}+b_{kn} \end{bmatrix}.$$
• The matrix 0, whose entries are all zero, is an additive identity since A + 0 = A for all matrices A. Of course, for this addition to take place the 0 matrix under consideration must have the same number of rows and columns as the A matrix under consideration. For example, consider the $k \times n$ matrix A; then:
$$A + 0 = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & a_{ij} & \vdots \\ a_{k1} & \cdots & a_{kn} \end{bmatrix} + \begin{bmatrix} 0_{11} & \cdots & 0_{1n} \\ \vdots & 0_{ij} & \vdots \\ 0_{k1} & \cdots & 0_{kn} \end{bmatrix} = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & a_{ij} & \vdots \\ a_{k1} & \cdots & a_{kn} \end{bmatrix}.$$
  – Whenever clarification about a matrix's dimensions is needed, the dimensions are noted as a subscript under the matrix's name. For instance, in the preceding case we could have written the zero matrix in question as $0_{k \times n}$. Also, if this zero matrix were a square matrix with dimensions $n \times n$ we could simply write $0_n$.
4.2 Scalar Multiplication
• Matrices can be multiplied by scalars. This operation is called scalar multiplication. For instance, the product of the $k \times n$ matrix A and the scalar r is created by multiplying each entry of A by r:
$$rA = r\begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & a_{ij} & \vdots \\ a_{k1} & \cdots & a_{kn} \end{bmatrix} = \begin{bmatrix} r \cdot a_{11} & \cdots & r \cdot a_{1n} \\ \vdots & r \cdot a_{ij} & \vdots \\ r \cdot a_{k1} & \cdots & r \cdot a_{kn} \end{bmatrix}.$$
4.3 Subtraction
• Consider a $k \times n$ matrix A; since $-A$ is what one adds to A to obtain 0, then:
$$-1 \cdot \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & a_{ij} & \vdots \\ a_{k1} & \cdots & a_{kn} \end{bmatrix} = \begin{bmatrix} -a_{11} & \cdots & -a_{1n} \\ \vdots & -a_{ij} & \vdots \\ -a_{k1} & \cdots & -a_{kn} \end{bmatrix}.$$

• Furthermore, consider two $k \times n$ matrices A and B. Since $A - B$ is just shorthand for $A + (-B)$, then:
$$A - B = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & a_{ij} & \vdots \\ a_{k1} & \cdots & a_{kn} \end{bmatrix} - \begin{bmatrix} b_{11} & \cdots & b_{1n} \\ \vdots & b_{ij} & \vdots \\ b_{k1} & \cdots & b_{kn} \end{bmatrix} = \begin{bmatrix} a_{11}-b_{11} & \cdots & a_{1n}-b_{1n} \\ \vdots & a_{ij}-b_{ij} & \vdots \\ a_{k1}-b_{k1} & \cdots & a_{kn}-b_{kn} \end{bmatrix}.$$
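The elementwise operations above map directly onto array arithmetic. The following short numpy sketch (added for illustration; the entries are arbitrary) checks addition, scalar multiplication, and subtraction on small matrices:

    import numpy as np

    A = np.array([[1.0, 2.0], [3.0, 4.0]])
    B = np.array([[5.0, 6.0], [7.0, 8.0]])

    print(A + B)                  # entrywise sum
    print(3 * A)                  # scalar multiplication: every entry of A times 3
    print(A - B)                  # shorthand for A + (-1)*B
    print(A + np.zeros_like(A))   # adding the conformable zero matrix leaves A unchanged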
4.4 Matrix Multiplication
• Not all matrices can be multiplied together, and the order in which matrices are multiplied can matter.

• The product of two matrices A and B exists in the order $A \cdot B$ if and only if the number of columns of A equals the number of rows of B.

• Assuming the multiplication can be carried out (say A is $k \times m$ and B is $m \times n$), the (i, j)th entry of $A \cdot B$ is given by
$$\sum_{h=1}^{m} a_{ih} \cdot b_{hj}.$$

• For example, let A be a $3 \times 2$ matrix and B be a $2 \times 2$ matrix; since the number of columns of A equals the number of rows of B, the product $A \cdot B$ exists:
$$A \cdot B = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix} \cdot \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} = \begin{bmatrix} a_{11}b_{11}+a_{12}b_{21} & a_{11}b_{12}+a_{12}b_{22} \\ a_{21}b_{11}+a_{22}b_{21} & a_{21}b_{12}+a_{22}b_{22} \\ a_{31}b_{11}+a_{32}b_{21} & a_{31}b_{12}+a_{32}b_{22} \end{bmatrix}.$$
  – In this case the product taken in reverse order, that is, $B \cdot A$, is not defined.

• Note that if A is a $k \times m$ matrix and B is an $m \times n$ matrix, then the product $A \cdot B$ will be a $k \times n$ matrix. As such, the product inherits the number of its rows from A and the number of its columns from B.

• The $n \times n$ matrix
$$I_n = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix},$$
with $a_{ii} = 1$ and $a_{ij} = 0$ for all $i \neq j$, has the property that for any $m \times n$ matrix A,
$$A \cdot I_n = A,$$
and for any $n \times l$ matrix B,
$$I_n \cdot B = B.$$

• The matrix I is called the identity matrix because it is a multiplicative identity for matrices just like the scalar 1 is for numbers.
4.5 Laws of Matrix Algebra
• Associative laws:
$$(A + B) + C = A + (B + C);$$
$$(A \cdot B)\,C = A\,(B \cdot C).$$

• Commutative law for addition:
$$A + B = B + A.$$

• Distributive laws:
$$A\,(B + C) = A \cdot B + A \cdot C;$$
$$(A + B)\,C = A \cdot C + B \cdot C.$$

• The one important law which numbers always satisfy but matrices do not is the commutative law for multiplication.
  – Although for any scalars a and b it is true that ab = ba, it is not in general true that $A \cdot B = B \cdot A$ for two matrices A and B, even when both products are defined.
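A brief numpy illustration (not from the original notes; the matrices are arbitrary) of the conformability rule, the identity matrix, and the failure of commutativity:

    import numpy as np

    A = np.arange(6, dtype=float).reshape(3, 2)   # 3 x 2
    B = np.array([[1.0, 2.0], [3.0, 4.0]])        # 2 x 2

    print(A @ B)            # defined: (3 x 2)(2 x 2) gives a 3 x 2 matrix
    # B @ A would raise an error: (2 x 2)(3 x 2) is not conformable.

    print(A @ np.eye(2))    # multiplying by the identity returns A

    C = np.array([[0.0, 1.0], [1.0, 0.0]])
    print(B @ C)            # for square matrices both orders exist...
    print(C @ B)            # ...but the two products generally differ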
4.6 Transpose
• The transpose of a $k \times n$ matrix A is the $n \times k$ matrix obtained by interchanging the rows and columns of A. This matrix is often written as $A'$. As such, the (i, j)th entry of A becomes the (j, i)th entry of $A'$. For example,
$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}' = \begin{bmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \\ a_{13} & a_{23} \end{bmatrix}.$$

• The following rules are fairly straightforward to verify:
$$(A + B)' = A' + B';$$
$$(A - B)' = A' - B';$$
$$(A')' = A;$$
$$(rA)' = rA';$$
$$(AB)' = B' \cdot A',$$
where r is a scalar and A and B are $k \times n$ matrices.
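A quick numerical check of the transpose rules (an illustrative addition; shapes and values are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(3, 4))
    B = rng.normal(size=(4, 2))
    r = 2.5

    print(np.allclose((r * A).T, r * A.T))       # (rA)' = r A'
    print(np.allclose((A @ B).T, B.T @ A.T))     # (AB)' = B'A'
    print(np.allclose(A.T.T, A))                 # (A')' = A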
4.7 Special Kinds of Matrices
• We now take a look at some important special kinds of $k \times n$ matrices.

4.7.1 Square Matrix

• k = n, that is, an equal number of rows and columns.

4.7.2 Column Matrix

• Also referred to as a column vector. In this case, n = 1, so the matrix has only one column. For example:
$$\begin{bmatrix} a_{11} \\ a_{21} \\ a_{31} \end{bmatrix}.$$

4.7.3 Row Matrix

• Also referred to as a row vector. In this case, k = 1, so the matrix has only one row. For example:
$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \end{bmatrix}.$$

4.7.4 Diagonal Matrix

• k = n and $a_{ij} = 0$ for $i \neq j$, that is, a square matrix in which all nondiagonal entries are 0. For example,
$$\begin{bmatrix} a_{11} & 0 \\ 0 & a_{22} \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} a_{11} & 0 & 0 \\ 0 & a_{22} & 0 \\ 0 & 0 & a_{33} \end{bmatrix}.$$

4.7.5 Upper-Triangular Matrix

• $a_{ij} = 0$ if $i > j$, that is, a (usually square) matrix in which all entries below the (main) diagonal are 0. For example:
$$\begin{bmatrix} a_{11} & a_{12} \\ 0 & a_{22} \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & 0 & a_{33} \end{bmatrix}.$$

4.7.6 Lower-Triangular Matrix

• $a_{ij} = 0$ if $i < j$, that is, a (usually square) matrix in which all entries above the (main) diagonal are 0. For example:
$$\begin{bmatrix} a_{11} & 0 \\ a_{21} & a_{22} \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} a_{11} & 0 & 0 \\ a_{21} & a_{22} & 0 \\ a_{31} & a_{32} & a_{33} \end{bmatrix}.$$

4.7.7 Symmetric Matrix

• $A' = A$, that is, $a_{ij} = a_{ji}$ for all i and j. These matrices are necessarily square. For example:
$$\begin{bmatrix} a & b \\ b & d \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 5 \\ 3 & 5 & 6 \end{bmatrix}.$$

4.7.8 Idempotent Matrix

• A square matrix A for which $A \cdot A = A$, such as A = I or:
$$\begin{bmatrix} 5 & -5 \\ 4 & -4 \end{bmatrix}.$$

• Indeed, note that in this case:
$$\begin{bmatrix} 5 & -5 \\ 4 & -4 \end{bmatrix} \cdot \begin{bmatrix} 5 & -5 \\ 4 & -4 \end{bmatrix} = \begin{bmatrix} 5 \cdot 5 + (-5) \cdot 4 & 5 \cdot (-5) + (-5)(-4) \\ 4 \cdot 5 + (-4) \cdot 4 & 4 \cdot (-5) + (-4)(-4) \end{bmatrix} = \begin{bmatrix} 25 - 20 & -25 + 20 \\ 20 - 16 & -20 + 16 \end{bmatrix} = \begin{bmatrix} 5 & -5 \\ 4 & -4 \end{bmatrix}.$$

4.7.9 Permutation Matrix

• A square matrix of 0s and 1s in which each row and each column contains exactly one 1. For example:
$$\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

4.7.10 Power Matrices

• Suppose A is an $n \times n$ matrix. Then, the power matrices of A are given by:
$$A^0 = I, \quad A^1 = A, \quad A^2 = AA, \quad A^3 = AAA, \quad \text{etc.}$$
4.7.11 Inverse Matrix
• Let A be an $n \times n$ matrix and B be an $n \times n$ matrix. The matrix B is an inverse of A if:
$$A \cdot B = B \cdot A = I.$$
  – An $n \times n$ matrix has at most one inverse.

• Let A be a $k \times n$ matrix. The $n \times k$ matrix B is a right inverse of A if:
$$A \cdot B = I.$$

• Let A be a $k \times n$ matrix. The $n \times k$ matrix C is a left inverse of A if:
$$C \cdot A = I.$$
  – If A has a right inverse B and a left inverse C, then A is invertible and $B = C = A^{-1}$.

• Let x be a column vector of n variables,
$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix},$$
c be a column vector of n constants,
$$c = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix},$$
and A be an $n \times n$ matrix. Then,
$$Ax = c$$
amounts to a system of n equations in n unknowns. If A is invertible, then there is a unique solution to the above-noted system of linear equations, and it is given by:
$$x = A^{-1}c.$$

• Let A and B be square invertible matrices. Then:
  – $(A^{-1})^{-1} = A$.
  – $(A')^{-1} = (A^{-1})'$.
  – $A \cdot B$ is invertible, and $(A \cdot B)^{-1} = B^{-1} \cdot A^{-1}$.

• If A is invertible, then:
  – $A^m$ is invertible for any integer m and $(A^m)^{-1} = (A^{-1})^m = A^{-m}$.
  – For any integers r and s, $A^r \cdot A^s = A^{r+s}$.
  – For any scalar $r \neq 0$, rA is invertible and $(rA)^{-1} = r^{-1} \cdot A^{-1}$.
4.8 Determinants
• The determinant of a matrix is a scalar that "determines" whether the matrix in question is "nonsingular" or not. In particular, for any square matrix its determinant is such that the square matrix is nonsingular if and only if its determinant is not zero. (A singular matrix is not invertible.)

• A $1 \times 1$ matrix is just a scalar. As such, consider the scalar a. Its inverse exists if and only if a is nonzero, so it is natural to define the determinant of such a matrix to be just the scalar a:
$$\det[a] = a$$
(an alternative notation for $\det[a]$ is $|a|$, not to be confused with its absolute value, although clarification may be needed depending on context).

• For a $2 \times 2$ matrix
$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix},$$
its determinant is defined as:
$$\det\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = a_{11} \cdot \det(a_{22}) - a_{12} \cdot \det(a_{21}) = a_{11} \cdot a_{22} - a_{12} \cdot a_{21},$$
where the second expression is useful for conceptualizing the determinant of higher-order matrices.

• Let A be an $n \times n$ matrix. Let $A_{ij}$ be the $(n-1) \times (n-1)$ submatrix obtained by deleting row i and column j from A. Then, the scalar
$$M_{ij} = \det(A_{ij})$$
is called the (i, j)th minor of A and the scalar
$$C_{ij} = (-1)^{i+j} \cdot M_{ij}$$
is called the (i, j)th cofactor of A. A cofactor is a signed minor. Note that $M_{ij} = C_{ij}$ if (i + j) is even and $M_{ij} = -C_{ij}$ if (i + j) is odd. As such,
$$\det[A] = \det\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = a_{11}\det(a_{22}) - a_{12}\det(a_{21}) = a_{11} \cdot M_{11} - a_{12} \cdot M_{12} = a_{11} \cdot C_{11} + a_{12} \cdot C_{12},$$
which is useful for motivating the derivation of the determinant of a $3 \times 3$ matrix.

• The determinant of a $3 \times 3$ matrix
$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$$
is given by:
$$\det\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = a_{11} \cdot C_{11} + a_{12} \cdot C_{12} + a_{13} \cdot C_{13} = a_{11} \cdot M_{11} - a_{12} \cdot M_{12} + a_{13} \cdot M_{13}$$
$$= a_{11}\det\begin{bmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{bmatrix} - a_{12}\det\begin{bmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{bmatrix} + a_{13}\det\begin{bmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix}.$$
  – Note that the jth term on the right-hand side of the definition is $a_{1j}$ times the determinant of the submatrix obtained by deleting row 1 and column j from A. The term is preceded by a plus sign if 1 + j is even and by a minus sign if 1 + j is odd.

• The determinant of an $n \times n$ matrix A is given by
$$\det[A] = |A| = a_{11}C_{11} + a_{12}C_{12} + \ldots + a_{1n}C_{1n} = a_{11}M_{11} - a_{12}M_{12} + \ldots + (-1)^{n+1}a_{1n}M_{1n}.$$
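An illustrative check (added here, with an arbitrary matrix) that the cofactor expansion along the first row reproduces numpy's determinant:

    import numpy as np

    A = np.array([[2.0, 1.0, 3.0],
                  [0.0, 4.0, 5.0],
                  [1.0, 0.0, 6.0]])

    def minor(A, i, j):
        """Determinant of the submatrix with row i and column j deleted."""
        sub = np.delete(np.delete(A, i, axis=0), j, axis=1)
        return np.linalg.det(sub)

    # Expansion along the first row: sum over columns of a_{1j} * C_{1j}
    # (here j is 0-based, so (-1)**j plays the role of (-1)^(1+j) in the notes).
    expansion = sum(A[0, j] * (-1) ** j * minor(A, 0, j) for j in range(3))
    print(expansion, np.linalg.det(A))   # the two values agree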
4.9 Eigenvalues
• Let A be an $n \times n$ matrix. Then, the scalar e is an eigenvalue of A if there exists a nonzero vector x such that
$$Ax = ex,$$
in which case x is an eigenvector of A (associated with e).

• It should be noted that if x is an eigenvector associated with the eigenvalue e, then cx is also an eigenvector associated with e for every scalar $c \neq 0$.

• Eigenvalues and eigenvectors are also called characteristic values and characteristic vectors, respectively.

• Note that
$$Ax = ex \;\rightarrow\; (A - eI)\,x = 0.$$

• Define $p(e) \equiv \det[A - eI]$. Then,
$$p(e) = 0$$
is called the characteristic equation (or eigenvalue equation) of A.

• From the definition of a determinant, it follows that p(e) is a polynomial in e. The roots of this characteristic polynomial are the eigenvalues of A.
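A short numpy illustration (an added example, with an arbitrary symmetric matrix) of the eigenvalue definition Ax = ex:

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 2.0]])

    eigenvalues, eigenvectors = np.linalg.eig(A)   # columns of `eigenvectors` are the x's
    for i, e in enumerate(eigenvalues):
        x = eigenvectors[:, i]
        print(e, np.allclose(A @ x, e * x))        # A x equals e x for each pair

    # The same eigenvalues are the roots of the characteristic polynomial det(A - eI) = 0;
    # for this A they are 3 and 1.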
4.10 Linear Independence and Rank of a Matrix
4.10.1 Linear Independence
• Let $\{x_1, x_2, \ldots, x_r\}$ be a set of $n \times 1$ vectors. These are linearly independent vectors if and only if
$$\alpha_1 x_1 + \alpha_2 x_2 + \ldots + \alpha_r x_r = 0 \quad (1)$$
implies that $\alpha_1 = \alpha_2 = \ldots = \alpha_r = 0$. If equation (1) holds for a set of scalars that are not all zero, then $\{x_1, x_2, \ldots, x_r\}$ is linearly dependent.

• The statement that $\{x_1, x_2, \ldots, x_r\}$ is linearly dependent is equivalent to saying that at least one vector in this set can be written as a linear combination of the others.

4.10.2 Rank

• Let A be an $n \times m$ matrix. The rank of a matrix A, denoted rank(A), is the maximum number of linearly independent columns of A.

• If A is an $n \times m$ matrix and rank(A) = m, then A has full column rank.

• If A is an $n \times m$ matrix, its rank can be at most m. A matrix has full column rank if its columns form a linearly independent set. For example, the $3 \times 2$ matrix
$$\begin{bmatrix} 1 & 3 \\ 2 & 6 \\ 0 & 0 \end{bmatrix}$$
can have at most rank 2. In fact, its rank is only 1 because the second column is 3 times the first column.

4.10.3 Properties of Rank

• rank(A') = rank(A).

• If A is $n \times k$, then rank(A) $\leq$ min{n, k}, where min is the "minimum operator" and in this case selects the minimum of the set {n, k}.

• If A is $k \times k$ and rank(A) = k, then A is nonsingular.
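The rank example can be checked directly (an added illustration using the same matrix as above):

    import numpy as np

    A = np.array([[1.0, 3.0],
                  [2.0, 6.0],
                  [0.0, 0.0]])

    print(np.linalg.matrix_rank(A))      # 1: the second column is 3 times the first
    print(np.linalg.matrix_rank(A.T))    # rank(A') = rank(A)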
4.11 Trace
• The trace of a matrix is a very simple operation defined only for square matrices. For any $n \times n$ matrix A, the trace of matrix A, denoted tr(A), is the sum of its diagonal elements. Mathematically,
$$\mathrm{tr}(A) = \sum_{i=1}^{n} a_{ii}.$$

• The trace of a matrix has the following properties:
  – tr($I_n$) = n.
  – tr(A') = tr(A).
  – tr(A + B) = tr(A) + tr(B).
  – tr($\alpha$A) = $\alpha\,$tr(A), where $\alpha$ is a scalar.
  – tr(A · B) = tr(B · A), where A is $n \times m$ and B is $m \times n$.
5 Classical Linear Regression (CLR) Model Recap
• Recall that an estimator is:
  – Unbiased if, on average, it hits the true parameter value. That is, the mean of the sampling distribution of the estimator is equal to the true parameter value.
  – Consistent if, as the sample size increases, the estimates (produced by the estimator) "converge" to the true value of the parameter being estimated. To be slightly more precise, consistency means that, as the sample size increases, the sampling distribution of the estimator becomes increasingly concentrated at the true parameter value.
  – NOTE: Unbiasedness is a statement about the expected value of the sampling distribution of the estimator. Consistency is a statement about "where the sampling distribution of the estimator is going" as the sample size increases.
  – Efficient if the sampling distribution of the estimator being used has the smallest variance among the class of estimators being considered. For instance, even if we are dealing with a set of biased estimators that are nonetheless consistent, then among these biased and consistent estimators the efficient one would be the one that has the smallest variance.
5.1 CLR Model in Matrix Form
5.1.1 Representation
• In the remainder of this section we will use the t subscript to index observations and n to denote the sample size. Then, the multiple linear regression model with k parameters is written as follows:
$$y_t = \beta_1 + \beta_2 x_{2,t} + \beta_3 x_{3,t} + \ldots + \beta_k x_{k,t} + \varepsilon_t \quad (2)$$
for t = 1, 2, ..., n, where: $y_t$ is the dependent variable for observation t; $x_{j,t}$ for j = 2, 3, ..., k are the independent variables; and $\beta$ is the Greek letter "beta."

• For each t define a $1 \times k$ vector
$$x_t = \begin{bmatrix} 1 & x_{2,t} & \cdots & x_{k,t} \end{bmatrix}$$
and let
$$\beta = \begin{bmatrix} \beta_1 & \beta_2 & \cdots & \beta_k \end{bmatrix}'$$
be the $k \times 1$ vector of all parameters. Then, we can write equation (2) as:
$$y_t = x_t\beta + \varepsilon_t \quad (3)$$
for t = 1, 2, ..., n.

• We can write equation (3) in full matrix notation by appropriately defining data vectors and matrices. Let $y_t$ denote the $n \times 1$ vector of observations on y: the tth element of $y_t$ is $y_t$. Let $X_t$ be the $n \times k$ matrix of observations on the explanatory variables. In other words, the tth row of $X_t$ consists of the vector $x_t$. Equivalently, the (t, j)th element of $X_t$ is simply $x_{j,t}$:
$$X_t = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} 1 & x_{2,1} & x_{3,1} & \cdots & x_{k,1} \\ 1 & x_{2,2} & x_{3,2} & \cdots & x_{k,2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{2,n} & x_{3,n} & \cdots & x_{k,n} \end{bmatrix}.$$
Of course, because the regression model under consideration involves a constant parameter $\beta_1$, then $x_{1,t} = 1\ \forall t$ ($\forall$ means "for all" in mathematics jargon). Then, we can write equation (3) for all n observations in matrix notation:
$$y_t = X_t\beta + \varepsilon_t, \quad (4)$$
where:
$$\varepsilon_t = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}.$$
5.1.2 Estimation
• Estimation of $\beta$ proceeds by minimizing the sum of squared residuals. Define the sum of squared residuals function for any possible $k \times 1$ parameter vector b as:
$$SSR(b) \equiv \sum_{t=1}^{n} (y_t - x_t b)^2,$$
where $\Sigma$ is the Greek letter "capital sigma," which is used to represent the summation operator; here the sum runs from t = 1 through t = n. The $k \times 1$ vector of ordinary least squares estimates,
$$\hat{\beta} = \begin{bmatrix} \hat{\beta}_1 & \hat{\beta}_2 & \cdots & \hat{\beta}_k \end{bmatrix}' = \begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \\ \vdots \\ \hat{\beta}_k \end{bmatrix},$$
minimizes SSR(b) over all possible $k \times 1$ vectors b. This is a problem in multivariable calculus.

• For $\hat{\beta}$ to minimize the sum of squared residuals, it must solve the first-order condition:
$$\frac{\partial SSR(\hat{\beta})}{\partial b} = 0. \quad (5)$$
To be clear, this condition says that the partial derivative of SSR(·) with respect to b (denoted using the calculus "partial" symbol $\partial$) evaluated at $\hat{\beta}$ equals zero.

• More concretely, using the power and chain rules from calculus,
$$\frac{\partial SSR}{\partial b} = \frac{\partial \sum_{t=1}^{n}(y_t - x_t b)^2}{\partial b} = -\sum_{t=1}^{n} 2\,(y_t - x_t b)\,x_t,$$
which is, of course, a $1 \times k$ vector. As such, equation (5) is equivalent to:
$$\sum_{t=1}^{n} x_t'\left(y_t - x_t\hat{\beta}\right) = 0 \quad (6)$$
(where we have divided by $-2$ and then taken the transpose). We can write this first-order condition as:
$$\sum_{t=1}^{n} \left(y_t - \hat{\beta}_1 - \hat{\beta}_2 x_{2,t} - \ldots - \hat{\beta}_k x_{k,t}\right) = 0$$
$$\sum_{t=1}^{n} x_{2,t}\left(y_t - \hat{\beta}_1 - \hat{\beta}_2 x_{2,t} - \ldots - \hat{\beta}_k x_{k,t}\right) = 0$$
$$\vdots$$
$$\sum_{t=1}^{n} x_{k,t}\left(y_t - \hat{\beta}_1 - \hat{\beta}_2 x_{2,t} - \ldots - \hat{\beta}_k x_{k,t}\right) = 0.$$
Using the properties of matrix algebra, this system of equations can of course be written as
$$X_t'\left(y_t - X_t\hat{\beta}\right) = 0 \quad (7)$$
or
$$\left(X_t'X_t\right)\hat{\beta} = X_t'y_t. \quad (8)$$
It can be shown that equation (8) always has at least one solution.

• Assuming that the $k \times k$ symmetric matrix $X_t'X_t$ is nonsingular, we can premultiply both sides of equation (8) by $(X_t'X_t)^{-1}$ to solve for the CLR estimator $\hat{\beta}$:
$$\hat{\beta} = \left(X_t'X_t\right)^{-1}X_t'y_t. \quad (9)$$
This expression is the critical equation for matrix analysis of the multiple linear regression model and, of course, corresponds to the ordinary least squares (OLS) procedure. The assumption that $X_t'X_t$ is invertible is equivalent to the assumption that rank($X_t$) = k, which means that the columns of $X_t$ must be linearly independent.

• NOTE: it is tempting to simplify the equation for $\hat{\beta}$ as follows:
$$\hat{\beta} = \left(X_t'X_t\right)^{-1}X_t'y_t = X_t^{-1}\left(X_t'\right)^{-1}X_t'y_t = X_t^{-1}y_t.$$
The problem, though, is that $X_t$ is usually not a square matrix, and so it cannot be inverted. In other words, we cannot write $(X_t'X_t)^{-1} = X_t^{-1}(X_t')^{-1}$ unless n = k, a case that virtually never arises in practice.

• The $n \times 1$ vectors of OLS fitted values and residuals are given respectively by:
$$\hat{y}_t = X_t\hat{\beta}$$
and
$$\hat{\varepsilon}_t = y_t - \hat{y}_t = y_t - X_t\hat{\beta}.$$

• Of course, from equation (7) and the definition of $\hat{\varepsilon}_t$ it follows that the first-order condition that defines $\hat{\beta}$ is the same as:
$$X_t'\hat{\varepsilon}_t = 0. \quad (10)$$
Because the first column of $X_t$ consists entirely of ones, equation (10) implies that the OLS residuals always sum to zero when an intercept is included in the equation and that the sample covariance between each independent variable and the OLS residuals is zero.

• The sum of squared residuals can be written as:
$$SSR = \sum_{t=1}^{n}\hat{\varepsilon}_t^{\,2} = \hat{\varepsilon}_t'\hat{\varepsilon}_t = \left(y_t - X_t\hat{\beta}\right)'\left(y_t - X_t\hat{\beta}\right). \quad (11)$$
• All of the previous is better understood using full-fledged matrix algebra, which is why we're putting emphasis on this; in fact, matrix algebra, believe it or not, makes the study of multivariate time series much, much, much(!) simpler than it would be otherwise...trust me!
  – Let's start with
$$y_t = X_t\hat{\beta} + \varepsilon_t,$$
which implies that
$$\varepsilon_t = y_t - X_t\hat{\beta}.$$
  – The estimator $\hat{\beta}$ therefore minimizes the sum of squared residuals (which is a scalar):
$$\varepsilon_t'\varepsilon_t = \left(y_t - X_t\hat{\beta}\right)'\left(y_t - X_t\hat{\beta}\right) = y_t'y_t - \hat{\beta}'X_t'y_t - y_t'X_t\hat{\beta} + \hat{\beta}'X_t'X_t\hat{\beta},$$
which follows by the rules of matrix algebra studied earlier. Therefore,
$$\varepsilon_t'\varepsilon_t = y_t'y_t - 2\hat{\beta}'X_t'y_t + \hat{\beta}'X_t'X_t\hat{\beta},$$
where the last expression follows from the fact that the transpose of a scalar is just that same scalar, that is,
$$y_t'X_t\hat{\beta} = \left(y_t'X_t\hat{\beta}\right)' = \hat{\beta}'X_t'y_t.$$
  – We now need to take the derivative of the sum of squared residuals with respect to $\hat{\beta}$ and set it to zero to get at the estimator that minimizes this sum (the proof that this is a minimum is beyond the scope and needs of this class). Thus, the first-order condition is
$$\frac{\partial\,\varepsilon_t'\varepsilon_t}{\partial\hat{\beta}} = \underbrace{\frac{\partial\, y_t'y_t}{\partial\hat{\beta}}}_{=0} - 2\,\frac{\partial\,\hat{\beta}'X_t'y_t}{\partial\hat{\beta}} + \frac{\partial\,\hat{\beta}'X_t'X_t\hat{\beta}}{\partial\hat{\beta}} = 0.$$
  – Note that
$$\frac{\partial\hat{\beta}'}{\partial\hat{\beta}} = \begin{bmatrix} \partial\hat{\beta}'/\partial\hat{\beta}_1 \\ \vdots \\ \partial\hat{\beta}'/\partial\hat{\beta}_k \end{bmatrix} = \begin{bmatrix} \partial\,[\hat{\beta}_1\;\hat{\beta}_2\;\cdots\;\hat{\beta}_k]/\partial\hat{\beta}_1 \\ \vdots \\ \partial\,[\hat{\beta}_1\;\hat{\beta}_2\;\cdots\;\hat{\beta}_k]/\partial\hat{\beta}_k \end{bmatrix} = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} = I_k.$$
  – Putting everything together so far,
$$\frac{\partial\,\varepsilon_t'\varepsilon_t}{\partial\hat{\beta}} = \underbrace{\frac{\partial\, y_t'y_t}{\partial\hat{\beta}}}_{=0} - \underbrace{2\,\frac{\partial\hat{\beta}'}{\partial\hat{\beta}}X_t'y_t}_{=\,2 I_k X_t'y_t\,=\,2X_t'y_t} + \frac{\partial\,\hat{\beta}'X_t'X_t\hat{\beta}}{\partial\hat{\beta}} = 0.$$
  – To understand how to deal with the last term above, recall that $X_t$ is $n \times k$, $\hat{\beta}$ is $k \times 1$, and therefore $\hat{\beta}'$ is $1 \times k$, so that $\hat{\beta}'X_t'X_t\hat{\beta}$ is a scalar. Writing $Q \equiv X_t'X_t$ with typical element $q_{ij}$ (Q is $k \times k$ and symmetric), this scalar is the quadratic form
$$\hat{\beta}'Q\hat{\beta} = \sum_{i=1}^{k}\sum_{j=1}^{k} q_{ij}\,\hat{\beta}_i\hat{\beta}_j,$$
so every term is either a squared $\hat{\beta}$ times a diagonal element of Q or a cross product of two different $\hat{\beta}$'s times an off-diagonal element of Q. Differentiating with respect to a typical element $\hat{\beta}_m$ gives
$$\frac{\partial\,\hat{\beta}'Q\hat{\beta}}{\partial\hat{\beta}_m} = \sum_{j=1}^{k} q_{mj}\hat{\beta}_j + \sum_{i=1}^{k} q_{im}\hat{\beta}_i = 2\sum_{j=1}^{k} q_{mj}\hat{\beta}_j,$$
where the last equality uses the symmetry of Q. Stacking these k partial derivatives yields
$$\frac{\partial\,\hat{\beta}'X_t'X_t\hat{\beta}}{\partial\hat{\beta}} = 2X_t'X_t\hat{\beta}.$$
  – Putting this final piece into our earlier derivative, we then have
$$\frac{\partial\,\varepsilon_t'\varepsilon_t}{\partial\hat{\beta}} = -2X_t'y_t + 2X_t'X_t\hat{\beta} = 0,$$
and it follows that
$$X_t'X_t\hat{\beta} = X_t'y_t.$$
  – Therefore,
$$\hat{\beta} = \left(X_t'X_t\right)^{-1}X_t'y_t.$$
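The closed-form estimator in equation (9) is easy to compute directly. Below is a minimal illustrative sketch (simulated data with made-up parameter values, not part of the original notes) that builds X with a column of ones, computes β̂ = (X'X)⁻¹X'y, and confirms that the residuals are orthogonal to the regressors as in equation (10):

    import numpy as np

    rng = np.random.default_rng(42)
    n = 500
    x2 = rng.normal(size=n)
    x3 = rng.normal(size=n)
    eps = rng.normal(scale=0.5, size=n)

    beta_true = np.array([1.0, 2.0, -3.0])        # illustrative true parameters
    X = np.column_stack([np.ones(n), x2, x3])     # first column of ones for the intercept
    y = X @ beta_true + eps

    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # (X'X)^{-1} X'y without forming the inverse
    resid = y - X @ beta_hat

    print(beta_hat)                # close to the true parameter vector
    print(X.T @ resid)             # essentially zero: X' e_hat = 0, as in equation (10)
    print(resid.sum())             # residuals sum to zero because of the intercept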
5.2 CLR Model Finite-Sample Assumptions and Properties
5.2.1 Assumptions
• The CLR model consists of five basic assumptions about the way in which the observations are generated.

• Assumption 1. The dependent variable can be calculated as a linear function of a specific set of independent variables, plus a disturbance term. The unknown coefficients of this linear function form the vector $\beta$ and are assumed to be constants. In other words, the model can be written as in equation (4), where: $y_t$ is an observed $n \times 1$ vector; $X_t$ is an $n \times k$ observed matrix; and $\varepsilon_t$ is an $n \times 1$ vector of unobserved errors or disturbances. There can be several violations of this assumption. For instance:
  – Wrong regressors: the omission of relevant independent variables or the inclusion of irrelevant independent variables.
  – Nonlinearity: when the relationship between the dependent and independent variables is not linear.
  – Changing parameters: when the parameters do not remain constant during the period in which data were collected.

• Assumption 2. The expected value of the disturbance term is conditionally zero; that is, the mean of the distribution from which the disturbance is drawn is zero conditional on knowledge of the matrix $X_t$. Mathematically,
$$E(\varepsilon_t\,|\,X_t) = 0 \quad \forall t$$
(recall that the symbol $\forall$ means "for all" in mathematical jargon). This assumption imposes strict exogeneity (as explained later) on the explanatory variables, which, for instance, rules out explanatory variables whose future values are correlated with the error term. In particular, this assumption eliminates lagged dependent variables entering the regression equation as regressors. Given assumption 2, we can condition on the regressors when computing the expected value of $\hat{\beta}$. Of note, in a time series context, this assumption translates into the time series of disturbances being white noise, as defined just below.
  – The problem with violating assumption 2 is most easily assessed by assuming that the equation in question is instead estimated by rearranging it, removing the nonzero mean from the error term and adding it to the intercept term. This creates an estimating equation obeying all the CLR model assumptions; in particular, the mean of the new error term is zero. The only problem is that ordinary least squares estimation gives an unbiased estimate of the new intercept, which is the sum of the original intercept and the mean of the original error term; it is therefore a biased estimate of the original intercept (the bias being exactly equal to the mean of the original error term). Thus, the only implication of this violation of the second assumption of the CLR model is that the OLS estimate of the intercept is biased; the slope coefficient estimates are unaffected. This biased estimate is often welcomed by the econometrician since, for prediction purposes, s/he would want to incorporate the mean of the error term into the prediction.

• Assumption 3. The disturbance terms all have the same variance and are not correlated with one another, which is referred to as the disturbances being "spherical" (absent one or the other condition the disturbances are said to be "nonspherical").
  – These characteristics are usually described in terms of the variance-covariance matrix of the disturbance vector. If we have a disturbance vector of length t, then the variance-covariance matrix of the disturbances is a matrix with t rows and t columns.
  – The diagonal terms are the variances of the individual disturbances, and the off-diagonal terms are the covariances between them. Each diagonal term gives the variance of the disturbance associated with one of the sample observations (i.e., the first diagonal term gives the variance of the disturbance associated with the first observation, and the last diagonal term gives the variance of the disturbance term associated with the tth observation). If all these diagonal terms are the same, the disturbances are said to have uniform variance or to be "homoskedastic." If the diagonal terms are not all the same, the disturbances are said to be "heteroskedastic"; the disturbance term is then thought of as being drawn from a different distribution for each observation.
  – Each off-diagonal element of the variance-covariance matrix gives the covariance between the disturbances associated with two of the sample observations. If all these off-diagonal terms are zero, the disturbances are said to be uncorrelated, which means that in repeated samples there is no tendency for the disturbance associated with one observation to be related to the disturbance associated with any other. If the off-diagonal terms are not all zero, the disturbances are said to be "autocorrelated"; the disturbance term for one observation is correlated with the disturbance term for another observation.
  – In a time series context it is common to encounter autocorrelated errors, heteroskedasticity, or both. The presence of heteroskedasticity or autocorrelated errors does not create bias or consistency issues in estimating coefficients whenever all other assumptions of the CLR model hold. However, OLS estimators are inefficient. In addition, the standard errors are biased when heteroskedasticity is present, which leads to bias in test statistics and confidence intervals.
  – NOTE: To qualify as white noise, the random variable $\varepsilon_t$ must satisfy three conditions (recall that the symbol $\forall$ means "for all" in mathematical jargon, and $\sigma$ is the Greek letter "sigma"):
    Its mean must be zero:
$$E(\varepsilon_t) = 0 \quad \forall t;$$
    Its variance must be finite and constant:
$$E(\varepsilon_t^2) = \mathrm{var}(\varepsilon_t) = \sigma_\varepsilon^2 \quad \forall t \qquad \text{(in vector form, } E(\varepsilon_t\varepsilon_t') = \sigma_\varepsilon^2 I\text{);}$$
    It must be uncorrelated with past or future values of $\varepsilon_t$:
$$E(\varepsilon_t\varepsilon_j') = \mathrm{cov}(\varepsilon_t, \varepsilon_j) = 0 \quad \forall t \neq j.$$

• Assumption 4. The observations on the independent variables can be considered fixed in repeated samples; that is, it is possible to redraw the sample with the same independent variable values.
  – Two important econometric problems correspond to violations of this assumption: errors in variables, i.e., errors in measuring the independent variables; and autoregression, i.e., using a lagged value of the dependent variable as an independent variable. Of course, autoregression is very common in time series analysis.

• Assumption 5. The fifth assumption of the CLR model is that the number of observations is greater than the number of independent variables and that there are no exact linear relationships between the independent variables. Although this is viewed as an assumption in general, for a specific case it can easily be checked, so that it need not be assumed. The problem of multicollinearity (two or more independent variables being approximately linearly related in the sample data) is associated with this assumption. Under this assumption $X_t'X_t$ is nonsingular, and so $\hat{\beta}$ is unique and can be written as in equation (9). In other words, the matrix $X_t$ has rank k per the notation used earlier in these notes.

• When these five assumptions hold, OLS is considered the optimal estimator. That said, time series econometrics will require careful application of otherwise standard econometric techniques, since, as noted above, key CLR model assumptions are easily violated in time series contexts.
5.2.2 Properties
Theorem 1 Unbiasedness of OLS. Under the CLR model assumptions (1), (2), (3), and (5), the OLS estimator $\hat{\beta}$ is an unbiased estimator for $\beta$.

Proof. Use assumptions (1) and (3) to write:
$$\hat{\beta} = \left(X_t'X_t\right)^{-1}X_t'y_t = \left(X_t'X_t\right)^{-1}X_t'\left(X_t\beta + \varepsilon_t\right) = \left(X_t'X_t\right)^{-1}\left(X_t'X_t\right)\beta + \left(X_t'X_t\right)^{-1}X_t'\varepsilon_t = \beta + \left(X_t'X_t\right)^{-1}X_t'\varepsilon_t, \quad (12)$$
where we use the fact that $(X_t'X_t)^{-1}(X_t'X_t) = I_k$. Taking the expectation conditional on $X_t$ gives:
$$E\left(\hat{\beta}\,|\,X_t\right) = E\left[\beta + \left(X_t'X_t\right)^{-1}X_t'\varepsilon_t\,\Big|\,X_t\right] = \beta + E\left[\left(X_t'X_t\right)^{-1}X_t'\varepsilon_t\,\Big|\,X_t\right] = \beta + \left(X_t'X_t\right)^{-1}X_t'\,E(\varepsilon_t\,|\,X_t) = \beta + \left(X_t'X_t\right)^{-1}X_t'\cdot 0 = \beta$$
because $E(\varepsilon_t\,|\,X_t) = 0$ under the CLR model's assumption (2). And invertibility of $X_t'X_t$ follows by the CLR model's assumption (5).
Theorem 2 Variance-covariance matrix of the OLS estimator. Under the CLR model's assumptions (1) through (5),
$$\mathrm{var}\left(\hat{\beta}\,|\,X_t\right) = \sigma_\varepsilon^2\left(X_t'X_t\right)^{-1}. \quad (13)$$

Proof. From equation (12) we have:
$$\mathrm{var}\left(\hat{\beta}\,|\,X_t\right) = E\left[\left(\hat{\beta} - E\left(\hat{\beta}\,|\,X_t\right)\right)\left(\hat{\beta} - E\left(\hat{\beta}\,|\,X_t\right)\right)'\,\Big|\,X_t\right]$$
$$= E\left[\left(\hat{\beta} - \beta\right)\left(\hat{\beta} - \beta\right)'\,\Big|\,X_t\right] \quad \text{(by Theorem 1)}$$
$$= E\left[\left(X_t'X_t\right)^{-1}X_t'\varepsilon_t\left(\left(X_t'X_t\right)^{-1}X_t'\varepsilon_t\right)'\,\Big|\,X_t\right] \quad \text{(by substituting in a rearrangement of equation (12))}$$
$$= E\left[\left(X_t'X_t\right)^{-1}X_t'\varepsilon_t\varepsilon_t'X_t\left[\left(X_t'X_t\right)^{-1}\right]'\,\Big|\,X_t\right] \quad \text{(by transpose properties)}$$
$$= E\left[\left(X_t'X_t\right)^{-1}X_t'\varepsilon_t\varepsilon_t'X_t\left(X_t'X_t\right)^{-1}\,\Big|\,X_t\right] \quad \text{(since } X_t'X_t \text{ is symmetric)}$$
$$= \left(X_t'X_t\right)^{-1}X_t'\,E\left(\varepsilon_t\varepsilon_t'\,|\,X_t\right)X_t\left(X_t'X_t\right)^{-1} = \left(X_t'X_t\right)^{-1}X_t'\,\mathrm{var}\left(\varepsilon_t\,|\,X_t\right)X_t\left(X_t'X_t\right)^{-1}.$$
Now, use the CLR model's assumption (3) to obtain:
$$\mathrm{var}\left(\hat{\beta}\,|\,X_t\right) = \left(X_t'X_t\right)^{-1}X_t'\,\sigma_\varepsilon^2 I_n\,X_t\left(X_t'X_t\right)^{-1} = \sigma_\varepsilon^2\left(X_t'X_t\right)^{-1}X_t'X_t\left(X_t'X_t\right)^{-1} = \sigma_\varepsilon^2\left(X_t'X_t\right)^{-1}.$$

• Equation (13) means that the variance of $\hat{\beta}_j$ (conditional on $X_t$) is obtained by multiplying $\sigma_\varepsilon^2$ by the jth diagonal element of $(X_t'X_t)^{-1}$. This equation also tells us how to obtain the covariance between any two OLS estimates: multiply $\sigma_\varepsilon^2$ by the appropriate off-diagonal element.
Theorem 3 Gauss-Markov. Under the CLR model's assumptions (1) through (5), $\hat{\beta}$ is the best linear unbiased estimator (BLUE).

Proof. Any other linear estimator of $\beta$ can be written as:
$$\tilde{\beta} = A_t'y_t, \quad (14)$$
where $A_t$ is an $n \times k$ matrix. In order for $\tilde{\beta}$ to be unbiased conditional on $X_t$, $A_t$ can consist of nonrandom numbers and functions of $X_t$. (For example, $A_t$ cannot be a function of $y_t$.) To see what further restrictions on $A_t$ are needed, write:
$$\tilde{\beta} = A_t'\left(X_t\beta + \varepsilon_t\right) = \left(A_t'X_t\right)\beta + A_t'\varepsilon_t. \quad (15)$$
Then,
$$E\left(\tilde{\beta}\,|\,X_t\right) = E\left[\left(A_t'X_t\right)\beta + A_t'\varepsilon_t\,|\,X_t\right] = E\left[\left(A_t'X_t\right)\beta\,|\,X_t\right] + E\left(A_t'\varepsilon_t\,|\,X_t\right) = \left(A_t'X_t\right)\beta + A_t'E\left(\varepsilon_t\,|\,X_t\right) \quad \text{(since } A_t \text{ is a function of } X_t)$$
$$= \left(A_t'X_t\right)\beta \quad \text{(since } E(\varepsilon_t\,|\,X_t) \text{ equals } 0).$$
For $\tilde{\beta}$ to be an unbiased estimator of $\beta$ it must be true that $E(\tilde{\beta}\,|\,X_t) = \beta$ for all $k \times 1$ vectors $\beta$, that is,
$$\left(A_t'X_t\right)\beta = \beta \quad (16)$$
for all $k \times 1$ vectors $\beta$. Because $A_t'X_t$ is a $k \times k$ matrix, equation (16) holds if and only if $A_t'X_t = I_k$. Equations (14) and (16) characterize the class of linear, unbiased estimators for $\beta$. Next, from equation (15) we have:
$$\mathrm{var}\left(\tilde{\beta}\,|\,X_t\right) = A_t'\left[\mathrm{var}\left(\varepsilon_t\,|\,X_t\right)\right]A_t = \sigma_\varepsilon^2 A_t'A_t$$
by the CLR model's assumption (3). Therefore,
$$\mathrm{var}\left(\tilde{\beta}\,|\,X_t\right) - \mathrm{var}\left(\hat{\beta}\,|\,X_t\right) = \sigma_\varepsilon^2\left[A_t'A_t - \left(X_t'X_t\right)^{-1}\right]$$
$$= \sigma_\varepsilon^2\left[A_t'A_t - A_t'X_t\left(X_t'X_t\right)^{-1}X_t'A_t\right] \quad \text{(because } A_t'X_t \text{ equals } I_k)$$
$$= \sigma_\varepsilon^2\left[A_t' - A_t'X_t\left(X_t'X_t\right)^{-1}X_t'\right]A_t$$
$$= \sigma_\varepsilon^2 A_t'\left[I_n - X_t\left(X_t'X_t\right)^{-1}X_t'\right]A_t$$
$$= \sigma_\varepsilon^2 A_t'M_tA_t,$$
where $M_t \equiv I_n - X_t\left(X_t'X_t\right)^{-1}X_t'$. Because $M_t$ is symmetric and idempotent, $A_t'M_tA_t$ is positive semi-definite for any $n \times k$ matrix $A_t$. ("Positive semi-definite" means that $z_t'\left(A_t'M_tA_t\right)z_t$ is greater than or equal to zero for every nonzero column vector $z_t$.) This establishes that the OLS estimator $\hat{\beta}$ is BLUE.
• It can be shown that an unbiased estimator of the error variance $\sigma_\varepsilon^2$ can be written as:
$$\hat{\sigma}_\varepsilon^2 = \frac{\hat{\varepsilon}_t'\hat{\varepsilon}_t}{n - k},$$
where we have labeled the explanatory variables so that there are k total parameters, including the intercept.

Theorem 4 Unbiasedness of $\hat{\sigma}_\varepsilon^2$. Under assumptions (1) through (5) of the CLR model, $\hat{\sigma}_\varepsilon^2$ is an unbiased estimator of $\sigma_\varepsilon^2$, so that $E\left(\hat{\sigma}_\varepsilon^2\,|\,X_t\right) = \sigma_\varepsilon^2$ for all $\sigma_\varepsilon^2 > 0$.

Proof. Write
$$\hat{\varepsilon}_t = y_t - X_t\hat{\beta} = y_t - X_t\left(X_t'X_t\right)^{-1}X_t'y_t = M_ty_t = M_t\varepsilon_t,$$
where $M_t \equiv I_n - X_t\left(X_t'X_t\right)^{-1}X_t'$, and the last equality follows because $M_tX_t = 0$. Because $M_t$ is symmetric and idempotent,
$$\hat{\varepsilon}_t'\hat{\varepsilon}_t = \varepsilon_t'M_t'M_t\varepsilon_t = \varepsilon_t'M_t\varepsilon_t.$$
Because $\varepsilon_t'M_t\varepsilon_t$ is a scalar, it equals its "trace." Therefore,
$$E\left(\varepsilon_t'M_t\varepsilon_t\,|\,X_t\right) = E\left[\mathrm{tr}\left(\varepsilon_t'M_t\varepsilon_t\right)|\,X_t\right] = E\left[\mathrm{tr}\left(M_t\varepsilon_t\varepsilon_t'\right)|\,X_t\right] = \mathrm{tr}\,E\left[\left(M_t\varepsilon_t\varepsilon_t'\right)|\,X_t\right] = \mathrm{tr}\left[M_t\,E\left(\varepsilon_t\varepsilon_t'\,|\,X_t\right)\right] = \mathrm{tr}\left(M_t\,\sigma_\varepsilon^2 I_n\right) = \sigma_\varepsilon^2\,\mathrm{tr}\left(M_t\right) = \sigma_\varepsilon^2\left(n - k\right),$$
where tr is the trace operator, which calculates the trace of its input. Note that the last equality above follows from:
$$\mathrm{tr}\left(M_t\right) = \mathrm{tr}\left(I_n\right) - \mathrm{tr}\left[X_t\left(X_t'X_t\right)^{-1}X_t'\right] = n - \mathrm{tr}\left[\left(X_t'X_t\right)^{-1}X_t'X_t\right] = n - \mathrm{tr}\left(I_k\right) = n - k.$$
Therefore,
$$E\left(\hat{\sigma}_\varepsilon^2\,|\,X_t\right) = \frac{E\left(\varepsilon_t'M_t\varepsilon_t\,|\,X_t\right)}{n - k} = \sigma_\varepsilon^2.$$
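The finite-sample claims in Theorems 1 and 4 can be illustrated by simulation. The sketch below (an illustrative addition with arbitrary parameter values) redraws the disturbances many times while holding X fixed, in the spirit of assumption 4, and reports the averages of β̂ and σ̂²ε across replications:

    import numpy as np

    rng = np.random.default_rng(7)
    n, k = 100, 3
    sigma_eps = 0.8
    beta = np.array([1.0, 2.0, -0.5])

    # Fixed regressors (redrawn samples keep the same X, per assumption 4).
    X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])

    beta_hats, sigma2_hats = [], []
    for _ in range(5000):
        eps = rng.normal(scale=sigma_eps, size=n)
        y = X @ beta + eps
        b = np.linalg.solve(X.T @ X, X.T @ y)
        e = y - X @ b
        beta_hats.append(b)
        sigma2_hats.append(e @ e / (n - k))

    print(np.mean(beta_hats, axis=0))   # close to (1.0, 2.0, -0.5): unbiasedness of beta-hat
    print(np.mean(sigma2_hats))         # close to 0.64 = sigma_eps^2: unbiasedness of sigma2-hat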
6 A Primer on First-Order Difference Equations
• A difference equation expresses a relationship between a dependent variable and a lagged independent variable (or variables) that changes at discrete intervals of time. For example, $I_t = f(Y_{t-1})$, where I and Y are measured at the end of each year.

• The order of a difference equation is determined by the greatest number of periods lagged. As such: a first-order difference equation expresses a time lag of one period; a second-order, two periods; etc.

• The change in a variable y as t changes from time period t to time period t + 1 is called the first difference of y. It is written as:
$$\frac{\Delta y}{\Delta t} = \Delta y_t = y_{t+1} - y_t. \quad (17)$$

• One way to solve difference equations is called the iterative method. For instance, let
$$y_{t+1} = b\,y_t,$$
where b is a constant, and let the initial value of y be given by $y_0$ (a constant). Then,
$$y_{t+1} = b\,y_t \quad (18)$$
$$\rightarrow\; y_t = b\,y_{t-1}$$
$$\rightarrow\; y_{t-1} = b\,y_{t-2},$$
etc. Iterative backwards substitution implies that:
$$y_{t+1} = b\underbrace{\left(b\,y_{t-1}\right)}_{=\,y_t} = b^2 y_{t-1} = b^2\underbrace{\left(b\,y_{t-2}\right)}_{=\,y_{t-1}} = b^3 y_{t-2},$$
etc. Note that
$$y_{t+1} = b^3 y_{t-2} \;\rightarrow\; y_t = b^3 y_{t-3} \;\rightarrow\; y_t = b^s y_{t-s}.$$
So, when t = s we have:
$$y_t = b^t y_0,$$
which traces back the value of $y_t$ in any period t to its initial value weighted by a function of the constant b. Therefore, the constant b plays a critical role in determining the evolution of the variable y.
• Given a first-order difference equation which is linear (i.e., all the variables are raised to the first power and there are no cross products),
$$y_t = a + b\,y_{t-1}, \quad (19)$$
where a and b are constants, the "general formula for a definite solution" is:
$$y_t = \left(y_0 - \frac{a}{1-b}\right)b^t + \frac{a}{1-b} \quad (20)$$
when $b \neq 1$, and
$$y_t = y_0 + at \quad (21)$$
when b = 1. If no initial condition is given, an arbitrary constant A is used for $y_0 - \frac{a}{1-b}$ in equation (20) and for $y_0$ in equation (21). This situation is called the "general solution."

• Therefore, equation (20) can be expressed in the general form
$$y_t = c + Ab^t, \quad (22)$$
where $A = y_0 - \frac{a}{1-b}$ and $c = \frac{a}{1-b}$. Here, $Ab^t$ is called the complementary function and c is called the particular solution. The particular solution expresses the intertemporal equilibrium level of y, while the complementary function represents the deviations from that equilibrium.
• Equation (22) will be dynamically stable, therefore, only if the complementary function $Ab^t \rightarrow 0$ (in this case the symbol $\rightarrow$ is used to denote "tends to" in the mathematical sense of limits) as $t \rightarrow \infty$. All depends on the base b. Assuming A = 1 and c = 0 for the moment, the exponential expression $b^t$ will generate 7 different time paths depending on the value of b, as illustrated further below. If $|b| > 1$ the time path will explode and move farther and farther away from equilibrium. If $|b| < 1$ the time path will be damped and move toward equilibrium. If b < 0, the time path will oscillate between positive and negative values. If b > 0 the time path will be nonoscillating. Note that if $A \neq 1$ the value of this multiplicative constant will scale up or down the magnitude of $b^t$, but will not change the basic pattern of movement. If A = −1 a mirror image of the time path of $b^t$ with respect to the horizontal axis will be produced. If $c \neq 0$ the vertical intercept of the graph is affected, and the graph shifts up or down accordingly. You should convince yourself that all of these scenarios are true by reproducing the graphs further below for different values of b (and A). In particular, because in the equation $y_t = b^t$, b can range from $-\infty$ to $+\infty$:
  – If b > 1, $b^t$ increases at an increasing rate as t increases, thus moving farther and farther away from the horizontal axis. This is illustrated in panel (a) of the figure below, which is a step function representing changes at discrete intervals of time, not a continuous function.
  – If b = 1, $b^t = 1$ for all values of t. This is represented by a horizontal line in panel (b) of the figure below.
  – If 0 < b < 1, then b is a positive fraction and $b^t$ decreases as t increases, drawing closer and closer to the horizontal axis but always remaining positive, as shown in panel (c) of the figure below.
  – If b = 0, then $b^t = 0$ for all values of t, as shown in panel (d) of the figure below.
  – If −1 < b < 0, then b is a negative fraction and $b^t$ will alternate in sign and draw closer and closer to the horizontal axis as t increases, as shown in panel (e) of the figure below.
  – If b = −1, then $b^t$ oscillates between +1 and −1, as shown in panel (f) of the figure below.
  – If b < −1, then $b^t$ will oscillate and move farther and farther away from the horizontal axis, as shown in panel (g) of the figure below.

• In short: if $|b| > 1$ the time path explodes; if $|b| < 1$ the time path converges; if b > 0 the time path is nonoscillating; and if b < 0 the time path oscillates.
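The seven qualitative time paths can be reproduced numerically. Here is a small illustrative script (added to these notes; the values of a, b, and y0 are examples) that iterates y_t = a + b·y_{t−1} and prints the resulting paths so that the convergent, explosive, oscillating, and nonoscillating cases can be compared:

    import numpy as np

    def simulate(a, b, y0, periods=10):
        """Iterate the first-order difference equation y_t = a + b * y_{t-1}."""
        path = [y0]
        for _ in range(periods):
            path.append(a + b * path[-1])
        return np.array(path)

    for b in [2.0, 1.0, 0.5, 0.0, -0.5, -1.0, -2.0]:   # one value per qualitative case
        path = simulate(a=0.0, b=b, y0=1.0)
        print(f"b = {b:+.1f}: {np.round(path, 3)}")

    # With a = 0 and y0 = 1 the path is just b^t: it explodes for |b| > 1, converges to 0
    # for |b| < 1, and oscillates in sign whenever b < 0, matching the panels described above.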