Category Archives: Notes

Errata: Zariski – Samuel

$\newcommand{\aaa}{\mathfrak{a}} \newcommand{\mmm}{\mathfrak{m}} \newcommand{\qqq}{\mathfrak{q}} $

  1. Volume 2, Chapter VIII, Section 10 (Theory of Multiplicities), Proof of Theorem 22: in the case of zero-divisors in dimension 1, the setup is this:
    • $A$ is a local ring with maximal ideal $\mmm$ such that $A/\mmm$ is an infinite field,
    • $\qqq$ is an ideal of $A$ with $\sqrt{\qqq} = \mmm$,
    • $x \in A$ is outside of all isolated prime ideals of $0$,
    • $x$ is also superficial of order $1$ for $\qqq$, i.e. there is a nonnegative integer $c$ such that $(\qqq^n: Ax) \cap \qqq^c = \qqq^{n-1}$ for $n \gg 1$,
    • $\dim(A) = 1$,
    • $\mmm$ is an embedded prime ideal of $0$, i.e. all elements of $\mmm$ are zero divisors.
      With this, they define $\aaa := (0: Ax)$ and at some point claim that $x$ is a non zero-divisor in $A/\aaa$. This is false, as seen from the following example: $A := k[[x, y]]/\langle x^2y, y^2 \rangle$, where $k$ is an infinite field, and $\qqq = \mmm = \langle x, y \rangle$. Then every element of $A$ can be represented as $f(x) + ay + bxy$ with $f \in k[[x]]$, $a, b \in k$. It is straightforward to check that $\langle x^2y, y^2 \rangle = \langle y \rangle \cap \langle x^2, y^2 \rangle$ induces a primary decomposition of the zero ideal in $A$, and $x$ satisfies all the above properties. However, in this case $\aaa = Axy$, and $x$ is a zero-divisor in $A/\aaa$, since $xy = 0$ in $A/\aaa$ while $y \neq 0$ in $A/\aaa$. This gives the required counterexample. (The proof can be salvaged by defining $\aaa$ as $\bigcup_n (0: Ax^n)$.)
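The counterexample can be checked mechanically. The following is a minimal sketch, assuming $k = \mathbb{Q}$ and truncating the power series part $f(x)$ modulo $x^N$; the class `Elt`, the constant `N` and the method names are hypothetical and chosen only for illustration. The truncation does not affect the three products tested here, but it would distort statements about high powers of $x$.

```python
from fractions import Fraction

N = 6  # truncation degree for f(x); an assumption of this sketch (needs N >= 2)

class Elt:
    """f(x) + a*y + b*x*y in A = k[[x,y]]/<x^2 y, y^2>, with f kept modulo x^N."""

    def __init__(self, f=(), a=0, b=0):
        coeffs = [Fraction(c) for c in f][:N]
        self.f = coeffs + [Fraction(0)] * (N - len(coeffs))
        self.a, self.b = Fraction(a), Fraction(b)

    def __mul__(self, other):
        # Product of the pure power series parts, truncated modulo x^N.
        f = [Fraction(0)] * N
        for i, ci in enumerate(self.f):
            for j, cj in enumerate(other.f):
                if i + j < N:
                    f[i + j] += ci * cj
        # Only constant terms reach the y-coefficient, because y^2 = 0.
        a = self.f[0] * other.a + self.a * other.f[0]
        # Constant and linear terms reach the xy-coefficient, because x^2 y = 0.
        b = (self.f[0] * other.b + self.b * other.f[0]
             + self.f[1] * other.a + self.a * other.f[1])
        return Elt(f, a, b)

    def is_zero(self):
        return not any(self.f) and self.a == 0 and self.b == 0

    def in_ideal_xy(self):
        """Membership in the ideal A*xy, which consists of the multiples b*xy."""
        return not any(self.f) and self.a == 0

x = Elt([0, 1])
y = Elt([], a=1)
xy = x * y

x_kills_xy = (x * xy).is_zero()            # xy lies in (0 : Ax)
y_outside = not y.in_ideal_xy()            # y is nonzero modulo A*xy
product_inside = xy.in_ideal_xy()          # but x*y vanishes modulo A*xy
y_killed_by_x2 = (x * x * y).is_zero()     # y lies in (0 : Ax^2)
```

The last check illustrates why the salvaged definition behaves differently: $y$ is annihilated by $x^2$, so it already lies in $\bigcup_n (0: Ax^n)$.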

BKK bound – an “elementary proof”

Some comments of Jonathan Korman made me realize that it would have been good to include in How Many Zeroes? an “elementary” proof of the Bernstein-Kushnirenko formula via Hilbert polynomials. This approach yields only a “weak” version of the bound, since it applies only when the number of solutions is finite. Nevertheless, it is a rewarding approach for two reasons. First, it shows how useful it can be to interpret polynomials, or more generally regular functions on a variety, as linear sections after an appropriate embedding, so that the number of solutions of a system of polynomials becomes the degree of a variety. Second, it shows how the geometric concept of the degree of a variety can be read off from its Hilbert polynomial, an extremely fruitful algebraic tool discovered by Hilbert which by now occupies a central role in algebraic geometry.

The following posts describe this approach:

  1. The first post introduces the degree of a projective variety.
  2. The second post describes the connection between the degree of a projective variety and its Hilbert polynomial.
  3. The third post proves the weak version of Kushnirenko’s formula using Hilbert polynomials, and sketches the derivation of Bernstein’s and Bézout’s formulae.

Notes (part 2): S. Bosch, U. Güntzer and R. Remmert, Non-Archimedean Analysis

\(
\newcommand{\ff}{\mathbb{F}} \DeclareMathOperator{\ord}{ord} \newcommand{\qq}{\mathbb{Q}}
\newcommand{\rr}{\mathbb{R}} \newcommand{\rnneg}{\rr_{\geq 0}} \newcommand{\rpos}{\rr_{> 0}}
\newcommand{\zz}{\mathbb{Z}} \newcommand{\znneg}{\zz_{\geq 0}}
\)Given a normed (abelian) group \(A,\) the following are some important subsets of \(\prod_{i=1}^\infty A\):

  • \(A^{(\infty)} := \bigoplus_{i=1}^\infty A\)
  • the set of zero sequences, \(c(A) := \{(c_i): \lim_{i \to \infty} |c_i| = 0\}\)
  • the set of bounded sequences, \(b(A) := \{(b_i)_i: \sup_i |b_i| < \infty\}\)

It is clear that \(A^{(\infty)} \subseteq c(A) \subseteq b(A).\) If \(A\) is a ring/algebra over a field/module over a ring, each of these sets inherits the structure from \(A.\) We would always consider each of these sets equipped with the component-wise supremum norm:
\[|(b_i)_i| := \sup_i |b_i|\]
If \(A\) is a normed ring (respectively, field), this makes these sets normed \(A\)-modules (respectively, vector spaces). If \(A\) is complete, then so are \(b(A)\) and \(c(A),\) but not \(A^{(\infty)}.\)

Examples/Facts:

  1. Consider a valued field \(K\) equipped with a nontrivial valuation.
    1. \(c(K)\) is a \(K\)-vector space of countable type, i.e. it has a dense linear subspace of at most countable dimension (e.g. \(K^{(\infty)}\)). It follows that any weakly \(K\)-cartesian family in \(c(K)\) is at most countable. (For each element in such a family, replace it by an element of \(K^{(\infty)}\) sufficiently close to it, then the resulting system would be a weakly cartesian subset of \(K^{(\infty)}.\) Since weak cartesianity implies linear independence, and since \(K^{(\infty)}\) has countable dimension, it follows that the original family must be countable.)
    2. \(b(K)\) has an uncountable orthonormal family. (The natural map
      \[(b(K))^\sim \to \prod_{i=1}^\infty K^\sim\]
      is surjective and the latter space has uncountable dimension over \(K^\sim.\) Now pull back an uncountable orthonormal system from the image.)
    3. It follows that there is no \(K\)-linear map \(b(K) \to c(K)\) which is a homeomorphism.
    4. On the other hand given any \(a \in K\) with \(0 < |a| < 1,\) the mapping
      \[(b_i)_i \mapsto (a^ib_i)_i\]
      defines a continuous injective \(K\)-linear map from \(b(K)\) to \(c(K).\)
  2. If a ring is complete, it does not necessarily mean that its field of fractions is complete.
    1. E.g. take \(A := k[[x, y]]\) with the usual (discrete) valuation corresponding to its maximal ideal. The valuation is also induced by the order, i.e.
      \[|f| := e^{-\ord(f)}\]
      Pick a power series \(f(t) \in k[[t]]\) which is not a rational function in \(t,\) i.e. which is not in the image of the natural map from \(k(t) \to k[[t]]\) (e.g. one can take the power series for \(e^t,\) at least if the characteristic of \(k\) is zero). Then \(f(y^2/x)\) is in the completion of the field of fractions of \(A,\) but it can not be represented as \(g/h\) for \(g, h \in A.\) [To see this consider the edge of \(f(y^2/x)h\) corresponding to the inner normal \((2, 1).\)]
    2. Let \(A := k \langle x \rangle\) be the ring of strictly convergent power series over a field \(k\) with a complete nontrivial valuation. Then \(A\) is complete with respect to the Gauss norm. The field of fractions of \(A\) is the field \(L := k \langle x \rangle [x^{-1}]\) of strictly convergent meromorphic series. Take \(c \in k\) such that \(0 < |c| < 1.\) Then the sequence \(\sum_{j=1}^i c^j x^{-j},\) \(i = 1, 2, \ldots,\) is Cauchy but without limit in \(L.\)
  3. The algebraic closure of a complete field does not have to be complete.
    1. Take the field \(k((t))\) of Laurent series, which is complete with respect to the valuation induced by \(\ord.\) Its algebraic closure is the field of “meromorphic” Puiseux series, which is not complete, since the Cauchy sequence
      \[f_n := \sum_{j=1}^n t^{1/j + j}\]
      does not converge.
    2. The algebraic closure of \(\qq_p\) is not complete. Both this and the preceding example are special cases of the following:
      Lemma [BGR 3.4.3/1]. Let \(K\) be a field with a complete nontrivial valuation and \(K_a\) be its algebraic closure. Assume \([K_a:K] = \infty.\) Then \(K_a\) is not complete.
      Proof. If the separable closure \(K_{sep}\) of \(K\) in \(K_a\) had finite degree over \(K,\) then it would be complete, and therefore, since \(K_{sep}\) is dense in \(K_a\) [BGR 3.4.1/6], it would follow that \(K_a = K_{sep}\) is also of finite degree over \(K,\) which is a contradiction. So \([K_{sep}: K] = \infty.\) Choose a sequence \(x_1 = 1, x_2, \ldots\) of elements in \(K_{sep}\) which are linearly independent over \(K\). Choose \(c_2, c_3, \ldots \in K \setminus \{0\}\) such that
      • \(\lim_i |c_ix_i| = 0\)
      • \(|c_{i+1}x_{i+1}| < \min\{|c_ix_i|, r(\sum_{j=2}^i c_jx_j)\}\)
        (where \(r(\alpha)\), for \(\alpha\) separable over \(K\) of degree > 1, is the minimum of \(|\alpha - \beta|\) over all roots \(\beta \neq \alpha\) of the minimal polynomial of \(\alpha\) over \(K.\)) Note that \(r(\sum_{j=2}^i c_jx_j)\) is well defined for \(i \geq 2,\) since \(\sum_{j=2}^i c_jx_j \in K_{sep} \setminus K,\) because all \(c_j\) are nonzero and \(1 = x_1, x_2, \ldots\) are linearly independent over \(K.\) Then if \(\sum_{j=2}^\infty c_jx_j\) converges to an element \(x \in K_a,\) then by assumption,
        \[|x - \sum_{j=2}^i c_jx_j| = |\sum_{j=i+1}^\infty c_jx_j| \leq |c_{i+1}x_{i+1}| < r(\sum_{j=2}^{i}c_jx_j)\]
        so that Krasner’s lemma implies that \(K(x) \supseteq K(\sum_{j=2}^i c_jx_j).\) It follows that \(K(x) \supseteq K(x_2, x_3, \ldots),\) so that \([K(x):K] = \infty,\) which is a contradiction, since \(x\) is algebraic over \(K.\) This completes the proof. In fact, if \(x\) is the element in the completion of \(K_a\) which represents \(\sum_{j=2}^\infty c_jx_j,\) then it follows (from the arguments in the proof of Krasner’s lemma [BGR 3.4.2/2]) that \(K(x)\) indeed contains \(K(x_2, x_3, \ldots),\) and in particular, \(x\) is transcendental over \(K.\)
  4. A (countable) basis of \(\tilde V\) over \(\tilde K\) may not lift to a Schauder basis of a vector space \(V\) over a (valued field) \(K.\) Let \(K\) be a field with a nontrivial non-discrete valuation. Choose a Schauder basis \(\{x_1, x_2, \ldots,\}\) of \(V := c(K)\) such that \(|x_i| = 1\) for all \(i.\) Choose a sequence \(\lambda_2, \lambda_3, \ldots\) in \(K\) such that \(0 < |\lambda_i| < 1\) and \(\lim_i \lambda_2 \lambda_3 \cdots \lambda_i \neq 0.\) Set
    \[y_i := x_i - \lambda_{i+1}x_{i+1}\]
    Then \(\tilde y_i = \tilde x_i\) for all \(i\), i.e. \(\tilde y_1, \tilde y_2, \ldots,\) constitute a \(\tilde K\)-basis of \(\tilde V.\) However, \(x_1\) can not be written as a convergent sum \(x_1 = \sum_{i=1}^\infty a_i y_i\) because then
    \[
    x_1
    = \sum_{i=1}^\infty a_i(x_i - \lambda_{i+1}x_{i+1})
    = a_1x_1 + \sum_{i=2}^\infty (a_i - a_{i-1}\lambda_i)x_i
    \]
    which means \(a_1 = 1\) and \(a_i = a_{i-1}\lambda_i = \lambda_2 \lambda_3 \cdots \lambda_i\) for \(i \geq 2.\) It follows that \(\lim_i |a_iy_i| = \lim_i|a_i| \neq 0.\)
  5. A complete non perfect field. The field \(k((t))\) of Laurent series is complete with respect to the valuation induced by order in \(t.\) However, \(k((t))\) does not contain a \(p\)-th root of \(t\) if the characteristic \(p\) of \(k\) is positive.
  6. There are valued fields \(K\) admitting algebraic extensions \(L \neq K\) such that \(K\) is dense in \(L\) with respect to the spectral norm. In particular, \(L\) is not weakly cartesian over \(K.\) In addition, there are such examples with \(L\) purely inseparable over \(K\), in which case the spectral norm on \(L\) is a valuation.
    1. E.g. one can take \(K\) to be the separable closure of a complete non perfect field \(k\) and \(L\) to be the algebraic closure of \(k.\) \(K\) is dense in \(L\) since there are separable elements arbitrarily close to every element in \(L.\)
    2. For a slightly more explicit example, start with fields \(k\) of characteristic \(p > 0\) with a nontrivial valuation such that the field \(k^{p^{-1}}\) of \(p\)-th roots over \(k\) is of infinite degree over \(k.\) E.g. one can take for \(k\) the field of fractions (or, if completeness is required, the completion of the field of fractions) of the formal power series ring in infinitely many variables over the finite field of \(p\) elements. Let \(L\) be the field of fractions of the ring \(k^{p^{-1}}\langle t \rangle\) of strictly convergent power series over \(k^{p^{-1}}\) equipped with the Gauss norm. Let \(A\) be the set of all series \(\sum_i a_i t^i \in k^{p^{-1}}\langle t \rangle\) such that the coefficients \(a_i\) generate a finite extension of \(k.\) The field \(K\) of fractions of \(A\) is dense in \(L\) (since e.g. it contains all polynomials over \(k^{p^{-1}}\)). Moreover, since \(L \subseteq K^{p^{-1}},\) it follows that \(L\) is algebraic over \(K\) and the Gauss norm on \(L\) is identical to the spectral norm of \(L\) with respect to the Gauss norm on \(K.\) To see that \(L \neq K,\) choose a sequence \(h_i \in k^{p^{-1}}\) such that \(\lim_i |h_i| = 0\) and \([k(h_1, h_2, \ldots, ): k] = \infty.\) Then \(\sum_i h_it^i \in L \setminus K.\)
    3. For an even more explicit example with discrete valuation and finite extension, start with a field \(k\) of characteristic \(p > 0\) such that there is \(f(t) \in k[[t]]\) which is not algebraic over \(k(t)\) (e.g. one can take \(k\) and \(f\) as in Schmidt’s example in the preceding post). Let \(K := k(x,y^p) \subset L := k(x,y).\) As in that example, equip \(L\) with the norm \(|\ |_f\) induced by the valuation on \(k((t))\) via (the pullback of) the homomorphism \(k[x,y] \to k[[t]]\) given by \(x \mapsto t,\) \(y \mapsto f(t).\) Since \(g^p \in K\) for each \(g \in L,\) it follows that the norm on \(L\) is identical to the spectral norm of \(L\) with respect to the norm on \(K.\) Writing \(f(t) := \sum_{i \geq 0} a_it^i,\) one gets that \(y - \sum_{j=0}^i a_jx^j \to 0\) as \(i \to \infty,\) so that \(K\) is dense in \(L.\)
  7. The above phenomenon is impossible if \(L/K\) is separable or \(K\) is perfect, since the algebraic closure of a perfect field \(K\) is weakly \(K\)-cartesian [BGR 3.5.1/4].
  8. Every perfect or complete field \(K\) is weakly stable, i.e. each finite extension equipped with the spectral norm is weakly \(K\)-cartesian.
  9. The preceding observation suggests that the weak stability of a field \(K\) boils down to whether \(K_\infty := \bigcup_{n \geq 0} K_n\) is weakly cartesian over \(K\), where
    \[K_n := \{x \in \bar K: x^{p^n} \in K\}\]
    (where \(\bar K\) is the algebraic closure of \(K\)), since \(K_\infty,\) being perfect, is weakly stable (i.e. \(\bar K\) is weakly cartesian over \(K_\infty\)). Whether \(K_\infty\) is weakly \(K\)-cartesian depends on whether each \(K_n\) is weakly \(K\)-cartesian, which in turn (via the Frobenius homomorphism \(x \mapsto x^{p^{n-1}}\)) depends on whether \(K_1 = K^{p^{-1}}\) is weakly \(K\)-cartesian. This is the content of [BGR 3.5.3/1].
  10. The field of rational functions over a field in finitely many variables is weakly stable with respect to the valuation induced by degree or order [BGR 3.5.3/4] (this essentially follows from the previous observation together with some computation).
  11. A characteristic zero (in particular, separable) complete field \(K\) and a finite algebraic extension \(L/K\) which is weakly \(K\)-cartesian, but not \(K\)-cartesian; in other words, weak stability \(\not \Rightarrow\) stability. Start with the field \(\qq_2\) of \(2\)-adic numbers. Let \(\alpha_0 := 2, \alpha_1, \alpha_2, \ldots \) be elements algebraic over \(\qq_2\) such that \(\alpha_i^2 = \alpha_{i-1}\) and \(k_i := \qq_2(\alpha_i) = \qq_2(\alpha_0, \ldots, \alpha_i).\) Let \(K\) be the completion of the field \(\bigcup_i k_i\) and \(L := K(\sqrt 3).\)
    Lemma [BGR 3.6.1]. \([L:K] = 2,\) \(|L| = |K|,\) and \(L^\sim = K^\sim = \ff_2,\) the finite field of two elements. In particular, \(L\) is not \(K\)-cartesian.
    The proof is mainly by computation. I tried to solve for \(x^2 = 3\) over \(K.\) It seems that one can write \(\sqrt 3\) as an infinite sum of the form \(\sum_i c_i 2^{\beta_i}\) with \(|c_i| = 1\) and rational exponents \(0 = \beta_0 < \beta_1 < \beta_2 < \cdots\) such that \(\lim_i \beta_i = 1.\) This suggests that \(|\sqrt 3, K| = 2^{-1}\) but there is no \(a \in K\) such that \(|\sqrt 3 - a| = 2^{-1}.\) In particular, \(K\) is not strictly closed in \(L.\)
  12. The above phenomenon is impossible in the following cases:
    1. If the valuation on \(K\) is discrete, since if \(K\) is discretely valued and complete, then every finite dimensional normed vector space \(V\) over \(K\) is \(K\)-cartesian. (The completeness of \(K\) implies that \(V\) is weakly \(K\)-cartesian (so that every \(K\)-subspace of \(V\) is closed) and then the discreteness of the valuation on \(K\) implies that every closed \(K\)-subspace of \(V\) is strictly closed, so that \(V\) is \(K\)-cartesian.)
    2. If \(K^\sim\) has zero characteristic, since in that case \(K\) is stable [BGR 3.6.2/13]. (The main reason – which follows from Zariski-Samuel Vol 2, Chapter VI, Sections 11 (remark following the proof of Theorem 19) and 12 (corollary to Theorem 24) – is that in that case \(\sum_i e_if_i = n\) for all finite extensions of \(K\).)
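
The map \((b_i)_i \mapsto (a^ib_i)_i\) from item 1.4 of the list above can be tried out numerically. The sketch below assumes \(K = \qq\) with the \(p\)-adic norm (a valued field with nontrivial valuation, though not complete), takes \(a = p\), and uses the bounded sequence \(b_i = i\); the helper `norm` and all variable names are hypothetical. Only finitely many terms are inspected, so this illustrates the estimate \(|a^ib_i| \leq |a|^i \sup_j |b_j|\) rather than proving anything.

```python
from fractions import Fraction

def norm(x, p):
    """p-adic norm of a rational number, returned as an exact Fraction."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    n, d, v = x.numerator, x.denominator, 0
    while n % p == 0:
        n, v = n // p, v + 1
    while d % p == 0:
        d, v = d // p, v - 1
    return Fraction(p) ** (-v)

p = 3
a = Fraction(p)                               # 0 < |a| = 1/3 < 1
b = [Fraction(i) for i in range(1, 13)]       # b_i = i for i = 1, ..., 12
image = [a**i * b_i for i, b_i in enumerate(b, start=1)]

b_norms = [norm(c, p) for c in b]             # bounded by 1, but not tending to 0
image_norms = [norm(c, p) for c in image]     # bounded by 3^(-i)
```

In this sample the norms \(|b_i|\) keep returning to \(1\) (whenever \(p \nmid i\)), while the norms of the image sequence are squeezed below \(p^{-i}\).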

Notes (part 1): S. Bosch, U. Güntzer and R. Remmert, Non-Archimedean Analysis

Semi-normed and normed groups

\(
\newcommand{\mmm}{\mathfrak{m}} \newcommand{\qqq}{\mathfrak{q}} \newcommand{\rr}{\mathbb{R}} \newcommand{\rnneg}{\rr_{\geq 0}} \newcommand{\rpos}{\rr_{> 0}}
\newcommand{\zz}{\mathbb{Z}} \newcommand{\znneg}{\zz_{\geq 0}}
\)All groups are going to be abelian, with the group operation denoted by “+”. A filtration, a generalization of valuation, is a function \(\nu\) from a group \(G\) to \(\rr \cup \{\infty\}\) such that

  • \(\nu(0) = \infty\)
  • \(\nu(x-y) \geq \min\{\nu(x), \nu(y)\}\)

Filtrations are in one-to-one correspondence with ultrametric functions or semi-norms, i.e. functions \(|\cdot|: G \to \rnneg\) such that

  • \(|0| = 0\)
  • \(|x-y| \leq \max\{|x|, |y|\}\)

A semi-norm \(|\ |\) is called a norm if \(\ker(|\ |) = 0.\) The correspondence between filtrations and semi-norms is given by

  • \(|\cdot| = \alpha^{-\nu(\cdot)}\)
  • \(\nu(\cdot) = -\log_\alpha|\cdot|\)

where \(\alpha\) is any fixed real number greater than \(1.\) There are natural subgroups \(
\begin{align*}
G^0 & := \{g: \nu(g) \geq 0\} = \{g: |g| \leq 1\} \\
G^{\vee} &:= \{g: \nu(g) > 0\} = \{g: |g| < 1\}
\end{align*}
\)
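
As a concrete instance of the correspondence between filtrations and semi-norms, the following sketch takes \(G = \mathbb{Q}\) with the \(p\)-adic valuation as the filtration and \(\alpha = p.\) The choice of example and the helper names `vp` and `norm` are assumptions made for illustration. The loop checks both defining inequalities on a small sample of rationals.

```python
import math
from fractions import Fraction

def vp(x, p):
    """p-adic valuation of a rational number; math.inf for x == 0."""
    x = Fraction(x)
    if x == 0:
        return math.inf
    n, d, v = x.numerator, x.denominator, 0
    while n % p == 0:
        n, v = n // p, v + 1
    while d % p == 0:
        d, v = d // p, v - 1
    return v

def norm(x, p):
    """Semi-norm attached to vp with alpha = p, i.e. |x| = p^(-vp(x))."""
    v = vp(x, p)
    return Fraction(0) if v == math.inf else Fraction(p) ** (-v)

p = 3
samples = [Fraction(a, b) for a in range(-6, 7) for b in (1, 2, 3, 9)]

# nu(x - y) >= min(nu(x), nu(y)) and |x - y| <= max(|x|, |y|) on the sample.
filtration_ok = all(vp(x - y, p) >= min(vp(x, p), vp(y, p))
                    for x in samples for y in samples)
seminorm_ok = all(norm(x - y, p) <= max(norm(x, p), norm(y, p))
                  for x in samples for y in samples)
```

Here \(G^0\) is the set of rationals whose reduced denominator is prime to \(p,\) and \(G^\vee\) consists of those whose reduced numerator is in addition divisible by \(p.\)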

Elementary properties of a semi-norm \(|\cdot|\):

  • \(|-y| = |y|\)
  • \(|x+y| \leq \max\{|x|, |y|\}\)
  • \(|x+y| = \max\{|x|, |y|\}\) if \(|x| \neq |y|\)

A semi-norm \(|\ |\) defines a pseudometric topology on \(G\) via the distance function
\[d(x,y) := |x-y|\]
This makes \(G\) into a topological group, i.e. the group operation \((x,y) \mapsto x - y\) is continuous.

\(G\) is Hausdorff if and only if \(|\ |\) is a norm.

Proposition 1. Every open subgroup of a topological group is also closed.

Proof. For any \(g \in G\), the map \(x \mapsto x + g\) is continuous and has a continuous inverse, so that it is a homeomorphism. Now if \(H\) is a subgroup of \(G\) and \(g \not\in H,\) then \(g + H \subseteq G \setminus H.\) If \(H\) is open, then it follows that \(g + H\) is an open neighborhood of \(g\) completely contained in the complement of \(H,\) so that \(H\) is also closed.

Given a point \(a\) on a semi-normed group \(G\) and \(r > 0,\) let
\(
\begin{align*}
B^0(a,r) & := \{x \in G: |x - a| \leq r\} \\
B^{\vee} (a,r) & := \{x \in G: |x - a| < r\}
\end{align*}
\)
respectively be the closed and the open ball of radius \(r\) centered at \(a.\) Note that

  • \(B^\vee(a,r)\) is open and \(B^0(a,r)\) is closed in \(G\) by definition of the topology on \(G,\)
  • \(B^0(a,r) = a + G^0(r)\) and \(B^{\vee}(a,r) = a + G^{\vee}(r)\) where
    \(
    \begin{align*}
    G^0(r) & := \{x \in G: |x| \leq r\} \\
    G^{\vee}(r) & := \{x \in G: |x| < r\}
    \end{align*}
    \)
  • The ultrametric inequality implies that
    • \(x + G^\vee(r) = G^\vee(r)\) for each \(x \in G^\vee(r)\)
    • \(x + G^\vee(r) \subseteq x + G^0(r) = G^0(r)\) for each \(x \in G^0(r)\)

These observations immediately imply the following results:

Proposition 2. Every ball \(B^0(a,r)\) and \(B^\vee(a,r)\) of positive radius \(r\) is both open and closed in \(G.\) Every sphere \(S(a,r) := \{x \in G: |x-a| = r\}\) of positive radius \(r\) is both open and closed in \(G.\)
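
The identity \(x + G^0(r) = G^0(r)\) for \(x \in G^0(r)\) says that every point of a ball is a center of that ball. The sketch below illustrates this on a finite window of integers with the \(3\)-adic norm; the window, the radius and the helper names `norm` and `closed_ball` are assumptions made for illustration.

```python
from fractions import Fraction

def norm(x, p):
    """p-adic norm of a rational number, returned as an exact Fraction."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    n, d, v = x.numerator, x.denominator, 0
    while n % p == 0:
        n, v = n // p, v + 1
    while d % p == 0:
        d, v = d // p, v - 1
    return Fraction(p) ** (-v)

def closed_ball(a, r, universe, p):
    """The points x of the finite set `universe` with |x - a| <= r."""
    return {x for x in universe if norm(x - a, p) <= r}

p = 3
universe = range(-30, 31)
r = Fraction(1, 3)
ball = closed_ball(1, r, universe, p)

# Re-centering the ball at any of its points gives back the same set.
every_point_is_center = all(closed_ball(b, r, universe, p) == ball for b in ball)
```

In this window the ball of radius \(1/3\) around \(1\) is exactly the residue class of \(1\) modulo \(3.\)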

Corollary 3. Every normed group \(G\) is totally disconnected (i.e. the connected components are precisely the singletons) in the ultrametric topology.

Given a subgroup \(H\) of a semi-normed group \(G\) and \(a \in G,\) the distance from \(a\) to \(H\) is by definition:
\[ |a, H| := \inf_{y \in H} |a + y| \]
The following observation seems to be important:

Proposition 3 (Section 1.1.4, Proposition 2). Let \(G\) be a normed group and \(H\) be a subgroup of \(G\) which is “\(\epsilon\)-dense” for some nonnegative real number \(\epsilon < 1\) in the sense that for all \(g \in G\) there is \(h \in H\) such that \(|g+h| \leq \epsilon|g|.\) Then \(H\) is dense in \(G.\)

Proof. Since \(G\) is normed, it suffices to show that \(|x,H| = 0\) for all \(x \in G.\) If \(\epsilon = 0\) it is obvious. So assume \(\epsilon > 0\) and, in order to proceed by contradiction, that \(|x,H| > 0.\) Since \(\epsilon < 1,\) there is \(h \in H\) such that \(|x+h| < \epsilon^{-1}|x,H|.\) Then by the \(\epsilon\)-density, there is \(h' \in H\) such that
\[ |x+h+h'| \leq \epsilon|x+h| < |x,H|\]
which gives the required contradiction and proves the proposition.

A subgroup \(H\) of a normed group \(G\) is called strictly closed if for each \(a \in G\) there is \(y \in H\) such that \(|a,H| = |a + y|.\) Since in a normed space a point is in the closure of a subset if and only if its distance from the subset is zero, it follows that a strictly closed subgroup is also closed. The ultrametric inequality implies that the ball groups \(G^0(r)\) and \(G^\vee(r)\) are strictly closed: we can take \(y = -a\) if \(a \in H\) and \(y = 0\) if \(a \not\in H\) (where \(H\) is either \(G^0(r)\) or \(G^\vee(r)\)).

Proposition 4 (Section 1.1.5, Proposition 4). Let \(G\) be a normed group and \(H\) be a closed subgroup of \(G\) such that \(|H \setminus \{0\}|\) is discrete in \(\rpos.\) Then \(H\) is strictly closed.

Proof. Given \(x \in G,\) we need to produce \(y \in H\) such that \(|x+y| = |x,H|.\) We may assume \(x \not \in H.\) Since \(H\) is closed, it follows that \(\epsilon := |x,H| > 0.\) Pick \(\delta > 0\). It suffices to show that \(V := \{|x+y'|: y' \in H,\ \epsilon <|x+y'| < \epsilon + \delta\}\) is finite. Indeed, if \(y'\) is as above, then since \(|x+y'| > |x,H|,\) there is \(y'' \in H\) such that \(|x+y'| > |x+y''|.\) But then \(|y'-y''| = |x+y'|,\) so that \(V \subseteq |H \setminus \{0\}|.\) The discreteness assumption on \(|H \setminus \{0\}|\) then implies that \(V\) is finite, as required.

The functor \(G \to G^\sim\): \(G^\sim := G^0/G^\vee.\)

Convergence: in a complete normed group a sum \(\sum_{k \geq 0} a_k\) turns out to be convergent if and only if \(|a_k| \to 0\) as \(k \to \infty.\) It then turns out that if \(\sum_k a_k\) converges, then \(\sum_k a_{\pi_k}\) converges to the same limit for every bijection \(\pi: \znneg \to \znneg.\)
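
A standard illustration of this convergence criterion is the geometric series \(\sum_{k \geq 0} p^k\) in the \(p\)-adic norm, whose terms tend to zero and whose sum is \(1/(1-p).\) The sketch below works inside \(\mathbb{Q}\) (which is not complete, but the limit happens to be rational), and also looks at one rearrangement that swaps consecutive pairs of terms. The helper names and the choice \(p = 5\) are assumptions made for illustration.

```python
from fractions import Fraction

def norm(x, p):
    """p-adic norm of a rational number, returned as an exact Fraction."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    n, d, v = x.numerator, x.denominator, 0
    while n % p == 0:
        n, v = n // p, v + 1
    while d % p == 0:
        d, v = d // p, v - 1
    return Fraction(p) ** (-v)

def partial_sums(terms):
    """Running sums of a finite list of terms."""
    total, out = Fraction(0), []
    for t in terms:
        total += t
        out.append(total)
    return out

p, N = 5, 8                                   # N is even so that pairs can be swapped
terms = [Fraction(p) ** k for k in range(N)]
limit = Fraction(1, 1 - p)

# Distance from each partial sum to the limit, in the p-adic norm.
errors = [norm(s - limit, p) for s in partial_sums(terms)]

# Rearrangement swapping the terms with indices 2k and 2k + 1.
swapped = [terms[k ^ 1] for k in range(N)]
swapped_errors = [norm(s - limit, p) for s in partial_sums(swapped)]
```

The partial sum up to \(p^n\) differs from the limit by \(-p^{n+1}/(1-p),\) which has norm \(p^{-(n+1)};\) the rearranged partial sums lag behind by at most one step.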

Semi-normed, normed and valued rings

A pair \((A, |\ |)\), where \(A\) is a commutative ring with identity, is called a semi-normed (respectively, normed) ring if

  1. \(|\ |\) is a semi-norm (respectively, norm) on the additive group \(A^+\) of \(A\). (Note: \(A^+\) is simply \(A\) considered as an abelian group with respect to \(+\).)
  2. \(|xy| \leq |x||y|\) for each \(x, y \in A\)
  3. \(|1| \leq 1\)

\(|\ |\) is called a valuation and \(A\) is called a valued ring if

  • \(|\ |\) is a norm on \(A,\) and
  • property 2 above is always satisfied with an equality.

Note that for a valuation, property 3 above is redundant: it follows from the second property with equality.

Strictly convergent power series

Let \((A, |\ |)\) be a semi-normed ring. A formal power series \(\sum_{k \geq 0} a_kX^k \in A[[X]]\) is strictly convergent if \(\lim_{k \to \infty} |a_k| = 0.\) The set of all strictly convergent power series over \(A\) is denoted by \(A\langle X \rangle.\) The Gauss norm on \(A\langle X \rangle,\) which we also denote as \(|\ |,\) is defined as:
\[ |\sum_{k \geq 0} a_k X^k| := \max_k |a_k| \]
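
For a quick feel for the Gauss norm, the sketch below computes it for polynomials (which are trivially strictly convergent) over \(\mathbb{Q}\) with the \(3\)-adic norm, and compares the norm of a product with the product of the norms. The base ring, the sample polynomials and the helper names `norm`, `gauss` and `poly_mul` are assumptions made for illustration.

```python
from fractions import Fraction

def norm(x, p):
    """p-adic norm of a rational number, returned as an exact Fraction."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    n, d, v = x.numerator, x.denominator, 0
    while n % p == 0:
        n, v = n // p, v + 1
    while d % p == 0:
        d, v = d // p, v - 1
    return Fraction(p) ** (-v)

def gauss(f, p):
    """Gauss norm of a polynomial given by its coefficient list [a_0, a_1, ...]."""
    return max(norm(c, p) for c in f)

def poly_mul(f, g):
    """Coefficient list of the product of two polynomials."""
    out = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

p = 3
f = [Fraction(3), Fraction(9)]       # 3 + 9X, Gauss norm 1/3
g = [Fraction(1, 3), Fraction(1)]    # 1/3 + X, Gauss norm 3
fg = poly_mul(f, g)                  # 1 + 6X + 9X^2
```

In this sample the norm of the product equals the product of the norms; the sketch only checks the inequality \(|fg| \leq |f||g|.\)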

Normed and faithfully normed modules

Let \((A, |\ |)\) be a normed ring. A pair \((M, |\ |)\) is a normed \(A\)-module if

  1. \((M, |\ |)\) is a normed group (with respect to addition in \(M\)), and
  2. \(|ax| \leq |a||x|\) for all \(a \in A,\) \(x \in M.\)

If in addition \(A\) is a valued ring and the second property above always holds with equality, then we say that \((M, |\ |)\) is a faithfully normed \(A\)-module.

Remark. If \(M \neq 0,\) then the condition that “\(A\) is a valued ring” in the definition of a “faithful norm” is unnecessary: indeed, pick \(x \neq 0\) in \(M,\) so that \(|x| > 0.\) Then for all \(a_1, a_2 \in A,\) the chain of equalities
\[|a_1a_2||x| = |a_1a_2x| = |a_1||a_2x| = |a_1||a_2||x|\]
implies that \[|a_1a_2| = |a_1||a_2|\]

Examples

  1. Non-closed ideals.
    1. A (non-complete) Noetherian example. Let \(A := k[x]\) where \(k\) is a field equipped with a nontrivial valuation. Consider the topology on \(A\) induced by the Gauss norm.
      Lemma. For each \(a \in k,\) the ideal \(\mmm_a := \langle x - a \rangle\) is closed in \(A\) if and only if \(|a| \leq 1.\)
      Proof. Since \(1 - x^n/a^n \in \mmm_a\) for \(a \in k \setminus \{0\},\) it follows that \(|1, \mmm_a| \leq |a|^{-n}\) for all \(n \geq 1.\) Therefore if \(|a| > 1,\) then \(|1, \mmm_a| = 0,\) so that \(\mmm_a\) is not closed. On the other hand, if \(|a| \leq 1,\) then we claim that
      \[|1 + (x-a)f| \geq 1\]
      for all \(f.\) Indeed, write \(f = \sum_{j=0}^n c_jx^j\) so that
      \[|1 + (x-a)f| = \max\{|1 - ac_0|, |c_0 - ac_1|, \ldots, |c_{n-1} - ac_n|, |c_n|\}\]
      If \(|1-ac_0| < |1| = 1,\) then one must have \(|ac_0| = |1| = 1,\) so that \(|c_0| = 1/|a| \geq 1.\) Continuing with \(|c_0 - ac_1|\) and so on, it follows that if each of the terms \(|c_0 - ac_1|, \ldots, |c_{n-1} - ac_n|\) is also smaller than \(1\), then \(|c_j| = |a|^{-(j+1)}\) for each \(j,\) so that \(|1 + (x-a)f| = |c_n| = |a|^{-(n+1)} \geq 1.\) It follows that \(1\) is not in the closure \(\bar \mmm_a\) of \(\mmm_a.\) Since \(\bar \mmm_a\) is an ideal of \(A\) and \(\mmm_a\) is a maximal ideal of \(A,\) this implies that \(\bar \mmm_a = \mmm_a,\) as required.
    2. Contrast the above with the following:
      1. If \(A\) is a complete normed ring, then every maximal ideal is closed. This follows since if a maximal ideal \(\mmm\) is not closed, then its closure, being an ideal, must contain \(1,\) i.e. there are elements in \(\mmm\) arbitrarily close to \(1.\) However, since \(A\) is complete, any element of the form \(1 + f\) with \(|f| < 1\) is invertible.
      2. If \(A\) is a complete Noetherian normed ring, then every ideal is closed, at least if \(A\) contains a field \(k\) such that the restriction of the norm on \(k\) is nontrivial. Indeed, then \(A\) is a \(k\)-Banach algebra. Let \(\hat \qqq\) be the completion of an ideal \(\qqq.\) By Noetherianity there is a surjective \(A\)-module map \(\pi: A^n \to \hat \qqq\) which is open due to the Open Mapping Theorem. Then \(\pi((A^\vee)^n)\) is open in \(\hat \qqq.\) Since \(\qqq\) is dense in \(\hat \qqq,\) it follows that \(\hat \qqq = \qqq + \pi((A^\vee)^n).\) Then \(\hat \qqq = \qqq\) by Nakayama’s lemma [BGR 1.2.4/6]. Thus \(\qqq\) is complete, and therefore closed.
    3. A quasi-Noetherian but complete example. Let \(A := k[[x_i: i \geq 1]]\) be the power series ring in infinitely many variables over a field \(k.\) Then \(A\) is a local ring with maximal ideal \(m\) consisting of power series with zero constant term. Let \(m_0\) be the ideal of \(A\) generated by all \(x_i,\) \(i \geq 1.\) Then \(m_0 \subsetneq m,\) since e.g. the element
      \[x_1 + x_2^2 + x_3^3 + \cdots\]
      can not be expressed as an \(A\)-linear combination of finitely many \(x_i.\) However, the closure of \(m_0\) is \(m,\) e.g. with respect to the topology induced by the valuation corresponding to the order of power series. This remains true if the topology on \(A\) is induced by a weighted order such that the weight of \(x_i\) goes to \(\infty\) as \(i \to \infty,\) even though in that case \(A\) is quasi-Noetherian, i.e. each ideal \(I\) of \(A\) has a (possibly infinite) sequence of elements \(a_i,\) \(i \geq 1,\) such that \(\lim_{i \to \infty}|a_i| = 0,\) and in addition, every \(a \in I\) can be expressed as a possibly infinite \(A\)-linear combination of the \(a_i.\)
  2. “Unexpected” behaviour of residues: equivalent norm with transcendental residue. Take a normed ring \(A\) such that there is a real number \(\rho > 1\) such that \(\rho^{-1} \not\in |A|.\) (E.g. one can take \(A\) to be a discrete valuation ring.) Take \(B :=A\langle X \rangle.\) Let \(|\ |_1\) be the Gauss norm on \(B\) and define \(|\ |_2\) on \(B\) as:
    \[ |\sum a_k X^k|_2 := \max\{|a_0|, \rho|a_k|: k \geq 1\}\]
    Then \(|\ |_2\) is also a norm (one needs \(\rho \geq 1\) for the ultrametric inequality to hold) and
    \[|\cdot|_1 \leq |\cdot|_2 \leq \rho|\cdot|_1\]
    so that \(|\ |_1\) and \(|\ |_2\) induce the same topology on \(B.\) However,
    \[ B^\sim_1 \cong A^\sim[X],\ \text{whereas}\ B^\sim_2 \cong A^\sim\]
    so that \(B^\sim_1\) is transcendental over \(B^\sim_2.\)
  3. “Unexpected” behaviour of residues: trivial residue. Take a cyclic/principal faithfully normed \(A\)-module \(M := Ax\) such that \(|x|^{-1} \not\in |A|.\) Then \(M^\vee = M^0,\) so that \(M^\sim = 0.\)
  4. A discrete valuation ring \(A\) which is not Japanese, i.e. the field of fractions of \(A\) has a finite algebraic extension \(K\) such that the integral closure of \(A\) in \(K\) is not a finite \(A\)-module [F. K. Schmidt, Math. Z. 41, 443-450, 1936].
    Take a prime \(p.\) Set \(k := (\zz/p\zz)(t_0, t_1, t_2, \ldots)\) where all \(t_i\) are indeterminates. Take a new indeterminate \(T,\) define
    \[ f(T) := t_0 + t_1T + t_2T^2 + \cdots \in k[[T]] \]
    The lemma below implies that \(f\) is not algebraic over \(k[T]\), so that the homomorphism \(k[X,Y] \to k[[T]]\) defined by \(X \mapsto T,\) \(Y \mapsto f(T)\) is injective. Let \(K := k(X,Y)\) and \(|\ |_f\) be the valuation on \(K\) induced by the standard valuation on \(k[[T]].\) Let \(Q := k(X, Y^p) \subseteq K\) and \(A\) be the (discrete) valuation ring of the restriction of \(|\ |_f\) to \(Q,\) i.e.
    \[A := Q^0 := \{g \in Q: |g|_f \leq 1\}\]
    The following claim shows that the integral closure \(\bar A\) of \(A\) in \(K\) is not a finite \(A\)-module, as required. The lemma used above is stated and proved after the claim.
    Claim. \(\bar A = K^0 := \{g \in K: |g|_f \leq 1\}.\) Moreover, \(K^0\) is not a finite \(A\)-module.
    Proof. Since \(char(K) = p > 0,\) every \(g \in K^0\) is a \(p\)-th root of some element in \(A.\) On the other hand, if \(g \in K\) satisfies an integral equation
    \[g^n + a_1g^{n-1} + \cdots + a_n = 0\]
    with \(a_i \in A\) then one must have \(|g|_f \leq 1.\) It follows that \(\bar A = K^0.\) To prove the second assertion proceed by contradiction and assume \(e_1, \ldots, e_n \in K^0\) generate \(K^0\) as an \(A\)-module. Since \(1, Y, \ldots, Y^{p-1}\) generate \(K\) as a vector space over \(Q\), there are expressions
    \[e_i = \sum_{j=0}^{p-1}c_{ij}Y^j,\ c_{ij} \in Q\]
    Since \(|X|_f < 1,\) there is \(s \geq 1\) such that \(|c_{ij}X^s|_f \leq 1\) for each \(i,j.\) Then each \(X^se_i\) is in the \(A\)-module \(B\) generated by \(1, Y, \ldots, Y^{p-1}.\) It follows that
    \[X^s K^0 \subseteq B\]
    Let
    \[Z_s := X^{-(s+1)}(Y - t_0 - t_1X - \cdots - t_sX^s) \in K.\]
    Then \(Z_s \in K^0\) and therefore
    \[X^sZ_s = X^{-1}(Y-t_0) - (t_1 + t_2X + \cdots + t_sX^{s-1}) \in B\]
    Since \(t_1 + t_2X + \cdots + t_sX^{s-1} \in A \subseteq B,\) it follows that
    \[X^{-1}(Y - t_0) \in B = A + AY + \cdots + AY^{p-1}\]
    But this is impossible since \(X^{-1} \not\in A,\) and \(1, Y, \ldots, Y^{p-1}\) is a vector space basis of \(K\) over \(Q.\) This contradiction finishes the proof of the claim.
    Lemma. Let \(k\) be a field and \(k_0\) be the prime field of \(k.\) Let \(f := \sum_i c_i T^i \in k[[T]],\) where \(T\) is an indeterminate, be such that \(k_0(c_0, c_1, c_2, \ldots)\) has infinite transcendence degree over \(k_0.\) Then \(f\) is not algebraic over \(k[T].\)
    Proof. Assume to the contrary that there is a nonzero polynomial \(q(T,W) \in k[T,W]\) such that \(q(T, f(T)) = 0.\) Let \(k'\) be the subfield of \(k\) generated by all coefficients of \(q(T,W).\) We claim that each coefficient \(c_n\) of \(f\) is algebraic over \(k'.\) We proceed by induction on \(n\). Write \(q = T^sq_0\) where \(T\) does not divide \(q_0(T,W).\) Then \(q_0(T, f(T)) = 0,\) and in addition, \(q_0(0,W)\) is a nonzero polynomial over \(k'.\) Since \(q_0(0,f(0)) = q_0(0,c_0) = 0,\) it follows that \(c_0\) is algebraic over \(k',\) so that the claim holds for \(n = 0.\) For \(n \geq 1,\) define
    \[
    q'(T,W) := q(T, c_0 + c_1T + \cdots + c_{n-1}T^{n-1} + WT^n) \in k'(c_0, \ldots, c_{n-1})[T,W]
    \]
    Since \(\deg_W(q) \geq 1\) it is straightforward to check by looking at the highest degree term in \(W\) that
    \[\deg_W(q') = \deg_W(q) > 0.\]
    In particular, \(q'\) is a nonzero element in \(k'(c_0, \ldots, c_{n-1})[T,W]\) such that
    \[q'(T, c_n + c_{n+1}T + \cdots) = q(T, f(T)) = 0\]
    The \(n = 0\) case of the claim then implies that \(c_n\) is algebraic over \(k'(c_0, \ldots, c_{n-1}),\) so that \(c_n\) is algebraic over \(k'\) due to the induction hypothesis. This proves the claim and consequently, the lemma.
  5. A bounded (in particular, continuous) bijective \(A\)-module map may not have a continuous inverse. Pick a field \(K\) with a non-complete valuation. Pick the completion \(\hat K\) of \(K\) and take \(x \in \hat K \setminus K.\) The map \(\phi: K^2 \to V := K + Kx\) given by \((a,b) \mapsto a + bx\) is a continuous (in fact, bounded) and bijective \(K\)-vector space map. However, \(K\) is not closed in \(V\) so that \(\phi\) is not a homeomorphism; in particular
    1. \(\phi^{-1}:V \to K^2\) is a \(K\)-vector space isomorphism which is not continuous (let alone, bounded)! Indeed, choose \(a_s \in K\) such that \(a_s \to x\) as \(s \to \infty.\) Then \(a_s - x \to 0\) but \(|\phi^{-1}(a_s -x)| = |(a_s, -1)| \geq 1.\)
    2. Contrast the above with the following:
      1. Every \(K\)-linear map from \(K^n\) to a \(K\)-vector space is continuous.
    3. Also note that \(\phi^{-1}\) is not continuous, even though \(\ker(\phi^{-1}) = \{0\}\) is closed in \(V.\) In particular, the following fact does not always hold if \(K\) is replaced by \(K^n\) for \(n \geq 2\): given a \(K\)-linear map \(\lambda: V \to K,\) where \(V\) is a normed \(K\)-vector space, if \(\ker(\lambda)\) is closed, then \(\lambda\) is bounded. (Indeed, if \(x \in V \setminus \ker(\lambda),\) then \(\alpha := |x, \ker(\lambda)| > 0,\) and the \(K\)-faithfulness of the norm implies that \(|ax, \ker(\lambda)| = |a|\alpha\) for all \(a \in K.\) Then for all \(y \in V \setminus \ker(\lambda),\) writing \(y = ax + z\) with \(z \in \ker(\lambda)\) one has that \(|y| \geq |a|\alpha\) so that \(|\lambda(y)| = |a||\lambda(x)| \leq (|\lambda(x)|/\alpha)|y|.\))
    4. In this example, define \(\lambda: V \to K\) as
      \[\lambda(a + bx) := b\]
      Then \(\ker(\lambda) = K\) is not closed in \(V.\) Does it imply that \(V \to V/\ker(\lambda)\) is not continuous? Well, the question is more basic. What is the topology to put on \(V/\ker(\lambda)\)? Since \(\ker(\lambda) = K\) is dense in \(V,\) it follows that in the residue topology every point has zero norm! So in this topology the open sets are only the whole space and the empty set. So the (surjective) map \(V/\ker(\lambda) \to K\) is not continuous with respect to that semi-norm.
  6. For every ring \(A\) equipped with a degenerate valuation, there are \(A\)-module maps which are homeomorphisms (i.e. in particular, continuous), but not bounded. Let \(M_i := Ae_i,\) \(i = 1, 2, \ldots,\) be a sequence of free cyclic \(A\)-modules. Set \(M := \bigoplus M_i.\) For each \(m \in \zz,\) define an \(A\)-module norm \(|\ |_m\) on \(M\) by
    \[ | \sum_{i=1}^n a_ie_i|_m := \max\{i^{-m}|a_i|\} \]
    Since \(|\ |_m \leq |\ |_{m-1},\) the identity map \((M, |\ |_{m-1}) \to (M, |\ |_m)\) is bounded and continuous. The inverse map
    \[\iota_m: (M, |\ |_{m}) \to (M, |\ |_{m-1}) \]
    is not bounded, since \(|e_i|_{m-1} = i|e_i|_m\) for every \(i.\) However, if \(A\) is a valued ring with a degenerate valuation, then there is \(m\) such that \(\iota_m\) is continuous. Indeed, in that case there are two possibilities:
    1. If \(|\ | \geq 1\) on \(A \setminus \{0\}\) then whenever \(m \leq 0,\) one has \(|\ |_m \geq 1\) on \(M \setminus \{0\},\) so that \(|\ |_m\) induces the discrete topology on \(M\) and every map from \((M, |\ |_m)\) is continuous.
    2. If \(|\ | \leq 1\) on \(A,\) then \(|a_i| \leq |a_i|^{1/m}\) for \(m \geq 1.\) So that for \(m \geq 2\)
      \[
      i^{-m}|a_i| < \epsilon^m
      \Rightarrow |a_i|^{1/m} < i\epsilon
      \Rightarrow |a_i| < i^{m-1} \epsilon
      \Rightarrow |a_ie_i|_{m-1} < \epsilon
      \]
      which implies that \(\iota_m\) is continuous, as claimed.
  7. Contrast the preceding example with the following fact [BGR, Proposition 2.1.8/2]: if \(A\) is equipped with a non-degenerate valuation, then for \(A\)-linear maps between faithfully normed \(A\)-modules, continuity is equivalent to boundedness. The key observation is [BGR, Proposition 2.1.8/1] which says that if the valuation of \(A\) is non-degenerate, then for each faithfully normed \(A\)-module \(M\) there is a fixed \(\rho > 1\) such that for each \(x \in M\setminus \{0\},\) there is \(c \in A\) such that \(1 \leq |cx| < \rho.\)

Notes: Jonathan D. Cryer and Kung-Sik Chan, Time Series Analysis with Applications in R

This post is to record my notes on the first book on time series that I am reading seriously, namely the 2008 edition of Time Series Analysis with Applications in R by Jonathan D. Cryer and Kung-Sik Chan.

Chapter 2. Fundamental Concepts

  1. The model for a time series is a stochastic process, which is a sequence of random variables \((Y_t: t = 0, \pm 1, \pm 2, \ldots)\)
  2. Mean function of the time series \(Y_t\) is \(\mu_t := E(Y_t)\), where \(E\) denotes “expectation”.
  3. Autocovariance \(\gamma_{t,s} := Cov(Y_t, Y_s)\), where \(Cov\) is the covariance defined as \(Cov(Y_t, Y_s) := E((Y_t - \mu_t)(Y_s - \mu_s)) = E(Y_tY_s) - \mu_t\mu_s.\)
  4. Autocorrelation \(\rho_{t,s} := Corr(Y_t, Y_s)\), where \(Corr\) is the correlation defined as \[Corr(Y_t, Y_s) := \frac{Cov(Y_t, Y_s)}{\sqrt{Var(Y_t)Var(Y_s)}} = \frac{\gamma_{t,s}}{\sqrt{\gamma_{t,t}\gamma_{s,s}}}\]
  5. A white noise is a time series \(e_t\) of iid random variables. The white noise processes in this book usually also have zero mean.
    Note: I think the above definition of a white noise is too restrictive even for the purpose of this book. According to Hamilton, white noises do not have to be identically distributed; the requirements for \(e_t\) to be a white noise are that each \(e_t\) has the same (usually, zero) mean and the same (finite) variance, and that they have zero autocorrelation, i.e. \(E(e_te_s) = 0\) whenever \(t \neq s\) (in the zero-mean case). If in addition \(e_t, e_s\) are independent for all \(t \neq s,\) he calls it an independent white noise process.
  6. A random walk is a time series \((Y_t: t = 1, 2, \ldots ) \) defined as \[Y_t := \begin{cases} e_1 & \text{if}\ t = 1,\\ Y_{t-1} + e_t & \text{if}\ t \geq 2, \end{cases} \] where \(e_t\) is a white noise process with zero mean and finite variance. \(Y_1 = e_1\) is the “initial condition”. Specific realizations of random walks can show “trends” which are not really present.
  7. \(Y_t\) is said to be strictly stationary if for all \(n\) the joint distribution of \(Y_{t_1 - k}, Y_{t_2 - k}, \ldots, Y_{t_n - k}\) is the same as that of \(Y_{t_1}, Y_{t_2}, \ldots, Y_{t_n}\) for all choices of \(k, t_1, \ldots, t_n\). It is said to be weakly (or second-order) stationary if the mean \(\mu_t\) does not depend on \(t\), and \(\gamma_{t, t-k} = \gamma_{0, k}\) for all time \(t\) and lag \(k\); in that case we write \(\gamma_k\) for \(\gamma_{k, 0}\) and \(\rho_k\) for \(\rho_{k, 0}\). In this book stationary = weakly stationary.
    Note: if \(Y_t\) is weakly stationary, then \(\gamma_k = \gamma_{k,0} = \gamma_{0,k} = \gamma_{-k}\), and similarly \(\rho_k = \rho_{-k}\).
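
The random walk of item 6 can be sketched in a few lines. This is only an illustration: the function names `gaussian_white_noise` and `random_walk` are made up here, and Gaussian noise is an assumption (the definition only asks for zero mean and finite variance).

```python
import random


def gaussian_white_noise(n, sigma=1.0, seed=0):
    """Return n independent N(0, sigma^2) draws (one possible white noise)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(n)]


def random_walk(noise):
    """Return Y_1, ..., Y_n with Y_1 = e_1 and Y_t = Y_{t-1} + e_t."""
    walk = []
    total = 0.0
    for e in noise:
        total += e  # Y_t = Y_{t-1} + e_t
        walk.append(total)
    return walk


# One realization of length 200; plotting it often shows an apparent "trend".
y = random_walk(gaussian_white_noise(200))
```

Changing `seed` gives other realizations, many of which seem to drift up or down even though each step has zero mean.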

Chapter 3. Trends

  1. Discusses the deterministic trend of a time series, which is defined as the time series of its mean. In particular, in the notation of Chapter 2, the trend of a time series \(Y_t\) is its mean function \(\mu_t\).
    Note: Maybe it would sometimes be useful to substitute “mean” by “median” or some other function? Well, in that case, the difference between \(Y_t\) and “that function” of \(Y_t\) should have good analytic properties.
  2. If the model is \(Y_t = \mu + X_t\) with \(E(X_t) = 0\), then the most common estimate for \(\mu\) is the sample mean \[\bar Y := (\sum_{t=1}^n Y_t)/n.\] Assumptions on \(\rho_k\) can be used to estimate \[Var(\bar Y) = \frac{\gamma_0}{n}[1 + 2 \sum_{k=1}^{n-1}(1 - \frac{k}{n})\rho_k]\]
  3. Cyclical/seasonal trends with period \(M\) can be modeled by a seasonal means model \(Y_t = \mu_t + X_t\) with \(E(X_t) = 0\) and \(\mu_{t-M} = \mu_t\). The \(t\)-values and \(Pr(>|t|)\)-values (reported e.g. in the R-output) are not very interesting, since these correspond to the null hypothesis that \(\mu_t = 0\).
  4. Using a sine-cosine trend model e.g. \[Y_t = \beta_0 + \beta_1 \cos(\frac{2\pi}{M}t) + \beta_2 \sin(\frac{2\pi}{M}t)\] is more “economical” since it uses fewer parameters than the seasonal means model, and this translates to smaller variance of estimates (under the hypothesis that the chosen model is correct). For a better fit also include terms of the form \(\cos(\frac{2k\pi}{M}t)\) and \(\sin(\frac{2k\pi}{M}t)\) for \(k > 1\).
  5. Residual analysis
    • Look at the plots of standardized residuals for pattern, e.g. whether too much/too few “runs” (runs test).
    • Standardized residuals vs fitted values, possibly using seasonal symbols for seasonal models.
    • Histogram.
    • Shapiro-Wilk test for normality (which measures the correlation of residuals vs the corresponding normal quantiles – the lower the number the more evidence against normality).
    • Normal quantile-quantile plot.
    • Sample autocorrelation of standardized residuals.
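
The formula for \(Var(\bar Y)\) in item 2 is easy to evaluate numerically. The sketch below is a direct transcription; the name `var_sample_mean` and the convention that unlisted lags have zero autocorrelation are assumptions made for the example.

```python
def var_sample_mean(gamma0, rho, n):
    """Var(Y-bar) = (gamma0 / n) * [1 + 2 * sum_{k=1}^{n-1} (1 - k/n) * rho_k].

    rho[k - 1] holds rho_k; lags beyond len(rho) are treated as zero.
    """
    total = 0.0
    for k in range(1, n):
        rho_k = rho[k - 1] if k - 1 < len(rho) else 0.0
        total += (1 - k / n) * rho_k
    return gamma0 / n * (1 + 2 * total)


# Uncorrelated terms give the familiar gamma0 / n.
white = var_sample_mean(1.0, [], 10)
# A negative lag-1 autocorrelation makes the sample mean less variable.
negative = var_sample_mean(1.0, [-0.4], 10)
```

With \(\rho_1 = -0.4\) and \(n = 10\) the bracket equals \(1 + 2(0.9)(-0.4) = 0.28\), so the variance drops to \(0.028\gamma_0\); positive autocorrelations push it above \(\gamma_0/n\) instead.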

Chapter 4. Models for Stationary Time Series

  1. A general linear process is a time series \(Y_t\) that can be represented as a weighted linear combination of present and past white noise terms \[Y_t = \sum_{j \geq 0} \psi_j e_{t - j}\] Without loss of generality one can assume \(\psi_0 = 1\). For the variance to be finite one requires \(\sum_{j \geq 0} \psi_j^2 < \infty\). A general linear process is stationary by construction.
    Note: Wold’s theorem says that any (weakly) stationary process \(Y_t\) admits a decomposition of the form \(Y_t = U_t + X_t\) with \(X_t\) a general linear process and \(U_t\) a purely deterministic stationary process (i.e. a stationary process which can be predicted with arbitrarily small mean squared error by some linear predictors of finitely many past lags of the process) such that \(U_t\) has zero correlation with each white noise component of \(X_t\).
  2. A moving average of order \(q\), in short, \(MA(q)\) process is a general linear process such that \(\psi_j = 0\) for \(j > q.\) In this book one uses \(\theta\) as parameters for \(MA(q)\) processes: \[Y_t = e_t - \sum_{j=1}^q \theta_j e_{t-j}.\]
  3. Autocorrelations \(\rho_k\) can be examined by lag plots, i.e. plots of \(Y_t\) vs \(Y_{t-k}\).
  4. For an \(MA(q)\) process, \(\rho_k\) is zero for \(k > q.\)
  5. An autoregressive process of order \(p\), in short, \(AR(p)\) process is a time series of the form \[Y_t = \sum_{j = 1}^p \phi_j Y_{t-j} + e_t\] where \(e_t\), the innovation term, is independent of \(Y_{t-j}\) for each \(j \geq 1\). Also assume, at least in the beginning, that \(E(Y_t) = 0\) for each \(t.\)
    Note: The assumption \(E(Y_t) = 0\) does not impose any restriction (provided we are dealing with processes with finite mean): indeed, writing \(\mu_t\) for \(E(Y_t)\), we have \(\mu_t = \sum_{j=1}^p \phi_j \mu_{t-j} + E(e_t)\), so that replacing \(Y_t\) by \(Y'_t := Y_t - \mu_t\) gives \[Y'_t = \sum_{j=1}^p \phi_j Y'_{t-j} + e'_t\] where \(e'_t := e_t - E(e_t)\), so that \(Y'_t\) is an \(AR(p)\) process with \(E(Y'_t) = E(e'_t) = 0.\)
  6. The characteristic polynomial of the above \(AR(p)\) process \(Y_t\) is \[\phi(x) := 1 - \sum_{j = 1}^p \phi_j x^j\] and the corresponding characteristic equation is \[\phi(x) = 0.\]
  7. \(Y_t\) is stationary if and only if all the roots of the characteristic equation have modulus greater than 1.
    Note: to see this try to “solve” for \(Y_t\) by writing its defining equation as \[\phi(L)(Y_t) = e_t,\] where \(L\) is the “Lag” operator. If \[\phi(L) = \prod_{i=1}^p(1 - \alpha_iL)\], with \(\alpha_i \in \mathbb{C},\) the solution is \[Y_t = \prod_{i=1}^p(1 - \alpha_iL)^{-1}e_t\] which makes sense if each \(|\alpha_i| < 1,\) since then \((1 - \alpha_iL)^{-1}\) can be expanded as \(1 + \alpha_iL + \alpha_i^2L^2 + \cdots .\) Now note that the roots of \(\phi(x) = 0\) are precisely the \(\alpha_i^{-1}\).
  8. Multiplying the defining equation of the \(AR(p)\) process \(Y_t\) by \(Y_{t-k}\), taking expectation, dividing by \(\gamma_0\), and using \(\rho_0 = 1\) and \(\rho_{-k} = \rho_k\) yields the Yule-Walker equations \[\begin{align*} \rho_1 &= \phi_1 + \phi_2 \rho_1 + \phi_3 \rho_2 + \cdots + \phi_p \rho_{p-1} \\ \rho_2 &= \phi_1\rho_1 + \phi_2 + \phi_3 \rho_1 + \cdots + \phi_p \rho_{p-2} \\ &\vdots \\ \rho_p &= \phi_1\rho_{p-1} + \phi_2\rho_{p-2} + \phi_3 \rho_{p-3} + \cdots + \phi_p \end{align*}\] Given \(\phi_1, \ldots, \phi_p\), solving these linear equations gives \(\rho_1, \ldots, \rho_p\), and then iteratively all \(\rho_k\) can be found using the equation: \[\rho_k = \phi_1 \rho_{k-1} + \phi_2 \rho_{k-2} + \cdots + \phi_p \rho_{k-p}\]
  9. A stationary \(AR(p)\) process can be expressed as a general linear process (see the computation of \(\psi\)-coefficients of ARMA processes below).
  10. Autocorrelations of a stationary \(AR(p)\) process do not vanish, but rather “die off” with increasing lags; the sign of autocorrelations might alternate with lags if correlation with one of the lags is negative and significant. One can try to separate the “damping factor”, “frequency” and “phase” of the autocorrelations (with respect to lags).
  11. A mixed autoregressive moving average process of orders \(p, q\), in short, \(ARMA(p, q)\) process is a time series of the form \[Y_t = \sum_{j = 1}^p \phi_j Y_{t-j} + e_t - \sum_{j=1}^q \theta_j e_{t-j}\] where \(e_t\) is independent of \(Y_{t-j}\) for each \(j \geq 1\). As in the case of \(AR\) processes, if \(E(Y_t)\) and \(E(e_t)\) are finite, then one can assume without loss of generality that \(E(Y_t) = E(e_t) = 0\) for each \(t\).
  12. Exactly as in the autoregressive case, \(Y_t\) is stationary if and only if each root of the characteristic equation \(1 - \sum_{j=1}^p \phi_j x^j = 0\) has modulus greater than \(1.\)
  13. If stationary, then \(Y_t\) can be represented as a general linear process as follows: write \[\psi_0e_t + \psi_1 e_{t-1} + \psi_2 e_{t-2} + \cdots = \phi_1 Y_{t-1} +\cdots + \phi_p Y_{t-p} +e_t - \theta_1 e_{t-1} - \cdots - \theta_q e_{t-q}\] Multiply both sides by \(e_t\), take expectation, and divide by \(\sigma^2_e\) to get \[\psi_0 = 1\] Then multiply both sides by \(e_{t-1}\), and do the same thing to get \[\psi_1 = \phi_1 - \theta_1\] Continuing with \(e_{t-2}\) gives \[\psi_2 = \phi_1(\phi_1 - \theta_1) + \phi_2 - \theta_2 = \phi_1\psi_1 + \phi_2 - \theta_2 \] and so on.
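
The “and so on” in item 13 can be turned into a recursion. Assuming the pattern of the first three steps continues, one gets \(\psi_k = \phi_1\psi_{k-1} + \cdots + \phi_p\psi_{k-p} - \theta_k\), with \(\psi_j = 0\) for \(j < 0\) and \(\theta_k = 0\) for \(k > q\). The sketch below implements this; the function name `psi_weights` and its argument layout are made up for the illustration.

```python
def psi_weights(phi, theta, count):
    """Return [psi_0, ..., psi_{count-1}] (count >= 1) for the ARMA(p, q) process

        Y_t = sum_j phi[j-1] * Y_{t-j} + e_t - sum_j theta[j-1] * e_{t-j},

    using psi_0 = 1 and psi_k = sum_{j=1}^{min(k,p)} phi_j * psi_{k-j} - theta_k.
    """
    psi = [1.0]
    for k in range(1, count):
        # Moving average contribution: -theta_k, which is zero once k > q.
        value = -theta[k - 1] if k <= len(theta) else 0.0
        # Autoregressive contribution from the weights already computed.
        for j in range(1, min(k, len(phi)) + 1):
            value += phi[j - 1] * psi[k - j]
        psi.append(value)
    return psi


ar1 = psi_weights([0.5], [], 4)      # pure AR(1): psi_j = phi^j
ma1 = psi_weights([], [0.3], 3)      # pure MA(1): psi_1 = -theta_1, then zeros
arma11 = psi_weights([0.5], [0.3], 3)
```

For the \(ARMA(1,1)\) example this gives \(\psi_1 = \phi_1 - \theta_1 = 0.2\) and \(\psi_2 = \phi_1\psi_1 = 0.1\), in agreement with the formulas of item 13.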

Chapter 5. Models for Nonstationary Time Series

  1. If \(Y_t = \phi Y_{t-1} + e_t\) and \(e_t\) is uncorrelated to \(Y_{t-1},\) then \(Var(Y_t) = \phi^2 Var(Y_{t-1}) + Var(e_t).\) Therefore if \(Y_t\) is stationary and \(\gamma_0 = Var(Y_t) = Var(Y_{t-1}),\) then \(|\phi| < 1.\)
  2. If \(|\phi| > 1,\) and \(e_t\) is an “innovation” (i.e. uncorrelated to \(Y_{t-1}, Y_{t-2}, \ldots\)), then \(Y_t\) cannot be stationary; in fact, in that case \(Var(Y_t)\) increases exponentially.
  3. If \(\phi = 1,\) then \(\nabla Y_t := Y_t - Y_{t-1}\) is stationary with \(\nabla Y_t = e_t.\)
  4. An integrated moving average process of order (\(d, q\)), in short an \(IMA(d,q)\) process, is a process \(Y_t\) such that \(\nabla^d(Y_t)\) is a moving average of order \(q.\)
    Note: the inverse to the “difference operator” \(\nabla\) is the operator \(I\) which maps a time series \((W_t)\) to the time series whose \(i\)-th term is \(\sum_{j = -\infty}^i W_j\), in other words \(I\) is a “sum” or “integration” operator. This is the origin of the term “integrated” in “integrated moving average”.
  5. If \(Y_t = M_t + e_t\) with \(M_t\) changing slowly over time, then \(IMA\) models appear naturally in the following cases:
    • if \(M_t\) is approximately constant, say \(\beta_t\), over two consecutive time points, say \(t-1\) and \(t,\) then minimizing \((Y_t-\beta_t)^2 + (Y_{t-1} - \beta_t)^2\) yields \(\beta_t = (Y_t + Y_{t-1})/2.\) This implies \((1/2)\nabla Y_t = e_t,\) i.e. an \(IMA(1,0)\) process.
    • if \(M_t = M_{t-1} + \epsilon_t,\) then \(\nabla Y_t = \epsilon_t + e_t - e_{t-1}\) is an \(MA\) process, so that \(Y_t\) is \(IMA\).
      Question: what is the order of \(Y_t\)? Need to understand the order of sums of \(ARIMA\) processes.
    • if \(M_t\) is approximately linear over three consecutive time points or if \(M_t = M_{t-1} + W_t\) with \(\nabla W_t\) stationary, then \(\nabla^2 Y_t\) is \(MA\) with the same autocorrelation function as an \(MA(2)\) process (question: is it \(MA(2)?\)).
  6. \(Y_t \sim ARIMA(p,d,q)\) means \(\nabla^d(Y) \sim ARMA(p,q).\)
  7. \(ARIMA(p,d,q)\) with nonzero mean \(\mu\) is by definition a time series \(Y_t\) such that \(\nabla^d(Y) - \mu \sim ARMA(p,q),\) which implies that \(Y_t = \mu_t + Y'_t,\) where \(\mu_t\) is a deterministic polynomial of degree \(d\) in \(t\) and \(Y'_t \sim ARIMA(p,d,q)\) (with zero mean).
  8. Special considerations for a time series of positive values:
    • Logarithm appears if \(std.dev(Y_t)\) is approximately proportional to \(E(Y_t).\) Then writing \(\mu_t := E(Y_t),\) one has \(\log(Y_t) = \log(\mu_t) + \log(Y_t/\mu_t) = \log(\mu_t) + \log (1 + (Y_t - \mu_t)/\mu_t) \approx \log(\mu_t) + (Y_t - \mu_t)/\mu_t\) so that \(std.dev(\log(Y_t))\) is approximately proportional to \(std.dev(Y_t)/\mu_t,\) which is constant.
    • Logarithm also appears in modelling the percentage change from one time period to the next.
    • Also, one can try a Power transformation (also probably called Box-Cox), which depends on a parameter \(\lambda\). The appropriate value for \(\lambda\) can come from theory/external information/data. To estimate \(\lambda\) from data one computes the log-likelihood assuming some (e.g. normal) distribution of the data, and then takes some \(\lambda\) which is close to the maximum (say within the 95% confidence interval).
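
The difference operator \(\nabla\) and the “integration” operator from the note to item 4 can be sketched on finite series as follows. The names `difference` and `integrate` are hypothetical, and since a finite series has no infinite past, the sum starting at \(-\infty\) is replaced here by an explicit starting value.

```python
def difference(series, d=1):
    """Apply the difference operator d times; each pass shortens the series by one."""
    out = list(series)
    for _ in range(d):
        out = [out[i] - out[i - 1] for i in range(1, len(out))]
    return out


def integrate(diffs, start):
    """Undo one differencing step: cumulative sums of diffs, starting from start."""
    out = [start]
    for w in diffs:
        out.append(out[-1] + w)
    return out


# A quadratic trend t^2 becomes constant after differencing twice.
squares = [t * t for t in range(1, 6)]   # 1, 4, 9, 16, 25
flat = difference(squares, d=2)          # [2, 2, 2]

# Integrating the first difference recovers the original series.
restored = integrate(difference(squares), squares[0])
```

The first example matches item 7: a deterministic polynomial trend of degree \(d\) is reduced to a constant by \(\nabla^d\).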

Chapter 6. Model Specification

  1. Autocorrelation function (ACF): for \(MA(q)\) processes vanishes for \(lag(k)\) for \(k > q.\)
  2. Partial autocorrelation function (PACF): for \(AR(p)\) processes vanishes for \(lag(k)\) for \(k > p.\)
  3. PACF of MA processes behaves similarly to ACF of AR processes.
  4. If an \(AR(p)\) model is correct, then the sample partial autocorrelations \(\hat \phi_{kk}\) at lags greater than \(p\) are approximately normally distributed with mean 0 and variance \(1/n\) (Quenouille, 1949). Thus for \(k > p\), \(\pm 2/\sqrt{n}\) can be used as critical limits on \(\hat \phi_{kk}\) to test the null hypothesis that an \(AR(p)\) model is correct.
  5. Extended ACF (EACF): infinite two dimensional array whose \((k,j)\)-th entry is \(0\) or a “nonzero”-symbol (e.g. \(\times\)) depending on whether the \(lag(j + 1)\) sample correlation of the residual after fitting an \(ARMA(k,j)\) model is “close to zero” or not.
  6. Approximate linear decay of the sample ACF is often taken as a symptom that the underlying time series is nonstationary and requires differencing.
  7. Differencing should be done in succession, and the principle of parsimony needs to be followed: models should be simple, but not too simple.
  8. Dickey-Fuller test and its variations can be used to quantify the evidence of nonstationarity.
  9. Orders of \(ARMA(p,q)\) seem to be estimated by AIC, BIC, or their variations, i.e. fit different \(ARMA(p,q)\) models and pick the one with the lowest AIC or BIC.
  10. Once the order is determined, subset models should be considered, e.g. via AIC or BIC.
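
Items 2 and 4 can be illustrated with the Levinson-Durbin recursion, a standard way of computing partial autocorrelations \(\phi_{kk}\) from autocorrelations \(\rho_1, \ldots, \rho_K\); that the book uses this exact recursion is an assumption here, and the function names are made up. Feeding in sample autocorrelations instead gives the sample PACF \(\hat\phi_{kk}\).

```python
import math


def pacf_from_acf(rho):
    """Given rho = [rho_1, ..., rho_K], return [phi_11, ..., phi_KK].

    Levinson-Durbin recursion:
      phi_kk  = (rho_k - sum_j phi_{k-1,j} rho_{k-j}) / (1 - sum_j phi_{k-1,j} rho_j)
      phi_k,j = phi_{k-1,j} - phi_kk * phi_{k-1,k-j},   j = 1, ..., k-1
    """
    pacf = []
    prev = []  # prev[j - 1] = phi_{k-1, j}
    for k in range(1, len(rho) + 1):
        num = rho[k - 1] - sum(prev[j - 1] * rho[k - j - 1] for j in range(1, k))
        den = 1.0 - sum(prev[j - 1] * rho[j - 1] for j in range(1, k))
        phi_kk = num / den
        cur = [prev[j - 1] - phi_kk * prev[k - j - 1] for j in range(1, k)]
        cur.append(phi_kk)
        pacf.append(phi_kk)
        prev = cur
    return pacf


def critical_limit(n):
    """The bound 2 / sqrt(n) used to judge sample partial autocorrelations."""
    return 2.0 / math.sqrt(n)


# AR(1) with phi = 0.5 has rho_k = 0.5^k, so its PACF vanishes after lag 1.
ar1_pacf = pacf_from_acf([0.5, 0.25, 0.125])
```

For a series of length \(n = 100\) the limits are \(\pm 0.2\), so a sample partial autocorrelation inside that band at a lag above \(p\) is consistent with an \(AR(p)\) model.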

Chapter 7. Parameter Estimation

  1. Method of moments: sometimes works well for \(AR(p)\) models (provided the parameters are not too close to the boundary of stationary regions), but in general not very well for \(MA(q)\) models.
  2. Least squares: also works better for \(AR(p)\) models. For \(MA(q)\) models the numerical estimate of the minimum involves some choices for the error of the initial time periods, which might have a nontrivial impact on the parameters/predictions for certain classes of models or if the order is not very small compared to the length of the time series.
  3. Maximum likelihood: involves knowledge/assumption about the distribution of the white noise terms; most common assumption is Gaussian. If the maximization of the likelihood is too complicated, one compromise is to minimize the unconditional sum of squares that appears in the likelihood function (under the assumption of Gaussian white noise).
  4. The variance of a method-of-moments estimator tends to be higher than the variance of the corresponding ML estimate; the (conditional) least squares estimates may have a similar level of variance as ML/unconditional least squares estimates.
  5. When the series is not very long, the bootstrap probably gives a more reliable estimate of the variance or, more generally, of the distribution of functions of the parameters, from which one can estimate any prescribed confidence interval for the parameters.
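
A small sketch of why the method of moments in item 1 is awkward for moving averages. For an \(MA(1)\) process \(Y_t = e_t - \theta e_{t-1}\) one has \(\rho_1 = -\theta/(1 + \theta^2)\), so matching \(\rho_1\) to the lag-1 sample autocorrelation \(r_1\) means solving a quadratic in \(\theta\). The function name is made up, and choosing the root with \(|\theta| \leq 1\) is a convention assumed for the example.

```python
import math


def ma1_method_of_moments(r1):
    """Solve r1 = -theta / (1 + theta^2) for the root with |theta| <= 1.

    Returns None when |r1| > 0.5, in which case no real solution exists.
    """
    if r1 == 0:
        return 0.0
    disc = 1.0 - 4.0 * r1 * r1
    if disc < 0:
        return None  # the moment equation cannot be matched
    return (-1.0 + math.sqrt(disc)) / (2.0 * r1)


theta_hat = ma1_method_of_moments(-0.4)   # approximately 0.5
no_solution = ma1_method_of_moments(0.6)  # None
```

Sample autocorrelations with \(|r_1| > 0.5\) do occur in practice, and then the estimate simply does not exist; for \(AR(1)\), by contrast, the method of moments estimate is just \(\hat\phi = r_1\).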

Chapter 8. Model Diagnostics

  1. Model diagnostics = testing goodness of fit, and if the fit is poor, suggesting appropriate modifications.
  2. Residual analysis: analysis of standardized residuals.
    1. Time series of standardized residuals: adequate model should lead to rectangular scatter of standardized residuals around a zero horizontal level with no trends.
    2. If outliers are present (which can be tested, e.g. using the Bonferroni criterion), further analysis is required.
    3. Quantile-Quantile plot of the standardized residuals: compare with the normal quantiles.
    4. Autocorrelation of residuals: even though actual residuals do not have autocorrelation (provided the model is correct), the sample residuals may be highly correlated. For larger lags however they are indeed approximately uncorrelated, and have variance approximately \(1/n\).
    5. Ljung-Box test can be applied to sum of squares of autocorrelations.
  3. Another diagnostic is to overfit a slightly more general model, then look at the new parameter and the change of the original parameters.
  4. While overfitting, one should not increase the order of \(AR\) and \(MA\) parts of an \(ARMA\) or \(ARIMA\) model simultaneously, since any \(ARMA(p,q)\) model can have many different representations as an \(ARMA(p+1, q+1)\) model.
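
The Ljung-Box statistic mentioned in item 2.5 is a weighted sum of squared residual autocorrelations, \(Q = n(n+2)\sum_{k=1}^K \hat r_k^2/(n-k)\). A minimal sketch, with a made-up function name and made-up input values; the residual autocorrelations are assumed to have been computed already.

```python
def ljung_box(residual_acf, n):
    """Return Q = n (n + 2) * sum_{k=1}^K r_k^2 / (n - k).

    residual_acf[k - 1] is the lag-k sample autocorrelation of the residuals,
    and n is the length of the series.
    """
    return n * (n + 2) * sum(
        r * r / (n - k) for k, r in enumerate(residual_acf, start=1)
    )


# Hypothetical residual autocorrelations at lags 1 and 2 for a series of length 100.
q_stat = ljung_box([0.1, -0.1], 100)
```

For a correctly specified \(ARMA(p,q)\) model, \(Q\) is compared with a chi-square distribution with \(K - p - q\) degrees of freedom; a large value suggests that the residuals are still autocorrelated.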

References