# BKK bound – an “elementary proof”

Some comments of Jonathan Korman made me realize that it would have been good to include in How Many Zeroes? an “elementary” proof of the Bernstein-Kushnirenko formula via Hilbert polynomials. This approach only yields a “weak” version of the bound, since it applies only when the number of solutions is finite. Nevertheless, it is a rewarding approach: it shows how useful it can be to interpret polynomials (or, more generally, regular functions on a variety) as linear sections after an appropriate embedding, so that the number of solutions of a system of polynomials becomes the degree of a variety; and it shows how the geometric notion of the degree of a variety can be read off from its Hilbert polynomial, an extremely fruitful algebraic tool discovered by Hilbert that now occupies a central role in algebraic geometry.

The following posts describe this approach:

1. The first introduces the degree of a projective variety.
2. The second describes the connection between the degree of a projective variety and its Hilbert polynomial.
3. The third proves the weak version of Kushnirenko’s formula using Hilbert polynomials, and sketches the derivation of Bernstein’s and Bézout’s formulae.

# Bernstein-Kushnirenko and Bézout’s theorems (weak version)

$\DeclareMathOperator{\conv}{conv} \newcommand{\dprime}{^{\prime\prime}} \DeclareMathOperator{\interior}{interior} \newcommand{\kk}{\mathbb{K}} \newcommand{\kstar}{\kk^*} \newcommand{\kstarn}{(\kk^*)^n} \newcommand{\kstarnn}[1]{(\kk^*)^{#1}} \DeclareMathOperator{\mv}{MV} \newcommand{\pp}{\mathbb{P}} \newcommand{\qq}{\mathbb{Q}} \newcommand{\rnonnegs}{\mathbb{R}^s_{\geq 0}} \newcommand{\rr}{\mathbb{R}} \newcommand{\scrA}{\mathcal{A}} \newcommand{\scrL}{\mathcal{L}} \newcommand{\scrP}{\mathcal{P}} \DeclareMathOperator{\supp}{supp} \DeclareMathOperator{\vol}{vol} \newcommand{\znonneg}{\mathbb{Z}_{\geq 0}} \newcommand{\znonnegs}{\mathbb{Z}^s_{\geq 0}} \newcommand{\zz}{\mathbb{Z}}$
In earlier posts we defined the degree of a projective variety (defined over an algebraically closed field $\kk$) and showed that it can be determined from its Hilbert polynomial. In this post we use these notions to

1. obtain Kushnirenko’s formula for the number of solutions of $n$ generic polynomials on $(\kk \setminus \{0\})^n$,
2. sketch a derivation of the formula in Bernstein’s theorem using Kushnirenko’s formula, and
3. deduce the formula in Bézout’s theorem from Bernstein’s formula.

For the proof of Kushnirenko’s formula we use a beautiful argument of Khovanskii (A. G. Khovanskii, Newton Polyhedron, Hilbert Polynomial, and Sums of Finite Sets, Functional Analysis and Its Applications, vol 26, 1992). Bernstein’s formula then follows from multi-additivity properties of intersection numbers and mixed volumes. The formula from Bézout’s theorem is in turn a special case of Bernstein’s formula. Note that the statements we prove in this post are “weak” forms of these formulae in the sense that we only consider the case that the intersection is finite. In the “strong” form (as e.g. in Sections VII.3 and VIII.3 of How Many Zeroes?) these provide an upper bound for the number of isolated solutions, even if the solution set contains non-isolated points.

Following the “toric” tradition, we write $\kstar$ to denote the “torus” $\kk \setminus \{0\}$. Consider the $n$-dimensional torus $\kstarn$ with coordinates $(x_1, \ldots, x_n)$. Every $\alpha = (\alpha_1, \ldots, \alpha_n) \in \zz^n$ corresponds to a monomial $x_1^{\alpha_1} \cdots x_n^{\alpha_n}$ which we denote by $x^\alpha$. A regular function on $\kstarn$, i.e. a Laurent polynomial in $(x_1, \ldots, x_n)$, is a polynomial in $x_1, x_1^{-1}, \ldots, x_n, x_n^{-1}$, and can be expressed as a sum
$f = \sum_\alpha c_\alpha x^\alpha$
such that $c_\alpha \neq 0$ for at most finitely many $\alpha \in \zz^n$; the support of $f$, denoted $\supp(f)$, is the finite set of all $\alpha$ such that $c_\alpha \neq 0$. We say that $f$ is supported at $\scrA$ if $\supp(f) \subseteq \scrA$.

Theorem 1 (Kushnirenko). Given a finite subset $\scrA$ of $\zz^n$, the number of solutions on $\kstarn$ of $n$ generic Laurent polynomials supported at $\scrA$, counted with appropriate multiplicity, is $n!$ times the volume of the convex hull of $\scrA$.
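As a quick numerical illustration of Theorem 1 in the plane ($n = 2$), here is a minimal Python sketch that evaluates $2!\,\vol(\conv(\scrA))$ exactly. The function names (`convex_hull`, `polygon_area`, `kushnirenko_count_2d`) are hypothetical helpers of ours; the sketch only evaluates the right-hand side of the formula and does not solve any polynomial system.

```python
from fractions import Fraction
from math import factorial


def convex_hull(points):
    """Vertices of conv(points) in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Exact area via the shoelace formula (0 for degenerate input)."""
    m = len(vertices)
    twice = sum(vertices[i][0] * vertices[(i + 1) % m][1]
                - vertices[(i + 1) % m][0] * vertices[i][1] for i in range(m))
    return Fraction(abs(twice), 2)


def kushnirenko_count_2d(A):
    """Right-hand side of Theorem 1 for n = 2: 2! * vol(conv(A)), A a finite subset of Z^2."""
    return factorial(2) * polygon_area(convex_hull(A))


print(kushnirenko_count_2d([(0, 0), (1, 0), (0, 1), (1, 1)]))  # prints 2
```

For the unit square the count is $2$, which matches the fact that two generic equations of the form $a + bx + cy + dxy = 0$ have two common zeros in $\kstarnn{2}$; for a collinear $\scrA$ the count is $0$.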

## Proof of Theorem 1: Step 1 – reduction to count of $|s\scrA|$

Let $\alpha_0, \ldots, \alpha_N$ be the elements of $\scrA$. Consider the map
$\phi_\scrA: \kstarn \to \pp^N$
which maps
$x \mapsto [x^{\alpha_0}: \cdots : x^{\alpha_N}]$
Let $X_\scrA$ be the closure of $\phi_\scrA(\kstarn)$ in $\pp^N$. A linear polynomial $\sum_i c_iz_i$, where $[z_0: \cdots :z_N]$ are homogeneous coordinates on $\pp^N$, restricts on the image of $\phi_\scrA$ to a Laurent polynomial $\sum_i c_ix^{\alpha_i}$ supported at $\scrA$. Since $\dim(X_\scrA) = \dim(\phi_\scrA(\kstarn)) \leq n$, a generic codimension $n$ linear subspace of $\pp^N$ either does not intersect $X_\scrA$ (when $\dim(X_\scrA) < n$), or, when $\dim(X_\scrA) = n$, intersects $X_\scrA$ at $\deg(X_\scrA)$ generic points on the image of $\phi_\scrA$, each with multiplicity $1$ (Theorem 4 of Degree of a projective variety). Since in the latter case, for a generic point $z$ on the image of $\phi_\scrA$ there are precisely $\deg(\phi_\scrA)$ points (counted with appropriate multiplicity) in $\phi_\scrA^{-1}(z)$, we have the following:

Observation 2. The number (counted with appropriate multiplicity) of solutions on $\kstarn$ of $n$ generic Laurent polynomials supported at $\scrA$ is either zero (when $\dim(\phi_\scrA(\kstarn)) < n$), or the degree of $X_\scrA$ times the degree of the map $\phi_\scrA$ (when $\dim(\phi_\scrA(\kstarn)) = n$).

Claim 3. Let $G$ be the subgroup of $\zz^n$ generated by $\scrA - \alpha_0 = \{\alpha_i - \alpha_0: i = 0, \ldots, N\}$. Then

1. there is $r \geq 0$, called the rank of $G$, such that $G \cong \zz^r$,
2. $\dim(\phi_\scrA(\kstarn)) = r$,
3. if $r = n$, then $\deg(\phi_\scrA)$ is the index of $G$ (as a subgroup) in $\zz^n$.

Proof. In the coordinates $(z_1/z_0, \ldots, z_N/z_0)$ on $U_0 := \pp^N \setminus V(z_0)$, the map $\phi_\scrA$ reduces to the map $\kstarn \to \kstarnn{N}$ given by
$x \mapsto (x^{\alpha_1 - \alpha_0}, \ldots, x^{\alpha_N - \alpha_0})$
Now Claim 3 follows from Proposition VI.1 of How Many Zeroes? (the main ingredient in the proof is the observation that there are a basis $\beta_1, \ldots, \beta_n$ of $\zz^n$, some $r \geq 0$ and positive integers $m_1, \ldots, m_r$ such that $m_1\beta_1, \ldots, m_r\beta_r$ is a basis of $G$).
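Part 3 of Claim 3 can be made concrete. The following Python sketch (hypothetical helper names) computes the index of $G$ in $\zz^n$ using the standard fact, a consequence of the Smith normal form, that the index of a full-rank subgroup of $\zz^n$ equals the gcd of the $n \times n$ minors of any integer matrix whose columns generate it. It returns `None` when the rank of $G$ is smaller than $n$.

```python
from fractions import Fraction
from itertools import combinations
from math import gcd


def exact_det(matrix):
    """Exact determinant of a square integer matrix (Gaussian elimination over Q)."""
    m = [[Fraction(x) for x in row] for row in matrix]
    size = len(m)
    result = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return 0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return int(result)


def index_of_G(A):
    """[Z^n : G] for G generated by alpha_i - alpha_0 (alpha_i in A), or None if rank(G) < n."""
    a0 = A[0]
    n = len(a0)
    gens = [tuple(a - b for a, b in zip(alpha, a0)) for alpha in A[1:]]
    g = 0
    for cols in combinations(gens, n):
        minor = [[v[i] for v in cols] for i in range(n)]  # columns = chosen generators
        g = gcd(g, abs(exact_det(minor)))
    return g if g else None


print(index_of_G([(0,), (2,)]))  # prints 2
```

For $\scrA = \{0, 2\} \subseteq \zz$ the index is $2$, in agreement with the fact that $x \mapsto [1 : x^2]$ is $2$-to-$1$ onto its image.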

In the case that $r < n$, the affine span of $\scrA$ has dimension $r < n$, so the volume of the convex hull $\conv(\scrA)$ of $\scrA$ is zero, and therefore Theorem 1 follows from Observation 2 and Claim 3. So from now on, assume $r = n$. Due to Observation 2 and Claim 3, in order to prove Theorem 1 it remains to compute $\deg(X_\scrA)$. Let $R = \kk[z_0, \ldots, z_N]/I(X_\scrA)$ be the homogeneous coordinate ring of $X_\scrA$. The theory of Hilbert polynomials (Theorems 1 and 2 of the preceding post) implies the following:

Observation 4. For $s \gg 1$, the dimension (as a vector space over $\kk$) of the degree $s$ graded component $R_s$ of $R$ is given by a polynomial $P_\scrA(s)$ of the form
$P_\scrA(s) = \deg(X_\scrA)\frac{s^n}{n!} + \text{terms of degree} < n$
In particular,
$\deg(X_\scrA) = n! \lim_{s \to \infty} \frac{\dim_\kk(R_s)}{s^n}$

Claim 5. For each $s \geq 0$,
$\dim_\kk(R_s) = |s\scrA|$
where $s\scrA$ is the set of all possible $s$-fold sums $\alpha_{i_1} + \cdots + \alpha_{i_s}$ with $\alpha_{i_j} \in \scrA$.

Proof. Indeed, for each $s \geq 0$, the substitutions $z_i = x^{\alpha_i}$, $i = 0, \ldots, N$, induce a $\kk$-linear map
$\sigma:\kk[z_0, \ldots, z_N]_s \to \scrL_{s\scrA}$
where $\kk[z_0, \ldots, z_N]_s$ is the set of homogeneous polynomials of degree $s$ in $(z_0, \ldots, z_N)$ and $\scrL_{s\scrA}$ is the set of Laurent polynomials supported at $s\scrA$. It is straightforward to check that $\sigma$ is surjective and $\ker(\sigma)$ is the set $I(X_\scrA)_s$ of homogeneous polynomials of degree $s$ in $I(X_\scrA)$. Consequently, as a vector space over $\kk$, $R_s$ is isomorphic to $\scrL_{s\scrA}$. In particular,
$\dim_\kk(R_s) = \dim_\kk(\scrL_{s\scrA}) =|s\scrA|$
which completes the proof of Claim 5.
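Claim 5 turns the computation of $\dim_\kk(R_s)$ into counting points. A minimal Python sketch for computing $s\scrA$ (the name `s_fold_sums` is ours):

```python
def s_fold_sums(A, s):
    """The set sA of all sums alpha_{i_1} + ... + alpha_{i_s} with each alpha_{i_j} in A (A in Z^n)."""
    n = len(A[0])
    sums = {(0,) * n}  # 0A consists of the origin only
    for _ in range(s):
        sums = {tuple(x + y for x, y in zip(p, a)) for p in sums for a in A}
    return sums


print([len(s_fold_sums([(0,), (1,), (2,)], s)) for s in range(1, 6)])  # [3, 5, 7, 9, 11]
```

For $\scrA = \{0, 1, 2\}$ we get $|s\scrA| = 2s + 1$, so Observation 4 gives $\deg(X_\scrA) = 2$ (the conic $[1 : x : x^2]$). For $\scrA = \{0, 2\}$ we get $|s\scrA| = s + 1$, so $\deg(X_\scrA) = 1$, and by Observation 2 the number of solutions is $\deg(X_\scrA)\deg(\phi_\scrA) = 1 \cdot 2 = 2$.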

## Proof of Theorem 1: Step 2 – $|s\scrA|$ in terms of $\vol(\conv(\scrA))$

The following lemma is a special case of Proposition 3.7 from Discriminants, Resultants, and Multidimensional Determinants by Gelfand, Kapranov and Zelevinsky.

Lemma 6. Let $\scrP$ be the closure of a bounded open subset of $\rr^n$ such that $\scrP$ contains the origin and the boundary of $\scrP$ is piecewise linear. Then
$\lim_{s \to \infty} \frac{|s\scrP \cap \zz^n|}{s^n} = \vol(\scrP)$
where by $s\scrP$ we denote the “homothetic” set $\{s\alpha : \alpha \in \scrP\}$.

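Proof. We proceed by induction on $n$. If $n = 1$, then $\scrP$ is a finite union of, say, $k$ disjoint closed intervals of total length $l$, and the lemma follows from the observation that (since a closed interval of length $L$ contains between $L - 1$ and $L + 1$ integers)
$sl - k \leq |s\scrP \cap \zz| \leq sl + k$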
In the general case we associate to each point $\alpha$ in $\zz^n$ a “lattice cube” which is the translation $\alpha + I^n$ of the standard cube
$I^n := \{(x_1, \ldots, x_n) \in \rr^n: 0 \leq x_i \leq 1\ \text{for each}\ i\}$
Let $N(s)$ be the number of lattice cubes contained in $s\scrP$. Since the diameter of $I^n$ is $\sqrt{n}$, it follows that
$0 \leq |s\scrP \cap \zz^n| - N(s) \leq a(s)$
where $a(s)$ is the number of elements in $\zz^n \cap s\scrP$ with distance $\leq \sqrt{n}$ from the boundary of $s\scrP$. Similarly, since the lattice cubes have volume 1 and pairwise disjoint interiors,
$0 \leq \vol(s\scrP) - N(s) \leq b(s)$
where $b(s)$ is the volume of the set of all points in $s\scrP$ with distance $\leq \sqrt{n}$ from the boundary of $s\scrP$. It follows from the induction hypothesis and basic properties of volume that both $a(s)$ and $b(s)$ are bounded by a constant times $s^{n-1}$ as $s \to \infty$. Consequently,
$\lim_{s \to \infty} \frac{|s\scrP \cap \zz^n|}{s^n} =\lim_{s \to \infty} \frac{\vol(s\scrP)}{s^n} =\lim_{s \to \infty} \frac{s^n\vol(\scrP)}{s^n} = \vol(\scrP)$
which completes the proof of the lemma.
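To see Lemma 6 in a concrete case, the following Python sketch (an illustration of ours, not part of the proof) counts the lattice points of $s\scrP$ for the triangle $\scrP = \conv\{(0,0), (2,0), (0,1)\}$ of area $1$. Here $|s\scrP \cap \zz^2| = (s+1)^2$, so the ratio approaches $\vol(\scrP) = 1$, with an error of order $1/s$ coming from the points near the boundary.

```python
def lattice_points_in_dilate(s):
    """|sP ∩ Z^2| for P = conv{(0,0), (2,0), (0,1)}: lattice points with x, y >= 0 and x + 2y <= 2s."""
    return sum(1 for y in range(s + 1) for x in range(2 * s + 1) if x + 2 * y <= 2 * s)


for s in (10, 100, 400):
    count = lattice_points_in_dilate(s)
    print(s, count, count / s ** 2)  # the ratio tends to vol(P) = 1
```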

We formally state a corollary of the observation that $a(s)$ in the proof of Lemma 6 is bounded by a constant times $s^{n-1}$ as $s \to \infty$ (the same argument works with $\sqrt{n}$ replaced by any fixed $r \geq 0$).

Proposition 7. Let $\scrP$ and $s\scrP$ be as in Lemma 6. For a fixed $r \geq 0$, let $\scrP'_{s, r}$ be the subset of $s\scrP$ consisting of all points with distance $\leq r$ from the boundary of $s\scrP$. Then
$\lim_{s \to \infty} \frac{|\scrP'_{s, r} \cap \zz^n|}{s^n} = 0$

The main idea of the proof of the following lemma is the same as that of the standard proof of Gordan’s lemma.

Lemma 8. Pick $\alpha_1, \ldots, \alpha_q \in \zz^n$ such that the subgroup generated by the $\alpha_i$ coincides with $\zz^n$. Then there is a constant $C$ with the following property: for every collection of real numbers $\lambda_1, \ldots, \lambda_q$ such that $\sum_i \lambda_i \alpha_i \in \zz^n$, there are integers $l_1, \ldots, l_q$ such that
\begin{align*} &\sum_i l_i\alpha_i = \sum_i \lambda_i \alpha_i,\ \text{and} \\ &\sum_i |\lambda_i - l_i| < C \end{align*}

Proof. Let
$\scrP := \{\sum_i \lambda_i\alpha_i: 0 \leq \lambda_i \leq 1\} \subseteq \rr^n$
Since $\scrP$ is compact and $\zz^n$ is discrete, the intersection $\scrP \cap \zz^n$ is finite. Since the $\alpha_i$ generate $\zz^n$, for each $\alpha \in \scrP \cap \zz^n$, we can fix a representation $\alpha = \sum_i l_i(\alpha)\alpha_i$ with $l_i(\alpha) \in \zz$. Now given $\beta \in \zz^n$ such that $\beta = \sum_i \lambda_i \alpha_i$ with $\lambda_i \in \rr$, the point
$\alpha := \sum_i (\lambda_i - \lfloor \lambda_i \rfloor) \alpha_i = \beta - \sum_i \lfloor \lambda_i \rfloor \alpha_i \in \scrP \cap \zz^n$
where $\lfloor \lambda_i \rfloor$ is the greatest integer $\leq \lambda_i$. Consequently,
$\beta = \sum_{i=1}^q \lfloor \lambda_i \rfloor \alpha_i + \alpha = \sum_{i=1}^q (\lfloor \lambda_i \rfloor + l_i(\alpha)) \alpha_i$
with
$\sum_{i=1}^q |\lambda_i - \lfloor \lambda_i \rfloor - l_i(\alpha)| \leq \sum_{i=1}^q |\lambda_i - \lfloor \lambda_i \rfloor| + \sum_{i=1}^q|l_i(\alpha)| < q + Q$
where $Q$ is the maximum of $\sum_i|l_i(\alpha)|$ over all $\alpha \in \scrP \cap \zz^n$ (the first sum is $< q$ because $0 \leq \lambda_i - \lfloor \lambda_i \rfloor < 1$). Therefore the lemma holds with $C := q + Q$.
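The construction in the proof of Lemma 8 is effective. The following Python sketch carries it out for exact rational coefficients $\lambda_i$; the helper names are hypothetical, and finding $l_i(\alpha)$ by a brute-force search over a small box is our simplification, adequate only for tiny examples.

```python
from fractions import Fraction
from itertools import product
from math import floor


def combine(coeffs, alphas):
    """sum_i coeffs[i] * alphas[i], as a tuple."""
    return tuple(sum(c * a[k] for c, a in zip(coeffs, alphas)) for k in range(len(alphas[0])))


def integer_representation(target, alphas, bound=10):
    """Some l in Z^q with sum_i l_i alpha_i = target, minimising sum_i |l_i|.
    Brute force over |l_i| <= bound: an assumption that only suits tiny examples."""
    best = None
    for l in product(range(-bound, bound + 1), repeat=len(alphas)):
        if combine(l, alphas) == tuple(target):
            if best is None or sum(map(abs, l)) < sum(map(abs, best)):
                best = l
    if best is None:
        raise ValueError("no integer representation within the search bound")
    return list(best)


def round_coefficients(lambdas, alphas):
    """The construction from the proof of Lemma 8, for exact (Fraction) coefficients."""
    floors = [floor(lam) for lam in lambdas]
    # alpha = sum frac(lambda_i) alpha_i lies in P; it is a lattice point because
    # beta = sum lambda_i alpha_i and sum floor(lambda_i) alpha_i are.
    alpha = combine([lam - f for lam, f in zip(lambdas, floors)], alphas)
    if any(Fraction(x).denominator != 1 for x in alpha):
        raise ValueError("sum_i lambda_i alpha_i must be a lattice point")
    l_alpha = integer_representation([int(x) for x in alpha], alphas)
    return [f + l for f, l in zip(floors, l_alpha)]


alphas = [(2,), (3,)]                       # 2 and 3 generate Z
lambdas = [Fraction(7, 2), Fraction(1, 3)]  # sum_i lambda_i alpha_i = 8
l = round_coefficients(lambdas, alphas)
print(l, combine(l, alphas))                # [4, 0] (8,)
```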

Theorem 9. Let $\scrA = \{\alpha_1, \ldots, \alpha_q\}$ be a finite subset of $\zz^n$ and $\scrP := \conv(\scrA) \subseteq \rr^n$. Assume the subgroup of $\zz^n$ generated by all the pairwise differences $\alpha_i - \alpha_j$ coincides with $\zz^n$. Then there is $r \in \rr$ with the following property: for each positive integer $s$, every element of $s\scrP \cap \zz^n$ whose distance from the boundary of $s\scrP$ is greater than or equal to $r$ belongs to the set $s\scrA$ of all $s$-fold sums $\alpha_{i_1} + \cdots + \alpha_{i_s}$ of elements from $\scrA$.

Proof. Fix a nonnegative integer $s$. We want to understand which points of $s\scrP \cap \zz^n$ can be represented as elements of $s\scrA$. Replacing each $\alpha_i$ by $\alpha_i - \alpha_1$ if necessary (this translates both $s\scrA$ and $s\scrP$ by the same vector $-s\alpha_1$), we may assume that one of the $\alpha_i$ is the origin. Then
$s\scrA = \{\sum_i l_i\alpha_i:\ l_i \in \znonneg,\ \sum_i l_i \leq s\}$
Moreover, the $\alpha_i$ satisfy the hypothesis of Lemma 8. Let $C$ be the constant prescribed by Lemma 8. If $\alpha \in s\scrP \cap \zz^n$ then
$\alpha = \sum_i \lambda_i \alpha_i$
with nonnegative real numbers $\lambda_i$ such that $\sum_i \lambda_i \leq s$. By Lemma 8, there are $l_i \in \zz$ such that
$\alpha = \sum_i l_i \alpha_i,\ \text{and}\ \sum_i|\lambda_i - l_i| < C$
Consequently, if the $\lambda_i$ further satisfy the following inequalities:
$\lambda_i \geq C\ \text{for each}\ i,\ \text{and}\ \sum_i \lambda_i \leq s - C$
then the condition $\sum_i|\lambda_i - l_i| < C$ implies that
$l_i > 0\ \text{for each}\ i,\ \text{and}\ \sum_i l_i < s$
i.e. $\alpha \in s\scrA$. In short, we proved the following:

Observation 10. Let $\scrP_{s,C}$ be the set of all $\beta \in s\scrP$ that can be written as $\beta = \sum_i \lambda_i \alpha_i$ with each $\lambda_i \geq C$ and $\sum_i \lambda_i \leq s - C$. Then
$\scrP_{s,C} \cap \zz^n \subseteq s\scrA$

Now we go back to the proof of Theorem 9. Consider the map $\pi: \rr^q \to \rr^n$ that maps the $i$-th standard unit element in $\rr^q$ to $\alpha_i$ for each $i$. Let
$\Delta := \{(\lambda_1, \ldots, \lambda_q): \lambda_i \geq 0\ \text{for each}\ i,\ \sum_i \lambda_i \leq 1\} \subseteq \rr^q$
be the standard simplex in $\rr^q$. Let $\Delta_{s,C}$ be the sub-simplex of $s\Delta$ defined by the inequalities $\lambda_i \geq C$ for each $i$, and $\sum_i \lambda_i \leq s - C$. Then
$\pi(s\Delta) = s\scrP,\ \text{and}\ \pi(\Delta_{s, C}) = \scrP_{s,C}$
Note that every point in $s\Delta$ whose distance from the boundary of $s\Delta$ is $\geq C\sqrt{q}$ belongs to $\Delta_{s,C}$. Khovanskii then argues (in the proof of Theorem 3 of Newton Polyhedron, Hilbert Polynomial, and Sums of Finite Sets) that this implies that “every point in $s\scrP$ whose distance from the boundary of $s\scrP$ is $\geq C\sqrt{q}||\pi||$ is in $\scrP_{s, C}$”, where
$||\pi|| := \sup_{v \neq 0} \frac{||\pi(v)||}{||v||}$
is the “norm” of $\pi$, which also equals the largest singular value of $\pi$. However, it is not clear how this conclusion follows from the preceding statement: it is possible to have a point $\alpha'$ close to the boundary of $s\Delta$ such that $\pi(\alpha')$ has large distance from the boundary of $s\scrP$. For example, if one of the $\alpha_i$ is in the interior of $\scrP$, then $se_i$ is in the boundary of $s\Delta$, but $\pi(se_i) = s\alpha_i$ is far away from the boundary of $s\scrP$ when $s$ is large. We are going to close this gap with a more careful modification of the argument.

Indeed, consider the collection $S$ of all sets $\scrA' \subseteq \scrA$ such that $\conv(\scrA')$ intersects the interior of $\scrP$. For each such $\scrA'$, we fix one element
$\alpha_{\scrA'} \in \conv(\scrA') \cap \interior(\scrP)$
and two representations of $\alpha_{\scrA'}$ as convex combinations of $\scrA'$ and $\scrA$:
$\alpha_{\scrA'} = \sum_{\alpha_i \in \scrA'} \epsilon'_{\scrA', i} \alpha_i = \sum_{\alpha_i \in \scrA} \epsilon_{\scrA', i} \alpha_i$
where
$\sum_{\alpha_i \in \scrA} \epsilon_{\scrA', i} = \sum_{\alpha_i \in \scrA'} \epsilon'_{\scrA', i} = 1$
and in addition, all $\epsilon_{\scrA', i}$ and $\epsilon'_{\scrA', j}$ are positive. (This is possible: since $\interior(\scrP)$ is open, we can choose $\alpha_{\scrA'}$ in the relative interior of $\conv(\scrA')$, which yields positive $\epsilon'_{\scrA', j}$; and every point of $\interior(\scrP)$ is a convex combination of all elements of $\scrA$ with positive coefficients.) For each $\scrA'$, we also fix a large constant $C_{\scrA'}$, whose precise value we specify a bit further down. Consider a nonnegative $\rr$-linear combination
$\alpha= \sum_i \lambda_i \alpha_i \in s\scrP$
such that
$\sum_i \lambda_i \leq s$
Assume that the $\lambda_i$ do not satisfy the defining conditions of $\scrP_{s,C}$ from Observation 10, i.e. either some $\lambda_i < C$ or $\sum_i \lambda_i > s - C$. We want to find conditions under which it is possible to find a different representation of $\alpha$ that does satisfy these conditions, so that $\alpha \in \scrP_{s,C}$ after all.

Claim 11. Pick $\scrA' \in S$. If $C_{\scrA'}$ is chosen sufficiently large (independently of $s$), then the following holds: if $\lambda_i > \epsilon'_{\scrA',i}C_{\scrA'}$ for each $\alpha_i \in \scrA'$, then there is a representation
$\alpha = \sum_i \lambda'_i \alpha_i$
with $\lambda'_i \geq C$ for each $i$, and $\sum_i \lambda'_i \leq s - C$.

Proof. Assume $\lambda_i > \epsilon'_{\scrA',i}C_{\scrA'}$ for each $\alpha_i \in \scrA'$. Then
\begin{align*} \alpha &= \sum_{\alpha_i \in \scrA'} (\lambda_i - \epsilon'_{\scrA',i}C_{\scrA'}) \alpha_i + C_{\scrA'} \sum_{\alpha_i \in \scrA'} \epsilon'_{\scrA', i} \alpha_i + \sum_{\alpha_i \not\in \scrA'}\lambda_i \alpha_i \\ &= \sum_{\alpha_i \in \scrA'} (\lambda_i - \epsilon'_{\scrA',i}C_{\scrA'}) \alpha_i + C_{\scrA'}\sum_{\alpha_i \in \scrA} \epsilon_{\scrA', i} \alpha_i + \sum_{\alpha_i \not\in \scrA'}\lambda_i \alpha_i \\ &= \sum_{\alpha_i \in \scrA'} ((\lambda_i - \epsilon'_{\scrA',i}C_{\scrA'}) + \epsilon_{\scrA', i}C_{\scrA'}) \alpha_i + \sum_{\alpha_i \not\in \scrA'}(\lambda_i + \epsilon_{\scrA',i}C_{\scrA'}) \alpha_i \end{align*}
If $C_{\scrA'}$ is chosen such that
$\epsilon_{\scrA', i}C_{\scrA'} \geq 2C\ \text{for each}\ \alpha_i \in \scrA$
then in the final representation of $\alpha$ above, each coefficient is $\geq 2C \geq C$, so to ensure that $\alpha \in \scrP_{s, C}$ it only remains to bound the sum of the coefficients by $s - C$. However, that is easy: first note that the above change of representation leaves the sum of the coefficients unchanged, since we replace $C_{\scrA'}$ times a convex combination (whose coefficients sum to $1$) by $C_{\scrA'}$ times another convex combination. Denote the coefficients of the final representation of $\alpha$ above by $\tilde \lambda_i$. Since we originally had $\sum_i \lambda_i \leq s$, after the change of representation we have
$\sum_i \tilde \lambda_i \leq s,\ \text{and}\ \tilde \lambda_i \geq 2C\ \text{for each}\ i$
Now recall that there is $i_0$ such that $\alpha_{i_0}$ is the origin. Consequently we can simply reduce $\tilde \lambda_{i_0}$ to $\tilde \lambda_{i_0} - C$ to satisfy all defining conditions of $\scrP_{s, C}$. This completes the proof of Claim 11.

Now we are ready to prove Theorem 9. Replacing $\scrA$ by $\scrA - \alpha_i$ for some $i$ if necessary, we may assume that the origin is a vertex of $\scrP$. Due to Observation 10, it suffices to show that there is $r$ independent of $s$ such that all elements of $s\scrP \setminus \scrP_{s, C}$ have distance smaller than $r$ from the boundary of $s\scrP$. Pick $\alpha \in s\scrP \setminus \scrP_{s, C}$ and a representation
$\alpha = \sum_i \lambda_i \alpha_i$
such that each $\lambda_i$ is nonnegative and $\sum_i \lambda_i \leq s$. Since $\alpha \not\in \scrP_{s, C}$, Claim 11 implies that for each subset $\scrA'$ of $\scrA$ such that $\conv(\scrA')$ intersects the interior of $\scrP$, there is at least one $\alpha_i \in \scrA'$ with
$\lambda_i \leq \epsilon'_{\scrA', i}C_{\scrA'}$
Consequently, if
$C' := 1 + \max_{\scrA', i} \epsilon'_{\scrA', i} C_{\scrA'}$
(the extra $1$ ensures that each such $\scrA'$ contains some $\alpha_i$ with $\lambda_i < C'$, and that $C' > \epsilon'_{\scrA', i} C_{\scrA'}$ strictly)
then there are two possible cases:

Case 1: $\lambda_i < C'$ for each $i$. Since by our assumption the origin is a vertex of $s\scrP$ (in particular, a point of its boundary), the distance of $\alpha$ from the boundary of $s\scrP$ is at most $||\alpha||$, which is bounded by
$r_1 := \max\{||\sum_i \epsilon_i\alpha_i || : 0 \leq \epsilon_i \leq C'\}$

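Case 2: there is $i$ such that $\lambda_i \geq C'$. By the choice of $C'$ and Claim 11, no set in $S$ consists only of elements $\alpha_i$ with $\lambda_i \geq C'$. Hence the convex hull of the set of all $\alpha_i$ with $\lambda_i \geq C'$ does not intersect the interior of $\scrP$, and since a convex subset of the boundary of a convex polytope lies in one of its proper faces, all such $\alpha_i$ are contained in a proper face $\scrP'$ of $\scrP$. We consider two subcases: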

Case 2.1: $\scrP'$ can be chosen to contain the origin. In this case $s'\scrP' \subseteq s\scrP'$ whenever $0 \leq s' \leq s$, and consequently,
$\alpha' := \sum_{\alpha_i \in \scrP'} \lambda_i \alpha_i \in s\scrP'$
In particular, $\alpha'$ is on the boundary of $s\scrP$, and since $\lambda_i < C'$ for every $\alpha_i \not\in \scrP'$,
$||\alpha - \alpha'|| = || \sum_{\alpha_i \not\in \scrP'}\lambda_i\alpha_i|| \leq r_1$

Case 2.2: no proper face of $\scrP$ that contains all $\alpha_i$ with $\lambda_i \geq C'$ contains the origin; in particular, $\scrP'$ does not contain the origin. Let $\scrA^* \subseteq \scrP'$ be the set of all $\alpha_i$ such that $\lambda_i \geq C'$. We are going to show that
$\sum_{\alpha_i \in \scrA^*} \lambda_i \geq s - qC'$
(recall that $q := |\scrA|$). Indeed, otherwise we would have
$\sum_i \lambda_i = \sum_{\alpha_i \in \scrA^*} \lambda_i + \sum_{\alpha_i \not\in \scrA^*} \lambda_i < s - qC' + (q-1)C' = s - C'$
Consequently, if $\alpha_{i_0}$ is the origin, then we have another representation for $\alpha$ for which the sum of the coefficients is $\leq s$:
$\alpha = \sum_{i \neq i_0} \lambda_i \alpha_i + (\lambda_{i_0} + C')\alpha_{i_0}$
However, by the assumption of Case 2.2, no proper face of $\scrP$ contains $\scrA' := \scrA^* \cup \{\alpha_{i_0}\}$, so $\conv(\scrA')$ intersects the interior of $\scrP$ (again because a convex subset of the boundary of $\scrP$ lies in a proper face), i.e. $\scrA'$ is one of the sets considered in Claim 11. Since every $\alpha_i \in \scrA'$ has coefficient $\geq C' > \epsilon'_{\scrA', i}C_{\scrA'}$ in the new representation, Claim 11 would imply that $\alpha \in \scrP_{s, C}$, which contradicts the choice of $\alpha$. This proves that
$\sum_{\alpha_i \in \scrA^*} \lambda_i \geq s - qC'$
Pick $\alpha_{i'} \in \scrA \cap \scrP'$ and let
$\alpha' := \sum_{\alpha_i \in \scrP'} \lambda_i \alpha_i + (s - \sum_{\alpha_i \in \scrP'} \lambda_i) \alpha_{i'} \in s\scrP'$
Then $\alpha'$ is in the boundary of $s\scrP$, and since $0 \leq s - \sum_{\alpha_i \in \scrP'} \lambda_i \leq qC'$ and $\lambda_i < C'$ for every $\alpha_i \not\in \scrP'$,
\begin{align*} ||\alpha - \alpha'|| &= || \sum_{\alpha_i \not\in \scrP'}\lambda_i\alpha_i - (s - \sum_{\alpha_i \in \scrP'} \lambda_i) \alpha_{i'} || \leq r \end{align*}
where
$r := \max\{||\sum_i \epsilon_i\alpha_i || : -qC' \leq \epsilon_i \leq C'\}$

Since $r \geq r_1$, we have shown that in all cases the distance of $\alpha$ from the boundary of $s\scrP$ is $\leq r$; hence Theorem 9 holds with any constant strictly larger than $r$, which completes its proof.
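Theorem 9 can be probed numerically in dimension one. In the following Python sketch (hypothetical helper names, examples of our own choosing), $\scrA = \{0, 2, 3\}$ and $\scrA = \{0, 3, 4\}$ satisfy the hypothesis of Theorem 9, and the lattice points of $s\scrP$ missing from $s\scrA$ stay within a bounded distance of the endpoint $0$ of $s\scrP$, however large $s$ is.

```python
def s_fold_sums(A, s):
    """sA for a finite subset A of Z."""
    sums = {0}
    for _ in range(s):
        sums = {p + a for p in sums for a in A}
    return sums


def missing_points(A, s):
    """Lattice points of s * conv(A) that do not belong to sA."""
    lo, hi = s * min(A), s * max(A)
    return sorted(set(range(lo, hi + 1)) - s_fold_sums(A, s))


for s in (5, 20, 60):
    print(s, missing_points([0, 2, 3], s), missing_points([0, 3, 4], s))
```

For these two sets the missing points are $\{1\}$ and $\{1, 2, 5\}$ respectively (for $s \geq 2$), i.e. exactly the integers that are not nonnegative combinations of $2, 3$, resp. $3, 4$.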

The following result describes the connection between $|s\scrA|$ and the volume of the convex hull of $\scrA$ that we were aiming for in this section:

Corollary 12. Let $\scrA = \{\alpha_1, \ldots, \alpha_q\}$ be a finite subset of $\zz^n$. Assume the subgroup $G$ of $\zz^n$ generated by all the pairwise differences $\alpha_i - \alpha_j$ is a subgroup of finite index $m$ in $\zz^n$. Then
$\lim_{s \to \infty} \frac{|s\scrA|}{s^n} = \frac{1}{m}\vol(\conv(\scrA))$
where $s\scrA$ is the set of all $s$-fold sums $\alpha_{i_1} + \cdots + \alpha_{i_s}$ of elements from $\scrA$.

Proof. In the case that $m = 1$, the result follows immediately from combining Lemma 6, Proposition 7 and Theorem 9 (note that $s\scrA \subseteq s\scrP \cap \zz^n$, where $\scrP := \conv(\scrA)$). In the general case, after translating $\scrA$ by $-\alpha_1$ (which changes neither $|s\scrA|$ nor the volume of $\conv(\scrA)$) we may assume that $\scrA \subseteq G$. There are a basis $\beta_1, \ldots, \beta_n$ of $\zz^n$ and positive integers $m_1, \ldots, m_n$ such that $m_1\beta_1, \ldots, m_n\beta_n$ is a basis of $G$ with
$m = \prod_i m_i$
Note that $\beta_1, \ldots, \beta_n$ is also a basis of $\rr^n$, so the $\rr$-linear map $\psi:\rr^n \to \rr^n$ that sends
$\beta_i \mapsto \frac{1}{m_i}\beta_i$
is well defined. Note that $\psi$ is an isomorphism over $\rr$ and maps $G$ onto $\zz^n$; in particular $|s\psi(\scrA)| = |s\scrA|$, and the pairwise differences of elements of $\psi(\scrA)$ generate $\zz^n$. The $m = 1$ case of the corollary then implies that
$\lim_{s \to \infty} \frac{|s\scrA|}{s^n} = \vol(\conv(\psi(\scrA)))$
Now note that the change of coordinates between the standard basis of $\rr^n$ and $\beta_1, \ldots, \beta_n$ is given by a matrix with determinant $\pm 1$ (since it is a matrix with integer entries whose inverse also has integer entries), and therefore it preserves volumes. However, after this change of coordinates $\psi$ corresponds to a diagonal matrix with diagonal $(1/m_1, \ldots, 1/m_n)$ and determinant $1/\prod_i m_i$. Consequently,
$\vol(\conv(\psi(\scrA))) = \vol(\conv(\scrA))/\prod_i m_i = \vol(\conv(\scrA))/m$
which completes the proof of Corollary 12.
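A numerical check of Corollary 12 with $m > 1$, on an example of our own choosing: for $\scrA = \{(0,0), (2,0), (0,1)\}$ the pairwise differences generate the subgroup $2\zz \times \zz$ of index $m = 2$, $\conv(\scrA)$ has area $1$, and $|s\scrA| = (s+1)(s+2)/2$, so $|s\scrA|/s^2$ should approach $1/2$. A minimal Python sketch (the helper name is ours):

```python
from fractions import Fraction


def s_fold_sums(A, s):
    """sA for a finite subset A of Z^2."""
    sums = {(0, 0)}
    for _ in range(s):
        sums = {(p[0] + a[0], p[1] + a[1]) for p in sums for a in A}
    return sums


A = [(0, 0), (2, 0), (0, 1)]  # vol(conv(A)) = 1, index m = 2
for s in (10, 40, 80):
    size = len(s_fold_sums(A, s))
    print(s, size, float(Fraction(size, s * s)))  # tends to vol(conv(A)) / m = 1/2
```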

## Proof of Theorem 1: Step 3 – conclusion

Given a finite subset $\scrA$ of $\zz^n$, let $M$ be the number of solutions on $\kstarn$ of $n$ generic Laurent polynomials supported at $\scrA$, counted with appropriate multiplicity. Let $G$ be the subgroup of $\zz^n$ generated by all the pairwise differences of elements of $\scrA$. As explained in the paragraph preceding Observation 4, if the rank of $G$ is smaller than $n$, then both $M$ and the volume of the convex hull of $\scrA$ are zero, so that Theorem 1 is true. On the other hand, if the rank equals $n$, then the index of $G$ in $\zz^n$ is finite; call it $m$. Then Observation 2, Claim 3, Observation 4 and Claim 5 imply that
$M = n!m\lim_{s \to \infty} \frac{|s\scrA|}{s^n}$
Corollary 12 then implies that
$M = n! \vol(\conv(\scrA))$
which proves Theorem 1.

## Bernstein and Bézout’s formulae

Given bounded convex sets $\scrP_1, \ldots, \scrP_s$ in $\rr^n$, the function $\rnonnegs \to \rr$ which maps
$(\lambda_1, \ldots, \lambda_s) \mapsto \vol(\lambda_1 \scrP_1 + \cdots + \lambda_s \scrP_s)$
is given by a (necessarily unique) homogeneous polynomial of degree $n$ in the $\lambda_i$ (see e.g. Theorem V.39 of How Many Zeroes? for the case that each $\scrP_i$ is a polytope). For each $\alpha = (\alpha_1, \ldots, \alpha_s) \in \znonnegs$ such that $\sum_i \alpha_i = n$, we write $\nu_\alpha(\scrP_1, \ldots, \scrP_s)$ for the coefficient of $\lambda^\alpha$ in the expression of $\vol(\lambda_1 \scrP_1 + \cdots + \lambda_s \scrP_s)$ as a polynomial in $\lambda_1, \ldots, \lambda_s$. It then follows from elementary properties of commutative semigroups that the map that sends
$(\scrP_1, \ldots, \scrP_n) \mapsto \nu_{(1, \ldots, 1)}(\scrP_1, \ldots, \scrP_n)$
is a symmetric multiadditive function on the set of $n$-tuples of bounded convex bodies in $\rr^n$ (see e.g. How Many Zeroes?, Lemma B.59); it is called the mixed volume of $\scrP_1, \ldots, \scrP_n$ and we denote it by $\mv(\scrP_1, \ldots, \scrP_n)$.
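For $n = 2$ the mixed volume can be computed directly from the definition: setting $\lambda_1 = \lambda_2 = 1$ in $\vol(\lambda_1\scrP_1 + \lambda_2\scrP_2) = \lambda_1^2\vol(\scrP_1) + \lambda_1\lambda_2\,\mv(\scrP_1, \scrP_2) + \lambda_2^2\vol(\scrP_2)$ gives $\mv(\scrP_1, \scrP_2) = \vol(\scrP_1 + \scrP_2) - \vol(\scrP_1) - \vol(\scrP_2)$. A minimal Python sketch for lattice polygons given as convex hulls of finite point sets (the helper names are ours):

```python
from fractions import Fraction


def convex_hull(points):
    """Vertices of conv(points) in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Exact area via the shoelace formula (0 for degenerate input)."""
    m = len(vertices)
    twice = sum(vertices[i][0] * vertices[(i + 1) % m][1]
                - vertices[(i + 1) % m][0] * vertices[i][1] for i in range(m))
    return Fraction(abs(twice), 2)


def mixed_volume_2d(P, Q):
    """MV(conv P, conv Q) = vol(P + Q) - vol(P) - vol(Q); conv(P) + conv(Q) = conv(P + Q)."""
    minkowski_sum = [(p[0] + q[0], p[1] + q[1]) for p in P for q in Q]
    return (polygon_area(convex_hull(minkowski_sum))
            - polygon_area(convex_hull(P)) - polygon_area(convex_hull(Q)))


square = [(0, 0), (1, 0), (0, 1), (1, 1)]
segment = [(0, 0), (1, 0)]
print(mixed_volume_2d(square, square), mixed_volume_2d(square, segment))  # 2 1
```

Both values agree with Theorem 14 below: $\mv(\scrP, \scrP) = 2!\vol(\scrP)$ recovers Kushnirenko's count, and for generic $a + bx + cy + dxy$ and $e + hx$ the second equation fixes $x$, after which the first determines $y$, giving one solution.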

Lemma 13. Any symmetric multiadditive function $\rho: G^n \to \rr$ from a commutative semigroup $G$ is uniquely determined by its “diagonal part”, i.e. the map $G \to \rr$ that maps
$g \in G \mapsto \rho(g, \ldots, g)$
More precisely,
$\rho(g_1, \ldots, g_n) = \frac{1}{n!}\sum_{\emptyset \neq I \subseteq \{1, \ldots, n\}} (-1)^{n-|I|} \rho(\sum_{i \in I}g_i, \ldots, \sum_{i \in I}g_i)$
In particular, $\mv$ is the unique symmetric multiadditive function on the set of $n$-tuples of bounded convex bodies (or the set of $n$-tuples of convex polytopes) such that $\mv(\scrP, \ldots, \scrP) = n! \vol(\scrP)$.

For a proof of Lemma 13, see e.g. this MathOverflow post or How Many Zeroes?, Corollary B.62.

The following is (the weak form of) the theorem of David Bernstein that we were after.

Theorem 14 (Bernstein). Given finite subsets $\scrA_1, \ldots, \scrA_n$ of $\zz^n$, the number (counted with appropriate multiplicity) of solutions on $\kstarn$ of $f_1, \ldots, f_n$, where each $f_i$ is a generic Laurent polynomial supported at $\scrA_i$, is the mixed volume of the convex hulls of $\scrA_i$.

Proof (sketch). Consider the map $\rho$ that sends $(\scrA_1, \ldots, \scrA_n)$ to the number (counted with appropriate multiplicity) of solutions on $\kstarn$ of generic Laurent polynomials supported at $\scrA_i$. Due to Theorem 1 and Lemma 13, it suffices to show that $\rho$ is symmetric and multiadditive. That can be proved easily once one proves the correct “non-degeneracy” condition that identifies when $f_1, \ldots, f_n$ are “sufficiently generic” (see How Many Zeroes?, Claim VII.28).

As a corollary of Theorem 14 we almost immediately obtain a weak form of Bézout’s theorem.

Theorem 15 (Bézout). The number of solutions (counted with appropriate multiplicity) on $\kk^n$ of generic polynomials $f_1, \ldots, f_n$ of degrees $d_1, \ldots, d_n$, respectively, is $\prod_i d_i$.

Proof (sketch). Every polynomial $f_i$ of degree $d_i$ in $n$ variables is supported at the set of lattice points of the simplex $\scrP_i \subseteq \rr^n$ with vertices at the origin and $d_ie_j$, $j = 1, \ldots, n$ (where the $e_j$ are the standard unit vectors along the axes of $\rr^n$). If the $f_i$ are generic, then all solutions of $f_1, \ldots, f_n$ on $\kk^n$ actually belong to $\kstarn$. Consequently, Theorem 15 follows from Theorem 14 if we can show that the mixed volume of $\scrP_1, \ldots, \scrP_n$ is $\prod_i d_i$. However, that follows from an easy computation (see How Many Zeroes?, Claim VIII.2.1).
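One way to organize the computation behind Theorem 15 is via Lemma 13; this is a sketch of ours, not necessarily the computation in How Many Zeroes?. Since $\scrP_i = d_i\Delta$ for the standard simplex $\Delta$, we have $\sum_{i \in I}\scrP_i = (\sum_{i \in I} d_i)\Delta$, whose volume is $(\sum_{i \in I} d_i)^n/n!$, and the polarization formula (over nonempty $I$; for convex bodies the term for $I = \emptyset$ vanishes anyway) with $\mv(\scrP, \ldots, \scrP) = n!\vol(\scrP)$ can be evaluated exactly:

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def simplex_volume(d, n):
    """Volume of the simplex with vertices 0, d*e_1, ..., d*e_n."""
    return Fraction(d ** n, factorial(n))


def mixed_volume_of_simplices(degrees):
    """MV(d_1 Delta, ..., d_n Delta) via the polarization formula of Lemma 13,
    using sum_{i in I} d_i Delta = d_I Delta with d_I = sum_{i in I} d_i."""
    n = len(degrees)
    total = Fraction(0)
    for size in range(1, n + 1):
        for I in combinations(range(n), size):
            d_I = sum(degrees[i] for i in I)
            total += (-1) ** (n - size) * factorial(n) * simplex_volume(d_I, n)
    return total / factorial(n)


print(mixed_volume_of_simplices([2, 3, 4]))  # 24 = 2 * 3 * 4
```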

# Degree of a variety via Hilbert polynomial

$$\newcommand{\dprime}{^{\prime\prime}} \newcommand{\kk}{\mathbb{K}} \newcommand{\pp}{\mathbb{P}} \newcommand{\qq}{\mathbb{Q}} \newcommand{\rr}{\mathbb{R}} \newcommand{\zz}{\mathbb{Z}}$$
As presented in the preceding post, the degree of a subvariety $X$ of a projective space $\pp^n$ is the number of points of intersection of $X$ and a “generic” $(n - m)$-dimensional linear subspace of $\pp^n$, where $m:= \dim(X)$. In this post we show that the degree of a variety can be computed in terms of the Hilbert polynomial of its homogeneous coordinate ring. The presentation follows Mumford’s Algebraic Geometry I: Complex Projective Varieties, Section 6C.

The basis of this theory, as in many parts of algebraic geometry, is a beautiful result of Hilbert.

Theorem 1. Let $M$ be a finitely generated graded module over $R_n := \kk[x_0, \ldots, x_n]$, i.e. $M$ is an $R_n$-module and there is a direct sum decomposition $M = \bigoplus_{k \geq k_0} M_k$ such that for every homogeneous polynomial $f$ in $R_n$, $fM_k \subseteq M_{k + \deg(f)}$
Then there is a polynomial $P_M(t) \in \qq[t]$ of degree at most $n$ such that $\dim(M_k) = P_M(k)$ for all sufficiently large $k$.

$P_M$ as in Theorem 1 is called the Hilbert polynomial of $M$. If $M = \kk[x_0, \ldots, x_n]/I(X)$, where $X$ is a subvariety of $\pp^n$ and $I(X)$ is the ideal generated by all homogeneous polynomials vanishing on $X$ (so that $M$ is the homogeneous coordinate ring of $X$), then $P_M$ is called the Hilbert polynomial of $X$.

Proof of Theorem 1. Proceed by induction on $n$. If $n = -1$, then $R_n = \kk$, and $M$, being a finitely generated module over $\kk$, must be a finite dimensional vector space over $\kk$. Consequently, $M_k = 0$ for all sufficiently large $k$, and the theorem holds with $P_M \equiv 0$. In the general case, consider the map $M \to M$ given by $m \mapsto x_nm$. The kernel of the map is
$M' := \{m \in M: x_nm = 0\}$
and the cokernel is
$M\dprime := M/x_nM$
Both $M'$ and $M\dprime$ are finitely generated graded $R_n$-modules (since $R_n$ is Noetherian) in which multiplication by $x_n$ is $0$. Consequently, these are finitely generated graded $R_{n-1}$-modules and by the induction hypothesis, there are polynomials $P'$ and $P\dprime$ of degree at most $n-1$ such that $\dim(M'_k) = P'(k)$ and $\dim(M\dprime_k) = P\dprime(k)$ for $k \gg 0$. Now for all $k$ the multiplication by $x_n$ maps $M_k$ to $M_{k+1}$ and induces an exact sequence of vector spaces over $\kk$:
$0 \to M'_k \to M_k \to M_{k+1} \to M\dprime_{k+1} \to 0$
Consequently,
\begin{align*} \dim(M_{k+1}) - \dim(M_k) &= \dim(M\dprime_{k+1}) - \dim(M'_k) \\ &= P\dprime(k+1) - P'(k) \end{align*}
where the first equality holds for all $k$ and the second equality holds for $k \gg 0$. Now, given any $f(t) \in \qq[t]$ of degree $d$, there is a polynomial $g(t) \in \qq[t]$ of degree $d+1$ such that
$g(t+1) - g(t) \equiv f(t)$
(note that $(t+1)^{d+1} - t^{d+1} = (d+1)t^d +$ a polynomial of degree smaller than $d$, and use induction on $d$). Consequently there is a polynomial $Q(t) \in \qq[t]$ of degree at most $n$ such that
$Q(k+1) - Q(k) \equiv P\dprime(k+1) - P'(k)$
Then for $k \gg 0$,
$\dim(M_{k+1}) - Q(k+1) = \dim(M_k) - Q(k)$
In other words,
$\dim(M_k) = Q(k) + \text{a constant}$
for $k \gg 0$, as required to complete the proof of Theorem 1.
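The key step in the proof above, finding $g \in \qq[t]$ with $g(t+1) - g(t) = f(t)$, is constructive. A minimal Python sketch (polynomials as coefficient lists $[c_0, c_1, \ldots]$, a representation chosen for illustration; helper names ours) follows the induction on $d$, removing the top coefficient with a multiple of $t^{d+1}$ and normalizing $g(0) = 0$:

```python
from fractions import Fraction
from math import comb


def evaluate(poly, t):
    """Evaluate the polynomial with coefficient list [c_0, c_1, ...] at t."""
    return sum(c * t ** j for j, c in enumerate(poly))


def antidifference(f):
    """g with deg g = deg f + 1, g(0) = 0 and g(t+1) - g(t) = f(t)."""
    f = [Fraction(c) for c in f]
    g = [Fraction(0)] * (len(f) + 1)
    for d in range(len(f) - 1, -1, -1):
        c = f[d] / (d + 1)  # (t+1)^(d+1) - t^(d+1) = (d+1) t^d + lower terms
        g[d + 1] += c
        # subtract c * ((t+1)^(d+1) - t^(d+1)) = c * sum_{j <= d} C(d+1, j) t^j from f
        for j in range(d + 1):
            f[j] -= c * comb(d + 1, j)
    return g


print(antidifference([0, 1]))  # coefficients of t(t-1)/2
```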

Example 2. If $M = \kk[x_0, \ldots, x_n]$ itself, then $\dim(M_k)$ is the dimension of the vector space of homogeneous polynomials of degree $k$, which is equal to the number of distinct monomials of degree $k$ in $n+1$ variables. The latter number is the same as the number of ways to place $n$ identical dividers in $k+n$ positions, which is
\begin{align*} \binom{k+n}{n} &= \frac{(k+n)(k+n-1) \cdots (k+1)}{n!} \\ &= \frac{k^n}{n!} + \text{a polynomial of degree lower than}\ n \end{align*}
In particular, the Hilbert polynomial of $$M$$ is
$P_M(t) = \frac{(t+n)(t+n-1) \cdots (t+1)}{n!}$
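As a quick sanity check of the stars-and-bars count, the following Python sketch enumerates the monomials of degree $$k$$ in $$n+1$$ variables as multisets of variables and compares the count with $$\binom{k+n}{n}$$ and with $$P_M(k)$$. The helper names are illustrative.

```python
from fractions import Fraction
from itertools import combinations_with_replacement
from math import comb, factorial

def count_monomials(num_vars, k):
    """Number of monomials of degree k in num_vars variables (multisets of size k)."""
    return sum(1 for _ in combinations_with_replacement(range(num_vars), k))

def hilbert_poly_projective_space(n, t):
    """P_M(t) = (t+n)(t+n-1)...(t+1)/n! as in Example 2."""
    num = 1
    for i in range(1, n + 1):
        num *= t + i
    return Fraction(num, factorial(n))
```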

Example 3. If $$M = \kk[x_0, \ldots, x_n]/\langle f \rangle$$, where $$f$$ is homogeneous of degree $$d$$, then multiplication by $$f$$ induces an exact sequence
$0 \to \kk[x]_{k-d} \to \kk[x]_k \to M_k \to 0$
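where we write $$\kk[x]$$ for $$\kk[x_0, \ldots, x_n]$$. Then for all $$k \geq d - n$$ (the range in which $$\dim(\kk[x]_{k-d})$$ agrees with the polynomial $$\binom{k+n-d}{n} = (k+n-d)\cdots(k-d+1)/n!$$)
\begin{align*} \dim(M_k) &= \dim(\kk[x]_k) - \dim(\kk[x]_{k-d}) \\ &= \binom{k+n}{n} - \binom{k+n-d}{n} \end{align*}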
Write $$\binom{k+n}{n} = \sum_{j=0}^n a_jk^j$$. In Example 2 above we have seen that $$a_n = 1/n!$$. Consequently,
\begin{align*} P_M(t) &= \sum_{j=0}^n a_jt^j - \sum_{j=0}^n a_j(t-d)^j \\ &= \frac{t^n - (t-d)^n}{n!} + \sum_{j=0}^{n-1} a_j(t^j - (t-d)^j) \\ &= \frac{dt^{n-1}}{(n-1)!} + \text{a polynomial of degree lower than}\ n - 1 \end{align*}
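The following Python sketch checks Example 3 numerically in the special case $$f = x_0^d$$. This choice is an assumption made only because $$M_k$$ then has an explicit basis, the degree-$$k$$ monomials not divisible by $$x_0^d$$; Example 3 itself uses only that $$f$$ is homogeneous of degree $$d$$. The sketch compares this count with $$\binom{k+n}{n} - \binom{k+n-d}{n}$$, read as a polynomial in $$k$$, for $$k \geq d - n$$, and checks that the $$(n-1)$$-th finite difference of $$P_M$$ is the constant $$d$$, as the leading term $$dt^{n-1}/(n-1)!$$ predicts. All names are illustrative.

```python
from fractions import Fraction
from itertools import combinations_with_replacement
from math import factorial

def binom_poly(x, n):
    """The polynomial x(x-1)...(x-n+1)/n!, evaluated at an integer x of any sign."""
    num = 1
    for i in range(n):
        num *= x - i
    return Fraction(num, factorial(n))

def hilbert_poly_hypersurface(n, d, t):
    """P_M(t) = binom(t+n, n) - binom(t+n-d, n), as polynomials in t."""
    return binom_poly(t + n, n) - binom_poly(t + n - d, n)

def dim_quotient_x0_power(n, d, k):
    """dim of the degree-k part of K[x_0..x_n]/<x_0^d>: monomials with x_0-exponent < d."""
    return sum(1 for mono in combinations_with_replacement(range(n + 1), k)
               if mono.count(0) < d)

def iterated_difference(values, order):
    """Apply the forward difference operator `order` times to a list of values."""
    for _ in range(order):
        values = [b - a for a, b in zip(values, values[1:])]
    return values

# Below the range k >= d - n the two can differ (here n = 1, d = 3, k = 0):
small_k_gap = (dim_quotient_x0_power(1, 3, 0), hilbert_poly_hypersurface(1, 3, 0))
```

With $$n = 1$$, $$d = 3$$ and $$k = 0$$ the quotient has dimension $$1$$, while the polynomial gives $$3$$.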

Note that in Example 2 the Hilbert polynomial equals $$\dim(M_k)$$ for all $$k \geq 0$$, and in Example 3 for all $$k \geq d - n$$ (so for all $$k \geq 0$$ when $$d \leq n+1$$), not only when $$k$$ is large. The following is an example where the equality holds only when $$k$$ is sufficiently large.

Example 4. Let $$M = \kk[x_0, \ldots, x_n]/I(S)$$ where $$S = \{P_1, \ldots, P_s\}$$ is a finite set of points in $$\pp^n$$ and $$I(S)$$ is the ideal of all homogeneous polynomials vanishing on all points of $$S$$. After an appropriate (linear) change of coordinates if necessary, we may assume that $$x_0$$ does not vanish at any of the $$P_i$$. Then for all $$k$$ there is an exact sequence:
$0 \to I(S)_k \to \kk[x]_k \xrightarrow{\phi_k} \kk^s$
where we write $$\kk[x]$$ for $$\kk[x_0, \ldots, x_n]$$ and the map $$\phi_k$$ is defined as follows:
$\phi_k(f) := (\frac{f}{x_0^k}(P_1), \ldots, \frac{f}{x_0^k}(P_s))$
Consequently, $$\dim(M_k)$$ is the dimension of the image of $$\phi_k$$. Now, assume
$|\kk| \gg s$
Under this condition we are going to show that for $$k \geq s-1$$, there are homogeneous polynomials $$f_1, \ldots, f_s$$ of degree $$k$$ such that
\begin{align*} \frac{f_i}{x_0^k}(P_j) &= 0,\ i \neq j \\ \frac{f_i}{x_0^k}(P_j) &\neq 0,\ i = j \end{align*}
Indeed, for $i \neq j$ consider the set $E_{i,j}$ of all linear polynomials $h$ such that $(h/x_0)(P_i) = (h/x_0)(P_j)$. Since $P_i \neq P_j$, $E_{i,j}$ is a codimension one (in particular, proper) linear subspace of the $\kk$-vector space $\kk[x]_1$ of linear homogeneous polynomials in $(x_0, \ldots, x_n)$. We claim that if $|\kk|$ is sufficiently large, then
$\kk[x]_1 \neq \bigcup_{i<j}E_{i,j}$
In fact we show more generally (following a MathOverflow answer) that an affine space $V$ over $\kk$ (recall that an affine space is a set which satisfies all axioms of a vector space except for those involving the “zero element”) cannot be covered by the union of finitely many proper affine hyperplanes $H_1, \ldots, H_r$ if $|\kk| > r$. Indeed, this is clearly true for $r = 1$. For $r > 1$, fix a vector $v$ which is not parallel to $H_1$ and consider the parallel translates $H_1 + av$, $a \in \kk$. These translates are pairwise distinct, so each $H_j$ coincides with at most one of them; since $|\kk| > r$, there is $a \in \kk\setminus\{0\}$ such that $H_1 + av$ is not equal to any $H_j$ for $j > 1$. Moreover, $H_1 \cap (H_1 + av) = \emptyset$, and for $j = 2, \ldots, r$ the intersection $H_j \cap (H_1 + av)$ is either empty or a proper affine hyperplane of the affine space $H_1 + av$. Now we are done by induction on $r$, applied to $H_1 + av$. Consequently, if $|\kk| > \binom{s}{2}$, we can choose
$h \in \kk[x]_1 \setminus \bigcup_{i<j}E_{i,j}$
Then $$a_i := (h/x_0)(P_i)$$ are pairwise distinct, and for $k \geq s-1$ we can construct the $$f_i$$ as follows:
$f_i := x_0^{k-s + 1}\prod_{j \neq i} (h-a_jx_0)$
In any event, this implies that the image of $$\phi_k$$ is all of $$\kk^s$$, i.e. $$\dim(M_k) = s$$, for all $$k \geq s-1$$. In particular, $$P_M(t) = s$$ is a constant polynomial.
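Example 4 can be explored numerically. With the points normalized so that $$x_0 = 1$$, the value $$\frac{f}{x_0^k}(P)$$ is just $$f(P)$$, so $$\dim(M_k)$$ is the rank of the matrix whose rows are indexed by the points and whose columns are the degree-$$k$$ monomials evaluated at them. The Python sketch below assumes $$\kk = \qq$$ and $$n = 2$$ in its examples, and the helper names are hypothetical. It also evaluates the $$f_i$$ at the points through $$\frac{f_i}{x_0^k}(P_l) = \prod_{j \neq i}(a_l - a_j)$$.

```python
from fractions import Fraction
from itertools import combinations_with_replacement
from math import prod

def rank(rows):
    """Rank of a matrix over Q, by Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    rk = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((r for r in range(rk, len(m)) if m[r][c] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(len(m)):
            if r != rk and m[r][c] != 0:
                factor = m[r][c] / m[rk][c]
                m[r] = [x - factor * y for x, y in zip(m[r], m[rk])]
        rk += 1
    return rk

def hilbert_function_points(points, k):
    """dim M_k = rank of phi_k, for points given with x_0 = 1 (so f/x_0^k at P is f(P))."""
    monomials = list(combinations_with_replacement(range(len(points[0])), k))
    rows = [[prod(p[v] for v in mono) for mono in monomials] for p in points]
    return rank(rows)

def kronecker_values(a):
    """Matrix of (f_i/x_0^k)(P_l) = prod_{j != i}(a_l - a_j), where a_j = (h/x_0)(P_j)."""
    s = len(a)
    return [[prod(a[l] - a[j] for j in range(s) if j != i) for l in range(s)]
            for i in range(s)]
```

For the four vertices $$(1,0,0), (1,1,0), (1,0,1), (1,1,1)$$ of a square, $$\dim(M_k)$$ is $$1, 3, 4, 4, \ldots$$; for the three collinear points $$(1,0,0), (1,1,0), (1,2,0)$$ it is $$1, 2, 3, 3, \ldots$$, so there the threshold $$k \geq s - 1$$ is attained. For the collinear points, $$h = x_1$$ separates them and gives $$a = (0, 1, 2)$$.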

The examples above have the following properties: the leading term of the Hilbert polynomial of a module is of the form $dt^m/m!$ for some positive integer $d$, where $m$ is the degree of the polynomial, and moreover, the degree of the Hilbert polynomial $P_X$ of a variety $X$ equals $\dim(X)$. As the following result shows, this is true in general for irreducible varieties, and moreover, the integer $d$ in the leading term $dt^m/m!$ of $P_X$ equals $\deg(X)$.

Theorem 2. Let $X$ be an irreducible variety in $\pp^n$ of dimension $m$ and degree $d$. Then its Hilbert polynomial has the form
$P_X(t) = d\frac{t^m}{m!} + \text{terms of lower degree}$
In particular, both dimension and degree of $X$ can be determined from the leading term of $P_X$.
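Theorem 2 can be checked on two classical varieties whose degrees are known: the twisted cubic in $\pp^3$ (dimension $1$, degree $3$) and the Veronese surface in $\pp^5$ (dimension $2$, degree $4$). Both are closures of images of monomial maps, and for such $X$ the degree-$k$ part of the homogeneous coordinate ring is spanned by the distinct monomials obtained as products of $k$ of the parametrizing monomials (a standard fact we take for granted here). If $P_X(t) = dt^m/m! + \cdots$, then the $m$-th finite difference of $P_X$ is the constant $d$. A minimal Python sketch, with illustrative names:

```python
def hilbert_function_toric(exponents, k):
    """dim R_k for the closure of the image of u -> [u^a_0 : ... : u^a_N], all a_i of the
    same total degree: count the distinct sums of k exponent vectors a_i."""
    sums = {tuple(0 for _ in exponents[0])}
    for _ in range(k):
        sums = {tuple(x + y for x, y in zip(s, a)) for s in sums for a in exponents}
    return len(sums)

def iterated_difference(values, order):
    """Apply the forward difference operator `order` times to a list of values."""
    for _ in range(order):
        values = [b - a for a, b in zip(values, values[1:])]
    return values

# twisted cubic [s^3 : s^2 t : s t^2 : t^3]: a curve (m = 1) in P^3
twisted_cubic = [(3, 0), (2, 1), (1, 2), (0, 3)]
# Veronese surface: all degree-2 monomials in 3 variables, a surface (m = 2) in P^5
veronese = [(2, 0, 0), (1, 1, 0), (1, 0, 1), (0, 2, 0), (0, 1, 1), (0, 0, 2)]

cubic_values = [hilbert_function_toric(twisted_cubic, k) for k in range(6)]
veronese_values = [hilbert_function_toric(veronese, k) for k in range(6)]
```

Here the Hilbert functions are $3k+1$ and $\binom{2k+2}{2} = 2k^2 + 3k + 1$, with leading terms $3t/1!$ and $4t^2/2!$, as Theorem 2 predicts.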

Proof. Example 3 above and Proposition 1 in the preceding post show that the theorem holds when $X$ is a hypersurface. Now choose linear subspaces $H’, H$ of $\pp^n$ of dimension respectively $n-m-2$ and $m+1$ such that $H’ \cap X = H’ \cap H = \emptyset$ and the projection $\pi : \pp^n\setminus H’ \to H$ (which we defined in the preceding post) restricts to a birational map on $X$ (this is possible due to Lemma 2 of the preceding post). Since $\deg(X) = \deg(\pi(X))$ (Theorem 3 of the preceding post) and $\pi(X)$ is a hypersurface in $H$, it suffices to show that $P_X$ and $P_{\pi(X)}$ differ by terms of degree $< m$. Indeed, choose coordinates $[x_0: \cdots :x_n]$ on $\pp^n$ such that $H’ = V(x_0, \ldots, x_{m+1})$ and
$\pi: [x_0: \cdots :x_n] \in \pp^n \setminus H’ \mapsto [x_0: \cdots: x_{m+1}] \in \pp^{m+1}$
where we identify $H$ with $\pp^{m+1}$. Let $R, R’$ respectively be the homogeneous coordinate rings of $X$ and $X’ := \pi(X)$. Both $R$ and $R’$ are integral domains and
\begin{align*} R &= \kk[x_0, \ldots, x_n]/I(X) \\ &\supseteq \kk[x_0, \ldots, x_{m+1}]/(I(X) \cap \kk[x_0, \ldots, x_{m+1}]) \\ &= \kk[x_0, \ldots, x_{m+1}]/I(X’) \\ &= R’ \end{align*}
Claim 2.1. $R$ is integral over $R’$.

Proof. Fix an arbitrary $i$, $m+1 < i \leq n$ and factor $\pi|_X$ as $\psi’_i \circ \psi_i$ where
\begin{align*} \psi_i &: [x_0: \cdots : x_n] \in X \mapsto [x_0: \cdots :x_{m+1}: x_i] \in \pp^{m+2}\\ \psi’_i &: [x_0: \cdots : x_{m+1}: x_i] \in \pp^{m+2} \setminus V(x_i) \mapsto [x_0: \cdots :x_{m+1}] \in \pp^{m+1} \end{align*}
Since $H’ \cap X = \emptyset$, $[0: \cdots : 0: 1] \not\in \psi_i(X)$ and therefore
$x_i \in \sqrt{\langle x_0, \ldots, x_{m+1} \rangle + I(\psi_i(X))}$
Since $I(\psi_i(X)) = I(X) \cap \kk[x_0, \ldots, x_{m+1}, x_i]$ is homogeneous, it follows that there is $k \geq 1$ such that
$(x_i)^k + \sum_{j=1}^k h_j(x_0, \ldots, x_{m+1})(x_i)^{k-j} \in I(X)$
for certain homogeneous polynomials $h_j \in \kk[x_0, \ldots, x_{m+1}]$ of degree $j$. In particular, the image of $x_i$ in $R = \kk[x_0, \ldots, x_n]/I(X)$ is integral over $R’$. Since $i$ was arbitrary and $R$ is generated as an $R’$-algebra by the images of $x_{m+2}, \ldots, x_n$, it follows that $R$ is integral over $R’$, as claimed.

Claim 2.2. $R$ and $R’$ have the same quotient field.

Proof. In general, if $Y$ is an irreducible subvariety of $\pp^n$ such that $Y \not\subseteq V(x_0)$, then there is a map from the homogeneous coordinate ring $S$ of $Y$ to the ring of polynomials in $x_0$ over the coordinate ring $\kk[Y]$ of $Y \setminus V(x_0)$, which maps a homogeneous element $f \in S$ of degree $s$ to $(f/x_0^s)\, x_0^s \in \kk[Y][x_0]$. This map is clearly injective, and since it also maps $x_0 \mapsto x_0$, it induces an isomorphism between the quotient field of $S$ and $\kk(Y)(x_0)$. Apply this to $Y = X$ and $Y = X’$ (after a linear change of the coordinates $x_0, \ldots, x_{m+1}$ we may assume that $X \not\subseteq V(x_0)$, and then also $X’ \not\subseteq V(x_0)$). Since $X$ and $X’$ are birational and the quotient field of $R$ contains that of $R’$, this implies that these quotient fields are equal and both map to $\kk(X)(x_0)$ via the above isomorphism.

Now we go back to the proof of Theorem 2. Since $R$ is generated as an $R’$-algebra by finitely many elements, each of which is integral over $R’$ by Claim 2.1, $R$ is a finitely generated module over $R’$. Choose homogeneous generators $f_1, \ldots, f_r$ of $R$ as a module over $R’$. By Claim 2.2 each $f_i$ is of the form $g_i/h_i$ with $g_i, h_i \in R’$ and $h_i \neq 0$. Moreover, since both $R$ and $R’$ are graded, and the inclusion $R’ \subseteq R$ preserves the grading, $g_i$ and $h_i$ can be chosen to be homogeneous. Let $h := \prod_i h_i$. Then
$hR \subseteq R’$
If $\deg(h) = s$, then it follows that
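$hR_{k - s} \subseteq R’_k \subseteq R_k$
for each $k \geq s$. Since $R$ is an integral domain, multiplication by $h$ is injective, and consequently
$P_X(k-s) \leq P_{\pi(X)}(k) \leq P_X(k)$
for all $k \gg 0$ (so that the Hilbert polynomials agree with the corresponding dimensions). But then the leading terms of $P_X$ and $P_{\pi(X)}$ must be the same. This concludes the proof of Theorem 2.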
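To spell out the final step (assuming $m \geq 1$ and writing $P_X(t) = dt^m/m! + ct^{m-1} + \cdots$), note that
\begin{align*} P_X(k) - P_X(k-s) &= \frac{d}{m!}\left(k^m - (k-s)^m\right) + c\left(k^{m-1} - (k-s)^{m-1}\right) + \cdots \\ &= \frac{ds}{(m-1)!}k^{m-1} + \text{terms of degree} < m-1 \end{align*}
so the inequalities give $0 \leq P_X(k) - P_{\pi(X)}(k) \leq P_X(k) - P_X(k-s)$ for $k \gg 0$. Hence the polynomial $P_X - P_{\pi(X)}$ is bounded by a constant multiple of $k^{m-1}$ for large $k$, and therefore has degree at most $m-1$.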

# Degree of a projective variety

$\DeclareMathOperator{\codim}{codim} \newcommand{\dprime}{^{\prime\prime}}\newcommand{\kk}{\mathbb{K}} \newcommand{\local}[2]{\mathcal{O}_{#2, #1}} \newcommand{\pp}{\mathbb{P}} \newcommand{\qq}{\mathbb{Q}} \newcommand{\rr}{\mathbb{R}} \DeclareMathOperator{\res}{Res} \newcommand{\scrL}{\mathcal{L}} \DeclareMathOperator{\sing}{Sing} \newcommand{\zz}{\mathbb{Z}}$
The degree of a subvariety $X$ of a projective space $\pp^n$ defined over an algebraically closed field $\kk$ is the number of points of intersection of $X$ and a “generic” linear subspace of $\pp^n$ of “complementary dimension”, i.e. of dimension equal to $n - \dim(X)$. The goal of this post is to show that this notion is well defined; this is the statement of Theorem 3 below. We also show (in Theorem 4 below) that the degree equals the number of points of such a generic intersection when every point is counted with an appropriate “multiplicity” (in particular, all points of intersection of $X$ and a generic linear subspace of complementary dimension have intersection multiplicity one). First we clarify what “generic” means in this context.

Generic Linear Subspaces. Let $\scrL_m$ be the set of $m+1$ points in “general position” in $\pp^n$, i.e. the set of all $(P_0, \ldots, P_m) \in ((\pp^n)^*)^{m+1}$ such that every $(m+1) \times (m+1)$ minor of the $(n+1) \times (m+1)$ matrix whose columns are (the homogeneous coordinates of) $P_0, \ldots, P_m$ is nonzero. Then $\scrL_m$ is a nonempty Zariski open subset of $((\pp^n)^*)^{m+1}$ and every $(P_0, \ldots, P_m) \in \scrL_m$ determines a unique $m$-dimensional linear subspace of $\pp^n$, which is the span of the $P_i$. We say that a property holds for a “generic linear subspace of dimension $m$” if it holds for the subspaces corresponding to all elements of a nonempty Zariski open subset of $\scrL_m$.

It is clear that if $X$ is a linear subspace of $\pp^n$, then the degree of $X$ is well defined and equals $1$. The first non-trivial case is the following:

Proposition 1 (Degree of a hypersurface). Let $X = V(f)$ be a hypersurface in $\pp^n$ defined by an irreducible homogeneous polynomial $f$ of degree $d$. Then a generic line intersects $X$ at $d$ points. In particular, $\deg(X)$ is well defined and equals $\deg(f)$.

Proof. Consider a line $L := \{t_0P_0 + t_1P_1: (t_0,t_1) \in \kk^2\setminus\{(0,0)\}\}$. For generic $P_0, P_1$, $f|_L$ is a degree $d$ homogeneous polynomial in $(t_0, t_1)$, and if we count the points in $L \cap V(f)$ with the corresponding multiplicity as a root of $f|_L$, then the number of points in $L \cap V(f)$ is $d$ (since $\kk$ is algebraically closed). Moreover, we are going to show that $f|_L$ has no repeated root for generic choices of $P_0, P_1$. Indeed, choose homogeneous coordinates $[x’_0: \cdots: x’_n]$ on $\pp^n$ such that $L$ is parallel to the $x’_n$-axis. Then in affine coordinates $u_i := x’_i/x’_0$, $i = 1, \ldots, n$, $L$ has a parametrization of the form
$\{(a’_1, \ldots, a’_n) + t(0, \ldots, 0, 1): t \in \kk\}$
Consider the dehomogenization $\tilde f := f/(x’_0)^d$ of $f$, regarded as a polynomial in $u_1, \ldots, u_n$. For a generic choice of $L$ one can ensure that

• $\deg(\tilde f|_L) = \deg(\tilde f) = \deg(f) = d$,
• $\partial \tilde f/\partial u_n \not\equiv 0$

Since $\tilde f$ is irreducible, $\dim(X \cap V(\partial \tilde f/\partial u_n)) < \dim(X) = n -1$. Consequently, the complement in $\kk^{n-1}$ of the image of the projection of $X \cap V(\partial \tilde f/\partial u_n)$ onto the first $n-1$ coordinates contains a nonempty Zariski open subset of $\kk^{n-1}$. Therefore, for a generic choice of $L$ one can also ensure that

• $(\partial \tilde f/\partial u_n)(a’_1, \ldots, a’_{n-1}, t) \neq 0$ for each $t \in \kk$ such that $\tilde f(a’_1, \ldots, a’_{n-1}, t) = 0$.

It follows that for generic $L$ there are precisely $d$ elements in $L \cap X$, as required to complete the proof of Proposition 1.
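Proposition 1 can be tested experimentally on the Fermat cubic $f = x_0^3 + x_1^3 + x_2^3$ in $\pp^2$. The Python sketch below assumes that $\kk$ is the algebraic closure of $\mathbb{F}_{101}$ (so $f$ is irreducible, since the characteristic is not $3$) and samples lines through two random $\mathbb{F}_{101}$-points. Restricting $f$ to the affine part $t \mapsto P_0 + tP_1$ of the line gives a polynomial $g(t)$, and the line meets $X$ in $d = 3$ distinct points exactly when $\deg(g) = 3$ (so $P_1$, the point at $t = \infty$, is not on $X$) and $\gcd(g, g') = 1$. All helper names are hypothetical.

```python
import random

p = 101  # assumption: K is the algebraic closure of F_101; lines are sampled over F_101

def trim(a):
    a = [c % p for c in a]
    while a and a[-1] == 0:
        a.pop()
    return a

def add(a, b):
    n = max(len(a), len(b))
    return trim([(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) for i in range(n)])

def mul(a, b):
    out = [0] * max(len(a) + len(b) - 1, 0)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return trim(out)

def derivative(a):
    return trim([i * a[i] for i in range(1, len(a))])

def poly_mod(a, b):
    """Remainder of a modulo a nonzero trimmed polynomial b over F_p (lowest degree first)."""
    a, inv = trim(a), pow(b[-1], p - 2, p)
    while len(a) >= len(b):
        q, shift = a[-1] * inv % p, len(a) - len(b)
        for i, c in enumerate(b):
            a[shift + i] -= q * c
        a = trim(a)
    return a

def gcd(a, b):
    a, b = trim(a), trim(b)
    while b:
        a, b = b, poly_mod(a, b)
    return a

def fermat_cubic_on_line(p0, p1):
    """g(t) = f(p0 + t*p1) for f = x0^3 + x1^3 + x2^3."""
    g = []
    for a, b in zip(p0, p1):
        coord = trim([a, b])              # the coordinate a + b t
        g = add(g, mul(coord, mul(coord, coord)))
    return g

def meets_in_distinct_points(g, d):
    """deg g = d (no intersection at t = infinity) and g has no repeated root."""
    return len(g) == d + 1 and len(gcd(g, derivative(g))) == 1

rng = random.Random(0)
trials = 200
good = sum(
    meets_in_distinct_points(
        fermat_cubic_on_line([rng.randrange(p) for _ in range(3)],
                             [rng.randrange(p) for _ in range(3)]), 3)
    for _ in range(trials))
```

The tangent line $x_0 + x_1 = 0$ at the flex $[1:-1:0]$ gives $g(t) = t^3$, a triple root, so it is one of the non-generic lines; most random lines pass the test.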

We reduce the general case to hypersurfaces using generic linear projections: Given a hyperplane $H$ in $\pp^n$, every point $P \in \pp^n \setminus H$ defines a projection $\pp^n \setminus \{P\} \to H$ which maps a point $x$ to the (unique) point of intersection of $H$ and the line joining $x$ and $P$. Consequently, we can identify the set of “projections onto a hyperplane of $\pp^n$” with the set of pairs $(P, H)$ such that $H$ is a hyperplane of $\pp^n$ and $P \in \pp^n \setminus H$. In general, every pair $(H’,H)$ of linear subspaces of $\pp^n$ such that

• $H’ \cap H = \emptyset$, and
• $\dim(H’) + \dim(H) = n-1$

defines a projection map $\pi_{H’,H}: \pp^n \setminus H’ \to H$ which maps $x \in \pp^n \setminus H’$ to the unique point where $H$ intersects with the complementary dimensional linear subspace of $\pp^n$ spanned by $x$ and $H’$. Note that, writing $k := \dim(H)$, we can choose coordinates $[x_0: \cdots: x_n]$ on $\pp^n$ such that $H’ = \{x_0 = \cdots = x_k=0\}$ and $H$ is the coordinate subspace spanned by $x_0, \ldots, x_k$; in that case
$\pi_{H’,H}: [x_0: \cdots :x_n] \in \pp^n \setminus H’ \mapsto [x_0: \cdots : x_k]$
We say that a property holds for a “generic linear projection onto a $k$-dimensional subspace” if it holds for the projection map corresponding to pairs of generic $n-k-1$ and $k$ dimensional subspaces in the sense defined above.

Linear projections can be used to make the standard result that every variety is birational to a hypersurface a bit more precise:

Lemma 2. Let $X$ be an irreducible subvariety of $\pp^n$ of dimension $m \leq n - 2$. Then a generic $m$ dimensional linear subspace $H_0$ of $\pp^n$ satisfies the following properties:

1. If $u_0, \ldots, u_m$ are linearly independent linear forms over $H_0$, then $u_1/u_0, \ldots, u_m/u_0$ restrict to algebraically independent elements in $\kk(X)$;
2. if $H$ is a generic $m+1$ dimensional linear subspace of $\pp^n$ containing $H_0$ and $H’$ is a generic $n-m-2$ dimensional linear subspace of $\pp^n$ such that $H’ \cap X = H’ \cap H = \emptyset$, then
• the image of $X$ under the linear projection $\pi_{H’,H}: \pp^n \setminus H’ \to H$ is birational to $X$, and
• the degree $d$ of the polynomial defining $\pi_{H’,H}(X)$ as a hypersurface of $H$ equals the degree of the field extension $[\kk(X): \kk(u_1/u_0, \ldots, u_m/u_0)]$. In particular, $d$ does not depend on $H$ or $H’$.

We sketch a proof of Lemma 2 at the end of this post (note that the arguments from the sketch show that for Lemma 2 it is sufficient to have $|\kk| = \infty$ in place of being algebraically closed). Now we use the lemma to show that degree is well defined.

Indeed, fix generic $H_0 \subseteq H$ as in Lemma 2. For generic $H’$ as in Lemma 2, let $U_{H’, H}$ be a nonempty Zariski open subset of $X$ such that

• $\pi = \pi_{H’, H}$ induces an isomorphism between $U_{H’, H}$ and $\pi(U_{H’, H})$, and
• $U_{H’,H} = \pi^{-1}(\pi(U_{H’,H})) \cap X$.

Observation. Since $\pi(U_{H’,H})$ is a nonempty open subset of $\pi(X)$ and $\pi(X)$ is irreducible of dimension $m$, the complement $\pi(X) \setminus \pi(U_{H’,H})$ has dimension less than $m$. Consequently, if $L$ is a generic $n-m$ dimensional linear subspace of $\pp^n$ containing $H’$, then $\pi(L\setminus H’)$ is a generic line in the $(m+1)$ dimensional space $H$, and therefore does not intersect $\pi(X) \setminus \pi(U_{H’,H})$.

We now reformulate this observation in a more elaborate way: let $S$ be the subset of
$\scrL := \scrL_{n-m} \times \scrL_{n-m-2} \times \scrL_{m+1}$
consisting of all $(L, H’,H)$ such that $L \supseteq H’$. Note that $S$ is a subvariety of $\scrL$, since the condition $L \supseteq H’$ can be defined by the vanishing of certain polynomials in the coordinates of $\scrL$. Let $S’$ be the subset of $S \times X$ consisting of all $(L,H’,H,x)$ such that

• $H’ \cap X = H’ \cap H = \emptyset$,
• $x \in U_{H’,H}$
• $\pi_{H’,H}(L\setminus H’) \cap \left(\pi_{H’,H}(X) \setminus \pi_{H’,H}(U_{H’,H})\right) = \emptyset$

(Here is a bit of hand waving:) $S’$ can be defined as a subset of $S \times X$ by certain polynomial equalities and inequalities, and consequently, $S’$ is a constructible subset of $S \times X$.

Therefore, by Chevalley’s theorem on images of morphisms, the projection $S\dprime$ of $S’$ to $S$ is also constructible. Consider the projection
$\psi: S \subseteq \scrL_{n-m} \times \scrL_{n-m-2} \times \scrL_{m+1} \to \scrL_{n-m-2} \times \scrL_{m+1}$
The “observation” above says that for all $(H’, H)$ in a nonempty Zariski open subset of $\scrL_{n-m-2} \times \scrL_{m+1}$, a nonempty Zariski open subset of $\psi^{-1}(H’,H)$ is included in $S\dprime$. Since $S\dprime$ is constructible, this can only be possible if $S\dprime$ contains a nonempty Zariski open subset of $S$. Consequently, for a generic $n-m$ dimensional linear subspace $L$ of $\pp^n$ one can pick an $n-m-2$ dimensional linear subspace $H’$ of $L$ (together with $H$) which satisfies the second assertion of Lemma 2, and in addition, $L \cap (X \setminus U_{H’, H}) = \emptyset$. Then $\pi_{H’,H}$ maps $L \cap X$ bijectively onto the intersection of the line $\pi_{H’,H}(L \setminus H’)$ with the hypersurface $\pi_{H’,H}(X)$ of $H$, so Proposition 1 and Lemma 2 together imply that the number of points in $L \cap X$ does not depend on $L$. Consequently, we have proved the following result in the case that $X$ is irreducible:

Theorem 3. The degree $\deg(X)$ of a “pure dimensional” subvariety $X$ of $\pp^n$ is well defined (recall that a variety is “pure dimensional” if each of its irreducible components has the same dimension). In addition, $\deg(X) = \deg(\pi(X))$ for a generic linear projection $\pi$ onto a linear subspace of dimension $\dim(X) + 1$ in $\pp^n$.

Proof. As noted above, we already proved the theorem in the case that $X$ is irreducible. In general, if $\dim(X) = m$, we can take an $n-m$ dimensional linear subspace of $\pp^n$ which is generic for each irreducible component of $X$ and which does not meet the pairwise intersections of distinct components (these have dimension $< m$); this is possible since “generic” properties hold for nonempty Zariski open subsets of $\scrL_{n-m}$, and since $\scrL_{n-m}$ itself is irreducible, the intersection of finitely many generic conditions is also generic. The number of points of intersection is then the sum of the degrees of the irreducible components. This completes the proof of the theorem.

Recall that the intersection multiplicity of $n$ hypersurfaces $H_i := \{f_i = 0\}$, $i = 1, \ldots, n$, on an $n$ dimensional variety $X$ (where $f_i$ are regular functions on $X$) at a nonsingular point $x \in X$ is the dimension (as a vector space over $\kk$) of
$\local{X}{x}/\langle f_1, \ldots, f_n \rangle$
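For example, in the affine plane $X = \kk^2$ with coordinates $(x, y)$, at the origin, the parabola $\{y - x^2 = 0\}$ meets its tangent line $\{y = 0\}$ with multiplicity $2$ and the line $\{y = x\}$ with multiplicity $1$:
\begin{align*} \local{\kk^2}{0}/\langle y - x^2, y \rangle &= \local{\kk^2}{0}/\langle x^2, y \rangle, \\ \local{\kk^2}{0}/\langle y - x^2, y - x \rangle &= \local{\kk^2}{0}/\langle x(1-x), y - x \rangle = \local{\kk^2}{0}/\langle x, y \rangle \cong \kk \end{align*}
since $\local{\kk^2}{0}/\langle x^2, y \rangle$ has basis $1, x$, while $1 - x$ is a unit in $\local{\kk^2}{0}$.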

Theorem 4. Let $X$ be a subvariety of $\pp^n$ of pure dimension $m$ and degree $d$. Then for generic linear homogeneous polynomials $l_1, \ldots, l_m$, the hypersurfaces of $X$ defined by $l_i = 0$ intersect at precisely $d$ points, each of which is a nonsingular point of $X$, and the intersection multiplicity of $\{l_1|_X = 0\}, \ldots, \{l_m|_X = 0\}$ at each point of intersection is one.

Proof. If $l_1, \ldots, l_m$ are generic, we already know that $L := \{l_1 = \cdots = l_m = 0\}$ intersects $X$ in precisely $d$ points (Theorem 3), and since the set $\sing(X)$ of singular points of $X$ has dimension $< m$, in the generic case $L$ does not intersect $\sing(X)$. It remains to show that at each point of intersection, the intersection multiplicity is one in the generic case. The arguments from the proof of Proposition 1 show that this is true when $X$ is a hypersurface, and the general case then follows by choosing $(L,H’,H)$ from the constructible subset $S\dprime$ defined above, since in that case the projection $X \to \pi_{H’,H}(X)$ is an isomorphism near every point of $L \cap X$. This completes the proof of the theorem.
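A small experiment in the spirit of Theorem 4: the twisted cubic, the closure of $\{[1 : t : t^2 : t^3]\}$ in $\pp^3$, has degree $3$. A hyperplane $c_0x_0 + \cdots + c_3x_3 = 0$ pulls back to $g(t) = \sum_i c_it^i$, and since the parametrization is an isomorphism onto the curve (a fact we take for granted here), the hyperplane section consists of $3$ points of multiplicity one exactly when $c_3 \neq 0$ (the hyperplane misses $[0:0:0:1]$) and $g$ has no repeated root. The Python sketch below assumes $\kk \supseteq \qq$ and tests random hyperplanes with integer coefficients; the helper names are hypothetical.

```python
import random
from fractions import Fraction

def trim(a):
    a = [Fraction(c) for c in a]
    while a and a[-1] == 0:
        a.pop()
    return a

def derivative(a):
    return trim([i * a[i] for i in range(1, len(a))])

def poly_mod(a, b):
    """Remainder of a modulo a nonzero trimmed polynomial b (lowest degree first)."""
    a = trim(a)
    while len(a) >= len(b):
        q, shift = a[-1] / b[-1], len(a) - len(b)
        for i, c in enumerate(b):
            a[shift + i] -= q * c
        a = trim(a)
    return a

def gcd(a, b):
    a, b = trim(a), trim(b)
    while b:
        a, b = b, poly_mod(a, b)
    return a

def section_is_reduced(coeffs):
    """coeffs = (c_0, ..., c_d) of a hyperplane; pull back along [1 : t : ... : t^d]."""
    d = len(coeffs) - 1
    g = trim(coeffs)                  # g(t) = c_0 + c_1 t + ... + c_d t^d
    return len(g) == d + 1 and len(gcd(g, derivative(g))) == 1

rng = random.Random(1)
trials = 200
reduced = sum(section_is_reduced([rng.randint(-50, 50) for _ in range(4)])
              for _ in range(trials))
```

The hyperplane with coefficients $(1, -3, 3, -1)$ gives $g(t) = (1-t)^3$: it meets the curve only at $[1:1:1:1]$, with multiplicity $3$, so it is not generic in the sense of Theorem 4.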

Sketch of a Proof of Lemma 2. Since $\dim(X) = m$, if $u_0, \ldots, u_m$ are generic (homogeneous) linear forms in $(x_0, \ldots, x_n)$, then $u_i/u_0$, $i = 1, \ldots, m$, are algebraically independent over $\kk$. Consequently, a generic $m$-dimensional linear subspace $H_0$ of $\pp^n$ satisfies assertion 1 of Lemma 2. For the second assertion, choose homogeneous coordinates $[x_0: \cdots :x_n]$ on $\pp^n$ such that $X \not\subseteq V(x_0)$. Write $x’_i := x_i/x_0$, $i = 1, \ldots, n$. Arguments from standard proofs of the primitive element theorem and Schmidt’s “separable extension theorem” (see e.g. How Many Zeroes?, Theorem B.35 and Corollary B.37) show that

1. There are linear combinations $v_i := \sum_j \lambda_{i,j} x’_j$ with $\lambda_{i,j} \in \kk$ such that $v_1|_X, \ldots, v_m|_X$ are algebraically independent over $\kk$ and $\kk(X) = \kk(v_1, \ldots, v_m)(v_{m+1})$ (here one needs to follow the arguments of the proofs of Theorem B.35 and Corollary B.37 in How Many Zeroes? and observe that one can choose the $\lambda_{i,j}$ from $\kk$ since $|\kk| = \infty$).
2. To see that the above can be achieved with generic $\lambda_{i,j}$, observe the following from the proof of Theorem B.35 in How Many Zeroes?: the requirement on the $\lambda_{i,j}$ boils down to a finite sequence of conditions of the following form: given a certain field $F$ containing $\kk$, elements $\alpha_1, \alpha_2 \in F$ and polynomials $f_1, f_2 \in F[t]$, where $t$ is an indeterminate, such that $f_1(\alpha_1) = f_2(\alpha_2) = 0$, there is $\lambda \in \kk$ such that
$\gcd(f_1, f_2(\lambda \alpha_1 + \alpha_2 - \lambda t)) = t - \alpha_1$
(where $\gcd$ is computed in $F[t]$). Since $t - \alpha_1$ divides both of these polynomials, the above condition is equivalent to requiring that
$\res(f’_1, f’_2) \neq 0$
where
\begin{align*} f’_1 &:= \frac{f_1}{t-\alpha_1} \\ f’_2 &:= \frac{f_2(\lambda \alpha_1 + \alpha_2 - \lambda t)}{t - \alpha_1} \end{align*}
and $\res$ denotes the resultant. Since the resultant is a polynomial in the coefficients, and the coefficients of $f’_2$ are polynomials in $\lambda$, its non-vanishing is a Zariski open condition on $\lambda$. Since the condition is satisfied for at least one $\lambda \in \kk$ (that’s the first observation above), it is satisfied for all $\lambda$ in a non-empty Zariski open subset of $\kk$. It follows that the full set of conditions holds for all $\lambda_{i,j}$ in a non-empty Zariski open subset of $\kk^{n^2}$.
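The gcd-resultant criterion can be tried on a toy case in which everything is rational: $F = \kk = \qq$, $\alpha_1 = 1$, $\alpha_2 = 2$, $f_1 = (t-1)(t-4)$ and $f_2 = (t-2)(t-5)$. These choices are illustrative only; in the actual argument $F$ is an extension of $\kk$. The Python sketch below builds $f’_1$ and $f’_2$ as in the display above, computes the resultant as the determinant of the Sylvester matrix, and lists the nonzero values of $\lambda$ in a small range for which it vanishes. Helper names are hypothetical.

```python
from fractions import Fraction

# Polynomials are lists of Fractions, lowest degree first.

def trim(a):
    a = [Fraction(c) for c in a]
    while a and a[-1] == 0:
        a.pop()
    return a

def add(a, b):
    n = max(len(a), len(b))
    return trim([(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) for i in range(n)])

def mul(a, b):
    out = [Fraction(0)] * max(len(a) + len(b) - 1, 0)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return trim(out)

def compose_linear(f, a, b):
    """Return f(a + b*t), using Horner's rule."""
    result = []
    for c in reversed(f):
        result = add(mul(result, [a, b]), [c])
    return result

def divide_by_root(f, r):
    """Return the quotient of f by (t - r); r must be a root of f."""
    acc, quotient_high = Fraction(0), []
    for c in reversed(f):
        acc = acc * r + c
        quotient_high.append(acc)
    assert quotient_high.pop() == 0, "r is not a root of f"
    return list(reversed(quotient_high))

def det(matrix):
    """Determinant over Q by Gaussian elimination."""
    a = [[Fraction(x) for x in row] for row in matrix]
    result = Fraction(1)
    for c in range(len(a)):
        pivot = next((r for r in range(c, len(a)) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            result = -result
        result *= a[c][c]
        for r in range(c + 1, len(a)):
            factor = a[r][c] / a[c][c]
            a[r] = [x - factor * y for x, y in zip(a[r], a[c])]
    return result

def resultant(f, g):
    """Res(f, g) as the determinant of the Sylvester matrix."""
    fh, gh = list(reversed(trim(f))), list(reversed(trim(g)))
    m, n = len(fh) - 1, len(gh) - 1
    rows = [[0] * i + fh + [0] * (n - 1 - i) for i in range(n)]
    rows += [[0] * i + gh + [0] * (m - 1 - i) for i in range(m)]
    return det(rows)

alpha1, alpha2 = Fraction(1), Fraction(2)
f1 = mul([-1, 1], [-4, 1])   # (t - 1)(t - 4)
f2 = mul([-2, 1], [-5, 1])   # (t - 2)(t - 5)
f1_prime = divide_by_root(f1, alpha1)

def res_for(lam):
    g = compose_linear(f2, lam * alpha1 + alpha2, -lam)   # f2(lam*alpha1 + alpha2 - lam*t)
    return resultant(f1_prime, divide_by_root(g, alpha1))

bad = [lam for lam in range(-3, 4) if lam != 0 and res_for(lam) == 0]
```

By hand, $f’_2 = \lambda^2 t + 3\lambda - \lambda^2$ and $\res(f’_1, f’_2) = 3\lambda(\lambda + 1)$, so among nonzero $\lambda$ only $\lambda = -1$ fails; this is exactly the value for which $\alpha_2 + \lambda\alpha_1 = 2 + \lambda$ coincides with $5 + 4\lambda$, the combination coming from the other roots $4$ and $5$.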