Z-Estimator

Recall that an M-Estimator seeks the minimizer of a function $M_n(\theta) = \frac{1}{n}\sum_{i=1}^n m(X_i; \theta)$. Suppose the function is differentiable and convex; then its minimizer equals the zero of its derivative

$$\Psi_n(\theta) := \frac{1}{n}\sum_{i=1}^n \psi(X_i; \theta) = 0,$$

where in this case $\psi(x; \theta) = \nabla_\theta m(x; \theta)$ is the derivative of $m$ in the definition of the M-estimator. For example, the MLE is a Z-estimator w.r.t. the Score Function $\psi(x; \theta) = \nabla_\theta \log p_\theta(x)$.

However, $\psi$ can be more general without necessarily corresponding to an optimization problem. For example, recall that for a Moment Estimator, we solve a system

$$\frac{1}{n}\sum_{i=1}^n f_j(X_i) = \mathbb{E}_\theta[f_j(X)], \quad j = 1, \dots, k,$$

which corresponds to $\psi_j(x; \theta) = f_j(x) - \mathbb{E}_\theta[f_j(X)]$. Thus, Moment Estimators are also a special case of Z-estimators.

As we can see, Z-estimators are a more general class of estimators that solve for the zero point of a system of estimating equations $\Psi_n(\theta) = 0$.
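As a toy illustration (a sketch of ours; the bisection solver and the distributions below are illustrative, not from the notes), both the sample mean and the exponential-rate MLE arise as zeros of estimating equations and can be found by locating a sign change of $\Psi_n$:

```python
import numpy as np

def z_estimate(psi, data, lo, hi, tol=1e-10):
    """Locate theta with Psi_n(theta) = mean_i psi(x_i, theta) = 0 by bisection.

    Assumes the averaged estimating equation changes sign exactly once on [lo, hi].
    """
    Psi = lambda theta: np.mean(psi(data, theta))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if Psi(lo) * Psi(mid) <= 0:   # root lies in [lo, mid]
            hi = mid
        else:                         # root lies in [mid, hi]
            lo = mid
    return 0.5 * (lo + hi)

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=10_000)   # true rate lambda = 0.5

# Sample mean as a Z-estimator: psi(x, theta) = x - theta.
theta_hat = z_estimate(lambda d, t: d - t, x, lo=0.0, hi=10.0)

# Exponential MLE as a Z-estimator: score psi(x, lam) = 1/lam - x.
lam_hat = z_estimate(lambda d, l: 1.0 / l - d, x, lo=1e-3, hi=10.0)

print(theta_hat, lam_hat)   # approx mean(x) and 1/mean(x)
```

Bisection is used only because both $\Psi_n$ above are monotone in the parameter; general estimating equations need a proper root finder.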

  • Table. Comparison of optimization and system solution.

| Optimization | System Solution |
| --- | --- |
| M-estimators | Z-estimators |
| Optimizing an objective function, $\min_\theta M_n(\theta)$ | Solving an equation system, $\Psi_n(\theta) = 0$ |
| Utilize the optimization landscape, e.g., gradient | Utilize system dynamics, e.g., contraction |

As we discussed earlier, for convex/concave and differentiable objective functions, optimization is equivalent to solving a system regarding the gradient. Conversely, we can also define an objective function for solving a system of equations. For example, for a linear system $Ax = b$, we can define the squared cost $f(x) = \frac{1}{2}\|Ax - b\|^2$, whose minimizer is the solution of the system (when one exists).
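To see this equivalence numerically (a sketch with a made-up well-conditioned system; the matrix and step size are illustrative), we can solve $Ax = b$ directly and also minimize the squared cost by following its gradient $\nabla f(x) = A^\top(Ax - b)$:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 5)) + 10 * np.eye(5)   # diagonally shifted so A is invertible
b = rng.normal(size=5)

# System-solution view: solve A x = b directly.
x_solve = np.linalg.solve(A, b)

# Optimization view: minimize f(x) = 0.5 * ||A x - b||^2 by gradient descent,
# using gradient A^T (A x - b) and step size 1 / L with L = ||A||_2^2.
x = np.zeros(5)
step = 1.0 / np.linalg.norm(A, 2) ** 2
for _ in range(5_000):
    x -= step * A.T @ (A @ x - b)

print(np.allclose(x, x_solve, atol=1e-8))
```

Both routes land on the same point; which one is preferable depends on the structure you can exploit, as discussed next.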

Figure. Equivalence of optimization and system solution.

However, different problem formulations offer different insights and solution methods.

  • Optimization is more suitable if you have a clear and well-motivated objective function;
  • System solution is more suitable when you know how the solution determines the system dynamics;
  • When optimizing a function, we usually care more about how the local landscape, e.g., the gradient, carries the decision variable to the optimum;
  • When solving a system, we usually want to follow some system dynamics, e.g., a contractive operator, to reach the solution.

Properties

Asymptotic Normality

Let $\hat\theta_n$ solve the system $\Psi_n(\hat\theta_n) = \frac{1}{n}\sum_{i=1}^n \psi(X_i; \hat\theta_n) = 0$, and let $\theta_0$ satisfy $\Psi(\theta_0) := \mathbb{E}[\psi(X; \theta_0)] = 0$. Suppose the consistency holds: $\hat\theta_n \xrightarrow{p} \theta_0$. Then, under some regularity conditions:

  • $\theta \mapsto \psi(x; \theta)$ is twice differentiable for all $x$ and $\|\nabla_\theta^2 \psi(x; \theta)\| \le g(x)$ for some integrable function $g$;¹
    • or, there exists an $L^2$ function $\dot\psi$ such that for any $\theta_1, \theta_2$ in a neighborhood of $\theta_0$, we have $\|\psi(x; \theta_1) - \psi(x; \theta_2)\| \le \dot\psi(x)\,\|\theta_1 - \theta_2\|$;
  • $V_\theta := \mathbb{E}[\nabla_\theta \psi(X; \theta)]$ exists and is non-singular in a neighborhood of $\theta_0$;
  • $S_{\theta_0} := \mathbb{E}[\psi(X; \theta_0)\,\psi(X; \theta_0)^\top]$ exists;

we have

$$\sqrt{n}\,(\hat\theta_n - \theta_0) \xrightarrow{d} \mathcal{N}\!\left(0,\; V_{\theta_0}^{-1} S_{\theta_0} \left(V_{\theta_0}^{-1}\right)^\top\right),$$

where $V_{\theta_0} = \mathbb{E}[\nabla_\theta \psi(X; \theta_0)]$ and $S_{\theta_0} = \mathbb{E}[\psi(X; \theta_0)\,\psi(X; \theta_0)^\top]$.
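As a quick numeric sanity check (a sketch of ours, with an illustrative choice $\psi(x; \theta) = e^\theta - x$ targeting $\theta_0 = \log \mathbb{E}[X]$), we can compare the Monte Carlo variance of $\sqrt{n}(\hat\theta_n - \theta_0)$ against the sandwich formula:

```python
import numpy as np

# Numeric sanity check of the sandwich variance for the Z-estimator with
# psi(x, theta) = exp(theta) - x, whose zero is theta_hat = log(sample mean),
# targeting theta0 = log E[X]. (The distribution and constants are illustrative.)
rng = np.random.default_rng(2)
n, reps = 1_000, 4_000
mean_x, var_x = 2.0, 4.0            # X ~ Exponential with E[X] = 2, Var[X] = 4
theta0 = np.log(mean_x)

X = rng.exponential(scale=mean_x, size=(reps, n))
theta_hat = np.log(X.mean(axis=1))

# V = E[d psi / d theta] = exp(theta0) = E[X];  S = E[psi(theta0)^2] = Var[X].
sandwich = var_x / mean_x**2        # V^{-1} S V^{-1} = Var[X] / E[X]^2 = 1.0

emp_var = np.var(np.sqrt(n) * (theta_hat - theta0))
print(emp_var, sandwich)            # empirical variance should be close to 1.0
```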

Relation to Asymptotic Normality of M-Estimators

If $\psi(x; \theta) = \nabla_\theta m(x; \theta)$, where $m$ is the objective function of an M-Estimator, we can see that the asymptotic normality of Z-estimators implies the asymptotic normality of M-estimators, with $V_{\theta_0} = \mathbb{E}[\nabla_\theta^2 m(X; \theta_0)]$.

However, the regularity conditions for Z-estimators are stronger than those for M-estimators (see Asymptotic Normality). For example, we can show that quantile regression satisfies the asymptotic normality conditions for M-estimators; but its z-function (a piecewise-constant subgradient) is not differentiable, so it does not satisfy the conditions for Z-estimators.

Proof Sketch

Denote $\Psi_n(\theta) = \frac{1}{n}\sum_{i=1}^n \psi(X_i; \theta)$; recall that $\Psi_n(\hat\theta_n) = 0$. By Taylor expansion around $\theta_0$,

$$0 = \Psi_n(\hat\theta_n) = \Psi_n(\theta_0) + \dot\Psi_n(\theta_0)\,(\hat\theta_n - \theta_0) + O\!\left(\|\hat\theta_n - \theta_0\|^2\right).$$

Since we assume the consistency, the higher-order term is negligible and we get

$$\sqrt{n}\,(\hat\theta_n - \theta_0) = -\dot\Psi_n(\theta_0)^{-1}\,\sqrt{n}\,\Psi_n(\theta_0) + o_p(1).$$

By LLN, $\dot\Psi_n(\theta_0) \xrightarrow{p} V_{\theta_0}$; by CLT,

$$\sqrt{n}\,\Psi_n(\theta_0) \xrightarrow{d} \mathcal{N}(0, S_{\theta_0}).$$

Thus, by Slutsky's theorem,

$$\sqrt{n}\,(\hat\theta_n - \theta_0) \xrightarrow{d} \mathcal{N}\!\left(0,\; V_{\theta_0}^{-1} S_{\theta_0} \left(V_{\theta_0}^{-1}\right)^\top\right).$$

Asymptotic Normality of Least Squares

We cast Ordinary Least Squares as a Z-estimator and verify the asymptotic normality conditions. The cost function is $m(x, y; \beta) = \frac{1}{2}(y - x^\top \beta)^2$, which gives the z-function $\psi(x, y; \beta) = -x\,(y - x^\top \beta)$.

For the first condition, we verify its alternative Lipschitz condition:

$$\|\psi(x, y; \beta_1) - \psi(x, y; \beta_2)\| = \|x x^\top (\beta_1 - \beta_2)\| \le \|x\|^2\,\|\beta_1 - \beta_2\|.$$

Suppose $X$ has finite fourth moments, so that $\dot\psi(x) = \|x\|^2$ is in $L^2$; then the first condition is met.

For the second condition, we have

$$V_\beta = \mathbb{E}[\nabla_\beta \psi(X, Y; \beta)] = \mathbb{E}[X X^\top],$$

which exists and is non-singular as long as the covariates are not linearly degenerate.
For the third condition, assuming the model $Y = X^\top \beta_0 + \varepsilon$ with $\mathbb{E}[\varepsilon \mid X] = 0$ and $\mathrm{Var}(\varepsilon \mid X) = \sigma^2$, we have

$$S_{\beta_0} = \mathbb{E}[\psi\,\psi^\top] = \mathbb{E}\!\left[X X^\top (Y - X^\top \beta_0)^2\right] = \sigma^2\, \mathbb{E}[X X^\top].$$
Therefore, by the asymptotic normality of Z-estimators, we have

$$\sqrt{n}\,(\hat\beta_n - \beta_0) \xrightarrow{d} \mathcal{N}\!\left(0,\; \sigma^2\, \mathbb{E}[X X^\top]^{-1}\right).$$
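This limit can be checked by simulation (the setup is ours, purely for illustration: standard normal covariates so that $\mathbb{E}[XX^\top] = I$, and homoskedastic noise with $\sigma = 0.5$):

```python
import numpy as np

# Monte Carlo check of the OLS asymptotic covariance sigma^2 * E[X X^T]^{-1}
# under a homoskedastic linear model. (Dimensions and constants are illustrative.)
rng = np.random.default_rng(3)
n, reps, d = 500, 4_000, 2
beta0 = np.array([1.0, -2.0])
sigma = 0.5

errs = np.empty((reps, d))
for r in range(reps):
    X = rng.normal(size=(n, d))              # E[X X^T] = I_2
    y = X @ beta0 + sigma * rng.normal(size=n)
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    errs[r] = np.sqrt(n) * (beta_hat - beta0)

emp_cov = np.cov(errs.T)
theory = sigma**2 * np.eye(d)                # sigma^2 * E[X X^T]^{-1}
print(np.round(emp_cov, 3))
print(theory)
```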

Footnotes

  1. Integrability is always with respect to the probability measure of the data.