Yale CEO Algorithm for Classifying Shapes of NDVI-cycles
(Fourier Component Classification, FCC)
 
 
HomeAmu-DaryaBalkashSyr-DaryaData
 

Classification example
(subset)

 

 

 

 

 

 

 

 

 

 

 

 

 

Download DFT Plotter
(zipped Excel spreadsheet)

 

DFT calculation
(scripts for ERDAS Imagine)

 

The basic idea of this classification is to cluster vegetation covers according to the shapes of their annual growing cycle or rather, to their remote sensing surrogate, the NDVI-cycle. The shapes of the cycles can assist in identifying vegetation types and thus help conclude to distinct vegetation characteristics including hydro-meteorological behavior, annuals or perennials, seasonal photosynthetic activity, carbon storage/release or to crop management.
Like a supervised classification the technique requires the selection of reference cycles, which will then be used to identify NDVI-cycles of identical shape.

The described technique offers a series of advantages over conventional classifications like K-means or decision tree that are especially suited for the classification of NDVI time-series:

  • Invariance to coverage variation.

  • Invariance to phenological shifts as it may occur when studying larger areas characterized by a pronounced climatic gradient.

  • Invariance to background (soil) reflectance, particularly important in sparsely vegetated areas.

  • Ensures temporal and spatial comparability (independent from scene statistics).

  • Provides intra-class coverage variation.

  • Accuracy parameters can be set by user and multiple classifications can be performed at different accuracy levels to achieve a complete classification of an area.

Remark
Areas showing temporal snow or ice cover, causing a sharp drop in NDVI-values will affect the shape of the NDVI-cycle (phenology). Their prior correction is strongly recommended, particularly when comparing phenology classifications from different years.

 

Step 1:  Fourier Filtering

Since the clustering of NDVI-cycles is based on their shape it is important to smooth/reduce noise features that are typically part of remotely sensed data sets as much as possible but without compromising diagnostic phenological characteristics.  This is achieved by decomposing the complex NDVI-cycle (green) into its individual frequencies using a Discrete Fourier Transform (DFT). After removal of the higher frequencies, that are assumed noise, the inverse DFT provides a smoothed data set. In 1-year, 10-days interval layerstacks (36 layers = 18 harmonics/frequencies) we typically use the first 5 frequencies (red) and discard frequencies 6 to 18.

To reduce data dimensionality, in the following, a simplified Fourier filtered vegetation cycle is used (frequencies 1 and 2) for explanation of the classification algorithm.

Remark (noise reduction):
Using Fourier filtering, can efficiently remove periodic noise but is less successful in the removal of erratic outliers as caused by cloud cover. While a single period (compositing interval), affected by cloud cover, may be smoothed without causing too much distortion to the pure phenological cycle, Fourier filtering creates unreliable results where NDVI-cycles show multiple periods with cloud cover. Since the classification algorithm described in the following uses reference cycles to identify cycles of similar shape, cloud affected areas will remain unclassified unless their influence creates cycles similar to a reference or where thresholds for tolerable shape variability are relaxed to a level where cloud affected cycles will eventually be assigned to a phenological class.  

 

Step 2: Selection of Reference Cycles

The selection of reference sites (red dots) for our current classification of Central Asia is based on a color composite composed of Magnitudes 1 (red), 2 (green), and 3 (blue), providing an efficient reduction of multi-layer cyclic NDVI-data (subset shown below). In general, variations in hue are indicative for different vegetation types, variations in saturation for changes in vegetation coverage. At this point only limited field data are available that confine to the Ferghana Valley and to the Syr-Darya Province area.

 

Classified Area

Step 3: Classification

Our definition of shape is oriented at a vegetation type's phenology, as expressed in the remote sensing surrogate the NDVI-cycle, and is invariant to distinct cycle modifications that may be imposed by influences including climate, soil, topography or human. In particular, the algorithm allows variations within one vegetation type with respect to its coverage/vigor, to its phenological timing, and to its background reflectance, thus creating a more consistent classification result.

 

Principles of classification

The shape of a complex (NDVI-) cycle is completely defined by the Fourier components magnitude and phase. This is shown at a simplified NDVI-cycle created from 36 temporal intervals. The decomposition of the simplified complex NDVI cycle into its individual frequencies using a Discrete Fourier Transform creates two output harmonics of frequency 1 and frequency 2 (Figure 3). In this simplified example the higher harmonics (frequencies) have an amplitude of zero.

Frequency 1 + Frequency 2 - Cycle Mean = Complex Cycle

Cycle Mean = straight grey line (zero line for amplitude measurement)
Complex Cycle = solid black

All shapes identical to the solid black cycle (complex cycle) must have

  1. identical amplitudes for frequency 1 and frequency 2 and

  2. identical phase difference relative to frequency 1.

Figure 3 Figure 4

Note: The amplitude ratio (Amplitude 1 / Amplitude 2 = 1.42) in both examples is the same

Figure 5 Figure 6

Note: 1. Phase difference (Phase 1 - Phase 2 = 7.9) is in both examples the same.

 

Invariance to phenological shifts
Identical, absolute phase offsets are not required. The shape of the complex cycle is still preserved as long as both phases are shifted for the same amount (Figure 6). It is therefore the difference of each phase relative to phase 1 (Phase 1 - Phase N) that is crucial for shape similarity.
Because DFT assumes a periodic cycle, in the example in Figure 6, values that are cycled out at the end of the year are cycled back in at the beginning of the year, thus forming an exact match to the cycle in Figure 5.

Invariance to coverage variation
An increase/decrease in coverage will induce higher/lower amplitudes in the NDVI-cycle. As shown in Figures 3 and 4, our definition of the overall shape will be preserved as long as each amplitude ratio (Amplitude 1 / Amplitude N) remains constant. In a plot of Amplitude 1 and Amplitude 2 for a specific vegetation type, all points with identical amplitude ratio plot onto a linear that connects the amplitude ratio of the reference (red circle) with the origin (Figure 7). To limit tolerable coverage variations a threshold can be set measured as the Euclidian distance from the reference (red circle) and defined in percent of the distance Origin to Reference.

 

Figure 7

 

Invariance to background
Because we use the frequencies' amplitude in our shape description, the influence of background reflectance on the classification result is almost eliminated.  This can easily be understood looking at Figure 3: A vertical shift of the NDVI-cycle, simulating a darker background, will be of no influence for the individual amplitudes or for the phases (neglecting the influence of reflectance and scattering processes as a result of different soil types). To avoid the mixing of e.g. sparsely vegetated/bare soil areas with evergreen vegetation covers, that may display similar shapes in their temporal cycles a threshold can be set for the annual mean NDVI that applies to all classes. Exceeding this threshold will result in class rejection    

 

Thresholds for Phase differences and Amplitude ratios
As we cannot expect that all pixels of the same vegetation type have ideal phase differences and ideal amplitude ratios that perfectly match those of the reference, reasonable thresholds for deviations from the ideal values need to be set.
Deviations from the ideal amplitude ratio are measured as the distance of a point to the linear (residual) representing the ideal ratio (Figure 7).

 

The tolerable distance is defined in percent of the Sum of Amplitudes. Point 2 in Figure 7 fulfills the Amplitude ratio requirement but would be rejected because it falls outside the coverage threshold (black circle).

The amount of deviation from the ideal phase difference depends on a frequency's contribution to the shape of the complex cycle, which is a function of its amplitude. In the extreme case of a harmonic with an amplitude of zero, there are no restrictions on tolerable phase difference offset θk. A frequency with an amplitude of zero has no influence on the shape of the complex cycle. This is expressed in the following equation:

      equation 1

 

Ak is the kth amplitude out of n frequencies, the 'kth Harmonic' or kth frequency is the number of the harmonic. Since phase differences are measured against Phase 1, calculation of θ1 is irrelevant. Exponent X can assume any value and is used to manipulate tolerable phase differences.

The calculation of each tolerable phase off-set from phase 1 is done individually and independently from other phase off-sets. Therefore, shape fidelity might be compromised where all phases are displaced to their individual maximum (this would be the case where simple box filters are used, Figure 8). The solution is a second order filter that allows only a single phase to be displaced to the maximum value while keeping the rest to a minimum (Figure 8). As indicated in Figure 8, the curvature of the filter can be manipulated to allow for more or less phase tolerances resulting in lower or higher shape fidelity.   The 2nd order filter takes the form:

      equation 2

 

Figure 8
 

with amax calculated as

      equation 3

 

where c = maximum tolerance for phase 2 and x = maximum tolerance for phase 3. The corresponding b-values are calculated from

 

      equation 4

 

The graph cuts the y-axis/x-axis at the maximum tolerable phase difference for phase 2 (y-axis) and phase 3 (x-axis), as calculated from equation 1. amax and -amax is where the maximum/minimum of the function falls onto the y-axis and x-axis respectively. At higher/lower a-values filtering becomes increasingly biased. At a = 0 the equation becomes a linear.

 

Literature: