
i.cluster(1grass) GRASS GIS User's Manual i.cluster(1grass)

NAME
i.cluster - Generates spectral signatures for land cover types in an
image using a clustering algorithm.
The resulting signature file is used as input for i.maxlik, to generate
an unsupervised image classification.

KEYWORDS
imagery, classification, signatures

SYNOPSIS
i.cluster
i.cluster --help
i.cluster group=name subgroup=name signaturefile=name classes=integer
[seed=name] [sample=rows,cols] [iterations=integer]
[convergence=float] [separation=float] [min_size=integer]
[reportfile=name] [--overwrite] [--help] [--verbose] [--quiet] [--ui]

Flags:
--overwrite
Allow output files to overwrite existing files

--help
Print usage summary

--verbose
Verbose module output

--quiet
Quiet module output

--ui
Force launching GUI dialog

Parameters:
group=name [required]
Name of input imagery group

subgroup=name [required]
Name of input imagery subgroup

signaturefile=name [required]
Name for output file containing result signatures

classes=integer [required]
Initial number of classes
Options: 1-255

seed=name
Name of file containing initial signatures

sample=rows,cols
Number of rows and columns over which a sample pixel is taken

iterations=integer
Maximum number of iterations
Default: 30

convergence=float
Percent convergence
Options: 0-100
Default: 98.0

separation=float
Cluster separation
Default: 0.0

min_size=integer
Minimum number of pixels in a class
Default: 17

reportfile=name
Name for output file containing final report

DESCRIPTION
i.cluster performs the first pass in the two-pass unsupervised classi-
fication of imagery, while the GRASS module i.maxlik executes the sec-
ond pass. Both commands must be run to complete the unsupervised clas-
sification.

i.cluster implements a clustering algorithm (a modification of the
k-means clustering algorithm) that reads through the (raster) imagery
data and builds pixel clusters based on the spectral reflectances of
the pixels (see Figure). The pixel clusters are imagery categories that
can be related to land cover types on the ground. The spectral
distributions of the clusters (i.e., the land cover spectral
signatures) are influenced by six parameters set by the user. The first
of these is the initial number of clusters to be discriminated.

Fig.: Land use/land cover clustering of LANDSAT scene (sim-
plified)

i.cluster starts by generating spectral signatures for this number of
clusters and "attempts" to end up with this number of clusters during
the clustering process. The resulting number of clusters and their
spectral distributions, however, are also influenced by the range of
the spectral values (category values) in the image files and the other
parameters set by the user. These parameters are: the minimum cluster
size, minimum cluster separation, the percent convergence, the maximum
number of iterations, and the row and column sampling intervals.

The resulting cluster spectral signatures are composed of cluster means
and covariance matrices, which are used in the second pass (i.maxlik)
to classify the image. The resulting clusters (spectral classes) can be
related to land cover types on the ground.

The user has to specify the name of the group, the name of the
subgroup, the name of a file to contain the resulting signatures, and
the initial number of clusters to be discriminated; the other
parameters are optional (see below). The group should contain the
imagery files that the user wishes to classify, and the subgroup is a
subset of this group. The user must create a group and subgroup by
running the GRASS program i.group before running i.cluster. The
subgroup should contain only the imagery band files that the user
wishes to classify, and it must contain more than one band file. The
purpose of the group and subgroup is to collect map layers for
classification or analysis. The signaturefile is the file that will
contain the resulting signatures, which can be used as input for
i.maxlik. The classes value is the initial number of clusters to be
discriminated. Any parameter values left unspecified are set to their
default values.

Parameters:
group=name
The name of the group file which contains the imagery files that
the user wishes to classify.

subgroup=name
The name of the subset of the group specified in the group option,
which must contain only imagery band files and more than one band
file. The user must create a group and a subgroup by running the
GRASS program i.group before running i.cluster.

signaturefile=name
The name assigned to the output signature file, which contains the
signatures of the classes and can be used as the input file for the
GRASS program i.maxlik for an unsupervised classification.

classes=value
The number of clusters that will initially be identified in the
clustering process before the iterations begin.

seed=name
The name of an optional seed signature file. Seed signatures contain
cluster means and covariance matrices that were calculated prior to
the current run of i.cluster. They may be obtained from a previous run
of i.cluster or from the training sites of a supervised classification
(e.g., using the signature file output by g.gui.iclass). The purpose
of seed signatures is to optimize the cluster decision boundaries
(means) for the number of clusters specified.

sample=rows,cols
These numbers are optional. Their default values are based on the size
of the data set, such that the total number of pixels to be processed
is approximately 10,000 (subject to rounding). The smaller these
numbers, the larger the sample size used to generate the signatures
for the classes defined.

iterations=value
This parameter sets the maximum number of iterations. It should be
greater than the number of iterations expected to be needed to achieve
the desired percent convergence. The default value is 30. If the
number of iterations reaches the maximum designated by the user, the
user may want to rerun i.cluster with a higher number of iterations
(see reportfile).
Default: 30

convergence=value
The percent convergence is the level at which cluster means are
considered stable during the iteration process. The default value is
98.0 percent. When clusters are being created, their means constantly
change as pixels are assigned to them and the means are recalculated
to include the new pixel. After all clusters have been
created, i.cluster begins iterations that change cluster means by
maximizing the distances between them. As these means shift, a
higher and higher convergence is approached. Because means will
never become totally static, a percent convergence and a maximum
number of iterations are supplied to stop the iterative process.
The percent convergence should be reached before the maximum number
of iterations. If the maximum number of iterations is reached, it
is probable that the desired percent convergence was not reached.
The number of iterations is reported in the cluster statistics in
the report file (see reportfile).
Default: 98.0

separation=value
This is the minimum separation below which clusters will be merged
in the iteration process. The default value is 0.0. This is an im-
age-specific number (a "magic" number) that depends on the image
data being classified and the number of final clusters that are ac-
ceptable. Its determination requires experimentation. Note that as
the minimum class (or cluster) separation is increased, the maximum
number of iterations should also be increased to achieve this sepa-
ration with a high percentage of convergence (see convergence).
Default: 0.0

min_size=value
This is the minimum number of pixels that will be used to define a
cluster, and is therefore the minimum number of pixels for which
means and covariance matrices will be calculated.
Default: 17

reportfile=name
The reportfile is an optional parameter naming a file that receives
the result, i.e., the statistics for each cluster. Also included are
the resulting percent convergence for the clusters, the number of
iterations that were required to achieve the convergence, and the
separability matrix.

NOTES
Sampling method
i.cluster does not cluster all pixels, but only a sample (see parameter
sample). The clustering does not assign every pixel to a cluster; it
only generates signatures that are representative of each cluster.
When i.cluster is run on the same data with the same number of classes
but with different sample sizes, slightly different signatures are
likely to be obtained for each cluster at each run.
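
The effect of the sample intervals on the number of pixels clustered
can be sketched as follows. This is an illustration only, not the
module's code: the helper names, the region size, and the rule of
aiming for about 100 samples per axis (which yields roughly 10,000
pixels) are assumptions made for this example.

```python
import math


def sampled_pixel_count(nrows, ncols, row_step, col_step):
    """Count the pixels visited when taking every row_step-th row
    and every col_step-th column of an nrows x ncols region."""
    return math.ceil(nrows / row_step) * math.ceil(ncols / col_step)


def approx_intervals(nrows, ncols, per_axis=100):
    """Hypothetical helper: pick intervals that give about per_axis
    samples along each axis (assumed rule, not taken from the manual)."""
    return max(1, nrows // per_axis), max(1, ncols // per_axis)


# Illustrative region size, not from the manual.
rows, cols = 2500, 4000
row_step, col_step = approx_intervals(rows, cols)
print(row_step, col_step, sampled_pixel_count(rows, cols, row_step, col_step))
```

Passing smaller intervals to sampled_pixel_count() shows how quickly
the sample grows: halving both intervals roughly quadruples the number
of pixels used to build the signatures.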

Algorithm used for i.cluster
The algorithm uses input parameters set by the user: the initial number
of clusters, the minimum distance between clusters, the correspondence
between iterations that is desired, and the minimum size of each
cluster. The user also specifies whether all pixels are to be clustered
or only every "x"th row and "y"th column (sampling), and the maximum
number of iterations to be carried out.

In the 1st pass, initial cluster means for each band are defined by
giving the first cluster a value equal to the band mean minus its
standard deviation, and the last cluster a value equal to the band mean
plus its standard deviation, with all other cluster means equally
spaced between these. Each pixel is then assigned to the cluster whose
mean is closest to it, distance being measured as Euclidean distance.
All clusters that are closer together than the user-specified minimum
distance are then merged. If a cluster has fewer than the
user-specified minimum number of pixels, all of its pixels are
reassigned to the next nearest cluster. New cluster means are
calculated for each band as the average of the raster pixel values in
that band over all pixels in that cluster.
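
The seeding and assignment steps of the 1st pass can be sketched in a
few lines of Python. This is a minimal illustration under stated
assumptions, not the i.cluster source: the function names and band
statistics are invented for the example, the single-cluster case is
handled by an assumed rule, and merging and minimum-size handling are
left out.

```python
def initial_means(band_mean, band_sd, n_clusters):
    """Spread n_clusters means evenly from (mean - sd) to (mean + sd)."""
    if n_clusters == 1:
        # Assumption for this sketch: a single cluster sits at the band mean.
        return [band_mean]
    step = 2.0 * band_sd / (n_clusters - 1)
    return [band_mean - band_sd + i * step for i in range(n_clusters)]


def nearest_cluster(pixel, means):
    """Return the index of the cluster mean closest to the pixel.
    Squared Euclidean distance gives the same ranking as the distance."""
    def sq_dist(mean):
        return sum((p - m) ** 2 for p, m in zip(pixel, mean))
    return min(range(len(means)), key=lambda i: sq_dist(means[i]))


# Illustrative (mean, standard deviation) for two bands.
band_stats = [(100.0, 20.0), (50.0, 10.0)]
k = 5
per_band = [initial_means(m, s, k) for m, s in band_stats]
means = list(zip(*per_band))  # one tuple of band means per cluster

print(means[0], means[-1])
print(nearest_cluster((118.0, 61.0), means))
```

The first and last printed tuples are the band means minus and plus one
standard deviation, and the sample pixel falls into the last cluster.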

In the 2nd pass, pixels are reassigned to clusters based on the new
cluster means, and the cluster means are then recalculated. This
process is repeated until the correspondence between iterations reaches
a user-specified level or the maximum number of iterations is reached,
whichever comes first.
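
The stopping rule of the 2nd pass can be sketched as a loop. This is a
simplified illustration, not the module's implementation. It assumes
that the correspondence between iterations is the percentage of sampled
pixels that keep their cluster from one iteration to the next, that
max_iterations is at least 1, and it omits merging and minimum-size
handling. All names and data are hypothetical.

```python
def assign(pixels, means):
    """Label each pixel with the index of its nearest cluster mean."""
    return [
        min(range(len(means)),
            key=lambda c: sum((p - m) ** 2 for p, m in zip(px, means[c])))
        for px in pixels
    ]


def recompute_means(pixels, labels, old_means):
    """Average the pixels of each cluster; empty clusters keep their mean."""
    k, nbands = len(old_means), len(pixels[0])
    sums = [[0.0] * nbands for _ in range(k)]
    counts = [0] * k
    for px, lab in zip(pixels, labels):
        counts[lab] += 1
        for b, value in enumerate(px):
            sums[lab][b] += value
    return [
        tuple(s / counts[c] for s in sums[c]) if counts[c] else old_means[c]
        for c in range(k)
    ]


def iterate(pixels, means, max_iterations=30, convergence=98.0):
    """Repeat reassignment until enough pixels keep their cluster
    or the iteration limit is reached, whichever comes first."""
    labels = assign(pixels, means)
    for iteration in range(1, max_iterations + 1):
        means = recompute_means(pixels, labels, means)
        new_labels = assign(pixels, means)
        unchanged = sum(a == b for a, b in zip(labels, new_labels))
        percent_stable = 100.0 * unchanged / len(pixels)
        labels = new_labels
        if percent_stable >= convergence:
            break
    return means, labels, iteration, percent_stable


# Two well separated groups of two-band pixels (illustrative data).
pixels = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0),
          (10.0, 10.0), (11.0, 10.0), (10.0, 11.0)]
final_means, labels, n_iter, stable = iterate(pixels, [(0.0, 0.0), (5.0, 5.0)])
print(labels, n_iter, stable)
```

If the loop ends because the iteration limit was reached rather than
because the convergence level was met, the returned percentage stays
below the requested value, which mirrors the advice to check the report
file and rerun with more iterations.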

EXAMPLE
Preparing the statistics for unsupervised classification of a LANDSAT
subscene in North Carolina:
g.region raster=lsat7_2002_10 -p
# store VIZ, NIR, MIR into group/subgroup (leaving out TIR)
i.group group=lsat7_2002 subgroup=lsat7_2002 \
input=lsat7_2002_10,lsat7_2002_20,lsat7_2002_30,lsat7_2002_40,lsat7_2002_50,lsat7_2002_70
# generate signature file and report
i.cluster group=lsat7_2002 subgroup=lsat7_2002 \
signaturefile=sig_cluster_lsat2002 \
classes=10 reportfile=rep_clust_lsat2002.txt
To complete the unsupervised classification, i.maxlik is subsequently
used. See example in its manual page.

SEE ALSO
• Image classification wiki page

• Historical reference: the GRASS GIS 4 Image Processing manual
(PDF)

• Wikipedia article on k-means clustering (note that i.cluster
uses a modification of the k-means clustering algorithm)

g.gui.iclass, i.group, i.gensig, i.maxlik, i.segment, i.smap, r.kappa

AUTHORS
Michael Shapiro, U.S. Army Construction Engineering Research Laboratory
Tao Wen, University of Illinois at Urbana-Champaign, Illinois

SOURCE CODE
Available at: i.cluster source code (history)

GRASS 7.8.7 i.cluster(1grass)
