Cell Mapping Features
The idea of cell mapping is that a continuous search space is partitioned in every dimension and thus achieving a discretization of the original search space into cells. This discretization of the original sample into cells allows the computation of features, which help to characterize the global structure or multimodality of an optimization problem.
Based on this approach, three different feature sets can be computed: angle ("cm_angle"
), convexity ("cm_conv"
) and gradient homogeneity ("cm_grad"
).
Angle
The initial idea of the angle features ("cm_angle"
) is that the best and worst values within the cells might return some insight of the underlying function’s landscape. If those two observations lie in opposite directions, it indicates a trend within the cell. In that case the angle between the vectors from cell center to worst value and cell center to best value would be close to 180°.
The angles of all cells from the grid will then be aggregated using the mean and the standard deviation.
X = createInitialSample(n.obs = 1200, dim = 3)
y = rowSums(X^2)
feat.object = createFeatureObject(X = X, y = y, blocks = c(4, 6, 3))
calculateFeatureSet(feat.object, set = "cm_angle")
Furthermore, the standard deviation and mean of the lengths of the two above-mentioned vectors (i.e. distances from the center of a cell to the best/worst observation within that cell) are used as additional features. In case of simple functions (such as the sphere function), the variation should be low as the majority of the cells should have similar distances — due to the fact that they usually lie close to the borders of the cells. In case of very multimodal functions, the variation should be rather high as cells with local optima result in contrary distances (short distances of the best values and long distances of the worst values) compared to cells without any local optima.
Since interactions between cells are ignored, i.e. these features are computed locally per cell, the features are considered to be independent from the search space dimensionality.
(Inspired by Kerschke, P. et al., 2014)
Cell Convexity
For this feature set ("cm_conv"
), all possible combinations of three (linearly) neighbouring cells within the grid are computed. Per default, only horizontally and vertically neighbouring cells are considered. By adding cm_conv.diag = TRUE
to the list of control parameters, diagonally neighbouring cells are considered as well.
X = createInitialSample(n.obs = 1200, dim = 3)
y = rowSums(X^2)
feat.object = createFeatureObject(X = X, y = y, blocks = c(4, 6, 3))
calculateFeatureSet(feat.object, set = "cm_conv")
During the computation of the cell mapping convexity features, only the cells’ representatives are considered. Based on those prototypes, the concavity or convexity of the landscape is approximated.
Given the function evaluations of the three neighbouring cells, this feature computes the convex-combination between f(x1) and f(x3). That value is then compared to the corresponding value of f(x2). The figure below illustrates the resulting decision, i.e. whether a combination indicates convexity or concavity. Just place the value of f(x2) above x2 and infer the corresponding decision.
(Inspired by Kerschke, P. et al., 2014)
Gradient Homogeneity
For every point within a cell’s sample, the nearest neighbor is identified and afterwards, the normalized vectors, which are always rotated towards the better points, are computed. Then, all normalized vectors are summed up and divided by the maximal possible vector length (i.e. the number of points). In case of rather randomly distributed objective values, the fraction should be close to zero as this would indicate vectors, which are pointing in different directions. In case of a strong trend the value should be close to one (i.e., all vectors point into the same direction).
X = createInitialSample(n.obs = 1200, dim = 3)
y = rowSums(X^2)
feat.object = createFeatureObject(X = X, y = y, blocks = c(4, 6, 3))
calculateFeatureSet(feat.object, set = "cm_grad")
Those values are then aggregated over all cells — again, using the mean and the standard deviation. Simple unimodal functions shall thus generate very high mean values.
(Inspired by Kerschke, P. et al., 2014)
Literature Reference
Kerschke, P., Preuss, M., Hernandez, C., Schuetze, O., Sun, J.-Q., Grimme, C., Rudolph, G., Bischl, B., and Trautmann, H. (2014): “Cell Mapping Techniques for Exploratory Landscape Analysis”, in: EVOLVE — A Bridge between Probability, Set Oriented Numbers, and Evolutionary Computation V, pp. 151—131, Springer (http://dx.doi.org/10.1007/978-3-319-07494-8_9).