Geographic Data Mining

Overview

Aspect ("Association Rules for Spatial Data") is a software product that performs association rule data mining in geographic data sets. Association rule analysis is used to discover associations or relationships between geographic objects or values in a data set.

Aspect is designed to handle multiple variables and to identify the most relevant variables to include in an association rule. Technical details are given in the Aspect Technical Background.

Aspect is designed for GIS users. It works with standard vector data formats (shapefiles, ESRI® feature classes). It can run standalone or can integrate with ArcGIS® Desktop v9.0 or higher (ArcView, ArcEditor, or ArcInfo). Aspect is easy to use and does not require prior statistical expertise.

Examples
Application Areas
Association Rule Results
Availability
Contact
Reference
Technical Background

Examples

An association rule states that if a condition or set of conditions holds, then another condition also holds, much like an "If-Then" rule. Association rules have the format

     Condition A => Condition B

where the symbol "=>" means "implies". If A consists of multiple conditions, they are linked with the symbol "^" which means "and". Probabilities of the association can also be given.

The strength of an association rule is measured by its support (how frequently A and B occur together in the data set) and its confidence (how frequently B also occurs in the instances where A occurs).
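
As a minimal sketch of how these two measures can be computed, the Python below counts support and confidence for a rule over a small set of records. The record layout, the `rule_stats` helper, and the sample data are hypothetical illustrations and are not part of Aspect. Confidence is taken here as the fraction of records satisfying A that also satisfy B.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Condition = Callable[[Record], bool]


def rule_stats(records: List[Record],
               antecedent: List[Condition],
               consequent: Condition) -> Tuple[int, float, float]:
    """Return (support count, support fraction, confidence) for A => B.

    The antecedent conditions are joined with "and" (the "^" symbol).
    """
    # Records where every antecedent condition holds.
    matches_a = [r for r in records if all(c(r) for c in antecedent)]
    # Of those, records where the consequent also holds.
    matches_ab = [r for r in matches_a if consequent(r)]

    support_count = len(matches_ab)
    support_fraction = support_count / len(records) if records else 0.0
    confidence = support_count / len(matches_a) if matches_a else 0.0
    return support_count, support_fraction, confidence


# Hypothetical sample data, invented for illustration only.
sample = [
    {"paved": True, "lanes": 4, "crosses_stream": True, "bridge": True},
    {"paved": True, "lanes": 4, "crosses_stream": True, "bridge": True},
    {"paved": True, "lanes": 4, "crosses_stream": True, "bridge": False},
    {"paved": False, "lanes": 2, "crosses_stream": True, "bridge": False},
    {"paved": True, "lanes": 2, "crosses_stream": False, "bridge": False},
]

count, support, confidence = rule_stats(
    sample,
    antecedent=[lambda r: r["paved"], lambda r: r["lanes"] == 4,
                lambda r: r["crosses_stream"]],
    consequent=lambda r: r["bridge"],
)
print(f"Support: {count} ({support:.0%}), Confidence: {confidence:.0%}")
```

In this toy data, three records satisfy all antecedent conditions and two of those also satisfy the consequent, so the rule has a support count of 2 out of 5 records and a confidence of two thirds.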

Some hypothetical examples showing the use of association rules are:

Trends Analysis

  • Shopping mall built more than x miles from downtown => Downtown businesses suffer
  • Traffic circles => Decreased accident rate

Prediction of missing data

  • Paved four-lane road intersects with perennial stream => Bridge is present
    Confidence: 90%

Situation awareness and decision support

  • High recent precipitation ^ low elevation ^ poor drainage => saturated soil
    Confidence: 70%
    Conclusion: Reduced load-bearing capacity on such soils.

Validation and error checking

  • Vegetation X => Elevation Y
    Confidence: 90%
    Conclusion: If vegetation occurs at a different elevation, verify whether it is an outlier data point or an error.
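
The prediction and validation uses above can be sketched as a single pass over the records: where the rule's conditions hold and the predicted attribute is missing, the rule supplies a candidate value; where the attribute is present but disagrees with the rule, the record is flagged for review. The field names, the `apply_rule` helper, and the sample records are hypothetical and are not taken from Aspect.

```python
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, Optional[object]]


def apply_rule(records: List[Record],
               condition: Callable[[Record], bool],
               target: str,
               expected: object) -> Tuple[List[int], List[int]]:
    """Apply "condition => target == expected" to each record.

    Returns (filled, flagged): indices whose missing target was filled
    with the expected value, and indices whose existing target disagrees
    with the rule and should be checked by hand.
    """
    filled, flagged = [], []
    for i, rec in enumerate(records):
        if not condition(rec):
            continue  # the rule says nothing about this record
        if rec.get(target) is None:
            rec[target] = expected  # candidate value, not a certainty
            filled.append(i)
        elif rec[target] != expected:
            flagged.append(i)  # possible outlier or data error
    return filled, flagged


# Hypothetical records, invented for illustration only.
crossings = [
    {"paved_four_lane": True, "perennial_stream": True, "bridge": None},
    {"paved_four_lane": True, "perennial_stream": True, "bridge": False},
    {"paved_four_lane": False, "perennial_stream": True, "bridge": None},
    {"paved_four_lane": True, "perennial_stream": True, "bridge": True},
]

filled, flagged = apply_rule(
    crossings,
    condition=lambda r: r["paved_four_lane"] and r["perennial_stream"],
    target="bridge",
    expected=True,
)
print("Filled:", filled, "Flagged for review:", flagged)
```

Because a rule with 90% confidence is still wrong for some records, filled values are best treated as candidates and flagged records as prompts for review rather than as confirmed errors.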

Application Areas

Aspect has applications in many industries that rely on spatial data analysis. It is useful when a large amount of data is available but the resources for identifying important relationships are limited. Once relationships are determined, it may also be used for prediction or to fill in missing features in sparse data sets.

Here is a sampling of the industries where Aspect can be applied:

  • Transportation. Aspect can analyze the multiple characteristics of railroad crossings to find significant associations with accident occurrence.
  • City Planning. Planners can use data mining to estimate the impact of new developments based on data from other communities with similar development.
  • Environmental Science. Aspect has been used to discriminate land use and vegetation types based on soil type, climate characteristics, demographics, and terrain features.
  • Agriculture. Data mining can be used to identify likely causes of pest infestation or to associate farming practices with improved crop performance.
  • Remote sensing. Association rules discovered with Aspect have been used as ancillary information to enhance the interpretation of remote sensing images.
  • Epidemiology. There is interest in using Aspect to model farm locations to aid in the tracking of foot-and-mouth disease in livestock and avian influenza in poultry.

Association Rule Results

Example 1

An association rule to relate road density to urban areas was identified. This rule can be used to interpret a remote sensing image in which one pixel might represent more than one feature. By computing the road density from the image, the existence of an urban area can be inferred, and this in turn can be used to assign a pixel to asphalt (in an urban area) or bare dirt (nonurban).

The data set used was from Sonoma County, California. In the accompanying map, the road network is shown in blue and the urban areas in yellow. Road density is defined as the total length of roadway within each grid cell of the map.

The following rules were found:

  • Road density <12,500 => Not Urban
    Support: 1535 (89%), Confidence: 95%
  • Road density >12,500 => Urban
    Support: 86 (5%), Confidence: 88%
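
A rough sketch of how rules of this kind can be tabulated is shown below: each grid cell is split into a low or high road density bin at the threshold, and support and confidence are counted for each bin against the urban label. Only the 12,500 threshold comes from the rules above (its units are not stated); the `threshold_rules` helper and the grid cells are hypothetical, and this is not a description of how Aspect itself selects thresholds.

```python
from collections import Counter
from typing import Dict, List, Tuple

THRESHOLD = 12_500  # threshold from the rules above; units are not stated


def threshold_rules(cells: List[Tuple[float, bool]],
                    threshold: float = THRESHOLD
                    ) -> Dict[str, Tuple[int, float, float]]:
    """Tabulate "density < t => Not Urban" and "density > t => Urban".

    Each cell is (road_density, is_urban). Returns, per rule, the support
    count, support as a fraction of all cells, and confidence. A density
    exactly equal to the threshold is placed in the low bin here; the
    source does not say how that case is handled.
    """
    counts = Counter((density > threshold, urban) for density, urban in cells)
    total = len(cells)
    results = {}
    for high, urban, label in [(False, False, "low => Not Urban"),
                               (True, True, "high => Urban")]:
        both = counts[(high, urban)]
        antecedent = counts[(high, True)] + counts[(high, False)]
        results[label] = (
            both,
            both / total if total else 0.0,
            both / antecedent if antecedent else 0.0,
        )
    return results


# Hypothetical grid cells, invented for illustration only.
grid = [(3_000, False), (8_000, False), (11_000, True), (9_500, False),
        (15_000, True), (20_000, True), (14_000, False), (1_200, False)]

for label, (count, support, confidence) in threshold_rules(grid).items():
    print(f"{label}: Support {count} ({support:.0%}), "
          f"Confidence {confidence:.0%}")
```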

Example 2

Public vehicular railroad crossings in California were examined to find characteristics that were associated with accident occurrence. This analysis made use of Aspect's capabilities to handle multivariate data sets and to identify the most relevant variables to include in a rule. Originally, 16 variables were examined with respect to accident occurrence. These variables were pared down to the four most significant.

The following rule was found to describe the characteristics of dangerous intersections:

  • Annual Average Daily Traffic(10000-100000) ^ Train Max Speed(61-70) ^ Total Daily Trains(20-40) ^ Position of Crossing("AtGrade") => At Least One Accident
    Support: 27, Confidence: 43%

Availability

A free limited-time beta version with tech support is currently available. Contact Laura Rodman (below) for details.

Contact

Laura Rodman
Nielsen Engineering & Research, Inc.
605 Ellis St., Ste. 200
Mountain View, CA 94043
rodman@nearinc.com
707-538-8591

Reference

Rodman, L. C., Jackson, J., Huizar III, R., and Meentemeyer, R. K., "An Association Rule Discovery System for Geographic Data," 2006 IEEE International Geoscience and Remote Sensing Symposium, Denver, CO, Jul. 31-Aug. 4, 2006. See also the Aspect Technical Background.
