GeoDA
Encyclopedia
GeoDa is a free software package that conducts spatial data analysis
Spatial analysis
Spatial analysis or spatial statistics includes any of the formal techniques which study entities using their topological, geometric, or geographic properties...

, geovisualization
Geovisualization
Geovisualization, short for Geographic Visualization, refers to a set of tools and techniques supporting geospatial data analysis through the use of interactive visualization....

, spatial autocorrelation and spatial modeling. OpenGeoDa is the cross-platform, open source version of Legacy GeoDa. While Legacy GeoDa only runs on Windows XP, OpenGeoDa runs on different versions of Windows (including XP, Vista and 7), Mac OS, and Linux. The package was initially developed by the Spatial Analysis Laboratory of the University of Illinois at Urbana-Champaign
University of Illinois at Urbana-Champaign
The University of Illinois at Urbana–Champaign is a large public research-intensive university in the state of Illinois, United States. It is the flagship campus of the University of Illinois system...

 under the direction of Luc Anselin
Luc Anselin
-Life and contributions:Luc Anselin is currently Walter Isard Chair and Director of the where he attracted some of the leading spatial econometrics scholars. He also founded and directs the at ASU to develop, implement, apply, and disseminate spatial analysis methods...

. Development continues at the GeoDa Center for Geospatial Analysis and Computation at Arizona State University
Arizona State University
Arizona State University is a public research university located in the Phoenix Metropolitan Area of the State of Arizona...

.

GeoDa has powerful capabilities to perform spatial analysis, multivariate exploratory data analysis, and global and local spatial autocorrelation. It also performs basic linear regression
Linear regression
In statistics, linear regression is an approach to modeling the relationship between a scalar variable y and one or more explanatory variables denoted X. The case of one explanatory variable is called simple regression...

. As for spatial models, both the spatial lag model and the spatial error model, both estimated by maximum likelihood
Maximum likelihood
In statistics, maximum-likelihood estimation is a method of estimating the parameters of a statistical model. When applied to a data set and given a statistical model, maximum-likelihood estimation provides estimates for the model's parameters....

, are included.

History

GeoDa replaced what was previously called DynESDA, a module that worked under the old ArcView
ArcView
ArcView 3.x was a geographic information system software product produced by ESRI.-History:ArcView started as a graphical program for spatial data and maps made using ESRI's other software products. Over time more and more functionality was added to ArcView and it became a real GIS program capable...

 3.x to perform exploratory spatial data analysis (or ESDA). Current releases of GeoDa no longer depend on the presence of ArcView or other GIS packages on a system.

Functionality

Projects in GeoDa basically consist of a shapefile
Shapefile
The Esri Shapefile or simply a shapefile is a popular geospatial vector data format for geographic information systems software. It is developed and regulated by Esri as a open specification for data interoperability among Esri and other software products.Shapefiles spatially describe geometries:...

 that defines the lattice data, and an attribute table in a .dbf format. The attribute table can be edited inside GeoDa. GeoDa can produce histogram
Histogram
In statistics, a histogram is a graphical representation showing a visual impression of the distribution of data. It is an estimate of the probability distribution of a continuous variable and was first introduced by Karl Pearson...

s, box plot
Box plot
In descriptive statistics, a box plot or boxplot is a convenient way of graphically depicting groups of numerical data through their five-number summaries: the smallest observation , lower quartile , median , upper quartile , and largest observation...

s, and scatterplot
Scatterplot
A scatter plot or scattergraph is a type of mathematical diagram using Cartesian coordinates to display values for two variables for a set of data....

s to conduct simple exploratory analyses of the data.

The package is specialized on exploratory data analysis and geovisualization, where it exploits techniques for dynamic linking and brushing
Brushing and linking
In databases, brushing and linking refers to the connection of two or more views of the same data, such that a change to the representation in one view affects the representation in the other....

. This means that when the user has multiple views or windows in a project, selecting an object in one of them will highlight the same object in all other windows.

GeoDa also is capable of producing histograms, box plots, Scatter plots
Scatterplot
A scatter plot or scattergraph is a type of mathematical diagram using Cartesian coordinates to display values for two variables for a set of data....

 to conduct simple exploratory analyses of the data. The most important thing, however, is the capability of mapping and linking those statistical devices with the spatial distribution of the phenomenon that the users are studying.

Dynamic linking and brushing in GeoDa

Dynamic linking and brushing
Brushing and linking
In databases, brushing and linking refers to the connection of two or more views of the same data, such that a change to the representation in one view affects the representation in the other....

 are extremely powerful devices as they allow users to interactively discover or confirm suspected patterns of spatial arrangement of the data or otherwise discard the existence of those. It allow users to extract information from data in spatial arrangements that may otherwise require very heavy computer routines to crank the numbers and start yielding some statistical results. The latter may also cost the users quite a bit in terms of expert knowledge and software capabilities.

Anselin's Moran scatter plot

See also Indicators of spatial association
Indicators of spatial association
Indicators of spatial association are statistics that evaluate the existence of clusters in the spatial arrangement of a given variable. For instance if we are studying cancer rates among census tracts in a given city local clusters in the rates mean that there are areas that have higher or lower...


A very interesting device available in GeoDa to explore global patterns of autocorrelation in space is Anselin's Moran scatterplot. This graph depicts a standardized variable in the x-axis versus the spatial lag of that standardized variable. The spatial lag is nothing but a summary of the effects of the neighboring spatial units. That summary is obtained by means of a spatial weights matrix, which can take various forms, but a very commonly used is the contiguity matrix. The contiguity matrix is an array that has a value of one in the position (i, j) whenever the spatial unit j is contiguous to the unit i. For convenience that matrix is standardized in such a way that the rows sum to one by dividing each value by the row sum of the original matrix.

In essence Anselin's Moran scatterplot presents the relation of the variable in the location i with respect the values of that variable in the neighboring locations. By construction the slope of the line in the scatter plot is equivalent to the Moran's I coefficient. The latter is a well-known statistic that accounts for the Global spatial autocorrelation. If that slope is positive it means that there is positive spatial autocorrelation: high values of the variable in location i tend to be clustered with high values of the same variable in locations that are neighbors of i, and vice versa. If the slope in the scatter plot is negative that means that we have a sort of checkerboard pattern or a sort of spatial competition in which high values in a variable in location i tend to be co-located with lower values in the neighboring locations.

In Anselin's Moran scatter plot the slope of the curve is calculated and displayed right on top of the graph. As you can see the value in this case is positive, which means that areas with high rate of criminality tend to have neighbors with high rates as well, and vice versa.

Global versus Local Analyses in GeoDa

At the global level we can talk about clustering, i.e. the general trend of the map to be clustered; at the local level we can talk about clusters i.e. we are able to pinpoint the locations of the clusters. The latter can be assessed by means of Local Indicators of Spatial Association - LISA. LISA analysis allows us to identify where are the areas high values of a variable that are surrounded by high values on the neighboring areas i.e. what is called the high-high clusters. Concomitantly, the low-low clusters are also identified from this analysis.

Another type of phenomenon that is important to analyze in this context is the existence of outliers that represent high values of the variable in a given location surrounded by low values in the neighboring locations. This functionality is available in GeoDa by means of Anselin's Moran scatter plot. Note however, that the fact that a value is high in comparison with the values in neighboring locations does not necessarily means that it is an outlier as we need to assess the statistical significance of that relationship. In other words, we may find areas where there seems to be clustering or where there may seem to be clusters but when the statistical procedures are conducted they turn to be non statistically significant clusters or outliers. The procedures employed to assess statistical significance consists on a Monte Carlo simulation of different arrangements of the data and the construction of an empirical distribution of simulated statistics. Afterwards the value obtained originally is compared to the distribution of simulated values and if the value exceeds the 95h percentile it is said that the relation found is significant at 5%.
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK