Hierarchical Segmentation (HSEG) and its Recursive approximation (RHSEG) for Data Analysis
For faster, highly accurate processing of high-resolution images and other complex data sets in 2D and 3D
The Hierarchical Segmentation (HSEG) suite of technologies, developed by NASA Goddard Space Flight Center's Dr. James C. Tilton, provides hierarchical segmentation (pre-processing) of image and image-like data. HSEG's recursive, divide-and-conquer approximation, RHSEG, provides the ability to process very large data sets.
The HSEG suite significantly improves the extraction of patterns from complex data sets. Optimized for speed and accuracy, it gives the user precise control over the level of detail selected from the hierarchy of results. The software can group spatially non-adjacent regions, providing unprecedented accuracy and flexibility across a wide range of data types. Images can be two-dimensional or three-dimensional single-band, multispectral, or hyperspectral data. Originally designed for Earth remote sensing, the HSEG suite is broadly applicable to fields ranging from medical image analysis to image data mining.
NASA Goddard invites companies, universities, and other government laboratories to license HSEG technologies.
The HSEG suite offers the following benefits:
The HSEG suite is useful for pre-processing image and image-like data for further intelligent analysis. Possible applications for HSEG include, but are not limited to:
NASA has used HSEG software in several projects.
Subdue'ing RHSEG: The Marriage of Graph-Based Knowledge Discovery (Subdue) with Image Segmentation Hierarchies (from RHSEG) for Data Analysis, Data Mining, and Knowledge Discovery
The HSEG suite was a key technology in a NASA research project funded for fiscal year 2008 (October 2007–September 2008), "Subdue’ing RHSEG: The Marriage of Graph-Based Knowledge Discovery (Subdue) with Image Segmentation Hierarchies (from RHSEG) for Data Analysis, Data Mining, and Knowledge Discovery." The principal investigator was Dr. James C. Tilton of NASA Goddard Space Flight Center, and the co-investigator was Dr. Diane J. Cook of Washington State University. Seed funding for this project came from NASA’s Applied Information Systems Research Program.
Drs. Tilton and Cook investigated the design and implementation of an integration between the Subdue graph-based knowledge discovery system, developed at the University of Texas at Arlington and Washington State University, and the image segmentation hierarchies produced by RHSEG.
Subdue is a method for discovering substructures in structural databases. Subdue was devised for general-purpose automated discovery, concept learning, and hierarchical clustering, with or without domain knowledge. For Subdue to be effective in finding patterns in imagery data, the data must be abstracted up from the pixel domain through image segmentation.
RHSEG was an excellent choice for providing the image segmentations that Subdue requires as input, for three key reasons: (1) the high spatial fidelity of the image segmentations RHSEG produces, (2) RHSEG's ability to automatically group spatially connected region objects into region classes, and (3) the hierarchical set of image segmentations that RHSEG produces automatically.
This seed project took some important initial steps in translating image segmentations into relational graphs for analysis by Subdue, achieving some limited data analysis success. The grouping of region objects into region classes, provided by RHSEG, proved important in this translation. The seed project also clarified the importance of enabling Subdue to utilize region object size and region object neighbor relationship information. This is one of the key elements of a follow-on proposal to NASA’s Applied Information Systems Research Program, “Object-Based Image Analysis for Data Analysis, Data Mining and Knowledge Discovery.” Another element of this proposed project seeks to enable Subdue to utilize directly the RHSEG-provided segmentation hierarchy. NASA is expected to make the funding announcement for this follow-on proposal in spring 2009.
MODIS Snow and Ice Product Suite: Maintenance, Enhancement, Error Analysis, and Validation
The HSEG suite is being used in the NASA-funded research project, "MODIS Snow and Ice Product Suite: Maintenance, Enhancement, Error Analysis, and Validation," selected for funding in fiscal year 2008 by NASA's Science Mission Directorate. The principal investigator was Dr. Dorothy K. Hall, NASA Goddard Space Flight Center, and the co-investigators were Dr. Vincent Salomonson, University of Utah; Dr. George A. Riggs, Science Systems and Applications, Inc.; and Dr. James C. Tilton, NASA Goddard Space Flight Center.
The objective of this project is to maintain, enhance, validate, and refine the current suite of Terra and Aqua MODIS snow and sea ice algorithms to provide consistent, systematic measurements for science research, modeling, and for development of climate-data records of snow cover and sea ice surface temperature (IST).
Automating and Enhancing Protocols for the Development of Signatures for Archaeological Sites Using Publicly Available NASA Imagery
The HSEG suite is being used to find and study archaeological sites in an effort funded through the NASA Space Archaeology Program. Cultural Site Research and Management (CSRM), a private company, contracts with the Department of Defense to help the U.S. Navy and Marine Corps produce archaeological surveys. The company uses RHSEG to test a number of approaches to improving the accuracy of archaeological site identification, including the elimination of "noise," with the goal of reducing false-positive signatures. CSRM is working at the World Heritage site of Petra, Jordan, at Mayan sites in Central America and Inca sites in South America, and at a North American Indian site in Bluff, Utah. The company seeks to identify and preserve such sites, and to help understand the history of environmental change and the ways in which human alterations of the landscape have precipitated that change and, in some cases, environmental collapse.
The HSEG suite has been nonexclusively licensed to Bartron Medical Imaging, LLC. Since launching its medical imaging product, Med-Seg, Bartron has reported that RHSEG has enabled the company to extract meaningful and significant features from grayscale data that were previously indistinguishable to the human eye.
| Technology Details
HSEG partitions two- and three-dimensional image or image-like data into regions or clusters at various levels of detail. Using HSEG, analysts can hierarchically relate regions of data at coarser levels of detail to regions of data at finer levels. After being processed by HSEG software, data are grouped and can be analyzed in terms of hierarchically related regions, rather than as individual data points, enabling a more consistent and accurate analysis. More information about how HSEG works appears below. To understand the richness of HSEG's capabilities, it is helpful to have some background on image segmentation in general.
Image segmentation is the partitioning of an image into related sections or regions. Best merge region growing is a widely used approach for image segmentation. A classic example of best merge region growing is the "hierarchical step-wise optimization" (HSWO) approach developed by Beaulieu and Goldberg ("Hierarchy in picture segmentation: A stepwise optimal approach," IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(2), pp. 150-163, Feb. 1989). Best merge region growing approaches, such as HSWO, produce image segmentations consisting of spatially connected region objects. However, spectrally similar objects often appear in spatially separated locations. The hierarchical segmentation software (HSEG) takes advantage of this common image characteristic by grouping spectrally similar but spatially separated region objects together into region classes. HSEG does this by alternating HSWO-type iterations of merging spatially adjacent regions with iterations that merge spatially non-adjacent regions. HSEG is unique in this grouping of spatially disjoint region objects into region classes.
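To make best merge region growing concrete, here is a minimal sketch of HSWO-style merging on a one-dimensional signal. It assumes the squared-error increase criterion used with a piecewise-constant region model in the stepwise-optimal approach; HSEG itself offers several dissimilarity criteria, and the function names here are hypothetical.

```python
def merge_cost(n1, mean1, n2, mean2):
    """Increase in total squared error if two regions are merged
    (assumed stepwise-optimal criterion for a piecewise-constant model)."""
    return (n1 * n2) / (n1 + n2) * (mean1 - mean2) ** 2


def hswo_1d(signal, target_regions):
    """Best merge region growing on a 1-D signal: repeatedly merge the
    spatially adjacent pair of regions with the lowest merge cost."""
    # Each region is [pixel count, mean]; regions stay in spatial order,
    # so regions i and i + 1 are spatially adjacent.
    regions = [[1, float(v)] for v in signal]
    while len(regions) > target_regions:
        costs = [merge_cost(*regions[i], *regions[i + 1])
                 for i in range(len(regions) - 1)]
        i = costs.index(min(costs))  # best (most similar) adjacent pair
        (n1, m1), (n2, m2) = regions[i], regions[i + 1]
        regions[i:i + 2] = [[n1 + n2, (n1 * m1 + n2 * m2) / (n1 + n2)]]
    return regions
```

For example, hswo_1d([1, 1, 2, 9, 10, 10], 2) ends with two regions of three pixels each, with means of about 1.33 and 9.67. Because only neighbors in the list can merge, this sketch cannot group spatially separated but similar regions, which is exactly the limitation HSEG's non-adjacent merging addresses.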
Simply intertwining spatially non-adjacent region merges with the spatially adjacent region merges normally performed by a best merge region growing approach such as HSWO would be prohibitively expensive computationally. HSEG significantly reduces this computational load by limiting the regions considered for non-adjacent region merging to "large regions" containing at least "min_npixels" pixels. HSEG adjusts the value of "min_npixels" automatically so that the number of "large regions" stays, as far as possible, within a set range, normally 512 to 1024.
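The automatic adjustment of min_npixels can be illustrated with a small sketch. The rule below, which picks the smallest threshold that keeps the count of large regions at or below an upper bound, is an assumption for illustration only; the text above states just that HSEG keeps the count within a range, normally 512 to 1024. The function names are hypothetical.

```python
def choose_min_npixels(region_sizes, max_large=1024):
    """Return the smallest min_npixels such that at most max_large regions
    contain at least min_npixels pixels (hypothetical rule)."""
    sizes = sorted(region_sizes, reverse=True)
    if len(sizes) <= max_large:
        return 1  # every region can take part in non-adjacent merging
    # The (max_large + 1)-th largest region must fall below the threshold.
    return sizes[max_large] + 1


def large_regions(region_sizes, min_npixels):
    """Sizes of the regions eligible for non-adjacent merging."""
    return [s for s in region_sizes if s >= min_npixels]
```

Capping the number of large regions caps the non-adjacent comparisons: 1024 regions give at most 1024 × 1023 / 2 = 523,776 pairs, no matter how many small regions the image contains. Ties in region size can push the count below the upper bound, which is one reason a target range is more practical than a fixed count.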
However, HSEG still cannot be run on very large images without further modification. RHSEG is a recursive approximation of HSEG: the image data are recursively subdivided into smaller subsections for initial processing, and the whole image is eventually processed by HSEG after being initialized with the HSEG segmentation results from those subsections. RHSEG has an "embarrassingly" parallel implementation, which enables it to produce hierarchical segmentations of very large images in a reasonable amount of time. For example, a Landsat Thematic Mapper scene of about 7,000 columns and 6,500 rows can be processed on a 256-CPU parallel cluster in less than 10 minutes.
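The recursive subdivision can be sketched as follows for a two-dimensional image. The sketch assumes that each recursion level halves a section along both dimensions (four quadrants) and that the recursion depth is a level count in the spirit of the rnb_levels parameter mentioned in the FAQ. RHSEG's actual splitting rules may differ, and the function name is hypothetical. In RHSEG, HSEG would run on each deepest section, and the combined results would initialize HSEG on the parent section.

```python
def subdivide(row0, col0, nrows, ncols, levels):
    """Return the (row0, col0, nrows, ncols) sections processed at the
    deepest recursion level when a 2-D image is split into quadrants
    levels - 1 times (hypothetical splitting rule)."""
    if levels <= 1:
        return [(row0, col0, nrows, ncols)]
    half_r, half_c = nrows // 2, ncols // 2
    quadrants = [
        (row0, col0, half_r, half_c),
        (row0, col0 + half_c, half_r, ncols - half_c),
        (row0 + half_r, col0, nrows - half_r, half_c),
        (row0 + half_r, col0 + half_c, nrows - half_r, ncols - half_c),
    ]
    sections = []
    for r, c, nr, nc in quadrants:
        sections.extend(subdivide(r, c, nr, nc, levels - 1))
    return sections
```

With levels=4, a 6528-row by 6912-column Landsat scene would be split into 64 sections of 816 by 864 pixels. These deepest sections do not depend on one another, so they can be processed concurrently. That independence is what makes the parallel implementation "embarrassingly" parallel.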
A hierarchical set of image segmentations is a set of image segmentations of the same image at different levels of detail in which the less detailed segmentations can be produced from specific merges of regions contained in the more detailed segmentations. Unlike most other segmentation approaches that produce a single segmentation result, a natural product of the HSEG approach is a hierarchical set of segmentations. A single preferred segmentation can be selected out of the segmentation hierarchy by examining how the features of individual regions change throughout each level of detail.
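A segmentation hierarchy is defined by one property: each coarser segmentation can be produced by merging regions of a finer one. That property can be checked directly on label arrays. The minimal sketch below uses hypothetical names and represents each segmentation as a flat list of region labels, one per pixel.

```python
def is_nested(fine, coarse):
    """True if every fine region lies entirely inside one coarse region,
    i.e. the coarse segmentation can be produced by merging fine regions."""
    if len(fine) != len(coarse):
        raise ValueError("segmentations must cover the same pixels")
    parent = {}
    for f, c in zip(fine, coarse):
        if parent.setdefault(f, c) != c:
            return False  # one fine region is split across coarse regions
    return True


def is_hierarchy(levels):
    """Check a list of segmentations ordered from finest to coarsest."""
    return all(is_nested(levels[i], levels[i + 1])
               for i in range(len(levels) - 1))
```

For example, the labels [0, 0, 1, 1, 2, 3], then [0, 0, 0, 0, 2, 2], then [0, 0, 0, 0, 0, 0] form a valid hierarchy. Replacing the middle level with [0, 1, 1, 1, 2, 2] would break it, because fine region 0 would be split.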
Dr. Tilton first began developing hierarchical segmentation technologies in 1983, after becoming familiar with earth sciences and remote sensing in graduate school. During his initial years at NASA, he began to think about image segmentation and about analyzing data beyond the typical "per-pixel" approach, because an individual pixel does not necessarily provide enough information about where it fits into the overall "scene." Dr. Tilton theorized that a better understanding could be achieved by considering the context of the image and looking at the objects in the image rather than the individual pixels. This theory ultimately led to the initial version of the core HSEG software algorithm.
The HSEG suite now contains the following components. For information about how to access these components, see the “Licensing” section below.
HSEG (for single-processor computing platforms, for two- or three-dimensional images, and for non-image data)
The HSEG algorithm is effective for processing small to moderate-size images, up to 1024x1024 pixels.
Recursive Approximation of HSEG, called RHSEG (for parallel and single-processor computing platforms, for two- or three-dimensional images, and for non-image data)
Because of its high computer memory requirements, HSEG generally cannot be used to process large images (larger than 2048x2048 pixels) on conventional computing platforms; the RHSEG approximation of HSEG must be used instead. RHSEG's recursive subdivision of the image data into subsections for initial processing substantially reduces the program's memory requirements. RHSEG's divide-and-conquer design also lends itself to a straightforward and efficient implementation on parallel or serial computing platforms.
Parallel vs. Serial Computing: The implementation of RHSEG on parallel computing platforms is very effective in exploiting available concurrent processing, making it possible to process very large images in a reasonable amount of processing time (e.g., less than 10 minutes for a full Landsat Thematic Mapper scene).
For those without access to parallel processing machines, NASA offers a version of RHSEG that uses an innovative data-swapping scheme to enable processing of larger image sets on single processor platforms. For example, a 6912-column, 6528-row, 6-band Landsat Thematic Mapper image can be processed in 8 hours, whereas previous versions were unable to process images of this magnitude. (The parallel version still is significantly faster, capable of processing the same image in only 1.5 minutes.)
Two- and Three-Dimensional Images: RHSEG can be used to analyze two- or three-dimensional data. Because processing three-dimensional data increases computational demands, NASA offers separate two-dimensional and three-dimensional versions, each with different licensing restrictions.
Non-Image Data: RHSEG can also process non-image data, provided the data have image-like characteristics when arranged in a one-, two-, or three-dimensional array. Data have image-like characteristics when data points positioned nearer to each other in the array are more highly correlated than data points positioned farther apart.
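For one-dimensional data, one way to probe this image-like property is to compare the correlation between values a short distance apart with the correlation between values farther apart. The sketch below illustrates the idea only; RHSEG does not necessarily perform such a test, and the function names and sample series are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def lag_correlation(values, lag):
    """Correlation between each value and the value `lag` positions away."""
    return pearson(values[:-lag], values[lag:])


# A piecewise-constant 1-D series (blocks of 10 equal values), loosely
# resembling a row of pixels that crosses several homogeneous regions.
pattern = [0, 3, 1, 4, 2]
series = [pattern[i // 10 % 5] for i in range(500)]
near, far = lag_correlation(series, 1), lag_correlation(series, 10)
```

Here near is about 0.85 and far is about -0.5. Adjacent values are strongly correlated, while values one block apart are much less so. For randomly shuffled data, both correlations would be close to zero, suggesting such data would gain little from region-based processing.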
Artifact Elimination: All versions of RHSEG incorporate patented artifact elimination software (the split-remerge method of U.S. Patent #7,697,759). This software avoids the processing window artifacts that RHSEG's recursive subdivision and subsequent recombination of the image data would otherwise cause. It works by identifying pixels that are more similar to a neighboring candidate region than to the region they currently belong to. It then either reassigns those pixels to the candidate region or splits them out and remerges them after processing.
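The first step of this process is finding pixels that fit a neighboring candidate region better than their own region. It can be sketched as below. This is a simplified illustration under stated assumptions, not the patented split-remerge method. It uses single-band pixel values, compares each pixel with region means by absolute difference, and considers only 4-connected neighbors. All names are hypothetical.

```python
def region_means(image, labels):
    """Mean pixel value of each region label."""
    sums, counts = {}, {}
    for row_img, row_lab in zip(image, labels):
        for v, lab in zip(row_img, row_lab):
            sums[lab] = sums.get(lab, 0.0) + v
            counts[lab] = counts.get(lab, 0) + 1
    return {lab: sums[lab] / counts[lab] for lab in sums}


def misfit_pixels(image, labels):
    """Return (row, col, better_label) for pixels whose value is closer to
    the mean of a 4-connected neighboring region than to their own."""
    means = region_means(image, labels)
    nrows, ncols = len(image), len(image[0])
    flagged = []
    for r in range(nrows):
        for c in range(ncols):
            own = labels[r][c]
            best, best_d = own, abs(image[r][c] - means[own])
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrows and 0 <= cc < ncols:
                    cand = labels[rr][cc]
                    d = abs(image[r][c] - means[cand])
                    if d < best_d:
                        best, best_d = cand, d
            if best != own:
                flagged.append((r, c, best))
    return flagged
```

In a fuller treatment, each flagged pixel would then be either reassigned to the candidate region or split out as its own region and left for region growing to remerge.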
HSEG's or RHSEG's output is a hierarchical set of segmentations. Most scientists, however, want a single segmentation to work with, so the challenge is selecting the appropriate segmentation from the hierarchical set. At times, this may be as simple as determining which single segmentation in the set suits the application. Frequently, however, a single segmentation synthesized by combining segmentations from several levels of the hierarchy is more suitable. An additional program, HSEGViewer, is therefore available to facilitate combining a number of the hierarchical segmentations into a single segmentation. The tool lets the user view the HSEG or RHSEG output, making it easier to select segmentations.
The HSEG software initially partitions the image data (by default, each image pixel is placed into a separate partition or region) and then compares each region with spatially adjacent regions. Pairs of spatially adjacent regions that are most similar are combined to form larger regions. Then, HSEG compares pairs of non-spatially adjacent regions, and combines pairs of non-spatially adjacent regions that are at least as similar as the previously compared spatially adjacent regions. This process continues until reaching a prespecified number of regions (depending on the coarseness or fineness of detail desired). At this point, HSEG provides options for controlling the output of the segmentation hierarchy from that number of regions, down to a two-region segmentation.
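The steps above can be sketched as a toy program for a small single-band image. This is not HSEG's implementation. It assumes that dissimilarity is the absolute difference of region means. It also assumes that after each adjacent merge at dissimilarity t, non-adjacent pairs with dissimilarity at most spclust_wght × t are merged, most similar first, so that spclust_wght = 1.0 matches the "at least as similar" rule above and 0.0 reduces to HSWO-style region growing. The function name is hypothetical, and the code favors clarity over speed.

```python
def hseg_sketch(image, target_regions, spclust_wght=1.0):
    """Toy HSEG-style segmentation of a small single-band 2-D image.

    Assumptions (not HSEG's actual implementation): dissimilarity is the
    absolute difference of region means, and after each adjacent merge at
    threshold t, non-adjacent pairs with dissimilarity <= spclust_wght * t
    are merged, most similar first.
    """
    nrows, ncols = len(image), len(image[0])
    # Initially every pixel is its own region.
    labels = [[r * ncols + c for c in range(ncols)] for r in range(nrows)]
    stats = {labels[r][c]: [float(image[r][c]), 1]  # [sum, count]
             for r in range(nrows) for c in range(ncols)}

    def mean(lab):
        s, n = stats[lab]
        return s / n

    def merge(a, b):
        # Fold region b into region a and relabel its pixels.
        stats[a][0] += stats[b][0]
        stats[a][1] += stats[b][1]
        del stats[b]
        for row in labels:
            for c, lab in enumerate(row):
                if lab == b:
                    row[c] = a

    def adjacent_pairs():
        # Pairs of distinct regions that share a 4-connected pixel edge.
        pairs = set()
        for r in range(nrows):
            for c in range(ncols):
                for rr, cc in ((r + 1, c), (r, c + 1)):
                    if rr < nrows and cc < ncols:
                        a, b = labels[r][c], labels[rr][cc]
                        if a != b:
                            pairs.add((min(a, b), max(a, b)))
        return pairs

    while len(stats) > target_regions:
        # Step 1: merge the most similar spatially adjacent pair.
        a, b = min(adjacent_pairs(),
                   key=lambda p: abs(mean(p[0]) - mean(p[1])))
        t = abs(mean(a) - mean(b))
        merge(a, b)
        # Step 2: merge non-adjacent pairs that are similar enough.
        while spclust_wght > 0 and len(stats) > target_regions:
            adj = adjacent_pairs()
            labs = sorted(stats)
            cands = [(abs(mean(x) - mean(y)), x, y)
                     for i, x in enumerate(labs) for y in labs[i + 1:]
                     if (x, y) not in adj]
            cands = [p for p in cands if p[0] <= spclust_wght * t]
            if not cands:
                break
            _, x, y = min(cands)
            merge(x, y)
    return labels
```

On a 2 x 6 image with two bright patches separated by a dark strip, hseg_sketch(image, 2) puts both bright patches into one region class, even though they do not touch. With spclust_wght=0.0 and three target regions, the two patches remain separate region objects, as they would with HSWO.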
Notably, the HSEG software produces a hierarchical set of segmentations that faithfully and compactly represents image information content. The following examples give some indication of its usefulness.
Figure 1 is a true color rendition of a 512x512 pixel portion of a Landsat 7 Thematic Mapper image from northern Wisconsin collected on May 26, 2003. HSEG was used to pre-process this image into a segmentation hierarchy; it was not necessary to use RHSEG because this is a relatively small image.
Figure 2 displays the segmentation at the coarsest level of the segmentation hierarchy produced by HSEG. This segmentation consists of 199 region objects grouped into just 2 region classes and separates water from land. Notice how faithfully this segmentation reproduces the details of the lake and stream shorelines.
Figure 3 displays the segmentation at the second coarsest level of the HSEG segmentation hierarchy. This segmentation consists of 678 region objects grouped into just 3 region classes. Besides separating water from land, this segmentation also separates wetlands from dry land.
Figure 4 displays the segmentation at the third coarsest level of the HSEG segmentation hierarchy. This segmentation consists of 1039 region objects grouped into just 4 region classes. Besides separating water from land, and wetland from dry land, this segmentation also separates vegetated from non-vegetated land.
So far we have only examined particular levels of the HSEG segmentation hierarchy. Using the accompanying HSEGViewer tool, it is also possible to examine how the segmentation hierarchy evolves for a particular region class. We will use this capability to examine how the segmentation hierarchy evolves for the water region class. Figure 5 illustrates the separation of shallower streams and lakes from the rest of the lakes, using HSEGViewer to differentially examine and display the HSEG segmentation hierarchy. Figure 6 illustrates a further example of this type of capability in the separation of darker (deeper?) lakes from the other lakes.
A second example of HSEG's capabilities is illustrated in a comparison between HSWO and HSEG segmentation results, as shown in Figure 7 (a,b,c,d). HSEG clearly does a much better job of preserving the information content in this example: the region mean image for the HSEG segmentation result (Fig. 7(c)) looks much more similar to the original image than the corresponding region mean image for the HSWO segmentation result (Fig. 7(b)). HSEG produces a highly compact representation of this particular image with just 43 region classes, grouping each of its 7,011 region objects into one of these 43 region classes. The 218 region object representation produced by HSWO is not quite so compact, but, more significantly, it does a much poorer job of representing the information content of the image. HSEG's tight integration of region object finding and region classification enables a compact representation of the image information content while maintaining high spatial fidelity.
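A region mean image, as in Fig. 7(b) and 7(c), replaces every pixel with the mean value of the region or region class it belongs to. The minimal single-band sketch below uses hypothetical names. It also includes a root-mean-square difference as one simple way to put a number on "looks more similar to the original image." That measure is a suggestion here, not one used in the comparison above.

```python
import math
from collections import defaultdict


def region_mean_image(image, labels):
    """Replace each pixel by the mean of all pixels sharing its label.

    With region-class labels, spatially separate objects in the same class
    receive the same mean value.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row_img, row_lab in zip(image, labels):
        for v, lab in zip(row_img, row_lab):
            sums[lab] += v
            counts[lab] += 1
    return [[sums[lab] / counts[lab] for lab in row_lab] for row_lab in labels]


def rmse(image, approx):
    """Root-mean-square difference between two same-shaped images."""
    diffs = [(a - b) ** 2
             for row_a, row_b in zip(image, approx)
             for a, b in zip(row_a, row_b)]
    return math.sqrt(sum(diffs) / len(diffs))
```

For image [[1, 3, 10], [2, 4, 12]] with labels [[0, 0, 1], [0, 0, 1]], the region mean image is [[2.5, 2.5, 11.0], [2.5, 2.5, 11.0]]. A lower RMSE means the segmentation preserves more of the original image's detail.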
Consider the patterning of dark roofs evident throughout Fig. 7(a). These roofs are labeled as region class "6" in the HSEG segmentation result shown in Fig. 7(c) and are highlighted in white in Fig. 7(d). Note a certain regularity of the roof pattern to the southeast, east and north of Patterson Park. This area is generally an older residential area, with a few businesses interspersed. The roof pattern to the southwest and west of Patterson Park appears somewhat different. This area has a denser concentration of businesses and apartment complexes. Pixel-based analysis could never detect this difference in spatial patterning, whereas detection of such spatial patterning should be possible with the appropriate object-based image analysis (OBIA) approach. The assumption made here is that if the spatial pattern detection system built into the human eye-brain system can detect it, a sufficiently sophisticated OBIA approach should also be able to detect it.
Figure 7. Comparison of HSEG segmentation versus region growing segmentation without region classification produced by HSWO. (a) A portion of an Ikonos image depicting the Patterson Park area of Baltimore, MD. (b) The region mean image from the HSWO region growing segmentation result at merging threshold 8.75 (218 region objects). (c) The region mean image from the HSEG result at merging threshold 8.76 (7,011 region objects grouped into 43 region classes). (d) Region class "6" highlighted in white from the HSEG result shown in (c).
The HSEG software suite has been tested for a variety of image segmentation applications for projects undertaken by NASA Goddard’s Computational and Information Sciences and Technology Office.
The table below compares some processing times (minutes:seconds) for 2.8 GHz processors with 8 GBytes of RAM, on a six-band Landsat Thematic Mapper data set. The results demonstrate the effect of the number of recursive levels.
Nonadjacent region merging weighting factor (spclust_wght) = 0.5
A: HSEG can process data stored in a wide variety of image file formats. HSEG supports the image file formats supported by the Geospatial Data Abstraction Library (GDAL). See http://www.gdal.org/formats_list.html for a list of formats.
A: Maximum image size is dependent on the amount of RAM available. With 8 Gigabytes of RAM, you can process images up to 8,000 by 8,000 pixels with any number of bands and with the RHSEG rnb_levels parameter set to 4 to allow for the most efficient processing. Images as large as 16,000 by 16,000 pixels have been processed on parallel machines at the NASA Center for Climate Simulation.
Note that larger images may require parallel processing.
A: The HSEGViewer allows the user to manually classify and label regions with meaningful names (e.g., river, ground cover, buildings). Currently, RHSEG does not include any automated classification algorithms such as nearest neighbor, maximum likelihood, etc.
A: Yes. HSEG and RHSEG include a parameter called spclust_wght. By varying its value, the user can control both the relative importance of spectral clustering versus region growing in determining segments and the required similarity between nonadjacent regions. More information is provided in the RHSEG Help documentation. To obtain a copy, please contact Goddard's Innovative Partnerships Program office.
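The effect of spclust_wght on the similarity required between nonadjacent regions can be illustrated with a small sketch. As a simplification, it assumes two things. First, a nonadjacent pair qualifies for merging when its dissimilarity is at most spclust_wght times the dissimilarity of the most recent adjacent merge. Second, spclust_wght lies between 0.0 (no nonadjacent merging) and 1.0 (both kinds of merge treated alike). The RHSEG Help documentation is the authority on the actual rule, and the function name is hypothetical.

```python
def eligible_nonadjacent(pair_dissims, last_adjacent_dissim, spclust_wght):
    """Return the nonadjacent pair dissimilarities that qualify for merging
    under the assumed threshold spclust_wght * last_adjacent_dissim."""
    if not 0.0 <= spclust_wght <= 1.0:
        raise ValueError("spclust_wght is assumed to lie in [0.0, 1.0]")
    if spclust_wght == 0.0:
        return []  # region growing only: no nonadjacent merges
    limit = spclust_wght * last_adjacent_dissim
    return [d for d in pair_dissims if d <= limit]
```

With nonadjacent dissimilarities [0.5, 1.0, 2.0, 4.0] and a last adjacent merge at 4.0, a weight of 0.5 admits three pairs, 1.0 admits all four, and 0.0 admits none. Lower weights therefore demand closer spectral similarity before spatially separated regions are grouped into one region class.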
A: RHSEG licensing is available for both Windows and Linux/Unix platforms. By default, the trial version is available for Windows. Trials for Solaris Unix or certain implementations of Linux are available on request.
For information about the various releases of HSEG software, review this document.
NASA holds two issued patents and has two pending patent applications. The HSEG suite also includes other intellectual property.
U.S. Patent #6,895,115: Method for implementation of recursive hierarchical segmentation on parallel computers
U.S. Patent #7,697,759: A split-remerge method for eliminating processing window artifacts in recursive hierarchical segmentation.
Pending patent application:
A U.S. patent application for “Systems, Methods, and Apparatus for D-Dimensional Formulation and Implementation of Recursive Hierarchical Segmentation” was filed on June 1, 2007.
A U.S. patent application for "Refinement of the HSEG algorithm for improved computational processing efficiency" was filed.
Additional intellectual property:
The Core RHSEG Pre-processing Software and the HSEGViewer are in the public domain.
| Publications and Awards
The Recursive Hierarchical Segmentation Pre-Processing Software for Analyzing Imagery Data has received several awards including:
| Licensing and Partnering Options
The HSEG software suite is available for licensing in two versions:
To start the licensing process or to submit a request to receive a 90-day evaluation version, please contact Goddard's Innovative Partnerships Program office.
| Demo Software
NASA is offering 90-day evaluation copies of the HSEG and RHSEG software to help demonstrate the program's capabilities. They are available for Windows, Solaris Unix, and various non-parallel Linux systems. (No demo is available for the parallel version of RHSEG.)
To receive the free 90-day evaluation software, please send an email to Adil Anis.
| For More Information
If you would like additional information or are interested in partnering with NASA for the commercialization of the HSEG technology, please contact:
Innovative Partnerships Program Office