Multidimensional hierarchical toolkit
The Multidimensional hierarchical toolkit, or Multi-Dimensional and Hierarchical (MDH) Database Toolkit, is a Linux-based, open-source toolkit of portable software that supports very fast, flexible, multi-dimensional and hierarchical storage, retrieval and manipulation of information in databases ranging in size up to 256 terabytes. The package is written in C and C++ and is available in source code form under the GNU GPL, LGPL and Free Documentation licenses. The distribution kit contains demonstration implementations of network-capable, interactive text and sequence retrieval tools that work with very large genomic databases and illustrate the toolkit's ability to manipulate massive sets of genomic data.

Distribution

The toolkit is distributed as part of the Mumps Compiler. Versions exist for Linux, Cygwin, and Windows XP.

Origins

The toolkit is a solution to the problem of manipulating very large, character-string-indexed, multi-dimensional, sparse matrices. It is based on MUMPS (also referred to as M), a general-purpose programming language that originated in the mid-1960s at the Massachusetts General Hospital.

Key features

The principal database feature in this project is the global array, which permits direct, efficient manipulation of multi-dimensional arrays of effectively unlimited size. A global array is a persistent, sparse, undeclared, multi-dimensional, string-indexed, disk-based data structure. A global array may appear anywhere an ordinary array reference is permitted, and data may be stored at intermediate nodes of the database array as well as at leaf nodes. The number of subscripts in an array reference is limited only by the total length of the array reference with all subscripts expanded to their string values. The toolkit includes several functions to traverse the database and manipulate the arrays.
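
The properties listed above can be illustrated with a minimal C++ sketch. It is not the toolkit's API: the class name SparseArray and its methods set, get and next are hypothetical, and the data lives in memory rather than on disk, so persistence is not modelled. The sketch shows only that nodes are created on first assignment, that subscripts are strings, that a node with descendants can hold its own value, and that sibling subscripts can be walked in order in the spirit of $order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for a global array (not the MDH class).
// Each key is a full subscript path, so only assigned nodes take up space.
class SparseArray {
public:
    using Path = std::vector<std::string>;

    // Store a value at any node, whether or not it has descendants.
    void set(const Path& path, const std::string& value) {
        assert(!path.empty());
        nodes_[path] = value;
    }

    // Copy the node's value into `out`; return false if the node holds no data.
    bool get(const Path& path, std::string& out) const {
        auto it = nodes_.find(path);
        if (it == nodes_.end()) return false;
        out = it->second;
        return true;
    }

    // Next subscript after `last` directly below `prefix`, or "" if none.
    // Assumption: plain string ordering; real Mumps collation may differ.
    std::string next(const Path& prefix, const std::string& last) const {
        const std::size_t depth = prefix.size();
        for (auto it = nodes_.lower_bound(prefix); it != nodes_.end(); ++it) {
            const Path& key = it->first;
            // Keys sharing the prefix are contiguous; stop once past them.
            if (key.size() < depth ||
                !std::equal(prefix.begin(), prefix.end(), key.begin())) break;
            if (key.size() > depth && key[depth] > last) return key[depth];
        }
        return "";
    }

private:
    std::map<Path, std::string> nodes_;
};
```

Passing an empty string as the last argument of next starts the walk at the first subscript, and an empty return value marks the end, which mirrors the convention Mumps uses for $order.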

The toolkit makes the database and function set available as C++ classes and also permits interpretive execution of legacy Mumps scripts. Using the toolkit requires installing the MDH and Mumps distribution kit and related code.

Functions implemented

The toolkit implements the legacy Mumps functions $ascii, $extract, $find, $horolog, $length, $name, $justify, $order, $piece, and $test, as well as vector and matrix operations, Boyer–Moore–Gosper string search functions, a Smith–Waterman algorithm function, relational algebra operations, and access to the Perl Compatible Regular Expression library (PCRE).
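
As an illustration of what one of these legacy functions does, the following C++ sketch mimics the three-argument form of $piece, which returns the n-th field of a string split on a delimiter, counting from 1. The function name piece and its signature are assumptions made for this example; the article does not state how the toolkit exposes the function to C++ callers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper in the spirit of $piece(text, delim, n):
// the n-th (1-based) delimiter-separated field, or "" if there is none.
std::string piece(const std::string& text, const std::string& delim, int n) {
    assert(!delim.empty());
    if (n < 1) return "";
    std::size_t start = 0;
    // Skip over the first n-1 fields.
    for (int i = 1; i < n; ++i) {
        const std::size_t hit = text.find(delim, start);
        if (hit == std::string::npos) return "";
        start = hit + delim.size();
    }
    // The field runs to the next delimiter or to the end of the string.
    const std::size_t end = text.find(delim, start);
    return text.substr(start,
                       end == std::string::npos ? std::string::npos : end - start);
}
```

With this sketch, piece("a^b^c", "^", 2) yields "b", and asking for a fourth field yields an empty string.
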
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 