Data-Intensive Computing
Data-intensive computing is a class of parallel computing applications which use a data-parallel approach to process large volumes of data, typically terabytes or petabytes in size, commonly referred to as big data. Computing applications that devote most of their execution time to computational requirements are deemed compute-intensive and typically require small volumes of data, whereas applications that require large volumes of data and devote most of their processing time to I/O and manipulation of data are deemed data-intensive.

Introduction

The rapid growth of the Internet and World Wide Web has led to vast amounts of information available online. In addition, business and government organizations create large amounts of both structured and unstructured information which needs to be processed, analyzed, and linked. Vinton Cerf of Google has described this as an “Information Avalanche” and has stated, “we must harness the Internet’s energy before the information it has unleashed buries us”. An IDC white paper sponsored by EMC estimated the amount of information stored in digital form in 2007 at 281 exabytes, with an overall compound growth rate of 57% and with information in organizations growing at an even faster rate. In another study of the so-called information explosion, it was estimated that 95% of all current information exists in unstructured form, with increased data processing requirements compared to structured information. Storing, managing, accessing, and processing this vast amount of data represents a fundamental need and an immense challenge in order to satisfy the need to search, analyze, mine, and visualize this data as information. Data-intensive computing is intended to address this need.

Parallel processing

Parallel processing approaches can generally be classified as either compute-intensive or data-intensive. Compute-intensive describes application programs that are compute bound. Such applications devote most of their execution time to computational requirements as opposed to I/O, and typically require small volumes of data. Parallel processing of compute-intensive applications typically involves parallelizing individual algorithms within an application process and decomposing the overall application process into separate tasks, which can then be executed in parallel on an appropriate computing platform to achieve higher overall performance than serial processing. In compute-intensive applications, multiple operations are performed simultaneously, with each operation addressing a particular part of the problem. This is often referred to as task parallelism.

Data-intensive describes applications that are I/O bound or that need to process large volumes of data. Such applications devote most of their processing time to I/O and to the movement and manipulation of data. Parallel processing of data-intensive applications typically involves partitioning or subdividing the data into multiple segments which can be processed independently using the same executable application program in parallel on an appropriate computing platform, then reassembling the results to produce the completed output data. The greater the aggregate distribution of the data, the more benefit there is in parallel processing of the data. Data-intensive processing requirements normally scale linearly with the size of the data and are very amenable to straightforward parallelization. The fundamental challenges for data-intensive computing are managing and processing exponentially growing data volumes, significantly reducing associated data analysis cycles to support practical, timely applications, and developing new algorithms which can scale to search and process massive amounts of data. Researchers have coined the term BORPS (billions of records per second) to measure record processing speed in a way analogous to how the term MIPS describes computers’ processing speed.

Data-Parallelism

Computer system architectures which can support data-parallel applications are a potential solution to the terabyte- and petabyte-scale data processing requirements of data-intensive computing. Data-parallelism can be defined as a computation applied independently to each data item of a set of data, which allows the degree of parallelism to be scaled with the volume of data. The most important reason for developing data-parallel applications is the potential for scalable performance, which may result in performance improvements of several orders of magnitude. The key issues in developing applications using data-parallelism are the choice of the algorithm, the strategy for data decomposition, load balancing on processing nodes, message-passing communications between nodes, and the overall accuracy of the results. Developing a data-parallel application can involve substantial programming complexity, both to define the problem in the context of available programming tools and to address limitations of the target architecture. Information extraction from and indexing of Web documents is typical of data-intensive computing, and can derive significant performance benefits from data-parallel implementations since Web and other document collections can then be processed in parallel, as in the sketch below.
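
A minimal sketch of this data-parallel pattern, assuming Python's standard multiprocessing module: the document collection is partitioned, the same counting function is applied independently to each partition, and the partial results are reassembled. The sample documents and the whitespace tokenizer are illustrative, not part of any particular framework.

    from collections import Counter
    from multiprocessing import Pool

    def process_partition(docs):
        """Apply the same computation independently to one data segment."""
        counts = Counter()
        for doc in docs:
            counts.update(doc.lower().split())
        return counts

    def partition(items, n):
        """Subdivide the data into n roughly equal segments."""
        size = (len(items) + n - 1) // n
        return [items[i:i + size] for i in range(0, len(items), size)]

    if __name__ == "__main__":
        documents = ["big data needs parallel processing",
                     "data parallel processing scales with data volume",
                     "indexing web documents is data intensive"]
        with Pool(processes=3) as pool:
            partials = pool.map(process_partition, partition(documents, 3))
        print(sum(partials, Counter()))   # reassemble the partial results

Because each partition is processed independently, the degree of parallelism here scales with the number of partitions, mirroring how data-parallel systems scale with the volume of data.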

Characteristics of Data-Intensive Computing Systems

The National Science Foundation (NSF) believes that data-intensive computing requires a “fundamentally different set of principles” than current computing approaches. Through a funding program within the Computer and Information Science and Engineering area, the NSF is seeking to “increase understanding of the capabilities and limitations of data-intensive computing.” The key areas of focus are:
  • Approaches to parallel programming to address the parallel processing of data on data-intensive systems
  • Programming abstractions including models, languages, and algorithms which allow a natural expression of parallel processing of data
  • Design of data-intensive computing platforms to provide high levels of reliability, efficiency, availability, and scalability
  • Identifying applications that can exploit this computing paradigm and determining how it should evolve to support emerging data-intensive applications


Pacific Northwest National Laboratory has defined data-intensive computing as “capturing, managing, analyzing, and understanding data at volumes and rates that push the frontiers of current technologies”. In its view, addressing the rapidly growing data volumes and complexity requires “epochal advances in software, hardware, and algorithm development” that can scale readily with the size of the data and provide effective and timely analysis and processing results.

Processing Approach

Current data-intensive computing platforms typically use a parallel computing approach combining multiple processors and disks in large commodity computing clusters connected by high-speed communications switches and networks, which allows the data to be partitioned among the available computing resources and processed independently to achieve performance and scalability based on the amount of data. A cluster can be defined as a type of parallel and distributed system consisting of a collection of interconnected stand-alone computers working together as a single integrated computing resource. This approach to parallel processing is often referred to as a “shared nothing” approach, since each node consisting of processor, local memory, and disk resources shares nothing with other nodes in the cluster. In parallel computing this approach is considered suitable for data-intensive computing and for problems which are “embarrassingly parallel”, i.e. where it is relatively easy to separate the problem into a number of parallel tasks and there is no dependency or communication required between the tasks other than overall management of the tasks. These types of data processing problems are inherently adaptable to various forms of distributed computing, including clusters, data grids, and cloud computing.

Common Characteristics

There are several important common characteristics of data-intensive computing systems that distinguish them from other forms of computing:

(1) The principle of collocation of the data and the programs or algorithms used to perform the computation. To achieve high performance in data-intensive computing, it is important to minimize the movement of data. This characteristic allows processing algorithms to execute on the nodes where the data resides, reducing system overhead and increasing performance. Newer technologies such as InfiniBand allow data to be stored in a separate repository and provide performance comparable to collocated data.

(2) The programming model utilized. Data-intensive computing systems utilize a machine-independent approach in which applications are expressed in terms of high-level operations on data, and the runtime system transparently controls the scheduling, execution, load balancing, communications, and movement of programs and data across the distributed computing cluster. The programming abstraction and language tools allow the processing to be expressed in terms of data flows and transformations incorporating new dataflow programming languages and shared libraries of common data manipulation algorithms such as sorting.
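
As an illustration of this dataflow style, the sketch below expresses processing as a chain of high-level transformations over records. The stage names are hypothetical; a real data-intensive runtime would transparently schedule and distribute such stages across a cluster rather than running them as in-process generators.

    def read_records(lines):
        # Source stage: yield one cleaned record per input line.
        for line in lines:
            yield line.rstrip("\n")

    def transform(records):
        # Transformation stage: normalize each record.
        for rec in records:
            yield rec.upper()

    def keep(records, predicate):
        # Filter stage: pass through only the records of interest.
        for rec in records:
            if predicate(rec):
                yield rec

    lines = ["alpha,1", "beta,2", "alpha,3"]
    flow = keep(transform(read_records(lines)), lambda r: r.startswith("ALPHA"))
    print(sorted(flow))   # sorting as a common data manipulation step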

(3) A focus on reliability and availability. Large-scale systems with hundreds or thousands of processing nodes are inherently more susceptible to hardware failures, communications errors, and software bugs. Data-intensive computing systems are designed to be fault resilient. This typically includes redundant copies of all data files on disk, storage of intermediate processing results on disk, automatic detection of node or processing failures, and selective re-computation of results.
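
A toy illustration of the selective re-computation idea, under the simplifying assumptions that failures are transient and detected synchronously; real systems detect node and task failures via heartbeats and timeouts, and keep redundant copies of data and intermediate results on disk.

    import random

    def run_task(partition_id):
        # Simulated processing of one data partition with injected failures.
        if random.random() < 0.3:
            raise RuntimeError(f"task {partition_id} failed")
        return f"result-{partition_id}"

    results = {}                      # completed intermediate results
    pending = set(range(8))           # tasks not yet finished
    while pending:
        for pid in sorted(pending):
            try:
                results[pid] = run_task(pid)   # recompute failed tasks only
            except RuntimeError:
                pass                           # stays pending; retried later
        pending -= set(results)
    print(results)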

(4) The inherent scalability of the underlying hardware and software architecture. Data-intensive computing systems can typically be scaled in a linear fashion to accommodate virtually any amount of data, or to meet time-critical performance requirements, simply by adding processing nodes. The number of nodes and processing tasks assigned to a specific application can be variable or fixed depending on the hardware, software, communications, and distributed file system architecture.

System Architectures

A variety of system architectures have been implemented for data-intensive computing and large-scale data analysis applications, including parallel and distributed relational database management systems, which have been available to run on shared-nothing clusters of processing nodes for more than two decades. However, most data growth is in unstructured form, and new processing paradigms with more flexible data models were needed. Several solutions have emerged, including the MapReduce architecture pioneered by Google and now available in an open-source implementation called Hadoop, used by Yahoo, Facebook, and others. LexisNexis Risk Solutions also developed and implemented a scalable platform for data-intensive computing which is used by LexisNexis.

MapReduce

The MapReduce architecture and programming model pioneered by Google is an example of a modern systems architecture designed for data-intensive computing. The MapReduce architecture allows programmers to use a functional programming style to create a map function that processes a key-value pair associated with the input data to generate a set of intermediate key-value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Since the system automatically takes care of details such as partitioning the input data, scheduling and executing tasks across a processing cluster, and managing the communications between nodes, programmers with no experience in parallel programming can easily use a large distributed processing environment.

The programming model for the MapReduce architecture is a simple abstraction in which the computation takes a set of input key-value pairs associated with the input data and produces a set of output key-value pairs. In the Map phase, the input data is partitioned into input splits and assigned to Map tasks associated with processing nodes in the cluster. A Map task typically executes on the same node that holds its assigned partition of data. These Map tasks perform user-specified computations on each input key-value pair from the partition of input data assigned to the task, and generate a set of intermediate results for each key. The shuffle and sort phase then takes the intermediate data generated by each Map task, sorts this data together with intermediate data from other nodes, divides this data into regions to be processed by the Reduce tasks, and distributes this data as needed to the nodes where the Reduce tasks will execute. The Reduce tasks perform additional user-specified operations on the intermediate data, possibly merging values associated with a key to a smaller set of values, to produce the output data. For more complex data processing procedures, multiple MapReduce calls may be linked together in sequence. A minimal sketch of the model follows.
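
The sketch below is a single-process simulation of the model using the canonical word-count example (function and input names are illustrative): the map function emits intermediate (word, 1) pairs, an in-memory shuffle groups values by key, and the reduce function merges each key's values. Real implementations run the Map and Reduce tasks on cluster nodes and perform the shuffle over the network.

    from collections import defaultdict

    def map_fn(key, value):
        # Map: emit an intermediate key-value pair per word in the document.
        for word in value.split():
            yield word.lower(), 1

    def reduce_fn(key, values):
        # Reduce: merge all intermediate values for the same key.
        yield key, sum(values)

    def map_reduce(inputs):
        intermediate = defaultdict(list)
        for key, value in inputs:                 # Map phase
            for k, v in map_fn(key, value):
                intermediate[k].append(v)         # shuffle: group by key
        output = []
        for k in sorted(intermediate):            # sort, then Reduce phase
            output.extend(reduce_fn(k, intermediate[k]))
        return output

    docs = [("doc1", "the quick brown fox"), ("doc2", "the lazy dog the end")]
    print(map_reduce(docs))   # [('brown', 1), ('dog', 1), ..., ('the', 3)]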

Hadoop

Hadoop is an open-source software project sponsored by The Apache Software Foundation which implements the MapReduce architecture. Hadoop now encompasses multiple subprojects in addition to the base core, MapReduce, and the HDFS distributed filesystem. These additional subprojects provide enhanced application processing capabilities for the base Hadoop implementation and currently include Avro, Pig, HBase, ZooKeeper, Hive, and Chukwa. The Hadoop MapReduce architecture is functionally similar to the Google implementation except that the base programming language for Hadoop is Java instead of C++. The implementation is intended to execute on clusters of commodity processors.

Hadoop implements a distributed data processing scheduling and execution environment and framework for MapReduce jobs. Hadoop includes a distributed file system called HDFS which is analogous to GFS in the Google MapReduce implementation. The Hadoop execution environment supports additional distributed data processing capabilities which are designed to run using the Hadoop MapReduce architecture. These include HBase, a distributed column-oriented database which provides random-access read/write capabilities; Hive, a data warehouse system built on top of Hadoop that provides SQL-like query capabilities for data summarization, ad hoc queries, and analysis of large datasets; and Pig, a high-level data-flow programming language and execution framework for data-intensive computing.

Pig was developed at Yahoo! to provide a specific language notation for data analysis applications, to improve programmer productivity, and to reduce development cycles when using the Hadoop MapReduce environment. Pig programs are automatically translated into sequences of MapReduce programs as needed by the execution environment. Pig provides language capabilities for loading, storing, filtering, grouping, de-duplication, ordering, sorting, aggregation, and joining operations on the data; the sketch below mirrors this style of dataflow.
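
The following is not Pig Latin itself but an illustrative Python sketch mirroring the kind of load, filter, order, group, and aggregate dataflow a short Pig script expresses (the sample records are invented); in Pig, each such operation would be translated into one or more MapReduce stages.

    from itertools import groupby

    raw = ["alice,3", "bob,5", "alice,2", "carol,9"]        # LOAD-style step
    parsed = [(name, int(n)) for name, n in
              (line.split(",") for line in raw)]
    kept = [r for r in parsed if r[1] > 2]                  # FILTER-style step
    kept.sort(key=lambda r: r[0])                           # ORDER-style step
    grouped = groupby(kept, key=lambda r: r[0])             # GROUP-style step
    totals = {name: sum(n for _, n in rows)                 # aggregation
              for name, rows in grouped}
    print(totals)   # {'alice': 3, 'bob': 5, 'carol': 9}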

HPCC

LexisNexis Risk Solutions independently developed and implemented a solution for data-intensive computing called HPCC (High-Performance Computing Cluster). Development of this computing platform began in 1999, and applications were in production by late 2000. The LexisNexis approach also utilizes commodity clusters of hardware running the Linux operating system. Custom system software and middleware components were developed and layered on the base Linux operating system to provide the execution environment and distributed filesystem support required for data-intensive computing. LexisNexis also implemented a new high-level language for data-intensive computing called ECL.

The ECL programming language is the primary distinguishing factor between HPCC and other data-intensive computing solutions. It is a high-level, declarative, data-centric, implicitly parallel language that allows the programmer to define what the data processing result should be, along with the dataflows and transformations necessary to achieve that result. The ECL language includes extensive capabilities for data definition, filtering, data management, and data transformation, and provides an extensive set of built-in functions to operate on records in datasets, which can include user-defined transformation functions. ECL programs are compiled into optimized C++ source code, which is subsequently compiled into executable code and distributed to the nodes of a processing cluster.

To address both the batch and online aspects of data-intensive computing applications, HPCC includes two distinct cluster environments, each of which can be optimized independently for its parallel data processing purpose. The Thor platform is a cluster whose purpose is to serve as a data refinery for processing massive volumes of raw data for applications such as data cleansing and hygiene, ETL (extract, transform, load), record linking and entity resolution, large-scale ad hoc analysis of data, and creation of keyed data and indexes to support high-performance structured queries and data warehouse applications. A Thor system is similar in its hardware configuration, function, execution environment, filesystem, and capabilities to the Hadoop MapReduce platform, but provides higher performance in equivalent configurations. The Roxie platform provides an online high-performance structured query and analysis system, or data warehouse, delivering the parallel data access processing requirements of online applications through Web services interfaces, supporting thousands of simultaneous queries and users with sub-second response times. A Roxie system is similar in function and capabilities to Hadoop with HBase and Hive capabilities added, but provides an optimized execution environment and filesystem for high-performance online processing. Both Thor and Roxie systems use the same ECL programming language for implementing applications, increasing programmer productivity.

See also

  • List of important publications in concurrent, parallel, and distributed computing
  • Parallel computing
  • Distributed computing
  • Data parallelism
  • Big data
  • Implicit parallelism
  • Massively parallel
  • Supercomputer
  • HPCC