Array programming - AbsoluteAstronomy.com

Computer science

Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...

, array programming languages (also known as vector or multidimensional languages) generalize operations on scalar

Scalar (computing)

In computing, a scalar variable or field is one that can hold only one value at a time; as opposed to composite variables like array, list, hash, record, etc. In some contexts, a scalar value may be understood to be numeric. A scalar data type is the type of a scalar variable...

s to apply transparently to vectors, matrices

Matrix (mathematics)

In mathematics, a matrix is a rectangular array of numbers, symbols, or expressions. The individual items in a matrix are called its elements or entries. An example of a matrix with six elements isMatrices of the same size can be added or subtracted element by element...

, and higher dimensional arrays.

Array programming primitives concisely express broad ideas about data manipulation. The level of conciseness can be dramatic in certain cases: it is not uncommon to find array programming language one-liners

One-liner program

A one-liner is textual input to the command-line of an operating system shell that performs some function in just one line of input.The one liner can be# An expression written in the language of the shell....

that require more than a couple of pages of Java code.

APL

APL programming language

APL is an interactive array-oriented language and integrated development environment, which is available from a number of commercial and noncommercial vendors and for most computer platforms. It is based on a mathematical notation developed by Kenneth E...

, designed by Ken Iverson

Kenneth E. Iverson

Kenneth Eugene Iverson was a Canadian computer scientist noted for the development of the APL programming language in 1962. He was honored with the Turing Award in 1979 for his contributions to mathematical notation and programming language theory...

, was the first programming language

Programming language

A programming language is an artificial language designed to communicate instructions to a machine, particularly a computer. Programming languages can be used to create programs that control the behavior of a machine and/or to express algorithms precisely....

to provide array programming capabilities. The mnemonic APL refers to the title of his seminal book "A Programming Language" and not to arrays per se. Iverson's contribution to rigor and clarity was probably more important than the simple extension of dimensions to functions.

Concepts

The fundamental idea behind array programming is that operations apply at once to an entire set of values. This makes it a high-level programming

High-level programming language

A high-level programming language is a programming language with strong abstraction from the details of the computer. In comparison to low-level programming languages, it may use natural language elements, be easier to use, or be from the specification of the program, making the process of...

model as it allows the programmer to think and operate on whole aggregates of data, without having to resort to explicit loops of individual scalar operations.

Iverson described the rationale behind array programming (actually referring to APL) as follows:
The basis behind array programming and thinking is to find and exploit the properties of data where individual elements are similar and/or adjacent. Unlike object orientation which implicitly breaks down data to its constituent parts (or scalar

Scalar (computing)

quantities), array orientation looks to group data and apply a uniform handling.

Function rank is an important concept to array programming languages in general, by analogy to tensor

Tensor

Tensors are geometric objects that describe linear relations between vectors, scalars, and other tensors. Elementary examples include the dot product, the cross product, and linear maps. Vectors and scalars themselves are also tensors. A tensor can be represented as a multi-dimensional array of...

rank in mathematics: functions that operate on data may be classified by the number of dimensions they act on. Ordinary multiplication, for example, is a scalar ranked function because it operates on zero-dimensional data (individual numbers). The cross product

Cross product

In mathematics, the cross product, vector product, or Gibbs vector product is a binary operation on two vectors in three-dimensional space. It results in a vector which is perpendicular to both of the vectors being multiplied and normal to the plane containing them...

operation is an example of a vector rank function because it operates on vectors, not scalars. Matrix multiplication

Matrix multiplication

In mathematics, matrix multiplication is a binary operation that takes a pair of matrices, and produces another matrix. If A is an n-by-m matrix and B is an m-by-p matrix, the result AB of their multiplication is an n-by-p matrix defined only if the number of columns m of the left matrix A is the...

is an example of a 2-rank function, because it operates on 2-dimensional objects (matrices). Collapse operators reduce the dimensionality of an input data array by one or more dimensions. For example, summing over elements collapses the input array by 1 dimension.

Uses

Array programming is very well suited to implicit parallelization; a topic of much research nowadays. Further, Intel and compatible CPUs developed and produced after 1997 contained various instruction set extensions, starting from MMX and continuing through SSSE3

SSSE3

Supplemental Streaming SIMD Extensions 3 is a SIMD instruction set created by Intel and is the fourth iteration of the SSE technology.- History :...

and 3DNow!

3DNow!

3DNow! is an extension to the x86 instruction set developed by Advanced Micro Devices . It adds single instruction multiple data instructions to the base x86 instruction set, enabling it to perform simple vector processing, which improves the performance of many graphic-intensive applications...

, which include rudimentary SIMD

SIMD

Single instruction, multiple data , is a class of parallel computers in Flynn's taxonomy. It describes computers with multiple processing elements that perform the same operation on multiple data simultaneously...

array capabilities. Array processing is distinct from parallel processing

Parallel computing

Parallel computing is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently . There are several different forms of parallel computing: bit-level,...

in that one physical processor performs operations on a group of items simultaneously while parallel processing aims to split a larger problem into smaller ones (MIMD

MIMD

In computing, MIMD is a technique employed to achieve parallelism. Machines using MIMD have a number of processors that function asynchronously and independently. At any time, different processors may be executing different instructions on different pieces of data...

) to be solved piecemeal by numerous processors. Processors with two or more cores are increasingly common today.

Languages

The canonical examples of array programming languages are APL, J, and Fortran 90

Fortran

Fortran is a general-purpose, procedural, imperative programming language that is especially suited to numeric computation and scientific computing...

. Others include: A+

A+ (programming language)

A+ is an array programming language descendent from the programming language A, which in turn was created to replace APL in 1988. Arthur Whitney developed the "A" portion of A+, while other developers at Morgan Stanley extended it, adding a graphical user interface and other language features...

, IDL, K

K (programming language)

K is a proprietary array processing language developed by Arthur Whitney and commercialized by Kx Systems. The language serves as the foundation for kdb, an in-memory, column-based database, and other related financial products. The language, originally developed in 1993, is a variant of APL and...

, Q

Q (programming language from Kx Systems)

Q is a proprietary array processing language developed by Arthur Whitney and commercialized by Kx Systems. The language serves as the query language for kdb+, a disk based and in-memory, column-based database. kdb+ is based upon K, a terse variant of APL...

, Mathematica

Mathematica

Mathematica is a computational software program used in scientific, engineering, and mathematical fields and other areas of technical computing...

, MATLAB

MATLAB

MATLAB is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages,...

, MOLSF, NumPy, GNU Octave

GNU Octave

GNU Octave is a high-level language, primarily intended for numerical computations. It provides a convenient command-line interface for solving linear and nonlinear problems numerically, and for performing other numerical experiments using a language that is mostly compatible with MATLAB...

, PDL

Perl Data Language

PDL is a set of array programming extensions to the Perl programming language.PDL is an extension to Perl v5, intended for scientific and other data intensive programming tasks...

, R

R (programming language)

R is a programming language and software environment for statistical computing and graphics. The R language is widely used among statisticians for developing statistical software, and R is widely used for statistical software development and data analysis....

, S-Lang, SAC

SAC programming language

SAC is a strict purely functional programming language which design is focused on the needs of numerical applications. Emphasis is laid on efficient support for array processing. Efficiency concerns are essentially twofold...

, Nial and ZPL.

Examples

In scalar languages like FORTRAN 77, C

C (programming language)

C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....

, Pascal

Pascal (programming language)

Pascal is an influential imperative and procedural programming language, designed in 1968/9 and published in 1970 by Niklaus Wirth as a small and efficient language intended to encourage good programming practices using structured programming and data structuring.A derivative known as Object Pascal...

, etc. operations apply only to single values, so a+b expresses the addition of two numbers. In such languages adding two arrays requires indexing and looping:

FORTRAN 77

00 DO 10 I = 1, N
DO 10 J = 1, N
10 A(J,I) = A(J,I) + B(J,I)

C

for (i = 0; i < n; i++)
for (j = 0; j < n; j++)
a[i][j] += b[i][j];

C#

class Program
{
static void Main(string[] args)
{
int[] numbers; //Array declaration

Console.WriteLine("Please enter the array size");
int n = int.Parse(Console.ReadLine); //Getting the Array size from user

numbers = new int[n]; //Array Creation

for (int i = 0; i < numbers.Length; i++) //Reading the Array Elements from user
{
Console.WriteLine("Enter number " + i + 1);
numbers[i] = int.Parse(Console.ReadLine);
}
}
}

This need to loop and index to perform operations on arrays is both tedious and error prone.

In array languages, operations are generalized to apply to both scalars and arrays. Thus, a+b expresses the sum of two scalars if a and b are scalars, or the sum of two arrays if they are arrays. When applied to arrays, the operations act on corresponding elements as illustrated in the loops above.

Within the array programming paradigm, the previous C# code would become (e.g. in GNU Octave or in MATLAB languages):

numbers = sscanf( input( 'Enter numbers: ' , 's' ) , '%d ' )

without the need to statically decide the array size.

Ada

The Ada language provides a library package supporting array-programming oriented syntax. Using that library, the example algorithm for adding element-by-element two matrices can be implemented as:

with Ada.Numerics.Generic_Real_Arrays;

A := A + B;

where A and B are two-dimensional arrays. It generates code that is effectively the same as the C loops shown above. An array language, therefore, simplifies programming but may come at a cost known as the abstraction penalty. Because the additions are performed in isolation to the rest of the coding, it may not produce the optimally most efficient

Algorithmic efficiency

In computer science, efficiency is used to describe properties of an algorithm relating to how much of various types of resources it consumes. Algorithmic efficiency can be thought of as analogous to engineering productivity for a repeating or continuous process, where the goal is to reduce...

code (for example if additions of other elements of the same array are subsequently encountered during the same execution, causing unnecessary repeated lookups). Even the most sophisticated optimizing compiler would have an extremely hard time amalgamating two or more apparently disparate functions which might appear in different program sections or sub-routines (yet this would be entirely obvious to a programmer who would naturally try to ensure the sums were aggregated on the same 'pass' of the array to minimize overhead

Computational overhead

In computer science, overhead is generally considered any combination of excess or indirect computation time, memory, bandwidth, or other resources that are required to attain a particular goal...

Nial

The Nial

Nial

Nial is a high-level array programming language developed from about 1981 by Mike Jenkins of Queen's University, Kingston, Ontario, Canada....

array language allows an implementation with the same syntax used in the Ada example:

A := A + B

As another example, the following generates the inner product of two arrays (matrix multiplication

Matrix multiplication

):

a inner[+,*] b

Notice how the operations are sequenced on the arrays.

J

The J

J (programming language)

The J programming language, developed in the early 1990s by Kenneth E. Iverson and Roger Hui, is a synthesis of APL and the FP and FL function-level languages created by John Backus....

array language allows a pretty similar implementation of the basic example:

A =. A + B

However, the inner product of two arrays can be used to illustrate a typical right-to-left composition of operators (or "verbs") which is derived by the APL language:

+/ a * b

which can be read as "add" (+) "together" (/) the element-by-element product between a and b. The inner product can also been restated as a single composite operator (or "verb")

a (+/ . *) b

which can be read as "add together" (+/) "at" (. or also @:) the "element-by-element product" (*) between a and b.

MATLAB

A = A + B;

The implementation in MATLAB

MATLAB

language allows the same economy allowed by using the Ada language. A variant of the MATLAB language is the GNU Octave

GNU Octave

language, which extends the original language also with augmented assignment

Augmented assignment

Augmented assignment is the name given to certain operators in certain programming languages...

s so that the example can be implemented in the shortest way:

A += B;

Both MATLAB and GNU Octave natively support linear algebra operations such as matrix multiplication, matrix inversion, the numerical solution of system of linear equations even using the Moore–Penrose pseudoinverse. The Nial example of the inner product of two arrays can be implemented using the native matrix multiplication operator

a * b;

if a is a row vector of size [1 n] and b is a corresponding column vector of size [n 1]. The inner product between two matrices having the same number of elements can be implemented with the auxiliary operator (:) which efficiently reshape a given matrix to be a column vector, and the transpose

Transpose

In linear algebra, the transpose of a matrix A is another matrix AT created by any one of the following equivalent actions:...

operator ':

A(:)' * B(:);

An equivalent formulation of the "add together" and "element-by-element product" operators of J is in both MATLAB and GNU Octave languages the following

sum( A(:) .* B(:) );

where sum means "add together by columns" and .* means "element-by-element product". In this case, sum would add together the elements of a single column because of the use of the operator (:).
The solution of a system of n linear equations

expressed in matrix form A * x = b takes advantage of the native matrix left-division \:

x = A \ b;

which also automatically recognizes and works with both overdetermined

Overdetermined system

In mathematics, a system of linear equations is considered overdetermined if there are more equations than unknowns. The terminology can be described in terms of the concept of counting constraints. Each unknown can be seen as an available degree of freedom...

and underdetermined system

Underdetermined system

In mathematics, a system of linear equations is considered underdetermined if there are fewer equations than unknowns. The terminology can be described in terms of the concept of counting constraints. Each unknown can be seen as an available degree of freedom...

s using a pseudoinverse strategy to solve them.

R

A = A + B;

The implementation in R language has a comprehensive suite of matrix operations.
R natively supports linear algebra operations such as matrix multiplication, matrix inversion, the numerical solution of system of linear equations.

a %*% b; # multiplication

The notation for the inner product of two matrices is compact:

x <- 1:4
x %*% x # scalar ("inner") product (1 x 1 matrix)

"Element-by-element" operations are

A * B
A + B

Additional matrix functionalities are available via packages such as OpenMx

OpenMx

OpenMx is an open source program for extended structural equation modeling. It runs as a package under R. Cross platform, it runs under Linux, Mac OS and Windows....

which e.g. supports lower/upper/Std matrices. Other packages support sparse-matrix operations as GNU Octave / MATLAB natively do.

However GNU R is unable to provide compact semantic notations such as native matrix left- of right-divisions. Solving a linear system would require to invoke a solve(.) function as in many other languages (or language libraries). This would not directly affect the conciseness:

x <- solve( A , b )

is only twice the length of the corresponding GNU Octave /MATLAB notation and still extremely compact. The only feature which could be more difficult to preserve is the language ability to use the notation for deducing results by directly expressing and developing mathematical reasoning with the source code.

Mathematical reasoning and language notation

The matrix left-division operator concisely expresses some semantic properties of matrices. As in the scalar equivalent, if the (determinant

Determinant

In linear algebra, the determinant is a value associated with a square matrix. It can be computed from the entries of the matrix by a specific arithmetic expression, while other ways to determine its value exist as well...

of the) coefficient (matrix) A is not null then it is possible to solve the (vectorial) equation A * x = b by left-multiplying both sides by the inverse of A: A^-1 (in both MATLAB and GNU Octave languages: A^-1). The following mathematical statements hold when A is a full rank square matrix:

A^-1 *(A * x)

`A^-1 * (b)`

(A^-1 * A)* x A^-1 * b (matrix-multiplication associativity

Associativity

In mathematics, associativity is a property of some binary operations. It means that, within an expression containing two or more occurrences in a row of the same associative operator, the order in which the operations are performed does not matter as long as the sequence of the operands is not...

)

x = A^-1 * b

where is the equivalence relational operator

Relational operator

In computer science, a relational operator is a programming language construct or operator that tests or defines some kind of relation between two entities. These include numerical equality and inequalities...

.
The previous statements are also valid MATLAB expressions if the third one is executed before the others (numerical comparisons may be false because of round-off errors).

If the system is overdetermined - so that A has more rows than columns - the pseudoinverse A⁺ (in MATLAB and GNU Octave languages: pinv(A)) can replace the inverse A^-1, as follows:

pinv(A) *(A * x) pinv(A) * (b)

(pinv(A) * A)* x pinv(A) * b (matrix-multiplication associativity

Associativity

)

x = pinv(A) * b

However, these solutions are neither the most concise ones (e.g. still remains the need to notationally differentiate overdetermined systems) nor the most computationally efficient. The latter point is easy to understand when considering again the scalar equivalent a * x = b, for which the solution x = a^-1 * b would require two operations instead of the more efficient x = b / a.
The problem is that generally matrix multiplications are not commutative

Commutativity

In mathematics an operation is commutative if changing the order of the operands does not change the end result. It is a fundamental property of many binary operations, and many mathematical proofs depend on it...

as the extension of the scalar solution to the matrix case would require:

(a * x)/ a b / a

(x * a)/ a

`b / a` (commutativity does not hold for matrices!)

x * (a / a) b / a (associativity also holds for matrices)

x = b / a

The MATLAB language introduces the left-division operator \ to maintain the essential part of the analogy with the scalar case, therefore simplifying the mathematical reasoning and preserving the conciseness:

A \ (A * x)

`A \ b`

(A \ A)* x A \ b (associativity also holds for matrices, commutativity is no more required)

x = A \ b

This is not only an example of terse array programming from the coding point of view but also from the computational efficiency perspective, which in several array programming languages benefits from quite efficient linear algebra libraries such as ATLAS

Automatically Tuned Linear Algebra Software

Automatically Tuned Linear Algebra Software is a software library for linear algebra. It provides a mature open source implementation of BLAS APIs for C and Fortran77....

or LAPACK

LAPACK

-External links:* : a modern replacement for PLAPACK and ScaLAPACK* on Netlib.org* * * : a modern replacement for LAPACK that is MultiGPU ready* on Sourceforge.net* * optimized LAPACK for Solaris OS on SPARC/x86/x64 and Linux* * *...

.

Returning to the previous quotation of Iverson, the rationale behind it should now be evident:

Third party libraries providing array-programming support to other languages

The use of specialized and efficient libraries to provide more terse abstractions is also common in other programming languages. In C++

C++

C++ is a statically typed, free-form, multi-paradigm, compiled, general-purpose programming language. It is regarded as an intermediate-level language, as it comprises a combination of both high-level and low-level language features. It was developed by Bjarne Stroustrup starting in 1979 at Bell...

several linear algebra libraries exploit the language ability to overload operators

Operator overloading

In object oriented computer programming, operator overloading—less commonly known as operator ad-hoc polymorphism—is a specific case of polymorphism, where different operators have different implementations depending on their arguments...

. It is interesting to notice that in some case a very terse abstraction in those languages is explicitly influenced by the array programming paradigm, as the Armadillo

Armadillo (C++ library)

The Armadillo C++ Library aims to provide an efficient and streamlined base for linear algebra operations , while at the same time having a straightforward and easy to use interface...

and Blitz++

Blitz++

Blitz++ is a high-performance vector mathematics library written in C++. This library is intended for use in scientific applications that might otherwise be implemented with Fortran or MATLAB....

libraries do.

Python

NumPy provides the nd-array object to the Python programming language

Python (programming language)

Python is a general-purpose, high-level programming language whose design philosophy emphasizes code readability. Python claims to "[combine] remarkable power with very clear syntax", and its standard library is large and comprehensive...

. This provides much of the same functionality as Matlab

MATLAB

External links

The source of this article is wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.

Concepts

Uses

Languages

Examples

FORTRAN 77

C

C#

Ada

Nial

J

MATLAB

R

Mathematical reasoning and language notation

A^-1 * (b)

b / a (commutativity does not hold for matrices!)

A \ b

Third party libraries providing array-programming support to other languages

Python

External links

`A^-1 * (b)`

`b / a` (commutativity does not hold for matrices!)

`A \ b`