Record (computer science)
Encyclopedia
In computer science
, a record is an instance of a product
of primitive data types called a tuple. In C
it is the compound data in a struct. Records are among the simplest data structure
s. A record is a value that contains other values, typically in fixed number and sequence and typically indexed by names. The elements of records are usually called fields or members.
For example, a date may be stored as a record containing a numeric year field, a month field represented as a string, and a numeric day-of-month field. As another example, a Personnel record might contain a name, a salary, and a rank. As yet another example, a Circle might contain a center and a radius. In this instance, the center itself might be represented as a Point record containing x and y coordinates.
Records are distinguished from arrays by the fact that their number of fields is typically fixed, each field has a name, and that each field may have a different type.
A record type is a data type
that describes such values and variables. Most modern computer languages allow the programmer to define new record types. The definition includes specifying the data type of each field and an identifier
(name or label) by which it can be accessed. In type theory
, product type
s (with no field names) are generally preferred due to their simplicity, but proper record types are studied in languages such as System F-sub. Since type-theoretical records may contain first-class function
-typed fields in addition to data, they can express many features of object-oriented programming
.
Records can exist in any storage medium, including main memory and mass storage devices such as magnetic tape
s or hard disk
s. Records are a fundamental component of most data structures, especially linked data structure
s. Many computer file
s are organized as arrays of logical record
s, often grouped into larger physical records or blocks
for efficiency.
The parameters of a function or procedure
can often be viewed as the fields of a record variable; and the arguments passed to that function can be viewed as a record value that gets assigned to that variable at the time of the call. Also, in the call stack
that is often used to implement procedure calls, each entry is an
activation record or call frame, containing the procedure parameters and local variables, the return address, and other internal fields.
An object in object-oriented language is essentially a record that contains procedures specialized to handle that record; and object data types (often called object classes) are an elaboration of record types. Indeed, in most object-oriented languages, records are just special cases of objects.
A record can be viewed as the computer analog of a mathematical
tuple
. In the same vein, a record type can be viewed as the computer language analog of the Cartesian product
of two or more mathematical sets, or the implementation of an abstract product type
in a specific language.
s and ledger
s used in accounting since remote times. The modern notion of records in computer science, with fields of well-defined type and size, was already implicit in 19th century mechanical calculators, such as Babbage
's Analytical Engine
.
Records were well established in the first half of the 20th century, when most data processing was done using punched card
s. Typically each records of a data file would be recorded in one punched card, with specific columns assigned to specific fields.
Most machine language implementations and early assembly language
s did not have special syntax for records, but the concept was available (and extensively used) through the use of index register
s, indirect addressing, and self-modifying code
. Some early computers, such as the IBM 1620
, had hardware support for delimiting records and fields, and special instructions for copying such records.
The concept of records and fields was central in some early file sorting
and tabulating utilities, such as IBM's Report Program Generator (RPG).
COBOL
was the first widespread programming language to support record types, and its record definition facilities were quite sophisticated at the time. The language allows for the definition of nested records with alphanumeric, integer, and fractional fields of arbitrary size and precision, as well as fields that automatically format any value assigned to them (e.g., insertion of currency signs, decimal points, and digit group separators). Each file is associated with a record variable where data is read into or written from. COBOL also provides a
The early languages developed for numeric computing, such as FORTRAN
(up to FORTRAN IV) and Algol 60
, did not have support for record types; but latter versions of those languages, such as Fortran 77 and Algol 68
did add them. The original Lisp programming language
too was lacking records (except for the built-in cons cell), but its S-expression
s provided an adequate surrogate. The Pascal programming language was one of the first languages to fully integrate record types with other basic types into a logically consistent type system. IBM's PL/1 programming language provided for COBOL-style records. The C
programming language initially provided the record concept as a kind of template (
, Modula
, and Java) also supported records.
Some languages may provide facilities that enumerate all fields of a record, or at least the fields that are references. This facility is needed to implement certain services such as debugger
s, garbage collectors, and serialization
. It requires some degree of type polymorphism
.
The selection of a field from a record value yields a value.
Some languages may also allow assignment between records whose fields have different names, matching each field value with the corresponding field variable by their positions within the record; so that, for example, a complex number
with fields called
Other languages (such as COBOL
) may match fields and values by their names, rather than positions.
These same possibilities apply to the comparison of two record values for equality. Some languages may also allow order comparisons ('<'and '>'), using the lexicographic order based on the comparison of individual fields.
field must occupy a single word.
Some languages may implement a record as an array of addresses pointing to the fields (and, possibly, to their names and/or types). Objects in object-oriented languages are often implemented in rather complicated ways, especially in languages that allow multiple class inheritance.
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...
, a record is an instance of a product
Product type
In programming languages and type theory, a product of types is another, compounded, type in a structure. The "operands" of the product are types, and the structure of a product type is determined by the fixed order of the operands in the product. An instance of a product type retains the fixed...
of primitive data types called a tuple. In C
C (programming language)
C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....
it is the compound data in a struct. Records are among the simplest data structure
Data structure
In computer science, a data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently.Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks...
s. A record is a value that contains other values, typically in fixed number and sequence and typically indexed by names. The elements of records are usually called fields or members.
For example, a date may be stored as a record containing a numeric year field, a month field represented as a string, and a numeric day-of-month field. As another example, a Personnel record might contain a name, a salary, and a rank. As yet another example, a Circle might contain a center and a radius. In this instance, the center itself might be represented as a Point record containing x and y coordinates.
Records are distinguished from arrays by the fact that their number of fields is typically fixed, each field has a name, and that each field may have a different type.
A record type is a data type
Data type
In computer programming, a data type is a classification identifying one of various types of data, such as floating-point, integer, or Boolean, that determines the possible values for that type; the operations that can be done on values of that type; the meaning of the data; and the way values of...
that describes such values and variables. Most modern computer languages allow the programmer to define new record types. The definition includes specifying the data type of each field and an identifier
Identifier
An identifier is a name that identifies either a unique object or a unique class of objects, where the "object" or class may be an idea, physical [countable] object , or physical [noncountable] substance...
(name or label) by which it can be accessed. In type theory
Type theory
In mathematics, logic and computer science, type theory is any of several formal systems that can serve as alternatives to naive set theory, or the study of such formalisms in general...
, product type
Product type
In programming languages and type theory, a product of types is another, compounded, type in a structure. The "operands" of the product are types, and the structure of a product type is determined by the fixed order of the operands in the product. An instance of a product type retains the fixed...
s (with no field names) are generally preferred due to their simplicity, but proper record types are studied in languages such as System F-sub. Since type-theoretical records may contain first-class function
First-class function
In computer science, a programming language is said to have first-class functions if it treats functions as first-class objects. Specifically, this means that the language supports passing functions as arguments to other functions, returning them as the values from other functions, and assigning...
-typed fields in addition to data, they can express many features of object-oriented programming
Object-oriented programming
Object-oriented programming is a programming paradigm using "objects" – data structures consisting of data fields and methods together with their interactions – to design applications and computer programs. Programming techniques may include features such as data abstraction,...
.
Records can exist in any storage medium, including main memory and mass storage devices such as magnetic tape
Magnetic tape
Magnetic tape is a medium for magnetic recording, made of a thin magnetizable coating on a long, narrow strip of plastic. It was developed in Germany, based on magnetic wire recording. Devices that record and play back audio and video using magnetic tape are tape recorders and video tape recorders...
s or hard disk
Hard disk
A hard disk drive is a non-volatile, random access digital magnetic data storage device. It features rotating rigid platters on a motor-driven spindle within a protective enclosure. Data is magnetically read from and written to the platter by read/write heads that float on a film of air above the...
s. Records are a fundamental component of most data structures, especially linked data structure
Linked data structure
In computer science, a linked data structure is a data structure which consists of a set of data records linked together and organized by references ....
s. Many computer file
Computer file
A computer file is a block of arbitrary information, or resource for storing information, which is available to a computer program and is usually based on some kind of durable storage. A file is durable in the sense that it remains available for programs to use after the current program has finished...
s are organized as arrays of logical record
Storage record
In computer science, a storage record is:* A group of related data, words, or fields treated as a meaningful unit; for instance, a Name, Address, and Telephone Number can be a "Personal Record"....
s, often grouped into larger physical records or blocks
Block (data storage)
In computing , a block is a sequence of bytes or bits, having a nominal length . Data thus structured are said to be blocked. The process of putting data into blocks is called blocking. Blocking is used to facilitate the handling of the data-stream by the computer program receiving the data...
for efficiency.
The parameters of a function or procedure
Subroutine
In computer science, a subroutine is a portion of code within a larger program that performs a specific task and is relatively independent of the remaining code....
can often be viewed as the fields of a record variable; and the arguments passed to that function can be viewed as a record value that gets assigned to that variable at the time of the call. Also, in the call stack
Call stack
In computer science, a call stack is a stack data structure that stores information about the active subroutines of a computer program. This kind of stack is also known as an execution stack, control stack, run-time stack, or machine stack, and is often shortened to just "the stack"...
that is often used to implement procedure calls, each entry is an
activation record or call frame, containing the procedure parameters and local variables, the return address, and other internal fields.
An object in object-oriented language is essentially a record that contains procedures specialized to handle that record; and object data types (often called object classes) are an elaboration of record types. Indeed, in most object-oriented languages, records are just special cases of objects.
A record can be viewed as the computer analog of a mathematical
Mathematics
Mathematics is the study of quantity, space, structure, and change. Mathematicians seek out patterns and formulate new conjectures. Mathematicians resolve the truth or falsity of conjectures by mathematical proofs, which are arguments sufficient to convince other mathematicians of their validity...
tuple
Tuple
In mathematics and computer science, a tuple is an ordered list of elements. In set theory, an n-tuple is a sequence of n elements, where n is a positive integer. There is also one 0-tuple, an empty sequence. An n-tuple is defined inductively using the construction of an ordered pair...
. In the same vein, a record type can be viewed as the computer language analog of the Cartesian product
Cartesian product
In mathematics, a Cartesian product is a construction to build a new set out of a number of given sets. Each member of the Cartesian product corresponds to the selection of one element each in every one of those sets...
of two or more mathematical sets, or the implementation of an abstract product type
Product type
In programming languages and type theory, a product of types is another, compounded, type in a structure. The "operands" of the product are types, and the structure of a product type is determined by the fixed order of the operands in the product. An instance of a product type retains the fixed...
in a specific language.
History
The concept of record can be traced to various types of tableTable (information)
A table is a means of arranging data in rows and columns.Production % of goalNorth 4087102%South 4093110% The use of tables is pervasive throughout all communication, research and data analysis. Tables appear in print media, handwritten notes, computer software, architectural...
s and ledger
Ledger
A ledger is the principal book or computer file for recording and totaling monetary transactions by account, with debits and credits in separate columns and a beginning balance and ending balance for each account. The ledger is a permanent summary of all amounts entered in supporting journals which...
s used in accounting since remote times. The modern notion of records in computer science, with fields of well-defined type and size, was already implicit in 19th century mechanical calculators, such as Babbage
Charles Babbage
Charles Babbage, FRS was an English mathematician, philosopher, inventor and mechanical engineer who originated the concept of a programmable computer...
's Analytical Engine
Analytical engine
The Analytical Engine was a proposed mechanical general-purpose computer designed by English mathematician Charles Babbage. It was first described in 1837 as the successor to Babbage's difference engine, a design for a mechanical calculator...
.
Records were well established in the first half of the 20th century, when most data processing was done using punched card
Punched card
A punched card, punch card, IBM card, or Hollerith card is a piece of stiff paper that contains digital information represented by the presence or absence of holes in predefined positions...
s. Typically each records of a data file would be recorded in one punched card, with specific columns assigned to specific fields.
Most machine language implementations and early assembly language
Assembly language
An assembly language is a low-level programming language for computers, microprocessors, microcontrollers, and other programmable devices. It implements a symbolic representation of the machine codes and other constants needed to program a given CPU architecture...
s did not have special syntax for records, but the concept was available (and extensively used) through the use of index register
Index register
An index registerCommonly known as a B-line in early British computers. in a computer's CPU is a processor register used for modifying operand addresses during the run of a program, typically for doing vector/array operations...
s, indirect addressing, and self-modifying code
Self-modifying code
In computer science, self-modifying code is code that alters its own instructions while it is executing - usually to reduce the instruction path length and improve performance or simply to reduce otherwise repetitively similar code, thus simplifying maintenance...
. Some early computers, such as the IBM 1620
IBM 1620
The IBM 1620 was announced by IBM on October 21, 1959, and marketed as an inexpensive "scientific computer". After a total production of about two thousand machines, it was withdrawn on November 19, 1970...
, had hardware support for delimiting records and fields, and special instructions for copying such records.
The concept of records and fields was central in some early file sorting
Sorting
Sorting is any process of arranging items in some sequence and/or in different sets, and accordingly, it has two common, yet distinct meanings:# ordering: arranging items of the same kind, class, nature, etc...
and tabulating utilities, such as IBM's Report Program Generator (RPG).
COBOL
COBOL
COBOL is one of the oldest programming languages. Its name is an acronym for COmmon Business-Oriented Language, defining its primary domain in business, finance, and administrative systems for companies and governments....
was the first widespread programming language to support record types, and its record definition facilities were quite sophisticated at the time. The language allows for the definition of nested records with alphanumeric, integer, and fractional fields of arbitrary size and precision, as well as fields that automatically format any value assigned to them (e.g., insertion of currency signs, decimal points, and digit group separators). Each file is associated with a record variable where data is read into or written from. COBOL also provides a
MOVE
CORRESPONDING
statement that assigns corresponding fields of two records according to their names.The early languages developed for numeric computing, such as FORTRAN
Fortran
Fortran is a general-purpose, procedural, imperative programming language that is especially suited to numeric computation and scientific computing...
(up to FORTRAN IV) and Algol 60
ALGOL 60
ALGOL 60 is a member of the ALGOL family of computer programming languages. It gave rise to many other programming languages, including BCPL, B, Pascal, Simula, C, and many others. ALGOL 58 introduced code blocks and the begin and end pairs for delimiting them...
, did not have support for record types; but latter versions of those languages, such as Fortran 77 and Algol 68
ALGOL 68
ALGOL 68 isan imperative computerprogramming language that was conceived as a successor to theALGOL 60 programming language, designed with the goal of a...
did add them. The original Lisp programming language
Lisp programming language
Lisp is a family of computer programming languages with a long history and a distinctive, fully parenthesized syntax. Originally specified in 1958, Lisp is the second-oldest high-level programming language in widespread use today; only Fortran is older...
too was lacking records (except for the built-in cons cell), but its S-expression
S-expression
S-expressions or sexps are list-based data structures that represent semi-structured data. An S-expression may be a nested list of smaller S-expressions. S-expressions are probably best known for their use in the Lisp family of programming languages...
s provided an adequate surrogate. The Pascal programming language was one of the first languages to fully integrate record types with other basic types into a logically consistent type system. IBM's PL/1 programming language provided for COBOL-style records. The C
C (programming language)
C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....
programming language initially provided the record concept as a kind of template (
struct
) that could be laid on top of a memory area, rather than a true record data type. The latter were provided eventually (by the typedef
declaration), but the two concepts are still distinct in the language. Most languages designed after Pascal (such as AdaAda (programming language)
Ada is a structured, statically typed, imperative, wide-spectrum, and object-oriented high-level computer programming language, extended from Pascal and other languages...
, Modula
Modula
The Modula programming language is a descendent of the Pascal programming language. It was developed in Switzerland in the late 1970s by Niklaus Wirth, the same person who designed Pascal...
, and Java) also supported records.
Operations
A programming language that supports record types usually provides some or all of the following operations:- Declaration of a new record type, including the position, type, and (possibly) name of each field;
- Declaration of variables and values as having a given record type;
- Construction of a record value from given field values and (sometimes) with given field names;
- Selection a field of a record with an explicit name;
- Assignment of a record value to a record variable;
- Comparison of two records for equality;
- Computation of a standard hash valueHash functionA hash function is any algorithm or subroutine that maps large data sets to smaller data sets, called keys. For example, a single integer can serve as an index to an array...
for the record.
Some languages may provide facilities that enumerate all fields of a record, or at least the fields that are references. This facility is needed to implement certain services such as debugger
Debugger
A debugger or debugging tool is a computer program that is used to test and debug other programs . The code to be examined might alternatively be running on an instruction set simulator , a technique that allows great power in its ability to halt when specific conditions are encountered but which...
s, garbage collectors, and serialization
Serialization
In computer science, in the context of data storage and transmission, serialization is the process of converting a data structure or object state into a format that can be stored and "resurrected" later in the same or another computer environment...
. It requires some degree of type polymorphism
Type polymorphism
In computer science, polymorphism is a programming language feature that allows values of different data types to be handled using a uniform interface. The concept of parametric polymorphism applies to both data types and functions...
.
The selection of a field from a record value yields a value.
Assignment and comparison
Most languages allow assignment between records that have exactly the same record type (including same field types and names, in the same order). Depending on the language, however, two record data types defined separately may be regarded as distinct types even if they have exactly the same fields.Some languages may also allow assignment between records whose fields have different names, matching each field value with the corresponding field variable by their positions within the record; so that, for example, a complex number
Complex data type
Some programming languages provide a complex data type for complex number storage and arithmetic as a built-in data type.In some programming environments the term complex data type is a synonym to the composite data type.-Complex number arithmetic:A complex variable or value is usually...
with fields called
real
and imag
can be assigned to a 2D point record variable with fields X
and Y
. In this alternative, the two operands are still required to have the same sequence of field types. Some languages may also require that corresponding types have the same size and encoding as well, so that the whole record can be assigned as an uninterpreted bit string. Other languages may be more flexible in this regard, and require only that each value field can be legally assigned to the corresponding variable field; so that, for example, a short integer field can be assigned to a long integer field, or vice-versa.Other languages (such as COBOL
COBOL
COBOL is one of the oldest programming languages. Its name is an acronym for COmmon Business-Oriented Language, defining its primary domain in business, finance, and administrative systems for companies and governments....
) may match fields and values by their names, rather than positions.
These same possibilities apply to the comparison of two record values for equality. Some languages may also allow order comparisons ('<'and '>'), using the lexicographic order based on the comparison of individual fields.
Algol 68's distributive field selection
In Algol 68, ifPts
was an array of records, each with integer fields X
and Y
, one could write Pts.Y
to obtain an array of integers, consisting of the Y
fields of all the elements of Pts
. As a result, the statements Pts[3].Y := 7
and Pts.Y[3] := 7
would have the same effect.Pascal's "with" statement
In the Pascal programming language, the commandwith R do S end
would execute the command sequence S
as if all the fields of record R
had been declared as variables. So, instead of writing Pt.X := 5; Pt.Y := Pt.X + 3
one could write with Pt do X := 5; Y := X + 3 end
.Representation in memory
The representation of records in memory varies depending on the programming languages. Usually the fields are stored in consecutive positions in memory, in the same order as they are declared in the record type. This may result in two or more fields stored into the same word of memory; indeed, this feature is often used in systems programming to access specific bits of a word. On the other hand, most compilers will add padding fields, mostly invisible to the programmer, in order to comply with alignment constraints imposed by the machine—say, that a floating pointFloating point
In computing, floating point describes a method of representing real numbers in a way that can support a wide range of values. Numbers are, in general, represented approximately to a fixed number of significant digits and scaled using an exponent. The base for the scaling is normally 2, 10 or 16...
field must occupy a single word.
Some languages may implement a record as an array of addresses pointing to the fields (and, possibly, to their names and/or types). Objects in object-oriented languages are often implemented in rather complicated ways, especially in languages that allow multiple class inheritance.
See also
- struct (C programming language)Struct (C programming language)A struct in C programming language is a structured type that aggregates a fixed set of labelled objects, possibly of different types, into a single object.A struct declaration consists of a list of fields, each of which can have any type...
- Composite data type
- Object compositionObject compositionIn computer science, object composition is a way to combine simple objects or data types into more complex ones...
- Cons cell
- Storage recordStorage recordIn computer science, a storage record is:* A group of related data, words, or fields treated as a meaningful unit; for instance, a Name, Address, and Telephone Number can be a "Personal Record"....
- Block (data storage)Block (data storage)In computing , a block is a sequence of bytes or bits, having a nominal length . Data thus structured are said to be blocked. The process of putting data into blocks is called blocking. Blocking is used to facilitate the handling of the data-stream by the computer program receiving the data...
- Hashed linking
- Data structure alignmentData structure alignmentData structure alignment is the way data is arranged and accessed in computer memory. It consists of two separate but related issues: data alignment and data structure padding. When a modern computer reads from or writes to a memory address, it will do this in word sized chunks...
- Data hierarchyData hierarchyData Hierarchy refers to the systematic organization of data, often in a hierarchical form. Data organization involves fields, records, files and so on....
- Row (database)Row (database)In the context of a relational database, a row—also called a record or tuple—represents a single, implicitly structured data item in a table. In simple terms, a database table can be thought of as consisting of rows and columns or fields...