Serialization
In computer science, in the context of data storage and transmission, serialization is the process of converting a data structure or object state into a format that can be stored (for example, in a file or memory buffer) or transmitted (for example, across a network connection link) and "resurrected" later in the same or another computer environment. When the resulting series of bits is reread according to the serialization format, it can be used to create a semantically identical clone of the original object. For many complex objects, such as those that make extensive use of references, this process is not straightforward. Serialization of object-oriented objects does not include any of their associated methods with which they were previously inextricably linked.
This process of serializing an object is also called deflating or marshalling an object. The opposite operation, extracting a data structure from a series of bytes, is deserialization (which is also called inflating or unmarshalling).
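A minimal Python sketch of this round trip uses the standard pickle module. The Point class is a hypothetical example. Only the instance's field values are written into the byte stream. The norm_squared method is not serialized; the clone finds it on the Point class, which is why the class definition must be available when the data is loaded.

```python
import pickle
from dataclasses import dataclass


@dataclass
class Point:
    """Hypothetical example class; only its field values are serialized."""
    x: int
    y: int

    def norm_squared(self) -> int:
        # Methods are not stored in the byte stream; they are looked up
        # on the Point class when the clone is used.
        return self.x * self.x + self.y * self.y


original = Point(3, 4)

# Serialization ("marshalling"): object state -> sequence of bytes.
data = pickle.dumps(original)

# Deserialization ("unmarshalling"): bytes -> a new, equivalent object.
clone = pickle.loads(data)

print(isinstance(data, bytes))  # True
print(clone == original)        # True: same state (dataclass equality compares fields)
print(clone is original)        # False: a distinct object
print(clone.norm_squared())     # 25
```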
Uses
Serialization provides:
- a method of persisting objects, which is more convenient than writing their properties to a text file on disk and reassembling them by reading this back in;
- a method of remote procedure calls, e.g., as in SOAP;
- a method for distributing objects, especially in software componentry such as COM, CORBA, etc.;
- a method for detecting changes in time-varying data.
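The last use can be sketched by serializing the same data at two points in time and comparing the results. This is a minimal Python sketch. It assumes the data is a JSON-compatible dictionary, and snapshot and changed are hypothetical helper names. json.dumps with sort_keys=True is used so that equal states always produce identical text.

```python
import json


def snapshot(state: dict) -> str:
    # sort_keys=True gives a canonical key order, so two equal dicts
    # always serialize to the same string.
    return json.dumps(state, sort_keys=True)


def changed(previous: str, state: dict) -> bool:
    # Compare the current serialized form with an earlier snapshot.
    return snapshot(state) != previous


state = {"title": "Report", "pages": 3}
before = snapshot(state)

print(changed(before, {"pages": 3, "title": "Report"}))  # False: same content, different key order
state["pages"] = 4
print(changed(before, state))                            # True
```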
For some of these features to be useful, architecture independence must be maintained. For example, for maximal use of distribution, a computer running on a different hardware architecture should be able to reliably reconstruct a serialized data stream, regardless of endianness. This means that the simpler and faster procedure of directly copying the memory layout of the data structure cannot work reliably for all architectures. Serializing the data structure in an architecture-independent format avoids the problems of byte ordering, memory layout, and the different ways in which different programming languages represent data structures.
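Python's standard struct module makes the byte-order problem easy to see. The sketch below packs the same 32-bit integer in big-endian and little-endian order. Because the portable format names its byte order explicitly, any machine can decode it, whatever its native order. Assuming the wrong order silently produces a different number.

```python
import struct

value = 0x01020304

# "=" packs in this machine's native byte order (with standard sizes).
native = struct.pack("=I", value)

# A portable format fixes the byte order explicitly:
# ">" is big-endian ("network byte order"), "<" is little-endian.
big = struct.pack(">I", value)
little = struct.pack("<I", value)

print(big.hex())                # 01020304
print(little.hex())             # 04030201
print(native in (big, little))  # True

# A reader that knows the format is big-endian recovers the value
# whatever its own native order is...
(decoded,) = struct.unpack(">I", big)
print(decoded == value)         # True

# ...while assuming the wrong order silently yields a different number.
print(hex(struct.unpack("<I", big)[0]))  # 0x4030201
```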
Inherent to any serialization scheme is that, because the encoding of the data is by definition serial, extracting one part of the serialized data structure requires that the entire object be read from start to end and reconstructed. In many applications this linearity is an asset, because it lets simple, common I/O interfaces hold and pass on the state of an object. In applications where performance is critical, it can make sense to expend more effort on a more complex, non-linear storage organization.
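A non-linear layout, by contrast, can let a reader fetch one item without decoding the others. This Python sketch uses a hypothetical format, not a standard one, in which a table of byte offsets precedes the records. The extra bookkeeping on write is the cost of random access on read.

```python
import json
import struct


def write_indexed(records: list) -> bytes:
    """Hypothetical layout: [count][(offset, length) per record][record bytes...].

    All integers are 32-bit big-endian. The offset table lets a reader jump
    straight to one record instead of decoding everything before it.
    """
    bodies = [json.dumps(r).encode("utf-8") for r in records]
    offset = 4 + 8 * len(bodies)  # records start right after the header
    header = struct.pack(">I", len(bodies))
    for body in bodies:
        header += struct.pack(">II", offset, len(body))
        offset += len(body)
    return header + b"".join(bodies)


def read_one(blob: bytes, index: int):
    """Decode only the requested record, using its entry in the offset table."""
    (count,) = struct.unpack_from(">I", blob, 0)
    if not 0 <= index < count:
        raise IndexError(index)
    offset, length = struct.unpack_from(">II", blob, 4 + 8 * index)
    return json.loads(blob[offset:offset + length].decode("utf-8"))


blob = write_indexed([{"id": 1}, {"id": 2}, {"id": 3}])
print(read_one(blob, 2))  # {'id': 3}
```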
Even on a single machine, raw pointers are too fragile to save, because the objects to which they point may be reloaded to a different location in memory. To deal with this, the serialization process includes a step called unswizzling or pointer unswizzling, and the deserialization process includes a step called pointer swizzling.
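Python has no raw pointers, but the idea can still be sketched with object references standing in for pointers and list indices standing in for the position-based IDs that replace them. The Node class and the unswizzle and swizzle helpers are hypothetical names. The example list is cyclic, and a naive writer that simply followed references would loop forever on it.

```python
from typing import List, Optional, Tuple


class Node:
    """Hypothetical linked-list node; `next_node` is a direct object reference."""

    def __init__(self, value: str, next_node: Optional["Node"] = None):
        self.value = value
        self.next_node = next_node


def unswizzle(head: Node) -> List[Tuple[str, Optional[int]]]:
    """Replace object references with position-based IDs (list indices)."""
    nodes: List[Node] = []
    ids = {}  # id(object) -> index in `nodes`
    node = head
    while node is not None and id(node) not in ids:  # stop on a cycle
        ids[id(node)] = len(nodes)
        nodes.append(node)
        node = node.next_node
    return [
        (n.value, ids[id(n.next_node)] if n.next_node is not None else None)
        for n in nodes
    ]


def swizzle(records: List[Tuple[str, Optional[int]]]) -> Node:
    """Recreate the objects, then turn the stored IDs back into references."""
    nodes = [Node(value) for value, _ in records]
    for node, (_, next_id) in zip(nodes, records):
        node.next_node = nodes[next_id] if next_id is not None else None
    return nodes[0]


# A three-node cycle: a -> b -> c -> a
c = Node("c")
b = Node("b", c)
a = Node("a", b)
c.next_node = a

records = unswizzle(a)
print(records)  # [('a', 1), ('b', 2), ('c', 0)]

rebuilt = swizzle(records)
print(rebuilt.next_node.next_node.next_node is rebuilt)  # True
```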
Because both serializing and deserializing can be driven from common code (for example, the Serialize function in Microsoft Foundation Classes), it is possible for the common code to do both at the same time, and thus 1) detect differences between the objects being serialized and their prior copies, and 2) provide the input for the next such detection. It is not necessary to actually build the prior copy, since differences can be detected "on the fly". This is one way to understand the technique called differential execution. It is useful in programming user interfaces whose contents are time-varying: graphical objects can be created, removed, altered, or made to handle input events without necessarily having to write separate code to do those things.
Consequences
Serialization, however, breaks the opacity of an abstract data type by potentially exposing private implementation details. To discourage competitors from making compatible products, publishers of proprietary software often keep the details of their programs' serialization formats a trade secret. Some deliberately obfuscate or even encrypt the serialized data.
Yet, interoperability requires that applications be able to understand each other's serialization formats. Therefore, remote method call architectures such as CORBA define their serialization formats in detail.
Serialization formats
The Xerox Network Systems Courier technology in the early 1980s influenced the first widely adopted standard. Sun Microsystems published the External Data Representation (XDR) in 1987.
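The following Python sketch illustrates what a format like XDR fixes in advance. It hand-encodes two XDR types according to the XDR standard (currently RFC 4506). A signed integer is stored as 32 bits, big-endian. A string is stored as a 4-byte length, then the bytes, then zero padding to a multiple of four. The helper names are hypothetical; in practice XDR code is usually generated or supplied by a library.

```python
import struct


def xdr_int(n: int) -> bytes:
    # XDR integers: 32-bit, two's complement, big-endian.
    return struct.pack(">i", n)


def xdr_string(s: str) -> bytes:
    # XDR strings: 4-byte length, the bytes, then zero padding up to a
    # multiple of four bytes.
    data = s.encode("ascii")
    padding = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding


def xdr_decode_string(buf: bytes, offset: int = 0):
    """Return the decoded string and the offset of the next item."""
    (length,) = struct.unpack_from(">I", buf, offset)
    start = offset + 4
    text = buf[start:start + length].decode("ascii")
    # Skip the padding to find where the next item begins.
    end = start + length + (4 - length % 4) % 4
    return text, end


encoded = xdr_int(-2) + xdr_string("hello")
print(encoded.hex())                  # fffffffe0000000568656c6c6f000000
print(xdr_decode_string(encoded, 4))  # ('hello', 16)
```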
In the late 1990s, a push to provide an alternative to the standard serialization protocols started: XML was used to produce a human-readable, text-based encoding. Such an encoding can be useful for persistent objects that may be read and understood by humans, or communicated to other systems regardless of programming language. It has the disadvantage of losing the more compact, byte-stream-based encoding, but by this point larger storage and transmission capacities made file size less of a concern than in the early days of computing. Binary XML has been proposed as a compromise that is not readable by plain-text editors but is more compact than regular XML. In the 2000s, XML was often used for asynchronous transfer of structured data between client and server in Ajax web applications.
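The following sketch uses Python's standard xml.etree.ElementTree module to write a small record as XML text and read it back. The person element and its fields are a hypothetical schema. XML text carries no type information, so the reader has to know, for example, that age is an integer.

```python
import xml.etree.ElementTree as ET


def to_xml(person: dict) -> str:
    # Each field becomes a child element, giving human-readable text.
    root = ET.Element("person")
    for key, value in person.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> dict:
    root = ET.fromstring(text)
    # The reader must know the schema: "age" is converted back to int here.
    return {"name": root.findtext("name"), "age": int(root.findtext("age"))}


xml_text = to_xml({"name": "Ada", "age": 36})
print(xml_text)            # <person><name>Ada</name><age>36</age></person>
print(from_xml(xml_text))  # {'name': 'Ada', 'age': 36}
```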
JSON is a more lightweight plain-text alternative to XML which is also commonly used for client-server communication in web applications. JSON is based on JavaScript syntax, but is supported in other programming languages as well.
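Python's standard json module can round-trip a similar record. The sketch shows how JSON's small set of value types maps onto Python's: true becomes True and null becomes None. A type that JSON lacks, here a tuple, comes back as a list. The record itself is a made-up example.

```python
import json

record = {"name": "Ada", "languages": ["Python", "Java"], "active": True, "manager": None}

text = json.dumps(record)
print(text)  # {"name": "Ada", "languages": ["Python", "Java"], "active": true, "manager": null}

# Parsing the text yields ordinary Python objects again.
restored = json.loads(text)
print(restored == record)  # True

# JSON has no tuple type, so a tuple is written as an array and read back as a list.
print(json.loads(json.dumps((1, 2))))  # [1, 2]
```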
Another alternative, YAML, is effectively a superset of JSON and includes features that make it more powerful for serialization, more "human friendly," and potentially more compact. These features include a notion of tagging data types, support for non-hierarchical data structures, the option to structure data with indentation, and multiple forms of scalar data quoting.
Another human-readable serialization format is the property list format used in NeXTSTEP, GNUstep, and Mac OS X Cocoa.
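Python's standard library includes a reader and writer for property lists, plistlib, so the format can be tried without Apple's tools. The settings dictionary below is a made-up example. FMT_XML selects the human-readable XML form, and FMT_BINARY selects the compact binary form.

```python
import plistlib

settings = {"Name": "Example", "Version": 2, "Enabled": True, "Tags": ["a", "b"]}

# FMT_XML produces the human-readable XML flavour of property lists.
xml_bytes = plistlib.dumps(settings, fmt=plistlib.FMT_XML)
print(xml_bytes.decode("utf-8"))

# loads() detects the flavour automatically.
restored = plistlib.loads(xml_bytes)
print(restored == settings)  # True

# The binary flavour stores the same data more compactly.
binary = plistlib.dumps(settings, fmt=plistlib.FMT_BINARY)
print(plistlib.loads(binary) == settings)  # True
```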
For large-volume scientific datasets, such as satellite data and output of numerical climate, weather, or ocean models, specific binary serialization standards have been developed, e.g. HDF, netCDF and the older GRIB.
Programming language support
Several object-oriented programming languages directly support object serialization (or object archival), either by syntactic sugar elements or by providing a standard interface for doing so.
Some of these programming languages are Ruby, Smalltalk, Python, PHP, Objective-C, Java, and the .NET family of languages.
There are also libraries available that add serialization support to languages that lack native support for it.
.NET Framework
In the .NET languages, classes can be serialized and deserialized by adding the Serializable attribute to the class.
If new members are added to a serializable class, they can be tagged with the OptionalField attribute to allow previous versions of the object to be deserialized without error. This attribute affects only deserialization, and prevents the runtime from throwing an exception if a member is missing from the serialized stream. A member can also be marked with the NonSerialized attribute to indicate that it should not be serialized, so that its value is never written to the serialized stream.
To modify the default deserialization (for example, to automatically initialize a member marked NonSerialized), the class must implement the IDeserializationCallback interface and define the IDeserializationCallback.OnDeserialization method.
Objects may be serialized in binary format for deserialization by other .NET applications. The framework also provides the SoapFormatter and XmlSerializer classes to support serialization in human-readable, cross-platform XML.
Objective-C
In the Objective-C programming language, serialization (more commonly known as archiving) is achieved by overriding the write: and read: methods in the Object root class. (Note: this describes the GNU runtime variant of Objective-C; in the NeXT-style runtime, the implementation is very similar.)
Java
Java provides automatic serialization, which requires that the object be marked by implementing the Serializable interface. Implementing the interface marks the class as "okay to serialize", and Java then handles serialization internally. There are no serialization methods defined on the Serializable interface, but a serializable class can optionally define methods with certain special names and signatures (such as private writeObject and readObject methods) that, if defined, will be called as part of the serialization/deserialization process. The language also allows the developer to override the serialization process more thoroughly by implementing another interface, Externalizable, which includes two special methods (writeExternal and readExternal) that are used to save and restore the object's state.
There are three primary reasons why objects are not serializable by default and must implement the Serializable interface to access Java's serialization mechanism:
- Not all objects capture useful semantics in a serialized state. For example, a Thread object is tied to the state of the current JVM. There is no context in which a deserialized Thread object would maintain useful semantics.
- The serialized state of an object forms part of its class's compatibility contract. Maintaining compatibility between versions of serializable classes requires additional effort and consideration. Therefore, making a class serializable needs to be a deliberate design decision and not a default condition.
- Serialization allows access to non-transient private members of a class that are not otherwise accessible. Classes containing sensitive information (for example, a password) should be neither serializable nor externalizable.
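As a minimal sketch of the optional hooks with special signatures, the example below assumes a hypothetical Point class and hypothetical helper names (serialize, deserialize). It uses the standard ObjectOutputStream and ObjectInputStream classes. The private writeObject and readObject methods are picked up by their exact signatures. Because deserialization does not run Point's own constructor, readObject repeats the constructor's validation.

```java
import java.io.*;

public class SerializationDemo {
    // Hypothetical example class: implementing Serializable is the only marker needed.
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;
        final int y;

        Point(int x, int y) {
            if (x < 0 || y < 0) throw new IllegalArgumentException("negative coordinate");
            this.x = x;
            this.y = y;
        }

        // Optional hook found by its exact private signature; here it only
        // delegates to the default field encoding.
        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();
        }

        // Optional hook: deserialization bypasses the constructor above,
        // so the invariant is re-checked after the fields are restored.
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            if (x < 0 || y < 0) throw new InvalidObjectException("negative coordinate");
        }
    }

    // Writes an object graph to an in-memory byte array.
    static byte[] serialize(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Rebuilds an object from bytes produced by serialize().
    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Point copy = (Point) deserialize(serialize(new Point(3, 4)));
        System.out.println(copy.x + "," + copy.y); // prints 3,4
    }
}
```

The writeObject hook here is equivalent to the default behavior. It is included only to show where custom encoding would go.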
The standard encoding method uses a simple translation of the fields into a byte stream. Primitives as well as non-transient, non-static referenced objects are encoded into the stream. Each object that is referenced by the serialized object and not marked as transient must also be serialized; if any object in the complete graph of non-transient object references is not serializable, then serialization fails with a NotSerializableException. The developer can influence this behavior by marking fields as transient, or by redefining the serialization for an object so that some portion of the reference graph is truncated and not serialized.
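The sketch below illustrates the graph rule. Its Connection, BrokenSession and Session classes are hypothetical, as is the "db://example" string. Serializing an object that holds a non-serializable reference in an ordinary field throws NotSerializableException. Marking that field transient truncates the graph, so the rest of the object round-trips and the transient field comes back as null.

```java
import java.io.*;

public class TransientDemo {
    // Hypothetical dependency that does NOT implement Serializable.
    static class Connection {
        final String url;
        Connection(String url) { this.url = url; }
    }

    // Serializable itself, but its object graph reaches a non-serializable object.
    static class BrokenSession implements Serializable {
        private static final long serialVersionUID = 1L;
        String user = "alice";
        Connection conn = new Connection("db://example");
    }

    // Same data, but the transient field is skipped, truncating the graph.
    static class Session implements Serializable {
        private static final long serialVersionUID = 1L;
        String user = "alice";
        transient Connection conn = new Connection("db://example");
    }

    static byte[] toBytes(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Returns true if the whole reachable object graph can be serialized.
    static boolean canSerialize(Object obj) {
        try {
            toBytes(obj);
            return true;
        } catch (NotSerializableException e) {
            return false; // some object in the graph is not Serializable
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Serializes and deserializes a Session in memory.
    static Session roundTrip(Session s) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(toBytes(s)))) {
            return (Session) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new BrokenSession())); // false
        Session copy = roundTrip(new Session());
        System.out.println(copy.user);         // alice
        System.out.println(copy.conn == null); // true: field initializers do not rerun
    }
}
```

If the transient field needs to be rebuilt after deserialization (for example, reopening a connection), a readObject hook would be one place to do it.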
It is possible to serialize Java objects through JDBC and store them into a database.
Although Swing components implement the Serializable interface, they are not portable between different versions of the Java Virtual Machine. As such, a Swing component, or any component that inherits from one, may be serialized to an array of bytes, but it is not guaranteed that this data will be readable on another machine.
ColdFusion
ColdFusion allows data structures to be serialized to WDDX with the cfwddx tag and to JSON with the SerializeJSON function.
OCaml
OCaml's standard library provides marshalling through the Marshal module and the Pervasives functions output_value and input_value. While OCaml programs are statically type-checked, uses of the Marshal module may break type guarantees, as there is no way to check whether an unmarshalled stream represents objects of the expected type. In OCaml it is difficult to marshal a function or a data structure that contains a function (e.g. an object that contains a method), because executable code in functions cannot be transmitted across different programs. (There is a flag to marshal the code position of a function, but the result can only be unmarshalled by exactly the same program.)
Perl
Several Perl modules available from CPAN provide serialization mechanisms, including Storable and FreezeThaw.
Storable includes functions to serialize and deserialize Perl data structures to and from files or Perl scalars. In addition to serializing directly to files, Storable includes the freeze function to return a serialized copy of the data packed into a scalar, and thaw to deserialize such a scalar. This is useful for sending a complex data structure over a network socket or storing it in a database.
When serializing structures with Storable, there are network-safe functions that always store their data in a format that is readable on any computer, at a small cost in speed. These functions are named nstore, nfreeze, etc. There are no "n" functions for deserializing these structures; the regular thaw and retrieve functions deserialize structures serialized with the "n" functions as well as their machine-specific equivalents.
C and C++
C and C++ do not provide direct support for serialization. It is, however, possible to write your own serialization functions, since both languages support writing binary data. In addition, compiler-based solutions, such as the ODB ORM system for C++, are capable of automatically producing serialization code with few or no modifications to class declarations. Another popular serialization framework is Boost.Serialization from the Boost C++ Libraries.
Python
Python implements serialization through the standard library module pickle and, to a lesser extent, the older marshal module. Unlike pickle, marshal does offer the ability to serialize Python code objects. In addition, Python 2 offers the cPickle module, which (as the name suggests) is a C implementation of the pickle module. It can be up to 1000 times faster than the pure Python pickle module, but has a few limitations. The shelve module is based on the pickle module and can be regarded as a serialized Python dictionary.
As of version 2.6, Python's standard library also includes support for JSON and for XML-encoded property lists (see the json and plistlib modules, respectively). However, these modules only handle basic Python types such as strings, integers, and collections of basic types, whereas pickle is intended for arbitrary objects.
PHP
PHP implements serialization through the built-in serialize and unserialize functions. PHP can serialize any of its data types except resources (file pointers, sockets, etc.).
For objects (as of at least PHP 4) there are two "magic methods" that can be implemented within a class, __sleep and __wakeup, which are called from within serialize and unserialize, respectively, to clean up and restore an object. For example, it may be desirable to close a database connection on serialization and restore the connection on deserialization; this functionality would be handled in these two magic methods. They also permit the object to pick which properties are serialized. As of PHP 5.1.0, there is another way to hook into the internal serialize/unserialize mechanism: the Serializable interface.
R
R has the function dput, which writes an ASCII text representation of an R object to a file or connection. A representation can be read from a file using dget.
REBOL
REBOL will serialize to a file (save/all) or to a string! (mold/all). Strings and files can be deserialized using the polymorphic load function.
Ruby
Ruby includes the standard module Marshal with two methods, dump and load, akin to the standard Unix utilities dump and restore. Marshal.dump serializes to an instance of the standard class String; that is, the object effectively becomes a sequence of bytes.
Some objects cannot be serialized (attempting to do so raises a TypeError exception):
- bindings
- procedure objects
- instances of class IO
- singleton objects
If a class requires custom serialization (for example, it requires certain cleanup actions on dumping or restoring), this can be done by implementing two methods: _dump and _load. The instance method _dump should return a String object containing all the information necessary to reconstitute objects of this class and all referenced objects up to a maximum depth given as an integer parameter (a value of -1 implies that depth checking should be disabled). The class method _load should take a String and return an object of this class.
Squeak Smalltalk
There are several ways in Squeak Smalltalk to serialize and store objects. Classes of interest in Squeak for serializing objects include SmartRefStream and ImageSegment.
Cincom Smalltalk and Smalltalk/X
Both provide a so-called "binary-object storage framework", which supports serialization into and retrieval from a compact binary form. Both handle cyclic, recursive and shared structures, storage/retrieval of class and metaclass info, and include mechanisms for "on the fly" object migration (i.e. converting instances that were written by an older version of a class with a different object layout).
The APIs are similar (storeBinary/readBinary), but the encoding details are different, making these two formats incompatible. However, the Smalltalk/X code is open source and free and can be loaded into other Smalltalks to allow for cross-dialect object interchange.
Other Smalltalk dialects
Object serialization is not part of the ANSI Smalltalk specification. As a result, the code to serialize an object varies by Smalltalk implementation, as does the resulting binary data. For instance, a serialized object created in Squeak Smalltalk cannot be restored in Ambrai Smalltalk. Consequently, applications that run on multiple Smalltalk implementations and rely on object serialization cannot share data between those implementations. These applications include the MinneStore object database (http://minnestore.sourceforge.net/) and some RPC packages. A solution to this problem is SIXX (http://www.mars.dti.ne.jp/~umejava/smalltalk/sixx/index.html), a package for multiple Smalltalks that uses an XML-based format for serialization.
Lisp
Generally, a Lisp data structure can be serialized with the functions "read" and "print". A variable foo containing, for example, a list of arrays would be printed by (print foo). Similarly, an object can be read from a stream named s by (read s). These two parts of the Lisp implementation are called the Printer and the Reader. The output of "print" is human-readable; it uses lists delimited by parentheses, for example: (4 2.9 "x" y).
In many Lisp dialects, including Common Lisp, the printer cannot represent every type of data because it is not clear how to do so. In Common Lisp, for example, the printer cannot print CLOS objects in a form that the reader can read back. Instead, the programmer may write a method on the generic function print-object, which will be invoked when the object is printed. This is somewhat similar to the method used in Ruby.
Lisp code itself is written in the syntax of the reader, called read syntax. Most languages use separate and different parsers to deal with code and data; Lisp uses only one. A file containing Lisp code may be read into memory as a data structure, transformed by another program, then possibly executed or written out. See REPL.
Note that not all readers and printers support cyclic, recursive or shared structures.
Haskell
In Haskell, serialization is supported for types that are members of the Read and Show type classes.
Every type that is a member of the Read type class defines a function that will extract the data from the string representation of the dumped data. The Show type class, in turn, contains the show function from which a string representation of the object can be generated.
The programmer need not define the functions explicitly; merely declaring a type to be deriving Read or deriving Show, or both, can make the compiler generate the appropriate functions for many cases (but not all: function types, for example, cannot automatically derive Show or Read).
Windows PowerShell
Windows PowerShell implements serialization through the built-in cmdlet Export-CliXML. Export-CliXML serializes .NET objects and stores the resulting XML in a file.
To reconstitute the objects, use the Import-CliXML cmdlet, which generates a deserialized object from the XML in the exported file. Deserialized objects, often known as "property bags", are not live objects; they are snapshots that have properties but no methods.
Two-dimensional data structures can also be (de)serialized in CSV format using the built-in cmdlets Import-CSV and Export-CSV.
See also
- Comparison of data serialization formats
- Hibernate (Java)
- Persistor.NET
- XML Schema
- Basic Encoding Rules