File sequence
In computing, as well as in non-computing contexts, a file sequence is a well-ordered, finite collection of files, usually related to each other in some way.
In computing, file sequences should ideally obey some kind of locality of reference principle: all the files belonging to the same sequence should be locally referenced to each other, and the closer two files are with respect to the ordering relation, the stronger that locality should be. Explicit file sequences are sequences whose filenames all end with a numeric or alphanumeric tag (excluding the file extension).
This locality of reference usually pertains to the data, to the metadata (e.g. filenames or last-access dates), or to the physical proximity within the storage media the files reside in. In the last sense it is better to speak of file contiguity (see below).
Identification
Every operating system with a GUI shows the contents of folders by ordering the files according to some criterion, mostly related to the files' metadata, such as the filename. The default criterion is the alphanumeric ordering of filenames, although some operating systems do that in "smarter" ways than others: for example, file2.ext should ideally be placed before file10.ext, as some Linux file managers do, whereas alphanumerically it comes after (more on that later).
Other criteria exist, such as ordering files by file type (or by extension) and, within the same type, by filename or last-access date, and so on.
For this reason, when a file sequence has a stronger locality of reference, particularly when it is related to the files' actual contents, it is better to highlight this fact by letting the well-ordering induce an alphanumeric ordering of the filenames too. That is the case for explicit file sequences.
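The difference between plain alphanumeric ordering and a "smarter" (natural) ordering can be illustrated with a minimal Python sketch. The helper name `natural_key` and the sample filenames are assumptions made for illustration; this is not how any particular operating system implements its sorting.

```python
import re

def natural_key(filename):
    """Split a filename into text and number chunks for natural ordering."""
    # A capturing group makes re.split keep the digit runs, so the result
    # alternates text (even positions) and digits (odd positions).
    parts = re.split(r"(\d+)", filename)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

names = ["file10.ext", "file2.ext", "file1.ext"]

plain = sorted(names)                      # character-by-character ordering
natural = sorted(names, key=natural_key)   # digit runs compared as numbers

print(plain)    # ['file1.ext', 'file10.ext', 'file2.ext']
print(natural)  # ['file1.ext', 'file2.ext', 'file10.ext']
```

In the plain ordering file10.ext lands before file2.ext because the character '1' sorts before '2'; the natural key compares 10 and 2 as integers instead.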
Explicit file sequences
Explicit file sequences have the same filename (including the file extension, which confirms their contents' locality of reference) except for the final part (excluding the extension), which is a run of numeric, alphanumeric or purely alphabetical characters that forces a specific ordering; such sequences should also ideally be located within the same directory. In this sense, any files that share the same filename (and possibly extension) and differ only by the sequence number at the end of the filename automatically belong to the same file sequence, at least when they are located in the same folder.
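As a rough illustration of this rule, the following Python sketch groups the names found in one folder into explicit sequences. The function name `group_sequences` and the sample names are hypothetical, and only decimal sequence numbers are recognized; hexadecimal or alphabetical tags would need a different pattern.

```python
import os
import re
from collections import defaultdict

def group_sequences(filenames):
    """Group names that differ only by the number ending the filename stem."""
    groups = defaultdict(list)
    for name in filenames:
        stem, ext = os.path.splitext(name)
        match = re.search(r"\d+$", stem)
        if match is None:
            continue  # no trailing number: not part of an explicit sequence
        # Files belong together when prefix and extension are identical
        key = (stem[:match.start()], ext)
        groups[key].append((int(match.group()), name))
    # Order every group by its sequence number
    return {key: [name for _, name in sorted(members)]
            for key, members in groups.items()}

names = ["file002.ext", "notes.txt", "file001.ext", "tag_10.ext", "file010.ext"]
sequences = group_sequences(names)
print(sequences[("file", ".ext")])  # ['file001.ext', 'file002.ext', 'file010.ext']
```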
It is also part of many naming conventions that number-indexed file sequences (in any number base) whose indices span at most a fixed number of digits use leading zeroes (zero padding) in their filenames, so that:
- all the files in the sequence share exactly the same number of characters in their complete filenames;
- non-smart alphanumeric orderings, like those of operating systems' GUIs, do not erroneously permute them within the sequence.
To better explain the latter point, consider that, strictly speaking, file2.ext (the second file in the sequence) comes alphanumerically after file100.ext, which is actually the hundredth. Renaming the second file to file002.ext, with two leading zeroes, solves the problem for any alphanumeric ordering.
Examples of explicit file sequences include:
file00000.ext, file00001.ext, file00002.ext, ..., file02979.ext (five digits, padded with leading zeroes), and another with a hexadecimal ordering of 256 files: tag_00.ext, tag_01.ext, ..., tag_09.ext, tag_0A.ext, ..., tag_0F.ext, tag_10.ext, ..., tag_FF.ext (with at most one leading zero).
Software and programming conventions usually represent a file sequence as a single virtual file object, whose name is written in C-like formatted-string notation to show where the sequence number is located in the filename and how it is formatted. For the two examples above, that would be file%05d.ext and tag_%02X.ext, respectively, whereas for the former the same convention without leading zeroes would be file%d.ext (in C itself, the width-only form file%5d.ext pads with spaces rather than zeroes).
Note, however, that such notation is usually not valid at operating system and command-line interface levels, because '%' is neither a wildcard or regular expression construct nor a universally legal filename character: the notation just stands as a placeholder for the virtual file-like object representing the whole explicit file sequence.
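The formatted-string notation can be tried out directly in Python, whose `%` operator follows the C conventions for `%d` and `%X` (the C conversion for uppercase hexadecimal). The helper name `expand` and the patterns below are assumptions chosen to match the example filenames above; this is a sketch of the idea, not the parser of any of the packages mentioned in this article.

```python
def expand(pattern, indices):
    """Expand a printf-style sequence pattern into concrete filenames."""
    return [pattern % index for index in indices]

decimal = expand("file%05d.ext", range(3))
hexadecimal = expand("tag_%02X.ext", [0, 10, 255])

print(decimal)      # ['file00000.ext', 'file00001.ext', 'file00002.ext']
print(hexadecimal)  # ['tag_00.ext', 'tag_0A.ext', 'tag_FF.ext']

# Zero-padded names keep their numeric order under a plain sort,
# while unpadded names do not.
padded = sorted(expand("file%03d.ext", [100, 2, 1]))
unpadded = sorted(expand("file%d.ext", [100, 2, 1]))
print(padded)    # ['file001.ext', 'file002.ext', 'file100.ext']
print(unpadded)  # ['file1.ext', 'file100.ext', 'file2.ext']
```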
Notable software packages acknowledging explicit file sequences as single filesystem objects, an approach rather typical of the audio/video post-production industry (see below), are found among products by Autodesk, Quantel, daVinci and DVS, as well as Adobe After Effects.
File scattering
A file sequence located within a mass storage device is said to be contiguous if:
- every file in the sequence is unfragmented, i.e. each file is stored in one contiguous and ordered piece of storage space (ideally in one extent, or in multiple but contiguous extents);
- consecutive files in the sequence occupy contiguous portions of storage space (extents), in the same order as the files in the sequence.
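The two conditions above can be expressed as a small check, sketched here in Python. It assumes a hypothetical description of the layout in which each file is given as a list of `(start_block, block_count)` extents, listed in sequence order; how such extents are obtained depends on the filesystem and is not covered. The check is also stricter than real filesystems require, since it does not tolerate reserved areas or metadata interleaved between files.

```python
def is_contiguous(files):
    """Return True if the extents of all files form one unbroken, ordered run.

    files: list of files in sequence order; each file is a list of
    (start_block, block_count) extents (hypothetical layout description).
    """
    expected_start = None
    for extents in files:
        for start, count in extents:
            # Every extent must begin exactly where the previous one ended,
            # both inside a file and across consecutive files.
            if expected_start is not None and start != expected_start:
                return False
            expected_start = start + count
    return True

contiguous = [[(100, 8)], [(108, 8)], [(116, 4), (120, 4)]]
scattered = [[(100, 8)], [(200, 8)]]
misordered = [[(108, 8)], [(100, 8)]]

print(is_contiguous(contiguous))  # True
print(is_contiguous(scattered))   # False
print(is_contiguous(misordered))  # False
```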
File contiguity is a more practical requirement for file sequences than locality of reference alone, because it relates to the storage medium hosting the whole sequence rather than to the sequence itself (or its metadata). At the same time, it is a "high-level" feature, because it does not depend on the physical and technical details of the mass storage itself: file contiguity is realized in different ways according to the storage device's architecture and the actual filesystem structure. At "low level", each file in a contiguous sequence must be placed in contiguous blocks, regardless of any reserved areas or special metadata required by the filesystem (like inodes or inter-sector headers) that actually interleave them.
File contiguity is, in most practical applications, "invisible" at operating-system or user level, since all the files in a sequence are always available to applications in the same way, regardless of their physical location on the storage device (operating systems hide the filesystem internals from higher-level services). File contiguity does, however, affect I/O performance when the sequence has to be read or written in the shortest time possible.
In some contexts (like optical disc burning; see also below), data in a file sequence must be accessed in the same order as the file sequence itself; in other contexts, "random" access to the sequence may be required. In both cases, most professional filesystems provide faster access strategies for contiguous files than for non-contiguous ones. Data pre-allocation is crucial for write access, whereas burst read speeds are achievable only for contiguous data.
When a file sequence is not contiguous, it is said to be scattered, since its files are stored in sparse locations on the storage device. File scattering is the process of allocating (or re-allocating) a file sequence so that it is (or becomes) non-contiguous. It is often associated with file fragmentation too, where each file is itself stored in several non-contiguous blocks; mechanisms contributing to the former are usually a common cause of the latter. The act of reducing file scattering, either by allocating files of the same sequence near one another in the first place or by moving already-stored files closer together on the storage medium, is called (file) descattering.
A few defragmentation strategies and dedicated software are able to both defragment single files and descatter file sequences.
Multimedia file sequences
There are many contexts in which explicit file sequences are particularly important: incremental backups, periodic logs and multimedia files captured or created with a chronological locality of reference. In the last case, explicit file numbering is extremely important because it gives both software and end users a way to discern the order of the contents stored therein. For example, digital cameras and similar devices save all the picture files in the same folder (until it reaches its maximum file-number capacity, or a new event such as the passing of midnight or a device switch takes place) with a final number sequence: it would be very impractical to choose a filename for each shot at the moment of shooting, so the camera firmware/software picks one that is uniquely identifiable by its sequence number. With the aid of other metadata (and usually of specialized PC software), users can later discern the multimedia contents and re-organize them, if needed.
The Digital Intermediate example
A typical example where explicit file sequences, as well as their contiguity, become crucial is the digital intermediate (DI) workflow for the motion picture and video industries. In such contexts, video data need to maintain the highest quality and be ready for visualization (usually in real time, if not faster). Video data are usually acquired from either a digital video camera or a motion picture film scanner and stored into file sequences (much as a common photographic camera does), and need to be post-produced in several steps, including at least editing, conforming and colour correction. That requires:
- Uncompressed data, because any lossy compression, which is common in most finalized products, introduces unacceptable quality losses.
- Uncompressed data (again), because decompression times may degrade the playback/visualization performance of hardware and software.
- Frame-per-file data management, because common post-production operations demand the shortest possible seek times; "fast-forwarding" or "rewinding" to a specific (key) frame is much faster if done at filesystem level rather than within a huge, possibly fragmented video file; every frame is therefore stored in a single file as a still digital picture.
- Unambiguous frame ordering, which is best accomplished by grouping all the files together with explicit file numbering.
- File contiguity, because many filesystem architectures achieve higher I/O speeds when transferring data on contiguous areas of the storage, whereas random allocation might prevent real-time or better loading performance.
Consider that a single frame in a DI project is currently from 9 MB to 48 MB in size (depending upon resolution and colour depth), whereas the video frame rate is commonly 24 or 25 frames per second (if not faster); any storage required for real-time playback of such content thus needs a minimum overall throughput of roughly 220 MB/s to 1.2 GB/s, respectively. With those numbers, all the above requirements (particularly file contiguity, given current storage performance) become strictly mandatory.
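The throughput figures follow from multiplying frame size by frame rate, as the short Python sketch below shows. The function name is an assumption for illustration, and the result is a lower bound for sustained real-time playback that ignores filesystem and protocol overhead.

```python
def required_throughput_mb_s(frame_size_mb, frames_per_second):
    """Minimum sustained throughput (MB/s) to play one frame-per-file stream."""
    return frame_size_mb * frames_per_second

low = required_throughput_mb_s(9, 24)    # smallest frames at 24 fps
high = required_throughput_mb_s(48, 25)  # largest frames at 25 fps

print(low)   # 216  (about 220 MB/s)
print(high)  # 1200 (1.2 GB/s)
```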
External links
- PySeq: an open source Python module that finds groups of items that follow a naming convention containing a numerical sequence index (e.g. fileA.001.png, fileA.002.png, fileA.003.png...) and serializes them into a compressed sequence string representing the entire sequence (e.g. fileA.1-3.png).
- checkfileseq: an open source Python script (usable via CLI) that scans a directory structure recursively for files missing in a file sequence and prints a report upon completion. It supports a wide array of filename patterns and can be customized to gain additional pattern logic.