Au file format
The Au file format is a simple audio file format introduced by Sun Microsystems. The format was common on NeXT systems and on early Web pages. Originally it was headerless, being simply 8-bit µ-law-encoded data at an 8000 Hz sample rate. Hardware from other vendors often used sample rates as high as 8192 Hz, often integer factors of video clock signals. Newer files have a header that consists of six unsigned 32-bit words, an optional information chunk and then the data (in big-endian format).
Although the format now supports many audio encoding formats, it remains associated with the µ-law logarithmic encoding. This encoding was native to the SPARCstation 1 hardware, where SunOS exposed the encoding to application programs through the /dev/audio interface. This encoding and interface became a de facto standard for Unix sound.
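To illustrate what the logarithmic encoding means in practice, the following Python sketch expands a single 8-bit µ-law byte into a signed linear sample, using the usual G.711 decoding arithmetic. The names `ulaw_to_linear`, `decode_ulaw` and `BIAS` are chosen here for illustration and are not part of any Sun or NeXT interface.

```python
BIAS = 0x84  # 132, the bias used by G.711 mu-law before segmenting


def ulaw_to_linear(byte):
    """Expand one mu-law byte (0..255) to a signed linear sample."""
    u = ~byte & 0xFF             # mu-law bytes are stored bit-inverted
    exponent = (u >> 4) & 0x07   # 3-bit segment number
    mantissa = u & 0x0F          # 4-bit step within the segment
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if u & 0x80 else magnitude


def decode_ulaw(data):
    """Decode a bytes object of mu-law samples into a list of ints."""
    return [ulaw_to_linear(b) for b in data]
```

With this sketch, the byte 0xFF decodes to 0 (silence), while 0x00 and 0x80 decode to the extremes -32124 and 32124, so the 8-bit codes cover roughly a 16-bit linear range with finer steps near zero.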
The type of encoding depends on the value of the "encoding" field (word 3 of the header). Formats 2 through 7 are uncompressed PCM, therefore lossless. Formats 23 through 26 are ADPCM, which is a lossy, roughly 4:1 compression. Formats 1 and 27 are µ-law and A-law, respectively, both lossy. Several of the others are DSP commands or data, designed to be processed by the NeXT MusicKit software.
Note: PCM data appear to be encoded as signed, rather than unsigned.
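The grouping of encoding values described above could be expressed as a small lookup, sketched here in Python. The function name `describe_encoding` and the returned strings are illustrative assumptions. Only the ranges named in the text are covered, and any other value is reported as "other or unknown" rather than guessed.

```python
def describe_encoding(code):
    """Map an Au 'encoding' field value to the family named in the text."""
    if code == 1:
        return "mu-law (lossy)"
    if 2 <= code <= 7:
        return "uncompressed PCM (lossless)"
    if 23 <= code <= 26:
        return "ADPCM (lossy, roughly 4:1)"
    if code == 27:
        return "A-law (lossy)"
    # DSP commands/data and anything not listed in the text end up here
    return "other or unknown"
```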
New format
All fields are stored in big-endian format, including the sample data.

32-bit word (unsigned) | Field | Description/Content (hexadecimal numbers in C notation) |
---|---|---|
0 | magic number | the value 0x2e736e64 (four ASCII characters ".snd") |
1 | data offset | the offset to the data in bytes. The minimum valid number is 24 (decimal), since this is the header length (six 32-bit words) with no space reserved for extra information. |
2 | data size | data size in bytes. If unknown, the value 0xffffffff should be used. |
3 | encoding | data encoding format |
4 | sample rate | the number of samples/second, e.g., 8000 |
5 | channels | the number of interleaved channels, e.g., 1 for mono, 2 for stereo; more channels possible, but may not be supported by all readers. |
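As a minimal sketch of how the header layout in the table could be read, the following Python code unpacks the six big-endian 32-bit words with the standard `struct` module. The names `parse_au_header`, `AU_MAGIC`, `HEADER_SIZE` and `UNKNOWN_SIZE` are hypothetical and chosen for this example. The validation shown covers only the constraints stated in the table.

```python
import struct

AU_MAGIC = 0x2E736E64      # the four ASCII characters ".snd"
HEADER_SIZE = 24           # six unsigned 32-bit words
UNKNOWN_SIZE = 0xFFFFFFFF  # marker for "data size unknown"


def parse_au_header(buf):
    """Parse the fixed Au header from the start of a bytes object."""
    if len(buf) < HEADER_SIZE:
        raise ValueError("need at least 24 bytes for an Au header")
    # ">6I" = big-endian, six unsigned 32-bit integers
    magic, offset, size, encoding, rate, channels = struct.unpack(
        ">6I", buf[:HEADER_SIZE]
    )
    if magic != AU_MAGIC:
        raise ValueError("missing .snd magic number")
    if offset < HEADER_SIZE:
        raise ValueError("data offset is smaller than the header length")
    return {
        "data_offset": offset,
        "data_size": None if size == UNKNOWN_SIZE else size,
        "encoding": encoding,
        "sample_rate": rate,
        "channels": channels,
        # optional information chunk between the header and the data
        "info": buf[HEADER_SIZE:offset],
    }


# Example: a header for mono data, encoding 1, 8000 Hz, size unknown
example = struct.pack(">6I", AU_MAGIC, 24, UNKNOWN_SIZE, 1, 8000, 1)
fields = parse_au_header(example + b"\xff" * 16)
```

Here `fields["data_size"]` is `None` because the header carries 0xffffffff, and `fields["info"]` is empty because the data offset equals the minimum of 24.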