General RIFF File Background
General RIFF description provided by Robert Shuler <rlshuler@aol.com>
General RIFF File Format
RIFF is a Windows file format for storing chunks of multimedia data, together with associated descriptions, formats, playlists, etc.
The Waveform Audio File Format (.WAV) description below provides a precise description of the data unique to .WAV files, but does not describe the RIFF file structure within which the .WAV data is stored, so I have added this section to describe general RIFF files.
If you read the raw file data, you will need to process the structures described in this section. If you use the RIFF access functions within Windows, they will strip this information off and you will not see it.
RIFF Header
A RIFF file has an 8-byte RIFF header, identifying the file and giving the residual length after the header (i.e. file_length - 8):
struct
{
    char  id[4];  // identifier string = "RIFF"
    DWORD len;    // remaining length after this header
} riff_hdr;
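As a minimal sketch of reading this header from raw file bytes: the helper names read_le32 and parse_riff_header are hypothetical, and the code assumes that DWORD is a 32-bit value stored least significant byte first, as on the Intel platforms Windows runs on.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit little-endian value from a byte buffer (assumed DWORD layout). */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: returns 1 and stores riff_hdr.len in *len_out if
 * the buffer starts with a RIFF header, 0 otherwise. */
static int parse_riff_header(const unsigned char *buf, size_t size,
                             uint32_t *len_out)
{
    if (size < 8 || memcmp(buf, "RIFF", 4) != 0)
        return 0;
    *len_out = read_le32(buf + 4);
    return 1;
}
```

For a well-formed file, the stored length should equal the file length minus 8, so a caller can compare the two as a basic sanity check.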
The riff_hdr is immediately followed by a 4-byte data type identifier. For .WAV files this is "WAVE", as follows:
char wave_id[4];  // WAVE file identifier = "WAVE"
RIFF Chunks
The entire remainder of the RIFF file consists of "chunks". Each chunk has an 8-byte chunk header identifying the type of chunk and giving the length in bytes of the data following the chunk header, as follows:
struct
{                 // CHUNK 8-byte header
    char  id[4];  // identifier, e.g. "fmt " or "data"
    DWORD len;    // remaining chunk length after header
} chunk_hdr;
// data bytes follow chunk header
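The chunk layout above suggests a simple loop for locating a chunk in a file that has been read into memory. The following is a sketch only: find_chunk and read_le32 are hypothetical names, DWORD is assumed to be 32-bit little-endian, and the code also skips the pad byte that RIFF places after chunk data of odd length, a rule that this description does not otherwise cover.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit little-endian value from a byte buffer (assumed DWORD layout). */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: search the top-level chunks of a RIFF/WAVE file
 * image for the chunk named id. Returns the offset of the chunk's data
 * and stores its length in *len_out, or returns 0 if it is not found. */
static size_t find_chunk(const unsigned char *buf, size_t size,
                         const char id[4], uint32_t *len_out)
{
    size_t pos = 12;  /* skip riff_hdr (8 bytes) and wave_id (4 bytes) */

    if (size < 12 || memcmp(buf + 8, "WAVE", 4) != 0)
        return 0;
    while (pos + 8 <= size) {
        uint32_t len = read_le32(buf + pos + 4);
        if (len > size - pos - 8)
            return 0;                 /* chunk runs past the end of the file */
        if (memcmp(buf + pos, id, 4) == 0) {
            *len_out = len;
            return pos + 8;
        }
        pos += 8 + (size_t)len + (len & 1);  /* unknown chunk: skip data and pad */
    }
    return 0;
}
```

Because unknown chunks are skipped rather than rejected, the loop tolerates chunk types that the caller does not recognize.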
This concludes the general RIFF file description. The sections that follow describe the types of chunks to expect in .WAV files and the format of the content data of each chunk type. Code that processes RIFF files should also allow for unexpected chunks.
RIFF WAVE (.WAV) file format
From: Rob Ryan <ST802200@brownvm.brown.edu>
Organization: Brown University
I found the following lengthy excerpt in a document rmrtf.zrt (it is actually a .zip file) in the vendor/microsoft/multimedia subdirectory at the ftp.uu.net ftp site. It is presumably beyond the scope (in terms of the amount of detail) of your document, but nevertheless, I thought that it may help you in including references to the Windows .WAV format in the future.
Let me know if you have any questions/comments. Again, thank you for your helpful summary. Keep it up!
The following is taken from RIFFMCI.RTF, "Multimedia Programming Interface and Data Specification v1.0", a Windows RTF (Rich Text Format) file contained in the .zip file, RMRTF.ZRT. The original document is quite long and this constitutes pages 83-95 of the text format version (starting on roughly page 58 of the RTF version). If you would like a PostScript version, let me know and I can make one up for you.
Waveform Audio File Format (WAVE)
This section describes the Waveform format, which is used to represent digitized sound.
The WAVE form is defined as follows. Programs must expect (and ignore) any unknown chunks encountered, as with all RIFF forms. However, <fmt-ck> must always occur before <wave-data>, and both of these chunks are mandatory in a WAVE file.
<WAVE-form> ->
    RIFF( 'WAVE'
          <fmt-ck>               // Format
          [<fact-ck>]            // Fact chunk
          [<cue-ck>]             // Cue points
          [<playlist-ck>]        // Playlist
          [<assoc-data-list>]    // Associated data list
          <wave-data> )          // Wave data
WAVE chunks are described in the following sections.
WAVE Format Chunk
The WAVE format chunk <fmt-ck> specifies the format of the <wave-data>. The <fmt-ck> is defined as follows:
<fmt-ck> -> fmt( <common-fields>
                 <format-specific-fields> )
<common-fields> ->
    struct
    {
        WORD  wFormatTag;        // Format category
        WORD  wChannels;         // Number of channels
        DWORD dwSamplesPerSec;   // Sampling rate
        DWORD dwAvgBytesPerSec;  // For buffer estimation
        WORD  wBlockAlign;       // Data block size
    }
Common Fields Chunk
The fields in the <common-fields> chunk are as follows:
Field               Description
wFormatTag          A number indicating the WAVE format category of the file. The content of the <format-specific-fields> portion of the `fmt' chunk, and the interpretation of the waveform data, depend on this value. You must register any new WAVE format categories. See ``Registering Multimedia Formats'' in Chapter 1, ``Overview of Multimedia,'' for information on registering WAVE format categories. ``Wave Format Categories,'' following this section, lists the currently defined WAVE format categories.
wChannels           The number of channels represented in the waveform data, such as 1 for mono or 2 for stereo.
dwSamplesPerSec     The sampling rate (in samples per second) at which each channel should be played.
dwAvgBytesPerSec    The average number of bytes per second at which the waveform data should be transferred. Playback software can estimate the buffer size using this value.
wBlockAlign         The block alignment (in bytes) of the waveform data. Playback software needs to process a multiple of wBlockAlign bytes of data at a time, so the value of wBlockAlign can be used for buffer alignment.
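The common fields can be decoded from the start of the `fmt' chunk data as sketched below. The names wave_common_fields, read_le16, read_le32, and parse_common_fields are hypothetical, and the code assumes that WORD is 16 bits, DWORD is 32 bits, and both are stored least significant byte first.

```c
#include <stdint.h>

/* In-memory copy of <common-fields>; field names follow the text above. */
struct wave_common_fields {
    uint16_t wFormatTag;
    uint16_t wChannels;
    uint32_t dwSamplesPerSec;
    uint32_t dwAvgBytesPerSec;
    uint16_t wBlockAlign;
};

static uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: decode the 14 bytes of <common-fields> from the
 * data of a `fmt' chunk. Returns 1 on success, 0 if the chunk is too short. */
static int parse_common_fields(const unsigned char *data, uint32_t len,
                               struct wave_common_fields *out)
{
    if (len < 14)
        return 0;
    out->wFormatTag       = read_le16(data);
    out->wChannels        = read_le16(data + 2);
    out->dwSamplesPerSec  = read_le32(data + 4);
    out->dwAvgBytesPerSec = read_le32(data + 8);
    out->wBlockAlign      = read_le16(data + 12);
    return 1;
}
```

Decoding field by field, rather than reading the bytes directly into a C struct, avoids depending on the compiler's structure padding and on the byte order of the host machine.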
Format Specific Fields Chunk
The <format-specific-fields> consists of zero or more bytes of parameters. Which parameters occur depends on the WAVE format category; see the following section for details. Playback software should be written to allow for (and ignore) any unknown <format-specific-fields> parameters that occur at the end of this field.
WAVE Format Categories
The format category of a WAVE file is specified by the value of the wFormatTag field of the `fmt' chunk. The representation of data in <wave-data>, and the content of the <format-specific-fields> of the `fmt' chunk, depend on the format category.
The currently defined open non-proprietary WAVE format categories are as follows:
wFormatTag Value              Format Category
WAVE_FORMAT_PCM (0x0001)      Microsoft Pulse Code Modulation (PCM)
The following are the registered proprietary WAVE format categories:
wFormatTag Value              Format Category
IBM_FORMAT_MULAW (0x0101)     IBM mu-law format
IBM_FORMAT_ALAW (0x0102)      IBM a-law format
IBM_FORMAT_ADPCM (0x0103)     IBM AVC Adaptive Differential PCM format
Microsoft WAVE_FORMAT_PCM format
The following sections describe the Microsoft WAVE_FORMAT_PCM format. If the wFormatTag field of the <fmt-ck> is set to WAVE_FORMAT_PCM, then the waveform data consists of samples represented in pulse code modulation (PCM) format. For PCM waveform data, the <format-specific-fields> is defined as follows:
<PCM-format-specific> ->
    struct
    {
        WORD  wBitsPerSample;    // Sample size
    }
The wBitsPerSample field specifies the number of bits of data used to represent each sample of each channel. If there are multiple channels, the sample size is the same for each channel.
For PCM data, the dwAvgBytesPerSec field of the `fmt' chunk should be equal to the following formula, rounded up to the next whole number:
    wChannels x dwSamplesPerSec x (wBitsPerSample / 8)
The wBlockAlign field should be equal to the following formula, rounded up to the next whole number:
    wChannels x (wBitsPerSample / 8)
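The two formulas above can be sketched in C as follows. The function names are hypothetical. One assumption deserves attention: this sketch rounds each sample up to a whole number of bytes before multiplying, which agrees with the formulas whenever wBitsPerSample is a multiple of 8 and reproduces the values in the 20-bit example later in this document. For other sample sizes, a literal reading of the formulas could give a smaller result.

```c
#include <stdint.h>

/* Bytes needed to hold one sample of one channel (assumed container size). */
static uint16_t pcm_bytes_per_sample(uint16_t bits_per_sample)
{
    return (uint16_t)((bits_per_sample + 7) / 8);
}

/* wBlockAlign: bytes in one sample frame across all channels. */
static uint16_t pcm_block_align(uint16_t channels, uint16_t bits_per_sample)
{
    return (uint16_t)(channels * pcm_bytes_per_sample(bits_per_sample));
}

/* dwAvgBytesPerSec: sample frames per second times bytes per frame. */
static uint32_t pcm_avg_bytes_per_sec(uint16_t channels,
                                      uint32_t samples_per_sec,
                                      uint16_t bits_per_sample)
{
    return samples_per_sec * pcm_block_align(channels, bits_per_sample);
}
```

For example, 8-bit stereo at 22050 samples per second gives a block alignment of 2 bytes and 44100 bytes per second.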
Data Packing for PCM WAVE Files
In a single-channel WAVE file, samples are stored consecutively. For stereo WAVE files, channel 0 represents the left channel, and channel 1 represents the right channel. The speaker position mapping for more than two channels is currently undefined. In multiple-channel WAVE files, samples are interleaved.
The following diagrams show the data packing for 8-bit mono and stereo WAVE files:
Data Packing for 8-Bit Mono PCM:
    Sample 1    Sample 2    Sample 3    Sample 4
    ---------   ---------   ---------   ---------
    Channel 0   Channel 0   Channel 0   Channel 0
Data Packing for 8-Bit Stereo PCM:
    Sample 1                Sample 2
    ---------------------   ---------------------
    Channel 0   Channel 1   Channel 0   Channel 1
    (left)      (right)     (left)      (right)
The following diagrams show the data packing for 16-bit mono and stereo WAVE files:
Data Packing for 16-Bit Mono PCM:
    Sample 1                  Sample 2
    -----------------------   -----------------------
    Channel 0    Channel 0    Channel 0    Channel 0
    low-order    high-order   low-order    high-order
    byte         byte         byte         byte
Data Packing for 16-Bit Stereo PCM:
    Sample 1
    -------------------------------------------------
    Channel 0    Channel 0    Channel 1    Channel 1
    (left)       (left)       (right)      (right)
    low-order    high-order   low-order    high-order
    byte         byte         byte         byte
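The 16-bit stereo layout shown above can be unpacked into separate left and right buffers as sketched below. The names read_sample16 and split_stereo16 are hypothetical, and the code assumes signed 16-bit samples stored low-order byte first.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one signed 16-bit sample stored low-order byte first. */
static int16_t read_sample16(const unsigned char *p)
{
    int32_t v = p[0] | (p[1] << 8);
    if (v & 0x8000)
        v -= 65536;          /* sign-extend without relying on casts */
    return (int16_t)v;
}

/* Hypothetical helper: de-interleave 16-bit stereo PCM data.
 * Returns the number of sample frames written to left[] and right[]. */
static size_t split_stereo16(const unsigned char *data, size_t nbytes,
                             int16_t *left, int16_t *right,
                             size_t max_frames)
{
    size_t frames = nbytes / 4;   /* 2 channels x 2 bytes per frame */
    size_t i;

    if (frames > max_frames)
        frames = max_frames;
    for (i = 0; i < frames; i++) {
        left[i]  = read_sample16(data + 4 * i);      /* channel 0 */
        right[i] = read_sample16(data + 4 * i + 2);  /* channel 1 */
    }
    return frames;
}
```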
Data Format of the Samples
Each sample is contained in an integer i. The size of i is the smallest number of bytes required to contain the specified sample size. The least significant byte is stored first. The bits that represent the sample amplitude are stored in the most significant bits of i, and the remaining bits are set to zero.
For example, if the sample size (recorded in wBitsPerSample) is 12 bits, then each sample is stored in a two-byte integer. The least significant four bits of the first (least significant) byte are set to zero. The data format and maximum and minimum values for PCM waveform samples of various sizes are as follows:
Sample Size          Data Format         Maximum Value                  Minimum Value
One to eight bits    Unsigned integer    255 (0xFF)                     0
Nine or more bits    Signed integer i    Largest positive value of i    Most negative value of i
For example, the maximum, minimum, and midpoint values for 8-bit and 16-bit PCM waveform data are as follows:
Format        Maximum Value     Minimum Value       Midpoint Value
8-bit PCM     255 (0xFF)        0                   128 (0x80)
16-bit PCM    32767 (0x7FFF)    -32768 (-0x8000)    0
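The sample rules above can be illustrated with two small conversions. Both function names are hypothetical. The first maps an unsigned 8-bit sample onto the signed 16-bit range using the midpoint of 128. The second recovers the numeric value of a 9-bit to 16-bit sample from its two-byte container, assuming the unused low-order bits are zero as described.

```c
#include <stdint.h>

/* Convert an unsigned 8-bit PCM sample (midpoint 128) to signed 16-bit. */
static int16_t pcm8_to_pcm16(uint8_t s)
{
    return (int16_t)(((int)s - 128) * 256);
}

/* Recover the value of a sample of `bits` bits (9..16) from a two-byte
 * container stored low-order byte first, with the amplitude held in the
 * most significant bits. */
static int sample_value16(const unsigned char *p, unsigned bits)
{
    int32_t v = p[0] | (p[1] << 8);
    if (v & 0x8000)
        v -= 65536;                     /* interpret as signed */
    return (int)(v / (1 << (16 - bits)));  /* drop the zero padding bits */
}
```

With a 12-bit sample size, the container bytes 0xF0 0x7F hold the largest positive value, 2047.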
Examples of PCM WAVE Files
Example of a PCM WAVE file with 11.025 kHz sampling rate, mono, 8 bits per sample:
    RIFF( 'WAVE' fmt(1, 1, 11025, 11025, 1, 8)
                 data( <wave-data> ) )
Example of a PCM WAVE file with 22.05 kHz sampling rate, stereo, 8 bits per sample:
    RIFF( 'WAVE' fmt(1, 2, 22050, 44100, 2, 8)
                 data( <wave-data> ) )
Example of a PCM WAVE file with 44.1 kHz sampling rate, mono, 20 bits per sample:
    RIFF( 'WAVE' INFO(INAM("O Canada"Z))
                 fmt(1, 1, 44100, 132300, 3, 20)
                 data( <wave-data> ) )
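To show how such an example maps onto actual bytes, the sketch below fills in the 44 bytes that precede the sample data in a simple PCM file consisting of one `fmt' chunk and one `data' chunk. The function and helper names are hypothetical. The code assumes little-endian WORD and DWORD values, a 16-byte `fmt' chunk (the common fields plus wBitsPerSample), and a RIFF length that counts a pad byte when the data length is odd.

```c
#include <stdint.h>
#include <string.h>

static void put_le16(unsigned char *p, uint16_t v)
{
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)(v >> 8);
}

static void put_le32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)((v >> 8) & 0xFF);
    p[2] = (unsigned char)((v >> 16) & 0xFF);
    p[3] = (unsigned char)(v >> 24);
}

/* Hypothetical helper: build the header of a PCM WAVE file whose `data'
 * chunk will hold data_len bytes of samples. */
static void write_pcm_wav_header(unsigned char hdr[44], uint16_t channels,
                                 uint32_t samples_per_sec,
                                 uint16_t bits_per_sample, uint32_t data_len)
{
    uint16_t block_align = (uint16_t)(channels * ((bits_per_sample + 7) / 8));

    memcpy(hdr, "RIFF", 4);
    put_le32(hdr + 4, 36 + data_len + (data_len & 1)); /* file_length - 8 */
    memcpy(hdr + 8, "WAVE", 4);
    memcpy(hdr + 12, "fmt ", 4);
    put_le32(hdr + 16, 16);                            /* fmt chunk length */
    put_le16(hdr + 20, 1);                             /* WAVE_FORMAT_PCM */
    put_le16(hdr + 22, channels);
    put_le32(hdr + 24, samples_per_sec);
    put_le32(hdr + 28, samples_per_sec * block_align); /* dwAvgBytesPerSec */
    put_le16(hdr + 32, block_align);
    put_le16(hdr + 34, bits_per_sample);
    memcpy(hdr + 36, "data", 4);
    put_le32(hdr + 40, data_len);
}
```

Calling write_pcm_wav_header(hdr, 1, 11025, 8, n) produces the `fmt' values of the first example above, fmt(1, 1, 11025, 11025, 1, 8).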
Storage of WAVE Data
The <wave-data> contains the waveform data. It is defined as follows:
<wave-data>  -> { <data-ck> | <wave-list> }
<data-ck>    -> data( <wave-data> )
<wave-list>  -> LIST( 'wavl' { <data-ck> |          // Wave samples
                               <silence-ck> }... )  // Silence
<silence-ck> -> slnt( <dwSamples:DWORD> )           // Count of silent samples
Note: The `slnt' chunk represents silence, not necessarily a repeated zero volume or baseline sample. In 16-bit PCM data, if the last sample value played before the silence section is 10000, then if data is still output to the D-to-A converter, it must maintain the 10000 value. If a zero value is used, a click may be heard at the start and end of the silence section. If play begins at a silence section, then a zero value might be used since no other information is available. A click might be created if the data following the silent section starts with a nonzero value.
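One way a player might act on this note when expanding a `slnt' chunk into 16-bit output is sketched below. The function name and parameters are hypothetical: have_prev says whether any sample was played before the silence, and prev is that sample's value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write up to dwSamples silent samples into out[].
 * The last value played is held to avoid a click; zero is used only
 * when play begins at the silence section. Returns the count written. */
static size_t fill_silence16(int16_t *out, size_t capacity,
                             uint32_t dwSamples, int have_prev, int16_t prev)
{
    int16_t hold = have_prev ? prev : 0;
    size_t n = dwSamples < capacity ? (size_t)dwSamples : capacity;
    size_t i;

    for (i = 0; i < n; i++)
        out[i] = hold;
    return n;
}
```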
FACT Chunk
The <fact-ck> fact chunk stores important information about the contents of the WAVE file. This chunk is defined as follows:
<fact-ck> -> fact( <dwFileSize:DWORD> )    // Number of samples
The `fact' chunk is required if the waveform data is contained in a `wavl' LIST chunk and for all compressed audio formats. The chunk is not required for PCM files using the `data' chunk format.
The `fact' chunk will be expanded to include any other information required by future WAVE formats. Added fields will appear following the <dwFileSize> field. Applications can use the chunk size field to determine which fields are present.
Cue-Points Chunk
The <cue-ck> cue-points chunk identifies a series of positions in the waveform data stream. The <cue-ck> is defined as follows:
<cue-ck> -> cue( <dwCuePoints:DWORD>    // Count of cue points
                 <cue-point>... )       // Cue-point table
<cue-point> ->
    struct
    {
        DWORD  dwName;
        DWORD  dwPosition;
        FOURCC fccChunk;
        DWORD  dwChunkStart;
        DWORD  dwBlockStart;
        DWORD  dwSampleOffset;
    }
The <cue-point> fields are as follows:
Field             Description
dwName            Specifies the cue point name. Each <cue-point> record must have a unique dwName field.
dwPosition        Specifies the sample position of the cue point. This is the sequential sample number within the play order. See ``Playlist Chunk,'' later in this document, for a discussion of the play order.
fccChunk          Specifies the name or chunk ID of the chunk containing the cue point.
dwChunkStart      Specifies the file position of the start of the chunk containing the cue point. This is a byte offset relative to the start of the data section of the `wavl' LIST chunk.
dwBlockStart      Specifies the file position of the start of the block containing the position. This is a byte offset relative to the start of the data section of the `wavl' LIST chunk.
dwSampleOffset    Specifies the sample offset of the cue point relative to the start of the block.
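A sketch of decoding the cue-point table from the data of a cue-points chunk follows. The names cue_point, read_le32, and parse_cue_chunk are hypothetical, and the code assumes each <cue-point> record occupies 24 bytes (five 32-bit DWORD fields plus a 4-byte FOURCC) stored least significant byte first.

```c
#include <stdint.h>
#include <string.h>

struct cue_point {
    uint32_t dwName;
    uint32_t dwPosition;
    char     fccChunk[4];
    uint32_t dwChunkStart;
    uint32_t dwBlockStart;
    uint32_t dwSampleOffset;
};

static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: decode up to max_points records from the data of
 * a cue-points chunk of length len. Returns the number decoded. */
static size_t parse_cue_chunk(const unsigned char *data, uint32_t len,
                              struct cue_point *out, size_t max_points)
{
    uint32_t count, avail;
    size_t n, i;

    if (len < 4)
        return 0;
    count = read_le32(data);          /* dwCuePoints */
    avail = (len - 4) / 24;           /* records that actually fit */
    if (count > avail)
        count = avail;
    n = count < max_points ? (size_t)count : max_points;
    for (i = 0; i < n; i++) {
        const unsigned char *p = data + 4 + 24 * i;
        out[i].dwName         = read_le32(p);
        out[i].dwPosition     = read_le32(p + 4);
        memcpy(out[i].fccChunk, p + 8, 4);
        out[i].dwChunkStart   = read_le32(p + 12);
        out[i].dwBlockStart   = read_le32(p + 16);
        out[i].dwSampleOffset = read_le32(p + 20);
    }
    return n;
}
```

Clamping the count to the number of records that fit in the chunk guards against a dwCuePoints value that overstates the table size.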
Examples of File Position Values
The following table describes the <cue-point> field values for a WAVE file containing multiple `data' and `slnt' chunks enclosed in a `wavl' LIST chunk:
Cue Point Location              Field             Value
In a `slnt' chunk               fccChunk          FOURCC value `slnt'.
                                dwChunkStart      File position of the `slnt' chunk relative to the start of the data section in the `wavl' LIST chunk.
                                dwBlockStart      File position of the data section of the `slnt' chunk relative to the start of the data section of the `wavl' LIST chunk.
                                dwSampleOffset    Sample position of the cue point relative to the start of the `slnt' chunk.
In a PCM `data' chunk           fccChunk          FOURCC value `data'.
                                dwChunkStart      File position of the `data' chunk relative to the start of the data section in the `wavl' LIST chunk.
                                dwBlockStart      File position of the cue point relative to the start of the data section of the `wavl' LIST chunk.
                                dwSampleOffset    Zero value.
In a compressed `data' chunk    fccChunk          FOURCC value `data'.
                                dwChunkStart      File position of the start of the `data' chunk relative to the start of the data section of the `wavl' LIST chunk.
                                dwBlockStart      File position of the enclosing block relative to the start of the data section of the `wavl' LIST chunk. The software can begin the decompression at this point.
                                dwSampleOffset    Sample position of the cue point relative to the start of the block.
The following table describes the <cue-point> field values for a WAVE file containing a single `data' chunk:
Cue Point Location              Field             Value
Within PCM data                 fccChunk          FOURCC value `data'.
                                dwChunkStart      Zero value.
                                dwBlockStart      Zero value.
                                dwSampleOffset    Sample position of the cue point relative to the start of the `data' chunk.
In a compressed `data' chunk    fccChunk          FOURCC value `data'.
                                dwChunkStart      Zero value.
                                dwBlockStart      File position of the enclosing block relative to the start of the `data' chunk. The software can begin the decompression at this point.
                                dwSampleOffset    Sample position of the cue point relative to the start of the block.
Playlist Chunk
The <playlist-ck> playlist chunk specifies a play order for a series of cue points. The <playlist-ck> is defined as follows:
<playlist-ck> -> plst( <dwSegments:DWORD>    // Count of play segments
                       <play-segment>... )   // Play-segment table
<play-segment> ->
    struct
    {
        DWORD dwName;
        DWORD dwLength;
        DWORD dwLoops;
    }
The <play-segment> fields are as follows:
Field       Description
dwName      Specifies the cue point name. This value must match one of the names listed in the <cue-ck> cue-point table.
dwLength    Specifies the length of the section in samples.
dwLoops     Specifies the number of times to play the section.
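As an illustration of how these fields combine, the sketch below totals the number of samples a playlist would play, on the assumption that each segment contributes dwLength samples dwLoops times. The names play_segment and playlist_total_samples are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct play_segment {
    uint32_t dwName;    /* cue point name */
    uint32_t dwLength;  /* section length in samples */
    uint32_t dwLoops;   /* number of times to play the section */
};

/* Hypothetical helper: total samples played by a play-segment table.
 * A 64-bit sum is used so that long, heavily looped lists do not overflow. */
static uint64_t playlist_total_samples(const struct play_segment *segs,
                                       size_t count)
{
    uint64_t total = 0;
    size_t i;

    for (i = 0; i < count; i++)
        total += (uint64_t)segs[i].dwLength * segs[i].dwLoops;
    return total;
}
```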
Associated Data Chunk
The <assoc-data-list> associated data list provides the ability to attach information like labels to sections of the waveform data stream. The <assoc-data-list> is defined as follows:
<assoc-data-list> -> LIST( 'adtl'
                           <labl-ck>      // Label
                           <note-ck>      // Note
                           <ltxt-ck>      // Text with data length
                           <file-ck> )    // Media file
<labl-ck> -> labl( <dwName:DWORD>
                   <data:ZSTR> )
<note-ck> -> note( <dwName:DWORD>
                   <data:ZSTR> )
<ltxt-ck> -> ltxt( <dwName:DWORD>
                   <dwSampleLength:DWORD>
                   <dwPurpose:DWORD>
                   <wCountry:WORD>
                   <wLanguage:WORD>
                   <wDialect:WORD>
                   <wCodePage:WORD>
                   <data:BYTE>... )
<file-ck> -> file( <dwName:DWORD>
                   <dwMedType:DWORD>
                   <fileData:BYTE>... )
Label and Note Information
The `labl' and `note' chunks have similar fields. The `labl' chunk contains a label, or title, to associate with a cue point. The `note' chunk contains comment text for a cue point. The fields are as follows:
Field     Description
dwName    Specifies the cue point name. This value must match one of the names listed in the <cue-ck> cue-point table.
data      Specifies a NULL-terminated string containing a text label (for the `labl' chunk) or comment text (for the `note' chunk).
Text with Data Length Information
The `ltxt' chunk contains text that is associated with a data segment of specific length. The chunk fields are as follows:
Field             Description
dwName            Specifies the cue point name. This value must match one of the names listed in the <cue-ck> cue-point table.
dwSampleLength    Specifies the number of samples in the segment of waveform data.
dwPurpose         Specifies the type or purpose of the text. For example, dwPurpose can specify a FOURCC code like `scrp' for script text or `capt' for closed-caption text.
wCountry          Specifies the country code for the text. See ``Country Codes'' in Chapter 2, ``Resource Interchange File Format,'' for a current list of country codes.
wLanguage,        Specify the language and dialect codes for the text. See ``Language and Dialect Codes'' in Chapter 2, ``Resource Interchange File Format,'' for a current list of language and dialect codes.
wDialect
wCodePage         Specifies the code page for the text.
Embedded File Information
The `file' chunk contains information described in other file formats (for example, an `RDIB' file or an ASCII text file). The chunk fields are as follows:
Field        Description
dwName       Specifies the cue point name. This value must match one of the names listed in the <cue-ck> cue-point table.
dwMedType    Specifies the file type contained in the fileData field. If the fileData section contains a RIFF form, the dwMedType field is the same as the RIFF form type for the file. This field can contain a zero value.
fileData     Contains the media file.