0001 ***** ISO9660 Simplified for DOS/Windows
0002 by Philip J. Erdelsky *****
0003 *** 1. Introduction ***
0004 We weren't sure about it a few years ago, but by now it should be clear to
0005 everyone that CD-ROM's are here to stay. Most PC's are equipped with CD-ROM
0006 readers, and most major PC software packages are being distributed on CD-ROM's.
0007 Under DOS (and Windows, which uses the DOS file system) files are written to
0008 both hard and floppy disks with a so-called FAT (File Allocation Table) file
0009 system.
0010 Files on a CD-ROM, however, are written to a different standard, called
0011 ISO9660. ISO9660 is rather complex and poorly written, and obviously contains a
0012 number of diplomatic compromises among advocates of DOS, UNIX, MVS and perhaps
0013 other operating systems.
0014 The simplified version presented here includes only features that would
0015 normally be found on a CD-ROM to be used in a DOS system and which are
0016 supported by the Microsoft MS-DOS CD-ROM Extensions (MSCDEX). It is based on
0017 ISO9660, on certain documents regarding MSCDEX (version 2.10), and on the
0018 contents of some actual CD-ROM's.
0019 Where a field has a specific value on a CD-ROM to be used with DOS, that value
0020 is given in this document. However, in some cases a brief description of values
0021 for use with other operating systems is given in square brackets.
0022 ISO9660 makes provisions for sets of CD-ROM's, and apparently even permits a
0023 file system to span more than one CD-ROM. However, this feature is not
0024 supported by MSCDEX.
0025 *** 2. Files ***
0026 The directory structure on a CD-ROM is almost exactly like that on a DOS floppy
0027 or hard disk. (It is presumed that the reader of this document is reasonably
0028 familiar with the DOS file system.) For this reason, DOS and Windows
0029 applications can read files from a CD-ROM just as they would from a floppy or
0030 hard disk.
0031 There are only a few differences, which do not affect most applications:
0032    1. The root directory contains the notorious "." and ".." entries, just like
0033       any other directory.
0034    2. There is no limit, other than disk capacity, to the size of the root
0035       directory.
0036    3. The depth of directory nesting is limited to eight levels, including the
0037       root. For example, if drive E: contains a CD-ROM, a file such as
0038       E:\D2\D3\D4\D5\D6\D7\D8\FOO.TXT is permitted but
0039       E:\D2\D3\D4\D5\D6\D7\D8\D9\FOO.TXT is not.
0040    4. If a CD-ROM is to be used by a DOS system, file names and extensions must
0041       be limited to eight and three characters, respectively, even though
0042       ISO9660 permits longer names and extensions.
0043    5. ISO9660 permits only capital letters, digits and underscores in a file or
0044       directory name or extension, but DOS also permits a number of other
0045       punctuation marks.
0046    6. ISO9660 permits a file to have an extension but not a name, but DOS does
0047       not.
0048    7. DOS permits a directory to have an extension, but ISO9660 does not.
0049    8. Directories on a CD-ROM are always sorted, as described below.
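Several of the differences above can be checked mechanically. The following Python sketch tests whether a name satisfies both strict ISO9660 and the DOS 8.3 limits, and whether a path respects the eight-level nesting limit. The function names are hypothetical, and the checks are deliberately conservative: they reject the extra punctuation marks that DOS alone permits.

```python
import re

# Strict ISO9660 characters only: capital letters, digits and underscores.
_FILE_RE = re.compile(r"[A-Z0-9_]{1,8}(\.[A-Z0-9_]{1,3})?")
_DIR_RE = re.compile(r"[A-Z0-9_]{1,8}")

def is_portable_file_name(name):
    # A name of 1 to 8 characters, optionally followed by a period and
    # an extension of 1 to 3 characters.
    return _FILE_RE.fullmatch(name) is not None

def is_portable_dir_name(name):
    # A directory name of 1 to 8 characters with no extension.
    return _DIR_RE.fullmatch(name) is not None

def depth_ok(path):
    # path is a drive-relative file path such as \D2\D3\FOO.TXT.
    parts = [p for p in path.split("\\") if p]
    directories = parts[:-1]          # the last component is the file
    return 1 + len(directories) <= 8  # the root counts as level 1
```

With these definitions, `depth_ok` accepts the first example path in item 3 and rejects the second.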
0050 Of course, neither DOS, nor UNIX, nor any other operating system can WRITE
0051 files to a CD-ROM as it would to a floppy or hard disk, because a CD-ROM is not
0052 rewritable. Files must be written to the CD-ROM by a special program with
0053 special hardware.
0054 *** 3. Sectors ***
0055 The information on a CD-ROM is divided into sectors, which are numbered
0056 consecutively, starting with zero. There are no gaps in the numbering.
0057 Each sector contains 2048 8-bit bytes. (ISO9660 apparently permits other sector
0058 sizes, but the 2048-byte size seems to be universal.)
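Because sectors are numbered from zero without gaps, the byte offset of a sector is simply its number times 2048. The sketch below assumes the CD-ROM contents are available as a raw image made of 2048-byte sectors (an assumption of this example, not something ISO9660 or MSCDEX provides); `read_sectors` is a hypothetical helper name.

```python
import io

SECTOR_SIZE = 2048

def read_sectors(image, first_sector, count=1):
    # image is any seekable binary file object holding a raw sector image.
    image.seek(first_sector * SECTOR_SIZE)
    data = image.read(count * SECTOR_SIZE)
    if len(data) != count * SECTOR_SIZE:
        raise EOFError("image ends before the requested sectors")
    return data

# Example with an in-memory image of two sectors.
demo = io.BytesIO(bytes(SECTOR_SIZE) + b"\x01" * SECTOR_SIZE)
second = read_sectors(demo, 1)
```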
0059 When a number of sectors are to be read from the CD-ROM, they should be read in
0060 order of increasing sector number, if possible, since that is the order in
0061 which they pass under the read head as the CD-ROM rotates. Most implementations
0062 arrange the information so sectors will be read in this order for typical file
0063 operations, although ISO9660 does not require this in all cases.
0064 The order of bytes within a sector is considered to be the order in which they
0065 appear when read into memory; i.e., the "first" bytes are read into the lowest
0066 memory addresses. This is also the order used in this document; i.e., the
0067 "first" bytes in any list appear at the top of the list.
0068 *** 4. Character Sets ***
0069 Names and extensions of files and directories, the volume name, and some other
0070 names are expressed in standard ASCII character codes (although ISO9660 does
0071 not use the name ASCII). According to ISO9660, only capital letters, digits,
0072 and underscores are permitted. However, DOS permits some other punctuation
0073 marks, which are sometimes found on CD-ROM's, in apparent defiance of ISO9660.
0074 MSCDEX does offer support for the kanji (Japanese) character set. However, this
0075 document does not cover kanji.
0076 *** 5. Sorting Names or Extensions ***
0077 Where ISO9660 requires file or directory names or extensions to be sorted, the
0078 usual ASCII collating sequence is used. That is, two different names or
0079 extensions are compared as follows:
0080    1. ASCII blanks (32) are added to the right end of the shorter name or
0081       extension, if necessary, to make it as long as the longer name or
0082       extension.
0083    2. The first (leftmost) position in which the names or extensions are not
0084       identical determines the order. The name or extension with the lower
0085       ASCII code in that position appears first in the sorted order.
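The two comparison steps above can be sketched in Python as follows. The function name `compare_names` is hypothetical; names are handled as byte strings so that the comparison uses the raw ASCII codes.

```python
import functools

def compare_names(a, b):
    # Step 1: pad the shorter name with ASCII blanks (32) on the right.
    width = max(len(a), len(b))
    a = a.ljust(width, b" ")
    b = b.ljust(width, b" ")
    # Step 2: the first differing position decides; bytes objects compare
    # position by position on their numeric codes.
    return (a > b) - (a < b)

names = [b"FOO_", b"FOO", b"BAR", b"FOO1"]
ordered = sorted(names, key=functools.cmp_to_key(compare_names))
```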
0086 *** 6. Multiple-Byte Values ***
0087 A 16-bit numeric value (usually called a word) may be represented on a CD-ROM
0088 in any of three ways:
0089   Little Endian Word:
0090       The value occupies two consecutive bytes, with the less significant byte
0091       first.
0092   Big Endian Word:
0093       The value occupies two consecutive bytes, with the more significant byte
0094       first.
0095   Both Endian Word:
0096       The value occupies FOUR consecutive bytes; the first and second bytes
0097       contain the value expressed as a little endian word, and the third and
0098       fourth bytes contain the same value expressed as a big endian word.
0099 A 32-bit numeric value (usually called a double word) may be represented on a
0100 CD-ROM in any of three ways:
0101   Little Endian Double Word:
0102       The value occupies four consecutive bytes, with the least significant
0103       byte first and the other bytes in order of increasing significance.
0104   Big Endian Double Word:
0105       The value occupies four consecutive bytes, with the most significant
0106       byte first and the other bytes in order of decreasing significance.
0107   Both Endian Double Word:
0108       The value occupies EIGHT consecutive bytes; the first four bytes contain
0109       the value expressed as a little endian double word, and the last four
0110       bytes contain the same value expressed as a big endian double word.
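As an illustration, the both endian forms can be decoded with Python's standard `struct` module. The helper names are hypothetical, and raising an error when the two halves disagree is a choice made for this sketch; a more tolerant reader might simply trust the little endian half.

```python
import struct

def both_endian_word(data, offset=0):
    # Two bytes little endian, then the same value in two bytes big endian.
    (little,) = struct.unpack_from("<H", data, offset)
    (big,) = struct.unpack_from(">H", data, offset + 2)
    if little != big:
        raise ValueError("little and big endian halves disagree")
    return little

def both_endian_dword(data, offset=0):
    # Four bytes little endian, then the same value in four bytes big endian.
    (little,) = struct.unpack_from("<I", data, offset)
    (big,) = struct.unpack_from(">I", data, offset + 4)
    if little != big:
        raise ValueError("little and big endian halves disagree")
    return little
```

For example, the sector size 2048 (hexadecimal 0800) stored as a both endian word is the four bytes 00 08 08 00.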
0111 *** 7. The First Sixteen Sectors are Empty ***
0112 The first sixteen sectors (sector numbers 0 to 15, inclusive) contain nothing
0113 but zeros. ISO9660 does not define the contents of these sectors, but for DOS
0114 they are apparently always written as zeros. They are apparently reserved for
0115 use by systems that can be booted from a CD-ROM.
0116 *** 8. The Volume Descriptors ***
0117 Sector 16 and a few of the following sectors contain a series of volume
0118 descriptors. There are several kinds of volume descriptor, but only two are
0119 normally used with DOS. Each volume descriptor occupies exactly one sector.
0120 The last volume descriptors in the series are one or more Volume Descriptor Set
0121 Terminators. The first seven bytes of a Volume Descriptor Set Terminator are
0122 255, 67, 68, 48, 48, 49 and 1, respectively. The other 2041 bytes are zeros.
0123 (The middle bytes are the ASCII codes for the characters CD001.)
0124 The only volume descriptor of real interest under DOS is the Primary Volume
0125 Descriptor. There must be at least one, and there is usually only one. However,
0126 some CD-ROM's have two or more identical Primary Volume Descriptors. The
0127 contents of a Primary Volume Descriptor are as follows:
0128      length
0129      in bytes  contents
0130      --------  ---------------------------------------------------------
0131         1      1
0132         6      67, 68, 48, 48, 49 and 1, respectively (same as Volume
0133                  Descriptor Set Terminator)
0134         1      0
0135        32      system identifier
0136        32      volume identifier
0137         8      zeros
0138         8      total number of sectors, as a both endian double word
0139        32      zeros
0140         4      1, as a both endian word [volume set size]
0141         4      1, as a both endian word [volume sequence number]
0142         4      2048 (the sector size), as a both endian word
0143         8      path table length in bytes, as a both endian double word
0144         4      number of first sector in first little endian path table,
0145                  as a little endian double word
0146         4      number of first sector in second little endian path table,
0147                  as a little endian double word, or zero if there is no
0148                  second little endian path table
0149         4      number of first sector in first big endian path table,
0150                  as a big endian double word
0151         4      number of first sector in second big endian path table,
0152                  as a big endian double word, or zero if there is no
0153                  second big endian path table
0154        34      root directory record, as described below
0155       128      volume set identifier
0156       128      publisher identifier
0157       128      data preparer identifier
0158       128      application identifier
0159        37      copyright file identifier
0160        37      abstract file identifier
0161        37      bibliographical file identifier
0162        17      date and time of volume creation
0163        17      date and time of most recent modification
0164        17      date and time when volume expires
0165        17      date and time when volume is effective
0166         1      1
0167         1      0
0168       512      reserved for application use (usually zeros)
0169       653      zeros
0170 The first 11 characters of the volume identifier are returned as the volume
0171 identifier by standard DOS system calls and utilities.
0172 Other identifiers are not used by DOS, and may be filled with ASCII blanks
0173 (32).
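Adding up the field lengths in the table gives fixed byte offsets within the sector (for example, the volume identifier starts at offset 40 and the root directory record at offset 156). The following sketch extracts the fields most relevant to DOS. The function name and dictionary keys are hypothetical, and for brevity it reads only the little endian half of each both endian field.

```python
import struct

SECTOR_SIZE = 2048

def parse_primary_volume_descriptor(sector):
    if len(sector) != SECTOR_SIZE:
        raise ValueError("a volume descriptor occupies exactly one sector")
    if sector[0] != 1 or sector[1:7] != b"CD001\x01":
        raise ValueError("not a Primary Volume Descriptor")
    volume_id = sector[40:72].decode("ascii", errors="replace")
    return {
        "system_id": sector[8:40].decode("ascii", errors="replace").rstrip(),
        "volume_id": volume_id.rstrip(),
        # DOS reports only the first 11 characters as the volume label.
        "dos_label": volume_id[:11].rstrip(),
        "total_sectors": struct.unpack_from("<I", sector, 80)[0],
        "sector_size": struct.unpack_from("<H", sector, 128)[0],
        "path_table_bytes": struct.unpack_from("<I", sector, 132)[0],
        "le_path_table_sector": struct.unpack_from("<I", sector, 140)[0],
        "be_path_table_sector": struct.unpack_from(">I", sector, 148)[0],
        "root_record": sector[156:190],  # 34-byte directory record
    }
```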
0174 Each date and time field is of the following form:
0175      length
0176      in bytes  contents
0177      --------  ---------------------------------------------------------
0178         4      year, as four ASCII digits
0179         2      month, as two ASCII digits, where
0180                  01=January, 02=February, etc.
0181         2      day of month, as two ASCII digits, in the range
0182                  from 01 to 31
0183         2      hour, as two ASCII digits, in the range from 00 to 23
0184         2      minute, as two ASCII digits, in the range from 00 to 59
0185         2      second, as two ASCII digits, in the range from 00 to 59
0186         2      hundredths of a second, as two ASCII digits, in the range
0187                  from 00 to 99
0188         1      offset from Greenwich Mean Time, in 15-minute intervals,
0189                  as a twos complement signed number, positive for time
0190                  zones east of Greenwich, and negative for time zones
0191                  west of Greenwich
0192 If the date and time are not specified, the first 16 bytes are all ASCII zeros
0193 (48), and the last byte is zero.
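A 17-byte date and time field can be decoded as in the sketch below. The function name is hypothetical. Returning `None` for an unspecified field is a choice made here, and Python's `datetime` raises `ValueError` for digits or offsets that do not form a valid date, time or time zone.

```python
import datetime

def parse_volume_datetime(field):
    if len(field) != 17:
        raise ValueError("date and time field must be 17 bytes")
    if field == b"0" * 16 + b"\x00":
        return None  # date and time not specified
    text = field[:16].decode("ascii")
    year, month, day = int(text[0:4]), int(text[4:6]), int(text[6:8])
    hour, minute, second = int(text[8:10]), int(text[10:12]), int(text[12:14])
    hundredths = int(text[14:16])
    # The last byte is a twos complement count of 15-minute intervals.
    quarter_hours = field[16] - 256 if field[16] > 127 else field[16]
    zone = datetime.timezone(datetime.timedelta(minutes=15 * quarter_hours))
    return datetime.datetime(year, month, day, hour, minute, second,
                             hundredths * 10000, tzinfo=zone)
```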
0194 Other kinds of Volume Descriptors (which are normally ignored by DOS) have the
0195 following format:
0196      length
0197      in bytes  contents
0198      --------  ---------------------------------------------------------
0199         1      neither 1 nor 255
0200         6      67, 68, 48, 48, 49 and 1, respectively (same as Volume
0201                  Descriptor Set Terminator)
0202       2041     other things
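Taken together, the three descriptor formats suggest a simple scan: start at sector 16, look at the first byte of each descriptor, and stop at the first Primary Volume Descriptor or Volume Descriptor Set Terminator. The sketch below assumes the whole image is held in memory as a byte string; the function names are hypothetical.

```python
SECTOR_SIZE = 2048

def classify_descriptor(sector):
    # Bytes 1 to 6 are the same in every kind of volume descriptor.
    if sector[1:7] != b"CD001\x01":
        raise ValueError("not a volume descriptor")
    if sector[0] == 1:
        return "primary"
    if sector[0] == 255:
        return "terminator"
    return "other"

def find_primary_descriptor(image_bytes):
    # Returns the sector number of the first Primary Volume Descriptor,
    # or None if a terminator (or the end of the image) comes first.
    sector_number = 16
    while (sector_number + 1) * SECTOR_SIZE <= len(image_bytes):
        start = sector_number * SECTOR_SIZE
        kind = classify_descriptor(image_bytes[start:start + SECTOR_SIZE])
        if kind == "primary":
            return sector_number
        if kind == "terminator":
            break
        sector_number += 1
    return None
```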
0203 *** 9. Path Tables ***
0204 The path tables normally come right after the volume descriptors. However,
0205 ISO9660 merely requires that each path table begin in the sector specified by
0206 the Primary Volume Descriptor.
0207 The path tables are actually redundant, since all of the information contained
0208 in them is also stored elsewhere on the CD-ROM. However, their use can make
0209 directory searches much faster.
0210 There are two kinds of path table -- a little endian path table, in which
0211 multiple-byte values are stored in little endian order, and a big endian path
0212 table, in which multiple-byte values are stored in big endian order. The two
0213 kinds of path tables are identical in every other way.
0214 A path table contains one record for each directory on the CD-ROM (including
0215 the root directory). The format of a record is as follows:
0216      length
0217      in bytes  contents
0218      --------  ---------------------------------------------------------
0219         1      N, the name length (or 1 for the root directory)
0220         1      0 [number of sectors in extended attribute record]
0221         4      number of the first sector in the directory, as a
0222                  double word
0223         2      number of record for parent directory (or 1 for the root
0224                  directory), as a word; the first record is number 1,
0225                  the second record is number 2, etc.
0226         N      name (or 0 for the root directory)
0227       0 or 1   padding byte: if N is odd, this field contains a zero; if
0228                  N is even, this field is omitted
0229 According to ISO9660, a directory name consists of at least one and not more
0230 than 31 capital letters, digits and underscores. For DOS the upper limit is
0231 eight characters.
0232 A path table occupies as many consecutive sectors as may be required to hold
0233 all its records. The first record always begins in the first byte of the first
0234 sector. Except for the single byte described above, no padding is used between
0235 records; hence the last record in a sector is usually continued in the next
0236 following sector. The unused part of the last sector is filled with zeros.
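Since records are packed without regard to sector boundaries, a path table is easiest to parse after reading all of its sectors into one buffer. The sketch below assumes that has been done and that `length` is the path table length from the Primary Volume Descriptor; the function name and tuple layout are hypothetical.

```python
import struct

def parse_path_table(data, length, little_endian=True):
    # Returns a list of (name, first_sector, parent_number) tuples.
    # Record numbers are positions in this list plus one.
    order = "<" if little_endian else ">"
    records = []
    pos = 0
    while pos < length:
        n = data[pos]
        if n == 0:
            break  # zero fill; should not occur before 'length' is reached
        first_sector, parent = struct.unpack_from(order + "IH", data, pos + 2)
        name = data[pos + 8:pos + 8 + n]
        records.append((name, first_sector, parent))
        pos += 8 + n + (n % 2)  # a padding byte follows an odd-length name
    return records
```

The same function reads a big endian path table when called with `little_endian=False`.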
0237 The records in a path table are arranged in a precisely specified order. For
0238 this purpose, each directory has an associated number called its level. The
0239 level of the root directory is 1. The level of each other directory is one
0240 greater than the level of its parent. As noted above, ISO9660 does not permit
0241 levels greater than 8.
0242 The relative positions of any two records are determined as follows:
0243    1. If the levels are different, the directory with the lower level appears
0244       first. In particular, this implies that the root directory is always
0245       represented by the first record in the table, because it is the only
0246       directory with level 1.
0247    2. If the levels are identical, but the directories have different parents,
0248       then the directories are in the same relative positions as their parents.
0249    3. Directories with the same level and the same parent are arranged in the
0250       order obtained by sorting on their names, as described in Section 5.
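The three ordering rules amount to a breadth-first walk of the directory tree in which the children of each directory are visited in sorted order. The following sketch produces record order and parent numbers from a nested dictionary; the function name and the dictionary representation are assumptions of this example. Plain string sorting matches Section 5 here only because every permitted character has an ASCII code greater than that of a blank.

```python
from collections import deque

def path_table_order(tree):
    # tree maps each subdirectory name to a nested dict of its own
    # subdirectories. Returns (name, parent_number) pairs in path table
    # order; the root is record 1 and is its own parent.
    records = [("", 1)]
    queue = deque([(tree, 1)])
    while queue:
        children, parent_number = queue.popleft()
        for name in sorted(children):
            records.append((name, parent_number))
            # The record just appended has number len(records).
            queue.append((children[name], len(records)))
    return records
```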
0251 *** 10. Directories ***
0252 A directory consists of a series of directory records in one or more
0253 consecutive sectors. However, unlike path records, directory records may not
0254 straddle sector boundaries. There may be unused space at the end of each
0255 sector, which is filled with zeros.
0256 Each directory record represents a file or directory. Its format is as follows:
0257      length
0258      in bytes  contents
0259      --------  ---------------------------------------------------------
0260         1      R, the number of bytes in the record (which must be even)
0261         1      0 [number of sectors in extended attribute record]
0262         8      number of the first sector of file data or directory
0263                  (zero for an empty file), as a both endian double word
0264         8      number of bytes of file data or length of directory,
0265                  excluding the extended attribute record,
0266                  as a both endian double word
0267         1      number of years since 1900
0268         1      month, where 1=January, 2=February, etc.
0269         1      day of month, in the range from 1 to 31
0270         1      hour, in the range from 0 to 23
0271         1      minute, in the range from 0 to 59
0272         1      second, in the range from 0 to 59
0273                  (for DOS this is always an even number)
0274         1      offset from Greenwich Mean Time, in 15-minute intervals,
0275                  as a twos complement signed number, positive for time
0276                  zones east of Greenwich, and negative for time zones
0277                  west of Greenwich (DOS ignores this field)
0278         1      flags, with bits as follows:
0279                  bit     value
0280                  ------  ------------------------------------------
0281                  0 (LS)  0 for a normal file, 1 for a hidden file
0282                  1       0 for a file, 1 for a directory
0283                  2       0 [1 for an associated file]
0284                  3       0 [1 for record format specified]
0285                  4       0 [1 for permissions specified]
0286                  5       0
0287                  6       0
0288                  7 (MS)  0 [1 if not the final record for the file]
0289         1      0 [file unit size for an interleaved file]
0290         1      0 [interleave gap size for an interleaved file]
0291         4      1, as a both endian word [volume sequence number]
0292         1      N, the identifier length
0293         N      identifier
0294         P      padding byte: if N is even, P = 1 and this field contains
0295                  a zero; if N is odd, P = 0 and this field is omitted
0296     R-33-N-P   unspecified field for system use; must contain an even
0297                  number of bytes
0298 The length of a directory includes the unused space, if any, at the ends of
0299 sectors. Hence it is always an exact multiple of 2048 (the sector size). Since
0300 every directory, even a nominally empty one, contains at least two records, the
0301 length of a directory is never zero.
0302 All fields in the first record (sometimes called the "." record) refer to the
0303 directory itself, except that the identifier length is 1, and the identifier is
0304 zero. The root directory record in the Primary Volume Descriptor also has this
0305 format.
0306 All fields in the second record (sometimes called the ".." record) refer to the
0307 parent directory, except that the identifier length is 1, and the identifier is
0308 1. The second record in the root directory refers to the root directory.
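The record layout above gives fixed offsets for everything up to the identifier (the flags byte is at offset 25 and the identifier length at offset 32). The sketch below decodes one record and walks a whole directory, skipping the zero fill at the end of each sector. The function names and dictionary keys are hypothetical, and only the little endian halves of the both endian fields are read.

```python
import struct

SECTOR_SIZE = 2048

def parse_directory_record(data, offset=0):
    length = data[offset]
    if length == 0:
        return None  # zero fill at the end of a sector
    years, month, day, hour, minute, second = data[offset + 18:offset + 24]
    flags = data[offset + 25]
    n = data[offset + 32]
    return {
        "length": length,
        "first_sector": struct.unpack_from("<I", data, offset + 2)[0],
        "size": struct.unpack_from("<I", data, offset + 10)[0],
        "timestamp": (1900 + years, month, day, hour, minute, second),
        "hidden": bool(flags & 1),
        "is_directory": bool(flags & 2),
        "identifier": data[offset + 33:offset + 33 + n],
    }

def iter_directory(data):
    # data holds all sectors of one directory, concatenated.
    pos = 0
    while pos < len(data):
        record = parse_directory_record(data, pos)
        if record is None:
            # Records never straddle sectors; jump to the next sector.
            pos = (pos // SECTOR_SIZE + 1) * SECTOR_SIZE
        else:
            yield record
            pos += record["length"]
```

In a well-formed directory, the first two records yielded have the identifiers `b"\x00"` and `b"\x01"`, corresponding to the "." and ".." records.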
0309 The identifier for a subdirectory is its name. The identifier for a file
0310 consists of the following fields, in the order given:
0311    1. The name, consisting of the ASCII codes for at least one and not more
0312       than eight capital letters, digits and underscores.
0313    2. If there is an extension, the ASCII code for a period (46). If there is
0314       no extension, this field is omitted.
0315    3. The extension, consisting of the ASCII codes for not more than three
0316       capital letters, digits and underscores. If there is no extension, this
0317       field is omitted.
0318    4. The ASCII code for a semicolon (59).
0319    5. The ASCII code for 1 (49). [On other systems, this is the version number,
0320       consisting of the ASCII codes for a sequence of digits representing a
0321       number between 1 and 32767, inclusive.]
0322 Some implementations for DOS omit (4) and (5), and some use punctuation marks
0323 other than underscores in file names and extensions.
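A file identifier can be split back into its parts as shown in this sketch. The function name is hypothetical, and treating a missing version as 1 is an assumption made here to accommodate the DOS implementations that omit the semicolon and version number.

```python
def split_file_identifier(identifier):
    # identifier is the raw identifier field, such as b"FOO.TXT;1".
    text = identifier.decode("ascii")
    text, _, version = text.partition(";")
    name, _, extension = text.partition(".")
    return name, extension, int(version) if version else 1
```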
0324 Directory records other than the first two are sorted as follows:
0325    1. Records are sorted by name, as described in Section 5.
0326    2. Every series of records with the same name is sorted by extension, as
0327       described in Section 5. For this purpose, a record without an extension
0328       is sorted as though its extension consisted of ASCII blanks (32).
0329    3. [On other systems, every series of records with the same name and
0330       extension is sorted in order of decreasing version number.]
0331    4. [On other systems, two records with the same name, extension and version
0332       number are permitted, if the first record is an associated file.]
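The first three sorting rules can be expressed as a single sort key, as in the sketch below. The function name is hypothetical. Padding to 30 bytes is a simplification that relies on the combined name and extension never exceeding 30 characters, and a missing version is treated as 1. Rule 4 (associated files) is not handled.

```python
def directory_sort_key(identifier):
    # identifier is the raw identifier field, such as b"FOO.TXT;1".
    text, _, version = identifier.partition(b";")
    name, _, extension = text.partition(b".")
    # Blank padding makes a shorter name or a missing extension sort first;
    # negating the version sorts higher versions before lower ones.
    return (name.ljust(30), extension.ljust(30), -int(version or b"1"))

identifiers = [b"FOO.TXT;1", b"FOO;1", b"BAR.C;1", b"FOO.C;1", b"FOO.C;2"]
in_order = sorted(identifiers, key=directory_sort_key)
```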
0333 [ISO9660 permits names containing more than eight characters and extensions
0334 containing more than three characters, as long as both of them together contain
0335 no more than 30 characters.]
0336 It is apparently permissible under ISO9660 to use two or more consecutive
0337 records to represent consecutive pieces of the same file. Bit 7 of the flags
0338 byte is set in every record except the last one. However, this technique seems
0339 pointless and is apparently not used. It is not supported by MSCDEX.
0340 Interleaving is another technique that is apparently seldom used. It is not
0341 supported by MSCDEX (version 2.10).
0342 *** 11. Arrangement of Directory and Data Sectors ***
0343 ISO9660 does not specify the order of directory or file sectors. It merely
0344 requires that the first sector of each directory or file be in the location
0345 specified by its directory record, and that the sectors for directories and
0346 non-interleaved files be consecutive.
0347 However, most implementations arrange the directories so each directory follows
0348 its parent, and the data sectors for the files in each directory lie
0349 immediately after the directory and immediately before the next following
0350 directory. This appears to be an efficient arrangement for most applications.
0351 Some implementations go one step further and order the directories in the same
0352 manner as the corresponding path table records.
0353 *** 12. Extended Attribute Records ***
0354 Extended attribute records contain file and directory information used by
0355 operating systems other than DOS, such as permissions and logical record
0356 lengths.
0357 A CD-ROM written for DOS normally does not contain any extended attribute
0358 records.
0359 When reading a CD-ROM containing extended attribute records, early versions of
0360 MSCDEX simply returned incorrect results. Later versions learned to skip over
0361 extended attribute records.
0362 Philip J. Erdelsky
0363 San Diego, California USA
0364 pje@acm.org
0365 http://www.alumni.caltech.edu/~pje/