
The file format of chm files (chm format)

Author: unknown  Source: collected from the Internet and reposted  Published: 2005-6-3 0:56:07


Microsoft's HTML Help (.chm) format

Preface

This is documentation on the .chm format used by Microsoft HTML Help. This format has been reverse engineered in the past, but as far as I know this is the first freely available documentation on it. One Usenet message indicates that these .chm files are actually IStorage files documented in the Microsoft Platform SDK. However, I have not been able to locate such documentation.

Note

The word "section" is badly overloaded in this document. Sorry about that.

All numbers are in hexadecimal unless otherwise indicated in the text. Except in tabular listings, this will be indicated by $ or 0x as appropriate. All values within the file are Intel byte order (little endian) unless indicated otherwise.
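As a small illustration of the byte-order convention, the following Python sketch decodes four bytes the way a DWORD field would be read from the file. The sample bytes are made up for the example; only the little-endian rule comes from the text above.

```python
import struct

# Four bytes as they might appear on disk (sample data, not from a real file).
raw = bytes([0x09, 0x04, 0x00, 0x00])

# "<" selects little endian, "I" is an unsigned 32-bit DWORD.
(value,) = struct.unpack("<I", raw)

print(hex(value))  # prints 0x409
```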

The overall format of a .chm file

The .chm file begins with a short ($38 byte) initial header. This is followed by the header section table and the offset to the content. Collectively, this is the "header".

The header is followed by the header sections. There are two header sections. One header section is the file directory, the other contains the file length and some unknown data. Immediately following the header sections is the content.

The header starts with the initial header, which has the following format:

0000: char[4]  'ITSF'
0004: DWORD    3 (Version number)
0008: DWORD    Total header length, including header section table and
               following data.
000C: DWORD    1 (unknown)
0010: DWORD    A timestamp.
               Considered as a big-endian DWORD, it appears to contain
               seconds (MSB) and fractional seconds (second byte).
               The third and fourth bytes may contain even more fractional
               bits.  The 4 least significant bits in the last byte are
               constant.
0014: DWORD    Windows Language ID.  The two I've seen:
               $0409 = LANG_ENGLISH/SUBLANG_ENGLISH_US
               $0407 = LANG_GERMAN/SUBLANG_GERMAN
0018: GUID     {7C01FD10-7BAA-11D0-9E0C-00A0C922E6EC}
0028: GUID     {7C01FD11-7BAA-11D0-9E0C-00A0C922E6EC}

Note: a GUID is $10 bytes, arranged as 1 DWORD, 2 WORDs, and 8 BYTEs.
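The initial header layout can be sketched in Python with the standard struct module. This is a minimal sketch, not a reference implementation: the function name and the dictionary keys are made up for illustration, and the timestamp is returned as a raw little-endian DWORD because its meaning is only guessed at above.

```python
import struct
import uuid

# char[4], five DWORDs, two 16-byte GUIDs: $38 bytes in total, little endian.
INITIAL_HEADER = struct.Struct("<4s5I16s16s")

def parse_initial_header(data: bytes) -> dict:
    """Unpack the initial header from the first $38 bytes of a .chm file."""
    if len(data) < INITIAL_HEADER.size:
        raise ValueError("need at least 0x38 bytes")
    (signature, version, header_length, unknown, timestamp,
     language_id, guid1, guid2) = INITIAL_HEADER.unpack_from(data, 0)
    if signature != b"ITSF":
        raise ValueError("missing ITSF signature")
    return {
        "version": version,
        "header_length": header_length,
        "unknown": unknown,
        "timestamp": timestamp,  # left uninterpreted
        "language_id": language_id,
        # bytes_le matches the 1 DWORD, 2 WORDs, 8 BYTEs arrangement.
        "guid1": uuid.UUID(bytes_le=guid1),
        "guid2": uuid.UUID(bytes_le=guid2),
    }
```

A caller would typically read the first $38 bytes of the file and pass them in; everything after that point is located through the header section table.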

It is followed by the header section table, which has 2 entries; each entry is $10 bytes long and has this format:

0000: QWORD    Offset of section from beginning of file
0008: QWORD    Length of section

Following the header section table are 8 bytes of additional header data. In Version 2 files, this data is absent and the content section starts immediately after the directory.

0000: QWORD    Offset within file of content section 0
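Reading the header section table and the content offset might look like the sketch below. The function name is hypothetical. The Version 2 branch is only one reading of "immediately after the directory" (the end of header section 1) and has not been checked against real Version 2 files.

```python
import struct

INITIAL_HEADER_SIZE = 0x38
SECTION_ENTRY = struct.Struct("<QQ")  # offset from start of file, length

def parse_section_table(data: bytes, version: int = 3):
    """Return ([(offset, length), (offset, length)], content_offset)."""
    pos = INITIAL_HEADER_SIZE
    sections = []
    for _ in range(2):  # the table always has two entries
        sections.append(SECTION_ENTRY.unpack_from(data, pos))
        pos += SECTION_ENTRY.size
    if version >= 3:
        # The 8 bytes after the table hold the content offset.
        (content_offset,) = struct.unpack_from("<Q", data, pos)
    else:
        # Assumption: content begins where header section 1 ends.
        content_offset = sections[1][0] + sections[1][1]
    return sections, content_offset
```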

The Header Sections

Header Section 0

This section contains the total size of the file, and not much else:

0000: DWORD    $01FE (unknown)
0004: DWORD    0 (unknown)
0008: QWORD    File Size
0010: DWORD    0 (unknown)
0014: DWORD    0 (unknown)
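Since the only known field here is the file size, one plausible use is a sanity check against the actual length of the file. The sketch below assumes the whole file is in memory and that the section's offset was taken from the header section table; both function names are made up for illustration.

```python
import struct

# Two DWORDs, one QWORD, two DWORDs: $18 bytes, little endian.
HEADER_SECTION_0 = struct.Struct("<IIQII")

def read_file_size(data: bytes, offset: int) -> int:
    """Return the file size stored in header section 0 at the given offset."""
    _unk0, _unk4, file_size, _unk10, _unk14 = HEADER_SECTION_0.unpack_from(data, offset)
    return file_size

def file_size_matches(data: bytes, offset: int) -> bool:
    """True if the stored size equals the number of bytes actually present."""
    return read_file_size(data, offset) == len(data)
```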

Header Section 1: The Directory Listing

This is the central part of the .chm file: a directory of the files and information it contains.

Directory header

The directory starts with a header; its format is as follows:

0000: char[4]  'ITSP'
0004: DWORD    Version number 1
0008: DWORD    Length of the directory header
000C: DWORD    $0a (unknown)
0010: DWORD    $1000    Directory chunk size
0014: DWORD    "Density" of quickref section, usually 2.
0018: DWORD    Depth of the index tree:
               1 if there is no index, 2 if there is one level of PMGI
               chunks.
001C: DWORD    Chunk number of root index chunk, -1 if there is none
               (though at least one file has 0 despite there being no
               index chunk, probably a bug).
0020: DWORD    Chunk number of first PMGL (listing) chunk
0024: DWORD    Chunk number of last PMGL (listing) chunk
0028: DWORD    -1 (unknown)
002C: DWORD    Number of directory chunks (total)
0030: DWORD    Windows language ID
0034: GUID     {5D02926A-212E-11D0-9DF9-00A0C922E6EC}
0044: DWORD    $54 (This is the length again)
0048: DWORD    -1 (unknown)
004C: DWORD    -1 (unknown)
0050: DWORD    -1 (unknown)
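The directory header maps onto a single struct format, sketched below in Python. The field names are invented for readability. Fields documented as possibly holding -1 are read as signed. The chunk_offset helper assumes that chunks are numbered from 0 and stored back to back right after the header, which the text does not state outright.

```python
import struct
from collections import namedtuple

# $54 bytes: signature, 6 DWORDs, 4 signed DWORDs, 2 DWORDs,
# GUID, 1 DWORD, 3 signed DWORDs.
DIRECTORY_HEADER = struct.Struct("<4s6I4i2I16sI3i")

DirectoryHeader = namedtuple("DirectoryHeader", [
    "signature", "version", "header_length", "unknown_0c", "chunk_size",
    "quickref_density", "index_depth", "root_index_chunk",
    "first_listing_chunk", "last_listing_chunk", "unknown_28",
    "chunk_count", "language_id", "guid", "header_length_again",
    "unknown_48", "unknown_4c", "unknown_50",
])

def parse_directory_header(data: bytes, offset: int = 0) -> DirectoryHeader:
    """Unpack the ITSP directory header found at the given offset."""
    header = DirectoryHeader._make(DIRECTORY_HEADER.unpack_from(data, offset))
    if header.signature != b"ITSP":
        raise ValueError("missing ITSP signature")
    return header

def chunk_offset(directory_offset: int, header: DirectoryHeader, chunk_number: int) -> int:
    """File offset of a directory chunk (assumes 0-based, contiguous chunks)."""
    return directory_offset + header.header_length + chunk_number * header.chunk_size
```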

The Listing Chunks

The header is directly followed by the directory chunks. There are two types of directory chunks -- index chunks, and listing chunks. The index chunk will be omitted if there is only one listing chunk. A listing chunk has the following format:

0000: char[4]  'PMGL'
0004: DWORD    Length of free space and/or quickref area at end of
               directory chunk 
0008: DWORD    Always 0. 
000C: DWORD    Chunk number of previous listing chunk when reading
               directory in sequence (-1 if this is the first listing chunk)
0010: DWORD    Chunk number of next listing chunk when reading
               directory in sequence (-1 if this is the last listing chunk)
0014: Directory listing entries (to quickref area)  Sorted by
      filename; the sort is case-insensitive.
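The listing chunk header and the chain of previous/next chunk numbers can be sketched as follows. The entries are returned as raw bytes and not decoded. Both function names are hypothetical, and `chunks` is assumed to be a list of chunk byte strings indexed by chunk number.

```python
import struct

# 'PMGL', free/quickref length, zero, previous chunk, next chunk: $14 bytes.
PMGL_HEADER = struct.Struct("<4sIIii")

def split_listing_chunk(chunk: bytes):
    """Return (previous_chunk, next_chunk, raw_entry_bytes) for a PMGL chunk."""
    signature, free_len, _zero, prev_chunk, next_chunk = PMGL_HEADER.unpack_from(chunk, 0)
    if signature != b"PMGL":
        raise ValueError("not a listing chunk")
    entries_end = len(chunk) - free_len  # entries stop where free space/quickref begins
    if entries_end < PMGL_HEADER.size:
        raise ValueError("free space length exceeds chunk size")
    return prev_chunk, next_chunk, chunk[PMGL_HEADER.size:entries_end]

def iter_listing_chunks(chunks, first_chunk: int):
    """Yield (chunk_number, raw_entry_bytes) in directory order."""
    seen = set()
    number = first_chunk
    while number != -1:  # -1 marks the end of the chain
        if number in seen:
            raise ValueError("cycle in listing chunk chain")
        seen.add(number)
        _prev, next_chunk, entries = split_listing_chunk(chunks[number])
        yield number, entries
        number = next_chunk
```

Starting from the "first PMGL chunk" number in the directory header and following the next pointers visits the listing in sorted order without touching any index chunks.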

The quickref area is written backwards from the end of the chunk. One quickref entry exists for every n e
