RM-COBOL Index File Structure

This paper documents the internal structure of RM-COBOL index files.  
It is intended to provide background information to assist in the recovery of 
corrupt files and the resolution of problems. 

The information contained here is provided on an 'as-is' basis.  
It is certainly incomplete, and I can accept no responsibility for problems 
arising from its use.

The file structure has been checked against data files from RM-COBOL for DOS and UNIX, 
versions 3, 4 and 5.  It has not been tested on later versions.


File structure.

RM-COBOL index files each have a constant block length.  
The block length is determined when the file is created, and is not necessarily 
the size specified in the 'BLOCK CONTAINS' clause of the creation program.  
The file contains a mixture of index and data blocks.  
The first file block contains header information which can be used to extract 
data from the index and data areas.
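Because every block has the same length, a block can be located by simple arithmetic once the block size has been read from the header. The sketch below assumes that blocks are numbered from 0, with the header as block 0. The paper does not state the numbering, so check this against a known file. The helper names rm_block_offset and rm_block_count are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte offset of block `block_no`, ASSUMING (hypothetically) that block 0
 * is the header block and that every block is `block_size` bytes long. */
static long rm_block_offset(uint16_t block_no, uint16_t block_size)
{
    return (long)block_no * (long)block_size;
}

/* Number of whole blocks in a file of `file_size` bytes, or -1 if the
 * block size is 0 (header unread or corrupt). A non-zero remainder may
 * point to a truncated or damaged file. */
static long rm_block_count(long file_size, uint16_t block_size, long *remainder)
{
    if (block_size == 0)
        return -1;
    if (remainder != NULL)
        *remainder = file_size % block_size;
    return file_size / block_size;
}
```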

Block types.
The second byte in each block indicates the type of data held in the block.
Possible values are:

1	File Header
2	unknown
3	unknown
4	unknown
5	Index Block
6	Data Block
7	Current Data
8	Blank Record

If a file has multiple keys, the first byte of the index block (type 5) is set to the key number.  Primary key is 0; secondary keys are numbered from 1 to 254.
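Here is a small classifier for the block type byte and the index-block key number. It covers only the codes listed above and reports types 2 to 4 as unknown, as the table does. The paper's offsets are 1-based, so the "second byte" is blk[1] in C. The enum and function names are hypothetical.

```c
/* Block type codes from the table above (second byte of each block). */
enum rm_block_type {
    RM_FILE_HEADER  = 1,
    RM_INDEX_BLOCK  = 5,
    RM_DATA_BLOCK   = 6,
    RM_CURRENT_DATA = 7,
    RM_BLANK_RECORD = 8
};

/* Name of the block's type; the 1-based "second byte" is blk[1]. */
static const char *rm_block_type_name(const unsigned char *blk)
{
    switch (blk[1]) {
    case RM_FILE_HEADER:  return "file header";
    case RM_INDEX_BLOCK:  return "index block";
    case RM_DATA_BLOCK:   return "data block";
    case RM_CURRENT_DATA: return "current data";
    case RM_BLANK_RECORD: return "blank record";
    case 2: case 3: case 4:
        return "unknown";
    default:
        return "unlisted";
    }
}

/* Key number of an index block (0 = primary, 1-254 = secondary),
 * or -1 if the block is not an index block. */
static int rm_index_key_number(const unsigned char *blk)
{
    return blk[1] == RM_INDEX_BLOCK ? blk[0] : -1;
}
```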

File header structure.
(Offsets are 1-based: the first byte of the block is offset 1.)

Element                             Size in bytes   Offset in bytes

Signature (= RMKF)                  4               7
Minimum Record                      2               17
Maximum Record                      2               19
Data Compression Flag (2 = on)      1               21
Space Code                          1               22
Number Code                         1               23
Key Compression Flag (2 = on)       1               24
Key Number Code                     1               25
Block Size                          2               27
Block Increment                     2               29
Block Contains                      1               31
Allocation Increment                2               41
Index Blocks                        2               45
No. of Records                      2               53
Integrity Flag                      1               55
Key Offset (primary key)            2               257
Key Length (primary key)            2               259
Empty Blocks                        2               261
Alternate Key Offset (key n)        2               257 + (n * 36)
Alternate Key Length (key n)        2               259 + (n * 36)
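The header fields can be extracted with a few reads at fixed byte offsets. This sketch assumes 2-byte fields are big-endian. The paper does not say so, but the record length 00 28 (40 bytes) in the data record example later in this paper is consistent with that order. Only some of the fields are decoded, and the struct and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* 2-byte field at a 1-based offset, ASSUMED big-endian. */
static uint16_t rm_get16(const unsigned char *b, int offset)
{
    return (uint16_t)((b[offset - 1] << 8) | b[offset]);
}

struct rm_header {
    uint16_t      min_record, max_record;
    int           data_compression;   /* byte 21 == 2 */
    int           key_compression;    /* byte 24 == 2 */
    uint16_t      block_size, block_increment;
    uint16_t      index_blocks, record_count, empty_blocks;
    unsigned char integrity_flag;
};

/* Decode selected header fields; `h` must hold at least 262 bytes.
 * Returns 0, or -1 if the "RMKF" signature (offset 7) is missing. */
static int rm_read_header(const unsigned char *h, struct rm_header *out)
{
    if (memcmp(h + 7 - 1, "RMKF", 4) != 0)
        return -1;
    out->min_record       = rm_get16(h, 17);
    out->max_record       = rm_get16(h, 19);
    out->data_compression = h[21 - 1] == 2;
    out->key_compression  = h[24 - 1] == 2;
    out->block_size       = rm_get16(h, 27);
    out->block_increment  = rm_get16(h, 29);
    out->index_blocks     = rm_get16(h, 45);
    out->record_count     = rm_get16(h, 53);
    out->integrity_flag   = h[55 - 1];
    out->empty_blocks     = rm_get16(h, 261);
    return 0;
}

/* Key offset and length fields for key n (0 = primary), at 1-based
 * offsets 257 + 36n and 259 + 36n; `h` needs at least 260 + 36n bytes. */
static void rm_key_desc(const unsigned char *h, int n,
                        uint16_t *key_offset, uint16_t *key_length)
{
    *key_offset = rm_get16(h, 257 + 36 * n);
    *key_length = rm_get16(h, 259 + 36 * n);
}
```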




Key Block Structure (Block type 5)

Element                             Size in bytes   Offset in bytes

Key number (Primary = 0)            1               1
Block ID (= 5)                      1               2
Next Index Block No                 2               5
Characters following                2               7
Index records start                 variable        13
(the following entry is repeated until the block is filled)
Data Block No                       2               n
Data Record Sequence                1               n+2
No of repeated key characters       1               n+3
No of characters following          1               n+4
Key characters                      no_chars        n+5
Filler (null)                       2               n+5+no_chars

Here no_chars is the value of "No of characters following", so each entry occupies 7 + no_chars bytes.
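The index entries can be walked with a loop over the layout above. This is a sketch that depends on three assumptions not stated in the paper. First, block numbers are read big-endian. Second, "No of repeated key characters" is taken to mean the number of leading characters shared with the previous entry's key (front compression). Third, the caller supplies the byte limit, because the paper does not say how the end of the used area is marked. RM_MAX_KEY and the other names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RM_MAX_KEY 256   /* hypothetical upper bound on key length */

struct rm_index_entry {
    uint16_t      data_block;        /* "Data Block No", offset n       */
    unsigned char sequence;          /* "Data Record Sequence", n+2     */
    size_t        key_len;
    unsigned char key[RM_MAX_KEY];   /* expanded key, not NUL-terminated */
};

/* Decode entries from offset 13 (index 12), stopping at `limit` bytes,
 * at `max_out` entries, or at the first entry that would overrun or is
 * inconsistent. Returns the number of entries decoded. */
static int rm_read_index(const unsigned char *blk, size_t limit,
                         struct rm_index_entry *out, int max_out)
{
    size_t pos = 13 - 1;
    size_t prev_len = 0;
    int count = 0;

    while (count < max_out && pos + 5 <= limit) {
        struct rm_index_entry *e = &out[count];
        size_t repeated  = blk[pos + 3];   /* shared with previous key (assumed) */
        size_t following = blk[pos + 4];   /* new key characters */

        if (pos + 7 + following > limit || repeated > prev_len ||
            repeated + following > RM_MAX_KEY)
            break;
        e->data_block = (uint16_t)((blk[pos] << 8) | blk[pos + 1]);
        e->sequence   = blk[pos + 2];
        if (repeated > 0)                  /* only possible when count > 0 */
            memcpy(e->key, out[count - 1].key, repeated);
        memcpy(e->key + repeated, blk + pos + 5, following);
        e->key_len = repeated + following;
        prev_len   = e->key_len;
        count++;
        pos += 7 + following;              /* 5 fixed bytes + key + 2-byte filler */
    }
    return count;
}
```

Under the front-compression reading, an entry with 3 repeated characters and the new characters "ES" that follows "ALPHA" expands to "ALPES".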

Data block structure (Block Type 6)

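Element                             Size in bytes   Offset in bytes

Key number (Primary = 0)            1               1
Block ID (= 6)                      1               2
Characters following                2               7
Data records start                  variable        9
(the following record layout is repeated until the block is filled)
Filler (null)                       1               n
Data Record Sequence                1               n+1
No of bytes in record               2               n+3*(no_keys-1)+2
Record data                         variable        n+3*(no_keys-1)+4

Here no_keys is the number of keys in the file (primary plus alternates). For a file with only a primary key, the record length immediately follows the sequence byte, at n+2.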
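Given the position of a record header within a data block, the fields above can be located as follows. Two assumptions are not stated in the paper: the length is big-endian (the example record later in this paper fits that), and the next record begins immediately after the record data. The first record starts at offset 9, which is index 8 in C. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct rm_data_record {
    unsigned char sequence;   /* "Data Record Sequence", offset n+1      */
    uint16_t      length;     /* "No of bytes in record" (compressed)    */
    size_t        data_index; /* 0-based index of the first data byte   */
    size_t        next_index; /* where the next record would start (assumed) */
};

/* Locate the record whose header starts at 0-based index `pos` in a data
 * block of a file with `no_keys` keys. Returns 0 on success, or -1 if
 * no_keys is 0 or the record would run past `limit`. */
static int rm_data_record_at(const unsigned char *blk, size_t pos,
                             unsigned no_keys, size_t limit,
                             struct rm_data_record *r)
{
    size_t len_at;

    if (no_keys == 0)
        return -1;
    len_at = pos + 3 * (size_t)(no_keys - 1) + 2;   /* n+3*(no_keys-1)+2 */
    if (len_at + 2 > limit)
        return -1;
    r->sequence   = blk[pos + 1];
    r->length     = (uint16_t)((blk[len_at] << 8) | blk[len_at + 1]);
    r->data_index = len_at + 2;                      /* n+3*(no_keys-1)+4 */
    r->next_index = r->data_index + r->length;
    return r->next_index <= limit ? 0 : -1;
}
```

With the header bytes 00 0c 00 28 of the example record at index 8 and a single key, this gives sequence 0x0c, a length of 40, and data starting at index 12.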

The record data contains a mixture of compression bytes, length bytes and data bytes.
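Data bytes are preceded by a length byte giving the number of data characters that follow. Compression bytes have a value greater than 127 (0x7f), so a length byte is never greater than 127. A compression byte is followed by another compression byte, a length byte, or the end of the record. The exception is the repeat code described below, which is followed by the single data character to be repeated.
The logic used to expand compression bytes is, in C: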

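/* out_char (compressed input) must be unsigned char, since a signed char
   is never > 127; fill_num must be a signed int. */
if (out_char[i] > 127)        /* 128-191: run of spaces */
{
    fill_char = ' ';
    fill_num = out_char[i] - 126;
}
if (out_char[i] > 191)        /* 192-207: run of '0' characters */
{
    fill_char = '0';
    fill_num = out_char[i] - 190;
}
if (out_char[i] > 207)        /* 208-231: run of nulls (208-210 give fill_num <= 0) */
{
    fill_char = '\0';
    fill_num = out_char[i] - 210;
}
if (out_char[i] > 231)        /* 232-255: repeat the next byte (see Correction) */
{
    fill_char = out_char[i + 1];
    fill_num = out_char[i] - 230;
    i++;                      /* skip the repeated data byte; the caller still
                                 steps past the compression byte itself */
}
for (z = 0; z < fill_num; ++z)
{
    tran_char[j] = fill_char;
    j++;
}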

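A typical data record might appear in hex as:

Hex:

00 0c 00 28 c0 0b 31 52 44 45 50 54 2d 4e 41 4d 45 87 c0 13 36 30 
33 30 44 65 70 61 72 74 6d 65 6e 74 20 4e 61 6d 65 83 01 58 c0 df 

Char:
X     (  1RDEPT-NAME6030Department Name  X

Translation:
001RDEPT-NAME         006030Department Name     X00..(13 nulls)..

For a file with only a primary key, the first four bytes are the record header: the null filler, sequence 0x0c, and record length 0x0028. Therefore 40 bytes of compressed data follow, which expand as:

c0               ->  "00"                   (192 - 190 = 2)
0b + 11 bytes    ->  "1RDEPT-NAME"
87               ->  9 spaces               (135 - 126 = 9)
c0               ->  "00"
13 + 19 bytes    ->  "6030Department Name"
83               ->  5 spaces               (131 - 126 = 5)
01 + 1 byte      ->  "X"
c0               ->  "00"
df               ->  13 nulls               (223 - 210 = 13)

The translated record therefore has 2 + 11 + 9 + 2 + 19 + 5 + 1 + 2 + 13 = 64 characters.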

A compression character value of 232 (0xe8) or greater indicates that the next 
single data character is repeated n times, 
where n equals the compression character - 230. 
No length byte is used for the single data character.
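The rules above can be combined into a complete record expander. This is a sketch, not RM-COBOL's own code. It follows the documented ranges and formulas, treats codes 208-210 (which the documented formula maps to a count of zero or less) as producing no output, and makes the repeat-code constant a macro, RM_REPEAT_BIAS. The macro is used because the Correction at the end of this paper proposes 229 in place of a value in this rule, assumed here to be the subtracted 230. The function and macro names are hypothetical.

```c
#include <stddef.h>

/* Value subtracted from repeat codes (232-255). The text gives 230; the
 * Correction proposes 229 (ASSUMED to refer to this constant).
 * Compile with -DRM_REPEAT_BIAS=229 to try the corrected value. */
#ifndef RM_REPEAT_BIAS
#define RM_REPEAT_BIAS 230
#endif

/* Expand one compressed record of `in_len` bytes into `out`. Returns the
 * expanded length, or -1 if the input is truncated or `out` is too small. */
static long rm_expand(const unsigned char *in, size_t in_len,
                      unsigned char *out, size_t out_cap)
{
    size_t i = 0, j = 0;

    while (i < in_len) {
        unsigned c = in[i++];
        unsigned char fill;
        int n;

        if (c <= 127) {                    /* length byte: copy c data bytes */
            if (i + c > in_len || j + c > out_cap)
                return -1;
            for (unsigned k = 0; k < c; k++)
                out[j++] = in[i++];
            continue;
        }
        if (c >= 232) {                    /* repeat the next byte */
            if (i >= in_len)
                return -1;
            fill = in[i++];
            n = (int)c - RM_REPEAT_BIAS;
        } else if (c >= 208) {             /* run of nulls */
            fill = '\0';
            n = (int)c - 210;              /* 208-210 give n <= 0: no output */
        } else if (c >= 192) {             /* run of '0' characters */
            fill = '0';
            n = (int)c - 190;
        } else {                           /* 128-191: run of spaces */
            fill = ' ';
            n = (int)c - 126;
        }
        for (; n > 0; n--) {
            if (j >= out_cap)
                return -1;
            out[j++] = fill;
        }
    }
    return (long)j;
}
```

Fed the 40 compressed bytes of the example record (everything after 00 0c 00 28), rm_expand returns 64 and reproduces the translation shown with the example. With the default constant, the pair e8 41 expands to "AA".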


Scott Williamson
syw@cdvdc.demon.co.uk

SYW	
May 1996



Correction
==========

In the description above, the statement "A compression character value of 232 (0xe8) or greater indicates that the next single data character is repeated n times, where n equals the compression character - 230." appears both in the text and in the pseudo-C algorithm.

It is believed that this is incorrect; a value of 229 should be used instead.
