From vicoli_g%eca401.enea.it@axion.bt.co.uk Mon Sep 20 09:43:22 1993
Received: from ICNUCEVX.CNUCE.CNR.IT by zaphod.axion.bt.co.uk with SMTP (PP); Mon, 20 Sep 1993 09:41:41 +0100
Received: from DECNET-MAIL (VICOLI_G@ECA401) by icnucevx.cnuce.cnr.it (PMDF
 V4.2-13 #4369) id <01H362TJK1NK9EGMQP@icnucevx.cnuce.cnr.it>; Mon,
 20 Sep 1993 10:42:45 MET
Date: Mon, 20 Sep 1993 10:42:45 +0100 (MET)
From: VICOLI_G@ECA401.ENEA.IT
Subject: Re: Interesting proposal
To: duplain
Message-Id: <01H362TJKULE9EGMQP@icnucevx.cnuce.cnr.it>
X-Vms-To: CNUCE::IN%"duplain@btcs.bt.co.uk"
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; CHARSET=US-ASCII
Content-Transfer-Encoding: 7BIT
Status: OR

****************************************************************************
*									   *
*                 Chess Base File Format (Jun 17, 1993)                    *
*									   *
****************************************************************************

The ChessBase File Format, as of Jun 17, 1993
=============================================

ChessBase uses big-endian format, i.e. the first byte contains the
most significant part of the (multibyte) data.

1. ChessBase Index File (CBI)
=============================

A CBI file indexes the games in the CBF file. CBI data is stored as
longwords (4 bytes).

The first longword contains the number of stored games + 1.
The following longwords contain the positions of the games in the CBF file.
To get the true position, the number of the game + 1 must be subtracted
from the stored position.  The last entry contains the first free
position in the CBF file.

Example: The CBF file contains 3 games. Game 1 is 48 bytes long,
game 2 39 bytes and game 3 35 bytes.

The CBI file thus contains 5 longwords:

    Byte   Contents   Note
  -------------------------------------------
    0.. 3    4        3 games
    4.. 7    2        at offset   2 - 2 =   0
    8..11   51        at offset  51 - 3 =  48
   12..15   91        at offset  91 - 4 =  87
   16..19  127        at offset 127 - 5 = 122

Note: The CBI information can not be used to compute the size of a
game in the CBF file.
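
The offset arithmetic above can be sketched in C as follows. This is only
a minimal sketch: it assumes the whole CBI file has already been read into
a byte buffer, and the names be32, cbi_game_count and cbi_game_offset are
hypothetical helpers, not part of ChessBase.

```c
#include <stddef.h>
#include <stdint.h>

/* Read one big-endian longword (4 bytes) starting at p. */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* The first longword holds the number of stored games + 1. */
static uint32_t cbi_game_count(const unsigned char *cbi)
{
    return be32(cbi) - 1;
}

/* CBF offset of game n (1-based): stored position minus (n + 1).
 * For n = game count + 1 this yields the first free position. */
static uint32_t cbi_game_offset(const unsigned char *cbi, uint32_t n)
{
    return be32(cbi + 4 * (size_t)n) - (n + 1);
}
```

With the example table above, cbi_game_offset(cbi, 2) gives 48 and
cbi_game_offset(cbi, 4) gives the first free position, 122.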


2.  ChessBase File (CBF) 
========================

A CBF file contains all chess game scores, including comments and
variants.  The games should be accessed only through the CBI file.
Note: KnightStalker (German: Fritz) can only handle about 4000 games
per file.

A game score consists of:

  - Game header (14 bytes, encoded)
  - Player text  (max. 63 bytes, encoded)
  - Source text  (max. 63 bytes, encoded)
  - Moves (max. 250) with variants (encoded)
  - comments (plain text)
  - optional starting position (33 bytes)


2.1 Game Header

The game header contains all important game information.
Assuming that the game header is stored in the variable
'unsigned char header[14]', decoding can be performed with the
following short C program:

  int i; unsigned char k;
  for (i=13,k=101;i>=0;i--,k*=3) header[i] ^= k;
  header[11] ^= 14 + (header[4]&63) + (header[5]&63);

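Each byte of the header (starting from the last byte) is xored with a
key, with an initial value of 101. After each xor, the key is
multiplied by 3. Values are computed modulo 256, which is implicit in
the unsigned char type. Finally, byte 11 is xored once more with
14 + plen + slen, taken from the already decoded bytes 4 and 5.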
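
The same decoding step, wrapped in a function so that it can be compiled
on its own. This is a sketch of the fragment above, not additional
format knowledge; the name decode_header is hypothetical.

```c
/* Decode a 14-byte CBF game header in place. */
static void decode_header(unsigned char header[14])
{
    unsigned char k = 101;          /* initial key */
    int i;

    /* Xor from the last byte to the first; key *= 3 (mod 256) each step. */
    for (i = 13; i >= 0; i--) {
        header[i] ^= k;
        k = (unsigned char)(k * 3);
    }

    /* Byte 11 gets a second xor that depends on the decoded
     * player and source lengths (low 6 bits of bytes 4 and 5). */
    header[11] ^= (unsigned char)(14 + (header[4] & 63) + (header[5] & 63));
}
```

Note that the second xor depends on decoded data, so an encoder would
have to apply the two steps in the opposite order.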

The decoded header contains:

 byte   type    name
 ----------------------------------------------------
  0     sbyte   GY (signed byte)
                year of game = GY+1900.
                (e.g. -29 = $E3 => 1871)
                if GY = 127 then no year specified

  1     byte    result
                bit 0..1:
                  0 = 0-1
                  1 = draw
                  2 = 1-0
                  3 = variant result stored in bits 2..5
                bit 2..5:
                  variant 1..15, e.g. 1: +-, 4: =, 8: -+, 14: T
                  0 for no specification
                  (only valid if bits 0..1 = 3)
                bit 6..7:
                  unused (should be zero)

  2/3   word    mlen
                number of halfmoves (with variants) + 1.
                This information is used to compute the size in bytes of
                the fourth part (Moves) of the game = mlen-1 bytes

  4     byte    plen
                bit 0..5:
                  size of 'player' field (2nd part of game record)
                bit 6..7:
                  ECO1 bits 5/6 (only valid if no special board setup
                  specified (see byte 10))

  5     byte    slen
                bit 0..5:
                  size of 'source' field (3rd part of game record)
                bit 6..7:
                  ECO1 bits 7/8 (only valid if no special board setup
                  specified (see byte 10))

  6/7   word    clen
                size in bytes of comment field (5th part of game record)

  8     byte    whtelo
                ELO rating of white player
                if whtelo > 0 then ELO = 1600 + 5*whtelo
                else no specification

  9     byte    blkelo
                same for black player

 10     byte    flags1
                bit 0:
                  0 - ordinary starting position
                  1 - special board setup (6th part of game record)

                bit 6: 1 = game marked (bold in ChessBase game listing)
                bit 7: 1 = game deleted (grey in ChessBase game listing)

                ordinary starting position:
                  bit 1..5: ECO1 bits 0..4
                special board setup:
                  bit 1: player to move (0 - white, 1 - black)
                  bit 2: white 0-0-0 permitted
                  bit 3: white 0-0 permitted
                  bit 4: black 0-0-0 permitted
                  bit 5: black 0-0 permitted

 11     byte    flags2

                bit 6: version
                  Apparently this bit is used to determine the ChessBase
                  version, or perhaps whether ChessBase is running on a
                  PC or the Atari.

                ordinary starting position:
                  bit 0..5 : ECO2 bits 0..5
                  bit 7    : ECO2 bit  6
                special board setup:
                  bit 0..3 : e.p. file (1-8 -> A-H)
                
 12     byte    number of moves (not half-moves)

 13     byte    magic byte (see 2.7) 
                  checksum ?
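
The table above could be unpacked into a structure roughly as follows.
This is a sketch only: struct cbf_header, its field names, elo_from_byte
and parse_header are hypothetical, and the ECO, castling and e.p. bits are
left out here.

```c
struct cbf_header {
    int      year;            /* 0 if no year is specified (GY = 127) */
    int      result;          /* byte 1, bits 0..1 */
    int      variant_result;  /* byte 1, bits 2..5 */
    unsigned mlen;            /* halfmoves (with variants) + 1 */
    unsigned plen;            /* size of player field */
    unsigned slen;            /* size of source field */
    unsigned clen;            /* size of comment field */
    int      white_elo;       /* 0 if not specified */
    int      black_elo;       /* 0 if not specified */
    int      has_setup;       /* flags1 bit 0 */
    int      marked;          /* flags1 bit 6 */
    int      deleted;         /* flags1 bit 7 */
    unsigned full_moves;      /* byte 12 */
};

static int elo_from_byte(unsigned char b)
{
    return b ? 1600 + 5 * b : 0;
}

/* h must already be decoded (see the xor step earlier in 2.1). */
static void parse_header(const unsigned char h[14], struct cbf_header *out)
{
    int gy = h[0] < 128 ? h[0] : h[0] - 256;   /* signed byte */

    out->year           = (gy == 127) ? 0 : 1900 + gy;
    out->result         = h[1] & 3;
    out->variant_result = (h[1] >> 2) & 15;
    out->mlen           = (unsigned)h[2] * 256u + h[3];  /* big-endian word */
    out->plen           = h[4] & 63;
    out->slen           = h[5] & 63;
    out->clen           = (unsigned)h[6] * 256u + h[7];
    out->white_elo      = elo_from_byte(h[8]);
    out->black_elo      = elo_from_byte(h[9]);
    out->has_setup      = h[10] & 1;
    out->marked         = (h[10] >> 6) & 1;
    out->deleted        = (h[10] >> 7) & 1;
    out->full_moves     = h[12];
}
```

For instance, a first byte of $E3 yields the year 1871, and an ELO byte
of 100 yields a rating of 2100.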


2.2 Encyclopaedic Chess Opening (ECO)

If no special startup position is defined (bit 0 of flags1 = 0) an ECO
code may be given. The ECO code consists of one letter (A..F) and two
numbers (00-99/00-99), e.g. A00/00.
The letter is computed by (ECO1-1)/100, where 0 means
A, 1 - B, ... 5 - F. The first number is computed by (ECO1-1)%100.
The second number is ECO2.
If ECO1==0, no ECO code is specified.
ECO1 is a 9 bit word:
bits 0-4: byte 10, bits 1-5
bits 5/6: byte  4, bits 6/7
bits 7/8: byte  5, bits 6/7
ECO2 is a 7 bit word:
bits 0-5: byte 11, bits 0-5
bit    6: byte 11, bit    7
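
A possible way to assemble the ECO code from a decoded header, following
the bit layout given in the header table of section 2.1 (ECO2 bits 0..5
from byte 11 bits 0..5, ECO2 bit 6 from byte 11 bit 7). The function
names eco1, eco2 and eco_string are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* 9-bit ECO1 gathered from header bytes 10, 4 and 5. */
static unsigned eco1(const unsigned char h[14])
{
    return ((h[10] >> 1) & 31u)            /* ECO1 bits 0..4 */
         | (((h[4] >> 6) & 3u) << 5)       /* ECO1 bits 5/6  */
         | (((h[5] >> 6) & 3u) << 7);      /* ECO1 bits 7/8  */
}

/* 7-bit ECO2 gathered from header byte 11. */
static unsigned eco2(const unsigned char h[14])
{
    return (h[11] & 63u) | (((h[11] >> 7) & 1u) << 6);
}

/* Write e.g. "A00/00" into buf. Returns 1 on success, 0 if the game has
 * a special board setup or no ECO code is specified. */
static int eco_string(const unsigned char h[14], char *buf, size_t n)
{
    unsigned e1, e2;

    if (h[10] & 1)
        return 0;                          /* special board setup */
    e1 = eco1(h);
    e2 = eco2(h);
    if (e1 == 0)
        return 0;                          /* no ECO specified */
    snprintf(buf, n, "%c%02u/%02u",
             (int)('A' + (e1 - 1) / 100), (e1 - 1) % 100, e2);
    return 1;
}
```

An ECO1 value of 1 with ECO2 = 0 gives "A00/00"; an ECO1 value of 234
with ECO2 = 5 gives "C33/05".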

2.3. Player and Source Information

After the game header comes the player and source information. The
size of the player text is in bits 0..5 of byte 4. The size of
the source text is in bits 0..5 of byte 5. Assuming that the text
(player and source) is stored in the variable 'unsigned char text[]',
decoding can be performed with the following short C program:

  int i;
  int l = (header[4]&63) + (header[5]&63);
  unsigned char k = 3 * l;
  for (i=l-1;i>=0;i--,k*=3) text[i] ^= k;
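
As a sketch, the fragment above can be packaged together with a helper
that splits the decoded bytes into the two fields. The names decode_text
and split_text are hypothetical, and the 64-byte buffers are an assumption
based on the 63-byte maximum for each field.

```c
#include <stddef.h>
#include <string.h>

/* Decode the combined player + source text in place.
 * plen and slen are the low 6 bits of header bytes 4 and 5. */
static void decode_text(unsigned char *text, int plen, int slen)
{
    int l = plen + slen;
    unsigned char k = (unsigned char)(3 * l);   /* initial key */
    int i;

    for (i = l - 1; i >= 0; i--) {
        text[i] ^= k;
        k = (unsigned char)(k * 3);
    }
}

/* Copy the decoded fields into NUL-terminated buffers. */
static void split_text(const unsigned char *text, int plen, int slen,
                       char player[64], char source[64])
{
    memcpy(player, text, (size_t)plen);
    player[plen] = '\0';
    memcpy(source, text + plen, (size_t)slen);
    source[slen] = '\0';
}
```

Because the key depends only on the total length, applying decode_text
twice restores the original bytes.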

2.4.  Halfmoves and Variants

The fourth part contains the halfmoves and variants. The size of this
part (+1) is stored in bytes 2/3 of the game header. Assuming that
the moves (halfmoves and variants) are stored in the variable
'unsigned char moves[]', decoding can be performed with the following
short C program:

  int i;
  int s = header[2]*256 + header[3];
  unsigned char k = 49 * s;
  for (i=s-2;i>0;i--,k*=7) moves[i] ^= k;

Note: The first byte (moves[0]) is not encoded (program bug?)
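
The same loop as a self-contained function, as a sketch. The name
decode_moves is hypothetical; mlen is the word from header bytes 2/3, so
the buffer holds mlen-1 bytes.

```c
/* Decode the move section in place. moves[] holds mlen - 1 bytes. */
static void decode_moves(unsigned char *moves, unsigned mlen)
{
    unsigned char k = (unsigned char)(49u * mlen);   /* initial key */
    int i;

    /* Runs from the last byte down to index 1; moves[0] is left as is. */
    for (i = (int)mlen - 2; i > 0; i--) {
        moves[i] ^= k;
        k = (unsigned char)(k * 7);
    }
}
```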

Each byte of the move list contains one half-move.
A move generator numbers all possible (and some illegal) moves
from the given position, so it is enough to store only the number
of the move actually played (assuming that the move generator
always generates fewer than 128 moves; see below).

In the following description, all directions are described from
the point of view of the white player. The sentence 'Black king on e4
moves to the left' means Ke4-d4, "... up" means Ke4-e5, etc.

The moves are numbered as follows:

1. The board squares A1,...,A8,B1,...,B8, .... are examined. If a
square contains a piece of the colour to move, the moves of this piece
are computed.

2. King: The king moves are generated regardless of opponent threats.
The move order is:

     3 5 8
     2 K 7
     1 4 6

The king moves down and left, left, up and left, and so on until up
and right.

Then short and long castling are tested. This requires the king and
the corresponding rook to be correctly placed, and the intervening
squares to be free.

3. Queen: The same moves as for the Bishop and Rook (in that order)
are tried.

4. Rook:  The rook starts off to the left until it runs into a
piece (opponent's pieces are captured). The 'runs' are tried in this
order: left, down, right and up.

5. Bishop: As rook, but directions are left down, right down,
right up, left up.

6. Knight:  The following diagram is used:

         | 6 |   | 8 |
      ---+---+---+---+---
       2 |   |   |   | 4
      ---+---+---+---+---
         |   | S |   |
      ---+---+---+---+---
       1 |   |   |   | 3
      ---+---+---+---+---
         | 5 |   | 7 |

7. Pawn: First the long pawn move is tried, and then the short move.
If the pawn can be promoted, four moves are computed: =Q, =R, =B and
=N.  Then captures are tried, first to the left, then to the right.
Finally e.p. captures are checked, which requires that the captured
pawn has just made a long move.

Apart from this, the following illegal e.p. move is also computed
(ChessBase without KnightStalker permits this move during a normal
game; intentionally?). The situation is described for white, but
applies equally to black:

  A white pawn is on the 5th rank, next to a pawn of the opposite
  colour which cannot be captured e.p. If white makes a long
  pawn move that puts a pawn in front of the black pawn, and black
  then castles, white can illegally capture e.p.

  Example:

  white pawns on e5 and d2, black king on e8,
  rook on h8, pawn on d5.  If white moves d2-d4, and black
  castles 0-0, e5xd6 e.p. is a possible move.

It's very easy to generate this illegal move (see C program).

Now that the order of all moves is defined, the move list can be
interpreted:  1 means the first generated move, etc.
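
The generation orders given in items 1 to 6 can be written down as offset
tables. This is only a sketch of the orderings, not a full move generator:
castling, pawn moves and the blocking rules are omitted, and struct step,
the table names and scan_square are hypothetical. Offsets are from white's
point of view (dfile +1 = right, drank +1 = up).

```c
struct step { int dfile, drank; };

/* King: down-left, left, up-left, down, up, down-right, right, up-right. */
static const struct step king_steps[8] = {
    {-1, -1}, {-1, 0}, {-1, +1}, {0, -1},
    {0, +1},  {+1, -1}, {+1, 0}, {+1, +1}
};

/* Rook runs: left, down, right, up. */
static const struct step rook_dirs[4] = {
    {-1, 0}, {0, -1}, {+1, 0}, {0, +1}
};

/* Bishop runs: left down, right down, right up, left up. */
static const struct step bishop_dirs[4] = {
    {-1, -1}, {+1, -1}, {+1, +1}, {-1, +1}
};

/* Knight: numbers 1..8 read off the diagram in item 6. */
static const struct step knight_steps[8] = {
    {-2, -1}, {-2, +1}, {+2, -1}, {+2, +1},
    {-1, -2}, {-1, +2}, {+1, -2}, {+1, +2}
};

/* Squares are scanned A1..A8, B1..B8, ...: index i in 0..63 maps to
 * file i / 8 (0 = A) and rank i % 8 (0 = rank 1). */
static void scan_square(int i, int *file, int *rank)
{
    *file = i / 8;
    *rank = i % 8;
}
```

A queen would use bishop_dirs followed by rook_dirs, as stated in item 3.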

The value 0 does not occur in the list. $FF indicates the start
of a variant, and $80 the end. These variants can be nested, so
it's not enough to find the first $80 to find the end of the
variant.

Every move can carry a comment; this is indicated by setting bit 7 of
the move byte.
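
Since variants nest, finding the end of a variant needs a depth counter.
A minimal sketch, assuming the move section is already decoded; the names
variant_end, move_number and move_has_comment are hypothetical.

```c
#include <stddef.h>

#define CBF_VARIANT_START 0xFF
#define CBF_VARIANT_END   0x80

/* Given the index of a $FF byte, return the index of the matching $80.
 * Returns n if the variant is not terminated within the buffer. */
static size_t variant_end(const unsigned char *moves, size_t n, size_t start)
{
    size_t i;
    int depth = 0;

    for (i = start; i < n; i++) {
        if (moves[i] == CBF_VARIANT_START)
            depth++;
        else if (moves[i] == CBF_VARIANT_END && --depth == 0)
            return i;
    }
    return n;
}

/* For ordinary move bytes (neither $FF nor $80): */
static int move_number(unsigned char b)      { return b & 0x7F; }
static int move_has_comment(unsigned char b) { return (b & 0x80) != 0; }
```

The two marker values are checked before the comment bit is looked at,
because both $FF and $80 have bit 7 set.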

2.5.  Comments

Comments are not encoded. A comment starts with a zero longword
and ends with a $FF byte. The first comment begins with an extra
$FF byte. Some details: The value 177 stands for the king symbol,
178: queen, 179: knight, 180: bishop, 181: rook, 182: pawn.
$FE stands for cr-lf.
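
The special byte values could be mapped to plain ASCII roughly like this.
The choice of the letters K, Q, N, B, R, P for the piece symbols and of a
single '\n' for $FE is an assumption made for this sketch, and the name
comment_to_ascii is hypothetical.

```c
#include <stddef.h>

/* Copy n comment bytes to out as a NUL-terminated ASCII string,
 * replacing piece symbols (177..182) and the line break ($FE).
 * Output is truncated to fit outsz. Returns the string length. */
static size_t comment_to_ascii(const unsigned char *in, size_t n,
                               char *out, size_t outsz)
{
    static const char pieces[] = "KQNBRP";   /* values 177..182 */
    size_t i, o = 0;

    if (outsz == 0)
        return 0;
    for (i = 0; i < n && o + 1 < outsz; i++) {
        unsigned char c = in[i];
        if (c >= 177 && c <= 182)
            out[o++] = pieces[c - 177];
        else if (c == 0xFE)
            out[o++] = '\n';
        else
            out[o++] = (char)c;
    }
    out[o] = '\0';
    return o;
}
```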

Anjo Anjewierden: 
The comments (annotations) are encoded as follows.  First some definitions:

    move   = move evaluation (! command in ChessBase)
    pos    = position evaluation (= command in ChessBase)
    extra  = additional evaluation (Move evaluation command in ChessBase)
    START  = byte 0xff
    NULL   = byte 0x00
    string = annotation (^A command in ChessBase)

The syntax for comments is:

START move [pos] [extra] [NULL] [string]

The optional fields are only used if they have a value or if one of
the following fields has a value.  For example, if you enter
"This is a great move" (no evaluations), then the encoding is:

START NULL NULL NULL NULL string="This is a great move"

A comment ends on the START of the next comment.

2.6.  Position

The ChessBase file format allows storing games that did not start
from the starting position.  The following information is required:

 -  starting position
 -  next to move
 -  en passant file
 -  remaining castling opportunities
 -  move number of starting position

If a starting position is supplied, bit 0 of byte 10 is set.
The last 33 bytes provide the starting position. A byte specifies two
squares, and the squares are defined in the A1,..,H1,..,A8,...,H8 order.
The four upper bits specify the A,C,E and G files, the four lower bits
the B,D,F and H files.
The pieces are represented as follows:

            white   black
  ---------------------------
  King        1         9
  Queen       2        10
  Knight      3        11
  Bishop      4        12
  Rook        5        13
  Pawn        6        14

The last byte (nr 33) contains the move number - 1.

Example: next move is nr 55:

 8   k q r b n - - -
 7   p - - - - - - -   black
 6   - - - - - - - -
 5   - - - - - - - -
 4   - - - - - - - -
 3   - - - - - - - -
 2   P - - - - - - -   white
 1   K Q R B N - - -

     a b c d e f g h

The thirty-two bytes are (in hex)

  $12,$54,$30,$00, $60,$00,$00,$00,
  $00,$00,$00,$00, $00,$00,$00,$00,
  $00,$00,$00,$00, $00,$00,$00,$00,
  $E0,$00,$00,$00, $9A,$DC,$B0,$00

The 33rd byte contains 54.
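
Unpacking the 33 bytes can be sketched as follows. The structure
cbf_position and the functions decode_position and piece_char are
hypothetical names; the letters used for display follow the diagram
above (upper case for white, lower case for black).

```c
struct cbf_position {
    unsigned char board[8][8];   /* [rank][file], 0 = empty square */
    unsigned      move_number;   /* number of the next move */
};

/* pos points to the 33-byte starting position block. */
static void decode_position(const unsigned char pos[33],
                            struct cbf_position *out)
{
    int sq;

    for (sq = 0; sq < 64; sq++) {            /* A1, B1, ..., H8 */
        unsigned char b = pos[sq / 2];
        /* Upper nibble: files A, C, E, G; lower nibble: B, D, F, H. */
        unsigned char code = (sq % 2 == 0) ? (unsigned char)(b >> 4)
                                           : (unsigned char)(b & 15);
        out->board[sq / 8][sq % 8] = code;
    }
    out->move_number = pos[32] + 1u;         /* byte 33 = move number - 1 */
}

/* Display letter for a piece code; '-' for empty or unknown codes. */
static char piece_char(unsigned char code)
{
    static const char names[] = "-KQNBRP--kqnbrp-";
    return names[code & 15];
}
```

Applied to the example bytes above, this gives a white king on a1, a
black queen on b8 and a next move number of 55.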

2.7.  Magic Byte 13

Here are the comments from decauter@pool.informatik.rwth-aachen.de
(Wolfgang): 

  > Byte 13 marks this game "public" or "private".
  > Of course there are 256 possibilities.
  > 
  > "public":
  > (At least) one possibility stands for "everyone may read this game".
  > This is computed using the bytes 0 - 12 (?).
  > I don't know the exact function.
  >
  > "private":
  > (At least) one possibility stands for "user xxxx (and some others)
  > may read this game".
  > This is computed using the bytes 0 - 12 (?) and
  > the special user ID (?) from the program.
  > I don't know the exact function.
  >
  > Up to 254 possibilities stand for "user yyyy (and many others) are
  > not allowed to read this game".
  > Trying to read such a game I get the German message: "Fremddaten".
  > (I'm using the old German Atari Version 3.0 of ChessBase.)


3.  Opening files (FBK)
========================

An FBK file contains opening moves in the following format:

- a move is specified by starting and ending square.
  A1 = 0, H8 = 63.
  rank = number / 8, file = number & 7

- the squares are stored in bits 0..5 of the bytes.

- the starting square is placed before the ending square.

- bit 6 in byte 1 indicates if this move starts a new variant
  (bit 6 = 0) or not (bit 6 = 1)

- bit 7 in byte 1 indicates if the move ends the variant
  (bit 7 = 1) or not (bit 7 = 0)

- bit 6 in byte 2 is unused

- bit 7 in byte 2 indicates a weak move which should not be played
  by the computer.
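
A two-byte FBK move could be decoded along these lines. This is a sketch
based only on the bit descriptions above; struct fbk_move, fbk_decode and
square_name are hypothetical names.

```c
struct fbk_move {
    int from, to;         /* squares, A1 = 0 ... H8 = 63 */
    int starts_variant;   /* byte 1, bit 6 = 0 */
    int ends_variant;     /* byte 1, bit 7 = 1 */
    int weak;             /* byte 2, bit 7 = 1 */
};

static struct fbk_move fbk_decode(unsigned char b1, unsigned char b2)
{
    struct fbk_move m;

    m.from           = b1 & 63;
    m.to             = b2 & 63;
    m.starts_variant = (b1 & 0x40) == 0;
    m.ends_variant   = (b1 & 0x80) != 0;
    m.weak           = (b2 & 0x80) != 0;
    return m;
}

/* Write e.g. "e4" for square 28: rank = sq / 8, file = sq & 7. */
static void square_name(int sq, char out[3])
{
    out[0] = (char)('a' + (sq & 7));
    out[1] = (char)('1' + sq / 8);
    out[2] = '\0';
}
```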

-----------------------------------------------------------------------
Horst Aurisch, Email: aurisch@informatik.uni-bonn.de

