NTX file format
---------------
Version of this document: 2.0
by Cesar A. Gil, January 06, 1997


Introduction

	The NTX file format is the format of index files used by
   Clipper. It is much more efficient than NDX files, with respect to
   performance and file size (usually more than twice the speed, and
   only 80% of the size of an equivalent NDX file). The extension is also
   different to indicate that its format is not the same as NDX.

	The NTX file stores a B-tree. A B-tree consists of nodes pointing
   to sub-nodes.

	The B-tree allows you to traverse the nodes in a depth-first
   order, thus allowing you to retrieve the keys in crescent order. It also
   allows you to retrieve a given key or its closest-matching key with an
   efficient algorithm, since you must only traverse a few nodes to
   locate a specific record. However, as far as theory is concerned,
   nothing beyond a diagram of a B-tree will be given here.

	Updating an NTX file may be very difficult, since after a
   change you must keep the B-tree balanced. However updating the B-tree
   is a laborious task, the information given here may easily allow you
   to read the NTX file.


Diagram of a B-tree

			    root node
			    -----------------------
			    | . | 50 | . | 60 | . |
			    -----------------------
			      |        |        |
	     +----------------+        |        +-----------+
	     |                         |                    |
  -----------------------   -----------------------   --------------
  | . | 10 | . | 20 | . |   | . | 52 | . | 56 | . |   | . | 62 | . |
  -----------------------   -----------------------   --------------
    |        |               |        |        |       |        |
   ///       |              ///      ///      ///     ///      ///
	     |
	-----------------------
	| . | 15 | . | 16 | . |
	-----------------------
	  |        |        |
	 ///      ///      ///

   In this diagram we have assumed that each node can hold up to 2 keys,
   however a B-tree can be specified to have any maximum number of keys
   per node (n >= 2). A node doesn't need to be used up to the maximum. For
   example, in this diagram we have a node which is not fully used (the node
   which only contains the value 62).

   We also didn't care here to create a balanced tree, as would happen
   in a real situation, since we are concerned about giving the reader
   a feeling of how it is to traverse a B-tree. The diagram is also
   intended to help visualize the structure of the NTX file.

   Each node contains pointers between its key values. For example, look
   at the root node. The pointer to the left of the value 50 points to a
   sub-tree that contains every key lesser than 50. The pointer bewteen the
   values 50 and 60 points to a sub-tree containing every value between 50
   and 60. The pointer after the value 60 points to a sub-tree containing
   all values greater than 60.

   At some given time we will arrive at a point where there will be no more
   branching. We indicate that by an empty pointer (represented here by
   ///). In NTX files, an empty pointer (that does not point to a
   sub-tree) contains a 0.

   Now it is very easy to see that, if you traverse the tree in a depth-
   first search, you'll retrive all the key values in order. It's also
   very easy to define an algorithm to locate a given key efficiently
   in the tree. The difficult part is updating the B-tree, by inserting,
   changing or removing records, because the B-tree must be kept balanced.
   The algorithm for doing so is very complex and literature is available
   on the subject.


Description of an NTX file

    The NTX file consists of pages of 1K.
    The first 1K page contains a header.
    The other pages contain nodes of the B-tree. (Each 1K page will hold
      a node).

    Header structure:

	Offset
	  00h  unsigned signature - contains 09 if good NTX file
	  02h  unsigned version   - version of indexing system that the
				    file was created under
				    (0x1 is typical but may change)
	  04h  long root          - file offset (within the NTX file)
				    where the root page is.
	  08h  long free_page     - file offset (withing the NTX file)
				    where the first of a list of free
				    pages is. A 0 value ends the list.
				    The Clipper index system is intended
				    to reuse free pages.
	  0Ch  unsigned item_size - the size of the item.
	  0Eh  unsigned key_size  - the size of the index key
	  10h  unsigned key_decim - number of decimal positions for the
				    key, if it is a numeric key.
	  12h  unsigned max_item  - maximum number of items per page.
	  14h  unsigned min_item  - minimun number of items per page
				    (max_item/2 is typical)
	  16h  char key_expr[256] - an array containing the key
				    expression for the index.
				    The expression must be terminated by
				    null (ASCII 0) and its length cannot
				    be greater than 256 bytes.
	  272h unsigned unique    - contains the value of the UNIQUE
				    option for the index.
				    (1=index is UNIQUE; 0=isn't UNIQUE)

    Comments:

    (1)	The size of the index key (key_size) is determined when you issue
	the command

	    INDEX ON key_expr TO file.ntx

	The key_expr is evaluated for each record of the data file.
	The value returned by evaluating key_expr has its size calculated.
	It is assumed that all the evaluations of key_expr will return
	values of the same type and length.
	If you define a key_expr that returns values of different lengths,
	the Clipper indexing system will give unpredictable results and,
	eventually, an error will happen.

	An example (of what you shouldn't do) is:

	    INDEX ON TRIM(CLIENT_NAME) TO CLIENTS


	Based on the determined key_size, Clipper will calculate how
	many items a page can hold and will thus determine max_items
	and min_items. Remember, each item holds a key value, whose size
        depends on the key_size. That's why the item size is dynamically
        determined.



    Node structure:

	Offset
	  00h  unsigned item_count - how many items this particular
				     page holds.
	  02h  unsigned item_offset[item_count] - holds the offsets, within
		the page, where the items are located.
	  ??   item area

	       The size of each item is: 8 + key_size.
	       The last item doesn't contain a key value, only a pointer
		 to a sub-tree.

	       Each item has the following structure:

	       long page_offset - file offset (within NTX file) where
				  sub-tree is. If there's no subtree,
				  a 0 is stored.
	       long recno       - record number for this key value
				  (in the DBF file)
	       char key_value[key_size]




	   For example, the node:

			    -----------------------
			    | . | 50 | . | 60 | . |
			    -----------------------
			      |        |        |
			   to page  to page   to page
			     at       at       at
			  offset x  offset y  offset z

	   Will be represented in a page containing:

	   Offset
	     00h  unsigned item_count = 3
	     02h  unsigned item_offset[0] = 08h
	     04h  unsigned item_offset[1] = 12h
	     06h  unsigned item_offset[2] = 1Ch
	     08h  item[0] :
		      long page_offset  : x
		      long recno        : DBF record number pointed by
					 key value 50
		      char key_value[2] : '50'
	     12h  item[1] :
		      long page_offset  : y
		      long recno        : DBF record number pointed by
					 key value 60
		      char key_value[2] : '60'
	     1Ch  item[2] :
		      long page_offset  : z


Conclusion:

    The NTX file structure is easy to access. The main concern is about
    the fact that each NTX file will have items of different sizes and
    pages with different number of items.


Source code example - NTXVIEW.C :

    Author     : Cesar A. Gil
    Date       : January 06, 1997
    Language   : C
    Use        : NTXVIEW file.ntx
    Description: lists the contents of the ntx file's header, then
		 traverses all the tree in depth-first order, listing all
		 the keys in order.


    ---------------------------------------------------------------

    /*
     * ntxdump.h
     */
    #define MAX_KEY  256
    #define BUF_SIZE 1024

    typedef struct {
	unsigned type;
	unsigned version;
	long     root;
	long     next_page;
	unsigned item_size;
	unsigned key_size;
	unsigned key_dec;
	unsigned max_item;
	unsigned half_page;
	char     key_expr[MAX_KEY];
	char     unique;
    } NTX_HEADER;

    typedef struct
    {
	long page;
	long rec_no;
	char key[1];
    } NTX_ITEM;

    typedef struct
    {
	unsigned item_count;
	unsigned item_offset[1];
    } NTX_BUFFER;

    ---------------------------------------------------------------

    /*
     * ntxview.c
     */
    #include "stdio.h"
    #include "stdlib.h"
    #include "fcntl.h"
    #include "io.h"

    #include "ntxview.h"

    NTX_HEADER ntx_header;
    int        ntx_handle;

    void view_page(long page_offset);


    void main(int argc, char ** argv)
    {
	if( argc != 2 ) {
	    printf( "Use: ntxhead arquivo.ntx\n" );
	    exit(1);
	}

	ntx_handle = open(argv[1], 0x8000);
	if( ntx_handle == -1 ) {
	    printf( "Open error: %s\n", argv[1] );
	    exit(1);
	}

	/* Reads header */
	if( read(ntx_handle, &ntx_header, sizeof(NTX_HEADER))
	     != sizeof(NTX_HEADER) ) {
	    printf( "Error reading header of %s\n", argv[1] );
	    exit(1);
	}

	/* Prints header information */
	printf( "***** NTX header *****'n" );
	printf( "Signature %xh\n", ntx_header.type );
	printf( "Version %xh\n", ntx_header.version );
	printf( "Root %lxh\n", ntx_header.root );
	printf( "Next page %xh\n", ntx_header.next_page );
	printf( "Item size %xh\n", ntx_header.item_size );
	printf( "Key size %xh\n", ntx_header.key_size );
	printf( "Key decimals %xh\n", ntx_header.key_size );
	printf( "Max item %xh\n", ntx_header.half_page );
	printf( "Key expr: '%s'\n", ntx_header.key_expr );
	printf( "Unique: %d\n", ntx_header.unique );

	view_page(ntx_header.root);

	close(ntx_handle);
    }


    void view_page(long page_offset)
    {
	char       * page;
	NTX_BUFFER * buffer;
	NTX_ITEM   * item;
	unsigned   i;

	/* Allocates page */
	page = malloc(BUF_SIZE);
	buffer = (NTX_BUFFER *) page;

	if( page == NULL ) {
	    printf( "Error allocating memory.\n" );
	    exit(1);
	}

	/* Moves to the given page offset within the file */
	lseek(ntx_handle, page_offset, 0);

	/* Reads page */
	if( read(ntx_handle, buffer, BUF_SIZE) != BUF_SIZE ) {
	    printf( "Error reading page %lxh\n", page_offset );
	    exit(1);
	}

	/* Traverses the vector, item by item */
	for( i = 0; i < buffer->item_count; i ++ ) {
	    item = (NTX_ITEM *) (page + buffer->item_offset[i]);
	    if( item->page != 0 )
		view_page(item->page);
	    printf("dbf recnum: %5lx   Key: %.*s\n", item->rec_no,
		ntx_header.key_size, &item->key);
	}

	/* Handles last pointer */
	item = (NTX_ITEM *) (page + buffer->item_offset[buffer->item_count]);
	if( item->page )
	    view_page(item->page);

	/* Deallocates page */
	free(page);
    }

    ---------------------------------------------------------------


- end of document -
