man bt_format_names (Fonctions bibliothèques) - formatting BibTeX names for consistent output

NAME

bt_format_names - formatting BibTeX names for consistent output

SYNOPSIS

   bt_name_format * bt_create_name_format (char * parts,
                                           boolean abbrev_first);
   void bt_free_name_format (bt_name_format * format);
   void bt_set_format_text (bt_name_format * format, 
                            bt_namepart part,
                            char * pre_part,
                            char * post_part,
                            char * pre_token,
                            char * post_token);
   void bt_set_format_options (bt_name_format * format, 
                               bt_namepart part,
                               boolean abbrev,
                               bt_joinmethod join_tokens,
                               bt_joinmethod join_part);
   char * bt_format_name (bt_name * name, bt_name_format * format);

DESCRIPTION

After splitting a name into its components parts (represented as a CWbt_name structure), you often want to put it back together again as a single string in a consistent way. btparse provides a very flexible way to do this, generally in two stages: first, you create a name format which describes how to put the tokens and parts of any name back together, and then you apply the format to a particular name.

The name format is encapsulated in a CWbt_name_format structure, which is created with CWbt_create_name_format(). This function includes some clever trickery that means you can usually get away with calling it alone, and not need to do any customization of the format. If you do need to customize the format, though, CWbt_set_format_text() and CWbt_set_format_options() provide that capability.

The format controls the following:

•
which name parts are printed, and in what order (e.g. first von last jr, or von last jr first)
•
the text that precedes and follows each part (e.g. if the first name follows the last name, you probably want a comma before the `first' part: Smith, John rather than Smith John)
•
the text that precedes and follows each token (e.g. if the first name is abbreviated, you may want a period after each token: J. R. Smith rather than J R Smith)
•
the method used to join the tokens of each part together
•
the method used to join each part to the following part

All of these except the list of parts to format are kept in arrays indexed by name part: for example, the structure has a field

   char * post_token[BT_MAX_NAMEPARTS]

and CWpost_token[BTN_FIRST] (CWBTN_FIRST is from the CWbt_namepart CWenum) is the string to be added after each token in the first name---for example, CW"." if the first name is to be abbreviated in the conventional way.

Yet another CWenum, CWbt_joinmethod, describes the available methods for joining tokens together. Note that there are two sets of join methods in a name format: between tokens within a single part, and between the tokens of two different parts. The first allows you, for example, to change CW"J R Smith" (first name abbreviated with no post-token text but tokens joined by a space) to CW"JR Smith" (the same, but first-name tokens jammed together). The second is mainly used to ensure that von and last name-parts may be joined with a tie: CW"de~Roche" rather than CW"de Roche".

The token join methods are:

BTJ_MAYTIE
Insert a discretionary tie between tokens. That is, either a space or a tie is inserted, depending on context. (A tie, otherwise known as unbreakable space, is currently hard-coded as CW"~"---from TeX.) The format is then applied to a particular name by CWbt_format_name(), which returns a new string.
BTJ_SPACE
Always insert a space between tokens.
BTJ_FORCETIE
Always insert a tie (CW"~") between tokens.
BTJ_NOTHING
Insert nothing between tokens---just jam them together.

Tokens are joined together, and thus the choice of whether to insert a discretionary tie is made, at two places: within a part and between two parts. Naturally, this only applies when CWBTJ_MAYTIE was supplied as the token-join method; CWBTJ_SPACE and CWBTJ_FORCETIE always insert either a space or tie, and CWBTJ_NOTHING always adds nothing between tokens. Within a part, ties are added after a the first token if it is less than three characters long, and before the last token. Between parts, a tie is added only if the preceding part consisted of single token that was less than three characters long. In all other cases, spaces are inserted. (This implementation slavishly follows BibTeX.)

FUNCTIONS

bt_create_name_format()
   bt_name_format * bt_create_name_format (char * parts,
                                           boolean abbrev_first)
Creates a name format for a given set of parts, with variations for the most common forms of customization---the order of parts and whether to abbreviate the first name. The CWparts parameter specifies which parts to include in a formatted name, as well as the order in which to format them. CWparts must be a string of four or fewer characters, each of which denotes one of the four name parts: for instance, CW"vljf" means to format all four parts in von last jr first order. No characters outside of the set CW"fvlj" are allowed, and no characters may be repeated. CWabbrev_first controls whether the `first' part will be abbreviated (i.e., only the first letter from each token will be printed). In addition to simply setting the list of parts to format and the abbreviate flag for the first name, CWbt_create_name_format() initializes the entire format structure so as to minimize the need for further customizations:
•
The token join method---what to insert between tokens of the same part---is set to CWBTJ_MAYTIE (discretionary tie) for all parts
•
The part join method---what to insert after the final token of a particular part, assuming there are more parts to come---is set to CWBTJ_SPACE for the `first', `last', and `jr' parts. If the `von' part is present and immediately precedes the `last' part (which will almost always be the case), CWBTJ_MAYTIE is used to join `von' to `last'; otherwise, `von' also gets CWBTJ_SPACE for the inter-part join method.
•
The abbreviation flag is set to CWFALSE for the `von', `last', and `jr' parts; for `first', the abbreviation flag is set to whatever you pass in as CWabbrev_first.
•
Initially, all surrounding text (pre-part, post-part, pre-token, and post-token) for all parts is set to the empty string. Then a few tweaks are done, depending on the CWabbrev_first flag and the order of tokens. First, if CWabbrev_first is CWTRUE, the post-token text for first name is set to CW"."---this changes CW"J R Smith" to CW"J. R. Smith", which is usually the desired form. (If you don't want the periods, you'll have to set the post-token text yourself with CWbt_set_format_text().) Then, if `jr' is present and immediately after `last' (almost always the case), the pre-part text for `jr' is set to CW", ", and the inter-part join method for `last' is set to CWBTJ_NOTHING. This changes CW"John Smith Jr" (where the space following CW"Smith" comes from formatting the last name with a CWBTJ_SPACE inter-part join method) to CW"John Smith, Jr" (where the CW", " is now associated with CW"Jr"---that way, if there is no `jr' part, the CW", " will not be printed.) Finally, if `first' is present and immediately follows either `jr' or `last' (which will usually be the case in last-name first formats), the same sort of trickery is applied: the pre-part text for `first' is set to CW", ", and the part join method for the preceding part (either `jr' or `last') is set to CWBTJ_NOTHING. While all these rules are rather complicated, they mean that you are usually freed from having to do any customization of the name format. Certainly this is the case if you only need CW"fvlj" and CW"vljf" part orders, only want to abbreviate the first name, want periods after abbreviated tokens, non-breaking spaces in the right places, and commas in the conventional places. If you want something out of the ordinary---for instance, abbreviated tokens jammed together with no puncuation, or abbreviated last names---you'll need to customize the name format a bit with CWbt_set_format_text() and CWbt_set_format_options().
bt_free_name_format()
   void bt_free_name_format (bt_name_format * format)
Frees a name format created by CWbt_create_name_format().
bt_set_format_text()
   void bt_set_format_text (bt_name_format * format, 
                            bt_namepart part,
                            char * pre_part,
                            char * post_part,
                            char * pre_token,
                            char * post_token)
Allows you to customize some or all of the surrounding text for a single name part. Supply CWNULL for any chunk of text that you don't want to change. For instance, say you want a name format that will abbreviate first names, but without any punctuation after the abbreviated tokens. You could create and customize the format as follows:
   format = bt_create_name_format ("fvlj", TRUE);
   bt_set_format_text (format, 
                       BTN_FIRST,       /* name-part to customize */
                       NULL, NULL,      /* pre- and post- part text */
                       NULL, "");       /* empty string for post-token */
Without the CWbt_set_format_text() call, CWformat would result in names formatted like CW"J. R. Smith". After setting the post-token text for first names to CW"", this name would become CW"J R Smith".
bt_set_format_options()
   void bt_set_format_options (bt_name_format * format, 
                               bt_namepart part,
                               boolean abbrev,
                               bt_joinmethod join_tokens,
                               bt_joinmethod join_part)
Allows further customization of a name format: you can set the abbreviation flag and the two token-join methods. Alas, there is no mechanism for leaving a value unchanged; you must set everything with CWbt_set_format_options(). For example, let's say that just dropping periods from abbreviated tokens in the first name isn't enough; you really want to save space by jamming the abbreviated tokens together: CW"JR Smith" rather than CW"J R Smith" Assuming the two calls in the above example have been done, the following will finish the job:
   bt_set_format_options (format, BTN_FIRST,
                          TRUE,         /* keep same value for abbrev flag */
                          BTJ_NOTHING,  /* jam tokens together */
                          BTJ_SPACE);   /* space after final token of part */
Note that we unfortunately had to know (and supply) the current values for the abbreviation flag and post-part join method, even though we were only setting the intra-part join method.
bt_format_name()
   char * bt_format_name (bt_name * name, bt_name_format * format)
Once a name format has been created and customized to your heart's content, you can use it to format any number of names that have been split with CWbt_split_name (see bt_split_names). Simply pass the name structure and name format structure, and a newly-allocated string containing the formatted name will be returned to you. It is your responsibility to CWfree() this string.

SEE ALSO

btparse, bt_split_names

AUTHOR

Greg Ward <gward@python.net>