findFieldGroups

findFieldGroups creates a range that iterates over the 'field-groups' in a 'field-list'. (Private function.)

Input is typically a string or character array. The range becomes empty when the end of input is reached or an unescaped field-list terminator character is found.

A 'field-list' is a comma-separated list of 'field-groups'. A 'field-group' is a single numeric or named field, or a hyphen-separated pair of numeric or named fields. For example:

1,2,4-7               # 3 numeric field-groups
field_a,field_b       # 2 named fields

Each element in the range is represented by a tuple of two values:

  • consumed - The total number of index positions consumed by the range so far.
  • value - A slice containing the text of the field-group.

The field-group slice does not contain the separator character, but this is included in the total consumed. The field-group tuples from the previous examples:

Input: 1,2,4-7
   tuple(1, "1")
   tuple(3, "2")
   tuple(7, "4-7")

Input: field_a,field_b
   tuple(7, "field_a")
   tuple(15, "field_b")
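
The way the consumed count accumulates can be sketched with a simplified stand-in. `simpleFieldGroups` is a hypothetical helper written only for illustration; it is not the routine documented here, and it assumes the input contains no backslash escapes and no field-list terminators.

```d
import std.algorithm : splitter;
import std.typecons : Tuple, tuple;

/* Hypothetical, simplified stand-in: splits on every comma and ignores
 * backslash escapes and field-list terminators.
 */
Tuple!(size_t, string)[] simpleFieldGroups(string fieldList)
{
    Tuple!(size_t, string)[] result;
    size_t consumed = 0;
    bool first = true;

    foreach (group; fieldList.splitter(','))
    {
        // Every group after the first also consumes the comma before it.
        if (!first) consumed += 1;
        consumed += group.length;
        result ~= tuple(consumed, group);
        first = false;
    }
    return result;
}

void main()
{
    auto groups = simpleFieldGroups("1,2,4-7");
    assert(groups.length == 3);
    assert(groups[0][0] == 1 && groups[0][1] == "1");
    assert(groups[1][0] == 3 && groups[1][1] == "2");
    assert(groups[2][0] == 7 && groups[2][1] == "4-7");
}
```

The slices never include the comma, yet each count includes every comma that precedes the group, which is why the second group reports 3 rather than 2.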

The details of field-groups are not material to this routine; it is only concerned with finding the boundaries between field-groups and the termination boundary for the field-list. This is relatively straightforward. The main parsing concern is the use of the escape character when delimiter characters are included in field names.

Field-groups are separated by a single comma (','). A field-list is terminated by a colon (':') or space (' ') character. Comma, colon, and space characters can be included in a field-group by preceding them with a backslash. A backslash not intended as an escape character must also be backslash escaped.

A field-list is also terminated if an unescaped backslash or a pair of consecutive commas is encountered. These cases are normally errors, but handling them is left to the caller.

Additional characters need to be backslash escaped inside field-groups, in particular the asterisk ('*') and hyphen ('-') characters. However, this routine need only be aware of characters that affect field-list and field-group boundaries, which are the set listed above.

Backslash escape sequences are recognized but not removed from field-groups.
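
The boundary rules above can be sketched as a small eager scanner. This is a minimal illustration under stated assumptions, not the library implementation: `scanFieldGroups` and `FieldGroup` are hypothetical names, the sketch returns an array instead of a lazy range, it treats a backslash followed by any character as an escape pair, and it interprets a trailing backslash with nothing after it as the terminating "unescaped backslash" case.

```d
import std.typecons : Tuple;

alias FieldGroup = Tuple!(size_t, "consumed", string, "value");

/* Hypothetical sketch, not the library implementation. Assumptions:
 *  - a backslash followed by any character forms an escape pair;
 *  - a trailing backslash (nothing left to escape) ends the field-list;
 *  - an empty field-group (for example from ",,") ends the field-list.
 */
FieldGroup[] scanFieldGroups(string input)
{
    FieldGroup[] groups;
    size_t pos = 0;

    while (pos < input.length)
    {
        immutable size_t start = pos;

        // Advance to the next unescaped comma, colon, or space.
        while (pos < input.length)
        {
            immutable char c = input[pos];
            if (c == '\\')
            {
                if (pos + 1 >= input.length) break;  // Trailing backslash.
                pos += 2;                            // Escape pair is kept as-is.
            }
            else if (c == ',' || c == ':' || c == ' ') break;
            else pos++;
        }

        if (pos == start) break;  // Empty field-group: stop scanning.

        groups ~= FieldGroup(pos, input[start .. pos]);

        // Step over a single separating comma; anything else ends the list.
        if (pos < input.length && input[pos] == ',') pos++;
        else break;
    }
    return groups;
}

void main()
{
    // The escaped comma stays inside the first field-group, backslash included.
    auto groups = scanFieldGroups(`field\,a,b:rest`);
    assert(groups.length == 2);
    assert(groups[0].consumed == 8 && groups[0].value == `field\,a`);
    assert(groups[1].consumed == 10 && groups[1].value == "b");

    // A pair of consecutive commas ends the field-list after the first group.
    assert(scanFieldGroups("1,,2").length == 1);
}
```

In this sketch the caller can detect the error cases by inspecting the character at the last consumed position: a well-formed field-list stops at the end of input, a colon, or a space.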

Field and record delimiter characters (usually TAB and newline) are not handled by this routine. They cannot be used in field names because there is no way to represent them in the header line. This routine does not need to check for them; these checks occur naturally when header lines are processed.

private findFieldGroups(Range)(Range r)
if (isInputRange!Range &&
    (is(Unqual!(ElementEncodingType!Range) == char) ||
     is(Unqual!(ElementEncodingType!Range) == ubyte)) &&
    (isNarrowString!Range ||
     (isRandomAccessRange!Range && hasSlicing!Range && hasLength!Range)))
