parseFieldList

parseFieldList returns a range iterating over the field numbers in a field-list.

parseFieldList is the main routine for parsing field-lists entered on the command line. It handles both numeric and named field-lists. The elements of the returned range are a sequence of 1-up field numbers corresponding to the fields specified in the field-list string.

An error is thrown if the field-list string is malformed. The error text is intended for display to the user invoking the tsv-utils tool from the command line.

Named field-lists require an array of field names from the header line. Named fields are allowed only if a header line is available. Using a named field-list without a header line generates an error message referencing the headerCmdArg string as a hint to the end user.

Several optional modes of operation are available:

  • Conversion to zero-based indexes (convertToZero template parameter) - Returns the field numbers as zero-based array indices rather than 1-based field numbers.
  • Allow zero as a field number (allowZero template parameter) - This allows zero to be used as a field number. This is typically used to allow the user to specify the entire line rather than an individual field. Use a signed result type if also using convertToZero, as zero will then be returned as (-1).
  • Consuming the entire field list string (consumeEntire template parameter) - By default, an error is thrown if the entire field-list string is not consumed. This is the most common behavior. Turning this off (the No option) will terminate processing without error when a valid field-list termination character is found. The parseFieldList.consumed member function can be used to see where in the input string processing terminated.
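
The interaction between the first two modes can be sketched without the library itself. The helper below, toResult, is hypothetical and not part of tsv-utils; it only mimics the arithmetic described above, to show why a signed result type is needed when zero is allowed and zero-based conversion is on.

```d
import std.traits : isIntegral;
import std.typecons : Flag, No, Yes;

/* Hypothetical helper, not part of tsv-utils. It maps a user-entered
 * 1-based field number to the value a caller would see in each mode.
 */
T toResult(T = size_t,
           Flag!"convertToZeroBasedIndex" convertToZero = No.convertToZeroBasedIndex)
    (size_t fieldNum)
if (isIntegral!T)
{
    static if (convertToZero)
    {
        // Field 0 (the "entire line" marker) becomes -1, which only a
        // signed T can represent.
        return cast(T)(cast(long) fieldNum - 1);
    }
    else
    {
        return cast(T) fieldNum;
    }
}

unittest
{
    // Default mode: 1-based field numbers pass through unchanged.
    assert(toResult(3) == 3);

    // Zero-based conversion: field 3 becomes array index 2.
    assert(toResult!(size_t, Yes.convertToZeroBasedIndex)(3) == 2);

    // Field 0 with conversion: a signed type holds the resulting -1.
    assert(toResult!(long, Yes.convertToZeroBasedIndex)(0) == -1);
}
```

With an unsigned T the last case would wrap around to a very large value instead of -1, which is the situation the signed-type advice is meant to avoid.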

The optional cmdOptionString and headerCmdArg arguments are used to generate better error messages. cmdOptionString should be the option string passed to std.getopt, e.g. "f|field". It is added to the error message. Callers that already add the option name to the error message should pass the empty string.

The headerCmdArg argument should be the option for turning on header line processing. This is standard for tsv-utils tools (--H|header), so most tsv-utils tools will use the default value.
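
As an illustration of where these two strings usually come from, the following sketch collects a header flag and a field-list string with std.getopt. The CmdOptions struct, the parseArgs function and the two enum names are hypothetical, introduced only for this example; the call to parseFieldList appears only in a comment so that the block stays self-contained.

```d
import std.getopt : config, getopt;

/* Hypothetical option holder; the names are illustrative only. */
struct CmdOptions
{
    bool hasHeader;
    string fieldListArg;
}

// The same strings are given to getopt and, later, to parseFieldList.
enum headerOption = "H|header";
enum fieldsOption = "f|fields";

CmdOptions parseArgs(string[] args)
{
    CmdOptions opts;
    getopt(args,
           config.caseSensitive,
           headerOption, &opts.hasHeader,
           fieldsOption, &opts.fieldListArg);
    return opts;
}

unittest
{
    auto opts = parseArgs(["prog", "--header", "--fields", "B*,A1"]);
    assert(opts.hasHeader);
    assert(opts.fieldListArg == "B*,A1");

    // Once the header line has been read into headerFields, the call
    // would look roughly like:
    //   opts.fieldListArg.parseFieldList(opts.hasHeader, headerFields, fieldsOption)
}
```

Keeping the option strings in one place means the hint in an error message always names the option the user actually typed.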

parseFieldList returns a reference range. This is so the consumed member function remains valid when using the range with facilities that would copy a value-based range.
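
The difference between a value-based range and a reference range can be seen in a small standalone sketch. ValueCounter and RefCounter are hypothetical types written for this illustration. A class is only one way to get reference semantics in D, and nothing here is meant to describe how parseFieldList is implemented internally.

```d
import std.algorithm : equal;

/* Hypothetical value-type range that counts popped elements. */
struct ValueCounter
{
    int[] data;
    size_t consumed;

    @property bool empty() const { return data.length == 0; }
    @property int front() const { return data[0]; }
    void popFront() { data = data[1 .. $]; ++consumed; }
}

/* The same range written as a class, i.e. a reference type. */
final class RefCounter
{
    int[] data;
    size_t consumed;

    this(int[] data) { this.data = data; }

    @property bool empty() const { return data.length == 0; }
    @property int front() const { return data[0]; }
    void popFront() { data = data[1 .. $]; ++consumed; }
}

unittest
{
    auto byValue = ValueCounter([3, 4, 1]);
    assert(byValue.equal([3, 4, 1]));  // equal iterates over a copy
    assert(byValue.consumed == 0);     // the caller's counter never moved

    auto byRef = new RefCounter([3, 4, 1]);
    assert(byRef.equal([3, 4, 1]));    // equal iterates the same object
    assert(byRef.consumed == 3);       // the counter reflects the iteration
}
```

With the value-type range the count is lost inside the copy that equal consumed. With the reference type the caller can still query it afterwards, which is the property the consumed member function relies on.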

parseFieldList
(
    T = size_t,
    ConvertToZeroBasedIndex convertToZero = No.convertToZeroBasedIndex,
    AllowFieldNumZero allowZero = No.allowFieldNumZero,
    ConsumeEntireFieldListString consumeEntire = Yes.consumeEntireFieldListString
)
(
    string fieldList,
    bool hasHeader = false,
    string[] headerFields = [],
    string cmdOptionString = "",
    string headerCmdArg = "H|header"
)
if (isIntegral!T && (!allowZero || !convertToZero || !isUnsigned!T))
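
The template constraint may be easier to read as a list of accepted and rejected combinations. The sketch below restates it as a standalone predicate. validCombo is a hypothetical name used only here, and plain bool parameters stand in for the flag types.

```d
import std.traits : isIntegral, isUnsigned;

/* Hypothetical predicate mirroring the constraint in the signature:
 * T must be integral, and combining allowZero with convertToZero is
 * rejected for unsigned T, because field 0 would have to map to -1.
 */
enum bool validCombo(T, bool convertToZero, bool allowZero) =
    isIntegral!T && (!allowZero || !convertToZero || !isUnsigned!T);

static assert( validCombo!(size_t, false, false));  // the defaults
static assert( validCombo!(size_t, true,  false));  // zero-based, field 0 not allowed
static assert( validCombo!(size_t, false, true));   // field 0 stays 0
static assert(!validCombo!(size_t, true,  true));   // -1 does not fit in size_t
static assert( validCombo!(long,   true,  true));   // a signed type can hold -1
static assert(!validCombo!(double, false, false));  // result type must be integral
```

Only one combination of the two flags is restricted, and only for unsigned result types.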

Examples

Basic cases showing how parseFieldList works

import std.algorithm : each, equal;

string[] emptyHeader = [];

// Numeric field-lists, with no header line.
assert(`5`.parseFieldList
       .equal([5]));

assert(`10`.parseFieldList(false, emptyHeader)
       .equal([10]));

assert(`1-3,17`.parseFieldList(false, emptyHeader)
       .equal([1, 2, 3, 17]));

// General field lists, when a header line is available
assert(`5,1-3`.parseFieldList(true, [`f1`, `f2`, `f3`, `f4`, `f5`])
       .equal([5, 1, 2, 3]));

assert(`f1`.parseFieldList(true, [`f1`, `f2`, `f3`])
       .equal([1]));

assert(`f3`.parseFieldList(true, [`f1`, `f2`, `f3`])
       .equal([3]));

assert(`f1-f3`.parseFieldList(true, [`f1`, `f2`, `f3`])
       .equal([1, 2, 3]));

assert(`f3-f1`.parseFieldList(true, [`f1`, `f2`, `f3`])
       .equal([3, 2, 1]));

assert(`f*`.parseFieldList(true, [`f1`, `f2`, `f3`])
       .equal([1, 2, 3]));

assert(`B*`.parseFieldList(true, [`A1`, `A2`, `B1`, `B2`])
       .equal([3, 4]));

assert(`*2`.parseFieldList(true, [`A1`, `A2`, `B1`, `B2`])
       .equal([2, 4]));

assert(`1-2,f4`.parseFieldList(true, [`f1`, `f2`, `f3`, `f4`, `f5`])
       .equal([1, 2, 4]));

/* The next few examples are closer to the code that would really be
 * used during command line arg processing.
 */
{
    string getoptOption = "f|fields";
    bool hasHeader = true;
    auto headerFields = [`A1`, `A2`, `B1`, `B2`];
    auto fieldListCmdArg = `B*,A1`;
    auto fieldNumbers = fieldListCmdArg.parseFieldList(hasHeader, headerFields, getoptOption);
    assert(fieldNumbers.equal([3, 4, 1]));
    assert(fieldNumbers.consumed == fieldListCmdArg.length);
}
{
    /* Supplementary options after the field-list. */
    string getoptOption = "f|fields";
    bool hasHeader = false;
    string[] headerFields;
    auto fieldListCmdArg = `3,4:option`;
    auto fieldNumbers =
        fieldListCmdArg.parseFieldList!(size_t, No.convertToZeroBasedIndex,
                                        No.allowFieldNumZero, No.consumeEntireFieldListString)
        (hasHeader, headerFields, getoptOption);
    assert(fieldNumbers.equal([3, 4]));
    assert(fieldNumbers.consumed == 3);
    assert(fieldListCmdArg[fieldNumbers.consumed .. $] == `:option`);
}
{
    /* Supplementary options after the field-list. */
    string getoptOption = "f|fields";
    bool hasHeader = true;
    auto headerFields = [`A1`, `A2`, `B1`, `B2`];
    auto fieldListCmdArg = `B*:option`;
    auto fieldNumbers =
        fieldListCmdArg.parseFieldList!(size_t, No.convertToZeroBasedIndex,
                                        No.allowFieldNumZero, No.consumeEntireFieldListString)
        (hasHeader, headerFields, getoptOption);
    assert(fieldNumbers.equal([3, 4]));
    assert(fieldNumbers.consumed == 2);
    assert(fieldListCmdArg[fieldNumbers.consumed .. $] == `:option`);
}
{
    /* Supplementary options after the field-list. */
    string getoptOption = "f|fields";
    bool hasHeader = true;
    auto headerFields = [`A1`, `A2`, `B1`, `B2`];
    auto fieldListCmdArg = `B* option`;
    auto fieldNumbers =
        fieldListCmdArg.parseFieldList!(size_t, No.convertToZeroBasedIndex,
                                        No.allowFieldNumZero, No.consumeEntireFieldListString)
        (hasHeader, headerFields, getoptOption);
    assert(fieldNumbers.equal([3, 4]));
    assert(fieldNumbers.consumed == 2);
    assert(fieldListCmdArg[fieldNumbers.consumed .. $] == ` option`);
}
{
    /* Mixed numeric and named fields. */
    string getoptOption = "f|fields";
    bool hasHeader = true;
    auto headerFields = [`A1`, `A2`, `B1`, `B2`];
    auto fieldListCmdArg = `B2,1`;
    auto fieldNumbers =
        fieldListCmdArg.parseFieldList!(size_t, No.convertToZeroBasedIndex,
                                        No.allowFieldNumZero, No.consumeEntireFieldListString)
        (hasHeader, headerFields, getoptOption);
    assert(fieldNumbers.equal([4, 1]));
    assert(fieldNumbers.consumed == fieldListCmdArg.length);
}
