module Netmime_string:sig..end
Low-level functions to parse and print mail and MIME messages
Netmime_string contains a lot of functions to scan and print strings
 formatted as MIME messages. For a higher-level view on this topic,
 see the Netmime module.
These functions are all CR/LF-aware, i.e. lines can be terminated by either LF or CR/LF.
val find_line_end : string -> int -> int -> int
find_line_end s pos len: Searches for the next line end (CR/LF or
      only LF), and returns the position. The search starts at position
      pos, and covers the next len bytes. Raises Not_found
      if there is no line end.
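The contract described above can be illustrated with a stdlib-only sketch. It is not Ocamlnet's implementation: the name find_line_end_sketch is hypothetical, and it assumes that for a CR/LF pair the position of the CR is returned.

```ocaml
(* Stdlib-only sketch of the documented contract; not Ocamlnet's code.
   Assumption: for CR/LF the position of the CR is reported. *)
let find_line_end_sketch (s : string) (pos : int) (len : int) : int =
  let stop = pos + len in
  let rec go i =
    if i >= stop then raise Not_found
    else if s.[i] = '\n' then i                       (* bare LF *)
    else if s.[i] = '\r' && i + 1 < stop && s.[i + 1] = '\n' then i  (* CR/LF *)
    else go (i + 1)
  in
  go pos

let () =
  let s = "a: 1\r\nb: 2\n" in
  (* First line ends at the CR (index 4), second at the LF (index 10). *)
  assert (find_line_end_sketch s 0 (String.length s) = 4);
  assert (find_line_end_sketch s 6 (String.length s - 6) = 10);
  (* No line end within the first four bytes. *)
  assert (try ignore (find_line_end_sketch s 0 4); false with Not_found -> true)
```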
val find_line_end_poly : 's Netstring_tstring.tstring_ops -> 's -> int -> int -> int
polymorphic version
val find_line_start : string -> int -> int -> int
find_line_start s pos len: Searches for the next line start and returns its
      position. The line start is the position after the next line end
      (CR/LF or only LF). The search starts at position
      pos, and covers the next len bytes. Raises Not_found
      if there is no line end.
val find_line_start_poly : 's Netstring_tstring.tstring_ops -> 's -> int -> int -> int
polymorphic version
val find_double_line_start : string -> int -> int -> int
find_double_line_start s pos len: Searches for two adjacent line ends
      (each may be a CR/LF combination or a single LF), and returns the
      position after the second line end.  The search starts at position
      pos, and covers the next len bytes. Raises Not_found
      if the mentioned pattern is not found.
val find_double_line_start_poly : 's Netstring_tstring.tstring_ops -> 's -> int -> int -> int
polymorphic version
val skip_line_ends : string -> int -> int -> int
skip_line_ends s pos len: Skips over adjacent line ends (terminated
      by CR/LF or plain LF), and returns the position after the last
      line end. The search starts at position
      pos, and covers the next len bytes. Note that this function
      cannot raise Not_found.
val skip_line_ends_poly : 's Netstring_tstring.tstring_ops -> 's -> int -> int -> int
polymorphic version
val fold_lines_p : ('a -> int -> int -> int -> bool -> 'a) -> 'a -> string -> int -> int -> 'a
fold_lines_p f acc0 s pos len: Splits the substring of s
      from pos
      to pos+len into lines, and folds over these lines like 
      List.fold_left. The function f is called as
      f acc p0 p1 p2 is_last where acc is the current accumulator
      (initialized with acc0), and
p0 is the start position of the line in s
p1 is the position of the line terminator in s
p2 is the position after the line terminator in s
is_last is true if this is the last line in the iteration
The lines can be terminated with CR/LF or LF. For the last line
      the terminator is optional (p1=p2 is possible).
The function is tail-recursive.
val fold_lines_p_poly : 's Netstring_tstring.tstring_ops ->
       ('a -> int -> int -> int -> bool -> 'a) -> 'a -> 's -> int -> int -> 'a
even more polymorphic
val fold_lines : ('a -> string -> 'a) -> 'a -> string -> int -> int -> 'a
fold_lines f acc0 s pos len: Splits the substring of s
      from pos
      to pos+len into lines, and folds over these lines like 
      List.fold_left. The function f is called as
      f acc line where acc is the current accumulator
      (initialized with acc0), and line is the current line
      w/o terminator.
The lines can be terminated with CR/LF or LF.
The function is tail-recursive.
Example: Get the lines as list:
         List.rev(fold_lines (fun acc line -> line :: acc) [] s pos len)
val fold_lines_poly : 's Netstring_tstring.tstring_ops ->
       ('a -> 's -> 'a) -> 'a -> 's -> int -> int -> 'a
even more polymorphic
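The folding behaviour can be tried out without Ocamlnet using the following stdlib-only sketch. The name fold_lines_sketch is hypothetical; the sketch assumes that an empty remainder after the final terminator does not count as a line, which may differ from the library.

```ocaml
(* Stdlib-only sketch of the documented behaviour; not Ocamlnet's code. *)
let fold_lines_sketch f acc0 s pos len =
  let stop = pos + len in
  (* [start] is the first byte of the current line, [i] the scan position. *)
  let rec go acc start i =
    if i >= stop then begin
      (* Last line: the terminator is optional; skip an empty remainder. *)
      if start < stop then f acc (String.sub s start (stop - start)) else acc
    end
    else if s.[i] = '\n' then
      (* Drop a CR directly before the LF so CR/LF and LF behave alike. *)
      let e = if i > start && s.[i - 1] = '\r' then i - 1 else i in
      go (f acc (String.sub s start (e - start))) (i + 1) (i + 1)
    else go acc start (i + 1)
  in
  go acc0 pos pos

(* Collect the lines of a whole string in order. *)
let lines_of s =
  List.rev
    (fold_lines_sketch (fun acc line -> line :: acc) [] s 0 (String.length s))

let () =
  assert (lines_of "one\r\ntwo\nthree" = ["one"; "two"; "three"]);
  assert (lines_of "one\n\ntwo\n" = ["one"; ""; "two"])
```

The accumulator comes first and the line second in the callback, mirroring List.fold_left.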
val iter_lines : (string -> unit) -> string -> int -> int -> unit
iter_lines f s pos len: Splits the substring of s
      from pos
      to pos+len into lines, and calls f line for each
      line.
The lines can be terminated with CR/LF or LF.
val iter_lines_poly : 's Netstring_tstring.tstring_ops -> ('s -> unit) -> 's -> int -> int -> unit
even more polymorphic
val skip_whitespace_left : string -> int -> int -> int
skip_whitespace_left s pos len: Returns the smallest
      p with p >= pos && p < pos+len so that s.[p] is not
      a whitespace character (space, TAB, CR, LF), and
      s.[q] is a whitespace character for all q with pos <= q < p.
      If this is not possible Not_found will be raised.
val skip_whitespace_right : string -> int -> int -> int
skip_whitespace_right s pos len: Returns the biggest
      p with p >= pos && p < pos+len so that s.[p] is not
      a whitespace character (space, TAB, CR, LF), and
      s.[q] is a whitespace character for all q with p < q < pos+len.
      If this is not possible Not_found will be raised.
The Format of Mail Messages
Messages
  consist of a header and a body; the first empty line separates both
  parts. The header contains lines "param-name: param-value" where
  the param-name must begin on column 0 of the line, and the ":"
  separates the name and the value. So the format is roughly:
   param1-name: param1-value
   ...
   paramN-name: paramN-value
   _
   body
(Where "_" denotes an empty line.)
Details
Note that parameter values are restricted; you cannot represent arbitrary strings. The following problems can arise:
:" and the
      value.
This implementation of a mail scanner tolerates a number of deviations from the standard: long lines are not rejected; 8 bit values are generally accepted; lines may be ended only with LF instead of CRLF.
Compatibility
These functions can parse all mail headers that conform to RFC 822 or RFC 2822.
But there may still be problems, as RFC 822 allows some unusual representations that are not used in practice. In particular, RFC 822 permits using backslashes to "indicate" that a CRLF sequence is semantically meant as a line break. As this function normally deletes CRLFs, such indicators cannot be recognized in the result of the function.
val fold_header : ?downcase:bool ->
       ?unfold:bool ->
       ?strip:bool ->
       ('a -> string -> string -> 'a) -> 'a -> string -> int -> int -> 'a
fold_header f acc0 s pos len:
      Parses a MIME header in the string s from pos to exactly
      pos+len. The MIME header must be terminated by an empty line.
A folding operation is done over the header values while
      the lines are extracted from the string, very much like
      List.fold_left. For each header (n,v) where n is the
      name and v is the value, the function f is called as
      f acc n v.
If the header cannot be parsed, a Failure is raised.
Certain transformations may be applied (default: no transformations):
If downcase is set, the header names are converted to
        lowercase characters
If unfold is set, the line terminators are not included
        in the resulting values. This covers both the end of line
        characters at the very end of a header and the end of line
        characters introduced by continuation lines.
If strip is set, leading and trailing white space is
        removed from the value (including line terminators at the
        very end of the value)
val list_header : ?downcase:bool ->
       ?unfold:bool -> ?strip:bool -> string -> int -> int -> (string * string) list
list_header s pos len: Returns the headers as list of pairs
      (name,value).
For the meaning of the arguments see fold_header above.
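To make the header format concrete, here is a simplified stdlib-only sketch that collects (name, value) pairs up to the first empty line. It is not the library's parser: list_header_sketch is a hypothetical name, it always behaves as if downcase, unfold and strip were set, and as a further simplification it joins continuation lines with a single space.

```ocaml
(* Simplified stdlib-only sketch; not Ocamlnet's parser. *)
let list_header_sketch (s : string) : (string * string) list =
  (* Remove a trailing CR so that CR/LF and LF line ends behave alike. *)
  let strip_cr l =
    let n = String.length l in
    if n > 0 && l.[n - 1] = '\r' then String.sub l 0 (n - 1) else l
  in
  let rec go acc = function
    | [] -> failwith "header not terminated by an empty line"
    | l :: rest ->
      let l = strip_cr l in
      if l = "" then List.rev acc                 (* empty line: end of header *)
      else if l.[0] = ' ' || l.[0] = '\t' then
        (* Continuation line: append to the value of the previous field. *)
        (match acc with
         | (n, v) :: tl -> go ((n, v ^ " " ^ String.trim l) :: tl) rest
         | [] -> failwith "continuation line without a header field")
      else
        (match String.index_opt l ':' with
         | None -> failwith "missing ':' in header line"
         | Some i ->
           let name = String.lowercase_ascii (String.sub l 0 i) in
           let value =
             String.trim (String.sub l (i + 1) (String.length l - i - 1)) in
           go ((name, value) :: acc) rest)
  in
  go [] (String.split_on_char '\n' s)

let () =
  let msg = "Subject: Hello\r\n world\r\nTo: a@b\r\n\r\nbody" in
  assert (list_header_sketch msg = [("subject", "Hello world"); ("to", "a@b")])
```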
val scan_header : ?downcase:bool ->
       ?unfold:bool ->
       ?strip:bool ->
       string -> start_pos:int -> end_pos:int -> (string * string) list * int
let params, header_end_pos = scan_header s start_pos end_pos:
Deprecated.
Scans the mail header that begins at position start_pos in the string 
      s and that must end somewhere before position end_pos. It is intended
      that in end_pos the character position following the end of the body of
      the MIME message is passed.
Returns the parameters of the header as (name,value) pairs (in
     params), and in header_end_pos the position of the character following
     directly after the header (i.e. after the blank line separating
     the header from the body).
If downcase is set, header names are converted to lowercase characters
The options unfold and strip have a slightly different meaning than
       in the new function fold_header above. In particular, whitespace
       is already stripped off the returned values if either unfold or
       strip is enabled. (This is for backward compatibility.)
Also, this function is different because downcase and unfold are
     enabled by default, and only strip is not enabled.
val scan_header_tstring : ?downcase:bool ->
       ?unfold:bool ->
       ?strip:bool ->
       Netsys_types.tstring ->
       start_pos:int -> end_pos:int -> (string * string) list * int
The same for tagged strings
val scan_header_poly : ?downcase:bool ->
       ?unfold:bool ->
       ?strip:bool ->
       's Netstring_tstring.tstring_ops ->
       's -> start_pos:int -> end_pos:int -> (string * string) list * int
Polymorphic version
val read_header : ?downcase:bool ->
       ?unfold:bool ->
       ?strip:bool -> Netstream.in_obj_stream -> (string * string) list
This function expects that the current position of the passed
 in_obj_stream is the first byte of the header. The function scans the
 header and returns it. After that, the stream position is after
 the header and the terminating empty line (i.e. at the beginning of
 the message body).
The options downcase, unfold, and strip have the same meaning
 as in scan_header.
Example
To read the mail message "file.txt":
 let ch = new Netchannels.input_channel (open_in "file.txt") in
 let stream = new Netstream.input_stream ch in
 let header = read_header stream in
 stream#close_in()  (* no need to close ch *)
val write_header : ?soft_eol:string ->
       ?eol:string -> Netchannels.out_obj_channel -> (string * string) list -> unit
This function writes the header to the passed out_obj_channel. The
 empty line following the header is also written.
Exact output format:
write_value below for this.)
write_header ch ["x","Field value"; "y","   Other value"] outputs:
  x: Field value\r\n
  y: Other value\r\n
  \r\n
soft_eol string. If the
    necessary space or tab character following the eol is missing, an
    additional space character will be inserted.
    Example:
    write_header ch ["x","Field\nvalue"; "y","Other\r\n\tvalue"] outputs:
  x: Field\r\n
   value
  y: Other\r\n
  \tvalue
write_header ch ["x","Field\n\nvalue"] outputs:
  x: Field\r\n
   value
eol once.
These rules ensure that the printed header will be well-formed with two exceptions:
These two problems cannot be addressed without taking the syntax
 of the header fields into account. See below how to create
 proper header fields from s_token lists.
The following types and functions let you build scanners for structured mail and MIME values in a highly configurable way.
Structured Values
RFC 822 (together with some other RFCs) defines lexical rules for how formal mail header values should be divided up into tokens. Formal mail headers are those headers that are formed according to some grammar, e.g. mail addresses or MIME types.
Some of the characters separate phrases of the value; these are
 the "special" characters. For example, '@' is normally a special
 character for mail addresses, because it separates the user name
 from the domain name (as in user@domain). RFC 822 defines a fixed set
 of special
 characters, but other RFCs use different sets. Because of this,
 the following functions let you configure the set of special characters.
Every sequence of characters may be embraced by double quotes, which means that the sequence is meant as literal data item; special characters are not recognized inside a quoted string. You may use the backslash to insert any character (including double quotes) verbatim into the quoted string (e.g. "He said: \"Give it to me!\""). The sequence of a backslash character and another character is called a quoted pair.
Structured values may contain comments. The beginning of a comment is indicated by '(', and the end by ')'. Comments may be nested. Comments may contain quoted pairs. A comment counts as if a space character were written instead of it.
Control characters are the ASCII characters 0 to 31, and 127. RFC 822 demands that mail headers are 7 bit ASCII strings. Because of this, this module also counts the characters 128 to 255 as control characters.
Domain literals are strings embraced by '[' and ']'; such literals
 may contain quoted pairs. Today, domain literals are used to specify
 IP addresses (rare), e.g. user@[192.168.0.44].
Every character sequence not falling in one of the above categories is an atom (a sequence of non-special and non-control characters). When recognized, atoms may be encoded in a character set different than US-ASCII; such atoms are called encoded words (see RFC 2047).
Scanning Using the Extended Interface
In order to scan a string containing a structured value, you must first
 create a mime_scanner using the function create_mime_scanner.
 The scanner contains the reference to the scanned string, and a 
 specification how the string is to be scanned. The specification
 consists of the lists specials and scan_options.
The character list specials specifies the set of special characters.
 These are the characters that are not regarded as part of atoms, 
 because they work as delimiters that separate atoms (like @ in the
 above example). In addition, if '"', '(', and '[' are to be treated
 as regular characters that do not delimit quoted strings, comments, and
 domain literals, respectively, these characters must also be added
 to specials. In detail, these rules apply:
' ' in specials: A space character is returned as Special ' '.
       Note that there may also be an effect on how comments are returned
       (see below).
' ' not in specials: Spaces are not returned, although
      they still delimit atoms.
'\t' in specials: A tab character is returned as
      Special '\t'.
'\t' not in specials: Tabs are not returned, although
      they still delimit atoms.
'\r' in specials: A CR character is returned as
      Special '\r'.
'\r' not in specials: CRs are not returned, although
      they still delimit atoms.
'\n' in specials: A LF character is returned as
      Special '\n'.
'\n' not in specials: LFs are not returned, although
      they still delimit atoms.
'(' in specials: Comments are not recognized. The
       character '(' is returned as Special '('.
'(' not in specials: Comments are recognized. How comments
       are returned depends on the following:
    Return_comments in scan_options: Outer comments are
         returned as Comment (note that inner comments are recognized but
         are not returned as tokens)
    ' ' in specials: Outer comments are returned as
         Special ' '
'"' in specials: Quoted strings are not recognized, and
      double quotes are returned as Special '"'.
'"' not in specials: Quoted strings are returned as
      QString tokens.
'[' in specials: Domain literals are not recognized, and
       left brackets are returned as Special '['.
'[' not in specials: Domain literals are returned as
       DomainLiteral tokens.
If recognized, quoted strings are returned as QString s, where
 s is the string without the embracing quotes, and with already
 decoded quoted pairs.
Control characters c are returned as Control c.
If recognized, comments may either be returned as spaces (in the case
 you are not interested in the contents of comments), or as Comment tokens.
 The contents of comments are not further scanned; you must start a
 subscanner to analyze comments as structured values.
If recognized, domain literals are returned as DomainLiteral s, where
 s is the literal without brackets, and with decoded quoted pairs.
Atoms are returned as Atom s where s is a longest sequence of
 atomic characters (all characters which are neither special nor control
 characters nor delimiters for substructures). If the option
 Recognize_encoded_words is on, atoms which look like encoded words
 are returned as EncodedWord tokens. (Important note: Neither '?' nor
 '=' may be in specials for this functionality to work.)
After the mime_scanner has been created, you can scan the tokens by
 invoking scan_token which returns one token at a time, or by invoking
 scan_token_list which returns all following tokens.
There are two token types: s_token is the base type and is intended to
 be used for pattern matching. s_extended_token is a wrapper that 
 additionally contains information where the token occurs.
Scanning Using the Simple Interface
Instead of creating a mime_scanner and calling the scan functions,
 you may also invoke scan_structured_value. This function returns the
 list of tokens directly; however, it is restricted to s_token.
Examples
 scan_structured_value "user@domain.com" [ '@'; '.' ] []
   = [ Atom "user"; Special '@'; Atom "domain"; Special '.'; Atom "com" ]
  scan_structured_value "user @ domain . com" [ '@'; '.' ] []
   = [ Atom "user"; Special '@'; Atom "domain"; Special '.'; Atom "com" ]
  scan_structured_value "user(Do you know him?)@domain.com" [ '@'; '.' ] []
   = [ Atom "user"; Special '@'; Atom "domain"; Special '.'; Atom "com" ]
  scan_structured_value "user(Do you know him?)@domain.com" [ '@'; '.' ] 
     [ Return_comments ]
   = [ Atom "user"; Comment; Special '@'; Atom "domain"; Special '.'; 
       Atom "com" ]
  scan_structured_value "user (Do you know him?) @ domain . com" 
     [ '@'; '.'; ' ' ] []
   = [ Atom "user"; Special ' '; Special ' '; Special ' '; Special '@'; 
       Special ' '; Atom "domain";
       Special ' '; Special '.'; Special ' '; Atom "com" ]
  scan_structured_value "user (Do you know him?) @ domain . com" 
     [ '@'; '.'; ' ' ] [ Return_comments ]
   = [ Atom "user"; Special ' '; Comment; Special ' '; Special '@'; 
       Special ' '; Atom "domain";
       Special ' '; Special '.'; Special ' '; Atom "com" ]
  scan_structured_value "user @ domain . com" [ '@'; '.'; ' ' ] []
   = [ Atom "user"; Special ' '; Special '@'; Special ' '; Atom "domain";
       Special ' '; Special '.'; Special ' '; Atom "com" ]
  scan_structured_value "user(Do you know him?)@domain.com" ['@'; '.'; '(']
     []
   = [ Atom "user"; Special '('; Atom "Do"; Atom "you"; Atom "know";
       Atom "him?)"; Special '@'; Atom "domain"; Special '.'; Atom "com" ]
  scan_structured_value "\"My.name\"@domain.com" [ '@'; '.' ] []
   = [ QString "My.name"; Special '@'; Atom "domain"; Special '.';
       Atom "com" ]
  scan_structured_value "=?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?=" 
     [ ] [ ] 
   = [ Atom "=?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?=" ]
  scan_structured_value "=?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?=" 
     [ ] [ Recognize_encoded_words ] 
   = [ EncodedWord(("ISO-8859-1",""), "Q", "Keld_J=F8rn_Simonsen") ]
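The role of the specials list in the examples above can be reproduced with a heavily reduced stdlib-only sketch. It only knows atoms, specials and whitespace; quoted strings, comments, domain literals, control characters and encoded words are deliberately left out. The local token type and the name scan_atoms_sketch are hypothetical and independent of the library's s_token.

```ocaml
(* Reduced sketch: atoms and specials only; not Ocamlnet's scanner. *)
type token = Atom of string | Special of char

let scan_atoms_sketch (s : string) (specials : char list) : token list =
  let is_ws c = c = ' ' || c = '\t' || c = '\r' || c = '\n' in
  let n = String.length s in
  let rec go acc i =
    if i >= n then List.rev acc
    (* Specials are checked first, so ' ' in specials yields Special ' '. *)
    else if List.mem s.[i] specials then go (Special s.[i] :: acc) (i + 1)
    (* Whitespace not in specials delimits atoms but is not returned. *)
    else if is_ws s.[i] then go acc (i + 1)
    else begin
      (* An atom is the longest run of non-special, non-whitespace bytes. *)
      let j = ref i in
      while !j < n && not (is_ws s.[!j]) && not (List.mem s.[!j] specials) do
        incr j
      done;
      go (Atom (String.sub s i (!j - i)) :: acc) !j
    end
  in
  go [] 0

let () =
  let expected =
    [ Atom "user"; Special '@'; Atom "domain"; Special '.'; Atom "com" ] in
  assert (scan_atoms_sketch "user@domain.com" [ '@'; '.' ] = expected);
  assert (scan_atoms_sketch "user @ domain . com" [ '@'; '.' ] = expected)
```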
type s_token =
  | Atom of string
  | EncodedWord of ((string * string) * string * string)  (* Args: ((charset,lang),encoding,encoded_word) *)
  | QString of string
  | Control of char
  | Special of char
  | DomainLiteral of string
  | Comment
  | End
A token may be one of:
QString s: The quoted string s, i.e. a string between double
   quotes. Quoted pairs are already decoded in s.
Control c: The control character c (0-31, 127, 128-255)
Special c: The special character c, i.e. a character from
   the specials list
DomainLiteral s: The bracketed string s, i.e. a string between
   brackets.  Quoted pairs are already decoded in s.
Comment: A string between parentheses. This kind of token is only
   generated when the option Return_comments is in effect.
EncodedWord((charset,lang),encoding,encoded_word): An RFC-2047 style
   encoded word: charset is the name of the character set; lang is
   the language specifier (from RFC 2231) or ""; encoding is either
   "Q" or "B"; and encoded_word is the word encoded in charset and
   encoding. This kind of token is only generated when the option
   Recognize_encoded_words is in effect (if not, Atom is generated
   instead).
Atom s: A string which is neither quoted nor bracketed nor
   written in RFC 2047 notation, and which is not a control or special
   character, i.e. the "rest"
End: The end of the string
type s_option =
  | No_backslash_escaping  (* Do not handle backslashes in quoted strings and comments as escape
 characters; backslashes are handled as normal characters.
 For example: The wrong qstring *)
  | Return_comments  (* Comments are returned as token Comment *)
  | Recognize_encoded_words  (* Enables that encoded words are recognized and returned as
 EncodedWord *)
type s_extended_token
An opaque type containing the information of s_token plus:
val get_token : s_extended_token -> s_token
Return the s_token within the s_extended_token
val get_decoded_word : s_extended_token -> string
val get_charset : s_extended_token -> string
Return the decoded word (the contents of the word after decoding the "Q" or "B" representation), and the character set of the decoded word (uppercase).
These functions work not only for EncodedWord tokens. For the other
 kinds of token, get_decoded_word returns:
Atom: Returns the atom without decoding it
QString: Returns the characters inside the double quotes, and
   ensures that any quoted pairs are decoded
Control: Returns the one-character string
Special: Returns the one-character string
DomainLiteral: Returns the characters inside the brackets, and
   ensures that any quoted pairs are decoded
Comment: Returns ""
The function get_charset returns "US-ASCII" for them.
val get_language : s_extended_token -> string
Returns the language if the token is an EncodedWord, and "" for
 all other tokens.
val get_pos : s_extended_token -> int
Return the byte position where the token starts in the string (the first byte has position 0)
val get_line : s_extended_token -> int
Return the line number where the token starts (numbering usually begins with 1)
val get_column : s_extended_token -> int
Return the column of the line where the token starts (first column is number 0)
val get_length : s_extended_token -> int
Return the length of the token in bytes
val separates_adjacent_encoded_words : s_extended_token -> bool
True iff the current token is white space (i.e. Special ' ',
 Special '\t', Special '\r' or Special '\n') and the last
 non-white space token was EncodedWord and the next non-white
 space token will be EncodedWord.
The background of this function is that white space between encoded words does not have a meaning, and must be ignored by any application interpreting encoded words.
type mime_scanner
The opaque type of a scanner for structured values
val create_mime_scanner : specials:char list ->
       scan_options:s_option list ->
       ?pos:int -> ?line:int -> ?column:int -> string -> mime_scanner
Creates a new mime_scanner scanning the passed string.
specials : The list of characters recognized as special characters.
scan_options : The list of global options modifying the behaviour
   of the scanner
pos : The position of the byte where the scanner starts in the
   passed string. Defaults to 0.
line : The line number of this first byte. Defaults to 1.
column : The column number of this first byte. Defaults to 0.
Note for create_mime_scanner:
The optional parameters pos, line, column are intentionally placed after
 scan_options and before the string argument, so you can specify
 scanners by partially applying arguments to create_mime_scanner
 which are not yet connected with a particular string:
 let my_scanner_spec =
   create_mime_scanner ~specials:my_specials ~scan_options:my_options in
 ...
 let my_scanner = my_scanner_spec my_string in
 ...
val get_pos_of_scanner : mime_scanner -> int
val get_line_of_scanner : mime_scanner -> int
val get_column_of_scanner : mime_scanner -> int
Return the current position, line, and column of a mime_scanner.
 The primary purpose of these functions is to simplify switching
 from one mime_scanner to another within a string:
 let scanner1 = create_mime_scanner ... s in
 (* ... now scanning some tokens from s using scanner1 ... *)
 let scanner2 = create_mime_scanner ...
                  ~pos:(get_pos_of_scanner scanner1)
                  ~line:(get_line_of_scanner scanner1)
                  ~column:(get_column_of_scanner scanner1)
                  s in
 (* ... scanning more tokens from s using scanner2 ... *)
Restriction: These functions are not available if the option
 Recognize_encoded_words is on. The reason is that this option
 enables look-ahead scanning; please use the location of the last
 scanned token instead.
Note: To improve the performance of switching, it is recommended to
 create scanner specs in advance (see the example my_scanner_spec
 above).
val scan_token : mime_scanner ->
       s_extended_token * s_token
Returns the next token, or End if there are no more tokens. The
 token is returned both as extended and as normal token.
val scan_token_list : mime_scanner ->
       (s_extended_token * s_token) list
Returns all following tokens as a list (excluding End)
val scan_structured_value : string ->
       char list -> s_option list -> s_token list
This function is included for backwards compatibility, and for all cases not requiring extended tokens.
It scans the passed string according to the list of special characters and the list of options, and returns the list of all tokens.
val specials_rfc822 : char list
val specials_rfc2045 : char list
The sets of special characters defined by the RFCs 822 and 2045.
val scan_encoded_text_value : string -> s_extended_token list
Scans a "text" value. The returned token list contains only
 Special, Atom and EncodedWord tokens. 
 Spaces, TABs, CRs, LFs are returned (as Special) unless
 they occur between adjacent encoded words in which case
 they are suppressed. The characters '(', '[', and '"' are also
 returned as Special tokens, and are not interpreted as delimiters.
For instance, this function can be used to scan the "Subject" field of mail messages.
val scan_value_with_parameters : string -> s_option list -> string * (string * string) list
let name, params = scan_value_with_parameters s options:
 Scans values with annotations like
    name ; p1=v1 ; p2=v2 ; ...
 For example, MIME types like "text/plain;charset=ISO-8859-1" can
 be parsed.
The values may or may not be quoted. The characters ";", "=", and even "," are only accepted as part of values when they are quoted. On syntax errors, the function fails.
RFC 2231: This function supports some features of this RFC: Continued parameter values are concatenated. For example:
 Content-Type: message/external-body; access-type=URL;
    URL*0="ftp://";
    URL*1="cs.utk.edu/pub/moore/bulk-mailer/bulk-mailer.tar"
This is returned as:
"message/external-body",
   [ ("access-type", "URL");
     ("URL", "ftp://cs.utk.edu/pub/moore/bulk-mailer/bulk-mailer.tar") ]
However, encoded parameter values are not handled specially. The
 parameter
   title*=us-ascii'en-us'This%20is%20%2A%2A%2Afun%2A%2A%2A
 would be returned as
   ("title*", "us-ascii'en-us'This%20is%20%2A%2A%2Afun%2A%2A%2A").
 Use scan_value_with_parameters_ep instead (see below).
Raises Failure on syntax errors.
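The concatenation of continued parameter values can be pictured with the following stdlib-only sketch, which works on an already parsed (name, value) list. The names split_continuation and merge_continuations are hypothetical, the sketch assumes the sections of one parameter arrive in ascending order, and it is not how the library is implemented.

```ocaml
(* Split "URL*1" into ("URL", Some 1); other names give (name, None).
   A bare trailing '*' as in "title*" is not a continuation marker. *)
let split_continuation (name : string) : string * int option =
  match String.rindex_opt name '*' with
  | None -> (name, None)
  | Some i ->
    let suffix = String.sub name (i + 1) (String.length name - i - 1) in
    (match int_of_string_opt suffix with
     | Some n when n >= 0 -> (String.sub name 0 i, Some n)
     | _ -> (name, None))

(* Append each continued section to the value collected so far.
   Assumption: sections appear in ascending order. *)
let merge_continuations (params : (string * string) list)
  : (string * string) list =
  List.fold_left
    (fun acc (name, value) ->
       match split_continuation name with
       | (base, Some _) when List.mem_assoc base acc ->
         List.map (fun (n, v) -> if n = base then (n, v ^ value) else (n, v)) acc
       | (base, _) -> acc @ [ (base, value) ])
    [] params

let () =
  let raw =
    [ ("access-type", "URL");
      ("URL*0", "ftp://");
      ("URL*1", "cs.utk.edu/pub/moore/bulk-mailer/bulk-mailer.tar") ] in
  assert (merge_continuations raw =
          [ ("access-type", "URL");
            ("URL", "ftp://cs.utk.edu/pub/moore/bulk-mailer/bulk-mailer.tar") ]);
  (* Encoded parameters keep their "*" suffix, as described above. *)
  assert (merge_continuations [ ("title*", "x") ] = [ ("title*", "x") ])
```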
type s_param
The type of encoded parameters (RFC 2231)
val param_value : s_param -> string
val param_charset : s_param -> string
val param_language : s_param -> string
Return the decoded value of the parameter, the charset (uppercase),
 and the language.
 If the charset is not available, "" will be returned. 
 If the language is not available, "" will be returned.
val mk_param : ?charset:string -> ?language:string -> string -> s_param
Creates a parameter from a value (in decoded form). The parameter may have a charset and a language.
val print_s_param : Stdlib.Format.formatter -> s_param -> unit
Prints a parameter to the formatter (as toploop printer)
val scan_value_with_parameters_ep : string ->
       s_option list ->
       string * (string * s_param) list
let name, params = scan_value_with_parameters_ep s options:
 This version of the scanner copes with encoded parameters according
 to RFC 2231.
 Note: "ep" means "encoded parameters".
Example:
   doc.html;title*=us-ascii'en-us'This%20is%20%2A%2A%2Afun%2A%2A%2A
The parameter title would be returned as:
name: "title"
value: "This is ***fun***"
charset: "US-ASCII"
language: "en-us"
Raises Failure on syntax errors.
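How an RFC 2231 encoded value splits into charset, language and percent-encoded text can be shown with a stdlib-only sketch. The function name decode_ext_value is hypothetical; the result order (value, charset, language) is chosen here for illustration and the bytes are not converted between character sets.

```ocaml
(* Stdlib-only sketch of RFC 2231 extended-value decoding; not Ocamlnet's
   code. Returns (decoded value, uppercase charset, language). *)
let decode_ext_value (s : string) : string * string * string =
  match String.split_on_char '\'' s with
  | charset :: language :: (_ :: _ as rest) ->
    (* Anything after the second quote is the percent-encoded text. *)
    let enc = String.concat "'" rest in
    let n = String.length enc in
    let buf = Buffer.create n in
    let rec go i =
      if i >= n then ()
      else if enc.[i] = '%' && i + 2 < n then begin
        (* "%2A" becomes the byte 0x2A; bad hex digits raise Failure. *)
        let byte = int_of_string ("0x" ^ String.sub enc (i + 1) 2) in
        Buffer.add_char buf (Char.chr byte);
        go (i + 3)
      end else begin
        Buffer.add_char buf enc.[i];
        go (i + 1)
      end
    in
    go 0;
    (Buffer.contents buf, String.uppercase_ascii charset, language)
  | _ -> failwith "decode_ext_value: expected charset'language'text"

let () =
  assert (decode_ext_value "us-ascii'en-us'This%20is%20%2A%2A%2Afun%2A%2A%2A"
          = ("This is ***fun***", "US-ASCII", "en-us"))
```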
val scan_mime_type : string -> s_option list -> string * (string * string) list
let name, params = scan_mime_type s options:
 Scans MIME types like
    text/plain; charset=iso-8859-1
 The name of the type and the names of the parameters are converted
 to lower case.
Raises Failure on syntax errors.
val scan_mime_type_ep : string ->
       s_option list ->
       string * (string * s_param) list
let name, params = scan_mime_type_ep s options:
 This version copes with RFC-2231-encoded parameters.
Raises Failure on syntax errors.
val split_mime_type : string -> string * string
let (main_type, sub_type) = split_mime_type content_type:
 Splits the MIME type into main and sub type, for example
  split_mime_type "text/plain" = ("text", "plain") .
 The returned strings are always lowercase.
Raises Failure on syntax errors.
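The splitting rule can be approximated in a few lines of stdlib OCaml. This is a sketch under assumptions: split_mime_type_sketch is a hypothetical name, and it accepts only the bare main/sub form, whereas the library function may tolerate more input variants.

```ocaml
(* Stdlib-only approximation; not Ocamlnet's implementation. *)
let split_mime_type_sketch (ct : string) : string * string =
  match String.split_on_char '/' (String.trim ct) with
  | [ main; sub ] when main <> "" && sub <> "" ->
    (* Both parts are returned in lowercase. *)
    (String.lowercase_ascii main, String.lowercase_ascii sub)
  | _ -> failwith "split_mime_type_sketch: not of the form main/sub"

let () =
  assert (split_mime_type_sketch "Text/Plain" = ("text", "plain"));
  assert (try ignore (split_mime_type_sketch "text"); false
          with Failure _ -> true)
```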
exception Line_too_long
Raised when the hard limit of the line length is exceeded
val write_value : ?maxlen1:int ->
       ?maxlen:int ->
       ?hardmaxlen1:int ->
       ?hardmaxlen:int ->
       ?fold_qstring:bool ->
       ?fold_literal:bool ->
       ?unused:int Stdlib.ref ->
       ?hardunused:int Stdlib.ref ->
       Netchannels.out_obj_channel -> s_token list -> unit
Writes the list of s_token to the out_obj_channel. The value
 is optionally folded into several lines while writing, but this
 is off by default. To enable folding, pass both maxlen1 and
 maxlen:
 The maxlen1 parameter specifies the length of the first line
 to write, the maxlen parameter specifies the length of the
 other lines.
If enabled, folding tries to ensure that the value is written
 in several lines that are no longer than specified by
 maxlen1 and maxlen. The value is split into lines by inserting
 "folding space" at certain locations (which is usually a linefeed
 followed by a space character, see below). The following
 table specifies between which tokens folding may happen:
               +=========================================================+
 1st   \   2nd | Atom | QString | DLiteral | EncWord | Special | Spec ' '|
 ==============+======+=========+==========+=========+=========+=========+
          Atom | FS   |  FS     |   FS     |   FS    |    -    |    F    |
       QString | FS   |  FS     |   FS     |   FS    |    -    |    F    |
 DomainLiteral | FS   |  FS     |   FS     |   FS    |    -    |    F    |
   EncodedWord | FS   |  FS     |   FS     |   FS    |    -    |    F    |
       Special | -    |  -      |   -      |   -     |    -    |    F    |
   Special ' ' | -    |  -      |   -      |   -     |    -    |    -    |
 ==============+======+=========+==========+=========+=========+=========+
The table shows between which two types of tokens a space or a folding space is inserted:
FS: folding space
F:  linefeed without extra space
-:  nothing can be inserted here
Folding space is "\n ", i.e. only LF, not CRLF is used as end-of-line
 character. The function write_header will convert these LF to CRLF
 if needed.
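The effect of maxlen1, maxlen and the folding space can be illustrated with a greedy stdlib-only sketch that folds a list of atom strings. Everything here is an assumption for illustration: fold_atoms_sketch is a hypothetical name, the greedy strategy is not necessarily what write_value does, and the leading space of a continuation line is counted against maxlen.

```ocaml
(* Greedy folding sketch over plain atom strings; not Ocamlnet's code. *)
let fold_atoms_sketch ~maxlen1 ~maxlen (atoms : string list) : string =
  let buf = Buffer.create 80 in
  (* State: characters used on the current line, its soft limit, and
     whether the next atom is the very first one. *)
  let _ =
    List.fold_left
      (fun (used, limit, first) atom ->
         let n = String.length atom in
         if first then (Buffer.add_string buf atom; (n, limit, false))
         else if used + 1 + n <= limit then begin
           (* Fits on the current line: separate with a plain space. *)
           Buffer.add_char buf ' ';
           Buffer.add_string buf atom;
           (used + 1 + n, limit, false)
         end else begin
           (* Does not fit: insert folding space (LF followed by a space). *)
           Buffer.add_string buf "\n ";
           Buffer.add_string buf atom;
           (1 + n, maxlen, false)
         end)
      (0, maxlen1, true) atoms
  in
  Buffer.contents buf

let () =
  assert (fold_atoms_sketch ~maxlen1:10 ~maxlen:8
            [ "alpha"; "beta"; "gamma"; "delta" ]
          = "alpha beta\n gamma\n delta")
```

An atom longer than the limit is still printed whole, which matches the soft nature of maxlen1 and maxlen described above.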
Special '\t' is handled like Special ' '. Control characters are just
 printed, without folding. Comments, however, are substituted by 
 either space or folding space. The token End is ignored.
Furthermore, folding may also happen within tokens:
Atom, Control, and Special are never split up into parts.
   They are simply printed.
EncodedWords, however, are reformatted. This especially means:
   adjacent encoded words are first concatenated if possible
   (same character set, same encoding, same language), and then
   split up into several pieces with optimally chosen lengths.
   Note: Because this function gets s_token as input and not
   s_extended_token, it is not known whether Special ' ' tokens
   (or other whitespace) between adjacent EncodedWords must be
   ignored. Because of this, write_value only reformats adjacent encoded
   words when there is no whitespace between them.
QString may be split up in a special way unless fold_qstring
   is set to false. For example, "One Two  Three" may be split up into
   three lines "One\n Two\n \ Three". Because some header fields
   explicitly forbid folding of quoted strings, it is possible to
   set ~fold_qstring:false (it is true by default).
   Note: Software should not rely on the different types of
   whitespace (especially space and TAB) remaining intact at the
   beginning of a line. Furthermore, it may also happen that
   additional whitespace is added at the end of a line by the
   transport layer.
DomainLiteral: These are handled like QString. The parameter
   ~fold_literal:false turns folding off if it must be prevented;
   it is true by default.
Comment: Comments are effectively omitted! Instead of Comment,
   a space or folding space is printed. However, you can output comments
   by passing sequences like  Special '('; ...; Special ')' .
It is possible to get the actual number of characters back that
 can still be printed into the last line without making the line
 too long. Pass an int ref as unused to get this value (it may
 be negative!). Pass an
 int ref as hardunused to get the number of characters that may
 be printed until the hard limit is exceeded.
The function normally does not fail when a line becomes too long,
 i.e. it exceeds maxlen1 or maxlen.
 However, it is possible to specify a hard maximum length
 (hardmaxlen1 and hardmaxlen). If these are exceeded, the function
 will raise Line_too_long.
For electronic mail, a maxlen of 78 and a hardmaxlen of 998 is
 recommended.
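The difference between the soft and the hard limit can be illustrated with a much simplified folder. This is only a sketch under assumptions: fold_words is a hypothetical helper, the exception is a local stand-in for the library's Line_too_long, and the separate first-line limits (maxlen1, hardmaxlen1) are ignored.

```ocaml
(* Local stand-in for the exception raised by the library. *)
exception Line_too_long

(* Fold a list of words into lines. A folding space "\n " is emitted before
   a word that would push the line past the soft limit [maxlen]. A line that
   would exceed [hardmaxlen] raises [Line_too_long]. The result is the text
   plus the number of characters still unused in the last line with respect
   to [maxlen]; this number may be negative. *)
let fold_words ~maxlen ~hardmaxlen words =
  let buf = Buffer.create 80 in
  let col = ref 0 in
  List.iteri
    (fun i w ->
       let n = String.length w in
       if i > 0 then begin
         if !col + 1 + n > maxlen then begin
           Buffer.add_string buf "\n ";  (* folding space *)
           col := 1                      (* continuation line starts with a space *)
         end else begin
           Buffer.add_char buf ' ';
           incr col
         end
       end;
       if !col + n > hardmaxlen then raise Line_too_long;
       Buffer.add_string buf w;
       col := !col + n)
    words;
  (Buffer.contents buf, maxlen - !col)
```

A single word longer than maxlen is still printed, leaving a negative unused count; only exceeding hardmaxlen is treated as an error.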
Known Problems:
Netconversion. You can assume that UTF-8 and UTF-16 always
   work. If the character set is not known the reformatter may
   split the string at wrong positions.
Malformed_code.
   This is only done in some special cases, however.
Further Tips:
EncodedWord. The reformatter takes care to
   fold the word into several lines.
val param_tokens : ?maxlen:int ->
       (string * s_param) list -> s_token list
Formats a parameter list. For example, 
 [ "a", "b"; "c", "d" ] is transformed to the token sequence
 corresponding to ; a=b; c=d.
 If maxlen is specified, it is ensured that the individual
 parameter (e.g. "a=b;") is not longer than maxlen-1, such that
 it will fit into a line with maximum length maxlen.
 By default, no maximum length is guaranteed.
 If maxlen is passed, or if a parameter specifies a character
 set or language, the encoding of RFC 2231 will be applied. If these
 conditions are not met, the parameters will be encoded traditionally.
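To make the shape of the output concrete, here is a sketch of the string that the token sequence corresponds to in the traditional case. It works on plain string pairs rather than s_param values, format_params is a hypothetical name, and it assumes that "traditional" means the token or quoted-string syntax of RFC 2045; RFC 2231 encoding is not covered.

```ocaml
(* Characters allowed in an unquoted MIME token: printable ASCII without
   space and without the tspecials of RFC 2045. *)
let is_token_char c =
  c > ' ' && c < '\127' && not (String.contains "()<>@,;:\\\"/[]?=" c)

(* A value must be quoted if it is empty or contains a non-token character. *)
let needs_quoting v =
  let ok = ref (v <> "") in
  String.iter (fun c -> if not (is_token_char c) then ok := false) v;
  not !ok

(* Wrap a value in double quotes, escaping '"' and '\\'. *)
let quote v =
  let b = Buffer.create (String.length v + 2) in
  Buffer.add_char b '"';
  String.iter
    (fun c ->
       if c = '"' || c = '\\' then Buffer.add_char b '\\';
       Buffer.add_char b c)
    v;
  Buffer.add_char b '"';
  Buffer.contents b

(* Render each parameter as "; name=value". *)
let format_params params =
  String.concat ""
    (List.map
       (fun (name, value) ->
          "; " ^ name ^ "=" ^ (if needs_quoting value then quote value else value))
       params)
```

With this sketch, format_params [ "a", "b"; "c", "d" ] gives the string "; a=b; c=d".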
val split_uri : string -> s_token list
Splits a long URI according to the algorithm of RFC 2017. The input string must only contain 7 bit characters, and must, if necessary, already be URL-encoded.
val scan_multipart_body : string ->
       start_pos:int ->
       end_pos:int -> boundary:string -> ((string * string) list * string) list
let [params1, value1; params2, value2; ...]
   = scan_multipart_body s start_pos end_pos boundary:
Scans the string s that is the body of a multipart message.
 The multipart message begins at position start_pos in s, and 
 end_pos is the position
 of the character following the message. In boundary the boundary string
 must be passed (this is the "boundary" parameter of the multipart
 MIME type, e.g. multipart/mixed;boundary="some string" ).
The return value is the list of the parts, where each part
 is returned as pair (params, value). The left component params
 is the list of name/value pairs of the header of the part. The
 right component is the raw content of the part, i.e. if the part
 is encoded ("content-transfer-encoding"), the content is returned
 in the encoded representation. The caller is responsible for decoding
 the content.
The material before the first boundary and after the last boundary is not returned.
Multipart Messages
The MIME standard defines a way to group several message parts to
 a larger message (for E-Mails this technique is known as "attaching"
 files to messages); these are the so-called multipart messages.
 Such messages are recognized by the major type string "multipart",
 e.g. multipart/mixed or multipart/form-data. Multipart types MUST
 have a boundary parameter because boundaries are essential for the
 representation.
Multipart messages have a format like (where "_" denotes empty lines):
 ...Header...
 Content-type: multipart/xyz; boundary="abc"
 ...Header...
 _
 Body begins here ("prologue")
 --abc
 ...Header part 1...
 _
 ...Body part 1...
 --abc
 ...Header part 2...
 _
 ...Body part 2
 --abc
 ...
 --abc--
 Epilogue
The parts are separated by boundary lines which begin with "--" and the string passed as boundary parameter. (Note that arbitrary text may follow "--abc" on boundary lines.) The boundary is chosen such that it does not occur as prefix of any line of the inner parts of the message.
The parts are again MIME messages, with header and body. Note that it is explicitly allowed that the parts are themselves multipart messages.
The texts before the first boundary and after the last boundary are ignored.
Note that multipart messages as a whole MUST NOT be encoded. Only the PARTS of the messages may be encoded (if they are not multipart messages themselves).
Please read RFC 2046 if you want to know the gory details of this brain-dead format.
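The layout described here can be illustrated with a small, self-contained splitter. It is a simplified sketch, not the algorithm used by scan_multipart_body: split_parts and its helpers are hypothetical names, the whole body is held in memory, and each part is returned as raw text (header, empty line, body) with LF line ends.

```ocaml
let starts_with ~prefix s =
  String.length s >= String.length prefix
  && String.sub s 0 (String.length prefix) = prefix

(* Drop a trailing CR so that CR/LF and plain LF line ends are treated alike. *)
let chomp_cr line =
  let n = String.length line in
  if n > 0 && line.[n - 1] = '\r' then String.sub line 0 (n - 1) else line

(* Return the raw parts found between boundary lines. The prologue and the
   epilogue are discarded. *)
let split_parts ~boundary body =
  let delim = "--" ^ boundary in
  let final = delim ^ "--" in
  let lines = List.map chomp_cr (String.split_on_char '\n' body) in
  (* [current] is None in the prologue, Some reversed_lines inside a part. *)
  let finish current parts =
    match current with
    | None -> parts
    | Some ls -> String.concat "\n" (List.rev ls) :: parts
  in
  let rec loop current parts = function
    | [] -> List.rev (finish current parts)   (* no final boundary: tolerated here *)
    | l :: _ when starts_with ~prefix:final l ->
        List.rev (finish current parts)        (* everything after it is epilogue *)
    | l :: rest when starts_with ~prefix:delim l ->
        loop (Some []) (finish current parts) rest
    | l :: rest ->
        (match current with
         | None -> loop None parts rest        (* prologue line: skipped *)
         | Some ls -> loop (Some (l :: ls)) parts rest)
  in
  loop None [] lines
```

Boundary lines are matched by prefix, which agrees with the remark that arbitrary text may follow the boundary on such a line. Splitting each returned part at its first empty line would give the part header and the part body.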
val scan_multipart_body_and_decode : string ->
       start_pos:int ->
       end_pos:int -> boundary:string -> ((string * string) list * string) list
Same as scan_multipart_body, but decodes the bodies of the parts
 if they are encoded using the methods "base64" or "quoted-printable".
 Fails if an unknown encoding is used.
val scan_multipart_body_from_netstream : Netstream.in_obj_stream ->
       boundary:string ->
       create:((string * string) list -> 'a) ->
       add:('a -> Netstream.in_obj_stream -> int -> int -> unit) ->
       stop:('a -> unit) -> unit
scan_multipart_body_from_netstream s boundary create add stop:
Reads the MIME message from the netstream s block by block. The
 parts are delimited by the boundary.
Once a new part is detected and begins, the function create is
 called with the MIME header as argument. The result p of this function
 may be of any type.
For every chunk of the part that is being read, the function add
 is invoked: add p s k n.
Here, p is the value returned by the create invocation for the
 current part. s is the netstream. The current window of s contains
 the read chunk completely; the chunk begins at position k of the
 window (relative to the beginning of the window) and has a length
 of n bytes.
When the part has been fully read, the function stop is
 called with p as argument.
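The callback protocol can be sketched without a real netstream. In the following illustration feed_part is a hypothetical driver that plays the role of the scanner for a single part, and a plain string stands in for the current window of the netstream. It also mirrors the documented behaviour that stop is still called when an exception occurs after create has succeeded, and that the exception is then re-raised.

```ocaml
(* Drive the callbacks for one part. [header] and [chunks] stand in for what
   the real function would read from the netstream; each chunk is given as
   (window, k, n), i.e. n bytes starting at offset k of the window. *)
let feed_part ~create ~add ~stop header chunks =
  let p = create header in
  (try List.iter (fun (window, k, n) -> add p window k n) chunks
   with e -> stop p; raise e);  (* stop still runs, then the exception is re-raised *)
  stop p

(* Example callbacks: accumulate every part into a Buffer. *)
let collected : ((string * string) list * string) list ref = ref []
let create header = (header, Buffer.create 64)
let add (_, buf) window k n = Buffer.add_substring buf window k n
let stop (header, buf) = collected := (header, Buffer.contents buf) :: !collected

let () =
  feed_part ~create ~add ~stop
    [ "content-type", "text/plain" ]
    [ ("xxHello, ", 2, 7); ("world!yy", 0, 6) ]
```

The add callback copies only the bytes from offset k up to k+n-1 of the window, because the window may contain data that does not belong to the chunk.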
That means, for every part the following is executed:
let p = create h
add p s k1 n1
add p s k2 n2
...
add p s kN nN
stop p
Important Precondition:
s must be at least
   String.length boundary + 4
Exceptions:
create, add, stop.
p is being read, and the
   create function has already been called (successfully), the
   stop function is also called (you have the chance to close files).
   The exception is re-raised after stop returns.
val read_multipart_body : (Netstream.in_obj_stream -> 'a) ->
       string -> Netstream.in_obj_stream -> 'a list
This is the "next generation" multipart message parser. It is called as follows:
let parts = read_multipart_body f boundary s
As precondition, the current position of the stream s must be at
 the beginning of the message body. The string boundary must
 be the message boundary (without "--"). The function f is called
 for every message part, and the resulting list parts is the
 concatenation of the values returned by f.
The stream passed to f is a substream of s that begins at the
 first byte of the header of the message part. The function f
 can read data from the substream as necessary. The substream
 terminates at the end of the message part. This means that f can simply
 read the data of the substream from the beginning to the end. It is
 not necessary that f reads the substream until EOF, however.
After all parts have been read, the trailing material of stream s 
 is skipped until EOF of s is reached.
val create_boundary : ?random:string list -> ?nr:int -> unit -> string
Creates a boundary string that can be used to separate multipart messages. The string is 63 characters long and has the following "features":
nr, so you can safely distinguish between
   several boundaries occurring in the same MIME body if you
   assign different nr.
random, and influenced
   by the current GC state.