Class MimeUtility


  • public class MimeUtility
    extends Object
    This is a utility class that provides various MIME related functionality.

    There is a set of methods to encode and decode MIME headers as per RFC 2047. Note that, in general, these methods are not needed when using methods such as setSubject and setRecipients; Jakarta Mail will automatically encode and decode data when using these "higher level" methods. The methods below are only needed when manipulating raw MIME headers using the setHeader and getHeader methods. A brief description of handling such headers is given below:

    RFC 822 mail headers must contain only US-ASCII characters. Headers that contain non US-ASCII characters must be encoded so that they contain only US-ASCII characters. Basically, this process involves using either BASE64 or QP to encode certain characters. RFC 2047 describes this in detail.

    In Java, Strings contain (16 bit) Unicode characters. ASCII is a subset of Unicode (and occupies the range 0 - 127). A String that contains only ASCII characters is already mail-safe. If the String contains non US-ASCII characters, it must be encoded. An additional complexity in this step is that since Unicode is not yet a widely used charset, one might want to first charset-encode the String into another charset and then do the transfer-encoding.

    Note that to get the actual bytes of a mail-safe String (say, for sending over SMTP), one must do

            byte[] bytes = string.getBytes("iso-8859-1");

    The setHeader and addHeader methods on MimeMessage and MimeBodyPart assume that the given header values are Unicode strings that contain only US-ASCII characters. Hence the callers of those methods must ensure that the values they pass do not contain non US-ASCII characters. The methods in this class help do this.
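
    As a rough illustration of that caller-side check, the sketch below tests whether a String is already mail-safe and then obtains its bytes. It uses only JDK classes; the class MailSafeCheck and the method isMailSafe are hypothetical names and are not part of Jakarta Mail.

```java
import java.nio.charset.StandardCharsets;

final class MailSafeCheck {
    // Hypothetical helper: true when every char lies in the US-ASCII range 0 - 127.
    static boolean isMailSafe(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 127) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String plain = "Quarterly report";
        String accented = "R\u00e9sum\u00e9";
        System.out.println(isMailSafe(plain));
        System.out.println(isMailSafe(accented));
        // For a mail-safe String, each char maps to exactly one byte.
        byte[] bytes = plain.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(bytes.length == plain.length());
    }
}
```

    A value that fails this check would need to go through one of the encode methods of this class before being passed to setHeader or addHeader.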

    The getHeader family of methods on MimeMessage and MimeBodyPart return the raw header values. These might be encoded as per RFC 2047, and if so, must be decoded into Unicode Strings. The methods in this class help to do this.

    Several System properties control strict conformance to the MIME spec. Note that these are not session properties but must be set globally as System properties.

    The mail.mime.decodetext.strict property controls decoding of MIME encoded words. The MIME spec requires that encoded words start at the beginning of a whitespace separated word. Some mailers incorrectly include encoded words in the middle of a word. If the mail.mime.decodetext.strict System property is set to "false", an attempt will be made to decode these illegal encoded words. The default is true.

    The mail.mime.encodeeol.strict property controls the choice of Content-Transfer-Encoding for MIME parts that are not of type "text". Often such parts will contain textual data for which an encoding that allows normal end of line conventions is appropriate. In rare cases, such a part will appear to contain entirely textual data, but will require an encoding that preserves CR and LF characters without change. If the mail.mime.encodeeol.strict System property is set to "true", such an encoding will be used when necessary. The default is false.

    In addition, the mail.mime.charset System property can be used to specify the default MIME charset to use for encoded words and text parts that don't otherwise specify a charset. Normally, the default MIME charset is derived from the default Java charset, as specified in the file.encoding System property. Most applications will have no need to explicitly set the default MIME charset. In cases where the default MIME charset to be used for mail messages is different than the charset used for files stored on the system, this property should be set.

    The current implementation also supports the following System property.

    The mail.mime.ignoreunknownencoding property controls whether unknown values in the Content-Transfer-Encoding header, as passed to the decode method, cause an exception. If set to "true", unknown values are ignored and 8bit encoding is assumed. Otherwise, unknown values cause a MessagingException to be thrown.
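
    Because these are System properties rather than session properties, they can be set on the command line with -D or programmatically. The sketch below shows the programmatic form. The class name MimeSystemProperties and the chosen values are illustrative only, and setting the properties before any mail class is touched is a precaution based on the assumption that an implementation may read them only once.

```java
final class MimeSystemProperties {
    // Sets the global properties named in the text above.
    // The values are examples, not recommendations.
    static void configure() {
        System.setProperty("mail.mime.decodetext.strict", "false");
        System.setProperty("mail.mime.encodeeol.strict", "true");
        System.setProperty("mail.mime.charset", "UTF-8");
        System.setProperty("mail.mime.ignoreunknownencoding", "true");
    }

    public static void main(String[] args) {
        configure();
        System.out.println(System.getProperty("mail.mime.charset"));
    }
}
```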

    Author:
    John Mani, Bill Shannon
    • Method Detail

      • getEncoding

        public static String getEncoding​(DataSource ds)
        Get the Content-Transfer-Encoding that should be applied to the input stream of this DataSource, to make it mail-safe.

        The algorithm used here is:

        • If the DataSource implements EncodingAware, ask it what encoding to use. If it returns non-null, return that value.
        • If the primary type of this datasource is "text" and if all the bytes in its input stream are US-ASCII, then the encoding is StreamProvider.BIT7_ENCODER. If more than half of the bytes are non-US-ASCII, then the encoding is StreamProvider.BASE_64_ENCODER. If less than half of the bytes are non-US-ASCII, then the encoding is StreamProvider.QUOTED_PRINTABLE_ENCODER.
        • If the primary type of this datasource is not "text", then if all the bytes of its input stream are US-ASCII, the encoding is StreamProvider.BIT7_ENCODER. If there is even one non-US-ASCII character, the encoding is StreamProvider.BASE_64_ENCODER.
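
        The byte-counting part of the algorithm above can be sketched with JDK classes alone. This is not the library's code: the class EncodingChoice is hypothetical, the EncodingAware step is omitted, the string values stand in for the StreamProvider constants and are assumed, and treating an exact tie as base64 is also an assumption because the description does not cover that case.

```java
import java.nio.charset.StandardCharsets;

final class EncodingChoice {
    // Assumed string values for the encodings named in the text.
    static final String BIT7 = "7bit";
    static final String QP = "quoted-printable";
    static final String BASE64 = "base64";

    // isText: whether the primary type of the data is "text".
    static String choose(byte[] data, boolean isText) {
        int nonAscii = 0;
        for (byte b : data) {
            if ((b & 0xff) > 127) {
                nonAscii++;
            }
        }
        if (nonAscii == 0) {
            return BIT7;                 // all bytes are US-ASCII
        }
        if (!isText) {
            return BASE64;               // even one non-ASCII byte in non-text data
        }
        int ascii = data.length - nonAscii;
        // Mostly ASCII text reads better as quoted-printable.
        // Assumption: an exact tie falls through to base64.
        return ascii > nonAscii ? QP : BASE64;
    }

    public static void main(String[] args) {
        byte[] text = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(choose(text, true));
    }
}
```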
        Parameters:
        ds - the DataSource
        Returns:
        the encoding. This is either StreamProvider.BIT7_ENCODER, StreamProvider.QUOTED_PRINTABLE_ENCODER or StreamProvider.BASE_64_ENCODER
      • getEncoding

        public static String getEncoding​(DataHandler dh)
        Same as getEncoding(DataSource) except that instead of reading the data from an InputStream it uses the writeTo method to examine the data. This is more efficient in the common case of a DataHandler created with an object and a MIME type (for example, a "text/plain" String) because all the I/O is done in this thread. In the case requiring an InputStream the DataHandler uses a thread, a pair of pipe streams, and the writeTo method to produce the data.

        Parameters:
        dh - the DataHandler
        Returns:
        the Content-Transfer-Encoding
        Since:
        JavaMail 1.2
      • decode

        public static InputStream decode​(InputStream is,
                                         String encoding)
                                  throws MessagingException
        Decode the given input stream. The returned InputStream is the decoded input stream. All the encodings defined in RFC 2045 are supported here. They include StreamProvider.BASE_64_ENCODER, StreamProvider.QUOTED_PRINTABLE_ENCODER, StreamProvider.BIT7_ENCODER, StreamProvider.BIT8_ENCODER, and StreamProvider.BINARY_ENCODER. In addition, StreamProvider.UU_ENCODER is also supported.

        In the current implementation, if the mail.mime.ignoreunknownencoding system property is set to "true", unknown encoding values are ignored and the original InputStream is returned.
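
        The dispatch described above can be pictured with the following JDK-only sketch. It is a stand-in rather than the library's implementation: the class DecodeSketch and its methods are hypothetical, the encoding names are assumed string values, quoted-printable and uuencode are left out because the JDK has no decoder for them, and IllegalArgumentException takes the place of MessagingException.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Locale;

final class DecodeSketch {
    static InputStream decode(InputStream in, String encoding) {
        switch (encoding.toLowerCase(Locale.ROOT)) {
            case "base64":
                // Wrap the stream so that bytes are decoded as they are read.
                return Base64.getMimeDecoder().wrap(in);
            case "7bit":
            case "8bit":
            case "binary":
                return in; // identity encodings: nothing to undo
            default:
                if (Boolean.getBoolean("mail.mime.ignoreunknownencoding")) {
                    return in; // unknown value ignored, original stream returned
                }
                throw new IllegalArgumentException("Unknown encoding: " + encoding);
        }
    }

    // Helper for the demo: drains a stream into an ASCII String.
    static String readAscii(InputStream in) {
        try (InputStream stream = in) {
            return new String(stream.readAllBytes(), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] raw = "SGVsbG8=".getBytes(StandardCharsets.US_ASCII);
        System.out.println(readAscii(decode(new ByteArrayInputStream(raw), "base64")));
    }
}
```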

        Parameters:
        is - input stream
        encoding - the encoding of the stream.
        Returns:
        decoded input stream.
        Throws:
        MessagingException - if the encoding is unknown
      • encode

        public static OutputStream encode​(OutputStream os,
                                          String encoding)
                                   throws MessagingException
        Wrap an encoder around the given output stream. All the encodings defined in RFC 2045 are supported here. They include StreamProvider.BASE_64_ENCODER, StreamProvider.QUOTED_PRINTABLE_ENCODER, StreamProvider.BIT7_ENCODER, StreamProvider.BIT8_ENCODER and StreamProvider.BINARY_ENCODER. In addition, StreamProvider.UU_ENCODER is also supported.
        Parameters:
        os - output stream
        encoding - the encoding of the stream.
        Returns:
        output stream that applies the specified encoding.
        Throws:
        MessagingException - if the encoding is unknown
      • encode

        public static OutputStream encode​(OutputStream os,
                                          String encoding,
                                          String filename)
                                   throws MessagingException
        Wrap an encoder around the given output stream. All the encodings defined in RFC 2045 are supported here. They include StreamProvider.BASE_64_ENCODER, StreamProvider.QUOTED_PRINTABLE_ENCODER, StreamProvider.BIT7_ENCODER, StreamProvider.BIT8_ENCODER and StreamProvider.BINARY_ENCODER. In addition, StreamProvider.UU_ENCODER is also supported. The filename parameter is used with the StreamProvider.UU_ENCODER encoding and is included in the encoded output.
        Parameters:
        os - output stream
        encoding - the encoding of the stream.
        filename - name for the file being encoded (only used with uuencode)
        Returns:
        output stream that applies the specified encoding.
        Throws:
        MessagingException - for unknown encodings
        Since:
        JavaMail 1.2
      • encodeText

        public static String encodeText​(String text)
                                 throws UnsupportedEncodingException
        Encode an RFC 822 "text" token into mail-safe form as per RFC 2047.

        The given Unicode string is examined for non US-ASCII characters. If the string contains only US-ASCII characters, it is returned as-is. If the string contains non US-ASCII characters, it is first character-encoded using the platform's default charset, then transfer-encoded using either the B or Q encoding. The resulting bytes are then returned as a Unicode string containing only ASCII characters.

        Note that this method should be used to encode only "unstructured" RFC 822 headers.

        Example of usage:

        
          MimePart part = ...
          String rawvalue = "FooBar Mailer, Japanese version 1.1";
          try {
            // If we know for sure that rawvalue contains only US-ASCII
            // characters, we can skip the encoding part
            part.setHeader("X-mailer", MimeUtility.encodeText(rawvalue));
          } catch (UnsupportedEncodingException e) {
            // encoding failure
          } catch (MessagingException me) {
            // setHeader() failure
          }
        
         

        Parameters:
        text - Unicode string
        Returns:
        Unicode string containing only US-ASCII characters
        Throws:
        UnsupportedEncodingException - if the encoding fails
      • encodeText

        public static String encodeText​(String text,
                                        String charset,
                                        String encoding)
                                 throws UnsupportedEncodingException
        Encode an RFC 822 "text" token into mail-safe form as per RFC 2047.

        The given Unicode string is examined for non US-ASCII characters. If the string contains only US-ASCII characters, it is returned as-is. If the string contains non US-ASCII characters, it is first character-encoded using the specified charset, then transfer-encoded using either the B or Q encoding. The resulting bytes are then returned as a Unicode string containing only ASCII characters.

        Note that this method should be used to encode only "unstructured" RFC 822 headers.
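
        To make the two steps concrete (character-encode with the chosen charset, then transfer-encode), here is a minimal JDK-only sketch of the "B" variant. The class EncodedWordB and the method encodeB are hypothetical. Unlike a full implementation, the sketch does not split long input into several encoded-words to respect the length limit that RFC 2047 places on each one.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class EncodedWordB {
    static String encodeB(String text, Charset charset) {
        // Pure US-ASCII input is already mail-safe and is returned as-is.
        if (text.chars().allMatch(c -> c < 128)) {
            return text;
        }
        byte[] bytes = text.getBytes(charset);                    // character-encode
        String b64 = Base64.getEncoder().encodeToString(bytes);   // transfer-encode
        // RFC 2047 encoded-word layout: =?charset?encoding?encoded-text?=
        return "=?" + charset.name() + "?B?" + b64 + "?=";
    }

    public static void main(String[] args) {
        System.out.println(encodeB("h\u00e9llo", StandardCharsets.UTF_8));
    }
}
```

        The result contains only US-ASCII characters, so it can be passed to setHeader unchanged.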

        Parameters:
        text - the header value
        charset - the charset. If this parameter is null, the platform's default charset is used.
        encoding - the encoding to be used. Currently supported values are "B" and "Q". If this parameter is null, then the "Q" encoding is used if most of the characters to be encoded are in the ASCII charset, otherwise "B" encoding is used.
        Returns:
        Unicode string containing only US-ASCII characters
        Throws:
        UnsupportedEncodingException - if the charset conversion failed.
      • decodeText

        public static String decodeText​(String etext)
                                 throws UnsupportedEncodingException
        Decode "unstructured" headers, that is, headers that are defined as '*text' as per RFC 822.

        The string is decoded using the algorithm specified in RFC 2047, Section 6.1. If the charset-conversion fails for any sequence, an UnsupportedEncodingException is thrown. If the String is not an RFC 2047 style encoded header, it is returned as-is.

        Example of usage:

        
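          MimePart part = ...
          String rawvalue = null;
          String value = null;
          try {
            // getHeader returns null when the header is absent
            String[] headers = part.getHeader("X-mailer");
            if (headers != null && headers.length > 0) {
              rawvalue = headers[0];
              value = MimeUtility.decodeText(rawvalue);
            }
          } catch (UnsupportedEncodingException e) {
            // Don't care; fall back to the raw value
            value = rawvalue;
          } catch (MessagingException me) {
            // getHeader() failure; value stays null
          }

          return value;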
        
         

        Parameters:
        etext - the possibly encoded value
        Returns:
        the decoded text
        Throws:
        UnsupportedEncodingException - if the charset conversion failed.
      • encodeWord

        public static String encodeWord​(String word)
                                 throws UnsupportedEncodingException
        Encode an RFC 822 "word" token into mail-safe form as per RFC 2047.

        The given Unicode string is examined for non US-ASCII characters. If the string contains only US-ASCII characters, it is returned as-is. If the string contains non US-ASCII characters, it is first character-encoded using the platform's default charset, then transfer-encoded using either the B or Q encoding. The resulting bytes are then returned as a Unicode string containing only ASCII characters.

        This method is meant to be used when creating RFC 822 "phrases". The InternetAddress class, for example, uses this to encode its 'phrase' component.

        Parameters:
        word - Unicode string
        Returns:
        Unicode string containing only US-ASCII characters
        Throws:
        UnsupportedEncodingException - if the encoding fails
      • encodeWord

        public static String encodeWord​(String word,
                                        String charset,
                                        String encoding)
                                 throws UnsupportedEncodingException
        Encode an RFC 822 "word" token into mail-safe form as per RFC 2047.

        The given Unicode string is examined for non US-ASCII characters. If the string contains only US-ASCII characters, it is returned as-is. If the string contains non US-ASCII characters, it is first character-encoded using the specified charset, then transfer-encoded using either the B or Q encoding. The resulting bytes are then returned as a Unicode string containing only ASCII characters.
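
        For comparison with the "B" encoding, the "Q" encoding keeps ASCII letters and digits readable. The sketch below is a simplified, JDK-only illustration: the class EncodedWordQ and the method encodeQ are hypothetical, and the set of characters left unencoded is deliberately conservative (letters and digits only), which is narrower than what RFC 2047 permits.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodedWordQ {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    static String encodeQ(String word, Charset charset) {
        // Pure US-ASCII input is returned as-is.
        if (word.chars().allMatch(c -> c < 128)) {
            return word;
        }
        StringBuilder sb = new StringBuilder("=?").append(charset.name()).append("?Q?");
        for (byte b : word.getBytes(charset)) {
            int v = b & 0xff;
            boolean alnum = (v >= 'a' && v <= 'z') || (v >= 'A' && v <= 'Z')
                    || (v >= '0' && v <= '9');
            if (v == ' ') {
                sb.append('_');                 // space is written as underscore
            } else if (alnum) {
                sb.append((char) v);            // safe characters pass through
            } else {
                sb.append('=').append(HEX[v >> 4]).append(HEX[v & 0x0f]);
            }
        }
        return sb.append("?=").toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeQ("caf\u00e9 au lait", StandardCharsets.ISO_8859_1));
    }
}
```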

        Parameters:
        word - Unicode string
        charset - the MIME charset
        encoding - the encoding to be used. Currently supported values are "B" and "Q". If this parameter is null, then the "Q" encoding is used if most of the characters to be encoded are in the ASCII charset, otherwise "B" encoding is used.
        Returns:
        Unicode string containing only US-ASCII characters
        Throws:
        UnsupportedEncodingException - if the encoding fails
      • decodeWord

        public static String decodeWord​(String eword)
                                 throws ParseException,
                                        UnsupportedEncodingException
        The string is parsed using the rules in RFC 2047 and RFC 2231 for parsing an "encoded-word". If the parse fails, a ParseException is thrown. Otherwise, it is transfer-decoded, and then charset-converted into Unicode. If the charset-conversion fails, an UnsupportedEncodingException is thrown.
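
        The parse, transfer-decode, charset-convert sequence can be sketched with JDK classes as follows. The class DecodeWordSketch is hypothetical and is not the library's parser: it throws IllegalArgumentException where the real method reports ParseException or UnsupportedEncodingException, and it simply discards an RFC 2231 language suffix (the part after '*' in the charset field).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.Base64;

final class DecodeWordSketch {
    static String decodeWord(String eword) {
        if (eword.length() < 6 || !eword.startsWith("=?") || !eword.endsWith("?=")) {
            throw new IllegalArgumentException("Not an encoded-word: " + eword);
        }
        // Expected layout: =?charset?encoding?encoded-text?=
        String[] parts = eword.substring(2, eword.length() - 2).split("\\?", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("Not an encoded-word: " + eword);
        }
        String charsetName = parts[0];
        int star = charsetName.indexOf('*');      // RFC 2231 language suffix, ignored here
        if (star >= 0) {
            charsetName = charsetName.substring(0, star);
        }
        byte[] bytes;
        if (parts[1].equalsIgnoreCase("B")) {
            bytes = Base64.getDecoder().decode(parts[2]);
        } else if (parts[1].equalsIgnoreCase("Q")) {
            bytes = decodeQ(parts[2]);
        } else {
            throw new IllegalArgumentException("Unknown encoding: " + parts[1]);
        }
        return new String(bytes, Charset.forName(charsetName)); // charset-convert
    }

    private static byte[] decodeQ(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '_') {
                out.write(' ');
            } else if (c == '=') {
                if (i + 2 >= text.length()) {
                    throw new IllegalArgumentException("Truncated escape in: " + text);
                }
                out.write(Integer.parseInt(text.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                out.write(c);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(decodeWord("=?ISO-8859-1?Q?caf=E9_au_lait?="));
    }
}
```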

        Parameters:
        eword - the encoded value
        Returns:
        the decoded word
        Throws:
        ParseException - if the string is not an encoded-word as per RFC 2047 and RFC 2231.
        UnsupportedEncodingException - if the charset conversion failed.
      • quote

        public static String quote​(String word,
                                   String specials)
        A utility method to quote a word, if the word contains any characters from the specified 'specials' list.

        The HeaderTokenizer class defines two special sets of delimiters - MIME and RFC 822.

        This method is typically used during the generation of RFC 822 and MIME header fields.
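
        A simplified picture of the quoting rule is sketched below. The class QuoteSketch is hypothetical. The MIME_SPECIALS string is an assumed copy of the MIME delimiter set, and HeaderTokenizer.MIME remains the authoritative value. Quoting empty words, control characters and non-ASCII characters, and escaping only the double quote and backslash, are assumptions of this sketch as well.

```java
final class QuoteSketch {
    // Assumed copy of the MIME delimiter set; see HeaderTokenizer.MIME.
    static final String MIME_SPECIALS = "()<>@,;:\\\"\t []/?=";

    static String quote(String word, String specials) {
        boolean needsQuotes = word.isEmpty();
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (c == '"' || c == '\\') {
                sb.append('\\');           // escape characters that would end the quoting
                needsQuotes = true;
            } else if (specials.indexOf(c) >= 0 || c < 32 || c >= 127) {
                needsQuotes = true;        // a special or non-printable character was found
            }
            sb.append(c);
        }
        // Return the word untouched when nothing in it required quoting.
        return needsQuotes ? sb.append('"').toString() : word;
    }

    public static void main(String[] args) {
        System.out.println(quote("report.pdf", MIME_SPECIALS));
        System.out.println(quote("annual report.pdf", MIME_SPECIALS));
    }
}
```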

        Parameters:
        word - word to be quoted
        specials - the set of special characters
        Returns:
        the possibly quoted word
        See Also:
        HeaderTokenizer.MIME, HeaderTokenizer.RFC822
      • fold

        public static String fold​(int used,
                                  String s)
        Fold a string at linear whitespace so that each line is no longer than 76 characters, if possible. If there are more than 76 non-whitespace characters consecutively, the string is folded at the first whitespace after that sequence. The parameter used indicates how many characters have been used in the current line; it is usually the length of the header name.

        Note that line breaks in the string aren't escaped; they probably should be.
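
        The folding rule can be approximated with the JDK-only sketch below. It is an independent illustration of the description, not the library's code: the class FoldSketch and the helper longestLine are hypothetical, and the sketch neither trims trailing whitespace nor treats existing line breaks specially.

```java
final class FoldSketch {
    static final int LIMIT = 76;

    private static boolean isWs(char c) {
        return c == ' ' || c == '\t';
    }

    static String fold(int used, String s) {
        StringBuilder out = new StringBuilder();
        String rest = s;
        while (used + rest.length() > LIMIT) {
            int breakAt = -1;
            // Look for the last start of a whitespace run that still fits on this line.
            for (int i = 1; i < rest.length(); i++) {
                if (isWs(rest.charAt(i)) && !isWs(rest.charAt(i - 1))) {
                    boolean fits = used + i <= LIMIT;
                    if (!fits && breakAt != -1) {
                        break;             // keep the earlier break point that fits
                    }
                    breakAt = i;
                    if (!fits) {
                        break;             // overlong word: fold at the first whitespace after it
                    }
                }
            }
            if (breakAt == -1) {
                break;                     // no whitespace left to fold at
            }
            out.append(rest, 0, breakAt).append("\r\n");
            rest = rest.substring(breakAt); // continuation line begins with the whitespace
            used = 0;
        }
        return out.append(rest).toString();
    }

    // Demo helper: length of the longest physical line in a folded value.
    static int longestLine(String folded) {
        int max = 0;
        for (String line : folded.split("\r\n")) {
            max = Math.max(max, line.length());
        }
        return max;
    }

    public static void main(String[] args) {
        String value = "alpha beta gamma delta epsilon zeta eta theta iota kappa lambda "
                + "mu nu xi omicron pi rho sigma tau upsilon phi chi psi omega";
        System.out.println("Subject: " + fold(9, value));
    }
}
```

        Here the value 9 passed as used is the length of "Subject: ", matching the remark that used is usually the length of the header name.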

        Parameters:
        used - characters used in line so far
        s - the string to fold
        Returns:
        the folded string
        Since:
        JavaMail 1.4
      • unfold

        public static String unfold​(String s)
        Unfold a folded header. Any line breaks that aren't escaped and are followed by whitespace are removed.
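
        Read literally, the rule above removes a line break only when whitespace follows it. A minimal JDK-only sketch of that reading is shown here. The class UnfoldSketch is hypothetical, escaped line breaks are not handled, and the whitespace after the break is kept unchanged, which may differ from what the library does.

```java
final class UnfoldSketch {
    static String unfold(String s) {
        // Drop CRLF (or a bare LF) only when the next character is a space or tab.
        return s.replaceAll("\\r?\\n(?=[ \\t])", "");
    }

    public static void main(String[] args) {
        System.out.println(unfold("a long header value\r\n continued on a second line"));
    }
}
```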
        Parameters:
        s - the string to unfold
        Returns:
        the unfolded string
        Since:
        JavaMail 1.4
      • javaCharset

        public static String javaCharset​(String charset)
        Convert a MIME charset name into a valid Java charset name.

        Parameters:
        charset - the MIME charset name
        Returns:
        the Java charset equivalent. If a suitable mapping is not available, the passed in charset is itself returned.
      • mimeCharset

        public static String mimeCharset​(String charset)
        Convert a Java charset into its MIME charset name.

        Note that a future version of the JDK (post 1.2) might provide this functionality, in which case this method may be deprecated.
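
        For orientation, the JDK's own java.nio.charset.Charset class can resolve an alias to a canonical charset name, as the sketch below shows. This is not how mimeCharset is implemented, and a JDK canonical name is not guaranteed to be the preferred MIME name. The class CharsetNameSketch and the method canonicalName are hypothetical, and falling back to the input mirrors the behavior described for this method.

```java
import java.nio.charset.Charset;

final class CharsetNameSketch {
    static String canonicalName(String charset) {
        try {
            // Resolves aliases such as "latin1" to the JDK's canonical name.
            return Charset.forName(charset).name();
        } catch (IllegalArgumentException e) {
            // Covers illegal and unsupported charset names: return the input unchanged.
            return charset;
        }
    }

    public static void main(String[] args) {
        System.out.println(canonicalName("latin1"));
        System.out.println(canonicalName("UTF8"));
    }
}
```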

        Parameters:
        charset - the JDK charset
        Returns:
        the MIME/IANA equivalent. If a mapping is not possible, the passed in charset itself is returned.
        Since:
        JavaMail 1.1
      • getDefaultJavaCharset

        public static String getDefaultJavaCharset()
        Get the default charset corresponding to the system's current default locale. If the System property mail.mime.charset is set, a system charset corresponding to this MIME charset will be returned.

        Returns:
        the default charset of the system's default locale, as a Java charset. (NOT a MIME charset)
        Since:
        JavaMail 1.1
      • getBytes

        public static byte[] getBytes​(String s)