Primitive Type char 1.0.0
A character type.
The char type represents a single character. More specifically, since
'character' isn't a well-defined concept in Unicode, char is a 'Unicode
scalar value', which is similar to, but not the same as, a 'Unicode code
point'.
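To make the distinction concrete, here is a minimal sketch using std::char::from_u32, which returns None for any u32 that is not a Unicode scalar value. The helper name is_scalar_value is hypothetical and only used for illustration.

```rust
// Hypothetical helper: a Unicode scalar value is any code point outside the
// surrogate range U+D800..=U+DFFF, so not every code point is a valid char.
fn is_scalar_value(code_point: u32) -> bool {
    std::char::from_u32(code_point).is_some()
}

fn main() {
    assert!(is_scalar_value(0x41)); // 'A'
    assert!(is_scalar_value(0x10FFFF)); // the highest scalar value
    assert!(!is_scalar_value(0xD800)); // a surrogate: a code point, but not a char
    assert!(!is_scalar_value(0x110000)); // beyond the Unicode range
}
```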
This documentation describes a number of methods and trait implementations on the
char type. For technical reasons, there is additional, separate
documentation in the std::char module as well.
Representation
char is always four bytes in size. This is a different representation than
a given character would have as part of a String. For example:
    let v = vec!['h', 'e', 'l', 'l', 'o'];

    // five elements times four bytes for each element
    assert_eq!(20, v.len() * std::mem::size_of::<char>());

    let s = String::from("hello");

    // five elements times one byte per element
    assert_eq!(5, s.len() * std::mem::size_of::<u8>());
As always, remember that a human intuition for 'character' may not map to Unicode's definitions. For example, emoji symbols such as '❤️' can be more than one Unicode code point; this ❤️ in particular is two:
    let s = String::from("❤️");

    // we get two chars out of a single ❤️
    let mut iter = s.chars();
    assert_eq!(Some('\u{2764}'), iter.next());
    assert_eq!(Some('\u{fe0f}'), iter.next());
    assert_eq!(None, iter.next());
This means it won't fit into a char. Trying to create a literal with
let heart = '❤️'; gives an error:
error: character literal may only contain one codepoint: '❤
let heart = '❤️';
^~
Another implication of the 4-byte fixed size of a char is that
per-char processing can end up using a lot more memory:
    let s = String::from("love: ❤️");
    let v: Vec<char> = s.chars().collect();

    assert_eq!(12, s.len() * std::mem::size_of::<u8>());
    assert_eq!(32, v.len() * std::mem::size_of::<char>());
Methods
impl char
fn is_digit(self, radix: u32) -> bool
Checks if a char is a digit in the given radix.
A 'radix' here is sometimes also called a 'base'. A radix of two indicates a binary number, a radix of ten, decimal, and a radix of sixteen, hexadecimal, to give some common values. Radices up to 36 are supported.
Compared to is_numeric(), this function only recognizes the characters
0-9, a-z and A-Z.
'Digit' is defined to be only the following characters:
- 0-9
- a-z
- A-Z
For a more comprehensive understanding of 'digit', see is_numeric.
Panics
Panics if given a radix larger than 36.
Examples
Basic usage:
    assert!('1'.is_digit(10));
    assert!('f'.is_digit(16));
    assert!(!'f'.is_digit(10));
Passing a large radix, causing a panic:
    use std::thread;

    let result = thread::spawn(|| {
        // this panics
        '1'.is_digit(37);
    }).join();

    assert!(result.is_err());
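As a rough illustration of how is_digit differs from is_numeric, the sketch below checks a few characters against both. The helper digit_vs_numeric is a hypothetical name introduced only for this example.

```rust
// Hypothetical helper: returns (is_digit in radix 10, is_numeric) for a char.
fn digit_vs_numeric(c: char) -> (bool, bool) {
    (c.is_digit(10), c.is_numeric())
}

fn main() {
    // ASCII '7' satisfies both checks.
    assert_eq!(digit_vs_numeric('7'), (true, true));
    // U+0663 ARABIC-INDIC DIGIT THREE is numeric, but it is not one of
    // 0-9, a-z or A-Z, so is_digit rejects it.
    assert_eq!(digit_vs_numeric('\u{0663}'), (false, true));
    // 'f' is a digit only in radices of 16 and above, and is never numeric.
    assert_eq!(digit_vs_numeric('f'), (false, false));
    assert!('f'.is_digit(16));
}
```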
fn to_digit(self, radix: u32) -> Option<u32>
Converts a char to a digit in the given radix.
A 'radix' here is sometimes also called a 'base'. A radix of two indicates a binary number, a radix of ten, decimal, and a radix of sixteen, hexadecimal, to give some common values. Radices up to 36 are supported.
'Digit' is defined to be only the following characters:
- 0-9
- a-z
- A-Z
Errors
Returns None if the char does not refer to a digit in the given radix.
Panics
Panics if given a radix larger than 36.
Examples
Basic usage:
    assert_eq!('1'.to_digit(10), Some(1));
    assert_eq!('f'.to_digit(16), Some(15));
Passing a non-digit results in failure:
    assert_eq!('f'.to_digit(10), None);
    assert_eq!('z'.to_digit(16), None);
Passing a large radix, causing a panic:
    use std::thread;

    let result = thread::spawn(|| {
        '1'.to_digit(37);
    }).join();

    assert!(result.is_err());
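One possible use of to_digit is parsing a whole number one character at a time. The following is only a sketch: parse_radix is a hypothetical helper, it assumes the caller passes a radix of at most 36 (larger values would panic inside to_digit), and it returns Some(0) for an empty string.

```rust
// Hypothetical helper: parses `s` as an unsigned number in the given radix.
// Returns None on a non-digit character or on overflow of u32.
fn parse_radix(s: &str, radix: u32) -> Option<u32> {
    let mut value: u32 = 0;
    for c in s.chars() {
        // to_digit yields None when `c` is not a digit in this radix.
        let digit = c.to_digit(radix)?;
        value = value.checked_mul(radix)?.checked_add(digit)?;
    }
    Some(value)
}

fn main() {
    assert_eq!(parse_radix("ff", 16), Some(255));
    assert_eq!(parse_radix("101", 2), Some(5));
    // 'z' is not a decimal digit, so parsing fails.
    assert_eq!(parse_radix("12z", 10), None);
}
```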
fn escape_unicode(self) -> EscapeUnicode
Returns an iterator that yields the hexadecimal Unicode escape of a
character as chars.
This will escape characters with the Rust syntax of the form
\u{NNNNNN} where NNNNNN is a hexadecimal representation.
Examples
As an iterator:
    for c in '❤'.escape_unicode() {
        print!("{}", c);
    }
    println!();
Using println! directly:
    println!("{}", '❤'.escape_unicode());
Both are equivalent to:
    println!("\\u{{2764}}");
Using to_string:
    assert_eq!('❤'.escape_unicode().to_string(), "\\u{2764}");
fn escape_debug(self) -> EscapeDebug 1.20.0
Returns an iterator that yields the literal escape code of a character
as chars.
This escapes characters similarly to the Debug implementations
of str and char.
Examples
As an iterator:
    for c in '\n'.escape_debug() {
        print!("{}", c);
    }
    println!();
Using println! directly:
    println!("{}", '\n'.escape_debug());
Both are equivalent to:
    println!("\\n");
Using to_string:
    assert_eq!('\n'.escape_debug().to_string(), "\\n");
fn escape_default(self) -> EscapeDefault
Returns an iterator that yields the literal escape code of a character
as chars.
The default is chosen with a bias toward producing literals that are legal in a variety of languages, including C++11 and similar C-family languages. The exact rules are:
- Tab is escaped as \t.
- Carriage return is escaped as \r.
- Line feed is escaped as \n.
- Single quote is escaped as \'.
- Double quote is escaped as \".
- Backslash is escaped as \\.
- Any character in the 'printable ASCII' range 0x20..0x7e inclusive is not escaped.
- All other characters are given hexadecimal Unicode escapes; see escape_unicode.
Examples
As an iterator:
    for c in '"'.escape_default() {
        print!("{}", c);
    }
    println!();
Using println! directly:
    println!("{}", '"'.escape_default());
Both are equivalent to:
    println!("\\\"");
Using to_string:
    assert_eq!('"'.escape_default().to_string(), "\\\"");
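The escaping rules listed for escape_default can be spot-checked with a small sketch like the one below. The helper name escaped is hypothetical; the expected strings follow from the rules above and from the lowercase hexadecimal form that escape_unicode produces.

```rust
// Hypothetical helper: collects the escape_default output into a String.
fn escaped(c: char) -> String {
    c.escape_default().to_string()
}

fn main() {
    // Characters with dedicated escapes.
    assert_eq!(escaped('\t'), "\\t");
    assert_eq!(escaped('\''), "\\'");
    assert_eq!(escaped('\\'), "\\\\");
    // Printable ASCII (0x20..0x7e inclusive) is left alone.
    assert_eq!(escaped('a'), "a");
    assert_eq!(escaped(' '), " ");
    // Everything else falls back to a hexadecimal Unicode escape.
    assert_eq!(escaped('ß'), "\\u{df}");
    assert_eq!(escaped('\u{7f}'), "\\u{7f}");
}
```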
fn len_utf8(self) -> usize
Returns the number of bytes this char would need if encoded in UTF-8.
That number of bytes is always between 1 and 4, inclusive.
Examples
Basic usage:
    let len = 'A'.len_utf8();
    assert_eq!(len, 1);

    let len = 'ß'.len_utf8();
    assert_eq!(len, 2);

    let len = 'ℝ'.len_utf8();
    assert_eq!(len, 3);

    let len = '💣'.len_utf8();
    assert_eq!(len, 4);
The &str type guarantees that its contents are UTF-8, and so we can compare the length it
would take if each code point were represented as a char versus in the &str itself:
    // as chars
    let eastern = '東';
    let capital = '京';

    // both can be represented as three bytes
    assert_eq!(3, eastern.len_utf8());
    assert_eq!(3, capital.len_utf8());

    // as a &str, these two are encoded in UTF-8
    let tokyo = "東京";

    let len = eastern.len_utf8() + capital.len_utf8();

    // we can see that they take six bytes total...
    assert_eq!(6, tokyo.len());

    // ... just like the &str
    assert_eq!(len, tokyo.len());
fn len_utf16(self) -> usize
Returns the number of 16-bit code units this char would need if
encoded in UTF-16.
See the documentation for len_utf8 for more explanation of this
concept. This function is a mirror, but for UTF-16 instead of UTF-8.
Examples
Basic usage:
    let n = 'ß'.len_utf16();
    assert_eq!(n, 1);

    let len = '💣'.len_utf16();
    assert_eq!(len, 2);
fn encode_utf8(self, dst: &mut [u8]) -> &mut str 1.15.0
Encodes this character as UTF-8 into the provided byte buffer, and then returns the subslice of the buffer that contains the encoded character.
Panics
Panics if the buffer is not large enough.
A buffer of length four is large enough to encode any char.
Examples
In both of these examples, 'ß' takes two bytes to encode.
    let mut b = [0; 2];

    let result = 'ß'.encode_utf8(&mut b);

    assert_eq!(result, "ß");

    assert_eq!(result.len(), 2);
A buffer that's too small:
    use std::thread;

    let result = thread::spawn(|| {
        let mut b = [0; 1];

        // this panics
        'ß'.encode_utf8(&mut b);
    }).join();

    assert!(result.is_err());
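Since a four-byte buffer is always large enough, a common pattern is to encode into a fixed [u8; 4] and keep only the returned subslice. A minimal sketch follows; utf8_bytes is a hypothetical helper name.

```rust
// Hypothetical helper: returns the UTF-8 bytes of a single char.
fn utf8_bytes(c: char) -> Vec<u8> {
    // Four bytes is enough for any char, so this never panics.
    let mut buf = [0u8; 4];
    c.encode_utf8(&mut buf).as_bytes().to_vec()
}

fn main() {
    assert_eq!(utf8_bytes('A'), vec![0x41]);
    assert_eq!(utf8_bytes('ß'), vec![0xC3, 0x9F]);
    // The length of the encoded subslice matches len_utf8.
    assert_eq!(utf8_bytes('💣').len(), '💣'.len_utf8());
}
```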
fn encode_utf16(self, dst: &mut [u16]) -> &mut [u16] 1.15.0
Encodes this character as UTF-16 into the provided u16 buffer,
and then returns the subslice of the buffer that contains the encoded character.
Panics
Panics if the buffer is not large enough.
A buffer of length 2 is large enough to encode any char.
Examples
In both of these examples, '𝕊' takes two u16s to encode.
    let mut b = [0; 2];

    let result = '𝕊'.encode_utf16(&mut b);

    assert_eq!(result.len(), 2);
A buffer that's too small:
    use std::thread;

    let result = thread::spawn(|| {
        let mut b = [0; 1];

        // this panics
        '𝕊'.encode_utf16(&mut b);
    }).join();

    assert!(result.is_err());
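To see what the two u16s actually contain, the sketch below encodes into a two-element buffer and inspects the code units. The helper utf16_units is a hypothetical name; '𝕊' is written as the escape \u{1D54A} so the code point is explicit.

```rust
// Hypothetical helper: returns the UTF-16 code units of a single char.
fn utf16_units(c: char) -> Vec<u16> {
    // Two u16s are enough for any char, so this never panics.
    let mut buf = [0u16; 2];
    c.encode_utf16(&mut buf).to_vec()
}

fn main() {
    // A character in the Basic Multilingual Plane needs one code unit.
    assert_eq!(utf16_units('ß'), vec![0x00DF]);
    // U+1D54A lies outside it and is encoded as a surrogate pair.
    assert_eq!(utf16_units('\u{1D54A}'), vec![0xD835, 0xDD4A]);
}
```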
fn is_alphabetic(self) -> bool
Returns true if this char is an alphabetic code point, and false if not.
Examples
Basic usage:
    assert!('a'.is_alphabetic());
    assert!('京'.is_alphabetic());

    let c = '💝';
    // love is many things, but it is not alphabetic
    assert!(!c.is_alphabetic());
fn is_xid_start(self) -> bool
🔬 This is a nightly-only experimental API. (rustc_private #27812)
mainly needed for compiler internals
Returns true if this char satisfies the 'XID_Start' Unicode property, and false
otherwise.
'XID_Start' is a Unicode Derived Property specified in
UAX #31,
mostly similar to ID_Start but modified for closure under NFKx.
fn is_xid_continue(self) -> bool
🔬 This is a nightly-only experimental API. (rustc_private #27812)
mainly needed for compiler internals
Returns true if this char satisfies the 'XID_Continue' Unicode property, and false
otherwise.
'XID_Continue' is a Unicode Derived Property specified in UAX #31, mostly similar to 'ID_Continue' but modified for closure under NFKx.
fn is_lowercase(self) -> bool
Returns true if this char is lowercase, and false otherwise.
'Lowercase' is defined according to the terms of the Unicode Derived Core
Property Lowercase.
Examples
Basic usage:
    assert!('a'.is_lowercase());
    assert!('δ'.is_lowercase());
    assert!(!'A'.is_lowercase());
    assert!(!'Δ'.is_lowercase());

    // The various Chinese scripts do not have case, and so:
    assert!(!'中'.is_lowercase());
fn is_uppercase(self) -> bool
Returns true if this char is uppercase, and false otherwise.
'Uppercase' is defined according to the terms of the Unicode Derived Core
Property Uppercase.
Examples
Basic usage:
    assert!(!'a'.is_uppercase());
    assert!(!'δ'.is_uppercase());
    assert!('A'.is_uppercase());
    assert!('Δ'.is_uppercase());

    // The various Chinese scripts do not have case, and so:
    assert!(!'中'.is_uppercase());
fn is_whitespace(self) -> bool
Returns true if this char is whitespace, and false otherwise.
'Whitespace' is defined according to the terms of the Unicode Derived Core
Property White_Space.
Examples
Basic usage:
    assert!(' '.is_whitespace());

    // a non-breaking space
    assert!('\u{A0}'.is_whitespace());

    assert!(!'越'.is_whitespace());
fn is_alphanumeric(self) -> bool
Returns true if this char is alphanumeric, and false otherwise.
'Alphanumeric'-ness is defined in terms of the Unicode General Categories 'Nd', 'Nl', 'No' and the Derived Core Property 'Alphabetic'.
Examples
Basic usage:
    assert!('٣'.is_alphanumeric());
    assert!('7'.is_alphanumeric());
    assert!('৬'.is_alphanumeric());
    assert!('K'.is_alphanumeric());
    assert!('و'.is_alphanumeric());
    assert!('藏'.is_alphanumeric());
    // '¾' and '①' are in General Category 'No', so they are alphanumeric
    assert!('¾'.is_alphanumeric());
    assert!('①'.is_alphanumeric());
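Read literally, the definition above says a char is alphanumeric exactly when it is alphabetic or numeric. The sketch below checks that relationship for a handful of characters; matches_definition is a hypothetical helper, and the check is an illustration rather than a statement about how the method is implemented.

```rust
// Hypothetical helper: true when is_alphanumeric agrees with
// "is_alphabetic or is_numeric" for this char.
fn matches_definition(c: char) -> bool {
    c.is_alphanumeric() == (c.is_alphabetic() || c.is_numeric())
}

fn main() {
    for &c in &['K', '7', '\u{0663}', '藏', ' ', '!'] {
        assert!(matches_definition(c));
    }
    // Whitespace and punctuation are neither alphabetic nor numeric.
    assert!(!' '.is_alphanumeric());
    assert!(!'!'.is_alphanumeric());
}
```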
fn is_control(self) -> bool
Returns true if this char is a control code point, and false otherwise.
'Control code point' is defined in terms of the Unicode General
Category Cc.
Examples
Basic usage:
    // U+009C, STRING TERMINATOR
    assert!('\u{9c}'.is_control());
    assert!(!'q'.is_control());
fn is_numeric(self) -> bool
Returns true if this char is numeric, and false otherwise.
'Numeric'-ness is defined in terms of the Unicode General Categories 'Nd', 'Nl', 'No'.
Examples
Basic usage:
    assert!('٣'.is_numeric());
    assert!('7'.is_numeric());
    assert!('৬'.is_numeric());
    // '¾' and '①' are in General Category 'No', so they are numeric
    assert!('¾'.is_numeric());
    assert!('①'.is_numeric());
    assert!(!'K'.is_numeric());
    assert!(!'و'.is_numeric());
    assert!(!'藏'.is_numeric());
fn to_lowercase(self) -> ToLowercase
Returns an iterator that yields the lowercase equivalent of a char
as one or more chars.
If a character does not have a lowercase equivalent, the iterator yields the same character.
This performs complex unconditional mappings with no tailoring: it maps
one Unicode character to its lowercase equivalent according to the
Unicode database and the additional complex mappings in
SpecialCasing.txt. Conditional mappings (based on context or
language) are not considered here.
For a full reference, see the Unicode Standard's section on case mappings.
Examples
As an iterator:
    for c in 'İ'.to_lowercase() {
        print!("{}", c);
    }
    println!();
Using println! directly:
    println!("{}", 'İ'.to_lowercase());
Both are equivalent to:
    println!("i\u{307}");
Using to_string:
    assert_eq!('C'.to_lowercase().to_string(), "c");

    // Sometimes the result is more than one character:
    assert_eq!('İ'.to_lowercase().to_string(), "i\u{307}");

    // Characters that do not have both uppercase and lowercase
    // convert into themselves.
    assert_eq!('山'.to_lowercase().to_string(), "山");
fn to_uppercase(self) -> ToUppercase
Returns an iterator that yields the uppercase equivalent of a char
as one or more chars.
If a character does not have an uppercase equivalent, the iterator yields the same character.
This performs complex unconditional mappings with no tailoring: it maps
one Unicode character to its uppercase equivalent according to the
Unicode database and the additional complex mappings in
SpecialCasing.txt. Conditional mappings (based on context or
language) are not considered here.
For a full reference, see the Unicode Standard's section on case mappings.
Examples
As an iterator:
    for c in 'ß'.to_uppercase() {
        print!("{}", c);
    }
    println!();
Using println! directly:
    println!("{}", 'ß'.to_uppercase());
Both are equivalent to:
    println!("SS");
Using to_string:
    assert_eq!('c'.to_uppercase().to_string(), "C");

    // Sometimes the result is more than one character:
    assert_eq!('ß'.to_uppercase().to_string(), "SS");

    // Characters that do not have both uppercase and lowercase
    // convert into themselves.
    assert_eq!('山'.to_uppercase().to_string(), "山");
Note on locale
In Turkish, the equivalent of 'i' in Latin has five forms instead of two:
- 'Dotless': I / ı, sometimes written ï
- 'Dotted': İ / i
Note that the lowercase dotted 'i' is the same as the Latin. Therefore:
    let upper_i = 'i'.to_uppercase().to_string();
The value of upper_i here depends on the language of the text: if we're
in en-US, it should be "I", but if we're in tr-TR, it should
be "İ". to_uppercase() does not take this into account, and so:
    let upper_i = 'i'.to_uppercase().to_string();

    assert_eq!(upper_i, "I");
holds across languages.
Trait Implementations
impl Debug for char
impl Clone for char
fn clone(&self) -> char
Returns a deep copy of the value.
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl PartialOrd<char> for char
fn partial_cmp(&self, other: &char) -> Option<Ordering>
This method returns an ordering between self and other values if one exists.
fn lt(&self, other: &char) -> bool
This method tests less than (for self and other) and is used by the < operator.
fn le(&self, other: &char) -> bool
This method tests less than or equal to (for self and other) and is used by the <= operator.
fn ge(&self, other: &char) -> bool
This method tests greater than or equal to (for self and other) and is used by the >= operator.
fn gt(&self, other: &char) -> bool
This method tests greater than (for self and other) and is used by the > operator.
impl Eq for char
impl TryFrom<u32> for char
type Error = CharTryFromError
The type returned in the event of a conversion error.
fn try_from(i: u32) -> Result<char, <char as TryFrom<u32>>::Error>
Performs the conversion.
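As an illustration, the conversion fails for any u32 that is not a Unicode scalar value. This sketch assumes a toolchain where TryFrom is stable (Rust 1.34 or later); on older toolchains the trait requires a nightly compiler. The helper checked_char is a hypothetical name.

```rust
use std::convert::TryFrom;

// Hypothetical helper: converts a u32 to a char, discarding the error value.
fn checked_char(n: u32) -> Option<char> {
    char::try_from(n).ok()
}

fn main() {
    assert_eq!(char::try_from(0x41u32), Ok('A'));
    // Surrogates and values above U+10FFFF are rejected.
    assert!(char::try_from(0xD800u32).is_err());
    assert!(char::try_from(0x110000u32).is_err());
    assert_eq!(checked_char(0x2764), Some('\u{2764}'));
}
```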
impl<'a> Pattern<'a> for char
Searches for chars that are equal to a given char
type Searcher = CharSearcher<'a>
🔬 This is a nightly-only experimental API. (pattern #27721)
API not fully fleshed out and ready to be stabilized
Associated searcher for this pattern
fn into_searcher(self, haystack: &'a str) -> <char as Pattern<'a>>::Searcher
🔬 This is a nightly-only experimental API. (pattern #27721)
API not fully fleshed out and ready to be stabilized
Constructs the associated searcher from self and the haystack to search in.
fn is_contained_in(self, haystack: &'a str) -> bool
🔬 This is a nightly-only experimental API. (pattern #27721)
API not fully fleshed out and ready to be stabilized
Checks whether the pattern matches anywhere in the haystack
fn is_prefix_of(self, haystack: &'a str) -> bool
🔬 This is a nightly-only experimental API. (pattern #27721)
API not fully fleshed out and ready to be stabilized
Checks whether the pattern matches at the front of the haystack
fn is_suffix_of(self, haystack: &'a str) -> bool where
<char as Pattern<'a>>::Searcher: ReverseSearcher<'a>,
🔬 This is a nightly-only experimental API. (pattern #27721)
API not fully fleshed out and ready to be stabilized
Checks whether the pattern matches at the back of the haystack
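Although the Pattern trait itself is nightly-only, stable str methods such as contains, find and split accept a char as their pattern argument. A brief sketch of that usage follows; split_fields is a hypothetical helper name.

```rust
// Hypothetical helper: splits a line on commas, using a char as the pattern.
fn split_fields(line: &str) -> Vec<&str> {
    line.split(',').collect()
}

fn main() {
    let s = "hello";
    assert!(s.contains('l'));
    assert!(s.starts_with('h'));
    assert!(s.ends_with('o'));
    // find and rfind return byte offsets into the haystack.
    assert_eq!(s.find('l'), Some(2));
    assert_eq!(s.rfind('l'), Some(3));
    assert_eq!(split_fields("a,b,c"), vec!["a", "b", "c"]);
}
```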
impl From<u8> for char 1.13.0
Maps a byte in 0x00...0xFF to a char whose code point has the same value, in U+0000 to U+00FF.
Unicode is designed such that this effectively decodes bytes with the character encoding that IANA calls ISO-8859-1. This encoding is compatible with ASCII.
Note that this is different from ISO/IEC 8859-1 a.k.a. ISO 8859-1 (with one less hyphen), which leaves some "blanks", byte values that are not assigned to any character. ISO-8859-1 (the IANA one) assigns them to the C0 and C1 control codes.
Note that this is also different from Windows-1252 a.k.a. code page 1252, which is a superset of ISO/IEC 8859-1 that assigns some (not all!) blanks to punctuation and various Latin characters.
To confuse things further, on the Web
ascii, iso-8859-1, and windows-1252 are all aliases
for a superset of Windows-1252 that fills the remaining blanks with corresponding
C0 and C1 control codes.
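Because each byte maps to the code point with the same value, this conversion can be used to decode ISO-8859-1 (in the IANA sense described above) byte by byte. A minimal sketch, with decode_latin1 as a hypothetical helper name:

```rust
// Hypothetical helper: decodes bytes as ISO-8859-1 by mapping each byte
// to the char with the same code point value.
fn decode_latin1(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| char::from(b)).collect()
}

fn main() {
    assert_eq!(char::from(0x41u8), 'A');
    // 0xE9 maps to U+00E9, LATIN SMALL LETTER E WITH ACUTE.
    assert_eq!(char::from(0xE9u8), '\u{E9}');
    assert_eq!(decode_latin1(&[0x63, 0x61, 0x66, 0xE9]), "caf\u{E9}");
}
```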
impl Display for char
fn fmt(&self, f: &mut Formatter) -> Result<(), Error>
Formats the value using the given formatter.
impl Hash for char
fn hash<H>(&self, state: &mut H) where
H: Hasher,
Feeds this value into the given Hasher.
fn hash_slice<H>(data: &[Self], state: &mut H) where
H: Hasher, 1.3.0
Feeds a slice of this type into the given Hasher.
impl FromStr for char 1.20.0
type Err = ParseCharError
The associated error which can be returned from parsing.
fn from_str(s: &str) -> Result<char, <char as FromStr>::Err>
Parses a string s to return a value of this type.
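In practice this implementation is usually reached through str::parse. The sketch below shows that parsing succeeds only when the string holds exactly one char; parse_char is a hypothetical helper name.

```rust
// Hypothetical helper: parses a string as a single char, discarding the error.
fn parse_char(s: &str) -> Option<char> {
    s.parse::<char>().ok()
}

fn main() {
    assert_eq!("a".parse::<char>(), Ok('a'));
    // An empty string or more than one char is an error.
    assert!("".parse::<char>().is_err());
    assert!("ab".parse::<char>().is_err());
    assert_eq!(parse_char("京"), Some('京'));
}
```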
impl Default for char
impl Ord for char
fn cmp(&self, other: &char) -> Ordering
This method returns an Ordering between self and other.
fn max(self, other: Self) -> Self
Compares and returns the maximum of two values.
fn min(self, other: Self) -> Self
Compares and returns the minimum of two values.
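The ordering on char follows code point values, which is not the same as alphabetical or locale-aware ordering. A short sketch, with sorted_chars as a hypothetical helper name:

```rust
use std::cmp::Ordering;

// Hypothetical helper: sorts the chars of a string by code point.
fn sorted_chars(s: &str) -> String {
    let mut chars: Vec<char> = s.chars().collect();
    chars.sort();
    chars.into_iter().collect()
}

fn main() {
    assert_eq!('a'.cmp(&'b'), Ordering::Less);
    // Uppercase ASCII letters have smaller code points than lowercase ones.
    assert!('Z' < 'a');
    assert_eq!(sorted_chars("bAa"), "Aab");
    // U+00E9 sorts after 'z' because its code point is larger.
    assert_eq!(sorted_chars("z\u{E9}a"), "az\u{E9}");
}
```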
impl PartialEq<char> for char
fn eq(&self, other: &char) -> bool
This method tests for self and other values to be equal, and is used by ==.
fn ne(&self, other: &char) -> bool
This method tests for !=.
impl AsciiExt for char
type Owned = char
Container type for copied ASCII characters.
fn is_ascii(&self) -> bool
Checks if the value is within the ASCII range.
fn to_ascii_uppercase(&self) -> char
Makes a copy of the value in its ASCII upper case equivalent.
fn to_ascii_lowercase(&self) -> char
Makes a copy of the value in its ASCII lower case equivalent.
fn eq_ignore_ascii_case(&self, other: &char) -> bool
Checks that two values are an ASCII case-insensitive match.
fn make_ascii_uppercase(&mut self)
Converts this type to its ASCII upper case equivalent in-place.
fn make_ascii_lowercase(&mut self)
Converts this type to its ASCII lower case equivalent in-place.
fn is_ascii_alphabetic(&self) -> bool
Checks if the value is an ASCII alphabetic character: U+0041 'A' ... U+005A 'Z' or U+0061 'a' ... U+007A 'z'. For strings, true if all characters in the string are ASCII alphabetic.
fn is_ascii_uppercase(&self) -> bool
Checks if the value is an ASCII uppercase character: U+0041 'A' ... U+005A 'Z'. For strings, true if all characters in the string are ASCII uppercase.
fn is_ascii_lowercase(&self) -> bool
Checks if the value is an ASCII lowercase character: U+0061 'a' ... U+007A 'z'. For strings, true if all characters in the string are ASCII lowercase.
fn is_ascii_alphanumeric(&self) -> bool
Checks if the value is an ASCII alphanumeric character: U+0041 'A' ... U+005A 'Z', U+0061 'a' ... U+007A 'z', or U+0030 '0' ... U+0039 '9'. For strings, true if all characters in the string are ASCII alphanumeric.
fn is_ascii_digit(&self) -> bool
Checks if the value is an ASCII decimal digit: U+0030 '0' ... U+0039 '9'. For strings, true if all characters in the string are ASCII digits.
fn is_ascii_hexdigit(&self) -> bool
Checks if the value is an ASCII hexadecimal digit: U+0030 '0' ... U+0039 '9', U+0041 'A' ... U+0046 'F', or U+0061 'a' ... U+0066 'f'. For strings, true if all characters in the string are ASCII hex digits.
fn is_ascii_punctuation(&self) -> bool
Checks if the value is an ASCII punctuation character:
- U+0021 ... U+002F ! " # $ % & ' ( ) * + , - . /
- U+003A ... U+0040 : ; < = > ? @
- U+005B ... U+0060 [ \ ] ^ _ `
- U+007B ... U+007E { | } ~
For strings, true if all characters in the string are ASCII punctuation.
fn is_ascii_graphic(&self) -> bool
Checks if the value is an ASCII graphic character: U+0021 '!' ... U+007E '~'. For strings, true if all characters in the string are ASCII graphic.
fn is_ascii_whitespace(&self) -> bool
Checks if the value is an ASCII whitespace character: U+0020 SPACE, U+0009 HORIZONTAL TAB, U+000A LINE FEED, U+000C FORM FEED, or U+000D CARRIAGE RETURN. For strings, true if all characters in the string are ASCII whitespace.
fn is_ascii_control(&self) -> bool
Checks if the value is an ASCII control character: U+0000 NUL ... U+001F UNIT SEPARATOR, or U+007F DELETE. Note that most ASCII whitespace characters are control characters, but SPACE is not.
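A short sketch of a few of these ASCII helpers in use. It assumes a toolchain where they are available as inherent methods on char, so no AsciiExt import is written; on the release this page documents, `use std::ascii::AsciiExt;` may be required instead. The helper ascii_upper is a hypothetical name.

```rust
// Hypothetical helper: upper-cases only the ASCII letters of a string.
fn ascii_upper(s: &str) -> String {
    s.chars().map(|c| c.to_ascii_uppercase()).collect()
}

fn main() {
    // Non-ASCII characters such as 'ß' are left unchanged.
    assert_eq!(ascii_upper("straße"), "STRAßE");
    assert!('K'.eq_ignore_ascii_case(&'k'));
    assert!('~'.is_ascii_punctuation());
    assert!('~'.is_ascii_graphic());
    // SPACE is whitespace but neither graphic nor a control character.
    assert!(' '.is_ascii_whitespace());
    assert!(!' '.is_ascii_graphic());
    assert!(!' '.is_ascii_control());
    // LINE FEED is both whitespace and a control character.
    assert!('\n'.is_ascii_control());
    assert!(!'ß'.is_ascii());
}
```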