Expressions in Rust

TL;DR

Expressions make up the body of Rust functions and thus the majority of Rust code. Most things in Rust are expressions. Control flow in Rust is entirely expression-oriented.

Rust usually makes a sharp distinction between references and the values they refer to. If you pass a &i32 to a function that expects an i32, that’s a type error. You’ll notice that the . operator relaxes those rules a bit. In the method call player.location(), player might be a Player, a reference of type &Player, or a smart pointer of type Box<Player> or Rc<Player>. The .location() method might take the player either by value or by reference. The same .location() syntax works in all cases, because Rust’s . operator automatically dereferences player or borrows a reference to it as needed.

When passing arguments, the argument types must match exactly; for method calls, the rules are relaxed a bit.
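A minimal sketch of that difference. The Player type, its room field, and the room_name_len function are hypothetical names invented for this illustration:

```rust
use std::rc::Rc;

// Hypothetical type for illustration only.
struct Player {
    room: String,
}

impl Player {
    // Takes the player by shared reference.
    fn location(&self) -> &str {
        &self.room
    }
}

// A plain function gets no help from the `.` operator:
// the argument must be exactly a `&Player`.
fn room_name_len(player: &Player) -> usize {
    player.location().len()
}

fn locations() -> Vec<String> {
    let owned = Player { room: "attic".to_string() };
    let boxed = Box::new(Player { room: "cellar".to_string() });
    let shared = Rc::new(Player { room: "garden".to_string() });
    let borrowed = &owned;

    // The same `.location()` syntax works for all four receivers.
    vec![
        owned.location().to_string(),
        borrowed.location().to_string(),
        boxed.location().to_string(),
        shared.location().to_string(),
    ]
}
```

Calling room_name_len(owned) instead of room_name_len(&owned) would be rejected as a type mismatch, while every method call above compiles unchanged.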

An Expression Language

In C, there is a sharp distinction between expressions and statements. Expressions have values. Statements don’t.

// expression
5 * (fahr-32) / 9

// statement
for (; begin != end; ++begin) {
    if (*begin == target)
        break;
}

Rust is what is called an expression language. It follows an older tradition, dating back to Lisp, where expressions do all the work.

In C, if and switch are statements. They don’t produce a value, and they can’t be used in the middle of an expression. In Rust, if and match can produce values:

// a match expression produces a numeric value
pixels[r * bounds.0 + c] =
    match escapes(Complex { re: point.0, im: point.1 }, 255) {
        None => 0,
        Some(count) => 255 - count as u8
    };

// if expression is used to initialize a variable
let status =
    if cpu.temperature <= MAX_TEMP {
        HttpStatus::Ok
    } else {
        HttpStatus::ServerError
    };

// a match expression is passed as an argument to a function or macro
println!("Inside the vat, you see {}.",
    match vat.contents {
        Some(brain) => brain.desc(),
        None => "nothing of interest"
    });

This explains why Rust does not have C’s ternary operator (expr1 ? expr2 : expr3). In C, it is a handy expression-level analogue to the if statement. It would be redundant in Rust: the if expression handles both cases.

Most of the control flow tools in C are statements. In Rust, they are all expressions.

Precedence and Associativity

Like most programming languages, Rust has operator precedence to determine the order of operations when an expression contains multiple adjacent operators.

All of the operators that can usefully be chained are left-associative. That is, a chain of operations such as a - b - c is grouped as (a - b) - c, not a - (b - c).

Operators that can be chained:

* / % + - << >> & ^ | && || as

The comparison operators, the assignment operators, and the range operators .. and ..= can’t be chained at all.
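To make the grouping concrete, here is a small sketch (the function name grouping is an illustrative assumption) that evaluates the same subtraction chain three ways:

```rust
fn grouping() -> (i32, i32, i32) {
    let (a, b, c) = (10, 4, 3);
    let chained = a - b - c;   // parsed as (a - b) - c
    let left = (a - b) - c;    // explicit left grouping
    let right = a - (b - c);   // explicit right grouping, a different value
    // Something like `a < b < c` would not compile at all:
    // comparison operators cannot be chained.
    (chained, left, right)
}
```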

Blocks and Semicolons

Blocks are the most general kind of expression. A block produces a value and can be used anywhere a value is needed:

let display_name = match post.author() {
    Some(author) => author.name(),
    None => {
        let network_info = post.get_network_metadata()?;
        let ip = network_info.client_address();
        ip.to_string()
    }
};

If a block has semicolons in all the familiar places, as in C or Java, then its value is (). Leaving the semicolon off the last line of a block makes the block’s value the value of its final expression, rather than the usual ().

The block of an if without an else must always produce (), because the missing else branch implicitly produces ().
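A short sketch of the semicolon rule, using a hypothetical function name block_values:

```rust
fn block_values() -> (i32, ()) {
    // No semicolon after the last expression: the block's value is `x + 1`.
    let with_value = {
        let x = 41;
        x + 1
    };
    // Every line ends in a semicolon: the block's value is `()`.
    let unit = {
        let x = 41;
        let _ = x + 1;
    };
    (with_value, unit)
}
```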

Declarations

In addition to expressions and semicolons, a block may contain any number of declarations. The most common are let declarations, which declare local variables:

let name: type = expr;

The type and initializer are optional. The semicolon is required.

Like all identifiers in Rust, variable names must start with a letter or underscore, and can contain digits only after that first character. Rust has a broad definition of “letter”: it includes Greek letters, accented Latin characters, and many more symbols—anything that Unicode Standard Annex #31 declares suitable. Emoji aren’t allowed.

A let declaration can declare a variable without initializing it. The variable can then be initialized with a later assignment:

let name;
if user.has_nickname() {
    name = user.nickname();
} else {
    name = generate_unique_name();
    user.register(&name);
}

Here there are two different ways the local variable name might be initialized, but either way it will be initialized exactly once, so name does not need to be declared mut. It’s an error to use a variable before it’s initialized. (This is closely related to the error of using a value after it’s been moved. Rust really wants you to use values only while they exist.)
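A self-contained variant of the same idea, as a sketch. The function display_name and its parameters are assumptions made up for this example:

```rust
fn display_name(nickname: Option<&str>, id: u32) -> String {
    let name; // declared without an initializer, and not `mut`
    if let Some(nick) = nickname {
        name = nick.to_string();
    } else {
        name = format!("user{}", id);
    }
    // Every path above assigns `name` exactly once, so this use is accepted.
    name
}
```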

Shadowing:

for line in file.lines() {
    let line = line?;
}

The type of the first variable line is Result<String, io::Error>. The second line is a String. Its definition supersedes the first’s for the rest of the block.
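A runnable sketch of shadowing that needs no file. The function sum_lines is a hypothetical helper; each let line introduces a new variable of a different type:

```rust
use std::num::ParseIntError;

fn sum_lines(text: &str) -> Result<i64, ParseIntError> {
    let mut total = 0;
    for line in text.lines() {
        let line = line.trim();        // &str, shadows the loop variable
        let line: i64 = line.parse()?; // i64, shadows the trimmed &str
        total += line;
    }
    Ok(total)
}
```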

A block can also contain item declarations. An item is simply any declaration that could appear globally in a program or module, such as a fn, struct, or use.

Any block may contain an fn:

use std::io;
use std::cmp::Ordering;
fn show_files() -> io::Result<()> {
    let mut v = vec![];
    // ...
    fn cmp_by_timestamp_then_name(a: &FileInfo, b: &FileInfo) -> Ordering {
        a.timestamp.cmp(&b.timestamp)   // first, compare timestamps
            .reverse()                  // newest file first
            .then(a.path.cmp(&b.path))  // compare paths to break ties
    }
    v.sort_by(cmp_by_timestamp_then_name);
    // ...
}

When an fn is declared inside a block, its scope is the entire block—that is, it can be used throughout the enclosing block. But a nested fn cannot access local variables or arguments that happen to be in scope. cmp_by_timestamp_then_name could not use v directly. Closures in Rust see into enclosing scopes.
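The contrast between a nested fn and a closure can be sketched as follows (scale_all, double, and scale are illustrative names, not from the original example):

```rust
fn scale_all(values: &[i32], factor: i32) -> Vec<i32> {
    // A nested fn is an item: it can use only its own parameters.
    fn double(x: i32) -> i32 {
        x * 2
        // `x * factor` would not compile here: `factor` is not visible.
    }

    // A closure captures `factor` from the enclosing scope.
    let scale = |x: i32| x * factor;

    values.iter().map(|&v| scale(double(v))).collect()
}
```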

A block can even contain a whole module.

`if` and `match`

The following is the form of an if expression:

if condition1 {
    block1
} else if condition2 {
    block2
} else {
    block_n
}

Each condition must be an expression of type bool; true to form, Rust does not implicitly convert numbers or pointers to Boolean values.

Parentheses are not required around conditions. rustc will emit a warning if unnecessary parentheses are present. The curly braces, however, are required.

The else if blocks, as well as the final else, are optional. An if expression with no else block behaves exactly as though it had an empty else block.

match expressions are something like the C switch statement, but more flexible:

// each pattern is a constant integer
match code {
    0 => println!("OK"),
    1 => println!("Wires Tangled"),
    2 => println!("User Asleep"),
    _ => println!("Unrecognized Error {}", code)
}

The wildcard pattern _ matches everything. This is like the default: case in a switch statement, except that it must come last; placing a _ pattern before other patterns means that it will have precedence over them. Those patterns will never match anything.

The compiler can optimize this kind of match using a jump table, just like a switch statement in C++. A similar optimization is applied when each arm of a match produces a constant value. In that case, the compiler builds an array of those values, and the match is compiled into an array access. Apart from a bounds check, there is no branching at all in the compiled code.

Note (adapted from a ChatGPT explanation): A jump table, also known as a dispatch table or a branch table, is a data structure used to handle multiple branches or cases efficiently. The table contains a list of addresses or offsets that correspond to different code blocks or functions, and each entry represents a specific case. When the program encounters a switch statement or a similar construct, it evaluates an expression and uses the resulting value as an index into the table, so it can jump directly to the desired code block without a series of conditional checks. This approach works best when there are many cases whose values are dense (contiguous or nearly so), because the table can then be indexed directly; for sparse case values, compilers typically fall back to a chain of comparisons or a binary search. Compilers for languages like C and C++ commonly use jump tables to implement switch statements.

The general form of a match expression is:

match value {
    pattern => expr,
    // ...
}

The comma after an arm may be dropped if the expr is a block.

Rust checks the given value against each pattern in turn, starting with the first. When a pattern matches, the corresponding expr is evaluated, and the match expression is complete; no further patterns are checked. At least one of the patterns must match. Rust prohibits match expressions that do not cover all possible values:

let score = match card.rank {
    Jack => 10,
    Queen => 10,
    Ace => 11
}; // error: nonexhaustive patterns
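For comparison, a sketch of a match that does cover every value. The Rank enum and score function here are assumptions made up for the example:

```rust
// Hypothetical enum for illustration only.
enum Rank {
    Jack,
    Queen,
    King,
    Ace,
    Number(u8),
}

fn score(rank: &Rank) -> u8 {
    match rank {
        Rank::Jack | Rank::Queen | Rank::King => 10,
        Rank::Ace => 11,
        // Removing this arm brings back the "nonexhaustive patterns" error.
        Rank::Number(n) => *n,
    }
}
```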

All blocks of an if expression must produce values of the same type. Similarly, all arms of a match expression must have the same type:

let suggested_pet =
    if with_wings { Pet::Buzzard } else { Pet::Hyena };  // ok

let favorite_number =
    if user.is_hobbit() { "eleventy-one" } else { 9 };  // error

let best_sports_team =
    if is_hockey_season() { "Predators" };              // error: outside hockey season, the value of the if expression would be `()`

let suggested_pet =
    match favorites.element {
        Fire => Pet::RedPanda,
        Air => Pet::Buffalo,
        Water => Pet::Orca,
        _ => None  // error: incompatible types
    };

The versatility of match stems from the variety of supported patterns that can be used to the left of => in each arm. A pattern can match a range of values. It can unpack tuples. It can match against individual fields of structs. It can chase references, borrow parts of a value, and more. Rust’s patterns are a mini-language of their own.
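A few of those pattern kinds in one sketch. The Point struct and both functions are hypothetical, chosen only to show struct-field, range, tuple, and wildcard patterns:

```rust
// Hypothetical type for illustration only.
struct Point {
    x: i32,
    y: i32,
}

fn classify(p: &Point) -> &'static str {
    match p {
        Point { x: 0, y: 0 } => "origin",             // match individual fields
        Point { x: 0, .. } => "on the y axis",        // ignore the other fields
        Point { y: 0, .. } => "on the x axis",
        Point { x: 1..=9, y: 1..=9 } => "near the origin", // ranges of values
        _ => "elsewhere",
    }
}

fn describe_pair(pair: (i32, i32)) -> String {
    match pair {
        (0, 0) => "both zero".to_string(),            // unpack a tuple
        (x, 0) | (0, x) => format!("one zero, other is {}", x),
        (x, y) => format!("{} and {}", x, y),
    }
}
```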

`if let`

An if let expression is another form of if expression:

if let pattern = expr {
    block1
} else {
    block2
}

The given expr either matches the pattern, in which case block1 runs, or doesn’t match, and block2 runs. Sometimes this is a nice way to get data out of an Option or Result:

if let Some(cookie) = request.session_cookie {
    return restore_session(cookie);
}

if let Err(err) = show_cheesy_anti_robot_task() {
    log_robot_attempt(err);
    politely_accuse_user_of_being_a_robot();
} else {
    session.mark_as_human();
}    

It’s never strictly necessary to use if let, because match can do everything if let can do. An if let expression is shorthand for a match with just one pattern:

match expr {
    pattern => { block1 }
    _ => { block2 }
}
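The equivalence can be checked with a pair of functions that should always agree. This is a sketch with made-up function names:

```rust
fn first_word_len_if_let(text: &str) -> usize {
    if let Some(word) = text.split_whitespace().next() {
        word.len()
    } else {
        0
    }
}

fn first_word_len_match(text: &str) -> usize {
    match text.split_whitespace().next() {
        Some(word) => word.len(),
        _ => 0,
    }
}
```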

Loops

There are four looping expressions:

while condition {
    block
}

while let pattern = expr {
    block
}

loop {
    block
}

for pattern in iterable {
    block
}

Loops are expressions in Rust. The value of a while or for loop is always (). A loop expression can produce a value if you specify one.

A while loop behaves exactly like the C equivalent, except that, again, the condition must be of the exact type bool.

The while let loop is analogous to if let. At the beginning of each while let loop iteration, the value of expr either matches the given pattern, in which case the block runs, or doesn’t, in which case the loop exits.
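A small sketch of while let, using Vec::pop as the expression being matched (the function name drain_stack is an assumption):

```rust
fn drain_stack(mut stack: Vec<i32>) -> Vec<i32> {
    let mut popped = Vec::new();
    // `pop` returns Some(top) until the vector is empty;
    // the first None fails to match and ends the loop.
    while let Some(top) = stack.pop() {
        popped.push(top);
    }
    popped
}
```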

Use loop to write infinite loops. It executes the block repeatedly forever (or until a break or return is reached or the thread panics).

A for loop evaluates the iterable expression and then evaluates the block once for each value in the resulting iterator:

for i in 0..20 {
    println!("{}", i);
} // prints 0 through 19

// C equivalent
for (int i = 0; i < 20; i++) {
    printf("%d\n", i);
}

The .. operator produces a range, a simple struct with two fields: start and end. 0..20 is the same as std::ops::Range { start: 0, end: 20 }. Ranges can be used with for loops because Range is an iterable type: it implements the std::iter::IntoIterator trait. The standard collections like Vec and HashMap are all iterable, as are arrays and slices.

A for loop over a value consumes the value:

let strings: Vec<String> = error_messages();
for s in strings {  // each String is moved into s here...
    println!("{}", s);
}                   // ...and dropped here
println!("{} error(s)", strings.len()); // error: use of moved value

The easy remedy is to loop over a reference to the collection instead:

// the type of &strings is &Vec<String>; the type of rs is &String
for rs in &strings {
    println!("String {:?} is at address {:p}.", *rs, rs);
}

Iterating over a mut reference provides a mut reference to each element:

for rs in &mut strings {    // the type of rs is &mut String
    rs.push('\n');          // add a newline to each string
}
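The three ways of looping over a collection, put side by side in one sketch (loop_modes is a hypothetical function name):

```rust
fn loop_modes() -> (usize, Vec<String>) {
    let mut strings = vec!["a".to_string(), "bc".to_string()];

    let mut total_len = 0;
    for rs in &strings {       // rs: &String; `strings` is only borrowed
        total_len += rs.len();
    }

    for rs in &mut strings {   // rs: &mut String
        rs.push('!');
    }

    let mut moved = Vec::new();
    for s in strings {         // s: String; `strings` is consumed here
        moved.push(s);
    }
    // `strings` can no longer be used after this point.
    (total_len, moved)
}
```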

Control Flow in Loops

A break expression exits an enclosing loop. In Rust, break works only in loops. It is not necessary in match expressions, which are unlike switch statements in this regard.

Within the body of a loop, you can give break an expression, whose value becomes that of the loop:

let answer = loop {
    if let Some(line) = next_line() {
        if line.starts_with("answer: ") {
            break line;
        }
    } else {
        break "answer: nothing";
    }
};

Naturally, all the break expressions within a loop must produce values with the same type, which becomes the type of the loop itself.
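A runnable sketch of a loop that produces a value. The function name is made up, and the example assumes limit is small enough that doubling does not overflow a u32:

```rust
fn first_power_of_two_above(limit: u32) -> u32 {
    let mut candidate = 1;
    let answer = loop {
        if candidate > limit {
            break candidate; // becomes the value of the whole `loop` expression
        }
        candidate *= 2;
    };
    answer
}
```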

A continue expression jumps to the next loop iteration:

// Read some data, one line at a time.
for line in input_lines {
    let trimmed = trim_comments_and_whitespace(line);
    if trimmed.is_empty() {
        // Jump back to the top of the loop and
        // move on to the next line of input.
        continue;
    }
    // ...
}

A loop can be labeled. A label is written like a lifetime name, with a leading apostrophe:

'search:
for room in apartment {
    for spot in room.hiding_spots() {
        if spot.contains(keys) {
            println!("Your keys are {} in the {}.", spot, room);
            break 'search;
        }
    }
}

A break can have both a label and a value expression:

// Find the square root of the first perfect square
// in the series.
let sqrt = 'outer: loop {
    let n = next_number();
    for i in 1.. {
        let square = i * i;
        if square == n {
            // Found a square root.
            break 'outer i;
        }
        // `if` without an `else`
        if square > n {
            // `n` isn't a perfect square, try the next
            break;
        }
    }
};

Labels can also be used with continue.
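For instance, a labeled continue can skip the rest of an outer iteration from inside an inner loop. A sketch, with a hypothetical function name and label:

```rust
fn rows_without_negatives(grid: &[Vec<i32>]) -> usize {
    let mut count = 0;
    'rows: for row in grid {
        for &cell in row {
            if cell < 0 {
                continue 'rows; // abandon this row, move on to the next one
            }
        }
        count += 1; // reached only if no cell in the row was negative
    }
    count
}
```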

`return` Expressions

A return expression exits the current function, returning a value to the caller.

return without a value is shorthand for return ():

fn f() {    // return type omitted: defaults to ()
    return; // return value omitted: defaults to ()
}

Functions don’t have to have an explicit return expression. The body of a function works like a block expression: if the last expression isn’t followed by a semicolon, its value is the function’s return value. This is the preferred way to supply a function’s return value in Rust.

Like a break expression, return can abandon work in progress. For example, the ? operator is shorthand for a match expression whose error arm returns early:

let output = File::create(filename)?;

// equivalent to
let output = match File::create(filename) {
    Ok(f) => f,
    Err(err) => return Err(err)
};

If the call to File::create() returns Ok(f), then the whole match expression evaluates to f, so f is stored in output, and we continue with the next line of code following the match. Otherwise, we’ll match Err(err) and hit the return expression. When that happens, it doesn’t matter that we’re in the middle of evaluating a match expression to determine the value of the variable output. We abandon all of that and exit the enclosing function, returning whatever error we got from File::create().
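The same early-return behavior can be tried without touching the filesystem. In this sketch (both function names are assumptions), parsing a string stands in for File::create:

```rust
use std::num::ParseIntError;

fn double_with_question_mark(text: &str) -> Result<i32, ParseIntError> {
    let n = text.trim().parse::<i32>()?;
    Ok(n * 2)
}

fn double_with_match(text: &str) -> Result<i32, ParseIntError> {
    let n = match text.trim().parse::<i32>() {
        Ok(n) => n,
        // Abandons the `let` and exits the function immediately.
        Err(err) => return Err(err),
    };
    Ok(n * 2)
}
```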

Why Rust Has `loop`

Several pieces of the Rust compiler analyze the flow of control through your program:

Rust checks that every path through a function returns a value of the expected return type. To do this correctly, it needs to know whether it’s possible to reach the end of the function.
Rust checks that local variables are never used uninitialized. This entails checking every path through a function to make sure there’s no way to reach a place where a variable is used without having already passed through code that initializes it.
Rust warns about unreachable code. Code is unreachable if no path through the function reaches it.

These are called flow-sensitive analyses. They are nothing new; Java has had a “definite assignment” analysis, similar to Rust’s, for years.

When enforcing this sort of rule, a language must strike a balance between simplicity, which makes it easier for programmers to figure out what the compiler is talking about sometimes, and cleverness, which can help eliminate false warnings and cases where the compiler rejects a perfectly safe program.

Rust went for simplicity. Its flow-sensitive analyses do not examine loop conditions at all, instead simply assuming that any condition in a program can be either true or false.

This causes Rust to reject some safe programs:

fn wait_for_process(process: &mut Process) -> i32 {
    while true {
        if process.wait() {
            return process.exit_code();
        }
    }
} // error: mismatched types: expected i32, found ()

The error here is bogus. This function only exits via the return expression, so the fact that the while loop doesn’t produce an i32 is irrelevant.

The loop expression is offered as a “say-what-you-mean” solution to this problem.
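A sketch of the rejected function rewritten with loop. The Process type here is a hypothetical stand-in that finishes after a fixed number of polls:

```rust
// Hypothetical stand-in for the Process type used in the example above.
struct Process {
    polls_left: u32,
    code: i32,
}

impl Process {
    // Returns true once the process has finished.
    fn wait(&mut self) -> bool {
        if self.polls_left == 0 {
            true
        } else {
            self.polls_left -= 1;
            false
        }
    }

    fn exit_code(&self) -> i32 {
        self.code
    }
}

fn wait_for_process(process: &mut Process) -> i32 {
    loop {
        if process.wait() {
            return process.exit_code();
        }
    }
} // ok: a `loop` with no `break` never falls through to the end of the function
```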

Rust’s type system is affected by control flow, too. All branches of an if expression must have the same type. But it would be silly to enforce this rule on blocks that end with a break or return expression, an infinite loop, or a call to panic!() or std::process::exit(). What all those expressions have in common is that they never finish in the usual way, producing a value.

So in Rust, these expressions don’t have a normal type. Expressions that don’t finish normally are assigned the special type !, and they’re exempt from the rules about types having to match. You can see ! in the function signature of std::process::exit():

fn exit(code: i32) -> !

The ! means that exit() never returns. It’s a divergent function.

You can write divergent functions of your own using the same syntax:

fn serve_forever(socket: ServerSocket, handler: ServerHandler) -> ! {
    socket.listen();
    loop {
        let s = socket.accept();
        handler.handle(s);
    }
}

Rust considers it an error if the self-defined divergent function can return normally.
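A sketch of how the ! type exempts a branch from the type-matching rule. The functions fail and parse_port are hypothetical:

```rust
// A divergent function: it always panics, so it never returns normally.
fn fail(msg: &str) -> ! {
    panic!("fatal: {}", msg);
}

fn parse_port(text: &str) -> u16 {
    match text.parse::<u16>() {
        Ok(port) => port,            // this arm has type u16
        Err(_) => fail("bad port"),  // this arm has type !, which is accepted
    }
}
```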

Function and Method Calls

The syntax for calling functions and methods is the same in Rust:

let x = gcd(1302, 462);         // function call
let room = player.location();   // method call

When passing arguments, the argument types must match exactly; for method calls, the rules are relaxed a bit.

A third syntax is used for calling type-associated functions:

let mut numbers = Vec::new(); // type-associated function call

These are similar to static methods in object-oriented languages: ordinary methods are called on values (like my_vec.len()), and type-associated functions are called on types (like Vec::new()).

Method calls can be chained:

server
    .bind("127.0.0.1:3000").expect("error binding server to address")
    .run().expect("error running server");

One quirk of Rust syntax is that in a function call or method call, the usual syntax for generic types, Vec<T>, does not work:

return Vec<i32>::with_capacity(1000);       // error: something about chained comparisons

let ramp = (0 .. n).collect<Vec<i32>>();    // same error

The problem is that in expressions, < is the less-than operator. The Rust compiler suggests writing ::<T> instead of <T> in this case, and that solves the problem.

return Vec::<i32>::with_capacity(1000);     // ok, using ::<

let ramp = (0 .. n).collect::<Vec<i32>>();  // ok, using ::<

The symbol ::<...> is known in the Rust community as the turbofish.

Alternatively, it is often possible to drop the type parameters and let Rust infer them:

return Vec::with_capacity(10);              // ok, if the fn return type is Vec<i32>

let ramp: Vec<i32> = (0 .. n).collect();    // ok, variable's type is given

It’s considered good style to omit the types whenever they can be inferred.
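The three spellings side by side, as a compilable sketch (the function name ramps is an assumption):

```rust
fn ramps(n: i32) -> (Vec<i32>, Vec<i32>, Vec<i32>) {
    // Turbofish on a method call.
    let explicit = (0..n).collect::<Vec<i32>>();

    // No turbofish: the variable's type annotation drives inference.
    let annotated: Vec<i32> = (0..n).collect();

    // Turbofish on a type-associated function.
    let mut built = Vec::<i32>::with_capacity(n.max(0) as usize);
    built.extend(0..n);

    (explicit, annotated, built)
}
```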

Fields and Elements

The fields of a struct are accessed using familiar syntax. Tuples are the same except that their fields have numbers rather than names:

game.black_pawns    // struct field
coords.1            // tuple element

If the value to the left of the dot is a reference or smart pointer type, it is automatically dereferenced, just as for method calls.

Square brackets access the elements of an array, a slice, or a vector:

pieces[i]           // array element

The value to the left of the brackets is automatically dereferenced.

Expressions like these three are called lvalues, because they can appear on the left side of an assignment:

game.black_pawns = 0x00ff0000_00000000_u64;
coords.1 = 0;
pieces[2] = Some(Piece::new(Black, Knight, coords));

Extracting a slice from an array or vector is straightforward:

// extracting a slice from an array or vector
let second_half = &game_moves[midpoint .. end];

Here, game_moves may be either an array, a slice, or a vector; the result is a borrowed slice of length end - midpoint. game_moves is considered borrowed for the lifetime of second_half.

The .. operator allows either operand to be omitted:

..      // RangeFull
a ..    // RangeFrom { start: a }
.. b    // RangeTo { end: b }
a .. b  // Range { start: a, end: b }

The latter two forms are end-exclusive (or half-open): the end value is not included in the range represented.

The ..= operator produces end-inclusive (or closed) ranges, which do include the end value:

..= b   // RangeToInclusive { end: b }
a ..= b // RangeInclusive::new(a, b)

Only ranges that include a start value are iterable, since a loop must have somewhere to start. But in array slicing, all six forms are useful. If the start or end of the range is omitted, it defaults to the start or end of the data being sliced.
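All six range forms applied to one small array, as a sketch (the function name slices and the sample data are made up):

```rust
fn slices() -> Vec<Vec<i32>> {
    let v = [10, 20, 30, 40, 50];
    vec![
        v[..].to_vec(),    // RangeFull: the whole array
        v[2..].to_vec(),   // RangeFrom: index 2 to the end
        v[..2].to_vec(),   // RangeTo: start up to, not including, index 2
        v[1..3].to_vec(),  // Range: indices 1 and 2
        v[..=2].to_vec(),  // RangeToInclusive: start through index 2
        v[1..=3].to_vec(), // RangeInclusive: indices 1 through 3
    ]
}
```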

Reference Operators

The unary * operator is used to access the value pointed to by a reference.

Rust automatically follows references when you use the . operator to access a field or method, so the * operator is necessary only when we want to read or write the entire value that the reference points to.
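A sketch of when * is and is not needed. The Counter type and both functions are hypothetical:

```rust
// Hypothetical type for illustration only.
struct Counter {
    hits: u32,
}

fn record_hit(counter: &mut Counter) {
    counter.hits += 1; // field access: `.` follows the reference, no `*` needed
}

fn reset(counter: &mut Counter) {
    *counter = Counter { hits: 0 }; // overwriting the entire value needs `*`
}
```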

Arithmetic, Bitwise, Comparison, and Logical Operators

Rust has the usual arithmetic operators, +, -, *, /, and %. Integer overflow is detected, and causes a panic, in debug builds. The standard library provides methods like a.wrapping_add(b) for arithmetic that wraps around on overflow instead of panicking.

Integer division rounds toward zero, and dividing an integer by zero triggers a panic even in release builds. Integers have a method a.checked_div(b) that returns an Option (None if b is zero) and never panics.

Unary - negates a number. It is supported for all the numeric types except unsigned integers. There is no unary + operator.

a % b computes the signed remainder, or modulus, of division rounding toward zero. The result has the same sign as the lefthand operand. % can be used on floating-point numbers as well as integers:

let x = 1234.567 % 10.0; // approximately 4.567
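The rules above, collected into one sketch that can be checked value by value (the function name arithmetic is an assumption):

```rust
fn arithmetic() -> (u8, Option<u8>, Option<i32>, Option<i32>, i32, i32, i32, f64) {
    (
        250u8.wrapping_add(10), // wraps modulo 256
        250u8.checked_add(10),  // None: the sum does not fit in a u8
        7i32.checked_div(2),    // Some: integer division rounds toward zero
        7i32.checked_div(0),    // None instead of a panic
        -7 / 2,                 // rounds toward zero, not toward negative infinity
        -7 % 2,                 // remainder takes the sign of the left operand
        7 % -2,
        -7.5_f64 % 2.0,         // % also works on floating-point numbers
    )
}
```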

Rust also inherits C’s bitwise integer operators, &, |, ^, <<, and >>. However, Rust uses ! instead of ~ for bitwise NOT.

Right shifts (>>) are always sign-extending on signed integer types and zero-extending on unsigned integer types.
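A brief sketch of the shift behavior and of ! as bitwise NOT (the function name shifts is made up):

```rust
fn shifts() -> (i8, u8, u8) {
    let signed: i8 = -128;          // bit pattern 1000_0000
    let unsigned: u8 = 0b1000_0000; // same bit pattern, value 128
    (
        signed >> 1,    // sign bit is copied in from the left
        unsigned >> 1,  // zero is shifted in from the left
        !0b0000_1111u8, // bitwise NOT is written `!`, not `~`
    )
}
```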

Rust’s comparison operators are ==, !=, <, <=, >, and >=. The two values being compared must have the same type.

Rust has the two short-circuiting logical operators && and ||. Both operands must have the exact type bool.

Assignment

The = operator can be used to assign to mut variables and their fields or elements. But assignment is not as common in Rust as in other languages, since variables are immutable by default.

If the value has a non-Copy type, assignment moves it into the destination. Ownership of the value is transferred from the source to the destination. The destination’s prior value, if any, is dropped.

Compound assignment is supported:

total += item.price;

Unlike C, Rust doesn’t support chaining assignment: you can’t write a = b = 3 to assign the value 3 to both a and b.

Rust does not have C’s increment and decrement operators ++ and --.

Type Casts

Converting a value from one type to another usually requires an explicit cast in Rust. Casts use the as keyword:

let x = 17;             // x is type i32
let index = x as usize; // convert to usize

Several kinds of casts are permitted:

Numbers may be cast from any of the built-in numeric types to any other.
- Casting an integer to another integer type is always well-defined. In short, there are no surprises.
  - Converting to a narrower type results in truncation.
  - A signed integer cast to a wider type is sign-extended, an unsigned integer is zero-extended, and so on.
- Converting from a floating-point type to an integer type rounds toward zero.
  - The value of -1.99 as i32 is -1.
  - If the value is too large to fit in the integer type, the cast produces the closest value that the integer type can represent: the value of 1e6 as u8 is 255.
Values of type bool or char, or of a C-like enum type, may be cast to any integer type.
- Casting in the other direction is not allowed, as bool, char, and enum types all have restrictions on their values that would have to be enforced with run-time checks.
  - Casting a u16 to type char is banned because some u16 values, like 0xd800, correspond to Unicode surrogate code points and therefore would not make valid char values. There is a standard method, std::char::from_u32(), which performs the run-time check and returns an Option<char>. The need for this kind of conversion has grown rare. We typically convert whole strings or streams at once, and algorithms on Unicode text are often nontrivial and best left to libraries.
- As an exception, a u8 may be cast to type char, since all integers from 0 to 255 are valid Unicode code points for char to hold.
Some casts involving unsafe pointer types are also allowed.
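The numeric and character casts listed above can be checked directly. A sketch, with a hypothetical function name:

```rust
fn casts() -> (i32, u8, u8, u32, i64, u32, char, Option<char>) {
    (
        -1.99_f64 as i32,            // float to integer rounds toward zero
        1e6_f64 as u8,               // too large: closest representable value
        300_i32 as u8,               // narrowing truncates to the low 8 bits
        true as u32,                 // bool to integer
        -1_i8 as i64,                // signed widening is sign-extended
        'A' as u32,                  // char to integer
        97_u8 as char,               // the one integer-to-char cast allowed
        std::char::from_u32(0xd800), // surrogate code point: not a valid char
    )
}
```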

A conversion usually requires a cast. A few conversions involving reference types are so straightforward that the language performs them even without a cast:

Values of type &String auto-convert to type &str without a cast.
Values of type &Vec<i32> auto-convert to &[i32].
Values of type &Box<Chessboard> auto-convert to &Chessboard.
Values of a mut reference auto-convert to a non-mut reference without a cast.

The first three of these are called deref coercions, because they apply to types that implement the Deref built-in trait. The purpose of Deref coercion is to make smart pointer types, like Box, behave as much like the underlying value as possible. Using a Box<Chessboard> is mostly just like using a plain Chessboard, thanks to Deref.

User-defined types can implement the Deref trait, too.
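A sketch of deref coercions at function-call boundaries. The helper functions shout and total are assumptions invented for the example:

```rust
fn shout(text: &str) -> String {
    text.to_uppercase()
}

fn total(values: &[i32]) -> i32 {
    values.iter().sum()
}

fn coercions() -> (String, i32, String) {
    let owned = String::from("hi");
    let numbers = vec![1, 2, 3];
    let boxed = Box::new(String::from("boxed"));
    (
        shout(&owned),   // &String coerces to &str
        total(&numbers), // &Vec<i32> coerces to &[i32]
        shout(&boxed),   // &Box<String> coerces to &String, then to &str
    )
}
```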

Closures

Rust has closures, lightweight function-like values. A closure usually consists of an argument list, given between vertical bars, followed by an expression:

let is_even = |x| x % 2 == 0;
assert_eq!(is_even(14), true);

Rust infers the argument types and return type. You can also write them out explicitly. If you do specify a return type, then the body of the closure must be a block:

let is_even = |x: u64| -> bool x % 2 == 0;      // error
let is_even = |x: u64| -> bool { x % 2 == 0 };  // ok

Calling a closure uses the same syntax as calling a function:

assert_eq!(is_even(14), true);
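Unlike a nested fn, a closure can also use variables from the enclosing scope. A sketch with a made-up function, assuming divisor is nonzero (a zero divisor would panic):

```rust
fn count_multiples(values: &[u64], divisor: u64) -> usize {
    // Explicit argument and return types, so the body must be a block.
    // The closure captures `divisor` from the enclosing function.
    let is_multiple = |x: u64| -> bool { x % divisor == 0 };
    values.iter().copied().filter(|&x| is_multiple(x)).count()
}
```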

References

Programming Rust, 2nd Edition (Covers the Rust 2021 Edition)
