TL;DR
Expressions make up the body of Rust functions and thus the majority of Rust code. Most things in Rust are expressions, and control flow in Rust is entirely expression-oriented.
Rust usually makes a sharp distinction between references and the values they refer to. If you pass a &i32 to a function that expects an i32, that’s a type error. You’ll notice that the . operator relaxes those rules a bit. In the method call player.location(), player might be a Player, a reference of type &Player, or a smart pointer of type Box<Player> or Rc<Player>. The .location() method might take the player either by value or by reference. The same .location() syntax works in all cases, because Rust’s . operator automatically dereferences player or borrows a reference to it as needed.
- When passing arguments, the argument type must match exactly; method calls relax this rule somewhat.
An Expression Language
In C, there is a sharp distinction between expressions and statements. Expressions have values. Statements don’t.
Rust is what is called an expression language. It follows an older tradition, dating back to Lisp, where expressions do all the work.
In C, if and switch are statements. They don’t produce a value, and they can’t be used in the middle of an expression. In Rust, if and match can produce values.
This explains why Rust does not have C’s ternary operator (expr1 ? expr2 : expr3). In C, it is a handy expression-level analogue to the if statement. It would be redundant in Rust: the if expression handles both cases.
Most of the control flow tools in C are statements. In Rust, they are all expressions.
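As a minimal sketch of if and match used as expressions (the function names sign_label and http_reason are hypothetical, chosen only for illustration):

```rust
// Hypothetical helper: `if` is an expression, so it can be the function's value.
fn sign_label(n: i32) -> &'static str {
    if n < 0 { "negative" } else { "non-negative" }
}

// Hypothetical helper: `match` evaluates to the value of the arm that runs.
fn http_reason(code: u16) -> &'static str {
    match code {
        200 => "OK",
        404 => "Not Found",
        _ => "Unknown",
    }
}

fn main() {
    // An `if` expression can appear where C would need the ternary operator.
    let message = format!("value is {}", sign_label(-3));
    println!("{message}");
    println!("{}", http_reason(404));
}
```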
Precedence and Associativity
Like most programming languages, Rust has operator precedence to determine the order of operations when an expression contains multiple adjacent operators.
All of the operators that can usefully be chained are left-associative. That is, a chain of operations such as a - b - c is grouped as (a - b) - c, not a - (b - c).
Operators that can be chained: * / % + - << >> & ^ | && || as
The comparison operators, the assignment operators, and the range operators .. and ..= can’t be chained at all.
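A small sketch of left-associativity in practice; chained_sub is a hypothetical helper that relies on the default grouping:

```rust
// Hypothetical helper: no parentheses, so Rust groups this as (a - b) - c.
fn chained_sub(a: i32, b: i32, c: i32) -> i32 {
    a - b - c
}

fn main() {
    let (a, b, c) = (10, 4, 3);
    // The default grouping matches the explicit left-grouped form.
    assert_eq!(chained_sub(a, b, c), (a - b) - c);
    // The right-grouped form gives a different answer.
    assert_ne!(chained_sub(a, b, c), a - (b - c));
    println!("{}", chained_sub(a, b, c));
}
```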
Blocks and Semicolons
Blocks are the most general kind of expression. A block produces a value and can be used anywhere a value is needed.
If a block has semicolons in all the familiar places like C or Java, then its value will be (). When you leave the semicolon off the last line of a block, that makes the value of the block the value of its final expression, rather than the usual ().
An if without an else must always return ().
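The effect of the final semicolon can be sketched as follows; block_value and unit_block are hypothetical names used only for this illustration:

```rust
// Hypothetical helper: the inner block's last line has no semicolon,
// so the block evaluates to `y + 1`.
fn block_value(x: i32) -> i32 {
    let result = {
        let y = x * 2;
        y + 1
    };
    result
}

// Hypothetical helper: every line in the inner block ends with a semicolon,
// so the block's value is ().
fn unit_block() {
    let unit: () = {
        let _ignored = 1 + 1;
    };
    unit
}

fn main() {
    println!("{}", block_value(3));
    unit_block();
}
```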
Declarations
In addition to expressions and semicolons, a block may contain any number of declarations. The most common are let declarations, which declare local variables. The general form is let name: type = expr; where the type and initializer are optional. The semicolon is required.
Like all identifiers in Rust, variable names must start with a letter or underscore, and can contain digits only after that first character. Rust has a broad definition of “letter”: it includes Greek letters, accented Latin characters, and many more symbols—anything that Unicode Standard Annex #31 declares suitable. Emoji aren’t allowed.
A let declaration can declare a variable without initializing it. The variable can then be initialized with a later assignment.
For example, a variable called name might be declared first and then assigned in each branch of an if/else. There are then two different ways the local variable name might be initialized, but either way it will be initialized exactly once, so name does not need to be declared mut. It’s an error to use a variable before it’s initialized. (This is closely related to the error of using a value after it’s been moved. Rust really wants you to use values only while they exist.)
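A minimal sketch of deferred initialization, assuming a hypothetical display_name function that is not part of the original notes:

```rust
// Hypothetical helper: `name` is declared without a value and without `mut`,
// then initialized exactly once on each path.
fn display_name(nickname: Option<&str>, id: u32) -> String {
    let name;
    if let Some(nick) = nickname {
        name = nick.to_string();
    } else {
        name = format!("user-{id}");
    }
    name
}

fn main() {
    println!("{}", display_name(Some("ferris"), 7));
    println!("{}", display_name(None, 7));
}
```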
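Shadowing: a let declaration can reuse the name of an existing variable.
For example, the first variable named line may have type Result<String, io::Error>, while a second variable, also named line, is a String. The second definition supersedes the first’s for the rest of the block.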
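Shadowing can be sketched with a parsing result instead of file I/O; parse_and_double is a hypothetical function, and the same pattern applies to the Result<String, io::Error> case described above:

```rust
use std::num::ParseIntError;

// Hypothetical helper: the second `value` shadows the first.
fn parse_and_double(text: &str) -> Result<i32, ParseIntError> {
    let value = text.trim().parse::<i32>(); // Result<i32, ParseIntError>
    let value = value?; // i32; supersedes the first `value` from here on
    Ok(value * 2)
}

fn main() {
    println!("{:?}", parse_and_double(" 21 "));
    println!("{:?}", parse_and_double("abc"));
}
```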
A block can also contain item declarations. An item is simply any declaration that could appear globally in a program or module, such as a fn, struct, or use.
Any block may contain an fn.
When an fn is declared inside a block, its scope is the entire block; that is, it can be used throughout the enclosing block. But a nested fn cannot access local variables or arguments that happen to be in scope. For example, a nested function such as cmp_by_timestamp_then_name could not use a local variable v of the enclosing function directly. Closures, by contrast, do see into enclosing scopes.
A block can even contain a whole module.
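The difference between a nested fn and a closure can be sketched like this; sorted_by_distance and distance are hypothetical names, not the code the notes originally showed:

```rust
// Hypothetical helper: sorts points by their distance from `origin`.
fn sorted_by_distance(mut points: Vec<i32>, origin: i32) -> Vec<i32> {
    // A nested fn is an item. It is visible throughout this block,
    // but it cannot read `points` or `origin`; they must be passed in.
    fn distance(a: i32, b: i32) -> i32 {
        (a - b).abs()
    }

    // The closure, by contrast, can use the local `origin` directly.
    points.sort_by_key(|&p| distance(p, origin));
    points
}

fn main() {
    println!("{:?}", sorted_by_distance(vec![10, 1, 5, 4], 4));
}
```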
if and match
The form of an if expression is: if condition1 { block1 } else if condition2 { block2 } else { block_n }
Each condition must be an expression of type bool; true to form, Rust does not implicitly convert numbers or pointers to Boolean values.
Parentheses are not required around conditions. rustc will emit a warning if unnecessary parentheses are present. The curly braces, however, are required.
The else if blocks, as well as the final else, are optional. An if expression with no else block behaves exactly as though it had an empty else block.
match expressions are something like the C switch statement, but more flexible.
The wildcard pattern _ matches everything. This is like the default: case in a switch statement, except that it must come last; placing a _ pattern before other patterns means that it will have precedence over them, so those patterns will never match anything (the compiler warns about such unreachable patterns).
The compiler can optimize a match on integer constants using a jump table, just like a switch statement in C++. A similar optimization is applied when each arm of a match produces a constant value. In that case, the compiler builds an array of those values, and the match is compiled into an array access. Apart from a bounds check, there is no branching at all in the compiled code.
Note (adapted from a ChatGPT answer): A jump table, also known as a dispatch table or a branch table, is a data structure used to handle multiple branches or cases efficiently. It contains a list of addresses or offsets, one per case, each pointing to a code block or function. When the program encounters a switch statement or a similar construct, it evaluates an expression and uses the resulting value as an index into the table, so it can jump directly to the desired code without a chain of conditional checks. This approach pays off when there are many cases whose values are densely packed, because it avoids a linear search through the cases; for sparse, non-contiguous values, compilers typically fall back to comparisons or a binary search. Jump tables are commonly used by C and C++ compilers to implement switch statements.
The general form of a match expression is: match value { pattern => expr, ... }
The comma after an arm may be dropped if the expr is a block.
Rust checks the given value against each pattern in turn, starting with the first. When a pattern matches, the corresponding expr is evaluated, and the match expression is complete; no further patterns are checked. At least one of the patterns must match. Rust prohibits match expressions that do not cover all possible values.
All blocks of an if expression must produce values of the same type. Similarly, all arms of a match expression must have the same type.
The versatility of match stems from the variety of supported patterns that can be used to the left of => in each arm. A pattern can match a range of values. It can unpack tuples. It can match against individual fields of structs. It can chase references, borrow parts of a value, and more. Rust’s patterns are a mini-language of their own.
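A short sketch of a few of these pattern kinds (ranges, alternatives, tuples, and the wildcard); describe and axis are hypothetical functions written for this illustration:

```rust
// Hypothetical helper: literal, range, and alternative patterns.
fn describe(n: u32) -> &'static str {
    match n {
        0 => "zero",
        1..=9 => "single digit",
        10 | 20 | 30 => "round number",
        _ => "something else", // the wildcard must come last
    }
}

// Hypothetical helper: tuple patterns that bind parts of the value.
fn axis(point: (i32, i32)) -> String {
    match point {
        (0, 0) => "origin".to_string(),
        (x, 0) => format!("on the x axis at {x}"),
        (0, y) => format!("on the y axis at {y}"),
        _ => "off both axes".to_string(),
    }
}

fn main() {
    println!("{}", describe(7));
    println!("{}", axis((3, 0)));
}
```

Every arm in each match produces the same type (&'static str in the first, String in the second), and the final _ arm makes each match exhaustive.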
if let
An if let expression is another form of if expression: if let pattern = expr { block1 } else { block2 }
The given expr either matches the pattern, in which case block1 runs, or doesn’t match, and block2 runs. Sometimes this is a nice way to get data out of an Option or Result.
It’s never strictly necessary to use if let, because match can do everything if let can do. An if let expression is shorthand for a match with just one pattern: match expr { pattern => { block1 } _ => { block2 } }
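The equivalence between if let and a one-pattern match can be sketched with two hypothetical functions that should always agree:

```rust
// Hypothetical helper written with `if let`.
fn first_word_len_if_let(text: &str) -> usize {
    if let Some(word) = text.split_whitespace().next() {
        word.len()
    } else {
        0
    }
}

// The same logic written as a `match` with one pattern plus a wildcard.
fn first_word_len_match(text: &str) -> usize {
    match text.split_whitespace().next() {
        Some(word) => word.len(),
        _ => 0,
    }
}

fn main() {
    println!("{}", first_word_len_if_let("hello world"));
    println!("{}", first_word_len_match("hello world"));
}
```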
Loops
There are four looping expressions: while condition { block }, while let pattern = expr { block }, loop { block }, and for pattern in iterable { block }.
Loops are expressions in Rust. The value of a while or for loop is always (). A loop expression can produce a value if you specify one.
A while loop behaves exactly like the C equivalent, except that, again, the condition must be of the exact type bool.
The while let loop is analogous to if let. At the beginning of each while let loop iteration, the value of expr either matches the given pattern, in which case the block runs, or doesn’t, in which case the loop exits.
Use loop to write infinite loops. It executes the block repeatedly forever (or until a break or return is reached or the thread panics).
A for loop evaluates the iterable expression and then evaluates the block once for each value in the resulting iterator.
The .. operator produces a range, a simple struct with two fields: start and end. 0..20 is the same as std::ops::Range { start: 0, end: 20 }. Ranges can be used with for loops because Range is an iterable type: it implements the std::iter::IntoIterator trait. The standard collections like Vec and HashMap are all iterable, as are arrays and slices.
A for loop over a value consumes the value, so the collection cannot be used after the loop.
The easy remedy is to loop over a reference to the collection instead; the loop variable is then a reference to each item.
Iterating over a mut reference provides a mut reference to each element.
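The three ways of iterating over a collection can be sketched as follows; exclaim_all is a hypothetical function introduced only for this example:

```rust
// Hypothetical helper: measures the words, then appends '!' to each one.
fn exclaim_all(mut words: Vec<String>) -> (usize, Vec<String>) {
    let mut total_len = 0;
    // Looping over a shared reference: `word` is a &String.
    for word in &words {
        total_len += word.len();
    }
    // Looping over a mut reference: `word` is a &mut String.
    for word in &mut words {
        word.push('!');
    }
    (total_len, words)
}

fn main() {
    let words = vec!["hi".to_string(), "rust".to_string()];
    let (total_len, words) = exclaim_all(words);
    println!("{total_len}");
    // Looping over the value itself consumes `words`; it is unusable afterwards.
    for word in words {
        println!("{word}");
    }
}
```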
Control Flow in Loops
A break expression exits an enclosing loop. In Rust, break works only in loops. It is not necessary in match expressions, which are unlike switch statements in this regard.
Within the body of a loop expression (but not a while or for loop), you can give break an expression, whose value becomes that of the loop.
Naturally, all the break expressions within a loop must produce values with the same type, which becomes the type of the loop itself.
A continue expression jumps to the next loop iteration.
A loop can be labeled using lifetime-like syntax: an apostrophe followed by a name, placed before the loop keyword and followed by a colon. A break that names the label exits the labeled loop rather than the innermost one.
A break can have both a label and a value expression.
Labels can also be used with continue.
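A sketch of break with a value and of a labeled continue; both functions and the 'rows label are hypothetical names chosen for this example:

```rust
// Hypothetical helper: the value given to `break` becomes the value of `loop`.
fn first_square_above(limit: u32) -> u32 {
    let mut n = 0;
    loop {
        let square = n * n;
        if square > limit {
            break square;
        }
        n += 1;
    }
}

// Hypothetical helper: `continue 'rows` skips to the next row of the outer loop.
fn count_rows_without_negatives(grid: &[Vec<i32>]) -> usize {
    let mut count = 0;
    'rows: for row in grid {
        for &cell in row {
            if cell < 0 {
                continue 'rows;
            }
        }
        count += 1;
    }
    count
}

fn main() {
    println!("{}", first_square_above(50));
    let grid = vec![vec![1, 2], vec![3, -1], vec![], vec![5]];
    println!("{}", count_rows_without_negatives(&grid));
}
```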
return Expressions
A return expression exits the current function, returning a value to the caller. return without a value is shorthand for return ().
Functions don’t have to have an explicit return expression. The body of a function works like a block expression: if the last expression isn’t followed by a semicolon, its value is the function’s return value. This is the preferred way to supply a function’s return value in Rust.
Like a break expression, return can abandon work in progress. For example, suppose a variable output is initialized from a match on the result of File::create(), with one arm for Ok(f) and one arm for Err(err) that returns early.
If the call to File::create() returns Ok(f), then the whole match expression evaluates to f, so f is stored in output, and we continue with the next line of code following the match. Otherwise, we’ll match Err(err) and hit the return expression. When that happens, it doesn’t matter that we’re in the middle of evaluating a match expression to determine the value of the variable output. We abandon all of that and exit the enclosing function, returning whatever error we got from File::create().
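The same early-return shape can be sketched without touching the file system by swapping File::create() for string parsing; parse_port is a hypothetical function and not the example the notes originally used:

```rust
use std::num::ParseIntError;

// Hypothetical helper: `return` inside a match arm abandons the `let`.
fn parse_port(text: &str) -> Result<u16, ParseIntError> {
    let port = match text.parse::<u16>() {
        Ok(p) => p,                  // the match evaluates to p
        Err(err) => return Err(err), // exits parse_port immediately
    };
    // Only reached when parsing succeeded.
    Ok(port)
}

fn main() {
    println!("{:?}", parse_port("8080"));
    println!("{:?}", parse_port("not a port"));
}
```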
Why Rust Has loop
Several pieces of the Rust compiler analyze the flow of control through your program:
- Rust checks that every path through a function returns a value of the expected return type. To do this correctly, it needs to know whether it’s possible to reach the end of the function.
- Rust checks that local variables are never used uninitialized. This entails checking every path through a function to make sure there’s no way to reach a place where a variable is used without having already passed through code that initializes it.
- Rust warns about unreachable code. Code is unreachable if no path through the function reaches it.
These are called flow-sensitive analyses. They are nothing new; Java has had a “definite assignment” analysis, similar to Rust’s, for years.
When enforcing this sort of rule, a language must strike a balance between simplicity, which makes it easier for programmers to figure out what the compiler is talking about sometimes, and cleverness, which can help eliminate false warnings and cases where the compiler rejects a perfectly safe program.
Rust went for simplicity. Its flow-sensitive analyses do not examine loop conditions at all, instead simply assuming that any condition in a program can be either true or false.
This causes Rust to reject some safe programs. For example, a function declared to return i32 whose body is a while true loop containing a return is rejected with a type mismatch error.
The error here is bogus. This function only exits via the return expression, so the fact that the while loop doesn’t produce an i32 is irrelevant.
The loop expression is offered as a “say-what-you-mean” solution to this problem.
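A minimal sketch of the loop form, using a hypothetical function; replacing loop with while true in this body would make the compiler report a type mismatch, because the end of the function would then be considered reachable:

```rust
// Hypothetical helper: the only way out is the `return`, and `loop`
// tells the compiler that the end of the function is never reached.
fn first_multiple_of_seven_from(start: i32) -> i32 {
    let mut n = start;
    loop {
        if n % 7 == 0 {
            return n;
        }
        n += 1;
    }
}

fn main() {
    println!("{}", first_multiple_of_seven_from(15));
}
```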
Rust’s type system is affected by control flow, too. All branches of an if expression must have the same type. But it would be silly to enforce this rule on blocks that end with a break or return expression, an infinite loop, or a call to panic!() or std::process::exit(). What all those expressions have in common is that they never finish in the usual way, producing a value.
So in Rust, these expressions don’t have a normal type. Expressions that don’t finish normally are assigned the special type !, and they’re exempt from the rules about types having to match. You can see ! in the function signature of std::process::exit(): fn exit(code: i32) -> !
The ! means that exit() never returns. It’s a divergent function.
You can write divergent functions of your own using the same syntax.
Rust considers it an error if the self-defined divergent function can return normally.
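A sketch of a user-defined divergent function; fail and parse_or_fail are hypothetical names, and panicking is just one possible way to never return:

```rust
// Hypothetical divergent function: it always panics, so it never returns.
fn fail(message: &str) -> ! {
    panic!("fatal: {message}");
}

// Hypothetical helper: the Err arm has type `!`, which is exempt from
// the rule that all arms must have the same type as the Ok arm (i32).
fn parse_or_fail(text: &str) -> i32 {
    match text.parse::<i32>() {
        Ok(n) => n,
        Err(_) => fail("not a number"),
    }
}

fn main() {
    println!("{}", parse_or_fail("12"));
}
```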
Function and Method Calls
The syntax for calling functions and methods is the same in Rust:
Rust usually makes a sharp distinction between references and the values they refer to. If you pass a &i32 to a function that expects an i32, that’s a type error. You’ll notice that the . operator relaxes those rules a bit. In the method call player.location(), player might be a Player, a reference of type &Player, or a smart pointer of type Box<Player> or Rc<Player>. The .location() method might take the player either by value or by reference. The same .location() syntax works in all cases, because Rust’s . operator automatically dereferences player or borrows a reference to it as needed.
- When passing arguments, the argument type must match exactly; method calls relax this rule somewhat.
A third syntax is used for calling type-associated functions:
These are similar to static methods in object-oriented languages: ordinary methods are called on values (like my_vec.len()), and type-associated functions are called on types (like Vec::new()).
Method calls can be chained.
One quirk of Rust syntax is that in a function call or method call, the usual syntax for generic types, Vec<T>, does not work.
The problem is that in expressions, < is the less-than operator. The Rust compiler suggests writing ::<T> instead of <T> in this case, and that solves the problem.
The symbol ::<...> is known in the Rust community as the turbofish.
Alternatively, it is often possible to drop the type parameters and let Rust infer them.
It’s considered good style to omit the types whenever they can be inferred.
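The turbofish and its inferred alternative can be sketched as follows; the three function names are hypothetical:

```rust
// Hypothetical helper: explicit type parameters on a method call (turbofish).
fn squares_turbofish(n: u32) -> Vec<u32> {
    (1..=n).map(|x| x * x).collect::<Vec<u32>>()
}

// The same computation, letting Rust infer the type from the return type.
fn squares_inferred(n: u32) -> Vec<u32> {
    (1..=n).map(|x| x * x).collect()
}

// Turbofish on a type-associated function. Writing Vec<i32>::with_capacity(16)
// would not parse, because `<` would be read as the less-than operator.
fn empty_buffer() -> Vec<i32> {
    Vec::<i32>::with_capacity(16)
}

fn main() {
    println!("{:?}", squares_turbofish(3));
    println!("{:?}", squares_inferred(3));
    println!("{}", empty_buffer().capacity() >= 16);
}
```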
Fields and Elements
The fields of a struct are accessed using familiar syntax. Tuples are the same except that their fields have numbers rather than names.
If the value to the left of the dot is a reference or smart pointer type, it is automatically dereferenced, just as for method calls.
Square brackets access the elements of an array, a slice, or a vector.
The value to the left of the brackets is automatically dereferenced.
Expressions like these three (struct field access, tuple field access, and element access) are called lvalues, because they can appear on the left side of an assignment.
Extracting a slice from an array or vector is straightforward: let second_half = &game_moves[midpoint..end];
Here, game_moves may be either an array, a slice, or a vector; the result is a borrowed slice of length end - midpoint. game_moves is considered borrowed for the lifetime of second_half.
The .. operator allows either operand to be omitted, giving four forms: .., a.., ..b, and a..b.
The latter two forms are end-exclusive (or half-open): the end value is not included in the range represented.
The ..= operator produces end-inclusive (or closed) ranges, which do include the end value: ..=b and a..=b.
Only ranges that include a start value are iterable, since a loop must have somewhere to start. But in array slicing, all six forms are useful. If the start or end of the range is omitted, it defaults to the start or end of the data being sliced.
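A sketch of slicing with omitted bounds and of iterating over exclusive and inclusive ranges; halves and the two sum functions are hypothetical helpers:

```rust
// Hypothetical helper: omitted bounds default to the start or end of the data.
fn halves(moves: &[u32]) -> (&[u32], &[u32]) {
    let midpoint = moves.len() / 2;
    (&moves[..midpoint], &moves[midpoint..])
}

// Hypothetical helpers: a..b excludes b, a..=b includes it.
fn exclusive_sum(a: u32, b: u32) -> u32 {
    (a..b).sum()
}

fn inclusive_sum(a: u32, b: u32) -> u32 {
    (a..=b).sum()
}

fn main() {
    let game_moves = vec![1, 2, 3, 4, 5];
    let (first, second) = halves(&game_moves);
    println!("{first:?} {second:?}");
    println!("{:?}", &game_moves[1..=3]);
    println!("{} {}", exclusive_sum(1, 4), inclusive_sum(1, 4));
}
```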
Reference Operators
The unary * operator is used to access the value pointed to by a reference.
Rust automatically follows references when you use the . operator to access a field or method, so the * operator is necessary only when we want to read or write the entire value that the reference points to.
Arithmetic, Bitwise, Comparison, and Logical Operators
Rust has the usual arithmetic operators, +, -, *, /, and %. Integer overflow is detected, and causes a panic, in debug builds; in release builds, overflow checks are off by default and the result wraps around. The standard library provides methods like a.wrapping_add(b) for arithmetic that explicitly wraps instead of being checked.
Integer division rounds toward zero, and dividing an integer by zero triggers a panic even in release builds. Integers have a method a.checked_div(b) that returns an Option (None if b is zero) and never panics.
Unary - negates a number. It is supported for all the numeric types except unsigned integers. There is no unary + operator.
a % b computes the signed remainder, or modulus, of division rounding toward zero. The result has the same sign as the lefthand operand. % can be used on floating-point numbers as well as integers.
Rust also inherits C’s bitwise integer operators, &, |, ^, <<, and >>. However, Rust uses ! instead of ~ for bitwise NOT.
Right-shifting (>>) is always sign-extending on signed integer types and zero-extending on unsigned integer types.
Rust’s comparison operators are ==, !=, <, <=, >, and >=. The two values being compared must have the same type.
Rust has the two short-circuiting logical operators && and ||. Both operands must have the exact type bool.
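The arithmetic rules above can be checked with a few small cases; wrap_add and safe_div are hypothetical wrappers around the standard methods:

```rust
// Hypothetical wrapper: wraps around on overflow instead of panicking.
fn wrap_add(a: u8, b: u8) -> u8 {
    a.wrapping_add(b)
}

// Hypothetical wrapper: returns None instead of panicking on division by zero.
fn safe_div(a: i32, b: i32) -> Option<i32> {
    a.checked_div(b)
}

fn main() {
    println!("{}", wrap_add(250, 10));
    println!("{:?} {:?}", safe_div(7, 2), safe_div(7, 0));
    // Division rounds toward zero; the remainder takes the sign of the left operand.
    println!("{} {}", -7 / 2, -7 % 2);
    // % also works on floating-point numbers.
    println!("{}", 5.5_f64 % 2.0);
    // >> sign-extends on signed types and zero-extends on unsigned types.
    println!("{} {}", -8_i8 >> 1, 0x80_u8 >> 7);
    // ! is bitwise NOT on integers.
    println!("{}", !0_u8);
}
```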
Assignment
The = operator can be used to assign to mut variables and their fields or elements. But assignment is not as common in Rust as in other languages, since variables are immutable by default.
If the value has a non-Copy type, assignment moves it into the destination. Ownership of the value is transferred from the source to the destination. The destination’s prior value, if any, is dropped.
Compound assignment, such as += and -=, is supported.
Unlike C, Rust doesn’t support chaining assignment: you can’t write a = b = 3 to assign the value 3 to both a and b.
Rust does not have C’s increment and decrement operators ++ and --.
Type Casts
Converting a value from one type to another usually requires an explicit cast in Rust. Casts use the as keyword.
Several kinds of casts are permitted:
- Numbers may be cast from any of the built-in numeric types to any other.
  - Casting an integer to another integer type is always well-defined. In short, there are no surprises.
    - Converting to a narrower type results in truncation.
    - A signed integer cast to a wider type is sign-extended, an unsigned integer is zero-extended, and so on.
  - Converting from a floating-point type to an integer type rounds toward zero.
    - The value of -1.99 as i32 is -1.
    - If the value is too large to fit in the integer type, the cast produces the closest value that the integer type can represent: the value of 1e6 as u8 is 255.
- Values of type bool or char, or of a C-like enum type, may be cast to any integer type.
  - Casting in the other direction is not allowed, as bool, char, and enum types all have restrictions on their values that would have to be enforced with run-time checks.
    - Casting a u16 to type char is banned because some u16 values, like 0xd800, correspond to Unicode surrogate code points and therefore would not make valid char values. There is a standard method, std::char::from_u32(), which performs the run-time check and returns an Option<char>. The need for this kind of conversion has grown rare. We typically convert whole strings or streams at once, and algorithms on Unicode text are often nontrivial and best left to libraries.
  - As an exception, a u8 may be cast to type char, since all integers from 0 to 255 are valid Unicode code points for char to hold.
- Some casts involving unsafe pointer types are also allowed.
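The cast rules listed above can be sketched with a handful of hypothetical one-line helpers, each wrapping a single as cast or the std::char::from_u32() check:

```rust
// Hypothetical helpers, each wrapping one kind of cast.
fn float_to_i32(x: f64) -> i32 { x as i32 }  // rounds toward zero
fn float_to_u8(x: f64) -> u8 { x as u8 }     // clamps to the closest u8
fn narrow(n: i32) -> u8 { n as u8 }          // truncates to the low 8 bits
fn widen(n: i8) -> i32 { n as i32 }          // sign-extends
fn checked_char(code: u32) -> Option<char> { // run-time checked conversion
    std::char::from_u32(code)
}

fn main() {
    println!("{} {}", float_to_i32(-1.99), float_to_u8(1e6));
    println!("{} {}", narrow(300), widen(-1));
    // bool, char, and u8 casts that are always allowed.
    println!("{} {} {}", true as i32, 'A' as u32, 65_u8 as char);
    // 0xd800 is a surrogate code point, so there is no such char.
    println!("{:?} {:?}", checked_char(0x41), checked_char(0xd800));
}
```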
A conversion usually requires a cast. A few conversions involving reference types are so straightforward that the language performs them even without a cast:
- Values of type &String auto-convert to type &str without a cast.
- Values of type &Vec<i32> auto-convert to &[i32].
- Values of type &Box<Chessboard> auto-convert to &Chessboard.
- Values of a mut reference auto-convert to a non-mut reference without a cast.
The first three are called deref coercions, because they apply to types that implement the Deref built-in trait. The purpose of Deref coercion is to make smart pointer types, like Box, behave as much like the underlying value as possible. Using a Box<Chessboard> is mostly just like using a plain Chessboard, thanks to Deref.
User-defined types can implement the Deref trait, too.
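A sketch of these automatic conversions at function-call boundaries; the Chessboard struct here is a hypothetical stand-in with a single field, and the three functions are illustrative only:

```rust
// Hypothetical stand-in for the Chessboard type mentioned above.
struct Chessboard {
    squares: u32,
}

fn shout(text: &str) -> String {
    text.to_uppercase()
}

fn sum(values: &[i32]) -> i32 {
    values.iter().sum()
}

fn square_count(board: &Chessboard) -> u32 {
    board.squares
}

fn main() {
    let s = String::from("hi");
    let v = vec![1, 2, 3];
    let b = Box::new(Chessboard { squares: 64 });
    // &String -> &str, &Vec<i32> -> &[i32], &Box<Chessboard> -> &Chessboard,
    // all without an explicit cast.
    println!("{} {} {}", shout(&s), sum(&v), square_count(&b));
}
```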
Closures
Rust has closures, lightweight function-like values. A closure usually consists of an argument list, given between vertical bars, followed by an expression, as in |x| x % 2 == 0.
Rust infers the argument types and return type. You can also write them out explicitly. If you do specify a return type, then the body of the closure must be a block.
Calling a closure uses the same syntax as calling a function.
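A brief sketch of both closure forms and of calling them; apply_twice and make_adder are hypothetical helpers added for illustration:

```rust
// Hypothetical helper: accepts any closure or function from i32 to i32.
fn apply_twice(f: impl Fn(i32) -> i32, x: i32) -> i32 {
    f(f(x))
}

// Hypothetical helper: returns a closure that captures `offset`.
fn make_adder(offset: i32) -> impl Fn(i32) -> i32 {
    move |x| x + offset
}

fn main() {
    // Argument and return types are inferred from how the closure is used.
    let is_even = |x| x % 2 == 0;
    // With an explicit return type, the body must be a block.
    let is_odd = |x: u64| -> bool { x % 2 == 1 };

    // Closures are called with ordinary function-call syntax.
    println!("{} {}", is_even(14), is_odd(14));
    println!("{}", apply_twice(|x| x * 2, 3));
    println!("{}", apply_twice(make_adder(5), 1));
}
```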