Data structures in Rust
2023-02-05
TL;DR
This blog post aims to provide a brief exploration of data structures in Rust, following the organization of The alghoritm design Manual by Skiena. The focus will be on the various data structures available in Rust std::Collections, and a brief comparison will be made with the structures listed by Skiena.
Toc
[
Hide]
-
1 Contiguous and Linked
- 1.1 Array vs Vectors vs Slices
- 1.2 VecDeque
- 1.3 LinkedList
- 1.4 Performance
-
2 Dictionaries
- 2.1 HashMap
- 2.2 BTreeMap
- 2.3 Performance
- 3 Set
-
4 Priority queues
- 4.1 BinaryHeap
Contiguous and Linked
Array vs Vectors vs Slices
Rust has essentially 3 way to represent a sequence of items, the simpler is the array, array is a collection of data of the same type, data are allocated continously on the stack, for the size of an array is immutable even if its content can be mutated.
let arr = [4, 5, 6];
let mut arr = [1, 2, 3];
arr[0] = 4;
Vectors are represented in std::Collections as Vec, they are dynamic and growable, and this thanks to the fact that they are stored on the heap. In Rust vectors are actually structs with 3 fields, a pointer to the item location (on the heap), a capacity: usize and a lenght: usize. they can be considered exactly as a c-like array.
let vector: Vec<u32> = vec![4, 5, 6];
v.push(8);
println!("{:?}", vector); // [4, 5, 6, 8]
Slices are pointers with an additional lenght: usize field, they can point to a Vector or to an Array. Slices are important because they allow a safe view inside a vector or inside an array without memory copies. They can be mutable.
let v: Vec<u32> = vec![4, 5, 6];
let sli = &v;
let mut_sli = &mut v;
VecDeque
VecDeque<T> is a double-ended queue, it allow efficient enqueue and dequeue on both ends of the queue. The struct is similar to a Vec but it stores an additional pointer to the bottom of the queue.
let mut deque = VecDeque::new();
// Add items to the deque
deque.push_back(1);
deque.push_back(2);
deque.push_front(3);
deque.front(); //3
deque.back(); //2
LinkedList
LinkedList<T> is a doubly linked list data structure that allows for efficient insertion and removal of elements at any position in the list. Each element in the list contains a value, as well as pointers to the prevous and next element in the list. Unlyke Vec<T>, linked lists do not store their elements on contiguous memory locations, which makes them less efficient for indexing and iteration but more efficient for insertion and removal elements in arbitrary positions.
let mut list = LinkedList::new();
// Add elements to the list
list.push_back(1);
list.push_back(2);
list.push_front(3);
// Remove the second element from the list
if let Some(second) = list.iter_mut().skip(1).next() {
*second = 4;
}
list.pop_back();
Performance
| get(i) | insert(i) | remove(i) | append | split_off(i) | |
|---|---|---|---|---|---|
| Vec | O(1) | O(n-i) | O(e-i) | O(m) | O(n-i) |
| VecDeque | O(1) | O(min(i, n-i)) | O(min(i, n-i)) | O(m) | O(min(i, n-i)) |
| LinkedList | O(min(i, n-i)) | O(min(i, n-i)) | O(min(i, n-i)) | O(1) | O(min(i, n-i)) |
Dictionaries
HashMap
HashMap<K, V> is a colleciton of key-value pairs implemented using a hash table, it allows efficient insertion, deletion and retrieval of elements based on their keys. Hash implies collisions and Rust manage collisions with Linked lists, every value is a linked list.
let mut map = HashMap::new();
map.insert("apple", 3);
map.insert("banana", 2);
map.insert("orange", 5);
map.get("banana"); //2
BTreeMap
BTreeMap<K, V> is essentially an HashMap sorted by key, the data structure is a Binary Tree. Each node in a Binary Tree contains a certain number of key-value pairs and pointers to it's child nodes, every node can have at most 2 childs, in the entire tree the childs with a lower value then the parent are stored as left-child, the childs with a greataer value are right-child. BTreeMap is a very efficient data structure that allow a fast search, insertion and deletion.
let mut map = BTreeMap::new();
map.insert(3, "foo");
map.insert(1, "bar");
map.get(&2);
Performance
| get | insert | remove | range | append | |
|---|---|---|---|---|---|
| HashMap | O(1) | O(1) | O(1) | N/A | N/A |
| BTreeMap | O(log(n)) | O(log(n)) | O(log(n)) | O(log(n)) | O(n+m) |
Set
HashSet
HashSet<T> are like a HashMap but without value fields, sets in general are collections that can store unique values, they are very fast (as HashMap) and they are important when it is necessary to check the presence of a key (i.e. already viewed or not).
let mut set: HashSet<i32> = HashSet::new();
set.insert(1);
set.insert(2);
set.insert(3);
set.insert(2); // This value will not be added since it already exists in the set
assert!(set.contains(&2)); // This will pass
assert!(!set.contains(&4)); // This will fail
BTreeSet
BTreeSet<T> is to BtreeMap as HashSet is to Hashmap, they store sorted unique keys.
let mut btree_set: BTreeSet<i32> = BTreeSet::new();
btree_set.insert(3);
btree_set.insert(2);
btree_set.insert(1);
btree_set.insert(2); // This value will not be added since it already exists in the set
assert!(btree_set.contains(&2)); // This will pass
assert!(!btree_set.contains(&4)); // This will fail
Performance for sets are on the same order of magnitude as the maps.
Priority queues
BinaryHeap
All structures discussed so far try to optimize main operations as insertion, deletion, search etc. If the goal is to do these operations but with max and min values the perfect solution is a BinaryHeap<T>. This structure represent a Priority Queue.
Someone could ask the difference between BinaryHeap and BTreeMap, both are binary trees but BinaryHeap is Balanced: the median value is the top parent.