Data structures in Rust

2023-02-05

TL;DR

This blog post aims to provide a brief exploration of data structures in Rust, following the organization of The alghoritm design Manual by Skiena. The focus will be on the various data structures available in Rust std::Collections, and a brief comparison will be made with the structures listed by Skiena.

Toc
[ Hide]

Contiguous and Linked

Array vs Vectors vs Slices

Rust has essentially 3 way to represent a sequence of items, the simpler is the array, array is a collection of data of the same type, data are allocated continously on the stack, for the size of an array is immutable even if its content can be mutated.

let arr = [4, 5, 6];
let mut arr = [1, 2, 3];
arr[0] = 4;

Vectors are represented in std::Collections as Vec, they are dynamic and growable, and this thanks to the fact that they are stored on the heap. In Rust vectors are actually structs with 3 fields, a pointer to the item location (on the heap), a capacity: usize and a lenght: usize. they can be considered exactly as a c-like array.

let vector: Vec<u32> = vec![4, 5, 6];
v.push(8);
println!("{:?}", vector); // [4, 5, 6, 8]

Slices are pointers with an additional lenght: usize field, they can point to a Vector or to an Array. Slices are important because they allow a safe view inside a vector or inside an array without memory copies. They can be mutable.

let v: Vec<u32> = vec![4, 5, 6];
let sli = &v;
let mut_sli = &mut v;

VecDeque

VecDeque<T> is a double-ended queue, it allow efficient enqueue and dequeue on both ends of the queue. The struct is similar to a Vec but it stores an additional pointer to the bottom of the queue.

let mut deque = VecDeque::new();

// Add items to the deque
deque.push_back(1);
deque.push_back(2);
deque.push_front(3);
deque.front(); //3
deque.back(); //2

LinkedList

LinkedList<T> is a doubly linked list data structure that allows for efficient insertion and removal of elements at any position in the list. Each element in the list contains a value, as well as pointers to the prevous and next element in the list. Unlyke Vec<T>, linked lists do not store their elements on contiguous memory locations, which makes them less efficient for indexing and iteration but more efficient for insertion and removal elements in arbitrary positions.

let mut list = LinkedList::new();
// Add elements to the list
list.push_back(1);
list.push_back(2);
list.push_front(3);

// Remove the second element from the list
if let Some(second) = list.iter_mut().skip(1).next() {
    *second = 4;
}
list.pop_back();

Performance

get(i)insert(i)remove(i)appendsplit_off(i)
VecO(1)O(n-i)O(e-i)O(m)O(n-i)
VecDequeO(1)O(min(i, n-i))O(min(i, n-i))O(m)O(min(i, n-i))
LinkedListO(min(i, n-i))O(min(i, n-i))O(min(i, n-i))O(1)O(min(i, n-i))

Dictionaries

HashMap

HashMap<K, V> is a colleciton of key-value pairs implemented using a hash table, it allows efficient insertion, deletion and retrieval of elements based on their keys. Hash implies collisions and Rust manage collisions with Linked lists, every value is a linked list.

let mut map = HashMap::new();
map.insert("apple", 3);
map.insert("banana", 2);
map.insert("orange", 5);
map.get("banana"); //2

BTreeMap

BTreeMap<K, V> is essentially an HashMap sorted by key, the data structure is a Binary Tree. Each node in a Binary Tree contains a certain number of key-value pairs and pointers to it's child nodes, every node can have at most 2 childs, in the entire tree the childs with a lower value then the parent are stored as left-child, the childs with a greataer value are right-child. BTreeMap is a very efficient data structure that allow a fast search, insertion and deletion.

let mut map = BTreeMap::new();
map.insert(3, "foo");
map.insert(1, "bar");
map.get(&2);

Performance

getinsertremoverangeappend
HashMapO(1)O(1)O(1)N/AN/A
BTreeMapO(log(n))O(log(n))O(log(n))O(log(n))O(n+m)

Set

HashSet

HashSet<T> are like a HashMap but without value fields, sets in general are collections that can store unique values, they are very fast (as HashMap) and they are important when it is necessary to check the presence of a key (i.e. already viewed or not).

 let mut set: HashSet<i32> = HashSet::new();
set.insert(1);
set.insert(2);
set.insert(3);
set.insert(2); // This value will not be added since it already exists in the set
assert!(set.contains(&2)); // This will pass
assert!(!set.contains(&4)); // This will fail

BTreeSet

BTreeSet<T> is to BtreeMap as HashSet is to Hashmap, they store sorted unique keys.

let mut btree_set: BTreeSet<i32> = BTreeSet::new();
btree_set.insert(3);
btree_set.insert(2);
btree_set.insert(1);
btree_set.insert(2); // This value will not be added since it already exists in the set
assert!(btree_set.contains(&2)); // This will pass
assert!(!btree_set.contains(&4)); // This will fail

Performance for sets are on the same order of magnitude as the maps.

Priority queues

BinaryHeap

All structures discussed so far try to optimize main operations as insertion, deletion, search etc. If the goal is to do these operations but with max and min values the perfect solution is a BinaryHeap<T>. This structure represent a Priority Queue.

Someone could ask the difference between BinaryHeap and BTreeMap, both are binary trees but BinaryHeap is Balanced: the median value is the top parent.