In Chapter 4, you learned the basics of String: creating strings, concatenating them, and interpolating values. But strings in Swift are far more sophisticated than in most languages. This chapter goes deep: how characters are really stored, why you can't subscript with integers, how emoji work under the hood, and how encoding determines memory usage.
Strings as Collections
Strings in Swift are collections of Character values. This means you can iterate over them, count them, and use all the collection methods you learned about:
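As a minimal sketch of what this looks like in practice (the greeting text and variable names are illustrative, not taken from the chapter's original listing):

```swift
let greeting = "Hello, Swift"

// Iterate: each element is a Character.
for character in greeting {
    print(character)
}

// Count the characters.
let length = greeting.count                                  // 12

// Standard collection methods work on strings too.
let uppercaseLetters = greeting.filter { $0.isUppercase }    // "HS"
let firstFive = String(greeting.prefix(5))                   // "Hello"

// Typed as Character so the element-based contains(_:) is used.
let comma: Character = ","
let hasComma = greeting.contains(comma)                      // true
```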
So far, straightforward. But what exactly is a Character? The answer is more nuanced than you might expect.
Grapheme Clusters: What a "Character" Really Is
A Swift Character is not a single Unicode code point. It's a grapheme cluster — one or more code points that together represent a single visible symbol.
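Consider the letter é (e with an acute accent). It can be represented two ways: as the single precomposed code point U+00E9, or as a plain e (U+0065) followed by the combining acute accent U+0301.
Whichever form is used to write the word café, the string has a count of 4, because Swift treats each grapheme cluster as a single Character, regardless of how many code points make it up.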
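The two representations can be sketched as follows, assuming the word café as the example (the variable names are hypothetical):

```swift
let precomposed = "caf\u{00E9}"     // é as one code point
let decomposed = "cafe\u{0301}"     // e followed by a combining acute accent

// Both strings contain four Characters...
let precomposedCount = precomposed.count                      // 4
let decomposedCount = decomposed.count                        // 4

// ...but a different number of Unicode scalars (code points).
let precomposedScalars = precomposed.unicodeScalars.count     // 4
let decomposedScalars = decomposed.unicodeScalars.count       // 5
```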
This also applies to emoji. Many emoji are actually multiple code points combined:
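A small sketch with three emoji chosen for illustration. The literals are written with scalar escapes so the individual code points are visible; the counts in the comments assume a recent Swift toolchain with up-to-date grapheme-breaking rules.

```swift
let thumbsUp = "\u{1F44D}\u{1F3FD}"                         // 👍🏽 thumbs up + skin-tone modifier
let flag = "\u{1F1EF}\u{1F1F5}"                             // 🇯🇵 two regional indicator symbols
let family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}"  // 👨‍👩‍👧 three people joined by zero-width joiners

// Print: emoji, Character count, scalar count, UTF-8 byte count.
for emoji in [thumbsUp, flag, family] {
    print(emoji, emoji.count, emoji.unicodeScalars.count, emoji.utf8.count)
}
// thumbsUp: 1 Character, 2 scalars, 8 bytes
// flag:     1 Character, 2 scalars, 8 bytes
// family:   1 Character, 5 scalars, 18 bytes
```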
Because characters have variable sizes (1 to many code points, each of which may need 1 to 4 bytes), you can't jump to the nth character by simple math. This is why string.count takes O(n) time — Swift must walk through every character to count grapheme clusters. And it's why integer subscripts don't work.
String Indexing: Why No Integer Subscripts
In most languages, string[3] gives you the 4th character. Swift deliberately doesn't support this because it would be misleading — it looks like O(1) but would actually be O(n).
Instead, Swift uses String.Index, a special opaque index type:
endIndex points after the last character, not at it. To get the last character, use index(before: endIndex). Accessing string[string.endIndex] directly crashes with a fatal error.
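A minimal sketch of index-based access, using an illustrative word. The limitedBy variant is shown as one way to avoid trapping when an offset might run past the end.

```swift
let word = "Swift"

let first = word[word.startIndex]                             // "S"
let third = word[word.index(word.startIndex, offsetBy: 2)]    // "i"
let last = word[word.index(before: word.endIndex)]            // "t"

// A bounds-checked variant returns nil instead of trapping.
let outOfRange = word.index(word.startIndex, offsetBy: 10, limitedBy: word.endIndex)  // nil

// word[word.endIndex] would crash: endIndex is one position past the last character.
```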
Equality and Canonicalization
Because the same visible character can be represented multiple ways (single code point vs. combining characters), Swift compares strings by canonical equivalence: two strings are equal when their normalized forms match. This process is called canonicalization.
Most languages would say that "caf\u{E9}" and "cafe\u{301}" are different strings, because their code points differ. Swift says they're equal because they are canonically equivalent, which matches what a human reader sees. This is one of Swift's most thoughtful design decisions.
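A short sketch of the comparison, with hypothetical variable names. The scalar and byte views are compared as well to show that the underlying data really does differ.

```swift
let single = "caf\u{00E9}"      // precomposed é
let combined = "cafe\u{0301}"   // e + combining accent

let areEqual = single == combined                                               // true
let sameScalars = single.unicodeScalars.elementsEqual(combined.unicodeScalars)  // false
let sameBytes = Array(single.utf8) == Array(combined.utf8)                      // false

// Hashing agrees with ==, so a Set treats the two spellings as one value.
let uniqueSpellings = Set([single, combined]).count                             // 1
```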
Substrings: Efficient Slicing
You can slice strings using ranges of String.Index:
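For instance, a sketch that slices an illustrative sentence around its comma:

```swift
let sentence = "Hello, world"

// Force-unwrapping is acceptable here only because the literal is known to contain a comma.
let commaIndex = sentence.firstIndex(of: ",")!

// One-sided range: everything before the comma.
let hello = sentence[..<commaIndex]                         // "Hello"

// Skip the comma and the space, then take everything to the end.
let worldStart = sentence.index(commaIndex, offsetBy: 2)
let world = sentence[worldStart...]                         // "world"
```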
The result type is Substring, not String. This is a deliberate optimization: a Substring shares storage with its parent string, so slicing does not copy any characters.
When you need an independent String (for long-term storage or passing to APIs), convert explicitly. A long-lived Substring keeps its whole parent string in memory, so the conversion also lets the parent be released:
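A minimal sketch of the conversion. The greet function is a hypothetical stand-in for any API that takes a String.

```swift
let fullText = "Ada Lovelace"

let firstNameSlice = fullText.prefix(3)     // Substring that still references fullText's storage
let firstName = String(firstNameSlice)      // independent String with its own storage

// Hypothetical API that accepts only String.
func greet(_ name: String) -> String {
    "Hello, \(name)!"
}

// greet(firstNameSlice) would not compile: Substring is a different type from String.
let message = greet(firstName)              // "Hello, Ada!"
```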
Raw Strings
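Sometimes you need strings with lots of backslashes or quotes: regular expressions, file paths, ASCII art. Wrapping a string literal in # delimiters makes it raw: backslashes and double quotes are treated as literal characters, so the usual escape sequences and \( ) interpolation are switched off. To interpolate inside a raw string, write \#( ) instead.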
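The following sketch compares regular and raw literals. The path, pattern, and names are made up for illustration.

```swift
let regular = "C:\\Users\\ada\\notes.txt"     // every backslash must be escaped
let raw = #"C:\Users\ada\notes.txt"#          // backslashes are literal

let pattern = #"\d{3}-\d{4}"#                 // regex source without double escaping
let quoted = #"She said "hi" and left"#       // quotes need no escaping

let name = "Ada"
let notInterpolated = #"Hello, \(name)"#      // contains the literal characters \(name)
let interpolated = #"Hello, \#(name)"#        // "Hello, Ada"
```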
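A lone # inside a raw string is fine. If the string itself contains a double quote followed by # (the sequence that would otherwise end the literal), use more # symbols on each side: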
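A brief sketch with made-up text, showing when the extra delimiter is and is not needed:

```swift
// A single # in the content does not end the literal, so one delimiter is enough.
let oneHash = #"Issue #42 is fixed"#

// The content contains "# itself, so the literal is wrapped in ## instead.
let twoHashes = ##"The literal #"raw"# uses one pound sign"##
```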
Character Properties
The Character type has built-in properties for inspecting what kind of character it is:
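A sketch of a few of these properties on sample characters chosen for illustration:

```swift
let samples: [Character] = ["a", "Z", "7", " ", "!", "\u{00E9}"]

// Print each character with a handful of its classification flags.
for character in samples {
    print(character, character.isLetter, character.isNumber,
          character.isWhitespace, character.isPunctuation, character.isASCII)
}

let digitValue = Character("7").wholeNumberValue    // Optional(7)
let letterValue = Character("a").wholeNumberValue   // nil
let hexValue = Character("f").hexDigitValue         // Optional(15)
let upper = Character("a").uppercased()             // "A" (a String, because case mapping can change length)

// Example use in parsing: pull the digits out of a mixed string.
let digits = "Order #4521".compactMap { $0.wholeNumberValue }   // [4, 5, 2, 1]
```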
These properties are invaluable when parsing or validating text.
Encoding: UTF-8 and UTF-16
In memory, a string is stored as a sequence of bytes. The encoding determines how each code point maps to one or more fixed-size code units: 8-bit units in UTF-8, 16-bit units in UTF-16.
UTF-8: Swift's internal encoding
UTF-8 is a variable-width encoding built from 8-bit code units; each code point takes 1 to 4 bytes:
- 1 byte — ASCII characters (0-127). Fully compatible with C strings.
- 2 bytes — Latin extended, Greek, Cyrillic, Arabic, Hebrew (128-2047)
- 3 bytes — CJK characters, most of Unicode (2048-65535)
- 4 bytes — Emoji, rare scripts (65536+)
You can inspect UTF-8 code units through the utf8 view:
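As a sketch, the byte widths listed above can be checked with one sample character per width (the samples are illustrative):

```swift
let oneByte = "A"              // ASCII
let twoBytes = "\u{00E9}"      // é
let threeBytes = "\u{4E2D}"    // 中
let fourBytes = "\u{1F30D}"    // 🌍

let oneByteUnits = Array(oneByte.utf8)          // [65]
let twoByteUnits = Array(twoBytes.utf8)         // [195, 169]
let threeByteUnits = Array(threeBytes.utf8)     // [228, 184, 173]
let fourByteUnits = Array(fourBytes.utf8)       // [240, 159, 140, 141]
```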
UTF-16: used by some systems
UTF-16 uses 16-bit code units. Most characters fit in one code unit (2 bytes), but emoji and rare characters need two code units (a surrogate pair). You can inspect via the utf16 view:
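A sketch of the utf16 view on two illustrative characters, one that fits in a single code unit and one that needs a surrogate pair:

```swift
let accented = "\u{00E9}"     // é
let globe = "\u{1F30D}"       // 🌍

let accentedUnits = Array(accented.utf16)   // [0x00E9], one code unit
let globeUnits = Array(globe.utf16)         // [0xD83C, 0xDF0D], a surrogate pair

// Both are still a single Character at the String level.
let accentedCount = accented.count          // 1
let globeCount = globe.count                // 1
```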
Swift stores strings as UTF-8 internally for the best balance of memory and performance. But the String API works at the grapheme cluster level, hiding encoding details. You only touch encoding through the utf8, utf16, and unicodeScalars views when you need to. This is one of the reasons Swift handles Unicode more correctly than most languages.
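To see how the views relate, here is a sketch that measures one illustrative string at each level:

```swift
// "café🌍", with é written as e + combining accent.
let text = "cafe\u{0301}\u{1F30D}"

let characters = text.count                 // 5  grapheme clusters
let scalars = text.unicodeScalars.count     // 6  code points
let utf16Units = text.utf16.count           // 7  16-bit code units (the globe is a surrogate pair)
let utf8Units = text.utf8.count             // 10 bytes (4 ASCII + 2 for the accent + 4 for the globe)
```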
Exercises
Try These in Your Playground
- Create a string with your name. Use index(_:offsetBy:) to extract the 3rd character.
- Iterate over your name and print the Unicode scalar values for each character using char.unicodeScalars.
- Create "caf\u{00E9}" and "cafe\u{0301}". Verify they're equal with == but have different unicodeScalars.count.
- Split a full name string (e.g., "Ada Lovelace") into first and last name using firstIndex(of: " ") and open-ended ranges. Convert both substrings to String.
- Write a function characterCount(in text: String) -> [Character: Int] that counts occurrences of each character.
- Iterate over the utf8 view of the string "Hello 🌍" and count the bytes. Then do the same with utf16. Compare the results.
- Challenge: Write a function that reverses each word in a sentence without using split. For "My dog is cute" return "yM god si etuc".
Key Points
What You Learned
- Strings are collections of Character values (grapheme clusters)
- A grapheme cluster may consist of multiple Unicode code points (combining characters, emoji modifiers)
- String indexing uses String.Index, not integers, because characters have variable sizes
- startIndex, endIndex, and index(_:offsetBy:) navigate through strings
- endIndex is past the last character; use index(before:) to get the last one
- Swift canonicalizes strings before comparison, so "caf\u{E9}" equals "cafe\u{301}"
- Slicing returns Substring, which shares storage with the parent String
- Convert Substring to String explicitly when you need an independent copy
- Raw strings (#"..."#) disable escaping and interpolation
- Character properties (isASCII, isLetter, wholeNumberValue) help with parsing
- UTF-8 uses 1-4 bytes per code point; UTF-16 uses 2 or 4 bytes; Swift stores as UTF-8 internally
- Access encoding views via .utf8, .utf16, and .unicodeScalars
This completes the deep dive into strings. In the next chapter, we begin Section III: Building Your Own Types, starting with Structs — Swift's primary value type for modeling data.