Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

rust - Modifying chars in a String by index

I wrote a function to titlecase (first letter capitalized, all others lowercase) a borrowed String, but it ended up being more of a hassle than it feels like it should be.

fn titlecase_word(word: &mut String) {

    unsafe {
        let buffer = word.as_mut_vec().as_mut_slice();
        buffer[0] = std::char::to_uppercase(buffer[0] as char) as u8;

        for i in range(1, buffer.len()) {
            buffer[i] = std::char::to_lowercase(buffer[i] as char) as u8;
        }
    }
}

The unsafe block is particularly undesirable. Is there a nicer way to modify String contents by index?

1 Reply

Update: this answer was revised for Rust 1.0.0-alpha, where to_lowercase()/to_uppercase() became methods of the CharExt trait and the separate Ascii type was replaced by two traits, AsciiExt and OwnedAsciiExt. Those traits were marked unstable at the time and did change afterwards: in current stable Rust, to_lowercase() and to_uppercase() are inherent methods on char that return iterators (one character can map to several), and the ASCII operations are inherent methods on u8, char, str and [u8].


Your code is incorrect because it accesses individual bytes to perform char-based operations, but in UTF-8 a character is not necessarily a single byte. It only works correctly for ASCII input.
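
To make the byte/character distinction concrete, here is a minimal sketch. The helper name byte_and_char_counts and the sample words are illustrative assumptions, not part of the question:

```rust
/// Returns (number of UTF-8 bytes, number of Unicode scalar values) for `s`.
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    // len() counts bytes; chars().count() counts characters.
    (s.len(), s.chars().count())
}
```

For a pure ASCII word such as "plain" both numbers are 5. For "\u{e9}lan" (a word starting with e-acute) the byte count is 5 while the character count is 4, so buffer[0] is only the first half of the first character.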

In fact, this cannot be done correctly in place in general, because a case conversion may change the number of bytes a character occupies (or even map one character to several), which means the string may have to be reallocated. Instead, iterate over the characters and collect them into a new string:

fn titlecase_word(word: &mut String) {
    if word.is_empty() { return; }

    let mut result = String::with_capacity(word.len());

    {
        let mut chars = word.chars();
        // to_uppercase() returns an iterator, since one char can map to several
        result.extend(chars.next().unwrap().to_uppercase());

        for c in chars {
            result.extend(c.to_lowercase());
        }
    }

    *word = result;
}
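
The point about byte lengths changing under case conversion can be checked with a small sketch. The helper name upper_len_change is hypothetical, and the two sample characters were picked here for illustration:

```rust
/// Returns the UTF-8 byte length of `c` and of its full uppercase mapping.
fn upper_len_change(c: char) -> (usize, usize) {
    // Collect the uppercase mapping, which may contain more than one char.
    let upper: String = c.to_uppercase().collect();
    (c.len_utf8(), upper.len())
}
```

With '\u{17f}' (LATIN SMALL LETTER LONG S) the result is (2, 1), because its uppercase form is the one-byte 'S'. With '\u{df}' (sharp s) the byte length happens to stay at 2, but the uppercase mapping is the two characters "SS", so a one-to-one in-place replacement is still impossible.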

Because you need to generate a new string anyway, it is better to simply return it instead of replacing the old one. In that case it is also better to pass a string slice to the function:

fn titlecase_word(word: &str) -> String {
    let mut result = String::with_capacity(word.len());

    if !word.is_empty() {
        let mut chars = word.chars();
        // to_uppercase() returns an iterator, since one char can map to several
        result.extend(chars.next().unwrap().to_uppercase());

        for c in chars {
            result.extend(c.to_lowercase());
        }
    }

    result
}
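
One practical benefit of the &str -> String signature is that it composes directly with iterator adapters. The sketch below assumes current stable Rust; titlecase_words is a hypothetical helper added for illustration, and titlecase_word is repeated only to keep the example self-contained:

```rust
// Same signature as the version above: borrows a slice, returns a new String.
fn titlecase_word(word: &str) -> String {
    let mut result = String::with_capacity(word.len());
    let mut chars = word.chars();
    if let Some(first) = chars.next() {
        result.extend(first.to_uppercase());
        result.extend(chars.flat_map(|c| c.to_lowercase()));
    }
    result
}

// Hypothetical helper: title-cases every whitespace-separated word.
// Runs of whitespace are collapsed into single spaces.
fn titlecase_words(text: &str) -> String {
    text.split_whitespace()
        .map(titlecase_word)
        .collect::<Vec<String>>()
        .join(" ")
}
```

Because the function takes a slice, it can be passed straight to map() over the pieces produced by split_whitespace() without cloning any input.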

The extend() method from the Extend trait can also replace the for loop, which is more idiomatic:

fn titlecase_word(word: &str) -> String {
    let mut result = String::with_capacity(word.len());

    if !word.is_empty() {
        let mut chars = word.chars();
        result.extend(chars.next().unwrap().to_uppercase());
        // flat_map flattens the per-character lowercase iterators into one
        result.extend(chars.flat_map(|c| c.to_lowercase()));
    }

    result
}

In fact, with iterators it is possible to shorten it even further:

fn titlecase_word(word: &str) -> String {
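    word.chars().enumerate()
        // to_uppercase() and to_lowercase() return different iterator types,
        // so each branch is converted to a String before collecting.
        .map(|(i, c)| if i == 0 { c.to_uppercase().to_string() } else { c.to_lowercase().to_string() })
        .collect()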
}
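
As a sketch under the assumption of current stable Rust, where to_uppercase() and to_lowercase() return iterators of different types, the same idea can also be written by chaining the two iterators, so the branches never need a common type. The name titlecase_chained is hypothetical:

```rust
fn titlecase_chained(word: &str) -> String {
    let mut chars = word.chars();
    match chars.next() {
        // Uppercase mapping of the first char, then lowercase of the rest.
        Some(first) => first
            .to_uppercase()
            .chain(chars.flat_map(|c| c.to_lowercase()))
            .collect(),
        // Empty input gives an empty string.
        None => String::new(),
    }
}
```

This handles the empty string without a separate check and does not build a temporary value per character.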

If you know in advance that you're working with ASCII, however, you can use the ASCII-specific operations instead (originally provided by traits in the std::ascii module):

fn titlecase_word(word: String) -> String {
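    assert!(word.is_ascii());

    // ASCII case methods are inherent on u8 and [u8] in current Rust.
    let mut result = word.into_bytes();
    result.make_ascii_lowercase();
    if let Some(first) = result.first_mut() {
        first.make_ascii_uppercase();
    }

    // Cannot fail: the bytes were checked to be ASCII above.
    String::from_utf8(result).unwrap()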
}

This function will panic if the input string contains any non-ASCII character.

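This function doesn't allocate and modifies the string's contents in place. At the time of writing, it could not be written with a single &mut String argument without unsafe or an extra allocation, because into_bytes() consumes the String and moving out of a &mut reference is disallowed. Current Rust provides make_ascii_lowercase() and make_ascii_uppercase() on str, which work through a &mut String directly.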

You could also use std::mem::swap() with a temporary empty string to move the contents out from behind the reference. This requires neither unsafe nor an allocation, since String::new() does not allocate; std::mem::take() does the same thing in one call. The code is somewhat cumbersome, though, and &mut arguments are not really idiomatic in Rust for this kind of transformation.
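
For ASCII-only case changes, current stable Rust makes an in-place version with a &mut String argument possible without unsafe. A minimal sketch, where the function name titlecase_ascii_in_place is hypothetical and the behaviour for non-ASCII input (left untouched rather than panicking) is a design choice made here:

```rust
fn titlecase_ascii_in_place(word: &mut String) {
    // Lowercases ASCII letters in place; non-ASCII characters are left as-is.
    word.make_ascii_lowercase();
    // get_mut(0..1) is None for an empty string or when the first
    // character is longer than one byte, so nothing is touched then.
    if let Some(first) = word.get_mut(0..1) {
        first.make_ascii_uppercase();
    }
}
```

Since only ASCII bytes are rewritten, and each one maps to another single ASCII byte, the string stays valid UTF-8 and its length never changes.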