Rust serde deserialization of an enum variant

Intro

For a program I'm working on, I have this data structure:

pub enum State {
    None,
    Expansion,
    War,
    CivilWar,
    ...
}

This same data structure is returned from different external JSON APIs, where the formatting differs slightly. I'm using serde and serde_json for deserialization. Without any special processing, the following program deserializes "CivilWar" to State::CivilWar:

#[macro_use]
extern crate serde_derive;
extern crate serde_json;

#[derive(Debug, Deserialize)]
pub enum State {
    None,
    Expansion,
    War,
    CivilWar,
    ...
}

fn main() {
    let s = r#" "CivilWar" "#;
    let c: State = serde_json::from_str(s).unwrap();
    println!("input: {} output: State::{:?}", s, c);
}

This will output: input: "CivilWar" output: State::CivilWar.

Lowercase

The JSON format I'm deserializing from actually specifies the state in lowercase. This is easily accommodated by adding the annotation #[serde(rename_all = "lowercase")] to the enum:

#[derive(Debug, Deserialize)]
#[serde(rename_all = "lowercase")]
pub enum State {
    None,
    Expansion,
    War,
    CivilWar,
    ...
}

Now "civilwar" will be deserialized as State::CivilWar. Of course, "CivilWar" will no longer deserialize.

Space

However, some files contain "civil war" with a space in between, and the lowercase rename does not map that input to State::CivilWar. As there are now multiple possible inputs for one variant, a simple rename no longer suffices.

A custom implementation of Deserialize works, but is a lot of boilerplate code:

use serde::de::{self, Deserialize, Deserializer};

#[derive(Debug)]
pub enum State {
    None,
    Expansion,
    War,
    CivilWar,
    ...
}

impl<'de> Deserialize<'de> for State {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        let s = String::deserialize(deserializer)?.to_lowercase();
        let state = match s.as_str() {
            "none" => State::None,
            "expansion" => State::Expansion,
            "war" => State::War,
            "civil war" | "civilwar" => State::CivilWar,
            ...
            other => { return Err(de::Error::custom(format!("Invalid state '{}'", other))); },
        };
        Ok(state)
    }
}
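
As a rough sketch, the matching logic can also be pulled out of the serde impl into a plain FromStr implementation, which makes it easy to test on its own. This version is an assumption of mine rather than the code I ended up with: it strips all whitespace before matching, so it accepts more spellings than the impl above (for example "Civil War" and "civil  war"), and it lists only the four variants shown so far.

```rust
use std::str::FromStr;

// Only the four variants named so far; the real enum has more.
#[derive(Debug, PartialEq)]
pub enum State {
    None,
    Expansion,
    War,
    CivilWar,
}

impl FromStr for State {
    type Err = String;

    fn from_str(input: &str) -> Result<Self, Self::Err> {
        // Normalize: drop all whitespace, then lowercase, so that
        // "Civil War", "civil war" and "civilwar" become the same key.
        let key: String = input
            .chars()
            .filter(|c| !c.is_whitespace())
            .flat_map(char::to_lowercase)
            .collect();

        match key.as_str() {
            "none" => Ok(State::None),
            "expansion" => Ok(State::Expansion),
            "war" => Ok(State::War),
            "civilwar" => Ok(State::CivilWar),
            _ => Err(format!("Invalid state '{}'", input)),
        }
    }
}
```

With this in place, "civil war".parse::<State>() yields Ok(State::CivilWar), and a Deserialize impl only has to read the string and forward any error. Whether the looser matching is acceptable depends on how strict the input validation needs to be.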

Variant deserialize_with

In principle, it should be possible to write a custom deserialization function for only the offending variants (State::CivilWar and State::CivilUnrest) by adding a variant annotation like this:

#[derive(Debug, Deserialize)]
#[serde(rename_all = "lowercase")]
pub enum State {
    None,
    Expansion,
    War,
    #[serde(deserialize_with = "de_civilwar")]
    CivilWar,
    Election,
    Boom,
    Bust,
    CivilUnrest,
    Famine,
    Outbreak,
    Lockdown,
    Investment,
    Retreat,
}

fn de_civilwar<'de, D>(deserializer: D) -> Result<(), D::Error>
where
    D: Deserializer<'de>,
{
    let s = String::deserialize(deserializer)?.to_lowercase();
    println!("found: {}", s);
    if s == "civilwar" || s == "civil war" {
        Ok(())
    } else {
        Err(
            de::Error::invalid_value(
                Unexpected::Str(&s),
                &r#""civil war" or "civilwar""#
            )
        )
    }
}

However, using this fails with the error: invalid type: unit variant, expected newtype variant. At this point it is unclear to me why this doesn't work, as it appears to match the documentation. To narrow it down, I implemented a variant of the problem based on the test contained in serde:

#[macro_use]
extern crate serde_derive;
extern crate serde_json;
extern crate serde;

use serde::de::{self, Deserialize, Deserializer, Unexpected};

#[derive(Debug, PartialEq, Serialize, Deserialize)]
enum WithVariant {
    #[serde(deserialize_with = "deserialize_u8_as_unit_variant")]
    Unit,
}

fn deserialize_u8_as_unit_variant<'de, D>(deserializer: D) -> Result<(), D::Error>
where
    D: Deserializer<'de>,
{
    let n = u8::deserialize(deserializer)?;
    if n == 0 {
        Ok(())
    } else {
        Err(de::Error::invalid_value(Unexpected::Unsigned(n as u64), &"0"))
    }
}

fn main() {
    let s1 = "0";
    let i: u8 = serde_json::from_str(s1).unwrap();
    println!("i: {}", i);

    let s = "0";
    let c: WithVariant = serde_json::from_str(s).unwrap();
    println!("input: {} output: {:?}", s, c);
}

This fails in a different way, with the error: ExpectedSomeValue, line: 1, column: 1.

Either I'm overlooking something or there is a bug in the libraries.

Update

After some help from David Tolnay, one of the authors of serde, it turns out that the enum variant deserialize_with feature is meant to be used in a different way.

For the test-case example above, this works:

    let s = r#"{ "Unit": 0 }"#;
    let c: WithVariant = serde_json::from_str(s).unwrap();
    println!("input: {} output: {:?}", s, c);

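meaning the input has to name the variant as a key in a surrounding structure. The variant-level deserialize_with function is applied to the data inside the variant (the 0 in { "Unit": 0 }), not to the string that selects the variant, so it cannot be used to accept alternative spellings of a variant name.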

Finally, David offered the following elegant alternative:

use serde::de::{Deserialize, Deserializer, IntoDeserializer};

#[derive(Debug, Deserialize)]
#[serde(rename_all = "lowercase")]
#[serde(remote = "State")]
pub enum State {
    Expansion,
    CivilWar,
    /* ... */
}

impl<'de> Deserialize<'de> for State {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
        where D: Deserializer<'de>
    {
        let s = String::deserialize(deserializer)?;
        if s == "civil war" {
            Ok(State::CivilWar)
        } else {
            State::deserialize(s.into_deserializer())
        }
    }
}

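which provides the special handling but avoids the boilerplate for the common cases. The trick is the remote = "State" attribute: with it, the derive generates an inherent State::deserialize function instead of a Deserialize impl, so the handwritten impl can deal with "civil war" itself and delegate every other string to the generated code.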
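
To make the control flow of that pattern explicit without any serde machinery, here is a minimal std-only sketch. The names parse_default and parse_state are hypothetical: parse_default is a hand-written stand-in for the function the derive generates, and it only covers the two variants shown above.

```rust
#[derive(Debug, PartialEq)]
pub enum State {
    Expansion,
    CivilWar,
}

// Hypothetical stand-in for the derived code: it only knows the
// lowercase variant names, as rename_all = "lowercase" would produce.
fn parse_default(s: &str) -> Result<State, String> {
    match s {
        "expansion" => Ok(State::Expansion),
        "civilwar" => Ok(State::CivilWar),
        other => Err(format!("unknown variant '{}'", other)),
    }
}

// Handle the one irregular spelling first, then delegate everything
// else to the default parser.
pub fn parse_state(s: &str) -> Result<State, String> {
    if s == "civil war" {
        Ok(State::CivilWar)
    } else {
        parse_default(s)
    }
}
```

Here parse_state("civil war") and parse_state("civilwar") both return Ok(State::CivilWar), while anything the default parser does not know, such as "Civil War", is still rejected. Adding a new regular variant only touches the default path, which in the serde version is generated for free.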

All the example code used in this blog post can be found here.

(this was first posted on rustit.be)