Peeking data

Sometimes you want to look at the next data without consuming it.

Example, you have to match the starting of a parenthesis-delimited expression, and you want to check if one of the next characters is a ).

If so, you want the contents of the parenthesis to be consumed.

5 * ( 1 + 2 )
    ^
    you're here

To do that, you have to advance the scanner and check for each step if the scanner matches the close parenthesis.

Then you get the slice of the data between the open parenthesis and the close parenthesis.

 1 + 2

Peekable trait

Elyze uses the Peekable trait to define peekable data. Peekable data stands for data that you can look at without consuming it.

#![allow(unused)]
fn main() {
extern crate elyze;

pub trait Peekable<'a, T> {
    /// Attempt to match the `Peekable` against the current position of the
    /// `Scanner`.
    ///
    /// This method will temporarily advance the position of the `Scanner` to
    /// find a match. If a match is found, the `Scanner` is rewound to the
    /// original position and a `PeekResult` is returned. If no match is found,
    /// the `Scanner` is rewound to the original position and an `Err` is
    /// returned.
    ///
    /// # Arguments
    ///
    /// * `data` - The `Scanner` to use when matching.
    ///
    /// # Returns
    ///
    /// A `PeekResult` if the `Peekable` matches the current position of the
    /// `Scanner`, or an `Err` otherwise.
    fn peek(&self, data: &Scanner<'a, T>) -> ParseResult<PeekResult>;
}
}

This trait defines an unique peek method.

This one remains the scanner unchanged and returns a PeekResult.

The PeekResult is shield by a ParseResult because peeking can fail either by recognizing or by accepting the data. The error is propagated and left for the caller to handle it.

PeekResult

The PeekResult itself is an enumeration.

#![allow(unused)]
fn main() {
extern crate elyze;
pub enum PeekResult {
    /// The match was successful.
    Found {
        // The last index of the end slice
        end_slice: usize,
        // The size of the start element
        start_element_size: usize,
        // The size of the end element
        end_element_size: usize,
    },
    /// The match was unsuccessful.
    NotFound,
}
}

In its Found variant it embeds the last index of the end slice, the size of the start element and the size of the end element.

Example

Let's implement a Match for the closing parenthesis.

#![allow(unused)]
fn main() {
extern crate elyze; 
struct CloseParentheses;

impl Match<u8> for CloseParentheses {
    fn is_matching(&self, data: &[u8]) -> (bool, usize) {
        if data[0] == b')' {
            (true, 1)
        } else {
            (false, 0)
        }
    }

    fn size(&self) -> usize {
        1
    }
}
}

Then define something that will bear the Peekable trait.

#![allow(unused)]
fn main() {
struct ParenthesesGroup;
}

Then implement the Peekable

#![allow(unused)]
fn main() {
impl<'a> Peekable<'a, u8> for ParenthesesGroup {
    fn peek(&self, scanner: &Scanner<'a, u8>) -> ParseResult<PeekResult> {
        // create an internal scanner allowing to peek data without alterating the original scanner
        let mut inner_scanner = Scanner::new(&scanner.remaining());

        // loop on each byte until we find a close parenthesis
        loop {
            if inner_scanner.is_empty() {
                // we have reached the end without finding a close parenthesis
                break;
            }
            if CloseParentheses.recognize(&mut inner_scanner)?.is_some() {
                // we have found a close parenthesis
                return Ok(PeekResult::Found {
                    // we return the position of the close parenthesis
                    end_slice: inner_scanner.current_position(),
                    // our peeking doesn't include a start element
                    start_element_size: 0,
                    // the size of the end element is a close parenthesis of 1 byte
                    end_element_size: 1,
                });
            }

            // consume the current byte
            inner_scanner.bump_by(1);
        }

        // At this point, we have reached the end of available data without finding a close parenthesis
        Ok(PeekResult::NotFound)
    }
}
}

Its implementation is not perfect, it takes the first close parenthesis and doesn't take into account the case where there are multiple close parentheses in the case of nested parentheses, for example.

But enough to demonstrate the concept.

extern crate elyze;

fn main() -> ParseResult<()> {
    let data = b"7 * ( 1 + 2 )";
    let mut scanner = Scanner::new(data);
    scanner.bump_by(5); // consumes : 7 * (
    let result = ParenthesesGroup.peek(&scanner)?;
    if let PeekResult::Found {
        end_slice,
        end_element_size,
        ..
    } = result
    {
        println!(
            "{:?}",
            // to found the real size of enclosed data, we need to subtract the size of the end element
            String::from_utf8_lossy(&scanner.remaining()[..end_slice - end_element_size]) // 1 + 2
        );
    } else {
        println!("not found");
    }
    println!(
        "scanner: {:?}",
        // the scanner itself remains unchanged
        String::from_utf8_lossy(scanner.remaining()) // scanner: " 1 + 2 )"
    );
    Ok(())
}

Peeking

To stroll a successful peek, Elyze defines a structure called Peeking

#![allow(unused)]
fn main() {
pub struct Peeking<'a, T> {
    /// The start of the match.
    pub start_element_size: usize,
    /// The end of the match.
    pub end_element_size: usize,
    /// The length of peeked slice.
    pub end_slice: usize,
    /// The data that was peeked.
    pub data: &'a [T],
}
}

Like you can see, the Peeking struct embeds PeekResult::Found and the data slice.

peek method

This Peeking is used by the peek method.

#![allow(unused)]
fn main() {
extern crate elyze;

/// Attempt to match a `Peekable` against the current position of a `Scanner`.
///
/// This function will temporarily advance the position of the `Scanner` to find
/// a match. If a match is found, the `Scanner` is rewound to the original
/// position and a `Peeking` is returned. If no match is found, the `Scanner` is
/// rewound to the original position and an `Err` is returned.
///
/// # Arguments
///
/// * `peekable` - The `Peekable` to attempt to match.
/// * `scanner` - The `Scanner` to use when matching.
///
/// # Returns
///
/// A `Peeking` if the `Peekable` matches the current position of the `Scanner`,
/// or an `Err` otherwise.
pub fn peek<'a, T, P: Peekable<'a, T>>(
    peekable: P,
    scanner: &Scanner<'a, T>,
) -> ParseResult<Option<Peeking<'a, T>>>;
}

This one is a short syntax of using directly the Peekable::peek method.

It takes care of the arithmetic data slice for you.

extern crate elyze;
fn main() -> ParseResult<()> {
    let data = b"7 * ( 1 + 2 )";
    let mut scanner = Scanner::new(data);
    scanner.bump_by(5); // consumes : 7 * (
    
    // use peek method instead of ParenthesesGroup.peek
    let result = peek(ParenthesesGroup, &scanner)?;
    if let Some(peeking) = result {
        println!(
            "{:?}",
            // the peek_slice method returns the slice of recognized without the end element
            String::from_utf8_lossy(peeking.peeked_slice()) // 1 + 2
        );
    } else {
        println!("not found");
    }
    println!(
        "scanner: {:?}",
        // the scanner itself remains unchanged
        String::from_utf8_lossy(scanner.remaining()) // scanner: " 1 + 2 )"
    );
    Ok(())
}