Saturday, November 15, 2008

EBNF and Boost.Spirit

EBNF(Extended Backus-Naur Form) is a syntax notation used to write grammars. It distributed so that it was standardized by ISO. The Spirit is a parser library for EBNF writed in C++. It is distributed with Boost. So let introduce them with examples. In previous post we discuss normal forms,now let write the grammar of CNF using EBNF. At first we need to define proposition.

proposition ::= alpha_p alnum_p*;

alpha_p is alphabetic character, alnum_p is alphabetic or numeric character,* means zero or more repetition of alnum_p. For example "ab78" is a proposition but "7ab8" is not, because the first symbol must be a letter. The next one is a literal.

literal ::= proposition | \neg proposition;

Literal is a proposition or negation of proposition. We also need to define intermediate formula, which is a disjunction of literals, let call it term.

term ::= ( literal (⋁ literal)* );
CNF_formula ::= term (⋀ term)*;

The brackets are added to prevent ambiguity(for example A⋁BC can be interpreted as A⋁(B⋀C) or A⋁(B⋀C) ). Usually lines write at inverse sequence. At first define the main formula , and then define other rules. So the grammar of CNF in EBNF is:

CNF_formula ::= term (⋀ term)*;
term ::= ( literal (⋁ literal)* );
literal ::= proposition | \neg proposition;
proposition ::= alpha_p alnum_p*;

alpha_p and alnum_p are not defined in EBNF, I can define them, but it is not necessary because we will use this grammar in Spirit, in which those are predefined. In Spirit used modified version of EBNF, because of the limit of C++ grammar. Instead of space used in EBNF as a separator of lists, in Spirit they use >> operator. In place of ::= use =, and * operator is changed from postfix to prefix. Every line of EBNF can be stored in variables of type rule.
This is the analog of CNF grammar in Spirit. We use other symbols for connectives to write code using only ANSI symbols. &= ;|=; ~=\neg;

CNF_formula = term >> *( '&' >> term);
term = '(' >> literal >> *( '|' >> literal) >> ')';
literal = proposition | '~' >> proposition;
proposition = alpha_p >> *(alnum_p);
Now we can parse any string using CNF_formula rule.
parse(str, CNF_formula,space_p);
where the third parameter is the skip rule. We can combine those rules into one grammar class

#include <boost/spirit/include/classic_core.hpp>
using namespace BOOST_SPIRIT_CLASSIC_NS;

struct CNFGrammar : public grammar<CNFGrammar>
{
template <typename ScannerT>
struct definition
{
definition(CNFGrammar const& /*self*/)
{
CNF_formula = term >> *( '&' >> term);
term = '(' >> literal >> *( '|' >> literal) >> ')' ;
literal = proposition | '~' >> proposition;
proposition = alpha_p >> *(alnum_p);
}

rule<ScannerT> CNF_formula,term,literal,proposition;

rule<ScannerT> const&
start() const { return CNF_formula; }
};
};

And use it to write simple program which parse the input string.

int main()
{
string str;CNFGrammar g;
while (getline(cin, str))
{
if (str.empty() || str[0] == 'q' || str[0] == 'Q')
break;

if (parse(str.c_str(),g,space_p).full)
cout << "Parsing succeeded\n"; else cout << "Parsing failed\n"; } return 0; }

No comments: