This blog post is archived. I consider it outdated and not very useful anymore, but since many people still visit these pages, I've decided to keep them up.

Razor parser engine for the razor syntax highlighter

Here’s my Razor Parser class that I use in my Razor Syntax Highlighter. I’m going to post later about how I integrate this into Visual Studio using the Visual Studio SDK.

public IList<Token> Parse(ITextBuffer buffer) {
    List<Token> tokens = new List<Token>();
    CodeLanguageService languageService = CodeLanguageService.GetServiceByExtension(".cshtml");

    InlinePageParser parser = new InlinePageParser(languageService.CreateCodeParser(), new HtmlMarkupParser());

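    // Note: consumer is not declared in this snippet; it is presumably a field of the enclosing class.
    consumer = new RazorParserConsumer();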
    consumer.OnSpanStart += delegate(object s, ParsedSpanEventArgs span) {
        switch (span.span.Type) {
            case SpanType.Code:
                tokens.AddRange(RazorCSharpCodeParser.Parse(span.span.Start.AbsoluteIndex, span.span.Content).ToArray());

                if (span.span.Content.Trim().Length != 0) {
                    tokens.Add(new Token(TokenType.CodeBlock, new Span(span.span.Start.AbsoluteIndex, span.span.Length)));
                }
                break;
            case SpanType.MetaCode:
            case SpanType.Transition:
                tokens.Add(new Token(TokenType.CodeStart, new Span(span.span.Start.AbsoluteIndex, span.span.Length)));
                break;

            case SpanType.Markup:
                break;
        }
    };

    // Encode as UTF-8 (the StreamReader default) so non-ASCII characters are not replaced with '?'.
    MemoryStream ms = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(buffer.CurrentSnapshot.GetText()));
    using (System.IO.StreamReader reader = new System.IO.StreamReader(ms)) {
        parser.Parse(reader, consumer);
    }

    return tokens;
}

The ITextBuffer parameter of the Parse() method is passed in from Visual Studio by the EditorClassifier project. This gives us access to the text content within the editor.

The InlinePageParser returns four different types of spans to us: Code, Markup, MetaCode, and Transition.

Transitions

A transition is simply the switch between HTML and the code world. In the case of Razor, that’s one of several sets of characters: @, @:, @<, and <text>.

MetaCode

These are the characters that delimit a block of code, such as () and {}. For example:

@{
    View.Title = "Your Title Here";
    LayoutPage = "~/Views/Shared/_Layout.cshtml";
}

@("The title of this page is " + View.Title)

Code

This specifies that the span of text is actually code and not markup or a transition. I’ve just used basic keyword matching and a bit of magic to add syntax highlighting. It is very basic and not intended to be a full source highlighter.
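To give a feel for what "basic keyword matching" can look like, here is a minimal sketch. It is not the actual RazorCSharpCodeParser: the KeywordScanner class, its short keyword list, and the stand-in Token, Span, and TokenType types are all assumptions made for illustration. It only recognizes a handful of keywords, double-quoted strings, and // line comments.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the Token, Span, and TokenType types used in the post.
public enum TokenType { CSharpKeyword, CSharpString, CSharpComment }

public struct Span {
    public int Start, Length;
    public Span(int start, int length) { Start = start; Length = length; }
}

public class Token {
    public TokenType Type;
    public Span Span;
    public Token(TokenType type, Span span) { Type = type; Span = span; }
}

public static class KeywordScanner {
    // A deliberately small keyword list; a real highlighter would need many more.
    static readonly HashSet<string> Keywords = new HashSet<string> {
        "if", "else", "for", "foreach", "while", "return", "new", "var", "string", "int"
    };

    // offset is the absolute index of content within the whole document,
    // so the returned spans can be applied directly to the editor buffer.
    public static List<Token> Parse(int offset, string content) {
        var tokens = new List<Token>();
        int i = 0;
        while (i < content.Length) {
            char c = content[i];
            int start = i;
            if (c == '"') {
                // String literal: scan to the closing quote, skipping escaped characters.
                i++;
                while (i < content.Length && content[i] != '"') {
                    if (content[i] == '\\') i++;
                    i++;
                }
                i = Math.Min(i + 1, content.Length);
                tokens.Add(new Token(TokenType.CSharpString, new Span(offset + start, i - start)));
            } else if (c == '/' && i + 1 < content.Length && content[i + 1] == '/') {
                // Line comment: runs to the end of the line.
                while (i < content.Length && content[i] != '\n') i++;
                tokens.Add(new Token(TokenType.CSharpComment, new Span(offset + start, i - start)));
            } else if (char.IsLetter(c) || c == '_') {
                // Identifier or keyword: only keywords produce a token.
                while (i < content.Length && (char.IsLetterOrDigit(content[i]) || content[i] == '_')) i++;
                if (Keywords.Contains(content.Substring(start, i - start)))
                    tokens.Add(new Token(TokenType.CSharpKeyword, new Span(offset + start, i - start)));
            } else {
                i++;
            }
        }
        return tokens;
    }
}
```

Calling KeywordScanner.Parse(10, "var x = \"hi\"; // note") yields three tokens: a keyword, a string, and a comment, each with its start shifted by the offset of 10. Verbatim strings, character literals, and block comments are left out to keep the sketch short.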

Markup

We don’t care about markup here, because we’re letting the default Visual Studio HTML classifiers syntax highlight it.

Token Types

We’re only interested in a few token types to highlight cshtml to the point where it’s basically usable.

public enum TokenType {
    CodeBlock,
    CodeStart,
    Unknown,
    WhiteSpace,
    CSharpObject,
    CSharpKeyword,
    CSharpString,
    CSharpComment
}

This gives us our basic content types:

1. CodeBlock gives us the background color for the code.
2. CodeStart is the transition color.
3. Unknown and WhiteSpace are currently unused (WhiteSpace was used in a previous implementation).
4. CSharp[x] is just some very basic C# color coding.
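One way to picture how these token types turn into colors is a simple lookup from TokenType to a classification name. The sketch below is only an illustration: the TokenClassifications class and the "razor.*" names are hypothetical, and the real names would be whatever the editor classifier registers with Visual Studio.

```csharp
using System.Collections.Generic;

public enum TokenType {
    CodeBlock, CodeStart, Unknown, WhiteSpace,
    CSharpObject, CSharpKeyword, CSharpString, CSharpComment
}

public static class TokenClassifications {
    // Hypothetical classification names, chosen only for this example.
    static readonly Dictionary<TokenType, string> Names = new Dictionary<TokenType, string> {
        { TokenType.CodeBlock,     "razor.codeblock" },      // background color for code
        { TokenType.CodeStart,     "razor.codestart" },      // transition color
        { TokenType.CSharpObject,  "razor.csharp.object" },
        { TokenType.CSharpKeyword, "razor.csharp.keyword" },
        { TokenType.CSharpString,  "razor.csharp.string" },
        { TokenType.CSharpComment, "razor.csharp.comment" }
    };

    // Returns null for token types that are not highlighted (Unknown, WhiteSpace).
    public static string NameFor(TokenType type) {
        string name;
        return Names.TryGetValue(type, out name) ? name : null;
    }
}
```

Leaving Unknown and WhiteSpace out of the table means those tokens fall through to the editor's default formatting.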

This is the heart of the Razor Syntax Highlighter. It works reasonably well. I’m sure there are lots of improvements to be made to the C# highlighter and to the highlighter’s overall performance.

In my next post I’ll explain how I integrate the parser with the editor classifier though I’m sure several of you will already have figured it out and will be working on your own soon.

BuildStarted.com