A simple XML lexer that handles most common cases. It is not a validating lexer: it tokenizes invalid XML without raising an error.
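As a small illustration of what "non-validating" means here, the sketch below applies the comment pattern from step to a comment that is never closed. It uses Ruby's standard StringScanner directly rather than the lexer itself, and the helper name scan_comment is hypothetical.

```ruby
require 'strscan'

# Same comment pattern as in the lexer's step method. The \Z alternative
# lets an unterminated comment run to the end of the input.
COMMENT = /<!--.*?(-->|\Z)/m

# Hypothetical helper: returns the text matched as a comment at the start
# of +text+, or nil if +text+ does not begin with a comment.
def scan_comment(text)
  StringScanner.new(text).scan(COMMENT)
end

closed   = scan_comment("<!-- ok --><a/>")
unclosed = scan_comment("<!-- never closed <a/>")

puts closed.inspect    # => "<!-- ok -->"
puts unclosed.inspect  # => "<!-- never closed <a/>"
```

The unterminated comment is not reported as an error. Everything up to the end of the input is simply treated as comment text.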
Methods
Public Instance methods
Initialize the lexer. Tokenization starts outside of any tag (@in_tag is false).
# File lib/syntax/lang/xml.rb, line 11
def setup
  @in_tag = false
end
Perform a single iteration of the tokenization process. One step may yield many tokens, or none at all.
# File lib/syntax/lang/xml.rb, line 17
def step
  start_group :normal, matched if scan( /\s+/ )
  if @in_tag
    case
    when scan( /([-\w]+):([-\w]+)/ )
      start_group :namespace, subgroup(1)
      start_group :punct, ":"
      start_group :attribute, subgroup(2)
    when scan( /\d+/ )
      start_group :number, matched
    when scan( /[-\w]+/ )
      start_group :attribute, matched
    when scan( %r{[/?]?>} )
      @in_tag = false
      start_group :punct, matched
    when scan( /=/ )
      start_group :punct, matched
    when scan( /["']/ )
      scan_string matched
    else
      append getch
    end
  elsif ( text = scan_until( /(?=[<&])/ ) )
    start_group :normal, text unless text.empty?
    if scan(/<!--.*?(-->|\Z)/m)
      start_group :comment, matched
    else
      case peek(1)
      when "<"
        start_group :punct, getch
        case peek(1)
        when "?"
          append getch
        when "/"
          append getch
        when "!"
          append getch
        end
        start_group :normal, matched if scan( /\s+/ )
        if scan( /([-\w]+):([-\w]+)/ )
          start_group :namespace, subgroup(1)
          start_group :punct, ":"
          start_group :tag, subgroup(2)
        elsif scan( /[-\w]+/ )
          start_group :tag, matched
        end
        @in_tag = true
      when "&"
        if scan( /&\S{1,10};/ )
          start_group :entity, matched
        else
          start_group :normal, scan( /&/ )
        end
      end
    end
  else
    append scan_until( /\Z/ )
  end
end
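The two-mode structure of step (one set of patterns inside a tag, another outside) can be sketched with nothing but Ruby's standard StringScanner. This is a simplified illustration, not the library's implementation: sketch_tokens is a hypothetical name, namespaces, numbers and comments are omitted, and the quoted-string handling and the :string group name are assumptions, since scan_string is not shown above.

```ruby
require 'strscan'

# Hypothetical helper, not part of the Syntax library. Returns an array of
# [group, text] pairs for a small subset of XML.
def sketch_tokens(xml)
  s = StringScanner.new(xml)
  tokens = []
  in_tag = false   # plays the role of @in_tag in the lexer

  until s.eos?
    if in_tag
      if s.scan(/\s+/)
        tokens << [:normal, s.matched]
      elsif s.scan(/[-\w]+/)
        tokens << [:attribute, s.matched]
      elsif s.scan(%r{[/?]?>})
        in_tag = false               # the tag is closed, back to text mode
        tokens << [:punct, s.matched]
      elsif s.scan(/=/)
        tokens << [:punct, s.matched]
      elsif s.scan(/(["']).*?(\1|\z)/m)
        # Assumption: a quoted value runs to the matching quote or to the
        # end of the input if the quote is never closed.
        tokens << [:string, s.matched]
      else
        tokens << [:normal, s.getch] # anything unexpected passes through
      end
    elsif s.scan(/<[\/?!]?/)
      tokens << [:punct, s.matched]
      tokens << [:tag, s.matched] if s.scan(/[-\w]+/)
      in_tag = true
    elsif s.scan(/&\S{1,10};/)
      tokens << [:entity, s.matched]
    elsif s.scan(/[^<&]+/)
      tokens << [:normal, s.matched]
    else
      tokens << [:normal, s.getch]   # a stray "&" that is not an entity
    end
  end
  tokens
end

sketch_tokens('<a href="x">hi &amp; bye</a>').each do |group, text|
  puts "#{group}: #{text.inspect}"
end
```

Every branch consumes at least one character, so the loop always terminates, and malformed input such as '<a b=' still produces tokens instead of an error.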