Pascal Grammar - Felix John COLIBRI. | - abstract : the Pascal ebnf grammar
- key words : ebnf, iebnf, compiler, pascal
- software used : Windows XP, Delphi 6
- hardware used : Pentium 1.400Mhz, 256 M memory, 140 G hard disc
- scope : Delphi 1 to 8 for Windows, Kylix
- level : Delphi developer
- plan :
1 - Introduction Here are a couple of Pascal Grammars, of the level of the P4 compiler from Zurich.
2 - The grammars 2.1 - The base grammar
The base grammar seems to be the following:
actual_function= FUNCTION_NAME .
actual_parameter_list= '(' actual_parameter { ',' actual_parameter } ')' .
actual_parameter= actual_value | actual_variable | actual_procedure
| actual_function . actual_procedure= PROCEDURE_NAME .
actual_value= expression . actual_variable= variable .
addition_operator= '+' | '-' | OR .
array_type= ARRAY '[' index_type { ',' index_type } ']' OF element_type .
array_variable= variable .
assignment_statement= ( variable | FUNCTION_NAME ) ':=' expression .
base_type= type .
block= declaration_part statement_part .
bound_specification= NAME '..' NAME ':' ordinal_type_identifier .
case_element= case_label_list ':' statement .
case_label_list= constant { ',' constant } .
case_statement= CASE expression OF case_element { ';' case_element } [ ';' ] END .
component_variable= indexed_variable | field_designator | file_buffer .
compound_statement= BEGIN statement_sequence END .
conditional_statement= if_statement | case_statement .
conformant_array_schema= packed_conformant_array_schema | unpacked_conformant_array_schema .
constant_definition_part= CONST constant_definition ';' { constant_definition ';' } .
constant_definition= NAME '=' constant .
constant= [ '+' | '-' ] ( CONSTANT_NAME | number ) | STRING .
declaration_part= [ label_declaration_part ] [ constant_definition_part ]
[ type_definition_part ] [ variable_declaration_part ]
directive= FORWARD .
element_list= [ expression { ',' expression } ] .
element_type= type .
entire_variable= VARIABLE_NAME | FIELD_NAME .
enumerated_type= '(' identifier_list ')' .
expression_list= expression { ',' expression } .
expression= simple_expression [ relational_operator simple_expression ] .
factor= NUMBER | STRING | NIL | CONSTANT_NAME | set | variable
| function_designator | '(' expression ')' | NOT factor .
field_designator= record_variable '.' FIELD_NAME .
field_list= [ ( fixed_part [ ';' variant_part ] | variant_part ) [ ';' ] ] .
field_width= expression .
file_buffer= file_variable '^' .
file_component_type= type .
file_type= FILE OF file_component_type .
file_variable= variable . final_expression= expression .
fixed_part= record_section { ';' record_section } .
for_statement= FOR VARIABLE_NAME ':=' initial_expression ( TO | DOWNTO )
final_expression DO statement .
formal_parameter_list= '(' formal_parameter_section { ';' formal_parameter_section } ')' .
formal_parameter_section= value_parameter_section | variable_parameter_section
| procedure_parameter_section | function_parameter_section .
fraction_length= expression .
function_declaration= function_heading ';' ( block | directive ) .
function_designator= FUNCTION_NAME [ actual_parameter_list ] .
function_heading= FUNCTION NAME [ formal_parameter_list ] ':' result_type .
function_parameter_section= function_heading .
goto_statement= GOTO label .
identifier_list= NAME { ',' NAME } .
if_statement= IF expression THEN statement [ ELSE statement ] .
index_type= simple_type .
indexed_variable= array_variable '[' expression_list ']' .
initial_expression= expression . integer_number= NUMBER .
label_declaration_part= LABEL label { ',' label } ';' .
label= NUMBER . lower_bound= constant .
multiplication_operator= '*' | '/' | DIV | MOD | AND .
number= integer_number | real_number .
ordinal_type_identifier= TYPE_NAME .
output_list= output_value { ',' output_value } .
output_value= expression [ ';' field_width [ ':' fraction_length ] ] .
packed_conformant_array_schema= PACKED ARRAY '[' bound_specification ']' OF TYPE_NAME .
parameter_type= TYPE_NAME | conformant_array_schema .
pointer_type= '^' TYPE_NAME .
pointer_variable= variable . procedure_and_function_declaration_part .
procedure_and_function_declaration_part= { ( procedure_declaration
| function_declaration ) ';' } .
procedure_declaration= procedure_heading ';' ( block | directive ) .
procedure_heading= PROCEDURE NAME [ formal_parameter_list ] .
procedure_parameter_section= procedure_heading .
procedure_statement= PROCEDURE_NAME [ actual_parameter_list ] .
program_heading= PROGRAM NAME '(' identifier_list ')' ';' .
program= program_heading block '.' .
real_number= NUMBER .
record_section= identifier_list ':' type .
record_type= RECORD field_list END .
record_variable= variable .
referenced_variable= pointer_variable '^' .
relational_operator= '=' | '<>' | '<' | '<=' | '>' | '>=' | IN .
repeat_statement= REPEAT statement_sequence UNTIL expression .
repetitive_statement= while_statement | repeat_statement | for_statement .
result_type= TYPE_NAME .
set_type= SET OF base_type .
set= '[' element_list ']' .
simple_expression= [ '+' | '-' ] term { addition_operator term } .
simple_statement= [ assignment_statement | procedure_statement | goto_statement ] .
simple_type= subrange_type | enumerated_type .
statement_part= BEGIN statement_sequence END .
statement_sequence= statement { ';' statement } .
statement= [ LABEL ':' ] ( simple_statement | structured_statement ) .
structured_statement= compound_statement | repetitive_statement
| conditional_statement | with_statement .
structured_type= [ PACKED ] unpacked_structured_type .
subrange_type= lower_bound '..' upper_bound .
tag_field= [ NAME ':' ] .
term= factor { multiplication_operator factor } .
type_definition_part= TYPE type_definition ';' { type_definition ';' } .
type_definition= NAME '=' type .
type= simple_type | structured_type | pointer_type | TYPE_NAME .
unpacked_conformant_array_schema= ARRAY '[' bound_specification
{ ';' bound_specification } ']' OF ( TYPE_NAME | conformant_array_schema ) .
unpacked_structured_type= array_type | record_type | set_type | file_type .
upper_bound= constant .
value_parameter_section= identifier_list ':' parameter_type .
variable_declaration_part= VAR variable_declaration ';' { variable_declaration ';' } .
variable_declaration= identifier_list ':' type .
variable_parameter_section= VAR identifier_list ':' parameter_type .
variable= entire_variable | component_variable | referenced_variable .
variant_part= CASE tag_field TYPE_NAME OF variant { ';' variant } .
variant= case_label_list ':' '(' field_list ')' .
while_statement= WHILE expression DO statement .
with_statement= WITH record_variable { ',' record_variable } DO statement .
|
The Wilson and Addyman book ordered the productions alphabetically, and we did the same above. Other authors
(Fitzpatrick ) presented the productions grouped by topic (types, statements etc), and other (Jobst ) tried to isolate the
LL1 part from the non-LL1 parts. Also the traditional Pascal Grammars also add the definition of identifier, integer, real etc. Since this is handled by the scanner we removed the corresponding productions.
In the above description, we found that - some productions are never called (output_xxx)
- there remain left recursions (variable -> component_variable ->
field_designator -> record_variable -> variable)
2.2 - Another P4 version So we used our standard Indentation machine to remove unreachable productions and remove left recursion.
While we were at it, we tried to make small steps in the direction of Turbo: - we removed the conformant_array stuff. This was an english trial to add variable size arrays to Pascal, by pushing them on the stack instead of
allocating them at compile time
- we replaced the structure parameter compatibility of WIRTH with type name compatibility of Turbo.
In UCSD, you could describe a type in a procedure declaration:
VAR g_structure: RECORD
name: String;
amount: ARRAY[1..3] OF Integer;
END;
PROCEDURE compute(p_structure: RECORD name: String;
amount: ARRAY[1..3] OF Integer END);
BEGIN // ... END; BEGIN
compute(g_structure); END. | To check compatibility with the caller, the compiler had to recursively
check the parts of the declaration. Turbo forced the programmer to define a type name, and we can only use this name in the parameter types:
TYPE t_strusture= RECORD
name: String;
amount: ARRAY[1..3] OF Integer;
END;
VAR g_structure: t_structure;
PROCEDURE compute(p_structure: t_structure); BEGIN
// ... END; BEGIN
compute(g_structure); END. |
- in the same vein, we removed the procedural parameters, and added instead the procedural types, which are included in Turbo
- WIRTH force a fixed LABEL, CONST, TYPE, VAR, PROCEDURE order.
Turbo allowed to mix those "compiling" zones at will. So we replaced the fixed order declaration with a loop
- the original FILE mechanism used a "file buffer" (defined in the file_buffer production above):
VAR my_file: FILE OF Integer;
BEGIN Put(my_file^); Get(my_file^);
END. | Turbo nicely replaced this with a generalized Read and Write, which in addition removed the tricky "first record pre fetching" issue at the
semantic level. So we also removed the file_buffer production.
Here our modified grammar:
program= program_heading block '.' .
identifier_list= NAME { ',' NAME } .
program_heading= PROGRAM NAME '(' identifier_list ')' ';' .
block= declaration_part statement_part .
label= NUMBER .
formal_parameter_list= '(' formal_parameter_section
{ ';' formal_parameter_section } ')' .
formal_parameter_section= [ VAR ]identifier_list ':' parameter_type .
parameter_type= TYPE_NAME .
constant= [ '+' | '-' ] ( CONSTANT_NAME | NUMBER ) | STRING .
case_label_list= constant { ',' constant } .
type= simple_type | structured_type | pointer_type | procedural_type | TYPE_NAME .
simple_type= subrange_type | enumerated_type .
subrange_type= constant '..' constant .
enumerated_type= '(' identifier_list ')' .
structured_type= [ PACKED ] unpacked_structured_type .
unpacked_structured_type= array_type | record_type | set_type
| file_type .
array_type= ARRAY '[' index_type { ',' index_type } ']' OF
element_type .
index_type= simple_type .
element_type= type .
record_type= RECORD field_list END .
field_list= [ ( fixed_part [ ';' variant_part ] | variant_part ) [ ';' ] ] .
fixed_part= record_section { ';' record_section } .
record_section= identifier_list ':' type .
variant_part= CASE tag_field TYPE_NAME OF variant { ';' variant } .
tag_field= [ NAME ':' ] .
variant= case_label_list ':' '(' field_list ')' .
set_type= SET OF base_type .
base_type= type .
file_type= FILE [ OF file_component_type ] .
file_component_type= type .
pointer_type= '^' TYPE_NAME .
procedural_type= procedure_type | function_type .
procedure_type= PROCEDURE [ formal_parameter_list ] .
function_type= FUNCTION [ formal_parameter_list ] ':' TYPE_NAME .
declaration_part= { label_declaration_part | constant_definition_part |
| type_definition_part | variable_declaration_part
| procedure_and_function_declaration_part } .
label_declaration_part= LABEL label { ',' label } ';' .
constant_definition_part= CONST constant_definition ';'
{ constant_definition ';' } .
constant_definition= NAME '=' constant .
type_definition_part= TYPE type_definition ';' { type_definition ';' } .
type_definition= NAME '=' type .
variable_declaration_part= VAR variable_declaration ';'
{ variable_declaration ';' } .
variable_declaration= identifier_list ':' type .
procedure_and_function_declaration_part=
( procedure_declaration | function_declaration ) ';' .
directive= FORWARD .
procedure_declaration= procedure_heading ';' ( block | directive ) .
procedure_heading= PROCEDURE NAME [ formal_parameter_list ] .
function_declaration= function_heading ';' ( block | directive ) .
function_heading= FUNCTION NAME [ formal_parameter_list ] ':' TYPE_NAME .
statement_part= BEGIN statement_sequence END .
statement_sequence= statement { ';' statement } .
expression= F .
expression_list= expression { ',' expression } .
variable_access= ACCESS_NAME { end_access_ } .
end_access_= { array_access_ | record_access_ | '^' | function_parameters_ } .
array_access_= '[' expression_list ']' .
record_access_= '.' variable_access .
function_parameters_= '(' [ expression_list ] ')' .
actual_parameter_list= '(' actual_parameter { ',' actual_parameter } ')' .
actual_parameter= actual_value | actual_variable | actual_procedure
| actual_function .
actual_procedure= PROCEDURE_NAME .
actual_function= FUNCTION_NAME .
actual_value= expression .
actual_variable= variable_access .
expression= simple_expression [ relational_operator simple_expression ] .
relational_operator= '=' | '<>' | '<' | '<=' | '>' | '>=' | IN .
simple_expression= [ '+' | '-' ] term { addition_operator term } .
addition_operator= '+' | '-' | OR .
term= factor { multiplication_operator factor } .
multiplication_operator= '*' | '/' | DIV | MOD | AND .
factor= NUMBER | STRING | NIL | CONSTANT_NAME | set
| variable_access | function_designator
| '(' expression ')' | NOT factor .
function_designator= FUNCTION_NAME [ actual_parameter_list ] .
set= '[' element_list ']' .
element_list= [ expression { ',' expression } ] .
statement= [ LABEL ':' ] ( simple_statement | structured_statement ) .
simple_statement= [ assignment_statement | procedure_statement
| goto_statement ] .
assignment_statement= ( variable_access | FUNCTION_NAME ) ':=' expression .
procedure_statement= PROCEDURE_NAME [ actual_parameter_list ] .
goto_statement= GOTO label .
structured_statement= compound_statement | repetitive_statement
| conditional_statement | with_statement .
compound_statement= BEGIN statement_sequence END .
repetitive_statement= while_statement | repeat_statement
| for_statement .
while_statement= WHILE expression DO statement .
repeat_statement= REPEAT statement_sequence UNTIL expression .
for_statement= FOR VARIABLE_NAME ':=' initial_expression
( TO | DOWNTO ) final_expression DO statement .
initial_expression= expression .
final_expression= expression .
conditional_statement= if_statement | case_statement .
if_statement= IF expression THEN statement [ ELSE statement ] .
case_statement= CASE expression OF case_element { ';' case_element }
[ ';' ] END .
case_element= case_label_list ':' statement .
with_statement= WITH variable_access { ',' variable_access } DO
statement . . |
Please note that
- we did not try to compact this grammar in any way. For instance, "parameter_type= TYPE_NAME ." could easily be removed. Such "useless" productions are often kept in order to generate a dedicated parsing
procedure which will contain important handling (type checking, address computations etc) in later stages of the compilation. So the degree of redundancy depends on the later stages of the compilation (which were not detailed here)
- the grammar is still not LL1 (mainly many NAME duplicate FIRSTs), but at least is is no longer left recursive.
2.3 - What's next The next step could include
- ELSE in the CASE, BREAK and CONTINUE
- UNITs: this necessitates the extraction of the declaration part from the block, since the INTERFACE part breaks the nice block recursive
structure
- constant expression: to solve this, we have to extract the expression from the statement part to make it available at the declaration level
- objects:
- simple objects (like Turbo 5.5 to Turbo 7) are reasonably easy to handle. After all, they only are some kind of special RECORDs (from a syntactic point of view). Security levels (PRIVATE) are also easy.
- Delphi kind of CLASSes require the PROPERTY mechanism, the CLASS references, the PROCEDURE OF OBJECT. And this level hides some nasty surprises (words like READ which are not keywords, but
nevertheless have very special meaning, or the changing ";" rules in the attributes or directives parts)
- the Delphi INTERFACE (in the COM sense) also forces another handful of productions
3 - References Here are a couple of links:
4 - Comments
As usual: - please tell us at fcolibri@felix-colibri.com if you found some errors, mistakes, bugs, broken
links or had some problem downloading the file. Resulting corrections will be helpful for other readers
- we welcome any comment, criticism, enhancement, other sources or reference
suggestion. Just send an e-mail to fcolibri@felix-colibri.com.
- or more simply, enter your (anonymous or with your e-mail if you want an
answer) comments below and clic the "send" button
- and if you liked this article, talk about this site to your fellow
developpers, add a link to your links page ou mention our articles in your blog or newsgroup posts when relevant. That's the way we operate: the more traffic and Google references we get, the more articles we will write.
5 - The author Felix John COLIBRI works at the Pascal Institute. Starting with Pascal in 1979, he then became involved with Object
Oriented Programming, Delphi, Sql, Tcp/Ip, Html, UML. Currently, he is mainly active in the area of custom software
development (new projects, maintenance, audits, BDE migration, Delphi Xe_n migrations, refactoring), Delphi Consulting and Delph training. His web site features tutorials, technical papers about programming with full downloadable source
code, and the description and calendar of forthcoming Delphi, FireBird, Tcp/IP, Web Services, OOP / UML, Design Patterns, Unit Testing training sessions.
|