SQL Parser - John Felix COLIBRI. |
- abstract : This recursive descent parser built from an IEBNF SQL grammar will parse a subset of Interbase SQL requests.
- key words : SQL - Parser - SQL grammar - BNF - EBNF - IEBNF - Interbase
- software used : Windows XP, Delphi 6
- hardware used : Pentium 1.400Mhz, 256 M memory, 140 G hard disc
- scope : Delphi 1 to 2005 for Windows, Kylix
- level : Delphi developer
- plan :
1 - Introduction For the analysis of the Database schema
extracted from an application's .DFMs, we had to analyze the content of several SQL statements. Basically we wanted to get the tables (FROM), the columns (SELECT) and the Master Detail links if any.
So we build a small SQL parser which allowed a quick analysis of the statements.
2 - The SQL Grammar 2.1 - The Hopeless Grammar
To build our parser, we started from a grammar, as we always do (cf the .DFM parser). SQL is quite a complex language. Not in terms of concepts, but in the way all
kind of functionalities ("features") were piled up, layer upon layer, years after years, on top of the language. Since it was supposed to be used as the sole all purpose language in Banks and Insurance companies, it has been
enriched with all kinds of warts and hacks. A full blown SQL grammar cannot exist, since the next day all kind of SQL engine providers will promptly add two or three new features, making sure that they stay incompatible with each other.
Sure standards have been proposed (SQL 92 etc), but I believe that no single established SQL engine uses any standard (implementing all and only all the standard). In addition, the puslished BNF grammars will disgust you to look at
any grammar for the rest of your life: 1000 lines of rules or more, for something which, after all, is just a simple non-procedural data manipulation language with a tiny bit of arithmetic.
To avoid being dragged into a lost battle, I only use portions of SQL grammars which serve my immediate needs. I use whatever part of SQL that my application requires, and generate the parser from this ad hoc grammar.
2.2 - The Interbase SQL grammar In this presentation, I started from the Interbase 6 SQL Reference page. This page presents an overview of the SQL language used by
Interbase, with fragments of EBNF grammar embedded in the text. Those portions were extracted, some typos removed, and the whole lot was reorganized in IEBNF
(Indented Extended Backus Naur Formalism). The result is around 519 lines long (27 K). For our application, only the data manipulation part was necessary:
SELECT, INSERT and UPDATE. So we removed the other parts
(CREATE, ALTER, GRANT etc), which yielded the grammar used in this paper, which is about 50 lines long.
2.3 - The Data Manipulation Grammar
Here is the grammar that we used:
sql= insert | select | update .
value_litteral= VALUE_LITTERAL | NAME .
integer_litteral= INTEGER . table_or_view_name= NAME .
name_view_procedure= NAME . column_name= NAME .
collation_name= NAME . alias_name= NAME .
select= F . select_expression= select .
insert= INSERT INTO table_or_view_name
[ '(' column_name { ',' column_name } ')' ]
( VALUES '(' value_litteral { ',' value_litteral } ')' | select_expression ) .
functions= average | count | max | min | sum | upper .
average= AVG '(' [ ALL | DISTINCT ] value_litteral ')' .
count= COUNT '(' '*' | [ ALL | DISTINCT ] value_litteral ')' .
max= MAX '(' [ ALL | DISTINCT ] value_litteral ')' .
min= MIN '(' [ ALL | DISTINCT ] value_litteral ')' .
sum= SUM '(' [ ALL | DISTINCT ] value_litteral ')' .
upper= UPPER '(' value_litteral ')' .
search_condition= search_value { ( OR | AND ) search_condition } .
search_value= value_litteral
( [ NOT ] ( between | like | in | compare | containing | starting )
| IS [ NOT ] NULL )
| ( ALL | SOME | ANY ) '(' select_column_list ')'
| EXISTS '(' select_expression ')'
| SINGULAR '(' select_expression ')'
| '(' search_condition ')'
| NOT search_condition .
select_one_column= select .
select_column_list= select .
between= BETWEEN value_litteral AND value_litteral .
like= LIKE value_litteral [ ESCAPE value_litteral ].
in= IN '(' value_litteral { ',' value_litteral } | select_column_list ')' .
compare= operator ( value_litteral | '(' select_one_column ')' ) .
operator= '=' | '<' | '>' | '<=' | '>=' | '<>' .
containing= CONTAINING value_litteral .
starting= STARTING [ WITH ] value_litteral .
select= SELECT [ DISTINCT | ALL ]
( '*' | functions | value_litteral { ',' value_litteral } )
FROM from_table_reference { ',' from_table_reference }
[ WHERE search_condition ]
[ GROUP BY column_name
[ COLLATE collation_name ] { ',' column_name [ COLLATE collation_name ] } ]
[ HAVING search_condition ]
[ UNION select_expression [ ALL ] ]
[ ORDER BY order_list ] .
// -- DUPS: NAME, co-recursive
from_table_reference= NAME procedure_end | joined_table .
procedure_end= [ '(' value_litteral { ',' value_litteral } ')' ] [ alias_name ] .
joined_table= ( name_view_procedure join_on | '(' joined_table ')' ) { join_on } .
join_on= join_type ( joined_table | name_view_procedure ) ON search_condition .
join_type= ( [ INNER | { LEFT | RIGHT | FULL } [OUTER] ] ) JOIN .
order_list= ( column_name | integer_litteral ) [ COLLATE collation_name ]
[ ascending_or_descending ] { ',' order_list } .
ascending_or_descending= ASC | ASCENDING | DESC | DESCENDING .
update= UPDATE table_or_view_name
SET column_name '=' value_litteral
{ ',' column_name '=' value_litteral }
[ WHERE search_condition ] . |
Just a couple of warnings: - I did not try to sort the different table, view, column and procedure NAMEs. For instance, after SELECT we accept "VALUE_LITTERAL", which naturally
includes column name, but this "VALUE_LITTERAL" is used in many other places. It would be useful to define specific non-terminals, which would indicate what kind of name or expression is allowed.
- the grammar is not LL1 (one symbol look ahead). Most conspicuously is the FROM part, where the table name, the stored procedure call and the
JOIN expression all begin with a NAME. We started to factor this NAME out, but the result was a mess. So I kept the grammar simple, and generated the parser from this ambiguous grammar. The automatically generated LL1
parser also contains this ambiguity (and possibly other) and I will try to sort this out if and when I will stumble upon a parsing error.
3 - The SQL Parser 3.1 - The Symbol definitions
We placed all the symbols in a UNIT with the following INTERFACE (partial):
type t_sql_symbol_type= (e_unknown_symbol,
e_integer_litteral_symbol,
e_double_litteral_symbol,
e_string_litteral_symbol,
e_start_punctuation,
e_multiply_symbol, // *
e_semi_colon_symbol, // ;
// ...
e_different_symbol, // <>
e_end_punctuation,
e_identifier_start,
e_SELECT_symbol,
e_INSERT_symbol,
// ...
e_UPPER_symbol,
e_identifier_symbol,
e_end_of_parse_symbol
);
t_sql_symbol_type_set= set of t_sql_symbol_type;
const k_sql_litteral_symbol_set= [e_integer_litteral_symbol,
e_double_litteral_symbol, e_string_litteral_symbol];
k_sql_litteral_and_identifier_symbol_set= k_sql_litteral_symbol_set
+ [e_identifier_symbol, e_colon_symbol];
function f_sql_symbol_type_name(p_sql_symbol_type: t_sql_symbol_type): String;
function f_sql_symbol_type_set_name(
p_sql_symbol_type_set: t_sql_symbol_type_set): String;
function f_sql_identifier_type(p_symbol_string: String): t_sql_symbol_type;
|
3.2 - The SQL Scanner Here is the Scanner definition:
c_sql_scanner= class(c_text_file)
m_sql_symbol_type: t_sql_symbol_type;
m_blank_string, m_symbol_string: String;
m_display_symbol_read: Boolean;
constructor create_sql_scanner(p_name, p_file_name: String);
function f_initialized: Boolean;
function f_read_symbol: Boolean;
procedure test_sql_scanner;
end; // c_sql_scanner |
with the following main symbol fetching function (partial):
function c_sql_scanner.f_read_symbol: Boolean; // ...
begin // f_read_symbol m_symbol_string:= '';
m_sql_symbol_type:= e_unknown_symbol; if f_end_of_text
then begin
Result:= False;
m_sql_symbol_type:= e_end_of_parse_symbol;
end else begin
get_blanks;
if m_buffer_index< m_buffer_size
then begin
case m_pt_buffer[m_buffer_index] of
'a'..'z', 'A'..'Z' : get_identifier;
'-', '0'..'9' : get_number;
'''' : get_string_litteral;
'*' : get_one_operator(e_multiply_symbol);
':' : get_one_operator(e_colon_symbol);
'.' : get_one_operator(e_point_symbol);
';' : get_one_operator(e_semi_colon_symbol);
',' : get_one_operator(e_comma_symbol);
'=' : get_one_operator(e_equal_symbol);
'<' : if (m_buffer_index+ 1< m_buffer_size)
and (m_pt_buffer[m_buffer_index+ 1]= '=')
then get_two_operator(e_lower_or_equal_symbol)
else
if (m_buffer_index+ 1< m_buffer_size)
and (m_pt_buffer[m_buffer_index+ 1]= '>')
then get_two_operator(e_different_symbol)
else get_one_operator(e_lower_symbol);
'>' : if (m_buffer_index+ 1< m_buffer_size)
and (m_pt_buffer[m_buffer_index+ 1]= '=')
then get_two_operator(e_greater_or_equal_symbol)
else get_one_operator(e_greater_symbol);
'(' : get_one_operator(e_opening_parenthesis_symbol);
')' : get_one_operator(e_closing_parenthesis_symbol);
else
display_bug_stop('unknown_char_in_sql '+ m_pt_buffer[m_buffer_index]
+ '< '+ IntToStr(Ord(m_pt_buffer[m_buffer_index])));
end; // case
end
else m_sql_symbol_type:= e_end_of_parse_symbol;
Result:= True;
end; // not eof
end; // f_read_symbol |
3.3 - The SQL parser The Parser is defined by:
c_sql_parser= Class(c_basic_object)
m_c_sql_scanner_ref: c_sql_scanner;
constructor create_sql_parser(p_name: String);
procedure display_error(p_text: string);
procedure start_parsing;
function f_symbol_type: t_sql_symbol_type;
function f_symbol_string: String;
procedure read_symbol;
procedure check(p_sql_symbol_type: t_sql_symbol_type);
procedure check_set(p_sql_symbol_type_set: t_sql_symbol_type_set);
procedure parse_sql;
end; // c_sql_parser |
As an example, here is the handling of INSERT INTO :
PROCEDURE parse_insert; BEGIN
check(e_INSERT_symbol); read_symbol;
check(e_INTO_symbol); read_symbol;
parse_table_or_view_name;
IF f_symbol_type= e_opening_parenthesis_symbol
THEN BEGIN read_symbol;
parse_column_name;
WHILE f_symbol_type= e_comma_symbol DO
BEGIN read_symbol;
parse_column_name;
END; // WHILE {
check(e_closing_parenthesis_symbol);
read_symbol; END; // IF [
IF f_symbol_type= e_VALUES_symbol
THEN BEGIN read_symbol;
check(e_opening_parenthesis_symbol);
read_symbol; parse_value_litteral;
WHILE f_symbol_type= e_comma_symbol DO
BEGIN read_symbol;
parse_value_litteral;
END; // WHILE {
check(e_closing_parenthesis_symbol);
read_symbol; END
ELSE
IF f_symbol_type= e_SELECT_symbol
THEN parse_select_expression
ELSE display_error('VALUES, SELECT');
END; // parse_insert |
3.4 - The Application Snapshot Let us take the following SQL request (from IbMastApp):
SELECT * FROM parts
WHERE (parts.OnOrder > parts.OnHand) | Here is a snapshot of the application while parsing the previous request:
3.5 - Mini Manual To use the parser with the actual grammar: |
eventually place your SQL request in the _data folder | | compile and execute |
| navigate to the folder containing the SQL request (_data by default) | |
click on the ASCII file to start the parsing | If you want to change the grammar (add other SQL statements): - modify the symbol definitions in u_sql_symbol_definitions
- eventually modifiy the scanner if new litterals with special syntax are introduced ("?", "$", double quotes etc)
- add the parsing procedures
4 - Download the Sources
Here are the source code files:
Those .ZIP files contain:
- the main program (.DPR, .DOF, .RES), the main form (.PAS, .DFM), and any other auxiliary form
- any .TXT for parameters
- all units (.PAS) for units
Those .ZIP
- are self-contained: you will not need any other product (unless expressly mentioned).
- can be used from any folder (the pathes are RELATIVE)
- will not modify your PC in any way beyond the path where you placed the .ZIP
(no registry changes, no path creation etc).
To use the .ZIP: - create or select any folder of your choice
- unzip the downloaded file
- using Delphi, compile and execute
To remove the .ZIP simply delete the folder.
As usual: - please tell us at fcolibri@felix-colibri.com if you found some errors, mistakes, bugs, broken
links or had some problem downloading the file. Resulting corrections will be helpful for other readers
- we welcome any comment, criticism, enhancement, other sources or reference
suggestion. Just send an e-mail to fcolibri@felix-colibri.com.
- or more simply, enter your (anonymous or with your e-mail if you want an
answer) comments below and clic the "send" button
- and if you liked this article, talk about this site to your fellow
developpers, add a link to your links page ou mention our articles in your blog or newsgroup posts when relevant. That's the way we operate: the more traffic and Google references we get, the more articles we will write.
5 - Conclusion We presented here a small SQL parser.
6 - References
7 - Other Papers with Source and Links
Database |
database reverse engineering | Extraction of the Database Schema by analyzing the content of the application's .DFMs
| sql parser | Parsing SQL requests in Delphi, starting from an EBNF grammar for SELECT, INSERT and UPDATE |
ado net tutorial |
a complete Ado Net architectural presentation, and projects for creating the Database, creating Tables, adding, deleting and updating rows, displaying the data in controls and DataGrids, using in memory DataSets, handling Views, updating the Tables with a DataGrid
| turbo delphi interbase tutorial |
develop database applications with Turbo Delphi and Interbase. Complete ADO Net architecture, and full projects to create the database, the Tables, fill the rows, display and update the values with DataGrids. Uses the BDP |
bdp ado net blobs | BDP and Blobs : reading and writing Blob fields using the BDP with Turbo Delphi |
interbase stored procedure grammar |
Interbase Stored Procedure Grammar : The BNF Grammar of the Interbase Stored Procedure. This grammar can be used to build stored procedure utilities, like pretty printers, renaming tools, Sql Engine conversion or ports |
using interbase system tables |
Using InterBase System Tables : The Interbase / FireBird System Tables: description of the main Tables, with their relationship and presents examples of how to extract information from the schema |
eco tutorial |
Writing a simple ECO application: the UML model, the in memory objects and the GUI presentation. We also will show how to evaluate OCL expressions using the EcoHandles, and persist the data on disc |
delphi dbx4 programming |
the new dbExpress 4 framework for RAD Studio 2007 : the configuration files, how to connect, read and write data, using tracing and pooling delegates and metadata handling |
blackfishsql |
using the new BlackfishSql standalone database engine of RAD Studio 2007 (Win32 and .Net) : create the database, create / fill / read Tables, use Pascal User Defined Functions and Stored Procedures |
rave pdf intraweb |
how to produce PDF reports using Rave, and have an Intraweb site generate and display .PDF pages, with multi-user access |
embarcadero er studio |
Embarcadero ER Studio tutorial: how to use the Entity Relationship tool to create a new model, reverse engineer a database, create sub-models, generate reports, import metadata, switch to Dimensional Model | | |
Web |
sql to html | converting SQL ascii request to HTML format
| simple web server |
a simple HTTP web Server and the corresponding HTTP web Browser, using our Client Server Socket library |
simple cgi web server |
a simple CGI Web Server which handles HTML <FORM> requests, mainly for debugging CGI Server extension purposes |
cgi database browser | a CGI extension in order to display and modify a Table using a Web Browser |
whois | a Whois Client who requests information about owners of IP adresses. Works in batch mode. |
web downloader |
an HTTP tool enabling to save on a local folder an HTML page with its associated images (.GIF, .JPEG, .PNG or other) for archieving or later off-line reading |
web spider | a Web Spider allowing to download all pages from a site, with custom or GUI filtering and selection. |
asp net log file |
a logging CLASS allowing to monitor the Asp.Net events, mainly used for undesrtanding, debugging and journaling Asp.Net Web applications |
asp net viewstate viewer |
an ASP.NET utility displaying the content of the viewtate field which carries the request state between Internet Explorer and the IIS / CASSINI Servers |
rss reader |
the RSS Reader lets you download and view the content of an .RSS feed (the entry point into somebody's blog) in a tMemo or a tTreeView. Comes complete with an .HTML downloader and an .XML parser |
news message tree |
how to build a tree of the NNTP News Messages. The downloaded messages are displayed in tListBox by message thread (topic), and for each thread the messages are presented in a tTreeVi"ew |
threaded indy news reader |
a NewsReader which presents the articles sorted by thread and in a logical hierarchical way. This is the basic Indy newsreader demo plus the tree organization of messages |
delphi asp net portal programming |
presentation, architecture and programming of the Delphi Asp Net Portal. This is a Delphi version of the Microsoft ASP.NET Starter Kit Web Portal showcase. With detailed schemas and step by step presentation, the Sql scripts and binaries of the Database
| delphi web designer |
a tiny Delphi "RAD Web Designer", which explains how the Delphi IDE can be used to generate .HTML pages using the Palette / Object Inspector / Form metaphor to layout the page content |
intraweb architecture |
the architecture of the Intraweb web site building tool. Explains how Delphi "rad html generator" work, and presents the CLASS organization (UML Class diagrams) |
ajax tutorial |
AJAX Tutorial : writing an AJAX web application. How AJAX works, using a JavaScript DOM parser, the Indy Web Server, requesting .XML data packets - Integrated development project |
asp net master pages |
Asp.Net 2.0 Master Pages : the new Asp.Net 2.0 allow us to define the page structure in a hierarchical way using Master Pages and Content Pages, in a way similar to tForm inheritance |
delphi asp net 20 databases |
Asp.Net 2.0 and Ado.Net 2.0 : displaying and writing InterBase and Blackfish Sql data using Dbx4, Ado.Net Db and AdoDbxClient. Handling of ListBox and GridView with DataSource components
| asp net 20 users roles profiles |
Asp.Net 2.0 Security: Users, Roles and Profiles : Asp.Net 2.0 offers a vaslty improved support for handling security: new Login Controls, and services for managing Users, grouping Users in Roles, and storing User preferences in Profiles
| bayesian spam filter |
Bayesian Spam Filter : presentation and implementation of a spam elimination tool which uses Bayesian Filtering techniques | | |
TCP/IP |
tcp ip sniffer | project to capture and display the packets travelling on the Ethernet network of your PC. |
sniffing interbase traffic |
capture and analysis of Interbase packets. Creation of a database and test table, and comparison of the BDE vs Interbase Express Delphi components |
socket programming | the simplest Client Server example of TCP / IP communication using Windows Sockets with Delphi |
delphi socket architecture |
the organization of the ScktComp unit, with UML diagrams and a simple Client Server file transfer example using tClientSocket and tServerSocket | | |
Object Oriented Programming Components |
delphi virtual constructor |
VIRTUAL CONSTRUCTORS together with CLASS references and dynamic Packages allow the separation between a main project and modules compiled and linked in later. The starting point for Application Frameworks and Plugins
| delphi generics tutorial |
Delphi Generics Tutorial : using Generics (parameterized types) in Delphi : the type parameter and the type argument, application of generics, constraints on INTERFACEs or CONSTRUCTORs | |
| UML Patterns |
the lexi editor |
delphi source code of the Gof Editor: Composite, Decorator, Iterator, Strategy, Visitor, Command, with UML diagrams |
factory and bridge patterns |
presentation and Delphi sources for the Abstract Factory and Bridge patterns, used in the Lexi Document Editor case study from the GOF book |
gof design patterns |
delphi source code of the 23 Gof (GAMMA and other) patterns: Composite, Decorator, Iterator, Strategy, Visitor, Command | | |
| Graphic |
delphi 3d designer |
build a 3d volume list, display it in perspective and move the camera, the screen or the volumes with the mouse. |
writing a flash player |
build your own ShockWave Flash movie Player, with pause, custom back and forward steps, snapshots, resizing. Designed for analyzing .SWF demos. | | |
Utilities |
the coliget search engine |
a Full Text Search unit allowing to find the files in a directory satisfying a complex string request (UML AND Delphi OR Patters) |
treeview html help viewer |
Treeview .HTML Help Viewer : the use of a Treeview along with a WebBrowser to display .HTML files alows both structuring and ordering of the help topics. This tool was used to browse the Delphi PRISM Wiki help. | |
| Delphi utilities |
delphi net bdsproj |
structure and analysis of the .BDSPROJ file with the help of a small Delphi .XML parser | dccil bat generator
| generation of the .BAT for the Delphi DCCIL command line compiler using the .BDSPROJ | dfm parser |
a Delphi Project analyzing the .DFM file and building a memory representation. This can be used for transformations of the form components |
dfm binary to text | a Delphi Project converting all .DFM file from a path from binary to ascii format |
component to code |
generate the component creation and initialization code by analyzing the .DFM. Handy to avoid installing components on the Palette when examining new libraries |
exe dll pe explorer |
presents and analyzes the content of .EXE and .DLL files. The starting point for extracting resources, spying .DLL function calls or injecting additional functionalities |
dll and process viewer |
analyze and display the list of running processes, with their associated DLLs and Memory mapped files (Process Walker) | | |
Controls |
find memo | a tMemo with "find first", "find next", "sort", "save" capabilities | | |
|
8 - The author
Felix John COLIBRI works at the Pascal Institute. Starting with Pascal in 1979, he then became involved with Object Oriented Programming, Delphi, Sql, Tcp/Ip, Html, UML. Currently, he is mainly
active in the area of custom software development (new projects, maintenance, audits, BDE migration, Delphi
Xe_n migrations, refactoring), Delphi Consulting and Delph
training. His web site features tutorials, technical papers about programming with full downloadable source code, and the description and calendar of forthcoming Delphi, FireBird, Tcp/IP, Web Services, OOP / UML, Design Patterns, Unit Testing training sessions. |