NNTP News Message Tree - Felix John COLIBRI. |
- abstract : how to build a tree of the NNTP News Messages. The downloaded messages are displayed in tListBox by message thread (topic), and for each thread the messages are presented in a tTreeView
- key words : NNTP - News Messages - newsgroups - message tree - tTreeView
- software used : Windows XP, Delphi 6
- hardware used : Pentium 2.800Mhz, 512 M memory, 140 G hard disc
- scope : Delphi 1 to 2006, Turbo Delphi for Windows, Kylix
- level : Delphi developer
- plan :
1 - Displaying News Messages In this article, we will simple show how to present the messages from a news thread in the classical hierarchical presentation.
The standard NNTP protocol allows us to retrieve the news headers before donwloading the text of the message. The answer from such a command is one line per message, with the topic of the
message (the "thread"), the user, the date etc. However this line does not directly tell the hierarchy of the messages. This paper will present a structure allowing to display the message in a hierarchical way. Obviously this
services is interesting when incorporated - in a news reader
- in an off-line news reader
2 - Organizing Messages in a Thread 2.1 - The Message Tree of a thread
News is organized by forums. For instance: borland.public.delphi.oodesign borland.public.delphi.database.sqlservers
borland.public.delphi.vcl.components.using | And within each forum, members start new threads. In OODesign, we found, for instance:
How to make a class/form testable? No virtual class methods in dotnet
App Frameworks | An then other people start answering to this thread and talk to each other about this topic. When a thread is started, many people can answer. And many
other can answer in turn to those "second level" messages. The result of this is a message tree Lets take a simple example - we start the thread aaa1
- two people answer back
101_aaa
102_aaa_bbb_2 103_aaa_bbb_3 | - one person answers to the aaa_bb1
101_aaa 102_aaa_bbb_2 104_aaa_bbb_ccc_4 103_aaa_bbb_3 | - one person answers to the original message
101_aaa 102_aaa_bbb_2 104_aaa_bbb_ccc_4 103_aaa_bbb_3 105_aaa_bbb_5
|
Asking all the message from thread aaa, the NNTP Server would send back the sequential list:
101_aaa 102_aaa_bbb_2 103_aaa_bbb_3 104_aaa_bbb_ccc_4 105_aaa_bbb_5 | and our task is to build the tree from this sequential message list.
All messages are uniquely identified by a key. In our example, we used simple names for the hierarchy (aaa for level 0, bbb for level 1, ccc for level 3) and a sequential id (101, 102, 103 etc). Actual keys are not so simple. For
instance, for the Borland forum, we have keys like <44a16bf4$1@newsgroups.borland.com>
<xn0eogtbq53mmgg000-velthuis@www.teamb.com> |
Building the tree is possible because each message header contains a list of
all the keys from the thread root to the message itself. In message 104_aaa_bbb_ccc_4, for instance, we would have
104_aaa_bbb_ccc_4 <aaa_bbb_ccc_4> <aaa> <aaa_bbb_2> | which should be interpreted as: - the message is 104_aaa_bbb_ccc_4
- the key of the message is <aaa_bbb_ccc_4>
- the branch to which the message belongs is <aaa> - <aaa_bbb_2>
So we simply have to look at the key list and rebuild the tree with the corresponding structure.
2.2 - real live example As an example, here is the "Why has OODb Failed?" thread from the OODesign newsgroup
Why has OODB failed? William Egge <begge@eggcentric.com> [26809] Michael Baytalsky <mike@contextsoft.com> [26810]
John Herbster <herb-sci1_AT_sbcglobal.net> [26824]
Michael Baytalsky <mike@contextsoft.com> [26826]
Harley Pebley <harley_pebley@idahotech.com> [26825]
Franz-Leo Chomse <franz-leo.chomse@samac.de> [26811]
Wayne Niddery [TeamB] <wniddery@chaffaci.on.ca> [26812]
Andreas Dorn <adornno1@web.de> [26814]
Wayne Niddery [TeamB] <wniddery@chaffaci.on.ca> [26816]
Bob Dawson <RBDawson@prodigy.net> [26817] |
The original message header is the following (tabs have been replaced by new lines): 26809
Why has OODB failed? "William Egge" <begge@eggcentric.com> Mon, 4 Dec 2006 08:47:08 -0500
<457426d2$1@newsgroups.borland.com> 2156 39 Xref: newsgroups.borland.com borland.public.delphi.oodesign:26809 |
The first answer had the following keys: Michael Baytalsky <mike@contextsoft.com>
<45743796@newsgroups.borland.com> <457426d2$1@newsgroups.borland.com> | and you notice that - the first key is the message's own key
- the second key is the root key
If we extract only the keys of all the messages in this thread, we obtain:
Why has OODB failed? William Egge <begge@eggcentric.com> 0= <457426d2$1@newsgroups.borland.com>
Michael Baytalsky <mike@contextsoft.com> 0= <45743796@newsgroups.borland.com>
1= <457426d2$1@newsgroups.borland.com> Franz-Leo Chomse <franz-leo.chomse@samac.de>
0= <l2i8n2t60lrkeo7ktup1usvdanlo4t3k4t@4ax.com> 1= <457426d2$1@newsgroups.borland.com>
Wayne Niddery [TeamB] <wniddery@chaffaci.on.ca> 0= <457449dc$1@newsgroups.borland.com>
1= <457426d2$1@newsgroups.borland.com> Andreas Dorn <adornno1@web.de>
0= <45749d75$1@newsgroups.borland.com> 1= <457426d2$1@newsgroups.borland.com>
2= <457449dc$1@newsgroups.borland.com> Wayne Niddery [TeamB] <wniddery@chaffaci.on.ca>
0= <4575a30c$1@newsgroups.borland.com> 1= <457426d2$1@newsgroups.borland.com>
2= <457449dc$1@newsgroups.borland.com> 3= <45749d75$1@newsgroups.borland.com>
Bob Dawson <RBDawson@prodigy.net> 0= <45762647$1@newsgroups.borland.com>
1= <457426d2$1@newsgroups.borland.com> 2= <457449dc$1@newsgroups.borland.com>
3= <45749d75$1@newsgroups.borland.com> 4= <4575a30c$1@newsgroups.borland.com>
John Herbster <herb-sci1_AT_sbcglobal.net> 0= <4579f72f@newsgroups.borland.com>
1= <457426d2$1@newsgroups.borland.com> 2= <45743796@newsgroups.borland.com>
Harley Pebley <harley_pebley@idahotech.com> 0= <457a068f$1@newsgroups.borland.com>
1= <457426d2$1@newsgroups.borland.com> 2= <45743796@newsgroups.borland.com>
Michael Baytalsky <mike@contextsoft.com> 0= <457a17ba@newsgroups.borland.com>
1= <457426d2$1@newsgroups.borland.com> 2= <45743796@newsgroups.borland.com>
3= <4579f72f@newsgroups.borland.com> |
2.3 - Parsing the Message Header
The message header is a simple string where the different parts are separated by Tabs (Chr(9)). The only surprise comes from the key list, where the different keys are - often separated by tabs
- sometimes separated by a space or two
- or not separated at all (<aaa><bbb>)
Since the "<" and ">" delimiters are present, and the next field which is a number has no "<", this is not too hard to handle.
2.4 - The Construction Algorithm
When we download a list of message headers, the messages from different threads are mixed together. So the first task is to separate the messages by thread. And within a thread, we start with the message root, and gradually add answers
to this thread using the key lists. So 1 - sort the messages by thread 2 - for each thread 22 - start with the root
22 - for each message of the thread, add the message at the bottom of its key branch Most of the time, the thread root is the message with a thread title without
"Re: " at the start. However this is not always true, and some people (or some news readers) do not systematically prepend "Re: " to an answer message. So the best test is to check the reference list: when the list contains only one key,
then this message is a thread root.
The second difficulty is that the downloading of messages is performed by requesting an ID range. In our example, we downloaded ALL available OODesign
messages, which started at ID 26145 (13 Jun 2006) and ended at ID 26827 (8 Dec 2006). The "Why did OOdb failed" were in the 26808..26826 ID range. Here is a shortened dump of some headers in the 26809..26827 range:
26809 Why has OODB failed? William Egge
26810 Re: Why has OODB failed? Michael Baytalsky
26811 Re: Why has OODB failed? Franz-Leo Chomse
26812 Re: Why has OODB failed? Wayne Niddery [TeamB]
26813 Re: SmallTalk Peter Morris
26814 Re: Why has OODB failed? Andreas Dorn
26815 Re: SmallTalk Jarle Stabell
26816 Re: Why has OODB failed? Wayne Niddery [TeamB]
26817 Re: Why has OODB failed? Bob Dawson
26818 Onion Peter Morris
26819 TTreeView.ValidateSelecti Ricardo Villela Coppola
26820 Re: SmallTalk Peter Morris
26821 Re: SmallTalk Bart
26822 Re: SmallTalk Peter Morris
26823 Re: SmallTalk Bart
26824 Re: Why has OODB failed? John Herbster
26825 Re: Why has OODB failed? Harley Pebley
26826 Re: Why has OODB failed? Michael Baytalsky
26827 Re: TTreeView.ValidateSel Bob Dawson
| If we had requested the 26812..26827 range, we would have missed the start of the "Why OOdb" thread. The best we can do is to insert dummy entries in the tree:
Why has OODB failed? --?-- [26809]
--?-- [26810] John Herbster [26824]
Michael Baytalsky [26826]
Harley Pebley [26825] --?-- [26811]
Wayne Niddery [26812] Andreas Dorn [26814]
Wayne Niddery [26816]
Bob Dawson [26817] |
2.5 - Header Surprises
Like all Internet parsing, there are many exceptions to the nicely planned parsing. There are many RFC specifications, but even more people who do not conform to those specifications. Parsing according to an RFC specifications is
not very difficult, handling all exceptions and gracefully recovering from those is another business altogether. Here are a couple of points: - In our 600 entries, we found two cases where the reference list does not
start at the root (entry 26279 and 26456, in our .ZIP sample file). The search for the relevant tree branch has therefore been adapted to locate the relevant entry point.
- in the user id and e-mail part, both elements are not always present. We
event found headers without ANY user/e-mail, which totally desynchronised the whole analysis
- in the "database_ado" newsgroup, we also found messages answers without any reference to other thread messages (entry 56642 for instance, not in the
header .ZIP)
- in addition (but we have not experienced this in our sample), if the WebMaster removes some entries (because of some rude words or for some other reason), the nice reference list could have holes, and finding a nice path
from the root to the end of a branch could become problematic.
So the original project, first tested on the OODesign newsgroup was "hardened" by analyzing ALL the Delphi newsgroups (17 Mb of .Txt, and 355.000 header
lines). Hopefully, many possible cases are now covered.
3 - The Delphi News Message Tree 3.1 - The Basic structure
To find out how to build the tree, we downloaded all the available messages from the OODesign newsgroup. This header list was saved to file, and sorted by thread title, using a tStringList container CLASS. This allowed us to
extract the key list and understand the key structure. So we grafted a second structure, with a tree organization. Here is the UML class diagram of the thread list: where
- the c_thread_list contains a tStringList of c_threads
- each c_thread contains a tStringList of c_messages
And here is the UML class diagram with the additional tree CLASS (in blue): where - each c_thread contains an m_c_root_message pointer
- each c_message_tree contains
- a reference to a c_message
- a m_c_child_message_list, wich is a tStringList of c_message_trees
So the c_message_tree is a recursive structure
3.2 - The c_news_message CLASS This is the CLASS containing each news message. Here is the definition:
c_news_message= Class(c_basic_object)
// -- m_name: the key (to find the children) // -- the number "26.827"
m_id: String; m_user_name_and_e_mail: String;
m_date: String;
// -- the NNTP key "<457a3fa6$1@newsgroups.borland.com>"
m_key: String; m_byte_count: String;
m_line_count: String; m_group_ref: String;
m_user_name, m_user_e_mail: String;
// -- 0 is m_key, 1 is the root, 2 the next, Count-1 the father
m_c_key_list: tStringList; m_is_first_in_thread: Boolean;
Constructor create_news_message(p_name: String);
function f_display_news_message: String;
function f_c_self: c_news_message;
Destructor Destroy; Override;
end; // c_news_message | and
- m_id, m_user_name_and_e_mail, m_date, m_byte_count, m_line_count, m_group_ref, m_c_key_list are directly extracted from the NNTP message header
- m_key is the first entry in the key list (the message's own key)
- m_user_name and m_user_e_mail are extracted from the m_user_name_and_e_mail header field
- m_user_name_and_e_mail is set for the first message found with this thread title (not necessarily the root, if we start downloading in the middle of a thread)
3.3 - The c_news_message_tree
The tree structure is defined by: c_news_message_tree=
Class(c_basic_object) // -- m_name: the key (to find the children)
// -- the payload m_c_news_message_ref: c_news_message;
m_c_news_message_child_list: tStringList;
Constructor create_news_message_tree(p_name: String;
p_c_news_message_ref: c_news_message);
function f_display_news_message_tree: String;
function f_c_self: c_news_message_tree;
function f_news_message_child_count: Integer;
function f_c_news_message_child(
p_news_message_index: Integer): c_news_message_tree;
function f_index_of_news_message_child(
p_news_message_name: String): Integer;
function f_c_find_by_news_message_child(
p_news_message_name: String): c_news_message_tree;
procedure add_news_message_child(
p_c_news_message_tree: c_news_message_tree);
procedure display_news_message_child_list;
Destructor Destroy; Override;
end; // c_news_message_tree | and:
- the f_xxx_count(), f_c_xxx(), f_index_of_xxx(), f_c_find_by_xxx(), add_xxx(), f_c_add_xxx() are our traditional tStringList encapsulation
methods, which should be familiar to those who already looked at many of our other projects
- here is, as an example, the recursive display method which was used above to display the tree structure of the OODb thread:
procedure c_news_message_tree.display_news_message_child_list;
procedure display_recursive(p_level: Integer;
p_c_news_message: c_news_message_tree);
var l_news_message_index: Integer; begin
with p_c_news_message do
begin
if m_c_news_message_ref= Nil
then display(f_spaces(2+ p_level* 2)+ 'bogus')
else
with m_c_news_message_ref do
display(f_spaces(2+ p_level* 2)+ m_user_name_and_e_mail
+ ' ['+ m_id+ ']');
for l_news_message_index:= 0 to f_news_message_child_count- 1 do
display_recursive(p_level+ 1,
f_c_news_message_child(l_news_message_index));
end; // with p_c_news_message
end; // display_recursive
begin // display_news_message_child_list display_recursive(0, Self);
end; // display_news_message_child_list |
3.4 - The c_news_thread_message CLASS
Here is the definition of the CLASS representing a complete thread:
c_news_thread= // one thread= serveral messages Class(c_basic_object)
// -- m_name: the thread title (the topic) // -- the linear structure
m_c_news_message_list: tStringList; // -- the message tree
m_c_root_news_message: c_news_message_tree;
// -- the root of the thread was not retrieved (too old)
m_has_root_message: Boolean;
Constructor create_news_thread(p_name: String);
function f_display_news_thread: String;
function f_c_self: c_news_thread;
// -- the linear structure
function f_news_message_count: Integer;
function f_c_news_message(p_news_message_index: Integer): c_news_message;
function f_index_of(p_news_message_name: String): Integer;
function f_c_find_by_news_message(p_news_message_name: String): c_news_message;
procedure add_news_message(p_news_message_name: String;
p_c_news_message: c_news_message);
function f_c_add_news_message(p_news_message_name: String): c_news_message;
procedure display_news_message_list; // -- add to the tree
function f_c_find_in_tree(p_key: String): c_news_message_tree;
procedure add_news_thread_message_to_tree(p_c_news_message: c_news_message);
procedure fill_treeview(p_c_treeview_ref: tTreeView);
Destructor Destroy; Override;
end; // c_news_message_list | and:
3.5 - The c_news_thread_list CLASS The main CLASS is defined by:
c_news_thread_list= Class(c_basic_object)
m_c_news_thread_list: tStringList;
Constructor create_news_thread_list(p_name: String);
function f_news_thread_count: Integer;
function f_c_news_thread(p_news_thread_index: Integer): c_news_thread;
function f_index_of(p_news_thread_name: String): Integer;
function f_c_find_by_news_thread(p_news_thread_name: String): c_news_thread;
procedure add_news_thread(p_news_thread_name: String;
p_c_news_thread: c_news_thread);
function f_c_add_news_thread(p_news_thread_name: String): c_news_thread;
procedure display_news_thread_list;
procedure display_news_thread(p_thread_title: String; p_tree_or_list: Boolean);
procedure load_from_file(p_full_file_name: String);
Destructor Destroy; Override;
end; // c_news_thread_list | and: - the main method is the load_from_file PROCEDURE:
procedure c_news_thread_list.load_from_file(p_full_file_name: String);
var l_list_index: Integer;
l_the_line: String;
l_index, l_length: Integer;
function f_extract_tab: String; // ...
procedure extract_keys(p_c_news_message: c_news_message);
// ... function f_extract: String;
// ... var l_c_news_thread: c_news_thread;
l_id, l_thread_name: String;
l_name, l_date: String;
l_c_news_message: c_news_message;
l_is_first_in_thread: Boolean; begin // load_from_file
with tStringList.Create do
begin LoadFromFile(p_full_file_name);
for l_list_index:= 0 to Count- 1 do
begin
l_the_line:= Trim(Strings[l_list_index]);
l_index:= 1;
l_length:= Length(l_the_line);
l_id:= f_string_extract_non_blank(l_the_line, l_index);
l_thread_name:= f_extract_tab;
l_is_first_in_thread:= False;
if Copy(l_thread_name, 1, 4)= 'Re: '
then begin
System.Delete(l_thread_name, 1, 4);
l_c_news_thread:= f_c_find_by_news_thread(l_thread_name);
if l_c_news_thread= Nil
then begin
l_c_news_thread:= f_c_add_news_thread(l_thread_name);
l_is_first_in_thread:= True;
end;
end
else begin
l_c_news_thread:= f_c_find_by_news_thread(l_thread_name);
// -- should NOT find a prior element
if l_c_news_thread= Nil
then begin
l_c_news_thread:= f_c_add_news_thread(l_thread_name);
l_is_first_in_thread:= True;
end;
end;
l_name:= f_extract_tab;
l_date:= f_extract_tab;
l_c_news_message:= c_news_message.create_news_message(l_id);
with l_c_news_message do
begin m_id:= l_id;
m_user_name_and_e_mail:= l_name;
m_date:= l_date;
extract_keys(f_c_self);
m_key:= m_c_key_list[0];
m_byte_count:= f_extract;
m_line_count:= f_extract;
m_group_ref:= f_extract;
l_index:= 1;
m_user_name:= Trim(f_string_extract_until(m_user_name_and_e_mail,
'<', l_index));
m_user_e_mail:= Copy(m_user_name_and_e_mail, l_index,
Length(m_user_name_and_e_mail)+ 1- l_index);
m_is_first_in_thread:= l_is_first_in_thread;
if m_is_first_in_thread
then l_c_news_thread.m_has_root_message:= (m_c_key_list.Count= 1);
end; // with l_c_news_message
l_c_news_thread.add_news_message(l_id, l_c_news_message);
l_c_news_thread.add_news_thread_message_to_tree(l_c_news_message);
end; // while l_list_index Free;
end; // with tStringList end; // load_from_file |
3.6 - The complete Delphi Project The whole u_c_news_thread_list is imported in our main project, and clicking on the .TXT file containing the header list form any NNTP download creates either
the linear list, or the tree. This tree can then be displayed in a tTreeView: Here is a snapshot of the project, when we create the linear structure, and ask to display the messages of the OODb thread:
If we build the trees, the application displays this:
and if we look at the "treeview" tab: I did not have the time to read the messages, but now that the tree sytem
works, I am eager to look at the "Onion" thread. Just curious...
4 - Improvements to the Message Tree The first thing will be to write the Destroy methods. Being busy to find the adequate structure, this was left behind.
We started with the linear list, and after the analysis of the key lists, we created the tree structure. We then scrapped the linear structure, only to find out that it was convenient for displaying unexpected key lists (for instance
when the list does not start at the root). In addition, when we dowload the daily news, it is quite often to break into a thread started days, or even weeks before our download. In this case, building the tree does not make much
sense. So we included the linear list back into the structure. Reading and organizing the NNTP news headers is a fine exercise, but not very usefull. The whole UNIT obviously has to be grafted upon a on-line or
off-line news reader. Well, the Threaded Indy News Reader presents such an on-line newsreader based on the Indy NNTP component.
5 - Download the Delphi Sources Here are the source code files: - build_news_message_tree.zip: the project building the tree and
displaying it in a tTreeView, with all the used UNITs (28 K)
- borland_public_delphi_oodesign.zip: the .TXT file of the message
headers downloaded from the OODesign Delphi newsgroup. This is just an example, but the project can work on any similar file (31 K)
The .ZIP file(s) contain:
- the main program (.DPR, .DOF, .RES), the main form (.PAS, .DFM), and any other auxiliary form
- any .TXT for parameters, samples, test data
- all units (.PAS) for units
Those .ZIP
- are self-contained: you will not need any other product (unless expressly mentioned).
- for Delphi 6 projects, can be used from any folder (the pathes are RELATIVE)
- will not modify your PC in any way beyond the path where you placed the .ZIP (no registry changes, no path creation etc).
To use the .ZIP: - create or select any folder of your choice
- unzip the downloaded file
- using Delphi, compile and execute
To remove the .ZIP simply delete the folder. The Pascal code uses the Alsacian notation, which prefixes identifier by
program area: K_onstant, T_ype, G_lobal, L_ocal, P_arametre, F_unction, C_lasse etc. This notation is presented in the Alsacian Notation paper.
As usual:
- please tell us at fcolibri@felix-colibri.com if you found some errors, mistakes, bugs, broken links or had some problem downloading the file. Resulting corrections will
be helpful for other readers
- we welcome any comment, criticism, enhancement, other sources or reference suggestion. Just send an e-mail to fcolibri@felix-colibri.com.
- or more simply, enter your (anonymous or with your e-mail if you want an answer) comments below and clic the "send" button
- and if you liked this article, talk about this site to your fellow developpers, add a link to your links page ou mention our articles in your blog or newsgroup posts when relevant. That's the way we operate:
the more traffic and Google references we get, the more articles we will write.
6 - The author Felix John COLIBRI works at the Pascal
Institute. Starting with Pascal in 1979, he then became involved with Object Oriented Programming, Delphi, Sql, Tcp/Ip, Html, UML. Currently, he is mainly
active in the area of custom software development (new projects, maintenance, audits, BDE migration, Delphi
Xe_n migrations, refactoring), Delphi Consulting and Delph
training. His web site features tutorials, technical papers about programming with full downloadable source code, and the description and calendar of forthcoming Delphi, FireBird, Tcp/IP, Web Services, OOP / UML, Design Patterns, Unit Testing training sessions. |