Tk Html Widget Revitalization Project Requirements
DRAFT

1. Overview

This document contains requirements for the revitalization of the Tk html software. The goal of this project is to produce a second generation of the Tk html widget that provides a superior alternative to existing html rendering frameworks such as Gecko [1] or KHTML [2]. The widget should be smaller, faster and more easily extensible than other frameworks.

The deliverables for the project are:

It is envisaged that the revitalized html widget will eventually be integrated into the Tcl core.

REQ00010

All code, user documentation and test artifacts produced by the project shall be governed by a license compatible with that of the Tcl core.

REQ00020

Signed copyright releases from all contributors shall be held by Hwaci.

2. Code Compliance

REQ01010

All C code in the package shall compile using gcc version 3.3 for Linux-ELF, MinGW and OS-X targets. The package shall run on all Linux, Windows and OS-X runtime platforms supported by Tk.

It should also run equally well on any Tk platform with any compiler capable of building Tcl and Tk, but only the targets above are required.

REQ01020

The package shall be compatible with Tcl/Tk 8.4 and later.

REQ01030

The source tree organization and build system used by the package shall be TEA (Tcl Extension Architecture) compliant.

REQ01040

The Tcl and C code in the package shall comply with the specifications in the Tcl Style Guide [3] and Tcl/Tk Engineering Manual [4], respectively.

REQ01050

The package shall use modern Tcl interfaces, such as Tcl_Obj.

3. Functionality

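This section defines the functionality required from the html widget. For the purposes of formulating requirements, the functionality of the widget is divided into three categories: document processing (section 3.1), document rendering (section 3.2) and other widget functionality (section 3.3).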

3.1. Document Processing

3.1.1. Parsing

REQ02010

The html widget shall parse web documents. The widget shall support documents conforming to the following standards:

  • HTML 4.01,
  • XHTML 1.1,
  • CSS 2.1.

REQ02020

The widget shall be tolerant of errors in HTML documents in a similar way to existing html engines.

In this context, "existing html rendering engines" refers to Gecko [1] and Internet Explorer [5]. Where Gecko and Internet Explorer produce substantially different results, the html widget will emulate the more elegant of the two.

REQ02030

Incremental parsing of documents shall be supported. It shall be possible to render and query partially parsed documents.

This means a document can be passed to the widget in parts, perhaps while waiting for the remainder of it to be retrieved over a network.
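As an illustration, a tokenizer can accept a document in arbitrary chunks by consuming every complete tag and holding back an unterminated tag until more data arrives. The following is a minimal C sketch under that assumption; the Feeder type and feed_chunk() function are hypothetical, and real HTML tokenizing (comments, ">" inside quoted attribute values, emitting text nodes) is deliberately left out.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical incremental tokenizer state. */
typedef struct {
    char *pending;    /* unconsumed tail carried over from earlier chunks */
    size_t npending;  /* number of bytes in pending */
    int ntags;        /* complete "<...>" tags seen so far */
} Feeder;

/* Append one chunk of document text and consume every byte except an
 * unterminated tag at the end, which is kept for the next call.
 * Returns 0 on success, -1 if memory could not be allocated. */
int feed_chunk(Feeder *f, const char *data, size_t len)
{
    if (len == 0)
        return 0;
    size_t n = f->npending + len;
    char *buf = realloc(f->pending, n);
    if (buf == NULL)
        return -1;                     /* f->pending is still valid */
    memcpy(buf + f->npending, data, len);

    size_t pos = 0;                    /* first byte not yet consumed */
    for (size_t i = 0; i < n; i++) {
        if (buf[i] == '<') {
            char *close = memchr(buf + i, '>', n - i);
            if (close == NULL)
                break;                 /* tag continues in a later chunk */
            f->ntags++;                /* a real parser would emit the tag here */
            i = (size_t)(close - buf);
        }
        /* text bytes are simply skipped; a real parser would build text nodes */
        pos = i + 1;
    }

    /* Keep only the unconsumed tail for the next call. */
    memmove(buf, buf + pos, n - pos);
    f->pending = buf;
    f->npending = n - pos;
    return 0;
}
```

For example, feeding `<p class=` and then `"x">Hi</p>` yields the same two tags as feeding the whole string at once; between the two calls the partial tag is simply retained.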

3.1.2. Document API

This section contains requirements for an API to query and manipulate a parsed document structure. Although there are likely to be other users, it is anticipated that this API will be used primarily to implement DOM-compatible interfaces (e.g. for JavaScript).

REQ02040

The html widget shall expose an API for querying and modifying the parsed document as a tree structure. Each node of the tree structure shall be either an XML tag or a string. The attributes of each tag shall be available as part of the tag node.

For example, the HTML fragment:
    <p class="normal">The quick <b>brown</b> 
    fox <i>jumped <b>over</b> the</i> ...</p>
  
will be exposed as the tree structure:

[Figure: tree structure of the HTML fragment above]
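The tree for this fragment can be modelled with a node type that is either an element carrying attributes and children, or a text leaf. The C sketch below is an assumption for illustration only: the Node type and helper functions are hypothetical, and fixed-size arrays are used for brevity. It builds the tree for the fragment above, with whitespace runs collapsed, and keeps the parent and child links required by REQ02070.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_ATTRS 4
#define MAX_CHILDREN 8

/* Hypothetical node: an element (tag != NULL) with attributes and
 * children, or a text leaf (text != NULL). */
typedef struct Node Node;
struct Node {
    const char *tag;
    const char *text;
    const char *attr_name[MAX_ATTRS];   /* unused slots are NULL */
    const char *attr_value[MAX_ATTRS];
    Node *parent;
    Node *children[MAX_CHILDREN];
    int nchildren;
};

/* Allocation failures are not handled in this sketch. */
Node *new_tag(const char *tag)   { Node *n = calloc(1, sizeof *n); n->tag = tag; return n; }
Node *new_text(const char *text) { Node *n = calloc(1, sizeof *n); n->text = text; return n; }

/* Append child to parent and return child (NULL if parent is full). */
Node *append(Node *parent, Node *child)
{
    if (parent->nchildren == MAX_CHILDREN)
        return NULL;
    child->parent = parent;
    parent->children[parent->nchildren++] = child;
    return child;
}

/* Value of the named attribute, or NULL if it is not defined. */
const char *get_attr(const Node *n, const char *name)
{
    for (int k = 0; k < MAX_ATTRS && n->attr_name[k] != NULL; k++)
        if (strcmp(n->attr_name[k], name) == 0)
            return n->attr_value[k];
    return NULL;
}

/* <p class="normal">The quick <b>brown</b> fox <i>jumped <b>over</b> the</i> ...</p> */
Node *build_example(void)
{
    Node *p = new_tag("p");
    p->attr_name[0] = "class";
    p->attr_value[0] = "normal";
    append(p, new_text("The quick "));
    Node *b = append(p, new_tag("b"));
    append(b, new_text("brown"));
    append(p, new_text(" fox "));
    Node *ital = append(p, new_tag("i"));
    append(ital, new_text("jumped "));
    Node *b2 = append(ital, new_tag("b"));
    append(b2, new_text("over"));
    append(ital, new_text(" the"));
    append(p, new_text(" ..."));
    return p;
}

/* Free a node and its whole subtree. */
void free_tree(Node *n)
{
    for (int k = 0; k < n->nchildren; k++)
        free_tree(n->children[k]);
    free(n);
}
```

In this model the root <p> has five children (three text leaves and the <b> and <i> elements), and the <i> element has three.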

REQ02050

It shall be possible to obtain a reference to the root node of a document tree.

REQ02060

The html widget shall support querying for a list of document nodes by any combination of the following criteria:

  • Tag type,
  • Whether or not a certain attribute is defined,
  • Whether or not a certain attribute is defined and set to a certain value.
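A query combining these criteria might be evaluated by a depth-first walk that tests each element in document order. In this hypothetical C sketch a NULL field in Query means "any", so {NULL, "class", NULL} selects every element with a class attribute, while {NULL, "class", "normal"} selects only those where it equals "normal". The Node and Query types are assumptions for illustration.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ATTRS 4
#define MAX_CHILDREN 8

typedef struct Node {
    const char *tag;                      /* NULL for text nodes */
    const char *attr_name[MAX_ATTRS];     /* unused slots are NULL */
    const char *attr_value[MAX_ATTRS];
    struct Node *children[MAX_CHILDREN];
    int nchildren;
} Node;

/* Hypothetical query criteria; a NULL field means "any". */
typedef struct {
    const char *tag;    /* required tag type */
    const char *attr;   /* attribute that must be defined */
    const char *value;  /* if set, attr must also have this value */
} Query;

static const char *lookup_attr(const Node *n, const char *name)
{
    for (int k = 0; k < MAX_ATTRS && n->attr_name[k] != NULL; k++)
        if (strcmp(n->attr_name[k], name) == 0)
            return n->attr_value[k];
    return NULL;
}

int node_matches(const Node *n, const Query *q)
{
    if (n->tag == NULL)
        return 0;                                   /* text nodes never match */
    if (q->tag != NULL && strcmp(q->tag, n->tag) != 0)
        return 0;
    if (q->attr != NULL) {
        const char *v = lookup_attr(n, q->attr);
        if (v == NULL)
            return 0;                               /* attribute not defined */
        if (q->value != NULL && strcmp(q->value, v) != 0)
            return 0;                               /* defined, wrong value */
    }
    return 1;
}

static void collect(Node *n, const Query *q, Node **out, int max, int *count)
{
    if (node_matches(n, q)) {
        if (*count < max)
            out[*count] = n;
        (*count)++;
    }
    for (int k = 0; k < n->nchildren; k++)
        collect(n->children[k], q, out, max, count);
}

/* Store up to max matches in document order; return the total number found. */
int query_nodes(Node *root, const Query *q, Node **out, int max)
{
    int count = 0;
    collect(root, q, out, max, &count);
    return count;
}
```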

REQ02070

Given a reference to a document tree node, it shall be possible to obtain a reference to the parent node or to any child nodes (if present).

REQ02080

Given a node reference, it shall be possible to query for the node tag and attributes (if the node is an XML tag), or text (if the node is a string).

REQ02090

It shall be possible to create and insert new nodes into any point in the document.

REQ02100

The html widget shall support modification of the tag, attributes or string value of existing nodes.

REQ02110

It shall be possible to delete nodes from the document.
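Insertion and deletion reduce to editing a node's ordered list of children. A minimal C sketch, assuming a hypothetical Node type that stores its children in a fixed-size array: node_insert() places a node at a given position, and node_delete() detaches a node from its parent and frees its subtree.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_CHILDREN 16

typedef struct Node {
    const char *tag;
    struct Node *parent;
    struct Node *children[MAX_CHILDREN];
    int nchildren;
} Node;

Node *node_new(const char *tag)
{
    Node *n = calloc(1, sizeof *n);
    if (n != NULL)
        n->tag = tag;
    return n;
}

/* Insert child into parent at position pos (0 = first child).
 * Returns 0 on success, -1 if pos is invalid or parent is full. */
int node_insert(Node *parent, int pos, Node *child)
{
    if (pos < 0 || pos > parent->nchildren || parent->nchildren == MAX_CHILDREN)
        return -1;
    memmove(&parent->children[pos + 1], &parent->children[pos],
            (size_t)(parent->nchildren - pos) * sizeof(Node *));
    parent->children[pos] = child;
    parent->nchildren++;
    child->parent = parent;
    return 0;
}

/* Detach n from its parent (if any), then free n and all its descendants. */
void node_delete(Node *n)
{
    Node *p = n->parent;
    if (p != NULL) {
        int k = 0;
        while (k < p->nchildren && p->children[k] != n)
            k++;
        if (k < p->nchildren) {
            memmove(&p->children[k], &p->children[k + 1],
                    (size_t)(p->nchildren - k - 1) * sizeof(Node *));
            p->nchildren--;
        }
    }
    for (int k = 0; k < n->nchildren; k++) {
        n->children[k]->parent = NULL;   /* don't detach from a node being freed */
        node_delete(n->children[k]);
    }
    free(n);
}
```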

REQ02120

If a document currently rendered to a window is modified, then the display shall automatically update the next time the process enters the Tk event loop.
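Tk widgets conventionally achieve this by setting a "redraw pending" flag and scheduling a single idle callback with Tcl_DoWhenIdle(), so that many modifications made by one script produce only one redraw. The sketch below shows that pattern in plain C; the small idle queue is a stand-in for Tcl's, and all names are hypothetical.

```c
#include <stddef.h>

#define MAX_IDLE 8

/* Minimal stand-in for Tcl's idle queue; a real Tk widget would call
 * Tcl_DoWhenIdle() instead of do_when_idle(). */
typedef void (IdleProc)(void *clientData);
static struct { IdleProc *proc; void *clientData; } idle_queue[MAX_IDLE];
static int idle_count;

static int do_when_idle(IdleProc *proc, void *clientData)
{
    if (idle_count == MAX_IDLE)
        return -1;
    idle_queue[idle_count].proc = proc;
    idle_queue[idle_count].clientData = clientData;
    idle_count++;
    return 0;
}

/* What the event loop does once it has no other events to process. */
void run_idle_callbacks(void)
{
    int n = idle_count;
    IdleProc *procs[MAX_IDLE];
    void *data[MAX_IDLE];
    for (int k = 0; k < n; k++) {
        procs[k] = idle_queue[k].proc;
        data[k] = idle_queue[k].clientData;
    }
    idle_count = 0;               /* callbacks may queue new work */
    for (int k = 0; k < n; k++)
        procs[k](data[k]);
}

typedef struct {
    int redraw_pending;   /* non-zero while a redraw is queued */
    int redraw_count;     /* redraws actually performed */
} Widget;

static void display_proc(void *clientData)
{
    Widget *w = clientData;
    w->redraw_pending = 0;
    w->redraw_count++;    /* layout and repaint would happen here */
}

/* Call after every document modification. */
void schedule_redraw(Widget *w)
{
    if (!w->redraw_pending && do_when_idle(display_proc, w) == 0)
        w->redraw_pending = 1;
}
```

Three modifications in a row therefore queue one callback, and the display is redrawn once when the event loop next becomes idle.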

REQ02130

During parsing, the widget shall support invoking user-supplied scripts when nodes matching specified criteria are added to the document. The criteria are as defined for requirement REQ02060. Each script shall be passed a reference that may be used to query, modify or delete the new node.
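One way to meet this requirement is for the parser to keep a list of (criteria, callback) pairs and to check each new element against them as it is added. The C sketch below is hypothetical: it models only the tag criterion (attribute criteria would be checked the same way), and the C callback stands in for the user-supplied Tcl script.

```c
#include <stddef.h>
#include <string.h>

#define MAX_HOOKS 8

typedef struct { const char *tag; } Node;   /* minimal node for the sketch */
typedef void (NodeHandler)(Node *node, void *clientData);

/* One registered "node added" hook. */
typedef struct {
    const char *tag;          /* NULL matches every element */
    NodeHandler *handler;
    void *clientData;
} Hook;

typedef struct {
    Hook hooks[MAX_HOOKS];
    int nhooks;
} Parser;

/* Register a handler; returns 0 on success, -1 if the table is full. */
int add_hook(Parser *p, const char *tag, NodeHandler *handler, void *clientData)
{
    if (p->nhooks == MAX_HOOKS)
        return -1;
    p->hooks[p->nhooks].tag = tag;
    p->hooks[p->nhooks].handler = handler;
    p->hooks[p->nhooks].clientData = clientData;
    p->nhooks++;
    return 0;
}

/* Called by the parser each time it adds an element to the document. */
void node_added(Parser *p, Node *node)
{
    for (int k = 0; k < p->nhooks; k++) {
        const Hook *h = &p->hooks[k];
        if (h->tag == NULL || strcmp(h->tag, node->tag) == 0)
            h->handler(node, h->clientData);
    }
}

/* Example handler: counts the nodes it is invoked for. */
void count_node(Node *node, void *clientData)
{
    (void)node;
    (*(int *)clientData)++;
}
```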

3.1.3. Stylesheets

This section contains requirements for an API to query and manipulate the various stylesheets contained within a document. It also contains requirements describing the way the widget may co-operate with the application to support linked stylesheets.

REQ02140

The html widget shall support CSS and CSS2 stylesheets only.

If support for a future stylesheet format is required, this may be implemented by transforming the new stylesheet format to one of the supported types.

REQ02150

It shall be possible to supply the html widget with a script that is invoked whenever an unrecognized stylesheet format is encountered. The script shall have the power to modify the content and content type of the stylesheet.

For example, to support a future CSS3 standard, the application may supply a script to transform CSS3 specifications into CSS2 format.

REQ02160

It shall be possible to retrieve an ordered list of stylesheets specified in the <head> section of a document.

REQ02170

The html widget shall supply an interface to add and remove stylesheets to and from a document.

This is the same as adding and removing nodes from a document <head> section.

REQ02180

It shall be possible to supply the html widget with a script to be invoked whenever a linked stylesheet is required. The script shall have the option of returning the required stylesheet synchronously, asynchronously or not at all.

In this context, "linked stylesheets" refers to an external stylesheet referred to by a <link> tag or by an @import directive within another stylesheet.

3.1.4. Printing

REQ02190

The html widget shall support rendering to PostScript.

3.2. Document Rendering

REQ03010

The html widget shall render parsed web documents in a Tk window, producing results consistent with existing html rendering engines. The widget shall support documents conforming to:

  • HTML 4.01,
  • XHTML 1.1,
  • CSS 2.1.

In this context, "existing html rendering engines" refers to Gecko [1] and Internet Explorer [5]. Where Gecko and Internet Explorer produce substantially different results, the html widget will emulate the more elegant of the two.

REQ03020

The html widget shall provide an interface to re-render the current document. When a document is re-rendered, existing images, fonts, applets and form elements shall be destroyed and re-requested from the various callbacks.

3.2.1. Hyperlinks

REQ03030

An application shall be able to register a callback script that is invoked when a user clicks on a hyperlink in an html document. All attributes of the hyperlink markup shall be available to the callback script.

The 'href' field value available to the callback script may be an internal, relative or absolute URI. Dealing with this is in the application domain.

REQ03040

It shall be possible to configure the colors in which visited and unvisited hyperlinks are rendered.

REQ03050

It shall be possible to configure whether or not hyperlinks are rendered underlined.

REQ03060

An application shall have the option of supplying a script to be executed by the html widget to determine if a hyperlink should be rendered in the visited or unvisited color. The script shall have access to the href field of the hyperlink.

Presumably, an implementation would query a database of previously visited URLs to determine if the link should be colored as visited or unvisited.
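The hook reduces to a single predicate consulted whenever a link is styled. In this C sketch the VisitedProc callback type and LinkConfig structure are hypothetical, and in_history() is an example callback that searches a list of previously visited URLs, standing in for the database lookup suggested above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical application callback: non-zero if href has been visited. */
typedef int (VisitedProc)(const char *href, void *clientData);

typedef struct {
    const char *unvisited;      /* e.g. "blue" */
    const char *visited;        /* e.g. "purple" */
    VisitedProc *is_visited;    /* NULL: treat every link as unvisited */
    void *clientData;
} LinkConfig;

/* Color to use for a hyperlink with the given href. */
const char *link_color(const LinkConfig *cfg, const char *href)
{
    if (cfg->is_visited != NULL && cfg->is_visited(href, cfg->clientData))
        return cfg->visited;
    return cfg->unvisited;
}

/* Example callback: clientData is a NULL-terminated array of visited URLs. */
int in_history(const char *href, void *clientData)
{
    const char **history = clientData;
    for (; *history != NULL; history++)
        if (strcmp(*history, href) == 0)
            return 1;
    return 0;
}
```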

3.2.2. Tables and Lines

REQ03070

It shall be possible to configure the html widget to render horizontal lines and table borders with solid lines or 3-D grooves or ridges.

3.2.3. Fonts

REQ03080

It shall be possible to provide a callback script to the html widget to resolve fonts. If such a script is provided, it shall be invoked whenever a new font is required, specifying the font-family and size as an integer between 1 and 7. The script shall translate this into a Tk font name. If no such script is provided, a default implementation shall be used.
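A default implementation might simply map each of the seven sizes to a point size and produce a Tk font description of the form "{family} size". The C sketch below assumes a size table (8, 10, 12, 14, 18, 24 and 36 points) that this document does not specify, and clamps out-of-range sizes rather than rejecting them; default_font() is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical default font resolver. Writes a Tk font description such as
 * "{Helvetica} 12" into buf. Returns 0 on success, -1 if buf is too small.
 * The size table is an assumption, not part of the requirements. */
int default_font(const char *family, int size, char *buf, size_t buflen)
{
    static const int points[7] = { 8, 10, 12, 14, 18, 24, 36 };

    /* Clamp rather than reject out-of-range sizes. */
    if (size < 1)
        size = 1;
    if (size > 7)
        size = 7;

    /* Braces keep a multi-word family name a single list element. */
    int n = snprintf(buf, buflen, "{%s} %d", family, points[size - 1]);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

Bracing the family name means a description such as "{Times New Roman} 36" remains a valid two-element Tcl list.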

3.2.4. Images, Forms and Applets

This section contains requirements describing the way the html widget will divide the job of handling some complex html tags between itself and the application. Specifically, requirements for the "img", "applet", and "form" tags are specified here.

In general, the html widget itself is responsible only for rendering text and laying out document elements. To render more complex document tags, such as images or form controls, the widget invokes callback scripts supplied by the application. The callback scripts create Tk primitives (e.g. a window or image) based on the attributes of the parsed tags. These primitives are passed back to the html widget to be included in the document layout.

This approach allows the application to select the various toolkits used to implement complex functionality such as images or form controls.

REQ03090

It shall be possible to handle the <img> tag by supplying the html widget with a handler function that interprets an img tag's attributes and creates a Tk primitive to display. The html widget shall be responsible only for mapping the primitive into the rendered document.

In the above requirement, "function" can be taken to mean "Tcl script". The purpose of this is to allow the application to select the image toolkit to use.

REQ03100

If the <img> tag handler fails to create a Tk primitive to display, the html widget shall display the "alt" text, if any, from the img tag.

REQ03110

A simple default handler for the <img> tag shall be used if the application does not provide one.
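Taken together, REQ03090 to REQ03110 define a simple decision: use the application's handler if one is supplied, otherwise the default handler, and fall back to the alt text if no image is produced. A hypothetical C sketch of that decision, in which a handler returns the name of a Tk image or NULL on failure (the handler types and example handlers are assumptions for illustration):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical img handler: returns a Tk image name, or NULL on failure. */
typedef const char *(ImgHandler)(const char *src, void *clientData);

typedef enum { SHOW_IMAGE, SHOW_ALT_TEXT, SHOW_NOTHING } ImgAction;

typedef struct {
    ImgAction action;
    const char *value;   /* image name or alt text; NULL for SHOW_NOTHING */
} ImgResult;

ImgResult resolve_img(ImgHandler *app_handler, ImgHandler *default_handler,
                      void *clientData, const char *src, const char *alt)
{
    /* REQ03110: use the default handler only if the application supplied none. */
    ImgHandler *h = app_handler != NULL ? app_handler : default_handler;
    const char *image = h != NULL ? h(src, clientData) : NULL;
    ImgResult r;

    if (image != NULL) {
        r.action = SHOW_IMAGE;
        r.value = image;
    } else if (alt != NULL) {        /* REQ03100: fall back to the alt text */
        r.action = SHOW_ALT_TEXT;
        r.value = alt;
    } else {
        r.action = SHOW_NOTHING;
        r.value = NULL;
    }
    return r;
}

/* Example handlers for illustration. */
const char *always_fail(const char *src, void *clientData)
{
    (void)src;
    (void)clientData;
    return NULL;
}

const char *named_image(const char *src, void *clientData)
{
    (void)src;
    return clientData;   /* pretend the image name was passed in */
}
```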

REQ03120

It shall be possible to handle the <applet> tag by supplying the html widget with a script that interprets an applet tag's attributes and creates a Tk window containing the applet. The html widget shall be responsible only for mapping the applet window into the rendered document.

REQ03130

It shall be possible to handle the <form> tag by supplying the html widget with a script that interprets the attributes of form, control and label elements of a form definition and creates a Tk primitive for each. The html widget shall be responsible only for mapping the Tk primitives into the rendered document.

All logic to submit forms is implemented by the application.

3.2.5. Frames

REQ03140

It shall be possible to handle framed documents by supplying the html widget with a handler script that interprets the attributes of a frameset structure. The html widget shall provide no support for framed documents other than parsing the document and invoking the handler.

The anticipated approach is to have the handler script render each frame of the document in a separate instance of the html widget.

REQ03150

If a framed document is parsed and no frame handler function has been supplied, the html widget shall render the document contained in the <noframes> element (if any) of the document.

3.3. Other Widget Functionality

REQ04010

The html widget shall respect Tk widget conventions, including:

  • The following standard Tk options: -background, -cursor, -exportselection, -foreground, -height, -highlightbackground, -highlightcolor, -highlightthickness, -takefocus, -width, -xscrollcommand, -yscrollcommand.
  • The configure/cget subcommands for setting and retrieving option values.
  • The options database used to configure widget options.

3.3.1. Widget Panning

REQ04020

It shall be possible to pan the widget (change the portion of the document visible). It shall be possible to specify the new region to display by:

  • Specifying a reference to a node within the rendered document identifying the top of the required region, or
  • Specifying a number of pages or units to scroll up, down or across from the current position, or
  • Specifying a fraction of the document to be left off-screen (either on the left or above the displayed region).

If the rendered html document is too large to fit in the widget window, only part of it will be displayed. By default, the widget will provide no bindings that can be used to scroll to a different part of the document. Such bindings should be implemented by the application.
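The three ways of specifying the new region correspond closely to the "yview moveto" and "yview scroll" forms used by Tk's scrollable widgets, plus scrolling to a node. A minimal C sketch for the vertical axis, assuming pixel units, a hypothetical View structure, and a page size of one window height less one unit of overlap:

```c
/* Hypothetical vertical view state, in pixels. */
typedef struct {
    long doc_height;   /* height of the laid-out document */
    long view_height;  /* height of the visible window */
    long top;          /* document y coordinate shown at the top of the window */
} View;

/* Keep the window within the document. */
static long clamp_top(const View *v, long top)
{
    long max = v->doc_height - v->view_height;
    if (max < 0)
        max = 0;
    if (top > max)
        top = max;
    if (top < 0)
        top = 0;
    return top;
}

/* Like "yview moveto fraction": fraction of the document left above the window. */
void view_moveto(View *v, double fraction)
{
    v->top = clamp_top(v, (long)(fraction * (double)v->doc_height));
}

/* Like "yview scroll n units|pages"; unit_px is an assumed line height. */
void view_scroll(View *v, long n, int pages, long unit_px)
{
    long step = pages ? v->view_height - unit_px : unit_px;
    if (step < 1)
        step = 1;
    v->top = clamp_top(v, v->top + n * step);
}

/* Show the node whose layout begins at node_y at the top of the window. */
void view_to_node(View *v, long node_y)
{
    v->top = clamp_top(v, node_y);
}
```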

REQ04030

The html widget shall support smooth-scrolling.

REQ04040

It shall be possible to receive notification that the visible region of an html document has changed.

This is required so that applications may keep any associated scroll-bar widgets up to date when an html widget is panned or resized. This might use the standard Tk -xscrollcommand, -yscrollcommand interfaces.
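Under the standard Tk protocol, the widget appends two fractions to the -yscrollcommand prefix: the fraction of the document above the visible region and the fraction at its bottom edge (for a scrollbar this becomes, for example, ".sb set 0.5 0.7"). A C sketch of computing and formatting those values, with hypothetical function names:

```c
#include <stdio.h>
#include <string.h>

/* Fractions of the document above the top and bottom edges of the window. */
void scroll_fractions(long doc_height, long view_height, long top,
                      double *first, double *last)
{
    if (doc_height <= 0) {            /* empty document: everything visible */
        *first = 0.0;
        *last = 1.0;
        return;
    }
    *first = (double)top / (double)doc_height;
    *last = (double)(top + view_height) / (double)doc_height;
    if (*first < 0.0)
        *first = 0.0;
    if (*last > 1.0)
        *last = 1.0;                  /* document shorter than the window */
}

/* Append the fractions to the -yscrollcommand prefix, e.g. ".sb set". */
int format_scroll_command(const char *prefix, double first, double last,
                          char *buf, size_t buflen)
{
    int n = snprintf(buf, buflen, "%s %g %g", prefix, first, last);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```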

3.3.2. The Selection

REQ04050

The html widget shall support querying for the identifier of the leaf node that is rendered at nominated screen coordinates. If the leaf node is a text string, then the index of the character in the string shall also be available.
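Assuming layout produces a rectangle for each rendered leaf node, this query is a point-in-rectangle search followed, for text, by a conversion from x offset to character index. The C sketch below is hypothetical and simplifies by assuming single-line boxes and a fixed character width; a real implementation would measure characters with the actual font.

```c
#include <stddef.h>

/* Hypothetical layout record for one rendered leaf node. */
typedef struct {
    int node_id;
    int x, y, width, height;   /* bounding box in window coordinates */
    int is_text;               /* non-zero for text leaves */
    int char_width;            /* assumed fixed width of each character */
} LeafBox;

/* Return the id of the leaf at (px, py), or -1 if there is none.
 * For text leaves *char_index receives the character under the point;
 * for other leaves it is set to -1. */
int hit_test(const LeafBox *boxes, size_t nboxes, int px, int py, int *char_index)
{
    for (size_t k = 0; k < nboxes; k++) {
        const LeafBox *b = &boxes[k];
        if (px >= b->x && px < b->x + b->width &&
            py >= b->y && py < b->y + b->height) {
            *char_index = (b->is_text && b->char_width > 0)
                              ? (px - b->x) / b->char_width
                              : -1;
            return b->node_id;
        }
    }
    return -1;
}
```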

REQ04060

The html widget shall provide an interface to set the current selection.

The widget is not required to provide bindings to allow a region to be selected with the pointer. Such bindings may be created by an application.

REQ04070

The html widget shall provide an interface to retrieve the current selection, if any.

REQ04080

The background color used when rendering the selection shall be configurable.

3.3.3. Events

This section describes requirements for binding scripts to Tk events received by the html widget.

REQ04090

It shall be possible to bind Tcl scripts to Tk events received by the html widget.

The html widget shall be no different from built-in widgets in this respect.

REQ04100

It shall be possible to bind Tcl scripts to Tk events received when the pointer is over the rendering of a specified document node.

REQ04110

It shall be possible to bind a Tcl script to a Tk event received when the pointer is over a document node with a specified attribute defined.

For example, a script may be bound to all nodes with attribute "onmouseover" defined.
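One plausible rule for REQ04110 is that the binding fires for the innermost node that defines the attribute, found by starting at the node under the pointer and walking up through its ancestors. A hypothetical C sketch of that lookup (the Node type and binding_target() are assumptions for illustration):

```c
#include <stddef.h>
#include <string.h>

#define MAX_ATTRS 4

typedef struct Node {
    const char *tag;
    const char *attr_names[MAX_ATTRS];   /* unused slots are NULL */
    struct Node *parent;
} Node;

static int has_attr(const Node *n, const char *name)
{
    for (int k = 0; k < MAX_ATTRS && n->attr_names[k] != NULL; k++)
        if (strcmp(n->attr_names[k], name) == 0)
            return 1;
    return 0;
}

/* Innermost node, from the one under the pointer upwards, that defines
 * attr; NULL if no such node exists. */
Node *binding_target(Node *under_pointer, const char *attr)
{
    for (Node *n = under_pointer; n != NULL; n = n->parent)
        if (has_attr(n, attr))
            return n;
    return NULL;
}
```

With this rule, moving the pointer over bold text inside a paragraph inside a <div onmouseover="..."> element triggers the script bound for the <div>.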

4. Performance

REQ05010

The html widget shall parse and render html code as fast as or faster than existing html rendering engines.

In this context, "existing html rendering engines" refers to Gecko [1] and KHTML [2].

REQ05020

The html widget shall initialize as fast as or faster than existing html rendering engines.

In this context, "existing html rendering engines" refers to Gecko [1] and KHTML [2].

5. Testing

This section defines requirements for testing of the html widget. These requirements will eventually be refined into a test plan.

REQ06010

A demo Tcl web browser application that uses the html widget shall be packaged along with the widget. The application shall support images, but not applets or forms.

This application will be advanced enough to use for informal testing by viewing web pages.

REQ06020

The demo application shall include bindings to manipulate the selection with the pointer.

This can be used to informally verify both the selection interface and the capability to translate between screen coordinates and node identifiers.

REQ06030

The demo application shall include scrollbars and key bindings to informally test the panning interface.

REQ06040

Automated tests shall be developed to verify the handling of frameset documents.

REQ06050

Automated tests shall be developed to test the document query and edit interfaces, with the exception of those interfaces that use screen coordinates.

This requirement applies to functions required by requirements in section 3.1.2.

REQ06060

The demo web browser application shall contain features that may be used to informally test the document query interfaces that use screen coordinates.

One of these features will be the selection bindings (REQ06020).

REQ06070

An automated test shall be developed to compare the initialization and rendering speed of the html widget against the required html rendering engines.

In this context, "required html rendering engines" refers to Gecko [1] and KHTML [2].

6. Documentation

REQ07010

All options and commands that make up the Tcl interface to the html widget shall be documented. The build system will generate the same documentation as both a man page and an HTML document.

REQ07020

The Tcl demo browser application shall be written and commented so that it is suitable for reading as an example.

7. References

[1] "Mozilla Layout Engine", http://www.mozilla.org/newlayout/
[2] "KHTML - KDE's HTML library", http://developer.kde.org/documentation/library/kdeqt/kde3arch/khtml/
[3] "Tcl Style Guide", Ray Johnson, http://www.tcl.tk/doc/styleGuide.pdf
[4] "Tcl/Tk Engineering Manual", John K. Ousterhout, http://www.tcl.tk/doc/engManual.pdf
[5] "Microsoft Internet Explorer", http://www.microsoft.com/windows/ie/default.mspx
[6] "Cascading Style Sheets test suites", http://www.w3.org/Style/CSS/Test/
[7] "HTML4 Test Suite", http://www.w3.org/MarkUp/Test/HTML401/current/