Moving to XML

Author and presenter: Simon Brooke.

The full text of this presentation is online at <URL: http://www.jasmine.org.uk/~simon/bookshelf/courses/xml/>

Written May 2001; last revised 30th June 2004

Note: the original version of this text had a horrible error of understanding in the use of the org.w3c.dom package. My apologies to all those misled by it. This remains, essentially, a three-year-old course; I have fixed the major error but have not otherwise brought it up to date. Bits of it are still worth reading.

Changes to the presentation since your handouts were printed are highlighted like this.

Simon Brooke, 21 Main Street, Auchencairn, DG7 1QU, Scotland.

What we're going to do today

What is XML
- A brief chautauqua on language
- What is XML
- A bit about the other bits
- XML in your context
Anatomy of an XML system
- Specifying
- Creating
- Transforming
- Communicating

How we're going to get there

The morning, mostly talking
This afternoon, mostly doing it. We have an awful lot to get in!

Breaks, meals, fire exits, nearest WCs, how to get to coffee?

Before we start: what do you know

We've got a lot of ground to cover... Just so I can know which bits to concentrate on and which bits to skip, what do you know about...

	nothing	a little	can do it	expert
HTML
Java
XML

Before we start: Namespace

A context in which there are things with names
Each thing in the namespace has a different name
You can address a thing in the namespace just by using its name
Examples
- A family
- This workshop
This is a powerful concept, and I'm going to use it a lot - but it has special meaning in XML. Be clear about when I'm talking about an 'XML namespace' and when I'm just talking about a namespace!

[get participants to write their names on bits of paper. Make sure there are not two in the class with the same name. If there are, get them to add something to their names to disambiguate]

A brief chautauqua on language

Words

Baker
- an English word
Boulangier
- not an English word

We can recognise words as belonging to a language because we know them... (sometimes, we can recognise words as belonging to a language even when we don't know them, because they sound right).

Sentences

Colourless green ideas sleep furiously.
- an English sentence.
  although it is nonsense, we can easily parse it and see that it is structurally correct
Development state with join material and.
- not an English sentence
  although there could be sensible parses in some grammars, we can easily see that in English grammar this is not structurally correct.

Language

In any given language, we can easily recognise what is a well formed component of the language.
And what is not...

Words

Fish
- an Indo-European word
Peske
Choiremheir
Chautauqua
Gkprtwcv
P7ajo

Although language families have rules about what can be in a word and what can't, it's much harder to tell whether a word is valid or not, unless we know which language we're looking at.

Sentences

This is not a pipe.
Ceci n'est pas une pipe.
Chagco vet nici yan toube.
GGGGGG #000007 cabala.

Meta-Language [1]

Within a family of languages, we can recognise what is a well-formed component of some language
or might be...
and what certainly isn't.

Meta-Language [2]

In Indo-European languages,
- a word has at least one vowel
  - Usually...
- words don't have more than four consonants in succession
  - Usually...
- a sentence is a succession of words
- a sentence starts with a capital letter and ends with a period
There is an (implicit) meta-grammar.

HTML [1]

<address>
- A valid HTML tag (in HTML 4.0 Transitional)
<cotton>
- Not a valid HTML tag

HTML [2]

HTML is a language (albeit a simple one).
It's a markup language, and I hope it's one you're all familiar with.
We can know at once whether a tag is a valid HTML tag or not...
and what it means...
and how it should be used...

Well Formedness

When we know what the language is we can parse ill-formed forms:
- been there, done that
because we can predict what the missing bits are
and where they should be:
- I have been there, and I have done that.

What is XML

Key features
Differences from HTML
Differences from SGML
A bit about the other bits
Reality check

Key Features

A universal, application-independent framework for the communication of semantically rich structured information between software agents.

A language for describing other languages
- Markup languages
Which describe the structure of a document
Not the visual appearance (CSS, XSL)
Written in plain Unicode (a universal character set which supersedes ASCII; it is not limited to sixteen bits)

Differences from HTML

A Metalanguage: In a word, extensible.
- HTML can be (and has been) reimplemented as an XML dialect
Also, strictly parsed.

Extensible: what does this mean for you?

Allows you to define new markup.
Describing structure, not appearance.
Makes it easier for programs to extract information from your documents.

Extensible: a simple example [1]

<?xml version="1.0"?>
<!DOCTYPE meeting PUBLIC "-//WEFT//DTD MEETING 0.1//EN" 
        "meeting.dtd">
<meeting id="June Board Meeting">
  <venue>
    28 Forth Street, Edinburgh
  </venue>
  <invitees>
    <attendee attendance="required" 
        meeting-role="convenor">
      <name>
    Simon Brooke
      </name>
      <position>
    Technical Director
      </position>
    </attendee>
    <attendee attendance="required">
      <name>Angela Stormont</name>
      <position>
    Communications Director
      </position>
    </attendee> 
  </invitees>
</meeting>

Extensible: a simple example [2]

What does this do?
For the user directly, very little.
For the user's program, it allows it to isolate items of structured information and handle them in intelligent ways to help the user.
But only if the user's program understands the special markup you have defined.
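
As a rough sketch of what 'the user's program' might do with the meeting document: the Java below parses a cut-down copy of it (no DOCTYPE line, so no meeting.dtd is needed) and pulls out the names of the required attendees. The class name MeetingReader and the method requiredAttendees are made up for this illustration; only the standard JAXP and DOM calls are real.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class MeetingReader {

    // a cut-down copy of the meeting document (assumption: DOCTYPE omitted)
    static final String SAMPLE =
          "<meeting id='June Board Meeting'>"
        + "<venue>28 Forth Street, Edinburgh</venue>"
        + "<invitees>"
        + "<attendee attendance='required' meeting-role='convenor'>"
        + "<name> Simon Brooke </name><position>Technical Director</position>"
        + "</attendee>"
        + "<attendee attendance='required'>"
        + "<name>Angela Stormont</name><position>Communications Director</position>"
        + "</attendee>"
        + "</invitees></meeting>";

    /** Return the names of all attendees whose attendance is 'required'. */
    static List<String> requiredAttendees(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> names = new ArrayList<>();
            NodeList attendees = doc.getElementsByTagName("attendee");
            for (int i = 0; i < attendees.getLength(); i++) {
                Element attendee = (Element) attendees.item(i);
                if ("required".equals(attendee.getAttribute("attendance"))) {
                    Element name =
                        (Element) attendee.getElementsByTagName("name").item(0);
                    // trim, because the markup may pad the name with whitespace
                    names.add(name.getTextContent().trim());
                }
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("could not read meeting", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(requiredAttendees(SAMPLE));
    }
}
```

The program only works because it knows what 'attendee', 'attendance' and 'name' mean; hand it a document in some other dialect and it will find nothing.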

Strictly parsed: what does this mean for you? [1]

Documents which are not well-formed will not be handled by an XML application. At all.
- Tags and attributes are case-sensitive;
- End tags cannot be omitted - every <xx> must have a </xx>.
- Tags must be correctly nested:
 <xx><yy>This won't work</xx></yy>
- Empty tags (those which don't enclose any content) must be marked with a trailing slash like this: <xx/>
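
You can see the strictness for yourself with a few lines of Java. This is a minimal sketch using the standard JAXP parser; the class name WellFormedCheck and the method isWellFormed are invented for the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {

    /** True if the parser accepts the text as well-formed XML. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // any well-formedness error ends up here
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<p><b>fine</b></p>"));   // nested properly
        System.out.println(isWellFormed("<p><b>bad</p></b>"));    // badly nested
        System.out.println(isWellFormed("<P>bad</p>"));           // case mismatch
        System.out.println(isWellFormed("<p>bad"));               // no end tag
        System.out.println(isWellFormed("<br/>"));                // empty tag
    }
}
```

Note that the parser does not try to recover: one error anywhere and you get no document at all.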

Strictly parsed: what does this mean for you? [2]

Most Web designers are sloppy.
More than ninety percent of all commercially authored Web pages do not conform to any standard and are not valid HTML.
Few if any of the commercially available WYSIWYG tools generate valid HTML.
Web authors switching to XML will need to adopt much more rigorous technical discipline.

Differences from SGML

Like HTML, simpler!
- I used to say 'much simpler', but now I'm not too sure...
Like HTML, optimised for delivery over restricted-bandwidth links.
Unlike HTML, a true subset of SGML.
All valid XML documents are valid SGML documents.
SGML tools (conforming to ISO 8879) will work with XML.
Organisations with an existing commitment to SGML will find the transition to XML much simpler.

A bit about the other bits

XML is a language for describing other languages
- Most of these are application specific
- Some are very general
  - XLink: a vocabulary for linking between XML documents
  - XPath and XPointer: vocabularies for describing positions inside XML documents
  - XSL-T: a vocabulary for transforming XML documents
  - XML Schema: a vocabulary for describing vocabularies
  - SMIL (Synchronised Multimedia Integration Language): a vocabulary for integrating and synchronising multimedia presentations
  - SOAP (Simple Object Access Protocol): a vocabulary for exchanging computation requests between heterogeneous agents in a network.
- All of these key standards are looked after by W3C

A bit about the other bits [ii]: XLink

In HTML, you need to use a special element (the A or Anchor tag) to be the start of a link
In XML any element can be the start of a link
- Currently, Mozilla/Netscape 6 is the only 'mainstream' browser which partly supports this
- W3C's Amaya 4 also partly supports it
- Several other demo and prototype implementations
  - Full list here
- No mainstream browser fully supports XLink

A bit about the other bits [iii]: XPath and XPointer

In HTML, you need to use a special element (the A or Anchor tag) to be the target of a link
In XML a link can target any element in the target document
- Several demo and prototype implementations
- No mainstream browser fully supports this
- You've heard this before somewhere, haven't you?

A bit about the other bits [iv]: XSL-T

'eXtensible Stylesheet Language - Transformations'
- A language for manipulating document structure
- Maps any XML dialect into any other (or even to plain text)
- Declarative, pattern matching language, conceptually like Prolog
- Extremely powerful, unquestionably useful.
- But not really a stylesheet language

Digression: Visual Appearance and Stylesheets [i]

XML documents are not necessarily or primarily intended to be viewed by people, but when they are...
The visual appearance of a document should be controlled by stylesheets.
The appearance of this one is.
In XML as in HTML you don't have to use stylesheets.
If you don't, you will get a plain, simple appearance.

If people are interested, you can open the stylesheet for this presentation, slideshow.css, in a text editor.

Visual Appearance and Stylesheets [ii]

A special stylesheet language, XSL, was conceived to support the new features of XML.
- Two parts:
  - XSL-T, the Transformation language
    - I've described this above
  - XSL-FO, formatting objects
    - A comprehensive language for describing the fine detail of document presentation.
    - Produces prolix, semantically impoverished markup.
    - Not supported by any client yet.
    - Really a stylesheet language...
    - Of doubtful value.

Visual Appearance and Stylesheets [iii]: Status of XSL

XSL-T was adopted on 16 November 1999 as a W3C recommendation.
XSL-FO became a W3C candidate recommendation on 21 November 2000, and a full W3C recommendation on 15 October 2001.
Microsoft IE5 implemented a proprietary 'XSL' which is based on an older draft of XSL; newer IE5s are migrating towards the standard.

Visual Appearance and Stylesheets [iv]: XSL Summary

Transformation language of unquestionable merit, greatly aids separating content from presentation.
Designed primarily to transform XML to XSL-FO, but can transform to any other XML dialect (including XHTML).
Recommendation:
- use XSL-T to map XML into XHTML for presentation to users, decorate with CSS
- use XSL-T to map XML to other XML dialects as needed for communication with other organisations
- ignore XSL-FO for now, except if
  1. You need pixel-perfect presentation of your documents and
  2. You work in an environment (e.g. an Intranet) where you control the client.

Visual Appearance and Stylesheets [v]: What about CSS?

You can continue to use existing CSS1 and CSS2 stylesheets.
- Probably.
- Depending on what individual client vendors decide to support...
- all (roughly) support CSS2.
This presentation is not about stylesheets.

A bit about the other bits [v]: XML Schemas

A vocabulary for defining vocabularies or 'dialects'
A bit late arriving
Replace DTDs, inherited from SGML

Digression: Dialects of XML

What is a DTD?
What about Schemas?
Do I have to use a DTD or Schema?
What DTDs and Schemas are available?
Who will write DTDs and Schemas?

What is a Document Type Definition?

Essentially, a dictionary for the language you are using.
Every Web author has heard of one
Every good Web author has seen one
- typically http://www.w3.org/TR/REC-html40/loose.dtd
Very few Web authors have written one

What about Schemas? (Schemata?)

Schemas are a new, alternate way to specify XML languages
Officially adopted by W3C on 2nd May 2001 - so still very new
Recommendation: Let someone else take the grief of getting the bugs out of it - stick with DTDs for now.

More about Schemas [i]: benefits

The schema language is itself an XML language, so schemas can be parsed with standard XML tools
- DTDs can't
You can specify rules for the content of elements and attributes with much finer granularity than with DTDs
- You can specify that an attribute must be a number
- You can specify minimum and maximum values for an attribute
- You can specify regular expression patterns the attribute must match

More about Schemas [ii]: examples

An element representing someone's age

<xsd:element name="age">
  <xsd:simpleType>
    <xsd:restriction base="xsd:positiveInteger">
      <xsd:maxExclusive value="150"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:element>

An attribute representing a UK bank sorting code (e.g. 68-59-13)

<xsd:simpleType name="sortcode">
  <xsd:restriction base="xsd:string">
    <xsd:pattern value="\d{2}-\d{2}-\d{2}"/>
  </xsd:restriction>
</xsd:simpleType>

An attribute representing a UK grid reference (e.g. NX7951)

<xsd:simpleType name="gridref">
  <xsd:restriction base="xsd:string">
    <xsd:pattern value="[A-Z]{2}[0-9]{4}"/>
  </xsd:restriction>
</xsd:simpleType>

The pattern specification seems to have changed at some stage in the drafting process. The examples given in Learning XML don't work with Daniel Potter's tutorial applet. Treat all tutorials with care and refer back to the formal specification!
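
One way to treat a tutorial with care is to try the schema against a real validator. The sketch below uses the javax.xml.validation package from the standard Java library to check a sort code against a pattern facet, written in the syntax of the final Recommendation. The class name SortCodeValidator, and wrapping the type in a 'sortcode' element so it can be tested on its own, are assumptions made for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class SortCodeValidator {

    // a tiny schema: one element whose text must look like 68-59-13
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:element name='sortcode'>"
        + "<xsd:simpleType>"
        + "<xsd:restriction base='xsd:string'>"
        + "<xsd:pattern value='\\d{2}-\\d{2}-\\d{2}'/>"
        + "</xsd:restriction>"
        + "</xsd:simpleType>"
        + "</xsd:element>"
        + "</xsd:schema>";

    /** True if the document is valid against the schema above. */
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory
                    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // with no error handler set, validation errors are thrown
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<sortcode>68-59-13</sortcode>"));
        System.out.println(isValid("<sortcode>685913</sortcode>"));
    }
}
```

A DTD could say only that a sort code is character data; it could not reject the second document.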

More about Schemas [iii]: conversion

A schema holds a superset of the information in a DTD
- You can convert a DTD to a schema with a PERL script
- You should be able to convert a schema to a DTD using XSL-T
  - But you might lose some information

Do I have to use a DTD or Schema?

As with HTML, you don't have to specify a DTD.
Even if you define new markup...
... but client programs won't know how to interpret your new markup unless you also define a DTD or Schema.
As with HTML, you should specify one.

What DTDs and Schemas are available?

All the XML extensions discussed in this presentation are defined as DTDs or Schemas (mostly DTDs).
Thousands of SGML DTDs are available which can relatively easily be converted.
There are already many hundreds of XML DTDs available, and the number is growing fast.
Some repositories:

Who will write DTDs and Schemas? [i]

Very specialised documents, technically demanding to write.
For most purposes, suitable examples are available.
Most XML users will never write one.

Who will write DTDs and Schemas? [ii]

Large organisations with special documentation requirements may write DTDs and/or Schemas.
Communities of organisations which wish to exchange data will probably write DTDs and/or Schemas.
Corporations which sell application programs will probably write DTDs and/or Schemas.
Corporations which sell WYSIWYG Web authoring tools will certainly write DTDs and/or Schemas.
- In future, there will be much less distinction between a word processor and a Web authoring tool.
Communities of interest with special technical needs will certainly write DTDs and/or Schemas.

Ownership of DTDs: Communities vs single vendors

Rich Site Summary (RSS) is an important XML dialect used in news syndication
The original DTDs (0.9, 0.91) were developed and 'owned' by Netscape
By April 2001, Netscape had lost interest in RSS...
- ... and deleted the DTDs from their servers
Many other people's systems broke.
My conclusion: single vendors are not to be trusted with community resources.
Since 2001 the history of RSS and its competing versions has got even more complex and bizarre. It's still a really useful tool for doing syndication, though.

Another cautionary tale about software vendors

Microsoft has a long history of 'embracing and extending' standards
- Making small changes which cause other people's implementations to break.
- Thus forcing people to use only their implementations.
MS Word 2002 saves as 'HTML' and as 'XML'
- When it saves as HTML, the HTML contains embedded 'XML' elements
- In the 'XML', the attribute values are unquoted.
  - This is explicitly forbidden by the XML standards...
  - Standards compliant XML parsers can't parse this
  - Microsoft's own XML parsers can parse this
Is this simply incompetence?
- Or is it 'embracing and extending'...?

A bit about the other bits [vi]: SOAP

Simple Object Access Protocol
A vocabulary for communicating with software agents in a heterogeneous network
- Using HTTP as transport
Not actually very simple...
- But this is an inherently difficult area
- Software toolkits (such as Apache Soap) will make this easier to deploy

More about SOAP

Developed from 'XML-RPC' (Dave Winer, Userland)
Three versions out there
- 0.9
- 1.0
  - submitted to IETF as a 'draft'
    - no longer available from IETF
    - but referred to in more recent IETF drafts
  - certainly incompatible with 0.9
- 1.1,
  - May 2000
  - submitted to W3C as a 'note'
  - probably incompatible with 1.0
Not (yet) a W3C recommendation, just a 'note'
Not (yet) an IETF RFC
Vapourware: not ready for prime time.
If you want to pursue this further, there's an online tutorial here.

XML in your context

Applications which will benefit greatly from XML
Applications which will benefit little from XML
XML in action: Content syndication

Applications which benefit greatly from XML

Applications exchanging structured data with other software agents.
- Accounting systems exchanging orders, invoices, payments...
- Engineering systems exchanging specifications, dimensions...
- Diary systems exchanging bookings, events, meetings, holidays...
Technical documentation applications, or applications involving special notation (e.g., mathematics, music).
Applications requiring highly detailed illustrations.
Multimedia applications.

At present, only where the audience is controlled

Applications which will benefit little from XML

Simple publishing of text, with or without simple graphics.
- 'What I did last summer'...

XML in action: content syndication

What is content syndication
History of Syndication
Standards for Syndication
Offering Syndication
Incorporating Syndication
Aggregation

What is content syndication

Making headlines from one web site available to others
Automatically
A dramatically successful public application of XML

History of Syndication

In the beginning was the ripper
1997: ScriptingNews starts promoting XML-based syndication
1999: My Netscape and Rich Site Summary 0.90
1999: ScriptingNews elements integrated by Netscape into RSS 0.91
2001: Netscape abandon Rich Site Summary

Standards for Syndication

Rich Site Summary 0.91
- Netscape, now abandoned
- Very, very simple
- Still useful
Rich Site Summary 1.0
- Much more complex
- Based on Resource Description Framework
- Extensible
- Best format to use now
Invent your own
- e.g. Slashdot.org's 'backslash'
- Easy to do...
- Not recommended!

Offering Syndication

Provide a URL on your site from which an RSS document can be pulled
- Example pulled from a flat file (static, compiled periodically) [ Wired news]
- Example pulled from a Servlet (dynamic) [PRES]
- You can do this with CGI, or any other server side content technology
Very easy to set up.

Incorporating Syndication [i]

Periodically request RSS from donor sites and transform to HTML
- Easy to do with XSL-T
Example sites
- Portaloo
- PRES

Incorporating Syndication [ii]: Sample code

<!-- sidebar sections: show title and top eight entries -->
  <xsl:template match="rss">
    <h2>
      <xsl:apply-templates select="channel/title" />
    </h2>
    <xsl:for-each select="channel/item">
      <xsl:if test="9 > position()">
        <p>
          <a>
            <xsl:attribute name="href"><xsl:value-of
              select="link"/>
            </xsl:attribute>
            <xsl:apply-templates select="title" />
          </a>
        </p>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
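
To run a template like this you need a transformation engine. Here is a minimal sketch using the javax.xml.transform API from the standard Java library (which is also the interface Xalan implements). The stylesheet string is the template above wrapped in an xsl:stylesheet element; the class name RssSidebar, the made-up feed from sampleFeed and the example.org links are all assumptions for the purposes of illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class RssSidebar {

    // the sidebar template, wrapped in a complete stylesheet
    static final String XSL =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='rss'>"
        + "<h2><xsl:apply-templates select='channel/title'/></h2>"
        + "<xsl:for-each select='channel/item'>"
        + "<xsl:if test='9 > position()'>"
        + "<p><a><xsl:attribute name='href'>"
        + "<xsl:value-of select='link'/></xsl:attribute>"
        + "<xsl:apply-templates select='title'/></a></p>"
        + "</xsl:if></xsl:for-each>"
        + "</xsl:template></xsl:stylesheet>";

    /** Build a made-up RSS 0.91 style feed with the given number of items. */
    static String sampleFeed(int items) {
        StringBuilder feed = new StringBuilder(
            "<rss version='0.91'><channel><title>Example headlines</title>");
        for (int i = 1; i <= items; i++) {
            feed.append("<item><title>Story ").append(i).append("</title>")
                .append("<link>http://example.org/").append(i)
                .append("</link></item>");
        }
        return feed.append("</channel></rss>").toString();
    }

    /** Apply the stylesheet to an RSS document and return the HTML fragment. */
    static String toHtml(String rss) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(rss)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // ten items go in; only the first eight should come out
        System.out.println(toHtml(sampleFeed(10)));
    }
}
```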

Sample XSL code

Moreover Internet Europe headlines, processed with this XSL 22nd May 2001

Aggregation

If you can collect headlines from multiple sources, you can search the collection with predetermined patterns, and offer personalised aggregations of news to users.
O'Reilly's Meerkat
Start of something big.

Worked Example: a meeting arranger system

We all go to meetings...
We all know what a hassle it is arranging them...
Wouldn't it be nice if the machines could do it for us?
Here's how!

Creating an example document (quite easy)

Start by typing what you want into your favourite text editor.
Invent sensible looking markup as you go along.
Don't be too casual about this
- this is a data design exercise,
- you need to think about not only what you need for this document,
- but what you might need for others.
- you need to think about all the possible uses of your document.
Here's one I did earlier.

This is a good opportunity for a whiteboard and some interaction! If possible, get the participants to do an example for themselves.

Creating the DTD and/or Schema (hard, but we'll use a trick)

DTDs and Schemas are precise, technical documents. How are we going to make them?
Pass our example page to the DTDGenerator
- [2004: unfortunately the DTD generator is no longer available]
Pass the results of that through the DTD2Schema script (requires PERL)
Tidy up the results with your text editor
Here's a DTD and a schema I did earlier.

Again, if possible, get the participants to actually do this.

Viewing it: creating a style-sheet (harder)

Two approaches to stylesheets:

CSS1:

just establishes visual styles for the actual elements in your document

XSL:

much more complex, but allows on-the-fly transformation of the document to present particular features

(Of course, you can just do without altogether)
Here's one just for the agenda.
- Here's the HTML it produces

Using it: applications

Now we need to write applications which will:

allow us to generate these documents
- not very hard, there are Java components around which semi-automate creating a form-driven special-purpose editor from a DTD...
allow our diary programs to automatically handle these documents
- much harder, but XML parser libraries are available for most modern programming languages which you can build on.
We probably won't get that far today.

Specifying

The Structure of an XML document
Exercise period [i]

The Structure of an XML document

Overall structure
Processing Instructions
XML Namespaces
Elements
Attributes
When to use which

Overall Structure

Prolog
- The XML declaration
 - <?xml version="1.0"?>
 - declares that this is XML
 - technically optional in XML 1.0, but you should always include it
- The Document Type Declaration
 - <!DOCTYPE meeting PUBLIC "-//WEFT//DTD MEETING 0.1//EN" "meeting.dtd">
 - says what dialect of XML this is
 - optional
- Processing instructions
- Comments
Root element
- Just an element, like any other
- Just exactly one.

Processing Instructions

Special instructions for particular applications
Syntactically, delimited by <? and ?>
- <?xml version="1.0"?> is a processing instruction
- a special one
The tag-part identifies the particular application this PI is intended for
- xml means 'any XML parser'
The rest of the content is application specific
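
Here is a small sketch of how an application might look for processing instructions addressed to it, using the standard DOM API. The class name PiLister is invented for the illustration; the xml-stylesheet instruction is a common real-world example, pointing here at this presentation's slideshow.css.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

public class PiLister {

    static final String SAMPLE =
          "<?xml version='1.0'?>"
        + "<?xml-stylesheet type='text/css' href='slideshow.css'?>"
        + "<meeting/>";

    /** List the processing instructions in the prolog as 'target -> data'. */
    static List<String> instructions(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> found = new ArrayList<>();
            NodeList children = doc.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node node = children.item(i);
                if (node.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE) {
                    ProcessingInstruction pi = (ProcessingInstruction) node;
                    // target says who it is for; data is up to that application
                    found.add(pi.getTarget() + " -> " + pi.getData());
                }
            }
            return found;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(instructions(SAMPLE));
    }
}
```

The XML declaration does not show up in the list: the parser consumes it itself, which is one of the ways in which it is 'a special one'.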

XML Namespaces

Warning: Special use of the term!
Allow multiple XML dialects to be used in one document

xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

xmlns means 'this is an XML namespace declaration'
the rest means that names starting with xsl: belong to the namespace defined as http://www.w3.org/1999/XSL/Transform
Note that the URL doesn't actually point to anything interesting, it's just a marker!
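
To show that it is the URL, not the prefix, which identifies the dialect, the sketch below parses two documents which differ only in the prefix they use and asks the DOM what the root element is called. The class name NamespaceDemo and the alternative prefix 't' are made up for the illustration. Note that JAXP parsers ignore namespaces unless you ask for them.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class NamespaceDemo {

    static final String XSL_NS = "http://www.w3.org/1999/XSL/Transform";

    /** Return the root element's name in the form {namespace-uri}local-name. */
    static String rootName(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default in JAXP
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();
            return "{" + root.getNamespaceURI() + "}" + root.getLocalName();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String usual = "<xsl:stylesheet version='1.0' xmlns:xsl='" + XSL_NS + "'/>";
        String odd = "<t:stylesheet version='1.0' xmlns:t='" + XSL_NS + "'/>";
        // both lines print the same thing: the prefix is only a local nickname
        System.out.println(rootName(usual));
        System.out.println(rootName(odd));
    }
}
```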

Elements [i]

Syntactically, an element is what is delimited by its tags.

An opening tag comprises a left angle bracket <, the name of the element, optionally some attribute-value pairs, and a closing angle bracket >
- <meeting id="June Board Meeting">
A closing tag comprises a left angle bracket <, a slash /, the name of the element, and a closing angle bracket >
- </meeting>
An empty tag comprises a left angle bracket <, the name of the element, optionally some attribute-value pairs, a slash /, and a closing angle bracket >; it is just shorthand for an opening tag immediately followed by the closing tag with nothing in between.
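
If you want to convince yourself that the empty tag really is just shorthand, this sketch parses both spellings and asks the DOM whether the resulting elements are equal. The class name EmptyTagDemo and its methods are invented for the illustration; isEqualNode is part of the DOM Level 3 API.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class EmptyTagDemo {

    /** Parse a document and return its root element. */
    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** True if the two documents have structurally equal root elements. */
    static boolean sameStructure(String first, String second) {
        return root(first).isEqualNode(root(second));
    }

    public static void main(String[] args) {
        // shorthand and longhand: the same element
        System.out.println(sameStructure("<venue/>", "<venue></venue>"));
        // a space between the tags is content, so this one differs
        System.out.println(sameStructure("<venue/>", "<venue> </venue>"));
    }
}
```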

Elements [ii]

An element is a primary structural unit in the XML markup
- Has a name which is a string of characters
- may have attributes
  - some may be required
  - some optional
May allow child elements of particular kinds
- Or just text (PCDATA)
- Or neither (empty tags)
An element may have many child elements with the same name

Attributes

An attribute belongs to a particular element type
Has a name which is a string of characters
Has a value which is a string of characters
Syntactically
- name and value are separated by an equals sign =
- value is delimited by quotation marks "
An element may have only one attribute with any given name

When to use which

When you may have a value which is a complex data item, use an element
- example: agenda containing agenda items
When you may have many values of the same type, use an element.
- example: agenda item
When you may have a long simple text value, use an element
- example: title of an agenda item
When you always have just one short simple text value, use an attribute
- example: proposer of an agenda item

    <meeting id="June Board Meeting">
      <agenda>
        <item proposer="Simon Brooke">
          <title>
            Adoption of new project management
            procedures manual
          </title>
        </item>
        <item proposer="Angela Stormont">
          <title>
            Transfer of shares
          </title>
        </item>
      </agenda>
    </meeting>

Exercise period [i]

In groups, produce a DTD for an XML dialect to describe meetings
You may use the DTD generator at <URL:http://www.pault.com/Xmltube/dtdgen.html>
- [2004: unfortunately the DTD generator is no longer available]
You should think about your meetings database as you do so and have some idea of how your XML DTD relates to your database design.

Creating

Building XML applications: tools and technologies
Constructing the document
Exercise period[ii]

Building XML applications: tools and technologies

Languages for XML applications
Tools, components and toolkits
What we will be using today

Why Java?

Portable
Reasonably readable
Very well supported with XML toolkits and components
I like it...

Other languages for building XML applications

PERL
LISP
Others

Tools, components and toolkits

Parsers
Transformation engines
APIs
Where to find XML tools
- <URL:http://www.xmlsoftware.com/>
- <URL:http://www.garshol.priv.no/download/xmltools/>

Transformation engines

Apply XSL stylesheets to transform a document from one representation to another.

XML to XML
XML to HTML
XML to text

What we will be using today

Apache Xalan: XSL processor contributed to the Apache Foundation by IBM; closely related to IBM's LotusXSL processor
Apache Xerces: XML parser contributed to the Apache Foundation by IBM; based on IBM's XML4J parser
SAX: Simple API for XML, by David Megginson and others
DOM: The W3C Document Object Model API
W3C Jigsaw: HTTP Server and Servlet Server developed by W3C
Jacquard: A toolkit of useful bits for sticking it all together. By me. Not necessarily the best but it's what I know and use.

Constructing the document

Writing text to the output stream
Using the DOM

First an apology

Previous versions of this course contained a howling error at this point. It suggested creating DOM objects essentially by calling the newInstance method of the implementing classes. This only works with the particular DOM implementation you happen to be using and is not portable between DOM implementations (or even, necessarily, between successive versions of the same DOM implementation). So clearly it is very bad practice to do this.

I can only apologise to people who were misled by this.

The Document Object Model

Standardised interface for working with XML documents
A W3C standard
Many DOM implementations

The DOM: what is a Document?

A document is just a document; an object conforming to the org.w3c.dom.Document interface
You create one by calling the createDocument method of a DOMImplementation
In Jacquard you usually create one by calling the generate method of a class which implements the DocumentGenerator interface
- Usually a subclass of DocumentGeneratorImpl.

The DOM: what is an Element?

A 'tag'
With 'attributes'
- a namespace
And 'contents'
- other elements which are children of this element
- text elements
Constructed by calling the createElement( String tagName) method of the Document object
Or in Jacquard, by calling the generate method of a class which implements the NodeGenerator interface
- Usually a subclass of ElementGenerator

The DOM: what is a Text?

just text
- No tag
- No attributes
- No enclosing angle brackets
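
Putting elements, attributes and text together without Jacquard might look something like this. It is a sketch only: the class name ItemBuilder and its methods are invented, and the item/title/proposer vocabulary is borrowed from the agenda example. Every node is made by asking the Document for it, never by instantiating an implementation class.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

public class ItemBuilder {

    /** Build a document holding one agenda item. */
    static Document buildItem(String proposer, String titleText) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element item = doc.createElement("item");    // a 'tag'
            item.setAttribute("proposer", proposer);     // with an attribute
            Element title = doc.createElement("title");  // a child element
            Text text = doc.createTextNode(titleText);   // just text
            title.appendChild(text);
            item.appendChild(title);
            doc.appendChild(item);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Write the document out as a string, using an identity transform. */
    static String serialise(Document doc) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialise(buildItem("Simon Brooke", "Transfer of shares")));
    }
}
```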

Create a document object

    // get a handle on a DOM implementation...
    DOMImplementation di = DOMStub.getDOMImplementation( context);
    // and use it to create a document object
    Document doc = di.createDocument( getNamespaceURI( context), 
						  rootName, doctype);

DOMStub is a Jacquard utility class which gets hold of whatever DOM implementation is available. If you don't use Jacquard you'll have to instantiate a DOM implementation for yourself.
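
If you are not using Jacquard, one portable way to get hold of a DOM implementation is through the JAXP factory classes in the standard Java library, roughly as sketched below. The class name DomFactory and the method newDocument are made up for this illustration; the JAXP and DOM calls themselves are standard.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;

public class DomFactory {

    /**
     * Create an empty document with a root element of the given name,
     * using whatever DOM implementation JAXP finds on this system.
     */
    static Document newDocument(String namespaceUri, String rootName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // ask the factory for an implementation; never name its class
            DOMImplementation di =
                factory.newDocumentBuilder().getDOMImplementation();
            // third argument is the doctype; null means 'none'
            return di.createDocument(namespaceUri, rootName, null);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument(null, "eventsdiary");
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

Because the code mentions only interfaces and factories, it should carry on working if the underlying parser is swapped for another.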

Add a root ('content') element

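    // createElement takes just the tag name. This append is only legal if the
    // document has no root yet; createDocument with a root name already made one.
    doc.appendChild( doc.createElement( "eventsdiary"));
    Element content = doc.getDocumentElement();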

Every Document must have exactly one 'content' element
If you attempt to add another child to a document which already has a child, that's an error.
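
The error in question is a DOMException with the code HIERARCHY_REQUEST_ERR. The sketch below provokes it deliberately; the class name SingleRootDemo and the element name 'another' are invented for the illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;

public class SingleRootDemo {

    /**
     * Try to add a second root element to a document which already has
     * one. Returns the DOMException code, or 0 if the append succeeded.
     */
    static short addSecondRoot() {
        try {
            DOMImplementation di = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().getDOMImplementation();
            // this document is created with its root element in place
            Document doc = di.createDocument(null, "eventsdiary", null);
            try {
                doc.appendChild(doc.createElement("another"));
                return 0;
            } catch (DOMException e) {
                return e.code;
            }
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        short code = addSecondRoot();
        System.out.println(code == DOMException.HIERARCHY_REQUEST_ERR
                ? "second root refused" : "unexpected result: " + code);
    }
}
```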

Add further elements recursively as required

            // match the pattern against the convenience view and pull
            // back the rows that match as namespaces
            Contexts events =
                TableDescriptor.getDescriptor( VIEW, null, 
                                               context ).match( pattern );

            Enumeration e = events.elements(  );

            // and pass each of those namespaces in turn to my event element 
            // generator to generate children for my element
            while ( e.hasMoreElements(  ) )
                content.appendChild( eventEltGenerator.generate( doc,
                        (Context) e.nextElement(  ) ) );

This is a bit of a cheat. It depends on having a view in the database which collects together all the necessary fields for us:

---- EVENTS_VIEW -----------------------------------------------------

CREATE VIEW events_view AS
     SELECT EVENT.Actor,
            EVENT.Event,
            CATEGORY.Description AS Type,
            LOCATION.Description AS Location,
            EVENT.Eventdate,
            EVENT.Starttime,
            EVENT.Endtime,
            EVENT.Description
       FROM EVENT,
            CATEGORY,
            LOCATION
      WHERE EVENT.Location = LOCATION.Location
        AND EVENT.Category = CATEGORY.Category
   ORDER BY Eventdate,Starttime
;

Let's see that again [i] the source

public class DayView extends DocumentGeneratorImpl
{
    //~ Static fields/initializers --------------------------------------------

    /**
     * the name of the convenience view in the database from which I will
     * collect all the information I need
     */
    protected static final String VIEW = "events_view";

    /** the field in that view which represents the date of the event */
    protected static final String EVENTDATEFIELD = "when";

    //~ Instance fields -------------------------------------------------------

    /** a generator to generate the event elements which will be my children */
    protected EventElementGenerator eventEltGenerator =
        new EventElementGenerator(  );

    //~ Methods ---------------------------------------------------------------

    /**
     * generate a document containing all the events on the day implied by
     * this context
     */
    public Document generate( Context context ) throws GenerationException
    {
        DOMImplementation di = DOMStub.getDOMImplementation( context );
        Document doc = di.createDocument( "", "eventsdiary", null );

        String day = context.getValueAsString( "day" );
        uk.co.weft.dbutil.Calendar when = new uk.co.weft.dbutil.Calendar(  );

        if ( day != null )
        {
            // if we've got a date, set my calendar to that day
            // (by default it sets itself to today)
            when.setTime( java.sql.Date.valueOf( day ) );
        }

        Element content = doc.getDocumentElement(  );

        content.setAttribute( "date", when.toString(  ) );

        try
        {
            // create a new, blank, context as a pattern to match
            Context pattern = new Context(  );

            // give it the database username, password and url from the current context
            pattern.copyDBTokens( context );

            // put the date we're interested in into the pattern
            pattern.put( EVENTDATEFIELD, when );

            // match the pattern against the convenience view and pull
            // back the rows that match as namespaces
            Contexts events =
                TableDescriptor.getDescriptor( VIEW, null, context ).match( pattern );

            Enumeration e = events.elements(  );

            // and pass each of those namespaces in turn to my event element 
            // generator to generate children for my element
            while ( e.hasMoreElements(  ) )
                content.appendChild( eventEltGenerator.generate( doc,
                        (Context) e.nextElement(  ) ) );
        }
        catch ( DataStoreException dex )
        {
            throw new GenerationException( "Failed to read from data store: " +
                dex.getMessage(  ) );
        }

        return doc;
    }

Let's see that again [ii]: the event element generator

The event element generator is a simple wrapper round a context element generator:

    //~ Inner Classes ---------------------------------------------------------

    /**
     * a generator for an XML element representing a single event. This uses
     * ContextElementGenerator which knows how to construct a DOM element
     * node by taking values out of a context, so all we need to do is tell
     * it which value names to treat as attributes and which as children
     */
    class EventElementGenerator extends ContextElementGenerator
    {
        //~ Constructors ------------------------------------------------------

        /**
         * the tag of the element I generate is 'event'
         */
        public EventElementGenerator(  )
        {
            super( "event" );
        }

        //~ Methods -----------------------------------------------------------

        /**
         * return a String array of the names of my properties to output as
         * attributes
         */
        protected String[] getAttrNames(  )
        {
            String[] attrNames =
            { "event", "type", "location", "starttime", "endtime", "actor" };

            return attrNames;
        }

        /**
         * return a String array of the names of my properties to output as
         * children
         */
        protected String[] getChildNames(  )
        {
            String[] childNames = { "description" };

            return childNames;
        }
    }
}

Let's see that again: [iii] the context element generator

A class which makes simple elements out of namespaces. Often useful
- Not part of DOM or SAX - part of my own Jacquard toolkit
- There's no particular reason to use Jacquard
ContextElementGenerator
- A name to be used as an element name
- A list of names which are to be used as attributes
- A list of names which are to be used as child (text) elements
- Constructs element nodes to that specification
  - taking attribute and child values from a namespace passed to the generate method
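The idea in the list above can be sketched using nothing but the standard DOM API. This is not the Jacquard ContextElementGenerator: the class name MapElementGenerator is hypothetical, and a plain java.util.Map stands in for the namespace (Context) that the real class takes its values from.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical stand-in for ContextElementGenerator; NOT the Jacquard class. */
class MapElementGenerator {
    private final String tag;
    private final String[] attrNames;
    private final String[] childNames;

    MapElementGenerator(String tag, String[] attrNames, String[] childNames) {
        this.tag = tag;
        this.attrNames = attrNames;
        this.childNames = childNames;
    }

    /** Build one element in doc, taking values from the map (our 'namespace'). */
    Element generate(Document doc, Map<String, String> values) {
        Element elt = doc.createElement(tag);

        // names listed as attributes become attributes, if a value is present
        for (String name : attrNames) {
            String value = values.get(name);
            if (value != null) {
                elt.setAttribute(name, value);
            }
        }

        // names listed as children become child elements holding a text node
        for (String name : childNames) {
            String value = values.get(name);
            if (value != null) {
                Element child = doc.createElement(name);
                child.appendChild(doc.createTextNode(value));
                elt.appendChild(child);
            }
        }
        return elt;
    }

    /** An empty document to create elements in. */
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException ex) {
            throw new IllegalStateException("No DOM implementation available", ex);
        }
    }

    public static void main(String[] args) {
        Map<String, String> event = new LinkedHashMap<>();
        event.put("actor", "simon");
        event.put("location", "Yokohama, Japan");
        event.put("description", "Lecture, Java XML, all day");

        Element elt = new MapElementGenerator("event",
                new String[] {"actor", "location"},
                new String[] {"description"}).generate(newDocument(), event);
        System.out.println(elt.getTagName() + " actor=" + elt.getAttribute("actor"));
    }
}
```

Names missing from the map are simply skipped here; whether the real class does the same is not something these notes say.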

Let's see that again: [iv] the output

<?xml version="1.0"?>
 <eventsdiary
  date="Jul 18, 2000">
  <event
   actor="simon"
   endtime="5:30:00 PM"
   event="19"
   location="Yokohama, Japan"
   starttime="9:00:00 AM"
   type="Otherwise unavailable">
   <description>
    Lecture, Java XML, all day
   </description>
  </event>
 </eventsdiary>

Should be online here (login required). HTML formatted view here

Exercise period [ii]

We may skip this one if time's short or the group is struggling!

In groups: Try to write a Java application or Servlet which produces at least part of an XML document to your meeting DTD from your database

Transforming

Beginning XSL-T
Exercise period [iii]

Beginning XSL-T [i] The 'stylesheet'

    
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- Basic XSL stylesheet for day view of events diary.  -->

  <xsl:output indent="yes" method="html" 
          doctype-public="-//W3C//DTD HTML 4.0 Transitional//EN"/>

  <xsl:template match="eventsdiary">
    <html>
      <head>
    <title>
      Diary for <xsl:value-of select="@date" />
    </title>
    <link rel="StyleSheet" href="/styles/jacquard.css" type="text/css" 
      media="screen"/>
      </head>
      <body>
    <h1>
      Diary for <xsl:value-of select="@date" />
    </h1>
    <table>
      <tr>
        <th rowspan="2">
        Who
        </th>
        <th rowspan="2">
        Where
        </th>
        <th colspan="2">
        When
        </th>
        <th rowspan="2">
        What
        </th>
        <th rowspan="2">
        Details
        </th>
        <th rowspan="2">
        <a href="event">Add</a>
        </th>
      </tr>
      <tr>
        <th>
          Starts
        </th>
        <th>
          Ends
        </th>
      </tr>
      <xsl:apply-templates select="event" />
    </table>
      </body>
   </html>
  </xsl:template>

  <xsl:template match="event">
    <tr>
      <td>
    <xsl:value-of select="@actor"/>
      </td>
      <td>
    <xsl:value-of select="@location"/>
      </td>
      <td>
    <xsl:value-of select="@starttime"/>
      </td>
      <td>
    <xsl:value-of select="@endtime"/>
      </td>
      <td>
    <xsl:value-of select="@type"/>
      </td>
      <td>
    <xsl:value-of select="description"/>
      </td>
      <td>
    <a>
      <xsl:attribute name="href">event?event=<xsl:value-of 
        select="@event"/>
      </xsl:attribute>
      Edit
    </a>
      </td>
    </tr>
  </xsl:template>

</xsl:stylesheet>
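To try a stylesheet like this from Java, one option is the standard javax.xml.transform (JAXP) API. The sketch below is an assumption about how you might run it, not how the diary application does: the class name ApplyStylesheetSketch is hypothetical, and both the document and the stylesheet are cut-down versions held in strings so that the example stands alone.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical helper: apply an XSL-T stylesheet to an XML document. */
class ApplyStylesheetSketch {
    /** A cut-down diary document. */
    static final String DIARY_XML =
        "<eventsdiary date=\"Jul 18, 2000\">"
        + "<event actor=\"simon\" location=\"Yokohama, Japan\">"
        + "<description>Lecture, Java XML, all day</description>"
        + "</event></eventsdiary>";

    /** A cut-down stylesheet: one template per element type. */
    static final String DIARY_XSL =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"html\"/>"
        + "<xsl:template match=\"eventsdiary\">"
        + "<html><body><h1>Diary for <xsl:value-of select=\"@date\"/></h1>"
        + "<ul><xsl:apply-templates select=\"event\"/></ul></body></html>"
        + "</xsl:template>"
        + "<xsl:template match=\"event\">"
        + "<li><xsl:value-of select=\"@actor\"/>: "
        + "<xsl:value-of select=\"description\"/></li>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Run the stylesheet over the document and return the result as text. */
    static String transform(String xml, String xsl) {
        try {
            // compile the stylesheet...
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            // ...and apply it to the source document
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException ex) {
            throw new IllegalStateException("Transformation failed", ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(DIARY_XML, DIARY_XSL));
    }
}
```

In a real application the two StreamSource objects would more likely wrap files or URLs than strings.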

Beginning XSL-T [ii] The 'stylesheet' tag

<?xml version="1.0"?>

This says this stylesheet is written in XML; it should be the first line of every XML document
Yes, XSL is a dialect of XML
version="1.0" says it's version 1.0 of XML

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

Every XSL-T 'stylesheet' starts with this
xsl:stylesheet says it's a stylesheet
version="1.0" says it's version 1.0 of XSL
xmlns:xsl says the namespace of names which start with 'xsl:' is identified by the URL http://www.w3.org/1999/XSL/Transform

Beginning XSL-T [iii] comments

<!-- Basic XSL stylesheet for day view of events diary.  -->

Comments in XSL text are just like any other XML (or SGML) comments
- Start with <!-- and end with --> (the space matters)
Because they're comments, they don't appear in the output
To create comments in the output, use xsl:comment
- <xsl:comment>text of comment</xsl:comment>
- will produce <!--text of comment-->

Beginning XSL-T [iv] output specifier

  <xsl:output indent="yes" method="html" 
          doctype-public="-//W3C//DTD HTML 4.0 Transitional//EN"/>

The output specifier is not required
If it exists it must appear at top level
- as a child of the xsl:stylesheet element
indent="yes" says we want the output neatly indented to show structure
method="html" says we want the output to have html syntax
- might have been "xml" or "text"
doctype-public says include a DOCTYPE declaration of this DTD
There are a number of other possible attributes.

Beginning XSL-T [v] declaring a template

  <xsl:template match="eventsdiary">

This template matches every instance of the element eventsdiary which is found in the document being processed. As eventsdiary is the root element of the document type we're interested in, there will only be one.

    <html>
      <head>
        <title>

As you can see, what is in the template is just the HTML markup that will be output (if we were outputting XML, it would be XML, of course)...

      Diary for <xsl:value-of select="@date" />

with scattered among it special xsl tags which cause things to be spliced into the output. This one says 'use the value of the date attribute of the current element'

        </title>
      </head>
      <body>
        <h1>
          Diary for <xsl:value-of select="@date" />
        </h1>
        <table>
          <tr>
            <th rowspan="2">
              Who
            </th>
            <th rowspan="2">
              Where
            </th>
            <th colspan="2">
              When
            </th>
            <th rowspan="2">
              What
            </th>
            <th rowspan="2">
              Details
            </th>
            <th rowspan="2">
              <a href="event">Add</a>
            </th>
          </tr>
          <tr>
            <th>
              Starts
            </th>
            <th>
              Ends
            </th>
          </tr>
          <xsl:apply-templates select="event" />

This is the important one. It says "apply the templates in this stylesheet to all the instances of event elements which are children of the current node".

        </table>
      </body>
   </html>
  </xsl:template>

Beginning XSL-T [vi] other useful bits

<xsl:template match="section[ @slot='main']">

This template will match only section elements which have an attribute named slot whose value is main

  <p>
    <xsl:call-template name="toc"/>

Paste in the output of the named template called toc.

  </p>
  <xsl:apply-templates select="section">
    <xsl:sort select="title"/>

Apply templates in this stylesheet to sections which are children of this section, sorted alphabetically by their title sub-element

  </xsl:apply-templates>
</xsl:template>

<xsl:template name="toc">

This is the named template which was called earlier. Most templates are not named: they are applied automatically if their patterns match an element

  <xsl:for-each select="section">

for-each iterates over matching elements in turn

      <xsl:sort select="title"/>
        <a>
              <xsl:attribute name="href">#<xsl:value-of select="title"/>
                </xsl:attribute>

xsl:attribute allows us to construct the value of an attribute of the enclosing tag

          <xsl:value-of select="title"/>
      </a> |
  </xsl:for-each>
</xsl:template>
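A minimal sketch of the named template, for-each and sort working together, run through the standard javax.xml.transform API. The class name TocSketch, the sample document and the use of method="text" (so the result is easy to read) are all assumptions made for this illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical demo of xsl:call-template, xsl:for-each and xsl:sort. */
class TocSketch {
    /** A main section with three sub-sections, deliberately out of order. */
    static final String SECTIONS_XML =
        "<section slot=\"main\">"
        + "<section><title>Zebra</title></section>"
        + "<section><title>Apple</title></section>"
        + "<section><title>Mango</title></section>"
        + "</section>";

    static final String TOC_XSL =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        // matched by pattern: only the section whose slot is 'main'
        + "<xsl:template match=\"section[@slot='main']\">"
        + "<xsl:call-template name=\"toc\"/>"
        + "</xsl:template>"
        // called by name: lists the child sections, sorted by title
        + "<xsl:template name=\"toc\">"
        + "<xsl:for-each select=\"section\">"
        + "<xsl:sort select=\"title\"/>"
        + "<xsl:value-of select=\"title\"/><xsl:text>|</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Apply the stylesheet and return the text it produces. */
    static String transform(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException ex) {
            throw new IllegalStateException("Transformation failed", ex);
        }
    }

    public static void main(String[] args) {
        // titles come out in alphabetical order, not document order
        System.out.println(transform(SECTIONS_XML, TOC_XSL));
    }
}
```

Note that the called template runs with the same current node as its caller, which is why select="section" inside toc still finds the children of the main section.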

XSL-T elements: reprise

xsl:output: allows us to define how we want the output to be formatted
xsl:template: defines what should be output for elements matching a given pattern
xsl:apply-templates: applies the templates to the elements which match its pattern
xsl:call-template: calls a template with a particular name, overriding the pattern-matching system
xsl:for-each: produces output iteratively, overriding the pattern-matching system
xsl:sort: orders the result of its enclosing element (an xsl:apply-templates or an xsl:for-each)
xsl:value-of: produces the value of the thing matched by its pattern
xsl:attribute: outputs an attribute for the output element which encloses it

There are a few more XSL elements, but these will do most things for you.

Beginning XSL-T [vii]: Patterns

*: matches any element
foo: matches any element whose type is foo
foo | bar: matches any element whose type is foo or bar
foo/bar: matches any bar element with a foo parent
foo//bar: matches any bar element with a foo ancestor
foo[ @bar='baz']: matches any foo element which has a bar attribute which has the value baz
foo[1]: matches any foo element which is the first foo child of its parent
foo[ position() = 1]: the same as foo[1] written out in full; matches any foo element which is the first foo child of its parent
*[ position() < 5]: matches any element which is the first, second, third or fourth element child of its parent
text(): matches any text node.

This is just the basics. The full definition is here
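One way to get a feel for these is to evaluate them with the standard javax.xml.xpath API. Strictly, that API evaluates XPath expressions relative to a context node rather than XSL-T match patterns, so this is only an approximation: here each expression is evaluated with the root element as the context node. The class name XPathPlayground and the sample document are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical playground for trying expressions against a small document. */
class XPathPlayground {
    static final String SAMPLE =
        "<root><foo bar=\"baz\">a</foo><foo>b</foo>"
        + "<bar>c</bar><foo bar=\"qux\">d</foo></root>";

    /** Evaluate expression relative to the root element; return the text of each node selected. */
    static List<String> select(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression,
                    doc.getDocumentElement(), XPathConstants.NODESET);

            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (ParserConfigurationException | SAXException
                | IOException | XPathExpressionException ex) {
            throw new IllegalStateException("Could not evaluate " + expression, ex);
        }
    }

    public static void main(String[] args) {
        String[] expressions = {
            "*", "foo", "foo | bar", "foo[@bar='baz']",
            "foo[1]", "foo[position() = 1]", "*[position() < 3]"
        };
        for (String expression : expressions) {
            System.out.println(expression + " selects " + select(SAMPLE, expression));
        }
    }
}
```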

XSL-T: A deceptively simple language

Not many elements
Simple to learn all of them
Very subtle in use
The power is in the patterns

Exercise period [iii]

In groups: Write an XSL-T stylesheet which produces an HTML agenda for your group's Meeting DTD.
Everyone together: negotiate and agree a new, common DTD which you can use to communicate meeting information between your groups
In groups: Write an XSL-T stylesheet which produces a document conforming to the common DTD from a document conforming to the group's DTD.

Communicating

Just a bit about transport
XML Parsers
Parsing XML into the Database
Parsing: Simple worked example
Exercise period [iv]

Just a bit about transport

XML is about the content of communication, not how it's sent...
But how do you send XML information?
- HTTP GET to get information from a known place
- HTTP POST or PUT to send information to a known place
- Special purpose listener daemons with special purpose protocols
- eMail

Parsers

read a document from some source,
construct a representation of that document in the machine
or provide the hooks to allow you to do so

Parsing is quite compute-intensive - don't do it if you don't have to!

More about parsers [i] types

Event-based parsers
- You register handlers for parsing events you are interested in
- The parser calls these handlers when it sees the events
- Useful if you only want some of the information out of the document
- Useful if the document might use more memory than you have available
- Quite a lot of work to set up.
Document parsers
- Usually built on event-based parsers
- Parse the whole document and provide you with a handle on an internal representation of it
  - Usually a DOM document object
- Useful if you want all the information out of the document

More about parsers [ii] types

Validating parsers
- Read the DTD (or schema)
- Read the document
- If the document isn't valid according to the DTD, report this
- Good if you're making sure your document conforms to the dialect standard
Non validating parsers
- Don't read the DTD (or schema)
- Read the document
- Will still throw an error if the document has bad syntax
- Good if you just want to parse XML quickly
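The difference can be seen with the standard DocumentBuilderFactory, which has a switch for validation. In this sketch the class name ValidationDemo, the Result values and the tiny DTD (carried in the document's own DOCTYPE) are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Hypothetical demo of validating versus non-validating parsing. */
class ValidationDemo {
    enum Result { OK, INVALID, MALFORMED }

    static final String DTD =
        "<!DOCTYPE workshop ["
        + "<!ELEMENT workshop (attendee*)>"
        + "<!ATTLIST workshop title CDATA #REQUIRED>"
        + "<!ELEMENT attendee EMPTY>"
        + "<!ATTLIST attendee name CDATA #REQUIRED>"
        + "]>";

    /** Conforms to the DTD. */
    static final String VALID =
        DTD + "<workshop title=\"Parsing XML\"><attendee name=\"Jane Doe\"/></workshop>";

    /** Good syntax, but 'stranger' is not in the DTD. */
    static final String NOT_VALID =
        DTD + "<workshop title=\"Parsing XML\"><stranger/></workshop>";

    /** Bad syntax: the attendee tag is never closed. */
    static final String BROKEN = "<workshop><attendee></workshop>";

    static Result parse(String xml, boolean validating) {
        final boolean[] valid = {true};
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(validating);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException ex) { /* ignore */ }
                // validity problems are reported here; parsing carries on
                public void error(SAXParseException ex) { valid[0] = false; }
                // syntax problems are fatal; parsing stops
                public void fatalError(SAXParseException ex) throws SAXException { throw ex; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException ex) {
            return Result.MALFORMED;
        } catch (ParserConfigurationException | IOException ex) {
            throw new IllegalStateException("Could not run the parser", ex);
        }
        return valid[0] ? Result.OK : Result.INVALID;
    }

    public static void main(String[] args) {
        System.out.println("validating, valid document:         " + parse(VALID, true));
        System.out.println("validating, invalid document:       " + parse(NOT_VALID, true));
        System.out.println("non-validating, invalid document:   " + parse(NOT_VALID, false));
        System.out.println("non-validating, bad syntax:         " + parse(BROKEN, false));
    }
}
```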

Parsing from XML into the database

Walk recursively down the document tree
identifying the elements we want to store
for each one, see if it's already there (tricky!)
if not, store it.

Identifying the data to store

The attributes of an element are a namespace
So are the fields of a table
If you have one table for every element type
- and one field in that table for every attribute that element can have
It's relatively easy
The real world isn't often like that
- the overall structures of XML and relational databases are quite different
- most serious databases have been around a long time, we can't just design them to fit our DTD
- most DTDs are agreed between large numbers of organisations, we can't just design them to fit our database
- but it may be coerced with a little help from XSL...
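The easy case, one table per element type and one field per attribute, can be sketched with the standard DOM API alone. Everything named here is hypothetical: the class AttributesAsNamespace, the use of a java.util.Map as the namespace, and the assumption that table and column names are the same as the element and attribute names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical sketch: treat an element's attributes as a namespace. */
class AttributesAsNamespace {
    /** Copy every attribute of the element into a map, sorted by name. */
    static Map<String, String> attributesOf(Element elt) {
        Map<String, String> values = new TreeMap<>();
        NamedNodeMap attrs = elt.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            values.put(attr.getNodeName(), attr.getNodeValue());
        }
        return values;
    }

    /**
     * Build a parameterised insert statement, ASSUMING the table has one
     * column per attribute and the column names match the attribute names.
     */
    static String insertSql(String table, Map<String, String> values) {
        String columns = String.join(", ", values.keySet());
        String marks = String.join(", ", Collections.nCopies(values.size(), "?"));
        return "insert into " + table + " (" + columns + ") values (" + marks + ")";
    }

    /** Parse a string and return its root element. */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            throw new IllegalStateException("Parse failed", ex);
        }
    }

    public static void main(String[] args) {
        Element attendee = parseRoot(
            "<attendee name=\"Jane Doe\" age=\"42\" sex=\"F\" country=\"US\"/>");
        Map<String, String> values = attributesOf(attendee);
        System.out.println(values);
        System.out.println(insertSql("ATTENDEE", values));
    }
}
```

The values would then be bound to the question marks through a java.sql.PreparedStatement. When the table and the element do not line up this neatly, this is the point at which the mapping has to be written by hand or the document reshaped first with XSL.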

Other things to bear in mind

Text nodes - what do you do with them?
Context - what was the key value of that meeting we just stored?

Parsing: very simple worked example

Sample XML document
Sample Java class

Sample XML document

<?xml version="1.0"?>
<workshop tutor="Simon Brooke" 
  title="Parsing XML" venue="small">
  <attendee name="Jon Smith" age="37" 
    sex="M" country="UK" />
  <attendee name="Jane Doe" age="42" 
      sex="F" country="US" />
</workshop>

Those who were here yesterday will probably recognise this from the 'WORKSHOP' database. I'm using this because I can't predict what your 'MEETING' databases will look like.

Sample Java class

import java.io.*;       // to read things from the user
import java.sql.*;      // to talk to the database
import uk.co.weft.domutil.*;    // things to convert elements to namespaces
import uk.co.weft.dbutil.*; // things to store namespaces in databases
import org.w3c.dom.*;       // interrogates a DOM tree...
import org.apache.xerces.dom.*; // using Apache's DOM implementation
import org.apache.xalan.xslt.*; // Apache's XSL processor
import org.apache.xerces.parsers.DOMParser;
                // and Apache's XML parser


public class ParseExample
{
    static Context connectionContext = new Context();
                // a context to hold database
                // connection details

    /** walk down a document tree looking for nodes we recognise */
    public static void walk( Node node)
    throws SQLException, DataStoreException
    {
    if ( node.getNodeType() == Node.ELEMENT_NODE)
        {
        Element elt = ( Element) node;

        System.out.println( "Considering element of type " +
                    elt.getTagName());

        if ( elt.getTagName().equals( "workshop"))
            handleWorkshop( elt);
        else
            {
            NodeList children = elt.getChildNodes();

            for ( int i = 0; i < children.getLength(); i++)
                walk( children.item( i));
                // recurse down through the children
            }
        }
    }


    /** handle a workshop element; extract its attribute (and
     *  actually, its text-only child) values, and store them in the
     *  database. Then look for attendees.*/
    protected static void handleWorkshop( Element elt) 
    throws SQLException, DataStoreException
    {
    Object key = null;

    Context c = ( Context)connectionContext.clone(); 
                // construct a new namespace with just
                // the database connection details in
                // it
    ContextElement.populateContext( elt, c);
                // fill it with values from the element

    TableDescriptor workshopDescriptor = 
        TableDescriptor.getDescriptor( "WORKSHOP", "Workshop", c);
                // get a descriptor on the WORKSHOP table

    Contexts rows = workshopDescriptor.match( c);
                // try to match that against what's
                // already in the table

    if ( rows != null && rows.size() > 0)
        {           // there was a match
        key = ( ( Context)rows.get( 0)).getValueAsInteger( "Workshop");
                // get its primary key value
        System.out.println( "Found workshop " + key.toString());
        }
    else
        {
        key = workshopDescriptor.store( c);
                // store it and get its primary key value
        System.out.println( "Created workshop " + key.toString());
        }

    NodeList children = elt.getChildNodes();

    for ( int i = 0; i < children.getLength(); i++)
        {           // look through the children for my attendees
        Node child = children.item( i);

        if ( child.getNodeType() == Node.ELEMENT_NODE &&
             ( ( Element) child).getTagName().equals( "attendee"))
            {
            handleAttendee( ( Element)child, key);
            }
        }
    }

    /** handle an attendee element by finding or storing it in the
     *  database, and fixing up the link table */
    protected static void handleAttendee( Element elt, Object workshopKey)
    throws SQLException, DataStoreException
    {
    Object attendeeKey = null;

    Context c = ( Context)connectionContext.clone(); 
                // construct a new namespace with just
                // the database connection details in
                // it
    ContextElement.populateContext( elt, c);
                // fill it with values from the element

    TableDescriptor attendeeDescriptor = 
        TableDescriptor.getDescriptor( "ATTENDEE", "Attendee", c);
                // get a descriptor on the ATTENDEE table

    Contexts rows = attendeeDescriptor.match( c);
                // try to match that against what's
                // already in the table

    if ( rows != null && rows.size() > 0)
        {           // there was a match
        attendeeKey = 
            ( ( Context)rows.get( 0)).getValueAsInteger( "Attendee");
                // get its primary key value
        System.out.println( "Found attendee " + 
                    attendeeKey.toString());
        }
    else
        {
        attendeeKey = attendeeDescriptor.store( c);
                // store it and get its primary key value
        System.out.println( "Created attendee " + 
                    attendeeKey.toString());
        }

    String q = "insert into ATTENDANCE ( Attendee, Workshop) values ("
        + attendeeKey.toString() + ", " + workshopKey.toString() + ")";

    Connection conn = c.getConnection();
    Statement s = conn.createStatement();
                // set up a database connection

    s.executeUpdate( q);    // run the statement
    System.out.println( "Inserted link into link table");

    s.close();      // close it...
    c.releaseConnection( conn);
                // and release it back into the pool
    }

    /** prompt the user for input; if we get any, return it */
    protected static String maybeGetFromUser( BufferedReader in, String prompt,
                       String val) throws IOException
    {
    System.out.print( prompt + " ] ");

    String s = in.readLine();

    if ( s != null && s.trim().length() > 0)
        val = s.trim();     // only override the default if the user typed something
    
    return val;
    }

    /** start me up... */
    public static void main(String args[]) 
    {
    BufferedReader in = new 
        BufferedReader( new InputStreamReader( System.in));

                // get from the user the name of the
                // database driver to use
    try
        {
        Class.forName( 
              maybeGetFromUser( in, "Database Driver", 
                    "sun.jdbc.odbc.JdbcOdbcDriver"));

                // get from the user the details
                // needed to connect to the database
        connectionContext.put( "db_url", 
                   maybeGetFromUser( in, "Database URL", 
                         "jdbc:odbc:workshop"));
        connectionContext.put( "db_username", 
                   maybeGetFromUser( in, "Database Username", 
                         "nobody"));
        connectionContext.put( "db_password", 
                   maybeGetFromUser( in, "Database Password", 
                         "doesntmatter"));


        DOMParser p = new DOMParser();
            
        p.parse( maybeGetFromUser( in, "URL of XML to handle", 
                         "file:workshop.xml"));

        walk( p.getDocument().getDocumentElement());

        System.exit( 0); // all satisfactory
        }
    catch ( Exception e)
        {
        System.out.println( "Failed: " + e.getClass().getName() +
                    ": " +e.getMessage());
        System.exit( 1); // whoops
        }
    }
}

Exercise period [iv]

In your groups
- Write an XSL-T stylesheet that converts back from the common DTD to the group's DTD
- Adapt the above Java class to store (at least part of) documents in your group's DTD into your database

References

XML

news:comp.text.xml

Newsgroup for XML - recommended

FAQs, Directories and Resources

Extensible Markup Language (XML): http://www.oasis-open.org/cover/xml.html: A useful and authoritative overview of the technology; another good place to start.
Frequently Asked Questions about the Extensible Markup Language: http://www.ucc.ie/xml/: The most superior FAQ. Everyone seriously interested in XML should start here.
SCHEMA.NET: The XML Schema Site: http://www.schema.net/
Cafe con Leche XML News, and Resources: http://metalab.unc.edu/xml/index.html
DEVELOPERLIFE.COM brought to you by Nazmul Idris.: http://developerlife.com/
xmlTree - The leading directory of XML content on the Web: http://www.xmltree.com/

News

Welcome to XMLNews.org: http://www.xmlnews.org/
Mulberry Technologies, Inc.: XSL-List -- Open Forum on XSL: http://www.mulberrytech.com/xsl/xsl-list/
XMLephant: News: http://www.xmlephant.com/pages/News/
XML.ORG - A good XML Portal: http://www.xml.org/
XML.com - Another good XML portal: http://www.xml.com/pub

Standards

Authoritative sources of standards documents, mostly from the World Wide Web Consortium (W3C)

Core standards

The Annotated XML Specification: http://www.xml.com/axml/testaxml.htm: The standard, annotated with the personal comments of one of its editors -- very revealing!
Extensible Markup Language (XML) 1.0: http://www.w3.org/TR/1998/REC-xml-19980210
XML Linking Language (XLink): http://www.w3.org/TR/WD-xlink#addressing

Resource Description Framework

W3C Resource Description Framework: http://www.w3.org/RDF/
java tutorial help resource only at gamelan.com: http://www.gamelan.com/journal/techfocus/090199_rdf1.html
UKOLN: DC-dot, A Dublin Core Generator: http://www.ukoln.ac.uk/metadata/dcdot/
Dublin Core Metadata Initiative / Documents / Proposed Recommendations / Dublin Core Element Set, Version 1.1: http://purl.org/DC/documents/rec-dces-19990702.htm
Dublin Core Metadata Initiative: http://purl.org/dc/index.htm
UKOLN Metadata Resources - DC: http://www.ukoln.ac.uk/metadata/resources/dc/
Welcome to XMLNews.org: http://www.xmlnews.org/

XSL

XSL Transformations (XSLT) Specification: http://www.w3.org/TR/WD-xslt

DocBook

The nwalsh.com Home Page - XSL DocBook Stylesheets: http://nwalsh.com/docbook/xsl/
XSL DocBook Stylesheets: http://nwalsh.com/docbook/xsl/

WML

WAP WAP Binary XML (WBXML) Encoding Specification: http://www.w3.org/TR/wbxml/
Welcome to WAP School: http://www.refsnesdata.no/wap/default.asp
Nokia WAP Developer Forum: Nokia WAP Toolkit: http://www.forum.nokia.com/wapforum/main/1,6668,1_1_3_2,00.html

RSS: Rich Site Summary

Tutorials

My Netscape Network: http://my.netscape.com/publish/
Using RSS News Feeds - Webreference.com: http://www.webreference.com/perl/tutorial/8/

Feed Directories

Webfeeds: http://www.stirbitch.com/cgi-bin/agg/sources.pl
Moreover... Top stories: http://w.moreover.com/
StartsHere Channel List: http://theweb.startshere.net/channels.phtml
Open Directory - Computers: Internet: WWW: Web Portals: Netscape Netcenter: My Netscape Network: http://dmoz.org/Computers/Internet/WWW/Web_Portals/Netscape_Netcenter/My_Netscape_Network/

Internet Alchemy : Internet Alchemy : RSSMaker: http://internetalchemy.org/rss/index.phtml

xmlTree - The leading directory of XML content on the Web: http://www.xmltree.com/rss/index.htm

XML.COM - Standards List Sorted by Date: http://www.xml.com/xml/pub/standate/

W3C Scalable Vector Graphics (SVG): http://www.w3.org/Graphics/SVG/

VML - the Vector Markup Language: http://www.w3.org/TR/1998/NOTE-VML-19980513

Vector (infinitely zoomable) graphics for the Web, with implications especially for maps and technical diagrams.

News Industry Text Format: http://www.nitf.org/

Meta Content Framework Using XML: http://www.w3.org/TR/NOTE-MCF-XML/

'Content about content' - i.e. information for search and indexing engines and other software agents which must make some sense of the document.

Audio, Video, and Synchronized Multimedia: http://www.w3.org/AudioVideo/

The SMIL standard. I believe SMIL has implications not just for the Web, but for all sorts of presentation media including digital television.

XHTML 1.0: The Extensible HyperText Markup Language: http://www.w3.org/TR/WD-html-in-xml/

Backwards compatibility: implementing HTML in XML. Only very well written HTML is going to work!

XML Catalog proposal: http://www.ccil.org/~cowan/XML/XCatalog.html

XHTML 1.0: The Extensible HyperText Markup Language: http://www.w3.org/TR/xhtml1/

Template Resolution in XML/HTML: http://www-uk.hpl.hp.com/people/ak/doc/trix.html

eXtensible Server Pages (XSP) Layer 1: http://java.apache.org/cocoon/xsp/WD-xsp.html

Workflow Management Coalition: http://www.aiim.org/wfmc/mainframe.htm

DSML.ORG: The Standards Effort to Link Directories with XML: http://www.dsml.org/

Tutorials

Info for Newcomers to XML at XMLINFO: http://www.xmlinfo.com/newcomers/
Producing HTML tables with XSLT: http://www.cogsci.ed.ac.uk/~dmck/xslt-tutorial.html
A Tutorial in XML and XSL Authoring: http://pdbeam.uwaterloo.ca/~rlander/XML_Tutorial/
Java & XML: 1 + 1 > 2: http://www.sun.com.au/sjug/pres/xml/JavaAndXML/seminar.html#Slide3
The WDVL: XML Tutorials: http://www.wdvl.com/Authoring/Languages/XML/Tutorials/
Generally Markup: XML Resources: http://pdbeam.uwaterloo.ca/~rlander/XML_Tutorial/
developerWorks : XML : Education: http://www.software.ibm.com/developer/education/xmlintro/xmlintro.html
SGML/XML: Using Elements and Attributes: http://www.oasis-open.org/cover/elementsAndAttrs.html
Welcome to XML School: http://www.refsnesdata.no/xml/
Practical XML : An introduction to XML and XSL stylesheets: http://www.kst.com/articles/2000/January/practical_xml1/index.php
Crane Softwrights Ltd. - Training: http://www.CraneSoftwrights.com/training/index.htm#ptux-dl
developerWorks : XML : Education: http://www-4.ibm.com/software/developer/education/xmlintro/xmlintro.html
RSS Tutorial: http://my.netscape.com/publish/help/mnn20/quickstart.html#rsssyntax
XML DTD Tutorial: http://www.xml101.com/dtd/

Software resources

Editors

Editing SGML with Emacs and PSGML - Table of Contents: http://rainbow.ldeo.columbia.edu/documentation/programs/psgml/psgml_toc.html#SEC2
A GNU Emacs mode for SGML files: http://www.lysator.liu.se/projects/about_psgml.html: This is what I use and recommend (I personally use XEmacs rather than GNU Emacs)
SoftQuad XMetaL: http://www.softquad.com/index_main.html
Mulberry Technologies -- tdtd Emacs Major Mode for SGML and XML DTDs: http://www.mulberrytech.com/tdtd/
Download Morphon XML Editor 1.0b41: http://www.lunatech.com/products/morphon-xml-editor/download/

Browsers

Jumbo: http://ala.vsms.nottingham.ac.uk/vsms/java/jumbo/
Doczilla: http://www.doczilla.com/download/index.html
XML Viewer : another alphaWorks technology: http://www.alphaworks.ibm.com/tech/xmlviewer
InDelv: http://www.indelv.com/

XML to HTML on the fly

IBM XML Web Site, Education - Accessing XML on the Client: http://www.software.ibm.com/xml/education/client/client.html
Apache Cocoon: http://xml.apache.org/cocoon/: Apache is the world's most widely used Web server. This is the Apache project's server-side XML to HTML conversion strategy, important for serving XML documents while many browsers are still unable to interpret it. Implemented as a Java Servlet, may work with other Servlet enabled Web servers (but then does anyone serious use anything other than Apache anyway?)

XML Database integration

DB2XML A tool for transforming relational databases into XML documents: http://www.informatik.fh-wiesbaden.de/~turau/DB2XML/index.html
Tamino - The Information Server for Electronic Business, Software AG: http://www.softwareag.com/tamino/: A database which claims to store XML directly. Whether this means that it's really an object-oriented database underneath I'm not sure.
ODBC2XML: Merging ODBC data into XML documents: http://members.xoom.com/_XOOM/gvaughan/odbc2xml.htm
pgxml homepage: http://www.morinel.demon.nl/pgxml/: My favourite database engine, Postgres,
XML Lightweight Extractor : another alphaWorks technology: http://alphaworks.ibm.com/tech/xle

Conversion tools and filters

RTF2XML: http://www.xmeta.com/omlette/: Tool for converting RTF to XML, written in Omnimark
OmniMark Technologies Corporation: http://www.omnimark.com/: A programming language for manipulating data streams, useful in writing conversion filters from other formats into XML.

Quick ways to produce DTDs

DTDGenerator Frontend: http://www.pault.com/Xmltube/dtdgen.html
DB2XML A tool for transforming relational databases into XML documents: http://www.informatik.fh-wiesbaden.de/~turau/DB2XML/index.html
schematron: http://www.ascc.net/xml/resource/schematron/schematron.html: Widely recommended as a very powerful and elegant solution, knows about schemas as well as DTDs.
XMLschema.com: http://apps.xmlschema.com/

Structured Search tools

Downloading sgrep: http://www.cs.helsinki.fi/~jjaakkol/sgrep/download.html: Probably the most powerful simple tool for manipulating SGML and XML documents

Software collections and directories

xml.apache.org: http://xml.apache.org/
XMLSOFTWARE.COM: The XML Software Site: http://www.xmlsoftware.com/: This (commercial) site tries to keep track of XML related software tools which are available. Likely not to effectively index open source tools in the longer term.
Free XML software: http://www.stud.ifi.uio.no/~larsga/linker/XMLtools.html#SC_XSL

IBM Developers: XML : Overview: http://www.ibm.com/developer/xml/

eXtensible Server Pages (XSP) Layer 1: http://java.apache.org/cocoon/xsp/WD-xsp.html

OpenXML: http://www.openxml.org/: Major open source project to provide XML tools in Java

PHP3: Manual: XML Parser Functions: http://www.php.net/manual/ref.xml.php3: PHP is a server-side scripting language, probably the best of the open source ones available. This manual section shows how the PHP project intends to handle XML at the server side, and is thus an alternative to Apache's Cocoon technology.

XML Authority Product Overview: http://www.extensibility.com/xml_authority/xml_ath_specs.htm

eidon products - Solutions for Structured Documents: http://www.eidon-products.com/

Dynamic XML for Java : another alphaWorks technology: http://www.alphaworks.ibm.com/tech/dynamicxmlforjava

XML Products Evaluation Form: http://www.bluestone.com/scripts/SaApps/SaCGI.exe/XMLevaluate.class

XML Script - XML tools for E-commerce: http://www.xmlscript.org/

SAX: The Simple API for XML: http://www.megginson.com/SAX/

Activated Intelligence Rocks Your Java World!: http://www.activated.com/

W4F, the World Wide Web Wrapper Factory: Welcome: http://db.cis.upenn.edu/W4F/

JDOM: Who We Are: http://www.jdom.org/credits/index.html

Commentary and background

XML, Java, and the future of the Web: ftp://sunsite.unc.edu/pub/sun-info/standards/xml/why/xmlapps.html
Scientific American: Feature Article: XML and the Second Generation Web: May 1999: http://www.scientificamerican.com/1999/0599issue/0599bosak.html: An extremely clear and well written article
DevEdge Online - Metadata: http://developer.netscape.com/tech/metadata/index.html: Netscape's official take on metadata.
XML.COM - XML support in IE5: http://www.xml.com/xml/pub/1999/03/ie5/first-x.html: XML.com sets out to be a newsletter on XML and related developments. Its contributors are in general exceptionally well informed. In this article Tim Bray (who works closely with Netscape) reviews Microsoft IE5's XML compatibility.
CNET News.com - Taking sides on XML: http://www.news.com/News/Item/0,4,37072,00.html
XML Namespaces: http://www.jclark.com/xml/xmlns.htm
The Last Page: XML's Achilles Heel (Web Techniques, June 1999): http://www.webtechniques.com/archives/1999/06/lastpage/

XML EDI and e-Commerce stuff

A number of competing proposals are being developed to do automatic business-to-business transfer of invoices, orders, et cetera.

CNET.com - News - Services & Consulting - Big-name chemical firms join business e-commerce trend: http://news.cnet.com/news/0-1008-200-1579569.html?tag=st

Collaborative initiatives

The OBI Consortium: http://www.openbuy.org/: A solid business community consortium
Welcome to RosettaNet: http://www.rosettanet.org/: Probably the most incompetent and unprofessional Web site I've ever seen. This organisation claims to be the hub of EDI in XML development, but their Web site gives no comfort whatever regarding their competence.
Biztalk - Letting computers speak the language of business: http://www.biztalk.org/: Microsoft's tame e-Commerce consortium.
FpML.org: http://www.fpml.org/: JP Morgan - PriceWaterhouseCoopers initiative, apparently mainly aimed at financial services.
Electronic Business XML (ebXML) Home Page: http://www.ebXML.org/

Suppliers

DEDIOUX - Dynamic EDI Objects Using XML: http://www.americancoders.com/OpenBusinessObjects
ariba.com - welcome: http://www.ariba.com/
Welcome To OpenLink Software: http://www.openlinksw.com/virtuoso/

Stories

XML Applications Stand Up To EDI: http://www.techweb.com/wire/story/TWB19990416S0002
XML Applications Stand Up To EDI: http://www.techweb.com/se/directlink.cgi?INW19990419S0014: News story about Dell Computer's XML
CNET News.com - IBM links business software, e-commerce: http://www.news.com/News/Item/0,4,35128,00.html: News story about IBM's XML e-Commerce

WAP/WML

WAP Binary XML (WBXML) Encoding Specification: http://www.w3.org/TR/wbxml/
wml-tools: http://www.pwot.co.uk/wml/
www.kannel.org: http://www.kannel.org/

XML Icon Gallery: http://www.iol.ie/~alank/xml/icons.htm

Moving to XML

What we're going to do today

How we're going to get there

Before we start: what do you know

Before we start: Namespace

A brief chautauqua on language

Words

Sentences

Language

Words

Sentences

Meta-Language [1]

Meta-Language [2]

HTML [1]

HTML [2]

Well Formedness

What is XML

Key Features

Differences from HTML

Extensible: what does this mean for you?

Extensible: a simple example [1]

Extensible: a simple example [2]

Strictly parsed: what does this mean for you? [1]

Strictly parsed: what does this mean for you? [2]

Differences from SGML

A bit about the other bits

A bit about the other bits [ii]: XLink

A bit about the other bits [iii]: XPath and XPointer

A bit about the other bits [iv]: XSL-T

Digression: Visual Appearance and Stylesheets [i]

Visual Appearance and Stylesheets [ii]

Visual Appearance and Stylesheets [iii]: Status of XSL

Visual Appearance and Stylesheets [iv]: XSL Summary

Visual Appearance and Stylesheets [v]: What about CSS?

A bit about the other bits [v]: XML Schemas

Digression: Dialects of XML

What is a Document Type Definition?

What about Schemas? (Schemata?)

More about Schemas [i]: benefits

More about Schemas [ii]: examples

More about Schemas [iii]: conversion

Do I have to use a DTD or Schema?

What DTDs and Schemas are available?

Who will write DTDs and Schemas? [i]

Who will write DTDs and Schemas? [ii]

Ownership of DTDs: Communities vs single vendors

Another cautionary tale about software vendors

A bit about the other bits [vi]: SOAP

More about SOAP

XML in your context

Applications which benefit greatly from XML

Applications which will benefit little from XML

XML in action: content syndication

What is content syndication

History of Syndication

Standards for Syndication

Offering Syndication

Incorporating Syndication [i]

Incorporating Syndication [ii]: Sample code

Internet Europe news

Aggregation

Worked Example: a meeting arranger system

Creating an example document (quite easy)

Creating the DTD and/or Schema (hard, but we'll use a trick)

Viewing it: creating a style-sheet (harder)

Using it: applications

Specifying

The Structure of an XML document

Overall Structure

Processing Instructions

XML Namespaces

Elements [i]

Elements [ii]

Attributes

When to use which

Exercise period [i]

Creating

Building XML applications: tools and technologies

Why Java?

Other languages for building XML applications