Moving to XML
    
    Author and presenter: Simon
    Brooke.
    The full text of this presentation is online at <URL:
    http://www.jasmine.org.uk/~simon/bookshelf/courses/xml/>
    Written May 2001; last revised 30th June 2004
      Note: the original version of this text had
      a horrible error of understanding in the use of the
      org.w3c.dom package. My apologies to all those
      misled by it.  This remains, essentially, a three-year-old
      course; I have fixed the major error but have not otherwise
      brought it up to date. Bits of it are still worth reading.
    Changes to the presentation since your handouts
    were printed are highlighted like this.
    
      Simon Brooke, 21
      Main Street, Auchencairn, DG7 1QU, Scotland.
    
    
    
What we're going to do today
    
      - What is XML
          - A brief chautauqua on language
          - What is XML
          - A bit about the other bits
          - XML in your context
      - Anatomy of an XML system
          - Specifying
          - Creating
          - Transforming
          - Communicating
    
How we're going to get there
    
      - The morning, mostly talking
- This afternoon, mostly doing it. We have an awful lot to
      get in!
Breaks, meals, fire exits, nearest WCs, how
    to get to coffee?
 
    
    
Before we start: what do you know
    We've got a lot of ground to cover... Just so I can know
    which bits to concentrate on and which bits to skip, what do
    you know about...
    
      
        
          |  | nothing | a little | can do it | expert | 
        
          | HTML |  |  |  |  | 
        
          | Java |  |  |  |  | 
        
          | XML |  |  |  |  | 
      
    
    
    
Before we start: Namespace
    
      - A context in which there are things with names
- Each thing in the namespace has a different name
- You can address a thing in the namespace just by using
      its name
- 
        Examples 
        
      
- This is a powerful concept, and I'm going to use it a lot
      - but it has special meaning in XML. Be clear about when I'm
      talking about an 'XML namespace' and when I'm just
      talking about a namespace!
[get participants to write their names
    on bits of paper. Make sure there are not two in the class with
    the same name. If there are, get them to add something to their
    names to disambiguate]
    
    
      A brief chautauqua on language
    
    
      Words
    
    
    We can recognise words as belonging to a
    language because we know them... (sometimes, we can recognise
    words as belonging to a language even when we don't know them,
    because they sound right).
    
    
Sentences
    
      - 
        Colourless green ideas sleep furiously. 
        
      
- 
        Development state with join material and. 
        
      
    
Language
    
      - In any given language, we can easily recognise what is a
      well formed component of the language.
- And what is not...
    
Words
    
      - 
        Fish 
        
      
- Peske
- Choiremheir
- Chautauqua
- Gkprtwcv
- P7ajo
Although language families have rules
    about what can be in a word and what can't, it's much harder to
    tell whether a word is valid or not, unless we know which
    language we're looking at.
    
    
Sentences
    
      - This is not a pipe.
- Ceci n'est pas une pipe.
- Chagco vet nici yan toube.
- GGGGGG #000007 cabala.
    
Meta-Language [1]
    
      - Within a family of languages, we can recognise what is a
      well-formed component of some language
- or might be...
- and what certainly isn't.
    
Meta-Language [2]
    
      - 
        In Indo-European languages, 
        
          - 
            a word has at least one vowel 
            
          
- words don't have more than four consonants in
	      succession
	      
	    
- a sentence is a succession of words
- a sentence starts with a capital letter and ends with
          a period
 
- There is an (implicit) meta-grammar.
    
HTML [1]
    
      - 
        <address> 
        
          - A valid HTML tag (in HTML 4.0
          Transitional)
 
- 
        <cotton> 
        
      
    
HTML [2]
    
      - HTML is a language (albeit a simple one).
- It's a markup language, and I hope it's one you're all
      familiar with.
- we can know at once whether a tag is a valid HTML tag or
      not...
- and what it means...
- and how it should be used...
    
Well Formedness
    
      - 
        When we know what the language is we can parse ill-formed
        forms: 
        
      
- because we can predict what the missing bits are
- 
        and where they should be: 
        
          - I have been there, and
          I have done that.
 
    
      What is XML
    
    
      - Key features
- Differences from HTML
- Differences from SGML
- A bit about the other bits
- Reality check
    
Key Features
    A universal, application-independent framework for the
    communication of semantically rich structured information
    between software agents.
    
      - 
        A language for describing other languages 
        
      
- Which describe the structure of a document
- Not the visual appearance (CSS, XSL)
- Written in plain Unicode text (a universal character set
      which supersedes ASCII)
    
Differences from HTML
    
      - 
        A Metalanguage: In a word, extensible. 
        
          - HTML can be (and has been)
          reimplemented as an XML dialect
 
- Also, strictly parsed.
    
Extensible: what does this mean for you?
    
      - Allows you to define new markup.
- Describing structure, not
      appearance.
- Makes it easier for programs to extract
      information from your documents.
    
Extensible: a simple example [1]
<?xml version="1.0"?>
<!DOCTYPE meeting PUBLIC "-//WEFT//DTD MEETING 0.1//EN" 
        "meeting.dtd">
<meeting id="June Board Meeting">
  <venue>
    28 Forth Street, Edinburgh
  </venue>
  <invitees>
    <attendee attendance="required" 
        meeting-role="convenor">
      <name>
    Simon Brooke
      </name>
      <position>
    Technical Director
      </position>
    </attendee>
    <attendee attendance="required">
      <name>Angela Stormont</name>
      <position>
    Communications Director
      </position>
    </attendee> 
  </invitees>
</meeting>
   
    
    
Extensible: a simple example [2]
    
      - What does this do?
- For the user directly, very little.
- For the user's program, it allows it to isolate items of
      structured information and handle them in intelligent ways to
      help the user.
- But only if the user's program understands the special
      markup you have defined.
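As a rough illustration of what 'isolating items of structured information' can look like, here is a minimal Java sketch using the standard JAXP and DOM classes. The class name MeetingReader and the method requiredAttendees are made up for this example, and the DOCTYPE line is left out of the embedded copy of the meeting document so that the parser does not go looking for meeting.dtd.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class MeetingReader {
    // A cut-down copy of the meeting document from the previous slide
    // (no DOCTYPE, so no DTD needs to be fetched).
    static final String MEETING =
        "<meeting id=\"June Board Meeting\">"
      + "<venue>28 Forth Street, Edinburgh</venue>"
      + "<invitees>"
      + "<attendee attendance=\"required\" meeting-role=\"convenor\">"
      + "<name>Simon Brooke</name><position>Technical Director</position>"
      + "</attendee>"
      + "<attendee attendance=\"required\">"
      + "<name>Angela Stormont</name>"
      + "<position>Communications Director</position>"
      + "</attendee>"
      + "</invitees></meeting>";

    /** Returns the names of all attendees whose attendance is "required". */
    static List<String> requiredAttendees(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<String> names = new ArrayList<>();
        NodeList attendees = doc.getElementsByTagName("attendee");
        for (int i = 0; i < attendees.getLength(); i++) {
            Element attendee = (Element) attendees.item(i);
            if ("required".equals(attendee.getAttribute("attendance"))) {
                // each attendee has exactly one name child in this dialect
                Element name =
                    (Element) attendee.getElementsByTagName("name").item(0);
                names.add(name.getTextContent().trim());
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(requiredAttendees(MEETING));
    }
}
```

The program only works because it knows in advance what attendee, name and attendance mean, which is the point of the last bullet above.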
    
      Strictly parsed: what does this mean for you? [1]
    
    
      - Documents which are not well-formed will not be handled by an
    XML application. At all. 
    
      - Tags and attributes are case-sensitive;
- End tags cannot be omitted - every <p>
      must have a </p>.
- Tags must be correctly nested:
 <b><i>This won't
      work</b></i>
- Empty tags (those which don't enclose any content) must
      be marked with a trailing slash like this:
      <xx/>
 
    
Strictly parsed: what does this mean for you? [2]
    
      - Most Web designers are sloppy.
- More than ninety percent of all commercially authored Web
      pages do not conform to any standard and are not valid
      HTML.
- Few if any of the commercially available WYSIWYG tools
      generate valid HTML.
- Web authors switching to XML will need to adopt much more
      rigorous technical discipline.
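To see how strict 'strictly parsed' is, the sketch below feeds a few fragments to the XML parser that ships with Java (via JAXP) and reports whether each is accepted. The class name WellFormedCheck and the sample strings are invented for illustration; the rules being exercised are the ones listed on the previous slide.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
    /** Returns true if the parser accepts the text as well-formed XML. */
    static boolean isWellFormed(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler keeps the parser from printing to stderr;
        // its fatalError method still throws, which we catch below.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String[] samples = {
            "<p><b><i>This works</i></b></p>",      // correctly nested
            "<p><b><i>This won't work</b></i></p>", // badly nested
            "<p>No end tag",                        // end tag omitted
            "<P>Mismatched case</p>",               // case-sensitive names
            "<p>line<br></p>",                      // empty tag not closed
            "<p>line<br/></p>",                     // empty tag, correct
            "<p class=unquoted>text</p>"            // unquoted attribute
        };
        for (String s : samples) {
            System.out.println(isWellFormed(s) + "  " + s);
        }
    }
}
```

Only the first and sixth samples are accepted; a browser would happily render the HTML equivalents of all seven.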
    
Differences from SGML
    
      - 
        Like HTML, simpler!
        
          - I used to say 'much simpler', but now I'm not too
          sure...
 
- Like HTML, optimised for delivery over
      restricted-bandwidth links.
- Unlike HTML, a true subset of SGML.
- All valid XML documents are valid SGML documents.
- SGML tools (conforming to ISO 8879) will work with
      XML.
- Organisations with an existing commitment to SGML will
      find the transition to XML much simpler.
    
      A bit about the other bits
    
    
      - 
        XML is a language for describing other languages 
        
          - Most of these are application specific
- 
            Some are very general
            
              - XLink:
              a vocabulary for linking between XML documents
- XPath
              and XPointer:
              vocabularies for describing positions inside XML
              documents
- XSL-T: a
              vocabulary for transforming XML documents
- XML
              Schema: a vocabulary for describing
              vocabularies
- SMIL
              (Synchronised Multimedia Integration Language): a
              vocabulary for integrating and synchronising
              multimedia presentations
- SOAP
              (Simple Object Access Protocol): a vocabulary for
              exchanging computation requests between heterogeneous
              agents in a network.
 
- All of these key standards are looked after by W3C
 
    
A bit about the other bits [ii]: XLink
    
      - In HTML, you need to use a special element (the A or
      Anchor tag) to be the start of a link
- 
        In XML any element can be the start of a link 
        
          - Currently, Mozilla/Netscape 6 is
          the only 'mainstream' browser which partly supports
          this
- W3C's Amaya 4
          also partly supports it
- 
            Several other demo and prototype implementations
            
          
- No mainstream browser fully supports XLink
 
    
A bit about the other bits [iii]: XPath and XPointer
    
      - In HTML, you need to use a special element (the A or
      Anchor tag) to be the target of a link
- 
        In XML a link can target any element in the target document
        
        
          - Several demo and prototype implementations
- No mainstream browser fully supports this
- You've heard this before somewhere, haven't you?
 
    
A bit about the other bits [iv]: XSL-T
    
      - 
        'eXtensible Stylesheet Language - Transformations' 
        
          - A language for manipulating document structure
- Maps any XML dialect into any other (or even to plain
          text)
- Declarative, pattern matching language, conceptually
          like Prolog
- Extremely powerful, unquestionably useful.
- But not really a stylesheet language
 
    
Digression: Visual Appearance and Stylesheets [i]
    
      - XML documents are not necessarily or primarily intended
      to be viewed by people, but when they are...
- The visual appearance of a document should be controlled
      by stylesheets.
- The appearance of this one is.
- In XML as in HTML you don't have to use stylesheets.
- If you don't, you will get a plain, simple
      appearance.
If people are interested, you can open
    the stylesheet for this presentation, slideshow.css, in a text
    editor.
    
    
Visual Appearance and Stylesheets [ii]
    
      - 
        A special stylesheet language, XSL, was conceived to
        support the new features of XML. 
        
          - 
            Two parts: 
            
              - 
                XSL-T, the Transformation language
                
                  - I've described this above
 
- 
                XSL-FO, formatting objects 
                
                  - A comprehensive language for describing the
                  fine detail of document presentation.
- Produces prolix, semantically impoverished
                  markup.
- Not supported by any client yet.
- Really a stylesheet language...
- Of doubtful value.
 
 
 
    
Visual Appearance and Stylesheets [iii]: Status of XSL
    
      - XSL-T was adopted
      on 16 November 1999 as a W3C recommendation.
- XSL-FO became
      a W3C Candidate Recommendation on 21 November 2000 (and a
      full Recommendation on 15 October 2001).
- Microsoft IE5 implemented a proprietary 'XSL'
      which is based on an older draft of XSL; newer IE5s are
      migrating towards the standard.
      
    
Visual Appearance and Stylesheets [iv]: XSL Summary
    
      - Transformation language of unquestionable merit, greatly
      aids separating content from presentation.
- Designed primarily to transform XML to XSL-FO, but can
      transform to any other XML dialect (including XHTML).
- 
        Recommendation: 
        
          - use XSL-T to map XML into XHTML for presentation to
          users, decorate with CSS
- use XSL-T to map XML to other XML dialects as needed
          for communication with other organisations
- 
            ignore XSL-FO for now, except if 
            
              - You need pixel-perfect presentation of your
              documents and
- You work in an environment (e.g. an Intranet)
              where you control the client.
 
 
    
Visual Appearance and Stylesheets [v]: What about CSS?
    
      - 
        You can continue to use existing CSS1 and CSS2 stylesheets.
        
        
          - Probably.
- Depending on what individual client vendors decide to
          support...
- 
            
            all (roughly) support CSS2.
          
 
- This presentation is not about stylesheets.
    
A bit about the other bits [v]: XML Schemas
    
      - A vocabulary for defining vocabularies or 'dialects'
- A bit late arriving
- Replace DTDs, inherited from SGML
    
Digression: Dialects of XML
    
      - What is a DTD?
- What about Schemas?
- Do I have to use a DTD or Schema?
- What DTDs and Schemas are available?
- Who will write DTDs and Schemas?
    
What is a Document Type Definition?
    
      - Essentially, a dictionary for the language you are
      using.
- Every Web author has heard of one
- 
        Every good Web author has seen one 
        
      
- Very few Web authors have written one
    
What about Schemas? (Schemata?)
    
      - Schemas are a new, alternative way to specify XML
      languages
- Officially adopted by W3C on 2nd May 2001 - so still very new
- Recommendation: Let someone else take
      the grief of getting the bugs out of it - stick with DTDs for
      now.
    
More about Schemas [i]: benefits
    
      - 
        The schema language is itself an XML language, so schemas
        can be parsed with standard XML tools 
        
      
- 
        You can specify rules for the content of elements and
        attributes with much finer granularity than with DTDs 
        
          - You can specify that an attribute must be a
          number
- You can specify minimum and maximum values for an
          attribute
- You can specify 
          regular expression patterns the attribute must
          match
 
    
More about Schemas [ii]: examples
    
      - 
        An attribute representing someone's age 
        
      
- 
        An attribute representing a UK bank sorting code (e.g.
        68-59-13)
        
      
- 
        An attribute representing a UK grid reference (e.g. 
        NX7951)
        
      
The pattern specification seems to have
    changed at some stage in the drafting process. The examples
    given in Learning
    XML don't work with Daniel Potter's tutorial applet.
    Treat all tutorials with care and refer back to the formal
    specification!
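In the spirit of checking against the specification rather than a tutorial, here is one way the three attribute examples might be written and tested, using the schema validator that comes with Java (javax.xml.validation). Everything specific here is an assumption made for illustration: the element name person, the attribute names age, sortcode and gridref, the upper age limit of 150, and the simplified grid reference pattern (two capital letters followed by four digits).

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class SchemaAttributeDemo {
    // A hypothetical schema: one element with three constrained attributes.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='person'><xs:complexType>"
      // a number with minimum and maximum values
      + "<xs:attribute name='age'><xs:simpleType>"
      + "<xs:restriction base='xs:integer'>"
      + "<xs:minInclusive value='0'/><xs:maxInclusive value='150'/>"
      + "</xs:restriction></xs:simpleType></xs:attribute>"
      // a regular expression pattern: three pairs of digits
      + "<xs:attribute name='sortcode'><xs:simpleType>"
      + "<xs:restriction base='xs:string'>"
      + "<xs:pattern value='[0-9]{2}-[0-9]{2}-[0-9]{2}'/>"
      + "</xs:restriction></xs:simpleType></xs:attribute>"
      // a simplified four-figure grid reference
      + "<xs:attribute name='gridref'><xs:simpleType>"
      + "<xs:restriction base='xs:string'>"
      + "<xs:pattern value='[A-Z]{2}[0-9]{4}'/>"
      + "</xs:restriction></xs:simpleType></xs:attribute>"
      + "</xs:complexType></xs:element>"
      + "</xs:schema>";

    /** Returns true if the document is valid against the schema above. */
    static boolean isValid(String xml) throws Exception {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema =
            factory.newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // a validation error was reported
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid(
            "<person age='42' sortcode='68-59-13' gridref='NX7951'/>"));
        System.out.println(isValid("<person age='-3'/>"));
        System.out.println(isValid("<person sortcode='685913'/>"));
    }
}
```

None of these three constraints can be expressed in a DTD, which can only say that the attributes exist and hold character data.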
    
    
More about Schemas [iii]: conversion
    
      - 
        A schema can hold a superset of the information in a DTD 
        
          - You can convert a DTD to a schema with a Perl
          script
- 
            You should be able to convert a schema to a DTD using
            XSL-T
            
              - But you might lose some information
 
 
    
Do I have to use a DTD or Schema?
    
      - As with HTML, you don't have to specify a DTD.
- Even if you define new markup...
- ... but client programs won't know how to interpret your
      new markup unless you also define a DTD or Schema.
- As with HTML, you should specify one.
    
What DTDs and Schemas are available?
    
      - All the XML extensions discussed in this presentation are
      defined as DTDs or Schemas (mostly DTDs).
- Thousands of SGML DTDs are available which can relatively
      easily be converted.
- 
        There are already many hundreds of XML DTDs available, and
        the number is growing fast. 
        Some repositories: 
    
Who will write DTDs and Schemas? [i]
    
      - Very specialised documents, technically demanding to
      write.
- For most purposes, suitable examples are available.
- Most XML users will never write one.
    
Who will write DTDs and Schemas? [ii]
    
      - Large organisations with special documentation
      requirements may write DTDs and/or Schemas.
- Communities of organisations which wish to exchange data
      will probably write DTDs and/or Schemas.
- Corporations which sell application programs will
      probably write DTDs and/or Schemas.
- 
        Corporations which sell WYSIWYG Web authoring tools
        will certainly write DTDs and/or Schemas. 
        
          - In future, there will be much less distinction
          between a word processor and a Web authoring tool.
 
- Communities of interest with special technical needs
      will certainly write DTDs and/or Schemas.
    
Ownership of DTDs: Communities vs single vendors
    
    
      - Rich Site Summary (RSS) is an important XML dialect used in news
	syndication
- The original DTDs (0.9, 0.91) were developed and 'owned' 
	by Netscape
- By April 2001, Netscape had lost interest in RSS...
	
- Many other people's systems broke.
- My conclusion: single vendors are not to be trusted with
	community resources.
- Since 2001 the history of RSS and its
	  competing versions has got even more complex and
	  bizarre. It's still a really useful tool for doing
	  syndication, though.
    
Another cautionary tale about software vendors
    
      - Microsoft has a long history of 'embracing and extending'
    standards
	
	  - Making small changes which cause other people's implementations to
    break.
-     Thus forcing people to use only their implementations.
 
-     MS Word 2002 saves as 'HTML' and as 'XML'
	
	  - When it saves as HTML, the HTML contains embedded 'XML' elements
-     In the 'XML', the attribute values are unquoted.
	    
	      - This is explicitly forbidden by the XML standards...
-     Standards compliant XML parsers can't parse this
-     Microsoft's own XML parsers can parse this
 
 
-     Is this simply incompetence?
	
	  - Or is it 'embracing and extending'...?
 
    
A bit about the other bits [vi]: SOAP
  
    
      - Simple Object Access Protocol
- 
        A vocabulary for communicating with software agents in a
        heterogeneous network 
        
      
- 
        Not actually very simple... 
        
          - But this is an inherently difficult area
- Software toolkits (such as Apache SOAP) will
          make this easier to deploy
 
    
More about SOAP
    
      - Developed from 'XML-RPC' (Dave Winer, Userland)
- Three versions out there
	
	  - 0.9
- 1.0 
	    
	      - submitted to IETF as a 'draft'
		
-     certainly incompatible with 0.9
 
-     1.1,
	    
	      - May 2000
- 	    submitted to W3C as a 'note'
-     probably incompatible with 1.0
 
 
-     Not (yet) a W3C recommendation, just a 'note'
-     Not (yet) an IETF RFC
-     Vapourware: not ready for prime time.
-     If you want to pursue this further, there's an online tutorial here.
    
      XML in your context
    
    
      - Applications which will benefit greatly from XML
- Applications which will benefit little from XML
- XML in action: Content syndication
    
Applications which benefit greatly from XML
    
      - 
        Applications exchanging structured data with other software
        agents. 
        
          - Accounting systems exchanging orders, invoices,
          payments...
- Engineering systems exchanging specifications,
          dimensions...
- Diary systems exchanging bookings, events, meetings,
          holidays...
 
- Technical documentation applications, or applications
      involving special notation (e.g., mathematics, music).
- Applications requiring highly detailed
      illustrations.
- Multimedia applications.
At present, only where the audience is
    controlled
    
    
Applications which will benefit little from XML
    
      - 
        Simple publishing of text, with or without simple graphics.
        
      
    
      XML in action: content syndication
    
    
      - What is content syndication
- History of Syndication
- Standards for Syndication
- Offering Syndication
- Incorporating Syndication
- Aggregation
    
What is content syndication
    
      - Making headlines from one web site available to
      others
- Automatically
- A dramatically successful public application of XML
    
History of Syndication
    
      - In the beginning was the ripper
- 1997: ScriptingNews starts promoting XML-based
      syndication
- 1999: My Netscape and Rich Site Summary 0.90
- 1999: ScriptingNews elements integrated by Netscape into
      RSS 0.91
- 2001: Netscape abandon Rich Site Summary
    
Standards for Syndication
    
      - 
        Rich Site Summary 0.91 
        
          - Netscape, now abandoned
- Very, very simple
- Still useful
 
- 
        Rich Site Summary 1.0 
        
      
- 
        Invent your own
        
      
    
Offering Syndication
    
      - 
        Provide a URL on your site from which an RSS document can
        be pulled 
        
          - Example pulled from a flat file (static, compiled
          periodically) [
          Wired news]
- Example pulled from a Servlet (dynamic) [PRES]
- You can do this with CGI, or any other server side
          content technology
 
- Very easy to set up.
    
Incorporating Syndication [i]
    
      - 
        Periodically request RSS from donor sites and transform to
        HTML 
        
      
- 
        Example sites 
        
      
    
Incorporating Syndication [ii]: Sample code
    
      
        
<!-- sidebar sections: show title and top eight entries -->
  <xsl:template match="rss">
    <h2>
      <xsl:apply-templates select="channel/title" />
    </h2>
    <xsl:for-each select="channel/item">
      <xsl:if test="9 > position()">
        <p>
          <a>
            <xsl:attribute name="href"><xsl:value-of 
              select="link"/> 
            </xsl:attribute>
            <xsl:apply-templates select="title" />
          </a>
        </p>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>

    Left: sample XSL code. Right: Moreover Internet Europe
    headlines, processed with this XSL, 22nd May 2001.
      
    
    
    
Aggregation
    
      - If you can collect headlines from multiple sources, you
      can search the collection with predetermined patterns, and
      offer personalised aggregations of news to users.
- O'Reilly's Meerkat
- Start of something big.
    
      Worked Example: a meeting arranger system
    
    
      - We all go to meetings...
- We all know what a hassle it is arranging them...
- Wouldn't it be nice if the machines could do it for
      us?
- Here's how!
    
Creating an example document (quite easy)
    
      - Start by typing what you want into your favourite text
      editor.
- Invent sensible looking markup as you go along.
- 
        Don't be too casual about this 
        
          - this is a data design exercise,
- you need to think about not only what you need for
          this document,
- but what you might need for others.
- you need to think about all the possible uses of your
          document.
 
- Here's one I did
      earlier.
This is a good opportunity for a
    whiteboard and some interaction! If possible, get the
    participants to do an example for themselves.
    
    
Creating the DTD and/or Schema (hard, but we'll use a
    trick)
    
      - DTDs and Schemas are precise, technical documents. How
      are we going to make them?
- Pass our example page to the DTDGenerator
	    - [2004: unfortunately the DTD generator is no longer available]
 
- Pass the results of that through the DTD2Schema
      script (requires PERL)
- Tidy up the results with your text editor
- Here's a DTD and a schema I did earlier.
Again, if possible, get the participants
    to actually do this.
    
    
Viewing it: creating a style-sheet (harder)
    
      - 
        Two approaches to stylesheets: 
        
          - CSS1:
- just establishes visual styles for the actual
          elements in your document
- XSL:
- much more complex, but allows on-the-fly
          transformation of the document to present particular
          features
 (Of course, you can just do without altogether)
- 
        Here's one just for the
        agenda. 
        
      
    
Using it: applications
    Now we need to write applications which will: 
    
      - 
        allow us to generate these documents 
        
          - not very hard, there are Java components around which
          semi-automate creating a form-driven special-purpose
          editor from a DTD...
 
- 
        allow our diary programs to automatically handle these
        documents 
        
          - much harder, but XML parser libraries are available
          for most modern programming languages which you can build
          on.
 
- We probably won't get that far today.
    
      Specifying
    
    
      - The Structure of an XML document
- Exercise period [i]
    
The Structure of an XML document
    
      - Overall structure
- Processing Instructions
- XML Namespaces
- Elements
- Attributes
- When to use which
    
Overall Structure
    
      - 
        Prolog 
        
          - 
            The XML declaration 
            
              - <?xml version="1.0"?>
- declares that this is XML
- strictly optional in XML 1.0, but strongly recommended
 
- 
            The Document Type Declaration 
            
              - <!DOCTYPE meeting PUBLIC "-//WEFT//DTD
              MEETING 0.1//EN" "meeting.dtd">
- says what dialect of XML this is
- optional
 
- Processing instructions
- Comments
 
- 
        Root element 
        
          - Just an element, like any other
- Just exactly one.
 
    
    
Processing Instructions
    
      - Special instructions for particular applications
- 
        Syntactically, delimited by <? and ?>
          - <?xml version="1.0"?> is a
          processing instruction
- a special one
 
- 
        The tag-part identifies the particular application this PI
        is intended for 
        
          - xml means 'any XML parser'
 
- The rest of the content is application specific
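A small Java sketch of how an application might pick out the processing instructions in a document, using the standard DOM interfaces. The class name ProcessingInstructionDemo is invented, and the xml-stylesheet instruction pointing at slideshow.css is just a convenient sample. Note that the parser consumes the XML declaration itself, so only the second instruction appears as a node in the tree.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

class ProcessingInstructionDemo {
    static final String SAMPLE =
        "<?xml version=\"1.0\"?>"
      + "<?xml-stylesheet type=\"text/css\" href=\"slideshow.css\"?>"
      + "<meeting/>";

    /** Lists each top-level processing instruction as "target -> data". */
    static String listInstructions(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        StringBuilder out = new StringBuilder();
        NodeList topLevel = doc.getChildNodes();
        for (int i = 0; i < topLevel.getLength(); i++) {
            Node node = topLevel.item(i);
            if (node.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE) {
                ProcessingInstruction pi = (ProcessingInstruction) node;
                // the target says who the instruction is for;
                // the data is whatever that application expects
                out.append(pi.getTarget()).append(" -> ")
                   .append(pi.getData()).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(listInstructions(SAMPLE));
    }
}
```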
    
    
XML Namespaces
    
      - Warning: Special use of the term!
- Allow multiple XML dialects to be used in one
      document
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    
      - xmlns means 'this is an XML namespace declaration'
- the rest means that names starting with xsl: belong to
      the namespace defined as http://www.w3.org/1999/XSL/Transform
- Note that the URL doesn't actually point to anything
      interesting, it's just a marker!
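The sketch below, using a namespace-aware JAXP parser, shows that it is the URL which identifies the namespace and not the prefix: the same element written with two different prefixes comes out as the same local name in the same namespace. The class name NamespaceDemo and the second prefix t are invented for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class NamespaceDemo {
    /** Describes the root element as "localName in namespaceURI". */
    static String describeRoot(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // JAXP parsers ignore namespaces unless asked not to
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();
        return root.getLocalName() + " in " + root.getNamespaceURI();
    }

    public static void main(String[] args) throws Exception {
        String usual = "<xsl:stylesheet version=\"1.0\" "
            + "xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\"/>";
        String renamed = "<t:stylesheet version=\"1.0\" "
            + "xmlns:t=\"http://www.w3.org/1999/XSL/Transform\"/>";
        System.out.println(describeRoot(usual));
        System.out.println(describeRoot(renamed));
    }
}
```

Both lines of output read "stylesheet in http://www.w3.org/1999/XSL/Transform"; nothing is ever fetched from that URL.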
    
Elements [i]
    Syntactically, an element is what is delimited by its
    tags.
    
      - 
        An opening tag comprises a left angle bracket
        <, the name of the element, optionally some
        attribute-value pairs, and a closing angle bracket >
          - <meeting id="June Board
          Meeting">
 
- 
        A closing tag comprises a left angle bracket
        <, a slash /, the name of the
        element, and a closing angle bracket >
- An empty tag comprises a left angle bracket
      <, the name of the element, optionally some
      attribute-value pairs, a slash /, and a closing
      angle bracket >; it is just shorthand for an
      opening tag immediately followed by the closing tag with
      nothing in between.
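The claim that an empty tag is only shorthand can be checked with the DOM: parse both spellings and compare the resulting elements. This is a minimal sketch; the class name EmptyTagDemo is invented, and isEqualNode is the standard DOM Level 3 comparison method.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class EmptyTagDemo {
    /** Parses the text and returns its root element. */
    static Element root(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)))
            .getDocumentElement();
    }

    /** True if the two documents have structurally equal root elements. */
    static boolean sameElement(String a, String b) throws Exception {
        return root(a).isEqualNode(root(b));
    }

    public static void main(String[] args) throws Exception {
        // empty tag versus opening tag immediately followed by closing tag
        System.out.println(sameElement("<venue/>", "<venue></venue>"));
        // a single space between the tags is content, so these differ
        System.out.println(sameElement("<venue/>", "<venue> </venue>"));
    }
}
```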
    
Elements [ii]
    
      - 
        An element is a primary structural unit in the XML markup 
        
      
- 
        May allow child elements of particular kinds 
        
          - Or just text (PCDATA)
- Or neither (empty tags)
 
- An element may have many child elements with the same
      name
    
Attributes
    
      - An attribute belongs to a particular element type
- Has a name which is a string of characters
- Has a value which is a string of
      characters
- 
        Syntactically 
        
          - name and value are separated by an equals sign
          =
- value is delimited by quotation marks
          "
 
- An element may have only one attribute with any given
      name
    
When to use which
    
      - 
        When you may have a value which is a complex data item, use
        an element 
        
          - example: agenda containing agenda items
 
- 
        When you may have many values of the same type, use an
        element. 
        
      
- 
        When you may have a long simple text value, use an element 
        
          - example: title of an agenda item
 
- 
        When you always have just one short simple text value, use
        an attribute 
        
          - example: proposer of an agenda item
 
    <meeting id="June Board Meeting">
      <agenda>
        <item proposer="Simon Brooke">
          <title>
            Adoption of new project management
            procedures manual
          </title>
        </item>
        <item proposer="Angela Stormont">
          <title>
            Transfer of shares
          </title>
        </item>
      </agenda>
    </meeting>
   
    
    
Exercise period [i]
    
      - In groups, produce a DTD for an XML dialect to describe
      meetings
- You may use the DTD generator at <URL:http://www.pault.com/Xmltube/dtdgen.html>
	    - [2004: unfortunately the DTD generator is no longer available]
 
- You should think about your meetings database as you do
      so and have some idea of how your XML DTD relates to your
      database design.
    
      Creating
    
    
      - Building XML applications: tools and technologies
- Constructing the document
- Exercise period[ii]
    
Building XML applications: tools and technologies
    
      - Languages for XML applications
- Tools, components and toolkits
- What we will be using today
    
Why Java?
    
      - Portable
- Reasonably readable
- Very well supported with XML toolkits and components
- I like it...
    
Other languages for building XML applications
    
    
    
Tools, components and toolkits
    
      - Parsers
- Transformation engines
- APIs
- 
        Where to find XML tools 
        
      
    
Transformation engines
    Apply XSL stylesheets to transform a document from one
    representation to another.
    
      - XML to XML
- XML to HTML
- XML to text
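As a sketch of the 'XML to text' case, the following uses the standard JAXP transformation API (javax.xml.transform), which is the interface Xalan implements; the Java runtime supplies a default engine if Xalan is not installed. The class name TransformDemo, the tiny stylesheet and the two-item RSS sample with example.org links are all invented for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TransformDemo {
    // Writes the title of each item, one per line, as plain text.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select='rss/channel/item'>"
      + "<xsl:value-of select='title'/>"
      + "<xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // A made-up RSS 0.91 style document.
    static final String RSS =
        "<rss version='0.91'><channel><title>Example feed</title>"
      + "<item><title>First headline</title>"
      + "<link>http://example.org/1</link></item>"
      + "<item><title>Second headline</title>"
      + "<link>http://example.org/2</link></item>"
      + "</channel></rss>";

    /** Applies the stylesheet to the document and returns the result. */
    static String transform(String xsl, String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        transformer.transform(
            new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(transform(STYLESHEET, RSS));
    }
}
```

Swapping in a different stylesheet, with no change to the Java, gives XML to XML or XML to HTML instead.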
    
What we will be using today
    
      - Apache
      Xalan
- XSL processor contributed to the Apache Foundation by IBM;
      closely related to IBM's 
      LotusXSL processor
- Apache
      Xerces
- XML parser contributed to the Apache Foundation by IBM;
      based on IBM's XML4J parser
- SAX
- Simple API for XML, by David Megginson and others
- DOM
- The W3C Document Object Model API
- W3C Jigsaw
- HTTP Server and Servlet Server developed by W3C
- Jacquard
- A toolkit of useful bits for sticking it all together. By
      me. Not necessarily the best but it's what I know and
      use.
    
Constructing the document
    
      - Writing text to the output stream
- Using the DOM
First an apology
Previous versions of this course contained a howling error at this
	point. It suggested creating DOM objects essentially by
	calling the newInstance method of the implementing
	classes. This only works with the particular DOM
	implementation you happen to be using and is not portable
	between DOM implementations (or even, necessarily, between
	successive versions of the same DOM implementation). So clearly
	it is very bad practice to do this.
I can only apologise to people who were misled by
	this.
      
    
The Document Object Model
    
      - Standardised interface for working with XML
      documents
- A W3C standard
- Many DOM implementations
    
The DOM: what is a Document?
      
      
    
The DOM: what is an Element?
    
      - A 'tag'
- 
        With 'attributes' 
        
      
- 
        And 'contents' 
        
          - other elements which are children of this
          element
- text elements
 
- Constructed by calling the createElement( String tagName)
	  method of the Document object
- Or in Jacquard, by calling the generate method of a class
	  which implements the NodeGenerator interface
    
The DOM: what is a Text?
    
      - 
        just text 
        
          - No tag
- No attributes
- No enclosing angle brackets
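The relationship between elements, attributes and text can be sketched in a few lines of Java. This is only an illustration using the org.w3c.dom interfaces and the JDK's JAXP factory; the class name ElementSketch and the sample values are invented here.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical example class; the values are invented sample data
class ElementSketch
{
    /** build an 'event' element with one attribute and one text-only child */
    static Element makeEvent( Document doc)
    {
        Element event = doc.createElement( "event");    // the 'tag'
        event.setAttribute( "actor", "simon");           // an attribute
        Element description = doc.createElement( "description");
        // a Text node: no tag, no attributes, just characters
        description.appendChild(
            doc.createTextNode( "Lecture, Java XML, all day"));
        event.appendChild( description);                 // the contents
        return event;
    }

    public static void main( String[] args)
        throws ParserConfigurationException
    {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
        Element event = makeEvent( doc);
        Node text = event.getFirstChild().getFirstChild();
        System.out.println( event.getTagName() + " / "
            + event.getAttribute( "actor") + " / " + text.getNodeValue());
    }
}
```

Note that every node is created by the Document it will belong to; nothing is constructed with new.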
 
    
Create a document object
    // get a handle on a DOM implementation...
    DOMImplementation di = DOMStub.getDOMImplementation( context);
    // and use it to create a document object
    Document doc = di.createDocument( getNamespaceURI( context), 
						  rootName, doctype);
      
      
	DOMStub is a Jacquard utility class which gets hold of
	whatever DOM implementation is available. If you don't use
	Jacquard you'll have to instantiate a DOM implementation for yourself.
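For readers without Jacquard, one portable way to get hold of a DOM implementation is sketched below. It assumes the JAXP factory classes (javax.xml.parsers) that ship with the JDK, which is not what DOMStub necessarily does internally; the class name CreateDocumentSketch is invented for this example.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;

// Hypothetical example class; an assumption-laden stand-in for DOMStub
class CreateDocumentSketch
{
    /** create an empty document whose root element has the given name */
    static Document create( String rootName)
        throws ParserConfigurationException
    {
        // ask JAXP for whichever DOM implementation is installed, rather
        // than naming an implementation class directly
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        DOMImplementation di = builder.getDOMImplementation();
        // no namespace and no doctype; the root element is created for us
        return di.createDocument( null, rootName, null);
    }

    public static void main( String[] args)
        throws ParserConfigurationException
    {
        Document doc = create( "eventsdiary");
        System.out.println( doc.getDocumentElement().getTagName());
    }
}
```

Because only interfaces from org.w3c.dom and the JAXP factories are named, the same code should run unchanged against a different DOM implementation.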
      
    
    
Add a root ('content') element
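    // createElement takes only the tag name; skip this append if
    // createDocument has already been given a root name
    doc.appendChild( doc.createElement( "eventsdiary"));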
    Element content = doc.getDocumentElement();
    
    
      - Every Document must have exactly one 'content'
      element
- If you attempt to add another child to a document which
      already has a child, that's an error.
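The single-root rule can be checked with a small experiment. The sketch below assumes the DOM implementation bundled with the JDK, which reports the problem as a DOMException with the code HIERARCHY_REQUEST_ERR; the class name SingleRootSketch is invented for this example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;

// Hypothetical example class
class SingleRootSketch
{
    /** return true if the DOM refuses to accept a second root element */
    static boolean secondRootRejected() throws ParserConfigurationException
    {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
        doc.appendChild( doc.createElement( "eventsdiary")); // first: fine
        try
        {
            doc.appendChild( doc.createElement( "another")); // second: error
            return false;
        }
        catch ( DOMException dex)
        {
            // the DOM's way of saying 'that node is not allowed here'
            return dex.code == DOMException.HIERARCHY_REQUEST_ERR;
        }
    }

    public static void main( String[] args)
        throws ParserConfigurationException
    {
        System.out.println( "Second root rejected: " + secondRootRejected());
    }
}
```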
    
Add further elements recursively as required
            // match the pattern against the convenience view and pull
            // back the rows that match as namespaces
            Contexts events =
                TableDescriptor.getDescriptor( VIEW, null, 
                                               context ).match( pattern );
            Enumeration e = events.elements(  );
            // and pass each of those namespaces in turn to my event element 
            // generator to generate children for my element
            while ( e.hasMoreElements(  ) )
                content.appendChild( eventEltGenerator.generate( doc,
                        (Context) e.nextElement(  ) ) );
      
      
	This is a bit of a cheat. It depends on having a view in the
	database which collects together all the necessary fields for
	us:
---- EVENTS_VIEW -----------------------------------------------------
CREATE VIEW events_view AS
     SELECT EVENT.Actor,
            EVENT.Event,
            CATEGORY.Description AS Type,
            LOCATION.Description AS Location,
            EVENT.Eventdate,
            EVENT.Starttime,
            EVENT.Endtime,
            EVENT.Description
       FROM EVENT,
            CATEGORY,
            LOCATION
      WHERE EVENT.Location = LOCATION.Location
        AND EVENT.Category = CATEGORY.Category
   ORDER BY Eventdate,Starttime
;
    
    
Let's see that again [i] the source
public class DayView extends DocumentGeneratorImpl
{
    //~ Static fields/initializers --------------------------------------------
    /**
     * the name of the convenience view in the database from which I will
     * collect all the information I need
     */
    protected static final String VIEW = "events_view";
    /** the field in that view which represents the date of the event */
    protected static final String EVENTDATEFIELD = "when";
    //~ Instance fields -------------------------------------------------------
    /** a generator to generate the event elements which will be my children */
    protected EventElementGenerator eventEltGenerator =
        new EventElementGenerator(  );
    //~ Methods ---------------------------------------------------------------
    /**
     * generate a document containing all the events on the day implied by
     * this context
     */
    public Document generate( Context context ) throws GenerationException
    {
        DOMImplementation di = DOMStub.getDOMImplementation( context );
        Document doc = di.createDocument( "", "eventsdiary", null );
        String day = context.getValueAsString( "day" );
        uk.co.weft.dbutil.Calendar when = new uk.co.weft.dbutil.Calendar(  );
        if ( day != null )
        {
            // if we've got a date, set my calendar to that day
            // (by default it sets itself to today)
            when.setTime( java.sql.Date.valueOf( day ) );
        }
        Element content = doc.getDocumentElement(  );
        content.setAttribute( "date", when.toString(  ) );
        try
        {
            // create a new, blank, context as a pattern to match
            Context pattern = new Context(  );
            // give it the database username, password and url from the current context
            pattern.copyDBTokens( context );
            // put the date we're interested in into the pattern
            pattern.put( EVENTDATEFIELD, when );
            // match the pattern against the convenience view and pull
            // back the rows that match as namespaces
            Contexts events =
                TableDescriptor.getDescriptor( VIEW, null, context ).match( pattern );
            Enumeration e = events.elements(  );
            // and pass each of those namespaces in turn to my event element 
            // generator to generate children for my element
            while ( e.hasMoreElements(  ) )
                content.appendChild( eventEltGenerator.generate( doc,
                        (Context) e.nextElement(  ) ) );
        }
        catch ( DataStoreException dex )
        {
            throw new GenerationException( "Failed to read from data store: " +
                dex.getMessage(  ) );
        }
        return doc;
    }
    
    
Let's see that again [ii]: the event element generator
    The event element generator is a simple wrapper round a context
    element generator:
    //~ Inner Classes ---------------------------------------------------------
    /**
     * a generator for an XML element representing a single event. This uses
     * ContextElementGenerator which knows how to construct a DOM element
     * node by taking values out of a context, so all we need to do is tell
     * it which value names to treat as attributes and which as children
     */
    class EventElementGenerator extends ContextElementGenerator
    {
        //~ Constructors ------------------------------------------------------
        /**
         * the tag of the element I generate is 'event'
         */
        public EventElementGenerator(  )
        {
            super( "event" );
        }
        //~ Methods -----------------------------------------------------------
        /**
         * return a String array of the names of my properties to output as
         * attributes
         */
        protected String[] getAttrNames(  )
        {
            String[] attrNames =
            { "event", "type", "location", "starttime", "endtime", "actor" };
            return attrNames;
        }
        /**
         * return a String array of the names of my properties to output as
         * children
         */
        protected String[] getChildNames(  )
        {
            String[] childNames = { "description" };
            return childNames;
        }
    }
}
    
    
Let's see that again: [iii] the context element generator
    
      - 
        A class which makes simple elements out of namespaces.
        Often useful 
        
          - Not part of DOM or SAX - part of my own Jacquard
          toolkit
- There's no particular reason to use Jacquard
 
- 
	  ContextElementGenerator
        
          - A name to be used as an element name
- A list of names which are to be used as
          attributes
- A list of names which are to be used as child (text)
          elements
- Constructs element nodes to that specification
	      
		- taking attribute and child values from a namespace
		  passed to the generate method
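The idea behind such a generator can be sketched without Jacquard. The code below is not the Jacquard ContextElementGenerator; it is a simplified stand-in, assuming a plain java.util.Map plays the part of the namespace. The class name MapElementGenerator and its constructor arguments are invented for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical stand-in for the idea; not the Jacquard class itself
class MapElementGenerator
{
    private final String tag;
    private final String[] attrNames;
    private final String[] childNames;

    MapElementGenerator( String tag, String[] attrNames, String[] childNames)
    {
        this.tag = tag;
        this.attrNames = attrNames;
        this.childNames = childNames;
    }

    /** construct an element from the values held in the namespace */
    Element generate( Document doc, Map<String, String> namespace)
    {
        Element elt = doc.createElement( tag);
        for ( String name : attrNames)
        {
            String value = namespace.get( name);
            if ( value != null)     // names with no value are skipped
                elt.setAttribute( name, value);
        }
        for ( String name : childNames)
        {
            String value = namespace.get( name);
            if ( value != null)
            {
                // each child is an element wrapping a single text node
                Element child = doc.createElement( name);
                child.appendChild( doc.createTextNode( value));
                elt.appendChild( child);
            }
        }
        return elt;
    }

    public static void main( String[] args)
        throws ParserConfigurationException
    {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
        Map<String, String> row = new LinkedHashMap<>();
        row.put( "actor", "simon");
        row.put( "description", "Lecture, Java XML, all day");
        Element event = new MapElementGenerator( "event",
            new String[] { "actor", "location" },
            new String[] { "description" }).generate( doc, row);
        System.out.println( event.getAttribute( "actor") + ": "
            + event.getFirstChild().getTextContent());
    }
}
```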
 
 
    
Let's see that again: [iv] the output
<?xml version="1.0"?>
 <eventsdiary
  date="Jul 18, 2000">
  <event
   actor="simon"
   endtime="5:30:00 PM"
   event="19"
   location="Yokohama, Japan"
   starttime="9:00:00 AM"
   type="Otherwise unavailable">
   <description>
    Lecture, Java XML, all day
   </description>
  </event>
 </eventsdiary>
    Should be online here
    (login required). HTML formatted view here
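Turning a DOM document into text like this is a separate step from building it. One way to do it, assuming the JAXP transformation API in the JDK rather than whatever Jacquard uses, is an 'identity' transformation; the class name SerialiseSketch and the sample values are invented for this sketch.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical example class
class SerialiseSketch
{
    /** write a DOM document out as XML text */
    static String serialise( Document doc) throws TransformerException
    {
        // a Transformer created without a stylesheet copies its input
        // to its output unchanged
        Transformer identity =
            TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        identity.transform( new DOMSource( doc), new StreamResult( out));
        return out.toString();
    }

    public static void main( String[] args) throws Exception
    {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
        Element root = doc.createElement( "eventsdiary");
        root.setAttribute( "date", "2000-07-18");
        doc.appendChild( root);
        System.out.println( serialise( doc));
    }
}
```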
    
    
Exercise period [ii]
    We may skip this one if time's short or the group is
    struggling!
    
      - In groups: Try to write a Java application or Servlet
      which produces at least part of an XML document to your
      meeting DTD from your database
    
      Transforming
    
    
      - Beginning XSL-T
- Exercise period [iii]
    
Beginning XSL-T [i] The 'stylesheet'
    
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- Basic XSL stylesheet for day view of events diary.  -->
  <xsl:output indent="yes" method="html" 
          doctype-public="-//W3C//DTD HTML 4.0 Transitional//EN"/>
  <xsl:template match="eventsdiary">
    <html>
      <head>
    <title>
      Diary for <xsl:value-of select="@date" />
    </title>
    <link rel="StyleSheet" href="/styles/jacquard.css" type="text/css" 
      media="screen"/>
      </head>
      <body>
    <h1>
      Diary for <xsl:value-of select="@date" />
    </h1>
    <table>
      <tr>
        <th rowspan="2">
        Who
        </th>
        <th rowspan="2">
        Where
        </th>
        <th colspan="2">
        When
        </th>
        <th rowspan="2">
        What
        </th>
        <th rowspan="2">
        Details
        </th>
        <th rowspan="2">
        <a href="event">Add</a>
        </th>
      </tr>
      <tr>
        <th>
          Starts
        </th>
        <th>
          Ends
        </th>
      </tr>
      <xsl:apply-templates select="event" />
    </table>
      </body>
   </html>
  </xsl:template>
  <xsl:template match="event">
    <tr>
      <td>
    <xsl:value-of select="@actor"/>
      </td>
      <td>
    <xsl:value-of select="@location"/>
      </td>
      <td>
    <xsl:value-of select="@starttime"/>
      </td>
      <td>
    <xsl:value-of select="@endtime"/>
      </td>
      <td>
    <xsl:value-of select="@type"/>
      </td>
      <td>
    <xsl:value-of select="description"/>
      </td>
      <td>
    <a>
      <xsl:attribute name="href">event?event=<xsl:value-of 
        select="@event"/>
      </xsl:attribute>
      Edit
    </a>
      </td>
    </tr>
  </xsl:template>
</xsl:stylesheet>
    
    
Beginning XSL-T [ii] The 'stylesheet' tag
<?xml version="1.0"?>
    
      - This says this stylesheet is written in XML; it should be
      the first line of every XML document
- Yes, XSL is a dialect of XML
- version="1.0" says it's version 1.0 of
      XML
<xsl:stylesheet version="1.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    
      - Every XSL-T 'stylesheet' starts with this
- xsl:stylesheet says it's a stylesheet
- version="1.0" says it's version 1.0 of
      XSL
- xmlns:xsl says that names which start with 'xsl:'
      belong to the namespace identified by the URL
      http://www.w3.org/1999/XSL/Transform
    
Beginning XSL-T [iii] comments
<!-- Basic XSL stylesheet for day view of events diary.  -->
    
      - 
        Comments in XSL text are just like any other XML (or SGML)
        comments 
        
          - Start with <!-- (a following space is
          customary but not required)
- End with --> (a preceding space is
          customary but not required)
 
- Because they're comments, they don't appear in the
      output
- 
        To create comments in the output, use
        xsl:comment
          - <xsl:comment>text of
          comment</xsl:comment>
- will produce <!-- text of comment
          -->
 
    
Beginning XSL-T [iv] output specifier
  <xsl:output indent="yes" method="html" 
          doctype-public="-//W3C//DTD HTML 4.0 Transitional//EN"/>
    
      - The output specifier is not required
- 
        If it exists it must appear at top level 
        
          - as a child of the xsl:stylesheet element
 
- indent="yes" says we want the output neatly
      indented to show structure
- 
        method="html" says we want the output to have
        html syntax
          - might have been "xml" or "text"
 
- doctype-public says include a DOCTYPE
      declaration of this DTD
- There are a number of other possible
      attributes.
    
Beginning XSL-T [v] declaring a template
  <xsl:template match="eventsdiary">
    This template matches every instance of the element
    eventsdiary which is found in the document being
    processed. As eventsdiary is the root element of
    the document type we're interested in, there will only be
    one.
    <html>
      <head>
        <title>
    As you can see, what is in the template is just the HTML
    markup that will be output (if we were outputting XML, it would
    be XML, of course)...
      Diary for <xsl:value-of select="@date" />
    with scattered among it special xsl tags which
    cause things to be spliced into the output. This one says 'use the
    value of the date attribute of the current element'
        </title>
      </head>
      <body>
        <h1>
          Diary for <xsl:value-of select="@date" />
        </h1>
        <table>
          <tr>
            <th rowspan="2">
              Who
            </th>
            <th rowspan="2">
              Where
            </th>
            <th colspan="2">
              When
            </th>
            <th rowspan="2">
              What
            </th>
            <th rowspan="2">
              Details
            </th>
            <th rowspan="2">
              <a href="event">Add</a>
            </th>
          </tr>
          <tr>
            <th>
              Starts
            </th>
            <th>
              Ends
            </th>
          </tr>
          <xsl:apply-templates select="event" />
    This is the important one. It says "apply the templates in
    this stylesheet to all the instances of event
    elements which are children of the current node".
        </table>
      </body>
   </html>
  </xsl:template>
    
    
Beginning XSL-T [vi] other useful bits
<xsl:template match="section[ @slot='main']">
    This template will match only section elements
    which have an attribute named slot whose value is
    main
  <p>
    <xsl:call-template name="toc"/>
    paste in the output of the named template called toc.
  </p>
  <xsl:apply-templates select="section">
    <xsl:sort select="title"/>
    Apply templates in this stylesheet to sections
    which are children of this section, sorted
    alphabetically by their title sub-element
  </xsl:apply-templates>
</xsl:template>
<xsl:template name="toc">
    This is the named template which was called
    earlier. Most templates are not named: they are applied
    automatically if their patterns match an element
  <xsl:for-each select="section">
    for-each iterates over matching elements in turn
      <xsl:sort select="title"/>
        <a>
              <xsl:attribute name="href">#<xsl:value-of select="title"/>
                </xsl:attribute>
    xsl:attribute allows us to construct the value of an
    attribute of the enclosing tag
          <xsl:value-of select="title"/>
      </a> |
  </xsl:for-each>
</xsl:template>
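To see call-template, for-each and sort working together, the stylesheet above can be cut down to its skeleton and run from Java. This is a sketch only: it assumes the JAXP API in the JDK, uses text output so the result is easy to read, and the class name SortSketch and the sample document are invented.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example class; document and stylesheet are invented
class SortSketch
{
    /** two sub-sections, deliberately out of alphabetical order */
    static final String XML =
        "<section slot='main'>"
        + "<section><title>Pears</title></section>"
        + "<section><title>Apples</title></section>"
        + "</section>";

    static final String XSL =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        // matched by pattern: only the section whose slot is 'main'
        + "<xsl:template match='section[@slot=\"main\"]'>"
        + "<xsl:call-template name='toc'/>"
        + "</xsl:template>"
        // called by name: lists the child sections sorted by title
        + "<xsl:template name='toc'>"
        + "<xsl:for-each select='section'>"
        + "<xsl:sort select='title'/>"
        + "<xsl:value-of select='title'/>"
        + "<xsl:text>|</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** apply the stylesheet text to the document text; return the result */
    static String transform( String xml, String xsl)
        throws TransformerException
    {
        Transformer t = TransformerFactory.newInstance().newTransformer(
            new StreamSource( new StringReader( xsl)));
        StringWriter out = new StringWriter();
        t.transform( new StreamSource( new StringReader( xml)),
                     new StreamResult( out));
        return out.toString();
    }

    public static void main( String[] args) throws TransformerException
    {
        System.out.println( transform( XML, XSL));  // Apples|Pears|
    }
}
```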
    
    
XSL-T elements: reprise
    
      - xsl:output
- allows us to define how we want the output to be
      formatted
- xsl:template
- defines what should be output for elements matching a
      given pattern
- xsl:apply-templates
- applies the templates to the elements which match its
      pattern
- xsl:call-template
- calls a template with a particular name, overriding the
      pattern-matching system
- xsl:for-each
- produces output iteratively, overriding the
      pattern-matching system
- xsl:sort
- orders the result of its enclosing element (an
      xsl:apply-templates or an xsl:for-each)
- xsl:value-of
- produces the value of the thing matched by its
      pattern
- xsl:attribute
- outputs an attribute for the output element which
      encloses it
There are a few more XSL elements, but these will do most
    things for you.
    
    
Beginning XSL-T [vii]: Patterns
    
      - *
- matches any element
- foo
- matches any element whose type is foo
- foo | bar
- matches any element whose type is foo or bar
- foo/bar
- matches any bar element with a foo parent
- foo//bar
- matches any bar element with a foo ancestor
- foo[ @bar='baz']
- matches any foo element which has a bar attribute which has the value baz
- foo[1]
- matches any foo element which is the first foo child of its parent
- foo[ position() = 1]
- the same as foo[1]: matches any foo element which is the first
      foo child of its parent
- *[ position() < 5]
- matches any element which is the first, second, third or
      fourth element child of its parent
- text()
- matches any text node.
This is just the basics. The full definition is here
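A quick way to get a feel for patterns is to give each one a template that outputs a single letter and see which fires. The sketch below assumes the JAXP API in the JDK; the class name PatternSketch and the sample document are invented. It also shows that when two patterns match the same element, the more specific one (with a predicate or a parent step) wins over a bare element name under XSLT 1.0's default priorities.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example class; document and stylesheet are invented
class PatternSketch
{
    /** three foo elements in different situations */
    static final String XML =
        "<r><foo bar='baz'/><foo/><wrap><foo/></wrap></r>";

    static final String XSL =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        // A: a foo with a bar attribute whose value is baz
        + "<xsl:template match='foo[@bar=\"baz\"]'>A</xsl:template>"
        // B: a foo whose parent is a wrap
        + "<xsl:template match='wrap/foo'>B</xsl:template>"
        // C: any other foo
        + "<xsl:template match='foo'>C</xsl:template>"
        + "</xsl:stylesheet>";

    /** apply the stylesheet text to the document text; return the result */
    static String transform( String xml, String xsl)
        throws TransformerException
    {
        Transformer t = TransformerFactory.newInstance().newTransformer(
            new StreamSource( new StringReader( xsl)));
        StringWriter out = new StringWriter();
        t.transform( new StreamSource( new StringReader( xml)),
                     new StreamResult( out));
        return out.toString();
    }

    public static void main( String[] args) throws TransformerException
    {
        // one letter per foo element, in document order
        System.out.println( transform( XML, XSL));  // ACB
    }
}
```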
    
    
XSL-T: A deceptively simple language
    
      - Not many elements
- Simple to learn all of them
- Very subtle in use
- The power is in the patterns
    
Exercise period [iii]
    
      - In groups: Write an XSL-T stylesheet which produces an
      HTML agenda for your group's Meeting DTD.
- Everyone together: negotiate and agree a new, common DTD
      which you can use to communicate meeting information between
      your groups
- In groups: Write an XSL-T stylesheet which produces a
      document conforming to the common DTD from a document
      conforming to your group's DTD.
    
      Communicating
    
    
      - Just a bit about transport
- XML Parsers
- Parsing XML into the Database
- Parsing: Simple worked example
- Exercise period [iv]
    
Just a bit about transport
    
      - XML is about the content of communication, not how it's
      sent...
- 
        But how do you send XML information? 
        
          - HTTP GET to get information from a known place
- HTTP POST or PUT to send information to a known
          place
- Special purpose listener daemons with special purpose
          protocols
- eMail
 
    
Parsers
    
      - read a document from some source,
- construct a representation of that document in the
      machine
- or provide the hooks to allow you to do so
Parsing is quite compute-intensive - don't do it if you
    don't have to!
    
    
More about parsers [i] types
    
      - 
        Event-based parsers 
        
          - You register handlers for parsing events you are
          interested in
- The parser calls these handlers when it sees the
          events
- Useful if you only want some of the information out
          of the document
- Useful if the document might use more memory than you
          have available
- Quite a lot of work to set up.
 
- 
        Document parsers 
        
          - Usually built on event-based parsers
- 
            Parse the whole document and provide you with a handle
            on an internal representation of it 
            
              - Usually a DOM document object
 
- Useful if you want all the information out of the
          document
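The difference between the two kinds of parser is easiest to see side by side. This sketch counts the same elements both ways, assuming the SAX and DOM parsers reachable through the JDK's JAXP factories; the class name ParserTypesSketch and the sample document are invented.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; the document is invented sample data
class ParserTypesSketch
{
    static final String XML =
        "<workshop title='Parsing XML'>"
        + "<attendee name='Jon Smith'/><attendee name='Jane Doe'/>"
        + "</workshop>";

    /** event-based: count attendees without ever building a tree */
    static int countWithSax( String xml)
        throws ParserConfigurationException, SAXException, IOException
    {
        final int[] count = { 0 };
        // register a handler for the one event we care about
        DefaultHandler handler = new DefaultHandler()
        {
            @Override
            public void startElement( String uri, String localName,
                                      String qName, Attributes attrs)
            {
                if ( qName.equals( "attendee"))
                    count[0]++;
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse( new InputSource( new StringReader( xml)), handler);
        return count[0];
    }

    /** document parser: build the whole tree, then interrogate it */
    static int countWithDom( String xml)
        throws ParserConfigurationException, SAXException, IOException
    {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse( new InputSource( new StringReader( xml)));
        return doc.getElementsByTagName( "attendee").getLength();
    }

    public static void main( String[] args) throws Exception
    {
        System.out.println( "SAX saw " + countWithSax( XML)
            + ", DOM holds " + countWithDom( XML));
    }
}
```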
 
    
More about parsers [ii] types
    
      - 
        Validating parsers 
        
          - Read the DTD (or schema)
- Read the document
- If the document isn't valid according to the DTD,
          report this
- Good if you're making sure your document conforms to
          the dialect standard
 
- 
        Non validating parsers 
        
          - Don't read the DTD (or schema)
- Read the document
- Will still throw an error if the document has bad
          syntax
- Good if you just want to parse XML quickly
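Whether a parser validates is usually just a switch. The sketch below assumes the JAXP DocumentBuilderFactory in the JDK and a DTD carried in the document's own DOCTYPE declaration; the class name ValidationSketch and the sample documents are invented. Validity problems arrive through the error handler as 'errors', while bad syntax is a 'fatal error' and stops the parse either way.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; DTD and documents are invented
class ValidationSketch
{
    /** a tiny DTD: every attendee must have a name */
    static final String DTD =
        "<!DOCTYPE workshop ["
        + " <!ELEMENT workshop (attendee*)>"
        + " <!ELEMENT attendee EMPTY>"
        + " <!ATTLIST attendee name CDATA #REQUIRED>"
        + " ]>";

    static final String VALID =
        DTD + "<workshop><attendee name='Jon Smith'/></workshop>";

    /** well formed, but the attendee has no name */
    static final String INVALID =
        DTD + "<workshop><attendee/></workshop>";

    /** parse the text and return how many validity errors were reported */
    static int countValidityErrors( String xml, boolean validating)
        throws ParserConfigurationException, SAXException, IOException
    {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating( validating);     // the switch
        DocumentBuilder builder = factory.newDocumentBuilder();
        final int[] errors = { 0 };
        // DefaultHandler ignores warnings and rethrows fatal errors;
        // we only override the callback used for validity errors
        builder.setErrorHandler( new DefaultHandler()
        {
            @Override
            public void error( SAXParseException e)
            {
                errors[0]++;
            }
        });
        builder.parse( new InputSource( new StringReader( xml)));
        return errors[0];
    }

    public static void main( String[] args) throws Exception
    {
        System.out.println( "validating: "
            + countValidityErrors( INVALID, true) + " error(s)");
        System.out.println( "non-validating: "
            + countValidityErrors( INVALID, false) + " error(s)");
    }
}
```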
 
    
Parsing from XML into the database
    
      - Walk recursively down the document tree
- identifying the elements we want to store
- for each one, see if it's already there (tricky!)
- if not, store it.
    
Identifying the data to store
    
      - The attributes of an element are a namespace
- So are the fields of a table
- 
        If you have one table for every element type 
        
          - and one field in that table for every attribute that
          element can have
 
- It's relatively easy
- 
        The real world isn't often like that 
        
          - the overall structure of XML and relational databases
          are quite different
- most serious databases have been around a long time,
          we can't just design them to fit our DTD
- most DTDs are agreed between large numbers of
          organisations, we can't just design them to fit our
          database
- but it may be coerced with a little help from
          XSL...
 
    
Other things to bear in mind
    
      - Text nodes - what do you do with them?
- Context - what was the key value of that meeting we just
      stored?
    
Parsing: very simple worked example
    
      - Sample XML document
- Sample Java class
Sample XML document
<?xml version="1.0"?>
<workshop tutor="Simon Brooke" 
  title="Parsing XML" venue="small">
  <attendee name="Jon Smith" age="37" 
    sex="M" country="UK" />
  <attendee name="Jane Doe" age="42" 
      sex="F" country="US" />
</workshop>
    Those who were here yesterday will
    probably recognise this from the 'WORKSHOP' database - I'm
    using this because I can't predict what your 'MEETING'
    databases will look like
    
    
Sample Java class
import java.io.*;       // to read things from the user
import java.sql.*;      // to talk to the database
import uk.co.weft.domutil.*;    // things to convert elements to namespaces
import uk.co.weft.dbutil.*; // things to store namespaces in databases
import org.w3c.dom.*;       // interrogates a DOM tree...
import org.apache.xerces.dom.*; // using Apache's DOM implementation
import org.apache.xalan.xslt.*; // Apache's XSL processor
import org.apache.xerces.parsers.DOMParser;
                // and Apache's XML parser
public class ParseExample
{
    static Context connectionContext = new Context();
                // a context to hold database
                // connection details
    /** walk down a document tree looking for nodes we recognise */
    public static void walk( Node node)
    throws SQLException, DataStoreException
    {
    if ( node.getNodeType() == Node.ELEMENT_NODE)
        {
        Element elt = ( Element) node;
        System.out.println( "Considering element of type " +
                    elt.getTagName());
        if ( elt.getTagName().equals( "workshop"))
            handleWorkshop( elt);
        else
            {
            NodeList children = elt.getChildNodes();
            for ( int i = 0; i < children.getLength(); i++)
                walk( children.item( i));
                // recurse down through the children
            }
        }
    }
    /** handle a workshop element; extract its attribute (and
     *  actually, its text-only child) values, and store them in the
     *  database. Then look for attendees.*/
    protected static void handleWorkshop( Element elt) 
    throws SQLException, DataStoreException
    {
    Object key = null;
    Context c = ( Context)connectionContext.clone(); 
                // construct a new namespace with just
                // the database connection details in
                // it
    ContextElement.populateContext( elt, c);
                // fill it with values from the element
    TableDescriptor workshopDescriptor = 
        TableDescriptor.getDescriptor( "WORKSHOP", "Workshop", c);
                // get a descriptor on the WORKSHOP table
    Contexts rows = workshopDescriptor.match( c);
                // try to match that against what's
                // already in the table
    if ( rows != null && rows.size() > 0)
        {           // there was a match
        key = ( ( Context)rows.get( 0)).getValueAsInteger( "Workshop");
                // get its primary key value
        System.out.println( "Found workshop " + key.toString());
        }
    else
        {
        key = workshopDescriptor.store( c);
                // store it and get its primary key value
        System.out.println( "Created workshop " + key.toString());
        }
    NodeList children = elt.getChildNodes();
    for ( int i = 0; i < children.getLength(); i++)
        {           // look through the children for my attendees
        Node child = children.item( i);
        if ( child.getNodeType() == Node.ELEMENT_NODE &&
             ( ( Element) child).getTagName().equals( "attendee"))
            {
            handleAttendee( ( Element)child, key);
            }
        }
    }
    /** handle an attendee element by finding or storing it in the
     *  database, and fixing up the link table */
    protected static void handleAttendee( Element elt, Object workshopKey)
    throws SQLException, DataStoreException
    {
    Object attendeeKey = null;
    Context c = ( Context)connectionContext.clone(); 
                // construct a new namespace with just
                // the database connection details in
                // it
    ContextElement.populateContext( elt, c);
                // fill it with values from the element
    TableDescriptor attendeeDescriptor = 
        TableDescriptor.getDescriptor( "ATTENDEE", "Attendee", c);
                // get a descriptor on the ATTENDEE table
    Contexts rows = attendeeDescriptor.match( c);
                // try to match that against what's
                // already in the table
    if ( rows != null && rows.size() > 0)
        {           // there was a match
        attendeeKey = 
            ( ( Context)rows.get( 0)).getValueAsInteger( "Attendee");
                // get its primary key value
        System.out.println( "Found attendee " + 
                    attendeeKey.toString());
        }
    else
        {
        attendeeKey = attendeeDescriptor.store( c);
                // store it and get its primary key value
        System.out.println( "Created attendee " + 
                    attendeeKey.toString());
        }
    String q = "insert into ATTENDANCE ( Attendee, Workshop) values ("
        + attendeeKey.toString() + ", " + workshopKey.toString() + ")";
    Connection conn = c.getConnection();
    Statement s = conn.createStatement();
                // set up a database connection
    s.executeUpdate( q);    // run the statement
    System.out.println( "Inserted link into link table");
    s.close();      // close it...
    c.releaseConnection( conn);
                // and release it back into the pool
    }
    /** prompt the user for input; if we get any, return it */
    protected static String maybeGetFromUser( BufferedReader in, String prompt,
                       String val) throws IOException
    {
    System.out.print( prompt + " ] ");
    String s = in.readLine();
    if ( s != null && s.trim().length() > 0)
        val = s.trim();     // only override the default with real input
    
    return val;
    }
    /** start me up... */
    public static void main(String args[]) 
    {
    BufferedReader in = new 
        BufferedReader( new InputStreamReader( System.in));
                // get from the user the name of the
                // database driver to use
    try
        {
        Class.forName( 
              maybeGetFromUser( in, "Database Driver", 
                    "sun.jdbc.odbc.JdbcOdbcDriver"));
                // get from the user the details
                // needed to connect to the database
        connectionContext.put( "db_url", 
                   maybeGetFromUser( in, "Database URL", 
                         "jdbc:odbc:workshop"));
        connectionContext.put( "db_username", 
                   maybeGetFromUser( in, "Database Username", 
                         "nobody"));
        connectionContext.put( "db_password", 
                   maybeGetFromUser( in, "Database Password", 
                         "doesntmatter"));
        DOMParser p = new DOMParser();
            
        p.parse( maybeGetFromUser( in, "URL of XML to handle", 
                         "file:workshop.xml"));
        walk( p.getDocument().getDocumentElement());
        System.exit( 0); // all satisfactory
        }
    catch ( Exception e)
        {
        System.out.println( "Failed: " + e.getClass().getName() +
                    ": " +e.getMessage());
        System.exit( 1); // whoops
        }
    }
}
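The class above needs Jacquard and a database to run. For experimenting with just the tree walk, here is a cut-down sketch which assumes only JDK classes: it collects the attributes of each attendee into a list of maps instead of storing them. The class name WalkSketch and its method names are invented for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical example class; a list of maps stands in for the database
class WalkSketch
{
    static final String XML =
        "<workshop tutor='Simon Brooke' title='Parsing XML'>"
        + "<attendee name='Jon Smith' country='UK'/>"
        + "<attendee name='Jane Doe' country='US'/>"
        + "</workshop>";

    /** walk down the tree; collect the attributes of every attendee */
    static void walk( Node node, List<Map<String, String>> found)
    {
        if ( node.getNodeType() != Node.ELEMENT_NODE)
            return;             // skip text, comments and so on
        Element elt = ( Element) node;
        if ( elt.getTagName().equals( "attendee"))
            found.add( attributesOf( elt));
        NodeList children = elt.getChildNodes();
        for ( int i = 0; i < children.getLength(); i++)
            walk( children.item( i), found);    // recurse
    }

    /** the attributes of an element, treated as a namespace */
    static Map<String, String> attributesOf( Element elt)
    {
        Map<String, String> result = new LinkedHashMap<>();
        NamedNodeMap attrs = elt.getAttributes();
        for ( int i = 0; i < attrs.getLength(); i++)
        {
            Node attr = attrs.item( i);
            result.put( attr.getNodeName(), attr.getNodeValue());
        }
        return result;
    }

    /** parse the text and return one map per attendee found */
    static List<Map<String, String>> attendees( String xml)
        throws ParserConfigurationException, SAXException, IOException
    {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse( new InputSource( new StringReader( xml)));
        List<Map<String, String>> found = new ArrayList<>();
        walk( doc.getDocumentElement(), found);
        return found;
    }

    public static void main( String[] args) throws Exception
    {
        for ( Map<String, String> attendee : attendees( XML))
            System.out.println( attendee.get( "name") + " from "
                + attendee.get( "country"));
    }
}
```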
    
    
Exercise period [iv]
    
      - 
        In your groups 
        
          - Write an XSL-T stylesheet that converts back from the
          common DTD to the group's DTD
- Adapt the above Java class to store (at least part
          of) documents in your group's DTD into your database
 
    
      References
    
    XML
    
      - 
        
      
- news:comp.text.xml
- Newsgroup for XML - recommended
- 
        FAQs, Directories and Resources
          - 
            
          
- Extensible Markup Language (XML): http://www.oasis-open.org/cover/xml.html
- A useful and authoritative overview of the
          technology; another good place to start.
- Frequently Asked Questions about the Extensible
          Markup Language: http://www.ucc.ie/xml/
- The most superior FAQ. Everyone seriously interested
          in XML should start here.
- SCHEMA.NET: The XML Schema Site: http://www.schema.net/
- Cafe con Leche XML News, and Resources: http://metalab.unc.edu/xml/index.html
- DEVELOPERLIFE.COM brought to you by Nazmul Idris.: http://developerlife.com/
- xmlTree - The leading directory of XML content on the
          Web: http://www.xmltree.com/
 
- 
        News
          - 
            
          
- Welcome to XMLNews.org: http://www.xmlnews.org/
- Mulberry Technologies, Inc.: XSL-List -- Open Forum
          on XSL: http://www.mulberrytech.com/xsl/xsl-list/
- XMLephant: News: http://www.xmlephant.com/pages/News/
- XML.ORG - A good XML Portal: http://www.xml.org/
- XML.com - Another good XML portal: http://www.xml.com/pub
 
- 
        Standards
- Authoritative sources of standards documents, mostly from
      the World Wide Web Consortium (W3C)
- 
        
          - 
            
          
- 
            Core standards
- 
            
              - 
                
              
- The Annotated XML Specification: http://www.xml.com/axml/testaxml.htm
- The standard annotated with the personal comments of one of its
              editors -- very revealing!
- Extensible Markup Language (XML) 1.0: http://www.w3.org/TR/1998/REC-xml-19980210
- XML Linking Language (XLink): http://www.w3.org/TR/WD-xlink#addressing
 
- 
            Resource Description Framework
              - 
                
              
- W3C Resource Description Framework: http://www.w3.org/RDF/
- java tutorial help resource only at gamelan.com:
              
              http://www.gamelan.com/journal/techfocus/090199_rdf1.html
- UKOLN: DC-dot, A Dublin Core Generator: http://www.ukoln.ac.uk/metadata/dcdot/
- Dublin Core Metadata Initiative / Documents /
              Proposed Recommendations / Dublin Core Element Set,
              Version 1.1: 
              http://purl.org/DC/documents/rec-dces-19990702.htm
- Dublin Core Metadata Initiative: http://purl.org/dc/index.htm
- UKOLN Metadata Resources - DC: http://www.ukoln.ac.uk/metadata/resources/dc/
- Welcome to XMLNews.org: http://www.xmlnews.org/
 
- 
            XSL
              - 
                
              
- XSL Transformations (XSLT) Specification: http://www.w3.org/TR/WD-xslt
 
- 
            DocBook
              - 
                
              
- The nwalsh.com Home Page - XSL DocBook
              Stylesheets: http://nwalsh.com/docbook/xsl/
- XSL DocBook Stylesheets: http://nwalsh.com/docbook/xsl/
 
- 
            WML
              - 
                
              
- WAP WAP Binary XML (WBXML) Encoding
              Specification: http://www.w3.org/TR/wbxml/
- Welcome to WAP School: http://www.refsnesdata.no/wap/default.asp
- Nokia WAP Developer Forum: Nokia WAP Toolkit: 
              http://www.forum.nokia.com/wapforum/main/1,6668,1_1_3_2,00.html
 
- 
            RSS: Rich Site Summary
              - 
                
              
- 
                Tutorials
                  - 
                    
                  
- My Netscape Network: http://my.netscape.com/publish/
- Using RSS News Feeds - Webreference.com: 
                  http://www.webreference.com/perl/tutorial/8/
 
- 
                Feed Directories
                  - 
                    
                  
- Webfeeds: 
                  http://www.stirbitch.com/cgi-bin/agg/sources.pl
- Moreover... Top stories: http://w.moreover.com/
- StartsHere Channel List: 
                  http://theweb.startshere.net/channels.phtml
- Open Directory - Computers: Internet: WWW:
                  Web Portals: Netscape Netcenter: My Netscape
                  Network: 
                  http://dmoz.org/Computers/Internet/WWW/Web_Portals/Netscape_Netcenter/My_Netscape_Network/
 
- Internet Alchemy : Internet Alchemy : RSSMaker:
              http://internetalchemy.org/rss/index.phtml
- xmlTree - The leading directory of XML content on
              the Web: http://www.xmltree.com/rss/index.htm
 
- XML.COM - Standards List Sorted by Date: http://www.xml.com/xml/pub/standate/
- W3C Scalable Vector Graphics (SVG): http://www.w3.org/Graphics/SVG/
- VML - the Vector Markup Language: http://www.w3.org/TR/1998/NOTE-VML-19980513
- Vector (infinitely zoomable) graphics for the Web,
          with implications especially for maps and technical
          diagrams.
- News Industry Text Format: http://www.nitf.org/
- Meta Content Framework Using XML: http://www.w3.org/TR/NOTE-MCF-XML/
- 'Content about content' - i.e. information for search
          and indexing engines and other software agents which must
          make some sense of the document.
- Audio, Video, and Synchronized Multimedia: http://www.w3.org/AudioVideo/
- The SMIL standard. I believe SMIL has implications
          not just for the Web, but for all sorts of presentation
          media including digital television.
- XHTML 1.0: The Extensible HyperText Markup Language:
          http://www.w3.org/TR/WD-html-in-xml/
- Backwards compatibility: implementing HTML in XML.
          Only very well written HTML is going to work!
- XML Catalog proposal: http://www.ccil.org/~cowan/XML/XCatalog.html
- XHTML 1.0: The Extensible HyperText Markup Language:
          http://www.w3.org/TR/xhtml1/
- Template Resolution in XML/HTML: http://www-uk.hpl.hp.com/people/ak/doc/trix.html
- eXtensible Server Pages (XSP) Layer 1: http://java.apache.org/cocoon/xsp/WD-xsp.html
- Workflow Management Coalition: http://www.aiim.org/wfmc/mainframe.htm
- DSML.ORG: The Standards Effort to Link Directories
          with XML: http://www.dsml.org/
 
- 
        Tutorials
          - 
            
          
- Info for Newcomers to XML at XMLINFO: http://www.xmlinfo.com/newcomers/
- Producing HTML tables with XSLT: 
          http://www.cogsci.ed.ac.uk/~dmck/xslt-tutorial.html
- A Tutorial in XML and XSL Authoring: http://pdbeam.uwaterloo.ca/~rlander/XML_Tutorial/
- Java & XML: 1 + 1 > 2: 
          http://www.sun.com.au/sjug/pres/xml/JavaAndXML/seminar.html#Slide3
- The WDVL: XML Tutorials: 
          http://www.wdvl.com/Authoring/Languages/XML/Tutorials/
- Generally Markup: XML Resources: http://pdbeam.uwaterloo.ca/~rlander/XML_Tutorial/
- developerWorks : XML : Education: 
          http://www.software.ibm.com/developer/education/xmlintro/xmlintro.html
- SGML/XML: Using Elements and Attributes: 
          http://www.oasis-open.org/cover/elementsAndAttrs.html
- Welcome to XML School: http://www.refsnesdata.no/xml/
- Practical XML : An introduction to XML and XSL
          stylesheets: 
          http://www.kst.com/articles/2000/January/practical_xml1/index.php
- Crane Softwrights Ltd. - Training: 
          http://www.CraneSoftwrights.com/training/index.htm#ptux-dl
- developerWorks : XML : Education: 
          http://www-4.ibm.com/software/developer/education/xmlintro/xmlintro.html
- RSS Tutorial: 
          http://my.netscape.com/publish/help/mnn20/quickstart.html#rsssyntax
- XML DTD Tutorial: http://www.xml101.com/dtd/
 
- 
        Software resources
          - 
            
          
- 
            Editors
              - 
                
              
- Editing SGML with Emacs and PSGML - Table of
              Contents: 
              http://rainbow.ldeo.columbia.edu/documentation/programs/psgml/psgml_toc.html#SEC2
- A GNU Emacs mode for SGML files: 
              http://www.lysator.liu.se/projects/about_psgml.html
- This is what I use and recommend (I personally
              use XEmacs rather than GNU Emacs)
- SoftQuad XMetaL: http://www.softquad.com/index_main.html
- Mulberry Technologies -- tdtd Emacs Major Mode
              for SGML and XML DTDs: http://www.mulberrytech.com/tdtd/
- Download Morphon XML Editor 1.0b41: 
              http://www.lunatech.com/products/morphon-xml-editor/download/
 
- 
            Browsers
              - 
                
              
- Jumbo: 
              http://ala.vsms.nottingham.ac.uk/vsms/java/jumbo/
- Doczilla: http://www.doczilla.com/download/index.html
- XML Viewer : another alphaWorks technology: http://www.alphaworks.ibm.com/tech/xmlviewer
- InDelv: http://www.indelv.com/
 
- 
            XML to HTML on the fly
              - 
                
              
- IBM XML Web Site, Education - Accessing XML on
              the Client: 
              http://www.software.ibm.com/xml/education/client/client.html
- Apache Cocoon: http://xml.apache.org/cocoon/
- Apache is the world's most widely used Web
              server. This is the Apache project's server-side XML
              to HTML conversion strategy, important for serving
              XML documents while many browsers are still unable to
              interpret them. Implemented as a Java Servlet, it may
              work with other Servlet-enabled Web servers (but then
              does anyone serious use anything other than Apache
              anyway?)
 
- 
            XML Database integration
              - 
                
              
- DB2XML A tool for transforming relational
              databases into XML documents: 
              http://www.informatik.fh-wiesbaden.de/~turau/DB2XML/index.html
- Tamino - The Information Server for Electronic
              Business, Software AG: http://www.softwareag.com/tamino/
- A database which claims to store XML directly.
              Whether this means that it's really an
              object-oriented database underneath I'm not
              sure.
- ODBC2XML: Merging ODBC data into XML documents:
              
              http://members.xoom.com/_XOOM/gvaughan/odbc2xml.htm
- pgxml homepage: http://www.morinel.demon.nl/pgxml/
- My favourite database engine, Postgres,
 
- XML Lightweight Extractor : another alphaWorks
              technology: http://alphaworks.ibm.com/tech/xle
 
- 
            Conversion tools and filters
              - 
                
              
- RTF2XML: http://www.xmeta.com/omlette/
- Tool for converting RTF to XML, written in
              OmniMark
- OmniMark Technologies Corporation: http://www.omnimark.com/
- A programming language for manipulating data
              streams, useful in writing conversion filters from
              other formats into XML.
 
- 
            Quick ways to produce DTDs
              - 
                
              
- DTDGenerator Frontend: http://www.pault.com/Xmltube/dtdgen.html
- DB2XML A tool for transforming relational
              databases into XML documents: 
              http://www.informatik.fh-wiesbaden.de/~turau/DB2XML/index.html
- schematron: 
              http://www.ascc.net/xml/resource/schematron/schematron.html
- Widely recommended as a very powerful and elegant
              solution; it knows about schemas as well as DTDs.
- XMLschema.com: http://apps.xmlschema.com/
 
- 
            Structured Search tools
              - 
                
              
- Downloading sgrep: 
              http://www.cs.helsinki.fi/~jjaakkol/sgrep/download.html
- Probably the most powerful simple tool for
              manipulating SGML and XML documents
 
- 
            Software collections and directories
              - 
                
              
- xml.apache.org: http://xml.apache.org/
- XMLSOFTWARE.COM: The XML Software Site: http://www.xmlsoftware.com/
- This (commercial) site tries to keep track of the
              XML-related software tools which are available. It is
              unlikely to index open source tools effectively in
              the longer term.
- Free XML software: 
              http://www.stud.ifi.uio.no/~larsga/linker/XMLtools.html#SC_XSL
 
- IBM Developers: XML : Overview: http://www.ibm.com/developer/xml/
- eXtensible Server Pages (XSP) Layer 1: http://java.apache.org/cocoon/xsp/WD-xsp.html
- OpenXML: http://www.openxml.org/
- Major open source project to provide XML tools in
          Java
- PHP3: Manual: XML Parser Functions: http://www.php.net/manual/ref.xml.php3
- PHP is a server-side scripting language -- probably
          the best of the open source ones available. This manual
          section shows how the PHP project intends to handle XML
          at the server side, and is thus an alternative to
          Apache's Cocoon technology.
- XML Authority Product Overview: 
          http://www.extensibility.com/xml_authority/xml_ath_specs.htm
- eidon products - Solutions for Structured Documents:
          http://www.eidon-products.com/
- Dynamic XML for Java : another alphaWorks technology:
          
          http://www.alphaworks.ibm.com/tech/dynamicxmlforjava
- XML Products Evaluation Form: 
          http://www.bluestone.com/scripts/SaApps/SaCGI.exe/XMLevaluate.class
- XML Script - XML tools for E-commerce: http://www.xmlscript.org/
- SAX: The Simple API for XML: http://www.megginson.com/SAX/
- Activated Intelligence Rocks Your Java World!: http://www.activated.com/
- W4F, the World Wide Web Wrapper Factory: Welcome: http://db.cis.upenn.edu/W4F/
- JDOM: Who We Are: http://www.jdom.org/credits/index.html
 
- 
        Commentary and background
          - 
            
          
- XML, Java, and the future of the Web: 
          ftp://sunsite.unc.edu/pub/sun-info/standards/xml/why/xmlapps.html
- Scientific American: Feature Article: XML and the
          Second Generation Web: May 1999: 
          http://www.scientificamerican.com/1999/0599issue/0599bosak.html
- An extremely clear and well written article
 
- DevEdge Online - Metadata: 
          http://developer.netscape.com/tech/metadata/index.html
- Netscape's official take on metadata.
- XML.COM - XML support in IE5: 
          http://www.xml.com/xml/pub/1999/03/ie5/first-x.html
- XML.com sets out to be a newsletter on XML and
          related developments. Its contributors are in general
          exceptionally well informed. In this article Tim Bray
          (who works closely with Netscape) reviews Microsoft IE5's
          XML compatibility.
- CNET News.com - Taking sides on XML: http://www.news.com/News/Item/0,4,37072,00.html
- XML Namespaces: http://www.jclark.com/xml/xmlns.htm
- The Last Page: XML's Achilles Heel (Web Techniques,
          June 1999): 
          http://www.webtechniques.com/archives/1999/06/lastpage/
 
- 
        XML EDI and e-Commerce stuff
- A number of competing proposals are being developed for
      automatic business-to-business transfer of invoices,
      orders, et cetera...
- 
        
          - 
            
          
- CNET.com - News - Services & Consulting -
          Big-name chemical firms join business e-commerce trend:
          
          http://news.cnet.com/news/0-1008-200-1579569.html?tag=st
- 
            Collaborative initiatives
              - 
                
              
- The OBI Consortium: http://www.openbuy.org/
- A solid business community consortium
 
- Welcome to RosettaNet: http://www.rosettanet.org/
- Probably the most incompetent and unprofessional
              Web site I've ever seen. This organisation claims to
              be the hub of EDI in XML development, but their Web
              site gives no comfort whatever regarding their
              competence.
- Biztalk - Letting computers speak the language of
              business: http://www.biztalk.org/
- Microsoft's tame e-Commerce consortium.
- FpML.org: http://www.fpml.org/
- JP Morgan - PriceWaterhouseCoopers initiative,
              apparently mainly aimed at financial services.
- Electronic Business XML (ebXML) Home Page: http://www.ebXML.org/
 
- 
            Suppliers
              - 
                
              
- DEDIOUX - Dynamic EDI Objects Using XML: 
              http://www.americancoders.com/OpenBusinessObjects
- ariba.com - welcome: http://www.ariba.com/
- Welcome To OpenLink Software: http://www.openlinksw.com/virtuoso/
 
- 
            Stories
              - 
                
              
- XML Applications Stand Up To EDI: 
              http://www.techweb.com/wire/story/TWB19990416S0002
- XML Applications Stand Up To EDI: 
              http://www.techweb.com/se/directlink.cgi?INW19990419S0014
- News story about Dell Computer's XML
 
- CNET News.com - IBM links business software,
              e-commerce: 
              http://www.news.com/News/Item/0,4,35128,00.html
- News story about IBM's XML e-Commerce
 
 
 
- 
        WAP/WML
          - 
            
          
- WAP WAP Binary XML (WBXML) Encoding Specification: http://www.w3.org/TR/wbxml/
- wml-tools: http://www.pwot.co.uk/wml/
- www.kannel.org: http://www.kannel.org/
 
- XML Icon Gallery.: http://www.iol.ie/~alank/xml/icons.htm