Course Outline
Adopting XML: Tomorrow's Web
1 Day Tutorial Format
Course Description
A one-day workshop which gives an overview of the present
state of development of, and future prospects for, XML (eXtensible
Markup Language), the proposed new open standard for Web
documents. The workshop includes some technical material and
hands-on elements. Participants should have a working knowledge
of HTML.
NOTE This course has not been updated since
1998 and is now woefully out of date. The successor course is here.
Who should attend
- Web developers,
- Web development managers.
Course Objectives
The course is intended to help you:
- Understand the underlying technology of XML
- Understand the benefits offered by XML, and their
costs.
- Be aware of the current state of the standardisation of
XML
- Be aware of commercial implementations and market
commitments to XML
- Be aware of available tools for generating XML
documents
- Understand the issues involved in switching to XML from
other technologies
- Decide whether, and when, to switch your Web development
to XML
Course Outcomes
The delegate will understand how XML fits into the
architecture of the Web, and will have the resources to develop
a strategy for incorporating XML into their organisation's Web
presence, or for converting an existing Web presence entirely
to XML.
- Course Notes
- Set of Online and Published references
- XML Strategy Workbook
Working Method
- Presentation of Information
- Worked case-study
- Hands-on investigation
- Personal plan workbook
But first, an apology...
This course was written by and was scheduled to be presented
by me, Simon
Brooke. However, what I did this summer was very
carelessly roll an open car and break my back, so it will
instead be presented by my friend and colleague, Gordon Howell. I'm sure
he'll keep you entertained and informed for the duration...
Presenting this course
Tutor's notes in the text are marked up
like this - just as reminders of where you are at and any
special things to mention.
There are many hypertext links in the
body of the document. Except where mentioned in tutor's notes,
they are not part of the course and you need not follow them.
If you do follow them, please come back to the body of the
course!
Please ensure that you have downloaded
local copies of both the SMIL demo and the JUMBO demo in
advance.
If you don't want these notes to appear
while you are giving the presentation, edit slideshow.css and
remove the commenting around the line 'display:
none' at the end of the file.
Course Outline: What we're going to do today
- What is XML
- Status of XML
- Benefits of XML
- Document Type Definitions and dialects of XML
- Anatomy of an XML system
- XML in your context
- Review of XML Tools and Technologies
What is XML
- Key features
- Differences from HTML
- Differences from SGML
- Reality check
Key Features
- A markup language
- Describes the structure of a document
- Not the visual appearance (CSS1, CSS2, XSL)
- Written as plain text (Unicode, of which ASCII is a subset)
Visual Appearance and Style Sheets [1]
- The visual appearance of a document should be controlled
by style sheets.
- The appearance of this one is.
- In XML as in HTML you don't have to use style
sheets.
- If you don't, you will get a plain, simple
appearance.
If people are interested, you can open
the style sheet for this presentation, slideshow.css, in a text
editor.
Visual Appearance and Style Sheets [2]
- A special style sheet language, XSL, has been written to
support the new features of XML.
- You can continue to use existing CSS1 and CSS2 stylesheets.
- Probably.
- Depending on what individual browser vendors decide
to support...
- This presentation is not about style sheets.
Differences from HTML
- In a word, extensible.
- Also, strictly parsed.
Extensible: what does this mean for you?
- Allows you to define new markup.
- Describing structure, not
appearance.
- Makes it easier for programs to extract
information from your documents.
Extensible: a simple example [1]
<customer-details id="AcPharm39156">
<name>Acme Pharmaceuticals Co.</name>
<address country="US">
<street>7301 Smokey Boulevard</street>
<city>Smallville</city>
<state>Indiana</state>
<postal>94571</postal>
</address>
</customer-details>
Extensible: a simple example [2]
- What does this do?
- For the user directly, very little.
- For the user's program, it allows it to isolate items of
structured information and handle them in intelligent ways to
help the user.
- But only if the user's program understands the special
markup you have defined.
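As a rough illustration of what "the user's program" might do with the customer record shown earlier, here is a minimal Python sketch using the standard library's xml.etree.ElementTree module. The helper name summarise_customer and the shape of its result are assumptions made for this illustration; they are not part of the course material.

```python
import xml.etree.ElementTree as ET

# The customer record from the "simple example" slide, held as a string.
CUSTOMER_XML = """
<customer-details id="AcPharm39156">
  <name>Acme Pharmaceuticals Co.</name>
  <address country="US">
    <street>7301 Smokey Boulevard</street>
    <city>Smallville</city>
    <state>Indiana</state>
    <postal>94571</postal>
  </address>
</customer-details>
"""


def summarise_customer(xml_text):
    """Isolate a few named items from a customer-details document.

    Hypothetical helper: it only works because it has been written
    to understand this particular markup.
    """
    root = ET.fromstring(xml_text.strip())
    address = root.find("address")
    return {
        "id": root.get("id"),
        "name": root.findtext("name"),
        "city": address.findtext("city"),
        "country": address.get("country"),
    }


if __name__ == "__main__":
    print(summarise_customer(CUSTOMER_XML))
```

The program finds each item by its element or attribute name rather than by guessing from layout, which is the point of descriptive markup.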
Strictly parsed: what does this mean for you? [1]
Documents which do not conform to the standard will not be
rendered by an XML browser.
At all.
- Tags and attributes are case-sensitive;
- End tags cannot be omitted - every <p>
must have a </p>.
- Tags must be correctly nested:
<b><i>This won't
work</b></i>
- Et cetera, et cetera...
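The rules above can be tried out with any XML parser. The following Python sketch uses the standard library's xml.etree.ElementTree module as a stand-in for "an XML browser"; the function name is_well_formed and the sample strings are invented for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# One acceptable document and three that break the rules listed above.
GOOD = "<p><b><i>This works</i></b></p>"
BAD_NESTING = "<p><b><i>This won't work</b></i></p>"
MISSING_END_TAG = "<p>No end tag"
CASE_MISMATCH = "<P>Mixed case</p>"

if __name__ == "__main__":
    for sample in (GOOD, BAD_NESTING, MISSING_END_TAG, CASE_MISMATCH):
        print(is_well_formed(sample), sample)
```

Note that the parser does not try to recover from any of these errors: it rejects the whole document.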
Strictly parsed: what does this mean for you? [2]
- Most Web designers are sloppy.
- More than ninety percent of all commercially authored Web
pages do not conform to any standard and are not valid
HTML.
- Few if any of the commercially available WYSIWYG tools
generate valid HTML.
- Web authors switching to XML will need to adopt much more
rigorous technical discipline.
Differences from SGML
- Like HTML, much simpler!
- Like HTML, optimised for delivery over
restricted-bandwidth links.
- Unlike HTML, a true subset of SGML.
- All valid XML documents are valid SGML documents.
- SGML tools (conforming to ISO 8879) will work with
XML.
- Organisations with an existing commitment to SGML will
find the transition to XML much simpler.
Reality check
- Where is this process at?
- Will this really happen?
- Will I have to change what I do?
Where is this process at?
- A draft
standard has been published.
- Software tools
are emerging.
- No mainstream browser is available yet.
- About where HTML was in 1994.
- Prediction: commercially important in
two years, dominant in four.
Will this really happen?
- All the major players are involved.
- Many emerging standards depend on XML.
- XML has many advantages over HTML.
- Prediction: it will really happen.
Will you have to change what you do?
- HTML documents will continue to work with XML
browsers.
- May need to be more correct than at present.
- Correct HTML documents (i.e. ones which validate) are very easy
to convert to XML. Automatic conversion will be
possible.
- Increasingly, search engines will depend on XML-based
metadata.
- Increasingly, mainstream browsers will exploit XML-based
metadata.
- Conclusion: no, but...
- Prediction: commercial Web authors who
don't change won't stay in business.
Status of XML
The World Wide Web Consortium ('W3C') [1]
- The standards body for the Web.
- An open organisation: anyone can join.
- Not for profit.
- All major players are members.
- Should your organisation be?
The World Wide Web Consortium ('W3C') [2]
- Driving the standardisation
process for XML.
- Issued the XML 1.0 Recommendation in February 1998.
- Using XML as the basis for emerging standards in privacy,
multimedia, etc.
Emerging XML based standards
- SMIL (Synchronised Multimedia Integration Language): a set
of XML extensions to handle embedded multimedia.
- PICS (Platform for Internet Content Selection): a means of
labelling the content of documents based on criteria of taste -
mainly motivated by people who want to protect children from
sexually explicit material.
- RDF (Resource Description Framework): a standard for
structuring embedded metadata - making it easier for programs
to understand documents.
Some other proposed XML-based standards
This slide is included simply to give
some feel of the scale of the XML project...
Netscape Communications Corporation
- Publicly committed to XML.
- Tim Bray of
Netscape and Textuality is joint editor of XML standard
document.
- Developing XML-based indexing protocol, MCF.
- Navigator 4 does not handle XML.
- Navigator 5 Beta: "...includes XML support..."
- Source code is
available
Microsoft Corporation
- Publicly committed to XML.
- Jean Paoli of
Microsoft is joint editor of XML standard document
- Developing XML-based indexing protocol, PICS
- Have an XML parser written in Java
- Have an XML authoring tool, XML
Notepad
- Internet Explorer 4 partially handles
XML
- Internet Explorer 5 will not fully
handle XML
Benefits of XML
- SMIL: Glitz and eye-candy
- Meta information frameworks: benefits for searching and
indexing
- Benefits for dynamic content
- Benefits for precise layout and visual appearance
- Limitations and costs
SMIL: Glitz and eye-candy
- Essentially a framework for embedding multimedia objects,
so that they can intercommunicate
- Has some built-in multimedia capabilities
- Optimised for low bandwidth
- Designed to make it easy to add new handlers for new
multimedia formats
- First commercial implementation:
Real Networks G2
Say 'Smile'!
SMIL: An Example
Larry Bouthillier: "What I did last summer..."
NB: You should ensure you have the G2
beta of RealPlayer to view this. As of November 1998, bugs in
the server prevented the presentation from completely working
over the Internet, but what does work on a 28k modem is still
worth showing. To demonstrate the complete presentation,
download the source and media in advance from here.
Document Type Definitions and dialects of XML
- What is a DTD?
- Do I have to use a DTD?
- What DTDs are available?
- Who will write DTDs?
What is a Document Type Definition [1]?
- Every Web author has heard of one
- Every good Web author has seen one
- Very few Web authors have written one
Do I have to use a DTD?
- As with HTML, you don't have to specify a DTD.
- Even if you define new markup...
- ... but client programs won't know how to interpret your
new markup unless you also define a DTD.
- As with HTML, you should specify a DTD...
- ... and we all do, don't we, children?
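To give a feel for what a DTD looks like, the sketch below embeds a small one for the customer record from the earlier "Extensible" slides. The DTD is hypothetical (the course does not supply one), and so are the CONTENT_MODELS table and the follows_models helper. Python's standard xml.etree.ElementTree parser reads the DTD but does not validate against it, so the sketch restates two of the content models by hand and checks those; a real validating parser would do this work from the DTD itself.

```python
import xml.etree.ElementTree as ET

# A hypothetical DTD for the customer record, written as an internal subset.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE customer-details [
  <!ELEMENT customer-details (name, address)>
  <!ATTLIST customer-details id ID #REQUIRED>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT address (street, city, state, postal)>
  <!ATTLIST address country CDATA #IMPLIED>
  <!ELEMENT street (#PCDATA)>
  <!ELEMENT city (#PCDATA)>
  <!ELEMENT state (#PCDATA)>
  <!ELEMENT postal (#PCDATA)>
]>
<customer-details id="AcPharm39156">
  <name>Acme Pharmaceuticals Co.</name>
  <address country="US">
    <street>7301 Smokey Boulevard</street>
    <city>Smallville</city>
    <state>Indiana</state>
    <postal>94571</postal>
  </address>
</customer-details>
"""

# The same two content models restated by hand (an assumption of this
# sketch), because ElementTree checks well-formedness only.
CONTENT_MODELS = {
    "customer-details": ["name", "address"],
    "address": ["street", "city", "state", "postal"],
}


def follows_models(element):
    """Check that each element's children match the hand-written models."""
    expected = CONTENT_MODELS.get(element.tag)
    children = [child.tag for child in element]
    if expected is not None and children != expected:
        return False
    return all(follows_models(child) for child in element)


root = ET.fromstring(DOCUMENT)

if __name__ == "__main__":
    print("Matches the content models:", follows_models(root))
```

Each ELEMENT declaration says what an element may contain and in what order; each ATTLIST declaration says which attributes it may carry.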
What DTDs are available?
- All the XML extensions discussed in this presentation are
defined as DTDs.
- Thousands of SGML DTDs are available which can relatively
easily be converted.
- Relatively few XML DTDs yet available.
Who will write DTDs?
- Very specialised documents, technically demanding to
write.
- For most purposes, suitable DTDs will quickly become
available.
- Most Web authors will never write a DTD.
- Large organisations with special documentation
requirements may write DTDs.
- Corporations which sell word-processors will
probably write DTDs.
- Corporations which sell WYSIWYG Web authoring tools
will certainly write DTDs.
- In future, there will be much less distinction
between a word processor and a Web authoring tool.
- Communities of interest with special technical needs
will certainly write DTDs.
Anatomy of an XML system
- CML: Chemical Markup Language
- The JUMBO
XML browser and CML
Chemical Markup Language
- An application of XML
- Specified by a special-purpose DTD
- Allows chemists to interchange information
The Jumbo XML Browser: Introduction
- Peter Murray-Rust, University of Nottingham,
England.
- Prototype software.
- Developed for displaying and editing CML, but claimed to
be general-purpose XML browser.
- Architecturally interesting.
- Almost unusable and certainly nowhere near
user-ready!
The Jumbo Browser: Illustration
I'm including this because in my
experience the browser is so hard to use you may not be able to
make it do anything the course will see as interesting.
- Scalable, interactive diagrams linked into text.
- Links from objects in the diagrams into the text.
- Complex special notation.
Using Jumbo [1]
The XML source code for this
demo
- This is an old version; I couldn't make the latest
version do anything interesting to show you!
- Ignore the window called 'SGMLTree'.
- When the 'TableOfContents' window opens, it's too small
and you have to resize it.
- I think there should be some way of rendering the
document as a readable document, but I can't work out
how!
- Open the folder marked 'Assignments'
- Click on any of the little circles by the assignments.
- A window opens with a picture of the molecule.
- A window opens with predicted spectroscopy of the
molecule.
- You'll probably have to resize these.
- Rotate the molecule.
- Click on the highlighted atom and note how a highlight
appears on the graph.
- Click on the little circles against other assignments and
note how the graph and molecule change.
- Finally, look at the XML source for this demo.
Using Jumbo [2]
- No, I don't really understand it either - I'm not a
chemist!
- Notice how simple the XML source is.
- Notice how small the XML source is.
- XML is used to display complex technical information to a
specialist audience in a form that audience will
understand.
XML in your context
- Applications which will benefit greatly from XML
- Applications which will benefit little from XML
- Early adoption: arguments for
- Organisations which should aim to be early adopters
- Wait and see: arguments for
- Organisations which should aim to wait and see
- Hybrid strategy: arguments for
- Organisations which should adopt a hybrid strategy
Applications which will benefit greatly from XML
- Multimedia applications.
- Technical documentation applications, or applications
involving special notation (e.g., mathematics, music).
- Applications incorporating client-end software
agents.
- Applications requiring highly detailed
illustrations.
At present, only where the audience is
controlled
Applications which will benefit little from XML
- Simple publishing of text, with or without simple graphics.
- What I did last summer..!
But even here the advantages of XML will gradually take
over.
Early adoption: arguments for
- Better representation of technical information.
- Improved multimedia capabilities.
- Improved indexing capabilities.
- Development of new skills which will very rapidly become
important.
Organisations which should aim to be early adopters
- Technical information
- Specialist search
- Gaining Experience
Early Adopters: Technical information [1]
Organisations which distribute large quantities of technical
information to a targeted audience should adopt early.
- Improved indexing and searching allows better navigation
of the documents.
- Special markup can be used to help document users and
maintainers understand document structure.
- Technical notation can be easily incorporated where
required.
- Vector graphics allow 'zoomable' detail.
Early Adopters: Technical information [2]
- Engineering companies distributing technical
manuals.
- Companies exchanging technical specifications.
- Science and research establishments publishing technical
information.
Early Adopters: Specialist search [1]
Organisations which publish volumes of reference information
which users typically search should adopt early.
- Improved indexing and searching allows better navigation
of the documents.
- Special markup allows client-side user agents to
understand document structure and select 'interesting'
information.
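As a sketch of how a client-side agent might select 'interesting' information, the Python example below filters a small news feed by subject using the standard library's xml.etree.ElementTree module. The news-feed markup, the subject attribute and the select_headlines function are all invented for this illustration and do not correspond to any real news provider's format.

```python
import xml.etree.ElementTree as ET

# Hypothetical news markup; every element and attribute name is invented.
NEWS_FEED = """
<news-feed>
  <story subject="markets"><headline>Shares close higher</headline></story>
  <story subject="sport"><headline>Cup final goes to extra time</headline></story>
  <story subject="markets"><headline>Bank holds interest rates</headline></story>
</news-feed>
"""


def select_headlines(xml_text, subject):
    """Return the headlines of the stories marked with the given subject."""
    root = ET.fromstring(xml_text.strip())
    # ElementTree's limited XPath support allows selection by attribute value.
    stories = root.findall("story[@subject='%s']" % subject)
    return [story.findtext("headline") for story in stories]


if __name__ == "__main__":
    for headline in select_headlines(NEWS_FEED, "markets"):
        print(headline)
```

The agent needs no knowledge of how the stories are laid out on screen; it relies only on the publisher marking each story with its subject.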
Early Adopters: Specialist search [2]
- News providers, especially upstream news providers such
as PA and Reuters.
- Market information providers.
- Online libraries.
- Search engines.
Early Adopters: Gaining experience [1]
Organisations which view the Web as core to their business
should adopt early.
- XML is radically different from HTML:
- Much richer, more powerful;
- Technically much more demanding.
- New skillsets and tools will be needed to publish
effectively in XML.
- The learning curve is steep.
Early Adopters: Gaining experience [2]
- Web authors and Web production companies.
- Especially, software houses which make tools for Web
authoring;
- Still a considerable market opportunity in which new
starters might succeed.
Wait and see: arguments for
- 'It's not ready':
- No commercial quality XML browsers are yet available, and
it may be some time before any emerge.
- 'It may never happen':
- Some powerful software houses may see it as to their
advantage to undermine the standardisation process.
- 'What we've got is good enough':
- For many organisations, the present capabilities of HTML
as enhanced with applets and various proprietary multimedia
plug-ins will continue to fulfill their needs for some time
to come.
Organisations which should aim to wait and see
- Successful on-line retailers: organisations with a big
investment in existing Web technology, which is paying off
for them, do not need to change now, but should track the
technology.
- Organisations with simple 'brochureware' Web sites will
not need to change for some time.
- Organisations for whom the Web is not a core part of
their business do not need to change now.
Remember: better tools will emerge; this is still a
bleeding edge.
Hybrid strategy: arguments for
Organisations which should adopt a hybrid strategy
Review of XML Tools and Technologies
Parsers
Everyone and his dog seems to have written an XML parser in
Java:
There are also a few parsers available in C, Python,
etc...
Parsers are essential technology if you want to build
user-level tools for XML, but, by themselves, don't do anything
useful for the average user.
Editors
A number of other SGML editing tools can be used for
XML.
Browsers
- Jumbo
- Demonstrates interesting new
functionality not possible with HTML.
- Edits as well as browses.
- Extremely counter-intuitive and hard to
use
- Fujitsu HyBrick
- Looks like a browser!
- Lays out simple documents
straightforwardly.
- All menu text, help text, et cetera, in
Japanese.
- Can't lay out any of the documents from
the Jumbo or SMIL demos
- Microsoft IE 5
- Intended for real users to really
use.
- Renders all XML into HTML in order to
lay it out, so can't do anything which couldn't be done
with HTML.
SGML Browsers may also work.
Tools: The state of play
- None of the available XML browsers can even render the
demos distributed with any of the other XML browsers.
- No WYSIWYG editor is yet available, and, indeed, to build
one you would first have to build a working browser.
- None of the available XML editing tools appears suitable
for creating large or complex document structures.
- There are good, established SGML tools, but these may be
too expensive or too technically demanding (or both) for most
Web production houses.
- There are still very considerable market
opportunities.
Conclusions
- Early days
- When we planned this presentation in April 1998, we
thought that by November this technology would be beginning
to stabilise. It hasn't; it isn't ready for the real world
yet.
- Scope
- XML and the family of proposals based around it could
radically change the way we interact with all data and all
machines: it has the potential to be far more pervasive than
the Web is now.
- Development could be very rapid
- Given good document authoring, conversion and maintenance
tools and at least one solid mass-market browser, conversion
to XML could be very rapid indeed.
Thank you
Course author: Simon
Brooke
Course presenter: Gordon
Howell
This presentation is available
online.
Marketing Note
This course is new in second quarter 1998.
Course Author and Presenter
Simon Brooke
Associate, Internet Business
Services Consulting Ltd simon@ibsc.co.uk
Simon Brooke has been a technical consultant in advanced
software applications for thirteen years. He advises on the
development of software architectures and systems, primarily
for Internet and Intranet applications.
As a consultant, Simon has advised many blue chip companies,
primarily in the IT, Telecoms and Chemical industries, on the
application and development of advanced software systems.