Learn XML

Introduction to XML

XML stands for eXtensible Markup Language.

XML was designed to store and transport data.

XML was designed to be both human-readable and machine-readable.

What is XML?

  • XML stands for eXtensible Markup Language
  • XML is a markup language much like HTML
  • XML was designed to store and transport data
  • XML was designed to be self-descriptive
  • XML is a W3C Recommendation

XML Does Not DO Anything

This note is a note to Tove from Jani, stored as XML:

<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

But still, the XML above does not DO anything. XML is just information wrapped in tags.

Someone must write a piece of software to send, receive, store, or display it:


To: Tove
From: Jani

Reminder

Don't forget me this weekend!
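Here is a minimal sketch of such software. It assumes Python and its standard-library `xml.etree.ElementTree` module; the tutorial itself does not prescribe a language. The function reads the note and produces plain-text output like the example above. The name `render_note` is hypothetical.

```python
import xml.etree.ElementTree as ET

NOTE_XML = """<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""


def render_note(xml_text: str) -> str:
    """Parse the note and format it for display; the XML only carries the data."""
    note = ET.fromstring(xml_text)
    lines = [
        "To: " + note.findtext("to"),
        "From: " + note.findtext("from"),
        note.findtext("heading"),
        note.findtext("body"),
    ]
    return "\n".join(lines)


print(render_note(NOTE_XML))
```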

The Difference Between XML and HTML

XML and HTML were designed with different goals:

  • XML was designed to carry data - with focus on what data is
  • HTML was designed to display data - with focus on how data looks
  • XML tags are not predefined like HTML tags are

XML Does Not Use Predefined Tags

The XML language has no predefined tags.

The tags in the example above (like <to> and <from>) are not defined in any XML standard. These tags are “invented” by the author of the XML document.

HTML works with predefined tags like <p>, <h1>, <table>, etc.

With XML, the author must define both the tags and the document structure.

XML is Extensible

Most XML applications will work as expected even if new data is added (or removed).

Imagine an application designed to display the original version of note.xml (<to> <from> <heading> <body>).

Then imagine a newer version of note.xml with added <date> and <hour> elements, and a removed <heading>.

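Because of the way XML is constructed, the older version of the application can still work:

<note>
  <date>2015-09-01</date>
  <hour>08:30</hour>
  <to>Tove</to>
  <from>Jani</from>
  <body>Don't forget me this weekend!</body>
</note>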
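The Python sketch below is an illustration, not part of any standard. It shows why this works: a reader that looks up elements by name simply ignores elements it does not know about, and it can supply a default for one that has disappeared. Several things are assumptions made for the example: the function `read_note`, its empty-string default for `<heading>`, and the `<date>`/`<hour>` values.

```python
import xml.etree.ElementTree as ET

ORIGINAL_NOTE = """<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""

NEWER_NOTE = """<note>
  <date>2015-09-01</date>
  <hour>08:30</hour>
  <to>Tove</to>
  <from>Jani</from>
  <body>Don't forget me this weekend!</body>
</note>"""


def read_note(xml_text: str) -> dict:
    """An 'older application': it asks only for the elements it knows by name.
    Unknown elements such as <date> and <hour> are never looked at, and a
    missing <heading> falls back to an empty string instead of failing."""
    note = ET.fromstring(xml_text)
    return {
        "to": note.findtext("to"),
        "from": note.findtext("from"),
        "heading": note.findtext("heading", default=""),
        "body": note.findtext("body"),
    }


for version in (ORIGINAL_NOTE, NEWER_NOTE):
    print(read_note(version))
```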

XML Simplifies Things

  • It simplifies data sharing
  • It simplifies data transport
  • It simplifies platform changes
  • It simplifies data availability

Many computer systems contain data in incompatible formats. Exchanging data between incompatible systems (or upgraded systems) is a time-consuming task for web developers. Large amounts of data must be converted, and incompatible data is often lost.

XML stores data in plain text format. This provides a software- and hardware-independent way of storing, transporting, and sharing data.

XML also makes it easier to expand or upgrade to new operating systems, new applications, or new browsers, without losing data.

With XML, data can be available to all kinds of “reading machines” like people, computers, voice machines, news feeds, etc.

XML is a W3C Recommendation

XML became a W3C Recommendation as early as February 1998.

How Can XML be Used?

XML is used in many aspects of web development.

XML is often used to separate data from presentation.

XML Separates Data from Presentation

XML does not carry any information about how it should be displayed.

The same XML data can be used in many different presentation scenarios.

Because of this, XML allows a full separation between data and presentation.

XML is Often a Complement to HTML

In many HTML applications, XML is used to store or transport data, while HTML is used to format and display the same data.

XML Separates Data from HTML

When displaying data in HTML, you should not have to edit the HTML file when the data changes.

With XML, the data can be stored in separate XML files.

With a few lines of JavaScript code, you can read an XML file and update the data content of any HTML page.


<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
  </book>
</bookstore>
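The tutorial does this with JavaScript in the browser. As a language-neutral sketch of the same idea, the Python function below reads the book data and generates an HTML table from it; the name `books_to_html` is hypothetical. When the XML changes, rerunning the function updates the page content without anyone editing the HTML by hand.

```python
import xml.etree.ElementTree as ET
from html import escape

BOOKS_XML = """<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
  </book>
</bookstore>"""


def books_to_html(xml_text: str) -> str:
    """Render every <book> as a table row. All presentation choices live here;
    the XML holds only the data."""
    root = ET.fromstring(xml_text)
    rows = []
    for book in root.iter("book"):
        title = escape(book.findtext("title", default=""))
        author = escape(book.findtext("author", default=""))
        rows.append(f"  <tr><td>{title}</td><td>{author}</td></tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


print(books_to_html(BOOKS_XML))
```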

Transaction Data

Thousands of XML formats exist, in many different industries, to describe day-to-day data transactions:

  • Stocks and Shares
  • Financial transactions
  • Medical data
  • Mathematical data
  • Scientific measurements
  • News information
  • Weather services

Example: XML News

XMLNews is a specification for exchanging news and other information.

Using a standard makes it easier for both news producers and news consumers to produce, receive, and archive any kind of news information across different hardware, software, and programming languages.

An example XMLNews document:

<?xml version="1.0" encoding="UTF-8"?>
<nitf>
  <head>
    <title>Colombia Earthquake</title>
  </head>
  <body>
    <headline>
      <hl1>143 Dead in Colombia Earthquake</hl1>
    </headline>
    <byline>
      <bytag>By Jared Kotler, Associated Press Writer</bytag>
    </byline>
    <dateline>
      <location>Bogota, Colombia</location>
      <date>Monday January 25 1999 7:28 ET</date>
    </dateline>
  </body>
</nitf>

Example: XML Weather Service

An XML national weather service from NOAA (National Oceanic and Atmospheric Administration):

<?xml version="1.0" encoding="UTF-8"?>
<current_observation>

<credit>NOAA's National Weather Service</credit>

<image>
  <title>NOAA's National Weather Service</title>
</image>

<location>New York/John F. Kennedy Intl Airport, NY</location>
<observation_time_rfc822>Mon, 11 Feb 2008 06:51:00 -0500 EST
</observation_time_rfc822>

<weather>A Few Clouds</weather>

</current_observation>
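A consumer of such a feed only needs to pick out the elements it cares about. The Python sketch below assumes the observation is wrapped in a `<current_observation>` root element and uses only three of the fields shown above. It extracts the location and the weather, and parses the RFC 822 timestamp with the standard library's `email.utils.parsedate_to_datetime`. The function name `summarize` is hypothetical.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Assumed wrapper element; only three of the fields shown above are included.
OBSERVATION_XML = """<current_observation>
  <location>New York/John F. Kennedy Intl Airport, NY</location>
  <observation_time_rfc822>Mon, 11 Feb 2008 06:51:00 -0500 EST
</observation_time_rfc822>
  <weather>A Few Clouds</weather>
</current_observation>"""


def summarize(xml_text: str) -> str:
    """Pick out location, weather and observation time from one observation."""
    obs = ET.fromstring(xml_text)
    # The timestamp text ends with a line break, so strip it before parsing.
    stamp = obs.findtext("observation_time_rfc822").strip()
    observed = parsedate_to_datetime(stamp)  # timezone-aware datetime
    return f"{obs.findtext('location')}: {obs.findtext('weather')} at {observed.isoformat()}"


print(summarize(OBSERVATION_XML))
```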



XML Tree

XML documents form a tree structure that starts at “the root” and branches to “the leaves”.

An Example XML Document

The image above represents books in this XML:

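<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="web">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>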

XML Tree Structure

XML documents are formed as element trees.

An XML tree starts at a root element and branches from the root to child elements.

All elements can have sub-elements (child elements):

<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>

The terms parent, child, and sibling are used to describe the relationships between elements.

Parents have children. Children have parents. Siblings are children on the same level (brothers and sisters).

All elements can have text content (Harry Potter) and attributes (category="cooking").
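These relationships can also be explored programmatically. The Python sketch below uses a shortened, two-book version of the bookstore document. Note that `xml.etree.ElementTree` only stores links from a parent to its children, so the child-to-parent dictionary is built by hand; this is a common idiom, not a built-in feature.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_XML = """<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
  </book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE_XML)

# ElementTree links parents to children only, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}

title = root.find("book/title")          # first <title> in the document
book = parent_of[title]                  # its parent <book>
siblings = [el.tag for el in book if el is not title]
second_book = root.findall("book")[1]

print(root.tag)                          # bookstore
print(book.tag, book.get("category"))    # book cooking
print(siblings)                          # ['author']
print(second_book.find("title").text)    # Harry Potter
```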

Self-Describing Syntax

XML uses a highly self-describing syntax.

A prolog defines the XML version and the character encoding:

<?xml version="1.0" encoding="UTF-8"?>

The next line is the root element of the document:

<bookstore>

The next line starts a <book> element:

<book category="cooking">

The <book> elements have 4 child elements: <title>, <author>, <year>, <price>.

<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>

The next line ends the book element:

</book>

You can assume, from this example, that the XML document contains information about books in a bookstore.

XML Syntax Rules

The syntax rules of XML are very simple and logical. The rules are easy to learn, and easy to use.

XML Documents Must Have a Root Element

XML documents must contain one root element that is the parent of all other elements:

<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>

In this example <note> is the root element:

<?xml version="1.0" encoding="UTF-8"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

The XML Prolog

This line is called the XML prolog:

<?xml version="1.0" encoding="UTF-8"?>

The XML prolog is optional. If it exists, it must come first in the document.

XML documents can contain international characters, like Norwegian øæå or French êèé.

To avoid errors, you should specify the encoding used, or save your XML files as UTF-8.

UTF-8 is the default character encoding for XML documents.

Character encoding can be studied in our Character Set Tutorial.

UTF-8 is also the default encoding for HTML5, CSS, JavaScript, PHP, and SQL.
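The effect of the encoding declaration can be checked with Python's standard-library parser. In this sketch, the bytes F8 E6 E5 are the ISO-8859-1 encoding of "øæå". The declaration tells the parser how to decode them, while a document without a prolog is read as UTF-8. The last case shows that a prolog which does not come first (here it follows a single space) is rejected. The element name `word` is arbitrary.

```python
import xml.etree.ElementTree as ET

# b"\xf8\xe6\xe5" is "øæå" in ISO-8859-1; the prolog names that encoding.
latin1_doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><word>\xf8\xe6\xe5</word>'
latin1_text = ET.fromstring(latin1_doc).text

# No prolog: the parser falls back to UTF-8, the XML default.
utf8_doc = "<word>øæå</word>".encode("utf-8")
utf8_text = ET.fromstring(utf8_doc).text

# A prolog must come first; even one leading space makes it an error.
try:
    ET.fromstring(b' <?xml version="1.0"?><word/>')
    misplaced_prolog_rejected = False
except ET.ParseError:
    misplaced_prolog_rejected = True

print(latin1_text, utf8_text, misplaced_prolog_rejected)
```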

All XML Elements Must Have a Closing Tag

In XML, it is illegal to omit the closing tag. All elements must have a closing tag:

<p>This is a paragraph.</p>
<br />

The XML prolog does not have a closing tag. This is not an error: the prolog is not an element, so the closing-tag rule does not apply to it.

XML Tags are Case Sensitive

XML tags are case sensitive. The tag <Letter> is different from the tag <letter>.

Opening and closing tags must be written with the same case:

<message>This is correct</message>

“Opening and closing tags” are often referred to as “Start and end tags”. Use whatever you prefer. It is exactly the same thing.

XML Elements Must be Properly Nested

In HTML, you might see improperly nested elements:

<b><i>This text is bold and italic</b></i>

In XML, all elements must be properly nested within each other:

<b><i>This text is bold and italic</i></b>

In the example above, “Properly nested” simply means that since the <i> element is opened inside the <b> element, it must be closed inside the <b> element.

XML Attribute Values Must Always be Quoted

XML elements can have attributes in name/value pairs just like in HTML.

In XML, the attribute values must always be quoted:

<note date="12/11/2007">
  <to>Tove</to>
  <from>Jani</from>
</note>

Entity References

Some characters have a special meaning in XML.

If you place a character like “<” inside an XML element, it will generate an error because the parser interprets it as the start of a new element.

This will generate an XML error:

<message>salary < 1000</message>

To avoid this error, replace the “<” character with an entity reference:

<message>salary &lt; 1000</message>

There are 5 pre-defined entity references in XML:

Source code   Display   Meaning
&lt;          <         less than
&gt;          >         greater than
&amp;         &         ampersand
&apos;        '         apostrophe
&quot;        "         quotation mark

Only < and & are strictly illegal in XML, but it is a good habit to replace > with &gt; as well.
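In Python, for example, the standard library's `xml.sax.saxutils.escape` performs the &, < and > replacements, and the parser turns the references back into characters. Here is a minimal sketch; the sample strings are illustrative.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "salary < 1000 & bonus > 0"
escaped = escape(raw)  # replaces &, < and > with &amp;, &lt; and &gt;
print(escaped)         # salary &lt; 1000 &amp; bonus &gt; 0

# The parser resolves the references back to the original characters.
parsed = ET.fromstring(f"<message>{escaped}</message>").text
quotes = ET.fromstring("<message>&apos;&quot;</message>").text  # an apostrophe and a quotation mark

# A bare "<" inside text is a syntax error.
try:
    ET.fromstring("<message>salary < 1000</message>")
    bare_lt_rejected = False
except ET.ParseError:
    bare_lt_rejected = True

print(parsed == raw, quotes, bare_lt_rejected)
```

By default, `escape` leaves quotation marks alone. Quotes only need escaping inside attribute values, and the same module provides `quoteattr` for that case.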

Comments in XML

The syntax for writing comments in XML is similar to that of HTML:

<!-- This is a comment -->

Two dashes in the middle of a comment are not allowed:

<!-- This is an invalid -- comment -->
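Both rules can be checked with Python's standard-library parser, as in the sketch below. The helper name `parses` is hypothetical. The sketch also shows that ElementTree discards comments by default, so they never reach the element tree.

```python
import xml.etree.ElementTree as ET


def parses(xml_text: str) -> bool:
    """True if the standard-library parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


valid = parses("<note><!-- This is a comment --></note>")
invalid = parses("<note><!-- This is an invalid -- comment --></note>")
print(valid, invalid)  # True False

# By default ElementTree discards comments, so this <note> has no children.
print(len(ET.fromstring("<note><!-- This is a comment --></note>")))  # 0
```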

White-space is Preserved in XML

XML preserves multiple white-space characters (HTML collapses multiple white-space characters into a single space):

XML:   Hello           Tove
HTML:  Hello Tove
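The difference is easy to see from code. In this Python sketch, the parser returns the text with every space intact. The `re.sub` call is only a rough stand-in for what an HTML renderer does when it collapses white space; it is not how browsers are implemented.

```python
import re
import xml.etree.ElementTree as ET

xml_text = ET.fromstring("<greeting>Hello        Tove</greeting>").text
# Rough HTML-style rendering: collapse each run of white space to one space.
html_like = re.sub(r"\s+", " ", xml_text)

print(repr(xml_text))   # all the spaces between the words are kept
print(repr(html_like))  # 'Hello Tove'
```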

XML Stores New Line as LF

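Windows applications store a new line as carriage return and line feed (CR+LF).

Unix and Mac OS X use LF.

Old Mac systems use CR.

XML stores a new line as LF: a conforming XML parser translates CR+LF pairs and lone CR characters into a single LF before the data reaches the application.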
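Python's standard-library parser shows this normalization at work. The three byte strings below differ only in their line endings, yet all of them yield the same text. The sample element name `lines` is arbitrary.

```python
import xml.etree.ElementTree as ET

documents = {
    "Windows (CR+LF)": b"<lines>first\r\nsecond</lines>",
    "old Mac (CR)": b"<lines>first\rsecond</lines>",
    "Unix/Mac OS X (LF)": b"<lines>first\nsecond</lines>",
}

# The parser hands every variant to the application with a plain LF.
normalized = {name: ET.fromstring(raw).text for name, raw in documents.items()}
for name, text in normalized.items():
    print(name, repr(text))  # 'first\nsecond' in every case
```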

Well Formed XML

XML documents that conform to the syntax rules above are said to be “Well Formed” XML documents.
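A quick way to test well-formedness is to try parsing the document. The Python sketch below runs the broken examples from this section through the standard-library parser; the helper name `is_well_formed` is hypothetical. Note that this checks syntax only. Being well-formed says nothing about whether a document matches a DTD or schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """True if the text parses; this checks syntax only, not a DTD or schema."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


examples = {
    "two root elements": "<note></note><note></note>",
    "missing closing tag": "<note><p>This is a paragraph.</note>",
    "case mismatch": "<Message>This is incorrect</message>",
    "improper nesting": "<b><i>This text is bold and italic</b></i>",
    "unquoted attribute": "<note date=12/11/2007></note>",
    "unescaped <": "<message>salary < 1000</message>",
    "well formed": '<note date="12/11/2007"><to>Tove</to></note>',
}

results = {label: is_well_formed(text) for label, text in examples.items()}
for label, ok in results.items():
    print(f"{label}: {ok}")
```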

XML Elements

An XML document contains XML Elements.

What is an XML Element?

An XML element is everything from (including) the element’s start tag to (including) the element’s end tag.

<price>29.99</price>

An element can contain:

  • text
  • attributes
  • other elements
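  • or a mix of the above

<bookstore>
  <book category="children">
    <title>Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="web">
    <title>Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>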

In the example above:

<title>, <author>, <year>, and <price> have text content because they contain text (like 29.99).

<bookstore> and <book> have element content, because they contain elements.

<book> has an attribute (category="children").
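The same distinctions can be made in code. The Python sketch below classifies each element in a one-book version of the example. The function `content_kind` is hypothetical and deliberately simplified: any element with children counts as element content, so mixed content is not detected.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_XML = """<bookstore>
  <book category="children">
    <title>Harry Potter</title>
    <author>J K. Rowling</author>
    <price>29.99</price>
  </book>
</bookstore>"""


def content_kind(element: ET.Element) -> str:
    """Simplified classification: child elements win over text (no mixed-content check)."""
    if len(element) > 0:
        return "element content"
    if element.text and element.text.strip():
        return "text content"
    return "empty"


root = ET.fromstring(BOOKSTORE_XML)
kinds = {el.tag: content_kind(el) for el in root.iter()}
print(kinds)
print(root.find("book").attrib)  # {'category': 'children'}
```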

Empty XML Elements

An element with no content is said to be empty.

In XML, you can indicate an empty element like this:

<element></element>

You can also use a so-called self-closing tag:

<element />

The two forms produce identical results in XML software (Readers, Parsers, Browsers).

Empty elements can have attributes.
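Python's standard-library parser can confirm that both forms are equivalent. The `id` attribute below is just an example name.

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring("<element></element>")
short_form = ET.fromstring("<element />")
with_attribute = ET.fromstring('<element id="e1" />')  # "id" is an example attribute

# Both forms parse to an element with no text and no children...
print(long_form.text, len(long_form), short_form.text, len(short_form))  # None 0 None 0
# ...and serialize identically.
print(ET.tostring(long_form) == ET.tostring(short_form))  # True
print(with_attribute.attrib)  # {'id': 'e1'}
```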

XML Naming Rules

XML elements must follow these naming rules:

  • Element names are case-sensitive
  • Element names must start with a letter or underscore
  • Element names cannot start with the letters xml (or XML, or Xml, etc)
  • Element names can contain letters, digits, hyphens, underscores, and periods
  • Element names cannot contain spaces

Any name can be used; no words are reserved (except xml).
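The rules above can be turned into a simple checker. The Python sketch below is a simplification that follows this tutorial's list, not the full "Name" grammar in the XML specification, which real parsers use. Many parsers also do not enforce the reserved "xml" prefix. The function name `is_valid_element_name` is hypothetical.

```python
import re

# Simplified version of the rules above (not the full XML "Name" grammar):
# first character a letter or underscore, then letters, digits, "-", "_" or ".".
_NAME = re.compile(r"[^\W\d][\w.-]*")


def is_valid_element_name(name: str) -> bool:
    """Check a candidate element name against the tutorial's naming rules."""
    if not _NAME.fullmatch(name):
        return False  # bad first character, a space, a colon, etc.
    # Names starting with "xml" in any letter case are reserved.
    return not name.lower().startswith("xml")


for candidate in ["firstname", "_id", "prénom", "book.title", "1st", "first name", "XmlData"]:
    print(candidate, is_valid_element_name(candidate))
```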

Best Naming Practices

Create descriptive names, like this: <person>, <firstname>, <lastname>.

Create short and simple names, like this: <book_title> not like this: <the_title_of_the_book>.

Avoid “-”. If you name something “first-name”, some software may think you want to subtract “name” from “first”.

Avoid “.”. If you name something “first.name”, some software may think that “name” is a property of the object “first”.

Avoid “:”. Colons are reserved for namespaces (more later).

Non-English letters like éòá are perfectly legal in XML, but watch out for problems if your software doesn’t support them.

Naming Styles

There are no naming styles defined for XML elements. But here are some commonly used ones:

Style         Example        Description
Lower case    <firstname>    All letters lower case
Upper case    <FIRSTNAME>    All letters upper case
Underscore    <first_name>   Underscore separates words
Pascal case   <FirstName>    Uppercase first letter in each word
Camel case    <firstName>    Uppercase first letter in each word except the first

If you choose a naming style, it is good to be consistent!

XML documents often have a corresponding database. A common practice is to use the naming rules of the database for the XML elements.

Camel case is a common naming rule in JavaScript.

XML Elements are Extensible

XML elements can be extended to carry more information.

Look at the following XML example:

<note>
  <to>Tove</to>
  <from>Jani</from>
  <body>Don't forget me this weekend!</body>
</note>

Let’s imagine that we created an application that extracted the <to>, <from>, and <body> elements from the XML document to produce this output:

To: Tove
From: Jani

Don't forget me this weekend!

Imagine that the author of the XML document added some extra information to it:

<note>
  <date>2008-01-10</date>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

Should the application break or crash?

No. The application should still be able to find the <to>, <from>, and <body> elements in the XML document and produce the same output.

This is one of the beauties of XML. It can be extended without breaking applications.

XML Attributes
