Introduction to XML

Extensible Markup Language


What is XML?

• XML stands for eXtensible Markup Language.


• A markup language is used to provide
information about a document.
• Tags are added to the document to provide the
extra information.
• HTML tags tell a browser how to display the
document.
• XML tags tell a reader (a person or a program)
what the data means.
What is XML Used For?
• XML documents are used to transfer data from one
place to another, often over the Internet.
• XML vocabularies (subsets) are designed for particular applications.
• One is RSS (Rich Site Summary or Really Simple
Syndication). It is used to send breaking news bulletins
from one web site to another.
• A number of fields have their own vocabularies. These
include chemistry, mathematics, and book publishing.
• Many of these vocabularies are published openly (some
through the World Wide Web Consortium, W3C) and are
available for anyone's use.
Advantages of XML

• XML is text (Unicode) based.


– Takes up less space.
– Can be transmitted efficiently.
• One XML document can be displayed differently
in different media.
– HTML, video, CD, DVD.
– You only have to change the XML document in order
to change all the rest.
• XML documents can be modularized. Parts can
be reused.
Example of an HTML Document

<html>
<head><title>Example</title></head>
<body>
<h1>This is an example of a page.</h1>
<h2>Some information goes here.</h2>
</body>
</html>
Example of an XML Document

<?xml version="1.0"?>
<address>
<name>Alice Lee</name>
<email>alee@aol.com</email>
<phone>212-346-1234</phone>
<birthday>1985-03-22</birthday>
</address>
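To see how a program can use these tags, here is a minimal sketch with Python's standard `xml.etree.ElementTree` module. It assumes the document is held in a string; the XML declaration must use straight quotes and end with `?>`, or the parser rejects it.

```python
import xml.etree.ElementTree as ET

# The address document from the slide, with a correctly written declaration.
ADDRESS_XML = """<?xml version="1.0"?>
<address>
  <name>Alice Lee</name>
  <email>alee@aol.com</email>
  <phone>212-346-1234</phone>
  <birthday>1985-03-22</birthday>
</address>"""

root = ET.fromstring(ADDRESS_XML)  # returns the root <address> element

# Each tag names the data it holds, so fields can be looked up by meaning.
record = {child.tag: child.text for child in root}
print(record["email"])     # alee@aol.com
print(record["birthday"])  # 1985-03-22
```

The same lookup works no matter how the document is later displayed, which is the point of separating data from presentation.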
Difference Between HTML and XML

• HTML tags have a fixed meaning, and
browsers know what each one means.
• XML tags are defined for each application, and
the people and programs using that application
know what they mean.
• HTML tags are used for display.
• XML tags are used to describe documents
and data.
XML Rules

• Tags are enclosed in angle brackets.


• Tags come in pairs with start-tags and
end-tags.
• Tags must be properly nested.
– <name><email>…</name></email> is not allowed.
– <name><email>…</email></name> is.
• Empty elements, which have no end-tag, must be
closed with '/>'.
– <br /> is an example familiar from XHTML.
More XML Rules
• Tags are case sensitive.
– <address> is not the same as <Address>
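• Names beginning with "xml" in any combination of cases
(such as XML or Xml) are reserved for XML standards and
should not be used as tag names.
• Tags may not contain '<' or '&'. In text, write these
characters as &lt; and &amp;.
• Tag names must begin with a letter or underscore and may
contain letters, digits, hyphens, underscores, and periods,
but no white space. The colon is reserved for namespace prefixes.
• Documents must have a single root element that
encloses all the other elements.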
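The rule about '<' and '&' is easiest to see in code. In this minimal Python sketch, the standard-library function `xml.sax.saxutils.escape` replaces `&`, `<`, and `>` with entity references before the text goes into a document, and the parser turns them back into ordinary characters when reading. The company name is a made-up sample.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# A raw '&' or '<' inside an element would make the document ill-formed.
company = "AT&T <Labs>"
safe = escape(company)  # replaces &, <, > with &amp;, &lt;, &gt;
print(safe)             # AT&amp;T &lt;Labs&gt;

# The parser converts the entity references back into characters.
doc = ET.fromstring(f"<company>{safe}</company>")
print(doc.text)         # AT&T <Labs>
```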
Encoding
• XML (like Java) uses Unicode to represent characters.
• Unicode has several encodings, including UTF-8, UTF-16,
and UTF-32. The most common one, and the XML default, is UTF-8.
• UTF-8 is a variable-length code. Each character is
encoded in 1, 2, 3, or 4 bytes.
• The first 128 characters in Unicode are ASCII, and UTF-8
stores each of them in a single byte.
• Unicode code points 128 to 255 cover some of the more
common characters used in western Europe, such as ã, á, å,
or ç. In UTF-8 these take 2 bytes.
• Two-byte codes also cover alphabets such as Greek, Cyrillic,
Hebrew, and Arabic.
• Three-byte codes cover most Chinese, Japanese, and Korean
characters.
• Four-byte codes handle everything that is left, such as rare
ideographs and emoji.
• Those working mainly with East Asian text may find UTF-16
more compact, since it stores most of those characters in 2 bytes.
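The number of bytes UTF-8 uses per character (from 1 to 4) can be checked directly with Python's `str.encode`. This sketch encodes one sample character from several ranges of Unicode and, for comparison, counts the bytes UTF-16 needs (little-endian, without a byte-order mark).

```python
# One sample character from several ranges of Unicode.
samples = {
    "ASCII letter": "\u0041",      # A, U+0041
    "Latin-1 accent": "\u00e7",    # c with cedilla, U+00E7
    "CJK ideograph": "\u4e2d",     # U+4E2D
    "emoji": "\U0001F600",         # U+1F600, outside the first 65,536 code points
}
for label, ch in samples.items():
    utf8 = len(ch.encode("utf-8"))       # bytes needed in UTF-8
    utf16 = len(ch.encode("utf-16-le"))  # bytes needed in UTF-16
    print(f"{label:15} U+{ord(ch):04X}  UTF-8: {utf8}  UTF-16: {utf16}")
```

The CJK row is the case where UTF-16 (2 bytes) beats UTF-8 (3 bytes).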
Well-Formed Documents
• An XML document is said to be well-formed if it follows all the syntax rules.
• An XML parser is used to check that all the rules have been obeyed.
• Browsers such as Internet Explorer 5 and Netscape 7 came
with XML parsers, and modern browsers still do.
• Parsers are also available for free download over the Internet.
• A well-formed XML document is a document that conforms to the
XML syntax rules, such as:
– it should begin with the XML declaration (optional in XML 1.0, but recommended)
– it must have exactly one root element
– start-tags must have matching end-tags
– element names are case sensitive
– all elements must be closed
– all elements must be properly nested
– all attribute values must be quoted
– entity references such as &lt; and &amp; must be used for special characters
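A parser enforces these rules mechanically. As a sketch, the hypothetical helper `is_well_formed` below wraps Python's `xml.etree.ElementTree.fromstring`, which raises `ParseError` on any well-formedness error. The first sample is well formed; each of the others breaks one rule from the list. (This parser does not insist on the XML declaration.)

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text without a well-formedness error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# One well-formed sample, then one sample per broken rule.
cases = {
    "well formed": '<?xml version="1.0"?><note id="1">Hi &amp; bye<br /></note>',
    "overlapping tags": "<name><email>x</name></email>",
    "two root elements": "<note>a</note><note>b</note>",
    "unclosed element": "<note><to>Bob</note>",
    "unquoted attribute": "<note id=1>Hi</note>",
    "raw ampersand": "<note>Hi & bye</note>",
    "case mismatch": "<Note>Hi</note>",
}
for label, text in cases.items():
    print(f"{label:20} {is_well_formed(text)}")
```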
XML Example Revisited
<?xml version="1.0"?>
<address>
<name>Alice Lee</name>
<email>alee@aol.com</email>
<phone>212-346-1234</phone>
<birthday>1985-03-22</birthday>
</address>
• Markup for the data aids understanding of its purpose.
• A flat text file is not nearly so clear.
Alice Lee
alee@aol.com
212-346-1234
1985-03-22
• The last line looks like a date, but what is it for?
Expanded Example
<?xml version="1.0"?>
<address>
<name>
<first>Alice</first>
<last>Lee</last>
</name>
<email>alee@aol.com</email>
<phone>123-45-6789</phone>
<birthday>
<year>1983</year>
<month>07</month>
<day>15</day>
</birthday>
</address>
XML Files are Trees

address
├── name
│   ├── first
│   └── last
├── email
├── phone
└── birthday
    ├── year
    ├── month
    └── day

XML Trees

• An XML document has a single root node.


• The tree is a general ordered tree.
– A parent node may have any number of
children.
– Child nodes are ordered, and may have
siblings.
• Preorder traversals are usually used for
getting information out of the tree.
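Python's `xml.etree.ElementTree` keeps child elements in document order, so a preorder traversal is a short recursive function. This sketch (the `preorder` helper is hypothetical) visits the expanded address example: each node first, then its children from left to right. ElementTree's built-in `iter()` yields elements in the same order.

```python
import xml.etree.ElementTree as ET

# The expanded address example, written without whitespace between tags.
doc = ET.fromstring(
    "<address><name><first>Alice</first><last>Lee</last></name>"
    "<email>alee@aol.com</email><phone>123-45-6789</phone>"
    "<birthday><year>1983</year><month>07</month><day>15</day></birthday>"
    "</address>"
)

def preorder(node, depth=0, out=None):
    """Visit node, then each child left to right; return (depth, tag, text)."""
    if out is None:
        out = []
    out.append((depth, node.tag, (node.text or "").strip()))
    for child in node:  # children are stored in document order
        preorder(child, depth + 1, out)
    return out

for depth, tag, text in preorder(doc):
    print("  " * depth + tag, text)

# iter() walks the tree in the same preorder sequence.
assert [tag for _, tag, _ in preorder(doc)] == [e.tag for e in doc.iter()]
```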
XML Attributes
• Attribute values must always be quoted.
Either single or double quotes can be
used.
• For a student's gender, the <student>
element can be written as:
• <student gender="female">
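A short Python sketch with the standard `xml.etree.ElementTree` module shows that both quote styles give the same attribute value. Single quotes are handy when the value itself contains a double quote; the `<book>` element is a made-up example.

```python
import xml.etree.ElementTree as ET

double = ET.fromstring('<student gender="female"/>')
single = ET.fromstring("<student gender='female'/>")
print(double.get("gender"), single.get("gender"))  # female female

# Single quotes let the value contain double quotes without escaping.
book = ET.fromstring("<book title='The \"XML\" Guide'/>")
print(book.get("title"))  # The "XML" Guide
```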
XML Elements vs. Attributes
• <student gender="female">
<firstname>Abc</firstname>
<lastname>Pqr</lastname>
</student>
• OR
<student>
<gender>female</gender>
<firstname>Abc</firstname>
<lastname>Pqr</lastname>
</student>
• Both examples provide the same information.
• There are no rules about when to use attributes or
when to use elements in XML.
Some things to consider when using
attributes are:
• attributes cannot contain multiple values
(elements can)
• attributes cannot contain tree structures
(elements can)
• attributes are not easily expandable (for
future changes)
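Both forms carry the same data, but a program must look in a different place for each. In this Python sketch (standard `xml.etree.ElementTree`), the hypothetical helper `student_gender` accepts either form. The last lines illustrate the point about multiple values: an element such as `<phone>` may repeat, while the parser rejects an attribute name that appears twice on one element.

```python
import xml.etree.ElementTree as ET

as_attribute = """<student gender="female">
  <firstname>Abc</firstname>
  <lastname>Pqr</lastname>
</student>"""

as_element = """<student>
  <gender>female</gender>
  <firstname>Abc</firstname>
  <lastname>Pqr</lastname>
</student>"""

def student_gender(xml_text):
    """Read gender from an attribute if present, otherwise from a child element."""
    student = ET.fromstring(xml_text)
    return student.get("gender") or student.findtext("gender")

print(student_gender(as_attribute), student_gender(as_element))  # female female

# Elements can repeat to hold several values...
multi = ET.fromstring("<student><phone>111</phone><phone>222</phone></student>")
print([p.text for p in multi.findall("phone")])  # ['111', '222']

# ...but a repeated attribute name is a well-formedness error.
try:
    ET.fromstring('<student phone="111" phone="222"/>')
    duplicate_allowed = True
except ET.ParseError:
    duplicate_allowed = False
print(duplicate_allowed)  # False
```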
XML Namespaces
• XML Namespaces provide a method to
avoid element name conflicts.
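• In XML, element names are defined by the
developer. This can result in a conflict
when mixing XML documents from different
XML applications that use the same name.
• A namespace avoids the conflict by binding a
prefix to a unique URI with an xmlns attribute,
so that h:table and f:table are different names.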
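As a sketch of how a namespace-aware parser keeps clashing names apart, Python's `xml.etree.ElementTree` replaces each prefix with its namespace URI, written `{URI}localname`. Both URIs below are hypothetical placeholders: a namespace URI only has to be unique and is never fetched.

```python
import xml.etree.ElementTree as ET

# Two vocabularies that both define <table>; the URIs are made-up examples.
doc = ET.fromstring(
    '<root xmlns:h="http://example.com/html-table"'
    ' xmlns:f="http://example.com/furniture">'
    "<h:table><h:tr><h:td>Apples</h:td></h:tr></h:table>"
    "<f:table><f:name>Coffee table</f:name><f:length>120</f:length></f:table>"
    "</root>"
)

# Each prefix is expanded, so the two <table> elements get different names.
for child in doc:
    print(child.tag)
# {http://example.com/html-table}table
# {http://example.com/furniture}table

# Searches map a prefix of your choice to the namespace URI.
ns = {"f": "http://example.com/furniture"}
print(doc.find("f:table/f:name", ns).text)  # Coffee table
```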
Validity
• Use an XML validator to syntax-check your
XML. XML parsers do not tolerate errors: a document
that breaks a rule is rejected.
• An XML document with correct syntax is called
"Well Formed". A well-formed document has a
tree structure and obeys all the XML rules.
• A particular application may add more rules in
either a DTD (document type definition) or in a
schema.
• Many specialized DTDs and schemas have
been created to describe particular areas.
• These range from disseminating news bulletins
(RSS) to chemical formulas.
• DTDs were developed first and are less expressive
than schemas.
Valid XML Documents
• A "well formed" XML document is not the same as a
"valid" XML document.
• A "valid" XML document must be well formed. In
addition, it must conform to a document type
definition (DTD) or a schema.
• There are two common ways to define the document
type for XML:
– DTD - The original Document Type Definition
– XML Schema - An XML-based alternative to DTD
• A document type definition defines the rules and the
legal elements and attributes for an XML document.
XML DTD- Document Type Definitions
• An XML document with correct syntax is called "Well
Formed".
• An XML document validated against a DTD is both
"Well Formed" and "Valid".
• A DTD describes the tree structure of a document
and something about its data.
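• Two kinds of character data appear in DTDs, PCDATA and CDATA.
– PCDATA is parsed character data: the parser scans it for
markup and expands entities. It is used for element content.
– CDATA is character data that is not parsed for markup.
In a DTD it is used as an attribute type.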
• A DTD determines how many times a node may
appear, and how child nodes are ordered.
DTD for address Example
<!ELEMENT address (name, email, phone, birthday)>
<!ELEMENT name (first, last)>
<!ELEMENT first (#PCDATA)>
<!ELEMENT last (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!ELEMENT birthday (year, month, day)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT month (#PCDATA)>
<!ELEMENT day (#PCDATA)>
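Python's standard library parses XML but does not validate it against a DTD; third-party packages such as lxml (`lxml.etree.DTD`) do. To show what such a check involves, this sketch hand-copies the content models above into a dict and tests a parsed tree against them. `CONTENT_MODEL` and `check` are hypothetical names, and the sketch handles only fixed sequences and #PCDATA leaves (no `?`, `*`, `+`, choices, attributes, or stray text), so it is a teaching aid, not a validator.

```python
import xml.etree.ElementTree as ET

# Content models copied by hand from the DTD above.
# A tuple is a required sequence of children; "#PCDATA" means text only.
CONTENT_MODEL = {
    "address": ("name", "email", "phone", "birthday"),
    "name": ("first", "last"),
    "birthday": ("year", "month", "day"),
    "first": "#PCDATA", "last": "#PCDATA", "email": "#PCDATA",
    "phone": "#PCDATA", "year": "#PCDATA", "month": "#PCDATA", "day": "#PCDATA",
}

def check(elem, errors=None):
    """Collect a message for each element that breaks its content model."""
    if errors is None:
        errors = []
    model = CONTENT_MODEL.get(elem.tag)
    children = [child.tag for child in elem]
    if model is None:
        errors.append(f"<{elem.tag}> is not declared")
    elif model == "#PCDATA":
        if children:
            errors.append(f"<{elem.tag}> may contain only text")
    elif tuple(children) != model:
        errors.append(f"<{elem.tag}> children {children} != {list(model)}")
    for child in elem:
        check(child, errors)
    return errors

good = ET.fromstring(
    "<address><name><first>Alice</first><last>Lee</last></name>"
    "<email>alee@aol.com</email><phone>123-45-6789</phone>"
    "<birthday><year>1983</year><month>07</month><day>15</day></birthday>"
    "</address>")
bad = ET.fromstring(
    "<address><email>alee@aol.com</email><name>Alice Lee</name></address>")

print(check(good))  # []
for message in check(bad):
    print(message)
```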
Schemas

• Schemas are themselves XML documents.


• They were standardized after DTDs and provide
more information about the document.
• They have a number of data types including
string, decimal, integer, boolean, date, and time.
• They divide elements into simple and complex
types.
• They also determine the tree structure and how
many children a node may have.
Schema for First address Example
<?xml version="1.0" encoding="ISO-8859-1" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="address">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="email" type="xs:string"/>
<xs:element name="phone" type="xs:string"/>
<xs:element name="birthday" type="xs:date"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
Explanation of Example Schema
<?xml version="1.0" encoding="ISO-8859-1" ?>
• ISO-8859-1, Latin-1, is the same as UTF-8 in the first 128 characters.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
• http://www.w3.org/2001/XMLSchema is the namespace of the W3C XML Schema standard; the xmlns:xs attribute binds it to the prefix xs.
<xs:element name="address">
<xs:complexType>
• This states that address is a complex type element.
<xs:sequence>
• This states that the following elements form a sequence and must
come in the order shown.
<xs:element name="name" type="xs:string"/>
• This says that the name element must contain a string.
<xs:element name="birthday" type="xs:date"/>
• This states that the birthday element is a date. xs:date values
have the form yyyy-mm-dd, optionally followed by a time zone.
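A schema-aware validator checks the `xs:date` type for you. To show what that check involves, here is a hypothetical Python helper, `is_basic_xs_date`, that accepts only the basic yyyy-mm-dd form and uses `datetime.date` to reject impossible days. It is a simplification: real `xs:date` values may also carry a time zone such as `Z` or `+05:00`, which this sketch rejects.

```python
import re
from datetime import date

DATE_PATTERN = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")

def is_basic_xs_date(text):
    """True for yyyy-mm-dd strings that name a real calendar day (no time zone)."""
    match = DATE_PATTERN.fullmatch(text)
    if match is None:
        return False
    try:
        date(*(int(part) for part in match.groups()))
    except ValueError:  # for example month 13 or February 30
        return False
    return True

for value in ["1985-03-22", "1985-02-30", "22/03/1985", "1985-3-22"]:
    print(value, is_basic_xs_date(value))
```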
XML Schemas are More
Powerful than DTD
• XML Schemas are written in XML
• XML Schemas are extensible
• XML Schemas support data types
• XML Schemas support namespaces
Why Use an XML Schema?
• With XML Schema, your XML files can
carry a description of their own format.
• With XML Schema, independent groups of
people can agree on a standard for
interchanging data.
• With XML Schema, you can verify data.
XML Schemas Support Data
Types
• One of the greatest strengths of XML
Schemas is the support for data types:
• It is easier to describe document content
• It is easier to define restrictions on data
• It is easier to validate the correctness of
data
• It is easier to convert data between
different data types
XML Schemas use XML Syntax
• Another great strength of XML
Schemas is that they are written in XML:
• You don't have to learn a new language
• You can use your XML editor to edit your
Schema files
• You can use your XML parser to parse
your Schema files
• You can manipulate your Schemas with
the XML DOM
• You can transform your Schemas with
XSLT
When to Use a DTD/Schema?
• With a DTD, independent groups of people
can agree to use a standard DTD for
interchanging data.
• With a DTD, you can verify that the data
you receive from the outside world is valid.
• You can also use a DTD to verify your own
data.
When NOT to Use a
DTD/Schema?
• XML does not require a DTD/Schema.
• When you are experimenting with XML, or
when you are working with small XML
files, creating DTDs may be a waste of
time.
• If you develop applications, wait until the
specification is stable before you add a
document definition. Otherwise, your
software might stop working because of
validation errors.
Parsers

• There are two principal models for parsers.
• SAX – Simple API for XML
– Event-driven: the parser calls your handler
methods as it reads the document
– Similar to Java event listeners
• DOM – Document Object Model
– Creates a parse tree
– Requires a tree traversal
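The SAX model can be sketched with Python's standard `xml.sax` module: the parser reads the document and calls methods on a handler object (`startElement`, `characters`, `endElement`), much like listener callbacks. `AddressHandler` is a hypothetical handler written for the first address example; it records the text of each leaf element and never builds a tree.

```python
import xml.sax

class AddressHandler(xml.sax.ContentHandler):
    """Collects the text of each leaf element as parse events arrive."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._buffer = []

    def startElement(self, name, attrs):  # called for each start-tag
        self._buffer = []

    def characters(self, content):  # may be called several times per element
        self._buffer.append(content)

    def endElement(self, name):  # called for each end-tag
        text = "".join(self._buffer).strip()
        if text:
            self.fields[name] = text
        self._buffer = []

doc = b"""<address><name>Alice Lee</name><email>alee@aol.com</email>
<phone>212-346-1234</phone><birthday>1985-03-22</birthday></address>"""

handler = AddressHandler()
xml.sax.parseString(doc, handler)  # drives the callbacks above
print(handler.fields)
```

Because nothing is kept except what the handler stores, SAX suits very large documents; the DOM model trades memory for random access to the whole tree.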
XML DOM
• The XML DOM makes a tree-structure
view for an XML document.
• We can access all elements through the
DOM tree.
• We can modify or delete their content and
also create new elements. The elements and
their content (text and attributes) are all
known as nodes.
• According to the XML DOM, everything in
an XML document is a node:
• The entire document is a document node
• Every XML element is an element node
• The text in the XML elements is held in text
nodes
• Every attribute is an attribute node
• Comments are comment nodes
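Python's standard `xml.dom.minidom` module builds this kind of DOM tree. The sketch below parses a small student record, confirms the node type of each kind of node listed above, and then modifies, deletes, and creates nodes. The email address is a made-up placeholder.

```python
from xml.dom import Node, minidom

doc = minidom.parseString(
    '<student gender="female"><!-- sample record -->'
    "<firstname>Abc</firstname><lastname>Pqr</lastname></student>"
)
student = doc.documentElement                # the root element node
gender = student.getAttributeNode("gender")  # an attribute node
comment = student.firstChild                 # the comment node
first = student.getElementsByTagName("firstname")[0]
text = first.firstChild                      # the text node inside <firstname>

node_types = {
    "document": doc.nodeType == Node.DOCUMENT_NODE,
    "element": student.nodeType == Node.ELEMENT_NODE,
    "attribute": gender.nodeType == Node.ATTRIBUTE_NODE,
    "comment": comment.nodeType == Node.COMMENT_NODE,
    "text": text.nodeType == Node.TEXT_NODE,
}
print(node_types)  # every value is True

# Modify, delete, and create nodes through the same tree.
text.data = "Xyz"                   # change the text content
student.removeChild(comment)        # delete the comment node
email = doc.createElement("email")  # create a new element node
email.appendChild(doc.createTextNode("abc@example.com"))
student.appendChild(email)          # attach it under <student>
print(student.toxml())
```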
