
PDF File Migration to PDF/A: Technical Considerations

Frank L. Walker, Marie E. Gallagher, and George R. Thoma; Lister Hill National Center for Biomedical Communications, National
Library of Medicine, Bethesda, Maryland, USA

Abstract
The PDF/A specification for long-term preservation of electronic documents became an international standard in 2005. This standard seeks to guarantee the long-term visual appearance of an electronic document. For collections to be archived as PDF files, it makes sense to select the PDF/A file format, because this particular type of PDF file makes it easier to migrate to future file formats. However, in the years before the PDF/A specification became a standard, many organizations began creating archives of collections in PDF, but in formats not necessarily compatible with PDF/A. Because of the value that PDF/A offers to preservation, there is an advantage to migrating collections to PDF/A. Commercial software tools are becoming available, both for creating PDF/A files and for evaluating their compatibility with the PDF/A standard. One such tool was used to study PDF files culled from the Internet as well as from an in-house collection to determine the chances of success for migrating an archived collection of PDF documents to PDF/A. This study explores the types of problems posed by such a migration, and determines the circumstances in which a migration would be successful.

Background
The Adobe Portable Document Format (PDF) has been in use for more than fifteen years and has been widely adopted for electronic document use and distribution. Users have installed more than half a billion copies of the freely available Acrobat Reader® on a wide variety of computing platforms, to view local computer-based PDF files, as well as an estimated 200 million Internet-based PDF files (approximately ten percent of all Web documents). Over the years, successive versions of the PDF file format have become increasingly complex as new features appear with each new release, such as embedded multimedia, document annotation, password protection, encryption, forms, and 3D capabilities. The continuing growth of PDF capabilities has led to a file format that, while feature-rich, is undesirable for specific applications. As a result, various subsets of PDF either have been adopted or are under development for specific uses: PDF/X for the publishing industry, PDF/E for engineering document workflow, PDF/UA for handicapped accessibility, PDF/H for health records, and PDF/A for electronic document preservation. After three years of work by the Association for Information and Image Management (AIIM), the Association for Suppliers of Printing, Publishing and Converting Technologies (NPES), and many government agencies and private organizations, the proposed PDF/A standard was approved by the International Organization for Standardization (ISO) in September 2005. This new standard is designated ISO 19005-1:2005, Document management – Electronic document file format for long-term preservation – Part 1: Use of PDF 1.4 (PDF/A-1) [1,2].

The PDF/A-1 standard is a subset of the specifications for PDF version 1.4 [3]. It supports two conformance levels: Levels A and B. Both conformance levels preserve the long-term visual appearance of an electronic document. A PDF/A-1 file conforming to Level B (also referred to as PDF/A-1b) provides the minimal requirements for ensuring a document’s long-term visual appearance. This is accomplished by embedding all fonts within the file, using a device-independent color system, and including XMP metadata for describing the document [4]. This standard also eliminates several features of PDF 1.4: LZW compression, encryption, external references, transparency, audio or visual multimedia, JavaScript executable code, and embedded files. As a further requirement for maintaining semantic and structural information in the document, files conforming to Level A (also referred to as PDF/A-1a) must be “tagged” and contain Unicode character maps. A tagged file contains logical structure information that specifies the natural reading order of its contents. This not only facilitates migration to future file formats, but improves accessibility by permitting a user to read the document’s contents in proper sequence. The Unicode character map provides semantic information about the characters, and facilitates text searching and copying, particularly for Asian languages. Tagging is most easily accomplished when a document is first created, such as with Microsoft Word, which allows a user to specify the document structure through heading levels, paragraphs, and table titles. When the Word document is converted to PDF, this information is used to create tags in the PDF file. It is also possible to use Adobe Acrobat Professional to tag an existing PDF file, but this can be a labor-intensive task.

Questions arise when an organization considers preservation using PDF/A. The use of PDF/A facilitates preservation, but proactive steps are required to guarantee it. These steps include periodic file replication (before media decays), the widespread adoption of the PDF/A standard through the creation and use of software tools designed to create and render PDF/A files, and the migration of other file formats to and from PDF/A. Due to its simplified format, migration of PDF/A to future file formats when necessary will be easier. Preservation is also facilitated because a PDF/A file is completely self-contained: all resources necessary to enable a PDF/A reader to display or print the electronic document are contained in the file. In addition, the file contains the metadata describing the document. During the first fifteen years of its existence, PDF has been used not only as a format for electronic document exchange, but also for preservation. In some instances, considerable resources have been invested to create document collections in the PDF format. Institutions with a preservation objective may need to address the following: should existing PDF collections be converted to PDF/A? Is this possible? What problems may be encountered? As shown in Figure 1, PDF/A is a subset of version 1.4 (published in November 2001), which is a subset of version 1.5 (August 2003), which in turn is a subset of version 1.6 (November 2004), and this is a subset of version 1.7 (October 2006). Because each new version offers more capabilities than the previous one, will it become more difficult for a PDF archive created using the latest version of PDF to be convertible to PDF/A? A two-part study considers these questions

6 Society for Imaging Science and Technology


by examining the types of problems encountered during such a migration, and determining the degree of migration success.

[Figure 1. PDF Version Capabilities. Nested regions labeled PDF 1.7, PDF 1.6, PDF 1.5, PDF 1.4, PDF/A-1a, and PDF/A-1b; both axes are labeled Capabilities.]

Procedure
The study consists of two parts. The first part identifies the types of problems encountered in converting a general collection of Internet-based PDF files to PDF/A, and determines the potential degree of migration success. The second part of the study considers a specific example of an archived PDF collection at the United States National Library of Medicine (NLM) to determine whether it is a candidate for migration to PDF/A.

In part 1 of this study, we used samples of Internet-based PDF files from a wide variety of Web sites. The samples were taken from two time periods to see if the conversion to PDF/A at different times posed different problems. The first sample consists of 10,000 PDF files selected at random from several thousand Web sites between 2001 and 2003, with most of the files from 2002. We first located the files through Google searches. We then developed software to read the results of the Google searches and automatically download these files. The second sample, assembled in September 2006, consists of 1,000 PDF files. For both samples, we compared all files to ensure that they were unique. We also performed quality checks on the files to ensure that they were downloaded properly without dropping bits. This PDF validation was accomplished by displaying each file using Acrobat Reader.

In 2006, new tools for creating and analyzing PDF/A files became commercially available. The tools generally fell into one of three functional categories: (1) for converting a non-PDF file to PDF/A; (2) for “preflighting” or analyzing a file for conformance with the PDF/A standard; and (3) for converting PDF files to PDF/A. At the time of this study the tools available could only manage conversion to Level B PDF/A files, not Level A files. One such tool that provided all three functions is PDF Appraiser, distributed by Apago, Inc. [5]. This was used to study the chances of success for converting the assembled PDF files to PDF/A. We used an evaluation version of this tool to analyze each of the 10,000 files in the first sample, and the 1,000 files in the second sample. PDF Appraiser determines whether a PDF file can be successfully converted to PDF/A. It lists all possible problems, grouping these into the ones that the tool can correct during conversion, and those that it cannot. Among the checks performed are an analysis of most objects in the file for syntax and consistency with the PDF/A standard, including the Info Dictionary, Catalog Dictionary, fonts, color spaces, ICC profiles, object streams, trailer dictionary, cross reference table, Unicode map, and XMP metadata. The results of each analysis were saved in an XML file, producing 10,000 XML result files for the first sample, and 1,000 for the second sample. We wrote software to read all the XML file analyses and produce an Excel-compatible spreadsheet that summarized the results, namely, the types of problems that may be encountered during conversion of a general collection of PDF files to PDF/A.

Part 2 of the study considers a specific collection of PDF files available on an NLM Web site, called Profiles in Science® (http://profiles.nlm.nih.gov/). Long-term preservation of this digital library has been a primary consideration from its inception. The purpose of Profiles in Science is to make available digital reproductions of historical items selected from the personal collections of prominent biomedical researchers and leaders in public health [6,7]. This Web site was launched in 1998 as a research project to expand access to these valuable collections and to promote the use of the Internet for research and teaching in the history of biomedical science. It features more than 20 collections containing published and unpublished items, including books, journal volumes, pamphlets, diaries, letters, manuscripts, photographs, audiotapes, video clips, and other materials. In September 2006 there were 16,389 PDF files representing the paper-based portion of the collections. A few of these PDF files were created from color JPEG images, but over 16,370 PDF files were created using black and white images produced by scanning documents at 300 dots per inch resolution and storing them in a lossless compressed TIFF format. During the conversion from TIFF to PDF, the files were put through an OCR process to produce text-searchable PDF. The PDF files are available to the public through the Web site, and the original TIFF files have been archived off-line. We applied the PDF Appraiser analysis tool to approximately one percent of the PDF files in Profiles in Science. These 172 randomly selected files, representing samples from all collections, were analyzed and the results saved in separate XML files. Then our software read all 172 XML files, and produced a spreadsheet to summarize the analysis results. These results reveal the likelihood of successfully converting this specific collection to PDF/A.

Results – Part 1 of the Study
In part 1 of this study, we analyzed 10,000 PDF files randomly selected and downloaded from Internet Web sites over a three-year period (2001-2003), with the majority of the files downloaded in 2002. We found that the PDF Appraiser tool failed to process 274 of these files (2.74%): it either crashed or hung, even though all files had passed a quality control check with Acrobat Reader. Of the remaining 9,726 files that the analysis tool successfully processed, we found that there were 332 unique producers of the files. A “producer” is basically a printer driver, such as Acrobat Distiller for Windows. Different versions of the same driver were counted as distinct producers. Our results showed that it would be possible to successfully convert 4,404 files to PDF/A, or 45.3% of the usable total. We spot-checked the reliability of conversion using the tool to convert a number of

Archiving 2007 Final Program and Proceedings 7


these files to PDF/A and then using Adobe Acrobat Professional version 7 to “preflight” the resulting files. In every case the tool produced valid, displayable PDF/A files. The tool reported that the remaining 5,322 files (54.7% of the usable total) had problems that would prevent their conversion to PDF/A. Table 1 lists the ten most common producers and the percentage of PDF files they created that could not be converted to PDF/A. This table shows, as expected, that the most common PDF producers were various versions of Adobe Acrobat Distiller released during the period 1998 to 2002. There is no apparent trend in the conversion failure rates of these producers.

Table 1. Top Ten Producers and their Conversion Failure Rates: 2002 Sample

Producer                                 Percentage of all PDF   Conversion Failure Rate: Percentage of Producer
                                         Files in the Sample     files that cannot be converted to PDF/A
Acrobat Distiller 4.05 (Windows)          9.9                    47.4
Acrobat Distiller 4.0 (Windows)           9.6                    62.0
Acrobat Distiller 4.0 for Macintosh       7.9                    73.8
Acrobat Distiller 5.0 (Windows)           6.3                    27.4
Acrobat Distiller 4.05 for Macintosh      5.5                    69.5
Acrobat PDFWriter 3.02 (Windows)          5.0                    48.8
Acrobat PDFWriter 4.0 (Windows)           3.6                    50.9
Acrobat Distiller 3.01 (Windows)          3.5                    70.5
Acrobat PDFWriter 4.05 for Windows NT     3.4                    48.1
Acrobat PDFWriter 4.0 for Windows NT      3.3                    43.9

We noticed similar results from the September 2006 sample of 1,000 Internet-based PDF files. In this sample, the analysis tool failed to process 50 files (5% of the total). Of the remaining 950 files, the tool found that 496 were convertible to PDF/A, or 52.2% of the usable files. This is nearly the same percentage as found in the earlier sample. The tool found a total of 148 producers in this sample. Table 2 lists the ten most common producers in this sample, and their conversion failure rates. The manufacturer released these producers during the period 2002 through 2005. It is interesting to note that the two oldest producers encountered (Acrobat Distiller 4.0 for Windows and Acrobat Distiller 4.05 for Windows) had significant increases in failure rates over the 2002 sample, but their failure rates were nearly the same as that of the second newest producer, Acrobat Distiller 7.0 for Windows.

Table 2. Top Ten Producers and their Conversion Failure Rates: 2006 Sample

Producer                                 Percentage of all PDF   Conversion Failure Rate: Percentage of Producer
                                         Files in the Sample     files that cannot be converted to PDF/A
Acrobat Distiller 5.0.5 (Windows)        11.1                    28.3
Acrobat Distiller 5.0 (Windows)           9.3                    25.8
Acrobat Distiller 6.0 (Windows)           8.7                    60.2
Acrobat Distiller 6.0.1 (Windows)         4.7                    53.3
Acrobat Distiller 4.05 (Windows)          3.5                    52.9
Acrobat Distiller 7.0.5 (Windows)         3.4                    60.6
Acrobat Distiller 7.0 (Windows)           3.4                    72.7
Acrobat PDFWriter 5.0 for Windows NT      3.3                    50.0
Acrobat Distiller 4.0 (Windows)           3.0                    79.3
Acrobat Distiller 4.05 for Macintosh      2.6                    76.0

Table 3 shows the distribution of PDF file versions in the two samples, and their respective conversion failure rates. This reveals that, except for the small sample of files for PDF version 1.6, the conversion failure rate generally does not increase with newer versions of PDF. This indicates that the new features and capabilities offered by each new version of PDF do not appear to affect the ability to convert a file to PDF/A.

Table 3. Distribution of PDF Versions and Conversion Failure Rates

PDF Version   2002 Sample                        2006 Sample
              Number of files  Failure Rate (%)  Number of files  Failure Rate (%)
1.0              23            60.8                 1              0
1.1             726            62.6                19             84.2
1.2           5,409            54.1               222             55.8
1.3           3,006            54.0               262             32.8
1.4             562            56.4               353             51.2
1.5               0             0                  73             53.4
1.6               0             0                  20             70.0

Table 4 lists the ten most common non-correctable problems identified by the tool for the 2002 sample of PDF files that could not be converted to PDF/A. These are all of a serious nature, and make migration impossible. The Frequency of Occurrence is the percentage of files in the sample experiencing the problem.
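As a minimal sketch of how a Frequency of Occurrence column could be computed, the following Python counts each problem at most once per file and divides by the number of files in the sample. The function name, the input layout (one list of problem names per analyzed file), and the toy data are assumptions made for illustration; they are not the software or the figures used in this study.

```python
from collections import Counter


def frequency_of_occurrence(problems_per_file):
    """Map each problem name to the percentage of files that report it.

    problems_per_file: one iterable of problem names per analyzed file
    (a hypothetical layout; the tool's real XML reports are not shown here).
    """
    total = len(problems_per_file)
    if total == 0:
        return {}
    counts = Counter()
    for problems in problems_per_file:
        # A set ensures a problem repeated within one file is counted once.
        counts.update(set(problems))
    # most_common() orders the result from most to least frequent.
    return {name: round(100.0 * n / total, 1) for name, n in counts.most_common()}


# Toy data only, not results from the study.
toy_sample = [
    ["Font not embedded", "Security"],
    ["Font not embedded", "Font not embedded"],
    [],
    ["BaseFont missing"],
]
toy_frequencies = frequency_of_occurrence(toy_sample)
```

On the toy data, "Font not embedded" is reported by two of the four files, so its frequency is 50.0 percent even though one file lists it twice.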



Table 4. Top Ten Non-Correctable Problems Preventing PDF/A Conversion: 2002 Sample

Problem                          Description             Frequency of Occurrence (%)
Font                             Not embedded            37.2
No matching CharSet entry        Missing value           11.9
Security                         Invalid value            6.9
No matching glyph for CharCode   Missing value            3.5
BaseFont                         Missing value            3.2
BG                               Wrong type for object    0.9
UCR                              Wrong type for object    0.9
Appearance                       Missing value            0.4
Action                           Invalid value            0.1
Incorrect ColorSpace             Invalid value            0.1

Several of the most common non-correctable problems were problems with fonts:
• Font not embedded. The top problem is failure to embed fonts. Either an entire font or a subset of a font must be embedded within the PDF file. This type of error indicates it is not possible for the tool to embed the font in the file. The tool may not be able to embed the font either due to licensing restrictions, or perhaps it cannot find an appropriate font to embed.
• No matching CharSet entry. It is permissible to embed only a subset of a Type 1 font as long as all characters that are to be displayed are specified in the subset. This error indicates that an entry is missing in the CharSet element of the Font Descriptor.
• BaseFont missing value. This error is encountered if the PostScript name of the font is missing.
• No matching glyph for CharCode. A character code could not be matched to a glyph.

Another non-correctable problem was with security, or encryption. This occurred in 6.9 percent of all files in the sample. This indicates that the creator placed restrictions on file viewing, copying, modifying, or printing.

Table 5 lists the ten most common non-correctable problems in the 2006 file sample. The relative frequency of occurrence is nearly the same as that of the earlier sample.

Table 5. Top Ten Non-Correctable Problems Preventing PDF/A Conversion: 2006 Sample

Problem                          Description             Frequency of Occurrence (%)
Font                             Not embedded            27.3
No matching CharSet entry        Missing value           13.6
Security                         Invalid value            6.4
No matching glyph for CharCode   Missing value            1.6
BaseFont                         Missing value            1.5
Incorrect ColorSpace             Invalid value            1.1
BM                               Invalid value            0.9
Count                            Missing value            0.5
Appearance                       Missing value            0.5
BG                               Wrong type for object    0.3

Table 6 gives the ten most common correctable problems that the tool encountered in the 2002 sample. These minor problems can be fixed during migration to PDF/A. Table 7 lists the same results for the 2006 sample. Among the most common correctable problems:
• Font not embedded. Although this is also listed as a non-correctable problem in Tables 4 and 5, it is correctable if the conversion software can embed the missing font in the PDF file, which would be most likely for the fourteen PostScript Type 1 fonts.
• DestOutputProfile missing value. This is an object that describes the ICC profile for device independent output color.
• XMP Metadata missing value. The metadata object is missing, and is required for the PDF/A specification.
• Colorspace Issues. There are a number of colorspace problems that the tool can correct while creating the PDF/A file.
• LZWDecode. The LZW compression algorithm is not permitted in PDF/A files, but images that are LZW-compressed are usually convertible to Zip or Group 4 compression.
• PDF/A tag not located. This indicates the file contained an XMP metadata object without elements for the PDF/A identification. This problem is easily fixed by transferring information from the Info object (e.g., producer, creation date, and subject).
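The split between correctable and non-correctable problems can be illustrated with a small decision rule. The Python below is a hedged sketch, not PDF Appraiser's actual logic: the `Problem` record, the functions `is_correctable` and `is_convertible`, and the problem-name strings are hypothetical, and the rule that a missing font is correctable only when it is one of the fourteen standard Type 1 fonts is a simplifying assumption drawn from the remark above.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# The fourteen standard Type 1 fonts named in the PDF Reference.
STANDARD_14 = frozenset({
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
    "Symbol", "ZapfDingbats",
})

# Problems the text describes as fixable during migration (labels are hypothetical).
ALWAYS_CORRECTABLE = frozenset({
    "DestOutputProfile missing",
    "XMP Metadata missing",
    "Colorspace Issues",
    "LZWDecode",
    "PDF/A tag not located",
})


@dataclass(frozen=True)
class Problem:
    name: str
    font: Optional[str] = None  # font name, set only for font problems


def is_correctable(problem: Problem) -> bool:
    """Assumed rule: fixed list of correctable problems, plus embeddable fonts."""
    if problem.name in ALWAYS_CORRECTABLE:
        return True
    if problem.name == "Font not embedded":
        # Assumption: only the standard 14 fonts can be supplied by the converter.
        return problem.font in STANDARD_14
    return False  # for example "Security" or "BaseFont missing"


def is_convertible(problems: Iterable[Problem]) -> bool:
    """A file is convertible when every reported problem is correctable."""
    return all(is_correctable(p) for p in problems)


# Hypothetical reports for three files.
file_a = [Problem("XMP Metadata missing"), Problem("Font not embedded", "Helvetica")]
file_b = [Problem("XMP Metadata missing"), Problem("Security")]
file_c = [Problem("Font not embedded", "SomeCommercialFont")]
```

Under these assumptions `file_a` is convertible, while `file_b` (encrypted) and `file_c` (a font the converter cannot supply) are not. A real converter may also be blocked by font licensing, which this sketch does not model.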



Table 6. Top Ten Most Common Correctable Problems: 2002 Sample

Problem                  Description                   Frequency of Occurrence (%)
DestOutputProfile        Missing value                 93.0
XMP Metadata             Missing value                 81.3
Font                     Not embedded                  63.0
Colorspace Issues        Invalid value                 58.5
TR                       Forbidden object              28.7
LZWDecode                Encoded with invalid filter   23.2
Invalid Colorspace       Undefined                     12.4
PDF/A tag not located    Missing value                 11.6
Flags                    Missing value                  9.1
ID                       Missing value                  3.8

Table 7. Top Ten Most Common Correctable Problems: 2006 Sample

Problem                  Description                   Frequency of Occurrence (%)
DestOutputProfile        Missing value                 93.4
PDF/A tag not located    Missing value                 61.4
Font                     Not embedded                  48.4
XMP Metadata             Missing value                 31.8
TR2                      Forbidden object              23.2
Invalid Colorspace       Undefined                     22.0
CIDSet                   Missing value                 19.4
CIDToGIDMap              Missing value                 16.5
TR                       Forbidden object              15.8
Colorspace Issues        Invalid value                 15.7

We examined the two samples to determine the potential for conversion to PDF/A files with Level A compliance. While no tools were commercially available that specifically produced Level A files at the time of this study, we could estimate the potential for success of producing Level A files using our samples. To qualify, a PDF file would have to be convertible to Level B, but also be tagged, and have ToUnicode maps for its embedded fonts. In the 2002 sample there were 386 tagged files out of the 9,726 files that could be processed (3.9%). Of these, there were only 36 files containing embedded fonts with ToUnicode maps, with correctable problems. These are candidates for PDF/A Level A compliance. Unfortunately, this is only 0.37% of all files that the tool could process. This indicates that in a general collection of PDF files there is only a small percentage that could be migrated to a Level A compliant file (PDF/A-1a). It is interesting to note that in the more recent 1,000 file sample, there were 77 tagged files of the 950 files the tool processed (8.2%). Of these, there were only 15 correctable files containing embedded fonts with ToUnicode maps, making them candidates for Level A compliance (1.5% of the sample).

Because it is unlikely that an automatic process would be able to tag a PDF file accurately, then unless the file is already tagged, the likelihood of converting a non-tagged PDF file to PDF/A-1a is very small. In general we can conclude that if a PDF file can be converted to PDF/A, it is much more likely that it would be Level B compliant rather than Level A. This would be sufficient to maintain the long-term visual appearance of the document, but not enough to make it accessible to the handicapped or text searchable for some types of fonts.

One interesting aspect of this part of the study is that we found one type of PDF file to be almost always convertible to PDF/A with a high degree of success. This is an image-only or text-behind-image PDF file. In the 2006 sample, 31 files fell into this category. All other files in the sample population contained some form of visible text or text combined with images. Of these 31 files, 26 were convertible to PDF/A. There were 3 files that could not be converted due to encryption; had they not been encrypted they could have been converted. One file could not be converted due to a damaged color space, and another could not be converted because of a missing BaseFont. If we also count the three files that could not be converted only because of a security lock placed on them, then 29 of 31 files were convertible to PDF/A (94%). This leads into part 2 of the study, in which all archived PDF files fell into this category.

Results – Part 2 of the Study
Here we used the same procedure as in part 1 to analyze 172 PDF files (approximately a 1% sample) from the Profiles in Science collection at the National Library of Medicine. While this sample fell into the category of text-searchable image (text-behind-image) files, a few files had no searchable text because the original material was handwritten. The sample was indicative of the entire population of PDF files in the collection, as all fell into this category of PDF file. The analysis tool, which successfully processed all files in the sample, revealed that 100% of the sample population was convertible to Level B-compliant PDF/A. Three producers were used to create these samples: Acrobat PDFWriter 3.03 for Windows NT (89.5% of samples), Adobe PDFWriter 2.01 for Windows (8.7%), and Adobe PDF Library 4.0 (1.7%). The files were in PDF versions 1.1, 1.2, or 1.3. All problems were correctable, with the most common ones being the following: missing XMP Metadata, missing value for the DestOutputProfile, and invalid compression (LZW). Since all PDF files in the Profiles in Science collection were created in the same manner, we can conclude that there is a high probability that all files in the collection are convertible to PDF/A-1b.

Conclusion
Organizations that have already archived files in the PDF format may consider migration to PDF/A, a new standard for long-term preservation of electronic documents. To determine whether a PDF collection is convertible to PDF/A, one of the emerging commercial tools may be used to analyze the collection. This study used one such tool (PDF Appraiser) to confirm that image-only PDF collections may be readily migrated to PDF/A Level B. Our investigation of Internet-based PDF files reveals that only about half of the PDF files available through Web sites can be converted to PDF/A-1b files, and that less than one percent is convertible to the more stringent PDF/A-1a. PDF files that are text-only or combine visible text with image pose a challenge to conversion tools. We found that new capabilities offered by recent versions of PDF do not appear to restrict the ability to convert a PDF file to PDF/A. Instead, most of the problems preventing



migration deal with incorrectly specified fonts, non-embedded fonts, encryption, and invalid color spaces. In order to achieve successful migration to PDF/A-1b, all non-standard fonts must be embedded in the file, all fonts and color spaces must be well defined, and no restrictions be placed on file use as governed through security settings.

Acknowledgement
This research was supported by the Intramural Research Program of the National Institutes of Health (NIH), National Library of Medicine (NLM), and Lister Hill National Center for Biomedical Communications (LHNCBC).

References
[1] ISO 19005-1, Document management – Electronic document file format for long-term preservation – Part 1: Use of PDF 1.4 (PDF/A-1), available at http://www.iso.org/iso/en/ISOOnline.frontpage.
[2] PDF-Tools.com White Paper: PDF/A – The Basics. Version 1.0, February 1, 2006. Available at: http://www.pdf-tools.com/public/downloads/whitepapers/whitepaper-pdfa.pdf.
[3] Adobe Systems Incorporated, PDF Reference: Adobe Portable Document Format, Version 1.4, Addison-Wesley, Boston, 3rd edition (2001).
[4] Adobe Systems Incorporated, XMP Specification (2004).
[5] Apago, Inc. Web site: www.apagoinc.com.
[6] McCray, Alexa T., Marie E. Gallagher. "Principles for Digital Library Development." Communications of the ACM 44, no. 5 (May 2001): 48-54.
[7] Gallagher, Marie E., Christie Moffatt. "Surviving Change: The First Step toward Sustaining Your Digital Library." In: J. Trant and D. Bearman (eds.). Museums and the Web 2006: Proceedings, Toronto: Archives & Museum Informatics, published March 1, 2006 at http://www.archimuse.com/mw2006/papers/gallagher/gallagher.html

Author Biography
Frank L. Walker received his B.S. and M.S. degrees in electrical engineering from the University of Maryland. Since he joined the National Library of Medicine in 1979, he has designed, developed, performed research, and published a number of papers on computer systems utilizing electronic imaging, primarily for the purpose of electronic document storage, retrieval, transmission, and use. His current interest is in developing software tools for improving the communication and use of biomedical library information.

Marie E. Gallagher, a computer scientist in the National Library of Medicine's Lister Hill National Center for Biomedical Communications since 1990, is the project leader of the Digital Library Research and Development team. The team investigates systems and develops the software underlying Profiles in Science. Ms. Gallagher earned her B.S. degree in Computer Science and Mathematics from the College of William and Mary in Virginia.

George R. Thoma is a Branch Chief at an R&D division of the U.S. National Library of Medicine. He directs R&D programs in document image analysis, biomedical image processing, animated virtual books, and related areas. He earned a B.S. from Swarthmore College, and the M.S. and Ph.D. from the University of Pennsylvania, all in electrical engineering. Dr. Thoma is a Fellow of the SPIE, the International Society for Optical Engineering.

