Nothing Special   »   [go: up one dir, main page]

Full Download The AWK Programming Language, 2nd Edition Aho File PDF All Chapter On 2024

Download as pdf or txt
Download as pdf or txt
You are on page 1of 27

The AWK Programming Language, 2nd

Edition Aho
Visit to download the full and correct content document:
https://ebookmass.com/product/the-awk-programming-language-2nd-edition-aho/
More products digital (pdf, epub, mobi) instant
download maybe you interests ...

C Programming Language: C PROGRAMMING LANG _p2 2nd


Edition, (Ebook PDF)

https://ebookmass.com/product/c-programming-language-c-
programming-lang-_p2-2nd-edition-ebook-pdf/

Programming for Absolute Beginners: Using the


JavaScript Programming Language 1st Edition Jonathan
Bartlett

https://ebookmass.com/product/programming-for-absolute-beginners-
using-the-javascript-programming-language-1st-edition-jonathan-
bartlett/

MIPS assembly language programming Robert Britton

https://ebookmass.com/product/mips-assembly-language-programming-
robert-britton/

Modern C for Absolute Beginners: A Friendly


Introduction to the C Programming Language 2nd Edition
Slobodan Dmitrovi■

https://ebookmass.com/product/modern-c-for-absolute-beginners-a-
friendly-introduction-to-the-c-programming-language-2nd-edition-
slobodan-dmitrovic/
RISC-V Assembly Language Programming Stephen Smith

https://ebookmass.com/product/risc-v-assembly-language-
programming-stephen-smith/

Go Programming Language For Dummies 1st Edition Wei-


Meng Lee

https://ebookmass.com/product/go-programming-language-for-
dummies-1st-edition-wei-meng-lee/

The Art of Multiprocessor Programming 2nd Edition


Maurice Herlihy

https://ebookmass.com/product/the-art-of-multiprocessor-
programming-2nd-edition-maurice-herlihy/

The Handbook of Language and Speech Disorders 2nd


Edition Jack S. Damico

https://ebookmass.com/product/the-handbook-of-language-and-
speech-disorders-2nd-edition-jack-s-damico/

The Art of Assembly Language Programming using PIC


Technology. Core Fundamentals Theresa Schousek

https://ebookmass.com/product/the-art-of-assembly-language-
programming-using-pic-technology-core-fundamentals-theresa-
schousek-2/
About This eBook
ePUB is an open, industry-standard format for eBooks. However,
support of ePUB and its many features varies across reading devices
and applications. Use your device or app settings to customize the
presentation to your liking. Settings that you can customize often
include font, font size, single or double column, landscape or portrait
mode, and figures that you can click or tap to enlarge. For additional
information about the settings and features on your reading device
or app, visit the device manufacturer’s Web site.
Many titles include programming code or configuration examples. To
optimize the presentation of these elements, view the eBook in
single-column, landscape mode and adjust the font size to the
smallest setting. In addition to presenting code and configurations in
the reflowable text format, we have included images of the code
that mimic the presentation found in the print book; therefore,
where the reflowable format may compromise the presentation of
the code listing, you will see a “Click here to view code image” link.
Click the link to view the print-fidelity code image. To return to the
previous page viewed, click the Back button on your device or app.
The AWK Programming
Language
Second Edition
The AWK Programming
Language
Second Edition

Alfred V. Aho
Brian W. Kernighan
Peter J. Weinberger

Addison-Wesley
Hoboken, New Jersey
Cover image: “Great Auk” by John James Audubon from The Birds of
America, Vols. I-IV, 1827–1838, Archives & Special Collections,
University of Pittsburgh Library System

Many of the designations used by manufacturers and sellers to


distinguish their products are claimed as trademarks. Where those
designations appear in this book, and the publisher was aware of a
trademark claim, the designations have been printed with initial
capital letters or in all capitals.
The authors and publisher have taken care in the preparation of this
book, but make no expressed or implied warranty of any kind and
assume no responsibility for errors or omissions. No liability is
assumed for incidental or consequential damages in connection with
or arising out of the use of the information or programs contained
herein.
For information about buying this title in bulk quantities, or for
special sales opportunities (which may include electronic versions;
custom cover designs; and content particular to your business,
training goals, marketing focus, or branding interests), please
contact our corporate sales department at
corpsales@pearsoned.com or (800) 382-3419.

For government sales inquiries, please contact


governmentsales@pearsoned.com.
For questions about sales outside the U.S., please contact
intlcs@pearson.com.

Visit us on the Web: informit.com/aw


Library of Congress Control Number: 2023941419

Copyright © 1988, 2024 Bell Telephone Laboratories, Incorporated.


UNIX is a registered trademark of The Open Group.
This book was formatted by the authors in Times Roman, Courier
and Helvetica, using Groff, Ghostscript and other open source Unix
tools. See https://www.awk.dev.

All rights reserved. This publication is protected by copyright, and


permission must be obtained from the publisher prior to any
prohibited reproduction, storage in a retrieval system, or
transmission in any form or by any means, electronic, mechanical,
photocopying, recording, or likewise. For information regarding
permissions, request forms and the appropriate contacts within the
Pearson Education Global Rights & Permissions Department, please
visit www.pearson.com/permissions.

ISBN-13: 978-0-13-826972-2
ISBN-10: 0-13-826972-6
$PrintCode
To the millions of Awk users
Contents

Preface

1. An Awk Tutorial
1.1 Getting Started
1.2 Simple Output
1.3 Formatted Output
1.4 Selection
1.5 Computing with Awk
1.6 Control-Flow Statements
1.7 Arrays
1.8 Useful One-liners
1.9 What Next?
2. Awk in Action
2.1 Personal Computation
2.2 Selection
2.3 Transformation
2.4 Summarization
2.5 Personal Databases
2.6 A Personal Library
2.7 Summary
3. Exploratory Data Analysis
3.1 The Sinking of the Titanic
3.2 Beer Ratings
3.3 Grouping Data
3.4 Unicode Data
3.5 Basic Graphs and Charts
3.6 Summary
4. Data Processing
4.1 Data Transformation and Reduction
4.2 Data Validation
4.3 Bundle and Unbundle
4.4 Multiline Records
4.5 Summary
5. Reports and Databases
5.1 Generating Reports
5.2 Packaged Queries and Reports
5.3 A Relational Database System
5.4 Summary
6. Processing Words
6.1 Random Text Generation
6.2 Interactive Text-Manipulation
6.3 Text Processing
6.4 Making an Index
6.5 Summary
7. Little Languages
7.1 An Assembler and Interpreter
7.2 A Language for Drawing Graphs
7.3 A Sort Generator
7.4 A Reverse-Polish Calculator
7.5 A Different Approach
7.6 A Recursive-Descent Parser for Arithmetic Expressions
7.7 A Recursive-Descent Parser for a Subset of Awk
7.8 Summary
8. Experiments with Algorithms
8.1 Sorting
8.2 Profiling
8.3 Topological Sorting
8.4 Make: A File Updating Program
8.5 Summary
9. Epilogue
9.1 Awk as a Language
9.2 Performance
9.3 Conclusion
Appendix A: Awk Reference Manual
A.1 Patterns
A.2 Actions
A.3 User-Defined Functions
A.4 Output
A.5 Input
A.6 Interaction with Other Programs
A.7 Summary

Index
Preface

Awk was created in 1977 as a simple programming language for


writing short programs that manipulate text and numbers with equal
ease. It was meant as a scripting language to complement and work
well with Unix tools, following the Unix philosophy of having each
program do one thing well and be composable with other programs.
The computing world today is enormously different from what it
was in 1977. Computers are thousands of times faster and have a
million times as much memory. Software is different too, with a rich
variety of programming languages and computing environments. The
Internet has given us more data to process, and it comes from all
over the world. We’re no longer limited to the 26 letters of English
either; thanks to Unicode, computers process the languages of the
world in their native character sets.
Even though Awk is nearly 50 years old, and in spite of the great
changes in computing, it’s still widely used, a core Unix tool that’s
available on any Unix, Linux, or macOS system, and usually on
Windows as well. There’s nothing to download, no libraries or
packages to import — just use it. It’s an easy language to learn and
you can do a lot after only a few minutes of study.
Scripting languages were rather new in 1977, and Awk was the
first such language to be widely used. Other scripting languages
complement or sometimes replace Awk. Perl, which dates from 1987,
was an explicit reaction to some of the limitations of Awk at the
time. Python, four years younger than Perl, is by far the most widely
used scripting language today, and for most users would be the
natural next step for larger programs, especially to take advantage
of the huge number of libraries in the Python ecosystem. On the
web, and also for some standalone uses, JavaScript is the scripting
language of choice. Other more niche languages are still highly
useful, and “the shell” itself has become a variety of different shells
with significantly enriched programming capabilities.
Programmers and other computer users spend a lot of time doing
simple, mechanical data manipulation — changing the format of
data, checking its validity, finding items that have some property,
adding up numbers, printing summaries, and the like. All of these
jobs ought to be mechanized, but it’s a real nuisance to have to
write a special-purpose program in a language like C or Python each
time such a task comes up.
Awk is a programming language that makes it possible to handle
simple computations with short programs, often only one or two
lines long. An Awk program is a sequence of patterns and actions
that specify what to look for in the input data and what to do when
it’s found. Awk searches a set of files that contain text (but not non-
text formats like Word documents, spreadsheets, PDFs and so on)
for lines that match any of the patterns; when a matching line is
found, the corresponding action is performed. A pattern can select
lines by combinations of regular expressions and comparison
operations on strings, numbers, fields, variables, and array elements.
Actions may perform arbitrary processing on selected lines; the
action language looks like C but there are no declarations, and
strings and numbers are builtin data types.
Awk scans text input files and splits each input line into fields
automatically. Because so many things are automatic — input, field
splitting, storage management, initialization — Awk programs are
usually much shorter than they would be in a more conventional
language. Thus one common use of Awk is for the kind of data
manipulation suggested above. Programs, a line or two long, are
composed at the keyboard, run once, then discarded. In effect, Awk
is a general-purpose programmable tool that can replace a host of
specialized tools or programs.
The same brevity of expression and convenience of operations
make Awk valuable for prototyping larger programs. Start with a few
lines, then refine the program until it does the desired job,
experimenting with designs by trying alternatives quickly. Since
programs are short, it’s easy to get started, and easy to start over
when experience suggests a different direction. And if necessary, it’s
straightforward to translate an Awk program into another language
once the design is right.

Organization of the Book


The goal of this book is to teach you what Awk is and how to use
it effectively. Chapter 1 is a tutorial on how to get started; after
reading even a few pages, you will have enough information to begin
writing useful programs. The examples in this chapter are short and
simple, typical of the interactive use of Awk.
The rest of the book contains a variety of examples, chosen to
show the breadth of applicability of Awk and how to make good use
of its facilities. Some of the programs are ones we use personally;
others illustrate ideas but are not intended for production use; a few
are included just because they are fun.
Chapter 2 shows Awk in action, with a number of small programs
that are derived from the way that we use Awk for our own personal
programming. The examples are probably too idiosyncratic to be
directly useful, but they illustrate techniques and suggest potential
applications.
Chapter 3 shows how Awk can be used for exploratory data
analysis: examining a dataset to figure out its properties, identify
potential (and real) errors, and generally get a grip on what it
contains before expending further effort with other tools.
The emphasis in Chapter 4 is on retrieval, validation,
transformation, and summarization of data — the tasks that Awk
was originally designed for. There is also a discussion of how to
handle data like address lists that naturally comes in multiline
chunks.
Awk is a good language for managing small, personal databases.
Chapter 5 discusses the generation of reports from databases, and
builds a simple relational database system and query language for
data stored in multiple files.
Chapter 6 describes programs for generating text, and some that
help with document preparation. One of the examples is an indexing
program based on the one we used for this book.
Chapter 7 is about “little languages,” that is, specialized
languages that focus on a narrow domain. Awk is convenient for
writing small language processors because its basic operations
support many of the lexical and symbol table tasks encountered in
translation. The chapter includes an assembler, a graphics language,
and several calculators.
Awk is a good language for expressing certain kinds of
algorithms. Because there are no declarations and because storage
management is easy, an Awk program has many of the advantages
of pseudo-code but Awk programs can be run, which is not true of
pseudo-code. Chapter 8 discusses experiments with algorithms,
including testing and performance evaluation. It shows several
sorting algorithms, and culminates in a version of the Unix make
program.
Chapter 9 explains some of the historical reasons why Awk is as
it is, and contains some performance measurements, including
comparisons with other languages. The chapter also offers
suggestions on what to do when Awk is too slow or too confining.
Appendix A, the reference manual, covers the Awk language in a
systematic order. Although there are plenty of examples in the
appendix, like most manuals it’s long and a bit dry, so you will
probably want to skim it on a first reading.
You should begin by reading Chapter 1 and trying some small
examples of your own. Then read as far into each chapter as your
interest takes you. The chapters are nearly independent of each
other, so the order doesn’t matter much. Take a quick look at the
reference manual to get an overview, concentrating on the
summaries and tables, but don’t get bogged down in the details.

The Examples
There are several themes in the examples. The primary one, of
course, is to show how to use Awk well. We have tried to include a
wide variety of useful constructions, and we have stressed particular
aspects like associative arrays and regular expressions that typify
Awk programming.
A second theme is to show Awk’s versatility. Awk programs have
been used from databases to circuit design, from numerical analysis
to graphics, from compilers to system administration, from a first
language for non-programmers to the implementation language for
software engineering courses. We hope that the diversity of
applications illustrated in the book will suggest new possibilities to
you as well.
A third theme is to show how common computing operations are
done. The book contains a relational database system, an assembler
and interpreter for a toy computer, a graph-drawing language, a
recursive-descent parser for an Awk subset, a file-update program
based on make, and many other examples. In each case, a short Awk
program conveys the essence of how something works in a form that
you can understand and play with.
We have also tried to illustrate a spectrum of ways to attack
programming problems. Rapid prototyping is one approach that Awk
supports well. A less obvious strategy is divide and conquer:
breaking a big job into small components, each concentrating on one
aspect of the problem. Another is writing programs that create other
programs. Little languages define a good user interface and may
suggest a sound implementation. Although these ideas are
presented here in the context of Awk, they are much more generally
applicable, and ought to be part of every programmer’s repertoire.
The examples have all been tested directly from the text, which is
in machine-readable form. We have tried to make the programs
error-free, but they do not defend against all possible invalid inputs,
so we can concentrate on conveying the essential ideas.

Evolution of Awk
Awk was originally an experiment in generalizing the Unix tools
grep and sed to deal with numbers as well as text. It was based on
our interests in regular expressions and programmable editors. As an
aside, the language is officially AWK (all caps) after the authors’
initials, but that seems visually intrusive, so we’ve used Awk
throughout for the name of the language, and awk for the name of
the program. (Naming a language after its creators shows a certain
paucity of imagination. In our defense, we didn’t have a better idea,
and by coincidence, at some point in the process we were in three
adjacent offices in the order Aho, Weinberger, and Kernighan.)
Although Awk was meant for writing short programs, its
combination of facilities soon attracted users who wrote significantly
larger programs. These larger programs needed features that had
not been part of the original implementation, so Awk was enhanced
in a new version made available in 1985.
Since then, several independent implementations of Awk have
been created, including Gawk (maintained and extended by Arnold
Robbins), Mawk (by Michael Brennan), Busybox Awk (by Dmitry
Zakharov), and a Go version (by Ben Hoyt). These differ in minor
ways from the original and from each other but the core of the
language is the same in all. There are also other books about Awk,
notably Effective Awk Programming, by Arnold Robbins, which
includes material on Gawk. The Gawk manual itself is online, and
covers that version very carefully.
The POSIX standard for Awk is meant to define the language
completely and precisely. It is not particularly up to date, however,
and different implementations do not follow it exactly.
Awk is available as a standard installed program on Unix, Linux,
and macOS, and can be used on Windows through WSL, the
Windows Subsystem for Linux, or a package like Cygwin. You can
also download it in binary or source form from a variety of web sites.
The source code for the authors’ version is at
https://github.com/onetrueawk/awk. The web site
https://www.awk.dev is devoted to Awk; it contains code for all the
examples from the book, answers to selected exercises, further
information, updates, and (inevitably) errata.
For the most part, Awk has not changed greatly over the years.
Perhaps the most significant new feature is better support for
Unicode: newer versions of Awk can now handle data encoded in
UTF-8, the standard Unicode encoding of characters taken from any
language. There is also support for input encoded as comma-
separated values, like those produced by Excel and other programs.
The command
$ awk --version

will tell you which version you are running. Regrettably, the default
versions in common use are sometimes elderly, so if you want the
latest and greatest, you may have to download and install your own.
Since Awk was developed under Unix, some of its features reflect
capabilities found in Unix and Linux systems, including macOS; these
features are used in some of our examples. Furthermore, we assume
the existence of standard Unix utilities, particularly sort, for which
exact equivalents may not exist elsewhere. Aside from these
limitations, however, Awk should be useful in any environment.
Awk is certainly not perfect; it has its full share of irregularities,
omissions, and just plain bad ideas. But it’s also a rich and versatile
language, useful in a remarkable number of cases, and it’s easy to
learn. We hope you’ll find it as valuable as we do.

Acknowledgments
We are grateful to friends and colleagues for valuable advice. In
particular, Arnold Robbins has helped with the implementation of
Awk for many years. For this edition of the book, he found errors,
pointed out inadequate explanations and poor style in Awk code, and
Another random document with
no related content on Scribd:
1.F.6. INDEMNITY - You agree to indemnify and hold the
Foundation, the trademark owner, any agent or employee of the
Foundation, anyone providing copies of Project Gutenberg™
electronic works in accordance with this agreement, and any
volunteers associated with the production, promotion and distribution
of Project Gutenberg™ electronic works, harmless from all liability,
costs and expenses, including legal fees, that arise directly or
indirectly from any of the following which you do or cause to occur:
(a) distribution of this or any Project Gutenberg™ work, (b)
alteration, modification, or additions or deletions to any Project
Gutenberg™ work, and (c) any Defect you cause.

Section 2. Information about the Mission of


Project Gutenberg™
Project Gutenberg™ is synonymous with the free distribution of
electronic works in formats readable by the widest variety of
computers including obsolete, old, middle-aged and new computers.
It exists because of the efforts of hundreds of volunteers and
donations from people in all walks of life.

Volunteers and financial support to provide volunteers with the


assistance they need are critical to reaching Project Gutenberg™’s
goals and ensuring that the Project Gutenberg™ collection will
remain freely available for generations to come. In 2001, the Project
Gutenberg Literary Archive Foundation was created to provide a
secure and permanent future for Project Gutenberg™ and future
generations. To learn more about the Project Gutenberg Literary
Archive Foundation and how your efforts and donations can help,
see Sections 3 and 4 and the Foundation information page at
www.gutenberg.org.

Section 3. Information about the Project


Gutenberg Literary Archive Foundation
The Project Gutenberg Literary Archive Foundation is a non-profit
501(c)(3) educational corporation organized under the laws of the
state of Mississippi and granted tax exempt status by the Internal
Revenue Service. The Foundation’s EIN or federal tax identification
number is 64-6221541. Contributions to the Project Gutenberg
Literary Archive Foundation are tax deductible to the full extent
permitted by U.S. federal laws and your state’s laws.

The Foundation’s business office is located at 809 North 1500 West,


Salt Lake City, UT 84116, (801) 596-1887. Email contact links and up
to date contact information can be found at the Foundation’s website
and official page at www.gutenberg.org/contact

Section 4. Information about Donations to


the Project Gutenberg Literary Archive
Foundation
Project Gutenberg™ depends upon and cannot survive without
widespread public support and donations to carry out its mission of
increasing the number of public domain and licensed works that can
be freely distributed in machine-readable form accessible by the
widest array of equipment including outdated equipment. Many small
donations ($1 to $5,000) are particularly important to maintaining tax
exempt status with the IRS.

The Foundation is committed to complying with the laws regulating


charities and charitable donations in all 50 states of the United
States. Compliance requirements are not uniform and it takes a
considerable effort, much paperwork and many fees to meet and
keep up with these requirements. We do not solicit donations in
locations where we have not received written confirmation of
compliance. To SEND DONATIONS or determine the status of
compliance for any particular state visit www.gutenberg.org/donate.

While we cannot and do not solicit contributions from states where


we have not met the solicitation requirements, we know of no
prohibition against accepting unsolicited donations from donors in
such states who approach us with offers to donate.

International donations are gratefully accepted, but we cannot make


any statements concerning tax treatment of donations received from
outside the United States. U.S. laws alone swamp our small staff.

Please check the Project Gutenberg web pages for current donation
methods and addresses. Donations are accepted in a number of
other ways including checks, online payments and credit card
donations. To donate, please visit: www.gutenberg.org/donate.

Section 5. General Information About Project


Gutenberg™ electronic works
Professor Michael S. Hart was the originator of the Project
Gutenberg™ concept of a library of electronic works that could be
freely shared with anyone. For forty years, he produced and
distributed Project Gutenberg™ eBooks with only a loose network of
volunteer support.

Project Gutenberg™ eBooks are often created from several printed


editions, all of which are confirmed as not protected by copyright in
the U.S. unless a copyright notice is included. Thus, we do not
necessarily keep eBooks in compliance with any particular paper
edition.

Most people start at our website which has the main PG search
facility: www.gutenberg.org.

This website includes information about Project Gutenberg™,


including how to make donations to the Project Gutenberg Literary
Archive Foundation, how to help produce our new eBooks, and how
to subscribe to our email newsletter to hear about new eBooks.

You might also like