STATE OF THE ART ON
GRAMMATICAL
INFERENCE USING
EVOLUTIONARY
METHOD

HARI MOHAN PANDEY


Department of Computer Science, Edge Hill University, United Kingdom
Academic Press is an imprint of Elsevier
125 London Wall, London EC2Y 5AS, United Kingdom
525 B Street, Suite 1650, San Diego, CA 92101, United States
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom

Copyright © 2022 Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical,
including photocopying, recording, or any information storage and retrieval system, without permission in writing from
the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our
arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be
found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as
may be noted herein).

Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our
understanding, changes in research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any
information, methods, compounds, or experiments described herein. In using such information or methods they should be
mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any
injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or
operation of any methods, products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data


A catalog record for this book is available from the Library of Congress

British Library Cataloguing-in-Publication Data


A catalogue record for this book is available from the British Library

ISBN: 978-0-12-822116-7

For information on all Academic Press publications visit our


website at https://www.elsevier.com/books-and-journals

Publisher: Mara Conner


Acquisitions Editor: Chris Katsaropoulos
Editorial Project Manager: Franchezca A. Cabural
Production Project Manager: Omer Mukthar
Cover Designer: Victoria Pearson

Typeset by TNQ Technologies


To my mom and dad for motivating me and making my life
sweeter.
To my brother and sisters for helping and supporting me.
To my wife for giving time and understanding my vision.
To sweetest Anant for making me smile all the time.
This page intentionally left blank
Contents
Foreword xi
Preface xiii
Acknowledgment xvii
Abbreviations xix

Chapter 1 Introduction and scientific goals 1
1.1. Introduction 1
1.2. Why grammatical inference is popular 1
1.3. Scientific goals: why this book? 2

Chapter 2 State of the art: grammatical inference 3
2.1. Introduction 3
2.2. Part 1. Preliminary definitions 3
2.3. Part 2. Introduction to learning algorithms 9
2.4. Comparison and discussion 25
2.5. What are the challenges with grammatical inference algorithms? 28
2.6. Summary 31
References 31

Chapter 3 State of the art: genetic algorithms and premature convergence 35
3.1. Introduction 35
3.2. Factors affecting genetic algorithms 35
3.3. Theoretical framework 38
3.4. Approaches to preventing premature convergence 42
3.5. Classifications and analyses 92
3.6. Challenges with the genetic algorithm 92
3.7. Summary 113
References 113
Further reading 122

Chapter 4 Genetic algorithms and grammatical inference 125
4.1. Introduction 125
4.2. Bit-mask oriented genetic algorithm 125
4.3. Bit-masking oriented data structure 129
4.4. Reproduction operators: crossover and mutation mask fill 131
4.5. New offspring generation 135
4.6. Genetic algorithm implemented for grammar induction 137
4.7. Maintaining regularity and generalization and minimum description length principle 139
4.8. Grammatical inference and minimum description length principle 141
4.9. Summary 143
References 143
Further reading 144

Chapter 5 Performance analysis of genetic algorithm for grammatical inference 145
5.1. Introduction 145
5.2. Simulation model and test languages 145
5.3. Parameter selection and tuning 146
5.4. Performance analysis of proposed bit masking-oriented genetic algorithm 155
5.5. Summary 189
References 189

Chapter 6 Applications of grammatical inference methods and future development 193
6.1. Introduction 193
6.2. Application of grammatical inference method 194
6.3. Opportunities for future research 195
References 197

Subject index 201
Author index 205
Foreword

Grammatical inference, that is, learning a formal grammar


from a set of observations, has many practical applications.
These arise in natural language processing, computational
biology, and many other areas. As with many challenges in
machine learning, grammatical inference remains “unsolved.”
Yet, over the years, a variety of machine approaches have been
applied to these problems, with varying degrees of success
depending on the application and the implementation. In this
book, Prof. Pandey provides a very helpful overview of some of
these methods, and also offers methods and results based on his
own approaches within the field of evolutionary computation.
Mainly, the techniques focus on derivatives of canonical genetic
algorithms and swarm optimization, but the reader will be able to
take any of Prof. Pandey’s efforts and investigate and extend
them for their own purposes using any other approach that is
worthy of exploration. To get the most from the content, readers
should give themselves the patience to understand the algorithms,
and their acronyms, so as to be able to become familiar with the
approaches. There will continue to be a need for improving our
ability to perform grammatical inference reliably, and Prof.
Pandey’s contributions outlined in this book help to solidify the
foundation for continuing this process of improvement.
David Fogel
Natural Selection, Inc.
San Diego, California
Preface

This book is a comprehensive, hands-on guide to formal


grammatical inference using evolutionary algorithms. This book
is designed and organized for researchers, academicians, and
students at all levels. The contents of this book assume that you
have no previous knowledge or background information about
grammatical inference and evolutionary algorithms, especially
genetic algorithms. People familiar with formal languages,
compiler construction, and parser design, and a basic under-
standing of algorithm design and analysis will, of course, have an
easier time and move through the first few chapters quickly.
The first three chapters of this book will allow you to take full
advantage of existing learning or grammatical inference algo-
rithms along with a comprehensive overview of both grammat-
ical inference and genetic algorithms. You will start at the
beginning, and when you have finished this book, you will have
moved far along the road to grammatical inference algorithms. I
have tried hard to cover most learning algorithms, starting from
the very first learning algorithm, proposed by Gold in 1967.
I have also tried to stress new ways of thinking required to
master learning and evolutionary algorithms, so that even experts
in grammatical inference or evolutionary algorithms can benefit
from this book. I have used this approach to bring evolutionary
algorithms, particularly genetic algorithms, into the framework of
older grammatical inference algorithms, alongside other
metaheuristic methods.
To make this book even more useful, there are extensive dis-
cussions of important topics. The strengths and weaknesses of each
method are given, which are left out of most other introductory
books. Unlike many introductory books, I want to introduce you
to a topic, but I also go into it in enough depth that you can
actually use the techniques to develop new algorithms.
The subject matter of this book is divided into six chapters,
each of which has been written and developed with immensely
simplified theoretical and practical concepts (except Chapter 1,
which is the foundation chapter to introduce scientific goals).
These chapters clearly provide the core concepts of grammatical
inference and genetic algorithms. State of the Art on Gram-
matical Inference Using Evolutionary Method was written

especially for those who are novices to the field of grammatical


inference and evolutionary algorithms.
As mentioned, this book is divided into six chapters. A short
summary of each chapter is presented next for a quick look.
Chapter 1 (Introduction and Scientific Goals): This chapter
presents an introductory text with the motivation and scope of
the book. It shows research problems and main contributions
and briefly describes grammatical inference and its effectiveness
across domains. The basics of formal grammars and various
existing grammatical inference methods are discussed in
Chapter 2.
Chapter 2 (State of the Art: Grammatical Inference): This
chapter presents the current state of the art within the context of
grammatical inference methods. I have divided this chapter into
two parts. Part 1 covers some preliminary definitions such as
Backus-Naur form, grammars, and Chomsky hierarchy. The focus
of Part 2 is mainly on the different grammatical inference
methods. A comprehensive discussion of each method is pre-
sented with their strengths and weaknesses. Grammatical infer-
ence methods are classified based on various factors and learning
techniques. At the end, challenges are presented with grammat-
ical inference methods and a summary.
Chapter 3 (State of the Art: Genetic Algorithms and Prema-
ture Convergence): The purpose of this chapter is to discuss the
genetic algorithm, which is a popular algorithm in the evolu-
tionary algorithm family. An introduction of genetic algorithm is
provided with factors affecting its behavior with theoretical
frameworks. In addition, challenges in executing genetic algo-
rithms are presented. The state of the art within the context of
premature convergence in genetic algorithms is presented
comprehensively. A detailed summary and analysis of different
methods are given for quick review. A comparative analysis is
provided based on different parameters. The motivation for this
chapter is to identify methods that allow the development of new
strategies to prevent premature convergence, and then to apply
evolutionary algorithms to solve grammatical inference
problems.
Chapter 4 (Genetic Algorithms and Grammatical Inference):
The focus of this chapter is on evolutionary algorithms used for
grammatical inference. For this book, I have considered genetic
algorithms such as bit-masking oriented genetic algorithms. I
discuss the role of bit-masking oriented data structure and its
formation. The roles of the crossover mask (CM) and mutation
mask (MM) are also shown with three crossover operators and an

MM operator. In addition, the role of Boolean-based procedure is


discussed with examples for the offspring generation. Finally,
algorithms are shown that use the CM and MM with a Boolean-
based procedure for grammatical inference. A detailed flowchart
is presented highlighting the applicability of the minimum
description length principle.
Chapter 5 (Performance Analysis of Genetic Algorithm for
Grammatical Inference): The primary aim of this chapter is to
report computational and statistical test results by implementing
algorithms discussed in Chapter 4. This chapter shows the
detailed method for developing a robust experimental setup to
conduct the experiments. This chapter also presents a comparison
with, and analysis of, other algorithms.
Chapter 6 (Applications of Grammatical Inference Methods
and Future Development): This chapter discusses the wide range
of applications of grammatical inference methods and the pos-
sibilities of future investigation in this area.
Now, a confession: My original goal was to make this book a
one-stop resource, but realistically, grammatical inference and
evolutionary algorithms have gotten far too big and powerful for
any one book to do this. Nonetheless, if you finish this book, I
truly believe that you will be in a position to begin writing or
implementing both grammatical inference and evolutionary al-
gorithms for commercial-quality applications! True mastery will
take longer. I have tried to explain the advantages, disadvantages,
and suggestions that can take you to the next level.
Acknowledgment

First of all, I would like to thank Baba Visvnath for constant blessings throughout the
development of this book. Because an understanding of a study such as this is never the
outcome of efforts of a single person, it bears the imprint of a number of people who have
directly or indirectly helped me complete this book, State of the Art on Grammatical
Inference Using Evolutionary Method. I would be failing in my duty if I did not thank all those
whose sincere advice helped me to make this book truly educative, effective, and pleasurable.
I would like to acknowledge my family: Dr. Vijay Nath Pandey, Smt. Madhuri Pandey,
Anjana Pandey and Ranjana Pandey, Man Mohan Pandey, Rachana Pandey, and Anant
(my sweet babu).
I have immense pleasure in expressing wholehearted gratitude to my supervisors and
mentors, Dr. Deepti Mehrotra (Amity University), Dr. Ankit Chaudhary (University of Missouri–St.
Louis), and Prof. Abhay Bansal (Amity University). I am also very thankful to my
friends and mentors, Prof. Arun Prakash Agarwal (Sharda University), Prof. Ankur
Choudhary (Sharda University), Prof. Gaurav Raj (Sharda University), Prof. Neha Agarwal
(Amity University), Prof. Neetu Narayan (Amity University), Prof. Anchal Garg (Amity Uni-
versity), Prof. Ranjeet Rout (NIT Srinagar), Shruti Gupta (Amity University), Prof. Graham
Kendall (University of Nottingham), Prof. David Windridge (Middlesex University), Prof. Nik
Bessis (Edge Hill University), Prof. David Fogel (Natural Selection), and Prof. Francesco
Masulli (University of Genova) for supporting and guiding me during the preparation of this
book. Also, I am truly thankful to my students, whose conceptual queries have always helped
me dig more deeply into the subject matter.
Last but not least, I am thankful to Elsevier ERC Editorial USA for feedback, support, and
guidance for writing and publishing this book.
Dr. Hari Mohan Pandey
(Author)
Abbreviations

ABL Alignment-Based Learning


ADIOS Automatic Distillation of Structure
AGA Abstract Genetic Algorithm
AHCF Adaptive Hierarchical Fair Competition
ALLiS Architecture for Learning Linguistic Structure
ALPS Age-Layered Population Structure
AMR Adaptive Mutation Rate
ATIS Air Travel Information System
BBP Boolean-Based Procedure
BFWA Broadband Fixed Wireless Access
BMODS Bit-Mask Oriented Data Structure
BMOGA Bit-Mask Oriented Genetic Algorithm
BMU Best Matching Unit
BNF Backus–Naur Form
BS Base Station
CA Clustering Approach
CDC Context Distribution Clustering
CF Crowding Factor
CFBS Correlative Family-Based Selection
CFG Context-Free Grammar
CFL Context-Free Language
CG Categorical Grammar
CGA Conventional Genetic Algorithm
CHFC Continuous Hierarchical Fair Competition
CLL Computational Learning of Natural Language
CM Crossmask/Crossover Mask
CMOC Crossover-Mutation Operator Combination
CNF Conjunctive Normal Form
CTS Correlative Tournament Selection
DARO Dynamic Application Reproduction Operator
DCGA Diversity Control-Oriented Genetic Algorithm
DE Differential Evolution
DE-MC Differential Evolution Markov Chain
DFA Deterministic Finite Automata
DNF Disjunctive Normal Form
DSL Domain-Specific Language
EA Evolutionary Algorithm
EMPGA Elite Mating Pool Genetic Algorithm
EP Evolutionary Programming
ES Evolutionary Strategy
FC Frequency Crossover
FKB Fuzzy Knowledge Base
FUS Fitness Uniform Selection
GA Genetic Algorithm
GAC Genetic Algorithm with Compliments
GASOM Genetic Algorithm Using Self-organizing Maps

GAVaPS Genetic Algorithm with Varying Population Size


GAWMDL Genetic Algorithm With Minimum Description Length Principle
GAWOMDL Genetic Algorithm Without Minimum Description Length Principle
GCG Generalized Categorial Grammar
GI Grammatical Inference
GP Genetic Programming
GRIDS Grammar Induction Driven by Simplicity
GSM Greedy Search Method
HFC Hierarchical Fair Competition
HM Heuristic Method
IFS Individual Fitness Score
IPA Incest Prevention Algorithm
ITBL Improved Tabular Representation Algorithm
KL Kullback–Leibler
LAgent Language Agent
LHDC Long Hamming Distance Crossover
LOC Local Optimum Convergence
MA Memetic Algorithm
MCMC Markov Chain Monte-Carlo
MCMP Multicrossover on Multiple Parents
MCMPIP Multiple Crossover on Multiple Parents with Incest Prevention
MCPC Multiple Crossovers per Couple
MDL Minimum Description Length
MM Mutmask/Mutation Mask
MOGA Matrix-Oriented Genetic Algorithm
NN Neural Network
NSGA Nondominated Sorting Genetic Algorithm
NSGA-DAR Nondominated Sorting Genetic Algorithm–Dynamic Application of
Reproduction Operators
OEGA Odd–Even Genetic Algorithm
OVIS Openbaar Vervoer Informatie System
PA Pygmy Algorithm
PAC Probably Approximately Correct
PAHCF Parallel Adaptive Hierarchical Fair Competition
PDT Population Distribution Table
PRA Population Replacement Algorithm
PSO Particle Swarm Optimization
QPSO Quantum-Behaved Particle Swarm Optimization
RBCGA Real/Binary-like Coded Genetic Algorithm
RE Resultant Effectivity
RL Regular Language
RNN Recurrent Neural Network
RO Random offspring
ROGGA Random Offspring Generation Genetic Algorithm
RPF Racial Preference Factor
RTS Restricted Tournament Selection
SA Simulated Annealing
SASEGA Self-adaptive Segregative Genetic Algorithm
SASEGASA Self-adaptive Segregative Genetic Algorithm with Simulated Anneal-
ing aspects
SBGA Shifting Balance Genetic Algorithm
SBT Shifting Balance Theory
SDT Social Disaster Techniques
SEGA Segregative Genetic Algorithm

SHDC Short Hamming Distance Crossover


SHT Search History Table
SINR Signal to Interference and Noise Ratio
SM Statistical Method
SNR Signal-to-Noise Ratio
SOM Self-organizing Map
TBL Tabular Representation Algorithm
TPC Two-Point Crossover
TPCIS Two-Point Crossover with Internal Swapping
TS Terminal Station
TSP Traveling Salesman Problem
UC Uniform Crossover
Introduction and scientific goals
1
1.1 Introduction
Multilingual text data are increasing daily. This creates opportu-
nities and challenges in the field of computational linguistics.
This book will enrich the domains of language processing with
performance improvement. This book is a resource presenting
an approach to grammatical inference (GI) using the evolutionary
algorithm (EA). GI deals with the standard learning procedure
to acquire a grammar based on evidence about a language. It has
been extensively studied owing to its high importance in various
fields of science and engineering. Typically, GI is a research domain
that uses an unsupervised learning procedure to extract gram-
matical descriptions from a corpus or set of textual data. Several
approaches have been proposed for GI, each with its own merits
and demerits: some showed good results, whereas others had
limitations in dealing with negative samples. Hence, this book
provides a basic but comprehensive overview of different GI
algorithms. The book can be used as reference study material across
domains. It focuses on formal grammar learning and applications,
language acquisition, language processing, and information retrieval.

1.2 Why grammatical inference is popular


The field of GI is transversal to several research domains. There
are few applications based purely on GI, but many uses of GI
techniques exist; hence, there is plenty of room to find tasks in
which GI techniques have performed far better than other ma-
chine learning or pattern recognition methods. GI methods
have been used successfully in many domains, including robotics
and control systems, structural pattern recognition, computa-
tional linguistics, automatic translation, computational biology,
inductive logic programming, document management, compres-
sion, applications to time series, data mining, and music. GI
methods have attracted researchers who are working in the soft-
ware engineering domain. Some application areas in which GI


techniques have been heavily used are (1) inference of general-


purpose programming languages; (2) inference of grammar rules
from programming languages; (3) inference of domain-specific
languages; (4) inference of graph grammar and visual languages;
and (5) software testing and test case generation. This discussion
reveals that GI has a wide spectrum of applications, but there are
few powerful GI algorithms. Where they exist, they deal only with
the positive corpus. Some GI algorithms have been proposed to
deal with both the positive and negative corpus. These GI
algorithms use EAs or other optimization methods such as the
greedy method, minimum description length principle, and
statistical methods. Most GI algorithms are based on EAs, among
which the genetic algorithm and genetic programming have been
employed on the largest scale. There is no doubt about the
exploratory power of EAs, but they suffer from key challenges known
as premature convergence and slow finishing. Hence, serious
investigation is required to overcome premature convergence. This
book presents a comprehensive study of approaches that were
proposed to handle premature convergence.

1.3 Scientific goals: why this book?


The aims of this book are to present the state of the art on GI
algorithms and discuss the most recent evolutionary algorithms
proposed for GI from the positive and negative corpus.
To achieve these aims, these objectives were formed in the
preparation of this book:
Objective 1: Explore the current state of the art for GI
algorithms.
Objective 2: Offer a comprehensive discussion on EAs. Here,
the focus will be on the genetic algorithm.
Objective 3: Explore the current state of the art on premature
convergence handling algorithms.
Objective 4: Discuss implementation strategies of GI using
EAs.
Objective 5: Discuss application areas in which GI algorithms
have an important role.
Objective 6: Explore future developments for GI algorithms.
State of the art: grammatical inference
2

2.1 Introduction
This chapter is divided into two parts. Part 1 covers some prelim-
inary definitions such as the Backus–Naur form (BNF), gram-
mars, and Chomsky hierarchy. Part 2 focuses on the different
grammar learning algorithms. This chapter comprehensively dis-
cusses different grammatical inference (GI) algorithms along with
their strengths and weaknesses. This chapter also presents a
detailed classification of learning algorithms.

2.2 Part 1. Preliminary definitions


In computer science and other areas, grammars have a vital role
from both theoretical and practical points of view. Context-free
grammar (CFG) is one of the four classes of grammars defined by
Noam Chomsky and has a wide variety of applications. Primarily,
CFG has been used to build compilers to verify the syntax of a
computer program. However, research in this field is known to be a
computationally hard nut to crack.

2.2.1 Backus–Naur form


In the field of computer science, BNF is a notation for CFGs, most
often applied to describe the syntax of languages used in computing,
such as computer programming languages, instruction sets,
document formats, and communication protocols. A BNF
specification is a set of derivation rules, such as:

<symbol> ::= expression
<symbol> ::= expression1 | expression2 | expression3

where <symbol> represents a nonterminal and the expression
consists of one or more sequences of symbols. If several expressions
are given, they are separated by a vertical bar "|", which indicates a
choice.
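To make the notation concrete, here is a small illustrative BNF fragment (not taken from the book) describing signed integer literals; every name in angle brackets is a nonterminal and the bare characters are terminals:

<integer> ::= <sign> <digits> | <digits>
<sign>    ::= + | -
<digits>  ::= <digit> | <digit> <digits>
<digit>   ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Reading the first rule, an <integer> is either a <sign> followed by <digits> or just <digits>; the vertical bar separates the alternative expressions, exactly as described above.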


2.2.2 Grammars
A grammar is the most useful and general system employed to
represent languages. A grammar is a quadruple (V, Σ, P, S), where:
• V is a finite nonempty set of elements known as nonterminals or
variables;
• Σ is a finite nonempty set of elements known as terminals;
• S ∈ V is a distinguished nonterminal referred to as the start symbol;
and
• P is a finite set of production rules represented in BNF as (α → β),
where α ∈ (Σ ∪ V)* V (Σ ∪ V)* and β ∈ (Σ ∪ V)*; that is, α is a string
of terminals and nonterminals that contains at least one
nonterminal, and β is a string of terminals and nonterminals.
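As a minimal sketch (illustrative only, not part of the book's formalism), the quadruple can be written down directly as Python data; the example grammar and the helper below are hypothetical:

# A grammar G = (V, Sigma, P, S) as plain data.
V = {"S"}                       # nonterminals
Sigma = {"a", "b"}              # terminals
P = [("S", "aSb"), ("S", "")]   # productions alpha -> beta, stored as (alpha, beta) pairs
S = "S"                         # start symbol

def lhs_has_nonterminal(productions, nonterminals):
    # Check the defining condition on productions: every left-hand side
    # must contain at least one nonterminal symbol.
    return all(any(sym in nonterminals for sym in lhs) for lhs, _ in productions)

print(lhs_has_nonterminal(P, V))  # True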

2.2.3 Chomsky hierarchy of grammars


Chomsky (1956, 1963) presented a classification of grammars known
as the Chomsky hierarchy by dividing grammars into four classes
with gradually increasing restrictions on the form of productions.
The proposed classification scheme has a strong connection to the
classification of automata, as depicted in Fig. 2.1.
Suppose G = (V, Σ, P, S) is a grammar. Then,
• G is also called a type 0 grammar or an unrestricted grammar.
• G is a type 1 or context-sensitive grammar if each production
(α → β) ∈ P satisfies |α| ≤ |β|. A type 1 grammar may include the
production S → ε, provided S does not appear on the right-hand
side (RHS) of any production.

Figure 2.1 Chomsky hierarchy. Each grammar type corresponds to a language class and to the automaton that recognizes it: Type 0, unrestricted language, Turing machine; Type 1, context-sensitive language, linear bounded automaton; Type 2, context-free language, pushdown automaton; Type 3, regular language, finite automaton.



• G is a type 2 grammar or CFG if each production (α → β) ∈ P
satisfies |α| = 1; that is, α is a single nonterminal.
• G is a type 3, right linear, or regular grammar if each production has
one of three forms: A → cB, A → c, or A → ε, where A and B are
nonterminals (with B = A allowed) and c is a terminal symbol.
Fig. 2.1 indicates that the language generated by a type i grammar is
known as a type i language (i = 0, 1, 2, 3). Detailed descriptions of
these grammars are presented in the next section. Fig. 2.1 also shows
that the Chomsky hierarchy is connected to a classification of
automata, which can be used to recognize a specific language
generated by the corresponding grammar. The problem of
recognizing a language can be explained thus:

Let L be any language over an alphabet Σ. The problem is to design
an automaton M that reads input sequences in Σ*. M accepts an
input sequence only if it is a valid element of L; otherwise, it rejects it.

Fig. 2.1 shows that each class of language in the Chomsky hierarchy
is generated by a specific type of grammar and recognized by an
appropriate type of automaton. Moving up the hierarchy of
languages, the automaton required to recognize a language becomes
more powerful, whereas the grammar required to generate a
language becomes more general. It can be seen that the type 2
languages generated by CFGs are the class of languages recognized
by pushdown automata (finite automata equipped with an
unbounded stack), which are used to verify the acceptability of
productions of the CFG.

2.2.4 Major grammar definitions


Section 2.2.3 focused on the hierarchy of grammars. The primary
interest in this section is in defining the grammars of the Chomsky
hierarchy. In decreasing order of descriptive power, they are:
unrestricted grammar, context-sensitive grammar, CFG, and regular
grammar. Based on their rewriting mechanisms, other forms of
grammar are matrix grammar, random context-free grammar,
valence grammar, and bag context grammar.

2.2.4.1 Unrestricted grammars


An unrestricted grammar (type 0 grammar) G is a 4-tuple
(N, Σ, R, S), where:
• N is a nonempty set of symbols called nonterminals;
• Σ is a set of symbols called terminals, with N ∩ Σ = ∅;
• VG = N ∪ Σ, and R is a set of production rules of the form α → β,
where α ∈ VG* N VG* and β ∈ VG*; and
• S ∈ (VG − Σ) is the start symbol.

A string x ∈ VG+ directly derives y ∈ VG* if x = w1 α w2, y = w1 β w2,
with w1, w2 ∈ VG* and there is a rule α → β ∈ R. If x directly derives y,
we denote this by x ⇒G y or simply x ⇒ y. The reflexive and transitive
closure of the relation ⇒ is denoted by ⇒*. If x ⇒* y, we say that
x derives y.
In an unrestricted grammar, the left-hand side (LHS) of a rule
contains a string of terminals and nonterminals, at least one of which
must be a nonterminal. To generate a string with an unrestricted
grammar, start with the start symbol; while the string contains at
least one nonterminal, find a substring that matches the LHS of some
rule and replace that substring with the RHS of the rule.

2.2.4.2 Context-sensitive grammars


A context-sensitive grammar (type 1 grammar) is a type 0 grammar
with production rules of the form x1 A x2 → x1 w x2, where
x1, x2 ∈ VG*, A ∈ N, and w ∈ VG+. The notable point is that we allow
the rule S → ε when S does not appear on the RHS of any rule.
Similarly, we can define context-sensitive grammars as
length-increasing type 0 grammars; that is, for every production rule
α → β, it holds that |α| ≤ |β|. Here again, we allow S → ε when S does
not appear on the RHS of any production rule.
The language generated by G is denoted L(G), which can be defined
as: L(G) = {x | x ∈ Σ* and S ⇒* x}. The class of all languages that can
be generated using context-sensitive grammars is called the family of
context-sensitive languages, which we denote L(CS).

2.2.4.3 Context-free grammars


In formal language theory, a CFG is a grammar in which every
production rule is of the form A → α, where A is a single nonterminal
and α is a string of terminals and/or nonterminals.
"Context-free" expresses the fact that nonterminals can be rewritten
without regard to the context in which they occur. A formal language
is context-free if some CFG generates it. The language generated by
G is denoted L(G), and one can define it as:
L(G) = {α | α ∈ Σ* and S ⇒* α}. The class of all languages generated
by CFGs is called the class of context-free languages, which we
represent as L(CF).
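For illustration (a hypothetical sketch, not the book's own example), the CFG with productions S → aSb and S → ε generates the context-free language {aⁿbⁿ | n ≥ 0}; the fragment below produces strings of that language by repeatedly rewriting nonterminals without regard to their context:

import random

rules = {"S": ["aSb", ""]}   # productions of the CFG S -> aSb | epsilon

def generate(symbol="S"):
    if symbol not in rules:
        return symbol                      # terminal symbol: emit as-is
    body = random.choice(rules[symbol])    # choose one production for the nonterminal
    return "".join(generate(s) for s in body)

print(generate())   # e.g. "", "ab", "aabb", "aaabbb", ...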

2.2.4.4 Regular grammars


Suppose there are two languages, X and Y. Then we can define the
regular operations union, concatenation, and star as:
• Union: X ∪ Y = {a | a ∈ X or a ∈ Y}.
• Concatenation: X ∘ Y = {ab | a ∈ X and b ∈ Y}, which we simply
write as XY.
• Star: X* = {x1 x2 ... xk | k ≥ 0 and each xi ∈ X}.
A regular grammar is a type 0 grammar whose production rules have
the form A → aB or A → a, where A, B ∈ N and a ∈ Σ ∪ {ε}. The
language generated by G is denoted by L(G) and is defined as:
L(G) = {x | x ∈ Σ* and S ⇒* x}. The class of languages generated by
regular grammars is called the class of regular languages and is
denoted by L(REG).
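A brief sketch (illustrative only) of the three regular operations applied to finite languages represented as Python sets; because X* is infinite in general, the star operation here is truncated to a fixed number of repetitions:

def union(X, Y):
    return X | Y

def concat(X, Y):
    return {a + b for a in X for b in Y}

def star(X, max_reps=3):
    # Enumerate only the words built from at most max_reps factors of X.
    result, current = {""}, {""}
    for _ in range(max_reps):
        current = concat(current, X)
        result |= current
    return result

X, Y = {"a"}, {"b", "ab"}
print(union(X, Y))                    # {'a', 'b', 'ab'}
print(concat(X, Y))                   # {'ab', 'aab'}
print(sorted(star({"a"}), key=len))   # ['', 'a', 'aa', 'aaa']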

2.2.4.5 Regular expression


A regular expression R over a finite alphabet Σ is defined recursively:
• ∅ is a regular expression.
• ε is a regular expression.
• a, for a ∈ Σ, is a regular expression.
• If R1 and R2 are regular expressions, then (R1 | R2) is a regular
expression.
• If R1 and R2 are regular expressions, then R1 R2 is a regular
expression.
• If R is a regular expression, then R* is a regular expression.

2.2.4.6 Matrix grammars


Suppose G = (N, Σ, F, M, S) is a matrix grammar with appearance
checking. For x, y ∈ VG*, we write x ⇒ac y if there are x₀, ..., xₙ,
x₀′, ..., xₙ₋₁′, and x₀″, ..., xₙ₋₁″ ∈ VG* with x = x₀ and xₙ = y, and a
matrix rule (α₁ → β₁, α₂ → β₂, ..., αₙ → βₙ) ∈ M, such that for
1 ≤ i ≤ n, either xᵢ₋₁ = xᵢ₋₁′ αᵢ xᵢ₋₁″ and xᵢ = xᵢ₋₁′ βᵢ xᵢ₋₁″, or
αᵢ → βᵢ ∈ F, αᵢ is not a substring of xᵢ₋₁, and xᵢ₋₁ = xᵢ. The language
generated by G is denoted by L(G) and is defined as:
L(G) = {x | x ∈ Σ* and S ⇒* x}. The class of matrix grammars with
context-free rules, without λ-productions (productions of the form
A → λ) and without appearance checking, is denoted by
L(M, CF − λ), and the same class with appearance checking by
L(M, CF − λ, ac). Also, the classes of matrix grammars of type 0,
type 1, type 2, and type 3 without appearance checking are denoted
by L(M, RE), L(M, CS), L(M, CF), and L(M, REG), respectively, and
with appearance checking by L(M, RE, ac), L(M, CS, ac),
L(M, CF, ac), and L(M, REG, ac).

2.2.4.7 Programmed grammars


Let G = (N, Σ, P, S) be a programmed grammar, where P contains a
finite number of production rules of the form (r : α → β, σ(r), φ(r)),
and N, Σ, S, and α → β are as in the Chomsky grammars. In the rule
(r : α → β, σ(r), φ(r)), r is a label, and each rule in P has a unique
label. The set of all labels is represented by Lab(P); therefore, we can
write Lab(P) = {r : (r : α → β, σ(r), φ(r)) ∈ P}. In the production rule
(r : α → β, σ(r), φ(r)), we have σ(r) ⊆ Lab(P) and φ(r) ⊆ Lab(P). The
sets σ(r) and φ(r) are referred to as the success and failure fields,
respectively, associated with r. For a programmed grammar G, the
language that one can generate is denoted by L(G) and is defined as:
L(G) = {x ∈ Σ* | (S, r₁) ⇒* (x, r₂) for some r₁, r₂ ∈ Lab(P)}. One can
denote the classes of programmed grammars with and without
appearance checking by L(P, X, ac) and L(P, X), respectively, where
X could, for example, be any of RE, CS, CF, CF − λ, or REG.
Therefore, by L(P, RE), L(P, CS), L(P, CF), and L(P, CF − λ), for
example, we denote the classes of programmed grammars without
appearance checking and with production rules of the same form as
in the recursively enumerable, context-sensitive, context-free, and
context-free without λ grammars, respectively.

2.2.4.8 Random context-free grammars


Let G = (N, Σ, R, S) be a random context-free grammar, where R
contains a finite number of production rules of the form (α → β, P, F),
with P, F ⊆ N and N, Σ, S, and α → β as in the Chomsky grammars.
In the rule (α → β, P, F), the sets P and F are called the permitting
and forbidding context, respectively. For two strings x, y ∈ VG*, we
write x ⇒ y if x = x′αx″, y = x′βx″ for some x′, x″ ∈ VG*, and
(α → β, P, F) is a rule in R such that all nonterminals in P appear in
x′x″ and none of the nonterminals in F appears in x′x″.

2.2.4.9 Valence grammars


A valence grammar G over ℤᵏ, where ℤᵏ denotes k copies of the
integers, is a quadruple G = (N, Σ, R, S), where R contains rules of
the form (v → w, r); r ∈ ℤᵏ is the valence of the rule, and N, Σ, S, and
v → w are as in the Chomsky grammars. The language generated by
G is defined as: L(G) = {x | x ∈ Σ* and (S, 0) ⇒* (x, 0)}, where 0
denotes the zero vector of ℤᵏ.

2.2.4.10 Bag context grammars


A k-bag context grammar is a 5-tuple defined as G = (N, Σ, R, S, b₀),
where N, Σ, and S are as in the Chomsky grammars, b₀ ∈ ℤᵏ (with ℤᵏ
denoting k copies of the integers) is the initial bag, and R contains
rules of the form v → w (λ, μ; α), where v → w is as in the Chomsky
grammars, λ, μ ∈ ℤ∞ᵏ (ℤ∞ = ℤ ∪ {+∞, −∞}) are the lower and upper
limits, respectively, and α ∈ ℤᵏ is the bag adjustment. The language
generated by G is defined as:
L(G) = {x | x ∈ Σ*, (S, b₀) ⇒* (x, b), b ∈ ℤᵏ}.

2.3 Part 2. Introduction to learning algorithms


This section provides insight into the various learning algorithms that
have been proposed for GI in a detailed and comprehensive manner.
The strengths and weaknesses of each approach are discussed,
which helps identify the gaps and therefore leads to new research
directions in GI. We have selected 14 of the most popular GI
algorithms, starting with the first learning model, identification in the
limit, proposed by Gold (1967). The learning algorithms are classified
based on various factors: the technique used, the learning type, the
corpus type, the method adopted, and the key elements used.

2.3.1 Identification in the limit


The first learning model, proposed by Gold (1967), addresses the
question "Is the information sufficient to determine which of the
possible languages is the unknown language?" Gold's approach
considers a sequence of strings presented one at a time, collectively
called a presentation. Two types of presentation are considered:
positive and complete. In a positive presentation, every string in the
sequence belongs to the target language; a complete presentation
also contains strings outside the language, each labeled accordingly.
As the presentation proceeds, the inference algorithm hypothesizes a
grammar that satisfies all of the strings seen so far; in other words, it
identifies a grammar that generates all positive and no negative
examples. Gold's learning model thus centers on the terms positive,
negative, and complete: positive information indicates strings that
are present in the target language, negative information indicates
strings that are not, and complete information combines both. Gold
showed that an inference algorithm needs a finite number of steps to
identify an unknown language in the limit from complete information.
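A toy sketch of the idea (not Gold's construction; the hypothesis class and examples below are invented for illustration): the learner guesses the first hypothesis in a fixed enumeration that is consistent with every labeled string seen so far, and changes its guess only when new evidence invalidates the current one:

# Each hypothesis is a named membership predicate over strings of a's and b's.
hypotheses = [
    ("only a's",          lambda s: set(s) <= {"a"}),
    ("ends in b",         lambda s: s.endswith("b")),
    ("equal a's and b's", lambda s: s.count("a") == s.count("b")),
]

def learn(presentation):
    seen = []
    for example in presentation:
        seen.append(example)
        # Guess the first hypothesis consistent with the whole presentation so far.
        guess = next((name for name, h in hypotheses
                      if all(h(s) == label for s, label in seen)), None)
        yield guess

data = [("ab", True), ("aabb", True), ("a", False), ("b", False)]
print(list(learn(data)))
# ['ends in b', 'ends in b', 'ends in b', "equal a's and b's"]

After the fourth example contradicts the earlier guess, the learner switches hypotheses and, if no later string contradicts it, never changes its mind again; this stabilization is what identification in the limit requires.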

2.3.1.1 Strengths and weaknesses


Gold (1967) laid the foundation of the GI; the focus was on identi-
fying the grammar for the target language. Since his seminal work,

many researchers have started working in this area. Also, it was


shown that an inference algorithm can identify an unknown lan-
guage in the limit from complete information in a finite number
of steps. This learning model gained popularity over the years.
Nearly all GI algorithms, such as inference of general-purpose
programming languages, inference of domain-specific languages,
graph grammars, and visual languages, use identification in the limit
(Stevenson and Cordy, 2013, 2014).
The problem with Gold's learning algorithm is that the inference
algorithm never has sufficient information to know when a correct
grammar has been identified, because there is always the possibility
that the next sample might invalidate the previous hypothesis.
Angluin (1980) proposed "tell tales" (finite sets of strings that
distinguish a language from the other candidate languages) to avoid
this drawback.
Gold laid the foundation of GI, and Bunke and Sanfeliu (1990a,b)
then presented the first usable GI algorithm in the syntactic pattern
recognition community, with the aim of classifying and analyzing
patterns, classifying biological sequences, performing character
recognition, and so on. The main drawback of this algorithm (Bunke
and Sanfeliu, 1990a,b) was that it deals only with positive data. Its
output does not fit exactly into a finite state machine, and therefore
the benefits of formal language theory are largely lost.

2.3.2 Teacher and queries learning algorithm


"Teacher and queries" is another learning model. It uses a teacher,
also referred to as an oracle, who knows the target language and is
capable of answering particular types of questions, or queries, posed
by the inference algorithm. The teacher-and-queries learning model
is similar to the game of 20 Questions: the teacher knows the target
language and responds to the learner's queries. Six different types of
queries were described by Angluin and Smith (1983), two of which,
membership and equivalence queries, have a significant impact on
learning. In a membership query, the inference algorithm asks
whether a given string belongs to the target language, and the oracle
answers yes or no. In an equivalence query, the inference algorithm
proposes a hypothesis, and the oracle answers yes if the hypothesis
matches the target language and no otherwise. A teacher who
answers both membership and equivalence queries is known as a
minimally adequate teacher (Angluin and Smith, 1983; Angluin,
1988); with such a teacher, a learner can identify deterministic finite
automata (DFA) in polynomial time without using additional
examples.
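The sketch below (a hypothetical interface, not Angluin's L* algorithm) shows the teacher's side of this model: membership queries are answered yes or no, and an equivalence query either confirms the hypothesis or returns a counterexample found among short strings:

from itertools import product

class Teacher:
    def __init__(self, target, alphabet=("a", "b"), max_len=6):
        self.target = target        # membership predicate for the target language
        self.alphabet = alphabet
        self.max_len = max_len      # search bound used when looking for counterexamples

    def membership(self, string):
        return self.target(string)  # yes/no answer

    def equivalence(self, hypothesis):
        # Return None if the hypothesis agrees with the target on all short strings;
        # otherwise return the first counterexample found.
        for n in range(self.max_len + 1):
            for w in map("".join, product(self.alphabet, repeat=n)):
                if hypothesis(w) != self.target(w):
                    return w
        return None

teacher = Teacher(target=lambda s: s.endswith("ab"))
print(teacher.membership("aab"))                   # True
print(teacher.equivalence(lambda s: "ab" in s))    # 'aba', a counterexample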

2.3.2.1 Strengths and weaknesses


The teacher and query learning model was considered to be an
alternative way to measure the learnability of a class of languages.

The beauty of this learning model is that it can be used on its own or
in conjunction with the presentation of a corpus (positive, negative,
or complete) to extend the abilities of the learner. Stevenson and
Cordy (2013, 2014) showed that the teacher-and-queries learning
model addresses many difficulties faced in identification in the limit
(Gold, 1967).
The key issue with this approach is the implementation of an
oracle. The implementation of an oracle is difficult because it re-
quires a vast amount of information and therefore is less commonly
used in software engineering applications, whereas Gold’s approach
(Gold, 1967) is popular on this front.

2.3.3 Probably Approximately Correct learning


algorithm
Valiant (1984) presented a Probably Approximately Correct (PAC)
learning model, which takes advantage of both the identification in
the limit (Gold, 1967) and teacher-and-queries (Angluin and Smith,
1983) learning models. The PAC learning model differs from the
other two models discussed previously for two main reasons:
(a) It does not guarantee exact identification with certainty.
(b) It compromises between accuracy and certainty.
The PAC learning model supports two user-defined parameters,
accuracy (ε) and confidence (δ), which measure the correctness of
the learning process. As discussed, PAC learning enjoys a feature of
Gold's learning model: an incremental approach toward the target,
with higher accuracy over time. A metric (a sum of probabilities) is
used to measure the distance between two concepts. The criterion is
that the inference algorithm must, with confidence (probability at
least 1 − δ), output a concept whose error is within the accuracy
bound ε, whereas Gold's learning model does not compromise in
terms of accuracy: it requires exact identification.
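For intuition, a standard result (not stated in this book) says that for a finite hypothesis class H, any learner that returns a hypothesis consistent with m ≥ (1/ε)(ln|H| + ln(1/δ)) examples is probably approximately correct; the sketch below simply evaluates that bound:

import math

def pac_sample_bound(hypothesis_count, epsilon, delta):
    # Examples sufficient so that, with probability at least 1 - delta,
    # a consistent hypothesis has true error below epsilon.
    return math.ceil((math.log(hypothesis_count) + math.log(1.0 / delta)) / epsilon)

# e.g. one million candidate grammars, 5% error, 99% confidence
print(pac_sample_bound(10**6, epsilon=0.05, delta=0.01))   # 369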

2.3.3.1 Strengths and weaknesses


The PAC learning model produces results in polynomial time, which
is useful for approximating bounded conjunctive normal form
(k-CNF) and monotone disjunctive normal form expressions from
only positive examples.
The problem with the PAC model is its requirement that the
inference algorithm learn in polynomial time under all distributions,
which is widely believed to be too strict in reality, because many
apparently simple classes are not polynomially learnable under every
distribution. Another issue with the PAC model is that it leads to
negative results, with the associated equivalence problems being
NP-hard. To mitigate these issues, Li and Vitányi (1991) presented a
modified PAC learning model in which simplicity is measured by
Kolmogorov complexity. Despite this modification of the original PAC
learning model, it has attracted a smaller number of researchers,
whereas Gold's learning model and the query-based learning model
remain far more prevalent.

2.3.4 Neural network in learning algorithm


Apart from these learning models, much research exists to explain
the suitability of the neural network (NN) for GI. The NN has the
ability to maintain a temporal internal state, like a short-term
memory. In the case of an NN, a set of inputs and their corresponding
outputs is given (yes: the string is in the target language; no:
otherwise), and the network must learn a function that describes
these input–output pairs. The NN favored by the GI community is the
recurrent neural network (RNN), because it maintains a temporal
internal state.
Graves et al. (2009) conducted extensive experiments on handwriting
recognition using the NN and explained the capabilities of the NN in
predicting subsequent elements of an input sequence. Cleeremans
et al. (1989) implemented a special case of the RNN, later described
by Elman (1990), to approximate a DFA. Delgado and Pegalajar
(2005) proposed a multiobjective genetic algorithm (GA) to
determine the optimal size of an RNN that learns from positive and
negative examples, and used the merits of the self-organizing map to
extract the automaton after completion of the training process.
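A minimal, untrained sketch (illustrative only; the weights are random and all names are hypothetical) of an Elman-style RNN of the kind used for GI: the hidden state acts as the temporal internal state that summarizes the prefix read so far, and a final sigmoid unit scores whether the string should be accepted:

import numpy as np

rng = np.random.default_rng(0)
H = 4                                         # number of hidden nodes
symbols = {"a": 0, "b": 1}

W_in = rng.normal(size=(H, len(symbols)))     # input-to-hidden weights
W_rec = rng.normal(size=(H, H))               # hidden-to-hidden (recurrent) weights
w_out = rng.normal(size=H)                    # hidden-to-output weights

def score(string):
    h = np.zeros(H)                           # short-term memory (temporal internal state)
    for ch in string:
        x = np.eye(len(symbols))[symbols[ch]]   # one-hot encoding of the symbol
        h = np.tanh(W_in @ x + W_rec @ h)        # Elman-style state update
    return 1.0 / (1.0 + np.exp(-w_out @ h))      # acceptance probability

print(round(score("aabb"), 3))   # a value in (0, 1); meaningless until the weights are trained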

2.3.4.1 Strengths and weaknesses


The NN is good at simulating an unknown function. An impor-
tant element of the NN is the number of hidden nodes that
have a significant role in determining the encoding mechanism
of a DFA in the activation weights of a trained network. The
following convention is applied: “If the number of hidden nodes
is constrained, then the resulting encoding will produce the small-
est DFA for the target language, whereas if more hidden nodes are
added, then the DFA will be distributed across the additional
nodes and participate in representing the equivalent grammar.”
This feature reduces the complexity of the GI and significantly im-
proves the overall performance.
As discussed, the NN is widely used in GI because it is effective at
simulating an unknown function, but there is no direct way to
reconstruct that function from the connection weights of the
trained network.

2.3.5 Automatic DIstillation of structure algorithm


Solan et al. (2005) proposed the Automatic DIstillation of Structure
(ADIOS) algorithm as a statistical method for GI. ADIOS produces
results in the form of a CFG. It accepts a corpus of strings, such as
text, nucleotide base pairs, or transcribed speech, as positive
examples in an unsupervised manner. Hence, it is also referred to as
a text-based, unsupervised grammar-based method.
The overall work of ADIOS is divided into three phases: initialization,
pattern distillation, and generalization. ADIOS uses the features of a
pseudograph, in which both loops and multiple edges are allowed.
Lexicon entries are considered to be vertices, augmented by two
special symbols: "begin" and "end."
The work of the various phases of ADIOS is presented in
Fig. 2.2. The process begins with loading the sentence from a
corpus to find a suitable path for the sentence. Then, the system
searches for significant patterns (i.e., a sequence of nodes). The
pattern extraction process was influenced by Van Zaanen’s
alignment-based learning (Van Zaanen, 2000). From the pattern
obtained from phase 2, equivalence classes can be deduced,
which leads to the construction of the CFG.

2.3.5.1 Strengths and weaknesses


The performance of ADIOS was evaluated against an artificial corpus
of 6400 tokens, on which 95% precision and 86% recall were
obtained, indicating that it is most effective on smaller corpora.
The authors (Solan et al., 2005) accepted that ADIOS is not well
suited to large corpora; significant work needs to be done to improve
it and to construct a reliable grammar.

Figure 2.2 Automatic DIstillation of Structure (ADIOS) phases and their responsibilities. Phase 1 (initialization): loading the corpus onto a directed pseudograph. Phase 2 (pattern distillation): extraction of significant patterns (sequences of nodes). Phase 3 (generalization): finding the most significant patterns and generalizing them to create equivalence classes.

When ADIOS was tested on 1.3-million-word samples from the
CHILDES corpus, the performance was significantly lower, although it
achieved 60% accuracy. Horn et al. (2004) implemented ADIOS on a
parallel corpus of the Bible; their observation was that ADIOS is well
suited to intuitive generalization.

2.3.6 EMILE
Adriaans (1992, 1999) proposed EMILE, which has been updated
successfully over the years. Adriaans and Vervoort (2002) presented
the latest version, EMILE 4.1, which is based on the teacher–pupil
metaphor: the teacher generates grammatically correct sentences,
whereas the pupil can ask valid queries. This places the EMILE
algorithm in the class of text-based, supervised learning.
Basic steps involved in the EMILE algorithm are depicted in Fig. 2.3.
EMILE is a categorical grammar (CG) inference algorithm in which
the input sentences are converted into a CG over basic categories.
After applying first-order explosion, each sentence is examined to
identify possible ways of breaking it into subexpressions. The
outcome of this phase is passed to an oracle for verification.
The primary objective of the verification phase is to identify valid
subexpressions of the same type. The next step is clustering the
subexpressions of the same type. The clustering phase produces
basic and complex rules that are usually difficult to understand;
therefore, the next phase is rule induction, which identifies a simple,
generalized representation of the training data. These rules are
passed to the rule rewriting phase, which then produces the
final CFG.

2.3.6.1 Strengths and weaknesses


EMILE has become popular because it produces accurate results
by collecting evidence before constructing the final grammar rules.
Results presented by Adriaans (1992, 1999) showed that the system
can acquire a language of 50,000 words, which requires a sample of
5 million sentences. If one assumes an average of 15 words per
sentence, a 75-million-word corpus is needed. The corpus size used by
Adriaans (1992) to conduct the experiments was too large
(Roberts, 2008). Therefore, EMILE was considered to be a slow
learning approach. Van Zaanen and Adriaans (2001) compared
alignment-based learning (ABL) with EMILE and concluded that EMILE
learns more slowly than ABL. An experiment conducted on the Air
Travel Information System (ATIS) corpus, which produced a precision
of 51.6%, further characterizes the performance of EMILE (Van Zaanen
and Adriaans, 2001).

Figure 2.3 Steps involved in the EMILE algorithm: first-order explosion, verification, clustering, rule induction, and rule rewriting.

2.3.7 e-Grammar Induction Driven by Simplicity


Petasis et al. (2004) proposed a GI algorithm known as e-GRIDS
(e-Grammar Induction Driven by Simplicity). It is based on the
GRIDS algorithm presented by Langley and Stromsten (2000). The
e-GRIDS algorithm uses a simplicity bias to induce a CFG from
only positive examples. It does not rely on an oracle to validate
sentences; therefore, it is classified as a text-based and
unsupervised learning algorithm.
An architectural view of the e-GRIDS learning model is shown
in Fig. 2.4: the model takes training examples as input and
constructs an initial grammar from each sentence. A beam search is
used to organize the learning process. To explore the search space
adequately, the e-GRIDS algorithm uses three learning operators:
MergeNT, CreateNT, and CreateOptionalNT. The purpose of these
operators is shown in Table 2.1.

Figure 2.4 e-Grammar Induction Driven by Simplicity (e-GRIDS) algorithm architecture: training examples yield an initial grammar, which seeds a beam of grammars; the learning operators (MergeNT, CreateNT, and CreateOptionalNT) generate successor grammars, and the loop repeats while any inferred grammar is better than those in the beam, after which the final grammar is returned. Redrawn from Petasis et al., 2004. e-GRIDS: computationally efficient grammatical inference from positive examples. Grammars 7, 69-110.

Table 2.1 e-Grammar Induction Driven by Simplicity (e-GRIDS) learning operators.

S.N.  Learning operator   Purpose
1.    MergeNT             Merges two nonterminal symbols into a single nonterminal symbol.
2.    CreateNT            Creates a new nonterminal symbol from two existing nonterminal symbols.
3.    CreateOptionalNT    Duplicates the rule created by the CreateNT operator and then appends a nonterminal symbol, making that symbol optional.

The beam search is used to explore the search space; the learning
process therefore occurs in three steps, one based on each of the
learning operators discussed in Table 2.1. Learning terminates when
no successor grammar scores better than the grammars already present
in the beam.
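A schematic Python sketch of such a beam search is given below. It is not the e-GRIDS implementation of Petasis et al. (2004): only a MergeNT-style operator is modeled, the description-length score is a crude symbol count rather than the full MDL formulation, and the beam width and toy initial grammar are illustrative assumptions.

# Schematic e-GRIDS-style beam search (hypothetical simplification).
from itertools import combinations

def description_length(grammar):
    """Crude MDL proxy: total number of symbols needed to write the grammar."""
    return sum(1 + len(rhs) for _, rhs in grammar)

def merge_nt(grammar, a, b):
    """MergeNT-style operator: rewrite every occurrence of nonterminal b as a."""
    rep = lambda s: a if s == b else s
    merged = {(rep(lhs), tuple(rep(s) for s in rhs)) for lhs, rhs in grammar}
    return sorted(merged)

def beam_search(initial_grammar, beam_width=3):
    beam = [initial_grammar]
    while True:
        candidates = []
        for grammar in beam:
            nonterminals = sorted({lhs for lhs, _ in grammar})
            for a, b in combinations(nonterminals, 2):  # all MergeNT successors
                candidates.append(merge_nt(grammar, a, b))
        candidates.sort(key=description_length)
        best = candidates[:beam_width]
        # terminate when no successor beats the grammars already in the beam
        if not best or description_length(best[0]) >= description_length(beam[0]):
            return beam[0]
        beam = best

# one flat rule per training sentence, mimicking the initial e-GRIDS grammar
initial = [("S", ("NP1", "sleeps")), ("S", ("NP2", "sleeps")),
           ("NP1", ("the", "cat")), ("NP2", ("the", "dog"))]
print(beam_search(initial))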

2.3.7.1 Strengths and weaknesses


As discussed, the e-GRIDS algorithm uses a simplicity bias to direct
the search through the space of CFGs, and it is well suited to GI
from positive samples only. The minimum description length (MDL)
principle is used as the criterion for measuring the simplicity of a
grammar, which avoids overly general grammars.
The following issues were observed with this approach:
(a) It cannot make use of negative examples.
(b) The beam search uses three learning operators (Table 2.1);
implementing these operators and storing the intermediate results
makes the approach inefficient.

2.3.8 Computational learning of natural language


Computational learning of natural language (CLL) is a text-based
and supervised learning algorithm proposed by Watkinson and
Manandhar (2001). The primary interest of this method is to learn
natural language syntax from a corpus of declarative sentences. It
does not require an oracle for validation.
Fig. 2.5 represents the general workflow of the CLL algorithm,
consisting of three main phases: parsing of the example, parse
selector, and lexicon modifier. Initially, a sentence is accepted
from the corpus, which is then parsed using the n-best probabi-
listic chart parser. The n-best probabilistic chart parser is devel-
oped from a standard stochastic CYK algorithm (Kasami, 1965).
The results of the parser phase are sent to the parse selector phase;
this determines which parse will produce the most compressive
lexicon. Compression is measured as the sum of the sizes of the
categories of the individual lexical entries, and the effect of a
new lexicon on the previous parses is evaluated by reparsing them.
At the final stage, the CLL takes the current lexicon and replaces
it with the most compressive lexicon chosen in the previous stage.
These three stages are repeated until all sentences of the corpus
have been parsed.

Figure 2.5 Block diagram showing the general workflow of the computational learning of natural language (CLL) algorithm (Watkinson and Manandhar, 2001): examples from the corpus are fed to the probabilistic parser (which uses the current categories and rules) to obtain the N most probable parses; the parse selector chooses among the parsed examples, and the lexicon modifier updates the current lexicon.
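The parse-selection step can be sketched as follows in Python. This is not Watkinson and Manandhar's CLL system: the candidate parses are represented simply as hypothetical word-to-category assignments, and the category strings and size measure are illustrative assumptions used only to show how the most compressive lexicon might be chosen.

# Hedged sketch of compressive parse selection (hypothetical representation).
def lexicon_size(lexicon):
    """Sum of the sizes (here, string lengths) of every lexical entry's categories."""
    return sum(len(cat) for cats in lexicon.values() for cat in cats)

def merge(lexicon, parse):
    """Return a copy of the lexicon extended with the assignments of one parse."""
    updated = {word: set(cats) for word, cats in lexicon.items()}
    for word, category in parse.items():
        updated.setdefault(word, set()).add(category)
    return updated

def select_parse(lexicon, candidate_parses):
    """Parse selector: choose the candidate yielding the smallest resulting lexicon."""
    return min(candidate_parses, key=lambda p: lexicon_size(merge(lexicon, p)))

current = {"john": {"NP"}, "sees": {r"(S\NP)/NP"}}
candidates = [
    {"john": "NP", "mary": "NP", "sees": r"(S\NP)/NP"},         # reuses known categories
    {"john": r"S/(S\NP)", "mary": "NP", "sees": r"(S\NP)/NP"},  # introduces a new one
]
best = select_parse(current, candidates)
current = merge(current, best)  # lexicon modifier step
print(current)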

2.3.8.1 Strengths and weaknesses


The CLL model proved effective on a complex problem. The evaluation
corpus was extracted from the Penn Treebank (Marcus et al., 1993,
1994) and divided into two categories, C1 and C2 (C1: 5000 sentences
of 15 words or fewer; C2: 1000 sentences of 15 words or fewer). Two
initial lexicons of different sizes (31 and 348 entries) were
selected for learning.
The CLL model was developed with too much innate (built-in)
knowledge, which reduces its credibility as an unsupervised learning
approach. In addition, the CLL algorithm uses the n-best
probabilistic chart parser, which yields a number of candidate
parses among which the one producing the most compressive lexicon
must be chosen; because this choice is made by reparsing the
examples, the approach is very costly and still does not guarantee
the most compressive lexicon.