Programmer's Guide to Apache Thrift
About this ebook
Programmer's Guide to Apache Thrift provides comprehensive coverage of the Apache Thrift framework along with a developer's-eye view of modern distributed application architecture.
Foreword by Jens Geyer.
About the Technology
Thrift-based distributed software systems are built out of communicating components that use different languages, protocols, and message types. Sitting between them is Thrift, which handles data serialization, transport, and service implementation. Thrift supports many client and server environments and a host of languages ranging from PHP to JavaScript, and from C++ to Go.
About the Book
Programmer's Guide to Apache Thrift provides comprehensive coverage of distributed application communication using the Thrift framework. Packed with code examples and useful insight, this book presents best practices for multi-language distributed development. You'll take a guided tour through transports, protocols, IDL, and servers as you explore programs in C++, Java, and Python. You'll also learn how to work with platforms ranging from browser-based clients to enterprise servers.
What's inside
- Complete coverage of Thrift's IDL
- Building and serializing complex user-defined types
- Plug-in protocols, transports, and data compression
- Creating cross-language services with RPC and messaging systems
About the Reader
Readers should be comfortable with a language like Python, Java, or C++ and the basics of service-oriented or microservice architectures.
About the Author
Randy Abernethy is an Apache Thrift Project Management Committee member and a partner at RX-M.
Table of Contents
- Introduction to Apache Thrift
- Apache Thrift architecture
- Building, testing, and debugging
- Moving bytes with transports
- Serializing data with protocols
- Apache Thrift IDL
- User-defined types
- Implementing services
- Handling exceptions
- Servers
- Building clients and servers with C++
- Building clients and servers with Java
- Building C# clients and servers with .NET Core and Windows
- Building Node.js clients and servers
- Apache Thrift and JavaScript
- Scripting Apache Thrift
- Thrift in the enterprise
William Abernethy
Programmer's Guide to Apache Thrift - William Abernethy
Copyright
For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact
Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964
Email: orders@manning.com
©2019 by Manning Publications Co. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.
Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.
Development editors: Cynthia Kane, Jennifer Stout
Technical development editor: Pim van Oerle
Review editor: Ozren Harlović
Project editor: Lori Weidert
Copyeditor: Katie Petito
Proofreader: Alyson Brener
Technical proofreader: Akon Dey
Typesetter: Gordan Salinovic
Illustrator: Chuck Larson
Cover designer: Marija Tudor
ISBN 9781617296161
Printed in the United States of America
1 2 3 4 5 6 7 8 9 10 – SP – 24 23 22 21 20 19
Dedication
Dedicated to my mom, Kay. You are an inspiration to me in everything I do.
Brief Table of Contents
Copyright
Brief Table of Contents
Table of Contents
Foreword
Preface
Acknowledgments
About this book
About the author
About the cover illustration
1. Apache Thrift overview
Chapter 1. Introduction to Apache Thrift
Chapter 2. Apache Thrift architecture
Chapter 3. Building, testing, and debugging
2. Programming Apache Thrift
Chapter 4. Moving bytes with transports
Chapter 5. Serializing data with protocols
Chapter 6. Apache Thrift IDL
Chapter 7. User-defined types
Chapter 8. Implementing services
Chapter 9. Handling exceptions
Chapter 10. Servers
3. Apache Thrift languages
Chapter 11. Building clients and servers with C++
Chapter 12. Building clients and servers with Java
Chapter 13. Building C# clients and servers with .NET Core and Windows
Chapter 14. Building Node.js clients and servers
Chapter 15. Apache Thrift and JavaScript
Chapter 16. Scripting Apache Thrift
Chapter 17. Thrift in the enterprise
Index
List of Figures
List of Tables
List of Listings
Table of Contents
Copyright
Brief Table of Contents
Table of Contents
Foreword
Preface
Acknowledgments
About this book
About the author
About the cover illustration
1. Apache Thrift overview
Chapter 1. Introduction to Apache Thrift
1.1. Polyglotism, the pleasure and the pain
1.2. Application integration with Apache Thrift
1.2.1. Type serialization
1.2.2. Service implementation
1.3. Building a simple service
1.3.1. The Hello IDL
1.3.2. The Hello server
1.3.3. A Python client
1.3.4. A C++ client
1.3.5. A Java client
1.4. The communications toolkit landscape
1.4.1. SOAP
1.4.2. REST
1.4.3. Protocol Buffers
1.4.4. Apache Avro
1.4.5. Strengths of Apache Thrift
1.4.6. Take away
Summary
Chapter 2. Apache Thrift architecture
2.1. Transports
2.1.1. The transport interface
2.1.2. Endpoint transports
2.1.3. Layered transports
2.1.4. Server transports
2.2. Protocols
2.3. Apache Thrift IDL
2.3.1. User-defined types and serialization
2.3.2. RPC services
2.4. Servers
2.5. Security
Summary
Chapter 3. Building, testing, and debugging
3.1. Installing the Apache Thrift IDL compiler
3.1.1. Platform installers
3.1.2. VMs and containers
3.1.3. Building from source
3.2. The Apache Thrift source tree
3.3. Apache Thrift tests
3.4. Debugging RPC services
3.4.1. Examining packets on the wire
3.4.2. Unbuffered interfaces
3.4.3. Interface misalignment
3.4.4. I/O stack misalignment
3.4.5. Instrumenting code
3.4.6. Additional techniques
Summary
2. Programming Apache Thrift
Chapter 4. Moving bytes with transports
4.1. Endpoint transports, part 1: Memory & disk
4.1.1. Programming with memory transports
4.1.2. Programming with file transports
4.2. The transport interface
4.2.1. Basic transport operations
4.3. Endpoint transports, part 2: Networks
4.3.1. Network programming with TSocket
4.4. Server transports
4.4.1. Programming network servers with server transports
4.4.2. The server transport interface
4.5. Layered transports
4.5.1. Message framing
Chapter 5. Serializing data with protocols
5.1. Basic serialization with the binary protocol
5.1.1. Using the C++ TBinaryProtocol
5.1.2. Using the Java TBinaryProtocol
5.1.3. Using the Python TBinaryProtocol
5.1.4. Takeaway
5.2. The TProtocol interface
5.2.1. Apache Thrift serialization
5.2.2. C++ TProtocol
5.2.3. Java TProtocol
5.2.4. Python TProtocolBase
5.3. Serializing objects
5.3.1. Struct serialization
5.3.2. Struct de-serialization
5.3.3. Struct evolution
5.4. TCompactProtocol
5.5. TJSONProtocol
5.6. Selecting a protocol
Summary
Chapter 6. Apache Thrift IDL
6.1. Interfaces
6.2. Apache Thrift IDL
6.2.1. IDL file names
6.2.2. Element names
6.2.3. Keywords
6.3. The IDL compiler
6.3.1. Compilation phases and error messages
6.3.2. Command line switches
6.4. Comments and documentation
6.5. Namespaces
6.6. Built-in types
6.6.1. Base types
6.6.2. Container types
6.6.3. Literals
6.7. Constants
6.7.1. C++ interface constant implementation
6.7.2. Java interface constant implementation
6.7.3. Python interface constant implementation
6.8. Typedefs
6.9. Enum
6.10. Structures, unions, exceptions, and argument-lists
6.10.1. Structs
6.10.2. Fields
6.10.3. Exceptions
6.10.4. Unions
6.11. Services
Functions
6.12. Including external files
6.13. Annotations
Summary
Chapter 7. User-defined types
7.1. A simple user-defined type example
7.2. Type design
7.2.1. Namespaces
7.2.2. Constants
7.2.3. Structs
7.2.4. Base types
7.2.5. Typedefs
7.2.6. Field IDs and retiring fields
7.2.7. Enums
7.2.8. Collections
7.2.9. Unions
7.2.10. Requiredness and optional fields
7.3. Serializing objects to disk
7.4. Under the type serialization hood
7.4.1. Serializing with write()
7.4.2. De-serializing with read()
7.5. Type evolution
7.5.1. Renaming fields
7.5.2. Adding fields
7.5.3. Deleting fields
7.5.4. Changing a field’s type
7.5.5. Changing a field’s requiredness
7.5.6. Changing a field’s default value
7.6. Using Zlib compression
7.6.1. Using Zlib with C++
7.6.2. Using Zlib with Python
Summary
Chapter 8. Implementing services
8.1. Declaring IDL services
8.1.1. Parameter identifiers
8.1.2. Parameter requiredness
8.1.3. Default parameter values
8.1.4. Function and parameter types
8.2. Building a simple service
8.2.1. Interfaces
8.2.2. Coding service handlers and test harnesses
8.2.3. Coding RPC servers
8.2.4. Coding RPC clients
8.3. Service interface evolution
8.3.1. Adding features to a service
8.4. RPC services in depth
8.4.1. Under the hood
8.4.2. One-way functions
8.4.3. Service inheritance
8.4.4. Asynchronous clients
Summary
Chapter 9. Handling exceptions
9.1. Apache Thrift exceptions
9.2. TTransportException
9.2.1. C++ exception processing
9.2.2. Java exception processing
9.2.3. Python exception processing
9.2.4. Error processing without exceptions
9.3. TProtocolException
9.4. TApplicationException
9.5. User-defined exceptions
9.5.1. User-defined exception IDL example
9.5.2. C++ user-defined exception client
9.5.3. C++ user-defined exception server
9.5.4. Java user-defined exception client
9.5.5. Python user-defined exception client
Summary
Chapter 10. Servers
10.1. Building a simple server from scratch
10.2. Using multithreaded servers
10.3. Server concurrency models
10.3.1. Connection-based processing
10.3.2. Task-based processing
10.3.3. Multithreading vs. multiprocessing
10.3.4. Server summary by language
10.4. Using factories
10.4.1. Building I/O stacks with factories
10.4.2. Processor and handler factories
10.4.3. In/out factories
10.4.4. Building servers with custom factories and transports
10.5. Server interfaces and event processing
10.5.1. TServer
10.5.2. TServerEventHandler
10.5.3. Building a C++ thread pool server with server events
10.6. Servers and services
10.6.1. Building multiservice servers
10.6.2. Building a multiplexed Java threaded selector server
Summary
3. Apache Thrift languages
Chapter 11. Building clients and servers with C++
11.1. Setting up Apache Thrift for C++ development
11.1.1. Apache Thrift C++ versions and Boost
11.1.2. Building Apache Thrift C++ libraries
11.1.3. Building Apache Thrift C++ libraries on Windows
11.2. A simple client and server
11.2.1. The Hello IDL
11.2.2. Building a simple C++ client
11.2.3. Creating a simple RPC server
11.3. C++ transports, protocols, and servers
11.3.1. C++ transports
11.3.2. C++ protocols
11.3.3. Runtime versus compile time polymorphism
11.3.4. C++ servers
11.4. The C++ TNonBlockingServer
Summary
Chapter 12. Building clients and servers with Java
12.1. Setting up Apache Thrift for Java development
12.1.1. Apache Thrift and SLF4J
12.2. A simple client and server
12.2.1. The Hello IDL
12.2.2. Building a simple Java client
12.2.3. Creating a simple RPC server
12.2.4. Building with Ant
12.2.5. Building with Maven
12.3. Using Apache Thrift in other JVM languages
12.4. Java transports, protocols, and servers
12.4.1. Java transports
12.4.2. Java protocols
12.4.3. Java servers
12.5. Asynchronous Java RPC
Summary
Chapter 13. Building C# clients and servers with .NET Core and Windows
13.1. Setting up Apache Thrift on Windows
13.2. A simple client and server
13.2.1. Creating a Visual Studio RPC solution
13.2.2. Creating the interface library
13.2.3. Creating the RPC server
13.2.4. Creating the RPC client
13.2.5. Testing the RPC application
13.3. C# transports, protocols, and servers
13.3.1. C# transports
13.3.2. C# protocols
13.3.3. C# servers
13.4. Long polling with named pipes
13.4.1. A long polling interface
13.4.2. Installing Apache Thrift support through NuGet
13.4.3. Creating a named pipe server
13.4.4. Building the long polling server
13.4.5. Building a named pipe client
Summary
Chapter 14. Building Node.js clients and servers
14.1. A simple client and server
14.1.1. Generating client/server stubs
14.1.2. Creating a Node.js server
14.1.3. Creating a Node.js client
14.2. Q
14.3. Node.js servers
14.4. Multiplexed services
14.5. Apache Thrift IDL and Node.js
14.5.1. Creating full-featured IDL handlers
14.5.2. Creating a full-featured Node.js client
Summary
Chapter 15. Apache Thrift and JavaScript
15.1. Apache Thrift JavaScript quick start
15.2. A simple client and server
15.2.1. Installing Apache Thrift for JavaScript
15.2.2. The Hello World IDL
15.2.3. The Hello World Node.js server
15.2.4. The Hello World web client
15.2.5. Running the Hello World example
15.2.6. Node.js HTTP clients
15.3. Asynchronous browser client calls
15.4. RPC error handling
15.5. Browser RPC and jQuery
15.6. Apache Thrift and web security
15.6.1. Cross Origin Resource Sharing (CORS)
15.6.2. Content Security Policy (CSP)
15.6.3. X-Frame-Options
15.6.4. Transport security
15.7. Using the WebSocket transport
Summary
Chapter 16. Scripting Apache Thrift
16.1. Apache Thrift and Ruby
16.1.1. A Ruby server
16.1.2. A Ruby client
16.1.3. Ruby features
16.2. Apache Thrift and PHP
16.2.1. A PHP program
16.2.2. A PHP Apache Thrift client
16.2.3. PHP features
16.3. Apache Thrift and Perl
16.4. Apache Thrift Perl clients
16.5. Apache Thrift Perl servers
16.5.1. Apache Thrift Perl features
16.6. Apache Thrift and Python
Summary
Chapter 17. Thrift in the enterprise
17.1. Polyglot systems
17.2. Service tooling and considerations
17.2.1. Services
17.2.2. Interface comparisons
17.3. Messaging
17.4. Best practices
17.4.1. IDL
17.4.2. Interface evolution
17.4.3. Service design
17.4.4. Type design
17.4.5. Coding practices
Summary
Index
List of Figures
List of Tables
List of Listings
Foreword
I first met Randy on the Apache Thrift mailing lists, where we both grew from contributing enthusiasts to committers and finally to PMC members of the Apache Thrift project. Later on I met him a few times in person, and we formed a bond—the kind many programmers are familiar with—while working on a piece of open source software across two continents.
Isn’t it funny how that works? At the same time there are heavy conflicts in certain areas of the world, countless open source projects are bringing people together, to communicate freely and build bridges—across oceans, across continents, and across cultures. And if there is any Apache project that best fits this picture of communication and connections, it’s probably Apache Thrift.
When I became aware of Apache Thrift for the first time, I quickly realized its potential. This RPC and serialization framework is a powerful and enabling technology. It’s easy to use and extremely flexible, and it supports a wide range of target languages and dialects—more than 20 at the time of this writing. Besides establishing connections across languages, Thrift also supports the application developer by crossing platform boundaries.
The consequences of this new freedom for developers are overwhelming. For the first time, we’re in a position where we can literally choose the right tool for the job, on the platform we find most suitable, without having to think too much about how we can integrate it all. This fact alone lets Thrift fit very well in today’s microservice, cloud-native world.
There’s a good chance that you bought this book to find out how you can unleash the nearly unparalleled capabilities of the Apache Thrift framework for your projects. You want to know about the possibilities, use cases, and applications, or how the serialization part could help you with your message-queue–based system. You want to see examples and code and have them explained.
This book gives you all the answers. Randy did a great job creating it, preparing and fine-tuning countless examples to keep pace with the latest developments of the Apache Thrift project. What you hold in your hand is the single most comprehensive publication about Apache Thrift available today.
JENS GEYER
SENIOR SOFTWARE ENGINEER, VSX VOGEL SOFTWARE GMBH
Preface
I’ve been in technology, often in coding roles, for about 30 years. During the dot-com era, I created an institutional equities trading platform that turned into a broker-dealer transacting somewhere around a billion US dollars a day. Needless to say, making sure the technology ran smoothly was a constant concern.
At that company we created technology bits in the line of trading with C++. Building the web-based frontend bits required some JavaScript. When we turned our hands to creating the internal monitoring and support systems, Active Server Pages, and, later, C# were the easiest tools to use. As much as possible, we wanted the language-based systems to interact, rather than have to reinvent bits from one language to the other ourselves.
The platform was based on Windows NT (later Windows 2000), and the RPC elements of the platform were COM+ and described in MS IDL, Microsoft’s interface definition language. While I had used IDL on Unix systems in the past, this was the first big thing I had done in IDL. As the project developed, I became more and more enamored with the engineering processes the IDL abstraction enforced on our organization.
Everything central to the system was represented in IDL, including messages used to place orders and report executions. Interfaces that described the ways in which you could interact with the market data system or the order entry system were concisely defined in a beautifully abstract way. When we hired new engineers, the first thing we asked them to do was dig into the IDL. It was the best way to understand this vast platform without ever clouding or fixing our ideas with implementation code.
Our architecture meetings also focused on the IDL, because the interfaces and structure of the overall platform were critical but the implementation really wasn’t. If you got the implementation wrong, you could rewrite it without impacting anyone else. If you got the interface wrong, the problem would propagate and often become debilitating.
There were challenges as well. My wish list included, as time rolled on, the ability to interoperate with Linux systems. Given that these were the “Linux is a cancer” days at Microsoft, that wasn’t happening. I also wanted to be able to evolve our IDL without having to rebuild the world each time. A critical flaw in many distributed system technologies is that they don’t allow one element to be updated without also updating all of those interacting with it.
Fast-forward to 2009: I was preparing to architect and develop another trading platform, and I reflected on my IDL wish list. Was it possible that somewhere out there in the cybersphere someone had open-sourced my dream technology for distributed computing? It wasn’t long before I discovered Apache Thrift. I was stunned. Here was a system that worked with every commercially viable programming language and platform, included a compact but elegant IDL, and, most importantly, supported a critical set of features enabling interface evolution. I’ve been an Apache Thrift fan ever since.
In today’s world of microservices and cloud-native systems, where new services are deployed multiple times a day, not having interfaces that support evolution and backward compatibility is a nonstarter. Apache Thrift delivers elegance, evolution, and the performance necessary to support the real-time needs of multiple microservices collaborating where a single monolith once prevailed.
The only thing missing was a book.
Acknowledgments
While documenting a comprehensive serialization and RPC framework that operates across more than 20 programming languages was no small task, imagine what it took to create such a thing! My most profound thanks must first and foremost go to the Apache Thrift developers.
I must also thank my family for putting up with me writing chapters and committing patches in the middle of family gatherings and holidays over the course of several years. Thanksgiving and Christmas holidays turned into chapter-production activities, and no one yelled at me for staring at my laptop for hours while the family played Risk, Settlers of Catan, or what have you.
I owe a special thanks to the folks at Manning. I have to be the biggest laggard they have ever dealt with. No matter how late I was, they were as professional and supportive as a firm could be. In particular I’d like to thank Jenny Stout, who is not only a wonderful person but a great editor; Akon Dey, for his fantastic technical insights; and Kevin Sullivan, for driving the book to completion and helping me with all the final issues necessary to button up the book.
I’d also like to give a huge thank you to the reviewers who took the time to read the chapters and provide invaluable feedback, including Barry Alexander, Carlos Saltos, Chris Snow, Daniel Bryant, Ezra Simeloff, Georges Clerc, Jerry Goodnough, Palak Mathur, Raphaël P. Barazzutti, Ray Morehead, Robin Coe, Rock Lee, and Thomas Lockney. Jens Geyer was without doubt my most stalwart sounding board, providing detailed and thoughtful commentary and guidance from beginning to end. Roger Meier made sure I didn’t miss important topics along the way and shared some of his compelling Apache Thrift IoT projects. Ben Craig kept me honest; when I couldn’t get a good example done, Ben would push me to patch Thrift so that I could. He also saved me from falling into the pit between C++98 and C++11 or committing concurrency crimes. Jake Farrell, the PMC chair, provided encouragement and bore the burden of pushing new Apache Thrift versions out the door while the book developed, managing the complex set of package releases that grows with every new language.
About this book
Programmer’s Guide to Apache Thrift was written to make learning how to use Apache Thrift drastically easier. Open source projects are famous for substandard documentation, and Apache Thrift has traditionally been a poster child for this stereotype. In retrospect, I can see why this is the case! This book and the accompanying source code repository should help newbies get started quickly and enable old hands to design better interfaces.
Who should read this book
Programmer’s Guide to Apache Thrift is for anyone serious about mastering Apache Thrift. Both beginners and experienced Apache Thrift developers will find valuable bits of insight and useful reference material, making it easier to develop quality, extensible interfaces in Apache Thrift.
How this book is organized
The book has 17 chapters divided into three parts:
Part 1 covers introductory concepts, basic architecture, Apache Thrift setup, and basic debugging. Developers new to Apache Thrift should probably read this part thoroughly, while current Apache Thrift users may want to simply skim it.
Part 2 covers the Apache Thrift system layer by layer, working from the lowest layer, transports, through to the highest layer, servers. Programmers seeking an in-depth understanding of Apache Thrift should read this part end to end. Those interested in a higher-level understanding of Apache Thrift can skim the chapters here, with perhaps a deeper dive into chapter 6, which covers the Apache Thrift IDL in detail.
Part 3 provides language-based walk-throughs that not only demonstrate the use of Apache Thrift in some of the most popular programming languages, but also continue the journey through use cases and features. Part 3 ends with chapter 17, which looks at Apache Thrift serialization in messaging systems, contrasts Apache Thrift IDL with other popular interfaces, such as REST/HTTP, and finally digs into Apache Thrift RPC performance. I would recommend everyone read the chapters on the languages they’re interested in, as well as chapter 17, which provides important summary information and Apache Thrift best practices.
About the code
This book contains many examples of source code, both in numbered listings and in line with normal text. In both cases, source code is formatted in a fixed-width font like this to separate it from ordinary text. Sometimes code is also in bold to highlight changes from previous steps in the chapter, such as when a new feature adds to an existing line of code.
In many cases, the original source code has been reformatted; we’ve added line breaks and reworked indentation to accommodate the available page space in the book. In rare cases, even this was not enough, and listings include line-continuation markers. Additionally, comments in the source code have often been removed from the listings when the code is described in the text. Numbered markers accompany many of the listings and mark particular lines and elements discussed in the text.
Source code for the examples in this book is available for download from the publisher’s website at https://www.manning.com/books/programmers-guide-to-apache-thrift or on GitHub at http://github.com/randyabernethy/thriftbook.
liveBook discussion forum
Purchase of Programmer’s Guide to Apache Thrift includes free access to a private web forum run by Manning Publications where you can make comments about the book, ask technical questions, and receive help from the author and from other users. To access the forum, go to https://livebook.manning.com/#!/book/programmers-guide-to-apache-thrift/discussion.
You can also learn more about Manning’s forums and the rules of conduct at https://livebook.manning.com/#!/discussion. Manning’s commitment to our readers is to provide a venue where a meaningful dialogue between individual readers and between readers and the author can take place. It is not a commitment to any specific amount of participation on the part of the author, whose contribution to the forum remains voluntary (and unpaid). We suggest you try asking the author some challenging questions lest his interest stray! The forum and the archives of previous discussions will be accessible from the publisher’s website as long as the book is in print.
Online resources
Need additional help?
The Apache Thrift mailing lists and IRC chat are both useful resources (https://thrift.apache.org/mailing).
The Thrift tag at StackOverflow (stackoverflow.com/questions/tagged/thrift) is a great place both to ask questions and to help others. Helping someone else is a great way to learn!
About the author
RANDY ABERNETHY is a partner at RX-M LLC, a leading cloud-native systems consultancy. He has been an Apache Thrift user for almost a decade and is currently an Apache Thrift committer and PMC member. He has a passion for distributed systems technology and markets, frequently working with clients in the capital markets and financial services spaces.
About the cover illustration
The figure on the cover of Programmer’s Guide to Apache Thrift is captioned L’agent d’affaires.
The illustration is taken from a collection of works by many artists, edited by Louis Curmer and published in Paris in 1841. The title of the collection is Les Français peints par eux-mêmes, which translates as The French People Painted by Themselves. Each illustration is finely drawn and colored by hand, and the rich variety of drawings in the collection reminds us vividly of how culturally apart the world’s regions, towns, villages, and neighborhoods were just 200 years ago. Isolated from each other, people spoke different dialects and languages. In the streets or in the countryside, it was easy to identify where they lived and what their trade or station in life was just by their dress.
Dress codes have changed since then and the diversity by region, so rich at the time, has faded away. It is now hard to tell apart the inhabitants of different continents, let alone different towns or regions. Perhaps we have traded cultural diversity for a more varied personal life—certainly for a more varied and fast-paced technological life.
At a time when it is hard to tell one computer book from another, Manning celebrates the inventiveness and initiative of the computer business with book covers based on the rich diversity of regional life of two centuries ago, brought back to life by pictures from collections such as this one.
Part 1. Apache Thrift overview
Apache Thrift is an open source, cross-language serialization and remote procedure call (RPC) framework. With support for more than 20 programming languages, Apache Thrift can play an important role in many distributed application solutions. As a serialization platform, it enables efficient cross-language storage and retrieval of a wide range of data structures. As an RPC framework, Apache Thrift enables rapid development of complete cross-language services with little more than a few lines of code.
Part 1 of this book will help you understand how Apache Thrift fits into modern distributed application models, while imparting a high-level understanding of the Apache Thrift architecture. Part 1 will also get you started with basic Apache Thrift setup and debugging and includes a look at building a simple cross-language hello world service.
Chapter 1. Introduction to Apache Thrift
This chapter covers
Using Apache Thrift to unify polyglot systems
Simplifying the creation of high-performance networked services
Introducing the Apache Thrift modular serialization system
Creating a simple Apache Thrift cross-language microservice
Comparing Apache Thrift with other cross-language communications frameworks
Modern software systems live in a networked world. Network communications are critical to the tiniest embedded systems in the Internet of Things through to the weightiest of relational databases anchoring traditional multitier applications. As new software systems increasingly embrace dynamically scheduled, containerized microservices, lightweight, high-performance, language-agnostic network communications are ever more important.
But how to wire all these things together, the old and the new, the big and the small? How do we package a message from a service written in one language in such a way that a program written in any other language can read it? How do we design services that are fast enough for high-performance, backend cloud systems but accessible by frontend scripting technologies? How do we keep things lightweight to support efficient containers and embedded systems? How do we create interfaces that can evolve over time without breaking existing components? How do we do all of this in an open, vendor-neutral way, and, perhaps most important, how can we do it all precisely once, reusing the same communications primitives across a broad platform? For companies such as Facebook, Evernote, and Twitter, the answer is Apache Thrift.
This chapter introduces the Apache Thrift framework and its role in modern distributed applications. We’ll look at why Apache Thrift was created and how it helps programmers build high-performance, cross-language services. To begin, we’ll consider the growing need for multi-language integration and examine the role Apache Thrift plays in polyglot application development. Next, we’ll look at the two key functions of Apache Thrift, serialization and RPC, and walk through the construction of a simple Apache Thrift service. At the end of the chapter we’ll compare Apache Thrift to several other tools offering similar features to help you determine when Apache Thrift might be a good fit.
1.1. Polyglotism, the pleasure and the pain
The number of programming languages in common commercial use has grown considerably in recent years. In 2003, 80% of the Tiobe Index (http://www.tiobe.com/index.php/tiobe_index) was attributed to six programming languages: Java, C, C++, Perl, Visual Basic, and PHP. In 2013, it took nearly twice as many languages to capture the same 80%, adding Objective-C, C#, Python, JavaScript, and Ruby to the list (see figure 1.1). In early 2016 the entire Tiobe top 20 didn’t add up to 80% of the mind share. In Q4 2015, GitHub reported 19 languages all having more than 10,000 active repositories (http://githut.info/), adding Swift, Go, Scala, and others to the list.
Figure 1.1. The Tiobe Index uses web search results to track programming language popularity (http://www.tiobe.com).
Increasingly, developers and architects choose the programming language most suitable for the task at hand. A developer working on a Big Data project might decide Clojure is the best language to use; meanwhile, folks down the hall may be doing front-end work in TypeScript, while programmers in the basement might be using C with embedded systems (no aversion to sunlight implied). Years ago, this type of diversity would be rare at a single company; now it can be found within a single team.
Choosing a programming language uniquely suited to solving a particular problem can lead to productivity gains and better quality software. When the language fits the problem, friction is reduced, programming becomes more direct, and code becomes simpler and easier to maintain. For example, in large-scale data analysis, horizontal scaling is instrumental to achieving acceptable performance. Functional programming languages such as Haskell, Scala, and Clojure tend to fit naturally here, allowing analytic systems to scale out without complex concurrency concerns.
Platforms drive language adoption as well. Objective-C exploded in popularity when Apple released the iPhone, and Swift is following suit. Go is the language of the booming container ecosystem, responsible for Docker, Kubernetes, etcd, and other essentials. Those programming for the browser will have teams competent with JavaScript or TypeScript, while the game and GUI world still often codes in C++ for top-performing graphics. These choices are driven by history as well as compelling technology underpinnings. Even when such groups are internally monoglots, languages mix and mingle as they collaborate across business boundaries.
Many organizations that claim monoglotism make use of a range of support languages for testing and prototyping. Dynamic programming languages such as Groovy and Ruby are often used for testing, while Lua, Perl, and Python are popular for prototyping, and PHP has a long history with the web. Build systems such as the Groovy-based Gradle and the Ruby-based Rake also provide innovative capabilities.
The polyglot story isn’t all wine and song, however. Mastering a programming language is no small feat, not to mention the tools and libraries that come with it. As this burden is multiplied with each new language, firms may experience diminishing returns. Introducing multiple languages into a product initiative can have numerous costs associated with cross-language integration, developer training, and complexity when building and testing. If managed improperly, these costs can quickly overshadow the benefits of a multi-language strategy.
One of the key strengths of Apache Thrift is its ability to simplify, centralize, and encapsulate the cross-language aspects of a system. Apache Thrift offers broad support, in tree, for polyglot application development. Every language mentioned previously is supported by the Apache Thrift project, more than 20 languages in all, and growing (see table 1.1). This unrivaled direct support for existing languages and the Apache Thrift community’s rapid addition of support for new languages can help organizations maximize the potential of polyglotism while minimizing the downsides. The more our programs mirror the dialog on the floor of the United Nations General Assembly, the more we’ll need professional translators such as Apache Thrift to streamline communications.
Table 1.1. Languages supported by Apache Thrift
1.2. Application integration with Apache Thrift
Whether your application uses multiple platforms and languages or not, it’s likely that its operations span multiple processes over networks and time. At times these processes will need to communicate, either through a file on disk, through a buffer in memory, or across networks. Two central concerns are associated with inter-process communications:
Type serialization
Service implementation
Let’s consider each in turn.
1.2.1. Type serialization
Serialization is a basic function in any cross-platform/language exchange. For example, imagine an application for the music industry that uses NATS as a messaging system to move song data between processes (see figure 1.2). Using NATS, the team can send/receive messages rapidly between their remote processes written in Java and Python. The question is, can the programs read the musical messages when sent by another language? Python objects are represented differently in memory than Java objects. If a Python program sent the raw memory bits for its music track data to a Java program, fireworks would ensue.
Figure 1.2. Apache Thrift can be used to serialize data in cross-platform messaging scenarios.
To solve this problem, we need a data serialization layer on top of the messaging platform. Why not send everything back and forth in JSON, one might ask? Using a standard format such as JSON is part of a solution; however, we must still answer questions such as: how are data fields ordered when sending multi-field messages, what happens when fields are missing, and what does a language that doesn’t directly support a data type do when receiving that data type? These and many other questions cannot be answered by a data layout specification such as JSON, YAML, or XML. Different languages frequently produce different, though legally formatted, documents for the same dataset.
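To make these questions concrete, here is a minimal Python sketch. The track fields and the two producer conventions are hypothetical, chosen only for illustration. Both documents are valid JSON for the same track, yet they differ in field order, in how an unknown field is expressed, and in how the duration is typed, so a reader needs rules beyond the JSON grammar to interpret them consistently.

```python
import json

# Two hypothetical producers describing the same music track.
# Producer A omits the unknown composer field entirely.
doc_a = json.dumps({"title": "Example Song", "artist": "Example Artist", "duration": 3.5})

# Producer B sorts its fields, sends an explicit null for the composer,
# and encodes the duration as a string.
doc_b = json.dumps(
    {"artist": "Example Artist", "composer": None, "duration": "3.5", "title": "Example Song"},
    sort_keys=True,
)

track_a = json.loads(doc_a)
track_b = json.loads(doc_b)

# Both documents parse cleanly, but a naive reader sees different data.
print("identical bytes:", doc_a == doc_b)
print("composer present:", "composer" in track_a, "composer" in track_b)
print("duration types:", type(track_a["duration"]).__name__, type(track_b["duration"]).__name__)
```

A shared type definition is what settles which of these conventions every reader and writer must follow.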
IDL and types
Apache Thrift provides a modular serialization framework that addresses these issues. With Apache Thrift, developers define abstract data types in an Interface Definition Language (IDL). This IDL can then be compiled into source code for any supported language. The generated code provides complete serialization and deserialization logic for all of the user’s defined types. Apache Thrift ensures that types written by any language can be read by any other language. The following listing shows Apache Thrift IDL type definitions for a hypothetical music application.
Listing 1.1. Apache Thrift IDL type definitions
namespace * music

enum PerfRightsOrg {
    ASCAP = 1
    BMI = 2
    SESAC = 3
    Other = 4
}

typedef double Minutes

struct MusicTrack {
    1: string title
    2: string artist
    3: string publisher
    4: string composer
    5: Minutes duration
    6: PerfRightsOrg pro
}
Some people complain that creating IDL is an extra step, slowing the development process. I’ve found that it’s the opposite. IDL forces you to carefully consider your interfaces in isolation, free of noisy implementation code. This may be the most important time you spend on a system design. IDL is also lightweight, easy to modify and experiment with, and often useful as a communications tool on the business side.
Users may say schemaless systems are more flexible and that IDL is brittle. The truth is, whether you document your schema or not, you still have a schema if you’re reading and interpreting data. Implied (undocumented) schemas can be the source of fairly treacherous application errors and create a burden on developers who need to interact with the data or extend the system. If you have no definition for the data layout you read and write except the code that reads and writes it, it will be slow going when you want to extend the system. How many bits of code throughout the system depend on this implied schema? How do you change such a thing?
The popularity of NoSQL systems, many of which are schemaless, creates another role for IDL. You can now document your types in a single place and use those types in service calls, with messaging systems and in storage systems such as Redis, MongoDB, and others.
Several systems reverse the process and generate their schema from a given coded solution. Annotation-driven systems, such as Java’s JAX-RS, can work this way. This approach makes it easy to allow implementation details to bias the interface definition, straining portability and clarity. It’s generally much more work to modify implementation code than it is to modify IDL. Also, you have no guarantee that another vendor’s code generator will create compatible code from a foreign schema. This is a problem any time multiple vendors are involved in a communications solution.
Apache Thrift sidesteps many of these problems by providing a single source of truth, the IDL. Apache Thrift supplies vendor-independent support for a single IDL across a wide array of programming languages, and the Apache Thrift cross-language test suite is constantly at work verifying interoperability as the framework grows.
Interface evolution
IDL creates a contract that all parties can rely upon and that code generators can use to create working serialization operations, ensuring the contract is adhered to. Yet IDL schemas need not be brittle. Apache Thrift IDL supports a range of interface evolution features which, when used properly, allow fields to be added and removed, types to be changed, and more.
Support for interface evolution greatly simplifies the task of ongoing software maintenance and extension. Modern engineering sensibilities such as microservices, Continuous Integration (CI), and Continuous Delivery (CD) require systems to support incremental improvements without impacting the rest of the platform. Tools that supply no form of interface evolution tend to break the world when changed. In such systems, changing an interface means all the clients and servers using that interface must be rewritten and/or recompiled, then redeployed in a big bang.
Apache Thrift interface evolution features allow multiple interface versions to coexist seamlessly in a single operating environment. This makes incremental updates viable, enabling CI/CD pipelines and empowering individual Agile teams to deliver business value at their own cadence.
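The idea behind this kind of coexistence can be sketched in a few lines of Python. This is not Apache Thrift's wire format or API; the dictionary keyed by field ID and the names write_v2 and read_v1 are assumptions made purely for illustration. The point is that when every field carries a numeric ID, as in the IDL struct shown earlier, an older reader can skip IDs it does not recognize and fall back to defaults for IDs that are absent.

```python
# A hypothetical version 1 reader knows only fields 1 and 2.
V1_FIELDS = {1: "title", 2: "artist"}
V1_DEFAULTS = {"title": "", "artist": ""}


def write_v2(title, artist, publisher):
    """A newer writer emits field 3 (publisher) in addition to fields 1 and 2."""
    return {1: title, 2: artist, 3: publisher}


def read_v1(message):
    """An older reader keeps known field IDs, skips unknown ones, and defaults the rest."""
    track = dict(V1_DEFAULTS)
    for field_id, value in message.items():
        name = V1_FIELDS.get(field_id)
        if name is not None:  # unknown IDs are ignored rather than treated as errors
            track[name] = value
    return track


new_message = write_v2("Example Song", "Example Artist", "Example Publisher")
print(read_v1(new_message))      # old reader, newer message with an extra field
print(read_v1({1: "Untitled"}))  # old reader, message that lacks field 2
```

Because neither case raises an error, old and new components can be deployed independently rather than in a single coordinated release.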
Continuous Integration (CI) and Continuous Delivery (CD)
Continuous integration is an approach to software development wherein changes to a system are merged into the central code base frequently. These changes are continuously built and tested, usually by automated systems, providing developers with rapid feedback when patches create conflicts or fail tests. Continuous Delivery takes CI one step further, migrating successfully merged code to evaluation/staging systems and ultimately into production, many times per day. The goal of continuous systems is to take many small risks and provide immediate feedback rather than taking large risks and delaying feedback over long release cycles. The longer integration is delayed, the more patches are involved, making it more difficult to identify and repair conflicts and bugs.
Modular serialization
Apache Thrift provides pluggable serializers, known as protocols, allowing you to use any one of several serialization formats for data exchange, including binary for speed, compact for size, and JSON for readability. The same contract (IDL) can remain in place even as you change serialization protocols. This modular approach allows custom serialization protocols to be added as well. Because Apache Thrift is community managed and open source, you can easily change or enhance functionality and push it upstream when needed (patches are always welcome at the Apache Thrift project).
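The following Python sketch shows the general shape of pluggable serialization. The Track type and the JsonProtocol and BinaryProtocol classes are hypothetical stand-ins written for this example, not Apache Thrift classes, and their encodings are not Thrift's wire formats. What the sketch illustrates is that the application type stays fixed while the serializer is swapped.

```python
import json
import struct
from dataclasses import dataclass


@dataclass
class Track:
    title: str
    duration: float  # minutes


class JsonProtocol:
    """Readable text encoding."""

    def write(self, track):
        return json.dumps({"title": track.title, "duration": track.duration}).encode("utf-8")

    def read(self, data):
        fields = json.loads(data.decode("utf-8"))
        return Track(fields["title"], fields["duration"])


class BinaryProtocol:
    """Binary encoding: 4-byte title length, UTF-8 title, 8-byte double."""

    def write(self, track):
        title = track.title.encode("utf-8")
        return struct.pack(">i", len(title)) + title + struct.pack(">d", track.duration)

    def read(self, data):
        (length,) = struct.unpack_from(">i", data, 0)
        title = data[4:4 + length].decode("utf-8")
        (duration,) = struct.unpack_from(">d", data, 4 + length)
        return Track(title, duration)


track = Track("Example Song", 3.5)
for protocol in (JsonProtocol(), BinaryProtocol()):
    wire = protocol.write(track)
    # The same application type round-trips whichever protocol is plugged in.
    assert protocol.read(wire) == track
    print(type(protocol).__name__, len(wire), "bytes")
```

Code that produces or consumes Track objects never needs to know which protocol is in use, which is what allows the format to change without touching the contract.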
1.2.2. Service implementation
Services are modular application components that provide interfaces accessible over a network. Apache Thrift IDL allows you to define services in addition to types (see listing 1.2). Like types, IDL services can be compiled to generate stub code. Service stubs are used to connect clients and servers in a wide range of languages.
Listing 1.2. /ThriftBook/part1/hello/sail_stats.thrift
service SailStats {
double get_sailor_rating(1: string sailor_name)
double get_team_rating(1: string team_name)
double get_boat_rating(1: i64 boat_serial_number)
list
list
2: double max_rating)
string get_team_captain(1: string team_name)
}
Imagine you have a module that tracks and computes sailing team statistics and that this module is built into a Windows C++ GUI application designed to visualize wind flow dynamics. As it happens, your company’s web dev team wants to use the sail stats module to enhance a client-facing, Node.js-based web application on Linux. Faced with multiple languages and platforms and the laziness axiom (wanting to write as little code as possible), Apache Thrift could be a good solution (see figure 1.3).
Figure 1.3. The Apache Thrift RPC framework enables cross-platform services.
With Apache Thrift we could repackage the sail stats functions as a microservice and provide the Node.js programmers with access to the service through an easy-to-use Node.js client stub. To create the sail stats microservice we need only define the service interface in IDL, compile the IDL to create client and server stubs for the service, select one of the prebuilt Apache Thrift servers to host the service, and then assemble the parts.
Prebuilt server shells
It’s important to note that, unlike standalone serialization solutions, Apache Thrift comes with a complete set of server shells, ready to use, in almost all the supported languages. This sidesteps the difficult and repetitive process of building custom network servers. The prebuilt Apache Thrift servers are also small and focused, providing only the functionality necessary to host Apache Thrift services. A typical Apache Thrift server will consume an order of magnitude less memory than an equivalent Tomcat deployment. This makes Apache Thrift servers a good choice for containerized microservices and embedded systems that don’t have the resources necessary to run full-blown web or application servers.
Microservices and Service Oriented Architecture (SOA)
The microservice and SOA approaches to distributed application design break applications down into services, which are remotely accessible, autonomous modules composed of a set of closely related functions. Such systems provide their features over language-agnostic interfaces, allowing clients to be constructed in the most appropriate language and on the most appropriate platform, independent of the service implementation. These services are typically (and in the best case) stateless and loosely coupled, communicating with clients through a formal interface contract. Services may be internal to an organization or support clients across business boundaries. The distinction between SOA services and microservices is subtle, but most agree that microservices are a subset of SOA services in which the services are more atomic and independently deployable.
Modular transports
Apache Thrift also offers a pluggable transport system. Apache Thrift clients and servers communicate over transports that adapt Apache Thrift data flows to the outside world. For example, the TSocket transport allows Apache Thrift applications to communicate over TCP/IP sockets. You can use prebuilt transports for other communications schemes, such as named pipes and UNIX domain sockets. Custom transports are easy to craft as well. Apache Thrift also supports offline transports that allow data to be serialized to disk, memory, and other devices.
A particularly elegant aspect of the Apache Thrift transport model is support for layered transports. Protocols serialize application data into a bit stream. Transports read and write the bytes, making any type of manipulation possible. For example, the TZLibTransport is available in many Apache Thrift language libraries and can be layered on top of any other transport to achieve high-ratio data compression. You can branch data to loggers, fork requests to parallel servers, encrypt, and perform any other manner of manipulation with custom-layered transports.
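Layering can be pictured with a short Python sketch. The MemoryTransport and ZlibLayer classes below are hypothetical and far simpler than the real Apache Thrift transports; they use Python's standard zlib module and handle a single message per write. The sketch only shows the pattern: a layer exposes the same read/write interface as the transport it wraps and transforms the bytes on the way through.

```python
import zlib


class MemoryTransport:
    """Endpoint transport: keeps written bytes in memory."""

    def __init__(self):
        self.buffer = b""

    def write(self, data):
        self.buffer += data

    def read(self):
        return self.buffer


class ZlibLayer:
    """Layered transport: compresses on write and decompresses on read."""

    def __init__(self, inner):
        self.inner = inner  # any object offering write() and read()

    def write(self, data):
        self.inner.write(zlib.compress(data))

    def read(self):
        return zlib.decompress(self.inner.read())


endpoint = MemoryTransport()
stack = ZlibLayer(endpoint)

payload = b"Hello from the python server " * 50
stack.write(payload)

# Callers of the stack see the original bytes; the endpoint holds compressed bytes.
assert stack.read() == payload
print(len(payload), "bytes written,", len(endpoint.buffer), "bytes stored")
```

Because the layer depends only on the inner object's interface, the same wrapper could sit on top of a socket, a file, or another layer.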
1.3. Building a simple service
To get a better understanding of the practical aspects of Apache Thrift, we’ll build a simple hello world microservice. The service will be designed to supply various parts of our enterprise with a daily greeting, exposing a single hello_func function that takes no parameters and returns a greeting string. To see how Apache Thrift works across languages, we’ll build clients in C++, Python, and Java.
1.3.1. The Hello IDL
Most projects involving Apache Thrift begin with careful consideration of the interface components involved. Apache Thrift IDL is similar to C in its notation and makes it easy to define types and services shared across systems. Apache Thrift IDL is plain text saved in files with a .thrift extension (see the following listing).
Listing 1.3. /ThriftBook/part1/hello/hello.thrift
service HelloSvc {          // 1
    string hello_func()     // 2
}
Our hello.thrift IDL file declares a single service interface called HelloSvc 1 with a single function, hello_func() 2. The function accepts no parameters and returns a string. To use this interface we can compile it with the Apache Thrift IDL compiler. The IDL compiler binary is named thrift on UNIX-like systems and thrift.exe on Windows. The compiler expects two command line arguments, an IDL file to compile and one (or more) target languages to generate code for. Here’s an example session that generates Python stubs for HelloSvc:
/ThriftBook/part1/hello$ ls -l
-rw-r--r-- 1 root root 88 Feb 16 17:01 hello.thrift
/ThriftBook/part1/hello$ thrift --gen py hello.thrift  1
/ThriftBook/part1/hello$ ls -l
drwxr-xr-x 4 root root 4096 Feb 17 00:16 gen-py  2
-rw-r--r-- 1 root root 88 Feb 16 17:01 hello.thrift
In the previous session the IDL compiler is invoked with the --gen py switch 1, which causes the compiler to create a gen-py directory 2 to house the emitted Python code for your hello.thrift IDL. The directory contains client/server stubs for all the services and serialization code for all the user-defined types in the IDL file.
1.3.2. The Hello server
Now that we have our support code generated, we can implement our service and use a prebuilt Apache Thrift server to house it. The following listing provides a sample server coded in Python.
Listing 1.4. /ThriftBook/part1/hello/hello_server.py
At the top of our server listing we use the built-in Python sys module to add the gen-py directory to the Python Path. This allows us to import the generated service stubs for our HelloSvc service 1.
Our next step is to import several Apache Thrift library packages. TSocket provides an endpoint for our clients to connect to, TTransport provides a buffering layer, TBinaryProtocol will handle data serialization, and TServer will give us access to the prebuilt Python server classes 2.
The next block of code implements the HelloSvc service itself through the HelloHandler class. This class is called a handler in Apache Thrift because it handles all of the calls made to the service. All the service methods must be represented in the handler class; in our case this is the hello_func() method 3. In real projects, almost all of your time and effort is spent here, implementing services. Apache Thrift takes care of the wiring and boilerplate code.
Next we create an instance of our handler and use it to initialize a processor for our service. The processor is the server-side stub generated by the IDL compiler that turns network service requests into calls to the appropriate handler function 4.
The Apache Thrift library offers endpoint transports for use with files, memory, and various network types: the example here creates a TCP server socket endpoint to accept client connections on TCP port 9090 5. The buffering layer ensures that we make efficient use of the underlying network, transmitting bits only when an entire message has been serialized 6. The binary serialization protocol transmits our data in a fast binary format with little overhead 7.
Apache Thrift provides a range of servers to choose from, each with unique features. The server used here is an instance of the TSimpleServer class, which, as its name implies, provides the most basic server functionality 8. Once constructed, we run the server by calling the serve() method 9.
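A server matching this description might look like the following sketch. It assumes the Apache Thrift Python library is installed and that running thrift --gen py hello.thrift has produced gen-py/hello/HelloSvc.py. The greeting text matches the client session shown later in this section, and build_server is a hypothetical helper added here so that the Thrift imports run only when the server is actually assembled.

```python
import sys


class HelloHandler:
    """Implements the HelloSvc interface: one method per IDL function."""

    def hello_func(self):
        print("[Server] Handling client request")
        return "Hello from the python server"


def build_server(port=9090):
    """Assemble a TSimpleServer hosting HelloSvc (hypothetical helper)."""
    sys.path.append("gen-py")  # make the generated stubs importable
    from hello import HelloSvc  # assumed output of: thrift --gen py hello.thrift
    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from thrift.server import TServer

    processor = HelloSvc.Processor(HelloHandler())  # server-side stub wrapping the handler
    endpoint = TSocket.TServerSocket(port=port)  # TCP endpoint that accepts clients
    transport_factory = TTransport.TBufferedTransportFactory()  # buffering layer
    protocol_factory = TBinaryProtocol.TBinaryProtocolFactory()  # binary serialization
    return TServer.TSimpleServer(processor, endpoint, transport_factory, protocol_factory)


# To run the server: build_server().serve()
```

The handler is plain Python with no Thrift dependencies, which is why it can be developed and unit tested separately from the server wiring.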
The following example session runs our Python server:
/ThriftBook/part1/hello$ ls -l
drwxr-xr-x 4 randy randy 4096 Jan 27 02:34 gen-py
-rw-r--r-- 1 randy randy 732 Jan 27 03:44 hello_server.py
-rw-r--r-- 1 randy randy 99 Jan 27 02:24 hello.thrift
/ThriftBook/part1/hello$ python hello_server.py
The Python server took approximately seven lines of code, excluding imports and the service implementation. The story is similar in C++, Java, and most other languages. This is a basic server, but the example should help you see how much leverage Apache Thrift gives you when it comes to quickly creating cross-language microservices.
1.3.3. A Python client
Now that we have our server running, let’s create a simple Python client to test it, as shown in the following listing.
Listing 1.5. /ThriftBook/part1/hello/hello_client.py
The Python client begins by importing the same HelloSvc module used by the server, but the client will use the client-side stubs for the hello service 1. We’ll also import three modules from the Apache Thrift Python library. The first is TSocket, which is used on the client side to make a TCP connection to the server socket 2; as you may guess, the client must use a client-side transport compatible with the server transport. The next import pulls in TTransport, which will provide a network buffer 3, and the TBinaryProtocol import allows us to serialize messages to the server 4. Again, this must match the server implementation.
Our next block of code initializes the TSocket with the host and port to connect to 5. We’ll wrap the socket transport in a buffer 6 and finally wrap the entire transport stack in the TBinaryProtocol 7, creating an I/O stack that can serialize data to and from the server.
The I/O stack is used by the client stub, which acts as a proxy for the remote service 8. Opening the transport causes the client to connect to the server 9. Invoking the hello_func() method on the Client object serializes our call request with the binary protocol and transmits it over the socket to the server, then deserializes the returned result 10. The program prints out the result 11 and then closes the connection using the transport close() method 12.
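A client along these lines might be sketched as follows. As with any Thrift program, it assumes the Apache Thrift Python library is installed and that gen-py/hello/HelloSvc.py has been generated from hello.thrift. The call_hello function name and the localhost default are assumptions made for this example; port 9090 is the port the server listens on.

```python
import sys


def call_hello(host="localhost", port=9090):
    """Connect to a HelloSvc server, call hello_func(), and return the reply."""
    sys.path.append("gen-py")  # make the generated stubs importable
    from hello import HelloSvc  # assumed output of: thrift --gen py hello.thrift
    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol

    sock = TSocket.TSocket(host, port)  # client-side TCP endpoint
    transport = TTransport.TBufferedTransport(sock)  # buffering layer over the socket
    protocol = TBinaryProtocol.TBinaryProtocol(transport)  # binary serialization
    client = HelloSvc.Client(protocol)  # proxy for the remote service

    transport.open()  # connect to the server
    try:
        return client.hello_func()  # serialize the call, wait for the reply
    finally:
        transport.close()  # always release the connection


# With the server running: print("[Client] received: %s" % call_hello())
```

The transport and protocol chosen here mirror the buffered transport and binary protocol on the server side; a mismatch on either would prevent the two ends from understanding each other.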
Here’s a sample session running the above client (the Python server must be running in another shell to respond):
/ThriftBook/part1/hello$ ls -l
drwxr-xr-x 3 randy randy 4096 Mar 26 21:45 gen-py
-rw-r--r-- 1 randy randy 386 Mar 26 21:59 hello_client.py
-rw-r--r-- 1 randy randy 535 Mar 26 16:50 hello_server.py
-rw-r--r-- 1 randy randy 95 Mar 26 16:28 hello.thrift
/ThriftBook/part1/hello$ python hello_client.py
[Client] received: Hello from the python server
While it takes more work than your run-of-the-mill hello world program, a few lines of IDL and a few lines of Python code have allowed us to create a language-agnostic, OS-agnostic, and platform-agnostic service API with a working client and server. Not bad.
1.3.4. A C++ client
To broaden your perspective and demonstrate the cross-language aspects of Apache Thrift, let’s build two more clients for the hello server, one in C++ and one in Java. We’ll start with the C++ client.
First we need to compile the service definition again, this time generating C++ stubs:
/ThriftBook/part1/hello$ thrift --gen cpp hello.thrift 1
/ThriftBook/part1/hello$ ls -l
drwxr-xr-x 2 randy randy 4096 Mar 26 22:25 gen-cpp
drwxr-xr-x 3 randy randy 4096 Mar 26 21:45 gen-py
-rw-r--r-- 1 randy randy 386 Mar 26 21:59 hello_client.py
-rw-r--r-- 1 randy randy 535 Mar 26 16:50 hello_server.py
-rw-r--r-- 1 randy randy 95 Mar 26 16:28 hello.thrift
Running the IDL compiler with the --gen cpp switch 1 causes it to emit C++ files in the gen-cpp directory, roughly equivalent to those generated for Python, producing C++ headers (.h) and source files (.cpp). The gen-cpp/HelloSvc.h header 1 contains the declarations for our service, and the gen-cpp/HelloSvc.cpp source file contains the implementation of the service stub components.
The code for a HelloSvc C++ client with the same functionality as the Python client appears in the following listing.
Listing 1.6. /ThriftBook/part1/hello/hello_client.cpp
Our C++ client code is structurally identical to the Python client code. With few exceptions, the Apache Thrift meta-model is consistent from language to language, making it easy for developers to work across languages.
The C++ main() function corresponds line for line with the Python code, with one exception: hello_func() doesn’t return a string conventionally; rather, it returns the string through an out parameter reference 3.
The Apache Thrift language libraries are generally wrapped in namespaces to avoid conflicts in the global namespace. In C++ all of the Apache Thrift library code is located within the apache::thrift namespace. The using statements here provide implicit access to the necessary Apache Thrift library code 1.
Apache Thrift strives to maintain as few dependencies as possible to keep the development environment simple and portable; however, exceptions do exist. For example, the Apache Thrift C++ library relies on the open source Boost library. In this example, several objects are wrapped in boost::shared_ptr 2. Apache Thrift uses shared_ptr to manage the lifetimes of almost all of the key objects involved in C++ service operations.
Those familiar with C++ will know that shared_ptr has been part of the standard library since C++11. While the sample code is written in C++11, Apache Thrift supports C++98 as well, requiring the use of the Boost version of shared_ptr (C++98 support will likely be dropped in the future, moving all Boost namespace elements to the std namespace).
The following listing shows a Bash session that builds and runs the C++ client.
Listing 1.7. Bash session running C++ client
$ ls -l
drwxr-xr-x 2 randy randy 4096 Mar 26 22:25 gen-cpp
drwxr-xr-x 3 randy randy 4096 Mar 26 21:45 gen-py
-rw-r--r-- 1 randy randy 641 Mar 26 22:36 hello_client.cpp
-rw-r--r-- 1 randy randy 386 Mar 26 21:59 hello_client.py
-rw-r--r-- 1 randy randy 535 Mar 26 16:50 hello_server.py
-rw-r--r-- 1 randy randy 95 Mar 26 16:28 hello.thrift
$ g++ --std=c++11 hello_client.cpp gen-cpp/HelloSvc.cpp -lthrift  1
$ ls -l
-rwxr-xr-x 1 randy randy 136508 Mar 26 22:38 a.out
drwxr-xr-x 2 randy randy 4096 Mar 26 22:25 gen-cpp
drwxr-xr-x 3 randy randy 4096 Mar 26 21:45 gen-py
-rw-r--r-- 1 randy randy 641 Mar 26 22:36 hello_client.cpp
-rw-r--r-- 1 randy randy 386 Mar 26 21:59 hello_client.py
-rw-r--r-- 1 randy randy 535 Mar 26 16:50 hello_server.py
-rw-r--r-- 1 randy randy 95 Mar 26 16:28 hello.thrift
$ ./a.out  2
[Client] received: Hello thrift, from the python server
Here we use the GNU C++ compiler to build the hello_client.cpp file into an executable program 1. Clang, Visual C++, and other compilers are also commonly used to build Apache Thrift C++ applications.
For the C++ build we must compile the generated client stubs found in the HelloSvc.cpp source file. During the link phase the -lthrift switch tells the linker to scan the standard Apache Thrift C++ library to resolve the TSocket and TBinaryProtocol library dependencies (this switch must follow the list of .cpp files when using g++ or it will be ignored, causing link errors).
Assuming the Python Hello server is still up, we can run our executable C++ client and make a cross-language RPC call. The C++ compiler builds our source into an a.out file that produces the same result as the Python client when executed 2.
1.3.5. A Java client
As a final example let’s put together a Java client for the service. Our first step is to generate Java stubs for the service, as shown in the following listing.
Listing 1.8. Generating Java stubs
/ThriftBook/part1/hello$ thrift --gen java hello.thrift 1
/ThriftBook/part1/hello$ ls -l
-rwxr-xr-x 1 randy randy 136508 Mar 26 23:07 a.out
drwxr-xr-x 2 randy randy 4096 Mar 26 22:25 gen-cpp
drwxr-xr-x 2 randy randy 4096 Mar 26 23:23 gen-java
drwxr-xr-x 3 randy randy 4096 Mar 26 21:45 gen-py
-rw-r--r-- 1 randy randy 641 Mar 26 22:36 hello_client.cpp
-rw-r--r-- 1 randy randy 386 Mar 26 21:59 hello_client.py
-rw-r--r-- 1 randy randy 535 Mar 26 16:50 hello_server.py