Expand your sphere with a piece of Google
Unlimited Worlds
Amazon started converting its excess computational power to cash some time ago. Under such names as EC2 (Elastic Compute Cloud), S3 (Simple Storage Service), and SimpleDB, the book and coffee machine mail order company has had considerable success leasing virtual machines, disk space, and distributed database services to users.
Introducing App Engine
Google's release of App Engine brought another heavyweight into this emerging market. The search giant can easily spare the necessary computing power; its huge data centers are unlikely to be fazed by just another web application. But Google was not satisfied with providing simple disk space – it added a chic interior as well. Garnished with a couple of buzzwords, the results of this project were ready for the general public in May 2008. Three years later, App Engine is still officially in beta, but it is nevertheless quite stable and usable.
The product designation, Google App Engine, covers a number of services. To begin with, anybody who registers for a free user account with Google is given some free disk space on Google's servers, a database, and the ability to launch their own web application.
At first, App Engine only supported Python. (Incidentally, Python's creator, Guido van Rossum, who joined Google in 2005, was responsible for the corresponding App Engine environment.) Users themselves got to choose the next programming language App Engine would support [1]. Most voters opted for a Java environment, which Google finally completed in April 2009. The third language is Go [2]; at the time of writing, Go support is still marked as experimental.
In all three languages, a couple of interesting libraries are available by default. In Python, for example, the popular Django framework facilitates the creation of HTML with its template engine, and the webapp framework helps process requests. Google is looking to support more languages in the near future but is not revealing which ones yet.
All of this sounds fairly encouraging, but it is unlikely to have expert web developers dancing in the streets. However, another service available through the package is a bit more interesting: automatic scaling for your own web applications. No matter how many requests hit your application at the same time, no matter what volume of data you need to store in the database, the Google servers will handle this in the background.
As a pleasant side effect, you do not need to bother with configuring, maintaining, or load balancing the infrastructure, which typically means Linux with Apache and MySQL. You just upload your web application and enjoy the ride. Keep in mind, though, that Google imposes limits on this free ride: Its hospitality ends at 500MB of disk space and 1GB of traffic per day (see the "Restricted Traffic" box for other restrictions). If you need more, you pay, but you are not forced to lease a complete, and typically expensive, (virtual) server. Instead, you just order as much additional capacity as you need.
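To put the free traffic quota into perspective, the following back-of-the-envelope calculation estimates how many page views fit into 1GB per day. It is a sketch for a current Python 3 interpreter; the 50KB page size is a hypothetical example, and the function assumes 1GB = 1024 * 1024KB while ignoring HTTP headers, browser caching, and any other traffic the application generates.

```python
def max_daily_page_views(page_size_kb, daily_quota_gb=1.0):
    """Rough number of page views that fit into a daily traffic quota.

    Assumption: 1 GB = 1024 * 1024 KB; HTTP overhead, caching, and
    other traffic are ignored, so real numbers will be lower.
    """
    if page_size_kb <= 0:
        raise ValueError("page size must be positive")
    quota_kb = daily_quota_gb * 1024 * 1024
    return int(quota_kb // page_size_kb)


# Hypothetical page of 50 KB (HTML plus images) under the free 1GB/day:
print(max_daily_page_views(50))  # 20971
```

Image-heavy pages shrink this number quickly, which is the point at which ordering additional capacity starts to matter.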
This model gives small businesses in particular the chance to experiment with new web applications without much risk. The free start removes the need for an initial outlay, and you don't even have to set up a web server or a database. Once your business starts turning a profit and your application hits the limits of the free account, you simply order additional resources. The "Google App Services" box summarizes the available services.
Sandbox
Every Google App Engine application runs in a sandbox that prohibits access to other parts of the system. The service distributes incoming browser requests over multiple servers and is said to be capable of starting and stopping servers depending on the data traffic volumes. For this to happen in a trouble-free way – and, above all, transparently for the user – the service must be abstracted from the underlying operating system. Your web application thus can't access the operating system; neither is it aware of which server it runs on.
Another restriction is that a web application has no write access to the filesystem; it can only read files that the developer uploaded to the server along with the application. Also, if you want to store data persistently, you have to use the database Google provides.
If you have the bright idea of offloading your files onto external servers at this point, you should know that the web application may only use the libraries and APIs provided to communicate with other computers; that is, it can send email or use the URL Fetch API, and the latter can only handle HTTP or HTTPS requests. The same applies in the reverse direction: External web applications can only use HTTP and HTTPS requests to talk to the App Engine application. Additionally, the application can use several APIs to communicate with JavaScript clients or XMPP-compatible instant messaging services (e.g., Google Talk).
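Because URL Fetch only speaks HTTP and HTTPS, it can pay to reject other URLs before handing them to the API. The helper below is a hypothetical pre-check, not part of the App Engine SDK, written for a current Python 3 interpreter (in Python 2, the same parser lives in the urlparse module).

```python
from urllib.parse import urlparse

# Schemes that URL Fetch can handle, according to the text above.
ALLOWED_SCHEMES = ("http", "https")


def is_fetchable(url):
    """Return True if url uses HTTP(S) and names a host.

    Hypothetical helper for illustration only; the real URL Fetch API
    performs its own validation.
    """
    parts = urlparse(url)
    return parts.scheme.lower() in ALLOWED_SCHEMES and bool(parts.netloc)


for candidate in ("https://example.com/feed",
                  "ftp://example.com/file",
                  "smtp://mail.example.com"):
    print(candidate, is_fetchable(candidate))
```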
To prevent misuse of the Google infrastructure, Google limits processor time, and thus execution time, for each Python script or Java program. If a request takes too long to process, Google terminates the execution and returns an error message instead. If a script or program exceeds its maximum run time too many times, the App Engine punishes the web application by reducing this limit. The App Engine back ends are an exception: These special Java programs or Python scripts have access to more memory (up to 1GB) and more CPU power and can run considerably longer than normal App Engine programs. Back ends are intended for compute-intensive or continuously running applications; you have to specify their hardware requirements in advance, and they are billed for uptime.
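Since a request that runs too long is cut off, longer jobs are best split into chunks that each finish comfortably within the limit. The sketch below, for a current Python 3 interpreter, processes items until a self-imposed time budget is used up and returns the rest for a later request. The budget value and the process callback are assumptions for illustration, not limits or APIs defined by Google.

```python
import time
from collections import deque


def process_with_budget(items, process, budget_seconds, clock=time.monotonic):
    """Process items one by one until budget_seconds have elapsed.

    Returns the unprocessed items so that a later request can continue
    where this one stopped. budget_seconds is an assumption: pick a
    value well below the platform's actual request deadline.
    """
    start = clock()
    remaining = deque(items)
    while remaining and clock() - start < budget_seconds:
        process(remaining.popleft())
    return list(remaining)


# Demonstration with a fake clock that advances one "second" per call.
ticks = iter(range(100))
done = []
left = process_with_budget(range(10), done.append, 3, clock=lambda: next(ticks))
print(done, left)  # [0, 1] [2, 3, 4, 5, 6, 7, 8, 9]
```

Injecting the clock keeps the function testable; in a real handler you would simply rely on the default.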
As another security measure, the App Engine only runs Python scripts, Java programs, or Go applications in response to incoming requests. After a response has been returned, a request handler can no longer trigger internal subprocesses. This makes repetitive and background tasks, such as cron jobs that clean up every hour, impossible to implement. However, Google set up the "App Engine Cron Service," which calls one or multiple URLs at freely defined intervals [3] [4].
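Because the cron service simply requests a URL, the cleanup job itself is an ordinary request handler. Here is a framework-free sketch as a plain WSGI application for a current Python 3 interpreter. The path /tasks/cleanup, the in-memory SESSIONS dictionary, and the expiry rule are hypothetical stand-ins (a real application would delete datastore entities instead); the schedule itself is configured separately, as described in [3] [4].

```python
import time

# Hypothetical data to clean up: session id -> expiry timestamp.
SESSIONS = {}


def cleanup_expired(now=None):
    """Delete expired sessions and return how many were removed."""
    now = time.time() if now is None else now
    expired = [key for key, expires in SESSIONS.items() if expires <= now]
    for key in expired:
        del SESSIONS[key]
    return len(expired)


def application(environ, start_response):
    """WSGI app; the cron service would request /tasks/cleanup on a schedule."""
    if environ.get("PATH_INFO") != "/tasks/cleanup":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found\n"]
    removed = cleanup_expired()
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [("removed %d\n" % removed).encode("ascii")]
```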
Test Environment
If you can live with these restrictions, your first step is to download the free App Engine Software Development Kit (SDK) [5]. With this kit, you can develop and test your web application on your own computer under Linux, Windows, or Mac OS X. To this end, the SDK provides not only the full set of libraries you will later find on the Google servers but also a web server that emulates all the App Engine services, including the sandbox and the database.
A separate SDK is available for each supported language. The Python and Java SDKs are written in their respective languages, with the advantage that they run on any computer with Python 2.5 or with Java 5 or newer. Python programmers need to use version 2.5; neither the SDK nor the environment on the Google servers supports Python 3. The Go SDK is currently available only for Linux and Mac OS X.
Linux users can simply grab the corresponding ZIP archive off the download page and unpack it in a directory of their choice to complete the installation. On Windows, a Microsoft installer does the work, and for Mac OS, you can just download the disk image.
Pythonesque
Your new web application will need a project directory; for this exercise, I'll call it helloworld. At this point, Python and Java programmers head in different directions. Java programmers should skip to the "Coffee Time" box.
Listing 1: Java "Hello World" Example
package helloworld;

import java.io.IOException;
import javax.servlet.http.*;

public class HelloworldServlet extends HttpServlet {
    public void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        resp.setContentType("text/plain");
        resp.getWriter().println("Hello World!");
    }
}
Listing 2: Java web.xml for "Hello World!"
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE web-app PUBLIC
  "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
  "http://java.sun.com/dtd/web-app_2_3.dtd">

<web-app xmlns="http://java.sun.com/xml/ns/javaee" version="2.5">
  <servlet>
    <servlet-name>helloworld</servlet-name>
    <servlet-class>helloworld.HelloworldServlet</servlet-class>
  </servlet>
  <servlet-mapping>
    <servlet-name>helloworld</servlet-name>
    <url-pattern>/helloworld</url-pattern>
  </servlet-mapping>
  <welcome-file-list>
    <welcome-file>index.html</welcome-file>
  </welcome-file-list>
</web-app>
Listing 3: Java build.xml
<project>
  <property name="sdk.dir" location="../appengine-java-sdk-1.5.0.1" />

  <import file="${sdk.dir}/config/user/ant-macros.xml" />

  <path id="project.classpath">
    <pathelement path="war/WEB-INF/classes" />
    <fileset dir="war/WEB-INF/lib">
      <include name="**/*.jar" />
    </fileset>
    <fileset dir="${sdk.dir}/lib">
      <include name="shared/**/*.jar" />
    </fileset>
  </path>

  <target name="copyjars"
      description="Copies the App Engine JARs to the WAR.">
    <copy todir="war/WEB-INF/lib" flatten="true">
      <fileset dir="${sdk.dir}/lib/user">
        <include name="**/*.jar" />
      </fileset>
    </copy>
  </target>

  <target name="compile" depends="copyjars"
      description="Compiles Java source and copies other source files to the WAR.">
    <mkdir dir="war/WEB-INF/classes" />
    <copy todir="war/WEB-INF/classes">
      <fileset dir="src">
        <exclude name="**/*.java" />
      </fileset>
    </copy>
    <javac
        srcdir="src"
        destdir="war/WEB-INF/classes"
        classpathref="project.classpath"
        debug="on" />
  </target>
</project>
Python programmers need to create an app.yaml configuration file, which gives the App Engine some information about your web app. Listing 4 shows the minimal version.
Listing 4: Python app.yaml
application: helloworld
version: 1
api_version: 1
runtime: python

handlers:
- url: .*
  script: helloworld.py
The first four settings should be self-explanatory. The application name follows the application keyword; this is the name you will use when you register the application with Google. The section following handlers specifies how the App Engine should map the URLs called by the browser to the web application code. In this case, the helloworld.py script will handle all incoming requests.
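Conceptually, the handlers section is an ordered list of URL patterns, each mapped to a script. The following sketch for a current Python 3 interpreter imitates that lookup under the assumption that the first pattern matching the entire path wins; the dispatch function and the extra /static entry are hypothetical and not part of the SDK.

```python
import re

# Ordered (pattern, script) pairs, modeled on the handlers section of app.yaml.
HANDLERS = [
    (r"/static/.*", "static_files.py"),  # hypothetical extra entry
    (r".*", "helloworld.py"),            # catch-all, as in Listing 4
]


def dispatch(path, handlers=HANDLERS):
    """Return the script responsible for path, or None if nothing matches.

    Assumption: patterns are tried in order and must match the whole path.
    """
    for pattern, script in handlers:
        if re.fullmatch(pattern, path):
            return script
    return None


print(dispatch("/static/logo.png"))  # static_files.py
print(dispatch("/"))                 # helloworld.py
```

The order matters: If the catch-all came first, it would swallow every request, including those for /static.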
As a response, you want the script to return a simple string, "Hello World," to the browser. Listing 5 proves that this is not exactly rocket science.
Listing 5: Python helloworld.py
#!/usr/bin/env python

print 'Content-Type: text/plain'
print ''
print 'Hello World!'
Getting Started
Now you have a tiny, but complete, App Engine application. To test it, change to the SDK directory and start the server provided by Google:
./dev_appserver.py ~/helloworld
This command takes the content from the project directory and lets you access it in your browser. The last line in the terminal has the address you need, which is typically http://localhost:8080 (see Figure 1).
The web server provided with the SDK automatically detects source code changes and applies them without delay. The server does not have to be restarted for each test run. Error messages and issues are output to the terminal.
Facilitated
The run-time environment on the server supports the full scope of the Python language, version 2.5.2, and most of the standard Python libraries. Functions that would compromise the security of the sandbox have been removed, including, for example, functions for opening sockets or writing to files. For the same reasons, the run-time environment will only execute pure Python code. Extensions written in C are not permitted.
To compensate for this, the run-time environment provides various Python APIs, most of which are for access to Google's own services. For example, Google Accounts simplifies user login and management, whereas URL Fetch is useful for communicating with other web applications. An API for email dispatch and a small library for JPG and PNG image manipulation are also available.
If you need to store temporary data or cache results (on a large scale), you will need the Memcache service. Memcache gives your web application a "high-performance in-memory key-value cache" that is shared across all instances of the application. Because entries can be evicted at any time, it is no substitute for the datastore, but it is perfect for buffering data from the database for faster access.
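The usual pattern is "cache-aside": Look in the cache first and only query the datastore on a miss. The sketch below, for a current Python 3 interpreter, shows that pattern with a tiny in-process cache that supports expiry times. TinyCache and get_or_compute are hypothetical stand-ins, not the google.appengine.api.memcache API, and unlike Memcache this cache is neither shared between instances nor size-limited.

```python
import time


class TinyCache:
    """Minimal key-value cache with per-entry expiry (illustration only)."""

    def __init__(self, clock=time.time):
        self._data = {}
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires = entry
        if expires is not None and expires <= self._clock():
            del self._data[key]  # entry has expired
            return None
        return value

    def set(self, key, value, ttl=None):
        expires = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expires)


def get_or_compute(cache, key, compute, ttl=60):
    """Cache-aside: return the cached value, or compute, store, and return it."""
    value = cache.get(key)
    if value is None:
        value = compute()
        cache.set(key, value, ttl)
    return value


calls = []


def expensive_query():
    """Hypothetical stand-in for a datastore query."""
    calls.append(1)
    return ["Harry Kowalski"]


cache = TinyCache()
get_or_compute(cache, "addresses", expensive_query)
get_or_compute(cache, "addresses", expensive_query)
print(len(calls))  # 1: the second call was served from the cache
```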
In the Framework
Finally, the web application framework, webapp, helps handle web requests. Listing 6 shows the "Hello World" example with the framework added.
Listing 6: helloworld.py
#!/usr/bin/env python

from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app

class MyHandler(webapp.RequestHandler):
    def get(self):
        self.response.out.write("Hello World!")

def main():
    app = webapp.WSGIApplication([
        (r'.*', MyHandler)], debug=True)
    run_wsgi_app(app)

if __name__ == "__main__":
    main()
The application starts by defining a class derived from webapp.RequestHandler that acts as the handler for incoming requests. The MyHandler class only processes GET requests: Its get() method is executed whenever a GET request arrives, and in this case, it simply outputs the string "Hello World!".
The second part of Listing 6 executes the web application through the Python Web Server Gateway Interface (WSGI), which is supported by the wsgiref library [10] that has been part of the standard library since Python 2.5. Here, run_wsgi_app() does essentially the same job as wsgiref.handlers.CGIHandler().run(app).
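Because webapp builds on WSGI, it can help to see the same "Hello World" as a bare WSGI callable without any framework. This sketch uses only the standard library's wsgiref package and targets a current Python 3 interpreter for local experiments; it is not a replacement for Listing 6 on App Engine.

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """Bare WSGI application doing what MyHandler.get() does in Listing 6."""
    # Like MyHandler, which only defines get(), accept GET requests only.
    if environ.get("REQUEST_METHOD") != "GET":
        start_response("405 Method Not Allowed",
                       [("Content-Type", "text/plain"), ("Allow", "GET")])
        return [b"Only GET is supported\n"]
    body = b"Hello World!"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


# Simulate a single request without starting a server.
environ = {}
setup_testing_defaults(environ)  # fills in a minimal GET request for "/"
captured = {}
body = hello_app(environ, lambda status, headers: captured.update(status=status))
print(captured["status"], b"".join(body))
```

To try it in a browser, pass hello_app to wsgiref.simple_server.make_server("localhost", 8080, hello_app) and call serve_forever() on the returned server.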
No matter how small your web application, you should resist the temptation to embed HTML tags directly in the Python script to format the output. This approach will lead to unmanageable spaghetti code sooner or later.
It makes far more sense to separate the display from the content. Google has taken this requirement into account by integrating the popular Django web application framework [11].
Because of the restrictions applied by the sandbox and the properties of the distributed Google database, some Django components have been disabled. The most important element, the Django Template Engine, still works as expected. Listing 7 shows a short example of separating display aspects and content.
Listing 7: Script with Django Template Engine
#!/usr/bin/env python

import os
from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app
from google.appengine.ext.webapp import template

class MyHandler(webapp.RequestHandler):
    def get(self):
        variable1 = "No"
        variable2 = "Idea"

        template_values = {
            'firstname': variable1,
            'familyname': variable2,
        }

        path = os.path.join(os.path.dirname(__file__), 'helloworld.html')
        self.response.out.write(template.render(path, template_values))

def main():
    app = webapp.WSGIApplication([
        (r'.*', MyHandler)], debug=True)
    run_wsgi_app(app)


if __name__ == "__main__":
    main()
The Python script starts by bundling variable1 and variable2 into a dictionary called template_values. It then passes this dictionary, together with the path to the HTML file helloworld.html (Listing 8), to template.render(). In the template, you access each value by the key under which it is stored in the dictionary.
Listing 8: The helloworld.html Template
<html>
  <body>
    <p>
      {{ firstname }} <i>{{ familyname }}</i>
    </p>
  </body>
</html>
The Django Template Engine replaces each placeholder in double curly braces with the value stored under that name. The helloworld.html file is thus a template whose content is supplied by the script.
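To make the substitution step concrete, here is a toy renderer for a current Python 3 interpreter that replaces {{ name }} placeholders with values from a dictionary. It is a deliberately simplified stand-in, not Django: The real Template Engine also supports filters, tags such as {% for %}, and dotted lookups, none of which this sketch handles.

```python
import re

# Matches {{ name }} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template_text, values):
    """Replace each {{ name }} with str(values[name]); unknown names become ''."""
    return PLACEHOLDER.sub(lambda m: str(values.get(m.group(1), "")),
                           template_text)


print(render("<p>{{ firstname }} <i>{{ familyname }}</i></p>",
             {"firstname": "No", "familyname": "Idea"}))
# <p>No <i>Idea</i></p>
```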
Do It Yourself
If you are not happy with the selection of libraries that Google offers, you can upload your own library along with the application and use it, provided it is written in pure Python, only uses functions available in the standard libraries, and respects the other sandbox restrictions.
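One common way to do this is to ship the library in a subdirectory of the project and put that directory on the module search path before importing from it. The sketch below, for a current Python 3 interpreter, assumes a hypothetical lib directory next to the script; both the directory name and the module mylibrary are placeholders.

```python
import os
import sys


def add_bundled_lib(base_dir, name="lib"):
    """Prepend base_dir/name to sys.path (only once) and return that path."""
    lib_dir = os.path.join(base_dir, name)
    if lib_dir not in sys.path:
        sys.path.insert(0, lib_dir)
    return lib_dir


# At the top of helloworld.py, before importing the bundled library:
# add_bundled_lib(os.path.dirname(os.path.abspath(__file__)))
# import mylibrary  # hypothetical pure-Python library inside lib/
```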
Bottomless Pits
A web application typically handles a large volume of data that it needs to store safely and persistently. The App Engine does not use a relational database for this; instead, it uses a powerful, distributed data storage service: the App Engine Datastore (or datastore for short). This database is based on the distributed GFS filesystem and Google's own BigTable storage system.
The latter has a couple of notable qualities, such as its ability to scale into the petabyte range and across several thousand computers.
In your own applications, you do not need to worry about the size and response times of the database. You can just drop everything into the database and rely on the results coming back in next to no time. To make this possible, the datastore does a few things differently than you might expect from a traditional relational database.
To be able to use the datastore, you need to import its API into your Python script:
from google.appengine.ext import db
Instead of creating a table, as you might in, say, MySQL, you now define a data model. A data model is a Python class that inherits from db.Model and whose attributes describe the data to be stored. Listing 9 shows an example.
Listing 9: Address Class
from google.appengine.ext import db

class Address(db.Model):
    name = db.StringProperty(required=True)
    street = db.StringProperty()
    zip = db.IntegerProperty()
    city = db.StringProperty()
    birthdate = db.DateProperty()
In the datastore's terminology, the data objects to be stored (entities) have a number of attributes (properties). Among the properties of the address in Listing 9 are name, street, and zip (the zip code). If you used an SQL database, these properties would be the columns of a table.
As with table columns, properties always have a type (property value type). The name, for example, is a string: db.StringProperty().
Additionally, developers can define constraints for properties. For example, an address as defined in Listing 9 must always contain a name: required=True.
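How can a plain class declaration enforce something like required=True? The following sketch for a current Python 3 interpreter imitates the idea with Python descriptors. Property, Model, and the trimmed-down Address are simplified hypothetical stand-ins, not the implementation in google.appengine.ext.db, which also handles storage, indexing, and many more property types.

```python
class Property:
    """Typed attribute with an optional 'required' constraint (simplified)."""

    def __init__(self, kind, required=False):
        self.kind = kind
        self.required = required
        self.name = None

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        if value is None:
            if self.required:
                raise ValueError("Property %s is required" % self.name)
        elif not isinstance(value, self.kind):
            raise TypeError("Property %s must be of type %s"
                            % (self.name, self.kind.__name__))
        instance.__dict__[self.name] = value


class Model:
    """Assigns keyword arguments; missing properties are validated as None."""

    def __init__(self, **kwargs):
        for name, attr in vars(type(self)).items():
            if isinstance(attr, Property):
                setattr(self, name, kwargs.pop(name, None))
        if kwargs:
            raise TypeError("Unknown properties: %s" % ", ".join(sorted(kwargs)))


class Address(Model):
    """Trimmed-down, hypothetical counterpart of Listing 9."""
    name = Property(str, required=True)
    street = Property(str)
    zip = Property(int)


harry = Address(name="Harry Kowalski", zip=12345)
print(harry.name, harry.street)  # Harry Kowalski None
# Address(street="Main street 6") would raise ValueError: name is required.
```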
After defining the data model, you can create a concrete data object (Listing 10). As with an object database, you then simply push the object into the datastore with anaddress.put().
Listing 10: Data Object
import datetime

anaddress = Address(name="Harry Kowalski",
                    street="Main street 6",
                    zip=12345,
                    city="Kansas",
                    birthdate=datetime.date(2002, 2, 2))
anaddress.put()
Gimme!
A database query returns the stored data. The datastore uses GQL, an extremely lean variant of SQL. For example, the following command fishes all the addresses out of the database; you can then iterate over the resulting query object, addresses, to access them:
addresses = db.GqlQuery('SELECT * FROM Address')
Listing 11 shows the complete GQL syntax. If you are familiar with SQL, the reduced feature set might come as a surprise at first. Besides details such as LIKE, GQL also lacks a JOIN operation.
Listing 11: GQL Syntax
SELECT [* | __key__] FROM <kind>
  [WHERE <condition> [AND <condition> ...]]
  [ORDER BY <property> [ASC | DESC] [, <property> [ASC | DESC] ...]]
  [LIMIT [<offset>,]<count>]
  [OFFSET <offset>]

<condition> := <property> {< | <= | > | >= | = | != } <value>
<condition> := <property> IN <list>
<condition> := ANCESTOR IS <entity or key>
According to Google, the distributed nature of the database rules out this operation. A join would only be practical if all the data involved resided on a single machine. However, given the potential data volume the datastore has to handle, the content is always distributed over multiple computers.
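If you need join-like results anyway, one option is to fetch both sets of entities and combine them in your own code. The sketch below, for a current Python 3 interpreter, does this with plain dictionaries standing in for query results; the customer and order records and the customer_key field are invented for illustration.

```python
def join_on_key(orders, customers):
    """Attach the matching customer name to each order (inner join).

    orders:    list of dicts with 'item' and 'customer_key' fields
    customers: list of dicts with 'key' and 'name' fields
    """
    by_key = {c["key"]: c for c in customers}  # index once: O(n + m) overall
    joined = []
    for order in orders:
        customer = by_key.get(order["customer_key"])
        if customer is not None:  # drop orders without a customer (inner join)
            joined.append({"order": order["item"], "customer": customer["name"]})
    return joined


customers = [{"key": 1, "name": "Harry Kowalski"}, {"key": 2, "name": "No Idea"}]
orders = [{"item": "coffee machine", "customer_key": 1},
          {"item": "book", "customer_key": 3}]
print(join_on_key(orders, customers))
# [{'order': 'coffee machine', 'customer': 'Harry Kowalski'}]
```

This only works when both result sets are small enough to fetch; for large volumes, the usual alternative is to store the needed data redundantly in the entity you query.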
Xenophobic
For user management, App Engine applications use Google's Users API. With this API, just a few lines of code are enough to send visitors to a complete login form, so Google account holders can log in directly to the web application. After a successful login, the web application knows the user's nickname and email address, as well as whether the logged-in user is an administrator of the application. This feature gives developers a simple way to create special administrative back ends.
Also, thanks to the Users API, you don't need to set up additional accounts for users, and application developers needn't worry about implementing user management. However, this design does prevent you from porting your application to a different hosting service, and all users must register for a Google account, which might scare off potential users if they do not trust Google.
Uploading
After completing your web application and making sure it runs without error in the SDK environment, the next step is to upload it to the Google servers. To do so, you need a Google user account and a cell phone. If you don't have an account, you can register for a free account [5]. After logging in to appengine.google.com with your Google credentials, click Create an Application in the dialog that appears to make space for your new web application. As of this writing, each user can run up to 10 applications at the same time.
The next step is a slightly laborious verification of your account. You need to enter your cell phone number (possibly including the international dialing code), and Google then sends a text message to this number with a cryptic numeric code that you have to enter in the next dialog box. After negotiating this obstacle, you are finally free to create your web application.
To do this, you need to choose a free subdomain of appspot.com. This address is where you can reach your web application later. Click Check Availability to find out whether your choice of subdomain is still available. The subdomain is also the official name of your web application (the application identifier), and you need to enter this application identifier as the value of the application: parameter in the configuration file (app.yaml). Whereas the application identifier is mainly used internally, the Application Title, which you also need to provide, is shown to users. For this title, you can choose an arbitrary name.
After you acknowledge the "General Terms and Conditions" and click Save, Google will give you the 500MB disk space you were promised. To deposit your own web application in this space, just run appcfg.py from the SDK:
./appcfg.py update ~/helloworld
This command pushes the application from the ~/helloworld directory to the server. To do this, it needs your Google username (that's the email address in the upper right-hand corner) and the corresponding password. And that's all, folks! As of now, your web application will be waiting for you and other visitors at <application-id>.appspot.com.
Command Center
The Dashboard, also known as the administration console (Figure 2), provides detailed statistics on your web application's load, manages your domain name, displays the error logs, and gives you a peek into the datastore.
Another Dashboard function is automatic version management: If you create a modified version of your application, increment the version parameter in your configuration file (app.yaml) before uploading it.
App Engine will detect the change and back up the previous version. Also, the previous version continues running on the server until you explicitly state that you want to switch to the new version. Even then, the old version is simply moved into an archive, so you can switch back (Figure 3).
Criticism
The Google App Engine is fun to use. It tempts developers with free disk space and takes much of the grunt work off their shoulders. However, this convenience is probably also App Engine's biggest weakness. Exclusive services like the datastore mean you must commit to Google; it will be very difficult to move to a different hosting service later, and even more painful if you hit the limits of the free variant and Google starts billing you.
Whether Google App Engine suits your business might also depend on the programming languages you use. Right now, you can choose any language, as long as it is Python or Java (Go support is still experimental).