Building a Seppia Application from Scratch

Introduction

The purpose of this article is to show you how to build a simple Seppia application from scratch. Instead of a single .zip file to install and run, you will find instructions to follow step by step. We believe this gives you a better opportunity to become familiar with the Seppia technology.

These are the steps we will take:

  1. Define the application we need to write
  2. Identify existing open source software we might be able to use (who wants to reinvent the wheel these days?)
  3. Write a Seppia Application to solve point 1

The Application we need to write

The user requirements are set: "We need a software application able to check for spelling errors in a PDF file (and of course we need it now)".
In the mind of any experienced developer such requirements can be split into two separate tasks:

  1. Extract the text from the PDF file
  2. Check the extracted text for spelling errors

Open Source Projects


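A quick search for open source software to incorporate leads us to the following:

  1. jazzy (http://jazzy.sourceforge.net/) for spell checking
  2. pdfbox (http://www.pdfbox.org) for extracting text from PDF files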

Note: many interesting alternative products for spell checking and PDF manipulation are available. Our choices were based on only a few minutes of evaluation and should not be taken as a preference for, or a recommendation of, one product over another.

 

Building The Application


We can now begin to build our Seppia application. Five steps await you before victory. We hope you will enjoy them.

1.

The first thing to do is to give our application a name. Let's call it GBShaw in honour of the famous writer George Bernard Shaw (1856-1950). Create the directory c:\gbshaw and unzip Seppia.zip into it so that it looks like this:


c:\gbshaw\StartUp.class      
c:\gbshaw\jars\              
c:\gbshaw\modules\ 
...

At the DOS prompt we should be able to type the command below and see the following output:


c:\gbshaw>java -cp . StartUp
Seppia successfully installed at URL file:/C:/gbshaw/
 

2.

We now want to create the main module for this application.
Let's call it "org.gbshaw.core" and let's create a javascript for it: 

module/folder: org.gbshaw.core 
id/file:       Main.js 

function main() 
{ 
   java.lang.System.out.println(this +" running"); 
} 

At the moment this javascript does not do much, but later on we will modify it to glue together the work of the other modules.
Let's also not forget to change "org.seppia.bootstrap.StartUp.js" so that it launches our new javascript.

module/folder: org.seppia.bootstrap
id/file:       StartUp.js 

//
// redirecting to org.gbshaw.core.Main.js
//
function main() 
{ 
   run("org.gbshaw.core","Main");
} 

So the directory structure should look like this now: 


c:\gbshaw\StartUp.class      
c:\gbshaw\jars\   
c:\gbshaw\modules\ 
c:\gbshaw\modules\org.gbshaw.core\javascripts\Main.js
c:\gbshaw\modules\org.seppia.bootstrap\javascripts\StartUp.js (modified)
...
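
Judging from the listing above, the javascript with a given id in a given module is stored at modules\<module>\javascripts\<id>.js under the application root. The helper below is only a sketch of that convention as it appears in this article; scriptPath is a hypothetical name and not part of Seppia.

```javascript
// Hypothetical helper (not a Seppia API): builds the path where, judging from
// the directory listing above, the javascript <id> of <module> is stored.
function scriptPath(root, module, id)
{
   return root + "\\modules\\" + module + "\\javascripts\\" + id + ".js";
}
```

For example, scriptPath("c:\\gbshaw", "org.gbshaw.core", "Main") gives c:\gbshaw\modules\org.gbshaw.core\javascripts\Main.js.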

Still at the DOS prompt, we should be able to type the command below and see the following output:


c:\gbshaw>java -cp . StartUp
org.gbshaw.core.Main running
 

3.

The next step is to create a module for the spell checker. As we said, we have decided to use jazzy (http://jazzy.sourceforge.net/).
At the time of writing, the version of jazzy is 0.5 and its development status is "3-alpha".
Now you need to do the following:

Finally write the following javascript file.

module/folder: org.gbshaw.spell
id/file:       SpellChecker.js

var File =                   java.io.File; 
var HashSet =                java.util.HashSet; 
var SpellDictionary =        Packages.com.swabunga.spell.engine.SpellDictionary; 
var SpellDictionaryHashMap = Packages.com.swabunga.spell.engine.SpellDictionaryHashMap; 
var SpellCheckEvent =        Packages.com.swabunga.spell.event.SpellCheckEvent; 
var SpellCheckListener =     Packages.com.swabunga.spell.event.SpellCheckListener; 
var SpellChecker =           Packages.com.swabunga.spell.event.SpellChecker; 
var StringWordTokenizer =    Packages.com.swabunga.spell.event.StringWordTokenizer; 

//
// Creates a JavaScript Object with one method to check the spelling on a text.
//
function main() 
{ 
   var obj = new Object(); 
   obj.execute = execute; 
   return obj; 
} 

// 
// This function receives a String (text) and parses for misspelt words.
//
function execute(text) 
{ 
   var url = Packages.org.seppia.core.IO.subURL(module.url,"dictionary/english.0");
   var file =  new File(url.getFile());
   var dictionary = new SpellDictionaryHashMap(file, null); 
   var spellCheck = new SpellChecker(dictionary); 
   var badWords = new HashSet(); 
   // Register the listener before checking, otherwise no error is reported.
   spellCheck.addSpellCheckListener(function spellingError(event) 
   { 
      var invalidWord = event.getInvalidWord();           
      if (invalidWord.length()<2) return;         // ignore one-character words
      if (badWords.contains(invalidWord)) return; // report each word only once
      badWords.add(invalidWord); 
      java.lang.System.out.println("MISSPELT WORD: " + invalidWord); 
   }); 
   spellCheck.checkSpelling(new StringWordTokenizer(text)); 
} 

The above javascript gives us a mechanism to pass in a string (a text) and have its misspelt words reported on the console.
We can easily test it by changing org.gbshaw.core.Main to use it.

module/folder: org.gbshaw.core
id/file:       Main.js

function main() 
{ 
   var spellChecker = run("org.gbshaw.spell","SpellChecker");
   spellChecker.execute("What a beautiful day to make a mistaaake"); 
} 

Still at the DOS prompt, we should be able to type the command below and see the following output:


c:\gbshaw>java -cp . StartUp
MISSPELT WORD: mistaaake
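
The listener in SpellChecker.js applies two small rules before reporting a word: words shorter than two characters are skipped and each word is reported only once. The sketch below isolates those rules in plain JavaScript so they can be tried without jazzy; uniqueBadWords is a hypothetical name and a plain object stands in for the HashSet.

```javascript
// Stand-alone sketch of the filtering done inside the spelling listener:
// words shorter than two characters are skipped and each word is kept once.
function uniqueBadWords(words)
{
   var seen = {};      // plays the role of the HashSet in SpellChecker.js
   var result = [];
   for (var i = 0; i < words.length; i++)
   {
      var word = String(words[i]);
      if (word.length < 2) continue;
      if (Object.prototype.hasOwnProperty.call(seen, word)) continue;
      seen[word] = true;
      result.push(word);
   }
   return result;
}
```

Keeping the rules in a function of their own would make it easy to return the list of misspelt words to the caller instead of only printing them.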

4.

We now want to create a module for extracting text from a PDF file. Our choice fell on pdfbox (http://www.pdfbox.org),
downloadable at http://sourceforge.net/project/showfiles.php?group_id=78314. At the time of writing the available version is 0.6.7a. These are the instructions:

Finally create the following javascript:

module/folder: org.gbshaw.pdf
id/file:       TextExtractor.js

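var Properties =           java.util.Properties; 
var FileInputStream =      java.io.FileInputStream; 
var StringWriter =         java.io.StringWriter; 
// The package names below assume log4j 1.x and PDFBox 0.6.x; adjust them if yours differ.
var PropertyConfigurator = Packages.org.apache.log4j.PropertyConfigurator; 
var PDFParser =            Packages.org.pdfbox.pdfparser.PDFParser; 
var PDFTextStripper =      Packages.org.pdfbox.util.PDFTextStripper; 

//	Silencing LOG4J (used internally by PDF Box)
var p = new Properties(); 
p.setProperty("log4j.rootCategory","OFF"); 
PropertyConfigurator.configure(p); 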

//
// Creates a JavaScript Object with just one method to extract text from a pdf file.
//
function main() 
{ 
   var obj = new Object(); 
   obj.execute = execute; 
   return obj; 
} 

//
// This function receives a file (.PDF) in input and returns String (text)
// with the extracted text.
//
function execute(file) 
{ 
   var inputStream = new FileInputStream(file); 
   var parser = new PDFParser(inputStream); 
   parser.parse(); 
   inputStream.close(); 
   var document = parser.getPDDocument(); 
   var stripper = new PDFTextStripper(); 
   var sw = new StringWriter(); 
   stripper.writeText(document, sw); 
   document.close();
   return sw.getBuffer().toString(); 
}

To test this module we need a PDF file (c:\eclipse-overview.pdf in this example) and, once more, a modified Main.js:

 

module/folder: org.gbshaw.core
id/file:       Main.js

var File = java.io.File;

function main()
{
   var textExtractor = run("org.gbshaw.pdf","TextExtractor");
   var s = textExtractor.execute(new File("c:\\eclipse-overview.pdf"));
   java.lang.System.out.println(s);
} 

Still at the DOS prompt, we should be able to type the command below and see the following output:


c:\gbshaw>java -cp . StartUp
Eclipse Platform 
Technical Overview 
Object Technology International, Inc. 
February 2003 (updated for 2.1; originally published July 2001) 

Abstract: The Eclipse Platform is designed for building 
...
...
 

5.

Now that we have written two useful modules, the last remaining thing to do is to glue their functionality together. This can be achieved by modifying (once again) the javascript org.gbshaw.core.Main.

module/folder: org.gbshaw.core
id/file:       Main.js

var System = java.lang.System;
var File =   java.io.File;

function main()
{
   var f = new File("c:\\eclipse-overview.pdf");
   
   System.out.println("GB Shaw Started.");
   System.out.println("Analyzing File: "+f);
   
   var textExtractor = run("org.gbshaw.pdf","TextExtractor");
   var spellChecker = run("org.gbshaw.spell","SpellChecker");
   var s = textExtractor.execute(f);
   spellChecker.execute(s); 
}

Still at the DOS prompt, we should be able to type the command below and see the following output:


c:\gbshaw>java -cp . StartUp
GB Shaw Started.
Analyzing File: c:\eclipse-overview.pdf
MISSPELT WORD: IDEs
MISSPELT WORD: Java
MISSPELT WORD: JavaBeans
MISSPELT WORD: Workspaces
MISSPELT WORD: Workbench_and_UI_Toolkits
MISSPELT WORD: JFace
MISSPELT WORD: UI_Integration
MISSPELT WORD: JDT_Features
MISSPELT WORD: Java_Projects
MISSPELT WORD: Java_Compiler
MISSPELT WORD: Java_Model
MISSPELT WORD: Java_UI
MISSPELT WORD: workspace
MISSPELT WORD: dos
MISSPELT WORD: Kinsella
MISSPELT WORD: ISVs
...
...

Congratulations, you have finished your Seppia application. Of course there are still things to do (for example, we have hardcoded the name of the PDF file in the Main javascript), yet the application meets the requirements we set out at the beginning of the article, and it should not be too difficult to go back, tidy it up and improve it.
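
As one possible first tidy-up, the glue code can be moved into a function that receives the file as a parameter, so that the name is no longer hardcoded where the work is done. This is only a sketch: checkFile is a hypothetical name, and the two other arguments are assumed to be objects with an execute method, like the ones returned by run(...) in the listings above.

```javascript
// Sketch: the glue from Main.js as a function of the file to analyze.
// textExtractor and spellChecker are any objects exposing execute(),
// such as those returned by run("org.gbshaw.pdf","TextExtractor") and
// run("org.gbshaw.spell","SpellChecker").
function checkFile(file, textExtractor, spellChecker)
{
   var text = textExtractor.execute(file);   // file -> extracted text
   return spellChecker.execute(text);        // text -> spelling report
}
```

Inside Main.js it could be called as checkFile(f, textExtractor, spellChecker), with f, textExtractor and spellChecker created exactly as in step 5.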

Final Considerations

Now that your Seppia application is complete, you might wonder whether it would not have been better to write it as a standard Java application. We think not, and we would like to explain why.

Because Seppia technology sets the rules for where to place jars, javascripts and other resources, your application benefits from an elegant and well-formed directory structure.
A valuable consequence is that there is no classpath to set, no -D properties to pass and no string array to hand to the main method.
Therefore, apart from providing additional material such as readme files, documents and licenses, shipping your application only requires zipping its root directory.

The requirements of your application will change. Rather than parsing just one .pdf file you might need to work with many .pdf files, perhaps selecting only the files whose names begin with a certain sequence of characters.
To address this requirement you simply need to modify the javascript "Main" in the module "org.gbshaw.core". And that is it. There is nothing else to do: no code to recompile, no new jars to ship and no classes to patch.
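
A minimal sketch of such a selection is shown below, written in plain JavaScript so that it could be pasted into Main.js. The name selectPdfNames is hypothetical, and the list of names is assumed to come from somewhere else (for example from listing a directory).

```javascript
// Hypothetical helper: from a list of file names keep the .pdf files whose
// name starts with the given prefix. The extension check ignores case,
// the prefix check does not.
function selectPdfNames(names, prefix)
{
   var selected = [];
   for (var i = 0; i < names.length; i++)
   {
      var name = String(names[i]);          // also accepts Java strings
      var isPdf = /\.pdf$/i.test(name);
      var hasPrefix = name.indexOf(prefix) === 0;
      if (isPdf && hasPrefix) selected.push(name);
   }
   return selected;
}
```

Main could then loop over the selected names and pass each file to the text extractor and the spell checker in turn.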

GBShaw has become popular and now we need to be able to parse .doc files. That is not too difficult! If we can find good third-party software able to do most of this work, all we need to do is glue it into our application: create a new module to work with .doc files, add the jar files to the jar directory and write a javascript similar to "org.gbshaw.pdf.TextExtractor" to do the work.
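
One way Main could choose between the two extractors is a small table keyed by file extension. The sketch below assumes the new module is called "org.gbshaw.doc" and that its javascript is again named "TextExtractor"; both names are hypothetical, and only the .pdf entry exists in this article.

```javascript
// Table mapping a file extension to the module/id of its text extractor.
// The "pdf" entry comes from this article; the "doc" entry is a
// hypothetical module that would have to be written first.
var EXTRACTORS = {
   "pdf": { module: "org.gbshaw.pdf", id: "TextExtractor" },
   "doc": { module: "org.gbshaw.doc", id: "TextExtractor" }
};

// Returns the table entry for the file's extension,
// or null when no extractor is registered for it.
function extractorFor(fileName)
{
   var name = String(fileName);
   var dot = name.lastIndexOf(".");
   if (dot < 0) return null;
   var ext = name.substring(dot + 1).toLowerCase();
   return Object.prototype.hasOwnProperty.call(EXTRACTORS, ext) ? EXTRACTORS[ext] : null;
}
```

With an entry e returned by extractorFor, Main could obtain the matching extractor with run(e.module, e.id) and call its execute method as before.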