Guide
running
-
The goal of Envjs is to provide a highly portable javascript implementation of the browser as a scripting environment (often referred to as a 'headless' browser).
The default implementation is Rhino, but many developers have developed, and continue to develop, bridges for running Envjs in Ruby, Python, and other host languages with the SpiderMonkey and V8 javascript engines, to name a few.
The examples below will guide you through how to use Envjs as an end-user, noting in each example which platform the example pertains to. Platform developers should add equivalent examples whenever possible.
A Warning...
Envjs will not automatically load and run external javascript unless the script tags have the attribute type='text/envjs'. To enable all external javascript files, you only have to tell Envjs to do so. Remember, however, that all javascript executed will have read/write access to your file system.
Please be aware of the dangers of loading arbitrary code in an insecure environment.
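The rule in this warning can be modeled in a few lines. This is only an illustrative sketch: `shouldRunScript` is a hypothetical helper, not part of the Envjs API, and it assumes that a script runs when its type is 'text/envjs' or when that type has been switched on in a `scriptTypes` map.

```javascript
// Hypothetical model of the script gating rule; this is not Envjs source code.
// scriptTypes maps a script tag's type attribute to true or false.
function shouldRunScript(scriptTypes, type) {
    type = type || ''; // assumption: a missing type attribute is treated as ''
    if (type === 'text/envjs') {
        return true; // scripts explicitly marked for Envjs always run
    }
    return scriptTypes[type] === true;
}

var defaults = {};
var permissive = { '': true, 'text/javascript': true };

console.log(shouldRunScript(defaults, 'text/javascript'));   // false
console.log(shouldRunScript(defaults, 'text/envjs'));        // true
console.log(shouldRunScript(permissive, 'text/javascript')); // true
```

Keeping the map empty by default means a page has to opt in before any of its code can touch your file system.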
-
rhino
When running with generic rhino, you only need the latest rhino (the rhino bundled with java 1.6 is not recent enough). You will also need to write a javascript file that is responsible for loading env.rhino.js, setting any available options, and finally setting window.location.
#!bash
# Running env.rhino.js from a script or the command line
# Note the optimization setting
java -jar lib/js.jar -opt -1 myscript.js

# or to simply invoke the javascript shell (you can then manually enter
# the steps in the javascript example below)
java -jar js.jar

# rhino also has a great debugger built right into it which
# can be run with the following:
java -cp lib/js.jar org.mozilla.javascript.tools.debugger.Main myscript.js
#!bash
# creating an alias will simplify the commands we need to
# invoke to run with rhino, and in general makes our examples
# more applicable to using envjs on other javascript platforms
alias js="\
java -cp lib/js.jar org.mozilla.javascript.tools.shell.Main -opt -1"
alias jsd="\
java -cp lib/js.jar org.mozilla.javascript.tools.debugger.Main"

# now you can just invoke
# > js myscript.js
# or even just
# > js
# to enter an interactive shell, similarly,
# > jsd myscript.js
# will open the rhino debugger
-
console
Envjs can also be run in a console mode, as mentioned above, and it's a great way to explore its features. Rhino 1.7rc2 provides a great console. On Mac OS X you may find that your up/down/left/right arrows and tab completion don't work. If this occurs, just add JLine to the mix.
In the following example console session, you'll notice that after loading the env.rhino.js script, we enable the scriptType 'text/javascript'. Then we can simply open a url, and the javascript in that page will be loaded (in this case making jQuery automatically available to use).
#!bash
# in case your up/down/left/right arrows don't work in console mode
alias js="java -cp lib/js.jar:lib/jline.jar \
jline.ConsoleRunner org.mozilla.javascript.tools.shell.Main -opt -1 "

#!bash
# An example console session
> js
Rhino 1.7 release 2 2009 03 22
js> load('lib/env.rhino.js')
[ Envjs/1.6 (Rhino; U; Mac OS X i386 10.5.8; en-US; rv:1.7.0.rc2) Resig/20070309 PilotFish/1.2.13 ]
js> Envjs.scriptTypes['text/javascript'] = true;
true
js> window.location = "http://localhost:8080/"
http://localhost:8080/
js> jQuery
function (selector, context) {
    return new jQuery.fn.init(selector, context);
}
js> $('#welcome > p').text()
Envjs is a simulated browser environment written in javascript. It was originally developed by John Resig and discussed in his blog here. Envjs is now supported by a community of developers who all use Envjs as part of their own open source projects.
js> console.log('successfully loaded %s', document.location);
successfully loaded http://localhost:8080/
js> $.get('http://localhost:8080/rest/', null, function(response){ console.log(response) }, 'text')
[object Object]
js> {
  "db": "http://appengine.google.com/1.0/",
  "request": "1273661929373_75_61582614",
  "cpu": "n/a",
  "domains": [
    "apis",
    "distributables",
    "events",
    "guides",
    "news",
    "releases"
  ]
}
# ctrl-c or cmd-c to exit
-
embed
Embedding env.rhino.js in a Java application is relatively easy. The following snippet shows the general pattern.
#!java
import org.mozilla.javascript.Context;
import org.mozilla.javascript.ContextFactory;
import org.mozilla.javascript.tools.shell.Global;
import org.mozilla.javascript.tools.shell.Main;
...
Context cx = ContextFactory.getGlobal().enterContext();
cx.setOptimizationLevel(-1);
cx.setLanguageVersion(Context.VERSION_1_5);
Global global = Main.getGlobal();
global.init(cx);
Main.processSource(cx, "path/to/your/JSfile");
walking
-
Now that you are up and running, it's time to slow down and check out what Envjs has to offer. The goal of Envjs is simply to emulate the browser client-side javascript environment. All you have to do is load env.rhino.js, configure, and go.
-
load
Load the proper env.js file for your platform. Currently only rhino is supported on our primary github branch, though our goal is to support arbitrary javascript engines on many hosting languages.
// assuming you ran rhino from the command line with -opt -1
// you can go ahead and load up env.rhino.js
load('env.rhino.js');
// now you can load any additional utility scripts you might
// be using to manipulate the page once loaded
load('lib/jquery.js');

// if you don't want to bother invoking rhino with -opt -1 you
// can also set the optimization level before loading scripts
Packages.org.mozilla.javascript.Context.
    getCurrentContext().setOptimizationLevel(-1);
// now load env.rhino.js and utility scripts
load('env.rhino.js');
-
configure
Optionally, you can turn settings on or off by calling Envjs as a function and passing it an options object.
In this example we enable loading and execution of anonymous and external javascript files by setting our scriptTypes, then we hook into qunit using 'afterScriptLoad' to detect when it has loaded.
Envjs({
    scriptTypes: {
        '': true, //anonymous and inline
        'text/javascript': true
    },
    afterScriptLoad:{
        "qunit": function(script){
            var count = 0, module;
            // track test modules so we can include them in logs
            QUnit.moduleStart = function(name, settings) {
                module = name;
            };
            // hook into QUnit log so we can log test results
            QUnit.log = function(result, message){
                console.log(
                    '{%s}(%s)[%s] %s ',
                    module, count++, result ? 'PASS' : 'FAIL', message
                );
            };
            // hook into qunit.done and write resulting html to
            // a new file. Be careful to neutralize script tags so
            // opening the file in the browser allows the results
            // to act as a static report without re-running tests
            QUnit.done = function(fail, pass){
                console.log('PASSED: %s FAILED: %s', pass, fail);
                //Writing Results to File
                jQuery('script').each(function(){
                    this.type = 'text/envjs';
                });
                Envjs.writeToFile(
                    document.documentElement.outerHTML,
                    Envjs.uri('results.html')
                );
            };
        }
    }
});
-
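The keys of 'afterScriptLoad' (such as "qunit" in the configuration example) appear to be matched against the src of each script as it loads. As a rough sketch of that idea, assuming each key is treated as a regular expression tested against script.src (an assumption, not confirmed Envjs behavior), a dispatcher could look like this. `runHooks` and the sample script object are hypothetical names used only for illustration.

```javascript
// Hypothetical dispatcher: calls every hook whose key matches script.src.
// Assumption: keys are regular expression sources. This is not Envjs code.
function runHooks(hooks, script) {
    var fired = [];
    for (var pattern in hooks) {
        if (hooks.hasOwnProperty(pattern) &&
                new RegExp(pattern).test(script.src)) {
            hooks[pattern](script);
            fired.push(pattern);
        }
    }
    return fired;
}

var hooks = {
    'qunit': function (script) { script.seenBy = 'qunit hook'; },
    // '.' matches any non-empty src, so this hook sees every external script
    '.': function (script) { script.type = 'text/envjs'; }
};

var script = { src: 'lib/qunit.js', type: 'text/javascript' };
console.log(runHooks(hooks, script).join(', ')); // qunit, .
console.log(script.type);                        // text/envjs
```

Under this reading, a short key like 'qunit' is enough to target one library, while '.' acts as a catch-all.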
go
Tell env.js to load an HTML file.
//This is how we prefer you do it, just like the browser...
window.location = 'some/file.html';
//The HTML application does not need to be local!
window.location = 'http://example.com/some/file.html';
// The following are deprecated but documented
// for backward compatibility
Envjs('some/file.html');
// or equivalently you can take care of settings and location
Envjs('some/file.html', {
    scriptTypes: {
        'text/javascript': true
    }
});
crawling
-
With Envjs, we refer to crawling loosely as the act of injecting script into an HTML application. This means you could be adding jQuery to the browser environment and grepping for all links, potentially storing them and following them, or more simply allowing script to run while monitoring the DOM or state changes or logging test results to the console.
Crawling can be achieved by combining a little shell scripting with Envjs script loading hooks or with pure javascript.
-
ajax
The safest way to crawl the wild wonderful web is to bootstrap the crawler with your own html page, load your favorite javascript libraries, and crawl with ajax. A basic bootstrap is very straightforward and much safer than assuming you can load any url with Envjs and trust that it's not malicious.
In the following example we will use jQuery, but you should be able to easily adapt the examples for most libraries. Remember one of the strengths of Envjs is that you can automate a crawler with cron jobs and multiple threads, explore the guts of the web, and build your own search indexes with your favorite javascript library.
#!bash
>js plugins/env.robot.js
[ Envjs/1.6 (Rhino; U; Mac OS X i386 10.5.8; en-US; rv:1.7.0.rc2) Resig/20070309 PilotFish/1.2.13 ]
deleted search index
created search index
indexed document http://localhost:8080/
indexed document /
indexed document /releases
indexed document /docs
indexed document /support
indexed document /news
indexed document /release/envjs-1.0.x
indexed document /events
indexed document /doc/guides
indexed document /doc/apis
/**
 * @author thatcher
 */
load('dist/env.rhino.js');
load('plugins/jquery.js');

function scrape(url, links){
    // scrape text from current document which we will
    // assign weights to in our search index
    var data = {
        $id: encodeURIComponent(url),
        url: url,
        full_text: $(document.body).text(),
        title: document.title,
        headings: $('h1, h2, h3, h4, h5, h6').text(),
        description: $('meta[name=description]').attr('content'),
        // fall back to '' so pages without a keywords meta tag don't throw
        keywords: ($('meta[name=keywords]').attr('content') || '').split(',')
    };

    // find all the relevant links, but don't include any we
    // already have in our link array
    $('a[href]').each(function(){
        var href = $(this).attr('href');
        if($.inArray(href, links) == -1 && !href.match(/^(\s)*http|#/)){
            //we only want to crawl local links
            links.push(href);
        }
    });

    // save the record to our index
    $.ajax({
        url:'http://localhost:8080/rest/index/'+data.$id,
        contentType:'application/json',
        dataType:'json',
        type: 'post',
        async: false,
        data: JSON.stringify(data),
        processData: false,
        success: function(){
            console.log('indexed document %s', url);
        }
    });
}

$(function(){
    // delete the index to start fresh
    $.ajax({
        url:'http://localhost:8080/rest/index/',
        contentType:'application/json',
        dataType:'json',
        type:'delete',
        async: false,
        success: function(){
            console.log('deleted search index');
        }
    });

    // create the search index we will populate with
    // our simple crawl
    $.ajax({
        url:'http://localhost:8080/rest/index/',
        contentType:'application/json',
        dataType:'json',
        type:'put',
        async: false,
        success: function(){
            console.log('created search index');
        }
    });

    // create an array which we'll use
    // to store relevant links to crawl
    var links = [];

    // index this document
    scrape(document.location.toString(), links);

    // now crawl our links
    for(var i = 0; i < links.length; i++){
        try{
            // replaces this document with the document
            // from the link
            document.location = Envjs.uri(links[i]);
            scrape(links[i], links);
        }catch(e){
            console.log('failed to load %s \n %s', links[i], e);
        }
    }
});

window.location = 'http://localhost:8080/';
-
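The link filter is the part of the crawler most worth checking in isolation. The following sketch pulls the same checks into a standalone function; `collectLocalLinks` and the sample hrefs are hypothetical, and it uses plain arrays instead of jQuery so it can run in any javascript shell.

```javascript
// Hypothetical helper mirroring the crawler's link filter.
// Adds hrefs to `links` unless they are already queued, start with
// http (absolute urls), or contain a '#'.
function collectLocalLinks(hrefs, links) {
    for (var i = 0; i < hrefs.length; i++) {
        var href = hrefs[i];
        var seen = links.indexOf(href) !== -1;
        var external = /^\s*http/.test(href);
        var fragment = href.indexOf('#') !== -1;
        if (!seen && !external && !fragment) {
            links.push(href);
        }
    }
    return links;
}

var links = ['/docs'];
collectLocalLinks(
    ['/docs', '/news', 'http://example.com/', '#top', '/news', '/events'],
    links
);
console.log(links.join(' ')); // /docs /news /events
```

Because the crawl loop re-reads links.length on every pass, links pushed while scraping one page are visited later in the same run, so the array doubles as the crawl queue.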
testing
Envjs provides an amazing bridge to facilitate continuous testing cycles for web applications. There are many ways to achieve this, but in our example we show how to use qunit to create tests a developer can run directly in the browser while they're working, or run from the command line with Envjs.
Of course if you can run them at the command-line you can run them as part of your continuous testing scripts.
You can run these tests right now by going to test. Below is the script, followed by the command itself to run them at the command line. Additionally, we write the test results to a file, which you can see here: test results.
/**
 * @author thatcher
 */
load('lib/env.rhino.js');
load('local_settings.js');

var starttime = new Date().getTime(),
    endtime;

Envjs({
    // let it load the script from the html
    scriptTypes: {
        "text/javascript" :true
    },
    // we don't need to load the commercial share this widget
    // for these continuous testing cycles, plus I like to
    // run my tests locally when I'm on the train without
    // a real network connection
    beforeScriptLoad: {
        'sharethis': function(script){
            script.src = '';
            return false;
        }
    },
    // we are also going to hook into qunit logging and
    // qunit done so we can write messages to the console
    // as tests run, and when complete can write the resulting
    // file out as a static report of test results
    afterScriptLoad: {
        'qunit': function(){
            //console.log('loaded test runner');
            //hook into qunit.log
            var count = 0,
                module;
            // plugin into qunit
            QUnit.moduleStart = function(name, testEnvironment) {
                module = name;
            };
            QUnit.log = function(result, message){
                console.log('{%s}(%s)[%s] %s',
                    module, count++, result ? 'PASS' : 'FAIL', message
                );
            };
            QUnit.done = function(fail, pass){
                endtime = new Date().getTime();
                console.log(
                    'RESULTS: ( of %s total tests )\n' +
                    'PASSED: %s\n' +
                    'FAILED: %s\n' +
                    'Completed in %s milliseconds.',
                    pass+fail, pass, fail, endtime-starttime
                );
                console.log('Writing Results to File');
                jQuery('#qunit-testrunner-toolbar').
                    text('').
                    attr('id', '#envjs-qunit-testrunner-toolbar');
                if(fail === 0){
                    jQuery('#qunit-banner').attr('class', 'qunit-pass');
                }
                Envjs.writeToFile(
                    document.documentElement.outerHTML,
                    Envjs.uri(REPORTS + 'tests.html')
                );
            };
        },
        // when writing our report we don't want the tests
        // to be run again when we view the file in a
        // browser so set script tags to non-standard type
        '.': function(script){
            script.type = 'text/envjs';
        }
    }
});

window.location = 'http://localhost:8080/test';

#!bash
>js plugins/env.qunit.js
[ Envjs/1.6 (Rhino; U; Mac OS X i386 10.5.8; en-US; rv:1.7.0.rc2) Resig/20070309 PilotFish/1.2.13 ]
{data}(0)[PASS] apis : table-of-contents is syncronized: [object Object]
{data}(1)[PASS] apis : hooks-intro is syncronized: [object Object]
{data}(2)[PASS] apis : platform-intro is syncronized: [object Object]
{data}(3)[PASS] apis : options-intro is syncronized: [object Object]
{data}(4)[PASS] apis : hooks-beforeScriptLoad is syncronized: [object Object]
{data}(5)[PASS] apis : platform-log is syncronized: [object Object]
{data}(6)[PASS] apis : options-scriptTypes is syncronized: [object Object]
{data}(7)[PASS] apis : hooks-afterScriptLoad is syncronized: [object Object]
{data}(8)[PASS] apis : platform-location is syncronized: [object Object]
{data}(9)[PASS] apis : options-logLevel is syncronized: [object Object]
{data}(10)[PASS] apis : hooks-onScriptLoadError is syncronized: [object Object]
{data}(11)[PASS] apis : platform-loadInlineScript is syncronized: [object Object]
{data}(12)[PASS] apis : options-appCodeName is syncronized: [object Object]
{data}(13)[PASS] apis : hooks-onInterrupt is syncronized: [object Object]
{data}(14)[PASS] apis : platform-loadLocalScript is syncronized: [object Object]
{data}(15)[PASS] apis : options-appName is syncronized: [object Object]
{data}(16)[PASS] apis : platform-timer is syncronized: [object Object]
{data}(17)[PASS] apis : options-tmpdir is syncronized: [object Object]
{data}(18)[PASS] apis : platform-runAsync is syncronized: [object Object]
{data}(19)[PASS] apis : hooks-onExit is syncronized: [object Object]
{data}(20)[PASS] apis : options-os_name is syncronized: [object Object]
{data}(21)[PASS] apis : platform-loadFrame is syncronized: [object Object]
{data}(22)[PASS] apis : options-os_arch is syncronized: [object Object]
{data}(23)[PASS] apis : platform-proxy is syncronized: [object Object]
{data}(24)[PASS] apis : options-os_version is syncronized: [object Object]
{data}(25)[PASS] apis : platform-writeToFile is syncronized: [object Object]
{data}(26)[PASS] apis : options-lang is syncronized: [object Object]
{data}(27)[PASS] apis : platform-writeToTempFile is syncronized: [object Object]
{data}(28)[PASS] apis : options-platform is syncronized: [object Object]
{data}(29)[PASS] apis : platform-deleteFile is syncronized: [object Object]
{data}(30)[PASS] apis : platform-connection is syncronized: [object Object]
{data}(31)[PASS] apis : options-javaEnabled is syncronized: [object Object]
{data}(32)[PASS] distributables : env.rhino.1.0.x.js is syncronized: [object Object]
{data}(33)[PASS] distributables : env.frames.1.0.x.jar is syncronized: [object Object]
{data}(34)[PASS] distributables : envjs.1.0.x.jar is syncronized: [object Object]
{data}(35)[PASS] distributables : env.rhino.1.2.11.js is syncronized: [object Object]
{data}(36)[PASS] events : jquery-conf-2009 is syncronized: [object Object]
{data}(37)[PASS] guides : table-of-contents is syncronized: [object Object]
{data}(38)[PASS] guides : running-intro is syncronized: [object Object]
{data}(39)[PASS] guides : walking-intro is syncronized: [object Object]
{data}(40)[PASS] guides : crawling-intro is syncronized: [object Object]
{data}(41)[PASS] guides : running-rhino is syncronized: [object Object]
{data}(42)[PASS] guides : walking-load is syncronized: [object Object]
{data}(43)[PASS] guides : crawling-ajax is syncronized: [object Object]
{data}(44)[PASS] guides : running-frames is syncronized: [object Object]
{data}(45)[PASS] guides : walking-configure is syncronized: [object Object]
{data}(46)[PASS] guides : crawling-testing is syncronized: [object Object]
{data}(47)[PASS] guides : crawling-scripting is syncronized: [object Object]
{data}(48)[PASS] guides : running-console is syncronized: [object Object]
{data}(49)[PASS] guides : walking-go is syncronized: [object Object]
{data}(50)[PASS] guides : running-embed is syncronized: [object Object]
{data}(51)[PASS] news : jquery-claypool-on-appengine is syncronized: [object Object]
{data}(52)[PASS] news : jmvc-testing-triage is syncronized: [object Object]
{data}(53)[PASS] news : blue-ridge-javascript-testing-rails-plugin is syncronized: [object Object]
{data}(54)[PASS] releases : envjs-1.0.x is syncronized: [object Object]
RESULTS: ( of 55 total tests )
PASSED: 55
-
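To use a run like this in a continuous testing script, the console output has to be turned into a pass/fail decision. Here is a minimal sketch, assuming the log lines keep the '{module}(count)[PASS|FAIL] message' shape produced by the QUnit.log hook. `summarize` is a hypothetical helper, not part of Envjs or QUnit, and the sample lines are made up for the illustration.

```javascript
// Hypothetical helper: counts PASS/FAIL entries in captured console output.
function summarize(output) {
    var lines = output.split('\n');
    var linePattern = /^\{[^}]*\}\(\d+\)\[(PASS|FAIL)\]/;
    var summary = { passed: 0, failed: 0 };
    for (var i = 0; i < lines.length; i++) {
        var match = linePattern.exec(lines[i]);
        if (match) {
            if (match[1] === 'PASS') {
                summary.passed++;
            } else {
                summary.failed++;
            }
        }
    }
    return summary;
}

// Made-up sample lines in the same shape as the log format.
var sample = [
    '{data}(0)[PASS] first sample assertion',
    '{data}(1)[FAIL] second sample assertion',
    'RESULTS: ( of 2 total tests )'
].join('\n');

var result = summarize(sample);
console.log(result.passed + ' passed, ' + result.failed + ' failed'); // 1 passed, 1 failed
```

A wrapper script could then signal failure to the build whenever result.failed is greater than zero.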
scripting
Envjs brings a whole new level of productivity to javascripters in the work place. If you live and breathe your favorite javascripting library in the browser, you can now make use of those skills in a whole new way.
Given the following task, you could have written it in several different languages: straight bash, python, ruby, perl, etc. But I also know I could write it in half the time with half as many lines using jQuery, because that's what I'm really good at. Now I finally can...
In the following example I need to crawl a big xml database splitting up a monolithic document into smaller parts, geocode each document based on title, then save the geocodes to a flat file and a json database.
/**
 * @author thatcher
 * break a monolithic xmldb document into smaller
 * documents with appropriate id's
 */
load('lib/env.rhino.js');
load('lib/jquery-1.4.2.js');
load('local_settings.js');

function get_count(){
    var count = -1;
    $.ajax({
        url:XMLDB_DUMP,
        contentType:'text/plain',
        dataType:'text',
        type:'get',
        data:{
            _query: 'count(/collection/document)',
            _wrap: false
        },
        async: false,
        success: function(howmany){
            count = Number(howmany);
            console.log('xml collection has %s docs', count);
        }
    });
    return count;
};

function get_document_id(i){
    var id;
    $.ajax({
        url:XMLDB_DUMP,
        contentType:'application/xml',
        dataType:'xml',
        type:'get',
        data:{
            _query: '/collection/document/document_id',
            _howmany: 1,
            _wrap: false,
            _start: i
        },
        async: false,
        success: function(xml){
            id = $(xml).text();
            console.log('document %s id %s', i ,id);
        }
    });
    return id;
};

function copy_document(i, id){
    // we aren't manipulating the xml, just moving it around to
    // another part of the xmldb to reduce the overall collection
    // file size
    var xml_text;
    $.ajax({
        url:XMLDB_DUMP,
        contentType:'text/plain',
        dataType:'text',
        type:'get',
        data:{
            _query: '/collection/document',
            _howmany: 1,
            _wrap: false,
            _start: i
        },
        async: false,
        success: function(text){
            xml_text = text;
            console.log('copying document to %s', id);
        }
    });
    $.ajax({
        url:XMLDB_PROD + id,
        contentType:'text/xml',
        dataType:'text',
        type:'put',
        data: xml_text,
        async: false,
        processData: false,
        success: function(){
            console.log('copied document to %s', id);
        },
        error: function(xhr, status, e){
            console.log('failed to copy %s %s', id, e);
        }
    });
};

$(function(){
    var count = get_count(),
        current_id;
    // now crawl our xmldb
    for(var i = 1; i <= count; i++){
        try{
            current_id = get_document_id(i);
            copy_document(i, current_id);
        }catch(e){
            console.log('failed to copy document %s \n %s', i, e);
        }
    }
});

window.location = 'http://localhost:8080/';
-
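The driver loop at the end of this script keeps going when a single document fails, which matters for long batch jobs. The sketch below isolates that pattern so it can be tried without an xml database. `copyAll`, `fakeGetId`, and `fakeCopy` are hypothetical stand-ins for the ajax-backed functions, not part of the original script.

```javascript
// Hypothetical driver: visits documents 1..count and records failures
// instead of stopping at the first error.
function copyAll(count, getId, copy) {
    var failed = [];
    for (var i = 1; i <= count; i++) {
        try {
            copy(i, getId(i));
        } catch (e) {
            failed.push(i);
        }
    }
    return failed;
}

// Stubs standing in for get_document_id and copy_document.
var copied = [];
function fakeGetId(i) {
    return 'doc-' + i;
}
function fakeCopy(i, id) {
    if (i === 2) {
        throw new Error('simulated failure');
    }
    copied.push(id);
}

var failed = copyAll(3, fakeGetId, fakeCopy);
console.log(copied.join(' ')); // doc-1 doc-3
console.log(failed.join(' ')); // 2
```

Returning the failed positions makes it easy to retry only those documents on a later run.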
readability
This example is prompted by work from Emre Sevinç who began to try to use Envjs to crawl web pages and use readability.js to output nice simple pages.