Archive for the 'scripting' Category


a small guide on choosing a scripting language

I hate posts that start with an introduction of what they’re about. If this were one of those posts, it would have started by describing what a scripting language is, and how it can make your life easier.

I’m gonna skip that and go right into a language comparison:

  • Perl
    1. every major company has at least a few scripts written in it
    2. decent speed
    3. you can debug code with/without an IDE ( you can use the Enbugger module to stop your script at runtime, and debug )
    4. there is more than one way of achieving a task ( you can find a lot of modules on CPAN )
    5. unless you have a pretty good working knowledge of Perl, writing good code may be hard. Also, writing object-oriented code is a bit more difficult in Perl than in other languages. EDIT: Moose is probably a great alternative.
    6. some of the cool modules you might want to use are very likely to be outdated ( for example HTTP::Recorder )
    7. you have source filters ( which means you get to create your own syntax )
    8. Perl REPLs are not that user-friendly. EDIT: a REPL update from one of the comments: Devel::REPL
    9. depending on the way you implement your objects, introspection can be a nasty task. You could, however use the Data::Dumper module to inspect any object.
    10. no support on the JVM or CLR ( or at least, no actively maintained ones ). EDIT: Inline::Java may be something users should have a look at.
    11. I have no experience with making stand-alone executables from Perl code, but people say it can be done. See PAR::Packer
  • Python
    1. you can place debugger calls in your code and the interpreter will stop there when it reaches them ( you use the pdb module )
    2. great mature libraries
    3. almost everything is well documented
    4. if you don’t have access to an internet connection, and you don’t have documentation available, good libraries will provide documentation through the docstrings
    5. OO is pretty simple to do
    6. the language is pretty strict, and Python guys insist on one way of doing stuff
    7. ipython is a great REPL
    8. good code is pretty easy to write ( and maintain )
    9. the Python guys keep repeating the phrase “We’re all adults here”. This sucks, because any time you’d like to do something in a more unusual/cool way, they will say that. Most of the time it’s “their way or the highway”.
    10. if you’d like to inspect an object, you can use the dir function:
      # let's assume we have an object named x, and we don't know anything about it
      for meth in dir(x):
         print meth # this will print all the methods/attributes the object has
    11. you can find a lot of modules here
    12. Jython covers the JVM aspect, and IronPython covers the CLR.
    13. You can easily turn scripts into stand-alone executables. One way of doing this is with py2exe
  • Ruby
    1. everything can be manipulated in any way you’d like. It’s pretty easy too.
    2. like Python, Ruby has a debug gem, ruby-debug you can use to place debugger calls in your code.
    3. even though some of the libraries Ruby has are great ( see WWW::Mechanize), a lot of them are very poorly documented
    4. Ruby has no docstrings, but has a great introspection mechanism:
      # let's say that the variable x is an object you know nothing about
      x.methods.each {|m| puts m}

      and this will show you all the methods an object has.

    5. OO code is a lot easier ( and more logical too ) than in Python/Perl. I mean, come on, decorators for static methods? And we really need the self parameter ( in Perl/Python )? I can see how that parameter would be useful when writing object oriented C code, but … people use it, and apparently, they are happy with that.
    6. you can find a lot of modules here
    7. non-technical persons can read most Ruby syntax ( and understand what it does )
    8. JRuby covers the JVM aspect, and IronRuby the CLR.
    9. There are a lot of tools to create stand-alone executables from your scripts. I’ve had great success with exerb and rubyscript2exe
  • Groovy
    1. you get to work with everything Java has to offer
    2. some nice alternatives/enhancements to Java tools , for example gant
    3. you kind of have to work with an IDE. As far as I know, there’s no way to debug code without an IDE, or other external tools. This makes things VERY hard to debug if you can’t use an IDE.
    4. if you work along with Java classes you have to be sure you’re working with updated class files.
    5. it’s a lot less verbose than Java, and pretty fun to write
    6. there is a GroovyConsole you can test code in. Almost all Java code is valid Groovy code, and can be run from there. I use the console all the time.
    7. if you ever wanted to modify Java’s final classes, Groovy makes that possible.
    8. not being able to debug code without an IDE really sucks. Groovy people, please fix this.
    9. OO code is easy to do. If you know Java, you can get around in Groovy.
    10. My personal opinion is that Groovy code is kind of hard to maintain. Your mileage may vary.
    11. I have no idea how one would go about turning a script into a stand-alone executable.

Even though every language has its strengths and weaknesses, I suggest you learn at least 2. You should use the right tool for the right job, but try to limit the number of languages involved in a project to about 2-3. These are my conclusions after working with each of the languages mentioned above. Maybe you’ll find them useful if you’re trying to decide which scripting language to learn.


hacking python at runtime:a cool way of modifying your scripts

I recently discovered the _ast module. Using it, one can process Python’s syntax trees.

I’m going to illustrate the use of this module with a simple example. Here’s the code snippet we’re going to work on :

import geo
for file in os.listdir("."):
	print file

Let’s talk a bit about this snippet.

It’s kind of obvious this script has runtime errors. On my Python installation there’s no module named geo … chances are there’s not one on your system either 🙂
Another error would be the call to os.listdir. The call itself is fine: once the os module is imported, no exception is thrown at runtime. The problem is that the snippet never imports os.

Here’s what we’ll accomplish using the _ast module :

  • remove the import of the module geo
  • add an import to the os module
  • to make things more interesting, we’ll change the “.” argument of the listdir function to the “D:\\” drive

So, how do we get the syntax tree? Pretty simple :

import _ast

source_code = """
import geo
for file in os.listdir("."):
    print file
"""
ast = compile(source_code,"<string>","exec",_ast.PyCF_ONLY_AST)

The compile function will build an AST object. The members of the AST can be accessed through the ast.body list. In this example, ast.body[0] will be the import statement, and ast.body[1] will be the for statement.

So, now we have an AST!

The first thing we’ll do is clone the import object. I’m doing this so that I don’t have to create an import object manually. I’m lazy, I know. If you don’t know how to clone a Python object, the following snippet illustrates it :

import copy
an_object_copy = copy.deepcopy(an_object)

With this import clone, I want to import the os module. But, since this clone still has geo as its argument, we need to change that. We change that with the following snippet :

os_import = copy.deepcopy(ast.body[0])
os_import.names[0].name = "os"

This object is now equivalent to the following code:

import os

This is nice, but we have to add it to the AST. I’ll take advantage of this to remove the import geo statement:

# remove the import geo statement
del ast.body[0]
# we insert the import os as the first statement
ast.body.insert(0, os_import)

Right now, the code is runnable. No exceptions will be thrown. Before we run it, let’s accomplish the final task too. Let’s modify the argument of listdir from “.” to “D:\\”. We know that the for object will be the second object in the list:

for_obj = ast.body[1]
# change the argument of listdir to D:\
for_obj.iter.args[0].s = "D:\\"

This changes the argument. In case the attributes I’m setting seem magic, you can find them out using Python’s introspection system ( it’s how I found them too ). You can use this system from ipython, or even your python interpreter, by calling the dir function on any object. This call will list the names of the attributes and methods the object has.
Now that we have modified the AST, we need to transform it into runnable code. We do that by calling the compile function :

code = compile(ast,"<string>","exec")

We can run the code with the exec function :

exec code

Here’s the full code of the script:

import _ast
import copy

def fix_source(source_string):
	ast = compile(source_string,"<string>","exec",_ast.PyCF_ONLY_AST)
	# clone the import object, so we can modify it
	os_import = copy.deepcopy(ast.body[0])
	# remove the import geo statement
	del ast.body[0]
	# change the import argument to os
	os_import.names[0].name = "os"
	# add the import os statement to the ast
	ast.body.insert(0, os_import)
	for_obj = ast.body[1]
	# change the argument of listdir to D:\
	for_obj.iter.args[0].s = "D:\\"
	# transform the AST into something runnable
	return compile(ast,"<string>","exec")

if __name__ == "__main__":
	source_code = """
import geo
for file in os.listdir("."):
	print file
"""
	code = fix_source(source_code)
	exec code

I’m sure you can put this trick to use. It’s one of the greatest “hacks” I know.

I’ll try to post a Ruby alternative as soon as time allows me.


finding size of directories

I wanted to find out the size taken by some directories in a given folder. I didn’t like what du gave me, so I wrote this script :

require "find"

# directory we want to list size for
where = ARGV[0]
# open it
Dir.open(where) do |dir|
	# iterate
	dir.each do |subfile|
		# skip the "." and ".." entries, they would count the whole tree again
		next if subfile == "." || subfile == ".."
		# get a full path
		subfile = File.join(where,subfile)
		# initialize this to 0 for each new file
		size = 0
		# test to see if it's a directory
		if File.directory?(subfile)
			# if it's a dir, we need to add the size of each of the files there
			Find.find(subfile) do |file|
				# skip if it's a dir
				next if File.directory?(file)
				# add the size
				# we rescue this, because some files might be locked
				size += File.stat(file).size rescue 0
			end
		else
			# this is a regular file, we initialize size to this file's size
			size = File.stat(subfile).size rescue 0
		end
		# we print
		puts "size for #{subfile} is %.2f MB" % (size/(1024.0*1024.0))
	end
end
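
The same idea can be wrapped in a function that returns the sizes instead of printing them, which makes sorting possible. This is only a sketch: entry_sizes is a hypothetical name, and it assumes Ruby 2.5 or newer for Dir.children.

```ruby
require "find"

# Hypothetical helper: maps each entry directly under `where`
# to its total size in bytes.
def entry_sizes(where)
  sizes = {}
  # Dir.children already leaves out "." and ".."
  Dir.children(where).each do |name|
    path = File.join(where, name)
    total = 0
    # Find.find yields the path itself first, so a plain file is handled too
    Find.find(path) do |file|
      next if File.directory?(file)
      # some files might be locked or vanish while we walk the tree
      total += (File.size(file) rescue 0)
    end
    sizes[name] = total
  end
  sizes
end

# print the biggest entries first when a directory is given on the command line
if __FILE__ == $0 && ARGV[0]
  entry_sizes(ARGV[0]).sort_by { |_, bytes| -bytes }.each do |name, bytes|
    puts "%10.2f MB  %s" % [bytes / (1024.0 * 1024.0), name]
  end
end
```

Returning a hash keeps the walking and the printing apart, so the output format can change without touching the traversal.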


python goes turbo

I read a very interesting article here. Can you imagine how big a boost this will be? As I understand it, Unladen Swallow is currently two times faster than CPython, and a 5x speed increase is planned for the project. w00t!


perl headshot

Perl was the first programming language I learned. It felt great to be so productive with so few lines of code. Trying to provide code for this stackoverflow answer was pretty cumbersome. My first impression was : this is an easy task for File::Find, but it turned out I was wrong.

I may be wrong, but I think File::Find does something like this :

get a list of all files in directory passed as argument
foreach file
  call the supplied callback with the file as its argument

Let me give you an example of why this is not the desired behaviour in this particular case. Consider the following directory structure:

a/
  first.txt
  b/
    another_file.txt
    c/

So, let’s assume this is the order in which we will get the files in File::Find’s wanted callback. We first receive the “a/” directory, which we capitalize as the question asked, so, after we’re done processing, the directory’s name will be “A/”. The next arguments the wanted function will receive will be “a/first.txt”, “a/b”, “a/b/another_file.txt”, “a/b/c/”, and not “A/first.txt”, “A/b”, “A/b/another_file.txt”, “A/b/c”. If we keep getting the arguments like that, we would have to split the path, check and see if the directory name has been capitalized, and then apply the rename.

Hard, right? It would have been easier to get the files while keeping count of the renamed directories. So, after 9 revisions to my post, I gave up on perl, and I rewrote everything from scratch using ruby. I could have accomplished the same thing in perl pretty easily, and I’m not blaming File::Find for this, but, overall, I felt that the language wasn’t working with me anymore.
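
One way to sidestep the stale paths entirely is to rename the deepest directories first, so a parent only changes its name after everything inside it is done. The sketch below is not the code from the stackoverflow answer; capitalize_dirs is a hypothetical helper, and it assumes “capitalize” means what Ruby’s String#capitalize does.

```ruby
require "find"

# Hypothetical helper: capitalizes the name of every directory below root.
# root itself keeps its name.
def capitalize_dirs(root)
  dirs = []
  Find.find(root) do |path|
    dirs << path if File.directory?(path) && path != root
  end
  # deepest paths first: the collected paths of the parents stay valid
  # until it is their turn to be renamed
  dirs.sort_by { |path| -path.count(File::SEPARATOR) }.each do |path|
    parent, name = File.split(path)
    # String#capitalize also lowercases the rest of the name
    target = File.join(parent, name.capitalize)
    File.rename(path, target) unless target == path
  end
end
```

With the structure above placed inside some directory, calling capitalize_dirs on that directory leaves “A/first.txt” and “A/B/another_file.txt” in place, and no path has to be split or patched along the way.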

… so, it turns out I’m not productive in perl anymore. I won’t use it anymore. No more perl references, globs, hashes, arrays, contexts, objects etc. I’ve got to the point where they just get in my way, and I don’t find them cool anymore.

Good luck, Perl! And thanks!

P.S: I’m pretty confident that this is the behaviour the author of the question wanted :). The ruby code can be found on the question’s page


building an IDE in small steps:language recognition

For my diploma project, I chose to do an “advanced text-editor”… something along the lines of an IDE. I’m writing it in ruby. At this point I have a GUI that provides almost everything I need. One feature I thought would be cool for my IDE to have is automatic language detection : you paste some source code in the editor, and it will highlight it BEFORE you save the file to disk. For this purpose I created the following class :

class LanguageDetector

	def declare_language_arrays
		# declare the language arrays
		@oop = ["ruby","java","c#","c++","scala","php"]
		@scripting = ["ruby","perl","php","python"]
		@all = (@oop+@scripting).uniq
		@text = ["text"]
	end

	def initialize
		# the language arrays must exist before the map references them
		declare_language_arrays
		# every language starts with a score of 0
		@score = Hash.new(0)
		@language_map = {
			"public" => @oop,
			"private" => @oop,
			"protected" => @oop,
			"static" => ["java","c","c++","c#"],
			"void" => ["java","c","c++","c#"],
			"main" => ["java","c","c++"],
			"Main" => ["c#"],
			"class" => @oop + ["python"],
			"def" => @scripting - ["perl","php"],
			"begin" => ["ruby","pascal"],
			"end" => ["ruby","pascal"],
			"throw" => @oop,
			"throws" => ["java","c++"],
			"try" => @oop+["python"],
			"catch" => ["java","c++","c#"],
			"except" => ["python"],
			"String" => ["java"],
			"rescue" => ["ruby"],
			"redo" => ["ruby","perl"],
			"next" => ["ruby","perl"],
			"last" => ["ruby","perl"],
			"while" => @oop+["python","perl"],
			"for" => @all,
			"if" => @all,
			"else" => @all,
			"elif" => ["python"],
			"elsif" => ["ruby","perl"],
			"final" => ["java"],
			"del" => ["python"],
			"delete" => ["c++"],
			"free" => ["c"],
			"new" => ["java","c++","c#"],
			"in" => ["python"],
			:default => method(:default_detection)
		}
		# method that detects which language a token belongs to
		# this gets called if a token was not found in the map
		@default = @language_map[:default]
	end

	def get_tokens(code)
		# return the tokens from the code sent as parameter
		return code.split(/\s+/)
	end

	def get_score
		# return the score hash
		return @score
	end

	def get_language(score)
		# process the score hash and return the element with the highest value;
		# should consider case with equal score languages
		max = -1
		language = ""
		# language is key, score is value
		score.keys.each do |key|
			# store the score of the language
			language_score = score[key]
			# if it's bigger, we store it
			if language_score > max
				language = key
				max = language_score
			end
		end
		return language
	end

	# handler for each token
	def process_token(token)
		# obtain the language array for each word
		languages = @language_map[token]
		# if languages array is nil, the token doesn't exist in the map
		if languages.nil?
			# obtain the languages by processing the token with the
			# language detector method
			languages = @default.call(token)
		end
		# compute language score
		languages.each do |language|
			@score[language] += 1
		end
	end

	# detect a language based on the source code sent
	def detect_language(source_code)
		# start from a clean score, so an earlier call does not skew this one
		@score = Hash.new(0)
		# split source code into tokens ( should use a lexer here )
		words = get_tokens(source_code)
		# process each token
		words.each do |word|
			process_token(word)
		end
	end

	def default_detection(token)
		if token.start_with?("$")
			return ["perl","ruby"]
		end
		return @text
	end
end

It’s still “very incomplete” ( to say the least ), but I’ll continue to work on it and improve it. Here is how I envision something like this working : you split the code into tokens ( actual tokens, not by whitespace as I did here ), and you assign each token to the languages it can belong to. Each language has a “score” associated with it. When the language detector finishes with the last token, all that needs to be done is to obtain the key with the highest score from the score hash. Here is a snippet of how you could use it :

require "language_detector"

language = LanguageDetector.new
language.detect_language("this is a test")
# this will output text
puts language.get_language(language.get_score)
# because I'm tokenizing based on whitespace,I have to put spaces between tokens
# this will change in a future version
language.detect_language("public static void main ( String [] args )")
# this will output java
puts language.get_language(language.get_score)

This class will be updated to provide better support for ( more ) programming languages really soon.
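
Two of the open points mentioned above, tokenizing on something better than whitespace and handling languages with an equal score, could look roughly like this. It is only a sketch: tokenize and top_languages are hypothetical helpers that are not part of the class, and the regular expression is a rough stand-in for a real lexer.

```ruby
# Hypothetical helper: splits code into identifiers ( with an optional
# leading "$" ), numbers and single punctuation characters.
def tokenize(code)
  code.scan(/\$?[A-Za-z_]\w*|\d+|\S/)
end

# Hypothetical helper: returns every language that shares the top score,
# so the caller can decide what to do with a tie.
def top_languages(score)
  return [] if score.empty?
  best = score.values.max
  score.select { |_, points| points == best }.keys
end

p tokenize("public static void main(String[] args)")
p top_languages({ "ruby" => 2, "perl" => 2, "text" => 1 })
```

With a tokenizer like this, the spaces around the parentheses and brackets in the usage snippet would no longer be needed.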


how to submit a form programmatically

I have received some comments about submitting a form from your applications, and I’ve decided to write an article about that.

There are a number of ways to accomplish this task:

  • you can use selenium ide firefox addon to record a session — this will generate all the code for everything you do inside your browser: clicking a button, filling out a certain field. This is an easy solution if you don’t know programming
  • you can use the firefox addon tamper data to find out all the names of the fields your browser is sending, and their values as well. A downside would be that you need to submit the form at least once.
  • you can use the firefox addon firebug to find the names of the fields in a form. A downside would be that it’s very likely to miss some of them because they are hidden. This is why I would recommend using tamper data together with firebug.

I will illustrate this process with screenshots and some code:

Here is the html of the form we will be submitting:

<form action="whatever.php" method="POST">
	<label for="user">User</label>
	<input type="text" name="user"/>
	<label for="pwd">Pass</label>
	<input type="password" name="pwd"/>
	<input type="submit" value="login"/>
</form>

( please, don’t even bother telling me that this html code doesn’t respect the standards. I don’t care. This is for learning purposes only )

It’s a simple form made of three fields : username, password and the submit button. Open your favorite text editor and paste it in. Save the buffer to a file ending with .html extension, then open it in your browser.

I hope you installed tamper data and firebug, because now we’ll make use of them. We’ll start with firebug. If you’ve installed it, a bug-like icon will appear in the lower right corner of the browser. If it’s coloured gray, it means it’s disabled, and you have to click it and enable all of its features. If you’ve succeeded in doing that, the icon should now be orange, with black stripes.

Right click the user field. The contextual menu should have the option “Inspect Element”. Click it, and firebug will show the html of that field.
Notice that the field’s name is “user”. If you do the same for the password field, you’ll see that its name is “pwd”. In this example, this is redundant, because we already know the names of the fields. However, in the real world, you will not, and you should follow the steps shown here. Here is the code we have so far :

require "rubygems"
require "mechanize"

mech = Mechanize.new
# i'm loading this file locally
# ( the path below is a placeholder, point it to where you saved the form )
# in real-life you would provide the url of the page containing the form you want to submit
page = mech.get("file:///path/to/form.html")
# obtain the form object
# because this page contains only one form, it's obvious we request the first one
# if the page contained more than one form, you would have iterated over the forms
# and selected the one containing the fields you needed
form = page.forms.first
# and now we complete the fields
# username first
# the order in which you complete this form is not important
form.user = "geo"
# and now the password
form.pwd = "mypassword"
# submit the form
result = mech.submit(form)
# do whatever you want to with the returned page

If you run this code you’ll notice that it works ( that is, if you configured the action parameter to something real. If you haven’t, you’ll get a 40* error code, which still means that it works – this error will appear because the script needed to handle the form wasn’t found )

Usually, before submitting a form, you should use tamper data to make sure you’re sending all the parameters. So, open the website in firefox, fill out all the fields in the form, go to the “Tools” menu entry of your browser and click “Tamper Data”. A new window will appear on your desktop.
Click “Start tamper”, and then submit your form ( click on login/submit/search/whatever ). When tamper data asks what to do with the request, click Tamper. You will then see the parameters your browser is about to send.

In this example, this is exactly what we expected to see. Just the user and pwd fields are sent. However, in the real-world, you’ll see that usually more parameters are needed. Use tamper data before you start writing your code.

I like using mechanize for this sort of stuff, because it really makes these tasks easy to handle. You can apply what you’ve learned here to any “mechanize-like framework”.
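
If mechanize is not available, the same two fields can also be sent with Ruby’s standard library. This is only a sketch: build_login_request is a hypothetical helper, and the localhost URL is an assumption standing in for wherever the form’s action really points.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: builds ( but does not send ) the POST request
# a browser would make when the form above is submitted.
def build_login_request(url, user, pwd)
  uri = URI.parse(url)
  request = Net::HTTP::Post.new(uri.request_uri)
  # the keys are the field names found with firebug / tamper data
  request.set_form_data("user" => user, "pwd" => pwd)
  request
end

# the URL is a placeholder, replace it with the real action of the form
request = build_login_request("http://localhost/whatever.php", "geo", "mypassword")
# this is the urlencoded body the server will receive
puts request.body

# to actually send it:
# uri = URI.parse("http://localhost/whatever.php")
# response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
```

Any extra parameters tamper data reveals, hidden fields for example, would go into the same hash passed to set_form_data.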

