keeping things simple (hopefully)
December 16, 2008
Filed under: Ruby — Jason @ 3:57 pm

Some scraping code I did for fun last night. I used Nokogiri and Ruby to scrape this site for the names and photos. Take a look at the code if you want to mess around with it.

I wrote this code b/c I didn’t want to click through the 100 pages of the top 100 movie characters site. I mean, it’s like 100 clicks. I just wanted to see the names and the pictures. The following are the results.

1. Tyler Durden

2. Darth Vader

3. The Joker

UPDATE (2-19-2010): I moved this to another page b/c it was taking too much space and annoying when looking at later posts.

September 23, 2008
Filed under: Rails, Ruby — Jason @ 4:49 am

I recently needed to divide a signup form into 2 separate pages in order to neatly organize the flow. I went through a bunch of different techniques before I came up with a solution I was happy with. Below I outline each of those techniques.

My initial thought was to simply save this data into a cookie. I determined this to be a bad idea b/c although unlikely, the data might exceed 4KB, which is the limit for cookies as far as I know (anyone know where this comes from?). I didn’t want to run into any possible cookie overflow errors.
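
For what it's worth, the 4KB figure matches the minimum per-cookie size that the cookie specifications ask browsers to support, and the Rails cookie session store refuses anything larger. Below is a minimal sketch of a pre-flight size check, assuming the data is marshaled and Base64-encoded before it is written. The names fits_in_cookie? and COOKIE_LIMIT are hypothetical and not part of Rails, and a real session cookie carries extra overhead (such as a signature) that this does not count.

```ruby
# Assumed per-cookie ceiling in bytes.
COOKIE_LIMIT = 4096

# Hypothetical helper: true if the marshaled, Base64-encoded data
# would fit under the limit. pack("m") is Ruby's built-in Base64 encoder.
def fits_in_cookie?(data, limit = COOKIE_LIMIT)
  encoded = [Marshal.dump(data)].pack("m")
  encoded.bytesize <= limit
end

puts fits_in_cookie?("name" => "Jason", "email" => "foo@bar.com")
puts fits_in_cookie?("bio" => "x" * 5000)
```

A signup form with a free-text field is exactly the kind of input that could tip this over the limit.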

The second idea I had was to save the record to the database but keep the record in an inactivated state. This solution annoyed me because, if a user abandoned the process, there would be orphaned records. This would require some sort of cron script to clean it up–more work than I am willing to do. Also, I would have to save data without going through all the validations, which didn’t smell right.

My third solution was to serialize the parameters as YAML and store them in hidden text area boxes on the page. Having to marshal the params hash to and from YAML is somewhat overkill. Additionally, this method started to break once there were validation errors on the second form submission. Since I was passing YAML to the next page, each pass serialized the already-serialized parameters again, nesting the data. For example, a simple hash serialized in YAML looks like this:


--- !map:HashWithIndifferentAccess
name: Jason
address: !map:HashWithIndifferentAccess
  city: New York
  zip: "10009"
  street1: 123 Main Street
  street2: Apartment 8
  state: New York
email: foo@bar.com

After a few validation errors it would look like this:


--- |+
--- |
--- !map:HashWithIndifferentAccess
name: Jason
address: !map:HashWithIndifferentAccess
  city: New York
  zip: "10009"
  street1: 123 Main Street
  street2: Apartment 8
  state: New York
email: foo@bar.com
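
The extra "--- |" headers come from serializing a string that is already YAML: each pass wraps the previous document in another string. Here is a minimal sketch that reproduces the effect with Ruby's standard YAML library. The reserialize helper is a hypothetical name used only for illustration, and the exact headers printed depend on the YAML engine in use.

```ruby
require 'yaml'

# Hypothetical helper: serialize `data` to YAML `passes` times in a row,
# which is what happens when YAML is round-tripped through a form field
# and dumped again without being loaded first.
def reserialize(data, passes)
  passes.times.inject(data) { |current, _| current.to_yaml }
end

params = { "name" => "Jason", "email" => "foo@bar.com" }

puts reserialize(params, 1)   # a YAML document describing the hash
puts reserialize(params, 2)   # a YAML document describing a string

# Every extra pass needs one extra load to get back to the hash.
nested = reserialize(params, 2)
puts YAML.load(YAML.load(nested)).inspect
```

Loading the nested document once only returns the inner YAML string, which is why the form data appeared to be wrapped after each failed validation.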

My final solution, and probably the simplest, was to take the entire params hash and turn that into hidden fields. So I wrote this little helper function to handle that.


def params_to_hidden_fields(params, scope=[], depth=0, options={})
  # Reject parameters you don't want to stay persistent.
  # Keys are compared as strings so that :reject accepts strings or symbols.
  reject_list = %w(action controller authenticity_token)
  reject_list = reject_list + options[:reject].map { |key| key.to_s } if options[:reject]
  params = params.reject { |key, value| reject_list.include?(key.to_s) }

  # The final output to return
  output = ""

  # Cycle through each object in the hash
  params.each do |key, value|
    # If the value is a Hash, recursively call this function on that Hash
    # otherwise turn it into a hidden field
    output << if value.is_a?(Hash)
      params_to_hidden_fields(value, scope + [key], depth + 1, options)
    else
      # This conditional sets the scope for the hidden fields.  Nested objects in
      # Rails are displayed like this:
      #    <input type="hidden" name="main_object[sub_object][key]" id="main_object_sub_object_key" value="value" />
      # so we need to keep track of the parent calls
      name = if scope.empty?
        key.to_s
      else
        scope.first.to_s + scope[1..-1].inject("") { |sum, crumb| "#{sum}[#{crumb}]" } + "[#{key}]"
      end

      # Same technique as above but for the ID, the Rails way
      id = if scope.empty?
        key.to_s
      else
        scope.first.to_s + scope[1..-1].inject("") { |sum, crumb| "#{sum}_#{crumb}" } + "_#{key}"
      end

      # Basic output (h is the Rails view helper that HTML-escapes its argument)
      "<input type=\"hidden\" name=\"#{h name}\" id=\"#{h id}\" value=\"#{h value}\" />\n"
    end
  end
  output
end

I’ve commented the code, but here’s an explanation of the parameters for the function:

PARAMETERS

  • params: The Hash to turn into hidden fields
  • scope: An Array of parent keys, used to handle nested hashes during the recursion
  • depth: An Integer counting the depth of nesting. Can be used for formatting purposes
  • options: A Hash of options. Right now it only handles the :reject option, which is an array of keys to ignore.

This works great. The only issues I had were nested hashes and the parameters for action, controller, authenticity_token. I handled the nesting issue by recursively handling any values that were hashes. I handled the unwanted param values by stripping them out.

This is the output of the params_to_hidden_fields method with the same hash as the YAML above.


<input type="hidden" name="name" id="name" value="Jason" />
<input type="hidden" name="address[city]" id="address_city" value="New York" />
<input type="hidden" name="address[zip]" id="address_zip" value="10009" />
<input type="hidden" name="address[street1]" id="address_street1" value="123 Main Street" />
<input type="hidden" name="address[street2]" id="address_street2" value="Apartment 8" />
<input type="hidden" name="address[state]" id="address_state" value="New York" />
<input type="hidden" name="email" id="email" value="foo@bar.com" />
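
To try the approach outside of a Rails view, here is a condensed sketch of the same idea. It is not the helper above: hidden_fields_for is a hypothetical name, ERB::Util.html_escape stands in for the h view helper, and the depth and options arguments are dropped in favor of a plain list of keys to reject.

```ruby
require 'erb'

# Hypothetical standalone variant: walks a nested hash and returns one
# hidden <input> per leaf value, skipping any key found in `reject`.
def hidden_fields_for(params, scope = [], reject = %w(action controller authenticity_token))
  params.reject { |key, _| reject.include?(key.to_s) }.map do |key, value|
    path = scope + [key.to_s]
    if value.is_a?(Hash)
      # Recurse, carrying the parent keys along as the scope
      hidden_fields_for(value, path, reject)
    else
      # name is parent[child][key]; id is parent_child_key
      name  = path.first + path[1..-1].map { |crumb| "[#{crumb}]" }.join
      attrs = [name, path.join("_"), value].map { |text| ERB::Util.html_escape(text) }
      %(<input type="hidden" name="%s" id="%s" value="%s" />\n) % attrs
    end
  end.join
end

puts hidden_fields_for(
  "name"    => "Jason",
  "action"  => "create",
  "address" => { "city" => "New York", "zip" => "10009" }
)
```

The action key is dropped, and the nested address values come out with the bracketed names that Rails parses back into a nested params hash on the next request.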

I would love to hear any thoughts or questions on this approach. Also, if you have any ideas feel free to post them up.

UPDATE (2-19-2010): I just came across this post which kinda does the same thing. Just thought it interesting I never saw it before. Could have saved me some time I suppose: http://marklunds.com/articles/one/314

August 22, 2008
Filed under: Ruby — Jason @ 4:15 pm

When starting out with scRubyt, I immediately got slammed by a stupid gem dependency error. This happened to me a few times a while back and I forgot how to fix it. I remember one time I had to uninstall a particular gem version in order to get it to work, and then realized another app needed it. Ugghh.

Anyway, the gist of the problem is this: if you already have RubyInline installed and it’s greater than version 3.6.3, you’re gonna get this error:

Gem::Exception: can't activate RubyInline (= 3.6.3), already activated RubyInline-3.7.0]

or something similar.

After some google-fooing I came across this post where Ryan Davis gives a clue on what to do. The code snippet below works great with both versions of RubyInline installed. If you don’t have RubyInline 3.6.3 installed yet, install it with this command: sudo gem install RubyInline -v 3.6.3

Then remove the current require statements


require 'rubygems'
require 'scrubyt'

and replace with


require 'rubygems'
gem 'RubyInline', '= 3.6.3'
require 'scrubyt'

Good luck scraping!