The 7 main actions we took to improve the Rails stack performance at Justin.tv

Here are the slides of the talk I gave yesterday at the San Francisco Rails meetup group about the work we have done to improve Rails performance at Justin.tv.

Enjoy!


Avoid memory leaks in your Ruby/Rails code and protect yourself against denial of service

We often hear that Ruby is cool because we do not have to care about memory: the garbage collector does it for us. Well, that’s kind of true, but it does not mean we can write code without keeping in mind what is going on under our Ruby code.

Ruby symbol memory leak

We all know that using symbols instead of strings is good practice: it’s faster and it saves memory. Yes, but at what price? Symbols are faster in part because each one is created only once in memory, which is great! But then they stay in memory forever.

That means: do not convert everything to symbols! Be sure you know exactly what you are converting to a symbol.
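As a quick check of the "created just one time" claim, here is a minimal sketch comparing object identity for strings and symbols. The sample value 'justin' is only an illustration.

```ruby
# Two strings with the same content are two distinct objects.
a = 'justin'.dup
b = 'justin'.dup
strings_share_object = a.equal?(b)

# Converting both strings to symbols yields the very same object.
symbols_share_object = a.to_sym.equal?(b.to_sym)

puts "strings share one object: #{strings_share_object}"
puts "symbols share one object: #{symbols_share_object}"
```

That single shared object is what makes symbol comparison cheap, and it is also why each distinct symbol name costs memory for as long as the VM keeps it.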

Example: somewhere in your app, you call to_sym on a user’s name, like this:

hash[current_user.name.to_sym] = something

When you have hundreds of users, that could be OK, but what happens if you have one million users? Here are the numbers:

kwi:~$ irb
ruby-1.9.2-head >
# Current memory usage : 6608K
# Now, add one million randomly generated short symbols
ruby-1.9.2-head > 1000000.times { (Time.now.to_f.to_s).to_sym }

# Current memory usage : 153M, even after a garbage collector run.
# Surprisingly, on Ruby 1.8.7-p249,
# the VM only grows to 33M, but that's still a lot !

# Now, imagine the symbols are 20x longer than that
ruby-1.9.2-head > 1000000.times { (Time.now.to_f.to_s * 20).to_sym }
# Current memory usage : 501M

Furthermore, NEVER convert uncontrolled arguments to symbols without checking them first: this can easily lead to a denial of service.

Example: You have a website with a locale parameter in order to localize your content and you have something like this in your application controller:

before_filter :set_locale

def set_locale
  I18n.locale = params[:locale].to_sym
end

It’s really simple to call your website thousands of times with a long params[:locale] and make your application bloat!

By the way, it looks like the I18n gem automatically converts the locale to a symbol, so be sure to check that the locale is valid before assigning it!
Here is the link: http://github.com/svenfuchs/i18n/blob/master/lib/i18n/config.rb#LID8
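One possible way to validate the locale is to compare the raw parameter, as a string, against an allow-list before any conversion happens. This is only a sketch: ALLOWED_LOCALES, DEFAULT_LOCALE and safe_locale are hypothetical names, and the locale list is an assumption to replace with the locales your app really supports.

```ruby
# Hypothetical allow-list: replace with the locales your app supports.
ALLOWED_LOCALES = %w[en fr de].freeze
DEFAULT_LOCALE = :en

# Compare as strings first, so unknown input is never turned into a symbol.
def safe_locale(raw)
  candidate = raw.to_s
  ALLOWED_LOCALES.include?(candidate) ? candidate.to_sym : DEFAULT_LOCALE
end

puts safe_locale('fr').inspect
puts safe_locale('x' * 10_000).inspect
puts safe_locale(nil).inspect
```

In the controller above, set_locale would then assign I18n.locale = safe_locale(params[:locale]), so a long or unknown value falls back to the default instead of creating a new symbol.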

If you need to keep an eye on the number of allocated symbols in your app, you can use Symbol.all_symbols.size. Add this to your logs to see if you are leaking symbols over time! (This could be a good metric to add to New Relic; New Relic guys, are you reading? :)
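Here is a small sketch of that measurement, built only on Symbol.all_symbols. The symbol_count helper and the leak_probe names are made up for the illustration; in a real app you would log the count periodically rather than create symbols on purpose.

```ruby
# Returns how many symbols the VM currently knows about.
def symbol_count
  Symbol.all_symbols.size
end

before = symbol_count
# Keep the new symbols referenced so the comparison is meaningful.
created = Array.new(1000) { |i| "leak_probe_#{i}_#{rand(1_000_000)}".to_sym }
growth = symbol_count - before

puts "symbols before: #{before}, growth: #{growth}"
```

A count that keeps climbing between requests is the signal to look for. Note that on Ruby 2.2 and later, symbols created dynamically with to_sym can be garbage collected, so the count may go back down on those versions.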

Reference to objects leak

This is not a true leak, but it can grow rapidly in your app.
It happens when a variable in your code keeps referencing objects, and these objects reference other objects in turn, and so on.

This often happens when using $variable or @@variable, as they stay in memory forever.
Here is a little example:

# Memory usage at irb launch: 6320K

class HelloIamLeaking
  @@an_array = []

  def initialize()
    # Put something big in the array
    @@an_array << "hello world" * (4**10)
  end
end

x = HelloIamLeaking.new
x = nil # So no more HelloIamLeaking instance in our code
GC.start # Run the garbage collector to be sure this is real !
# Memory usage after : 17M

ruby-1.9.2-head > ObjectSpace.each_object(HelloIamLeaking) {|x| p x }
 => 0
# So we have no more instance of HelloIamLeaking
# but the class variable remains in memory.

OK, this is a completely logical and dumb example, but it shows you the principle.

And this can grow quickly if you have objects linking to huge arrays or datasets: they will never be garbage collected as long as just one object in your code still references the source object.

This consumes not only your memory but also your CPU time: when the garbage collector runs, it looks at every single object, and the more objects you have, the more time it spends looking at them.

To sum up reference leaks: over time, they grow in memory and dramatically slow down the garbage collector.
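The sketch below shows the same principle from the other side: objects stay alive exactly as long as the class variable references them, and dropping the reference is what makes them collectable. Payload, Registry and live_payloads are hypothetical names used only for this illustration.

```ruby
class Payload; end

class Registry
  @@items = []

  def self.add(obj)
    @@items << obj
  end

  def self.size
    @@items.size
  end

  # Dropping the references is what allows the GC to reclaim the objects.
  def self.clear
    @@items.clear
  end
end

# Counts Payload instances still alive after a GC run.
def live_payloads
  GC.start
  ObjectSpace.each_object(Payload).count
end

3.times { Registry.add(Payload.new) }
held = live_payloads # the registry still references all three
Registry.clear

puts "alive while referenced: #{held}"
puts "references left in the registry: #{Registry.size}"
```

After Registry.clear the payloads are eligible for collection, although Ruby does not guarantee exactly when the garbage collector reclaims them.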

If you want to read more about reference leaks, read the awesome post “Descent into darkness” on Joe Damato’s blog.

Update: find these leaks easily using the memprof gem and memprof.com (awesome stuff again by Joe Damato).

My app is still bloating !

After that, if you still have Ruby/Rails processes bloating, be sure to use the latest versions of gems that use C code: they can be an easy source of memory leaks.

And, this is obvious, but be sure not to load a huge dataset into memory all at once! (Use find_in_batches instead, for example.)
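To illustrate the batching idea without a database, here is a plain Ruby sketch. It is an analogy rather than ActiveRecord code: each_record is a hypothetical stand-in for a table, and each_slice plays the role of the batch finder, so only one batch is held in memory at a time.

```ruby
# Hypothetical record source standing in for a database table.
def each_record(total)
  return enum_for(:each_record, total) unless block_given?

  total.times { |id| yield({ :id => id }) }
end

batch_sizes = []
# Only one batch of at most 1000 records is materialized at a time.
each_record(2500).each_slice(1000) do |batch|
  batch_sizes << batch.size
end

puts batch_sizes.inspect
```

The records are generated lazily, so the full list of 2500 never exists in memory at once, which is the property you want from a batched query.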

Then, if you want more control over memory allocation, here is a good link for tuning the heap easily and controlling your Ruby process growth.



Thanks for taking the time to read, and I hope this article helps you reduce your memory consumption!


Introducing BrB, an extremely fast interface for doing distributed Ruby

BrB is a simple, transparent and extremely fast interface for doing distributed Ruby easily.
It’s inspired by the original Ruby DRb library (Distributed Ruby), but it is built on top of EventMachine for performance.

The concept

BrB uses a simple concept: create an object instance and expose it to the world.
Any other Ruby process will be able to call methods on that object after creating a communication tunnel.

  • It’s as simple as a method call
  • It’s efficient: by default, BrB does simple message passing (no return value)
  • You can pass any object over the network that is dumpable through Marshal

Example 1 – Simple communication

Start communicating between your different Ruby processes in two easy steps :

Start accepting connections :

class ExposedCoreObject
  def simple_api_method(parameter)
    puts " > Receive #{parameter} in the main ruby process"
  end
end

EM::run do # Start event machine
  # Start BrB Service, expose an instance of core object to the outside world
  BrB::Service.instance.start_service(:object => ExposedCoreObject.new, :host => 'localhost', :port => 5555)
end

In any other ruby process, start communicating :

# Create a communication tunnel to the core process
# nil as first parameter as we do not expose any object in exchange
core = BrB::Tunnel.create(nil, "brb://localhost:5555")

core.simple_api_method('a message')
# Results :
# On core process :  "> Receive a message in the main ruby process"

At this point, the client calls simple_api_method on our core process.
All the Ruby magic happens again, and the number of processes that can communicate this way is unlimited!

Example 2 – Both side communication

Our previous example was great, but clients can receive method calls too.

Core code :

EM::run do # Start event machine
  # Start BrB Service, expose an instance of core object to the outside world
  BrB::Service.instance.start_service(:object => ExposedCoreObject.new, :host => 'localhost', :port => 5555)  do |type, tunnel|
    # Get alerted that a new connection has been made :
    if type == :register
      tunnel.say_hi_in_return('I am the core saying Hi')
    end
  end
end

Client code :

class ExposedClientObject
  def say_hi_in_return(s)
    puts " > Core says : #{s}"
  end
end

# This time, we are exposing an object.
core = BrB::Tunnel.create(ExposedClientObject.new, "brb://localhost:5555")
core.simple_api_method('a message')
# Results :
# On client process :  "> Core says : I am the core saying Hi"
# On core process :  "> Receive a message in the main ruby process"

That’s it: both our processes are now communicating with each other. It’s completely transparent, as it works just like normal Ruby method calls.

Example 3 – Waiting for a return value

By default, calling a method on a remote object is non-blocking: it does not wait for any return value. But sometimes it’s useful to get a return value. To do this, just add _block at the end of the method name.

core = BrB::Tunnel.create(nil, "brb://localhost:5555")
ret = core.simple_api_method_block('a message') # Wait for the return

What is BrB useful for?

  • Doing simple message passing between Ruby processes.
  • Connecting hundreds of Ruby processes transparently.
  • Building a real-time scalable (game) server.
  • Handling heavy load on a server easily, just by distributing it across multiple BrB instances.
  • Taking advantage of multi-core and multi-threaded systems.

If you want to know more about BrB, go to the BrB GitHub page.


String concatenation performance – Ruby Tricks #02

When it comes to string concatenation, something you do hundreds of times in your everyday projects, Ruby gives you a choice!

Most common cases :

"Hi #{login}"

'Hi ' + login

s = 'Hi '
s += login

s = 'Hi '
s << login

But these ways of concatenating strings do not all behave the same:

First case, += VS << :

s = 'Hi '
s += login

The + operator for strings creates a new string object by concatenating two strings, here ‘Hi ‘ and login. So we have instantiated two strings just to get one.

s = 'Hi '
s << login

On the other hand, << appends the content of the second string directly to the first string, so you do not instantiate a new string. But you modify your first object, so be careful, especially when it comes from a variable.
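The difference between the two operators can be checked with object identity. This is a minimal sketch; the login value and the variable names are only illustrations, and dup is used so the literals can be modified safely on any Ruby version.

```ruby
login = 'alice'

# += builds a new string and rebinds the variable to it.
s = 'Hi '.dup
original_s = s
s += login
plus_reuses_object = s.equal?(original_s)

# << modifies the receiver in place, so every reference sees the change.
t = 'Hi '.dup
other_reference = t
t << login
append_reuses_object = t.equal?(other_reference)

puts "+= reuses the object: #{plus_reuses_object} (original is still #{original_s.inspect})"
puts "<< reuses the object: #{append_reuses_object} (other reference is now #{other_reference.inspect})"
```

The last line is the side effect to watch for: any other variable pointing at the same string sees the appended content too.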

Second case, + VS #{} :

'Hello ' + 'ruby ' + 'world'

Creates the ‘Hello ruby ‘ string, then creates the final string: ‘Hello ruby world’
=> So it creates unnecessary strings.

"Hello #{'ruby '}#{'world'}"

Directly creates the full string ‘Hello ruby world’, without the intermediate string seen before.

Conclusion

  • Prefer << when you can!
  • Use the “#{}” interpolation style when you concatenate more than 2 strings together.

Symbol#to_proc – Ruby Tricks #01

First trick today, and it’s an easy one:

If you are using Active Support (shipped with Rails), or Ruby 1.8.7 or later, you can use the symbol-to-proc shortcut:

Here is the standard way of declaring a block:

>> ['a', 'b', 'c'].collect {|letter| letter.capitalize}
=> ["A", "B", "C"]

Here is the handy shortcut:

>> ['a', 'b', 'c'].collect(&:capitalize)
=> ["A", "B", "C"]
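To see what the shortcut does, here is a small sketch that calls Symbol#to_proc by hand. The & in front of the symbol asks Ruby to call to_proc on it and pass the result as the block; the variable names are only illustrations.

```ruby
# Symbol#to_proc builds a proc that sends the symbol's name to its argument.
capitalize = :capitalize.to_proc
single = capitalize.call('a')

# &:capitalize converts the symbol with to_proc and passes it as the block.
shortcut = %w[a b c].collect(&:capitalize)
explicit = %w[a b c].collect { |letter| letter.capitalize }

puts single.inspect
puts shortcut.inspect
puts (shortcut == explicit).inspect
```

Both forms return the same array; the only difference is how the block is produced.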

But keep in mind that the shortcut is a little slower than the normal way, because it creates a new Proc on each call!

Benchmark :

require 'benchmark'

t = Benchmark.realtime do
  (['a'] * 1000000).collect(&:to_s)
end
puts "Time using to_proc: #{t}"

t = Benchmark.realtime do
  (['a'] * 1000000).collect do |e|
    e.to_s
  end
end
puts "Time using normal block: #{t}"

# Time using to_proc: 0.631899118423462
# Time using normal block: 0.246822834014893
# Results are the same if you test the normal block first