Tech Notes

At Sokrati we enjoy solving high-scale performance problems and have fun building systems that are fault tolerant and highly available. A system that deals with millions of keywords and transactions will eventually run out of juice if it isn't prepared for "everything"!

We would like to share our findings and the tools we built along the way.

Dainemo: Distributed Scheduling


Written by Santosh Gannavarapu. Last updated on Monday, 19 September 2011, 12:53.


There is always a need to run batch or offline jobs while using resources efficiently. The usual first attempt is to set aside pre-configured worker boxes that pick up jobs either through a cron scheduler or through a pre-designated manager application responsible for kicking off processes. Assigning boxes statically means deploying boxes that may either run out of bandwidth or be so under-utilized that you pay for unnecessary CPU cycles.

 

What is needed is a job scheduler that distributes load across the boxes that have spare CPU cycles, so that every box in the fleet is used efficiently. Such a scheduler should also tolerate host failures, which is what makes it highly available.
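To make the requirement concrete, here is a minimal sketch of load-aware host selection that skips failed hosts. It is not Dainemo's implementation: the `Host` fields, the `pick_host` function, the host names, and the 0.8 load ceiling are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    name: str
    cpu_load: float      # fraction of CPU in use, 0.0 to 1.0 (hypothetical metric)
    alive: bool = True   # False once the host is considered failed


def pick_host(hosts: List[Host], max_load: float = 0.8) -> Optional[Host]:
    """Return the least-loaded live host below max_load, or None if none qualifies."""
    candidates = [h for h in hosts if h.alive and h.cpu_load < max_load]
    if not candidates:
        return None
    return min(candidates, key=lambda h: h.cpu_load)


fleet = [
    Host("box-a", cpu_load=0.65),
    Host("box-b", cpu_load=0.20, alive=False),  # failed host is skipped
    Host("box-c", cpu_load=0.35),
]
chosen = pick_host(fleet)
print(chosen.name if chosen else "no capacity")
```

Because the choice is made per job from current load rather than from a static assignment, a dead or saturated box simply drops out of the candidate list and the remaining boxes absorb the work.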

 

What is Dainemo?

After evaluating various open source schedulers, we decided to build our own. Distributing load across boxes is the essential requirement and most schedulers meet it, but subtler features such as honoring CPU load, a distributed scheduling agent, or job priority are missing in many.
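As an illustration of the job priority feature mentioned above, the following sketch orders pending jobs with a binary heap from Python's standard `heapq` module. The `PriorityJobQueue` class, the job names, and the convention that a lower number means higher urgency are assumptions made for this example, not a description of how Dainemo works.

```python
import heapq
import itertools


class PriorityJobQueue:
    """Jobs with a lower priority number are dispatched first; ties are FIFO."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps insertion order

    def submit(self, name, priority):
        heapq.heappush(self._heap, (priority, next(self._counter), name))

    def next_job(self):
        """Pop the most urgent job name, or None when the queue is empty."""
        if not self._heap:
            return None
        _, _, name = heapq.heappop(self._heap)
        return name


queue = PriorityJobQueue()
queue.submit("nightly-report", priority=5)   # hypothetical job names
queue.submit("bid-update", priority=1)
queue.submit("keyword-sync", priority=1)
order = [queue.next_job() for _ in range(3)]
print(order)
```

The monotonically increasing counter is a common way to keep jobs of equal priority in submission order, since the heap compares tuples element by element.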

 


   

Case Study

The client is one of the top job portals in India, specializing in IT, banking, and FMCG recruitment. Its core business philosophy is to deliver high quality.

 
