The Deacon Project Droid+Beacon – Open-Source Push Notifications for Android & Java

27 Jan 2011

Burn-in testing Meteor with Deacon

On a few occasions in blog comments and on the Deacon Project mailing list, interested folks have asked just how many Deacon clients a Meteor server can handle. The discussions have mostly revolved around theoretical limits imposed by Meteor's codebase, or the Linux OS under which it runs. Rather than perpetuating the discussion on hearsay and conjecture, I decided to write up a little Meteor test suite and commit it as part of Deacon to enable some objective experimentation.

My test, which is located in the source tree's org.deacon.test package, performs some simplistic load testing of the Meteor server using a whole bunch of DeaconService instances. It is pure Java, so it can be run from your PC rather than an Android device. Along with some delays to prevent race conditions, the first test - aptly named the Simultaneous Subscribers Test - successively generates tens of thousands of Deacon instances, connects them all to the same channel on a single Meteor server, and measures the worst-case push notification turnaround times by sending pushes containing the system time in milliseconds as the payload. When a push is received, the time in its payload is compared with the system's current time to provide a latency measurement. The Simultaneous Subscribers Test runs until the push latency reaches a pre-defined maximum (default: 1 second), or a maximum number of subscribers has been created (default: 1 million).
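The measurement idea can be sketched in a few lines of plain Java. This is only an illustration, not the code in org.deacon.test: the names `LatencyProbe`, `encodePayload`, `latencyMillis` and `shouldStop` are hypothetical, and no Deacon or Meteor calls are made. It also assumes the sender and receiver read the same clock, which holds when both run on one machine.

```java
/**
 * Hypothetical sketch of the timestamp-payload latency measurement.
 * None of these names come from the Deacon source tree.
 */
class LatencyProbe {

    /** Default stop conditions as described in the post. */
    static final long DEFAULT_MAX_LATENCY_MS = 1_000L;
    static final int DEFAULT_MAX_SUBSCRIBERS = 1_000_000;

    /** Build the push payload: the send time in milliseconds, as text. */
    static String encodePayload(long sendTimeMillis) {
        return Long.toString(sendTimeMillis);
    }

    /** Latency is the receive time minus the send time carried in the payload. */
    static long latencyMillis(String payload, long receiveTimeMillis) {
        return receiveTimeMillis - Long.parseLong(payload.trim());
    }

    /** Stop once either the latency ceiling or the subscriber ceiling is reached. */
    static boolean shouldStop(long worstLatencyMs, int subscribers,
                              long maxLatencyMs, int maxSubscribers) {
        return worstLatencyMs >= maxLatencyMs || subscribers >= maxSubscribers;
    }

    public static void main(String[] args) {
        String payload = encodePayload(System.currentTimeMillis());
        // A real test would send the payload through Meteor here; this sketch
        // hands it straight back, so the printed latency is close to zero.
        long latency = latencyMillis(payload, System.currentTimeMillis());
        System.out.println("latency ms: " + latency);
        System.out.println("stop? " + shouldStop(latency, 1,
                DEFAULT_MAX_LATENCY_MS, DEFAULT_MAX_SUBSCRIBERS));
    }
}
```

Because the payload is stamped before the push is sent, the number this produces includes the time spent sending the push as well as the time spent delivering it.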

I'm working on a second test to prod Meteor for its maximum achievable channel count. As subscribers are successively added on the client side, each will join a new channel. Test push messages - again containing System.currentTimeMillis() payload data - will be sent to each channel, and latency measured until a maximum latency or channel count is reached.

Measured against the architecture required to support widely-adopted mobile applications, these tests are very simplistic. In my case, they measure latency over a LAN (which is good, as it reduces the influence of intermediary bottlenecks on the results) but all pushes terminate at a single machine (which is bad, as they must be crammed through a single network interface and CPU). A distributed test, however, would require far more effort, infrastructure and coordination - so the tests in this little suite offer a good first-order look at Meteor's capabilities when used as a mobile push server. Another caveat to these test results is the way latency is measured: while using the System.currentTimeMillis() value as a payload is clever, it inherently lumps together the latency of delivering the push to the client with that of connecting to the channel controller and sending the push in the first place. I believe this is responsible for many of the outliers in the chart below.

So how'd it do on my runs? Squeaking along on my 1.8GHz Celeron-powered server (an aging but trusty Dell PowerEdge 600SC with 512MB of RAM), I was able to connect 32,768 simultaneous Meteor subscriptions with a maximum latency of about 0.7 seconds, which is where I had configured the test to stop. The file descriptor limit on this machine (/proc/sys/fs/file-max) is 65,536 with only 800-or-so open descriptors at idle, so I imagine I had a lot more headroom remaining. Last night, I added a charting library and re-ran the test, this time setting the stop value around 25,000 subscribers (I wanted it to complete before I woke up this morning). The results: 25,000 instances saw a maximum latency under half a second, with a typical latency between 40 and 100ms:

[Chart: Simultaneous Subscribers Test Result]

This chart shows the worst-case latency observed at each subscriber count. As you can see, the results cluster largely bimodally around 40ms and 100ms, with some higher outliers. The linearity of this chart seems to indicate that extrapolation to far more client instances would be entirely feasible.

My next step is to complete the channel count test as well as revamp the stop conditions for this test so that it can be run over longer periods and accumulate a much larger quantity of clients. Further, I'll downsample the chart data and add error bars, so that it conveys more meaningful data with fewer raw datapoints. Stay tuned!
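One way the downsampling could look, as a rough sketch rather than the planned implementation: group the per-subscriber-count latencies into fixed-size buckets and keep the mean of each bucket, with its minimum and maximum as the extent of the error bar. The class `LatencyDownsampler`, its `Bucket` type, and the choice of min/max for the error bars are all assumptions made for illustration, and the sample values in `main` are made up.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: reduce raw latency samples to bucketed chart points. */
class LatencyDownsampler {

    /** One chart point: mean latency, with min and max as the error bar extent. */
    static final class Bucket {
        final int firstSubscriberCount;
        final double meanMs;
        final long minMs;
        final long maxMs;

        Bucket(int firstSubscriberCount, double meanMs, long minMs, long maxMs) {
            this.firstSubscriberCount = firstSubscriberCount;
            this.meanMs = meanMs;
            this.minMs = minMs;
            this.maxMs = maxMs;
        }
    }

    /**
     * Assumes latenciesMs[i] is the worst-case latency observed with
     * (i + 1) subscribers connected. The last bucket may be shorter.
     */
    static List<Bucket> downsample(long[] latenciesMs, int bucketSize) {
        if (bucketSize <= 0) {
            throw new IllegalArgumentException("bucketSize must be positive");
        }
        List<Bucket> buckets = new ArrayList<>();
        for (int start = 0; start < latenciesMs.length; start += bucketSize) {
            int end = Math.min(start + bucketSize, latenciesMs.length);
            long min = Long.MAX_VALUE;
            long max = Long.MIN_VALUE;
            long sum = 0;
            for (int i = start; i < end; i++) {
                min = Math.min(min, latenciesMs[i]);
                max = Math.max(max, latenciesMs[i]);
                sum += latenciesMs[i];
            }
            buckets.add(new Bucket(start + 1, (double) sum / (end - start), min, max));
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Made-up sample values, not measurements from the test run.
        long[] samples = {40, 100, 60, 500, 90};
        for (Bucket b : downsample(samples, 2)) {
            System.out.println("from " + b.firstSubscriberCount + " subscribers: mean="
                    + b.meanMs + " min=" + b.minMs + " max=" + b.maxMs);
        }
    }
}
```

Keeping the minimum and maximum means the outliers stay visible even after the raw points are gone; a standard deviation would be another reasonable choice.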

[Image credit: Wikimedia Commons]

About Dave

Dave Rea is an upstate-NY engineer specializing in embedded systems. He holds a BS degree in Electrical Engineering and a MS degree in Software Engineering, both from the Rochester Institute of Technology. Dave is an open-source enthusiast, totes an HTC Droid Incredible, and runs Ubuntu Linux. You can find more on Dave at daverea.com or LinkedIn.
Comments (4)
  1. Thanks for doing this testing Dave. I’m thrilled to see some objective results to contrast with the theoretical and/or subjective discussions I’ve seen up to this point.

    There is one observation I would like to point out about your result of 32,768. This number is exactly one half of 65536 (2^16). 65536 is the file descriptor limit on your machine, as you mentioned, and it is also the number of TCP ports available to one network IP. I doubt this is a coincidence and I wouldn’t be surprised if meteor’s architecture actually is the cause for this observed limit.

    Digging a little deeper could be illuminating. For example, if the number of available ports is the limiting factor, then one could increase the number of subscribers one server can support by splitting the subscribers between multiple IP aliases.

  2. Thanks for your interest, Adam! I agree that more testing is definitely needed. Unfortunately, the test is somewhat fragile; delays must be introduced between test messages, and my tries at adding more than one Deacon instance at a time have been unsuccessful. So the limits of what I’ve been able to test so far have been largely imposed by the time available!

    I’m working on improving the test not only so that it can be run more quickly, but also to use PTP so that the client and server clocks are synchronized, to eliminate the influence of push send time on the results.

  3. This is a great test. We are using Meteor for our website (www.wethepixels.com) and were trying to do some load testing with Meteor as well.

    If I am understanding this test correctly, it looks like you are sending 1 message to the x number of subscribers. The test we were trying to run is if all x number of subscribers were sending events (through the server) to all x number of subscribers. This would simulate if all subs did something at the exact same time. So for 5 subs, the Meteor server would have 25 messages to send.

    This was our idea of a real “worst-case scenario.” I wonder how Meteor would handle the 23k+ for that?

  4. Yes, the current test is a parallel subscribers test, but I’m also working on a concurrent channels test…it is in the test file but not currently enabled…

