Message Queues and Gemstone

Message Queues and Gemstone

YossiDM
I was wondering if anyone has experience or information about the performance and architecture for building highly scalable messaging solutions on GemStone. I think our initial design would be to scale out horizontally, of course, running Gems across several machines (commercial version) and keeping what we need cached as much as possible. We're starting with GLASS and then working our way up from there. My concerns have more to do with the stone, namely that there is only one.

I am aware that transactions get queued and are eventually written to disk. I want to avoid a backlog of writes if possible. Given the number of messages we will be receiving per second (think high-traffic XMPP server), I was thinking we would definitely have to look at something like RabbitMQ and use service-like Gems. We can control the flow from RabbitMQ as far as when things get persisted and throttle the writes to prevent us from taking our own servers down. Is there a better way to do it with GemStone? Is RabbitMQ even needed, or is this just solving a problem that doesn't exist?
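
Roughly the kind of service Gem loop I have in mind, just as a sketch of the flow-control idea; MessageSource stands in for whatever RabbitMQ client wrapper we end up with, MessageStore for our persistent root, and only the System transaction calls are actual GemStone API:

    | source batch |
    source := MessageSource connectTo: 'amqp://broker'.   "hypothetical RabbitMQ consumer wrapper"
    [true] whileTrue: [
        System abortTransaction.                  "start each pass with a fresh view"
        batch := source nextBatch: 500.           "we pull at our own pace"
        batch isEmpty
            ifTrue: [(Delay forSeconds: 1) wait]  "idle briefly when nothing is pending"
            ifFalse: [
                batch do: [:each | MessageStore default addMessage: each].
                System commitTransaction
                    ifTrue: [source acknowledge: batch]
                    ifFalse: [source requeue: batch]]]

The point would be that the Gem, not the senders, decides how many messages go into each commit.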

We're trying to simplify things as well by avoiding updates whenever possible. I think most of the time we will just write a new version rather than use any complicated update logic. The problem remains, though, that the one stone is the weak link. Note also that we need to make this data available to read as fast as possible, and it's hard to have the best of both worlds.

On other systems, I'd handle some of these problems by sharding the database, but I am not aware of the best way to do this with GemStone. I've heard people mention having multiple instances before, but it seems like we would lose a lot of the benefits of transparent persistence and of leveraging the SPC. I don't want to have to manually federate queries and writes, but it sounds like if we went that way, we'd need to build a layer to handle all of that.

Any ideas or thoughts? How do other customers handle extremely high real-time read/write volume?

Re: Message Queues and Gemstone

Dale Henrichs
Yossi,

A bunch of good questions. Detailed answers would require some
detailed information about what you are doing, but I will try to give
you some high-level answers ...

The large-commit-volume customers tend to scale on big hardware. These
are guys that are pushing way beyond 10K commits/second ... lots of CPU,
lots of RAM, very fast disks ... everything is basically in memory and
i/o for the transaction logs is the bottleneck...

If you put the network in the middle somewhere, you are limited by how
much data you can push around the network for commits ... depending upon
commit size, commit rate, and disk speed, the network could be the
bottleneck...

So the basic question for scaling GemStone is how many commits/second
and how big are the commits. That will set up your architectural limits.
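
Just to put made-up numbers on it: at 5,000 commits/second averaging 2K
of changed objects each, you are pushing roughly 10MB/second through the
shared page cache and the tranlogs, which a decent disk subsystem handles
without much drama ... bump the average commit to 50K and you are at
250MB/second, and now the tranlog disks (or the network, if the gems are
remote) are what you have to engineer around.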

Note that, depending upon the size of the objects involved, creating new
objects and changing old objects are very similar in cost from a
transaction perspective. A modified or new object is copied from VM
memory onto a different page in the shared page cache as part of the
commit.

In addition to the copy, the changes to the object and any new objects
are pushed to the tranlog. The tranlog must be flushed to disk for the
commit to be finished (this is where the disk i/o bottleneck shows up)...

At a checkpoint, the stone guarantees that all dirty pages in the cache
that contain objects that were part of any transaction preceding the
checkpoint are written to disk. To avoid filling the shared page cache
with dirty pages, you need to be able to write the dirty pages from the
SPC at a fast enough rate. The scaling factor here is to add more disk
spindles and more AIO page servers so that pages can be written to disk
in parallel while avoiding disk i/o contention.
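
In config-file terms that looks something like the following (illustrative
values only ... the option that controls how many AIO page servers get
started varies by version, so check the System Administration Guide for
the exact name):

    # spread the extents across separate spindles so pages can be written in parallel
    DBF_EXTENT_NAMES = /disk1/extent0.dbf, /disk2/extent1.dbf, /disk3/extent2.dbf;
    # keep the tranlogs on their own fast disks, away from the extents
    STN_TRAN_LOG_DIRECTORIES = /logdisk1/tranlogs/, /logdisk2/tranlogs/;
    STN_TRAN_LOG_SIZES = 1000, 1000;
    # a shared page cache big enough to keep the working set in memory (value in KB)
    SHR_PAGE_CACHE_SIZE_KB = 4000000;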

I would say that technology-wise we are poised on the brink of taking
serious looks at the problem of managing the distribution of data across
multiple stones ... which is another way of saying that we don't have
any solid answers right now ... To date the focus has been on making the
single stone robust and fast enough so that sharding isn't necessary for
performance reasons...

I know that we have commercial customers who have claimed that when they
ran comparison tests against large, well-known RDB vendors, they needed
about 10x the hardware to get the same performance :)

So, one stone is not necessarily a weak link, it is a strength as well...

How interconnected is the data that you are storing ... is it a highly
complex, interconnected rat's nest of object relationships or is it vast
quantities of relatively isolated object graphs? Isolated object graphs
might be easier to manage in a roll-your-own sharding scheme than one
massive object graph...

Dale

Re: Message Queues and Gemstone

Monty Williams-3

> I know that we have commercial customers who have claimed that when they
> ran comparison tests against large, well-known RDB vendors, they needed
> about 10x the hardware to get the same performance :)

i.e. the RDB side needed 10x the HW

I've heard that number from several large customers: one where the RDB company gave them the HW, and one where they told us moving from an RDB to GemStone increased performance by 10-15x.

Of course, YMMV. There are apps where RDBs are an excellent choice. Generally those aren't ones that store complex objects, though.

As Dale says, you need to think through a lot of detail to figure out the optimal solution. Generating some tests that you can actually measure is always a good start.

-- Monty

Re: Message Queues and Gemstone

YossiDM
In reply to this post by Dale Henrichs
Thanks Dale and Monty. One of the biggest reasons we want to go with GemStone is precisely that we know it should handle our kind of data much better than an RDBMS. We're loving GLASS and GemStone in general for how easy it is to work with, the Smalltalk + Seaside support from you guys, the transparency, and the speed. My worry is that at some point we may cripple ourselves with our architecture if we don't consider now how far it can scale. The plan is definitely to run lots of high-volume tests.

You mentioned specifics might help, so I'll give you a brief description.

We're building primarily a messaging application. There's more to it, but the core is conversations. We are building a sort of hybrid system: a specialized graph database on top of GemStone. The vertices and edges will all carry some basic properties and some optimizations for search/traversal purposes. They will also carry a payload of a regular object. I guess you could say that the vertices and edges describe the structure of how these objects are related. We generally wanted to avoid a system where everything has to inherit from an edge or vertex just to participate in the graph. Nonetheless, if testing forces us to go that way for speed purposes, we might. We really want to think about locality of reference, keeping things that are frequently used together on the same page, and so on.
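
To make that a bit more concrete, the shape we're playing with is roughly this (GraphVertex, GraphEdge, and the selectors are placeholders, not settled names):

    "a vertex wraps an arbitrary payload object plus its incident edges"
    GraphVertex>>linkTo: aVertex label: aSymbol
        | edge |
        edge := GraphEdge from: self to: aVertex label: aSymbol.
        outgoing add: edge.
        aVertex addIncoming: edge.
        ^edge

    GraphVertex>>payload
        ^payload

so a Message or a Conversation stays a plain domain object and just rides along as the payload of a vertex instead of inheriting from anything graph-specific.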

To answer your question, in a sense the data is a rat's nest of connectivity. I think the one advantage here, though, is that you can use graph methodologies like clustering and centrality measures to determine where there are highly interconnected objects. The strategy in that case would be to give those objects their own shard; then, when you cross shards via a looser set of connections, ship over the context and continue traversals or other operations. It's a tough problem and not pretty.
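
As a sketch of what I mean by shipping the context over (all of this is hypothetical, down to the names and the idea of a precomputed cluster assignment):

    ShardRouter>>visit: aVertexId for: aTraversal
        | shard |
        shard := self shardOf: aVertexId.    "looks up the precomputed cluster assignment"
        shard = self localShardId
            ifTrue: [aTraversal continueAt: (self localVertexAt: aVertexId)]
            ifFalse: [
                "crossing one of the looser connections: hand the traversal
                 state to the stone that owns that cluster and let it carry on"
                self sendTo: shard state: (aTraversal stateForVertex: aVertexId)]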

Incidentally, a graph-DB-on-top-of-an-object-DB design that deals with the sharding issues already exists and is available from one of your competitors. We are honestly still considering their product, but we're not pleased with the lack of feedback and information from their team, and we'd rather work in Smalltalk than Java or other JVM languages anyway. Their persistence strategy is also far from transparent (and requires significant object-level changes). When you persist objects, there's essentially a worker strategy for deciding where to put things, and you can also periodically heal the graph in the background with some code.

Anyway, let me briefly explain how we are dealing with messages. Messages are received from IM, SMS, web interfaces, web services, and email, and are stored in this graph (as payloads). Each message is linked with other data in the graph. This brings up three problems related to how well our site performs: speed of message intake, volume, and the velocity with which that data can be shared with other users (i.e., reading the writes). I suppose a compromise for scalability could be delaying when these messages are persisted and somehow faking it via broadcast, like regular IM, but I see some problems with this from a user point of view in our app.

So far we've done some prototyping in .NET, Java, Ruby, and Smalltalk. We know our application itself is sane, works, and does what it does well. We now need to build a real version that scales and has a pretty UI. We've also got other pieces of GemStone and Smalltalk code in general that overlap with this project.

The next step, of course, is finalizing our architecture. The feedback you've given matches what I thought and will help us get started with how best to scale and punish GemStone in testing. I've taken a look at the older Siege numbers you posted on your blog, and we've tested GemStone against simple cases. We did some of the obvious things, like throwing SSDs at it, just to get a feel for the speed, and it's definitely impressive.

We've got some other challenges ahead, like full-text indexing. We've considered Solr and Magritte, which have been mentioned before, but I am not as worried about that. To that end, we do need to think about how indexes are affected by all the writes going on and where it makes sense to put them, but I think we can solve that.