Tuple Space


A language for expressing parallel processing based on an Associative Memory data structure that provides a repository (mathematically a bag; that is, a set that can contain multiple copies of the same item) for tuples (see Tuple Definition).
The structure has six primitive operations:

PUT, COPY, TAKE, TRY_COPY, TRY_TAKE, and EVAL

Tuple Space is an Associative Memory, which means that tuples are accessed not by their address but rather by their content and type:

put places a tuple into the bag. copy finds and reads a tuple. take is like copy but also removes a tuple after reading it. Copy and find block if they cannot find a tuple matching the query; try_copy and try_find are the non-blocking versions. Eval forks a new process.

The repository is "generative" in that (see: Generative Communication) because a tuple generated by a process has an independent existence in Tuple Space. Any process may remove the tuple, and the tuple is bound to no process in particular. any process can reference any Tuple regardless of where that Tuple is stored. A Tuple Space is a logical concept and does not require an underlying physical shared memory.


One strength of the model is its ability to describe parallel algorithms without reference to any specific computer architecture.

The concept of tuple space was pioneered by David Gelernter at the Yale Linda Group with the Linda Tuple Spaces system and was initially offered in Fortran Language and Cee Language languages. However, various other Tuple Space systems exist today such as LiPS or ActorSpaces for many languages including the Smalltalk Language, Java Language, Lisp Language and Ruby Language (Distributed Ruby). For some reason, the Java Language is experiencing an explosion of generative systems including Java Spaces, TSpace, PageSpaces, Open Spaces, and so on.


Systems based on the Blackboard Metaphor can be simply created in Tuple Space. Tuple spaces are more general than the Blackboard Metaphor (which has specialists grabbing information scoped to their specialty), for example, they can also be used for Compute Server.


For criticisms of the Tuple Space model relative to Concurrent Logic Programming, Message Passing Concurrency and Communicating Sequential Processes, see www.cypherpunks.to (starting on page 4 of the PDF).


Putting mutable reference objects into a tuple may incur some problems that subvert some of the benefit for using tuple spaces.


'Introductory links'

Linda in Context, www.cs.uwaterloo.ca (haven't finished it yet, and I'd prefer a little less context ;-))

Java Spaces Principles Patterns And Practice (Freeman, Hupfer, Arnold) is a very good introduction to tuple spaces in general, even though it is centered around Java Spaces which is a bit different.


What is a Tuple Space?; by comparing it to other similar things

vs message queue Messages go from one person to another (or to many others). Tuple Space is a bag with tuples in it. Each process can then find a tuple that matches their search criteria and remove it from the bag. If you choose to have as a field in your tuple a destination process, and if selection is done on that field, then tuples can sort of simulate message passing. By copying rather than removing tuples you can also simulate messages to groups, and broadcast messages.

However, one of the key differences between a Tuple Space and a message queue involves the temporal aspect. Generally with a message queue once the message has reached the front of the queue and been processed it is gone. With a Tuple Space that does not have to happen. A Tuple can be placed into the Tuple Space and it may never be taken. Or it may be taken in a month. Or it might already have been taken by the time you read this sentence.

In addition, Message Queue has ordering of the messages, which is not maintained by a Tuple Space.

vs SQL DB

A Tuple Space is arguably a more primitive construct than a SQL DBMS. A SQL DBMS could conceivably be built using a Tuple Space implementation as its storage layer. It would, however, be an awkward Abstraction Inversion to create a Tuple Space using an SQL DBMS, especially in cases where Tuple Space functionality is needed without requiring other aspects of SQL or DBMSes.

So it's almost like a database assembler language?

[I don't know. What's a database assembler language? There are various technologies that can underpin a relational database management system. Tuple Space is one possibility, key/value stores (like the Berkeley DB) is another.]


Implementations of Tuple Spaces

Giga Spaces, a commercial implementation - www.gigaspaces.com

IBM's TSpaces - www.almaden.ibm.com

jxtaspaces for P2P jxtaspaces.jxta.org

In C

Scientific Computing Associates' TCP-Linda lindaspaces.com

W.F. Wong's Simple C-Linda with Posix Threads www.comp.nus.edu.sg

HTML Page Spaces -- flp.cs.tu-berlin.de

Masatoshi Seki's Rinda, part of Distributed Ruby dRuby (a.k.a. DRb?) www2a.biglobe.ne.jp (this is part of the standard ruby distribution now)

Ruple where the tuples are replaced by XML documents (well-formed, but not necessarily validated) - www.roguewave.com (Ceased development as of 20 May 2003)

p4-Linda - part of the p4 parallel programming system (no longer supported)

ftp://info.mcs.anl.gov/pub/p4

Py Linda. Linda imp. in Python www-users.cs.york.ac.uk

in Python, R, Matlab:
Net Work Spaces, www.ddj.com

Triple Space is where the tuples are RDF triples:
www.deri.ie


Scalability discussion

A question. On Tuple Space Scalability, I just made the assertion that a distributed, persistent, transactional, fault tolerant Associative Memory cannot scale linearly; that is, Tuple Space won't work on the Internet, precisely because of its underlying "distributed shared memory" idea. Some assumptions have to be relaxed slightly; I've indicated one possible direction there, but there must be others. What do you think? -- Vladimir Slepnev


Discussion

I am a HUGE fan of tuple spaces. They are the most powerful execution environment there is, since the guillotine has gone out of favor. No contest. I believe it can be optimally configured to model any execution environment. Static binding, no problem, dynamic binding, no problem, distribution, no problem, transactions, no problem, persistence, no problem, and so on. These Yale guys are sitting on a gold mine. All we need is a fast O/S mapping and this would take over. No doubt. Word :). -- Richard Henderson

A gold mine? Does Yale University (or anyone else for that matter) hold patents on Tuple Space?


I'm thinking it wouldn't be such a hideous Abstraction Inversion to use an RDBMS to implement a Tuple Space.

Instead of having different kinds of tuples, you simply have records in different tables. Since most modern RDBMSs provide in-memory tables, you have the possibility creating/consuming large numbers of them without hitting the disk space (so long as you can keep your tables small enough to fit into memory). The transactional nature of such things would seem to be a good fit for Linda's atomic operations. Additionally, if something happened which caused a transaction to fail, the ability to roll-back to a known state and try again would be very helpful. In this respect, I see quite a bit of the development surrounding Linda as re-inventing the RDBMS wheel. Naturally, you'd want to do it in a more lightweight fashion, so much as possible. Eliminating the SQL engine and providing an API to access data/perform operations would be a good start.

Many of the Linda tasks you create are based on "consume a tuple of this type, do X calculations with it, emit a tuple of this other type, repeat as necessary." This would require something repeatedly polling the database, looking for an input tuple to consume. If you could use triggers, which spawn or execute that process every time a record is written to the appropriate table, you get an interrupt-driven process instead of a polling-driven process; from there, it becomes a question of the DB server's ability to manage large numbers of spanwed tasks. Add in the fact that, for many, modern RDBMSs you can have a cluster of machines, sharding the data from a specific set of tables across multiple physical machines, and you have the ability to scale across multiple physical machines for increased performance. Indeed, while you may want some of the records persisted for actual consumption outside of the process, if the vast majority of "intermediate" results, working toward those records, can be done entirely in memory, it may be worth your while to have several physical machines which only shard the in-memory tables and handle the bulk of the triggered processes.

While such a system would be a bit of overkill, I suspect there are more people who are well-versed in the care and feeding of RDBMSs than there are developing with Linda. This could be a good "jumping off point" for introducing more people, inside and outside of various enterprises, to the concepts in Linda.

Are there papers out there about people already trying this? Or have tried this? I have to think I'm not the only person thinking in this direction, but I'm coming up short on hits on Google. It's possible I'm just not searching on the right terms, yet. -- Meower68

It's not clear to me what RAM has to do with it. That's more or less of an implementation detail, not a database "user" interface issue. Often it's best to focus on the UI/language/interface aspects of a "technology" first, and then consider implementation issues. Of course it's not always seamless because machine/performance-issue trade-offs often dictate what's available or feasible.

To clarify, one of the performance advantages which a Tuple Space frequently has over an RDBMS is that the Tuple Space lives only in RAM. I'm merely suggesting that a modern RDBMS, with memory-backed tables, would bring the performance closer to parity.

[Any DBMS or Tuple Space developed in the last few decades makes extensive use of caching, to the point that disk speed generally isn't an issue except (possibly) at startup whilst the cache is being populated, or if the pattern of retrievals results in a significant quantity of cache misses. The most notable conceptual distinction between a Tuple Space and an RDBMS is that in an RDBMS, tuple storage must be organised into a finite quantity of predefined tables. Tuple Spaces do not have this structural constraint. Depending on the application, this may be a help or a hindrance. If there are a wide variety of tuple types, the overhead of creating one table per tuple type (or of using some schema that supports arbitrary tuple types, such as EAV) may prohibit using an RDBMS.]

(See related discussion at Multi Paradigm Database Discussion.)


Wow. I'm almost sorry I brought this up.

When designing an application which will run in a Tuple Space, you typically have some idea, up front, as to what kind of tuples you will need to create. In that case, it's pretty easy to just create tables and read/write records to them. To avoid getting hung up from the beginning on the whole Multi Paradigm Database argument, and what is and is not implemented, I'm asking if anyone has done something like this, using SQL and whatever stored procedure language the server implements. I recognize this is not what RDBMSs were created for, but it would appear that one could play around with this, possibly resulting in something useful. Does anyone know if this has been tried?

If the answer is no, no one has tried this, fine; that is a valid answer. It's possible that what I'm asking is simply stretching/abusing such systems too much. If someone knows of someone who has tried this, please provide some references where I can learn more about it. -- Meower68

I'm not aware of anyone who has tried this. I'm not sure what you expect the outcome to be, other than (perhaps) a somewhat slower than usual Tuple Space implementation.

Getting most Fortune 500 companies to provision systems, install a Tuple Space and experiment with it is, in my experience, a bridge too far. All of them, however, have RDBMSs in place. Those are reasonably well-understood technologies. Many of them have clustered, sharded databases in place, both for production work and on test/development levels. Which is more likely to get a positive response from the typical F500 Pointy Haired Boss?

We need a higher-performance way to do X. I have some ideas for how to do it, utilizing existing infrastructure, albeit in an unconventional fashion. Can I get some time to experiment with it?

We need a higher-performance way to do X. Can I get some servers provisioned, on the development layer, and install a bunch of new software, as well as some time to experiment with it?

A dedicated Tuple Space would perform better; on that we agree. Additional infrastructure, hardware and software, especially software with which no one currently has experience, is a harder sell to management. If, however, you can get the RDBMS solution working and show significant performance improvement, you may be able to lead them toward a dedicated Tuple Space. -- Meower68

If your organisation does not have a culture of experimentation, it's no more likely to let you experiment on a DBMS than on provisioned servers or anything else. From the company's point of view, the difficult (and usually impossible) sell is "can I get some time to experiment with it" -- unless you were explicitly hired to conduct such experiments. And that's rare in organisations without an experiment-oriented culture. Usually, if "innovation" can't be bought from Oracle or Microsoft along with a full support contract, or implemented entirely within an Excel spreadsheet, it's not going to happen. On the other hand, if your organisation does support experimentation, it's as likely to support provisioning a server or two as it is to allow you your own experimental database. Indeed, with modern data centre infrastructure, provisioning servers may require no more expense and effort than launching a couple more virtual machine instances on the corporate private cloud.


Time for a Tuple-Ware party? :-)



See original on c2.com