Re: Please consider Apache Integration of Services


Re: Please consider Apache Integration of Services

aglynn42

From the (admittedly limited) testing I’ve done, Storm is slower, has more overhead, and has more issues with concurrency than simply using the PostgreSQL overlay to HDFS.

 

It would also be easier to do this using Gemstone and Pivotal HDB than using Storm, since the code for doing it from Gemfire is already open source and massively smaller than the code for Storm (or Spark; the main difference there is the higher-level API).

 

Since I don’t see the advantage, what advantage do you see in it?

 

It’s just my $0.02, and given we no longer have five cent coins in Canada, you can round it to what it’s probably worth …

 

Andrew Glynn

 


 

From: [hidden email]
Sent: Friday, November 10, 2017 5:12 PM
To: [hidden email]; [hidden email]
Subject: [Pharo-users] Please consider Apache Integration of Services

 

Note the quote below: Storm components can be implemented in any language. Hello, Smalltalk!

 

At the risk of throwing a rock in the pool, I must also acknowledge the unique offerings we have in the Smalltalk space. The challenge, as I see it, is the lack of a coordinated effort to adopt the common interfaces used in industry. I would hold up the history of the Cryptography team as an example of how different folks came together and joined efforts to create a shared library. It has stood the test of time and has been ported to multiple Smalltalk environments. Using that as a reference, if the various cloud and BigData applications were seen as worth building good integration with, all the great data-manipulation tools and presentations that Smalltalk environments offer would finally be available to put on the table in corporate and advanced data-processing environments.

 

Alright, some of you may accept what I have been saying, while those still contemplating it will be curious about what work it entails. Allow me to present a few of the projects that work together under the Apache Foundation. They are truly leaders in creating BigData computing. The core architecture commonly used in BigData and Cloud is called the Lambda Architecture. It consists of a fault-tolerant event-streaming source, such as Apache Kafka [1]; a batch-processing pipe and NoSQL database coordinator, such as Apache Cassandra [2]; and a real-time processing pipe, for example Apache Storm [3]. There are also analytics on the other side of storage, such as going through Cassandra to Hadoop and queries. What I would like to highlight is Apache Storm.

 

Apache Storm is quite simple in concept, though more complex on the hardware side. One thing to keep in mind is that the fault-tolerance requirement has forced both Kafka and Storm to be replication-centric in a decentralized way. They tend to use Apache ZooKeeper [4] to monitor progress through durable queues of data. Storm in particular is a way to consume streams of events, including the ability to join and filter them. Here are two blurbs: one about the Storm architecture and the other about how other languages can implement pieces of that architecture, especially bolts.

 

My question is: why can't, and why shouldn't, Smalltalks (Pharo, Squeak, Gemstone, Smalltalk Express, Swift, ST/X, Dolphin) participate? Working software engineers are building this stuff, and lots of time and money are being spent converting critical data in industry. It seems now is the time to jump aboard, as much computing will be done in these areas, and we can compete! Here are blurbs about Storm, then links to Kafka [1], Cassandra [2] and Storm [3].

 

------

"There are just three abstractions in Storm: spouts, bolts, and topologies. A spout is a source of streams in a computation. Typically a spout reads from a queueing broker such as Kestrel, RabbitMQ, or Kafka, but a spout can also generate its own stream or read from somewhere like the Twitter streaming API. Spout implementations already exist for most queueing systems.

A bolt processes any number of input streams and produces any number of new output streams. Most of the logic of a computation goes into bolts, such as functions, filters, streaming joins, streaming aggregations, talking to databases, and so on.

A topology is a network of spouts and bolts, with each edge in the network representing a bolt subscribing to the output stream of some other spout or bolt. A topology is an arbitrarily complex multi-stage stream computation. Topologies run indefinitely when deployed."

------
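
To make the three abstractions in the blurb above concrete, here is a minimal in-process sketch in Python. It is not Storm's API: the `Topology` class, `add_spout`, `add_bolt`, `run` and the word-count functions are all hypothetical names made up for illustration, and the sketch runs a finite stream in a single thread, whereas a real topology is distributed and runs indefinitely.

```python
from collections import defaultdict


class Topology:
    """Toy model (not Storm's API): named spouts and bolts joined by subscriptions."""

    def __init__(self):
        self.spouts = {}                      # name -> callable yielding tuples
        self.bolts = {}                       # name -> callable(tuple) yielding tuples
        self.subscribers = defaultdict(list)  # source name -> subscribing bolt names

    def add_spout(self, name, spout):
        self.spouts[name] = spout

    def add_bolt(self, name, bolt, sources):
        # Each (source, bolt) pair is one edge of the network.
        self.bolts[name] = bolt
        for source in sources:
            self.subscribers[source].append(name)

    def run(self):
        """Push every spout tuple through the graph.

        Returns the output of bolts that nobody subscribes to.
        """
        results = defaultdict(list)

        def deliver(source, tup):
            targets = self.subscribers.get(source, [])
            if not targets and source in self.bolts:
                results[source].append(tup)
            for name in targets:
                for out in self.bolts[name](tup):
                    deliver(name, out)

        for name, spout in self.spouts.items():
            for tup in spout():
                deliver(name, tup)
        return dict(results)


def sentence_spout():
    # A spout that generates its own (finite) stream.
    yield ("the quick brown fox",)
    yield ("the lazy dog",)


def split_bolt(tup):
    # A function-style bolt: one sentence in, one tuple per word out.
    for word in tup[0].split():
        yield (word,)


def make_count_bolt():
    # A streaming aggregation: keeps a running count per word.
    counts = defaultdict(int)

    def count_bolt(tup):
        counts[tup[0]] += 1
        yield (tup[0], counts[tup[0]])

    return count_bolt


def build_word_count():
    topo = Topology()
    topo.add_spout("sentences", sentence_spout)
    topo.add_bolt("split", split_bolt, ["sentences"])
    topo.add_bolt("count", make_count_bolt(), ["split"])
    return topo
```

Calling `build_word_count().run()` returns the running counts emitted by the `count` bolt, keyed by bolt name. A Smalltalk version would have the same shape, with blocks standing in for the spout and bolt functions.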

"Storm was designed from the ground up to be usable with any programming language. At the core of Storm is a Thrift definition for defining and submitting topologies. Since Thrift can be used in any language, topologies can be defined and submitted from any language.

Similarly, spouts and bolts can be defined in any language. Non-JVM spouts and bolts communicate to Storm over a JSON-based protocol over stdin/stdout. Adapters that implement this protocol exist for Ruby, Python, Javascript, Perl."

------
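
The JSON-over-stdin/stdout protocol mentioned in the second blurb is what a Smalltalk adapter would have to speak. The sketch below, in Python, shows only the message framing and one bolt step. It rests on my reading of the Storm multilang documentation: each message is a JSON object followed by a line containing only `end`, and a bolt replies with `emit` and `ack` commands. Those field names should be checked against the docs for the Storm version in use. The function names `read_message`, `write_message` and `split_sentence` are hypothetical, and the initial handshake, heartbeats and task-id replies are left out.

```python
import io
import json


def read_message(stream):
    """Read lines up to a lone 'end' line and decode them as one JSON message."""
    lines = []
    for line in stream:
        if line.strip() == "end":
            return json.loads("".join(lines))
        lines.append(line)
    return None  # stream closed before a complete message arrived


def write_message(stream, message):
    """Write one JSON message followed by the 'end' terminator line."""
    stream.write(json.dumps(message) + "\nend\n")
    stream.flush()


def split_sentence(tup, out):
    """Bolt step: emit one anchored tuple per word, then ack the input tuple.

    Assumes the incoming message carries 'id' and 'tuple' fields.
    """
    sentence = tup["tuple"][0]
    for word in sentence.split():
        write_message(out, {"command": "emit", "anchors": [tup["id"]], "tuple": [word]})
    write_message(out, {"command": "ack", "id": tup["id"]})
```

In a real adapter the two streams would be `sys.stdin` and `sys.stdout`; passing them as parameters lets the framing be exercised with `io.StringIO` objects instead of a running Storm cluster.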

 

 

- HH

 

 

 


Re: Please consider Apache Integration of Services

henry
The advantage I see is enabling Smalltalk to be a real open-source player in BigData. All the modern corporations are using the cloud for BigData. The internet companies of the twenty-teens developed all of this software for back-office and middleware processing, and Smalltalk does not rank highly among the languages used. What is the future, and where is Smalltalk in that future? I love Smalltalk and I think about this. Smalltalk has the best development environment, but other languages are not far off. Being the most creative development environment does not win contracts. So I think an analysis is useful.

Client apps are slim pickings, what with so many options. In reality, nuclear power stations, banks and government bureaucracies are all using web interfaces to their business apps. The retail market is governed by smartphones and PlayStations. I just do not believe we can make a phenomenal jump to prominence in those areas. Another way to look at it is to ask which areas of computing have the most marketplace velocity and acceleration. Hands down, that is BigData and the cloud management of services. Along with these areas come vast standardizations of data structure definition, what with XML, JSON, Avro and the like. I just found Apache Thrift, a compiled language for defining data interfaces. Perhaps that should be a target.

We can't do everything, but it would be great to see Squeak and Pharo in the list of languages that support pieces of this architecture. The advantage of Storm is as an introduction. Some combinations of technologies may be more performant, but that is no longer the key parameter, what with hundreds of partitions forking data to different pipelines to be processed massively in parallel. Add to this the rise of machine learning on top of all this data flow.

Note I said data flow, because half of these new technology architectures is stream processing, while the other half is query analytics after the data has been captured. Do we agree here?

The key is encoding. I thought the advantages of adopting Storm interoperability would be:
1 - placing the language as a feasible option for interoperability
2 - initial encoding standards (Avro and so on)
3 - participation in BigData
4 - solidified capability on the stream side
5 - a gateway to operating on the query side

This is the future of computing in a significant space where we are not really playing. Ruby is ahead because it can interoperate. I was hoping to encourage a combined, coordinated effort towards creating a set of backoffice tools, one each. Currently, I read of three or so different windowing and rendering systems. Why all the duplicated effort there, at the expense of integrating with other environments? That seems to be key, to me.

Perhaps there is a better choice to start with than Storm? I submit it merely as an initial suggestion to coalesce focus, to the backoffice Smalltalk consortium.

- HH


-------- Original Message --------
Subject: RE: [Pharo-users] Please consider Apache Integration of Services
Local Time: November 10, 2017 5:55 PM
UTC Time: November 10, 2017 10:55 PM
To: henry <[hidden email]>, Any question about pharo is welcome <[hidden email]>
