OT: Convince me github is a wise choice

Eliot Miranda-2
Hi Ben, Hi All,

    I'm quite conservative when it comes to relying on others' infrastructure so I need some help making me take the plunge.  Please see below:

On Thu, Dec 17, 2015 at 7:15 AM, Ben Coman <[hidden email]> wrote:

> On Wed, Dec 16, 2015 at 1:00 AM, Ben Coman <[hidden email]> wrote:
>>
>>
>> On Wed, Dec 16, 2015 at 10:43 AM, Ryan Macnak <[hidden email]> wrote:
>> >
>> > What would be more helpful is if the VM build was fixed to work with a cross compiler, so it would compile fast enough to test ARM and MIPS on Travis CI alongside IA32 and X64.
>> >
>> > It would also help if the top-of-tree Intel VMs were always kept working so we'd know which change broke something. Moving the Subversion repository to a more reliable host (which likely means migrating to Git) would also cut down on the false positives Travis reports because the Subversion server has a habit of dropping connections.
>>
>> +1 github :)

btw, did you know that github has supported subversion clients since 2011 [1]?
Here are supported features [2].  Are these sufficient for your
current svn workflows?
Potentially we could have ONE repository and those liking subversion
can stick with it and those liking git can use that.  Of course, this
would need to be proven.

Ah, that's interesting.  So my concern is whether github is a safe long-term bet.  Specifically what is there to prevent some third party from buying github, or of github going public and the board taking the decision, or github on its own, deciding to charge for hosting, keeping the data hostage to extract payment?  What safeguards are in place to prevent this?  I'm not interested in "this will never happen" arguments.  I'm interested in hard data please.

[4] provides pragmatic advice for cutting over.  Esteban appears to
have done something similar to steps 1 and 2 [3], but it seems his
modifications sometimes update this mirror directly, so it's not clear when
that branch is an *exact* copy of the current svn trunk.  So I'd love
to see a github repository that is always an *exact* mirror of the svn
repository, with any pharo mods occurring in a branch off that.  Even
better if the repository for svn users resides on github in place of
that mirror.

I've been googling around for problems reported using github via an
svn client, and haven't found any smoking guns.
Is this something we can trial?  I'm willing to put some effort into
it.  A key requirement would be not interrupting Eliot's work on
Spur-64.  Potentially we could stay for months on step 3 [4] with the
CI infrastructure running on the git side, but code check-ins
continuing on the svn side.

btw2, [5] provides a use case for the advantages of a full switch.

cheers -ben

[1] https://github.com/blog/966-improved-subversion-client-support
[2] https://help.github.com/articles/support-for-subversion-clients/
[3] https://github.com/pharo-project/pharo-vm/network
[4] http://blogs.atlassian.com/2013/01/atlassian-svn-to-git-migration-technical-side/
[5] http://blogs.atlassian.com/2013/01/svn-to-git-how-atlassian-made-the-switch-without-sacrificing-active-development/



--
_,,,^..^,,,_
best, Eliot



Re: OT: Convince me github is a wise choice

Peter Crowther-2
On 17 December 2015 at 17:29, Eliot Miranda <[hidden email]> wrote:
> So my concern is whether github is a safe long-term
> bet.  Specifically what is there to prevent some third party from buying
> github, or of github going public and the board taking the decision, or
> github on its own, deciding to charge for hosting, keeping the data hostage
> to extract payment?  What safeguards are in place to prevent this?  I'm not
> interested in "this will never happen" arguments.  I'm interested in hard
> data please.

There is nothing in place commercially to prevent this, but there are
some technical safeguards.  Principal among these is that a clone of a
repo includes a full version history.  One or more of us could choose
to "git clone" regularly from the github repo, and treat those as
backups against github becoming unavailable or unreasonable.
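
A minimal sketch of such a periodic backup, assuming a POSIX shell and a reasonably recent git. The function name `backup_repo` and any paths or URLs passed to it are hypothetical, not something the project already has:

```shell
#!/bin/sh
# backup_repo SRC DEST
# Keep DEST as a bare mirror of SRC, including all branches, tags and history.
# (Hypothetical helper; SRC may be a URL or a local path.)
backup_repo() {
    src=$1
    dest=$2
    if [ -d "$dest" ]; then
        # Mirror already exists: fetch whatever is new upstream.
        # No --prune, so refs deleted upstream are kept in the backup.
        git -C "$dest" fetch --quiet origin
    else
        # First run: copy every ref and the full history.
        git clone --quiet --mirror "$src" "$dest"
    fi
}
```

It could be run by hand or from a scheduler, e.g. `backup_repo https://github.com/example/cog.git /srv/backup/cog.git` (both arguments are placeholders). Leaving out `--prune` is a deliberate choice here: a branch deleted on the host should not silently vanish from the backup.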

- Peter


Re: OT: Convince me github is a wise choice

Patrick R.
In this regard: we could certainly set up a server that periodically pulls all branches of the repository. Additionally, there are tools to back up the metadata associated with the repository (issues, wiki, etc.) :)

http://heyrod.com/snippets/github-backup.html
https://github.com/joeyh/github-backup

- Patrick

Re: OT: Convince me github is a wise choice

Colin Putney-3
In reply to this post by Eliot Miranda-2


On Thu, Dec 17, 2015 at 9:29 AM, Eliot Miranda <[hidden email]> wrote:
> Ah, that's interesting.  So my concern is whether github is a safe long-term bet.  Specifically what is there to prevent some third party from buying github, or of github going public and the board taking the decision, or github on its own, deciding to charge for hosting, keeping the data hostage to extract payment?  What safeguards are in place to prevent this?  I'm not interested in "this will never happen" arguments.  I'm interested in hard data please.

This sounds like a risk management problem. We want to minimize the risk that we lose access to the source code and its history, right? Is there other data that you are concerned about?

With regard to GitHub, I think these are the interesting questions:

  1. What are the chances that GitHub will stop providing free hosting to open source projects?
  2. What are the consequences if #1 occurs?
  3. What can we do about it?

First, let's look at #1. This sort of thing does happen. Holding data hostage is unusual, but free online services get shut down all the time. What might cause *Github* to do it?

Could they be forced to cut expenses? Github has been around for almost 8 years, and has stuck with its model of "free public repositories, pay for privacy" throughout that time. It seems to be working for them. Three years ago one of their investors said they've been profitable over most of their life, and are growing revenue at 300% per year[1]. This summer, they raised $250 million more, with the company valued at $2 billion[2]. That indicates that they're still growing quickly, and think they'll be able to expand into new markets. So running out of money and dropping free hosting as a way to cut costs seems unlikely.

How about a change in control? Maybe Oracle will buy them and squeeze as much profit out of them as possible before tossing the dry husk away. For that to happen, the offer would have to be spectacular. Github's investors need at least a 10x return, and probably more, to make money for their funds. If they were worth $2 billion this summer, the acquisition price would have to be something like $20-50 billion. That just doesn't allow the buyer much room to maneuver. There's no special technology behind Github that it would make sense to acquire at that price. Github's value is entirely in market position, customer relationships, goodwill etc. To make back the money, the buyer would need to keep running Github and keep earning revenue from it.

Going public? Even less likely. Because of regulatory changes, tech companies have been waiting longer to go public and doing so at a much higher valuation. (Lots of different takes on this, but see e.g. [3].) If Github went public, it would be because its valuation was so high that employees and investors wanted to (more easily) sell some shares and enjoy their wealth. That would be a huge endorsement of the business model and current management team. With few investors (only five so far[4]), the founders would undoubtedly retain control, similar to the IPOs of Google and Facebook. Messing with the business model would be unthinkable at that point.

What if Github decided to change strategies without some sort of external impetus? That seems unlikely as well. The economics underlying the freemium strategy are getting more and more compelling over time. Disks are cheap, and the cost of storage keeps going down. I just ran across a new cloud storage service that charges half-a-cent per GB per month[5]. Computing power is also getting cheaper, and with cluster managers like Mesos and Kubernetes, we're using it more efficiently as well. The "burden" of providing free hosting is low and will be getting lower as time goes on. 

On the other hand, Github is *the* go-to place for hosting source code. There are millions of users that have both free public repositories and paid private ones. (Github reports 12 million users[6], and I bet a large fraction of them at least have access to both public and private repositories.) Taking away the free repositories would alienate a LOT of customers, and hurt revenue.

So, without saying "this will never happen," I will say that Github shutting down free hosting would be unlikely.

Alright, let's look at #2. If the unlikely did happen, what would be the consequences?

As others have mentioned, the architecture of git makes it impossible to hold the source code and history hostage. Everyone who clones a git repository has a complete copy of the data. If they decided to lock everyone out of the repositories we'd just get another server and do this:

    cd cog
    # A clone already has a remote named "origin", so repoint it instead of adding it.
    git remote set-url origin git://git.squeak.org/cog.git
    git push origin master

At the same time, we'd be in good company. Github currently has 30 million repositories[6]. Let's be really generous and say that half of those are private, and thus paid-for and exempt from hostage-taking. That means 15 million repositories are now subject to extortion from Github. Sure, most of those are personal forks with no significant changes. But even if there were only, say, 100,000 "real" repositories, that would be a *cataclysm* for the open source world. Alternate hosting would be popping up all over the place, and whatever inconvenience we might have about moving would be quickly solved by larger and richer open source projects. It wouldn't take much more than "here's our new git hosting" posted on the mailing list and squeak.org to make the change, because *everybody* would know about the problem.

Finally, #3, what can we do about it?

Well, in terms of influencing Github's business model, nothing. We have no leverage. So #1 is out of our control.

But, there are a few things we can do to improve #2. First, we could mirror all commits to another repository. That could be a Github competitor, like BitBucket, or just a server that we host with Rackspace or whatever, or even "offline" storage like S3. I believe the Pharo folks are already mirroring the VM source, from the current hosting, so that helps reduce the risk as well.
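
Mirroring to a second host could look roughly like the sketch below. It assumes a POSIX shell and git 2.7 or later (for `git remote get-url`). The function name `mirror_to`, the remote name and the URL are placeholders for illustration only:

```shell
#!/bin/sh
# mirror_to REPO NAME URL
# Push every branch and tag of the local clone REPO to a second host.
# (Hypothetical helper; NAME and URL are whatever the mirror is called.)
mirror_to() {
    repo=$1
    name=$2
    url=$3
    # Register the mirror once; later runs reuse the existing remote.
    git -C "$repo" remote get-url "$name" >/dev/null 2>&1 ||
        git -C "$repo" remote add "$name" "$url"
    # --all and --tags cannot be combined in one push, so push twice.
    git -C "$repo" push --quiet "$name" --all &&
        git -C "$repo" push --quiet "$name" --tags
}
```

For example, `mirror_to ~/src/cog backup git@bitbucket.org:example/cog.git` (path and URL are made up). The sketch uses `--all` plus `--tags` rather than `--mirror` because `--mirror` from an ordinary working clone would also push its remote-tracking refs.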

Second, we could move more of the VM source into Smalltalk. That might mean generating more of the source files with VMMaker, running builds from within the image instead of using CMake, etc. It probably wouldn't be worth it to make *all* the platform sources versioned in MC, but we could go further in that direction from where we are now.

Finally, if it really did come down to Github holding the sources hostage and we had no other copies, we could just pay up. Currently, their cheapest plan is $7/month for 5 private repositories, which ought to cover our needs. Even with the meager donations that Squeak attracts today, surely we could raise $85 to get a year of paid hosting, and use that time to figure out what to do for the long term. Github might raise their prices (Why not? This scenario already has them being suicidally irrational.), but I can't see them exceeding our fundraising capabilities. What's the point of extortion if the victim can't pay?
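
The arithmetic behind the $85 figure can be checked in a few lines of shell. The prices are the ones quoted above; the variable names are only illustrative:

```shell
#!/bin/sh
# Figures quoted above: $7/month plan, $85 fundraising target.
monthly_usd=7
target_usd=85
yearly_usd=$((monthly_usd * 12))
echo "one year of hosting: \$${yearly_usd}"
if [ "$yearly_usd" -le "$target_usd" ]; then
    echo "covered by the \$${target_usd} target"
fi
```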

(As a side note, I would be shocked if hosting squeakvm.org currently costs less than $7/month. No idea who's paying for it, but how confident are we that they'll continue to do so?)

In summary, Github is a very safe bet. Your nightmare scenario involves a series of very improbable events: Github would have to stop offering free hosting. They'd have to actively alienate their paying customers by holding their source code hostage. There would have to be sudden disk failures on dozens of laptops and servers where the repository is cloned. And to top it all off, the larger Squeak community, including Pharo, Cuis, Newspeak, Scratch and Croquet, would have to be unable to come up with a few dozen dollars to pay for the hosting.

This will never happen.

Colin





Re: OT: Convince me github is a wise choice

fniephaus
There's not much I can add to Colin's great write-up, except that others must
have asked themselves the same questions. If moving to GitHub were a
big risk, many companies, including big ones such as Microsoft and Google,
would not have done it already; even Apple has now released Swift on GitHub.

I am also +1 for GitHub. We have been successfully using it as a hosting
platform for student projects [1]. IMHO, it is very convenient not to have to worry
about solved problems such as infrastructure. Also, mirroring a git repository
can be done with a simple cronjob. I must admit that there is still
room to improve the client-side tooling (git + Filetree), but at least I don't have to
worry about running a server and maintaining a SqueakSource/SqueakMap
instance anymore.

Lastly, we have been working on bringing Smalltalk support to Travis CI which
will hopefully make it very easy to enable CI for any Smalltalk project on GitHub.
An announcement will follow very soon.

Happy holidays,
Fabio 




Re: OT: Convince me github is a wise choice

fniephaus
"Python moves to GitHub":
https://mail.python.org/pipermail/core-workflow/2016-January/000345.html



Re: [Vm-dev] Re: [squeak-dev] OT: Convince me github is a wise choice

Frank Shearar-3
On 1 January 2016 at 22:08, Fabio Niephaus <[hidden email]> wrote:

>
> "Python moves to GitHub":
> https://mail.python.org/pipermail/core-workflow/2016-January/000345.html
>
> On Mon, Dec 21, 2015 at 10:52 AM Fabio Niephaus <[hidden email]> wrote:
>>
>> On Mon, Dec 21, 2015 at 7:17 AM Colin Putney <[hidden email]> wrote:
>>>
>>> On Thu, Dec 17, 2015 at 9:29 AM, Eliot Miranda <[hidden email]> wrote:
>>>
>>>>
>>>> Ah, that's interesting.  So my concern is whether github is a safe long-term bet.  Specifically what is there to prevent some third party from buying github, or of github going public and the board taking the decision, or github on its own, deciding to charge for hosting, keeping the data hostage to extract payment?  What safeguards are in place to prevent this?  I'm not interested in "this will never happen" arguments.  I'm interested in hard data please.
>>>
>>>
>>> This sounds like a risk management problem. We want to minimize the risk that we lose access to the source code and its history, right? Is there other data that you are concerned about?
>>>
>>> With regard to GitHub, I think these are the interesting questions:
>>>
>>> 1. What are the chances that GitHub will stop providing free hosting to open source projects?
>>> 2. What are the consequences if #1 occurs?
>>> 3. What can we do about it?
>>>
>>> First, let's look at #1. This sort of thing does happen. Holding data hostage is unusual, but free online services get shut down all the time. What might cause *Github* to do it?
>>>
>>> Could they be forced to cut expenses? Github has been around for almost 8 years, and has stuck with its model of "free public repositories, pay for privacy" throughout that time. It seems to be working for them. Three years ago one of their investors said they've been profitable over most of their life, and are growing revenue at 300% per year[1]. This summer, they raised $250 million more, with the company valued at $2 billion[2]. That indicates that they're still growing quickly, and think they'll be able to expand into new markets. So running out of money and dropping free hosting as a way to cut costs seems unlikely.
>>>
>>> How about a change in control? Maybe Oracle will buy them and squeeze as much profit out of them as possible before tossing the dry husk away. For that to happen, the offer would have to be spectacular. Github's investors need at least a 10x return, and probably more, to make money for their funds. If they were worth $2 billion this summer, the acquisition price would have to be something like $20-50 billion. That just doesn't allow the buyer much room to maneuver. There's no special technology behind Github that would make sense to acquire at that price. Github's value is entirely in market position, customer relationships, goodwill etc. To make back the money, the buyer would need to keep running Github and keep earning revenue from it.
>>>
>>> Going public? Even less likely. Because of regulatory changes, tech companies have been waiting longer to go public and doing so at a much higher valuation. (Lots of different takes on this, but see eg. [3]) If Github went public, it would be because its valuation was so high that employees and investors wanted to (more easily) sell some shares and enjoy their wealth. That would be a huge endorsement of the business model and current management team. With few investors—only five so far[4]—the founders would undoubtedly retain control, similar to the IPOs of Google and Facebook. Messing with the business model would be unthinkable at that point.
>>>
>>> What if Github decided to change strategies without some sort of external impetus? That seems unlikely as well. The economics underlying the freemium strategy are getting more and more compelling over time. Disks are cheap, and the cost of storage keeps going down. I just ran across a new cloud storage service that charges half-a-cent per GB per month[5]. Computing power is also getting cheaper, and with cluster managers like Mesos and Kubernetes, we're using it more efficiently as well. The "burden" of providing free hosting is low and will be getting lower as time goes on.
>>>
>>> On the other hand, Github is *the* go-to place for hosting source code. There are millions of users that have both free public repositories and paid private ones. (Github reports 12 million users[6], and I bet a large fraction of them at least have access to both public and private repositories.) Taking away the free repositories would alienate a LOT of customers, and hurt revenue.
>>>
>>> So, without saying "this will never happen," I will say that Github shutting down free hosting would be unlikely.
>>>
>>> Alright, let's look at #2. If the unlikely did happen, what would be the consequences?
>>>
>>> As others have mentioned, the architecture of git makes it impossible to hold the source code and history hostage. Everyone who clones a git repository has a complete copy of the data. If they decided to lock everyone out of the repositories we'd just get another server and do this:
>>>
>>> cd cog
>>> git remote set-url origin git://git.squeak.org/cog.git
>>> git push origin master
>>>
>>> At the same time, we'd be in good company. Github currently has 30 million repositories[6]. Let's be really generous and say that half of those are private, and thus paid-for and exempt from hostage-taking. That means 15 million repositories are now subject to extortion from Github. Sure, most of those are personal forks with no significant changes. But even if there were only, say, 100,000 "real" repositories, that would be a *cataclysm* for the open source world. Alternate hosting would be popping up all over the place, and whatever inconvenience we might have about moving would be quickly solved by larger and richer open source projects. It wouldn't take much more than "here's our new git hosting" posted on the mailing list and squeak.org to make the change, because *everybody* would know about the problem.
>>>
>>> Finally, #3, what can we do about it?
>>>
>>> Well, in terms of influencing Github's business model, nothing. We have no leverage. So #1 is out of our control.
>>>
>>> But, there are a few things we can do to improve #2. First, we could mirror all commits to another repository. That could be a Github competitor, like BitBucket, or just a server that we host with Rackspace or whatever, or even "offline" storage like S3. I believe the Pharo folks are already mirroring the VM source, from the current hosting, so that helps reduce the risk as well.
>>>
>>> Second, we could move more of the VM source into Smalltalk. That might mean generating more of the source files with VM maker, running builds from within the image instead of using CMake etc. It probably wouldn't be worth it to make *all* the platform sources versioned in MC, but we could go further in that direction from where we are now.
>>>
>>> Finally, if it really did come down to Github holding the sources hostage and we had no other copies, we could just pay up. Currently, their cheapest plan is $7/month for 5 private repositories, which ought to cover our needs. Even with the meager donations that Squeak attracts today, surely we could raise $85 to get a year of paid hosting, and use that time to figure out what to do for the long term. Github might raise their prices (Why not? This scenario already has them being suicidally irrational.), but I can't see them exceeding our fundraising capabilities. What's the point of extortion if the victim can't pay?
>>>
>>> (As a side note, I would be shocked if hosting squeakvm.org currently costs less than $7/month. No idea who's paying for it, but how confident are we that they'll continue to do so?)
>>>
>>> In summary, Github is a very safe bet. Your nightmare scenario involves a series of very improbable events: Github would have to stop offering free hosting. They'd have to actively alienate their paying customers by holding their source code hostage. There would have to be sudden disk failures on dozens of laptops and servers where the repository is cloned. And to top it all off, the larger Squeak community, including Pharo, Cuis, Newspeak, Scratch and Croquet would have to be unable to come up with a few dozen dollars to pay for the hosting.
>>>
>>> This will never happen.
>>>
>>> Colin
>>>
>>>
>>> [1] http://peter.a16z.com/2012/07/09/software-eats-software-development/
>>> [2] http://fortune.com/2015/07/29/github-raises-250-million-in-new-funding-now-valued-at-2-billion/
>>> [3] http://www.forbes.com/sites/samanthasharf/2014/12/24/is-the-ipo-outmoded-why-venture-backed-companies-are-waiting-longer-to-go-public/
>>> [4] https://www.crunchbase.com/organization/github/investors
>>> [5] https://www.backblaze.com/b2/cloud-storage.html
>>> [6] https://github.com/about/
>>>
>>
>> There's not much I can add to Colin's great write-up except that others must
>> have asked themselves the same questions as well. If moving to GitHub were a
>> big risk, many companies wouldn't have done it already, including the big ones
>> (e.g. Microsoft and Google, and now even Apple has released Swift on GitHub).
>>
>> I am also +1 for GitHub. We have been successfully using it as a hosting
>> platform for student projects [1]. IMHO, it is very convenient not to have to worry
>> about solved problems such as infrastructure. Also, mirroring a git repository
>> can be done with a simple cronjob. However, I must admit that there's still
>> potential to improve client-side tooling (git + Filetree), but at least I don't have to
>> worry about running a server and maintaining a SqueakSource/SqueakMap
>> instance anymore.
>>
>> Lastly, we have been working on bringing Smalltalk support to Travis CI which
>> will hopefully make it very easy to enable CI for any Smalltalk project on GitHub.
>> An announcement will follow very soon.
>>
>> Happy holidays,
>> Fabio
>>
>> [1] https://github.com/hpi-swa-teaching

I just stumbled across this - https://github.com/joeyh/github-backup -
which will back up EVERYTHING GitHub knows about your repository -
milestones, issues, forks, comments, etc etc. And with `github-backup
<username>` you can back up an entire account (and all the
repositories that account watches/has starred). For example,
`github-backup squeak-smalltalk`.

(There are of course some limitations - see
https://github.com/joeyh/github-backup/blob/master/README.md#limitations
for details. But for the most important purposes - backing up source,
and bugs/wikis - the limitations won't cause inconvenience.)
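
To run that backup unattended, a small wrapper could be used, roughly as sketched here. The function name `backup_account` and the destination directory are made up for illustration, and the sketch assumes that `github-backup` writes its clones into the current working directory; only the `github-backup <username>` invocation itself comes from the tool's usage described above.

```shell
#!/bin/sh
# Hypothetical wrapper around github-backup; not part of the tool itself.
# Usage: backup_account ACCOUNT DEST_DIR
backup_account() {
    account=$1
    dest=$2

    # Fail early with a clear message if the tool is not on PATH.
    if ! command -v github-backup >/dev/null 2>&1; then
        echo "github-backup is not installed" >&2
        return 1
    fi

    mkdir -p "$dest" || return 1

    # Run in a subshell so the caller's working directory is unchanged.
    # Assumption: github-backup stores its output in the current directory.
    ( cd "$dest" && github-backup "$account" )
}
```

For example, `backup_account squeak-smalltalk /srv/backups/github` would keep the account's backup under a single directory that a cronjob can refresh.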

frank