Friday, February 24, 2012

Put More Real Estate on that Desk

In my talk at the A2 Data Dive I mentioned my three-way video splitter, a Matrox TripleHead2Go. I still find it amazing that so many people who work on computers most of their day do not use dual monitors. I use a laptop because I move around and speak in classes etc., so the TH2Go lets me run the setup displayed below. When I am just hacking away it is wonderful to have docs, terminals, and email all on the desktop at the same time. I also make heavy use of Spaces, the virtual desktops on my Mac.

Most desktops can support two heads. If you are paying someone to be on a computer most of their day, they should have at least two screens for the extra cost and, if you can manage it, three. I personally start to see diminishing returns past three. Or you can go overboard and go 3x3 to make a 9-tile display.

My $0.02 for the day. I can't recommend the extra real estate enough. Once windows start overlapping I slow way down and lose focus.

Brock's Work Desk, Laptop+TH2Go 3x19" 4x3 Screens

U of M 4K display 3 GPU Desktop w/3xTH2Go
Co-Worker's Desktop

Thursday, February 23, 2012

A2 Data Dive Talk

On February 11th I gave a talk at the A2 Data Dive on VisIt, with mentions of GlobusOnline. The video is below:


Thursday, February 9, 2012

Doing More With the Same Budget

Andrew Jones has an article up at HPCWire on my series of posts on HPC funding models. I wish to move away from condo funding, in which researchers own the underlying hardware, to a service model where users effectively rent cores. This is exactly what we did at Michigan.

Note that my comments on this blog are mine and do not reflect those of my major employer.

I don't actually see much disagreement between Andrew and myself. The free model, overheads in Andrew's terms, is a wonderful example of the Tragedy of the Commons. Queue times get long, and what is to stop anyone from running anything? Under this wild west approach, which I don't think Andrew is advocating, users end up paying in time. As with anything, when you hold supply steady and increase demand, wait times go up.

I live and work in a world where nothing is free. Nodes, storage, admins, consulting, power, facilities etc. all take real resources, and their use needs to be moderated. Many educational institutions live in a world where the support from central is normally admins, maybe power, and maybe some software tokens.

By default in this world the researcher must bring the hardware, in the form of the funds to purchase nodes. In most cases this hardware can only be used by that group. I currently run user support for a cluster of 5600+ cores spread across 51 groups. Overall utilization is 50-70%, but because these groups cannot, and are not allowed to, share, utilization inside a given group is very different. Some are 100% and queued, some are idle.

If the assumption is that we will never be just given the gear to run, and that funding agencies are only going to spend $X on computational research a year, what value do you want out of it? I argue that just by reorganizing how the capital funding is spent, more value is realized, or as Andrew says, "Science and Business output".

I agree 100% with Andrew that users do not always like paying for high speed networking, consults etc and might not even realize until after the funding arrives that that is what they need.  I also agree that some of the most interesting stuff is the "just try this" jobs.  I think these are problems we can solve other ways out of the savings of driving utilization higher.  I don't expect that all resources should be billed by the unit.  Some resources will most efficiently be provided as a public good.

Under the system I propose I would expect to extract close to maximum value from the capital resources, while maintaining great service. Thus far these resources have been the hardware and the facilities it consumes. In my most recent post I point out the benefit of a HAAS model to users:
Users gain flexibility in utilization. Groups with small budgets can now utilize large chunks of HPC resources for short periods, opening HPC to an entire new class of user. To illustrate with the Michigan Flux Project, a group with a budget of $1000 could not even buy one node in a condo but can purchase 89 cores of compute for 1 month, or 1 core for 89 months. The options here are the biggest benefit to be realized by moving to a HAAS model.
This flexibility in utilization of the hardware equates to flexibility in the most important resource, which is the researcher behind the job. I am not saying the user behind the desk does nothing while they are jailed to smaller core counts in their condo, which they must maintain for 5 years, but I do think they can gain huge outcomes from being able to use a large number of resources in a short period for the same cost or lower than that of a condo.

Lastly, the biggest win is that HAAS is approachable to those with the smallest budgets, the bottom billion researchers. If you don't get a bucket of money to provide HPC to everyone who comes knocking, what do you tell that experimentalist who wants to run one model to align their input? That they have to pony up for all the cores they want to use for 5 years? Under HAAS this large base of small users can bring a large amount of resources to bear in a short period. This is real science and business value created from the same set of resources.

Hardware will always cost $X for Y cores, and HPC admins will run those Y cores for 5 years. I don't think it is good to drag the users along with it. $X is a constant, but there is no reason users should be forced into Y cores for 5 years. They should be able to vary Y and the number of years (or months, or days) up and down until the total area under the curve is $X.
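As a rough sketch of that area-under-the-curve idea, the snippet below takes the one concrete figure quoted above, a $1000 budget buying 89 core-months, and lists a few core-count and duration pairs that spend the same budget. The per-core-month price is only what those two numbers imply, not a published rate, and the function name is made up for illustration.

```python
# Figures quoted in the post: $1000 buys 89 core-months.
BUDGET_DOLLARS = 1000
CORE_MONTHS = 89

# Assumption: a flat price per core-month, implied by the two numbers above.
IMPLIED_RATE = BUDGET_DOLLARS / CORE_MONTHS


def months_for(cores, core_months=CORE_MONTHS):
    """Months a fixed pool of core-months lasts when `cores` run continuously."""
    return core_months / cores


if __name__ == "__main__":
    print(f"implied rate: ${IMPLIED_RATE:.2f} per core-month")
    # Every row below has the same area (cores x months) and so the same cost.
    for cores in (1, 8, 32, 89):
        print(f"{cores:3d} cores for {months_for(cores):6.2f} months")
```

Each row is the same rectangle reshaped: wide and short for a burst, narrow and long for a trickle.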

Reorganize how your resources are provided and for the same capital you will see more capability.

Why HPC Services are Smarter than Condos

In my continuing tirade against HPC condos I will now propose what I think the solution is: HPC As A Service or HAAS.

HAAS provides a number of advantages over condos, the main being flexibility in construction and utilization.

For HPC operators, leasing cores by the hour or month etc. gives the administering organization ownership over the hardware. If users don't own the hardware, operators can swap out the underlying hardware as needs demand. An example would be swapping out older hardware for newer, more powerful hardware to free data center space for expansion, cooling needs, etc.

With a condo this meant buyouts and negotiations that were long and drawn out, and had the overhead of negotiating with every hardware owner. With HAAS the service is maintained and can be moved from older equipment to new equipment without the involvement of users as long as the service sold before and after the hardware swap is the same or improved.

For funding agencies the HAAS model provides better utilization of hardware. The cost passed to the user, and thus to the funding source, should be less than the condo due to recovered capital depreciation. This would come from oversubscription of resources, so that their average utilization as a unit is higher than in the condo. Remember that in the condo groups own gear, and if a group is not currently using its gear, no other group can. In the HAAS model oversubscription can reach 50% or higher of the available hardware, driving capital utilization higher. Thus more research for less buck.
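To make the utilization argument concrete, here is a toy comparison of the same hardware run as a condo and as one shared pool. The four groups and their core counts are invented for illustration, and the model ignores scheduling details such as job sizes and queue order.

```python
def condo_utilization(owned, demand):
    """Each group can only run on the cores it owns."""
    used = sum(min(o, d) for o, d in zip(owned, demand))
    return used / sum(owned)


def pooled_utilization(owned, demand):
    """All cores form one pool; any group's demand can land on any core."""
    total = sum(owned)
    return min(sum(demand), total) / total


# Hypothetical snapshot: cores owned and cores wanted by four groups.
owned = [100, 100, 100, 100]
demand = [250, 100, 20, 0]

print(f"condo:  {condo_utilization(owned, demand):.1%}")
print(f"pooled: {pooled_utilization(owned, demand):.1%}")
```

With these made-up numbers the condo sits at 55% while one group is queued and another is idle; pooling the same cores serves 92.5% of capacity.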

Users gain flexibility in utilization. Groups with small budgets can now utilize large chunks of HPC resources for short periods, opening HPC to an entire new class of user. To illustrate with the Michigan Flux Project, a group with a budget of $1000 could not even buy one node in a condo but can purchase 89 cores of compute for 1 month, or 1 core for 89 months. The options here are the biggest benefit to be realized by moving to a HAAS model.

The existing groups with large funding and continuous needs also benefit from the flexibility, as they can now procure additional resources for short periods to augment their standard needs. Burst use for emergencies or other sporadic needs went unsatisfied under pure condo models.

I personally think more HPC providers should move to a HAAS model. These models are already used in the commercial space by providers like Amazon, Penguin and IBM. They are also heavily used in the academic space where funding does not change hands, e.g. XSEDE. As soon as funding comes into play, the push is for condos because of funding requirements, and this is unfortunate.

Tuesday, February 7, 2012

What is Out Of Core Processing

In my previous post on Swap and why to avoid it I mention out of core methods (OOC).  OOC used to be much more common than it is currently as memory sizes have increased and price per GB has fallen.  I still see traces of it in many engineering codes that have been with us a long time.  Many applications do OOC without realizing it.

OOC processing is the assumption that the working memory of the system, RAM, is less than that needed for the working set.  Thus the application is written such that data is read and written from disk to make up for the lacking space.  This is different from swap because the application does the read() and write() calls rather than the operating system doing this transparently for you.

While relying on the operating system is easy, the operating system is really just making educated guesses, and it tends to move data in small blocks rather than the large sequential reads and writes which are best for performance. I noted in my previous post that swapping hard drives maintain about 10-20MB/s, while they stream large reads and writes at 100MB/s+. If you write your own OOC method that writes all the data you will need later to disk in one large block, and then reads that previously written block back in, you will have much better performance than that provided by swapping.

A good example of this is iterating over a large array of values much larger than the RAM of the system. The application would read a chunk of data into memory, do the calculation it could on that chunk, put it back down to disk, read the next chunk, etc. Obviously this is still much slower and much more complicated to code than fitting the entire application into RAM if available.
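A minimal sketch of that read, compute, write-back loop is below. It assumes the data is a flat binary file of 64-bit floats whose size is a multiple of 8 bytes; the function name, the chunk size, and the choice of "multiply by a factor" as the calculation are all placeholders for illustration.

```python
from array import array

# Hypothetical chunk size in values; pick it so one chunk fits easily in RAM.
CHUNK_VALUES = 1 << 16


def scale_file_in_place(path, factor, chunk_values=CHUNK_VALUES):
    """Multiply every float64 in a binary file by `factor`, one chunk at a time."""
    chunk_bytes = chunk_values * array("d").itemsize
    with open(path, "r+b") as f:
        while True:
            start = f.tell()
            raw = f.read(chunk_bytes)      # one large sequential read
            if not raw:
                break                      # reached end of file
            chunk = array("d")
            chunk.frombytes(raw)
            for i in range(len(chunk)):    # the "calculation" on this chunk
                chunk[i] *= factor
            f.seek(start)                  # rewind to where this chunk began
            f.write(chunk.tobytes())       # one large sequential write
```

Only one chunk is ever resident in memory, and the application, not the operating system, decides when each block moves to and from disk.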

Thus OOC is not recommended if avoidable, but it is a better option if you know you will never have hardware with enough RAM to fit your data.  Lastly, the case of fitting all data into RAM is called in core.

Friday, February 3, 2012

Swap the Annoying Cousin of HPC

Swap, Paging, Virtual Memory, drunk relatives, whatever you call it, avoid it.  There has been a lot of confusion among users so I hope to dispel some of the myths and ideas around swap for HPC applications.

Virtual memory allows modern operating systems to present more memory to an application than is actually installed in the platform. There is a lot more going on here, but for the sake of simplicity consider our case: what happens if my HPC job requires 4GB of RAM and the node I am allocated only has 3GB free?

When the operating system needs memory and there is no more physical memory available, the system starts swapping. This is the process of taking some data out of memory and writing it to a swap/page file on disk. This sounds like a great idea, and I have talked to users who rely on swap space to run their application, not expecting any impact on their performance.

To start, think of how a hard drive works: there is a physical spinning platter with a needle moving over it, while RAM is charge stored in capacitors. Which is faster: flying electrons or a 7200RPM record player? When the operating system runs out of space and starts putting some of your data onto disk, it makes a best guess based on what was used least recently, in hopes that you will not access that data again soon. In most HPC applications this is not the case: most of our data is represented in a few large arrays that we walk up and down over and over again.

The result is the hard drive in the computer rushing back and forth, trying to write out some data from RAM to make space for the data just requested, then read that requested data back into memory to run your application on.  So what is the speed of a hard drive vs. ram?

On a modern system with 3 memory channels, two sockets, and 12 total cores, the STREAM benchmark reports memory bandwidth of 42,174MB/s. Under the best of situations hard drives give 100MB/s; under the swapping case, where chunks of data are both read and written at the same time, this falls by up to 10X, to about 10-20MB/s.
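To put those bandwidth figures on a human scale, the sketch below estimates how long one pass over a working set takes at each rate. The three rates are the ones quoted above; the 4GB working set (counted as 4096MB) and the single-pass model are assumptions for illustration only.

```python
# Bandwidth figures quoted in the post, in MB/s.
RAM_MBPS = 42_174          # STREAM result
DISK_STREAM_MBPS = 100     # large sequential reads and writes
DISK_SWAP_MBPS = 10        # low end of the 10-20MB/s swapping range


def sweep_seconds(size_mb, mbps):
    """Seconds to move `size_mb` megabytes once at a sustained `mbps`."""
    return size_mb / mbps


# Assumption: a 4GB working set, counted as 4096MB, touched exactly once.
WORKING_SET_MB = 4 * 1024

for label, mbps in (("RAM", RAM_MBPS),
                    ("disk, streaming", DISK_STREAM_MBPS),
                    ("disk, swapping", DISK_SWAP_MBPS)):
    print(f"{label:16s} {sweep_seconds(WORKING_SET_MB, mbps):9.2f} s")
```

Under these assumptions a single pass takes about a tenth of a second from RAM, about 41 seconds from a streaming disk, and nearly seven minutes from a swapping one, and a real solver makes many passes.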

In these situations your application will crawl. If you also expect to use your hard drive for any type of data storage for the application, in addition to swap, your performance will be even worse because of all the demands placed on the hard drive.

What about SSDs? For the cost/speed I would just buy extra RAM. If you still use SSDs, use one of the PCIe cards, not the ones that use the SATA bus.

Never would I ever, as my first recommendation, say 'use swap'. I would, in order, say:

   * Buy more RAM
   * Get an XSEDE allocation on Blacklight
   * Write your application to use out of core methods with a pile of SSDs
   * Use out of core methods without SSDs
   * Partition your code to run in parallel on more nodes (more RAM in total)
   * Build a system with ScaleMP/vSMP
   * Fine, use swap

That is the story: total memory on the system is ram+swap, but the usable memory on the system is ram+0*swap=ram. Do not use swap!
