What are the costs of running workflows on commercial clouds? IU, Indiana University; UofC, University of Chicago; UCSD, University of California San Diego; UFI, University of Florida. Amazon S3 performs poorly because of the relatively large overhead of fetching the many small files that are produced by these workflows. This study describes investigations of the applicability of cloud computing to scientific workflow applications, with emphasis on astronomy. — End users should understand the resource usage of their applications and undertake a cost–benefit study of cloud resources to establish a usage strategy. Cloud computing has gained the attention of scientists as a competitive resource to run HPC applications at a potentially lower cost. The fixed monthly cost of storing input data for the three applications is shown in table 5. Abe.local's performance is only 1 per cent better than c1.xlarge; so virtualization overhead is essentially negligible.
Analysing astronomy algorithms for GPUs and beyond, Astronomical image processing with Hadoop, Scientific workflow applications on Amazon EC2, Debunking some common misconceptions of science in the cloud, Automating application deployment in infrastructure clouds, Pegasus: a framework for mapping complex scientific workflows onto distributed systems, Data sharing options for scientific workflows on Amazon EC2, Experiences with resource provisioning for scientific workflows using corral, The application of cloud computing to astronomy: a study of cost and performance, Design of the futuregrid experiment management framework, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, http://www.ncsa.illinois.edu/UserInfo/Resources/Hardware/Intel64Cluster/, e-Science–towards the cloud: infrastructures, applications and research, http://queue.acm.org/detail.cfm?id=2047483, http://datasys.cs.iit.edu/events/ScienceCloud2011/, http://science.energy.gov/∼/media/ascr/pdf/program-documents/docs/Magellan_Final_Report.pdf, centralized node acts as a file server for a group of servers, non-uniform file access (NUFA): write to new files always on local disk, distribute: files distributed among nodes. In addition to resource and storage charges, AmEC2 charged US$0.10 per GB for transfer into the cloud, and US$0.17 per GB for transfer out of the cloud. The Broadband workflow used four earthquake sources measured at five sites and is memory limited because more than 75 per cent of its runtime is consumed by tasks requiring more than 1 GB of physical memory. Periodograms identify the significance of periodic signals present in a time-series dataset, such as those arising from transiting planets and from stellar variability. One group [3] is investigating the applicability of GPUs in astronomy by studying performance improvements for many types of applications, including input/output (I/O) and compute-intensive applications.
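Because each frequency in a periodogram is evaluated independently of all the others, the computation is embarrassingly parallel. The following is a minimal textbook Lomb-Scargle periodogram in Python; it is an illustrative sketch, not the production code used in the study:

```python
import numpy as np

def lomb_scargle(t, y, freqs):
    """Classic Lomb-Scargle periodogram. The loop over frequencies is
    what makes the computation trivially parallel: each iteration is
    independent of all the others."""
    y = y - y.mean()
    power = np.empty(len(freqs))
    for i, f in enumerate(freqs):
        w = 2.0 * np.pi * f
        # Time offset tau that makes the sine and cosine terms orthogonal.
        tau = np.arctan2(np.sum(np.sin(2 * w * t)),
                         np.sum(np.cos(2 * w * t))) / (2 * w)
        c = np.cos(w * (t - tau))
        s = np.sin(w * (t - tau))
        power[i] = 0.5 * ((y @ c) ** 2 / (c @ c) + (y @ s) ** 2 / (s @ s))
    return power

# A pure sinusoid should produce a peak at its true frequency.
t = np.linspace(0.0, 10.0, 200)
y = np.sin(2 * np.pi * 1.5 * t)
freqs = np.linspace(0.5, 3.0, 251)
best = freqs[np.argmax(lomb_scargle(t, y, freqs))]
```

Splitting `freqs` into chunks and assigning each chunk to a different node is, in essence, how the 48-core periodogram runs scale.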
Table 5. Monthly storage cost for three workflows. — Do academic cloud platforms offer any performance advantages over commercial clouds? The 32 bit image used for the experiments in this study was 773 MB, compressed, and the 64 bit image was 729 MB, compressed, for a total fixed cost of US$0.22 per month. It finds the appropriate software, data and computational resources required for workflow execution. The nodes on the TeraGrid and Amazon were comparable in terms of CPU type, speed and memory. Full technical experimental details are given in recent studies [6,11]. Performance and costs associated with the execution of periodograms of the Kepler datasets on Amazon and the NSF TeraGrid. — Execution engine (DAGMan): executes the tasks defined by the workflow in order of their dependencies. We provisioned 48 cores each on Amazon EC2, FutureGrid and Magellan, and used the resources to compute periodograms for 33 000 Kepler datasets. In addition to Amazon S3, which the vendor maintains, common file systems such as the network file system (NFS), GlusterFS and the parallel virtual file system (PVFS) can be deployed on AmEC2 as part of a virtual cluster, with configuration tools such as Wrangler, which allows clients to coordinate launches of large virtual clusters. Both instances use a 10 gigabits per second (Gbps) InfiniBand network. They improve the performance of workflow applications by reducing some of the wide-area system overheads. Cloud computing has seen rapid adoption because it reduces the cost of ownership of IT applications and allows very fast entry into the services market. The glide-ins contact a Condor central manager controlled by the user where they can be used to execute the user's jobs on the remote resources. One contribution of 13 to a Theme Issue ‘e-Science–towards the cloud: infrastructures, applications and research’.
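The fixed US$0.22 monthly charge for the two images can be reproduced from their sizes. The S3 rate of US$0.15 per GB-month used below is an assumption based on AmEC2 pricing at the time of the study; only the total appears in the text:

```python
# Assumed historical S3 rate (US$ per GB-month); not stated in the text.
S3_RATE_PER_GB_MONTH = 0.15

def monthly_image_cost(image_sizes_mb, rate=S3_RATE_PER_GB_MONTH):
    """Monthly cost of keeping a set of VM images in S3."""
    total_gb = sum(image_sizes_mb) / 1024.0
    return total_gb * rate

# 773 MB (32 bit) and 729 MB (64 bit) images.
cost = monthly_image_cost([773, 729])
```

At the assumed rate, the two images (1502 MB in total) come to US$0.22 per month, matching the figure quoted above.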
Storage cost consists of the cost to store VM images in S3, and the cost of storing input data in EBS. The use of Amazon EC2 resources was supported by the AWS in Education research grant. Traditional grids and clusters use network or parallel file systems. However, when computations grow larger, the costs of computing become significant. The Mapper can also restructure the workflow to optimize performance and add transformations for data management and provenance information generation. Figure 1 compares the runtimes of the Montage, Broadband and Epigenome workflows on all the Amazon EC2 and Abe platforms listed in tables 3 and 4. Is special knowledge needed on the part of end users and systems engineers to exploit them to the fullest? Such a study is, however, a major undertaking and outside the scope of this paper. Storage cost. Wrangler, as mentioned above, allows the user to specify the number and type of resources to provision from a cloud provider and to specify what services (file systems, job schedulers, etc.) should be deployed automatically on those resources. The FutureGrid testbed includes a geographically distributed set of heterogeneous computing systems, a data management system and a dedicated network. Configuration of these instances, installation and testing of applications, deployment of tools for managing and monitoring their performance, and general systems administration are the responsibility of the end user. Figure 2. — Mapper (Pegasus mapper): generates an executable workflow based on an abstract workflow provided by the user or workflow composition system. Summary of processing resources on the Abe high-performance cluster. Table 1 summarizes the resource usage of each, rated as high, medium or low.
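DAGMan releases each task only after all of its parents have completed. The sketch below reproduces that dependency-ordered release with Kahn's topological sort; the Montage-style task names are hypothetical and the code is an illustration, not DAGMan itself:

```python
from collections import deque

def dependency_order(deps):
    """Kahn's algorithm: yield each task only after all its parents have
    run, mirroring how a workflow engine releases ready tasks."""
    children = {t: [] for t in deps}
    indegree = {t: len(parents) for t, parents in deps.items()}
    for t, parents in deps.items():
        for p in parents:
            children[p].append(t)
    ready = deque(t for t, d in indegree.items() if d == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for c in children[t]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    if len(order) != len(deps):
        raise ValueError("workflow graph contains a cycle")
    return order

# Hypothetical Montage-like fragment: reproject two images, difference
# them, then co-add the result.
deps = {"mProject1": [], "mProject2": [],
        "mDiff": ["mProject1", "mProject2"],
        "mAdd": ["mDiff"]}
order = dependency_order(deps)
```

In a real run the two `mProject` tasks would execute concurrently on separate workers; only the ordering constraint matters here.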
The result shows that for relatively small computations, commercial clouds provide good performance at a reasonable cost. S3 produced good performance for one application, possibly owing to the use of caching in our implementation of the S3 client. The runtimes in hours for the Montage, Broadband and Epigenome workflows on the Amazon EC2 cloud and on Abe. Reasonably good performance was achieved on all instances except m1.small, which is much less powerful than the other AmEC2 resource types. These images were all stored on AmEC2's object-based storage system, called S3. While data transfer costs for Epigenome and Broadband are small, for Montage, they are larger than the processing and storage costs using the most cost-effective resource type. — AmEC2 offers no cost benefits over locally hosted storage, and is generally more expensive, but eliminates local maintenance and energy costs, and offers high-quality storage products. The architecture of the cloud is well suited to this type of application, whereas tightly coupled applications, where tasks communicate directly via an internal high-performance network, are most likely better suited to processing on computational grids [6]. 
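With the per-gigabyte rates quoted earlier (US$0.10 in, US$0.17 out), a workflow's transfer bill is simple to estimate. The data volumes below are illustrative, not the study's measured figures:

```python
# Transfer rates quoted in the text (US$ per GB).
RATE_IN, RATE_OUT = 0.10, 0.17

def transfer_cost(gb_in, gb_out):
    """Cost of moving a workflow's inputs into the cloud and its
    products back out."""
    return gb_in * RATE_IN + gb_out * RATE_OUT

# Hypothetical I/O-heavy workflow: the outbound charge dominates, which
# is why transfer can exceed processing cost for Montage-like runs.
io_heavy = transfer_cost(4.0, 8.0)
cpu_bound = transfer_cost(2.0, 0.3)
```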
Variation with the number of cores of the runtime and data-sharing costs for the Broadband workflow for the data storage options identified in table 7. Are the technologies able to support 24×7 operational data centres? Two publications [7,8] detail the impact of this business model on end users of commercial and academic clouds.
is supported by the NASA Exoplanet Science Institute at the Infrared Processing and Analysis Center, operated by the California Institute of Technology in coordination with the Jet Propulsion Laboratory (JPL). Astronomers generally lack the training to perform system administration and job management tasks themselves; so there is a clear need for tools that will simplify these processes on their behalf. The most important result of figure 1 is a demonstration of the performance advantage of high-performance parallel file systems for an I/O-bound application. United States Department of Energy Advanced Scientific Computing Research (ASCR) Program. Our initial experiments used subsets of the publicly released Kepler datasets. Montage generated an 8° square mosaic of the Galactic nebula M16 composed of images from the two micron all sky survey (2MASS) (http://www.ipac.caltech.edu/2mass/); the workflow is considered I/O-bound because it spends more than 95 per cent of its time waiting for I/O operations. Figure 2 shows the resource cost for the workflows whose performances were given in figure 1. Table 3. Summary of processing resources on Amazon EC2. Broadband (memory bound). They cite the example of hosting the 12 TB volume of the 2MASS survey, which would cost US$12 000 per year if stored on S3, the same cost as the outright purchase of a disk farm, inclusive of hardware purchase, support and facility and energy costs for 3 years. Table 9. FutureGrid available Nimbus and Eucalyptus cores in November 2010. — Are commercial cloud platforms user friendly? Broadband performs the worst on m1.small and c1.medium, the machines with the smallest memories (1.7 GB). Cloud computing is a new way of purchasing computing and storage resources on demand through virtualization technologies.
A number of groups are adopting rigorous approaches to studying how applications perform on these new technologies. It finds the appropriate software, data and computational resources required for workflow execution. Juve et al. [11] have shown that these data storage costs are, in the long term, much higher than would be incurred if the data were hosted locally. Figure 1. Data transfer sizes per workflow on Amazon EC2. It supports VM-based environments, as well as native operating systems for experiments aimed at minimizing overheads and maximizing performance. Such volumes mandate the development of a new computing model that will replace the current practice of mining data from electronic archives and data centres and transferring them to desktops for integration. To have an unbiased comparison of the performance of workflows on AmEC2 and Abe, all the experiments presented here were conducted on single nodes, using the local disk on both EC2 and Abe, and the parallel file system on Abe. Scientific applications usually require significant resources; however, not all scientists have access to sufficient high-end computing systems. Given that scientists will almost certainly need to transfer products out of the cloud, transfer costs may prove prohibitively expensive for high-volume products. Table 1. Comparison of workflow resource usage by application.
The cost of the protocol used by Condor to communicate between the submit host and the workers is not included, but it is estimated to be much less than US$0.01 per workflow. DAGMan relies on the resources (compute, storage and network) defined in the executable workflow to perform the necessary actions. Variation with the number of cores of the runtime and data-sharing costs for the Montage workflow for the data storage options identified in table 7. Broadband (http://scec.usc.edu/research/cme/) generates and compares synthetic seismograms for several sources (earthquake scenarios) and sites (geographical locations). Evaluations of how new technologies such as cloud computing would support such a new distributed computing model are urgently needed. Archives of the future must instead offer processing and analysis of massive volumes of data on distributed high-performance technologies and platforms, such as grids and the cloud. Processing will instead often take place on high-performance servers co-located with data. The system consists of three components. They are already common in astronomy, and will assume greater importance as research in the field becomes yet more data driven. Transfer cost. The commodity AmEC2 hardware evaluated here cannot match the performance of HPC systems for I/O-bound applications, but as AmEC2 offers more high-performance options, their cost and performance should be investigated.
The astronomical community is collaborating with computer scientists in investigating how emerging technologies can support the next generation of what has come to be called data-driven astronomical computing [2]. Table 9 shows the locations and available resources of five clusters at four FutureGrid sites across the US in November 2010. Column 1 of table 3 lists five AmEC2 compute resources (‘types’) chosen to reflect the range of resources offered. Table 10. Performance of periodograms on three different clouds. Table 7. File systems investigated on Amazon EC2. Figure 2. By contrast, Epigenome shows much less variation than Montage because it is strongly CPU bound. It has double the memory of the other machine types, and the extra memory is used by the Linux kernel for the file system buffer cache to reduce the amount of time the application spends waiting for I/O. Clouds are under development in academia to evaluate technologies and support research in the area of on-demand computing. © 2012 The Author(s) Published by the Royal Society. Consequently, the costs of running applications will vary widely according to how they use resources. AmEC2 generally charges higher rates as the processor speed, number of cores and size of memory increase, as shown by the last column in table 3.
Table 2. Data transfer sizes per workflow on Amazon EC2. The legend identifies the processor instances listed in tables 3 and 4. Workflow applications are data-driven, often parallel, applications that use files to communicate data between tasks. Similar results apply to Epigenome: the machine offering the best performance, c1.xlarge, is the second cheapest machine. A cloud computing system is a large cluster of interconnected servers residing in a data centre and dynamically provisioned to clients on demand via a front-end interface. We measured and compared the total execution time of the workflows on these resources, their input/output needs and quantified the costs. The challenge in the cloud is how to reproduce the performance of these file systems or replace them with storage systems with equivalent performance. Both Canon et al. Variation with the number of cores of the runtime and data-sharing costs for the Epigenome workflow for the data storage options identified in table 7. Among the questions that require investigation are: what kinds of applications run efficiently and cheaply on what platforms? Here, we summarize the important results and the experimental details needed to properly interpret them. Another group [4] has shown how MapReduce and Hadoop [5] can support parallel processing of the images released by the Sloan Digital Sky Survey (http://wise.sdss.org/). The computational capacity of abe.lustre is roughly equivalent to that of c1.xlarge, and the comparative performance on these instances gives a rough estimate of the virtualization overhead on AmEC2.
What demands do they place on applications? — Virtualization overhead on AmEC2 is generally small, but most evident for CPU-bound applications. If less memory is available, some cores must sit idle to prevent the system from running out of memory or swapping. The processing costs for the Montage, Broadband and Epigenome workflows for the Amazon EC2 processors. Epigenome's performance suggests that virtualization overhead may be more significant for a CPU-bound application: the processing time for c1.xlarge was some 10 per cent larger than for abe.local. Montage (I/O bound). We chose three workflow applications because their usage of computational resources is very different. We can see that the performance on the three clouds is comparable, achieving a speed-up of approximately 43 on 48 cores. In general, GlusterFS delivered good performance for all the applications tested and seemed to perform well with both a large number of small files and a large number of clients. We created a single workflow for each application to be used throughout the study.
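A speed-up of approximately 43 on 48 cores corresponds to roughly 90 per cent parallel efficiency. The runtimes below are illustrative values chosen only to match that ratio, not measurements from the study:

```python
def speedup(serial_hours, parallel_hours):
    """Ratio of single-core runtime to runtime on many cores."""
    return serial_hours / parallel_hours

def efficiency(s, cores):
    """Fraction of ideal linear scaling actually achieved."""
    return s / cores

s = speedup(430.0, 10.0)   # illustrative runtimes giving a speed-up of 43
e = efficiency(s, 48)      # about 0.9, i.e. 90 per cent efficiency
```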
Cloud computing offers a more flexible alternative than traditional HPC installations, particularly for scientists and researchers who have varied workloads or who require computing resources that scale with their workloads. Cloud computing in this context describes a new way of provisioning and purchasing computing and storage resources on demand, targeted primarily at business users. While academic clouds cannot yet offer the range of services offered by AmEC2, their performance on the one product generated so far is comparable to that of AmEC2, and when these clouds are fully developed, they may offer an excellent alternative to commercial clouds. The GlusterFS deployments handle this type of workflow more efficiently. NFS performed surprisingly well in cases where there were either few clients, or when the I/O requirements of the application were low. The costs of transferring data into and out of the Amazon EC2 cloud. The most powerful processor, c1.xlarge, offers a threefold performance advantage over the least powerful, m1.small, but at five times the cost. This advantage essentially disappears for CPU- and memory-bound applications. These approaches usually need users to describe a topology for a deployed application. Under AmEC2's current cost structure, long-term storage of data is prohibitively expensive. Figure 3 shows that for Montage, the variation in performance can be more than a factor of three for a given number of nodes. (Online version in colour.) The walltime measures the end-to-end workflow execution, while the cumulative duration is the sum of the execution times of all the tasks in the workflow.
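A threefold performance advantage at five times the price buys less work per dollar, which is why the fastest instance is not the most cost-effective. The hourly rates below are hypothetical, chosen only to preserve the fivefold price ratio from the text:

```python
def work_per_dollar(relative_speed, hourly_rate_usd):
    """Throughput per unit cost: higher is more cost-effective."""
    return relative_speed / hourly_rate_usd

m1_small = work_per_dollar(1.0, 0.10)   # baseline speed, baseline price
c1_xlarge = work_per_dollar(3.0, 0.50)  # 3x the speed at 5x the price
```

On these figures m1.small delivers ten units of work per dollar against six for c1.xlarge, so a budget-constrained user may prefer the slower instance despite the longer runtime.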
In particular, academic clouds may provide an alternative to commercial clouds for large-scale processing. Providers generally charge for all operations, including processing, transfer of input data into the cloud and transfer of data out of the cloud, storage of data, disk operations and storage of VM images and applications. See Deelman. We will refer to these instances by their AmEC2 name throughout the paper. AmEC2 itemizes charges for hourly use of all of its resources: compute resources (including running the VM), data storage (including the cost of VM images) and data transfer in and out of the cloud. Astronomers generally take advantage of a cloud environment to provide the infrastructure to build and run parallel applications; that is, they use it as what has come to be called ‘Infrastructure as a Service’. The Amazon Elastic Compute Cloud (EC2; hereafter, AmEC2) is perhaps the best known commercial cloud provider, but academic clouds such as Magellan and FutureGrid are under development for use by the science community and will be free of charge to end users. They are, however, computationally expensive, but easy to parallelize because the processing of each frequency is performed independently of all other frequencies. We estimated that a 448 h run of the Kepler analysis application on AmEC2 would cost over US$5000. S3 performs relatively well because the workflow reuses many files, and this improves the effectiveness of the S3 client cache. This is particularly the case for I/O-bound applications, whose performance benefits greatly from the availability of parallel file systems.
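AmEC2's itemized charging scheme (hourly compute, monthly storage, per-GB transfer) can be folded into a single estimator. Every number in the example call is hypothetical, chosen only to show how a 448 h multi-instance run can exceed US$5000; the defaults use the per-GB transfer rates quoted earlier:

```python
def run_cost(instances, hours, hourly_rate,
             stored_gb, months, storage_rate,
             gb_in, gb_out, rate_in=0.10, rate_out=0.17):
    """Itemized cloud bill: compute + storage + transfer, mirroring the
    charging scheme described in the text. Transfer rates default to the
    quoted US$0.10/GB in and US$0.17/GB out."""
    compute = instances * hours * hourly_rate
    storage = stored_gb * months * storage_rate
    transfer = gb_in * rate_in + gb_out * rate_out
    return compute + storage + transfer

# Hypothetical figures: 16 instances for 448 h at US$0.70/h, plus modest
# storage and transfer, already lands above US$5000.
estimate = run_cost(16, 448, 0.70, 100, 1, 0.15, 50, 200)
```

The compute term dominates for long runs, which is consistent with the study's observation that costs grow significant as computations grow larger.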
Amazon's Elastic Block Store (EBS) is a replicated, block-based storage service that supports volumes between 1 GB and 1 TB. Under the glide-in scheme, Condor workers are submitted as user jobs via grid protocols to a remote cluster [14]. Tables 2 and 6 show the input and output data sizes and the associated transfer costs. Data were served by the NASA/IPAC Infrared Science Archive [13]. The results obtained on the academic clouds are highly encouraging.