----------------------------------------------------------------------
HPCx: UK NATIONAL HPC SERVICE
provided by EPCC and CCLRC Daresbury Laboratory
----------------------------------------------------------------------
HPCx User Mailing 020                                   15 August 2003
----------------------------------------------------------------------

Contents

** UK e-Science All-Hands Meeting
** Capability incentives scheme
** Regions of the system
** Development hardware
** Mass store: status of HSM
** Updating your web record
** Fed up with these mailings?

----------------------------------------------------------------------

Greetings--

UK e-SCIENCE ALL-HANDS MEETING

A reminder that the second All-Hands Meeting will take place on 2-4 September 2003 at the East Midlands Conference Centre, Nottingham. This should be a good opportunity to meet people working in a variety of e-Science areas, including HPC, as well as other users of HPCx and the framers of e-Science policy in the UK. HPCx will be taking part in both the presentation sessions and the poster sessions, and we will have a stand. We look forward to meeting you there.

More details and registration:

  http://www.research-councils.ac.uk/escience

----------------------------------------------------------------------

CAPABILITY INCENTIVES SCHEME

If a code can use a large number of processors efficiently, it may be eligible for a reduction in the number of AUs debited from its user's budget. This is intended as an incentive to write and use codes which scale well. You may have a code of your own which could receive a discount like this, or you may want to apply for a discount on a generally-available code.

To apply for a discount, submit a query to the helpdesk at support@hpcx.ac.uk. You will be asked to make the code and a representative dataset available to HPCx staff, who will test its scaling efficiency.
Among other things, this will involve timing the code on, say, both n and 2n LPARs and calculating the speedup. If the speedup is adequate, the code will be given a "Seal of Approval". Seals come at four levels, including gold, silver and bronze, depending on the number of processors the code uses. A list of codes which have already received Seals of Approval can be found here:

  http://www.hpcx.ac.uk/services/policies/starcodes.html

Once your project and code have been approved for a discount, you will be given a special token to identify your jobs to the accounting software. You use this in the job_name line of your LoadLeveler script. For example, if the token you are given is 'x01B3_solve', your job_name line might read:

  #@ job_name = #x01B3_solve#job123

When the job has run and the accounting is done, the accounting software will apply the discount automatically.

Scaling efficiency often depends on the kind of data being processed and the number of processors used. For this reason, discounts are generally given for a specific combination of code, project and number of processors, and your token will only work for that combination.

----------------------------------------------------------------------

REGIONS OF THE SYSTEM

The processors of the HPCx system are divided into a number of regions:

  Region                 LPARs   Processors
  -----------------------------------------
  Production               128         1024
  Development               24          192
  Interactive Parallel       4           32
  Batch Serial               4           32
  Login                      1            8

The main HPCx system consists of 40 frames, each containing 4 LPARs of 8 processors: that's 160 LPARs, or 1280 processors, shared among the Production, Development, Interactive Parallel and Batch Serial regions.

THE LOGIN REGION above is a single LPAR which is not on one of these 40 frames; instead, it is on one of the special frames otherwise devoted to IO. When you ssh to login.hpcx.ac.uk, you are connected to this Login LPAR, and every command you type directly on the command line at HPCx is executed there.
This LPAR is a strictly limited resource, and if it is tied up with lengthy executions, other users' work is directly affected. It's fine to use it for editing, short compilations and housekeeping; anything else should be run as a serial batch job (see below).

We would also ask people not to put long jobs into the background on the Login LPAR, especially jobs which submit batch jobs one after another. It's preferable to submit batch jobs which submit their successors if they complete successfully.

The other regions are batch regions: the batch system selects a region to run your job, depending on your batch job command file.

THE PRODUCTION REGION is where we expect the majority of the work to be done. Batch jobs will be placed here if they use more than 8 LPARs (64 processors) and run for no more than 12 hours.

THE DEVELOPMENT REGION will take jobs which use no more than 8 LPARs (64 processors) and run for no more than 6 hours. It is intended principally for development work.

You will notice that these descriptions do not cover jobs which run for longer than 6 hours but use only 8 LPARs or fewer. This is intentional: such jobs will not run and shouldn't be submitted, as they cannot really be described as development jobs. For the same reason, running large numbers of 6-hour, 64-processor jobs is not, in general, a legitimate use of the Development Region: this is really production work.

If this seems tough, remember that HPCx is intended primarily as a capability system. The justification for a system of this size and cost is to do work which can't be carried out elsewhere. Moreover, if the Development Region is being used for lots of production work, people with legitimate development work to do can't make progress, which isn't fair. We will be happy to help you take advantage of the full power of the system: please email support@hpcx.ac.uk if you need this.
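The routing rules for parallel batch jobs described above can be sketched in a few lines of Python. This is a hypothetical illustration only: the real decision is made by the batch system from your job command file, and the helper names `lpars_needed` and `parallel_region` are made up for this example. It assumes that a job always occupies whole LPARs of 8 processors, and it covers parallel jobs only (serial and interactive work go to their own regions). The region sizes are taken from the table above.

```python
import math
from typing import Optional

# Processors per LPAR on HPCx (from this mailing).
PROCS_PER_LPAR = 8

# LPARs in each batch region, taken from the "Regions of the system" table.
BATCH_REGION_LPARS = {
    "Production": 128,
    "Development": 24,
    "Interactive Parallel": 4,
    "Batch Serial": 4,
}

# Sanity check: the batch regions account for all 40 frames x 4 LPARs,
# i.e. 160 LPARs or 1280 processors.
assert sum(BATCH_REGION_LPARS.values()) == 40 * 4
assert sum(BATCH_REGION_LPARS.values()) * PROCS_PER_LPAR == 1280


def lpars_needed(processors: int) -> int:
    """LPARs needed, ASSUMING a job always occupies whole LPARs."""
    return math.ceil(processors / PROCS_PER_LPAR)


def parallel_region(processors: int, hours: float) -> Optional[str]:
    """Hypothetical helper: the batch region a parallel job falls into
    under the rules in this mailing, or None if no region accepts it."""
    lpars = lpars_needed(processors)
    if lpars > 8:
        # Production: more than 8 LPARs, at most 12 hours, and the job
        # must fit within the region's 128 LPARs.
        if hours <= 12 and lpars <= BATCH_REGION_LPARS["Production"]:
            return "Production"
        return None
    if hours <= 6:
        # Development: at most 8 LPARs (64 processors) and at most 6 hours.
        return "Development"
    # 8 LPARs or fewer for more than 6 hours: deliberately not accepted.
    return None


if __name__ == "__main__":
    for procs, hours in [(64, 6), (64, 7), (128, 12), (1024, 12), (1032, 1)]:
        print(procs, hours, parallel_region(procs, hours))
```

Under these assumptions a 64-processor, 7-hour request returns None, which is exactly the gap between the Development and Production rules described above.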
THE INTERACTIVE PARALLEL REGION is for interactive use, including runs of TotalView. For general interactive use, please see the "Interactive Execution" section of the User Guide at:

  http://www.hpcx.ac.uk/support/documentation/UserGuide/HPCxuser/

For TotalView use, see:

  http://www.hpcx.ac.uk/support/FAQ/totalview/

Note that this region is quite a scarce resource. If one person is doing an interactive run over 32 processors, no one else can get in until it finishes. Please bear this in mind.

THE BATCH SERIAL REGION is for general serial work. Please use it, rather than the Login LPAR, for anything substantial, including long compilations and links.

We don't intend to lay down strict formal rules for the use of the various regions. However, it's part of our remit to make sure that the best possible use is made of the system and that, as far as reasonably possible, people's work isn't badly affected by the activities of their neighbours on the machine. If we notice people doing things which fall outside this policy, we will take it up with them. Please help us with this.

----------------------------------------------------------------------

DEVELOPMENT HARDWARE

HPCx has recently taken delivery of some more hardware, which has been installed in the computer room at Daresbury with the cooperation of EPSRC and IBM.

The Test and Development System is like a small model of the current Phase 1 system, in both its hardware and its software, and includes two p690 frames. Its cost has been underwritten by the University of Edinburgh, and it will be used to test and tune new versions of the system software and new configurations before they are implemented on the main service. This will help us to respond flexibly to users' needs for configuration adjustments and access to the latest software, without disrupting current work.
The Phase 2 Development System is another new setup, provided with the support of EPSRC, which enables us to start working towards the Phase 2 system that will replace the current system next year. It includes two Regatta H+ frames and is due to be enlarged in the near future; a preliminary version of the new Federation switch will be added around the end of this year. This system has not yet been formally accepted.

----------------------------------------------------------------------

MASS STORE: STATUS OF HSM

Hierarchical Storage Management (HSM) is currently under test and evaluation on the Test and Development System (see above), which is now part of the equipment installed at Daresbury. HSM, which is part of the Tivoli suite of products, implements the archiving functions of the 3584 Tape Library mass store.

HSM has been successfully installed on the test system, and is now being put through a rigorous testing regime intended to expose any problems or deficiencies in the product before it is rolled out onto the service machine. We expect this test and evaluation to take some time, as it is important that HSM is both reliable and robust by the time it is made available to users. Currently, the intention is to provide a level of functionality similar to that which has been available to users under DMF on the Edinburgh and Manchester CRAY systems.

We are aware that the present lack of HSM may be causing problems for users with very large space requirements. Accordingly, we are proposing to make additional disc space available under GPFS to user groups who have a particular need for extra storage. Note that this space will be provided on a temporary basis only, until HSM is in service, and will be "at-risk" in that it will not be backed up (although it will reside on RAID disc arrays, giving a high degree of protection against medium failure).
Users with particular data storage requirements who might wish to take advantage of this additional disc space should discuss their needs with user support.

--Mike Brown, Head, Operations and Systems Group

John adds: It's worth noting that the mass store hardware itself is in daily use, backing up users' home spaces. Mike's concerns above affect only the HSM software.

----------------------------------------------------------------------

UPDATING YOUR WEB RECORD

As users will know, everyone has a record on the HPCx database, which you can see by logging into the administrative website. You can update the information in your record whenever you wish, and we would like to encourage you to do so.

Some people have chosen not to supply their phone numbers. This is entirely up to you, of course, but it could be helpful for us to have your phone number if we need to contact you urgently. Similarly, it's useful for your email address to be up to date, and I would urge everyone to make sure that it is.

A reminder: to log in to your website account, do this:

  - Go to https://www.hpcx.ac.uk/
  - Enter your email address
  - Enter your password for the admin website
  - Click "Login"

If you've forgotten your password, leave it blank. A red error message will appear, along with a button marked "Email". Click this and your password will be mailed to you. The website will only mail to addresses it already knows; nevertheless, you should change your password afterwards.

To update your details, click the button marked "Update", change the fields as you wish, and then click "Commit update".

----------------------------------------------------------------------

FED UP WITH THESE MAILINGS?

If you would rather not receive email from HPCx, including these mailings, you can stop them using the website. Log into your website account (see above), go to the "Update" page, and click the "Opt out of user emails" field; then click "Commit update".
----------------------------------------------------------------------

Regards

--John

----------------------------------------------------------------------

Earlier mailings:

  http://www.hpcx.ac.uk/support/notices/index.html

--
John Fisher                            j.fisher@epcc.ed.ac.uk
HPCx User Administration and Helpdesk
HPCx:      http://www.hpcx.ac.uk
Helpdesk:  support@hpcx.ac.uk
Phone:     +44 131 650 5029
Fax:       +44 131 650 6555