----------------------------------------------------------------------
    H   H   PPP     CCC         UK NATIONAL HPC SERVICE
    H   H   P  P   C   C     --------------------------
    HHHHH   PPP    C      x x        provided by
    H   H   P      C   C   xx        EPCC and
    H   H   P       CCC   x x        CCLRC Daresbury Laboratory
----------------------------------------------------------------------
HPCx User Mailing 028                                 28 November 2003
----------------------------------------------------------------------

Contents

  ** December meetings - Final Call!!
  ** HPCx course: Optimisation techniques for the Power4 processor
  ** HPCx seminar: Towards capability computing
  ** HPCx Users' group meeting
  ** 14th Daresbury Machine Evaluation Workshop
  ** jtmp: Job-temporary scratch space
  ** Memory allocation under LoadLeveler
  ** Busy machine

----------------------------------------------------------------------

Greetings--

----------------------------------------------------------------------
DECEMBER MEETINGS - FINAL CALL FOR REGISTRATIONS

A reminder of the meetings which are taking place at Daresbury
Laboratory over 9-12 December. They are described in the sections
below.

This is a final call for registrations to these events. We must close
them at lunchtime on Monday, 1 December 2003. REGISTER NOW!!

Together these events will make up a really important opportunity for
contact between users, vendors and providers of high-performance
computing in the UK. You are most cordially invited to attend, and
HPCx and DL staff look forward to meeting you at Daresbury.

How to get to Daresbury Laboratory:
  http://www.cclrc.ac.uk/Activity/ACTIVITY=DLMaps;
Hotels in the area:
  http://www.cse.clrc.ac.uk/disco/mew14/hotels.html

----------------------------------------------------------------------
HPCx COURSE: OPTIMISATION TECHNIQUES FOR THE POWER4 PROCESSOR

Tuesday, 9 December

This course will focus on tools and techniques for single-processor
optimisation on HPCx.
The course will cover the architecture of the processors and memory
system, profiling and hardware counter tools, getting the best from
the compilers, and Power4-specific tips and tricks. There will be
hands-on sessions in addition to lecture material.

Registration form: http://www.hpcx.ac.uk/support/training/form.html

----------------------------------------------------------------------
HPCx SEMINAR: TOWARDS CAPABILITY COMPUTING

Wednesday, 10 December

One of our key challenges is to ensure that the full capability of
the HPCx service is exploited to the limit. We need to enable
applications in all disciplines to scale effectively right up to the
full size of the system. Speakers will include experienced users and
HPCx and IBM staff. This will be the first in an annual series of
seminars.

  http://www.hpcx.ac.uk/about/events/annual2003/

----------------------------------------------------------------------
HPCx USERS' GROUP MEETING

Wednesday, 10 December

This is a chance for users to bring their concerns and problems
directly to the senior management of HPCx, who will be present. We
hope that this will also include issues that have arisen during the
Seminar. Our intention is that in many cases it will be possible to
take decisions on the spot in response to these.

----------------------------------------------------------------------
14TH DARESBURY MACHINE EVALUATION WORKSHOP

Thursday-Friday, 11-12 December

This well-established and widely respected annual event aims to
encourage close contact between the research communities and vendors
of distributed high-performance scientific computing. About a dozen
vendors are expected to make presentations. Systems will be available
for benchmarking (it is hoped starting on Monday, 8 December), and
there will be an exhibition. Proceedings will be published and made
available to those registered to attend.

  http://www.cse.clrc.ac.uk/disco/mew14.shtml

----------------------------------------------------------------------
jtmp: JOB-TEMPORARY SCRATCH SPACE

This is a way for users to have access to a large (4.5 TB) shared
filespace. Space there is created automatically when your LoadLeveler
job starts, and released automatically as soon as it ends.

You can get the pathname of a temporary directory in this space by
doing this in your job:

    JTMPDIR=`lljtmp`

You can then get there by doing this:

    cd $JTMPDIR

Access to the jtmp space is unlimited - there are no quotas there. In
practice, however, you have to share the space with whoever else is
using it at the time, so there is no guarantee that any particular
amount of space will be free when your job needs it.

You can do anything you like there. However, the directory and
everything in it are completely wiped at the end of the job and
cannot be retrieved. Anything you want to keep must be copied away
before the job ends.

----------------------------------------------------------------------
MEMORY ALLOCATION UNDER LOADLEVELER

Recently we have had some problems with allocation of memory for jobs
under LoadLeveler, which have caused difficulties for some user
groups. To cope with these, we have implemented some changes to the
software which scans jobs as they are submitted to LoadLeveler, known
as the 'submission filter'. These changes give users more control
over how memory is allocated to their jobs.

In future, the total amount of real memory which can be occupied by a
process will be:

    7.2 GB / tasks_per_node

This is called the RSS - the resident set size. (A node is an LPAR.)

The RSS is divided into two areas: stack and data.

In Fortran terms, stack is used for:

  - subprogram calling information
  - local variables, including arrays, unless they are marked SAVE

Data is used for:

  - program code
  - static variables, including COMMON variables and variables marked
    SAVE
  - memory allocated by ALLOCATE (known as 'heap' variables)
  - buffers for use by MPI and Fortran I/O

A program that runs out of either stack or data space will fail. If
you run out of stack space, you will usually get a message like this:

    ERROR: 0031-250 task 1: Segmentation fault

(Unfortunately, other problems can also cause this message.)

If you run out of data space, the message will usually be like this:

    1525-108 Error encountered while attempting to allocate a data object. The program will stop.

With the new filter, you can specify the amount of memory to be
allocated to these two areas, using the keywords stack_limit and
data_limit. For example:

    #@ stack_limit = 400mb

will set the stack to 400 MB. The following rules apply:

  * If stack_limit is not specified, it defaults to 200 MB (this has
    been the fixed size until now).
  * If data_limit is not specified, it defaults to RSS - stack_limit.
  * If stack_limit + data_limit > RSS, the job will fail.

This means that if you specify neither of these, LoadLeveler will
behave as it has up to now.

Note that the system will no longer allow a process to have a virtual
memory size larger than the RSS. This means that the system should
not normally swap. If you need more memory per process than these
rules allow, you will need to reduce the number of tasks you have per
node.

You might think that in this case we are forcing users to waste AUs,
by obliging them to reduce the number of tasks per node instead of
letting the node swap. But swapping a multiprocessor application has
serious performance implications; your application is likely to waste
at least as much time simply waiting for the swaps.

----------------------------------------------------------------------
BUSY MACHINE

HPCx is very busy at the moment. We appreciate that this can be
frustrating. Here are a couple of points.

  * Submitting large numbers of jobs to LoadLeveler does no harm.
    However, it doesn't help you to get to the top of the queue. It's
    not the case that if you have lots of jobs waiting, your jobs are
    more likely to run.
    In fact, when it's planning its job mix, LoadLeveler only looks at
    the four oldest jobs you have submitted.

  * Please don't use the interactive-parallel region to run jobs which
    don't need to be interactive. This can seriously affect people who
    really need to run interactively.

----------------------------------------------------------------------

Regards

--John

----------------------------------------------------------------------
Earlier mailings:
  http://www.hpcx.ac.uk/support/notices/index.html

To be removed from the mailing list: log into your website account,
go to the "Update" page, and click the "Opt out of user emails" field;
then click "Commit update".

--
John Fisher                              j.fisher@epcc.ed.ac.uk
HPCx User Administration and Helpdesk
HPCx:     http://www.hpcx.ac.uk
Helpdesk: support@hpcx.ac.uk
Phone:    +44 131 650 5029
Fax:      +44 131 650 6555