-*- text -*-

General stuff
-------------

- The haltd in the lamd has a sleep(1) in it to allow other things to
  happen before the lamd actually dies.  This has the side effect of
  "lamhalt" returning before the lamd actually dies.  If you have a
  fast processor and are (for example) running a script, you may run
  into a race condition where there *should* be a lamd running, but
  there isn't, because the delayed lamhalt/tkill got it.  For example:

      while (forever)
          lamboot
          mpirun C foo
          lamhalt

  This can run into problems on fast processors.  The fix: somehow
  make lamhalt wait until the local lamd actually dies before
  returning.

- Add environment variable LAM_MPI_RSH to do the same as LAM_RSH.

- Do enough of MPI_CANCEL so that we can claim that it is conformant.

- Add new RPI functions for basic collectives: probably only barrier
  and broadcast.

- Add new RPI functions for checkpointing; have rolling checksums on
  each communication pair that can determine when all "in flight"
  messages have arrived and been put into the processes' local memory,
  so that a checkpoint will save them (as opposed to just receiving a
  message into memory on a NIC -- e.g., Myrinet -- that will not be
  saved in a checkpoint).

- Finish the MPI_INIT_THREAD stuff such that the first three levels
  are supported.  This will entail finishing the thread tests in
  configure as well, and testing to ensure that the "global lock"
  stuff works for the 3rd level.

- How about making the RPIs have a separate thread that runs them?
  i.e., not making LAM multi-threaded (yet), but as a first step
  towards that, have a thread running in the RPI that makes progress
  in the background.  i.e., the RPI API would just be access points to
  queues instead of doing the real work.

- Raja suggests a --with option to configure that will build all four
  Fortran symbol conventions into the Fortran library(ies?).  Hence, a
  vendor can do this in an RPM and thereby maximize the chance of
  client code working properly with it.
  It won't fix Fortran string-handling differences between different
  Fortran compilers (for example), but it may expand the shelf-life of
  an RPM on a CD.  ...or does it?  If only the symbol convention is
  fixed and nothing else is, does this really work for multiple
  Fortran compilers?  Hrmm.  Doubtful.  See
  http://www.mpi.nd.edu/Internal/llamas/msg01612.php3.

- Make the -O option to mpirun unnecessary -- have pairs of ranks
  exchange info on startup that determines endianness, and set
  appropriate flags as to whether swapping is necessary.

- Make the lamtests module not suck.  There are several things that
  could be done better:
  - add a configure script; automagically adding "-Ae" on HP-UX 10.20
    native compilers, for example.
  - ensure that we get the "right" mpicc/mpif77/lamboot/mpirun/etc.
  - eliminate much of the deep Makefile voodoo that makes the lamtests
    suite work; move it all to shell scripts or something.
  - add some f77 tests (e.g., MPI_STATUSES_IGNORE).
  - do a "lamboot" if it's not already there.
  - perhaps make lamtests be "make check" in the main LAM distribution
    so that it's not a separate package -- it could then piggyback off
    the main configure script, and wouldn't need to be a second
    download.

- Fix up (i.e., test) the LAM signal handler so that it won't go into
  a recursive loop if, for example, a seg fault occurs during the
  exit() call in the signal handler.

- Make the directory where named sockets and whatnot are created
  overridable by an environment variable.  Specifically, "/tmp" is
  currently hard-coded.  Make it an environment variable, for
  environments like Scyld (or other NFS-mounted / environments) that
  may not have local storage in /tmp.

- Add support for MPI_LONG_LONG_INT in builtin reduction operations.
  See http://www.mpi.nd.edu/Internal/llamas/msg01537.php3.

- Add an MPI_Info flag to MPI_Comm_spawn to have it spawn processes on
  nodes/cpus in a "round-robin" fashion.  Add another MPI_Info flag
  indicating which node/cpu to start with.
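
The two round-robin MPI_Info flags in the last item above could drive a
placement rule like the one below.  This is a minimal sketch, assuming
nodes are numbered 0..nnodes-1 and ignoring per-node CPU counts;
`rr_place` and its interface are hypothetical, not existing LAM code.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of LAM): place nprocs spawned
 * processes on nnodes nodes in round-robin order, beginning at node
 * index `start` (the value the "which node to start with" flag would
 * carry).  On success, placement[i] holds the node index for the
 * i-th spawned process and 0 is returned; bad arguments return -1.
 */
static int rr_place(int nprocs, int nnodes, int start, int *placement)
{
    int i, node;

    if (nprocs < 0 || nnodes <= 0 || start < 0 || start >= nnodes)
        return -1;
    if (nprocs > 0 && placement == NULL)
        return -1;

    /* Walk the node list, wrapping back to node 0 after the last one. */
    node = start;
    for (i = 0; i < nprocs; ++i) {
        placement[i] = node;
        if (++node == nnodes)
            node = 0;
    }
    return 0;
}
```

For example, 5 processes on 3 nodes starting at node 2 land on nodes 2,
0, 1, 2, 0.  Using a wrap-around counter instead of
`(start + i) % nnodes` avoids signed overflow for very large process
counts.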
- What happens to mpirun if "tkill" (effectively) happens on a node
  during a run?  i.e., the fault tolerant example problem.  The real
  solution is that when a rank discovers that another rank is dead, it
  should notify mpirun somehow.  Then mpirun will wait for one less
  process to finish.  (However, mpirun will have to keep track of
  which ranks it has been told not to wait for -- it is possible that
  multiple ranks will report to mpirun that rank 7 has died, so mpirun
  needs to register that rank 7 is dead only once.)

- Make mpirun have a nice, flexible method of getting the return
  statuses from all the ranks.  Some command line switches should be
  able to return any/all/none/any combination of return statuses, and
  perhaps a few trivial algorithmic functions on the resulting set
  (max, min, etc.).  See
  http://www.mpi.nd.edu/Internal/llamas/msg01456.php3 for a whole slew
  of ideas w.r.t. this concept.

- In a bootschema, if a different username is specified for the
  localhost, we should use rsh to launch that one, not lam_few().

- Finish the C++ bindings for all the MPI-2 functions.

- Integrate ROMIO and MPI2C++ into the build and dist process better
  (especially MPI2C++ -- the extra "touch" steps are really silly, and
  will become unwieldy if mpi2c++ gets any bigger).  Also, it is
  desirable to have ROMIO be able to produce shared libraries where
  libtool supports it.

- Make recon check for writability in /tmp (since the lamd will need
  it), and have it fail with an appropriate error message if it can't
  write to /tmp/.  Thanks to Alex Rhomberg for pointing this out
  (http://www.mpi.nd.edu/MailArchives/lam/msg01599.php3).

- Better error reporting in lamboot/recon when the remote lamd is not
  able to start up properly.  i.e., hboot immediately severs the
  stdout/stderr to the lamd so that rsh/ssh can finish.  Perhaps we
  should wait for the connection from the remote lamd before allowing
  hboot to quit, and therefore not have to sever the stderr/stdout
  from the lamd immediately.
  Hence, we can have a "-d" (or whatever) debugging output flag to the
  lamd, which can then send error output messages to stdout/stderr,
  and rsh/hboot will funnel them back to recon/lamboot.  Hence, hboot
  doesn't die until:

  a) it receives an ACK from lambootagent saying "ok to die, close
     stdout/stderr" -- we'll also have to tell the lamd to close
     stdout/stderr, too, or
  b) the lamd dies abnormally, at which point we nsend something back
     to lambootagent saying "oops... badness happened here".

  Perhaps hboot can block in nrecv waiting for a), and be set up to
  catch SIGCHLD to detect b)...?  See
  http://www.mpi.nd.edu/MailArchives/lam/msg01599.php3 and
  http://www.mpi.nd.edu/MailArchives/lam/msg01600.php3.

- Add IPv6 support for All Things using TCP.  Not quite as simple as
  it sounds -- although we have a small number of places that do IP
  name lookups and open sockets, there is a larger number of places
  that require hostname parsing that will need to change, as well as
  the internals of the lamd (which hold IPv4 address tables) and the
  lamboot protocols (which exchange IPv4 addresses) to modify.
  Helpful URLs in learning about IPv6:

      http://playground.sun.com/pub/ipng/html/INET-IPng-Paper.html
      www.ipv6.org
      www.freenet6.org
      http://www.freenet6.net/

- Solaris 2.6: compiling LAM with -fast and trying to run using
  bcheck -all results in a "MPI: Process not initialized" error inside
  MPI_Init.  Why?  Don't know if this is LAM's fault or Solaris' fault.

- Redesign the web site:
  - better/more obvious navigation
  - put performance numbers (and some comparisons) right up front
    (Andy)
  - put Linux/BSD logos right up front for those who carry LAM,
    potentially with cross links to relevant posts on the mailing
    list, notes like "RedHat 6.2 users encouraged to upgrade to x.y.z"
    (Raja)

- Postpone any C++ in the mainline LAM kernel until libtool can do C++
  libraries.
  :-(

IN PROGRESS --> SLAM
- Have lamboot skip over failures and try to boot the rest of the
  cluster anyway -- just report failures at the end.

- Reuse communicator IDs.  Perhaps via a bitmap (since we only have 12
  bits total).  But this will change a bunch of the collective
  algorithms for getting new CIDs (e.g., MPI_COMM_DUP), because it
  breaks the simple "just get the max" assumption.  We "sort of" reuse
  CIDs now, but not really: we mark CIDs as "unused" when we are
  finished with them, but when we need a new CID, we still take the
  highest non-used CID.  Not quite the same as reusing all CIDs.

- Move the majority of mpirun's functionality into library code
  somewhere, thereby allowing MPI_INIT to trigger this code if someone
  does "./foo N -pty -- prog-specific-args".  This would allow
  multiple runs in a debugger, for example.  However, it would need
  some portable way for Fortran programs to get argc/argv.

- Make a run-time flag (env variable?) to disable all parameter
  checking in MPI functions for added performance.

- Make large shmem copies incremental, a la Sun's MPI.  This will
  decrease latency and increase bandwidth on SMPs.

- Fortran programs and C/C++ programs that effectively do
  MPI_Init(NULL, NULL) do not show up nicely in mpitask.  So use
  mpirun to pass argv[0] to the processes, possibly with the -e
  mechanism.  Raja suggested this in
  http://www.mpi.nd.edu/Internal/llamas/msg01137.php3.

- Do lamshrink/lamgrow work in the PBS environment?  Do they need to
  be extended to pass the socket name, like lamboot was?

- Make lamboot detect 127.0.0.1 addresses, and print a big warning if
  it's not the only address being used.
  Per Raja llamas mail:
  - before hostname-to-IP mapping: if more than one host and one of
    them is "localhost" or "127.0.0.1", then report "invalid bhost
    configuration"
  - after hostname-to-IP mapping: if more than one host and one of
    them is 127.0.0.1, then report "probable invalid hostname
    resolution setup"

- Add support for the TotalView debugger, per
  http://www.mcs.anl.gov/~gropp/papers/pvmmpi99/eurompi-paper.ps.
  More specific information is in the MPICH distribution, in
  src/infoexport/*, including both the MPICH implementation and a
  detailed document with the required API and whatnot.  See if we can
  scam a free TotalView license out of this.  :-)

MAY BE IRRELEVANT: LAM/gm, for example, needs TCP/IP to lamboot and
set up, and the canonical hostnames are probably fine for this.  The
Myrinet stuff will still use gm for all communications, etc.
- Add hostname translation in lamboot for recognizing multiple NICs in
  situations where only one NIC name is provided (e.g., PBS only gives
  the canonical hostname, not hostnames of alternate NICs).  Idea: the
  LAM administrator installs a "synonym file":

      LABELS:hostname:myri:atm
      HOST:foo:foo-myri:foo-atm
      HOST:bar:bar-myri:bar-atm
      HOST:baz:baz-myri:baz-atm

  If the user boots a hostfile of "foo\nbar\nbaz" with
  "lamboot hostfile -nic myri", lamboot will do a lookup in the
  synonym file and translate the hostnames to the X-myri names.

PROBLEMATIC...
- Eric Roman idea: if you "mpirun -pbs" and there's no running lamd,
  fork/exec/wait a "lamboot -pbs" behind the scenes.  Hence, there's
  no need for a user to do lamboot or wipe in a PBS job (but they
  don't need to know that).

SORT OF DONE: Spawn takes "file" info key that contains appschema
filename
- Put in some info keys in the *spawn* functions, such as which nodes,
  etc.  From a llamas email (Thu, 09 Sep 1999, Jun Funabiki, "LLAMAS:
  about info objects"): he wanted "host", "wdir", "path".  Makes sense
  -- wanting to specify which nodes/cpus we spawn on is an obvious
  one.

- Add Myrinet/ATM/etc. RPIs.
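
The synonym-file idea in the hostname-translation item above could be
handled by a lookup like the following.  It is a minimal sketch,
assuming the colon-separated LABELS/HOST format shown in that item and
at most 16 columns per line; `syn_lookup`, `split_fields`, and the size
limits are hypothetical names chosen for illustration.

```c
#include <stdio.h>
#include <string.h>

#define SYN_MAXLINE 512   /* hypothetical limits, for illustration */
#define SYN_MAXCOLS 16

/* Split a colon-separated record into fields, skipping the leading
 * tag ("LABELS" or "HOST").  Modifies `line` in place.  Note that
 * strtok() collapses empty fields such as "a::b". */
static int split_fields(char *line, char *fields[])
{
    int n = 0;
    char *tok = strtok(line, ":");      /* the tag itself */

    if (tok == NULL)
        return 0;
    while (n < SYN_MAXCOLS && (tok = strtok(NULL, ":")) != NULL)
        fields[n++] = tok;
    return n;
}

/* Translate `host` to its name for NIC `label` (e.g., "myri"), reading
 * from the current position of `fp`.  Returns 0 and fills `out` on
 * success; -1 if the label or host is unknown or `out` is too small. */
static int syn_lookup(FILE *fp, const char *host, const char *label,
                      char *out, size_t outlen)
{
    char line[SYN_MAXLINE];
    char *fields[SYN_MAXCOLS];
    int col = -1, n, i;

    while (fgets(line, sizeof(line), fp) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';     /* drop line ending */

        if (strncmp(line, "LABELS:", 7) == 0) {
            /* Remember which column holds the requested NIC label. */
            n = split_fields(line, fields);
            col = -1;
            for (i = 0; i < n && col < 0; ++i)
                if (strcmp(fields[i], label) == 0)
                    col = i;
        } else if (col >= 0 && strncmp(line, "HOST:", 5) == 0) {
            /* Column 0 is the canonical hostname. */
            n = split_fields(line, fields);
            if (n > col && strcmp(fields[0], host) == 0) {
                if (strlen(fields[col]) + 1 > outlen)
                    return -1;
                strcpy(out, fields[col]);
                return 0;
            }
        }
    }
    return -1;
}
```

lamboot could call this once per hostfile entry (rewinding the synonym
file between calls) and fall back to the original name when the lookup
fails.  Because `strtok()` collapses empty fields, a line like
`HOST:foo::foo-atm` would be misread; a real implementation would need
a stricter splitter.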
- Make LAM MT-safe (allow the first three levels of MPI_INIT_THREAD:
  MPI_THREAD_SINGLE, MPI_THREAD_FUNNELED, MPI_THREAD_SERIALIZED).

- Make LAM MT-hot (allow the last level of MPI_INIT_THREAD:
  MPI_THREAD_MULTIPLE).  Can probably do this by one of two methods:

  1. have a separate thread for the progress engine (see below), or
  2. allow the first user thread to go down and be the progress engine
     (this will mean that the first thread has to act as a true
     progress engine, though, so it's not much different than #1,
     meaning that it will have to selectively unblock other user
     threads when messages come in for them as it is waiting for its
     own message).

  Either method will require the selective unblocking of threads when
  a message comes in.  #1 is probably less complicated, actually.

- Have a thread to run the progress engine so that message passing can
  really occur in the background.  Probably only necessary/advisable
  for MPI_THREAD_MULTIPLE, since it will mandate the use of locks and
  whatnot.  This would necessitate LAM being MT-hot (i.e., none of the
  user threads would go down into the progress engine at all, except
  perhaps by bypassing it in the fastsend/fastrecv).  This will
  essentially separate an MPI process into 2 parts: the progress
  engine and the user code.  See JMS PhD proposal.

- Perhaps split the progress engine to have a separate thread for each
  destination (or for some percentage of destinations, to ensure that
  it scales).  See JMS PhD proposal.

- Add version numbers to startup of all binaries (to include MPI_Init)
  such that version number mismatches between binaries (e.g.,
  lamboot/hboot/lamd, or mpirun/a.out) are printed out, followed by an
  abort.  This version number might be the startup protocol version,
  not the actual LAM version number (not sure if this is a good idea,
  though).  The easiest solution might be to put the version checking
  in kinit/kenter (i.e., everyone compares themselves to the lamd).
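
For the version-number item above, the check in kinit/kenter might look
like this minimal sketch.  It assumes the lamd reports a single integer
startup-protocol version; `check_proto_version` is a hypothetical name,
and how the lamd's version is obtained is deliberately left out.

```c
#include <stdio.h>

/*
 * Hypothetical sketch (not LAM's code): compare the startup-protocol
 * version compiled into this binary (`mine`) with the one reported by
 * the local lamd.  Returns 0 on a match; otherwise prints both
 * versions to stderr and returns -1 so that the caller can abort.
 */
static int check_proto_version(const char *progname, int mine, int lamd)
{
    if (mine == lamd)
        return 0;
    fprintf(stderr,
            "%s: startup protocol version mismatch "
            "(this binary: %d, local lamd: %d)\n",
            progname, mine, lamd);
    return -1;
}
```

A caller would abort on a nonzero return, e.g.
`if (check_proto_version(argv[0], mine, lamd) != 0) exit(EXIT_FAILURE);`,
where `mine` and `lamd` are whatever values the startup handshake
provides.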
IN PROGRESS --> SLAM
- In C2C mode, open sockets the first time they are needed -- don't
  just make a fully connected mesh upon MPI_INIT/MPI_COMM_SPAWN*.

- Make stdout/stderr between ranks scalable -- i.e., some kind of
  tree-based hierarchy of passing stdout/stderr back to mpirun.  Also
  add "rank:" prefixes to each line of output (add a command line
  switch to mpirun to enable this).

- Add the C++ datatypes in the C++ package per the MPI-2 standard.
  --> Steal from the open-sourced Sun MPI.

- Nick also suggests that we should allow "mpirun n0:4 n1:2 ..."
  command line arguments to specify how many copies to start on each
  node.

- Put in an mpirun command line switch to change send/rsend->ssend,
  isend/irsend->issend, send_init/rsend_init->ssend_init.  Should not
  be the default.

- Make a configure test to see if the Fortran compiler can accept -I
  or not; add -I in hf77.c.  Will need to add some hf77 flag to not
  use -I if the user overrides the default Fortran compiler (e.g.,
  with a compiler that does not support -I).

- Integrate MagPIe, and/or otherwise optimize the collectives (see the
  Sun HPC paper) to be better than simple linear/log schemes.

- Optimize datatypes (see the MPICH datatype papers) so that we can
  avoid all the extra copying and whatnot.

DONE
- In configure.in, don't put all the "-l" args in LIBS -- put them in
  something else and manually propagate that into all the
  Makefile.am's for all the LAM binaries.  If we don't do that (i.e.,
  like what happens now), compiling any binary will get the
  amalgamation of billions of "-lutil" (or whatever) arguments, since
  libtool will take *all* the -l arguments that were used to build all
  the convenience libraries.

DONE --> Reserved all events over 0x40000000 for internal/system use.
- Prevent *_event collisions when using the process PID as the event.
  The problem is that (-getpid()) is used all over the place in LAM.
  But when a program has a PID that is small enough, (-getpid())
  collides with the internal LAM services.
  Nick is against using positive PID values (histerical raisin
  distinction: negative events for system events, positive events for
  user events).  Nick's suggestion (from
  http://www.mpi.nd.edu/Internal/llamas/msg01574.php3): "Another
  approach may be to add another synchronization type along the lines
  of KSYCNSQL, say KSYNCSYS, to distinguish between syetem and
  application events."

DONE --> Information written out to syslogs
- Add a logging feature to the lamds so that if they crash, diagnostic
  information can be found.

DONE (overlapping; automatically senses PBS_ENVIRONMENT env var, and
uses the PBS_JOBID to make a unique unix socket name, broadcasts this
jobid out to all the other nodes via lamboot/hboot/tkill/wipe)
- Add -pbs options for lamboot, mpirun, and wipe that will
  a) work only if the PBS environment variables are set,
  b) use PBS job IDs in the local socket names so that LAMs can
     overlap,
  c) have lamboot/wipe use the TM extensions instead of rsh.  TM/PBS
     will return an array of virtual nodes that were allocated.
     lamboot/wipe will have to sift through this array, generate a
     list of unique physical nodes, and tm_spawn() to just the
     physical nodes.  This will also let lamboot count the number of
     CPUs per physical machine (see the "mpirun C foo" bullet, below).


For LAM 6.4
-----------

- otb/sys/impid/client.c -- it seems that DEC OS does not have atoll
  or strtoll.  Need to implement those; shouldn't be hard.

- Change to have only *one* IMPI_Pk_ackmark and IMPI_Pk_hiwater -- per
  section 3.6 of the IMPI std.  After discussions with Bill
  George/NIST, we decided that only *one* pair of values was
  sufficient.  LAM currently maintains a pair for each host pair.

DONE
- Add attributes on IMPI communicators per section 2.5:
  IMPI_CLIENT_SIZE, IMPI_CLIENT_COLOR, IMPI_HOST_SIZE,
  IMPI_HOST_COLOR

- Integrate Dog's server package into LAM, make mpirun -server do the
  Right Thing, use his .h file.

- Do the collectives.
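
The atoll/strtoll item at the top of the LAM 6.4 list could be covered
by a small fallback like the one below.  This is a sketch, not a
drop-in replacement: it handles base 10 only, and it assumes the
compiler provides `long long` along with LLONG_MIN/LLONG_MAX in
<limits.h>, which an older DEC OS toolchain may not.  The names
`lam_strtoll10` and `lam_atoll` are hypothetical.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical base-10 stand-in for strtoll(): optional leading
 * whitespace, optional sign, then decimal digits.  On overflow it
 * clamps to LLONG_MAX/LLONG_MIN and sets errno to ERANGE. */
static long long lam_strtoll10(const char *s, char **endp)
{
    const char *p = s;
    int neg = 0, any = 0, overflow = 0;
    long long acc = 0;   /* kept <= 0 so that LLONG_MIN is reachable */

    while (isspace((unsigned char) *p))
        ++p;
    if (*p == '+' || *p == '-')
        neg = (*p++ == '-');

    for (; isdigit((unsigned char) *p); ++p) {
        int d = *p - '0';

        any = 1;
        /* Need acc * 10 - d >= LLONG_MIN.  C99 division truncates
         * toward zero, so this comparison is exact. */
        if (!overflow && acc < (LLONG_MIN + d) / 10)
            overflow = 1;
        if (!overflow)
            acc = acc * 10 - d;
    }

    /* Like strtoll(), point at the original string if no digits. */
    if (endp != NULL)
        *endp = (char *) (any ? p : s);

    if (overflow) {
        errno = ERANGE;
        return neg ? LLONG_MIN : LLONG_MAX;
    }
    if (!neg) {
        if (acc == LLONG_MIN) {      /* magnitude is LLONG_MAX + 1 */
            errno = ERANGE;
            return LLONG_MAX;
        }
        return -acc;
    }
    return acc;
}

/* Hypothetical atoll() replacement built on the function above. */
static long long lam_atoll(const char *s)
{
    return lam_strtoll10(s, NULL);
}
```

Accumulating the value as a non-positive number lets
"-9223372036854775808" parse without overflow; on out-of-range input
the function clamps and sets `errno` to ERANGE, following the C99
strtoll() contract.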
- Rename all extern variables in impi.h to have a common naming scheme

DONE
  FUNCTION      IMPI-IFIED
  ------------  ---------------------------------------------------
  Bsend         Done (calls lam_isend) -- IMPI_Isend_lamgiappe
  Bsend_init    Done -- IMPI_Send_lamgiappe_init
  Ibsend        Done (calls MPI_Bsend)
  Rsend         Done (calls lam_send) -- IMPI_Send_lamgiappe
  Rsend_init    Done -- IMPI_Send_lamgiappe_init
  Irsend        Done (calls lam_isend) -- IMPI_Isend_lamgiappe
  Send          Done (calls lam_send) -- IMPI_Send_lamgiappe
  Send_init     Done -- IMPI_Send_lamgiappe_init
  Isend         Done (calls lam_isend) -- IMPI_Isend_lamgiappe
  Ssend         Done (calls lam_send) -- IMPI_Register_ping/MPI_Wait
  Ssend_init    Done -- IMPI_Send_lamgiappe_init
  Issend        Done (calls lam_isend) -- IMPI_Isend_lamgiappe
  Recv          Done -- MPI_Irecv/MPI_Wait
  Recv_init     Done -- IMPI_Register_ping_init
  Irecv         Done -- IMPI_Register_ping
  Test          Done (fixed lam_test for nested requests)
  Testall       Done (calls lam_test)
  Testany       Calls MPI_Test
  Wait          Done (fixed for nested requests)
  Waitall       Calls MPI_Waitany
  Waitany       Done
  Start         Calls MPI_Startall
  Startall      Done

DONE
- IMPI_PK_CANCEL

DONE
- IMPI MPI_ANY_TAG

DONE
- IMPI collective helpers for Barrier

DONE
- IMPI barrier

DONE
- Put intercepts in all collectives that will fail if the communicator
  is an IMPI communicator

DONE (in mpirun -- -O is not allowed)
- Unset homog flag on lam impid proc

DONE
- Put intercepts in MPI_Comm_spawn* to disallow spawns on IMPI comms

DONE
- Make impirun sym link to mpirun

DONE
- Make environment variable overrides COLL_XSIZE and COLL_MAXLINEAR


For LAM 6.3
-----------

DONE
- Change/addendum all references to "wipe" to "lamhalt" in all the
  man pages.  Should probably document "wipe" as a "last resort" kind
  of option, only to be used if "lamhalt" fails.

DONE
- Additionally, if LAM knows the number of CPUs on each node, if you
  have a 4-way SMP (n0) and an 8-way SMP (n1), and you do
  "mpirun -np 8", LAM should run exclusively on n1.  i.e., minimize
  off-node communication (this shouldn't be too hard).
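
The SMP-aware placement in the item above ("mpirun -np 8" on a 4-way n0
plus an 8-way n1 should use only n1) can be approximated by filling the
largest nodes first.  This is one possible heuristic, not necessarily
what LAM implements: it minimizes the number of nodes involved, which
is only a proxy for off-node communication, and `pick_nodes` is a
hypothetical name.

```c
/*
 * Hypothetical sketch: cpus[i] is the CPU count of node i (all
 * assumed > 0).  chosen[] must have room for nnodes entries; on
 * return its first k entries are the selected node indices, largest
 * nodes first.  Returns k, or -1 if the whole machine has fewer than
 * np CPUs.
 */
static int pick_nodes(const int *cpus, int nnodes, int np, int *chosen)
{
    int i, j, covered = 0;

    /* Insertion-sort node indices by descending CPU count; ties keep
     * the lower node number first. */
    for (i = 0; i < nnodes; ++i) {
        for (j = i; j > 0 && cpus[chosen[j - 1]] < cpus[i]; --j)
            chosen[j] = chosen[j - 1];
        chosen[j] = i;
    }

    /* Take the biggest nodes until np CPUs are covered. */
    for (i = 0; i < nnodes && covered < np; ++i)
        covered += cpus[chosen[i]];

    return covered >= np ? i : -1;
}
```

With `cpus = {4, 8}` and np = 8 it selects only node 1 (n1); with
np = 10 it selects nodes 1 and 0; with np = 13 it reports failure.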
DONE
- Add in extensions for lamboot to recognize multiple CPUs on a single
  machine, either by counting occurrences of a hostname or by using
  the ":n" notation in the hostfile.  Then make "mpirun C foo" start
  up a foo for each CPU.

DONE (new command: "lamhalt")
- Bill Saphir raised a good point that "wipe" should not require a
  schema file.  "wipe" by itself (or perhaps with a flag) should take
  down the entire currently running machine.  Implement this as: send
  a message to the local lamd.  Have it send suicide messages to all
  the other lamds, then quit.

DONE (didn't rename, made a new command called "lamhalt")
- Rename the "wipe" command -- we've all hated that name for a long
  time now.  Rename it to be lamhalt or something (and make wipe a sym
  link to it, for usability's sake).

DONE (new command: "lamnodes")
- Bill Saphir also mentions that there should be a way to find out
  what *machines* (by name) are in the environment, not just the nX
  nomenclature.  Perhaps a flag to mpitask/mpimsg can be used to
  change the display from the nX nomenclature to the hostnames.

SKIPPED (new command obviates need for this: "lamnodes")
- Make tping also somehow show the hostnames involved (via command
  line switch or something).

DONE
- Redo the make/build structure to be GNU conformant (i.e., separate
  "make all" and "make install").  Use automake.

DONE
- Move "share/h" to "share/include" (can provide a sym link if
  necessary; it's hard to ditch these long-standing conventions, even
  if they are bad!).

DONE
- Make it possible to use the profiling support in ROMIO so that when
  a user program utilizes the profiling interface, only the MPI I/O
  functions are profiled, not the underlying MPI calls that are used
  to implement the MPI I/O functions.  This support is included in
  ROMIO; we just need to do some things in LAM to make that work
  (modifying the build process for "make profile", I think).
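
The hostfile item at the top of this group (count repeated hostnames,
or accept "host:n") amounts to the accumulation below.  This is a
minimal sketch, assuming each entry has already been stripped of
whitespace and comments; `hf_add`, `struct hf_node`, and the table
limits are hypothetical.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define HF_MAXHOSTS 64    /* hypothetical limits, for illustration */
#define HF_NAMELEN  64

struct hf_node {
    char name[HF_NAMELEN];
    int ncpus;
};

/* Add one hostfile entry ("host" or "host:n") to nodes[0..count-1].
 * Repeated hostnames accumulate, so "foo" twice and "foo:2" both give
 * foo 2 CPUs.  Returns the new count, or -1 on a malformed entry, a
 * CPU-count overflow, or a full table. */
static int hf_add(struct hf_node *nodes, int count, const char *entry)
{
    char name[HF_NAMELEN];
    const char *colon = strchr(entry, ':');
    size_t len = colon ? (size_t) (colon - entry) : strlen(entry);
    long n = 1;
    int i;

    if (len == 0 || len >= sizeof(name))
        return -1;
    memcpy(name, entry, len);
    name[len] = '\0';

    if (colon != NULL) {                /* explicit ":n" CPU count */
        char *end;

        n = strtol(colon + 1, &end, 10);
        if (end == colon + 1 || *end != '\0' || n <= 0 || n > INT_MAX)
            return -1;
    }

    for (i = 0; i < count; ++i)
        if (strcmp(nodes[i].name, name) == 0) {
            if (nodes[i].ncpus > INT_MAX - (int) n)
                return -1;
            nodes[i].ncpus += (int) n;
            return count;
        }

    if (count >= HF_MAXHOSTS)
        return -1;
    strcpy(nodes[count].name, name);
    nodes[count].ncpus = (int) n;
    return count + 1;
}
```

Feeding it "foo", "bar:4", "foo" yields foo with 2 CPUs and bar with 4,
which is the per-node count "mpirun C foo" would then use to start one
process per CPU.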
DONE -- problem was actually in accept/connect, not publish_name
- MPI_Lookup_name doesn't seem to work under Solaris 2.6 (and others)
  when MPI_Post_name (or whatever) is called on a different machine
  than MPI_Lookup_name, but it *does* work under Solaris 2.5.1.

DONE
- Fix mpif.h, to whatever was decided on the llamas list -- real*8,
  or double precision as the standard states.

DONE -- web pages about it
- Find out why performance sucks under Linux 2.2.x -- wait for info
  from Pete Rijks about this

DONE -- web pages about it
- Add info in thread from
  http://www.mpi.nd.edu/MailArchives/lam/msg00715.php3 to LAM FAQ; in
  general, add section about Linux 2.2.x -- wait for info from Pete
  Rijks about this

DONE -- left them as 0 for now, consolidated into one file
- Define BROKEN_INET_* to be 1 on AIX machines in ./configure.in -- or
  do we need a real test to see if it's broken or not?

DONE
- #define LAM_MPI 1

DONE
- move spawn code into share/mpi/lamspawn.c

DONE
- C++ problems: don't sym link mpi2c++/mpi++.h to mpi++.h; you need to
  -I to get the rest of the file anyway

DONE
- C++ problems: g++ -Wall complains about order of initializers in
  some constructors

DONE
- var args problems with show_help under Dec Alpha (possible old
  non-ANSI C compiler).  See:
  http://www.mpi.nd.edu/MailArchives/lam/msg00801.php3

DONE -- AIX problem; user must enable AIO in the kernel (i.e., not
LAM's problem)
- Undefined symbols with xlc in AIX 4.1 when trying to run ROMIO
  programs: http://www.mpi.nd.edu/MailArchives/lam/msg00780.php3

      [22:15] spin14:~/jeff/aixlocal/test % mpirun N simple -fname `pwd`/foo
      Could not load program simple
      Symbol kaio_rdwr in /usr/lib/libc.a is undefined
      Symbol listio in /usr/lib/libc.a is undefined
      Symbol acancel in /usr/lib/libc.a is undefined
      Symbol iosuspend in /usr/lib/libc.a is undefined
      Could not load library libc.a[aio.o]

  -- Update from Rajeev -- looks like an AIX administrator problem; he
  recalls a similar problem, and the user's sysadmin fixed the
  problem.
  As such, even a simple aio_read() program doesn't run properly under
  AIX 4.2 right now (even though the man pages say that it should).

DONE -- must have "." in PATH (i.e., not LAM's problem)
- mpirun -wd doesn't work in Linux?  At the very least, "mpirun foo"
  (where foo is in the pwd) doesn't work under Linux.  It does work
  under Solaris.

DONE
- Cannot compile with cc under AIX 4.1 -- problem with tprintf.c,
  arglist defined twice (oops)
  http://www.mpi.nd.edu/MailArchives/lam/msg00780.php3

DONE
- lamexec.1 man page

DONE
- Change INSTALL/RELEASE_NOTES/man pages/etc. to reflect lamteam's new
  changes: passing environment variables, LAM_MPI_PAUSE, mpirun of
  non-MPI programs, etc.

DONE
- Make specific list of 6.3 features that need to be tested

DONE
- All the $COPYRIGHT$ header stuff (see share/mpi/init.c for example)

DONE
- add test/MPI 2 C++ bindings

DONE
- add test/ROMIO.  There were some errors in the ROMIO test
  programs...

DONE
- Check for ZERO_ME stuff in sysv and usysv RPIs

DONE
- Add in MPI 2 C++ and ROMIO examples into examples/ directory.

DONE -- No.  But updated the docs to say that it will not happen until
after MPI_Init().
- Is mpirun -D broken?

DONE
- Add to the INSTALL file that, to make the ROMIO and MPI 2 C++
  examples, one can do "make examples" in the top-level LAM directory

DONE
- Patch up man pages
  DONE: hcc, hcp, hf77, mpicc, mpiCC, mpif77, lam-helpfile
  TO DO: ...list to-do man pages here...

DONE
- Increment the ref count in errget.c like Raja indicated in some
  llamas mail

DONE (by lamteam)
- Add start of non-MPI programs from mpirun (e.g., bcheck, gdb, shell
  scripts, etc.).

DONE
- Change INSTALL/RELEASE_NOTES/etc.
  to reflect that --without-romio is the default

ALREADY DONE -- hf77 prints a message that LAM was not compiled with
FORTRAN support
- Find out why Fortran stuff is still compiled when we do
  --without-fc; stop it if possible, and definitely do not build
  hf77/mpif77

DONE
- Change hcc/hf77 -- use open instead of fopen, and #if the MPI2C++
  and ROMIO sections

DONE
- Make --without-romio be default ./configure

DONE
- Put in the right COPYRIGHT files

DONE: did hboot, lamboot, recon, tkill, wipe (and some other routines
that these call, such as inetexec() and lambootagent())
- Make messages in "recon" be more descriptive (i.e., rsh failed,
  can't find executables, etc.).  Can these same messages also apply
  in lamboot?

DONE
- Deny root from running lamboot, lamd, hboot

DONE: But LAM still won't compile with a C++ compiler -- too many
C-dependent things
- some things in configure break when you "setenv CC CC" and run it.
  AC_WORDS_BIGENDIAN, apparently, tries to use "exit()" without
  prototyping it first, for example.

DONE
- Create mpiCC

DONE
- Make mpicc/mpif77 be sym links to their h counterparts

IGNORED
- Do *something* about MPI_Cancel; requests need to be marked as
  want_to_be_cancelled, and then the behavior of Test and Wait is
  modified.  Ugh!

DONE
- Add in some docs about what MPI_Cancel() does/does not do.

DONE: NEED TO DO: cvs add tools/hcc/hcp.in
DONE: NEED TO DO: cvs remove tools/hcc/hcp
DONE: TO DO: cvs remove configure

DONE:
- Copy over the ZERO_ME stuff from IMPI branch to 6.2b branch

DONE:
- Copy the WANT_PROTOS stuff over; move ARGS macro to lam_config.h

DONE:
- Ditch all variables named "new"

DONE:
- Fix MPI_Wtime and MPI_Wtick in mpif.h -- nail down to real4, not
  just real (I think)

DONE:
- Allow change of default Fortran compiler in ./configure

DONE:
- Build wrapper compilers to use same compilers that ./configure used

FIXED:
- there's actually a user who didn't have ar in their path, but
  configure kept going, even though it wasn't found.
  Then they were mystified when it didn't compile.  Solution: if
  configure can't find ar, abort.

DONE:
- need to update share/h/patchlevel.h file for new patches

DONE:
- Write INSTALL file.  Include information on how to NFS cross-mount
  LAM executables around (as well as user programs), etc.

DONE: (can't find it anywhere...)
- Copy signal handling bug fix over from IMPI tree

DONE:
- Copy mreadv/mwritev bug fix over from IMPI tree

DONE:
- Write makedist stuff

DONE:
- some architectures/compilers appear to define status.st_dev as 8
  bytes, not four.  From the LAM mailing list (5 Mar 1999):

  --------
  As I reported yesterday, the Portland Group C compiler fails to
  compile some source code in LAM.  I looked a little further into the
  problem and it looks like some LAM code tries to copy a value from a
  4-byte wide value to an 8-byte wide value.

      PGC-S-0094-Illegal type conversion required (../../../../share/freq/rfstat.c: 121)
      PGC/x86 Linux/x86 3.0-1: compilation completed with severe errors
      PGC-S-0094-Illegal type conversion required (../../../../otb/sys/filed/fface.c: 209)
      PGC/x86 Linux/x86 3.0-1: compilation completed with severe errors

  Here is sample code which demonstrates this; it won't compile with
  PGCC, although GCC seems to like it just fine, and LAM runs without
  any (noticeable) problems.

      #include <stdio.h>
      #include <sys/stat.h>

      typedef unsigned int uint4;

      int main(void)
      {
          struct stat status;
          uint4 buf = 0;

          printf("sizeof(status.st_dev): %d\n", (int) sizeof(status.st_dev));
          printf("sizeof(buf): %d\n", (int) sizeof(buf));
          printf("sizeof(long long): %d\n", (int) sizeof(long long));

          /* The 4-byte -> 8-byte assignment that PGCC rejects. */
          status.st_dev = buf;
          return 0;
      }
  --------