ENCP release notes, from v3_10d to v3_10e
Encp changes:
=============
This release is mainly a bug-fix release for v3_10d, which introduced support for
the next-generation namespace provider called Chimera.
v3_10e must be installed by end users in order to store data using the
Chimera namespace provider. It is backward compatible with the existing PNFS
namespace provider.
Misc.:
======
Detailed cvs commit logs
========== chimera.py ====================================================================================
fix logic of consistent() method
replace traceback.print_tb() call with Trace.handle_error()
http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=1088
BUG FIX: encp fails to work with chimera on files imported from pnfs because the decision about which class to
instantiate is made based on examination of the pnfsid. This patch fixes the issue.
http://uqbar/reviews/r/409/ http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=1083
========== encp.py ====================================================================================
bump version of encp to v3_10e
========== enstore.py ====================================================================================
enstore pnfs does the same thing as enstore sfs, which keeps it backward compatible when switching to chimera
http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=1117
========== file_clerk_client.py ====================================================================================
Fix enstore sfs commands and fix enstore file --restore command
http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=1085
========== file_utils.py ====================================================================================
stop generating stack traces in DEBUG log file
encp: remove debug log that fills up log files fast
http://uqbar/reviews/r/375/
Commented out a debugging statement.
========== namespace.py ====================================================================================
Fix enstore sfs commands and fix enstore file --restore command
http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=1085
BUG FIX: encp fails to work with chimera on files imported from pnfs because the decision about which class to instantiate
is made based on examination of the pnfsid. This patch fixes the issue.
http://uqbar/reviews/r/409/ http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=1083
========== pnfs.py ====================================================================================
fix path concatenation in pnfs.__get_path
http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=1098
fix file concatenation problem seen when this command failed:
/opt/encp/encp --verbose=4 --threaded --ecrc --bypass-filesystem-max-filesize-check \
--pnfs-mount /pnfs/fs --put-cache 000C00000000000000B1FCC8 \
/diskb/write-pool-2/data/000C00000000000000B1FCC8
http://uqbar/reviews/r/443/ http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=1098
protect access to intf.directory with hasattr(intf,"directory") check.
http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=1075
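The hasattr() guard on intf.directory noted above can be illustrated with a minimal
sketch. The Interface class and the get_directory() helper are hypothetical stand-ins
and are not taken from pnfs.py.

```python
class Interface(object):
    """Hypothetical stand-in for an interface object (not the real class)."""
    pass


def get_directory(intf, default=None):
    # Read intf.directory only when the attribute exists; otherwise
    # return a default instead of raising AttributeError.
    if hasattr(intf, "directory"):
        return intf.directory
    return default


intf = Interface()
without_attr = get_directory(intf)   # attribute is missing
intf.directory = "/pnfs/fs/usr"
with_attr = get_directory(intf)      # attribute is present
```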
ENCP release notes, from v3_9c to v3_10d
Encp changes:
=============
Chimera support for encp is added.
Fixed encp to work with new style DiskMovers that hash using the PNFS/Chimera
ID of the file.
The v3_10a version addresses a problem introduced in v3_10 regarding
unnecessary PNFS accesses to address dual PNFS and Chimera support. The
high load put upon PNFS was manifesting itself as a high rate of ESTALE
errors.
The v3_10b and v3_10c versions attempted to address an intermittent problem
where the Linux kernel would hang. These versions did not solve the problem.
A v3_10d version now fully addresses this. The short description is that
backdoor files into PNFS could be constructed and used by Enstore with a
pattern that does not resemble a tree, but instead a graph. The Linux
kernel assumes a file system can only be a tree. The code that created
these paths has been modified. See the following incidents and problems:
INC000000056879 and PBI000000000184 for CDF
INC000000070546 and PBI000000000147 for D0
Misc.:
======
Chimera support for enmv is added.
Added the --match-directory-file-family and --match-volume-file-family switches
for enmv. These modify the layer 4 file_family value to match the
.(tag)(file_family) file or the Enstore DB record, respectively.
PNFS and Chimera are supported with the "enstore sfs" commands. The
"enstore pnfs" commands are retained for backward compatibility. The
'sfs' stands for Storage File System.
Detailed cvs commit logs
========== duplicate.py ====================================================================================
Implemented the duplication cleanup_after_scan() function. Previously, it was a no-op. (bugzilla #973, review board #289)
Detect when a previous make-failed-copies attempt failed and reuse the destination copy. (bugzilla #835, review board #191)
The patch adds --make-copies to duplicate.py: 1) Add support for --make-copies in option.py. This includes fixing an argument processing bug with a required argument first appearing in the extra values section of the switch definition. 2) Added make_copies() in migrate.py based on make_failed_copies(). 3) Reworked migrate.migrate() when --with-final-scan is used to join all read and write threads, then sort all the destination files by destination location, then start the final_scan threads. The old way worked, but proved to be very inefficient with tape access. An unrelated item also fixed: CDMS122444004900000 on VOB738 had multiple migrations run simultaneously. One set completed successfully, while the other did not. The code that determines if a source file is all done returns True, because it found a set of successful destination files. The Total done check fails because it finds the unsuccessful set of destination files. This patch gives the user the error, while previously the users only got "failed from previous errors" without there being a previous error. For all: bugzilla #791, review board #166
Let's get the correct versions committed this time... 1) --restore switch now works for duplication 2) --status now has --migration-only and --multiple-copy-only modifying switches to show only a limited type(s) of status. 3) In addition to supplying a bfid or volume, the user can now give ":" for migrating/duplicating, scanning, status reporting or restoring. Reporting --status on a per file basis is also new. Previously, it only worked on volumes. 4) Added two new functions: is_migration_history_done() and is_migration_history_closed(). These functions return True if all source and destination pairs have their entries in the migration_history table. The _done() function only makes sure that all the entries exist. The _closed() function additionally makes sure all the "closed_time" fields are also filled in. (bugzilla #772, review board #153)
1) To fix two --status bugs. 2) To give a better error message when a ghost file is found during migration. 3) To avoid the full path lookup when the information was then being dropped without being used, since the file was already migrated. 4) Catch errors updating migration_history.
Fix duplication to handle deleted files. (bugzilla 715, review board 123)
Fixed compatibility issue between migrate.py and duplicate.py for get_filenames() and duplicate_metadata()/swap_metadata(). (bugzilla #638, review board #72).
========== enstore_functions3.py ====================================================================================
moved file_id2path from mover.py to enstore_functions3.py (bz 976).
branches: 1.12.4; Fix issues with new style disk volumes and location cookies not being recognized correctly. (bugzilla #772, review board #153)
New format for enstore disk volume name is: hostname:SG.FF.WRAPPER:YYYY-mm-ddTHH:MM:SSZ (bz 607)
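The disk volume name layout above can be sketched as a build/split pair. Both helper
names are hypothetical, and the sketch assumes that the storage group, file family and
wrapper contain no "." characters and that the timestamp is in UTC.

```python
import time


def make_disk_volume_name(hostname, storage_group, file_family, wrapper, when=None):
    # Hypothetical helper: hostname:SG.FF.WRAPPER:YYYY-mm-ddTHH:MM:SSZ
    # when is seconds since the epoch; None means "now".
    timestamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(when))
    return "%s:%s.%s.%s:%s" % (hostname, storage_group, file_family,
                               wrapper, timestamp)


def split_disk_volume_name(name):
    # The timestamp itself contains colons, so split on the first two only.
    hostname, volume_family, timestamp = name.split(":", 2)
    # Assumption: none of the three volume family parts contains a dot.
    storage_group, file_family, wrapper = volume_family.split(".")
    return hostname, storage_group, file_family, wrapper, timestamp


example_name = make_disk_volume_name("diskhost", "sg", "ff", "cpio_odc", when=0)
example_parts = split_disk_volume_name(example_name)
```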
Modified the import of is_chimeraid() from chimera.py to not fail if the chimera module cannot be found. (The rest of the chimera patch is not ready for production yet). (bugzilla bug #606, review board #50)
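A guarded import of this kind is commonly written with try/except ImportError. The
fallback below, which reports that nothing is a Chimera ID, is an assumption made for
illustration; the commit log does not say what the real code does when the module is
missing.

```python
try:
    from chimera import is_chimeraid
except ImportError:
    def is_chimeraid(value):
        # Assumed fallback: without the chimera module, treat no value
        # as a Chimera ID.
        return False
```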
Chimera integration changes
========== hostaddr.py ====================================================================================
added diagnostic message
========== duplication_util.py ====================================================================================
Detect when a previous make-failed-copies attempt failed and reuse the destination copy. (bugzilla #835, review board #191)
1) To fix two --status bugs. 2) To give a better error message when a ghost file is found during migration. 3) To avoid the full path lookup when the information was then being dropped without being used, since the file was already migrated. 4) Catch errors updating migration_history.
Fix duplication to handle deleted files. (bugzilla 715, review board 123)
========== dispatching_worker.py ====================================================================================
ADDED FEATURE: furnish dispatching_worker.thread_wrapper with printing of function execution time http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=661 http://uqbar/reviews/r/85/
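Timing a function call and printing the elapsed time can be sketched as below. The
timed_call() helper is hypothetical and only shows the general idea; it is not the
thread_wrapper code.

```python
import time


def timed_call(function, *args, **kwargs):
    # Hypothetical helper: run the function and report how long it took,
    # even when the function raises.
    start = time.time()
    try:
        return function(*args, **kwargs)
    finally:
        elapsed = time.time() - start
        print("%s executed in %.6f seconds" % (function.__name__, elapsed))


total = timed_call(sum, [1, 2, 3])
```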
pass ticket to restricted_access http://uqbar/reviews/r/36 http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=586
modify restricted_access to accept reply_address as argument
========== configuration_client.py ====================================================================================
Update domains in dump method. Otherwise they never get loaded in encp and only default domains are used as a consequence. (bz 1032). This problem has been found at PIC.
========== drivestat2.py ====================================================================================
extract and pass database user parameter to drivestat database (it was overlooked in previous attempt to address this issue) Bugzilla: http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=676 RB: http://uqbar/reviews/r/104
========== inventory.py ====================================================================================
count active files properly http://uqbar/reviews/r/333/ http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=995
typo : replace "Reqested" w/ "Requested" http://uqbar.fnal.gov/reviews/r/329/ http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=979
always specify username to database access API http://uqbar/reviews/r/63/ http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=630
========== enstore_up_down.py ====================================================================================
branches: 1.73.2; make sure get_enstore_state() returns a value http://uqbar/reviews/r/239 http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=896
Do not set red ball for insufficient movers when library manager is scheduled down.
========== file_utils.py ====================================================================================
1) The os.path.isdir(), os.path.islink(), etc. functions call os.stat() which needs to be wrapped with file_utils.wrapper(). 2) Writes using --put-cache and --shortcut have had a patch to get the full path to PNFS files to send to the mover to avoid the mover's no NULL in path error. This patch generalizes this feature for reads too. 3) Another source of cyclic paths in PNFS has been found. 4) When --get-bfid and --skip-bfid are used together, honor the skip pnfs part. (bugzillas 1039, 1043, 1044 and 1045; review board 347)
Fixed a problem with encps causing Linux kernels to hang. The .(access)() paths of directories can be put together in a way that the filesystem does not form a tree, but a graph. The kernel implementation assumes only trees and hangs in a loop. This patch also addresses some issues with spurious errors from PNFS under high load. Incident tickets: INC000000056879, INC000000070546 Problem tickets: PBI000000000147, PBI000000000184 URL: https://plone4.fnal.gov/P0/Enstore_and_Dcache/developers/enstore-developers/documents/encp-investigation-of-inc000000056879-pbi000000000147/ Bugzilla ticket: http://www-enstore.fnal.gov/Bugzilla/show_bug.cgi?id=981 Review board: http://uqbar.fnal.gov/reviews/r/324/diff/#index_header
Commented out a debugging print statement. (review board #313)
Improved the performance of encp, migration and metadata scanning. (bugzilla #931, review board #262) This includes: 1) fewer PNFS/Chimera accesses setting file sizes for long filename files 2) handling PNFS databases that have multiple entry points 3) not hanging SLF4 kernels with paths like /pnfs/fs/usr/Migration/.(access)(000000000000000000001080)/Migration/.(get)(database) 4) processing the list of current mountpoints once (namespace.py) instead of twice, once for PNFS and once for Chimera. 5) cache found PNFS database starting directories and their associated .(get)(database) values 6) retrying ESTALE errors, since PNFS has been found to be inconsistent
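Item 6 above, retrying ESTALE errors, might look roughly like the following sketch.
The retry_on_estale() helper, the retry count and the delay are all assumptions chosen
for illustration and are not taken from file_utils.py.

```python
import errno
import time


def retry_on_estale(function, args=(), retries=3, delay=0.0):
    # Hypothetical helper: call function(*args), retrying only when the
    # failure is ESTALE. Any other error is raised immediately.
    for attempt in range(retries):
        try:
            return function(*args)
        except (OSError, IOError) as err:
            # Give up on other errors, or once the retry budget is spent.
            if err.errno != errno.ESTALE or attempt == retries - 1:
                raise
            time.sleep(delay)


# Usage: a function that fails twice with ESTALE and then succeeds.
calls = []


def flaky_stat():
    calls.append(1)
    if len(calls) < 3:
        raise OSError(errno.ESTALE, "Stale file handle")
    return "ok"


result = retry_on_estale(flaky_stat)
```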
branches: 1.20.2; skip non-existent direntries while running recursive rm on directory http://uqbar.fnal.gov/reviews/r/234/ will address http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=892
Encp when running as root for dCache was running into permission problems with Chimera that do not exist for PNFS. This code now sets the effective IDs to the owner of the file to avoid them. (bugzilla #881, review board #228)
Added new function get_mount_point() that returns the mount point of the supplied path. (bugzilla #858, review board #206)
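One common way to find the mount point of a path is to walk up the directory tree
until os.path.ismount() reports a mount. The sketch below shows that general approach
only; it is not the file_utils.get_mount_point() implementation, and it ignores
PNFS-specific details.

```python
import os


def get_mount_point(path):
    # Sketch: climb from the path towards the root until a mount point
    # is found.
    current = os.path.abspath(path)
    while not os.path.ismount(current):
        parent = os.path.dirname(current)
        if parent == current:   # reached the top without finding a mount
            break
        current = parent
    return current


mount_point = get_mount_point(os.getcwd())
```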
Fixed wrapper() to return the return value from the executed function. (bugzilla bug #599, review board #45)
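The class of bug fixed in wrapper() can be shown with two reduced, hypothetical
wrappers. The real file_utils.wrapper() has a different signature and does more work;
only the missing "return" is illustrated here.

```python
import os


def broken_wrapper(function, args=()):
    function(*args)            # result is computed, then silently dropped


def fixed_wrapper(function, args=()):
    return function(*args)     # result is handed back to the caller


dropped = broken_wrapper(os.path.basename, ("/a/b",))
returned = fixed_wrapper(os.path.basename, ("/a/b",))
```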
========== pnfs.py ====================================================================================
Handle the situation where the user is using a path containing a symbolic link to the PNFS mountpoint. Also, convert the PNFS id from a hex string to integer, instead of trying it as a decimal string.
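The hex versus decimal difference is easy to see with the built-in int(). The ID below
is the one used in the encp command quoted in the v3_10e notes; the variable names are
illustrative.

```python
pnfs_id = "000C00000000000000B1FCC8"

as_integer = int(pnfs_id, 16)      # base 16 accepts the A-F digits

try:
    int(pnfs_id)                   # base 10 rejects them
    decimal_ok = True
except ValueError:
    decimal_ok = False

# Formatting back to zero-padded upper-case hex restores the original string.
round_trip = "%0*X" % (len(pnfs_id), as_integer)
```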
1) The os.path.isdir(), os.path.islink(), etc. functions call os.stat() which needs to be wrapped with file_utils.wrapper(). 2) Writes using --put-cache and --shortcut have had a patch to get the full path to PNFS files to send to the mover to avoid the mover's no NULL in path error. This patch generalizes this feature for reads too. 3) Another source of cyclic paths in PNFS has been found. 4) When --get-bfid and --skip-bfid are used together, honor the skip pnfs part. (bugzillas 1039, 1043, 1044 and 1045; review board 347)
Allow the --mount-point switch to be usable by dCache instead of just by an administrative user. This will be used by encp_test_script.
Fixed a problem with encps causing Linux kernels to hang. The .(access)() paths of directories can be put together in a way that the filesystem does not form a tree, but a graph. The kernel implementation assumes only trees and hangs in a loop. This patch also addresses some issues with spurious errors from PNFS under high load. Incident tickets: INC000000056879, INC000000070546 Problem tickets: PBI000000000147, PBI000000000184 URL: https://plone4.fnal.gov/P0/Enstore_and_Dcache/developers/enstore-developers/documents/encp-investigation-of-inc000000056879-pbi000000000147/ Bugzilla ticket: http://www-enstore.fnal.gov/Bugzilla/show_bug.cgi?id=981 Review board: http://uqbar.fnal.gov/reviews/r/324/diff/#index_header
Fixed "enstore sfs --path" or "enstore pnfs --path" when the pnfsid belongs to a regular file or directory in one PNFS server, but to a tag or layer in a second PNFS server where both PNFS servers need to have at least one database mounted, but the second one does not have the correct databases mount point mounted. Now the fully resolved path from server one will be returned instead of the error "No such file or directory". (bugzilla #988, review board #299)
In get_directory_name() if the directory consists of something like "." then the correct parent directory is not returned. (bugzilla #974, review board #290)
Make the "enstore pnfs --id" and "enstore sfs --id" commands work with enstore executable built for dCache use. (bugzilla #986 review board #298)
The "enstore pnfs --path" fails to give the path of a file if the file is located under 3 or more PNFS databases. (bugzilla #970, review board #287)
At one point in pnfs.py and chimera.py it tries to make a copy of an object. This object is a dictionary, but the copy is attempted as if it is a list. The dictionary copy() method is now used. (bugzilla #956, review board #276)
Improved the performance of encp, migration and metadata scanning. (bugzilla #931, review board #262) This includes: 1) fewer PNFS/Chimera accesses setting file sizes for long filename files 2) handling PNFS databases that have multiple entry points 3) not hanging SLF4 kernels with paths like /pnfs/fs/usr/Migration/.(access)(000000000000000000001080)/Migration/.(get)(database) 4) processing the list of current mountpoints once (namespace.py) instead of twice, once for PNFS and once for Chimera. 5) cache found PNFS database starting directories and their associated .(get)(database) values 6) retrying ESTALE errors, since PNFS has been found to be inconsistent
branches: 1.302.2; Optimize pnfs._get_mount_point2() to know if it already found the path for the ".(get)(database)" output. This short circuits the search if it knows it already has the answer from a previous iteration. (bugzilla #914, review board #248)
Additional patch for bugzilla #856 and review board #204. The addressed use case prevents a traceback when obtaining the path of a PNFS ID that belongs to a tag or layer file.
Added feature for tag and layer PNFS IDs to work with --showid and --path. (bugzilla #856, review board #204)
These patches are all related to modifying find_pnfs_file.py to support Chimera instead of just PNFS. Also, modify encp to use the new functionality. (Bugzilla #839, review board #196)
This is Vijay's delete_at_exit Chimera integration patch. (review board #105)
These are additional patches for Chimera integration into encp. The namespace.StorageFS class now uses the __class__ member variable to better become one of the chimera.ChimeraFS, pnfs_agent_client.PnfsAgentClient or pnfs.Pnfs class. (bugzilla #649)
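Reassigning __class__ so that a generic object "becomes" a more specific one can be
sketched as follows. The classes are reduced, hypothetical versions; the use_chimera
flag and the subclass relationship are assumptions made only to keep the sketch short,
and the real selection logic in namespace.py is not shown in the commit log.

```python
class StorageFS(object):
    # Hypothetical reduced version; the selection rule below is invented.
    def __init__(self, use_chimera):
        if use_chimera:
            self.__class__ = ChimeraFS   # instance now behaves as ChimeraFS
        else:
            self.__class__ = Pnfs        # instance now behaves as Pnfs

    def name(self):
        return "generic"


class ChimeraFS(StorageFS):
    def name(self):
        return "chimera"


class Pnfs(StorageFS):
    def name(self):
        return "pnfs"


fs = StorageFS(True)
```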
Chimera integration changes
========== operation.py ====================================================================================
Adjusted output caps for 8500F1 robots. They are different due to SL8500-7 having been added as lib 1. Still want d0en ejected from SL8500-5 which is now 1,9,0. Still want stken and cdfen ejected from SL8500-3 which is now 1,5,0.
implement interface to servicenow
http://uqbar/reviews/r/355/ http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=1040 (mitigates, does not solve) operation.py currently cuts single ticket with maximum of 5 caps of tapes. Next ticket will be cut if there are more than 10 caps worth of full tapes or 7 days elapsed since last ticket was cut. The volumes in the list are ordered by label. As a result some volumes may take a long time to get flipped if they happen to be at the end of the list. The solution is to cut tickets for all tapes in the list. SSA does not want to have one ticket with more than 5 caps. Cutting many tickets, 5 caps each requires some work. Meanwhile a mitigating solution is to order volumes by si_time_1 in ascending order.
Added CD-10KCG1 library to tab flipping for the 8500G1 robot - inc 89457.
Added missing CD-LTO3GS to map to 8500GS in library type check.
Changes to support new SL8500-6 robot. Added 'r' to note the 8500GS robot. Added mappings of the 3 new LTO4 GS libraries (CD-LTO4GS, CDF-LTO4GS, D0-LTO4GS) to map to 8500GS robot. Added section to produce correct fntt node ACSLS command to eject 8500GS tapes to the proper cap address 2,1,0.
generate correct text in Remedy on request to take write protection off http://uqbar/reviews/r/255/ http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=744
get rid of usage of option.Interface http://uqbar/reviews/r/231/ http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=889
Modified make_cap(list) to differentiate between the 3 instances for the output CAP for the 8500F1 library. All d0en drives are now only in SL8500-51 which is ACSLS LSMs 1,4-7, while all cdfen and stken drives and tapes are now only in SL8500-3 which is ACSLS LSMs 1,0-3. Now ejecting these tapes only from their respective CAPS. Changes reviewed by Stan Naymola Apr 15 2010.
always specify username to database access API http://uqbar/reviews/r/63/ http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=630
added product categorization tiers for tab flipping
protect import remedy_interface in try : except block (http://uqbar/reviews/r/41, http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=594)
http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=576 RB http://uqbar/reviews/r/27/ assign flip tab remedy ticket to the correct service group
introduced interface to new remedy API. Use this interface instead of system call in operation.py and create shell wrapper around tab_flipping_nanny.py Bugzilla #567 RB: http://uqbar/reviews/r/18
========== generic_client.py ====================================================================================
Generic client may not have csc under different conditions. Check if self has attribute server address and if yes use it. (bz 1004).
BUG FIX: generic client does not allow requests with TCP connections from configuration server if client and server run on different network domains. This was reported by PIC. (bz 1004).
If the callback address is not in the allowed list, don't let the client connect back over TCP. (bugzilla #859, review board #207)
Patched defect where the short answer in send() receives an error. It used to try the long answer, if allowed; now it skips the long answer on these errors. (bugzilla bug #598, review board #44)
========== enstore_html.py ====================================================================================
branches: 1.228.4; Avoiding improper use of Interface class. (bugzilla #830)
========== web_server.py ====================================================================================
get rid of interface usage (to play nice with option.py) http://uqbar/reviews/r/184/
Always work with copy of the original httpd.conf
========== inquisitor_plots.py ====================================================================================
added module that plots mount/dismount latencies per robot per drive type RB http://uqbar/reviews/r/147/ Bugzilla http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=765
========== backup.py ====================================================================================
modify search pattern to include all files in source directory (instead of just entv.tar files). This led to failure to pickup "enstore.dmp" file and subsequent failure of checkdb.py script http://uqbar/reviews/r/304/ http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=994
========== option.py ====================================================================================
branches: 1.187.4; Added the --match-directory-file-family and --match-volume-file-family switches for enmv. (bugzilla #838, review board #195)
1) Deep in option.py processing, the list of arguments not consumed by a switch needs to be updated. There were some cases where this was happening incorrectly. This was resulting in a confusing warning message to the user. 2) Some arguments were not in alphabetical order. (bugzilla #836, review board #193)
Avoid possible reference to undefined variable, new_index. (bugzilla #829, review board #186).
The patch adds --make-copies to duplicate.py: 1) Add support for --make-copies in option.py. This includes fixing an argument processing bug with a required argument first appearing in the extra values section of the switch definition. 2) Added make_copies() in migrate.py based on make_failed_copies(). 3) Reworked migrate.migrate() when --with-final-scan is used to join all read and write threads, then sort all the destination files by destination location, then start the final_scan threads. The old way worked, but proved to be very inefficient with tape access. An unrelated item also fixed: CDMS122444004900000 on VOB738 had multiple migrations run simultaneously. One set completed successfully, while the other did not. The code that determines if a source file is all done returns True, because it found a set of successful destination files. The Total done check fails because it finds the unsuccessful set of destination files. This patch gives the user the error, while previously the users only got "failed from previous errors" without there being a previous error. For all: bugzilla #791, review board #166
New switches for migration: --migration-only and --multiple-copy-only (bugzilla #772).
1) Added the "systems" drop down menu. This provides the found selection of Enstore systems to display. 2) Fixed the --generate-messages-file switch to output consistent information. 3) Handle adding new movers on the fly to the display. 4) Removed --movers-file switch. With #2 and #3 this is obsolete. 5) Have --messages-file use the time information to space the replayed display updates in real time.
Address printing help_strings with embedded newlines. (bugzilla bug #605, review board #48.)
Patch to keep the code from confusing arguments and switches with the same string values (without any leading - or --). (bugzilla bug #600, review board #46) Also, add the new fs.py switches for extended attributes. (--xattr, --xattrs, --xattrrm, --xattrchmod --xattrchown)
========== plotter_main.py ====================================================================================
add Mounts/day per tape library plots http://uqbar/reviews/r/215/ http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=866
added module that plots mount/dismount latencies per robot per drive type RB http://uqbar/reviews/r/147/ Bugzilla http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=765
========== pnfs_backup_plotter_module.py ====================================================================================
Don't throw a traceback if there exists a start time but not (yet) a finish time. (Bugzilla #722, RB #128)
The interval value can optionally display "N days HH:MM:SS" or "1 day HH:MM:SS" instead of just "HH:MM:SS". Handle the formats. (bugzilla #697, review board 115)
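Handling the three interval layouts named above can be sketched with one regular
expression. The function name is hypothetical, and the sketch deliberately covers only
"HH:MM:SS", "1 day HH:MM:SS" and "N days HH:MM:SS"; any other layout is rejected.

```python
import re

# Optional "N day(s) " prefix followed by HH:MM:SS.
INTERVAL_RE = re.compile(r"^(?:(\d+) days? )?(\d+):(\d{2}):(\d{2})$")


def interval_to_seconds(text):
    # Hypothetical helper: convert an interval string to whole seconds.
    match = INTERVAL_RE.match(text.strip())
    if match is None:
        raise ValueError("unrecognized interval: %r" % (text,))
    # A missing day count comes back as None and is treated as zero.
    days, hours, minutes, seconds = [int(g or 0) for g in match.groups()]
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds
```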
========== enstore_status.py ====================================================================================
modified to take into account a new format of log message.
========== file_clerk_client.py ====================================================================================
branches: 1.185.4; Modify "enstore file --restore --force" to become the existing users owner for file manipulations. (bugzilla #724, review board 130)
========== manage_queue.py ====================================================================================
BUG FIX: thread synchronization problem in Queue. It was observed that if one thread makes a put and another a delete or update, the tags list may have 2 identical entries. If the corresponding request gets selected and deleted later the "orphan" entry for this request stays in tags without reference to the original requests in the queue. This results in the indefinite loop inside of the queue selection. (bz 992)
BUG FIX: negative max_index causes indefinite loop in Atomic_Request_Queue.get When last entry is deleted from tags max_index in line 366 can become negative, setting self.current_index and self.start_index to a negative value too. Then new entries are added to tags sorted list. When Atomic_Request_Queue.get is called with list of active_volumes same as in tags, the while loop in line 904 is entered. This loop never exits. The reason is: SortedList._get_next will never set self.stop_rolling flag to 1, because it is always > self.start_index, thus allowing selection of already selected requests. (bz 975).
BUG FIX: exception on line 750 This bug was causing exception on line 750 when updated_rq was None.
branches: 1.111.2; request priority growth was broken in manage_queue.py, which resulted in not granting request which priority is expected to grow in time. This patch fixes the bug. (bz 924)
In SortedList.rm() set current_index and start_index to the (length of list) - 1 if the last element of list is deleted. Before start_index could become bigger than the biggest index for the list causing resynchronization and indefinite loop in get_next (bz 774)
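The index adjustment described for SortedList.rm() can be sketched on a heavily
reduced, hypothetical class. It reads "last element" as the element in the final
position of the list; the real manage_queue.SortedList has more state and different
internals.

```python
class SortedListSketch(object):
    # Hypothetical stand-in, not the manage_queue.SortedList class.
    def __init__(self, items):
        self.items = sorted(items)
        self.current_index = 0
        self.start_index = 0

    def rm(self, item):
        was_last = bool(self.items) and self.items[-1] == item
        self.items.remove(item)
        if was_last:
            # Point both indexes at the new last element so that neither
            # can run past the end of the shortened list.
            self.current_index = len(self.items) - 1
            self.start_index = len(self.items) - 1


queue = SortedListSketch([1, 2, 3])
queue.current_index = 2
queue.start_index = 2
queue.rm(3)
```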
Bug fix: set stop_rolling
In SortedList class of manage_queue.py call get() inside of get_next() if it was never called. The problem was found when adding diagnostic messages to resolve library_manager hangings. (bz #785)
The Request_Queue.get() in manage_queue.py does not process the next request in tags if previous requests were dropped due to restriction on host selection. (bugzilla 769). This problem is caused by a bug in manage_queue.py: Request_Queue.get() does not process the next request in tags if previous requests were dropped due to restriction on host selection, i.e. Request_Queue.get(key, next=1, active_volumes=active_vols, disabled_hosts=disabled_hosts_list) -- returns None because all requests for a given key (volume family of external label) were for a host in the disabled_hosts_list Request_Queue.get(next=1, active_volumes=active_vols, disabled_hosts=self.disabled_hosts) - returns None because next request from a given host was in the disabled_hosts_list Request_Queue.get(next=1, active_volumes=active_vols, disabled_hosts=self.disabled_hosts) - returns None because Atomic_Request_Queue called by Request_Queue.get does not proceed with another tag.
Removing ticket['fc']['external_label'] in manage_queue.py for write requests in the middle of the request selection cycle may cause KeyError exception when request gets pulled from postponed requests queue, because requests in manage_queue.py queue and postponed requests queue are the same references. To fix this allow to remove ticket['fc']['external_label'] from write request only if the request has not yet been processed in the request selection cycle. Note that request selection cycle begins every time when either IDLE or HAVE_BOUND Library Manager request comes from a mover. (Bugzilla 712)
Clean external_label entry in write request. This entry is a result of the volume assignment in some previous request processing cycle, when volume was assigned, but later request was not selected (usually for fair share reason).
One more line needed the same change
Corrected calculation of curpri.
fixed bugs, changed test() return value
This is a first version of modified manage_queue.py to be used with a new implementation of scaled library_manager.py. It is not backward compatible with library_manager.py 1.667 and older.
========== log_server.py ====================================================================================
fix http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=773 "Enstore alarm log search seems broken" RB: http://uqbar/reviews/r/154/
========== enstore_plotter_framework.py ====================================================================================
get rid of using interface so enstore_plotter_framework can work with option parser http://uqbar/reviews/r/152/ http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=771
========== cleanUDP.py ====================================================================================
Allow UDP packet fragmentation. It is disallowed by default. If it is disallowed then the following problem occurs. If UDP packet size is less than MTU on a sender node and bigger than MTU on receiver node the packet gets delivered without fragmentation and rejected on receiver node because it cannot treat frames bigger than MTU. This was observed when deploying new CMS worker nodes with 10Gb interfaces. (bz 1065).
Give better error messages from sendto() for EBADF and EBADFD. (bugzilla #859, review board #207)
========== event_relay_client.py ====================================================================================
Additional patches to get entv for SLF6 to work. (bugzilla 1021, review board 325)
========== enstore_system_html.py ====================================================================================
Avoiding improper use of Interface class. (bugzilla #830)
========== quota.py ====================================================================================
always specify username to database access API http://uqbar/reviews/r/63/ http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=630
========== migration_summary_plotter_module.py ====================================================================================
Corrected the migration accounting to handle: 1) Destination tapes that were determined to be bad during scanning. They had partial migration_history table records that were not properly being ignored. 2) Shelved-tapes with incomplete migration_history table records were getting included in the total tape count, but were not included in the remaining tape list.
========== enstore.py ====================================================================================
branches: 1.86.2; Use "sfs" for Storage File System as the command name for generic access to pnfs, chimera or pnfs_agent. This reserves "fs" for local disk equivalents. (bugzilla 649)
Chimera integration changes
========== library_manager.py ====================================================================================
In the case when there is a write disabled tape for a given file family in bound state and another tape for the same file family in active write, the next write request with on-line priority draws an additional tape even though the file family width rule should not allow this. (bz 1056).
BUG FIX: typo in variable name. The typo caused incorrect request processing. This resulted in work not being assigned to movers if fair share for the processed request was exceeded.
In some cases an admin priority request gets skipped if the request for a given label or file family is already in self.processed_admin_requests. This is not correct. If the current request is from a different client host than the request in self.processed_admin_requests, it must get processed and checked against discipline and not just skipped. This bug was inherited from the library manager version preceding Scalable Library Manager. (bz 977)
BUG FIXED: Library Manager does not respect discipline rules for write requests if request being processed exceeds fair share.(bz 972).
Library manager was ignoring suspect volume rules for read requests. Fix: Do not assign request to the mover declared in the list of suspect volumes. (bz 967)
The logic assumed that permissions is an array of 2 arrays like [['e1','e2'],['e3','e4']], where each ei is a string representing the state of the volume system_inhibit, while it actually is an array of 2 strings, as in this log line: 201 Fri Jan 14 14:31:24 2011.70 busy_volumes: permissionss ['none', 'none'] Thread process_mover_requests (bz 944).
Put a volume into the suspect volume list on a positioning error from the mover. When this was not done, a defective tape put lots of movers offline. (bz 947)
branches: 1.688.2; Library manager incorrectly compared encp versions beginning with v3_10.(bz 921)
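One plausible way such a comparison goes wrong is comparing version tags as plain strings, where "v3_10" sorts before "v3_9". The sketch below is an illustration under that assumption and is not the library manager's code: version_key is a hypothetical helper, and the tag "v3_9" is used only as an example.

```python
import re


def version_key(version):
    """Turn a tag such as 'v3_10e' into a comparable tuple (3, 10, 'e')."""
    match = re.match(r"^v(\d+)_(\d+)([a-z]*)$", version)
    if match is None:
        raise ValueError("unrecognized version tag: %r" % (version,))
    major, minor, suffix = match.groups()
    # Numeric parts are compared as integers, the letter suffix as a string.
    return (int(major), int(minor), suffix)


if __name__ == "__main__":
    # Plain string comparison orders the tags the wrong way round.
    print("v3_10" < "v3_9")
    # Tuple comparison orders them numerically.
    print(version_key("v3_10") > version_key("v3_9"))
```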
1. Put complete volume info into write request in work_at_movers. (bz 919) 2. It has been noticed that admin priority write request processing violates the file family width restriction in the following scenario: a) A regular priority request completed and the mover has returned HAVE_BOUND. File family - FF1. b) A request with admin priority gets picked up for processing. File family - FF2. c) There are movers in HAVE_BOUND state for FF2 and the file family width is equal to the number of such movers. This should not allow the selected request to go to the current mover. d) But due to the bug the request goes to the current mover. This does not happen often because there must be movers waiting in HAVE_BOUND state for FF2. This can only happen when there were no write requests for FF2. (bz 918)
This fixes a bug when an admin priority request overrides the currently mounted volume. The currently mounted volume was getting removed from the volumes_at_movers list before it was actually dismounted, allowing a request for this volume to be sent to another mover. As a consequence there was a mount request for a volume that was not yet dismounted, causing mount failure and the volume being set to NOACCESS. (bz 498)
Fixed bug introduced in 1.683: In mover_busy volumes_at_movers list does not get updated when "good" mover request comes.(bz 888)
Send a blank reply to mover when library manager thinks it is a duplicate request. If not sent the mover retries for 10 minutes the same request.(bz 878)
Sometimes mover request to library manager comes with "Unknown" in volume status fields: BUSY RQ {'status': ('ok', None), 'returned_work': None, 'library': 'CD-LTO4F1', 'mover_type': '', 'current_location': '0000_000000000_0000172', 'ip_map': '', 'operation': None, 'read_only': 0, 'current_time': 1282137872.1632309, 'error_source': None, 'external_label': 'VP0388', 'state': 'SETUP', 'r_a': (('131.225.13.106', 60099), 891215L, '131.225.13.106-60099-1277858829.946311-6245'), 'library_list': [('CD-LTO4F1.library_manager', ('131.225.13.3', 7010))], 'current_priority': (1, -1), 'transfer_deficiency': 1, 'time_in_state': 0.015711784362792969, 'address': ('131.225.13.106', 7871), 'volume_family': 'backups.d0en_metadata_backups_copy_1.cern', 'work': 'mover_busy', 'mover': 'LTO4_111.mover', 'volume_status': (['Unknown', 'Unknown'], ['Unknown', 'Unknown']), 'unique_id': 'd0ensrv3n.fnal.gov-1282137470-32461-1'} This may lead to assignment of additional volume for writes, while the current volume is not filled in. (bz 853).
Bug fix: When preemptive (admin. priority) request gets sent to mover and mover returns dismount failure for the tape being preempted, the request does not get removed from volumes at movers list. This results in requests not being selected for a given volume family (write) or a given volume (read). (Bugzilla 761).
after conversation with Mike and Gene removed check for NOSPACE status from line 2049
check return status of is_vol_available call for NOSPACE http://uqbar/reviews/r/114/ http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=698
Code review changes.
Library manager incorrectly handled admin priority read and write requests, which resulted in numerous volume dismounts during scan. If the request queue contains read and write requests with administrative priority for the same volume family and the completed request was a READ, then in some cases there is an attempt to select a write request for the bound volume. If this volume cannot be written, it is skipped and consequently dismounted even if there are pending read requests. (bz #674)
Library Manager contacted volume clerk for information which is already in the read request ticket when adding request to the queue. Remove this call and get information directly from the ticket. Also few Traces added for better debugging. There also were some KeyErrors that need a better handling and debugging to better identify their causes.
trace added
allow_hipri was returning rq as None when a previous non hipri request was READ. This was not expected by the calling method. Now allow_hipri never returns None as rq.
For mover_idle and mover_bound_volume requests set mover state to SETUP when sending work to mover.
Changes per Dmitry's comments for bugzilla 513
In case of duplicate mover_idle or mover_bound_volume requests send a blank reply to avoid possible mover hangs due to the lost library manager reply to the previous mover request.
Moved verify_data_transfer_request() in write_to_hsm() to correct place
replaced ALARM with INFO severity level (bugzilla 512)
Improved processing of write requests. If there are active volumes for a given file family and their number exceeds file family width try to write to all of them to fill unfilled volumes and avoid excessive mounts/dismounts (bugzilla tickets: 513,514,515)
Improved scalability. Individual ports for movers, encp clients, and other clients. Each port is served by separate thread.
========== accounting.py ====================================================================================
BUG FIX: fix exception in accounting server RB: http://uqbar/reviews/r/65/ Bugzilla: http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=637
always specify username to database access API http://uqbar/reviews/r/63/ http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=630
========== atomic.py ====================================================================================
1) The os.path.isdir(), os.path.islink(), etc. functions call os.stat() which needs to be wrapped with file_utils.wrapper(). 2) Writes using --put-cache and --shortcut have had a patch to get the full path to PNFS files to send to the mover to avoid the mover's no NULL in path error. This patch generalizes this feature for reads too. 3) Another source of cyclic paths in PNFS has been found. 4) When --get-bfid and --skip-bfid are used together, honor the skip pnfs part. (bugzillas 1039, 1043, 1044 and 1045; review board 347)
1) To fix two --status bugs. 2) To give a better error message when a ghost file is found during migration. 3) To prevent the full path lookup when its result was then dropped without being used, since the file was already migrated. 4) Catch errors updating migration_history.
========== rawUDP_p.py ====================================================================================
define ret and request_id so that they exist if exception occurs. (bz 917)
========== enstore_constants.py ====================================================================================
add Mounts/day per tape library plots http://uqbar/reviews/r/215/ http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=866
branches: 1.106.4; These patches are all related to modifying find_pnfs_file.py to support Chimera instead of just PNFS. Also, modify encp to use the new functionality. (Bugzilla #839, review board #196)
added module that plots mount/dismount latencies per robot per drive type RB http://uqbar/reviews/r/147/ Bugzilla http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=765
Added CAPACITY_PREFIX and RATE_PREFIX for bugzilla #700 (RB #116).
========== ejournal.py ====================================================================================
1) fix "Id" keyword (it was mistyped as "I") 2) remove duplicated code as per RB : http://uqbar/reviews/r/38/
added Id keyword per code review discussion RB : http://uqbar/reviews/r/38/ Bugzilla: http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=587
implementation of multithreaded file and volume clerk that use psycopg2 via edb
========== enstore_alarm_search_cgi.py ====================================================================================
correct HTML output (per incident INC000000021389)
========== delete_at_exit.py ====================================================================================
1) The os.path.isdir(), os.path.islink(), etc. functions call os.stat() which needs to be wrapped with file_utils.wrapper(). 2) Writes using --put-cache and --shortcut have had a patch to get the full path to PNFS files to send to the mover to avoid the mover's no NULL in path error. This patch generalizes this feature for reads too. 3) Another source of cyclic paths in PNFS has been found. 4) When --get-bfid and --skip-bfid are used together, honor the skip pnfs part. (bugzillas 1039, 1043, 1044 and 1045; review board 347)
This is Vijay's delete_at_exit Chimera integration patch. (review board #105)
Use python's default handling of SIGPIPE. We really want python to raise an IOError instead of calling the signal handler to exit the program. (bugzilla #741, review board #137)
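As an illustration of the behaviour described above: Python ignores SIGPIPE at startup, so a write to a pipe whose reader has gone away fails with an EPIPE error (IOError in Python 2, OSError in Python 3) instead of invoking a handler. The sketch below is not the delete_at_exit.py code, and the function name write_to_closed_pipe is hypothetical.

```python
import errno
import os
import signal


def write_to_closed_pipe():
    """Write to a pipe with no reader and return the resulting errno."""
    # SIG_IGN is the disposition Python itself installs at startup; it is
    # set explicitly here in case an earlier handler replaced it.
    signal.signal(signal.SIGPIPE, signal.SIG_IGN)
    read_fd, write_fd = os.pipe()
    os.close(read_fd)
    try:
        os.write(write_fd, b"data")
    except OSError as exc:
        # With SIGPIPE ignored, the failed write surfaces as an exception.
        return exc.errno
    finally:
        os.close(write_fd)
    return None


if __name__ == "__main__":
    print(write_to_closed_pipe() == errno.EPIPE)
```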
========== encp.py ====================================================================================
bumping version to v3_10d because of encpCut
1) The os.path.isdir(), os.path.islink(), etc. functions call os.stat() which needs to be wrapped with file_utils.wrapper(). 2) Writes using --put-cache and --shortcut have had a patch to get the full path to PNFS files to send to the mover to avoid the mover's no NULL in path error. This patch generalizes this feature for reads too. 3) Another source of cyclic paths in PNFS has been found. 4) When --get-bfid and --skip-bfid are used together, honor the skip pnfs part. (bugzillas 1039, 1043, 1044 and 1045; review board 347)
bumping version to v3_10c because of encpCut
Fixed a problem with encps causing Linux kernels to hang. The .(access)() paths of directories can be put together in a way that the filesystem does not form a tree, but a graph. The kernel implementation assumes only trees and hangs in a loop. This patch also addresses some issues with spurious errors from PNFS under high load. Incident tickets: INC000000056879, INC000000070546 Problem tickets: PBI000000000147, PBI000000000184 URL: https://plone4.fnal.gov/P0/Enstore_and_Dcache/developers/enstore-developers/documents/encp-investigation-of-inc000000056879-pbi000000000147/ Bugzilla ticket: http://www-enstore.fnal.gov/Bugzilla/show_bug.cgi?id=981 Review board: http://uqbar.fnal.gov/reviews/r/324/diff/#index_header
Include useful information in the LM receive error message of encp. (bugzilla #939, review board #267)
branches: 1.990.2; bumping version to v3_10 because of encpCut
Encp now sends PNFS or Chimera ID for multiple copy and --put-cache cases. This is necessary to make new DiskMovers happy. (bugzilla #860, review board #208)
These patches are all related to modifying find_pnfs_file.py to support Chimera instead of just PNFS. Also, modify encp to use the new functionality. (Bugzilla #839, review board #196)
Fixed a string 'replace("_copy_")' to be 'replace("_copy_", "")'. (bugzilla #835, review board #191)
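The fix above comes down to str.replace() requiring both the old and the new substring. A minimal sketch follows; the helper name strip_copy_marker and the sample file family name are hypothetical and only illustrate the call.

```python
def strip_copy_marker(file_family):
    """Remove the '_copy_' marker from a file family name (hypothetical helper)."""
    # str.replace() needs both arguments; file_family.replace("_copy_")
    # with a single argument raises TypeError.
    return file_family.replace("_copy_", "")


if __name__ == "__main__":
    print(strip_copy_marker("myfamily_copy_1"))
```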
The --skip-pnfs switch was not being honored when reading a file by BFID. (bugzilla #805, review board #171)
These are additional patches for Chimera integration into encp. The namespace.StorageFS class now uses the __class__ member variable to better become one of the chimera.ChimeraFS, pnfs_agent_client.PnfsAgentClient or pnfs.Pnfs class. (bugzilla #649)
Chimera integration changes
========== tapes_burn_rate_plotter_module.py ====================================================================================
handle situation when volume is present in status table of drivestat database and not present in volume table of enstoredb database http://uqbar/reviews/r/356 http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=1048
branches: 1.13.4; add slope in MB/s to burn rate plots
========== ratekeeper.py ====================================================================================
add handling of non-postgresql exceptions in update_slots function http://uqbar/reviews/r/306/ http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=997
========== cpio_odc_wrapper.py ====================================================================================
replaced string exceptions
========== accounting_client.py ====================================================================================
create new table "drive_data" and code that fills the data in this table. Needed for drive degradation studies. http://uqbar/reviews/r/179 http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=817
========== enstore_start.py ====================================================================================
branches: 1.71.2; BUG FIX: enstore_start.py. Not defined variable. msg was not defined on line 91: except (socket.error, socket.herror, socket.gaierror) (bz 927)
When there is some program running with server name argument, the server does not start with "enstore start .." command. Make changes in enstore_start.py to better parse the EPS output. (bz 788)
========== enstore_plots.py ====================================================================================
removed classes that were used to generate mount latency plots Bugzilla : http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=776 RB : http://uqbar/reviews/r/155/
========== volume_clerk_client.py ====================================================================================
introduce decimal space definitions http://uqbar/reviews/r/320/
========== mover_summary_plotter_module.py ====================================================================================
avoid histogram overflows, use solid lines to draw histograms
========== pnfs_agent.py ====================================================================================
1) A missing return in readlayer(). 2) KeyError used instead of e_errors.KEYERROR in readlayer(). 3) find_pnfs_file.BOTH should be enstore_constants.BOTH in find_pnfsid_path(). (bugzilla #993, review board #303)
Chimera integration changes
========== enstore_log_file_search_cgi.py ====================================================================================
correct HTML output (per incident INC000000021389)
========== en_eval.py ====================================================================================
1) Added the "systems" drop down menu. This provides the found selection of Enstore systems to display. 2) Fixed the --generate-messages-file switch to output consistent information. 3) Handle adding new movers on the fly to the display. 4) Removed --movers-file switch. With #2 and #3 this is obsolete. 5) Have --messages-file use the time information to space the replayed display updates in real time. (bugzilla #702, review board #117)
========== enmv.py ====================================================================================
Replace os.*() function calls with those from file_utils.py in enmv.py. (Bugzilla #1026, review board #335)
Added the --match-directory-file-family and --match-volume-file-family switches for enmv. (bugzilla #838, review board #195)
Put Chimera support into enmv. (bugzilla #740, review board #133)
Allow files on tapes that have had their file_families modified to be movable. This allows tapes squeezed together via migration, from tapes with multiple file families, to still be enmv-ed. (bugzilla #739, review board #132)
========== weekly_summary_report.py ====================================================================================
always specify username to database access API http://uqbar/reviews/r/63/ http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=630
========== scanfiles.py ====================================================================================
A migrated file that has its destination copy deleted by the user before it was scanned was giving incorrect reasons for the error. This is now reported as a warning that the destination just needs to be scanned for the source file to be marked deleted. (bugzilla #969, review board #286)
1) If a primary file (with multiple copies) was replaced with a new file an error was reported where everything is okay. (bugzilla #955) 2) Older files did not put the drive and CRC information into layer 4. These are not fatal to encp, but let's warn about them. (bugzilla #955) 3) For multiple copy files we need to look for the original copy's pnfsid in case they are not the same. (bugzilla #957)
Improved the performance of encp, migration and metadata scanning. (bugzilla #931, review board #262) This includes: 1) fewer PNFS/Chimera accesses setting file sizes for long filename files 2) handling PNFS databases that have multiple entry points 3) not hanging SLF4 kernels with paths like /pnfs/fs/usr/Migration/.(access)(000000000000000000001080)/Migration/.(get)(database) 4) processing the list of current mountpoints once (namespace.py) instead of twice, once for PNFS and once for Chimera. 5) cache found PNFS database starting directories and their associated .(get)(database) values 6) retrying ESTALE errors, since PNFS has been found to be inconsistent
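Item 6 above, retrying ESTALE errors, can be sketched as a small retry wrapper. This is an assumption-laden illustration rather than the scanfiles.py implementation: the name retry_on_estale and the retries and delay parameters are hypothetical.

```python
import errno
import time


def retry_on_estale(func, *args, retries=3, delay=0.0):
    """Call func(*args), retrying only when it fails with ESTALE."""
    for attempt in range(retries + 1):
        try:
            return func(*args)
        except OSError as exc:
            # Any other error, or ESTALE on the final attempt, is re-raised.
            if exc.errno != errno.ESTALE or attempt == retries:
                raise
            time.sleep(delay)


if __name__ == "__main__":
    import os
    print(retry_on_estale(os.stat, ".").st_mode)
```

Limiting the retry to ESTALE keeps genuine failures such as ENOENT visible to the caller immediately.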
1) Removed debugging print statements. 2) If find_pnfs_file.find_pnfid_path() raises an exception that includes a msg.filename value, include that information in the error output for easier investigation. (bugzilla #891, review board #232)
Added support for scanning Chimera filesystems instead of just PNFS. (Bugzilla #840, review board #197)
Fixed issue with check() needing the Interface object to be available globally. (bugzilla #850, review board #202)
Patched to better handle reporting if an unknown file is involved in migration. This should never be able to happen. (Bugzilla #723, Reviewboard #129)
Divide up the scanning of a volume across multiple threads. (Bug #572)
========== accounting_server.py ====================================================================================
create new table "drive_data" and code that fills the data in this table. Needed for drive degradation studies. http://uqbar/reviews/r/179 http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=817
always specify username to database access API http://uqbar/reviews/r/63/ http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=630
========== file_clerk.py ====================================================================================
branches: 1.290.4; pass database name to edb.FileDB and edb.VolumeDB constructors RB : http://uqbar/reviews/r/172 Bugzilla : http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=800
always specify username to database access API http://uqbar/reviews/r/63/ http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=630
1) multi threaded info server and file server w/ thread safe DB API 2) proper handling of exceptions DB transactions in edb Bugzilla #579 http://uqbar/reviews/r/28
bug fix: remove line that contained undefined variable Bugzilla : #575 RB : http://uqbar/reviews/r/25
implementation of multithreaded file and volume clerk that use psycopg2 via edb
========== esgdb.py ====================================================================================
always specify username to database access API http://uqbar/reviews/r/63/ http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=630
========== udp_server.py ====================================================================================
Add log messages to reply_with_list for a case when deepcopy fails.
========== edb.py ====================================================================================
http://uqbar/reviews/r/338/ some minor improvement when constructing insert query
Conversion of default data in different time zones using mktime may cause an OverflowError exception. This patch fixes this problem, which occurred in Russia. (bz 1020).
branches: 1.50.4; convert specialized dictionary returned by psycopg2 into ordinary dictionary on the fly. Fixes ticket: INC000000029450 http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=667 http://uqbar/reviews/r/88
BUG FIX (URGENT): replace utctimetuple() with timetuple() when handling datetime.datetime. http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=655 http://uqbar/reviews/r/83/
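One difference between the two calls that may be relevant here: for a naive datetime, timetuple() leaves tm_isdst at -1 so time.mktime() works out daylight saving time itself, while utctimetuple() forces tm_isdst to 0. The sketch below only demonstrates that difference; the helper name to_epoch is hypothetical and this is not the edb.py code.

```python
import datetime
import time


def to_epoch(dt):
    """Convert a naive local datetime to seconds since the epoch."""
    # timetuple() sets tm_isdst to -1 for naive datetimes, so mktime()
    # decides whether daylight saving time applies at that moment.
    return time.mktime(dt.timetuple())


if __name__ == "__main__":
    dt = datetime.datetime(2011, 7, 1, 12, 0, 0)
    print(dt.timetuple().tm_isdst, dt.utctimetuple().tm_isdst)
    # Round trip through the epoch value and back to a local datetime.
    print(datetime.datetime.fromtimestamp(to_epoch(dt)) == dt)
```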
always specify username to database access API http://uqbar/reviews/r/63/ http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=630
1) multi threaded info server and file server w/ thread safe DB API 2) proper handling of exceptions DB transactions in edb Bugzilla #579 http://uqbar/reviews/r/28
added standard header
implementation of multithreaded file and volume clerk that use psycopg2 via edb
conform to code standards
adhere to code standards
file_clerk: 1) in __tape_list use connection pool 2) spawn tape_list3 in separate thread 3) add two new parameters - max number of threads and maximum number of open database connections (which is passed to edb) edb: 1) add new data member connection pool (from DBUtils) 2) add new parameter - maximum open connection allowed in the pool
========== entv.py ====================================================================================
Make entv run on an SLF6 node. (bugzilla #1021, review board #325)
1) Turn the dictionary for storing client colors into a list. This allows the .entvrc file writer to choose the order that color rules are applied. For example, if the user wants cmsstor12 to have a special color, then that rule would need to go before the more general rule of cmsstor[1-9]* to display correctly. 2) Add implicit beginning of line (^) and end of line ($) regular expression characters to turn cmsstor12 into ^cmsstor12$ for matching. This has the effect of preventing nodes like cmsstor121 from inadvertently matching cmsstor12. (bugzilla #961, review board #280)
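The ordered, anchored matching described above can be sketched as follows. The rule list, the color names and the function client_color are hypothetical, and the real .entvrc format is not shown here.

```python
import re

# Hypothetical rules in the order a .entvrc author might list them:
# the specific host comes before the more general pattern.
COLOR_RULES = [
    ("cmsstor12", "red"),
    ("cmsstor[1-9]*", "blue"),
]


def client_color(hostname, rules=COLOR_RULES, default="black"):
    """Return the color of the first rule whose pattern matches the whole name."""
    for pattern, color in rules:
        # Implicit ^ and $ keep cmsstor12 from matching cmsstor121.
        if re.match("^" + pattern + "$", hostname):
            return color
    return default


if __name__ == "__main__":
    for host in ("cmsstor12", "cmsstor121", "d0ensrv3n"):
        print(host, client_color(host))
```

A list preserves the order in which rules are tried, which a dictionary of that era did not guarantee.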
Changes from comments made by Sasha in regard to bugzilla #916 (review board #249).
Bug 916 - BUG FIX: Don't restart entire process; NEW FEATURE: Allow for users to choose libraries.
1) Don't exit when a configuration server is not available. If the $ENSTORE_CONFIG_HOST configuration server is down, pull known Enstore systems from the ~/.entvrc file. 2) If the user iconified entv, don't un-iconify it at the next hourly re-init. (bugzilla #899, review board #242)
Added menu options to show waiting clients and to choose between the linear or circular layouts. (bugzilla #760, review board #143)
1) Added the "systems" drop down menu. This provides the found selection of Enstore systems to display. 2) Fixed the --generate-messages-file switch to output consistent information. 3) Handle adding new movers on the fly to the display. 4) Removed --movers-file switch. With #2 and #3 this is obsolete. 5) Have --messages-file use the time information to space the replayed display updates in real time. (bugzilla #702, review board #117)
1) Added the "systems" drop down menu. This provides the found selection of Enstore systems to display. 2) Fixed the --generate-messages-file switch to output consistent information. 3) Handle adding new movers on the fly to the display. 4) Removed --movers-file switch. With #2 and #3 this is obsolete. 5) Have --messages-file use the time information to space the replayed display updates in real time.
Prevent entv from crashing if a mover that no longer exists is still in the configuration. (bugzilla #691)
Sasha reported a case where a mover in error state was not being displayed correctly. Bugzilla bug #560 comment #2.
In total_memory(), the second sysconf() call should have been to get the size of a page in memory, instead of getting the number of pages in memory.
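A minimal sketch of the corrected calculation, assuming a POSIX system where os.sysconf() exposes SC_PAGE_SIZE and SC_PHYS_PAGES. The function name follows the log entry; the body is illustrative and is not the Enstore source.

```python
import os

def total_memory():
    """Return total physical memory in bytes (illustrative sketch)."""
    page_size = os.sysconf("SC_PAGE_SIZE")    # bytes per page
    page_count = os.sysconf("SC_PHYS_PAGES")  # number of physical pages
    # The bug described above amounts to multiplying the page count by itself
    # instead of by the page size.
    return page_size * page_count
```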
If the main thread is hung, restart the entire process.
Clean up some error messages.
Minimize sending show() queries to the inquisitor.
========== migrate.py ====================================================================================
Implemented the duplication cleanup_after_scan() function. Previously, it was a no-op. (bugzilla #973, review board #289)
Useful patch to have the migration logging functions use the Trace.py logging functions internally. This keeps the output in sync between threads. (bugzilla #931, review board #266)
Putting 2.6 million files on the command line won't work. Instead read them from a file specified with --infile in a separate thread and put the targets in a queue. (bugzilla #914, review board #248)
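The reader-thread-plus-queue pattern can be sketched as follows. This is written for Python 3 and is not the migrate.py code; read_targets(), consume_targets() and the one-target-per-line file layout are assumptions made for illustration.

```python
import queue
import threading

SENTINEL = None  # marks the end of the input

def read_targets(infile, target_queue):
    """Read one target per line and enqueue it; runs in its own thread."""
    with open(infile) as f:
        for line in f:
            target = line.strip()
            if target:
                target_queue.put(target)
    target_queue.put(SENTINEL)

def consume_targets(infile, handle, maxsize=1000):
    """Start the reader thread and pass each target to handle()."""
    # A bounded queue keeps memory use flat however long the input file is.
    target_queue = queue.Queue(maxsize=maxsize)
    reader = threading.Thread(target=read_targets, args=(infile, target_queue))
    reader.start()
    count = 0
    while True:
        target = target_queue.get()
        if target is SENTINEL:
            break
        handle(target)
        count += 1
    reader.join()
    return count
```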
Loop over the correct list of files in _make_copies(). (bugzilla #905, review board #245)
1) Fix --status output for duplicated bfids/volumes that have had the primary and multiple copies swapped. (bugzilla #876, review board #223) 2) Temporary file cleanup was able to miss some D0en files because of the "sam" in /pnfs/sam/dzero not just being /pnfs/dzero. (bugzilla #890, review board #223)
The BOTH, FS and NONFS constants were moved from find_pnfsid_file.py to enstore_constants.py. This patch reflects that change. (bugzilla #857, review board #205)
Handle re-migrating a file that has been replaced. This is most likely, from bfid1 -> bfid2 -> bfid3; then re-running the migration for bfid1. (bugzilla #845, review board #200)
Detect when a previous make-failed-copies attempt failed and reuse the destination copy. (bugzilla #835, review board #191)
final_scan_volume() is obtaining the current pnfs location for one test; and does so incorrectly. final_scan_file() is also obtaining this information, but does so correctly. The is_expected_volume() check is now moved to final_scan_file(). (bugzilla #816, review board #178)
1) The migrate --status output for a bad file that has not been involved in migration or multiple copies was being dropped. (bugzilla #798, review board #171) 2) There is a check to make sure that there is at least one destination copy before doing the restore. It should have been "> 0", but instead was "> 1". (bugzilla #799, review board #171) 3) Scanning a multiple copy made during migration by BFID was resulting in a traceback. (bugzilla #805, review board #171)
The patch adds --make-copies to duplicate.py: 1) Add support for --make-copies in option.py. This includes fixing an argument processing bug with a required argument first appearing in the extra values section of the switch definition. 2) Added make_copies() in migrate.py based on make_failed_copies(). 3) Reworked migrate.migrate() when --with-final-scan is used to join all read and write threads, then sort all the destination files by destination location, then start the final_scan threads. The old way worked, but proved to be very inefficient with tape access. An unrelated item also fixed: CDMS122444004900000 on VOB738 had multiple migrations run simultaneously. One set completed successfully, while the other did not. The code that determines if a source file is all done returns True, because it found a set of successful destination files. The Total done check fails because it finds the unsuccessful set of destination files. This patch gives the user the error, while previously the users only got "failed from previous errors" without there being a previous error. For all: bugzilla #791, review board #166
Fixed an issue with running --make-failed-copies for a deleted file on a machine without an admin PNFS path. (bugzilla #782, review board #158)
Adjust the checks in migrate_volume() to allow for a migration destination tape to be scanned after it has already been migrated to yet another tape. (bugzilla #779, review board #156)
Let's get the correct versions committed this time... 1) --restore switch now works for duplication 2) --status now has --migration-only and --multiple-copy-only modifying switches to show only a limited type(s) of status. 3) In addition to supplying a bfid or volume, the user can now give ":" for migrating/duplicating, scanning, status reporting or restoring. Reporting --status on a per file basis is also new. Previously, it only worked on volumes. 4) Added two new functions: is_migration_history_done() and is_migration_history_closed(). These functions return True if all source and destination pairs have their entries in the migration_history table. The _done() function only makes sure that all the entries exist. The _closed() function additionally makes sure all the "closed_time" fields are also filled in. (bugzilla #772, review board #153)
1) Fix two --status bugs. 2) Give a better error message when a ghost file is found during migration. 3) Avoid the full path lookup for a file that was already migrated; the information was being dropped without being used. 4) Catch errors updating migration_history.
Fixed issue involving migrating deleted files after patches for bugzilla #700 (1.216) and #715 (1.217). ENOENT is accurate for deleted files, but not correct for them either. This is for bugzilla #757, review board #139.
Set debugging flag to false after last patch was committed.
Fix duplication to handle deleted files. (bugzilla 715, review board 123)
New migration input mode and limit re-reads of restarted migrations. (bugzilla 700, review board 116)
Fixed is_duplicated() to have a flag that includes or excludes multiple copy files from consideration. (bugzilla 695, review board 113)
Improve --restore to handle files migrated from two migrations running simultaneously. (bugzilla #686, review board #109)
1) migrate.migrate_files() takes a list of bfids, not a list of file_records. 2) Handle files in active_file_copying table that have already been migrated. 3) Handle files in active_file_copying table that have had their original copies migrated. 4) Handle files in active_file_copying table that have an unknown deleted status. (bugzilla #639, review board #73)
Fixed --status output to handle showing multiple copy files, not just those from migration, duplication or cloning. (bugzilla #605, review board #48) Have the migration code add migration table information retroactively for files migrated to multiple copies. Also, show the multiple copies with a 1 in the S --status output column. (bugzilla #406, review board #48)
Address the queue size for migration. (bugzilla bug #571, review board #47)
Added feature that migrate.py --status can now output information on multiple copies not produced using duplication. (bugzilla bug #605, review board #48) Also, modified the --status help string to include information on the S, D and B fields.
========== encp_wrapper.py ====================================================================================
split string on empty space before passing it to function that expects list http://uqbar/reviews/r/340/ http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=1030
Make the logname and threadname values in Trace.py threadsafe. Migration is known to have an issue, so encp_wrapper and volume_assert_wrapper also are modified. (bugzilla bug #533)
========== summary_burn_rate_plotter_module.py ====================================================================================
handle situation when volume is present in status table of drivestat database and not present in volume table of enstoredb database http://uqbar.fnal.gov/reviews/r/357
========== volume_assert_wrapper.py ====================================================================================
Make the logname and threadname values in Trace.py threadsafe. Migration is known to have an issue, so encp_wrapper and volume_assert_wrapper also are modified. (bugzilla bug #533)
========== udp_client.py ====================================================================================
Additional patches to get entv for SLF6 to work. (bugzilla 1021, review board 325)
1) Added the "systems" drop-down menu. This provides the found selection of Enstore systems to display. 2) Fixed the --generate-messages-file switch to output consistent information. 3) Handle adding new movers on the fly to the display. 4) Removed --movers-file switch. With #2 and #3 this is obsolete. 5) Have --messages-file use the time information to space the replayed display updates in real time. (bugzilla #702, review board #117)
========== charset.py ====================================================================================
Fix issues with new style disk volumes and location cookies not being recognized correctly. (bugzilla #772, review board #153)
========== enstore_display.py ====================================================================================
Make entv run on an SLF6 node. (bugzilla #1021, review board #325)
1) Turn the dictionary for storing client colors into a list. This allows the .entvrc file writer to choose the order that color rules are applied. For example, if the user wants cmsstor12 to have a special color, then that rule would need to go before the more general rule of cmsstor[1-9]* to display correctly. 2) Add implicit beginning of line (^) and end of line ($) regular expression characters to turn cmsstor12 into ^cmsstor12$ for matching. This has the effect of preventing nodes like cmsstor121 from inadvertently matching cmsstor12. (bugzilla #961, review board #280)
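The ordered, anchored matching described above can be sketched like this. The rule format, the function names and the default color are assumptions for illustration; the real .entvrc syntax is not shown in these notes.

```python
import re

def compile_rules(rules):
    """Turn (pattern, color) pairs into anchored regexes, keeping their order."""
    return [(re.compile("^" + pattern + "$"), color) for pattern, color in rules]

def client_color(hostname, compiled_rules, default="gray"):
    """Return the color of the first rule whose pattern matches the whole name."""
    for regex, color in compiled_rules:
        if regex.match(hostname):
            return color
    return default

# A list (not a dictionary) preserves the order chosen by the file's author.
rules = compile_rules([
    ("cmsstor12", "red"),       # specific rule first
    ("cmsstor[1-9]*", "blue"),  # general rule second
])
```

With these rules cmsstor12 takes the specific color, while cmsstor121 falls through to the general rule because the anchored pattern ^cmsstor12$ no longer matches it.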
Changes from comments made by Sasha in regard to bugzilla #916 (review board #249).
Bug 916 - BUG FIX: Don't restart entire process; NEW FEATURE: Allow for users to choose libraries.
1) Don't exit when a configuration server is not available. If the $ENSTORE_CONFIG_HOST configuration server is down, pull known Enstore systems from the ~/.entvrc file. 2) If the user iconified entv, don't un-iconify it at the next hourly re-init. (bugzilla #899, review board #242)
Corrected some comments from code review. (bugzilla #760, review board #143)
Added menu options to show waiting clients and to choose between the linear or circular layouts. (bugzilla #760, review board #143)
1) Added the "systems" drop-down menu. This provides the found selection of Enstore systems to display. 2) Fixed the --generate-messages-file switch to output consistent information. 3) Handle adding new movers on the fly to the display. 4) Removed --movers-file switch. With #2 and #3 this is obsolete. 5) Have --messages-file use the time information to space the replayed display updates in real time. (bugzilla #702, review board #117)
Patched division-by-zero traceback when get_mover_list() returns an empty list. (bug #569)
Modified some numeric literals to be constants after reviewboard comments (see also bugzilla bug #560).
If the main thread is hung, restart the entire process.
Minimize sending show() queries to the inquisitor.
========== enstore_make_plot_page.py ====================================================================================
add Mounts/day per tape library plots http://uqbar/reviews/r/215/ http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=866
added module that plots mount/dismount latencies per robot per drive type RB http://uqbar/reviews/r/147/ Bugzilla http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=765
========== library_manager_client.py ====================================================================================
Added reset_pending_queue_counters() method and --reset-queue-counters option to reset the counters of the pending queue of data transfer requests: the number of accepted requests resets to the number of requests in the queue, and the number of deleted requests resets to 0.
========== histogram.py ====================================================================================
add Mounts/day per tape library plots http://uqbar/reviews/r/215/ http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=866
print warning if we get ValueError on sqrt().
make sure variance is never negative which was happening for large numbers. http://uqbar/reviews/r/244/ http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=901
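A sketch of the guard, assuming the variance is computed as the mean of squares minus the square of the mean (the histogram.py formula is not shown in these notes, so this is an assumption). Rounding can push that difference slightly below zero for large, nearly equal values, which makes sqrt() raise ValueError.

```python
import math

def mean_and_stddev(values):
    """Return (mean, standard deviation), clamping rounding-induced negatives."""
    n = len(values)
    if n == 0:
        raise ValueError("no values")
    mean = sum(values) / n
    mean_of_squares = sum(v * v for v in values) / n
    # E[x^2] - E[x]^2 can come out slightly negative in floating point
    # when the values are large and nearly equal, so clamp it at zero.
    variance = max(mean_of_squares - mean * mean, 0.0)
    return mean, math.sqrt(variance)
```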
========== volume_clerk.py ====================================================================================
reset volume comment on recycle http://uqbar/reviews/r/336/ http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=240
removed "limit 1" on query executed by find_matching_volume function. It had unintended consequence of drawing extra blank tape even if non-full tapes exist for a given file family http://uqbar/reviews/r/273/ http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=951
pass database name to edb.FileDB and edb.VolumeDB constructors RB : http://uqbar/reviews/r/172 Bugzilla : http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=800
thread safety: make sure that esgdb.SGDb that holds pg connection to database is accessed sequentially by multiple threads. Otherwise volume_clerk suffers instability and reports corrupted results from sg_count table http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=682 http://uqbar/reviews/r/107
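Serializing access to one shared resource is usually done with a lock. The class below is a hypothetical stand-in (a dictionary plays the role of the sg_count table); it is not esgdb.SGDb and does not show the actual fix.

```python
import threading

class SerializedCounterStore:
    """Hypothetical stand-in for an object that wraps one shared connection."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = {}  # stands in for the sg_count table

    def increment(self, key, delta=1):
        # Only one thread at a time may read-modify-write through the
        # shared resource; without the lock, updates can be lost.
        with self._lock:
            current = self._counts.get(key, 0)
            self._counts[key] = current + delta
            return self._counts[key]

    def get(self, key):
        with self._lock:
            return self._counts.get(key, 0)
```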
pass max_connections parameter to VolumeDB. Set max_connections to max_threads+1 http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=660 http://uqbar.fnal.gov/reviews/r/84
always specify username to database access API http://uqbar/reviews/r/63/ http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=630
fix enstore volume --history command: the introduction of to_char(time, ...) broke the client. Modified the SQL statement to provide the key name the client expects RB: http://uqbar/reviews/r/58/ http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=622
pass ticket to restricted_access http://uqbar/reviews/r/36 http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=586
it was noticed that sorting on same values could be unpredictable. Added label to the order clause to avoid that.
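The same idea expressed in Python rather than SQL: adding a unique label as the last sort key fixes the order of rows that tie on the primary key. The field name sort_value is hypothetical; the notes do not say which column the query orders by.

```python
def order_volumes(volumes):
    """Sort by the primary value, then by label so that ties have a fixed order."""
    return sorted(volumes, key=lambda v: (v["sort_value"], v["label"]))
```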
proper syntax
modify restricted_access to accept reply_address as argument
implementation of multithreaded file and volume clerk that use psycopg2 via edb
========== generic_server.py ====================================================================================
Improve the error reporting of serve_forever_error(). (bugzilla #678, review board #141).
========== plotter.py ====================================================================================
BUG FIX: fix make_sg_plot. After changes to option.py, plotter.py fails. This patch fixes the problem. http://uqbar/reviews/r/194/
added module that plots mount/dismount latencies per robot per drive type RB http://uqbar/reviews/r/147/ Bugzilla http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=765
========== mover.py ====================================================================================
In stream_write use convert_0_adler32_to_1_adler32 if crc base 0 failed to check if crc base 1 is correct. This is done the same way it is done in read_tape. Bugzilla 1035
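For background, an Adler-32 value computed from seed 0 can be converted to the standard seed-1 value from the byte count alone. The sketch below shows that arithmetic. It does not reproduce convert_0_adler32_to_1_adler32, whose signature is not given in these notes, and the name seed0_to_seed1 is made up for the example.

```python
import zlib

MOD_ADLER = 65521  # largest prime below 2**16, as used by Adler-32

def seed0_to_seed1(crc0, nbytes):
    """Convert an Adler-32 computed with seed 0 into the seed-1 value."""
    a0 = crc0 & 0xFFFF
    b0 = (crc0 >> 16) & 0xFFFF
    a1 = (a0 + 1) % MOD_ADLER       # the low half gains the initial 1
    b1 = (b0 + nbytes) % MOD_ADLER  # the high half gains 1 per byte processed
    return (b1 << 16) | a1
```

The result can be checked against zlib.adler32(data, 0) and zlib.adler32(data, 1) for any byte string.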
Modify mover.py so that it goes to OFFLINE state, if it was unable to connect on data socket. The mover sends a corresponding alarm. Bugzilla 1022
T10000C specific. 1. Set "Allow Maximum Capacity" (default: set) 2. Set compression (default: no compression)
Improvement: the alarm should say which mover for READ_VOL1_READ_ERR (bz 958).
Fixed bug for reopened bz 950.
Control calculation of CRC for read/assert operations. Default 0: calculate CRC when writing to network (was when reading from tape). Mover configuration parameter: "read_crc_control" value: 0/1
Mover did not assign the network interface specified by data_ip. As a result, if 2 movers were configured to run on the same host, they both were using the same default interface. This resulted in traffic 2 times slower if going into the same direction. This change fixes this problem and also addresses the case when encp (client) runs on the same host with mover. (bz 925)
Bug: Code updates file clerk and volume clerk dictionaries in transfer setup. So if the preempting request comes some key:value pairs may not get updated, leaving them unchanged from the previous request. This change fixes this problem (bz 870).
1. The mover currently records error count since tape mount in per file transaction. It must record deltas - errors related to this file transfer. 2. Drive rates must be calculated as file_size/io_time 3. Mover checks the drive rates: file_size/block_rw_time and compares it with a specified threshold per file transfer. If drive rate is lower than the threshold mover makes a record in suspect drives table in accounting db and sends an alarm. (bz 391)
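The three points above reduce to simple arithmetic, sketched here with hypothetical function names. How the mover obtains the counters and timings, and where it records a suspect drive, is not shown.

```python
def error_delta(errors_at_start, errors_at_end):
    """Errors attributable to one file transfer, not to the whole mount."""
    return errors_at_end - errors_at_start

def drive_rate(file_size, io_time):
    """Bytes per second for one transfer; None when no time was recorded."""
    if io_time <= 0:
        return None
    return file_size / io_time

def is_suspect(file_size, io_time, threshold):
    """True when the measured rate falls below the configured threshold."""
    rate = drive_rate(file_size, io_time)
    return rate is not None and rate < threshold
```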
If dismount fails close net_driver (data connection). If there is a preemptive dismount and net_driver is not closed, the client (encp) port does not get disconnected. This resulted in 15 min timeout for encp retry. When fixed, encp retries immediately.(bz #767)
Set volume to NOACCESS if e_errors.MC_VOLNOTFOUND is returned by media changer.(bz 641)
Legacy enstore disk file read was broken when the new local name convention was introduced, which resulted in inability to read old enstore disk files.
Bugzilla ticket 591 1. Code cleanup. 2. Moving parts common to the tape and disk movers into one place. 3. Changed volume name in DiskMover 4. Convert pnfs id into local file name for DiskMover
Code review for bz 552 changes.
use sysconf to get total RAM size
fixed formula for MAX_BUFFER it must be: MAX_BUFFER = long(mem_total)*KB - GB, because mem_total is in KB
fixed typo
fixed typo
Per bugzilla 552 code review changes.
run set_max_buffer() in a proper place
fixed set_max_buffer()
1. Set MAX_BUFFER depending on the python binary type: 32, or 64 bit. 2. Use enstore_functions2.shell_command
========== find_pnfs_file.py ====================================================================================
1) The os.path.isdir(), os.path.islink(), etc. functions call os.stat() which needs to be wrapped with file_utils.wrapper(). 2) Writes using --put-cache and --shortcut have had a patch to get the full path to PNFS files to send to the mover to avoid the mover's no NULL in path error. This patch generalizes this feature for reads too. 3) Another source of cyclic paths in PNFS has been found. 4) When --get-bfid and --skip-bfid are used together, honor the skip pnfs part. (bugzillas 1039, 1043, 1044 and 1045; review board 347)
Find paths for files that had originally been written with /pnfs/fnal.gov/usr/ style paths on nodes without /pnfs/fs mounted. (bugzilla #959, review board #277)
Improved the performance of encp, migration and metadata scanning. (bugzilla #931, review board #262) This includes: 1) fewer PNFS/Chimera accesses setting file sizes for long filename files 2) handling PNFS databases that have multiple entry points 3) not hanging SLF4 kernels with paths like /pnfs/fs/usr/Migration/.(access)(000000000000000000001080)/Migration/.(get)(database) 4) processing the list of current mountpoints once (namespace.py) instead of twice, once for PNFS and once for Chimera. 5) cache found PNFS database starting directories and their associated .(get)(database) values 6) retrying ESTALE errors, since PNFS has been found to be inconsistent
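Retrying on ESTALE (item 6) can be sketched as a small wrapper. The helper name, the retry count and the delay are assumptions; the wrapper actually used by find_pnfs_file.py is not shown in these notes.

```python
import errno
import time

def retry_on_estale(func, *args, retries=3, delay=0.0):
    """Call func(*args), retrying only when it fails with ESTALE."""
    for attempt in range(retries + 1):
        try:
            return func(*args)
        except OSError as exc:
            # Any other error, or ESTALE on the last attempt, is re-raised.
            if exc.errno != errno.ESTALE or attempt == retries:
                raise
            time.sleep(delay)
```

A call such as retry_on_estale(os.stat, path) would then tolerate a few stale-handle failures before giving up.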
Removed debugging print statements. (bugzilla #891, review board #232)
These patches are all related to modifying find_pnfs_file.py to support Chimera instead of just PNFS. Also, modify encp to use the new functionality. (Bugzilla #839, review board #196)
Call pnfs.get_enstore_pnfs_path() and pnfs.get_enstore_fs_path() in separate try...except blocks. If the admin path is not mounted, that shouldn't prevent the non-admin path from being used. (bugzilla #835, reviewboard #191)
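The point of the separate blocks, sketched with two arbitrary callables standing in for the two pnfs lookups. The exception type caught here (OSError) is an assumption.

```python
def collect_paths(lookup_a, lookup_b):
    """Run both lookups independently; one failing must not skip the other."""
    path_a = None
    path_b = None
    try:
        path_a = lookup_a()
    except OSError:
        pass  # this path is unavailable; still try the other one
    try:
        path_b = lookup_b()
    except OSError:
        pass
    return path_a, path_b
```

With a single try block around both calls, a failure in the first lookup would have skipped the second.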
========== enstore_pg.py ====================================================================================
always specify username to database access API http://uqbar/reviews/r/63/ http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=630
========== ftt.py ====================================================================================
Added interface for do_scsi_command
========== Trace.py ====================================================================================
Useful patch to have the migration logging functions use the Trace.py logging functions internally. This keeps the output in sync between threads. (bugzilla #931, review board #266)
Fixed issue with --do-print messages not including the Thread name. (bugzilla bug #616, review board #24)
Log the correct threadname for servers that care. (bugzilla 595)
Make the logname and threadname values in Trace.py threadsafe. Migration is known to have an issue, so encp_wrapper and volume_assert_wrapper also are modified. (bugzilla bug #533)
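One common way to keep per-thread values such as a log name from leaking between threads is threading.local(). This sketch shows the technique only; whether Trace.py uses it is not stated in these notes, and the function names are hypothetical.

```python
import threading

_thread_state = threading.local()  # each thread sees its own attributes

def set_logname(name):
    """Record the log name for the calling thread only."""
    _thread_state.logname = name

def get_logname(default="UNKNOWN"):
    """Return the calling thread's log name, or default if none was set."""
    return getattr(_thread_state, "logname", default)
```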
========== pnfs_agent_client.py ====================================================================================
1) The os.path.isdir(), os.path.islink(), etc. functions call os.stat() which needs to be wrapped with file_utils.wrapper(). 2) Writes using --put-cache and --shortcut have had a patch to get the full path to PNFS files to send to the mover to avoid the mover's no NULL in path error. This patch generalizes this feature for reads too. 3) Another source of cyclic paths in PNFS has been found. 4) When --get-bfid and --skip-bfid are used together, honor the skip pnfs part. (bugzillas 1039, 1043, 1044 and 1045; review board 347)
Do the first part of the check locally, as done in pnfs.py. This allows sending requests to the pnfs agent only when needed, thus reducing the traffic between pnfs agent client and pnfs agent. bz 1041
Fixed a problem with encps causing Linux kernels to hang. The .(access)() paths of directories can be put together in a way that the filesystem does not form a tree, but a graph. The kernel implementation assumes only trees and hangs in a loop. This patch also addresses some issues with spurious errors from PNFS under high load. Incident tickets: INC000000056879, INC000000070546 Problem tickets: PBI000000000147, PBI000000000184 URL: https://plone4.fnal.gov/P0/Enstore_and_Dcache/developers/enstore-developers/documents/encp-investigation-of-inc000000056879-pbi000000000147/ Bugzilla ticket: http://www-enstore.fnal.gov/Bugzilla/show_bug.cgi?id=981 Review board: http://uqbar.fnal.gov/reviews/r/324/diff/#index_header
call appropriate function in set_file_family (it was calling set_file_family_width ...) http://uqbar/reviews/r/224/ http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=877
These patches are all related to modifying find_pnfs_file.py to support Chimera instead of just PNFS. Also, modify encp to use the new functionality. (Bugzilla #839, review board #196)
These are additional patches for Chimera integration into encp. The namespace.StorageFS class now uses the __class__ member variable to better become one of the chimera.ChimeraFS, pnfs_agent_client.PnfsAgentClient or pnfs.Pnfs class. (bugzilla #649)
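Reassigning __class__ lets a generic front-end object turn into a concrete backend after construction. The classes and the backend argument below are hypothetical; how namespace.StorageFS actually chooses between its three targets is not described in these notes.

```python
class ChimeraLike:
    def describe(self):
        return "chimera"

class PnfsLike:
    def describe(self):
        return "pnfs"

_BACKENDS = {"chimera": ChimeraLike, "pnfs": PnfsLike}

class StorageFacade:
    """Generic entry point that becomes one of the backend classes."""

    def __init__(self, path, backend):
        self.path = path
        # After this assignment the instance behaves as the chosen class;
        # attributes set so far (such as path) are kept.
        self.__class__ = _BACKENDS[backend]
```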
Chimera integration changes
Add missing chown() wrapper functions.
========== alarm_server.py ====================================================================================
alarm server creates alarm entry if no match in mail action is found http://uqbar/reviews/r/211/ http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=863
========== discipline.py ====================================================================================
Changes driven by pychecker.
Code review recommended changes (bugzilla ticket 388) 1. self.dict -> self.discipline_dict 2. in match_found added comments describing default return values 3. replaced string.atoi(..) with int(...) 4. More readable ticket in unit test
Discipline, modified for better library manager performance. This discipline requires a new format in configuration; it is not backward compatible with the old discipline.
corrected configuration example
========== rawUDP.py ====================================================================================
define ret and request_id so that they exist if an exception occurs. (bz 917)
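The pattern behind this fix, shown on a made-up parser: names used after a try block are bound before it, so the code that follows an exception does not hit a NameError. The message format and the function name are assumptions, not the rawUDP.py code.

```python
def parse_request(raw):
    """Split 'id:payload'; return (None, None) when the message is malformed."""
    ret = None         # bound up front so both names always exist
    request_id = None
    try:
        request_id, payload = raw.split(":", 1)
        ret = payload.strip()
    except ValueError:
        pass  # malformed message: keep the defaults
    return request_id, ret
```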
========== drivestat_server.py ====================================================================================
extract and pass database user parameter to drivestat database (it was overlooked in previous attempt to address this issue) Bugzilla: http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=676 RB: http://uqbar/reviews/r/104
========== configuration_server.py ====================================================================================
This was initially reported by PIC when trying to access the configuration server from a node running in a different domain. The request to the configuration server was rejected with: Fri Mar 18 15:54:41 2011 003336 enstore E configuration_server attempted connection from disallowed host 192.168.20.38 (bz 999)
========== media_changer.py ====================================================================================
In the STK implementation of listSlots(), define the inaccessible and reserved variables when there is an error to avoid NameErrors when used later on. (bugzilla #1002, review board #315)
Catch exception in STK listVolumes(). (bugzilla #960, review board #279)
Additional comment changes for the ACSLS 7 and 8 --list-clean and --list-volumes differences. (bugzilla #897, review board #240)
Added comments documenting the format change for "query volume all" and "query clean all" between ACSLS version 7 and 8 of the STK media changer. (bugzilla #897, review board #240)
Two problems were found with media_changer.py working with the new version of ACSSA (1.8): --list-clean fails; --list-volumes returns wrong lines. The corresponding media_changer methods were fixed. (bz 886)
Fixed three occurrences where an error causes a query request to remain in the queue because the r_a field no longer exists in the ticket. (bugzilla #737, review board #212)
In MTX_MediaLoader the remote command arguments were in the wrong order. (bz 867)
If media is not in home position do not retry to mount it and do not try to unmount it. (bugzilla 766)
1) There were some inconsistent uses of msg versus message that were found to lead to uninitialized variables. 2) When removing a request from the list of active work items, make sure there is an 'r_a' field. Log the error if it is not there. (bugzilla #737, reviewboard #134)
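The 'r_a' check in item 2 above might look roughly like the following sketch. The data layout is an assumption: remove_work() is a hypothetical helper, and the work list is modelled as a plain list of dictionaries, which may differ from the structures media_changer.py actually uses.

```python
import logging


def remove_work(work_list, ticket):
    """Remove the work item whose 'r_a' matches the ticket's.

    Returns True if an item was removed, False otherwise.
    """
    if 'r_a' not in ticket:
        # Log the problem instead of raising KeyError further down.
        logging.error("ticket has no 'r_a' field: %r", ticket)
        return False
    for item in list(work_list):  # iterate over a copy while removing
        if item.get('r_a') == ticket['r_a']:
            work_list.remove(item)
            return True
    return False
```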
Changed STK media changer to send e_errors.MC_VOLNOTFOUND when "Unreadable label" is returned by ACSSA. (bz 641)
========== enstore_functions2.py ====================================================================================
Fixed a problem with encps causing Linux kernels to hang. The .(access)() paths of directories can be put together in a way that the filesystem does not form a tree, but a graph. The kernel implementation assumes only trees and hangs in a loop. This patch also addresses some issues with spurious errors from PNFS under high load. Incident tickets: INC000000056879, INC000000070546 Problem tickets: PBI000000000147, PBI000000000184 URL: https://plone4.fnal.gov/P0/Enstore_and_Dcache/developers/enstore-developers/documents/encp-investigation-of-inc000000056879-pbi000000000147/ Bugzilla ticket: http://www-enstore.fnal.gov/Bugzilla/show_bug.cgi?id=981 Review board: http://uqbar.fnal.gov/reviews/r/324/diff/#index_header
Library manager incorrectly compared encp versions beginning with v3_10. (bz 921)
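The commit log does not say how the comparison went wrong. One way such a miscomparison can arise (an assumption, not confirmed by the log) is plain string comparison, where "v3_10e" sorts before "v3_9". The sketch below compares parsed numeric fields instead; parse_encp_version() and version_at_least() are hypothetical helpers, not functions from enstore_functions2.py.

```python
import re


def parse_encp_version(version):
    """Split a string such as 'v3_10e' into (3, 10, 'e')."""
    match = re.match(r"^v(\d+)_(\d+)([a-z]*)$", version)
    if match is None:
        raise ValueError("unrecognized version: %r" % (version,))
    major, minor, suffix = match.groups()
    return (int(major), int(minor), suffix)


def version_at_least(version, minimum):
    # Tuples compare field by field, so 10 > 9 numerically.
    return parse_encp_version(version) >= parse_encp_version(minimum)
```

For example, version_at_least("v3_10e", "v3_9") is true, whereas the string test "v3_10e" >= "v3_9" is false.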
========== log_finish_event.py ====================================================================================
========== info_server.py ====================================================================================
pass database name to edb.FileDB and edb.VolumeDB constructors RB : http://uqbar/reviews/r/172 Bugzilla : http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=800
fix typo that was causing the "enstore info --file ..." command to fail Bugzilla: http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=747 RB: http://uqbar/reviews/r/138/ Remedy: INC000000037413
always specify username to database access API http://uqbar/reviews/r/63/ http://www-ccf.fnal.gov/Bugzilla/show_bug.cgi?id=630
1) multi threaded info server and file server w/ thread safe DB API 2) proper handling of exceptions DB transactions in edb Bugzilla #579 http://uqbar/reviews/r/28
========== chooseConfig ====================================================================================
removed Chih from gccen mail