Dell EqualLogic PS Series Firmware Version 6.0.6 Fix List: Issues Corrected in Version 6.0.6
Note: Items that have the potential for significant impact on system availability or data integrity are marked [CRITICAL].
Hardware
- After removing the array that was previously the group lead, the group could no longer send syslog messages. [Tracking #: 562098]
- Some PS41x0 and PS61x0 arrays incorrectly reported a low control module clock battery charge by issuing the following error message: Time-of-day clock battery voltage is low. [Tracking #s: 301264, 297744, 311890, 332643, 328593, 410414, and 412124]
- On EqualLogic PS-M4110 blade storage arrays, fans sometimes revved up to a high RPM and then returned to normal. The getfanreqinfo section of the CMC logs showed normal fan request percentages, but the airflow messages indicated that one of the arrays had requested 82 or 100 percent.
Volumes
- When running SyncRep and switching over between the SyncActive and SyncAlternate pools, the system sometimes became unresponsive for an undetermined amount of time. This condition was linked to volume renames after a switchover followed by a switchback, and also occurred with SyncRep switches under lower block-size I/O.
- Several fixes and improvements address conditions that could leave a volume unavailable due to SyncRep after enabling or disabling SyncRep on a collection, or after attempting a switch to the SyncAlternate pool. [Tracking #: 746454]
- Corrected an issue in which an internal task pause could result in a stuck page-move process while member free space was running low.
- Improved an internal algorithm to distribute connections more evenly across the members of the same pool.
Hardware
- In certain conditions, a fan failure warning event may have been displayed at various intervals with the following message: Fan speed is outside operating limits. The condition may then have cleared itself seconds later.
- On PS41x0 and PS61x0 platforms, controller modules may have reported invalid sensor readings as a sensor temperature below the operating limit. [Tracking #s: 420027 and 549161]
- Corrected an issue that could cause a passive controller to reboot spontaneously, temporarily affecting array redundancy. [Tracking #: 749774]
- [CRITICAL] On very rare occasions, during controller bootup, new RAID labels may have been written to the drives, resulting in data loss. [Tracking #s: 719252 and 725264]
- In very rare circumstances, certain use of IP payload compression may have generated an inappropriate response at the target, which could result in an unexpected controller failover. [Tracking #: 706460]
- On rare occasions when using Offloaded Data Transfers (ODX) with Windows Server 2012 initiators, a specific WriteUsingToken command could generate an inappropriate response at the target that may have resulted in a controller failover. (See the T10 specifications regarding the WriteUsingToken command.) [Tracking #: 762035]
- An improvement eliminates a cosmetic power supply fan speed error message when an array operates in high-altitude installations. [Tracking #: 687218]
- Legacy platforms, including Type 2 and Type 5 controller modules, may have experienced unnecessary copy-to-spare of valid drives when a user ran SanAssist or diagnostics. [Tracking #: 758190]
- CPU errata have been reviewed for PS(M)41x0, PS61x0, PS65x0, and PS60x0 platforms to prevent a watchdog reset that would have triggered an unexpected controller failover.
- After a copy-to-spare process for a SMART-tripped drive, a replacement drive may have been displayed as "history of failure" instead of becoming a hot spare. [Tracking #s: 796873 and 749774]
Replication
- If SyncRep was in use but had been paused on a volume for a long time, it could result in either resource exhaustion on the controller or an unexpected controller failover.
- When using snap replication, the behavior of the unmanaged flag for temporary promotion, temporary promotion with failback, and permanent promotion has been revised to fix an issue in which the flag was inadvertently not removed. [Tracking #s: 424189, 619340, and 292232]
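The resource-exhaustion mode described above (SyncRep paused for a long time while writes continue) can be illustrated with a minimal, hypothetical Python sketch: while replication is paused, each write must be remembered for later resynchronization, so an unbounded record eventually exhausts controller resources. One common mitigation, shown here with invented names, is to cap the fine-grained record and fall back to a full resync. This is not the array's actual implementation.

```python
class PausedSyncTracker:
    """Track dirty regions written while sync replication is paused.

    If the record would exceed a cap, drop the fine-grained state and
    fall back to a coarse 'full resync required' flag instead of
    growing without bound (the failure mode described in the fix list).
    """

    def __init__(self, max_regions: int = 4):
        self.max_regions = max_regions
        self.dirty = set()        # region ids touched while paused
        self.full_resync = False  # coarse fallback state

    def record_write(self, region: int) -> None:
        if self.full_resync:
            return                # already degraded to full resync
        self.dirty.add(region)
        if len(self.dirty) > self.max_regions:
            self.dirty.clear()    # bound memory use
            self.full_resync = True

    def regions_to_resync(self):
        """Return the regions to copy on resume, or 'ALL' if the cap was hit."""
        return "ALL" if self.full_resync else sorted(self.dirty)
```

The design choice sketched here trades resynchronization efficiency for bounded memory, which is one generic way to avoid the unbounded growth that a long pause would otherwise cause.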
Volumes
- Corrected a condition that could trigger unnecessary redistribution of iSCSI sessions to the same array member when volume distribution had not changed. [Tracking #s: 634504, 667173, 673207, 742967, and 777614]
- In certain scenarios, disabling/enabling or pausing/resuming SyncRep-enabled volumes may have left a volume in an invalid internal state, requiring support intervention to restore its availability to the initiators.
- [CRITICAL] In very rare circumstances, after a failover on arrays with multiple RAID LUNs, an internal log may have been inadequately committed, affecting volume page consistency. [Tracking #s: 694268 and 261429]
Other Issues
- A PagedPool resource allocation method was revised to prevent resource exhaustion in certain scenarios. [Tracking #s: 797010, 700072, and 796918]
- In very large Active Directory implementations with the EqualLogic FS7xx0 and one or more PS Series arrays, repeated user/group list enumeration requests may have caused CIFS authentication failures and disconnects. A CLI support option was introduced to turn the enumeration on or off. [Tracking #s: 793483 and 814123]
Hardware
- An issue occurred with Self-Encrypting Drives (SED) in which the array incorrectly marked the spare drive as failed and needing replacement. [Tracking #: 656506]
- When using Host Integration Tools 4.5, after upgrading to firmware version 6.0.x, a controller restarted. [Tracking #s: 714079, 679484, 705740, 732366, 760900, 760896, and 764987]
- On a PS6110 or PS4110 array, a vertical failover followed by a passive controller failure or reboot may have caused the entire array to go offline.
- [CRITICAL] While processing a preemptive drive removal request, RAID incorrectly attempted to process more than one drive failure request, generating a failure event and possible outages. [Tracking #s: 606292, 650187, 689710, 689806, 724011, 695665, 720342, and 753459]
Networking
- A PS6110/PS4110 array with Data Center Bridging (DCB) enabled, in a correctly configured DCB network, could take up to three or four minutes to fail over after a network switch failure.
- LDAP Active Directory user names and group names are now limited to 63 ASCII characters. Prior to this release, ad-user/ad-group entries were limited to 26 characters.
- When settings negotiated with an incorrectly configured DCB switch resulted in no Ethernet flow control, the error message returned was unclear. The GUI DCB flow control warning message was improved to provide better guidance on the necessary switch changes.
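As an illustration of the new 63-character limit on ad-user/ad-group entries, here is a minimal validation sketch in Python. The function name and error messages are invented for this example and are not part of the array firmware:

```python
def validate_ad_name(name: str, limit: int = 63) -> str:
    """Validate an LDAP Active Directory user or group name against the
    63-ASCII-character limit described in the fix list (previously 26).

    Hypothetical helper for illustration only; returns the name if valid.
    """
    if not name.isascii():
        raise ValueError("name must contain only ASCII characters")
    if len(name) > limit:
        raise ValueError(f"name exceeds the {limit}-character limit")
    return name
```

For example, a 63-character name passes, while a 64-character name is rejected with a ValueError.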
Replication
- Under certain conditions, a retry request involving the automatic balancing functionality and replication resulted in an unexpected controller failover. [Tracking #: 738935]
- A user was unable to rerun replication. For current volumes, the process stopped with a status of in-progress. For new volumes, the following error was displayed: Bad password or partner name specified. Check the spelling and case of the group name and password.
- If a user paused SyncRep for prolonged periods of time (more than 24 hours), even on a single volume, the array became unresponsive.
- Under certain conditions, after promoting and demoting replicas, the following was displayed: Replica set with the same name already exists on the partner. Remove it on the partner. However, the phantom replica set could not be deleted. [Tracking #: 60746]
- In certain conditions when using the EqualLogic Storage Replication Adapter for VMware Site Recovery Manager (SRM), a reprotect operation failed due to an orphan MIB entry.
- When promoting a replica of a promoted replica, the volume ID of the second promoted replica did not match the volume ID of the original volume; instead, it matched the volume ID of the first promoted replica.
- If a user restarted MgmtExec on the source array while replication was in progress, a kernel exception occurred.
User Interface
- The default value displayed in the web interface and Remote Setup Wizard was RAID 50, which differs from the Dell-recommended RAID 6 policy.
- The Add Pair operation was not permitted while a node was detached. The following message was displayed: NAS controller {0} is detached. You must attach the controller to perform this operation.
Volumes
- Intermittently, a volume became stuck in the "out of sync" state when switching to the SyncAlternate pool while one of the members of that pool was vacating.
- A snapshot of a volume with multihost access enabled displayed the following error if multiple hosts attempted to access the snapshot simultaneously: Initiator cannot access this target because an iSCSI session from another initiator already exists and multihost access is not enabled for this target. [Tracking #: 635738]
- If SyncRep was enabled on template volumes that had "in use" pages, clicking the Failover to SyncAlternate link caused the volume to go out of sync.
Other Issues
- After enabling the E-Mail Home functionality, the following INFO message was displayed in the event log every two minutes: E-Mail Home notification has been enabled. [Tracking #: 32985]
- In arrays running firmware versions 6.0.0 through 6.0.3, an unauthenticated user could potentially traverse the directory and access a file containing encrypted password information for the Dell EqualLogic storage array. No user data was exposed by this vulnerability; a copy of the encrypted passwords does not offer a useful path to attacking the system, and users could not modify or upload system files via this vulnerability. [Tracking #: 691433]
- The firmware update script did not check the suitability of all language kits before copying them, resulting in missing language kits after the upgrade.
Hardware
- Non-hardware drive errors on SATA drives led to preemptive drive removal. This issue affected controller types 7, 8, and 10 only, which shipped in PS4000E, PS6000E, PS6010E, PS6500E, and PS6510 arrays. [Tracking #s: 383341, 380686, 358418, 391004, 344726, and 374075]
- Under rare conditions on a PS-M4110 array, while the OS was booting or transitioning from passive to active, a NULL pointer in the status page address caused a controller to restart out of sequence.
- If an array with SED drives was not set up correctly and a reboot was performed during the setup process, SED key sharing was disrupted, resulting in a failed health condition of sed_unresolved.
Replication
- When synchronous replication was enabled, if a user paused the synchronous replication for a long time while continuing to write to the volume, the secondary controller may have restarted.
- If a user added and removed volumes in an existing SyncRep collection and a SyncRep switch then occurred, one or more volumes changed to the state "unavailable due to SyncRep." Once this state was reached, the volumes were stuck and could not be deleted.
Other Issues
- If a MAC address was not resolved, packets were put in the ARP hold queue until a response to the ARP request was generated.
- An attempt to create a share directory with a space in the name failed with the following error: Character " " is not allowed.
- Under rare circumstances, a drive mirror operation caused the array to become unresponsive.
- In the Replication section of the graphical user interface (GUI), when an individual outbound replica container was selected, the help button in the upper right corner of the screen did not contain the correct information.
- An intra-group connection failure on a PS-M4110 in a DCB network caused volumes to go offline and become inaccessible. After about four or five minutes, the connection resumed and the volumes came back online.
- Under rare circumstances, a PS6x10 array would not restart after a disk drive replacement. [Tracking #s: 193506, 225473, 141950, 146411, and 154775]
- During a firmware upgrade, before users issued a restart command, I/O performance dropped significantly. [Tracking #: 23320]
- Following a firmware update, the secondary controller in a PS61xx array erroneously reported a power supply fan failure every few hours. [Tracking #: 356374]
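The ARP hold queue mentioned in this section can be sketched as follows. This is a generic, hypothetical Python model of the idea (packets destined for an unresolved IP are buffered until the ARP reply supplies the MAC address), not the firmware's implementation; all class and method names are invented:

```python
from collections import defaultdict, deque


class ArpHoldQueue:
    """Buffer outbound packets for IPs whose MAC is not yet resolved,
    then flush them in order once the ARP reply supplies the MAC."""

    def __init__(self):
        self.pending = defaultdict(deque)  # ip -> packets held for it
        self.arp_table = {}                # ip -> resolved mac

    def send(self, ip, packet, tx):
        """Transmit via tx(mac, packet) if resolved; otherwise hold."""
        mac = self.arp_table.get(ip)
        if mac is None:
            self.pending[ip].append(packet)  # hold until ARP resolves
        else:
            tx(mac, packet)

    def on_arp_reply(self, ip, mac, tx):
        """Record the resolution and flush any held packets for this IP."""
        self.arp_table[ip] = mac
        while self.pending[ip]:
            tx(mac, self.pending[ip].popleft())
```

A real stack would also age out held packets and cap the queue length, but the sketch shows the core hold-then-flush behavior the fix list describes.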
Group Security
OpenSSH has been updated to OpenSSH_5.0 NetBSD_Secure_Shell-20080403+, OpenSSL 0.9.9-dev 09 May 2008, with the sha2 fix (NetBSD-SA2009-012) applied, in order to correct security vulnerability issues. [Tracking #: 79575]
Initiator Connections
- Disrupted iSCSI connections may not have been recovered immediately in groups with large numbers of iSCSI initiator connections attempting to connect simultaneously.
- Even though a dedicated management network port was defined, NTP traffic traveled over the Ethernet ports used for iSCSI traffic. [Tracking #: 122433]
- Users were seeing excessive warnings for connection limits. [Tracking #: 121385]
- Replication performance was degraded when using high-latency WAN links. [Tracking #: 334970]
- In SAN environments using the Host Integration Tools, iSCSI connections to the group were dropped when an internal process unexpectedly restarted. [Tracking #s: 301700, 315785, and 400833]
User Interface
- The CLI command account show active was corrected to display remote and local IP addresses.
- The GUI's Ethernet interface error status was not cleared until the member was rebooted. [Tracking #s: 107065, 97610, 97465, 96996, 94501, 75048, 93895, 93517, 93135, and 88161]
- Users with read-only accounts could not change their own passwords. [Tracking #: 134062]
- If a pool movement failed, the GUI continued to report the movement's status as "in-progress."
- A user was able to delete a member of a group that was offline, which caused a loss of data. Deleting an offline member of a group now requires interaction with the Dell EqualLogic customer support team, to prevent data loss. [Tracking #: 151373]
- In rare instances, after a firmware upgrade, grpadmin could not manage pools using the CLI. The following error was returned: Administrator grpadmin is not allowed access to pool defaults. [Tracking #: 155201]
- In rare instances, a resource used by the network management process could be exhausted, causing slow GUI response. [Tracking #s: 155970, 326406, 121237, 172614, and 154843]
- The Group Manager GUI allowed the "Thin Provisioned Volume" option to be set when cloning a volume that was not thin provisioned. The resulting clone was not thin provisioned, but was listed as such in the GUI. This option is no longer available; you must first clone, and then thin provision, a volume. [Tracking #s: 359992, 316424, 318855, 23490, 153934, 176714, and 218976]
- Using the GUI to perform a pool move and a change from normal to thin-provisioned volume at the same time resulted in the volume having 0% volume reserve and 0% snapshot reserve. [Tracking #s: 176955, 298161, and 191887]
- The Group Manager GUI could not display information about a storage pool, and indefinitely displayed a "Retrieving data" status message, when merging pools or deleting them from the group.
- In rare instances, the Group Manager GUI displayed snapshots associated with previous replication operations that had been interrupted. [Tracking #s: 195934 and 139176]
- The Group Manager GUI erroneously displayed the following error message when diagnostics were run from the GUI while a CLI session was left open: Unknown error 13 returned by the server. [Tracking #s: 177329, 215508, and 311220]
- A user could not log in to the Group Manager GUI. [Tracking #: 187466]
- The Group Manager GUI and CLI incorrectly indicated that the construction of a RAID set remained at 0%, even after the operation had completed. [Tracking #: 332409]
Hardware
- Under rare conditions, a read error on a disk caused it to go offline. [Tracking #s: 290764, 301337, 433799, and 263746]
- Under rare conditions, the array panicked during a power cycle. When this occurred, the following message was displayed: Panic recovery from CPU0 with reason 'added invalid SAS address'. [Tracking #: 336304]
- The PS65xx arrays did not always reset the amber drive warning light after a bad disk drive was replaced. [Tracking #s: 271703, 270105, 271189, 267520, 290665, 300213, 307085, and 323951]
- Under certain conditions, a PS65x0 channel card failure did not fail over to the secondary channel card. [Tracking #: 405829]
- After a hardware fault, the PS6100 primary active controller sometimes hung and did not fail over. [Tracking #s: 429293, 391583, 411766, and 483775]
Replication
- If replication was paused and the volume size was changed on the primary group, a restart of the replication failed after a failback of the original volume from the secondary to the primary group. [Tracking #: 233412]
- The volume description did not replicate with the volume name. [Tracking #: 840561086]
- If a VMware SRM volume was replicated and promoted, the user could not connect to the replica.
Volumes
- If a volume move operation failed due to a source pool space problem, it did not resume even when the free space issue was resolved. [Tracking #: 32356]
- A user could not cancel a volume move operation that was in progress but not working correctly. [Tracking #: 327549]
- Space balancing when free space was low was not always effective. I/O errors and/or volume disconnects occurred when the pool approached 0 free space. [Tracking #s: 150830, 162772, 179068, 200269, 202159, 228707, and 241119]
- Volume replication became stuck in-progress while trying to replicate a volume containing lost blocks. [Tracking #: 401773]
- If a reset command was invoked for an array, users were not warned about what data was to be destroyed (the array name, the IP addresses, and any volumes on the member). Note that Dell strongly recommends that users not reset a running array; use the delete member command instead, which moves data from the array to other arrays and automatically resets the array. [Tracking #: 249528]
- Using the CLI, users could create a volume with a thin-growth-max of 0%, which is below the allowed minimum of 10%. [Tracking #: 268307]
- When the array attempted to automatically balance a 30MB volume across two members, it failed. Volumes larger than 30MB were not affected. [Tracking #: 294635]
- In rare circumstances, volumes went offline during a member update. When this occurred, notifications were generated but not sent to the user. [Tracking #: 356009]
- In rare circumstances, during inter-array network problems with multiple volume moves, the GUI became unresponsive and volume moves could not continue. [Tracking #: 375991]
- In certain instances, volumes marked with lost blocks, but still online, did not allow new iSCSI initiator logins after the lost blocks were cleared. [Tracking #: 378045]
- Free space balancing sometimes did not occur after a firmware update. [Tracking #: 338059]
- During firmware upgrade procedures, while migrating members into a maintenance pool, acknowledgments of cancelled tasks were not requested from participating members. This resulted in overlapping page-movement tasks and, ultimately, unexpected controller failovers. [Tracking #: 416721]
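The thin-growth-max fix described above (the CLI accepted 0%, below the allowed minimum of 10%) amounts to a simple range check. A hedged sketch of the kind of validation involved, with invented names that are not part of the actual CLI code:

```python
def check_thin_growth_max(percent: float, minimum: float = 10.0) -> float:
    """Reject thin-growth-max values below the documented 10% minimum.

    Hypothetical illustration of the bound the fixed CLI now enforces;
    returns the value unchanged when it is acceptable.
    """
    if not 0.0 <= percent <= 100.0:
        raise ValueError("thin-growth-max must be a percentage (0-100)")
    if percent < minimum:
        raise ValueError(f"thin-growth-max must be at least {minimum:g}%")
    return percent
```

With this check in place, a request for 0% fails immediately instead of creating a volume with an out-of-range growth limit.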
Other Issues
- Under a rare condition, such as an extremely heavy workload on the group or an extremely poor network environment, the iSCSI connection count could get out of sync. [Tracking #: 80616]
- The UNIX Secure Copy (SCP) command failed to transfer large segments of diagnostics. [Tracking #: 86317]
- Free space and provisioning warning messages were reported too often during normal array operations. [Tracking #s: 160843, 78575, 295963, and 348799]
- When applying a config.cli (from the CLI save-config command) to a fresh configuration, the following error was generated: Error: Bad command.
- When a volume was thin provisioned and then immediately moved to another pool, the volume's snapshot reserve showed 0%. [Tracking #: 176955]
- A single-member group became unresponsive when communications with a RADIUS server were disrupted. [Tracking #: 211274]
- The crypto-legacy-protocols setting was being saved incorrectly with save-config.
- A network resource issue occurred during times of severe ARP flooding. [Tracking #: 191806]
- When a dedicated management port was enabled, bringing down eth0 caused sessions to disconnect on the management network. [Tracking #: 347770]
- In certain situations, volumes that had had lost blocks cleared in the past were taken offline and incorrectly marked as having lost blocks. [Tracking #s: 348142, 217145, 373054, and 442742]
- A timing issue caused RADIUS authentication to fail when there was a large number of volumes on the SAN. [Tracking #: 343065]
- The Remote Setup Wizard displayed an incorrect serial number for PS4100 and PS6100 arrays.