Wednesday, February 27, 2019

NetBackup status code: 54 - Error Code details and Solution (Recommended Actions to resolve)



NetBackup status code: 54

Message: timed out connecting to client

Explanation: The server did not complete the connection to the client. The accept system or winsock call timed out after 60 seconds.

Some third-party software packages (for example, a personal firewall product) can affect the TCP/IP stack in Windows. This action can cause a loss of connection between the NetBackup server and the bpcd process on the client. NetBackup tries to set SO_REUSEADDR (allow local address reuse) on the inbound socket connection so that the port can be handed off from bpinetd.exe (the NetBackup Client Service) to bpcd.exe. Some products may not allow this functionality due to the various methods that can be used to violate system security.

Recommended Action: Do the following, as appropriate:

  • For a Macintosh or NetWare target client: Verify that the server does not try to connect when a backup or restore is already in progress on the client. These clients can handle only one NetBackup job at a time.

    On a Macintosh, check for activity by examining the NetBackupListen file in the following folder on the startup disk of the Macintosh client:

    :System Folder:Preferences:NetBackup:logs:inetd:log.mmddyy  
  • Perform the following procedure:

    See "Resolving network communication problems" in the Troubleshooting Guide.

  • On UNIX and Linux clients, verify that the /usr/openv/netbackup/bin/bpcd binary exists and that it is the correct size.

  • Check the /etc/inetd.conf file to make sure the bpcd path is correct in the following entry:

    bpcd stream tcp nowait root /usr/openv/netbackup/bin/bpcd bpcd  
  • On the systems that include the following, make sure that the client name is in the master's /etc/hosts file: NetBackup master, media, and clients (with NetBackup database extension products installed on one or more clients).

  • Completely uninstall the third-party software package on the client that causes the failure. Or, contact the software manufacturer to investigate if other configuration options or workarounds are possible.
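The /etc/inetd.conf check above can be scripted. The following is a minimal sketch, assuming the entry has the field layout shown above (service name first, server program sixth); the function names `find_bpcd_entry` and `bpcd_path_is_correct` are hypothetical helpers, not part of NetBackup.

```python
# Path the bpcd entry is expected to point to, as given in the text above.
EXPECTED_BPCD_PATH = "/usr/openv/netbackup/bin/bpcd"


def find_bpcd_entry(inetd_conf_text):
    """Return the server-program path of the bpcd entry, or None if absent."""
    for line in inetd_conf_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comments.
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()
        # Fields: service, socket type, protocol, wait flag, user, program, args
        if len(fields) >= 6 and fields[0] == "bpcd":
            return fields[5]
    return None


def bpcd_path_is_correct(inetd_conf_text):
    """True when a bpcd entry exists and points at the expected binary."""
    return find_bpcd_entry(inetd_conf_text) == EXPECTED_BPCD_PATH
```

In practice the text would come from reading /etc/inetd.conf on the client; the sketch takes a string so it can be tried without touching a live system.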


    Netbackup (NBU) Important Error Codes with solution - NBU Status Codes

    The following are the most important NBU status codes, which every NetBackup administrator should know.



    1. Status code 2
    Reason: None of the requested files were backed up
    Action taken: Confirm that files exist in the target path

    2. Status code 13
    Reason: File read failed
    Action taken: Check network connectivity

    3. Status code 25
    Reason: Cannot connect on socket
    Action taken: Check the bpcd daemon

    4. Status code 50
    Reason: Client process aborted
    Action taken: Restart the backup manually and check for errors

    5. Status code 59
    Reason: Access to the client was not allowed
    Action taken: Check bp.conf and master-client connectivity

    6. Status code 71
    Reason: None of the files in the file list exist (backup path changed)
    Action taken: Correct the path in the backup selections

    7. Status code – 84
    Reason: Media write error (I/O error)
    Action taken: Clean the media mounts and change the tape default parameters to reduce backup failures due to I/O errors

    8. Status code – 96
    Reason: Backup failed because no media was available in the scratch pool
    Action taken: Allocate volumes to the scratch pool

    9. Status code – 129
    Reason: Disk storage unit is full
    Action taken: Remove old images

    10. Status code – 196
    Reason: Client backup was not attempted because the backup window closed
    Action taken: Restart the backup manually; if jobs keep exceeding the window, change the schedule frequency and the backup window

    11. Status code – 2001
    Reason: Tape library down error/Robotic path changed
    Action taken: Manually bring up the robot.
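A quick reference like the list above is easy to keep as a lookup table in a script. This is a minimal sketch: the message strings are the ones quoted on this page, the names `STATUS_MESSAGES` and `describe_status` are hypothetical, and status 2001 is left out because the page does not quote its message text.

```python
# Status code -> message text as quoted on this page.
STATUS_MESSAGES = {
    2: "none of the requested files were backed up",
    13: "file read failed",
    25: "cannot connect on socket",
    50: "client process aborted",
    54: "timed out connecting to client",
    59: "access to the client was not allowed",
    71: "none of the files in the file list exist",
    84: "media write error",
    96: "unable to allocate new media for backup, storage unit has none available",
    129: "disk storage unit is full",
    196: "client backup was not attempted because backup window closed",
}


def describe_status(code):
    """Return a one-line description for a status code in the table."""
    message = STATUS_MESSAGES.get(code)
    if message is None:
        return f"status {code}: not in this quick reference"
    return f"status {code}: {message}"
```

For example, `describe_status(54)` gives the message for the timeout discussed at the top of this page.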



    Tuesday, February 26, 2019

    NetBackup 7.x Backup Process Flow: Step 1-17 process explained in detail

    NetBackup 7.x Backup Process Flow





    1. When a PolicyClient task has its timer expire (indicating that it is due to run) an internal job task is created within nbpem that sends a Job Start to nbjm for the job which is due. nbpem provides to nbjm the parameters indicated in the backup policy and schedule that is generating the job.


    2. nbjm adds the job to its job list or queue. It then communicates with bpjobd to inform it of the job, at which time the job becomes visible in the Activity Monitor in the queued state until resources are allocated for it.






    3. nbjm sends a resource allocation request to the Resource Broker, nbrb, indicating the resources which are required for the backup operation and any resource consumption constraints for the job, including max jobs per policy, max jobs per client, and max jobs this client. These resource consumption constraints were provided to nbjm by nbpem when the job was initiated.


    4. nbrb requests resources from the EMM service, nbemm, including storage unit, storage unit group, media, and devices or drives.


    5. When physical resources are available, nbemm will allocate them and respond to nbrb, which in turn responds to nbjm. With resources allocated for the job, nbjm will notify bpjobd and the job moves to the active state.


    6. nbjm is responsible for creating the files in the Images database that will house the backup information, the Header file and the Files (.f) file. nbjm initiates this activity by communicating with bpdbm (via nbproxy).


    7. nbjm uses bpcompatd to communicate via PBX to start bpbrm on the media server that will write the backup image. The media server is selected based upon the destination storage unit that is selected.






    8. bpbrm on the media server starts bpbkar (the client's backup and archive process) on the client system. bpbrm also starts bptm on the media server.


    9. bptm initiates a connection with nbjm in order to get media and drive information for the backup job, which nbjm returns through a separate connection it initiates.


    10. bptm will then initiate the mount of the specified media (tape) on the specified drive, or the mount of the specified disk. It also spawns a bptm child process to receive the image from the client. The Media Manager daemons (ltid, txxd, txxcd, and avrd) involved in mounting the media on the drive are not shown in the illustration, to reduce its complexity.


    11. bpbkar sends information about the backup image to bpbrm which forwards it to bpdbm on the master server. This stream of metadata is sent throughout the backup and stored in the master server's Image database.


    12. When mounting and positioning of the media in the drive, or of the disk to be used, have been accomplished, the client backup process, bpbkar, will begin sending backup data to the bptm child process on the media server system. The bptm child process receives the image and stores it block by block into a shared memory segment on the media server. The parent bptm process retrieves the image from shared memory and directs it block by block to the allocated storage media.







    13. When the backup has been completed, bptm will notify bpbrm, which in turn will notify the Job Manager, nbjm, that the job has finished. bptm will also notify nbjm that it is done with the media.

    14. While the client and media server processes invoked to perform the backup operation (bpbrm, bptm, and bpbkar) are terminated, nbjm will update the status for the job by communicating with bpjobd. The job will be changed to Completed status, and the ending status of the job will be recorded.

    15. nbjm communicates with bpdbm (using nbproxy) to complete the writing and verification of the files for the backup image in the Images database.

    16. With the backup job completed nbjm will de-allocate the resources used for the backup by communicating with nbrb.

    17. nbjm will notify nbpem that the job has been completed. The completion status is included in this notification. The PolicyClient task that created the job is responsible for requesting a retry of the job on failure, or for computing the new 'due time' for the job on success.
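The job states mentioned in steps 2, 5 and 14 (queued, active, completed) can be pictured as a small state machine. This is only an illustrative sketch of those three states; the `Job` class and its method names are hypothetical and do not correspond to any NetBackup API.

```python
from enum import Enum


class JobState(Enum):
    QUEUED = "queued"        # step 2: job is listed, waiting for resources
    ACTIVE = "active"        # step 5: resources have been allocated
    COMPLETED = "completed"  # step 14: ending status has been recorded


# The only transitions this simplified model allows.
NEXT_STATE = {
    JobState.QUEUED: JobState.ACTIVE,
    JobState.ACTIVE: JobState.COMPLETED,
}


class Job:
    def __init__(self, policy, client):
        self.policy = policy
        self.client = client
        self.state = JobState.QUEUED
        self.exit_status = None

    def _advance(self, expected):
        # Refuse transitions that skip or repeat a state.
        if self.state is not expected:
            raise ValueError(
                f"job is {self.state.value}, expected {expected.value}"
            )
        self.state = NEXT_STATE[expected]

    def resources_allocated(self):
        """Queued -> active, once resources are granted (step 5)."""
        self._advance(JobState.QUEUED)

    def finished(self, exit_status):
        """Active -> completed, recording the ending status (step 14)."""
        self._advance(JobState.ACTIVE)
        self.exit_status = exit_status
```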

    Available Backup Types in Netbackup (Types of Backup)

    1. Full Backup
    A full backup is the starting point for all other backups. It contains all the data in the folders and files that are selected to be backed up.

    Advantages: Restore is fast
    Disadvantages: Backup is slow and consumes more space.

    2. Differential Incremental
    Backs up the files that have changed since the most recent backup. The archive bit is reset. Files are selected based on their time stamps.
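The time-stamp selection described above can be illustrated in a few lines. This is a minimal sketch of the general idea, not NetBackup's own logic; the function name `files_changed_since` and the injectable `get_mtime` parameter are assumptions made for illustration.

```python
import os


def files_changed_since(paths, last_backup_time, get_mtime=os.path.getmtime):
    """Return the paths modified after last_backup_time (epoch seconds).

    get_mtime defaults to the file system's modification time and can be
    replaced with any callable, which makes the selection easy to try out.
    """
    return [path for path in paths if get_mtime(path) > last_backup_time]
```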

    Monday, February 25, 2019

    Netbackup Important Error Codes & its Solutions

    Based on my experience with day-to-day issues, I have shortlisted a few must-know status codes for backup failures, which are listed here.

    The following are important Veritas NetBackup error codes and their solutions.

    1. NBU Status code: 2
    Reason: None of the requested files were backed up

    Error bpsched(pid=XXXXX) backup of client SQLHOST exited with status 2 (none of the requested files were backed up)

    Action taken: Confirm that files exist in the target path.

    Enable the dbclient log file on the SQL server.

    Possible cause: the Veritas NetBackup (tm) SQL Agent is not configured to use Windows NT Authentication.

    In case of an SQL agent issue: configure the Veritas NetBackup (tm) SQL Agent to use Windows NT Authentication.

    2. NBU Status code 13
    Reason: File read failed

    Corresponding example from the UNIX /usr/openv/netbackup/logs/bpbrm/log.<date> file:
    <16> bpbrm readline: socket read failed, An existing connection was forcibly closed by the remote host. (10054)
    <2> inform_client_of_status: INF - Server status = 13
    <2> put_long: (11) network write() error: An existing connection was forcibly closed by the remote host. ; socket = 496
    <16> inform_client_of_status: could not send server status message

    A Status 13 will occur due to network issues on the master or client.  This error indicates a read operation of a file or socket failed.  This can also occur for Flash Backup or Advanced Client backups.

    Backups fail with Status Code 13 "file read failed", indicating that a read of a file or socket has failed.  Winsock errors 10054 and 10053 may also be seen in the bpbkar log on the client. 
    Action taken: Check network connectivity using the basic troubleshooting steps below.

    • Ensure that the latest service packs for all products and components (SQL, Exchange, Notes, etc.) have been installed.
    • Ensure that all the network settings throughout the environment (NICs, hubs, switches, routers, etc.) are set to full duplex, not half duplex.
    • Increase the timeout settings on the NIC, if available.
    • Try a different NIC, if available.
    • If NIC teaming is implemented, deactivate for testing purposes.
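When scanning a bpbrm log for this failure, it helps to pull out the server status and any Winsock error numbers. The sketch below assumes lines shaped like the excerpt above (a severity in angle brackets, a function name, a colon, then the message); real log lines may carry extra leading fields such as time stamps, and `summarize_bpbrm_lines` is a hypothetical helper name.

```python
import re

# "<16> bpbrm readline: message" -> severity, function, message
LINE_RE = re.compile(r"^<(?P<severity>\d+)>\s+(?P<function>[^:]+):\s+(?P<message>.*)$")
# Winsock error numbers such as (10054) or (10053)
WINSOCK_RE = re.compile(r"\((100\d{2})\)")
# "Server status = 13"
STATUS_RE = re.compile(r"Server status = (\d+)")


def summarize_bpbrm_lines(lines):
    """Collect Winsock error numbers and server statuses from log lines."""
    winsock_codes = set()
    statuses = set()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not in the assumed format
        message = match.group("message")
        winsock_codes.update(int(code) for code in WINSOCK_RE.findall(message))
        statuses.update(int(status) for status in STATUS_RE.findall(message))
    return {
        "winsock_codes": sorted(winsock_codes),
        "server_statuses": sorted(statuses),
    }
```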


    3. Status code 25
    Reason: Cannot connect on socket

    The master server is getting a status code 25 (cannot connect on socket) error when attempting to bring up the client host properties using the GUI or remote admin console. 

    Action taken: Check the bpcd daemon.

    When troubleshooting status 25 errors on a NetBackup client, verify that the client was working prior to the issue. If it had been working, try to determine what changes may have been made to the client's OS or the network links.

    If there are no major changes, do the following basic troubleshooting:

    1. To test the master/media server's resolution of the client hostname, run the following command:
      • <install path>/netbackup/bin/bpclntcmd -hn <client hostname>
    2. Since reverse lookups are part of NBU server-to-client connections, make sure the client can also be resolved by its IP address:
      • <install path>/netbackup/bin/bpclntcmd -ip <client IP address>
    3. On the client, test the resolution of the NBU servers by issuing the same commands. Run them against the master and all of the media servers that may try to back up the client:
      • <install path>/netbackup/bin/bpclntcmd -hn <NBU server hostname>
      • <install path>/netbackup/bin/bpclntcmd -ip <NBU server IP address>
    4. Verify you are able to "ping" the client's IP address from the NBU server. If this fails consult with your Network Administrator and client server System Administrator to resolve the layer 3 or IP network connectivity.
    5. Double check the server's NIC's IP address and netmask to ensure they are configured correctly.
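The forward and reverse lookups in steps 1 to 3 can also be cross-checked with the operating system's resolver. This is a minimal sketch using Python's standard socket module; it does not call bpclntcmd, so its answers may differ from what NetBackup itself sees, and the function name `check_name_resolution` is hypothetical.

```python
import socket


def check_name_resolution(hostname,
                          resolve=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Forward-resolve hostname, then reverse-resolve the resulting address.

    Returns a dict with the address, the name the reverse lookup returned,
    and whether that name (or one of its aliases) matches the hostname.
    """
    result = {
        "hostname": hostname,
        "address": None,
        "reverse_name": None,
        "consistent": False,
    }
    try:
        result["address"] = resolve(hostname)
        reverse_name, aliases, _addresses = reverse(result["address"])
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError.
        return result
    result["reverse_name"] = reverse_name
    known_names = {reverse_name.lower()} | {alias.lower() for alias in aliases}
    result["consistent"] = hostname.lower() in known_names
    return result
```

Run it on the NBU server with the client's name, and on the client with each server's name, mirroring the direction of the bpclntcmd checks.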

    4. Status code 50
    Reason: Client process aborted

    The NetBackup Policy Execution Manager (nbpem) ran out of memory and crashed, causing all active jobs to fail with Status Code 50 (client process aborted).
    Action taken: Restart the backup manually and check for errors.

    1. Recycling the NetBackup services temporarily restores functionality to the master server, until the nbpem process again reaches the imposed 4 GB memory limit.

    2. The solution is to raise maxdsiz_64bit to 8 GB (maxdsiz_64bit is an HP-UX kernel tunable, not a bp.conf entry).

    5. Status Code – 59
    Reason: Access to the client was not allowed


    Action taken: Check bp.conf and master-client connectivity.

    A status code of 59 commonly occurs when the client does not have the NetBackup master or media server properly defined in the /usr/openv/netbackup/bp.conf file. There are a number of well-documented and effective ways to troubleshoot this problem (for example, creating a bpcd log on the client and then re-attempting the backup), but the UNIX last command is a quick and simple way to establish the cause.

    Steps to follow to fix the issue:

    1. If this is not a name resolution problem, add a "SERVER = NBUServer" entry to bp.conf on NBUClient.

    OR

    2. If this is a name resolution problem, correct the name resolution configuration (/etc/hosts file, DNS, or NIS maps) on NBUClient so that the output of the last command shows NBUServer.

    3. In addition, if the NetBackup client being backed up is a virtual machine (VMware), the policy must use the VMware policy type when NetBackup accesses the vSphere server to back up the machine. The other option is to install the NetBackup client on the virtual machine being backed up.
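Checking whether bp.conf already lists a given server is simple to script. The following is a minimal sketch, assuming the "KEY = value" line format of the SERVER entry quoted above; `server_entries` and `has_server_entry` are hypothetical helper names.

```python
def server_entries(bp_conf_text):
    """Return the values of all SERVER entries, in file order."""
    servers = []
    for line in bp_conf_text.splitlines():
        key, separator, value = line.partition("=")
        # Only lines of the form "SERVER = name" are of interest here.
        if separator and key.strip() == "SERVER":
            servers.append(value.strip())
    return servers


def has_server_entry(bp_conf_text, server_name):
    """True when server_name appears in a SERVER entry (case-insensitive)."""
    wanted = server_name.lower()
    return any(name.lower() == wanted for name in server_entries(bp_conf_text))
```

The text would normally be read from /usr/openv/netbackup/bp.conf on the client; the host name must match what the client resolves for the connecting server, which is why the name resolution check comes first.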

    6. Status code 71
    Reason: None of the files in the file list exist (backup path changed)

    A Status 71 error "none of the files in the file list exist" occurs, even though the backup selections specified in the policy are known to exist on the clients in question.

    Action taken: Correct the path in the backup selections.

    Proceed as follows:

    1. Expand Policies in the left pane 

    2. Double click the name of the policy that has failed with Status 71 

    3. In the Change Policy window, click the Files or Backup Selections tab (the tab name varies depending upon version) 

    4. Highlight an entry in the file list and click Rename. Then highlight the entire entry with the mouse and check whether there is a space at the end of the entry.

    Remove the trailing space if one is present. Check all other file list entries as needed. Click OK in the Change Policy window and run the backup again.
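If the backup selections are available as a list of strings (for example, copied out of the policy), trailing whitespace can be spotted programmatically instead of by eye. This is a minimal sketch; both function names are hypothetical.

```python
def entries_with_trailing_whitespace(file_list):
    """Return the entries that end in a space, tab or other whitespace."""
    return [entry for entry in file_list if entry != entry.rstrip()]


def cleaned_file_list(file_list):
    """Return the file list with trailing whitespace removed from each entry."""
    return [entry.rstrip() for entry in file_list]
```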

    7. Status Code – 84
    Reason: Media write error (I/O error)

    Backup jobs fail with a NetBackup Status Code 84 (media write error) and the system's device driver returns an I/O error while NetBackup is writing to removable media or a disk file.


    Action taken: Clean the media mounts and change the tape default parameters to reduce backup failures due to I/O errors.

    Turn logging up to Verbose = 5 for the bptm process on the problematic media server. Capture the problem at the higher logging level and examine the resulting log file in the <install_path>\netbackup\logs\bptm log folder.

    Additionally, examine the Application Event Log for NetBackup errors.

    Some of these errors can be caused by a faulty SCSI card. Replace the faulty SCSI card.

    8. Status code – 96
    Reason: Backup failed because no media was available in the scratch pool

    Similar error log entries: invalid volume pool (90); unable to allocate new media for backup, storage unit has none available (96)

    Action taken: Allocate volumes to the scratch pool.

    When duplicating tapes, verify the destination volume pool name is not defined as the same volume pool name which is configured as the scratch volume pool. The -dp option for the bpduplicate command defines the destination volume pool name.

    9. Status code – 129
    Reason: Disk storage unit is full

    Backups to disk storage units fail with a VERITAS NetBackup (tm) Status 129 because the storage unit is full.

    Action taken: Remove old images.

    Several methods can be used to reclaim disk space on a disk storage unit. Options include, but are not limited to, expiring older images, using an alternate storage unit, changing the retention level, and adding more disk space to the disk storage unit.

    Expire older images from the disk storage unit to reclaim disk space on the file system. 


    10. Status code – 196
    Reason: Client backup was not attempted because the backup window closed.

    Action taken: Restart the backup manually; if jobs keep exceeding the window, change the schedule frequency and the backup window.

    An inability to allocate a drive can keep jobs queued until the backup window closes. It can also accompany the EMM server going down, with EMM errors such as:

    Unable to obtain the server list from the Enterprise Media Manager server. Database Server is down (93)

    Verify there are no ACTIVE jobs. Deactivate and cancel all jobs.

    Steps:

    1. Stop all NetBackup services on the master server:

    Windows: <install path>\Netbackup\bin\bpdown -v -f 

    UNIX: /usr/openv/netbackup/bin/goodies/netbackup stop 

    2. Check for active processes with the command <install path>\netbackup\bin\bpps -a

    3. Kill any processes that are still running.

    4. Restart all NetBackup services on the master server: 

    Windows: <install path>\netbackup\bin\bpup -v -f 

    UNIX: /usr/openv/netbackup/bin/goodies/netbackup start 

    5. To see the resource allocations (likely left over from the EMM connection error):

    <install path>\NetBackup\bin\admincmd\nbrbutil -dump 

    6. To clear the allocations run the following: 

    <install path>\NetBackup\bin\admincmd\nbrbutil -resetall 

    7. Run a regular backup and confirm that it is writing to tape. 

    8.  Reactivate the jobs that were previously deactivated and confirm that they also now are running. 
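For a runbook, the Windows commands from the steps above can be assembled for a given install path. This sketch only builds the command strings and never executes them; the function name `windows_recovery_plan` is hypothetical, and the install path passed in is whatever applies on the master server in question. Keep in mind that nbrbutil -resetall clears all allocations, so it belongs after the services have been stopped and restarted as described.

```python
def windows_recovery_plan(install_path):
    """Return (description, command) pairs in the order of the steps above."""
    base = install_path.rstrip("\\")
    return [
        ("stop services", base + r"\NetBackup\bin\bpdown -v -f"),
        ("list remaining processes", base + r"\NetBackup\bin\bpps -a"),
        ("restart services", base + r"\NetBackup\bin\bpup -v -f"),
        ("show allocations", base + r"\NetBackup\bin\admincmd\nbrbutil -dump"),
        ("reset allocations", base + r"\NetBackup\bin\admincmd\nbrbutil -resetall"),
    ]
```

Printing the pairs gives a checklist that can be reviewed before anything is run by hand.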

    11. Status code – 2001
    Reason: Tape library down error/Robotic path changed
    Action taken: Manually bring up the robot.

    Only the vmoprcmd command run without parameters shows the actual status of the drives. Any attempt to bring the downed drives up with vmoprcmd ended with the message: The drive is not ready or inoperable.

    The solution is to delete the downed drives with tpconfig, run tpautoconf -a, and then restart NetBackup.


    Thursday, February 14, 2019

    Configure Restoredev file - HP Data Protector (MicroFocus)


    In some scenarios, a restore goes by default to the device that was used for the backup, even though that device has changed since the backup was taken. In such cases you need to redirect the restore to a new device.

    Example: an Oracle restore used the wrong device, not the one used at backup time.

    When such situations occur, it is always worth considering what user activities might have affected the device selection.
    There are two different ways to change the restore device:

        a) restoredev file
        b) omnidbutil -changebdev
     

    Wednesday, February 13, 2019

    Storage Area Networks (SAN) Interview Questions and Answers - Top 76 Questions added

    Below are some frequently asked basic Storage (SAN) interview questions and answers. Check the Storage Area Networks (SAN) basic & advanced concepts page on this site to learn more SAN basics.

    These are general SAN questions that will be helpful for interviews with any storage vendor.

    Storage Area Networks (SAN) Interview Questions and Answers

    Storage Area Networks (SAN) Basic concepts Interview Questions and Answers:

    Thursday, February 7, 2019

    NetBackup Tutorial: Steps to verify device configuration using "robtest"



    The process to absolutely verify that the drive path mapping is correct on all of the media servers is a bit time consuming, but will ensure that everything is correct. 

    Step 1 
    Please note that this test cannot be run while VERITAS NetBackup (tm) is attempting backups, restores, duplicates, or any other actions that involve the robot and drives. 

    Step 2 
    Acquire the output of " /usr/openv/volmgr/bin/tpconfig -d " from each of the media servers. This is the drive configuration within NetBackup for each server.