What Is Always On in SQL Server 2012?: Group Supports A Failover Environment For A Discrete Set of User Databases
Asynchronous-commit mode
Synchronous-commit mode
The availability replica that makes the primary databases available for read-
write connections from clients and, also, sends transaction log records for each
primary database to every secondary replica.
Offloads your secondary read-only workloads from your primary replica, which
conserves its resources for your mission critical workloads. If you have a mission
critical read workload, or a workload that cannot tolerate latency, you should
run it on the primary.
Improves your return on investment for the systems that host readable
secondary replicas.
We can have up to 2 synchronous-commit secondary replicas, but we are not required
to use any. We could run all secondaries in asynchronous-commit mode if desired.
Yes. An active secondary can be used to offload read-only queries from the
primary to a secondary instance in the availability group.
Yes, we can take transaction log backups on the secondary replicas without
COPY_ONLY option.
Within the context of a session between the primary replica and a secondary
replica, the primary and secondary roles are potentially interchangeable in a
process known as failover. During a failover the target secondary replica
transitions to the primary role, becoming the new primary replica. The new
primary replica brings its databases online as the primary databases, and client
applications can connect to them. When the former primary replica is available, it
transitions to the secondary role, becoming a secondary replica. The former
primary databases become secondary databases and data synchronization
resumes.
Three forms of failover exist: automatic, manual, and forced (with possible data
loss). The form or forms of failover supported by a given secondary replica
depend on its availability mode and, for synchronous-commit mode, on its
failover mode.
SQL Server Failover Cluster Instances (FCIs) do not support automatic failover by
availability groups, so any availability replica that is hosted by an FCI can only be
configured for manual failover.
Answer – AlwaysOn is a term Microsoft has used since SQL Server 2012 for high
availability and disaster recovery solutions. As of now, two features fall under the
umbrella of AlwaysOn: Failover Cluster Instances (FCIs) and Availability Groups
(AGs). Both support high availability and disaster recovery for SQL Server
databases.
SQL Server AlwaysOn FCIs are SQL Server clustered instances whereas AGs are
the new features introduced in SQL Server 2012 to support data high availability
and disaster recovery. We can group each set of databases into one unit and
execute a failover at one time with the help of the Availability Group.
Answer – Please have a look at the main differences between both AlwaysOn
solutions:
AlwaysOn Failover Cluster Instance needs shared storage between all of
the nodes in the cluster. Whereas AlwaysOn Availability Groups do not
require shared disk storage for the server hosting the SQL Server.
AlwaysOn Failover Cluster Instance is available on both SQL Server
Standard and Enterprise Edition whereas we need Enterprise Edition to
configure SQL Server AlwaysOn Availability Groups until SQL Server 2014.
There is now an option to create a basic Availability Group with SQL
Server 2016 Standard edition, but it has a lot of limitations.
SQL Server AlwaysOn Failover Clustered Instances work at an instance
level whereas Availability Group works at a database level or for a set of
databases.
We cannot use AlwaysOn Failover Clustered Instances while installing
standalone instances whereas Availability Groups can be configured on
both standalone as well as SQL Server Clustered Instances.
Answer - The Availability Group Listener is a virtual network name that clients
use to connect to the availability databases, regardless of which replica holds
the primary role after a failover.
Answer – The "New Availability Group Wizard" option is disabled until you
enable the Availability Group feature from the SQL Server service property.
You need to launch SQL Server Configuration Manager then open SQL
Server service property.
Here you can see the AlwaysOn High Availability tab. Click on this tab.
You can see the Windows cluster name here if you have enabled the
cluster feature. As we know Windows cluster is mandatory for AlwaysOn.
If you haven’t configured a Windows cluster then first you need to
configure it in order to enable AlwaysOn High Availability.
Once you enable it, the Cluster Group name will appear in this
property window and the grayed-out option "Enable AlwaysOn
Availability Group" will be enabled. Just click on this option and click the
OK button.
You need to restart the service to apply this change on the SQL Server
Instance.
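After the restart, the setting can be confirmed from a query window. This is a minimal sketch using the documented SERVERPROPERTY values; it only reads state and changes nothing.

```sql
-- IsHadrEnabled: 1 = AlwaysOn Availability Groups enabled, 0 = disabled.
-- HadrManagerStatus: 0 = not started (pending communication),
--                    1 = started and running,
--                    2 = not started and failed.
SELECT SERVERPROPERTY('IsHadrEnabled')     AS is_hadr_enabled,
       SERVERPROPERTY('HadrManagerStatus') AS hadr_manager_status;
```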
Answer – There is only one difference between configuring AOAG in a single vs.
multi subnet. You follow the same process as for a single subnet, but in a
multi subnet network you need one IP address from each subnet to configure the
AOAG listener. Read attached article
where I have explained the step by step process to configure AOAG in a multi
subnet network.
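As a rough sketch of the multi subnet case, the listener can be created with one static IP address per subnet. The listener name, IP addresses, and subnet masks below are hypothetical placeholders, not values from any real environment.

```sql
-- Hypothetical listener with one IP per subnet; adjust names and addresses.
ALTER AVAILABILITY GROUP [ProAG]
ADD LISTENER N'ProAG-Listener' (
    WITH IP (
        (N'10.10.1.50', N'255.255.255.0'),  -- subnet hosting the first replica
        (N'10.20.1.50', N'255.255.255.0')   -- subnet hosting the second replica
    ),
    PORT = 1433
);
```

Clients connecting through a multi subnet listener generally benefit from adding MultiSubnetFailover=True to the connection string, so that all listener IP addresses are tried in parallel.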
Answer – We can easily add new databases to an existing Availability Group.
First, we need to prepare the secondary database by taking a full backup and a
subsequent transaction log backup, then restoring them on the secondary replicas
WITH NORECOVERY. Then we can right click on the Availability Group name to launch the
Add Database wizard. We should follow all required steps to proceed with this
wizard. Once completed, your new database will be added to the identified
Availability Group. You can read this whole process in detail in this article: How
to Add New Database to Existing Availability Group?
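The same preparation can be sketched in T-SQL instead of the wizard. The database name SalesDb, the availability group name ProAG, and the backup share are assumptions for illustration. Each batch runs on the instance named in its comment.

```sql
-- 1. On the primary replica: full backup followed by a log backup.
BACKUP DATABASE [SalesDb] TO DISK = N'\\fileshare\backup\SalesDb.bak';
BACKUP LOG      [SalesDb] TO DISK = N'\\fileshare\backup\SalesDb.trn';

-- 2. On each secondary replica: restore both backups without recovery.
RESTORE DATABASE [SalesDb] FROM DISK = N'\\fileshare\backup\SalesDb.bak' WITH NORECOVERY;
RESTORE LOG      [SalesDb] FROM DISK = N'\\fileshare\backup\SalesDb.trn' WITH NORECOVERY;

-- 3. On the primary replica: add the database to the availability group.
ALTER AVAILABILITY GROUP [ProAG] ADD DATABASE [SalesDb];

-- 4. On each secondary replica: join the restored copy to the group.
ALTER DATABASE [SalesDb] SET HADR AVAILABILITY GROUP = [ProAG];
```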
Answer – Until SQL Server 2014, the AlwaysOn Availability Group will not
initiate a failover process if anything goes wrong at the database level. Microsoft
introduced an option named Enhanced Database Failover in SQL Server 2016 to
trigger the failover in case any database participating in an Availability Group
loses the ability to write transactions. We also call it Database Level Health
Detection in an Availability Group. By default, this option is not enabled. You need
to configure it if you want to initiate a failover if anything goes wrong at the
database level.
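A minimal sketch of turning the option on for an existing group in SQL Server 2016 or later, assuming a group named ProAG. Run it on the primary replica.

```sql
-- Enable database level health detection for the whole availability group.
ALTER AVAILABILITY GROUP [ProAG] SET (DB_FAILOVER = ON);

-- Verify the setting: db_failover = 1 means the option is enabled.
SELECT name, db_failover
FROM sys.availability_groups;
```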
Answer – Yes, we can add database files to databases that are part of an
AlwaysOn Availability Group.
1. First remove the database from the secondary replica. Now, the secondary
database will be in a restoring state.
2. Add the data file to your Availability database on the primary replica.
3. Issue a transaction log backup of this availability database on the primary
replica.
4. Copy this transaction log backup to the secondary replica and restore it on
its corresponding secondary replica using NORECOVERY and the WITH
MOVE option.
5. Now add the database back to AlwaysOn Availability Group.
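The five steps above can be sketched as follows. The database name, logical file name, paths, and backup share are hypothetical. If other log backups are taken on the primary in between, they would also have to be restored in sequence on the secondary.

```sql
-- Step 1, on the secondary replica: remove the database from the group.
-- The secondary database stays in the RESTORING state.
ALTER DATABASE [SalesDb] SET HADR OFF;

-- Steps 2 and 3, on the primary replica: add the file, then back up the log.
ALTER DATABASE [SalesDb]
    ADD FILE (NAME = N'SalesDb_Data2',
              FILENAME = N'D:\Data\SalesDb_Data2.ndf',
              SIZE = 512MB);
BACKUP LOG [SalesDb] TO DISK = N'\\fileshare\backup\SalesDb_addfile.trn';

-- Step 4, on the secondary replica: restore the log backup and relocate
-- the new file to a path that exists on this server.
RESTORE LOG [SalesDb]
    FROM DISK = N'\\fileshare\backup\SalesDb_addfile.trn'
    WITH NORECOVERY,
         MOVE N'SalesDb_Data2' TO N'E:\Data\SalesDb_Data2.ndf';

-- Step 5, on the secondary replica: rejoin the availability group.
ALTER DATABASE [SalesDb] SET HADR AVAILABILITY GROUP = [ProAG];
```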
Can we configure an Availability Group between SQL Server instances that are
hosted on servers that are part of two different Windows server failover cluster
groups?
PART-II:
Q. What is availability group wizard?
Ans:
Availability Group Wizard is a GUI using SQL Server Management Studio to create and
configure an AlwaysOn availability group in SQL Server 2012.
Q. Suppose the primary database goes into suspect mode. Will the AG fail over to a
secondary replica?
Ans:
Issues at the database level, such as a database becoming suspect due to the loss of a
data file, deletion of a database, or corruption of a transaction log, do not cause an
availability group to fail over.
Q. Can we have two primary availability replicas?
Ans:
No, it is not possible.
Q. Does AG support automatic page repair to protect against page
corruption?
Ans:
Yes, automatic page repair is handled for you.
Q. How to add a secondary database to an availability group using T-SQL?
Ans:
ALTER DATABASE Db1 SET HADR AVAILABILITY GROUP = <AGName>;
Q. How to remove a secondary database from an availability group?
Ans:
ALTER DATABASE <DBName> SET HADR OFF;
Q. Does SQL Server 2012 AlwaysOn support encryption and compression?
Ans:
SQL Server 2012 AlwaysOn Availability Groups support row and page compression for
tables and indexes, so we can use the data compression feature to compress the data
inside a database and reduce its size. We can use encryption in SQL Server for
connections, data, and stored procedures; we can also perform database-level
encryption with transparent data encryption (TDE). If you use TDE, the certificate or
asymmetric key that protects the database encryption key must be the same on every
server instance that hosts an availability replica for the availability group.
Q. Does AG support Bulk-Logged recovery model?
Ans:
No, it does not.
Q. Can a database belong to more than one availability group?
Ans:
No.
Q. What is session timeout period?
Ans:
Session-timeout period is a replica property that controls how long (in seconds) an
availability replica waits for a ping response from a connected replica before
considering the connection to have failed. By default, a replica waits 10 seconds
for a ping response. This replica property applies only to the connection between a
given secondary replica and the primary replica of the availability group.
Q. How to change the Session Timeout period?
Ans:
ALTER AVAILABILITY GROUP <AG Name>
MODIFY REPLICA ON '<Instance Name>' WITH (SESSION_TIMEOUT = 15);
Q. What different synchronization preferences are available?
Ans:
As part of the availability group creation process, we have to make an exact copy of the
data on the primary replica on the secondary replica. This is known as the initial data
synchronization for the Availability Group.
Q. How many types of data synchronization preference options are available in
AlwaysOn?
Ans:
There are three options- Full, Join only, or Skip initial data synchronization.
Q. Is it possible to run DBCC CHECKDB on secondary replicas?
Ans:
Yes.
Q. Can I redirect the read-only connections to the secondary replica instead of
Primary replica?
Ans:
Yes, we can specify the read_only intent in the connection string and add only
secondaries (not the primary) to the read_only_routing list. If you want to disallow direct
connections to the primary from read_only connections, then set its allow_connections to
read_write.
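A hedged sketch of the routing setup described above, for a two replica group. The replica names SQLNODE1 and SQLNODE2 and the routing URL are hypothetical. The statements run on the primary replica.

```sql
-- Let SQLNODE2 accept read-intent connections while it is a secondary.
ALTER AVAILABILITY GROUP [ProAG]
MODIFY REPLICA ON N'SQLNODE2'
WITH (SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY));

-- Tell the listener where read-intent connections for SQLNODE2 should go.
ALTER AVAILABILITY GROUP [ProAG]
MODIFY REPLICA ON N'SQLNODE2'
WITH (SECONDARY_ROLE (READ_ONLY_ROUTING_URL = N'TCP://sqlnode2.contoso.com:1433'));

-- While SQLNODE1 is primary, route read-intent connections to SQLNODE2 only.
ALTER AVAILABILITY GROUP [ProAG]
MODIFY REPLICA ON N'SQLNODE1'
WITH (PRIMARY_ROLE (READ_ONLY_ROUTING_LIST = (N'SQLNODE2')));

-- Refuse read-intent connections on SQLNODE1 while it holds the primary role.
ALTER AVAILABILITY GROUP [ProAG]
MODIFY REPLICA ON N'SQLNODE1'
WITH (PRIMARY_ROLE (ALLOW_CONNECTIONS = READ_WRITE));
```

Routing only takes effect when the client connects through the listener, names an availability database, and sets ApplicationIntent=ReadOnly in the connection string.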
Q. If a DBA expands a data file manually on the primary, will SQL Server
automatically grow the same file on secondaries?
Ans:
Yes! It will be automatically expanded on the Secondary replica.
Q. Is it possible to create additional indexes on read-only secondary replicas to
improve query performance?
Ans:
No, it is not possible.
Q. Is it possible to create additional statistics on read-only secondaries to
improve query performance?
Ans:
No. But we can allow SQL Server to automatically create statistics on read-only
secondary replicas.
Q. Can we manually fail over to a secondary replica?
Ans:
Yes. If the secondary is in synchronous-commit mode and its state is “SYNCHRONIZED”,
you can manually fail over without data loss. If the secondary is not in a synchronized
state, then a manual failover is allowed but with possible data loss.
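Both forms of manual failover can be sketched in T-SQL. All statements run on the secondary replica that is to become the new primary. ProAG and SalesDb are assumed names.

```sql
-- Check whether the local databases can fail over without data loss
-- (is_failover_ready = 1 means synchronized and ready).
SELECT drcs.database_name, drcs.is_failover_ready
FROM sys.dm_hadr_database_replica_cluster_states AS drcs
JOIN sys.dm_hadr_availability_replica_states AS ars
    ON ars.replica_id = drcs.replica_id
WHERE ars.is_local = 1;

-- Planned manual failover: requires a SYNCHRONIZED synchronous-commit secondary.
ALTER AVAILABILITY GROUP [ProAG] FAILOVER;

-- Forced failover: possible data loss, intended for disaster recovery only.
ALTER AVAILABILITY GROUP [ProAG] FORCE_FAILOVER_ALLOW_DATA_LOSS;

-- After a forced failover the remaining secondary databases are suspended;
-- resume each one on its own secondary replica.
ALTER DATABASE [SalesDb] SET HADR RESUME;
```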
Q. What is read intent option?
Ans:
There are two options to configure secondary replica for running read workload. The first
option ‘Read-intent-only’ is used to provide a directive to AlwaysOn secondary replica to
accept connections that have the property ApplicationIntent=ReadOnly set. The word
‘intent’ is important here as there is no application check made to guarantee that there
are no DDL/DML operations in the application connecting with ‘ReadOnly’ but an
assumption is made that customer will only connect read workloads.
Q. Do AlwaysOn Availability Groups repair data page corruption the way
Database Mirroring does?
Ans:
Yes. If a corrupt page is detected, SQL Server will attempt to repair the page by getting
it from another replica.
Q. What are the benefits of Always on feature?
Ans:
Utilizing database mirroring for the data transfer over TCP/IP
Providing a combination of synchronous and asynchronous mirroring
Providing a logical grouping of similar databases via Availability Groups
Creating up to four readable secondary replicas
Allowing backups to be undertaken on a secondary replica
Performing DBCC statements against a secondary replica
Employing Built-in Compression & Encryption
Q. How much network bandwidth will I need?
Ans:
For a really rough estimate, sum up the amount of uncompressed transaction log
backups that you generate in a 24-hour period. You’ll need to push that amount of data
per day across the wire. Things get trickier when you have multiple replicas – the
primary pushes changes out to all replicas, so if you’ve got 3 replicas in your DR site,
you’ll need 3x the network throughput. Calculating burst requirements is much more
difficult – but at least this helps you get started.
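One way to get that rough number is to sum the log backup sizes recorded in msdb. This sketch assumes log backups are taken regularly and that backup history has not been purged. The backup_size column holds the uncompressed size in bytes.

```sql
-- Uncompressed transaction log backup volume per database, last 24 hours.
SELECT database_name,
       SUM(backup_size) / 1048576.0 AS log_backup_mb_last_24h
FROM msdb.dbo.backupset
WHERE type = 'L'  -- 'L' = transaction log backup
  AND backup_finish_date >= DATEADD(HOUR, -24, GETDATE())
GROUP BY database_name
ORDER BY log_backup_mb_last_24h DESC;
```

Multiply the total by the number of replicas that receive the log to approximate the daily volume leaving the primary.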
Q. What’s the performance overhead of a synchronous replica?
Ans:
From the primary replica, ping the secondary, and see how long (in milliseconds) the
response takes. Then run load tests on the secondary’s transaction log drive and see
how long writes take. That’s the minimum additional time that will be added to each
transaction on the primary. To reduce the impact, make sure your network is low-latency
and your transaction log drive writes are fast.
Q. How far behind will my asynchronous replica be?
Ans:
The faster your network and your servers are, and the less transactional activity you
have, the more up-to-date each replica will be. I’ve seen setups where the replicas are
indistinguishable from the primary. However, I’ve also seen cases with underpowered
replicas, slow wide area network connections, and heavy log activity (like index
maintenance) where the replicas were several minutes behind.
Q. What’s the difference between AGs in SQL 2012 and SQL 2014?
Ans:
SQL Server 2014’s biggest improvement is that the replica’s databases stay visible when
the primary drops offline – as long as the underlying cluster is still up and running. If I
have one primary and four secondary replicas, and I lose just my primary, the
secondaries are still online servicing read-only queries. (Now, you may have difficulties
connecting to them unless you’re using the secondary’s name, but that’s another story.)
Back in SQL 2012, when the primary dropped offline, all of the secondaries’ copies
immediately dropped offline – breaking all read-only reporting queries.
Q: How do I monitor AlwaysOn Availability Groups?
Ans:
That’s rather challenging right now. Uptime monitoring means knowing if the listener is
accepting writeable connections, if it’s correctly routing read-only requests to other
servers, if all read-only replicas are up and running, if load is distributed between
replicas the way you want, and how far each replica is running behind. Performance
monitoring is even tougher – each replica has its own statistics and execution plans, so
queries can run at totally different speeds on identical replicas.
Q: How does licensing work with AlwaysOn Availability Groups in SQL 2012 and
2014?
Ans:
All replicas have to have Enterprise Edition. If you run queries, backups, or DBCCs on a
replica, you have to license it. For every server licensed with Software Assurance, you
get one standby replica for free – but only as long as it’s truly standby, and you’re not
doing queries, backups, or DBCCs on it.
Q: Can I use AlwaysOn Availability Groups with Standard Edition?
Ans:
Not at this time, but it’s certainly something folks have been asking for since database
mirroring has been deprecated.
Q: Do AlwaysOn AGs require shared storage or a SAN?
Ans:
No, you can use local storage, like cheap SSDs.
Q: Do Availability Groups require a Windows cluster?
Ans:
Yes, they’re built atop Windows failover clustering. This is the same Windows feature
that also enables failover clustered instances of SQL Server, but you don’t have to run a
failover clustered instance in order to use AlwaysOn Availability Groups.
Q: Do I need a shared quorum disk for my cluster?
Ans:
No
Q: If I fail over to an asynchronous replica, and it’s behind, how do I sync up
changes after the original primary comes back online?
Ans:
When I go through an AG design with a team, we talk about the work required to merge
the two databases together. If it’s complex (like lots of parent/child tables with identity
fields, and no update datestamp field on the tables), then management agrees to a
certain amount of data loss upon failover. For example, “If under fifteen minutes
of data is involved, we’re just going to walk away from it.” Then we build a project plan
for what it would take to actually recover >15 minutes of data, and management decides
whether they want to build that tool ahead of time, or wait until disaster strikes.
PART-III:
Q. We have got an alert “WSFC cluster service is offline”. What is your action
plan?
Ans:
This alert is raised when the WSFC cluster is offline or in the forced quorum state. All
availability groups hosted within this cluster are offline (a disaster recovery action is
required).
Possible Reasons:
This issue can be caused by a cluster service issue or by the loss of the quorum in the
cluster.
Possible Solutions:
Use the Cluster Administrator tool to perform the forced quorum or disaster recovery
workflow. Once WSFC is started, you must re-evaluate and reconfigure NodeWeight
values to correctly construct a new quorum before bringing other nodes back online.
Otherwise, the cluster may go back offline again.
Any High Availability features (AlwaysOn Availability Groups, Log Shipping, Database
Mirroring) in use on the affected nodes may need to be re-established.
Q. How to force a WSFC (Windows Server Failover Cluster) Cluster to start
without a quorum?
Ans:
This can be done using
Failover Cluster Manager
Net.exe
PowerShell
Here we’ll see how this can be done using FCM.
Failover Cluster Manager
Open a Failover Cluster Manager and connect to the desired cluster node to force
online.
In the Actions pane, click Force Cluster Start, and then click Yes – Force my cluster
to start.
In the left pane, in the Failover Cluster Manager tree, click the cluster name.
In the summary pane, confirm that the current Quorum Configuration value is:
Warning: Cluster is running in ForceQuorum state.
Q. We have got an alert “Availability group is offline”. Can you explain about
this warning and your action plan?
Ans:
This alert is raised when the cluster resource of the availability group is offline or the
availability group does not have a primary replica.
Possible Reasons:
The availability group is not configured with automatic failover mode. The primary
replica becomes unavailable and the role of all replicas in the availability group
becomes RESOLVING.
The availability group is configured with automatic failover mode and the failover
does not complete successfully.
The availability group resource in the cluster becomes offline.
There is an automatic, manual, or forced failover in progress for the availability
group.
Possible Solutions:
If the SQL Server instance of the primary replica is down, restart the server and then
verify that the availability group recovers to a healthy state.
If the automatic failover appears to have failed, verify that the databases on the
replica are synchronized with the previously known primary replica, and then fail over
to the primary replica. If the databases are not synchronized, select a replica with a
minimum loss of data, and then recover to failover mode.
If the resource in the cluster is offline while the instances of SQL Server appear to be
healthy, use Failover Cluster Manager to check the cluster health or other cluster
issues on the server. You can also use the Failover Cluster Manager to attempt to
bring the availability group resource online.
If there is a failover in progress, wait for the failover to complete.
Q. We have got an alert “Availability group is not ready for automatic failover”.
Can you explain about this warning and your action plan?
Ans:
This alert is raised when the failover mode of the primary replica is automatic; however
none of the secondary replicas in the availability group are failover ready.
Possible Reasons:
The primary replica is configured for automatic failover; however, the secondary replica
is not ready for automatic failover as it might be unavailable or its data synchronization
state is currently not SYNCHRONIZED.
Possible Solutions:
Verify that at least one secondary replica is configured as automatic failover. If there
is not a secondary replica configured as automatic failover, update the configuration
of a secondary replica to be the automatic failover target with synchronous commit.
Use the policy to verify that the data is in a synchronization state and the automatic
failover target is SYNCHRONIZED, and then resolve the issue at the availability
replica.
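Besides the policy, the same facts can be read from the catalog views and DMVs. This is a minimal sketch, intended to run on the primary replica, where rows for every replica are visible.

```sql
-- Availability mode, failover mode, and synchronization state per replica
-- and database. An automatic failover target should show SYNCHRONOUS_COMMIT,
-- AUTOMATIC, and SYNCHRONIZED.
SELECT ar.replica_server_name,
       ar.availability_mode_desc,
       ar.failover_mode_desc,
       DB_NAME(drs.database_id) AS database_name,
       drs.synchronization_state_desc,
       drs.synchronization_health_desc
FROM sys.availability_replicas AS ar
JOIN sys.dm_hadr_database_replica_states AS drs
    ON drs.replica_id = ar.replica_id
ORDER BY ar.replica_server_name, database_name;
```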
Q. In your environment, data is inserted on the primary replica but is not visible
on the secondary replica. The availability group is in a healthy state, and in
most cases data shows up within a few minutes, but in this case it did not.
Now you need to find the bottleneck and fix the issue. Can you explain your
approach and workaround in this situation?
Ans:
Possible Reasons:
Long-Running Active Transactions
High Network Latency or Low Network Throughput Causes Log Build-up on the
Primary Replica
Another Reporting Workload Blocks the Redo Thread from Running
Redo Thread Falls behind Due to Resource Contention
Possible Workaround:
Use DBCC OPENTRAN to check for the oldest active transaction on the
primary replica and see if it can be rolled back.
A high DMV (sys.dm_hadr_database_replica_states) value log_send_queue_size can
indicate logs being held back at the primary replica. Dividing this value by
log_send_rate can give you a rough estimate on how soon data can be caught up on
the secondary replica.
Check two performance objects SQL Server:Availability Replica > Flow Control Time
(ms/sec) and SQL Server:Availability Replica > Flow control/sec. Multiplying these
two values shows you in the last second how much time was spent waiting for flow
control to clear. The longer the flow control wait time, the lower the send rate.
When the redo thread is blocked, an extended event called
sqlserver.lock_redo_blocked is generated. Additionally, you can query the DMV
sys.dm_exec_requests on the secondary replica to find out which session is blocking
the redo thread, and then you can take corrective action. You can let the reporting
workload finish, at which point the redo thread is unblocked, or you can unblock the
redo thread immediately by executing the KILL command on the blocking session ID.
The following query returns the session ID of the reporting workload that is blocking
the redo thread.
Transact-SQL
SELECT session_id, command, blocking_session_id, wait_time, wait_type, wait_resource
FROM sys.dm_exec_requests
WHERE command = 'DB STARTUP';
Redo thread falls behind due to resource contention: a large reporting
workload on the secondary replica has slowed down the performance of the
secondary replica, and the redo thread has fallen behind. You can use the following
DMV query to see how far the redo thread has fallen behind, by measuring the
gap between last_redone_lsn and last_received_lsn.
Transact-SQL
SELECT recovery_lsn, truncation_lsn, last_hardened_lsn,
last_received_lsn, last_redone_lsn, last_redone_time
FROM sys.dm_hadr_database_replica_states;
If the redo thread is indeed falling behind, investigate the reporting workload on the
secondary and consider using Resource Governor to limit the CPU cycles it consumes.
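The rough catch-up estimate mentioned earlier (queue size divided by rate) can be computed directly from the DMV. This is a sketch meant to run on the primary replica. The queue columns are in KB and the rate columns in KB/sec, so the results are approximate seconds.

```sql
-- Estimated time to send the queued log and to redo the received log,
-- per secondary database (rows with is_local = 0 when run on the primary).
SELECT DB_NAME(database_id) AS database_name,
       log_send_queue_size,  -- KB of log not yet sent to the secondary
       log_send_rate,        -- KB/sec
       redo_queue_size,      -- KB of log received but not yet redone
       redo_rate,            -- KB/sec
       CASE WHEN log_send_rate > 0
            THEN log_send_queue_size * 1.0 / log_send_rate END AS est_send_seconds,
       CASE WHEN redo_rate > 0
            THEN redo_queue_size * 1.0 / redo_rate END AS est_redo_seconds
FROM sys.dm_hadr_database_replica_states
WHERE is_local = 0;
```

The CASE expressions return NULL when a rate is zero or NULL, which avoids a divide-by-zero error on idle databases.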
Note: Have a look at MSDN sites and try to understand these solutions because when
you say possible solutions, immediately you might be asked about resolutions.
Q. You perform a forced manual failover on an availability group to an
asynchronous-commit secondary replica, and you find that data loss is more than
your recovery point objective (RPO). Or, when you calculate the potential data
loss of an asynchronous-commit secondary replica using the method in Monitor
Performance for AlwaysOn Availability Groups, you find that it exceeds your
RPO. What are the possible reasons that cause the data loss to exceed your
recovery point objective?
Ans:
There are mainly two reasons:
High Network Latency or Low Network Throughput Causes Log Build-up on
the Primary Replica. The primary replica activates flow control on the log send
when it has exceeded the maximum allowable number of unacknowledged messages
sent over to the secondary replica. Until some of these messages have been
acknowledged, no more log blocks can be sent to the secondary replica. Since data
loss can be prevented only when they have been hardened on the secondary replica,
the build-up of unsent log messages increases potential data loss.
Disk I/O Bottleneck Slows Down Log Hardening on the Secondary Replica. If
the log file and the data file are both mapped to the same hard disk, reporting
workload with intensive reads on the data file will consume the same I/O resources
needed by the log hardening operation. Slow log hardening can translate to slow
acknowledgement to the primary replica, which can cause excessive activation of the
flow control and long flow control wait times.
Q. After an automatic failover or a planned manual failover without data loss on
an availability group, you find that the failover time exceeds your recovery time
objective (RTO). Or, when you estimate the failover time of a synchronous-
commit secondary replica (such as an automatic failover partner) using the
method in Monitor Performance for AlwaysOn Availability Groups, you find that
it exceeds your RTO. Can you explain the possible reasons that
cause the failover time to exceed your RTO?
Ans:
Reporting Workload Blocks the Redo Thread from Running: On the secondary
replica, the read-only queries acquire schema stability (Sch-S) locks. These Sch-S
locks can block the redo thread from acquiring schema modification (Sch-M) locks to
make any DDL changes. A blocked redo thread cannot apply log records until it is
unblocked. Once unblocked, it can continue to catch up to the end of log and allow
the subsequent undo and failover process to proceed.
Redo Thread Falls Behind Due to Resource Contention: When applying log
records on the secondary replica, the redo thread reads the log records from the log
disk, and then for each log record it accesses the data pages to apply the log record.
The page access can be I/O bound (accessing the physical disk) if the page is not
already in the buffer pool. If there is I/O bound reporting workload, the reporting
workload competes for I/O resources with the redo thread and can slow down the
redo thread.
Q. Let’s say you have configured automatic failover in a SQL Server 2012
AlwaysOn environment. An automatic failover was triggered but did not succeed in
making the secondary replica the PRIMARY. How do you identify that the failover
was not successful, and what are the possible reasons for an unsuccessful
failover?
Ans:
If an automatic failover event is not successful, the secondary replica does not
successfully transition to the primary role. Therefore, the availability replica will report
that this replica is in Resolving status. Additionally, the availability databases report that
they are in Not Synchronizing status, and applications cannot access these databases.
Possible Reasons for Unsuccessful Failover:
“Maximum Failures in the Specified Period” value is exhausted: The
availability group has Windows cluster resource properties, such as the Maximum
Failures in the Specified Period property. This property is used to avoid the indefinite
movement of a clustered resource when multiple node failures occur.
Insufficient NT Authority\SYSTEM account permissions: The SQL Server
Database Engine resource DLL connects to the instance of SQL Server that is hosting
the primary replica by using ODBC in order to monitor health. The logon credentials
that are used for this connection are the local SQL Server NT AUTHORITY\SYSTEM
login account. By default, this local login account is granted the following
permissions: 1.Alter Any Availability Group, 2.Connect SQL, 3.View server state. If
the NT AUTHORITY\SYSTEM login account lacks any of these permissions on the
automatic failover partner (the secondary replica), then SQL Server cannot start
health detection when an automatic failover occurs. Therefore, the secondary replica
cannot transition to the primary role. To investigate and diagnose whether this is the
cause, review the Windows cluster log.
The availability databases are not in a SYNCHRONIZED state: In order to
automatically fail over, all availability databases that are defined in the availability
group must be in a SYNCHRONIZED state between the primary replica and the
secondary replica. When an automatic failover occurs, this synchronization condition
must be met in order to make sure that there is no data loss. Therefore, if one
availability database in the availability group is in the synchronizing or not synchronized
state, automatic failover will not successfully transition the secondary replica into the
primary role.
Q. Have you ever seen the Error 41009?
Ans:
Yes! This error might occur when you try to create multiple availability groups in a SQL
Server 2012 AlwaysOn failover clustering environment. This issue can be resolved by
applying Cumulative Update Package 2.
Q. Let’s say you added a new file to a database which is a part of AlwaysOn
Availability Groups. The add file operation succeeded on primary replica but
failed in secondary replica. What is the impact and how you troubleshoot?
Ans:
This might happen due to a different file path between the systems that host the primary
and secondary replicas. A failed add-file operation will cause the secondary database to be
suspended. This, in turn, causes the secondary replica to enter the NOT
SYNCHRONIZING state.
Resolution:
Remove the secondary database from the availability group.
On the existing secondary database, restore a full backup of the filegroup that
contains the added file to the secondary database, using WITH NORECOVERY and
WITH MOVE (Specify the correct file path as per secondary).
Back up the transaction log that contains the add-file operation on the primary
database, and manually restore the log backup on the secondary database using
WITH NORECOVERY and WITH MOVE. Restore the last transaction log file WITH
NORECOVERY.
Rejoin the secondary database to the availability group.
Q. Can you write a T-SQL statement for joining a replica to an availability group?
(AG name “ProAG”)
Ans:
Connect to the server instance that hosts the secondary replica and issue the below
statement:
ALTER AVAILABILITY GROUP ProAG JOIN;
The same operation can be done using SSMS or PowerShell.
Q. Data synchronization state for one of the availability database is not healthy.
Can you tell me the possible reasons?
Ans:
If this is an asynchronous-commit availability replica, all availability databases should be
in the SYNCHRONIZING state. If this is a synchronous-commit availability replica, all
availability databases should be in the SYNCHRONIZED state. This issue can be caused
by the following:
The availability replica might be disconnected.
The data movement might be suspended.
The database might not be accessible.
There might be a temporary delay issue due to network latency or the load on the
primary or secondary replica.
Q. Let’s say we have a premium production server and it is in an AlwaysOn
Availability Group. You observe that CPU utilization peaks at a specific
time of day. You did an RCA and found that most of the CPU is consumed by the
backup process because backup compression is on. Now what
do you suggest? Do we have any features for backup?
Ans:
Yes! There is an option to perform backups from secondary replicas. In the
Availability Group properties we can find “Backup Preferences” and choose one of
the following options:
Prefer Secondary: Backups are performed on a secondary replica; if no secondary
replica is available, they are performed on the primary
Secondary Only: Backups should be done from secondary only
Primary: Must occur on Primary Replica
Any Replica: Can occur from any replica in Availability Group
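The preference is only advisory; the backup job itself has to honor it. A minimal sketch, assuming a group named ProAG, a database named SalesDb, and a hypothetical backup share, with the same job deployed on every replica:

```sql
-- On the primary replica: prefer secondaries for automated backups.
ALTER AVAILABILITY GROUP [ProAG] SET (AUTOMATED_BACKUP_PREFERENCE = SECONDARY);

-- In the backup job on every replica: back up only where the preference says so.
IF sys.fn_hadr_backup_is_preferred_replica(N'SalesDb') = 1
BEGIN
    -- Full backups taken on a secondary replica must be COPY_ONLY.
    BACKUP DATABASE [SalesDb]
        TO DISK = N'\\fileshare\backup\SalesDb_full.bak'
        WITH COPY_ONLY, COMPRESSION;
END;
```

With this pattern the compression CPU cost lands on whichever replica the function selects, rather than always on the primary.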
Q. Are there any specific limitations if we need to perform automated backups from
secondary replicas?
Ans:
Yes! There are a few:
Only COPY_ONLY full backups are allowed on a secondary replica.
Differential backups are not allowed on a secondary replica.
Log backups can be performed on different secondary replicas, and all these
backups maintain a single log chain (LSN sequence). This might help in some
situations.
Q. Have you ever applied patches / CU / service packs on Alwayson Availability
Groups? Did you face any issues while applying?
Ans:
Yes! I have applied CUs and service packs. With SQL Server 2012 SP2 Cumulative Update 4
I had a bad experience with AlwaysOn AG:
After CU4 was applied, we saw that the AlwaysOn Availability Groups were in a Not
Synchronizing state.
After RCA we found that there was heavy blocking between user sessions and an
unknown session, CHECKPOINT, with the command running as “DB_STARTUP”.
Through the MSDN site we found that Microsoft had declared it a bug, and the solution
chosen was as below:
We had to open an outage:
Disable Automatic Failover
Restart the SQL Server on Primary Replica
Re-enable automatic failover.
This worked and fixed the issue.
Q. Can you explain any difficult issue you have faced recently on High
Availability Groups?
Ans:
Sure! We were configuring AlwaysOn AG on SQL Server 2014.
We had taken a backup from the primary replica and restored it on the secondary replica.
When we tried to add the secondary replica to the availability group, to our surprise
SQL Server shut down and we found the error message:
(Error: 3449, Severity: 21, State: 1.
SQL Server must shut down in order to recover a database (database ID 1). The
database is either a user database that could not be shut down or a system database.
Restart SQL Server. If the database fails to recover after another startup, repair or
restore. SQL Trace was stopped due to server shutdown. Trace ID = ‘1’. This is an
informational message only; no user action is required. )
Cause:
We did RCA and found the below.
Service broker is enabled at Primary Replica
We have taken a full backup from Primary Replica
Restored on Secondary Replica where Service Broker is not enabled
When we tried to add the secondary replica to the AG, Service Broker was enabled and
the same GUID on the availability database was detected, which caused a silent error 9772:
“The Service Broker in database “<dbname>” cannot be enabled because there is
already an enabled Service Broker with the same ID”.
This resulted in error 3449 and shut down SQL Server unexpectedly.
Solution:
This has been fixed by applying the CU1 on SQL Server 2014.
Q. Replica is in “resolving” status? What does it mean?
Ans:
A replica goes into the “RESOLVING” state when an automatic failover is not successful.
Additionally, the availability databases report that they are in the Not Synchronizing
state and are not accessible.
Q. What are the top reasons that cause an unsuccessful failover?
Ans:
The number of automatic failovers in a specific period may have exceeded the
“Maximum Failures in the Specified Period” value
Insufficient NT Authority\SYSTEM account permissions
The availability databases are not in a SYNCHRONIZED state