HackerRank SQL 5 Stars PDF
HackerRank SQL 5 Stars PDF
HackerRank SQL 5 Stars PDF
Problems
Hackerrank.com
By: Majdi Awad
Senior PHP & SQL Developer
23/09/2023
Abstract
In an era where data reigns supreme, proficiency in Structured Query Language (SQL) has
become a vital skill for anyone venturing into the realm of data management and analysis.
SQL is not just a database querying language; it is the key to unlocking the vast treasure
troves of information stored in databases worldwide. To embark on this journey of SQL
mastery, one must not merely skim the surface but delve deep into the art and science of
data manipulation. This document, borne out of my own experiences and ambitions, serves
as a comprehensive guide for those seeking to enhance their SQL prowess through
hands-on problem-solving.
The world we live in today is shaped by data. From e-commerce giants to healthcare
institutions, every sector relies on data to drive decision-making and innovation. In the heart
of this data-driven revolution stands SQL, the quintessential programming language for
managing, querying, and extracting insights from databases. SQL proficiency is not just a
checkbox on a resume; it's a gateway to a world of opportunities and the cornerstone of a
successful career in data science, database administration, and software development.
Practice makes perfect, and nowhere is this adage truer than in the world of SQL. SQL is not
a language that can be fully grasped through passive learning; it demands active
engagement and hands-on experience. Imagine learning to play a musical instrument solely
by reading music theory books without ever touching the instrument itself – the result would
be far from mastery. Similarly, to truly master SQL, one must apply theoretical knowledge to
practical scenarios.
Problem-solving is the bridge between theory and practice. It empowers SQL practitioners
to:
1. Searching and Retrieval Algorithms: SQL SELECT queries involve searching for
specific data in databases. Understanding search algorithms, such as linear search
and binary search, can help optimise query performance, especially when dealing
with large datasets.
2. Sorting Algorithms: Sorting is a common operation in SQL, especially when retrieving
data in a specific order. Familiarity with sorting algorithms like quicksort and
mergesort can be advantageous.
3. Join Algorithms: SQL often involves joining data from multiple tables. Knowing join
algorithms, such as nested loop joins and hash joins, is essential for efficient query
writing.
4. Aggregation Algorithms: Aggregation operations like SUM, COUNT, and AVG require
algorithms to compute results. Understanding aggregation techniques can improve
query efficiency.
5. Indexing Techniques: Indexing is crucial for quick data retrieval. Learn about
database indexing methods like B-tree and hash indexing to optimise query
performance.
6. Data Transformation: SQL is used to clean and transform data. Algorithms for data
parsing, regular expressions, and data conversion are essential skills.
7. Recursion: Recursive queries are becoming more common in SQL. Understanding
recursion and recursive common table expressions (CTEs) is valuable.
To master SQL through problem-solving, one must adopt a systematic approach. Each
problem is an opportunity to learn, and it's crucial to read, understand, solve, and explain
them methodically:
1. Reading the Problem: Begin by carefully reading and analysing the problem
statement. Pay attention to the provided dataset, constraints, and expected output.
2. Understanding the Problem: Before diving into solutions, ensure you fully grasp the
problem's requirements, constraints, and objectives. Break down the problem into
smaller, manageable components.
3. Solving the Problem: Apply your SQL skills to craft a solution. Start with a plan, write
SQL queries or scripts, and test them rigorously on sample data.
4. Explaining the Solution: Document your thought process and solution step-by-step.
Clear and concise explanations not only help reinforce your understanding but also
aid others in learning from your solution.
Reasons Behind This Document
Several factors have inspired me to create this document. First and foremost, the
ever-increasing importance of SQL in the job market cannot be overstated. A proficiency in
SQL is often a prerequisite for roles in data analysis, database administration, and software
development. By sharing my journey of SQL mastery through problem-solving, I hope to
empower others to unlock their potential and seize these career opportunities.
Secondly, the wealth of online resources, particularly platforms like HackerRank, offers a
treasure trove of SQL problems waiting to be solved. Navigating this vast sea of problems
can be daunting, and having a structured guide can significantly ease the learning process.
Lastly, as a job seeker in the field of data, I understand the value of a strong online
presence. Sharing this document is not only a testament to my commitment to SQL mastery
but also a plea for support from the community. In the competitive job market, networking
and sharing knowledge can make all the difference.
This document is more than just a collection of words; it is a roadmap to SQL mastery.
Reading and studying this document can offer you several advantages:
1. Structured Learning: The document provides a structured learning path for SQL
problem-solving, making it easier to progress from novice to expert.
2. Algorithmic Insights: By delving into the algorithms and techniques used in SQL, you
gain a deeper understanding of the language's inner workings.
3. Confidence Building: The problem-solving approach outlined here builds confidence
by enabling you to tackle increasingly complex SQL challenges.
4. Career Advancement: SQL proficiency is a valuable asset in today's job market.
Mastery can open doors to a variety of exciting and well-paying career opportunities.
5. Community Support: Sharing this document not only benefits you but also contributes
to a thriving community of SQL enthusiasts. By collaborating and sharing knowledge,
we can grow together.
I ask you, dear reader, to not only benefit from this document but also to share it far and
wide. Your support and contributions will not only help me in my career but also create a
ripple effect of knowledge-sharing and collaboration within the SQL community.
Let us unlock the true potential of SQL through problem-solving and chart a course towards
a future enriched with data-driven insights.
Majdi Awad
Senior PHP & SQL Developer
23/09/2023
Problem Number 1
Query the two cities in STATION with the shortest and longest CITY names, as well as their
respective lengths (i.e.: number of characters in the name). If there is more than one
smallest or largest city, choose the one that comes first when ordered alphabetically.
The STATION table is described as follows:
Explanation:
In this problem, I am asked to imagine the existence of a table called "Station" that contains
several columns, one of which, according to the problem statement, is the "City" column. I
am then asked to find the longest city name, display it along with its length, and
subsequently find the shortest city name, display it along with its length.
Solution:
To solve this SQL problem, you can start by writing a query that retrieves the cities and their
corresponding lengths from the "STATION" table. For the longest city name, you can use the
MAX() function combined with the LENGTH() function to find the maximum length of city
names. To find the shortest city name, you can use the MIN() function in conjunction with
LENGTH(). Additionally, you can use the ORDER BY clause to ensure that in cases where
there are multiple cities with the longest or shortest names, the one that comes first
alphabetically is selected. The algorithm used here primarily relies on aggregation functions
(MAX and MIN) and the LENGTH function to identify the longest and shortest city names
and their lengths, and ordering the results accordingly.
The meaning of the statement in the previous paragraph is that there are at least two ways
to solve this problem, which are:
In this method of solving, I will first determine the city's name and its letter count using the
'length' function from the 'station' table. Then, I will set a condition where the value (number
of letters in the city's name) is equal to the result of another query in which I determine the
maximum value of letter counts using the 'max' function from the 'station' table. After that, I
will sort the results based on the city name and request to display only one record using
'limit'.
This query uses MAX(LENGTH(CITY)) to find the maximum length of city names and then
retrieves the city with that maximum length. If there are multiple cities with the same longest
name, it selects the one that comes first alphabetically.
By this method, I will have been able to find the longest city name and its letter count. To find
the shortest city name and determine its letter count, I will repeat the same approach, but
here I will use the 'min' function.
The second query uses MIN(LENGTH(CITY)) to find the minimum length of city names and
retrieves the city with that minimum length. If there are multiple cities with the same shortest
name, it selects the one that comes first alphabetically.
These queries will give me the desired results using the MAX() and MIN() functions.
This method is very simple and essentially relies on sorting city names according to the
number of letters that make up each city's name. If we assume, for example, that the city
column in the table contains the following cities:
1. Dubai
2. Abu Dhabi
3. Sharjah
According to this method of solving, I would first need to determine the number of letters in
each city as follows:
1. Dubai => 5
2. Abu Dhabi => 8
3. Sharjah => 7
Now, I will arrange the cities in descending order based on the number of letters in each city
as follows:
1. Abu Dhabi => 8
2. Sharjah => 7
3. Dubai => 5
Now, all I need to do is display the data in the record of the first city (the one with the largest
number of letters), and I will have successfully found the longest city name.
By the same method, I can find the city with the fewest letters, and all I need to change is the
sorting order of the cities, arranging them in ascending order based on the number of letters
from the least to the most.
This code will give us two results. The first result will be the city with the shortest name and its
length, and the second result will be the city with the longest name and its length. If there are
multiple cities with the same shortest or longest name, the one that comes first alphabetically
will be selected.
Programmers are akin to problem-solving virtuosos, and their toolbox is replete with various
techniques and approaches to tackle challenges in software development. What sets
exceptional programmers apart is not their reliance on a singular, fixed method but their
adaptability and creativity in devising multiple solutions to a single problem. This versatility is
a testament to the richness of the programming world, where multiple pathways often lead to
the same outcome. The essence of effective programming lies not solely in syntax or rote
memorization but in the ability to think critically and strategically. It's about fostering a
developer's mindset, one that thrives on breaking down complex issues into manageable
components, analysing them from different angles, and architecting elegant solutions.
Consequently, the hallmark of a skilled programmer is not just proficiency in a particular
language or technology, but the capacity to think abstractly, to foresee potential pitfalls, and
to engineer elegant, scalable, and maintainable code. In this dynamic and ever-evolving
field, adaptability and problem-solving prowess are paramount. It is this capacity to think as
a developer, to navigate the labyrinth of possibilities, and to choose the most efficient and
elegant path that truly defines a programmer's worth in the world of software development.
Problem Number 2
Query the list of CITY names starting with vowels (i.e., a, e, i, o, or u) from STATION. Your
result cannot contain duplicates.
Input Format
Explanation:
This problem entails extracting a unique list of CITY names from the STATION table that
begin with vowels, specifically, the vowels 'a,' 'e,' 'i,' 'o,' or 'u.' In simpler terms, you are
tasked with finding and listing the names of cities in the STATION table that start with any of
these five vowels.
To achieve this, you'll need to query the STATION table, apply a condition that filters for city
names starting with vowels, and ensure that the final result contains distinct city names
without any duplicates.
The STATION table likely contains information about various cities, including their names.
This SQL query is designed to extract a specific subset of city names based on the
vowel-starting criterion, providing you with a clear and concise list of these cities without
repetition.
In summary, the task is to identify and list unique city names from the STATION table that
have vowels as their initial letter, adhering to the specified input format.
Solution:
This problem is very simple and doesn't require programming skills as much as it needs a
good knowledge of SQL and its tools. The solution is extremely straightforward; I will first
select city names without duplicates, then I will apply conditions to the query, in short, stating
that the query should display cities that start with the letter A, E, I, and so on.
1. SELECT DISTINCT CITY: This part selects the distinct (unique) CITY names from
the STATION table.
2. FROM STATION: Specifies the source table as STATION.
3. WHERE CITY LIKE 'a%' OR CITY LIKE 'e%' OR CITY LIKE 'i%' OR CITY LIKE 'o%'
OR CITY LIKE 'u%': This condition filters CITY names based on whether they start
with any of the specified vowels. The % is a wildcard character that matches any
number of characters following the specified vowel.
By using DISTINCT, you ensure that the result contains only unique CITY names without
duplicates. This query will retrieve the list of CITY names that start with vowels ('a,' 'e,' 'i,' 'o,'
or 'u') from the STATION table.
Problem Number 3
Query the list of CITY names ending with vowels (a, e, i, o, u) from STATION. Your result
cannot contain duplicates.
Input Format
I won't go into much detail in explaining this problem since it's exactly the same as the
previous one, except here, I'm required to display cities that end with vowel letters. The
repetition of this type of problem in HackerRank prompts me to emphasise the following:
In the ever-evolving world of programming, where new languages and frameworks seem to
emerge almost daily, it's easy to get caught up in the excitement of diving straight into
problem-solving without taking a step back to understand the fundamental structure of the
language itself. This tendency to rush headlong into coding is not uncommon, especially
among eager beginners eager to tackle real-world challenges. However, it's crucial to
emphasise that before delving deep into complex problem-solving, a programmer must have
a solid grasp of the language's syntax, semantics, and core principles.
Programming languages are like the building blocks of software development. They are the
tools that allow developers to communicate with computers, instructing them on how to
perform specific tasks. Just as a builder needs to understand the properties of different
materials and the fundamentals of construction before designing intricate structures, a
programmer must comprehend the structure of the language they are working with. This
includes understanding how data is stored, how functions are defined and called, how
variables are scoped, and how control flow is managed.
A deep understanding of the language is not merely an academic exercise; it's a practical
necessity. Without it, programmers may find themselves stumbling through their code,
encountering cryptic error messages, and struggling to identify the root causes of bugs. They
may resort to copying and pasting code snippets from the internet without truly
comprehending what those snippets do, leading to a fragile and unsustainable codebase.
Furthermore, a solid understanding of the language is the foundation upon which more
advanced problem-solving skills are built. It enables programmers to think critically about
how to structure their code, choose the right data structures and algorithms, and optimise for
performance. It empowers them to craft elegant and maintainable solutions to complex
problems.
Consider a programming language as a toolkit, and each feature of the language as a tool
within that kit. To use these tools effectively, a programmer must not only know that they
exist but also understand how and when to use them. This understanding goes beyond
memorising syntax; it involves grasping the underlying concepts and principles.
Solution:
1. SELECT DISTINCT CITY: This part of the query selects unique (distinct) values from
the "CITY" column of the "STATION" table. It means that if there are multiple rows
with the same city name, only one instance of that city name will be returned in the
result set.
2. FROM STATION: This specifies the table from which the data is being queried, in this
case, the "STATION" table.
3. WHERE CITY LIKE '%a' OR CITY LIKE '%e' OR CITY LIKE '%i' OR CITY LIKE '%o'
OR CITY LIKE '%u': This is the filtering condition. It's using the LIKE operator with
the wildcard symbol % to match city names that end with the vowels 'a,' 'e,' 'i,' 'o,' or
'u.' Each LIKE condition is combined with the OR operator, meaning that if any of
these conditions is true for a city name, it will be included in the result.
So, the query retrieves distinct city names from the "STATION" table where the city name
ends with any of the specified vowels. The result will be a list of unique city names that meet
this criteria.
Problem Number 4
Query the list of CITY names from STATION which have vowels (i.e., a, e, i, o, and u) as
both their first and last characters. Your result cannot contain duplicates.
All that's needed in the query is for me to extract the names of cities that both start and end
with one of the vowel letters mentioned in the problem statement. I can easily accomplish
this using the tools and functions provided in SQL.
To achieve this, the code utilises SQL string functions: the LEFT() function extracts the first
character of each CITY name, and the RIGHT() function extracts the last character. The
LOWER() function is used to ensure a case-insensitive comparison. The query then filters
CITY names where both the first and last characters are lowercase vowels, effectively
creating a case-insensitive search for CITY names that meet the specified criteria, and it
ensures that the result contains no duplicates.
Solution:
This query selects CITY names that start and end with vowels, and it doesn't use DISTINCT,
so the result may contain duplicates if there are multiple occurrences of the same CITY
name that meets the criteria.
Problem Number 5
Query the list of CITY names from STATION that do not start with vowels. Your result cannot
contain duplicates.
Input Format
It’s another problem, HackerRank aims to assess your proficiency in SQL as a way of
preparing you for more advanced and complex challenges.
In SQL, the 'NOT LIKE' operator is a powerful tool used to filter and retrieve data from a
database that does not match a specified pattern or condition. Unlike the 'LIKE' operator,
which is used to search for data that matches a particular pattern, 'NOT LIKE' does the
opposite by excluding data that meets a given criterion. It allows SQL developers to query
and extract records that do not adhere to a specific pattern or do not contain a particular
substring.
The 'NOT LIKE' operator is particularly useful in scenarios where you want to narrow down
your search to exclude certain types of data. For instance, you can use 'NOT LIKE' to find all
customers whose email addresses do not end with a specific domain, or to retrieve records
of products that do not have a particular keyword in their descriptions.
This operator relies on wildcard characters, such as '%' and '_', which further enhance its
flexibility. By incorporating 'NOT LIKE' with these wildcards, SQL developers can craft
complex and precise queries to filter out unwanted data, making it an indispensable tool for
data manipulation and retrieval tasks. Whether it's in data cleansing, data analysis, or report
generation, the 'NOT LIKE' operator plays a vital role in ensuring that SQL queries yield
accurate and relevant results by excluding records that do not conform to specified criteria.
to retrieve a unique list of CITY names from the "STATION" table while applying a specific
filtering condition. This condition, implemented through a series of 'NOT LIKE' clauses,
excludes CITY names that start with any of the vowels 'a,' 'e,' 'i,' 'o,' or 'u.' In essence, it
seeks to extract CITY names that do not commence with these vowel letters.
By utilising 'DISTINCT,' the query ensures that the result set contains only distinct CITY
names, eliminating duplicates. This code is particularly useful when there is a need to filter
and display a curated list of CITY names that do not conform to the specified vowel-based
criteria. It showcases the versatility of SQL in data retrieval and demonstrates how SQL can
be employed to meticulously refine query results based on specific patterns or conditions.
Solution:
SELECT DISTINCT CITY FROM STATION WHERE CITY NOT LIKE 'a%' AND CITY NOT
LIKE 'e%' AND CITY NOT LIKE 'i%' AND CITY NOT LIKE 'o%' AND CITY NOT
LIKE 'u%';
1. SELECT DISTINCT CITY: This part of the query selects the distinct (unique) CITY
names from the "STATION" table. It ensures that the result set will only contain
unique city names without any duplicates.
2. FROM STATION: Specifies that you are querying the "STATION" table as the source
of data.
3. WHERE CITY NOT LIKE 'a%': This line sets a condition for filtering the data. It
selects CITY names that do not start with the letter 'a'. The '%' symbol is a wildcard
that represents any sequence of characters following the 'a'.
4. AND CITY NOT LIKE 'e%': This line is similar to the previous one but filters out CITY
names that do not start with the letter 'e'.
5. AND CITY NOT LIKE 'i%': This line filters out CITY names that do not start with the
letter 'i'.
6. AND CITY NOT LIKE 'o%': This line filters out CITY names that do not start with the
letter 'o'.
7. AND CITY NOT LIKE 'u%': Finally, this line filters out CITY names that do not start
with the letter 'u'.
The result of this query will be a list of distinct CITY names from the "STATION" table, where
none of them start with any of the specified vowels ('a,' 'e,' 'i,' 'o,' or 'u'). In essence, it
provides a unique list of CITY names that do not commence with these vowel letters.
Problem Number 6
Query the list of CITY names from STATION that do not end with vowels. Your result cannot
contain duplicates.
Input Format
Explanation:
Yes, it is the same problem as the previous one, about 99%. While it may seem repetitive to
solve problems in this manner, please do not forget that we have not yet completed the
second level of exercises. It's natural that the goal of these exercises is learning, not just
testing the level. Therefore, repeating some problems with slight variations can be highly
beneficial.
Repetition and practising problems with small variations are invaluable techniques in the
realm of education, particularly in fields like mathematics, science, and programming. This
approach, often referred to as deliberate practice, plays a pivotal role in enhancing a
student's learning experience and mastery of a subject. Here's a comprehensive look at how
this strategy aids students in their academic journey.
First and foremost, repetition fosters a deeper understanding of concepts. When students
encounter the same problem multiple times with slight variations, they are compelled to think
critically about the underlying principles at play. This repeated exposure allows them to
grasp the fundamental concepts and nuances better. It's akin to practising a musical
instrument or a sport; the more you rehearse, the more finely tuned your skills become.
Variation in problem-solving is another key aspect. When problems are presented with minor
differences, students are required to adapt their problem-solving strategies. This encourages
them to think flexibly and creatively, which are essential skills in real-world problem-solving
scenarios. For instance, in mathematics, varying the coefficients or parameters in equations
or changing the initial conditions in physics problems forces students to apply their
knowledge in new ways.
Repetition also aids in memory retention. Revisiting a problem several times over a period
reinforces the information and helps transfer it from short-term to long-term memory. When
students encounter a similar problem in future coursework or exams, they are more likely to
recall the solution or the methodology used to solve it. This not only boosts their confidence
but also improves their overall retention of the subject matter.
Furthermore, practising with variations cultivates resilience and adaptability. Students learn
that not all problems are identical, and they become adept at adjusting their strategies to suit
the specifics of each situation. This adaptability extends beyond academic settings and
proves valuable in navigating life's challenges, where problem-solving often requires a
flexible mindset.
Moreover, repetition and small variations can help alleviate test anxiety. When students have
encountered similar problems during their practice sessions, they enter exams with greater
confidence. They know they have the skills and knowledge to handle whatever variations the
test might throw at them, reducing anxiety and improving performance.
In conclusion, the practice of repeating problems with slight variations is a powerful tool for
students' academic growth and skill development. It deepens understanding, enhances
memory retention, cultivates adaptability, builds problem-solving skills, and bolsters
confidence. This deliberate practice methodology is not just about rote learning but about
fostering a holistic and robust learning experience that equips students with the tools they
need to excel academically and succeed in real-world challenges.
Solution:
SELECT DISTINCT CITY FROM STATION WHERE CITY NOT LIKE '%a' AND CITY NOT
LIKE '%e' AND CITY NOT LIKE '%i' AND CITY NOT LIKE '%o' AND CITY NOT
LIKE '%u';
1. SELECT DISTINCT CITY: This part of the query selects the distinct (unique) CITY
names from the "STATION" table. It ensures that the result set will only contain
unique city names without any duplicates.
2. FROM STATION: Specifies that you are querying the "STATION" table as the source
of data.
3. WHERE CITY NOT LIKE '%a': This line sets a condition for filtering the data. It
selects CITY names that do not contain the letter 'a' anywhere within the name. The
'%' symbol is a wildcard that represents any sequence of characters before or after
the 'a'.
4. AND CITY NOT LIKE '%e': This line is similar to the previous one but filters out CITY
names that do not contain the letter 'e' anywhere in the name.
5. AND CITY NOT LIKE '%i': This line filters out CITY names that do not contain the
letter 'i' anywhere in the name.
6. AND CITY NOT LIKE '%o': This line filters out CITY names that do not contain the
letter 'o' anywhere in the name.
7. AND CITY NOT LIKE '%u': Finally, this line filters out CITY names that do not contain
the letter 'u' anywhere in the name.
The result of this query will be a list of distinct CITY names from the "STATION" table, where
none of them contain any of the specified vowels ('a,' 'e,' 'i,' 'o,' or 'u') anywhere within the
name. In essence, it provides a unique list of CITY names that do not contain these vowel
letters.
Input Format
Explanation:
Another problem following the same approach and solution is very straightforward.
Therefore, I won't delve into explaining the solution mechanism here, but rather, I'll focus on
the difference between AND & OR.
In SQL, the logical operators "AND" and "OR" play a pivotal role in filtering and combining
conditions in SQL queries. Understanding the difference between these two operators, as
well as when and where to use them, is essential for crafting precise and effective SQL
statements.
"AND" Operator:
The "AND" operator is used to combine multiple conditions in a SQL query, and it requires
that all of the combined conditions must evaluate to true for a row to be included in the result
set. In other words, it acts as a strict filter, narrowing down the result set to only those rows
that satisfy all specified conditions simultaneously. For example, if you have two conditions,
A and B, in a WHERE clause connected by "AND," the query will return rows that satisfy
both condition A and condition B.
"OR" Operator:
On the other hand, the "OR" operator is used to combine conditions in such a way that a row
will be included in the result set if it satisfies any of the specified conditions. Unlike "AND,"
"OR" provides a more permissive filter, broadening the result set to include rows that meet at
least one of the conditions. For instance, if you have two conditions, X and Y, connected by
"OR," the query will return rows that satisfy either condition X or condition Y.
1. Precedence Rules: "AND" typically takes precedence over "OR" in SQL, but you can
use parentheses to control the order of evaluation. For example, (A AND B) OR C
will ensure that A and B are evaluated together before the "OR" condition.
2. Combining Multiple Conditions: You can use both "AND" and "OR" operators within
the same query to create complex conditions. It's crucial to understand how they
interact to achieve the desired filtering.
3. Use Cases:
a. Use "AND" when you want to narrow down the result set by ensuring that
multiple conditions are met simultaneously. For example, to find employees
who are both managers and have more than five years of experience.
b. Use "OR" when you want to broaden the result set to include rows that meet
at least one of several conditions. For instance, to find products that are either
in stock or on sale.
4. Combining with Other Operators: "AND" and "OR" can be combined with other
operators like "=" (equal), "<>" (not equal), ">" (greater than), "<" (less than), etc., to
create complex filtering criteria.
5. Use of Parentheses: When combining multiple conditions, especially a mix of "AND"
and "OR," using parentheses is crucial to ensure the correct order of evaluation and
to clearly convey the intended logic.
In summary, "AND" and "OR" are fundamental logical operators in SQL that enable you to
filter data based on multiple conditions. "AND" narrows down results by requiring all
conditions to be true, while "OR" broadens results by accepting rows that meet at least one
of the conditions. The proper use of these operators, along with parentheses for clarity, is
essential for crafting accurate and effective SQL queries.
To retrieve a list of CITY names from the STATION table based on two distinct conditions:
either the names should not start with vowels ('a,' 'e,' 'i,' 'o,' or 'u') or they should not end with
vowels. To ensure the result is clear and concise, the query incorporates the DISTINCT
keyword, eliminating any duplicate CITY names. By structuring the conditions with the NOT
LIKE clauses and utilising the logical OR operator, the query effectively captures CITY
names that meet either of the specified criteria. This solution demonstrates the versatility of
SQL in handling complex filtering tasks, enabling the extraction of valuable data insights
from the database. It showcases how SQL can be employed to solve real-world problems
with precision and flexibility, producing a result set that adheres to the unique requirements
of the query while maintaining data integrity.
Solution:
SELECT DISTINCT CITY FROM STATION WHERE CITY NOT LIKE 'a%' AND CITY NOT
LIKE 'e%' AND CITY NOT LIKE 'i%' AND CITY NOT LIKE 'o%' AND CITY NOT
LIKE 'u%' OR CITY NOT LIKE '%a' AND CITY NOT LIKE '%e' AND CITY NOT
LIKE '%i' AND CITY NOT LIKE '%o' AND CITY NOT LIKE '%u';
1. SELECT DISTINCT CITY selects distinct CITY names from the STATION table to
eliminate duplicates.
2. FROM STATION specifies the source table as STATION.
3. The WHERE clause contains two sets of conditions separated by OR. The first set
filters CITY names that do not start with vowels ('a,' 'e,' 'i,' 'o,' or 'u'), and the second
set filters CITY names that do not end with vowels.
By using DISTINCT and the OR operator, this query will retrieve a list of CITY names from
the STATION table where the names either do not start with vowels or do not end with
vowels, and the result set will not contain any duplicate CITY names.
Problem Number 8
Query the list of CITY names from STATION that do not start with vowels and do not end
with vowels. Your result cannot contain duplicates.
Input Format
Explanation:
Another problem focuses on using both 'AND' and 'OR' and is concerned with searching
based on the letter that words start with or end with.
In the previous problem, I discussed in detail the difference between "AND" and "OR."
However, since we are dealing with a similar problem here, allow me to elaborate on the
mechanisms and methods of searching based on the letters that words start with or end
with.
To search for words based on their initial or final letters, you can use the SQL LIKE operator
with wildcard characters. The percent sign % is used as a wildcard to match any sequence
of characters, and the underscore _ matches a single character. Here's how you can use
these in SQL queries:
These SQL queries, using the LIKE operator with appropriate patterns, allow you to search
for words based on the specific letters that they start with or end with. You can adapt these
techniques to your specific database and table as needed.
Solution:
SELECT DISTINCT CITY FROM STATION WHERE CITY NOT LIKE 'a%' AND CITY NOT
LIKE 'e%' AND CITY NOT LIKE 'i%' AND CITY NOT LIKE 'o%' AND CITY NOT
LIKE 'u%' AND CITY NOT LIKE '%a' AND CITY NOT LIKE '%e' AND CITY NOT
LIKE '%i' AND CITY NOT LIKE '%o' AND CITY NOT LIKE '%u';
Problem Number 9
Query the Name of any student in STUDENTS who scored higher than Marks. Order your
output by the last three characters of each name. If two or more students both have names
ending in the same last three characters (i.e.: Bobby, Robby, etc.), secondary sort them by
ascending ID.
Input Format
The Name column only contains uppercase (A-Z) and lowercase (a-z) letters.
Sample Input:
Sample Output
● Ashley
● Julia
● Belvet
● Explanation
Only Ashley, Julia, and Belvet have Marks > . If you look at the last three characters of each
of their names, there are no duplicates and 'ley' < 'lia' < 'vet'.
Explanation:
To retrieve the names of students from the "STUDENTS" table who have scored higher than
a specified "Marks" threshold. It employs the WHERE clause to filter the students based on
their marks, ensuring that only those meeting the condition are included. The ORDER BY
clause arranges the result set first by the last three characters of each student's name using
the RIGHT() function, allowing names with the same ending characters to be grouped
together. In the event of a tie, the query further sorts the results by ascending student IDs,
ensuring a consistent and precise ordering of names. This approach effectively fulfils the
requirement of ordering by the last three characters and providing a secondary sort by
student ID when necessary.
Solution:
SELECT Name FROM STUDENTS WHERE Marks > 75 ORDER BY RIGHT(Name, 3), ID;
1. SELECT Name: Selects the names of students who meet the specified condition.
2. FROM STUDENTS: Specifies the source table as "STUDENTS."
3. WHERE Marks > 75]: Filters the students based on their marks.
4. ORDER BY RIGHT(Name, 3), ID: Orders the result set first by the last three
characters of each student's name using the RIGHT() function, and then by
ascending ID in case of ties. This arrangement ensures that names with the same
last three characters are sorted by their IDs.
Problem Number 10
Write a query that prints a list of employee names (i.e.: the name attribute) from the
Employee table in alphabetical order.
Input Format
The Employee table containing employee data for a company is described as follows:
where employee_id is an employee's ID number, name is their name, months is the total
number of months they've been working for the company, and salary is their monthly salary.
Explanation:
In SQL, the ORDER BY clause is a powerful feature used to sort the result set of a query in
a specific order based on one or more columns. It is a crucial component for presenting data
in a meaningful way. When using ORDER BY, you specify one or more columns by which
you want to sort the result set, and you can choose to sort in ascending (ASC) or
descending (DESC) order. For example, if you have a table of products and you want to
retrieve a list of products sorted by their prices in descending order to show the most
expensive items first, you would use an SQL query like this: SELECT * FROM Products
ORDER BY Price DESC. The result would be a list of products sorted from the highest to the
lowest price. ORDER BY is essential for organising and presenting data in various
applications, from e-commerce websites to financial reports, making it a fundamental tool in
SQL for data manipulation and reporting.
To retrieve the names of employees from the "Employee" table and display them in
ascending alphabetical order. In other words, it will return a list of employee names sorted
from A to Z. The ORDER BY clause with ASC specifies that the sorting should be done in
ascending order based on the "name" column. This query is particularly useful when you
want to obtain a sorted list of employee names for various purposes such as generating
employee directories or creating reports where you need the data to be organised
alphabetically.
Solution:
1. SELECT name: This part of the query specifies the columns that should be included
in the result set. In this case, it selects only the "name" column from the "Employee"
table. So, the result will contain a list of employee names.
2. FROM Employee: This part of the query specifies the source table from which data is
being retrieved. In this case, it's the "Employee" table, which presumably contains
information about employees, including their names.
3. ORDER BY name ASC: This is the crucial part of the query. It instructs the database
to sort the result set in ascending order based on the "name" column. The "ASC"
keyword stands for "ascending," which means the names will be arranged
alphabetically from A to Z.
So, when you execute this SQL query, you will receive a list of employee names from the
"Employee" table, and they will be displayed in alphabetical order, starting with names that
begin with 'A' and ending with names that start with 'Z'. This can be useful for various
purposes, such as creating an employee directory or obtaining a sorted list of names for
reporting and analysis.
Problem Number 11
Write a query that prints a list of employee names (i.e.: the name attribute) for employees in
Employee having a salary greater than per month who have been employees for less than
months. Sort your result by ascending employee_id.
Input Format
The Employee table containing employee data for a company is described as follows:
Explanation:
It's a very simple problem, and all I have to do here is to read it carefully to understand what
is required. It seems that HackerRank is still testing my knowledge of the language and has
not moved on to the level I really want to discuss in this document.
To retrieve a specific subset of employee data from the "Employee" table. It selects only the
"name" column from the table and applies two filter conditions: first, it checks that the
"salary" of each employee is greater than 2000, which means it's looking for employees
earning more than $2000. Second, it checks that the "months" worked by each employee
are less than 10, suggesting it's interested in employees who have worked at the company
for less than 10 months. After filtering the data based on these criteria, the result set is then
ordered in ascending order based on the "employee_id" column. This query might be useful
in scenarios where you need to identify employees who meet specific salary and tenure
conditions and want to display their names in a particular order for further analysis or
reporting.
Solution:
SELECT name FROM Employee WHERE salary > 2000 AND months < 10 ORDER BY
employee_id ASC;
1. SELECT name: This part of the query specifies that you want to retrieve the "name"
column from the "Employee" table. The result will consist of employee names.
2. FROM Employee: This clause indicates that you are querying the "Employee" table,
which presumably contains information about employees, including their names,
salaries, and tenure in months.
3. WHERE salary > 2000 AND months < 10: This part of the query filters the results. It
instructs the database to include only those rows where two conditions are met:
a. The "salary" of the employee is greater than 2000, meaning it selects
employees who earn more than $2000.
b. The "months" worked by the employee are less than 10, indicating employees
who have worked at the company for less than 10 months.
4. ORDER BY employee_id ASC: After filtering the results, this clause sorts them in
ascending order based on the "employee_id" column. As a result, the retrieved
employee names will be displayed in ascending order of their respective employee
IDs.
In summary, this query fetches the names of employees who earn more than $2000 and
have worked for less than 10 months, presenting them in ascending order of their employee
IDs. It's a useful query for identifying specific subsets of employees based on salary and
tenure criteria and ordering them for various reporting or analysis purposes.
Congrats majdi! You just solved all the Basic Select challenges in SQL!
Problem Number 12
Pivot the Occupation column in OCCUPATIONS so that each Name is sorted alphabetically
and displayed underneath its corresponding Occupation. The output column headers should
be Doctor, Professor, Singer, and Actor, respectively.
Note: Print NULL when there are no more names corresponding to an occupation.
Input Format
Occupation will only contain one of the following values: Doctor, Professor, Singer or Actor.
Explanation:
This is the first problem we've encountered here, and it requires additional skills beyond just
language knowledge. If you've ignored the long and tedious explanations in previous
problems, please make a strong effort not to do so here and pay close attention to the
explanation.
First, we need to understand what is required of us in this problem. The problem gives us a
table consisting of only two columns: the 'Name' column and the 'Occupation' column. Our
task is to transform the table so that instead of displaying a person's name and their
occupation side by side, we should display the table in such a way that we have a column for
each occupation, and then we place the names of the workers in each occupation below it
similar to the following example:
Main Table:
Name Occupation
Majdi Developer
Jihad Physical
Mohammed Driver
Khalid Lawer
Query Result:
Now, before we start writing, we need to think about the appropriate mechanism to solve this
problem, regardless of SQL or any other language. In other words, we need to formulate the
solution in our natural language.
To solve this problem, we need to identify the professions present in the profession column
and remove duplicates, leaving us with 4 professions according to the problem. Then we
need to use these professions to create 4 columns, each of which holds the name of a
profession from the mentioned professions. Finally, we need to categorise all the names
based on their professions. For example, if the name is 'Julia' and Julia is an actress, we
should write her name in the 'Actors' column, and so on until we classify all names according
to their professions.In simpler terms, I'll organise the data in rows without repeating
columns in the table.
In SQL, the PIVOT technique is used to transform data from a row-based format into a
column-based format. It's particularly useful when you want to aggregate or reorganise data
in a way that makes it easier to analyse or present. PIVOT is typically used with aggregate
functions to summarise data. Here's a basic explanation of how the PIVOT technique works:
1. Source Data: You start with a table that contains data in a row-based format. Each
row represents a record, and the columns contain various attributes or values.
2. Pivot Column: You identify a column in the source data that you want to pivot or
transform into new columns. This column is often referred to as the "pivot column."
3. Pivot Values: You determine the distinct values within the pivot column. These values
will become the new column headers in the output.
4. Aggregate Functions: You select one or more aggregate functions (e.g., SUM,
COUNT, MAX, AVG) that will be used to summarise data for each combination of
pivot column values.
5. Pivot Query: You write a SQL query that uses the PIVOT operation to reorganise the
data. This query specifies the pivot column, the pivot values, and the aggregate
functions to be applied.
Solution:
1. We first create a common table expression (CTE) called PivotData, which assigns a
row number (rn) to each Name within each Occupation, sorted alphabetically.
2. Then, we use the PIVOT technique to transform the data, using the MAX function to
display the names under their corresponding occupations based on the row number.
3. The GROUP BY rn groups the results by the row number, ensuring that each name
appears only once under its corresponding occupation.
4. The ORDER BY rn sorts the results alphabetically based on the row number.
5. NULL values will automatically be printed when there are no more names
corresponding to an occupation.
This query should give you the desired output with the specified column headers and NULL
values when necessary.
Problem Number 13
You are given a table, BST, containing two columns: N and P, where N represents the value
of a node in Binary Tree, and P is the parent of N.
Write a query to find the node type of Binary Tree ordered by the value of the node. Output
one of the following for each node:
Explanation:
This problem is one of the most prominent and important medium-level (closer to easy)
problems that you will solve on the HackerRank website. The first problem deals with data
structures and requires a bit of thinking and, of course, a good understanding of the basics
of SQL. So, before I start explaining the solution, I need to pause at some terms, and I
kindly request you not to skip the explanation here and to spend some time studying
it, especially if you are a beginner in the field of programming.
A data structure is a way of organising and storing data in a computer's memory or storage
system in a systematic and efficient manner. It provides a means to manage and manipulate
data effectively, allowing for quick access, retrieval, and modification of information. Data
structures are fundamental concepts in computer science and are used in various
applications and algorithms to solve complex problems.
There are many different types of data structures, each designed for specific purposes and
optimised for particular operations. Here are some common types of data structures:
Choosing the right data structure for a particular problem is crucial for optimising algorithms
and achieving efficient data manipulation. The choice of data structure depends on factors
such as the nature of the data, the operations to be performed, and the computational
resources available.
As an SQL developer, you primarily work with relational databases and SQL (Structured
Query Language) to manage and manipulate data. While SQL itself is not a data structure, it
is a language for querying and managing data within a relational database management
system (RDBMS). However, there are data structures and concepts that are closely related
to SQL development:
1. Tables: Tables are the core data structure in SQL. They store data in rows and
columns, where each row represents a record, and each column represents an
attribute or field. SQL developers create and manage tables to organise and store
data.
2. Indexes: Indexes are data structures used to optimise query performance. They
provide fast access to specific rows in a table based on the values in one or more
columns. SQL developers create indexes to speed up data retrieval operations.
3. Views: Views are virtual tables that provide a way to simplify complex queries or
restrict access to certain data. They are defined by SQL queries and can be used to
present a subset of data or join multiple tables together.
4. Indexing Structures: In addition to traditional B-tree indexes, SQL databases may use
other indexing structures like bitmap indexes, hash indexes, and full-text indexes to
optimise specific types of queries.
5. Primary Keys and Foreign Keys: SQL developers use primary keys to uniquely
identify records within a table and foreign keys to establish relationships between
tables. These keys enforce data integrity and help maintain referential integrity.
6. Constraints: Constraints are rules and conditions that are applied to columns or
tables to ensure data integrity. Common constraints include unique constraints,
check constraints, and default constraints.
7. Triggers: Triggers are database objects that automatically execute a set of SQL
statements in response to specific events (e.g., INSERT, UPDATE, DELETE
operations). SQL developers use triggers for tasks like auditing and data validation.
8. Stored Procedures and Functions: SQL developers can create stored procedures
and functions to encapsulate and reuse SQL code. These database objects are
similar to functions in programming languages and can take parameters and return
values.
9. Temporary Tables: Temporary tables are used to store intermediate results within a
session or a specific scope. They are often used in complex queries or in cases
where you need to manipulate data temporarily.
10. Materialised Views: Materialised views are precomputed, persistent query results that
can be refreshed periodically. They are useful for improving query performance when
dealing with complex and computationally expensive queries.
SQL developers work with these data structures and database objects to design, query, and
maintain relational databases effectively. They use SQL statements to interact with these
structures, retrieve data, perform data modifications, and optimise database performance.
Understanding the nuances of these data structures and their relationships is crucial for SQL
developers to work efficiently with relational databases.
Returning to the problem again, you'll find that it deals with a binary tree. SO WHAT IS
BINARY TREE?
A binary tree is a data structure in computer science that consists of nodes linked together in
a hierarchical structure. Each node in a binary tree has at most two children, which are
referred to as the left child and the right child. The topmost node in a binary tree is called the
root node, and it serves as the starting point for traversing or searching the tree. Here are
some key terms and concepts associated with binary trees:
1. Node: Each element in a binary tree is called a node. A node typically contains some
data (often referred to as the node's value or key) and references to its left and right
children (subtrees).
2. Root: The topmost node in a binary tree is called the root. It serves as the starting
point for any tree traversal or search operation.
3. Parent, Child, Sibling: In a binary tree, a node is a parent if it has one or more
children. Children are nodes directly below a parent node. Nodes with the same
parent are called siblings.
4. Leaf Node: A leaf node is a node that has no children, meaning both its left and right
child references are null.
5. Depth: The depth of a node in a binary tree is the length of the path from the root to
that node. The root node has a depth of 0, and its children have a depth of 1, and so
on.
6. Height: The height of a binary tree is the maximum depth of any node in the tree. It
represents the length of the longest path from the root to a leaf node.
7. Binary Search Tree (BST): A special type of binary tree where nodes are organised
in a way that satisfies the binary search property. In a BST, for each node:
a. All nodes in its left subtree have values less than the node's value.
b. All nodes in its right subtree have values greater than the node's value.
Binary trees are widely used in computer science and have various applications, including in
the implementation of data structures like binary search trees and balanced trees (e.g., AVL
trees and Red-Black trees), which facilitate efficient searching, insertion, and deletion of
data. They are also used in parsing expressions, organising hierarchical data, and
representing hierarchical structures like file systems and family trees.
After reading everything about data structures and binary trees, I will start the
explanation.
The problem, in simple terms, tells us that there are two columns, one labelled 'N' and the
other 'P' and both of them contain integer numbers.
Requests:
1. The task in this question is to print the integer numbers in the 'N' column
2. Print the Node Type based on the following:
a. If the integer in 'N' is not found in 'P' then I should print 'Leaf' next to it.
b. If the integer in 'N' is found in 'P' then I should print 'Inner' next to it.
c. If the value in 'P' is unknown (not provided), then I should print 'Root' next to
it.
In this problem, we necessarily deal with conditional statements due to the multiple cases
and outcomes in the problem. Since you are a SQL programmer, you can identify the
necessary tool for this in the language.
We have to write an SQL query that selects values from the 'N' column of a table called
'BST' and applies a conditional statement to determine the node type based on the value of
'P' (parent).
First, I have to check if 'P' is null; if it is, the node is labelled as 'Root,' indicating it's the root
node of a binary tree.
Next, if 'P' is not null, it uses a subquery to count the number of occurrences in the 'BST'
table where the 'P' value matches the 'N' value of the current row (i.e., checking if the node
has children). If this count is greater than 0, the node is labelled as 'Inner,' meaning it's an
inner node with one or more child nodes.
If the count is 0, implying that there are no matching 'N' values in the 'P' column, the node is
labelled as 'Leaf,' indicating it's a leaf node with no children.
Finally, the query orders the results by the 'N' column in ascending order. In essence, this
query efficiently categorises nodes in a binary tree into 'Root,' 'Inner,' or 'Leaf' based on their
relationships with other nodes
Solution:
1. Selecting the 'N' column: The query begins by selecting the values from the 'N'
column of the 'BST' table.
2. CASE statement: The CASE statement is used to evaluate each selected 'N' value
based on specific conditions. It categorises each node as 'Root,' 'Inner,' or 'Leaf'
accordingly.
a. First condition: WHEN P IS NULL THEN 'Root': If the 'P' column (parent) is
null for a particular node, it means that node is a root node, so it's labelled as
'Root.'
b. Second condition: WHEN (SELECT COUNT(*) FROM BST WHERE P = B.N)
> 0 THEN 'Inner': If the 'P' column is not null, the subquery is used to count
how many times the current 'N' value appears in the 'P' column in the same
'BST' table. If the count is greater than 0, it implies that the node has one or
more child nodes and is therefore labelled as 'Inner.'
c. Third condition: ELSE 'Leaf': If none of the above conditions are met, it means
that the node is not a root or an inner node, so it's categorised as a 'Leaf'
node.
3. FROM clause: The query specifies the table 'BST' as 'B' in the FROM clause,
allowing it to use 'B' as an alias for the 'BST' table.
4. ORDER BY clause: The results are ordered by the 'N' column in ascending order.
SELECT N, CASE WHEN P IS NULL THEN 'Root' WHEN (SELECT COUNT(*) FROM
BST WHERE P = B.N) > 0 THEN 'Inner' ELSE 'Leaf' END FROM BST AS B ORDER
BY N;
In summary, this query efficiently identifies and categorises nodes in a binary tree. It labels
them as 'Root' if they have no parent, 'Inner' if they have one or more child nodes, and 'Leaf'
if they are not a root or inner node. The results are then sorted by the 'N' values in ascending
order.
Given the table schemas below, write a query to print the company_code, founder name,
total number of lead managers, total number of senior managers, total number of managers,
and total number of employees. Order your output by ascending company_code.
Note:
Input Format
Company: The company_code is the code of the company and founder is the founder of the
company.
Lead_Manager: The lead_manager_code is the code of the lead manager, and the
company_code is the code of the working company.
Senior_Manager: The senior_manager_code is the code of the senior manager, the
lead_manager_code is the code of its lead manager, and the company_code is the code of
the working company.
Manager: The manager_code is the code of the manager, the senior_manager_code is the
code of its senior manager, the lead_manager_code is the code of its lead manager, and the
company_code is the code of the working company.
Employee: The employee_code is the code of the employee, the manager_code is the code
of its manager, the senior_manager_code is the code of its senior manager, the
lead_manager_code is the code of its lead manager, and the company_code is the code of
the working company.
Explanation:
We can consider this problem as one of the important issues for several reasons, the most
important of which is that it is the first of its kind - as far as I believe - in which we deal with
more than one table.
Now I have to create a request to display information about companies. For each company,
we want to see its unique code, the name of the founder, and the total count of lead
managers, senior managers, regular managers, and all employees combined. The results
should be arranged in ascending order based on the company code.
Since I'm dealing with this issue and working with a database that includes several tables,
and since I need to write a query to extract data from multiple tables, I have to use JOINs.
What is a JOIN in SQL?
In the context of SQL (Structured Query Language), a "JOIN" is used to combine rows from
two or more tables based on a related column between them. It allows you to retrieve data
from multiple tables by specifying how these tables are related. There are different types of
joins in SQL, including:
1. INNER JOIN: Returns only the rows where there is a match in both tables.
2. LEFT JOIN (or LEFT OUTER JOIN): Returns all the rows from the left table and the
matched rows from the right table. If there is no match in the right table, NULL values
are returned.
3. RIGHT JOIN (or RIGHT OUTER JOIN): Returns all the rows from the right table and
the matched rows from the left table. If there is no match in the left table, NULL
values are returned.
4. FULL OUTER JOIN: Returns all the rows when there is a match in either the left or
right table. If there is no match, NULL values are returned for the columns of the
table without a match.
To perform a JOIN operation in SQL, you typically use the JOIN keyword along with the ON
clause to specify the columns on which the tables should be joined. Here's a simple
example:
SELECT * FROM table1 INNER JOIN table2 ON table1.column_name =
table2.column_name;
In your specific case, where you're dealing with data containing multiple tables and need to
write a query to extract data from more than one table, you would use JOINs to combine
these tables based on the relationships between them. The type of JOIN you choose
depends on your specific requirements and the nature of your data.
Solution:
First Part:
In this part of the query, we are selecting several columns and calculating counts:
● c.company_code and c.founder: These columns are selected from the Company
table. c.company_code represents a unique code for each company, and c.founder
represents the founder of the company.
● COUNT(DISTINCT l.lead_manager_code): This counts the distinct lead manager
codes associated with each company. It's obtained by joining the Company table with
the Lead_Manager table and counting the unique lead manager codes.
● COUNT(DISTINCT s.senior_manager_code): Similarly, this counts the distinct senior
manager codes associated with each company. It's based on the relationship
between Lead_Manager and Senior_Manager tables.
● COUNT(DISTINCT m.manager_code): This counts the distinct manager codes
associated with each company. It's based on the relationship between
Senior_Manager and Manager tables.
● COUNT(DISTINCT e.employee_code): Finally, this counts the distinct employee
codes associated with each company. It's calculated by linking the Manager table
with the Employee table.
Second Part:
In this section, we specify the tables involved in the query and define the relationships
between them using JOIN clauses. Here's what each part does:
● FROM Company AS c: Specifies that we are using the Company table and aliasing it
as c for brevity.
● JOIN Lead_Manager AS l ON c.company_code = l.company_code: Joins the
Company table with the Lead_Manager table based on the company_code. This is
used to associate lead managers with their respective companies.
● JOIN Senior_Manager AS s ON l.lead_manager_code = s.lead_manager_code:
Joins the Lead_Manager table with the Senior_Manager table based on the
lead_manager_code. This is used to associate senior managers with lead managers.
● JOIN Manager AS m ON m.senior_manager_code = s.senior_manager_code: Joins
the Senior_Manager table with the Manager table based on the
senior_manager_code. This associates managers with senior managers.
● JOIN Employee AS e ON e.manager_code = m.manager_code: Joins the Manager
table with the Employee table based on the manager_code. This links employees
with their respective managers.
Last Part:
In summary, this SQL query retrieves information about companies, lead managers, senior
managers, managers, and employees, and calculates counts of each type of role associated
with each company. The results are grouped by company and ordered by company code.
Problem Number 14
P(R) represents a pattern drawn by Julia in R rows. The following pattern represents P(5):
*****
****
***
**
*
Write a query to print the pattern P(20).
Explanation:
Yes, this type of question is not limited to the "Introduction to Programming" course, where
you learn C++ in your first year of university. On the contrary, it's quite important, and you will
find many similar questions posed by companies like Google on their official platforms
related to coding problem-solving.
Despite the fact that I consider them beautiful problems, they are not at all difficult. What I
will do, quite simply, is print 20 stars and then subtract one star in the following line, so I print
only 19 stars, and in the next line, I print 18 stars, and so on, continuing this process until I
reach just one star.
Solution:
As a result, this code will produce a sequence of rows, each containing a pattern of stars.
The number of stars in each row decreases from 20 down to 1, creating a descending
pattern.
Problem Number 15
P(R) represents a pattern drawn by Julia in R rows. The following pattern represents P(5):
*
**
***
****
*****
Write a query to print the pattern P(20).
Explanation:
The same previous issue, but here I will start with one star and end the pattern with 20 stars.
Solution:
The result of this query will be a sequence of rows, each containing a pattern of asterisks.
The number of asterisks in each row will increment from 1 to 20, creating an ascending
pattern.
Problem Number 16
Write a query to print all prime numbers less than or equal to . Print your result on a single
line, and use the ampersand () character as your separator (instead of a space).
Solution:
Then, I will filter the numbers to ensure they are less than or equal to 1000 and, importantly,
exclude numbers that are not prime. I will do this by using another subquery 'tatata' that
generates numbers from 1 to 1000 and checks if each number is a factor of any other
number in the range. If a number is not a factor of any other number and it's greater than 1, it
is considered prime and included in the final result.
The 'GROUP_CONCAT' function is used to concatenate these prime numbers into a single
string with '&' as the separator, resulting in a comma-separated list of prime numbers up to
1000.
Congrats majdi! You just solved all the Alternative Queries challenges in SQL!
Problem Number 17
Query the Western Longitude (LONG_W)where the smallest Northern Latitude (LAT_N) in
STATION is greater than . Round your answer to decimal places.
Input Format
This question, despite its simplicity, is relatively important. While it may not add much to your
knowledge in SQL or programming, it alerts you to a critically important point, which is the
need to read the problem carefully and understand the data and requirements well before
starting to write the solution. Even a problem as simple as this one could be a reason for
your exclusion from a job or your failure in an academic exam. All you really need to learn
from this question is the necessity of ensuring your thorough understanding of the question
before proceeding with solving it.
Solution:
This SQL query retrieves data from the "STATION" table based on specific conditions and
returns a single result. Let's break down the query step by step:
1. SELECT ROUND(LONG_W, 4): This part of the query selects the "LONG_W"
(Western Longitude) column from the "STATION" table and rounds its values to four
decimal places. This is done to ensure that the longitude values in the result are
formatted with a specific precision.
2. FROM STATION: This specifies the table from which the data is being retrieved,
which is the "STATION" table.
3. WHERE LAT_N > 38.7780: Here, a filter condition is applied to the rows in the
"STATION" table. It selects only those rows where the "LAT_N" (Northern Latitude)
value is greater than 38.7780. This condition restricts the query to retrieve rows with
latitude values above this threshold.
4. ORDER BY LAT_N: This clause orders the filtered rows based on the "LAT_N"
column in ascending order. In other words, it sorts the rows by their Northern Latitude
values in ascending order.
5. LIMIT 1: Finally, this part of the query limits the result set to just one row, specifically
the row with the smallest Northern Latitude value that meets the condition specified
in the WHERE clause.
So, in summary, this query retrieves the Western Longitude (LONG_W) from the "STATION"
table for the row with the smallest Northern Latitude (LAT_N) value greater than 38.7780.
The result will be a single value representing the rounded Western Longitude of the station
with the smallest qualifying Northern Latitude.
Problem Number 18
Consider P1(a, b) and P2(c, d) to be two points on a 2D plane.
Query the Manhattan Distance between points and and round it to a scale of decimal
places.
Input Format
Explanation:
Another simple problem, but this time it confirms for the eighteenth time your understanding
of the fundamentals of SQL and your ability to use its tools and functions effectively. Another
point I would like to mention before explaining the solution is the importance of your
knowledge as a programmer in applying and implementing mathematical equations
programmatically.
Developers benefit immensely from a strong grasp of mathematics and the ability to
implement mathematical formulas using programming languages. This proficiency enables
them to excel in problem-solving across various domains. They can effectively address
real-world challenges by translating complex problems into code, developing and optimising
algorithms, and employing mathematical models for data analysis. Whether in scientific
simulations, engineering projects, financial modelling, or data science tasks, mathematical
competence ensures precision and accuracy in calculations and outcomes. Moreover, it
fosters innovation, as developers can create novel solutions and algorithms that provide
unique insights and efficiencies.
Now, in order to solve this problem I have to write an SQL query to calculate the sum of the
absolute differences between the minimum and maximum values of latitude (LAT_N) and
longitude (LONG_W) from the "station" table. It does this by first finding the minimum and
maximum values for both latitude and longitude separately. Then, it calculates the absolute
difference for each pair (minimum and maximum) and adds these absolute differences together.
Finally, the result is rounded to four decimal places. Essentially, this query computes a measure
of the total geographic distance, in terms of latitude and longitude, spanned by the data in the
"station" table, providing a single numeric value representing that distance.
Solution:
This SQL query calculates a single numeric value representing the sum of absolute
differences in latitude (LAT_N) and longitude (LONG_W) values within the "station" table. It
does this through the following steps:
1. MIN(LAT_N) and MAX(LAT_N): These functions find the minimum and maximum
values of the latitude (LAT_N) column in the "station" table.
2. MIN(LONG_W) and MAX(LONG_W): Similarly, these functions determine the
minimum and maximum values of the longitude (LONG_W) column.
3. abs(MIN(LAT_N) - MAX(LAT_N)) calculates the absolute difference between the
minimum and maximum latitude values.
4. abs(MIN(LONG_W) - MAX(LONG_W)) calculates the absolute difference between
the minimum and maximum longitude values.
5. These two absolute differences are then added together to yield a single numeric
result.
6. Finally, the round function is used to round this numeric result to four decimal places.
In summary, the query computes the sum of the absolute differences between the minimum
and maximum latitude and longitude values in the "station" table, providing a single numeric
value that represents the total geographic span covered by the latitude and longitude
coordinates in the dataset.
Congrats!
You have earned your 4th star.
Problem Number 19
Consider P1(a,c) and P2(b,d) to be two points on a 2D plane where are the respective
minimum and maximum values of Northern Latitude (LAT_N) and are the respective
minimum and maximum values of Western Longitude (LONG_W) in STATION.
Query the Euclidean Distance between points and and format your answer to display
decimal digits.
Input Format
The translation of your Arabic text to English is: "Another question in a row asks me to solve
a mathematical equation, this time the equation is a bit more complex, but it's still within
simple mathematics and doesn't require a high level of understanding.
Solutions:
The provided SQL query calculates the Euclidean distance between points in a database
represented by latitude and longitude coordinates. It achieves this by first determining the
range of latitudes and longitudes, calculated as the difference between the maximum and
minimum values in their respective columns (LAT_N and LONG_W). These ranges are
squared using the POWER function, and the squared differences in latitude and longitude
are then summed together. The SQRT function is applied to the sum of squared differences,
yielding the Euclidean distance, which measures the straight-line distance between two
points in a two-dimensional space. The query further rounds this calculated distance to four
decimal places using the ROUND function. In essence, the query calculates the Euclidean
distance between the maximum and minimum latitude and longitude values in the STATION
table and returns the result rounded to four decimal places.
Problem Number 20
A median is defined as a number separating the higher half of a data set from the lower half.
Query the median of the Northern Latitudes (LAT_N) from STATION and round your answer
to decimal places.
Input Format
The calculation will begin by identifying the maximum and minimum values within the LAT_N
(latitude north) and LONG_W (longitude west) columns of the STATION table. This will be
done using the MAX and MIN functions. These extremes are essential for determining the
range of latitude and longitude values.
Next, the query will utilise the POWER function to square these latitude and longitude
ranges. The POWER function raises a number to a specified power, and in this case, it will
be used to square the differences.
Following that step, the squared latitude and longitude differences will be summed together.
This addition will yield the sum of the squared differences.
To obtain the final Euclidean distance, the query will employ the SQRT function. This
function computes the square root of a value and will operate on the sum of squared
differences. This final step will effectively calculate the straight-line distance between the
geographical points.
To ensure the precision of the result, the ROUND function will be applied. It will round the
Euclidean distance to display only four decimal places, making the result suitable for various
geographic calculations and analyses.
In summary, the forthcoming SQL query will expertly leverage SQL functions such as
ROUND, SQRT, POWER, MAX, and MIN to systematically compute and present the
Euclidean distance between geographic points represented by latitude and longitude
coordinates within the STATION table. The result will be a neatly rounded distance value,
which can be invaluable for a wide range of geographic calculations and analyses.
Solution:
In summary, this query calculates the average of the LAT_N values for rows in the STATION
table that are closest to the midpoint when the LAT_N values are sorted in ascending order.
The @rowIndex variable is used to keep track of the row's position when sorting, and the
ROUND function ensures that the average is rounded to four decimal places.
Problem Number 21
Query the total population of all cities in CITY where District is California.
Input Format
Explanation:
It's a simple question, and I'm surprised to see it at this level, but perhaps HackerRank
wants to lighten the mood after the previous challenging question.
Solution:
This query is designed to retrieve the total population of cities located in the 'California'
district. Here's an explanation of each part of the query:
In summary, the query calculates the total population of all cities located in the 'California'
district and presents that sum as 'TotalPopulation.' It retrieves this information from the 'city'
table by filtering the rows based on the 'district' column's value.
Problem Number 22
Query the average population of all cities in CITY where District is California.
Input Format
No need to explain anything for this problem, check the previous problem and you will
understand everything:
Problem Number 23
Query the average population for all cities in CITY, rounded down to the nearest integer.
Input Format
Explanation:
It seems that there is something specific that is being sought or something that is on the
minds of those overseeing this level of questions on HackerRank. Regardless of their
objectives in asking such simple questions at this level, it is my duty to elaborate as much as
possible.
In SQL, the "ROUND" function is used to round numeric values to a specified number of
decimal places or to the nearest whole number. It is a mathematical function that can be
applied to numerical columns or expressions within a SQL query. There are two main types
of rounding in SQL:
1. Standard ROUND:
a. The standard ROUND function is used to round a numeric value to a
specified number of decimal places. It takes two arguments: the first is the
numeric value you want to round, and the second is the number of decimal
places to which you want to round the value. For example, if you have a
number like 3.456 and you use ROUND(3.456, 2), it will round to 3.46,
keeping two decimal places.
2. ROUND with an Optional Second Argument:
a. Some SQL database systems also allow you to use the ROUND function with
just one argument, in which case it will round the value to the nearest whole
number. For instance, ROUND(3.456) would round to 3.
Solution:
1. SELECT: This is the SQL keyword used to retrieve data from a database.
2. FLOOR: This is a mathematical function in SQL. In this context, it is used to round
down a numeric value to the nearest integer. It's applied to the result of the AVG
function in this query.
3. AVG(population): This is an aggregate function in SQL that calculates the average
value of the "population" column from the "city" table. It sums up all the population
values in the table and divides the sum by the total number of rows to get the
average population.
4. FROM city: This specifies the source of the data for the query. It tells SQL to retrieve
data from the "city" table.
So, in summary, the query is asking the database to calculate the average population of all
cities in the "city" table and then round down that average to the nearest integer using the
FLOOR function. The result of this query will be a single integer representing the average
population of cities in the "city" table.
Problem Number 24
Query the sum of the populations for all Japanese cities in CITY. The COUNTRYCODE for
Japan is JPN.
Input Format
I really don't know why this question appeared to me at level five, and I don't think the
following query needs any explanation.
Problem Number 25
Query the difference between the maximum and minimum populations in CITY.
Input Format
Solution:
I really don't know why this question appeared to me at level five, and I don't think the
following query needs any explanation.
Problem Number 26
Samantha was tasked with calculating the average monthly salaries for all employees in the
EMPLOYEES table, but did not realise her keyboard's 0 key was broken until after
completing the calculation. She wants your help finding the difference between her
miscalculation (using salaries with any zeros removed), and the actual average salary.
Write a query calculating the amount of error (i.e.: actual - miscalculated average monthly
salaries), and round it up to the next integer.
Input Format
Explanation:
I need to calculate the ceiling value of the average salary for employees, but before doing
so, I remove any zero values from their salaries. After this adjustment, I subtract the
resulting average salary from the average salary calculated using the original data. In
essence, I'm determining the difference between the average salary with zero values
excluded and the average salary with the original values intact, and then rounding the final
result up to the nearest whole number for the entire employee dataset.
Solution:
1. SELECT: This keyword is used to indicate that we want to retrieve data from the
database.
2. CEIL(): This is a mathematical function that rounds up a number to the nearest
integer. In this query, it will be applied to the result of the calculations inside the
parentheses.
3. AVG(salary): This part calculates the average salary of all the employees in the
"employees" table. The AVG function takes all the salary values, adds them together,
and divides by the total number of employees to get the average salary.
4. AVG(REPLACE(salary, '0', '')): This part calculates the average salary after
performing a string replacement operation. It replaces all occurrences of the
character '0' in the "salary" column with an empty string, effectively removing '0' from
the salaries. Then, it calculates the average of these modified salaries.
5. FROM employees: This specifies that we are performing these calculations on the
"employees" table.
So, the entire query calculates the average salary of employees in two different ways:
Afterward, it subtracts the second average from the first one and rounds the result up to the
nearest integer using the CEIL function. The result will be the rounded-up difference
between the two average salary values.
Problem Number 27
We define an employee's total earnings to be their monthly Salary * Months worked, and the
maximum total earnings to be the maximum total earnings for any employee in the
Employee table. Write a query to find the maximum total earnings for all employees as well
as the total number of employees who have maximum total earnings. Then print these
values as 2 space-separated integers.
Input Format
The Employee table containing employee data for a company is described as follows:
Explanation:
I need to craft a query in which we'll be working with an "Employee" table that stores
employee data for a company. The main goal is to identify the highest total earnings among
all employees and determine how many employees earn this maximum amount. This
calculation is based on multiplying each employee's monthly salary by the number of months
they've worked. To accomplish this, the query leverages SQL functions such as MAX and
COUNT. Initially, it computes the maximum total earnings across all employees by
multiplying their individual salary and months worked, selecting the greatest value.
Subsequently, it tallies the number of employees whose total earnings equal this maximum
value. The query's outcome is then presented as two integers separated by a space: the
maximum total earnings for all employees and the count of employees who share this
maximum total earnings.
Solution:
1. In this query, we are working with a table called "Employee," which contains data
about employees in a company.
2. The task is to find the maximum total earnings for all employees and the total number
of employees who have the maximum total earnings.
3. To calculate an employee's total earnings, we multiply their monthly salary by the
number of months they have worked.
4. We use the MAX function to find the highest total earnings among all employees.
This is done by multiplying "Salary" and "Months" for each employee and selecting
the maximum value.
5. We also use the COUNT function to count the number of employees who have
maximum total earnings. This is achieved by comparing each employee's total
earnings to the maximum total earnings found in step 4 and counting those who
match.
6. The query returns two values:
a. max_earnings: The maximum total earnings for all employees.
b. max_earnings_count: The total number of employees who have maximum
total earnings.
7. The result is printed as two space-separated integers, as required by the problem
statement.
Problem Number 28
Query the following two values from the STATION table:
This query calculates the rounded sum of latitude (lat_n) and longitude (long_w) values from
the "station" table. The SUM function is used to add up all the latitudes and longitudes in the
table, and the ROUND function is applied to round the result to two decimal places. This
query will return the total sum of latitudes and longitudes in the "station" table, each rounded
to two decimal places.
Problem Number 29
Query the sum of Northern Latitudes (LAT_N) from STATION having values greater than
and less than . Truncate your answer to decimal places.
Input Format
Solution:
This query calculates the rounded sum of latitude (LAT_N) values from the "STATION" table.
It specifically considers latitude values greater than 38.7880 and less than 137.2345. The
SUM function adds up all the latitudes within this range, and the ROUND function is applied
to round the result to four decimal places.
Problem Number 30
Query the greatest value of the Northern Latitudes (LAT_N) from STATION that is less than
137.2345. Truncate your answer to 4 decimal places.
Input Format
Solution:
This query calculates the maximum latitude (LAT_N) value from the "STATION" table,
specifically for latitude values less than 137.2345, and then rounds this maximum value to
four decimal places.
To complete all the questions related to the SQL on HackerRank, you need to solve 58
questions. However, the 30 questions discussed in this document are the questions that start
from level two and go up to level five. This is because the first level primarily relies on the
student's understanding of the fundamentals of the language and how to use its very basic
tools.
I have tried hard in this document to provide a detailed explanation that is not limited to just
the questions and their answers. I have also attempted to use simple language to make it
easier for students. Now, in this document, I will move on to the section about obtaining
certifications from the HackerRank website. The website offers three certifications that are
distributed across three levels. I recommend everyone, especially students, to obtain them
for academic, professional, and knowledge-based reasons.
Obtaining all SQL HackerRank certificates is highly significant for students due to the
foundational knowledge they provide in SQL, a fundamental skill in today's data-driven
world. These certifications not only equip students with essential data management skills but
also enhance their employability, making them attractive candidates for roles in data
analysis, database management, and software development. The certificates serve as a
valuable addition to a student's portfolio or resume, showcasing their commitment to learning
and practical SQL expertise. Furthermore, they open up diverse career opportunities in
various industries and offer a competitive advantage in a crowded job market. As students
progress in their careers, these certifications can form the basis for further professional
growth and specialisation in SQL and related fields. Recognized globally, HackerRank
certifications hold weight and provide students with the confidence to excel in both academic
pursuits and real-world applications, making them a valuable asset for those embarking on a
career in data and technology.
I will provide you with tutorial videos for all exams with this document.