CODEWARS : TOP 25 SQL INTERVIEW QUESTIONS TO ACE THE DATA ANALYST INTERVIEW
--
This Article is originally published on Vizartpandey
Whether you are a beginner starting your engineering career, or you are an experienced data engineer or data analyst, or data scientist, knowledge of SQL syntax is a must, after all, SQL is the unsung hero of data analytics and data science. There’s so much to talk about just in regards to this one language that it deserves its own place. People may say that SQL is dead, but the reality is that there is no system to replace it currently. There are many very capable NoSQL stores that do their jobs very well, supporting massive scale out with low costs. However, they don’t replace high-quality SQL-based stores — they complement them. The ACID properties of SQL make it a highly reliable way to model data relatively naturally.
If this is your first time interviewing for a data science/data Analyst role, going through this article should give you a solid idea of what to expect and how to prepare. But even if you’ve done this all before, I’m hoping this blog will be useful in focusing your preparation on the skills that are most likely to be tested.
Before I start, let me give you an overview of Codewars. It is an educational community for computer programming. On the platform, software developers are trained on programming challenges known as kata. These discrete programming exercises train a range of skills in a variety of programming languages and are completed within an online integrated development environment.
NOTE: Some database systems require a semicolon to be inserted at the end of each statement. The semicolon is the standard way to indicate the end of each SQL statement. I will be using PostgreSQL for the examples, which requires a semicolon at the end of each statement.
INTERVIEW TASK 01
For this challenge, you need to create a basic Age Calculator SQL function that calculates the age in years on the age field of the people’s table. The function should be called agecalculator
, it needs to take 1 date and calculate the age in years according to the date NOW
and must return an integer.
You may query the people table while testing but the query must only contain the function on your final submission.
TABLE SCHEMA
PEOPLE
A table of persons with their name, age, and id
NAMETYPEidINTnameVARCHAR(100)ageINT
CREATE FUNCTION agecalculator (d timestamp)
RETURNS integer AS $$ --Dollar-quoted String It's something like quotation marks.
BEGIN
RETURN (SELECT EXTRACT(year FROM age(d)))::int;
END;
$$ LANGUAGE plpgsql;
INTERVIEW TASK 02
For this challenge you need to create a SELECT statement, this SELECT statement will use EXISTS to check whether a department has had a sale with a price over 98.00 dollars.
TABLE SCHEMA
DEPARTMENTS
A list of departments and their ID’s
NAMETYPEidINTnameVARCHAR(100)
SALES
Properties of a sale
NAMETYPEidINTdepartment_idINTnameVARCHAR(100)priceDECIMALcard_nameVARCHAR(100)card_numberVARCHAR(100)transaction_dateDATE()
SELECT id, name
FROM departments
WHERE EXISTS
(SELECT * FROM departments
INNER JOIN sales
ON departments.id=sales.department_id
WHERE price > 98.00)
INTERVIEW TASK 03
Find what products need restocking from CompanyA. An item needs restocking if there are only 2 or fewer of it left in stock.
Order the results by-products id
.
TABLE SCHEMA
PRODUCTS
A table of products, including their price, amount of stock, and company they are from
NAMETYPEidINTnameVARCHAR(100)priceDECIMALstockINTproducentTEXT
SELECT id, name, stock FROM products WHERE producent='CompanyA' AND Stock <= 2 order by id asc;
INTERVIEW TASK04
Given a posts
the table that contains a created_at timestamp column writes a query that returns the first date of the month, the number of posts created in a given month, and a month-over-month growth rate.
The resulting set should be ordered chronologically by date
.
Note:
- percent growth rate can be negative
- the percent growth rate should be rounded to one digit after the decimal point and immediately followed by a percent symbol “%”. See the desired output below for reference.
DESIRED OUTPUT
The resulting set should look similar to the following:
date | count | percent_growth
-----------+-------+---------------
2016-02-01 | 175 | null
2016-03-01 | 338 | 93.1%
2016-04-01 | 345 | 2.1%
2016-05-01 | 295 | -14.5%
2016-06-01 | 330 | 11.9%
...
date
– (DATE) first date of the monthcount
– (INT) a number of posts in a given monthpercent_growth
– (TEXT) a month-over-month growth rate expressed in percent
select date_trunc('month', created_at)::date as date,
count(*) as count,
round (
100.0 * (count(*) - lag(count(*), 1) over (order by date_trunc('month', created_at)::date asc)) /
lag(count(*), 1) over (order by date_trunc('month', created_at)::date asc)
,1 )
|| '%' as percent_growth
from posts
group by date
order by date asc;
INTERVIEW TASK06
Return a table that shows a count of each race represented within the table. Order your results by the number of individuals that represent each race in descending order.
TABLE SCHEMA
DEMOGRAPHICS
A table of persons with demographic details
NAMETYPEidINTnameVARCHAR(100)birthdayDATEraceVARCHAR(100)
select count(*) as count, race from demographics group by race order by count desc;
INTERVIEW TASK07
For this challenge you need to create a simple GROUP BY statement, you want to group all the people by their age and count the people who have the same age.
TABLE SCHEMAS
PEOPLE
A list of people with their names and age
NAMETYPEidINTnameSTRINGageINT
select age, count(*) as people_count from people group by age;
INTERVIEW TASK08
There is an events
the table is used to track different key activities taken on a website. For this task, you need to filter the name
field to only show “trained” events. Events should be grouped by the day they happened and counted. The description
field is used to distinguish which items the events happened on.
Along with the results, a time series chart will also be provided which will consume the data and be used to visualize the data. Create your query to be ready to be consumed by the chart, such as ensuring the dates are in order.
TABLE SCHEMA
EVENTS
A table of events and the details of each event.
NAMETYPEidINTnameSTRINGcreated_atDateTimedescriptionString
Select date_trunc('day',created_at) as day , description , Count(*) as Count From events where name = ‘trained'
Group by date_trunc('day',created_at),description
INTERVIEW TASK09
For this challenge you need to create a simple HAVING statement, you want to count how many people have the same age and return the groups with 10 or more people who have that age.
TABLE SCHEMA
PEOPLE
A table of people and their age
NAME TYPE idINTnameVARCHAR(100)ageINT
SELECT age, COUNT(*) AS total_people FROM people GROUP BY age
HAVING COUNT(*) >= 10;
INTERVIEW TASK10
For this challenge you need to create a SELECT statement, this SELECT statement will use an IN to check whether a department has had a sale with a price over 98.00 dollars.
TABLE SCHEMAS
DEPARTMENTS
A list of department names and their IDs
NAMETYPEidINTnameVARCHAR(100)
SALES
A table of sales and its details
NAMETYPEidINTdepartment_idINTnameVARCHAR(100)priceDECIMALcard_nameVARCHAR(100)card_numberVARCHAR(19)transaction_dateDATE
SELECT id, name FROM departments WHERE id IN (SELECT department_id FROM sales WHERE (price) > 98.00)
INTERVIEW TASK11
For this challenge, you need to create a simple SELECT statement that will return all columns from the products table and join to the companies table so that you can return the company name.
TABLE SCHEMA
PRODUCTS
A table of product’s id, name, ISBN, related company_id, and price
NAMETYPEidINTnameVARCHAR(100)company_idINTpriceDECIMAL
COMPANIES
A table of companies and their IDs
NAMETYPEidINTnameVARCHAR(100)
SELECT p.id, p.name , p.isbn , p.company_id , p.price , c.name AS company_name FROM products AS p
LEFT JOIN companies AS c
ON p.company_id = c.id
INTERVIEW TASK12
For this challenge, your task is to create a simple SELECT statement that will return all columns from the people
table, then join that result set to the sales
table so that you can return the COUNT of all sales and RANK each person by their sale_count
. In case of ties, sort by id
ascending.
TABLE SCHEMAS
PEOPLE
A table of people and their IDs
NAMETYPEidINTnameVARCHAR(100)
SALES
A list of sales and associated details
NAMETYPEidINTpeople_idINTsaleTEXTpriceINT
SELECT people.id,people.name, count(sales.id) AS sale_count,
rank () OVER ( ORDER BY count(sales.id) desc ) sale_rank
FROM people JOIN sales ON people.id = people_id
GROUP BY people.id
order by sale_count desc,people.id asc;
INTERVIEW TASK13
For this challenge, you need to create a simple SELECT statement that will return all columns from the people
table, and join to the toys
table so that you can return the COUNT of the toys as toy_count
.
TABLE SCHEMAS
PEOPLE
A list of people and their IDs
NAMETYPEidINTnameVARCHAR(100)
TOYS
A list of toys and the people’s IDs they belong to.
NAMETYPEidINTnameVARCHAR(100)people_idINT
SELECT p.id, p.name, count(*) as toy_count FROM people p JOIN toys t
ON p.id = t.people_id
GROUP BY p.id, p.name;
INTERVIEW TASK14
For this challenge, you need to create a simple SELECT statement. Your task is to calculate the MIN, MEDIAN, and MAX scores of the students from the results table.
TABLE SCHEMA
STUDENT
A list of students and their IDs
NAMETYPEidINTnameVARCHAR(100)
RESULT
A table of resulting test scores
NAMETYPEidINTstudent_idINTscoreINT
SELECT min(score),
ROUND(PERCENTILE_CONT(0.50) WITHIN GROUP(ORDER BY score)::numeric, 2)::float AS median,
max(score)
FROM result;
INTERVIEW TASK15
For this challenge, you need to create a simple MIN / MAX statement that will return the Minimum and Maximum ages out of all the people.
TABLE SCHEMA
PEOPLE
NAMETYPEidINTnameVARCHAR(100)ageINT
SELECT min(age) AS age_min, max(age) AS age_max FROM people;
INTERVIEW TASK16
Return a single column table with the results of number1 modulus number2.
TABLE SCHEMA
DECIMALS
A table with two columns of floats and a column for the id
NAMETYPEidINTnumber1INTnumber2INT
SELECT mod(number1, number2) AS mod
FROM decimals;
INTERVIEW TASK17
For this challenge, you’ll pivot a table. You’re given two tables, products
and details
. Your task is to pivot the rows in products
to produce a table of products
which have columns of their details
. Group and order by the name of the product.
TABLE SCHEMA
PRODUCTS
A table of products and their IDs
NAMETYPEidINTnameVARCHAR(100)
DETAILS
A list of details from their related products
NAMETYPEidINTproduct_idINTdetailVARCHAR(5)
CREATE EXTENSION tablefunc;SELECT * FROM crosstab ( 'SELECT p.name, detail, COUNT(d.id) FROM products p JOIN details d ON p.id = d.product_id GROUP BY p.name, d.detail ORDER BY 1,2')AS ct (name text, bad bigint, good bigint, ok bigint)
INTERVIEW TASK18
Create a query to calculate how many hours each department has clocked per day.
Order results by day then by department_name.
Assume all shift hours are allocated to the login day, even if the logout is after midnight.
TABLE SCHEMA
select t.login::date as day, d.name as department_name, sum(date_part('hour', age(t.logout, t.login))) as total_hours
from timesheet t
Inner join department d
on t.department_id = d.id
group by day, department_name
order by day, department_name
INTERVIEW TASK19
For this challenge, you will be using a DVD rental database. See below for details on this database.
You are working for a company that wants to reward its best customers with a free gift. You have been asked to generate a simple report that returns the top 10 customers by selected by the total amount spent (from the payments table). The total count of payments and the contact email for each of these customers should also be included in the query.
The query should retrieve the following columns:
- customer_id (int4)
- email (varchar)
- payments_count (int)
- total_amount (float)
and should return rows showing the 10 top-spending customers ordered descending by the total amount spent.
select c.customer_id as customer_id, c.email as email, count(*) as payments_count,
(sum(p.amount)::float)as total_amount
from customer c join payment p
on
c.customer_id = p.customer_id
GROUP BY c.customer_id, c.email
ORDER BY total_amount desc
fetch first 10 rows only
INTERVIEW TASK20
For this challenge, you need to return a single column that is the result of number1 + number2 truncated towards zero.
TABLE SCHEMA
DECIMALS
A table with a column for id and 2 columns of double-precision integers.
NAMETYPEidINTnumber1DOUBLE PRECISIONnumber2DOUBLE PRECISION
SELECT trunc(number1 + number2) AS towardzero
FROM decimals;
INTERVIEW TASK21
Given the schema presented below, write SQL to find two actors who cast together the most and list the titles of the movies they appeared in together. Order the result set alphabetically.
TABLE FILM_ACTOR
Column | Type | Modifiers
------------+-----------------------------+----------
actor_id | smallint | not null
film_id | smallint | not null
...
TABLE ACTOR
Column | Type | Modifiers
------------+-----------------------------+----------
actor_id | integer | not null
first_name | character varying(45) | not null
last_name | character varying(45) | not null
...
TABLE FILM
Column | Type | Modifiers
------------+-----------------------------+----------
film_id | integer | not null
title | character varying(255) | not null
...
THE DESIRED OUTPUT:
first_actor | second_actor | title
------------+--------------+--------------------
John Doe | Jane Doe | The Best Movie Ever
...
first_actor
– Full name (First name + Last name separated by a space)second_actor
– Full name (First name + Last name separated by a space)title
– Movie title
Note: the actor_id
of the first_actor should be lower than the actor_id
of the second_actor.
-- Replace with your SQL Queryselect
Select
concat(a1.first_name, ' ', a1.last_name) as second_actor,
concat(a2.first_name, ' ', a2.last_name) as first_actor,
f.title
from (
select
fa1.actor_id as actor_a_id,
fa2.actor_id as actor_b_id,
count(fa1.film_id) as casted_together,
array_agg(fa1.film_id) as film_ids
from film_actor fa1
join film_actor fa2
on fa1.film_id = fa2.film_id
and fa1.actor_id > fa2.actor_id
join film f
on fa1.film_id = f.film_id
group by fa1.actor_id, fa2.actor_id
order by casted_together desc
limit 1
) s
join actor a1 on a1.actor_id = s.actor_a_id
join actor a2 on a2.actor_id = s.actor_b_id
join film f on f.film_id = any(s.film_ids)
INTERVIEW TASK22
For this challenge you need to create a UNION statement, there are two tables ussales
and eusales
the parent company tracks each sale at its respective location in each table, you must all filter the sale price so it only returns rows with a sale greater than 50.00
. You have been tasked with combining that data for future analysis.
Order by the US than EU sales as location
.
TABLE SCHEMAS
USSALES,EUSALES
The tables of US sales and EU sales with their related details
NAMETYPEidINTnameVARCHAR(100)priceDECIMALcard_nameVARCHAR(100)card_numberVARCHAR(19)transaction_dateDATE
ALTER TABLE ussales
ADD COLUMN location varchar(255)
DEFAULT 'US';ALTER TABLE eusales
ADD COLUMN location varchar(255)
DEFAULT 'EU';SELECT * FROM ussales
WHERE price > 50.00
UNION ALL
SELECT * FROM eusales
WHERE price > 50.00
ORDER BY location desc, id;
INTERVIEW TASK23
For this challenge, your task is to create a VIEW. This VIEW is used by a sales store to give out vouchers to members who have spent over $1000 in departments that have brought in more than $10000 total ordered by the member’s id. The VIEW must be called members_approved_for_voucher
. Create a SELECT query using this view that extracts the expected results.
TABLE SCHEMA
SALES
The related table for departments, products, and members
NAMETYPEidINTdepartment_idINTproduct_idINTmember_idINTtransaction_dateDATE
CREATE VIEW members_approved_for_voucher ASSELECT m.id, m.name, m.email, SUM(p.price) as total_spending
FROM members m, sales s, products p
WHERE m.id = s.member_id
AND s.product_id = p.id
AND s.department_id IN ( SELECT s.department_id
FROM products p, sales s
WHERE s.product_id = p.id
GROUP BY s.department_id
HAVING SUM(p.price) > 10000)GROUP BY m.id
HAVING SUM(p.price) > 1000
ORDER BY m.id;SELECT * from members_approved_for_voucher;
INTERVIEW TASK24
For this challenge, you need to create a simple SELECT statement that will return all columns from the people
table with age over 50. Sort the results by age descending.
TABLE SCHEMA
PEOPLE
People’s table with id, name, and age
NAMETYPEidINTnameVARCHAR(100)ageINT
SELECT * FROM people
WHERE age > 50
ORDER BY age DESC;
INTERVIEW TASK25
For this challenge you need to create a SELECT statement, this SELECT statement will use an IN to check whether a department has had a sale with a price over 90.00 dollars BUT the SQL MUST use the WITH statement which will be used to select all columns from sales
where the price is greater than 90.00, you must call this sub-query special_sales
.
TABLE SCHEMA
DEPARTMENTS
A list of department’s names and their IDs
NAMETYPEidINTnameVARCHAR(100)
SALES
A list of sales, their properties, and related department ID
NAMETYPEidINTdepartment_idINTnameVARCHAR(100)priceDECIMALcard_nameVARCHAR(100)card_numberVARCHAR(19)transaction_dateDATE
WITH special_sales AS (SELECT * FROM sales WHERE (price) > 90.00)
SELECT id, name FROM departments
WHERE id IN ( SELECT department_id
FROM sales
WHERE (price) > 90.00);
In this article, we have discussed some standard SQL problems that you might come across in your next SQL tech interview. These problems are based on my personal experiences and the same concepts can be tested using other questions. The literal questions don’t matter.
At last, I would also like to share some cool advice from Carson Forter who recently published an article on Medium
We tried to cover as much as we could for a newbie to get started with the Specialist exam. As always, We welcome feedback and constructive criticism. We can be reached on Twitter at @rajvivan and @pallavinaik_ . If you enjoyed this blog, we’d love for you to hit the share button so others might stumble upon it. Please hit the subscribe button as well if you’d like to be added to my once-weekly email list, and don’t forget to follow Vizartpandey on Instagram!
Also, here are a few hand-picked articles for you to read next:
- How to extract Tableau field information using Python API
- Extended bar chart in Tableau
- Taking your design to next level in Tableau
- Create custom maps in Tableau
- Drop Lines using Parameter Actions
- Toggle Button using Parameter Actions
- Overview of Alteryx Architecture
- #PreppinData Challenge S01E03
- How to use Buffer Spatial Function In Tableau