CODEWARS : TOP 25 SQL INTERVIEW QUESTIONS TO ACE THE DATA ANALYST INTERVIEW

Rajeev Pandey
12 min readMay 13, 2022

--

This Article is originally published on Vizartpandey

Whether you are a beginner starting your engineering career, or you are an experienced data engineer or data analyst, or data scientist, knowledge of SQL syntax is a must, after all, SQL is the unsung hero of data analytics and data science. There’s so much to talk about just in regards to this one language that it deserves its own place. People may say that SQL is dead, but the reality is that there is no system to replace it currently. There are many very capable NoSQL stores that do their jobs very well, supporting massive scale out with low costs. However, they don’t replace high-quality SQL-based stores — they complement them. The ACID properties of SQL make it a highly reliable way to model data relatively naturally.

If this is your first time interviewing for a data science/data Analyst role, going through this article should give you a solid idea of what to expect and how to prepare. But even if you’ve done this all before, I’m hoping this blog will be useful in focusing your preparation on the skills that are most likely to be tested.

Before I start, let me give you an overview of Codewars. It is an educational community for computer programming. On the platform, software developers are trained on programming challenges known as kata. These discrete programming exercises train a range of skills in a variety of programming languages and are completed within an online integrated development environment.

NOTE: Some database systems require a semicolon to be inserted at the end of each statement. The semicolon is the standard way to indicate the end of each SQL statement. I will be using PostgreSQL for the examples, which requires a semicolon at the end of each statement.

INTERVIEW TASK 01

For this challenge, you need to create a basic Age Calculator SQL function that calculates the age in years on the age field of the people’s table. The function should be called agecalculator, it needs to take 1 date and calculate the age in years according to the date NOW and must return an integer.

You may query the people table while testing but the query must only contain the function on your final submission.

TABLE SCHEMA

PEOPLE

A table of persons with their name, age, and id

NAMETYPEidINTnameVARCHAR(100)ageINT

CREATE FUNCTION agecalculator (d timestamp) 
RETURNS integer AS $$ --Dollar-quoted String It's something like quotation marks.
BEGIN
RETURN (SELECT EXTRACT(year FROM age(d)))::int;
END;
$$ LANGUAGE plpgsql;

INTERVIEW TASK 02

For this challenge you need to create a SELECT statement, this SELECT statement will use EXISTS to check whether a department has had a sale with a price over 98.00 dollars.

TABLE SCHEMA

DEPARTMENTS

A list of departments and their ID’s

NAMETYPEidINTnameVARCHAR(100)

SALES

Properties of a sale

NAMETYPEidINTdepartment_idINTnameVARCHAR(100)priceDECIMALcard_nameVARCHAR(100)card_numberVARCHAR(100)transaction_dateDATE()

SELECT id, name
FROM departments
WHERE EXISTS
(SELECT * FROM departments
INNER JOIN sales
ON departments.id=sales.department_id
WHERE price > 98.00)

INTERVIEW TASK 03

Find what products need restocking from CompanyA. An item needs restocking if there are only 2 or fewer of it left in stock.

Order the results by-products id.

TABLE SCHEMA

PRODUCTS

A table of products, including their price, amount of stock, and company they are from

NAMETYPEidINTnameVARCHAR(100)priceDECIMALstockINTproducentTEXT

SELECT id, name, stock FROM products WHERE producent='CompanyA' AND Stock <= 2 order by id asc;

INTERVIEW TASK04

Given a posts the table that contains a created_at timestamp column writes a query that returns the first date of the month, the number of posts created in a given month, and a month-over-month growth rate.

The resulting set should be ordered chronologically by date.

Note:

  • percent growth rate can be negative
  • the percent growth rate should be rounded to one digit after the decimal point and immediately followed by a percent symbol “%”. See the desired output below for reference.

DESIRED OUTPUT

The resulting set should look similar to the following:

date       | count | percent_growth
-----------+-------+---------------
2016-02-01 | 175 | null
2016-03-01 | 338 | 93.1%
2016-04-01 | 345 | 2.1%
2016-05-01 | 295 | -14.5%
2016-06-01 | 330 | 11.9%
...
  • date – (DATE) first date of the month
  • count – (INT) a number of posts in a given month
  • percent_growth – (TEXT) a month-over-month growth rate expressed in percent
select date_trunc('month', created_at)::date as date,
count(*) as count,
round (
100.0 * (count(*) - lag(count(*), 1) over (order by date_trunc('month', created_at)::date asc)) /
lag(count(*), 1) over (order by date_trunc('month', created_at)::date asc)
,1 )
|| '%' as percent_growth
from posts
group by date
order by date asc;

INTERVIEW TASK06

Return a table that shows a count of each race represented within the table. Order your results by the number of individuals that represent each race in descending order.

TABLE SCHEMA

DEMOGRAPHICS

A table of persons with demographic details

NAMETYPEidINTnameVARCHAR(100)birthdayDATEraceVARCHAR(100)

select count(*) as count, race from demographics group by race order by count desc;

INTERVIEW TASK07

For this challenge you need to create a simple GROUP BY statement, you want to group all the people by their age and count the people who have the same age.

TABLE SCHEMAS

PEOPLE

A list of people with their names and age

NAMETYPEidINTnameSTRINGageINT

select age, count(*) as people_count from people group by age;

INTERVIEW TASK08

There is an events the table is used to track different key activities taken on a website. For this task, you need to filter the name field to only show “trained” events. Events should be grouped by the day they happened and counted. The description field is used to distinguish which items the events happened on.

Along with the results, a time series chart will also be provided which will consume the data and be used to visualize the data. Create your query to be ready to be consumed by the chart, such as ensuring the dates are in order.

TABLE SCHEMA

EVENTS

A table of events and the details of each event.

NAMETYPEidINTnameSTRINGcreated_atDateTimedescriptionString

Select date_trunc('day',created_at) as day , description , Count(*) as Count  From events  where name = ‘trained' 
Group by date_trunc('day',created_at),description

INTERVIEW TASK09

For this challenge you need to create a simple HAVING statement, you want to count how many people have the same age and return the groups with 10 or more people who have that age.

TABLE SCHEMA

PEOPLE

A table of people and their age

NAME TYPE idINTnameVARCHAR(100)ageINT

SELECT age, COUNT(*) AS total_people FROM people GROUP BY age 
HAVING COUNT(*) >= 10;

INTERVIEW TASK10

For this challenge you need to create a SELECT statement, this SELECT statement will use an IN to check whether a department has had a sale with a price over 98.00 dollars.

TABLE SCHEMAS

DEPARTMENTS

A list of department names and their IDs

NAMETYPEidINTnameVARCHAR(100)

SALES

A table of sales and its details

NAMETYPEidINTdepartment_idINTnameVARCHAR(100)priceDECIMALcard_nameVARCHAR(100)card_numberVARCHAR(19)transaction_dateDATE

SELECT id, name FROM departments WHERE id IN (SELECT department_id FROM sales WHERE (price) > 98.00)

INTERVIEW TASK11

For this challenge, you need to create a simple SELECT statement that will return all columns from the products table and join to the companies table so that you can return the company name.

TABLE SCHEMA

PRODUCTS

A table of product’s id, name, ISBN, related company_id, and price

NAMETYPEidINTnameVARCHAR(100)company_idINTpriceDECIMAL

COMPANIES

A table of companies and their IDs

NAMETYPEidINTnameVARCHAR(100)

SELECT p.id, p.name , p.isbn , p.company_id , p.price , c.name AS company_name FROM products AS p 
LEFT JOIN companies AS c
ON p.company_id = c.id

INTERVIEW TASK12

For this challenge, your task is to create a simple SELECT statement that will return all columns from the people table, then join that result set to the sales table so that you can return the COUNT of all sales and RANK each person by their sale_count. In case of ties, sort by id ascending.

TABLE SCHEMAS

PEOPLE

A table of people and their IDs

NAMETYPEidINTnameVARCHAR(100)

SALES

A list of sales and associated details

NAMETYPEidINTpeople_idINTsaleTEXTpriceINT

SELECT people.id,people.name, count(sales.id) AS sale_count, 
rank () OVER ( ORDER BY count(sales.id) desc ) sale_rank
FROM people JOIN sales ON people.id = people_id
GROUP BY people.id
order by sale_count desc,people.id asc;

INTERVIEW TASK13

For this challenge, you need to create a simple SELECT statement that will return all columns from the people table, and join to the toys table so that you can return the COUNT of the toys as toy_count.

TABLE SCHEMAS

PEOPLE

A list of people and their IDs

NAMETYPEidINTnameVARCHAR(100)

TOYS

A list of toys and the people’s IDs they belong to.

NAMETYPEidINTnameVARCHAR(100)people_idINT

SELECT p.id, p.name, count(*) as toy_count FROM people p JOIN toys t 
ON p.id = t.people_id
GROUP BY p.id, p.name;

INTERVIEW TASK14

For this challenge, you need to create a simple SELECT statement. Your task is to calculate the MIN, MEDIAN, and MAX scores of the students from the results table.

TABLE SCHEMA

STUDENT

A list of students and their IDs

NAMETYPEidINTnameVARCHAR(100)

RESULT

A table of resulting test scores

NAMETYPEidINTstudent_idINTscoreINT

SELECT min(score),
ROUND(PERCENTILE_CONT(0.50) WITHIN GROUP(ORDER BY score)::numeric, 2)::float AS median,
max(score)
FROM result;

INTERVIEW TASK15

For this challenge, you need to create a simple MIN / MAX statement that will return the Minimum and Maximum ages out of all the people.

TABLE SCHEMA

PEOPLE

NAMETYPEidINTnameVARCHAR(100)ageINT

SELECT min(age) AS age_min, max(age) AS age_max FROM  people;

INTERVIEW TASK16

Return a single column table with the results of number1 modulus number2.

TABLE SCHEMA

DECIMALS

A table with two columns of floats and a column for the id

NAMETYPEidINTnumber1INTnumber2INT

SELECT mod(number1, number2) AS mod
FROM decimals;

INTERVIEW TASK17

For this challenge, you’ll pivot a table. You’re given two tables, products and details. Your task is to pivot the rows in products to produce a table of products which have columns of their details. Group and order by the name of the product.

TABLE SCHEMA

PRODUCTS

A table of products and their IDs

NAMETYPEidINTnameVARCHAR(100)

DETAILS

A list of details from their related products

NAMETYPEidINTproduct_idINTdetailVARCHAR(5)

CREATE EXTENSION tablefunc;SELECT * FROM  crosstab (      'SELECT p.name, detail, COUNT(d.id)      FROM products p      JOIN details d      ON p.id = d.product_id      GROUP BY p.name, d.detail      ORDER BY 1,2')AS ct (name text, bad bigint, good bigint, ok bigint)

INTERVIEW TASK18

Create a query to calculate how many hours each department has clocked per day.

Order results by day then by department_name.

Assume all shift hours are allocated to the login day, even if the logout is after midnight.

TABLE SCHEMA

select t.login::date as day, d.name as department_name, sum(date_part('hour', age(t.logout, t.login))) as total_hours
from timesheet t
Inner join department d
on t.department_id = d.id
group by day, department_name
order by day, department_name

INTERVIEW TASK19

For this challenge, you will be using a DVD rental database. See below for details on this database.

You are working for a company that wants to reward its best customers with a free gift. You have been asked to generate a simple report that returns the top 10 customers by selected by the total amount spent (from the payments table). The total count of payments and the contact email for each of these customers should also be included in the query.

The query should retrieve the following columns:

  • customer_id (int4)
  • email (varchar)
  • payments_count (int)
  • total_amount (float)

and should return rows showing the 10 top-spending customers ordered descending by the total amount spent.

select c.customer_id as customer_id, c.email as email, count(*) as payments_count, 
(sum(p.amount)::float)as total_amount
from customer c join payment p
on
c.customer_id = p.customer_id
GROUP BY c.customer_id, c.email
ORDER BY total_amount desc
fetch first 10 rows only

INTERVIEW TASK20

For this challenge, you need to return a single column that is the result of number1 + number2 truncated towards zero.

TABLE SCHEMA

DECIMALS

A table with a column for id and 2 columns of double-precision integers.

NAMETYPEidINTnumber1DOUBLE PRECISIONnumber2DOUBLE PRECISION

SELECT trunc(number1 + number2) AS towardzero
FROM decimals;

INTERVIEW TASK21

Given the schema presented below, write SQL to find two actors who cast together the most and list the titles of the movies they appeared in together. Order the result set alphabetically.

TABLE FILM_ACTOR

Column     | Type                        | Modifiers
------------+-----------------------------+----------
actor_id | smallint | not null
film_id | smallint | not null
...

TABLE ACTOR

Column     | Type                        | Modifiers
------------+-----------------------------+----------
actor_id | integer | not null
first_name | character varying(45) | not null
last_name | character varying(45) | not null
...

TABLE FILM

Column     | Type                        | Modifiers
------------+-----------------------------+----------
film_id | integer | not null
title | character varying(255) | not null
...

THE DESIRED OUTPUT:

first_actor | second_actor | title
------------+--------------+--------------------
John Doe | Jane Doe | The Best Movie Ever
...
  • first_actor – Full name (First name + Last name separated by a space)
  • second_actor – Full name (First name + Last name separated by a space)
  • title – Movie title

Note: the actor_id of the first_actor should be lower than the actor_id of the second_actor.

-- Replace with your SQL Queryselect 
Select
concat(a1.first_name, ' ', a1.last_name) as second_actor,
concat(a2.first_name, ' ', a2.last_name) as first_actor,
f.title
from (
select
fa1.actor_id as actor_a_id,
fa2.actor_id as actor_b_id,
count(fa1.film_id) as casted_together,
array_agg(fa1.film_id) as film_ids
from film_actor fa1
join film_actor fa2
on fa1.film_id = fa2.film_id
and fa1.actor_id > fa2.actor_id
join film f
on fa1.film_id = f.film_id
group by fa1.actor_id, fa2.actor_id
order by casted_together desc
limit 1
) s
join actor a1 on a1.actor_id = s.actor_a_id
join actor a2 on a2.actor_id = s.actor_b_id
join film f on f.film_id = any(s.film_ids)

INTERVIEW TASK22

For this challenge you need to create a UNION statement, there are two tables ussales and eusales the parent company tracks each sale at its respective location in each table, you must all filter the sale price so it only returns rows with a sale greater than 50.00. You have been tasked with combining that data for future analysis.
Order by the US than EU sales as location.

TABLE SCHEMAS

USSALES,EUSALES

The tables of US sales and EU sales with their related details

NAMETYPEidINTnameVARCHAR(100)priceDECIMALcard_nameVARCHAR(100)card_numberVARCHAR(19)transaction_dateDATE

ALTER TABLE ussales
ADD COLUMN location varchar(255)
DEFAULT 'US';
ALTER TABLE eusales
ADD COLUMN location varchar(255)
DEFAULT 'EU';
SELECT * FROM ussales
WHERE price > 50.00
UNION ALL
SELECT * FROM eusales
WHERE price > 50.00
ORDER BY location desc, id;

INTERVIEW TASK23

For this challenge, your task is to create a VIEW. This VIEW is used by a sales store to give out vouchers to members who have spent over $1000 in departments that have brought in more than $10000 total ordered by the member’s id. The VIEW must be called members_approved_for_voucher. Create a SELECT query using this view that extracts the expected results.

TABLE SCHEMA

SALES

The related table for departments, products, and members

NAMETYPEidINTdepartment_idINTproduct_idINTmember_idINTtransaction_dateDATE

CREATE VIEW members_approved_for_voucher ASSELECT m.id, m.name, m.email, SUM(p.price) as total_spending
FROM members m, sales s, products p
WHERE m.id = s.member_id
AND s.product_id = p.id
AND s.department_id IN (
SELECT s.department_id
FROM products p, sales s
WHERE s.product_id = p.id
GROUP BY s.department_id
HAVING SUM(p.price) > 10000
)GROUP BY m.id
HAVING SUM(p.price) > 1000
ORDER BY m.id;
SELECT * from members_approved_for_voucher;

INTERVIEW TASK24

For this challenge, you need to create a simple SELECT statement that will return all columns from the people table with age over 50. Sort the results by age descending.

TABLE SCHEMA

PEOPLE

People’s table with id, name, and age

NAMETYPEidINTnameVARCHAR(100)ageINT

SELECT * FROM people
WHERE age > 50
ORDER BY age DESC;

INTERVIEW TASK25

For this challenge you need to create a SELECT statement, this SELECT statement will use an IN to check whether a department has had a sale with a price over 90.00 dollars BUT the SQL MUST use the WITH statement which will be used to select all columns from sales where the price is greater than 90.00, you must call this sub-query special_sales.

TABLE SCHEMA

DEPARTMENTS

A list of department’s names and their IDs

NAMETYPEidINTnameVARCHAR(100)

SALES

A list of sales, their properties, and related department ID

NAMETYPEidINTdepartment_idINTnameVARCHAR(100)priceDECIMALcard_nameVARCHAR(100)card_numberVARCHAR(19)transaction_dateDATE

WITH special_sales AS (SELECT * FROM sales WHERE (price) > 90.00)
SELECT id, name FROM departments
WHERE id IN ( SELECT department_id
FROM sales
WHERE (price) > 90.00);

In this article, we have discussed some standard SQL problems that you might come across in your next SQL tech interview. These problems are based on my personal experiences and the same concepts can be tested using other questions. The literal questions don’t matter.

At last, I would also like to share some cool advice from Carson Forter who recently published an article on Medium

We tried to cover as much as we could for a newbie to get started with the Specialist exam. As always, We welcome feedback and constructive criticism. We can be reached on Twitter at @rajvivan and @pallavinaik_ . If you enjoyed this blog, we’d love for you to hit the share button so others might stumble upon it. Please hit the subscribe button as well if you’d like to be added to my once-weekly email list, and don’t forget to follow Vizartpandey on Instagram!

Also, here are a few hand-picked articles for you to read next:

--

--

Rajeev Pandey

I’m Rajeev, 3 X Tableau Zen Master , 5 X Tableau Ambassador, Tableau Featured Author, Data Evangelist from Singapore