https://linen.dev logo
Join Slack
Powered by
# data-engineer-challenge
  • g

    gentle-petabyte-80785

    07/06/2023, 3:31 PM
    πŸ† Data Engineer Challenge - Weekly A weekly challenge to practice data engineering skills such as data modelling, sql, statistics, probability, analytics, coding, and relevant data tools etc. How does it work? β€’ Each week, we post a new Data Engineering question β€’ Community members answer the question β€’ Top correct answers leaderboard is updated at the end of the week β€’ Top 3 winners are awarded swags at the end of month Leaderboard Not yet available. After the first week of the challenge, we will publish the leaderboard of people who submitted the correct answers. What do I do to win this? Submit correct answers to weekly questions as soon as you can. What do winner get? RudderStack swags
  • g

    gentle-petabyte-80785

    07/06/2023, 3:31 PM
    set the channel description: Challenges to practice data engineering skills
  • g

    gentle-petabyte-80785

    07/06/2023, 3:51 PM
    What will be a good day to start this challenge? I'm thinking Sat 8AM PT as folks would have more time to work on personal stuff on weekends
  • g

    gentle-petabyte-80785

    07/08/2023, 3:00 PM
    πŸš€ Let's get started Your time to answer this question starts now ⏱️ #1 - Daily Shopping Champions [SQL Practice]
    Copy code
    Assume you are working with two tables: Users and Orders. The Users table contains user information and the Orders table contains the order information. 
    
    The task is to write an SQL query to identify the users who have made the highest number of orders each day. If there are ties, return all users for that day.
    
    Users Table:
    Column Name	Type
    user_id	integer
    user_name	varchar
    user_email	varchar
    
    Users Example Input:
    user_id	user_name	user_email
    1	Pradeep <mailto:psharma@rudderstack.com|psharma@rudderstack.com>
    2	Soumyadeb <mailto:soumyadeb@rudderstack.com|soumyadeb@rudderstack.com>
    3	Sumanth <mailto:sumanth@rudderstack.com|sumanth@rudderstack.com>
    
    Orders Table:
    Column Name	Type
    order_id	integer
    user_id	integer
    order_date	datetime
    order_amount	decimal
    
    Orders Example Input:
    order_id	user_id	order_date	order_amount
    1001	1	01/01/2023 00:00:00	150.00
    1002	2	01/01/2023 01:00:00	200.00
    1003	1	01/01/2023 02:00:00	300.00
    1004	3	01/02/2023 00:00:00	50.00
    1005	3	01/02/2023 01:00:00	100.00
    1006	2	01/02/2023 02:00:00	500.00
    1007	2	01/02/2023 03:00:00	150.00
    1008	3	01/02/2023 04:00:00	200.00
    
    Example Output:
    order_date	user_id
    01/01/2023	1
    01/02/2023	2
    m
    • 2
    • 3
  • g

    gentle-petabyte-80785

    07/15/2023, 3:00 PM
    🎯 # 2 - Unsold Inventory Inspectors [SQL Practice]
    Copy code
    Assume you're given the tables below about a Shopping Platform with data on Products, Sellers and Sales.
    
    Write a SQL query to return the IDs of the Sellers who haven't sold any product yet. The output should be sorted in ascending order.
    
    products Table:
    Column Name	Type
    product_id	integer
    product_name	varchar
    seller_id	integer
    
    products Example Input:
    product_id	product_name	seller_id
    1001	Apple iPhone 13	30001
    1002	Samsung Galaxy S21	30002
    1003	OnePlus 9	30003
    1004	Apple iPhone 13	30004
    1005	Samsung Galaxy S21	30005
    
    sellers Table:
    Column Name	Type
    seller_id	integer
    seller_name	varchar
    
    sellers Example Input:
    seller_id	seller_name
    30001	John's Electronics
    30002	Samsung Official Store
    30003	OnePlus Official Store
    30004	Apple Official Store
    30005	Electronics Depot
    
    sales Table:
    Column Name	Type
    sale_id	integer
    product_id	integer
    sale_date	datetime
    
    sales Example Input:
    sale_id	product_id	sale_date
    50001	1001	05/01/2023 00:00:00
    50002	1002	05/02/2023 00:00:00
    50003	1001	05/03/2023 00:00:00
    50004	1002	05/04/2023 00:00:00
    50005	1001	05/05/2023 00:00:00
    
    Example Output
    seller_id
    30003
    30004
    30005
    
    The dataset you are querying against may have different input & output - this is just an example!
    m
    • 2
    • 2
  • g

    gentle-petabyte-80785

    07/29/2023, 6:35 AM
    Hello everyone, sorry about missing the last week's challenge. It won't happen again. I have problems ready for next 2 months. β€’ You'll get the challenges on the same time- 8AM PT every Sat β€’ I will share the solution after 1 week of posting the problem β€’ You can ask upto 2 hints to solve the problem. To ask for hint, comment or react with βœ‹ to the problem
    πŸ‘ 1
  • g

    gentle-petabyte-80785

    07/29/2023, 6:36 AM
    Let's do this. Your feedback is always welcome. It is a collaborative efforts, and I am flexible to change it to suit your needs.
  • g

    gentle-petabyte-80785

    07/29/2023, 3:00 PM
    🎯 Week 3 Challenge - Statistics Practice Abnormal Visitor Surge A data engineer collected data from a website over a period of 30 days. The website had an average of 5000 visitors per day with a standard deviation of 200. On a particular day, the website had 5400 visitors. Is this an unusual event? Why or why not?
    m
    • 2
    • 3
  • g

    gentle-petabyte-80785

    08/05/2023, 3:00 PM
    🎯 Week 4 Challenge - SQL Practice Lucky Draw A company has a database with 1 million records. They want to run a query that selects 10% of the records randomly. What SQL query would accomplish this?
    m
    • 2
    • 2
  • g

    gentle-petabyte-80785

    08/12/2023, 3:00 PM
    🎯 Week 5 Challenge - Statistics Practice XY Relationship You have two variables X and Y. The correlation coefficient between X and Y is
    -0.8
    What does this imply about the relationship between X and Y?
    m
    • 2
    • 2
  • g

    gentle-petabyte-80785

    08/14/2023, 1:10 AM
    set the channel topic: Riddles and challenges to practice data engineering
  • g

    gentle-petabyte-80785

    08/19/2023, 3:00 PM
    🎯 Week 6 Challenge - SQL Practice Departmental Salary Averages Write a SQL query to find the average salary of employees in each department. Display the department name and the average salary. Order the results by the average salary in descending order. Assumption: β€’ Each employee works in only one department. Employees table:
    Copy code
    Column Name	Type
    employee_id	integer
    first_name	varchar
    last_name	varchar
    salary	decimal
    department_id	integer
    Departments table:
    Copy code
    Column Name	Type
    department_id	integer
    department_name	varchar
    Employees example input:
    Copy code
    employee_id	first_name	last_name	salary	department_id
    1	John	Doe	5000.00	1
    2	Jane	Smith	6000.00	1
    3	Jim	Brown	7000.00	2
    4	Jill	White	8000.00	2
    Departments example input:
    Copy code
    department_id	department_name
    1	HR
    2	Engineering
    ---- Example output:
    Copy code
    department_name	average_salary
    Engineering	7500.00
    HR	5500.00
    The dataset you are querying against may have different input & output - this is just an example!
    m
    • 2
    • 2
  • g

    gentle-petabyte-80785

    08/26/2023, 3:00 PM
    🎯 Week 7 Challenge - Probability Practice Poisson's Mysterious Tail A random variable X follows a Poisson distribution with a mean (
    Ξ»
    ) of 5. What is the probability that X takes a value greater than 7?
    m
    • 2
    • 1
  • g

    gentle-petabyte-80785

    09/02/2023, 3:00 PM
    🎯 Week 8 Challenge - SQL Practice Inactive Customers Write a SQL query to find the customers who have not placed any orders in the year 2023. Display the customer_id and customer_name of these customers. Assumption: β€’ Each order is placed by one and only one customer. Customers table:
    Copy code
    Column Name	Type
    customer_id	integer
    customer_name	varchar
    Orders table:
    Copy code
    Column Name	Type
    order_id	integer
    customer_id	integer
    order_date	datetime
    ---- Customers example input:
    Copy code
    customer_id	customer_name
    1	John Doe
    2	Jane Smith
    3	Jim Brown
    4	Jill White
    Orders example input:
    Copy code
    order_id	customer_id	order_date
    1	1	2022-12-31 00:00:00
    2	2	2023-01-01 00:00:00
    3	3	2023-01-02 00:00:00
    4	1	2023-01-03 00:00:00
    Example output:
    Copy code
    customer_id	customer_name
    4	Jill White
    The dataset you are querying against may have different input & output - this is just an example!
    m
    • 2
    • 1
  • g

    gentle-petabyte-80785

    09/09/2023, 3:00 PM
    🎯 Week 9 Challenge - Statistics Practice Defying Expectations You are testing the null hypothesis that the mean of a population is equal to 50. You obtain a sample mean of 52 with a standard deviation of 4. The sample size is 100. What is your conclusion based on a 5% significance level?
    m
    • 2
    • 1
  • g

    gentle-petabyte-80785

    09/16/2023, 3:00 PM
    🎯 Week 10 Challenge - SQL Practice Top Selling Products by Month Write a SQL query to find the product that has the highest total sales in each month of the year 2023. Display the
    month
    ,
    product_id
    ,
    product_name
    , and
    total_sales
    amount. Assumption: β€’ Each order can contain multiple products and each product can be in multiple orders. Products table:
    Copy code
    Column Name	Type
    product_id	integer
    product_name	varchar
    Orders table:
    Copy code
    Column Name	Type
    order_id	integer
    order_date	datetime
    Order_details table:
    Copy code
    Column Name	Type
    order_id	integer
    product_id	integer
    quantity	integer
    price	decimal
    ------ Products example input:
    Copy code
    product_id	product_name
    1	Apple
    2	Banana
    3	Cherry
    4	Durian
    Orders example input:
    Copy code
    order_id	order_date
    1	2023-01-01 00:00:00
    2	2023-01-02 00:00:00
    3	2023-02-01 00:00:00
    4	2023-02-02 00:00:00
    Order_details example input:
    Copy code
    order_id	product_id	quantity	price
    1	1	10	1.00
    2	2	5	0.50
    3	3	20	0.25
    4	4	15	2.00
    Example output:
    Copy code
    month	product_id	product_name	total_sales
    1	1	Apple	10.00
    2	4	Durian	30.00
    The dataset you are querying against may have different input & output - this is just an example!
    m
    • 2
    • 1
  • g

    gentle-petabyte-80785

    09/23/2023, 3:00 PM
    🎯 Week 11 Challenge - Statistics Practice Confidence in the Shadows You have a sample of 100 observations from a population. The sample mean is 20 and the sample standard deviation is 5. What is the 95% confidence interval for the population mean?
    m
    • 2
    • 1
  • g

    gentle-petabyte-80785

    09/30/2023, 3:00 PM
    🎯 Week 12 Challenge - SQL Practice Identify frequently co-ordered products You have been given a database with two tables:
    orders
    and
    order_items
    . The
    orders
    table contains general information about the orders placed, while the
    order_items
    table contains details about each item within those orders. Your task is to find the products that have been ordered together with 'Product X' (let's assume its product_id is 500) more than 3 times in the year 2022. orders Table: Column Name | Type -------------|------- order_id | integer customer_id | integer order_date | datetime order_items Table: Column Name | Type ------------|------- item_id | integer order_id | integer product_id | integer quantity | integer Example Input: orders: order_id | customer_id | order_date ---------|-------------|------------------------ 1 | 100 | 01/10/2022 103000 2 | 101 | 02/20/2022 112000 3 | 100 | 04/15/2022 091500 order_items: item_id | order_id | product_id | quantity --------|----------|------------|--------- 10 | 1 | 500 | 1 11 | 1 | 501 | 2 12 | 2 | 500 | 1 13 | 2 | 502 | 1 14 | 3 | 500 | 1 15 | 3 | 503 | 1 16 | 3 | 501 | 2 Example Output: product_id | times_ordered_together -----------|------------------------ 501 | 2 502 | 1 503 | 1
    • 1
    • 1
  • g

    gentle-petabyte-80785

    10/15/2023, 12:01 PM
    ⏸️ Update: We're pausing the challenge for a couple of weeks as we work on the next stage of the challenge to make it more useful. your suggestions are appreciated!