Intro

Welcome to my portfolio website, a digital showcase of my passion for learning and problem solving through data exploration and analysis. Here, I present a curated collection of projects that represent my personal and professional interests. I use analytical tools such as Python, R, Tableau, and Excel, and will continue to add more as I pick them up. Through these projects, I aim not only to showcase my technical skills but also to highlight my ability to generate actionable insights and tell compelling stories from data. If you would like to see the full product, clicking the title or thumbnail of any project will take you to the platform where it is posted publicly.

Enjoy!

Time Series Store Sales Forecasting with R Studio

Time Series Store Sales Forecasting with R Studio

Purpose

This project served as my final exam in my GSCM 571 Business Analytics I class. The focus of the class was the application of key analytics methods dealing with statistics, probability, and forecasting using R Studio. The data comes from the Store Sales - Time Series Forecasting competition on Kaggle and contains store sales data from Corporación Favorita, a large Ecuador-based grocery retailer. The training data includes dates, store and product information, whether an item was being promoted, and the sales numbers. Additional files include supplementary information on holidays, transactions, and the price of oil that is useful in building an accurate model.

The original competition is designed to predict unit sales for thousands of items sold at different Favorita stores, but I have modified the scope of this project, limiting it to predicting sales for the top-selling product family at the highest-volume store so that I can apply the same forecasting methods covered in class. I also build two quick ARIMA time series models to fill in future values for oil price and transactions in the test data.
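As a rough illustration of that gap-filling step (a sketch, not the exact code from the project), quick ARIMA fits with the forecast package might look something like the following; the object and column names (oil$dcoilwtico, transactions$transactions) and the 16-day test horizon are assumptions based on the Kaggle files:

library(forecast)

# Assumed inputs: `oil` is the supplementary oil price file and `transactions`
# holds the daily transaction counts for the chosen store.
oil_ts <- ts(oil$dcoilwtico, frequency = 365)           # daily oil price series
oil_fc <- forecast(auto.arima(oil_ts), h = 16)          # forecast across the assumed test horizon

txn_ts <- ts(transactions$transactions, frequency = 7)  # daily transactions with weekly seasonality
txn_fc <- forecast(auto.arima(txn_ts), h = 16)

autoplot(oil_fc)                                        # quick visual check of the fill-in values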

My goal for this project was to identify and explain how the predictor variables relate to sales for a particular product family and how the information can be used to help inform management decisions. I offer a detailed explanation of my thought process at every step, along with the interpretation of statistical results including evaluating model fit based on collinearity, hypothesis testing, confidence intervals, standard deviation of the residuals, and prediction intervals. I conclude with a brief summary of my results and include some specific examples of how the information can help inform specific decisions.

Check out the full project on Kaggle.

Supply Chain Optimization With LINGO

Supply Chain Optimization With LINGO

Purpose

Global supply chains are complex systems responsible for moving products from multiple origin locations to various transshipment points (such as distribution centers), and finally to their intended destinations. In supply chain management, one of the primary challenges is solving the transportation problem, which involves determining the optimal number of units to ship across each segment of the network. The objective is to ensure that all destination demands are met while minimizing transportation costs.

Supply chain managers must strategically allocate resources to achieve cost-effective and efficient distribution. However, this task is complicated by the need to balance multiple variables, including production capacity, demand requirements, and transportation costs. Linear programming provides a mathematical approach to solving these network flow problems, enabling the most efficient movement of goods across the distribution network while minimizing costs and satisfying demand requirements.

The Problem

For this example, I selected the following case problem from the textbook An Introduction to Management Science: Quantitative Approaches to Decision Making:

The Darby Company manufactures and distributes meters used to measure electric power consumption. The company started with a small production plant in El Paso and gradually built a customer base throughout Texas. A distribution center was established in Fort Worth, Texas, and later, as business expanded, a second distribution center was established in Santa Fe, New Mexico.

The El Paso plant was expanded when the company began marketing its meters in Arizona, California, Nevada, and Utah. With the growth of the West Coast business, the Darby Company opened a third distribution center in Las Vegas and just two years ago opened a second production plant in San Bernardino, California.

Manufacturing costs differ between the company’s production plants. The cost of each meter produced at the El Paso plant is $10.50. The San Bernardino plant utilizes newer and more efficient equipment; as a result, manufacturing cost is $0.50 per meter less than at the El Paso plant.

Due to the company’s rapid growth, not much attention had been paid to the efficiency of its supply chain, but Darby’s management decided that it is time to address this issue. The cost of shipping a meter from each of the two plants to each of the three distribution centers is shown in Table 6.10.

The quarterly production capacity is 30,000 meters at the older El Paso plant and 20,000 meters at the San Bernardino plant. Note that no shipments are allowed from the San Bernardino plant to the Fort Worth distribution center.

The company serves nine customer zones from the three distribution centers. The forecast of the number of meters needed in each customer zone for the next quarter is shown in Table 6.11.

The cost per unit of shipping from each distribution center to each customer zone is given in Table 6.12; note that some distribution centers cannot serve certain customer zones. These are indicated by a dash, “—”.

In its current supply chain, demand at the Dallas, San Antonio, Wichita, and Kansas City customer zones is satisfied by shipments from the Fort Worth distribution center. In a similar manner, the Denver, Salt Lake City, and Phoenix customer zones are served by the Santa Fe distribution center, and the Los Angeles and San Diego customer zones are served by the Las Vegas distribution center. To determine how many units to ship from each plant, the quarterly customer demand forecasts are aggregated at the distribution centers, and a transportation model is used to minimize the cost of shipping from the production plants to the distribution centers.

Table 6.10 SHIPPING COST PER UNIT FROM PRODUCTION PLANTS TO DISTRIBUTION CENTERS (IN $)

                      Distribution Center
Plant             Fort Worth   Santa Fe   Las Vegas
El Paso                 3.20       2.20        4.20
San Bernardino             —       3.90        1.20

Table 6.11 QUARTERLY DEMAND FORECAST

Customer Zone Demand (meters)
Dallas 6300
San Antonio 4880
Wichita 2130
Kansas City 1210
Denver 6120
Salt Lake City 4830
Phoenix 2750
Los Angeles 8580
San Diego 4460

Table 6.12 SHIPPING COST FROM THE DISTRIBUTION CENTERS TO CUSTOMER ZONES

Customer Zone
Distribution Center   Dallas   San Antonio   Wichita   Kansas City   Denver   Salt Lake City   Phoenix   Los Angeles   San Diego
Fort Worth               0.3           2.1       3.1           4.4      6.0                —         —             —           —
Santa Fe                 5.2           5.4       4.5           6.0      2.7              4.7       3.4           3.3         2.7
Las Vegas                  —             —         —             —      5.4              3.3       2.4           2.1         2.5

Managerial Report

You are asked to make recommendations for improving Darby Company's supply chain by answering the following questions:

  1. Draw a network representation of the Darby Company's current supply chain. If the company does not change its current supply chain, what will its distribution costs be for the following quarter?
  2. Suppose that the company is willing to consider dropping the distribution center limitations; that is, customers could be served by any of the distribution centers for which costs are available. Can costs be reduced? If so, by how much?
  3. The company wants to explore the possibility of satisfying some of the customer demand directly from the production plants. In particular, the shipping cost is $0.30 per unit from San Bernardino to Los Angeles and $0.70 from San Bernardino to San Diego. The cost for direct shipments from El Paso to San Antonio is $3.50 per unit. Can distribution costs be further reduced by considering these direct plant-to-customer shipments?

Solution 1

The first step is to draw a supply chain map showing all of the potential shipment arcs out of the production plants and distribution centers under the current structure. Figure 1 graphically shows the 14 distribution routes (arcs) the Darby Company can use. The amount of supply available is written next to each origin node, and the amount of demand is written next to each destination node. The cost per unit of shipping is written on the arc for each route.

Figure 1. Network map
Transshipment network diagram

A linear programming model is used to solve this transshipment problem to determine the minimum cost under the given constraints. I used double-subscripted decision variables based on the labels I gave in the network diagram above, with XA1 denoting the number of units shipped from origin A (El Paso) to destination 1 (Fort Worth), XA2 denoting the number of units shipped from origin A (El Paso) to destination 2 (Santa Fe), and so on.

Because the objective is to minimize total cost, I used the costs shown on the arcs in Figure 1 (manufacturing cost plus the Table 6.10 shipping cost for the plant arcs, and the Table 6.12 shipping costs for the distribution center arcs) to develop the following objective function:

Min 13.7XA1 + 12.7XA2 + 14.7XA3 + 13.9XB2 + 11.2XB3 + 0.3X14 + 2.1X15 + 3.1X16 + 4.4X17 + 2.7X28 + 4.7X29 + 3.4X210 + 2.1X311 + 2.5X312

This objective function minimizes the total cost across all possible distribution routes.

Transshipment problems also require several constraints for each of the shipment nodes. For origin nodes, the sum of the shipments out minus the sum of the shipments in must be less than or equal to the origin supply. For destination nodes, the sum of the shipments in minus the sum of shipments out must equal demand. For transshipment nodes, the sum of the shipments out must equal the sum of the shipments in.

Based on the requirements listed in the problem description, I came up with the following constraints:

El Paso supply constraint:
XA1 + XA2 + XA3 ≤ 30000
San Bernardino supply constraint:
XB2 + XB3 ≤ 20000
Fort Worth transshipment node:
(X14 + X15 + X16 + X17) - XA1 = 0
Santa Fe transshipment node:
(X28 + X29 + X210) - (XA2 + XB2) = 0
Las Vegas transshipment node:
(X311 + X312) - (XA3 + XB3) = 0
Dallas demand constraint:
X14 ≥ 6300
San Antonio demand constraint:
X15 ≥ 4880
Wichita demand constraint:
X16 ≥ 2130
Kansas City demand constraint:
X17 ≥ 1210
Denver demand constraint:
X28 ≥ 6120
Salt Lake City demand constraint:
X29 ≥ 4830
Phoenix demand constraint:
X210 ≥ 2750
Los Angeles demand constraint:
X311 ≥ 8580
San Diego demand constraint:
X312 ≥ 4460
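
Before presenting the LINGO output, here is a rough sketch of how the same formulation could be solved in R with the lpSolve package. This is my own illustrative translation rather than the LINGO code in Figure 2; the variable order, costs, and right-hand sides come directly from the objective function and constraints above.

library(lpSolve)

# Decision variables, in the order used throughout this write-up
vars <- c("XA1", "XA2", "XA3", "XB2", "XB3",
          "X14", "X15", "X16", "X17", "X28", "X29", "X210", "X311", "X312")

# Objective coefficients: manufacturing plus shipping on the plant arcs,
# shipping only on the distribution center arcs
obj <- c(13.7, 12.7, 14.7, 13.9, 11.2,
         0.3, 2.1, 3.1, 4.4, 2.7, 4.7, 3.4, 2.1, 2.5)

# Constraint matrix: two supply rows, three flow-balance rows, nine demand rows
con <- rbind(
  c( 1,  1,  1,  0,  0,  0, 0, 0, 0,  0, 0, 0,  0, 0),  # El Paso supply
  c( 0,  0,  0,  1,  1,  0, 0, 0, 0,  0, 0, 0,  0, 0),  # San Bernardino supply
  c(-1,  0,  0,  0,  0,  1, 1, 1, 1,  0, 0, 0,  0, 0),  # Fort Worth flow balance
  c( 0, -1,  0, -1,  0,  0, 0, 0, 0,  1, 1, 1,  0, 0),  # Santa Fe flow balance
  c( 0,  0, -1,  0, -1,  0, 0, 0, 0,  0, 0, 0,  1, 1),  # Las Vegas flow balance
  diag(14)[6:14, ]                                       # one row per demand constraint
)
dir <- c("<=", "<=", "=", "=", "=", rep(">=", 9))
rhs <- c(30000, 20000, 0, 0, 0,
         6300, 4880, 2130, 1210, 6120, 4830, 2750, 8580, 4460)

sol <- lp("min", obj, con, dir, rhs)
sol$objval                    # minimum total cost under the current structure
setNames(sol$solution, vars)  # optimal number of units shipped on each arc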

The LINGO model and solution report are included in Figure 2 below:

Figure 2. Optimal solution with current constraints
Transshipment optimization model

The optimal solution projects that total transportation costs for the next quarter, given the current constraints, will amount to $620,770. The El Paso plant is slated to ship 14,520 units to the Fort Worth Distribution Center and 13,700 units to the Santa Fe Distribution Center. The San Bernardino plant, however, is expected to ship only 13,840 of its 20,000-unit capacity to Las Vegas, with no shipments to Santa Fe. At the Fort Worth Distribution Center, 6,300 units will be distributed to Dallas, 4,880 units to San Antonio, 2,130 units to Wichita, and 1,210 units to Kansas City. Meanwhile, the Santa Fe Distribution Center will dispatch 6,120 units to Denver, 4,830 units to Salt Lake City, and 2,750 units to Phoenix. Lastly, the Las Vegas Distribution Center will distribute 8,580 units to Los Angeles and 4,460 units to San Diego.

Solution 2

To determine whether costs can be reduced if the company is willing to drop the distribution center limitations, I present an updated graphical representation of the distribution network along with a modified linear program in Figure 3 and Figure 4 below.

Figure 3. Modified distribution network diagram
Transshipment network diagram

The green arrows highlight new potential distribution routes. Fort Worth can now ship to customers in Denver, while Santa Fe can extend its reach to customers in Dallas, San Antonio, Wichita, Kansas City, Los Angeles, and San Diego. Additionally, Las Vegas can now serve customers in Denver, Salt Lake City, and Phoenix. The supply and demand levels remain unchanged, as do the restrictions on the origin nodes.

Figure 4. Optimal solution without distribution center limitations
Transshipment optimization model

If Darby Company were to lift the distribution center restrictions, it could save a total of $19,828 in shipping costs. By adjusting the constraints, some of the demand could be shifted from the El Paso production plant to the San Bernardino plant, allowing it to ship its full capacity (20,000 units) to Las Vegas. Since the transportation costs per unit are lower from Las Vegas to Salt Lake City ($3.30) and from Las Vegas to Phoenix ($2.40) compared to shipping from Santa Fe to Salt Lake City ($4.70) and from Santa Fe to Phoenix ($3.40), the company would benefit from dropping these restrictions and utilizing the more cost-effective distribution routes.

Solution 3

Allowing direct shipments from El Paso to San Antonio, and from San Bernardino to LA and San Diego, adds three new arcs to the network model. We add XA5, XB11, and XB12 to the objective function. Additionally, we adjust the demand constraints for San Antonio, LA, and San Diego to account for the direct shipments from the production plants. The results are shown below in Figure 5.

Figure 5. Optimal solution with direct-to-customer shipments
Transshipment optimization model

The value of XA5 = 4,880 indicates that 4,880 units are shipped directly from El Paso to San Antonio. Similarly, XB11 = 8,580 indicates that 8,580 units are shipped directly from San Bernardino to LA, while XB12 = 4,460 shows that 4,460 units are shipped directly from San Bernardino to San Diego. This solution report highlights a $248,876 reduction in total cost from implementing these direct-to-customer shipping routes, translating to a 40% savings compared to the previous supply chain model. Based on this analysis, it is recommended that Darby Company adopt direct-to-customer shipping from its production plants and remove the existing limitations on shipping from distribution centers.

Machine Learning: Predicting Student Success With Python

Machine Learning: Predicting Student Success

Purpose

The ultimate goal of higher education institutions is to prepare students for successful careers and graduate highly qualified candidates into the workforce for the betterment of society. Unfortunately, there are many obstacles and barriers to accomplishing this goal. As of 2021, only 33.7% of the population aged 25 and over have completed a bachelor’s degree or higher in the U.S. (US Census Bureau, 2022), while total enrollment has declined by 1.3 million students at public colleges and universities in the United States between 2010 and 2020 (Ma & Pender, 2022).

One of the biggest challenges that higher education institutions face is student dropout rates. Of all the students that enroll in a college institution, more than half fail to complete their degree (Kantrowitz, 2021). Understanding the reasons why is a crucial first step toward addressing the problem and improving graduation rates.

Early detection and intervention for students at risk of dropping out can significantly improve their chances of success. Armed with the right information, educational decision-makers can help provide the right resources at the right time to the students who need them. In addition, institutions can use this data to make changes to university operations that can help facilitate improved graduation rates, such as introducing hybrid and online learning models, DEI initiatives, or the reallocation of resources.

A machine learning model that accurately predicts whether a student will drop out or graduate can provide the insights and recommendations universities need to address the issues causing students to drop out and to develop strategies that improve student success. In this analysis, I use a Kaggle dataset made available by Kancharla Naveen Kumar to analyze student success factors and develop multiple supervised machine learning models (Logistic Regression, Decision Trees, and Random Forest) to classify students into two categories, "Dropout" or "Graduate". The goal is for the predictive model to correctly classify at least 90% of students.

Conclusion

Based on the results of this analysis, the logistic regression model produces the best overall fit metrics and is the optimal supervised machine learning model for predicting student dropouts in this sample population. The logistic regression model accurately predicts 90% of student dropouts and can be used to target communications and allocate resources to at-risk students. That said, the model is built on a limited sample with only two semesters' worth of data, and the results may not generalize to other institutions. In addition, the lower recall scores on all three models suggest that additional features, including student income factors, housing status (on or off campus), grade point average, employment status, and even faculty data, could be added to improve the accuracy and reliability of the results and produce fewer false negatives.

Check out the full project on Kaggle.

HR Attrition Dashboard With Tableau

HR Attrition Visualization

Purpose

This project is my entry into the Data In Motion LLC monthly data viz challenge. The assignment was to take on the role of an HR people analyst and find insights into termination trends, department turnover, salary, diversity rates, and more for over 4,000 (fictional) employees. I decided to create a dashboard to help us understand what is driving attrition at the company and what we can do to reduce employee turnover.

The Result

After review, I have concluded there is insufficient data to determine the root cause of attrition at the company. However, I suspect that the major contributing factors have to do with the company's salary structure and people practices. These factors appear to have been amplified during COVID, as demand and competition have grown for certain types of roles and more people in Software, Sales, and Marketing started finding better opportunities and better salaries. I have the following recommendations to help identify the root cause:

  1. I highly recommend that the company conduct additional research to reevaluate whether its salaries are competitive with the market. The departments with the highest turnover impact are Software, Sales, and Marketing, so I would suggest starting the research there.
  2. I recommend that the company conduct a deep dive to determine the equity of its pay structure across race and gender, as well as across education levels and qualifications. In my research, I found that African Americans are consistently paid around $16,000 below average across all education levels, and employees holding master's degrees (the highest qualification) are paid less than employees at all other education levels.
  3. I suggest that the company implement a targeted exit survey that is consistent across departments to clarify additional details and reasoning behind voluntary exits. Since the top three reasons (better salary, more flexible benefits, and found a better opportunity) could share the same underlying cause, a supplemental survey could help clarify the reason for leaving. The survey should ask questions and capture notes from the employee about job fit, compensation, company culture, and leadership satisfaction. By better understanding the reasons for leaving, we can effectively determine a path toward employee retention.
  4. Without performance-related data, it is impossible to tell whether the company is losing top performers due to increased competition or losing performers across the entire rating scale. I recommend adding performance factors to the dataset, along with the previously mentioned recommendations, so we can dig deeper.

Preparing the data

In my initial exploration of the data, I combed each field for accuracy and consistency and to familiarize myself with the dataset. One thing I noticed was inconsistencies in some of the gender values; for example, there were 19 males recorded as "lesbian" and 7 females recorded as "gay". To handle these for visualization purposes, based on the gender options available, I assigned any females listed as "gay" to "lesbian" and any males listed as "lesbian" to "gay". Perhaps an early indication of the company's lack of diversity: only 2 employees identified themselves outside of the male/female genders, as "Other."

Another thing I noticed about the data was that we only had termination data through March 2022, and the last hire date was December 31, 2021. Since I am unable to determine whether this is an incomplete data set (missing hire info for 2022), I operated under the assumption that the company had not yet hired any new employees in 2022 and that the data collection period ended in March.

Finally, I noticed that tenure is listed in months, so for simplicity in the visuals I converted this value to years, rounded to the nearest tenth.

Analysis

Overview: To establish a baseline for the diversity of the company, I created several elements to look at some basic statistics such as the number of active employees, number of ex-employees, attrition %, salary, age, and the tenure of active vs. ex-employees. I put these stats inside of BANs at the top for at-a-glance information as the dashboard is filtered. Beneath, I added a histogram to display the company's attrition trend during the time period covered in the dataset. For added flexibility, I also decided to add filters at the top of the dashboard for gender, education, and race, so that the user could deep dive any or all of these diversity factors.

Attrition KPIs
Company Overview

Gender, race, & age: Once I had the basic stats identified, the next thing I put together was the demographic visuals to see diversity by gender, race, and age. Based on my initial exploration of the data, I already suspected the company would have opportunities here. To represent gender, I decided to use donut charts to show active vs. inactive employee counts along with the % of the total population for each gender. I deliberately chose to use color in my visuals to represent active vs. inactive employees, so as not to reinforce gender or racial stereotypes, like the much-used pink/blue color palette to represent women and men. In the gender mix I found approximately 60% male vs. 40% female, along with only 2 employees (0.04%) who identified outside of male or female. According to census data, approximately 2.3% of the population identifies as transgender or non-binary.

While creating a visual for race, I chose to use donut charts as well. I felt this gave the best representation of each race, all as a part of the whole with equal representation and visibility. Within the racial mix, I found that Caucasian makes up approximately 75% of the organization, while the second largest group is Asian with just over 10% representation, followed by African American, Hispanic, Native American, and "Other," accounting for a combined total of 13.74% of the total population. Although minority groups are underrepresented in the company, the disparity really presented itself in the salary data (more on that later...).

Age: The best way to represent the age structure of the company was to present it in a vertical column chart. I created 5 bins in which to group employees: <25, 25-34, 35-44, 45-54, and 55+. I found that roughly 77% of the company was made up of employees between ages 25-44, while people ages 45+ make up a combined total of 8.27% of the company.

Race, Gender, Age diversity charts
Diversity charts

Attrition: After getting to know the composition of the company from a people perspective, it was time to look at actual attrition data. I looked at attrition in several different ways, but ultimately I landed on 3 visuals to help identify who was leaving the company and why. I wanted to investigate attrition by tenure, by department, and by reason. Starting with tenure, I noticed that 82% of turnover within the first year was voluntary, while 18% was involuntary. High voluntary turnover in the first year could indicate that new employees are having a hard time adjusting to, or feeling accepted by the current culture.

When investigating attrition by department, I noticed that Software and Sales combined for approximately one-third of all voluntary terminations. No particular sub-departments stood out as a significant source of turnover within the categories, however, both are highly competitive roles that have grown in demand since the start of COVID.

To attempt to answer the question of why employees are leaving, and why terminations spiked from 2021 to 2022, I created a visual to see termination reasons by department. Here I discovered a common theme: "better salary," "more flexible benefits," and "found a better opportunity" were consistently the top 3 reasons for voluntary terminations across all departments. Unfortunately, this does not fully answer the question of why people are leaving. Since many people may place "better salary" under the umbrella of "found a better opportunity," we would need additional data to find the driving causes of each selection. The voluntary reason list is also missing options relating to job satisfaction, company culture, and leadership satisfaction, which automatically places those factors into a more generic bucket.

I also checked to see which populations have the highest attrition rates. I found that:

  • Employees with the job type Manager have the highest attrition rate (20%) compared with all other job types. As I looked into this further, I found that the top reason for manager terminations was "better salary" across nearly all departments.
  • Native American employees have the highest turnover by race at 23.08%, while African Americans have the lowest turnover (13.64%).
  • Employees holding a master's degree have the highest attrition of any education level at 18.91%.
  • Females have a slightly higher attrition rate than males, at 18.42% compared to 17.07%.
Company attrition
Attrition | By Dept & Reason
Company attrition
Manager Attrition | By Reason

Salary: Fair and equitable compensation is always an important factor in employee retention. To analyze the salary structure of the company I chose to make two column charts, each with a constant reference line indicating the average salary for all employees. One chart shows salaries for active employees, and the other shows salaries for ex-employees. I also added filters for the entire dashboard to allow me to view salary data by gender, race, and education. I felt that there were a few anomalies that were worth further investigation by the company:

  • African Americans were consistently paid around $16,000 below average across all education levels (also recall that they have the lowest attrition rate)
  • Employees holding a master's degree, the highest qualification, were paid the least of all education levels. In fact, among female employees, people with no degree at all were paid the most, over $2,300 more than employees holding a master's degree.
  • Former female employees made significantly more on average than active female employees, which is the largest discrepancy between active and inactive employees.
  • Former employees made more on average than active employees across the board, yet the average tenure of active employees exceeds that of former employees by 4.7 years.
Average salary by race
African American Avg Salary
Average salary by education
Master's is the lowest-paid degree

Conclusion

Although we are not able to definitively conclude that current attrition is caused by salary, the company's salary structure and practices clearly need to be reevaluated by management to ensure fair and equitable pay across race, gender, qualifications, and job performance.

Improving the pay structure will help reduce the heavy costs of employee turnover and ensure the company retains its top talent while replacing low-performing individuals with diverse talent from all races and backgrounds. Adding job performance data, job satisfaction ratings, and additional exit survey questions to this dataset will help pinpoint the root causes of attrition and identify the right actions to take to make the best decisions for the business and make progress toward employee retention and job satisfaction.

Interactive Résumé With Tableau

Interactive resume

Intro

What better way is there to get to know me than through interacting with my experiences? I recently saw a LinkedIn post showing some of the creative ways people have made interactive résumés in Tableau and I was inspired to create my own.

Purpose

I decided to make this résumé to showcase my technical and creative skills to potential employers in the analytics space. When people think "Store Manager," they don't always associate it with the same skills and responsibilities as a Business or Data Analyst; however, the roles have some extraordinarily similar qualities.

A Store Manager, much like a Data Analyst, is responsible for analyzing their business using whatever analytical tools are available to them (Excel, SQL, Tableau, etc.) to identify patterns and trends in the data that help them, or their businesses, make better decisions. They then take this information and put it together in various forms to communicate their findings and best practices to their peers, managers, and other key stakeholders in weekly, monthly, and quarterly business reviews. Unlike a Data Analyst, however, Store Managers are then responsible for developing their own action plans, communicating the solution, and leading their teams to execute the plan from end to end while maintaining responsibility for its success or failure.

In this interactive résumé, you get to see my professional background in a unique way, with additional context that wouldn't fit in a typical one-pager. My work and educational history are presented in a rounded Gantt chart in the center, with my hard and soft skills represented in the horizontal bubble chart below by half-circles (soft skills) and triangles (hard skills). Something not yet included in my résumé is that I have applied to the master's program in Global Supply Chain Management at Portland State University, which starts in the Spring of 2023. Once I am admitted, I will add it to my timeline as in progress.

Hover over, or click each of the elements to explore!

Retail Dashboard With Tableau

Retail KPI dashboard

Purpose

In this project I wanted to create a retail KPI dashboard that is simple, yet delivers a ton of value to field leaders. The end user wants to be able to:

  • Flexibly filter by location
  • See performance period-over-period
  • Compare performance across stores/regions/markets
  • Identify which categories, sub-categories, and products drove the biggest performance wins and losses
  • Customize how they drill into their KPIs to create their own stories based on their unique challenges

Preparing the data

The first thing that I did to set up the data was to join the Orders table with the Returns and People tables, and bring this static dataset up to the current time by using the expression:

DATEADD('day', DATEDIFF('day',{MAX([Order Date (new)])},TODAY()-1),[Order Date])

After deciding what I wanted to include in the dashboard and how I wanted users to interact, I started by creating custom date parameters to allow the user to either pick from a list of common periods (Yesterday, Week-to-date, Month-to-date, etc...), or select their own custom date range on which to compare sales.

Select Date parameter: Sets the list of date options for the user to select.
Tableau parameter
Select Date parameter filter: Uses CASE statement to identify whether the KPI falls into the selected date range or not.
CASE [Select Date]
WHEN 1 THEN DATEDIFF('day', [Order Date], TODAY()) = 1
WHEN 2 THEN DATEDIFF('week', [Order Date], TODAY()) = 0 AND [Order Date] < TODAY()
...
WHEN 12 THEN [Order Date] >= [Start Date] AND [Order Date] <= [End Date]
END
Comparison Date filter: Uses CASE statement to identify whether the KPI falls into the date range for the previous period so that the differences and % change can be calculated.
CASE [Select Date]
WHEN 1 THEN DATEDIFF('day', [Order Date], TODAY()) = 2
WHEN 2 THEN DATEDIFF('week', [Order Date], TODAY()) = 1 AND DATEPART('weekday', [Order Date]) <= DATEPART('weekday', TODAY())
...
WHEN 12 THEN [Order Date] >= [Start Date] - [Custom Date (Days in Range)] AND [Order Date] <= [End Date] - [Custom Date (Days in Range)]
END

I then created a parameter list for users to filter by their desired KPI. Each KPI has a calculation that refers to the selected period and the comparison period, as well as the difference, % change, and a directional "arrow" to symbolize an increase, decrease, or no change based on the difference between periods.

Select KPI parameter: Sets the list of KPI options for the user to select.
Tableau parameter
Selected period KPI: Gets the KPI value within the selected date range.
IF [Select Date Parameter Filter] = TRUE
THEN [Sales] ELSE 0 END
Comparison period KPI: Gets the KPI value within the comparison period range.
IF [Comparison Date Filter] = TRUE
THEN [Sales] END
KPI difference: Calculates the difference between the selected date and comparison date.
ZN(SUM([Sales (Selected Date)])) - ZN(SUM([Sales (Comparison Date)]))
KPI % change: Calculates the % change between the selected date and comparison date.
(SUM([Sales (Selected Date)]) - SUM([Sales (Comparison Date)]))
/
SUM([Sales (Comparison Date)])
Directional arrows: Shows whether the selected KPI has increased, decreased, or remained flat period vs. period.
IF [Sales Difference] = 0 THEN "▶ "
ELSEIF [Sales Difference] < 0 THEN "▼ "
ELSE "▲ "
END

Finally, I calculated the custom KPIs such as ASP (Average Sale Price) and UPT (Units Per Transaction) and started building my worksheets.

Design

Executive Dashboard: This dashboard provides a high-level overview of the business for Regional Managers and above to identify KPI trends at a global level. I wanted the most important metrics to be displayed front and center at the top of the page in what are called BANs (Big A** Numbers), to provide quick context and key takeaways on the business's current performance.

Retail KPIs
BANs

The horizontal bar chart (bottom left) allows the user to drill down on each KPI by location. The user has the flexibility to drill all the way down to the City level to see how individual stores are performing.

Orders by location bar chart
Horizontal Bar Chart

The hotspot map (bottom right) allows the user to quickly compare KPIs at a global level. The color intensity and bubble size indicate KPI performance for each market, country, or region depending on the applied filters. Users can click on a specific country to see details, or use the filters at the top of the page and the map will refocus to show only the selected locations.

Orders by country map
Hotspot Map

Field Dashboard: As an ex-Store Manager, I built this dashboard around the information I would expect to see at that level. It allows managers to dive deeper into the business by segment, category, sub-category, and customer-level data. Similar to the Executive dashboard, the major KPIs most relevant to Store Managers are located in BANs at the top of the report. Just below, users can see the current performance trends by Segment and Category/Sub-Category. Rather than creating another view for Category (which contains only Furniture, Office Supplies, and Technology), I let users control this level of detail in each view with the filters at the top.

Orders by sub category bar chart
Orders | by sub-category

At the bottom of the Field dashboard I created views for the user to see their top and bottom products for each KPI. Managers are able to identify which products are contributing the most and least value to their business so they can make adjustments at the local level to drive sales or fix issues causing poor performance. When I was a store manager, bottom sellers would often uncover inventory discrepancies preventing products from being sold that wouldn't have been discovered otherwise. Depending on the business, the ability to see customer-level data can also help managers build important relationships with customers.

Top 10 KPIs by product
Top 10 | by product

KPI Detail: Not everyone is a chart and graph person, and no single dashboard is going to display everything. To give managers the flexibility to dive deeper into their numbers, this dashboard allows the user to drill down into KPIs in a number of ways. The user can create a location-based view, a category-based view, an order-priority-based view, or any combination.

KPI drill down table
KPI drill down table

Data Literacy

To promote data literacy and help users understand how to operate the reports, each dashboard has an instruction overlay providing basic information about how to navigate the report. The overlay can be closed or opened by selecting the "X" in the top left corner. Each bar chart dynamically highlights the top performing location or product based on the filtered KPI, as well as the period vs. period change.

Dashboard instructions
KPI drill down table

Check out the full dashboard on Tableau Public.

Bellabeat Case Study With MySQL

Bellabeat case study

Purpose

This case study is the capstone project in the Google Data Analytics Professional Certificate on Coursera. I was provided a dataset from Kaggle on which to analyze trends in fitness tracker usage to answer key business questions and make high-level recommendations for a wellness brand's marketing strategy (Bellabeat).

In this project I take on the role of a data analyst working on the marketing analytics team at Bellabeat, a high-tech manufacturer of health-focused products for women. Bellabeat is a successful small company, but they aspire to become a larger player in the global smart device market. I have been tasked with focusing on one of Bellabeat's core products and analyzing smart device data to gain insight into how consumers are already using their devices. The insights I discover will then help guide the company's marketing strategy.

Key business questions

To get started, I was asked to analyze smart device usage data to gain insight into how consumers use non-Bellabeat smart devices. The following questions will guide my analysis:

  1. What are some trends in smart device usage?
  2. How could these trends apply to Bellabeat customers?
  3. How could these trends help influence Bellabeat marketing strategy?

About the data

I was provided a dataset on which to perform my analysis:

  • FitBit Fitness Tracker Data (CC0: Public Domain, dataset made available through Mobius): This Kaggle data set contains personal fitness tracker data from thirty-three Fitbit users.

Thirty-three eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. It includes information about daily activity, steps, and heart rate that can be used to explore users' habits. However, this dataset has several limitations to keep in mind as I explore the data:

  1. The sample size is extremely small, and not all users had 100% participation.
  2. There is no way to determine the gender of participants. Since Bellabeat specifically designs products for women, this data set may not be the best representation of Bellabeat's target audience.
  3. Not all devices tracked the same information (e.g., sleep and weight log data), so results using that data will be less reliable.
  4. This data set is 7 years old. Both Bellabeat and Fitbit have since come out with new devices and technology that have impacted the way people use their devices.

Preparing the data

After creating a database in MySQL and importing all of the csv files, I started by checking the basic statistics and cleaning the data. I went through and removed duplicate records and outliers that would skew the data, renamed and formatted columns for consistency, and merged tables to compare sleep and weight log data alongside other daily activity. I have uploaded my SQL scripts with detailed notes to my GitHub page.

Designing the dashboard

I decided to make this a Tableau story where I could present my findings in a format similar to a PowerPoint presentation. There are 4 slides, excluding the title page.

Overview: I provide a brief explanation of the project and the business task.

Analysis: I briefly describe the tools I used to complete the analysis, and the questions I wanted to answer while exploring the data.

Results: I summarize my high-level recommendations for Bellabeat regarding how this information can help guide marketing strategy.

Interactive Dashboard: I compiled some of the key insights I found into an interactive dashboard with visuals for the various activities. In a presentation, these would likely be static images that display my results; however, for this particular project I wanted to give viewers the chance to customize their interaction with the data. Therefore, the horizontal activity bar chart and the sleep vs. activity scatter plot can both be filtered for different activities.

See this project also done in R Studio

PDX Weather Analysis With RStudio

Weather visualization project

Purpose

I'm fascinated by climate change and weather-related events. To develop my skills with the programming language R, I pulled real climate data for the PDX weather station dating back to 1938 from NOAA.gov to help me visualize what the climate in Portland has looked like over the last 84 years.

Prior to moving to Portland in 2011, I lived in Bend, OR. Growing up, all I ever heard about was how wet and rainy Portland was. People made it sound like Portland never saw the sun. In my exploration of the data, I found some interesting facts relating to precipitation, snow, and temperature change in Portland. All this Portland data made me curious to compare it to my home town of Bend, OR.

Spoiler alert! Although the average temperature in Portland has been rising over the past 80 years, temperatures in Bend have actually declined!

Preparing the data

To start this project, I used the tidyverse package and the read_csv function to establish a live connection with the NOAA.gov site that provides the .csv file for the PDX weather station. Since the data didn't originally have column headers in the source file, I had to locate those manually in the README file and add them to my import. After determining the columns I needed, I imported the data and used the pivot_wider function to change it from long format to wide format, putting each variable into its own column. The only variables I was interested in were date, tmax, tmin, prcp, and snow. I also used the lubridate package to convert the date field into ymd format instead of just a string of numbers. Once my table format was established, I assigned it to a variable called pdx_weather and saved it as the source file on which to run future analyses. Now I can come back at any time and run the script to get updated climate data.
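
A condensed sketch of that preparation pipeline is shown below. The station file URL, station ID, and raw column names are assumptions for illustration; the real column order comes from the NOAA README described above.

library(tidyverse)
library(lubridate)

# Assumed GHCN-Daily by-station file for PDX (the station ID is an assumption)
pdx_url <- "https://www.ncei.noaa.gov/pub/data/ghcn/daily/by_station/USW00024229.csv.gz"

pdx_weather <- read_csv(
  pdx_url,
  col_names = c("station", "date", "element", "value",
                "m_flag", "q_flag", "s_flag", "obs_time")    # headers taken from the README
) %>%
  filter(element %in% c("TMAX", "TMIN", "PRCP", "SNOW")) %>%
  select(date, element, value) %>%
  pivot_wider(names_from = element, values_from = value) %>% # long format -> wide format
  rename(tmax = TMAX, tmin = TMIN, prcp = PRCP, snow = SNOW) %>%
  mutate(date = ymd(date))                                   # e.g. 19380101 -> 1938-01-01

write_csv(pdx_weather, "pdx_weather.csv")                    # source file for future analyses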

Analysis

The first thing I wanted to look at was precipitation and snow data. To get started it's always a good idea to make sure the data makes sense. I first went through and checked for outliers by visualizing the data using a line plot and a histogram. I initially thought there might have been something wrong with the snow data, but after searching the date on Google, I discovered that over 40 inches of snow actually did dump on Portland in January of 1950! See the figure below:

Portland snowfall
PDX snowfall 1950

To get a better look at snow in Portland, the next thing I did was organize the data so that I could see snow by month. This would allow me to see when it typically starts and stops snowing, and which months get the most snow. To do this I had to reorganize the calendar into a "snow year," which assigns any snow that falls before August to the previous calendar year's snow season. I learned that the latest it has ever snowed in Portland is May (1953), and the earliest it has ever snowed is October (1950). Portland also has 9 years on record with no snow. I came up with the following visual, another striking view of how snow is distributed throughout the year:

Portland monthly snow
PDX snow by month
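
The snow-year regrouping described above only takes a small mutate; here is a sketch using the pdx_weather table from the preparation step:

# Snow that falls before August counts toward the season that began the previous fall
pdx_snow <- pdx_weather %>%
  mutate(
    month     = month(date, label = TRUE),
    snow_year = year(date) - (month(date) < 8)   # subtracts 1 for Jan-Jul observations
  )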

After checking the data for anomalies, I moved on to plotting precipitation in a way that lets me see each year and compare current precipitation to the average precipitation over the last 84 years. I created a line plot where each line represents the cumulative precipitation for a single year. I then added code to highlight the current year in blue, all other years in light gray, and the mean precipitation as a black smoothed curve. I found that as of June, our precipitation was higher than average, but I was really intrigued to find out which year had the massive spike that dumped over 60 inches! I created the two plots below to find out:

Portland precipitation 2022
PDX precipitation 2022
Portland precipitation record 1996
PDX precipitation 1996
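
A rough ggplot2 sketch of that cumulative precipitation view is below; the column names follow the prepared pdx_weather table, and the highlight year is hard-coded for illustration.

pdx_prcp <- pdx_weather %>%
  mutate(year = year(date), yday = yday(date)) %>%
  group_by(year) %>%
  arrange(date, .by_group = TRUE) %>%
  mutate(cum_prcp = cumsum(replace_na(prcp, 0))) %>%   # running total within each year
  ungroup()

ggplot(pdx_prcp, aes(yday, cum_prcp, group = year)) +
  geom_line(data = filter(pdx_prcp, year != 2022), color = "grey85") +   # all other years
  geom_smooth(aes(group = 1), color = "black", se = FALSE) +             # smoothed mean curve
  geom_line(data = filter(pdx_prcp, year == 2022),
            color = "dodgerblue", linewidth = 1) +                       # current year highlight
  labs(x = "Day of year", y = "Cumulative precipitation")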

This was all very interesting and fun to learn, but I really wanted to know whether it has been getting wetter or drier over the last 84 years. I thought a line plot would be a good fit for this as well, and I figured I should also see which direction the temperature has been moving to find out whether there is a relationship between temperature and precipitation. I created the faceted line plot below to see this relationship:

Portland temperature vs. precipitation correlation
Temperature Vs. Precipitation

Then I put tmax and prcp into a scatterplot and ran a correlation test:

Portland temperature vs. precipitation correlation
Temperature Vs. Precipitation 2

No significant relationship exists between temperature and precipitation.

Temperature

2022 was a hot year, so I wanted to dive deeper into this temperature change next. I wanted to be able to see how the average temperature in 2022 compared to historical temps. I found a few interesting ways to visualize this.

The first is a line chart that shows PDX temperature change (above or below the normalized mean) by month, with 2022 highlighted in dodgerblue and all other years in light gray. You can see in this chart that July through October were some of the hottest months on record, between 5 and 10 degrees hotter than average.

Portland average temperature
PDX Monthly Temperature Change
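
A sketch of how that monthly departure chart could be assembled, again using the prepared pdx_weather table (the departure units follow whatever units the raw tmax values are stored in):

pdx_monthly <- pdx_weather %>%
  mutate(year = year(date), month = month(date)) %>%
  group_by(year, month) %>%
  summarise(avg_tmax = mean(tmax, na.rm = TRUE), .groups = "drop") %>%
  group_by(month) %>%
  mutate(departure = avg_tmax - mean(avg_tmax)) %>%    # above/below the normalized monthly mean
  ungroup()

ggplot(pdx_monthly, aes(month, departure, group = year)) +
  geom_line(data = filter(pdx_monthly, year != 2022), color = "grey85") +
  geom_line(data = filter(pdx_monthly, year == 2022),
            color = "dodgerblue", linewidth = 1) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  scale_x_continuous(breaks = 1:12) +
  labs(x = "Month", y = "Departure from monthly mean (tmax)")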

During the record heat wave in the summer of 2022, I saw a scatterplot posted on Twitter that visualized the record heat in California, and I figured out how to reproduce the same visual for Portland. This chart shows the daily high temps in gray for each day dating back to 1939, the current month of interest highlighted in dodgerblue, and a dotted white line representing the normalized mean temp used for comparison. I found that September and October of 2022 were the hottest September and October on record at PDX.

Portland record temperatures
Record Heat

I can honestly say that this has been my favorite project so far because I can come back to it at any time to update any of the visuals to reflect recent events. I learned a ton about R programming, and I'm excited to continue to produce more projects in the future. If you want to see my code with detailed notes, and the additional comparisons I made with Bend climate data, please check it out on my GitHub page!

LeBron James Stats With Excel & Tableau

Lebron James career stats

Purpose

This was my first self-guided project using Tableau to create a visualization. I performed most of the heavy lifting in Excel prior to importing into Tableau; I'll go into more detail about how I processed the data in Excel below. I love following all the G.O.A.T.s in their respective sports, but I was particularly interested in putting this one into a viz. I decided to focus on LeBron because I knew he was getting close to breaking the all-time scoring record held by Kareem Abdul-Jabbar for nearly four decades. I wanted to see how he scored against each team he has played against and at which arenas he plays best.

Process

For this project, I pulled dozens of tables from the nba.com stats pages and used Power Query to merge all of the files into a single Excel doc. To prepare the data for analysis, I used Text to Columns to separate Date, Team, OPP, and home vs. away into separate columns, and then I replaced the "vs." and "@" symbols with "home" and "away". I used Google Maps to get the latitude, longitude, and name of each stadium, created a stadium reference sheet, and used IF statements to populate the arena name, city, and state columns for each game. This allowed me to make the following transformation:

Before
After

From here, I used formulas to pull his wins and losses against each opponent into the stadium list and summarize his W/L ratio for use in a hotspot map in Tableau. I used the following formulas:

Wins and Losses: =COUNTIFS(lebron_career_stats!$J:$J, "W", lebron_career_stats!$C:$C, 'stadium list'!A2)
Win/Loss Ratio: =H2/SUM(H2:I2)

This was the result:

NBA stadium list
Stadium List

Next I loaded this data into Tableau to begin visualizing. I wanted three basic elements to my dashboard:

  • A chart to see his average PPG against each opponent for each team he has played for
  • A visual that shows his points scored in every game, the team he played for, and his opponent
  • A visual that shows where he plays the best based on his W/L ratio

Average points by opponent

For this visual I placed the Team dimension on the Rows shelf and the OPP dimension on the Columns shelf. I added AVG(PTS) to Label on the Marks card as well as to the Color mark to create a highlight table. As the Lakers are his current team, I chose a color scale from white to purple to represent his performance. It's interesting to see that he has consistently performed worst against LAC on all three teams he has played for, yet he performed much better against the Lakers even though they play in the same stadium. And he clearly holds a grudge against his former team in CLE, as he has a career-best 34.6 points per game against the Cavs and a 94.4% W/L ratio.

Lebron James points chart
Points Highlight Chart

Radial Bar Chart

Being new to Tableau, this one took some help and inspiration from fellow viz artists online. I've seen several radial bar charts while browsing through Tableau visualizations, and I thought this would be the perfect opportunity to learn how to create one. With the help of my good friend YouTube, I learned I needed to union my data to itself and create the calculated fields below:

Path Order: Dimension that determines the inner and outer dots on each line.
IIF([Table Name]="lebron_career_stats",0,1)
Radial Field: Determines the measure to be used and the length of the bars.
[PTS]
Radial Angle: Looks down the table using the INDEX function to give a form of ordinality and calculates the angle at which Tableau needs to plot each line.
(INDEX()-1) * (1/WINDOW_COUNT(COUNT([Radial Field]))) * 2 * PI()
Radial Normalized Length: Calculates the length of the bars and how far from the center Tableau should plot the inner and outer points.
[Radial Inner] + IIF(ATTR([Path Order]) = 0, 0, SUM([Radial Field])/WINDOW_MAX(SUM([Radial Field])) * ([Radial Outer]-[Radial Inner]))
Radial X & Radial Y: Trigonometric functions that calculate the points on the X and Y axes where the start and end points are placed (COS for X, SIN for Y).
[Radial Normalized Length] * COS([Radial Angle])
[Radial Normalized Length] * SIN([Radial Angle])

I also created two parameters called Radial Inner and Radial Outer so the normalized length calculation had something to work with.

Now that I had all the calculations needed for the chart, the first step was to switch the chart type to Line and drag the Path Order dimension onto the Path card. Then I dragged the Radial X and Radial Y pills onto the Columns and Rows shelves, respectively. Next, I added the Date pill to the Detail card so that each bar corresponds to a particular game, and changed the table calculations on the X and Y axes to compute using Date. From there, I just had to add the Team pill to the Color card to show which team LeBron played on in each game, make some minor adjustments to the size of the circle, and voilà! I had my chart.

Radial bar chart
Radial Bar Chart

Hotspot Map

The final step to complete the elements for my dashboard was to use the location data I worked so hard for earlier and visualize his W/L ratio on a map. To do this, I added the Longitude pill to the Columns shelf and the Latitude pill to the Rows shelf to get my map. Then I added the Arena pill to the Detail card to place a mark over each arena location. Finally, I added PTS to the Size card, made sure it aggregated by AVG, and then added W/L Ratio to the Color card so that his best W/L ratios show in a deep shade of purple and his lowest in light gray.

Hotspot map
Hot Spot Map

Conclusion

This was a great project to get me started on learning Tableau. It helped me find specific Youtube channels to subscribe to that can help me in future projects, and I got tons of practice finding answers through the Tableau community forums and Stack Overflow. If I had to do this project over, I would love to try to establish a live connection with the data so that it is easier to update and doesn't become outdated.

Check out the full dashboard on Tableau Public.

About Me

Chase Carlson

Hi, my name is Chase, and I am a process-driven, solution-oriented individual passionate about solving problems and improving operational efficiency. I am a self-starter, motivated by opportunities to learn and apply new skills. I hold a bachelor's degree in Business with an emphasis in Small Business Management & Entrepreneurship. In the Spring of 2023, I started pursuing a Master of Science in Global Supply Chain Management at Portland State University with an expected graduation date of December 2024. Course topics include strategic planning, sourcing, logistics, business analytics (forecasting & optimization modeling), new product introduction, supplier relationship management, global contract negotiations, and leadership. The program also includes an international field study with over 10 factory visits to gain first-hand insight into global manufacturing processes and supplier operations.

I have 17 years of retail leadership and analytical experience working directly with customers and front-line employees to deliver world-class retail experiences and business results. Most recently, I was responsible for building and developing a team of 15-45 retail employees and analyzing customer insights, sales KPIs, and operational metrics to generate data-driven insights and implement solutions to solve business and leadership challenges. As the Site Manager in a start-up environment, I planned, organized, implemented, and continuously improved several processes including omnichannel programs, Customer Returns, Digital Subscriptions, workforce management systems, inventory management systems, and more.

I am currently looking to transfer my skills to a global organization where I can challenge myself to solve progressively difficult problems while continuously learning new skills. My long-term goal is to grow with the company and continue to advance my knowledge, skills, and contributions at various levels of supply chain and business strategy. If you are a recruiter or a hiring manager, I would love to share how my experiences will add value to your team.

View my interactive resume
