Which Tool is Right for Your Data Science Task

In other posts, we’ve discussed the benefits of data analysis and shown you the power of data visualization. It’s so important to us that we’ve created a page that shows how these skills benefit your career and organization and how to uplevel your data science skills.

At this point, if you’re considering adding data analysis or data visualization skills to your portfolio, you’re probably wondering which tools you should focus on. This is a common question. There are a lot of data analysis and data visualization tools out there, all of them with different features and capabilities, and people often wonder which tool is best for their task. In other words, when should I stop using a spreadsheet like Excel and start using a data visualization tool like Tableau, and when should I start creating custom solutions using something like Python? That's a difficult question to answer, but in this article, we will try.

This is Not a Feature Comparison

Let me emphasize that we are not comparing products, we’re stating our preferences. There are lots of good articles comparing and contrasting Excel and Tableau, some of which are listed below:

Task Data Entry, Data Collection, Data Storage: Excel

We use spreadsheets, and we use them a lot. There's a big difference between spreadsheets and most data visualization software:

Spreadsheets like Excel support data entry and data storage, but data visualization software like Tableau does not. You can't type data in Tableau or use it to create a budget.

We use spreadsheets to:

  • Make estimates and set budgets for every project.

  • Track bugs in our QA test projects (yes, we have full bug tracking solutions, but not all of our clients do, and spreadsheets are the ubiquitous solution.

  • Provide project reports to clients.

  • Provide bonus reports to employees.

  • Document our training class schedule, then publish it via a plug-in.

  • Track anything and everything that is tracked by line item individually or shared amongst a few people.

Creating Visualizations

We create zero visualizations in Excel for ourselves, although we will do it for clients if that’s how they want them. We don't use Excel to create visualizations because Tableau is so much better. Yes, Excel has the tools to create basic charts in a few clicks, but they are not intuitive or flexible, and the first step in creating a chart is often reorganizing, sorting, and filtering data and, in some cases applying aggregations. Beyond that, customizing visuals to make them perfect for a presentation or attempting to visualize different levels of detail gets complicated and difficult very quickly. All of that is super easy in Tableau or a product like Power BI.

Once the data is connected, you drag and drop the fields you want and click once to pick the type of visualization to use. It's super fast and easy. Then you can start playing with visualizations and adding data, sorting and filtering the visualization, and customizing the level of detail to see exactly what you want.

The line chart above shows company sales across three separate segments, with aggregated data and a forecast, and an interactive filter to allow users to select the segment they want to focus on. All of it was done by drag and drop and clicks (no coding) in less than three minutes.

In short, Excel can create basic charts, but if you want more advanced or complex charts and want to use them as part of a presentation or share them with a team or your clients, Tableau will let you create much better visualizations, faster. Tableau also has more robust filtering options, including global filters that apply to all visualizations and worksheet-specific filters, which give you more ways to showcase your data.

Building Dashboards

Dashboards are the natural evolution of visualizations. Dashboards have multiple visualizations, different types of charts, and graphs that show different insights about a common topic, such as a project, sales data, production information, or whatever you need. Again, you can build dashboards in Excel, but it's not designed to do that. Tableau lets you create insightful and elegant dashboards quickly and easily. Furthermore, Tableau's interactive filtering makes it easy to isolate data you're interested in and analyze it. You can even link tooltips to other visualizations to provide deep and rich visual insights.

Tableau visualizations support tooltips and drill-in.

Data Analysis

Speaking of data analysis, what do we use to analyze data? Before we answer that question, you understand that Excel and Tableau provide fundamentally different types of analysis. Excel is a spreadsheet. Everything is in rows and columns, and that’s how analysis is performed in Excel, by aggregating and comparing data in rows and columns. In Tableau, everything is a visualization, and you analyze data visually.

Another illuminating thing about the difference between how Excel and Tableau are used is that, in Excel, visualizations are created after analysis to showcase findings, whereas, in Tableau, visualizations are used in the analysis process to find those insights and to showcase them.

Both Excel and Tableau let you create calculated fields and data; we feel that Tableau makes it faster and easier to do that at scale. Another key difference is that Tableau supports drill-in automatically, letting you change the level of detail to match what's required for the analysis you're performing. That can get tricky in Excel.

So the question remains; what do we use to analyze data? The answer, is both, depending on the amount of data and depth of analysis we're performing. If we're building an estimate for a client and trying to meet their budgetary requirements, we'll put different options in multiple columns in a spreadsheet, add them up, compare them, and tweak numbers to find the best options for the client. If we have multiple spreadsheets tracking software bugs and want to perform deep analysis to discover what the most common bugs are, which bugs take the longest and shortest time to fix, and which type of bugs generate more cascading issues, then that's a job for Tableau.

Generally speaking, if the data to be analyzed is on a single viewable page on a spreadsheet and involves fewer than five columns of data, and we're performing a simple analysis, such as comparisons, then we'll do that in a spreadsheet. If it goes beyond, if it involves data from multiple sources, or even multiple worksheets, or involves deeper analysis, we'll do that in Tableau.

We always do Exploratory data analysis in Tableau. What is exploratory data analysis? That's when you know your data has information, but you're not entirely sure what you're looking for. Exploratory data analysis is what you do when you don't understand everything about the data you have. Developing that understanding usually generates insights along the way. Take the Excel example about creating an estimate, in that case; we know what we're looking for; costs per line item, the totals based on different options, and a comparison of different packages of options. The bug report analysis is exploratory data analysis. Yes, we have a general idea of things we want to understand more clearly, but don't know for sure what other insights we'll find along the way. For example, we may find that some people on the developer team aren't performing unit tests which is leading to more bugs.

Working with data sources

Another big difference between Excel and Tableau is that Excel is designed to read and store data in a single file, which, inside of Excel, can contain multiple worksheets, but Tableau is designed to work with multiple data sources, including very large databases. While you can get external data for Excel, that’s not typically how we use it. Other people do, but many simply export the data to a .CSV file and open it in Excel.

If you start working with data from multiple data sources and want to maintain a live connection to those data sources for data refresh and scheduled reporting, the challenges get bigger. You have to do a lot of data cleaning, preparation, and transformation to get the data ready for analysis. Excel's Power Query is a nice tool with a smart and easy-to-use interface that is very nice for connecting to data sources and even raw data from the web if you're working with a single table of data. When you go beyond that, Tableau's Prep and Flow tools are better.

Tableau's Prep Builder provides better flexibility in how data is joined and merged and is better at visualizing the cleaning, preparation, and transformation processes through flow diagrams. These make it much easier to maintain and troubleshoot data issues, make changes to data preparation steps, or merge new data sources. The image above shows four data sources being joined, cleaned, and transformed.

Tableau lets you fully automate data preparation steps without using macros and create schedules to refresh data automatically.

Sharing and Collaboration

Some of the comparison articles listed above make the case that "Excel is not a good fit for agile business practices," where data analysis and visualization are concerned, meaning it's not easy to share data easily and securely while maintaining data connections and integrity.

Excel is file-based, and it's designed to share by sharing the file. While Excel does support coauthoring, the default method of sharing is saving a file to a file share where others can access it, and the online sharing tool of choice would be Sharepoint. Herein lies the problem, the only built-in access control to Excel files is password protection, and if you don't use that, you have to rely on file share permissions, or Sharepoint permissions, two different sets of permissions set in different places, often by different admins. This, coupled with the fact that people tend to lean into what they know when trying to get things done, means that Excel files often get shared. This can lead to people using files with data that is out of date or having different files updated and used by different teams and people. This can lead to what they used to call the problem of ‘different versions of the truth’ because people are working off of different files. That was a polite way of saying no one has good data.

Tableau, on the other hand, has a built-in sharing and collaboration platform that is either cloud-based, Tableau Online, or on-premises Tableau Server. You create separate workspaces and assign fine-grain permissions to data sources and workbooks to control who can view and edit workbooks and data sources. You can even control if raw data sources can be viewed and if workbooks and data can be downloaded.

Tableau lets users subscribe to worksheets to see changes delivered to their inbox on a schedule they set, @mention other users in comments, and create alerts based on the data in visualizations so that they are notified when data-based thresholds are crossed. For example, the VP of Sales can set an alert in a report to be notified when total sales for the year exceed the previous year. Tableau is designed for this and designed to do it with good data governance practices.

Reporting

For reporting, we use both Excel and Tableau, and our preference depends on what's being reported and what the reports are supposed to show. We deliver lots of reports in Excel where the data is best represented in a spreadsheet and is contained in a single worksheet, such as a bug list or test results. Individually, these make data and results easy to see and compare. Again, for doing a quarterly rollup, we'd likely bring the data into Tableau to build a series of visualizations.

If the data is large and complex, if it's best presented visually, or if it's going to be shared with executives or clients, we lean toward Tableau. All the reasons listed above make Tableau the better choice for creating and sharing visuals with people inside your organization, and Tableau dashboards are very good presentation tools. Tableau also has richer export options, letting you export to both PDF and PowerPoint directly, and letting you control which visualizations you export. You can also embed Tableau visualizations in web pages.

Python

Where does Python come in? Python is a programming language that is easy to learn and great for working with data. We turn to python when it's better at performing a specific data science task or when we need to do something that other tools can't do and when custom automation is called for.

What Python Does Better:

  • Perform complex or highly customized data cleaning, transformation, and preprocessing where multiple custom or logic-driven steps must be taken, or exterior lookups to other data sources are required to properly prepare or engineer the data.

  • Cleaning and transforming large amounts of data from previously unknown or unused data sources. Tableau is great at working with data sources when you know their features. When we don't know anything about the data, we prefer python because it allows us to more quickly analyze data for features, perform imputation to add values to datasets and perform deduplication across multiple data sets.

  • Examining frequency distribution to determine the best way to statically analyze a data set, taking standard deviation, variance, skewness, kurtosis, etc., into account. Python's ability to output simple visualizations from libraries, such as Matplotlib and Seaborn, allows you to visualize distribution.

What Python Does that Excel and Tableau Can’t:

  • Perform automated data mining by scraping data from websites with libraries like Scrapy and BeautifulSoup and create structured data from it.

  • Image processing for data analysis, such as classification or identification.

  • Natural language processing for data analysis for classification or identification.

  • Building machine learning apps to create estimations, including creating data models, training data models, testing hypotheses, and building data pipelines to replenish models with new data. Python is also an excellent tool for tuning and optimizing models to reduce variance and bias while not making the model too complex or introducing too many errors.

Conclusion

In this article, we've discussed how we use Excel, Tableau, and Python for different data analysis and visualization tasks. As stated earlier, these are preferences, and we have the advantage of knowing how to use all three tools very well. There are also many other tools for data analysis and data visualization that we didn’t examine. Ultimately, you’re going to have to try some tools and figure out what works best for you. We hope the information presented here helps you make your choices. If you use Excel but have never used Tableau or Python, maybe it's time to give them a test drive. For Tableau, a great place to start is our one-day, Tableau for Data-driven Decision Maker's course. For Python, see our list of Python courses, or contact us for the best place to start your Python journey. If you want to learn more about how data analysis and data visualization skills can help your career and organization and how to start your data science journey, see our Data Analysis and Visualization for Career Growth page.