

Top Tools for Data Scientists: Analytics Tools, Data Visualization Tools, Database Tools, and More

Data scientists are inquisitive and often seek out new tools that help them find answers. They also need to be proficient in using the tools of the trade, even though there are dozens upon dozens of them. Overall, data scientists should have a working knowledge of statistical programming languages for constructing data processing systems, databases, and visualization tools. Many in the field also deem knowledge of programming an integral part of data science; however, not all data science students study programming, so it is also helpful to be aware of tools with user-friendly graphical interfaces that circumvent programming, so that a data scientist’s knowledge of algorithms is enough to build predictive models.

With everything on your plate as a data scientist, you don’t have time to hunt for the tools of the trade that can help you do your work. That’s why we have rounded up tools that aid in data visualization, algorithms, statistical programming languages, and databases. We have chosen tools based on their ease of use, popularity, reputation, and features, and we have listed them in alphabetical order to simplify your search; they are not ranked or rated.

1. Algorithms.io
@algorithms_io

Algorithms.io is a LumenData Company providing machine learning as a service for streaming data from connected devices. This tool turns raw data into real-time insights and actionable events so that companies are in a better position to deploy machine learning for streaming data.

Key Features:

  • Simplifies the process of making machine learning accessible to companies and developers working with connected devices
  • Cloud platform addresses the common challenges with infrastructure, scale, and security that arise when deploying machine data
  • Creates a set of APIs for developers to use to integrate machine learning into web and mobile apps so that any application can turn raw streaming data into intelligent output

Cost: Contact for a quote

2. Apache Giraph

An iterative graph processing system designed for high scalability, Apache Giraph began as an open source counterpart to Pregel but adds multiple features beyond the basic Pregel model. Giraph is used by data scientists to “unleash the potential of structured datasets at a massive scale.”

Key Features:

  • Inspired by the Bulk Synchronous Parallel model of distributed computation as introduced by Leslie Valiant
  • Master computation
  • Sharded aggregators
  • Edge-oriented input
  • Out-of-core computation
  • Steady development cycle and growing community of users

Cost: FREE

3. Apache Hadoop
@hadoop

Apache Hadoop is open source software for reliable, scalable, distributed computing. A framework that allows for the distributed processing of large datasets across clusters of computers, the software library uses simple programming models. Hadoop is appropriate for both research and production.

Key Features:

  • Designed to scale from single servers to thousands of machines
  • The library detects and handles failures at the application layer instead of relying on hardware to deliver high-availability
  • Includes the Hadoop Common, Hadoop Distributed File System (HDFS), Hadoop YARN, and Hadoop MapReduce modules

Cost: FREE

4. Apache HBase
@ApacheHBase

The Hadoop database, Apache HBase is a distributed, scalable, big data store. Data scientists use this open source tool when they need random, real-time read/write access to Big Data. Apache HBase also provides capabilities similar to Bigtable on top of Hadoop and HDFS.

Key Features:

  • Open source, distributed, versioned, non-relational database modeled after Google’s Bigtable: A Distributed Storage System for Structured Data
  • Linear and modular scalability
  • Strictly consistent reads and writes
  • Automatic and configurable sharding of tables

Cost: FREE

5. Apache Hive
@ApacheHive

An Apache Software Foundation project, Apache Hive began as a subproject of Apache Hadoop and is now a top-level project in its own right. This tool is data warehouse software that assists in reading, writing, and managing large datasets that reside in distributed storage using SQL.

Key Features:

  • Projects structure onto data already in storage
  • Command line tool is provided to connect users to Hive
  • JDBC driver is provided to connect users to Hive

Cost: FREE

6. Apache Kafka
@apachekafka

A distributed streaming platform, Apache Kafka efficiently processes streams of data in real time. Data scientists use this tool to build real-time data pipelines and streaming apps because it empowers them to publish and subscribe to streams of records, store streams of records in a fault-tolerant way, and process streams of records as they occur.

Key Features:

  • Runs as a cluster on one or more servers
  • Cluster stores streams of records in categories called topics
  • Each record includes a key, value, and timestamp
  • Has four core APIs: Producer API, Consumer API, Streams API, and Connector API

Cost: FREE
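
To make the publish/subscribe model concrete, here is a minimal sketch using the third-party kafka-python client; the broker address, topic name, and payload are assumptions for illustration.

```python
# Minimal kafka-python sketch (assumed broker at localhost:9092).
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
# Each record is appended to a topic; Kafka assigns it an offset and timestamp.
producer.send("sensor-readings", key=b"device-42", value=b'{"temp": 21.5}')
producer.flush()

consumer = KafkaConsumer(
    "sensor-readings",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=10000,  # stop iterating if no records arrive
)
for record in consumer:
    print(record.key, record.value, record.timestamp)
    break  # read just one record for this example
```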

7. Apache Mahout
@ApacheMahout

An open source Apache Foundation project for machine learning, Apache Mahout aims to enable scalable machine learning and data mining. Specifically, the project’s goal is to “build an environment for quickly creating scalable performant machine learning applications.”

Key Features:

  • Simple, extensible programming environment and framework for building scalable algorithms
  • Includes a wide variety of pre-made algorithms for Scala + Apache Spark, H2O, and Apache Flink
  • Provides Samsara, a vector math experimentation environment with R-like syntax, which works at scale

Cost: FREE

8. Apache Mesos
@ApacheMesos

A cluster manager, Apache Mesos provides efficient resource isolation and sharing across distributed applications or frameworks. Mesos abstracts CPU, memory, storage, and other resources away from physical or virtual machines to enable fault-tolerant, elastic distributed systems to be built easily and run effectively.

Key Features:

  • Built using principles similar to those of the Linux kernel but at a different level of abstraction
  • Runs on every machine and provides applications such as Hadoop and Spark with APIs for resource management and scheduling across entire datacenter and cloud environments
  • Easily scales to 10,000s of nodes
  • Non-disruptive upgrades for high availability
  • Cross platform and cloud provider agnostic

Cost: FREE

9. Apache Pig

A platform designed for analyzing large datasets, Apache Pig consists of a high-level language for expressing data analysis programs that is coupled with infrastructure for evaluating such programs. Because Pig programs’ structures can handle significant parallelization, they can tackle large datasets.

Key Features:

  • Infrastructure consists of a compiler capable of producing sequences of Map-Reduce programs for which large-scale parallel implementations already exist
  • Language layer includes a textual language called Pig Latin
  • Key properties of Pig Latin include ease of programming, optimization opportunities, and extensibility

Cost: FREE

10. Apache Spark
@ApacheSpark

Apache Spark delivers “lightning-fast cluster computing.” A wide range of organizations use Spark to process large datasets, and this data scientist tool can access diverse data sources such as HDFS, Cassandra, HBase, and S3.

Key Features:

  • Advanced DAG execution engine to support acyclic data flow and in-memory computing
  • More than 80 high-level operators make it simple to build parallel apps
  • Use interactively from the Scala, Python, and R shells
  • Powers a stack of libraries including SQL, DataFrames, MLlib, GraphX, and Spark Streaming

Cost: FREE
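
As a rough illustration of those high-level operators, here is a minimal PySpark sketch; the local master and the CSV file name are assumptions.

```python
# Minimal PySpark sketch; the local[*] master and events.csv path are assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("example").getOrCreate()

# Read a CSV into a DataFrame, then aggregate with high-level operators.
df = spark.read.csv("events.csv", header=True, inferSchema=True)
counts = df.groupBy("event_type").count().orderBy("count", ascending=False)
counts.show()

spark.stop()
```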

11. Apache Storm
@ApacheStorm
@stormprocessor

Apache Storm is a tool for data scientists that handles distributed and fault-tolerant real-time computation. It also tackles stream processing, continuous computation, distributed RPC, and more.

Key Features:

  • Free and open source
  • Reliably processes unbounded streams of data in real time
  • Use with any programming language
  • Use cases include real-time analytics, online machine learning, continuous computation, distributed RPC, ETL, and more
  • More than one million tuples processed per second per node
  • Integrates with your existing queueing and database technologies

Cost: FREE

12. BigML
@bigmlcom

BigML makes machine learning simple. This company-wide platform runs in the cloud or on premises for operationalizing machine learning in organizations. BigML makes it simple to solve and automate classification, regression, cluster analysis, anomaly detection, association discovery, and topic modeling tasks.

Key Features:

  • Build sophisticated machine learning-based solutions affordably
  • Distill predictive patterns from data into practical, intelligent applications that anyone can use
  • The platform, private deployments, and rich toolset help users create, rapidly experiment, fully automate, and manage machine learning workflows to power intelligent applications

Cost: Contact for a quote

13. Bokeh
@BokehPlots

A Python interactive visualization library, Bokeh targets modern web browsers for presentation and helps users create interactive plots, dashboards, and data apps easily.

Key Features:

  • Provides elegant and concise construction of graphics similar to D3.js
  • Extends capabilities to high-performance interactivity over large or streaming datasets
  • Quickly and easily create interactive plots, dashboards, and data applications

Cost: FREE
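
Here is a minimal sketch of building an interactive plot with Bokeh and writing it to a standalone HTML file; the data and the output file name are made up for illustration.

```python
# Minimal Bokeh sketch; the data and output file name are made up.
from bokeh.plotting import figure, output_file, show

x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

output_file("lines.html")  # standalone HTML with interactive pan/zoom tools
p = figure(title="Simple line example", x_axis_label="x", y_axis_label="y")
p.line(x, y, line_width=2)
show(p)  # opens the plot in a browser
```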

14. Cascading
@cascading

Cascading is an application development platform for data scientists building Big Data applications on Apache Hadoop. Users can solve simple and complex data problems with Cascading because it boasts a computation engine, a systems integration framework, and data processing and scheduling capabilities.

Key Features:

  • Balances an ideal level of abstraction with appropriate degrees of freedom
  • Offers Hadoop development teams portability
  • Change a few lines of code to port Cascading to another supported compute fabric
  • Runs on and may be ported between MapReduce, Apache Tez, and Apache Flink

Cost: FREE

15. Clojure

A robust and fast programming language, Clojure is a practical tool that marries the interactive development of a scripting language with an efficient infrastructure for multithreaded programming. Clojure is unique in that it is a compiled language yet remains completely dynamic, with every feature supported at runtime.

Key Features:

  • Rich set of immutable, persistent data structures
  • Offers a software transactional memory system and reactive Agent system to ensure clean, correct, multithreaded designs when mutable state is necessary
  • Provides easy access to Java frameworks with optional type hints and type inference
  • Dynamic environment that users can interact with

Cost: FREE

16. D3.js
@mbostock

Committed to “code and data for humans,” Mike Bostock created D3.js. Data scientists use this tool, a JavaScript library for manipulating documents based on data, to add life to their data with SVG, Canvas, and HTML.

Key Features:

  • Emphasis on web standards to gain full capabilities of modern browsers without being tied to a proprietary framework
  • Combines powerful visualization components and a data-driven approach to Document Object Model (DOM) manipulation
  • Bind arbitrary data to a DOM and then apply data-driven transformations to the document

Cost: FREE

17. DataRobot
@DataRobot

An advanced machine learning automation platform, DataRobot helps data scientists build better predictive models faster. You can keep up with the ever-expanding ecosystem of machine learning algorithms easily when you use DataRobot.

Key Features:

  • Constantly expanding, vast set of diverse, best-in-class algorithms from leading sources
  • Train, test, and compare hundreds of varying models with one line of code or a single click
  • Automatically identifies top pre-processing and feature engineering for each modeling technique
  • Uses hundreds and even thousands of servers as well as multiple cores within each server to parallelize data exploration, model building, and hyper-parameter tuning
  • Easy model deployment

Cost: Contact for a quote

18. DataRPM
@DataRPM

DataRPM is the “industry’s first and only cognitive predictive maintenance platform for industrial IoT.” DataRPM is also the recipient of the 2017 Technology Leadership Award for Cognitive Predictive Maintenance in Automotive Manufacturing from Frost & Sullivan.

Key Features:

  • Uses patent-pending meta-learning technology, an integral component of Artificial Intelligence, to automate predictions of asset failures
  • Runs multiple live automated machine learning experiments on datasets
  • Extracts data from every experiment, trains models on the metadata repository, applies models to predict the best algorithms, and builds machine-generated, human-verified machine learning models for predictive maintenance
  • Workflow uses recipes such as feature engineering, segmentation, influencing factors, and prediction recipes to deliver prescriptive recommendations

Cost: Contact for a quote

19. Excel
@Office

Many data scientists view Excel as a secret weapon. It is a familiar tool that scientists can rely on to quickly sort, filter, and work with their data. It’s also on nearly every computer you come across, so data scientists can work from just about anywhere with Excel.

Key Features:

  • Named ranges for creating a makeshift database
  • Sorting and filtering with one click to quickly and easily explore your dataset
  • Use Advanced Filtering to filter your dataset based on criteria you specify in a different range
  • Use pivot tables to cross-tabulate data and calculate counts, sums, and other metrics
  • Visual Basic provides a variety of creative solutions

Cost: FREE trial available

  • Home Buying Options
    • Office 365 Home: $99.99/year
    • Office 365 Personal: $69.99/year
    • Office Home & Student 2016 for PC: $149.99 one-time purchase
  • Business Buying Options
    • Office 365 Business: $8.25/user/month with annual commitment
    • Office 365 Business Premium: $12.50/user/month with annual commitment
    • Office 365 Business Essentials: $5/user/month with annual commitment

20. Feature Labs

An end-to-end data science solution, Feature Labs develops and deploys intelligent products and services from your data, and its team works with your data scientists to help you build and ship those products, features, and services.

Key Features:

  • Integrates with your data to help scientists, developers, analysts, managers, and executives
  • Discover new insights and gain a better understanding of how your data forecasts the future of your business
  • On-boarding sessions tailored to your data and use cases to help you get off to an efficient start

Cost: Contact for a quote

21. ForecastThis
@forecastthis

ForecastThis is a tool for data scientists that automates predictive model selection. The company strives to make deep learning relevant for finance and economics by enabling investment managers, quantitative analysts, and data scientists to use their own data to generate robust forecasts and optimize complex future objectives.

Key Features:

  • Simple API and spreadsheet plugins
  • Uniquely robust global optimization algorithms
  • Scales to challenges of nearly any shape or size
  • Algorithms create plausible, interpretable models of market processes to lend credibility to any output and help you get inside the market more successfully

Cost: Contact for a quote

22. Fusion Tables
@GoogleFT

Google Fusion Tables is an experimental, cloud-based data management service that focuses on collaboration, ease of use, and visualization. This data visualization web application empowers data scientists to gather, visualize, and share data tables.

Key Features:

  • Visualize bigger table data online
  • Combine with other data on the web
  • Make a map in minutes
  • Search thousands of public Fusion Tables or millions of public tables from the web that you can import to Fusion Tables
  • Import your own data and visualize it instantly
  • Publish your visualization on other web properties

Cost: FREE

23. Gawk

GNU is an operating system that enables you to use a computer without software “that would trample your freedom.” The GNU Project created Gawk, an implementation of the awk utility that interprets a special-purpose programming language. Gawk empowers users to handle simple data-reformatting jobs with only a few lines of code.

Key Features:

  • Search files for lines or other text units containing one or more patterns
  • Data-driven rather than procedural
  • Makes it easy to read and write programs

Cost: FREE

24. ggplot2
@hadleywickham
@winston_chang

Hadley Wickham and Winston Chang developed ggplot2, a plotting system for R that is based on the grammar of graphics. With ggplot2, data scientists can avoid many of the hassles of plotting while maintaining the attractive parts of base and lattice graphics and producing complex multi-layered graphics easily.

Key Features:

  • Create new types of graphic tailored to your needs
  • Create graphics to help you understand your data
  • Produce elegant graphics for data analysis

Cost: FREE

25. GraphLab Create

Data scientists and developers use GraphLab Create to build state-of-the-art data products via machine learning. This machine learning modeling tool helps users build intelligent applications end-to-end in Python.

Key Features:

  • Simplifies development of machine learning models
  • Incorporates automatic feature engineering, model selection, and machine learning visualizations specific to the application
  • Identify and link records within or across data sources corresponding to the same real-world entities

Cost: 

  • FREE one-year renewable subscription for academic use

26. IPython
@IPythonDev

IPython (Interactive Python) is a growing project, with expanding language-agnostic components, that provides a rich architecture for interactive computing. An open source tool for data scientists, IPython supports Python 2.7 and 3.3 or newer.

Key Features:

  • A powerful interactive shell
  • A kernel for Jupyter
  • Support for interactive data visualization and use of GUI toolkits
  • Load flexible, embeddable interpreters into your own projects
  • Easy-to-use high performance parallel computing tools

Cost: FREE

27. Java
@SW_Java

Java is a language with a broad user base that serves as a tool for data scientists creating products and frameworks involving distributed systems, data analysis, and machine learning. Java now is recognized as being just as important to data science as R and Python because it is robust, convenient, and scalable for data science applications.

Key Features:

  • Easy to break down and understand
  • Helps users be explicit about types of variables and data
  • Well-developed suite of tools
  • Develop and deploy applications on desktops and servers in addition to embedded environments
  • Rich user interface, performance, versatility, portability, and security for modern applications

Cost: FREE trial available; Contact for commercial license cost

28. Jupyter
@ProjectJupyter

Jupyter provides multi-language interactive computing environments. Its Notebook, an open source web application, allows data scientists to create and share documents containing live code, equations, visualizations, and explanatory text.

Key Features:

  • Uses include data cleaning and transformation, numerical simulation, statistical modeling, machine learning, and more
  • Supports more than 40 programming languages including popular data science languages like Python, R, Julia, and Scala
  • Share notebooks with others via email, Dropbox, GitHub, and the Jupyter Notebook Viewer
  • Code can produce images, videos, LaTeX, and JavaScript
  • Use interactive widgets to manipulate and visualize data in realtime

Cost: FREE

29. KNIME Analytics Platform
@knime

Thanks to its open platform, KNIME is a tool for navigating complex data freely. The KNIME Analytics Platform is a leading open solution for data-driven innovation to help data scientists uncover data’s hidden potential, mine for insights, and predict futures.

Key Features:

  • Enterprise-grade, open source platform
  • Deploy quickly and scale easily
  • More than 1,000 modules
  • Hundreds of ready-to-run examples
  • Comprehensive range of integrated tools
  • The widest choice of advanced algorithms available

Cost: FREE

30. Logical Glue
@logicalglue

An award-winning white-box machine learning and artificial intelligence platform, Logical Glue increases productivity and profit for organizations. Data scientists choose this tool because it brings your insights to life for your audience.

Key Features:

  • Visual narratives that bring insights to life
  • Improve the communication and visualization of your insights more easily
  • Access new techniques with Fuzzy Logic and Artificial Neural Networks
  • Build the most accurate predictive models
  • Know exactly which data is predictive
  • Simple deployment and integration

Cost: Contact for a quote

31. MATLAB
@MATLAB

A high-level language and interactive environment for numerical computation, visualization, and programming, MATLAB is a powerful tool for data scientists. MATLAB serves as the language of technical computing and is useful for math, graphics, and programming.

Key Features:

  • Analyze data, develop algorithms, and create models
  • Designed to be intuitive
  • Combines a desktop environment for iterative analysis and design processes with a programming language capable of expressing matrix and array mathematics directly
  • Interactive apps to see how different algorithms work with your data
  • Automatically generate a MATLAB program to reproduce or automate your work after you’ve iterated and gotten the results you want
  • Scale analyses to run on clusters, GPUs, and clouds with simple code changes

Cost:

  • MATLAB Standard Individual: $2,150
  • MATLAB Academic Use, Individual: $500
  • Contact for other licensing options and pricing

32. Matplotlib
@matplotlib

Matplotlib is a Python 2D plotting library that produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms. Data scientists use this tool in Python scripts, the Python and IPython shell, the Jupyter Notebook, web application servers, and four graphical user interface toolkits.

Key Features:

  • Generate plots, histograms, power spectra, bar charts, error charts, scatterplots, and more with a few lines of code
  • Full control of line styles, font properties, axes properties, etc. with an object-oriented interface or via a set of functions similar to MATLAB
  • Several Matplotlib add-on toolkits are available

Cost: FREE
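
Here is a minimal sketch of the “few lines of code” workflow described above; the random sample data is generated only for illustration.

```python
# Minimal Matplotlib sketch; the random sample is generated only for illustration.
import numpy as np
import matplotlib.pyplot as plt

data = np.random.randn(1000)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(data, bins=30)                 # histogram of the sample
ax1.set_title("Histogram")
ax2.scatter(data[:-1], data[1:], s=5)   # scatterplot of consecutive values
ax2.set_title("Scatterplot")
plt.tight_layout()
plt.savefig("figure.png")               # or plt.show() in an interactive session
```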

33. MLBase
@amplab

UC Berkeley’s AMPLab integrates algorithms, machines, and people to make sense of Big Data. They also developed MLBase, an open source project that makes distributed machine learning easier for data scientists.

Key Features:

  • Consists of three components: MLlib, MLI, and ML Optimizer
  • MLlib is Apache Spark’s distributed ML library
  • MLI is an experimental API for feature extraction and algorithm development introducing high-level machine learning programming abstractions
  • ML Optimizer automates the task of machine learning pipeline construction and solves a search problem over feature extractors and ML algorithms
  • Implement and consume machine learning at scale more easily

Cost: FREE

34. MySQL
@MySQL

MySQL is one of today’s most popular open source databases, and it is a popular tool for data scientists who need to access data from a database. Although MySQL is most commonly found in web applications, it can be used in a variety of settings.

Key Features:

  • Open source relational database management system
  • Store and access your data in a structured way without hassles
  • Support data storage needs for production systems
  • Use with programming languages such as Java
  • Query data after designing the database

Cost: FREE
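
Here is a minimal sketch of querying MySQL from Python using the mysql-connector-python driver; the connection details, table, and columns are assumptions for illustration.

```python
# Minimal sketch with the mysql-connector-python driver; the connection
# details, table, and columns are assumptions for illustration.
import mysql.connector

conn = mysql.connector.connect(
    host="localhost", user="analyst", password="secret", database="sales"
)
cursor = conn.cursor()

# Parameterized query keeps the SQL safe from injection.
cursor.execute(
    "SELECT region, SUM(amount) FROM orders WHERE year = %s GROUP BY region",
    (2017,),
)
for region, total in cursor.fetchall():
    print(region, total)

cursor.close()
conn.close()
```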

35. Narrative Science
@narrativesci

Narrative Science helps enterprises maximize the impact of their data with automated, intelligent narratives generated by advanced natural language generation (NLG). Data scientists humanize data with Narrative Science’s technology, which interprets and then transforms data at unparalleled speed and scale.

Key Features:

  • Turn data into actionable, powerful assets for making better decisions
  • Help others in your organization understand and act on data
  • Integrates into existing business intelligence tools
  • Create a new reporting experience that drives better decisions more quickly

Cost: Contact for a quote

36. Natural Language Toolkit (NLTK)
@NLTK_org

A leading platform for building Python programs, Natural Language Toolkit (NLTK) is a tool for working with human language data. NLTK is a helpful tool for inexperienced data scientists and data science students working in computational linguistics using Python.

Key Features:

  • Provides easy-to-use interfaces to more than 50 corpora and lexical resources
  • Includes a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and more
  • Learn more from the active discussion forum

Cost: FREE
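
Here is a minimal sketch of tokenization and part-of-speech tagging with NLTK; the downloads are one-time steps and the sample sentence is made up.

```python
# Minimal NLTK sketch; the downloads are one-time steps, the sentence is made up.
import nltk

nltk.download("punkt")                       # tokenizer models
nltk.download("averaged_perceptron_tagger")  # part-of-speech tagger

text = "Data scientists use NLTK to work with human language data."
tokens = nltk.word_tokenize(text)  # tokenization
tags = nltk.pos_tag(tokens)        # part-of-speech tagging
print(tags[:5])
```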

37. NetworkX

NetworkX is a Python package tool for data scientists. Create, manipulate, and study the structure, dynamics, and functions of complex networks with NetworkX.

Key Features:

  • Data structures for graphs, digraphs, and multigraphs
  • Abundant standard graph algorithms
  • Network structure and analysis measures
  • Edges capable of holding arbitrary data
  • Generate classic graphs, random graphs, and synthetic networks

Cost: FREE
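
Here is a minimal sketch of building and analyzing a small graph with NetworkX; the toy nodes and edge weights are made up for illustration.

```python
# Minimal NetworkX sketch; the toy nodes and weights are made up.
import networkx as nx

G = nx.Graph()
G.add_edge("a", "b", weight=1.5)  # edges can hold arbitrary data
G.add_edge("b", "c", weight=0.5)
G.add_edge("a", "c", weight=3.0)

print(nx.shortest_path(G, "a", "c", weight="weight"))  # ['a', 'b', 'c']
print(dict(G.degree()))                                # node degrees
print(nx.density(G))                                   # a structure measure
```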

38. NumPy

The fundamental package for scientific computing with Python, NumPy is well suited to scientific workloads and also serves as an efficient multi-dimensional container of generic data.

Key Features:

  • Contains a powerful N-dimensional array object
  • Sophisticated broadcasting functions
  • Tools for integrating C/C++ and Fortran code
  • Define arbitrary data-types to seamlessly and speedily integrate with a wide variety of databases

Cost: FREE
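
Here is a minimal sketch of the N-dimensional array object and broadcasting; the small array is made up for illustration.

```python
# Minimal NumPy sketch of the N-dimensional array and broadcasting.
import numpy as np

a = np.arange(12).reshape(3, 4)  # 3x4 array holding 0..11
col_means = a.mean(axis=0)       # shape (4,)

# Broadcasting stretches the (4,) vector across all 3 rows automatically.
centered = a - col_means
print(centered.mean(axis=0))     # approximately [0. 0. 0. 0.]
```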

39. Octave
@GnuOctave

GNU Octave is a scientific programming language that is a useful tool for data scientists looking to solve systems of equations or visualize data with high-level plot commands. This tool’s syntax is compatible with MATLAB, and its interpreter can be run in GUI mode, as a console, or invoked as part of a shell script.

Key Features:

  • Powerful math-oriented syntax with built-in plotting and visualization tools
  • Runs on GNU/Linux, MacOS, BSD, and Windows
  • Drop-in compatible with many MATLAB scripts
  • Use linear algebra operations on vectors and matrices to solve systems of equations
  • Use high-level plot commands in 2D and 3D to visualize data

Cost: FREE

40. OpenRefine
@OpenRefine

OpenRefine is a powerful tool for data scientists who want to clean up, transform, and extend data with web services and then link it to databases. Formerly Google Refine, OpenRefine now is an open source project fully supported by volunteers.

Key Features:

  • Explore large datasets easily
  • Clean and transform data
  • Reconcile and match data
  • Link and extend datasets with a range of web services
  • You may upload cleaned data to a central database

Cost: FREE

41. pandas

pandas is an open source library that delivers high-performance, easy-to-use data structures and data analysis tools for the Python programming language. Data scientists use this tool when they need a Python data analysis library.

Key Features:

  • NumFOCUS-sponsored project that secures development of pandas as a world-class, open source project
  • Fast, flexible, and expressive data structures make working with relational and labeled data easy and intuitive
  • Powerful and flexible open source data analysis and manipulation tool for the Python language

Cost: FREE
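
Here is a minimal sketch of pandas’ labeled data structures in action; the toy DataFrame stands in for a real dataset.

```python
# Minimal pandas sketch; the toy DataFrame stands in for a real dataset.
import pandas as pd

df = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "sales":  [120, 95, 140, 80],
})

# Labeled, relational-style operations: filter, group, aggregate.
by_region = df[df["sales"] > 90].groupby("region")["sales"].sum()
print(by_region)
```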

42. RapidMiner
@RapidMiner

Data scientists are more productive when they use RapidMiner, a unified platform for data prep, machine learning, and model deployment. A tool for making data science fast and simple, RapidMiner is a leader in the 2017 Gartner Magic Quadrant for Data Science Platforms, a leader in the 2017 Forrester Wave for predictive analytics and machine learning, and a high performer in the G2 Crowd predictive analytics grid.

Key Features:

  • RapidMiner Studio is a visual workflow designer for data scientists
  • Share, reuse, and deploy predictive models from RapidMiner Studio with RapidMiner Server
  • Run data science workflows directly inside Hadoop with RapidMiner Radoop

Cost:

  • RapidMiner Studio
    • FREE – 10,000 rows of data and 1 logical processor
    • Small: $2,500/year – 100,000 rows of data and 2 logical processors
    • Medium: $5,000/year – 1,000,000 rows of data and 4 logical processors
    • Large: $10,000/year – Unlimited rows of data and unlimited logical processors
  • RapidMiner Server
    • FREE – 2 GB RAM, 1 logical processor, and 1,000 Web Service API calls
    • Small: $15,000/year – 16 GB RAM, 4 logical processors, and unlimited Web Service API calls
    • Medium: $30,000/year – 64 GB RAM, 8 logical processors, and unlimited Web Service API calls
    • Large: $60,000/year – Unlimited GB RAM, unlimited logical processors, and unlimited Web Service API calls
  • RapidMiner Radoop
    • FREE – Limited to a single user and community customer support
    • Enterprise: – $15,000/year – $5,000 for each additional user and enterprise customer support

43. Redis
@redisfeed

Redis is a data structure server that data scientists use as a database, cache, and message broker. This open source, in-memory data structure store supports strings, hashes, lists, and more.

Key Features:

  • Built-in replication, Lua scripting, LRU eviction, transactions, and different levels of on-disk persistence
  • High availability via Redis Sentinel and automatic partitioning with Redis cluster
  • Run atomic operations such as appending to a string, incrementing the value in a hash, pushing an element to a list, and more

Cost: FREE
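
Here is a minimal sketch of those atomic operations using the redis-py client; the localhost instance and key names are assumptions for illustration.

```python
# Minimal redis-py sketch; the localhost instance and key names are assumptions.
import redis

r = redis.Redis(host="localhost", port=6379, db=0)

r.set("greeting", "hello")
r.append("greeting", " world")      # atomic append to a string
r.hincrby("page:views", "home", 1)  # atomic increment of a hash field
r.rpush("jobs", "train-model")      # push an element onto a list

print(r.get("greeting"))            # b'hello world'
print(r.lrange("jobs", 0, -1))
```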

44. RStudio
@rstudio

RStudio is a tool for data scientists that is open source and enterprise-ready. This professional software for the R community makes R easier to use.

Key Features:

  • Includes a code editor, debugging, and visualization tools
  • Integrated development environment (IDE) for R
  • Includes a console, syntax-highlighting editor supporting direct code execution and tools for plotting, history, debugging, and workspace management
  • Available in open source and commercial editions and runs on the desktop or in a browser connected to RStudio Server or RStudio Server Pro

Cost:

  • Open Source Edition: FREE
  • Commercial License: $995/year

45. Scala
@scala_lang

The Scala programming language is a tool for data scientists looking to construct elegant class hierarchies to maximize code reuse and extensibility. The tool also empowers users to implement class hierarchies’ behavior using higher-order functions.

Key Features:

  • Modern multi-paradigm programming language designed to express common programming patterns concisely and elegantly
  • Smoothly integrates features of object-oriented and functional languages
  • Supports higher-order functions and allows functions to be nested
  • Notion of pattern matching extended to the processing of XML data with the help of right-ignoring sequence patterns using a general extension via extractor objects

Cost: FREE

46. scikit-learn
@scikit_learn

scikit-learn is an easy-to-use, general-purpose machine learning library for Python. Data scientists prefer scikit-learn because it features simple, efficient tools for data mining and data analysis.

Key Features:

  • Accessible to everyone and reusable in certain contexts
  • Built on NumPy, SciPy, and Matplotlib
  • Open source, commercially usable BSD license

Cost: FREE
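
Here is a minimal sketch of scikit-learn’s fit/predict/score estimator API on one of its bundled datasets; the choice of classifier is just one example.

```python
# Minimal scikit-learn sketch on a bundled dataset; the classifier choice is
# just one example of the fit/predict/score estimator API.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # mean accuracy on held-out data
```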

47. SciPy

SciPy, a Python-based ecosystem of open source software, is intended for math, science, and engineering applications. The SciPy Stack includes Python, NumPy, Matplotlib, IPython, the SciPy library, and more.

Key Features:

  • Scientific computing tools for Python including a collection of open source software and a specified set of core packages
  • A community of people who use and develop the SciPy Stack
  • SciPy library provides several numerical routines

Cost: FREE
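
Here is a minimal sketch of two of the SciPy library’s numerical routines, integration and optimization; the integrand and the objective function are made up for illustration.

```python
# Minimal SciPy sketch of two numerical routines; the integrand and the
# objective function are made up for illustration.
import numpy as np
from scipy import integrate, optimize

# Numerically integrate sin(x) from 0 to pi (exact answer: 2).
value, error = integrate.quad(np.sin, 0, np.pi)
print(value)

# Find the minimum of a simple quadratic starting from x = 3.
result = optimize.minimize(lambda x: (x[0] - 1.0) ** 2, x0=3.0)
print(result.x)  # approximately [1.]
```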

48. Shiny

A web application framework for R by RStudio, Shiny is a tool data scientists use to turn analyses into interactive web applications. Shiny is an ideal tool for data scientists who are inexperienced in web development.

Key Features:

  • No HTML, CSS, or JavaScript knowledge required
  • Easy-to-write apps
  • Combines R’s computational power with the modern web’s interactivity
  • Use your own servers or RStudio’s hosting service

Cost: Contact for a quote

49. TensorFlow
@tensorflow

TensorFlow is a fast, flexible, scalable open source machine learning library for research and production. Data scientists use TensorFlow for numerical computation using data flow graphs.

Key Features:

  • Flexible architecture for deploying computation to one or more CPUs or GPUs in a desktop, server, or mobile device with one API
  • Nodes in the graph represent mathematical operations, while graph edges represent the multidimensional data arrays communicated between them
  • Ideal for conducting machine learning and deep neural networks but applies to a wide variety of other domains

Cost: FREE
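
Here is a minimal sketch assuming a recent TensorFlow 2.x release with eager execution; the tiny tensors are made up for illustration.

```python
# Minimal sketch assuming a TensorFlow 2.x release with eager execution;
# the tiny tensors are made up for illustration.
import tensorflow as tf

# Nodes are mathematical operations; the multidimensional arrays (tensors)
# flowing between them are the graph edges.
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[1.0], [0.5]])
print(tf.matmul(a, b))

# Automatic differentiation, the building block for training neural networks.
x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x ** 2
print(tape.gradient(y, x))  # dy/dx = 2x = 6.0
```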

50. TIBCO Spotfire
@TIBCO

TIBCO drives digital business by enabling better decisions and faster, smarter actions. Their Spotfire solution is a tool for data scientists that addresses data discovery, data wrangling, predictive analytics, and more.

Key Features:

  • Smart, secure, governed, enterprise-class analytics platform with built-in data wrangling
  • Delivers AI-driven, visual, geo, and streaming analytics
  • Smart visual data discovery with shortened time-to-insight
  • Data preparation features empower you to shape, enrich, and transform data and create features and identify signals for dashboards and actions

Cost: FREE trial available

  • Spotfire Cloud: $200/month or $2,000/year; Custom pricing also available
  • Spotfire Platform: Contact for a quote
  • Spotfire Cloud Enterprise: Contact for a quote

51. BONUS: Pyxll.com
@pyxll

This blog features a comprehensive list of tools for working with Python and Excel. It covers writing Excel Add-Ins in Python, reading and writing Excel files, and interacting with Excel. It’s a great resource for understanding the differences between all the different Python/Excel tools out there, and all in one place.
