Definition: A statistical consulting service will advise your company on managing, organizing, and drawing conclusions from data. A consulting project can include setting up studies, managing databases, creating scripts in Python or R, and building statistical models.
As one of the largest survey tools on the web, SurveyKing has extensive experience in data science and managing big data. Our wide range of skills will ensure your project is successful from start to finish.
This resource page will give you an idea of the types of projects consultants handle, the tools they use, and the skills they should have. This information can help you choose the right statistical consulting service.
In the research field, all projects are unique. Here are some of the most common project types that we work on. These project types are not unique to us; generally, these would be the core services a consulting company would offer.
Data management can include setting up servers, creating a data pipeline, creating SQL databases, building websites to capture data, and even organizing data in Excel using VBA. It is a fundamental part of statistical consulting, because properly managed data is needed to build models.
MaxDiff and Conjoint analysis are two of the main tools used in market research. These survey types measure the most valued parts of a product or service. Both use logistic regression in the analysis, which is standard in our report builder; depending on the project, a custom model might need to be built.
Regression analysis estimates the relationships between a dependent variable and one or more independent variables. There are two main types of regression: linear regression and logistic regression.
Linear regression is used when the dependent variable is a continuous number, for example, testing whether gender affects vitamin A levels in the blood. A vitamin A level is a number that can theoretically take infinitely many values.
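As a minimal sketch of the idea, the Python snippet below fits a straight line by ordinary least squares using only the standard library. The data values, the 0/1 gender coding, and the function name fit_line are made up for illustration; a real project would normally use a statistics package and real measurements.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor: y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance and variance sums (the 1/n factors cancel in the ratio).
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: gender coded 0 = male, 1 = female; vitamin A level per person.
gender = [0, 0, 0, 0, 1, 1, 1, 1]
vitamin_a = [48.0, 52.0, 50.0, 54.0, 44.0, 46.0, 47.0, 43.0]

slope, intercept = fit_line(gender, vitamin_a)
print(f"intercept = {intercept:.1f}, slope = {slope:.1f}")
```

With a 0/1 predictor like this, the intercept is the average level for the group coded 0 and the slope is the difference between the two group averages.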
Logistic regression is used when the dependent variable is categorical. Categorical refers to named levels, such as like/dislike. For example, logistic regression could be used to predict whether store location affects whether customers like a product.
Regression analysis is used in financial forecasting and healthcare and is a fundamental concept in machine learning.
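The sketch below shows one simple way a logistic regression could be fit, using plain gradient ascent on the log-likelihood with only the Python standard library. The store and like/dislike data, the learning rate, and the function names are assumptions made for illustration; production work would typically rely on an established package instead.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, learning_rate=0.5, steps=5000):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by gradient ascent."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        grad0 = grad1 = 0.0
        for xi, yi in zip(x, y):
            error = yi - sigmoid(b0 + b1 * xi)  # observed minus predicted
            grad0 += error
            grad1 += error * xi
        b0 += learning_rate * grad0 / n
        b1 += learning_rate * grad1 / n
    return b0, b1


# Hypothetical data: store coded 0 = location A, 1 = location B; liked = 1 means "like".
store = [0] * 8 + [1] * 8
liked = [1, 1, 0, 0, 0, 0, 0, 0] + [1, 1, 1, 1, 1, 1, 0, 0]

b0, b1 = fit_logistic(store, liked)
p_a = sigmoid(b0)        # predicted probability of "like" at location A
p_b = sigmoid(b0 + b1)   # predicted probability of "like" at location B
print(f"P(like | A) = {p_a:.2f}, P(like | B) = {p_b:.2f}")
```

A positive b1 here means the model estimates that customers at location B are more likely to like the product than customers at location A.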
A Chi-square test checks whether there is a statistically significant association between two categorical (or qualitative) variables. The test compares the counts observed in a random sample with the counts that would be expected if the variables were unrelated, to judge the fit between expected and observed results.
Survey data often use Chi-square tests because the data is categorical, for example, testing whether the proportion of females who approve of a particular product differs from the proportion of males.
When you want to test whether the mean of a quantitative variable differs significantly between groups, you use a Student's t-test or an ANOVA test. The Student's t-test is used when you have two groups, while the ANOVA test is used when you have three or more groups.
When you record data at specific time intervals, the data is called a time series, and analyzing that data is referred to as time series analysis. One of the most powerful features of time series analysis is forecasting, for example, predicting which day of the month will have the highest sales volume.
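To make the approval example concrete, here is a small sketch that computes the Pearson Chi-square statistic for a table of counts. The counts are invented, the function name chi_square is hypothetical, and the p-value shortcut shown applies only to a 2x2 table (one degree of freedom) with no continuity correction.

```python
import math


def chi_square(table):
    """Pearson Chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical survey counts: rows = female, male; columns = approve, disapprove.
table = [[30, 20],
         [20, 30]]

stat = chi_square(table)
# For one degree of freedom only, the p-value can be written with erfc.
p_value = math.erfc(math.sqrt(stat / 2))
print(f"chi-square = {stat:.2f}, p = {p_value:.4f}")
```

A p-value below the chosen significance level (0.05 is common) would suggest that approval differs between the two groups.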
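The sketch below computes the two test statistics by hand with the Python standard library: a pooled-variance Student's t statistic for two groups and a one-way ANOVA F statistic for any number of groups. The group values and function names are made up for illustration, and turning a statistic into a p-value would additionally need the t or F distribution from a statistics package.

```python
import math
from statistics import mean, variance


def student_t(a, b):
    """Pooled-variance Student's t statistic for two independent groups."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))


def anova_f(*groups):
    """One-way ANOVA F statistic: between-group over within-group variance."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean([value for g in groups for value in g])
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups) / (k - 1)
    within = sum((len(g) - 1) * variance(g) for g in groups) / (n - k)
    return between / within


# Hypothetical satisfaction scores for three customer groups.
group_a = [5.1, 4.9, 5.6, 5.8, 5.0]
group_b = [6.2, 6.0, 5.9, 6.8, 6.4]
group_c = [5.5, 5.7, 5.2, 6.1, 5.9]

t_stat = student_t(group_a, group_b)
f_two = anova_f(group_a, group_b)
f_three = anova_f(group_a, group_b, group_c)
print(f"t = {t_stat:.3f}, F (two groups) = {f_two:.3f}, F (three groups) = {f_three:.3f}")
```

With exactly two groups, the F statistic equals the square of the pooled t statistic, which is why the t-test can be seen as the two-group case of ANOVA.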
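As a rough illustration of the sales example, the sketch below averages historical sales by day of the month and picks the day with the highest average. This is a naive seasonal-average baseline rather than a full forecasting model, and the sales history is generated artificially for the example.

```python
from collections import defaultdict

# Hypothetical history: (month, day_of_month) -> units sold.
# Day 15 is given an artificial bump so the pattern is easy to see.
history = {}
for month in (1, 2, 3):
    for day in range(1, 29):
        history[(month, day)] = 100 + month + (50 if day == 15 else 0)


def average_by_day(history):
    """Average sales for each day of the month across all recorded months."""
    sales_by_day = defaultdict(list)
    for (_, day), sales in history.items():
        sales_by_day[day].append(sales)
    return {day: sum(values) / len(values) for day, values in sales_by_day.items()}


profile = average_by_day(history)
peak_day = max(profile, key=profile.get)
print(f"Expected peak day of the month: {peak_day}")
```

A real engagement would usually go further and account for trend, weekday effects, and holidays, but a simple average like this is a useful starting point to compare against.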
Grouping similar categorical data can help identify hidden patterns. This process is called latent class analysis. A great example of this would be in a MaxDiff study. You might want to find the size of groups who prefer product A over B and groups who prefer product B over C.
A decision tree is like a flowchart that maps the possible outcomes of related choices. Decision trees are also referred to as probability trees. While decision trees can be drawn out by hand for use in presentations or to document algorithms, they are usually programmed into a system to generate an expected result from a given input.
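To show what "programmed into a system" can look like, here is a minimal sketch of a hand-built decision tree stored as nested dictionaries. The field names, thresholds, and labels are hypothetical; in practice the splits are usually learned from data by a modeling package rather than written by hand.

```python
# Hypothetical tree: each internal node compares one field against a threshold.
tree = {
    "field": "visits_per_month",
    "threshold": 4,
    "below": {"leaf": "low value"},
    "at_or_above": {
        "field": "avg_basket",
        "threshold": 50.0,
        "below": {"leaf": "medium value"},
        "at_or_above": {"leaf": "high value"},
    },
}


def predict(node, record):
    """Walk from the root to a leaf, following the branch that matches the record."""
    while "leaf" not in node:
        if record[node["field"]] >= node["threshold"]:
            node = node["at_or_above"]
        else:
            node = node["below"]
    return node["leaf"]


customer = {"visits_per_month": 6, "avg_basket": 72.5}
print(predict(tree, customer))
```

Given an input record, the function follows one path through the flowchart and returns the outcome at the end of that path.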
Predictive analytics refers to using data to predict outcomes. The building blocks of predictive analytics include regression, time series analysis, and decision trees.
Depending on the project type, various programs can be used to achieve the desired result. For example, Microsoft Excel is generally used for smaller projects and for organizing smaller data sets, and Tableau is used mainly for visualization. The programs below are some of the most common for large-scale consulting projects and creating models.
R is one of the most popular statistical software environments. It is free and open source, with many packages available for methods such as simple linear regression, Bayesian logistic regression, and latent class analysis.
Another popular statistical platform is SPSS. IBM makes this platform, which offers a great UI and many features. While this platform is not free, the UI makes it easy to upload data and create visual objects like decision trees.
SQL is the language used to access and manipulate databases. There are many versions of SQL, such as T-SQL from Microsoft, PL/SQL from Oracle, and the dialects used by open-source databases such as MySQL and MariaDB. SQL views can be used to create source data for statistical scripts.
Python is an easy-to-use yet powerful scripting language. The syntax makes it easy for new users to pick up. In addition, Python is open source and has many free statistical packages that you can import to build statistical models.
Of course, math, critical thinking, and communication are essential in statistical consulting. Because a consulting project might run from collecting data, through cleaning it, all the way to building the actual models, the following hard skills will ensure the consulting firm can deliver a successful project from start to finish.
Understanding how programs are created will allow a consultant to script a custom solution if needed. Some core programming concepts are if statements, loops, arrays, and functions. For example, consider a programmer who needs to write a custom Bayesian logistic regression model. The script would need to include a few loops and functions for sampling.
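The sketch below gives a rough idea of what such a script might look like: a Bayesian logistic regression fit with a random-walk Metropolis sampler, built from one loop and two functions. The sampler choice, the normal prior with standard deviation 5, the step size, and the data are all assumptions made for illustration, not a description of any particular firm's model.

```python
import math
import random


def log_posterior(b0, b1, x, y, prior_sd=5.0):
    """Log of (normal prior on b0, b1) times (logistic likelihood), up to a constant."""
    log_prior = -(b0 ** 2 + b1 ** 2) / (2 * prior_sd ** 2)
    log_lik = 0.0
    for xi, yi in zip(x, y):
        z = b0 + b1 * xi
        log_lik += yi * z - math.log1p(math.exp(z))
    return log_prior + log_lik


def metropolis(x, y, draws=5000, step=0.5, seed=0):
    """Random-walk Metropolis sampler for the two coefficients."""
    rng = random.Random(seed)
    b0, b1 = 0.0, 0.0
    current = log_posterior(b0, b1, x, y)
    samples = []
    for _ in range(draws):
        cand0 = b0 + rng.gauss(0, step)
        cand1 = b1 + rng.gauss(0, step)
        candidate = log_posterior(cand0, cand1, x, y)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            b0, b1, current = cand0, cand1, candidate
        samples.append((b0, b1))
    return samples


# Hypothetical data: x = store location (0 or 1), y = 1 if the customer liked the product.
x = [0] * 8 + [1] * 8
y = [1, 1, 0, 0, 0, 0, 0, 0] + [1, 1, 1, 1, 1, 1, 0, 0]

samples = metropolis(x, y)
kept = samples[1000:]  # discard the first draws as burn-in
slope_mean = sum(s[1] for s in kept) / len(kept)
print(f"Posterior mean of the location effect: {slope_mean:.2f}")
```

The if statement, the loop over draws, the list of samples, and the two functions map directly onto the core programming concepts listed above.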
Collecting inputs is essential for any data analysis. For example, if a client wants to complete a project, but that project requires data not currently captured by the client, then a custom website or app might need to be built as the first step. Of course, a statistical consultant doesn't need to be an expert in this field, but general knowledge will help the consultant make recommendations.
A big issue many companies face is organizing their data. Many companies store records in various Excel files, text files, or low-end tools like Microsoft Access. A consultant well-versed in SQL can help set up databases, recommend new data structures, or offer recommendations to optimize queries.
For a statistical consulting engagement to be successful, the approach to the problem must be well thought out. Here is the five-step process SurveyKing uses. The process should be similar at other consulting services.
The first step is to understand what outcomes you want from your project, what results you want, and what inputs will be provided.
Depending on the project, this step could also include a dive into your business model. Understanding your business model will help identify opportunities that might not be included in your initial project scope.
Understanding what data you want to use and how that will be collected is crucial for project success. For example, some projects may require a custom-built survey to capture market data, some might need financial data from different sources, and some might need patient data from an ERP.
When dealing with financial or patient data, the source data must be organized or condensed to be fed into a model. Consider an example of a healthcare firm wanting to run regression analysis on patient data to understand how gender and blood vitamin levels affect cholesterol. The model would need a summary of vitamin levels categorized by gender. A consultant would need to be able to pull multiple data sources together and streamline the inputs. Using a SQL view or stored procedure would make the process repeatable to update the model continually.
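As a sketch of how this could work, the example below uses Python's built-in sqlite3 module to join two sources and expose a summary through a SQL view. The table names, column names, and values are hypothetical, and SQLite is used only because it ships with Python; it supports views but not stored procedures, so a production database would be set up differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database for the example
conn.executescript("""
    -- Two hypothetical source tables that would normally live in separate systems.
    CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, gender TEXT);
    CREATE TABLE labs (patient_id INTEGER, vitamin_a REAL, cholesterol REAL);

    INSERT INTO patients VALUES (1, 'F'), (2, 'F'), (3, 'M'), (4, 'M');
    INSERT INTO labs VALUES
        (1, 40.0, 180.0), (2, 50.0, 200.0), (3, 55.0, 210.0), (4, 45.0, 190.0);

    -- The view joins the sources and summarizes them by gender.
    CREATE VIEW model_input AS
    SELECT p.gender,
           COUNT(*)           AS n_patients,
           AVG(l.vitamin_a)   AS avg_vitamin_a,
           AVG(l.cholesterol) AS avg_cholesterol
    FROM patients AS p
    JOIN labs AS l ON l.patient_id = p.patient_id
    GROUP BY p.gender;
""")

# The statistical script reads from the view, not from the raw tables.
rows = conn.execute("SELECT * FROM model_input ORDER BY gender").fetchall()
for row in rows:
    print(row)
```

Because the view is recalculated every time it is queried, new rows in the source tables flow into the model input without any change to the script.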
This step involves creating models, writing scripts, or creating anything necessary to get the desired outcomes. As the draft is created, client feedback is used to help fine-tune the model. For projects that require multiple steps, each step will have a draft.
The review process ensures that all project owners are happy with the outcome. Generally, a final review includes top-level management on a call or email to summarize the findings.
For custom-built models, staff should be trained to update and run the model as new data comes in. For larger projects that include a retainer, ongoing support will ensure models can be changed when assumptions change.
SurveyKing charges a flat rate of $50 per hour for consulting projects, with no minimum project value. This can include script writing, custom report creation, or app development. Most firms charge around $100 per hour with a minimum project total.