Strategy: Create new and different perspectives on your data in order to best expose the structure of the underlying problem to the learning algorithms.
Get More Data. Can you get more data, or better-quality data? Modern nonlinear machine learning techniques like deep learning tend to keep improving in performance as they are given more data.
Invent More Data. If you can’t get more data, can you generate new data? Perhaps you can augment or permute existing data or use a probabilistic model to generate new data.
Clean Your Data. Can you improve the signal in your data? Perhaps there are missing or corrupt observations, or outlier values outside of reasonable ranges, that can be fixed or removed to lift the quality of your data.
Resample Data. Can you resample data to change its size or distribution? Perhaps you can use a much smaller sample of data for your experiments to speed things up, or over-sample or under-sample observations of a specific type to better represent them in your dataset.
Reframe Your Problem. Can you change the type of prediction problem you are solving? Reframe your problem as regression, binary or multiclass classification, time series, anomaly detection, rating, recommendation, or another type of problem.
Rescale Your Data. Can you rescale numeric input variables? Normalization and standardization of input data can lift performance for algorithms that use weighted inputs or distance measures.
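As a rough illustration of the rescaling tactic, here is a minimal sketch of normalization (min-max rescaling to the range 0 to 1) and standardization (zero mean, unit standard deviation) using only the Python standard library. The function names normalize and standardize and the heights_cm sample values are hypothetical, chosen for this example; in practice you would more likely use the scalers provided by your machine learning library.

```python
from statistics import mean, pstdev


def normalize(values):
    """Min-max rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A constant column carries no scale information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def standardize(values):
    """Rescale a list of numbers to zero mean and unit standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical input variable, made up for illustration only.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]

normalized = normalize(heights_cm)
standardized = standardize(heights_cm)

print(normalized)
print([round(z, 3) for z in standardized])
```

When you evaluate on held-out data, the minimum, maximum, mean, and standard deviation would normally be computed on the training data only and then reused to rescale the test data.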