Data-driven Finance: Power Unleashed with Feature Factory



An online financial services client sought to strengthen its predictive modeling capabilities and streamline its business processes. Faced with a fragmented data environment, complex model applications, and a lack of common definitions, the client needed a comprehensive solution. The Data Science Feature Factory brought together data source discovery, data quality reports, standardized business definitions, and automated feature calculations. Through this approach, 5,198 features were developed, reducing model build time and enabling additional feature creation with minimal coding. Standardized inputs, time periods, and calculations eliminate confusion among data science teams, while a hardened process flow minimizes the risk of failure in production environments. The result is a more consistent and dependable approach to data science in financial services.


An online financial services client needed to develop predictive models more quickly in order to keep pace with its business objectives. The client's data environment was fragmented, with various business processes and models relying on different sources for the same information. The models served highly intricate applications, including real-time fraud detection, loss prevention, and identity theft prevention. However, developing new models and maintaining existing ones required substantial manual effort. In addition, data science, engineering, risk, and business stakeholders lacked common definitions of key measures and outcomes, which further complicated the situation.


To address these challenges, a comprehensive solution was implemented. First, internal tables and databases were discovered and evaluated to confirm the availability and reliability of the data. Data quality reports and summaries were generated to assess its accuracy and completeness. Next, functions were developed to produce first-order event logs based on standardized business definitions, facilitating efficient data processing. Function calls were then created to compute numerous metrics for each input, forming a finalized feature library. Finally, the code base was integrated into the production system to enable both batch and on-demand feature calculations, so that the solution fit smoothly into the client's operations.
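The flow from event logs to a feature library can be illustrated with a minimal Python sketch. It is not the client's code: the `Event` record, the window lengths, the aggregation set, and the `build_features` function are all hypothetical names and values chosen for illustration. The idea it shows is that each standardized input, combined with each time period and each calculation, yields one named feature, which is how a small set of definitions can expand into a large library.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass(frozen=True)
class Event:
    """One row of a first-order event log (hypothetical schema)."""
    customer_id: str
    event_type: str      # standardized business definition, e.g. "payment"
    amount: float
    timestamp: datetime


# Assumed standard time periods and calculations; the real ones are not given.
WINDOWS_DAYS = (7, 30, 90)
AGGREGATIONS = {
    "count": len,
    "sum": sum,
    "mean": lambda values: mean(values) if values else 0.0,
    "max": lambda values: max(values, default=0.0),
}


def build_features(events, customer_id, as_of):
    """Compute every (event type x window x aggregation) feature for one customer."""
    features = {}
    event_types = sorted({e.event_type for e in events})
    for event_type in event_types:
        for days in WINDOWS_DAYS:
            start = as_of - timedelta(days=days)
            # Keep only this customer's events inside the look-back window.
            values = [
                e.amount
                for e in events
                if e.customer_id == customer_id
                and e.event_type == event_type
                and start < e.timestamp <= as_of
            ]
            for name, fn in AGGREGATIONS.items():
                features[f"{event_type}_{name}_{days}d"] = float(fn(values))
    return features


# Example usage with made-up events.
as_of = datetime(2024, 1, 31)
events = [
    Event("c1", "payment", 100.0, datetime(2024, 1, 30)),
    Event("c1", "payment", 50.0, datetime(2024, 1, 10)),
    Event("c1", "payment", 25.0, datetime(2023, 12, 1)),
    Event("c2", "payment", 999.0, datetime(2024, 1, 29)),
]
features = build_features(events, "c1", as_of)
```

In this sketch, one event type with three windows and four aggregations produces twelve features. The same function can be called on demand for a single customer or in a loop for a batch run, which mirrors the two calculation modes described above.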


A total of 5,198 features were developed, with a significant impact on the application of new models. Because new models can draw on the existing feature set, their build time has been greatly reduced. Additional features can now be created and iterated on with minimal coding effort. Definitions of inputs, time periods, and calculations have been standardized to streamline the process and ensure consistency, reducing confusion among data science teams. In addition, a hardened process flow has significantly reduced the risk of failure in the production environment, improving overall efficiency and effectiveness.

Key Data Points

Features Developed