The rise of Small Open Source in-house Analytics systems


Context, Trends, Movements

The Analytics space is ever-changing. Any organisation that wants a data-driven decision-making process needs a fast pace and a mindset focused on building pilots, testing new features and analysing compatibility with its present infrastructure. The R&D initiatives that more and more organisations are launching are prone to failure, usually because of a lack of patience and a desire to build production-ready units within a strict time frame. Such initiatives are desired by executives and embraced by developers, yet most results lack real-world applicability or simply do not add value to current processes or products. While the cause can sometimes be traced back through the decision-making process, it is often an accumulation of factors such as infrastructure incompatibility, lack of iterative development, a focus on the final result only, no real-world testing and so on.


Small platforms are making their way into the management mainstream following the adoption of a more agile way of working. The change has implications that go beyond the infrastructure or tools used: it primarily affects development cycles and product testing. Mammoth analytics infrastructures are slow and heavy, and usually require additional know-how to operate and configure, which raises valid management concerns about profitability and organisational adoption. Building, testing and deploying a new Machine Learning product should not be seen by the executive level as a milestone or a great accomplishment, but rather as a new tool or asset that helps the organisation reach its desired KPIs. This change in mentality has many bridges to cross before it is successfully implemented. Bigger is not necessarily better when testing and developing new Analytics products, but we also acknowledge that too small a platform can greatly limit the pool of models that can be used and usually demands intensive memory optimisation.

What’s the concept?

The go-to tools and solutions for Analytics developers are usually the Open Source ones, some with overwhelming adoption by the community (e.g. Jupyter Notebooks). Building and testing Machine Learning solutions does not require heavy tooling, just an IDE and a programming language with ML libraries available. Organisations that have recently invested in an analytics team have a high chance of using the same solutions that a student uses for homework: simple IDEs, maybe a model repository (usually MLflow) or just pickle (used to serialise objects, e.g. to save models to a file), and a database connection which in some instances is successfully represented by an exported CSV file.
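To make the "pickle plus exported CSV" setup concrete, here is a minimal sketch using only the Python standard library. The CSV content, the column names and the file name are made up for illustration, and a fitted statistics.NormalDist stands in for a real trained model.

```python
import csv
import io
import pickle
import tempfile
from pathlib import Path
from statistics import NormalDist

# Assumption: an exported CSV standing in for the database connection.
# The columns and values are invented for this example.
EXPORTED_CSV = "month,sales\n2023-01,100\n2023-02,120\n2023-03,140\n"

rows = list(csv.DictReader(io.StringIO(EXPORTED_CSV)))
sales = [float(row["sales"]) for row in rows]

# "Train" a very small model: a normal distribution fitted to the sales figures.
model = NormalDist.from_samples(sales)

# Serialise the fitted model to a file (hypothetical file name).
model_path = Path(tempfile.mkdtemp()) / "sales_model.pkl"
with model_path.open("wb") as fh:
    pickle.dump(model, fh)

# A later run (or a colleague) loads the model back from the file.
with model_path.open("rb") as fh:
    restored = pickle.load(fh)

print(restored.mean, restored.stdev)
```

This works well for one model on one laptop. Note that pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.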

Management usually shows a certain amount of restraint in updating or building an analytics infrastructure before any results, profits or valuable insights have been delivered, which makes sense from our point of view. You do not need state-of-the-art capabilities to retrieve some insights or to provide a different view for the business in order to optimise existing processes or create new ones. We consider that the problem arises when scaling the solutions, as there is quite a difference between 1 model and 100 models developed. Of course, you can probably manage them manually too, but the costs are high and human resources are scarce, as developers are not keen on manual runs or file-based model management.

What does it take?

Building an infrastructure from scratch should not be a tedious task, especially since the real problem arises when integrating with in-place systems. Even the best on-demand, scalable auto-ML infrastructure will be more of a burden than an asset if it comes with zero to minimal integration. No matter the budget or the capabilities, if you need to manually import and export a CSV file in order to process it and then upload your results to a SharePoint, there is no point in discussing scalability or real-world impact beyond some isolated use cases.

An in-house Analytics Platform should focus on a few standard aspects and several others that differ from one organisation to another. You need a development environment, a repository for your code and one for your ML developments, an orchestrator/scheduler and a tool for EDA (Exploratory Data Analysis), all combined with full integration between the platform and the desired input/output systems. From experience, I would recommend putting two items on the backlog for future development: an explanatory module for your projects and an auto-ML framework, which the team can easily integrate through Python packages (e.g. PyCaret). Since most solutions (if not all) can be found as open source containers, the team has extensive flexibility to build and test suitable solutions for the organisation, or even to customise them with in-house plugins and extensions.
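As an illustration of what the orchestrator/scheduler piece does at its core, the sketch below runs a handful of tasks in dependency order using the standard library's graphlib module. The task names, their bodies and the shared context dictionary are hypothetical; a real platform would more likely rely on a dedicated tool such as Airflow, which adds scheduling, retries and monitoring on top of the same idea.

```python
from graphlib import TopologicalSorter


# Hypothetical tasks: each one reads from and writes to a shared context dict.
def extract(ctx):
    ctx["raw"] = [3, 1, 2]  # stand-in for data pulled from an input system


def explore(ctx):
    ctx["summary"] = {"rows": len(ctx["raw"]), "max": max(ctx["raw"])}


def train(ctx):
    ctx["model"] = sum(ctx["raw"]) / len(ctx["raw"])  # a "model" that is just the mean


def publish(ctx):
    ctx["published"] = {"model": ctx["model"], "summary": ctx["summary"]}


TASKS = {"extract": extract, "explore": explore, "train": train, "publish": publish}

# Each task maps to the set of tasks that must finish before it starts.
DEPENDS_ON = {
    "extract": set(),
    "explore": {"extract"},
    "train": {"extract"},
    "publish": {"explore", "train"},
}


def run_pipeline(tasks, depends_on):
    """Run every task once, in an order that respects the dependencies."""
    ctx = {}
    order = list(TopologicalSorter(depends_on).static_order())
    for name in order:
        tasks[name](ctx)
    return order, ctx


order, context = run_pipeline(TASKS, DEPENDS_ON)
print(order)
print(context["published"])
```

The dependency map is the part worth keeping when moving to a real orchestrator: it documents which step feeds which, independently of the tool that executes it.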

One could argue that adopting open source systems in a closed proprietary environment can have various consequences, especially in terms of compatibility and lack of third-party support, but these are easily avoided because the platform does not need extensive integration, only open communication. Usually, the exchange will be done through APIs and will not affect in any way how in-place systems behave. This is a strong asset to have: a flexible jack of all trades that can produce valuable insights for the organisation in a rather short time frame.
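A minimal sketch of such an API exchange, again with the standard library only: the platform exposes a small JSON endpoint, and the in-place system simply posts data and reads the answer. The payload shape ({"values": [...]}), the handler names and the port are assumptions made for this example, not a prescribed contract.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_score_request(body: bytes) -> tuple[int, bytes]:
    """Turn a JSON request body into an HTTP status code and a JSON response body."""
    try:
        payload = json.loads(body)
        values = [float(v) for v in payload["values"]]
        if not values:
            raise ValueError("empty input")
    except (ValueError, KeyError, TypeError):
        error = {"error": "expected a JSON object with a non-empty 'values' list"}
        return 400, json.dumps(error).encode()
    result = {"count": len(values), "mean": sum(values) / len(values)}
    return 200, json.dumps(result).encode()


class ScoreHandler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper: all the logic lives in handle_score_request."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, response = handle_score_request(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(response)))
        self.end_headers()
        self.wfile.write(response)


def serve(port: int = 8000) -> None:
    """Start the endpoint locally; deliberately not called in this sketch."""
    HTTPServer(("127.0.0.1", port), ScoreHandler).serve_forever()


# The same function can be exercised without any network involved.
status, response = handle_score_request(b'{"values": [10, 20, 30]}')
print(status, response.decode())
```

Keeping the request handling in a plain function separates the contract from the transport, so the in-place systems only ever see JSON in and JSON out.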

First steps in Open Source Technologies

The world of open source technologies is vast and can be overwhelming when browsed without guidance. I recommend searching for the most widely used solutions, those with an extensive community and regular updates. Browsing the top projects of the Apache community can also reveal some interesting tools (see Superset and Airflow; as a fun fact, both came from Airbnb™ and from the same person, Maxime Beauchemin). Whatever tools and solutions you choose for your platform, keep in mind that the goal is to provide new and exciting insights for the organisation, along with new capabilities and know-how for your team, department and business.

A step into the future

A quote that has stuck with me since University goes a bit like this: “What you will be working on 10 years from now has not been invented yet.” The sentence is probably not 100% bulletproof, but it strongly reflects the Data scene that we are experiencing first-hand. The Data Management ecosystem will slowly but surely change its organisational structure by absorbing various specific roles into a much broader, general role of “data person”. Technical analysts, developers, data engineers and so on: all these roles that now serve a specific purpose will most likely morph into a generic jack-of-all-trades one. Data Science and Data Analytics will be seen as just as indispensable as SQL and Data Warehouses. Clusters, segments, ad-hoc data-driven analytics, forecasts: all these methods will become the norm, just as querying the database is today. We will look back and ask ourselves why we left important strategic decisions to business experts' judgement rather than to automated data-driven processes. Organisations will need to be swift and adapt to the new landscape, or suffer the same fate that heavy silos are suffering today: adopting already burned-out technologies as “state of the art”, mainly heavy and slow Data Lakes with a 2012 technology stack.
