In recent years, the democratization of analytics, reporting and BI solutions has become a driving force behind increasingly complex data integration and data warehousing models. Add the growing volume and variety of information that comes with Big Data, and it's no surprise that the underlying ETL and data warehousing processes used to integrate and access data from multiple sources are becoming increasingly complex.
This trend was reflected among the attendees at this year's Informatica World. Relying on a disparate collection of command-line utilities such as PMCMD and custom scripts no longer cuts it now that the business is pushing IT to update ETL and data warehousing processes more seamlessly and efficiently. Automating these processes with an enterprise-grade strategy designed for change and scalability has become critical.
Informatica users are looking for an automation strategy that reduces the time required to build and manage these ETL and data warehousing processes dynamically, without relying on senior-level developers to write scripts. Take PMCMD, for example. It is a command-line utility specific to PowerCenter, and it was not designed to manage dependencies or pass data between Informatica PowerCenter and the many other process types that make up a data warehousing workflow, at least not without a great deal of manual intervention.
Attendees are looking for enterprise-grade solutions that provide more advanced automation capabilities and prebuilt integrations with other platform types. Data warehousing appliances such as Netezza and Teradata are now commonly used alongside PowerCenter. The prebuilt integrations in enterprise scheduling and automation solutions such as ActiveBatch allow ETL developers to streamline how data is passed between steps and to make workload execution more accurate and reliable, without hardcoding complex workflow logic, schedules and dependencies.
Dynamically managing ETL workload properties at runtime is also essential, both to meet the real-time processing demands the business now expects and to ensure accurate downstream data quality. Many attendees looked at ActiveBatch's Active Job Variables as a way to set a PowerCenter workflow parameter dynamically at runtime, either by creating a text file before the PowerCenter workflow executes or simply by querying a database table. While this could be accomplished with PMCMD, doing so means maintaining scripts in which variables and complex workflow logic are hardcoded. As one Informatica attendee put it, that means spending an extraordinary amount of time onboarding new developers to manage these processes and pulling senior-level resources away from more pressing initiatives.
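To make the "text file built from a database query" idea concrete, here is a minimal Python sketch of that pre-execution step. It reads parameter values from a database table and writes a PowerCenter-style parameter file that the workflow can pick up at runtime. The table `etl_params`, its columns, the folder `DW_FOLDER` and the workflow `wf_load_sales` are all hypothetical, and SQLite stands in for whatever database actually holds the values. This does not use or reproduce ActiveBatch's Active Job Variables; it only illustrates the underlying technique.

```python
import sqlite3
import tempfile
from pathlib import Path


def build_param_file(conn: sqlite3.Connection, folder: str, workflow: str, out_path: Path) -> Path:
    """Write a PowerCenter-style parameter file from rows in a hypothetical etl_params table.

    Each (workflow, name, value) row for the given workflow becomes a line
    of the form $$name=value under a workflow-scoped heading [folder.WF:workflow].
    """
    rows = conn.execute(
        "SELECT name, value FROM etl_params WHERE workflow = ? ORDER BY name",
        (workflow,),
    ).fetchall()
    lines = [f"[{folder}.WF:{workflow}]"]
    # "$$" is the prefix PowerCenter uses for user-defined mapping parameters and variables.
    lines += [f"$${name}={value}" for name, value in rows]
    out_path.write_text("\n".join(lines) + "\n")
    return out_path


if __name__ == "__main__":
    # Stand-in database; in practice this is the table your scheduler would query.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE etl_params (workflow TEXT, name TEXT, value TEXT)")
    conn.executemany(
        "INSERT INTO etl_params VALUES (?, ?, ?)",
        [
            ("wf_load_sales", "LoadDate", "2024-01-31"),
            ("wf_load_sales", "Region", "EMEA"),
        ],
    )
    out = Path(tempfile.gettempdir()) / "wf_load_sales.par"
    print(build_param_file(conn, "DW_FOLDER", "wf_load_sales", out).read_text())
```

The generated file would then be handed to the workflow at start time, for example through PMCMD's `startworkflow` command and its `-paramfile` option; check the PowerCenter Command Reference for the exact syntax in your version. The point of the sketch is that the values live in a table rather than being hardcoded in a script, so changing a run's parameters means updating data, not code.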
As ETL and data warehousing processes become increasingly complex, it will be interesting to see what role enterprise automation solutions play in turning these data pathways into automated, repeatable processes that deliver control and visibility over every step of the data warehousing process.