Transformer for Spark: Multitable Consumer

Newcomers to StreamSets might not know that we have four engines under the Control Hub hood: Data Collector, Transformer for Spark, Transformer for Snowflake, and Mainframe Collector. This fact... Read more »

Delta Lake Architecture: A Bridge Between Data Lakes & Data Warehouses

Data warehouses and data lakes are the most common central data repositories employed by data-driven organizations today, each with its own strengths and tradeoffs. For one, while data... Read more »

Using the StreamSets Python SDK To Create Reusable StreamSets Pipelines (S3 to Redshift Example)

Many StreamSets Data Collector customers are now migrating their Hadoop ingestion pipelines to cloud platforms like AWS, and they want to take full advantage of the AWS-native services... Read more »

Extend StreamSets Integration With Source Systems Using Groovy

StreamSets Data Collector (SDC) supports 69 sources, including relational and NoSQL databases, on-prem and cloud file systems, and a handful of messaging applications (documentation). Yet, occasionally, customers ask if... Read more »

3 Ways To Keep Up With Constant Change

The business climate today feels a bit like a battleground, and everyone’s feeling the pressure. A recession looms, competition is fierce, ongoing supply chain issues wreak havoc, and customer... Read more »

4 Ways Data Federation Tools Will Let You Down

Data federation tools are often touted for their ability to unify and query data in a variety of sources and formats using virtualization. The technology, the theory goes, provides... Read more »

Send Kafka Messages To Amazon S3

In this post, we will take a look at best practices for integrating StreamSets Data Collector Engine (SDC), a fast data ingestion engine, with Kafka. Then, we’ll dive deep into... Read more »

Take Control of the Data “Wild West” — And Empower Your LOB To Boot

Raise your hand if you’ve ever had a line of business user go rogue and create a dataset without telling you. Don’t be shy… You’re in good company. In... Read more »

Cloud Data Migration – Knowing When, Why, and How To Move Your Data

Cloud data migration is on the rise, with cloud adoption expected to nearly double in the next five years. It’s no surprise that cloud data migration is increasing, as... Read more »

Reverse ETL to Marketo: A Real-Life Example

Standing for Extract, Transform, and Load, the acronym ETL describes the process of extracting data from a source, transforming it, and sending it on to load into a destination.... Read more »