Processing a Trillion Rows Per Second on a Single Machine: How Can Nested Loop Joins be this Fast?

02/16/17

Debugging a failing test case caused by a query running “too fast”

Splunking Kafka with Kafka Connect

02/16/17

How to use Kafka Connect along with a Splunk Heavy Forwarder to stream data

Apache Kafka: The Cornerstone of an Internet-of-Things Data Platform

02/15/17

If you are a developer considering IoT as a career option, it is time for you to start investing in Apache Kafka

Crossing the Streams – Joins in Apache Kafka

02/15/17

Version 0.10.0 of the popular message broker Apache Kafka saw the introduction of Kafka Streams

Anonymizing Datasets at Scale Leveraging Databricks Interoperability

02/13/17

Data anonymization is often the first step performed when preparing data for analysis

An HDFS Tutorial for Data Analysts Stuck With Relational Databases

02/13/17

What are the benefits that HDFS has over relational databases?

Running Top-N Aggregation grouped by Dimension

02/12/17

How to implement a streaming analytics application using Kafka Streams

When Businesses Go Around IT for Analytics — Upside

02/13/17

Going behind IT's back is one of the best-known tropes in IT and business management

The AWS Deep Learning AMI, Now with Ubuntu

02/10/17

AWS Deep Learning AMI for Ubuntu is now available in the AWS Marketplace

Building a streaming analytics Java application against a Kafka Topic

02/11/17

In this article I will show you my first steps with Kafka Streams

Spark Summit East 2017: Another Record-Setting Spark Summit

02/09/17

We’ve put together a short recap of the keynotes and highlights from Databricks’ speakers for Apache Spark enthusiasts who could not attend the summit

Intel’s BigDL on Databricks

02/09/17

BigDL is an open source deep learning library from Intel

Microsoft adds patent suit protections for cloud customers

02/09/17

The patent protection can provide an edge for Microsoft as the company competes with Amazon and Google in the cloud

Monitoring Kafka Streams Metrics via JMX

02/08/17

I'll present a way to access the metrics using the command-line application jmxterm

Hadoop Fundamentals and Key Technologies in the Evolving Hadoop Ecosystem

02/03/17

Reliance on open standards such as NFS and POSIX is the best way to leverage data integration into a big data platform

How to Solve IoT’s Big Data Challenge with Machine Learning

02/02/17

Machine learning may also help us with a challenge from one of last year’s most buzzed-about technology developments: the Internet of Things.

Announcing the Spark Live 2017 World Tour

01/31/17

We will be hitting the road again in 2017 to continue our mission of bringing Apache Spark and Databricks to the masses

Kafka – Rewind Consumer Offsets

01/31/17

One of the most important features of Apache Kafka is how it manages multiple consumers.

Integrating Your Central Hive Metastore with Apache Spark on Databricks

01/30/17

The Databricks platform provides a fully managed Hive Metastore that allows users to share a data catalog across multiple Spark clusters

Introduction to Amazon Athena

01/23/17

Eschewing ETL seems to be a real theme these days

Keeping Data Scientists Happy: The Rise of the Cloud Data Lab

01/27/17

Within the datasets available to organizations lie answers to some of the most pertinent questions and ways to drive and validate important decisions.

How MTV And Nickelodeon Use Real-Time Big Data Analytics To Improve Customer Experience

01/26/17

Monitoring the digital networks that pump their content into millions of homes gives them access to a huge amount of data

Delivering Exceptional Care Through Data-Driven Medicine

01/25/17

An emerging theme among providers that are reporting early wins is the central role of big data technologies

See this simple introduction to Natural Language Processing (NLP)

01/21/17

Organizations are turning to natural language processing (NLP) technology to derive understanding from the myriad unstructured data available online and in call logs

Events

AWS re:Invent

Nov 28, 2022 Las Vegas, NV

Join us again this year in Las Vegas for our biggest, most comprehensive, and most vibrant event in cloud computing.

AI & Big Data Expo 2022

Oct 05, 2022 Santa Clara, CA

This technology event is for the ambitious enterprise technology professional, seeking to explore the latest innovations, implementations and strategies to drive businesses forward.

Current 2022: The Next Generation of Kafka Summit

Oct 04, 2022 Austin, TX

Join the first-ever data streaming industry event at Current 2022: The Next Generation of Kafka Summit. You’ll be able to immerse yourself in all things real-time data with peers.

O’Reilly Open Source Convention

May 16, 2016 Austin

OSCON covers FLOSS in its entirety. Not just one language, tool, or philosophy, but all the moving parts integrated and working together.

99U 2016

May 05, 2016 New York

The goal of the 99U Conference is to shift the focus from idea generation to idea execution. Providing road-tested insights

Consensus 2016: Making Blockchain Real

May 02, 2016 New York

Consensus 2016 will define what is “real” in blockchain technology and focus on how to mainstream real-world applications for consumers.