WELCOME, GUEST
Minimize
Blogger List

Steven Feuerstein Indicates Oracle ACE director status
PL/SQL Obsession

Guy Harrison Indicates Oracle ACE status
Database topics

Bert Scalzo Indicates Oracle ACE status
Toad for Oracle, Data Modeling, Benchmarking
Dan Hotka Indicates Oracle ACE director status
SQL Tuning & PL/SQL Tips

Valentin Baev
It's all about Toad

Ben Boise
Toad SC Discussions

Dan Clamage
SQL and PL/SQL

Kevin Dalton
Benchmark Factory

Peter Evans 
Business Intelligence, Data Integration, Cloud and Big Data

Vaclav Frolik  
Toad Data Modeler, Toad Extension for Eclipse

Devin Gallagher
Toad SC discussions

Anju Gandhi
Toad for Oracle

Stuart Hodgins
JProbe Discussions

Julie Hyman
Toad for Data Analysts

  Henrik "Mauritz" Johnson
Toad Tips & Tricks on the "other" Toads
  Mark Kurtz
Toad SC discussions
Daniel Norwood
Tips & Tricks on Toad Solutions
Amit Parikh
Toad for Oracle, Benchmark Factory,Quest Backup Reporter
Debbie Peabody
Toad Data Point
Gary Piper
Toad Reports Manager
John Pocknell
Toad Solutions
Jeff Podlasek
Toad for DB2
Kuljit Sangha
Toad SC discussions
Michael Sass 
Toad for DB2
Brad Wulf
Toad SC discussions
Richard To
SQL Optimization
  Toad Data Modeler Opens in a new window
Data Modeling
 
  Toad Higher Education
How Hi-Ed Uses Toad
  Real Automated Code Testing for Oracle
Quest Code Tester blog
  中文技术资料库
技术文章
 

Blogs

Toad World blogs are a mix of insightful how-tos from Quest experts as well as their commentary on experiences with new database technologies.  Have some views of your own to share?  Post your comments!  Note:  Comments are restricted to registered Toad World users.

Do you have a topic that you'd like discussed?  We'd love to hear from you.  Send us your idea for a blog topic.


Dec 6

Written by: Guy Harrison
12/6/2011 1:01 PM  RssIcon

Since my first post on connecting to Amazon Elastic Map Reduce with TOAD, we’ve added quite a few features to our Hadoop support in general and our EMR support specifically, so I thought I’d summarize those features in this blog post

Amazon Elastic Map Reduce is a cloud-based version of Hadoop hosted on Amazon Elastic Compute Cloud (EC2) instance.  Using EMR, you can quickly establish a cloud based Hadoop cluster to perform map reduce work flows. 

EMR support Hive of course, and Toad for Cloud Databases (TCD)  includes Hive support, so let’s look at using that to query EMR data.

Using the Toad direct Hive client

TCD direct Hive connection support is the quickest way to establish a connection to Hive.  It uses a bundled JDBC driver to establish the connection.

Below we create a new connection to a Hive server running on EMR:

image

  1. Right click on Hive connections and choose “Connect to Hive” to create a new Hive connection.
  2. The host address is the “Master” EC2 instance for your EMR cluster.  You’ll find that on the EMR Job flow management page within your Amazon AWS console.  The Hive 0.5 server is running on port 10000 by default.
  3. Specifying a job tracker port allows us to track the execution of our Hive jobs in EMR.  The standard Hadoop jobtracker port is 50030, but in EMR it’s 9600.
  4. It’s possible to open up port 10000 so you can directly connect with Hive clients, but it’s a bad idea usually.  Hive has negligible built-in security, so you’d be exposing your Hive data.   For that reason we support a SSH mode in which you can tunnel through to your hadoop server using the keypair file that you used to start the EMR job flow.  The key name is also shown in the EMR console page, though obviously you’ll need to have an actual keypair file.

The direct Hive client allows you to execute any legal Hive QL commands.  In the example below, we create a new Hive table based on data held in an S3 bucket (The data is some UN data on homicide rates I uploaded).

SNAGHTML9c66e8d

Connecting Hive to the Toad data hub

It’s great to be able to use Hive to exploit Map Reduce using familiar (to me) SQL-like syntax.  But the real advantage of TCD for Hive is that we link to data that might be held in other sources – like Oracle, Cassandra, SQL Server, MongoDB, etc.

Setting up a hub connection to EMR hive is very similar to setting up a direct connection.  Of course you need a data hub installed (see here for instructions), then right click on the hub node and select “map data source”:

Now that the hub knows about the EMR hive connection, we can issue queries that access Hive and – in the same SQL – other datasources. For instance, here’s a query that joins homicide data in Hive Elastic Map Reduce with population data stored in a Oracle database (running as Amazonn RDS:  Relational Database Service).  We can do these cross platform joins across a lot of different types of database sources, including any ODBC compliant databases, any Apache Hbase or Hive connections, Cassandra, MongoDB, SimpleDB, Azure table services:

In the version that we are just about to release, queries can be saved as views or snapshots, allowing easier access from external tools of for users who aren’t familiar with SQL.   In the example above, I’m saving my query as a view.

Using other hub-enabled clients

TCD isn’t the only product that can issue hub queries.  In beta today, the Quest Business Intelligence Studio can attach to the data hub, and allows you to graphically explore you data using drag and drop, click and drilldown paradigms:

It’s great to be living in Australia – one of the lowest homicide rates!

If you’re a hard core data scientist, you can even attach R through to the hub via the RODBC interface.  So for instance, in the screen shot below, I’m using R to investigate the correlation  between population density and homicide rate.  The data comes from Hive (EMR) and Oracle (RDS),  is joined in the hub, saved as a snapshot and then feed into R for analysis.  Pretty cool for a crusty old stats guy like me (My very first computer program was written in 1979 on SPSS).

image

This blog has been syndicated from Guys' blog http://guyharrison.squarespace.com/blog/2011/12/5/amazon-elastic-map-reduce-emr-hive-and-toad.html

Search Blog Entries
 
Blog Archives
 
Archive
<May 2013>
SunMonTueWedThuFriSat
2829301234
567891011
12131415161718
19202122232425
2627282930311
2345678
Monthly
May, 2013 (15)
April, 2013 (13)
March, 2013 (10)
February, 2013 (5)
January, 2013 (7)
December, 2012 (6)
November, 2012 (10)
October, 2012 (8)
September, 2012 (6)
August, 2012 (8)
July, 2012 (8)
June, 2012 (12)
May, 2012 (21)
April, 2012 (10)
March, 2012 (16)
February, 2012 (19)
January, 2012 (20)
December, 2011 (19)
November, 2011 (14)
October, 2011 (12)
September, 2011 (17)
August, 2011 (15)
July, 2011 (16)
June, 2011 (13)
May, 2011 (15)
April, 2011 (8)
March, 2011 (21)
February, 2011 (17)
January, 2011 (16)
December, 2010 (13)
November, 2010 (13)
October, 2010 (7)
September, 2010 (15)
August, 2010 (11)
July, 2010 (13)
June, 2010 (12)
May, 2010 (14)
April, 2010 (12)
March, 2010 (13)
February, 2010 (12)
January, 2010 (7)
December, 2009 (10)
November, 2009 (12)
October, 2009 (15)
September, 2009 (18)
August, 2009 (13)
July, 2009 (23)
June, 2009 (14)
May, 2009 (17)
April, 2009 (7)
March, 2009 (14)
February, 2009 (7)
January, 2009 (12)
December, 2008 (7)
November, 2008 (11)
October, 2008 (19)
September, 2008 (14)
August, 2008 (11)
July, 2008 (14)
June, 2008 (19)
May, 2008 (12)
April, 2008 (18)
March, 2008 (13)
February, 2008 (8)
January, 2008 (7)
December, 2007 (5)
November, 2007 (8)
October, 2007 (13)
September, 2007 (13)
August, 2007 (16)
July, 2007 (11)
June, 2007 (6)
May, 2007 (5)
April, 2007 (5)
March, 2007 (8)
February, 2007 (6)
January, 2007 (6)
December, 2006 (5)
November, 2006 (8)
October, 2006 (4)
August, 2006 (3)