Showing posts with label programming. Show all posts

Sunday, December 1, 2013

Byzantine Bettors

Last year I had an idea to work my way through Dennis Shasha's book: Puzzles for Programmers and Pros.  Well, it's been a busy year and I didn't quite get around to it.  But, since we had a holiday this week, I had some time to play around and managed to finish another one.  As promised, I'll write about it here.

In this problem, you have some advisors, some who always tell the truth and some who may or may not lie.  The idea is you can make a bet, from $0 to $100, on whether a number on a piece of paper (face down) is 0 or 1.  The goal is to figure out how much you are guaranteed to win.

In the first case, you have two advisors out of four who always tell the truth.  You start with $100.  I figured out that the amount you are guaranteed to win is $400.  Here is how I worked it out:


The second part is a bit trickier: how much are you guaranteed to win if you start with $100 but there are only three advisors, and only one of them always tells the truth?  The answer I got is $266.69, which I found by working out the different cases you would encounter and working through the possibilities.  I used a flow chart to do that:


I'm enjoying the book and hopefully will get around to posting a few more problems soon!

Saturday, April 20, 2013

Issues with Panorama TitleTemplate in Windows Phone 8 Apps

Recently I was playing with a prototype Windows Phone 8 app and wanted to set a custom title and subtitle for my Panorama control. The code looked like this:

XAML:
<phone:Panorama Name="Pano">
  <phone:Panorama.TitleTemplate>
    <DataTemplate>
      <StackPanel Margin="14,50,0,0">
        <TextBlock Name="Number" Text="{Binding}" Style="{StaticResource PhoneTextNormalStyle}" Margin="0,0,0,0" >
        </TextBlock>
        <TextBlock Name="Title" Text="{Binding}" Style="{StaticResource PhoneTextSubtleStyle}" Margin="0,0,0,0" />
      </StackPanel>
    </DataTemplate>
  </phone:Panorama.TitleTemplate>
</phone:Panorama>
Unfortunately there doesn't seem to be an easy way to set the binding for the Number and Title fields using the template.  I found an easy solution by nixing the TitleTemplate and just setting the title directly as follows:

New XAML:
<phone:Panorama Name="Pano">
  <phone:Panorama.Title>
      <StackPanel Margin="14,50,0,0">
        <TextBlock Name="Number" Text="{Binding}" Style="{StaticResource PhoneTextNormalStyle}" Margin="0,0,0,0" >
        </TextBlock>
        <TextBlock Name="Title" Text="{Binding}" Style="{StaticResource PhoneTextSubtleStyle}" Margin="0,0,0,0" />
      </StackPanel>
  </phone:Panorama.Title>
</phone:Panorama>
Now I can set the text of both fields directly in the code-behind:

            this.Number.Text = numberText;
            this.Title.Text = titleText;

Where numberText and titleText are public static strings defined in my code behind.  The resulting appearance is exactly the same as if I'd used the TitleTemplate.

I couldn't find any useful articles online so thought I'd share it here in case anyone else is spending a lot of time trying to figure this out!  Hope this helps you out!

Thursday, May 17, 2012

Announcing the Launch of "Where's My Game?"!

On Tuesday this week my first Windows Phone app, "Where's My Game?", was finally published.  If you play frisbee in the Greater Seattle region, this app is for you!  Here are a few things it will help you to do:
  • for a given DiscNW league and team, see when and where your next game is (handy when on the go, since the desktop websites that host the schedules are hard to use from a smartphone)
  • map your game and get directions from your current location (helpful if you're lost)
  • see the DiscNW Twitter feed for any last minute cancellations/field changes
I've already submitted my first update which will add this summer's Microsoft Ultimate Hat League schedule, so if you work at Microsoft and play frisbee you can see your games and locations too.  That should be available later this week or early next (I'll make another announcement when it is).

In the coming weeks I'll add:
  • the ability to save your favorite teams/leagues
  • more dynamically loaded team schedules (so cancellations/rescheduled games appear faster)
Long term plans include creating a web service so groups can submit their own schedules.  That's a ways off, however.  For now, I hope you enjoy the app and find it useful!

Tuesday, May 1, 2012

Developing for Windows Phone 7

A quick update on my latest project: developing an app for Windows Phone.  One of my other pastimes here in Seattle is playing Ultimate Frisbee (pretty frequently, too; I might play as much as four times a week in the summer).  However, if I'm on the go and need to figure out when and where my next game is, the existing websites containing our schedules do not work well on mobile devices.  The text is small, I usually need to log in and navigate to several different pages before finding it, and forget about an easy way to see the map.  So, I decided to make an app for my phone that will do all this for me.  Here are some screen shots:


I've just submitted my app (last night, 10:30pm!) and am waiting for it to be approved for the Marketplace.  I'm planning to write a little more here about the different challenges I encountered and how I worked around them, so stay tuned!

Friday, December 23, 2011

To-do's for 2012

This great article, "11 Things every Software Developer should be doing in 2012" has some excellent advice for developers to follow.  When we transition from student life to the world of full-time work, it's easy to focus only on the skills needed for day-to-day work and let the rest become rusty.  It's so important to not let that happen through continued practice and study outside of work, something that's definitely on my list of New Year's resolutions.

What's on your to-do list for 2012?  What are your professional goals/resolutions?

Saturday, December 10, 2011

A Sweet Tooth, cont...

So I didn't quite finish off the problem I talked about in my last post.

The next part asks how Jeremy's advantage changes if you increase the number of cakes.  You might, say, have 7 cakes.  Would Jeremy still come out ahead using the rules in the previous challenge?

I just used some extrapolation to do this one.  It's not quite as thorough as the solution in the book, but we end up at the same place.  First, I looked at the case where there are two cakes, n = 2.  Then Jeremy's advantage over Marie is 5/4 - 3/4 = 1/2.  Next, I looked at the case where there are three cakes, n = 3.  Then Jeremy's advantage over Marie is 13/8 - 11/8 = 2/8 = 1/4 (the two shares have to add up to 3 cakes).  You can keep increasing n and you will see a trend: the advantage halves with each extra cake, which works out to the following equation: A = 1/2^(n-1).

I wrote a little code in C++ to let us easily extrapolate:
#include <cmath>
#include <iostream>
using namespace std;

int main()
{
    double advantage;
    int numCakes;

    cout << "Enter number of cakes: ";
    cin >> numCakes;
    // The advantage halves with each extra cake: A = 1/2^(n-1)
    advantage = 1.0 / pow(2.0, numCakes - 1);
    cout << "Jeremy's advantage is: " << advantage << endl;

    return 0;
}
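
If you also want the actual shares and not just the gap, here is a rough sketch in Python. It assumes only two things: the advantage formula from this post, and that the two shares add up to the total number of cakes. The function names are my own, and exact fractions are used so the results can be compared with the whiteboard numbers.

```python
from fractions import Fraction


def advantage(num_cakes):
    """Jeremy's advantage over Marie, using the extrapolated formula 1/2^(n-1)."""
    return Fraction(1, 2 ** (num_cakes - 1))


def shares(num_cakes):
    """Split num_cakes so the shares differ by the advantage and sum to num_cakes."""
    gap = advantage(num_cakes)
    jeremy = (num_cakes + gap) / 2
    marie = (num_cakes - gap) / 2
    return jeremy, marie


for n in (2, 3, 7):
    jeremy, marie = shares(n)
    print(f"n={n}: Jeremy {jeremy}, Marie {marie}, advantage {jeremy - marie}")
```

This is only bookkeeping on top of the formula, not a derivation of the game itself; the case-by-case reasoning is still what justifies the formula.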

The question then asks if there's a way to make sure both players of this game get an equal amount of the cake.  That part is easy!  As we noticed when Marie goes first, Jeremy always cuts that cake in half.  If she can always go first, they'll always get an equal amount of cake.

Thursday, December 8, 2011

A Sweet Tooth

As I mentioned in yesterday's post, I've been working through a book of puzzles for programmers.  The title of the first problem is 'A Sweet Tooth', and is very apropos since I had a visit to the dentist this morning (I should probably visit the dentist more and do puzzles less, but that's another matter).

In the problem, there are two children playing a game of cutting cakes and trying to get the largest pieces.  Dennis Shasha, the author, does a pretty good job of explaining the problem, but the hints he gives are a bit wordy and I think those who haven't done a lot of math proofs before might not bother with them.  I think it's easier to state assumptions and separate out the reasoning into cases, working through the math for each methodically.  I like to write mine out on a white board and I wrote out all the algebra, but ended up with the same solution as the one in the book.

I'm too lazy to type it out, but thanks to my handy cell phone you can see how I did it here.  Here's my answer to the first problem:


And here's my answer to the second:


I think it would be fun to code up a generic solution for x number of cakes.  I'll try to post it here when I do.

Wednesday, December 7, 2011

Practicing Problem Solving

I've been looking at Dennis Shasha's book "Puzzles for Programmers and Pros", which combines two of my favorite activities: programming and puzzles!  Going through the book is fun and a good way to brush up on those languages one doesn't use on a regular basis, and I may post my experiences with it going forward.  Have you tried the puzzles in the book?  What did you think?

Sunday, December 4, 2011

CS Education Week Is Here!

This week is CS Education Week.  The idea is to celebrate and raise awareness of the impact of computing science and the need for CS education.  You can find out more here.

My first experience with CS education was terrible.  It was Grade 10 and I was enrolled in the pre-IB program (a preparation year for the full IB program which started in Grade 11).  I had chosen CS as an elective as I was generally interested in computers and programming, but had no background or training in either.  The first day the teacher gave out the assignment:

"OK, you guys are going to code a database, and you will be able to sort, save, search, and print records.  Go."

Can you imagine my shock and panic?  I had no idea what to do, but it seemed like all the other students in the class (all  boys, incidentally) did.  They confidently started up their IDEs and began typing away.  There was no textbook and the teacher did not offer any kind of useful help, and there was no internet so I could not search for examples or tutorials there either.

Well, I made it through the year (I had to, since by that time it was too late to transfer and nothing else fit into my schedule) but vowed never to take another CS course again.  I didn't feel like I learned anything except HTML programming and was very annoyed by the experience.  The next year I switched to Physics and managed to avoid CS courses almost entirely in my undergraduate studies.

Luckily I realized later how important CS would be for me and went back to school to study it - but I imagine that there must be many students like me who get turned off early on and never come back.  My experience highlights a few things that are currently missing from CS education in my opinion:
  • Programming is not CS: That is, programming is a tool of CS, but at its roots CS is much more than just writing code.  CS to me is all about modeling solutions to problems using algorithms and data structures.  It's about how to think abstractly, how to analyze problems and their solutions to come up with the most efficient one, and it's how to communicate those solutions to users in a sensible way.  When I took my first course in algorithms and data structures, I fell in love, but that wasn't until long after completing my undergraduate studies.  It seems to me we have the order of things backwards here.
  • CS is everywhere: There are so many areas of education that CS could impact.  Bringing up CS when teaching about other topics could help inspire students or at least get them thinking about it more broadly.  Algorithms are an easy example when it comes to mathematics, and pretty natural when you think about the programmable graphing calculators students are encouraged to use these days.  We can start even earlier; even kids in elementary school could be learning how to sort using various algorithms (there is a neat demo using blocks and weights, or discs).  There are all sorts of applications of CS to art (digital painting, photomanipulation, using computers in art installations, graphics displays, etc).  There is a lot of literature around computing these days.  I'm thinking of cyberpunk books like Neuromancer or Cryptonomicon.  How about artificial intelligence?  When I was a kid one of my favorite discussions on this topic was whether the character Data from "Star Trek: The Next Generation" was alive or not.
  • Teachers need CS, too: In my (admittedly limited) experience, most teacher training programs have little if any focus on CS (or even STEM subjects in general).  How can we attract people with the necessary technical skills to teaching, when the technology sector provides numerous better-paying jobs?  But it's not just technical skills we need in the classroom - we need a special blend of the ability to teach and instruct in addition to the necessary subject knowledge.  At my university they had a course which was in effect 'Math for Teachers'.  Maybe we need something similar for CS.  Greater awareness of CS in general among teachers could help them integrate it into more traditional subjects, especially when CS-specific courses are not available.
Are you supporting CS Education Week in some way?  If so, please sign the pledge and talk about it!  If not, please consider how you can help raise awareness.  I don't think my experience is all that uncommon, and if so we must be losing an awful lot of talent to other fields.  Let's do what we can to help make things better!

Saturday, October 29, 2011

Site Update

For the last few months I've received a few messages from my old school that I will soon be cut off from using their web hosting for this site.  I can't complain seeing as I graduated in mid-2009 and it's now nearly the end of 2011!

I debated what to do - should I get my own domain and find storage somewhere else?  Instead, I found another solution - Blogger Pages.  This [somewhat] new feature allows users to create static pages in addition to their blogs, and with a little template magic I was able to pretty much clone the old pages.  Apart from the URLs I think the migration has been pretty seamless, and I am quite pleased with the results.

Since I have pretty much written my own template and gotten rid of almost everything except the bare minimum blogger-related code, I had a few challenges in migrating to the new page system.  First of all, I was using CSS before to switch the tabs' appearance when you switched pages, so the current tab was always emphasized.  In the new setup, I had to do this all within the template.  What I did was use conditional blogger tags. Before, the list that forms my tabs looked something like this:

<li id='current'><a href='http://kjtsouka.blogspot.com/p/home.html'>Home</a></li>
<li><a href='http://kjtsouka.blogspot.com'>Blog</a></li>
<li><a href='http://kjtsouka.blogspot.com/p/academic.html'>Academic</a></li>
<li><a href='http://kjtsouka.blogspot.com/p/professional.html'>Professional</a></li>
<li><a href='http://kjtsouka.blogspot.com/p/personal.html'>Personal</a></li>
<li><a href='http://kjtsouka.blogspot.com/p/contact.html'>Contact</a></li>

Now that I have a single template, I needed to use the conditional Blogger tags to put a tab 'switcher' within the template itself. It looks like this:

<div id='menu'>
<ul>
  <b:if cond='data:blog.url == "http://kjtsouka.blogspot.com/p/home.html"'>
    <li id='current'><a href='http://kjtsouka.blogspot.com/p/home.html'>Home</a></li>
    <li><a href='http://kjtsouka.blogspot.com'>Blog</a></li>
    <li><a href='http://kjtsouka.blogspot.com/p/academic.html'>Academic</a></li>
    <li><a href='http://kjtsouka.blogspot.com/p/professional.html'>Professional</a></li>
    <li><a href='http://kjtsouka.blogspot.com/p/personal.html'>Personal</a></li>
    <li><a href='http://kjtsouka.blogspot.com/p/contact.html'>Contact</a></li>
  </b:if>
  <b:if cond='data:blog.url == "http://kjtsouka.blogspot.com/p/academic.html"'>
    <li><a href='http://kjtsouka.blogspot.com/p/home.html'>Home</a></li>
    <li><a href='http://kjtsouka.blogspot.com'>Blog</a></li>
    <li id='current'><a href='http://kjtsouka.blogspot.com/p/academic.html'>Academic</a></li>
    <li><a href='http://kjtsouka.blogspot.com/p/professional.html'>Professional</a></li>
    <li><a href='http://kjtsouka.blogspot.com/p/personal.html'>Personal</a></li>
    <li><a href='http://kjtsouka.blogspot.com/p/contact.html'>Contact</a></li>
  </b:if>
  <b:if cond='data:blog.url == "http://kjtsouka.blogspot.com/p/professional.html"'>
    <li><a href='http://kjtsouka.blogspot.com/p/home.html'>Home</a></li>
    <li><a href='http://kjtsouka.blogspot.com'>Blog</a></li>
    <li><a href='http://kjtsouka.blogspot.com/p/academic.html'>Academic</a></li>
    <li id='current'><a href='http://kjtsouka.blogspot.com/p/professional.html'>Professional</a></li>
    <li><a href='http://kjtsouka.blogspot.com/p/personal.html'>Personal</a></li>
    <li><a href='http://kjtsouka.blogspot.com/p/contact.html'>Contact</a></li>
  </b:if>
  <b:if cond='data:blog.url == "http://kjtsouka.blogspot.com/p/personal.html"'>
    <li><a href='http://kjtsouka.blogspot.com/p/home.html'>Home</a></li>
    <li><a href='http://kjtsouka.blogspot.com'>Blog</a></li>
    <li><a href='http://kjtsouka.blogspot.com/p/academic.html'>Academic</a></li>
    <li><a href='http://kjtsouka.blogspot.com/p/professional.html'>Professional</a></li>
    <li id='current'><a href='http://kjtsouka.blogspot.com/p/personal.html'>Personal</a></li>
    <li><a href='http://kjtsouka.blogspot.com/p/contact.html'>Contact</a></li>
  </b:if>
  <b:if cond='data:blog.url == "http://kjtsouka.blogspot.com/p/contact.html"'>
    <li><a href='http://kjtsouka.blogspot.com/p/home.html'>Home</a></li>
    <li><a href='http://kjtsouka.blogspot.com'>Blog</a></li>
    <li><a href='http://kjtsouka.blogspot.com/p/academic.html'>Academic</a></li>
    <li><a href='http://kjtsouka.blogspot.com/p/professional.html'>Professional</a></li>
    <li><a href='http://kjtsouka.blogspot.com/p/personal.html'>Personal</a></li>
    <li id='current'><a href='http://kjtsouka.blogspot.com/p/contact.html'>Contact</a></li>
  </b:if>
  <b:if cond='data:blog.pageType != "static_page"'>
    <li><a href='http://kjtsouka.blogspot.com/p/home.html'>Home</a></li>
    <li id='current'><a href='http://kjtsouka.blogspot.com'>Blog</a></li>
    <li><a href='http://kjtsouka.blogspot.com/p/academic.html'>Academic</a></li>
    <li><a href='http://kjtsouka.blogspot.com/p/professional.html'>Professional</a></li>
    <li><a href='http://kjtsouka.blogspot.com/p/personal.html'>Personal</a></li>
    <li><a href='http://kjtsouka.blogspot.com/p/contact.html'>Contact</a></li>
  </b:if>
</ul>
</div>
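
The switcher boils down to one rule: keep a single list of tabs and mark whichever one matches the current URL, with every non-static page counting as the blog. Here is that rule as a small Python sketch. This is not Blogger template syntax, and render_menu and its parameters are names I made up for illustration.

```python
# Tab labels and URLs copied from the template above.
BLOG_URL = "http://kjtsouka.blogspot.com"
TABS = [
    ("Home", BLOG_URL + "/p/home.html"),
    ("Blog", BLOG_URL),
    ("Academic", BLOG_URL + "/p/academic.html"),
    ("Professional", BLOG_URL + "/p/professional.html"),
    ("Personal", BLOG_URL + "/p/personal.html"),
    ("Contact", BLOG_URL + "/p/contact.html"),
]


def render_menu(current_url, is_static_page):
    """Hypothetical helper: build the tab list once and mark the current tab."""
    # Anything that is not a static page counts as the blog,
    # like the final b:if in the template.
    active = current_url if is_static_page else BLOG_URL
    lines = ["<div id='menu'>", "<ul>"]
    for label, href in TABS:
        marker = " id='current'" if href == active else ""
        lines.append(f"  <li{marker}><a href='{href}'>{label}</a></li>")
    lines += ["</ul>", "</div>"]
    return "\n".join(lines)


print(render_menu(BLOG_URL + "/p/academic.html", is_static_page=True))
```

One difference from the template: for a static page whose URL is not in the list, this sketch still prints the tabs (with none marked), whereas the template prints an empty list.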

Have you found conditional tags useful in your blog(s)? I think this is pretty cool and in a way makes me less reliant on the various CSS, JavaScript, and HTML hacks I was using before to achieve the same effects.

Monday, October 3, 2011

A trip to Ada

**This post sort of coincides with the arrival of Ada Lovelace Day (on October 7 this year, get ready!).**

I've walked past Ada's Technical Books a few times in the last few months and noticed some very cool window display items (such as Neal Stephenson's Reamde), but I was always there when it was closed. Yesterday however, I happened to be in the area when it was open and got a chance to finally browse inside.

Outside Ada's Technical Books on Capitol Hill in Seattle, WA

This store is so neat! It's named for Ada Lovelace, widely thought to be the first female computer programmer. There were little references to technical women throughout the store, including a book about Grace Hopper in the biography section, greeting cards featuring technical women at the front of the store, and of course the name of the shop itself. Besides this, there is a very nice selection of technical books, quirky puzzles and games, biographies about famous scientists, programmers, and mathematicians, and even a small sci-fi and young adult section. In the latter I spied a Marvel comic version of Orson Scott Card's Ender's Game - have any of you read it? Would you recommend it?

I really enjoyed my visit to this shop and if you are in the area, I'm sure you will too. :)

Saturday, May 29, 2010

Public Table Extraction Dataset

I am posting a copy of the table extraction dataset I created for my thesis here.

The dataset has three parts:
  • PublicTableExtractionDataset, a SQL database to keep track of the html pages and tables and which contains the manual labels of 'data table' or 'layout table'
  • JavaCrawlerTestDump, a folder containing all the crawled html pages
  • TableDump, a folder containing all the extracted tables from each crawled html page
Practical Information
Schema: PublicTableExtractionDataset consists of two tables, HTMLPages and Table_Contents. HTMLPages contains information on where html pages are located and how to identify them, while Table_Contents contains information on each table extracted from each HTMLPage, as well as the type of table it is (a value of '1' indicates a layout table, while a value of '2' indicates a data table).

The schema for the two tables is as follows:

HTMLPages:
  • File_ID (int, not null)
  • File_Name (varchar(200), not null)
  • Page_Domain (varchar(200), not null)
  • URL (varchar(1000), not null)
  • Page_Type (int, not null)
Table_Contents:
  • File_ID (int, not null)
  • Table_ID (int, not null)
  • Table_File_Location (varchar(200), not null)
  • Table_Type (int, null)
Format: This database is a backup of the original SQL database I used. You will need to import it to a new database using the 'import database' wizard provided with SQL Server. I have tested this with the express and full versions of SQL Server 2000 and 2008, so please let me know if you have any questions.

Accessing html pages and tables: I have removed the folder locations from the database, but you can easily add your own. For example, to update the HTMLPages SQL table to add the locations, you could use the following query:

update HTMLPages
set File_Name = 'new location' + File_Name
from HTMLPages

The same query could be used to update the Table_Contents table, just remember to change HTMLPages to Table_Contents and File_Name to Table_File_Location.
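
If you just want to see how the two tables fit together before restoring the backup, here is a minimal sketch in Python. It uses an in-memory SQLite database as a stand-in for SQL Server (so string concatenation is || instead of +), and the rows and folder prefixes are made up for illustration; only the table and column names come from the schema above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE HTMLPages (
    File_ID INTEGER NOT NULL,
    File_Name VARCHAR(200) NOT NULL,
    Page_Domain VARCHAR(200) NOT NULL,
    URL VARCHAR(1000) NOT NULL,
    Page_Type INTEGER NOT NULL
);
CREATE TABLE Table_Contents (
    File_ID INTEGER NOT NULL,
    Table_ID INTEGER NOT NULL,
    Table_File_Location VARCHAR(200) NOT NULL,
    Table_Type INTEGER NULL
);
""")

# Made-up sample rows: one page with a layout table (1) and a data table (2).
conn.execute(
    "INSERT INTO HTMLPages VALUES (1, 'page1.html', 'example.com', 'http://example.com/', 0)"
)
conn.executemany(
    "INSERT INTO Table_Contents VALUES (?, ?, ?, ?)",
    [(1, 1, "page1_table1.html", 1), (1, 2, "page1_table2.html", 2)],
)

# Hypothetical folder locations; substitute wherever you unpacked the dumps.
conn.execute("UPDATE HTMLPages SET File_Name = ? || File_Name",
             ("C:/dataset/JavaCrawlerTestDump/",))
conn.execute("UPDATE Table_Contents SET Table_File_Location = ? || Table_File_Location",
             ("C:/dataset/TableDump/",))

# List every data table together with the page it was extracted from.
data_tables = conn.execute("""
    SELECT p.File_Name, t.Table_File_Location
    FROM Table_Contents AS t
    JOIN HTMLPages AS p ON p.File_ID = t.File_ID
    WHERE t.Table_Type = 2
""").fetchall()
print(data_tables)
```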

Dataset Statistics
I collected 9,365 HTML pages which contain the <table> tag from 512 random domains. These pages contain a minimum of 1 and a maximum of 1,539 tables each. 6,620 pages contain only non-data tables, while 2,745 pages contain at least one data table.

The total number of tables collected was 78,438, with 74,202 (94.6%) of these being non-data tables, and 4,236 (5.4%) being data tables.

More Details
You can read more about this data set and the experiments I used it for in my thesis.

Tuesday, May 19, 2009

Populating a Flex DataGrid Dynamically from an XMLList

Ok, so I've looked everywhere and couldn't find one succinct, easy to understand explanation for how to do this. So I'm going to take a stab at it.

The Problem
I have a web service that provides my Flex application with some XML. This XML represents a data table, and in my Flex application I want to populate a DataGrid with that XML, thus showing a visual representation of the data table. The problem is it is quite tricky to populate the DataGrid dynamically at run time. Most examples show you how to do it at compile time, that is, by specifying the columns and their fields in the code. My problem is that I don't know the column fields in advance; they are passed to me from the web service.

The Solution
The XML from the web service is in the following format:

<NewDataSet>
<Table1>
<column1>data1</column1>
<column2>data2</column2>
<column3>data3</column3>
</Table1>
<Table1>
...
</Table1>
</NewDataSet>

First, we create a global variable that will hold the contents of the XML sent from the web service. Here's mine:

[Bindable]
private var _xmlData:XMLList;

Now, we need to get the XML from the web service. First, I have a button that the user can click to connect to the web service. When the button is clicked, we activate the following function:

private function getTableData(event:MouseEvent):void
{
    var service:WebService = new WebService();
    // Register the result and fault handlers before calling the service
    service.addEventListener(ResultEvent.RESULT, serviceResultDataTableReceiver);
    service.addEventListener(FaultEvent.FAULT, serviceFaultHandler);
    service.loadWSDL("http://localhost:4753/PlayersService.asmx?wsdl");
    service.getDataTable(tableComboBoxData.text());
}

Here 'getDataTable' is the web service method, which takes a parameter representing the ID of the table I want to retrieve from the database. That part isn't really important to you; it could be any web service method that sends your Flex application some XML.

To retrieve the XML from the web service, I've attached an eventListener called 'serviceResultDataTableReceiver'. Here are the details for that function:

private function serviceResultDataTableReceiver(event:ResultEvent):void {
    var tempXML:XML = XML(event.result);
    // Each <Table1> element is one row of the data table
    _xmlData = tempXML.Table1 as XMLList;
    dataGrid.dataProvider = _xmlData;
    // The child nodes of the first row give us the column names
    for each (var node:XML in _xmlData[0].children()) {
        addDataGridColumn(node.name());
    }
}

Here we get the result XML from the web service (tempXML) and convert it to an XMLList. That will be a DataProvider for our DataGrid. Once we have the XMLList, we need to add the columns dynamically, depending on how many child nodes we have (i.e.: column1, column2, and column3 in our XML example above).

Next, we need the function to add the columns to our DataGrid:

private function addDataGridColumn(dataField:String):void {
    var dgc:DataGridColumn = new DataGridColumn(dataField);
    // Copy the current columns, append the new one, and assign the array back
    var cols:Array = dataGrid.columns;
    cols.push(dgc);
    dataGrid.columns = cols;
    dataGrid.validateNow();
}

Here we pass the name of the node to the function and create a new DataGridColumn with this name. We add this column to the array containing the DataGrid's columns ('dataGrid.columns'). validateNow() ensures the table is refreshed before the next column is added. If you don't do this, you may end up with a table that only has the most recently added column.
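
The core idea (read the column names off the first row instead of hard-coding them) is not specific to Flex. Here is a rough sketch of the same logic in Python using the standard xml.etree.ElementTree module. The function name build_grid and the second sample row are my own inventions for illustration.

```python
import xml.etree.ElementTree as ET

# Same shape as the web service XML above; the second row is made up.
SAMPLE = """<NewDataSet>
<Table1><column1>data1</column1><column2>data2</column2><column3>data3</column3></Table1>
<Table1><column1>data4</column1><column2>data5</column2><column3>data6</column3></Table1>
</NewDataSet>"""


def build_grid(xml_text, row_tag="Table1"):
    """Return (columns, rows), discovering the columns from the first row."""
    root = ET.fromstring(xml_text)
    rows = root.findall(row_tag)
    # The child tags of the first row play the role of the DataGrid columns.
    columns = [child.tag for child in rows[0]] if rows else []
    data = [[row.findtext(col, default="") for col in columns] for row in rows]
    return columns, data


columns, data = build_grid(SAMPLE)
print(columns)
for row in data:
    print(row)
```

Like the Flex version, this assumes every row has the same fields as the first one; a field missing from a later row simply comes back as an empty string.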

Finally, special thanks to the poster at The Blogger Guide for the tip on posting code snippets in Blogger posts!

Tuesday, May 5, 2009

SQL Trick

Found a trick today for SQL from this site. I needed to get a random sample from a dataset, where each record has a unique ID.

According to the site, the query to get one random record by its ID is:

SELECT TOP 1 someColumn
FROM someTable
ORDER BY NEWID()

If you change Top 1 to Top 50, you can get 50 random records according to their ID (someColumn in this case). Of course you can use any number you wish.
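
If you don't have SQL Server handy, the same trick can be tried out with SQLite from Python. This is only a sketch under a few assumptions: SQLite has no NEWID() or TOP, so ORDER BY RANDOM() with LIMIT stands in for them, and the table contents (IDs 1 to 1000) are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE someTable (someColumn INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO someTable VALUES (?)", [(i,) for i in range(1, 1001)])


def random_sample(connection, n):
    """Return n distinct IDs picked at random by sorting on a per-row random value."""
    rows = connection.execute(
        "SELECT someColumn FROM someTable ORDER BY RANDOM() LIMIT ?", (n,)
    ).fetchall()
    return [row[0] for row in rows]


sample = random_sample(conn, 50)
print(sample)
```

Each row gets its own random sort key, so the first n rows after sorting are a random sample without repeats. The whole table is sorted each time, which is worth keeping in mind for very large tables.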

Tuesday, January 27, 2009

Traversing the DOM in C#

Today I had a problem where I needed to find all tags of a given type in an HTML document and extract whatever was within them, including nested tags. That is, given:
<HTML>
<BODY>
<TAG1>
<TAG1>
</TAG1>
</TAG1>
</BODY>
</HTML>
I should get two strings returned. The first would consist of the outer tag and its contents, and look like:
<TAG1>
<TAG1>
</TAG1>
</TAG1>
and the second would consist of the inner tag and its contents, and look like:
<TAG1>
</TAG1>
I knew I had used some code for traversing the nodes of an HTML page in C# before, but although the basic traversal code was helpful, I didn't have anything that would help me pull out the contents. I had a look online, but couldn't find anything that really matched what I wanted to do. So, I wrote my own. Maybe someone else out there will find it useful too, or can recommend another approach. I'm always open to suggestions!

First, we need to get the HTML document in a form that can be parsed easily. I used IHTMLDocument2, part of the mshtml COM module in C#. My document was already in the form of a string ("stringOfHTML"), so it was easy to transform that into the IHTMLDocument2 format. Here's how it's done:
IHTMLDocument2 doc = new HTMLDocumentClass();
doc.write(new object[] { stringOfHTML });
doc.close();
Once that is done, you need a way to access each node and traverse through them. I store the body node in an IHTMLElement as follows:
IHTMLElement bodyElement = doc.body;
Now I want to iterate through the child nodes, so I use IHTMLElementCollection to create a collection of IHTMLElements, where each item in the collection is a child node of the body tag:
IHTMLElementCollection childTags = (IHTMLElementCollection)bodyElement.children;
Using some recursion, we can extract the tags we want from within an HTML document. Here is the code below:
// Needs a reference to the Microsoft.mshtml interop assembly and "using mshtml;".
// DesiredTagName is assumed to be a string field defined elsewhere in the class.
public void extractTagOfType(String stringOfHTML)
{
    IHTMLDocument2 doc = new HTMLDocumentClass();
    doc.write(new object[] { stringOfHTML });
    doc.close();
    IHTMLElement bodyElement = doc.body;
    IHTMLElementCollection childTags = (IHTMLElementCollection)bodyElement.children;
    if (childTags.length > 0)
    {
        foreach (IHTMLElement child in childTags)
        {
            if (child.tagName.Equals(DesiredTagName))
            {
                //do something with the contents of the tag (child.innerHTML)
                //check inside this tag in case there are any other tags of this type nested inside it
                extractTagOfType(child.innerHTML);
            }
            else
            {
                //there might be one of the tags we want nested inside the current node
                extractTagOfType(child.innerHTML);
            }
        }
    }
}
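
For comparison, here is a sketch of the same recursive walk in Python using xml.etree.ElementTree instead of mshtml. It is a different technique in one important way: ElementTree only accepts well-formed markup, so it is fine for the toy example from this post but not a general HTML parser. The function name and the 'outer'/'inner' text in the sample are my own additions so the two results are easy to tell apart.

```python
import xml.etree.ElementTree as ET

# The nested example from the post, with marker text added for readability.
SAMPLE = "<HTML><BODY><TAG1>outer<TAG1>inner</TAG1></TAG1></BODY></HTML>"


def extract_tags(element, tag_name, found=None):
    """Collect the markup of every tag_name element below element, outermost first."""
    if found is None:
        found = []
    for child in element:
        if child.tag == tag_name:
            # tostring() gives the tag plus its contents; it also appends any
            # text that follows the closing tag (the sample has none).
            found.append(ET.tostring(child, encoding="unicode"))
        # Whether or not this child matched, look for matches nested inside it.
        extract_tags(child, tag_name, found)
    return found


body = ET.fromstring(SAMPLE).find("BODY")
matches = extract_tags(body, "TAG1")
for markup in matches:
    print(markup)
```

As in the C# version, the recursion happens for both matching and non-matching children, which is what picks up the nested tag as a second result.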

Saturday, January 3, 2009

JRuby & Java Communication

I recently spent several frustrating days trying to get some JRuby code to call some Java classes. The problem seemed to be that the individual Java classes required some jars that weren't on the JRuby classpath, although they were on the classpath for the Java project they belonged to. I still haven't figured out the proper way to fix that, and the only solution I've found so far is to include the following line in my JRuby code:

$CLASSPATH << "path/to/something.jar"

This includes the required jar in the JRuby classpath. I'd really like to be able to just give a -cp (or similar) command to the JRuby interpreter, but haven't been able to figure out what that would be yet.

The next issue was that there were some dlls required by the Java class. I'm still not sure why, since when the Java project is run independently it isn't necessary to tell the JVM where the dlls are, but when the classes are called individually the dlls aren't found, leading to problems. The only way I could find to fix this problem is to include the following line in my Java class:

System.load("path/to/myDLL.dll");

Now the entire thing is up and running, but I wish there were a better way to solve these problems. It took me a while to even get this far, so hopefully someone else having the same problem will find this helpful.

Tuesday, August 19, 2008

Ankh!

I have discovered AnkhSVN - an integrated Subversion source control plugin for Visual Studio. It is so easy to use and I can revert, submit, etc. right from inside the IDE! It even works with Visual Studio 2008. Love it, especially since I was intimidated by SVN as a new user of Subversion. If you are using Visual Studio and want to use Subversion, you will want to check this plugin out.

Wednesday, June 25, 2008

Shocking

C# doesn't have a built in Set collection?! Crazy.

Wednesday, October 31, 2007

So Others Will Not Share My Pain

Two things I learned today:

1) LaTeX
To include a .cls stylesheet in a LaTeX document, simply place it in the same directory. Easy!

2) .NET
When making queries to very, very large tables (>>5 million tuples), if you are doing statements with aggregation, like select count(*), make sure to add the second line below (it raises CommandTimeout, which is measured in seconds) after you create your command. For example:

SqlCommand getSizeCmd = new SqlCommand(tempCommand, myConnection);
getSizeCmd.CommandTimeout = 10000;

Otherwise, pain, pain pain, I tell you!

(Basically, you will get timeout errors every time.)