
Python dictionaries - reaching to new uses

Archive - Originally posted on "The Horse's Mouth" - 2010-10-05 21:33:51 - Graham Ellis

If it sounds easy, it should be easy.

"Go through a file and echo the lines ... but if a line's duplicated, echo it only once". This was a sample application requested on today's Python course ... and one which had been troubling the folks for a while as they tried to filter the data in other languages.

The solution? Set up a dictionary, and as each line is read treat the whole line as a key ... and see if the key exists (i.e. if the line occurred previously). If it doesn't, then output the line and add a member to the dictionary to indicate that you've done so - keyed on the line, and with a value of "1" which we're not really interested in - we're just looking at the presence or absence of a member. Source code [here] and it's very short!
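As a rough sketch of that idea (not the linked source itself; the function name unique_lines and the in-memory sample data are made up for illustration, and a real run would pass an open file instead):

```python
import io

def unique_lines(lines):
    """Yield each line the first time it is seen, keeping the input order."""
    seen = {}
    for line in lines:
        if line not in seen:
            seen[line] = 1   # placeholder value; only the key's presence matters
            yield line

# Hypothetical sample data standing in for a file opened with open(...)
sample = io.StringIO("apple\npear\napple\nplum\npear\n")
for line in unique_lines(sample):
    print(line, end="")
# prints apple, pear, plum (one per line)
```

Since the value is never looked at, a Python set would do the same job; the dictionary is kept here because that's the approach described above.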

Such a use of dictionaries has provided a big leap forward for this application. What was previously a nightmare task is now just a few lines of code, and because the dictionary is hashed, checking whether a key is present takes roughly the same time however many keys there are, so it's going to stay efficient even as the data set grows.


But there are lots of other ways that dictionaries can be used in this "unique line" application - just as there are lots of ways to get across the water once you find the narrowest and easiest crossing point ...


[here] is the application expanded to give a count of the number of times each line occurs, sorted by the number of occurrences.
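A minimal sketch of how such a count might look, assuming "most frequent first" is the order wanted (the post doesn't say which way round); the names count_lines and most_common_first are hypothetical, not taken from the linked source:

```python
def count_lines(lines):
    """Map each distinct line (newline stripped) to how many times it occurs."""
    counts = {}
    for line in lines:
        key = line.rstrip("\n")
        counts[key] = counts.get(key, 0) + 1
    return counts

def most_common_first(counts):
    """Return (line, count) pairs, highest count first (assumed order)."""
    return sorted(counts.items(), key=lambda pair: pair[1], reverse=True)

# Hypothetical sample data
data = ["apple\n", "pear\n", "apple\n", "plum\n", "pear\n", "apple\n"]
for text, total in most_common_first(count_lines(data)):
    print(total, text)
# prints:
# 3 apple
# 2 pear
# 1 plum
```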

[here] is an extended report, telling you all the line numbers on which a particular string occurs (as the whole line)
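One way this might be done is to store a list of line numbers as the value under each key, rather than a dummy "1". A sketch under that assumption, with a hypothetical function name line_numbers and made-up data:

```python
def line_numbers(lines):
    """Map each distinct line (newline stripped) to the line numbers it appears on."""
    found = {}
    for number, line in enumerate(lines, start=1):   # count lines from 1
        found.setdefault(line.rstrip("\n"), []).append(number)
    return found

# Hypothetical sample data
data = ["apple\n", "pear\n", "apple\n", "plum\n", "pear\n"]
for text, numbers in line_numbers(data).items():
    print(text, numbers)
# prints:
# apple [1, 3]
# pear [2, 5]
# plum [4]
```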

[here] is the same report, but sorted by the number of occurrences of each line - so you have all the unique lines together, for example.
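Sorting that kind of report by occurrence count could be sketched like this, assuming the dictionary maps each line to a list of its line numbers and that "fewest occurrences first" is wanted, so the unique lines come out together at the top. The name rarest_first and the data are hypothetical:

```python
def rarest_first(found):
    """Return (line, line_numbers) pairs, fewest occurrences first (assumed order)."""
    return sorted(found.items(), key=lambda pair: len(pair[1]))

# Hypothetical data: each line mapped to the line numbers it occurred on
found = {"apple": [1, 3, 6], "pear": [2, 5], "plum": [4], "fig": [7]}
for text, numbers in rarest_first(found):
    print(len(numbers), text, numbers)
# prints:
# 1 plum [4]
# 1 fig [7]
# 2 pear [2, 5]
# 3 apple [1, 3, 6]
```

Python's sort is stable, so lines with the same count stay in the order they were first met.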

Oh - the sample data is [here].

Further pictures at St Budeaux / the Saltash ferry ...