Finding all the unique lines in a file, using Python or Perl
Archive - Originally posted on "The Horse's Mouth" - 2012-03-20 19:41:23 - Graham Ellis
A question - how do I process all the unique lines from a file in Python? A delegate asked that today, and it's solved neatly and easily using a generator. That means there's no need to read in all the data before you start: each unique value can be passed back and processed onwards as soon as it's found, and only the values already seen need to be remembered. This is fantastic news if the input isn't really a file but some other, slower reporting data source, and you would like to get answers even while the data's still flowing in.
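def unique(source):
    # Remember each line already seen; a set gives fast membership tests
    sofar = set()
    with open(source) as fh:
        for val in fh:
            val = val.strip()    # compare the same stripped text that we hand back
            if val not in sofar:
                sofar.add(val)
                yield val        # pass each new unique line on straight away

for lyne in unique("info.txt"):
    print(lyne)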
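The generator doesn't care where its lines come from, which is what makes it useful for a slow data source. A minimal sketch, assuming any iterable of strings as input; unique_lines and slow_source are hypothetical names, and slow_source merely simulates a feed that trickles in by sleeping between records. Each unique line is printed as soon as it arrives, although the set of values seen so far still has to be kept.

```python
import time
from typing import Iterable, Iterator


def unique_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each distinct line, stripped, the first time it is seen."""
    seen = set()  # grows with the number of distinct lines, not the input size
    for line in lines:
        line = line.strip()
        if line not in seen:
            seen.add(line)
            yield line  # handed onward before the rest of the input is read


def slow_source(records: Iterable[str], delay: float = 0.1) -> Iterator[str]:
    """Hypothetical stand-in for a slow reporting feed: one record per `delay` seconds."""
    for record in records:
        time.sleep(delay)
        yield record + "\n"


if __name__ == "__main__":
    feed = slow_source(["alpha", "beta", "alpha", "gamma", "beta"])
    for item in unique_lines(feed):
        print(item)  # prints alpha, beta, gamma, each as soon as it arrives
```

To read from a file instead, pass the open file object returned by open("info.txt").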
Neat, isn't it? I love Python! And to test that love, I thought I would answer the same question in Perl:
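use strict;
use warnings;

# Read the whole file, then return its unique lines in order of first appearance
sub unique {
    my ($source) = @_;
    open my $fh, '<', $source or die "Cannot open $source: $!";
    my %sofar;    # lines seen so far
    my @uvals;    # unique lines collected for return
    while (my $line = <$fh>) {
        if (! $sofar{$line}) {
            $sofar{$line} = 1;
            push @uvals, $line;
        }
    }
    close $fh;
    return @uvals;
}

foreach my $lyne (unique("info.txt")) {
    print $lyne;
}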
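The Perl sub builds the complete list of unique lines before returning any of them. For comparison, a minimal Python sketch of that same eager approach, assuming Python 3.7 or later (where dict preserves insertion order); unique_list is a hypothetical name, and it accepts any iterable of lines, including an open file.

```python
from typing import Iterable, List


def unique_list(lines: Iterable[str]) -> List[str]:
    """Consume all of `lines`, then return the distinct ones in first-seen order."""
    # dict.fromkeys keeps only the first occurrence of each key, in insertion order
    return list(dict.fromkeys(line.rstrip("\n") for line in lines))


if __name__ == "__main__":
    data = ["red\n", "green\n", "red\n", "blue\n", "green\n"]
    print(unique_list(data))  # ['red', 'green', 'blue']
```

Nothing comes back until the input is exhausted, which is exactly the wait that the generator version avoids.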
A little longer, and as Perl doesn't have a generator as such, I was tempted to write the code to return the unique list only once the whole incoming data flow had been received. But a little more thought let me produce a generator-like alternative:
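# $static and %sofar are package (global) variables, so they keep
# their values from one call of unique to the next
sub unique {
    $static or open FH, $_[0];   # open the file on the first call only
    $static = 1;
    while (my $line = <FH>) {
        if (! $sofar{$line}) {
            $sofar{$line} = 1;
            return $line;        # hand back one new unique line per call
        }
    }
    return "";                   # a false value flags the end of the data
}
while ($lyne = unique("info.txt")) {
    print $lyne;
}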
Actually rather neat, but it relies on global variables to hold the state of the "generator" routine between calls, and it needs care to flag the end of the data. Careful code examination will show you that the return ""; statement is actually redundant, as Perl returns the result of the last expression evaluated, which is false when the loop exits. Strictly, though, perlsub documents the value returned by a sub whose last statement is a loop as unspecified, so that relies on current behaviour; start applying tricks like this and you're getting into code that's going to be hard to maintain.
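One way round both the global state and the fragile end-of-data flag is to keep the state in a closure and finish with a value that can never be a line. A minimal Python sketch, assuming the input is any iterable of lines; make_next_unique is a hypothetical name. The built-in two-argument form iter(callable, sentinel) then calls the function repeatedly until it returns the sentinel, here None. The example data ends with a line holding just 0 and no newline: in the Perl while loop above, that "0" would be false and stop the loop early.

```python
from typing import Callable, Iterable, Optional


def make_next_unique(lines: Iterable[str]) -> Callable[[], Optional[str]]:
    """Return a function that gives the next unseen line on each call, or None at the end."""
    source = iter(lines)  # reading position lives in the closure, not in a global
    seen = set()          # lines already handed back

    def next_unique() -> Optional[str]:
        for line in source:      # resumes where the previous call stopped
            if line not in seen:
                seen.add(line)
                return line
        return None              # explicit end-of-data flag; never a valid line

    return next_unique


if __name__ == "__main__":
    get_next = make_next_unique(["one\n", "two\n", "one\n", "0"])
    for lyne in iter(get_next, None):  # call get_next until it returns None
        print(lyne.rstrip("\n"))        # prints one, two, 0
```

Each call to make_next_unique gets its own state, so two "generators" over different sources can run side by side without interfering.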
Truth be known - I love Perl too. See our Perl Courses and Python Courses. I'm happy to teach you either, to help you use their strengths and write good, maintainable code in both.