Category: Python

How to identify duplicate lines in a text file using Python

Here is a short Python program that counts how many times each line occurs in a text file, so that duplicate lines are easy to spot.

import tkinter as tk
from tkinter import filedialog
import collections
from pathlib import Path
import os

root= tk.Tk()

canvas1 = tk.Canvas(root, width = 800, height = 300)
canvas1.pack()

label1 = tk.Label(root, text='Log Analyser')
label2 = tk.Label(root, text='Import a file...')
label1.config(font=('Arial', 20))
label2.config(font=('Arial', 10))
canvas1.create_window(400, 50, window=label1)
canvas1.create_window(200, 180, window=label2)

def getLogFile():
      # askopenfilename returns an empty value if the dialog is cancelled.
      import_file = filedialog.askopenfilename()
      if not import_file:
            return

      entries = Path(import_file)
      # Write the report into the same folder as the input file.
      output_path = os.path.join(os.path.dirname(os.path.abspath(import_file)), "Duplicate_Log_Info.txt")

      # Count how often each line occurs, ignoring surrounding whitespace.
      with open(import_file, "r") as f:
            counts = collections.Counter(l.strip() for l in f)

      # One "line|count" record per row, most frequent lines first.
      with open(output_path, "w") as fw:
            for line, count in counts.most_common():
                  fw.write(line + "|" + str(count) + "\n")

      label3 = tk.Label(root, text=entries.name + ": Import is successful. Please check the output file - " + output_path + ".")
      label3.config(font=('Arial', 10))
      canvas1.create_window(400, 220, window=label3)

            
browseButton_Excel = tk.Button(text='Choose a file...', command=getLogFile, bg='green', fg='white', font=('helvetica', 12, 'bold'))
canvas1.create_window(400, 180, window=browseButton_Excel)

button3 = tk.Button (root, text='Close', command=root.destroy, bg='green', font=('helvetica', 11, 'bold'))
canvas1.create_window(500, 180, window=button3)

root.mainloop()
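
The counting step does not need the GUI. Here is a minimal sketch, assuming a small in-memory list of log lines; the sample lines and the helper name `count_lines` are made up for illustration. `collections.Counter` tallies each stripped line and `most_common()` orders the result by frequency:

```python
import collections

def count_lines(lines):
    """Return (line, count) pairs, most frequent first."""
    counts = collections.Counter(line.strip() for line in lines)
    return counts.most_common()

# Hypothetical sample input; a real run would iterate over an open file instead.
sample = ["ERROR disk full\n", "INFO started\n", "ERROR disk full\n", "ERROR disk full\n"]

for line, count in count_lines(sample):
    print(line + "|" + str(count))

# Keep only the lines that occur more than once.
duplicates = [(line, count) for line, count in count_lines(sample) if count > 1]
print(duplicates)
```

The program above writes every distinct line with its count. The final filter shows one way to keep only the lines that are actually duplicated.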

Output:

If you enjoyed this blog post, feel free to share it with your friends!

How to draw multi line graphs in python using matplotlib and tkinter

This is a follow-up to my earlier posts. Today we will extend the last post to draw multiple lines in one graph using matplotlib and tkinter.

Source Code:
import tkinter as tk
from tkinter import filedialog
from matplotlib.backends.backend_tkagg import FigureCanvasTkAgg
from matplotlib.figure import Figure
import pandas as pd

from random import randint
colors = []

for i in range(12):
    colors.append('#%06X' % randint(0, 0xFFFFFF))
    
    
root= tk.Tk()
 
canvas1 = tk.Canvas(root, width = 800, height = 300)
canvas1.pack() 
label1 = tk.Label(root, text='Data Analyser')
label1.config(font=('Arial', 20))
canvas1.create_window(400, 50, window=label1)
 
def getExcel ():
    global df
 
    import_file_path = filedialog.askopenfilename()
    df = pd.read_excel (import_file_path)
    global bar1
    figure1 = Figure(figsize=(4,3), dpi=100)
    subplot1 = figure1.add_subplot(111)
    #subplot1.bar(x,y,color = 'lightsteelblue')
    bar1 = FigureCanvasTkAgg(figure1, root)
    bar1.name='latheesh'
    bar1.get_tk_widget().pack(side=tk.LEFT, fill=tk.BOTH, expand=0)
    

    # Draw one line per unique month; enumerate keeps the color index in range.
    for i, month in enumerate(pd.unique(df['Month'])):
        x = df['Day'][df['Month'] == month]
        y = df['Count'][df['Month'] == month]
        subplot1.plot(x, y, color=colors[i % len(colors)], linestyle='dashed', linewidth=1, marker='o', markerfacecolor=colors[i % len(colors)], markersize=12)
    
 
def clear_charts():
    bar1.get_tk_widget().pack_forget()
 
browseButton_Excel = tk.Button(text='Load File...', command=getExcel, bg='green', fg='white', font=('helvetica', 12, 'bold'))
canvas1.create_window(400, 180, window=browseButton_Excel)
 
button2 = tk.Button (root, text='Clear Chart', command=clear_charts, bg='green', font=('helvetica', 11, 'bold'))
canvas1.create_window(400, 220, window=button2)
 
button3 = tk.Button (root, text='Exit!', command=root.destroy, bg='green', font=('helvetica', 11, 'bold'))
canvas1.create_window(400, 260, window=button3)
 
root.mainloop()

Sample Excel data:

Resulting Graph:

Explanation:

The main differences from the previous post are two added snippets:

1. A list of random color codes

from random import randint
colors = []

for i in range(12):
    colors.append('#%06X' % randint(0, 0xFFFFFF))

2. Drawing multiple lines. In our case, the Month column is used to filter the data: for every unique month, the loop selects that month's rows and plots them as an individual line in the graph.

    # Draw one line per unique month; enumerate keeps the color index in range.
    for i, month in enumerate(pd.unique(df['Month'])):
        x = df['Day'][df['Month'] == month]
        y = df['Count'][df['Month'] == month]
        subplot1.plot(x, y, color=colors[i % len(colors)], linestyle='dashed', linewidth=1, marker='o', markerfacecolor=colors[i % len(colors)], markersize=12)
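
Both ideas can be tried without Excel or a GUI. The sketch below uses plain Python on made-up rows to show which values the loop selects for each month; the data and the helper names `distinct_colors` and `group_by_month` are hypothetical. It also swaps the random colors for evenly spaced hues from the standard library's `colorsys` module, because random colors can come out nearly identical. That is an alternative, not what the program above does.

```python
import colorsys
from collections import defaultdict

def distinct_colors(n, saturation=0.8, value=0.8):
    """Return n hex color strings with evenly spaced hues."""
    result = []
    for i in range(n):
        r, g, b = colorsys.hsv_to_rgb(i / n, saturation, value)
        result.append('#%02X%02X%02X' % (round(r * 255), round(g * 255), round(b * 255)))
    return result

def group_by_month(rows):
    """Map each month to its (days, counts) lists, in first-seen order."""
    groups = defaultdict(lambda: ([], []))
    for month, day, count in rows:
        days, counts = groups[month]
        days.append(day)
        counts.append(count)
    return dict(groups)

# Hypothetical (Month, Day, Count) rows standing in for the Excel sheet.
rows = [(1, 1, 10), (1, 2, 14), (2, 1, 7), (2, 2, 9), (3, 1, 12)]

groups = group_by_month(rows)
colors = distinct_colors(len(groups))

# One (x, y, color) series per month, mirroring what each plot call receives.
for color, (month, (days, counts)) in zip(colors, groups.items()):
    print(month, days, counts, color)
```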

I hope this helps you understand the basics of drawing a graph from Excel data. I recommend exploring further and getting some hands-on practice to understand it better.

I’d like to grow my readership. If you enjoyed this blog post, please share it with your friends!

Open and Read from an Excel File and plot a chart in Python using matplotlib and tkinter

Today we are going to look at a simple program that reads an Excel file and plots a chart from its data. Along the way we will explore a few important features such as filedialog and tkinter. Before going through the details, let us look at the entire code below.
import tkinter as tk
from tkinter import filedialog
from matplotlib.backends.backend_tkagg import FigureCanvasTkAgg
from matplotlib.figure import Figure
import pandas as pd

root= tk.Tk()

canvas1 = tk.Canvas(root, width = 800, height = 300)
canvas1.pack()

label1 = tk.Label(root, text='Data Analyser')
label1.config(font=('Arial', 20))
canvas1.create_window(400, 50, window=label1)

def getExcel ():
      global df

      import_file_path = filedialog.askopenfilename()
      df = pd.read_excel (import_file_path)
      global bar1
      x = df['Day']
      y = df['Count']

      figure1 = Figure(figsize=(4,3), dpi=100)
      subplot1 = figure1.add_subplot(111)
      subplot1.bar(x,y,color = 'lightsteelblue')
      bar1 = FigureCanvasTkAgg(figure1, root)
      bar1.get_tk_widget().pack(side=tk.LEFT, fill=tk.BOTH, expand=0)
      subplot1.plot(x, y, color='green', linestyle='dashed', linewidth = 3, marker='o', markerfacecolor='blue', markersize=12)

def clear_charts():
      bar1.get_tk_widget().pack_forget()

browseButton_Excel = tk.Button(text='Load File...', command=getExcel, bg='green', fg='white', font=('helvetica', 12, 'bold'))
canvas1.create_window(400, 180, window=browseButton_Excel)

button2 = tk.Button (root, text='Clear Chart', command=clear_charts, bg='green', font=('helvetica', 11, 'bold'))
canvas1.create_window(400, 220, window=button2)

button3 = tk.Button (root, text='Exit!', command=root.destroy, bg='green', font=('helvetica', 11, 'bold'))
canvas1.create_window(400, 260, window=button3)

root.mainloop()
You can run the above code and see the output. Now let us go through it segment by segment. The code below imports tkinter, matplotlib, and pandas.
import tkinter as tk
from tkinter import filedialog
from matplotlib.backends.backend_tkagg import FigureCanvasTkAgg
from matplotlib.figure import Figure
import pandas as pd
Create the tkinter root object and open a Canvas using the code below.
root= tk.Tk()
canvas1 = tk.Canvas(root, width = 800, height = 300)
canvas1.pack()
Let us configure the basic information for the canvas.
label1 = tk.Label(root, text='Data Analyser')
label1.config(font=('Arial', 20))
canvas1.create_window(400, 50, window=label1)
The function definitions below open the file using filedialog and read the Excel file. You can see the sample data in the Excel file used in the example code.
def getExcel ():
      global df
 
      import_file_path = filedialog.askopenfilename()
      df = pd.read_excel (import_file_path)
      global bar1
      x = df['Day']
      y = df['Count']
 
      figure1 = Figure(figsize=(4,3), dpi=100)
      subplot1 = figure1.add_subplot(111)
      subplot1.bar(x,y,color = 'lightsteelblue')
      bar1 = FigureCanvasTkAgg(figure1, root)
      bar1.get_tk_widget().pack(side=tk.LEFT, fill=tk.BOTH, expand=0)
      subplot1.plot(x, y, color='green', linestyle='dashed', linewidth = 3, marker='o', markerfacecolor='blue', markersize=12)
 
def clear_charts():
      bar1.get_tk_widget().pack_forget()
Create the buttons that trigger these actions, and start the event loop with mainloop.
browseButton_Excel = tk.Button(text='Load File...', command=getExcel, bg='green', fg='white', font=('helvetica', 12, 'bold'))
canvas1.create_window(400, 180, window=browseButton_Excel)
 
button2 = tk.Button (root, text='Clear Chart', command=clear_charts, bg='green', font=('helvetica', 11, 'bold'))
canvas1.create_window(400, 220, window=button2)
 
button3 = tk.Button (root, text='Exit!', command=root.destroy, bg='green', font=('helvetica', 11, 'bold'))
canvas1.create_window(400, 260, window=button3)

root.mainloop()
In the next post, we will see more on plotting multiple lines with a real-time example in the Canvas.

A glance at Anaconda, Jupyter notebook and Python for beginners

Jupyter Notebook is a web-based interactive environment for writing and running Python code. It is very popular with data professionals because it is easy to install and use.
Jupyter Notebook has two main components: the notebook server and a browser. The browser sends requests to the server, which processes them. By default, the browser connects to the server at localhost:8888.

Now, let us look at Anaconda, a Python distribution whose conda package manager lets you install many libraries. When you install Anaconda, Python and Jupyter come along with the installation. To install Anaconda, go to https://anaconda.org (for individuals: https://www.anaconda.com/products/individual#windows) and download the latest installer that is compatible with your workstation (Windows or Mac).

To launch Jupyter, open Anaconda Navigator and select Jupyter. This opens a browser window where you can write and run code.

Writing First Program in Jupyter notebook

1. Create a folder on the Desktop to save our sample work

2. Create a new notebook by clicking New, then Python 3 (as in the screenshot)

3. This opens a code cell where you can write programs
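
As a first program, something like the following can be typed into the cell and run with Shift+Enter. This is only a small sketch; the function name `greet` is made up for illustration:

```python
# A first notebook cell: define a function and call it.
def greet(name):
    return "Hello, " + name + "!"

message = greet("Jupyter")
print(message)
```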

I am a beginner in Python and am writing these posts as I learn, for two main reasons: so that I do not forget, and to share with the community. I would like you to share your thoughts and experiences in the comments section, so that we can all be part of learning and sharing!

Python Dictionaries

A dictionary is a collection of key-value pairs. The keys must be unique. Since Python 3.7, a dictionary preserves the order in which its keys were inserted; in older versions the order was not guaranteed.

In this tutorial you will learn the basics of how to use the Python dictionary.

Creating a Dictionary:
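
A minimal sketch of the usual ways to create a dictionary. The names and values are made up for illustration:

```python
# Curly braces with key: value pairs.
person = {"name": "Asha", "age": 30, "city": "Chennai"}

# The dict() constructor with keyword arguments.
point = dict(x=1, y=2)

# An empty dictionary that can be filled in later.
empty = {}

print(person)
print(point)
print(len(empty))
```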

Accessing Items:
You can access the items of a dictionary by referring to its key name, inside square brackets:
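
For instance, with a made-up `person` dictionary. The `get()` method is shown as well, since it is a common alternative when a key may be missing:

```python
person = {"name": "Asha", "age": 30}

name = person["name"]                  # square brackets; raises KeyError if the key is missing
email = person.get("email")            # get() returns None when the key is missing
fallback = person.get("email", "n/a")  # or a default of your choice

print(name, email, fallback)
```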

Updating Dictionary:
You can update a dictionary by adding a new key-value pair, modifying an existing entry, or deleting an existing entry.
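
A short sketch of the three kinds of update, using a made-up `person` dictionary:

```python
person = {"name": "Asha", "age": 30}

person["city"] = "Chennai"   # add a new key-value pair
person["age"] = 31           # modify an existing entry
del person["name"]           # delete an existing entry

print(person)
```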

Loop Through a Dictionary:
You can loop through a dictionary with a for loop. Iterating over the dictionary itself yields its keys, but there are methods that return the values as well.
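
The three common loop forms, sketched on a made-up `stock` dictionary:

```python
stock = {"apples": 3, "pears": 5}

# Iterating over the dictionary yields its keys.
for key in stock:
    print(key)

# values() yields the values.
for value in stock.values():
    print(value)

# items() yields (key, value) pairs.
for key, value in stock.items():
    print(key, value)
```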

Check if Exists:

You can test the presence of a key using ‘in’ or ‘not in’
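
For example, with a made-up `stock` dictionary. Note that membership tests look at keys, not values:

```python
stock = {"apples": 3, "pears": 5}

has_apples = "apples" in stock     # the key exists
no_plums = "plums" not in stock    # the key does not exist
value_is_key = 3 in stock          # 3 is a value, not a key

print(has_apples, no_plums, value_is_key)
```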

Restrictions on Dictionary Keys:
Almost any type of value can be used as a dictionary key in Python. For example, integer, float, and Boolean objects can all be used as keys.
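
A small sketch with keys of different types. The dictionary contents are made up, and a tuple key is included as one more illustration:

```python
mixed = {42: "integer key", 3.14: "float key", True: "Boolean key", ("x", 1): "tuple key"}

print(mixed[42])
print(mixed[3.14])
print(mixed[True])
print(mixed[("x", 1)])
```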

However, dictionary keys must abide by a couple of restrictions: each key can appear only once, and each key must be hashable, which rules out mutable types such as lists and dictionaries.

First, a given key can appear in a dictionary only once. Duplicate keys are not allowed. A dictionary maps each key to a corresponding value, so it doesn’t make sense to map a particular key more than once.
When you assign a value to an already existing dictionary key, Python does not add the key a second time but replaces the existing value.
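
A minimal sketch of this behavior, using made-up data. It also shows that an unhashable object such as a list is rejected as a key with a `TypeError`:

```python
capitals = {"France": "Lyon"}
capitals["France"] = "Paris"   # replaces the value; no second "France" key is added
print(capitals, len(capitals))

# The same happens inside a literal: the last value for a repeated key wins.
repeated = {"a": 1, "a": 2}
print(repeated)

# A list is mutable and therefore unhashable, so it cannot be used as a key.
try:
    capitals[["Spain"]] = "Madrid"
    list_key_allowed = True
except TypeError:
    list_key_allowed = False
print(list_key_allowed)
```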

Restrictions on Dictionary Values:
By contrast, there are no restrictions on dictionary values. Literally none at all. A dictionary value can be any type of object Python supports, including mutable types like lists and dictionaries, and user-defined objects. There is also no restriction against a particular value appearing in a dictionary multiple times.
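
As an illustration, here is a made-up `record` dictionary whose values include a list, another dictionary, a function, and the same string stored under two different keys:

```python
def shout(text):
    return text.upper()

record = {
    "tags": ["a", "b"],                # a list as a value
    "address": {"city": "Chennai"},    # a nested dictionary
    "formatter": shout,                # a function object
    "primary": "a",
    "backup": "a",                     # the same value under a second key
}

record["tags"].append("c")             # mutable values can be changed in place
print(record["tags"])
print(record["address"]["city"])
print(record["formatter"]("hello"))
print(record["primary"] == record["backup"])
```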

Hope you have enjoyed the post. Keep reading