How to identify duplicate lines in a text file using Python

Here is a short Python program that counts how often each line occurs in a text file, which makes duplicate lines easy to spot.

import tkinter as tk
from tkinter import filedialog
import collections
from pathlib import Path
import os

root= tk.Tk()

canvas1 = tk.Canvas(root, width = 800, height = 300)
canvas1.pack()

label1 = tk.Label(root, text='Log Analyser')
label2 = tk.Label(root, text='Import a file...')
label1.config(font=('Arial', 20))
label2.config(font=('Arial', 10))
canvas1.create_window(400, 50, window=label1)
canvas1.create_window(200, 180, window=label2)

def getLogFile ():
      import_file = filedialog.askopenfilename()
      if not import_file:
            # The dialog was cancelled, so there is nothing to analyse
            return

      entries = Path(import_file)
      # Write the report into the same folder as the chosen file
      out_path = os.path.join(os.path.dirname(os.path.abspath(import_file)), "Duplicate_Log_Info.txt")

      # Read-only access is enough; count each distinct line, ignoring surrounding whitespace
      with open(import_file, "r") as f:
            counts = collections.Counter(l.strip() for l in f)

      # One "line|count" row per distinct line, most frequent first
      with open(out_path, "w") as fw:
            for line, count in counts.most_common():
                  fw.write(line + "|" + str(count) + "\n")

      label3 = tk.Label(root, text=entries.name + ": Import is successful, Please check the output file - " + out_path + ".")
      label3.config(font=('Arial', 10))
      canvas1.create_window(400, 220, window=label3)

            
browseButton_Excel = tk.Button(text='Choose a file...', command=getLogFile, bg='green', fg='white', font=('helvetica', 12, 'bold'))
canvas1.create_window(400, 180, window=browseButton_Excel)

button3 = tk.Button (root, text='Close', command=root.destroy, bg='green', font=('helvetica', 11, 'bold'))
canvas1.create_window(500, 180, window=button3)

root.mainloop()

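Output: a file named Duplicate_Log_Info.txt in the same folder as the chosen file, with one line|count row per distinct line, most frequent first.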
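
The counting step of the program above can also be tried without the GUI. The following is a minimal sketch, not part of the original program: the function names `count_lines` and `format_report`, the `duplicates_only` flag and the sample lines are all assumptions made for illustration.

```python
from collections import Counter


def count_lines(lines):
    """Count how often each line occurs, ignoring surrounding whitespace."""
    return Counter(line.strip() for line in lines)


def format_report(counts, duplicates_only=False):
    """Return 'line|count' rows, most frequent first."""
    rows = []
    for line, count in counts.most_common():
        if duplicates_only and count < 2:
            continue  # skip lines that occur only once
        rows.append(f"{line}|{count}")
    return rows


# Hypothetical log lines standing in for the contents of a file
sample = ["error: disk full\n", "ok\n", "error: disk full\n",
          "ok\n", "error: disk full\n", "started\n"]
report = format_report(count_lines(sample), duplicates_only=True)
print(report)
```

With `duplicates_only=False` the report lists every distinct line, which matches what the GUI program writes to its output file.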

If you enjoyed this blog post, feel free to share it with your friends!

Python Dictionaries – del, clear

This is a continuation of Kiran’s Python Dictionaries post. A few questions came up online and offline, so I thought of writing a post on the del statement and the clear method for Python dictionaries.

A few dictionary operations

To explain better, let us create a sample dictionary.

fruit= {"ORange":"I love ORange",
       "Apple":"Apple is good for health"}
print ("Entire defined dictionary values")
print (fruit)

1. How to delete a key from a dictionary

fruit= {"ORange":"I love ORange",
       "Apple":"Apple is good for health"}
#Delete command is as below
print ("Delete a single element")
del fruit["Apple"]
print (fruit)
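
One case the snippet above does not show is deleting a key that is not in the dictionary. The sketch below assumes a missing key "Banana" purely for illustration: `del` raises `KeyError` in that case, while `dict.pop` with a default value returns the default instead.

```python
fruit = {"ORange": "I love ORange",
         "Apple": "Apple is good for health"}

# del raises KeyError when the key is not present
try:
    del fruit["Banana"]
    outcome = "deleted"
except KeyError:
    outcome = "KeyError"

# pop removes the key and returns its value; the second argument is
# returned instead of raising when the key is missing
removed = fruit.pop("Apple", None)
missing = fruit.pop("Banana", None)

print(outcome)
print(removed)
print(missing)
print(fruit)
```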

2. How to delete an entire dictionary

fruit= {"ORange":"I love ORange",
       "Apple":"Apple is good for health"}
print ('Delete the dictionary')
del fruit
# After del, the name fruit no longer exists, so the next line raises NameError
fruit

3. How to clear an entire dictionary

fruit= {"ORange":"I love ORange",
       "Apple":"Apple is good for health"}
print ('Clear the dictionary elements')
fruit.clear()
print (fruit)
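
The difference between `del fruit` and `fruit.clear()` is easiest to see when a second name refers to the same dictionary. This is a small sketch; the name `basket` is an assumption added for illustration.

```python
fruit = {"ORange": "I love ORange",
         "Apple": "Apple is good for health"}
basket = fruit          # second name for the same dictionary object

fruit.clear()           # empties the object itself
print(basket)           # the other name sees the empty dictionary too

fruit["Apple"] = "Apple is good for health"
del fruit               # removes only the name fruit
print(basket)           # the object is still reachable through basket

# Check whether the name fruit can still be used
try:
    fruit
    fruit_exists = True
except NameError:
    fruit_exists = False
print(fruit_exists)
```

In short, `clear()` changes the dictionary that every name shares, while `del` only removes one name.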

How to draw multi line graphs in python using matplotlib and tkinter

This is a follow-up to my earlier posts. Today we will extend the last post and draw multiple lines in one graph using matplotlib and tkinter.

Source Code:
import tkinter as tk
from tkinter import filedialog
from matplotlib.backends.backend_tkagg import FigureCanvasTkAgg
from matplotlib.figure import Figure
import pandas as pd

from random import randint
colors = []

for i in range(12):
    colors.append('#%06X' % randint(0, 0xFFFFFF))
    
    
root= tk.Tk()
 
canvas1 = tk.Canvas(root, width = 800, height = 300)
canvas1.pack() 
label1 = tk.Label(root, text='Data Analyser')
label1.config(font=('Arial', 20))
canvas1.create_window(400, 50, window=label1)
 
def getExcel ():
    global df
 
    import_file_path = filedialog.askopenfilename()
    df = pd.read_excel (import_file_path)
    global bar1
    figure1 = Figure(figsize=(4,3), dpi=100)
    subplot1 = figure1.add_subplot(111)
    #subplot1.bar(x,y,color = 'lightsteelblue')
    bar1 = FigureCanvasTkAgg(figure1, root)
    bar1.name='latheesh'
    bar1.get_tk_widget().pack(side=tk.LEFT, fill=tk.BOTH, expand=0)
    

    # One line per month; the index wraps around so it never runs past the 12 colors
    for i, month in enumerate(pd.unique(df['Month'])):
        x=  df['Day'][df['Month']==month]
        y=  df['Count'][df['Month']==month]
        subplot1.plot(x, y, color=colors[i % len(colors)], linestyle='dashed', linewidth = 1, marker='o', markerfacecolor=colors[i % len(colors)], markersize=12)
    
 
def clear_charts():
    bar1.get_tk_widget().pack_forget()
 
browseButton_Excel = tk.Button(text='Load File...', command=getExcel, bg='green', fg='white', font=('helvetica', 12, 'bold'))
canvas1.create_window(400, 180, window=browseButton_Excel)
 
button2 = tk.Button (root, text='Clear Chart', command=clear_charts, bg='green', font=('helvetica', 11, 'bold'))
canvas1.create_window(400, 220, window=button2)
 
button3 = tk.Button (root, text='Exit!', command=root.destroy, bg='green', font=('helvetica', 11, 'bold'))
canvas1.create_window(400, 260, window=button3)
 
root.mainloop()

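Sample Excel data: a sheet with the columns Month, Day and Count (screenshot).

Resulting graph: one dashed line per month (screenshot).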

Explanation:

The main differences from the previous post are two added snippets:

1. A list of random color codes

from random import randint
colors = []

for i in range(12):
    colors.append('#%06X' % randint(0, 0xFFFFFF))

2. Drawing multiple lines. In our case, the Month column is used to filter the data: for every unique month, the loop selects that month's rows and plots them as a separate line in the graph.

    # One line per month; the index wraps around so it never runs past the 12 colors
    for i, month in enumerate(pd.unique(df['Month'])):
        x=  df['Day'][df['Month']==month]
        y=  df['Count'][df['Month']==month]
        subplot1.plot(x, y, color=colors[i % len(colors)], linestyle='dashed', linewidth = 1, marker='o', markerfacecolor=colors[i % len(colors)], markersize=12)
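
The two snippets can be explored without Excel, pandas or a window. The sketch below is only an illustration under stated assumptions: the `rows` list is made-up data standing in for the sheet, the random generator is seeded so the colors repeat between runs, and the grouping is done with a plain dictionary instead of pandas filtering.

```python
from collections import defaultdict
from random import Random

# Hypothetical rows standing in for the Excel sheet: (Month, Day, Count)
rows = [(1, 1, 10), (1, 2, 14), (2, 1, 7), (2, 2, 9), (3, 1, 12)]

# Seeded generator so the colors are the same on every run
rng = Random(42)
colors = ['#%06X' % rng.randint(0, 0xFFFFFF) for _ in range(12)]

# Collect the x (Day) and y (Count) values of each month
series = defaultdict(lambda: ([], []))
for month, day, count in rows:
    xs, ys = series[month]
    xs.append(day)
    ys.append(count)

# One (color, x, y) entry per month; each entry corresponds to one plot call
lines = [(colors[i % len(colors)], xs, ys)
         for i, (month, (xs, ys)) in enumerate(series.items())]

for color, xs, ys in lines:
    print(color, xs, ys)
```

Each printed row holds the values that one `subplot1.plot(x, y, color=...)` call would receive in the program above.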

Hope this helps you understand the basics of drawing a graph from Excel data. I recommend exploring further and getting hands-on to understand it better.

I’d like to grow my readership. If you enjoyed this blog post, please share it with your friends!

Open and Read from an Excel File and plot a chart in Python using matplotlib and tkinter

Today, we are going to see a simple program that reads an Excel file and plots a chart from its data. In this example, we will explore a few important features such as filedialog and tkinter. Before we go through the details, let us look at the entire code below.

import tkinter as tk
from tkinter import filedialog
from matplotlib.backends.backend_tkagg import FigureCanvasTkAgg
from matplotlib.figure import Figure
import pandas as pd

root= tk.Tk()

canvas1 = tk.Canvas(root, width = 800, height = 300)
canvas1.pack()

label1 = tk.Label(root, text='Data Analyser')
label1.config(font=('Arial', 20))
canvas1.create_window(400, 50, window=label1)

def getExcel ():
      global df

      import_file_path = filedialog.askopenfilename()
      df = pd.read_excel (import_file_path)
      global bar1
      x = df['Day']
      y = df['Count']

      figure1 = Figure(figsize=(4,3), dpi=100)
      subplot1 = figure1.add_subplot(111)
      subplot1.bar(x,y,color = 'lightsteelblue')
      bar1 = FigureCanvasTkAgg(figure1, root)
      bar1.get_tk_widget().pack(side=tk.LEFT, fill=tk.BOTH, expand=0)
      subplot1.plot(x, y, color='green', linestyle='dashed', linewidth = 3, marker='o', markerfacecolor='blue', markersize=12)

def clear_charts():
      bar1.get_tk_widget().pack_forget()

browseButton_Excel = tk.Button(text='Load File...', command=getExcel, bg='green', fg='white', font=('helvetica', 12, 'bold'))
canvas1.create_window(400, 180, window=browseButton_Excel)

button2 = tk.Button (root, text='Clear Chart', command=clear_charts, bg='green', font=('helvetica', 11, 'bold'))
canvas1.create_window(400, 220, window=button2)

button3 = tk.Button (root, text='Exit!', command=root.destroy, bg='green', font=('helvetica', 11, 'bold'))
canvas1.create_window(400, 260, window=button3)

root.mainloop()

You can run the code above to see the output. Now let us go through it segment by segment. The imports below bring in tkinter, matplotlib and pandas.

import tkinter as tk
from tkinter import filedialog
from matplotlib.backends.backend_tkagg import FigureCanvasTkAgg
from matplotlib.figure import Figure
import pandas as pd

Create the tkinter root window and a Canvas with the code below.

root= tk.Tk()
canvas1 = tk.Canvas(root, width = 800, height = 300)
canvas1.pack()

Next, let us add the title label to the canvas.

label1 = tk.Label(root, text='Data Analyser')
label1.config(font=('Arial', 20))
canvas1.create_window(400, 50, window=label1)

The function definitions below open the file through filedialog and read the Excel sheet. The example code expects the sheet to have Day and Count columns.

def getExcel ():
      global df
 
      import_file_path = filedialog.askopenfilename()
      df = pd.read_excel (import_file_path)
      global bar1
      x = df['Day']
      y = df['Count']
 
      figure1 = Figure(figsize=(4,3), dpi=100)
      subplot1 = figure1.add_subplot(111)
      subplot1.bar(x,y,color = 'lightsteelblue')
      bar1 = FigureCanvasTkAgg(figure1, root)
      bar1.get_tk_widget().pack(side=tk.LEFT, fill=tk.BOTH, expand=0)
      subplot1.plot(x, y, color='green', linestyle='dashed', linewidth = 3, marker='o', markerfacecolor='blue', markersize=12)
 
def clear_charts():
      bar1.get_tk_widget().pack_forget()
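
One detail worth handling in getExcel is a cancelled dialog: filedialog.askopenfilename then returns an empty value, and reading it as a file fails. The following is a rough sketch of a guard, not the original code. The helper name `load_chosen_file` is hypothetical, and the dialog and reader are passed in as arguments so the example runs without a GUI or pandas.

```python
def load_chosen_file(ask_path, read_file):
    """Ask for a path and load it, or return None if nothing was chosen.

    ask_path and read_file are passed in so the sketch runs without a GUI;
    in the program above they would be filedialog.askopenfilename and
    pd.read_excel.
    """
    path = ask_path()
    if not path:
        # Cancelled dialog: there is no file to read
        return None
    return read_file(path)


# Stand-ins for the dialog and the Excel reader (hypothetical, for illustration)
cancelled = load_chosen_file(lambda: "", lambda p: "loaded " + p)
chosen = load_chosen_file(lambda: "data.xlsx", lambda p: "loaded " + p)
print(cancelled)
print(chosen)
```

In getExcel, the same idea amounts to returning early when import_file_path is empty, before calling pd.read_excel.
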

Finally, create the buttons that trigger these functions and start the event loop with mainloop.

browseButton_Excel = tk.Button(text='Load File...', command=getExcel, bg='green', fg='white', font=('helvetica', 12, 'bold'))
canvas1.create_window(400, 180, window=browseButton_Excel)
 
button2 = tk.Button (root, text='Clear Chart', command=clear_charts, bg='green', font=('helvetica', 11, 'bold'))
canvas1.create_window(400, 220, window=button2)
 
button3 = tk.Button (root, text='Exit!', command=root.destroy, bg='green', font=('helvetica', 11, 'bold'))
canvas1.create_window(400, 260, window=button3)

root.mainloop()

In the next post, we will look at plotting multiple lines on the canvas with a real-world example.

A glance at Anaconda, Jupyter notebook and Python for beginners

Jupyter Notebook is a browser-based interactive environment for writing and running Python code. It is very popular with data professionals because it is easy to install and use.
Jupyter Notebook basically has two components: the notebook server and a browser. The browser sends requests to the server, which processes them. By default, the browser connects to the server at localhost:8888.

Now, let us look at Anaconda, a Python distribution whose package manager (conda) lets you install many libraries. When you install Anaconda, Python and Jupyter come along with the installation. To install Anaconda, go to https://anaconda.org (for individuals -> https://www.anaconda.com/products/individual#windows) and download the latest installer that is compatible with your workstation (Windows or Mac).

To launch Jupyter, open Anaconda Navigator and select Jupyter. This opens a browser window where programmers can write and run code.

Writing First Program in Jupyter notebook

1. Create a folder on the Desktop to save our sample work

2. Create a notebook by clicking New > Python 3 (shown in the screenshot)

3. It will open a code cell where you can write programs
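
As a first program, something like the following could be typed into the cell and run with Shift+Enter. It is only a sketch for trying things out; the function name `greet` is an assumption made for illustration.

```python
def greet(name):
    """Return a short greeting for the given name."""
    return "Hello, " + name + "!"


message = greet("Jupyter")
print(message)
```

The printed text appears directly below the cell, which makes it easy to change the code and run it again.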

I am a beginner in Python and am writing these posts as I learn, for two main reasons: so that I do not forget, and to share with the community. Please share your thoughts and experiences in the comments section, so we can all be part of learning and sharing!
