The Python Pune Meetup was held at Amazatic Solutions on 28th May 2016, where Sudarshan Gadhave presented Part II of his machine learning series. This session focused specifically on Python pandas, since the previous meetup had already covered the basics of data science.

Sudarshan started with the tools and libraries required for machine learning:

  • IPython
  • numpy
  • python-pandas
  • python-scikit-learn
  • matplotlib

These are among the best open-source alternatives to software like MATLAB!

Machine learning brings together technical, business, and statistical skills. A small data set can be analysed manually, but what if the data has thousands of rows and columns? Most of a data scientist's time goes into getting, analysing, separating, and cleaning the data, which amounts to almost 70% of the total work in the initial phase. The actual business logic and model building come later.

We used the Titanic data set as an example (https://github.com/pcsanwald/kaggle-titanic/blob/master/train.csv).
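As a rough idea of what loading such a file looks like in pandas, here is a minimal sketch. The rows and column names below are made up for illustration (they only imitate the shape of the Titanic data and are not copied from train.csv), and an in-memory string stands in for the downloaded file.

```python
import io

import pandas as pd

# Made-up rows for illustration only; not taken from the real train.csv.
# The column names are assumptions modelled on the Titanic data set.
SAMPLE_CSV = """survived,pclass,sex,age,fare
0,3,male,22,7.25
1,1,female,38,71.28
1,3,female,,7.92
0,2,male,35,13.0
"""

# With a local copy of the file this would be pd.read_csv("train.csv").
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

print(df.shape)           # number of rows and columns
print(df.head())          # first few rows
print(df.isnull().sum())  # missing values per column
```

Checking the shape and the missing-value counts right after loading is a quick way to see how much cleaning the data will need.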

We then covered the following topics:

  • Numpy basics
  • Pandas Data Structures and basics
  • Data Loading
  • Data Wrangling (Clean, Transform, Merge)
  • Data Aggregation and grouping operations
  • Exploratory Data Analysis and Descriptive Statistics
  • Plotting and Visualization
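To give a feel for the wrangling, grouping, and descriptive statistics topics in the list above, here is a small sketch. It is not the code shown at the meetup; the table, its column names, and the lookup table are all hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up passenger records, for illustration only.
passengers = pd.DataFrame({
    "passenger_id": [1, 2, 3, 4, 5, 6],
    "sex": ["male", "female", "female", "male", "female", "male"],
    "pclass": [3, 1, 3, 1, 2, 3],
    "age": [22.0, 38.0, np.nan, 54.0, 27.0, np.nan],
    "survived": [0, 1, 1, 0, 1, 0],
})

# Clean: fill missing ages with the median of the known ages.
median_age = passengers["age"].median()
clean = passengers.assign(age=passengers["age"].fillna(median_age))

# Transform: derive a new column from an existing one.
clean["is_child"] = clean["age"] < 18

# Merge: attach a readable class name from a small lookup table.
class_names = pd.DataFrame({
    "pclass": [1, 2, 3],
    "class_name": ["first", "second", "third"],
})
merged = clean.merge(class_names, on="pclass", how="left")

# Aggregate: survival rate per group.
survival_by_sex = merged.groupby("sex")["survived"].mean()
print(survival_by_sex)

# Descriptive statistics for a numeric column.
print(merged["age"].describe())
```

Filling with the median is just one possible choice; dropping the rows or using a per-group value may suit other data better.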

Features/comparison of NumPy & pandas:

numpy:

  • built for fast array processing, vector operations
  • about 15 times faster than Python lists for array operations, as quoted in the talk
  • made for scientific calculations
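If you want to check the speed difference on your own machine, a small timing sketch like the one below works. The array size and repeat count are arbitrary choices, and the actual speed-up depends on the operation, the data size, and the hardware, so do not expect exactly the figure quoted above.

```python
import timeit

import numpy as np

n = 100_000
values_list = list(range(n))
values_array = np.arange(n, dtype=np.int64)


def square_with_list():
    # Pure Python: one multiplication per loop iteration.
    return [v * v for v in values_list]


def square_with_numpy():
    # Vectorized: the loop runs inside NumPy's compiled code.
    return values_array * values_array


list_time = timeit.timeit(square_with_list, number=20)
numpy_time = timeit.timeit(square_with_numpy, number=20)

print("list :", list_time)
print("numpy:", numpy_time)
print("ratio:", list_time / numpy_time)
```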

python pandas:

  • Series is a data structure that stores data in a 1-D NumPy array; in a DataFrame context it is like a column
  • A Series has an index as well as values
  • Values in a Series can be changed using the index
  • A DataFrame is a collection of Series, like an Excel sheet
  • The more memory available, the more data can be analysed with pandas
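The points above can be sketched in a few lines. The values and the index labels "a", "b", "c" are made up for illustration.

```python
import pandas as pd

# A Series holds values together with an index.
ages = pd.Series([22, 38, 26], index=["a", "b", "c"], name="age")

# Change a value by using its index label.
ages["b"] = 40

fares = pd.Series([7.25, 71.28, 7.92], index=["a", "b", "c"], name="fare")

# A DataFrame is a collection of Series that share an index,
# much like the columns of a spreadsheet.
frame = pd.DataFrame({"age": ages, "fare": fares})

print(frame)
print(type(frame["age"]))             # each column is a Series
print(frame.memory_usage(deep=True))  # bytes used per column
```

Since pandas keeps the whole DataFrame in memory, `memory_usage` is a handy way to see how much room a data set takes.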

Click Here to know more about pandas from beginning

We also discussed Matplotlib, the library mainly used to produce charts from data. You can generate plots, histograms, power spectra, bar charts, error charts, scatter plots, etc., with just a few lines of code. I found it to be a good, free alternative to MATLAB.
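Here is a minimal sketch of the "few lines of code" idea, drawing a histogram and a bar chart side by side. The numbers are made up, and the Agg backend is selected only so that the script also runs on a machine without a display.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt

# Made-up values, for illustration only.
ages = [22, 38, 26, 35, 35, 54, 2, 27, 14, 4]
class_labels = ["first", "second", "third"]
class_counts = [3, 2, 5]

fig, (ax_hist, ax_bar) = plt.subplots(1, 2, figsize=(8, 3))

# Histogram of a numeric column.
counts, bin_edges, _ = ax_hist.hist(ages, bins=5)
ax_hist.set_title("Age distribution")
ax_hist.set_xlabel("age")

# Bar chart of counts per category.
ax_bar.bar(class_labels, class_counts)
ax_bar.set_title("Passengers per class")

fig.tight_layout()

# Save to an in-memory buffer; a file name such as "plots.png" works too.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

In an IPython session you would normally drop the Agg line and call `plt.show()` instead of saving.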

Sudarshan suggested this book for learning machine learning and data science from the very basics: Python for Data Analysis

Those who missed the meetup can follow this link: https://github.com/sudarshan1413/python_pandas_meetup_2016

Thank you, Amazatic Solutions, and stay connected to Python Pune for upcoming meetups!

Selfie @meetup @Amazatic by Sudarshan:
