Data structrue and algorithm

Basic data structrue and algorithm

Posted by Yingshan Li on January 8, 2020

递归(Recursion):

将大的计算不断分割成小的计算,递归函数不断调用自身,比如斐波拉契数列f(n)=f(n-1)+f(n-2)

搜索

  1. 顺序搜索(linear search):对于无序Array,需要遍历整个array;对于sorted array,只需要遍历一部分
  2. Binary search:适用于sorted array,运用divide and conquer方法

排序

1. 冒泡排序(Bubble sort):

像泡泡一样,大的值不断向上走

def bubbleSort( theSeq ):
	n = len( theSeq )
	for i in range( n - 1 ) :
		for j in range( n –i - 1 ) :
			if theSeq[j] > theSeq[j + 1] :
				tmp = theSeq[j]
        		theSeq[j] = theSeq[j + 1]
        		theSeq[j + 1] = tmp

2. 选择排序(Selection sort):

最直观的排序方法,就像打扑克一样,先选出最小的,再选出第二小的,以此类推

def selectionSort( theSeq ):
	n = len( theSeq )
	for i in range( n - 1 ): 
		smallNdx = i
		for j in range( i + 1, n ): 
			if theSeq[j] < theSeq[smallNdx] : 
				smallNdx = j 
	if smallNdx != i : 
		tmp = theSeq[i] 
     		theSeq[i] = theSeq[smallNdx]
     		theSeq[smallNdx] = tmp

3. 插入排序(Insertion sort):

这个也跟打扑克有点像,事实上应该是大部分人打扑克时的排序方法,就是不停的将后面的每个item插入到前面排好的序列中

def insertionSort( theSeq ): 
	n = len( theSeq )
	for i in range( n-1 ) : 
    	value = theSeq[i]
		pos = i
		while pos > 0 and value < theSeq[pos - 1] : 
     		theSeq[pos] = theSeq[pos - 1]
      		pos -= 1
    	theSeq[pos] = value

4. 归并排序(Merge sort):

运用divide and conquer (recursion),将array不停的分割一直分到只有两个item,然后将两个item排序,最后再归并起来

def pythonMergeSort( theList ): 
	# Check the base case. 
	if len(theList) <= 1 : 
		return theList 
	else :
		# Compute the midpoint. 
       mid = len(theList) // 2
       # Split the list and perform the recursive step.
        leftHalf = pythonMergeSort( theList[ :mid ] )
        rightHalf = pythonMergeSort( theList[ mid: ] )
        #Merge the two ordered sublists.
        newList = mergeOrderedLists( leftHalf, rightHalf ) 
	return newList 

5. 快速排序(Quick sort):

运用divide and conquer (recursion),随便选择一个item当做pivot,然后将大于它的元素放在左边,将小于它的元素放在右边,这样循环(recursion)下去,

	\# Sorts an array or list using the recursive quick sort algorithm. 
def quickSort( theSeq ):
n = len( theSeq ) recQuickSort( theSeq, 0, n-1 ) 

	\# The recursive implementation using virtual segments. 
def recQuickSort( theSeq, first, last ): 
	if first >= last : # Check the base case. 
		return 
	else : 
      	# Save the pivot value.
     	pivot = theSeq[first]
		# Partition the sequence and obtain the pivot position. 
		pos = partitionSeq( theSeq, first, last ) 
		# Repeat the process on the two subsequences. 
     	recQuickSort( theSeq, first, pos - 1 )
     	recQuickSort( theSeq, pos + 1, last )

6. 基数排序(Radix sort):

先根据个位数从0-9将元素划分,下一次在I 基础上根据十位数从0-9划分,以此类推

from llistqueue import Queue 
from array import Array 
def radixSort( intList, numDigits ):
\#the latter represents the maximum number of digits possible in the largest key value.
	binArray = Array( 10 ) 
	for k in range( 10 ): 
  		binArray[k] = Queue()
	column = 1
		for d in range( numDigits ): 
			for key in intList :
				digit = (key // column) % 10 
				binArray[digit].enqueue( key ) 
			i=0
			for bin in binArray : 
				while not bin.isEmpty() : 
					intList[i] = bin.dequeue() 
					i += 1 
			column *= 10 

二叉树

1. 基本概念

由分支(edges)和节点(nodes)组成,每棵树都有一个根,节点可以分为父节点和子节点,也可以分为内节点(interior node,至少又一个子节点),和叶节点(leaf node,没有子节点了)。二叉树分为完美二叉树(perfect binary tree,所有的叶节点都在同一level)和完全二叉树(complete binary tree,从头到h-1层是完美二叉树,最后一层从左到右没有间隔)

2. 遍历二叉树:

1. Preorder traversal (先序遍历):

先访问Node,再遍历左边,然后遍历右边,这样递归

def preorderTrav( subtree ): 
	if subtree is not None : 
    	print( subtree.data )
    	preorderTrav( subtree.left )
    	preorderTrav( subtree.right )

2. Inorder traversal(中序遍历):

先遍历左边,再访问Node,最后访问右边

def inorderTrav( subtree ): 
	if subtree is not None : 
        inorderTrav( subtree.left )
        print( subtree.data )
        inorderTrav( subtree.right )

3. Postorder traversal(后序遍历):

先遍历左边,再遍历右边,最后访问Node

def postorderTrav( subtree ): 
	if subtree is not None :
		postorderTrav( subtree.left )
		postorderTrav( subtree.right )
		print( subtree.data )

4. Breath-First traversal(宽度优先遍历):

从左往右一层一层的遍历

def breadthFirstTrav( bintree ):
# Create a queue and add the root node to it. 
Queue q
q.enqueue( bintree )
\# Visit each node in the tree. 
while not q.isEmpty() :
	# Remove the next node from the queue and visit it. 
   node = q.dequeue()
   print( node.data )
   # Add the two children to the queue.
   if node.left is not None :
   		q.enqueue( node.left ) 
	if node.right is not None : 
		q.enqueue( node.right ) 

3. Expression tree:

叶节点上是数值,其它节点上是操作符,处于低层的操作符优先执行,根节点的操作符最后执行

class ExpressionTree :
	def __init__( self, expStr ): 
    	self._expTree = None
    	self._buildTree( expStr )
	def evaluate( self, varMap ):
		return self._evalTree( self._expTree, varMap ) 
	def __str__( self ):
		return self._buildString( self._expTree ) 
\# ... 
\# Storage class for creating the tree nodes. 

class _ExpTreeNode :
	def __init__( self, data ): 
   		self.element = data
    	self.left = None
    	self.right = None

4. 用二叉树来构建heaps(堆栈)

Max heap(最大堆):每个节点的数值都大于子节点; Min heap(最小堆):每个节点的数值都大于子节点

class MaxHeap :
# ... # Add a new value to the heap. 
def add( self, value ):
	assert self._count < self.capacity(), 
		"Cannot add to a full heap."
	# Add the new value to the end of the list. 
	self._elements[ self._count ] = value self._count += 1 
   	# Sift the new value up the tree.
  	self._siftUp( self._count - 1 )
			
def _siftUp( self, ndx ): 
	if ndx > 0 : 
    	parent = ndx // 2
		if self._elements[ndx] > self._elements[parent] : 
			tmp = self._elements[ndx]
			self._elements[ndx] = self._elements[parent] 
			self._elements[parent] = tmp 
			self._siftUp( parent )

class MaxHeap : 
\# ... 
	def extract( self ): 
		assert self._count > 0, 
			"Cannot extract from an empty heap."
		value = self._elements[0]
		self._count -= 1
		self._elements[0] = self._elements[ self._count ] 
		self._siftDown( 0 ) 

动态规划

1. 公式:

2. 伪代码:

3. Sequence alignment序列比对:

penalty score. For match, mismatch and gap; fill out the score matrix by choosing the max number; backtracking

3. 0-1 Knapsack(0-1背包问题):

假设背包的承重上限为6,有四种物品,价值分别为30,40,45,100;对应的重量为1,2,3,4.

分析过程

  1. 当只有0个物品或者背包空间为0的时候,什么也拿不了,因此拿到的总重量和价值都为0(第一行和第一列)
  2. 当只有一个物品(30, 1),即价值是30重量是1时,不管背包大小是多少,都只能拿一个,总价值为30
  3. 当物品有两个(30,1)和(20,2)时,可以只拿(30,1),也可以只拿(40,2),也可以两个都拿(总价值为70)
  4. 当物品有三个时,即(30,1)、(40,2)、(45,3)时,我们可以选择(30,1)、(40,2)、(45,3)、(30,1)和(40,2)、(30,1)和(45,3)、(40,2)和(45,3)、(30,1)和(40,2)和(45,3)。所有的情况都没有超出背包的极限
  5. 当物品有4个时,以此类推,注意不能超过背包的极限

图论

1. 概念

1. 图G=(V, E), V代表nodes也叫vertices,E表示Edges
2. Kn:complete graph,所有的edges都相连
3. Cn:circle graph,不一定所有的edges都相连,但首尾相接
4. trees:没有circle的graph

2. 遍历图Graph traversal

DFS keeps walking down a path until it is forced to backtrack; it backtracks until it finds a new path to go down 

Recursive implementation: u, s – start node 

Stack-based implementa:on of DFS

 

3. Minimum spanning tree (MST)

  1. Prim’s algorithm: 在graph中随便找个node作为起点,然后一个一个的加入下一个节点让新的节点加入的edge长度最小。
  2. Kruskal’s algorithm:Add edges in increasing weight and skip those whose addition would create a cycle.
  3. Dijkstra’s algorithm:针对directed graph,根Prim‘s差不多,只是有方向了而已

贪婪算法:

跟动态规划算法相对,可以获取最佳结果